I spent the whole day tinkering with LFN and FAT, and it turns out
this is a Win95-specific issue.
To make a long story short: “don’t put UTF-8 in 8.3”.
In <firstname.lastname@example.org>, email@example.com wrote:
firstname.lastname@example.org wrote:
: I guess this is already in bugreport, but
: Looks like UTF-8’ed long filenames are written somewhat incorrect
: on the filesystem. You won’t notice it on English-only-filename
: filled directories; only localized filenames
person to actually notice this. I can see all sorts of issues with
processing of the (non-ASCII) multibyte names and 8.3 munging, but it
would help to know exactly what you see. Are files missing in the
archive? Are they missing when restored? Are they incorrectly named
in the archive or when restored? Is there a pattern to the mis-naming
(short, long, starting with certain characters, etc)? Thanks …
The tar archive is created correctly, with UTF-8 filenames.
I confirmed this by also dumping it in binary and untarring it on Solaris.
The names look correct after restore in the QNX File Manager.
By hexdumping the FAT entries, I confirmed that at least the LFN is restored
correctly, including checksums.
For example, the localised “Program” filename:
Unicode FF8C FF9F FF9B FF78 FF9E FF97 FF91
was stored in the FAT entries as:
00004A40 41 8C FF 9F FF 9B FF 78:FF 9E FF 0F 00 5A 97 FF|A…x…Z…
00004A50 91 FF 00 00 FF FF FF FF:FF FF 00 00 FF FF FF FF|…
00004A60 EF BE 8C EF BE 9F 7E 31:20 20 20 20 00 00 00 00|…~1 …
00004A70 00 00 00 00 00 00 88 2E:87 2B 03 00 0A 00 00 00|…+…
(the 8.3 is (EF BE 8C)(EF BE 9F)~1, seemingly mangled from UTF-8)
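For anyone who wants to re-check the dump, here is a minimal Python sketch that decodes the two directory entries above. It assumes the standard VFAT layout (LFN characters at offsets 1-10, 14-25 and 28-31 as UTF-16LE, checksum byte at offset 13, and the usual rotate-right-and-add checksum over the 11 bytes of the 8.3 name). The function names are mine, not anything from fs-dos.

```python
import struct

# The four 16-byte rows of the hexdump: one LFN entry, then its 8.3 entry.
DUMP = bytes.fromhex(
    "41 8C FF 9F FF 9B FF 78 FF 9E FF 0F 00 5A 97 FF "
    "91 FF 00 00 FF FF FF FF FF FF 00 00 FF FF FF FF "
    "EF BE 8C EF BE 9F 7E 31 20 20 20 20 00 00 00 00 "
    "00 00 00 00 00 00 88 2E 87 2B 03 00 0A 00 00 00"
)
lfn_entry, short_entry = DUMP[:32], DUMP[32:]


def lfn_name(entry):
    """Extract the UTF-16LE characters stored in one LFN entry."""
    raw = entry[1:11] + entry[14:26] + entry[28:32]  # 13 code units
    chars = []
    for unit in struct.unpack("<13H", raw):
        if unit == 0x0000:  # terminator; the rest is 0xFFFF padding
            break
        chars.append(chr(unit))
    return "".join(chars)


def short_name_checksum(name11):
    """Checksum of the 11-byte 8.3 name, as stored in each LFN entry."""
    total = 0
    for byte in name11:
        # rotate right by one bit, then add the next name byte (mod 256)
        total = (((total & 1) << 7) + (total >> 1) + byte) & 0xFF
    return total


name = lfn_name(lfn_entry)
print([hex(ord(c)) for c in name])
print(hex(short_name_checksum(short_entry[:11])), hex(lfn_entry[13]))
print(name.encode("utf-8")[:6].hex(" "), "|", short_entry[:6].hex(" "))
```

The last line compares the first six UTF-8 bytes of the long name with the first six bytes of the 8.3 name, which is where the (EF BE 8C)(EF BE 9F) prefix comes from.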
However, the 8.3 filename is munged directly from the UTF-8 bytes, and
Win95 doesn’t seem to like this.
After a lot of trial and error, it seems Win95 checks that
- the 8 of the 8.3 ends with “~1” or a similar munge indicator
- the 8.3 looks sane in the system codepage
(“chev us” (use US codepage) doesn’t cure it)
So in Win95, only those entries that happen to have a “sane” 8.3
get their LFN assigned.
- The problem seems to lie in the 8.3 generator.
As a result, most filenames show up in Win95 as the QNX-mangled 8.3, in raw UTF-8.
Some of these contain illegal characters, so Win95 can’t
stat/delete them (bad).
In the above example, [EF BE 8C EF BE 9F 7E 31:20 20 20] doesn’t look like
a valid string in codepage 932 (Japanese), so Win95 thinks
the entry doesn’t have a valid LFN. (Don’t ask me why.)
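To illustrate what “doesn’t look valid” might mean here, the sketch below splits the 8.3 bytes the way a Shift-JIS style double-byte codepage would: lead bytes 0x81-0x9F and 0xE0-0xFC take the following byte as a trail byte if it is in 0x40-0x7E or 0x80-0xFC. This is only my guess at a contributing factor, not something I confirmed in Win95, and the function name is made up.

```python
def sjis_units(data):
    """Split bytes into 1- or 2-byte units using Shift-JIS lead/trail ranges."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        is_lead = 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC
        if is_lead and i + 1 < len(data):
            trail = data[i + 1]
            if 0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC:
                units.append(data[i:i + 2])  # consumed as one double-byte char
                i += 2
                continue
        units.append(data[i:i + 1])  # single-byte char (ASCII, kana, or stray)
        i += 1
    return units


short_base = bytes.fromhex("EF BE 8C EF BE 9F 7E 31")
for unit in sjis_units(short_base):
    print(unit.hex(" "))
```

Read this way, the 0x7E (“~”) is swallowed as the trail byte of 0x9F, so the “~1” indicator is no longer visible as two separate characters.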
I don’t expect QNX to generate 8.3 names exactly as Windows does,
as this would require a codepage option in fs-dos, but at least
it would be better to have the LFN picked up.
So here’s my suggestion for a short-term fix:
- Have the 8.3 generator emit ASCII-only names.
You don’t have to convert Unicode to a DOS codepage;
just masking to 7 bits (and excluding illegal chars) is enough.
I know the (internationalized, multilingual, whatever) iso-8859-* folks
won’t like 7-bit, so you may need some 8.3 generation option anyway.
In the long term you may need a “codepage=” option for fs-dos.
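A rough sketch of the short-term idea in Python, taking “masking to 7 bits” literally: clear the top bit of each UTF-8 byte, upper-case it, and drop whatever is not legal in an 8.3 name. The function names, the always-appended “~N” tail and the exact set of allowed characters are my assumptions, not how fs-dos actually works.

```python
# Characters assumed legal in an 8.3 name (upper-case letters, digits, a few symbols).
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'()-@^_`{}~")


def mask7(text):
    """Mask each UTF-8 byte to 7 bits and keep only legal 8.3 characters."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte & 0x7F).upper()
        if ch in ALLOWED:
            out.append(ch)
    return "".join(out)


def ascii_short_name(long_name, index=1):
    """Build an 11-byte, ASCII-only 8.3 name with a ~N munge indicator."""
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension at all
        base, ext = long_name, ""
    tail = "~%d" % index
    base = mask7(base)[:8 - len(tail)] + tail
    ext = mask7(ext)[:3]
    return (base.ljust(8) + ext.ljust(3)).encode("ascii")


print(ascii_short_name("README.txt"))
print(ascii_short_name("\uff8c\uff9f\uff9b\uff78\uff9e\uff97\uff91"))
```

The generated names are ugly for non-ASCII input, but every byte stays below 0x80, so the 8.3 should look the same in any system codepage and the LFN has a chance of being picked up. Collision handling (bumping the index until the name is unique in the directory) is left out.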