Unicode directory and file manipulation functions

All the directory and file manipulation functions (mkdir, stat, etc.) appear
to take single-byte strings (or maybe UTF-8) for filenames and pathnames.
Do equivalent functions exist for Unicode filenames and pathnames? Or do I
need to convert Unicode to multibyte (UTF-8)?

Thanks
Kevin

Kevin Stallard <kevin@a.com> wrote:

> All the directory and file manipulation functions (mkdir, stat, etc.) appear
> to take single-byte strings (or maybe UTF-8) for filenames and pathnames.
> Do equivalent functions exist for Unicode filenames and pathnames? Or do I
> need to convert Unicode to multibyte (UTF-8)?

Non-authoritative:

Hm… the POSIX functions do just take simple strings, and I don’t know
of any Unicode versions. So I’d expect that, at minimum, you’d have to
convert to UTF-8.

But…

It further depends on which filesystem you are talking to. Many
filesystems restrict which characters are legal in a file name, and in
particular I’m pretty sure that the QNX 4 filesystem does so. So, even
after converting to UTF-8, what you have may quite likely be an illegal
file name.

-David

David Gibbs
QNX Training Services
dagibbs@qnx.com

Kevin Stallard wrote:

> All the directory and file manipulation functions (mkdir, stat, etc.) appear
> to take single-byte strings (or maybe UTF-8) for filenames and pathnames.
> Do equivalent functions exist for Unicode filenames and pathnames? Or do I
> need to convert Unicode to multibyte (UTF-8)?

It is best to follow the rule that they are UTF-8. Filesystems that
actually care about the encoding (e.g. fs-dos and fs-cd, which deal with
on-disk 16-bit BE wide chars) do conversions: your input filename is
parsed as UTF-8, and output filenames are constructed in that format.
See routines like wcstombs() to help you convert if you have non-UTF-8
names yourself. With filesystems like fs-qnx4, or any other naive server
that just uses str*() routines and stores the raw name bytes as provided,
UTF-8 just works too (due to properties of that encoding), even though
it is not enforced on the way in or out. Being consistent here makes
things cleaner and less confusing, especially if you throw Photon into
the mix: it uses UTF-8 for its strings, so something like pfm should
just work.
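
To make the "properties of that encoding" point concrete, here is a
minimal sketch of a UTF-8 encoder. It is hand-rolled rather than built on
wcstombs(), because the output of wcstombs() depends on the current
locale; the function names utf8_encode() and utf8_from_codepoints() are
made up for this illustration and are not part of any QNX or POSIX API.
Every byte of a multi-byte sequence has its high bit set, so a non-ASCII
character can never produce a stray '/' or NUL byte. That is why a server
that only uses str*() routines on the raw bytes is not confused by it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * 'out' must have room for 4 bytes. Returns the number of bytes
 * written, or 0 if 'cp' is not a valid code point. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                 /* ASCII: one byte, high bit clear */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* four bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* beyond the Unicode range */
}

/* Hypothetical helper: convert 'n' code points to a NUL-terminated
 * UTF-8 string in 'dst'. Returns the byte length (excluding the NUL),
 * or (size_t)-1 on an invalid code point or insufficient space. */
size_t utf8_from_codepoints(const uint32_t *cps, size_t n,
                            char *dst, size_t dstsize)
{
    size_t used = 0;

    if (dstsize == 0)
        return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t len = utf8_encode(cps[i], tmp);

        /* Always leave room for the terminating NUL. */
        if (len == 0 || used + len >= dstsize)
            return (size_t)-1;
        memcpy(dst + used, tmp, len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```

The resulting buffer is an ordinary char string, so it can be handed to
mkdir(), stat() and friends unchanged. Whether a particular filesystem
then accepts the name is a separate question, as noted earlier in the
thread.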

Summary answers: mostly, no, yes …