>> the byte 0x5C occurs as second byte of some multibyte characters. If such a
>> character is used inside a directory name, code that uses ISSLASH does not
>> work correctly. All gnulib modules that use ISSLASH are affected.
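
To make the quoted problem concrete, here is a minimal sketch in C. IS_SEP and base_offset are made-up names standing in for ISSLASH-style code; they are not the actual gnulib definitions. The example character is U+8868, which is encoded as 0x95 0x5C in CP932 (Shift_JIS) and as 0xE8 0xA1 0xA8 in UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ISSLASH-style macro on Windows:
   both '/' and '\\' count as directory separators.  */
#define IS_SEP(c) ((c) == '/' || (c) == '\\')

/* Return the byte offset of the last path component, scanning the
   string one byte at a time with no knowledge of the encoding.  */
static size_t
base_offset (const char *path)
{
  size_t base = 0;
  size_t i;

  for (i = 0; path[i] != '\0'; i++)
    if (IS_SEP (path[i]))
      base = i + 1;
  return base;
}

/* "dir\<U+8868>.txt" in UTF-8: every byte of the multibyte
   character is >= 0x80, so none can look like a separator.  */
static const char utf8_name[] = "dir\\\xE8\xA1\xA8.txt";

/* The same name in CP932: the second byte of the character is
   0x5C, which the byte-wise scan mistakes for a backslash.  */
static const char cp932_name[] = "dir\\\x95\x5C.txt";
```

With these definitions, base_offset (utf8_name) is 4, the byte right after "dir\", while base_offset (cp932_name) is 6, which points into the middle of the multibyte character.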

>Could this also be a problem on Unix systems using multibyte encoded
>(UTF-8) filesystems, if not now then in the future?

Nope. Unix kernels/filesystems don't care at all what encoding the file
names are in. Encodings are handled in userspace. The only thing that
matters is that a '/' (0x2F) or '\0' byte can't be part of a directory
entry name. I don't think this is going to change.

>Maybe some (future) Unix systems support multi-byte encoded filenames
>containing 0x3F in the second+ byte of a multi-byte character.

0x3F is '?'. You mean '/', 0x2F? I very much doubt that.

>It's probably best to choose one internal representation of pathnames
>and stick to it, but any representation other than single 'char' is a
>lot of work, as you say!

Well, UTF-8 is the one that causes the fewest problems for legacy code,
IMHO, although it is a variable-length multi-byte representation. (Every
byte of a multi-byte UTF-8 sequence has the high bit set, so it can never
be mistaken for '/' or '\\'.) As long as the code doesn't try to do things
like split file names at random points between path component separators,
or case convert single bytes, legacy code just works.

Issues that need work when porting to Windows are the obvious ones:
accepting both '/' and '\\' as directory separator and handling the
multitude of roots. On Unix leading slash(es) indicate an absolute
pathname, while on Windows it can be any of

    \
    X:\
    \\server\share\
    \\?\X:\
    \\?\UNC\server\share\

where the backslashes in most cases(?) can also be slashes. I haven't
checked whether freely mixing slashes and backslashes as in monstrosities
like //?\UNC/server\share\dir/file.foo would actually work, though.

I've never seen the \\?\ or \\?\UNC\server\share\ cases being handled in
any Open Source code, and I have certainly not bothered with them myself
either... But they are legal in the Unicode Win32 API (and in fact in a
sense they are the "canonical" way to specify absolute pathnames,
according to the docs), so if one is a perfectionist, one should.
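
For what it's worth, the list of roots above could be recognized with something like the following rough sketch. The enum and the function names (root_kind, classify_root, is_drive_root) are invented for illustration and come from no existing library. Two assumptions are baked in: the \\?\ prefix is only recognized when written with real backslashes (as far as I understand the Microsoft docs, that prefix turns off the usual conversion of '/' to '\'), and "UNC" is matched case-sensitively for simplicity.

```c
#include <ctype.h>
#include <string.h>

#define IS_SEP(c) ((c) == '/' || (c) == '\\')

/* Hypothetical classification of the root forms listed above.  */
enum root_kind
{
  ROOT_NONE,            /* relative path (including "X:foo") */
  ROOT_CURRENT_DRIVE,   /* \dir            */
  ROOT_DRIVE,           /* X:\dir          */
  ROOT_UNC,             /* \\server\share\ */
  ROOT_VERBATIM_DRIVE,  /* \\?\X:\         */
  ROOT_VERBATIM_UNC,    /* \\?\UNC\server\share\ */
  ROOT_VERBATIM_OTHER   /* any other \\?\ path */
};

/* True if P starts with a drive letter, a colon and a separator.  */
static int
is_drive_root (const char *p, int backslash_only)
{
  if (!isalpha ((unsigned char) p[0]) || p[1] != ':')
    return 0;
  return backslash_only ? p[2] == '\\' : IS_SEP (p[2]);
}

static enum root_kind
classify_root (const char *p)
{
  /* Assumption: the verbatim prefix must use backslashes only.  */
  if (strncmp (p, "\\\\?\\", 4) == 0)
    {
      const char *rest = p + 4;

      if (is_drive_root (rest, 1))
        return ROOT_VERBATIM_DRIVE;
      if (strncmp (rest, "UNC\\", 4) == 0)
        return ROOT_VERBATIM_UNC;
      return ROOT_VERBATIM_OTHER;
    }
  if (is_drive_root (p, 0))
    return ROOT_DRIVE;
  if (!IS_SEP (p[0]))
    return ROOT_NONE;
  /* Mixed forms such as //?/ are not treated specially here and
     fall through to the plain UNC case.  */
  return IS_SEP (p[1]) ? ROOT_UNC : ROOT_CURRENT_DRIVE;
}
```

This only labels the kind of root. It does not validate the server or share names, and it does not say where the root ends.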

The docs say that for normal paths the max length is 259 (drive letter,
colon, backslash, 256 chars), but if you prefix with \\?\, the Unicode
version of the API permits a path length of 32767.

>Maybe the wrapper functions could avoid converting to and from UTF-16
>if they are running on WinME and earlier.

That's what GLib does:

    int
    g_open (const gchar *filename,
            int          flags,
            int          mode)
    {
    #ifdef G_OS_WIN32
      if (G_WIN32_HAVE_WIDECHAR_API ())
        {
          wchar_t *wfilename = g_utf8_to_utf16 (filename, -1, NULL, NULL, NULL);
          int retval;
          int save_errno;

          if (wfilename == NULL)
            {
              errno = EINVAL;
              return -1;
            }

          retval = _wopen (wfilename, flags, mode);
          save_errno = errno;

          g_free (wfilename);

          errno = save_errno;
          return retval;
        }
      else
        {
          gchar *cp_filename = g_locale_from_utf8 (filename, -1, NULL, NULL, NULL);
          int retval;
          int save_errno;

          if (cp_filename == NULL)
            {
              errno = EINVAL;
              return -1;
            }

          retval = open (cp_filename, flags, mode);
          save_errno = errno;

          g_free (cp_filename);

          errno = save_errno;
          return retval;
        }
    #else
      return open (filename, flags, mode);
    #endif
    }

G_WIN32_HAVE_WIDECHAR_API() is a run-time test for NT-based Windows.
wchar_t is a short on Windows (well, the Microsoft C library to be
precise), and wchar_t strings are UTF-16. _wopen() is the wide-char
variant of open() in the C library. g_locale_from_utf8() converts to the
system codepage, which on Windows is either single-byte or
single/double-byte. (Unfortunately there is no UTF-8 codepage. Or
actually, there is (65001), but it can't be the system codepage.)

>Anyway, Windows 95/98/ME is history. Not a platform worth caring about any
>more.

I agree. Unfortunately, "customers" think differently. There are still
lots of people struggling along with Win98, and using GTK+-based software
like GIMP or GAIM. GTK+ 2.8, perhaps, won't run any longer on Win9x/ME.

Cheers,
--tml

_______________________________________________
bug-gnulib mailing list
bug-gnulib@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnulib