==Summary==
I suggest that we either decide not to support non-UTF-8 file paths in Gecko/Firefox on Gtk platforms or, failing that, use the glib facilities to convert between Unicode and file paths.
Opinions?

==Relevant bugs==
https://bugzilla.mozilla.org/show_bug.cgi?id=960957
https://bugzilla.mozilla.org/show_bug.cgi?id=978056

==Details==
On (non-OS X) *nix, file paths are byte strings, and neither the file system nor the kernel assigns any semantics to bytes with the most significant bit set. On *nix, nsIFile holds the path as a byte string, but JS strings are UTF-16. Exposing paths to JS code therefore requires converting between UTF-16 and byte strings, which in turn requires deciding what encoding to use for the byte strings.

On OS X, Gonk and Android, we assume that the encoding of the byte strings is always UTF-8. However, on Gtk platforms, except when using the OS.File API, we try to determine the encoding by calling nl_langinfo(CODESET) or by falling back onto ISO-8859-1. Most of the time, the result is UTF-8 anyway, since UTF-8 has been the default of Linux distros for years. Red Hat 8, released in 2002, defaulted to UTF-8. On the Debian side, UTF-8 has been the default since 2007. The most likely deviation from the rule is launching Gecko with the POSIX C locale, which results in the file path encoding being US-ASCII, i.e. non-ASCII paths fail.

The OS.File API assumes UTF-8 even on Gtk platforms, even when nl_langinfo(CODESET) says something other than UTF-8.

The above-described behavior differs from how glib/Gtk apps are supposed to behave. They are supposed to leave conversion between Unicode and the file path encoding to glib, which behaves as follows:

1) If the G_FILENAME_ENCODING environment variable is set, treat it as a comma-separated list and assume the first item is the file name encoding. (A special token @locale means taking the encoding from the locale.)
2) Otherwise, if the G_BROKEN_FILENAMES environment variable is set, use the encoding from the locale as the file name encoding.
3) Otherwise, use UTF-8 as the file name encoding.
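
For illustration only, the three glib rules above can be modeled as a small pure function. This is a sketch of the decision order as described in this message, not glib's actual code; the function name resolve_filename_encoding and its two parameters are hypothetical and exist only for this example.

```python
def resolve_filename_encoding(environ, locale_codeset):
    """Pick the file name encoding following the three rules listed above.

    `environ` is a mapping of environment variables and `locale_codeset`
    is the locale's character set (what nl_langinfo(CODESET) would report).
    Both are passed in so the logic can be tried without changing the
    real process environment. (Hypothetical helper, not a glib API.)
    """
    listed = environ.get("G_FILENAME_ENCODING")
    if listed:
        # Rule 1: the first item of the comma-separated list is the
        # file name encoding.
        first = listed.split(",")[0]
        # The special token @locale defers to the locale's encoding.
        return locale_codeset if first == "@locale" else first
    if "G_BROKEN_FILENAMES" in environ:
        # Rule 2: the variable merely has to be set (assumption: its
        # value is not inspected).
        return locale_codeset
    # Rule 3: default to UTF-8.
    return "UTF-8"
```

With an empty environment the function returns "UTF-8" regardless of the locale, which is the case that makes a glib/Gtk app differ from our current nl_langinfo(CODESET)-based behavior under the POSIX C locale.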

==Suggestion 1==
Since we've managed to ship with OS.File failing if the file path encoding is not UTF-8, and since UTF-8 has been the default since 2002 on the Red Hat side and since 2007 on the Debian side, let's drop support for non-UTF-8 file paths on all *nix platforms.

==Suggestion 2==
If Suggestion 1 is considered too radical and we want to keep supporting configurations that use non-UTF-8 file paths, let's at least drop some legacy code on our side and behave like a normal Gtk app:

1) Upon initialization, make nsNativeCharsetUtils convert a UTF-8 string using g_filename_to_utf8(). If the resulting string has the same bytes, set gIsNativeUTF8 to true.
2) If gIsNativeUTF8 is true, implement NS_CopyNativeToUnicode as CopyUTF8toUTF16 and implement NS_CopyUnicodeToNative as CopyUTF16toUTF8.
3) If nsNativeCharsetConverter::gIsNativeUTF8 is false, implement NS_CopyNativeToUnicode as g_filename_to_utf8() followed by CopyUTF8toUTF16 and implement NS_CopyUnicodeToNative as CopyUTF16toUTF8 followed by g_filename_from_utf8(). (This results in a UTF-8 intermediate copy, but only in the unusual case of the file system encoding not being UTF-8. Also, we get rid of a bunch of old code and unify our behavior with other Gtk apps.)
4) Make OS.File use NS_CopyNativeToUnicode and NS_CopyUnicodeToNative for converting between JS strings and file system path byte strings.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/