==Summary==

I suggest that we either decide not to support non-UTF-8 file paths in
Gecko/Firefox on Gtk platforms or, failing that, use the glib
facilities to convert between Unicode and file paths.

Opinions?

==Relevant bugs==

https://bugzilla.mozilla.org/show_bug.cgi?id=960957
https://bugzilla.mozilla.org/show_bug.cgi?id=978056

==Details==

On (non-OS X) *nix, file paths are byte strings without file system or
kernel-level semantics for the bytes with the most significant bit
set. While on *nix nsIFile holds a byte string as the path, JS
strings are UTF-16, so when paths are exposed to JS code, there's a
need to convert between UTF-16 and byte strings, which requires
deciding what encoding to use for the byte strings.
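
To make the encoding dependence concrete, here is a minimal, self-contained sketch (not Gecko code; the function names Utf16ToUtf8 and Utf16ToLatin1 are hypothetical) showing that the same UTF-16 string yields different path bytes depending on the encoding chosen:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Combine a surrogate pair into one code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;
    }
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// Hypothetical helper: encode UTF-16 as ISO-8859-1.
// Returns false if a code unit has no ISO-8859-1 representation.
bool Utf16ToLatin1(const std::u16string& in, std::string& out) {
  out.clear();
  for (char16_t unit : in) {
    if (unit > 0xFF) {
      return false;
    }
    out += static_cast<char>(unit);
  }
  return true;
}
```

For example, U+00E9 becomes the two bytes 0xC3 0xA9 under UTF-8 but the single byte 0xE9 under ISO-8859-1, so a path written under one assumption will not be found under the other.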

On OS X, Gonk and Android, we assume that the encoding of the byte
strings is always UTF-8.

However, on Gtk platforms, except when using the OS.File API, we try
to determine the encoding by calling nl_langinfo(CODESET) or by
falling back onto ISO-8859-1. Most of the time, the result is UTF-8
anyway, since UTF-8 has been the default of Linux distros for years.
Red Hat Linux 8.0, released in 2002, defaulted to UTF-8. On the Debian side,
UTF-8 has been the default since 2007. The most likely deviation from
the rule is launching Gecko with the POSIX C locale, which results in
the file path encoding being US-ASCII--i.e. non-ASCII paths fail.
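
As a rough illustration of the lookup described above, here is a minimal sketch using the POSIX nl_langinfo() call. The helper name LocalePathCharset is hypothetical, and the exact condition under which the ISO-8859-1 fallback kicks in (here: an empty or missing answer) is my assumption, not a description of the actual Gecko code:

```cpp
#include <langinfo.h>
#include <string>

// Hypothetical helper: ask the C library which charset the current
// locale uses and fall back to ISO-8859-1 if it gives no answer.
// The fallback condition is an assumption made for this sketch.
std::string LocalePathCharset() {
  const char* codeset = nl_langinfo(CODESET);
  if (!codeset || !*codeset) {
    return "ISO-8859-1";
  }
  return codeset;
}
```

Note that the result reflects the environment only after the process has called setlocale(LC_CTYPE, ""). Without that call, the process stays in the C locale, which is the case that makes non-ASCII paths fail.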

The OS.File API assumes UTF-8 on Gtk platforms too, even when
nl_langinfo(CODESET) says something other than UTF-8.

The above-described behavior differs from how glib/Gtk apps are
supposed to behave: they leave conversion between Unicode and the
file path encoding to glib, which behaves as follows:
 1) If the G_FILENAME_ENCODING environment variable is set, treat it
as a comma-separated list and assume the first item is the file name
encoding. (A special token @locale means taking the encoding from the
locale.)
 2) Otherwise, if the G_BROKEN_FILENAMES environment variable is set,
use the encoding from the locale as the file name encoding.
 3) Otherwise, use UTF-8 as the file name encoding.
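
The three rules above can be modeled as a pure function. This is a sketch of the decision order only, not glib's implementation; the name PickFilenameCharset and its parameters are hypothetical, and the caller is assumed to have read the environment variables and the locale charset already:

```cpp
#include <string>

// Hypothetical model of the rules listed above.
//   filenameEncoding:   value of G_FILENAME_ENCODING, or nullptr if unset
//   brokenFilenamesSet: whether G_BROKEN_FILENAMES is set
//   localeCharset:      charset reported by the locale
// The sketch does not trim whitespace or handle an empty variable.
std::string PickFilenameCharset(const char* filenameEncoding,
                                bool brokenFilenamesSet,
                                const std::string& localeCharset) {
  if (filenameEncoding) {
    // Rule 1: the first item of the comma-separated list wins.
    const std::string list(filenameEncoding);
    const std::string first = list.substr(0, list.find(','));
    return first == "@locale" ? localeCharset : first;
  }
  if (brokenFilenamesSet) {
    // Rule 2: take the encoding from the locale.
    return localeCharset;
  }
  // Rule 3: default to UTF-8.
  return "UTF-8";
}
```

With neither variable set, the result is UTF-8 regardless of the locale, which is the key difference from the nl_langinfo(CODESET)-based behavior described earlier.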

==Suggestion 1==

Since we've managed to ship with OS.File failing if the file path
encoding is not UTF-8 and since UTF-8 has been the default since 2002
on the Red Hat side and since 2007 on the Debian side, let's drop
support for non-UTF-8 file paths on all *nix platforms.

==Suggestion 2==

If Suggestion 1 is considered too radical and we want to keep
supporting configurations that use non-UTF-8 file paths, let's at
least drop some legacy code on our side and behave like a normal Gtk
app:
 1) Upon initialization, make nsNativeCharsetUtils convert a UTF-8
string using g_filename_to_utf8(). If the resulting string has the
same bytes, set gIsNativeUTF8 to true.
 2) If gIsNativeUTF8 is true, implement NS_CopyNativeToUnicode as
CopyUTF8toUTF16 and implement NS_CopyUnicodeToNative as
CopyUTF16toUTF8.
 3) If nsNativeCharsetConverter::gIsNativeUTF8 is false, implement
NS_CopyNativeToUnicode as g_filename_to_utf8() followed by
CopyUTF8toUTF16 and implement NS_CopyUnicodeToNative as
CopyUTF16toUTF8 followed by g_filename_from_utf8(). (This results in a
UTF-8 intermediate copy, but only in the unusual case of the file
system encoding not being UTF-8. Also, we get rid of
a bunch of old code and unify our behavior with other Gtk apps.)
 4) Make OS.File use NS_CopyNativeToUnicode and NS_CopyUnicodeToNative
for converting between JS strings and file system path byte strings.
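
Steps 1 through 3 could look roughly like the following sketch. To keep it self-contained, it does not link against glib: the FilenameToUtf8 callback stands in for g_filename_to_utf8(), and IdentityToUtf8 and Latin1ToUtf8 are hypothetical stand-ins for what glib would do with a UTF-8 and an ISO-8859-1 file name encoding. Using a non-ASCII probe string is my assumption (an ASCII probe would come back unchanged under most encodings). Error handling is omitted, although the real glib call can fail on invalid input:

```cpp
#include <functional>
#include <string>

// Stand-in for g_filename_to_utf8(): maps path bytes to UTF-8.
using FilenameToUtf8 = std::function<std::string(const std::string&)>;

// Step 1: convert a UTF-8 probe string. If the bytes come back
// unchanged, the file name encoding is treated as UTF-8.
bool ProbeIsNativeUtf8(const FilenameToUtf8& toUtf8) {
  const std::string probe = "\xC3\xA9";  // U+00E9 encoded as UTF-8
  return toUtf8(probe) == probe;
}

// Hypothetical converter for a UTF-8 file name encoding.
std::string IdentityToUtf8(const std::string& bytes) { return bytes; }

// Hypothetical converter for an ISO-8859-1 file name encoding.
std::string Latin1ToUtf8(const std::string& bytes) {
  std::string out;
  for (unsigned char c : bytes) {
    if (c < 0x80) {
      out += static_cast<char>(c);
    } else {
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}

// Steps 2 and 3, native-to-UTF-8 half: skip the converter entirely
// when the probe said the native encoding is UTF-8. The following
// UTF-8 to UTF-16 stage (CopyUTF8toUTF16) is not modeled here.
std::string NativeToUtf8(const std::string& native, bool isNativeUtf8,
                         const FilenameToUtf8& toUtf8) {
  return isNativeUtf8 ? native : toUtf8(native);
}
```

The point of the probe is that the common UTF-8 case pays for the glib conversion only once, at initialization, and never allocates the intermediate copy afterwards.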

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
