I intentionally ignored non-UTF-8 UNIX locales because our support for those locales is already half-broken and almost nobody cares about that. For example, OS.File assumes that the filesystem encoding is always UTF-8 on UNIX, while nsIFile does not. This discrepancy caused a bug[1] that did not get much attention.
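To make the discrepancy concrete, here is a small Python sketch. It is only an analogy, not OS.File or nsIFile code. `decode_native` and `round_trips` are hypothetical helpers, and the EUC-JP filename is an assumed example. Decoding the native bytes with the locale's codeset (roughly what nsIFile does when it builds `path`) succeeds. Assuming UTF-8 (roughly what OS.File does) fails outright. A Latin-1 locale yields yet another `path` for the same file. A real program would get the codeset from nl_langinfo(CODESET); here it is hard-coded.

```python
from typing import Optional


def decode_native(native: bytes, codeset: str) -> Optional[str]:
    """Hypothetical helper: interpret native filename bytes using `codeset`.

    Returns None when the bytes are not valid in that encoding, i.e. when
    no human-readable string exists for these bytes under that codeset.
    """
    try:
        return native.decode(codeset)
    except UnicodeDecodeError:
        return None


def round_trips(native: bytes, codeset: str) -> bool:
    """Hypothetical helper: True if decoding and re-encoding restores the exact bytes."""
    text = decode_native(native, codeset)
    return text is not None and text.encode(codeset) == native


# Assumed example: a filename created by a program running in an EUC-JP locale.
native = "日本.txt".encode("euc_jp")

# Interpret the same bytes under three assumptions about the filesystem
# encoding. The `!a` conversion escapes non-ASCII characters so the output
# prints safely whatever the terminal encoding is.
for codeset in ("euc_jp", "utf-8", "latin-1"):
    text = decode_native(native, codeset)
    print(f"{codeset:8} path={text!a} round-trips={round_trips(native, codeset)}")
```

Latin-1 maps every byte to a character, so it never fails and always round-trips, but the name it shows differs from what an EUC-JP user would see. The UTF-8 assumption, by contrast, has no string to offer at all for these bytes.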
I think it's time to stop pretending to support non-UTF-8 UNIX locales.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1342659

On 2017/11/30 7:09, Karl Tomlinson wrote:
> I've always found this confusing, and so I'll write down the
> understanding I've reached, in the hope that either it will help
> others, or others can help me by correcting if these are
> misunderstandings.
>
> On Unix systems:
>
> `nativePath`
>
> contains the bytes corresponding to the native filename used
> by native system calls.
>
> `path`
>
> is a UTF-16 encoding of an attempt to provide a human
> readable version of the native filename. This involves
> interpreting native bytes according to the character encoding
> specified by the current locale of the application as
> indicated by nl_langinfo(CODESET).
>
> For different locales, the same file can have a different
> `path`.
>
> The native bytes may not be valid UTF-8, and so if the
> character encoding is UTF-8, then there may not be a valid
> `path` that can be encoded to produce the same `nativePath`.
>
> It is best to use `nativePath` for working with filenames,
> including conversion to URI, but use `path` when displaying
> names in the UI.
>
> On WINNT systems:
>
> `path`
>
> contains wide characters corresponding to the native filename
> used by native wide character system APIs. For at least most
> configurations, I assume wide characters are UTF-16, in which
> case this is also human readable.
>
> `nativePath`
>
> is an attempt to represent the native filename in the native
> multibyte character encoding specified by the current locale
> of the application.
>
> For different locales, I assume the same file can have a
> different `nativePath`.
>
> I assume there is not necessarily a valid multibyte character
> encoding, and so there may not be a valid `nativePath` that
> can be decoded to produce the same `path`.
>
> It is best to use `path` for working with filenames.
> Conversion to URI involves assuming `path` is UTF-16 and
> converting to UTF-8.
>
> The parameters mean very different things on different systems,
> and so it is not generally possible to write XP code with either
> of these, but Gecko attempts to do so anyway.
>
> The numbers of applications not using UTF-8 and filenames not
> valid UTF-8 are much smaller on Unix systems than the numbers of
> applications not using UTF-8 and non-ASCII filenames on WINNT
> systems, and so choosing to work with `path` provides more
> compatibility than working with `nativePath`.
> _______________________________________________
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform