I intentionally ignored non-UTF-8 UNIX locales because our support for
those locales is already half-broken and almost nobody cares about that.
For example, OS.File assumes that the filesystem encoding is always
UTF-8 on UNIX while nsIFile does not. This discrepancy caused a bug[1]
that did not get much attention.
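
A minimal Python sketch of this kind of mismatch, not Gecko code: the
helper names and the sample filename are hypothetical, and EUC-JP is
only an assumed stand-in for "some non-UTF-8 locale". It decodes the
same native filename bytes once as if they were always UTF-8 and once
with the locale's codeset.

```python
# Hypothetical filename as a EUC-JP locale would store it on disk.
native_path = "日本語.txt".encode("euc_jp")


def decode_assuming_utf8(raw: bytes) -> str:
    """Decode the way an API that always assumes UTF-8 would (assumption)."""
    return raw.decode("utf-8", errors="replace")


def decode_with_codeset(raw: bytes, codeset: str) -> str:
    """Decode with the locale's codeset.

    In a real program the codeset would come from
    locale.nl_langinfo(locale.CODESET); it is passed in explicitly here
    so the sketch does not depend on the machine's locale.
    """
    return raw.decode(codeset, errors="replace")


as_utf8 = decode_assuming_utf8(native_path)
as_locale = decode_with_codeset(native_path, "euc_jp")

# The two interpretations disagree about the name of the same file.
print(repr(as_utf8))
print(repr(as_locale))

# Re-encoding the UTF-8 interpretation does not give back the original
# bytes, so a path built this way cannot be used to reopen the file.
print(as_utf8.encode("utf-8") == native_path)
```

Two code paths that disagree like this will name the same file
differently, and only one of them can find it again.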

I think it's time to stop pretending to support non-UTF-8 UNIX locales.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1342659

On 2017/11/30 7:09, Karl Tomlinson wrote:
> I've always found this confusing, and so I'll write down the
> understanding I've reached, in the hope that either it will help
> others, or others can help me by correcting if these are
> misunderstandings.
> 
> On Unix systems:
> 
>   `nativePath`
> 
>      contains the bytes corresponding to the native filename used
>      by native system calls.
> 
>   `path`
> 
>      is a UTF-16 encoding of an attempt to provide a human
>      readable version of the native filename.  This involves
>      interpreting native bytes according to the character encoding
>      specified by the current locale of the application as
>      indicated by nl_langinfo(CODESET).
> 
>      For different locales, the same file can have a different
>      `path`.
> 
>      The native bytes may not be valid UTF-8, and so if the
>      character encoding is UTF-8, then there may not be a valid
>      `path` that can be encoded to produce the same `nativePath`.
> 
>   It is best to use `nativePath` for working with filenames,
>   including conversion to URI, but use `path` when displaying
>   names in the UI.
> 
> On WINNT systems:
> 
>   `path`
> 
>      contains wide characters corresponding to the native filename
>      used by native wide character system APIs.  For at least most
>      configurations, I assume wide characters are UTF-16, in which
>      case this is also human readable.
> 
>   `nativePath`
> 
>      is an attempt to represent the native filename in the native
>      multibyte character encoding specified by the current locale
>      of the application.
> 
>      For different locales, I assume the same file can have a
>      different `nativePath`.
> 
>      I assume there is not necessarily a valid multibyte character
>      encoding, and so there may not be a valid `nativePath` that
>      can be decoded to produce the same `path`.
> 
>   It is best to use `path` for working with filenames.
>   Conversion to URI involves assuming `path` is UTF-16 and
>   converting to UTF-8.
> 
> The parameters mean very different things on different systems,
> and so it is not generally possible to write XP code with either
> of these, but Gecko attempts to do so anyway.
> 
> Applications not using UTF-8 and filenames that are not valid
> UTF-8 are much rarer on Unix systems than applications not using
> UTF-8 and non-ASCII filenames are on WINNT systems, and so
> choosing to work with `path` provides more compatibility than
> working with `nativePath`.
> _______________________________________________
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
> 
