I don't think we have any data on how many users run with a non-UTF-8
locale such as ja_JP.eucJP (or the C locale).  We should collect that
data via telemetry.
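
A minimal sketch, assuming a Python client, of how the locale codeset
could be bucketed before being reported.  It reads the same
nl_langinfo(CODESET) value that Karl describes below.
`classify_codeset`, `current_codeset_category` and the category names
are hypothetical and are not an existing Firefox telemetry probe.
locale.nl_langinfo is only available on Unix.

```python
import locale


def classify_codeset(codeset: str) -> str:
    """Bucket a CODESET name into a coarse category for reporting.

    Hypothetical helper; the category names are illustrative only.
    """
    normalized = codeset.replace("-", "").replace("_", "").lower()
    if normalized == "utf8":
        return "utf-8"
    # glibc reports the C/POSIX locale's codeset as "ANSI_X3.4-1968";
    # other C libraries may report "US-ASCII" or "646" instead.
    if normalized in ("ansix3.41968", "usascii", "ascii", "646"):
        return "ascii"
    return "other"  # e.g. "EUC-JP" for ja_JP.eucJP


def current_codeset_category() -> str:
    # Adopt the user's environment locale, as an application does at
    # startup, then classify its codeset (Unix only).
    locale.setlocale(locale.LC_CTYPE, "")
    return classify_codeset(locale.nl_langinfo(locale.CODESET))
```

Calling current_codeset_category() under LANG=ja_JP.eucJP would
report "other", which is the population we have no data on.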

-- Makoto

On Thu, Nov 30, 2017 at 9:02 PM, Masatoshi Kimura <vyv03...@nifty.ne.jp> wrote:
> I intentionally ignored non-UTF-8 UNIX locales because our support for
> those locales is already half-broken and almost nobody cares about that.
> For example, OS.File assumes that the filesystem encoding is always
> UTF-8 on UNIX while nsIFile does not. This discrepancy caused a bug[1]
> that did not get much attention.
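
The kind of discrepancy described here can be modeled in a few lines.
Both functions are hypothetical stand-ins, not OS.File or nsIFile code,
and EUC-JP is assumed as the locale codeset.  The same Unicode name
turns into different native bytes, so the two code paths would refer
to different files on disk.

```python
def encode_like_os_file(path: str) -> bytes:
    # Models the assumption attributed to OS.File: the filesystem
    # encoding is always UTF-8 on Unix.
    return path.encode("utf-8")


def encode_like_nsifile(path: str, codeset: str) -> bytes:
    # Models nsIFile on Unix: encode with the locale's CODESET.
    return path.encode(codeset)


name = "\u3042.txt"  # HIRAGANA LETTER A followed by ".txt"
print(encode_like_os_file(name))              # b'\xe3\x81\x82.txt'
print(encode_like_nsifile(name, "euc_jp"))    # b'\xa4\xa2.txt'
print(encode_like_nsifile(name, "utf-8") == encode_like_os_file(name))  # True
```

Under a UTF-8 locale the two agree, which is why the mismatch only
shows up for users of non-UTF-8 locales.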
>
> I think it's time to stop pretending to support non-UTF-8 UNIX locales.
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1342659
>
> On 2017/11/30 7:09, Karl Tomlinson wrote:
>> I've always found this confusing, so I'll write down the
>> understanding I've reached, in the hope that either it will help
>> others, or others can help me by correcting any
>> misunderstandings.
>>
>> On Unix systems:
>>
>>   `nativePath`
>>
>>      contains the bytes corresponding to the native filename used
>>      by native system calls.
>>
>>   `path`
>>
>>      is a UTF-16 encoding of an attempt to provide a human
>>      readable version of the native filename.  This involves
>>      interpreting native bytes according to the character encoding
>>      specified by the current locale of the application as
>>      indicated by nl_langinfo(CODESET).
>>
>>      For different locales, the same file can have a different
>>      `path`.
>>
>>      The native bytes may not be valid UTF-8, and so if the
>>      character encoding is UTF-8, then there may not be a valid
>>      `path` that can be encoded to produce the same `nativePath`.
>>
>>   It is best to use `nativePath` for working with filenames,
>>   including conversion to URI, but use `path` when displaying
>>   names in the UI.
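
A minimal Python model of the Unix behaviour described above; Gecko
does this in C++, and `display_path` and `native_to_uri_path` are
hypothetical names.  The same native bytes give a different `path`
under each assumed locale codeset (EUC-JP, Latin-1), and no valid
`path` at all under UTF-8, while percent-encoding the raw bytes gives
a URI that identifies the file regardless of locale.

```python
from typing import Optional
from urllib.parse import quote


def display_path(native: bytes, codeset: str) -> Optional[str]:
    """Decode native filename bytes the way `path` is described on Unix.

    Returns None when the bytes are not valid in `codeset`, i.e. when no
    `path` exists that encodes back to the same `nativePath`.
    """
    try:
        return native.decode(codeset)
    except UnicodeDecodeError:
        return None


def native_to_uri_path(native: bytes) -> str:
    # Percent-encode the raw bytes ('/' is left unescaped by default),
    # so the result does not depend on the locale.
    return quote(native)


native = b"/tmp/\xa4\xa2.txt"  # EUC-JP bytes for HIRAGANA LETTER A
print(display_path(native, "euc_jp"))     # /tmp/あ.txt
print(display_path(native, "iso8859_1"))  # /tmp/¤¢.txt
print(display_path(native, "utf-8"))      # None
print(native_to_uri_path(native))         # /tmp/%A4%A2.txt
```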
>>
>> On WINNT systems:
>>
>>   `path`
>>
>>      contains wide characters corresponding to the native filename
>>      used by native wide-character system APIs.  For most
>>      configurations at least, I assume wide characters are UTF-16,
>>      in which case this is also human readable.
>>
>>   `nativePath`
>>
>>      is an attempt to represent the native filename in the native
>>      multibyte character encoding specified by the current locale
>>      of the application.
>>
>>      For different locales, I assume the same file can have a
>>      different `nativePath`.
>>
>>      I assume a filename does not necessarily have a representation
>>      in the multibyte character encoding, and so there may not be a
>>      valid `nativePath` that can be decoded to produce the same `path`.
>>
>>   It is best to use `path` for working with filenames.
>>   Conversion to URI involves assuming `path` is UTF-16 and
>>   converting to UTF-8.
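
The same idea modeled for WINNT, again as a Python sketch with
hypothetical names.  A Python str stands in for the UTF-16 `path`,
and the locale's multibyte encoding is assumed to be a Windows code
page such as cp932 (Japanese) or cp1252 (Western European).  A
character missing from the code page leaves no valid `nativePath`,
while the URI form always exists because it goes through UTF-8.

```python
from typing import Optional
from urllib.parse import quote


def native_path(path: str, codepage: str) -> Optional[bytes]:
    """Model of `nativePath` on WINNT.

    Returns None when some character of `path` has no representation
    in `codepage`, i.e. when no valid `nativePath` exists.
    """
    try:
        return path.encode(codepage)
    except UnicodeEncodeError:
        return None


def path_to_uri_path(path: str) -> str:
    # Treat `path` as Unicode text, convert it to UTF-8, then
    # percent-encode the resulting bytes for use in a URI.
    return quote(path.encode("utf-8"))


name = "\u3042.txt"  # HIRAGANA LETTER A followed by ".txt"
print(native_path(name, "cp932"))   # b'\x82\xa0.txt'
print(native_path(name, "cp1252"))  # None
print(path_to_uri_path(name))       # %E3%81%82.txt
```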
>>
>> These attributes mean very different things on different systems,
>> and so it is not generally possible to write cross-platform (XP)
>> code with either of them, but Gecko attempts to do so anyway.
>>
>> Applications not using UTF-8 and filenames that are not valid
>> UTF-8 are much less common on Unix systems than applications not
>> using UTF-8 and non-ASCII filenames are on WINNT systems, so
>> choosing to work with `path` provides more compatibility than
>> working with `nativePath`.
>> _______________________________________________
>> dev-platform mailing list
>> dev-platform@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-platform
>>