Re: Do we actually have use cases for rejecting non-characters in UTF-8ness check?

Henri Sivonen Fri, 17 Mar 2017 04:12:57 -0700

On Fri, Mar 17, 2017 at 12:12 PM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Fri, Mar 17, 2017 at 11:00 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
>> Our IsUTF8() by default rejects strings that contain code points whose
>> lowest 16 bits are 0xFFFE or 0xFFFF.
>>
>> Do we actually have use cases for rejecting such strings in UTF-8ness checks?
>
> I'm not aware of any web-observable feature that would need that.


Thanks.

> The
> only places I know of that do something with non-characters are URLs
> and HTML, which exclude them for validity purposes, but there's no
> browser API necessarily affected by that and they wouldn't use an
> IsUTF8() code path. Are there too many callers to examine the
> implications?

There aren't many callers, but they involve protocols and formats whose
quirks I don't know in enough detail:
https://searchfox.org/mozilla-central/search?q=symbol:_Z6IsUTF8RK10nsACStringb&redirect=false

As a matter of API design, I disapprove of a method called IsUTF8
doing something other than a pure UTF-8ness check. For example, the
reason it now has an option to opt out of the non-character rejection
quirk is that WebSocket code used the function for what its name says,
and the hidden non-character rejection made that a bug. Instead of
changing the semantics to match the name for everyone, an opt-out was
introduced for callers in WebSocket code.
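
To make the distinction concrete, here is a minimal sketch of what I
mean by keeping the two checks apart. It is not the Gecko
implementation: the names IsWellFormedUtf8, ContainsXFFFEOrXFFFF and
DecodeOne are hypothetical, and it assumes C++17 for std::string_view.
The first function is a pure UTF-8ness check; the second is the quirk,
spelled out as a separate predicate that a caller has to ask for by
name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not a Gecko API): decodes the UTF-8 sequence that
// starts at s[i]. On success it stores the code point in cp, advances i
// past the sequence, and returns true. Requires i < s.size().
static bool DecodeOne(std::string_view s, std::size_t& i, std::uint32_t& cp) {
  const auto b0 = static_cast<unsigned char>(s[i]);
  if (b0 < 0x80) {  // ASCII
    cp = b0;
    ++i;
    return true;
  }
  std::size_t len = 0;
  std::uint32_t minCp = 0;  // smallest code point allowed for this length
  if ((b0 & 0xE0) == 0xC0) {
    len = 2; minCp = 0x80; cp = b0 & 0x1F;
  } else if ((b0 & 0xF0) == 0xE0) {
    len = 3; minCp = 0x800; cp = b0 & 0x0F;
  } else if ((b0 & 0xF8) == 0xF0) {
    len = 4; minCp = 0x10000; cp = b0 & 0x07;
  } else {
    return false;  // stray continuation byte or invalid lead byte
  }
  if (s.size() - i < len) return false;  // truncated sequence
  for (std::size_t k = 1; k < len; ++k) {
    const auto b = static_cast<unsigned char>(s[i + k]);
    if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
    cp = (cp << 6) | (b & 0x3F);
  }
  if (cp < minCp) return false;                    // overlong encoding
  if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
  if (cp > 0x10FFFF) return false;                 // beyond Unicode range
  i += len;
  return true;
}

// Pure UTF-8ness check: well-formedness only, non-characters are accepted.
bool IsWellFormedUtf8(std::string_view s) {
  std::size_t i = 0;
  std::uint32_t cp = 0;
  while (i < s.size()) {
    if (!DecodeOne(s, i, cp)) return false;
  }
  return true;
}

// The quirk as its own predicate: true if some code point has 0xFFFE or
// 0xFFFF as its lowest 16 bits. Stops and returns false at the first
// malformed sequence. It deliberately covers only the code points
// discussed in this thread, not U+FDD0..U+FDEF.
bool ContainsXFFFEOrXFFFF(std::string_view s) {
  std::size_t i = 0;
  std::uint32_t cp = 0;
  while (i < s.size()) {
    if (!DecodeOne(s, i, cp)) return false;
    if ((cp & 0xFFFE) == 0xFFFE) return true;
  }
  return false;
}
```

With this split, a caller that wants today's default behavior would
write IsWellFormedUtf8(s) && !ContainsXFFFEOrXFFFF(s), and a caller
that wants what the name IsUTF8 promises would call only the first
function.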

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
