On Fri, Mar 17, 2017 at 12:12 PM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Fri, Mar 17, 2017 at 11:00 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
>> Our IsUTF8() by default rejects strings that contain code points whose
>> lowest 16 bits are 0xFFFE or 0xFFFF.
>>
>> Do we actually have use cases for rejecting such strings in UTF-8ness checks?
>
> I'm not aware of any web-observable feature that would need that.

Thanks.

> The
> only places I know of that do something with non-characters are URLs
> and HTML, which exclude them for validity purposes, but there's no
> browser API necessarily affected by that and they wouldn't use a
> IsUTF8() code path. Are there too many callers to examine the
> implications?

The callers aren't many, but they involve protocols and formats that
I'm not familiar with on the quirk level of detail:
https://searchfox.org/mozilla-central/search?q=symbol:_Z6IsUTF8RK10nsACStringb&redirect=false

As a matter of API design, I disapprove of a method called IsUTF8
doing something other than a pure UTF-8ness check. For example, the
reason why it now has the option to opt out of the non-character
rejection quirk is that Web Socket code used the function for what its
name says and that was a bug. Instead of changing the semantics to
match the name for everyone, an opt-out was introduced for callers in
Web Socket code.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
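
To make the distinction above concrete, here is a minimal sketch of a UTF-8 validity check with the non-character rejection as an explicit flag. This is not the Gecko implementation: the name IsWellFormedUtf8 and the aRejectNonChar parameter are hypothetical, and the only rule taken from the message is "code points whose lowest 16 bits are 0xFFFE or 0xFFFF". With the flag off, the function is a pure UTF-8ness check.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch, not the actual Gecko IsUTF8().
// Returns true if aString is well-formed UTF-8. If aRejectNonChar is true,
// it additionally rejects code points whose lowest 16 bits are 0xFFFE or
// 0xFFFF (the quirk discussed in this thread).
bool IsWellFormedUtf8(const std::string& aString, bool aRejectNonChar) {
  const unsigned char* bytes =
      reinterpret_cast<const unsigned char*>(aString.data());
  const std::size_t length = aString.size();
  std::size_t i = 0;

  while (i < length) {
    const unsigned char lead = bytes[i];
    if (lead < 0x80) {  // ASCII: single byte
      ++i;
      continue;
    }

    std::size_t seqLen;
    std::uint32_t codePoint;
    std::uint32_t minValue;  // smallest code point allowed for this length
    if (lead >= 0xC2 && lead <= 0xDF) {
      seqLen = 2; codePoint = lead & 0x1F; minValue = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      seqLen = 3; codePoint = lead & 0x0F; minValue = 0x800;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      seqLen = 4; codePoint = lead & 0x07; minValue = 0x10000;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }

    if (length - i < seqLen) {
      return false;  // truncated sequence
    }
    for (std::size_t k = 1; k < seqLen; ++k) {
      const unsigned char cont = bytes[i + k];
      if ((cont & 0xC0) != 0x80) {
        return false;  // not a continuation byte
      }
      codePoint = (codePoint << 6) | (cont & 0x3F);
    }

    if (codePoint < minValue) return false;                       // overlong
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false; // surrogate
    if (codePoint > 0x10FFFF) return false;                       // out of range

    // The quirk: low 16 bits equal to 0xFFFE or 0xFFFF.
    if (aRejectNonChar && (codePoint & 0xFFFE) == 0xFFFE) {
      return false;
    }
    i += seqLen;
  }
  return true;
}
```

In this sketch, the bytes EF BF BF (U+FFFF) pass when aRejectNonChar is false and fail when it is true, which is the behavior difference a caller such as the Web Socket code would trip over if the stricter mode were the default.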