Re: Use cases for invalid-Unicode atoms

2018-03-21 Henri Sivonen
> I'll go ahead with doing WHATWG-compliant "with replacement" when
> trying to atomize invalid UTF-8 (which shouldn't happen anyway).
For code archeologists: Don't believe what I said upthread about non-atomic empty-string atoms. My code reading was evidently wrong considering that we have test c…
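Decoding UTF-8 "with replacement" means every malformed byte sequence becomes U+FFFD REPLACEMENT CHARACTER instead of causing an error, so atomizing bytes can no longer fail. A minimal Rust sketch of that idea; `atom_key_with_replacement` is a hypothetical name, not Gecko's atom API, and it relies on the standard library's `String::from_utf8_lossy`, whose substitution pattern (one U+FFFD per maximal invalid subsequence) is assumed here to match the WHATWG decoder's.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not Gecko code): produces the string an atom would
/// be keyed by, decoding the input as UTF-8 "with replacement".
/// Assumption: from_utf8_lossy's U+FFFD substitution matches the WHATWG
/// Encoding Standard's UTF-8 decoder.
fn atom_key_with_replacement(bytes: &[u8]) -> Cow<'_, str> {
    // Valid UTF-8 is borrowed unchanged; a new String is allocated only
    // when at least one malformed sequence has to be replaced.
    String::from_utf8_lossy(bytes)
}
```

Because valid input comes back as `Cow::Borrowed`, the expected common case (input that is already valid UTF-8) costs only the validation pass.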

Re: Use cases for invalid-Unicode atoms

2018-03-21 Henri Sivonen
On Wed, Mar 21, 2018 at 11:40 AM, Anne van Kesteren wrote:
> On Wed, Mar 21, 2018 at 10:27 AM, Henri Sivonen wrote:
>> * A bunch of things atomicize URL components. I'd hope that the URLs
>> were converted from UTF-16 to UTF-8 at some prior point ensuring UTF-8
>> validity, but it's hard to be s…

Re: Use cases for invalid-Unicode atoms

2018-03-21 Anne van Kesteren
On Wed, Mar 21, 2018 at 10:27 AM, Henri Sivonen wrote:
> * A bunch of things atomicize URL components. I'd hope that the URLs
> were converted from UTF-16 to UTF-8 at some prior point ensuring UTF-8
> validity, but it's hard to be sure.
At least per the specification all URL components end up wi…
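The hope above is that a UTF-16 to UTF-8 conversion upstream guarantees valid UTF-8 by the time a URL component is atomized. That guarantee holds only if the conversion repairs ill-formed UTF-16 (unpaired surrogates) rather than passing it through. A minimal Rust sketch of the two options using the standard library's `String::from_utf16_lossy` and `String::from_utf16`; both function names are hypothetical and do not correspond to Gecko's URL code.

```rust
/// Hypothetical helper: converts a possibly ill-formed UTF-16 URL component
/// to UTF-8. Unpaired surrogates become U+FFFD, so the returned String is
/// valid UTF-8 whatever the input was.
fn url_component_to_utf8(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Hypothetical strict variant: reports ill-formed UTF-16 (returns None)
/// instead of repairing it.
fn url_component_to_utf8_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}
```

With the lossy variant, any atom created from its output is a valid-UTF-8 key by construction; the strict variant pushes the decision about bad input back to the caller.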

Re: Use cases for invalid-Unicode atoms

2018-03-21 Henri Sivonen
On Tue, Mar 20, 2018 at 8:22 PM, Steve Fink wrote:
> On 03/20/2018 06:49 AM, Henri Sivonen wrote:
>>
>> On Tue, Mar 20, 2018 at 12:44 PM, Henri Sivonen
>> wrote:
>>>
>>> On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen
>>> wrote: OK. I'll leave the UTF-16 case unchanged and will make th…

Re: Use cases for invalid-Unicode atoms

2018-03-20 Steve Fink
On 03/20/2018 06:49 AM, Henri Sivonen wrote:
On Tue, Mar 20, 2018 at 12:44 PM, Henri Sivonen wrote:
On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen wrote:
OK. I'll leave the UTF-16 case unchanged and will make the minimal changes on the UTF-8 side to retain the existing outward behavior witho…

Re: Use cases for invalid-Unicode atoms

2018-03-20 Henri Sivonen
On Tue, Mar 20, 2018 at 12:44 PM, Henri Sivonen wrote:
> On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen wrote:
>> OK. I'll leave the UTF-16 case unchanged and will make the minimal
>> changes on the UTF-8 side to retain the existing outward behavior
>> without burning the tree. Hopefully I can m…

Re: Use cases for invalid-Unicode atoms

2018-03-20 Henri Sivonen
On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen wrote:
> OK. I'll leave the UTF-16 case unchanged and will make the minimal
> changes on the UTF-8 side to retain the existing outward behavior
> without burning the tree. Hopefully I can make the UTF-8 case faster
> while at it.
It depended on not-s…

Re: Use cases for invalid-Unicode atoms

2018-03-20 Henri Sivonen
On Tue, Mar 20, 2018 at 11:00 AM, smaug wrote:
> On 03/19/2018 09:30 PM, Kris Maglione wrote:
>>
>> On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
>>>
>>> It appears that currently we allow atomicizing invalid UTF-16 strings,
>>> which are impossible to look up by UTF-8 key, and we d…

Re: Use cases for invalid-Unicode atoms

2018-03-20 smaug
On 03/19/2018 09:30 PM, Kris Maglione wrote:
On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
It appears that currently we allow atomicizing invalid UTF-16 strings, which are impossible to look up by UTF-8 key, and we don't allow atomicizing invalid UTF-8. I need to change some thin…

Re: Use cases for invalid-Unicode atoms

2018-03-19 Kris Maglione
On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
It appears that currently we allow atomicizing invalid UTF-16 strings, which are impossible to look up by UTF-8 key, and we don't allow atomicizing invalid UTF-8. I need to change some things in this area in response to changing error…
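The asymmetry described in the opening message can be shown with a toy model. A minimal Rust sketch, assuming for illustration an atom table whose keys are stored as UTF-16 code units, with a UTF-16 entry point that accepts anything, a UTF-8 entry point that refuses invalid input, and a lookup by UTF-8 key; `AtomTable` and its methods are hypothetical and are not Gecko's nsAtomTable. Since every Rust `&str` is valid UTF-8, its UTF-16 encoding never contains a lone surrogate, so an atom created from ill-formed UTF-16 is unreachable by UTF-8 key.

```rust
use std::collections::HashSet;

/// Toy atom table (hypothetical, not Gecko code). Keys are stored as
/// UTF-16 code units, so ill-formed UTF-16 can be inserted.
struct AtomTable {
    atoms: HashSet<Vec<u16>>,
}

impl AtomTable {
    fn new() -> Self {
        AtomTable { atoms: HashSet::new() }
    }

    /// UTF-16 entry point: accepts any code units, lone surrogates included.
    fn atomize_utf16(&mut self, units: &[u16]) {
        self.atoms.insert(units.to_vec());
    }

    /// UTF-8 entry point: invalid UTF-8 is refused (returns false) rather
    /// than atomized, mirroring the behavior described in the thread.
    fn atomize_utf8(&mut self, bytes: &[u8]) -> bool {
        match std::str::from_utf8(bytes) {
            Ok(s) => {
                self.atoms.insert(s.encode_utf16().collect());
                true
            }
            Err(_) => false,
        }
    }

    /// Lookup by UTF-8 key. Encoding a valid &str to UTF-16 never yields a
    /// lone surrogate, so atoms made from ill-formed UTF-16 never match.
    fn contains_utf8(&self, key: &str) -> bool {
        let units: Vec<u16> = key.encode_utf16().collect();
        self.atoms.contains(&units)
    }
}
```

In this model, an atom holding `"a"` followed by an unpaired 0xD800 stays in the table, but no UTF-8 key (not `"a"`, not `"a\u{FFFD}"`) can ever find it.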