Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Henri Sivonen
On Sun, Oct 5, 2014 at 7:26 PM, Simon Sapin wrote:
> JavaScript strings, however, can. (They are effectively potentially
> ill-formed UTF-16.) It’s possible (?) that the Web depends on these
> surrogates being preserved.

It's clear that JS programs depend on being able to hold unpaired surrogates
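To illustrate what preserving unpaired surrogates would involve, here is a minimal sketch of converting potentially ill-formed UTF-16 to WTF-8 bytes. The function name `utf16_to_wtf8` is hypothetical and is not taken from Servo or from any message in this thread. The sketch assumes the usual WTF-8 rule: paired surrogates become one 4-byte sequence, and a lone surrogate is written as the 3-byte sequence that generalized UTF-8 would give it.

```rust
/// Hypothetical helper: encode potentially ill-formed UTF-16 as WTF-8 bytes.
/// Well-formed input produces ordinary UTF-8; an unpaired surrogate is kept
/// as a 3-byte sequence instead of being rejected or replaced.
fn utf16_to_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    for item in char::decode_utf16(units.iter().copied()) {
        match item {
            // A scalar value (including a correctly paired surrogate pair):
            // encode it exactly as UTF-8 would.
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            // An unpaired surrogate (U+D800..=U+DFFF): emit the 3-byte form
            // 1110xxxx 10xxxxxx 10xxxxxx, which strict UTF-8 forbids.
            Err(e) => {
                let cp = e.unpaired_surrogate() as u32;
                out.push(0xE0 | (cp >> 12) as u8);
                out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
        }
    }
    out
}
```

Under these assumptions, `utf16_to_wtf8(&[0x0061, 0xD800])` keeps the lone surrogate as the bytes `ED A0 80` after the `61` for "a". A strict UTF-8 validator such as `std::str::from_utf8` rejects that output, which is why the result has to be treated as WTF-8 rather than as a Rust `str`.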

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Boris Zbarsky
On 10/5/14, 7:51 PM, Cameron Zwarich wrote:
> Are there any plans to eliminate the copies in Gecko?

No. Measurement showed that in practice the cost of copying short strings, which most of these are, is very low. For large strings you do end up having to copy, but keep in mind that Gecko used

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
On Oct 5, 2014, at 3:08 PM, Boris Zbarsky wrote:
> On 10/5/14, 2:27 PM, Cameron Zwarich wrote:
>> I am opposed to anything that requires string copies between the DOM and JS
>
> The only way to do that with SpiderMonkey in its current state is to use
> JSString for your string type. You cannot

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
On Oct 5, 2014, at 3:13 PM, Patrick Walton wrote:
> On 10/5/14 3:08 PM, Boris Zbarsky wrote:
>> On 10/5/14, 2:27 PM, Cameron Zwarich wrote:
>>> I am opposed to anything that requires string copies between the DOM
>>> and JS
>>
>> The only way to do that with SpiderMonkey in its current state is

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Patrick Walton
On 10/5/14 3:08 PM, Boris Zbarsky wrote:
> On 10/5/14, 2:27 PM, Cameron Zwarich wrote:
>> I am opposed to anything that requires string copies between the DOM and JS
>
> The only way to do that with SpiderMonkey in its current state is to use
> JSString for your string type. You cannot safely grab the c

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Boris Zbarsky
On 10/5/14, 2:27 PM, Cameron Zwarich wrote:
> I am opposed to anything that requires string copies between the DOM and JS

The only way to do that with SpiderMonkey in its current state is to use JSString for your string type. You cannot safely grab the chars from a SpiderMonkey string and hold

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
On Oct 5, 2014, at 2:05 PM, Ms2ger wrote:
> On 10/05/2014 08:27 PM, Cameron Zwarich wrote:
>> If JS can’t handle WTF-8 natively, then what’s the benefit of
>> using it? I am opposed to anything that requires string copies
>> between the DOM and

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Ms2ger
On 10/05/2014 08:27 PM, Cameron Zwarich wrote:
> If JS can’t handle WTF-8 natively, then what’s the benefit of
> using it? I am opposed to anything that requires string copies
> between the DOM and JS, unless there’s some really great overriding
> reas

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Cameron Zwarich
If JS can’t handle WTF-8 natively, then what’s the benefit of using it? I am opposed to anything that requires string copies between the DOM and JS, unless there’s some really great overriding reason.

Cameron

On Oct 5, 2014, at 9:26 AM, Simon Sapin wrote:
> We’ve discussed using UTF-8 interna

[dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-05 Thread Simon Sapin
We’ve discussed using UTF-8 internally for strings in Servo, but well-formed UTF-8 cannot represent surrogate code points. JavaScript strings, however, can. (They are effectively potentially ill-formed UTF-16.) It’s possible (?) that the Web depends on these surrogates being preserved. So i
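The mismatch between the two string models can be seen with the Rust standard library alone. The sketch below is only an illustration: the wrapper names `to_utf8_strict` and `to_utf8_lossy` are hypothetical, while `String::from_utf16` and `String::from_utf16_lossy` are standard library calls. The input is a sequence of UTF-16 code units such as a JavaScript string might hold.

```rust
/// Hypothetical wrapper: strict conversion from UTF-16 code units.
/// Returns None if the input contains an unpaired surrogate, because a
/// Rust String must be well-formed UTF-8.
fn to_utf8_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Hypothetical wrapper: lossy conversion from UTF-16 code units.
/// Each unpaired surrogate is replaced with U+FFFD, so the original
/// code unit cannot be recovered afterwards.
fn to_utf8_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

With the input `[0x0061, 0xD800]` ("a" followed by a lone lead surrogate), the strict version returns `None` and the lossy version returns "a" followed by U+FFFD. Neither preserves the surrogate, and that gap is what motivates an encoding like WTF-8.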