Re: [dev-servo] character encoding in the HTML parser

Luke Wagner Thu, 03 Apr 2014 09:04:45 -0700

Another option we've just been discussing is to lazily compute a flag on the 
string indicating "contents are 7-bit ASCII", which would allow us to use plain 
array indexing.  I'd expect this to often be true.  There are also many cases 
where we'd know this flag eagerly (atoms produced during parsing, strings 
converted from numbers, concatenations of 7-bit ASCII strings, substrings of 
7-bit ASCII strings, strings passed in as a parameter from the embedding, etc.), 
so we would be able to avoid much of the overhead of this check.  (One could 
even imagine a background thread that computed 7-bit-ness ;)
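
To make the flag idea concrete, here is a minimal sketch in Rust, under stated assumptions. The type `JsStr` and all of its methods are hypothetical names made up for illustration; they are not SpiderMonkey or Servo APIs. The sketch assumes the string is stored as UTF-8 and that a charAt-style lookup takes a UTF-16 code unit index. For an all-ASCII string, byte index and UTF-16 index coincide, so the lookup becomes a plain array access.

```rust
use std::cell::Cell;

/// Hypothetical string type: UTF-8 storage plus a lazily computed
/// "contents are 7-bit ASCII" flag. `None` means "not computed yet".
pub struct JsStr {
    utf8: String,
    ascii: Cell<Option<bool>>,
}

impl JsStr {
    /// Producer that knows nothing about the contents: flag stays unset.
    pub fn new(s: &str) -> Self {
        JsStr { utf8: s.to_owned(), ascii: Cell::new(None) }
    }

    /// Producer that already knows the answer (for example a number
    /// converted to a string), so the flag is set eagerly.
    pub fn new_known_ascii(s: &str) -> Self {
        debug_assert!(s.is_ascii());
        JsStr { utf8: s.to_owned(), ascii: Cell::new(Some(true)) }
    }

    /// Computes the flag on first use and caches it afterwards.
    pub fn is_ascii(&self) -> bool {
        if let Some(flag) = self.ascii.get() {
            return flag;
        }
        let flag = self.utf8.is_ascii();
        self.ascii.set(Some(flag));
        flag
    }

    /// charCodeAt-like lookup by UTF-16 code unit index.
    pub fn char_code_at(&self, index: usize) -> Option<u16> {
        if self.is_ascii() {
            // Fast path: one byte per code unit, so index directly.
            self.utf8.as_bytes().get(index).map(|&b| u16::from(b))
        } else {
            // Slow path: linear scan over the UTF-16 encoding.
            self.utf8.encode_utf16().nth(index)
        }
    }

    /// Concatenation propagates the flag without rescanning when both
    /// inputs already carry one.
    pub fn concat(&self, other: &JsStr) -> JsStr {
        let mut utf8 = String::with_capacity(self.utf8.len() + other.utf8.len());
        utf8.push_str(&self.utf8);
        utf8.push_str(&other.utf8);
        let ascii = match (self.ascii.get(), other.ascii.get()) {
            (Some(true), Some(true)) => Some(true),
            (Some(false), _) | (_, Some(false)) => Some(false),
            _ => None, // unknown: leave it to be computed lazily
        };
        JsStr { utf8, ascii: Cell::new(ascii) }
    }
}
```

With this shape, `JsStr::new_known_ascii("42").concat(&JsStr::new_known_ascii("px"))` never scans its contents, while a string of unknown origin pays for one scan on its first indexed access. The `Cell` is not thread-safe, so the background-thread variant would need an atomic flag instead.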


----- Original Message -----
> On Wed, Apr 2, 2014 at 4:25 PM, Robert O'Callahan <rob...@ocallahan.org>
> wrote:
> > If we could get the JS engine to use evil-UTF8 with some hack to handle
> > charAt and friends efficiently (e.g. tacking on a UCS-2 version of the
> > string when necessary)
> 
> Have we instrumented Gecko to find out what the access patterns are
> like? If the main performance-sensitive access pattern is sequential
> iteration over the string, instead of storing a UCS-2 copy, we could
> store the next expected UCS-2 index and the next UTF-8 index. charAt
> would then start by comparing if its argument equals the next expected
> UCS-2 index in which case read would start at the next UTF-8 index.
> 
> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/
> _______________________________________________
> dev-servo mailing list
> dev-servo@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-servo
> 
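
The cursor scheme described in the quoted message (remember the next expected UCS-2 index together with the matching UTF-8 index) can be sketched roughly as follows. This is an illustration under assumptions, not code from Gecko or Servo: `CursorStr` and its methods are hypothetical names, and the sketch generalizes the equality check slightly by resuming from the cursor for any index at or past it, which also keeps the cursor on a code point boundary when a surrogate pair is read one unit at a time.

```rust
use std::cell::Cell;

/// Hypothetical UTF-8 string that caches where the last indexed read ended.
pub struct CursorStr {
    utf8: String,
    /// (UTF-16 index, UTF-8 byte offset) of the same code point boundary.
    cursor: Cell<(usize, usize)>,
}

impl CursorStr {
    pub fn new(s: &str) -> Self {
        CursorStr { utf8: s.to_owned(), cursor: Cell::new((0, 0)) }
    }

    /// Exposes the cached position, mainly for inspection.
    pub fn cursor(&self) -> (usize, usize) {
        self.cursor.get()
    }

    /// charCodeAt-like lookup by UTF-16 code unit index.
    pub fn char_code_at(&self, index: usize) -> Option<u16> {
        let (mut unit_pos, mut byte_pos) = self.cursor.get();
        if index < unit_pos {
            // Backwards access: the cursor does not help, rescan from the start.
            unit_pos = 0;
            byte_pos = 0;
        }
        for ch in self.utf8[byte_pos..].chars() {
            let width = ch.len_utf16(); // 1, or 2 for a surrogate pair
            if index < unit_pos + width {
                let mut buf = [0u16; 2];
                let unit = ch.encode_utf16(&mut buf)[index - unit_pos];
                if index + 1 == unit_pos + width {
                    // Last unit of this code point: advance past it.
                    self.cursor.set((unit_pos + width, byte_pos + ch.len_utf8()));
                } else {
                    // High surrogate: stay here so index + 1 finds the low one.
                    self.cursor.set((unit_pos, byte_pos));
                }
                return Some(unit);
            }
            unit_pos += width;
            byte_pos += ch.len_utf8();
        }
        None // index is past the end; the cursor is left unchanged
    }
}
```

A loop calling `char_code_at(0)`, `char_code_at(1)`, and so on then decodes each code point a constant number of times instead of rescanning from the start on every call. Random or backwards access still costs a linear scan, which is why the question about measured access patterns matters.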