date:20141009

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-09 Thread Henri Sivonen

On Wed, Oct 8, 2014 at 4:13 PM, Jan de Mooij wrote:
> When I added Latin1 to SpiderMonkey, we did consider using UTF8 but it's
> complicated. As mentioned, we have to ensure charAt/charCodeAt stay fast
> (crypto benchmarks etc rely on this, sadly).

It would be even more tragic to miss the opportunity to use 8-bit code
units for strings in Servo because JS crypto benchmarks use strings.
What chances are there to retire the use of strings-for-crypto in
benchmarking? Such a benchmark doesn't represent a reasonable real
application. A reasonable real application would use the Web Crypto
API to delegate crypto operations to outside the JS engine or use
ArrayBuffers to perform byte-oriented operations inside the JS engine.

> Many other string operations
> are also very perf-sensitive and extra branches in tight loops can hurt a
> lot.

Besides charAt/charCodeAt, what operations do you expect to be
adversely affected by WTF-8 memory layout?

As for extra branches, suppose each logically immutable string maintains
an immutable "is ASCII-only" bit of state and, if that bit is false, two
mutable integers of state: "Next UTF-16 index" and "Next WTF-8 index".
Is a branch at the start of charAt to see if the argument is equal to
"Next UTF-16 index" (in which case start reading at "Next WTF-8
index") substantially worse than the checks to see if a PIC for
something exists? Also, if the JIT knows about the internals of
strings, couldn't these checks be optimized out by temporarily
hoisting "Next UTF-16 index" and "Next WTF-8 index" out of the object
and inlining them into the code accessing the string before an optimizer
optimizes the code in obviously sequential loops?
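
The cached-cursor idea described above can be sketched roughly as follows. This is an illustration only, not SpiderMonkey or Servo code: the type `CursorString` and its field names are hypothetical, the sketch stores valid UTF-8 in a Rust `String` (real WTF-8 would also have to admit lone surrogates), and it resumes from the cursor whenever the requested index is at or past it rather than only on exact equality.

```rust
use std::cell::Cell;

/// Hypothetical string type illustrating the scheme; not an engine's real layout.
pub struct CursorString {
    text: String,            // valid UTF-8 only in this sketch (assumption)
    ascii_only: bool,        // immutable "is ASCII-only" bit
    next_utf16: Cell<usize>, // "Next UTF-16 index"
    next_byte: Cell<usize>,  // "Next WTF-8 index" (byte offset of a code point start)
}

impl CursorString {
    pub fn new(s: &str) -> Self {
        CursorString {
            text: s.to_owned(),
            ascii_only: s.is_ascii(),
            next_utf16: Cell::new(0),
            next_byte: Cell::new(0),
        }
    }

    /// Returns the UTF-16 code unit at `index`, in the spirit of charCodeAt.
    pub fn char_code_at(&self, index: usize) -> Option<u16> {
        // ASCII-only strings: byte offset and UTF-16 index coincide.
        if self.ascii_only {
            return self.text.as_bytes().get(index).map(|&b| u16::from(b));
        }
        // The extra branch: resume at the cursor if possible, else rescan.
        let (mut unit_pos, mut byte_pos) = if index >= self.next_utf16.get() {
            (self.next_utf16.get(), self.next_byte.get())
        } else {
            (0, 0)
        };
        while let Some(c) = self.text[byte_pos..].chars().next() {
            let units = c.len_utf16();
            if index < unit_pos + units {
                let mut buf = [0u16; 2];
                let unit = c.encode_utf16(&mut buf)[index - unit_pos];
                if index + 1 == unit_pos + units {
                    // Last unit of this code point: move the cursor past it.
                    self.next_utf16.set(unit_pos + units);
                    self.next_byte.set(byte_pos + c.len_utf8());
                } else {
                    // High half of a surrogate pair: keep the cursor at the
                    // start of the code point so the low half is found next.
                    self.next_utf16.set(unit_pos);
                    self.next_byte.set(byte_pos);
                }
                return Some(unit);
            }
            unit_pos += units;
            byte_pos += c.len_utf8();
        }
        None // index is past the end of the string
    }
}
```

With this layout a forward loop over indices 0, 1, 2, ... decodes each code point at most twice (only astral characters are visited twice), while a backward or random access falls back to a scan from the start.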

> Also, the regular expression engine currently emits JIT code to load
> and compare multiple characters at once.

Since changing the code unit size to a smaller one is a matter of
concatenation and concatenation is a regular construct, whether a
regexp engine can be retargeted to *TF-8 is not an open research
question but a matter of doing the work. So it doesn't make sense to me
to block Servo's use of *TF-8 on regexp concerns. When the time comes
to have product-level (as opposed to research placeholder) performance
for regexps, it should be a matter of doing work--not a matter of
researching if it is possible.

> All this is fixable to work on
> WTF-8 strings, but it's a lot of work and performance is a risk.

Considering all the work involved in making Servo into an engine
suitable for browsing the Web, it seems to me that it would be fair to
have this work on the todo list among everything else and accept
non-optimized WTF-8 string object support into SpiderMonkey as a
compile-time option for the time being.

> Also note that the copying we do for strings passed from JS to Gecko is not
> only necessary for moving GC, but also to inflate Latin1 strings (= most
> strings)

Has SpiderMonkey ever been instrumented to find out if most strings
are even just ASCII?

> to TwoByte Gecko strings. If Servo or Gecko could deal with both
> Latin1 and TwoByte strings, we could think about ways to avoid the copying.
> Though, as Boris said, I'm not aware of any (non-micro-)benchmark
> regressions from the copying so I don't expect big wins from optimizing
> this. But again, doing a Latin1 -> TwoByte copy is a very tight loop that
> compilers can probably vectorize. UTF8/WTF8 -> TwoByte is more complicated
> and probably slower.

Gecko already has vectorized code for conversions between UTF-8 and
UTF-16, so it's probably worth measuring how much worse vectorized
UTF-8 <-> UTF-16 is compared to vectorized Latin-1 <-> UTF-16. It's
quite possible that the answer is "not too much slower" if there
aren't already microbenchmarks relying on the copy speed.
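
To make the comparison concrete, here is a minimal scalar sketch of the two conversions. It is not Gecko's vectorized code; the function names `inflate_latin1` and `utf8_to_utf16` are made up for illustration, and whether a compiler vectorizes either loop is not something this sketch verifies.

```rust
/// Latin-1 to UTF-16: every byte widens to exactly one 16-bit code unit,
/// so the loop body has no data-dependent branches.
pub fn inflate_latin1(src: &[u8]) -> Vec<u16> {
    src.iter().map(|&b| u16::from(b)).collect()
}

/// UTF-8 to UTF-16: output length depends on the input bytes, because a
/// multi-byte sequence becomes one or two code units. An ASCII-only input
/// can take the same widening path as Latin-1.
pub fn utf8_to_utf16(src: &str) -> Vec<u16> {
    if src.is_ascii() {
        return inflate_latin1(src.as_bytes());
    }
    src.encode_utf16().collect()
}
```

The ASCII check hints at why the gap may be small in practice: when most strings are ASCII, both conversions reduce to the same widening loop.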

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo

Re: [dev-servo] WTF-8 encoding for DOM strings and HTML parsing

2014-10-09 Thread Nicholas Nethercote

On Thu, Oct 9, 2014 at 9:21 PM, Henri Sivonen wrote:
> On Wed, Oct 8, 2014 at 4:13 PM, Jan de Mooij wrote:
>
> Has SpiderMonkey ever been instrumented to find out if most strings
> are even just ASCII?

There are some measurements in
https://blog.mozilla.org/javascript/2014/07/21/slimmer-and-faster-javascript-strings-in-firefox/.

But even better, you can visit about:memory and see for yourself. Look
for entries like this:

│   ├──26.43 MB (08.36%) -- strings
│   │  ├──13.98 MB (04.43%) -- malloc-heap
│   │  │  ├──10.84 MB (03.43%) ── latin1
│   │  │  └───3.14 MB (00.99%) ── two-byte
│   │  └──12.45 MB (03.94%) -- gc-heap
│   │     ├───9.05 MB (02.86%) ── latin1
│   │     └───3.40 MB (01.08%) ── two-byte

You can see these stats on a per-zone basis in the "explicit" tree, or
for the entire main runtime under the "js-main-runtime" tree.

"gc-heap" refers to the JSString objects stored on the GC heap, some of
which hold the entire string's chars. "malloc-heap" refers to
separately-stored chars for longer strings.
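
As a worked example, the Latin1 share of string memory in the sample above can be computed as below. The helper `latin1_share` is hypothetical and only does the arithmetic; note that the figures count Latin1 strings, which is a superset of ASCII-only strings, and they measure bytes rather than string counts.

```rust
/// Fraction of string memory held in Latin1 form, given per-heap sizes in MB.
pub fn latin1_share(latin1_mb: &[f64], two_byte_mb: &[f64]) -> f64 {
    let latin1: f64 = latin1_mb.iter().sum();
    let two_byte: f64 = two_byte_mb.iter().sum();
    latin1 / (latin1 + two_byte)
}
```

For the sample, `latin1_share(&[10.84, 9.05], &[3.14, 3.40])` is 19.89 / 26.43, or roughly 75%.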

Nick

Re: [dev-servo] The current scrolling model

2014-10-09 Thread Robert O'Callahan

On Wed, Oct 8, 2014 at 10:55 PM, Patrick Walton wrote:

> On 10/8/14 10:51 PM, Robert O'Callahan wrote:
>
>> You can get away with that for position:fixed, but I don't think you can
>> get away with that for overflow:auto/scroll. We find in Gecko many real
>> situations where scrolled content for a given scrollable container has
>> to be split into multiple layers because content from the container is
>> interspersed in z-order with content that's not scrolling.
>>
>
> Samples of such content would be really interesting to take a look at and
> make sure we handle.
>
> In general our approach is functionally identical to WebKit's, so I
> suspect that whatever WebKit does we can just copy.
>

FirefoxOS Marketplace was one example. I don't have a simple example on
hand but I can make up one if you need one.

Rob
-- 
oIo otoeololo oyooouo otohoaoto oaonoyooonoeo owohooo oioso oaonogoroyo
owoiotoho oao oboroootohoeoro oooro osoiosotoeoro owoiololo oboeo
osouobojoeocoto otooo ojouodogomoeonoto.o oAogoaoiono,o oaonoyooonoeo
owohooo
osoaoyoso otooo oao oboroootohoeoro oooro osoiosotoeoro,o o‘oRoaocoao,o’o
oioso
oaonosowoeoroaoboloeo otooo otohoeo ocooouoroto.o oAonodo oaonoyooonoeo
owohooo
osoaoyoso,o o‘oYooouo ofolo!o’o owoiololo oboeo oiono odoaonogoeoro
ooofo
otohoeo ofoioroeo ooofo ohoeololo.
