[dev-servo] Rooting progress

2014-03-29 Thread Josh Matthews
One of many hurdles in upgrading to a newer SpiderMonkey revision was 
the exact rooting requirement, and I've got a branch that has a decent 
start on that: https://github.com/jdm/servo/commits/newroot


In short, with my changes all of Servo's content tests run correctly 
despite GCing every time SpiderMonkey allocates anything, suggesting 
that our compiler-generated trace hooks are effective, and the on-stack 
rooting API I've implemented is doing its job. There's more work 
required to devise some types that prevent errors like the HTML parser 
one that I fixed in the most recent commit, but I thought I'd post my 
early promising results.


Cheers,
Josh
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


Re: [dev-servo] Rooting progress

2014-03-29 Thread Josh Matthews

On 03/29/2014 01:35 PM, Josh Matthews wrote:

One of many hurdles in upgrading to a newer SpiderMonkey revision was
the exact rooting requirement, and I've got a branch that has a decent
start on that: https://github.com/jdm/servo/commits/newroot

In short, with my changes all of Servo's content tests run correctly
despite GCing every time SpiderMonkey allocates anything, suggesting
that our compiler-generated trace hooks are effective, and the on-stack
rooting API I've implemented is doing its job. There's more work
required to devise some types that prevent errors like the HTML parser
one that I fixed in the most recent commit, but I thought I'd post my
early promising results.

Cheers,
Josh


For those who want quick links, 
https://github.com/jdm/servo/commit/3d3c8e8cdc833e88d81b4ac9f84e4cb2822dde55 
shows the changes required for hand-written DOM code so far (ie. all DOM 
methods now take the equivalent of JS::Handle and arguments must 
therefore be explicitly rooted). Generated bindings code Just Works as 
you would expect.


The parser rooting error I references is 
https://github.com/jdm/servo/commit/0efd09497bb7a956d04b7589b685c7ba43cd1f06#diff-f716191032d0b98cc78247b27ec08f5dL75 
.. The parser is callback-based, so we were a) creating an unrooted JS 
object, then GCing it when we tried to attach any attributes to it, b) 
not rooting the element during the period between when it's created and 
when it's attached to the tree. I suspect the solution for this is to 
make methods return an Unrooted instead of JS, but I'll spend some 
time thinking through the possibilities.


Cheers,
Josh
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


Re: [dev-servo] character encoding in the HTML parser

2014-03-29 Thread Simon Sapin

On 10/03/2014 23:54, Keegan McAllister wrote:

[...]

Also, should we follow Gecko in representing the input stream as a
queue of UTF-16 buffers?  With UTF-8 we would have about half as much
data to stream through the parser, and (on 64-bit) we could do
case-insensitive ASCII operations 8 characters at a time.


UTF-8 would be nice, but I think we’re stuck with UTF-16 for the reasons 
below. (Actually sequences of 16 bit integers containing 
potentially-invalid UTF-16. Is that called "UCS-2"?)


Or I guess we could use what I’ll call "evil UTF-8", which is UTF-8 
without the artificial restriction of not encoding surrogates. But I 
feel bad just suggesting it. (This restriction, as well as the upper 
limit of U+10, exist to align with the value-space of UTF-16. 
UTF-8’s underlying algorithm is perfectly fine without either.)




Speaking of which, [5]


Any character that is a not a Unicode character, i.e. any isolated
surrogate, is a parse error. (These can only find their way into
the input stream via script APIs such as document.write().)


I don't see a designated error-recovery behavior, unlike most parse
errors in the spec.  Is there an implicit behavior that applies to
input stream preprocessing?  Anyway I hope that this means we don't
need to represent isolated surrogates in the input to a UTF-8
parser.

[5] 
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#input-stream


As far as I understand, a "parse error" in the spec is meant for 
conformance checkers (validators), not user agents. There is no error 
recovery behavior, because this is not an error.



I’d be surprised if anyone relies on truly isolated surrogates, because 
Chrome gets very confused when rendering them:


data:text/html,document.write("a\uD800b")


Unfortunately I’d be less surprised if someone relies on having the two 
halves of a surrogate pair in separate document.write() call, as this 
seems more interoperable:


data:text/html,document.write("\uD83D");document.write("\uDCA9")

--
Simon Sapin
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


Re: [dev-servo] character encoding in the HTML parser

2014-03-29 Thread Boris Zbarsky

On 3/29/14 6:56 PM, Simon Sapin wrote:

Or I guess we could use what I’ll call "evil UTF-8", which is UTF-8
without the artificial restriction of not encoding surrogates.


http://en.wikipedia.org/wiki/CESU-8


As far as I understand, a "parse error" in the spec is meant for
conformance checkers (validators), not user agents. There is no error
recovery behavior, because this is not an error.


Technically a UA is allowed to just stop parsing at a "parse error". 
For a browser, of course, that's not web-compatible as you note.


-Boris
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


[dev-servo] Crazy idea: CSS selector JITting at parse time

2014-03-29 Thread Patrick Walton

Hi everyone,

I've been discussing this idea with a few people in person over the past 
week, and nobody told me it was completely insane. ;) So I thought I'd 
send this idea around.


By now many people have heard of WebKit's CSS JIT. It's a surprisingly 
small amount of code. One of the issues that they cite in their blog 
post [1] is compilation time: the compilation has to be very fast in 
order to avoid nullifying the gains. With that in mind, I had an idea: 
what if the "AST" that the CSS parser generated was actually machine code?


The AST for the CSS grammar is extremely small: 176 lines of Rust for 
the whole thing [2], and the selector grammar is just a fraction of 
that. This is because selectors describe a very simple language, even 
with all the extensions that have been added over the years. So one 
could write a simple template-emitting compiler that takes CSS selectors 
and writes a small template of machine code. (This is in fact the 
approach that WebKit uses.) For example, I could imagine that the JIT 
code for the selector ".foo #a" might look like (assuming that "#a" has 
already been matched in the ID hash):


; assume that node is in ebx
l1: mov ebx,[ebx+offsetof(parentNode)]
test ebx,ebx
jz nomatch
mov edx,[ebx+offsetof(class)]
xor ecx,ecx
l2: cmp [edx+ecx*4],"foo"
je match
inc ecx
cmp ecx,[ebx+offsetof(classcount)]
jne l2
jmp l1

This is just 29 bytes of code when assembled. This is likely larger than 
the equivalent `nsRuleNode`, but `nsRuleNode` instances are not a large 
amount of the heap in my cursory glances at `about:memory`.


The major question is: If you don't have an AST, then how do you 
implement the CSSOM and style debugging? Here's the trick: Since CSS 
selectors are so simple compared to, say, the output of a JS JIT, it 
should be possible to *disassemble* the code by simply pattern-matching. 
Since the compiler is nothing more than a very simple template processor 
(and it has to be, for speed reasons) it should be possible to reverse 
the output and go from JIT code back to the CSS selectors. This would be 
done by performing some very simple analysis (not full disassembly) on 
the machine code. For instance, on the above selector, it might do:


; start with `#a` from the rule hash

l1: mov ebx,[ebx+offsetof(parentNode)]
test ebx,ebx
jz nomatch
; found a test for parentNode, so we must be a descendant or
; child selector (depending on whether we see a backwards jump)

mov edx,[ebx+offsetof(class)]
xor ecx,ecx
l2: cmp [edx+ecx*4],"foo"
je match
inc ecx
cmp ecx,[ebx+offsetof(classcount)]
jne l2
; found a test for the "foo" class

jmp l1
; found a backwards jump, so we know we're a descendant selector
; result: `.foo #a`

So you wouldn't need a full disassembler, just a simple pattern matcher.

Note that one doesn't have to JIT everything: for more complex tests 
such as "attribute contains substring", we can just call out to 
functions implemented in Rust in the JIT code. The disassembler can just 
pattern match on those function addresses and arguments. This also 
increases the maintainability of this approach in the presence of new 
stuff that the CSS working groups add to the spec: we can implement all 
new features as functions at first, and only JIT them if they become 
widely used.


Thoughts?

Patrick

[1]: https://www.webkit.org/blog/3271/webkit-css-selector-jit-compiler/

[2]: https://github.com/mozilla-servo/rust-cssparser/blob/master/ast.rs
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


Re: [dev-servo] Crazy idea: CSS selector JITting at parse time

2014-03-29 Thread Boris Zbarsky

On 3/29/14 8:23 PM, Patrick Walton wrote:

This is just 29 bytes of code when assembled. This is likely larger than
the equivalent `nsRuleNode`


The right comparison is not nsRuleNode but nsCSSSelector, right?

These are actually pretty bloated in Gecko right now.  For example, 
".foo #a" is parsed into a linked list of two nsCSSSelectors. 
nsCSSSelector consists of 8 words and 8 more bytes, so 40 bytes on 
32-bit and 72 bytes on 64-bit.  So right now storing the parsed version 
of ".foo #a" is 80 or 144 bytes in Gecko.  This is not great, and we've 
been talking about shrinking these for a while, but it's clearly not fatal.


The rest of your proposal seems pretty workable.  The biggest issue, I 
suspect, is a maintainability one: it forces a pretty tight coupling 
between your selector jit and the object layout of your nodes, for any 
of the tests performed directly in jitcode.  For example, your cited 
jitcode assumes that nodes have a "class" member and a "classCount" 
member and that "class" points to an array of classes, right?


I think it should be possible to minimize the amount of coupling if it 
turns out to be an issue in practice.


-Boris
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo


Re: [dev-servo] Crazy idea: CSS selector JITting at parse time

2014-03-29 Thread Patrick Walton
Yes, it does have pretty tight coupling. That is the biggest risk.

On a related note, I have been tossing around ideas today for using SIMD to 
match multiple selectors that have the same "shape" in parallel. For example, 
if we have ".foo #a" and ".bar #a" it may be possible to use the packed 
comparison instructions in SSE4 to match both at the same time. Obviously, this 
adds significant complexity and correspondingly increased maintenance burden, 
and its effectiveness depends on how often selectors have the same shape in the 
wild (if it works at all). So I'm filing it into the "potentially interesting 
project, not a high priority" mental bin. Could be neat though.

Patrick

On March 29, 2014 6:55:41 PM PDT, Boris Zbarsky  wrote:
>On 3/29/14 8:23 PM, Patrick Walton wrote:
>> This is just 29 bytes of code when assembled. This is likely larger
>than
>> the equivalent `nsRuleNode`
>
>The right comparison is not nsRuleNode but nsCSSSelector, right?
>
>These are actually pretty bloated in Gecko right now.  For example, 
>".foo #a" is parsed into a linked list of two nsCSSSelectors. 
>nsCSSSelector consists of 8 words and 8 more bytes, so 40 bytes on 
>32-bit and 72 bytes on 64-bit.  So right now storing the parsed version
>
>of ".foo #a" is 80 or 144 bytes in Gecko.  This is not great, and we've
>
>been talking about shrinking these for a while, but it's clearly not
>fatal.
>
>The rest of your proposal seems pretty workable.  The biggest issue, I 
>suspect, is a maintainability one: it forces a pretty tight coupling 
>between your selector jit and the object layout of your nodes, for any 
>of the tests performed directly in jitcode.  For example, your cited 
>jitcode assumes that nodes have a "class" member and a "classCount" 
>member and that "class" points to an array of classes, right?
>
>I think it should be possible to minimize the amount of coupling if it 
>turns out to be an issue in practice.
>
>-Boris
>___
>dev-servo mailing list
>dev-servo@lists.mozilla.org
>https://lists.mozilla.org/listinfo/dev-servo

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo