[dev-servo] Rooting progress
One of many hurdles in upgrading to a newer SpiderMonkey revision was the exact rooting requirement, and I've got a branch that has a decent start on that: https://github.com/jdm/servo/commits/newroot In short, with my changes all of Servo's content tests run correctly despite GCing every time SpiderMonkey allocates anything, suggesting that our compiler-generated trace hooks are effective, and the on-stack rooting API I've implemented is doing its job. There's more work required to devise some types that prevent errors like the HTML parser one that I fixed in the most recent commit, but I thought I'd post my early promising results. Cheers, Josh ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo
Re: [dev-servo] Rooting progress
On 03/29/2014 01:35 PM, Josh Matthews wrote: One of many hurdles in upgrading to a newer SpiderMonkey revision was the exact rooting requirement, and I've got a branch that has a decent start on that: https://github.com/jdm/servo/commits/newroot In short, with my changes all of Servo's content tests run correctly despite GCing every time SpiderMonkey allocates anything, suggesting that our compiler-generated trace hooks are effective, and the on-stack rooting API I've implemented is doing its job. There's more work required to devise some types that prevent errors like the HTML parser one that I fixed in the most recent commit, but I thought I'd post my early promising results. Cheers, Josh For those who want quick links, https://github.com/jdm/servo/commit/3d3c8e8cdc833e88d81b4ac9f84e4cb2822dde55 shows the changes required for hand-written DOM code so far (ie. all DOM methods now take the equivalent of JS::Handle and arguments must therefore be explicitly rooted). Generated bindings code Just Works as you would expect. The parser rooting error I references is https://github.com/jdm/servo/commit/0efd09497bb7a956d04b7589b685c7ba43cd1f06#diff-f716191032d0b98cc78247b27ec08f5dL75 .. The parser is callback-based, so we were a) creating an unrooted JS object, then GCing it when we tried to attach any attributes to it, b) not rooting the element during the period between when it's created and when it's attached to the tree. I suspect the solution for this is to make methods return an Unrooted instead of JS, but I'll spend some time thinking through the possibilities. Cheers, Josh ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo
Re: [dev-servo] character encoding in the HTML parser
On 10/03/2014 23:54, Keegan McAllister wrote: [...] Also, should we follow Gecko in representing the input stream as a queue of UTF-16 buffers? With UTF-8 we would have about half as much data to stream through the parser, and (on 64-bit) we could do case-insensitive ASCII operations 8 characters at a time. UTF-8 would be nice, but I think we’re stuck with UTF-16 for the reasons below. (Actually sequences of 16 bit integers containing potentially-invalid UTF-16. Is that called "UCS-2"?) Or I guess we could use what I’ll call "evil UTF-8", which is UTF-8 without the artificial restriction of not encoding surrogates. But I feel bad just suggesting it. (This restriction, as well as the upper limit of U+10, exist to align with the value-space of UTF-16. UTF-8’s underlying algorithm is perfectly fine without either.) Speaking of which, [5] Any character that is a not a Unicode character, i.e. any isolated surrogate, is a parse error. (These can only find their way into the input stream via script APIs such as document.write().) I don't see a designated error-recovery behavior, unlike most parse errors in the spec. Is there an implicit behavior that applies to input stream preprocessing? Anyway I hope that this means we don't need to represent isolated surrogates in the input to a UTF-8 parser. [5] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#input-stream As far as I understand, a "parse error" in the spec is meant for conformance checkers (validators), not user agents. There is no error recovery behavior, because this is not an error. I’d be surprised if anyone relies on truly isolated surrogates, because Chrome gets very confused when rendering them: data:text/html,document.write("a\uD800b") Unfortunately I’d be less surprised if someone relies on having the two halves of a surrogate pair in separate document.write() call, as this seems more interoperable: data:text/html,document.write("\uD83D");document.write("\uDCA9") -- Simon Sapin ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo
Re: [dev-servo] character encoding in the HTML parser
On 3/29/14 6:56 PM, Simon Sapin wrote: Or I guess we could use what I’ll call "evil UTF-8", which is UTF-8 without the artificial restriction of not encoding surrogates. http://en.wikipedia.org/wiki/CESU-8 As far as I understand, a "parse error" in the spec is meant for conformance checkers (validators), not user agents. There is no error recovery behavior, because this is not an error. Technically a UA is allowed to just stop parsing at a "parse error". For a browser, of course, that's not web-compatible as you note. -Boris ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo
[dev-servo] Crazy idea: CSS selector JITting at parse time
Hi everyone, I've been discussing this idea with a few people in person over the past week, and nobody told me it was completely insane. ;) So I thought I'd send this idea around. By now many people have heard of WebKit's CSS JIT. It's a surprisingly small amount of code. One of the issues that they cite in their blog post [1] is compilation time: the compilation has to be very fast in order to avoid nullifying the gains. With that in mind, I had an idea: what if the "AST" that the CSS parser generated was actually machine code? The AST for the CSS grammar is extremely small: 176 lines of Rust for the whole thing [2], and the selector grammar is just a fraction of that. This is because selectors describe a very simple language, even with all the extensions that have been added over the years. So one could write a simple template-emitting compiler that takes CSS selectors and writes a small template of machine code. (This is in fact the approach that WebKit uses.) For example, I could imagine that the JIT code for the selector ".foo #a" might look like (assuming that "#a" has already been matched in the ID hash): ; assume that node is in ebx l1: mov ebx,[ebx+offsetof(parentNode)] test ebx,ebx jz nomatch mov edx,[ebx+offsetof(class)] xor ecx,ecx l2: cmp [edx+ecx*4],"foo" je match inc ecx cmp ecx,[ebx+offsetof(classcount)] jne l2 jmp l1 This is just 29 bytes of code when assembled. This is likely larger than the equivalent `nsRuleNode`, but `nsRuleNode` instances are not a large amount of the heap in my cursory glances at `about:memory`. The major question is: If you don't have an AST, then how do you implement the CSSOM and style debugging? Here's the trick: Since CSS selectors are so simple compared to, say, the output of a JS JIT, it should be possible to *disassemble* the code by simply pattern-matching. Since the compiler is nothing more than a very simple template processor (and it has to be, for speed reasons) it should be possible to reverse the output and go from JIT code back to the CSS selectors. This would be done by performing some very simple analysis (not full disassembly) on the machine code. For instance, on the above selector, it might do: ; start with `#a` from the rule hash l1: mov ebx,[ebx+offsetof(parentNode)] test ebx,ebx jz nomatch ; found a test for parentNode, so we must be a descendant or ; child selector (depending on whether we see a backwards jump) mov edx,[ebx+offsetof(class)] xor ecx,ecx l2: cmp [edx+ecx*4],"foo" je match inc ecx cmp ecx,[ebx+offsetof(classcount)] jne l2 ; found a test for the "foo" class jmp l1 ; found a backwards jump, so we know we're a descendant selector ; result: `.foo #a` So you wouldn't need a full disassembler, just a simple pattern matcher. Note that one doesn't have to JIT everything: for more complex tests such as "attribute contains substring", we can just call out to functions implemented in Rust in the JIT code. The disassembler can just pattern match on those function addresses and arguments. This also increases the maintainability of this approach in the presence of new stuff that the CSS working groups add to the spec: we can implement all new features as functions at first, and only JIT them if they become widely used. Thoughts? Patrick [1]: https://www.webkit.org/blog/3271/webkit-css-selector-jit-compiler/ [2]: https://github.com/mozilla-servo/rust-cssparser/blob/master/ast.rs ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo
Re: [dev-servo] Crazy idea: CSS selector JITting at parse time
On 3/29/14 8:23 PM, Patrick Walton wrote: This is just 29 bytes of code when assembled. This is likely larger than the equivalent `nsRuleNode` The right comparison is not nsRuleNode but nsCSSSelector, right? These are actually pretty bloated in Gecko right now. For example, ".foo #a" is parsed into a linked list of two nsCSSSelectors. nsCSSSelector consists of 8 words and 8 more bytes, so 40 bytes on 32-bit and 72 bytes on 64-bit. So right now storing the parsed version of ".foo #a" is 80 or 144 bytes in Gecko. This is not great, and we've been talking about shrinking these for a while, but it's clearly not fatal. The rest of your proposal seems pretty workable. The biggest issue, I suspect, is a maintainability one: it forces a pretty tight coupling between your selector jit and the object layout of your nodes, for any of the tests performed directly in jitcode. For example, your cited jitcode assumes that nodes have a "class" member and a "classCount" member and that "class" points to an array of classes, right? I think it should be possible to minimize the amount of coupling if it turns out to be an issue in practice. -Boris ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo
Re: [dev-servo] Crazy idea: CSS selector JITting at parse time
Yes, it does have pretty tight coupling. That is the biggest risk. On a related note, I have been tossing around ideas today for using SIMD to match multiple selectors that have the same "shape" in parallel. For example, if we have ".foo #a" and ".bar #a" it may be possible to use the packed comparison instructions in SSE4 to match both at the same time. Obviously, this adds significant complexity and correspondingly increased maintenance burden, and its effectiveness depends on how often selectors have the same shape in the wild (if it works at all). So I'm filing it into the "potentially interesting project, not a high priority" mental bin. Could be neat though. Patrick On March 29, 2014 6:55:41 PM PDT, Boris Zbarsky wrote: >On 3/29/14 8:23 PM, Patrick Walton wrote: >> This is just 29 bytes of code when assembled. This is likely larger >than >> the equivalent `nsRuleNode` > >The right comparison is not nsRuleNode but nsCSSSelector, right? > >These are actually pretty bloated in Gecko right now. For example, >".foo #a" is parsed into a linked list of two nsCSSSelectors. >nsCSSSelector consists of 8 words and 8 more bytes, so 40 bytes on >32-bit and 72 bytes on 64-bit. So right now storing the parsed version > >of ".foo #a" is 80 or 144 bytes in Gecko. This is not great, and we've > >been talking about shrinking these for a while, but it's clearly not >fatal. > >The rest of your proposal seems pretty workable. The biggest issue, I >suspect, is a maintainability one: it forces a pretty tight coupling >between your selector jit and the object layout of your nodes, for any >of the tests performed directly in jitcode. For example, your cited >jitcode assumes that nodes have a "class" member and a "classCount" >member and that "class" points to an array of classes, right? > >I think it should be possible to minimize the amount of coupling if it >turns out to be an issue in practice. > >-Boris >___ >dev-servo mailing list >dev-servo@lists.mozilla.org >https://lists.mozilla.org/listinfo/dev-servo -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. ___ dev-servo mailing list dev-servo@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo