Gecko's HTML5 parser is based on machine-translating the validator.nu parser 
from Java into C++.  We are developing [1] a Rust translation for use in Servo.

I've been working on that recently and I have some doubts about this approach.  
Java and C++ share some features that Rust does not have.  hsivonen and I have 
worked around some of these mismatches, but it's been a fair amount of effort 
already, and the translator is not that close to producing Rust code that will 
even compile.

I think the biggest unknown is memory management.  It's likely that an exact 
copy of the C++ approach will upset the borrow checker, requiring either unsafe 
code or a more sophisticated translator.  Using unsafe code in the HTML5 parser 
would undermine Servo's security goals.
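
To make the concern concrete: a tree builder written in the C++ style keeps raw 
pointers from children to parents and from the stack of open elements into the 
tree.  One way to express the same links in safe Rust is to let a single arena 
own all nodes and refer to them by index.  The sketch below is only an 
illustration under that assumption; the names (Arena, NodeId, build_example) 
are made up here and are not taken from the translator, Gecko, or Servo.

```rust
/// Index of a node inside the arena; stands in for a raw pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

#[derive(Debug)]
pub struct Node {
    pub name: String,
    pub parent: Option<NodeId>,
    pub children: Vec<NodeId>,
}

/// Owns every node. Links between nodes are plain indices, so no node
/// ever holds a reference into another node.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Allocate a detached node and return its index.
    pub fn new_node(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            name: name.to_string(),
            parent: None,
            children: Vec::new(),
        });
        id
    }

    /// Link `child` under `parent`. This sketch assumes `child` is
    /// still detached; it does not unlink it from a previous parent.
    pub fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        self.nodes[parent.0].children.push(child);
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }
}

/// Hypothetical tree-builder fragment: the stack of open elements holds
/// indices, so it can coexist with mutation of the arena.
pub fn build_example() -> (Arena, NodeId) {
    let mut arena = Arena::default();
    let html = arena.new_node("html");
    let mut open_elements = vec![html];
    for name in ["body", "p"] {
        let node = arena.new_node(name);
        let current = *open_elements.last().expect("stack starts non-empty");
        arena.append_child(current, node);
        open_elements.push(node);
    }
    (arena, html)
}
```

The cost is that a stale index is a logic error rather than a compile error, 
and a mechanical translator would still have to know which Java references 
become indices.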

The translator directly prints C++ or Rust code as it traverses the Java AST.  
This makes it hard to implement anything beyond a close mapping of individual 
syntax elements.
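
As a rough illustration of the difference, here is a toy example with a 
separate rewrite pass between the input AST and the printer.  Everything in it 
(the Expr type, print, rewrite) is hypothetical and far simpler than the real 
translator; it only shows that a rewrite which looks at more than one node at 
a time needs a tree to work on, not a stream of printed text.

```rust
/// Tiny stand-in for a Java expression AST (hypothetical types).
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Var(String),
    Null,
    Eq(Box<Expr>, Box<Expr>),
    /// Produced only by the rewrite pass; has no Java counterpart.
    IsNone(Box<Expr>),
}

/// Direct printing: every node maps to one fixed piece of Rust syntax.
pub fn print(e: &Expr) -> String {
    match e {
        Expr::Var(name) => name.clone(),
        Expr::Null => "None".to_string(),
        Expr::Eq(l, r) => format!("{} == {}", print(l), print(r)),
        Expr::IsNone(inner) => format!("{}.is_none()", print(inner)),
    }
}

/// Rewrite pass: recognises the two-node pattern `x == null` (in either
/// order) and replaces it with a single node before anything is printed.
pub fn rewrite(e: Expr) -> Expr {
    match e {
        Expr::Eq(l, r) => match (rewrite(*l), rewrite(*r)) {
            (inner, Expr::Null) | (Expr::Null, inner) => {
                Expr::IsNone(Box::new(inner))
            }
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        Expr::IsNone(inner) => Expr::IsNone(Box::new(rewrite(*inner))),
        other => other,
    }
}
```

Printing the Java expression `node == null` directly gives `node == None`, 
while printing the rewritten tree gives `node.is_none()`.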

Writing our own HTML5 parser would be a lot of work, but does not seem 
infeasible.  The parsers I've found (including the translated C++ code for 
Gecko) are in the 10-20 KLoC range.  We can do a one-time translation from Java 
for the most mechanical parts, without building a complete translator.

There is a standard test suite [2] for static HTML5 parsers.  Browsers have 
additional requirements due to speculative parsing and document.write(), but it 
looks like [3] Gecko implements those outside the translated parser, so this is 
code we would have to write and test in any case.
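
For a sense of what consuming that suite involves: as far as I can tell, the 
tree-construction tests in [2] are text files made of "#data", "#errors" and 
"#document" sections, with expected tree lines prefixed by "| ".  A minimal 
reader for that layout might look like the sketch below.  It is an assumption 
based on that reading of the format, the names (TreeTest, parse_tests) are 
invented here, and other section headers are simply skipped.

```rust
/// One tree-construction test case, kept as raw lines.
#[derive(Debug, Default, PartialEq)]
pub struct TreeTest {
    pub data: Vec<String>,
    pub errors: Vec<String>,
    pub document: Vec<String>,
}

/// Split the contents of a test file into test cases.
/// Simplifications: blank lines in the expected tree are dropped, and
/// any section other than #data, #errors and #document is ignored.
pub fn parse_tests(input: &str) -> Vec<TreeTest> {
    let mut tests: Vec<TreeTest> = Vec::new();
    let mut section = "";

    for line in input.lines() {
        if line == "#data" {
            tests.push(TreeTest::default());
            section = "#data";
            continue;
        }
        // Inside #data only "#errors" ends the section, because the
        // markup under test may itself start with '#'.
        let is_header = line.starts_with('#')
            && (section != "#data" || line == "#errors");
        if is_header {
            section = line;
            continue;
        }
        let test = match tests.last_mut() {
            Some(t) => t,
            None => continue, // text before the first #data
        };
        match section {
            "#data" => test.data.push(line.to_string()),
            "#errors" => test.errors.push(line.to_string()),
            "#document" if !line.is_empty() => {
                test.document.push(line.to_string())
            }
            _ => {}
        }
    }
    tests
}
```

A harness would feed each test's data to the parser, serialize the resulting 
tree in the same "| " format, and compare it with the document lines.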

The bug thread [4] about landing the HTML5 parser in Gecko may be of interest.

In the short term I will continue to work on the translator and see if we can 
get more clarity about some of these unknowns.  But I'm also inclined to try 
implementing parts of a new HTML5 parser in Rust.  At any rate, we should pay 
close attention to Gecko's parser design, and I will continue reading through 
that code.

keegan

[1] https://github.com/mozilla/servo/issues/1289
[2] https://github.com/html5lib/html5lib-tests
[3] https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=487949
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo