Gecko's HTML5 parser is based on machine-translating the validator.nu parser from Java into C++. We are developing [1] a Rust translation for use in Servo.

I've been working on that recently and I have some doubts about this approach. Java and C++ share some features that Rust does not have. hsivonen and I have worked around some of these mismatches, but it has already taken a fair amount of effort, and the translator is still not close to producing Rust code that will even compile.

I think the biggest unknown is memory management. An exact copy of the C++ approach will likely upset the borrow checker, requiring either unsafe code or a more sophisticated translator. Using unsafe code in the HTML5 parser would undermine Servo's security goals.

The translator prints C++ or Rust code directly as it traverses the Java AST. This makes it hard to implement anything beyond a close mapping of individual syntax elements.

Writing our own HTML5 parser would be a lot of work, but it does not seem infeasible. The parsers I've found (including the translated C++ code for Gecko) are in the 10-20 KLoC range. We can do a one-time translation from Java for the most mechanical parts, without building a complete translator.

There is a standard test suite [2] for static HTML5 parsers. Browsers have additional requirements due to speculation and document.write(), but it looks like [3] Gecko implements that outside the translated parser, so this is code we would have to write and test in any case. The bug thread [4] about landing the HTML5 parser in Gecko may be of interest.

For the short term I will continue to work on the translator and see if we can get more clarity about some of these unknowns. But I'm also inclined to try implementing parts of a new HTML5 parser in Rust. At any rate we should pay close attention to Gecko's parser design, and I will continue reading through that code.
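
To make the memory management concern above a bit more concrete, here is a minimal sketch of one pattern a hand-written Rust parser could use to stay in safe code: nodes live in an arena and the stack of open elements holds indices rather than pointers. Every name here (NodeId, Arena, TreeBuilder, start_tag, end_tag) is hypothetical and chosen for illustration; none of it is taken from validator.nu, Gecko, or Servo, and it is not a claim about how any of them manage memory.

```rust
// Hypothetical sketch: these types are illustrative only and do not
// correspond to anything in validator.nu, Gecko, or Servo.

/// Index of a node inside the arena; stands in for a C++ node pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

#[derive(Debug)]
pub struct Node {
    pub name: String,
    pub parent: Option<NodeId>,
    pub children: Vec<NodeId>,
}

/// Owns every node. Other code stores `NodeId`s, never references,
/// so no borrow of a node outlives a single method call.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    pub fn new_node(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node {
            name: name.to_string(),
            parent: None,
            children: Vec::new(),
        });
        id
    }

    pub fn append_child(&mut self, parent: NodeId, child: NodeId) {
        self.nodes[child.0].parent = Some(parent);
        self.nodes[parent.0].children.push(child);
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }
}

/// Toy tree builder that keeps a stack of open elements as indices.
pub struct TreeBuilder {
    pub arena: Arena,
    pub open_elements: Vec<NodeId>,
}

impl TreeBuilder {
    pub fn new() -> TreeBuilder {
        let mut arena = Arena::default();
        let root = arena.new_node("#document");
        TreeBuilder { arena, open_elements: vec![root] }
    }

    /// Create an element under the current node and push it on the stack.
    pub fn start_tag(&mut self, name: &str) -> NodeId {
        // NodeId is Copy, so this borrow of the stack ends immediately.
        let parent = *self.open_elements.last().expect("stack always holds the root");
        let child = self.arena.new_node(name);
        self.arena.append_child(parent, child);
        self.open_elements.push(child);
        child
    }

    /// Pop the current node, but never pop the document root.
    pub fn end_tag(&mut self) {
        if self.open_elements.len() > 1 {
            self.open_elements.pop();
        }
    }
}
```

The trade-off is that a NodeId is only checked at run time (a bad index panics instead of failing to compile), and a purely syntactic translation of pointer-based C++ would not produce code shaped like this, which is part of why I doubt a close mapping is enough.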

keegan

[1] https://github.com/mozilla/servo/issues/1289
[2] https://github.com/html5lib/html5lib-tests
[3] https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=487949
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo