Re: How to efficiently walk the DOM tree and its strings

Andrew Sutherland Tue, 04 Mar 2014 01:28:25 -0800

On 03/04/2014 03:13 AM, Henri Sivonen wrote:

It saddens me that we are using non-compliant ad hoc parsers when we already have two spec-compliant (at least at some point in time) ones.


Interesting!  I assume you are referring to:
https://github.com/davidflanagan/html5/blob/master/html5parser.js

Which seems to be (explicitly) derived from:
https://github.com/aredridel/html5

Which in turn seems to actually include a few parser variants.

Per the discussion with you on https://groups.google.com/d/msg/mozilla.dev.webapi/wDFM_T9v7Tc/Nr9Df4FUwuwJ for the Gaia e-mail app, we initially ended up using an in-page data document mechanism for sanitization. We later migrated to using a worker-based parser. There were some coordination hiccups with this migration (https://bugzil.la/814257) and some B2G time pressure, so a comprehensive survey of HTML parsers did not really happen.

While we have a defense-in-depth strategy (CSP and iframe sandbox should be protecting us from the worst possible scenarios) and we're hopeful that Service Workers will eventually let us provide nsIContentPolicy-level protection, the quality of the HTML parser is of course fairly important[1] to the operation of the HTML sanitizer. If you'd like to bless a specific implementation for workers to perform streaming HTML parsing, or some other explicit strategy, I'd be happy to file a bug for us to go in that direction. Because we are using a whitelist-based mechanism and are fairly limited and arguably fairly luddite in what we whitelist, it's my hope that our errors are on the side of safety (and breaking adventurous HTML email :), but that is indeed largely hope. Your input is definitely appreciated, especially as it relates to prioritizing such enhancements and the potential risk from our current strategy.
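
To make the whitelist idea concrete, here is a minimal sketch of the general technique. It is not the actual Gaia e-mail sanitizer: the token shape ({type, name, attrs, value}), the tag and attribute lists, and the names sanitizeTokens, ALLOWED_TAGS, DROP_CONTENT, VOID_TAGS and SAFE_URL are all hypothetical, and it assumes some separate parser has already turned the HTML into a flat token stream.

```javascript
// Hypothetical whitelist: tag name -> set of attributes allowed on that tag.
// The real whitelist used by the e-mail app is not shown in this thread.
const ALLOWED_TAGS = new Map([
  ['p', new Set()],
  ['b', new Set()],
  ['i', new Set()],
  ['br', new Set()],
  ['a', new Set(['href'])],
]);
// Elements whose whole content is discarded, not just the tags themselves.
const DROP_CONTENT = new Set(['script', 'style']);
// Elements that never get an end tag in the output.
const VOID_TAGS = new Set(['br']);
// Only these URL schemes survive in href (an assumption for this sketch).
const SAFE_URL = /^(https?:|mailto:)/i;

function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// tokens: array of {type: 'start'|'end'|'text', name?, attrs?, value?},
// assumed to come from a separate HTML parser (not implemented here).
function sanitizeTokens(tokens) {
  let out = '';
  let dropDepth = 0; // > 0 while inside a DROP_CONTENT element
  for (const tok of tokens) {
    if (tok.type === 'start') {
      if (DROP_CONTENT.has(tok.name)) { dropDepth++; continue; }
      if (dropDepth > 0) continue;
      const allowedAttrs = ALLOWED_TAGS.get(tok.name);
      if (!allowedAttrs) continue; // unknown tag: drop the tag, keep its children
      let attrs = '';
      for (const [key, value] of Object.entries(tok.attrs || {})) {
        if (!allowedAttrs.has(key)) continue;
        if (key === 'href' && !SAFE_URL.test(value)) continue;
        attrs += ` ${key}="${escapeHtml(value)}"`;
      }
      out += `<${tok.name}${attrs}>`;
    } else if (tok.type === 'end') {
      if (DROP_CONTENT.has(tok.name)) {
        if (dropDepth > 0) dropDepth--;
        continue;
      }
      if (dropDepth > 0) continue;
      if (ALLOWED_TAGS.has(tok.name) && !VOID_TAGS.has(tok.name)) {
        out += `</${tok.name}>`;
      }
    } else if (tok.type === 'text') {
      if (dropDepth === 0) out += escapeHtml(tok.value);
    }
  }
  return out;
}
```

Anything not explicitly listed is dropped, so a mistake in the whitelist tends to break a message rather than let something dangerous through. The sketch still trusts the token stream completely, which is why the quality of the parser feeding it matters so much.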


Andrew


1: understatement
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
