On 03/04/2014 03:13 AM, Henri Sivonen wrote:
> It saddens me that we are using non-compliant ad hoc parsers when we already have two spec-compliant (at least at some point in time) ones.

Interesting!  I assume you are referring to:
https://github.com/davidflanagan/html5/blob/master/html5parser.js

Which seems to be (explicitly) derived from:
https://github.com/aredridel/html5

Which in turn seems to actually include a few parser variants.

Per the discussion with you on https://groups.google.com/d/msg/mozilla.dev.webapi/wDFM_T9v7Tc/Nr9Df4FUwuwJ for the Gaia e-mail app we initially ended up using an in-page data document mechanism for sanitization. We later migrated to using a worker-based parser. There were some coordination hiccups with this migration (https://bugzil.la/814257) and some B2G time pressure, so a comprehensive survey of HTML parsers never really happened.

While we have a defense-in-depth strategy (CSP and iframe sandbox should be protecting us from the worst possible scenarios) and we're hopeful that Service Workers will eventually let us provide nsIContentPolicy-level protection, the quality of the HTML parser is of course fairly important[1] to the operation of the HTML sanitizer. If you'd like to bless a specific implementation for workers to perform streaming HTML parsing, or some other explicit strategy, I'd be happy to file a bug for us to go in that direction. Because we are using a whitelist-based mechanism and are fairly limited and arguably fairly luddite in what we whitelist, it's my hope that our errors are on the side of safety (and breaking adventurous HTML email :), but that is indeed largely hope. Your input is definitely appreciated, especially as it relates to prioritizing such enhancements and potential risk from our current strategy.
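
For illustration only, here is a minimal Python sketch of what a whitelist-based filter over an already-tokenized HTML stream can look like. It is not the Gaia e-mail app's sanitizer (which is JavaScript); the token classes, the `sanitize` function, and the whitelist contents are all hypothetical. It also assumes the tokenizer feeding it is spec-compliant and lowercases tag and attribute names, which is exactly the part the filter cannot fix on its own.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Union

# Hypothetical whitelist for illustration; not the real e-mail app's list.
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "div", "i", "p", "span"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_URL_SCHEMES = ("http:", "https:", "mailto:")
# Elements whose text content is dropped along with the tags themselves.
DROP_WITH_CONTENT = {"script", "style"}


@dataclass
class StartTag:
    name: str
    attrs: Dict[str, str]


@dataclass
class EndTag:
    name: str


@dataclass
class Text:
    data: str


Token = Union[StartTag, EndTag, Text]


def _attr_is_safe(attr: str, value: str) -> bool:
    """URL-valued attributes must use a whitelisted scheme."""
    if attr == "href":
        return value.strip().lower().startswith(ALLOWED_URL_SCHEMES)
    return True


def sanitize(tokens: Iterable[Token]) -> Iterator[Token]:
    """Yield only whitelisted tags and attributes.

    Unknown tags are unwrapped (their text is kept); script/style
    are removed together with their content. Anything not explicitly
    allowed is dropped, so mistakes tend to break markup rather than
    let it through. Text is passed along unescaped; the serializer
    is assumed to escape it.
    """
    drop_depth = 0
    for tok in tokens:
        if isinstance(tok, StartTag) and tok.name in DROP_WITH_CONTENT:
            drop_depth += 1
            continue
        if isinstance(tok, EndTag) and tok.name in DROP_WITH_CONTENT:
            drop_depth = max(0, drop_depth - 1)
            continue
        if drop_depth > 0:
            continue  # inside a dropped element
        if isinstance(tok, Text):
            yield tok
        elif tok.name in ALLOWED_TAGS:
            if isinstance(tok, StartTag):
                allowed = ALLOWED_ATTRS.get(tok.name, set())
                kept = {k: v for k, v in tok.attrs.items()
                        if k in allowed and _attr_is_safe(k, v)}
                yield StartTag(tok.name, kept)
            else:
                yield tok
        # Any other tag is silently dropped; its text children survive.
```

The filter only ever sees what the tokenizer reports, so a parser that disagrees with the browser about where a tag begins or ends can defeat it regardless of how conservative the whitelist is.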

Andrew


1: understatement
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
