I was hoping to write a more thorough blog post about this proposal (I have some notes in a gist [1]), but for now I've added comments inline. The main takeaway here is that I want to do a bare-bones replacement of just the parts of expat we currently use. It needs to support DTD entities, have a streaming interface, and support XML 1.0 4th edition. That's it: no new features, no rewrite of our entire XML stack.

[1] https://gist.github.com/EricRahm/f718c4d8a862cc08b69d7d4290c02927

On Mon, May 22, 2017 at 11:43 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:

> In reference to: https://twitter.com/nnethercote/status/866792097101238272
>
> Is the rewrite meant to replace expat only or also some of our old
> code both above and below expat?

Just expat.

> Back in 2011, I wrote a plan for rewriting the code around expat
> without rewriting expat itself:
> https://wiki.mozilla.org/Platform/XML_Rewrite
> I've had higher-priority stuff to do ever since...

Yes, I've seen this. It explicitly calls out not replacing expat, so the plans are mostly orthogonal.

> (The above plan talks about pushing UTF-16 to the XML parser and
> having deep C++ namespaces. Any project starting this year should make
> the new parser use UTF-8 internally for cache-friendliness and use
> less deep C++ namespaces.)

Our current interface is UTF-16, so that's my target for now. I think whatever cache-friendliness we gained would be lost converting from UTF-16 -> UTF-8 -> UTF-16.

> Also, I think the decision of which XML version to support should be a
> deliberate decision and not an accident. I think the reasonable
> choices are XML 1.0 4th edition (not rocking the boat) and reviving
> XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> cake too and expanded the set of documents that a parser doesn't reject.
> Any of the newly well-formed documents would be incompatible with 4th
> ed. and earlier parsers, which would be a break from universal XML
> interop. I think it doesn't make sense to relax XML only a bit. If XML
> is to be relaxed (breaking interop in the sense of starting to accept
> docs that old browsers would show the Yellow Screen of Death on), we
> should go all the way (i.e. XML5).

My current goal is a drop-in replacement for expat with just the features Gecko cares about, so just XML 1.0 4th edition, I guess. It's possible whatever we end up with could be merged with another library when XML5 is settled, but I don't want to wait for that.

> Notably, it looks like Servo already has an XML5 parser written in Rust:
> https://github.com/servo/html5ever/tree/master/xml5ever

Yes, this lacks DTD support (and 1.0 support).

> The tweets weren't clear about whether xml5ever had been considered,
> but https://twitter.com/eroc/status/866808814959378434 looks like it's
> talking about writing a new one.

Correct, I looked at xml5ever and spoke with some folks on #servo about it. It doesn't meet Firefox's requirements.

> It seems like integrating xml5ever (as opposed to another XML parser
> written in Rust) into Gecko would give some insight into how big a
> deal it would be to replace Gecko's HTML parser with html5ever
> (although due to document.write(), HTML is always a bigger deal
> integration-wise than XML).

That's a non-goal for me, but I can see how it would be useful.

> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to do a unilateral thing in
> relative secret.)

That is not my current goal, but that seems reasonable regardless of this project.

> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform