I was hoping to write a more thorough blog post about this proposal (I have
some notes in a gist [1]), but for now I've added comments inline. The main
takeaway here is that I want to do a bare-bones replacement of just the
parts of expat we currently use. It needs to support DTD entities, have a
streaming interface, and support XML 1.0 4th edition. That's it: no new
features, no rewrite of our entire XML stack.
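
To make "DTD entities" and "streaming interface" concrete, here is a minimal
sketch using Python's standard-library binding to expat. It is only an
illustration of the behavior a replacement would need to preserve, not the
proposed implementation; the function name parse_streaming, the sample
document, and the chunk size are all made up for this example.

```python
import xml.parsers.expat


def parse_streaming(data: bytes, chunk_size: int = 8):
    """Feed `data` to expat in small chunks and collect parse events."""
    events = []

    def on_start(name, attrs):
        events.append(("start", name))

    def on_end(name):
        events.append(("end", name))

    def on_text(text):
        # expat may deliver character data in several pieces, so merge
        # adjacent text events to get a stable result.
        if events and events[-1][0] == "text":
            events[-1] = ("text", events[-1][1] + text)
        else:
            events.append(("text", text))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    # Streaming: the parser sees the document piece by piece, as it would
    # when data arrives from the network.
    for offset in range(0, len(data), chunk_size):
        parser.Parse(data[offset:offset + chunk_size], False)
    parser.Parse(b"", True)  # signal end of input
    return events


# Hypothetical sample: an entity declared in the internal DTD subset and
# referenced in element content.
DOC = b'<!DOCTYPE r [<!ENTITY greet "hello">]><r>&greet; world</r>'
events = parse_streaming(DOC)
```

The chunk boundaries fall in the middle of tags and of the entity reference,
which is exactly the case a streaming parser has to handle.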

-e

[1] https://gist.github.com/EricRahm/f718c4d8a862cc08b69d7d4290c02927

On Mon, May 22, 2017 at 11:43 PM, Henri Sivonen <hsivo...@hsivonen.fi>
wrote:

> In reference to: https://twitter.com/nnethercote/status/866792097101238272
>
> Is the rewrite meant to replace expat only or also some of our old
> code both above and below expat?
>

Just expat.


> Back in 2011, I wrote a plan for rewriting the code around expat
> without rewriting expat itself:
> https://wiki.mozilla.org/Platform/XML_Rewrite
> I've had higher-priority stuff to do ever since...
>
>
Yes, I've seen this. It explicitly calls out not replacing expat, so the
plans are mostly orthogonal.


> (The above plan talks about pushing UTF-16 to the XML parser and
> having deep C++ namespaces. Any project starting this year should make
> the new parser use UTF-8 internally for cache-friendliness and use
> less deep C++ namespaces.)
>

Our current interface is UTF-16, so that's my target for now. I think
whatever cache-friendliness we gained from UTF-8 would be lost converting
from UTF-16 -> UTF-8 -> UTF-16.
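
A rough sketch of the round trip in question, in Python purely for
illustration. The function name parse_via_utf8 is hypothetical and the
"parser" in the middle is an identity stand-in; the point is only that a
UTF-8 core behind a UTF-16 interface pays for two full transcoding passes.

```python
def parse_via_utf8(utf16_input: bytes) -> bytes:
    """Model a UTF-8 parser sitting behind a UTF-16 interface."""
    # Pass 1: transcode the caller's UTF-16 input to UTF-8.
    utf8_input = utf16_input.decode("utf-16-le").encode("utf-8")

    # A UTF-8 parser would run here; this stand-in passes text through.
    utf8_output = utf8_input

    # Pass 2: transcode the results back to UTF-16 for the caller.
    return utf8_output.decode("utf-8").encode("utf-16-le")


sample = "<r>h\u00e9llo</r>".encode("utf-16-le")
round_tripped = parse_via_utf8(sample)
```

Whether those two passes outweigh the cache benefit would need measuring;
the sketch only shows where the extra work comes from.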


> Also, I think the decision of which XML version to support should be a
> deliberate decision and not an accident. I think the reasonable
> choices are XML 1.0 4th edition (not rocking the boat) and reviving
> XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> cake too and expanded the set of documents that parser doesn't reject.
> Any of the newly well-forming documents would be incompatible with 4th
> ed. and earlier parsers, which would be a break from universal XML
> interop. I think it doesn't make sense to relax XML only a bit. If XML
> is to be relaxed (breaking interop in the sense of starting to accept
> docs that old browsers would show the Yellow Screen of Death on), we
> should go all the way (i.e. XML5).
>
>
My current goal is a drop-in replacement for expat with just the features
Gecko cares about, so just XML 1.0 4th edition. It's possible that whatever
we end up with could be merged with another library once XML5 is settled,
but I don't want to wait for that.


> Notably, it looks like Servo already has an XML5 parser written in Rust:
> https://github.com/servo/html5ever/tree/master/xml5ever
>
>
Yes, but it lacks DTD support (and XML 1.0 support).


> The tweets weren't clear about whether xml5ever had been considered,
> but https://twitter.com/eroc/status/866808814959378434 looks like it's
> talking about writing a new one.
>
>
Correct, I looked at xml5ever and spoke with some folks on #servo about it.
It doesn't meet Firefox's requirements.


> It seems like integrating xml5ever (as opposed to another XML parser
> written in Rust) into Gecko would give some insight into how big a
> deal it would be to replace Gecko's HTML parser with html5ever
> (although due to document.write(), HTML is always a bigger deal
> integration-wise than XML).
>
>
That's a non-goal for me, but I can see how it would be useful.


> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to do a unilateral thing in
> relative secret.)
>
>
That is not my current goal, but that seems reasonable regardless of this
project.


> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/
>
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
