We’ve discussed using UTF-8 internally for strings in Servo, but
well-formed UTF-8 cannot represent surrogate code points.
JavaScript strings, however, can. (They are effectively potentially
ill-formed UTF-16.) It’s possible (?) that the Web depends on these
surrogates being preserved.
So instead of UTF-8, we can use something we’ll call Wobbly
Transformation Format − 8-bit (WTF-8).
* Specification: https://simonsapin.github.io/wtf-8/
* Rust implementation, as a Cargo library:
https://github.com/SimonSapin/rust-wtf8
* Library documentation:
https://simonsapin.github.io/rust-wtf8/wtf8/index.html
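It is a strict superset of UTF-8 (like UTF-8 is a strict superset of
ASCII), so converting from well-formed UTF-8 is a no-op. It can
losslessly represent every value a JavaScript string can: any sequence
of code points, including surrogates, as long as a lead surrogate is
never immediately followed by a trail surrogate. (Such a pair is always
stored as the supplementary code point it stands for.) Concatenating
needs care to behave like concatenating JS strings would: a lead
surrogate ending one string and a trail surrogate starting the next
must be merged into a single supplementary code point.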
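
To make the concatenation rule concrete, here is a minimal sketch using
only the standard library. It is not the rust-wtf8 API; the names
push_code_point and concat_wtf8 are made up for illustration, and both
inputs are assumed to already be well-formed WTF-8.

```rust
/// Append one code point (0..=0x10FFFF, lone surrogates allowed) to `buf`,
/// using the same bit layout as UTF-8. Hypothetical helper for this sketch.
fn push_code_point(buf: &mut Vec<u8>, cp: u32) {
    match cp {
        0..=0x7F => buf.push(cp as u8),
        0x80..=0x7FF => {
            buf.push(0xC0 | (cp >> 6) as u8);
            buf.push(0x80 | (cp & 0x3F) as u8);
        }
        0x800..=0xFFFF => {
            // Surrogates (U+D800..=U+DFFF) land here: ED A0..BF 80..BF.
            buf.push(0xE0 | (cp >> 12) as u8);
            buf.push(0x80 | ((cp >> 6) & 0x3F) as u8);
            buf.push(0x80 | (cp & 0x3F) as u8);
        }
        _ => {
            buf.push(0xF0 | (cp >> 18) as u8);
            buf.push(0x80 | ((cp >> 12) & 0x3F) as u8);
            buf.push(0x80 | ((cp >> 6) & 0x3F) as u8);
            buf.push(0x80 | (cp & 0x3F) as u8);
        }
    }
}

/// If `bytes` ends with an encoded lead surrogate (U+D800..=U+DBFF), return it.
fn final_lead_surrogate(bytes: &[u8]) -> Option<u32> {
    match bytes {
        [.., 0xED, b2 @ 0xA0..=0xAF, b3] => {
            Some(0xD000 | ((*b2 as u32 & 0x3F) << 6) | (*b3 as u32 & 0x3F))
        }
        _ => None,
    }
}

/// If `bytes` starts with an encoded trail surrogate (U+DC00..=U+DFFF), return it.
fn initial_trail_surrogate(bytes: &[u8]) -> Option<u32> {
    match bytes {
        [0xED, b2 @ 0xB0..=0xBF, b3, ..] => {
            Some(0xD000 | ((*b2 as u32 & 0x3F) << 6) | (*b3 as u32 & 0x3F))
        }
        _ => None,
    }
}

/// Append `right` to `left`, merging a newly formed surrogate pair at the
/// seam into one supplementary code point, as JS string concatenation
/// effectively does.
fn concat_wtf8(left: &mut Vec<u8>, right: &[u8]) {
    match (final_lead_surrogate(left), initial_trail_surrogate(right)) {
        (Some(lead), Some(trail)) => {
            let cp = 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
            left.truncate(left.len() - 3); // drop the 3-byte lead surrogate
            push_code_point(left, cp);
            left.extend_from_slice(&right[3..]); // skip the 3-byte trail surrogate
        }
        _ => left.extend_from_slice(right),
    }
}
```

For example, concatenating "a" + U+D83D with U+DCA9 + "b" gives the
bytes of the well-formed UTF-8 string "a\u{1F4A9}b". In every other
case the operation is a plain byte append.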
Proposal for Servo: use WTF-8 internally for all strings in the DOM and
for HTML parsing.
* rust-encoding decodes bytes from the network into well-formed UTF-8
* document.write() converts its argument to WTF-8
* The html5ever tokenizer takes buffers that are either UTF-8 or WTF-8
* html5ever uses WTF-8 everywhere internally, and emits WTF-8 to the
tree builder.
* (Optionally, html5ever could support a separate UTF-8 only interface
for non-Servo users that don’t need to support document.write().)
* Servo’s DOM stores WTF-8.
* Strings are converted to/from potentially ill-formed UTF-16 (that
SpiderMonkey can use) by the bindings code generation, at the boundary
between JavaScript and Rust.
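
The conversion at the JavaScript/Rust boundary could look roughly like
the sketch below. This is an illustration, not Servo’s bindings code or
the rust-wtf8 API: the function names are hypothetical, and
wtf8_to_utf16 assumes its input is well-formed WTF-8 (it would panic on
truncated input).

```rust
use std::char::decode_utf16;

/// Convert potentially ill-formed UTF-16 (as a JS engine would hand it
/// over) to WTF-8. Valid surrogate pairs become 4-byte sequences; lone
/// surrogates are kept as 3-byte sequences instead of being replaced.
fn wtf8_from_utf16(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    for item in decode_utf16(units.iter().copied()) {
        match item {
            Ok(c) => {
                let mut tmp = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
            }
            Err(e) => {
                // Lone surrogate: encode it with the 3-byte UTF-8 bit layout.
                let s = e.unpaired_surrogate() as u32;
                out.push(0xED);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}

/// Convert well-formed WTF-8 back to potentially ill-formed UTF-16.
fn wtf8_to_utf16(bytes: &[u8]) -> Vec<u16> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        // Sequence length and payload bits of the first byte.
        let (len, mut cp) = match b {
            0x00..=0x7F => (1, b as u32),
            0xC0..=0xDF => (2, (b & 0x1F) as u32),
            0xE0..=0xEF => (3, (b & 0x0F) as u32),
            _ => (4, (b & 0x07) as u32),
        };
        for &cont in &bytes[i + 1..i + len] {
            cp = (cp << 6) | (cont & 0x3F) as u32;
        }
        i += len;
        if cp < 0x10000 {
            // BMP code point, including lone surrogates: one code unit.
            out.push(cp as u16);
        } else {
            // Supplementary code point: split into a surrogate pair.
            let v = cp - 0x10000;
            out.push(0xD800 | (v >> 10) as u16);
            out.push(0xDC00 | (v & 0x3FF) as u16);
        }
    }
    out
}
```

Because WTF-8 never stores a surrogate pair as two separate 3-byte
sequences, the round trip UTF-16 to WTF-8 and back returns the original
code units unchanged.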
In the future, if the JS team thinks it’s a good idea (and figures
something out for .charAt() and friends), SpiderMonkey could support
WTF-8 internally for JS strings and Servo’s bindings could remove the
conversion.
What do you think?
--
Simon Sapin
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo