We’ve discussed using UTF-8 internally for strings in Servo, but well-formed UTF-8 cannot represent surrogate code points.

JavaScript strings, however, can. (They are effectively potentially ill-formed UTF-16.) It’s possible (?) that the Web depends on these surrogates being preserved.
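To make the limitation concrete, here is a minimal sketch using only the Rust standard library. The helper names `encode_three_bytes` and `is_well_formed_utf8` are made up for this illustration. It applies the ordinary three-byte UTF-8 bit pattern to a code point and asks the standard library whether the result is well-formed UTF-8.

```rust
/// Hypothetical helper: apply the three-byte UTF-8 bit pattern
/// (1110xxxx 10xxxxxx 10xxxxxx) to a code point in 0x0800..=0xFFFF,
/// without checking whether the code point is a surrogate.
fn encode_three_bytes(code_point: u16) -> [u8; 3] {
    debug_assert!(code_point >= 0x0800);
    [
        0xE0 | (code_point >> 12) as u8,
        0x80 | ((code_point >> 6) & 0x3F) as u8,
        0x80 | (code_point & 0x3F) as u8,
    ]
}

/// Hypothetical helper: ask the standard library whether `bytes`
/// is well-formed UTF-8.
fn is_well_formed_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

With these, `is_well_formed_utf8(&encode_three_bytes(0x20AC))` is true (U+20AC is the euro sign), while the same call with the surrogate 0xD800 is false: the bytes ED A0 80 follow the bit pattern but are rejected as UTF-8. For the same reason, `char::from_u32(0xD800)` returns `None`.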

So instead of UTF-8, we can use something we’ll call Wobbly Transformation Format − 8-bit (WTF-8).

* Specification: https://simonsapin.github.io/wtf-8/
* Rust implementation, as a Cargo library:
  https://github.com/SimonSapin/rust-wtf8
* Library documentation:
  https://simonsapin.github.io/rust-wtf8/wtf8/index.html

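It is a strict superset of UTF-8 (like UTF-8 is a strict superset of ASCII), so converting from well-formed UTF-8 is a no-op. It can losslessly represent every value a JavaScript string can hold: any sequence of code points, including surrogates, as long as a lead surrogate is not immediately followed by a trail surrogate. (Such a pair is always represented as the single supplementary code point it encodes.) Concatenation needs care to behave like concatenating JS strings would: if the first string ends with a lead surrogate and the second starts with a trail surrogate, the newly formed pair must be converted into a supplementary code point.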
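The concatenation rule can be sketched as follows. This is an illustration on raw byte slices that assumes both inputs are already well-formed WTF-8; it is not the API of the rust-wtf8 library, and the function names `decode_surrogate` and `wtf8_concat` are invented for the example.

```rust
/// Hypothetical helper: if `bytes` is exactly the three-byte encoding
/// of a surrogate (ED A0..BF 80..BF), return that surrogate.
fn decode_surrogate(bytes: &[u8]) -> Option<u16> {
    if bytes.len() == 3
        && bytes[0] == 0xED
        && (0xA0..=0xBF).contains(&bytes[1])
        && (0x80..=0xBF).contains(&bytes[2])
    {
        Some(0xD000 | (((bytes[1] & 0x3F) as u16) << 6) | ((bytes[2] & 0x3F) as u16))
    } else {
        None
    }
}

/// Concatenate two byte strings assumed to be well-formed WTF-8.
/// If `left` ends with a lead surrogate and `right` starts with a trail
/// surrogate, the pair is replaced by the four-byte encoding of the
/// supplementary code point it represents.
fn wtf8_concat(left: &[u8], right: &[u8]) -> Vec<u8> {
    // 0xED is never a continuation byte, so looking at the last three
    // bytes is enough to detect a final surrogate.
    let lead = left
        .len()
        .checked_sub(3)
        .and_then(|start| decode_surrogate(&left[start..]))
        .filter(|s| (0xD800..=0xDBFF).contains(s));
    let trail = right
        .get(..3)
        .and_then(|b| decode_surrogate(b))
        .filter(|s| (0xDC00..=0xDFFF).contains(s));

    let mut out = Vec::with_capacity(left.len() + right.len());
    match (lead, trail) {
        (Some(hi), Some(lo)) => {
            let cp = 0x10000 + (((hi as u32 - 0xD800) << 10) | (lo as u32 - 0xDC00));
            let c = char::from_u32(cp).expect("a surrogate pair maps to a valid scalar value");
            let mut buf = [0u8; 4];
            out.extend_from_slice(&left[..left.len() - 3]);
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            out.extend_from_slice(&right[3..]);
        }
        _ => {
            // No newly formed pair: plain byte concatenation is correct.
            out.extend_from_slice(left);
            out.extend_from_slice(right);
        }
    }
    out
}
```

For example, concatenating a string ending in the lead surrogate U+D83D (bytes ED A0 BD) with one starting with the trail surrogate U+DCA9 (bytes ED B2 A9) yields the four bytes F0 9F 92 A9 for U+1F4A9 at the join, which matches what JS string concatenation would produce once re-encoded.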


Proposal for Servo: use WTF-8 internally for all strings in the DOM and for HTML parsing.

* rust-encoding decodes bytes from the network into well-formed UTF-8
* document.write() converts its argument to WTF-8
* The html5ever tokenizer takes buffers that are either UTF-8 or WTF-8
* html5ever uses WTF-8 everywhere internally, and emits WTF-8 to the tree builder.
* (Optionally, html5ever could support a separate UTF-8 only interface for non-Servo users that don’t need to support document.write().)
* Servo’s DOM stores WTF-8.
* Strings are converted to/from potentially ill-formed UTF-16 (that SpiderMonkey can use) by the bindings code generation, at the boundary between JavaScript and Rust.
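The conversion at the JavaScript/Rust boundary could look roughly like the sketch below. It uses only the standard library, the names `wtf8_from_utf16` and `wtf8_to_utf16` are invented for this example, and it is not meant to describe what the bindings code generation or the rust-wtf8 library actually do. The decoder assumes its input is well-formed WTF-8 and would panic on truncated input.

```rust
/// Convert potentially ill-formed UTF-16 to WTF-8 bytes.
/// Valid surrogate pairs become four-byte sequences; unpaired
/// surrogates are kept, using the three-byte bit pattern.
fn wtf8_from_utf16(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    for item in std::char::decode_utf16(units.iter().cloned()) {
        match item {
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            Err(e) => {
                let s = e.unpaired_surrogate();
                out.push(0xE0 | (s >> 12) as u8);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}

/// Convert WTF-8 bytes back to potentially ill-formed UTF-16.
/// Assumption: `bytes` is well-formed WTF-8 (no validation is done).
fn wtf8_to_utf16(bytes: &[u8]) -> Vec<u16> {
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        // Sequence length and payload bits of the first byte.
        let (len, first_bits) = match b0 {
            0x00..=0x7F => (1, b0 as u32),
            0xC0..=0xDF => (2, (b0 & 0x1F) as u32),
            0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
            _ => (4, (b0 & 0x07) as u32),
        };
        let mut cp = first_bits;
        for &b in &bytes[i + 1..i + len] {
            cp = (cp << 6) | (b & 0x3F) as u32;
        }
        i += len;
        if cp >= 0x10000 {
            // Supplementary code point: emit a surrogate pair.
            let v = cp - 0x10000;
            out.push(0xD800 | (v >> 10) as u16);
            out.push(0xDC00 | (v & 0x3FF) as u16);
        } else {
            // BMP code point, possibly a lone surrogate: one code unit.
            out.push(cp as u16);
        }
    }
    out
}
```

Round-tripping the code units `[0x61, 0xD83D, 0xDCA9, 0xD800, 0x62]` through both functions gives back the same units, including the unpaired 0xD800, which is the lossless property the proposal relies on.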


In the future, if the JS team thinks it’s a good idea (and figures something out for .charAt() and friends), SpiderMonkey could support WTF-8 internally for JS strings and Servo’s bindings could remove the conversion.


What do you think?
--
Simon Sapin
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
