Hi all,
In the distant past, SpiderMonkey APIs consumed source text as two-byte UCS-2
or one-byte |const char*|. Was one-byte text ASCII? UTF-8? EBCDIC?
Something else? Who could say; no one thought about text encodings then. *By
happenstance* one-byte JS text was Latin-1: a byte is a code point. And so
lots of people used Latin-1 for JS purely because SpiderMonkey's carelessness
made it easy.
SpiderMonkey's UTF-8 source support is far better and clearer now. Most
single-byte source users use UTF-8. So I'm changing the remaining Gecko
Latin-1 users to UTF-8. The following scripts/script loaders now use
exclusively UTF-8:
* JS components/modules (bug 1492932)
* subscripts via mozIJSSubScriptLoader.loadSubScript{,WithOptions} (bug 1492937)
* mochitest-browser scripts, because they're subscripts (bug 1492937)
* SJS scripts executed by httpd.js, because they're subscripts (bug 1513152,
bug 1492937) [0]
Also, proxy autoconfig scripts may now be valid UTF-8 (bug 1492938). (For
compatibility reasons, invalid UTF-8 is treated as Latin-1, by inflating to
UTF-16 and compiling that.)
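
To make the PAC fallback concrete, here is a minimal sketch of the decision described above. It is not Gecko's actual code: decodeScriptBytes is a hypothetical helper, and it assumes the choice is made for the script as a whole (strict UTF-8 first, otherwise every byte inflated to the UTF-16 code unit of the same value).

```javascript
// Hypothetical helper, not a Gecko or SpiderMonkey API.
// Assumption: the fallback applies to the entire script, not per character.
function decodeScriptBytes(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on invalid UTF-8.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Latin-1 inflation: each byte becomes one UTF-16 code unit
    // with the same numeric value.
    let out = "";
    for (const b of bytes) {
      out += String.fromCharCode(b);
    }
    return out;
  }
}

// The string literal "é" (quotes included) encoded two ways.
const validUtf8 = new Uint8Array([0x22, 0xc3, 0xa9, 0x22]); // UTF-8: C3 A9
const invalidUtf8 = new Uint8Array([0x22, 0xe9, 0x22]); // Latin-1: E9

// Both byte sequences decode to the same three-character source text.
console.log(decodeScriptBytes(validUtf8) === decodeScriptBytes(invalidUtf8));
```

The manual loop is used for the fallback instead of a "latin1" TextDecoder label because the Encoding Standard maps that label to windows-1252, which differs from Latin-1 for bytes 0x80 through 0x9F.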
Every affected script in the tree used UTF-8, so this just makes reality match
expectation. But it sometimes changes behavior and may affect patch backports:
* You may use non-ASCII code points directly in scripts (even outside comments)
without needing escape sequences.
* If you *intend* to construct a string of the constituent UTF-8 code units of
a non-ASCII code point, you must use hexadecimal escapes: "\xF0\x9F\x92\xA9".
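
As a rough illustration of the two bullets above (a sketch, not taken from any affected script): the first string is what a directly written U+1F4A9 evaluates to when the file is read as UTF-8, and the second is the hexadecimal-escape form. The \u{...} escape is used here only to keep the example ASCII-safe; it produces the same value as typing the character itself. The variable names are made up for the example.

```javascript
// What a directly written U+1F4A9 yields when the script is read as UTF-8:
// one code point, stored as a surrogate pair (two UTF-16 code units).
const direct = "\u{1F4A9}";

// The UTF-8 code units of that code point, spelled with hex escapes:
// four separate characters, U+00F0 U+009F U+0092 U+00A9.
const codeUnits = "\xF0\x9F\x92\xA9";

// Under the old Latin-1 interpretation, the four UTF-8 bytes of the
// directly written character were each inflated to one character,
// which is exactly the codeUnits string.
const utf8Bytes = new TextEncoder().encode(direct);
const inflated = String.fromCharCode(...utf8Bytes);

console.log(direct.length); // 2
console.log(codeUnits.length); // 4
console.log(inflated === codeUnits); // true
```

So a script that relied on the Latin-1 reading to get the four-character string from a literal non-ASCII character needs the escaped form to keep its old behavior.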
Another step toward fewer text encodings. \o/
Jeff
[0] Note that until bug 1514075 lands, SJS scripts used in Android test runs
will be interpreted as Latin-1 there (and only there). Hopefully we can fix
that quickly!