# Intent to ship: UTF-8 parsing of external <script>s and worker scripts
## Introduction

JS acts on 16-bit code units (UTF-16 with lone surrogates permitted), because 1990s. 💯

As a consequence, SpiderMonkey has long handled only 16-bit source text. APIs taking `const char*` or `const JS::Latin1Char*` or similar just inflated to UTF-16 and processed that.

Now, UTF-8 is ubiquitous. And scripts have lots of ASCII keywords and operators compactly represented in UTF-8.

I've been making SpiderMonkey natively handle UTF-8 source text, using lots of templates and template specializations. (*Only* valid UTF-8: any invalidity, including for WTF-8, is an immediate error, no replacement-character semantics applied.)

Uncompressed UTF-8 typically requires half the bytes and processing of UTF-16. And compressed UTF-8 is also generally smaller than compressed UTF-16, because compressors needn't devote bandwidth to lots of null bytes.

Since late May, DOM workers' accumulated UTF-8 data is directly parsed as UTF-8 in nightly builds. Since mid-June, external <script> data is also accumulated and then directly parsed as UTF-8 in nightly builds (pref-controlled, in both cases). (I haven't changed inline <script>s: they're often small and aren't stored as UTF-8, so benefits there are less clear.)

Bugs found and created during the entire effort were all readily fixed. No bugs have been reported on the pref-flips -- or at least you haven't reported any. ;-)

So it seems like a good time to set the pref to flatly true in nightly and beta. The full month and a half remaining til next merge ought be plenty of time for the beta audience to suss out any remaining issues for fixing.
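To make the size and strictness points concrete, here is a minimal sketch in Python (not SpiderMonkey code, and not how the engine implements any of this). It compares the raw and zlib-compressed sizes of the same script text in UTF-8 and UTF-16, and shows what "only valid UTF-8, no replacement characters" means using Python's strict decoder as a stand-in. The sample script text, the helper names `sizes` and `is_strictly_valid_utf8`, and the choice of zlib are all assumptions made for illustration.

```python
import zlib

# Hypothetical sample script: mostly ASCII, with a couple of non-ASCII characters.
SOURCE = 'function greet(name) { return "héllo, " + name + " 💯"; }\n' * 200

utf8 = SOURCE.encode("utf-8")
utf16 = SOURCE.encode("utf-16-le")  # no BOM, two bytes per 16-bit code unit


def sizes(data: bytes) -> tuple:
    """Return (raw size, zlib-compressed size) in bytes."""
    return len(data), len(zlib.compress(data, 9))


def is_strictly_valid_utf8(data: bytes) -> bool:
    """True only for well-formed UTF-8; nothing is replaced or repaired."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# A lone surrogate (U+D800) encoded the way WTF-8 would: ED A0 80.
WTF8_LONE_SURROGATE = b"var s = '\xed\xa0\x80';"
# A multi-byte sequence cut off after two of its three bytes.
TRUNCATED = b"var s = '\xe2\x82';"

if __name__ == "__main__":
    print("UTF-8  (raw, compressed):", sizes(utf8))
    print("UTF-16 (raw, compressed):", sizes(utf16))
    print("valid source accepted:", is_strictly_valid_utf8(utf8))
    print("lone surrogate accepted:", is_strictly_valid_utf8(WTF8_LONE_SURROGATE))
    print("truncated sequence accepted:", is_strictly_valid_utf8(TRUNCATED))
```

For mostly-ASCII text like this, the raw UTF-8 size comes out at a bit over half the UTF-16 size. The compressed sizes depend on the input and the compressor, so the sketch just prints them rather than promising a particular ratio.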
## Tracking bugs

https://bugzilla.mozilla.org/show_bug.cgi?id=1543517 (to enable in beta)

https://bugzilla.mozilla.org/show_bug.cgi?id=1543514 (to enable in release -- but if the prior bug is fixed, this will just happen naturally next uplift)

## Platform coverage

All

## Estimated or target release

69, if all goes to plan

## Where to send your bugs

Bugs in the JS side of this go here:

https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=JavaScript%20Engine

Bugs in the DOM side, prior to invoking UTF-8 parsing, go here:

https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=DOM:%20Core%20%26%20HTML

If you don't know which you have, file a JS bug and put a needinfo on me, and I'll move it to the right place.

Jeff

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform