# Intent to ship: UTF-8 parsing of external `<script>`s and worker scripts

## Introduction

JS acts on 16-bit code units (UTF-16 with lone surrogates permitted), because 
1990s.  💯  As a consequence, SpiderMonkey has long handled only 16-bit source 
text.  APIs taking `const char*` or `const JS::Latin1Char*` or similar just 
inflated to UTF-16 and processed that.
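
To make "16-bit code units" and "inflated" concrete, here is a minimal Python sketch. It is not SpiderMonkey code: the helper names `inflate_latin1` and `utf16_code_units` are made up for illustration, and Python's `surrogatepass` error handler stands in for JS's tolerance of lone surrogates.

```python
def inflate_latin1(data: bytes) -> list[int]:
    """Widen each Latin-1 byte to a 16-bit code unit of the same value.

    Latin-1 byte values coincide with the first 256 Unicode code points,
    so inflation is a plain widening copy (hypothetical helper).
    """
    return list(data)


def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text, permitting lone surrogates."""
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# "café" as Latin-1 is 4 bytes; inflated, it is 4 code units (8 bytes).
print(inflate_latin1(b"caf\xe9"))

# A non-BMP character occupies two code units (a surrogate pair).
print([hex(u) for u in utf16_code_units("\U0001F4AF")])  # ['0xd83d', '0xdcaf']

# A lone surrogate is a legal 16-bit code unit sequence on its own.
print([hex(u) for u in utf16_code_units("\ud800")])  # ['0xd800']
```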

Now, UTF-8 is ubiquitous.  And scripts have lots of ASCII keywords and 
operators compactly represented in UTF-8.

I've been making SpiderMonkey natively handle UTF-8 source text, using lots of 
templates and template specializations.  (*Only* valid UTF-8: any invalidity, 
including WTF-8-style encoded lone surrogates, is an immediate error, with no 
replacement-character semantics applied.)  For mostly-ASCII scripts, 
uncompressed UTF-8 typically requires about half the bytes and processing of 
UTF-16.  And compressed UTF-8 is also generally smaller than compressed 
UTF-16, because compressors needn't devote bandwidth to lots of null bytes.
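
The size and strictness claims can be checked with a short Python sketch. Everything here is an assumption for illustration: the sample script is invented, `zlib` stands in for whatever compression a server actually uses, and Python's strict decoder only approximates the "any invalidity is an error" rule (it is not SpiderMonkey's validator).

```python
import zlib

# Hypothetical sample script: pure ASCII, like much real-world JS.
SOURCE = "function add(a, b) { return a + b; }\n" * 50


def encoded_sizes(text: str) -> dict:
    """Byte counts for text as UTF-8 and UTF-16, raw and zlib-compressed."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no BOM
    return {
        "utf8": len(utf8),
        "utf16": len(utf16),
        "utf8_zlib": len(zlib.compress(utf8)),
        "utf16_zlib": len(zlib.compress(utf16)),
    }


def is_valid_utf8(data: bytes) -> bool:
    """True only for strictly valid UTF-8; nothing is replaced or repaired."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


sizes = encoded_sizes(SOURCE)
print(sizes)  # for ASCII-only text, "utf16" is exactly twice "utf8"

# U+D800 encoded the way WTF-8 would encode it: valid WTF-8, invalid UTF-8.
ENCODED_LONE_SURROGATE = b"\xed\xa0\x80"
print(is_valid_utf8(ENCODED_LONE_SURROGATE))  # False
```

The compressed sizes depend on the input and the compressor, so treat them as something to measure on real scripts rather than a guaranteed ratio.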

Since late May, DOM workers' accumulated UTF-8 data has been directly parsed 
as UTF-8 in nightly builds.  Since mid-June, external `<script>` data has also 
been accumulated and then directly parsed as UTF-8 in nightly builds 
(pref-controlled, in both cases).  (I haven't changed inline `<script>`s: 
they're often small and aren't stored as UTF-8, so benefits there are less 
clear.)  Bugs found and created during the entire effort were all readily 
fixed.  No bugs have been reported on the pref-flips -- or at least you 
haven't reported any.  ;-)

So it seems like a good time to set the pref to flatly true in nightly and 
beta.  The full month and a half remaining until the next merge ought to be 
plenty of time for the beta audience to suss out any remaining issues for 
fixing.

## Tracking bugs

https://bugzilla.mozilla.org/show_bug.cgi?id=1543517 (to enable in beta)
https://bugzilla.mozilla.org/show_bug.cgi?id=1543514 (to enable in release -- 
but if the prior bug is fixed, this will just happen naturally next uplift)

## Platform coverage

All

## Estimated or target release

69, if all goes to plan

## Where to send your bugs

Bugs in the JS side of this go here:

https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=JavaScript%20Engine

Bugs in the DOM side, prior to invoking UTF-8 parsing, go here:

https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=DOM:%20Core%20%26%20HTML

If you don't know which you have, file a JS bug and put a needinfo on me, and 
I'll move it to the right place.

Jeff
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform