With the recently landed changes in <http://trac.webkit.org/changeset/100510> 
(and two subsequent fixes in <http://trac.webkit.org/changeset/100523> and 
<http://trac.webkit.org/changeset/100729>), strings in JavaScriptCore are 
stored internally in either 8 bit or 16 bit form.  This is implemented in the 
StringImpl class and the classes based upon it, such as JSC::UString and 
WTF::String.  Because the signedness of plain "char" varies (signed on most 
platforms, unsigned on a few, and selectable via a compiler option on others), 
we added "typedef unsigned char LChar" in <wtf/unicode/Unicode.h> so that 8 bit 
characters are always unsigned.
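
To illustrate why an unsigned 8 bit type matters, here is a minimal sketch. 
The LChar typedef is the one described above; the helper name widenLatin1 is 
hypothetical and exists only for this example. Widening a plain char directly 
can sign-extend on platforms where char is signed, while going through LChar 
always zero-extends.

```cpp
// Same typedef as the one added to <wtf/unicode/Unicode.h>.
typedef unsigned char LChar;

// Hypothetical helper, not part of WebKit: widen one Latin-1 byte to a
// 16 bit code unit. Casting through LChar first guarantees zero-extension
// regardless of whether plain char is signed on this platform.
inline char16_t widenLatin1(char c)
{
    return static_cast<char16_t>(static_cast<LChar>(c));
}
```

With this, the byte 0xE9 widens to 0x00E9 whether or not plain char is 
signed.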

Changes to Using Strings

Although the UChar* characters() method for the various string classes still 
works, all new code should first check which "flavor" a string is by calling 
the new is8Bit() method on the string, and then call either 
LChar* characters8() or UChar* characters16() as appropriate to access the raw 
characters.  Calling characters() on an 8 bit string creates a 16 bit buffer, 
converts the native 8 bit string into it, and keeps the converted copy for 
future use before returning the 16 bit result.  The cost of this conversion 
grows with the string's length and, since the converted copy is retained, it 
increases the memory footprint beyond what was required by the original 16 bit 
string implementation.
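
The check-then-dispatch pattern described above might look roughly like the 
following sketch. MiniString is a hypothetical stand-in written for this 
example, not the real StringImpl; only the method names is8Bit(), 
characters8() and characters16() come from the change itself. The UChar 
typedef and the countSpaces helpers are likewise assumptions made for 
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef unsigned char LChar;
typedef char16_t UChar; // assumption: a 16 bit code unit type

// Hypothetical miniature of a dual-width string. Not the real StringImpl.
class MiniString {
public:
    static MiniString make8(std::vector<LChar> chars)
    {
        MiniString s;
        s.m_is8Bit = true;
        s.m_data8 = std::move(chars);
        return s;
    }

    static MiniString make16(std::vector<UChar> chars)
    {
        MiniString s;
        s.m_is8Bit = false;
        s.m_data16 = std::move(chars);
        return s;
    }

    bool is8Bit() const { return m_is8Bit; }
    size_t length() const { return m_is8Bit ? m_data8.size() : m_data16.size(); }

    // Each accessor is only valid for the matching flavor.
    const LChar* characters8() const { assert(m_is8Bit); return m_data8.data(); }
    const UChar* characters16() const { assert(!m_is8Bit); return m_data16.data(); }

private:
    MiniString() = default;
    bool m_is8Bit = true;
    std::vector<LChar> m_data8;
    std::vector<UChar> m_data16;
};

// Width-agnostic worker: the same loop is instantiated for LChar and UChar.
template <typename CharType>
size_t countSpacesImpl(const CharType* chars, size_t length)
{
    size_t count = 0;
    for (size_t i = 0; i < length; ++i) {
        if (chars[i] == ' ')
            ++count;
    }
    return count;
}

// Check the flavor once, then use the matching accessor. No 16 bit copy of
// an 8 bit string is ever created.
size_t countSpaces(const MiniString& string)
{
    if (string.is8Bit())
        return countSpacesImpl(string.characters8(), string.length());
    return countSpacesImpl(string.characters16(), string.length());
}
```

Templating the inner loop on the character type keeps a single copy of the 
logic while still avoiding the conversion that characters() would trigger.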

The various string construction methods as well as Identifier constructors have 
been modified to create natively sized strings.  The JavaScriptCore lexers and 
parsers favor making 8 bit strings where possible, even if the source text is 
16 bit.  There are cases where parsing an 8 bit native source string will 
produce a 16 bit string, e.g. the string literal "abc\u1234".
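
As a rough illustration of why "abc\u1234" ends up 16 bit even though its 
source text is 8 bit, the sketch below decodes \uXXXX escapes and then checks 
whether every resulting code unit fits in 8 bits. This is not the 
JavaScriptCore lexer; decodeLiteral, fitsIn8Bits and hexValue are hypothetical 
names, and only the \uXXXX escape form is handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned char LChar;
typedef char16_t UChar; // assumption: a 16 bit code unit type

// Returns the value of a hex digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode the body of a string literal held in 8 bit
// source text, expanding \uXXXX escapes. Other escapes are left untouched.
std::vector<UChar> decodeLiteral(const std::string& source)
{
    std::vector<UChar> out;
    size_t i = 0;
    while (i < source.size()) {
        if (source[i] == '\\' && i + 5 < source.size() && source[i + 1] == 'u') {
            unsigned value = 0;
            bool ok = true;
            for (size_t k = 2; k <= 5; ++k) {
                int h = hexValue(source[i + k]);
                if (h < 0) { ok = false; break; }
                value = value * 16 + static_cast<unsigned>(h);
            }
            if (ok) {
                out.push_back(static_cast<UChar>(value));
                i += 6; // skip the backslash, 'u' and four hex digits
                continue;
            }
        }
        // Ordinary byte: zero-extend through LChar.
        out.push_back(static_cast<UChar>(static_cast<LChar>(source[i])));
        ++i;
    }
    return out;
}

// True when the decoded text can be stored as an 8 bit string.
bool fitsIn8Bits(const std::vector<UChar>& units)
{
    for (UChar unit : units) {
        if (unit > 0xFF)
            return false;
    }
    return true;
}
```

Under these assumptions, the literal body abc\u1234 decodes to four code 
units, the last of which is 0x1234, so it needs 16 bit storage. The body 
abc\u00e9 decodes to code units that all fit in 8 bits.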

Future Work

This change and its prior dependent changes are not the end of the 8 bit 
string work.  In fact, they should be seen as the foundation for the real 8 bit 
string work: tuning JavaScriptCore and also WebCore.  The goal is to make 
WebCore's processing of text use appropriately sized strings. For Latin-1 based 
documents, string processing will be done using 8 bits except where string 
escapes require 16 bit strings. 

- Michael Saboff
[email protected]

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
