Re: [Python-Dev] PEP 393: Special-casing ASCII-only strings

Martin v. Löwis Thu, 15 Sep 2011 22:46:08 -0700

On 16.09.11 00:42, Nick Coghlan wrote:

On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis"
<mar...@v.loewis.de> wrote:

Thinking about this, the following may work:


- ASCIIObject: state, length, hash, wstr*, data follow

- SingleBlockUnicode: ASCIIObject, wstr_len, utf8*, utf8_len, data follow

- UnicodeObject: SingleBlockUnicode, data pointer, no data follow

This is essentially your proposal, except that the wstr_len is
dropped for ASCII strings, and that it uses nested structs.

The single-block variants would always be "ready", the full unicode
object is ready only if the data pointer is set.
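As a rough illustration of the nested layout described above, here is a
minimal sketch in C. The field types, the omission of the usual PyObject
header, and the helper names sketch_ascii_data and sketch_full_is_ready
are assumptions made for this example only; they are not taken from the
PEP or from any CPython source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the three nested layouts. Field types are
   assumptions; the PyObject header is left out to stay self-contained. */

typedef struct {
    unsigned int state;   /* kind / compact / ready flags */
    ptrdiff_t length;     /* number of characters */
    ptrdiff_t hash;
    wchar_t *wstr;        /* optional wchar_t representation, may be NULL */
    /* character data follows directly after this struct */
} ASCIIObject;

typedef struct {
    ASCIIObject base;     /* nested, so an ASCIIObject* view stays valid */
    ptrdiff_t wstr_len;   /* dropped from the ASCII-only variant */
    char *utf8;
    ptrdiff_t utf8_len;
    /* character data follows directly after this struct */
} SingleBlockUnicode;

typedef struct {
    SingleBlockUnicode base;
    void *data;           /* separate block, NULL until the object is ready */
} UnicodeObject;

/* Data of a compact ASCII string lives right behind its header. */
static char *sketch_ascii_data(ASCIIObject *o)
{
    assert(o != NULL);
    return (char *)(o + 1);
}

/* The full object is ready only once the data pointer is set. */
static int sketch_full_is_ready(const UnicodeObject *o)
{
    assert(o != NULL);
    return o->data != NULL;
}
```

Because each struct embeds the previous one as its first member, a
pointer to the larger object can be viewed as a pointer to the smaller
one, which is what makes the common fields accessible without knowing
which variant is in hand.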


In your "UnicodeObject" here, is the 'data pointer' the
any/latin1/ucs2/ucs4 union from the original structure definition?


Yes, it is. I'm considering dropping the union again, since you'll
have to cast the data pointer anyway in the compact cases.

Also, what are the constraints on the "SingleBlockUnicode"? Does it
only hold strings that can be represented in latin1? Or can the size
of the individual elements be more than 1 byte?


Any size - what matters is whether the maximum character is known
at creation time (i.e. whether you've used PyUnicode_New(size, maxchar)
or PyUnicode_FromUnicode(NULL, size)). In the latter case, a Py_UNICODE
block will be allocated in wstr, and the data pointer left NULL.
Then, when PyUnicode_Ready is called, the maximum character is
determined in the Py_UNICODE block, and a new data block allocated -
but that will have to be a second memory block (the Py_UNICODE
block is then dropped in _Ready).
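The "ready" step described above can be modelled roughly as follows.
This is a simplified, self-contained sketch and not the actual
PyUnicode_Ready code: SketchStr, sketch_char_size and sketch_ready are
hypothetical names, uint32_t stands in for Py_UNICODE, and the 1/2/4
byte thresholds are an assumption matching the latin1/ucs2/ucs4 cases
mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a string created without a known maxchar. */
typedef struct {
    size_t length;
    uint32_t *wstr;   /* stand-in for the Py_UNICODE block */
    void *data;       /* NULL until the string is ready */
    int char_size;    /* 1, 2 or 4 once ready */
} SketchStr;

/* Smallest element width that can hold maxchar (assumed thresholds). */
static int sketch_char_size(uint32_t maxchar)
{
    if (maxchar < 0x100) return 1;
    if (maxchar < 0x10000) return 2;
    return 4;
}

/* Returns 0 on success, -1 on failure. */
static int sketch_ready(SketchStr *s)
{
    assert(s != NULL);
    if (s->data != NULL) return 0;      /* already ready */
    if (s->wstr == NULL) return -1;     /* nothing to convert from */

    /* Determine the maximum character in the wide block. */
    uint32_t maxchar = 0;
    for (size_t i = 0; i < s->length; i++)
        if (s->wstr[i] > maxchar) maxchar = s->wstr[i];

    /* Allocate the second memory block at the narrowest width. */
    int size = sketch_char_size(maxchar);
    void *data = malloc((s->length ? s->length : 1) * (size_t)size);
    if (data == NULL) return -1;

    for (size_t i = 0; i < s->length; i++) {
        if (size == 1)      ((uint8_t *)data)[i]  = (uint8_t)s->wstr[i];
        else if (size == 2) ((uint16_t *)data)[i] = (uint16_t)s->wstr[i];
        else                ((uint32_t *)data)[i] = s->wstr[i];
    }

    /* The wide block is dropped once the data block exists. */
    free(s->wstr);
    s->wstr = NULL;
    s->data = data;
    s->char_size = size;
    return 0;
}
```

The point of the sketch is the second allocation: a string whose maximum
character is only discovered later cannot keep its data in the same
block as the header, so it ends up in the non-compact layout.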

Regards,
Martin
