Re: [Python-Dev] PEP 393: Special-casing ASCII-only strings

Terry Reedy Thu, 15 Sep 2011 11:48:18 -0700

On 9/15/2011 11:50 AM, "Martin v. Löwis" wrote:

To comply with the C aliasing rules, the structures would look like this:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;
    Py_hash_t hash;
    int state; /* may include SSTATE_SHORT_ASCII flag */
    wchar_t *wstr;
} PyASCIIObject;


typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyUnicodeObject;

Code that directly accesses the structures would become more
complex; code that uses the accessor macros wouldn't notice.
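
A minimal sketch of that point, using simplified stand-in types rather than
the real CPython definitions. The names DemoASCII, DemoUnicode,
DEMO_SHORT_ASCII and demo_utf8_length are hypothetical, and the rule that an
ASCII-only string reports its character count as its UTF-8 length is an
assumption made for the illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT CPython's actual structures or flags. */
#define DEMO_SHORT_ASCII 0x1

typedef struct {
    ptrdiff_t length;       /* number of characters */
    int state;              /* may include DEMO_SHORT_ASCII */
} DemoASCII;

typedef struct {
    DemoASCII _base;        /* first member, so the two pointer types convert */
    ptrdiff_t utf8_length;  /* exists only in the full layout */
} DemoUnicode;

/* Accessor: the caller never needs to know which layout it holds. */
static ptrdiff_t demo_utf8_length(const DemoASCII *s)
{
    assert(s != NULL);
    if (s->state & DEMO_SHORT_ASCII) {
        /* ASCII-only text takes one UTF-8 byte per character. */
        return s->length;
    }
    /* Without the flag, the object is assumed to have the full layout. */
    return ((const DemoUnicode *)s)->utf8_length;
}
```

Code that reads utf8_length straight from the structure has to repeat the
flag check itself; code that calls the accessor keeps working if the layout
changes again.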

...

What do you think?

That nearly all code outside CPython itself should treat the unicode types, especially, as opaque types and only access instances through functions and macros -- the 'public' interfaces. We need to be free to fiddle with internal implementation details as experience suggests changes.

P.S. There are similar reductions that could be applied
to the wstr_length in general: on 32-bit wchar_t systems,
it could be always dropped, on a 16-bit wchar_t system,
it could be dropped for UCS-2 strings. However, I'm not
proposing these, as I think the increase in complexity
is not worth the savings.
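
To see why wstr_length is redundant in those cases, here is a rough sketch
that computes the wchar_t length from the code points alone. The function
name demo_wstr_length and its parameters are hypothetical, and it assumes
that 16-bit wchar_t uses UTF-16 surrogate pairs for code points above U+FFFF:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of wchar_t units needed for n code points, given sizeof(wchar_t). */
static size_t demo_wstr_length(const uint32_t *cps, size_t n, size_t wchar_size)
{
    assert(wchar_size == 2 || wchar_size == 4);
    size_t units = n;  /* one unit per code point is the baseline */
    if (wchar_size == 2) {
        for (size_t i = 0; i < n; i++) {
            if (cps[i] > 0xFFFF) {
                units++;  /* a surrogate pair costs one extra unit */
            }
        }
    }
    return units;
}
```

With 32-bit wchar_t the result always equals the character length, and with
16-bit wchar_t it equals the character length whenever no code point exceeds
U+FFFF, which is the UCS-2 case. Only strings with non-BMP characters on a
16-bit wchar_t system need a separately stored length.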

I would certainly do just the one change now and see how it goes. I think you should be free to do more like the above if you change your mind with experience.


--
Terry Jan Reedy


