On 9/15/2011 11:50 AM, "Martin v. Löwis" wrote:
To comply with the C aliasing rules, the structures would look like this:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;
    Py_hash_t hash;
    int state;     /* may include SSTATE_SHORT_ASCII flag */
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyUnicodeObject;

Code that directly accesses the structures would become more complex; code that uses the accessor macros wouldn't notice.
...
What do you think?
That nearly all code outside CPython itself should treat the unicode types, especially, as opaque types and only access instances through functions and macros -- the 'public' interfaces. We need to be free to fiddle with internal implementation details as experience suggests changes.
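To make the point concrete, here is a rough, standalone sketch of why accessor-only code would not notice the split layout. It is not CPython code: the Sketch* names, the SKETCH_SHORT_ASCII value, and the accessor itself are hypothetical, PyObject_HEAD is left out, and the basic types are stand-ins so the file compiles without Python.h. It also assumes that a string flagged as short ASCII is allocated as the base struct only, and relies on the fact that ASCII text has one UTF-8 byte per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Stand-ins for the CPython typedefs (assumptions, not the real headers). */
typedef ptrdiff_t Py_ssize_t;
typedef Py_ssize_t Py_hash_t;
typedef uint8_t  Py_UCS1;
typedef uint16_t Py_UCS2;
typedef uint32_t Py_UCS4;

/* Hypothetical flag value; the real SSTATE_SHORT_ASCII encoding is not shown
   in the proposal. */
#define SKETCH_SHORT_ASCII 0x1

/* Mirrors the proposed PyASCIIObject, minus PyObject_HEAD. */
typedef struct {
    Py_ssize_t length;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;
    Py_hash_t hash;
    int state;
    wchar_t *wstr;
} SketchASCIIObject;

/* Mirrors the proposed PyUnicodeObject: the base struct comes first, so a
   pointer to the full object may be used as a pointer to the base. */
typedef struct {
    SketchASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} SketchUnicodeObject;

/* Hypothetical accessor macros: callers never name a field directly. */
#define SKETCH_LENGTH(op) (((const SketchASCIIObject *)(op))->length)
#define SKETCH_IS_SHORT_ASCII(op) \
    ((((const SketchASCIIObject *)(op))->state & SKETCH_SHORT_ASCII) != 0)

/* Hypothetical accessor: hides which of the two layouts is in use. */
static Py_ssize_t sketch_utf8_length(const void *op)
{
    assert(op != NULL);
    if (SKETCH_IS_SHORT_ASCII(op)) {
        /* Short ASCII: no utf8_length field exists, and the UTF-8 byte
           count equals the character count. */
        return SKETCH_LENGTH(op);
    }
    /* Full layout: read the extra field from the larger struct. */
    return ((const SketchUnicodeObject *)op)->utf8_length;
}
```

A caller that only uses sketch_utf8_length() and the macros keeps working if the fields are later moved or dropped, which is the freedom I am arguing we should keep.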
P.S. There are similar reductions that could be applied to the wstr_length in general: on 32-bit wchar_t systems, it could be always dropped, on a 16-bit wchar_t system, it could be dropped for UCS-2 strings. However, I'm not proposing these, as I think the increase in complexity is not worth the savings.
I would certainly do just the one change now and see how it goes. I think you should be free to do more like the above if you change your mind with experience.
--
Terry Jan Reedy