Torsten Becker, 22.08.2011 20:58:
I have implemented an initial version of PEP 393 -- "Flexible String
Representation" as part of my Google Summer of Code project.  My patch
is hosted as a repository on bitbucket [1] and I created a related
issue on the bug tracker [2].  I posted documentation for the current
state of the development in the wiki [3].

One thing that occurred to me regarding the object struct:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;       /* Number of code points in the string */
    void *str;               /* Canonical, smallest-form Unicode buffer */
    Py_hash_t hash;          /* Hash value; -1 if not set */
    int state;               /* != 0 if interned. In this case the two
                              * references from the dictionary to this
                              * object are *not* counted in ob_refcnt.
                              * See SSTATE_KIND_* for other bits */
    Py_ssize_t utf8_length;  /* Number of bytes in utf8, excluding the
                              * terminating \0. */
    char *utf8;              /* UTF-8 representation (null-terminated) */
    Py_ssize_t wstr_length;  /* Number of code points in wstr, possible
                              * surrogates count as two code points. */
    wchar_t *wstr;           /* wchar_t representation (null-terminated) */
} PyUnicodeObject;


Wouldn't the "normal" approach be to use a union for the str field? I.e.

    union {
        unsigned char *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } str;

Given that they're all pointers, all fields have the same size, but I find it more readable to write

    u.str.latin1

than

    ((const unsigned char*)u.str)

Plus, the three types would be given by the struct, rather than by a per-usage cast.
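
To make the comparison concrete, here is a minimal, self-contained sketch of both variants. It is not taken from the patch: the demo_* names, the kind enum, and the cut-down structs are hypothetical, and Py_UCS2/Py_UCS4 are declared locally as stand-ins for the typedefs that Python.h would normally provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for CPython's typedefs (assumption: normally from Python.h). */
typedef uint16_t Py_UCS2;
typedef uint32_t Py_UCS4;

/* Hypothetical tag for the buffer width; the patch stores this
 * information in the state bits instead. */
enum demo_kind { DEMO_KIND_LATIN1, DEMO_KIND_UCS2, DEMO_KIND_UCS4 };

/* Cut-down object with an untyped buffer pointer, as in the struct above. */
typedef struct {
    size_t length;
    enum demo_kind kind;
    void *str;
} demo_str_void;

/* The same object with the proposed union for the str field. */
typedef struct {
    size_t length;
    enum demo_kind kind;
    union {
        unsigned char *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } str;
} demo_str_union;

/* Read code point i; every access needs its own cast. */
static Py_UCS4 demo_read_void(const demo_str_void *u, size_t i)
{
    assert(i < u->length);
    switch (u->kind) {
    case DEMO_KIND_LATIN1: return ((const unsigned char *)u->str)[i];
    case DEMO_KIND_UCS2:   return ((const Py_UCS2 *)u->str)[i];
    default:               return ((const Py_UCS4 *)u->str)[i];
    }
}

/* Read code point i; the element type comes from the union member. */
static Py_UCS4 demo_read_union(const demo_str_union *u, size_t i)
{
    assert(i < u->length);
    switch (u->kind) {
    case DEMO_KIND_LATIN1: return u->str.latin1[i];
    case DEMO_KIND_UCS2:   return u->str.ucs2[i];
    default:               return u->str.ucs4[i];
    }
}
```

Both readers still have to check the kind before touching the buffer, so the union does not add safety by itself. What it does is move the pointer types into the struct declaration, so call sites no longer repeat the casts.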

Has this been considered before? Was there a reason to decide against it?

Stefan
