On Fri, 2009-11-27 at 11:17 +0100, Francesc Alted wrote:
> On Friday 27 November 2009 10:47:53 Pauli Virtanen wrote:
> > 1) For 'S' dtype, I believe we use Bytes for the raw data and the
> >    interface.
> >
> >    Maybe we want to introduce a separate "bytes" dtype that's an alias
> >    for 'S'?
>
> Yeah. As regular strings in Python 3 are Unicode, I think that introducing
> a separate "bytes" dtype would help with the transition. Meanwhile, the
> following should still work:
>
> In [2]: s = np.array(['asa'], dtype="S10")
>
> In [3]: s[0]
> Out[3]: 'asa'  # will become b'asa' in Python 3
>
> In [4]: s.dtype.itemsize
> Out[4]: 10     # still 1-byte per element

Yes. But now I wonder, should array(['foo'], str) and array(['foo']) be of
dtype 'S' or 'U' in Python 3? I think I'm leaning towards 'U', which will
mean code breakage, but there's probably no avoiding it.

[clip]
> Also, I suppose that there will be issues with the current Unicode support
> in NumPy:
>
> In [5]: u = np.array(['asa'], dtype="U10")
>
> In [6]: u[0]
> Out[6]: u'asa'  # will become 'asa' in Python 3
>
> In [7]: u.dtype.itemsize
> Out[7]: 40      # not sure about the size in Python 3

I suspect the Unicode stuff will keep working without major changes,
except maybe dropping the u in repr. It is difficult to believe the
CPython guys would have significantly changed the current Unicode
implementation, if they didn't bother changing the names of the
functions :)

> For example, if it is true that internal strings in Python 3 are Unicode
> UTF-8 (as René seems to suggest), I suppose that the internal conversions
> from 2-bytes or 4-bytes (depending on how the Python interpreter has been
> compiled) in NumPy Unicode dtype to the new Python string would have to be
> reworked (perhaps you have dealt with that already).

I don't think they are internally UTF-8:

http://docs.python.org/3.1/c-api/unicode.html

"""Python’s default builds use a 16-bit type for Py_UNICODE and store
Unicode values internally as UCS2."""

-- 
Pauli Virtanen
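
The dtype questions raised above ('S' versus 'U' item sizes, and which dtype a plain str ends up with) can be checked directly on a Python 3 interpreter. Here is a minimal sketch, assuming a recent NumPy release. The values in the comments are what I would expect from such a release, not something this thread establishes, and the variable names are only illustrative.

```python
import numpy as np

# 'S10': fixed-width bytes dtype, 10 raw bytes per element.
s = np.array([b"asa"], dtype="S10")
print(type(s[0]), s[0] == b"asa")   # element compares equal to a bytes object
print(s.dtype.itemsize)             # expected: 10 (one byte per slot)

# 'U10': fixed-width Unicode dtype, 10 code points per element.
u = np.array(["asa"], dtype="U10")
print(type(u[0]), u[0] == "asa")    # element compares equal to a Python 3 str
print(u.dtype.itemsize)             # expected: 40 (four bytes per code point)

# Which dtype does a plain Python 3 str get, with and without dtype=str?
d_default = np.array(["foo"])
d_str = np.array(["foo"], str)
print(d_default.dtype.kind, d_str.dtype.kind)   # expected: U U
```

If both kinds print as 'U', Python 2 code that relied on array(['foo']) producing an 'S' array has to request dtype='S' explicitly or pass bytes objects.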