On Friday 27 November 2009 10:47:53 Pauli Virtanen wrote:
> 1) For 'S' dtype, I believe we use Bytes for the raw data and the
>    interface.
>
>    Maybe we want to introduce a separate "bytes" dtype that's an alias
>    for 'S'?
Yeah. As regular strings in Python 3 are Unicode, I think that introducing a separate "bytes" dtype would help with the transition. Meanwhile, the following should still work:

In [2]: s = np.array(['asa'], dtype="S10")

In [3]: s[0]
Out[3]: 'asa'  # will become b'asa' in Python 3

In [4]: s.dtype.itemsize
Out[4]: 10     # still 1-byte per element

Also, I suppose that there will be issues with the current Unicode support in NumPy:

In [5]: u = np.array(['asa'], dtype="U10")

In [6]: u[0]
Out[6]: u'asa'  # will become 'asa' in Python 3

In [7]: u.dtype.itemsize
Out[7]: 40      # not sure about the size in Python 3

For example, if it is true that internal strings in Python 3 are Unicode UTF-8 (as René seems to suggest), I suppose that the internal conversions from the 2-byte or 4-byte representation (depending on how the Python interpreter has been compiled) in the NumPy Unicode dtype to the new Python string will have to be reworked (perhaps you have dealt with that already).

Cheers,

-- 
Francesc Alted
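
To make the size question above concrete, here is a minimal standard-library sketch (not NumPy code) of how many bytes the same text takes at 1, 2 and 4 bytes per code unit. It assumes that the itemsize of 40 reported for "U10" corresponds to 10 characters at 4 bytes each; the names `text`, `sizes` and `field_bytes` are only illustrative.

```python
# Illustrative only: byte counts for the same text under different encodings.
text = "asa"

utf8 = text.encode("utf-8")        # 1 byte per ASCII character
utf16 = text.encode("utf-16-le")   # 2 bytes per character here (no BOM)
utf32 = text.encode("utf-32-le")   # 4 bytes per character (no BOM)

sizes = {"utf-8": len(utf8), "utf-16": len(utf16), "utf-32": len(utf32)}
print(sizes)  # {'utf-8': 3, 'utf-16': 6, 'utf-32': 12}

# Size of a fixed-width 10-character field for each code-unit width.
# Assumption: a "U10"-style field reserves room for 10 code units.
field_chars = 10
field_bytes = {unit: field_chars * unit for unit in (1, 2, 4)}
print(field_bytes)  # {1: 10, 2: 20, 4: 40}

# Converting between representations is lossless for this text.
assert utf32.decode("utf-32-le") == utf16.decode("utf-16-le") == text
```

Under that assumption, any conversion between a 2-byte or 4-byte buffer and a Python 3 string is an encode/decode step like the ones shown, which is the part that would need care.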