On Friday 27 November 2009 13:23:10 René Dudfield wrote:
> >> I don't think they are internally UTF-8:
> >> http://docs.python.org/3.1/c-api/unicode.html
> >>
> >> """Python’s default builds use a 16-bit type for Py_UNICODE and store
> >> Unicode values internally as UCS2."""
> >
> > Ah! No changes for that matter. Much better then.
>
> Hello,
>
>
> in py3...
>
> >>> 'Hello\u0020World !'.encode()
> b'Hello World !'
> >>> "Äpfel".encode('utf-8')
> b'\xc3\x84pfel'
> >>> "Äpfel".encode()
> b'\xc3\x84pfel'
>
> The default encoding does appear to be utf-8 in py3.
>
> Although it is compiled with something different, and stores it as
> something different, that is UCS2 or UCS4.

OK. The default encoding for Unicode is one thing, and how Python keeps
Unicode internally is another. Internally, Python 3 still uses UCS2 or
UCS4, i.e. the same as Python 2, so no worries here.

> I imagine dtype 'S' and 'U' need more clarification. As it misses the
> concept of encodings it seems? Currently, S appears to mean 8bit
> characters no encoding, and U appears to mean 16bit characters no
> encoding? Or are some sort of default encodings assumed?
[clip]

You only need an encoding if you are going to represent Unicode strings
with other types (for example bytes). Currently, NumPy can transparently
import/export native Python Unicode strings (UCS2 or UCS4) into its own
Unicode type (always UCS4). So, we don't have to worry here either.

> btw, in my numpy tree there is a unicode_() alias to str in py3, and
> to unicode in py2 (inside the compat.py file). This helped us in many
> cases with compatible string code in the pygame port. This allows you
> to create unicode strings on both platforms with the same code.

Correct. But, in addition, we are going to need a new 'bytes' dtype for
NumPy for Python 3, right?

-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
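
As a small illustration of the distinction drawn above between the default
encoding and the string itself, here is a minimal Python 3 sketch using only
the standard library. It does not inspect Python's internal representation.
The "utf-32-le" codec is used purely as a stand-in for a fixed-width
4-bytes-per-character layout such as UCS4, and the variable names are
hypothetical.

```python
import sys

# "\u00c4" is the precomposed character 'Ä', so the string has 5 code points.
text = "\u00c4pfel"

# In Python 3 the default encoding used by str.encode() is UTF-8.
default_encoding = sys.getdefaultencoding()
utf8 = text.encode()
assert utf8 == text.encode("utf-8") == b"\xc3\x84pfel"

# A fixed-width 4-byte encoding, similar in spirit to UCS4 storage.
# The "-le" variant is chosen so that no byte-order mark is prepended.
ucs4 = text.encode("utf-32-le")

n_chars = len(text)   # number of code points in the str
n_utf8 = len(utf8)    # 'Ä' takes two bytes in UTF-8, the rest one each
n_ucs4 = len(ucs4)    # four bytes per code point

# Both byte strings decode back to the same str: the encoding only matters
# once the text is represented as bytes.
roundtrip_ok = utf8.decode("utf-8") == ucs4.decode("utf-32-le") == text

print(default_encoding, n_chars, n_utf8, n_ucs4, roundtrip_ok)
```

The same text occupies a different number of bytes depending on the encoding
chosen, which is why an encoding only has to be named when a str is converted
to or from bytes.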