On Thu, Jan 23, 2014 at 11:43 AM, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
> On Thu, Jan 23, 2014 at 11:23:09AM -0500, josef.p...@gmail.com wrote:
>>
>> another curious example, encode utf-8 to latin-1 bytes
>>
>> >>> b
>> array(['Õsc', 'zxc'],
>>       dtype='<U3')
>> >>> b[0].encode('utf8')
>> b'\xc3\x95sc'
>> >>> b[0].encode('latin1')
>> b'\xd5sc'
>> >>> b.astype('S')
>> Traceback (most recent call last):
>>   File "<pyshell#40>", line 1, in <module>
>>     b.astype('S')
>> UnicodeEncodeError: 'ascii' codec can't encode character '\xd5' in
>> position 0: ordinal not in range(128)
>> >>> c = b.view('S4').astype('S1').view('S3')
>> >>> c
>> array([b'\xd5sc', b'zxc'],
>>       dtype='|S3')
>> >>> c[0].decode('latin1')
>> 'Õsc'
>
> Okay, so it seems that .view() implicitly uses latin-1 whereas .astype() uses
> ascii:
>
> >>> np.array(['Õsc']).astype('S4')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character '\xd5' in position
> 0: ordinal not in range(128)
> >>> np.array(['Õsc']).view('S4')
> array([b'\xd5', b's', b'c'],
>       dtype='|S4')

No, a view doesn't change the memory, it just changes the
interpretation and there shouldn't be any conversion involved.
astype does type conversion, but it goes through ascii encoding which fails.

>>> b = np.array(['Õsc', 'zxc'], dtype='<U3')
>>> b.tostring()
b'\xd5\x00\x00\x00s\x00\x00\x00c\x00\x00\x00z\x00\x00\x00x\x00\x00\x00c\x00\x00\x00'
>>> b.view('S12')
array([b'\xd5\x00\x00\x00s\x00\x00\x00c', b'z\x00\x00\x00x\x00\x00\x00c'],
      dtype='|S12')

The conversion happens somewhere in the array creation, but I have no
idea about the memory encoding for uc2 and the low level layouts.

Josef

>
>> --------
>> The original numpy py3 conversion used latin-1 as default
>> (It's still used in statsmodels, and I haven't looked at the structure
>> under the common py2-3 codebase)
>>
>> if sys.version_info[0] >= 3:
>>     import io
>>     bytes = bytes
>>     unicode = str
>>     asunicode = str
>
> These two functions are an abomination:
>
>>     def asbytes(s):
>>         if isinstance(s, bytes):
>>             return s
>>         return s.encode('latin1')
>>     def asstr(s):
>>         if isinstance(s, str):
>>             return s
>>         return s.decode('latin1')
>
>
> Oscar
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
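
A minimal sketch of the view-versus-conversion distinction discussed in this thread, assuming a current NumPy on Python 3. It checks that a view shares the original buffer (four bytes per character for a '<U3' array), that the view/astype/view chain from the quoted example yields the same bytes as an explicit latin-1 encode for these particular strings, and that the encoding can be chosen explicitly instead. The use of np.char.encode and all variable names are assumptions for illustration; nobody in the thread proposes them.

```python
import numpy as np

# Same data as in the thread; "\u00d5" is the character 'Õ'.
b = np.array(["\u00d5sc", "zxc"], dtype="<U3")

# A view only reinterprets the existing buffer: 3 chars * 4 bytes = 12 bytes
# per element, so 'S12' gives one bytes item per original string.
raw = b.view("S12")
shares_buffer = np.shares_memory(raw, b)

# The buffer holds little-endian 4-byte code units, with no re-encoding.
buffer_is_utf32le = b.tobytes() == "\u00d5sczxc".encode("utf-32-le")

# astype() performs a real conversion and goes through ASCII,
# so it is expected to fail on the non-ASCII character.
try:
    b.astype("S")
    astype_failed = False
except UnicodeError:
    astype_failed = True

# The trick from the thread: keep only the low byte of each 4-byte unit.
# This matches latin-1 only because every code point here is below 256.
low_bytes = b.view("S4").astype("S1").view("S3")

# Choosing the encoding explicitly avoids relying on the memory layout.
latin1 = np.char.encode(b, "latin1")
utf8 = np.char.encode(b, "utf8")
```

The low-byte trick silently drops information for any character above U+00FF, which is one reason an explicit encode step may be preferable.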