Re: [Python-Dev] PEP-393/PEP-3118: unicode format specifiers

Stefan Krah Tue, 06 Mar 2012 10:17:09 -0800

Victor Stinner <victor.stin...@gmail.com> wrote:
> > 'c' -> UCS1
> > 'u' -> UCS2
> > 'w' -> UCS4
> 
> A Unicode string is an array of code points. Another approach is to
> expose such string as an array of uint8/uint16/uint32 integers. I
> don't know if you expect to get a character / a substring when you
> read the buffer of a string object. Using Python 3.2, I get:
> 
> >>> memoryview(b"abc")[0]
> b'a'
> 
> ... but using Python 3.3 I get a number :-)


Yes, that changed because the official format (see the struct module)
is now 'B', unsigned bytes, which are integers in struct module syntax:

>>> unsigned_bytes = memoryview(b"abc")
>>> unsigned_bytes.format
'B'
>>> char_array = unsigned_bytes.cast('c')
>>> char_array.format
'c'
>>> char_array[0]
b'a'


Possibly the uint8/uint16/uint32 integer approach that you mention
would make more sense.
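
As a rough sketch of what the integer approach could look like from
Python code (this is only an illustration, not what PEP 393 strings
expose): the helper name code_units() is made up, and it assumes that
the native 'H' and 'I' struct formats are 2 and 4 bytes wide, which
holds on common platforms. It encodes the string in native byte order
and casts the resulting buffer to an unsigned integer format:

```python
import sys

# Hypothetical helper, for illustration only: view a string as an array
# of unsigned integers, one item per code unit of the chosen width.
def code_units(text, width):
    # struct format codes for unsigned 1-, 2- and 4-byte integers
    # (assumes native 'H' is 2 bytes and native 'I' is 4 bytes).
    fmt = {1: 'B', 2: 'H', 4: 'I'}[width]
    # cast() interprets items in native byte order, so encode to match.
    suffix = 'le' if sys.byteorder == 'little' else 'be'
    encoding = {
        1: 'latin-1',            # UCS1: code points up to U+00FF
        2: 'utf-16-' + suffix,   # UCS2 for BMP-only text
        4: 'utf-32-' + suffix,   # UCS4
    }[width]
    raw = text.encode(encoding)
    # The memoryview keeps the bytes object alive.
    return memoryview(raw).cast(fmt)

ucs1 = code_units("abc", 1)
ucs2 = code_units("a\u20ac", 2)
ucs4 = code_units("a\U0001F600", 4)
```

Indexing any of these views yields a plain integer (the code point
value for text that fits the chosen width), in the same way that
indexing a 'B' view does in the session above.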


Stefan Krah

