[Numpy-discussion] String type again.

Charles R Harris Sat, 12 Jul 2014 15:07:07 -0700

As previous posts have pointed out, NumPy's `S` type is currently treated
as a byte string, which leads to more complicated code in Python 3. OTOH,
the unicode type is stored as UCS4, which takes four bytes per character
and so consumes a lot of space, especially for ASCII strings. This note
proposes to adapt the existing 'a' type letter, currently an alias for
'S', as a new fixed-encoding dtype. Python 3.3 introduced two one-byte
internal representations for unicode strings, ASCII and Latin-1. ASCII has
the advantage that it is a subset of UTF-8, whereas Latin-1 has a few more
symbols. Another possibility is to just make it a UTF-8 encoding, but I
think this would involve more overhead, as Python would need to determine
the maximum character size. These are just preliminary thoughts; comments
are welcome.
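
To make the space trade-off concrete, here is a rough sketch using only
Python's built-in codecs, not NumPy itself. The sample strings and the
helper name `encoded_size` are made up for illustration, and "utf-32-le" is
assumed here as a stand-in for UCS4 storage (four bytes per code point, no
byte order mark).

```python
# Compare how many bytes the same text needs under each candidate encoding.
# Assumption: "utf-32-le" approximates UCS4 storage (4 bytes per code point).
samples = ["numpy", "caf\u00e9"]  # hypothetical sample strings
encodings = ("ascii", "latin-1", "utf-8", "utf-32-le")


def encoded_size(text, encoding):
    """Return the encoded length in bytes, or None if text cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# Map each sample string to its size under every encoding.
sizes = {
    text: {enc: encoded_size(text, enc) for enc in encodings}
    for text in samples
}

for text, by_encoding in sizes.items():
    print(repr(text), by_encoding)
```

The pure-ASCII string needs the same number of bytes under ASCII, Latin-1
and UTF-8, and four times as many under UCS4. The accented string cannot be
encoded as ASCII at all, fits in one byte per character under Latin-1, and
has a variable width under UTF-8, which is why a fixed-size UTF-8 field
would have to allow for the largest possible character size.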


Chuck

