Re: [Numpy-discussion] String type again.

Chris Barker Thu, 17 Jul 2014 05:24:07 -0700

On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:


> Just wondering, couldn't we have a type which actually has an
>  (arbitrary, python supported) encoding (and "bytes" might even just be a
> special case of no encoding)?


well, then we're back to the core issue here:

numpy dtypes need to be a pre-specified length

encoded bytes are an arbitrary length.

This leads us to wanting to use only fixed-number-of-bytes-per-character
encodings:
 - ascii
 - latin-1
 - UCS-4 (or UTF-32..I get a bit confused about the names)

maybe UCS-2 (NOT UTF-16) would be worth considering, for a compromise
between space and fraction of unicode supported.
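
A quick way to see the difference, as a rough sketch rather than a proposal for how a dtype would do it. The helper name bytes_per_char and the sample strings are made up for illustration; the codec names are the standard Python ones.

```python
import numpy as np

# NumPy string dtypes reserve a fixed number of bytes per element.
s5_size = np.dtype("S5").itemsize   # 5 bytes
u5_size = np.dtype("U5").itemsize   # 5 code points * 4 bytes (UCS-4)


def bytes_per_char(text, encoding):
    """Return the encoded size in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "a\u00e9"  # 'a' and e-acute, both representable in latin-1

latin1_sizes = bytes_per_char(sample, "latin-1")    # one byte each
utf32_sizes = bytes_per_char(sample, "utf-32-le")   # four bytes each
utf8_sizes = bytes_per_char(sample, "utf-8")        # size varies per character

# A character outside the Basic Multilingual Plane needs a surrogate
# pair in UTF-16, so UTF-16 is not fixed-width. UCS-2 simply cannot
# represent such a character.
astral = "\U0001F600"
utf16_sizes = bytes_per_char("a" + astral, "utf-16-le")
```

With utf-8 the number of bytes depends on the text, so a fixed element size cannot promise a fixed number of characters.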

> Basically storing bytes and on access do
> element[i].decode(specified_encoding) and on storing element[i] =
> value.encode(specified_encoding).
>

this really doesn't seem that different from just using python strings --
is there a point to having a pointer-to-python-string type as a less
general version of the python strings that are already possible in object
arrays?
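
For concreteness, here is a minimal sketch of the encode-on-store, decode-on-access idea on top of a fixed-width bytes array. EncodedStringArray is a hypothetical wrapper written for this message, not an existing or proposed numpy API, and the explicit length check is my own assumption about how overflow should be handled.

```python
import numpy as np


class EncodedStringArray:
    """Hypothetical wrapper: fixed-width bytes storage plus a codec."""

    def __init__(self, n_items, n_bytes, encoding):
        self.encoding = encoding
        self._data = np.zeros(n_items, dtype=f"S{n_bytes}")

    def __setitem__(self, i, value):
        raw = value.encode(self.encoding)
        # Check explicitly: assigning oversized bytes to an S element
        # would otherwise truncate without any error.
        if len(raw) > self._data.dtype.itemsize:
            raise ValueError("encoded value does not fit in the element")
        self._data[i] = raw

    def __getitem__(self, i):
        return self._data[i].decode(self.encoding)


arr = EncodedStringArray(2, 5, "utf-8")
arr[0] = "hello"          # 5 characters, 5 bytes: fits
first = arr[0]

try:
    arr[1] = "caf\u00e9s"  # 5 characters, but 6 bytes in utf-8
    overflowed = False
except ValueError:
    overflowed = True
```

Two five-character strings behave differently here, which is the arbitrary-length problem again: with a variable-width encoding the element size is a byte budget, not a character count.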

> There is always the never ending small issue of trailing null bytes. If
> we want to be fully compatible, such a type would have to store the
> string length explicitly to support trailing null bytes.
>

are null bytes legal (as something other than a terminator) in some
encodings?
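
A small check one could run to look into this, offered as a sketch and not a full survey of encodings. It only uses the standard Python codecs and the existing numpy "S" dtype; the variable names are made up.

```python
import numpy as np

# In UTF-32 (and UTF-16) plain ASCII characters are padded with
# zero-valued bytes, so null bytes show up in ordinary text.
encoded = "a".encode("utf-32-le")
has_null = b"\x00" in encoded

# U+0000 is itself a code point; UTF-8 encodes it as a single zero byte.
nul_utf8 = "\x00".encode("utf-8")

# The "S" dtype treats trailing nulls as padding, so a trailing null
# byte in the data is dropped when the element is read back.
stored = np.array([b"ab\x00"], dtype="S3")
roundtrip = stored[0]
```

So a bytes-plus-encoding type built on null-padded storage could not tell padding apart from real trailing zero bytes without storing the length.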

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
