On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> Just wondering, couldn't we have a type which actually has an
> (arbitrary, python supported) encoding (and "bytes" might even just be a
> special case of no encoding)?

Well, then we're back to the core issue here: numpy dtypes need to be a
pre-specified length, while encoded bytes are an arbitrary length. This
leads us to wanting to use only fixed-number-of-bytes-per-character
encodings:

 - ascii
 - latin-1
 - UCS-4 (or UTF-32... I get a bit confused about the names)

Maybe UCS-2 (NOT UTF-16) would be worth considering, as a compromise
between space and fraction of unicode supported.

> Basically storing bytes and on access do
> element[i].decode(specified_encoding) and on storing element[i] =
> value.encode(specified_encoding).

This really doesn't seem that different than just using python strings --
is there a point to having a pointer-to-python-string type as a less
generalized version of the currently possible python strings in object
arrays?

> There is always the never ending small issue of trailing null bytes. If
> we want to be fully compatible, such a type would have to store the
> string length explicitly to support trailing null bytes.

Are null bytes legal (as something other than a terminator) in some
encodings?

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
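
P.S. For what it's worth, here is a minimal sketch of the points above, not a proposal for an implementation. It compares bytes-per-character for a few encodings, does the encode-on-store / decode-on-access round trip by hand on top of an existing fixed-width bytes dtype, and shows what happens to a trailing null byte. The names `bytes_per_char`, `ENCODING` and `store` are made up for illustration, and latin-1 and the width of 4 are arbitrary choices.

```python
import numpy as np


def bytes_per_char(text, encoding):
    """Return the encoded size in bytes of each character in text."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "a\u00e9z"  # 'a', e-acute, 'z'

latin1_widths = bytes_per_char(sample, "latin-1")   # same width for every char
utf8_widths = bytes_per_char(sample, "utf-8")       # width depends on the char
utf32_widths = bytes_per_char(sample, "utf-32-le")  # same width for every char

# Hypothetical "encoded string" storage: a plain fixed-width bytes array,
# with the encoding applied by hand on the way in and out.
ENCODING = "latin-1"
store = np.zeros(2, dtype="S4")        # 4 bytes per element

store[0] = "caf\u00e9".encode(ENCODING)  # encode on store
decoded = store[0].decode(ENCODING)      # decode on access

# The trailing-null issue: the bytes dtype pads with nulls, so a real
# trailing null cannot be told apart from padding when read back.
store[1] = "ab\x00".encode(ENCODING)
truncated = store[1]

# Null bytes also show up inside the encoded form of ordinary characters
# in the wider fixed-width encodings.
utf32_a = "a".encode("utf-32-le")

print(latin1_widths, utf8_widths, utf32_widths)
print(repr(decoded), repr(truncated), repr(utf32_a))
```

With a one-byte encoding like latin-1 the existing bytes dtype is enough for the storage side; the wider encodings would need an item size that is a multiple of the character width, which is what the current unicode dtype already does with 4 bytes per character.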
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion