On Tue, Jan 21, 2014 at 3:22 PM, Andrew Collette <andrew.colle...@gmail.com> wrote:
> Just stumbled on this discussion (I'm the lead author of h5py).
>
> We would be overjoyed if there were a 1-byte text type available in
> NumPy.

cool -- it looks like someone is going to get a draft PEP going -- so stay
tuned, and add your comments when there is something to add them to.

> String handling is the source of major pain right now in the
> HDF5 world. All HDF5 strings are text (opaque types are used for
> binary data), but we're forced into using the "S" type most of the
> time because (1) the "U" type doesn't round-trip between HDF5 and
> NumPy, as there's no fixed-width wide-character string type in HDF5,

it looks from here:

http://www.hdfgroup.org/HDF5/doc/ADGuide/WhatsNew180.html

that HDF uses utf-8 for unicode strings -- so you _could_ round-trip with a
lot of calls to encode/decode -- which could be pretty slow compared to
other ways to dump numpy arrays into HDF-5. That may be what you mean by
"doesn't round-trip".

This may be a good case for a numpy utf-8 dtype, I suppose (or an
arbitrary-encoding dtype, anyway). But: how does hdf handle the fact that
utf-8 is not a fixed-length encoding?

> ASCII-only would be preferable, partly for selfish reasons (HDF5's
> default is ASCII only), and partly to make it possible to copy them
> into containers labelled "UTF-8" without manually inspecting every
> value.

hmm -- ascii does have those advantages, but I'm not sure it's worth the
restriction on what can be encoded. But you're quite right: you could dump
ascii straight into something expecting utf-8, whereas you could not do
that with latin-1, for instance. But you can't go the other way -- does it
help much to avoid encoding in one direction?

But maybe we can have an any-one-byte-per-char encoding option, in which
case h5py could use ascii, but we wouldn't have to everywhere.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
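
For illustration only, here is a minimal sketch of the two encoding points made in the message above: ascii bytes can be relabelled as utf-8 as-is while latin-1 bytes generally cannot, and a "U" array can be round-tripped through utf-8 bytes with per-element encode/decode calls. It assumes Python 3 and a NumPy that provides np.char.encode and np.char.decode; the sample strings and variable names are made up for the example, and nothing here reflects how h5py or HDF5 actually store strings.

```python
import numpy as np

# ASCII bytes are already valid UTF-8, so they can be relabelled
# as UTF-8 without inspecting or re-encoding each value.
ascii_bytes = "temperature".encode("ascii")
assert ascii_bytes.decode("utf-8") == "temperature"

# latin-1 bytes above 0x7F are generally not valid UTF-8.
latin1_bytes = "café".encode("latin-1")      # b'caf\xe9'
try:
    latin1_bytes.decode("utf-8")
    latin1_is_utf8 = True
except UnicodeDecodeError:
    latin1_is_utf8 = False

# Round-tripping a "U" array through UTF-8 bytes, one encode and one
# decode per element. The byte width grows because UTF-8 is not a
# fixed-length encoding: 'café' is 4 characters but 5 bytes.
names = np.array(["abc", "café"])            # 4 characters per item at most
encoded = np.char.encode(names, "utf-8")     # bytes array, 5 bytes per item
decoded = np.char.decode(encoded, "utf-8")   # back to a "U" array
```

The width change between names and encoded is the practical side of the fixed-length question: a fixed-width utf-8 field has to be sized in bytes, not characters.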
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion