Hi Chris,

> What it would do is push the problem from the HDF5<->numpy interface to the
> python<->numpy interface.
>
> I'm not sure that's a good trade off.
Maybe I'm being too paranoid about the truncation issue. We already perform truncation when going from e.g. vlen to fixed-width strings in h5py... it's just the truncation behavior for same-width data that throws me.

Here's a strawman for how a Latin-1 "a" type might be handled in h5py:

1. Creation from existing "a" data: Use vlen strings. Doesn't preserve the dtype, but maybe that's not so important.

2. Writing from "a" data to fixed-width ASCII: Copy, and replace bytes>127 with "?" (or don't)

3. Writing from "a" data to fixed-width UTF-8: Transcode and truncate (being careful not to end in the middle of a multibyte character)

4. Reading from fixed-width ASCII to "a": Straight copy, no inspection

5. Reading from fixed-width UTF-8 to "a": Copy, and replace non-Latin-1 chars with "?"

(The strawman above uses replacement rather than raising an exception, because an exception in the HDF5 conversion callback would leave the write/read half-completed.)

In any case, I can say that the lack of a text 'S' type in NumPy has been a significant pain point for h5py users on Python 3 over the years. Whatever specific encoding ends up being used, such a type can only improve the situation, and I'm firmly in favor of it.

Andrew

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
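For illustration, rules 2, 3 and 5 of the strawman above could be sketched in plain Python on byte strings. This is only a rough sketch, not h5py's conversion code: the function names are hypothetical, and NUL padding to the fixed width is an assumption made for the example.

```python
def latin1_to_ascii(data: bytes, width: int) -> bytes:
    """Rule 2: copy, replacing bytes > 127 with '?'."""
    out = bytes(b if b < 128 else ord("?") for b in data)
    # Assumption: fixed-width fields are NUL-padded.
    return out[:width].ljust(width, b"\x00")


def latin1_to_utf8(data: bytes, width: int) -> bytes:
    """Rule 3: transcode, then truncate without splitting a character."""
    encoded = data.decode("latin-1").encode("utf-8")
    if len(encoded) > width:
        cut = width
        # UTF-8 continuation bytes look like 10xxxxxx. If the first
        # dropped byte is one, the cut is inside a character: back up.
        while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
            cut -= 1
        encoded = encoded[:cut]
    return encoded.ljust(width, b"\x00")


def utf8_to_latin1(data: bytes, width: int) -> bytes:
    """Rule 5: copy, replacing non-Latin-1 characters with '?'."""
    # errors="replace" avoids raising inside a conversion callback.
    text = data.rstrip(b"\x00").decode("utf-8", errors="replace")
    out = text.encode("latin-1", errors="replace")
    return out[:width].ljust(width, b"\x00")
```

With a 4-byte field, `latin1_to_utf8(b"caf\xe9", 4)` drops the whole two-byte "é" rather than leaving a dangling lead byte, which is the same-width truncation case discussed above.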