On Tue, Jul 15, 2014 at 4:29 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> Thinking more about it, the easiest thing to do might be to make the S dtype
> a UTF-8 encoding. Most of the machinery to deal with that is already in
> place. That change might affect some users though, and we might need to do
> some work to make it backwards compatible with python 2.

I'd be very concerned about backcompat for existing code that uses e.g. "S128" as a dtype to mean "128 arbitrary bytes". An example is this file format reading code:
  https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123
The file format says there are 128 bytes there, and their interpretation depends on other fields in the header -- but in one case, for "large montages", there's an encoding where every 3 bytes represent 4 characters using an ad hoc 6-bit character set:
  https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133

Perhaps this case could be handled better by using a uint8 subarray or something (that code also goes to some effort to work around nul padding), and that particular project hasn't been ported to py3 yet, so technically it wouldn't be affected if we changed the meaning of "S" on py3. But it does seem useful to have a "fixed length bytes" dtype even in py3, and if we declare that to be "S" then we avoid breaking any existing code that depends on it...

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
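
To make the "S128 as 128 arbitrary bytes" concern concrete, here is a minimal sketch, assuming NumPy on Python 3. The field names and sample bytes are made up for illustration and are not taken from the erpss format. It reads the same buffer once with an "S128" field and once with a uint8 ("u1") subarray of shape (128,):

```python
import numpy as np

# Hypothetical header layouts: the field names are illustrative only.
header_dtype = np.dtype([("magic", "<u2"), ("payload", "S128")])
raw_dtype = np.dtype([("magic", "<u2"), ("payload", "u1", (128,))])

# 128 arbitrary bytes: not valid UTF-8, and ending in two NUL bytes.
payload = b"\xff\xfe\x80" + b"\x01" * 123 + b"\x00\x00"
buf = (0x1234).to_bytes(2, "little") + payload

as_s = np.frombuffer(buf, dtype=header_dtype)[0]
as_u1 = np.frombuffer(buf, dtype=raw_dtype)[0]

# "S" accepts the bytes without any decoding step, but trailing NULs
# are stripped when the item is read back.
s_value = as_s["payload"]

# The uint8 subarray returns all 128 bytes, trailing NULs included.
u1_value = as_u1["payload"].tobytes()

print(len(s_value), len(u1_value))
```

With today's "S" semantics the non-UTF-8 bytes pass through untouched; the only loss is the trailing nul padding, which the subarray form avoids.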