On Thu, Jul 17, 2014 at 5:48 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Tue, Jul 15, 2014 at 4:29 PM, Charles R Harris
> <charlesr.har...@gmail.com> wrote:
>> Thinking more about it, the easiest thing to do might be to make the S dtype
>> a UTF-8 encoding. Most of the machinery to deal with that is already in
>> place. That change might affect some users though, and we might need to do
>> some work to make it backwards compatible with python 2.
>
> I'd be very concerned about backcompat for existing code that uses
> e.g. "S128" as a dtype to mean "128 arbitrary bytes". An example is
> this file format reading code:
> https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123
> The file format says there are 128 bytes there, and their
> interpretation depends on other fields in the header -- but in one
> case, for "large montages", there's an encoding where every 3 bytes
> represents 4 characters using an ad hoc 6-bit character set:
> https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133
>
> Perhaps this case could be handled better by using a u8 subarray or
> something (that code also goes to some efforts to work around nul
> padding), and that particular project hasn't been ported to py3 yet so
> technically wouldn't be affected if we changed the meaning of "S" on
> py3. But it does seem useful to have a "fixed length bytes" dtype even
> in py3, and if we declare that be "S" then it avoids breaking any
> existing code depending on it...
>
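To make the two readings of "S" quoted above concrete, here is a minimal sketch of how the dtype behaves today under Python 3. The array contents and variable names are made up for illustration; only the numpy calls themselves are real.

```python
import numpy as np

# Python 3: "S" is a fixed-width bytes dtype, "U" is the text dtype.
raw = np.array([b"abc", b"\xff\xfe"], dtype="S4")
text = np.array(["abc"], dtype="U4")

first_raw = raw[0]    # numpy scalar that subclasses bytes
first_text = text[0]  # numpy scalar that subclasses str

# Arbitrary bytes are stored untouched, even when they are not valid UTF-8.
payload = raw[1]
try:
    payload.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Code that uses "S" for text has to decode explicitly on Python 3.
decoded = np.char.decode(raw[:1], "ascii")

# Trailing NUL padding is stripped when an element is read back, which is
# the padding behaviour the quoted file-reading code has to work around.
stripped = np.array([b"ab\x00\x00"], dtype="S4")[0]
```

The second element is the kind of data that a UTF-8 interpretation of "S" could not represent as text, while the explicit decode step is the kind of workaround that code using "S" as a string type already carries on Python 3.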
We break code either way. Either we keep S as bytes in Python 3 and break
applications that use S as a string type, or we change S to a string in
Python 3 and break applications that treat S as a bytes type.

Unfortunately, when adding Python 3 support we missed the opportunity to fix
the exact same bytes/text boundary issue that is the main reason Python 3
exists in the first place. We should have made porting to numpy3 an
intentionally(!) backward-incompatible change, just like Python itself did.

Now we are stuck with deciding which option breaks less. On the one hand,
S being bytes in Python 3 is somewhat established by now and lots of
workarounds are already in place. On the other hand, I think code that relies
on S being bytes is in the minority, and Python 3 usage is probably still
insignificant in this area. Unfortunately, getting actual numbers rather than
wild guesses on this is probably not easy.