On Tue, Jul 15, 2014 at 7:40 PM, Aldcroft, Thomas <aldcr...@head.cfa.harvard.edu> wrote:
>
> On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> OTOH, fixed length nul padded latin1 would be useful for various flat file
>> reading tasks.
>
> As one of the original agitators for this, let me re-iterate that what the
> astronomical community *really* wants is the original proposal as described
> by Chris Barker [1] and essentially what Charles said. We have large data
> archives that have ASCII string data in binary formats like FITS and HDF5.
> The current readers for those datasets present users with numpy S data
> types, which in Python 3 cannot be compared to str (unicode) literals. In
> many cases those datasets are large, and in my case I regularly deal with
> multi-Gb sized bytestring arrays. Converting those to a U dtype is not
> practical.

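
For concreteness, here is a minimal sketch of the S-versus-U trade-off described in the quoted text. The array contents and variable names are made up for illustration; real FITS/HDF5 readers would hand back far larger arrays.

```python
import numpy as np

# Hypothetical sample data standing in for a column read from a binary file.
colors_s = np.array([b"green", b"red", b"green", b"blue"])  # dtype S5
colors_u = colors_s.astype("U5")                            # dtype U5

# S stores 1 byte per character, U stores 4 bytes per character.
bytes_per_item_s = colors_s.dtype.itemsize
bytes_per_item_u = colors_u.dtype.itemsize

# With the S dtype, the comparison has to use a bytes literal on Python 3.
n_green_bytes = int(np.sum(colors_s == b"green"))

# Comparing against a str literal requires the 4x larger U copy.
n_green_str = int(np.sum(colors_u == "green"))

print(bytes_per_item_s, bytes_per_item_u, n_green_bytes, n_green_str)
```

The fourfold growth in item size on conversion is what makes the U dtype impractical for multi-gigabyte bytestring arrays.
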
This feedback is *super* useful, thanks. Can you elaborate a bit more
on your requirements?

I get that:
- You have data that is treated as text, so it is convenient to be
  able to use Python strings for things like equality tests,
  np.sum(arr == "green"), etc.
- Your data uses only ASCII characters, and you don't want to spend
  more than 1 byte of memory per character.

Do you ever have 8 bit characters, and if so, what encoding do you use?

Does it matter to you that the memory layout for these
1-byte-per-char strings remains fixed-width nul-padded concatenated
strings (e.g., because you are mmap'ing files that have this format)?
Or do FITS/HDF5 handle layout details internally and you don't care so
long as the above requirements are met?

Does the fixed-width nature of numpy strings cause problems in the
above setting?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org