On Mon, Apr 24, 2017 at 7:41 PM, Nathaniel Smith wrote:
>
> On Mon, Apr 24, 2017 at 7:23 PM, Robert Kern wrote:
> > On Mon, Apr 24, 2017 at 7:07 PM, Nathaniel Smith wrote:
> >
> >> That said, AFAICT what people actually want in most use cases is support
> >> for arrays that can hold variable-len
On Mon, Apr 24, 2017 at 7:41 PM, Nathaniel Smith wrote:
> But also, is it important whether strings we're loading/saving to an
> HDF5 file have the same in-memory representation in numpy as they
> would in the file? I *know* [1] no-one is reading HDF5 files using
> np.memmap :-).
Of course they
On Mon, Apr 24, 2017 at 7:23 PM, Robert Kern wrote:
> On Mon, Apr 24, 2017 at 7:07 PM, Nathaniel Smith wrote:
>
>> That said, AFAICT what people actually want in most use cases is support
>> for arrays that can hold variable-length strings, and the only place where
>> the current approach is *opt
On Mon, Apr 24, 2017 at 7:07 PM, Nathaniel Smith wrote:
> That said, AFAICT what people actually want in most use cases is support
> for arrays that can hold variable-length strings, and the only place where
> the current approach is *optimal* is when we need mmap compatibility with
> legacy formats th
On Apr 21, 2017 2:34 PM, "Stephan Hoyer" wrote:
I still don't understand why a latin encoding makes sense as a preferred
one-byte-per-char dtype. The world, including Python 3, has standardized on
UTF-8, which is also one-byte-per-char for (ASCII) scientific data.
You may already know this, but
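Stephan's point about UTF-8 can be checked with Python's built-in codecs: for pure-ASCII text, UTF-8 and latin-1 produce identical bytes, one per character, and they diverge only once non-ASCII characters appear. A minimal sketch (the sample strings are arbitrary):

```python
# Compare encoded byte lengths for ASCII and non-ASCII text.
ascii_text = "temperature_K"
accented = "café"

# Pure ASCII: UTF-8 and latin-1 give the same bytes, one byte per character.
assert ascii_text.encode("utf-8") == ascii_text.encode("latin-1")
assert len(ascii_text.encode("utf-8")) == len(ascii_text)

# Non-ASCII: latin-1 still uses one byte per character, UTF-8 needs two for "é".
print(len(accented), len(accented.encode("latin-1")), len(accented.encode("utf-8")))
# prints: 4 4 5
```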
On Mon, Apr 24, 2017 at 5:56 PM, Aldcroft, Thomas <
aldcr...@head.cfa.harvard.edu> wrote:
>
> On Mon, Apr 24, 2017 at 7:11 PM, Robert Kern wrote:
>>
>> On Mon, Apr 24, 2017 at 4:06 PM, Aldcroft, Thomas <aldcr...@head.cfa.harvard.edu> wrote:
>> >
>> > On Mon, Apr 24, 2017 at 4:06 PM, Robert Kern
On Mon, Apr 24, 2017 at 7:11 PM, Robert Kern wrote:
> On Mon, Apr 24, 2017 at 4:06 PM, Aldcroft, Thomas <aldcr...@head.cfa.harvard.edu> wrote:
> >
> > On Mon, Apr 24, 2017 at 4:06 PM, Robert Kern wrote:
> >>
> >> I am not unfamiliar with this problem. I still work with files that
> >> have fie
On Mon, Apr 24, 2017 at 4:09 PM, Stephan Hoyer wrote:
>
> On Mon, Apr 24, 2017 at 11:13 AM, Chris Barker wrote:
>>>
>>> On the other hand, if this is the use-case, perhaps we really want an
>>> encoding closer to "Python 2" string, i.e., "unknown", to let this be
>>> signaled more explicitly. I would sugg
On Mon, Apr 24, 2017 at 4:08 PM, Robert Kern wrote:
> Let me make a counter-proposal for your latin-1 dtype (your #2) that might
> address your, Thomas's, and Julian's use cases:
>
> 2) We want a single-byte-per-character, NULL-terminated string dtype that
> can be used to represent mostly-ASCII
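Robert's proposal describes a single-byte-per-character, NULL-terminated string dtype. The sketch below only illustrates that storage idea with the standard library: text is encoded one byte per character and NUL-padded to a fixed field width, and trailing NULs are stripped on the way back out (roughly what NumPy's existing `S` dtype does on access). The helper names `pack_fixed` and `unpack_fixed` are hypothetical, latin-1 is an assumed codec, and this is not NumPy's implementation.

```python
def pack_fixed(text: str, width: int) -> bytes:
    """Hypothetical helper: encode as latin-1 and NUL-pad to a fixed width."""
    raw = text.encode("latin-1")  # one byte per character
    if len(raw) > width:
        raise ValueError(f"{text!r} needs {len(raw)} bytes, field holds {width}")
    return raw.ljust(width, b"\x00")

def unpack_fixed(field: bytes) -> str:
    """Hypothetical helper: drop trailing NUL padding and decode as latin-1."""
    return field.rstrip(b"\x00").decode("latin-1")

record = pack_fixed("Zürich", 8)
print(record)                # b'Z\xfcrich\x00\x00'
print(unpack_fixed(record))  # Zürich
```

One consequence of NUL termination: a string that legitimately ends in NUL characters cannot be round-tripped through such a field.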
On Mon, Apr 24, 2017 at 4:06 PM, Aldcroft, Thomas <
aldcr...@head.cfa.harvard.edu> wrote:
>
> On Mon, Apr 24, 2017 at 4:06 PM, Robert Kern wrote:
>>
>> I am not unfamiliar with this problem. I still work with files that have
>> fields that are supposed to be in EBCDIC but actually contain text in
>> ASC
On Mon, Apr 24, 2017 at 11:13 AM, Chris Barker
wrote:
>> On the other hand, if this is the use-case, perhaps we really want an
>> encoding closer to "Python 2" string, i.e., "unknown", to let this be
>> signaled more explicitly. I would suggest that "text[unknown]" should
>> support operations like
Chris, you've mashed all of my emails together, some of them are in reply
to you, some in reply to others. Unfortunately, this dropped a lot of the
context from each of them, and appears to be creating some
misunderstandings about what each person is advocating.
On Mon, Apr 24, 2017 at 2:00 PM, Ch
On Mon, Apr 24, 2017 at 4:06 PM, Robert Kern wrote:
> I am not unfamiliar with this problem. I still work with files that have
> fields that are supposed to be in EBCDIC but actually contain text in
> ASCII, UTF-8 (if I'm lucky) or any of a variety of East European 8-bit
> encodings. In that expe
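For readers who have not met the EBCDIC problem Robert describes: EBCDIC and ASCII assign different byte values to the same letters, so a field decoded with the wrong one of the two usually does not raise an error; it just yields the wrong characters. The sketch uses Python's `cp037` codec, one common EBCDIC code page, chosen here as an assumption since the thread does not say which variant is involved.

```python
# The same text in EBCDIC (code page 037) and ASCII uses different byte values.
text = "ABC"
ebcdic_bytes = text.encode("cp037")
ascii_bytes = text.encode("ascii")
print(ebcdic_bytes, ascii_bytes)  # b'\xc1\xc2\xc3' b'ABC'

# Decoding ASCII bytes with the EBCDIC codec silently produces the wrong characters.
misread = ascii_bytes.decode("cp037")
print(misread == text)            # False
```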
On Mon, Apr 24, 2017 at 11:36 AM, Robert Kern wrote:
> > I agree -- it is a VERY common case for scientific data sets. But a
> > one-byte-per-char encoding would handle it nicely, or UCS-4 if you want
> > Unicode. The wasted space is not that big a deal with short strings...
>
> Unless if you have hu
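To put rough numbers on the "wasted space" trade-off between a one-byte-per-character encoding and UCS-4 (the 4-bytes-per-character layout NumPy's `U` dtype uses), here is a quick comparison with Python's codecs. The field width of 32 mirrors the "name field" example elsewhere in the thread; the row count is an arbitrary assumption chosen only for scale.

```python
# Bytes per fixed-width field for a one-byte codec vs UCS-4 (UTF-32).
WIDTH = 32           # characters per field (illustrative)
N_ROWS = 10_000_000  # number of records (arbitrary, for scale)

one_byte = len(("x" * WIDTH).encode("latin-1"))
ucs4 = len(("x" * WIDTH).encode("utf-32-le"))  # "-le" avoids the 4-byte BOM

print(one_byte, ucs4)  # 32 128
print(f"{N_ROWS * one_byte / 1e6:.0f} MB vs {N_ROWS * ucs4 / 1e6:.0f} MB")
# 320 MB vs 1280 MB
```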
On Mon, Apr 24, 2017 at 11:56 AM, Aldcroft, Thomas <
aldcr...@head.cfa.harvard.edu> wrote:
>
> On Mon, Apr 24, 2017 at 2:47 PM, Robert Kern wrote:
>>
>> On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas <aldcr...@head.cfa.harvard.edu> wrote:
>> >
>> > On Mon, Apr 24, 2017 at 1:04 PM, Chris Barke
On Mon, Apr 24, 2017 at 10:04 AM, Chris Barker
wrote:
>
> On Fri, Apr 21, 2017 at 2:34 PM, Stephan Hoyer wrote:
>
>>> In this case, we want something compatible with Python's string (i.e.
>>> full Unicode supporting) and I think should be as transparent as possible.
>>> Python's string has made the decis
On Mon, Apr 24, 2017 at 2:47 PM, Robert Kern wrote:
> On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas <
> aldcr...@head.cfa.harvard.edu> wrote:
> >
> > On Mon, Apr 24, 2017 at 1:04 PM, Chris Barker wrote:
>
> >> - round-tripping of binary data (at least with Python's
> >> encoding/decoding) --
On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas <
aldcr...@head.cfa.harvard.edu> wrote:
>
> On Mon, Apr 24, 2017 at 1:04 PM, Chris Barker wrote:
>> - round-tripping of binary data (at least with Python's
>> encoding/decoding) -- ANY string of bytes can be decoded as latin-1 and
>> re-encoded to get
On Mon, Apr 24, 2017 at 11:21 AM, Chris Barker
wrote:
>
> On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas <aldcr...@head.cfa.harvard.edu> wrote:
>>>
>>> BTW -- maybe we should keep the pathological use-case in mind: really
>>> short strings. I think we are all thinking in terms of longer strings,
On Mon, Apr 24, 2017 at 10:51 AM, Aldcroft, Thomas <
aldcr...@head.cfa.harvard.edu> wrote:
>> BTW -- maybe we should keep the pathological use-case in mind: really
>> short strings. I think we are all thinking in terms of longer strings,
>> maybe a name field, where you might assign 32 bytes or so
On Mon, Apr 24, 2017 at 10:51 AM, Stephan Hoyer wrote:
>> - round-tripping of binary data (at least with Python's encoding/decoding)
>> -- ANY string of bytes can be decoded as latin-1 and re-encoded to get the
>> same bytes back. You may get garbage, but you won't get an EncodingError.
>>
>
> For
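The round-tripping claim is easy to check: latin-1 maps each of the 256 byte values to exactly one code point, so decoding never fails and re-encoding returns the original bytes. (In Python the exceptions raised by failing codecs are `UnicodeDecodeError`/`UnicodeEncodeError`; there is no `EncodingError`.) The sketch below also shows the "garbage" case: UTF-8 bytes read as latin-1 come out as mojibake, but nothing is lost.

```python
# Every possible byte value, in order.
all_bytes = bytes(range(256))

# latin-1 decodes any byte sequence and re-encodes it losslessly.
text = all_bytes.decode("latin-1")
assert text.encode("latin-1") == all_bytes

# UTF-8 and ASCII reject byte sequences that are not valid in those encodings.
failed = []
for codec in ("utf-8", "ascii"):
    try:
        all_bytes.decode(codec)
    except UnicodeDecodeError:
        failed.append(codec)
print("codecs that rejected the input:", failed)

# "Garbage" but recoverable: the 3 UTF-8 bytes of "€" become 3 latin-1 characters.
mojibake = "€".encode("utf-8").decode("latin-1")
restored = mojibake.encode("latin-1").decode("utf-8")
print(len(mojibake), restored)  # 3 €
```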
On Mon, Apr 24, 2017 at 1:04 PM, Chris Barker wrote:
> On Fri, Apr 21, 2017 at 2:34 PM, Stephan Hoyer wrote:
>
>
>> In this case, we want something compatible with Python's string (i.e.
>>> full Unicode supporting) and I think should be as transparent as possible.
>>> Python's string has made th
On Mon, Apr 24, 2017 at 10:04 AM, Chris Barker
wrote:
> latin-1 or latin-9 buys you (over ASCII):
>
> ...
>
> - round-tripping of binary data (at least with Python's encoding/decoding)
> -- ANY string of bytes can be decoded as latin-1 and re-encoded to get the
> same bytes back. You may get garb
On Fri, Apr 21, 2017 at 2:34 PM, Stephan Hoyer wrote:
>> In this case, we want something compatible with Python's string (i.e. full
>> Unicode supporting) and I think should be as transparent as possible.
>> Python's string has made the decision to present a character oriented API
>> to users (de