[Numpy-discussion] Standard for dtype string representation?
Hi,

We are in the process of using a standard representation of data types for the forthcoming version of N-dim arrays in C-Blosc2, and we want to use the NumPy string representation for that (see the end of https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst). It might seem a bit strange to use the specification of a Python package for that, but given its predominant role in data science, I don't think this should come as a surprise to anyone.

There are some small gotchas, though. For simple data types, the string representation is *apparently* fine. E.g.:

In [16]: str(np.dtype("i8"))
Out[16]: 'int64'

However, as soon as we try to represent the endianness of the type, we get:

In [17]: str(np.dtype(">i8"))
Out[17]: '>i8'

So, it uses the short version of the representation. And the same happens with the structured types:

In [22]: str(np.dtype("S1,i8"))
Out[22]: "[('f0', 'S1'), ('f1', '<i8')]"

[...] (https://data-apis.org/array-api/latest/API_specification/data_types.html#data-types), but it does not seem this is being addressed.

For now (and for the Python-Blosc2 wrapper), we are going in this direction:

if dtype.kind == 'V':
    repr = str(dtype)
else:
    repr = dtype.str

Is there a way (or an ongoing effort) to express the variety of data types in NumPy that beats the above (which seems somewhat inconsistent to me)?

Thanks!

--
Francesc Alted
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
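A minimal sketch of the approach above, paired with a decoder so the stored string can be turned back into a dtype. The helper names are hypothetical, and two choices are assumptions of this sketch rather than part of the Blosc2 spec: it tests `dtype.names is not None` instead of `dtype.kind == 'V'` (so a plain unstructured void such as `V8` falls back to `dtype.str`), and it parses the structured form with `ast.literal_eval`, since `np.dtype()` does not accept the `str()` output of a structured dtype directly.

```python
import ast

import numpy as np


def dtype_to_string(dtype) -> str:
    """Serialize a NumPy dtype to a string (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if dtype.names is not None:
        # Structured dtype: str() gives the list-of-tuples (or dict) form,
        # which keeps field names and per-field byte order.
        return str(dtype)
    # Simple dtype: the array-interface typestr keeps the byte order,
    # e.g. '<i8', '>i8', '|S1', '<M8[ns]'.
    return dtype.str


def string_to_dtype(s: str) -> np.dtype:
    """Inverse of dtype_to_string (hypothetical helper)."""
    if s.startswith(("[", "{")):
        # The structured form is a Python literal: parse it safely first.
        return np.dtype(ast.literal_eval(s))
    return np.dtype(s)


for dt in [np.dtype(">i8"), np.dtype("S1,i8"), np.dtype("M8[ns]")]:
    s = dtype_to_string(dt)
    print(repr(s), string_to_dtype(s) == dt)
```

Note that this keeps the asymmetry discussed above (two different string syntaxes), but at least both directions are explicit.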
[Numpy-discussion] Re: Standard for dtype string representation?
On Wed, 2023-02-08 at 12:48 +0100, Francesc Alted wrote:
> Hi,
>
> Is there a way (or an ongoing effort) to express the variety of data types
> in NumPy that beats the above (which seems somewhat inconsistent to me)?

How about using the Python buffer interface format string (maybe with some limitations)?

But other than that, I don't have an obvious idea right now.

- Sebastian
[Numpy-discussion] Re: Introducing Arm Optimized Routines
Hello again :-)

Just as an update for the list, the first PR has now been raised to integrate Optimized Routines, demonstrating the performance improvements (sometimes 2x faster): https://github.com/numpy/numpy/pull/23171

Once we've achieved the initial milestone of getting these routines integrated and the performance improved, it would be interesting to understand what's required to translate them into universal intrinsics. I notice that SVE support (https://github.com/numpy/numpy/pull/22265) isn't quite ready for universal intrinsics, which leads me to believe we would need to use the library there either way?

Cheers,
Chris
[Numpy-discussion] next NumPy Newcomers' Hour - 8 pm UTC
Our next Newcomers' Hour will be held tomorrow, February 9th at 8 pm UTC. Stop by to ask questions or just to say hi.

To add to the meeting agenda the topics you’d like to discuss, follow the link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both

Join the meeting via Zoom: https://us06web.zoom.us/j/82563808729?pwd=ZFU3Z2dMcXBGb05YemRsaGE1OW5nQT09

--
Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson
[Numpy-discussion] Re: Standard for dtype string representation?
On Wed, Feb 8, 2023 at 1:42 PM Sebastian Berg wrote:

> On Wed, 2023-02-08 at 12:48 +0100, Francesc Alted wrote:
> > Is there a way (or an ongoing effort) to express the variety of data types
> > in NumPy that beats the above (which seems somewhat inconsistent to me)?
>
> How about using the Python buffer interface format string (maybe with
> some limitations).
>

If you mean the array interface (https://numpy.org/doc/stable/reference/arrays.interface.html), this is what dtype.str provides (https://numpy.org/doc/stable/reference/generated/numpy.dtype.str.html). But the limitation here is that structured types are represented by the 'V' char, which does not properly represent them by any means.

--
Francesc Alted
[Numpy-discussion] Re: Standard for dtype string representation?
On Wed, 2023-02-08 at 14:31 +0100, Francesc Alted wrote:
> On Wed, Feb 8, 2023 at 1:42 PM Sebastian Berg <sebast...@sipsolutions.net>
> wrote:
>
> > How about using the Python buffer interface format string (maybe with
> > some limitations).
>
> If you mean the array interface (
> https://numpy.org/doc/stable/reference/arrays.interface.html), this is what
> dtype.str provides (
> https://numpy.org/doc/stable/reference/generated/numpy.dtype.str.html).
> But the limitation here is that structured types are represented by the 'V'
> char, which is not properly representing it by any means.

Ah, I was thinking of what the Python buffer protocol uses, which is what struct uses:

https://docs.python.org/3/library/struct.html#module-struct

That has some annoyances for sure, and structured dtypes with field names need rather strange syntax. Also, I think padding bytes at best are simply fields with an empty name.

But overall, it probably already does a better job than any `str()` for basic types:

In [2]: import numpy as np

In [3]: np.array(0, dtype="i,i,2f")
Out[3]:
array((0, 0, [0., 0.]),
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4', (2,))])

In [4]: memoryview(np.array(0, dtype="i,i,2f")).format
Out[4]: 'T{i:f0:i:f1:(2)f:f2:}'

- Sebastian
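To see what the buffer-protocol format string looks like next to `struct`, here is a small sketch. `buffer_format` is a hypothetical helper; it only relies on `memoryview(...).format` and `struct.calcsize`. For simple types the exported format is an ordinary `struct` format whose size matches the dtype's itemsize, whereas the structured `T{...}` syntax is a PEP 3118 extension that the `struct` module itself does not parse.

```python
import struct

import numpy as np


def buffer_format(dtype):
    """Return the PEP 3118 format NumPy exports for `dtype`, or None.

    Hypothetical helper: builds a one-element array and asks the buffer
    protocol (via memoryview) for its format string.
    """
    arr = np.zeros(1, dtype=dtype)
    try:
        return memoryview(arr).format
    except (ValueError, TypeError):
        # NumPy refuses to export some dtypes through the buffer protocol.
        return None


for dt in ["i4", "f8", ">i4", "i,i,2f"]:
    fmt = buffer_format(dt)
    if fmt.startswith("T{"):
        # Structured formats use the 'T{...}' extension; struct can't size them.
        print(dt, "->", fmt)
    else:
        # Simple formats are valid struct format strings.
        print(dt, "->", fmt, "struct size:", struct.calcsize(fmt))
```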
[Numpy-discussion] Re: Standard for dtype string representation?
On Wed, Feb 8, 2023 at 3:19 PM Sebastian Berg wrote:

> Ah, I was thinking of what the Python buffer protocol uses, which is
> what struct uses:
>
> https://docs.python.org/3/library/struct.html#module-struct
>
> That has some annoyances for sure, and structured dtypes with field
> names need rather strange syntax. Also I think padding bytes at best
> are simply fields with an empty name.
> But overall, it probably already does a better job than any `str()` for
> basic types:
>
> In [2]: import numpy as np
>
> In [3]: np.array(0, dtype="i,i,2f")
> Out[3]:
> array((0, 0, [0., 0.]),
>       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4', (2,))])
>
> In [4]: memoryview(np.array(0, dtype="i,i,2f")).format
> Out[4]: 'T{i:f0:i:f1:(2)f:f2:}'
>

Aha, that's pretty cool, although I don't think this is flexible enough to support e.g. field names or nested fields. After pondering it, I think we will add a format ID to our spec and stick with NumPy as the default. If in the future another, better-defined format appears, we could still change the representation and use a new ID, while keeping backwards compatibility if needed.

Thanks!

--
Francesc Alted
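The plan of tagging the stored dtype string with a format ID can be sketched as a small registry. Everything here is hypothetical illustration, not the actual B2ND metalayer encoding: the ID values, the `(format_id, text)` pair and the helper names are assumptions. ID 0 stands for the NumPy-based representation chosen in this thread; a decoder dispatches on the ID, so a new format could be registered later without changing how existing ID-0 metadata is read.

```python
import ast

import numpy as np


def _numpy_encode(dtype: np.dtype) -> str:
    # Structured dtypes keep field names via str(); others use the typestr.
    return str(dtype) if dtype.names is not None else dtype.str


def _numpy_decode(text: str) -> np.dtype:
    # The structured form is a Python literal, so parse it before np.dtype().
    if text.startswith(("[", "{")):
        return np.dtype(ast.literal_eval(text))
    return np.dtype(text)


# Hypothetical registry: format ID -> (encoder, decoder).
DTYPE_FORMATS = {0: (_numpy_encode, _numpy_decode)}
DEFAULT_FORMAT_ID = 0


def encode_dtype(dtype, format_id=DEFAULT_FORMAT_ID):
    """Return a (format_id, text) pair describing `dtype`."""
    encode, _ = DTYPE_FORMATS[format_id]
    return format_id, encode(np.dtype(dtype))


def decode_dtype(format_id, text):
    """Rebuild a dtype, rejecting format IDs this reader doesn't know."""
    try:
        _, decode = DTYPE_FORMATS[format_id]
    except KeyError:
        raise ValueError(f"unknown dtype format id: {format_id}") from None
    return decode(text)


fid, text = encode_dtype("S1,i8")
print(fid, text, decode_dtype(fid, text))
```

Failing loudly on an unknown ID (rather than guessing) is what keeps old readers safe when a newer writer uses a format they have never seen.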
[Numpy-discussion] Re: Standard for dtype string representation?
On Wed, 2023-02-08 at 17:08 +0100, Francesc Alted wrote:
> Aha, that's pretty cool, although I don't think this is flexible enough to
> support e.g. field names or nested fields. After pondering about it, I
> think we will add a format ID to our spec, and will stick with NumPy as the
> default. If in the future another format appears that is more well
> defined, we could still change the representation and use a new ID, while
> keeping backwards compatibility if needed.

It does support field names. I think the main problem may be that it cannot support e.g. datetimes. You probably also can't support empty field names (but maybe that is too weird to be useful anyway).

Note that in the output there, `T{i:f0}` denotes that it is structured and the name is "f0". And yes, you can nest another `T{}` inside.

I am not sure whether type codes are limited to single characters, which (to me) would seem a bit limiting. I also think we may have to agree on e.g. an empty name denoting padding.

- Sebastian
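Two of the points above can be checked directly with a short sketch (`buffer_format` is a hypothetical helper name, defined here so the block stands alone): a nested structured dtype produces a `T{...}` nested inside the outer one, while a `datetime64` dtype cannot be exported through the buffer protocol at all, which is the datetime limitation mentioned above.

```python
import numpy as np


def buffer_format(dtype):
    """Hypothetical helper: PEP 3118 format for `dtype`, or None if refused."""
    try:
        return memoryview(np.zeros(1, dtype=dtype)).format
    except (ValueError, TypeError):
        # NumPy raises when a dtype has no buffer-protocol type code.
        return None


# A structured dtype whose second field is itself structured.
nested = np.dtype([("a", "i4"), ("b", [("x", "f8"), ("y", "f8")])])
print(buffer_format(nested))  # the inner struct shows up as its own T{...}

# datetime64 has no buffer-protocol type code, so the export is refused.
print(buffer_format("M8[ns]"))
```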
[Numpy-discussion] Re: Standard for dtype string representation?
On Wed, Feb 8, 2023 at 5:28 PM Sebastian Berg wrote:

> It does support field names. I think the main problem may be that it
> cannot support e.g. datetimes. You probably also can't support empty
> field names (but maybe that is too weird to be useful anyway).
>

Right. I don't think empty field names are useful, but not supporting datetimes is a deal breaker for us.

> Note that in the output there `T{i:f0}` denotes that it is structured
> and the name is "f0". And yes, you can nest another `T{}` inside.
>

Good to know.

> I am not sure whether type codes are limited to single characters
> (which IMO wouldn't be nice), which (to me) would seem a bit limited.
> I also think we may have to agree on e.g. an empty name denoting
> padding.
>

While I agree that the buffer format is a good effort, I consider the NumPy representation considerably more complete (which makes sense, as it had to evolve more closely following users' needs). Still, I *hope* there will be an effort to standardize the NumPy format a bit more formally in the future.