[Numpy-discussion] Standard for dtype string representation?

2023-02-08 Thread Francesc Alted
Hi,

We are in the process of using a standard representation of data types for
the forthcoming version of N-dim arrays in C-Blosc2, and we want to use the
NumPy string representation for that (see the end of
https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst).  It
might seem a bit strange to use the specification of a Python package for
that, but given its predominant role in data science, I don't think this
should come as a surprise to anyone.

There are some small gotchas though.  For simple data types, the string
representation is *apparently* fine. E.g.:

In [16]: str(np.dtype("i8"))
Out[16]: 'int64'

However, as soon as we try to represent the endianness of the type, we get:

In [17]: str(np.dtype(">i8"))
Out[17]: '>i8'

So, it uses the short version of the representation.  And the same happens
with the structured types:

In [22]: str(np.dtype("S1,i8"))
Out[22]: "[('f0', 'S1'), ('f1', '<i8')]"

[...] (https://data-apis.org/array-api/latest/API_specification/data_types.html#data-types),
but it does not seem this is being addressed.

For now (and for the Python-Blosc2 wrapper), we are going in this direction:

    if dtype.kind == 'V':
        repr = str(dtype)
    else:
        repr = dtype.str

Is there a way (or an ongoing effort) to express the variety of data types
in NumPy that beats the above (which seems somewhat inconsistent to me)?
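
A minimal sketch of how the rule above could round-trip, for illustration only. The helper names `dtype_to_string` and `string_to_dtype` are hypothetical, and parsing the structured form with `ast.literal_eval` is an assumption about how a reader might decode it, not something C-Blosc2 or NumPy prescribes:

```python
import ast

import numpy as np


def dtype_to_string(dtype):
    """Serialize a dtype: str() for void/structured kinds, dtype.str otherwise."""
    dtype = np.dtype(dtype)
    if dtype.kind == "V":
        return str(dtype)
    return dtype.str


def string_to_dtype(text):
    """Parse a string produced by dtype_to_string back into a dtype."""
    # Structured, dict-style and subarray dtypes print as Python literals.
    if text.startswith(("[", "{", "(")):
        return np.dtype(ast.literal_eval(text))
    return np.dtype(text)


for spec in ("i8", ">i8", "S1,i8", "M8[s]"):
    original = np.dtype(spec)
    text = dtype_to_string(original)
    # Every sample should survive the round trip unchanged.
    assert string_to_dtype(text) == original
    print(f"{spec!r:10} -> {text!r}")
```

The two branches produce visibly different syntaxes (a type code versus a Python literal), which is the inconsistency the question is about.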

Thanks!
-- 
Francesc Alted
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Standard for dtype string representation?

2023-02-08 Thread Sebastian Berg
On Wed, 2023-02-08 at 12:48 +0100, Francesc Alted wrote:
> Hi,
> 
> 



> Is there a way (or an ongoing effort) to express the variety of data
> types
> in NumPy that beats the above (which seems somewhat inconsistent to
> me)?

How about using the Python buffer interface format string (maybe with
some limitations)?
Other than that, I don't have an obvious idea right now.

- Sebastian



[Numpy-discussion] Re: Introducing Arm Optimized Routines

2023-02-08 Thread Chris Sidebottom
Hello again :-) 

Just as an update for the list, the first PR has now been raised to integrate 
Optimized Routines, demonstrating the performance improvements (sometimes 2x 
faster):
https://github.com/numpy/numpy/pull/23171

Once we've achieved the initial milestone of getting these routines integrated
and the performance improved, it would be interesting to understand what's
required to translate them into universal intrinsics. I notice that SVE support
(https://github.com/numpy/numpy/pull/22265) isn't quite ready for universal
intrinsics, which leads me to believe we would need to use the library
there either way?

Cheers,
Chris


[Numpy-discussion] next NumPy Newcomers' Hour - 8 pm UTC

2023-02-08 Thread Inessa Pawson
Our next Newcomers' Hour will be held tomorrow, February 9th at 8 pm UTC.
Stop by to ask questions or just to say hi.

To add topics you'd like to discuss to the meeting agenda, follow this
link: https://hackmd.io/3f3otyyuTte3FU9y3QzsLg?both

Join the meeting via Zoom:
https://us06web.zoom.us/j/82563808729?pwd=ZFU3Z2dMcXBGb05YemRsaGE1OW5nQT09

--
Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson


[Numpy-discussion] Re: Standard for dtype string representation?

2023-02-08 Thread Francesc Alted
On Wed, Feb 8, 2023 at 1:42 PM Sebastian Berg 
wrote:

> On Wed, 2023-02-08 at 12:48 +0100, Francesc Alted wrote:
> > Hi,
> >
> >
>
> 
>
> > Is there a way (or an ongoing effort) to express the variety of data
> > types
> > in NumPy that beats the above (which seems somewhat inconsistent to
> > me)?
>
> How about using the Python buffer interface format string (maybe with
> some limitations).
>

If you mean the array interface (
https://numpy.org/doc/stable/reference/arrays.interface.html), this is what
dtype.str provides (
https://numpy.org/doc/stable/reference/generated/numpy.dtype.str.html).
But the limitation here is that structured types are represented by the 'V'
char, which does not properly describe them by any means.
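
As a small illustration of that limitation (a sketch, not part of the original message): `dtype.str` collapses a structured dtype to a void code that carries only the item size, while `dtype.descr` keeps the fields. Note that the NumPy documentation warns that `descr` is not always a valid `np.dtype` argument, for example when padding is present, so the round trip below is only shown for this simple packed case.

```python
import numpy as np

structured = np.dtype("S1,i8")

# dtype.str keeps only "void, 9 bytes"; the field layout is gone.
print(structured.str)  # |V9

# dtype.descr is the array-interface description and keeps the fields.
print(structured.descr)

rebuilt_from_str = np.dtype(structured.str)
rebuilt_from_descr = np.dtype(structured.descr)

print(rebuilt_from_str.fields is None)   # True: no field information left
print(rebuilt_from_descr == structured)  # True for this packed layout
```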




-- 
Francesc Alted


[Numpy-discussion] Re: Standard for dtype string representation?

2023-02-08 Thread Sebastian Berg
On Wed, 2023-02-08 at 14:31 +0100, Francesc Alted wrote:
> If you mean the array interface (
> https://numpy.org/doc/stable/reference/arrays.interface.html), this
> is what
> dtype.str provides (
> https://numpy.org/doc/stable/reference/generated/numpy.dtype.str.html
> ).
> But the limitation here is that structured types are represented by
> the 'V'
> char, which is not properly representing it by any means.
> 

Ah, I was thinking of what the Python buffer protocol uses, which is
what struct uses:

https://docs.python.org/3/library/struct.html#module-struct

That has some annoyances for sure, and structured dtypes with field
names need rather strange syntax.  Also I think padding bytes at best
are simply fields with an empty name.
But overall, it probably already does a better job than any `str()` for
basic types:

In [2]: import numpy as np

In [3]: np.array(0, dtype="i,i,2f")
Out[3]: 
array((0, 0, [0., 0.]),
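      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4', (2,))])

In [4]: memoryview(np.array(0, dtype="i,i,2f")).format
Out[4]: 'T{i:f0:i:f1:(2)f:f2:}'

- Sebastian
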


[Numpy-discussion] Re: Standard for dtype string representation?

2023-02-08 Thread Francesc Alted
On Wed, Feb 8, 2023 at 3:19 PM Sebastian Berg 
wrote:

> Ah, I was thinking of what the Python buffer protocol uses, which is
> what struct uses:
>
> https://docs.python.org/3/library/struct.html#module-struct
>
> That has some annoyances for sure, and structured dtypes with field
> names need rather strange syntax.  Also I think padding bytes at best
> are simply fields with an empty name.
> But overall, it probably already does a better job than any `str()` for
> basic types:
>
> In [2]: import numpy as np
>
> In [3]: np.array(0, dtype="i,i,2f")
> Out[3]:
> array((0, 0, [0., 0.]),
>       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f4', (2,))])
>
> In [4]: memoryview(np.array(0, dtype="i,i,2f")).format
> Out[4]: 'T{i:f0:i:f1:(2)f:f2:}'
>

Aha, that's pretty cool, although I don't think this is flexible enough to
support e.g. field names or nested fields.  After pondering it, I
think we will add a format ID to our spec, and will stick with NumPy as the
default.  If a better-defined format appears in the future, we could still
change the representation and use a new ID, while keeping backwards
compatibility if needed.
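
To make the format-ID idea concrete, here is a minimal sketch. Everything in it is hypothetical: the key names `dtype_format` and `dtype`, the ID value `0`, and the helper names are assumptions for illustration and are not taken from the C-Blosc2 metalayer spec.

```python
import ast

import numpy as np

DTYPE_FORMAT_NUMPY = 0  # hypothetical ID, not taken from the C-Blosc2 spec


def _parse_numpy(text):
    """Parse a NumPy-style dtype string (type code or Python literal)."""
    if text.startswith(("[", "{", "(")):
        return np.dtype(ast.literal_eval(text))
    return np.dtype(text)


# Registry of known formats; a future representation would add a new ID here.
PARSERS = {DTYPE_FORMAT_NUMPY: _parse_numpy}


def encode(dtype):
    """Store the dtype string together with the ID of the format it uses."""
    dtype = np.dtype(dtype)
    text = str(dtype) if dtype.kind == "V" else dtype.str
    return {"dtype_format": DTYPE_FORMAT_NUMPY, "dtype": text}


def decode(meta):
    """Look up the parser by format ID, defaulting to the NumPy format."""
    format_id = meta.get("dtype_format", DTYPE_FORMAT_NUMPY)
    try:
        parser = PARSERS[format_id]
    except KeyError:
        raise ValueError(f"unknown dtype format ID: {format_id}") from None
    return parser(meta["dtype"])


print(encode("S1,i8"))
print(decode(encode("S1,i8")))
```

Defaulting a missing ID to the NumPy format is one way to keep older metadata readable; a reader that meets an unknown ID fails loudly instead of guessing.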

Thanks!




-- 
Francesc Alted


[Numpy-discussion] Re: Standard for dtype string representation?

2023-02-08 Thread Sebastian Berg
On Wed, 2023-02-08 at 17:08 +0100, Francesc Alted wrote:
> Aha, that's pretty cool, although I don't think this is flexible
> enough to
> support e.g. field names or nested fields.  After pondering about it,
> I
> think we will add a format ID to our spec, and will stick with NumPy
> as the
> default.  If in the future another format appears that is more well
> defined, we could still change the representation and use a new ID,
> while
> keeping backwards compatibility if needed.
> 


It does support field names.  I think the main problem may be that it
cannot support e.g. datetimes.  You probably also can't support empty
field names (but maybe that is too weird to be useful anyway).

Note that in the output there `T{i:f0}` denotes that it is structured
and the name is "f0".  And yes, you can nest another `T{}` inside.
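
A quick way to see the nesting, as a sketch: the field names `id`, `pos`, `x` and `y` below are made up for illustration, and the exact format string (byte-order prefixes, padding) can vary with platform and layout, so only its overall shape is checked.

```python
import numpy as np

# A structured dtype with a nested structured field.
nested = np.dtype([("id", "i4"), ("pos", [("x", "f4"), ("y", "f4")])])

# memoryview exposes the buffer-protocol format string that NumPy exports.
fmt = memoryview(np.zeros(1, dtype=nested)).format
print(fmt)

# One T{...} group for the outer struct and one for the nested field.
print(fmt.count("T{"))
```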

I am not sure whether type codes are limited to single characters,
which (to me) would seem a bit limiting.
I also think we may have to agree on e.g. an empty name denoting
padding.

- Sebastian



[Numpy-discussion] Re: Standard for dtype string representation?

2023-02-08 Thread Francesc Alted
On Wed, Feb 8, 2023 at 5:28 PM Sebastian Berg 
wrote:

> It does support field names.  I think the main problem may be that it
> cannot support e.g. datetimes.  You probably also can't support empty
> field names (but maybe that is too weird to be useful anyway).
>

Right.  I don't think empty field names are useful, but not supporting
datetimes is a deal breaker for us.
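
The datetime gap can be probed with a short sketch like the one below. The helper `buffer_format_or_none` is hypothetical, and the set of exceptions it catches is an assumption: in the NumPy versions current at the time of this thread, exporting a datetime64 array through the buffer protocol is expected to fail, but the code reports whatever it observes instead of assuming that.

```python
import numpy as np


def buffer_format_or_none(dtype):
    """Return the buffer-protocol format for dtype, or None if export fails."""
    try:
        return memoryview(np.zeros(1, dtype=dtype)).format
    except (ValueError, TypeError, BufferError):
        return None


for spec in ("i4", "M8[s]"):
    dt = np.dtype(spec)
    print(spec, "| buffer format:", buffer_format_or_none(dt), "| dtype.str:", dt.str)
    # The NumPy string representation round-trips in both cases.
    assert np.dtype(dt.str) == dt
```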


>
> Note that in the output there `T{i:f0}` denotes that it is structured
> and the name is "f0".  And yes, you can nest another `T{}` inside.
>

Good to know.


>
> I am not sure whether type codes are limited to single characters
> (which IMO wouldn't be nice), which (to me) would seem a bit limited.
> I also think we may have to agree on e.g. an empty name denoting
> padding.
>

While I agree that the buffer format is a good effort, I consider the NumPy
representation considerably more complete (which makes sense, as it had to
evolve following users' needs more closely).  Still, I *hope* there will be an
effort to standardize the NumPy format a bit more formally in the future.


