[Numpy-discussion] plan for moving to Meson

2022-11-11 Thread Ralf Gommers
Hi all,

With distutils now removed from the stdlib in the Python 3.12 release
cycle, the clock is ticking a bit for dealing with our build system
situation. With SciPy's move to Meson now basically complete - there are
always loose ends & improvements, but the 1.9 releases have gone well -
it's time to look at NumPy doing the same thing. Scikit-image has also
merged support for Meson, and Pandas is about to. So all the generic &
Python packaging related things we need should be in place. NumPy does have
some hairy stuff of course that those other projects don't have (lots of
config checks and platform-specific behavior, extensive SIMD support). It
shouldn't be too difficult to get a baseline build - Linux/macOS with
baseline SIMD flags - working, but there'll be a long tail of things that
are hard to test or will need to be upstreamed to Meson.

Here is a tracking issue where I wrote up a plan for how to approach the
transition: https://github.com/numpy/numpy/issues/22546. Linked from there
is a GitHub project board that we plan to use to divide up the work. And
there is a `meson` branch in the repo with a start to the implementation.

NumPy should default to building with Meson in the 1.25.0 release, which
should also still build with `numpy.distutils`. And `numpy.distutils` will
continue to be shipped for Python <3.12 for a while after that, until we
determine that it's no longer needed.

If anyone has ideas or concerns regarding the current plan, it'd be great
to hear them - here or on the issue.

Cheers,
Ralf
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Sebastian Berg
On Fri, 2022-11-11 at 12:27 +0100, Ralf Gommers wrote:
> Hi all,
> 
> With distutils now removed from the stdlib in the Python 3.12 release
> cycle, the clock is ticking a bit for dealing with our build system
> situation. With SciPy's move to Meson now basically complete - there
> are
> always loose ends & improvements, but the 1.9 releases have gone well
> -
> it's time to look at NumPy doing the same thing. Scikit-image has
> also
> merged support for Meson, and Pandas is about to. So all the generic
> &
> Python packaging related things we need should be in place. NumPy
> does have
> some hairy stuff of course that those other projects don't have (lots
> of
> config checks and platform-specific behavior, extensive SIMD
> support). It
> shouldn't be too difficult to get a baseline build - Linux/macOS with
> baseline SIMD flags - working, but there'll be a long tail of things
> that
> are hard to test or will need to be upstreamed to Meson.
> 
> Here is a tracking issue where I wrote up a plan for how to approach
> the
> transition: https://github.com/numpy/numpy/issues/22546. Linked from
> there
> is a GitHub project board that we plan to use to divide up the work.
> And
> there is a `meson` branch in the repo with a start to the
> implementation.
> 
> NumPy should default to building with Meson in the 1.25.0 release,
> which
> should also still build with `numpy.distutils`. And `numpy.distutils`
> will
> continue to be shipped for Python <3.12 for a while after that, until
> we
> determine that it's no longer needed.
> 
> If anyone has ideas or concerns regarding the current plan, it'd be
> great
> to hear them - here or on the issue.


Thanks for working on this!  I fully trust you, Stéfan, and everyone
else involved to push this forward.

It seems like most of the decisions were really made for us and the
open points might be mainly about the how and maybe details about the
what.
To me it would also seem fine if you work on the main branch, but I
guess that would just make things harder to iterate at this point.

Since "NEP" was mentioned somewhere.  There would be two goals of
having a short one:
1. Settling on a specific approach for our build system
2. Informing users (about breakages/workarounds and how things work)

I doubt you/we need it for 1. at this point or details for why meson,
but if anyone disagrees then maybe we do ;).  But 2. may very much be
worthwhile.

- Sebastian


PS: I do like the idea of having short NEPs.  My feeling is that the
"painful" ones are the ones that are technically tricky.  For those it
is clear that they are necessary.
The other issue is about documenting intent (sometimes a NEP may be more
of a roadmap proposal than a concrete implementation one).


> 
> Cheers,
> Ralf
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net


___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Ralf Gommers
On Fri, Nov 11, 2022 at 1:52 PM Sebastian Berg 
wrote:

> On Fri, 2022-11-11 at 12:27 +0100, Ralf Gommers wrote:
> > Hi all,
> >
> > With distutils now removed from the stdlib in the Python 3.12 release
> > cycle, the clock is ticking a bit for dealing with our build system
> > situation. With SciPy's move to Meson now basically complete - there
> > are
> > always loose ends & improvements, but the 1.9 releases have gone well
> > -
> > it's time to look at NumPy doing the same thing. Scikit-image has
> > also
> > merged support for Meson, and Pandas is about to. So all the generic
> > &
> > Python packaging related things we need should be in place. NumPy
> > does have
> > some hairy stuff of course that those other projects don't have (lots
> > of
> > config checks and platform-specific behavior, extensive SIMD
> > support). It
> > shouldn't be too difficult to get a baseline build - Linux/macOS with
> > baseline SIMD flags - working, but there'll be a long tail of things
> > that
> > are hard to test or will need to be upstreamed to Meson.
> >
> > Here is a tracking issue where I wrote up a plan for how to approach
> > the
> > transition: https://github.com/numpy/numpy/issues/22546. Linked from
> > there
> > is a GitHub project board that we plan to use to divide up the work.
> > And
> > there is a `meson` branch in the repo with a start to the
> > implementation.
> >
> > NumPy should default to building with Meson in the 1.25.0 release,
> > which
> > should also still build with `numpy.distutils`. And `numpy.distutils`
> > will
> > continue to be shipped for Python <3.12 for a while after that, until
> > we
> > determine that it's no longer needed.
> >
> > If anyone has ideas or concerns regarding the current plan, it'd be
> > great
> > to hear them - here or on the issue.
>
>
> Thanks for working on this!  I fully trust you, Stéfan, and everyone
> else involved to push this forward.
>
> It seems like most of the decisions were really made for us and the
> open points might be mainly about the how and maybe details about the
> what.
>

I think there will be quite a few decisions along the way that different
people may want to weigh in on. So we need some kind of "decision log"
somewhere and a way to bubble up those decisions - which fits with the NEP
idea (as long as it remains a draft and can be extended).

To give you a couple examples just from spending a few hours on build
system changes today:

(1) I noticed we still have an `oldnumeric.h` header with a comment that it
can probably be removed. So I used the new GitHub code search, found the
two actively developed projects that still use it, and asked them to get
rid of their usages so that we can indeed safely remove it. And then I added
code and a comment in the meson branch to not install it.

(2) a more important one, the `.c.src` format. In SciPy we got rid of it,
and we're not going to make Meson understand an ad-hoc templating method
that only NumPy uses. So we have two choices: also get rid of it, or write
a new custom preprocessing utility for NumPy's Meson build. I think we have
too much code using it to remove it, so we should probably go with the "new
utility" option. But in case we did want to get rid of it, now would be a
good time.
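
For anyone not familiar with the format: `.c.src` files contain repeat blocks
with `@name@` placeholders that get expanded once per loop value. A toy Python
sketch of the substitution idea (not the actual `conv_template.py` code, just
an illustration):

    # Toy illustration of .c.src-style expansion: substitute the @type@
    # placeholder once per C type.  The real tool (conv_template.py) does
    # much more; this only shows the idea.
    template = "static void @type@_square(@type@ *x) { *x *= *x; }"
    for c_type in ["float", "double", "long double"]:
        print(template.replace("@type@", c_type))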


> To me it would also seem fine if you work on the main branch, but I
> guess that would just make things harder to iterate at this point.
>

Yes, that'd be too disruptive at this point.


>
> Since "NEP" was mentioned somewhere.  There would be two goals of
> having a short one:
> 1. Settling on a specific approach for our build system
> 2. Informing users (about breakages/workarounds and how things work)
>
> I doubt you/we need it for 1. at this point or details for why meson,
> but if anyone disagrees then maybe we do ;).  But 2. may very much be
> worthwhile.
>

We should have regular HTML docs for everything, so the NEP on that point
would be more of an FYI with a pointer to those docs.


>
> - Sebastian
>
>
> PS: I do like the idea of having short NEPs.  My feeling is that the
> "painful" ones are the ones that are technically tricky.  For those it
> is clear that they are necessary.
> The other issue is about documenting intent (sometimes a NEP may be more
> of a roadmap proposal than a concrete implementation one).
>

That does make sense to me. NEPs are not good as documentation - we should
have more design & architecture docs that are maintained over time for that
- but to document intent at the time a change was made they are quite nice.

Cheers,
Ralf
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Create `np.exceptions` for new exceptions in NumPy?

2022-11-11 Thread Sebastian Berg
Hi all,

I want to add a new exception or two.  It is a longer story, which you
can find at the bottom :).

Let's create a namespace for custom errors!  I don't want to propose new
exceptions that just get dumped into the main namespace, so why not
make one like `errors` in pandas or `exceptions` in scikit-learn?

I would suggest introducing `np.exceptions`.

We already have custom errors and warnings:

* AxisError
* TooHardError  (used by `np.shares_memory()`)
* ComplexWarning
* RankWarning
* VisibleDeprecationWarning
* ModuleDeprecationWarning  (not sure what this is)

And a few private ones around ufunc "no loops" or casting failures (for
delayed printing and formatting convenience). 

No need to move them all now, but maybe it is time to create a module
where we put them all?  With the intention that when the stars align,
we will deprecate their main namespace aliases (either soon or in
years).

Beyond the error I just wanted, there were things brought up before,
such as either `BroadcastError` or `ShapeMismatch`.
Adding the namespace would make them more discoverable and just remove
an annoying road-block for adding new ones.
I will argue that the cost is practically zero.  I do not want custom
exceptions for too many things, but there are probably good reasons to
have more than we do have right now, and even the ones we have seem
enough for a namespace.
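
To make it concrete, a minimal sketch of what such a module could look like
(nothing here is implemented; the classes shown are simplified placeholders
for the existing ones, which would simply move or be re-exported):

    # numpy/exceptions.py  (hypothetical layout sketch)
    class AxisError(ValueError, IndexError):
        """Axis supplied was invalid (today's np.AxisError)."""

    class TooHardError(RuntimeError):
        """Max work was exceeded, e.g. by np.shares_memory()."""

    # numpy/__init__.py would keep aliases around for backwards compatibility:
    # from numpy.exceptions import AxisError, TooHardError, ...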


Cheers,

Sebastian



The long story is that following one of those many threads, I decided
that it looks worthwhile to introduce a new error class:

InvalidPromotion

I would want to use this for any/most promotion related failures.  That
means:

* `np.result_type` or `np.promote_types` will give this if there is no
  valid way to promote

* UFuncs will either give this error when there is no implementation
  or use it to raise a reliable error for "operation not defined for
  the inputs".  [0]

This would inherit from `TypeError` "of course".  The why is a ball of
yarn that includes having a better shot at *finally* getting rid of
the annoying comparison deprecation/future warning [1], eventually
allowing more informative promotion errors, and that it might just be
useful.
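
To make the intended usage concrete, a hypothetical sketch (neither
`np.exceptions` nor `InvalidPromotion` exists yet; today the failure below
just raises a plain `TypeError`):

    import numpy as np

    try:
        # float64 and datetime64 have no common type, so promotion fails
        np.result_type(np.float64, "M8[s]")
    except np.exceptions.InvalidPromotion as exc:   # hypothetical name/namespace
        print("promotion failed:", exc)

Because it would subclass `TypeError`, existing `except TypeError:` blocks
would keep catching it unchanged.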

Cheers,

Sebastian


[0] I first thought we should use the same error, but you can argue
that `InvalidPromotion` doesn't include "this ufunc only works for
floating point values".
And yes, "no loop" can also mean "not implemented", but that may be
need to be distinguished explicitly if really needed.

[1] e.g. `np.array(["asdf"]) == 0`


___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-11 Thread Oscar Gustafsson
Thanks! That does indeed look like a promising approach! And for sure it
would be better to avoid having to reimplement the whole array-part and
only focus on the data types. (If successful, my idea of a project would
basically solve all the custom numerical types discussed, bfloat16, int2,
int4 etc.)

I understand that the following is probably a hard question to answer, but
is it expected that there will be work done on this in the "near" future
to fill any holes and possibly make it more stable? For context, the current
plan on my side is to propose this as a student project for the spring, so I
am primarily asking to help with planning and to describe the project a bit
better.

BR Oscar

Den tors 10 nov. 2022 kl 15:13 skrev Sebastian Berg <
sebast...@sipsolutions.net>:

> On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
> > Den tors 10 nov. 2022 kl 13:10 skrev Sebastian Berg <
> > sebast...@sipsolutions.net>:
> >
> > > On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
> > > > >
> > > > > I'm not an expert, but I never encountered rounding floating
> > > > > point
> > > > > numbers
> > > > > in bases different from 2 and 10.
> > > > >
> > > >
> > > > I agree that this is probably not very common. More a possibility
> > > > if
> > > > one
> > > > would supply a base argument to around.
> > > >
> > > > However, it is worth noting that Matlab has the quant function,
> > > > https://www.mathworks.com/help/deeplearning/ref/quant.html which
> > > > basically
> > > > supports arbitrary bases (as a special case of an even more
> > > > general
> > > > approach). So there may be other use cases (although the example
> > > > basically
> > > > just implements around(x, 1)).
> > >
> > >
> > > To be honest, hearing hardware design and data compression does
> > > make me
> > > lean towards it not being mainstream enough that inclusion in NumPy
> > > really makes sense.  But happy to hear opposing opinions.
> > >
> >
> > Here I can easily argue that "all" computations are limited by finite
> > word
> > length and as soon as you want to see the effect of any type of
> > format not
> > supported out of the box, it will be beneficial. (Strictly, it makes
> > more
> > sense to quantize to a given number of bits than a given number of
> > decimal
> > digits, as we cannot represent most of those exactly.)  But I may not
> > do
> > that.
> >
> >
> > > It would be nice to have more of a culture around ufuncs that do
> > > not
> > > live in NumPy.  (I suppose at some point it was more difficult to
> > > do C-
> > > extension, but that is many years ago).
> > >
> >
> > I do agree with this though. And this got me realizing that maybe
> > what I
> > actually would like to do is to create an array-library with fully
> > customizable (numeric) data types instead. That is, sort of, the
> > proper way
> > to do it, although the proposed approach is indeed simpler and in
> > most
> > cases will work well enough.
> >
> > (Am I right in believing that it is not that easy to piggy-back
> > custom data
> > types onto NumPy arrays? Something different from using object as
> > dtype or
> > the "struct-like" custom approach using the existing scalar types.)
>
> NumPy is pretty much fully customizable (beyond just numeric data
> types).
> Admittedly, to not have weird edge cases and have more power you have
> to use the new API (NEP 41-43 [1]) and that is "experimental" and may
> have some holes.
> "Experimental" doesn't mean it is expected to change significantly,
> just that you can't ship your stuff broadly really.
>
> The holes may matter for some complicated dtypes (custom memory
> allocation, parametric...). But at this point many should be rather
> fixable, so before you do your own give NumPy a chance?
>
> - Sebastian
>
>
> [1] https://numpy.org/neps/nep-0041-improved-dtype-support.html
>
> >
> > BR Oscar Gustafsson
> > ___
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: sebast...@sipsolutions.net
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: oscar.gustafs...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Evgeni Burovski
> (2) a more important one, the `.c.src` format. In SciPy we got rid of it, and 
> we're not going to make Meson understand an ad-hoc templating method that 
> only NumPy uses. So we have two choices: also get rid of it, or write a new 
> custom preprocessing utility for NumPy's Meson build. I think we have too 
> much code using it to remove it, so probably should go with the "new utility" 
> option. But in case we did want to get rid of it, now would be a good time.

As a comment from the peanut gallery, where the project board is not
even visible (https://github.com/orgs/numpy/projects/7/views/7 404s
for me --- it would be perfectly understandable if you prefer to keep
it visible to select individuals!), and it has probably been discussed
before: any thoughts on changing it to e.g. tempita templating?
Translating .c.src templates to .c.in is straightforward if tedious,
as e.g. the SciPy transition showed.
This is of course quite a bit of work, but so is a new utility.
Again, just throwing it out there :-).

Cheers,

Evgeni
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-11 Thread Sebastian Berg
On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
> Thanks! That does indeed look like a promising approach! And for sure
> it
> would be better to avoid having to reimplement the whole array-part
> and
> only focus on the data types. (If successful, my idea of a project
> would
> basically solve all the custom numerical types discussed, bfloat16,
> int2,
> int4 etc.)

OK, more below.  But unfortunately `int2` and `int4` *are* problematic,
because the NumPy array uses a byte-sized strided layout, so you would
have to store them in a full byte, which is probably not what you want.

I am always thinking of adding a provision for it in the DTypes so that
someone could use part of the NumPy machine to make an array that can
have non-byte sized strides, but the NumPy array itself is ABI
incompatible with storing these packed :(.

(I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
but it would still have to take 1-byte storage space when put into a
NumPy array, so I am not sure there is much of a point.)

> 
> I understand that the following is probably a hard question to
> answer, but
> is it expected that there will be work done on this in the "near"
> future
> to fill any holes and possibly become more stable? For context, the
> current
> plan on my side is to propose this as a student project for the
> spring, so
> primarily asking for planning and describing the project a bit
> better.


Well, it depends on what you need.  With the exception above, I doubt
the "holes" will matter much practice unless you are targeting for a
polished release rather than experimentation.
But of course it may be that you run into something that is important
for you, but doesn't yet quite work.

I will note just dealing with the Python/NumPy C-API can be a fairly
steep learning curve, so you need someone comfortable to dive in and
budget a good amount of time for that part.
And yes, this is pretty new, so there may be stumbling stones (which I
am happy to discuss in NumPy issues or directly).

- Sebastian


> 
> BR Oscar
> 
> Den tors 10 nov. 2022 kl 15:13 skrev Sebastian Berg <
> sebast...@sipsolutions.net>:
> 
> > On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
> > > Den tors 10 nov. 2022 kl 13:10 skrev Sebastian Berg <
> > > sebast...@sipsolutions.net>:
> > > 
> > > > On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
> > > > > > 
> > > > > > I'm not an expert, but I never encountered rounding
> > > > > > floating
> > > > > > point
> > > > > > numbers
> > > > > > in bases different from 2 and 10.
> > > > > > 
> > > > > 
> > > > > I agree that this is probably not very common. More a
> > > > > possibility
> > > > > if
> > > > > one
> > > > > would supply a base argument to around.
> > > > > 
> > > > > However, it is worth noting that Matlab has the quant
> > > > > function,
> > > > > https://www.mathworks.com/help/deeplearning/ref/quant.html wh
> > > > > ich
> > > > > basically
> > > > > supports arbitrary bases (as a special case of an even more
> > > > > general
> > > > > approach). So there may be other use cases (although the
> > > > > example
> > > > > basically
> > > > > just implements around(x, 1)).
> > > > 
> > > > 
> > > > To be honest, hearing hardware design and data compression does
> > > > make me
> > > > lean towards it not being mainstream enough that inclusion in
> > > > NumPy
> > > > really makes sense.  But happy to hear opposing opinions.
> > > > 
> > > 
> > > Here I can easily argue that "all" computations are limited by
> > > finite
> > > word
> > > length and as soon as you want to see the effect of any type of
> > > format not
> > > supported out of the box, it will be beneficial. (Strictly, it
> > > makes
> > > more
> > > sense to quantize to a given number of bits than a given number
> > > of
> > > decimal
> > > digits, as we cannot represent most of those exactly.)  But I may
> > > not
> > > do
> > > that.
> > > 
> > > 
> > > > It would be nice to have more of a culture around ufuncs that
> > > > do
> > > > not
> > > > live in NumPy.  (I suppose at some point it was more difficult
> > > > to
> > > > do C-
> > > > extension, but that is many years ago).
> > > > 
> > > 
> > > I do agree with this though. And this got me realizing that maybe
> > > what I
> > > actually would like to do is to create an array-library with
> > > fully
> > > customizable (numeric) data types instead. That is, sort of, the
> > > proper way
> > > to do it, although the proposed approach is indeed simpler and in
> > > most
> > > cases will work well enough.
> > > 
> > > (Am I right in believing that it is not that easy to piggy-back
> > > custom data
> > > types onto NumPy arrays? Something different from using object as
> > > dtype or
> > > the "struct-like" custom approach using the existing scalar
> > > types.)
> > 
> > NumPy is pretty much fully customizable (beyond just numeric data
> > types).
> > Admittedly, to not have weird edge cases and have more power you
> > have

[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Sebastian Berg
On Fri, 2022-11-11 at 17:03 +0300, Evgeni Burovski wrote:
> > (2) a more important one, the `.c.src` format. In SciPy we got rid
> > of it, and we're not going to make Meson understand an ad-hoc
> > templating method that only NumPy uses. So we have two choices:
> > also get rid of it, or write a new custom preprocessing utility for
> > NumPy's Meson build. I think we have too much code using it to
> > remove it, so probably should go with the "new utility" option. But
> > in case we did want to get rid of it, now would be a good time.
> 
> As a comment from a peanut gallery, where the project board is not
> even visible (https://github.com/orgs/numpy/projects/7/views/7 404s
> for me --- it would be perfectly understandable if you prefer to keep
> it visible to select individuals!), and it has probably been
> discussed

Whoops, got distracted:  I made it publicly visible.  I assume that was
the intention and invisible is just the default.

- Sebastian


> before: any thoughts to change it to e.g. tempita templating?
> Translating .c.src templates to .c.in is straightforward if tedious,
> as e.g. SciPy transition showed.
> This is of course quite a bit of work, but so is a new utility.
> Again, just throwing it out there :-).
> 
> Cheers,
> 
> Evgeni
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net
> 


___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Ralf Gommers
On Fri, Nov 11, 2022 at 3:43 PM Sebastian Berg 
wrote:

> On Fri, 2022-11-11 at 17:03 +0300, Evgeni Burovski wrote:
> > > (2) a more important one, the `.c.src` format. In SciPy we got rid
> > > of it, and we're not going to make Meson understand an ad-hoc
> > > templating method that only NumPy uses. So we have two choices:
> > > also get rid of it, or write a new custom preprocessing utility for
> > > NumPy's Meson build. I think we have too much code using it to
> > > remove it, so probably should go with the "new utility" option. But
> > > in case we did want to get rid of it, now would be a good time.
> >
> > As a comment from a peanut gallery, where the project board is not
> > even visible (https://github.com/orgs/numpy/projects/7/views/7 404s
> > for me --- it would be perfectly understandable if you prefer to keep
> > it visible to select individuals!), and it has probably been
> > discussed
>
> Whoops, got distracted:  I made it publicly visible.  I assume that was
> the intention and invisible is just the default.
>

Yes, definitely intended, thanks for fixing that. I didn't notice that it
was set up as private, that's just my inexperience with the new GitHub
Projects interface.

Cheers,
Ralf



> - Sebastian
>
>
> > before: any thoughts to change it to e.g. tempita templating?
> > Translating .c.src templates to .c.in is straightforward if tedious,
> > as e.g. SciPy transition showed.
> > This is of course quite a bit of work, but so is a new utility.
> > Again, just throwing it out there :-).
>

The utility isn't new from scratch; it's more the boring job of refactoring
one of the numpy.distutils parts that should survive as a standalone thing.

Cheers,
Ralf
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-11 Thread Greg Lucas
>
> OK, more below.  But unfortunately `int2` and `int4` *are* problematic,
> because the NumPy array uses a byte-sized strided layout, so you would
> have to store them in a full byte, which is probably not what you want.


> I am always thinking of adding a provision for it in the DTypes so that
> someone could use part of the NumPy machine to make an array that can
> have non-byte sized strides, but the NumPy array itself is ABI
> incompatible with storing these packed :(.



(I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
> but it would still have to take 1-byte storage space when put into a
> NumPy array, so I am not sure there is much of a point.)




I have also been curious about the new DTypes mechanism and whether we
could do non-byte-sized DTypes with it. One use-case I have specifically is
reading and writing non-byte-aligned data [1]. So, this would work very
well for that use-case if the dtype knew how to read/write the
proper bit-size. For my use-case I wouldn't care too much if internally
NumPy needs to expand and store the data as full bytes, but being able to
read a bitwise binary stream into NumPy-native dtypes for further
processing would be useful, I think (without having to resort to unpackbits
and rearranging/packing to other types).

dtype = {'names': ('count0', 'count1'), 'formats': ('uint3', 'uint5')}
# x would have two unsigned ints, but reading only one byte from the stream
x = np.frombuffer(buffer, dtype)
# would be ideal to get tobytes() to know how to pack a uint3+uint5 DType
into a single byte as well
x.tobytes()

Greg

[1] Specifically, this is for very low bandwidth satellite data where we
try to pack as much information in the downlink and use every bit of space,
but once on the ground I can expand the bit-size fields to byte-size fields
without too much issue of worrying about space [puns intended].
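
For comparison, a minimal sketch of the kind of manual shifting/masking this
would avoid (the 3-bit/5-bit split and bit order are just assumptions to
match the example above):

    import numpy as np

    buffer = bytes([0b10100101, 0b00111110])     # example packed downlink bytes
    raw = np.frombuffer(buffer, dtype=np.uint8)

    count0 = raw >> 5          # high 3 bits of each byte
    count1 = raw & 0b11111     # low 5 bits of each byte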


On Fri, Nov 11, 2022 at 7:14 AM Sebastian Berg 
wrote:

> On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
> > Thanks! That does indeed look like a promising approach! And for sure
> > it
> > would be better to avoid having to reimplement the whole array-part
> > and
> > only focus on the data types. (If successful, my idea of a project
> > would
> > basically solve all the custom numerical types discussed, bfloat16,
> > int2,
> > int4 etc.)
>
> OK, more below.  But unfortunately `int2` and `int4` *are* problematic,
> because the NumPy array uses a byte-sized strided layout, so you would
> have to store them in a full byte, which is probably not what you want.
>
> I am always thinking of adding a provision for it in the DTypes so that
> someone could use part of the NumPy machine to make an array that can
> have non-byte sized strides, but the NumPy array itself is ABI
> incompatible with storing these packed :(.
>
> (I.e. we could plug that "hole" to allow making an int4 DType in NumPy,
> but it would still have to take 1-byte storage space when put into a
> NumPy array, so I am not sure there is much of a point.)
>
> >
> > I understand that the following is probably a hard question to
> > answer, but
> > is it expected that there will be work done on this in the "near"
> > future
> > to fill any holes and possibly become more stable? For context, the
> > current
> > plan on my side is to propose this as a student project for the
> > spring, so
> > primarily asking for planning and describing the project a bit
> > better.
>
>
> Well, it depends on what you need.  With the exception above, I doubt
> the "holes" will matter much practice unless you are targeting for a
> polished release rather than experimentation.
> But of course it may be that you run into something that is important
> for you, but doesn't yet quite work.
>
> I will note just dealing with the Python/NumPy C-API can be a fairly
> steep learning curve, so you need someone comfortable to dive in and
> budget a good amount of time for that part.
> And yes, this is pretty new, so there may be stumbling stones (which I
> am happy to discuss in NumPy issues or directly).
>
> - Sebastian
>
>
> >
> > BR Oscar
> >
> > Den tors 10 nov. 2022 kl 15:13 skrev Sebastian Berg <
> > sebast...@sipsolutions.net>:
> >
> > > On Thu, 2022-11-10 at 14:55 +0100, Oscar Gustafsson wrote:
> > > > Den tors 10 nov. 2022 kl 13:10 skrev Sebastian Berg <
> > > > sebast...@sipsolutions.net>:
> > > >
> > > > > On Thu, 2022-11-10 at 11:08 +0100, Oscar Gustafsson wrote:
> > > > > > >
> > > > > > > I'm not an expert, but I never encountered rounding
> > > > > > > floating
> > > > > > > point
> > > > > > > numbers
> > > > > > > in bases different from 2 and 10.
> > > > > > >
> > > > > >
> > > > > > I agree that this is probably not very common. More a
> > > > > > possibility
> > > > > > if
> > > > > > one
> > > > > > would supply a base argument to around.
> > > > > >
> > > > > > However, it is worth noting that Matlab has the quant
> > > > > > function,
> > > > > > https://www.mathworks.com/help/deeplearn

[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-11 Thread Sebastian Berg
On Fri, 2022-11-11 at 09:13 -0700, Greg Lucas wrote:
> > 
> > OK, more below.  But unfortunately `int2` and `int4` *are*
> > problematic,
> > because the NumPy array uses a byte-sized strided layout, so you
> > would
> > have to store them in a full byte, which is probably not what you
> > want.
> 
> 
> > I am always thinking of adding a provision for it in the DTypes so
> > that
> > someone could use part of the NumPy machine to make an array that
> > can
> > have non-byte sized strides, but the NumPy array itself is ABI
> > incompatible with storing these packed :(.
> 
> 
> 
> (I.e. we could plug that "hole" to allow making an int4 DType in
> NumPy,
> > but it would still have to take 1-byte storage space when put into
> > a
> > NumPy array, so I am not sure there is much of a point.)
> 
> 
> 
> 
> I have also been curious about the new DTypes mechanism and whether
> we
> could do non byte-size DTypes with it. One use-case I have
> specifically is
> for reading and writing non byte-aligned data [1]. So, this would
> work very
> well for that use-case if the dtype knew how to read/write the
> proper bit-size. For my use-case I wouldn't care too much if
> internally
> Numpy needs to expand and store the data as full bytes, but being
> able to
> read a bitwise binary stream into Numpy native dtypes for further
> processing would be useful I think (without having to resort to
> unpackbits
> and do rearranging/packing to other types).
> 
> dtype = {'names': ('count0', 'count1'), 'formats': ('uint3',
> 'uint5')}
> # x would have two unsigned ints, but reading only one byte from the
> stream
> x = np.frombuffer(buffer, dtype)
> # would be ideal to get tobytes() to know how to pack a uint3+uint5
> DType
> into a single byte as well
> x.tobytes()


Unfortunately, I suspect the amount of expectations users would have
from a full DType, and the fact that bit-sized will be a bit awkward in
NumPy arrays for the foreseeable future makes me think dedicated
conversion functions are probably more practical.

Yes, you could do a `MyInt(bits=5, offset=3)` DType and at least you
could view the same array also with `MyInt(bits=3, offset=0)`.  (Maybe
also structured DType, but I am not certain that is advisable and
custom structured DTypes would require holes to be plugged).

A custom dtype that is "structured" might work (i.e. you could store
two numbers in one byte of course).
Currently you cannot integrate deep enough into NumPy to build
structured dtypes based on arbitrary other dtypes, but you could do it
for your own bit DType.
(I am not quite sure you can make `arr["count0"]` work, this is a hole
that needs plugging.)

This is probably not a small task though.


Could `tobytes()` be made to compactify?  Yes, but then it suddenly
needs extra logic for bit-sized data and doesn't just expose memory.  That
is maybe fine, but also seems a bit awkward?

I would love to have a better answer, but dancing around the byte-
strided ABI seems tricky...

Anyway, I am always available to discuss such possibilities, there are
some corners w.r.t. such bit-sized thoughts which are still shrouded
in fog.

- Sebastian


> 
> Greg
> 
> [1] Specifically, this is for very low bandwidth satellite data where
> we
> try to pack as much information in the downlink and use every bit of
> space,
> but once on the ground I can expand the bit-size fields to byte-size
> fields
> without too much issue of worrying about space [puns intended].
> 
> 
> On Fri, Nov 11, 2022 at 7:14 AM Sebastian Berg <
> sebast...@sipsolutions.net>
> wrote:
> 
> > On Fri, 2022-11-11 at 14:55 +0100, Oscar Gustafsson wrote:
> > > Thanks! That does indeed look like a promising approach! And for
> > > sure
> > > it
> > > would be better to avoid having to reimplement the whole array-
> > > part
> > > and
> > > only focus on the data types. (If successful, my idea of a
> > > project
> > > would
> > > basically solve all the custom numerical types discussed,
> > > bfloat16,
> > > int2,
> > > int4 etc.)
> > 
> > OK, more below.  But unfortunately `int2` and `int4` *are*
> > problematic,
> > because the NumPy array uses a byte-sized strided layout, so you
> > would
> > have to store them in a full byte, which is probably not what you
> > want.
> > 
> > I am always thinking of adding a provision for it in the DTypes so
> > that
> > someone could use part of the NumPy machine to make an array that
> > can
> > have non-byte sized strides, but the NumPy array itself is ABI
> > incompatible with storing these packed :(.
> > 
> > (I.e. we could plug that "hole" to allow making an int4 DType in
> > NumPy,
> > but it would still have to take 1-byte storage space when put into
> > a
> > NumPy array, so I am not sure there is much of a point.)
> > 
> > > 
> > > I understand that the following is probably a hard question to
> > > answer, but
> > > is it expected that there will be work done on this in the "near"
> > > future
> > > to fill any holes and possibly become more stable? For contex

[Numpy-discussion] Re: Create `np.exceptions` for new exceptions in NumPy?

2022-11-11 Thread Stefan van der Walt
Hi Sebastian,

On Fri, Nov 11, 2022, at 05:46, Sebastian Berg wrote:
> I would suggest introducing `np.exceptions`.
>
> We already have custom errors and warnings:
>
> * AxisError
> * TooHardError  (used by `np.shares_memory()`)
> * ComplexWarning
> * RankWarning
> * VisibleDeprecationWarning
> * ModuleDeprecationWarning  (not sure what this is)

At first glance, grouping these classes, mainly used internally, into a
namespace makes sense to me.
We also now have the ability to keep them exposed in their old locations for
backward compatibility, while not showing them in __all__ and __dir__ (though
I am not even sure that's 100% necessary).
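
A rough sketch of one way to do that, via a module-level __getattr__ (PEP
562); the names here are just placeholders, not actual NumPy code:

    # In numpy/__init__.py (sketch only)
    _moved_to_exceptions = {"AxisError", "TooHardError", "ComplexWarning"}

    def __getattr__(name):
        # `np.AxisError` etc. keep working, but since the names are no longer
        # module globals they don't show up in __all__ or dir(numpy).
        if name in _moved_to_exceptions:
            from numpy import exceptions   # the proposed new submodule
            return getattr(exceptions, name)
        raise AttributeError(f"module 'numpy' has no attribute {name!r}")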

Stéfan
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: New feature: binary (arbitrary base) rounding

2022-11-11 Thread Michael Siebert
Hi all,

an advantage of sub-byte datatypes is the potential for accelerated computing.
For GPUs, int4 is already happening. Or take int1, for example: if one had two
arrays of size 64, each would be eight bytes. Now, if one wanted to add those
two arrays (element-wise, modulo 2), one could simply xor them as a uint64 (or
8x uint8 xor).
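
A quick sketch of that int1 example with today's tools, just to show the
single-xor addition (packbits/unpackbits used only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.integers(0, 2, 64, dtype=np.uint8)   # 64 one-bit values
    b = rng.integers(0, 2, 64, dtype=np.uint8)

    a_packed = np.packbits(a)                    # 8 bytes
    b_packed = np.packbits(b)

    # element-wise addition modulo 2 of all 64 values in one go:
    summed = np.unpackbits(a_packed ^ b_packed)
    assert np.array_equal(summed, (a + b) % 2)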

However, I would rather limit sub-byte types to int1, (u)int2 and (u)int4, as
they are the only ones that divide the byte evenly (at least to begin with).

Considering single-element access: a single element in such an array could be
accessed by dividing the index to find the containing byte and then, e.g.,
shifting and ANDing with a mask. Probably uint8 would make sense for this.
That would create some overhead of course, but the data is more compact (which
is nice for the CPU/GPU cache) and full-array ops are faster.
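
For illustration, a sketch of such an element access for packed uint4 (the
low-nibble-first layout is just an assumption):

    import numpy as np

    packed = np.frombuffer(bytes([0x21, 0x43]), dtype=np.uint8)

    def get_uint4(arr, i):
        # divide the index to find the byte, then shift and AND with a mask
        byte = arr[i // 2]
        shift = 4 * (i % 2)      # assumed layout: element 0 in the low nibble
        return (byte >> shift) & 0x0F

    print([get_uint4(packed, i) for i in range(4)])   # [1, 2, 3, 4]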

Striding could be done similarly to single-element access. This would be
inefficient as well, but one could auto-generate some type-specific C code (for
int1, (u)int2, (u)int4 and their combinations) that accelerates popular
operators. So one would not need to actually loop over every entry with
single-element access.

„byte size strided“: isn't it possible to pre-process the strides and
post-process the output as mentioned above? Like a wrapping class around a
uint8 array.

What do you think? Am I missing out on something?

Best, Michael

> On 11. Nov 2022, at 18:23, Sebastian Berg  wrote:
> 
> On Fri, 2022-11-11 at 09:13 -0700, Greg Lucas wrote:
>>> 
>>> OK, more below.  But unfortunately `int2` and `int4` *are*
>>> problematic,
>>> because the NumPy array uses a byte-sized strided layout, so you
>>> would
>>> have to store them in a full byte, which is probably not what you
>>> want.
>> 
>> 
>>> I am always thinking of adding a provision for it in the DTypes so
>>> that
>>> someone could use part of the NumPy machine to make an array that
>>> can
>>> have non-byte sized strides, but the NumPy array itself is ABI
>>> incompatible with storing these packed :(.
>> 
>> 
>> 
>> (I.e. we could plug that "hole" to allow making an int4 DType in
>> NumPy,
>>> but it would still have to take 1-byte storage space when put into
>>> a
>>> NumPy array, so I am not sure there is much of a point.)
>> 
>> 
>> 
>> 
>> I have also been curious about the new DTypes mechanism and whether
>> we
>> could do non byte-size DTypes with it. One use-case I have
>> specifically is
>> for reading and writing non byte-aligned data [1]. So, this would
>> work very
>> well for that use-case if the dtype knew how to read/write the
>> proper bit-size. For my use-case I wouldn't care too much if
>> internally
>> Numpy needs to expand and store the data as full bytes, but being
>> able to
>> read a bitwise binary stream into Numpy native dtypes for further
>> processing would be useful I think (without having to resort to
>> unpackbits
>> and do rearranging/packing to other types).
>> 
>> dtype = {'names': ('count0', 'count1'), 'formats': ('uint3',
>> 'uint5')}
>> # x would have two unsigned ints, but reading only one byte from the
>> stream
>> x = np.frombuffer(buffer, dtype)
>> # would be ideal to get tobytes() to know how to pack a uint3+uint5
>> DType
>> into a single byte as well
>> x.tobytes()
> 
> 
> Unfortunately, I suspect the amount of expectations users would have
> from a full DType, and the fact that bit-sized will be a bit awkward in
> NumPy arrays for the foreseeable future makes me think dedicated
> conversion functions are probably more practical.
> 
> Yes, you could do a `MyInt(bits=5, offset=3)` DType and at least you
> could view the same array also with `MyInt(bits=3, offset=0)`.  (Maybe
> also structured DType, but I am not certain that is advisable and
> custom structured DTypes would require holes to be plugged).
> 
> A custom dtype that is "structured" might work (i.e. you could store
> two numbers in one byte of course).
> Currently you cannot integrate deep enough into NumPy to build
> structured dtypes based on arbitrary other dtypes, but you could do it
> for your own bit DType.
> (I am not quite sure you can make `arr["count0"]` work, this is a hole
> that needs plugging.)
> 
> This is probably not a small task though.
> 
> 
> Could `tobytes()` be made to compactify?  Yes, but then it suddenly
> needs extra logic for bit-sized and doesn't just expose memory.  That
> is maybe fine, but also seems a bit awkward? 
> 
> I would love to have a better answer, but dancing around the byte-
> strided ABI seems tricky...
> 
> Anyway, I am always available to discuss such possibilities, there are
> some corners w.r.t. such bit-sized thoughts which are still shrouded
> in fog.
> 
> - Sebastian
> 
> 
>> 
>> Greg
>> 
>> [1] Specifically, this is for very low bandwidth satellite data where
>> we
>> try to pack as much information in the downlink and use every bit of
>> space,
>> but once on the ground I can expand the bit-size fields to byte-size
>> fields
>> without too much issue of worrying about space [puns intended].
>> 
>> 
>>

[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Stefan van der Walt
On Fri, Nov 11, 2022, at 06:03, Evgeni Burovski wrote:
> before: any thoughts to change it to e.g. tempita templating?

With the "e.g." maybe being jinja2. tempita works well, but hasn't been worked 
on since 2013.

Stéfan
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: plan for moving to Meson

2022-11-11 Thread Ralf Gommers
On Fri, Nov 11, 2022 at 10:07 PM Stefan van der Walt 
wrote:

> On Fri, Nov 11, 2022, at 06:03, Evgeni Burovski wrote:
> > before: any thoughts to change it to e.g. tempita templating?
>
> With the "e.g." maybe being jinja2. tempita works well, but hasn't been
> worked on since 2013.
>

It actually was only moderately painful; I just refactored the thing. So
let's think about removing .src at another time (if ever).

Cheers,
Ralf
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Create `np.exceptions` for new exceptions in NumPy?

2022-11-11 Thread Aaron Meurer
No comment on the separate namespace for exceptions, but +1 to more
specific exceptions like BroadcastError or InvalidPromotion. They are
more informative, allow users to catch specific errors without pattern
matching the message string, and they would allow putting the relevant
error information in properties rather than just the message (e.g.,
like AxisError does with axis and ndim), which makes for nicer
programmatic access. It would be interesting to see this with
IndexError too, although I'm not sure if it's a good idea to change
the exception type there.
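
For example, with the existing AxisError (a quick sketch; the new exceptions
could follow the same pattern):

    import numpy as np

    try:
        np.sum(np.zeros((2, 3)), axis=5)
    except np.AxisError as e:
        # structured access instead of parsing the message string
        print(e.axis, e.ndim)   # 5 2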

On Fri, Nov 11, 2022 at 11:10 AM Stefan van der Walt
 wrote:
>
> Hi Sebastian,
>
> On Fri, Nov 11, 2022, at 05:46, Sebastian Berg wrote:
> > I would suggest introducing `np.exceptions`.
> >
> > We already have custom errors and warnings:
> >
> > * AxisError
> > * TooHardError  (used by `np.shares_memory()`)
> > * ComplexWarning
> > * RankWarning
> > * VisibleDeprecationWarning
> > * ModuleDeprecationWarning  (not sure what this is)
>
> At first glance, grouping these classes, mainly used internally, into a 
> namespace makes sense to me.
> We also now have the ability to keep them exposed in their old locations for 
> backward compatibility, while not showing them in __all__ and __dir__ (but 
> not even sure that's 100% necessary?).

The new exceptions wouldn't need to go there, but anyone who has ever
wanted to catch one of the existing exceptions will have done "from
numpy import AxisError" or "except np.AxisError". So I think they
would need to stay, or at least go through a deprecation. I personally
have written code that imports VisibleDeprecationWarning.

Aaron Meurer

>
> Stéfan
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: asmeu...@gmail.com
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com