[Numpy-discussion] next NumPy triage meeting - November 27th, 2024 at 18:00 UTC

2024-11-24 Thread Inessa Pawson via NumPy-Discussion
The next NumPy triage meeting will be held this Wednesday, November 27th at
18:00 UTC. This is a meeting where we synchronously triage prioritized PRs
and issues.
Join us via Zoom:
https://numfocus-org.zoom.us/j/82096749952?pwd=MW9oUmtKQ1c3a2gydGk1RTdYUUVXZz09
Everyone is welcome to attend and contribute to the conversation.
Please notify us of issues or PRs that you’d like to have reviewed by
adding a GitHub link to them in the meeting agenda:
https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

-- 
Cheers,
Inessa

Inessa Pawson
GitHub: inessapawson


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2024-11-24 Thread Marten van Kerkwijk
Hi Matti,

I'm sorry, I should probably have started a new thread with a proper
introduction.  `reduceat` has always been about having piecewise
reductions, but in a way that is rather convoluted.  From
https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html
one sees that the indices are interpreted as follows:

```
For i in range(len(indices)), reduceat computes
ufunc.reduce(array[indices[i]:indices[i+1]]), which becomes the i-th
generalized "row" parallel to `axis` in the final result (i.e., in a 2-D
array, if axis = 0, it becomes the i-th row, but if axis = 1, it becomes
the i-th column).
There are three exceptions to this:

* when i = len(indices) - 1 (so for the last index), indices[i+1] = array.shape[axis].
* if indices[i] >= indices[i + 1], the i-th generalized “row” is simply array[indices[i]].
* if indices[i] >= len(array) or indices[i] < 0, an error is raised.
```

The exceptions are the main issue I have with the current definition (see
also other threads over the years [1]): really, the current setup is
only natural for contiguous pieces; for anything else, it requires
contortion.  For instance, the documentation describes how to get a
running sum as follows:
```
np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]
```
Note the slice at the end to remove the unwanted elements!  And note
that this *omits* the last set of 4 elements -- to get it, one has to
append a solitary index 4, since a slice that reaches the last element
can only appear as the final entry.
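For concreteness, this is what that documented workaround evaluates to
(plain current-NumPy behaviour, easy to verify):
```
import numpy as np

a = np.arange(8)  # [0, 1, 2, 3, 4, 5, 6, 7]

# Interleaved start/stop pairs: every odd position hits the
# "indices[i] >= indices[i + 1]" exception and yields the single element
# a[indices[i]], which the trailing [::2] then discards again.
out = np.add.reduceat(a, [0, 4, 1, 5, 2, 6, 3, 7])
# out -> [ 6,  4, 10,  5, 14,  6, 18,  7]
running = out[::2]
# running -> [ 6, 10, 14, 18], i.e. sums of a[0:4], a[1:5], a[2:6], a[3:7];
# the last window, a[4:8] (= 22), is indeed missing.
```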

The PR arose from this unnatural way of describing slices: why can one not
just pass in the start and stop values directly, with no exceptions,
interpreted exactly as slices are?  I.e., get a running sum as
```
np.add.reduceat(np.arange(8), ((start := np.arange(0, 8//2+1)), start+8//2))
```

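Since the two-array form only exists in the PR, the result it is meant to
produce can be previewed by reducing over the equivalent explicit slices (a
minimal emulation with released NumPy, not the PR's implementation):
```
import numpy as np

a = np.arange(8)
start = np.arange(0, 8 // 2 + 1)  # [0, 1, 2, 3, 4]
stop = start + 8 // 2             # [4, 5, 6, 7, 8]

# One reduction per slice a[start[i]:stop[i]] -- what the proposed
# np.add.reduceat(a, (start, stop)) call is meant to return in one go.
running = np.array([np.add.reduce(a[i:j]) for i, j in zip(start, stop)])
# running -> [ 6, 10, 14, 18, 22]; the last window is now included.
```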
Currently, the updated docstring explains the new mode as follows:
```
There are two modes for how `indices` is interpreted. If it is a tuple of
2 arrays (or an array with two rows), then these are interpreted as start
and stop values of slices over which to compute reductions, i.e., for each
row i, ``ufunc.reduce(array[indices[0, i]:indices[1, i]])`` is computed,
which becomes the i-th element along `axis` in the final result (e.g., in
a 2-D array, if ``axis=0``, it becomes the i-th row, but if ``axis=1``,
it becomes the i-th column). Like for slices, negative indices are allowed
for both start and stop, and the values are clipped to be between 0 and
the shape of the array along `axis`.
```
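To check that description against concrete numbers, the semantics can again
be emulated with `ufunc.reduce` over ordinary Python slices; the start/stop
values are the ones from the PR example quoted further down, but the helper
itself is just an illustration, not the PR's code:
```
import numpy as np

def reduceat_slices(ufunc, array, start, stop):
    # Emulate the proposed start/stop mode: one reduction per slice, with
    # negative values and clipping handled exactly as Python slices do.
    return np.array([ufunc.reduce(array[slice(i, j)])
                     for i, j in zip(start, stop)])

a = np.arange(12)
reduceat_slices(np.add, a, [1, 3, 5], [2, -1, 0])
# -> [ 1, 52,  0]: a[1:2] -> 1, a[3:-1] -> 3+...+10 = 52, and a[5:0] is
#    empty, so np.add's identity (0) is returned rather than a[5].
```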

The reason `initial` was added is that, with the new layout, I did not
want to keep the exception currently present, where if stop < start
one gets the value at start.  I felt it was more logical to treat this
case as an empty reduction, but then it becomes necessary to be able to
pass in an initial value for reductions that do not have an identity,
like np.minimum (which of course just helps make `reduceat` more similar
to `reduce`).
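For `reduce` itself this is already how `initial` behaves in released
NumPy, which is the behaviour the empty-slice case above would mirror:
```
import numpy as np

empty = np.array([], dtype=float)

np.add.reduce(empty)                      # -> 0.0 (add has an identity)
# np.minimum.reduce(empty)                # raises ValueError: no identity
np.minimum.reduce(empty, initial=np.inf)  # -> inf, now well defined
```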

Note that I considered requiring `slice(start, stop)`, which might be
clearer.  I only did not do that since, implementation-wise, just having a
tuple or an array with two rows was super easy.  I also liked that with
this implementation the old way could at least in principle be described
in terms of the new one, as having a default stop that just takes the
next element of start (with the same exceptions as above).  I ended up not
describing it as such in the docstring, though.

Anyway, if in principle it is thought a good idea to make `reduceat`
more flexible, the API is up for discussion.  It could require
`indices=slice(start, stop)` (possibly with a step too), or one could
allow not passing in `indices` if `start` and `stop` are present.

Hope this clarifies things!

Marten

Matti Picus via NumPy-Discussion writes:

> I am not sure how I feel about this. If I understand correctly, the
> issue started as a corner case when the indices were incorrect, and
> grew to dealing with initial values, and then added a desire for
> piecewise reduceat with multiple segments. Is that correct? Could you
> give a better summary of the issue the PR is trying to solve? The
> examples look like magic to me; it took me a long time to understand that
> the `[1, 3, 5]` correspond to start indices and `[2, -1, 0]`
> correspond to stop indices. Perhaps we should require kwarg use
> instead of positional to make the code more readable.
> Matti
> 
> On Sun, Nov 24, 2024 at 3:13 AM Marten van Kerkwijk
>  wrote:
>>
>> Hi All,
>>
>> This discussion about updating reduceat went silent, but recently I came
>> back to my PR to allow `indices` to be a 2-dimensional array of start
>> and stop values (or a tuple of separate start and stop arrays).  I
>> thought a bit more about it and think it is the easiest way to extend
>> the present definition.  So, I have added some tests and documentation
>> and would now like to open it for proper discussion.  See
>>
>> https://github.com/numpy/numpy/pull/25476
>>
>> From the examples there:
>> ```
>> a = np.arange(12)
>> np.add.reduceat(a, ([1, 3, 5],

[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2024-11-24 Thread Marten van Kerkwijk
I forgot to add links to previous discussions.

Github issue: https://github.com/numpy/numpy/issues/834
2011 thread: 
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/DX5KVE5O36MQHIEBOFK6YRH2JPRMFPVB/#I5MKK4ZPX3FA6K6H5457F4WOHYSO67NN
2016 thread #1: 
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/RI7MYBUB6S7PGXQ27ZCNLOWEPNSFMDHI/#PBIB7BEA35NE2WMI25SJJEWN7YW6W72V
2016 thread #2: 
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/RZZ3TJVB5M4UFTZKD4XDDLPT2AW3ANR6/#RZZ3TJVB5M4UFTZKD4XDDLPT2AW3ANR6
2017 thread: 
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/YKLT53KE54MKKO2FDWOYQQZGSN4EGSGU/#YKLT53KE54MKKO2FDWOYQQZGSN4EGSGU
(where I suggested adding a `slice` argument to reduce instead; also an 
option...)
2023 thread (this one): 
https://mail.python.org/archives/list/numpy-discussion@python.org/thread/VWDXYJW362WPNE5JLZCQNPEAQD6EIKSI/#MB4LXPBEJ5OFMPXG24PCAFTXYLQIXZSG

-- Marten