[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread john . dawson
> From my point of view, such a function is a bit of a corner case to be added to 
> numpy. And it doesn’t justify its naming anymore. It is not one operation 
> anymore. It is a cumsum and prepending 0. And it is very difficult to argue 
> why prepending 0 to cumsum is a part of cumsum.

That is backwards. Consider the array [x0, x1, x2].

The sum of the first 0 elements is 0.
The sum of the first 1 elements is x0.
The sum of the first 2 elements is x0+x1.
The sum of the first 3 elements is x0+x1+x2.

Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].

Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural and 
primitive one.

The current behaviour of numpy.cumsum is the composition of two basic 
operations, computing the partial sums and omitting the initial value:

[x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] -> [x0, x0+x1, x0+x1+x2].
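
A minimal sketch of the two steps above, assuming a hypothetical 1-D helper named cumsum0 (it is not a NumPy function) built from np.cumsum and np.concatenate:

```python
import numpy as np

def cumsum0(x):
    """Partial sums of a 1-D array, starting from the empty sum 0 (hypothetical helper)."""
    sums = np.cumsum(np.asarray(x))
    # Prepend a single zero with the same dtype as the cumulative sums.
    return np.concatenate([np.zeros(1, dtype=sums.dtype), sums])

x = np.array([3, 1, 4])
partial = cumsum0(x)   # length n + 1
current = partial[1:]  # omitting the initial value gives np.cumsum(x)
```

With this definition, np.diff(cumsum0(x)) gives back x.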

> What I would rather vouch for is adding an argument to `np.diff` so that it 
> leaves the first row unmodified.
> def diff0(a, axis=-1):
>     """Differencing which prepends the first item along the axis"""
>     a0 = np.take(a, [0], axis=axis)
>     return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)
> This would be more sensible from conceptual point of view. As difference can 
> not be made, the result is the difference from absolute origin. With 
> recognition that first non-origin value in a sequence is the one after it. 
> And if the first row is the origin in a specific case, then that origin is 
> correctly defined in relation to absolute origin.
> Then, if origin row is needed, then it can be prepended in the beginning of a 
> procedure. And np.diff and np.cumsum are inverses throughout the sequential 
> code.
> np.diff0 was one of the first functions I added to my numpy utils, and I have 
> been using it instead of np.diff quite a lot.

This suggestion is bad: diff0 is conceptually confused. numpy.diff changes an 
array of numpy.datetime64s to an array of numpy.timedelta64s, but numpy.diff0 
changes an array of numpy.datetime64s to a heterogeneous array where one 
element is a numpy.datetime64 and the rest are numpy.timedelta64s. In general, 
whereas numpy.diff changes an array of positions to an array of displacements, 
diff0 changes an array of positions to a heterogeneous array where one element 
is a position and the rest are displacements.
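
To illustrate the dtype point, here is a small sketch with made-up dates. np.diff turns datetime64 values into timedelta64 values; the quoted diff0 would then have to place a datetime64 next to them. That attempt is wrapped in try/except because NumPy is not expected to find a common dtype for the two kinds.

```python
import numpy as np

dates = np.array(["2023-08-01", "2023-08-04", "2023-08-15"], dtype="datetime64[D]")

steps = np.diff(dates)
print(dates.dtype, "->", steps.dtype)  # positions -> displacements

# The quoted diff0 concatenates the first position with the displacements.
try:
    mixed = np.concatenate([dates[:1], steps])
    outcome = f"concatenated to dtype {mixed.dtype}"
except TypeError as exc:
    outcome = f"TypeError: {exc}"
print(outcome)
```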
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread Ilhan Polat
On Tue, Aug 15, 2023 at 2:44 PM  wrote:

> > From my point of view, such a function is a bit of a corner case to be
> > added to numpy. And it doesn’t justify its naming anymore. It is not one
> > operation anymore. It is a cumsum and prepending 0. And it is very
> > difficult to argue why prepending 0 to cumsum is a part of cumsum.
>
> That is backwards. Consider the array [x0, x1, x2].
>
> The sum of the first 0 elements is 0.
> The sum of the first 1 elements is x0.
> The sum of the first 2 elements is x0+x1.
> The sum of the first 3 elements is x0+x1+x2.
>
> Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].
>
> Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural
> and primitive one.
>
>
You are describing ndarray.sum() behavior here, with the intermediate
results laid out inside an array; sum is an aggregator that produces a single
item from a list of items. Then you can argue about missing-item behavior, and
the values you have provided are exactly the values the accumulator would take.
However, cumsum, cumprod, diff etc. are "array functions". In other words,
they provide fast vectorized access to otherwise laborious for loops. You
have to consider the equivalent for loops working on the array *data*, not
the ideal math framework over the number field. For an array function you
don't start with an element that comes before the first element, hence
"no elements -> 0" applies only to sum and not to the array function.
Or at least that would be my argument.
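
The "equivalent for loop" view can be sketched as follows. The name loop_cumsum is made up for illustration; the function walks the array data and writes one output per input element, so nothing is produced for an element that is not there.

```python
import numpy as np

def loop_cumsum(a):
    """Explicit-loop counterpart of np.cumsum for a 1-D array (illustrative only)."""
    a = np.asarray(a)
    out = np.empty_like(a)
    running = a.dtype.type(0)
    for i in range(a.shape[0]):
        running = running + a[i]
        out[i] = running
    return out

a = np.array([3.0, 1.0, 4.0])
looped = loop_cumsum(a)
empty = loop_cumsum(np.array([]))  # the loop body never runs
```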

If you have no elements, the cumulative sum is not 0; it is the empty array,
because there is no array to cumulatively "sum" (remember we are working on
the array to generate another array, not aggregating). You can argue about
what the empty set translates to under summation etc., but I don't think it
applies here. But that's my opinion. I'm not sure why folks wanted to have
this at all. It is the same as asking whether this
code

for k in range(0):
    ...some code ...

should at least spin once (Fortran-ish behavior). I don't know why it
should. But then again, it becomes bikeshedding, with some conflicting
idealistic mathy axioms thrown at each other.

NumPy cumsum returns an empty array for an empty array (I think all software
does this, including MATLAB). ndarray.sum(), however, returns scalar 0 (and I
think most software does this too), because the aggregation is pretty much a
no-op that leaves the initialization value untouched, as in this variant of
the example above:

x = 0
for k in range(0):
    x += 1
return x  # returns 0
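
A quick check of the two behaviours described here, as a sketch on an empty float array:

```python
import numpy as np

empty = np.array([])

cumulative = np.cumsum(empty)  # array function: empty in, empty out
total = empty.sum()            # aggregator: returns the initial value 0.0

print(cumulative.shape, total)
```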

I think all of this points to missing convenience functionality for
extending arrays. In MATLAB, "[0 arr 10]" nicely extends the array to a new
one, but in NumPy you need to punch in quite some code, and muster some
courage to remember whether hstack, vstack, concat or block is the correct
name, which decreases the "code morale". So if people want to quickly
extend arrays, they either have to change the code for their needs or create
larger arrays, which is pretty much #6044. So I think this is a feature
request for "prepend" and "append" in a convenient fashion, not on ufuncs but
on ndarray, because concatenation is just a pain in NumPy and a ubiquitous
operation all around. Hence we should probably get a decision on that
instead of discussing each case separately.
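
For reference, a sketch of a few existing spellings of the "[0 arr 10]" case for a 1-D array. np.r_, np.concatenate and np.hstack are existing NumPy APIs; the array values are made up.

```python
import numpy as np

arr = np.array([1, 2, 3])

via_r = np.r_[0, arr, 10]                           # closest to MATLAB's [0 arr 10]
via_concatenate = np.concatenate(([0], arr, [10]))  # scalars must be wrapped in lists
via_hstack = np.hstack((0, arr, 10))
```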


[Numpy-discussion] Cirrus testing

2023-08-15 Thread Charles R Harris
Hi All,

This is a heads up that we have already exceeded our allotment of free time
on Cirrus CI. They are giving us a pass this month, but next month they
will start enforcing the limits. That will impact both our testing and our
releases. We have taken steps to reduce our use of Cirrus, but it could
still be a problem, so we should have a contingency plan in place for at
least the next two months.

Thoughts?

Chuck.


[Numpy-discussion] Re: Cirrus testing

2023-08-15 Thread Ralf Gommers
On Tue, Aug 15, 2023 at 4:19 PM Charles R Harris 
wrote:

> Hi All,
>
> This is a heads up that we have already exceeded our allotment of free
> time on Cirrus CI. They are giving us a pass this month, but next month
> they will start enforcing the limits. That will impact both our testing and
> our releases. We have taken steps to reduce our use of Cirrus, but it could
> still be a problem, so we should have a contingency plan in place for at
> least the next two months.
>
> Thoughts?
>

At the current rate, our bill would be around $100/month. Cirrus CI is very
useful, and I don't think we should move away from it - having to run
64-bit ARM platforms under QEMU would be quite bad. So I think the
contingency plan here should be: just pay the bill. We're not exactly
wealthy as a project, but it's not 2015 anymore either - we have funds at
https://opencollective.com/numpy, and an income of a few thousand
dollars a month (hat tip to Tidelift). So we can easily afford it, and
it'll be money well spent.

The most annoying thing with non-free things is not the money itself, but
the logistics around it and that someone has to be responsible for it.
Assuming there's no better idea to avoid paying the bill, and the steering
council signs off on paying the bill, I think we can manage that though.

Cheers,
Ralf


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread Dom Grigonis


> On 14 Aug 2023, at 15:22, john.daw...@camlingroup.com wrote:
> 
>> From my point of view, such a function is a bit of a corner case to be added 
>> to numpy. And it doesn’t justify its naming anymore. It is not one 
>> operation anymore. It is a cumsum and prepending 0. And it is very difficult 
>> to argue why prepending 0 to cumsum is a part of cumsum.
> 
> That is backwards. Consider the array [x0, x1, x2].
> 
> The sum of the first 0 elements is 0.
> The sum of the first 1 elements is x0.
> The sum of the first 2 elements is x0+x1.
> The sum of the first 3 elements is x0+x1+x2.
> 
> Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].
> 
> Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural and 
> primitive one.
> 
> The current behaviour of numpy.cumsum is the composition of two basic 
> operations, computing the partial sums and omitting the initial value:
> 
> [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] -> [x0, x0+x1, x0+x1+x2].

In reality, both of these functions do exactly what they need to do. The 
issue, as I understand it, is to define one of them in such a way that the two 
are inverses of each other. The only question is which one is better suited 
to that and provides the most benefit.

Arguments for np.diff0:
1. Dimension length stays constant, whereas cumsum0 extends the length to n+1 
and np.diff then truncates it back. This adds extra complexity, while things are 
very convenient to work with when dimension length stays constant throughout 
the code.
2. I see your argument about element 0, but the fact is that it 
doesn’t exist at all. In the np.diff0 case at least half of it exists and the 
other half has a half-decent rationale. In the cumsum0 case it just appears out 
of nowhere, and in your example above you are applying very different logic from 
what np.cumsum intrinsically is. Ilhan has accurately pointed this out in his 
e-mail.
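
The two candidate inverse pairs and their lengths can be compared in a short sketch. diff0 follows the definition quoted in this thread; the leading-zero cumulative sum is written inline with np.concatenate, since cumsum0 is not a NumPy function. The array values are made up.

```python
import numpy as np

def diff0(a, axis=-1):
    """np.diff with the first item kept in front (as quoted in this thread)."""
    a0 = np.take(a, [0], axis=axis)
    return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)

a = np.array([5, 7, 10, 4])

# Pair 1: diff0 / cumsum, the length stays n throughout.
d0 = diff0(a)
back_from_diff0 = np.cumsum(d0)

# Pair 2: "cumsum0" / diff, the length goes n -> n + 1 -> n.
c0 = np.concatenate(([0], np.cumsum(a)))
back_from_cumsum0 = np.diff(c0)

print(d0.shape, c0.shape)
```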

For now, I only see my own point of view, and I can list a number of cases from 
data analysis and modelling where I found np.diff0 to be a fairly optimal choice 
that made things smoother, while I haven’t seen any real-life examples 
where np.cumsum0 would be useful, so I am naturally biased. I would appreciate 
it if anyone provided some examples that justify np.cumsum0; for now I just 
can’t think of any case where it could actually be useful or why it would be 
more convenient/sensible than np.diff0.

>> What I would rather vouch for is adding an argument to `np.diff` so that it 
>> leaves the first row unmodified.
>> def diff0(a, axis=-1):
>>     """Differencing which prepends the first item along the axis"""
>>     a0 = np.take(a, [0], axis=axis)
>>     return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)
>> This would be more sensible from conceptual point of view. As difference can 
>> not be made, the result is the difference from absolute origin. With 
>> recognition that first non-origin value in a sequence is the one after it. 
>> And if the first row is the origin in a specific case, then that origin is 
>> correctly defined in relation to absolute origin.
>> Then, if origin row is needed, then it can be prepended in the beginning of 
>> a procedure. And np.diff and np.cumsum are inverses throughout the 
>> sequential code.
>> np.diff0 was one of the first functions I added to my numpy utils, and I 
>> have been using it instead of np.diff quite a lot.
> 
> This suggestion is bad: diff0 is conceptually confused. numpy.diff changes an 
> array of numpy.datetime64s to an array of numpy.timedelta64s, but numpy.diff0 
> changes an array of numpy.datetime64s to a heterogeneous array where one 
> element is a numpy.datetime64 and the rest are numpy.timedelta64s. In 
> general, whereas numpy.diff changes an array of positions to an array of 
> displacements, diff0 changes an array of positions to a heterogeneous array 
> where one element is a position and the rest are displacements.


This isn’t really an argument against np.diff0, just one aspect of it which 
would have to be dealt with. If, instead of just prepending, the difference from 
0 was taken, it would result in numpy.timedelta64s. So not a big issue.
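
For numeric arrays, "the difference from 0" is already expressible with the existing prepend argument of np.diff. For datetime64 the choice of "0" is an assumption; the sketch below uses the Unix epoch purely for illustration, which makes every element a timedelta64.

```python
import numpy as np

a = np.array([5, 7, 10, 4])
numeric = np.diff(a, prepend=0)  # first element is a[0] - 0

dates = np.array(["2023-08-01", "2023-08-04", "2023-08-15"], dtype="datetime64[D]")
origin = np.datetime64("1970-01-01", "D")  # assumed "absolute origin"
from_origin = np.concatenate([dates[:1] - origin, np.diff(dates)])

print(numeric, from_origin.dtype)
```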




[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread Dom Grigonis
I agree with this; it sounds like a more radical (in a good way) solution.

> So I think this is a feature request for "prepend" and "append" in a 
> convenient fashion, not on ufuncs but on ndarray, because concatenation is 
> just a pain in NumPy and a ubiquitous operation all around. Hence we should 
> probably get a decision on that instead of discussing each case separately.



[Numpy-discussion] Re: Cirrus testing

2023-08-15 Thread Andrew Nelson
There's a scipy issue on this that discusses how to reduce usage,
https://github.com/scipy/scipy/issues/19006.

Main points:

- at the moment CI is run on PRs and on merge. Convert to only running on PR
commits. I've just submitted a PR to do this for numpy.
- add a manual trigger. Simple to achieve, but requires input from a
maintainer.
- reduce wheel build frequency. At the moment I believe they're made every
week. However, that decision has to factor in the increased frequency that
may be desired as NumPy 2.0 is worked on.


[Numpy-discussion] Re: Cirrus testing

2023-08-15 Thread Andrew Nelson
On Wed, 16 Aug 2023 at 10:51, Andrew Nelson  wrote:

> There's a scipy issue on this that discusses how to reduce usage,
> https://github.com/scipy/scipy/issues/19006.
>
> Main points:
>
> - at the moment CI is run on PRs and on merge. Convert to only running on
> PR commits. I've just submitted a PR to do this for numpy.
> - add a manual trigger. Simple to achieve, but requires input from a
> maintainer.
> - reduce wheel build frequency. At the moment I believe they're made every
> week. However, that decision has to factor in the increased frequency that
> may be desired as NumPy 2.0 is worked on.
>
>
Also, it's significantly more expensive to test on macOS M1 compared to
linux_aarch64. The latter isn't tested on cirrus. However, you could use
linux_aarch64 as a proxy for general ARM testing, and only run macOS when
necessary.