[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-22 Thread john . dawson
Dom Grigonis wrote:
> 1. Dimension length stays constant, while cumsum0 extends length to n+1, then 
> np.diff, truncates it back. This adds extra complexity, while things are very 
> convenient to work with when dimension length stays constant throughout the 
> code.

For n values there are n-1 differences. Equivalently, for k differences there 
are k+1 values. Therefore, `diff` ought to reduce length by 1 and `cumsum` ought 
to increase it by 1. Returning arrays of the same length is a fencepost error. 
This is a problem in the current behaviour of `cumsum` and the proposed 
behaviour of `diff0`.
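
The counting can be checked directly. A small sketch with made-up sample values (the array contents are only illustrative):

```python
import numpy as np

values = np.array([3.0, 5.0, 4.0, 9.0])   # n = 4 values (illustrative data)
differences = np.diff(values)              # n - 1 = 3 differences

# The current cumsum returns as many numbers as it is given, so summing the
# 3 differences yields only 3 partial sums: the offsets of values[1:] from
# values[0]. The offset of values[0] itself (zero) has no slot.
partial_sums = np.cumsum(differences)
offsets_from_start = values[1:] - values[0]
```

Here `partial_sums` and `offsets_from_start` agree element by element, and both are one entry short of `values`.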

Dom Grigonis wrote:
> For now, I only see my point of view and I can list a number of cases from 
> data analysis and modelling, where I found np.diff0 to be a fairly optimal 
> choice to use and it made things smoother. While I haven’t seen any real-life 
> examples where np.cumsum0 would be useful so I am naturally biased. I would 
> appreciate it if anyone provided some examples that justify np.cumsum0 - for now 
> I just can’t think of any case where this could actually be useful or why it 
> would be more convenient/sensible than np.diff0.


EXAMPLE

Consider a path given by a list of points, say (101, 203), (102, 205), (107, 
204) and (109, 202). What are the positions at fractions, say 1/3 and 2/3, 
along the path (linearly interpolating)?

The problem is naturally solved with `diff` and `cumsum0`:

```
import numpy as np
from scipy import interpolate

positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]], 
dtype=float)
steps_2d = np.diff(positions, axis=0)
steps_1d = np.linalg.norm(steps_2d, axis=1)
distances = np.cumsum0(steps_1d)
fractions = distances / distances[-1]
interpolate_at = interpolate.make_interp_spline(fractions, positions, 1)
interpolate_at(1/3)
interpolate_at(2/3)
```
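
Since `np.cumsum0` is only a proposal, the snippet above does not run on a released NumPy. A sketch that does run, under two assumptions of mine: a hypothetical one-dimensional `cumsum0` stand-in, and `np.interp` applied per coordinate in place of `make_interp_spline` so that only NumPy is needed:

```python
import numpy as np


def cumsum0(a):
    """Hypothetical 1-D stand-in for the proposed np.cumsum0."""
    return np.concatenate([[0.0], np.cumsum(a)])


positions = np.array([[101, 203], [102, 205], [107, 204], [109, 202]],
                     dtype=float)
steps_1d = np.linalg.norm(np.diff(positions, axis=0), axis=1)
distances = cumsum0(steps_1d)          # one distance per point, starting at 0
fractions = distances / distances[-1]  # runs from 0 to 1


def position_at(fraction):
    """Linearly interpolate each coordinate at the given path fraction."""
    return np.array([np.interp(fraction, fractions, positions[:, i])
                     for i in range(positions.shape[1])])


one_third = position_at(1 / 3)
two_thirds = position_at(2 / 3)
```

Because `distances` has one entry per point, `fractions` lines up with `positions` without any manual padding.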

Please show how to solve the problem with `diff0` and `cumsum`.


Both `diff0` and `cumsum` have a fencepost problem, but `diff0` has a second 
defect: it maps an array of positions to a heterogeneous array where one 
element is a position and the rest are displacements. The operations that make 
sense for displacements, like scaling, differ from those that make sense for 
positions.
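
A sketch of what goes wrong when a displacement operation is applied to the mixed array. The sample positions are made up, and `diff0` here is a one-dimensional version of the function quoted in this thread:

```python
import numpy as np

positions = np.array([10.0, 11.0, 13.0, 16.0])   # illustrative 1-D positions


def diff0(a):
    """1-D version of the diff0 quoted in this thread."""
    return np.concatenate([a[:1], np.diff(a)])


displacements = np.diff(positions)   # every element is a displacement
mixed = diff0(positions)             # a position followed by displacements

# Doubling every step while keeping the starting point fixed:
stretched = positions[0] + np.concatenate([[0.0], np.cumsum(2 * displacements)])

# Doubling diff0's output doubles the starting position as well,
# so the whole path is moved, not just stretched.
moved = np.cumsum(2 * mixed)
```

With `diff`, scaling acts on displacements only and the start stays at 10; with `diff0`, the start lands at 20.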


EXAMPLE

Money is invested on 2023-01-01. The annualized rate is 4% until 2023-02-04 and 
5% thence until 2023-04-02. By how much does the money multiply in this time?

The problem is naturally solved with `diff`:

```
import numpy as np

percents = np.array([4, 5], dtype=float)
times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"], 
dtype=np.datetime64)
durations = np.diff(times)
YEAR = np.timedelta64(365, "D")
multipliers = (1 + percents / 100) ** (durations / YEAR)
multipliers.prod()
```

Please show how to solve the problem with `diff0`. It makes sense to divide 
`np.diff(times)` by `YEAR`, but it would not make sense to divide the output of 
`np.diff0(times)` by `YEAR` because of its incongruous initial value.

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2024-03-20 Thread john . dawson
Yet another example is
```
d = np.zeros(n)
d[1:] = np.linalg.norm(np.diff(points, axis=1), axis=0)
r = d.cumsum()
```
https://github.com/WarrenWeckesser/ufunclab/blob/main/examples/linear_interp1d_demo.py#L13-L15
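
For comparison, a sketch of the same computation written with a cumulative sum from 0. The sample `points` are made up, I am assuming the coordinates are stored in rows (shape `(2, n)`), as the axes in the snippet suggest, and `cumsum0` is a hypothetical one-dimensional stand-in for the proposed function:

```python
import numpy as np


def cumsum0(a):
    """Hypothetical 1-D stand-in for the proposed np.cumsum0."""
    return np.concatenate([[0.0], np.cumsum(a)])


points = np.array([[0.0, 3.0, 3.0, 6.0],
                   [0.0, 4.0, 8.0, 12.0]])   # shape (2, n), sample data
n = points.shape[1]

# Pattern from the snippet above: allocate zeros, fill all but the first slot.
d = np.zeros(n)
d[1:] = np.linalg.norm(np.diff(points, axis=1), axis=0)
r = d.cumsum()

# The same distances as a single expression.
r0 = cumsum0(np.linalg.norm(np.diff(points, axis=1), axis=0))
```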


[Numpy-discussion] Add to NumPy a function to compute cumulative sums from 0.

2023-08-11 Thread john . dawson
`cumsum` computes the sum of the first k summands for every k from 1. Judging 
by my experience, it is more often useful to compute the sum of the first k 
summands for every k from 0, as `cumsum`'s behaviour leads to fencepost-like 
problems.
https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
For example, `cumsum` is not the inverse of `diff`. I propose adding a function 
to NumPy to compute cumulative sums beginning with 0, that is, an inverse of 
`diff`. It might be called `cumsum0`. The following code is probably not the 
best way to implement it, but it illustrates the desired behaviour.

```
import numpy as np


def cumsum0(a, axis=None, dtype=None, out=None):
    """
    Return the cumulative sum of the elements along a given axis,
    beginning with 0.

    cumsum0 does the same as cumsum except that cumsum computes the sum
    of the first k summands for every k from 1 and cumsum0, from 0.

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        Axis along which the cumulative sum is computed. The default
        (None) is to compute the cumulative sum over the flattened
        array.
    dtype : dtype, optional
        Type of the returned array and of the accumulator in which the
        elements are summed. If `dtype` is not specified, it defaults to
        the dtype of `a`, unless `a` has an integer dtype with a
        precision less than that of the default platform integer. In
        that case, the default platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape and buffer length as the expected output but
        the type will be cast if necessary. See
        :ref:`ufuncs-output-type` for more details.

    Returns
    -------
    cumsum0_along_axis : ndarray.
        A new array holding the result is returned unless `out` is
        specified, in which case a reference to `out` is returned. If
        `axis` is not None the result has the same shape as `a` except
        along `axis`, where the dimension is larger by 1.

    See Also
    --------
    cumsum : Cumulatively sum array elements, beginning with the first.
    sum : Sum array elements.
    trapz : Integration of array values using the composite trapezoidal rule.
    diff : Calculate the n-th discrete difference along given axis.

    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.

    ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
    values since ``sum`` may use a pairwise summation routine, reducing
    the roundoff-error. See `sum` for more information.

    Examples
    --------
    >>> a = np.array([[1, 2, 3], [4, 5, 6]])
    >>> a
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> np.cumsum0(a)
    array([ 0,  1,  3,  6, 10, 15, 21])
    >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
    array([ 0.,  1.,  3.,  6., 10., 15., 21.])

    >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
    array([[0, 0, 0],
           [1, 2, 3],
           [5, 7, 9]])
    >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
    array([[ 0,  1,  3,  6],
           [ 0,  4,  9, 15]])

    ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``

    >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
    >>> np.cumsum0(b)[-1]
    1000000.0050045159
    >>> b.sum()
    1000000.0050000029

    """
    a = np.asarray(a)  # accept any array_like, as documented
    # Summing an empty slice yields a zero of the right shape and dtype.
    empty = a.take([], axis=axis)
    zero = empty.sum(axis, dtype=dtype, keepdims=True)
    later_cumsum = a.cumsum(axis, dtype=dtype)
    # Both pieces already have the requested dtype, and np.concatenate
    # rejects `dtype` and `out` given together, so only `out` is forwarded.
    return np.concatenate([zero, later_cumsum], axis=axis, out=out)
```
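
The inverse relationship with `diff` can be sketched as follows. To keep the snippet self-contained it uses a compact stand-in of my own (zeros prepended along `axis`), not the implementation above, and the sample arrays are made up:

```python
import numpy as np


def cumsum0(a, axis=0):
    """Compact stand-in: prepend a zero slice to np.cumsum along `axis`."""
    a = np.asarray(a)
    zero_shape = list(a.shape)
    zero_shape[axis] = 1
    zero = np.zeros(zero_shape, dtype=a.dtype)
    return np.concatenate([zero, np.cumsum(a, axis=axis)], axis=axis)


a = np.array([[1, 2, 3], [4, 5, 6]])

# diff undoes cumsum0 along the same axis ...
round_trip = np.diff(cumsum0(a, axis=1), axis=1)

# ... and cumsum0 undoes diff once the starting value is added back.
x = np.array([7, 9, 4, 10])
rebuilt = x[0] + cumsum0(np.diff(x))
```

Neither round trip needs slicing or padding, which is the sense in which `cumsum0` is an inverse of `diff`.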


[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

2023-08-15 Thread john . dawson
> From my point of view, such function is a bit of a corner-case to be added to 
> numpy. And it doesn’t justify its naming anymore. It is not one operation 
> anymore. It is a cumsum and prepending 0. And it is very difficult to argue 
> why prepending 0 to cumsum is a part of cumsum.

That is backwards. Consider the array [x0, x1, x2].

The sum of the first 0 elements is 0.
The sum of the first 1 elements is x0.
The sum of the first 2 elements is x0+x1.
The sum of the first 3 elements is x0+x1+x2.

Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].

Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural and 
primitive one.

The current behaviour of numpy.cumsum is the composition of two basic 
operations, computing the partial sums and omitting the initial value:

[x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] -> [x0, x0+x1, x0+x1+x2].
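
Python's standard library can produce the full sequence of partial sums directly: `itertools.accumulate` accepts an `initial` argument (Python 3.8 and later). A small sketch with made-up numbers:

```python
from itertools import accumulate

import numpy as np

x = [4, 7, 1]

# All partial sums, starting from the empty sum.
partial_sums = list(accumulate(x, initial=0))

# numpy.cumsum returns the same sequence with the initial value dropped.
current = np.cumsum(x).tolist()
```

Here `current` is `partial_sums` without its first element.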

> What I would rather vouch for is adding an argument to `np.diff` so that it 
> leaves first row unmodified.
> def diff0(a, axis=-1):
>     """Differencing which appends first item along the axis"""
>     a0 = np.take(a, [0], axis=axis)
>     return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)
> This would be more sensible from conceptual point of view. As difference can 
> not be made, the result is the difference from absolute origin. With 
> recognition that first non-origin value in a sequence is the one after it. 
> And if the first row is the origin in a specific case, then that origin is 
> correctly defined in relation to absolute origin.
> Then, if origin row is needed, then it can be prepended in the beginning of a 
> procedure. And np.diff and np.cumsum are inverses throughout the sequential 
> code.
> np.diff0 was one of the first functions I added to my numpy utils, and I have 
> been using it instead of np.diff quite a lot.

This suggestion is bad: diff0 is conceptually confused. numpy.diff changes an 
array of numpy.datetime64s to an array of numpy.timedelta64s, but numpy.diff0 
changes an array of numpy.datetime64s to a heterogeneous array where one 
element is a numpy.datetime64 and the rest are numpy.timedelta64s. In general, 
whereas numpy.diff changes an array of positions to an array of displacements, 
diff0 changes an array of positions to a heterogeneous array where one element 
is a position and the rest are displacements.
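
The datetime case can be sketched with released NumPy alone. The dates are sample values, and the zero duration is prepended by hand because `cumsum0` does not exist yet:

```python
import numpy as np

times = np.array(["2023-01-01", "2023-02-04", "2023-04-02"],
                 dtype="datetime64[D]")
durations = np.diff(times)    # homogeneous: every element is a timedelta64

# Displacements from the start, beginning with a zero-length duration.
zero = np.zeros(1, dtype=durations.dtype)
offsets = np.concatenate([zero, np.cumsum(durations)])

# A position plus displacements gives positions again.
rebuilt = times[0] + offsets
```

Every intermediate array has a single, meaningful dtype: `times` and `rebuilt` hold datetimes, while `durations` and `offsets` hold timedeltas.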