[Numpy-discussion] Add to NumPy a function to compute cumulative sums from 0.
`cumsum` computes the sum of the first k summands for every k from 1. Judging by my experience, it is more often useful to compute the sum of the first k summands for every k from 0, as `cumsum`'s behaviour leads to fencepost-like problems.
https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
For example, `cumsum` is not the inverse of `diff`. I propose adding a function to NumPy to compute cumulative sums beginning with 0, that is, an inverse of `diff`. It might be called `cumsum0`. The following code is probably not the best way to implement it, but it illustrates the desired behaviour.

```
import numpy as np


def cumsum0(a, axis=None, dtype=None, out=None):
    """
    Return the cumulative sum of the elements along a given axis,
    beginning with 0.

    cumsum0 does the same as cumsum except that cumsum computes the sum
    of the first k summands for every k from 1 and cumsum0, from 0.

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        Axis along which the cumulative sum is computed. The default
        (None) is to compute the cumulative sum over the flattened
        array.
    dtype : dtype, optional
        Type of the returned array and of the accumulator in which the
        elements are summed. If `dtype` is not specified, it defaults to
        the dtype of `a`, unless `a` has an integer dtype with a
        precision less than that of the default platform integer. In
        that case, the default platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape and buffer length as the expected output but
        the type will be cast if necessary. See
        :ref:`ufuncs-output-type` for more details.

    Returns
    -------
    cumsum0_along_axis : ndarray.
        A new array holding the result is returned unless `out` is
        specified, in which case a reference to `out` is returned. If
        `axis` is not None the result has the same shape as `a` except
        along `axis`, where the dimension is larger by 1.

    See Also
    --------
    cumsum : Cumulatively sum array elements, beginning with the first.
    sum : Sum array elements.
    trapz : Integration of array values using the composite trapezoidal rule.
    diff : Calculate the n-th discrete difference along given axis.

    Notes
    -----
    Arithmetic is modular when using integer types, and no error is
    raised on overflow.

    ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
    values since ``sum`` may use a pairwise summation routine, reducing
    the roundoff-error. See `sum` for more information.

    Examples
    --------
    >>> a = np.array([[1, 2, 3], [4, 5, 6]])
    >>> a
    array([[1, 2, 3],
           [4, 5, 6]])
    >>> np.cumsum0(a)
    array([ 0,  1,  3,  6, 10, 15, 21])
    >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
    array([ 0.,  1.,  3.,  6., 10., 15., 21.])

    >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
    array([[0, 0, 0],
           [1, 2, 3],
           [5, 7, 9]])
    >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
    array([[ 0,  1,  3,  6],
           [ 0,  4,  9, 15]])

    ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``

    >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
    >>> np.cumsum0(b)[-1]
    1000000.0050045159
    >>> b.sum()
    1000000.0050000029

    """
    # Accept any array_like, as documented above.
    a = np.asarray(a)
    # Summing an empty slice yields the additive identity with the right
    # shape and dtype.
    empty = a.take([], axis=axis)
    zero = empty.sum(axis, dtype=dtype, keepdims=True)
    later_cumsum = a.cumsum(axis, dtype=dtype)
    # Both pieces already carry the requested dtype; np.concatenate rejects
    # `dtype` and `out` together, so only `out` is forwarded.
    return np.concatenate([zero, later_cumsum], axis=axis, out=out)
```
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
I'm very sensitive to the issues of adding to the already bloated numpy API, but I would definitely find use in this function. I literally made this error (thinking that the first element of cumsum should be 0) just a couple of days ago! What are the plans for the "extended" NumPy API after 2.0? Is there a good place for these variants?

On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
> `cumsum` computes the sum of the first k summands for every k from 1.
> Judging by my experience, it is more often useful to compute the sum of
> the first k summands for every k from 0, as `cumsum`'s behaviour leads
> to fencepost-like problems.
> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> For example, `cumsum` is not the inverse of `diff`. I propose adding a
> function to NumPy to compute cumulative sums beginning with 0, that is,
> an inverse of `diff`. It might be called `cumsum0`. The following code
> is probably not the best way to implement it, but it illustrates the
> desired behaviour.
>
> [...]
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
I'm really confused. Summing from zero should be what cumsum() does now.

```
>>> np.__version__
'1.22.4'
>>> np.cumsum([[1, 2, 3], [4, 5, 6]])
array([ 1,  3,  6, 10, 15, 21])
```
which matches your example in the cumsum0() documentation. Did something change in a recent release?

Ben Root

On Fri, Aug 11, 2023 at 8:55 AM Juan Nunez-Iglesias wrote:
> I'm very sensitive to the issues of adding to the already bloated numpy
> API, but I would definitely find use in this function. I literally made
> this error (thinking that the first element of cumsum should be 0) just a
> couple of days ago! What are the plans for the "extended" NumPy API after
> 2.0? Is there a good place for these variants?
>
> On Fri, 11 Aug 2023, at 2:07 AM, john.daw...@camlingroup.com wrote:
> > [...]
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
On Fri, Aug 11, 2023 at 1:47 PM Benjamin Root wrote:
> I'm really confused. Summing from zero should be what cumsum() does now.
>
> ```
> >>> np.__version__
> '1.22.4'
> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
> array([ 1,  3,  6, 10, 15, 21])
> ```
> which matches your example in the cumsum0() documentation. Did something
> change in a recent release?

That's not what's in his example.

--
Robert Kern
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
After blinking and rubbing my eyes, I finally see what is meant by all of this. I see now that the difference is that `cumsum0()` would return a result that essentially has 0 prepended to what would normally be the result from `cumsum()`. From the description, I thought the "problem" was that the summation starts from 1. Personally, I never really thought of cumsum() as starting from index 1, so I didn't understand the problem as stated. So, I think some workshopping of the description is in order.

On Fri, Aug 11, 2023 at 1:53 PM Robert Kern wrote:
> On Fri, Aug 11, 2023 at 1:47 PM Benjamin Root wrote:
>
>> I'm really confused. Summing from zero should be what cumsum() does now.
>>
>> ```
>> >>> np.__version__
>> '1.22.4'
>> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
>> array([ 1,  3,  6, 10, 15, 21])
>> ```
>> which matches your example in the cumsum0() documentation. Did something
>> change in a recent release?
>
> That's not what's in his example.
>
> --
> Robert Kern
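To make the distinction concrete, here is a minimal sketch of "the usual `cumsum()` result with 0 prepended", using only existing NumPy calls. The segment lengths and the slicing use case are made up for illustration and are not taken from the thread; they show one situation where the leading 0 avoids a special case for the first segment.

```python
import numpy as np

lengths = np.array([2, 3, 1])        # made-up segment lengths
data = np.arange(lengths.sum())      # 0, 1, 2, 3, 4, 5

ends = np.cumsum(lengths)                # what cumsum returns today
bounds = np.concatenate(([0], ends))     # the same values with 0 prepended

# Segment i is data[bounds[i]:bounds[i + 1]]; without the leading 0 the
# first segment would need separate handling.
segments = [data[bounds[i]:bounds[i + 1]].tolist() for i in range(len(lengths))]

print(ends.tolist())     # [2, 5, 6]
print(bounds.tolist())   # [0, 2, 5, 6]
print(segments)          # [[0, 1], [2, 3, 4], [5]]
```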
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
This has come up before, see https://github.com/numpy/numpy/issues/6044 for the first time this came up; there were several subsequent discussions linked there.

In the meantime, the data APIs consortium has been actively working on adding a `cumulative_sum` function to the array API standard, see https://github.com/data-apis/array-api/issues/597 and https://github.com/data-apis/array-api/pull/653. The proposed `cumulative_sum` function includes an `include_initial` keyword argument that gets the OP's desired behavior.

I think we should probably eventually deprecate `cumsum` and `cumprod` in favor of the array API standard's `cumulative_sum` and `cumulative_product` if only because of the embarrassing naming issue. Once the array API standard has finalized the name for the keyword argument, I think it makes sense to add the keyword argument to np.cumsum, even if we don't deprecate it yet. I don't think it makes sense to add a new function just for this.

On Fri, Aug 11, 2023 at 6:34 AM wrote:
> `cumsum` computes the sum of the first k summands for every k from 1.
> Judging by my experience, it is more often useful to compute the sum of the
> first k summands for every k from 0, as `cumsum`'s behaviour leads to
> fencepost-like problems.
> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> For example, `cumsum` is not the inverse of `diff`. I propose adding a
> function to NumPy to compute cumulative sums beginning with 0, that is, an
> inverse of `diff`. It might be called `cumsum0`. The following code is
> probably not the best way to implement it, but it illustrates the desired
> behaviour.
>
> [...]
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
On 11 Aug 2023, at 7:52 pm, Robert Kern wrote:
>> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
>> array([ 1,  3,  6, 10, 15, 21])
>>
>> which matches your example in the cumsum0() documentation. Did something
>> change in a recent release?
>
> That's not what's in his example.

The example is creating a cumsum-like array of n+1 elements starting with the number 0, not array[0] – i.e. essentially just inserting 0 along every axis, so that np.diff(np.cumsum0(a)) = a

Not sure if this would be too complicated to effect with the existing ufuncs either… Almost all of the documentation sounds very repetitive, so maybe implementing this via a new kwarg to cumsum would be a better option?

Cheers,
Derek
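As a rough illustration of the keyword-argument idea, the sketch below wraps `np.cumsum` in a helper that takes an `include_initial` flag (the keyword name floated elsewhere in this thread). The helper name `cumulative_sum_sketch` is hypothetical, flattening when `axis` is None is an assumption copied from `np.cumsum`, and none of this is meant as how NumPy or the array API standard would implement it. The last lines check the round trip with `np.diff`.

```python
import numpy as np


def cumulative_sum_sketch(x, axis=None, include_initial=False):
    """Hypothetical helper for illustration; not a NumPy or array API function."""
    x = np.asarray(x)
    if axis is None:
        # Assumption: mirror np.cumsum and work on the flattened array.
        x = x.ravel()
        axis = 0
    result = np.cumsum(x, axis=axis)
    if not include_initial:
        return result
    # Build a slab of zeros of length 1 along `axis` and prepend it.
    shape = list(result.shape)
    shape[axis] = 1
    zeros = np.zeros(shape, dtype=result.dtype)
    return np.concatenate([zeros, result], axis=axis)


a = np.array([[1, 2, 3], [4, 5, 6]])
with_zero = cumulative_sum_sketch(a, axis=1, include_initial=True)

print(with_zero.tolist())                               # [[0, 1, 3, 6], [0, 4, 9, 15]]
print(np.diff(with_zero, axis=1).tolist())              # [[1, 2, 3], [4, 5, 6]]
print(np.diff(np.cumsum(a, axis=1), axis=1).tolist())   # [[2, 3], [5, 6]]
```

With the leading zeros, `np.diff` recovers `a` exactly; with plain `np.cumsum`, the first element along the axis is lost.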
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
On Fri, 2023-08-11 at 13:43 -0400, Benjamin Root wrote:
> I'm really confused. Summing from zero should be what cumsum() does
> now.

What they mean is *including* the "implicit" 0 in the result. There are some old NumPy issues on this, suggesting a new kwarg such as `include_initial=True`. This was also discussed here more recently:
https://github.com/data-apis/array-api/issues/597

I think everyone always agreed with such an addition being good. It shouldn't be terribly hard, although the code needs some restructuring to do it, so I'm not sure it is easy either.

- Sebastian

> ```
> >>> np.__version__
> '1.22.4'
> >>> np.cumsum([[1, 2, 3], [4, 5, 6]])
> array([ 1,  3,  6, 10, 15, 21])
> ```
> which matches your example in the cumsum0() documentation. Did
> something change in a recent release?
>
> Ben Root
>
> On Fri, Aug 11, 2023 at 8:55 AM Juan Nunez-Iglesias wrote:
> > [...]
[Numpy-discussion] update on build system changes in NumPy's main branch
Hey all,

We've landed some major changes in `main` this week, so I thought it's a good idea to keep everyone in the loop.

First the good news: we now have full SIMD support in the Meson builds on `main`! This was a huge amount of work by Sayed, so I'd like to say thank you to him for doing all that. This was the main missing piece of the puzzle for Python 3.12 support, and for the switch away from `numpy.distutils`. We're still in the process of some backports and other tweaks, but we're basically ready for a first 1.26.0 pre-release. We also need to port some CI jobs over in `main` from setup.py to meson, but that is separate from work for 1.26.x.

Now let me give a summary of where we are at, because I anticipate these questions coming up regularly.

We had to fork and vendor both Meson and meson-python, because for SIMD support we need a significant new feature in Meson (https://github.com/mesonbuild/meson/pull/11307) that is not yet merged; once that PR does get merged and ends up in the next Meson feature release, we can drop our vendored versions. For now they live as git submodules under `vendored-meson` in the root dir of the numpy repo. And we will keep the forks at https://github.com/numpy/meson and https://github.com/numpy/meson-python.

As a result of that fork, it is no longer possible to run `meson setup` in the root of the repo. This wasn't really recommended anyway, but it's good to know that it doesn't work right now. The two ways of building are:

1. Via `pip` or `pypa/build`; that will trigger the build backend via the hook in `pyproject.toml`.
2. Via the `spin` commands - this is the developer CLI (see https://github.com/scientific-python/spin).

Note that editable builds work well too, with `pip install -e . --no-build-isolation`.

For the remaining loose ends after the switch to Meson, see https://github.com/numpy/numpy/issues/23981.
Once those are all taken care of and we have released 1.26.0 and see that there are no issues, we can remove the `setup.py` based builds. Probably towards the end of this year.

We still need to update the docs for the change over to Meson - this will happen before the 1.26.0 release. For now, the SciPy docs at http://scipy.github.io/devdocs/building/index.html are a good reference; those were recently fully rewritten, and everything for NumPy is essentially the same as for SciPy.

Another thing worth mentioning is that we have now, in the `main` and `1.26.x` branches, defaulted to failing the build if BLAS/LAPACK cannot be found. This is because all the detection mechanisms changed, and it would otherwise be too easy to silently get a build that is unoptimized (that can cause up to 100x slowdowns in widely used linalg functions). If you get a failure, you can fix it or pass a `-Dallow-noblas` CLI flag to indicate that that is what you intended (or at least, you don't mind a slow build). We may still revert the change for 1.26.0 and default to switching to `lapack_lite`; for the discussion on that and more details, see https://github.com/numpy/numpy/pull/24279.

A difference between the meson and setup.py builds is that rather than having a host of `NPY_` environment variables, we now have CLI flags for build options. These options can be queried and the ones in use are reported in the build log, which makes it much easier to understand what is being used. It does mean though that if you have a set of env vars in, for example, your `.bashrc`, those will no longer have an effect. For a full list of options, see the `meson_options.txt` file in the root of the repo. Note that some of the options for BLAS/LAPACK control (e.g., `NPY_BLAS_ORDER`) are not yet available. Those should arrive within the next couple of months.
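Since `NPY_` variables left in a shell profile are now silently ignored by the build, a small check can help spot them. The following is only a sketch: `leftover_npy_vars` is a hypothetical helper, not part of NumPy or Meson, and `NPY_EXAMPLE_OPTION` is a made-up name. The prefix alone does not say whether a variable was a build option, so the output is a list to review rather than a verdict.

```python
import os


def leftover_npy_vars(environ=None):
    """Return the NPY_-prefixed variables in `environ`, sorted by name.

    Hypothetical helper for illustration; not part of NumPy or Meson.
    """
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in sorted(environ) if name.startswith("NPY_")}


# Made-up environment for the example; call with no argument to inspect
# the real one.
sample = {
    "NPY_BLAS_ORDER": "openblas",
    "PATH": "/usr/bin",
    "NPY_EXAMPLE_OPTION": "1",  # invented name, for illustration only
}
for name, value in leftover_npy_vars(sample).items():
    print(f"{name}={value}")
```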
Cheers,
Ralf
[Numpy-discussion] Re: update on build system changes in NumPy's main branch
On Fri, Aug 11, 2023, at 12:04, Ralf Gommers wrote:
> We've landed some major changes in `main` this week, so I thought it's a good
> idea to keep everyone in the loop.

This is a *significant* amount of work. Thank you, Ralf, for keeping track of all the moving parts and for working with the rest of the team to get this overhaul completed.

The Meson build machinery is a joy to use!

Best regards,
Stéfan