[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Sebastian Berg
On Fri, 2023-12-22 at 18:01 -0500, Marten van Kerkwijk wrote:
> Hi Martin,
> 
> I agree it is a long-standing issue, and I was reminded of it by your
> comment.  I have a draft PR at 
> https://github.com/numpy/numpy/pull/25476
> that does not change the old behaviour, but allows you to pass in a
> start-stop array which behaves more sensibly (exact API TBD).
> 
> Please have a look!


That looks nice.  I don't have a clear feeling on the order of items;
if we think of it in terms of `(start, stop)`, there was also the idea
voiced to simply add another name, in which case you could allow start
and stop to be separate arrays.
Of course, if we go with your `slice(start, stop)` idea that also works,
although passing them as separate parameters seems nice too.

Adding another name (if we can think of one at least) seems pretty good
to me, since I suspect we would add docs to suggest not using
`reduceat`.
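
For concreteness, a tiny pure-Python sketch of what explicit start/stop
pairs would mean (illustrative only; the function name and the handling
of empty slices here are hypothetical, and the exact API is still TBD):

import numpy as np

def reduce_startstop(ufunc, a, starts, stops):
    # reference loop only: each (start, stop) pair is reduced on its own,
    # and an empty slice falls back to the ufunc identity (where one exists)
    return np.array([ufunc.reduce(a[start:stop])
                     for start, stop in zip(starts, stops)])

a = np.arange(8)
reduce_startstop(np.add, a, [0, 4, 4], [4, 4, 8])  # -> array([ 6,  0, 22])
# contrast with the current behaviour, where a repeated index yields the
# single element a[4] instead of an empty reduction:
np.add.reduceat(a, [0, 4, 4])                      # -> array([ 6,  4, 22])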


One small thing about the PR: I would like to distinguish `default` and
`initial`.  I.e. the default value is used only for empty reductions,
while the initial value should always be used (unless you pass both,
which we don't allow for normal reductions).
I suppose the machinery isn't quite set up to do both side-by-side.
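
To make the distinction concrete with what exists today (`default`
itself is not currently exposed, so this only shows the `initial` side):

import numpy as np

np.add.reduce(np.array([1, 2, 3]), initial=5)      # -> 11; initial is always folded in
np.add.reduce(np.array([], dtype=int), initial=5)  # -> 5
np.add.reduce(np.array([], dtype=int))             # -> 0; the identity acts as the default

A `default` in the sense above would only ever affect that last, empty
case.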

- Sebastian



> 
> Marten
> 
> Martin Ling  writes:
> 
> > Hi folks,
> > 
> > I don't follow numpy development in much detail these days but I
> > see
> > that there is a 2.0 release planned soon.
> > 
> > Would this be an opportunity to change the behaviour of 'reduceat'?
> > 
> > This issue has been open in some form since 2006!
> > https://github.com/numpy/numpy/issues/834
> > 
> > The current behaviour was originally inherited from Numeric, and
> > makes
> > reduceat often unusable in practice, even where it should be the
> > perfect, concise, efficient solution. But it has been impossible to
> > change it without breaking compatibility with existing code.
> > 
> > As a result, horrible hacks are needed instead, e.g. my answer
> > here:
> > https://stackoverflow.com/questions/57694003
> > 
> > Is this something that could finally be fixed in 2.0?
> > 
> > 
> > Martin


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Marten van Kerkwijk
Hi Sebastian,

> That looks nice.  I don't have a clear feeling on the order of items;
> if we think of it in terms of `(start, stop)`, there was also the idea
> voiced to simply add another name, in which case you could allow start
> and stop to be separate arrays.

Yes, one could add another method.  Or perhaps even add a new argument
to `.reduce` instead (say `slices`).  But this seemed the simplest
route...

> Of course, if we go with your `slice(start, stop)` idea that also works,
> although passing them as separate parameters seems nice too.
> 
> Adding another name (if we can think of one at least) seems pretty good
> to me, since I suspect we would add docs to suggest not using
> `reduceat`.

If we wanted to, even with the present PR it would be possible to (very
slowly) deprecate the use of a list of single integers.  But I'm trying
to go with just making the existing method more useful.

> One small thing about the PR: I would like to distinguish `default`
> and `initial`.  I.e. the default value is used only for empty
> reductions, while the initial value should always be used (unless you
> pass both, which we don't allow for normal reductions).
> I suppose the machinery isn't quite set up to do both side-by-side.

I just followed what is done for reduce, where a default could also have
made sense given that `where` can exclude all inputs along a given row.
I'm not convinced it would be necessary to have both, though it would
not be hard to add.
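
For example, with the current machinery (a quick sketch of what I mean,
if I recall the behaviour correctly):

import numpy as np

np.add.reduce(np.ones((2, 3)), axis=1,
              where=[[True, True, True],
                     [False, False, False]])   # -> array([3., 0.])

Here all inputs in the second row are excluded, so the result silently
falls back to the identity, 0, which is exactly where a separate
`default` would come in.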

All the best,

Marten


[Numpy-discussion] Re: Fixing definition of reduceat for Numpy 2.0?

2023-12-23 Thread Sebastian Berg
On Sat, 2023-12-23 at 09:56 -0500, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> > That looks nice.  I don't have a clear feeling on the order of
> > items; if we think of it in terms of `(start, stop)`, there was also
> > the idea voiced to simply add another name, in which case you could
> > allow start and stop to be separate arrays.
> 
> Yes, one could add another method.  Or perhaps even add a new
> argument
> to `.reduce` instead (say `slices`).  But this seemed the simplest
> route...

Yeah, I don't mind this; it doesn't stop us from adopting a better idea
later either.  Adding to `.reduce` could be fine, but overall I actually
think a new name, or reusing `reduceat`, is nicer than overloading
`.reduce` further; even `reduce_slices()` would do.

> > 
> > I suppose the machinery isn't quite set up to do both side-by-side.
> 
> I just followed what is done for reduce, where a default could also
> have
> made sense given that `where` can exclude all inputs along a given
> row.
> I'm not convinced it would be necessary to have both, though it would
> not be hard to add.

Sorry, I misread the code: you do use `initial` the same way as in
reductions; I thought it wasn't used when there were multiple elements.
I.e. it is used for non-empty slices also.

There is still a little annoyance when `initial=` isn't passed, since
default and initial can be different (this is the case for object add,
for example: the default is `0`, but it is not used as initial for
non-empty reductions).
Anyway, it's a small detail to some degree, even if it may be finicky to
get right.  At the moment it seems that passing `dtype=object` somehow
changes the result as well.
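
A quick illustration of that object-dtype behaviour (as I understand
it):

import numpy as np

lsts = np.empty(2, dtype=object)
lsts[0] = [1]
lsts[1] = [2]
np.add.reduce(lsts)                        # -> [1, 2]; 0 is *not* used as a start value
np.add.reduce(np.empty(0, dtype=object))   # -> 0; the default kicks in only when empty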

- Sebastian


> 
> All the best,
> 
> Marten


[Numpy-discussion] Re: Change definition of complex sign (and use it in copysign)

2023-12-23 Thread Dom Grigonis
To me this sounds like a reasonable change.

It does seem like there is a return value that is more sensible than the 
alternatives.

And the fact that SymPy is already doing this indicates that the same 
conclusion has been reached more than once.

I am not dealing much with complex numbers at the moment, but I can see 
this being of non-trivial convenience when I do.
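
For reference, the proposal quoted below amounts to something like this 
sketch (illustrative only, not the actual NumPy implementation):

import numpy as np

def complex_sign(z):
    # sign(z) = z / |z|, with sign(0) = 0, as SymPy already does
    z = np.asarray(z, dtype=complex)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = z[nz] / np.abs(z[nz])
    return out

complex_sign(np.array([1 + 1j, 0, 3 - 4j]))
# -> array([0.70710678+0.70710678j, 0.+0.j, 0.6-0.8j])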

Regards,
DG

> On 22 Dec 2023, at 15:48, Oscar Benjamin  wrote:
> 
> On Fri, 22 Dec 2023 at 13:25,  wrote:
>> 
>> Anyway, to me the main question would be whether this would break any 
>> workflows (though it is hard to see how it could, given that the previous 
>> definition was really rather useless...).
> 
> SymPy already defines sign(z) as z/abs(z) (with sign(0) = 0) as proposed here.
> 
> Checking this I see that the current mismatch causes a bug when
> SymPy's lambdify function is used to evaluate the sign function with
> NumPy:
> 
> In [12]: sign(z).subs(z, 1+1j)
> Out[12]: 0.707106781186548 + 0.707106781186548⋅ⅈ
> 
> In [13]: lambdify(z, sign(z))(1+1j) # uses numpy
> Out[13]: (1+0j)
> 
> The proposed change to NumPy's sign function would fix this bug.
> 
> --
> Oscar


[Numpy-discussion] ENH: ignore_nan parameter in np.average

2023-12-23 Thread alexwest . mays
Hi there, 

I would love some input on this issue, 
https://github.com/numpy/numpy/issues/21375. 

I have created a draft PR which I believe resolves it, 
https://github.com/numpy/numpy/pull/25474. I decided to go with adding 
ignore_nan to np.average rather than adding weights to np.nanmean, as it 
seemed logical and was the path of least resistance. 

Please let me know what you think!

-awestm


[Numpy-discussion] Re: ENH: ignore_nan parameter in np.average

2023-12-23 Thread mhvk
I think the logical choice would be to allow a weighted average in the 
nan-functions, not to change `np.average`. In particular, I think it is bad 
API to add options for such exclusions; we have another open issue about 
infinity in the data, which leads to NaN even with weight 0 (because 
`0*inf = nan`; see https://github.com/numpy/numpy/issues/25362); should 
that get its own option?

It also seems a case that is *very* easily solved by the user with a 
wrapper, or, as noted in the issue, by creating a mask and using 
`np.ma.average`. Or, indeed, by just calculating `np.nansum(data * weight, 
axis=...) / np.nansum(weight, axis=...)`.
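
For concreteness, a sketch of such a user-side wrapper (the name and 
signature here are illustrative only, not a proposed API):

import numpy as np

def nanaverage(data, weights, axis=None):
    # drop NaN entries from numerator and denominator alike; masking on
    # data * weights instead of data alone would also drop the
    # weight-0-times-inf case from gh-25362
    bad = np.isnan(data)
    d = np.where(bad, 0.0, data)
    w = np.where(bad, 0.0, weights)
    return (d * w).sum(axis=axis) / w.sum(axis=axis)

# or, via masked arrays, as noted in the issue:
# np.ma.average(np.ma.masked_invalid(data), weights=weights, axis=axis)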

Anyway, my 2¢ would be to think whether we actually need this addition, and, if 
we decide we do, to see how this would fit with the `nan*` functions. This 
might mean introducing `np.nanaverage` since the Array API is fairly clear that 
`mean()` should be unweighted. Indeed, if written such that the `NaN` skipping 
is done after data and weight are multiplied, that would solve 
https://github.com/numpy/numpy/issues/25362 too.

-- Marten

p.s. My other reason for hesitation is that generally when you get a NaN, 
something is actually wrong...

> I would love some input on this issue, 
> https://github.com/numpy/numpy/issues/21375.
>
> I have created a draft PR which I believe resolves it, 
> https://github.com/numpy/numpy/pull/25474. I decided to go with adding 
> ignore_nan to np.average rather than adding weights to np.nanmean, as it 
> seemed logical and was the path of least resistance.
> Please let me know what you think!
> -awestm