[Numpy-discussion] Re: NEP 50 and cast safety for scalar assignment/conversions

2022-10-22 Thread Sebastian Berg
On Fri, 2022-10-21 at 17:17 -0600, Aaron Meurer wrote:
> I'm probably not understanding all the subtleties here. In the
> documentation for can_cast (and other places), it says, "'safe' means
> only casts which can preserve values are allowed." So by that
> definition, I think 'safe' casting should disallow 5000 to be cast to


Yes, but we never look at the actual value normally (NumPy does
currently for 0-D arrays, but I doubt we want to continue that [1]).

So casting:

   np.array([100, 100], dtype=np.int64) -> int8

is unsafe even though it is safe when you look at the values.
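
For example, a quick check of the current dtype-level behaviour (note that
the values are never inspected here):

    import numpy as np

    # dtype-to-dtype query: int64 -> int8 can lose information,
    # so the "safe" rule rejects it regardless of the actual values
    np.can_cast(np.int64, np.int8, casting="safe")   # -> False

    # "unsafe" allows the cast; these particular values happen to fit,
    # so nothing is actually lost
    np.array([100, 100], dtype=np.int64).astype(np.int8, casting="unsafe")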

But for (Python) scalar assignment, we want to look at the value.  In
particular:

np.add(np.int8(1), 500)

should error (in the future) because we need to convert the 500 to
`int8`.  But the default casting for ufuncs is "same kind" and if you
write:

np.add(np.int8(1), np.int64(500), casting="unsafe", dtype="int8")

NumPy would happily do the operation.
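
Roughly, that means the operation silently wraps instead of raising:

    import numpy as np

    # casting="unsafe" with an explicit dtype forces both operands to int8,
    # so 500 wraps modulo 256 (500 -> -12) and the result is 1 + (-12) = -11
    np.add(np.int8(1), np.int64(500), casting="unsafe", dtype="int8")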

- Sebastian



> int8, because it would not preserve the value. IMO the definition of
> "value" is more vague when considering whether 100.0 can be cast to
> int8. Personally I don't think float -> int should ever be considered
> 'safe' even if the numeric value technically is preserved. It's also
> ambiguous whether "preserve value" applies to the inputs or just the
> outputs, i.e., is np.add(100, 100, dtype=int8) safe?
> 
> The can_cast() documentation says, "Returns True if cast between data
> types can occur according to the casting rule." 'safe' is by
> definition a value-based cast. So my expectation is that for a
> non-value based cast there should be a new type of casting rule that
> is non-value based only.
> 
> Unless your entire suggestion is to change the definition of 'safe'
> to
> not be value-based. I wasn't completely clear about that.
> 
> Aaron Meurer
> 
> On Thu, Oct 20, 2022 at 7:30 AM Sebastian Berg wrote:
> > 
> > Hi all,
> > 
> > I am happy that we have the correct integer handling for NEP 50
> > merged,
> > so the relevant parts of the proposal can now be tested. [1]
> > 
> > However, this has highlighted that NumPy has problems with applying
> > the
> > "cast safety" logic to scalars.  We had discussed this a bit
> > yesterday,
> > and this is an attempt to summarize the issue and thoughts on how
> > to
> > "resolve" it.
> > 
> > This mainly affects Python int, float, and complex due to their
> > special
> > handling with NEP 50.
> > 
> > 
> > NumPy has the cast safety concept for converting between different
> > dtypes:
> >   
> > https://numpy.org/doc/stable/reference/generated/numpy.can_cast.html
> > 
> > It uses "same-kind" in ufuncs (users do not usually notice this
> > unless
> > `out=` or `dtype=` is used).
> > NumPy otherwise tends to use "unsafe" for casts and assignments by
> > default, which can lead to undefined/strange results at times.
> > 
> > 
> > Since casts/assignment use "unsafe" casting, scalars are often
> > converted in a non-safe way.  However, there are certain
> > exceptions:
> > 
> >     np.arange(5)[3] = np.nan  # Errors (an unsafe cast would not)
> > 
> > More importantly, NEP 50 requires the following to error:
> > 
> >     np.uint8(3) + 5000  # 5000 cannot be converted to uint8
> > 
> > And we just put in a deprecation that would always disallow the
> > above!
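> > 
> > Concretely, the deprecation already flags the conversion itself, roughly:
> > 
> >     np.uint8(5000)   # DeprecationWarning today, slated to become an
> >                      # error, since 5000 is out of range for uint8
> > 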
> > But what would the answer to:
> > 
> >     np.can_cast(5000, np.uint8, casting="safe/same_kind/unsafe")
> > 
> > be?  And how to resolve the fact that casting scalars and arrays
> > has a
> > different notion of "safety"?
> > 
> > I could imagine two main approaches:
> > 
> > * cast-safety doesn't apply to scalar conversions; they are
> > whatever
> >   they currently are (sometimes unsafe, sometimes same-kind, but
> >   strictly safe for integer assignments).
> >   `np.can_cast(5000, np.uint8)` just errors out.  We have an
> > assignment
> >   "safety" that is independent of casting safety.
> > 
> >   For `np.add(np.uint8(5), 100, casting="safe")` the "safe" (or
> >   other modes) simply doesn't make sense for the `100` since
> >   effectively the assignment "safety" is used.
> > 
> > * Scalar conversions also have a cast-safety and it may inspect the
> >   value.
> > 
> > The problem with defining cast-safety for scalar conversion is not
> > implementing it, but rather how to (not?) resolve the
> > inconsistencies.
> > 
> > Even if we change the default casting for assignments to "same
> > kind" (a
> > deprecation also applied to arrays):
> > 
> >     int8_arr[3] = 5000
> > 
> > should presumably be an error (not even "unsafe"), but:
> > 
> >     np.can_cast(np.int64, np.int8, casting="same_kind")
> > 
> > returns `True` (an int64 could be 5000 as well), and `same_kind` is
> > what ufuncs also use.
> > 
> > 
> > I don't have a clear plan on this right now; my best thought is that
> > we live with the inconsistency:
> > 
> >     np.can_cast(100, np.int8)
> > 
> > would be "safe" while:
> > 
> >     np.can_cast(100., np.int8)
> > 
> > would be "unsafe".

[Numpy-discussion] Out parameter for argsort?

2022-10-22 Thread Paul Rudin
Argsort returns its result in a new array. It would be useful if allocation of 
the new array could be avoided when a suitable array is already available.

Is there a reason why this wouldn't be useful or problematic to implement?
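
For context, a small sketch of what the requested API might look like (the
out= parameter below is hypothetical; argsort does not currently accept one):

    import numpy as np

    a = np.random.rand(1_000_000)
    idx = np.empty(a.shape, dtype=np.intp)   # preallocated index buffer
    np.argsort(a, out=idx)                   # hypothetical out= parameter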


[Numpy-discussion] Re: Out parameter for argsort?

2022-10-22 Thread Hameer Abbasi
Hello,

One thing I can think of right off the bat is that it'd only be useful if the
out argument has the correct dtype (np.uintp, if I'm not mistaken). I'm not
sure how much of a nuisance that could be for end users, but I assume not very
much, given a clear error message.
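
For reference, the index dtype argsort produces today can be checked directly:

    import numpy as np

    np.argsort(np.arange(3)).dtype   # intp (int64 on most 64-bit builds)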

Best regards,
Hameer Abbasi
Sent from my iPhone

> On 22.10.2022 at 17:51, Paul Rudin wrote:
> 
> Argsort returns its result in a new array. It would be useful if allocation 
> of the new array could be avoided when a suitable array is already available.
> 
> Is there a reason why this wouldn't be useful or problematic to implement?



[Numpy-discussion] Re: Passing output array to bincount

2022-10-22 Thread Jerome Kieffer
On Thu, 20 Oct 2022 23:26:37 -
ntess...@pm.me wrote:

> As far as I can see, there is no equivalent numpy functionality. In fact, as 
> far as I'm aware, there isn't any fast alternative outside of 
> C/Cython/numba/..

We have cumulative histograms in silx ... and found it useful. Maybe it would
be worth having this in numpy.
https://github.com/silx-kit/silx/blob/master/src/silx/math/chistogramnd.pyx#L110
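
For what it's worth, counts can already be accumulated into a preallocated
array in pure NumPy with np.add.at, though that is typically much slower than
bincount itself:

    import numpy as np

    counts = np.zeros(16, dtype=np.intp)          # preallocated output
    data = np.random.randint(0, 16, size=10_000)
    np.add.at(counts, data, 1)                    # in-place, like bincount with an out=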

Cheers,

Jerome