On Fri, 2022-10-21 at 17:17 -0600, Aaron Meurer wrote:
> I'm probably not understanding all the subtleties here. In the
> documentation for can_cast (and other places), it says, "'safe' means
> only casts which can preserve values are allowed." So by that
> definition, I think 'safe' casting should disallow 5000 to be cast to

Yes, but normally we never look at the actual values (NumPy currently
does for 0-D arrays, but I doubt we want to continue that [1]).
So casting:

    np.array([100, 100], dtype=np.int64) -> int8

is unsafe even though it would be safe if you looked at the values.
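
A minimal sketch of that dtype-only behaviour, assuming a reasonably recent NumPy (the variable names are made up for illustration):

```python
import numpy as np

arr = np.array([100, 100], dtype=np.int64)

# With dtypes as input, can_cast only compares types, never values.
safe_ok = np.can_cast(arr.dtype, np.int8, casting="safe")
same_kind_ok = np.can_cast(arr.dtype, np.int8, casting="same_kind")
print(safe_ok, same_kind_ok)

# An explicit "safe" cast is rejected although every value would fit.
rejected = False
try:
    arr.astype(np.int8, casting="safe")
except TypeError as exc:
    rejected = True
    print("rejected:", exc)

# astype defaults to casting="unsafe", so this goes through.
converted = arr.astype(np.int8)
print(converted)
```

So int64 -> int8 is not "safe" but is "same_kind", no matter what the array contains.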

But for (Python) scalar assignment, we want to look at the value. In
particular:

    np.add(np.int8(1), 500)

should error (in the future) because we need to convert the 500 to
`int8`. But the default casting for ufuncs is "same_kind", and if you
write:

    np.add(np.int8(1), np.int64(500), casting="unsafe", dtype="int8")

NumPy would happily do the operation.
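
Roughly, as a sketch (what happens for the Python integer depends on the NumPy version and on whether NEP 50 behaviour is active, so that part is only guarded, not asserted):

```python
import numpy as np

# Typed NumPy scalar: the cast int64 -> int8 is decided by dtype alone,
# so "unsafe" lets it through and the value 500 is never checked.
result = np.add(np.int8(1), np.int64(500), casting="unsafe", dtype="int8")
print(result.dtype, result)

# The same call with casting="safe" is rejected based on the dtypes.
safe_rejected = False
try:
    np.add(np.int8(1), np.int64(500), casting="safe", dtype="int8")
except TypeError as exc:
    safe_rejected = True
    print("safe cast rejected:", exc)

# Python integer: under NEP 50 the value itself has to fit into int8.
# Assumption: older NumPy versions upcast instead of raising.
try:
    print(np.add(np.int8(1), 500))
except OverflowError as exc:
    print("Python int rejected:", exc)
```

The first two calls are governed by the casting rule; the last one is governed by the value of the Python integer, which is exactly the mismatch.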
- Sebastian
> int8, because it would not preserve the value. IMO the definition of
> "value" is more vague when considering whether 100.0 can be cast to
> int8. Personally I don't think float -> int should ever be considered
> 'safe' even if the numeric value technically is preserved. It's also
> ambiguous whether "preserve value" applies to the inputs or just the
> outputs, i.e., is np.add(100, 100, dtype=int8) safe?
>
> The can_cast() documentation says, "Returns True if cast between data
> types can occur according to the casting rule." 'safe' is by
> definition a value-based cast. So my expectation is that for a
> non-value based cast there should be a new type of casting rule that
> is non-value based only.
>
> Unless your entire suggestion is to change the definition of 'safe' to
> not be value-based. I wasn't completely clear about that.
>
> Aaron Meurer
>
> On Thu, Oct 20, 2022 at 7:30 AM Sebastian Berg
> wrote:
> >
> > Hi all,
> >
> > I am happy that we have the correct integer handling for NEP 50
> > merged,
> > so the relevant parts of the proposal can now be tested. [1]
> >
> > However, this has highlighted that NumPy has problems with applying
> > the
> > "cast safety" logic to scalars. We had discussed this a bit
> > yesterday,
> > and this is an attempt to summarize the issue and thoughts on how
> > to
> > "resolve" it.
> >
> > This mainly affects Python int, float, and complex due to their
> > special
> > handling with NEP 50.
> >
> >
> > NumPy has the cast safety concept for converting between different
> > dtypes:
> >
> > https://numpy.org/doc/stable/reference/generated/numpy.can_cast.html
> >
> > It uses "same-kind" in ufuncs (users do not usually notice this
> > unless
> > `out=` or `dtype=` is used).
> > NumPy otherwise tends to use "unsafe" for casts and assignments by
> > default which can lead to undefined/strange results at times.
> >
> >
> > Since casts/assignment use "unsafe" casting, scalars are often
> > converted in a non-safe way. However, there are certain
> > exceptions:
> >
> > np.arange(5)[3] = np.nan # Errors (an unsafe cast would not)
> >
> > More importantly, NEP 50 requires the following to error:
> >
> > np.uint8(3) + 5000 # 5000 cannot be converted to uint8
> >
> > And we just put in a deprecation that would always disallow the
> > above!
> > But what would the answer to:
> >
> > np.can_cast(5000, np.uint8, casting="safe/same_kind/unsafe")
> >
> > be? And how to resolve the fact that casting scalars and arrays
> > has a
> > different notion of "safety"?
> >
> > I could imagine two main approaches:
> >
> > * cast-safety doesn't apply to scalar conversions, they are whatever
> >   they currently are (sometimes unsafe, sometimes same-kind, but
> >   strictly safe for integer assignments).
> >   `np.can_cast(5000, np.uint8)` just errors out. We have an
> >   assignment "safety" that is independent of casting safety.
> >
> >   For `np.add(np.uint8(5), 100, casting="safe")` the "safe" (or
> >   other modes) simply doesn't make sense for the `100` since
> >   effectively the assignment "safety" is used.
> >
> > * Scalar conversions also have a cast-safety and it may inspect the
> >   value.
> >
> > The problem with defining cast-safety for scalar conversion is not
> > implementing it, but rather how to (not?) resolve the
> > inconsistencies.
> >
> > Even if we change the default casting for assignments to "same
> > kind" (a
> > deprecation also applied to arrays):
> >
> > int8_arr[3] = 5000
> >
> > should presumably be an error (not even "unsafe"), but:
> >
> > np.can_cast(np.int64, np.int8, casting="same_kind")
> >
> > returns `True` (an int64 could be 5000 as well), and `same_kind` is
> > what ufuncs also use.
> >
> >
> > I don't have a clear plan on this right now, my best thought is
> > that we
> > live with the inconsistency:
> >
> > np.can_cast(100, np.int8)
> >
> > would be "safe" while:
> >
> > np.can_cast(100., np.int8)
> >
> > would be "un