[Numpy-discussion] np.bool_ vs Python bool behavior

2022-03-12 Thread Jacob Reinhold
A pain point I ran into a while ago was assuming that an np.ndarray with 
dtype=np.bool_ would act similarly to the Python built-in boolean under 
addition. This is not the case, as shown in the following code snippet:

>>> np.bool_(True) + True
True
>>> True + True
2

In fact, I'm somewhat confused about all the arithmetic operations on boolean 
arrays:

>>> np.bool_(True) * True
True
>>> np.bool_(True) / True
1.0
>>> np.bool_(True) - True
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the 
bitwise_xor, the `^` operator, or the logical_xor function instead.
>>> for x, y in ((False, False), (False, True), (True, False), (True, True)):
...     print(np.bool_(x) ** y, end=" ")
1 0 1 1

I get that addition corresponds to "logical or" and multiplication corresponds 
to "logical and", but I'm lost on the division and exponentiation operations 
given that addition and multiplication don't promote the dtype to integers or 
floats.
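
As a quick check of the "logical or" / "logical and" reading, the following sketch (not part of the original message; the array names `a` and `b` are made up for illustration) compares `+` and `*` on boolean arrays against `np.logical_or` and `np.logical_and`, and contrasts that with the built-in bool:

```python
import numpy as np

# All four combinations of two boolean inputs.
a = np.array([False, False, True, True])
b = np.array([False, True, False, True])

# On boolean arrays, + gives the same result as logical OR ...
sum_matches_or = np.array_equal(a + b, np.logical_or(a, b))
# ... and * gives the same result as logical AND.
prod_matches_and = np.array_equal(a * b, np.logical_and(a, b))

# Python's built-in bool is an int subclass, so + counts instead.
python_sum = True + True

print(sum_matches_or, prod_matches_and, python_sum)
```

Both results stay boolean, which is why the sum never exceeds True.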

If arrays stubbornly refused to ever change type or interact with objects of a 
different type under addition, that'd be one thing, but they do change:

>>> np.uint8(0) - 1
-1
>>> (np.uint8(0) - 1).dtype
dtype('int64')
>>> (np.uint8(0) + 0.1).dtype
dtype('float64')

This dtype change can also be seen in the division and exponentiation above for 
np.bool_.
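
The inconsistency can be made visible by collecting the result dtype of each operator on a boolean array. This is only a sketch (the names `t` and `result_dtypes` are illustrative), and the exact dtype reported for `**` may depend on the NumPy version, so treat the printed values as something to verify locally:

```python
import numpy as np

# Use all-True operands so that division does not hit a divide-by-zero warning.
t = np.array([True, True])

# Collect the result dtype of each arithmetic operator on a boolean array.
result_dtypes = {
    "+": (t + t).dtype,
    "*": (t * t).dtype,
    "/": (t / t).dtype,
    "**": (t ** t).dtype,
}

# Subtraction is the odd one out: it raises instead of returning a value.
try:
    t - t
    subtract_raises = False
except TypeError:
    subtract_raises = True

for op, dt in result_dtypes.items():
    print(f"bool {op} bool -> {dt}")
print("bool - bool raises TypeError:", subtract_raises)
```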

Why the discrepancy in behavior for np.bool_? And why are arithmetic operations 
for np.bool_ inconsistently promoted to other data types?

If all arithmetic operations on np.bool_ resulted in integers, that would be 
consistent (so easier to work with) and wouldn't restrict expressiveness 
because there are also "logical or" (|) and "logical and" (&) operations 
available. Alternatively, division and exponentiation could throw errors like 
subtract, but the discrepancy between np.bool_ and the Python built-in bool for 
addition and multiplication would remain.

For context, I ran into an issue with this discrepancy in behavior while 
working on an image segmentation problem. For binary segmentation problems, we 
make use of boolean arrays to represent where an object is (the locations in 
the array which are "True" correspond to the foreground/object-of-interest, 
"False" corresponds to the background). I was aggregating multiple binary 
segmentation arrays to do a majority vote with an implementation that boiled 
down to the following:

>>> pred1, pred2, ..., predN = np.array(..., dtype=np.bool_), np.array(..., dtype=np.bool_), ..., np.array(..., dtype=np.bool_)
>>> aggregate = (pred1 + pred2 + ... + predN) / N
>>> agg_pred = aggregate >= 0.5

This returned (1.0 / N) at every index where at least one prediction was 
"True". I had assumed that the arrays would be promoted to integers (False -> 
0; True -> 1) and summed, so that agg_pred would hold the majority vote result. 
Instead, agg_pred was all "False", because the maximum value of aggregate was 
(1.0 / N), which is below 0.5 for N > 2.
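
A small reproduction of the pitfall, together with one possible fix, might look like the sketch below. The three masks in `preds` are made-up stand-ins for pred1..predN, and counting with `np.sum` over a stacked array is just one option (casting each mask with `astype(int)` before adding would also work):

```python
import numpy as np

# Three hypothetical 1-D "segmentation masks" standing in for pred1..predN.
preds = [
    np.array([True, True, False, False]),
    np.array([True, False, True, False]),
    np.array([True, True, False, False]),
]
n = len(preds)

# Pitfall: + on boolean arrays is logical OR, so the "sum" saturates at True.
naive = (preds[0] + preds[1] + preds[2]) / n
naive_vote = naive >= 0.5

# One fix: stack the masks and let np.sum count the True values as integers.
votes = np.sum(np.stack(preds), axis=0)
majority = votes / n >= 0.5

print(naive_vote)  # all False, since every entry is at most 1/3
print(majority)    # True where at least half of the masks agree
```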

My current workaround is to remind myself of this discrepancy by importing 
"builtins" from the standard library and annotating the relevant functions and 
variables with "builtins.bool" to explicitly distinguish it from np.bool_ 
where applicable, and then adding checks and/or conversions on top of that. 
But why not make np.bool_ act like the built-in bool under addition and 
multiplication and let users use the already existing | and & operations for 
"logical or" and "logical and"?
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: np.bool_ vs Python bool behavior

2022-03-16 Thread Jacob Reinhold
Hi Sebastian and Chuck,

Thanks for the response! (Sorry about the formatting in my original post; I 
wasn't familiar with how to display code in this setting.)

I think keeping + as "logical or" and * as "logical and" on np.bool_ types is 
fine, although redundant given that | and & provide this functionality and 
potentially misleading given the different behavior from the native Python 
bool; however, I could see it being too painful of a migration within v1.* 
numpy.

I think my main point of contention is that division and exponentiation aren't 
well defined operations on np.bool_, at least as currently defined, and they 
should raise errors like subtraction. Raising those errors would have caught 
the problem I ran into when trying to take the mean of multiple ndarrays of 
dtype=np.bool_. I'm not sure what the realistic use case is to have 
division/exp. return a float/int, especially when +/* return np.bool_ and 
subtraction throws an error.

Sebastian, you stated:
"N.B.:  I have changed that logic. "Future" ufuncs are now reversed.
They will default to an error rather than using the `int8`
implementation."

So is the division/exp. issue that I described with np.bool_ solved in future 
releases?

Happy to help out on implementation/formalizing a proposal!

FWIW, I suppose you could change + to XOR. Then np.bool_ would be a field 
(isomorphic to Z/2Z) and then you could reasonably define - and /. (Although + 
would be equivalent to - and * would be equivalent to /, which would probably 
be confusing to most users.)
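
To illustrate the Z/2Z remark, here is a brute-force sketch (not an existing NumPy feature; the helpers `add` and `mul` are hypothetical) that treats XOR as addition and AND as multiplication on np.bool_ and checks the properties mentioned above:

```python
from itertools import product

import numpy as np

values = [np.bool_(False), np.bool_(True)]


def add(a, b):
    # Field addition in Z/2Z is XOR.
    return a ^ b


def mul(a, b):
    # Field multiplication in Z/2Z is AND.
    return a & b


# Every element is its own additive inverse, so a - b would equal a + b.
own_additive_inverse = all(not add(a, a) for a in values)

# The only nonzero element, True, is its own multiplicative inverse,
# so dividing by True would equal multiplying by True.
one_is_own_inverse = bool(mul(values[1], values[1]))

# Multiplication distributes over addition for all eight combinations.
distributive = all(
    mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
    for a, b, c in product(values, repeat=3)
)

print(own_additive_inverse, one_is_own_inverse, distributive)
```

Division by False stays undefined in this picture, just as division by zero is in any field.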

Best,
Jacob