A pain point I ran into a while ago was assuming that an np.ndarray with
dtype=np.bool_ would act similarly to the Python built-in boolean under
addition. This is not the case, as shown in the following code snippet:
>>> np.bool_(True) + True
True
>>> True + True
2
In fact, I'm somewhat confused about all the arithmetic operations on boolean
arrays:
>>> np.bool_(True) * True
True
>>> np.bool_(True) / True
1.0
>>> np.bool_(True) - True
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the
bitwise_xor, the `^` operator, or the logical_xor function instead.
>>> for x, y in ((False, False), (False, True), (True, False), (True, True)):
...     print(np.bool_(x) ** y, end=" ")
1 0 1 1
I get that addition corresponds to "logical or" and multiplication corresponds
to "logical and", but I'm lost on the division and exponentiation operations
given that addition and multiplication don't promote the dtype to integers or
floats.
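To summarize the behavior observed above in one runnable sketch (the explicit cast at the end is just my own illustration of how to opt back into integer arithmetic, not anything from NumPy's docs):

```python
import numpy as np

t = np.bool_(True)

# + and * stay boolean: they behave as logical or / logical and.
print(t + True, (t + True).dtype)   # True bool
print(t * True, (t * True).dtype)   # True bool

# / promotes to float64, and ** promotes to an integer dtype.
print(t / True, (t / True).dtype)   # 1.0 float64
print(t ** True)                    # 1

# An explicit cast restores the built-in bool's arithmetic (True + True == 2).
print(t.astype(np.int64) + True)    # 2
```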
If arrays stubbornly refused to ever change type or interact with objects of a
different type under addition, that'd be one thing, but they do change:
>>> np.uint8(0) - 1
-1
>>> (np.uint8(0) - 1).dtype
dtype('int64')
>>> (np.uint8(0) + 0.1).dtype
dtype('float64')
This dtype change can also be seen in the division and exponentiation above for
np.bool_.
Why the discrepancy in behavior for np.bool_? And why are arithmetic operations
for np.bool_ inconsistently promoted to other data types?
If all arithmetic operations on np.bool_ resulted in integers, that would be
consistent (so easier to work with) and wouldn't restrict expressiveness
because there are also "logical or" (|) and "logical and" (&) operations
available. Alternatively, division and exponentiation could raise a TypeError,
as subtraction does, but the discrepancy between np.bool_ and the Python
built-in bool under addition and multiplication would remain.
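For reference, the `|` and `&` operators already give element-wise logical or/and on boolean arrays without any dtype change, so nothing would be lost:

```python
import numpy as np

a = np.array([True, False, True, False])
b = np.array([True, True, False, False])

print(a | b)  # [ True  True  True False]  -- logical or, dtype stays bool
print(a & b)  # [ True False False False]  -- logical and, dtype stays bool
print((a | b).dtype, (a & b).dtype)
```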
For context, I ran into an issue with this discrepancy in behavior while
working on an image segmentation problem. For binary segmentation problems, we
make use of boolean arrays to represent where an object is (the locations in
the array which are "True" correspond to the foreground/object-of-interest,
"False" corresponds to the background). I was aggregating multiple binary
segmentation arrays to do a majority vote with an implementation that boiled
down to the following:
>>> pred1, pred2, ..., predN = (np.array(..., dtype=np.bool_),
...                             np.array(..., dtype=np.bool_),
...                             ..., np.array(..., dtype=np.bool_))
>>> aggregate = (pred1 + pred2 + ... + predN) / N
>>> agg_pred = aggregate >= 0.5
This returned (1.0 / N) at every index where at least one prediction was True.
I had assumed the arrays would be promoted to integers (False -> 0, True -> 1)
and summed, so that agg_pred would hold the majority-vote result. Instead,
agg_pred was all False, because the maximum value of aggregate was (1.0 / N),
which is below 0.5 for N > 2.
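A sketch of how the aggregation behaves once the masks are cast out of bool (the random arrays here are made-up stand-ins for the real predictions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical binary segmentation masks (True = foreground).
preds = [rng.random((4, 4)) > 0.5 for _ in range(3)]
N = len(preds)

# Casting to an integer dtype first makes + behave as ordinary addition,
# so each pixel holds the number of models that voted "foreground".
votes = sum(p.astype(np.int64) for p in preds)

# Majority vote: foreground where at least half the models agree.
agg_pred = (votes / N) >= 0.5

# Equivalent, and arguably clearer: np.mean promotes to float itself.
agg_pred2 = np.mean(np.stack(preds), axis=0) >= 0.5

assert np.array_equal(agg_pred, agg_pred2)
```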
My current workaround is to remind myself of this discrepancy: I import
"builtins" from the standard library and annotate the relevant functions and
variables with "builtins.bool" to explicitly distinguish them from np.bool_
where applicable, and I add checks and/or conversions on top of that. But why
not make np.bool_ act like the built-in bool under addition and
multiplication, and let users rely on the already existing | and & operators
for "logical or" and "logical and"?
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/