[Numpy-discussion] Re: np.bool_ vs Python bool behavior
On Sat, Mar 12, 2022 at 4:53 PM Jacob Reinhold wrote:

> A pain point I ran into a while ago was assuming that an np.ndarray with
> dtype=np.bool_ would act similarly to the Python built-in boolean under
> addition. This is not the case, as shown in the following code snippet:
>
> >>> np.bool_(True) + True
> True
> >>> True + True
> 2
>
> In fact, I'm somewhat confused about all the arithmetic operations on
> boolean arrays:
>
> >>> np.bool_(True) * True
> True
> >>> np.bool_(True) / True
> 1.0
> >>> np.bool_(True) - True
> TypeError: numpy boolean subtract, the `-` operator, is not supported, use
> the bitwise_xor, the `^` operator, or the logical_xor function instead.
> >>> for x, y in ((False, False), (False, True), (True, False), (True, True)): print(np.bool_(x) ** y, end=" ")
> 1 0 1 1
>
> I get that addition corresponds to "logical or" and multiplication
> corresponds to "logical and", but I'm lost on the division and
> exponentiation operations given that addition and multiplication don't
> promote the dtype to integers or floats.
>
> If arrays stubbornly refused to ever change type or interact with objects
> of a different type under addition, that'd be one thing, but they do change:
>
> >>> np.uint8(0) - 1
> -1
> >>> (np.uint8(0) - 1).dtype
> dtype('int64')
> >>> (np.uint8(0) + 0.1).dtype
> dtype('float64')
>
> This dtype change can also be seen in the division and exponentiation
> above for np.bool_.
>
> Why the discrepancy in behavior for np.bool_? And why are arithmetic
> operations for np.bool_ inconsistently promoted to other data types?
>
> If all arithmetic operations on np.bool_ resulted in integers, that would
> be consistent (so easier to work with) and wouldn't restrict expressiveness
> because there are also "logical or" (|) and "logical and" (&) operations
> available.
> Alternatively, division and exponentiation could throw errors
> like subtract, but the discrepancy between np.bool_ and the Python built-in
> bool for addition and multiplication would remain.
>
> For context, I ran into an issue with this discrepancy in behavior while
> working on an image segmentation problem. For binary segmentation problems,
> we make use of boolean arrays to represent where an object is (the
> locations in the array which are "True" correspond to the
> foreground/object-of-interest, "False" corresponds to the background). I
> was aggregating multiple binary segmentation arrays to do a majority vote
> with an implementation that boiled down to the following:
>
> >>> pred1, pred2, ..., predN = np.array(..., dtype=np.bool_), np.array(..., dtype=np.bool_), ..., np.array(..., dtype=np.bool_)
> >>> aggregate = (pred1 + pred2 + ... + predN) / N
> >>> agg_pred = aggregate >= 0.5
>
> Which returned (1.0 / N) in all indices which had at least one "True"
> value in a prediction. I assumed that the arrays would be promoted to
> integers (False -> 0; True -> 1) and added so that agg_pred would hold the
> majority vote result. But agg_pred was always empty because the maximum
> value was (1.0 / N) for N > 2.
>
> My current "work around" is to remind myself of this discrepancy by
> importing "builtins" from the standard library and annotating the relevant
> functions and variables as using the "builtins.bool" to explicitly
> distinguish it from np.bool_ behavior where applicable, and add checks
> and/or conversions on top of that. But why not make np.bool_ act like the
> built-in bool under addition and multiplication and let users use the
> already existing | and & operations for "logical or" and "logical and"?

NumPy bool_ is a type and is only represented by the values (0, 1), with
the "+" and "*" operators overloaded to be "or" and "and", respectively.
The Python bool, which came later, is pretty much just an integer, as that
was backward compatible.
So you end up with things like

In [20]: type(np.bool_(1) + np.bool_(1))  # "+" is the "or" operator
Out[20]: np.bool_

In [21]: type(bool(1) + bool(1))  # "+" is integer addition
Out[21]: int

In [22]: type(np.bool_(1) * np.bool_(1))  # "*" is the "and" operator
Out[22]: np.bool_

In [23]: type(bool(1) * bool(1))  # "*" is integer multiplication
Out[23]: int

Numpy bool_ will be promoted to int when combined with Python ints.

Chuck
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
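
The majority vote from the original question can be sketched as follows. This is only an illustration under assumptions: the three small masks and the names `preds`, `or_result`, `votes`, and `majority` are hypothetical stand-ins for pred1 ... predN, and summing a stacked array is one possible way to count votes as integers, not necessarily the approach anyone in this thread used.

```python
import numpy as np

# Three hypothetical binary masks standing in for pred1 ... predN.
preds = [
    np.array([True, True, False, False]),
    np.array([True, False, True, False]),
    np.array([True, True, False, False]),
]

# "+" on boolean arrays is logical OR, so the result stays boolean
# and cannot be used to count votes.
or_result = preds[0] + preds[1] + preds[2]

# Summing along the stacking axis counts the True votes as integers.
votes = np.stack(preds).sum(axis=0)

# A location is foreground when at least half of the predictions agree.
majority = votes / len(preds) >= 0.5

print(or_result)
print(votes)
print(majority)
```

Casting each mask explicitly (for example with `astype(int)`) before adding would be another option; the reduction above simply avoids relying on how "+" behaves for np.bool_.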
[Numpy-discussion] Re: np.bool_ vs Python bool behavior
On Sun, Mar 13, 2022 at 10:31 AM Charles R Harris wrote:

> NumPy bool_ is a type and is only represented by the values (0, 1), with
> the "+" and "*" operators overloaded to be "or" and "and", respectively.
> The Python bool, which came later, is pretty much just an integer, as that
> was backward compatible. So you end up with things like
>
> In [20]: type(np.bool_(1) + np.bool_(1))  # "+" is the "or" operator
> Out[20]: np.bool_
>
> In [21]: type(bool(1) + bool(1))  # "+" is integer addition
> Out[21]: int
>
> In [22]: type(np.bool_(1) * np.bool_(1))  # "*" is the "and" operator
> Out[22]: np.bool_
>
> In [23]: type(bool(1) * bool(1))  # "*" is integer multiplication
> Out[23]: int
>
> Numpy bool_ will be promoted to int when combined with Python ints.

The non-logical operators convert np.bool_ to numbers, with the exception
of "-", which also used to be overloaded as a logical operator. We raised
an error when we changed that so that people could adjust their code and
use "^" instead. Long term it might make sense to reintroduce "-" with
integer promotion.

Chuck
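
The split between logical and non-logical operators can be checked with a short sketch. The arrays `a`, `b`, and `ones` and the other variable names are hypothetical; the dtypes in the comments follow what is described in this thread and may differ in other NumPy versions.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
ones = np.ones(4, dtype=np.bool_)  # all-True divisor, avoids division by zero

# Logical operators: the result stays boolean.
logical_or = a + b    # same truth table as a | b
logical_and = a * b   # same truth table as a & b
xor = a ^ b           # the suggested replacement for "-"

# Non-logical operators: booleans are converted to numbers first.
ratio = a / ones                  # true division yields float64
floor = np.floor_divide(a, ones)  # falls through to the int8 implementation

# Subtraction is refused explicitly instead of being promoted.
try:
    np.subtract(a, b)
    subtract_refused = False
except TypeError:
    subtract_refused = True

print(logical_or.dtype, logical_and.dtype, xor.dtype)
print(ratio.dtype, floor.dtype, subtract_refused)
```

Inspecting `np.add.types` shows the `??->?` signature, which is how one can tell that a ufunc has a dedicated boolean implementation.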
[Numpy-discussion] Re: np.bool_ vs Python bool behavior
Hi Jacob,

adding to what Chuck mentioned, a few inline comments if you are
interested in some gory details.

On Sat, 2022-03-12 at 21:40 +, Jacob Reinhold wrote:
> A pain point I ran into a while ago was assuming that an np.ndarray
> with dtype=np.bool_ would act similarly to the Python built-in
> boolean under addition. This is not the case, as shown in the
> following code snippet:
>
> >>> np.bool_(True) + True
> True
> >>> True + True
> 2
>
> In fact, I'm somewhat confused about all the arithmetic operations on
> boolean arrays:
>
> >>> np.bool_(True) * True
> True
> >>> np.bool_(True) / True
> 1.0
> >>> np.bool_(True) - True
> TypeError: numpy boolean subtract, the `-` operator, is not
> supported, use the bitwise_xor, the `^` operator, or the logical_xor
> function instead.
> >>> for x, y in ((False, False), (False, True), (True, False), (True, True)): print(np.bool_(x) ** y, end=" ")
> 1 0 1 1
>
> I get that addition corresponds to "logical or" and multiplication
> corresponds to "logical and", but I'm lost on the division and
> exponentiation operations given that addition and multiplication
> don't promote the dtype to integers or floats.

I doubt this is historically intentional, or at least these were choices
made fairly pragmatically 10-20 years ago. But gaining the momentum to
change is hard. Although, we did disable `bool - bool`, because it was
particularly ill defined.

If you are interested in the guts of it, there are three types of
behaviors:

1. Functions that are explicitly defined for bool; `add` and `multiply`
   are examples. (Check `np.add.types` for ufuncs.)
2. Functions which probably never had a conscious decision made, but
   do not have a bool implementation. These will usually end up using
   `int8` (e.g. `floor_divide`).
3. A few functions are more explicit. Subtraction refuses booleans,
   division uses float64 (although int8/int8 -> float64, so that is not
   very special).
The reason is that if there is no boolean implementation, by default the
"next" implementation (e.g. the `int8` one) will be used, which leads to
behavior 2. To get an error (e.g. for subtract) we have to refuse it
explicitly (behavior 3), which is both complicated and easy to forget.

N.B.: I have changed that logic. "Future" ufuncs are now reversed. They
will default to an error rather than using the `int8` implementation.
That should make change easier, but doesn't really solve the problem at
hand...

> If arrays stubbornly refused to ever change type or interact with
> objects of a different type under addition, that'd be one thing, but
> they do change:
>
> >>> np.uint8(0) - 1
> -1
> >>> (np.uint8(0) - 1).dtype
> dtype('int64')
> >>> (np.uint8(0) + 0.1).dtype
> dtype('float64')
>
> This dtype change can also be seen in the division and exponentiation
> above for np.bool_.

This has a subtly different reason: it is due to "value-based promotion"
and how it works. How NumPy interprets the `1` depends on the context!
We use a "weak" (but value-inspecting) logic if the other operand is an
_array_:

np.array([0, 1, 2], dtype=np.uint8) - 1
# array([255, 0, 1], dtype=uint8)

Where the value-inspecting part kicks in for:

np.array([0, 1, 2], dtype=np.uint8) + 300
# Will go to uint16

But, when the other object is a NumPy scalar or a 0-D array, we do not
use that logic currently. We instead do:

np.array(0, dtype=np.uint8) - 1
=> np.array(0, dtype=np.uint8) - np.asarray(1)
=> np.array(0, dtype=np.uint8) - np.array(1, dtype=np.int64)

And that gives you the default integer (usually int64)! We are
considering changing it, but it is a big change I am actively working on:

https://github.com/numpy/numpy/pull/21103
https://discuss.scientific-python.org/t/poll-future-numpy-behavior-when-mixing-arrays-numpy-scalars-and-python-scalars/202

> Why the discrepancy in behavior for np.bool_?
> And why are arithmetic operations for np.bool_ inconsistently promoted
> to other data types?
>
> If all arithmetic operations on np.bool_ resulted in integers, that
> would be consistent (so easier to work with) and wouldn't restrict
> expressiveness because there are also "logical or" (|) and "logical
> and" (&) operations available. Alternatively, division and
> exponentiation could throw errors like subtract, but the discrepancy
> between np.bool_ and the Python built-in bool for addition and
> multiplication would remain.

I am not sure anyone ever seriously tried to change this. In general, we
would probably have to take this pretty slowly, similar to what Chuck
said about subtraction:

1. Make it an error (subtraction is there)
2. Switch (potentially with a warning first) to making it an integer

Or we just stay with errors of course.

In general, I like the idea of doing something about this, so we should
discuss this! But, I do suspect in the end we would have to formalize a
proposal. And some users are bound to be disappointed t