On Fri, Nov 4, 2011 at 2:29 PM, Nathaniel Smith <[email protected]> wrote:
> On Fri, Nov 4, 2011 at 1:22 PM, T J <[email protected]> wrote:
> > I agree that it would be ideal if the default were to skip IGNORED
> > values, but that behavior seems inconsistent with its propagation
> > properties (such as when adding arrays with IGNORED values). To
> > illustrate, when we did "x+2", we were stating that:
> >
> >     IGNORED(2) + 2 == IGNORED(4)
> >
> > which means that we propagated the IGNORED value. If we were to skip
> > them by default, then we'd have:
> >
> >     IGNORED(2) + 2 == 2
> >
> > To be consistent, then, it seems we also should have had:
> >
> > >>> x + 2
> > [3, 2, 5]
> >
> > which I think we can agree is not so desirable. What this seems to
> > come down to is that we tend to want different behavior when we are
> > doing reductions, and that for IGNORED data, we want it to propagate
> > in every situation except for a reduction (where we want to skip
> > over it).
> >
> > I don't know if there is a well-defined way to distinguish reductions
> > from the other operations. Would it hold for generalized ufuncs?
> > Would it hold for other functions which might return arrays instead
> > of scalars?
>
> Continuing my theme of looking for consensus first... there are
> obviously a ton of ugly corners in here. But my impression is that at
> least for some simple cases, it's clear what users want:
>
> >>> a = [1, IGNORED(2), 3]
> # array-with-ignored-values + unignored scalar only affects
> # unignored values
> >>> a + 2
> [3, IGNORED(2), 5]
> # reduction operations skip ignored values
> >>> np.sum(a)
> 4
>
> For example, Gary mentioned the common idiom of wanting to take an
> array and subtract off its mean, and he wants to do that while leaving
> the masked-out/ignored values unchanged. As long as the above cases
> work the way I wrote, we will have
>
> >>> np.mean(a)
> 2
> >>> a -= np.mean(a)
> >>> a
> [-1, IGNORED(2), 1]
>
> Which I'm pretty sure is the result that he wants. (Gary, is that
> right?) Also numpy.ma follows these rules, so that's some additional
> evidence that they're reasonable. (And I think part of the confusion
> between Lluís and me was that these are the rules that I meant when I
> said "non-propagating", but he understood that to mean something
> else.)
>
> So before we start exploring the whole vast space of possible ways to
> handle masked-out data, does anyone see any reason to consider rules
> that don't have, as a subset, the ones above? Do other rules have any
> use cases or user demand? (I *love* playing with clever mathematics
> and making things consistent, but there's not much point unless the
> end result is something that people will use :-).)

I guess I'm just confused about how one would, in principle,
distinguish the various forms of propagation that you are suggesting
(i.e., reductions versus everything else).

I also don't think it is good that we lack commutativity. If we
disallow unignoring, then yes, I agree that what you wrote above is
what people want. But if we are allowed to unignore, then I do not.

Also, how does something like this get handled?

>>> a = [1, 2, IGNORED(3), NaN]

If I were to ask, "What is the mean of 'a'?", then I think most of the
time people would want 1.5. I guess if we kept nanmean around, then we
could do:

>>> a -= np.nanmean(a)
>>> a
[-0.5, 0.5, IGNORED(3), NaN]

Sorry if this is considered digging deeper than consensus. I'm just
curious whether arrays having NaNs in them, in addition to IGNORED,
cause problems.
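
For what it's worth, the rules you listed can already be exercised
today with numpy.ma, using masked entries as a stand-in for IGNORED
(a minimal sketch; IGNORED is just proposed notation, not an existing
API):

>>> import numpy as np
>>> import numpy.ma as ma
>>> # masked stand-in for [1, IGNORED(2), 3]
>>> a = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
>>> print(a + 2)         # elementwise op: the masked entry stays masked
[3.0 -- 5.0]
>>> print(a.sum())       # reduction: the masked entry is skipped
4.0
>>> print(a.mean())
2.0
>>> print(a - a.mean())  # Gary's idiom: masked value left untouched
[-1.0 -- 1.0]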
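
And for the NaN case: with numpy.ma today, an unmasked NaN propagates
through a reduction, while ma.masked_invalid (which ORs the non-finite
positions into the existing mask) recovers the 1.5 answer, playing the
role that nanmean would (again, a sketch with numpy.ma standing in for
IGNORED):

>>> # masked stand-in for [1, 2, IGNORED(3), NaN]
>>> a = ma.masked_array([1.0, 2.0, 3.0, np.nan],
...                     mask=[False, False, True, False])
>>> print(a.mean())           # the unmasked NaN propagates
nan
>>> b = ma.masked_invalid(a)  # additionally mask the NaN
>>> print(b.mean())
1.5
>>> print(b - b.mean())
[-0.5 0.5 -- --]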
