Re: [Numpy-discussion] in the NA discussion, what can we agree on?

Gary Strangman Fri, 04 Nov 2011 08:20:04 -0700


On Fri, 4 Nov 2011, Benjamin Root wrote:

On Friday, November 4, 2011, Gary Strangman <[email protected]> wrote:
>
>> > non-destructive+propagating -- it really depends on exactly what
>> > computations you want to perform, and how you expect them to work. The
>> > main difference is how reduction operations are treated. I kind of
>> > feel like the non-propagating version makes more sense overall, but I
>> > don't know if there's any consensus on that.
>>
>> I think this is further evidence for my idea that a mask should not be
>> undone, but is non-destructive.  If you want to be able to access the
>> values after masking, have a view, or only apply the mask to a view.
>
> OK, so my understanding of what's meant by propagating is probably
> incomplete (and is definitely still fuzzy). I'm a little confused by the
> phrase "a mask should not be undone" though. Say I want to perform a
> statistical analysis or filtering procedure excluding and (separately)
> including a handful of outliers? Isn't that a natural case for undoing a
> mask? Or did you mean something else?
>
> I think I understand the "use a view" option above, though I don't see how
> one could apply a mask only to a view. What if my view is every other row in
> a 2D array, and I want to mask the last half of this view? What is the state
> of the original array once the mask has been applied?
>
> (If this is derailing the progress of this thread, feel free to ignore it.)
>
> -best
> Gary

Ufuncs can be broadly categorized into element-wise operations (binary ops
like +, *, etc., as well as regular functions that return an array whose
shape matches the inputs broadcast together) and reduction operations (sum,
min, mean, etc.).
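To make the two categories concrete, here is a minimal sketch with plain NumPy arrays and no missing values (the arrays are made up for illustration): the same ufunc, `np.add`, is used element-wise with broadcasting and then as a reduction via `np.add.reduce`.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([10, 20, 30])  # shape (3,), broadcast against each row of a

# Element-wise: the output shape is the broadcast shape of the inputs.
elementwise = np.add(a, b)  # same as a + b
print(elementwise.shape)    # (2, 3)

# Reduction: the same ufunc collapses an axis instead.
column_sums = np.add.reduce(a, axis=0)  # same as a.sum(axis=0)
total = np.add.reduce(a, axis=None)     # reduce over every axis
print(column_sums, total)
```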

For element-wise operations, the behavior of IGNORE is a bit murky, and I
defer to Mark's NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id17
That section probably should be expanded and clarified, though.

For reduction ops, propagation means that sum([3, 5, NA, 6]) == NA, just as
if you had a NaN in the array. Non-propagating (also called skipping or
ignoring) would have that operation produce 14.  A mean() in the propagating
case would be NA, but 14/3 (about 4.667) in the non-propagating case.
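A rough illustration of the difference, assuming NaN and `numpy.ma` as stand-ins for the NEP's NA (neither is the NA proposal itself): plain `np.sum`/`np.mean` propagate a NaN, while `np.nansum`/`np.nanmean` and masked-array reductions skip the missing element.

```python
import numpy as np

# Propagating: NaN stands in for NA, and plain reductions propagate it.
x = np.array([3.0, 5.0, np.nan, 6.0])
prop_sum = np.sum(x)    # nan
prop_mean = np.mean(x)  # nan

# Non-propagating (skipping): the nan-aware reductions ignore the NaN.
skip_sum = np.nansum(x)    # 14.0
skip_mean = np.nanmean(x)  # 14/3

# The same skipping behavior with a mask instead of a sentinel value;
# numpy.ma reductions skip masked elements.
m = np.ma.masked_array([3.0, 5.0, 0.0, 6.0], mask=[False, False, True, False])
ma_sum = m.sum()    # 14.0
ma_mean = m.mean()  # 14/3

print(prop_sum, prop_mean, skip_sum, skip_mean, ma_sum, ma_mean)
```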

The point about undoing a mask addresses the case where an operation
produces a new array that has ignored elements in it: those elements were
never initialized with any value at all.  Therefore, "unmasking" those
elements and accessing their values makes no sense. This and more is covered
in this section of the NEP:
https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id11

For your stated case, I would have two views of the data (or at least the
original data and a view of it).  On the view, I would apply a mask to hide
the outliers from the filtering operation and produce a result.  The other
view (or the original array) sees the same data as it did before the masked
view took on its mask, so you can perform the filtering operation on that
data too and have two separate results. You can keep the masked view for
subsequent calculations, and/or keep the original view, and/or create new
views with new masks for other analyses, all while keeping the original data
intact.
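A minimal sketch of the "two views" idea, using `numpy.ma` as an approximation (it is not the mask API proposed in the NEP); the data and the outlier thresholds are made up for illustration. Each masked array shares the original buffer but carries its own mask, so the original data stays intact and each analysis gets its own result.

```python
import numpy as np

data = np.array([2.0, 3.0, 4.0, 3.0, 50.0, 2.0])  # 50.0 plays the outlier

# copy=False (also the default) lets the masked array reuse data's buffer
# rather than copying it; the mask is stored separately.
masked = np.ma.masked_array(data, mask=data > 10, copy=False)

with_outliers = data.mean()       # the original array still sees every value
without_outliers = masked.mean()  # the masked view skips the outlier

# A second, independent mask over the same data for a different analysis.
no_small = np.ma.masked_array(data, mask=data < 3, copy=False)

print(np.shares_memory(masked.data, data))  # True: no copy was made
print(with_outliers, without_outliers, no_small.mean())
```

Because neither mask writes into `data`, the outlier value is still available to any other view; dropping the masked array is all it takes to "go back".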

Note that I am speaking of views here in a somewhat abstract sense that is
only loosely tied to numpy's current behavior with respect to views.  As for
ndarray.view() specifically, that is an implementation detail that probably
shouldn't be in this thread yet, so don't hang too much on it.

Thanks Ben. That's quite helpful. And it also points to my worry (sorry, I already knew enough about views to be dangerous) ... your "conceptual" version of views is great, but I don't think numpy fully and reliably follows it (occasionally giving copies instead of views, for example, when a view is particularly difficult to generate). So I worry that your notion of views will actually collide with core numpy view implementations. But like you said, perhaps this thread shouldn't go there (yet).

Given I'm still fuzzy on all the distinctions, perhaps someone could try to help me (and others?) define all /4/ logical possibilities ... some may be obvious dead-ends. I'll take a stab at them, but these should definitely get edited by others:

destructive + propagating = the data point is truly missing (satellite fell into the ocean; dog ate my source datasheet, or whatever), this is the nature of that data point, such missingness should be replicated in elementwise operations, and the missingness SHOULD interfere with reduction operations that involve that datapoint (np.sum([1,MISSING])=MISSING)

destructive + non-propagating = the data point is truly missing, this is the nature of that data point, such missingness should be replicated in elementwise operations, but such missingness should NOT interfere with reduction operations that involve that datapoint (np.sum([1,MISSING])=1)

non-destructive + propagating = I want to ignore this datapoint for now; element-wise operations should replicate this "ignore" designation, and missingness of this type SHOULD interfere with reduction operations that involve this datapoint (np.sum([1,IGNORE])=IGNORE)

non-destructive + non-propagating = I want to ignore this datapoint for now; element-wise operations should replicate this "ignore" designation, but missingness of this type SHOULD NOT interfere with reduction operations that involve this datapoint (np.sum([1,IGNORE])=1)
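One way to approximate the four combinations with existing NumPy tools, offered only as a sketch: NaN stands in for a destructive MISSING and a `numpy.ma` mask for a non-destructive IGNORE. `numpy.ma` reductions skip masked values, so the non-destructive + propagating case is emulated with a hypothetical rule (any masked input masks the whole result); that rule is not a built-in `numpy.ma` mode.

```python
import numpy as np

values = np.array([1.0, 99.0])  # 99.0 is the value that goes missing/ignored

# Destructive: the value itself is overwritten (NaN stands in for MISSING).
destroyed = values.copy()
destroyed[1] = np.nan
print(destroyed + 1)           # element-wise: missingness is replicated
d_prop = np.sum(destroyed)     # destructive + propagating     -> nan
d_skip = np.nansum(destroyed)  # destructive + non-propagating -> 1.0

# Non-destructive: a mask hides the value, but 99.0 survives underneath.
ignored = np.ma.masked_array(values, mask=[False, True])
print((ignored + 1).mask)      # element-wise: "ignore" is replicated
# Hypothetical propagating rule: any masked input masks the whole result.
nd_prop = np.ma.masked if ignored.mask.any() else ignored.sum()
nd_skip = ignored.sum()        # non-destructive + non-propagating -> 1.0
print(ignored.data)            # the original values are still there
```

The destructive cases had to copy `values` first: once the NaN is written, the 99.0 is gone, which is exactly what separates the destructive rows from the non-destructive ones.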


Comments?

-best
Gary



_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
