On Fri, 4 Nov 2011, Benjamin Root wrote:
On Friday, November 4, 2011, Gary Strangman wrote:
>
>> > non-destructive+propagating -- it really depends on exactly what computations you want to perform, and how you expect them to work. The main difference is how reduction operations are treated. I kind of feel like the non-propagating version makes more sense overall, but I don't know if there's any consensus on that.
>>
>> I think this is further evidence for my idea that a mask should not be undone, but is non destructive. If you want to be able to access the values after masking, have a view, or only apply the mask to a view.
>
> OK, so my understanding of what's meant by propagating is probably incomplete (and is definitely still fuzzy). I'm a little confused by the phrase "a mask should not be undone" though. Say I want to perform a statistical analysis or filtering procedure excluding and (separately) including a handful of outliers? Isn't that a natural case for undoing a mask? Or did you mean something else?
>
> I think I understand the "use a view" option above, though I don't see how one could apply a mask only to a view. What if my view is every other row in a 2D array, and I want to mask the last half of this view? What is the state of the original array once the mask has been applied?
>
> (If this is derailing the progress of this thread, feel free to ignore it.)
>
> -best
> Gary

Ufuncs can be broadly categorized as element-wise ops (binary ops like +, *, etc., as well as regular functions that return an array whose shape matches the inputs broadcast together) and reduction ops (sum, min, mean, etc.).

For element-wise ops, things are a bit murky for IGNORE, and I defer to Mark's NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id17, and it probably should be expanded and clarified in the NEP.

For reduction ops, propagation means that sum([3 5 NA 6]) == NA, just as if you had a NaN in the array. Non-propagating (or skipping, or ignoring) would have that operation produce 14. A mean() for the propagating case would be NA, but 4.6666 for non-propagating.

The part about undoing a mask addresses the case where an operation produces a new array that has ignored elements in it: those elements were never initialized with any value at all. Therefore, "unmasking" those elements and accessing their values makes no sense. This and more are covered in this section of the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst#id11

For your stated case, I would have two views of the data (or at least the original data and a view of it). For the view, I would apply the mask to hide the outliers from the filtering operation and produce a result. The first view (or the original array) sees the same data as it did before the other view took on a mask, so you can perform the filtering operation on the unmasked data as well and have two separate results. You can keep the masked view for subsequent calculations, and/or keep the original view, and/or create new views with new masks for other analyses, all while keeping the original data intact.

Note that I am right now speaking of views in a somewhat more abstract sense that is only loosely tied to numpy's specific behavior with respect to views right now. As for np.view() specifically, that is an implementation detail that probably shouldn't be in this thread yet, so don't hook too much onto it.
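For readers following along, here is a rough sketch of the reduction behaviour and the "masked view" workflow described above. It is only an analogy built from tools numpy already has: NaN stands in for NA, and numpy.ma stands in for the NEP's masks. The outlier threshold and the variable names are made up for illustration and do not come from the NEP.

```python
import numpy as np

# NaN used as a stand-in for NA (an assumption; the NEP's NA is a separate concept).
data = np.array([3.0, 5.0, np.nan, 6.0])

propagating_sum = np.sum(data)      # the NaN propagates through the reduction
skipping_sum = np.nansum(data)      # the NaN is skipped: 3 + 5 + 6
skipping_mean = np.nanmean(data)    # 14 / 3

# Outlier case: the original array is left untouched, and a masked
# array built on top of it hides the outlier (threshold is hypothetical).
values = np.array([1.0, 2.0, 3.0, 100.0])
masked = np.ma.masked_array(values, mask=values > 50)

mean_without_outlier = masked.mean()   # masked element is skipped
mean_with_outlier = values.mean()      # original data still fully available

print(propagating_sum, skipping_sum, skipping_mean)
print(mean_without_outlier, mean_with_outlier)
```

Both results come from the same underlying data, which is roughly the "two views, two separate results" idea, though numpy.ma's actual semantics may differ from whatever the NEP ends up specifying.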
Thanks Ben. That's quite helpful. And it also points to my worry (sorry, I already knew enough about views to be dangerous) ... your "conceptual" version of views is great, but I don't think numpy fully and reliably follows it (occasionally giving copies instead of views, for example, when a view is particularly difficult to generate). So I worry that your notion of views will actually collide with core numpy view implementations. But like you said, perhaps this thread shouldn't go there (yet).
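To make the copy-versus-view worry concrete, here is a small sketch using current numpy indexing (nothing from the NEP). Basic slicing returns a view, while indexing with a list of integers returns a copy, even though both select "every other row" here. The array and variable names are illustrative only.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

every_other_row = base[::2]     # basic slicing: a view onto base
picked_rows = base[[0, 2]]      # integer-array indexing: an independent copy

slice_shares = np.shares_memory(base, every_other_row)
fancy_shares = np.shares_memory(base, picked_rows)

every_other_row[0, 0] = -1      # writes through to base
picked_rows[0, 1] = -99         # does not touch base

print(slice_shares, fancy_shares)
print(base[0, 0], base[0, 1])
```

If a mask were attached to "a view", the two selections above would presumably behave differently with respect to the original array, which is the collision I am worried about.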
Given I'm still fuzzy on all the distinctions, perhaps someone could try to help me (and others?) to define all /4/ logical possibilities ... some may be obvious dead-ends. I'll take a stab at them, but these should definitely get edited by others:
destructive + propagating = the data point is truly missing (satellite fell into the ocean; dog ate my source datasheet, or whatever), this is the nature of that data point, such missingness should be replicated in elementwise operations, and the missingness SHOULD interfere with reduction operations that involve that datapoint (np.sum([1,MISSING])=MISSING)
destructive + non-propagating = the data point is truly missing, this is the nature of that data point, such missingness should be replicated in elementwise operations, but such missingness should NOT interfere with reduction operations that involve that datapoint (np.sum([1,MISSING])=1)
non-destructive + propagating = I want to ignore this datapoint for now; element-wise operations should replicate this "ignore" designation, and missingness of this type SHOULD interfere with reduction operations that involve this datapoint (np.sum([1,IGNORE])=IGNORE)
non-destructive + non-propagating = I want to ignore this datapoint for now; element-wise operations should replicate this "ignore" designation, but missingness of this type SHOULD NOT interfere with reduction operations that involve this datapoint (np.sum([1,IGNORE])=1)
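A rough way to try the four combinations with what numpy has today, strictly as an approximation: overwriting a value with NaN plays the role of "destructive", a numpy.ma mask plays the role of "non-destructive", and propagating_sum is a hypothetical helper written just for this sketch (numpy.ma reductions skip masked values by default).

```python
import numpy as np

raw = np.array([1.0, 2.0])

# Destructive: the original value is overwritten and cannot be recovered.
destroyed = raw.copy()
destroyed[1] = np.nan
destructive_propagating = np.sum(destroyed)     # NaN interferes with the sum
destructive_skipping = np.nansum(destroyed)     # NaN is skipped

# Non-destructive: the value is hidden by a mask but still stored.
ignored = np.ma.masked_array(raw, mask=[False, True])
nondestructive_skipping = ignored.sum()         # masked element is skipped


def propagating_sum(marr):
    """Hypothetical helper: any masked element makes the whole sum masked."""
    if np.ma.getmaskarray(marr).any():
        return np.ma.masked
    return marr.sum()


nondestructive_propagating = propagating_sum(ignored)

# Element-wise operations replicate the "ignore" designation.
shifted_mask = np.ma.getmaskarray(ignored + 1)

# The hidden value is still there underneath the mask.
recovered = ignored.data[1]

print(destructive_propagating, destructive_skipping)
print(nondestructive_propagating, nondestructive_skipping, recovered)
```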
Comments?

-best
Gary
