On Sat, Jun 25, 2011 at 1:05 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Sat, Jun 25, 2011 at 9:26 AM, Matthew Brett <matthew.br...@gmail.com>
> wrote:
>> So far I see the difference between 1) and 2) being that you cannot
>> unmask. So, if you didn't even know you could unmask data, then it
>> would not matter that 1) was being implemented by masks?
>
> I guess that is a difference, but I'm trying to get at something more
> fundamental -- not just what operations are allowed, but what
> operations people *expect* to be allowed. It seems like some of us
> have been talking past each other a lot, where someone says "but
> changing masks is the single most important feature!" and then someone
> else says "what are you talking about that doesn't even make sense".
>
>> To clarify, you're proposing for:
>>
>> a = np.sum(np.array([np.NA, np.NA]))
>>
>> 1) -> np.NA
>> 2) -> 0.0
>
> Yes -- and in R you actually do get NA, while in numpy.ma you
> actually do get 0. I don't think this is a coincidence; I think it's
> because they're designed as coherent systems that are trying to solve
> different problems. (Well, numpy.ma's "hardmask" idea seems inspired
> by the missing-data concept rather than the temporary-mask concept,
> but aside from that it seems pretty consistent in implementing option
> 2.)

Agree. My basic observation about numpy.ma is that it's a finely
crafted solution for a different set of problems than the ones I have.
I just don't want the same thing to happen here, leaving me stuck
writing code (like I am now) that looks like:

    mask = y.mask
    the_sum = y.sum(axis)
    the_count = (~mask).sum(axis)  # number of unmasked values
    the_sum[the_count == 0] = np.nan

> Here's another possible difference -- in (1), intuitively, missingness
> is a property of the data, so the logical place to put information
> about whether you can expect missing values is in the dtype, and to
> enable missing values you need to make a new array with a new dtype.
> (If we use a mask-based implementation, then
> np.asarray(nomissing_array, dtype=yesmissing_type) would still be able
> to skip making a copy of the data -- I'm talking ONLY about the
> interface here, not whether missing data has a different storage
> format from non-missing data.)
>
> In (2), the whole point is to use different masks with the same data,
> so I'd argue masking should be a property of the array object rather
> than the dtype, and the interface should logically allow masks to be
> created, modified, and destroyed in place.
>
> They're both internally consistent, but I think we might have to make
> a decision and stick to it.
>
>> I agree it's good to separate the API from the implementation. I
>> think the implementation is also important because I care about memory
>> and possibly speed. But, that is a separate problem from the API...
>
> Yes, absolutely memory and speed are important. But a really fast
> solution to the wrong problem isn't so useful either :-).
>
> -- Nathaniel
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
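
For reference, the masked-sum workaround from the reply above can be
sketched as a self-contained function. This is only an illustration
under a few assumptions: `sum_with_na` is a hypothetical helper name
(not part of NumPy), `y` is taken to be a numpy.ma masked array of
numeric data, and NaN stands in for the proposed NA value, which NumPy
does not provide.

```python
import numpy as np


def sum_with_na(y, axis=None):
    """Sum a masked array, giving NaN where every value along `axis` is masked.

    Hypothetical helper for illustration; not a NumPy function.
    """
    y = np.ma.asarray(y, dtype=float)
    # Fill masked entries with 0 so they do not contribute to the sum.
    the_sum = y.filled(0.0).sum(axis=axis)
    # Count the unmasked (valid) entries along the same axis.
    the_count = y.count(axis=axis)
    # Where no valid entries exist, report NaN instead of 0.
    return np.where(the_count == 0, np.nan, the_sum)


y = np.ma.array(
    [[1.0, 2.0], [3.0, 4.0]],
    mask=[[False, True], [True, True]],
)
row_sums = sum_with_na(y, axis=1)  # first row has one valid value, second has none
```

Here the first row sums its single unmasked value, while the fully
masked second row comes back as NaN rather than 0, which is the
behaviour of option 1) layered on top of an option 2) container.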