On Thu, Jun 23, 2011 at 4:19 PM, Nathaniel Smith <[email protected]> wrote:
> I'd like to see a statement of what the "missing data problem" is, and
> how this solves it? Because I don't think this is entirely intuitive,
> or that everyone necessarily has the same idea.

I agree it represents different problems in different contexts. For NumPy, I think the mechanism for dealing with it needs to be intuitive to work with in as many contexts as possible, avoiding surprises. Getting feedback from a broad range of people is the only way a general solution can be designed with any level of confidence.

> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate as
> if the values weren't there
>
> For context: My experience with missing data is in statistical
> analysis; I find R's NA support to be pretty awesome for those
> purposes. The conceptual model it's based on is that an NA value is
> some number that we just happen not to know. So from this perspective,
> I find it pretty confusing that adding an unknown quantity to 3 should
> result in 3, rather than another unknown quantity. (Obviously it
> should be possible to compute the sum of the known values, but IME
> it's important for the default behavior to be to fail loudly when
> things are wonky, not to silently patch them up, possibly
> incorrectly!)

The conceptual model you describe sounds reasonable to me, and I definitely like the idea of consistently following one such model for all default behaviors.

> Also, what should 'dot' do with missing values?

A matrix multiplication is defined in terms of sums of products, so it can be implemented to behave consistently with your conceptual model.

> -- Nathaniel
>
> On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe <[email protected]> wrote:
> > Enthought has asked me to look into the "missing data" problem and how NumPy
> > could treat it better. I've considered the different ideas of adding dtype
> > variants with a special signal value and masked arrays, and concluded that
> > adding masks to the core ndarray appears to be the best way to deal with the
> > problem in general.
> >
> > I've written a NEP that proposes a particular design, viewable here:
> >
> > https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
> >
> > There are some questions at the bottom of the NEP which definitely need
> > discussion to find the best design choices. Please read, and let me know of
> > all the errors and gaps you find in the document.
> >
> > Thanks,
> > Mark
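
To make the two reduction behaviors discussed in this thread concrete, here is a minimal pure-Python sketch. It is only an illustration under stated assumptions: the names NA, na_sum, na_mul, na_dot and the skipna flag are hypothetical and are not taken from NumPy or from the NEP. It shows the "NA is a number we happen not to know" model, where a sum propagates NA by default and skips it only on request, and a 'dot' built from sums of products that inherits the same behavior.

```python
# Hypothetical sketch: NA is a stand-in sentinel, not a NumPy API.
NA = None  # marker for "a number we happen not to know"


def na_sum(values, skipna=False):
    """Sum values; propagate NA by default, skip it only when asked."""
    total = 0
    for v in values:
        if v is NA:
            if skipna:
                continue  # explicit opt-in: sum only the known values
            return NA     # default: unknown + anything is unknown
        total += v
    return total


def na_mul(a, b):
    """Product of two scalars; NA if either factor is unknown."""
    return NA if a is NA or b is NA else a * b


def na_dot(A, B):
    """Matrix product as sums of products, using the propagating na_sum."""
    cols = list(zip(*B))  # columns of B
    return [[na_sum(na_mul(a, b) for a, b in zip(row, col)) for col in cols]
            for row in A]


print(na_sum([3, NA]))               # NA propagates
print(na_sum([3, NA], skipna=True))  # only the known values are summed
print(na_dot([[1, NA], [2, 3]], [[1, 0], [0, 1]]))
```

In this sketch any row of the left operand that contains an NA yields NA in every entry of the corresponding output row, while rows without missing values are unaffected. Whether NA times zero should stay NA is a design choice; the sketch assumes it does.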
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
