Hi,

On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
> First I'd like to thank everyone for all the feedback you're providing,
> clearly this is an important topic to many people, and the discussion has
> helped clarify the ideas for me. I've renamed and updated the NEP, then
> placed it into the master NumPy repository so it has a more permanent home
> here:
>
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>
> In the NEP, I've tried to address everything that was raised in the
> original thread and in Nathaniel's followup 'Concepts' thread. To deal with
> the issue of whether a mask is True or False for a missing value, I've
> removed the 'mask' attribute entirely, except for ufunc-like functions
> np.ismissing and np.isavail which return the two styles of masks. Here's a
> high level summary of how I'm thinking of the topic, and what I will
> implement:
>
> *Missing Data Abstraction*
>
> There appear to be two useful ways to think about missing data that are
> worth supporting.
>
> 1) Unknown yet existing data
> 2) Data that doesn't exist
>
> In 1), an NA value causes outputs to become NA except in a small number of
> exceptions such as boolean logic, and in 2), operations treat the data as if
> there were a smaller array without the NA values.
>
> *Temporarily Ignoring Data*
>
> In some cases, it is useful to flag data as NA temporarily, possibly in
> several different ways, for particular calculations or testing out different
> ways of throwing away outliers. This is independent of the missing data
> abstraction, still requiring a choice of 1) or 2) above.
>
> *Implementation Techniques*
>
> There are two mechanisms generally used to implement missing data
> abstractions,
>
> 1) An NA bit pattern
> 2) A mask
>
> I've described a design in the NEP which can include both techniques using
> the same interface. The mask approach is strictly more general than the NA
> bit pattern approach, except for a few things like the idea of supporting
> the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
>
> My intention is to implement the mask-based design, and possibly also
> implement the NA bit pattern design, but if anything gets cut it will be the
> NA bit patterns.
>
> Thanks again for all your input so far, and thanks in advance for your
> suggestions for improving this new revision of the NEP.
>

A very impressive NEP indeed.

However, how would corner cases like

    >>> a = np.array([np.NA, np.NA], dtype='f8', masked=True)
    >>> np.mean(a, skipna=True)
    >>> np.mean(a)

be handled?

My concern here is that there always seem to be corner cases like this,
which can only be handled with context-specific knowledge. Thus, producing
100% generic code to handle 'missing data' is not doable.

Thanks,
- eat

>
> -Mark
>
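For comparison, here is roughly how the same all-missing corner case plays out with tools that exist in NumPy today. This is only a sketch under stated assumptions: np.NA, masked=True and skipna=True are proposals from the NEP and are not used here; numpy.ma masked arrays and NaN values stand in for them, so the results say nothing definitive about what the NEP's implementation would return.

```python
import warnings
import numpy as np

# All-missing input, modelled two ways with existing NumPy features.
# (Stand-ins only: the NEP's np.NA / masked=True are not used here.)
masked = np.ma.masked_all(2, dtype='f8')   # every element is masked
nans = np.array([np.nan, np.nan])          # NaN standing in for NA

# "Skip the missing values": nothing is left to average.
skip_ma = masked.mean()                    # numpy.ma gives the masked constant
with warnings.catch_warnings():
    # nanmean warns "Mean of empty slice" when every value is NaN
    warnings.simplefilter("ignore", RuntimeWarning)
    skip_nan = np.nanmean(nans)            # nan

# "Propagate the missing values": the result is simply unknown.
propagate = np.mean(nans)                  # nan

print(skip_ma is np.ma.masked, np.isnan(skip_nan), np.isnan(propagate))
```

Both existing approaches answer "still missing" for the skip case rather than raising an error or returning 0, which is one possible convention but not the only defensible one. Which answer is right depends on what the caller intends to do with the result.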
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion