On Mon, Jun 27, 2011 at 8:53 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
> On Mon, Jun 27, 2011 at 12:44 PM, eat <e.antero.ta...@gmail.com> wrote:
>
>> Hi,
>>
>> On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
>>
>>> First I'd like to thank everyone for all the feedback you're providing,
>>> clearly this is an important topic to many people, and the discussion has
>>> helped clarify the ideas for me. I've renamed and updated the NEP, then
>>> placed it into the master NumPy repository so it has a more permanent home
>>> here:
>>>
>>> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>>>
>>> In the NEP, I've tried to address everything that was raised in the
>>> original thread and in Nathaniel's followup 'Concepts' thread. To deal with
>>> the issue of whether a mask is True or False for a missing value, I've
>>> removed the 'mask' attribute entirely, except for ufunc-like functions
>>> np.ismissing and np.isavail which return the two styles of masks. Here's a
>>> high level summary of how I'm thinking of the topic, and what I will
>>> implement:
>>>
>>> *Missing Data Abstraction*
>>>
>>> There appear to be two useful ways to think about missing data that are
>>> worth supporting.
>>>
>>> 1) Unknown yet existing data
>>> 2) Data that doesn't exist
>>>
>>> In 1), an NA value causes outputs to become NA except in a small number
>>> of exceptions such as boolean logic, and in 2), operations treat the data as
>>> if there were a smaller array without the NA values.
>>>
>>> *Temporarily Ignoring Data*
>>>
>>> In some cases, it is useful to flag data as NA temporarily, possibly in
>>> several different ways, for particular calculations or testing out different
>>> ways of throwing away outliers. This is independent of the missing data
>>> abstraction, still requiring a choice of 1) or 2) above.
>>>
>>> *Implementation Techniques*
>>>
>>> There are two mechanisms generally used to implement missing data
>>> abstractions,
>>>
>>> 1) An NA bit pattern
>>> 2) A mask
>>>
>>> I've described a design in the NEP which can include both techniques
>>> using the same interface. The mask approach is strictly more general than
>>> the NA bit pattern approach, except for a few things like the idea of
>>> supporting the dtype 'NA[f8,InfNan]' which you can read about in the NEP.
>>>
>>> My intention is to implement the mask-based design, and possibly also
>>> implement the NA bit pattern design, but if anything gets cut it will be the
>>> NA bit patterns.
>>>
>>> Thanks again for all your input so far, and thanks in advance for your
>>> suggestions for improving this new revision of the NEP.
>>>
>> A very impressive PEP indeed.
>>
> Hi,
>
>> However, how would corner cases, like
>>
>> >>> a = np.array([np.NA, np.NA], dtype='f8', masked=True)
>> >>> np.mean(a, skipna=True)
>>
> This should be equivalent to removing all the NA values, then calling
> mean, like this:
>
> >>> b = np.array([], dtype='f8')
> >>> np.mean(b)
> /home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2374:
> RuntimeWarning: invalid value encountered in double_scalars
>   return mean(axis, dtype, out)
> nan
>
>> >>> np.mean(a)
>>
> This would return NA, since NA values are sitting in positions that would
> affect the output result.
>
OK.
>
>> be handled?
>>
>> My concern here is that there always seems to be such corner cases which
>> can only be handled with specific context knowledge. Thus producing 100%
>> generic code to handle 'missing data' is not doable.
>>
>
> Working out the corner cases for the functions that are already in numpy
> seems tractable to me. How, or whether, to support missing data is
> something the author of each new function will have to consider once missing
> data support is in NumPy, but I don't think we can do more than provide the
> mechanisms for people to use.
>
Sure. I'll go along with this and wait until I have something tangible that
outperforms the 'traditional' NaN handling.

- eat
>
> -Mark
>
>
>> Thanks,
>> - eat
>>
>>>
>>> -Mark
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
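For readers following the corner cases above, here is a rough sketch of the two abstractions (NA propagates vs. NA is skipped) using only what exists in released NumPy. It is an approximation under stated assumptions: NaN stands in for NA, np.nanmean stands in for the proposed skipna=True, and numpy.ma stands in for the mask-based design. The np.NA, masked=True and skipna= spellings quoted in the thread are the NEP's proposal and are deliberately not used here.

```python
import warnings

import numpy as np

# Assumption: NaN is used as a stand-in for the proposed NA value.
values = np.array([1.0, np.nan, 3.0])

# Abstraction 1, "unknown yet existing data": the unknown value propagates.
propagated = np.mean(values)

# Abstraction 2, "data that doesn't exist": act as if the array were smaller.
skipped = np.nanmean(values)

# Mask-based variant with numpy.ma, where True marks a missing element.
masked = np.ma.masked_array([1.0, 0.0, 3.0], mask=[False, True, False])
masked_mean = masked.mean()

# Corner case from the thread: every element is missing.
all_missing = np.array([np.nan, np.nan])
with warnings.catch_warnings():
    # Both calls below emit a RuntimeWarning about an empty slice.
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_mean = np.mean(np.array([], dtype="f8"))
    all_skipped = np.nanmean(all_missing)

# numpy.ma reports a fully masked input as the masked constant instead of nan.
all_masked_mean = np.ma.masked_array([0.0, 0.0], mask=True).mean()

print(propagated, skipped, masked_mean)
print(empty_mean, all_skipped, all_masked_mean)
```

In this sketch the skip-everything case collapses to nan, the same as the mean of an empty array, while numpy.ma keeps the "missing" status in its result. That difference is one example of the context-dependent choices raised in the thread.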