On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith <[email protected]> wrote:
> On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
> <[email protected]> wrote:
>> Let me try to summarize. To begin with, the environment of the nan functions
>> is rather special.
>>
>> 1) if the array is not of inexact type, they punt to the non-nan
>> versions.
>> 2) if the array is of inexact type, then out and dtype must be inexact if
>> specified
>>
>> The second assumption guarantees that NaN can be used in the return values.
>
> The requirement on the 'out' dtype only exists because currently the
> nan functions like to return nan for things like empty arrays, right?
> If not for that, it could be relaxed? (it's a rather weird
> requirement, since the whole point of these functions is that they
> ignore nans, yet they don't always...)
>
>> sum and nansum
>>
>> These should be consistent so that empty sums are 0. This should cover the
>> empty array case, but will change the behaviour of nansum which currently
>> returns NaN if the array isn't empty but the slice is after NaN removal.
>
> I agree that returning 0 is the right behaviour, but we might need a
> FutureWarning period.
>
>> mean and nanmean
>>
>> In the case of empty arrays, an empty slice, this leads to 0/0. For Python
>> this is always a zero division error, for Numpy this raises a warning
>> and returns NaN for floats, 0 for integers.
>>
>> Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In
>> the special case where dtype=int, the NaN is cast to integer.
>>
>> Option1
>> 1) mean raise error on 0/0
>> 2) nanmean no warning, return NaN
>>
>> Option2
>> 1) mean raise warning, return NaN (current behavior)
>> 2) nanmean no warning, return NaN
>>
>> Option3
>> 1) mean raise warning, return NaN (current behavior)
>> 2) nanmean raise warning, return NaN
>
> I have mixed feelings about the whole np.seterr apparatus, but since
> it exists, shouldn't we use it for consistency? I.e., just do whatever
> numpy is set up to do with 0/0? (Which I think means, warn and return
> NaN by default, but this can be changed.)
>
>> var, std, nanvar, nanstd
>>
>> 1) if ddof > axis(axes) size, raise error, probably a program bug.
>> 2) If ddof=0, then whatever is the case for mean, nanmean
>>
>> For nanvar, nanstd it is possible that some slices are good, some bad, so
>>
>> option1
>> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice
>>
>> option2
>> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice
>
> I don't really have any intuition for these ddof cases. Just raising
> an error on negative effective dof is pretty defensible and might be
> the safest -- it's easy to turn an error into something sensible
> later if people come up with use cases...
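
For reference, the "whatever numpy is set up to do with 0/0" point quoted above can be sketched with np.errstate, the context-manager form of np.seterr. This is only an illustration of plain float division; the helper name zero_over_zero is hypothetical, and whether mean/nanmean should go through this mechanism is the open question in the thread, not something this snippet settles.

```python
import warnings

import numpy as np


def zero_over_zero(mode):
    """Compute float 0/0 under the given numpy error mode.

    Hypothetical helper for illustration only. Returns the result array
    and the number of warnings that were emitted while computing it.
    """
    zero = np.array([0.0])
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        # Float 0/0 sets the 'invalid' floating-point flag.
        with np.errstate(invalid=mode):
            result = zero / zero
    return result, len(caught)


if __name__ == "__main__":
    for mode in ("warn", "ignore"):
        result, n_warnings = zero_over_zero(mode)
        print(mode, result, n_warnings)

    try:
        zero_over_zero("raise")
    except FloatingPointError as exc:
        print("raise ->", type(exc).__name__)
```

With invalid="warn" (the default) the division returns NaN and warns, with "ignore" it returns NaN silently, and with "raise" it raises FloatingPointError, so a caller who wants Option1-style errors can already opt in for ordinary division.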

Related: why does reduceat not have empty slices?

>>> np.add.reduceat(np.arange(8),[0,4, 5, 7,7])
array([ 6,  4, 11,  7,  7])

I'm in favor of returning nans instead of raising exceptions, except
if the return type is int and we cannot cast nan to int.

If we get functions into numpy that know how to handle nans, then it
would be useful to get the nans, so we can work with them.

Some cases where this might come in handy are when we iterate over
slices of an array that define groups or category levels with possible
empty groups *)

>>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
>>> x = np.arange(9)
>>> [x[idx==ii].mean() for ii in range(4)]
[1.5, 5.0, nan, 7.5]

instead of

>>> [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()>0]
[1.5, 5.0, 7.5]

Same for var: I wouldn't have to check that the size is larger than the
ddof (whatever that is in the specific case).

*) groups could be empty because they were defined for a larger
dataset or as a union of different datasets

PS: I used mean() above and not var() because

>>> np.__version__
'1.5.1'
>>> np.mean([])
nan
>>> np.var([])
0.0

Josef

>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
