On Mon, Jul 15, 2013 at 5:34 PM, Charles R Harris <[email protected]> wrote:
>
> On Mon, Jul 15, 2013 at 2:44 PM, <[email protected]> wrote:
>>
>> On Mon, Jul 15, 2013 at 4:24 PM, <[email protected]> wrote:
>> > On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith <[email protected]> wrote:
>> >> On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris
>> >> <[email protected]> wrote:
>> >>> Let me try to summarize. To begin with, the environment of the nan
>> >>> functions is rather special.
>> >>>
>> >>> 1) If the array is not of inexact type, they punt to the non-nan
>> >>> versions.
>> >>> 2) If the array is of inexact type, then out and dtype must be
>> >>> inexact if specified.
>> >>>
>> >>> The second assumption guarantees that NaN can be used in the return
>> >>> values.
>> >>
>> >> The requirement on the 'out' dtype only exists because currently the
>> >> nan functions like to return NaN for things like empty arrays, right?
>> >> If not for that, it could be relaxed? (It's a rather weird
>> >> requirement, since the whole point of these functions is that they
>> >> ignore NaNs, yet they don't always...)
>> >>
>> >>> sum and nansum
>> >>>
>> >>> These should be consistent so that empty sums are 0. This should
>> >>> cover the empty array case, but will change the behaviour of nansum,
>> >>> which currently returns NaN if the array isn't empty but the slice
>> >>> is after NaN removal.
>> >>
>> >> I agree that returning 0 is the right behaviour, but we might need a
>> >> FutureWarning period.
>> >>
>> >>> mean and nanmean
>> >>>
>> >>> In the case of empty arrays, i.e. an empty slice, this leads to 0/0.
>> >>> For Python this is always a zero division error; for NumPy it raises
>> >>> a warning and returns NaN for floats, 0 for integers.
>> >>>
>> >>> Currently mean returns NaN and raises a RuntimeWarning when 0/0
>> >>> occurs. In the special case where dtype=int, the NaN is cast to
>> >>> integer.
>> >>>
>> >>> Option 1
>> >>> 1) mean: raise error on 0/0
>> >>> 2) nanmean: no warning, return NaN
>> >>>
>> >>> Option 2
>> >>> 1) mean: raise warning, return NaN (current behavior)
>> >>> 2) nanmean: no warning, return NaN
>> >>>
>> >>> Option 3
>> >>> 1) mean: raise warning, return NaN (current behavior)
>> >>> 2) nanmean: raise warning, return NaN
>> >>
>> >> I have mixed feelings about the whole np.seterr apparatus, but since
>> >> it exists, shouldn't we use it for consistency? I.e., just do
>> >> whatever numpy is set up to do with 0/0? (Which I think means warn
>> >> and return NaN by default, but this can be changed.)
>> >>
>> >>> var, std, nanvar, nanstd
>> >>>
>> >>> 1) If ddof > axis (or axes) size, raise error; probably a program
>> >>> bug.
>> >>> 2) If ddof = 0, then whatever is the case for mean, nanmean.
>> >>>
>> >>> For nanvar, nanstd it is possible that some slices are good, some
>> >>> bad, so
>> >>>
>> >>> Option 1
>> >>> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice
>> >>>
>> >>> Option 2
>> >>> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice
>> >>
>> >> I don't really have any intuition for these ddof cases. Just raising
>> >> an error on negative effective dof is pretty defensible and might be
>> >> the safest -- it's easy to turn an error into something sensible
>> >> later if people come up with use cases...
>> >
>> > Related: why does reduceat not have empty slices?
>> >
>> >>>> np.add.reduceat(np.arange(8), [0, 4, 5, 7, 7])
>> > array([ 6,  4, 11,  7,  7])
>> >
>> > I'm in favor of returning NaNs instead of raising exceptions, except
>> > if the return type is int and we cannot cast NaN to int.
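To make the np.seterr suggestion above concrete, here is a short sketch (assuming modern NumPy semantics; np.errstate is the context-manager form of seterr). Under the default error state, the 0/0 inside an empty mean emits a RuntimeWarning and yields NaN; with errstate(invalid='raise') the same operation becomes a FloatingPointError, so a user can pick Option 1 behavior without a new API:

```python
import warnings
import numpy as np

arr = np.array([])  # empty slice: mean internally computes 0.0 / 0

# Default error state: 0/0 warns (RuntimeWarning) and returns NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    result = np.mean(arr)
print(np.isnan(result))  # True

# Opting in to an error instead, via the errstate context manager.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    try:
        with np.errstate(invalid='raise'):
            np.mean(arr)
    except FloatingPointError as exc:
        print("raised FloatingPointError:", exc)
```

The same errstate switch would apply uniformly to mean, var and std, which is the consistency argument being made.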
>> >
>> > If we get functions into numpy that know how to handle NaNs, then it
>> > would be useful to get the NaNs, so we can work with them.
>> >
>> > Some cases where this might come in handy are when we iterate over
>> > slices of an array that define groups or category levels with
>> > possibly empty groups *)
>> >
>> >>>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2])
>> >>>> x = np.arange(9)
>> >>>> [x[idx == ii].mean() for ii in range(4)]
>> > [1.5, 5.0, nan, 7.5]
>> >
>> > instead of
>> >>>> [x[idx == ii].mean() for ii in range(4) if (idx == ii).sum() > 0]
>> > [1.5, 5.0, 7.5]
>> >
>> > Same for var: I wouldn't have to check that the size is larger than
>> > the ddof (whatever that is in the specific case).
>> >
>> > *) Groups could be empty because they were defined for a larger
>> > dataset or as a union of different datasets.
>>
>> Background:
>>
>> I wrote several robust ANOVA versions a few weeks ago that were
>> essentially list comprehensions as above. However, I didn't allow NaNs
>> and didn't check for minimum size.
>> Allowing empty groups to return NaN would mainly be a convenience,
>> since I would need to check the group size only once.
>>
>> ddof: tests for proportions have ddof=0, the regular t-test ddof=1,
>> and tests of correlation ddof=2, IIRC. So we would need to check for
>> the corresponding minimum size such that n - ddof > 0.
>>
>> "Negative effective dof" doesn't exist; that's np.maximum(n - ddof, 0),
>> which is always non-negative but might result in a zero-division
>> error. :)
>>
>> I don't think making anything conditional on ddof > 0 is useful.
>
> So how would you want it?
>
> To summarize the problem areas:
>
> 1) What is the sum of an empty slice? NaN or 0?

0, as it is now for sum (including 0 for nansum with no valid entries).
> 2) What is the mean of an empty slice? NaN, NaN and warn, or error?
> 3) What if n - ddof < 0 for a slice? NaN, NaN and warn, or error?
> 4) What if n - ddof = 0 for a slice? NaN, NaN and warn, or error?
>
> I'm tending to NaN and warn for 2 -- 3, because, as Nathaniel notes,
> the warning can be turned into an error by the user. The errstate
> context manager would be good for that.

Yes, that's what I would prefer also: NaN and zero-division error for
2-4, including mean, var and std, for both nan and non-nan functions --
with the extra argument that 3) and 4) are the same case (except in
polyfit :)

Josef

>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
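For reference, a sketch of how Chuck's four summary cases behave in NumPy releases after this discussion, which largely settled on 0 for empty sums and NaN-plus-RuntimeWarning for cases 2-4 (note that older releases returned NaN from nansum when no valid entries remained, so the nansum line below is version-dependent):

```python
import warnings
import numpy as np

# 1) Empty sums are 0 for sum; later releases also return 0 for
#    nansum when every entry is NaN (older releases returned NaN).
empty_sum = np.sum(np.array([]))             # 0.0
all_nan_sum = np.nansum(np.array([np.nan]))

# 2) Mean of an empty slice, and 3)/4) n - ddof <= 0 for a slice:
#    NaN plus a RuntimeWarning under the default error state.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    m = np.mean(np.array([]))                # 0/0 -> NaN, warns
    v = np.var(np.array([5.0]), ddof=1)      # n - ddof = 0 -> NaN, warns

print(empty_sum, all_nan_sum)
print(np.isnan(m), np.isnan(v), len(caught) >= 2)
```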
