Re: [Numpy-discussion] What should be the result in some statistics corner cases?

Charles R Harris Mon, 15 Jul 2013 06:52:42 -0700

On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris <[email protected]> wrote:


>
>
> On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser <
> [email protected]> wrote:
>
>> On 7/14/13, Charles R Harris <[email protected]> wrote:
>> > Some corner cases in the mean, var, std.
>> >
>> > *Empty arrays*
>> >
>> > I think these cases should either raise an error or just return nan.
>> > Warnings seem ineffective to me as they are only issued once by default.
>> >
>> > In [3]: ones(0).mean()
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61:
>> > RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[3]: nan
>> >
>> > In [4]: ones(0).var()
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
>> > RuntimeWarning: invalid value encountered in true_divide
>> >   out=arrmean, casting='unsafe', subok=False)
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
>> > RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[4]: nan
>> >
>> > In [5]: ones(0).std()
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
>> > RuntimeWarning: invalid value encountered in true_divide
>> >   out=arrmean, casting='unsafe', subok=False)
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
>> > RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[5]: nan
>> >
>> > *ddof >= number of elements*
>> >
>> > I think these should just raise errors. The results for ddof >= #elements
>> > is happenstance, and certainly negative numbers should never be returned.
>> >
>> > In [6]: ones(2).var(ddof=2)
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
>> > RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[6]: nan
>> >
>> > In [7]: ones(2).var(ddof=3)
>> > Out[7]: -0.0
>> >
>> > *nansum*
>> >
>> > Currently returns nan for empty arrays. I suspect it should return nan for
>> > slices that are all nan, but 0 for empty slices. That would make it
>> > consistent with sum in the empty case.
>> >
>>
>>
>> For nansum, I would expect 0 even in the case of all nans.  The point
>> of these functions is to simply ignore nans, correct?  So I would aim
>> for this behaviour:  nanfunc(x) behaves the same as func(x[~isnan(x)])
>>
>>
> Agreed, although that changes current behavior. What about the other
> cases?
>
>
Looks like there isn't much interest in the topic, so I'll just go ahead
with the following choices:

Non-NaN case

1) Empty array -> ValueError

The current behavior of the statistics functions is an accident, i.e., the
nan arises from 0/0. I like to think that in this case the result could be
any number, rather than not a number, so *the* value is simply not defined.
So for an empty array, raise a ValueError.

2) ddof >= n -> ValueError

If the number of elements, n, is not zero and ddof >= n, raise a ValueError
for the ddof value.
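
A minimal sketch of the two checks above, written as a wrapper around the
existing var method. The name checked_var and the error messages are
hypothetical, chosen only for illustration; this is not how NumPy
implements it.

```python
import numpy as np

def checked_var(a, ddof=0):
    """Hypothetical helper (not part of NumPy): variance with the
    proposed checks for the non-NaN case."""
    a = np.asarray(a, dtype=float)
    n = a.size
    # 1) Empty array -> ValueError (0/0 is undefined, not merely nan).
    if n == 0:
        raise ValueError("variance of an empty array is not defined")
    # 2) ddof >= n -> ValueError (the divisor n - ddof would be <= 0).
    if ddof >= n:
        raise ValueError(
            "ddof=%d must be less than the number of elements (%d)" % (ddof, n)
        )
    return a.var(ddof=ddof)
```

With this sketch, checked_var(np.ones(2), ddof=3) raises a ValueError
instead of returning -0.0, and checked_var(np.ones(0)) raises instead of
warning and returning nan.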

NaN case

1) Empty array -> ValueError
2) Empty slice -> NaN
3) Slice with ddof >= n -> NaN
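
A rough sketch of how the three NaN-case rules could fit together for a
variance along an axis. The function name nanvar_sketch is hypothetical,
and it reads "empty slice" as a slice with no non-NaN entries and n as the
number of non-NaN entries in the slice; both readings are assumptions.

```python
import numpy as np

def nanvar_sketch(a, axis=0, ddof=0):
    """Hypothetical illustration (not a NumPy function) of the NaN-case rules."""
    a = np.asarray(a, dtype=float)
    # 1) Empty array -> ValueError
    if a.size == 0:
        raise ValueError("variance of an empty array is not defined")
    valid = ~np.isnan(a)
    n = valid.sum(axis=axis)                     # non-NaN count per slice
    total = np.where(valid, a, 0.0).sum(axis=axis, keepdims=True)
    # Suppress the 0/0 warnings; those slices are set to NaN explicitly below.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / valid.sum(axis=axis, keepdims=True)
        sqdev = np.where(valid, (a - mean) ** 2, 0.0).sum(axis=axis)
        out = sqdev / (n - ddof)
    # 2) Empty slice -> NaN, and 3) slice with ddof >= n -> NaN
    return np.where(n - ddof <= 0, np.nan, out)
```

Here only the whole-array check raises; per-slice problems produce NaN in
the corresponding output position, so one bad slice does not abort the
computation for the others.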

 Chuck

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion