Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Jeff Reback
Related recent issue: https://github.com/numpy/numpy/issues/4638
pandas now explicitly specifies the accumulator to avoid this problem: https://github.com/pydata/pandas/pull/6954/files
pandas also implemented Welford's method for rolling_var in 0.14.0, see here: https://github.com/pydat
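For readers unfamiliar with Welford's method mentioned here, a minimal sketch of the single-pass update follows. It is not the pandas rolling_var code; the function name `welford` is hypothetical, and it computes a plain running mean and sample variance over the whole input rather than over a rolling window.

```python
import numpy as np

def welford(values):
    """Single-pass mean and sample variance using Welford's update.

    Hypothetical helper for illustration only; not the pandas implementation.
    """
    n = 0
    mean = 0.0  # Python floats are float64, so the accumulators are wide
    m2 = 0.0    # running sum of squared deviations from the current mean
    for v in values:
        v = float(v)
        n += 1
        delta = v - mean          # deviation from the old mean
        mean += delta / n         # update the mean
        m2 += delta * (v - mean)  # uses both old and new mean
    var = m2 / (n - 1) if n > 1 else float("nan")
    return mean, var

data = np.array([1, 2, 3, 4], dtype=np.float32)
mean, var = welford(data)
```

The update never forms a large sum of squares, which is why it tends to behave better than the textbook "mean of squares minus square of mean" formula.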

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread RayS
Probably a number of scipy places as well:

import numpy
import scipy.stats
print numpy.__version__
print scipy.__version__
for s in range(16777214, 16777944):
    if scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32))[0] != 1:
        print '\nbroke', s, scipy.stats.nanmean(numpy.ones((s, 1),

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread RayS
import numpy
print numpy.__version__
for s in range(1864100, 1864200):
    if numpy.ones((s, 9), numpy.float32).sum() != s*9:
        print '\nbroke', s
        break
    else:
        print '\r',s,

C:\temp>python np_sum.py
1.8.0b2
1864135
broke 1864136

import numpy
print numpy.__version__
for s
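The break point in the script above lines up with 2**24 = 16777216, the point past which float32 can no longer represent every integer. A small sketch of that arithmetic (the variable names are mine, not from the original message):

```python
import numpy as np

limit = np.float32(2 ** 24)  # 16777216
one = np.float32(1)

# The exact sum 16777217 is not representable in float32 and rounds
# back to 16777216, so a float32 running total of ones stops growing here.
stuck = bool(limit + one == limit)

# Distance from 2**24 to the next representable float32 value.
gap = float(np.spacing(limit))

# Element counts on either side of the failure reported above.
last_ok = 1864135 * 9    # 16777215, still below 2**24
first_bad = 1864136 * 9  # 16777224, above 2**24
```

This only explains where a naive float32 accumulator saturates; it does not say anything about which summation order a given NumPy version uses.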

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Frédéric Bastien
On Thu, Jul 24, 2014 at 12:59 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> On Thu, Jul 24, 2014 at 8:27 AM, Jaime Fernández del Río <jaime.f...@gmail.com> wrote:
>> On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
>>> In practice

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Joseph Martinot-Lagarde
On 24/07/2014 12:55, Thomas Unterthiner wrote:
> I don't agree. The problem is that I expect `mean` to do something
> reasonable. The documentation mentions that the results can be
> "inaccurate", which is a huge understatement: the results can be utterly
> wrong. That is not reasonable. At the

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Charles R Harris
On Thu, Jul 24, 2014 at 8:27 AM, Jaime Fernández del Río <jaime.f...@gmail.com> wrote:
> On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
>> In practice one of the better methods is pairwise summation that is
>> pretty much as fast as a naive summation b

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Alan G Isaac
On 7/24/2014 5:59 AM, Eelco Hoogendoorn wrote to Thomas:
> np.mean isn't broken; your understanding of floating point number is.

This comment seems to conflate separate issues: the desirable return type, and the computational algorithm. It is certainly possible to compute a mean of float32 doing
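One way to keep the two issues named here apart, the return type and the computational algorithm, is to accumulate in float64 and cast only the final result back to float32. The sketch below assumes that reading of the message; `mean_f32_via_f64` is a hypothetical helper name, while the `dtype=` argument of `numpy.mean` is the documented way to choose the accumulator type.

```python
import numpy as np

def mean_f32_via_f64(x):
    """Mean of a float32 array: accumulate in float64, return float32.

    Hypothetical helper for illustration only.
    """
    acc = np.mean(x, dtype=np.float64)  # dtype= selects the accumulator type
    return np.float32(acc)              # cast only the final result

x = np.ones((1000, 10), np.float32)
result = mean_f32_via_f64(x)
```

The caller still gets a float32 back, so the return type is unchanged; only the intermediate precision differs.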

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Jaime Fernández del Río
On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
> In practice one of the better methods is pairwise summation that is
> pretty much as fast as a naive summation but has an accuracy of
> O(logN) ulp.
> This is the method numpy 1.9 will use by defau
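A rough sketch of the pairwise summation idea quoted above, compared with a naive running total. This is not NumPy's implementation; the function names and the `block` parameter are hypothetical, and float16 is used only so the saturation shows up with a small array (float16 stops counting by ones at 2048, where float32 does so at 2**24).

```python
import numpy as np

def naive_sum(x):
    """Left-to-right accumulation in the array's own dtype."""
    total = x.dtype.type(0)
    for v in x:
        total = total + v  # same-dtype scalar arithmetic, no widening
    return total

def pairwise_sum(x, block=8):
    """Recursive pairwise summation in the array's own dtype.

    Splits the input in half until chunks are at most `block` long,
    sums each chunk naively, then adds the partial sums pair by pair.
    """
    n = len(x)
    if n <= block:
        return naive_sum(x)
    mid = n // 2
    return pairwise_sum(x[:mid], block) + pairwise_sum(x[mid:], block)

ones = np.ones(4096, dtype=np.float16)
naive_result = float(naive_sum(ones))
pairwise_result = float(pairwise_sum(ones))
```

With these inputs the naive total stalls at 2048 while the pairwise total reaches 4096, because the pairwise scheme only ever adds partial sums of similar magnitude.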

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Julian Taylor
On Thu, Jul 24, 2014 at 1:33 PM, Fabien wrote:
> Hi all,
>
> On 24.07.2014 11:59, Eelco Hoogendoorn wrote:
>> np.mean isn't broken; your understanding of floating point number is.
>
> I am quite new to python, and this problem is discussed over and over
> for other languages too. However, numpy's

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Fabien
Hi all,

On 24.07.2014 11:59, Eelco Hoogendoorn wrote:
> np.mean isn't broken; your understanding of floating point number is.

I am quite new to python, and this problem is discussed over and over for other languages too. However, numpy's summation problem appears with relatively small arrays al

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Thomas Unterthiner
I don't agree. The problem is that I expect `mean` to do something reasonable. The documentation mentions that the results can be "inaccurate", which is a huge understatement: the results can be utterly wrong. That is not reasonable. At the very least, a warning should be issued in cases where

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Eelco Hoogendoorn
Arguably, this isn't a problem of numpy, but of programmers being trained to think of floating point numbers as 'real' numbers, rather than just a finite number of states with a funny distribution over the number line. np.mean isn't broken; your understanding of floating point number is. What you

[Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-24 Thread Thomas Unterthiner
Hi! The following is a known "bug" since at least 2010 [1]:

    import numpy as np
    X = np.ones((50000, 1024), np.float32)
    print X.mean()
    >>> 0.32768

I ran into this for the first time today as part of a larger program. I was very surprised by this, and spent over an hour lookin