On 11/03/2010 03:04 PM, Bruce Southey wrote: > On 11/03/2010 06:52 AM, Pauli Virtanen wrote: >> Wed, 03 Nov 2010 12:39:08 +0100, Vincent Schut wrote: >> [clip] >>> Btw, should I file a bug on this? >> One can argue that mean() and sum() should use a numerically stabler >> algorithm, so yes, a bug can be filed if there is not yet one already. >> > This is a 'user bug' not a numpy bug because it is a well known > numerical problem. I recall that we have had this type of discussion > before that has resulted in these functions being left as they are. The > numerical problem is mentioned better in the np.mean docstring than the > np.sum docstring. > > My understanding was that any new algorithm has to be better than the > current algorithm especially in speed and accuracy across 'typical' > numpy problems across the different Python and OS versions not just for > numerically challenged cases. For example, I would not want to sacrifice > speed if I achieve the same accuracy without losing as much speed as > just changing the dtype to float128 (as I use x86_64 Linux). > > Also in Warren's mean example, this is simply a 32-bit error because it > disappears when using 64-bit (numpy's default) - well, until we reach > the extreme 64-bit values. > > >>> np.ones((11334,16002)).mean() > 1.0 > >>> np.ones((11334,16002),np.float32).mean() > 0.092504406598019437 > >>> np.ones((11334,16002),np.float32).mean().dtype > dtype('float64') > > Note that there is probably a bug in np.mean because a 64-bit dtype is > returned for integers and 32-bit or lower precision floats. So upcast is > not apparently being done on the accumulator. > > > Bruce
Thanks for the info, all. I agree that this is a 'user bug', however, mentioning this as a corner case someplace a user would look when finding errors like these might be an idea, as I have the feeling it will keep turning up once and again at this list otherwise. Maybe start a webpage 'numpy and some often encountered floating point issues'? For now, I've just ordered more memory. The cheapest and simplest solution, if you ask me :-) Vincent. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion