On Tue, Sep 9, 2008 at 6:55 AM, Sebastian Stephan Berg <[EMAIL PROTECTED]> wrote:
> Yeah, memory-wise it doesn't matter to sum to a double, but just
> trying around it seems that the mixing of float and double is very slow
Yes, the memory argument explains why you would use float32 data vs
float64 data, not the accumulator (that's certainly what Matthieu
meant). Having a float64 accumulator means you care more about memory
than about speed, and are willing to trade some speed for a smaller
data array.

> (at least on my comp) while if the starting array is already double
> there is almost no difference for summing. Generally double precision
> calculations should be slower though. Don't extensions like SSE2 operate
> either on 2 doubles or 4 floats at once and thus should be about twice
> as fast for floats? For add/multiply this behaviour is for me visible
> anyways.

We don't use SSE and co in numpy, and I doubt the compilers (even the
Intel one) are able to generate effective SSE code for numpy ATM.
Actually, double and float are about the same speed on x86 (using the
x87 FPU and not the SSE units), because internally the registers are 80
bits wide during computation. The real difference is the memory pressure
induced by double (8 bytes per item) compared to float, plus certain
operations whose speed differs for a reason I don't understand (log, sin
and co are as fast for float as for double using the FPU, but sqrt and
divide are twice as fast for float, for example).

I don't have any clear explanation of why mixing float with a double
accumulator would be slow; maybe the default code to convert float to
double - as generated by the compiler - is bad. Or maybe numpy does
something funny.

cheers,

David

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
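P.S. For anyone following along, the float32-data / float64-accumulator
trade-off being discussed can be sketched in numpy like this (the array
size and values are arbitrary, just for illustration):

```python
import numpy as np

# Keep the data in float32: half the memory footprint of float64.
a = np.ones(1_000_000, dtype=np.float32)

# Default: the accumulator matches the input dtype (float32 here).
s32 = a.sum()

# Pass dtype= to accumulate in float64 instead, without converting the
# data array itself; this trades some speed for a more accurate sum
# while keeping the memory saving of float32 storage.
s64 = a.sum(dtype=np.float64)

print(s32.dtype)  # float32
print(s64.dtype)  # float64
```

The data stays float32 either way; only the running sum is widened, which
is why the memory argument applies to the data and not the accumulator.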