On Tue, Jul 15, 2014 at 04:50:33PM -0500, Segher Boessenkool wrote:
> On Tue, Jul 15, 2014 at 05:20:31PM -0400, Michael Meissner wrote:
> > I did some timing tests to compare the new PowerPC IEEE 128-bit results
> > to the current implementation of long double using the IBM extended
> > format.
> >
> > The test consisted of a short loop doing the operation over arrays of
> > 1,024 elements, reading in two values, doing the operation, and then
> > storing it back.  This loop in turn was run multiple times, with the
> > idea that most of the values would be in the cache and we didn't have
> > to worry about pre-fetching, etc.
> >
> > The float and double tests were done with vectorization disabled, while
> > for the vector float and vector double tests the compiler was allowed
> > to do its normal auto-vectorization.
> >
> > The number reported was how much longer the second column took over the
> > first:
>
> I assume you mean the other way around?
>
> > Generally, __float128 is 2x slower than the current IBM extended double
> > format, except for divide, where it is 5x slower.  I must say, the
> > software floating-point emulation routines worked well, and once the
> > proper macros were set up, I only needed to override the type used for
> > IEEE 128-bit.
> >
> > Add loop
> > ========
> >
> > float vs double: 2.00x
>
> Why is float twice as slow as double?
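For reference, the timing kernel is roughly of the following shape.  This is
only a minimal sketch based on the description above; the names SIZE, REPEAT,
TYPE, and add_loop are placeholders of mine, not the actual test source, and
the scalar float/double runs would be built with something like
-O2 -fno-tree-vectorize:

/* Minimal sketch of the kind of timing kernel described above; the
   names and counts here are placeholders, not the actual test source.  */

#include <stddef.h>

#define SIZE   1024     /* arrays of 1,024 elements */
#define REPEAT 100000   /* repeat so the data stays cache-hot */

#ifndef TYPE
#define TYPE double     /* float, double, __float128, ... */
#endif

TYPE a[SIZE], b[SIZE], c[SIZE];

/* Read two values, do the operation, store the result back.  */
static void __attribute__ ((noinline))
add_loop (void)
{
  for (size_t i = 0; i < SIZE; i++)
    c[i] = a[i] + b[i];
}

int
main (void)
{
  for (long r = 0; r < REPEAT; r++)
    add_loop ();
  return 0;
}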
Pat re-ran the tests, and now float/double are the same speed.  Since I was
running this on a development machine, and not a dedicated machine, it was
probably just luck of the draw that somebody was doing a large build at the
time I ran the tests.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797