On Tue, Jul 15, 2014 at 04:50:33PM -0500, Segher Boessenkool wrote:
> On Tue, Jul 15, 2014 at 05:20:31PM -0400, Michael Meissner wrote:
> > I did some timing tests to compare the new PowerPC IEEE 128-bit
> > support to the current implementation of long double, which uses the
> > IBM extended format.
> > 
> > The test consisted of a short loop doing the operation over arrays of
> > 1,024 elements, reading in two values, doing the operation, and then
> > storing the result back.  This loop in turn was run multiple times,
> > with the idea that most of the values would stay in the cache, and we
> > didn't have to worry about prefetching, etc.
> > 
> > The float and double tests were done with vectorization disabled,
> > while for the vector float and vector double tests the compiler was
> > allowed to do its normal auto-vectorization.
> > 
> > The number reported was how much longer the second column took over
> > the first:
> 
> I assume you mean the other way around?
> 
> > Generally, __float128 is 2x slower than the current IBM extended
> > double format, except for divide, where it is 5x slower.  I must say,
> > the software floating point emulation routines worked well, and once
> > the proper macros were set up, I only needed to override the type
> > used for IEEE 128-bit.
> > 
> > Add loop
> > ========
> > 
> > float       vs double:          2.00x
> 
> Why is float twice as slow as double?

Pat re-ran the tests, and now float/double are the same speed.  Since I was
running this on a development machine, and not a dedicated machine, it was
probably just luck of the draw that somebody was doing a large build at the
time I ran the tests.
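
For reference, the per-element loops were essentially of the following
shape (just a sketch with made-up names, not the actual harness; it
assumes a compiler/target with __float128 support):

    #include <stddef.h>

    #define N 1024

    /* One iteration of the "Add loop" for the IEEE 128-bit type; the
       float, double, and vector variants just change the element type.
       The function and array names here are made up for illustration.  */
    static void
    add_loop (__float128 *restrict r, const __float128 *restrict a,
              const __float128 *restrict b)
    {
      for (size_t i = 0; i < N; i++)
        r[i] = a[i] + b[i];
    }

    /* The driver calls add_loop many times on the same 1,024-element
       arrays, so after the first pass the data is cache resident and
       prefetching is not a factor in the timings.  */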

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
