Uros, 

> > Actually, in many cases, SSE did help x86 performance as well.  That
> > happens in FP-intensive applications which spend a lot of time in
> > loops when the XMM register set can be used more efficiently than
> > the x87 stack.
> 
>   This code could be a perfect example of how the XMM register file
> beats the x87 reg stack.
> However, contrary to all expectations, x87 code is 20% faster(!!)
> /on p4, but it would be interesting to see this comparison on
> x86_64, or perhaps on 32bit AMD/.
> The code structure, produced with -mfpmath=sse, is the same as the
> code structure produced with -mfpmath=x87, so IMO there are no
> register allocator effects in play.

I'll look into it and share what I see.
 
>   I was trying to look into this problem, but at first sight, the
> code seems optimal to me...

FWIW, here's some old data I got almost 2 years ago (run-times and geometric 
means of the ratios using SPEC's bases):

CPU2000         A       B
164.gzip        205s    203s
175.vpr         185s    188s
176.gcc         117s    116s
181.mcf         313s    314s
186.crafty      112s    112s
197.parser      268s    268s
252.eon         147s    167s
253.perlbmk     175s    180s
254.gap         148s    148s
255.vortex      178s    178s
256.bzip2       211s    202s
300.twolf       313s    328s
Int Geomean     812     801
177.mesa        173s    187s
179.art         346s    690s
183.equake      163s    162s
188.ammp        325s    336s
FP Geomean      757     620

I used GCC 3.3.3 from the 3_3-hammer branch.  The options for the runs in
column B were "-m32 -O3 -march=k8 -ffast-math -fomit-frame-pointer
-malign-double +FDO"; for column A, the same ones plus "-mfpmath=sse".  The
system was a 1.4GHz Athlon 64 with PC2100 RAM.

Because things were so much better with SSE, I haven't run with x87 lately...
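
For anyone who wants to double-check the geomeans, here is a minimal Python
sketch.  It assumes a SPEC ratio is the reference time divided by the run
time, so the reference times cancel when the two columns are compared and the
A-over-B geomean ratio can be recomputed from the run times alone.  The names
(INT_TIMES, FP_TIMES, geomean_gain) are made up for illustration; the numbers
are copied from the table.

```python
import math

# Run times in seconds from the table, as (A, B) pairs:
# A = with -mfpmath=sse, B = without it (x87).
INT_TIMES = {
    "164.gzip": (205, 203), "175.vpr": (185, 188), "176.gcc": (117, 116),
    "181.mcf": (313, 314), "186.crafty": (112, 112), "197.parser": (268, 268),
    "252.eon": (147, 167), "253.perlbmk": (175, 180), "254.gap": (148, 148),
    "255.vortex": (178, 178), "256.bzip2": (211, 202), "300.twolf": (313, 328),
}
FP_TIMES = {
    "177.mesa": (173, 187), "179.art": (346, 690),
    "183.equake": (163, 162), "188.ammp": (325, 336),
}

def geomean_gain(times):
    """Geometric mean of the B/A run-time ratios (> 1 means A is faster)."""
    logs = [math.log(b / a) for a, b in times.values()]
    return math.exp(sum(logs) / len(logs))

int_gain = geomean_gain(INT_TIMES)
fp_gain = geomean_gain(FP_TIMES)

# Compare against the ratio of the geomeans reported in the table.
print(f"Int: recomputed {int_gain:.3f}, table {812 / 801:.3f}")
print(f"FP:  recomputed {fp_gain:.3f}, table {757 / 620:.3f}")
```

The recomputed values should land close to the table's 812/801 and 757/620;
small differences are expected because the run times are rounded to whole
seconds.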

-- 
Evandro
