Re: [Beowulf] quad-core SPECfp2006: where are 4 FP results/cycle ?

Mikhail Kuzminsky Sat, 13 Oct 2007 08:14:49 -0700

In message from Mark Hahn <[EMAIL PROTECTED]> (Fri, 12 Oct 2007 16:09:05 -0400 (EDT)):

This means that 2 additional FP results per cycle in the microarchitecture give only about a 7% performance increase :-(
the 4 flops/cycle is really for linpack-like code: it assumes you are executing packed double SIMD.

Yes, but AFAIK most modern optimizing F9x compilers for x86 can generate code w/SSEx instructions (instead of x87). And I assume that many real-world codes, including some from the SPECfp2006 set, include work w/floating-point vectors. It's not necessary to have very long vectors, taking into account that SSE vectors of 64-bit doubles have length=2. Such things may theoretically give a 2x speedup!
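
To make the "length=2" arithmetic concrete, here is a minimal sketch in Python. It assumes a 128-bit SSE register and, as a labeled assumption not stated in this thread, a core that can issue two packed FP operations per cycle (for example one add and one multiply). The function names are hypothetical and only illustrate the counting.

```python
# Peak FP results per cycle as (FP ops issued per cycle) x (elements per register).
# ASSUMPTION: 2 FP ops per cycle (e.g. one add + one multiply); not from the thread.

SSE_REGISTER_BITS = 128   # width of an SSE (XMM) register
DOUBLE_BITS = 64          # width of one double-precision element


def lanes(register_bits=SSE_REGISTER_BITS, element_bits=DOUBLE_BITS):
    """Number of elements that fit in one register."""
    return register_bits // element_bits


def peak_flops_per_cycle(ops_per_cycle, register_bits=SSE_REGISTER_BITS,
                         element_bits=DOUBLE_BITS):
    """Upper bound on FP results per cycle for fully packed code."""
    return ops_per_cycle * lanes(register_bits, element_bits)


# Packed doubles: 2 lanes, so 2 ops/cycle gives 4 results/cycle.
packed = peak_flops_per_cycle(2)
# Scalar doubles: treat the register as 64 bits wide, i.e. 1 lane.
scalar = peak_flops_per_cycle(2, register_bits=64)
print(packed, scalar, packed / scalar)
```

This is only the theoretical ceiling: the 2x ratio is reached when every FP instruction in the hot loop is packed, which is the linpack-like case.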

just that not all FP is SIMD-friendly, I think.

Yes, I agree w/"not all". But a 7% speedup would mean, I believe, that FP codes are "very seldom" SIMD-friendly?
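
One way to put a number on "very seldom" is Amdahl's law. The sketch below is a rough estimate, not a measurement: it assumes the only change is a 2x speedup of the packed-SIMD part of the run time, and that memory bandwidth, caches and clock rate stay the same. The function names are hypothetical.

```python
# Amdahl's law: if a fraction f of run time is sped up by a factor k,
# the overall speedup is S = 1 / ((1 - f) + f / k).
# ASSUMPTION: k = 2 applies only to the packed-SIMD FP part; everything else is unchanged.


def amdahl_speedup(f, k=2.0):
    """Overall speedup when a fraction f of the time runs k times faster."""
    return 1.0 / ((1.0 - f) + f / k)


def implied_fraction(speedup, k=2.0):
    """Invert Amdahl's law: fraction f that would produce the observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / k)


f = implied_fraction(1.07)   # observed ~7% gain, k = 2
print(f"implied SIMD-friendly fraction: {f:.3f}")
print(f"check, speedup for that fraction: {amdahl_speedup(f):.3f}")
```

Under these assumptions a 7% gain corresponds to roughly 13% of the original run time being spent in code that benefits from the wider FP units.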


Yours
Mikhail

if your code spends a lot of time in blas/lapack functions, I would expect it to see good speedup.
regards, mark hahn.


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
