Mark Hahn wrote:
>> They have a paper that explains it well and has some
>> interesting benchmarks.
>>
>> http://sc06.supercomputing.org/schedule/pdf/pap225.pdf
>
> this is quite interesting. I wish they had done benchmarks with doubles,
> especially since they alluded to, for instance, the n-body calculation
> really needing at least careful consideration of precision/resolution.
> (now that I think of it, using 23 bits of mantissa on a 256^3 FFT sounds
> numerically dubious too.)
>
> interesting that for a 2.4GHz Cell, they get at most 10 FP Gflops per SPE.
> does anyone have SGEMM numbers for a 3GHz Intel Core2? I'll guess that
> efficiency of libgoto with 2 threads would be >= 80%, so flops would be
> .8*2*8*3 =~ 40 Gflops, or half a Cell chip. makes it hard to argue for
> wide use of Cell, I think...
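For reference, the quoted back-of-envelope estimate can be spelled out as below. This is only a sketch: the reading of each factor in ".8*2*8*3" (efficiency, cores, single-precision flops per cycle, clock in GHz) is my assumption, as are the variable names and the count of 8 SPEs per Cell chip.

```python
# Back-of-envelope SGEMM estimate from the quoted message.
# Assumed reading of the factors in ".8*2*8*3" (the post does not label them):
efficiency = 0.8       # guessed libgoto efficiency with 2 threads
cores = 2              # one thread per Core2 core
flops_per_cycle = 8    # assumed single-precision flops per core per cycle
clock_ghz = 3.0        # 3GHz Core2

core2_peak = cores * flops_per_cycle * clock_ghz   # theoretical peak, Gflops
core2_est = efficiency * core2_peak                # estimated SGEMM, Gflops

# Cell side: "at most 10 FP Gflops per SPE" is from the quote;
# 8 SPEs per chip is an assumption.
spes = 8
gflops_per_spe = 10.0
cell_total = spes * gflops_per_spe                 # whole chip, Gflops

print(f"Core2 peak:            {core2_peak:.1f} Gflops")
print(f"Core2 SGEMM estimate:  {core2_est:.1f} Gflops")
print(f"Cell, all SPEs:        {cell_total:.1f} Gflops")
print(f"Core2 estimate / Cell: {core2_est / cell_total:.2f}")
```

Under those assumptions the estimate comes to a bit under 40 Gflops, which is where the "half a Cell chip" figure comes from.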
Unfortunately, the reality is a little crappier. Sciencemark 2.0 SGEMM
sees 11 Gflops on an E6700. DGEMM sees 5-6 Gflops.

http://www.pcper.com/article.php?aid=265&type=expert&pid=3

This is an order of magnitude less performance than the SGEMM predictions
in the LBL paper. Unfortunately, the LBL numbers are only predictions.

http://www.lbl.gov/Science-Articles/Archive/sabl/2006/Jul/CellProcessorPotential.pdf#search=%22sgemm%20cell%22

The SC06 paper linked at the top _is_ an evaluation of performance on an
actual Cell chip. Unfortunately, it's a lower-clocked pre-production
example running an experimental pseudo-compiler.

I'm interested in seeing SGEMM using Cell-specific intrinsics. Such a
benchmark should represent the practical peak performance.

Note: even if the Sequoia numbers are approximately the same as SPE
intrinsics, Cell is still about 7x faster than Core2 (taking 8 SPEs at
10 Gflops each, roughly 80 Gflops against the 11 Gflops measured above).

-- 
Geoffrey D. Jacobs

Go to the Chinese Restaurant,
Order the Special