Re: [Beowulf] Strange Opteron 2350 performance: Gaussian-03

Mikhail Kuzminsky Sat, 28 Jun 2008 15:37:23 -0700

In message from Joe Landman <[EMAIL PROTECTED]> (Sat, 28 Jun 2008 14:48:02 -0400):

This is possible, depending upon the compiler used. Though I have to admit that I find it odd that it would be the case within the Opteron family and not between Opteron and Xeon.
Intel compilers used to (haven't checked 10.1) switch between fast (SSE*) and slow (x87 FP) paths as a function of the processor version string. If this code was built with an old Intel compiler, it is possible that the code paths differ, though as noted, I would find that surprising within the Opteron family.

Well, I thought about the (absence of) use of SSE in the binary Gaussian-03 Rev. C02 version I used. But even if x87 code really was generated by pgf77, why does this x87-based code give such "high" performance on the Opteron 246 in comparison with an Opteron 2350 core? I ran the same Gaussian binaries on both CPUs!
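The difference between dispatching on the processor version string and testing for capabilities, as raised in the quoted text, can be sketched roughly as follows. This is a simplified model, not the actual logic of any Intel or PGI compiler. It assumes Linux-style /proc/cpuinfo text; the function names and the SAMPLE fragment are hypothetical and hand-written for illustration, not taken from either machine in this thread.

```python
def _field(cpuinfo_text, name):
    """Return the value of the first 'name : value' line, or '' if absent."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return value.strip()
    return ""

def pick_path_by_vendor(cpuinfo_text):
    # Vendor-string dispatch: any CPU not reporting GenuineIntel
    # falls back to x87, whatever it actually supports.
    return "sse2" if _field(cpuinfo_text, "vendor_id") == "GenuineIntel" else "x87"

def pick_path_by_capability(cpuinfo_text):
    # Capability testing: look at the advertised feature flags instead.
    flags = set(_field(cpuinfo_text, "flags").split())
    return "sse2" if "sse2" in flags else "x87"

# Hypothetical, abbreviated cpuinfo fragment (illustration only).
SAMPLE = """\
vendor_id\t: AuthenticAMD
cpu family\t: 16
flags\t\t: fpu tsc msr pae cx8 mmx fxsr sse sse2 ht syscall nx lm
"""

if __name__ == "__main__":
    print("vendor dispatch    :", pick_path_by_vendor(SAMPLE))
    print("capability dispatch:", pick_path_by_capability(SAMPLE))
```

Note that either scheme selects the same path on every Opteron that reports the same vendor and flags, so by itself it would not explain a difference between two Opterons running the same binary.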

Modern PGI compilers (the suggested default for Gaussian-03 last I checked) have the ability to do this as well, though I don't know how they implement it (capability testing, hopefully?).
Out of curiosity, how does stream run on both systems?

I ran stream on Opteron 242 and 244 a few years ago. The scalability and the throughput itself were OK. Recently I ran stream on my dual-socket Opteron 2350-based server. Consistent with the faster DDR2-667, I obtained higher throughput. In particular, I reproduced the 8-core result presented in McCalpin's table (sent by AMD) and some data presented earlier on our Beowulf mailing list.

(BTW, there is one bad thing about stream on this server, for which the corresponding data are absent in McCalpin's table: the throughput scales well from 1 to 2 OpenMP threads and gives a good result for 8 threads, but the throughput for 4 threads is about the same as for 2 threads. The reason is, IMHO, that for 8 threads the kernel allocates RAM on both nodes, but for 4 threads the allocated RAM is placed on one node, and the 4 threads compete badly for memory access.)

Keep in mind that Gaussian-03 was slow on the Opteron 2350 core in a sequential run, where the Opteron 2350's RAM gives it only advantages in comparison with the Opteron 246. I didn't run stream on the Opteron 246, but that much is clear to me.
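The 4-thread hypothesis above can be put into rough numbers. The sketch below computes theoretical peak bandwidth only; nothing in it is measured data from this thread. It assumes two 64-bit DDR2-667 channels per socket, ignores remote access over HyperTransport, and uses hypothetical function names.

```python
# Theoretical peak memory bandwidth, for orientation only (not measurements).
DDR2_667_TRANSFERS_PER_S = 667e6  # DDR2-667: about 667 million transfers/s
BUS_BYTES = 8                     # one 64-bit channel moves 8 bytes per transfer
CHANNELS_PER_NODE = 2             # assumption: two channels per socket

def node_peak_gb_s():
    """Peak bandwidth of one NUMA node's local memory, in GB/s."""
    return DDR2_667_TRANSFERS_PER_S * BUS_BYTES * CHANNELS_PER_NODE / 1e9

def ceiling_gb_s(nodes_holding_data):
    """Upper bound when the arrays are spread over this many nodes."""
    return nodes_holding_data * node_peak_gb_s()

if __name__ == "__main__":
    # All arrays on one node: every thread shares one memory controller.
    print("data on 1 node :", round(ceiling_gb_s(1), 2), "GB/s ceiling")
    # Arrays on both nodes: the ceiling doubles.
    print("data on 2 nodes:", round(ceiling_gb_s(2), 2), "GB/s ceiling")
```

If the arrays for a 4-thread run all land on one node, the run shares the same single-node ceiling as a 2-thread run, which would be consistent with the flat 2-to-4-thread scaling described above. Spreading the allocation across both nodes would be one way to test the hypothesis.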

Also, it is possible, with a larger cache, that you might be running into some odd cache effects (tlb/page thrashing). But DFTs are usually "small" and thus "sensitive" to cache size.
You might be able to instrument the run within a papi wrapper, and see if you observe a large number of cache/tlb flushes for some reason.
On a related note: are you using a stepping before B3 of the 2350? That could impact performance if you have the patch in place or have the tlb/cache turned off in the BIOS (some MB makers created a patch to do this).

Gaussian-03 fails in link302 on Barcelona B2 because of this error. I use stepping B3.
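For anyone wanting to check the stepping on their own node, a minimal sketch follows. It assumes Linux-style /proc/cpuinfo text, and it assumes that Barcelona reports cpu family 16, model 2, with stepping 2 for B2 and stepping 3 for B3; that mapping should be verified against AMD's revision guide. The function names and the sample text are hypothetical.

```python
def cpu_identity(cpuinfo_text):
    """Extract family, model and stepping of the first CPU listed."""
    wanted = ("cpu family", "model", "stepping")
    found = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Exact key match, so "model name" is not confused with "model".
        if sep and key in wanted and key not in found:
            found[key] = int(value.strip())
    return found

def is_barcelona_before_b3(identity):
    # Assumption (check the AMD revision guide): family 16, model 2,
    # stepping below 3 means an earlier stepping than B3.
    return (identity.get("cpu family") == 16
            and identity.get("model") == 2
            and identity.get("stepping", 99) < 3)

# Hypothetical fragment for illustration, not copied from a real machine.
SAMPLE_CPUINFO = """\
cpu family\t: 16
model\t\t: 2
model name\t: (omitted)
stepping\t: 3
"""

if __name__ == "__main__":
    ident = cpu_identity(SAMPLE_CPUINFO)
    print(ident, "before B3:", is_barcelona_before_b3(ident))
```

On a real node, the text would come from reading /proc/cpuinfo instead of the sample string.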

Yours
Mikhail


Joe


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
