In message from "Li, Bo" <[EMAIL PROTECTED]> (Sun, 29 Jun 2008 00:07:07
+0800):
Hello,
I am afraid there must be something wrong with your experiment.
How did you measure the performance? Was your DFT code running in
parallel? Was any optimization involved?
I feared the same, but the results were reproduced twice.
As I wrote in my message:
- these were ONE CORE (one CPU for Opteron 246) runs
- the optimization was performed for the OLD Opteron 246 (because
Gaussian, Inc. does not offer binaries optimized specifically for
Barcelona)
DFT test397 (like any other DFT job) parallelizes well, and on Opteron
246 it gives a 1.9 times speedup on 2 CPUs. But I didn't run a 2-core
parallel job on Opteron 2350: I was too troubled by the results
obtained for 1 core.
In most of my tests, K8L or K10 beats the old Opteron at the same
frequency by about 20%.
Sorry, do you see this with Gaussian-03, and for DFT in particular? Did
you compile it on K10 using target=barcelona (i.e. optimized for
Barcelona)?
Yours
Mikhail
Regards,
Li, Bo
----- Original Message -----
From: "Mikhail Kuzminsky" <[EMAIL PROTECTED]>
To: <beowulf@beowulf.org>
Sent: Saturday, June 28, 2008 11:48 PM
Subject: [Beowulf] Strange Opteron 2350 performance: Gaussian-03
I'm running a set of quad-core Opteron 2350 benchmarks, in particular
using Gaussian-03 (the binary version from Gaussian, Inc., i.e. compiled
with an older-than-current pgf77 version, for the Opteron target).
In particular, I compare *one core* of Opteron 2350 with Opteron 246,
which has the same 2 GHz frequency and the same amount of cache per core
(512K L2 + 0.25*2 MB L3 for Opteron 2350 equals the 1 MB L2 of Opteron
246). Opteron 246 even has faster DDR2-667 RAM.
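As a rough check of the cache arithmetic above, here is a minimal Python sketch. It assumes the 2 MB L3 of the Opteron 2350 is split evenly among its 4 cores (the "0.25*2 MB" term); the variable names are only illustrative.

```python
# Per-core cache comparison, using the figures quoted in the message.
# Assumption: the shared 2 MB L3 is divided evenly among 4 cores.
KB_PER_MB = 1024

opteron_2350_l2_kb = 512                  # private L2 per core
opteron_2350_l3_total_kb = 2 * KB_PER_MB  # shared L3
cores_sharing_l3 = 4

opteron_2350_per_core_kb = (
    opteron_2350_l2_kb + opteron_2350_l3_total_kb / cores_sharing_l3
)
opteron_246_per_core_kb = 1 * KB_PER_MB   # 1 MB L2, single core

print(f"Opteron 2350: {opteron_2350_per_core_kb:.0f} KB per core")
print(f"Opteron 246:  {opteron_246_per_core_kb:.0f} KB per core")
```

This only compares capacity; it says nothing about the latency difference between L2 and L3.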
The Gaussian-03 performance in some cases is close for both Opterons
(I remind you that the compilation didn't know about Barcelona!), but
for the very popular DFT method the Opteron 2350 cores look slow: one
job shows 33% worse performance than on Opteron 246.
But on the standard Gaussian-03 test397.com DFT/B3LYP test, *one* (1)
Opteron 2350 core ran for 15667 sec. (both start-stop and CPU time) vs.
8709 sec. on (one) Opteron 246!!
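To put the two test397 timings side by side, a small Python sketch follows. The two run times are the ones quoted above; the variable names are only illustrative.

```python
# Slowdown of one Opteron 2350 core relative to one Opteron 246,
# from the test397 run times quoted in the message (seconds).
t_opteron_2350 = 15667
t_opteron_246 = 8709

slowdown = t_opteron_2350 / t_opteron_246   # ratio of run times
extra_time_pct = (slowdown - 1) * 100       # additional run time, in percent

print(f"slowdown factor: {slowdown:.2f}")
print(f"extra run time:  {extra_time_pct:.0f}%")
```

The ratio comes out close to 1.8, i.e. the Opteron 2350 core needs roughly 80% more time for this job.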
There is no powersaved daemon, so the frequency of Opteron 2350 is
fixed at 2 GHz. I reproduced this result twice on Opteron 2350, one
time in particular with forced good numactl behaviour. I'm
reproducing it on Opteron 246 again :-) but I have indirect
confirmation of these timings (based on a 2-CPU Opteron 246 parallel
test).
Yes, AFAIK the DFT method is cache-friendly, and the slower L3 cache in
Opteron 2350 may give worse performance. But by a factor of 1.8??
Any comments are welcome.
Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry
Moscow
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf