Geoff Jacobs wrote:

Well, each Opteron core would have to split its local memory pool with
its sister, so pure bandwidth would be similar. The memory controller
on the Opteron would give a latency bonus, but the registered DIMMs
would incur a penalty. The Socket A motherboards are using an SIS
chipset which might be a little more tuned.

If the application largely factored out the interconnect, I could accept
the results being this close. But you're right. HT is so much better for
inter-process communication, and GROMACS should derive a big advantage
from it.

Hmmm... as with most codes, the details of the calculation, the quality of the code base, the compiler used, etc. factor into this as much as, if not more than, the underlying interconnect on systems with small core counts.

If we take an overly simple calculation, it might scale one way, yet a different calculation will scale in a rather different manner, because you are hitting different code paths by different amounts. This is why it is (extraordinarily) dangerous to use *standard* benchmarks (HPL, etc.) as an indication of anything other than how much entropy you can generate (both in the physical waste-heat view of entropy and in the information-theoretic destruction-of-bits view).
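
To make the point about different code paths concrete, here is a toy sketch. It is not Doug's calculation and not a model of GROMACS or of any real machine; the cost model, the function name `speedup`, and every number in it are assumptions chosen only for illustration. It treats run time as a serial part, a perfectly parallel part, and a communication term that grows with core count, then compares two hypothetical workloads that differ only in how much they communicate.

```python
def speedup(cores, serial_frac, comm_cost):
    """Speedup over one core under a toy cost model (assumed, not measured).

    Normalized time = serial part + parallel part / cores
                      + comm_cost * (cores - 1)
    """
    parallel_frac = 1.0 - serial_frac
    t = serial_frac + parallel_frac / cores + comm_cost * (cores - 1)
    return 1.0 / t


# Two hypothetical workloads on the same hardware; only the per-core
# communication cost differs. The values are made up for illustration.
workloads = {
    "compute-bound": dict(serial_frac=0.02, comm_cost=0.001),
    "comm-bound": dict(serial_frac=0.02, comm_cost=0.05),
}

if __name__ == "__main__":
    for name, params in workloads.items():
        row = ", ".join(
            f"{p} cores: {speedup(p, **params):.2f}x" for p in (1, 2, 4)
        )
        print(f"{name:>14}  {row}")
```

Under these assumed parameters the two workloads pull apart as cores are added, even though nothing about the hardware changed. A single benchmark number would only describe whichever workload it happens to resemble.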

Without knowing the details of Doug's calculation (yeah, I might look in short order; I am beating my head against a PPTP problem right now...), it would be rather hard to assess why it performs as it does.



--

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
