I ran the TACC benchmarks myself, and the config files are posted at http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdAtTexas

Part of the performance difference may be due to differences between the released 2.6 and newer CVS versions of NAMD, but even the released amd64 binary ran really fast at TACC. There may be some special sauce in the servers or OS that I'm not aware of.

-Jim


On Wed, 22 Aug 2007, Kevin Ball wrote:

Hi Dow,

On Wed, 2007-08-22 at 09:52, Dow Hurst DPHURST wrote:
Jim and Kevin,
Why would the 4-core point on the performance benchmark be reversed
between the 2.66 GHz and 3.0 GHz machines?  I'm pretty sure that the
Lonestar NAMD was compiled with the Intel compilers.  I don't know what
was used on the Cambridge Darwin cluster.  Both machines use Intel
Woodcrest dual-core processors with two physical CPUs per node.

I believe this is likely due to my lack of knowledge regarding tuning of
the Intel compilers.  If whoever submitted the TACC results would be
willing to send out their configuration files, that would help resolve
this question.  Thanks!

-Kevin


Both InfiniPath clusters listed on the performance benchmark page show the
best scaling for the apoa1 benchmark between 128 and 512 cores.

It sure seems that if SDR is good enough for an Intel Clovertown-based
cluster, that would be the more cost-effective choice.  The Woodcrest and
Clovertown are priced about the same.

Thanks for your comments!
Dow

__________________________________
Dow P. Hurst, Research Scientist
Department of Chemistry and Biochemistry
University of North Carolina at Greensboro
435 New Science Bldg.
Greensboro, NC 27402-6170
[EMAIL PROTECTED]
[EMAIL PROTECTED]
336-334-4766 lab
336-334-5122 office
336-334-5402 fax

-----Jim Phillips <[EMAIL PROTECTED]> wrote: -----

To: Kevin Ball <[EMAIL PROTECTED]>
From: Jim Phillips <[EMAIL PROTECTED]>
Date: 08/22/2007 12:25PM
cc: Dow Hurst DPHURST <[EMAIL PROTECTED]>, beowulf@beowulf.org
Subject: Re: [Beowulf] latency vs bandwidth for NAMD


Those NAMD results are up now ("Cambridge Xeon/3.0 InfiniPath" at
http://www.ks.uiuc.edu/Research/namd/performance.html).  My opinion is
that SDR is sufficient for NAMD, but I haven't had a chance to see if
there is any benefit to DDR.  I did hear that the new TACC Ranger cluster
with 16 cores per node will use SDR.  I assume that on larger clusters the
switch is more likely to be the limiting factor than the card (I know
precious little about either).

-Jim


On Tue, 21 Aug 2007, Kevin Ball wrote:

Hi Dow,

 The QLE7240 DDR HCA is not available yet, but we do not expect that it
would have any substantial advantage for NAMD compared to the QLE7140
(SDR), because we don't believe that NAMD requires substantial
point-to-point bandwidth from the interconnect.
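
To make that intuition concrete, here is a rough back-of-the-envelope
sketch (Python) of a simple latency-plus-bandwidth message cost model.
The link rates are nominal 4x SDR/DDR data rates after 8b/10b coding, and
the latency and message sizes are illustrative assumptions only, not
measurements of the QLE7140, the QLE7240, or any other HCA:

    # Sketch of an alpha-beta (latency + size/bandwidth) message cost model.
    # All numbers are illustrative assumptions, not measured HCA figures.

    LATENCY_US = 1.5   # assumed one-way MPI latency, microseconds
    SDR_GBPS = 8.0     # nominal 4x SDR data rate, Gbit/s (after 8b/10b coding)
    DDR_GBPS = 16.0    # nominal 4x DDR data rate, Gbit/s (after 8b/10b coding)

    def msg_time_us(size_bytes, latency_us, rate_gbps):
        """Time for one message: fixed latency plus size divided by bandwidth."""
        return latency_us + (size_bytes * 8) / (rate_gbps * 1e3)

    # Assume the per-step point-to-point messages are fairly small.
    for size in (1_000, 4_000, 16_000):
        sdr = msg_time_us(size, LATENCY_US, SDR_GBPS)
        ddr = msg_time_us(size, LATENCY_US, DDR_GBPS)
        print(f"{size:>6} B  SDR {sdr:5.2f} us  DDR {ddr:5.2f} us  "
              f"DDR saves {100 * (1 - ddr / sdr):4.1f}%")

The toy model just illustrates the shape of the tradeoff: below roughly a
kilobyte or two (with these assumed numbers) the fixed per-message cost
dominates and doubling the wire rate changes little, while only much
larger messages would see most of the DDR benefit.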

 The TACC cluster is not using QLogic InfiniBand (IB) cards, but I
believe it uses SDR IB cards from another vendor.

 Just last week I submitted results to the folks at UIUC from a similar
cluster with the QLE7140.  They have not yet shown up on their results
page, but in essence, the scalability is similar until around 256 cores,
at which point the results diverge, with the QLE7140 cluster dramatically
outperforming the TACC cluster at 512 cores.

 I expect the QLE7140 results will show up in the next week or so on
that website (http://www.ks.uiuc.edu/Research/namd/performance.html), so
you can compare them to the TACC performance at that time.  On that site
you can also see performance for a number of other machines, including an
SGI Altix with much higher point-to-point bandwidth yet worse scaling
than IB, which is part of why I don't think DDR will improve results.
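
Once both sets of numbers are on that page, a quick way to quantify the
scaling is to compute speedup and parallel efficiency relative to a
baseline core count.  A minimal sketch (Python), using made-up step times
rather than any actual TACC or QLE7140 result:

    # Sketch: turn benchmark step times at several core counts into speedup,
    # parallel efficiency, and ns/day.  All timings below are hypothetical.

    def scaling(base_cores, base_s_per_step, cores, s_per_step):
        """Return (speedup, parallel efficiency) relative to the base run."""
        speedup = base_s_per_step / s_per_step
        efficiency = speedup / (cores / base_cores)
        return speedup, efficiency

    TIMESTEP_NS = 1e-6   # assumed 1 fs timestep, expressed in ns

    runs = [(64, 0.0400), (256, 0.0115), (512, 0.0070)]   # (cores, s/step), made up
    base_cores, base_time = runs[0]
    for cores, t in runs:
        s, e = scaling(base_cores, base_time, cores, t)
        ns_per_day = 86400.0 / t * TIMESTEP_NS
        print(f"{cores:>4} cores: {s:5.2f}x vs {base_cores} cores, "
              f"{100 * e:5.1f}% efficiency, {ns_per_day:5.2f} ns/day")

Plugging the real step times from the performance page into a helper like
this makes it easy to see exactly where the two interconnects diverge.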

 If you are interested in other MD codes, we have found advantages for
codes like CHARMM and GROMACS as well.  Some of these are detailed in a
white paper on our website:

http://www.qlogic.com/documents/datasheets/knowledge_data/whitepapers/HSG-WP07005.pdf


 Fair notice:  I work for QLogic on the InfiniPath product line.  I
have tried my best to make what bias I have open and clear.

-Kevin


On Fri, 2007-08-17 at 14:03, Dow Hurst DPHURST wrote:
I'd like to get advice on how latency affects scaling of molecular dynamics
codes versus the total bandwidth of the interconnect card.  We use NAMD as
our molecular dynamics code and have had Ammasso RDMA interconnects.  Right
now, we have a chance to upgrade and add nodes to our cluster using
InfiniBand.  I've found that NAMD was coded to be latency tolerant;
however, I'd like to scale up to 64 cores and beyond.  I'm going blind
reading IB card specs, performance benchmarks, and searching Google.  I'd
love some advice from someone who knows whether a consistently very low
latency IB card, such as the InfiniPath QLE7140, is better or worse for
NAMD than a higher-latency but higher-bandwidth card such as the QLE7240.
I can tell that Lonestar at TACC has great NAMD performance, but I can't
tell what IB card is used.  I imagine that switch performance plays a
large role too.
Thanks for your time,
Dow

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
