On Tue, 11 Aug 2009, Joe Landman wrote:
There is a cost to going cheap. This cost is time, and loss of productivity.
If your time (your students' time) is free, and you don't need to pay for
consequences (loss of grants, loss of revenue, loss of productivity, ...) in
delayed delivery of result
On Tue, Aug 11, 2009 at 5:57 PM, Bruno Coutinho wrote:
> Nehalem and Barcelona have the following cache architecture:
>
> L1 cache: 64 KB (32 KB data + 32 KB instruction), per core
> L2 cache: Barcelona: 512 KB, Nehalem: 256 KB, per core
> L3 cache: Barcelona: 2 MB, Nehalem: 8 MB, shared among all cores.
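One way to confirm those sizes on a given node is to ask the C library; a minimal sketch, assuming Linux with glibc's _SC_LEVEL* sysconf extensions (a value of 0 just means the libc could not determine that level):

/* cachesizes.c -- print cache sizes as seen by the C library.
 * Assumes Linux + glibc's _SC_LEVEL* sysconf extensions; a value of 0
 * just means the libc could not determine that level.
 * Build: gcc -O2 cachesizes.c -o cachesizes
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long l1d = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    long l1i = sysconf(_SC_LEVEL1_ICACHE_SIZE);
    long l2  = sysconf(_SC_LEVEL2_CACHE_SIZE);
    long l3  = sysconf(_SC_LEVEL3_CACHE_SIZE);

    printf("L1d: %ld KB\n", l1d / 1024);
    printf("L1i: %ld KB\n", l1i / 1024);
    printf("L2:  %ld KB\n", l2 / 1024);
    printf("L3:  %ld KB\n", l3 / 1024);
    return 0;
}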
On Thu, Apr 9, 2009 at 11:35 AM, Douglas J. Trainor wrote:
> Rahul,
>
> I think Greg et al. are correct. Does your SC1435 have a Delta Electronics
> switching power supply? I bet you have a 600 watt Delta.
>
> Intel recently had problems with outsourced 350 watt "FHJ350WPS" switching
> power supp
On Tue, Aug 11, 2009 at 12:19 PM, Mikhail Kuzminsky wrote:
> If these results are for HyperThreading "ON", it may not be too strange,
> because of "virtual core" competition.
>
> But if these results are with Hyperthreading switched off, it's strange.
> I usually have good DFT scaling with the number of cores
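A quick way to see from the node itself whether Hyperthreading is exposing extra logical CPUs is to compare the "siblings" and "cpu cores" fields of /proc/cpuinfo; a minimal sketch, assuming an x86 Linux layout for that file:

/* ht_check.c -- rough Hyperthreading check on x86 Linux.
 * Compares the "siblings" and "cpu cores" fields of /proc/cpuinfo:
 * siblings > cpu cores usually means HT/SMT is enabled.
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) { perror("/proc/cpuinfo"); return 1; }

    char line[256];
    int siblings = 0, cores = 0;
    while (fgets(line, sizeof line, f)) {
        sscanf(line, "siblings : %d", &siblings);
        sscanf(line, "cpu cores : %d", &cores);
    }
    fclose(f);

    if (siblings && cores)
        printf("siblings=%d cores=%d -> HT %s\n",
               siblings, cores, siblings > cores ? "on" : "off");
    else
        printf("could not find siblings/cpu cores fields\n");
    return 0;
}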
On Tue, Aug 11, 2009 at 12:06 PM, Bill Broadley wrote:
> Looks to me like you fit in the barcelona 512KB L2 cache (and get good
> scaling) and do not fit in the nehalem 256KB L2 cache (and get poor scaling).
Thanks Bill! I never realized that the L2 cache of the Nehalem is
actually smaller than the Barcelona's.
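The 256 KB vs 512 KB difference is easy to reproduce with a toy loop (not the DFT code itself); a minimal sketch that sweeps working sets from 128 KB to 4 MB and prints effective bandwidth, which typically steps down once the working set no longer fits in L2:

/* l2_fit.c -- toy kernel to show the working-set vs. L2 effect.
 * Sweeps arrays of 128 KB .. 4 MB and prints effective bandwidth;
 * expect a step once the working set outgrows L2 (256 KB on Nehalem,
 * 512 KB on Barcelona).
 * Build: gcc -std=gnu99 -O2 l2_fit.c -o l2_fit   (add -lrt on older glibc)
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    const size_t max_bytes = 4u << 20;                /* 4 MB */
    double *a = malloc(max_bytes);
    for (size_t i = 0; i < max_bytes / sizeof *a; i++)
        a[i] = 1.0;

    for (size_t bytes = 128u << 10; bytes <= max_bytes; bytes *= 2) {
        size_t n = bytes / sizeof *a;
        size_t reps = (512u << 20) / bytes;           /* ~512 MB of traffic per size */

        double t0 = now();
        for (size_t r = 0; r < reps; r++)
            for (size_t i = 0; i < n; i++)
                a[i] = a[i] * 1.000001;               /* read + write each element */
        double t1 = now();

        double mbytes = 2.0 * bytes * reps / 1e6;     /* count both read and write */
        printf("%5zu KB working set: %9.1f MB/s\n",
               bytes >> 10, mbytes / (t1 - t0));
    }
    printf("(checksum %g)\n", a[0]);                  /* keep the compiler honest */
    free(a);
    return 0;
}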
On Tue, Aug 11, 2009 at 12:40 PM, Craig Tierney wrote:
> What are you doing to ensure that you have both memory and processor
> affinity enabled?
>
All I was using now was the flag:
--mca mpi_paffinity_alone 1
Is there anything else I ought to be doing as well?
--
Rahul
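One way to check whether the launcher really pinned each rank is to have every process print its own affinity mask; a minimal sketch, assuming Linux (sched_getaffinity) and an MPI compiler wrapper, with hypothetical file names:

/* show_binding.c -- print each rank's CPU affinity mask so you can see
 * whether the MPI launcher actually pinned it.  Linux-specific sketch,
 * not tied to any particular MPI implementation.
 * Build: mpicc -std=gnu99 -O2 show_binding.c -o show_binding
 * Run:   mpirun --mca mpi_paffinity_alone 1 -np 8 ./show_binding
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cpu_set_t mask;
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof mask, &mask);   /* 0 = calling process */

    char host[64];
    gethostname(host, sizeof host);

    char cpus[256] = "";
    for (int c = 0; c < 64; c++)                /* first 64 CPUs is plenty here */
        if (CPU_ISSET(c, &mask)) {
            char buf[8];
            snprintf(buf, sizeof buf, "%d ", c);
            strcat(cpus, buf);
        }

    printf("rank %d on %s bound to cpus: %s\n", rank, host, cpus);
    MPI_Finalize();
    return 0;
}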
Joe Landman wrote:
> I am arguing for commodity systems. But some gear is just plain junk.
> Not all switches are created equal. Some inexpensive switches do a far
> better job than some of the expensive ones. Some brand name machines
> are wholly inappropriate as compute nodes, yet they ar
It's interesting that for this hardware/software configuration, disabling
NUMA in the BIOS gives higher STREAM results than with NUMA enabled.
I.e., for NUMA "off": 8723/8232/10388/10317 MB/s;
for NUMA "on": 5620/5217/6795/6767 MB/s
(both with OMP_NUM_THREADS=1 and the ifort 11.1 compiler).
The
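For reference, those MB/s figures come from kernels like the STREAM triad; a minimal sketch of that loop (not the official benchmark) with OpenMP, showing the first-touch initialization that decides page placement when NUMA is enabled:

/* triad.c -- the a = b + s*c pattern that STREAM reports, shown only to
 * illustrate what the MB/s figures above measure.  With NUMA enabled,
 * pages land on the node of the thread that first touches them, so the
 * initialization below uses the same OpenMP loop as the timed kernel.
 * Build: gcc -std=gnu99 -O2 -fopenmp triad.c -o triad
 */
#include <stdio.h>
#include <omp.h>

#define N 10000000                /* ~80 MB per array, far beyond any cache */

static double a[N], b[N], c[N];

int main(void)
{
    const double s = 3.0;

    /* first touch: each thread initializes (and thus places) its own chunk */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    double t1 = omp_get_wtime();

    /* 3 x 8 bytes move per iteration: read b, read c, write a */
    double mbytes = 3.0 * sizeof(double) * N / 1e6;
    printf("triad: %.0f MB/s  (a[0] = %g)\n", mbytes / (t1 - t0), a[0]);
    return 0;
}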
Daniel Pfenniger wrote:
There is a cost to *EVERYTHING*
Well, not really surprising. The point is to be quantitative,
not subjective (fear, etc.). Each solution has a cost and alert
people will choose the best one for them, not for the vendor.
Sadly, not always (choosing the best one f
Rahul Nabar wrote:
> On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho wrote:
>> This is often caused by cache competition or memory bandwidth saturation.
>> If it were cache competition, going from 4 to 6 threads would make it worse.
>> As the code became faster with DDR3-1600 and much slower with
Joe Landman wrote:
Gerry Creager wrote:
Daniel Pfenniger wrote:
Douglas Eadline wrote:
[...]
This article sounds unbalanced and self-serving.
I thought it read a bit like a chronicle of my recent experiences.
Mine were not so bad, so I found the tone too pessimistic.
I think that this
In message from Rahul Nabar (Sun, 9 Aug 2009 22:42:25 -0500):
(a) I am seeing strange scaling behaviours with Nehalem cores. E.g., a
specific DFT (Density Functional Theory) code we use is maxing out
performance at 2 or 4 CPUs instead of 8, i.e. runs on 8 cores are
actually slower than on 2 and 4 cores (
Rahul Nabar wrote:
> Exactly! But I thought this was the big advance with the Nehalem that
> it has removed the CPU<->Cache<->RAM bottleneck.
Not sure I'd say removed, but they have made a huge improvement. To the point
where a single-socket Intel is better than a dual-socket Barcelona.
> So if
+1
Gerry Creager wrote:
Daniel Pfenniger wrote:
Douglas Eadline wrote:
[...]
This article sounds unbalanced and self-serving.
I thought it read a bit like a chronicle of my recent experiences.
I think that this article is fine, not unbalanced. What I like to point
out to customers and par
On Mon, Aug 10, 2009 at 01:02:51PM -0700, Rahul Nabar wrote:
> On Mon, Aug 10, 2009 at 2:09 PM, Joshua Baker-LePain wrote:
> > Well, as there are only 8 "real" cores, running a computationally intensive
> > process across 16 should *definitely* do worse than across 8.
Some workloads will benefit
All,
I posted this on ClusterMonkey the other week.
It is actually derived from a white paper I wrote for
SiCortex. I'm sure those on this list have some
experience/opinions with these issues (and other
cluster issues!)
The True Cost of HPC Cluster Ownership
http://www.clustermonkey.net//con
On Aug 10, 2009, at 23:07, Tom Elken wrote:
Summary:
IBM, SGI and Platform have some comparisons on clusters with "SMT
On" of running 1 rank per core compared to running 2 ranks per
core. In general, at low core counts (up to about 32) there is
about an 8% advantage for running