On Sep 17, 2010, at 9:11 AM, Bill Rankin wrote:

> 
> On Sep 17, 2010, at 7:39 AM, Hearns, John wrote:
> 
>> http://www.theregister.co.uk/2010/09/17/hotzle_on_brawny_and_wimpy_cores
> 
> Interesting article (more of a letter really) - to be honest, when I first 
> scanned it I was not sure what Holzle's actual argument was.  To me, he 
> omitted a lot of the details that make this discussion much less 
> black-and-white (and much more interesting) than he would contend:
> 
> 1) He cites Amdahl's law, but leaves out Gustafson's.
> 
> 2) Yeah, parallel programming is hard.  Deal with it.  It helps me pay my 
> bills.
> 
> 3) While I am not a die-hard FLOPS/Watt evangelist, he seems to completely 
> ignore the cost of power and cooling when discussing infrastructure costs.
> 
> The whole letter just seems full of non sequiturs, and reads as a general 
> gripe about processor architectures.


This letter of Holzle's is consistent with our experience at SiCortex.  The 
cores we had were far more power efficient than the x86's, but they were 
slower.  Because the interconnect was so fast, generally you could scale up 
farther than with commodity clusters so that you got better absolute 
performance and better price-performance, but it was tiring to argue these 
points over and over again.  Especially to customers who weren't paying for 
their power or infrastructure and didn't really value the low power aspect.
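The "scale up farther" point is really the Amdahl-vs-Gustafson distinction from Bill's item 1.  A minimal sketch, with an illustrative 5% serial fraction (not a measured number from any real code):

```python
# Toy comparison of Amdahl's and Gustafson's laws (illustrative numbers only).
# s = serial fraction of the work, n = core count.

def amdahl(s, n):
    """Fixed problem size: speedup is capped by the serial fraction."""
    return 1.0 / (s + (1.0 - s) / n)

def gustafson(s, n):
    """Scaled problem size: the parallel part grows with the machine,
    so speedup keeps growing roughly linearly in n."""
    return s + (1.0 - s) * n

for n in (64, 512, 4096):
    print(n, round(amdahl(0.05, n), 1), round(gustafson(0.05, n), 1))
```

Under Amdahl the speedup saturates near 1/s no matter how many slow cores you add; under Gustafson, growing the problem with the machine keeps the extra cores busy - which is the regime where a big fabric of slow, efficient cores pays off.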

Holzle's letter doesn't go into enough detail, however.  One of the other ideas 
at SiCortex was that a slow core wouldn't hurt the performance of codes that 
were actually limited by the memory system.  We noticed many codes running at 
1-5% of peak, spending the rest of their time burning power while waiting on 
memory.  I think this argument has yet to be tested, because the 
first-generation SC machines didn't actually have a very good memory system: 
the cores were limited to a single outstanding miss.  I think there is a 
fairly good case to be made that systems with slower, low-power cores can get 
higher average efficiencies (% of peak) than fast cores -- provided that the 
memory systems are close to equivalent.  Everyone is using the same DRAMs.
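You can see why a single outstanding miss hurts with a back-of-envelope model: by Little's law, achievable bandwidth is bounded by (outstanding misses x line size) / miss latency.  A minimal sketch with made-up numbers (not measured SiCortex or x86 figures):

```python
# Back-of-envelope: fraction of peak a memory-bound loop can sustain
# when the core allows only a few outstanding misses.
# All numbers below are illustrative assumptions, not measurements.

def sustained_fraction(peak_flops, flops_per_byte, line_bytes,
                       miss_latency_s, outstanding_misses=1):
    """Little's law bound: bandwidth <= outstanding * line_size / latency.
    Convert that bandwidth to flops and compare against peak."""
    bandwidth = outstanding_misses * line_bytes / miss_latency_s  # bytes/s
    achievable = bandwidth * flops_per_byte                       # flops/s
    return min(1.0, achievable / peak_flops)

# A 1 GFLOP/s core, one outstanding 64-byte miss, 100 ns to DRAM,
# on a stream-like kernel doing 0.1 flop per byte moved:
print(sustained_fraction(1e9, 0.1, 64, 100e-9))
```

With these assumed numbers the core sustains only a few percent of peak - the same ballpark as the 1-5% we observed - and a slower core with the same memory system simply loses less of its peak to the identical stall time.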

Of course this argument doesn't work well if the application is compute bound, 
or fits in the cache.

There are lots of alternative ideas in this space.  Hyperthreading switches to 
a different thread when one blocks on memory; Turbo Boost runs *faster* when 
the power envelope permits.  I recall a paper or two about slowing down cores 
in a parallel application until the app itself started to run more slowly, etc.
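The hyperthreading trick is easy to model on the back of an envelope too.  A toy sketch, assuming each thread computes for c cycles and then stalls m cycles on a miss (illustrative numbers, not any particular chip):

```python
# Toy model of latency hiding via hardware threads (illustrative only):
# each thread computes for c cycles, then stalls for m cycles on a miss.
# With t threads interleaved on one core, utilization ~ min(1, t*c/(c+m)).

def core_utilization(compute_cycles, miss_cycles, threads):
    return min(1.0, threads * compute_cycles /
               (compute_cycles + miss_cycles))

for t in (1, 2, 4, 8):
    print(t, round(core_utilization(20, 200, t), 2))
```

Utilization climbs roughly linearly with thread count until the stall time is fully covered, which is why heavily-threaded cores are another answer to the same memory-wall problem.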

-L

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
