Re: [Beowulf] Power 6

Vincent Diepeveen Sat, 25 Aug 2007 03:06:01 -0700

----- Original Message -----
From: "Li, Bo" <[EMAIL PROTECTED]>
To: "Vincent Diepeveen" <[EMAIL PROTECTED]>; "Toon Knapen" <[EMAIL PROTECTED]>
Cc: <beowulf@beowulf.org>; "Robert G. Brown" <[EMAIL PROTECTED]>
Sent: Friday, August 24, 2007 4:57 PM
Subject: Re: [Beowulf] Intel Quad-Core or AMD Opteron

Intel will have CSI and an on-die memory controller soon, following what AMD has done for a few years. HT or CSI will help us build machines based on NUMA or similar architectures. Based on current memory technologies, I can't find any way around the "memory wall", and a 4 core processor can eat all memory bandwidth in some cases. With NUMA we can get machines that work like several current machines connected with a fast on-board connection. Imagine a supercomputer on the desktop, and what's next? Many-core processors are coming; how do we power a Beowulf with them? I think it is a very interesting topic.

Power 6 is a really strange processor to me. It works with an in-order architecture. I am looking forward to seeing any detailed evaluation of it.

I didn't get deep into power6 yet, as the odds are near 0% that i'll ever run on it, but for software that isn't using IBM libraries power6 might be a big disappointment.

My assumption with respect to power6 is that it will remain a very expensive cpu, drawing a lot of power.

Now at specint and specfp there are always programs where optimizing X or Y suddenly boosts that specific program by a factor of 2 or so.

Of course, as you see me listed as co-author of sjeng-spec, i happen to know its hashtable is 150MB; more wasn't allowed by spec (WEIRD DECISION TAKING OVER THERE). That fits nearly in power6's cache of course.

Sjeng-spec, in contrast to the better software, needs quite some bandwidth internally for move storage (using many integers, whereas in Diep for example i use 1 or 2 integers). All that should on paper be a major advantage for power6. For evaluation sjeng-spec uses a function table, which i use in Diep too, and which crashed several compilers from big manufacturers in their 'spec' test compile, years ago. That in itself raises another interesting question: why does something completely legal in C, which actually gets used in quite some software, make those compilers crash?
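
For readers unfamiliar with the term, a function table in C is an array of function pointers indexed by some code, so the call target is chosen by a table lookup rather than a switch. The sketch below is only an illustration of that pattern. The piece codes, the scoring numbers and all function names are hypothetical and are not taken from sjeng-spec or Diep.

```c
#include <assert.h>

/* Hypothetical piece codes, used only to index the table. */
enum piece { PIECE_PAWN, PIECE_KNIGHT, PIECE_BISHOP, PIECE_COUNT };

/* Every evaluation routine shares this signature. */
typedef int (*eval_fn)(int square);

/* Made-up scoring rules; squares are numbered 0..63. */
static int eval_pawn(int square)   { return 100 + square / 8; }
static int eval_knight(int square) { return (square % 8 == 0) ? 290 : 300; }
static int eval_bishop(int square) { (void)square; return 325; }

/* The function table: one pointer per piece code. */
static const eval_fn eval_table[PIECE_COUNT] = {
    [PIECE_PAWN]   = eval_pawn,
    [PIECE_KNIGHT] = eval_knight,
    [PIECE_BISHOP] = eval_bishop,
};

/* Dispatch through the table: an indirect call the cpu has to predict. */
int evaluate_piece(enum piece p, int square)
{
    assert((int)p >= 0 && (int)p < PIECE_COUNT);
    return eval_table[p](square);
}
```

Calling evaluate_piece(PIECE_PAWN, 16) goes through eval_table[PIECE_PAWN] and returns 102 under these made-up rules. The indirect call through the table is the part that branch predictors tend to find hard.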

Point is, normally speaking most cpu's mispredict all this. So that's another advantage to power6.

Despite all those huge advantages for a highend chip compared to lowend chips, in practice the 4.7Ghz power6 is equal to a 2.x Ghz core2.

Whether it is 2.2Ghz or 2.4Ghz or 2.6Ghz, that's all not so interesting.
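
As a rough back-of-the-envelope reading of that equivalence: if chip A at one clock matches chip B at a lower clock, the ratio of the two clocks approximates A's work per clock relative to B. The helper below is a hypothetical sketch of that arithmetic only. Its name and parameters are assumptions, and it uses just the clock figures quoted above, not any actual specint scores.

```c
#include <assert.h>

/* Approximate per-clock throughput of chip A relative to chip B, given that
 * A running at a_ghz performs about the same as B running at b_equiv_ghz.
 * Hypothetical helper for illustration; both clocks must be positive. */
double per_clock_ratio(double a_ghz, double b_equiv_ghz)
{
    assert(a_ghz > 0.0 && b_equiv_ghz > 0.0);
    return b_equiv_ghz / a_ghz;
}
```

With a_ghz = 4.7 and b_equiv_ghz between 2.2 and 2.6, the ratio lies between roughly 0.47 and 0.55, so on this reading power6 does about half the integer work per clock of core2 wherever in that range the match falls.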

I rechecked 5 times to be sure that i didn't make a mistake interpreting specint data. I didn't even exactly calculate what speed core2 it is equal to. It's just so shockingly slow for integer work.

Power6 gets delivered in 2008 to the universities. We all know that improved intel and AMD cpu's will by then for sure clock far over 3Ghz. You can of course risk big wars by guessing 3Ghz for AMD and 3.2Ghz for intel, it being unclear which of those 2 chips is faster by the time power6 gets delivered, but i'm pretty convinced i stay on the safe side saying that for the average number crunching program, the small processors are on the winning side.

If you go in-order, then deliver 64+ cores within 1 cpu and clock it at 4+ Ghz :)


We all remember itanium2 just too well.

You don't buy a power6 just to do matrix calculations. You buy it to run software that, without much modification, needs to run generically fast on it.

Of course for most universities the choice is a tad more complex. If you want a 100% guarantee that a new intel Xeon 4 core chip releases at the start of 2008, you'll never get it. If you want guarantees as a university that a new AMD machine by 1 January 2008 can deliver you X gflops, that's a bad idea too.

Hardware moves so fast that it's only possible to know at the very last moment how many gflops an ordered machine is going to deliver.

What can give some guarantees is when AMD and intel guarantee that a specific socket is going to work for new generations of chips.

That does give the possibility to order a new machine some months in advance, without the need for a guarantee with respect to when the faster cpu's arrive.

We're speaking about a huge difference in money here, not some % of total price, but rather some factors of price difference. It seems to me power6 is still profiting from the way university commissions and subsidy rules tend to work.


Power6 might be one of the last highend cpu's.

I am not sure whether we must be sad or happy about just that.

Vincent

Regards,
Li, Bo

----- Original Message -----
From: "Vincent Diepeveen" <[EMAIL PROTECTED]>
To: "Toon Knapen" <[EMAIL PROTECTED]>
Cc: <beowulf@beowulf.org>; "Robert G. Brown" <[EMAIL PROTECTED]>
Sent: Friday, August 24, 2007 9:37 PM
Subject: Re: [Beowulf] Intel Quad-Core or AMD Opteron

Even worse: doesn't the SSE2 code in the intel primitives by default
have an 'if then else', so that on opteron it runs without using SIMD?
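
The kind of 'if then else' being asked about can be sketched as follows. This is not a statement about what any intel library actually does; it only contrasts a hypothetical dispatch keyed on the CPUID vendor string with one keyed on the SSE2 feature flag alone. The function and enum names are made up for illustration; "GenuineIntel" and "AuthenticAMD" are the vendor strings the two manufacturers' cpu's report.

```c
#include <assert.h>
#include <string.h>

/* Which implementation of a kernel gets selected at run time. */
enum kernel_path { PATH_SCALAR, PATH_SSE2 };

/* Hypothetical vendor-keyed dispatch: the SIMD path is taken only when the
 * vendor string is "GenuineIntel", even if the cpu reports SSE2 support. */
enum kernel_path pick_by_vendor(const char *vendor, int has_sse2)
{
    assert(vendor != NULL);
    if (has_sse2 && strcmp(vendor, "GenuineIntel") == 0)
        return PATH_SSE2;
    else
        return PATH_SCALAR;
}

/* Feature-keyed dispatch: any cpu that reports SSE2 gets the SIMD path. */
enum kernel_path pick_by_feature(int has_sse2)
{
    return has_sse2 ? PATH_SSE2 : PATH_SCALAR;
}
```

Under the vendor-keyed variant an SSE2-capable opteron ("AuthenticAMD") falls through to the scalar path, while the feature-keyed variant would give it the SIMD path. If a library behaved like the first variant, benchmark numbers on opteron would reflect the dispatch choice and not the SIMD hardware.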

But apart from that, SIMD on the old K8 is very slow compared to core2,
though not by a factor of 2. Barcelona, for well optimized code, should i
guess have a SIMD IPC up to 40+% higher than core2.

So the only 2 questions are when they release and especially at *what*
price for the 4 socket mainboards.

A 16 core barcelona machine with 4 DDR2 memory controllers might be a
very mighty system for all kinds of applications that need shared memory
to scale well.

If it releases the Barcelona core within a few months from now, AMD has a
huge lead over intel with respect to 4 core cpu's, as it seems to me.

I personally feel that intel's choice of a CPU design using tiny L1 caches
is, from a performance viewpoint, a catastrophic one. If there is just ONE
competitor for an intel chip that manages to clock a cpu at nearly the
same clock as intel and with the same number of cores, then intel
usually gets totally outperformed. Now that intel & AMD produce
their cpu's on the same type of machines, it seems to me
that AMD will in general outperform intel.

Comparing the 2006 core2 with a 2003 release is not a very fair
comparison IMHO.

We can definitely conclude that intel managed to produce their new
generation cpu (core2) more than 1 year sooner than AMD did, using a
simple trick, namely gluing 2 dual core chips together.

In the meantime i keep wondering more and more about intel not having an
equivalent on the market for AMD's hypertransport.

For highend, when buying multiple socket nodes, it is hard to see intel
as an alternative to barcelona core driven machines, as it doesn't have
any form of load balancing thanks to having just 1 memory controller for
all cores.

Most interesting for scientists might be buying a few nodes with some
double rail network, each node consisting of a 4 socket quadcore AMD
machine. Initially now perhaps 2Ghz. Then at the end of 2008 you can
upgrade the cpu's to 3+ Ghz.

When also putting a lot of RAM into such an AMD machine, then
such a node of course also totally annihilates power6, even before power6
gets taken into production, at a fraction of the price of a power6
node.

The advantage of using 4 socket machines for a cluster/supercomputer is
obviously the fact that the network costs form a smaller part of the
total solution, meanwhile keeping the total number of nodes limited.

For a few nodes you could arguably use 8 socket solutions, not to scale
up to more cores, as most software can't handle such bad memory latencies,
but it might be you could even outgun power6 in terms of total memory per
node.

What is the amount of ram that power6 supports versus the 8 socket AMD
solutions?

Best Regards,
Vincent



On Fri, 24 Aug 2007, Toon Knapen wrote:

> I understand that, when comparing Quad-Core Xeons with Opterons,
> people focus on the scalability issues of the different multi core
> architectures, but we've run some benchmarks on both and the thing
> that at the time surprised me the most was that if your application
> makes much use of the functions provided by Intel Math Kernel Library,
> a single Xeon core (e.g Clovertown) can be up to twice as fast as a
> single Opteron core.


You are comparing Intel MKL on Xeon with what exactly on Opteron? Intel
MKL on Opteron is certainly not optimal. I hope you compared to GotoBLAS
on Opteron.

t
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org

To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


