Hi Bill,

Unlike you, I'm not limited by knowledge of materials.
I'd argue that if something gives 10x the clock rate, it of course destroys everything,
even at 1/10th of the transistor capacity.

Current status is:
Phenom2 overclocks better and is really cheap, and when programmed at a very low level, near assembler level, it has a higher IPC than Nehalem. It is especially dominant in SSE2-type codes. It's just that the compiler fools you; it's Intel-friendly, to put it politely. That seems to be the current status.

Yet objectively, the Q6600 was a quantum leap forward. A brilliant design when it was released.
Connected L2s or not, who cares when it delivers a big punch?
Test-program tricks like hyperthreading: been there, done that. They don't work for most HPC-type
workloads. They just make timing your software more complicated.

All the CPUs are still 4 cores; that's the reality. I don't see progress in multicore.

The newer process technology, going from 65 nm to 45 nm, hopefully produces CPUs more cheaply, yet it hardly clocks much higher at production level. Only for water-cooled overclockers does it suddenly make AMD very attractive, yet that's not how clusters usually get built (my cluster is probably a big exception anyway;
it currently has 1 node, to give one example).

Nehalem hardly performs better IPC-wise than the Q6600 for integer workloads, and it does so
at a huge power cost. Phenom2 is in fact
0% better integer-wise than Phenom1, which is even more disappointing in that respect. Only its price is cool: a factor of 4 cheaper than the 3.2 GHz Nehalem, nearly a factor of 5, with a default clock just 200 MHz lower.

I'm quite disappointed by the new CPUs from Intel and AMD, to be honest.

The way these manufacturers 'fix' performance on paper is by using special test suites. Most test sets are too L3-oriented and too much subject to optimization by compiler teams.

If you have a $100 billion company and fewer than 100 'test programs', that's what you get: for such a huge company with such a huge compiler team it's too easy to game every one of them.

The current new-generation CPUs are faster on paper; in reality they aren't.
     "paper supports everything"

A $100 billion company will game every test and manage to get new tests adopted with a data size that benefits big L3s, whereas in reality big L3s are just not needed for HPC.

That's just total baloney for matrix calculations, CFD, whatever. Either your code hardly gets into L3, or you need so many gigabytes of RAM that L3 doesn't matter either. A few MB is enough.
4 MB versus 16 MB is simply no big deal.

Only some 'chosen' working set sizes benefit from L3.
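
To make the working-set argument concrete, here is a minimal sketch of a pure capacity model. The function name and the sample working-set sizes are assumptions for illustration, not measurements; it ignores associativity, prefetching and bandwidth, and only asks whether a working set spills out of a 4 MB L3 yet still fits in a 16 MB one.

```python
MB = 1024 * 1024
GB = 1024 * MB


def benefits_from_bigger_l3(working_set_bytes, small_l3_bytes, big_l3_bytes):
    """Crude capacity model: the bigger L3 only helps when the working set
    no longer fits in the small L3 but still fits in the big one."""
    return small_l3_bytes < working_set_bytes <= big_l3_bytes


if __name__ == "__main__":
    # Hypothetical working-set sizes, chosen only to show the three regimes.
    for ws in (1 * MB, 3 * MB, 8 * MB, 12 * MB, 64 * MB, 4 * GB):
        helped = benefits_from_bigger_l3(ws, 4 * MB, 16 * MB)
        print(f"{ws / MB:8.0f} MB working set: bigger L3 helps = {helped}")
```

Under this model only the sizes between 4 MB and 16 MB gain anything; everything smaller already fits, and everything much larger goes to RAM either way.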

A 20 GHz Phenom in GaAs will of course destroy everything.

As explained, however, that doesn't really matter, because L3s eat relatively little power compared to
the execution logic, so that is a big bummer in that case.

My plans for a 128-core multiprocessor (each core low power), which allows easy porting of HPC codes to it, got laughed away by some Intel fanboys. I argued for, say, 50% of the total RAM being assigned locally to the cores, each through a local L2 (not shared at all with the other cores), plus a very slow, possibly even off-chip, L3 cache in front of a shared memory (the other 50% of the RAM). If that's the case then Intel is dead in HPC of course, as NVIDIA and AMD will take over with GPU-type supercomputers. I tend to have more faith in engineers, though, than the fanboys and most professors do. I believe in new solutions, not in the vicious circles of the past.
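
As a rough sketch of that memory split, assuming a purely hypothetical total of 64 GiB of RAM (the class name and that figure are invented for illustration; only the 128 cores and the 50/50 split come from the proposal above):

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ManyCoreLayout:
    cores: int = 128                 # from the proposal
    total_ram_bytes: int = 64 * GIB  # hypothetical figure, not from the proposal
    local_fraction: float = 0.5      # "say 50%" of RAM is core-local

    @property
    def local_ram_bytes(self) -> int:
        # RAM reached only through the cores' private, unshared L2s.
        return int(self.total_ram_bytes * self.local_fraction)

    @property
    def local_ram_per_core(self) -> int:
        return self.local_ram_bytes // self.cores

    @property
    def shared_ram_bytes(self) -> int:
        # The remainder sits behind the slow, possibly off-chip, L3.
        return self.total_ram_bytes - self.local_ram_bytes


if __name__ == "__main__":
    layout = ManyCoreLayout()
    print(f"local per core: {layout.local_ram_per_core // 1024**2} MiB")
    print(f"shared:         {layout.shared_ram_bytes // GIB} GiB")
```

With these assumed numbers each core owns 256 MiB privately, which is the part existing HPC codes could use without thinking about the other 127 cores.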

A manycore is really complicated to write efficient algorithms for, whereas some modified multicore-type CPU
is easy to port codes to.

I'd argue that approaching things from the software viewpoint, WHAT IS EASY TO PORT, might be a rather good idea for
future CPU design.

If you now cite something that can run at 10x the clock speed, then the question is of course: "Suppose we made a big building filled with GaAs processing units: at what price can you build it for me, and what computing power does it
give at what power?"

If the answer is: "the building might explode with odds of 1 in a million", I'm sure some governments would want to take that risk if it is that much faster. See it as a feature. An ideal feature to sell to N*SA, I'd argue.

The amount of power it uses is quite important IMHO.

Power should be an ever bigger concern in high-end HPC, I feel. Right now there are paper requirements from governments that just get answered with untruthful statements; I feel this will be unsellable to governments in the future. The number of watts per gflop matters quite a lot.
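
A minimal sketch of the watts-per-gflop figure I mean. The two machines and their numbers are invented for illustration and do not describe any real CPU; the point is only that the metric should be computed from measured power and sustained, not peak, gflops.

```python
def watts_per_gflop(power_watts, sustained_gflops):
    """Power cost of one sustained gflop; lower is better."""
    if sustained_gflops <= 0:
        raise ValueError("sustained_gflops must be positive")
    return power_watts / sustained_gflops


if __name__ == "__main__":
    # Hypothetical (power in watts, sustained gflops) pairs.
    machines = {
        "machine A": (130.0, 40.0),
        "machine B": (95.0, 38.0),
    }
    for name, (watts, gflops) in machines.items():
        print(f"{name}: {watts_per_gflop(watts, gflops):.2f} W/gflop")
```

In this made-up comparison machine A is slightly faster, yet machine B is clearly the better buy per watt.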

If it were that easy to produce energy, we would of course already have electric cars, or cars that run on water.

Of course we want ECC and ECC RAM on every design. Too many errors at such computing power are not acceptable.

Best Regards,
Vincent



On Jan 19, 2009, at 7:31 PM, Bill Broadley wrote:

John Hearns wrote:
BTW, re the discussion on processor frequency scaling,
what finally did happen to Emitter Coupled Logic and gallium arsenide?

I followed the Exponential "Intel killer" for quite some time, although it seemed obvious to me from the first slides it was going to be a failure. Sky high clock rates, tiny caches, and a poor memory bus seemed to be destined
for failure.

If gallium arsenide or some other material gave us 10x the clock rate per watt, but 1/2 the transistors, would it really matter? Seems like even Intel is begrudgingly admitting it's the memory bus, and finally the Nehalem is
blessed with dramatically more bandwidth.

Seems like increasingly cores are turning latency limited workloads (for the parallel jobs of course) into bandwidth limited ones. Without a memory bus that allows for 10x the bandwidth it doesn't really seem like 10x the clock
rate would be of particular use.

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

