Well, to every scientist who says he needs a lot of RAM now:
ECC DDR2 RAM costs next to nothing right now.

You can build nodes very cheaply now with, say, 4 cheap CPUs
and 128 GB of RAM inside.

There is no excuse for those who beg for big RAM not to buy a bunch of those
nodes.

What happens every time is that at the moment the price of some sort of RAM finally drops (note that ECC registered DDR RAM never got cheap, much to my disappointment), a newer generation of RAM is already there, which again is really
expensive.

I tend to believe that many algorithms that really require a lot of RAM could do with a bit less and profit from today's huge CPU power, using some clever tricks and enhancements and/or new algorithms (sometimes it is hard to say what counts as a new algorithm when it looks so much like a previous one with just a few new enhancements), which are probably
far from trivial.
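
As a toy illustration of that kind of trade (my own sketch, nothing taken from any specific scientific code): a big precomputed lookup table can often be replaced by recomputing the same values on the fly, spending CPU cycles that are nearly free today instead of RAM that is not.

/* Toy time-memory trade-off sketch (illustration only).
 * Variant A: a precomputed 64K-entry table of bit counts (costs RAM).
 * Variant B: recompute the bit count on the fly (costs CPU cycles).
 * With today's cheap CPU cycles, variant B is often fast enough. */
#include <stdint.h>
#include <stdlib.h>

static uint8_t *popcount_table;               /* variant A: 65536 bytes of RAM */

static void init_table(void)
{
    popcount_table = malloc(65536);
    for (unsigned v = 0; v < 65536; v++) {
        unsigned c = 0, x = v;
        while (x) { c += x & 1; x >>= 1; }
        popcount_table[v] = (uint8_t)c;
    }
}

static unsigned popcount_lookup(uint16_t v)    /* variant A: table lookup */
{
    return popcount_table[v];
}

static unsigned popcount_compute(uint16_t v)   /* variant B: recompute    */
{
    unsigned c = 0;
    while (v) { c += v & 1; v >>= 1; }
    return c;
}

Scale the same idea up to tables of many gigabytes and you see why the trade can matter.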

Usually, programming the 'new' algorithm efficiently at a low level is the killer problem that keeps it from being used yet (there is no budget to hire people who specialize in this, or they simply work for some other company or government body).

I would really argue that sometimes you have to give industry some time to mass-produce memory: just design a new generation of CPU based upon the RAM that's there now and read from that RAM massively
in parallel. That also gives a HUGE bandwidth.

If some older GPU based upon DDR3 RAM claims 106 GB/s of bandwidth to its RAM,
whereas today's Nehalem claims 32 GB/s and achieves 17 to 18 GB/s,
then obviously it wasn't important enough for Intel to give us more
bandwidth to RAM.

If NVIDIA/AMD GPUs could do it years before, and the latest CPU is a factor of 4+ off, then discussions
about bandwidth to RAM are quite artificial.
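
For the record, by "bandwidth to RAM" I mean what a simple streaming kernel actually achieves, not the datasheet number. A minimal sketch in the spirit of the STREAM triad (my own illustration; the array size and the POSIX clock_gettime timing are assumptions) would be something like this:

/* Minimal STREAM-triad-style sketch: measures achieved streaming
 * bandwidth over arrays far bigger than any cache. Illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)   /* 32M doubles per array, 256 MB each */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: 2 reads + 1 write per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double bytes = 3.0 * N * sizeof(double);
    printf("achieved streaming bandwidth: %.1f GB/s\n", bytes / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}

The number this prints is the achieved streaming bandwidth, which is where the 17-18 GB/s versus 32 GB/s gap shows up.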

The reason for that is SPEC's limit on RAM consumption: they design a benchmark
years beforehand to use an amount of RAM that is "common" now.

I would argue that the ones most hungry for bandwidth and per-core crunching power are the scientific world
and/or safety research (the aerospace and car industries).

Note that I'm speaking of streaming bandwidth above. Most scientists do not know the difference between bandwidth and latency, basically because they are right that, from a theoretical viewpoint, it is
all bandwidth-related in the end.

Yet in practice there are so many factors influencing latency. Intel/AMD/IBM are of course making big efforts to reduce latency a lot. Maybe 95% of all their work on a CPU goes into it (a blindfolded guess
from a computer science guy, so not a hardware designer)?
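
A concrete way to see the difference (again my own sketch, not anything from SPEC): a dependent pointer chase through a shuffled array makes every load wait for the previous one, so its runtime is set by memory latency, while a streaming kernel like the triad above is set by bandwidth.

/* Dependent pointer chase: each load depends on the previous one, so
 * prefetching and pipelining can't hide the round trip to RAM -- the
 * time per step approximates memory latency. Illustration only;
 * array size and shuffle are assumptions (assumes RAND_MAX >= N). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)   /* 32M entries, 256 MB: well beyond any cache */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);

    /* Build one random cycle over all N slots (Sattolo's algorithm). */
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t k = 0; k < N; k++)
        p = next[p];                       /* serialized: one cache miss per step */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("avg latency per dependent load: %.1f ns (p=%zu)\n",
           secs / N * 1e9, p);
    free(next);
    return 0;
}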

In the end it is all about the test sets in SPEC. If we finally manage to get a bunch of really WELL-OPTIMIZED low-level codes that eat gigabytes of RAM into SPEC, then within a few years AMD and Intel will show up with some
really fast CPUs for scientific workloads.

If all the "professor" types like RGB make a lot of noise worldwide to get that done, then they will have to follow.

As for criticism of Intel and AMD along the lines of "why not do this and that": I make it all the time too, but at the same time, if you look at what happens in SPEC, SPEC is only about "who has the best compiler and the biggest L2 cache that can nearly contain the entire working-set size of this tiny-RAM program".

Get some serious software into SPEC, I'd argue.

To start by looking at myself: the reason I didn't donate Diep is that competitors could
then also obtain my code, whereas I don't care if all those compiler
and hardware manufacturers have my program's source code.

Vincent

On Dec 5, 2008, at 2:44 PM, Mark Hahn wrote:

(Well, duh).

yeah - the point seems to be that we (still) need to scale memory
along with core count.  not just memory bandwidth but also concurrency
(number of banks), though "ieee spectrum online for tech insiders"
doesn't get into that kind of depth :(

I still usually explain this as "traditional (ie Cray) supercomputing
requires a balanced system." commodity processors are always less balanced than ideal, but to varying degrees. intel dual-socket quad-core was probably the worst for a long time, but things are looking up as intel
joins AMD with memory connected to each socket.

stacking memory on the processor is a red herring IMO, though they appear to assume that the number of dram banks will scale linearly with cores.
to me that sounds more like dram-based per-core cache.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

