Well, to every scientist who says he needs a lot of RAM now:
ECC DDR2 RAM costs next to nothing right now.

You can build nodes very cheaply now with, say, 4 cheap CPUs
and 128 GB of RAM inside.

There is no excuse for those who beg for big RAM not to buy a bunch of those
nodes.

What happens every time is that at the moment the price of some sort of RAM finally drops (note that ECC registered DDR RAM never got cheap, much to my disappointment), a newer generation of RAM is already there, which again is really
expensive.

I tend to believe that many algorithms that really require a lot of RAM could do with a bit less and profit from today's huge CPU power, using some clever tricks and enhancements and/or new algorithms (sometimes it is hard to say what counts as a new algorithm when it looks so much like a previous one with just a few new enhancements), which are probably
far from trivial.
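
As a toy illustration of that kind of trade (my own sketch, nothing taken from any specific scientific code): a big precomputed lookup table can often be replaced by recomputing the same values on the fly, spending CPU cycles that are nearly free today instead of RAM that is not.

/* Toy time-memory trade-off sketch (illustration only).
 * Variant A: a precomputed 64K-entry table of bit counts (costs RAM).
 * Variant B: recompute the bit count on the fly (costs CPU cycles).
 * With today's cheap CPU cycles, variant B is often fast enough. */
#include <stdint.h>
#include <stdlib.h>

static uint8_t *popcount_table;               /* variant A: 65536 bytes of RAM */

static void init_table(void)
{
    popcount_table = malloc(65536);
    for (unsigned v = 0; v < 65536; v++) {
        unsigned c = 0, x = v;
        while (x) { c += x & 1; x >>= 1; }
        popcount_table[v] = (uint8_t)c;
    }
}

static unsigned popcount_lookup(uint16_t v)    /* variant A: table lookup */
{
    return popcount_table[v];
}

static unsigned popcount_compute(uint16_t v)   /* variant B: recompute    */
{
    unsigned c = 0;
    while (v) { c += v & 1; v >>= 1; }
    return c;
}

Scale the same idea up to tables of many gigabytes and you see why the trade can matter.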

Usually, programming the 'new' algorithm efficiently at a low level is the killer problem that keeps it from being used yet (there is no budget to hire people who specialize in this, or they simply work for some other company or government body).

I would really argue that sometimes you have to give industry some time to mass-produce memory: just design a new generation of CPU based upon the RAM that's there now and read from that RAM massively
in parallel. That also gives a HUGE bandwidth.

If some older GPU based upon DDR3 RAM claims 106 GB/s of bandwidth to its RAM,
whereas today's Nehalem claims 32 GB/s and achieves 17 to 18 GB/s,
then obviously it wasn't important enough for Intel to give us more
bandwidth to RAM.

If NVIDIA/AMD GPUs could do it years before, and the latest CPU is a factor of 4+ off, then discussions
about bandwidth to RAM are quite artificial.
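
For the record, by "bandwidth to RAM" I mean what a simple streaming kernel actually achieves, not the datasheet number. A minimal sketch in the spirit of the STREAM triad (my own illustration; the array size and the POSIX clock_gettime timing are assumptions) would be something like this:

/* Minimal STREAM-triad-style sketch: measures achieved streaming
 * bandwidth over arrays far bigger than any cache. Illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)   /* 32M doubles per array, 256 MB each */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: 2 reads + 1 write per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double bytes = 3.0 * N * sizeof(double);
    printf("achieved streaming bandwidth: %.1f GB/s\n", bytes / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}

The number this prints is the achieved streaming bandwidth, which is where the 17-18 GB/s versus 32 GB/s gap shows up.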

The reason for that is SPEC's limit on RAM consumption: they design a benchmark
years beforehand to use an amount of RAM that is "common" now.

I would argue that the ones most hungry for bandwidth and per-core crunching power are the scientific world
and/or safety research (the aerospace and car industries).

Note that I'm speaking of streaming bandwidth above. Most scientists do not know the difference between bandwidth and latency, basically because they are right that, from a theoretical viewpoint, it is
all bandwidth-related in the end.

Yet in practice there are so many factors influencing latency. Intel/AMD/IBM are of course making big efforts to reduce latency a lot. Maybe 95% of all their work on a CPU goes into it (a blindfolded guess
from a computer science guy, so not a hardware designer)?
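
A concrete way to see the difference (again my own sketch, not anything from SPEC): a dependent pointer chase through a shuffled array makes every load wait for the previous one, so its runtime is set by memory latency, while a streaming kernel like the triad above is set by bandwidth.

/* Dependent pointer chase: each load depends on the previous one, so
 * prefetching and pipelining can't hide the round trip to RAM -- the
 * time per step approximates memory latency. Illustration only;
 * array size and shuffle are assumptions (assumes RAND_MAX >= N). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)   /* 32M entries, 256 MB: well beyond any cache */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);

    /* Build one random cycle over all N slots (Sattolo's algorithm). */
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t k = 0; k < N; k++)
        p = next[p];                       /* serialized: one cache miss per step */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("avg latency per dependent load: %.1f ns (p=%zu)\n",
           secs / N * 1e9, p);
    free(next);
    return 0;
}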

In the end it is all about the test sets in SPEC. If we finally manage to get a bunch of really WELL-OPTIMIZED low-level codes that eat gigabytes of RAM into SPEC, then within a few years AMD and Intel will show up with some
really fast CPUs for scientific workloads.

If all the "professor" types like RGB make a lot of noise worldwide to get that done, then they will have to follow.

As for criticism of Intel and AMD along the lines of "why not do this and that": I make it all the time too, but at the same time, if you look at what happens in SPEC, SPEC is only about "who has the best compiler and the biggest L2 cache that can nearly contain the entire working-set size of this tiny-RAM program".

Get some serious software into SPEC, I'd argue.

To start by looking at myself: the reason I didn't donate Diep is that competitors could
then also obtain my code, whereas I don't care if all those compiler
and hardware manufacturers have my program's source code.

Vincent

On Dec 5, 2008, at 2:44 PM, Mark Hahn wrote:

(Well, duh).

yeah - the point seems to be that we (still) need to scale memory
along with core count.  not just memory bandwidth but also concurrency
(number of banks), though "ieee spectrum online for tech insiders"
doesn't get into that kind of depth :(

I still usually explain this as "traditional (ie Cray) supercomputing
requires a balanced system." commodity processors are always less balanced than ideal, but to varying degrees. intel dual-socket quad-core was probably the worst for a long time, but things are looking up as intel
joins AMD with memory connected to each socket.

stacking memory on the processor is a red herring IMO, though they appear to assume that the number of dram banks will scale linearly with cores.
to me that sounds more like dram-based per-core cache.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

