Quoting Mark Hahn <[EMAIL PROTECTED]>, on Thu 28 Feb 2008 07:33:07 AM PST:

The problem with many (cores|threads) is the memory bandwidth wall: a fixed-size pipe (B) to memory, with N requesters on that pipe ...

What wall? Bandwidth is easy; it just costs money, and not much at that. Want 50GB/sec[1]? Buy a $170 video card. Want 100GB/sec... buy a

Heh... if it were that easy, we would spend extra on more bandwidth for
Harpertown and Barcelona ...
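
To put rough numbers on that shared-pipe argument, here's a back-of-the-envelope sketch in Python (the per-socket bandwidth figure is purely an illustrative assumption, not the spec of any particular part):

# Back-of-the-envelope: a fixed memory pipe shared by N cores.
SOCKET_BANDWIDTH_GB_S = 25.6   # assumed total memory bandwidth per socket

for cores in (2, 4, 8, 64, 1024):
    per_core = SOCKET_BANDWIDTH_GB_S / cores
    print(f"{cores:5d} cores sharing the pipe -> {per_core:8.3f} GB/s each")

However you pick the numbers, the per-core share shrinks linearly with the core count unless the pipe grows with it.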

I think the point is that chip vendors are not talking about a mere
doubling of the number of cores, but (apparently with straight faces)
about things like 1k GP cores/chip.

Personally, I think they're in for a surprise: there isn't a vast
market for more than 2-4 cores per chip.

Perhaps not today. But then, Thomas Watson said there wasn't a vast market for computers... perhaps five, worldwide.

No question that folks will have to figure out how to effectively use all that parallelism (e.g., each processor deals with one page of a Word document, or a range of Excel cells?). I can see a lot of fairly easily coded things dealing with rapid search (e.g., which of my documents contain the words "hyperthreading" and "Hahn"). Right now, search and retrieval of unstructured data is a very computationally intensive task that millions of folks suffer through daily. (How many of you find Google over the web faster than Microsoft's "Search for File or Folder...", or grepping the entire disk, on your local machine?)
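
As a toy sketch of that kind of easily parallelized search (the directory, file types, and search terms here are made up for illustration; real desktop search tools build indexes rather than brute-force scanning):

from multiprocessing import Pool
from pathlib import Path

TERMS = ("hyperthreading", "hahn")   # both words must appear (case-insensitive)

def matches(path):
    # Return the path if the file contains every search term, else None.
    try:
        text = Path(path).read_text(errors="ignore").lower()
    except OSError:
        return None
    return path if all(term in text for term in TERMS) else None

if __name__ == "__main__":
    # Hypothetical corpus: every .txt file under ./docs
    files = [str(p) for p in Path("docs").rglob("*.txt")]
    with Pool() as pool:               # one worker process per available core
        hits = [p for p in pool.map(matches, files) if p]
    print("\n".join(hits))

Each file can be scanned independently, so the work fans out across however many cores you have.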


And we cluster dweebs have a head start on them... we've been figuring out how to spread problems that are too big to fit on one node across multiple nodes for years now. After all, billg's programming fame is from a flood-fill graphics algorithm, and look how well he's done with that <grin>.



limits, and no programming technique is going to get you around that
limit per socket. You need to change your programming technique to go
many-socket. That limit is the bandwidth wall.

IMO, this is the main fallacy behind the current industry harangue.
The problem is _NOT_ that programmers are dragging their feet, but
rather some combination of Amdahl's law and the low average _inherent_
parallelism of computation. (I'm _not_ talking about MC or graphics
rendering here, but today's most common computer uses: web and email.)

Text search and retrieval is where it's at. Almost 30 years ago I worked on developing a piece of office equipment, the size of a two-drawer file cabinet, that would do just that, hooked up to a bunch of word processors (i.e., "find me that letter we sent to John Smith"). It was expensive! It had an 80MB (or 160MB) disk drive (huge!), and it could search thousands of pages in the blink of an eye. (It was called the OFISfile, sold by Burroughs.) And people DID buy it. And, without giving away the internals, it could have made excellent use of a 1000-core type of processor.

Granted, the Googles of the world will (correctly) contend that an equally good solution is a good comm link to a centralized search and retrieval engine (it doesn't even have to be that fast... just comparable to the time it takes me to enter the request and read the results). But they, too, can use parallelism.




The manycore cart is being put before the horse. Worse, no one has really
shown that manycore (and the presumed ccNUMA model) is actually
scalable to large core counts on "normal" workloads. (Getting good scaling
for an AM CFD code on 128 cores in an Altix is kind of a different proposition
than scaling to 128 cores in a single chip.)

To a certain extent it's an example of "build it and they will come" (for 10% of the things that are built; the other 90% are interesting blips left by the side of the road).

When compilers were introduced, I'm sure the skilled machine language coders said... hmmph, we can do just fine with our octal and hex, there's no expressed demand for high-level languages. (Kids... get offa my lawn!) Heck, the plugboard programmers on EAM equipment probably said the same thing to the guys working with stored-program computers. And before that, the supervisor of the computer pool probably said it to the plugboard guys, as he gazed over a room full of Marchant calculators with human computers punching in numbers and pulling the handles.



What's missing is a reason to think that basically all workloads can be made
cache-friendly enough to scale to 10^2 or 10^3 cores. I just don't see that.

Not all workloads... just enough to form a significant market. And text search and retrieval is a pretty big consumer of CPU cycles in the big wide world (as opposed to the specialized world of large numeric simulations and the like that have historically been hosted on clusters).

Remember, the recurring cost is basically related to the size of the die, not what's on it. So if there's a significant market for 10,000-processor widgets, they'll be made, and cheaply.

As data rates get higher, even really good bit error rates on the wire get to be too big. Consider this... a BER of 1E-10 is quite good, but if you're pumping 10Gb/s over the wire, that's an error every second. (A BER of 1E-10 is a typical rate for something like a 100Mbps link...) So, practical systems
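
A quick sanity check on that arithmetic (using only the link speeds and BERs mentioned above):

# Expected raw bit errors per second = bit error rate * bit rate.
def errors_per_second(ber, bits_per_second):
    return ber * bits_per_second

for label, rate in (("100 Mbps", 1e8), ("10 Gbps", 1e10)):
    for ber in (1e-10, 1e-12):
        print(f"{label} at BER {ber:.0e}: {errors_per_second(ber, rate):.2g} errors/sec")

At 10Gb/s and 1E-10 that works out to one error per second, as stated.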

I'm no expert, but 1e-10 seems quite high to me. The docs I found about 10G
requirements all specified 1e-12, and claimed to have achieved 1e-15 in
realistic, long-range tests...

That's probably the error rate above the PHY layer, i.e., after the forward error correction. And the 10G requirement is tighter than the 100Mbps requirement, just to keep FEC workable with reasonable redundancy. Typically, you want the raw PHY BER to be at least 100x below the reciprocal of the bit rate (e.g., 1E8 bps -> 1E-10 BER, 1E10 bps -> 1E-12 BER).
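
A minimal sketch of that rule of thumb, assuming the reading above (raw BER at least a factor of 100 below the reciprocal of the bit rate):

# Rule-of-thumb target: raw PHY BER <= 1 / (margin * bit_rate),
# i.e., no more than roughly one raw bit error per `margin` seconds.
def target_ber(bits_per_second, margin=100):
    return 1.0 / (margin * bits_per_second)

for label, rate in (("100 Mbps", 1e8), ("10 Gbps", 1e10)):
    print(f"{label}: target raw PHY BER <= {target_ber(rate):.0e}")

That reproduces the two examples: 1E-10 for a 1E8 bps link and 1E-12 for a 1E10 bps link.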



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
