Re: [Beowulf] Opinions of Hyper-threading?

Vincent Diepeveen Wed, 27 Feb 2008 11:46:57 -0800


On Feb 25, 2008, at 10:03 PM, Mark Hahn wrote:

I believe it's actually simultaneous, instructions from 2 different processes can run in the same cycle against 2 different register files.
for some definition of 'simultaneous'. I suspect that netburst-HT simply runs with a thread until it stalls, then switches. I don't think Intel ever detailed which stalling events do this. in some of the initial papers
on netburst-HT, it was implied that the implementation was almost a
side-effect of how the chip tracks in-flight operations. since no modern chip really has a unitary pipeline, HT might well tolerate one thread chugging through a microcoded transcendental at the same time as another,
say, follows a pointer.
had assumed so but I appear to be confused about it. Hyperthreading keeps a thread ready to take advantage of stalls in a preceding thread, but doesn't ever actually perform a second instruction in one clock tick, correct? One
I believe HT switching does happen cycle-by-cycle, and would guess that in-flight ops from multiple threads can coexist (not executing on the same unit in the same cycle, though.)
to me, this makes a lot more sense than manycore chips, actually.
manycore basically assumes that tracking inflight ops is the main scaling problem with modern chips. that may be the case, but I've never really heard it described as such.
imagine if, instead of 8 cores onchip, you just had 8 "thread sequence" units that contained fetch/decode, architected registers and retirement. and a single big pool of scoreboarded functional units, of course. the advantage being that one thread could use many units. as opposed to a static 8-core where each thread gets only the unit(s) in its core...


Hi Mark,

Let's calculate with your imaginary chip, where you get rid of the multicore thought and have to get rid of out of order in order to get your thread sequence idea to work:


If you've got 8 threads that each execute 1 instruction a cycle,
that's:

8 * 1 * 3 GHz = 24 Gflop double precision

Now let's compare with today's quadcore, a system we build for just 600 euro, like the nodes I'm planning to build now:


198 euro for the Intel chip @ 2.4 GHz and 134 euro for the AMD chip @ 2.2 GHz:

here goes calculation against your 24 gflop:

4 cores * 3 instructions a cycle * 2 DP in each SSE2/SSE3 vector * 2.4 GHz = 24 * 2.4 = 48 + 9.6 = 57.6 Gflop

It is very hard to beat today's quadcores with the imaginary CPU of the future.
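
As a quick check of the arithmetic above, here is a small Python sketch. The helper name peak_gflops and its parameters are made up for illustration; it assumes "peak" simply means units * instructions per cycle * DP values per instruction * clock in GHz, which is how the two figures above are built. It says nothing about what real code would sustain.

```python
# Peak-throughput arithmetic for the two designs compared above.
# Assumption: peak = units * instr/cycle * DP values/instr * clock (GHz).
# peak_gflops is a hypothetical helper, not taken from any library.

def peak_gflops(units, instr_per_cycle, dp_per_instr, clock_ghz):
    """Return theoretical peak double-precision Gflop/s."""
    return units * instr_per_cycle * dp_per_instr * clock_ghz

# Imaginary chip: 8 in-order thread units, 1 scalar instruction a cycle, 3 GHz.
imaginary = peak_gflops(units=8, instr_per_cycle=1, dp_per_instr=1, clock_ghz=3.0)

# Quadcore: 4 cores, 3 instructions a cycle, 2 DP per SSE2/SSE3 vector, 2.4 GHz.
quadcore = peak_gflops(units=4, instr_per_cycle=3, dp_per_instr=2, clock_ghz=2.4)

print(f"imaginary chip: {imaginary:.1f} Gflop")
print(f"quadcore:       {quadcore:.1f} Gflop")
print(f"ratio:          {quadcore / imaginary:.2f}x")
```

Changing instr_per_cycle or dp_per_instr for the imaginary chip shows how much per-thread issue width or vector width it would need to catch up.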

Multicore and out of order are big winners that butcher RISC and the old Alpha engineers' SMT idea completely, with the exception of power usage.

Multicore right now means BOOM, you are nearly a factor 4.0 faster (3.8 in the case of my chess program), and out of order means you have a potential of 3 to 4 instructions a cycle, which is a big winner too.

Replacing that with some other technique such as SMT means the other technique needs to find a factor 12 in speed somewhere (4 cores times 3 instructions a cycle).
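
The factor 12 is not spelled out step by step, so the sketch below is one reading of it: 4 cores times the lower end of the "3 to 4 instructions a cycle" potential. All variable names are made up for illustration; the 3.8 speedup is the chess program figure quoted above.

```python
# One reading of where "factor 12" comes from (an assumption, see above).
cores = 4
ipc_potential = 3          # lower end of "3 to 4 instructions a cycle"
measured_speedup = 3.8     # 4-core speedup reported for the chess program

# Ideal combined factor an alternative design would have to recover.
ideal_factor = cores * ipc_potential

# Same product using the measured multicore speedup instead of the ideal 4.0.
measured_factor = measured_speedup * ipc_potential

# Parallel efficiency of the 4-core run.
efficiency = measured_speedup / cores

print(f"ideal factor:    {ideal_factor}")
print(f"measured factor: {measured_factor:.1f}")
print(f"efficiency:      {efficiency:.0%}")
```

Using the upper end of 4 instructions a cycle would raise the ideal factor to 16, so 12 is the conservative number.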


Vincent

I think the main takehome from netburst-HT is that SMT needs to provide more units, not just provide a new way for two threads to interfere.
regards, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

