On Feb 25, 2008, at 10:03 PM, Mark Hahn wrote:

I believe it's actually simultaneous: instructions from 2 different processes can run in the same cycle against 2 different register files.

for some definition of 'simultaneous'. I suspect that netburst-HT simply runs with a thread until it stalls, then switches. I don't think Intel ever detailed which stalling events do this. in some of the initial papers on netburst-HT, it was implied that the implementation was almost a side-effect of how the chip tracks in-flight operations. since no modern chip really has a unitary pipeline, HT might well tolerate one thread chugging through a microcoded transcendental at the same time as another thread, say, follows a pointer.

I had assumed so, but I appear to be confused about it. Hyperthreading keeps a thread ready to take advantage of stalls in a preceding thread, but doesn't ever actually perform a second instruction in one clock tick, correct?

I believe HT switching does happen cycle-by-cycle, and would guess that in-flight ops from multiple threads can coexist (though not executing on the same unit in the same cycle).
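The two readings of HT in the exchange above (run one thread until it stalls, then switch, versus issuing from both threads in the same cycle) can be contrasted with a toy issue model. This is a sketch under stated assumptions: the latencies, the 2-wide issue, and the function names `run_switch_on_stall` and `run_smt` are hypothetical, and neither policy is claimed to be what Intel actually implemented.

```python
# Toy model only: both issue policies are illustrative assumptions,
# not Intel's (undocumented) Hyper-Threading rules.
# A thread is a list of op latencies: 1 is an ordinary op; a larger value
# stalls that thread (e.g. on a cache miss) until the op completes.

def run_switch_on_stall(threads):
    """One thread issues at a time; switch only when it stalls or finishes."""
    n = len(threads)
    pcs = [0] * n        # index of each thread's next op
    ready_at = [0] * n   # first cycle each thread may issue again
    active, cycle = 0, 0

    def can_issue(t):
        return pcs[t] < len(threads[t]) and ready_at[t] <= cycle

    while any(pc < len(ops) for pc, ops in zip(pcs, threads)):
        if not can_issue(active):
            # Look for another thread that can issue this cycle.
            for k in range(1, n + 1):
                if can_issue((active + k) % n):
                    active = (active + k) % n
                    break
        if can_issue(active):
            ready_at[active] = cycle + threads[active][pcs[active]]
            pcs[active] += 1
        cycle += 1
    return max(ready_at, default=0)  # cycle at which the last op completes


def run_smt(threads, issue_width=2):
    """Up to issue_width ops per cycle, at most one per thread, from any ready thread."""
    n = len(threads)
    pcs = [0] * n
    ready_at = [0] * n
    cycle = 0
    while any(pc < len(ops) for pc, ops in zip(pcs, threads)):
        slots = issue_width
        for t in range(n):
            if slots and pcs[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + threads[t][pcs[t]]
                pcs[t] += 1
                slots -= 1
        cycle += 1
    return max(ready_at, default=0)


# Hypothetical workload: thread 0 hits one 5-cycle stall, thread 1 never stalls.
threads = [[1, 1, 5, 1], [1] * 8]
print(run_switch_on_stall(threads), run_smt(threads))  # 12 8
```

On this workload the switch-on-stall model already hides thread 0's stall completely (12 ops in 12 cycles). The simultaneous model finishes in 8 cycles only because it has a second issue slot; with `issue_width=1` it also takes 12 cycles.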

to me, this makes a lot more sense than manycore chips, actually. manycore basically assumes that tracking in-flight ops is the main scaling problem with modern chips. that may be the case, but I've never really heard it described as such.

imagine if, instead of 8 cores on-chip, you just had 8 "thread sequence" units that contained fetch/decode, architected registers and retirement, and a single big pool of scoreboarded functional units, of course. the advantage being that one thread could use many units, as opposed to a static 8-core where each thread gets only the unit(s) in its core...
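The advantage of the shared pool can be shown with an idealized per-cycle count. The unit counts and per-thread demands below are made-up numbers, and `static_cores` and `shared_pool` are hypothetical helpers that assume scheduling is free.

```python
# Idealized comparison with hypothetical numbers: each entry in `demands` is
# how many functional units a thread could keep busy in one cycle.

def static_cores(demands, units_per_core):
    """Each thread is confined to the units of its own core."""
    return [min(d, units_per_core) for d in demands]


def shared_pool(demands, units_per_core):
    """All units form one pool; grant them greedily in thread order."""
    left = units_per_core * len(demands)
    grants = []
    for d in demands:
        g = min(d, left)
        grants.append(g)
        left -= g
    return grants


demands = [6, 1, 1, 1, 1, 1, 1, 0]  # one high-ILP thread, seven mostly idle
print(sum(static_cores(demands, 2)), sum(shared_pool(demands, 2)))  # 8 12
```

With 2 units per core, the static layout caps the busy thread at 2 units (8 busy units in total), while the pool lets it take 6 (12 in total). What the sketch leaves out is the cost of scoreboarding one pool for all threads, which is the in-flight tracking problem raised above.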

Hi Mark,

Let's calculate with your imaginary chip, where you drop the multicore idea and have to give up out-of-order execution to get your thread-sequence idea to work:

If you've got 8 threads that each execute 1 instruction a cycle at 3 GHz, that's:

8 * 1 * 3 GHz = 24 Gflops double precision

Now let's compare with today's quad-core, a system we build for just 600 euro, like the nodes I'm planning to build now:

198 euro for the Intel chip at 2.4 GHz and 134 euro for the AMD chip at 2.2 GHz:

Here goes the calculation against your 24 Gflops:

4 cores * 3 instructions a cycle * 2 DP flops in each SSE2/SSE3 vector * 2.4 GHz = 24 * 2.4 = 48 + 9.6 = 57.6 Gflops
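Both peak figures come from the same product: units * instructions per cycle * flops per instruction * clock. This is a minimal sketch: the inputs are the assumptions stated in this message, not vendor-published peak numbers, and `peak_gflops` is a hypothetical helper. The clock is given in MHz so the product stays exact in integer arithmetic.

```python
def peak_gflops(units, instr_per_cycle, flops_per_instr, clock_mhz):
    """Peak double-precision Gflops; a clock in MHz keeps the product integral."""
    return units * instr_per_cycle * flops_per_instr * clock_mhz / 1000


# Imaginary chip: 8 thread units, 1 DP instruction a cycle each, 3 GHz.
imaginary = peak_gflops(8, 1, 1, 3000)
# Quad-core: 4 cores, 3 instructions a cycle, 2 DP flops per SSE2/SSE3 vector, 2.4 GHz.
quad = peak_gflops(4, 3, 2, 2400)
print(imaginary, quad)  # 24.0 57.6
```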

It is very hard to beat today's quad-cores with the imaginary CPU of the future.

Multicore and out-of-order execution are big winners that completely butcher RISC and the old Alpha engineers' SMT idea, with the exception of power usage.

Multicore right now means BOOM, you are nearly a factor 4.0 faster (3.8 in the case of my chess program), and out-of-order execution means you have a potential of 3 to 4 instructions a cycle, which is a big winner too.

Replacing that with another technique such as SMT means SMT needs to find a factor of 12 (4 cores * 3 instructions a cycle) in speed somewhere.

Vincent


I think the main take-home from netburst-HT is that SMT needs to provide more units, not just provide a new way for two threads to interfere.

regards, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
