On Feb 25, 2008, at 10:03 PM, Mark Hahn wrote:
I believe it's actually simultaneous, instructions from 2
different processes can run in the same cycle against 2 different
register files.
for some definition of 'simultaneous'. I suspect that netburst-HT simply
runs with a thread until it stalls, then switches. I don't think Intel
ever detailed which stalling events do this. in some of the initial
papers on netburst-HT, it was implied that the implementation was almost
a side-effect of how the chip tracks in-flight operations. since no
modern chip really has a unitary pipeline, HT might well tolerate one
thread chugging through a microcoded transcendental at the same time as
another, say, follows a pointer.
had assumed so but I appear to be confused about it. Hyperthreading
keeps a thread ready to take advantage of stalls in a preceding thread,
but doesn't ever actually perform a second instruction in one clock
tick, correct? One
I believe HT switching does happen cycle-by-cycle, and would guess
that in-flight ops from multiple threads can coexist (not executing
on the same unit in the same cycle, though.)
to me, this makes a lot more sense than manycore chips, actually.
manycore basically assumes that tracking inflight ops is the main
scaling problem with modern chips. that may be the case, but I've
never really heard it described as such.
imagine if, instead of 8 cores onchip, you just had 8 "thread sequence"
units that contained fetch/decode, architected registers and retirement.
and a single big pool of scoreboarded functional units, of course.
the advantage being that one thread could use many units. as opposed to
a static 8-core where each thread gets only the unit(s) in its core...
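
A rough way to see the pooled-units argument is a one-cycle toy count.
This is only a sketch under loud assumptions: the function names, the
per-thread "ready ops" numbers and the one-unit-per-core split are all
made up for illustration, and the model ignores unit types, dependencies
and scoreboard latency entirely.

```python
# Toy model, not a simulator: counts how many ready ops can issue in ONE cycle.
# All numbers below are hypothetical and chosen only to show the effect.

def static_issue(ready_ops, units_per_core):
    """Static multicore: each thread may only use the units in its own core."""
    return sum(min(ops, units_per_core) for ops in ready_ops)

def pooled_issue(ready_ops, total_units):
    """Shared pool: any thread may use any free unit, up to the pool size."""
    return min(sum(ready_ops), total_units)

# Hypothetical ready-op counts for 8 threads in one cycle (uneven on purpose).
ready_ops = [4, 0, 0, 1, 0, 2, 0, 1]

static = static_issue(ready_ops, units_per_core=1)  # 8 cores, 1 unit each
pooled = pooled_issue(ready_ops, total_units=8)     # same 8 units, pooled

print(static, pooled)
```

With this uneven load the static split issues fewer ops than the pool,
because idle cores cannot lend their units to the busy thread. With a
perfectly even load the two counts would be the same.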
Hi Mark,
Let's calculate with your imaginary chip, where you get rid of the
multicore thought and also have to get rid of out of order in order to
get your thread sequence idea to work:
If you've got 8 threads that each execute 1 instruction a cycle,
that's:
8 * 1 * 3 GHz = 24 Gflop double precision
Now let's compare with today's quadcore, a system we build for just
600 euro, like the nodes I'm planning to build now:
198 euro for the Intel chip @ 2.4 GHz and 134 euro for the AMD @ 2.2 GHz.
Here goes the calculation against your 24 Gflop:
4 cores * 3 instructions a cycle * 2 DP in each SSE2/SSE3 vector *
2.4 GHz = 24 * 2.4 = 48 + 9.6 = 57.6 Gflop
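
The two peak figures above can be checked with a few lines of Python.
This is just the same back-of-envelope product written out; the helper
name peak_gflops is made up here, and the inputs are the assumptions
stated in the text (1 instruction a cycle per thread for the imaginary
chip, 3 instructions a cycle and 2 DP values per SSE vector for the
quadcore), not measured numbers.

```python
def peak_gflops(cores, instr_per_cycle, dp_per_instr, ghz):
    """Theoretical peak: cores * instructions/cycle * DP values/instruction * GHz."""
    return cores * instr_per_cycle * dp_per_instr * ghz

# Imaginary chip: 8 thread units, 1 scalar DP instruction a cycle, 3 GHz.
thread_chip = peak_gflops(cores=8, instr_per_cycle=1, dp_per_instr=1, ghz=3.0)

# Quadcore from the text: 4 cores, 3 instr/cycle, 2 DP per vector, 2.4 GHz.
quadcore = peak_gflops(cores=4, instr_per_cycle=3, dp_per_instr=2, ghz=2.4)

# Rounded to one decimal to hide floating-point noise in 24 * 2.4.
print(f"{thread_chip:.1f} {quadcore:.1f} {quadcore / thread_chip:.1f}")
```

On these assumptions the quadcore comes out at 2.4 times the peak of the
imaginary chip.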
It is very hard to beat today's quadcores with the imaginary cpu of
the future.
Multicore and out of order are big winners that completely butcher RISC
and the old Alpha engineers' SMT idea,
with the exception of power usage.
Multicore right now means BOOM, you are nearly a factor 4.0 faster (3.8
in case of my chessproggie), and out of order means you have
a potential of 3 to 4 instructions a cycle, which is a big winner too.
Replacing that with some other technique like SMT means the other
technique needs to find a factor 12 in speed somewhere.
Vincent
I think the main takehome from netburst-HT is that SMT needs to
provide more units, not just provide a new way for two threads to
interfere.
regards, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf