Volker Armin Hemmann <volkerar...@googlemail.com> posted
200906110022.26698.volkerar...@googlemail.com, excerpted below, on  Thu,
11 Jun 2009 00:22:26 +0200:

> On Thursday 11 June 2009, Greg wrote:
>> I've been having trouble determining if my processor has
>> hyper-threading. I'm thinking that it does. I know that it isn't a
>> dual-core.
>>
>> If it is a hyper-thread processor, I can't seem to figure out exactly
>> how to enable the hyper-thread under linux.
> 
> No AMD CPU supports hyper-threading.  They have that flag because they
> are compatible - and, if they are multi-core, to 'trick' stupid software
> that checks for HT before multi-threading but doesn't multi-thread on
> multi-core CPUs.

More to the point, AMD CPUs don't /need/ hyper-threading to run 
efficiently.
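
As for the original question of how to "enable" it under Linux: there's 
nothing to switch on at runtime.  If the BIOS has hyper-threading enabled 
and the kernel is built SMP, the extra logical CPUs simply show up.  A 
quick way to check whether the kernel sees them is to compare the 
"siblings" and "cpu cores" fields in /proc/cpuinfo.  Here's a minimal C 
sketch of my own, assuming the usual x86 layout of that file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Compare the "siblings" and "cpu cores" fields the Linux kernel
       reports in /proc/cpuinfo.  With hyper-threading active,
       siblings > cpu cores; otherwise they're equal.  (The field names
       are an assumption based on the usual x86 layout; older
       single-core kernels may omit them entirely.) */
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) { perror("/proc/cpuinfo"); return 1; }

    char line[256];
    int siblings = 0, cores = 0;

    while (fgets(line, sizeof line, f)) {
        char *colon = strchr(line, ':');
        if (!colon)
            continue;
        if (!strncmp(line, "siblings", 8))
            siblings = atoi(colon + 1);
        else if (!strncmp(line, "cpu cores", 9))
            cores = atoi(colon + 1);
        if (siblings && cores)
            break;          /* the first physical package is enough */
    }
    fclose(f);

    if (siblings > cores)
        printf("hyper-threading: %d logical CPUs on a %d-core package\n",
               siblings, cores);
    else
        printf("no hyper-threading: %d core(s) per package\n", cores);
    return 0;
}

If "siblings" is greater than "cpu cores", the kernel is seeing 
hyper-thread siblings; if they're equal, it isn't.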

Here's the deal on hyper-threading.

It first became popular (and I believe was first introduced, but I may be 
mistaken on that) with the Intel "Netburst" architecture, back in the 
last gasps of the clock-rate-is-everything era, when Intel was doing 
everything it could to wring those last few hundred MHz out of its CPUs, 
even at the expense of pipelines so deep that they actually hurt 
performance in many cases.  (Plus the chips ran way hot, and sucked up 
power at such a rate that people were publishing projections indicating 
that, at the rate things were going, in a few years each CPU was going to 
need its own nuclear-reactor power supply... and the cooling to go along 
with it!)

Happily, Intel has moved beyond that stage now.  With the Core 2s and 
beyond, and the move to true dual-core and beyond, it once again began 
competing extremely favorably against AMD.  But Netburst was the last 
gasp of the old "ever higher clocks" approach, and it simply didn't 
compete well at all.

One of the things Intel did with Netburst to keep the clock rates rising 
was to create an incredibly deep instruction pipeline.  Once the pipeline 
was full, the CPU still dispatched the typical instruction per clock tick 
(I say typical because some instructions take more than a tick, while 
others can be processed two at a time, so the detail is considerably more 
complex than one instruction per tick, but the general idea remains 
"typically" accurate).  However, each instruction took many ticks to work 
through the pipeline, so the penalty for a branch mis-predict or any 
other event that emptied the pipeline was horrible: the units at the end 
of the pipeline effectively had to sit there doing nothing for dozens of 
clock ticks, waiting for the new instructions to be processed far enough 
to reach them again and refill the pipeline.  To some degree Intel could 
compensate with better branch prediction, pre-caching, and other 
techniques, but that wasn't nearly enough to make up for the penalty paid 
when a prediction was wrong, given the incredibly deep pipelining.
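
You can actually see the cost of mis-prediction from userspace.  Below is 
a rough C sketch of my own (not from any of the articles I mention 
later): it runs the same branchy loop over the same numbers twice, once 
shuffled (so the branch is a coin flip) and once sorted (so the branch is 
almost always predicted correctly).  The gap between the two timings is, 
in essence, accumulated mis-predict penalty, and the deeper the pipeline, 
the wider that gap gets.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)

/* Sum the elements >= 128.  The if() is the branch whose prediction
   matters: on shuffled data it's a coin flip, on sorted data it's
   almost always predicted correctly. */
static double timed_sum(const int *v)
{
    struct timespec t0, t1;
    long long sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        if (v[i] >= 128)
            sum += v[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    fprintf(stderr, "sum=%lld\n", sum);   /* keep the loop honest */
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

int main(void)
{
    int *v = malloc(N * sizeof *v);
    for (int i = 0; i < N; i++)
        v[i] = rand() % 256;

    double shuffled = timed_sum(v);       /* unpredictable branch */
    qsort(v, N, sizeof *v, cmp_int);
    double sorted = timed_sum(v);         /* predictable branch */

    printf("shuffled: %.3fs   sorted: %.3fs\n", shuffled, sorted);
    free(v);
    return 0;
}

Build it with something like "gcc -O1 branch.c -o branch" (older glibc 
may also want -lrt for clock_gettime).  Note that at higher optimization 
levels the compiler may turn the branch into a conditional move, which 
hides the effect entirely.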

So the Intel engineers came up with the solution the marketers billed as 
"hyper-threading", in order to try to claw back some of the performance 
being lost to all this.  Basically, they added a bit of very fast local 
storage holding a second thread's state, giving the CPU access to it on a 
swapping basis.  When one thread ran into a mis-prediction, thereby 
emptying the pipeline, the components at the end of the pipeline no 
longer sat idle for several dozen clocks while it refilled; instead, they 
swapped to the hyper-thread and continued working on it.  Ideally, by the 
time that one got stuck, the first was ready to go again, so they could 
switch back to it while waiting on the other.

Thus, what was really happening was that Intel was desperately trying to 
compensate for its design choice of an overly deep pipeline (forced on it 
by the pursuit of ever faster clock rates).  The marketers then billed 
hyper-threading - in reality a very clever but not really adequate 
compensation for a bad design choice - as a feature, and sold it 
surprisingly effectively.

Meanwhile, AMD saw the light and decided the MHz game simply wasn't going 
to work for it.  The loss of performance per clock from continuing to 
play that game just wasn't worth it, so AMD deliberately did NOT keep 
targeting ever increasing clock rates, choosing instead to emphasize its 
AMD64 instruction set and other features.

As a result, AMD's chips didn't have to pay the price of the incredibly 
deep pipeline Intel was using.  With their shorter pipeline, the penalty 
for a mis-prediction was much lower as well, so it didn't really make 
sense to do the hyper-threading thing: with so little mis-prediction 
penalty to hide, it wouldn't have helped much.

Thus, AMD never needed hyper-threading as compensation for bad design 
choices and never implemented it - and so never got to sell a very clever 
but still poor workaround for a poor design choice as a great feature, 
the way Intel was doing at the time.

So that's where all the hype over hyper-threading first started.  
Eventually, tho, Intel realized the cost it was paying in its pursuit of 
the MHz god wasn't worth it, and it came out with the Core 2s, which 
REALLY gave AMD a run for its money.  (Truth be told, the Core 2s were 
spanking AMD's butt, performance-wise.  On top of that, AMD in its turn 
slipped up with its original quad-core implementation in the Phenoms, 
handing Intel the win for another few quarters.  The problem, of course, 
is that Intel is a far larger company than AMD, so Intel's fumbling for a 
couple of years didn't hurt it nearly as much as AMD's fumbling for just 
a couple of quarters!)

Soon enough the real multi-cores came out, and hyper-threading, as a 
rather poor substitute, was somewhat forgotten.  However, Intel, having 
sold it as this great feature, found it was still in demand, with people 
wondering why their dual-cores couldn't use hyper-threading to appear as 
four cores, just as the single-core Netburst chips had appeared as dual-
core.

So the Intel marketing folks put their heads together with the 
engineering folks, and soon enough hyper-threaded dual-cores were 
available as well.  The new architecture didn't really gain that much 
from it, as Intel had long since worked through its way-too-long-pipeline 
issues.  With the exception of rare corner cases, hyper-threading was now 
mostly buying performance directly from the real cores, and under most 
loads there was no gain that couldn't have been at least equally achieved 
by spending the same transistor budget elsewhere, say on more cache.  But 
once the market had been programmed to accept hyper-threading as a 
solution, it demanded it, and seeing those extra "fake" cores listed 
/did/ look impressive, so Intel continued to provide what the market now 
demanded, real performance gain or not.
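
If you want to see for yourself how much of each "extra" hyper-thread 
core is real, one approach is to pin two compute-bound threads either to 
the two hyper-thread siblings of a single physical core, or to two 
separate physical cores, and compare wall time.  (The sibling pairs are 
listed in /sys/devices/system/cpu/cpu0/topology/thread_siblings_list and 
friends.)  Here's a rough, Linux-specific sketch of my own along those 
lines; pthread_setaffinity_np is a GNU extension, and the exact numbers 
will of course vary from chip to chip:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

/* Each thread pins itself to the logical CPU it's handed, then grinds
   through a fixed amount of independent integer work.  The independent
   accumulators keep a core's execution units busy, so two hyper-thread
   siblings end up sharing them, while two real cores don't. */
static void *spin(void *arg)
{
    int cpu = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);

    unsigned long long a = 1, b = 2, c = 3, d = 4;
    for (unsigned long long i = 0; i < 1000000000ULL; i++) {
        a = a * 2862933555777941757ULL + 3037000493ULL;
        b = b * 2862933555777941757ULL + 3037000493ULL;
        c = c * 2862933555777941757ULL + 3037000493ULL;
        d = d * 2862933555777941757ULL + 3037000493ULL;
    }
    return (void *)(uintptr_t)(a ^ b ^ c ^ d);   /* keep the work alive */
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <logical-cpu-a> <logical-cpu-b>\n",
                argv[0]);
        return 1;
    }
    int cpus[2] = { atoi(argv[1]), atoi(argv[2]) };
    pthread_t t[2];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, spin, &cpus[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("wall time: %.2fs\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}

Compile with "gcc -O2 -pthread", then run it once with two siblings of 
the same core and once with logical CPUs from different cores.  The 
sibling run takes noticeably longer, which is exactly the "performance 
bought from the real cores" described above.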

That's where we are today.  On a modern CPU, hyper-threading provides 
very little real performance gain - arguably a net loss, if one considers 
what else that same transistor budget could have been used for - but the 
market, once programmed for it, continues to demand it, so Intel 
continues to provide it.

The (main) source for much of my understanding at the level explained 
above is Ars Technica's CPU write-ups over the years, with additional 
articles as found on Tom's Hardware, Slashdot, and elsewhere.  Of course, 
when Ars does it, it comes complete with unit and instruction-flow 
diagrams, etc., plus much more detail than I gave above.  Anybody who's 
interested in this sort of thing really should follow Ars, as they have a 
guy who's a real expert following the industry for them, doing write-ups 
on new developments, generally some time after the initial announcement 
but before or immediately after the initial full public release.  I've 
been following the articles there since the Pentium Pro era and the 
reliability level is very high.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

