Volker Armin Hemmann <volkerar...@googlemail.com> posted 200906110022.26698.volkerar...@googlemail.com, excerpted below, on Thu, 11 Jun 2009 00:22:26 +0200:
> On Thursday 11 June 2009, Greg wrote:
>> I've been having trouble determining if my processor has
>> hyper-threading. I'm thinking that it does. I know that it isn't a
>> dual-core.
>>
>> If it is a hyper-thread processor, I can't seem to figure out exactly
>> how to enable the hyper-thread under linux.
>
> no amd supports hyper-threading. They have that flag because they are
> compatible - and, if they are multicore, to 'trick' stupid software
> that checks for ht in order to multi-thread, but does not multi-thread
> on multicore cpus.

More to the point, AMD CPUs don't /need/ hyper-threading to run efficiently.

Here's the deal on hyper-threading. It first became popular (and I believe was first introduced, but I may be mistaken on that) with the Intel "Netburst" architecture, back in the last gasps of the clock-rate-is-everything era, when Intel was doing everything it could to wring those last few hundred MHz out of its CPUs, even at the expense of pipelines so deep that they actually hurt performance in many cases. (Plus those chips ran way hot, and sucked up power at such a rate that people were doing projections indicating that, at the rate things were going, in a few years each CPU was going to need its own nuclear-reactor power supply... and the cooling to go along with it!) Happily, Intel has moved beyond that stage now; with the Core 2s and beyond, and the move to true dual-core and beyond, it once again began competing extremely favorably against AMD. But Netburst was the last gasp of the old "ever higher clocks" approach, and it simply didn't compete well at all.

One of the things Intel did with Netburst to keep the clock rates rising was create an incredibly deep instruction pipeline.
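As a practical aside for the original question: the "ht" flag in /proc/cpuinfo alone doesn't tell you SMT is actually active, for exactly the compatibility reason Volker describes. On Linux, comparing the "siblings" field (logical CPUs per physical package) against "cpu cores" (physical cores per package) settles it: siblings > cpu cores means hyper-threading is really in use. Here's a minimal sketch; the cpuinfo excerpt is a made-up sample for illustration, not output from any particular chip.

```python
# Sketch: distinguish the compatibility "ht" flag from actual SMT.
# On a real system you'd read open("/proc/cpuinfo").read() instead of
# this hypothetical sample text.

SAMPLE_CPUINFO = """\
processor	: 0
flags		: fpu vme de pse tsc msr ht syscall
siblings	: 2
cpu cores	: 2
"""

def smt_active(cpuinfo_text: str) -> bool:
    # Parse "key : value" lines into a dict (first CPU entry is enough).
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields.setdefault(key.strip(), value.strip())
    has_ht_flag = "ht" in fields.get("flags", "").split()
    siblings = int(fields.get("siblings", "1"))
    cores = int(fields.get("cpu cores", "1"))
    # The flag says "HT-compatible"; the ratio says "actually doing SMT".
    return has_ht_flag and siblings > cores

print(smt_active(SAMPLE_CPUINFO))  # False: ht flag set, but siblings == cores
```

On an AMD chip of that era you'd typically see the ht flag set with siblings equal to cpu cores, i.e. no SMT, which matches Volker's point.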
Once the pipeline got full, the CPU still dispatched the typical instruction per clock tick (I say typical because some instructions take more than a tick, while others can be processed two at a time, so the detail is considerably more complex than one instruction per tick, but the general idea remains "typically" accurate). But each instruction took many ticks to work thru the pipeline, so the penalty was horrible for a branch mis-predict or other event that emptied the pipeline: the units at the end of the pipeline effectively had to sit there doing nothing for dozens of clock ticks, waiting for new instructions to be processed to that point again, refilling the pipeline. To some degree Intel could compensate with better branch prediction, pre-caching, and other techniques, but that wasn't nearly enough to make up for the penalty paid when a prediction was wrong, due to the incredibly deep pipelining.

So the Intel engineers came up with the solution the marketers billed as "hyper-threading", in order to try to claw back some of the performance they were losing to all this. Basically, they added a bit of very fast local storage for a second thread's state, giving the CPU access to it on a swapping basis. When one thread ran into a mis-prediction, thereby emptying the pipeline, instead of the components at the end of the pipeline waiting idle for several dozen clocks while the pipeline refilled, they swapped to the hyper-thread and continued working on it. Ideally, by the time that one got stuck, the first was ready to go again, so they could switch back to it while waiting on the other.
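The cost of that deep pipeline can be sketched with simple back-of-envelope arithmetic: average cycles per instruction is roughly the base CPI plus (branch frequency x mispredict rate x flush penalty), where the flush penalty scales with pipeline depth. The numbers below are purely illustrative assumptions, not measurements of any real chip.

```python
# Back-of-envelope sketch of why a deep pipeline hurts on mispredicts.
# All figures (branch frequency, predictor miss rate, stage counts)
# are assumed round numbers for illustration only.

def effective_cpi(base_cpi, branch_freq, mispredict_rate, flush_penalty):
    # Each mispredicted branch stalls the back end while the pipeline
    # refills, adding roughly flush_penalty wasted cycles.
    return base_cpi + branch_freq * mispredict_rate * flush_penalty

# Same workload (20% branches, 5% of them mispredicted), two depths:
deep = effective_cpi(1.0, 0.20, 0.05, 30)     # Netburst-like ~30-stage pipe
shallow = effective_cpi(1.0, 0.20, 0.05, 12)  # shorter classic pipe

print(f"deep pipeline CPI    ~ {deep:.2f}")    # ~1.30
print(f"shallow pipeline CPI ~ {shallow:.2f}") # ~1.12
```

So under identical branch behavior, the deeper pipe pays a substantially larger per-instruction tax, which is exactly the dead time hyper-threading was invented to fill with work from a second thread.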
Thus, what was really happening was that Intel was trying desperately to compensate for its design choice of an overly deep pipeline (forced on it by the pursuit of ever-faster clock rates), and the marketers billed hyper-threading, in reality a very clever but not really adequate compensation for a bad design choice, as a feature, and sold it surprisingly effectively.

Meanwhile, AMD saw the light and decided the MHz game simply wasn't going to work for it. AMD decided the loss of performance per clock it was seeing from continuing to play the MHz game just wasn't worth it, and deliberately did NOT keep targeting ever-increasing clock rates, choosing instead to emphasize its AMD64 instruction set and other features. As a result, AMD's chips didn't have to pay the price of the incredibly deep pipeline Intel was using; with their shorter pipeline, the penalty for mis-prediction was much lower as well, and hyper-threading didn't really make sense, because it wouldn't have helped much given the smaller mis-prediction penalty they were paying. Thus, AMD never needed hyper-threading as compensation for a bad design choice and never implemented it, and so never got to sell a very clever but still poor workaround for a poor design choice as a great feature, as Intel was doing at the time.

So that's where all the hype over hyper-threading first started. Eventually, tho, Intel realized the cost it was paying for pursuit of the MHz god wasn't worth it, and it came out with the Core 2s, which REALLY gave AMD a run for the money. (Truth be told, the Core 2s were spanking AMD's butt, performance-wise. Added to that, AMD in its turn slipped up with its original quad-core implementation in the Phenoms, handing Intel the win for another few quarters.
The problem, of course, being that Intel is a far larger company than AMD, so its fumbling for a couple years didn't hurt it nearly as much as AMD's fumbling for just a couple quarters!)

Soon enough the real multi-cores came out, and hyper-threading, as a rather poor substitute, was somewhat forgotten. However, Intel, having sold it as this great feature, found it was still in demand, with people wondering why their dual-cores couldn't use hyper-threading to appear as four cores, just as the single-core Netburst arch had appeared as dual-core. So the Intel marketing folks put their heads together with the engineering folks, and soon enough, hyper-threaded dual-cores were available as well. The new architecture didn't really gain that much from it, as Intel had long since worked thru its way-too-long-pipeline issues, so with the exception of rare corner cases, hyper-threading was now mostly buying performance directly from the real cores, and there was no gain under most loads that couldn't have been at least equally achieved by spending the same transistor budget elsewhere, say on more cache. But once the market had been programmed to accept hyper-threading as a solution, it demanded it, and seeing those extra "fake" cores listed /did/ look impressive, so Intel continued to provide what the market was now demanding, real performance gain or not.

That's where we are today. On a modern CPU, hyper-threading provides very little real performance gain, and arguably a loss, if one considers what else that same transistor budget could otherwise have been used for. But the market, once programmed for it, continues to demand it, so Intel continues to provide it.

The (main) source for much of my understanding at the level explained above is Ars Technica's CPU writeups over the years, with additional articles as found on Tom's Hardware, Slashdot, and elsewhere.
Of course, when Ars does it, it's complete with unit and instruction-flow diagrams, etc., plus much more detail than I gave above. Anybody interested in this sort of thing really should follow Ars; they have a guy who's a real expert following the industry for them, doing writeups on new developments generally some time after initial announcement, but before or immediately after initial full public release. I've been following the articles there since the Pentium Pro era, and the reliability level is very high.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman