On Thu, 01 Oct 2015, Henrique de Moraes Holschuh wrote:
> We have a fix for the HLE BDW50 errata confirmed for Broadwell-H,
> through updated microcode.
>
> Broadwell-H errata BDW50 fix:
> signature 0x40671, pf_mask 0x22, revision >= 0x12
>
> Which would allow us to selectively blacklist only Broadwell with
> "outdated" microcode. I can (and will) make a major ruckus about this
...
> There is also a Skylake microcode update available (dated 2015-08-08), I

Which has been confirmed to fix the HLE issue there, as well.

Further datapoints: other than the E7-v3 Xeons, it looks like HLE is
critically broken on anything running microcode older than 2015-04.  We
can probably depend on Broadwell-DE (Xeon D-1500) motherboards to ship
with new-enough microcode.

My search was not completely exhaustive and I don't have privileged
access to any Intel documentation, so I might have missed a processor
family or two, etc.  Still, I could not find *anything* supposed to
support TSX/HLE that didn't have either the old Haswell "TSX may cause
unpredictable system behavior" erratum, or the newer errata "TSX not
available" and "reading the memory destination of an instruction that
begins an HLE transaction may return the original value" listed.
Skylake's spec update doesn't list them yet as of 2015-10-04, but we
*know* it has either the same or very similar errata, and that it got
fixed by a recent microcode update.

So, for non-free and Ubuntu, microcode updates through the
intel-microcode package are likely to be a viable way to fix this: it
all depends on the required microcode updates being made available in
the first place.

But non-free is not Debian, people rarely update their firmware unless
you push hard for it, and it takes at least six months for fixed
microcode to be reasonably available through firmware updates.

Just ignoring the issue (read: passively documenting it), while still an
option, should be left as the least desirable choice IMO.

Unfortunately, blacklisting HLE by microcode revision would require
parsing /proc/cpuinfo ATM, which is not really desirable for the HLE
blacklist code, to put it lightly.  So, it looks like any blacklisting
done in the library code will have to be all-or-nothing: fixing the
processor by a microcode update will not lift the blacklist.
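
To make the /proc/cpuinfo dependency concrete, here is a minimal sketch
in Python (not something the library code itself could reasonably do) of
recovering the CPU signature and microcode revision from that file.  It
assumes the x86 field names "cpu family", "model", "stepping" and
"microcode" as printed by current Linux kernels; the function names and
the sample text are made up for illustration.  It does not recover the
microcode platform flags (pf_mask), which /proc/cpuinfo does not list.

```python
# Made-up sample in /proc/cpuinfo layout; values match signature 0x40671.
SAMPLE = """\
processor\t: 0
cpu family\t: 6
model\t\t: 71
model name\t: example CPU
stepping\t: 1
microcode\t: 0x12
"""


def cpu_signature(family, model, stepping):
    """Rebuild the CPUID(1).EAX signature from the values Linux displays."""
    # Linux shows the *effective* family and model, so split them back
    # into the base and extended fields used by the CPUID encoding.
    base_family = min(family, 0xF)
    ext_family = family - base_family
    return ((ext_family << 20) | ((model >> 4) << 16) |
            (base_family << 8) | ((model & 0xF) << 4) | stepping)


def parse_cpuinfo(text):
    """Return (signature, microcode_revision) for the first CPU listed."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # setdefault keeps the first occurrence, i.e. the first CPU.
            fields.setdefault(key.strip(), value.strip())
    signature = cpu_signature(int(fields["cpu family"]),
                              int(fields["model"]),
                              int(fields["stepping"]))
    # The kernel prints the revision in hex, e.g. "microcode : 0x12".
    return signature, int(fields["microcode"], 16)


sig, rev = parse_cpuinfo(SAMPLE)
print(hex(sig), hex(rev))  # prints "0x40671 0x12"
```

On a real system the text would come from reading /proc/cpuinfo instead
of SAMPLE.
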
Also, processors that share the same CPU signature have to be
blacklisted as a group, even if they take different microcode (which
would also be a problem for microcode-revision-based whitelists: we
*might* need to know the processor's microcode platform flags in some
cases).

I recommend that, for Debian stable (jessie), we switch to a
whitelist-based approach for HLE support, currently only whitelisting
the latest stepping of Haswell-EX (Xeon E7-v3) and Broadwell-DE (Xeon
D-1500).  We can revisit that decision in six months or one year, and
possibly switch back to blacklisting instead of whitelisting.

Only processors that are known to never have been widely deployed with
HLE errata would be eligible to be whitelisted.  This means at least
Broadwell, Broadwell-H, and Skylake-H/S would never get HLE support
reenabled in Debian jessie, a group that includes several Xeon
processors.  Obviously, if we ever find a way to make the blacklist
microcode-revision aware, we can do better.

For unstable, we could adopt the same whitelisting approach in the short
term (three to six months), while we work on something more flexible
that would allow processors that got a later-than-launch errata fix to
get delisted from the HLE "whitelist-based blacklist".

One should keep in mind that, if we add such blacklisting, we also need
to decide how we will deal with removals from the blacklist in the
future due to fixed microcode being made available: should we lift the
blacklisting for a processor signature, it will regress systems still
missing the microcode update (fixable by installing non-free
intel-microcode and rebooting before upgrading glibc).

It is possible to add preinst logic to abort glibc install/upgrades for
the "we are removing this processor signature from the blacklist, and
/proc/cpuinfo lists a microcode revision known to be broken" case.
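
The decision such a preinst check would make can be sketched as follows.
This is an illustration in Python only; a real preinst would more likely
be a shell script, and the table and function names here are
hypothetical.  The single table entry is the Broadwell-H datapoint quoted
at the top of this mail, and the sketch ignores pf_mask, which a real
check might need to consider.

```python
# Hypothetical table: CPU signature -> lowest microcode revision known to
# carry the HLE fix.  Only signatures being removed from the blacklist
# would be listed here.
FIXED_REVISION = {0x40671: 0x12}


def must_abort_upgrade(signature, revision, delisted=FIXED_REVISION):
    """True if this CPU was delisted but still runs pre-fix microcode."""
    minimum = delisted.get(signature)
    # Signatures that are not being delisted are unaffected by the change.
    return minimum is not None and revision < minimum
```

For example, must_abort_upgrade(0x40671, 0x11) is true (refuse the
upgrade), while must_abort_upgrade(0x40671, 0x12) is false.
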
This takes care of regressions (in a rather user-unfriendly way, though)
should we decide that users ought to either install firmware updates, or
tolerate installing non-free intel-microcode.  Something would need to
be done for the Debian installer as well (to address new installs).

I really wish we had without-HLE and with-HLE variants of glibc for
x86-64, with non-HLE being the preferred/default choice for now (the
preferred choice being something to revisit in the future, as working
HLE becomes more widespread).

Then, we could have as-complex-as-required blacklisting logic in the
preinst of the HLE variant, which could easily be made
microcode-revision aware, etc.  It would be really user-unfriendly when
tripped (refuse the install / abort the upgrade), but at least it would
be safer.

Comments?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh