Package: libc6
Version: 2.19-4
Severity: grave
Justification: causes non-serious data loss

Intel Broadwell-H and Skylake-S/H have critical errata that make HLE
extremely dangerous to use on those processors, resulting in
unpredictable behavior (i.e. process crashes when you are lucky, data
corruption when you are not) when hardware lock elision is enabled in
glibc/libpthread.

Broadwell errata BBD50 (desktop/mobile), BDW50 (server):

        An HLE (Hardware Lock Elision) transactional region begins with
        an instruction with the XACQUIRE prefix.  Due to this erratum,
        reads from within the transactional region of the memory
        destination of that instruction may return the value that was
        in memory before the transactional region began.

According to the Intel errata list, a firmware fix is possible, but I
have no idea whether it is done by toggling a boot-locked MSR that
disables HLE, or through a microcode update.  The MSR is more likely,
but if it is a microcode update, it is going to be as much of a hazard
as the Haswell one that disabled TSX+HLE.

I recommend that we extend the HLE blacklist in glibc to also include
CPU signature 0x40671.  This will disable HLE on Xeon E3-1200v4, and
5th-generation Core i5/i7.  These processors are supposed to already
have TSX disabled (errata BBD51/BDW51).
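
For reference, the CPU signature is the value returned in EAX by CPUID
leaf 1.  Below is a minimal sketch of how it decodes into family, model
and stepping, following the encoding documented in the Intel SDM.  The
struct and function names are made up for illustration and are not
glibc's.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of glibc. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode a CPUID leaf 1 EAX value into display family/model/stepping. */
static struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model = (eax >> 4) & 0xf;
    unsigned ext_family = (eax >> 20) & 0xff;
    unsigned ext_model = (eax >> 16) & 0xf;

    s.stepping = eax & 0xf;

    /* The extended family is only added when the base family is 0xf. */
    s.family = base_family;
    if (base_family == 0xf)
        s.family += ext_family;

    /* The extended model supplies the high nibble for families 6 and 0xf. */
    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        s.model |= ext_model << 4;

    return s;
}
```

With this decoding, 0x40671 is family 6, model 0x47, stepping 1.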

Skylake's latest public specification update still doesn't list any HLE
errata, but it is not really recent.  OTOH, there is a Gentoo user's
report that Skylake is also unstable when HLE is enabled in glibc and
that the crashes stop when glibc is compiled without lock elision.

For that reason, it might be a good idea to also blacklist HLE on CPU
signatures 0x506e1, 0x506e2 and 0x506e3, which would disable HLE on
Skylake-S and Skylake-H (6th gen Core i5/i7).  This won't cover the
Skylake Xeon E3-1200v5, for which there are no reports of breakage (nor
a public specification update I could find).
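
The combined blacklist proposed above amounts to an exact match on the
CPUID leaf 1 signature.  Here is a rough sketch of such a check.  The
table and function names are hypothetical, and this is not how glibc's
elision setup code is actually structured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signatures proposed in this report (hypothetical table, not glibc code). */
static const uint32_t hle_blacklist[] = {
    0x40671, /* Broadwell-H */
    0x506e1, /* Skylake-S/H */
    0x506e2, /* Skylake-S/H */
    0x506e3, /* Skylake-S/H */
};

/* Return true if HLE should be left disabled for this CPU signature. */
static bool hle_is_blacklisted(uint32_t sig)
{
    size_t n = sizeof hle_blacklist / sizeof hle_blacklist[0];

    for (size_t i = 0; i < n; i++)
        if (hle_blacklist[i] == sig)
            return true;
    return false;
}
```

On x86 with GCC or Clang, the signature of the running CPU can be read
with __get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h> and passed
in as sig.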

References: 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=762195
https://bbs.archlinux.org/viewtopic.php?id=202545

In hindsight, it looks like we would have been better off disabling
lock elision entirely for Debian jessie when we fixed #762195.
Something to consider when the time comes to fix this bug in stable
through a stable update...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
