craig.topper added a comment. In D70157#1747726 <https://reviews.llvm.org/D70157#1747726>, @davezarzycki wrote:
> In D70157#1747640 <https://reviews.llvm.org/D70157#1747640>, @jyknight wrote: > > > In D70157#1747551 <https://reviews.llvm.org/D70157#1747551>, @davezarzycki > > wrote: > > > > > In D70157#1747510 <https://reviews.llvm.org/D70157#1747510>, @xbolva00 > > > wrote: > > > > > > > > Even though core2 isn't affected by the erratum, core2 code can run > > > > > on CPUs that do have the bug (and core2 is a popular target for code > > > > > that needs to run "everywhere"), therefore all target CPUs that > > > > > predate a hardware fix really > > > > > > > > So perf of all code/generic codegen will go down. This is not very > > > > acceptable. -1 for making this enabled by default for generic codegen. > > > > > > > > > I'm okay with this not being the default if we document somewhere that > > > LLVM assumes that machines have up to date microcode installed. > > > > > > So, my understanding is: > > > > 1. There is a CPU bug, that in very rare but unspecified circumstances, can > > cause branches of all kinds (conditional-jump, fused-conditional-jump, > > unconditional jump, call, return, indirect jumps and calls?) to go to the > > wrong place, if the instruction crosses a 64-byte cacheline boundary. > > 2. In order to fix that bug, the new CPU firmware disables the faster > > decoded-icache mechanism, falling back to the legacy decoder for any > > 32-byte block of code which ends with one of those jumps (because it works > > in 32-byte blocks, not 64-byte blocks). Falling back to the legacy decoder > > can have a sometimes-significant performance cost. Of course, this is not > > the _only_ reason that the CPU falls back to the legacy decoder, but it's a > > significant new one. > > > > This patch, then, spends some memory and icache space (and potentially > > decode-time) in order to try to get back the performance lost by the above > > fix. This can be worthwhile because if you can cause there not to be any > > 32-byte code sections that have a branch at the end, then the fallback to > > the legacy decoder won't trigger, and the performance will hopefully return > > to something comparable to the initial state. Unfortunately, while this > > ought to increase performance impact on all of the affected processors, it > > will have only negative effects on unaffected processors, since then you're > > spending memory and icache space, and not getting anything back in return. > > > > Similarly, if you're doing this extra alignment on code that isn't > > executed repeatedly out of the icache, it's also going to be useless. > > > > So, I'd say it's _not at all_ clear that it should be enabled by default. > > > > I'm also a bit confused as to why the default is set as it is. Why is the > > default only fused+jcc+jmp, not all branch types? > > > Strictly speaking, Intel doesn't say what happens if this bug occurs, nor do > they say that branches themselves are the problem, just "branch instruction > bytes" and that "unpredictable system behavior may occur". Compare and > contrast with the subsequent erratum where they explicitly say that x86, AVX, > and integer divide instructions (not instruction bytes) can have > unpredictable behavior. See SKX102 and SKX103: > https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html > > But we're digressing. I think the default policy discussion might be better > had on llvm-dev than a Phab review. I believe the unpredictable behavior erratum only affects conditional jumps. But the microcode update to workaround the erratum used a bigger hammer that effects all kinds of jumps, calls, and returns. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70157/new/ https://reviews.llvm.org/D70157 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits