craig.topper added a comment.

In D70157#1747726 <https://reviews.llvm.org/D70157#1747726>, @davezarzycki 
wrote:

> In D70157#1747640 <https://reviews.llvm.org/D70157#1747640>, @jyknight wrote:
>
> > In D70157#1747551 <https://reviews.llvm.org/D70157#1747551>, @davezarzycki 
> > wrote:
> >
> > > In D70157#1747510 <https://reviews.llvm.org/D70157#1747510>, @xbolva00 
> > > wrote:
> > >
> > > > > Even though core2 isn't affected by the erratum, core2 code can run 
> > > > > on CPUs that do have the bug (and core2 is a popular target for code 
> > > > > that needs to run "everywhere"), therefore all target CPUs that 
> > > > > predate a hardware fix really
> > > >
> > > > So perf of all code/generic codegen will go down. This is not very 
> > > > acceptable. -1 for making this enabled by default for generic codegen.
> > >
> > >
> > > I'm okay with this not being the default if we document somewhere that 
> > > LLVM assumes that machines have up to date microcode installed.
> >
> >
> > So, my understanding is:
> >
> > 1. There is a CPU bug, that in very rare but unspecified circumstances, can 
> > cause branches of all kinds (conditional-jump, fused-conditional-jump, 
> > unconditional jump, call, return, indirect jumps and calls?) to go to the 
> > wrong place, if the instruction crosses a 64-byte cacheline boundary.
> > 2. In order to fix that bug, the new CPU firmware disables the faster 
> > decoded-icache mechanism, falling back to the legacy decoder for any 
> > 32-byte block of code which ends with one of those jumps (because it works 
> > in 32-byte blocks, not 64-byte blocks). Falling back to the legacy decoder 
> > can have a sometimes-significant performance cost. Of course, this is not 
> > the _only_ reason that the CPU falls back to the legacy decoder, but it's a 
> > significant new one.
> >
> >   This patch, then, spends some memory and icache space (and potentially 
> > decode-time) in order to try to get back the performance lost by the above 
> > fix. This can be worthwhile because if you can cause there not to be any 
> > 32-byte code sections that have a branch at the end, then the fallback to 
> > the legacy decoder won't trigger, and the performance will hopefully return 
> > to something comparable to the initial state. Unfortunately, while this 
> > ought to increase performance impact on all of the affected processors, it 
> > will have only negative effects on unaffected processors, since then you're 
> > spending memory and icache space, and not getting anything back in return.
> >
> >   Similarly, if you're doing this extra alignment on code that isn't 
> > executed repeatedly out of the icache, it's also going to be useless.
> >
> >   So, I'd say it's _not at all_ clear that it should be enabled by default.
> >
> >   I'm also a bit confused as to why the default is set as it is. Why is the 
> > default only fused+jcc+jmp, not all branch types?
>
>
> Strictly speaking, Intel doesn't say what happens if this bug occurs, nor do 
> they say that branches themselves are the problem, just "branch instruction 
> bytes" and that "unpredictable system behavior may occur". Compare and 
> contrast with the subsequent erratum where they explicitly say that x86, AVX, 
> and integer divide instructions (not instruction bytes) can have 
> unpredictable behavior. See SKX102 and SKX103: 
> https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
>
> But we're digressing. I think the default policy discussion might be better 
> had on llvm-dev than a Phab review.


I believe the unpredictable behavior erratum only affects conditional jumps. 
But the microcode update to workaround the erratum used a bigger hammer that 
effects all kinds of jumps, calls, and returns.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to