[PATCH] D70157: Align branches within 32-Byte boundary

James Y Knight via Phabricator via cfe-commits Sat, 16 Nov 2019 07:52:11 -0800

jyknight added a comment.

In D70157#1747551 <https://reviews.llvm.org/D70157#1747551>, @davezarzycki 
wrote:

> In D70157#1747510 <https://reviews.llvm.org/D70157#1747510>, @xbolva00 wrote:
>
> > > Even though core2 isn't affected by the erratum, core2 code can run on 
> > > CPUs that do have the bug (and core2 is a popular target for code that 
> > > needs to run "everywhere"), therefore all target CPUs that predate a 
> > > hardware fix really
> >
> > So perf of all code/generic codegen will go down. This is not very 
> > acceptable. -1 for making this enabled by default for generic codegen.
>
>
> I'm okay with this not being the default if we document somewhere that LLVM 
> assumes that machines have up to date microcode installed.

So, my understanding is:

1. There is a CPU bug, that in very rare but unspecified circumstances, can 
cause branches of all kinds (conditional-jump, fused-conditional-jump, 
unconditional jump, call, return, indirect jumps and calls?) to go to the wrong 
place, if the instruction crosses a 64-byte cacheline boundary.
2. In order to fix that bug, the new CPU firmware disables the faster 
decoded-icache mechanism, falling back to the legacy decoder for any 32-byte 
block of code which ends with one of those jumps (because it works in 32-byte 
blocks, not 64-byte blocks). Falling back to the legacy decoder can have a 
sometimes-significant performance cost. Of course, this is not the _only_ 
reason that the CPU falls back to the legacy decoder, but it's a significant 
new one.

This patch, then, spends some memory and icache space (and potentially 
decode-time) in order to try to get back the performance lost by the above fix. 
This can be worthwhile because if you can cause there not to be any 32-byte 
code sections that have a branch at the end, then the fallback to the legacy 
decoder won't trigger, and the performance will hopefully return to something 
comparable to the initial state. Unfortunately, while this ought to increase 
performance impact on all of the affected processors, it will have only 
negative effects on unaffected processors, since then you're spending memory 
and icache space, and not getting anything back in return.

Similarly, if you're doing this extra alignment on code that isn't executed 
repeatedly out of the icache, it's also going to be useless.

So, I'd say it's _not at all_ clear that it should be enabled by default.

I'm also a bit confused as to why the default is set as it is. Why is the 
default only fused+jcc+jmp, not all branch types?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70157: Align branches within 32-Byte boundary

Reply via email to