On 6/13/24 5:32 AM, Manolis Tsamis wrote:
This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:
strb w2, [x1, 1]
ldr x0, [x1] # Expensive store forwarding to larger load.
To:
ldr x0, [x1]
strb w2, [x1]
bfi x0, x2, 0, 8
Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
AMD 5950X: +17.5%
gcc/ChangeLog:
* Makefile.in: Add avoid-store-forwarding.o.
* common.opt: New option -favoid-store-forwarding.
* params.opt: New param store-forwarding-max-distance.
* doc/invoke.texi: Document new pass.
* doc/passes.texi: Document new pass.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
Just a note for others. I've sent Manolis's a few more failures spit
out by my tester. The crosses aren't quite all clean, but they're
getting closer. Once the crosses are clean we'd run the QEMU emulated
targets which take dramatically longer (but which are also a deeper test
for this kind of change).
Jeff