On 6/13/24 5:32 AM, Manolis Tsamis wrote:
This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example, it can transform this:

      strb    w2, [x1, 1]
      ldr     x0, [x1]      # Expensive store forwarding to larger load.

To:

      ldr     x0, [x1]
      strb    w2, [x1, 1]
      bfi     x0, x2, 8, 8
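
The rewrite is valid only because the bit insert rebuilds exactly the value the load would have seen after the store. Below is a minimal C sketch of that equivalence. It models memory as an 8-byte buffer and assumes the stored byte lies entirely inside the loaded word. The helper names (`bfi`, `store_then_load`, `load_then_insert`, `rewrite_is_equivalent`) are hypothetical illustrations and are not taken from the pass. The insert position depends on byte order, so the sketch checks the host's endianness at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling AArch64 BFI: copy the low `width` bits of
   `src` into `dst` starting at bit `lsb`, leaving all other bits unchanged. */
static uint64_t bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
    assert(width > 0 && width < 64 && lsb + width <= 64);
    uint64_t mask = ((UINT64_C(1) << width) - 1) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    uint16_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Original order: narrow store, then an overlapping wide load. */
static uint64_t store_then_load(unsigned char mem[8], uint8_t v, unsigned off)
{
    uint64_t x;
    mem[off] = v;              /* strb w2, [x1, off] */
    memcpy(&x, mem, sizeof x); /* ldr  x0, [x1]      */
    return x;
}

/* Rewritten order: wide load first, then the store, then a bit insert
   that patches the stored byte into the already-loaded value. */
static uint64_t load_then_insert(unsigned char mem[8], uint8_t v, unsigned off)
{
    uint64_t x;
    memcpy(&x, mem, sizeof x); /* ldr  x0, [x1]      */
    mem[off] = v;              /* strb w2, [x1, off] */
    /* Byte `off` sits at bit 8*off on little-endian, 8*(7-off) on big-endian. */
    unsigned lsb = host_is_little_endian() ? 8 * off : 8 * (7 - off);
    return bfi(x, v, lsb, 8);  /* bfi  x0, x2, lsb, 8 */
}

/* Returns 1 if both orders yield the same loaded value and the same final
   memory contents for every byte offset inside the 8-byte load. */
int rewrite_is_equivalent(void)
{
    for (unsigned off = 0; off < 8; off++) {
        unsigned char a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        unsigned char b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        if (store_then_load(a, 0xAB, off) != load_then_insert(b, 0xAB, off))
            return 0;
        if (memcmp(a, b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

Calling `rewrite_is_equivalent()` should return 1 on either byte order. In the reordered version the load no longer depends on the in-flight store, so no store-to-load forwarding is needed.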

Assembly like this can appear with bitfields or type punning / unions.
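
Here is a minimal C sketch of the union case. The names `word_bytes` and `set_byte1_and_read` are hypothetical. The code writes one byte and then immediately reads the whole word back, which gives the narrow-store/wide-load pair shown in the first assembly example. Whether a particular compiler emits exactly that `strb`/`ldr` sequence depends on the target and the optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* The same 8 bytes viewed either as one 64-bit word or as individual bytes.
   Reading a different union member than the one last written is
   well-defined type punning in C. */
union word_bytes {
    uint64_t word;
    uint8_t  bytes[8];
};

static_assert(sizeof(union word_bytes) == sizeof(uint64_t),
              "union is expected to be exactly one 64-bit word");

/* A byte store at offset 1 followed by an overlapping 64-bit load: the
   pattern that would otherwise need expensive store-to-load forwarding. */
uint64_t set_byte1_and_read(union word_bytes *p, uint8_t v)
{
    p->bytes[1] = v;
    return p->word;
}
```

Bitfield writes followed by a read of the containing word can produce the same shape of code.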
When running the cpu-union microbenchmark from stress-ng, the following
speedups have been observed:

   Neoverse-N1:      +29.4%
   Intel Coffeelake: +13.1%
   AMD 5950X:        +17.5%

gcc/ChangeLog:

        * Makefile.in: Add avoid-store-forwarding.o.
        * common.opt: New option -favoid-store-forwarding.
        * params.opt: New param store-forwarding-max-distance.
        * doc/invoke.texi: Document new pass.
        * doc/passes.texi: Document new pass.
        * passes.def: Schedule a new pass.
        * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
        * avoid-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-5.c: New test.

Just a note for others: I've sent Manolis a few more failures spit out by my tester. The crosses aren't quite all clean yet, but they're getting closer. Once the crosses are clean, we'll run the QEMU-emulated targets, which take dramatically longer (but are also a deeper test for this kind of change).

Jeff

