Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
> On Mon, Dec 4, 2023, 21:22 Richard Sandiford <richard.sandif...@arm.com> wrote:
>
>> Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
>> > This is an RTL pass that detects store forwarding from stores to larger
>> > loads (load pairs).
>> >
>> > This optimization is SPEC2017-driven and was found to be beneficial for
>> > some benchmarks,
>> > through testing on ampere1/ampere1a machines.
>> >
>> > For example, it can transform cases like
>> >
>> > str  d5, [sp, #320]
>> > fmul d5, d31, d29
>> > ldp  d31, d17, [sp, #312] # Large load from small store
>> >
>> > to
>> >
>> > str  d5, [sp, #320]
>> > fmul d5, d31, d29
>> > ldr  d31, [sp, #312]
>> > ldr  d17, [sp, #320]
>> >
>> > Currently, the pass is disabled by default on all architectures and
>> > enabled by a target-specific option.
>> >
>> > If deemed beneficial enough for a default, it will be enabled on
>> > ampere1/ampere1a,
>> > or other architectures as well, without needing to be turned on by this
>> > option.
>> >
>> > Bootstrapped and regtested on aarch64-linux.
>> >
>> > gcc/ChangeLog:
>> >
>> >         * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
>> >         * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
>> >       pass.
>> >         * config/aarch64/aarch64-protos.h
>> >       (make_pass_avoid_store_forwarding): Declare.
>> >         * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
>> >       option.
>> >       (aarch64-store-forwarding-threshold): New param.
>> >         * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o.
>> >         * doc/invoke.texi: Document new option and new param.
>> >         * config/aarch64/aarch64-store-forwarding.cc: New file.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >         * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
>> >         * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
>> >         * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis <manos.anagnosta...@vrull.eu>
>> > Co-Authored-By: Manolis Tsamis <manolis.tsa...@vrull.eu>
>> > Co-Authored-By: Philipp Tomsich <philipp.toms...@vrull.eu>
>> > ---
>> > Changes in v4:
>> >       - I had problems getting cselib_subst_to_values to work correctly,
>> >         so I used cselib_lookup to implement the exact same behaviour and
>> >         record the store value at the time we iterate over it.
>> >       - Removed the store/load_mem_addr check from is_forwarding as
>> >         unnecessary.
>> >       - The pass is called on all optimization levels right now.
>> >       - The threshold check should remain as it is, as we only care
>> >         about the front element of the list.  The comment above the
>> >         check explains why a single if is enough.
>>
>> I still think this is structurally better as a while.  There's no reason
>> in principle why we wouldn't want to record the stores in:
>>
>>         stp     x0, x1, [x4, #8]
>>         ldp     x0, x1, [x4, #0]
>>         ldp     x2, x3, [x4, #16]
>>
>> and then the two stores should have the same distance value.
>> I realise we don't do that yet, but still.
>>
> Ah, you mean forwarding from stp. I was a bit confused about what you meant
> the previous time. This was not initially meant for this patch, but I think
> it wouldn't take long to implement that before pushing this. It is your
> call, of course, whether I should include it.

No strong opinion either way, really.  It's definitely fine to check
only for STRs initially.  I'd just rather not bake that assumption into
places where we don't need to.
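
To be concrete, the shape I have in mind is something like the following
(the names "stores", "distance" and "threshold" are just placeholders
here, not the pass's actual internals):

  /* Drop every recorded store that is now too far away to be forwarded.
     A while loop (rather than a single if) also copes with several
     entries sharing the same distance, e.g. the two registers of an STP
     recorded at the same instruction.  */
  while (!stores.is_empty ()
         && stores[0].distance > threshold)
    stores.ordered_remove (0);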

If you do check for STPs, it'd probably be worth committing the pass
without it at first, waiting until Alex's patch is in, and then doing
STP support as a separate follow-up patch.  That's because Alex's patch
will convert STPs to using a single double-width memory operand.

(Waiting shouldn't jeopardise the chances of the patch going in, since
the pass was posted well in time and the hold-up isn't your fault.)

> [...]
>> > +/* Return true if STORE_MEM_ADDR is forwarding to the address of LOAD_MEM;
>> > +   otherwise false.  STORE_MEM_MODE is the mode of the MEM rtx containing
>> > +   STORE_MEM_ADDR.  */
>> > +
>> > +static bool
>> > +is_forwarding (rtx store_mem_addr, rtx load_mem, machine_mode store_mem_mode)
>> > +{
>> > +  /* Sometimes we do not have the proper value.  */
>> > +  if (!CSELIB_VAL_PTR (store_mem_addr))
>> > +    return false;
>> > +
>> > +  gcc_checking_assert (MEM_P (load_mem));
>> > +
>> > +  rtx load_mem_addr = get_addr (XEXP (load_mem, 0));
>> > +  machine_mode load_mem_mode = GET_MODE (load_mem);
>> > +  load_mem_addr = cselib_lookup (load_mem_addr, load_mem_mode, 1,
>> > +                              load_mem_mode)->val_rtx;
>>
>> Like I said in the previous review, it shouldn't be necessary to do any
>> manual lookup on the load address.  rtx_equal_for_cselib_1 does the
>> lookup itself.  Does that not work?
>>
> I thought you meant only that the if check was redundant here, which it
> was.

Ah, no, I meant the full lookup, sorry.
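
Roughly this kind of thing, that is (untested, and only a sketch of the
tail of is_forwarding, on the assumption that the address comparison is
all that's left to do at that point):

  /* rtx_equal_for_cselib_1 looks up the load address itself, so the
     explicit cselib_lookup shouldn't be needed.  */
  return rtx_equal_for_cselib_1 (store_mem_addr, XEXP (load_mem, 0),
                                 store_mem_mode, 0);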

> I'll check whether cselib can handle the load all by itself and reply.

Thanks!

Richard
