Hi Richard,

The patch works correctly with the four lookup lines deleted and the
get_addr call on load_mem inlined into the rtx_equal_for_cselib_1 call.
I also renamed store_info to str_info to avoid a duplicate-name warning
during bootstrap (one definition rule).
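For reference, the check now looks roughly like this (sketched from the
v4 code plus the changes above rather than copied from v5, and assuming
rtx_equal_for_cselib_1 keeps its usual (x, y, memmode, depth) signature):

  static bool
  is_forwarding (rtx store_mem_addr, rtx load_mem, machine_mode store_mem_mode)
  {
    /* Sometimes we do not have the proper value.  */
    if (!CSELIB_VAL_PTR (store_mem_addr))
      return false;

    gcc_checking_assert (MEM_P (load_mem));

    /* rtx_equal_for_cselib_1 performs the cselib lookup on the load
       address itself, so the four manual cselib_lookup lines are gone.  */
    return rtx_equal_for_cselib_1 (store_mem_addr,
                                   get_addr (XEXP (load_mem, 0)),
                                   store_mem_mode, 0);
  }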

Rebased on top of your sme2 changes and submitted as v5.

Thanks!
Manos.

On Mon, Dec 4, 2023 at 10:35 PM Richard Sandiford <richard.sandif...@arm.com>
wrote:

> Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
> > On Mon, Dec 4, 2023 at 21:22, Richard Sandiford <
> > richard.sandif...@arm.com> wrote:
> >
> >> Manos Anagnostakis <manos.anagnosta...@vrull.eu> writes:
> >> > This is an RTL pass that detects store forwarding from stores to
> >> > larger loads (load pairs).
> >> >
> >> > This optimization is SPEC2017-driven and was found to be beneficial
> >> > for some benchmarks through testing on ampere1/ampere1a machines.
> >> >
> >> > For example, it can transform cases like
> >> >
> >> > str  d5, [sp, #320]
> >> > fmul d5, d31, d29
> >> > ldp  d31, d17, [sp, #312] # Large load from small store
> >> >
> >> > to
> >> >
> >> > str  d5, [sp, #320]
> >> > fmul d5, d31, d29
> >> > ldr  d31, [sp, #312]
> >> > ldr  d17, [sp, #320]
> >> >
> >> > Currently, the pass is disabled by default on all architectures and
> >> > enabled by a target-specific option.
> >> >
> >> > If deemed beneficial enough for a default, it will be enabled on
> >> > ampere1/ampere1a, or other architectures as well, without needing to
> >> > be turned on by this option.
> >> >
> >> > Bootstrapped and regtested on aarch64-linux.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >         * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> >> >         * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER):
> >> >         New pass.
> >> >         * config/aarch64/aarch64-protos.h
> >> >         (make_pass_avoid_store_forwarding): Declare.
> >> >         * config/aarch64/aarch64.opt (mavoid-store-forwarding):
> >> >         New option.
> >> >         (aarch64-store-forwarding-threshold): New param.
> >> >         * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o.
> >> >         * doc/invoke.texi: Document new option and new param.
> >> >         * config/aarch64/aarch64-store-forwarding.cc: New file.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >         * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> >> >         * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> >> >         * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >> >
> >> > Signed-off-by: Manos Anagnostakis <manos.anagnosta...@vrull.eu>
> >> > Co-Authored-By: Manolis Tsamis <manolis.tsa...@vrull.eu>
> >> > Co-Authored-By: Philipp Tomsich <philipp.toms...@vrull.eu>
> >> > ---
> >> > Changes in v4:
> >> >       - I had problems making cselib_subst_to_values work correctly,
> >> >         so I used cselib_lookup to implement the exact same behaviour
> >> >         and record the store value at the time we iterate over it.
> >> >       - Removed the store/load_mem_addr check from is_forwarding as
> >> >         unnecessary.
> >> >       - The pass is now called at all optimization levels.
> >> >       - The threshold check should remain as it is, since we only
> >> >         care about the front element of the list.  The comment above
> >> >         the check explains why a single if is enough.
> >>
> >> I still think this is structurally better as a while.  There's no
> >> reason in principle why we wouldn't want to record the stores in:
> >>
> >>         stp     x0, x1, [x4, #8]
> >>         ldp     x0, x1, [x4, #0]
> >>         ldp     x2, x3, [x4, #16]
> >>
> >> and then the two stores should have the same distance value.
> >> I realise we don't do that yet, but still.
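> >>
> >> (As a rough sketch with hypothetical names: if the recorded stores
> >> are kept sorted by distance, the single if would become
> >>
> >>   while (!store_exprs.empty ()
> >>          && store_exprs.front ().distance > threshold)
> >>     store_exprs.pop_front ();
> >>
> >> so that several front entries sharing one distance are all handled.)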
> >>
> > Ah, you mean forwarding from stp.  I was a bit confused about what you
> > meant the previous time.  This was not initially meant for this patch,
> > but I think it wouldn't take long to implement that before pushing
> > this.  It is your call of course whether I should include it.
>
> No strong opinion either way, really.  It's definitely fine to check
> only for STRs initially.  I'd just rather not bake that assumption into
> places where we don't need to.
>
> If you do check for STPs, it'd probably be worth committing the pass
> without it at first, waiting until Alex's patch is in, and then doing
> STP support as a separate follow-up patch.  That's because Alex's patch
> will convert STPs to using a single double-width memory operand.
>
> (Waiting shouldn't jeopardise the chances of the patch going in, since
> the pass was posted well in time and the hold-up isn't your fault.)
>
> > [...]
> >> > +/* Return true if STORE_MEM_ADDR is forwarding to the address of
> >> > +   LOAD_MEM; otherwise false.  STORE_MEM_MODE is the mode of the MEM
> >> > +   rtx containing STORE_MEM_ADDR.  */
> >> > +
> >> > +static bool
> >> > +is_forwarding (rtx store_mem_addr, rtx load_mem, machine_mode
> >> store_mem_mode)
> >> > +{
> >> > +  /* Sometimes we do not have the proper value.  */
> >> > +  if (!CSELIB_VAL_PTR (store_mem_addr))
> >> > +    return false;
> >> > +
> >> > +  gcc_checking_assert (MEM_P (load_mem));
> >> > +
> >> > +  rtx load_mem_addr = get_addr (XEXP (load_mem, 0));
> >> > +  machine_mode load_mem_mode = GET_MODE (load_mem);
> >> > +  load_mem_addr = cselib_lookup (load_mem_addr, load_mem_mode, 1,
> >> > +                              load_mem_mode)->val_rtx;
> >>
> >> Like I said in the previous review, it shouldn't be necessary to do any
> >> manual lookup on the load address.  rtx_equal_for_cselib_1 does the
> >> lookup itself.  Does that not work?
> >>
> > I thought you meant only that the if check was redundant here, which
> > it was.
>
> Ah, no, I meant the full lookup, sorry.
>
> > I'll reply if cselib can handle the load all by itself.
>
> Thanks!
>
> Richard
>


-- 
Manos Anagnostakis | Compiler Engineer
E: manos.anagnosta...@vrull.eu

VRULL GmbH | Beatrixgasse 32, 1030 Vienna
W: www.vrull.eu | LinkedIn: https://www.linkedin.com/company/vrull/
