On Tue, Jan 18, 2022 at 1:22 PM Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > In g:526e1639aa76b0a8496b0dc3a3ff2c450229544e I'd added support > for finding more consecutive MEMs. However, the check was too > eager, in that it matched MEM_REFs with the same base address > even if that base address was an arbitrary SSA name. This can > give wrong results if a MEM_REF from one loop iteration is > compared with a MEM_REF from another (e.g. after rtl unrolling).
Hmm, if it really happens that way it also poses an issue for other must-alias analysis like DSE (though I don't remember RTL DSE looking at MEM_EXPRs for this purpose). And may-alias as well if say MEM[_1 + 0] and MEM[_1 + 4] are seen but _1 is assigned _1 + 4 between both (RTL unrolling would basically violate the single-definition guarantee of SSA). > In principle, we could still accept MEM_REFs based on the same > incoming SSA name, but there seems to be no out-of-the-box API > for doing that. Adding a new one at this stage in GCC 12 doesn't > feel like a good risk/reward trade-off. It's also not exactly obvious how to do that. > This patch therefore restricts the MEM_EXPR comparison to base decls > only, excluding all MEM_REFs. It means we lose all the new STPs in > the PR testcase but keep the ones in the original stp_1.c testcase. > > Tested on aarch64-linux-gnu and pushed. > > Richard > > > gcc/ > PR target/104005 > * config/aarch64/aarch64.cc (aarch64_check_consecutive_mems): > When using MEM_EXPR, require the base to be a decl. > > gcc/testsuite/ > PR target/104005 > * gcc.target/aarch64/pr104005.c: New test. > --- > gcc/config/aarch64/aarch64.cc | 1 + > gcc/testsuite/gcc.target/aarch64/pr104005.c | 17 +++++++++++++++++ > 2 files changed, 18 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr104005.c > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index fdf0c9bd5b8..296145e6008 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24747,6 +24747,7 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, > bool *reversed) > &expr_offset2); > if (!expr_base1 > || !expr_base2 > + || !DECL_P (expr_base1) > || !operand_equal_p (expr_base1, expr_base2, OEP_ADDRESS_OF)) > return false; > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr104005.c > b/gcc/testsuite/gcc.target/aarch64/pr104005.c > new file mode 100644 > index 00000000000..09dd81910eb > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr104005.c > @@ -0,0 +1,17 @@ > +/* { dg-options "-O2 -funroll-loops" } */ > + > +typedef int v2 __attribute__((vector_size(8))); > + > +void f(void) { > + v2 v[1024]; > + v2 *ptr = v; > + for (int i = 0; i < 512; ++i) > + { > + ptr[0][0] = 0; > + asm volatile ("":::"memory"); > + ptr[0][1] = 1; > + ptr += 2; > + } > +} > + > +/* { dg-final { scan-assembler-not {\tstp\t} } } */ > -- > 2.25.1 >