On Wed, Sep 1, 2021 at 1:52 AM Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> apinski--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > From: Andrew Pinski <apin...@marvell.com>
> >
> > The problem here is the aarch64_expand_setmem code did not check
> > STRICT_ALIGNMENT if it is creating an overlapping store.
> > This patch adds that check and the testcase works.
> >
> > gcc/ChangeLog:
> >
> >         PR target/101934
> >         * config/aarch64/aarch64.c (aarch64_expand_setmem):
> >         Check STRICT_ALIGNMENT before creating an overlapping
> >         store.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/101934
> >         * gcc.target/aarch64/memset-strict-align-1.c: New test.
>
> OK, thanks.
Applied now also on the GCC 11 branch.

Thanks,
Andrew

>
> Richard
>
> > ---
> >  gcc/config/aarch64/aarch64.c                  |  4 +--
> >  .../aarch64/memset-strict-align-1.c           | 28 +++++++++++++++++++
> >  2 files changed, 30 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 3213585a588..26d59ba1e13 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -23566,8 +23566,8 @@ aarch64_expand_setmem (rtx *operands)
> >        /* Do certain trailing copies as overlapping if it's going to be
> >          cheaper. i.e. less instructions to do so. For instance doing a 15
> >          byte copy it's more efficient to do two overlapping 8 byte copies than
> > -        8 + 4 + 2 + 1. */
> > -      if (n > 0 && n < copy_limit / 2)
> > +        8 + 4 + 2 + 1. Only do this when -mstrict-align is not supplied. */
> > +      if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
> >         {
> >           next_mode = smallest_mode_for_size (n, MODE_INT);
> >           int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
> > diff --git a/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> > new file mode 100644
> > index 00000000000..5cdc8a44968
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/memset-strict-align-1.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Os -mstrict-align" } */
> > +
> > +struct s { char x[95]; };
> > +void foo (struct s *);
> > +void bar (void) { struct s s1 = {}; foo (&s1); }
> > +
> > +/* memset (s1 = {}, sizeof = 95) should be expanded out
> > +   such that there are no overlap stores when -mstrict-align
> > +   is in use.
> > +   so 2 pair 16 bytes stores (64 bytes).
> > +   1 16 byte stores
> > +   1 8 byte store
> > +   1 4 byte store
> > +   1 2 byte store
> > +   1 1 byte store
> > +   */
> > +
> > +/* { dg-final { scan-assembler-times "stp\tq" 2 } } */
> > +/* { dg-final { scan-assembler-times "str\tq" 1 } } */
> > +/* { dg-final { scan-assembler-times "str\txzr" 1 } } */
> > +/* { dg-final { scan-assembler-times "str\twzr" 1 } } */
> > +/* { dg-final { scan-assembler-times "strh\twzr" 1 } } */
> > +/* { dg-final { scan-assembler-times "strb\twzr" 1 } } */
> > +
> > +/* Also one store pair for the frame-pointer and the LR. */
> > +/* { dg-final { scan-assembler-times "stp\tx" 1 } } */
> > +
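To make the effect of the one-line `!STRICT_ALIGNMENT` check concrete, here is a small standalone C model of the store planning described in the quoted comment. It is a simplified sketch under stated assumptions, not GCC code: `plan_setmem`, `struct store` and `COPY_LIMIT` are hypothetical names, it assumes a 32-byte widest chunk (one `stp q` pair) and power-of-two store widths only, and it ignores register choice and the alignment of the destination itself. It follows the quoted logic: take the largest chunk that fits, and, unless strict alignment is requested, widen a tail shorter than half the limit to the next power of two and move it back so it overlaps the previous store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC source: assume the widest single store is
   32 bytes (an "stp q" pair) and that every store width is a power of two.  */
#define COPY_LIMIT 32

struct store
{
  size_t offset; /* byte offset of the store from the start of the buffer */
  size_t size;   /* width of the store in bytes */
};

/* Largest power of two that is <= n (requires n >= 1).  */
static size_t
largest_pow2_le (size_t n)
{
  size_t p = 1;
  while (p * 2 <= n)
    p *= 2;
  return p;
}

/* Smallest power of two that is >= n (requires n >= 1).  */
static size_t
smallest_pow2_ge (size_t n)
{
  size_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

/* Plan the stores for clearing LEN bytes and write them to OUT (capacity
   CAP); return how many were written.  When STRICT_ALIGN is false, a tail
   shorter than COPY_LIMIT / 2 is widened to the next power of two and moved
   back so that it overlaps the previous store.  */
static size_t
plan_setmem (size_t len, bool strict_align, struct store *out, size_t cap)
{
  size_t dst = 0, n = len, count = 0;

  while (n > 0 && count < cap)
    {
      size_t size = largest_pow2_le (n < COPY_LIMIT ? n : COPY_LIMIT);
      out[count++] = (struct store) { dst, size };
      dst += size;
      n -= size;

      /* The fix: only create an overlapping tail store when unaligned
         accesses are allowed.  */
      if (n > 0 && n < COPY_LIMIT / 2 && !strict_align)
        {
          size_t wide = smallest_pow2_ge (n);
          assert (wide <= size); /* the tail never outgrows the last store */
          dst -= wide - n;       /* step back so the wide store ends at LEN */
          n = wide;
        }
    }
  return count;
}
```

In this model, `plan_setmem (95, true, ...)` yields seven stores of 32, 32, 16, 8, 4, 2 and 1 bytes, which lines up with the counts the new test scans for (two `stp q`, one `str q`, and one each of `str xzr`, `str wzr`, `strh wzr` and `strb wzr`). With `strict_align` false, the same length yields 32, 32 and 16 followed by a 16-byte store at offset 79 that overlaps the previous one by a byte, and a 15-byte clear becomes two 8-byte stores at offsets 0 and 7, matching the "two overlapping 8 byte copies" example in the comment.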