Andrew Pinski via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Thu, Nov 18, 2021 at 5:55 PM apinski--- via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> From: Andrew Pinski <apin...@marvell.com> >> >> The problem here is that aarch64_expand_setmem does not change the alignment >> for strict alignment case. This is a simplified patch from what I had >> previously. >> So constraining copy_limit to the alignment of the mem in the case of strict >> align >> fixes the issue without checking to many other changes to the code. >> >> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. > > Ping? > >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64.c (aarch64_expand_setmem): Constraint >> copy_limit to the alignment of the mem if STRICT_ALIGNMENT is >> true.
This looks correct, but for a modified version of the testcase in the PR: void f(char *x) { __builtin_memset (x, 0, 216); } we'll now emit 216 STRB instructions, which seems a bit excessive. I think in practice the code has only been tuned on targets that support LDP/STP Q, so how about moving the copy_limit calculation further up and doing: unsigned max_set_size = (copy_limit * 8) / BITS_PER_UNIT; ? It would be good to have a scan-assembler-not testcase for the testsuite. Thanks, Richard >> --- >> gcc/config/aarch64/aarch64.c | 13 ++++++++++--- >> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 7389b5953dc..e9c2e89d8ce 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -23744,9 +23744,16 @@ aarch64_expand_setmem (rtx *operands) >> /* Maximum amount to copy in one go. We allow 256-bit chunks based on the >> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter. setmem expand >> pattern is only turned on for TARGET_SIMD. */ >> - const int copy_limit = (aarch64_tune_params.extra_tuning_flags >> - & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) >> - ? GET_MODE_BITSIZE (TImode) : 256; >> + int copy_limit; >> + >> + if (aarch64_tune_params.extra_tuning_flags >> + & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) >> + copy_limit = GET_MODE_BITSIZE (TImode); >> + else >> + copy_limit = 256; >> + >> + if (STRICT_ALIGNMENT) >> + copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst)); >> >> while (n > 0) >> { >> -- >> 2.17.1 >>