Andrew Pinski via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Nov 18, 2021 at 5:55 PM apinski--- via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> From: Andrew Pinski <apin...@marvell.com>
>>
>> The problem here is that aarch64_expand_setmem does not change the alignment
>> for strict alignment case. This is a simplified patch from what I had 
>> previously.
>> So constraining copy_limit to the alignment of the mem in the case of strict 
>> align
>> fixes the issue without checking to many other changes to the code.
>>
>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> Ping?
>
>>
>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64.c (aarch64_expand_setmem): Constraint
>>         copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
>>         true.

This looks correct, but for a modified version of the testcase in the PR:

void f(char *x) { __builtin_memset (x, 0, 216); }

we'll now emit 216 STRB instructions, which seems a bit excessive.

I think in practice the code has only been tuned on targets that
support LDP/STP Q, so how about moving the copy_limit calculation
further up and doing:

  unsigned max_set_size = (copy_limit * 8) / BITS_PER_UNIT;

?

It would be good to have a scan-assembler-not testcase for the testsuite.

Thanks,
Richard


>> ---
>>  gcc/config/aarch64/aarch64.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 7389b5953dc..e9c2e89d8ce 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -23744,9 +23744,16 @@ aarch64_expand_setmem (rtx *operands)
>>    /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
>>       AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  setmem expand
>>       pattern is only turned on for TARGET_SIMD.  */
>> -  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
>> -                         & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
>> -                         ? GET_MODE_BITSIZE (TImode) : 256;
>> +  int copy_limit;
>> +
>> +  if (aarch64_tune_params.extra_tuning_flags
>> +      & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
>> +    copy_limit = GET_MODE_BITSIZE (TImode);
>> +  else
>> +    copy_limit = 256;
>> +
>> +  if (STRICT_ALIGNMENT)
>> +    copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst));
>>
>>    while (n > 0)
>>      {
>> --
>> 2.17.1
>>

Reply via email to