On Thu, Jan 11, 2024 at 9:30 AM HAO CHEN GUI <guih...@linux.ibm.com> wrote:
>
> Hi,
>   This patch eliminates unnecessary byte swaps for block clear on P8
> LE. For block clear, all the bytes are set to zero. The byte order
> doesn't make sense. So the alignment of destination could be set to
> the store mode size in stead of 1 byte in order to eliminates
> unnecessary byte swap instructions on P8 LE. The test case shows the
> problem.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE
>
> gcc/
>         PR target/113325
>         * config/rs6000/rs6000-string.cc (expand_block_clear): Set the
>         alignment of destination to the size of mode.
>
> gcc/testsuite/
>         PR target/113325
>         * gcc.target/powerpc/pr113325.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 7f777666ba9..4c9b2cbeefc 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>         }
>
>        dest = adjust_address (orig_dest, mode, offset);
> -
> +      /* Set the alignment of dest to the size of mode in order to
> +        avoid unnecessary byte swaps on LE.  */
> +      set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);

but the alignment is now wrong which might cause ripple-down
wrong-code effects, no?

It's probably bad to hide the byte-swapping in the move patterns (I'm
just guessing
you do that)

>        emit_move_insn (dest, CONST0_RTX (mode));
>      }
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
> b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 00000000000..4a3cae019c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}

Reply via email to