Hi Haochen, on 2024/1/11 16:28, HAO CHEN GUI wrote: > Hi, > This patch eliminates unnecessary byte swaps for block clear on P8 > LE. For block clear, all the bytes are set to zero. The byte order > doesn't make sense. So the alignment of destination could be set to > the store mode size in stead of 1 byte in order to eliminates > unnecessary byte swap instructions on P8 LE. The test case shows the > problem.
I agree with Richi's concern, a bytes swap can be eliminated if the bytes swapped result is known as before, one typical case is the vector constant with predicate const_vector_each_byte_same, we can do some optimization for that. BR, Kewen > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > ChangeLog > rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE > > gcc/ > PR target/113325 > * config/rs6000/rs6000-string.cc (expand_block_clear): Set the > alignment of destination to the size of mode. > > gcc/testsuite/ > PR target/113325 > * gcc.target/powerpc/pr113325.c: New. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000-string.cc > b/gcc/config/rs6000/rs6000-string.cc > index 7f777666ba9..4c9b2cbeefc 100644 > --- a/gcc/config/rs6000/rs6000-string.cc > +++ b/gcc/config/rs6000/rs6000-string.cc > @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[]) > } > > dest = adjust_address (orig_dest, mode, offset); > - > + /* Set the alignment of dest to the size of mode in order to > + avoid unnecessary byte swaps on LE. */ > + set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT); > emit_move_insn (dest, CONST0_RTX (mode)); > } > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c > b/gcc/testsuite/gcc.target/powerpc/pr113325.c > new file mode 100644 > index 00000000000..4a3cae019c2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */ > + > +void* foo (void* s1) > +{ > + return __builtin_memset (s1, 0, 32); > +}