Hi!

On Wed, Jul 01, 2020 at 06:43:32PM -0400, Michael Meissner wrote:
> This patch fixes a PR that I noticed several years ago during power8
> development.  I noticed that the compiler would often create a two element
> vector and store the vector.
>
> Particularly for DImode on power8, this could involve two direct moves and a
> XXPERMDI to glue the two parts together.  On power9, there a single direct
> move instruction that combines the two elements.

But on p9 it reduces path length (and actual latency) as well, while not
taking more insns.

> Originally I had the optimization for DFmode as well as DImode.  I found if
> the values were already in vector registers, that generally it was faster to
> do the XXPERMDI and vector store.

I don't see that?  How is DF different from DI in any way here?

> +(define_insn_and_split "*concatv2di_store"
> +  [(set (match_operand:V2DI 0 "memory_operand" "=m,m,m,m")

Should this just require ds_form_memory, instead of having a fallback
later?

> +  /* Because we are creating scalar stores, we don't have to swap the order
> +     of the elements and then swap the stores to get the right order on
> +     little endian systems.  */

Or any other system.  Yes.

> +;; Optimize creating a vector with 2 duplicate DImode elements and storing it.
> +(define_insn_and_split "*dupv2di_store"

Hrm, can't this just use *concatv2di_store some way?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr81594.c
> @@ -0,0 +1,61 @@
> +/* { dg-do compile { target { powerpc-*-* && ilp64 } } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2" } */

This hasn't been tested.

> +/* PR target/81594.  Optimize creating a vector of 2 64-bit elements and then
> +   storing the vector into separate stores.  */

Please mention the PR # in the subject and the changelog.

> +/* { dg-final { scan-assembler-not {\mstxv\M} } } */
> +/* { dg-final { scan-assembler-not {\mstxvx\M} } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */
> +/* { dg-final { scan-assembler-not {\mmtvsrdd\M} } } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */

There are other ways to write the stores (older insns, and prefixed insns,
for example).  xxpermdi has many extended mnemonics (and we actually use
those, they improve readability quite a bit).

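A positive check could look something like the sketch below.  This is not
taken from the patch: the function name store_pair is made up for
illustration, and the expected count of two std insns is an assumption that
would have to be checked against the code the compiler actually generates
for this case.

```c
/* Hypothetical sketch only, not the testcase from the patch.  */
/* { dg-do compile { target lp64 } } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-mdejagnu-cpu=power8 -O2" } */

typedef long long v2di __attribute__ ((vector_size (16)));

/* Build a two-element vector from two scalars and store it.  */
void
store_pair (v2di *p, long long a, long long b)
{
  *p = (v2di) { a, b };
}

/* Assumed expectation: two scalar doubleword stores and no permute.  */
/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
```

With a scan-assembler-times line like that, a compiler that starts emitting
some other store form makes the test fail instead of slipping past the
scan-assembler-not lines.
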
It helps if you also test what insns *should* be there; that way the
testcase will not so easily pass silently while the generated code is not
what it wants.


Segher