Hi!

On Fri, Sep 26, 2025 at 12:13:19PM +0530, Avinash Jayakar wrote:
> Whenever a vector of integers is left shifted by a constant value 1,
> gcc generates the following code for powerpc64le target:
>         vspltisw 0,1
>         vsld 2,2,0
> Instead the following code can be generated which is more efficient:
>         vaddudm 2,2,2

Yup.  But please let the generic optimisers handle this, don't do
premature "optimisations" (which actually cause missed optimisations!)
in some define_expand.  Just have a define_insn for this, that matches
exactly the shift let by one (and has a name starting with "*", aka "no
real name", so that it cannot (accidentally) be generated by name), that
either expands to the vaddudm insn, or does a split (not needed here, so
please don't).

Hrm, I was looking at an older version of the patch first it seems, you
fixed all this already, thanks :-)

> Added a predicate in predicates.md to recognize if the rtl node is a
> uniform constant vector with value 1.

That sounds useful to have.  If you find other places where this would
be useful to use, please send a patch :-)

> gcc/ChangeLog:
>         PR target/119702
>         * config/rs6000/altivec.md (*altivec_vsl<VI_char>_const_1): recognize 
>       << 1 and replace with vadd insn.

Broken indentation?  Or, in one case you use a tab and in another eight
spaces?

>         * config/rs6000/predicates.md (vector_constant_1): predicate to check
>         vector constant of all 1.

All lines should start with a capital.

> +(define_insn "*altivec_vsl<VI_char>_const_1"
> +  [(set (match_operand:VI2 0 "register_operand" "=v")
> +     (ashift:VI2 (match_operand:VI2 1 "register_operand" "v")
> +                   (match_operand:VI2 2 "vector_constant_1" "")))]
> +  "<VI_unit>"
> +  "vaddu<VI_char>m %0,%1,%1"
> +)

Cool.  Thanks :-)

> +;; Return 1 if the operand is a vector constant with 1 in all of the 
> elements.
> +(define_predicate "vector_constant_1"
> +  (match_code "const_vector")
> +{
> +  unsigned nunits = GET_MODE_NUNITS (mode), i;

Please don't have declarations for uninitialised things after
initialised ones (or in the same line at all).

> +  for (i = 0; i < nunits; i++)
> +  {
> +    if (INTVAL (CONST_VECTOR_ELT (op, i)) != 1)
> +      return 0;
> +  }

Every block should be indented by two spaces.  So

  for (i = 0; i < nunits; i++)
    {
      if (INTVAL (CONST_VECTOR_ELT (op, i)) != 1)
        return 0;
    }

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr119702-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -maltivec" } */
> +/* { dg-require-effective-target powerpc_altivec } */

You need to test for p8 or later (arch 2.07) for vaddudm.  Either on
that test, or for the whole testcase if that is simpler (and then
put a comment there saying why it is required, then).

> +void lshift1_16(unsigned short *a) 

(There is a trailing space on this line).

> +void lshift1_8(unsigned char *a) 

(And another).

> +/* { dg-final { scan-assembler-times {\mvaddudm\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvadduwm\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvadduhm\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvaddubm\M} 1 } } */

You could do something like
/* { dg-final { scan-assembler-times {\mvaddudm\M} 1 { target has_arch_pwr8 } } 
} */

I never know what exactly is wanted or needed there.  Just try it out?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr119702-2.c

(All the same things here, of course).

As a follow-up, you could also handle muls by four (shifts by two) by
doing two consecutive vadds, or muls by 3?  But that is an extra thing
(and the mul by 4 is not so obviously an optimisation always!)

So, please fix the vaddudm tests, and the formatting nits.  You're
almost there!

Thanks,


Segher

Reply via email to