Hi!
On Fri, Sep 26, 2025 at 12:13:19PM +0530, Avinash Jayakar wrote:
> Whenever a vector of integers is left shifted by a constant value 1,
> gcc generates the following code for powerpc64le target:
> vspltisw 0,1
> vsld 2,2,0
> Instead the following code can be generated which is more efficient:
> vaddudm 2,2,2
Yup. But please let the generic optimisers handle this, don't do
premature "optimisations" (which actually cause missed optimisations!)
in some define_expand. Just have a define_insn for this, that matches
exactly the shift let by one (and has a name starting with "*", aka "no
real name", so that it cannot (accidentally) be generated by name), that
either expands to the vaddudm insn, or does a split (not needed here, so
please don't).
Hrm, I was looking at an older version of the patch first it seems, you
fixed all this already, thanks :-)
> Added a predicate in predicates.md to recognize if the rtl node is a
> uniform constant vector with value 1.
That sounds useful to have. If you find other places where this would
be useful to use, please send a patch :-)
> gcc/ChangeLog:
> PR target/119702
> * config/rs6000/altivec.md (*altivec_vsl<VI_char>_const_1): recognize
> << 1 and replace with vadd insn.
Broken indentation? Or, in one case you use a tab and in another eight
spaces?
> * config/rs6000/predicates.md (vector_constant_1): predicate to check
> vector constant of all 1.
All lines should start with a capital.
> +(define_insn "*altivec_vsl<VI_char>_const_1"
> + [(set (match_operand:VI2 0 "register_operand" "=v")
> + (ashift:VI2 (match_operand:VI2 1 "register_operand" "v")
> + (match_operand:VI2 2 "vector_constant_1" "")))]
> + "<VI_unit>"
> + "vaddu<VI_char>m %0,%1,%1"
> +)
Cool. Thanks :-)
> +;; Return 1 if the operand is a vector constant with 1 in all of the
> elements.
> +(define_predicate "vector_constant_1"
> + (match_code "const_vector")
> +{
> + unsigned nunits = GET_MODE_NUNITS (mode), i;
Please don't have declarations for uninitialised things after
initialised ones (or in the same line at all).
> + for (i = 0; i < nunits; i++)
> + {
> + if (INTVAL (CONST_VECTOR_ELT (op, i)) != 1)
> + return 0;
> + }
Every block should be indented by two spaces. So
for (i = 0; i < nunits; i++)
{
if (INTVAL (CONST_VECTOR_ELT (op, i)) != 1)
return 0;
}
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr119702-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -maltivec" } */
> +/* { dg-require-effective-target powerpc_altivec } */
You need to test for p8 or later (arch 2.07) for vaddudm. Either on
that test, or for the whole testcase if that is simpler (and then
put a comment there saying why it is required, then).
> +void lshift1_16(unsigned short *a)
(There is a trailing space on this line).
> +void lshift1_8(unsigned char *a)
(And another).
> +/* { dg-final { scan-assembler-times {\mvaddudm\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvadduwm\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvadduhm\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvaddubm\M} 1 } } */
You could do something like
/* { dg-final { scan-assembler-times {\mvaddudm\M} 1 { target has_arch_pwr8 } }
} */
I never know what exactly is wanted or needed there. Just try it out?
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr119702-2.c
(All the same things here, of course).
As a follow-up, you could also handle muls by four (shifts by two) by
doing two consecutive vadds, or muls by 3? But that is an extra thing
(and the mul by 4 is not so obviously an optimisation always!)
So, please fix the vaddudm tests, and the formatting nits. You're
almost there!
Thanks,
Segher