Tamar Christina <[email protected]> writes:
> Hi All,
>
> There's a slight mismatch between the vectorizer optabs and the intrinsics
> patterns for NEON. The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> This means we need different patterns here. This adds a separate usdot
> vectorizer pattern which just shuffles around the RTL params.
>
> There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
> patterns which is not corrected here.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
Couldn't we just change:
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
> {
> - return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> + return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
…this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
I think that's an OK thing to do when the function is named after
an optab rather than an arm_neon.h intrinsic.

Thanks,
Richard
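
To illustrate the suggestion above, here is a minimal scalar sketch of the
operand reordering, not the actual GCC builtins. It assumes the optab-style
operation takes the accumulator as its last input (matching the vectorizer's
"operands[3] and operands[0]" convention described above), while the
intrinsic keeps the accumulator first. The names usdot_prod_model and
vusdot_model are hypothetical and model a single 32-bit lane only.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of an optab-style usdot_prod:
   the two multiplicands come first and the accumulator comes LAST.  */
static int32_t
usdot_prod_model (const uint8_t a[4], const int8_t b[4], int32_t acc)
{
  /* Unsigned-by-signed products, widened to 32 bits and accumulated.  */
  for (int i = 0; i < 4; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Hypothetical model of the intrinsic-style wrapper: the accumulator
   comes FIRST, as in vusdot_s32 (__r, __a, __b).  The wrapper only
   reorders its arguments when forwarding to the optab-style helper.  */
static int32_t
vusdot_model (int32_t r, const uint8_t a[4], const int8_t b[4])
{
  return usdot_prod_model (a, b, r);
}
```

With this arrangement only the call site changes its argument order, so a
single pattern named after the optab can serve both users.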
> }
>
> __extension__ extern __inline int32x4_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
> {
> - return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> + return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
> }
>
> __extension__ extern __inline int32x2_t