davemgreen wrote:

AArch64 has udot and sdot instructions (and a usdot instruction). They only
perform a "partial" reduction, though: they produce a v4i32 from two v16i8
inputs, with each i32 lane accumulating four of the 16 byte products. We would
like to use those from the vectorizer and have recently added a
partial-reduction intrinsic, but doing it with a higher-level intrinsic might
be a little nicer.
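To make the "partial" part concrete, here is a small Python model of what the three instructions compute in the v16i8 -> v4i32 form. It is a sketch under stated assumptions, not LLVM or Arm code: the helper names (`dot4`, `udot`, `sdot`, `usdot`) are hypothetical, the accumulator operand is included because the instructions accumulate into the destination, and usdot is modelled as an unsigned first operand times a signed second operand.

```python
# Illustrative model of the AArch64 UDOT/SDOT/USDOT (.4S, .16B) lane semantics.
# Helper names are hypothetical; i32 lanes wrap modulo 2**32 (see wrap32).


def wrap32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value, like an i32 lane."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def as_u8(x: int) -> int:
    """Interpret a byte as unsigned (zext)."""
    return x & 0xFF


def as_s8(x: int) -> int:
    """Interpret a byte as signed (sext)."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x


def dot4(acc, a, b, ext_a, ext_b):
    """acc: 4 i32 lanes; a, b: 16 bytes each.
    Lane i adds the four products a[4i+k] * b[4i+k], k = 0..3, to acc[i]."""
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    return [
        wrap32(acc[i] + sum(ext_a(a[4 * i + k]) * ext_b(b[4 * i + k])
                            for k in range(4)))
        for i in range(4)
    ]


def udot(acc, a, b):
    return dot4(acc, a, b, as_u8, as_u8)   # unsigned x unsigned


def sdot(acc, a, b):
    return dot4(acc, a, b, as_s8, as_s8)   # signed x signed


def usdot(acc, a, b):
    # Assumption: first source unsigned, second source signed.
    return dot4(acc, a, b, as_u8, as_s8)
```

For example, `udot([0, 0, 0, 0], list(range(16)), [1] * 16)` gives `[6, 22, 38, 54]`: each lane sums one group of four bytes, so a separate across-lane add is still needed to get a single scalar.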

It seems that a "udot" can already be represented as
`vecreduce.add(mul(zext, zext))`, and fdot is simpler still. Is there any
particular reason to add a new intrinsic if the operation is already
representable as a vecreduce? It would also be a shame if the new intrinsic
couldn't be lowered to the actual AArch64 instructions.
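The reason the vecreduce form can still map onto udot is that a full add-reduction does not care how the products are grouped into lanes. A minimal Python sketch (hypothetical helper names, unsigned case only) checks that summing the four lanes of a udot-style partial reduction reproduces `vecreduce.add(mul(zext, zext))` on random inputs.

```python
import random


def vecreduce_add_mul_zext(a, b):
    """Model of vecreduce.add(mul(zext a, zext b)) for two <16 x i8> inputs
    widened to i32: one scalar sum of all 16 products."""
    return sum((x & 0xFF) * (y & 0xFF) for x, y in zip(a, b))


def partial_reduce_udot(acc, a, b):
    """Model of a udot-style partial reduction: the 16 products are folded
    into 4 lanes, lane i taking the products at indices 4i..4i+3."""
    return [
        acc[i] + sum((a[4 * i + k] & 0xFF) * (b[4 * i + k] & 0xFF)
                     for k in range(4))
        for i in range(4)
    ]


rng = random.Random(0)
for _ in range(1000):
    a = [rng.randrange(256) for _ in range(16)]
    b = [rng.randrange(256) for _ in range(16)]
    lanes = partial_reduce_udot([0, 0, 0, 0], a, b)
    # Adding up the four lanes recovers the full reduction, because
    # integer addition is associative and commutative.
    assert sum(lanes) == vecreduce_add_mul_zext(a, b)
```

In a vectorized loop the v4i32 accumulator would be carried across iterations and reduced to a scalar once after the loop. Because i32 addition is associative and commutative modulo 2^32, wrapping does not break that equivalence.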

@SamTebbs33 @NickGuy-Arm FYI.

https://github.com/llvm/llvm-project/pull/102872
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
