davemgreen wrote: AArch64 has a udot and sdot instruction (and a usdot instruction). They perform a "partial" reduction though, producing a v4i32 from two v16i8 inputs. We would like to use those from the vectorizer and have recently added a partial-reduction intrinsic, but doing it with a higher level intrinsic might be a little nicer.
It would seem like a "udot" can be represented already as `vecreduce.add(mul(zext, zext))`, and fdot is simpler still. Is there any particular reason to add a new intrinsic for it if it is already representable as a vecreduce? And it would feel like a shame if it couldn't be used with the actual AArch64 instructions. @SamTebbs33 @NickGuy-Arm FYI. https://github.com/llvm/llvm-project/pull/102872 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits