https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246

--- Comment #3 from Jiangning Liu <jiangning.liu at amperecomputing dot com> ---
I would expect the inner loop to be vectorized, generating code like the
following on x86:

vpbroadcastd [mem], ymm0
vpaddd [mem], ymm0, ymm1
vpbroadcastd reg, ymm2
vpcmpeqd ymm2, ymm1, k0
kortestw k0, k0
cmovne ...
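
For readers less familiar with the mask instructions, here is a minimal
scalar sketch in C of what one 8-lane step of the sequence above computes.
It models only the instructions listed, not the test case from the bug
report. The function names, the parameters (base, v, key, if_match,
if_none) and the two select operands are hypothetical, since the cmovne
operands are elided above.

```c
#include <stdint.h>

/* Hypothetical scalar model of the broadcast / add / compare steps.
 * base : scalar that the first vpbroadcastd replicates across lanes
 * v    : the 8 dwords that vpaddd adds from memory
 * key  : scalar that the second vpbroadcastd replicates across lanes
 * Returns a bitmask with bit i set when lane i compares equal,
 * which is the role k0 plays after vpcmpeqd. */
unsigned lane_match_mask(int32_t base, const int32_t v[8], int32_t key)
{
    unsigned mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        /* vpaddd wraps on overflow, so do the add in unsigned arithmetic */
        uint32_t sum = (uint32_t)base + (uint32_t)v[lane];
        if (sum == (uint32_t)key)
            mask |= 1u << lane;
    }
    return mask;
}

/* Hypothetical model of kortestw + cmovne: kortestw sets ZF when the
 * mask is all zeros, so cmovne takes the move when any lane matched.
 * Which value is moved is an assumption made for illustration. */
int32_t select_on_match(int32_t base, const int32_t v[8], int32_t key,
                        int32_t if_match, int32_t if_none)
{
    return lane_match_mask(base, v, key) != 0 ? if_match : if_none;
}
```

The point of the sketch is that the whole 8-lane comparison reduces to a
single "did any lane match" test, so the inner loop needs no per-element
branch.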

AArch64 should have equivalent vector instructions that can implement the
same functionality.
