https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246
--- Comment #3 from Jiangning Liu <jiangning.liu at amperecomputing dot com> ---
I would expect the inner loop to be vectorized by generating the following code for x86:

    vpbroadcastd [mem], ymm0
    vpaddd [mem], ymm0, ymm1
    vpbroadcastd reg, ymm2
    vpcmpeqd ymm2, ymm1, k0
    kortestw k0, k0
    cmovne ...

AArch64 should have counterpart vector instructions to implement the same functionality.
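For readers less familiar with the mask instructions, here is a rough scalar model in C of what the sequence above computes for one vector of eight 32-bit lanes. It is only a sketch: the helper name `any_lane_sum_equals`, its parameters, and the `LANES` constant are made up for illustration, and it is not the loop from the bug report or the code GCC actually emits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 }; /* eight 32-bit lanes, as in one ymm register */

/* Hypothetical helper (not from the bug report): returns true when any
 * lane of (src[i] + addend) equals key. */
static bool any_lane_sum_equals(const int32_t src[LANES], int32_t addend,
                                int32_t key)
{
    uint32_t mask = 0; /* models the mask register k0, one bit per lane */

    for (size_t i = 0; i < LANES; i++) {
        /* vpbroadcastd + vpaddd: add the same scalar to every lane,
         * with 32-bit wrap-around as in the vector instruction. */
        uint32_t sum = (uint32_t)src[i] + (uint32_t)addend;

        /* vpbroadcastd + vpcmpeqd: compare each lane with the broadcast
         * key and record the result in the mask. */
        if (sum == (uint32_t)key)
            mask |= 1u << i;
    }

    /* kortestw k0, k0 sets ZF when the mask is all zero, so the
     * following cmovne takes effect only when some lane matched. */
    return mask != 0;
}
```

An AArch64 counterpart would need the same three steps: broadcast, lane-wise add and compare, then a reduction of the per-lane compare result to a single condition.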