https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #1 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Since the run-time check adds cost, the benefit only shows when the upper bound `m` is large; when the upper bound is small (e.g. < 12), the vectorized code (from clang) performs worse than the unvectorized binary.
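
To illustrate the trade-off, here is a minimal sketch of loop versioning behind a run-time check. It is not the testcase from this bug and not what GCC or clang actually emit: the function names, the overlap check, and the 4-wide blocked loop are assumptions for illustration only, and the cutoff of 12 is just the example figure quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff, taken from the "e.g. < 12" figure in the comment. */
#define MIN_VECTOR_TRIP_COUNT 12u

/* Plain scalar loop: no setup cost, so it wins for short trip counts. */
static void scale_add_scalar(double *c, const double *a, double b, unsigned m)
{
    for (unsigned i = 0; i < m; i++)
        c[i] += a[i] * b;
}

/* Stand-in for a vectorized body: 4 elements per iteration, then a
   scalar epilogue for the remaining 0..3 elements. A real vectorizer
   would load a whole block before storing, which is why it needs to
   know that the arrays do not overlap. */
static void scale_add_blocked(double *c, const double *a, double b, unsigned m)
{
    unsigned i = 0;
    for (; m - i >= 4; i += 4) {
        c[i]     += a[i]     * b;
        c[i + 1] += a[i + 1] * b;
        c[i + 2] += a[i + 2] * b;
        c[i + 3] += a[i + 3] * b;
    }
    for (; i < m; i++)
        c[i] += a[i] * b;
}

/* Assumed run-time check: do the two m-element ranges overlap? */
static int ranges_overlap(const double *p, const double *q, unsigned m)
{
    uintptr_t ps = (uintptr_t)p, qs = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)m * sizeof(double);
    return ps < qs + bytes && qs < ps + bytes;
}

/* Versioned loop: every call with m at or above the cutoff pays for the
   overlap check, but only a large m amortizes that cost. */
void scale_add(double *c, const double *a, double b, unsigned m)
{
    if (m >= MIN_VECTOR_TRIP_COUNT && !ranges_overlap(c, a, m))
        scale_add_blocked(c, a, b, m);
    else
        scale_add_scalar(c, a, b, m);
}
```

With a small `m` the check and the vector prologue/epilogue are pure overhead on top of a handful of iterations, which matches the observation that the vectorized code loses for small upper bounds.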