https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98299
--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---

At the very least, this seems like a worthwhile pattern to recognize at -O3, even if only to avoid vectorizing the loop, i.e. to get an effect similar to what happens if you add `if (n >= 1000) __builtin_unreachable();` to the start of f1. Overall, though, it seems unlikely that the modulo would be costlier than the loop except in very narrow cases, since a modulo by a constant is lowered to a multiplication and a few other cheap operations. The transformation into a modulo also seems to occur in the vectorized version, though the result is weirdly optimized.
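
For reference, a minimal sketch of the three variants being compared. The body of f1 is not quoted in this comment, so the loop shape and the step constant `STEP` (45 here) are assumptions made purely for illustration, as are the names `f1_loop`, `f1_hinted` and `f1_mod`. `__builtin_unreachable()` is a GCC/Clang builtin, so this is not portable C.

```c
#include <stdint.h>

/* Hypothetical step; the real constant in f1 may differ. */
#define STEP 45u

/* Assumed shape of f1: reduce n by repeated subtraction. */
uint32_t f1_loop(uint32_t n)
{
    while (n >= STEP)
        n -= STEP;
    return n;
}

/* Same loop, with the hint from the comment: it promises the compiler
   that n < 1000, which bounds the trip count and should make
   vectorizing the loop unattractive. Calling this with n >= 1000 is
   undefined behavior. */
uint32_t f1_hinted(uint32_t n)
{
    if (n >= 1000)
        __builtin_unreachable();
    while (n >= STEP)
        n -= STEP;
    return n;
}

/* The modulo form the loop is equivalent to for unsigned n. With a
   constant divisor, compilers typically lower this to a multiply by a
   precomputed reciprocal plus shifts and a subtraction, with no
   division instruction. */
uint32_t f1_mod(uint32_t n)
{
    return n % STEP;
}
```

Comparing the output of `gcc -O3 -S` for `f1_loop` and `f1_mod` should show whether the loop is recognized; the exact instruction sequence depends on the target and GCC version.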