https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96275
--- Comment #1 from Witold Baryluk <witold.baryluk+gcc at gmail dot com> ---

FYI, clang trunk 12 (76a0c0ee6ffa9c38485776921948d8f930109674) doesn't do that either:

fillArray:                              # @fillArray
        test    dil, 31
        jne     .LBB0_8
        test    edi, edi
        je      .LBB0_8
        vmovss  xmm0, dword ptr [rdx]   # xmm0 = mem[0],zero,zero,zero
        mov     eax, edi
        cmp     edi, 32
        jae     .LBB0_4
        xor     edx, edx
        jmp     .LBB0_7
.LBB0_4:
        vbroadcastss    ymm1, xmm0
        mov     edx, eax
        xor     edi, edi
        and     edx, -32
.LBB0_5:                                # =>This Inner Loop Header: Depth=1
        vmulps  ymm2, ymm1, ymmword ptr [rcx + 4*rdi]
        vmulps  ymm3, ymm1, ymmword ptr [rcx + 4*rdi + 32]
        vmulps  ymm4, ymm1, ymmword ptr [rcx + 4*rdi + 64]
        vmulps  ymm5, ymm1, ymmword ptr [rcx + 4*rdi + 96]
        vmovups ymmword ptr [rsi + 4*rdi], ymm2
        vmovups ymmword ptr [rsi + 4*rdi + 32], ymm3
        vmovups ymmword ptr [rsi + 4*rdi + 64], ymm4
        vmovups ymmword ptr [rsi + 4*rdi + 96], ymm5
        add     rdi, 32
        cmp     rdx, rdi
        jne     .LBB0_5
        cmp     rdx, rax
        je      .LBB0_8
.LBB0_7:                                # =>This Inner Loop Header: Depth=1
        vmulss  xmm1, xmm0, dword ptr [rcx + 4*rdx]
        vmovss  dword ptr [rsi + 4*rdx], xmm1
        inc     rdx
        cmp     rax, rdx
        jne     .LBB0_7
.LBB0_8:
        vzeroupper
        ret

The main inner loop is unrolled/pipelined more aggressively (four 8-wide vmulps per iteration), and the fallback code is simpler (it just handles the remaining elements one scalar at a time), but that is unrelated. The point is that the fallback code is still there, even though the `test dil, 31` guard at the top already guarantees that N is a multiple of 32.

Changing the condition to other equivalent forms, like `if ((N/32)*32 == N) {`, `if ((N % 32) == 0) {`, `if ((N & ~31u) == N) {` or `if ((N >> 5) << 5 == N) {`, doesn't make any difference. I tried both signed int and unsigned int, with the same effect.

Reassigning N (after removing the const qualifier), i.e. `N = N & ~31u` or `N = (N >> 5) << 5`, does appear to make a difference, but if the reassignment is inside the condition, it is already too late.
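The original test case is not quoted in this comment, so the C below is only a sketch. The argument order (N, out, scale, in) is a hypothetical reconstruction inferred from the registers in the listing (edi = N, rsi = output, rdx = pointer to the scale factor, rcx = input), and every function name is made up for illustration. It shows two things from the text: for unsigned N the four condition spellings are the same predicate, and the "reassign N before the loop" workaround. Note that unconditionally rounding N changes behaviour when N is not a multiple of 32 (the rounded-down prefix gets processed instead of nothing); no claim is made here about the code GCC or clang generate for either version.

```c
#include <assert.h>
#include <stdbool.h>

/* The four spellings of "N is a multiple of 32" tried above (hypothetical
   helper names). For unsigned N they are all the same predicate. */
static bool mult32_div(unsigned N)   { return (N / 32) * 32 == N; }
static bool mult32_mod(unsigned N)   { return (N % 32) == 0; }
static bool mult32_mask(unsigned N)  { return (N & ~31u) == N; }
static bool mult32_shift(unsigned N) { return (N >> 5) << 5 == N; }

/* Asserts that the four predicates agree for every N in [0, limit). */
static void check_predicates_agree(unsigned limit) {
    for (unsigned N = 0; N < limit; N++) {
        bool m = mult32_mod(N);
        assert(mult32_div(N) == m);
        assert(mult32_mask(N) == m);
        assert(mult32_shift(N) == m);
    }
}

/* Guarded form (hypothetical reconstruction): the multiple-of-32 test
   guards the loop, but the loop bound is still the original N. */
void fill_array_guarded(unsigned N, float *restrict out,
                        const float *restrict scale,
                        const float *restrict in) {
    if ((N & 31u) == 0) {
        for (unsigned i = 0; i < N; i++)
            out[i] = *scale * in[i];
    }
}

/* Workaround form: N is rebound before the loop so the bound itself is
   visibly a multiple of 32. Unlike the guarded form, this processes the
   first (N & ~31u) elements when N is not a multiple of 32. */
void fill_array_rounded(unsigned N, float *restrict out,
                        const float *restrict scale,
                        const float *restrict in) {
    N = N & ~31u;
    for (unsigned i = 0; i < N; i++)
        out[i] = *scale * in[i];
}
```

For N = 64 both functions write the same 64 results; for N = 40 the guarded version writes nothing while the rounded version writes the first 32 elements.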