https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96275

--- Comment #1 from Witold Baryluk <witold.baryluk+gcc at gmail dot com> ---
FYI.

clang trunk 12 / 76a0c0ee6ffa9c38485776921948d8f930109674, doesn't do that
either:

fillArray:                              # @fillArray
        test    dil, 31
        jne     .LBB0_8
        test    edi, edi
        je      .LBB0_8
        vmovss  xmm0, dword ptr [rdx]           # xmm0 = mem[0],zero,zero,zero
        mov     eax, edi
        cmp     edi, 32
        jae     .LBB0_4
        xor     edx, edx
        jmp     .LBB0_7
.LBB0_4:
        vbroadcastss    ymm1, xmm0
        mov     edx, eax
        xor     edi, edi
        and     edx, -32
.LBB0_5:                                # =>This Inner Loop Header: Depth=1
        vmulps  ymm2, ymm1, ymmword ptr [rcx + 4*rdi]
        vmulps  ymm3, ymm1, ymmword ptr [rcx + 4*rdi + 32]
        vmulps  ymm4, ymm1, ymmword ptr [rcx + 4*rdi + 64]
        vmulps  ymm5, ymm1, ymmword ptr [rcx + 4*rdi + 96]
        vmovups ymmword ptr [rsi + 4*rdi], ymm2
        vmovups ymmword ptr [rsi + 4*rdi + 32], ymm3
        vmovups ymmword ptr [rsi + 4*rdi + 64], ymm4
        vmovups ymmword ptr [rsi + 4*rdi + 96], ymm5
        add     rdi, 32
        cmp     rdx, rdi
        jne     .LBB0_5
        cmp     rdx, rax
        je      .LBB0_8
.LBB0_7:                                # =>This Inner Loop Header: Depth=1
        vmulss  xmm1, xmm0, dword ptr [rcx + 4*rdx]
        vmovss  dword ptr [rsi + 4*rdx], xmm1
        inc     rdx
        cmp     rax, rdx
        jne     .LBB0_7
.LBB0_8:
        vzeroupper
        ret


the main inner loop is unrolled / pipelined more aggressively, and the fallback
code is simpler (just handle scalars scalarly), which is unrelated. But the
fallback code is still there.



Changing to different variations of the condition, like `if ((N/32)*32 == N)
{`, `if ((N % 32) == 0) {`, `if ((N & ~31u) == N) {`, `if ((N >> 5) << 5 == N)
{`,  doesn't make any difference.

I tried with signed int, and unsigned int. Same effect.

Reassigning to N (after removing constness), i.e. `N = N & ~31u`, or `N = (N >>
5) << 5`, does appear to do something, but if it is inside the condition it is
already too late.

Reply via email to