https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921

--- Comment #26 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> which should have same semantics as x86 min/max, and the backend need to
> support those variants for it.

With more combine patterns, it now generates:

.L4:
        vxorps  %xmm0, %xmm0, %xmm0
        vmaxps  (%rsi,%rax), %ymm0, %ymm0
        vmovups %ymm0, (%rdi,%rax)
        addq    $32, %rax
        cmpq    %rax, %r8
        jne     .L4

It looks like there is no loop-invariant motion after split1, so the vxorps
is not hoisted out of the loop.
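
For context, here is a minimal sketch of the kind of source loop that is
assumed to be behind code like the above. The actual testcase is not quoted in
this comment, so the function name clamp_to_zero and the exact operand order
are illustrative assumptions. The point is that the zero operand is loop
invariant, so ideally it would be materialized once before .L4 rather than on
every iteration.

```c
#include <stddef.h>

/* Hypothetical example, not the testcase from the bug report.
 * Writes max(src[i], 0) to dst[i] using a plain compare-and-select,
 * which a vectorizer may map to a packed max instruction. */
void clamp_to_zero(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = src[i];
        /* 0.0f is loop invariant; only x changes per iteration. */
        dst[i] = (x < 0.0f) ? 0.0f : x;
    }
}
```

Whether a given GCC version emits exactly the vmaxps sequence above for this
sketch depends on the options used (for example -O3 -mavx) and is not claimed
here.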


Similarly for the testcase in #c16:

.L4:
        vxorps  %xmm0, %xmm0, %xmm0
        vmaxps  (%rsi,%rax), %ymm0, %ymm0
        vmovups %ymm0, (%rdi,%rax)
        addq    $32, %rax
        cmpq    %rax, %r8
        jne     .L4

But it should still be better than before, since vxorps is cheap.
