https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71921
--- Comment #26 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> which should have same semantics as x86 min/max, and the backend need to
> support those variants for it.
With more combine patterns, it now generates:
.L4:
        vxorps  %xmm0, %xmm0, %xmm0
        vmaxps  (%rsi,%rax), %ymm0, %ymm0
        vmovups %ymm0, (%rdi,%rax)
        addq    $32, %rax
        cmpq    %rax, %r8
        jne     .L4
It looks like no loop invariant motion runs after split1, so the vxorps is
not hoisted out of the loop.
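For illustration only, here is a minimal C sketch of the kind of loop under discussion. It is not the testcase from this PR; the function name relu_sketch and its signature are hypothetical. The 0.0f operand is loop-invariant, so ideally the zero vector would be materialized once before the loop rather than on every iteration.

```c
#include <stddef.h>

/* Hypothetical reduced loop, not the testcase attached to the PR.
   Clamps negative inputs to zero, which a vectorizer may turn into a
   packed max against a zero vector.  */
void relu_sketch(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* 0.0f does not change across iterations, so the register
           holding it only needs to be zeroed once, outside the loop.  */
        dst[i] = src[i] < 0.0f ? 0.0f : src[i];
    }
}
```

Whether this exact source produces the listing above depends on the compiler version and flags, which I have not checked.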
Similarly for the testcase in #c16:
.L4:
        vxorps  %xmm0, %xmm0, %xmm0
        vmaxps  (%rsi,%rax), %ymm0, %ymm0
        vmovups %ymm0, (%rdi,%rax)
        addq    $32, %rax
        cmpq    %rax, %r8
        jne     .L4
But it should still be better than before, since vxorps is cheap.