https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100267

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > After supporting v{,p}expand* without mask operands, the codegen seems
> > to be optimal.
> > 
> 
> I was wrong; without a mask, it's just a simple move.

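For illustration, a full-mask expand load like the one below (a minimal
sketch, not the PR's testcase) touches every lane in order, so it is
semantically just a contiguous unaligned load and can be emitted as a
single vmovdqu instead of vpexpandq:

#include <immintrin.h>

/* Sketch: with an all-ones mask the expand load fills every lane with
   consecutive elements, i.e. it is equivalent to a plain unaligned load.  */
__m256i
full_mask_expand (const long long *p)
{
  /* 0xF selects all four 64-bit lanes of the ymm destination.  */
  return _mm256_maskz_expandloadu_epi64 (0xF, p);
}
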
The testcase is finally optimized to:

 _Z16dummyf1_avx512x8PK11flow_avx512:
.LFB5665:
        .cfi_startproc
        movl    (%rdi), %edx
        movq    8(%rdi), %rax
        vmovdqu (%rax,%rdx,8), %ymm0
        vmovdqu 32(%rax,%rdx,8), %ymm1
        vpaddq  %ymm1, %ymm0, %ymm0
        ret

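For reference, the function above is roughly equivalent to something like
the following sketch; the struct layout, field names, element type, and
return type are guesses inferred from the mangled name and the generated
loads, not the actual testcase from the PR:

#include <immintrin.h>

/* Guessed layout: a 32-bit index at offset 0 and a pointer to 64-bit
   elements at offset 8, matching the movl/movq loads above.  */
struct flow_avx512
{
  unsigned int idx;
  const long long *data;
};

/* Two full-mask expand loads plus a 64-bit add; once the unmasked expands
   are treated as simple moves, this compiles down to the
   vmovdqu/vmovdqu/vpaddq sequence shown above.  */
__m256i
dummyf1_avx512x8 (const flow_avx512 *f)
{
  __m256i lo = _mm256_maskz_expandloadu_epi64 (0xF, f->data + f->idx);
  __m256i hi = _mm256_maskz_expandloadu_epi64 (0xF, f->data + f->idx + 4);
  return _mm256_add_epi64 (lo, hi);
}
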
I'm testing the patch.
