https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That is what happens on the trunk (the revision that introduced this didn't do
that yet).  But even that permutation is more expensive than the rotate,
        rolq    $32, (%rdi)
vs.
        movq    (%rdi), %xmm1
        pshufd  $225, %xmm1, %xmm0
        movq    %xmm0, (%rdi)
At least for code size...
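
For reference, a minimal C sketch of why the two sequences are interchangeable.
The function names are hypothetical and this is not the testcase from the PR.
The pshufd immediate 225 (0xE1) selects source elements 1, 0, 2, 3, so it swaps
the two low 32-bit lanes, which is the same as rotating the 64-bit value by 32:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap the two 32-bit halves of a 64-bit object
   element by element, the form a vectorizer may turn into a permutation.  */
void
swap_halves (void *p)
{
  uint32_t v[2];
  memcpy (v, p, sizeof v);
  uint32_t t = v[0];
  v[0] = v[1];
  v[1] = t;
  memcpy (p, v, sizeof v);
}

/* Hypothetical helper: the same operation written as a 64-bit rotate by 32,
   the form that can be emitted as a single rolq on x86-64.  */
void
rotate_by_32 (void *p)
{
  uint64_t x;
  memcpy (&x, p, sizeof x);
  x = (x << 32) | (x >> 32);
  memcpy (p, &x, sizeof x);
}
```

Both functions leave the same bytes in memory regardless of endianness; which
instructions a given GCC revision picks for each is not something this sketch
shows.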
