https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984

--- Comment #8 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
(In reply to Andrew Pinski from comment #6)
> If you look at the difference between the 2 functions.
>         vextracti128    xmm1, ymm0, 0x1
> 
> vs
>         vmovdqa xmm1, xmm0
>         vextracti128    xmm0, ymm0, 0x1
> 
> The register allocator is allocating the result of the
> _mm256_extracti128_si256 in the first case to xmm1 but in the second case to
> xmm0. That means in the second case we need a move instruction to copy
> what was in ymm0 but only 128bits of it. And that is where vmovdqa is coming
> from.

I am sorry for being thick, but I fail to see what requires or causes this:

> That means in the second case we need a move instruction to copy what was
> in ymm0 but only 128bits of it.

What exactly needs moving only 128 bits of ymm0 and why, please?
