https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Maxim Egorushkin from comment #8)
> (In reply to Andrew Pinski from comment #6)
> > If you look at the difference between the 2 functions.
> >         vextracti128    xmm1, ymm0, 0x1
> > 
> > vs
> >         vmovdqa xmm1, xmm0
> >         vextracti128    xmm0, ymm0, 0x1
> > 
> > The register allocator is allocating the result of the
> > _mm256_extracti128_si256 in the first case to xmm1 but in the second case to
> > xmm0. That means in the second case we need a move instruction to copy
> > what was in ymm0, but only 128 bits of it. And that is where vmovdqa is
> > coming from.
> 
> I am sorry for being thick, but I fail to see what requires/causes 
> 
> > That means in the second case we need a move instruction to copy what 
> > was in ymm0, but only 128 bits of it.
> 
> What exactly needs moving only 128 bits of ymm0 and why, please?


Because you have a conflict. Register allocation is largely done locally, and
if you need a value from a register that will be clobbered by another
instruction, a move is inserted to preserve it. It just happens that in this
case only the lower 128 bits of the register are needed, so the copy can be
done with a "vmovdqa xmm*" instruction instead of copying the full 256-bit
register.
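To make the conflict concrete, here is a minimal sketch (an assumed testcase
along these lines, not the exact reproducer from this PR): the function wants
the low half of the incoming ymm0, which already lives in xmm0, plus the high
half produced by vextracti128. If the allocator picks xmm1 for the extract
result, nothing else is needed; if it picks xmm0, the low half still live
there has to be copied away first, and that copy is the vmovdqa.

#include <immintrin.h>

/* Hypothetical example: sum the low and high 128-bit halves of a
   256-bit vector.  */
__m128i
sum_halves (__m256i v)
{
  __m128i lo = _mm256_castsi256_si128 (v);      /* low half, already in xmm0 */
  __m128i hi = _mm256_extracti128_si256 (v, 1); /* high half */
  /* If the allocator puts 'hi' in xmm1, no extra move is needed.  If it
     puts 'hi' in xmm0, then 'lo' (still live in xmm0) must first be
     copied to another register, e.g. "vmovdqa xmm1, xmm0".  */
  return _mm_add_epi32 (lo, hi);
}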
