https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #8 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
(In reply to Andrew Pinski from comment #6)
> If you look at the difference between the 2 functions.
> vextracti128 xmm1, ymm0, 0x1
>
> vs
> vmovdqa xmm1, xmm0
> vextracti128 xmm0, ymm0, 0x1
>
> The register allocator is allocating the result of the
> _mm256_extracti128_si256 in the first case to xmm1 but in the second case to
> xmm0. That means in the second case we need to a move instruction to copy
> what was in ymm0 but only 128bits of it. And that is where vmovdqa is coming
> from.

I am sorry for being thick, but I fail to see what requires/causes:

> That means in the second case we need to a move instruction to copy what was
> in ymm0 but only 128bits of it.

What exactly needs moving only 128 bits of ymm0 and why, please?
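
For anyone following along, here is a rough, portable toy model of the two quoted sequences. It only illustrates the register aliasing involved: xmm0 is the low 128 bits of ymm0, so writing the extract result into xmm0 overwrites that low half. Whether the low half is actually still needed afterwards depends on the source of the two functions, which is not shown in this comment, so that part is an assumption. The `Ymm` type and all function names are hypothetical and have nothing to do with GCC internals.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of one 256-bit register: lanes 0-1 are the xmm (low) half,
// lanes 2-3 are the upper half. Hypothetical names, not GCC internals.
using Ymm = std::array<std::uint64_t, 4>;

// Models "vextracti128 dst, src, 0x1": dst's low half receives src's upper
// half; a VEX-encoded write to an xmm register zeroes dst's upper half.
inline void vextracti128_hi(Ymm& dst, const Ymm& src) {
    const std::uint64_t hi0 = src[2], hi1 = src[3];  // read first: dst may alias src
    dst = {hi0, hi1, 0, 0};
}

// Models "vmovdqa dst, src" with xmm operands: copies the low 128 bits.
inline void vmovdqa_xmm(Ymm& dst, const Ymm& src) {
    const std::uint64_t lo0 = src[0], lo1 = src[1];
    dst = {lo0, lo1, 0, 0};
}

// First quoted sequence: the result goes to xmm1, so ymm0 stays intact.
inline bool low_half_survives_case1() {
    Ymm ymm0{1, 2, 3, 4}, ymm1{};
    vextracti128_hi(ymm1, ymm0);
    return ymm0[0] == 1 && ymm0[1] == 2 && ymm1[0] == 3 && ymm1[1] == 4;
}

// Second quoted sequence: the result goes to xmm0, which would overwrite the
// low half of ymm0, so that half is copied to xmm1 beforehand.
inline bool low_half_survives_case2() {
    Ymm ymm0{1, 2, 3, 4}, ymm1{};
    vmovdqa_xmm(ymm1, ymm0);
    vextracti128_hi(ymm0, ymm0);
    return ymm1[0] == 1 && ymm1[1] == 2 && ymm0[0] == 3 && ymm0[1] == 4;
}

// Convenience check that both sequences keep both halves available.
inline void check_both_sequences() {
    assert(low_half_survives_case1());
    assert(low_half_survives_case2());
}
```

In this model both sequences end with the two 128-bit halves available, just in swapped registers; the second one pays for an extra copy. It says nothing about why the allocator picked xmm0 in the second case.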