https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|101926                      |

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Look at the difference between the two functions:
        vextracti128    xmm1, ymm0, 0x1

vs
        vmovdqa xmm1, xmm0
        vextracti128    xmm0, ymm0, 0x1

The register allocator assigns the result of _mm256_extracti128_si256 to xmm1
in the first case but to xmm0 in the second. That means in the second case we
need a move instruction to copy what was in ymm0 (only the low 128 bits of it)
before xmm0 is overwritten. That is where the vmovdqa comes from.
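
For illustration only, a pair of functions of roughly this shape can be
compiled with "gcc -O2 -mavx2 -S" to compare how the two halves get allocated.
These are not the functions from the report: the names lo_plus_hi, hi_minus_lo
and the run_* wrappers are made up here, and whether a given GCC version emits
the extra vmovdqa for them is not guaranteed.

```c
/* Hypothetical sketch, not the test case attached to the bug. */
#include <immintrin.h>
#include <stdint.h>

/* Enable AVX2 per function so the file also builds without -mavx2. */
#define AVX2 __attribute__((target("avx2")))

/* The argument arrives in ymm0 and the result is returned in xmm0.
   The low half already lives in xmm0, so only the high half needs
   a register of its own. */
AVX2 __m128i lo_plus_hi(__m256i v)
{
    __m128i lo = _mm256_castsi256_si128(v);      /* low 128 bits, no instruction needed */
    __m128i hi = _mm256_extracti128_si256(v, 1); /* high 128 bits (vextracti128) */
    return _mm_add_epi32(lo, hi);
}

/* Same two halves, used in the opposite operand order. If the allocator
   puts the extract result in xmm0, the low half has to be copied out
   first, which costs a register move. */
AVX2 __m128i hi_minus_lo(__m256i v)
{
    __m128i lo = _mm256_castsi256_si128(v);
    __m128i hi = _mm256_extracti128_si256(v, 1);
    return _mm_sub_epi32(hi, lo);
}

/* Pointer-based wrappers (also hypothetical) so that callers built
   without AVX never pass a 256-bit vector by value. */
AVX2 void run_lo_plus_hi(const int32_t in[8], int32_t out[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm_storeu_si128((__m128i *)out, lo_plus_hi(v));
}

AVX2 void run_hi_minus_lo(const int32_t in[8], int32_t out[4])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm_storeu_si128((__m128i *)out, hi_minus_lo(v));
}
```

Both functions compute the same thing regardless of which register each half
lands in; the allocation only changes whether a copy shows up in the assembly.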

IIRC there are a few other examples of this issue, and it comes down to subregs
not being handled well by the register allocator.

As I mentioned, register allocation is an NP-complete problem, so an extra
move (register copy) can happen in some cases if registers are allocated in the
wrong order.

It happens more often with vector instructions/registers because a vector
register can hold values in different "modes" (subregs).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926
[Bug 101926] [meta-bug] struct/complex/other argument passing and return should
be improved
