https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #5 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
(In reply to Andrew Pinski from comment #2)
> Register allocation is NP complete problem after all.

The vmovdqa instruction is probably intended to turn a ymm register into an xmm register by zeroing all the bits above the low 128. But the vpaddq instruction that follows ignores the upper bits of its xmm arguments, and its xmm result has all of its upper bits set to 0 anyway.

I appreciate that register allocation is NP-hard. Allocating registers is the most time-consuming part of writing non-trivial assembly by hand, and I defer to the compiler to allocate registers for my last-resort inline asm statements, which I use only when tweaking my C++ code cannot produce the assembly output I expect.

Hence, I wonder:

What requires that vmovdqa instruction? Why does register allocation end up having to allocate a register for this unnecessary instruction?

Is this instruction emitted by the register allocator, or by a pass that runs before register allocation?

How do I find the root cause of this instruction, please?
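
The reasoning above about vmovdqa being redundant before vpaddq can be illustrated with a small software model. This is only a sketch: `Ymm`, `model_vmovdqa_xmm`, `model_vpaddq_xmm` and `move_is_redundant` are hypothetical names introduced here, not code from the bug report. It assumes the VEX-encoded 128-bit forms of both instructions, which read only the low 128 bits of their sources and zero the upper bits of the destination register.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical software model of a 256-bit ymm register as four 64-bit lanes.
// Lanes 0-1 are the xmm (low 128-bit) part; lanes 2-3 are the upper 128 bits.
using Ymm = std::array<std::uint64_t, 4>;

// Models VEX-encoded `vmovdqa xmm_dst, xmm_src` (assumed semantics):
// copies the low 128 bits and zeroes the upper 128 bits of the destination.
Ymm model_vmovdqa_xmm(const Ymm& src) {
    return {src[0], src[1], 0, 0};
}

// Models VEX-encoded `vpaddq xmm_dst, xmm_a, xmm_b` (assumed semantics):
// adds the two low 64-bit lanes, ignores the upper lanes of both sources,
// and zeroes the upper 128 bits of the destination.
Ymm model_vpaddq_xmm(const Ymm& a, const Ymm& b) {
    return {a[0] + b[0], a[1] + b[1], 0, 0};
}

// True when inserting the move before the add does not change the result.
bool move_is_redundant(const Ymm& a, const Ymm& b) {
    return model_vpaddq_xmm(model_vmovdqa_xmm(a), b) == model_vpaddq_xmm(a, b);
}
```

Under this model `move_is_redundant` holds for any inputs, because the add never reads lanes 2-3. The model says nothing about why the compiler emits the move; it only restates the instruction semantics the argument relies on.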