https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #14 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
(In reply to Andrew Pinski from comment #6)
> It happens more often with vector instructions/registers due to the
> different "modes" of the registers that it can hold (subregs).

That's right, my empirical observations point to this issue as one root cause of gcc's unnecessary register moves.

Whenever I examine the assembly generated by gcc, I cannot help noticing these unnecessary register moves, like the `vmovq` or `vmovdqa` here. Or, worse, one constant variable that is never modified gets allocated to multiple registers for no apparent reason.

The most frequent example is accessing the low double of an xmm `__v2df` register. The following instruction that consumes that low double ignores all the higher bits of the xmm register and zeros out the higher bits in the destination, thus breaking any dependencies on the higher bits of the source registers. Yet gcc generates a register move that zeros out all the higher bits before using that register in the following instruction, which ignores these higher bits anyway. That `vmovq` or `vmovdqa` register move serves no useful purpose, but it costs extra memory for the code, pressure on the L1i cache and possibly the µop cache, and possibly extra iTLB misses.

Eliminating these unnecessary `vmovq` or `vmovdqa` register moves is the primary reason I have to use inline asm statements instead of calling the Intel AVX intrinsic functions or using portable gcc vector extensions alone.

One simple fix could be a special cast that treats an xmm register with multiple elements as a register with one low element, without copying. E.g. `_mm_cvtsd_f64` generates `vmovq`; I need a version of this function which generates no instruction but rather treats the `__v2df` xmm register as a plain `double`.
And the reverse cast: treat a `double` in an xmm register as a `__v2df` xmm register without copying, with the higher bits assumed undefined.
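
For reference, a minimal sketch of the source-level patterns in question. The function names (`low_subscript`, `low_intrinsic`, `scale_low`, `widen_zeroed`) are hypothetical and exist only for illustration; this is not the reproducer attached to this bug. Whether a redundant `vmovq`/`vmovdqa` appears depends on the gcc version, flags, and surrounding code, so the sketch makes no claim about the generated instructions.

```c
#ifdef __SSE2__
#include <emmintrin.h>  /* _mm_cvtsd_f64 */
#endif

/* Two packed doubles, declared with the gcc vector extension. */
typedef double v2df __attribute__((vector_size(16)));

/* Low element via the vector-extension subscript. */
static double low_subscript(v2df v)
{
    return v[0];
}

#ifdef __SSE2__
/* Low element via the Intel intrinsic named above. */
static double low_intrinsic(v2df v)
{
    return _mm_cvtsd_f64((__m128d)v);
}
#endif

/* Hypothetical consumer: only the low double of v is ever read, so the
 * higher bits of the source register do not affect the result. */
static double scale_low(v2df v, double k)
{
    return low_subscript(v) * k;
}

/* Reverse direction: widen a scalar to a vector. The upper lane is
 * explicitly zeroed here; this sketch has no way to say "leave the
 * upper lane undefined", which is the cast requested above. */
static v2df widen_zeroed(double x)
{
    v2df v = { x, 0.0 };
    return v;
}
```

Compiling a file like this with something like `gcc -O2 -mavx -S` and reading the assembly for `scale_low` is one way to check whether a given gcc version emits a move before the scalar multiply.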