https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984

--- Comment #14 from Maxim Egorushkin <maxim.yegorushkin at gmail dot com> ---
(In reply to Andrew Pinski from comment #6)

> It happens more often with vector instructions/registers due to the
> different "modes" of the registers that it can hold (subregs).

That's right, my empirical observations point to this issue as one root of the
unnecessary gcc register moves.

Whenever I examine the assembly generated by gcc, I cannot help noticing these
unnecessary register moves, like `vmovq` or `vmovdqa` here. Or, worse, one
constant variable that is never modified being allocated to multiple registers
for no apparent reason.

The most frequent example is accessing the low double of an xmm __v2df
register. The following instruction that consumes that low double ignores all
the higher bits of the xmm register and zeros out the higher bits in the
destination, thus breaking any dependency on the higher bits of the source
registers. Yet gcc generates a register move that zeros out all the higher
bits before the register is used by that instruction, which ignores them
anyway. That `vmovq` or `vmovdqa` register move serves no useful purpose, but
costs extra code size in RAM, pressure on the L1i cache and possibly the µop
cache, and possibly extra iTLB misses.
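
A minimal sketch of the access pattern described above, for illustration only.
The function name `low_sum` is hypothetical, and whether gcc actually emits the
extra move for it depends on the gcc version, the flags (e.g. -mavx), and the
surrounding code after inlining:

```c
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */

/* Hypothetical helper: add the low doubles of two vectors as scalars.
   Only lane 0 of each input is observed by the scalar add. */
static inline double low_sum(__m128d a, __m128d b) {
    double x = _mm_cvtsd_f64(a);  /* low double of a */
    double y = _mm_cvtsd_f64(b);  /* low double of b */
    return x + y;
}
```

Compiling a caller of such a helper with `-O2 -S` and inspecting the output is
one way to check whether a register move precedes the scalar instruction.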

Eliminating these unnecessary `vmovq` or `vmovdqa` register moves is the
primary reason I have to use inline asm statements instead of calling Intel AVX
API intrinsic functions, or operating with portable gcc vector extensions
alone.

One simple fix could be a special cast that treats an xmm register with
multiple elements as a register with one low element, without copying. E.g.
`_mm_cvtsd_f64` generates `vmovq`; I need a version of this function which
generates no instruction but rather treats the __v2df xmm register as a plain
double. And the reverse cast: treat a `double` in an xmm register as a __v2df
xmm register without a copy, while assuming the higher bits undefined.
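
For reference, a small sketch of what the existing intrinsics guarantee, which
is why they cannot serve as the requested casts. The helper names are
hypothetical. `_mm_set_sd` is specified to zero the upper lane, so a compiler
has to materialize that zero unless it can prove the upper lane is never
observed:

```c
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */

/* v2df -> double: read lane 0; the upper lane is not observed. */
static inline double low_lane(__m128d v) {
    return _mm_cvtsd_f64(v);
}

/* double -> v2df with the upper lane defined as zero. This is NOT the
   "upper bits undefined" cast requested above. */
static inline __m128d from_double_zeroed(double d) {
    return _mm_set_sd(d);
}

/* Inspection helper: read lane 1 by moving it into lane 0 first. */
static inline double high_lane(__m128d v) {
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}
```

The requested cast would differ from `from_double_zeroed` only in leaving the
upper lane unspecified, which would let the compiler reuse the register as is.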
