https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #6 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Michael_S from comment #5)
> 
> Even scalar-to-scalar or vector-to-vector moves that are hoisted at renamer
> does not have a zero cost, because quite often renamer itself constitutes
> the narrowest performance bottleneck. But those moves... I don't think that
> they are hoisted by renamer.

I took a look at several Intel and AMD optimization reference manuals and instruction tables. None of the existing x86 microarchitectures, old or new, eliminates scalar-to-SIMD moves at the renamer. That is to be expected on the newer microarchitectures (Bulldozer and later for AMD, Sandy Bridge and later for Intel), because on them scalar and SIMD registers live in separate physical register files. The older microarchitectures may have kept both kinds of registers in a common physical storage area, but they simply were not smart enough to eliminate the moves.

So these moves have non-zero latency. On some cores, including some of the newest, the latency is even higher than one clock. Their throughput also tends to be low, most typically one scalar-to-SIMD move per clock. For comparison, scalar-to-scalar and SIMD-to-SIMD moves can be executed (or eliminated at the renamer) at rates of 2, 3 or even 4 per clock.
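For illustration only, a minimal C sketch (not from the bug report; the function names are hypothetical) of code whose operands must cross from general-purpose registers into SIMD registers. It assumes an x86-64 target with SSE2. The exact instructions GCC emits (typically movq, plus punpcklqdq or pinsrq) depend on compiler version and flags and are not guaranteed.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86-64 target */
#include <stdint.h>

/* Hypothetical helper: pack two 64-bit scalars into one 128-bit vector.
   When lo and hi arrive in general-purpose registers (e.g. as function
   arguments), this needs GPR-to-XMM transfers such as movq; per the
   comment above, such transfers are not eliminated at the renamer. */
__m128i pack_u64x2(uint64_t lo, uint64_t hi)
{
    /* _mm_set_epi64x takes the high lane first, the low lane second. */
    return _mm_set_epi64x((long long)hi, (long long)lo);
}

/* Hypothetical helper: add two packed pairs lane by lane (wrapping modulo
   2^64), then move the low lane back to a general-purpose register,
   which is an XMM-to-GPR transfer. */
uint64_t add_u64x2_low(uint64_t a_lo, uint64_t a_hi,
                       uint64_t b_lo, uint64_t b_hi)
{
    __m128i sum = _mm_add_epi64(pack_u64x2(a_lo, a_hi),
                                pack_u64x2(b_lo, b_hi));
    return (uint64_t)_mm_cvtsi128_si64(sum);
}
```

In a straightforward lowering, the single vector add in `add_u64x2_low` is fed by four scalar-to-SIMD transfers and followed by one SIMD-to-scalar transfer. At roughly one such transfer per clock, the transfers, not the add, can end up limiting throughput.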