
already5chosen at yahoo dot com via Gcc-bugs Mon, 16 May 2022 07:08:36 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617


--- Comment #6 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Michael_S from comment #5)
> 
> Even scalar-to-scalar or vector-to-vector moves that are hoisted at renamer
> does not have a zero cost, because quite often renamer itself constitutes
> the narrowest performance bottleneck. But those moves... I don't think that
> they are hoisted by renamer.

I took a look at several Intel and AMD Optimization Reference Manuals and
instruction tables. None of the existing x86 microarchitectures, old or new,
eliminates scalar-to-SIMD moves at the renamer. That is sort of obvious for
newer microarchitectures (Bulldozer or later for AMD, Sandy Bridge or later for
Intel), because on these microarchitectures scalar and SIMD registers live in
separate physical register files.
As for older microarchitectures, they may have kept both in a common physical
storage area, but they simply were not smart enough to eliminate the
moves.
So these moves have non-zero latency. On some cores, including some of
the newest, the latency is even higher than one clock. And the throughput tends
to be rather low, most typically one scalar-to-SIMD move per clock. For
comparison, scalar-to-scalar and SIMD-to-SIMD moves can be executed (or
eliminated at the renamer) at rates of 2, 3 or even 4 per clock.
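
To make the cost concrete, here is a minimal sketch of the kind of code where
it matters. The function name add2 and its signature are hypothetical, chosen
only for illustration; this is not the test case from the bug report. Both
results are computed in general-purpose registers and then stored to adjacent
memory, so a vectorizer may be tempted to merge the two stores into one SIMD
store, which first requires moving each value from a scalar register into a
SIMD register.

```c
#include <stdint.h>

/* Hypothetical example: add two 128-bit numbers, each held as two
 * 64-bit limbs (least significant limb first). */
void add2(uint64_t r[2], const uint64_t a[2], const uint64_t b[2])
{
    uint64_t lo = a[0] + b[0];
    uint64_t carry = lo < a[0];        /* 1 if the low limb wrapped around */
    uint64_t hi = a[1] + b[1] + carry;

    /* lo and hi live in general-purpose registers here.
     * Two scalar stores need no cross-domain move.
     * One 128-bit SIMD store would need scalar-to-SIMD moves first
     * (on x86-64, for example, movq xmm, r64 followed by pinsrq),
     * and those moves are the ones discussed above. */
    r[0] = lo;
    r[1] = hi;
}
```

Whether a given compiler version actually merges the stores depends on its cost
model and flags. Comparing the assembly produced at -O2 with and without
-fno-tree-slp-vectorize is one way to check.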

[Bug target/105617] [12/13 Regression] Slp is maybe too aggressive in some/many cases
