https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119010

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #5)
> I had a patch to reduce max issue back to 4 (exactly because of the compile
> time slowdown and because, the way we model the decoder, we won't issue more than
> 4 instructions).

Btw, the decoder indeed cannot saturate the 6-wide issue (it decodes
4 insns per thread), but the uop cache can deliver 6 uops per clock per thread
and the OOO scheduler can dispatch 8 uops per cycle (shared by both threads).
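
To make the numbers above concrete, here is a toy bottleneck model (Python, purely illustrative): the sustained per-thread width is taken as the minimum of the front-end source (decoder or uop cache) and that thread's share of dispatch. The even split of dispatch between SMT threads and the absence of any other limit (branches, port pressure, retire) are assumptions of this sketch, not a description of the hardware or of GCC's scheduler model.

```python
# Front-end widths as quoted above (insns or uops per cycle).
DECODE_PER_THREAD = 4    # legacy decoder, per thread
OPCACHE_PER_THREAD = 6   # uop cache delivery, per thread
DISPATCH_TOTAL = 8       # OOO dispatch, shared by both SMT threads


def sustained_width(opcache_hit: bool, active_threads: int = 1) -> int:
    """Toy upper bound on uops/cycle a single thread can feed to the back end.

    Assumption (hypothetical): shared dispatch is split evenly between the
    active threads, and nothing else limits throughput.
    """
    if active_threads not in (1, 2):
        raise ValueError("expected one or two SMT threads")
    # Where the uops come from: uop cache on a hit, legacy decoder on a miss.
    source = OPCACHE_PER_THREAD if opcache_hit else DECODE_PER_THREAD
    # This thread's share of the shared dispatch bandwidth.
    dispatch_share = DISPATCH_TOTAL // active_threads
    return min(source, dispatch_share)
```

Under these assumptions a single thread reaches 6 only on a uop cache hit and 4 on a miss, while with both threads active each is capped at 4 either way, which is why an issue rate of 4 corresponds to the decoder (uop cache miss) path.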

So it's a good question what we actually model and whether that makes sense.
We do seem to model the actual issue ports, but those sit after the OOO
engine, while what we schedule is the instruction stream that enters the
decoder, with quite some intermediate magic happening in between...  I suppose
we can argue that we optimize code layout for the case of a uop cache miss
(rather than, for example, for best uop cache occupancy or 6-wide issue from
it).  Btw, you can actually benchmark with the uop cache turned off (see
https://chipsandcheese.com/p/disabling-zen-5s-op-cache-and-exploring for which
MSR to modify); that might be interesting when tuning the scheduler.
