https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119010
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #5)
> I had patch to reduce max issue back to 4 (exactly because of the compile
> time slowdown and because the way we model decoder we won't issue more than
> 4 instructions).

Btw, the decoder indeed cannot saturate the 6-wide issue width (it decodes
4 insns per thread), but the uop cache can issue 6 uops per clock per thread
and the OOO scheduler can dispatch 8 uops per cycle (for both threads).

So it's a good question what we model and whether that makes sense.  We do
seem to model the actual issue ports, but those come after OOO, while what
we schedule is what gets into the decoder, so there's quite some
intermediate magic happening in between...  I suppose we can argue that we
optimize code layout for the case of a uop cache miss (rather than, for
example, for best uop cache occupation or for 6-wide issue from the uop
cache).

Btw, you can actually benchmark with the uop cache turned off (see
https://chipsandcheese.com/p/disabling-zen-5s-op-cache-and-exploring for
which MSR to tick) - that might be an interesting thing to do when tuning
the scheduler.
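
To make the width mismatch concrete, here is a rough back-of-the-envelope
sketch (not anything GCC does) of the per-stage lower bound on cycles for a
straight-line block.  It assumes one uop per insn, reads the widths above as
per-clock limits, and ignores dependencies, port conflicts and SMT sharing;
the names WIDTHS, min_cycles and front_end_bounds are made up for
illustration.

```python
# Widths quoted in the comment above; stage labels are illustrative only.
WIDTHS = {
    "decoder": 4,    # insns per thread (read here as a per-clock limit)
    "uop_cache": 6,  # uops per clock per thread
    "dispatch": 8,   # uops per cycle, for both threads
}


def min_cycles(n_ops: int, width: int) -> int:
    """Lower bound on cycles to pass n_ops through a stage of given width."""
    if n_ops < 0 or width <= 0:
        raise ValueError("n_ops must be >= 0 and width must be > 0")
    # Integer ceiling division: a partially filled cycle still costs a cycle.
    return -(-n_ops // width)


def front_end_bounds(n_ops: int) -> dict:
    """Per-stage lower bounds, assuming (simplification) one uop per insn."""
    return {stage: min_cycles(n_ops, w) for stage, w in WIDTHS.items()}


if __name__ == "__main__":
    for stage, cycles in front_end_bounds(24).items():
        print(f"{stage}: at least {cycles} cycles for 24 ops")
```

For a 24-op block this gives 6 cycles through the decoder versus 4 from the
uop cache, which is the gap between scheduling for an issue rate of 4 and
for an issue rate of 6.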