https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121591
ak at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ak at gcc dot gnu.org

--- Comment #2 from ak at gcc dot gnu.org ---
Many x86 targets have limits on how many branches their branch predictor can
track per 16-byte line, so what you are asking for is likely slower. On others
there are similar limits on what the decoded icache can cache per 32 bytes.
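To make the per-line limit concrete, here is a minimal sketch (not part of the bug report) that counts how many branch instructions land in the same aligned 16-byte or 32-byte line. The helper name `max_branches_per_line` and the idea of feeding it a sorted list of branch addresses are assumptions for illustration only; the actual limits depend on the microarchitecture and are not stated in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Given branch instruction addresses sorted in ascending order, return the
   largest number of branches that share one aligned line of line_size bytes.
   line_size must be a nonzero power of two (e.g. 16 or 32). */
static size_t max_branches_per_line(const uint64_t *addrs, size_t n,
                                    uint64_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    size_t best = 0;       /* densest line seen so far */
    size_t run = 0;        /* branches counted in the current line */
    uint64_t cur_line = 0; /* base address of the current line */

    for (size_t i = 0; i < n; i++) {
        /* Round the address down to the start of its aligned line. */
        uint64_t line = addrs[i] & ~(line_size - 1);
        if (i == 0 || line != cur_line) {
            cur_line = line;
            run = 0;
        }
        run++;
        if (run > best)
            best = run;
    }
    return best;
}
```

With the made-up addresses 0x1002, 0x1006, 0x100a, 0x100e, 0x1013 and 0x1021, the helper returns 4 for a 16-byte line and 5 for a 32-byte line. Packing branches more densely raises these counts, which is what can push a line past what the predictor or the decoded icache tracks.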