https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116645
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-09-09
            Summary|[13/14/15 regression] Huge  |[13/14/15 regression] Huge
                   |performance loss after      |performance loss after
                   |13.2.0 compiler upgrade     |13.2.0 compiler upgrade;
                   |                            |reload CSE regs has
                   |                            |scalability issues

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I measure 41s with GCC 13.2, with 57% spent in reload CSE regs, vs. 14s with GCC 11.4, with 7% spent in reload CSE regs. The compile time with -O2 behaves similarly with GCC 13.2 but increases to 36s with GCC 11.4, also showing the 63% reload CSE regs figure. So I'd probably blame inliner heuristic changes for the observed difference, but the problem exposed looks latent.

With -O1, the suggested option for large auto-generated code when you experience compile-time or memory-usage issues, GCC 11.4 takes 20s and GCC 13.2 is similar (again, both with 53% in reload CSE regs).

Also confirmed with GCC 14.2 and a somewhat old trunk (r15-2794).

So confirmed. I don't think bisection will reveal anything interesting. Somebody needs to sit down and look at postreload to see why it takes so long for this testcase.

A profile for GCC 14.2 shows

    Samples: 89K of event 'cycles:Pu', Event count (approx.): 114469365450
    Overhead  Samples  Command  Shared Object  Symbol
      37.58%    33298  cc1plus  cc1plus        [.] _ZN10hash_tableI13cselib_hasherLb0E11xcallocatorE19find_slot_with_hashERKPNS0_3keyEj13insert_option
      14.25%    12619  cc1plus  cc1plus        [.] _Z22rtx_equal_for_cselib_1P7rtx_defS0_12machine_modei
       4.27%     3776  cc1plus  cc1plus        [.] _Z14bitmap_set_bitP11bitmap_headi

so I'd say it's a bad hash (again).
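The profile above fits the pattern a weak hash produces: most of the time goes to slot lookup (`find_slot_with_hash`), and much of the rest goes to the equality callback (`rtx_equal_for_cselib_1`) that runs at each probe step. A bad hash is only the working hypothesis here, not a confirmed diagnosis.

The sketch below is a hypothetical toy, not GCC's `hash_table` or cselib code. It uses simple linear probing, whereas GCC's `hash_table` has its own probing scheme. Keys are made-up `(opcode, mode, operand)` tuples standing in for RTL expressions. The sketch counts equality checks when a hash ignores most of the key versus when it mixes every field.

```python
"""Toy open-addressing hash table that counts key equality comparisons.

Hypothetical illustration only: this is NOT GCC's hash_table or cselib.
Keys are made-up (opcode, mode, operand) tuples standing in for RTL.
"""


class ProbeCountingTable:
    def __init__(self, hash_fn, capacity=1 << 12):
        # capacity must be a power of two (masking) and larger than the
        # number of inserted keys, since this toy never resizes.
        self.hash_fn = hash_fn
        self.slots = [None] * capacity
        self.mask = capacity - 1
        self.compares = 0  # equality checks performed during probing

    def find_slot(self, key, insert=True):
        """Linear probing: walk from the home slot to the key or an empty slot."""
        i = self.hash_fn(key) & self.mask
        while self.slots[i] is not None:
            # Each occupied slot visited costs one equality check, roughly
            # the role an equality callback plays in a real hash table.
            self.compares += 1
            if self.slots[i] == key:
                return i
            i = (i + 1) & self.mask
        if insert:
            self.slots[i] = key
        return i


def weak_hash(key):
    # Deliberately bad: only the opcode contributes, so every key with
    # the same opcode starts probing from the same home slot.
    opcode, _mode, _operand = key
    return hash(opcode)


def full_hash(key):
    # Mixes all fields of the key.
    return hash(key)


def count_compares(hash_fn, n, capacity=1 << 12):
    """Insert n distinct keys sharing one opcode; return equality checks made."""
    if n >= capacity:
        raise ValueError("n must be smaller than the table capacity")
    table = ProbeCountingTable(hash_fn, capacity)
    for reg in range(n):
        table.find_slot(("plus", "SI", reg))
    return table.compares
```

With `weak_hash`, the i-th insert walks past all i earlier keys. Inserting n keys therefore costs exactly n(n-1)/2 comparisons, for example 499,500 for n = 1000. With `full_hash`, the count stays small at this load factor. The point is only that a hash collapsing many distinct keys onto one probe chain turns each lookup into a linear scan of equality calls.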