https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
compile-time is back to the first jump caused by r14-2337-g37a231cc7594d1,
thanks Roger.  We still have

 LRA non-specific                   :   3.53 ( 75%)

at -O0 here which Roger's followup patch will improve (but not generally solve
the issue).

At -O1 combine dominates, at -O2 we see other parts of RA being slow:

 integrated RA                      :   7.10 ( 23%)
 LRA non-specific                   :   1.56 (  5%)
 LRA virtuals elimination           :   0.07 (  0%)
 LRA reload inheritance             :   1.02 (  3%)
 LRA create live ranges             :   0.88 (  3%)
 LRA hard reg assignment            :   8.22 ( 27%)
 LRA coalesce pseudo regs           :   0.00 (  0%)
 LRA rematerialization              :   0.18 (  1%)

Samples: 124K of event 'cycles:u', Event count (approx.): 164730867020
Overhead       Samples  Command  Shared Object  Symbol
  16.60%         20660  cc1      cc1            [.] find_hard_regno_for_1
  11.90%         14742  cc1      cc1            [.] bitmap_set_bit
   6.47%          7973  cc1      cc1            [.] color_allocnos
   3.31%          4023  cc1      cc1            [.] bitmap_bit_p
   3.07%          3791  cc1      cc1            [.] remove_allocno_from_bucket_and_push
   2.77%          3435  cc1      cc1            [.] assign_hard_reg
   2.54%          3138  cc1      cc1            [.] ira_build_conflicts

in find_hard_regno_for_1 the loop over live ranges is what's costly, esp.
because it seems the conditionals in the loops depend on (indirect) memory
and that no longer fits nicely into caches.

Maybe regno_allocno_class_array can be shrunk from 'enum reg_class'
(unsigned int) to something smaller.  It looks like this array is a memory
optimization since reg_allocno_class would perform a much sparser access.
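
To illustrate the "something smaller" idea: a minimal sketch, not GCC code, of
storing a register class per regno in one byte instead of a full enum.  All
names here (demo_reg_class, demo_class_small, the helpers) are hypothetical
stand-ins; the real 'enum reg_class' is target-defined, and the size of an enum
is implementation-defined (commonly 4 bytes).  Whether this would actually help
find_hard_regno_for_1 is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a target's 'enum reg_class'; not GCC's
   definition.  */
enum demo_reg_class
{
  DEMO_NO_REGS,
  DEMO_GENERAL_REGS,
  DEMO_FLOAT_REGS,
  DEMO_ALL_REGS,
  DEMO_N_REG_CLASSES
};

/* One-byte storage is only valid while every class value fits in it.  */
_Static_assert (DEMO_N_REG_CLASSES <= UINT8_MAX,
                "register classes must fit in one byte");

/* Narrow element type for the per-regno table (assumption: uint8_t).  */
typedef uint8_t demo_class_small;

/* Read the class of REGNO from the narrow table.  */
static inline enum demo_reg_class
demo_get_class (const demo_class_small *tab, size_t regno)
{
  return (enum demo_reg_class) tab[regno];
}

/* Store CLS for REGNO, checking that it is a valid class value.  */
static inline void
demo_set_class (demo_class_small *tab, size_t regno, enum demo_reg_class cls)
{
  assert (cls < DEMO_N_REG_CLASSES);
  tab[regno] = (demo_class_small) cls;
}

/* Table footprint in bytes for N regnos with enum-sized elements.  */
static inline size_t
demo_table_bytes_wide (size_t n)
{
  return n * sizeof (enum demo_reg_class);
}

/* Table footprint in bytes for N regnos with one-byte elements.  */
static inline size_t
demo_table_bytes_narrow (size_t n)
{
  return n * sizeof (demo_class_small);
}
```

With a 4-byte enum the narrow table is a quarter of the size, so four times as
many entries share a cache line.  The cost is a widening conversion on each
read, which is usually cheap next to a cache miss.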