https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Compile time is back to the first jump caused by r14-2337-g37a231cc7594d1,
thanks Roger.  We still have

 LRA non-specific                   :   3.53 ( 75%)

at -O0 here, which Roger's follow-up patch will improve (but it will not
solve the issue in general).

At -O1 combine dominates; at -O2 we see other parts of RA being slow:

 integrated RA                      :   7.10 ( 23%) 
 LRA non-specific                   :   1.56 (  5%)
 LRA virtuals elimination           :   0.07 (  0%)
 LRA reload inheritance             :   1.02 (  3%)
 LRA create live ranges             :   0.88 (  3%)
 LRA hard reg assignment            :   8.22 ( 27%)
 LRA coalesce pseudo regs           :   0.00 (  0%)
 LRA rematerialization              :   0.18 (  1%)

Samples: 124K of event 'cycles:u', Event count (approx.): 164730867020          
Overhead       Samples  Command  Shared Object       Symbol                     
  16.60%         20660  cc1      cc1                 [.] find_hard_regno_for_1
  11.90%         14742  cc1      cc1                 [.] bitmap_set_bit
   6.47%          7973  cc1      cc1                 [.] color_allocnos
   3.31%          4023  cc1      cc1                 [.] bitmap_bit_p
   3.07%          3791  cc1      cc1                 [.] remove_allocno_from_bucket_and_push
   2.77%          3435  cc1      cc1                 [.] assign_hard_reg
   2.54%          3138  cc1      cc1                 [.] ira_build_conflicts

In find_hard_regno_for_1 the loop over live ranges is what's costly,
especially because the conditionals in the loop seem to depend on (indirect)
memory loads, and that data no longer fits nicely into the caches.

Maybe regno_allocno_class_array can be shrunk from 'enum reg_class'
(unsigned int) to something smaller.  It looks like this array is a
memory optimization since reg_allocno_class would perform a much sparser
access.
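
To illustrate the shrinking idea, here is a minimal sketch of a per-pseudo
class table stored as one byte per entry instead of one enum per entry.
Everything in it is hypothetical: 'enum demo_reg_class' is a stand-in for
the real 'enum reg_class', and the helper names do not exist in GCC.  It
only assumes that the number of register classes fits in 8 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GCC's 'enum reg_class'; real targets define
   their own, larger set of classes.  */
enum demo_reg_class
{
  DEMO_NO_REGS,
  DEMO_GENERAL_REGS,
  DEMO_FLOAT_REGS,
  DEMO_ALL_REGS,
  DEMO_LIM_REG_CLASSES
};

/* The narrow table is only valid if every class value fits in a byte.  */
_Static_assert (DEMO_LIM_REG_CLASSES <= UINT8_MAX,
                "register classes must fit in uint8_t");

/* Bytes needed for a table with one enum-sized entry per pseudo.  */
static inline size_t
wide_table_bytes (size_t n_pseudos)
{
  return n_pseudos * sizeof (enum demo_reg_class);
}

/* Bytes needed for a table with one byte-sized entry per pseudo.  */
static inline size_t
narrow_table_bytes (size_t n_pseudos)
{
  return n_pseudos * sizeof (uint8_t);
}

/* Store class CL for pseudo REGNO in the narrow table.  */
static inline void
narrow_set (uint8_t *table, size_t regno, enum demo_reg_class cl)
{
  assert ((unsigned int) cl < DEMO_LIM_REG_CLASSES);
  table[regno] = (uint8_t) cl;
}

/* Read the class of pseudo REGNO back from the narrow table.  */
static inline enum demo_reg_class
narrow_get (const uint8_t *table, size_t regno)
{
  return (enum demo_reg_class) table[regno];
}
```

With a 4-byte enum the narrow table would take a quarter of the space, so
more entries share a cache line.  Whether that helps the loop in
find_hard_regno_for_1 in practice would have to be measured.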
