https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
      Known to fail|                            |14.0
           Keywords|                            |ra
     Ever confirmed|0                           |1
                 CC|                            |vmakarov at gcc dot gnu.org
   Last reconfirmed|2024-03-26 00:00:00         |2024-03-27

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see the following on x86_64-linux with release checking:

 tree SSA rewrite                   :  76.99 ( 31%)   0.09 (  5%)  77.11 ( 31%)    96M (  9%)
 integrated RA                      :  92.31 ( 37%)   0.15 (  8%)  92.49 ( 37%)   105M ( 10%)
 LRA create live ranges             :  54.01 ( 22%)   0.00 (  0%)  54.02 ( 22%)   885k (  0%)
 TOTAL                              : 246.34          1.88        248.43         1039M
246.34user 2.02system 4:08.92elapsed 99%CPU (0avgtext+0avgdata 3287072maxresident)k
70416inputs+0outputs (110major+1229628minor)pagefaults 0swaps

The tree SSA rewrite time is interesting; it is probably bitmap slowness and cache dependent.

With -O1:

 tree PTA                           :  85.65 ( 14%)   0.21 (  3%)  85.89 ( 14%)   348M (  2%)
 tree SSA rewrite                   :  76.05 ( 13%)   0.10 (  1%)  76.14 ( 12%)    96M (  1%)
 tree SSA incremental               : 181.52 ( 30%)   0.03 (  0%) 181.50 ( 30%) 10031k (  0%)
 expand vars                        :  66.72 ( 11%)   0.00 (  0%)  66.74 ( 11%)  6132k (  0%)
 expand                             :  64.33 ( 11%)   0.02 (  0%)  64.39 ( 11%)   172M (  1%)
 TOTAL                              : 603.55          7.72        611.61        19327M
603.55user 7.83system 10:11.78elapsed 99%CPU (0avgtext+0avgdata 19809792maxresident)k
21520inputs+0outputs (48major+5102514minor)pagefaults 0swaps

Definitely an "interesting" testcase.

The profile for -O0 shows IDF (iterated dominance frontier) computation, which
is part of SSA rewrite and a usual suspect, plus other bits that might be
interesting for the RA part.

Samples: 1M of event 'cycles:u', Event count (approx.): 1332096582355           
Overhead       Samples  Command  Shared Object       Symbol                     
  24.78%        243663  cc1plus  cc1plus             [.] compute_idf
  11.29%        115134  cc1plus  cc1plus             [.] make_hard_regno_dead
  10.29%        104126  cc1plus  cc1plus             [.] process_bb_node_lives
   5.29%         53680  cc1plus  cc1plus             [.] mark_pseudo_regno_live
   4.95%         50051  cc1plus  cc1plus             [.] mark_ref_dead
   3.95%         40075  cc1plus  cc1plus             [.] update_allocno_pressure
   2.73%         27977  cc1plus  cc1plus             [.] lra_create_live_ranges_
   2.48%         25136  cc1plus  cc1plus             [.] inc_register_pressure
   2.37%         24268  cc1plus  cc1plus             [.] update_pseudo_point
   2.23%         21976  cc1plus  cc1plus             [.] mergesort<sort_ctx>
   2.19%         22208  cc1plus  cc1plus             [.] make_object_dead
   2.09%         21316  cc1plus  cc1plus             [.] sparseset_clear_bit
   1.99%         20181  cc1plus  cc1plus             [.] bitmap_set_bit

I'll note this was all tested on trunk; GCC 11 might behave even worse, and
quite a few deep recursion issues have been fixed in newer releases.
