https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #96 from Richard Biener <rguenth at gcc dot gnu.org> ---
The full testcase on trunk (g:95d94b52ea8478334fb92cca545f0bd904bd0034) at -O0 -g
now takes 9s to compile and uses 1GB ram.

With -O1 -g we have

Time variable                                   usr           sys          wall           GGC
 callgraph functions expansion      :  13.41 ( 12%)   0.21 ( 60%)  13.63 ( 12%)   439M ( 73%)
 callgraph ipa passes               :  94.79 ( 86%)   0.13 ( 37%)  94.95 ( 86%)    75M ( 13%)
 ipa function summary               :  91.46 ( 83%)   0.02 (  6%)  91.53 ( 83%)    17M (  3%)
 tree PTA                           :   5.78 (  5%)   0.05 ( 14%)   5.85 (  5%)    23M (  4%)
 TOTAL                              : 109.96          0.35        110.37          597M
109.97user 0.37system 1:50.38elapsed 99%CPU (0avgtext+0avgdata 1110568maxresident)k
0inputs+0outputs (0major+350549minor)pagefaults 0swaps

where perf shows

Samples: 448K of event 'cycles:u', Event count (approx.): 483237005145
Overhead       Samples  Command   Shared Object     Symbol
  17.26%         77187  f951      f951              [.] get_ref_base_and_extent
   8.36%         37385  f951      f951              [.] stmt_may_clobber_ref_p_1
   7.16%         32045  f951      f951              [.] default_binds_local_p_3
   6.40%         28628  f951      f951              [.] bitmap_bit_p
   6.39%         28557  f951      f951              [.] determine_known_aggregate_parts
   5.92%         26464  f951      f951              [.] pt_solution_includes_1
   4.66%         20834  f951      f951              [.] call_may_clobber_ref_p_1
   3.44%         15406  f951      f951              [.] flags_from_decl_or_type
   3.35%         14971  f951      f951              [.] refs_may_alias_p_1
   3.05%         13667  f951      f951              [.] gimple_call_flags
   2.55%         11387  f951      f951              [.] cgraph_node::get_availability
   2.40%         10739  f951      libc-2.26.so      [.] __strncmp_sse42
   2.32%         10372  f951      f951              [.] check_fnspec
   1.89%          8411  f951      f951              [.] bitmap_set_bit
   1.71%          7635  f951      f951              [.] private_lookup_attribute
   1.68%          7512  f951      f951              [.] get_modref_function_summary
   1.52%          6805  f951      f951              [.] decl_binds_to_current_def_p
   1.46%          6512  f951      f951              [.] gimple_call_fnspec
   1.26%          5582  f951      f951              [.] bitmap_clear_bit
   0.94%          4212  f951      f951              [.] cgraph_node::function_or_virtual_thunk_symbol

We need to do something about the IPA fnsummary cost; it looks unreasonable
compared to all the rest, at least for -O1.  Cutting down --param
ipa-max-aa-steps doesn't seem to help, but it looks like the accounting is
simply broken.

And with -O2 or -O3 we have

Time variable                                   usr           sys          wall           GGC
 callgraph functions expansion      : 201.23 ( 20%)   0.77 ( 46%) 202.05 ( 20%)  1230M ( 82%)
 callgraph ipa passes               : 807.58 ( 80%)   0.86 ( 52%) 808.75 ( 80%)   201M ( 13%)
 ipa inlining heuristics            :  40.25 (  4%)   0.01 (  1%)  40.24 (  4%)    41M (  3%)
 alias stmt walking                 :  21.48 (  2%)   0.20 ( 12%)  21.72 (  2%)   601k (  0%)
 tree PTA                           : 788.36 ( 78%)   0.76 ( 46%) 789.43 ( 78%)   101M (  7%)
 tree slp vectorization             :  13.97 (  1%)   0.04 (  2%)  14.01 (  1%)   225M ( 15%)
 expand vars                        :  92.66 (  9%)   0.00 (  0%)  92.72 (  9%)    63M (  4%)
 TOTAL                              :1010.42          1.66       1012.46         1509M
1010.42user 1.73system 16:52.53elapsed 99%CPU (0avgtext+0avgdata 4764428maxresident)k
0inputs+0outputs (0major+1199966minor)pagefaults 0swaps

Surprisingly, the IPA fnsummary issue shows up at -O1 only, but maybe it's an
accounting issue.  perf with callgraph points to (if I interpret correctly) the
determine_known_aggregate_parts function which, while accounting for alias
queries done via get_continuation_for_phi, does not account for those done
by walking the VDEF chain itself.  I'm testing a fix.
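
To illustrate the kind of accounting gap described above, here is a toy model
of a budgeted backwards walk over a chain of memory definitions.  It is not
GCC code: ToyDef, walk_with_budget and the budget/queries counters are
hypothetical names made up for this sketch, and the real walker and its limit
(--param ipa-max-aa-steps) work on GIMPLE virtual operands.  The point is only
that every alias query made during the walk has to be charged against the
budget, otherwise the limit does not bound the work.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for one memory-defining statement in a VDEF-like chain.
// (Hypothetical type; GCC uses GIMPLE statements and virtual operands.)
struct ToyDef {
    int prev;       // index of the previous definition, -1 at the chain start
    bool clobbers;  // whether this definition clobbers the reference of interest
};

// Walk the chain backwards from `start`, charging one unit of `budget` for
// every simulated alias query.  Returns the index of the first clobbering
// definition found, or -1 if there is none or the budget is exhausted.
int walk_with_budget(const std::vector<ToyDef> &chain, int start,
                     unsigned &budget, unsigned &queries)
{
    assert(start >= -1 && start < static_cast<int>(chain.size()));
    for (int i = start; i != -1; i = chain[i].prev) {
        if (budget == 0)
            return -1;      // give up: the walk is bounded by the budget
        --budget;           // account for the query before making it
        ++queries;          // models one alias-oracle query
        if (chain[i].clobbers)
            return i;
    }
    return -1;
}
```

With a chain of 1000 non-clobbering definitions and a budget of 100, this walk
stops after 100 queries.  If the `--budget` step were missing for the chain
walk, the same call would perform all 1000 queries regardless of the limit,
which is the behaviour the comment suspects.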
