https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Current trunk at -O2 -fno-checking (w/ otherwise checking enabled):

Time variable                                   usr           sys          wall           GGC
 phase setup                        :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)    1245 kB (  0%)
 phase parsing                      :  16.72 (  6%)  15.15 ( 75%)  31.86 ( 10%)  612162 kB ( 18%)
 phase opt and generate             : 272.51 ( 94%)   5.08 ( 25%) 277.63 ( 90%) 2719266 kB ( 82%)
 ipa inlining heuristics            :  31.82 ( 11%)   0.00 (  0%)  31.85 ( 10%)       0 kB (  0%)
 ipa profile                        :   9.92 (  3%)   0.00 (  0%)   9.93 (  3%)       0 kB (  0%)
 ipa SRA                            : 153.77 ( 53%)   1.81 (  9%) 155.54 ( 50%)  741949 kB ( 22%)
 early inlining heuristics          :  24.54 (  8%)   0.03 (  0%)  24.65 (  8%)    2987 kB (  0%)

At -O -g we also see, to my surprise:

 tree CFG construction              :   6.27 (  4%)   0.04 (  0%)   6.28 (  4%)  628095 kB ( 15%)
 tree operand scan                  :   3.78 (  3%)   0.99 (  4%)   5.01 (  3%)   47597 kB (  1%)
 tree CFG cleanup                   :   7.51 (  5%)   0.05 (  0%)   7.71 (  5%)       0 kB (  0%)

The tree CFG construction time is _entirely_ spent in assign_discriminators!
That's because expand_location is costly and the discriminator_per_locus
hashtable calls it all the time.  It's also because the testcase sits on a
single line.  The whole code seems odd to me as well, given it doesn't handle
trailing or leading UNKNOWN_LOCATION stmts very well.  I also wonder why this
is done at CFG construction time.

The IPA SRA time is all spent in compute_fn_summary via convert_callers.
I'm not sure why that's necessary here.  Martin, in r152368 you reduced those
recomputations to once per caller, but obviously if each function calls every
other function, as in this testcase, this is still O(n^2).  Why isn't the
summary simply recomputed when we next process the caller, thus at most N
times?
