https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jamborm at gcc dot gnu.org --- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> --- Current trunk at -O2 -fno-checking (w/ otherwise checking enabled): Time variable usr sys wall GGC phase setup : 0.00 ( 0%) 0.00 ( 0%) 0.00 ( 0%) 1245 kB ( 0%) phase parsing : 16.72 ( 6%) 15.15 ( 75%) 31.86 ( 10%) 612162 kB ( 18%) phase opt and generate : 272.51 ( 94%) 5.08 ( 25%) 277.63 ( 90%) 2719266 kB ( 82%) ipa inlining heuristics : 31.82 ( 11%) 0.00 ( 0%) 31.85 ( 10%) 0 kB ( 0%) ipa profile : 9.92 ( 3%) 0.00 ( 0%) 9.93 ( 3%) 0 kB ( 0%) ipa SRA : 153.77 ( 53%) 1.81 ( 9%) 155.54 ( 50%) 741949 kB ( 22%) early inlining heuristics : 24.54 ( 8%) 0.03 ( 0%) 24.65 ( 8%) 2987 kB ( 0%) at -O -g we can also see to my surprise: tree CFG construction : 6.27 ( 4%) 0.04 ( 0%) 6.28 ( 4%) 628095 kB ( 15%) tree operand scan : 3.78 ( 3%) 0.99 ( 4%) 5.01 ( 3%) 47597 kB ( 1%) tree CFG cleanup : 7.51 ( 5%) 0.05 ( 0%) 7.71 ( 5%) 0 kB ( 0%) the tree CFG construction time is _entirely_ spent in assign_discriminators! That's because expand_location is costly and the discriminator_per_locus hashtable does that all the time. It's also because the testcase sits on a single line. The whole code seems odd to me as well given it doesn't very well handle trailing or leading UNKNOWN_LOCATION stmts. I also wonder why it is done at CFG construction time. The IPA SRA time is all spent in compute_fn_summary via convert_callers. Not sure why that's necessary here? Martin, in r152368 you reduced those to once-per-caller but obviously if each function calls each other function as in this testcase this is still O(n^2). Why's the summary not simply recomputed when we process the caller next? Thus at most N times?