https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149
Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|marxin at gcc dot gnu.org   |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
So the problematic predictor change is:

-DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (91), 0)
+DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0)

It lowers the frequency of BB 5:

primal_bea_mpp (int64_t m, struct arc_t * arcs, struct arc_t * stop_arcs, int64_t * basket_sizes, struct BASKET * * perm, int thread, struct arc_t * * end_arc, int64_t step, int64_t num_threads, int64_t max_elems)
{
...
  <bb 33> [local count: 12634988]:
READY:
  # DEBUG BEGIN_STMT
  _113 = *_40;
  _114 = (sizetype) _113;
  _115 = _114 + 1;
  _116 = _115 * 8;
  _117 = perm_148(D) + _116;
  _118 = *_117;
  _118->number = -1;
  # DEBUG BEGIN_STMT
  _122 = *_40;
  if (_122 == 0)
    goto <bb 34>; [29.93%]
  else
    goto <bb 35>; [70.07%]

  <bb 34> [local count: 3781652]:
  # DEBUG BEGIN_STMT
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 36>; [100.00%]

  <bb 35> [local count: 8853336]:
  # DEBUG BEGIN_STMT
  _127 = (long unsigned int) _122;
  _128 = perm_148(D) + 8;
  spec_qsort (_128, _127, 8, cost_compare);
  # DEBUG BEGIN_STMT
  _176 = MEM[(struct BASKET * *)perm_148(D) + 8B];

  <bb 36> [local count: 12634988]:
  # _135 = PHI <0B(34), _176(35)>
  return _135;

}

It's a function defined in pbeampp.c. Then in IPA CP we end up with:

Evaluating opportunities for spec_qsort/353.
 - considering value arc_compare for param #3 int (*<T813>) (const void *, const void *) (caller_count: 1)
     good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 987, scc) -> evaluation: 314, threshold: 500
     good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 987, scc) -> evaluation: 643, threshold: 500
  Creating a specialized node of spec_qsort/353.
    replacing param #2 size_t with const 8
    replacing param #3 int (*<T813>) (const void *, const void *) with const arc_compare
    Accounting size:173.00, time:191.00 on predicate exec:(true)
    Accounting size:3.00, time:2.00 on new predicate exec:(not inlined)
    the new node is spec_qsort.constprop/377.
ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 71
converting indirect call in spec_qsort.constprop to direct call to arc_compare
controlled uses count of param 3 bumped down to 8
ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 218
converting indirect call in spec_qsort.constprop to direct call to arc_compare
controlled uses count of param 3 bumped down to 7
ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 267
converting indirect call in spec_qsort.constprop to direct call to arc_compare
controlled uses count of param 3 bumped down to 6
ipa-prop: Discovered an indirect call to a known target (spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 348
converting indirect call in spec_qsort.constprop to direct call to arc_compare
controlled uses count of param 3 bumped down to 5
 - considering value cost_compare for param #3 int (*<T813>) (const void *, const void *) (caller_count: 1)
     good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) -> evaluation: 223, threshold: 500
     good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) -> evaluation: 457, threshold: 500
 - Creating a specialized node of spec_qsort/353 for all known contexts.
    replacing param #2 size_t with const 8
    Accounting size:173.00, time:191.00 on predicate exec:(true)
    Accounting size:3.00, time:2.00 on new predicate exec:(not inlined)
    the new node is spec_qsort.constprop/378.

Evaluating opportunities for spec_qsort/353.
 - adding an extra caller spec_qsort.constprop/377 of spec_qsort.constprop/377
 - considering value cost_compare for param #3 int (*<T813>) (const void *, const void *) (caller_count: 1)
     good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) -> evaluation: 223, threshold: 500
     good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) -> evaluation: 457, threshold: 500
Marking node as dead: spec_qsort/353.

Here you can see that the evaluation is 457, while the threshold is 500.

Another question is why IPA inline does not inline the call. The answer is here:

IPA function summary for primal_bea_mpp.constprop/366 inlinable
  global time:     127.809798
  self size:       134
  global size:     132
  min size:        10
  self stack:      0
  global stack:    0
    size:112.000000, time:108.000000
    size:7.000000, time:2.000000, executed if:(not inlined)
    size:2.000000, time:2.000000, nonconst if:(op3 changed)
    size:1.000000, time:1.000000, nonconst if:(op7 changed)
    size:3.000000, time:3.000000, nonconst if:(op9 changed)
    size:1.000000, time:1.000000, nonconst if:(op4 changed)
    size:1.000000, time:1.000000, nonconst if:(op8 changed)
  loop iterations:(op8 changed)
  loop stride:(op8 changed)
  calls:
    spec_qsort.constprop/378 function body not available
      loop depth: 0 freq:0.70 size: 5 time: 14 callee size:138 stack: 0
       op2 is compile time invariant
       op3 is compile time invariant

Note that spec_qsort is a recursive function, so I guess spec_qsort.constprop/378 is a clone created due to the recursion.

Leaving to Martin Jambor as he's the IPA CP expert!

P.S.
Martin, please apply the following patch, because one dump message has the dash and the other does not:

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4202c999675..edc0bda3eca 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4625,7 +4625,7 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
     return false;

   if (dump_file)
-    fprintf (dump_file, "  Creating a specialized node of %s.\n",
+    fprintf (dump_file, " - Creating a specialized node of %s.\n",
             node->dump_name ());

   callers = gather_edges_for_value (val, node, caller_count);
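For reference, the good_cloning_opportunity_p numbers in the IPA-CP dump can be reproduced by hand. This is a minimal sketch, assuming that without a real profile the evaluation is time * freq_sum / size (integer division), reduced by the recursion penalty when the node is in an SCC (the "scc" in the dump), and compared against the threshold. It also assumes the default values --param ipa-cp-eval-threshold=500, ipa-cp-recursion-penalty=40 and ipa-cp-single-call-penalty=15. The helpers ipcp_evaluation, ipcp_would_clone and ipcp_min_freq_sum are hypothetical names for illustration, not GCC functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults for --param ipa-cp-eval-threshold,
   ipa-cp-recursion-penalty and ipa-cp-single-call-penalty.  */
enum
{
  IPA_CP_EVAL_THRESHOLD = 500,
  IPA_CP_RECURSION_PENALTY = 40,
  IPA_CP_SINGLE_CALL_PENALTY = 15
};

/* Hypothetical helper: estimated cloning benefit without a real profile.
   The time benefit is scaled by the call frequency sum and divided by the
   size cost; nodes inside a call-graph SCC (recursion) get the recursion
   penalty, and the single-call penalty applies when that flag is set.
   All divisions truncate.  */
static int64_t
ipcp_evaluation (int time_benefit, int size_cost, int freq_sum,
                 bool within_scc, bool calling_single_call)
{
  assert (size_cost > 0);
  int64_t evaluation = ((int64_t) time_benefit * freq_sum) / size_cost;
  if (within_scc)
    evaluation = evaluation * (100 - IPA_CP_RECURSION_PENALTY) / 100;
  if (calling_single_call)
    evaluation = evaluation * (100 - IPA_CP_SINGLE_CALL_PENALTY) / 100;
  return evaluation;
}

/* Hypothetical helper: would the specialized clone be created?  */
static bool
ipcp_would_clone (int time_benefit, int size_cost, int freq_sum,
                  bool within_scc, bool calling_single_call)
{
  if (time_benefit == 0)
    return false;
  return ipcp_evaluation (time_benefit, size_cost, freq_sum, within_scc,
                          calling_single_call) >= IPA_CP_EVAL_THRESHOLD;
}

/* Hypothetical helper: smallest freq_sum in [0, max_freq] that passes the
   threshold, or -1 if none does.  The evaluation never decreases as
   freq_sum grows, so a linear scan finds the boundary.  */
static int
ipcp_min_freq_sum (int time_benefit, int size_cost, bool within_scc,
                   bool calling_single_call, int max_freq)
{
  for (int f = 0; f <= max_freq; f++)
    if (ipcp_would_clone (time_benefit, size_cost, f, within_scc,
                          calling_single_call))
      return f;
  return -1;
}
```

Under these assumptions the sketch yields 314, 643, 223 and 457 for the four good_cloning_opportunity_p lines above. It also suggests that the 40% recursion penalty is what sinks the cost_compare clone (without it, 942 * 701 / 866 = 762 would pass), and that with the penalty the clone would need freq_sum >= 767, i.e. a call frequency of roughly 0.77 instead of the 0.70 seen in the function summary. That is consistent with the HITRATE (91) -> HITRATE (71) change being enough to flip the decision.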