https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|marxin at gcc dot gnu.org   |jamborm at gcc dot gnu.org

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
So the problematic predictor change is:

- DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (91), 0)
+ DEF_PREDICTOR (PRED_NULL_RETURN, "null return", HITRATE (71), 0)

It lowers the frequency of BB 5:

primal_bea_mpp (int64_t m, struct arc_t * arcs, struct arc_t * stop_arcs,
int64_t * basket_sizes, struct BASKET * * perm, int thread, struct arc_t * *
end_arc, int64_t step, int64_t num_threads, int64_t max_elems)
{
...
  <bb 33> [local count: 12634988]:
READY:
  # DEBUG BEGIN_STMT
  _113 = *_40;
  _114 = (sizetype) _113;
  _115 = _114 + 1;
  _116 = _115 * 8;
  _117 = perm_148(D) + _116;
  _118 = *_117;
  _118->number = -1;
  # DEBUG BEGIN_STMT
  _122 = *_40;
  if (_122 == 0)
    goto <bb 34>; [29.93%]
  else
    goto <bb 35>; [70.07%]

  <bb 34> [local count: 3781652]:
  # DEBUG BEGIN_STMT
  // predicted unlikely by early return (on trees) predictor.
  goto <bb 36>; [100.00%]

  <bb 35> [local count: 8853336]:
  # DEBUG BEGIN_STMT
  _127 = (long unsigned int) _122;
  _128 = perm_148(D) + 8;
  spec_qsort (_128, _127, 8, cost_compare);
  # DEBUG BEGIN_STMT
  _176 = MEM[(struct BASKET * *)perm_148(D) + 8B];

  <bb 36> [local count: 12634988]:
  # _135 = PHI <0B(34), _176(35)>
  return _135;
}

It's a function defined in pbeampp.c.
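
The branch probabilities in the dump above follow from the local counts of the
blocks. A minimal sketch in C that recomputes them; the helper names
edge_percent and print_bb33_edges are hypothetical (they are not GCC
functions), and the only inputs are the counts of BB 33, 34 and 35 quoted in
the dump:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of GCC: probability of reaching a
   successor block, as a percentage of the predecessor's local count.  */
double
edge_percent (int64_t succ_count, int64_t pred_count)
{
  assert (pred_count > 0);
  return 100.0 * (double) succ_count / (double) pred_count;
}

/* Print the two outgoing probabilities of BB 33 using the local counts
   copied from the dump.  */
void
print_bb33_edges (void)
{
  const int64_t bb33 = 12634988, bb34 = 3781652, bb35 = 8853336;

  printf ("bb 33 -> bb 34: %.2f%%\n", edge_percent (bb34, bb33));
  printf ("bb 33 -> bb 35: %.2f%%\n", edge_percent (bb35, bb33));
}
```

Calling print_bb33_edges () reports 29.93% and 70.07%, the same values as the
edge annotations in the dump. The 70.07% edge is the one that leads to the
spec_qsort call.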

Then in IPA CP we end up with:

Evaluating opportunities for spec_qsort/353.
 - considering value arc_compare for param #3 int (*<T813>) (const void *,
const void *) (caller_count: 1)
     good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 987, scc) ->
evaluation: 314, threshold: 500
     good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 987, scc) ->
evaluation: 643, threshold: 500
  Creating a specialized node of spec_qsort/353.
    replacing param #2 size_t with const 8
    replacing param #3 int (*<T813>) (const void *, const void *) with const
arc_compare
                Accounting size:173.00, time:191.00 on predicate exec:(true)
                Accounting size:3.00, time:2.00 on new predicate exec:(not
inlined)
     the new node is spec_qsort.constprop/377.
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 71
converting indirect call in spec_qsort.constprop to direct call to arc_compare
     controlled uses count of param 3 bumped down to 8
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 218
converting indirect call in spec_qsort.constprop to direct call to arc_compare
     controlled uses count of param 3 bumped down to 7
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 267
converting indirect call in spec_qsort.constprop to direct call to arc_compare
     controlled uses count of param 3 bumped down to 6
ipa-prop: Discovered an indirect call to a known target
(spec_qsort.constprop/377 -> arc_compare/144), for stmt with uid 348
converting indirect call in spec_qsort.constprop to direct call to arc_compare
     controlled uses count of param 3 bumped down to 5
 - considering value cost_compare for param #3 int (*<T813>) (const void *,
const void *) (caller_count: 1)
     good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) ->
evaluation: 223, threshold: 500
     good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) ->
evaluation: 457, threshold: 500
 - Creating a specialized node of spec_qsort/353 for all known contexts.
    replacing param #2 size_t with const 8
                Accounting size:173.00, time:191.00 on predicate exec:(true)
                Accounting size:3.00, time:2.00 on new predicate exec:(not
inlined)
     the new node is spec_qsort.constprop/378.

Evaluating opportunities for spec_qsort/353.
 - adding an extra caller spec_qsort.constprop/377 of spec_qsort.constprop/377
 - considering value cost_compare for param #3 int (*<T813>) (const void *,
const void *) (caller_count: 1)
     good_cloning_opportunity_p (time: 143, size: 269, freq_sum: 701, scc) ->
evaluation: 223, threshold: 500
     good_cloning_opportunity_p (time: 942, size: 866, freq_sum: 701, scc) ->
evaluation: 457, threshold: 500
  Marking node as dead: spec_qsort/353.

Here you can see that the evaluation is 457, while the threshold is 500.
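
The evaluations in the dump can be reproduced with integer arithmetic. The
sketch below is a reconstruction under stated assumptions, not the actual
ipa-cp.c code: it assumes the evaluation is time * freq_sum / size, and that
the "scc" marker means a recursion penalty of 40% is applied (assumed here to
be the default of the ipa-cp-recursion-penalty parameter). The names
cloning_evaluation and passes_threshold are hypothetical. Under these
assumptions the results match every evaluation printed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The threshold of 500 is taken from the dump; the 40% recursion
   penalty is an assumption, see the text above.  */
enum { EVAL_THRESHOLD = 500, RECURSION_PENALTY = 40 };

/* Hypothetical reconstruction of the heuristic: benefit weighted by
   call frequency, divided by size cost, then penalized for nodes that
   are part of a strongly connected component (recursion).  */
int64_t
cloning_evaluation (int time_benefit, int freq_sum, int size_cost,
                    bool within_scc)
{
  assert (size_cost > 0);
  int64_t evaluation = ((int64_t) time_benefit * freq_sum) / size_cost;
  if (within_scc)
    evaluation = evaluation * (100 - RECURSION_PENALTY) / 100;
  return evaluation;
}

/* A clone is considered worthwhile once the evaluation reaches the
   threshold.  */
bool
passes_threshold (int64_t evaluation)
{
  return evaluation >= EVAL_THRESHOLD;
}
```

With freq_sum 987 (arc_compare), cloning_evaluation (942, 987, 866, true)
gives 643 and passes. With freq_sum 701 (cost_compare), the same call gives 457
and fails, so no clone specialized for cost_compare is created. In this sketch
the smallest freq_sum that would still pass for time 942 and size 866 is 767.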

Another question is why IPA inline does not inline the call. The answer is here:

IPA function summary for primal_bea_mpp.constprop/366 inlinable
  global time:     127.809798
  self size:       134
  global size:     132
  min size:       10
  self stack:      0
  global stack:    0
    size:112.000000, time:108.000000
    size:7.000000, time:2.000000,  executed if:(not inlined)
    size:2.000000, time:2.000000,  nonconst if:(op3 changed)
    size:1.000000, time:1.000000,  nonconst if:(op7 changed)
    size:3.000000, time:3.000000,  nonconst if:(op9 changed)
    size:1.000000, time:1.000000,  nonconst if:(op4 changed)
    size:1.000000, time:1.000000,  nonconst if:(op8 changed)
  loop iterations:(op8 changed)
  loop stride:(op8 changed)
  calls:
    spec_qsort.constprop/378 function body not available
      loop depth: 0 freq:0.70 size: 5 time: 14 callee size:138 stack: 0
       op2 is compile time invariant
       op3 is compile time invariant

Note that spec_qsort is a recursive function, so I guess
spec_qsort.constprop/378 is a clone created due to the recursion.

Leaving this to Martin Jambor, as he's the IPA CP expert!
P.S. Martin, please apply the following patch, because one dump line has the
dash and the other does not:

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4202c999675..edc0bda3eca 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4625,7 +4625,7 @@ decide_about_value (struct cgraph_node *node, int index, HOST_WIDE_INT offset,
     return false;

   if (dump_file)
-    fprintf (dump_file, "  Creating a specialized node of %s.\n",
+    fprintf (dump_file, " - Creating a specialized node of %s.\n",
             node->dump_name ());

   callers = gather_edges_for_value (val, node, caller_count);
