Ping, attachment of https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764/exchange2.tar.gz shows the profile count difference on cloned nodes digits_2.constprop.[0...8] without/with this patch. Thanks!
Xiong Hu On 2020/1/14 14:45, luoxhu wrote: > Hi, > > On 2020/1/3 00:58, Jan Hubicka wrote: >>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c >>> index 14064ae0034..947bf7c7199 100644 >>> --- a/gcc/ipa-cp.c >>> +++ b/gcc/ipa-cp.c >>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node >>> *orig_node, >>> false); >>> new_sum = stats.count_sum; >>> + class ipa_node_params *info = IPA_NODE_REF (orig_node); >>> + if (info && info->node_is_self_scc) >>> + { >>> + profile_count self_recursive_count; >>> + >>> + /* The self recursive edge is on the orig_node. */ >>> + for (cs = orig_node->callees; cs; cs = cs->next_callee) >>> + if (ipa_edge_within_scc (cs)) >>> + { >>> + cgraph_node *callee = cs->callee->function_symbol (); >>> + if (callee && cs->caller == cs->callee) >>> + { >>> + self_recursive_count = cs->count; >>> + break; >>> + } >>> + } >> >> What happens here when there are multiple self recursive calls or when >> the SCC has two mutually recursive functions? >> >> I am still confused by the logic of this function. I will take a deeper >> look at your previous email. >>> + >>> + /* Proportion count for self recursive node from all callers. */ >>> + new_sum >>> + = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum); >>> + >>> + /* Proportion count for specialized cloned node. */ >>> + new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth); >>> + } >>> + >>> if (orig_node_count < orig_sum + new_sum) >>> { >>> if (dump_file) >>> diff --git a/gcc/params.opt b/gcc/params.opt >>> index d88ae0c468b..40a03b04580 100644 >>> --- a/gcc/params.opt >>> +++ b/gcc/params.opt >>> @@ -199,7 +199,7 @@ Common Joined UInteger >>> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param >>> Compile-time bonus IPA-CP assigns to candidates which make loop bounds >>> or strides known. >>> -param=ipa-cp-max-recursive-depth= >>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param >>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) >>> IntegerRange(1, 10) Param >>> Maximum depth of recursive cloning for self-recursive function. >> >> The values stats from 0 but I also wonder why 10 is a good upper bound? >> If the function calls itself with one type of value (like depth-1) then >> we may clone well over 10 times, if it calls itself with two different >> sets then 8 is already quite high I would say... >> >> I suppose the call probabilities will eventually drop to be very low, >> but I am not quite sure about that because we do not handle any IP >> frequency propagation. Do we have some way to treat wide trees? Or do >> we clone only all self recursive calls are the same? >> >> Honza > > Update v3 patch. This regression could only be reproduced when built with > "-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param > ipa-cp-unit-growth=80" on exchange_2, on some platforms -fno-inline may be > also needed, I attached 3 files (compressed to exchange.tar.gz) > exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new > and cp.diff to show the profile count difference of specialized node > digits_2.constprop/152 to digits_2.constprop/159 without/with this patch. > > Profile count is decreasing slowly with this patch instead of keeping very > small from the first clone (very low count as cold will cause complete unroll > not working), it still differs from really execution (exchange2.png), but this > average model takes the recursive edge as feedback. Thanks. > > > v3: > 1. Enable proportion orig_sum to the new nodes for self recursive node (k > means > multiple self recursive calls): > new_sum = (orig_sum + new_sum) * 1 / k \ > * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth). > 2. Two mutually recursive functions are not supported in self recursive > clone yet so also not supported in update_profiling_info here. > 3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8). > If it calls itself two different sets, usually recursive boudary limit > will stop the specialize first, otherwise it is slow even without > recursive specialize. > > The performance of exchange2 built with PGO will decrease ~28% by r278808 > due to profile count set incorrectly. The cloned nodes are updated to a > very small count caused later pass cunroll fail to unroll the recursive > function in exchange2. > > digits_2 -> > digits_2.constprop.0, digits_2.constprop.1, etc. > > gcc/ChangeLog: > > 2020-01-14 Luo Xiong Hu <luo...@linux.ibm.com> > > * ipa-cp.c (update_profiling_info): Check self_scc node. > * params.opt (ipa-cp-max-recursive-depth): Set param range. > --- > gcc/ipa-cp.c | 33 ++++++++++++++++++++++++++++++++- > gcc/params.opt | 2 +- > 2 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c > index 17da1d8e8a7..8b759168e24 100644 > --- a/gcc/ipa-cp.c > +++ b/gcc/ipa-cp.c > @@ -2041,7 +2041,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs, > /* Recursively generate lattice values with a limited count. */ > FOR_EACH_VEC_ELT (val_seeds, i, src_val) > { > - for (int j = 1; j < max_recursive_depth; j++) > + for (int j = 0; j < max_recursive_depth; j++) > { > tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2, > src_val, res_type); > @@ -4313,6 +4313,37 @@ update_profiling_info (struct cgraph_node *orig_node, > false); > new_sum = stats.count_sum; > > + class ipa_node_params *info = IPA_NODE_REF (orig_node); > + if (info && info->node_is_self_scc) > + { > + profile_count self_recursive_count = profile_count::zero (); > + unsigned k = 0; > + > + /* The self recursive edge is on the orig_node. */ > + for (cs = orig_node->callees; cs; cs = cs->next_callee) > + if (ipa_edge_within_scc (cs)) > + { > + cgraph_node *callee = cs->callee->function_symbol (); > + if (callee && cs->caller == cs->callee) > + { > + self_recursive_count += cs->count; > + k++; > + } > + } > + > + /* Proportion count for multiple self recursive node. */ > + if (k) > + self_recursive_count.apply_scale (1, k); > + > + /* Proportion count for self recursive node from all callers. */ > + new_sum > + = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum); > + > + /* Proportion count for specialized cloned node. */ > + if (param_ipa_cp_max_recursive_depth) > + new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth); > + } > + > if (orig_node_count < orig_sum + new_sum) > { > if (dump_file) > diff --git a/gcc/params.opt b/gcc/params.opt > index 31cc20031b1..9eceee5871f 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -199,7 +199,7 @@ Common Joined UInteger > Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza > Compile-time bonus IPA-CP assigns to candidates which make loop bounds or > strides known. > > -param=ipa-cp-max-recursive-depth= > -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param > Optimization > +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) > IntegerRange(0, 8) Param Optimization > Maximum depth of recursive cloning for self-recursive function. > > -param=ipa-cp-min-recursive-probability=