Hi Honza, The testcase I am looking at is perlbench from Spec2017 but still working on isolating the exact cause for the slowdown.
It seems the change has a big impact on layout and so also some alignment changes. So it's a bit hard to track down. There does seem to be an extra 2k bytes in the new binary and a lot of BB moved around. > So it would be good to understand why performance chnaged. If you have > testcases to look into, it would be useful. We may want to reduce big > speedup threshold or so which may be better tunned for the broken > behaviour. Perhaps incresing --param max-predicted-loop-iterations could > help, too. Large values however lead to troubles for functions which > combine loop nests with known and loop nests with unknown number of > iterations. I can't seem to find this parameter in params.opt? There is max-predicted--iterations but doesn't seem to be what you're referring to? Thanks, Tamar > -----Original Message----- > From: Jan Hubicka <hubi...@ucw.cz> > Sent: Friday, January 17, 2020 19:21 > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com> > Subject: Re: Make profile estimation more precise > > > Hi Honza, > > > > This change seems to have cost a 0.2% regression in SPEC2017 vs a codesize > reduction of 0.02%. > > > > Only leela seems to have made any noticable difference at 2.64% codesize > reduction. > > > > Is there anyway to keep the performance at -O3 where the codesize > doesn't matter much? > > The patch mostly eliminated roundoff errors. We used to calculate everythig > with 10000 based fixpoint probabilities. Then the code was updated to use > higher precision but the propagation recalculated to > 10000 base rounding to nearest value. This led to some basic blocks to have > sum of outgoing probabilities greater than 1 which in turn led to never ending > loops. > > So it would be good to understand why performance chnaged. If you have > testcases to look into, it would be useful. We may want to reduce big > speedup threshold or so which may be better tunned for the broken > behaviour. Perhaps incresing --param max-predicted-loop-iterations could > help, too. Large values however lead to troubles for functions which > combine loop nests with known and loop nests with unknown number of > iterations. > > We preict loops with unkonwn iterations to predict somewhere between 3 > and 10 iterations. Loops with known number of iterations are predicted > precisely up to --param max-predicted-loop-iterations. > > I remember debugging spec2k benchmark which had multiple nested loops > to perform initialization followed with actual calculation with unkown bounds. > The logic above made us to predict initialization code to be hot and actual > calculation to be cold. > > Honza > > > > Thanks, > > Tamar > > > > > -----Original Message----- > > > From: gcc-patches-ow...@gcc.gnu.org <gcc-patches- > ow...@gcc.gnu.org> > > > On Behalf Of Jan Hubicka > > > Sent: Thursday, January 16, 2020 23:00 > > > To: gcc-patches@gcc.gnu.org > > > Subject: Make profile estimation more precise > > > > > > Hi, > > > while analyzing code size regression in SPEC2k GCC binary I noticed > > > that we perform some inline decisions because we think that number > > > of executions are very high. > > > In particular there was inline decision inlining gen_rtx_fmt_ee to > > > find_reloads believing that it is called 4 billion times. This > > > turned out to be cummulation of roundoff errors in propagate_freq > > > which was bit mechanically updated from original sreals to C++ > > > sreals and later to new probabilities. > > > > > > This led us to estimate that a loopback edge is reached with > > > probability 2.3 which was capped to 1-1/10000 and since this > > > happened in nested loop it quickly escalated to large values. > > > > > > Originally capping to REG_BR_PROB_BASE avoided such problems but > now > > > we have much higher range. > > > > > > This patch avoids going from probabilites to REG_BR_PROB_BASE so > > > precision is kept. In addition it makes the propagation to not > > > estimate more than param-max-predicted-loop-iterations. The first > > > change makes the cap to not be triggered on the gcc build, but it is still > better to be safe than sorry. > > > > > > 2020-01-16 Jan Hubicka <hubi...@ucw.cz> > > > > > > * ipa-fnsummary.c (estimate_calls_size_and_time): Fix formating of > > > dump. > > > * params.opt: (max-predicted-iterations): Set bounds. > > > * predict.c (real_almost_one, real_br_prob_base, > > > real_inv_br_prob_base, real_one_half, real_bb_freq_max): Remove. > > > (propagate_freq): Add max_cyclic_prob parameter; cap cyclic > > > probabilities; do not truncate to reg_br_prob_bases. > > > (estimate_loops_at_level): Pass max_cyclic_prob. > > > (estimate_loops): Compute max_cyclic_prob. > > > (estimate_bb_frequencies): Do not initialize real_*; update > > > calculation > > > of back edge prob. > > > * profile-count.c (profile_probability::to_sreal): New. > > > * profile-count.h (class sreal): Move up in file. > > > (profile_probability::to_sreal): Declare. > > > > > > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index > > > 4e1c587b101..dbd53f12a40 100644 > > > --- a/gcc/ipa-fnsummary.c > > > +++ b/gcc/ipa-fnsummary.c > > > @@ -3258,7 +3258,7 @@ estimate_calls_size_and_time (struct > > > cgraph_node *node, int *size, > > > gcc_assert (*size == old_size); > > > if (time && (*time - old_time > 1 || *time - old_time < -1) > > > && dump_file) > > > - fprintf (dump_file, "Time mismatch in call summary %f!=%f", > > > + fprintf (dump_file, "Time mismatch in call summary %f!=%f\n", > > > old_time.to_double (), > > > time->to_double ()); > > > } > > > diff --git a/gcc/params.opt b/gcc/params.opt index > > > 31cc20031b1..f02c769d0e3 100644 > > > --- a/gcc/params.opt > > > +++ b/gcc/params.opt > > > @@ -555,7 +555,7 @@ Common Joined UInteger > > > Var(param_max_pow_sqrt_depth) Init(5) IntegerRange(1, 32) Maximum > > > depth of sqrt chains to use when synthesizing exponentiation by a > > > real constant. > > > > > > -param=max-predicted-iterations= > > > -Common Joined UInteger Var(param_max_predicted_iterations) > > > Init(100) Param Optimization > > > +Common Joined UInteger Var(param_max_predicted_iterations) > > > +Init(100) IntegerRange(1, 65536) Param Optimization > > > The maximum number of loop iterations we predict statically. > > > > > > -param=max-reload-search-insns= > > > diff --git a/gcc/predict.c b/gcc/predict.c index > > > 02c5fe0667d..c3aed9ed854 > > > 100644 > > > --- a/gcc/predict.c > > > +++ b/gcc/predict.c > > > @@ -76,10 +76,6 @@ enum predictor_reason static const char > > > *reason_messages[] = {"", " (ignored)", > > > " (single edge duplicate)", " (edge pair duplicate)"}; > > > > > > -/* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE, > > > - 1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX. */ > > > -static sreal real_almost_one, real_br_prob_base, > > > - real_inv_br_prob_base, real_one_half, real_bb_freq_max; > > > > > > static void combine_predictions_for_insn (rtx_insn *, basic_block); > > > static void dump_prediction (FILE *, enum br_predictor, int, > > > basic_block, @@ - > > > 3266,7 +3262,8 @@ public: > > > TOVISIT, starting in HEAD. */ > > > > > > static void > > > -propagate_freq (basic_block head, bitmap tovisit) > > > +propagate_freq (basic_block head, bitmap tovisit, > > > + sreal max_cyclic_prob) > > > { > > > basic_block bb; > > > basic_block last; > > > @@ -3322,22 +3319,14 @@ propagate_freq (basic_block head, bitmap > > > tovisit) > > > > > > FOR_EACH_EDGE (e, ei, bb->preds) > > > if (EDGE_INFO (e)->back_edge) > > > - { > > > - cyclic_probability += EDGE_INFO (e)->back_edge_prob; > > > - } > > > + cyclic_probability += EDGE_INFO (e)->back_edge_prob; > > > else if (!(e->flags & EDGE_DFS_BACK)) > > > { > > > - /* frequency += (e->probability > > > - * BLOCK_INFO (e->src)->frequency / > > > - REG_BR_PROB_BASE); */ > > > - > > > /* FIXME: Graphite is producing edges with no profile. Once > > > this is fixed, drop this. */ > > > sreal tmp = e->probability.initialized_p () ? > > > - e->probability.to_reg_br_prob_base () : 0; > > > - tmp *= BLOCK_INFO (e->src)->frequency; > > > - tmp *= real_inv_br_prob_base; > > > - frequency += tmp; > > > + e->probability.to_sreal () : 0; > > > + frequency += tmp * BLOCK_INFO (e->src)->frequency; > > > } > > > > > > if (cyclic_probability == 0) > > > @@ -3346,14 +3335,29 @@ propagate_freq (basic_block head, bitmap > tovisit) > > > } > > > else > > > { > > > - if (cyclic_probability > real_almost_one) > > > - cyclic_probability = real_almost_one; > > > - > > > - /* BLOCK_INFO (bb)->frequency = frequency > > > - / (1 - cyclic_probability) */ > > > - > > > - cyclic_probability = sreal (1) - cyclic_probability; > > > - BLOCK_INFO (bb)->frequency = frequency / cyclic_probability; > > > + if (cyclic_probability > max_cyclic_prob) > > > + { > > > + if (dump_file) > > > + fprintf (dump_file, > > > + "cyclic probability of bb %i is %f (capped to %f)" > > > + "; turning freq %f", > > > + bb->index, cyclic_probability.to_double (), > > > + max_cyclic_prob.to_double (), > > > + frequency.to_double ()); > > > + > > > + cyclic_probability = max_cyclic_prob; > > > + } > > > + else if (dump_file) > > > + fprintf (dump_file, > > > + "cyclic probability of bb %i is %f; turning freq %f", > > > + bb->index, cyclic_probability.to_double (), > > > + frequency.to_double ()); > > > + > > > + BLOCK_INFO (bb)->frequency = frequency > > > + / (sreal (1) - cyclic_probability); > > > + if (dump_file) > > > + fprintf (dump_file, " to %f\n", > > > + BLOCK_INFO (bb)->frequency.to_double ()); > > > } > > > } > > > > > > @@ -3362,16 +3366,11 @@ propagate_freq (basic_block head, bitmap > tovisit) > > > e = find_edge (bb, head); > > > if (e) > > > { > > > - /* EDGE_INFO (e)->back_edge_prob > > > - = ((e->probability * BLOCK_INFO (bb)->frequency) > > > - / REG_BR_PROB_BASE); */ > > > - > > > /* FIXME: Graphite is producing edges with no profile. Once > > > this is fixed, drop this. */ > > > sreal tmp = e->probability.initialized_p () ? > > > - e->probability.to_reg_br_prob_base () : 0; > > > - tmp *= BLOCK_INFO (bb)->frequency; > > > - EDGE_INFO (e)->back_edge_prob = tmp * real_inv_br_prob_base; > > > + e->probability.to_sreal () : 0; > > > + EDGE_INFO (e)->back_edge_prob = tmp * BLOCK_INFO (bb)- > > > >frequency; > > > } > > > > > > /* Propagate to successor blocks. */ @@ -3396,7 +3395,7 @@ > > > propagate_freq (basic_block head, bitmap tovisit) > > > /* Estimate frequencies in loops at same nest level. */ > > > > > > static void > > > -estimate_loops_at_level (class loop *first_loop) > > > +estimate_loops_at_level (class loop *first_loop, sreal > > > +max_cyclic_prob) > > > { > > > class loop *loop; > > > > > > @@ -3407,7 +3406,7 @@ estimate_loops_at_level (class loop *first_loop) > > > unsigned i; > > > auto_bitmap tovisit; > > > > > > - estimate_loops_at_level (loop->inner); > > > + estimate_loops_at_level (loop->inner, max_cyclic_prob); > > > > > > /* Find current loop back edge and mark it. */ > > > e = loop_latch_edge (loop); > > > @@ -3417,7 +3416,7 @@ estimate_loops_at_level (class loop *first_loop) > > > for (i = 0; i < loop->num_nodes; i++) > > > bitmap_set_bit (tovisit, bbs[i]->index); > > > free (bbs); > > > - propagate_freq (loop->header, tovisit); > > > + propagate_freq (loop->header, tovisit, max_cyclic_prob); > > > } > > > } > > > > > > @@ -3428,17 +3427,18 @@ estimate_loops (void) { > > > auto_bitmap tovisit; > > > basic_block bb; > > > + sreal max_cyclic_prob = (sreal)1 - (sreal)1 / > > > + param_max_predicted_iterations; > > > > > > /* Start by estimating the frequencies in the loops. */ > > > if (number_of_loops (cfun) > 1) > > > - estimate_loops_at_level (current_loops->tree_root->inner); > > > + estimate_loops_at_level (current_loops->tree_root->inner, > > > + max_cyclic_prob); > > > > > > /* Now propagate the frequencies through all the blocks. */ > > > FOR_ALL_BB_FN (bb, cfun) > > > { > > > bitmap_set_bit (tovisit, bb->index); > > > } > > > - propagate_freq (ENTRY_BLOCK_PTR_FOR_FN (cfun), tovisit); > > > + propagate_freq (ENTRY_BLOCK_PTR_FOR_FN (cfun), tovisit, > > > + max_cyclic_prob); > > > } > > > > > > /* Drop the profile for NODE to guessed, and update its frequency > > > based on @@ -3844,21 +3844,6 @@ estimate_bb_frequencies (bool > force) > > > if (force || profile_status_for_fn (cfun) != PROFILE_READ > > > || !update_max_bb_count ()) > > > { > > > - static int real_values_initialized = 0; > > > - > > > - if (!real_values_initialized) > > > - { > > > - real_values_initialized = 1; > > > - real_br_prob_base = REG_BR_PROB_BASE; > > > - /* Scaling frequencies up to maximal profile count may result in > > > - frequent overflows especially when inlining loops. > > > - Small scalling results in unnecesary precision loss. Stay in > > > - the half of the (exponential) range. */ > > > - real_bb_freq_max = (uint64_t)1 << (profile_count::n_bits / 2); > > > - real_one_half = sreal (1, -1); > > > - real_inv_br_prob_base = sreal (1) / real_br_prob_base; > > > - real_almost_one = sreal (1) - real_inv_br_prob_base; > > > - } > > > > > > mark_dfs_back_edges (); > > > > > > @@ -3879,10 +3864,10 @@ estimate_bb_frequencies (bool force) > > > this is fixed, drop this. */ > > > if (e->probability.initialized_p ()) > > > EDGE_INFO (e)->back_edge_prob > > > - = e->probability.to_reg_br_prob_base (); > > > + = e->probability.to_sreal (); > > > else > > > - EDGE_INFO (e)->back_edge_prob = REG_BR_PROB_BASE / 2; > > > - EDGE_INFO (e)->back_edge_prob *= real_inv_br_prob_base; > > > + /* back_edge_prob = 0.5 */ > > > + EDGE_INFO (e)->back_edge_prob = sreal (1, -1); > > > } > > > } > > > > > > @@ -3895,14 +3880,18 @@ estimate_bb_frequencies (bool force) > > > if (freq_max < BLOCK_INFO (bb)->frequency) > > > freq_max = BLOCK_INFO (bb)->frequency; > > > > > > - freq_max = real_bb_freq_max / freq_max; > > > + /* Scaling frequencies up to maximal profile count may result in > > > + frequent overflows especially when inlining loops. > > > + Small scalling results in unnecesary precision loss. Stay in > > > + the half of the (exponential) range. */ > > > + freq_max = (sreal (1) << (profile_count::n_bits / 2)) / > > > +freq_max; > > > if (freq_max < 16) > > > freq_max = 16; > > > profile_count ipa_count = ENTRY_BLOCK_PTR_FOR_FN > > > (cfun)->count.ipa (); > > > cfun->cfg->count_max = profile_count::uninitialized (); > > > FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, > > > next_bb) > > > { > > > - sreal tmp = BLOCK_INFO (bb)->frequency * freq_max + > > > real_one_half; > > > + sreal tmp = BLOCK_INFO (bb)->frequency * freq_max + sreal (1, > > > +-1); > > > profile_count count = profile_count::from_gcov_type (tmp.to_int > > > ()); > > > > > > /* If we have profile feedback in which this function was never > > > diff - -git a/gcc/profile-count.c b/gcc/profile-count.c index > > > d463ddd5cce..0c792297826 100644 > > > --- a/gcc/profile-count.c > > > +++ b/gcc/profile-count.c > > > @@ -446,3 +446,12 @@ profile_probability::combine_with_count > > > (profile_count count1, > > > else > > > return *this * even () + other * even (); } > > > + > > > +/* Return probability as sreal in range [0, 1]. */ > > > + > > > +sreal > > > +profile_probability::to_sreal () const { > > > + gcc_checking_assert (initialized_p ()); > > > + return ((sreal)m_val) >> (n_bits - 2); } > > > diff --git a/gcc/profile-count.h b/gcc/profile-count.h index > > > 987d3ce9a49..09217a8699b 100644 > > > --- a/gcc/profile-count.h > > > +++ b/gcc/profile-count.h > > > @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see > > > > > > struct function; > > > struct profile_count; > > > +class sreal; > > > > > > /* Quality of the profile count. Because gengtype does not support > enums > > > inside of classes, this is in global namespace. */ @@ -614,6 > > > +615,8 @@ > > > public: > > > profile_probability other, > > > profile_count count2) const; > > > > > > + /* Return probability as sreal. */ sreal to_sreal () const; > > > /* LTO streaming support. */ > > > static profile_probability stream_in (class lto_input_block *); > > > void stream_out (struct output_block *); @@ -674,8 +677,6 @@ public: > > > > > > */ > > > > > > -class sreal; > > > - > > > struct GTY(()) profile_count > > > { > > > public: