Ping on this one, since the new param would be useful in the other profile-related patch I'm working on (the one to handle the dropped profile counts in tree-profile).
Thanks,
Teresa

On Thu, Oct 10, 2013 at 2:35 PM, Teresa Johnson <tejohn...@google.com> wrote:
> On Mon, Oct 7, 2013 at 10:45 PM, Teresa Johnson <tejohn...@google.com> wrote:
>> On Sun, Oct 6, 2013 at 6:43 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> 2013-10-03  Teresa Johnson  <tejohn...@google.com>
>>>>
>>>>         * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>>>
>>> I would not mention HOT in the parameter name. Blocks are hot/normal/unlikely,
>>> and we have HOT_BB_FREQUENCY_FRACTION.
>>>
>>> So perhaps UNLIKELY_BB_COUNT_FRACTION, with an explanation that it is relative
>>> to the number of train runs?
>>
>> Ok.
>>
>>>> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
>>>> +        "min-hot-run-ratio",
>>>> +        "The minimum ratio of profile runs to basic block execution count "
>>>> +        "for the block to be considered hot",
>>> Perhaps as:
>>>> +        "The maximum ratio of profile runs to basic block execution count "
>>>> +        "for the block to be considered unlikely",
>>>
>>>> +        4, 0, 10000)
>>>
>>> The lower bound should be 1, or we divide by 0.
>>
>> Yep, will fix.
>>
>>>> Index: predict.c
>>>> ===================================================================
>>>> --- predict.c   (revision 203152)
>>>> +++ predict.c   (working copy)
>>>> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>>>>    gcc_checking_assert (fun);
>>>>    if (profile_status_for_function (fun) == PROFILE_READ)
>>>>      {
>>>> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>>>> +      int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
>>>> +      if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>>>
>>> This way, if count is 1, runs needs to be min_run_ratio * 2.
>>> So perhaps use if (count * min_run_ratio >= profile_info->runs) instead of the division.
>>
>> Ok.
>>
>>>> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
>>>> +      if (ENTRY_BLOCK_PTR->count)
>>>>         {
>>>> -          return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
>>>> -                        ENTRY_BLOCK_PTR->frequency)
>>>> -                  < REG_BR_PROB_BASE / 4);
>>>> +          gcov_type scaled_count
>>>> +              = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
>>>> +          gcov_type computed_count;
>>>> +          /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
>>>> +             be large enough to do the division first without losing much
>>>> +             precision.  */
>>>> +          if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)
>>>
>>> I do not like the undefined behaviour triggered before the check (sure, unsigned
>>> arithmetic would fix that, but it seems all strange). What about guarding the
>>> whole code with
>>>
>>>   if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
>>>
>>> For large counts the roundoff problems in the first test should not be a problem.
>>
>> Sure, that sounds reasonable.
>
> Actually, we don't care about count at this point, just ENTRY_BLOCK_PTR->count.
>
>>>> +            {
>>>> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
>>>> +                                     ENTRY_BLOCK_PTR->frequency);
>>>> +              computed_count *= frequency * min_run_ratio;
>>>> +            }
>>>> +          else
>>>> +            computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
>>>> +          if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)
>>>
>>> So on the non-overflow path you compute
>>>
>>>   ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency) * min_run_ratio / runs > 0.5
>>
>> Yes, there is an extra min_run_ratio factor in there that I need to remove!
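For anyone following the rounding question here (it is quantified a bit further below), a small standalone model may help. This is only a sketch, not the GCC sources: long long stands in for gcov_type, the runs/count/fraction values are made up, and RDIV simply mirrors a round-to-nearest divide.

/* Compare the old RDIV-based "unlikely" test with the direct comparison.  */
#include <stdio.h>

#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))   /* divide, rounding to nearest */

/* Old form: "not unlikely" when RDIV (count * fraction, runs) > 0,
   i.e. count * fraction >= runs - runs / 2, roughly count >= runs / (2 * fraction).  */
static int
not_unlikely_old (long long count, long long runs, int fraction)
{
  return RDIV (count * fraction, runs) > 0;
}

/* New form: "not unlikely" when count * fraction >= runs,
   i.e. count >= runs / fraction -- twice the count the old form required.  */
static int
not_unlikely_new (long long count, long long runs, int fraction)
{
  return count * fraction >= runs;
}

int
main (void)
{
  long long runs = 100;   /* number of training runs (made-up value) */
  int fraction = 4;       /* default unlikely fraction */
  for (long long count = 10; count <= 28; count += 3)
    printf ("count=%2lld  old=%d  new=%d\n", count,
            not_unlikely_old (count, runs, fraction),
            not_unlikely_new (count, runs, fraction));
  return 0;
}

With runs = 100 and fraction = 4, the old test already treats the block as "not unlikely" from count = 13 on, while the direct comparison needs count = 25, which is the halving of the effective fraction noted below.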
>>>
>>> I think you want
>>>
>>>   (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs
>>
>> Right, will change that too.
>
> Note that the above change essentially decreases the min_run_ratio in half (since we are comparing >= runs instead of >= 0.5 * runs due to the rounding divide).
>
>>> If entry_bb_frequency is 0, you can just return true iff frequency is > min_run_ratio.
>>
>> We already return false conservatively if entry_bb_frequency is 0. Do we really want to change this? I'm not sure it makes sense to compare the frequency to the run ratio directly.
>>
>>> I think the safe way to compute this is
>>>
>>>   if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>>>     scaled_count = (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency;
>>>   else
>>>     scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency) * min_run_ratio;
>>>
>>> In the first path we have at most 10000*10000*10000*10000, which still fits in a 64-bit value.
>>>
>>> In the second path the base of the fixed-point arithmetic is large enough that the division should not matter. We never get over entry_count * 10000, which is safe. The first part of the statement, however, just computes a value that should match count, so count > 10000 unless the profile got misupdated, and the low-count roundoff errors should not matter. So perhaps the whole code can execute only under
>>>
>>>   if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
>>>       && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>>>
>>> and we do not need any conditionals for scaled_count.
>
> As noted above, we don't use count here, so this can be simplified to:
>
>   if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>
> (and ENTRY_BLOCK_PTR->count > 0, which is checked earlier). See the fixed patch below.
>
>>> Does such code work with the ratio changed to 16? I think the double multiplication in your patch may have caused problems.
>>
>> Let me fix this up and will get back.
>
> As noted above, the change to avoid the RDIV by runs decreases the effect of the min_run_ratio (now unlikely_count_fraction) in half. So we really need to increase this parameter to 32 to compare to what the old version of the code was doing.
>
> Indeed, using 32 does fix the same set of issues. However, when setting the parameter to the old default of 4, there are 5 core dumps when using the patch to make the unlikely code unexecutable with 100 training runs (see below for some examples why). This goes down to 2 failures if I set the parameter to 10, and 0 if I set it to 20. Using 20 essentially means that the code has to be executed in 5% of the runs, which seems reasonable for something like splitting. What do you think?
>
> A couple of examples of why there was cold split code being executed with 100 train runs:
>
> 1) No context sensitivity for a routine that is inlined:
>
> In /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr33870.c, we call merge_pagelist 3 times from sort_pagelist. In the profile-use compile merge_pagelist is inlined (I believe in all 3 locations). One of the conditional blocks (line 23) has a profile count that is less than runs/4. It turns out that from the first merge_pagelist callsite we always execute this block, but the other two are completely cold. But the profile info from all three calls is of course merged, and it looks like it is executed infrequently overall.
> So the block is split but then we execute it from the inline instance corresponding to the first call.
>
> I'm not sure what we can do in general here. But it is another data point suggesting to me that the blocks should be very cold to be split.
>
> 2) RTL expansion:
>
> Another was in the following code from /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr43784.c:
>
>   static struct s __attribute__((noinline)) rp(void)
>   {
>     return *p;
>   }
>
> where p is a global that is assigned as:
>
>   static union u v;
>   static struct s *p = &v.d.b;
>
> RTL expansion synthesizes a bunch of tests and branches (alignment handling?), and guesses at the probabilities of the resulting branches. Unfortunately, this is less than accurate, and we end up executing a block that is given a small likelihood, putting it below the threshold with the default unlikely fraction of 4.
>
> Here is the patch with the current threshold kept at 4. Let me know if you think we can kick this up to 20, or if we should just increase the threshold for splitting.
>
> Passes regression tests, ok for trunk?
>
> (BTW, getting close. I have a bunch of fixes to the loop unroller that need some cleanup, and a small fix to ssa tail merging to update the profiles, but I think that does it for the tests I was looking at. Will send those soon for review.)
>
> Thanks,
> Teresa
>
> 2013-10-10  Teresa Johnson  <tejohn...@google.com>
>
>         * predict.c (probably_never_executed): Compare frequency-based
>         count to number of training runs.
>         * params.def (UNLIKELY_BB_COUNT_FRACTION): New parameter.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 203390)
> +++ predict.c   (working copy)
> @@ -237,17 +237,33 @@ probably_never_executed (struct function *fun,
>    gcc_checking_assert (fun);
>    if (profile_status_for_function (fun) == PROFILE_READ)
>      {
> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +      int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +      if (count * unlikely_count_fraction >= profile_info->runs)
>         return false;
>        if (!frequency)
>         return true;
>        if (!ENTRY_BLOCK_PTR->frequency)
>         return false;
> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
> +      if (ENTRY_BLOCK_PTR->count)
>         {
> -          return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -                        ENTRY_BLOCK_PTR->frequency)
> -                  < REG_BR_PROB_BASE / 4);
> +          gcov_type computed_count;
> +          /* Check for possibility of overflow, in which case entry bb count
> +             is large enough to do the division first without losing much
> +             precision.  */
> +          if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> +            {
> +              gcov_type scaled_count
> +                  = frequency * ENTRY_BLOCK_PTR->count * unlikely_count_fraction;
> +              computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
> +            }
> +          else
> +            {
> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
> +                                     ENTRY_BLOCK_PTR->frequency);
> +              computed_count *= frequency * unlikely_count_fraction;
> +            }
> +          if (computed_count >= profile_info->runs)
> +            return false;
>         }
>        return true;
>      }
> Index: params.def
> ===================================================================
> --- params.def  (revision 203390)
> +++ params.def  (working copy)
> @@ -373,6 +373,11 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>          "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
>          1000, 0, 0)
>
> +DEFPARAM(UNLIKELY_BB_COUNT_FRACTION,
> +        "unlikely-bb-count-fraction",
> +        "The minimum fraction of profile runs a given basic block execution count must be not to be considered unlikely",
> +        4, 1, 10000)
> +
>  DEFPARAM (PARAM_ALIGN_THRESHOLD,
>          "align-threshold",
>          "Select fraction of the maximal frequency of executions of basic block in function given basic block get alignment",
>
>> Thanks,
>> Teresa
>>
>>> I will check the second part of patch separately.
>>> Thanks!
>>> Honza
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413