Ping on this one, since the new param would be useful in the other
profile-related patch I'm working on (the one to handle the dropped
profile counts in tree-profile).

Thanks,
Teresa

On Thu, Oct 10, 2013 at 2:35 PM, Teresa Johnson <tejohn...@google.com> wrote:
> On Mon, Oct 7, 2013 at 10:45 PM, Teresa Johnson <tejohn...@google.com> wrote:
>> On Sun, Oct 6, 2013 at 6:43 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> 2013-10-03  Teresa Johnson  <tejohn...@google.com>
>>>>
>>>>         * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>>>
>>> I would not mention HOT in the parameter name.  Blocks are
>>> hot/normal/unlikely, and we have HOT_BB_FREQUENCY_FRACTION.
>>>
>>> So perhaps UNLIKELY_BB_COUNT_FRACTION, with an explanation that it is
>>> relative to the number of train runs?
>>
>> Ok.
>>
>>>> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
>>>> +        "min-hot-run-ratio",
>>>> +         "The minimum ratio of profile runs to basic block execution 
>>>> count "
>>>> +         "for the block to be considered hot",
>>> Perhaps as:
>>>> +         "The maximum ratio of profile runs to basic block execution 
>>>> count "
>>>> +         "for the block to be considered unlikely",
>>>
>>>> +        4, 0, 10000)
>>>
>>> lower bound should be 1 or we divide by 0.
>>
>> Yep, will fix.
>>
>>>> Index: predict.c
>>>> ===================================================================
>>>> --- predict.c   (revision 203152)
>>>> +++ predict.c   (working copy)
>>>> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>>>>    gcc_checking_assert (fun);
>>>>    if (profile_status_for_function (fun) == PROFILE_READ)
>>>>      {
>>>> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>>>> +      int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
>>>> +      if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>>>
>>> This way, if count is 1, runs needs to be min_run_ratio * 2.
>>> So perhaps if (count * min_run_ratio >= profile_info->runs) instead of
>>> the division.
>>
>> Ok.
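>>
>> To convince myself of the rounding effect, here is a minimal standalone
>> sketch (outside GCC, with a local copy of the RDIV rounding divide and
>> made-up count/ratio/runs values) showing that the rounded divide lets a
>> count of 1 pass with runs up to ratio * 2, while the direct comparison
>> requires runs <= ratio:
>>
>> #include <stdio.h>
>>
>> /* Rounding divide, as in predict.c's RDIV macro.  */
>> #define RDIV(X, Y) (((X) + (Y) / 2) / (Y))
>>
>> int
>> main (void)
>> {
>>   long long count = 1, ratio = 4, runs = 8;
>>   /* Old test: RDIV (4, 8) == 1 > 0, so the block counts as executed.  */
>>   printf ("rdiv test:   %d\n", RDIV (count * ratio, runs) > 0);
>>   /* Suggested test: 4 >= 8 is false, so the block is never executed.  */
>>   printf ("direct test: %d\n", count * ratio >= runs);
>>   return 0;
>> }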
>>
>>>> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
>>>> +      if (ENTRY_BLOCK_PTR->count)
>>>>         {
>>>> -         return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
>>>> -                       ENTRY_BLOCK_PTR->frequency)
>>>> -                 < REG_BR_PROB_BASE / 4);
>>>> +          gcov_type scaled_count
>>>> +              = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
>>>> +          gcov_type computed_count;
>>>> +          /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
>>>> +             be large enough to do the division first without losing much
>>>> +             precision.  */
>>>> +          if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)
>>>
>>> I do not like the undefined behaviour triggered before the check (sure,
>>> unsigned arithmetic would fix that, but it seems all strange).  What about
>>> guarding the whole code:
>>>
>>> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
>>>
>>> For large counts the roundoff problems in the first test should not be a
>>> problem.
>>
>> Sure, that sounds reasonable.
>
> Actually, we don't care about count at this point, just 
> ENTRY_BLOCK_PTR->count.
>
>>
>>>> +            {
>>>> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
>>>> +                                     ENTRY_BLOCK_PTR->frequency);
>>>> +              computed_count *= frequency * min_run_ratio;
>>>> +            }
>>>> +          else
>>>> +            computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
>>>> +          if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)
>>>
>>> So on the non-overflow path you compute
>>>
>>>    ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency)
>>>      * min_run_ratio / runs > 0.5
>>
>> Yes, there is an extra min_run_ratio factor in there that I need to remove!
>>>
>>> I think you want
>>>
>>>    (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs
>>
>> Right, will change that too.
>
> Note that the above change essentially cuts the effective min_run_ratio in
> half (since we are now comparing against runs rather than against 0.5 * runs,
> which the rounding divide gave us before).
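>
> Spelling that out with the integer rounding made explicit (just a worked
> restatement of the two comparisons, nothing new in the patch):
>
>   old:  RDIV (count * ratio, runs) > 0
>         <=>  count * ratio + runs / 2 >= runs
>         <=>  count * ratio >= runs / 2
>   new:  count * ratio >= runs
>
> so a ratio of 4 under the new test behaves roughly like a ratio of 2 under
> the old one, and matching the old behavior means doubling the parameter.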
>
>>
>>>
>>> If entry_bb_frequency is 0, you can just return true iff frequency is > min_run_ratio.
>>
>> We already return false conservatively if entry_bb_frequency is 0. Do
>> we really want to change this? I'm not sure it makes sense to compare
>> the frequency to the run ratio directly.
>>
>>>
>>> I think a safe way to compute this is:
>>>
>>> if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>>>   scaled_count = (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency;
>>> else
>>>   scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency) * min_run_ratio;
>>>
>>> In the first path we have at most 10000*10000*10000*10000, which still
>>> fits in a 64-bit value.
>>>
>>> In the second path the base of the fixed-point arithmetic is large enough
>>> that the division should not matter.  We never get over entry_count * 10000,
>>> which is safe.  The first part of the statement, however, just computes a
>>> value that should match count, so count > 10000 unless the profile got
>>> misupdated, and the low-count roundoff errors should not matter.  So
>>> perhaps the whole code can execute only if
>>>
>>> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
>>>     && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>>>
>>> and we do not need any conditionals for scaled_count.
>
> As noted above, we don't use count here, so this can be simplified to:
>
> if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>
> (and ENTRY_BLOCK_PTR->count > 0, which is checked earlier). See the
> fixed patch below.
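>
> (As a sanity check on the bound: REG_BR_PROB_BASE is 10000, frequency is
> capped at BB_FREQ_MAX (also 10000), the fraction parameter is at most
> 10000, and the guard keeps ENTRY_BLOCK_PTR->count below 10^8, so the
> product stays below 10^16, comfortably inside a 64-bit gcov_type whose
> maximum is about 9.2 * 10^18.)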
>
>>>
>>> Does such code work with ratio changed to 16?  I think the double
>>> multiplication in your patch may have caused problems.
>>
>> Let me fix this up and will get back.
>
> As noted above, the change to avoid the RDIV by runs cuts the effect of
> min_run_ratio (now unlikely_count_fraction) in half. So we really need to
> increase this parameter to 32 to match what the old version of the code
> was doing.
>
> Indeed, using 32 does fix the same set of issues. However, when
> setting the parameter to the old default of 4, there are 5 core dumps
> using the patch to make the unlikely code unexecutable with 100
> training runs (see below for some examples of why). This goes down to 2
> failures if I set the parameter to 10, and 0 if I set it to 20. Using
> 20 essentially means that the code has to be executed in at least 5% of
> the runs, which seems reasonable for something like splitting. What do
> you think?
>
> A couple of examples of why cold split code was being executed with
> 100 training runs:
>
> 1) No context sensitivity for a routine that is inlined:
>
> In 
> /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr33870.c,
> we call merge_pagelist 3 times from sort_pagelist. In the profile-use
> compile merge_pagelist is inlined (I believe in all 3 locations). One
> of the conditional blocks (line 23) has a profile count that is less
> than runs/4. It turns out that from the first merge_pagelist callsite
> we always execute this block, but the other two are completely cold.
> But the profile info from all three calls is of course merged, and it
> looks like it is executed infrequently overall. So the block is split
> but then we execute it from the inline instance corresponding to the
> first call.
>
> I'm not sure what we can do in general here. But it is another data
> point suggesting to me that the blocks should be very cold to be
> split.
>
> 2) RTL expansion:
>
> Another was in the following code from
> /usr/local/google/home/tejohnson/gcc_trunk_5/gcc/testsuite/gcc.c-torture/execute/pr43784.c:
>
> static struct s __attribute__((noinline)) rp(void)
> {
>   return *p;
> }
>
> where p is a global that is assigned as:
>
> static union u v;
> static struct s *p = &v.d.b;
>
> RTL expansion synthesizes a bunch of tests and branches (alignment
> handling?), and guesses at the probabilities of the resulting
> branches. Unfortunately, this is less than accurate and we end up
> executing a block that is given a small likelihood, making it below
> the threshold with the default unlikely fraction of 4.
>
> Here is the patch with the current threshold kept as 4. Let me know if
> you think we can kick this up to 20, or if we should just increase the
> threshold for splitting.
>
> Passes regression tests, ok for trunk?
>
> (BTW, getting close. I have a bunch of fixes to the loop unroller that
> need some cleanup, and a small fix to ssa tail merging to update the
> profiles, but I think that does it for the tests I was looking at.
> Will send those soon for review.)
>
> Thanks,
> Teresa
>
> 2013-10-10  Teresa Johnson  <tejohn...@google.com>
>
>         * predict.c (probably_never_executed): Compare frequency-based
>         count to number of training runs.
>         * params.def (UNLIKELY_BB_COUNT_FRACTION): New parameter.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 203390)
> +++ predict.c   (working copy)
> @@ -237,17 +237,33 @@ probably_never_executed (struct function *fun,
>    gcc_checking_assert (fun);
>    if (profile_status_for_function (fun) == PROFILE_READ)
>      {
> -      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +      int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +      if (count * unlikely_count_fraction >= profile_info->runs)
>         return false;
>        if (!frequency)
>         return true;
>        if (!ENTRY_BLOCK_PTR->frequency)
>         return false;
> -      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
> +      if (ENTRY_BLOCK_PTR->count)
>         {
> -         return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -                       ENTRY_BLOCK_PTR->frequency)
> -                 < REG_BR_PROB_BASE / 4);
> +          gcov_type computed_count;
> +          /* Check for possibility of overflow, in which case entry bb count
> +             is large enough to do the division first without losing much
> +             precision.  */
> +          if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
> +            {
> +              gcov_type scaled_count
> +                  = frequency * ENTRY_BLOCK_PTR->count * unlikely_count_fraction;
> +              computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
> +            }
> +          else
> +            {
> +              computed_count = RDIV (ENTRY_BLOCK_PTR->count,
> +                                     ENTRY_BLOCK_PTR->frequency);
> +              computed_count *= frequency * unlikely_count_fraction;
> +            }
> +          if (computed_count >= profile_info->runs)
> +            return false;
>         }
>        return true;
>      }
> Index: params.def
> ===================================================================
> --- params.def  (revision 203390)
> +++ params.def  (working copy)
> @@ -373,6 +373,11 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>          "Select fraction of the maximal frequency of executions of
> basic block in function given basic block needs to have to be
> considered hot",
>          1000, 0, 0)
>
> +DEFPARAM(UNLIKELY_BB_COUNT_FRACTION,
> +        "unlikely-bb-count-fraction",
> +         "The minimum fraction of profile runs a given basic block
> execution count must be not to be considered unlikely",
> +        4, 1, 10000)
> +
>  DEFPARAM (PARAM_ALIGN_THRESHOLD,
>           "align-threshold",
>           "Select fraction of the maximal frequency of executions of
> basic block in function given basic block get alignment",
>
>
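> For anyone experimenting with the threshold, the new parameter can be set
> from the command line through the usual --param mechanism (the value 20
> here is just illustrative):
>
>   gcc -O2 -fprofile-use --param unlikely-bb-count-fraction=20 foo.c
>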
>>
>> Thanks,
>> Teresa
>>
>>>
>>> I will check the second part of patch separately.
>>> Thanks!
>>> Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
