On Thu, Dec 11, 2014 at 5:56 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>>> issue still exists.
>>>
>>> Current candidate selecting algorithm tends to select fewer candidates given
>>> below reasons:
>>>   1) to better handle loops with many induction uses but the best choice is
>>> one generic basic induction variable;
>>>   2) to keep compilation time low.
>>>
>>> One fundamental weakness of the strategy is the opposite situation can't be
>>> handled properly sometimes.  For these cases the best choice is each
>>> induction variable has its own candidate.
>>> This patch fixes the problem by shuffling candidate set after fix-point is
>>> reached by current implementation.  The reason why this strategy works is it
>>> replaces candidate set by selecting local optimal candidate for some
>>> induction uses, and the new candidate set (has lower cost) is exact what we
>>> want in the mentioned case.  Instrumentation data shows this can find better
>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>
>>> This patch actually is extension to the first version patch posted at
>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>> another selecting pass with special seed set (more or less like the shuffled
>>> set in this patch).  Data also confirms this patch can find optimal sets for
>>> most loops found by the first one, as well as optimal sets for many new
>>> loops.
>>>
>>> Bootstrap and test on x86_64, no regression on benchmarks.  Bootstrap and
>>> test on aarch64.
>>> Since this patch only selects candidate set with lower cost, any regressions
>>> revealed are latent bugs of other components in GCC.
>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>> Is this OK?
>>
>> The algorithm seems to be quadratic in the number of IV candidates
>> (at least):
> Yes, I worried about that too, that's why I measured the bootstrap
> time.  One way is restrict this procedure one time for each loop.  I
> already tried that and it can capture +90% loops.  Is this sounds
> reasonable?
By +90%, I mean 90% from the 6% improved loops, not the total loop number...
>
> BTW, do we have some compilation time benchmarks for GCC?
>
> Thanks,
> bin
>>
>> +         for (i = 0; i < n_iv_cands (data); i++)
>> +           {
>> ...
>> +             iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta);
>> ...
>>
>> and
>>
>> +static void
>> +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
>> +              struct iv_cand *cand, struct iv_ca_delta *act_delta,
>> +              struct iv_ca_delta **delta)
>> +{
>> ...
>> +  for (i = 0; i < ivs->upto; i++)
>> +    {
>> ...
>> +      if (data->consider_all_candidates)
>> +       {
>> +         for (j = 0; j < n_iv_cands (data); j++)
>> +           {
>>
>> possibly cubic if ivs->upto is of similar value.
>>
>> I wonder if it is possible to restrict this to the single IV with
>> the largest delta?  After all we are iterating try_improve_iv_set.
>> Alternatively move the handling out of iteration completey,
>> thus into the caller of try_improve_iv_set?
>>
>> Note that compile-time issues always arise in auto-generated code,
>> not during GCC bootstrap.
>>
>> Richard.
>>
>>
>>> 2014-12-03  Bin Cheng  bin.ch...@arm.com
>>>
>>>   PR tree-optimization/62178
>>>   * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>>>   (try_improve_iv_set): Shuffle candidates set in order to handle
>>>   case in which candidate wrto each iv use should be selected.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2014-12-03  Bin Cheng  bin.ch...@arm.com
>>>
>>>   PR tree-optimization/62178
>>>   * gcc.target/aarch64/pr62178.c: New test.

Reply via email to