On Thu, Dec 11, 2014 at 5:56 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>>> issue still exists.
>>>
>>> Current candidate selecting algorithm tends to select fewer candidates given
>>> below reasons:
>>> 1) to better handle loops with many induction uses but the best choice is
>>> one generic basic induction variable;
>>> 2) to keep compilation time low.
>>>
>>> One fundamental weakness of the strategy is the opposite situation can't be
>>> handled properly sometimes. For these cases the best choice is each
>>> induction variable has its own candidate.
>>> This patch fixes the problem by shuffling candidate set after fix-point is
>>> reached by current implementation. The reason why this strategy works is it
>>> replaces candidate set by selecting local optimal candidate for some
>>> induction uses, and the new candidate set (has lower cost) is exact what we
>>> want in the mentioned case. Instrumentation data shows this can find better
>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>
>>> This patch actually is extension to the first version patch posted at
>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>> another selecting pass with special seed set (more or less like the shuffled
>>> set in this patch). Data also confirms this patch can find optimal sets for
>>> most loops found by the first one, as well as optimal sets for many new
>>> loops.
>>>
>>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and
>>> test on aarch64.
>>> Since this patch only selects candidate set with lower cost, any regressions
>>> revealed are latent bugs of other components in GCC.
>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>> Is this OK?
>>
>> The algorithm seems to be quadratic in the number of IV candidates
>> (at least):
> Yes, I worried about that too, that's why I measured the bootstrap
> time. One way is restrict this procedure one time for each loop. I
> already tried that and it can capture +90% loops. Is this sounds
> reasonable?
By +90%, I mean 90% from the 6% improved loops, not the total loop number...
>
> BTW, do we have some compilation time benchmarks for GCC?
>
> Thanks,
> bin
>>
>> + for (i = 0; i < n_iv_cands (data); i++)
>> + {
>> ...
>> + iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta);
>> ...
>>
>> and
>>
>> +static void
>> +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
>> + struct iv_cand *cand, struct iv_ca_delta *act_delta,
>> + struct iv_ca_delta **delta)
>> +{
>> ...
>> + for (i = 0; i < ivs->upto; i++)
>> + {
>> ...
>> + if (data->consider_all_candidates)
>> + {
>> + for (j = 0; j < n_iv_cands (data); j++)
>> + {
>>
>> possibly cubic if ivs->upto is of similar value.
>>
>> I wonder if it is possible to restrict this to the single IV with
>> the largest delta? After all we are iterating try_improve_iv_set.
>> Alternatively move the handling out of iteration completey,
>> thus into the caller of try_improve_iv_set?
>>
>> Note that compile-time issues always arise in auto-generated code,
>> not during GCC bootstrap.
>>
>> Richard.
>>
>>
>>> 2014-12-03 Bin Cheng bin.ch...@arm.com
>>>
>>> PR tree-optimization/62178
>>> * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
>>> (try_improve_iv_set): Shuffle candidates set in order to handle
>>> case in which candidate wrto each iv use should be selected.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2014-12-03 Bin Cheng bin.ch...@arm.com
>>>
>>> PR tree-optimization/62178
>>> * gcc.target/aarch64/pr62178.c: New test.