On Wed, Dec 17, 2014 at 10:47 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Tue, Dec 16, 2014 at 4:42 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>> Hi,
>>>>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the 
>>>>>> ivopt
>>>>>> issue still exists.
>>>>>>
>>>>>> Current candidate selecting algorithm tends to select fewer candidates 
>>>>>> given
>>>>>> below reasons:
>>>>>>   1) to better handle loops with many induction uses but the best choice 
>>>>>> is
>>>>>> one generic basic induction variable;
>>>>>>   2) to keep compilation time low.
>>>>>>
>>>>>> One fundamental weakness of the strategy is the opposite situation can't 
>>>>>> be
>>>>>> handled properly sometimes.  For these cases the best choice is each
>>>>>> induction variable has its own candidate.
>>>>>> This patch fixes the problem by shuffling candidate set after fix-point 
>>>>>> is
>>>>>> reached by current implementation.  The reason why this strategy works 
>>>>>> is it
>>>>>> replaces candidate set by selecting local optimal candidate for some
>>>>>> induction uses, and the new candidate set (has lower cost) is exact what 
>>>>>> we
>>>>>> want in the mentioned case.  Instrumentation data shows this can find 
>>>>>> better
>>>>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>>>>
>>>>>> This patch actually is extension to the first version patch posted at
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>>>>> another selecting pass with special seed set (more or less like the 
>>>>>> shuffled
>>>>>> set in this patch).  Data also confirms this patch can find optimal sets 
>>>>>> for
>>>>>> most loops found by the first one, as well as optimal sets for many new
>>>>>> loops.
>>>>>>
>>>>>> Bootstrap and test on x86_64, no regression on benchmarks.  Bootstrap and
>>>>>> test on aarch64.
>>>>>> Since this patch only selects candidate set with lower cost, any 
>>>>>> regressions
>>>>>> revealed are latent bugs of other components in GCC.
>>>>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>>>>> Is this OK?
>>>>>
>>>>> The algorithm seems to be quadratic in the number of IV candidates
>>>>> (at least):
>>>> Yes, I worried about that too, that's why I measured the bootstrap
>>>> time.  One way is restrict this procedure one time for each loop.  I
>>>> already tried that and it can capture +90% loops.  Is this sounds
>>>> reasonable?
>>>
>>> Yes.  That's my suggestion to handle it in the caller of try_improve_iv_set?
>>>
>>>> BTW, do we have some compilation time benchmarks for GCC?
>>>
>>> There are various testcases linked from PR47344, I don't remember
>>> any particular one putting load on IVOPTs (but I do remember seeing
>>> IVOPTs in the ~25% area in -ftime-report for some testcases).
>>
>
> Hi,
> I further refined the patch.  Specifically, I factored out common
> code, improved comments, and restricted new code in several ways, for
> example, now iv_ca_replace runs exactly one time for each
> find_optimal_iv_set; iv_ca_replace only tries to replace one candidate
> in IVS each time and makes quick return if lower cost set is found;
> most importantly, iv_ca_replace now checks
> ALWAYS_PRUNE_CAND_SET_BOUND.
> The patch is simplified with these changes.  As for compilation time,
> IVOPT isn't regressed obviously for the overloaded case I created,
> also regression in llvm compilation time benchmarks is gone.
>
> I think we could adapt data structure in IVOPT to make it faster, for
> example, record information in candidate about which uses are
> represented by each cand, sort candidates by cost for each iv use.  I
> may do some refactor in next stage1.

Yes, I agree.  A similar thing is to use affine combinations throughout
them to avoid going into/out of that representation all the time (I think
I've suggested that elsewhere).

> Bootstrap on x86_64, test ongoing.  So OK if no regressions?

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2014-12-17  Bin Cheng  <bin.ch...@arm.com>
>
>     PR tree-optimization/62178
>     * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function.
>     (iv_ca_replace): New function.
>     (try_improve_iv_set): New parameter try_replace_p.
>     Break local optimal fixed-point by calling iv_ca_replace.
>     (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set.
>
> gcc/testsuite/ChangeLog
> 2014-12-17  Bin Cheng  <bin.ch...@arm.com>
>
>     PR tree-optimization/62178
>     * gcc.target/aarch64/pr62178.c: New test.

Reply via email to