On Wed, Dec 17, 2014 at 10:47 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, Dec 16, 2014 at 4:42 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>> Hi, >>>>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the >>>>>> ivopt >>>>>> issue still exists. >>>>>> >>>>>> Current candidate selecting algorithm tends to select fewer candidates >>>>>> given >>>>>> below reasons: >>>>>> 1) to better handle loops with many induction uses but the best choice >>>>>> is >>>>>> one generic basic induction variable; >>>>>> 2) to keep compilation time low. >>>>>> >>>>>> One fundamental weakness of the strategy is the opposite situation can't >>>>>> be >>>>>> handled properly sometimes. For these cases the best choice is each >>>>>> induction variable has its own candidate. >>>>>> This patch fixes the problem by shuffling candidate set after fix-point >>>>>> is >>>>>> reached by current implementation. The reason why this strategy works >>>>>> is it >>>>>> replaces candidate set by selecting local optimal candidate for some >>>>>> induction uses, and the new candidate set (has lower cost) is exact what >>>>>> we >>>>>> want in the mentioned case. Instrumentation data shows this can find >>>>>> better >>>>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >>>>>> >>>>>> This patch actually is extension to the first version patch posted at >>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >>>>>> another selecting pass with special seed set (more or less like the >>>>>> shuffled >>>>>> set in this patch). Data also confirms this patch can find optimal sets >>>>>> for >>>>>> most loops found by the first one, as well as optimal sets for many new >>>>>> loops. >>>>>> >>>>>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >>>>>> test on aarch64. >>>>>> Since this patch only selects candidate set with lower cost, any >>>>>> regressions >>>>>> revealed are latent bugs of other components in GCC. >>>>>> I also collected GCC bootstrap time on x86_64, no regression either. >>>>>> Is this OK? >>>>> >>>>> The algorithm seems to be quadratic in the number of IV candidates >>>>> (at least): >>>> Yes, I worried about that too, that's why I measured the bootstrap >>>> time. One way is restrict this procedure one time for each loop. I >>>> already tried that and it can capture +90% loops. Is this sounds >>>> reasonable? >>> >>> Yes. That's my suggestion to handle it in the caller of try_improve_iv_set? >>> >>>> BTW, do we have some compilation time benchmarks for GCC? >>> >>> There are various testcases linked from PR47344, I don't remember >>> any particular one putting load on IVOPTs (but I do remember seeing >>> IVOPTs in the ~25% area in -ftime-report for some testcases). >> > > Hi, > I further refined the patch. Specifically, I factored out common > code, improved comments, and restricted new code in several ways, for > example, now iv_ca_replace runs exactly one time for each > find_optimal_iv_set; iv_ca_replace only tries to replace one candidate > in IVS each time and makes quick return if lower cost set is found; > most importantly, iv_ca_replace now checks > ALWAYS_PRUNE_CAND_SET_BOUND. > The patch is simplified with these changes. As for compilation time, > IVOPT isn't regressed obviously for the overloaded case I created, > also regression in llvm compilation time benchmarks is gone. > > I think we could adapt data structure in IVOPT to make it faster, for > example, record information in candidate about which uses are > represented by each cand, sort candidates by cost for each iv use. I > may do some refactor in next stage1.
Yes, I agree. A similar thing is to use affine combinations throughout them to avoid going into/out of that representation all the time (I think I've suggested that elsewhere). > Bootstrap on x86_64, test ongoing. So OK if no regressions? Ok. Thanks, Richard. > Thanks, > bin > > 2014-12-17 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/62178 > * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function. > (iv_ca_replace): New function. > (try_improve_iv_set): New parameter try_replace_p. > Break local optimal fixed-point by calling iv_ca_replace. > (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set. > > gcc/testsuite/ChangeLog > 2014-12-17 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/62178 > * gcc.target/aarch64/pr62178.c: New test.