On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.ch...@arm.com> wrote: >> Hi, >> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt >> issue still exists. >> >> Current candidate selecting algorithm tends to select fewer candidates given >> below reasons: >> 1) to better handle loops with many induction uses but the best choice is >> one generic basic induction variable; >> 2) to keep compilation time low. >> >> One fundamental weakness of the strategy is the opposite situation can't be >> handled properly sometimes. For these cases the best choice is each >> induction variable has its own candidate. >> This patch fixes the problem by shuffling candidate set after fix-point is >> reached by current implementation. The reason why this strategy works is it >> replaces candidate set by selecting local optimal candidate for some >> induction uses, and the new candidate set (has lower cost) is exact what we >> want in the mentioned case. Instrumentation data shows this can find better >> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >> >> This patch actually is extension to the first version patch posted at >> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >> another selecting pass with special seed set (more or less like the shuffled >> set in this patch). Data also confirms this patch can find optimal sets for >> most loops found by the first one, as well as optimal sets for many new >> loops. >> >> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >> test on aarch64. >> Since this patch only selects candidate set with lower cost, any regressions >> revealed are latent bugs of other components in GCC. >> I also collected GCC bootstrap time on x86_64, no regression either. >> Is this OK? > > The algorithm seems to be quadratic in the number of IV candidates > (at least): Yes, I worried about that too, that's why I measured the bootstrap time. One way is restrict this procedure one time for each loop. I already tried that and it can capture +90% loops. Is this sounds reasonable?
BTW, do we have some compilation time benchmarks for GCC? Thanks, bin > > + for (i = 0; i < n_iv_cands (data); i++) > + { > ... > + iv_ca_replace (data, ivs, cand, act_delta, &tmp_delta); > ... > > and > > +static void > +iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs, > + struct iv_cand *cand, struct iv_ca_delta *act_delta, > + struct iv_ca_delta **delta) > +{ > ... > + for (i = 0; i < ivs->upto; i++) > + { > ... > + if (data->consider_all_candidates) > + { > + for (j = 0; j < n_iv_cands (data); j++) > + { > > possibly cubic if ivs->upto is of similar value. > > I wonder if it is possible to restrict this to the single IV with > the largest delta? After all we are iterating try_improve_iv_set. > Alternatively move the handling out of iteration completey, > thus into the caller of try_improve_iv_set? > > Note that compile-time issues always arise in auto-generated code, > not during GCC bootstrap. > > Richard. > > >> 2014-12-03 Bin Cheng bin.ch...@arm.com >> >> PR tree-optimization/62178 >> * tree-ssa-loop-ivopts.c (iv_ca_replace): New function. >> (try_improve_iv_set): Shuffle candidates set in order to handle >> case in which candidate wrto each iv use should be selected. >> >> gcc/testsuite/ChangeLog >> 2014-12-03 Bin Cheng bin.ch...@arm.com >> >> PR tree-optimization/62178 >> * gcc.target/aarch64/pr62178.c: New test.