On Mon, 10 Feb 2020, Segher Boessenkool wrote:

> Hi!
>
> On Mon, Feb 10, 2020 at 02:17:04PM +0800, Kewen.Lin wrote:
> > on 2020/1/20 8:33 PM, Segher Boessenkool wrote:
> > > On Thu, Jan 16, 2020 at 05:36:52PM +0800, Kewen.Lin wrote:
> > >> As we discussed in the thread
> > >> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
> > >> Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
> > >> I'm working to teach IVOPTs to consider D-form group access during
> > >> unrolling.
> > >> The difference on D-form and other forms during unrolling is we can
> > >> put the stride into displacement field to avoid additional step
> > >> increment. eg:
> > >
> > > <snip>
> > >
> > >> Imagining that if the loop get unrolled by 8 times, then 3 step
> > >> updates with D-form vs. 8 step updates with X-form. Here we only
> > >> need to check stride meet D-form field requirement, since if OFF
> > >> doesn't meet, we can construct baseA' with baseA + OFF.
> > >
> > > So why doesn't the existing code do this already? Why does it make all
> > > the extra induction variables? Is the existing cost model bad, are our
> > > target costs bad, or something like that?
> > >
> >
> > I think the main cause is IVOPTs runs before RTL unroll, when it's
> > determining the IV sets, it can only take the normal step cost into
> > account, since its input isn't unrolled yet. After unrolling, the
> > x-form indexing register has to play with more UF-1 times update, but
> > we can probably hide them in d-form displacement field. The way I
> > proposed here is to adjust IV cost with additional cost_step according
> > to estimated unroll. It doesn't introduce new IV cand but can affect
> > the final optimal set.
>
> Yes, we should decide how often we want to unroll things somewhere before
> ivopts already, and just use that info here.
>
> Or are there advantage to doing it *in* ivopts? It sounds like doing
> it there is probably expensive, but maybe not, and we need to do similar
> analysis there anyway.

Well, if the only benefit of doing the unrolling is that IVs get cheaper
then yes, IVOPTs should drive it. But usually unrolling exposes
redundancies (caught by predictive commoning, which drives some unrolling)
or it enables better use of CPU resources via scheduling (only caught
later in RTL). For scheduling we have the additional complication that
the RTL side doesn't have as fancy a data dependence analysis framework
as the GIMPLE side. So I'd put my bet on trying to move something like
SMS to GIMPLE and combine it with unrolling (IIRC SMS at most interleaves
1 1/2 loop iterations).

Richard.
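
For reference, the step-update counting in the quoted proposal can be put into a toy model. This is only a sketch under stated assumptions, not the code from the patch: the function names are hypothetical, the displacement is assumed to be a signed 16-bit field, and the "3 step updates" figure is assumed to correspond to three base addresses, since the original example is snipped above.

```c
/* Toy model of step updates per unrolled loop body.
   All names here are hypothetical; this is not GCC code. */

/* Assumption: the D-form displacement is a signed 16-bit field.
   The unrolled copies use displacements 0, stride, ..., (uf-1)*stride,
   so it is enough to check the largest-magnitude one.  Requires uf >= 1. */
static int stride_fits_dform(long stride, int uf)
{
    long extreme = stride * (long)(uf - 1);
    return extreme >= -32768 && extreme <= 32767;
}

/* With D-form, each base pointer is bumped once per unrolled body and
   the per-copy stride is folded into the displacement.  Otherwise the
   shared index register is bumped once per unrolled copy. */
static int step_updates(int n_bases, long stride, int uf)
{
    if (stride_fits_dform(stride, uf))
        return n_bases;
    return uf;
}
```

Under these assumptions, step_updates(3, 8, 8) gives 3 while a stride too large for the displacement field falls back to 8, matching the "3 vs. 8" comparison in the quoted text.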