on 2020/6/3 5:27 PM, Richard Biener wrote:
> On Wed, 3 Jun 2020, Kewen.Lin wrote:
>
>> on 2020/6/3 3:07 PM, Richard Biener wrote:
>>> On Wed, 3 Jun 2020, Kewen.Lin wrote:
>>>
>>>> Hi Richi,
>>>>

snip ...

>>>>>
>>>>> I'd just mention there are other targets that have the choice between
>>>>> the above forms.  Since IVOPTs itself does not perform the unrolling
>>>>> the IL it produces is the same, correct?
>>>>>
>>>> Yes.  Before this patch, IVOPTs doesn't consider the unrolling impacts,
>>>> it only models things based on what it sees.  We can assume it thinks
>>>> later RTL unrolling won't be performed.
>>>>
>>>> With this patch, since the IV choice probably changes, the IL can probably
>>>> change.  The typical difference with this patch is:
>>>>
>>>>   vect__1.7_15 = MEM[symbol: x, index: ivtmp.19_22, offset: 0B];
>>>> vs.
>>>>   vect__1.7_15 = MEM[base: _29, offset: 0B];
>>>
>>> So we're asking IVOPTS "if we were unrolling this loop would you make
>>> a different IV choice?" thus I wonder why we need so much complexity
>>> here?
>>
>> I would describe it more like "we are going to unroll this loop with
>> unroll factor uf in RTL, would you consider this variable when modeling?"
>>
>> In most cases, one single iteration is representative of the unrolled
>> body, so it doesn't matter whether unrolling is considered or not.  But
>> for the case here, that's not true: the expected reg_offset iv cand can
>> reduce the iv cand step cost, which leads to the difference.
>>
>>> That is, if we can classify the loop as being possibly unrolled
>>> we could evaluate IVOPTs IV choice (and overall cost) on the original
>>> loop and in a second run on the original loop with fake IV uses
>>> added with extra offset.  If the overall IV cost is similar we'll
>>> take the unroll friendly choice if the costs are way different
>>> (I wouldn't expect this to be the case ever?) I'd side with the
>>> IV choice when not unrolling (and mark the loop as to be not unrolled).
>>>
>>
>> Could you elaborate on it a bit?  I guess it won't estimate the unroll
>> factor here, just guess whether it's to be unrolled or not?  The second
>> run with fake IV uses added with extra offset sounds like scaling up the
>> iv group cost by uf.
>
> From your example above the D-form (MEM[symbol: x, index: ivtmp.19_22,
> offset: 0B]) is preferable since in the unrolled variant we have
> the same address but with a different constant offset for the unroll
> copies while the second form would have to update the 'base' IV.
>
> Thus I think the difference in IV cost and decision should already
> show up if we, for each USE add a USE with an added constant offset.
> This might be what your patch does with that extra flag on the USEs,
> I was suggesting to model the USEs more explicitly, simulating a
> 2-way unroll.  I think in the end I'll defer to Bin here who knows
> the code best.
>

Thanks for your further explanation!  As I understand your proposal, we
introduce more iv use groups with the step added.  Take the example here:

  https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547128.html

Imagine that initially cand iv 4, which leads to x-form, wins: it's the
original iv and has iv-group cost 1 against the address group.  Even if we
introduce one more group (2-way unrolling), that iv still wins, since
pulling the address iv in takes 5 (15 for three).  Probably we can
introduce more groups according to uf here.

OK.  Looking forward to Bin's comments.

>>> Thus I'd err on the side of not unrolling but leave the ultimate choice
>>> of whether to unroll to RTL unless IV cost makes that prohibitive.
>>>
>>> Even without X- or D- form addressing modes the IV choice may differ
>>> and I think we don't need extra knobs for the unroller but instead
>>> can decide to set the existing n_unroll to zero (force not unroll)
>>> when costs say it would be bad?
>>
>> Yes, even without x- or d- form addressing, the difference probably comes
>> from the compare type IV use for loop ending, and maybe more cases which
>> I am not aware of.  But I don't see people caring about it, probably the
>> impact is small.
>>
>> IIUC what you stated here looks like using ivopts information for the
>> unroll factor decision.  I think this is a separate direction; do we have
>> this kind of case where ivopts costs can foresee the unrolling?
>>
>> Now the unroll factor estimation can be used by other optimization passes
>> if they are wondering about the future unroll factor decision; as
>> discussed, it sounds like a good idea to override n_unroll, with some
>> benchmarking.
>
> I didn't suggest to use IVOPTs to determine the unroll factor.  In
> fact your patch looks like it does this?  Instead I wanted to make
> IVOPTs choose a set of IVs that is best for a blend of both worlds - use
> D-form when it doesn't hurt the not unrolled code [much], and X-form
> when the D-form is way worse (for whatever reason) and signal that
> to the unroller (but we could choose to not do that).
>

Sorry for my misunderstanding!  Nice, we are heading in the same
direction. :)

> The real issue is of course we're applying IV decision to a not final
> loop.
>

Exactly.

BR,
Kewen
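
For readers following the D-form vs. X-form point above, here is a minimal
sketch in C of the two IV shapes after a 2-way unroll.  It is only an
illustration and is not taken from the patch: the function names sum_dform
and sum_xform are hypothetical, and the C source merely mirrors the IV
structure (the addressing mode actually emitted is up to the compiler and
target).  In the first form the unroll copies share one pointer IV and
differ only by a constant offset; in the second, the index IV has to be
updated between the copies.

```c
#include <stddef.h>

/* D-form-like shape: one pointer IV, bumped once per unrolled body.
   The two copies address p + 0 and p + 1, i.e. the same base register
   with different constant offsets.  */
long sum_dform(const long *x, size_t n)
{
    long s = 0;
    const long *p = x;
    const long *end = x + (n & ~(size_t)1);  /* even part of the trip count */

    for (; p != end; p += 2) {
        s += p[0];
        s += p[1];
    }
    if (n & 1)          /* leftover element when n is odd */
        s += *p;
    return s;
}

/* X-form-like shape: base plus index IV.  Each unroll copy needs a
   different index value, so the IV is updated between the copies.  */
long sum_xform(const long *x, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t even = n & ~(size_t)1;

    while (i < even) {
        s += x[i];
        i += 1;         /* IV update for the first copy */
        s += x[i];
        i += 1;         /* IV update for the second copy */
    }
    if (n & 1)
        s += x[i];
    return s;
}
```

Both functions compute the same sum; the difference is only in how many IV
updates the unrolled body carries, which is the cost difference the fake
"USE with an added constant offset" is meant to expose to IVOPTs.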