on 2020/1/7 7:25 PM, Richard Biener wrote:
> On Tue, 7 Jan 2020, Kewen.Lin wrote:
>
>> on 2020/1/7 5:14 PM, Richard Biener wrote:
>>> On Mon, 6 Jan 2020, Kewen.Lin wrote:
>>>
>>>> We are thinking whether it can be handled in IVOPTs instead of one
>>>> RTL pass.
>>>>
>>>> During IVOPTs selecting IV cands, it doesn't know the loop will be
>>>> unrolled so it doesn't count the possible step cost in with X-form.
>>>> If we can teach it to consider the case, the IV cands which play
>>>> with D-form can be preferred.  Currently unrolling (incomplete)
>>>> happens in RTL, it looks we have to predict whether the loop will
>>>> be unrolled in IVOPTs.  Since there are some parameter checks on
>>>> RTL insn counts and target hooks, it seems not easy to get that.
>>>> Besides, we need to check the step is valid to put into the D-form
>>>> field (eg: DQ-form requires it to be exactly divisible by 16), to
>>>> ensure no extra ADDIs are needed.
>>>>
>>>> I'm not sure whether it's a good idea to implement in IVOPTs, but I
>>>> did some changes in IVOPTs to prove it's doable to get the expected
>>>> code, the patch is attached.
>>>>
>>>> Any comments/suggestions are highly appreciated!
>>>
>>> Is the unrolled code better than the not unrolled code (assuming
>>> optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
>>> either by actually doing it or by forcing it via the loop->unroll
>>> setting.  I don't think second-guessing the RTL unroller at this
>>> point is going to work.  Alternatively turn X-form into D-form during
>>> RTL unrolling?
>>>
>>
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> Yes, unrolled version is better on Power9 for both forms, but D-form
>> unrolled is better than X-form unrolled.  If we drive unrolling in
>> IVOPTs, not sure it will be a concern that IVOPTs becomes too heavy? or
>> too rude with forced UF if imprecise?  Do we still have the plan to
>> introduce one middle-end unroll pass, does it help if yes?
>
> I have the opinion that an isolated unrolling pass is not wanted.
> Instead unrolling should be driven by some profitability metric
> which in your case is better induction variable optimization.
> In the "usual" case it is better scheduling where then scheduling
> should drive unrolling.

OK, it makes sense.  I have heard that some compilers consider the
unrolling factor for vectorization and some for modulo scheduling.

>
>> The quoted
>> RTL patch is to propose one RTL pass after RTL loop passes, it also
>> sounds good to check whether RTL unrolling is a good place!
>
> Why would you need a new RTL pass?  I'd do it during the unroll
> transform itself, ideally on the not unrolled body because that's
> likely simpler than updating N copies?

Good question, I don't have a good understanding of it.  But from the
notes of the patch, I guess the new pass handles not only the cases
exposed by unrolling, but also others without unrolling.  Quoted from
its note:

"This new pass scans existing rtl expressions and replaces X-form loads
and stores with rtl expressions that favor selection of the D-form
instructions in contexts for which the D-form instructions are
preferred.  The new pass runs after the RTL loop optimizations since
loop unrolling often introduces opportunities for beneficial
replacements of X-form addressing instructions."

BR,
Kewen
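
For reference, the step check mentioned at the top of the thread (a
DQ-form displacement has to be a multiple of 16) could be sketched
roughly as below.  This is only an illustration in plain C, not code
from the attached patch or from GCC; the names addr_form, offset_fits
and unrolled_offsets_fit are made up for the example.  It assumes the
Power ISA displacement rules: a signed 16-bit byte offset, with DS-form
needing a multiple of 4 and DQ-form a multiple of 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not GCC internals.  */
enum addr_form { FORM_D, FORM_DS, FORM_DQ };

/* Alignment the byte displacement must have for each form
   (assumed from the Power ISA: D = 1, DS = 4, DQ = 16).  */
int64_t form_align(enum addr_form f)
{
    switch (f) {
    case FORM_DS: return 4;
    case FORM_DQ: return 16;
    default:      return 1;
    }
}

/* True if OFF can be encoded directly in the displacement field:
   it must fit in a signed 16-bit value and be suitably aligned.  */
bool offset_fits(enum addr_form f, int64_t off)
{
    if (off < -32768 || off > 32767)
        return false;
    return off % form_align(f) == 0;
}

/* True if every copy of an access in a loop unrolled UF times can be
   addressed as base + constant, i.e. no extra addi is needed.
   Copy i of the access uses displacement BASE_OFF + i * STEP.  */
bool unrolled_offsets_fit(enum addr_form f, int64_t base_off,
                          int64_t step, int uf)
{
    assert(uf >= 1);
    for (int i = 0; i < uf; i++)
        if (!offset_fits(f, base_off + (int64_t)i * step))
            return false;
    return true;
}
```

With these definitions, unrolled_offsets_fit(FORM_DQ, 0, 16, 4) holds
(displacements 0, 16, 32, 48), while a step of 8 already fails at the
second copy, which is the case where an extra ADDI would be needed.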