on 2020/1/7 7:25 PM, Richard Biener wrote:
> On Tue, 7 Jan 2020, Kewen.Lin wrote:
> 
>> on 2020/1/7 5:14 PM, Richard Biener wrote:
>>> On Mon, 6 Jan 2020, Kewen.Lin wrote:
>>>
>>>> We are wondering whether it can be handled in IVOPTs instead of in an
>>>> RTL pass.
>>>>
>>>> When IVOPTs selects IV candidates, it doesn't know the loop will be
>>>> unrolled, so it doesn't count in the possible step cost with X-form.
>>>> If we can teach it to consider this case, the IV candidates that work
>>>> with D-form can be preferred.  Currently (incomplete) unrolling happens
>>>> in RTL, so it looks like we have to predict in IVOPTs whether the loop
>>>> will be unrolled.  Since there are some parameter checks on RTL insn
>>>> counts and target hooks, that does not seem easy to get.  Besides, we
>>>> need to check that the step is valid to put into the D-form field (e.g.
>>>> DQ-form requires it to be exactly divisible by 16), to ensure no extra
>>>> ADDIs are needed.
>>>>
>>>> I'm not sure whether it's a good idea to implement this in IVOPTs, but
>>>> I made some changes in IVOPTs to prove it's doable to get the expected
>>>> code; the patch is attached.
>>>>
>>>> Any comments/suggestions are highly appreciated!
>>>
>>> Is the unrolled code better than the not unrolled code (assuming
>>> optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
>>> either by actually doing it or by forcing it via the loop->unroll
>>> setting.  I don't think second-guessing the RTL unroller at this
>>> point is going to work.  Alternatively turn X-form into D-form during
>>> RTL unrolling?
>>>
>>
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> Yes, the unrolled version is better on Power9 for both forms, but D-form
>> unrolled is better than X-form unrolled.  If we drive unrolling in
>> IVOPTs, would it be a concern that IVOPTs becomes too heavy, or too
>> rude with a forced UF if the prediction is imprecise?  Do we still plan
>> to introduce a middle-end unroll pass, and would it help if so?
> 
> I have the opinion that an isolated unrolling pass is not wanted.
> Instead unrolling should be driven by some profitability metric
> which in your case is better induction variable optimization.
> In the "usual" case it is better scheduling where then scheduling
> should drive unrolling.

OK, that makes sense.  I've heard that some compilers consider the unrolling
factor for vectorization and some for modulo scheduling.

> 
>> The quoted RTL patch proposes one RTL pass after the RTL loop passes; it
>> also sounds good to check whether RTL unrolling is a good place!
> 
> Why would you need a new RTL pass?  I'd do it during the unroll
> transform itself, ideally on the not unrolled body because that's
> likely simpler than updating N copies?

Good question, I don't have a good understanding of it.  But from the notes
of the patch, I guess the new pass doesn't only handle the cases exposed
by unrolling, but also others without unrolling.

Quoted from its note: "This new pass scans existing rtl expressions and
replaces X-form loads and stores with rtl expressions that favor selection
of the D-form instructions in contexts for which the D-form instructions
are preferred.  The new pass runs after the RTL loop optimizations since
loop unrolling often introduces opportunities for beneficial replacements
of X-form addressing instructions."
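
To make the X-form/D-form difference concrete, here is a minimal sketch in
plain C of a loop unrolled by 4.  It only mirrors the addressing shape at the
source level; the compiler still makes the final choice of instructions.  All
names (dq_form_disp_ok, sum_xform_style, sum_dform_style) are hypothetical and
not taken from either patch, and the displacement check assumes, per my
reading of the Power ISA, that a DQ-form displacement is a signed 16-bit byte
offset with the low four bits zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: can `disp` (in bytes) be encoded as a DQ-form
   displacement?  Assumption: signed 16-bit range, multiple of 16. */
static bool dq_form_disp_ok(int64_t disp)
{
    return disp >= -32768 && disp <= 32767 && (disp % 16) == 0;
}

/* X-form shape: every access is base + index register, so each
   unrolled copy needs its own index update (an extra add). */
static double sum_xform_style(const double *a, size_t n)
{
    const char *base = (const char *)a;
    size_t off = 0;                     /* byte index register */
    double s = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s += *(const double *)(base + off); off += sizeof(double);
        s += *(const double *)(base + off); off += sizeof(double);
        s += *(const double *)(base + off); off += sizeof(double);
        s += *(const double *)(base + off); off += sizeof(double);
    }
    for (; i < n; i++) {                /* remainder iterations */
        s += *(const double *)(base + off);
        off += sizeof(double);
    }
    return s;
}

/* D-form shape: one pointer IV plus constant displacements 0, 8, 16
   and 24, so only a single pointer update per unrolled iteration. */
static double sum_dform_style(const double *a, size_t n)
{
    const double *p = a;
    double s = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s += p[0];
        s += p[1];
        s += p[2];
        s += p[3];
        p += 4;                         /* the only IV step */
    }
    for (; i < n; i++)                  /* remainder iterations */
        s += *p++;
    return s;
}
```

Both functions add the elements in the same order, so they return the same
value; the difference is only in how many IV updates the unrolled body needs.
A step such as 24 bytes would fail dq_form_disp_ok, which is the case where
extra ADDIs would still be required.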

BR,
Kewen
