Hi Richard,

on 2020/9/21 下午2:50, Richard Sandiford wrote:
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> Hi Richard,
>>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>>> Hi,
>>>>
>>>> The commit r11-3230 brings a nice improvement to use full
>>>> vectors instead of partial vectors when available.  But
>>>> it caused some vector with length test cases to fail on
>>>> Power.
>>>>
>>>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
>>>> exposed one issue that: we call function 
>>>> vect_need_peeling_or_partial_vectors_p in function
>>>> vect_analyze_loop_2, since it's in analysis phase, for
>>>> the epilogue loop, we could get the wrong information like
>>>> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
>>>> answer for vect_need_peeling_or_partial_vectors_p.
>>>> For the epilogue loop in this failure specific, the niter
>>>> that we get is 58 (should be 1), vf is 2.
>>>>
>>>> For epilogue loop with partial vectors, it would use the
>>>> same VF as the main loop, so it won't be able to use full
>>>> vector, this patch is to exclude epilogue loop for the
>>>> check vect_need_peeling_or_partial_vectors_p in
>>>> vect_analyze_loop_2.
>>>
>>> Hmm.  For better or worse, I think the analysis phase is actually
>>> operating on the assumption that the vector code needs to handle
>>> all scalar iterations, even in the epilogue case.  We actually
>>> rely on that to implement VECT_COMPARE_COSTS (although it wasn't
>>> the original motivation for handling epilogues this way).
>>>
>>> Perhaps we should expand the functionality (and name)
>>> of determine_peel_for_niter so that it also computes
>>> LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
>>> that flag once we know the final epilogue scalar iteration count,
>>> just like we recompute LOOP_VINFO_PEELING_FOR_NITER.
>>>
>>> As a sanity check, I think it would also be good for the
>>> renamed function to do the following parts of vect_analyze_loop_2:
>>>
>>>   /* If epilog loop is required because of data accesses with gaps,
>>>      one additional iteration needs to be peeled.  Check if there is
>>>      enough iterations for vectorization.  */
>>>   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>>       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>>>     {
>>>       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>>
>>>       if (known_lt (wi::to_widest (scalar_niters), vf))
>>>     return opt_result::failure_at (vect_location,
>>>                                    "loop has no enough iterations to"
>>>                                    " support peeling for gaps.\n");
>>>     }
>>>
>>>   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>>>      to be able to handle fewer than VF scalars, or needs to have a lower VF
>>>      than the main loop.  */
>>>   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>>>       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>>>                LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>>>     return opt_result::failure_at (vect_location,
>>>                                "Vectorization factor too high for"
>>>                                " epilogue loop.\n");
>>>
>>> and then assert that no failure occurs when called for epilogues
>>> from vect_do_peeling.  So the function would be doing three things:
>>>
>>> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
>>>   in vect_analyze_loop_2
>>>
>>> - perform the checks quoted above
>>>
>>> - what the function currently does
>>>
>>> That's all speculation though -- I haven't tried any of this.
>>>
>>
>> Thanks for your comments!  I'm curious that are your suggestions mainly
>> for future extension?  IIUC, currently if the partial vector usage is 2,
>> we will have no epilogue loops, if the partial vector usage is 1, the
>> epilogue loop uses the same VF as the one of the main loop, its total
>> iterations count is less than VF, it has no chance to use full vector.
>> It looks to exclude epilogue loops is enough for now.
> 
> The problem is that partial-vector-usage==2 does not guarantee that all
> loops will be able to use partial vectors.  It's still subject to what
> the target and the vectoriser support.  And on AArch64, we compare up
> to 6 implementations of a loop side-by-side:
> 
>   - up to 4 SVE implementations, for different levels of unpacking
>   - up to 2 Advanced SIMD implementations (128-bit and 64-bit)
> 
> So even today, we have some loop_vinfos that can use predication and
> some loop_vinfos that can't.  We can then end up analysing a loop_vinfo
> once, but with two conflicting uses in mind:
> 
>   (a) as an epilogue loop for a main loop that can't use predication, or
>   (b) as a replacement for that main loop.
> 
> If analysis succeeds, we decide based on the cost of the loop_vinfo
> whether to go for (b) or (a).
> 
> To handle this correctly, we need to make the analysis for (b) as
> close as possible to the analysis without (a).
> 
> There's also a conceptual objection: it seems wrong to check for
> epilogue loops here, but not do the same for the later:
> 

Yeah, good point!

>   determine_peel_for_niter (loop_vinfo);
> 
> even though the information is equally wrong for (a) in both cases.
> 
> So what I suggested above was trying to keep things consistent:
> 
>   - during the initial loop_vinfo analysis, all decisions based on the
>     number of iterations consider the use of the loop_vinfo as the main
>     loop.
> 
>   - if we end up using the loop_vinfo as an epilogue loop, all those
>     decisions are remade based on the final (now known) number of
>     iterations.
> 

Got it, to keep them consistent makes more sense, thanks for your
detailed explanation and your time on the rework!

BR,
Kewen

Reply via email to