Hi Richard,

> "Kewen.Lin" <li...@linux.ibm.com> writes:
>> Hi,
>>
>> The commit r11-3230 brings a nice improvement to use full
>> vectors instead of partial vectors when available.  But
>> it caused some vector with length test cases to fail on
>> Power.
>>
>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
>> exposed one issue that: we call function 
>> vect_need_peeling_or_partial_vectors_p in function
>> vect_analyze_loop_2, since it's in analysis phase, for
>> the epilogue loop, we could get the wrong information like
>> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong
>> answer for vect_need_peeling_or_partial_vectors_p.
>> For the epilogue loop in this failure specific, the niter
>> that we get is 58 (should be 1), vf is 2.
>>
>> For epilogue loop with partial vectors, it would use the
>> same VF as the main loop, so it won't be able to use full
>> vector, this patch is to exclude epilogue loop for the
>> check vect_need_peeling_or_partial_vectors_p in
>> vect_analyze_loop_2.
> 
> Hmm.  For better or worse, I think the analysis phase is actually
> operating on the assumption that the vector code needs to handle
> all scalar iterations, even in the epilogue case.  We actually
> rely on that to implement VECT_COMPARE_COSTS (although it wasn't
> the original motivation for handling epilogues this way).
> 
> Perhaps we should expand the functionality (and name)
> of determine_peel_for_niter so that it also computes
> LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
> that flag once we know the final epilogue scalar iteration count,
> just like we recompute LOOP_VINFO_PEELING_FOR_NITER.
> 
> As a sanity check, I think it would also be good for the
> renamed function to do the following parts of vect_analyze_loop_2:
> 
>   /* If epilog loop is required because of data accesses with gaps,
>      one additional iteration needs to be peeled.  Check if there is
>      enough iterations for vectorization.  */
>   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>     {
>       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
> 
>       if (known_lt (wi::to_widest (scalar_niters), vf))
>       return opt_result::failure_at (vect_location,
>                                      "loop has no enough iterations to"
>                                      " support peeling for gaps.\n");
>     }
> 
>   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>      to be able to handle fewer than VF scalars, or needs to have a lower VF
>      than the main loop.  */
>   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>                  LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>     return opt_result::failure_at (vect_location,
>                                  "Vectorization factor too high for"
>                                  " epilogue loop.\n");
> 
> and then assert that no failure occurs when called for epilogues
> from vect_do_peeling.  So the function would be doing three things:
> 
> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
>   in vect_analyze_loop_2
> 
> - perform the checks quoted above
> 
> - what the function currently does
> 
> That's all speculation though -- I haven't tried any of this.
> 

Thanks for your comments!  I'm curious that are your suggestions mainly
for future extension?  IIUC, currently if the partial vector usage is 2,
we will have no epilogue loops, if the partial vector usage is 1, the
epilogue loop uses the same VF as the one of the main loop, its total
iterations count is less than VF, it has no chance to use full vector.
It looks to exclude epilogue loops is enough for now.

BR,
Kewen

Reply via email to