"Kewen.Lin" <li...@linux.ibm.com> writes:
> Hi Richard,
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> Hi,
>>>
>>> Commit r11-3230 brings a nice improvement: using full
>>> vectors instead of partial vectors when available.  But
>>> it caused some vector-with-length test cases to fail on
>>> Power.
>>>
>>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
>>> exposed one issue: we call
>>> vect_need_peeling_or_partial_vectors_p from
>>> vect_analyze_loop_2, and since that happens during the
>>> analysis phase, for the epilogue loop we can read wrong
>>> information such as LOOP_VINFO_INT_NITERS (loop_vinfo) and
>>> so get the wrong answer from
>>> vect_need_peeling_or_partial_vectors_p.  For the epilogue
>>> loop in this particular failure, the niters we get is 58
>>> (it should be 1) while the VF is 2.
>>>
>>> An epilogue loop that uses partial vectors has the same VF
>>> as the main loop, so it will never be able to use full
>>> vectors.  This patch therefore excludes epilogue loops from
>>> the vect_need_peeling_or_partial_vectors_p check in
>>> vect_analyze_loop_2.
>> 
>> Hmm.  For better or worse, I think the analysis phase is actually
>> operating on the assumption that the vector code needs to handle
>> all scalar iterations, even in the epilogue case.  We actually
>> rely on that to implement VECT_COMPARE_COSTS (although it wasn't
>> the original motivation for handling epilogues this way).
>> 
>> Perhaps we should expand the functionality (and name)
>> of determine_peel_for_niter so that it also computes
>> LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute
>> that flag once we know the final epilogue scalar iteration count,
>> just like we recompute LOOP_VINFO_PEELING_FOR_NITER.
>> 
>> As a sanity check, I think it would also be good for the
>> renamed function to do the following parts of vect_analyze_loop_2:
>> 
>>   /* If epilog loop is required because of data accesses with gaps,
>>      one additional iteration needs to be peeled.  Check if there is
>>      enough iterations for vectorization.  */
>>   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
>>     {
>>       poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
>> 
>>       if (known_lt (wi::to_widest (scalar_niters), vf))
>>         return opt_result::failure_at (vect_location,
>>                                        "loop has no enough iterations to"
>>                                        " support peeling for gaps.\n");
>>     }
>> 
>>   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
>>      to be able to handle fewer than VF scalars, or needs to have a lower VF
>>      than the main loop.  */
>>   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
>>       && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>>       && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>>                    LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
>>     return opt_result::failure_at (vect_location,
>>                                    "Vectorization factor too high for"
>>                                    " epilogue loop.\n");
>> 
>> and then assert that no failure occurs when called for epilogues
>> from vect_do_peeling.  So the function would be doing three things:
>> 
>> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
>>   in vect_analyze_loop_2
>> 
>> - perform the checks quoted above
>> 
>> - what the function currently does
>> 
>> That's all speculation though -- I haven't tried any of this.
>> 
>
> Thanks for your comments!  I'm curious whether your suggestions are
> mainly aimed at future extensions?  IIUC, currently if the partial
> vector usage is 2, we have no epilogue loops at all; if the partial
> vector usage is 1, the epilogue loop uses the same VF as the main
> loop and its total iteration count is less than VF, so it has no
> chance to use full vectors.  It looks like excluding epilogue loops
> is enough for now.

The problem is that partial-vector-usage==2 does not guarantee that all
loops will be able to use partial vectors.  It's still subject to what
the target and the vectoriser support.  And on AArch64, we compare up
to 6 implementations of a loop side-by-side:

  - up to 4 SVE implementations, for different levels of unpacking
  - up to 2 Advanced SIMD implementations (128-bit and 64-bit)

So even today, we have some loop_vinfos that can use predication and
some loop_vinfos that can't.  We can then end up analysing a loop_vinfo
once, but with two conflicting uses in mind:

  (a) as an epilogue loop for a main loop that can't use predication, or
  (b) as a replacement for that main loop.

If analysis succeeds, we decide based on the cost of the loop_vinfo
whether to go for (b) or (a).
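
For reference, the cost comparison I mean is the one vect_analyze_loop
does via vect_better_loop_vinfo_p.  Very roughly, and glossing over
quite a few details, the choice looks like:

  /* Heavily simplified sketch -- not the actual code.
     LOOP_VINFO is the newly-analysed candidate; FIRST_LOOP_VINFO is
     the main loop chosen so far, which cannot use predication.  */
  if (vect_better_loop_vinfo_p (loop_vinfo, first_loop_vinfo))
    /* (b): use LOOP_VINFO as the main loop instead.  */
    first_loop_vinfo = loop_vinfo;
  else
    /* (a): keep FIRST_LOOP_VINFO and record LOOP_VINFO as an
       epilogue candidate for it.  */
    first_loop_vinfo->epilogue_vinfos.safe_push (loop_vinfo);

The real code also iterates over the candidate vector modes and can
keep several epilogue candidates, but that is the basic (a)-versus-(b)
decision.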

To handle this correctly, we need to make the analysis for (b) as
close as possible to the analysis we would do if (a) were not also
in play.

There's also a conceptual objection: it seems wrong to check for
epilogue loops here but not to do the same for the later call:

  determine_peel_for_niter (loop_vinfo);

even though the information is equally wrong for (a) in both cases.

So what I suggested above was trying to keep things consistent:

  - during the initial loop_vinfo analysis, all decisions based on the
    number of iterations consider the use of the loop_vinfo as the main
    loop.

  - if we end up using the loop_vinfo as an epilogue loop, all those
    decisions are remade based on the final (now known) number of
    iterations.
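
Very roughly, the shape I have in mind is something like the following.
This is only an untested sketch: the name of the combined function is
invented and the "..." bodies stand for the pieces described above.

  /* Sketch: fold the LOOP_VINFO_USING_PARTIAL_VECTORS_P decision, the
     two checks quoted earlier and the current determine_peel_for_niter
     work into a single function that can be rerun later.  */
  static opt_result
  determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo)
  {
    /* (1) Compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the logic
       that currently lives in vect_analyze_loop_2.  */
    ...

    /* (2) The "peeling for gaps" and "VF too high for epilogue" checks
       quoted above, still returning opt_result::failure_at on failure.  */
    ...

    /* (3) What determine_peel_for_niter does today, i.e. compute
       LOOP_VINFO_PEELING_FOR_NITER.  */
    ...

    return opt_result::success ();
  }

vect_analyze_loop_2 would call this where it makes those decisions
today, and vect_do_peeling would call it again for the epilogue
loop_vinfo once the final scalar iteration count is known, asserting
that it cannot fail at that point.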

Of course, it would be better if the code was structured differently
and so these things could be done more directly.  E.g. it would be
good if:

(1) the analysis phase was split up more, so that it's possible to reuse
    parts of the core analysis and just vary "epilogueness".

(2) the actual number of epilogue iterations was available during the
    analysis phase (at least if the number is constant).

But the structure of the vectoriser tends to make things more difficult
than they should be.

In a sense, the changes I suggested in the quote above are moving
towards (1), so I think they are forward progress of a kind.

Thanks,
Richard
