Thanks for looking at this.

"Kewen.Lin" <li...@linux.ibm.com> writes:
> Hi,
>
> The commit r11-3230 brings a nice improvement to use full
> vectors instead of partial vectors when available, but it
> caused some vector-with-length test cases to fail on Power.
>
> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c
> exposed one issue: we call vect_need_peeling_or_partial_vectors_p
> from vect_analyze_loop_2, which runs during the analysis phase,
> so for the epilogue loop we can get wrong information such as
> LOOP_VINFO_INT_NITERS (loop_vinfo), and hence a wrong answer
> from vect_need_peeling_or_partial_vectors_p.  For this particular
> failure, the niters we get for the epilogue loop is 58 (it should
> be 1) and the vf is 2.
>
> Since an epilogue loop with partial vectors uses the same VF as
> the main loop, it won't be able to use full vectors.  This patch
> therefore excludes the epilogue loop from the
> vect_need_peeling_or_partial_vectors_p check in
> vect_analyze_loop_2.
Hmm.  For better or worse, I think the analysis phase is actually
operating on the assumption that the vector code needs to handle all
scalar iterations, even in the epilogue case.  We actually rely on
that to implement VECT_COMPARE_COSTS (although it wasn't the original
motivation for handling epilogues this way).

Perhaps we should expand the functionality (and name) of
determine_peel_for_niter so that it also computes
LOOP_VINFO_USING_PARTIAL_VECTORS_P.  We'd then recompute that flag
once we know the final epilogue scalar iteration count, just like we
recompute LOOP_VINFO_PEELING_FOR_NITER.

As a sanity check, I think it would also be good for the renamed
function to do the following parts of vect_analyze_loop_2:

  /* If epilog loop is required because of data accesses with gaps,
     one additional iteration needs to be peeled.  Check if there is
     enough iterations for vectorization.  */
  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
    {
      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
      tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);

      if (known_lt (wi::to_widest (scalar_niters), vf))
	return opt_result::failure_at (vect_location,
				       "loop has no enough iterations to"
				       " support peeling for gaps.\n");
    }

  /* If we're vectorizing an epilogue loop, the vectorized loop
     either needs to be able to handle fewer than VF scalars, or needs
     to have a lower VF than the main loop.  */
  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
		   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
    return opt_result::failure_at (vect_location,
				   "Vectorization factor too high for"
				   " epilogue loop.\n");

and then assert that no failure occurs when called for epilogues from
vect_do_peeling.
So the function would be doing three things:

- compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code
  in vect_analyze_loop_2
- perform the checks quoted above
- what the function currently does

That's all speculation though -- I haven't tried any of this.

Andrea, how should we handle this?  Is it something you'd have time
to look at?

Thanks,
Richard

>
> The failure on gcc.target/powerpc/p9-vec-length-full-6.c
> is just a test issue: the 64bit/32bit pairs are able to use
> full vectors, and the test is adjusted in the patch accordingly.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
>
> Is it OK for trunk?
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> 	* tree-vect-loop.c (vect_analyze_loop_2): Don't check
> 	vect_need_peeling_or_partial_vectors_p for the epilogue loop.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/p9-vec-length-full-6.c: Adjust.
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> index cfae9bbc927..5d2357aabfa 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> @@ -9,8 +9,7 @@
>  #include "p9-vec-length-6.h"
>
>  /* It can use normal vector load for constant vector load.  */
> -/* { dg-final { scan-assembler-not {\mstxv\M} } } */
> -/* { dg-final { scan-assembler-not {\mlxvx\M} } } */
> -/* { dg-final { scan-assembler-not {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
> +/* 64bit/32bit pairs won't use partial vectors.  */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index ab627fbf029..7273e998a99 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2278,7 +2278,8 @@ start_over:
>      {
>        /* Don't use partial vectors if we don't need to peel the loop.  */
>        if (param_vect_partial_vector_usage == 0
> -	  || !vect_need_peeling_or_partial_vectors_p (loop_vinfo))
> +	  || (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +	      && !vect_need_peeling_or_partial_vectors_p (loop_vinfo)))
> 	LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
>        else if (vect_verify_full_masking (loop_vinfo)
> 	       || vect_verify_loop_lens (loop_vinfo))