Hi Richard, on 2020/9/21 下午2:50, Richard Sandiford wrote: > "Kewen.Lin" <li...@linux.ibm.com> writes: >> Hi Richard, >>> "Kewen.Lin" <li...@linux.ibm.com> writes: >>>> Hi, >>>> >>>> The commit r11-3230 brings a nice improvement to use full >>>> vectors instead of partial vectors when available. But >>>> it caused some vector with length test cases to fail on >>>> Power. >>>> >>>> The failure on gcc.target/powerpc/p9-vec-length-epil-7.c >>>> exposed one issue that: we call function >>>> vect_need_peeling_or_partial_vectors_p in function >>>> vect_analyze_loop_2, since it's in analysis phase, for >>>> the epilogue loop, we could get the wrong information like >>>> LOOP_VINFO_INT_NITERS (loop_vinfo), further get the wrong >>>> answer for vect_need_peeling_or_partial_vectors_p. >>>> For the epilogue loop in this failure specific, the niter >>>> that we get is 58 (should be 1), vf is 2. >>>> >>>> For epilogue loop with partial vectors, it would use the >>>> same VF as the main loop, so it won't be able to use full >>>> vector, this patch is to exclude epilogue loop for the >>>> check vect_need_peeling_or_partial_vectors_p in >>>> vect_analyze_loop_2. >>> >>> Hmm. For better or worse, I think the analysis phase is actually >>> operating on the assumption that the vector code needs to handle >>> all scalar iterations, even in the epilogue case. We actually >>> rely on that to implement VECT_COMPARE_COSTS (although it wasn't >>> the original motivation for handling epilogues this way). >>> >>> Perhaps we should expand the functionality (and name) >>> of determine_peel_for_niter so that it also computes >>> LOOP_VINFO_USING_PARTIAL_VECTORS_P. We'd then recompute >>> that flag once we know the final epilogue scalar iteration count, >>> just like we recompute LOOP_VINFO_PEELING_FOR_NITER. >>> >>> As a sanity check, I think it would also be good for the >>> renamed function to do the following parts of vect_analyze_loop_2: >>> >>> /* If epilog loop is required because of data accesses with gaps, >>> one additional iteration needs to be peeled. Check if there is >>> enough iterations for vectorization. */ >>> if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) >>> && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) >>> && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) >>> { >>> poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); >>> tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo); >>> >>> if (known_lt (wi::to_widest (scalar_niters), vf)) >>> return opt_result::failure_at (vect_location, >>> "loop has no enough iterations to" >>> " support peeling for gaps.\n"); >>> } >>> >>> /* If we're vectorizing an epilogue loop, the vectorized loop either needs >>> to be able to handle fewer than VF scalars, or needs to have a lower VF >>> than the main loop. */ >>> if (LOOP_VINFO_EPILOGUE_P (loop_vinfo) >>> && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) >>> && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo), >>> LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo))) >>> return opt_result::failure_at (vect_location, >>> "Vectorization factor too high for" >>> " epilogue loop.\n"); >>> >>> and then assert that no failure occurs when called for epilogues >>> from vect_do_peeling. So the function would be doing three things: >>> >>> - compute LOOP_VINFO_USING_PARTIAL_VECTORS_P, using the current code >>> in vect_analyze_loop_2 >>> >>> - perform the checks quoted above >>> >>> - what the function currently does >>> >>> That's all speculation though -- I haven't tried any of this. >>> >> >> Thanks for your comments! I'm curious that are your suggestions mainly >> for future extension? IIUC, currently if the partial vector usage is 2, >> we will have no epilogue loops, if the partial vector usage is 1, the >> epilogue loop uses the same VF as the one of the main loop, its total >> iterations count is less than VF, it has no chance to use full vector. >> It looks to exclude epilogue loops is enough for now. > > The problem is that partial-vector-usage==2 does not guarantee that all > loops will be able to use partial vectors. It's still subject to what > the target and the vectoriser support. And on AArch64, we compare up > to 6 implementations of a loop side-by-side: > > - up to 4 SVE implementations, for different levels of unpacking > - up to 2 Advanced SIMD implementations (128-bit and 64-bit) > > So even today, we have some loop_vinfos that can use predication and > some loop_vinfos that can't. We can then end up analysing a loop_vinfo > once, but with two conflicting uses in mind: > > (a) as an epilogue loop for a main loop that can't use predication, or > (b) as a replacement for that main loop. > > If analysis succeeds, we decide based on the cost of the loop_vinfo > whether to go for (b) or (a). > > To handle this correctly, we need to make the analysis for (b) as > close as possible to the analysis without (a). > > There's also a conceptual objection: it seems wrong to check for > epilogue loops here, but not do the same for the later: >
Yeah, good point! > determine_peel_for_niter (loop_vinfo); > > even though the information is equally wrong for (a) in both cases. > > So what I suggested above was trying to keep things consistent: > > - during the initial loop_vinfo analysis, all decisions based on the > number of iterations consider the use of the loop_vinfo as the main > loop. > > - if we end up using the loop_vinfo as an epilogue loop, all those > decisions are remade based on the final (now known) number of > iterations. > Got it, to keep them consistent makes more sense, thanks for your detailed explanation and your time on the rework! BR, Kewen