https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

            Bug ID: 110310
           Summary: vector epilogue handling is inefficient
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

It looks like we apply some analysis only when transforming the main vector
loop.  In particular vect_do_peeling does the following, which elides a vector
epilogue after costing.

  /* If we know the number of scalar iterations for the main loop we should
     check whether after the main loop there are enough iterations left over
     for the epilogue.  */
  if (vect_epilogues
      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && prolog_peeling >= 0
      && known_eq (vf, lowest_vf))
    {
      unsigned HOST_WIDE_INT eiters
        = (LOOP_VINFO_INT_NITERS (loop_vinfo)
           - LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));

      eiters -= prolog_peeling;
      eiters
        = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);

      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
        {
          delete epilogue_vinfo;
          epilogue_vinfo = NULL;
          if (loop_vinfo->epilogue_vinfos.length () == 0)
            {
              vect_epilogues = false;
              break;
            }
          epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
          loop_vinfo->epilogue_vinfos.ordered_remove (0);
        }
      vect_epilogues_updated_niters = true;

So for example for the loop

void foo (int * __restrict a, int *b)
{
  for (int i = 0; i < 20; ++i)
    a[i] = b[i] + 42;
}

we end up with no vectorized epilogue when using AVX512: the AVX2 epilogue is
discarded, but we would like to use an SSE2 epilogue instead.

It seems that what vect_determine_partial_vectors_and_peeling, as called from
vect_update_epilogue_niters, decides should already have been determined when
analyzing the epilogue, but during the epilogue costing the loop_vinfo still
inherits the main loop NITER.

For the testcase at hand we are somewhat saved by BB vectorization.  With
partial loop vectorization, however, we unnecessarily get an AVX512 masked
epilogue here.  The cost model does not get a chance to see the updated known
niter for the epilogue, nor would there be a meaningful way to use it when
costs are compared, because we have no way of estimating, for example, the
number of masked-out lanes.