https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
   Last reconfirmed|                            |2023-01-16
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
One issue is that we at most perform one epilogue loop vectorization, so with
AVX512 we vectorize the epilogue with AVX2 but its epilogue remains
unvectorized.  With AVX512 we'd want to use a fully masked epilogue using
AVX512 instead.

I started working on fully masked vectorization support for AVX512 but
got distracted.

Another option would be to use SSE vectorization for the epilogue
(note for SSE we vectorize the epilogue with 64bit half-SSE vectors!),
which would mean giving the target (some) control over the mode used
for vectorizing the epilogue.  That is, in vect_analyze_loop change

  /* For epilogues start the analysis from the first mode.  The motivation
     behind starting from the beginning comes from cases where the VECTOR_MODES
     array may contain length-agnostic and length-specific modes.  Their
     ordering is not guaranteed, so we could end up picking a mode for the main
     loop that is after the epilogue's optimal mode.  */
  vector_modes[0] = autodetected_vector_mode;

to go through a target hook (possibly first produce a "candidate mode" set
and allow the target to prune that).  This might be an "easy" fix for the
AVX512 issue for low-trip loops.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
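
To make the epilogue issue from comment #1 concrete, here is a small
stand-alone model of how a trip count gets split between the vector main
loop, the single vectorized epilogue, and the scalar tail.  This is only a
sketch and not GCC code: split_trip_count and struct trip_split are
hypothetical names, and the lane counts assume 32-bit elements (16 lanes for
a 512-bit vector, 8 for a 256-bit one), which the comment does not state.
It also ignores the cost model and any minimum-iteration thresholds the
vectorizer may apply.

```c
#include <assert.h>

/* Hypothetical model, not part of GCC: number of scalar iterations
   handled by each stage of a vectorized loop.  */
struct trip_split
{
  unsigned main_iters;      /* covered by the vector main loop */
  unsigned epilogue_iters;  /* covered by the one vectorized epilogue */
  unsigned scalar_iters;    /* left for the unvectorized tail */
};

/* Split N iterations given the vectorization factor of the main loop
   (MAIN_VF) and of the epilogue (EPILOGUE_VF).  Only one epilogue is
   vectorized, so whatever it cannot cover stays scalar.  */
static struct trip_split
split_trip_count (unsigned n, unsigned main_vf, unsigned epilogue_vf)
{
  struct trip_split s;

  assert (main_vf > 0 && epilogue_vf > 0);

  /* Full vector iterations of the main loop.  */
  s.main_iters = n / main_vf * main_vf;

  /* Full vector iterations of the epilogue on what remains.  */
  unsigned rest = n - s.main_iters;
  s.epilogue_iters = rest / epilogue_vf * epilogue_vf;

  /* Anything left runs one element at a time.  */
  s.scalar_iters = rest - s.epilogue_iters;
  return s;
}
```

Under those assumptions, split_trip_count (31, 16, 8) yields 16 iterations
in the main loop, 8 in the AVX2 epilogue and 7 scalar iterations, so almost
a quarter of a low-trip loop runs unvectorized.  A fully masked AVX512
epilogue would cover the remaining 15 iterations in one masked vector
iteration instead.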