https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346
--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Tamar Christina from comment #4)
> > I believe the problem is actually g:27842e2a1eb26a7eae80b8efd98fb8c8bd74a68e
> >
> > We added an optab for the widening left shift pattern there; however, the
> > operation requires a uniform shift constant to work. See
> > https://godbolt.org/z/4hqKc69Ke
> >
> > The existing pattern that deals with this is vect_recog_widen_shift_pattern,
> > which is a scalar pattern. During build_slp it validates that the constants
> > are the same, and when they're not it aborts SLP. This is why we lose
> > vectorization. Eventually we hit V4HI, for which we have no widening shift
> > optab, and it vectorizes using that low VF.
> >
> > This example shows a number of things wrong:
> >
> > 1. The generic costing seems off; this sequence shouldn't have been
> > generated, as a vector sequence it's less efficient than the scalar
> > sequence. Using -mcpu=neoverse-n1 or any other costing structure correctly
> > gives scalar only.
> >
> > 2. vect_recog_widen_shift_pattern is implemented in the wrong place. It
> > predates the existence of the SLP pattern matcher. Because of the
> > uniformity requirement it's better to use the SLP pattern matcher, where we
> > have access to all the constants to decide whether the pattern is a match
> > or not. That way we don't abort SLP. Are you ok with this as a fix, Richi?
>
> patterns are difficult beasts - I think vect_recog_widen_shift_pattern is
> at the correct place but instead what is lacking is SLP discovery support
> for scrapping it - that is, ideally the vectorizer would take patterns as
> a hint and ignore them when they are not helpful.

Hmm, yes, but the problem is that we've already consumed additional related
statements, which now need to be handled by build_slp as well. I suppose you
could do an in-place build_slp on the pattern stmt seq iterator.
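For reference, a minimal sketch of the kind of source that runs into the
uniform-constant restriction (this is not the bug's actual testcase; the
function name and the shift amounts 4 and 6 are invented for illustration):

```c
#include <stdint.h>

/* Each SLP lane uses a different shift amount, so a widening-shift optab
   that requires a uniform immediate cannot be used, and SLP discovery
   currently gives up on the scalar widen-shift pattern.  */
void
widen_shift_nonuniform (const uint16_t *src, uint32_t *dst, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      dst[i]     = (uint32_t) src[i]     << 4;  /* shift by 4 */
      dst[i + 1] = (uint32_t) src[i + 1] << 6;  /* shift by 6: non-uniform */
    }
}
```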
Though that seems like undoing a mistake.

> Now - in theory, for SLP vectorization, all patterns could be handled
> as SLP patterns and scalar patterns disabled. But that isn't easy to
> do either.

As long as we still have the non-SLP loop vect, that's probably not a good
idea, no? Since we'd then lose all patterns for it. The widening shift was
already sufficiently limited that it wouldn't really regress here.

> I fear to fight this regression the easiest route is to pretend the
> ISA can do widen shift by vector and fixup in the expander ...

I can do this, but then we're hiding the cost. Or did you want me to fudge
the numbers in the costing hooks?

> > 3. The epilogue costing seems off.
> >
> > This example https://godbolt.org/z/YoPcWv6Td ends up generating an
> > exceptionally high epilogue cost and so thinks vectorization at the
> > higher VF is not profitable:
> >
> > *src1_18(D) 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 8B] 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 10B] 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 12B] 1 times vec_to_scalar costs 2 in epilogue
> > MEM[(uint16_t *)src1_18(D) + 14B] 1 times vec_to_scalar costs 2 in epilogue
> > /app/example.c:16:12: note: Cost model analysis for part in loop 0:
> > Vector cost: 23
> > Scalar cost: 17
>
> I don't see any epilogue cost - the example doesn't have a loop. With BB
> vect you could see no epilogue costs?

That was my expectation too, but see e.g. https://godbolt.org/z/MGEMYEe86 -
the SLP dump shows the above output. I don't understand where the
vec_to_scalar costs come from.

> > For some reason it thinks it needs a scalar epilogue? Using
> > -fno-vect-cost-model gives the desired codegen.
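For context, a hedged reconstruction of the shape of straight-line code that
would feed BB SLP with eight consecutive uint16_t loads like the dump above
(the actual godbolt example is not reproduced in this comment; the uniform
shift amount and function name are assumptions):

```c
#include <stdint.h>

/* Assumed shape of the BB-vectorization candidate: eight consecutive
   uint16_t loads, each widened and shifted by a uniform constant.  With
   BB SLP there is no loop, so no scalar epilogue should be involved, yet
   the cost model charges a vec_to_scalar cost per element.  */
void
widen_shift_block (const uint16_t *restrict src1, uint32_t *restrict dst)
{
  dst[0] = (uint32_t) src1[0] << 1;
  dst[1] = (uint32_t) src1[1] << 1;
  dst[2] = (uint32_t) src1[2] << 1;
  dst[3] = (uint32_t) src1[3] << 1;
  dst[4] = (uint32_t) src1[4] << 1;
  dst[5] = (uint32_t) src1[5] << 1;
  dst[6] = (uint32_t) src1[6] << 1;
  dst[7] = (uint32_t) src1[7] << 1;
}
```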