https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Potential regression on     |[11/12/13 Regression]
                   |vectorization of left shift |Potential regression on
                   |with constants since        |vectorization of left shift
                   |r11-5160-g9fc9573f9a5e94    |with constants since
                   |                            |r11-5160-g9fc9573f9a5e94
           Priority|P3                          |P2
                 CC|                            |rguenth at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |tnfchris at gcc dot gnu.org
   Target Milestone|---                         |11.5
             Status|NEW                         |ASSIGNED

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
I believe the problem was actually introduced by g:27842e2a1eb26a7eae80b8efd98fb8c8bd74a68e

That commit added an optab for the widening left shift pattern; however, the
operation requires a uniform shift constant to work. See
https://godbolt.org/z/4hqKc69Ke
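As a hypothetical illustration (not the code behind the godbolt link above), the C below shows the two shapes involved. Each uint16_t element is zero-extended to uint32_t and then shifted left. In the first function every lane uses the same constant, which is what a widening shift by an immediate (such as AArch64 USHLL) can express. In the second, the manually unrolled body uses a different constant per lane, so no single immediate fits the group. The function names and constants are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: uniform widening left shift.
   Every lane shifts by the same constant (4), so one widening-shift
   instruction with an immediate operand could cover the whole group. */
void widen_shift_uniform(uint32_t *restrict dst, const uint16_t *restrict src,
                         size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)src[i] << 4;
}

/* Hypothetical example: non-uniform widening left shift.
   The unrolled body forms a group of four lanes whose shift amounts
   differ (1, 2, 3, 4), so no single immediate fits all of them. */
void widen_shift_mixed(uint32_t *restrict dst, const uint16_t *restrict src,
                       size_t n)
{
    assert(n % 4 == 0); /* this sketch only handles whole groups of four */
    for (size_t i = 0; i < n; i += 4) {
        dst[i + 0] = (uint32_t)src[i + 0] << 1;
        dst[i + 1] = (uint32_t)src[i + 1] << 2;
        dst[i + 2] = (uint32_t)src[i + 2] << 3;
        dst[i + 3] = (uint32_t)src[i + 3] << 4;
    }
}
```

Widening before the shift matters: 0xFFFF << 4 does not fit in 16 bits, so the result must be computed in the wider type.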

The existing pattern that handles this is vect_recog_widen_shift_pattern,
which is a scalar pattern.  During build_slp we validate that the constants are
the same, and when they're not, SLP is aborted.  This is why we lose
vectorization.  Eventually we reach V4HI, for which we have no widening shift
optab, and it vectorizes using that lower VF.
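The failure mode described above can be modelled in a few lines of C. In this toy model, the scalar pattern has already committed each statement to a widening shift by its own constant. Only afterwards does the SLP build compare the lanes, and a mismatch rejects the whole group. The struct and function names are hypothetical and do not correspond to GCC's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one SLP lane after the scalar pattern has
   rewritten it: the statement is already a widening shift by `amount`. */
struct widen_shift_lane {
    int amount; /* constant shift amount chosen per statement */
};

/* Returns true if every lane in the group shifts by the same constant.
   In this toy model a false result corresponds to the SLP build
   rejecting the group, which is where vectorization is lost. */
bool slp_group_shift_uniform(const struct widen_shift_lane *lanes, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (lanes[i].amount != lanes[0].amount)
            return false;
    return true;
}
```

The key point the model captures is ordering. The per-statement decision is made before any lane can see its neighbours' constants, so the only remaining option at SLP build time is to give up.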

This example exposes several problems:

1. The generic costing seems off: this sequence shouldn't have been generated,
since as a vector sequence it is less efficient than the scalar sequence. Using
-mcpu=neoverse-n1, or any other costing structure, correctly gives only scalar code.

2. vect_recog_widen_shift_pattern is implemented in the wrong place: it
predates the SLP pattern matcher. Because of the uniformity requirement, it's
better to use the SLP pattern matcher, where we have access to all the constants
and can decide whether the pattern matches.  That way we don't abort SLP. Are you
OK with this as a fix, Richi?

3. The epilogue costing seems off.

This example, https://godbolt.org/z/YoPcWv6Td, ends up with an exceptionally
high epilogue cost, so the vectorizer concludes that vectorization at the higher
VF is not profitable.

*src1_18(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 8B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 10B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 12B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(uint16_t *)src1_18(D) + 14B] 1 times vec_to_scalar costs 2 in epilogue
/app/example.c:16:12: note: Cost model analysis for part in loop 0:
  Vector cost: 23
  Scalar cost: 17

For some reason it thinks it needs a scalar epilogue? Using
-fno-vect-cost-model gives the desired codegen.
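The fix proposed in item 2 can be sketched as a toy model in C. The match decision moves to the point where every lane's constant is visible at once. A uniform group becomes one widening shift by an immediate. A non-uniform group is left as a plain widen followed by a per-lane shift rather than aborting SLP. The enum, both function names, and the assumption that the target can shift each lane by its own amount (as AArch64's register-form USHL can) are illustrative. This is not GCC's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes for an SLP group of widening left shifts. */
enum widen_shift_plan {
    PLAN_WIDEN_SHIFT_IMM,  /* one widening shift by a uniform immediate */
    PLAN_WIDEN_THEN_SHIFT  /* widen first, then shift each lane by its own amount */
};

/* Toy model of deciding the match with all lane constants visible.
   A non-uniform group does not abort SLP; it simply is not rewritten
   into the widening-shift form. */
enum widen_shift_plan plan_widen_shift(const int *amounts, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (amounts[i] != amounts[0])
            return PLAN_WIDEN_THEN_SHIFT;
    return PLAN_WIDEN_SHIFT_IMM;
}

/* Emulates one group under the chosen plan.  Both plans compute
   dst[i] = (uint32_t)src[i] << amounts[i]; they differ only in which
   instructions a real backend would use. */
void run_group(uint32_t *dst, const uint16_t *src, const int *amounts,
               size_t n, enum widen_shift_plan plan)
{
    for (size_t i = 0; i < n; i++)
        assert(amounts[i] >= 0 && amounts[i] < 32);

    if (plan == PLAN_WIDEN_SHIFT_IMM) {
        int imm = amounts[0]; /* valid only because all amounts are equal */
        for (size_t i = 0; i < n; i++)
            dst[i] = (uint32_t)src[i] << imm;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];              /* widen */
        for (size_t i = 0; i < n; i++)
            dst[i] <<= amounts[i];        /* per-lane shift */
    }
}
```

Both plans produce identical results. The choice only affects which instructions are used, which is why falling back to the two-step form is safe and keeps the group vectorizable.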
