https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101842

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
             Blocks|                            |53947
   Last reconfirmed|                            |2021-08-10
     Ever confirmed|0                           |1
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that there's no symbolic expression to compute the number of
iterations since the number of iterations depends on data computed inside the
loop.  We require a symbolic number of iterations in various places since
we're using a canonical IV for loop control.  There's also the dynamic cost
check which depends on the number of vector iterations - I suppose for this
kind of loop we'd have to statically assert the vectorization is always
profitable.
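
To illustrate the distinction, here is a minimal sketch. It is not the
testcase from this report; the function names and the double type are
assumptions for illustration only. The first loop has a trip count that is
a symbolic expression (n) known on entry, the second exits based on a value
updated inside the loop.

```c
#include <stddef.h>

/* Countable: the number of iterations is the symbolic expression n,
   which is available before the loop starts.  */
double sum_countable(const double *x, size_t n)
{
    double d = 0.0;
    for (size_t i = 0; i < n; i++)
        d += x[i] * x[i];
    return d;
}

/* Data-dependent exit: the test reads d, which the body updates from
   memory on every iteration, so the iteration count cannot be written
   as an expression over values known on loop entry.  The i < n test
   only keeps the accesses in bounds.  */
size_t steps_until_below(const double *x, size_t n, double d, double limit)
{
    size_t i = 0;
    while (i < n && d > limit) {
        d -= x[i] * x[i];
        i++;
    }
    return i;
}
```

With x = {1, 2, 3, 4}, d = 10 and limit = 0 the second function stops after
three steps, but that is only known after running the loop.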

Confirmed, though: we can't vectorize this loop.  We should eventually
vectorize the basic block.  We currently don't because the reduction handling
does not yet implement the mixed +/- case, and we see

  _41 = powmult_3 + powmult_5;
  _42 = powmult_7 + _41;
  _43 = powmult_9 + _42;
  d_25 = d_35 - _43;

We detect this as a reduction of 5 lanes and fail to see the opportunity to
reduce the 4 lanes with PLUS and then do the final minus with the remaining
(unvectorized) scalar.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f9ca24415a2..33b21c8c247 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5666,10 +5666,12 @@ vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
                {
                  if (chain[i].dt != vect_internal_def)
                    invalid_cst = true;
-                 else if (chain[i].code != code)
-                   invalid_op = true;
                  else
-                   valid_lanes++;
+                   {
+                     valid_lanes++;
+                     if (chain[i].code != code)
+                       invalid_op = true;
+                   }
                }
              if (!invalid_op && !invalid_cst)
                {

With this change we then properly print:

t.c:4:27: optimized:  BB reduction missed with 5 lanes

The one lane with a different operation could be handled similarly to the
not-yet-supported constant: we need to record this operand and apply it to
the reduction in the epilogue.
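
As a source-level sketch of that idea (not vectorizer code; the function
names, the lanes array and the double type are assumptions, the real operand
type is not shown in the dump), the four powmult operands form the PLUS lanes
and the recorded operand d is applied with MINUS after the reduction:

```c
/* Scalar form mirroring the four quoted statements.  */
double reduce_scalar(double d, double p3, double p5, double p7, double p9)
{
    double t41 = p3 + p5;
    double t42 = p7 + t41;
    double t43 = p9 + t42;
    return d - t43;
}

/* Split form: reduce the four PLUS lanes first, then apply the
   remaining scalar operand in an epilogue step.  */
double reduce_split(double d, const double lanes[4])
{
    double sum = 0.0;
    for (int i = 0; i < 4; i++)   /* stands in for the vector PLUS reduction */
        sum += lanes[i];
    return d - sum;               /* epilogue: the recorded MINUS operand */
}
```

Both functions compute d - (p3 + p5 + p7 + p9).  A real vector reduction may
add the lanes in a different order, which for floating point is only valid
when reassociation is allowed.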

Let me try something.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
