https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120176

            Bug ID: 120176
           Summary: Missed reduction chain vectorization
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

We fail to handle the 2nd loop of the PR98138 testcase as a reduction chain:

    for( int i = 0; i < 4; i++ )
    {
        HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i] );
        sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
    }

because after reassoc we have inside of the loop

  sum.0_77 = (unsigned int) sum_152;
  _157 = _153 + sum.0_77;
  _88 = _157 + _132;
  _78 = _88 + _127;
  sum_108 = (int) _78;

and on the exit

  _143 = sum_108 & 65535;
  _79 = (unsigned int) _143;
  sum.1_80 = (unsigned int) sum_108;

which would be fine, but forwprop then folds the int <-> unsigned int
roundtrip and we end up with

  # sum_152 = PHI <sum_108(4), 0(3)>
  sum.0_77 = (unsigned int) sum_152;
  _157 = sum.0_77 + _153;
  _88 = _132 + _157;
  _78 = _88 + _127;
  sum_108 = (int) _78;

  _75 = _78 & 65535;
  _81 = _78 >> 16;

which then fails reduction chain detection in the vectorizer: the
reduction path ends with sum_108, but the value live on the exit
is _78, so the single-use check on _78 fails:

          if ((op.code != code && !leading_conversion)
              /* We can only handle the final value in epilogue
                 generation for reduction chains.  */
              || (i != 1 && !has_single_use (gimple_get_lhs (stmt))))
            is_slp_reduc = false;

The fix is either to avoid the folding with some extensive heuristics
or to relax the reduction chain detection, allowing either the trailing
conversion or the last computation result to be live.
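To illustrate why the forwprop folding is valid, and why either value could
serve as the live-out, here is a minimal source-level sketch. The function
names are hypothetical, and it assumes GCC's documented modulo-2^N behavior
for the unsigned -> int conversion (implementation-defined in ISO C). Both
functions compute the same exit values; the first reads them from the
trailing conversion (sum_108 in the dump), the second from the last
computation result (_78).

```c
#include <stdint.h>

/* Exit values computed from the int result of the trailing conversion,
   as in the IL before forwprop.  Hypothetical helper for illustration.  */
uint32_t exit_via_trailing_conversion (uint32_t last)
{
  int sum = (int) last;                    /* sum_108 = (int) _78 */
  uint32_t lo = (uint32_t) (sum & 65535);  /* _143, _79 */
  uint32_t hi = (uint32_t) sum >> 16;      /* sum.1_80 >> 16 */
  return lo + hi;
}

/* Exit values computed directly from the last unsigned computation
   result, as in the IL after forwprop folded the roundtrip.  */
uint32_t exit_via_last_computation (uint32_t last)
{
  uint32_t lo = last & 65535;              /* _75 */
  uint32_t hi = last >> 16;                /* _81 */
  return lo + hi;
}
```

Since the two forms agree for every input, relaxed chain detection could
accept whichever of the two values is live on the exit.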

Not handling this case as a reduction chain blocks us from using vectors
larger than V4SI for the 2nd loop.
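For experimenting with the loop quoted at the top, here is a self-contained
sketch. It assumes the PR98138 testcase follows the x264 SATD kernel: the
typedefs, the HADAMARD4 and abs2 definitions, the function name and the final
return expression are assumptions not shown in this report, chosen to match
the `& 65535` and `>> 16` exit uses in the dumps.

```c
#include <stdint.h>

typedef uint16_t sum_t;   /* assumed: one packed 16-bit partial sum */
typedef uint32_t sum2_t;  /* assumed: two partial sums packed together */
#define BITS_PER_SUM 16

/* Assumed definition: 4-point Hadamard butterfly on packed sums.  */
#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) { \
    sum2_t t0 = s0 + s1; \
    sum2_t t1 = s0 - s1; \
    sum2_t t2 = s2 + s3; \
    sum2_t t3 = s2 - s3; \
    d0 = t0 + t2; \
    d2 = t0 - t2; \
    d1 = t1 + t3; \
    d3 = t1 - t3; \
}

/* Assumed definition: absolute value of both packed 16-bit halves.  */
static inline sum2_t abs2 (sum2_t a)
{
  sum2_t s = ((a >> (BITS_PER_SUM - 1))
              & (((sum2_t) 1 << BITS_PER_SUM) + 1)) * ((sum_t) -1);
  return (a + s) ^ s;
}

/* Hypothetical wrapper around the 2nd loop; 'sum' is int as in the dump,
   so each iteration converts int -> unsigned -> int.  */
int satd_second_loop (sum2_t tmp[4][4])
{
  sum2_t a0, a1, a2, a3;
  int sum = 0;
  for (int i = 0; i < 4; i++)
    {
      HADAMARD4 (a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
      sum += abs2 (a0) + abs2 (a1) + abs2 (a2) + abs2 (a3);
    }
  /* Exit uses: low half plus high half of the packed sum.  */
  return (((sum_t) sum) + ((sum2_t) sum >> BITS_PER_SUM)) >> 1;
}
```

Compiling this with -O3 and -fdump-tree-vect-details should show whether the
loop is detected as a reduction chain, although the exact IL may differ from
the dumps in this report.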
