https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91435

            Bug ID: 91435
           Summary: Better induction variable for vectorization
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

(from https://stackoverflow.com/q/57465290/1918193)
long RegularTest(int n) {
  long sum = 0;
  for (int i = 0; i < n; ++i)
    if (i % 2 != 0)
      sum += i + 1;
  return sum;
}

Compiled with -O3 -march=skylake, the loop gets vectorized, but the resulting GIMPLE contains

  # vect_vec_iv_.14_60 = PHI <{ 0, 1, 2, 3, 4, 5, 6, 7 }(5), vect_vec_iv_.14_61(6)>
  vect_vec_iv_.14_61 = vect_vec_iv_.14_60 + { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect__3.17_66 = vect_vec_iv_.14_60 + { 2, 2, 2, 2, 2, 2, 2, 2 };

(those are the only uses of vect_vec_iv_.14_6[01])

If, apart from its own increment, the induction variable x is only ever used as
x+2, why not use x+2 as the induction variable instead: initialize it with
{2,3,4,...} and skip the +2 at every iteration?
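To make the suggestion concrete, here is a rough scalar model of the two schemes,
8 lanes wide as in the dump above. It is only a sketch: the names current_iv,
biased_iv, iv_schemes_agree, LANES and MAX_ITERS are made up for illustration
and do not correspond to anything in the vectorizer.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 8, MAX_ITERS = 16 };  /* hypothetical sizes for this model */

/* Current scheme: the IV starts at {0..7}; every iteration needs an
   extra vector add to form x + 2 before the IV is stepped by 8. */
static void current_iv(int out[][LANES], size_t iters) {
  int iv[LANES];
  for (int l = 0; l < LANES; ++l) iv[l] = l;
  for (size_t it = 0; it < iters; ++it) {
    for (int l = 0; l < LANES; ++l) out[it][l] = iv[l] + 2;  /* extra add */
    for (int l = 0; l < LANES; ++l) iv[l] += 8;
  }
}

/* Suggested scheme: fold the +2 into the initial value {2..9}, so the
   IV itself is the value the loop body uses. */
static void biased_iv(int out[][LANES], size_t iters) {
  int iv[LANES];
  for (int l = 0; l < LANES; ++l) iv[l] = l + 2;
  for (size_t it = 0; it < iters; ++it) {
    for (int l = 0; l < LANES; ++l) out[it][l] = iv[l];      /* no add */
    for (int l = 0; l < LANES; ++l) iv[l] += 8;
  }
}

/* Returns 1 if both schemes yield the same lane values in every iteration. */
int iv_schemes_agree(size_t iters) {
  static int a[MAX_ITERS][LANES], b[MAX_ITERS][LANES];
  assert(iters <= MAX_ITERS);
  current_iv(a, iters);
  biased_iv(b, iters);
  for (size_t it = 0; it < iters; ++it)
    for (int l = 0; l < LANES; ++l)
      if (a[it][l] != b[it][l]) return 0;
  return 1;
}
```

Calling iv_schemes_agree(16) compares the two for 16 vector iterations; the
second variant does one vector add per iteration instead of two.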

(There are other things to discuss about optimizing this testcase; for instance,
clang is clever enough to unroll by a factor of 2 and remove the condition. But
let's stick to the induction variable for this PR.)
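
For reference, a hand-written source-level equivalent of "unroll by 2 and remove
the condition" might look like the sketch below. This is not clang's actual
output; StridedTest and check_strided are hypothetical names used only here.

```c
#include <assert.h>

/* The testcase from the report, unchanged. */
long RegularTest(int n) {
  long sum = 0;
  for (int i = 0; i < n; ++i)
    if (i % 2 != 0)
      sum += i + 1;
  return sum;
}

/* Hypothetical equivalent: visit only odd i, so the test on i % 2
   is no longer needed. */
long StridedTest(int n) {
  long sum = 0;
  for (int i = 1; i < n; i += 2)
    sum += i + 1;
  return sum;
}

/* Asserts that both versions agree for every n in [0, limit]. */
void check_strided(int limit) {
  for (int n = 0; n <= limit; ++n)
    assert(RegularTest(n) == StridedTest(n));
}
```

Running check_strided(1000) exercises both versions on small inputs.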
