https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119860

            Bug ID: 119860
           Summary: needless vector unrolling causes less profitable
                    vectorization
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
            Blocks: 53947, 115130
  Target Milestone: ---

Consider the following loop:

#define N 512
#define END 505

long long x[N] __attribute__((aligned(32)));

int __attribute__((noipa))
foo (void)
{
  for (unsigned int i = 0; i < END; ++i)
    {
      if (x[i] > 0)
        return 1;
    }
  return -1;
}

When vectorized with -O3 -march=armv8-a, this produces:

.L2:
        add     v29.4s, v29.4s, v26.4s
        add     v28.4s, v28.4s, v27.4s
        cmp     x1, x0
        beq     .L15
.L4:
        ldp     q31, q30, [x0], 32
        cmgt    v31.2d, v31.2d, #0
        cmgt    v30.2d, v30.2d, #0
        orr     v31.16b, v31.16b, v30.16b
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x3, d31
        cbz     x3, .L2

which is suboptimal because a 2x unroll factor is forced.
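For readers less used to the assembly, here is a rough scalar model of what the vector loop above does per iteration. It is only a sketch: the name foo_model and the explicit scalar tail are made up for illustration and are not what the vectorizer emits.

```c
#include <stddef.h>

/* Hypothetical scalar model of the vector loop above.  Each iteration
   covers 4 elements, i.e. two V2DI vectors, which is the 2x unroll.  */
int
foo_model (const long long *x, size_t n)
{
  size_t i = 0;

  for (; i + 4 <= n; i += 4)
    {
      /* Two V2DI compares (the two cmgt instructions).  */
      int first = (x[i] > 0) | (x[i + 1] > 0);
      int second = (x[i + 2] > 0) | (x[i + 3] > 0);

      /* The orr/umaxp/fmov/cbz sequence: leave the loop if any lane hit.  */
      if (first | second)
        return 1;
    }

  /* Leftover elements, handled one at a time in this model.  */
  for (; i < n; ++i)
    if (x[i] > 0)
      return 1;

  return -1;
}
```

Without the unroll, the same model would step by 2 and need only one compare per iteration, with no orr to combine two masks.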

This happens because the SLP tree rooted in the if with the vector IV works on
a smaller type than the rest of the loop.

The vectorizer enforces a common vector data size rather than a common VF
across the different SLP instances, so we end up with V2DI vs V4SI, and the
V2DI statements need to be unrolled.

It could have chosen V2DI and V2SI instead.
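The arithmetic behind this can be sketched in a few lines of C. This is not vectorizer code; the helper names lanes and copies_needed are hypothetical and only illustrate how the lane counts and the VF relate.

```c
/* Hypothetical helper: number of elements held by one vector.  */
unsigned
lanes (unsigned vector_bits, unsigned elem_bits)
{
  return vector_bits / elem_bits;
}

/* Hypothetical helper: how many vector statements of a given lane count
   are needed to cover VF scalar iterations.  */
unsigned
copies_needed (unsigned vf, unsigned nlanes)
{
  return vf / nlanes;
}
```

With a fixed 128-bit vector size, the 32-bit IV gets lanes (128, 32) = 4 (V4SI) and the 64-bit data gets lanes (128, 64) = 2 (V2DI). The VF is then 4, so copies_needed (4, 2) = 2 V2DI statements are required per iteration. Fixing the VF at 2 instead would pair V2DI with the 64-bit V2SI, and each would need only one copy.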

The unroll factor ends up making early break less profitable depending on the
loop size and also prevents further optimizations.

Perhaps we should try to enforce VF rather than total vector size first?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
[Bug 115130] [meta-bug] early break vectorization
