https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117510

            Bug ID: 117510
           Summary: Inner loop with static trip count breaks vectorization
                    of outer loop
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: freddie at witherden dot org
  Target Milestone: ---

Consider the following snippet:

void f(int n, int m, double *a)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            a[i] += 2*a[i] + j;
}

where the objective is to vectorize the outer loop.  At the moment GCC refuses
to vectorize this because of the inner loop.  However, the inner loop poses no
real obstacle: its trip count m does not depend on i, so every SIMD lane runs
the same number of inner iterations.  Indeed, if a small constant is
substituted for m, the loop vectorizes fine (presumably because the inner loop
is unrolled first).
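
To make the request concrete, here is a hand-written sketch of roughly what
outer-loop vectorization of f() would amount to.  It is only an illustration,
not GCC output: the name f_outer_vec and the vector width VF = 4 are
hypothetical, and the small v[] array stands in for one vector register.
Because m is the same for every i, a group of VF outer iterations can share a
single inner j loop with no masking or per-lane exits.

```c
/* Hypothetical vector width (e.g. 4 doubles for a 256-bit register). */
#define VF 4

/* Sketch of outer-loop vectorization of f(): VF consecutive values of i are
 * processed together and the inner j loop runs once for the whole group.
 * Since m does not depend on i, all lanes do the same number of inner
 * iterations. */
void f_outer_vec(int n, int m, double *a)
{
    int i = 0;
    for (; i + VF <= n; i += VF) {
        double v[VF];                       /* stands in for one vector register */
        for (int l = 0; l < VF; l++)        /* "vector load" of a[i..i+VF-1] */
            v[l] = a[i + l];
        for (int j = 0; j < m; j++)         /* inner loop shared by all lanes */
            for (int l = 0; l < VF; l++)    /* one "vector" update per j */
                v[l] += 2 * v[l] + j;
        for (int l = 0; l < VF; l++)        /* "vector store" */
            a[i + l] = v[l];
    }
    for (; i < n; i++)                      /* scalar epilogue for leftover i */
        for (int j = 0; j < m; j++)
            a[i] += 2 * a[i] + j;
}
```

Each lane performs exactly the same floating-point operations, in the same
order, as the scalar loop, so the results match f() exactly.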

When m is known at compile time (pretty common) and the loop body is small, as
in this example, unrolling is viable.  But for larger inner loop bodies it
quickly becomes expensive and leads to a large amount of unnecessary code
bloat.  It would therefore be nice if the vectorizer could explicitly
recognize this idiom of a non-problematic inner loop, i.e. one whose trip
count is invariant in the outer loop, without having to unroll it.
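
For comparison, the sketch below shows roughly what the outer loop looks like
once the inner loop has been completely unrolled for a hypothetical
compile-time trip count of m = 3 (the function name f_m3_unrolled is
illustrative).  The remaining loop body is straight-line code, which the
vectorizer handles, but the body is replicated once per inner iteration, so
code size grows with m times the size of the inner body.

```c
/* Sketch of f() after complete unrolling of the inner loop for a
 * hypothetical constant trip count of 3.  No inner loop remains; the cost
 * is one copy of the body per former inner iteration. */
void f_m3_unrolled(int n, double *a)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        a[i] += 2 * a[i] + 0;   /* j = 0 */
        a[i] += 2 * a[i] + 1;   /* j = 1 */
        a[i] += 2 * a[i] + 2;   /* j = 2 */
    }
}
```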

(For some context, this loop structure appears frequently in PDE solvers where
some kind of iterative method must be applied at each grid point.  Typically,
with something like Newton's method, we can bound the trip count and so avoid
early-exit breaks and convergence tests, giving rise to these inner loops with
fixed trip counts.)
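
As an illustration of that pattern, the sketch below runs a fixed number of
Newton steps for x = sqrt(b) at every point, with no convergence test.  The
square root is only a stand-in for a real point-wise solve, and the names
newton_sqrt and NEWTON_ITERS, as well as the bound of 6 iterations, are
hypothetical choices made for this example (enough for inputs roughly in
[0.25, 4] when starting from the guess max(b, 1)).

```c
/* Hypothetical fixed iteration bound; adequate for inputs roughly in
 * [0.25, 4] with the initial guess used below. */
#define NEWTON_ITERS 6

/* Per-point Newton iteration for x = sqrt(b) with a fixed trip count and no
 * early exit: an outer loop over points containing an inner loop whose
 * trip count does not depend on i. */
void newton_sqrt(int n, const double *b, double *x)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        double xi = b[i] > 1.0 ? b[i] : 1.0;  /* guess >= sqrt(b[i]) for b[i] >= 0 */
        for (int k = 0; k < NEWTON_ITERS; k++)
            xi = 0.5 * (xi + b[i] / xi);      /* Newton step for xi*xi - b[i] = 0 */
        x[i] = xi;
    }
}
```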
