https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117510
Bug ID: 117510
Summary: Inner loop with static trip count breaks vectorization of outer loop
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org
Target Milestone: ---

Consider the following snippet:

    void f(int n, int m, double *a)
    {
        #pragma omp simd
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                a[i] += 2*a[i] + j;
    }

where the objective is to vectorize the outer loop. At the moment GCC refuses to vectorize this because of the inner loop. However, the inner loop presents no real obstacle and, indeed, if m is replaced by a small constant the outer loop vectorizes fine (presumably because the inner loop is fully unrolled).

When m is known at compile time (pretty common) and the loop body is small (as in this example), unrolling is viable. For larger inner loop bodies, however, unrolling quickly becomes expensive and leads to a large amount of unnecessary code bloat. It would therefore be nice if the vectorizer could explicitly recognize this idiom of a non-problematic inner loop.

(For some context, this loop structure appears frequently in PDE solvers where some kind of iterative method has to be applied at each grid point. Typically, with something like Newton's method, the trip count can be bounded, which avoids breaks and convergence tests and gives rise to these inner loops with fixed trip counts.)
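
To make the PDE-solver context concrete, here is a minimal sketch of the kind of kernel meant: a fixed number of Newton steps applied independently at each grid point. It is an illustration only and not taken from any real solver; the function name newton_sqrt, the constant NEWTON_ITERS, and the choice of solving x*x = c[i] are all assumptions made for the example. No claim is made here about whether a given GCC version vectorizes it.

```c
/* Assumed fixed bound on the Newton trip count (hypothetical value). */
#define NEWTON_ITERS 8

/*
 * Hypothetical kernel: for each grid point i, approximate sqrt(c[i])
 * with a fixed number of Newton steps on g(x) = x*x - c[i].
 * Intended for positive inputs of modest magnitude, where 8 steps
 * are enough to converge from the initial guess used below.
 */
void newton_sqrt(int n, const double *c, double *x)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        /* Initial guess at or above the root, so the iteration
           decreases monotonically towards it. */
        double xi = c[i] > 1.0 ? c[i] : 1.0;

        /* Inner loop with a compile-time trip count and no
           early exit or convergence test. */
        for (int j = 0; j < NEWTON_ITERS; j++)
            xi = 0.5 * (xi + c[i] / xi);

        x[i] = xi;
    }
}
```

Each outer iteration is independent of the others, so the outer loop is the natural one to vectorize. Compiling with -O3 -fopenmp-simd -fopt-info-vec-missed should make GCC report why a loop was not vectorized, which may help when comparing this form against a manually unrolled one.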