https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65594
--- Comment #1 from vries at gcc dot gnu.org ---
The testcase contains one loop nest of 3 loops, each with an iteration count of 500, so the inner loop body is executed 500 * 500 * 500 = 125,000,000 times:
...
#define N 500

int X[2*N], Y[2*N], B[2*N];
int A[2*N][2*N], C[2*N][2*N];

int foo(void)
{
  int i, j, k;

  for (i = 0; i < N; i++)
    {
      X[i] = Y[i] + 10;
      for (j = 0; j < N; j++)
        {
          B[j] = A[j][N];
          for (k = 0; k < N; k++)
            {
              A[j+1][k] = B[j] + C[j][k];
            }
          Y[i+j] = A[j+1][N];
        }
    }

  return A[1][5]*B[6];
}
...

The testcase uses -ftree-parallelize-loops=4 -floop-parallelize-all, and we parallelize the inner loop. That means we call __builtin_GOMP_parallel 250,000 times (once per iteration of the two enclosing loops, 500 * 500), creating 1,000,000 threads (4 per call), each of which handles 125 iterations (500 / 4).
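To make the cost accounting above easy to re-check, here is a small C sketch of the arithmetic. The helper names are hypothetical (they are not part of the testcase or of GCC), and the thread count follows the accounting used above: 4 threads per parallel region, matching -ftree-parallelize-loops=4, with the 500 iterations of the k loop split evenly across them.

```c
#include <assert.h>

#define N 500        /* iteration count of each loop in the testcase */
#define NTHREADS 4   /* from -ftree-parallelize-loops=4 */

/* Hypothetical helper: executions of the innermost loop body, N * N * N. */
static long long inner_body_executions(long long n)
{
  return n * n * n;
}

/* Hypothetical helper: the parallelized k loop is reached once per (i, j)
   pair, so one __builtin_GOMP_parallel call happens per pair: N * N. */
static long long parallel_region_entries(long long n)
{
  return n * n;
}

/* Hypothetical helper: threads per the accounting above, i.e. nthreads
   for every parallel region entered. */
static long long threads_started(long long n, long long nthreads)
{
  return parallel_region_entries(n) * nthreads;
}

/* Hypothetical helper: k-loop iterations per thread, assuming an even
   split (500 / 4 divides exactly). */
static long long iterations_per_thread(long long n, long long nthreads)
{
  assert(n % nthreads == 0);  /* the even-split assumption */
  return n / nthreads;
}
```

With N = 500 and NTHREADS = 4 these give 125,000,000 body executions, 250,000 parallel regions, 1,000,000 threads and 125 iterations per thread. Each thread therefore does very little work relative to the cost of starting a parallel region.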