https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443
vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35078|0                           |1
        is obsolete|                            |

--- Comment #10 from vries at gcc dot gnu.org ---
Created attachment 35092
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35092&action=edit
WIP patch

Updated patch which fixes probability/frequency.

The generated code for the loopfn is now identical at the optimized dump
(previously we were sinking loads into the loop nest due to the broken
probability/frequency).

The main difference in generated code at the optimized dump is this:
...
   <bb 5>:
+  n_24 = n_5(D);
   .paral_data_store.6.a = &a;
   .paral_data_store.6.b = &b;
   .paral_data_store.6.c = &c;
-  .paral_data_store.6.D.1854 = _12;
+  .paral_data_store.6.D.1854 = n_5(D);
   __builtin_GOMP_parallel (f._loopfn.0, &.paral_data_store.6, 2, 0);
-  ivtmp_27 = (signed int) _12;
-  _29 = a[ivtmp_27];
-  _30 = b[ivtmp_27];
-  _31 = _29 + _30;
-  c[ivtmp_27] = _31;
...

That is, we increase the number of iterations passed to the loopfn by one
(from n - 1 to n), and remove the peeled-off last loop iteration (the code
after the __builtin_GOMP_parallel).