https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443
vries at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #35078|0 |1
is obsolete| |
--- Comment #10 from vries at gcc dot gnu.org ---
Created attachment 35092
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35092&action=edit
WIP patch
Updated patch, which fixes the probability/frequency. The generated code for the
loopfn is now identical in the optimized dump (previously, loads were being sunk
into the loop nest because of the broken probability/frequency).
The main difference in the generated code in the optimized dump is this:
...
<bb 5>:
+ n_24 = n_5(D);
.paral_data_store.6.a = &a;
.paral_data_store.6.b = &b;
.paral_data_store.6.c = &c;
- .paral_data_store.6.D.1854 = _12;
+ .paral_data_store.6.D.1854 = n_5(D);
__builtin_GOMP_parallel (f._loopfn.0, &.paral_data_store.6, 2, 0);
- ivtmp_27 = (signed int) _12;
- _29 = a[ivtmp_27];
- _30 = b[ivtmp_27];
- _31 = _29 + _30;
- c[ivtmp_27] = _31;
...
That is, we increase the number of iterations by one (from n - 1 to n), and
remove the peeled-off last loop iteration (the code after the
__builtin_GOMP_parallel call).
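As an illustration only, here is a minimal sequential C model of the two schemes. It assumes the source loop is c[i] = a[i] + b[i] for i in [0, n), which is inferred from the removed statements in the diff above. The names loopfn, old_scheme, new_scheme and schemes_agree are hypothetical, and the sketch does not use libgomp: loopfn simply stands in for f._loopfn.0 and runs sequentially instead of being dispatched through __builtin_GOMP_parallel.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 16

/* Hypothetical stand-in for the outlined f._loopfn.0: runs iterations
   [0, niter). In GCC this is dispatched via __builtin_GOMP_parallel;
   here it runs sequentially. */
static void loopfn(const int *a, const int *b, int *c, unsigned niter)
{
  for (unsigned i = 0; i < niter; i++)
    c[i] = a[i] + b[i];
}

/* Old scheme: the parallel part runs n - 1 iterations (the _12 value),
   and the last iteration is peeled off and executed after the call. */
static void old_scheme(const int *a, const int *b, int *c, unsigned n)
{
  assert(n >= 1);            /* n - 1 would wrap around for n == 0 */
  unsigned niter = n - 1;
  loopfn(a, b, c, niter);
  c[niter] = a[niter] + b[niter]; /* peeled-off last iteration */
}

/* New scheme: the parallel part runs all n iterations; nothing is peeled. */
static void new_scheme(const int *a, const int *b, int *c, unsigned n)
{
  loopfn(a, b, c, n);
}

/* Returns 1 if both schemes produce the same c[] for 1 <= n <= MAX_N. */
static int schemes_agree(unsigned n)
{
  int a[MAX_N], b[MAX_N];
  int c_old[MAX_N] = {0}, c_new[MAX_N] = {0};
  for (unsigned i = 0; i < n; i++) {
    a[i] = (int)i;
    b[i] = (int)(2 * i + 1);
  }
  old_scheme(a, b, c_old, n);
  new_scheme(a, b, c_new, n);
  return memcmp(c_old, c_new, sizeof c_old) == 0;
}
```

Both schemes write the same elements of c; the new one just lets the parallel loop cover the final iteration instead of running it serially after the parallel region.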