https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440
--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 17 May 2019, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440
>
> --- Comment #14 from Martin Liška <marxin at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #13)
> > On Fri, 17 May 2019, marxin at gcc dot gnu.org wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440
> > >
> > > --- Comment #12 from Martin Liška <marxin at gcc dot gnu.org> ---
> > > >
> > > > Can you share -fopt-report-loop differences?  From the above I would
> > > > guess we split a lot of loops, meaning the memcpy/memmove/memset
> > > > calls are in the "middle" and we have to split loops (how many
> > > > calls are detected here?).  If that's true another way would be
> > > > to only allow calls at head or tail position, thus a single
> > > > non-builtin partition.
> > >
> > > I newly see ~1400 lines:
> > >
> > > module_configure.fppized.f90:7993:0: optimized: Loop 10 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:7995:0: optimized: Loop 11 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8000:0: optimized: Loop 15 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8381:0: optimized: Loop 77 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8383:0: optimized: Loop 78 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8498:0: optimized: Loop 105 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9742:0: optimized: Loop 169 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9978:0: optimized: Loop 207 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9979:0: optimized: Loop 208 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9980:0: optimized: Loop 209 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9981:0: optimized: Loop 210 distributed: split to 0 loops and 1 library calls.
> > > ...
> >
> > All with "0 loops"?  That disputes my theory :/
>
> Yep. All these are in a form of:
>
>   <bb 1809> [local count: 118163158]:
>   # S.1565_41079 = PHI <1(2028), S.1565_32687(3351)>
>   # ivtmp_38850 = PHI <11(2028), ivtmp_38848(3351)>
>   _3211 = S.1565_41079 + -1;
>   _3212 = fire_ignition_start_y1[_3211];
>   MEM[(real(kind=4)[11] *)&model_config_rec + 101040B][_3211] = _3212;
>   S.1565_32687 = S.1565_41079 + 1;
>   ivtmp_38848 = ivtmp_38850 - 1;
>   if (ivtmp_38848 == 0)
>     goto <bb 2027>; [9.09%]
>   else
>     goto <bb 3351>; [90.91%]
>
>   <bb 3351> [local count: 107425740]:
>   goto <bb 1809>; [100.00%]
>
>   <bb 2027> [local count: 10737418]:
>
>   <bb 1810> [local count: 118163158]:
>   # S.1566_41080 = PHI <1(2027), S.1566_32689(3350)>
>   # ivtmp_38856 = PHI <11(2027), ivtmp_38854(3350)>
>   _3213 = S.1566_41080 + -1;
>   _3214 = fire_ignition_end_x1[_3213];
>   MEM[(real(kind=4)[11] *)&model_config_rec + 101084B][_3213] = _3214;
>   S.1566_32689 = S.1566_41080 + 1;
>   ivtmp_38854 = ivtmp_38856 - 1;
>   if (ivtmp_38854 == 0)
>     goto <bb 2026>; [9.09%]
>   else
>     goto <bb 3350>; [90.91%]
>
>   <bb 3350> [local count: 107425740]:
>   goto <bb 1810>; [100.00%]
>
>   <bb 2026> [local count: 10737418]:
>
>   <bb 1811> [local count: 118163158]:
>   # S.1567_41081 = PHI <1(2026), S.1567_32691(3349)>
>   # ivtmp_38860 = PHI <11(2026), ivtmp_38858(3349)>
>   _3215 = S.1567_41081 + -1;
>   _3216 = fire_ignition_end_y1[_3215];
>   MEM[(real(kind=4)[11] *)&model_config_rec + 101128B][_3215] = _3216;
>   S.1567_32691 = S.1567_41081 + 1;
>   ivtmp_38858 = ivtmp_38860 - 1;
>   if (ivtmp_38858 == 0)
>     goto <bb 2025>; [9.09%]
>   else
>     goto <bb 3349>; [90.91%]
>
>   <bb 3349> [local count: 107425740]:
>   goto <bb 1811>; [100.00%]
>
>   <bb 2025> [local count: 10737418]:
>   ...
>
>
> It's a configure module, so that it probably contains so many loops for
> various configs.

Hmm, so then it might be we run into some CFG complexity cut-off
before for PRE and RA but not after since the CFG should simplify
a lot if we make memcpy from all of the above loops...
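
For readers less used to GIMPLE, each quoted loop is an 11-element copy of real(kind=4) values into a field of model_config_rec. A rough C analogue might look like the sketch below. The struct layout and the function names copy_by_loop and copy_by_call are hypothetical stand-ins, not taken from the WRF sources or from GCC. Only the trip count of 11 and the element type come from the dump. Whether GCC emits memcpy or memmove for a given loop depends on what it can prove about aliasing, so the second function is only an illustration of "split to 0 loops and 1 library calls".

```c
#include <string.h>

#define N 11 /* trip count, from PHI <11(...)> in the quoted dump */

/* Hypothetical stand-in for one field of model_config_rec. */
struct config_rec {
    float fire_ignition_start_y[N];
};

/* Element-wise copy: a loop with a back edge and an exit test,
   i.e. several basic blocks in the CFG. */
void copy_by_loop(struct config_rec *dst, const float *src)
{
    for (int i = 0; i < N; i++)
        dst->fire_ignition_start_y[i] = src[i];
}

/* Same effect as a single library call: straight-line code,
   no loop left in the CFG. Assumes src does not overlap dst. */
void copy_by_call(struct config_rec *dst, const float *src)
{
    memcpy(dst->fire_ignition_start_y, src,
           sizeof dst->fire_ignition_start_y);
}
```

With roughly 1400 such loops in one function, replacing each with a call removes that many loop headers and latch blocks, which is the CFG simplification the cut-off theory relies on.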