[Bug tree-optimization/88440] size optimization of memcpy-like code

rguenther at suse dot de Fri, 17 May 2019 05:02:09 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440


--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 17 May 2019, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440
> 
> --- Comment #14 from Martin Liška <marxin at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #13)
> > On Fri, 17 May 2019, marxin at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88440
> > > 
> > > --- Comment #12 from Martin Liška <marxin at gcc dot gnu.org> ---
> > > > 
> > > > Can you share -fopt-report-loop differences?  From the above I would
> > > > guess we split a lot of loops, meaning the memcpy/memmove/memset
> > > > calls are in the "middle" and we have to split loops (how many
> > > > calls are detected here?).  If that's true another way would be
> > > > to only allow calls at head or tail position, thus a single
> > > > non-builtin partition.
> > > 
> > > I newly see ~1400 lines:
> > > 
> > > module_configure.fppized.f90:7993:0: optimized: Loop 10 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:7995:0: optimized: Loop 11 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8000:0: optimized: Loop 15 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8381:0: optimized: Loop 77 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8383:0: optimized: Loop 78 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:8498:0: optimized: Loop 105 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9742:0: optimized: Loop 169 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9978:0: optimized: Loop 207 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9979:0: optimized: Loop 208 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9980:0: optimized: Loop 209 distributed: split to 0 loops and 1 library calls.
> > > module_configure.fppized.f90:9981:0: optimized: Loop 210 distributed: split to 0 loops and 1 library calls.
> > > ...
> > > ...
> > 
> > All with "0 loops"?  That disputes my theory :/
> 
> Yep. All these are in a form of:
> 
>   <bb 1809> [local count: 118163158]:
>   # S.1565_41079 = PHI <1(2028), S.1565_32687(3351)>
>   # ivtmp_38850 = PHI <11(2028), ivtmp_38848(3351)>
>   _3211 = S.1565_41079 + -1;
>   _3212 = fire_ignition_start_y1[_3211];
>   MEM[(real(kind=4)[11] *)&model_config_rec + 101040B][_3211] = _3212;
>   S.1565_32687 = S.1565_41079 + 1;
>   ivtmp_38848 = ivtmp_38850 - 1;
>   if (ivtmp_38848 == 0)
>     goto <bb 2027>; [9.09%]
>   else
>     goto <bb 3351>; [90.91%]
> 
>   <bb 3351> [local count: 107425740]:
>   goto <bb 1809>; [100.00%]
> 
>   <bb 2027> [local count: 10737418]:
> 
>   <bb 1810> [local count: 118163158]:
>   # S.1566_41080 = PHI <1(2027), S.1566_32689(3350)>
>   # ivtmp_38856 = PHI <11(2027), ivtmp_38854(3350)>
>   _3213 = S.1566_41080 + -1;
>   _3214 = fire_ignition_end_x1[_3213];
>   MEM[(real(kind=4)[11] *)&model_config_rec + 101084B][_3213] = _3214;
>   S.1566_32689 = S.1566_41080 + 1;
>   ivtmp_38854 = ivtmp_38856 - 1;
>   if (ivtmp_38854 == 0)
>     goto <bb 2026>; [9.09%]
>   else
>     goto <bb 3350>; [90.91%]
> 
>   <bb 3350> [local count: 107425740]:
>   goto <bb 1810>; [100.00%]
> 
>   <bb 2026> [local count: 10737418]:
> 
>   <bb 1811> [local count: 118163158]:
>   # S.1567_41081 = PHI <1(2026), S.1567_32691(3349)>
>   # ivtmp_38860 = PHI <11(2026), ivtmp_38858(3349)>
>   _3215 = S.1567_41081 + -1;
>   _3216 = fire_ignition_end_y1[_3215];
>   MEM[(real(kind=4)[11] *)&model_config_rec + 101128B][_3215] = _3216;
>   S.1567_32691 = S.1567_41081 + 1;
>   ivtmp_38858 = ivtmp_38860 - 1;
>   if (ivtmp_38858 == 0)
>     goto <bb 2025>; [9.09%]
>   else
>     goto <bb 3349>; [90.91%]
> 
>   <bb 3349> [local count: 107425740]:
>   goto <bb 1811>; [100.00%]
> 
>   <bb 2025> [local count: 10737418]:
> ...
> 
> 
> It's a configure module, so that it probably contains so many loops for
> various configs.

Hmm, so it might be that we previously ran into some CFG complexity
cut-off for PRE and RA, but no longer do, since the CFG should simplify
a lot once all of the above loops are turned into memcpy calls...
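
For illustration only, here is a minimal C sketch of what each of the
quoted loops amounts to. It is not output of the pass: the function
names are hypothetical, and the restrict qualifiers encode the
assumption that source and destination do not overlap (otherwise the
replacement would have to be memmove). The trip count of 11 comes from
the ivtmp PHI in the dump; with a 4-byte real(kind=4) that is 44 bytes
per copy, which matches the spacing of the store offsets (101040B,
101084B, 101128B).

```c
#include <string.h>

#define N 11 /* trip count from the dump: ivtmp starts at 11 */

/* Hypothetical stand-in for one quoted loop: one load and one store
   per iteration, which costs a header block, a latch block and a
   back edge in the CFG. */
static void copy_loop(float *restrict dst, const float *restrict src)
{
    for (int i = 0; i < N; i++)
        dst[i] = src[i];
}

/* The same copy expressed as a single library call, i.e. straight-line
   code with no loop left in the CFG. */
static void copy_call(float *restrict dst, const float *restrict src)
{
    memcpy(dst, src, N * sizeof(float));
}
```

Both functions leave the same bytes in dst. The difference is only in
the CFG: per copy, the second form has no loop header, latch or back
edge, which is where the simplification across ~1400 such loops would
come from.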
