[Bug lto/51497] [4.7 Regression] The run time for the polyhedron test nf.f90 is ~10% slower with -flto after revision 182107

dominiq at lps dot ens.fr Sat, 10 Dec 2011 10:40:00 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51497


--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-12-10 18:39:15 UTC ---
The profiles without -flto are:

+ 34.6%, nf3dprecon.2105.constprop.1, a.out
|   34.6%, nf2dprecon.2116, a.out
  33.5%, spmmult.2139, a.out
+ 29.8%, nfcg_, a.out
| + 7.6%, nf3dprecon.2105.constprop.1, a.out
| |   0.4%, nf2dprecon.2116, a.out
|   0.4%, nf2dprecon.2116, a.out
  0.9%, mattest_, a.out

and with -flto:

+ 37.7%, nf3dprecon.2105.2457.constprop.1.2435, a.out
|   37.7%, nf2dprecon.2116.2442.2436, a.out
  32.7%, spmmult.2139.2426.2446, a.out
+ 27.6%, nfcg_, a.out
| + 7.0%, nf3dprecon.2105.2457.constprop.1.2435, a.out
| |   0.4%, nf2dprecon.2116.2442.2436, a.out
|   0.4%, nf2dprecon.2116.2442.2436, a.out
|   0.0%, free, libSystem.B.dylib
  0.8%, mattest_, a.out

So the slow routines are nf2dprecon, accounting for ~1.2s, and spmmult,
accounting for ~0.5s. If I am reading the assembly correctly, in nf2dprecon,
the implicit loop

x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)

is unrolled eight times without -flto and four times with -flto. In spmmult,
the implicit loop

b = ad*x

is vectorized and unrolled four times without -flto, but is unrolled eight
times and not vectorized with -flto.
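
To make the unroll and vectorization factors above concrete, here is a minimal sketch in C (not Fortran) of the two statements written as explicit loops. It assumes 0-based indexing, and the function names precon_row, precon_row_unroll4 and diag_mult_v2 are hypothetical. The vector version uses GCC's generic vector extension with 16-byte (two-double) vectors, as with SSE2. This only illustrates what "unrolled four times" and "vectorized" mean for these loops; it is not the code GCC generates for nf.f90.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* nf2dprecon statement, 0-based (hypothetical helper):
 *   x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
 * i is the 0-based start of the current block. The source slice
 * x[i-nx .. i-1] ends just before the destination slice, so the two
 * never overlap and each element can be updated independently. */
static void precon_row(double *x, const double *au2, size_t i, size_t nx)
{
    assert(i >= nx);
    for (size_t k = 0; k < nx; k++)
        x[i + k] -= au2[i - nx + k] * x[i - nx + k];
}

/* The same statement unrolled four times by hand, plus a scalar
 * remainder loop for when nx is not a multiple of 4. */
static void precon_row_unroll4(double *x, const double *au2, size_t i, size_t nx)
{
    assert(i >= nx);
    double *dst = x + i;
    const double *a = au2 + (i - nx);
    const double *src = x + (i - nx);
    size_t k = 0;
    for (; k + 4 <= nx; k += 4) {
        dst[k]     -= a[k]     * src[k];
        dst[k + 1] -= a[k + 1] * src[k + 1];
        dst[k + 2] -= a[k + 2] * src[k + 2];
        dst[k + 3] -= a[k + 3] * src[k + 3];
    }
    for (; k < nx; k++)
        dst[k] -= a[k] * src[k];
}

/* spmmult statement b = ad*x, written with GCC's generic vector
 * extension so that each multiply handles two doubles at once. */
typedef double v2df __attribute__((vector_size(16)));

static void diag_mult_v2(double *b, const double *ad, const double *x, size_t n)
{
    size_t k = 0;
    for (; k + 2 <= n; k += 2) {
        v2df va, vx, vb;
        memcpy(&va, ad + k, sizeof va);   /* alignment-safe loads */
        memcpy(&vx, x + k, sizeof vx);
        vb = va * vx;                     /* two products per multiply */
        memcpy(b + k, &vb, sizeof vb);
    }
    for (; k < n; k++)                    /* scalar tail for odd n */
        b[k] = ad[k] * x[k];
}
```

For example, with nx = 5 the unrolled loop runs one four-element iteration and one remainder iteration, and leaves x exactly as precon_row does.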

Note that --param max-unroll-times=4 does not change the times.
