[Bug fortran/48636] Enable more inlining with -O2 and higher

dominiq at lps dot ens.fr Sun, 17 Apr 2011 07:12:47 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636


--- Comment #5 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-04-17 14:12:30 UTC ---
I have investigated why test_fpu is slower with --param
max-inline-insns-auto=400 (11.18s) than with -finline-limit=600 (10.84s) in
the timings of comment #2. The difference comes from the inlining of dgemm in
the fourth test, Lapack 2:

[macbook] lin/test% gfc -Ofast -funroll-loops -fstack-arrays --param max-inline-insns-auto=385 test_lap.f90
[macbook] lin/test% time a.out
  Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts  2.6 sec  Err= 0.000000000000250
                             total =  2.6 sec

2.824u 0.081s 0:02.90 100.0%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -Ofast -funroll-loops -fstack-arrays --param max-inline-insns-auto=386 test_lap.f90
[macbook] lin/test% time a.out
  Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts  3.0 sec  Err= 0.000000000000250
                             total =  3.0 sec

3.214u 0.082s 0:03.29 100.0%    0+0k 0+0io 0pf+0w

Looking at the assembly, I see 'call    _dgemm_' three times for 385 and none
for 386 (note that there are only two calls in the source: one in dgetri,
always inlined, and one in dgetrf, not inlined). It would be interesting to
understand why inlining dgemm slows down the code.
