[Bug target/33761] New: non-optimal inlining heuristics pessimizes gzip SPEC score at -O3

ubizjak at gmail dot com Sat, 13 Oct 2007 04:27:06 -0700

The measurements were actually done on gzip-1.2.4 sources on core2-d with:

a) gcc -mtune=generic -m32 -O2
b) gcc -mtune=generic -m32 -O3


The test file was created as a tar archive of the current SVN trunk repository,
which currently comes to 865M uncompressed.

profile of a)

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 54.63     14.76    14.76 102254750     0.00     0.00  longest_match
 18.47     19.75     4.99        1     4.99    27.02  deflate
 10.25     22.52     2.77    27389     0.00     0.00  fill_window
  6.81     24.36     1.84    27390     0.00     0.00  updcrc
  3.15     25.21     0.85     5901     0.00     0.00  compress_block
  2.85     25.98     0.77 203123663     0.00     0.00  send_bits
  2.66     26.70     0.72 89123566     0.00     0.00  ct_tally
  0.67     26.88     0.18  3378994     0.00     0.00  pqdownheap
  0.22     26.94     0.06    17709     0.00     0.00  build_tree
  0.15     26.98     0.04    11802     0.00     0.00  send_tree
  0.07     27.00     0.02  1367732     0.00     0.00  bi_reverse
  0.07     27.02     0.02    17710     0.00     0.00  gen_codes
  0.00     27.02     0.00    27390     0.00     0.00  file_read

profile of b)

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 86.86     29.35    29.35        1    29.35    33.79  deflate
  5.27     31.13     1.78    27390     0.00     0.00  updcrc
  2.69     32.04     0.91     5901     0.00     0.00  compress_block
  2.55     32.90     0.86 89123566     0.00     0.00  ct_tally
  2.04     33.59     0.69 203123663     0.00     0.00  send_bits
  0.44     33.74     0.15    17709     0.00     0.00  build_tree
  0.06     33.76     0.02  1367732     0.00     0.00  bi_reverse
  0.06     33.78     0.02     5903     0.00     0.00  flush_block
  0.03     33.79     0.01    11802     0.00     0.00  send_tree
  0.00     33.79     0.00    27390     0.00     0.00  file_read
  0.00     33.79     0.00     9237     0.00     0.00  flush_outbuf
  0.00     33.79     0.00        2     0.00     0.00  basename
  0.00     33.79     0.00        2     0.00     0.00  copy_block
  0.00     33.79     0.00        1     0.00     0.00  add_envopt

As can be seen from the profiles, longest_match was inlined into deflate at -O3.
Adding __attribute__((noinline)) to the longest_match prototype, we obtain:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 55.80     13.86    13.86 102254750     0.00     0.00  longest_match
 27.62     20.72     6.86        1     6.86    24.84  deflate
  7.09     22.48     1.76    27390     0.00     0.00  updcrc
  3.74     23.41     0.93     5901     0.00     0.00  compress_block
  2.62     24.06     0.65 89123566     0.00     0.00  ct_tally
  2.42     24.66     0.60 203123663     0.00     0.00  send_bits
  0.56     24.80     0.14    17709     0.00     0.00  build_tree
  0.08     24.82     0.02  1367732     0.00     0.00  bi_reverse
  0.08     24.84     0.02    11802     0.00     0.00  send_tree
  0.00     24.84     0.00    27390     0.00     0.00  file_read
  0.00     24.84     0.00     9237     0.00     0.00  flush_outbuf
  0.00     24.84     0.00     5903     0.00     0.00  flush_block
  0.00     24.84     0.00        2     0.00     0.00  basename
  0.00     24.84     0.00        2     0.00     0.00  copy_block

or a ~26.5% improvement in total run time over plain -O3 (33.79 s down to
24.84 s). I speculate that inlining increases register pressure on a
SMALL_REGISTER_CLASSES target, as this problem is not nearly as noticeable on
x86_64.
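
For illustration, the noinline workaround can be sketched on a reduced example.
This is not the gzip source: match_length and best_match are hypothetical
stand-ins for longest_match and deflate, and the loop bodies are simplified
assumptions. It only shows where the attribute goes (on the prototype) so that
GCC keeps the hot helper out of line even at -O3.

```c
#include <stddef.h>

/* Hypothetical stand-in for longest_match. The attribute on the prototype
   asks GCC not to inline calls to this function, whatever the -O level. */
unsigned match_length(const unsigned char *a, const unsigned char *b,
                      unsigned max_len) __attribute__((noinline));

/* Count how many leading bytes of a and b agree, up to max_len. */
unsigned match_length(const unsigned char *a, const unsigned char *b,
                      unsigned max_len)
{
    unsigned n = 0;
    while (n < max_len && a[n] == b[n])
        n++;
    return n;
}

/* Hypothetical stand-in for deflate: the caller whose register allocation
   would otherwise have to absorb the inlined loop above. The caller must
   guarantee that buf holds at least pos + max_len bytes. */
unsigned best_match(const unsigned char *buf, size_t pos, unsigned max_len)
{
    unsigned best = 0;
    for (size_t prev = 0; prev < pos; prev++) {
        unsigned len = match_length(buf + prev, buf + pos, max_len);
        if (len > best)
            best = len;
    }
    return best;
}
```

Whether keeping the helper out of line is a win depends on the target and on
the surrounding code; the profiles above only show that it helps for gzip on
this 32-bit configuration.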

The results of the 32-bit run are at [1] (valid from 13 Oct) and the results of
the 64-bit run are at [2].

[1]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox_32/gcc/individual-run-ratio.html
[2]
http://vmakarov.fedorapeople.org/spec/spec2000.toolbox/gcc/individual-run-ratio.html


-- 
           Summary: non-optimal inlining heuristics pessimizes gzip SPEC
                    score at -O3
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ubizjak at gmail dot com
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
