6 Regression] Huge increase in memory usage and compile time in combine

rguenth at gcc dot gnu.org Thu, 11 Feb 2016 02:44:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59811


--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
So SCCVN is limited with --param sccvn-max-alias-queries-per-access which has
a quite high default value (1000).  There isn't something more elaborate like
having an overall budget as well.

With lowering the default value to 100 I get to

 alias stmt walking      :  12.15 (16%) usr   0.06 (12%) sys  12.23 (16%) wall 
     2 kB ( 0%) ggc
...
 tree PRE                :   9.48 (13%) usr   0.01 ( 2%) sys   9.48 (13%) wall 
  1333 kB ( 0%) ggc
 tree FRE                :   9.91 (13%) usr   0.03 ( 6%) sys   9.92 (13%) wall 
   310 kB ( 0%) ggc
...
 combiner                :  31.87 (43%) usr   0.24 (46%) sys  32.10 (43%) wall 
528441 kB (80%) ggc
 rest of compilation     :  51.05 (69%) usr   0.31 (60%) sys  51.25 (69%) wall 
620347 kB (94%) ggc
 TOTAL                 :  73.85             0.52            74.34            
661274 kB

(see disclaimer about things not adding up).  As we run FRE twice and PRE
only once (but it uses a cheaper walking mode that ends up giving up earlier)
a reasonable result would be FRE at around 2 times the PRE time.

Note that we do run into the above limit all the time with this testcase
which has loads of stores and loads (non-aliased, of course).  And we do
seem to optimize stuff as upping the limit results in faster overall compile
time.

I think we need to more intelligently limit the walking and/or design some
caching of queries that works better or simply make the queries cheaper.
Most of the query cost comes from decomposing our reference trees and
repeatedly computing the overall access offset/size (get_ref_base_and_extent).
So if we could cache that we might win.

The number of loads/stores are not that bad in the testcase, we have
1700 stores after fre1 and 547 loads, 1142 stores after fre2 and 547 loads.
Before fre1 we had 1714 stores and 15334 loads.  So its clearly fre1 that
will run into this compile-time hog (it does max. #loads * #stores alias
queries).  OTOH we _do_ want to optimize those 14000 redundant loads!
It simply looks like the defs are far away (happens for example if you
have an initializer at the start of the function we can optimize from).

The 547 loads after fre1 is actually with upping the limit to 10000, with
the default limit we still have 5548 loads!  With lowering to 100 we get
11616 loads.

Note that compile-time would be worse if there were nothing to optimize here
(then you hit the worst-case of #loads * #stores alias queries).

[Bug rtl-optimization/59811] [4.9/5/6 Regression] Huge increase in memory usage and compile time in combine

Reply via email to