https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943

--- Comment #48 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Macleod from comment #47)
> Created attachment 52637 [details]
> new patch
> 
> I am working on an alternative cache for GCC 13, but along the way, I have
> made changes to the ranger_cache::range_from_dom() routine.  The original
> version gave up when it hit a block which had outgoing edges. The new version
> is smarter and basically goes back until it finds a cache entry, and then
> intersects all outgoing edges between the two places. It also removes the
> recursion, and does not SET any cache values during the lookup (making it a
> true query).
> 
> The net effect of this is significant improvements in cache performance
> because it's used far less, but there is more time spent doing calculations.
> This bootstraps and passes all regression tests.  We do miss out on a few
> minor opportunities (30 out of 4400 in all of EVRP over the GCC source)
> which occur as a result of updated values not being propagated properly as
> the cache is no longer "full" like it was before.
> 
> In GCC 13 I will address this, but I thought you might be interested in
> trying this patch against this PR.
> 
> In building 380 GCC source files, I see the following avg speedups
> evrp : -22.57%
> VRP2 : -5.4%
> thread_jumps_full : -14.16%
> total : -0.44%
> 
> So it is not insignificant.
> 
> It is likely to be most effective in large CFGs.
> This is *total* compile time percent speed up for the 5 most significant
> cases:
> 
> expr.ii  -2.62%
> lra-constraints.ii -3.75%
> caller-save.ii -3.98%
> reload.ii -4.04%
> optabs.ii -5.05%
> 
> EVRP isolated speedups (yes, these are *percentage* speedup)
> expr.ii -62.38
> simplify-rtx.ii  -65.97
> lra-constraints.ii  -67.87
> reload.ii trunk  -68.67
> caller-save.ii trunk  -71.93
> optabs.ii trunk  -78.69
> 
> I think those times are probably worth the odd miss.
> 
> Anyway, next time you are checking performance for this PR maybe also try
> this patch and see how it performs.

It helps quite a bit; the worst case is now

 tree VRP                           :   5.14 (  7%)   0.02 (  3%)   5.15 (  7%)    29M (  3%)
 backwards jump threading           :   4.05 (  6%)   0.00 (  0%)   4.06 (  6%)  2220k (  0%)

Overall the patch reduces compile time from 766s to 749s (parallel compile,
serial LTO, release checking).  So IMHO definitely worth it if you are happy
with it.
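
For anyone following along, the lookup described in the quoted comment can be
pictured roughly as below.  This is only a toy sketch under stated
assumptions, not the code from the attached patch: Range, ToyCfg and
toy_range_from_dom are hypothetical names, a range is a plain integer
interval, and each block carries at most one constraint from the edge leading
into it from its immediate dominator.  The real
ranger_cache::range_from_dom() has to handle much more (definitions along the
way, multiple predecessors, and so on).

```cpp
#include <algorithm>
#include <limits>
#include <map>
#include <optional>
#include <vector>

// Toy closed integer interval standing in for a value range (assumption).
struct Range {
  long lo, hi;
};

// Intersection of two intervals; the result is empty when lo > hi.
Range intersect(const Range &a, const Range &b) {
  return {std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
}

// Hypothetical, heavily simplified CFG model.
struct ToyCfg {
  std::vector<int> idom;           // idom[b] = immediate dominator, -1 at entry
  std::map<int, Range> edge_range; // constraint on the edge from idom[b] into b
  std::map<int, Range> cache;      // on-entry ranges that are already known
};

// Walk up the dominator chain from bb until a cached on-entry range is found,
// intersecting the edge constraints seen on the way.  The walk is iterative
// (no recursion) and takes the CFG by const reference, so it cannot set any
// cache entries: it is a pure query.
std::optional<Range> toy_range_from_dom(const ToyCfg &cfg, int bb) {
  Range acc{std::numeric_limits<long>::min(),
            std::numeric_limits<long>::max()};
  for (int b = bb; b != -1; b = cfg.idom[b]) {
    auto cached = cfg.cache.find(b);
    if (cached != cfg.cache.end())
      return intersect(cached->second, acc);
    auto edge = cfg.edge_range.find(b);
    if (edge != cfg.edge_range.end())
      acc = intersect(acc, edge->second);
  }
  return std::nullopt; // no cache entry anywhere on the chain
}
```

The trade-off mentioned in the quoted comment shows up even in this toy form:
nothing is stored during the walk, so the cache stays small, but a later query
for the same block repeats the intersections.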
