https://gcc.gnu.org/g:4ed07a11ee2845c2085a3cd5cff043209a452441

commit r15-7915-g4ed07a11ee2845c2085a3cd5cff043209a452441
Author: Jeff Law <j...@ventanamicro.com>
Date:   Sun Mar 9 13:28:10 2025 -0600

    [rtl-optimization/117467] Avoid unnecessarily marking things live in ext-dce
    
    This is the first of what I expect to be a few patches to improve memory
    consumption and performance of ext-dce.
    
    While I haven't been able to reproduce the insane memory usage that Richi saw,
    I can certainly see how we might get there.  I instrumented ext-dce to dump the
    size of liveness sets, removed the memory allocation limiter, then compiled the
    appropriate file from specfp on rv64.
    
    In my test I saw the liveness sets growing to absurd sizes as we worked from
    the last block back to the first.  Think 125k entries by the time we got back
    to the entry block which would mean ~30k live registers.  Simply no way that's
    correct.
    
    The use handling is the primary source of problems and the code that I most
    want to rewrite for gcc-16.  It's just a fugly mess.  I'm not terribly inclined
    to do that rewrite for gcc-15 though.  So these will be spot adjustments.
    
    The most important thing to know about use processing is that it sets up an
    iterator and walks it.  When a SET is encountered we manually dive into the
    SRC/DEST and ideally terminate the iterator.
    
    If during that SET processing we encounter something unexpected we let the
    iterator continue normally, which causes iteration down into the SET_DEST
    object.  That's safe behavior, though it can lead to too many objects being
    marked live.
    
    We can refine that behavior by trivially realizing that we need not process the
    SET_DEST if it is a naked REG (and probably for other cases too, but they're
    not expected to be terribly important).  So once we see the SET with a simple
    REG destination, we can bump the iterator to avoid having it dive into the
    SET_DEST if something unexpected is seen on the SET_SRC side.
    
    Fixing this alone takes us from 125k live objects to 10k live objects at the
    entry block.  Time in ext-dce for rv64 on the testcase goes from 10.81s to
    2.14s.
    
    Given this reduces the things considered live, this could easily result in
    finding more cases for ext-dce to improve.  In fact a missed optimization issue
    for rv64 I've been poking at needs this patch as a prerequisite.
    
    Bootstrapped and regression tested on x86_64.
    
    Pushing to the trunk.
    
            PR rtl-optimization/117467
    gcc
            * ext-dce.cc (ext_dce_process_uses): When trivially possible advance
            the iterator over the destination of a SET.
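
The iterator bump described in the commit message can be illustrated with a
toy model.  This is only a sketch and not GCC code: Node, PreorderIter,
sample_set and visited_labels are hypothetical names, and the walker merely
assumes a pre-order traversal in which calling next () on a SET lands on its
destination operand first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an RTX node (hypothetical, not GCC's rtx).
struct Node {
  std::string code;            // e.g. "SET", "REG", "PLUS"
  std::string label;           // name used to report what was visited
  std::vector<Node> operands;  // for a SET: operand 0 = dest, operand 1 = src
};

// Toy pre-order walker driven by an explicit worklist.
class PreorderIter {
public:
  explicit PreorderIter(const Node &root) : current_(&root) {}

  bool at_end() const { return current_ == nullptr; }
  const Node &operator*() const { return *current_; }

  // Queue the operands of the current node, then step to the next node.
  void next() {
    for (auto it = current_->operands.rbegin();
         it != current_->operands.rend(); ++it)
      pending_.push_back(&*it);
    if (pending_.empty()) {
      current_ = nullptr;
      return;
    }
    current_ = pending_.back();
    pending_.pop_back();
  }

private:
  std::vector<const Node *> pending_;
  const Node *current_;
};

// Return the labels the loop body sees.  With skip_reg_dest set, a SET whose
// destination is a plain REG bumps the iterator once, so the loop's own
// next () then steps past the destination instead of visiting it.
std::vector<std::string> visited_labels(const Node &root, bool skip_reg_dest) {
  std::vector<std::string> seen;
  for (PreorderIter iter(root); !iter.at_end(); iter.next()) {
    const Node &x = *iter;
    seen.push_back(x.label);
    if (skip_reg_dest && x.code == "SET") {
      assert(x.operands.size() == 2);
      if (x.operands[0].code == "REG")
        iter.next();  // now positioned on the destination REG
    }
  }
  return seen;
}

// Build (set (reg r1) (plus (reg r2) (reg r3))) in the toy representation.
Node sample_set() {
  Node src{"PLUS", "plus", {Node{"REG", "r2", {}}, Node{"REG", "r3", {}}}};
  return Node{"SET", "set", {Node{"REG", "r1", {}}, src}};
}
```

Calling visited_labels (sample_set (), false) visits set, r1, plus, r2, r3,
while passing true visits set, plus, r2, r3.  In this toy, the destination
register is the only node that drops out of the walk; the source side is
still processed in full.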

Diff:
---
 gcc/ext-dce.cc | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 626c431f601e..c53dd5b46161 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -643,6 +643,18 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
          /* The code of the RHS of a SET.  */
          enum rtx_code code = GET_CODE (src);
 
+         /* If we break the main loop below, then we will continue processing
+            sub-components of this RTX, including the SET_DEST.
+
+            That is not necessary if the SET_DEST is a REG.  We can just bump the
+            iterator to the next element to skip handling the SET_DEST.
+
+            We can probably do this for ZERO_EXTRACT, STRICT_LOW_PART and SUBREG
+            destinations as well.  But I want to rewrite all this code and keep
+            this fix conservative given we're deep into the gcc-15 release cycle.  */
+         if (REG_P (dst))
+           iter.next ();
+
          /* ?!? How much of this should mirror SET handling, potentially
             being shared?   */
          if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
