https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120627

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <l...@gcc.gnu.org>:

https://gcc.gnu.org/g:41155992d572030f7918682b2642365ada1f4fbf

commit r16-1835-g41155992d572030f7918682b2642365ada1f4fbf
Author: Jeff Law <j...@ventanamicro.com>
Date:   Mon Jun 30 14:38:33 2025 -0600

    [committed] [PR rtl-optimization/120242] Fix SUBREG_PROMOTED_VAR_P after
ext-dce's actions

    I've gone back and forth of these problems multiple times.  We have two
passes,
    ext-dce and combine which eliminate extensions using totally different
    mechanisms.

    ext-dce looks for cases where the state of upper bits in an object aren't
    observable and if they aren't observable, then eliminates extensions which
set
    those bits.

    combine looks for cases where we know the state of the upper bits and can
prove
    an extension is just setting those bits to their prior value.  Combine also
    looks for cases where the precise extension isn't really important, just
the
    knowledge that the upper bits are zero or sign extended from a narrower
mode
    is needed.

    Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job.  If
the
    actions of ext-dce (or any other pass for that matter) make
    SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, then
    combine can end up generating incorrect code.

    --

    When ext-dce eliminates an extension and turns it into a subreg copy
(without
    any known SUBREG_PROMOTED_VAR state).  Since we can no longer guarantee the
    destination object has any known extension state, we scurry around and wipe
    SUBREG_PROMOTED_VAR state for the destination object.

    That's fine and dandy, but ultimately insufficient.  Consider if the
    destination of the optimized extension was used as a source in a simple
copy
    insn.  Furthermore assume that the destination of that copy is used within
a
    SUBREG expression with SUBREG_PROMOTED_VAR set.  ext-dce's actions have
    clobbered the SUBREG_PROMOTED_VAR state on the destination of that copy,
albeit
    indirectly.

    This patch addresses this problem by taking the set of pseudos directly
    impacted by ext-dce's actions and expands that set by building a transitive
    closure for pseudos connected via copies.  We then scurry around finding
    SUBREG_PROMOTED_VAR state to wipe for everything in that expanded set of
    pseudos.  Voila, everything just works.

    --

    The other approach here would be to further expand the liveness sets inside
    ext-dce.  That's a simpler path forward, but ultimately regresses the
quality
    of codes we do care about.

    One good piece of news is that with the transitive closure bits in place,
we
    can eliminate a bit of the live set expansion we had in place for
    SUBREG_PROMOTED_VAR objects.

    --

    So let's take one case of the 5 that have been reported.

    In ext-dce we have this insn:

    > (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
    >         (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552
{*zero_extendhidi2_bitmanip}
    >      (expr_list:REG_DEAD (reg:DI 162)
    >         (nil)))

    There are reachable uses of (reg 134):

    > (insn 49 47 52 6 (set (mem/c:HI (lo_sum:DI (reg/f:DI 186)
    >                 (symbol_ref:DI ("al") [flags 0x86]  <var_decl
0x7ffff73c2da8 al>)) [2 al+0 S2 A16])
    >         (subreg/s/v:HI (reg:DI 134 [ al_lsm.9 ]) 0)) 279
{*movhi_internal}
    >      (expr_list:REG_DEAD (reg/f:DI 186)
    >         (nil)))Obviously safe if we were to remove the extension.

    > (insn 52 49 53 6 (set (reg:DI 176)
    >         (and:DI (reg:DI 134 [ al_lsm.9 ])
    >             (const_int 5 [0x5]))) "j.c":21:12 106 {*anddi3}
    >      (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
    >         (nil)))
    > (insn 53 52 56 6 (set (reg:SI 177 [ _8 ])
    >         (zero_extend:SI (subreg:HI (reg:DI 176) 0))) "j.c":21:12 551
{*zero_extendhisi2_bitmanip}
    >      (expr_list:REG_DEAD (reg:DI 176)
    >         (nil))) Safe to remove the extension as we only read the low 16
bits from the destination register (reg 176) in insn 53.

    > (insn 27 26 29 3 (set (reg:DI 162)
    >         (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 134 [ al_lsm.9 ])
0)
    >                 (const_int 1 [0x1])))) "j.c":17:17 8 {addsi3_extended}
    >      (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ])
    >         (nil)))
    > (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ])
    >         (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552
{*zero_extendhidi2_bitmanip}
    >      (expr_list:REG_DEAD (reg:DI 162)
    >         (nil)))

    Again, not as obvious as the first case, but we only read the low 16 bits
from
    (reg 162) in insn 29.  So those upper bits in (reg 134) don't matter.

    > (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
    >         (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
    >      (nil))
    > (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
    >         (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0)))
"j.c":17:9 558 {*extendhidi2_bitmanip}
    >      (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
    >         (nil)))Also safe in isolation.  But worth noting that if we
remove the extension at insn 29, then the promoted status on (reg:DI 144) in
insn 30 is no longer valid.

    Setting aside the promoted state of (reg:DI 144) at insn 30 for a minute,
let's
    look into combine.

    > (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ])
    >         (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit}
    >      (nil))   [ ... ]
    > (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ])
    >         (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0)))
"j.c":17:9 558 {*extendhidi2_bitmanip}
    >      (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ])
    >         (nil)))
    > (jump_insn 31 30 32 3 (set (pc)
    >         (if_then_else (eq (reg:DI 135 [ al.2_3 ])
    >                 (const_int 0 [0]))
    >             (label_ref:DI 41)
    >             (pc))) "j.c":4:55 371 {*branchdi}
    >      (int_list:REG_BR_PROB 536870913 (nil))
    >  -> 41)

    Combine will do its thing on insns 30/31.  Essentially the sign extension
is
    not necessary in this context, assuming the promoted subreg status in insn
30
    -- the equality test doesn't really care about the kind of extension, just
    knowing the value is extended is enough to safely elide the extension.

    And now we've come to the crux the problem.  That promotion state needs to
be
    adjusted.  The new ext-dce code will see that copy at insn 26 and add (reg
144)
    to the set of registers that need promotion state wiped.  And everything is
    happy after that.

    The other cases are similar in nature.

    --

    This has been bootstrapped and regression tested on x86_64 and aarch64.
    Variants have bootstrapped & regression tested on several other platforms
and
    it's survived testing on the crosses as well.

    Pushing to the trunk...

            PR rtl-optimization/120242
            PR rtl-optimization/120627
            PR rtl-optimization/120736
            PR rtl-optimization/120813
    gcc/

            * ext-dce.cc (ext_dce_process_uses): Remove some cases where we
            unnecessarily expanded live sets for promoted subregs.
            (expand_changed_pseudos): New function.
            (reset_subreg_promoted_p): Use it.

    gcc/testsuite/

            * gcc.dg/torture/pr120242.c: New test.
            * gcc.dg/torture/pr120627.c: Likewise.
            * gcc.dg/torture/pr120736.c: Likewise.
            * gcc.dg/torture/pr120813.c: Likewise.

Reply via email to