https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120627
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Jeff Law <l...@gcc.gnu.org>: https://gcc.gnu.org/g:41155992d572030f7918682b2642365ada1f4fbf commit r16-1835-g41155992d572030f7918682b2642365ada1f4fbf Author: Jeff Law <j...@ventanamicro.com> Date: Mon Jun 30 14:38:33 2025 -0600 [committed] [PR rtl-optimization/120242] Fix SUBREG_PROMOTED_VAR_P after ext-dce's actions I've gone back and forth of these problems multiple times. We have two passes, ext-dce and combine which eliminate extensions using totally different mechanisms. ext-dce looks for cases where the state of upper bits in an object aren't observable and if they aren't observable, then eliminates extensions which set those bits. combine looks for cases where we know the state of the upper bits and can prove an extension is just setting those bits to their prior value. Combine also looks for cases where the precise extension isn't really important, just the knowledge that the upper bits are zero or sign extended from a narrower mode is needed. Combine relies heavily on the SUBREG_PROMOTED_VAR state to do its job. If the actions of ext-dce (or any other pass for that matter) make SUBREG_PROMOTED_VAR's state inconsistent with combine's expectations, then combine can end up generating incorrect code. -- When ext-dce eliminates an extension and turns it into a subreg copy (without any known SUBREG_PROMOTED_VAR state). Since we can no longer guarantee the destination object has any known extension state, we scurry around and wipe SUBREG_PROMOTED_VAR state for the destination object. That's fine and dandy, but ultimately insufficient. Consider if the destination of the optimized extension was used as a source in a simple copy insn. Furthermore assume that the destination of that copy is used within a SUBREG expression with SUBREG_PROMOTED_VAR set. ext-dce's actions have clobbered the SUBREG_PROMOTED_VAR state on the destination of that copy, albeit indirectly. This patch addresses this problem by taking the set of pseudos directly impacted by ext-dce's actions and expands that set by building a transitive closure for pseudos connected via copies. We then scurry around finding SUBREG_PROMOTED_VAR state to wipe for everything in that expanded set of pseudos. Voila, everything just works. -- The other approach here would be to further expand the liveness sets inside ext-dce. That's a simpler path forward, but ultimately regresses the quality of codes we do care about. One good piece of news is that with the transitive closure bits in place, we can eliminate a bit of the live set expansion we had in place for SUBREG_PROMOTED_VAR objects. -- So let's take one case of the 5 that have been reported. In ext-dce we have this insn: > (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ]) > (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip} > (expr_list:REG_DEAD (reg:DI 162) > (nil))) There are reachable uses of (reg 134): > (insn 49 47 52 6 (set (mem/c:HI (lo_sum:DI (reg/f:DI 186) > (symbol_ref:DI ("al") [flags 0x86] <var_decl 0x7ffff73c2da8 al>)) [2 al+0 S2 A16]) > (subreg/s/v:HI (reg:DI 134 [ al_lsm.9 ]) 0)) 279 {*movhi_internal} > (expr_list:REG_DEAD (reg/f:DI 186) > (nil)))Obviously safe if we were to remove the extension. > (insn 52 49 53 6 (set (reg:DI 176) > (and:DI (reg:DI 134 [ al_lsm.9 ]) > (const_int 5 [0x5]))) "j.c":21:12 106 {*anddi3} > (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ]) > (nil))) > (insn 53 52 56 6 (set (reg:SI 177 [ _8 ]) > (zero_extend:SI (subreg:HI (reg:DI 176) 0))) "j.c":21:12 551 {*zero_extendhisi2_bitmanip} > (expr_list:REG_DEAD (reg:DI 176) > (nil))) Safe to remove the extension as we only read the low 16 bits from the destination register (reg 176) in insn 53. > (insn 27 26 29 3 (set (reg:DI 162) > (sign_extend:DI (plus:SI (subreg/s/v:SI (reg:DI 134 [ al_lsm.9 ]) 0) > (const_int 1 [0x1])))) "j.c":17:17 8 {addsi3_extended} > (expr_list:REG_DEAD (reg:DI 134 [ al_lsm.9 ]) > (nil))) > (insn 29 27 30 3 (set (reg:DI 134 [ al_lsm.9 ]) > (zero_extend:DI (subreg:HI (reg:DI 162) 0))) "j.c":17:17 552 {*zero_extendhidi2_bitmanip} > (expr_list:REG_DEAD (reg:DI 162) > (nil))) Again, not as obvious as the first case, but we only read the low 16 bits from (reg 162) in insn 29. So those upper bits in (reg 134) don't matter. > (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ]) > (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit} > (nil)) > (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ]) > (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip} > (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ]) > (nil)))Also safe in isolation. But worth noting that if we remove the extension at insn 29, then the promoted status on (reg:DI 144) in insn 30 is no longer valid. Setting aside the promoted state of (reg:DI 144) at insn 30 for a minute, let's look into combine. > (insn 26 92 27 3 (set (reg:DI 144 [ ivtmp.17 ]) > (reg:DI 134 [ al_lsm.9 ])) 277 {*movdi_64bit} > (nil)) [ ... ] > (insn 30 29 31 3 (set (reg:DI 135 [ al.2_3 ]) > (sign_extend:DI (subreg/s/v:HI (reg:DI 144 [ ivtmp.17 ]) 0))) "j.c":17:9 558 {*extendhidi2_bitmanip} > (expr_list:REG_DEAD (reg:DI 144 [ ivtmp.17 ]) > (nil))) > (jump_insn 31 30 32 3 (set (pc) > (if_then_else (eq (reg:DI 135 [ al.2_3 ]) > (const_int 0 [0])) > (label_ref:DI 41) > (pc))) "j.c":4:55 371 {*branchdi} > (int_list:REG_BR_PROB 536870913 (nil)) > -> 41) Combine will do its thing on insns 30/31. Essentially the sign extension is not necessary in this context, assuming the promoted subreg status in insn 30 -- the equality test doesn't really care about the kind of extension, just knowing the value is extended is enough to safely elide the extension. And now we've come to the crux the problem. That promotion state needs to be adjusted. The new ext-dce code will see that copy at insn 26 and add (reg 144) to the set of registers that need promotion state wiped. And everything is happy after that. The other cases are similar in nature. -- This has been bootstrapped and regression tested on x86_64 and aarch64. Variants have bootstrapped & regression tested on several other platforms and it's survived testing on the crosses as well. Pushing to the trunk... PR rtl-optimization/120242 PR rtl-optimization/120627 PR rtl-optimization/120736 PR rtl-optimization/120813 gcc/ * ext-dce.cc (ext_dce_process_uses): Remove some cases where we unnecessarily expanded live sets for promoted subregs. (expand_changed_pseudos): New function. (reset_subreg_promoted_p): Use it. gcc/testsuite/ * gcc.dg/torture/pr120242.c: New test. * gcc.dg/torture/pr120627.c: Likewise. * gcc.dg/torture/pr120736.c: Likewise. * gcc.dg/torture/pr120813.c: Likewise.