https://gcc.gnu.org/g:19ba913517b5e2a001fa9c0f060a1ac74430c027
commit r15-9131-g19ba913517b5e2a001fa9c0f060a1ac74430c027 Author: Jakub Jelinek <ja...@redhat.com> Date: Tue Apr 1 16:40:55 2025 +0200 combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291] The following testcase is miscompiled on x86_64-linux at -O2 by the combiner. We have from earlier combinations (insn 22 21 23 4 (set (reg:SI 104 [ _7 ]) (const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal} (nil)) (insn 23 22 24 4 (set (reg/v:SI 117 [ e ]) (reg/v:SI 116 [ e ])) 96 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 116 [ e ]) (nil))) (note 24 23 25 4 NOTE_INSN_DELETED) (insn 25 24 26 4 (parallel [ (set (reg:CCZ 17 flags) (compare:CCZ (neg:SI (reg:SI 104 [ _7 ])) (const_int 0 [0]))) (set (reg/v:SI 116 [ e ]) (neg:SI (reg:SI 104 [ _7 ]))) ]) "pr119291.c":26:13 977 {*negsi_2} (expr_list:REG_DEAD (reg:SI 104 [ _7 ]) (nil))) (note 26 25 27 4 NOTE_INSN_DELETED) (insn 27 26 28 4 (set (reg:DI 128 [ _9 ]) (ne:DI (reg:CCZ 17 flags) (const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1} (expr_list:REG_DEAD (reg:CCZ 17 flags) (nil))) and try_combine is called on i3 25 and i2 22 (second time) and reach the hunk being patched with simplified i3 (insn 25 24 26 4 (parallel [ (set (pc) (pc)) (set (reg/v:SI 116 [ e ]) (const_int 0 [0])) ]) "pr119291.c":28:13 977 {*negsi_2} (expr_list:REG_DEAD (reg:SI 104 [ _7 ]) (nil))) and (insn 22 21 23 4 (set (reg:SI 104 [ _7 ]) (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal} (nil)) Now, the try_combine code there attempts to split two independent sets in newpat by moving one of them to i2. And among other tests it checks !modified_between_p (SET_DEST (set1), i2, i3) which is certainly needed, if there would be say (set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a])) in between i2 and i3, we couldn't do that, as that set would overwrite the value set by set1 we want to move to the i2 position. But in this case pseudo 116 isn't set in between i2 and i3, but used (and additionally there is a REG_DEAD note for it). This is equally bad for the move, because while the i3 insn and later will see the pseudo value that we set, the insn in between which uses the value will see a different value from the one that it should see. As we don't check for that, in the end try_combine succeeds and changes the IL to: (insn 22 21 23 4 (set (reg/v:SI 116 [ e ]) (const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal} (nil)) (insn 23 22 24 4 (set (reg/v:SI 117 [ e ]) (reg/v:SI 116 [ e ])) 96 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 116 [ e ]) (nil))) (note 24 23 25 4 NOTE_INSN_DELETED) (insn 25 24 26 4 (set (pc) (pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE} (nil)) (note 26 25 27 4 NOTE_INSN_DELETED) (insn 27 26 28 4 (set (reg:DI 128 [ _9 ]) (const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal} (nil)) (note, the i3 got turned into a nop and try_combine also modified insn 27). The following patch replaces the modified_between_p tests with reg_used_between_p, my understanding is that modified_between_p is a subset of reg_used_between_p, so one doesn't need both. Looking at this some more today, I think we should special case set_noop_p because that can be put into i2 (except for the JUMP_P violations), currently both modified_between_p (pc_rtx, i2, i3) and reg_used_between_p (pc_rtx, i2, i3) returns false. I'll post a patch incrementally for that (but that feels like new optimization, so probably not something that should be backported). On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote: > Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why > does the comment talk about memory? I was worried about making too risky changes this late in stage4 (and especially also for backports). Most of this code is 1992-ish. I think many of the functions are just misnamed, the reg_ in there doesn't match what those functions do (bet they initially supported just REGs and later on support for other kinds of expressions was added, but haven't done git archeology to prove that). What we know for sure is: && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART that is checked earlier in the condition. Then it calls && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), XVECEXP (newpat, 0, 0)) && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), XVECEXP (newpat, 0, 1)) While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p which is also misnamed, that function handles just fine all of REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT, STRICT_LOW_PART, PC and even some further cases. So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG of REG, PC (at least the REG and PC cases are triggered on the testcase) and quite possibly also MEM (SUBREG of MEM not, see below). Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that function for constants just returns false, for PC returns true, for REG returns reg_set_between_p, for MEM recurses on the address, for MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code whether the memory could have been modified in between, for all other rtxes recurses on the subrtxes. This part didn't change in my patch. I've only changed those - && !modified_between_p (SET_DEST (set{1,0}), i2, i3) + && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3) where the former has been described above and clearly handles all of REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things. The replacement reg_used_between_p calls reg_overlap_mentioned_p on each instruction in between i2 and i3. So, there is clearly a difference in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p returns unconditionally true even if there are no instructions in between, but reg_used_between_p if there are no non-debug insns in between returns false. Sorry for missing that, guess I should check for that (with the exception of the noop moves which are often (set (pc) (pc)) and handled by the incremental patch). In fact not just that, reg_used_between_p will only return true for PC if it is mentioned anywhere in the insns in between. Anyway, except for that, for REG it calls refers_to_regno_p and so should find any occurrences of any of the REG or parts of it for hard registers, for MEM returns true if it sees any MEMs in insns in between (conservatively), for SUBREGs apparently it relies on it being SUBREG of REG (so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the SUBREG_REG, PC I've already described. Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think already the initial && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)), XVECEXP (newpat, 0, 0)) && ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)), XVECEXP (newpat, 0, 1)) calls would have failed --enable-checking=rtl or would have misbehaved, so I think there is no need to check for it further. To your question why I don't use reg_referenced_p, that is because reg_referenced_p is something to call on one insn pattern, while reg_used_between_p is pretty much that on all insns in between two instructions (excluding the boundaries). So, I think it would be safer to add && SET_DEST (set{1,0} != pc_rtx checks to preserve former behavior, like in the following version. 2025-04-01 Jakub Jelinek <ja...@redhat.com> PR rtl-optimization/119291 * combine.cc (try_combine): For splitting of PARALLEL with 2 independent SETs into i2 and i3 sets check reg_used_between_p of the SET_DESTs rather than just modified_between_p. * gcc.c-torture/execute/pr119291.c: New test. Diff: --- gcc/combine.cc | 16 +++++++------ gcc/testsuite/gcc.c-torture/execute/pr119291.c | 33 ++++++++++++++++++++++++++ 2 files changed, 42 insertions(+), 7 deletions(-) diff --git a/gcc/combine.cc b/gcc/combine.cc index ef13f5d5e900..1b6c4e314cc9 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -4012,18 +4012,19 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, rtx set1 = XVECEXP (newpat, 0, 1); /* Normally, it doesn't matter which of the two is done first, but - one which uses any regs/memory set in between i2 and i3 can't - be first. The PARALLEL might also have been pre-existing in i3, - so we need to make sure that we won't wrongly hoist a SET to i2 - that would conflict with a death note present in there, or would - have its dest modified between i2 and i3. */ + one which uses any regs/memory set or used in between i2 and i3 + can't be first. The PARALLEL might also have been pre-existing + in i3, so we need to make sure that we won't wrongly hoist a SET + to i2 that would conflict with a death note present in there, or + would have its dest modified or used between i2 and i3. */ if (!modified_between_p (SET_SRC (set1), i2, i3) && !(REG_P (SET_DEST (set1)) && find_reg_note (i2, REG_DEAD, SET_DEST (set1))) && !(GET_CODE (SET_DEST (set1)) == SUBREG && find_reg_note (i2, REG_DEAD, SUBREG_REG (SET_DEST (set1)))) - && !modified_between_p (SET_DEST (set1), i2, i3) + && SET_DEST (set1) != pc_rtx + && !reg_used_between_p (SET_DEST (set1), i2, i3) /* If I3 is a jump, ensure that set0 is a jump so that we do not create invalid RTL. */ && (!JUMP_P (i3) || SET_DEST (set0) == pc_rtx) @@ -4038,7 +4039,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, && !(GET_CODE (SET_DEST (set0)) == SUBREG && find_reg_note (i2, REG_DEAD, SUBREG_REG (SET_DEST (set0)))) - && !modified_between_p (SET_DEST (set0), i2, i3) + && SET_DEST (set0) != pc_rtx + && !reg_used_between_p (SET_DEST (set0), i2, i3) /* If I3 is a jump, ensure that set1 is a jump so that we do not create invalid RTL. */ && (!JUMP_P (i3) || SET_DEST (set1) == pc_rtx) diff --git a/gcc/testsuite/gcc.c-torture/execute/pr119291.c b/gcc/testsuite/gcc.c-torture/execute/pr119291.c new file mode 100644 index 000000000000..41eadf013d7d --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr119291.c @@ -0,0 +1,33 @@ +/* PR rtl-optimization/119291 */ + +int a; +long c; + +__attribute__((noipa)) void +foo (int x) +{ + if (x != 0) + __builtin_abort (); + a = 42; +} + +int +main () +{ + int e = 1; +lab: + if (a < 2) + { + int b = e; + _Bool d = a != 0; + _Bool f = b != 0; + unsigned long g = -(d & f); + unsigned long h = c & g; + unsigned long i = ~c; + e = -(i & h); + c = e != 0; + a = ~e + b; + foo (e); + goto lab; + } +}