PR 49095 requested the following optimization:
- movl -120(%rax), %ecx
- leal -1(%rcx), %edx
- movl %edx, -120(%rax)
- testl %edx, %edx
+ subl $1, -120(%rax)
jne .L92
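
For illustration only, here is a minimal C sketch of the kind of source that can lead to a load / decrement / store / test sequence like the one quoted above. It is not the testcase from the PR. The names `struct counter`, `remaining`, and `release` are hypothetical, and whether a given compiler emits the lea form or a single read-modify-write `subl` depends on the options and on register allocation.

```c
/* Hypothetical stand-in for the object whose field is addressed as
   -120(%rax) in the quoted assembly.  */
struct counter
{
  int remaining;
};

/* Decrement a counter that lives in memory and branch on the result.
   The decremented value is used only for the comparison, so the
   registers holding it are dead afterwards.  That is the situation in
   which a single "subl $1, mem" followed by "jne" is sufficient.  */
int
release (struct counter *c)
{
  if (--c->remaining != 0)
    return 0;	/* Counter still nonzero: the "jne" path.  */
  return 1;	/* Counter reached zero.  */
}
```

Compiling such a function with `-O2 -S` and inspecting the assembly is one way to check which form is generated; the result is not guaranteed to match the sequence shown above.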
The PR was fixed by adding a peephole, but it doesn't actually trigger
for the code sequence quoted above. This is because the pattern expects
to see a parallel including a clobber of CC, which is what you'd get for
a normal add or logical operation. For lea, this does not match: the
clobber is missing, and also the input and output operands can be different.

The problem shows up with some IRA cost changes I'm testing for a
different PR.

The following patch adds a variant peephole. It would be a prerequisite
for those IRA changes so as not to regress an existing testcase. The new
peephole triggers a few times in my collection of .i files.

Bootstrapped and tested on x86_64-linux. Ok?


Bernd

* config/i386/i386.md (operation on memory peephole): Duplicate an
existing peephole and adapt it to match lea rather than an operation
that clobbers CC.
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md (revision 233451)
+++ gcc/config/i386/i386.md (working copy)
@@ -17952,6 +17952,38 @@ (define_peephole2
                                  operands[5], const0_rtx);
 })

+;; Likewise for instances where we have a lea pattern.
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand")
+        (match_operand:SWI 1 "memory_operand"))
+   (set (match_operand:SWI 3 "register_operand")
+        (plus (match_dup 0)
+              (match_operand:SWI 2 "<nonmemory_operand>")))
+   (set (match_dup 1) (match_dup 3))
+   (set (reg FLAGS_REG) (compare (match_dup 3) (const_int 0)))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (4, operands[3])
+   && (rtx_equal_p (operands[0], operands[3])
+       || peep2_reg_dead_p (2, operands[0]))
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[3], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && (<MODE>mode != QImode
+       || immediate_operand (operands[2], QImode)
+       || any_QIreg_operand (operands[2], QImode))
+   && ix86_match_ccmode (peep2_next_insn (3), CCGOCmode)"
+  [(parallel [(set (match_dup 4) (match_dup 5))
+              (set (match_dup 1) (plus:SWI (match_dup 1)
+                                           (match_dup 2)))])]
+{
+  operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
+  operands[5] = gen_rtx_PLUS (<MODE>mode,
+                              copy_rtx (operands[1]),
+                              copy_rtx (operands[2]));
+  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
+                                 operands[5], const0_rtx);
+})
+
 (define_peephole2
   [(parallel [(set (match_operand:SWI 0 "register_operand")
                    (match_operator:SWI 2 "plusminuslogic_operator"