https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113114
--- Comment #5 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Hmm, so initially (with the testcase in c3) we have:

  ldp s30, s29, [x0, #-4]
  ...
  add x0, x0, #-4

and we try to form:

  ldp s30, s29, [x0, #-4]!

with this RTL:

(rr) call debug (pair_change.m_insn->rtl ())
(insn 47 18 20 3 (parallel [
            (set (reg:DI 0 x0 [119])
                (plus:DI (reg:DI 0 x0 [orig:101 ivtmp.12 ] [101])
                    (const_int -4 [0xfffffffffffffffc])))
            (set (reg:SF 62 v30 [orig:122 MEM[(float *)_18] ] [122])
                (mem:SF (plus:DI (reg:DI 0 x0 [orig:101 ivtmp.12 ] [101])
                        (const_int -4 [0xfffffffffffffffc])) [0 +0 S4 A32]))
            (set (reg:SF 61 v29 [orig:116 MEM[(float *)_18] ] [116])
                (mem:SF (reg:DI 0 x0 [orig:101 ivtmp.12 ] [101]) [0 +4 S4 A32]))
        ]) "t.c":6:7 -1
     (nil))

but the problem is that we're expecting to match this pattern:

;; Load pair with pre-index writeback.
(define_insn "*loadwb_pre_pair_<ldst_sz>"
  [(set (match_operand 0 "pmode_register_operand")
        (match_operator 8 "pmode_plus_operator" [
          (match_operand 1 "pmode_register_operand")
          (match_operand 4 "const_int_operand")]))
   (set (match_operand:GPI 2 "aarch64_ldp_reg_operand")
        (match_operator 6 "memory_operand" [
          (match_operator 9 "pmode_plus_operator" [
            (match_dup 1)
            (match_dup 4)
          ])]))
   (set (match_operand:GPI 3 "aarch64_ldp_reg_operand")
        (match_operator 7 "memory_operand" [
          (match_operator 10 "pmode_plus_operator" [
            (match_dup 1)
            (match_operand 5 "const_int_operand")
          ])]))]
  "aarch64_mem_pair_offset (operands[4], <MODE>mode)
   && known_eq (INTVAL (operands[5]),
                INTVAL (operands[4]) + GET_MODE_SIZE (<MODE>mode))"
  {@ [cons: =&0, 1, =2, =3; attrs: type     ]
     [       rk, 0,  r,  r; load_<ldpstp_sz>] ldp\t%<w>2, %<w>3, [%0, %4]!
     [       rk, 0,  w,  w; neon_load1_2reg ] ldp\t%<v>2, %<v>3, [%0, %4]!
  }
)

which simply doesn't match due to the shape of the RTL: that is, the pattern
hard-codes two plus operands, but due to the offset of -4 here we end up with
the second operand accessing memory directly at (the initial value of) x0.

We could add a second pattern to handle this specific case, or we could just adjust try_promote_writeback to not assert that recog succeeds and accept the missed optimization for the time being.
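
To illustrate the shape mismatch and the second option, here is a toy model in Python. It is only a sketch, not GCC code: Reg, Plus, make_address, strict_match and try_promote are all hypothetical names invented for the example. It assumes that a zero offset is represented as the bare base register (as in the RTL dump above), and it shows a matcher that requires a plus in both addresses rejecting the -4 case and returning "no match" instead of asserting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Reg:
    """A bare base register used directly as an address, e.g. (reg:DI x0)."""
    name: str


@dataclass(frozen=True)
class Plus:
    """Base register plus a nonzero constant, e.g. (plus:DI (reg) (const_int -4))."""
    base: Reg
    offset: int


Address = Union[Reg, Plus]


def make_address(base: Reg, offset: int) -> Address:
    # Assumption for this toy model: a zero offset folds away, leaving the
    # bare register, which is the shape seen for the second load in the dump.
    return base if offset == 0 else Plus(base, offset)


def pair_addresses(base: Reg, first_offset: int, mode_size: int) -> Tuple[Address, Address]:
    """Addresses of the two consecutive accesses of a load pair."""
    return (make_address(base, first_offset),
            make_address(base, first_offset + mode_size))


def strict_match(first: Address, second: Address, mode_size: int) -> bool:
    """Mimic a pattern that hard-codes a plus in BOTH memory addresses."""
    if not (isinstance(first, Plus) and isinstance(second, Plus)):
        return False
    return first.base == second.base and second.offset == first.offset + mode_size


def try_promote(base: Reg, first_offset: int, mode_size: int) -> Optional[str]:
    """Return a writeback form if the shape is recognised, else None.

    Returning None (a missed optimisation) stands in for "do not assert
    that recog succeeds". The destination registers are fixed for brevity.
    """
    first, second = pair_addresses(base, first_offset, mode_size)
    if not strict_match(first, second, mode_size):
        return None
    return f"ldp s30, s29, [{base.name}, #{first_offset}]!"


if __name__ == "__main__":
    x0 = Reg("x0")
    print(pair_addresses(x0, -4, 4))  # second address is the bare register
    print(try_promote(x0, -4, 4))     # shape not recognised
    print(try_promote(x0, -8, 4))     # both addresses are plus expressions
```

In this model, the first option from the comment (a second pattern) would correspond to an extra matcher that accepts a Plus first address together with a bare Reg second address when first.offset + mode_size is zero.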