https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113114

--- Comment #5 from Alex Coplan <acoplan at gcc dot gnu.org> ---
Hmm, so initially (with the testcase in c3) we have:

ldp s30, s29, [x0, #-4]
...
add x0, x0, #-4

and we try to form:

ldp s30, s29, [x0, #-4]!

with this RTL:

(rr) call debug (pair_change.m_insn->rtl ())
(insn 47 18 20 3 (parallel [
            (set (reg:DI 0 x0 [119])
                (plus:DI (reg:DI 0 x0 [orig:101 ivtmp.12 ] [101])
                    (const_int -4 [0xfffffffffffffffc])))
            (set (reg:SF 62 v30 [orig:122 MEM[(float *)_18] ] [122])
                (mem:SF (plus:DI (reg:DI 0 x0 [orig:101 ivtmp.12 ] [101])
                        (const_int -4 [0xfffffffffffffffc])) [0 +0 S4 A32]))
            (set (reg:SF 61 v29 [orig:116 MEM[(float *)_18] ] [116])
                (mem:SF (reg:DI 0 x0 [orig:101 ivtmp.12 ] [101]) [0 +4 S4 A32]))
        ]) "t.c":6:7 -1
     (nil))

but the problem is that we're expecting to match this pattern:

;; Load pair with pre-index writeback.
(define_insn "*loadwb_pre_pair_<ldst_sz>"
  [(set (match_operand 0 "pmode_register_operand")
        (match_operator 8 "pmode_plus_operator" [
          (match_operand 1 "pmode_register_operand")
          (match_operand 4 "const_int_operand")]))
   (set (match_operand:GPI 2 "aarch64_ldp_reg_operand")
        (match_operator 6 "memory_operand" [
          (match_operator 9 "pmode_plus_operator" [
            (match_dup 1)
            (match_dup 4)
          ])]))
   (set (match_operand:GPI 3 "aarch64_ldp_reg_operand")
        (match_operator 7 "memory_operand" [
          (match_operator 10 "pmode_plus_operator" [
             (match_dup 1)
             (match_operand 5 "const_int_operand")
          ])]))]
  "aarch64_mem_pair_offset (operands[4], <MODE>mode)
   && known_eq (INTVAL (operands[5]),
                INTVAL (operands[4]) + GET_MODE_SIZE (<MODE>mode))"
  {@ [cons: =&0, 1, =2, =3; attrs: type     ]
     [       rk, 0,  r,  r; load_<ldpstp_sz>] ldp\t%<w>2, %<w>3, [%0, %4]!
     [       rk, 0,  w,  w; neon_load1_2reg ] ldp\t%<v>2, %<v>3, [%0, %4]!
  }
)

which simply doesn't match due to the shape of the RTL: the pattern hard-codes
a plus in the address of both memory operands, but with the offset of -4 here
the second access is at offset -4 + 4 = 0, so its address is a bare register:
it accesses memory directly at (the initial value of) x0.
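
To make the shape mismatch concrete, here is a toy model in Python. It is
only a sketch and not GCC code: the tuple encoding and the names make_addr and
matches_pre_pair_pattern are hypothetical, and the model assumes that a
(plus base 0) address is simplified to the bare base register, which is what
the RTL dump above shows for the second load.

```python
# Toy model (not GCC code): addresses are tuples, and a zero offset
# collapses to a bare register, as seen in the RTL dump above.

def make_addr(base, offset):
    """Build a toy address; (plus base 0) is simplified to just the base."""
    return ("reg", base) if offset == 0 else ("plus", ("reg", base), offset)

def matches_pre_pair_pattern(addr1, addr2, mode_size):
    """Mimic the shape check of the pre-index load pair pattern: both
    addresses must be (plus base const), with the same base, and the second
    offset must equal the first offset plus the mode size."""
    if addr1[0] != "plus" or addr2[0] != "plus":
        return False
    same_base = addr1[1] == addr2[1]
    return same_base and addr2[2] == addr1[2] + mode_size

SF_SIZE = 4  # size in bytes of an SFmode (single-precision float) value

# Case from this PR: writeback offset -4, so the second access is at x0 + 0.
first = make_addr("x0", -4)
second = make_addr("x0", -4 + SF_SIZE)
print(second)                                             # ('reg', 'x0')
print(matches_pre_pair_pattern(first, second, SF_SIZE))   # False

# With an offset of -8, both addresses stay in plus form.
first_ok = make_addr("x0", -8)
second_ok = make_addr("x0", -8 + SF_SIZE)
print(matches_pre_pair_pattern(first_ok, second_ok, SF_SIZE))  # True
```

In this model only the offset that makes the second address land exactly on
the base register falls outside the pattern; other in-range offsets keep both
addresses in plus form.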

We could add a second pattern to handle this specific case, or we could just
adjust try_promote_writeback to not assert that recog succeeds and accept the
missed optimization for the time being.
