On Tue, Jun 06, 2006 at 08:27:10PM +0200, Rask Ingemann Lambertsen wrote:
> On Tue, Jun 06, 2006 at 10:39:46AM +0100, Richard Sandiford wrote:

> > This is just a guess, but the insn above might be an output reload.
> 
> It is, in a peculiar (and not useful) way. Diffing the greg dump against
> the lreg dump shows (using the example code I posted):
> 
> +(insn:HI 25 17 38 2 (set (reg:QI 3 r3)
> +        (reg:QI 3 r3 [110])) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
> 
> -(insn:HI 25 17 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 101 [ x ])
> +(insn 38 25 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 0 r0 [orig:101 x ]
> + [101])
>                  (const_int 5 [0x5])) [0 <variable>.c2+0 S1 A8])
> -        (subreg:QI (reg:SI 110) 0)) 158 {*arm_movqi_insn_swp} (nil)
> -    (expr_list:REG_DEAD (reg:SI 110)
> -        (expr_list:REG_DEAD (reg/v/f:SI 101 [ x ])
> -            (nil))))
> +        (reg:QI 3 r3)) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
> 
> I.e. change insn 25 to a nop and then add insn 38 as essentially a duplicate
> of the original insn 25. I don't think reload was supposed to do that.

The reason this happens is because reload decides that the cheapest
alternative is the first one, which is reg->reg. Then of course it has to
make an output reload to copy the value to memory...

I have not yet looked into why reload thinks the reg->reg alternative is the
cheapest one. That'll be tomorrow at the earliest.

Meanwhile, to ensure that reload decides upon the alternative we want,
delete the *arm_movqi_insn_swp pattern and use these two instead:

(define_insn "*arm_movqi_insn_to_reg"
  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
        (match_operand:QI 1 "general_operand" "rI,K,m"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1"
  [(set_attr "type" "*,*,load1")
   (set_attr "predicable" "yes")]
)

(define_insn "*arm_movqi_insn_swp"
  [(set (match_operand:QI 0 "memory_operand" "=Q")
        (match_operand:QI 1 "nonmemory_operand" "r"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
  [(set_attr "type" "store1")
   (set_attr "predicable" "yes")]
)

Also, undo the change to arm_legitimate_address_p() arm.c.

Btw, why are there calls to register_operand() in the insn conditions?

With the changes above, I now get this (-O2 -swp-byte-writes):

bytewritetest:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldrb    r3, [r0, #5]    @ zero_extendqisi2
        ldrb    r2, [r0, #4]    @ zero_extendqisi2
        eor     r1, r3, r2
        add     r3, r3, r2
        add     r2, r0, #5
        ldr     ip, [r0, #0]
        swpb    r1, r1, [r2, #0]
        @ lr needed for prologue
        str     ip, [r0, #8]
        str     r3, [r0, #0]
        bx      lr

That's a lot better. It is now only one instruction longer than without
-swp-byte-writes.

-- 
Rask Ingemann Lambertsen

Reply via email to