On Tue, Jun 06, 2006 at 08:27:10PM +0200, Rask Ingemann Lambertsen wrote:
> On Tue, Jun 06, 2006 at 10:39:46AM +0100, Richard Sandiford wrote:
> > This is just a guess, but the insn above might be an output reload.
>
> It is, in a peculiar (and not useful) way. Diffing the greg dump against
> the lreg dump shows (using the example code I posted):
>
> +(insn:HI 25 17 38 2 (set (reg:QI 3 r3)
> +        (reg:QI 3 r3 [110])) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
>
> -(insn:HI 25 17 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 101 [ x ])
> +(insn 38 25 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 0 r0 [orig:101 x ]
> +                [101])
>              (const_int 5 [0x5])) [0 <variable>.c2+0 S1 A8])
> -        (subreg:QI (reg:SI 110) 0)) 158 {*arm_movqi_insn_swp} (nil)
> -    (expr_list:REG_DEAD (reg:SI 110)
> -        (expr_list:REG_DEAD (reg/v/f:SI 101 [ x ])
> -            (nil))))
> +        (reg:QI 3 r3)) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
>
> I.e. change insn 25 to a nop and then add insn 38 as essentially a duplicate
> of the original insn 25. I don't think reload was supposed to do that.

The reason this happens is that reload decides the cheapest alternative is
the first one, which is reg->reg. Then, of course, it has to make an output
reload to copy the value to memory...

I have not yet looked into why reload thinks the reg->reg alternative is the
cheapest one. That'll be tomorrow at the earliest.
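
For reference, the pattern being replaced is roughly the two patterns below
merged into one, i.e. something like the following, where the first
alternative is the reg <- reg one ("=r" from "rI") that reload is choosing.
(This is a reconstruction, so the exact predicates and constraints may
differ from what is in the tree.)

(define_insn "*arm_movqi_insn_swp"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
        (match_operand:QI 1 "general_operand" "rI,K,m,r"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
  [(set_attr "type" "*,*,load1,store1")
   (set_attr "predicable" "yes")]
)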

Meanwhile, to ensure that reload picks the alternative we want, delete the
*arm_movqi_insn_swp pattern and use these two instead:
(define_insn "*arm_movqi_insn_to_reg"
[(set (match_operand:QI 0 "register_operand" "=r,r,r")
(match_operand:QI 1 "general_operand" "rI,K,m"))]
"TARGET_ARM && TARGET_SWP_BYTE_WRITES
&& ( register_operand (operands[0], QImode)
|| register_operand (operands[1], QImode))"
"@
mov%?\\t%0, %1
mvn%?\\t%0, #%B1
ldr%?b\\t%0, %1"
[(set_attr "type" "*,*,load1")
(set_attr "predicable" "yes")]
)
(define_insn "*arm_movqi_insn_swp"
[(set (match_operand:QI 0 "memory_operand" "=Q")
(match_operand:QI 1 "nonmemory_operand" "r"))]
"TARGET_ARM && TARGET_SWP_BYTE_WRITES
&& ( register_operand (operands[0], QImode)
|| register_operand (operands[1], QImode))"
"swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
[(set_attr "type" "store1")
(set_attr "predicable" "yes")]
)
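
In case it is not obvious why the second pattern does not need to clobber
operand 1: swp%?b stores the byte from %1 and overwrites %1 with the old
memory contents, and the ldr%?b that follows reads back the byte just
stored, so the byte value in %1 (all that matters for QImode) is unchanged.
Roughly, with made-up registers, the pattern emits:

	swpb	r3, r3, [r2]	@ low byte of r3 -> mem, old mem byte -> r3
	ldrb	r3, [r2]	@ read back the byte just stored, restoring r3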

Also, undo the change to arm_legitimate_address_p() in arm.c.

Btw, why are there calls to register_operand() in the insn conditions?

With the changes above, I now get this (-O2 -swp-byte-writes):
bytewritetest:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrb	r3, [r0, #5]	@ zero_extendqisi2
	ldrb	r2, [r0, #4]	@ zero_extendqisi2
	eor	r1, r3, r2
	add	r3, r3, r2
	add	r2, r0, #5
	ldr	ip, [r0, #0]
	swpb	r1, r1, [r2, #0]
	@ lr needed for prologue
	str	ip, [r0, #8]
	str	r3, [r0, #0]
	bx	lr

That's a lot better. It is now only one instruction longer than without
-swp-byte-writes.
--
Rask Ingemann Lambertsen