On Tue, Jun 06, 2006 at 08:27:10PM +0200, Rask Ingemann Lambertsen wrote:
> On Tue, Jun 06, 2006 at 10:39:46AM +0100, Richard Sandiford wrote:
> > This is just a guess, but the insn above might be an output reload.
>
> It is, in a peculiar (and not useful) way. Diffing the greg dump against
> the lreg dump shows (using the example code I posted):
>
> +(insn:HI 25 17 38 2 (set (reg:QI 3 r3)
> +        (reg:QI 3 r3 [110])) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
>
> -(insn:HI 25 17 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 101 [ x ])
> +(insn 38 25 36 2 (set (mem/s:QI (plus:SI (reg/v/f:SI 0 r0 [orig:101 x ]
> +            [101])
>             (const_int 5 [0x5])) [0 <variable>.c2+0 S1 A8])
> -        (subreg:QI (reg:SI 110) 0)) 158 {*arm_movqi_insn_swp} (nil)
> -    (expr_list:REG_DEAD (reg:SI 110)
> -        (expr_list:REG_DEAD (reg/v/f:SI 101 [ x ])
> -            (nil))))
> +        (reg:QI 3 r3)) 158 {*arm_movqi_insn_swp} (nil)
> +    (nil))
>
> I.e. change insn 25 to a nop and then add insn 38 as essentially a duplicate
> of the original insn 25. I don't think reload was supposed to do that.

This happens because reload decides that the cheapest alternative is the
first one, which is reg->reg. Then of course it has to make an output
reload to copy the value to memory...

I have not yet looked into why reload thinks the reg->reg alternative is
the cheapest one. That'll be tomorrow at the earliest. Meanwhile, to ensure
that reload decides upon the alternative we want, delete the
*arm_movqi_insn_swp pattern and use these two instead:

(define_insn "*arm_movqi_insn_to_reg"
  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
        (match_operand:QI 1 "general_operand" "rI,K,m"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1"
  [(set_attr "type" "*,*,load1")
   (set_attr "predicable" "yes")]
)

(define_insn "*arm_movqi_insn_swp"
  [(set (match_operand:QI 0 "memory_operand" "=Q")
        (match_operand:QI 1 "nonmemory_operand" "r"))]
  "TARGET_ARM && TARGET_SWP_BYTE_WRITES
   && (   register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
  [(set_attr "type" "store1")
   (set_attr "predicable" "yes")]
)

Also, undo the change to arm_legitimate_address_p() in arm.c.

Btw, why are there calls to register_operand() in the insn conditions?

With the changes above, I now get this (-O2 -swp-byte-writes):

bytewritetest:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        ldrb    r3, [r0, #5]    @ zero_extendqisi2
        ldrb    r2, [r0, #4]    @ zero_extendqisi2
        eor     r1, r3, r2
        add     r3, r3, r2
        add     r2, r0, #5
        ldr     ip, [r0, #0]
        swpb    r1, r1, [r2, #0]
        @ lr needed for prologue
        str     ip, [r0, #8]
        str     r3, [r0, #0]
        bx      lr

That's a lot better. It is now only one instruction longer than without
-swp-byte-writes.

--
Rask Ingemann Lambertsen
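
A test case of roughly the following shape (a hypothetical sketch, not the
example code referenced above; the struct layout and function name are made
up to fit the RTL and assembly shown) contains the kind of QImode store to
memory that the swp-based pattern is meant to handle:

/* Hypothetical reconstruction: c2 sits at byte offset 5, like the
   <variable>.c2+0 access in the RTL dump above.  */
struct s
{
  int a;        /* offset 0 */
  char c1;      /* offset 4 */
  char c2;      /* offset 5, then two bytes of padding */
  int b;        /* offset 8 */
};

void
bytewritetest (struct s *x)
{
  x->b = x->a;
  x->a = x->c1 + x->c2;
  x->c2 = x->c1 ^ x->c2;        /* QImode store to memory */
}

Compiled at -O2 with the byte-writes option used above, the assignment to
x->c2 is the store that should end up going through the swpb sequence.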