------- Comment #5 from ubizjak at gmail dot com  2007-07-12 07:05 -------
(In reply to comment #3)
> regmove should have changed that but it does not probably because the final
> constraint does not have a duplicate operand. Actually, I think you want to
> look at anddi_1_rex64, not adddi_1_rex64:

Yes, of course. anddi is the problematic insn.

>
>   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r")
>         (and:DI (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm")
>                 (match_operand:DI 2 "x86_64_szext_general_operand"
>                                     "Z,re,rm,L")))
>    (clobber (reg:CC FLAGS_REG))]
>
> The final constraint is for when and is used to create a zero-extending moves
> (L matches constants 0xFF and 0xFFFF). I would say that you have to 1) define
> a predicate which has the same behavior as L and 2) split that alternative out
> of the three anddi patterns that use it (grep for '\<L\>') into a separate
> insn.

Hm, please note that we are not operating with constants, but strictly with
registers. We are dealing with alternative 2 "=rm/%0/re", so I think that
splitting L out of the insn pattern would not have the desired effect.

However, I conducted a little experiment and changed (data & m1) into
(data - m1).
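To make the point of that substitution concrete: the '%' in the operand 1 constraint of the quoted pattern marks operands 1 and 2 as interchangeable, which is valid for & but would not be for -. A minimal sketch in C of that difference follows. The function names and values are hypothetical and are not taken from pr32725.c, which is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two forms of the expression;
   these are NOT the code from pr32725.c. */
uint64_t combine_and(uint64_t data, uint64_t m1) { return data & m1; }
uint64_t combine_sub(uint64_t data, uint64_t m1) { return data - m1; }

/* Return 1 if swapping the two operands leaves the result unchanged. */
int and_is_symmetric(uint64_t a, uint64_t b)
{
    return combine_and(a, b) == combine_and(b, a);
}

int sub_is_symmetric(uint64_t a, uint64_t b)
{
    return combine_sub(a, b) == combine_sub(b, a);
}

/* Small self-check with arbitrary sample values. */
void check_examples(void)
{
    assert(and_is_symmetric(0x1234, 0x00ff));   /* & may swap operands   */
    assert(!sub_is_symmetric(0x1234, 0x00ff));  /* - may not, in general */
}
```

With subtraction the operands cannot be swapped to satisfy the matching constraint, so the experiment takes operand commutation out of the picture.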
The minus pattern is not commutative, and in the lreg pass we have the
following sequence (I removed the clobber for the TImode reg in the middle):

(insn:HI 24 23 25 2 pr32725.c:24 (set (reg:DI 74)
        (zero_extend:DI (mem:HI (plus:DI (mult:DI (reg:DI 73)
                        (const_int 2 [0x2]))
                    (reg/v/f:DI 65 [ src ])) [3 S2 A16]))) 114 {zero_extendhidi2}
    (expr_list:REG_DEAD (reg:DI 73)
        (nil)))

(insn:HI 25 24 28 2 pr32725.c:24 (parallel [
            (set (reg:DI 74)
                (minus:DI (reg:DI 74)
                    (reg:DI 71)))
            (clobber (reg:CC 17 flags))
        ]) 237 {*subdi_1_rex64}
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn:HI 33 28 37 2 pr32725.c:24 (parallel [
            (set (reg:TI 79)
                (mult:TI (zero_extend:TI (reg:DI 74))
                    (zero_extend:TI (reg:DI 70))))
            (clobber (reg:CC 17 flags))
        ]) 264 {*umulditi3_insn}
    (expr_list:REG_DEAD (reg:DI 74)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

A natural selection for reg 74 would be %rax, which satisfies the constraints
of the whole sequence. It indeed _looks_ as Rask is saying: the allocator
doesn't notice the REG_DEAD note in insn 24 and somehow blocks the use of %rax
for the minus and mult exprs. This leads to an extra reload (insn 101):

(insn:HI 24 23 25 2 pr32725.c:24 (set (reg:DI 3 bx [74])
        (zero_extend:DI (mem:HI (plus:DI (mult:DI (reg:DI 0 ax [73])
                        (const_int 2 [0x2]))
                    (reg/v/f:DI 4 si [orig:65 src ] [65])) [3 S2 A16]))) 114 {zero_extendhidi2}
    (nil))

(insn:HI 25 24 101 2 pr32725.c:24 (parallel [
            (set (reg:DI 3 bx [74])
                (minus:DI (reg:DI 3 bx [74])
                    (reg:DI 38 r9 [71])))
            (clobber (reg:CC 17 flags))
        ]) 237 {*subdi_1_rex64}
    (nil))

(insn 101 25 33 2 pr32725.c:24 (set (reg:DI 0 ax)
        (reg:DI 3 bx [74])) 82 {*movdi_1_rex64}
    (nil))

(insn:HI 33 101 102 2 pr32725.c:24 (parallel [
            (set (reg:TI 0 ax)
                (mult:TI (zero_extend:TI (reg:DI 0 ax))
                    (zero_extend:TI (reg:DI 39 r10 [70]))))
            (clobber (reg:CC 17 flags))
        ]) 264 {*umulditi3_insn}
    (nil))

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32725