------- Comment #5 from ubizjak at gmail dot com 2007-07-12 07:05 -------
(In reply to comment #3)
> regmove should have changed that but it does not probably because the final
> constraint does not have a duplicate operand. Actually, I think you want to
> look at anddi_1_rex64, not adddi_1_rex64:
Yes, of course. anddi is the problematic insn.
>
> [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r")
> (and:DI (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm")
> (match_operand:DI 2 "x86_64_szext_general_operand"
> "Z,re,rm,L")))
> (clobber (reg:CC FLAGS_REG))]
>
> The final constraint is for when and is used to create a zero-extending moves
> (L matches constants 0xFF and 0xFFFF). I would say that you have to 1) define
> a predicate which has the same behavior as L and 2) split that alternative out
> of the three anddi patterns that use it (grep for '\<L\>') into a separate
> insn.
Please note that we are not operating on constants here, but strictly on
registers. We are dealing with alternative 2, "=rm/%0/re", so I think that
splitting L out of the insn pattern would not have the desired effect.
However, I conducted a little experiment and changed (data & m1) into
(data - m1). The minus pattern is not commutative, and in the lreg pass we have
the following sequence (I removed the clobber for the TImode reg in the middle):
(insn:HI 24 23 25 2 pr32725.c:24 (set (reg:DI 74)
(zero_extend:DI (mem:HI (plus:DI (mult:DI (reg:DI 73)
(const_int 2 [0x2]))
(reg/v/f:DI 65 [ src ])) [3 S2 A16]))) 114
{zero_extendhidi2} (expr_list:REG_DEAD (reg:DI 73)
(nil)))
(insn:HI 25 24 28 2 pr32725.c:24 (parallel [
(set (reg:DI 74)
(minus:DI (reg:DI 74)
(reg:DI 71)))
(clobber (reg:CC 17 flags))
]) 237 {*subdi_1_rex64} (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn:HI 33 28 37 2 pr32725.c:24 (parallel [
(set (reg:TI 79)
(mult:TI (zero_extend:TI (reg:DI 74))
(zero_extend:TI (reg:DI 70))))
(clobber (reg:CC 17 flags))
]) 264 {*umulditi3_insn} (expr_list:REG_DEAD (reg:DI 74)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
A natural choice for reg 74 would be %rax, which satisfies the constraints of
the whole sequence. It indeed _looks_ as Rask says: the allocator doesn't
notice the REG_DEAD note in insn 24 and somehow blocks the use of %rax for the
minus and mult expressions. This leads to an extra reload (insn 101):
(insn:HI 24 23 25 2 pr32725.c:24 (set (reg:DI 3 bx [74])
(zero_extend:DI (mem:HI (plus:DI (mult:DI (reg:DI 0 ax [73])
(const_int 2 [0x2]))
(reg/v/f:DI 4 si [orig:65 src ] [65])) [3 S2 A16]))) 114
{zero_extendhidi2} (nil))
(insn:HI 25 24 101 2 pr32725.c:24 (parallel [
(set (reg:DI 3 bx [74])
(minus:DI (reg:DI 3 bx [74])
(reg:DI 38 r9 [71])))
(clobber (reg:CC 17 flags))
]) 237 {*subdi_1_rex64} (nil))
(insn 101 25 33 2 pr32725.c:24 (set (reg:DI 0 ax)
(reg:DI 3 bx [74])) 82 {*movdi_1_rex64} (nil))
(insn:HI 33 101 102 2 pr32725.c:24 (parallel [
(set (reg:TI 0 ax)
(mult:TI (zero_extend:TI (reg:DI 0 ax))
(zero_extend:TI (reg:DI 39 r10 [70]))))
(clobber (reg:CC 17 flags))
]) 264 {*umulditi3_insn} (nil))
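
For readers without the testcase at hand, here is a rough C sketch of the kind
of source that could produce the three-insn sequence above. It is only an
illustration, not the actual pr32725.c: the function name, the parameter
names other than src, and the choice to return the high half of the product
are all assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction, NOT the real pr32725.c testcase.
   Only "src", "data" and "m1" are names taken from the discussion;
   "mulhi_sub", "i" and "m2" are invented for illustration. */
uint64_t mulhi_sub(const uint16_t *src, size_t i, uint64_t m1, uint64_t m2)
{
    /* 16-bit load, zero-extended to 64 bits (cf. zero_extendhidi2). */
    uint64_t data = src[i];

    /* Non-commutative 64-bit operation (cf. *subdi_1_rex64). */
    data -= m1;

    /* Widening unsigned 64x64 -> 128 multiply (cf. *umulditi3_insn).
       unsigned __int128 is a GCC extension available on x86-64. */
    unsigned __int128 prod = (unsigned __int128)data * m2;

    /* Assumption: the caller wants the high 64 bits of the product. */
    return (uint64_t)(prod >> 64);
}
```

In a sketch like this, data feeds the one-operand mul instruction, which
takes one input implicitly in %rax, so keeping data in %rax from the load
onwards would make the extra move unnecessary.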
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32725