------- Comment #6 from ubizjak at gmail dot com  2007-07-14 14:04 -------
(In reply to comment #5)
> > This is two more movdqa then the hand-written code in CallSumDeltas3.
>
>         paddd   %xmm1, %xmm0    (2)
>         movdqa  %xmm0, %xmm1    (2)
>         movdqa  %xmm0, foo1     (1)
>         jne     .L7

(1) is fixed by http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01330.html

(2) It looks like the register allocator should be enhanced to match the insn
_output_ to whichever input will produce fewer moves. We are dealing with the
"%0" constraint (operand 1 is commutative with operand 2 and must match
operand 0):

  [(set (match_operand:SSEMODEI 0 "register_operand" "=x")
        (plus:SSEMODEI (match_operand:SSEMODEI 1 "nonimmediate_operand" "%0")
                       (match_operand:SSEMODEI 2 "nonimmediate_operand" "xm")))]

So there is no reason why the RA shouldn't match the output with the optimal
_input_, producing a sequence that is one insn shorter:

        ...
        cmpl    $100000000, %eax
        movdqa  %xmm0, %xmm1
        pslldq  $8, %xmm1
        paddd   %xmm0, %xmm1    # paddd  %xmm1, %xmm0
                                # movdqa %xmm0, %xmm1
        jne     .L7


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-07-14 14:04:19
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
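
The operand choice suggested in (2) can be illustrated with a toy cost model. This is only a sketch, not GCC's register allocator: `Operand`, `moves_if_tied` and `pick_tied_input` are hypothetical names, and the liveness flags in the example are assumptions, since the comment elides the rest of the loop. It models a commutative two-address insn whose result overwrites whichever input is tied to the output, and counts the extra register moves each choice would need.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Operand:
    reg: str          # register currently holding this input
    live_after: bool  # assumed: is the input value still needed afterwards?


def moves_if_tied(tied: Operand, want_reg: Optional[str]) -> int:
    """Count extra reg-reg moves if the output is tied to `tied`."""
    moves = 0
    if tied.live_after:
        moves += 1  # the input gets overwritten, so it must be copied first
    if want_reg is not None and want_reg != tied.reg:
        moves += 1  # the result lands in the wrong register, copy it out
    return moves


def pick_tied_input(a: Operand, b: Operand,
                    want_reg: Optional[str] = None) -> Tuple[Operand, int]:
    """Pick the cheaper input to tie to the output (prefers `a` on ties)."""
    cost_a = moves_if_tied(a, want_reg)
    cost_b = moves_if_tied(b, want_reg)
    return (a, cost_a) if cost_a <= cost_b else (b, cost_b)


# The paddd from the comment: inputs in %xmm0 and %xmm1, sum wanted in %xmm1.
# Assumption: neither input value is needed after the add.
xmm0 = Operand("xmm0", live_after=False)
xmm1 = Operand("xmm1", live_after=False)
chosen, extra_moves = pick_tied_input(xmm0, xmm1, want_reg="xmm1")
print(chosen.reg, extra_moves)
```

Under these assumptions, tying the output to %xmm0 costs one extra movdqa, while tying it to %xmm1 costs none, which matches the shorter sequence proposed above.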