This is a target bug as it does not affect any reasonable processor.
With -mfpmath=sse -msse2 I get:
.L2:
decl %eax
addsd %xmm1, %xmm0
jne .L2
my example was about version 3.4.4, which still has this problem with sse options:
.L5:
movsd -8(%ebp), %xmm1
decl %eax
addsd %xmm0, %xmm1
movsd %xmm1, -8(%ebp)
jns .L5
you're right about my example with 4.0, but the testcase by benjamin still has this problem with 4.0 and sse:
the inner loop:
.L126:
incl %eax
movsd -8(%edx), %xmm0
movsd (%edx), %xmm1
addl $8, %edx
cmpl $1000, %eax
mulsd %xmm0, %xmm1
addsd %xmm1, %xmm0
addsd -48(%ebp), %xmm0
movsd %xmm0, -48(%ebp)
jne .L126
inner loop with one of the changes benjamin suggested, which shouldn't have any effect:
.L124:
incl %eax
movsd -8(%edx), %xmm0
movsd (%edx), %xmm1
addl $8, %edx
cmpl $1000, %eax
mulsd %xmm0, %xmm1
addsd %xmm1, %xmm0
addsd %xmm0, %xmm2
jne .L124
-- Stefan Strasser
