------- Comment #7 from uros at kss-loka dot si 2006-08-17 07:21 -------
(In reply to comment #6)
> I think that remaining time difference is due to strange loop above innermost:
Correction: that should read "... due to the strange _header_ above the innermost loop ...".
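Presumably the "header" is the guard that loop header copying places in front of the innermost loop. The first evaluation of the exit test is duplicated into an "if", and the loop becomes a do-while with its test at the bottom. Below is a minimal C sketch of that shape. The function names and the CSR layout are hypothetical and inferred only from the variable names in the dump (row, rowR, rowRp1, sum). It is not the testcase attached to this PR.

```c
#include <assert.h>

/* Hypothetical row kernel of a CSR sparse matrix-vector product. The names
   row, rowR, rowRp1 and sum come from the dump; everything else is an
   illustrative assumption, not the testcase attached to the PR. */

/* As written in source: the exit test is evaluated at the top of the loop. */
double row_dot_while(const int *row, const int *col, const double *val,
                     const double *x, int r)
{
    int rowR = row[r];
    int rowRp1 = row[r + 1];
    assert(rowR <= rowRp1);       /* CSR row pointers never decrease */
    double sum = 0.0;
    for (int i = rowR; i < rowRp1; i++)
        sum += x[col[i]] * val[i];
    return sum;
}

/* The same kernel after loop header copying: the first evaluation of the
   exit test becomes a guarding "if", and the loop turns into a do-while
   with its test at the bottom. Written to mirror the dump, so the zero
   is stored separately on each arm of the guard. */
double row_dot_rotated(const int *row, const int *col, const double *val,
                       const double *x, int r)
{
    int rowR = row[r];
    int rowRp1 = row[r + 1];
    assert(rowR <= rowRp1);       /* CSR row pointers never decrease */
    double sum;
    if (rowR < rowRp1) {
        int i = rowR;
        sum = 0.0;                /* zero on the "loop entered" arm */
        do {
            sum += x[col[i]] * val[i];
            i++;
        } while (i < rowRp1);
    } else {
        sum = 0.0;                /* ...and again on the "loop skipped" arm */
    }
    return sum;
}
```

Both functions compute the same value for every row, including empty rows where the guard is false.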
The problem is that we load zero into sum in both arms of the "if".
This is what I get in the .099t.optimized dump (gcc-4.2, -O2 -fno-ivopts):
<L1>:;
  r.0 = (unsigned int) r;
  D.1556 = r.0 * 4;
  rowR = *((int *) D.1556 + row);
  rowRp1 = *((int *) D.1556 + row + 4B);
  if (rowR < rowRp1) goto <L41>; else goto <L42>;

<L42>:;
  sum = 0.0;
  goto <bb 5> (<L4>);

<L41>:;
  i = rowR;
  sum = 0.0;
The assignment "sum = 0.0;" should be moved before the "if", since both arms perform it.
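Here is a C-level sketch of the requested shape, with "sum = 0.0" hoisted above the guard so that each row stores the zero only once. The name csr_matvec_hoisted and its argument order are hypothetical, and the CSR layout is inferred from the variable names in the dump. It shows what the code should look like after the transformation and is not a patch to GCC.

```c
#include <assert.h>

/* Hypothetical CSR sparse matrix-vector product y = A*x. Variable names
   follow the dump; the signature is an illustrative assumption. */
void csr_matvec_hoisted(int M, double *y, const double *val, const int *row,
                        const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        int rowR = row[r];
        int rowRp1 = row[r + 1];
        assert(rowR <= rowRp1);   /* CSR row pointers never decrease */
        double sum = 0.0;         /* hoisted: one zero store for both outcomes */
        if (rowR < rowRp1) {      /* guard in front of the rotated inner loop */
            int i = rowR;
            do {
                sum += x[col[i]] * val[i];
                i++;
            } while (i < rowRp1);
        }
        y[r] = sum;               /* empty rows keep the hoisted 0.0 */
    }
}
```

The hoist preserves behaviour because both arms of the original "if" store the same constant. Moving that store into their common predecessor therefore leaves every value of sum unchanged.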
With SSE math, the zero load is somehow CSE'd during the RTL passes:
.L8:
	movl	20(%ebp), %edx
	movapd	%xmm2, %xmm1
	movl	(%edx,%ebx,4), %eax
	movl	4(%edx,%ebx,4), %ecx
	cmpl	%ecx, %eax
	jge	.L11
	movl	%eax, %edx
	.p2align 4,,7
.L12:
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676