------- Comment #7 from uros at kss-loka dot si 2006-08-17 07:21 -------
(In reply to comment #6)
> I think that remaining time difference is due to strange loop above innermost:

... due to strange _header_ above innermost loop ...

The problem is that we load zero in both arms of the "if". This is what I get in .099t.optimized (using gcc-4.2 -O2 -fno-ivopts):

<L1>:;
  r.0 = (unsigned int) r;
  D.1556 = r.0 * 4;
  rowR = *((int *) D.1556 + row);
  rowRp1 = *((int *) D.1556 + row + 4B);
  if (rowR < rowRp1) goto <L41>; else goto <L42>;

<L42>:;
  sum = 0.0;
  goto <bb 5> (<L4>);

<L41>:;
  i = rowR;
  sum = 0.0;

The assignment to sum should be moved before the if.

SSE is somehow able to CSE the zero load during RTL:

.L8:
        movl    20(%ebp), %edx
        movapd  %xmm2, %xmm1
        movl    (%edx,%ebx,4), %eax
        movl    4(%edx,%ebx,4), %ecx
        cmpl    %ecx, %eax
        jge     .L11
        movl    %eax, %edx
        .p2align 4,,7
.L12:

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21676
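For illustration, a minimal C sketch of the two shapes being compared. It assumes a CSR (compressed sparse row) matrix-vector kernel, inferred only from the names row, rowR, rowRp1, i and sum in the dump above; the function names matmult_guarded and matmult_hoisted and their signatures are hypothetical, not the benchmark's actual code. The first function mirrors the dump, where the inner loop sits behind the rowR < rowRp1 guard and sum = 0.0 is assigned in both arms. The second has the assignment moved before the if, as suggested above.

```c
/* Hypothetical CSR kernel: row[r]..row[r+1] delimit the nonzeros of row r,
   col[] holds their column indices and val[] their values. Computes y = A*x. */

/* Shape seen in the dump: the inner loop is guarded by rowR < rowRp1,
   and sum = 0.0 is assigned separately in each arm of the branch. */
void matmult_guarded(int M, double *y, const double *val, const int *row,
                     const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        int rowR = row[r];
        int rowRp1 = row[r + 1];
        double sum;
        if (rowR < rowRp1) {
            int i = rowR;
            sum = 0.0;                 /* zero assigned in this arm ... */
            do {
                sum += x[col[i]] * val[i];
                i++;
            } while (i < rowRp1);
        } else {
            sum = 0.0;                 /* ... and again in this one */
        }
        y[r] = sum;
    }
}

/* Same computation with the common assignment hoisted above the branch,
   so sum is set to zero once per row regardless of which arm runs. */
void matmult_hoisted(int M, double *y, const double *val, const int *row,
                     const int *col, const double *x)
{
    for (int r = 0; r < M; r++) {
        int rowR = row[r];
        int rowRp1 = row[r + 1];
        double sum = 0.0;              /* single assignment before the if */
        if (rowR < rowRp1) {
            int i = rowR;
            do {
                sum += x[col[i]] * val[i];
                i++;
            } while (i < rowRp1);
        }
        y[r] = sum;
    }
}
```

Both functions produce the same y for any valid CSR input, including empty rows; only the placement of the sum = 0.0 assignment differs, which is the transformation this comment asks the optimizer to perform.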