Ian Lance Taylor <ian@airs.com> writes:

> I mentioned on IRC that I had a simple patch to let the RTL level
> aliasing analysis see the underlying decl, the one with the restrict
> qualifier.  My original patch was for the 4.0 branch.  This is a
> version updated for the 4.1 branch.
I forgot to add the effects.  For this test case:

void
copy (int * __restrict p, const int * __restrict q, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    {
      p[0] = q[0];
      p[1] = q[1];
      p[2] = q[2];
      p[3] = q[3];
      p += 4;
      q += 4;
    }
}

compiled with -O2 -fschedule-insns on i686-pc-linux-gnu, the unpatched
compiler generates this code in the loop:

	movl	(%edx), %eax
	incl	%ebx
	movl	%eax, (%ecx)
	movl	4(%edx), %eax
	movl	%eax, 4(%ecx)
	movl	8(%edx), %eax
	movl	%eax, 8(%ecx)
	movl	12(%edx), %eax
	addl	$16, %edx
	movl	%eax, 12(%ecx)
	addl	$16, %ecx

The patched compiler generates this code:

	movl	4(%esi), %eax
	movl	(%esi), %ebx
	movl	8(%esi), %edx
	movl	12(%esi), %ecx
	addl	$16, %esi
	incl	-16(%ebp)
	movl	%eax, 4(%edi)
	movl	%ebx, (%edi)
	movl	-16(%ebp), %eax
	movl	%edx, 8(%edi)
	movl	%ecx, 12(%edi)
	addl	$16, %edi

In the unpatched compiler, the RTL level does not see that p and q
cannot alias each other, and therefore it performs the assignments
precisely as they appear in the program.  In the patched compiler, the
RTL level sees that there is no aliasing, and all the loads are done
before all the stores.  The latter code will normally minimize load
delays.  Of course this will have a more dramatic effect on processors
which do in-order execution.

Ian