Ian Lance Taylor <ian@airs.com> writes:

> I mentioned on IRC that I had a simple patch to let the RTL level
> aliasing analysis see the underlying decl, the one with the restrict
> qualifier.  My original patch was for the 4.0 branch.  This is a
> version updated for the 4.1 branch.

I forgot to add the effects.  For this test case:

void
copy (int * __restrict p, const int * __restrict q, unsigned int n)
{
  unsigned int i;

  for (i = 0; i < n; ++i)
    {
      p[0] = q[0];
      p[1] = q[1];
      p[2] = q[2];
      p[3] = q[3];
      p += 4;
      q += 4;
    }
}

compiled with -O2 -fschedule-insns on i686-pc-linux-gnu, the unpatched
compiler generates this code in the loop:

        movl    (%edx), %eax
        incl    %ebx
        movl    %eax, (%ecx)
        movl    4(%edx), %eax
        movl    %eax, 4(%ecx)
        movl    8(%edx), %eax
        movl    %eax, 8(%ecx)
        movl    12(%edx), %eax
        addl    $16, %edx
        movl    %eax, 12(%ecx)
        addl    $16, %ecx

The patched compiler generates this code:

        movl    4(%esi), %eax
        movl    (%esi), %ebx
        movl    8(%esi), %edx
        movl    12(%esi), %ecx
        addl    $16, %esi
        incl    -16(%ebp)
        movl    %eax, 4(%edi)
        movl    %ebx, (%edi)
        movl    -16(%ebp), %eax
        movl    %edx, 8(%edi)
        movl    %ecx, 12(%edi)
        addl    $16, %edi

In the unpatched compiler, the RTL level does not see that p and q
cannot alias each other, and therefore performs the assignments
exactly as they appear in the program.  In the patched compiler, the
RTL level sees that there is no aliasing, so all the loads are done
before all the stores.  The latter code will normally minimize load
delays.  Of course the effect will be more dramatic on processors
that execute in order.

Ian