On 9/29/06, David Edelsohn <[EMAIL PROTECTED]> wrote:
> The GCC register allocator allocates objects that span multiple
> registers in adjacent registers.  For instance, a 64-bit doubleword
> integer (long long int) will be allocated in two adjacent hardware
> registers when GCC is targeted at a processor with 32-bit registers.

I guess I'm still not being clear.

Let's say I have a function like:

typedef int aligned_int __attribute__((__aligned__(8)));

int foo(aligned_int *a, aligned_int *b, int count)
{
  int i;
  int sum = 0;
  for (i = 0; i < count; i+=2) {
     sum += a[0] + b[0];
     sum += a[1] + b[1];
     a += 2; b += 2;
  }
  return sum;
}

The loads of a[0] and a[1] could be done together as one doubleword
load, if the register allocator placed them in adjacent registers.

Instead I get:

.LL5:
       ld      [%o0], %g3
       add     %g1, %g3, %g3
       ld      [%o1], %g1
       add     %g3, %g1, %g3
       ld      [%o1+4], %g1
       ld      [%o0+4], %g2
       add     %g4, 2, %g4
       add     %g2, %g1, %g2
       cmp     %o2, %g4
       add     %o0, 8, %o0
       add     %g3, %g2, %g1
       bg,pt   %icc, .LL5
        add    %o1, 8, %o1

Now peephole2 never really has a chance, because register allocation
has already assigned registers that aren't paired.  But if we could tell
an earlier pass that two values next to each other in memory can be loaded
together, we could fuse the loads, and the register allocator would
probably be just fine.
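
As a rough source-level sketch of the fused form (written by hand, not
something GCC produces): each pair of adjacent ints is fetched with one
8-byte copy, which a compiler may be able to turn into a single doubleword
load when it knows the alignment.  The names int_pair, load_pair and
foo_fused are hypothetical, and the sketch assumes a 4-byte int.

```c
#include <string.h>

/* Hypothetical pair type: two adjacent ints, in memory order.
   Assumes sizeof(int) == 4, so the struct is 8 bytes with no padding. */
typedef struct { int first, second; } int_pair;

/* Hypothetical helper: fetch p[0] and p[1] with one 8-byte copy.
   memcpy keeps this well-defined C regardless of alignment; whether it
   becomes a single doubleword load is up to the compiler and target. */
static int_pair load_pair(const int *p)
{
  int_pair v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Same arithmetic as foo() above, with the two loads per array fused. */
int foo_fused(const int *a, const int *b, int count)
{
  int i;
  int sum = 0;
  for (i = 0; i < count; i += 2) {
    int_pair va = load_pair(a);
    int_pair vb = load_pair(b);
    sum += va.first + vb.first;
    sum += va.second + vb.second;
    a += 2; b += 2;
  }
  return sum;
}
```

Note that the plain int pointers here do not carry the 8-byte alignment
that aligned_int promises, so this only illustrates the shape of the fused
loop, not a guaranteed code-generation result.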

And this can make a difference for microarchitectures that are limited by
bandwidth in and out of the cache, which is not uncommon.

I guess in a way this is "autovectorization of random code snippets", so maybe
it's too complex, but it seems within the realm of what combine could do...

--
Why are ``tolerant'' people so intolerant of intolerant people?
