On 9/29/06, David Edelsohn <[EMAIL PROTECTED]> wrote:
> The GCC register allocator allocates objects that span multiple registers in adjacent registers. For instance, a 64-bit doubleword integer (long long int) will be allocated in two adjacent hardware registers when GCC is targeted at a processor with 32-bit registers.

I guess I'm still not being clear. Let's say I have a function like:

    typedef int aligned_int __attribute__((__aligned__(8)));

    int foo(aligned_int *a, aligned_int *b, int count)
    {
      int i;
      int sum = 0;

      for (i = 0; i < count; i += 2) {
        sum += a[0] + b[0];
        sum += a[1] + b[1];
        a += 2;
        b += 2;
      }
      return sum;
    }

The loads of a[0] and a[1] could be done together as one load if the register allocator placed them next to each other. Instead I get:

    .LL5:
            ld      [%o0], %g3
            add     %g1, %g3, %g3
            ld      [%o1], %g1
            add     %g3, %g1, %g3
            ld      [%o1+4], %g1
            ld      [%o0+4], %g2
            add     %g4, 2, %g4
            add     %g2, %g1, %g2
            cmp     %o2, %g4
            add     %o0, 8, %o0
            add     %g3, %g2, %g1
            bg,pt   %icc, .LL5
            add     %o1, 8, %o1

Now the peephole2 never really has a chance, because register allocation has already assigned registers that aren't paired. But if we could tell an earlier pass that two values next to each other in memory can be loaded together, we could have fused the loads, and the register allocator would probably have been just fine.

And this can make a difference for microarchitectures that are limited by bandwidth in and out of the cache, which is not uncommon.

I guess in a way this is "autovectorization of random code snippets", so maybe it's too complex, but it seems within the realm of what combine could do...

-- 
Why are ``tolerant'' people so intolerant of intolerant people?
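
For illustration, here is roughly what the fused form would look like if written by hand at the source level. This is only a sketch: load_pair and foo_paired are made-up names, and it assumes both pointers are 8-byte aligned and count is even. The idea is that each pair of neighbouring ints is fetched as one 8-byte object, which a compiler is free to turn into a single doubleword load (ldd on SPARC) into a register pair. Whether any particular GCC version actually does so for this code is not something checked here.

```c
#include <string.h>

/* Hypothetical helper type: two neighbouring ints treated as one
   8-byte object, so they can be fetched with a single access. */
struct int_pair {
  int first;
  int second;
};

/* Copy 8 bytes starting at p into a pair. memcpy keeps this valid C
   regardless of alignment or aliasing; the caller is assumed to pass
   an 8-byte-aligned pointer so a doubleword load is possible. */
static inline struct int_pair load_pair(const int *p)
{
  struct int_pair v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Same result as foo() above, with each a[0]/a[1] and b[0]/b[1]
   pair loaded together. Assumes count is even. */
int foo_paired(const int *a, const int *b, int count)
{
  int i;
  int sum = 0;

  for (i = 0; i < count; i += 2) {
    struct int_pair pa = load_pair(a);
    struct int_pair pb = load_pair(b);

    sum += pa.first + pb.first;
    sum += pa.second + pb.second;
    a += 2;
    b += 2;
  }
  return sum;
}
```

The point of the sketch is that once the two values are tied together before register allocation, the allocator only has to place one 8-byte object, which it already puts in adjacent registers.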