------- Additional Comments From steven at gcc dot gnu dot org 2005-01-23 19:23 -------
For x86 I get this:
g:
	movl	r+8, %edx
	movl	r, %eax
	addl	%edx, %eax
	movl	%eax, r
	addl	r+4, %eax
	movl	%eax, r+4
	addl	%edx, %eax
	movl	%eax, r+8
	ret
That is pretty much the best you can get, as far as I can tell.
For AMD64 it's similar:
g:
.LFB2:
	movl	r+8(%rip), %edx
	movl	r(%rip), %eax
	addl	%edx, %eax
	movl	%eax, r(%rip)
	addl	r+4(%rip), %eax
	movl	%eax, r+4(%rip)
	addl	%edx, %eax
	movl	%eax, r+8(%rip)
	ret
.LFE2:
I'm not sure what you think the missed optimization is here. You will have
to show what you want at the assembly level, and explain why you think this
is a coalescing problem. So far, I don't see a missed optimization.
What is worse is that we fail to do store motion when you put such blocks
inside a loop, e.g.
int r[6];

void g (int n)
{
  while (--n)
    {
      r[0] += r[1];
      r[1] += r[2];
      r[2] += r[0];
    }
}
which is the issue discussed in PR19581.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19580