------- Additional Comments From steven at gcc dot gnu dot org  2005-01-23 19:23 -------
For x86 I get this:

g:
        movl    r+8, %edx
        movl    r, %eax
        addl    %edx, %eax
        movl    %eax, r
        addl    r+4, %eax
        movl    %eax, r+4
        addl    %edx, %eax
        movl    %eax, r+8
        ret

That is pretty much the best you can get, as far as I can tell.
For AMD64 it's similar:

g:
.LFB2:
        movl    r+8(%rip), %edx
        movl    r(%rip), %eax
        addl    %edx, %eax
        movl    %eax, r(%rip)
        addl    r+4(%rip), %eax
        movl    %eax, r+4(%rip)
        addl    %edx, %eax
        movl    %eax, r+8(%rip)
        ret
.LFE2:

I'm not sure what you think the missed optimization is here.  You will have
to show what you want at the assembly level, and explain why you think this
is a coalescing problem.  So far, I don't see a missed optimization.

What is worse is that we fail to do store motion when you put such blocks
inside a loop, e.g.

int r[6];
void g (int n)
{
  while (--n)
    {
      r[0] += r[1];
      r[1] += r[2];
      r[2] += r[0];
    }
}

which is the issue discussed in PR19581.
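As a sketch of what store motion would be expected to produce for that loop (g_sm is a hypothetical name for the transformed function, not something from the report or from GCC itself): the three array slots are promoted to locals, loaded once before the loop and stored back once after it, so the loop body works purely in registers.

```c
int r[6];

/* Original loop: every iteration both loads and stores
   r[0..2] through memory.  */
void g (int n)
{
  while (--n)
    {
      r[0] += r[1];
      r[1] += r[2];
      r[2] += r[0];
    }
}

/* Hypothetical result of store motion: r[0..2] live in
   locals (registers) for the duration of the loop; the
   stores are sunk to the loop exit.  */
void g_sm (int n)
{
  int r0 = r[0], r1 = r[1], r2 = r[2];
  while (--n)
    {
      r0 += r1;
      r1 += r2;
      r2 += r0;
    }
  r[0] = r0;
  r[1] = r1;
  r[2] = r2;
}
```

This is only legal here because nothing else can observe r mid-loop (no calls, no aliasing stores); with a call in the loop body the compiler would have to keep the memory accesses.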
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19580