------- Comment #13 from zippel at gcc dot gnu dot org 2007-07-19 18:27 -------

The initial test case is part of the missed optimization. For example current stable Debian gcc (4.1.2 20061115) produces code like this:

        movl    4(%esp), %eax
        movl    8(%esp), %edx
        leal    (%eax,%edx,4), %edx
        movl    4(%edx), %ecx
        movl    8(%edx), %eax
        addl    %ecx, %eax
        movl    12(%edx), %ecx
        addl    %ecx, %eax
        ret

This has some unnecessary moves, but it shows the basic idea. With the moves eliminated it would be:

        movl    4(%esp), %eax
        movl    8(%esp), %edx
        leal    (%eax,%edx,4), %edx
        movl    4(%edx), %eax
        addl    8(%edx), %eax
        addl    12(%edx), %eax
        ret

In terms of code size this is identical to:

        movl    4(%esp), %ecx
        movl    8(%esp), %edx
        movl    8(%ecx,%edx,4), %eax
        addl    4(%ecx,%edx,4), %eax
        addl    12(%ecx,%edx,4), %eax
        ret

Which instruction sequence is better now depends on the target. The problem with the new canonical form is that, AFAICT, it has become very difficult in practice to generate the optimal sequence based on instruction costs. The older gcc produces this IL before RTL generation:

        D.1283 = (int *) (i * 4) + p;
        return *(D.1283 + 4B) + *(D.1283 + 8B) + *(D.1283 + 12B);

which produces far better RTL for the optimizers to work with.

BTW this problem is not limited to pointer expressions, since the lea instruction is used in other expressions as well. Let's take this example:

        void f(unsigned int *p, unsigned int a)
        {
                p[0] = a * 4 + 4;
                p[1] = a * 4 + 8;
                p[2] = a * 4 + 12;
        }

The gcc 4.1 mentioned above produces this:

        D.1281 = a * 4;
        *p = D.1281 + 4;
        *(p + 4B) = D.1281 + 8;
        *(p + 8B) = D.1281 + 12;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        sall    $2, %eax
        leal    4(%eax), %edx
        movl    %edx, (%ecx)
        leal    8(%eax), %edx
        addl    $12, %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

gcc 4.2 produces this:

        *p = (a + 1) * 4;
        D.1545 = a * 4;
        *(p + 4B) = D.1545 + 8;
        *(p + 8B) = D.1545 + 12;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        sall    $2, %eax
        movl    %edx, (%ecx)
        leal    8(%eax), %edx
        addl    $12, %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

So 4.2 already produces slightly worse code.
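
To make the two IL shapes easier to compare, here is a rough source-level sketch of both. It is only an illustration, not compiler output: the names f_shared, f_distributed and forms_agree are made up for this example. f_shared mirrors the 4.1 IL (one shared a * 4 plus small constant offsets), f_distributed mirrors the (a + k) * 4 shape. Since unsigned arithmetic wraps, both store the same values, so the difference is purely in how much work is left for the RTL optimizers to recover the shared subexpression.

```c
/* Shape of the gcc 4.1 IL: one shared product, then constant offsets.
   (Hypothetical helper name, for illustration only.) */
static void f_shared(unsigned int *p, unsigned int a)
{
    unsigned int t = a * 4;      /* corresponds to D.1281 = a * 4 */
    p[0] = t + 4;
    p[1] = t + 8;
    p[2] = t + 12;
}

/* Shape of the distributed form: every store recomputes (a + k) * 4.
   (Hypothetical helper name, for illustration only.) */
static void f_distributed(unsigned int *p, unsigned int a)
{
    p[0] = (a + 1) * 4;
    p[1] = (a + 2) * 4;
    p[2] = (a + 3) * 4;
}

/* Returns 1 if both shapes store identical values for the given a. */
static int forms_agree(unsigned int a)
{
    unsigned int x[3], y[3];

    f_shared(x, a);
    f_distributed(y, a);
    return x[0] == y[0] && x[1] == y[1] && x[2] == y[2];
}
```

Calling forms_agree() with any value, including ones where a * 4 wraps around, should return 1. Compiling both functions with -O2 -S on the gcc versions in question would be one way to compare the resulting instruction sequences.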
Current gcc finally produces:

        *p = (a + 1) * 4;
        *(p + 4) = (a + 2) * 4;
        *(p + 8) = (a + 3) * 4;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        movl    %edx, (%ecx)
        leal    8(,%eax,4), %edx
        leal    12(,%eax,4), %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

This now has the largest code size of all versions.

This new canonical form IMHO clearly conflicts with what is expected at the RTL level, so I don't understand why it's so important to use this one. Could you maybe explain the reason behind this choice?

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698