[Bug tree-optimization/32698] [4.3 regression] inefficient pointer expression

zippel at gcc dot gnu dot org Thu, 19 Jul 2007 11:27:31 -0700


------- Comment #13 from zippel at gcc dot gnu dot org  2007-07-19 18:27 -------
The initial test case is part of the missed optimization. For example, the current
stable Debian gcc (4.1.2 20061115) produces code like this:


        movl    4(%esp), %eax
        movl    8(%esp), %edx
        leal    (%eax,%edx,4), %edx
        movl    4(%edx), %ecx
        movl    8(%edx), %eax
        addl    %ecx, %eax
        movl    12(%edx), %ecx
        addl    %ecx, %eax
        ret

This has some unnecessary moves, but it shows the basic idea. With the
moves eliminated it would be:

        movl    4(%esp), %eax
        movl    8(%esp), %edx
        leal    (%eax,%edx,4), %edx
        movl    4(%edx), %eax
        addl    8(%edx), %eax
        addl    12(%edx), %eax
        ret

In terms of code size this is identical to:

        movl    4(%esp), %ecx
        movl    8(%esp), %edx
        movl    8(%ecx,%edx,4), %eax
        addl    4(%ecx,%edx,4), %eax
        addl    12(%ecx,%edx,4), %eax
        ret

Which instruction sequence is better now depends on the target.
The problem with the new canonical form is that, AFAICT, it has become
very difficult in practice to generate the optimal sequence based on
instruction costs.

The older gcc produces this IL before RTL generation:

  D.1283 = (int *) (i * 4) + p;
  return *(D.1283 + 4B) + *(D.1283 + 8B) + *(D.1283 + 12B);

which produces far better RTL for the optimizers to work with.
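
For reference, the test case itself is not quoted in this comment. Judging
only from the IL above, it presumably looks something like the first function
below; the function names are hypothetical. The second function spells out by
hand what the older IL does: the base address is computed once and the three
loads use constant offsets from it.

```c
/* Hypothetical reconstruction of the test case, inferred from the IL above. */
int sum3_indexed(int *p, int i)
{
  return p[i + 1] + p[i + 2] + p[i + 3];
}

/* Hand-written equivalent of the older IL: p + i is computed once,
   which corresponds to D.1283 = (int *) (i * 4) + p, and the three
   loads then use the constant offsets 4B, 8B and 12B. */
int sum3_shared_base(int *p, int i)
{
  int *base = p + i;
  return base[1] + base[2] + base[3];
}
```

Both functions compute the same value; the only difference is whether the
shared subexpression is already visible before RTL generation.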

BTW this problem is not limited to pointer expressions, since the lea
instruction is used in other expressions as well.
Let's take this example:

void f(unsigned int *p, unsigned int a)
{
  p[0] = a * 4 + 4;
  p[1] = a * 4 + 8;
  p[2] = a * 4 + 12;
}

The gcc 4.1 mentioned above produces this:

  D.1281 = a * 4;
  *p = D.1281 + 4;
  *(p + 4B) = D.1281 + 8;
  *(p + 8B) = D.1281 + 12;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        sall    $2, %eax
        leal    4(%eax), %edx
        movl    %edx, (%ecx)
        leal    8(%eax), %edx
        addl    $12, %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

gcc 4.2 produces this:

  *p = (a + 1) * 4;
  D.1545 = a * 4;
  *(p + 4B) = D.1545 + 8;
  *(p + 8B) = D.1545 + 12;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        sall    $2, %eax
        movl    %edx, (%ecx)
        leal    8(%eax), %edx
        addl    $12, %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

So 4.2 already produces slightly worse code.
Current gcc finally produces:

  *p = (a + 1) * 4;
  *(p + 4) = (a + 2) * 4;
  *(p + 8) = (a + 3) * 4;

        movl    8(%esp), %eax
        movl    4(%esp), %ecx
        leal    4(,%eax,4), %edx
        movl    %edx, (%ecx)
        leal    8(,%eax,4), %edx
        leal    12(,%eax,4), %eax
        movl    %edx, 4(%ecx)
        movl    %eax, 8(%ecx)
        ret

This now has the largest code size of all versions.
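
To make the two forms concrete, here is a minimal C sketch of the same
function written once in the shape of the gcc 4.1 IL and once in the shape of
the current IL. The names f_shared and f_distributed are made up for
illustration. Since unsigned arithmetic wraps, both always store the same
values, so the choice between them is purely a question of which one gives
the RTL optimizers the better starting point.

```c
/* Shape of the gcc 4.1 IL: a * 4 is computed once and reused. */
void f_shared(unsigned int *p, unsigned int a)
{
  unsigned int t = a * 4;
  p[0] = t + 4;
  p[1] = t + 8;
  p[2] = t + 12;
}

/* Shape of the current IL: each store gets its own (a + k) * 4,
   so there is no common subexpression left to share. */
void f_distributed(unsigned int *p, unsigned int a)
{
  p[0] = (a + 1) * 4;
  p[1] = (a + 2) * 4;
  p[2] = (a + 3) * 4;
}
```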

This new canonical form IMHO clearly conflicts with what is expected at RTL
level, so I don't understand why it's so important to use this one. Could you
maybe explain the reason behind this choice?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32698
