http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53189

             Bug #: 53189
           Summary: DImode and/or/not/xor optimized poorly in
                    core-registers
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: a...@gcc.gnu.org
            Target: arm


The following code does not optimize well on current trunk.

--------------
void bar (long long, long long);

void
foo (long long a)
{
  bar (a&1, a);
}
--------------

Compiled with "-O2 -mfpu=vfpv3 -mthumb" gives:

--------------
foo:
        mov     r2, r0
        mov     r3, r1
        movs    r0, #1
        movs    r1, #0
        ands    r0, r0, r2
        ands    r1, r1, r3
        b       bar
--------------

As you can see, there are two missed optimizations here:
 1. Failure to notice that r1 will always be zero.
 2. Failure to use immediate constant "#1" with "ands".

I'd expect output like this:

--------------
        mov     r2, r0
        mov     r3, r1
        ands    r0, r0, #1
        mov     r1, #0
        b       bar
--------------

The problem is two-fold:

First, anddi3 does not expand to two instructions, so the two halves of the
operation cannot be optimized independently.

Second, anddi3 does not allow immediate constants, so the expander is forced to
put the constants in registers.
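
To illustrate why splitting early helps, here is a minimal C sketch of a 64-bit
AND carried out as two independent 32-bit operations. It is a model of the
idea only, not GCC code: the names di_pair, split_di, join_di and
and_di_by_halves are hypothetical and exist purely for this example.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit (DImode) value as two 32-bit (SImode)
   halves, mirroring the r0/r1 register pair in the listing above.  */
struct di_pair
{
  uint32_t lo;  /* low word, e.g. r0 */
  uint32_t hi;  /* high word, e.g. r1 */
};

/* Split a 64-bit value into its two 32-bit halves.  */
static struct di_pair
split_di (uint64_t x)
{
  struct di_pair p = { (uint32_t) x, (uint32_t) (x >> 32) };
  return p;
}

/* Recombine two 32-bit halves into a 64-bit value.  */
static uint64_t
join_di (struct di_pair p)
{
  return ((uint64_t) p.hi << 32) | p.lo;
}

/* AND performed half by half.  No bits cross between the words, so each
   half can be simplified on its own.  */
static struct di_pair
and_di_by_halves (struct di_pair a, struct di_pair b)
{
  struct di_pair r = { a.lo & b.lo, a.hi & b.hi };
  return r;
}
```

With the constant 1, split_di (1) has a high half of zero, so the high word of
the result is zero whatever "a" is. Once the two halves are separate
instructions, an optimizer can see that on its own and replace the high-word
AND with a plain move of zero.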

As a general rule, if NEON or IWMMXT is not in use then DImode operations
should be decomposed at expand time. If NEON/IWMMXT is available then
decomposition should be delayed until after reload, and the splitters should
attempt to produce optimal sequences in as many cases as possible. (Ideally, we
would be able to make the decision long before register allocation, but we're
not there yet.)
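
As a rough sketch of what "optimal sequences" could mean for one 32-bit half
with a constant operand, the C function below classifies the cases where the
operation collapses to something cheaper. This is an illustration under my own
assumptions, not the ARM backend's splitter: the enums and simplify_half are
hypothetical names.

```c
#include <stdint.h>

/* The bitwise operations a splitter would see on one 32-bit half.  */
enum half_op { HALF_AND, HALF_IOR, HALF_XOR };

/* What could be emitted for that half (hypothetical classification).  */
enum half_result
{
  EMIT_ZERO,      /* result is constant 0: a single move of #0 */
  EMIT_ALL_ONES,  /* result is constant 0xffffffff */
  EMIT_COPY,      /* result equals the register operand */
  EMIT_NOT,       /* result is the bitwise complement of the register */
  EMIT_OP         /* no shortcut: the operation itself is still needed */
};

/* Classify "reg OP c" for a 32-bit constant C.  */
static enum half_result
simplify_half (enum half_op op, uint32_t c)
{
  switch (op)
    {
    case HALF_AND:
      if (c == 0)
        return EMIT_ZERO;
      if (c == UINT32_MAX)
        return EMIT_COPY;
      break;
    case HALF_IOR:
      if (c == 0)
        return EMIT_COPY;
      if (c == UINT32_MAX)
        return EMIT_ALL_ONES;
      break;
    case HALF_XOR:
      if (c == 0)
        return EMIT_COPY;
      if (c == UINT32_MAX)
        return EMIT_NOT;
      break;
    }
  return EMIT_OP;
}
```

For the test case above, the low half is an AND with 1 (EMIT_OP, ideally using
the immediate form) and the high half is an AND with 0 (EMIT_ZERO), which
matches the expected five-instruction sequence.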
