http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53189
Bug #: 53189
Summary: DImode and/or/not/xor optimized poorly in core-registers
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: a...@gcc.gnu.org
Target: arm

The following code does not optimize well on current trunk.

--------------
void bar (long long, long long);

void
foo (long long a)
{
  bar (a&1, a);
}
--------------

Compiled with "-O2 -mfpu=vfpv3 -mthumb" gives:

--------------
foo:
        mov     r2, r0
        mov     r3, r1
        movs    r0, #1
        movs    r1, #0
        ands    r0, r0, r2
        ands    r1, r1, r3
        b       bar
--------------

As you can see, there are missed optimizations here:

1. Failure to notice that r1 will always be zero.

2. Failure to use immediate constant "#1" with "ands".

I'd expect output like this:

--------------
        mov     r2, r0
        mov     r3, r1
        ands    r0, r0, #1
        mov     r1, #0
        b       bar
--------------

The problem is two-fold:

First, anddi3 does not expand to two instructions, so the two halves of the
operation cannot be optimized independently.

Second, anddi3 does not allow immediate constants, so the expander is forced to
put the constants in registers.

As a general rule, if NEON or IWMMXT is not in use, then DImode operations
should be decomposed at expand time. If NEON/IWMMXT is available, then
decomposition should be delayed until after reload, and the splitters should
attempt to produce optimal sequences in as many cases as possible. (Ideally, we
would be able to make the decision long before register allocation, but we're
not there yet.)
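To illustrate why decomposing the operation early helps, here is a minimal C sketch of a 64-bit AND carried out as two independent 32-bit operations. It models only the arithmetic, not GCC's RTL or machine description. The names di_pair, split_di, join_di, and_di_split and and_di_const1 are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* Hypothetical model: a DImode value held in two 32-bit core registers. */
typedef struct {
    uint32_t lo;   /* low word, e.g. r0 */
    uint32_t hi;   /* high word, e.g. r1 */
} di_pair;

/* Split a 64-bit value into its low and high 32-bit words. */
static di_pair split_di(uint64_t v)
{
    di_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* Recombine two 32-bit words into a 64-bit value. */
static uint64_t join_di(di_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit AND as two independent 32-bit ANDs: bitwise operations
   never carry between words, so each half can be handled alone. */
static di_pair and_di_split(di_pair a, di_pair b)
{
    di_pair r = { a.lo & b.lo, a.hi & b.hi };
    return r;
}

/* Special case a & 1 once the halves are visible separately:
   the constant's high word is 0, so the high result is always 0,
   and the low word can use the immediate 1 directly. */
static di_pair and_di_const1(di_pair a)
{
    di_pair r = { a.lo & 1u, 0u };
    return r;
}
```

Once the two halves are separate operations, the high-word AND with zero folds to a constant and the low-word AND can take its constant as an immediate, which corresponds to the two improvements the report asks for.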