Compile the following function with options -Os -mthumb -march=armv5te:

unsigned get_least_bits(unsigned value)
{
  return value << 9 >> 9;
}
GCC generates:

        ldr     r3, .L2
        @ sp needed for prologue
        and     r0, r0, r3
        bx      lr
.L3:
        .align  2
.L2:
        .word   8388607

A better code sequence would be:

        lsl     r0, r0, #9
        lsr     r0, r0, #9
        bx      lr

It is smaller (no constant pool entry) and faster.

This transformation is done very early; it is already visible in the first tree dump, shift.c.003t.original. GCC assumes that an AND with a constant is cheaper than two shifts. That is not true for this case in the Thumb ISA, where the constant 8388607 (0x7FFFFF) cannot be encoded as an immediate and has to be loaded from the constant pool. On the other hand, if the constant used in the AND is small, such as 7, the AND is definitely cheaper than two shifts. So which method is better depends heavily on both the constant and the target ISA, and it is difficult to make the correct decision at the tree level. Maybe we should define a peephole rule to do it.

--
Summary: inefficient code to extract least bits from an integer value
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: carrot at google dot com
GCC build triplet: i686-linux
GCC host triplet: i686-linux
GCC target triplet: arm-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40697
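
The equivalence behind this report can be checked with a small C sketch. It assumes a 32-bit unsigned int, as on arm-eabi. The helper names get_least_bits_shift, get_least_bits_mask and shift_and_mask_agree are made up for illustration and are not part of GCC or of the original test case. It only shows that the two source forms compute the same value; it says nothing about which one a given compiler version will emit.

```c
#include <assert.h>
#include <limits.h>

/* Shift form from the report: on a 32-bit unsigned, shifting left and
   then right by 9 clears the top 9 bits and keeps the low 23 bits. */
static unsigned get_least_bits_shift(unsigned value)
{
    return value << 9 >> 9;
}

/* Mask form that the shifts are folded into:
   8388607 == 0x7FFFFF == (1u << 23) - 1. */
static unsigned get_least_bits_mask(unsigned value)
{
    return value & 8388607u;
}

/* Hypothetical helper: returns 1 if both forms agree on a few sample
   values, 0 otherwise. Assumes unsigned int is exactly 32 bits wide. */
static int shift_and_mask_agree(void)
{
    static const unsigned samples[] = {
        0u, 1u, 0x7FFFFFu, 0x800000u, 0x12345678u, 0xFFFFFFFFu
    };
    unsigned i;

    assert(sizeof(unsigned) * CHAR_BIT == 32);

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (get_least_bits_shift(samples[i]) != get_least_bits_mask(samples[i]))
            return 0;
    }
    return 1;
}
```

Because both forms are equivalent, the choice between them is purely a question of target cost, which is why a decision made at the tree level cannot be right for every constant and every ISA.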