http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304
Bug #: 50304 Summary: poor code for accessing certain element of arrays Classification: Unclassified Product: gcc Version: 4.5.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: t...@deltasystem.hu Array elements on known (hardcoded) positions are accessed by inefficient code. The addresses are computed runtime, waisting both runtime and code size. For example: int a[4][4][16]; void foo( void ) { int c; c = a[2][3][4]; ... } compiles this way in -O1, ARM Cortex M0: 0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>) 2: 22b4 movs r2, #180 ; 0xb4 4: 0092 lsls r2, r2, #2 6: 589a ldr r2, [r3, r2] ----------- I tried to convert this operation to be precompiled by forming a constant address. int a[4][4][16]; int *const b=&a[2][3][4]; void foo( void ) { int c; c = *b; ... } However, it is ignored by gcc, and compiles this way in -O1: 0: 4b03 ldr r3, [pc, #12] ; (10 <foo+0x10>) 2: 21b4 movs r1, #180 ; 0xb4 4: 0089 lsls r1, r1, #2 6: 185a adds r2, r3, r1 8: 6812 ldr r2, [r2, #0] The actual code can vary, depending on the situation. For example: 0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>) 2: 4a03 ldr r2, [pc, #12] ; (10 <foo+0x10>) 4: 589a ldr r2, [r3, r2] or 0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>) 2: 4903 ldr r1, [pc, #12] ; (10 <foo+0x10>) 4: 185a adds r2, r3, r1 a[0][0][0]=c; 6: 6812 ldr r2, [r2, #0] Sometimes (I don't know yet why and exactly when) it can be even much worse, introducing bytewide accessing of e.g. an int32_t, dissassembling and reassembling it. (It's definetely not an alignment problem.) This waists about 40 instructions for a read-modify-write access. ------------------ It should have look like this: 0: 4b02 ldr r3, [pc, #8] ; (c <foo+0xc>) 2: 681b ldr r3, [r3, #0] 4: 681a ldr r2, [r3, #0] where the constant (at 0x0c) at the first row is a precalculated address. ------------------ With variable address, like this: int a[4][4][16]; int *b=&a[2][3][4]; ... it work nicely. However, it is really not equivalent solution at flash based processors.