http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59393

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |mips16
   Target Milestone|---                         |4.8.3

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
  <bb 2>:
  s_4 = &key_3(D)->S[0];
...
  _15 = _14 * 8;
  _16 = s_4 + _15;
  _17 = *_16;
...
  _21 = _20 * 8;
  _22 = s_4 + _21;
  _23 = *_22;
...

formerly we'd have created

  _17 = MEM[key_3].S[_14];
...
  _23 = MEM[key_3].S[_20];

which isn't a valid transform.  That eventually gets us better
addressing mode selection?  At RTL this probably (didn't verify)
re-associates the key_3 + offsetof(S) + index * 8 expression to
a more suitable way and by-passes the multiple-use restriction of
combine (forwprop here un-CSEs key_3 + offsetof(S)).

In a loop IVOPTs would be the one to utilize target addressing mode
information and eventually generate a TARGET_MEM_REF.  In non-loops
we have SLSR (gimple-ssa-strength-reduction.c) that could serve as
a vehicle to generate TARGET_MEM_REFs (it doesn't).

In the end I would point at RTL forwprop which is supposed to improve
addressing-mode selection.  At least on x86_64 I see

        leaq    144(%rsi), %rax
...
        xorq    4096(%rax,%rbx,8), %r8
        addl    6144(%rax,%r9,8), %r8d

as well (and %rsi is live as well), instead of folding the 144 into the
dereference offset.  forwprop sees

(insn 8 5 9 2 (parallel [
            (set (reg/v/f:DI 85 [ s ])
                (plus:DI (reg/v/f:DI 991 [ key ])
                    (const_int 144 [0x90])))
            (clobber (reg:CC 17 flags))
...
(insn 20 19 21 3 (set (reg:DI 998 [ *_22 ])
        (mem:DI (plus:DI (mult:DI (reg:DI 995)
                    (const_int 8 [0x8]))
                (reg/v/f:DI 85 [ s ])) [2 *_22+0 S8 A64]))
...

and then combine folds in an additional addition:

Trying 18 -> 20:
Successfully matched this instruction:
(set (reg:DI 998 [ *_22 ])
    (mem:DI (plus:DI (plus:DI (mult:DI (reg:DI 994 [ D.1883 ])
                    (const_int 8 [0x8]))
                (reg/v/f:DI 85 [ s ]))
            (const_int 2048 [0x800])) [2 *_22+0 S8 A64]))

but of course doesn't consider insn 8 (it's cross basic-block and it
has multiple uses).

Now there isn't any further forwprop pass after combine (which would
maybe now fold in the addition - not sure).  Certainly ira/lra/reload
do not consider materializing the def in-place either instead of
spilling it for you.

Not sure how the situation is on mips16, but in the end RTL optimizers
are supposed to fixup anything related to addressing mode selection.
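
For illustration only, here is a minimal C sketch of the two address forms discussed above. It is not the testcase from the bug report: the struct layout (an 18-entry P array followed by S) is an assumption chosen so that offsetof(S) comes out as 144, and the names lookup_via_s and lookup_via_key are hypothetical. The first function keeps the shared base pointer s, as in the GIMPLE after forwprop; the second spells out the re-associated key + (offsetof(S) + constant) + index * 8 form that would let the 144 fold into the displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT taken from the bug report: chosen so that
   S starts at byte offset 144, the constant seen in the dumps. */
struct key {
    uint64_t P[18];          /* 18 * 8 = 144 bytes */
    uint64_t S[4 * 256];
};

static_assert(offsetof(struct key, S) == 144,
              "S is expected at byte offset 144");

/* Shared-base form: s is computed once and has multiple uses,
   like s_4 = &key_3(D)->S[0] in the GIMPLE dump. */
uint64_t lookup_via_s(const struct key *key, uint64_t i, uint64_t j)
{
    const uint64_t *s = &key->S[0];
    return s[i] ^ s[256 + j];    /* byte offsets i*8 and j*8 + 2048 */
}

/* Re-associated form: each access is key + constant + index * 8,
   so the constant could become the addressing-mode displacement. */
uint64_t lookup_via_key(const struct key *key, uint64_t i, uint64_t j)
{
    const char *base = (const char *)key;
    const uint64_t *a =
        (const uint64_t *)(base + offsetof(struct key, S) + i * 8);
    const uint64_t *b =
        (const uint64_t *)(base + (offsetof(struct key, S) + 2048) + j * 8);
    return *a ^ *b;
}
```

Both functions read the same memory; whether a given GCC version and target emit different addressing modes for them is something to check in the generated assembly rather than assume.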