http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49473
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.07.20 15:59:59
                 CC|                            |ramana at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #2 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2011-07-20 15:59:59 UTC ---
> - the add at .LPIC0 will stall for two cycles because the preceding load has a
> result latency of three. The two subsequent MOVs could have been scheduled in
> these slots since they don't have any data dependency on the ADD;

This looks like it may be due to the latency of the call instruction, at least
for the .LPIC0 case. The scheduler thinks that r0 isn't really ready until cycle
34 or so, and hence the compiler can't hoist "mov r5, r0" above
"add r4, pc, r4".

The case around .LPIC1 doesn't seem to show up in a recent build of trunk I have:

.L5:
        ldr     r1, .L7+24      @ 135   pic_load_addr_32bit     [length = 4]
        add     r2, r5, #32768  @ 25    *arm_addsi3/1           [length = 4]
        mov     r0, r7          @ 27    *arm_movsi_insn/1       [length = 4]
.LPIC1:
        add     r1, pc, r1      @ 28    pic_add_dot_plus_eight  [length = 4]
        add     r2, r2, #180    @ 29    *arm_addsi3/1           [length = 4]
        bl      gst_structure_get_int(PLT)      @ 30    *call_value_symbol

This is what I see with a more recent trunk, and it looks better than the code
shown in this report. We need to dig further into the ARM1136 TRM to address the
other comments in this report.

Ramana