http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53241
Bug #: 53241
Summary: Bad pre increment insn for ARM vfp store instructions
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: car...@google.com
Target: arm-unknown-linux-gnueabi

Compile the following code with options
-march=armv7-a -mfloat-abi=softfp -mfpu=neon -mthumb -Os

    void t0o(double* p0, double* p1, double* p2)
    {
      int i;
      for (i=0; i<10; i++)
        p0[i+2] = p1[i] + p2[i];
    }

GCC generates:

    t0o:
            adds    r0, r0, #8
            movs    r3, #0
            push    {r4, r5, lr}
    .L3:
            adds    r5, r1, r3
            adds    r4, r2, r3
            fldd    d17, [r5, #0]
            fldd    d16, [r4, #0]
            faddd   d16, d17, d16
            adds    r3, r3, #8
            cmp     r3, #80
            fmrrd   r4, r5, d16      // A
            strd    r4, [r0, #8]!    // B
            bne     .L3
            pop     {r4, r5, pc}

If we change instructions A and B to

            fstd    d16, [r0, #8]
            adds    r0, r0, 8

the result is better in terms of both performance and code size: adds is shorter than strd, the move between vfp registers and core registers may be expensive in some implementations, and the current result also needs two extra core registers.

-O2 has the same problem.

Before pass auto_inc_dec, the code is in good shape:

    (insn 42 41 43 3 (set (reg:SI 181 [ ivtmp.29 ])
            (plus:SI (reg:SI 181 [ ivtmp.29 ])
                (const_int 8 [0x8]))) 4 {*arm_addsi3}
         (nil))
    ...
    (insn 48 47 49 3 (set (mem:DF (reg:SI 181 [ ivtmp.29 ]) [2 MEM[base: D.5019_41, offset: 0B]+0 S8 A64])
            (reg:DF 191 [ D.4979 ])) src/t0o.c:5 653 {*thumb2_movdf_vfp}
         (expr_list:REG_DEAD (reg:DF 191 [ D.4979 ])
            (nil)))

Pass auto_inc_dec wrongly combined these two insns:

    (insn 48 47 49 3 (set (mem:DF (pre_inc:SI (reg:SI 181 [ ivtmp.29 ])) [2 MEM[base: D.5019_41, offset: 0B]+0 S8 A64])
            (reg:DF 191 [ D.4979 ])) src/t0o.c:5 653 {*thumb2_movdf_vfp}
         (expr_list:REG_INC (reg:SI 181 [ ivtmp.29 ])
            (expr_list:REG_DEAD (reg:DF 191 [ D.4979 ])
                (nil))))

Although ARM supports pre increment for normal memory accesses, it doesn't support pre increment for vfp store instructions.
So in the later reload pass, the vfp register is moved to core registers, which are then stored with pre increment. We should prevent such pre increment cases.
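
To illustrate the kind of check being asked for, here is a minimal sketch in C. It is a toy model only, not GCC source: the enums and the function toy_store_address_ok are hypothetical names invented for this example. The only rule it encodes is the one stated above, that a pre increment address is acceptable for a core register store but not for a vfp store.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for illustration only.
   These are not GCC's real types or hooks. */
enum toy_addr_kind {
    TOY_ADDR_REG,             /* [rN]        */
    TOY_ADDR_REG_PLUS_OFFSET, /* [rN, #imm]  */
    TOY_ADDR_PRE_INC          /* [rN, #imm]! */
};

enum toy_value_class {
    TOY_VALUE_CORE, /* value lives in core registers (str/strd) */
    TOY_VALUE_VFP   /* value lives in a vfp register (fstd)     */
};

/* Return true if a store of a value of class vc may use address form ak.
   Assumption taken from the report: vfp stores have no pre increment form,
   so accepting one only forces a later move to core registers. */
bool toy_store_address_ok(enum toy_value_class vc, enum toy_addr_kind ak)
{
    switch (ak) {
    case TOY_ADDR_REG:
    case TOY_ADDR_REG_PLUS_OFFSET:
        return true;
    case TOY_ADDR_PRE_INC:
        return vc == TOY_VALUE_CORE;
    }
    return false;
}
```

With a check of this shape, the address in insn 48 above (a DF store through pre_inc) would be rejected, so the separate add in insn 42 would stay in place and the store could remain a single fstd.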