http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53241

             Bug #: 53241
           Summary: Bad pre increment insn for ARM vfp store instructions
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: car...@google.com
            Target: arm-unknown-linux-gnueabi


Compile the following code with options -march=armv7-a -mfloat-abi=softfp
-mfpu=neon -mthumb -Os

void t0o(double* p0, double* p1, double* p2)
{
  int i;
  for (i=0; i<10; i++)
    p0[i+2] = p1[i] + p2[i];
}

GCC generates:

t0o:
    adds    r0, r0, #8
    movs    r3, #0
    push    {r4, r5, lr}
.L3:
    adds    r5, r1, r3
    adds    r4, r2, r3
    fldd    d17, [r5, #0]
    fldd    d16, [r4, #0]
    faddd    d16, d17, d16
    adds    r3, r3, #8
    cmp    r3, #80
    fmrrd    r4, r5, d16     // A
    strd    r4, [r0, #8]!   // B
    bne    .L3
    pop    {r4, r5, pc}

If we change instructions A and B to

  fstd     d16, [r0, #8]
  adds     r0, r0, #8

the result is better in both performance and code size: adds is shorter than
strd, and the move between a VFP register and core registers may be expensive
on some implementations. The current code also needs two extra core registers
to hold the value being stored.
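
To make the claimed equivalence concrete, here is a small C model of the two
sequences. It is only a sketch: the struct, the function names and the toy
memory are made up for illustration, and r0 is treated as a byte offset into a
small buffer rather than a real address. It checks that "store with
pre-increment writeback" and "store at offset, then add" leave the same base
register and the same memory contents.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine state (hypothetical): one base register, used as a byte
   offset into a small memory buffer. */
struct state {
    uint32_t r0;
    unsigned char mem[64];
};

/* Models "strd r4, [r0, #8]!": the address is r0 + 8, and that address
   is written back to r0. */
static void store_pre_inc(struct state *s, double v)
{
    s->r0 += 8;
    memcpy(&s->mem[s->r0], &v, sizeof v);
}

/* Models "fstd d16, [r0, #8]" followed by "adds r0, r0, #8". */
static void store_then_add(struct state *s, double v)
{
    memcpy(&s->mem[s->r0 + 8], &v, sizeof v);
    s->r0 += 8;
}

/* Returns 1 if both sequences leave the same base register and memory
   after a few loop iterations, 0 otherwise. */
int sequences_agree(void)
{
    struct state a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    for (int i = 0; i < 4; i++) {
        store_pre_inc(&a, 1.5 * i);
        store_then_add(&b, 1.5 * i);
    }
    return a.r0 == b.r0 && memcmp(a.mem, b.mem, sizeof a.mem) == 0;
}
```

The model says nothing about cost; it only shows that the replacement keeps the
same observable effect on r0 and memory.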

-O2 has the same problem.

Before pass auto_inc_dec, the code is in good shape:


(insn 42 41 43 3 (set (reg:SI 181 [ ivtmp.29 ])
        (plus:SI (reg:SI 181 [ ivtmp.29 ])
            (const_int 8 [0x8]))) 4 {*arm_addsi3}
     (nil))

...

(insn 48 47 49 3 (set (mem:DF (reg:SI 181 [ ivtmp.29 ]) [2 MEM[base: D.5019_41, offset: 0B]+0 S8 A64])
        (reg:DF 191 [ D.4979 ])) src/t0o.c:5 653 {*thumb2_movdf_vfp}
     (expr_list:REG_DEAD (reg:DF 191 [ D.4979 ])
        (nil)))

Pass auto_inc_dec wrongly combined these two insns:


(insn 48 47 49 3 (set (mem:DF (pre_inc:SI (reg:SI 181 [ ivtmp.29 ])) [2 MEM[base: D.5019_41, offset: 0B]+0 S8 A64])
        (reg:DF 191 [ D.4979 ])) src/t0o.c:5 653 {*thumb2_movdf_vfp}
     (expr_list:REG_INC (reg:SI 181 [ ivtmp.29 ])
        (expr_list:REG_DEAD (reg:DF 191 [ D.4979 ])
            (nil))))

Although ARM supports pre-increment addressing for normal memory accesses, it
does not support it for VFP store instructions. So in the later reload pass,
the value is moved from the VFP register to core registers and then stored
from there with pre-increment.

So we should prevent such pre increment cases.
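
One way to picture the kind of check meant here is a small address-legitimacy
predicate. The following is a standalone toy in C, not GCC code: the enums and
the function address_ok_for_store are hypothetical names invented for this
sketch, and it assumes, as described above, that DFmode values live in VFP
registers when VFP is available and that the VFP store has no pre-increment
form.

```c
#include <stdbool.h>

/* Hypothetical model; these are not GCC's real data structures. */
enum addr_kind  { ADDR_REG, ADDR_REG_OFFSET, ADDR_PRE_INC };
enum value_mode { MODE_SI, MODE_DI, MODE_DF };

/* Returns true if a store of a value of MODE through an address of KIND
   should be accepted.  Assumption: with VFP enabled, a DFmode value is
   expected to be in a VFP register, so a pre-increment address would
   force a move to core registers and is rejected. */
bool address_ok_for_store(enum value_mode mode, enum addr_kind kind,
                          bool have_vfp)
{
    if (have_vfp && mode == MODE_DF && kind == ADDR_PRE_INC)
        return false;
    return true;
}
```

With such a check in place, the DFmode store in insn 48 would keep its plain
register address, and the separate add in insn 42 would remain.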
