On 07 Jul 2015, at 13:52, Bin.Cheng <amker.ch...@gmail.com> wrote:

> On Tue, Jul 7, 2015 at 10:05 AM, Anmol Paralkar (anmparal)
> <anmpa...@cisco.com> wrote:
>> Hello,
>> 
>> Does GCC generate LDRD/STRD (Register) forms [A8.8.74/A8.8.211 per ARMv7-A
>> & ARMv7-R ARM]?
>> 
>> Based on various attempts to write code to get GCC to generate a sample
>> form, and subsequently inspecting the code I see in
>> config/arm/arm.c/output_move_double () & arm.md [GCC 4.9.2], I think that
>> these register based forms of LDRD/STRD are
>> not generated, but I thought it might be a good idea to ask on the list,
>> just in case.
> Register based LDRD is harder than the immediate version.  ARM doesn't
> support a [base + reg + offset] addressing mode, so the address computation
> of the second memory reference is scattered both inside and outside the
> memory reference.  To identify such opportunities, one needs to trace the
> registers in the address expression of the memory access instruction and do
> some kind of value computation and re-association.

Basically, this is what we're trying to do with AMS.  For each mem access it 
tries to trace the reg values and figure out the effective address expression.  
For now we've limited it to the form 'base_reg + index_reg*scale + 
const_displacement'.  Then we try to see how to fit the address expressions to 
the available address modes.
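
As a rough illustration of the "fit the expression to an address mode" step
(a minimal sketch, not the actual AMS code), the address form above can be
modelled as a small struct and checked against a description of one mode.
All names here (addr_expr, addr_mode, addr_fits_mode) and the mode
description fields are hypothetical, made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an effective address:
   base_reg + index_reg * scale + disp.
   A register number of -1 means "not present".  */
struct addr_expr {
    int base_reg;
    int index_reg;
    int scale;
    int32_t disp;
};

/* Hypothetical description of one addressing mode of a target.  */
struct addr_mode {
    bool allows_index;   /* may an index register be used?          */
    int max_scale;       /* largest index scale, 1 means unscaled   */
    int32_t min_disp;    /* smallest constant displacement allowed  */
    int32_t max_disp;    /* largest constant displacement allowed   */
};

/* Return true if the address expression can be used as-is with the mode.  */
bool addr_fits_mode(const struct addr_expr *e, const struct addr_mode *m)
{
    assert(e && m);

    if (e->base_reg < 0)
        return false;                 /* this sketch requires a base reg */

    if (e->index_reg >= 0) {
        if (!m->allows_index)
            return false;
        if (e->scale < 1 || e->scale > m->max_scale)
            return false;
    }

    return e->disp >= m->min_disp && e->disp <= m->max_disp;
}
```

With one mode that allows an index but no displacement and another that
allows a displacement but no index, an expression that has both an index
register and a non-zero displacement fits neither.  In that case part of the
address has to be computed by a separate insn, which is the situation
described in the quoted text above.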

It's still work in progress but already shows some improvements.
A classic SH4 example:

float fun (float* x)
{
  return x[0] + x[1] + x[2] + x[3];
}

no AMS:
        mov     r4,r1
        add     #4,r1
        fmov.s  @r4,fr0
        fmov.s  @r1,fr1
        mov     r4,r1
        add     #8,r1
        fadd    fr1,fr0
        fmov.s  @r1,fr1
        add     #12,r4
        fadd    fr1,fr0
        fmov.s  @r4,fr1
        rts     
        fadd    fr1,fr0

AMS:
        fmov.s  @r4+,fr0
        fmov.s  @r4+,fr1
        fadd    fr1,fr0
        fmov.s  @r4+,fr1
        fadd    fr1,fr0
        fmov.s  @r4,fr1
        rts     
        fadd    fr1,fr0
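
The choice of post-increment modes in the AMS output above can be sketched
as a simple greedy walk over the displacements of the accesses.  This is only
an illustration under strong assumptions: all accesses use the same base
register, the base register may be modified freely, and accesses are taken in
program order.  The function name plan_post_inc is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* For n accesses of `size` bytes at displacements disp[0..n-1] from one
   base register (in program order), set use_post_inc[i] to 1 if access i
   could use a post-increment mode, else 0.  Returns the number of explicit
   pointer adjustments (add insns) that would still be needed.  */
size_t plan_post_inc(const int *disp, size_t n, int size, int *use_post_inc)
{
    assert(size > 0);

    int cur = 0;        /* current offset of the pointer from its start */
    size_t adds = 0;

    for (size_t i = 0; i < n; i++) {
        if (disp[i] != cur) {
            adds++;                   /* pointer must be adjusted first */
            cur = disp[i];
        }
        /* Post-increment pays off if the next access follows directly.  */
        use_post_inc[i] = (i + 1 < n) && (disp[i + 1] == disp[i] + size);
        if (use_post_inc[i])
            cur += size;
    }
    return adds;
}
```

For the displacements 0, 4, 8, 12 with 4-byte accesses this marks the first
three accesses as post-increment and needs no extra add, which matches the
three '@r4+' loads followed by a plain '@r4' load in the AMS listing.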

If I understand correctly, ARM's LDRD/STRD are similar to SH's FPU 2x32 pair 
loads/stores.  Forming them requires the mem access insns for adjacent 
addresses to be adjacent in the insn stream.  We'll try to do some mem access reordering in 
AMS, mainly to improve post/pre inc/dec address mode utilization.  Afterwards, 
adjacent mem accesses can be fused together in a separate RTL pass or AMS 
sub-pass to avoid re-discovering mem access sequence information, which AMS 
already has.
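
The fusion step could look roughly like the sketch below, which scans an
already reordered access sequence for neighbouring accesses to adjacent
addresses.  It is only an illustration: the mem_access struct and
find_pair_candidates are hypothetical names, and real pair insns have
further constraints (for example on register pairing and alignment) that
are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one mem access: base_reg + disp, `size` bytes.  */
struct mem_access {
    int base_reg;
    int disp;
    int size;
};

/* Scan accesses in insn stream order and record the index of the first
   access of every pair that is adjacent both in the stream and in memory.
   first_of_pair must have room for n / 2 entries.  Returns the number of
   pairs found.  */
size_t find_pair_candidates(const struct mem_access *acc, size_t n,
                            size_t *first_of_pair)
{
    assert(acc && first_of_pair);

    size_t pairs = 0;
    size_t i = 0;

    while (i + 1 < n) {
        const struct mem_access *a = &acc[i];
        const struct mem_access *b = &acc[i + 1];

        if (a->base_reg == b->base_reg
            && a->size == b->size
            && b->disp == a->disp + a->size) {
            first_of_pair[pairs++] = i;
            i += 2;                   /* both accesses are consumed */
        } else {
            i++;
        }
    }
    return pairs;
}
```

Because the scan only looks at direct neighbours, it depends on an earlier
reordering step having placed accesses to adjacent addresses next to each
other.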

Cheers,
Oleg
