Hello,

While testing SMS on Cortex-A9 I see that the latency of the load
instruction is 1 cycle when compiling with
-mcpu=cortex-a9 -mthumb -mtune=cortex-a9 -O3.

Below is a snippet from the SMS dump file showing the DDG created for the
loop in function foo. It shows the edge between the load of input[i]
(insn 181) and the mult instruction (insn 184).
[181 -(T,1,0)-> 184] is the true dependence edge created between the
two insns, with a latency of 1.
On Cortex-A8 the latency of the load is 3, as expected.
I've read in the cortex-a9.md file that loads should have a latency of
4 cycles, so I wanted to check whether I should have used a different
combination of flags for Cortex-A9, or whether the load latency should
indeed be 1 cycle here.

Thanks,
Revital

int foo (int max, signed short *input, int y)
{
    int i, accum = 0;

    for (i = 0; i < max; i++) {
        accum += (signed int) input[i] * (signed int) input[i+y];
    }
    return accum;
}

The snippet from the DDG:


Node num: 2
(insn 181 178 184 13 (set (reg:SI 216 [ D.2019 ])
        (zero_extend:SI (mem:HI (plus:SI (reg:SI 319 [ ivtmp.34 ])
                    (reg:SI 345)) [2 MEM[base: D.2076_257, index: D.2079_226, offset: 0B]+0 S2 A16]))) tmp.c:7 714 {*thumb2_zero_extendhisi2_v6}
     (nil))
OUT ARCS:  [181 -(A,0,1)-> 176]  [181 -(T,1,0)-> 184]
IN ARCS:  [184 -(A,0,1)-> 181]  [176 -(T,1,0)-> 181]
Node num: 3
(insn 184 181 234 13 (set (reg/v:SI 209 [ accum ])
        (plus:SI (mult:SI (sign_extend:SI (subreg/s/u:HI (reg:SI 212 [ D.2013 ]) 0))
                (sign_extend:SI (subreg/s/u:HI (reg:SI 216 [ D.2019 ]) 0)))
            (reg/v:SI 209 [ accum ]))) tmp.c:7 64 {maddhisi4}
     (expr_list:REG_DEAD (reg:SI 216 [ D.2019 ])
        (expr_list:REG_DEAD (reg:SI 212 [ D.2013 ])
            (nil))))
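
For anyone scanning a longer dump for suspicious latencies, here is a
minimal sketch of pulling the fields out of one arc such as
[181 -(T,1,0)-> 184]. It assumes the three fields in parentheses are the
dependence type, the latency and the iteration distance, in that order;
the first two match the description of the edge above, the third is my
reading of the dump. The struct and function names are hypothetical
helpers and not part of GCC.

```c
#include <stdio.h>

/* Hypothetical helper type (not a GCC structure): one arc from the dump. */
struct ddg_arc {
    int src;       /* UID of the source insn, e.g. 181 */
    char type;     /* dependence type letter, e.g. 'T' (true) or 'A' (anti) */
    int latency;   /* assumed: second field is the latency in cycles */
    int distance;  /* assumed: third field is the iteration distance */
    int dest;      /* UID of the destination insn, e.g. 184 */
};

/* Parse text of the form "[181 -(T,1,0)-> 184]".
   Returns 1 if all five fields were read, 0 otherwise. */
int parse_ddg_arc(const char *text, struct ddg_arc *arc)
{
    int n = sscanf(text, " [%d -(%c,%d,%d)-> %d]",
                   &arc->src, &arc->type, &arc->latency,
                   &arc->distance, &arc->dest);
    return n == 5;
}
```

Running this over each entry of the OUT ARCS lines and printing the arcs
whose type is 'T' and whose source is a load would make it quick to
compare the latencies between the Cortex-A8 and Cortex-A9 dumps.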

