Hello Bernie,
Thanks for your quick response.

Yes, I have observed a performance gap. Here is the data I got on our LS1043A
platform:

fcp (L1 cache), gcc-4.8:  5196.12 MB/s
fcp (L1 cache), gcc-5.1:  2983.11 MB/s

Below is part of the assembly code generated for the fcp function:

Gcc-5.1:
  40110c:       3dc00c6c        ldr     q12, [x3,#48]
  401110:       3dc0106b        ldr     q11, [x3,#64]
  401114:       3dc0146a        ldr     q10, [x3,#80]
  401118:       3dc01869        ldr     q9, [x3,#96]
  40111c:       3dc01c68        ldr     q8, [x3,#112]
  401120:       3dc0207f        ldr     q31, [x3,#128]
  401124:       3dc0247e        ldr     q30, [x3,#144]
  401128:       3dc0287d        ldr     q29, [x3,#160]
  40112c:       3dc02c7c        ldr     q28, [x3,#176]
  401130:       3dc0307b        ldr     q27, [x3,#192]
  401134:       3dc0347a        ldr     q26, [x3,#208]
  401138:       3dc03879        ldr     q25, [x3,#224]
  40113c:       3dc03c78        ldr     q24, [x3,#240]
  401140:       3dc04077        ldr     q23, [x3,#256]
  401144:       3dc04476        ldr     q22, [x3,#272]
  401148:       3dc04875        ldr     q21, [x3,#288]
  40114c:       3dc04c74        ldr     q20, [x3,#304]
  401150:       3dc05073        ldr     q19, [x3,#320]
  401154:       3dc05472        ldr     q18, [x3,#336]
  401158:       3dc05871        ldr     q17, [x3,#352]
  40115c:       3dc05c70        ldr     q16, [x3,#368]
  401160:       3dc06067        ldr     q7, [x3,#384]
  401164:       3dc06466        ldr     q6, [x3,#400]
  401168:       3dc06865        ldr     q5, [x3,#416]
  40116c:       3dc06c64        ldr     q4, [x3,#432]
  401170:       3dc07063        ldr     q3, [x3,#448]
  401174:       3dc07462        ldr     q2, [x3,#464]
  401178:       3dc07861        ldr     q1, [x3,#480]
  40117c:       3dc07c60        ldr     q0, [x3,#496]
  401180:       3dc0006f        ldr     q15, [x3]
  401184:       91080063        add     x3, x3, #0x200

Gcc-4.8:
  40135c:       4cdf78af        ld1     {v15.4s}, [x5], #16
  401360:       4c40790d        ld1     {v13.4s}, [x8]
  401364:       4c4078ae        ld1     {v14.4s}, [x5]
  401368:       9100c048        add     x8, x2, #0x30
  40136c:       91010045        add     x5, x2, #0x40
  401370:       4c40790c        ld1     {v12.4s}, [x8]
  401374:       4c4078ab        ld1     {v11.4s}, [x5]
  401378:       91014048        add     x8, x2, #0x50
  40137c:       91018045        add     x5, x2, #0x60
  401380:       4c40790a        ld1     {v10.4s}, [x8]
  401384:       4c4078a9        ld1     {v9.4s}, [x5]
  401388:       9101c048        add     x8, x2, #0x70
  40138c:       91020045        add     x5, x2, #0x80
  401390:       4c407908        ld1     {v8.4s}, [x8]
  401394:       4c4078bf        ld1     {v31.4s}, [x5]
  401398:       91024048        add     x8, x2, #0x90
  40139c:       91028045        add     x5, x2, #0xa0
  4013a0:       4c40791e        ld1     {v30.4s}, [x8]
  4013a4:       4c4078bd        ld1     {v29.4s}, [x5]
  4013a8:       9102c048        add     x8, x2, #0xb0
  4013ac:       91030045        add     x5, x2, #0xc0


Best Regards
Ron

-----Original Message-----
From: Bernie Ogden [mailto:bernie.og...@linaro.org] 
Sent: Tuesday, January 05, 2016 6:36 PM
To: Xiaofeng Ren <xiaofeng....@nxp.com>
Cc: linaro-toolchain@lists.linaro.org
Subject: Re: gcc-linaro-5.1 vs gcc-linaro-4.8

Hello,

I'm not sure from the information below whether you have observed a performance 
gap, or are expecting to observe one. Have you seen a performance gap?

Regards,

Bernie

On 5 January 2016 at 10:29, Xiaofeng Ren <xiaofeng....@nxp.com> wrote:
> Hello All,
>
> I found a difference between gcc-linaro-5.1 and gcc-linaro-4.8 while
> running the lmbench benchmark on our LS1043 (Cortex-A53).
>
> With gcc-linaro-4.8, gcc generates Advanced SIMD instructions (such as
> ld1); gcc-linaro-5.1, however, does not generate them. This causes a big
> performance gap between gcc-4.8 and gcc-5.1 in the lmbench memory
> bandwidth “fcp” test (bw_mem program).
>
> My compiler flags are “-O3 -mcpu=cortex-a53”. I also tried several other
> flag combinations (“-O3 -mcpu=cortex-a53+fp+simd”,
> “-O2 -ftree-vectorize -mcpu=cortex-a53”, “-O3 -ftree-vectorize -mcpu=cortex-a53”);
> none of them helped.
>
> The gcc-5.1 toolchain was downloaded from the following link:
>
> https://snapshots.linaro.org/openembedded/sources/gcc-linaro-5.1-snapshot-2015.06-1-x86_64_aarch64-linux-gnu.tar.xz
>
> Can I have your comments on this?
>
> Thanks
>
> Ron
>
> _______________________________________________
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-toolchain
>