Re: NEON intrinsics vs. assembly code

Jim Huang Thu, 21 Apr 2011 20:24:48 -0700

On 21 April 2011 23:38, Richard Sandiford <richard.sandif...@linaro.org> wrote:
> Michael mentioned that some users reported seeing better performance from
> RVCT using arm_neon.h than they did when coding directly in assembler.
> He suggested we try the same thing for GCC.  Here's an experiment using
> the example that Jim Huang posted to the dev list recently:
>
>    https://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance


Hi Richard,

I appreciate your analysis very much!
In fact, that example was a practice exercise from when I was learning ARM NEON.

> The summary is that the C version needs to borrow a trick from the
> assembly code in order to be competitive.  If it does that, though,
> the C code can be faster.  I think this is mostly down to scheduling,
> although I haven't checked in detail yet.
>

Thanks for the conclusion.  Indeed, GCC needs extra hints for NEON
iterative modulo scheduling.
Do you have any further plans to improve it?

Sincerely,
-jserv

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
