On 21 April 2011 23:38, Richard Sandiford <richard.sandif...@linaro.org> wrote:
> Michael mentioned that some users reported seeing better performance from
> RVCT using arm_neon.h than they did when coding directly in assembler.
> He suggested we try the same thing for GCC. Here's an experiment using
> the example that Jim Huang posted to the dev list recently:
>
> https://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance
Hi Richard,

I appreciate your analysis very much! In fact, that was the practice when I learned ARM NEON.

> The summary is that the C version needs to borrow a trick from the
> assembly code in order to be competitive. If it does that, though,
> the C code can be faster. I think this is mostly down to scheduling,
> although I haven't checked in detail yet.
>

Thanks for the conclusion. Indeed, GCC needs extra hints for NEON iterative modulo scheduling. Do you have any further plans to improve this?

Sincerely,
-jserv

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain