Michael mentioned that some users reported seeing better performance from
RVCT using arm_neon.h than they did when coding directly in assembler.
He suggested we try the same thing for GCC.  Here's an experiment using
the example that Jim Huang posted to the dev list recently:

    https://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance

The summary is that the C version needs to borrow a trick from the
assembly code in order to be competitive.  If it does that, though,
the C code can be faster.  I think this is mostly down to scheduling,
although I haven't checked in detail yet.

Richard

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
