I've updated:

    https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv

so that it gives the output for current trunk, including Ira's commit
yesterday to reduce the amount of overpromotion.  I also reran the
microbenchmarks.  The good news is that the vectorised code is now
better in all cases than the non-vectorised code.

The biggest improvement since last time is in rgb24tobgr16_C().  It used
to be much slower with vectorisation because of excessive widening.
Thanks to Ira's patch, the loop now looks pretty respectable,
and is ~3.25x faster than the non-vectorised code.
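
For anyone who hasn't looked at this kind of loop, here is a rough sketch
of the general shape of a 24-bit to 16-bit pixel conversion.  It is not
the libav source: the function name rgb24_to_16_sketch and the channel
order are made up for illustration.  The point is only that each 8-bit
input is promoted to int by the C integer promotions, even though the
final result fits in 16 bits, so a vectoriser that follows the promotions
literally ends up working on 32-bit elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the libav implementation: pack three 8-bit
   channels per pixel into one 16-bit 5:6:5 value.  */
void rgb24_to_16_sketch(const uint8_t *src, uint16_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        /* Each uint8_t load is widened before the shifts and masks,
           although only the low 16 bits of the result are kept.  */
        unsigned c0 = src[3 * i + 0];
        unsigned c1 = src[3 * i + 1];
        unsigned c2 = src[3 * i + 2];

        dst[i] = (uint16_t)((c0 >> 3)             /* top 5 bits -> bits 0..4   */
                            | ((c1 & 0xFC) << 3)  /* top 6 bits -> bits 5..10  */
                            | ((c2 & 0xF8) << 8)); /* top 5 bits -> bits 11..15 */
    }
}
```

Something like "gcc -O3 -mfpu=neon -ftree-vectorize" on ARM should be
enough to try the vectoriser on a loop of this shape; the exact options
used for the wiki page are not repeated here.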

As well as using a more recent compiler, the new version also uses
-mvectorize-with-neon-quad.  Once again it shows a significant improvement
over the default.

Richard

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
