Here's an implementation of an 8x8 integer inverse DCT (IDCT) done
with NEON intrinsics -- essentially a translation of the assembly
version in libjpeg-turbo trunk:

https://github.com/mkedwards/crosstool-ng/blob/master/patches/libjpeg-turbo/trunk/0001-Implement-jsimd_idct_ifast-using-NEON-intrinsics.patch

It compiles with Linaro 2011.05 GCC 4.5 (a recent Linaro 4.6 snapshot
ICEs on it) but is otherwise untested.  Still, it's
interesting to compare the assembly that it generates against the
hand-written version.  I thought I'd give linaro-toolchain a heads-up
in case y'all could use a test case that generates plenty of pressure
on the VFP/NEON register bank.  (I intend to use it to see how much
performance difference there really is, on the A8 and A9, between NEON
code compiled for 16 vs. 32 registers.)

Cheers,
- Michael

