vc1: Arm 64-bit NEON inverse transform fast paths

Martin Storsjö Wed, 30 Mar 2022 07:01:44 -0700

On Wed, 30 Mar 2022, Martin Storsjö wrote:

On Fri, 25 Mar 2022, Ben Avison wrote:

checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.

vc1dsp.vc1_inv_trans_4x4_c: 158.2
vc1dsp.vc1_inv_trans_4x4_neon: 65.7
vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5
vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5
vc1dsp.vc1_inv_trans_4x8_c: 335.2
vc1dsp.vc1_inv_trans_4x8_neon: 106.2
vc1dsp.vc1_inv_trans_4x8_dc_c: 151.2
vc1dsp.vc1_inv_trans_4x8_dc_neon: 25.5
vc1dsp.vc1_inv_trans_8x4_c: 365.7
vc1dsp.vc1_inv_trans_8x4_neon: 97.2
vc1dsp.vc1_inv_trans_8x4_dc_c: 139.7
vc1dsp.vc1_inv_trans_8x4_dc_neon: 16.5
vc1dsp.vc1_inv_trans_8x8_c: 547.7
vc1dsp.vc1_inv_trans_8x8_neon: 137.0
vc1dsp.vc1_inv_trans_8x8_dc_c: 268.2
vc1dsp.vc1_inv_trans_8x8_dc_neon: 30.5

Signed-off-by: Ben Avison <[email protected]>
---
libavcodec/aarch64/vc1dsp_init_aarch64.c |  19 +
libavcodec/aarch64/vc1dsp_neon.S         | 678 +++++++++++++++++++++++
2 files changed, 697 insertions(+)

Looks generally reasonable. Is it possible to factorize out the individual transforms (so that you'd e.g. invoke the same macro twice in the 8x8 and 4x4 functions) without too much loss? The downshift, which differs between the two, could either be left outside of the macro, or the downshift amount could be made a macro parameter.
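
To illustrate the factorization being suggested, here is a minimal C sketch (not the NEON code from the patch): a single 1-D 4-point pass is written once and invoked for both the row and the column pass, with the downshift as a parameter. The names inv_trans4_1d and inv_trans_4x4_sketch are hypothetical. The coefficients (17, 22, 10) and the shifts (3, then 7) are assumptions based on my recollection of the C reference; the packed 4x4 layout and the omission of the add-to-destination/clip step are simplifications.

```c
#include <stdint.h>

/* One 1-D 4-point inverse transform pass (illustrative coefficients).
 * stride: distance between successive elements (1 = row, 4 = column).
 * shift:  per-pass downshift; the rounding bias is 1 << (shift - 1). */
static void inv_trans4_1d(const int16_t *src, int16_t *dst, int stride, int shift)
{
    const int rnd = 1 << (shift - 1);
    const int t1 = 17 * (src[0] + src[2 * stride]) + rnd;
    const int t2 = 17 * (src[0] - src[2 * stride]) + rnd;
    const int t3 = 22 * src[stride] + 10 * src[3 * stride];
    const int t4 = 22 * src[3 * stride] - 10 * src[stride];

    dst[0]          = (int16_t)((t1 + t3) >> shift);
    dst[stride]     = (int16_t)((t2 - t4) >> shift);
    dst[2 * stride] = (int16_t)((t2 + t4) >> shift);
    dst[3 * stride] = (int16_t)((t1 - t3) >> shift);
}

/* 4x4 transform on a packed 4x4 block: the same helper is used twice,
 * and only the stride and the downshift differ between the passes. */
static void inv_trans_4x4_sketch(int16_t block[16])
{
    int16_t tmp[16];
    int i;

    for (i = 0; i < 4; i++)          /* row pass, assumed shift of 3 */
        inv_trans4_1d(block + 4 * i, tmp + 4 * i, 1, 3);
    for (i = 0; i < 4; i++)          /* column pass, assumed shift of 7 */
        inv_trans4_1d(tmp + i, block + i, 4, 7);
}
```

In assembly the same shape would correspond to one macro taking the shift amount as an argument; whether that costs anything in scheduling is exactly the "without too much loss" question above.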

Another aspect: I had forgotten that we have existing arm assembly for the idct. In some cases, there's value in keeping the implementations similar if possible and relevant. But your implementation seems quite straightforward, and seems to get better benchmark numbers on the same cores, so I guess it's fine to diverge and add a new from-scratch implementation here.


// Martin

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths
