Re: [FFmpeg-devel] [PATCH] lavc/aarch64: Add neon implementation for sse16

Martin Storsjö Thu, 04 Aug 2022 00:46:33 -0700

On Mon, 25 Jul 2022, Hubert Mazur wrote:

Provide neon implementation for sse16 function.


Performance comparison tests are shown below.
- sse_0_c: 273.0
- sse_0_neon: 48.2

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <[email protected]>
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  4 ++
libavcodec/aarch64/me_cmp_neon.S         | 82 ++++++++++++++++++++++++
2 files changed, 86 insertions(+)

+// iterate by one
+2:
+
+        ld1             {v0.16b}, [x1], x3              // Load pix1
+        ld1             {v1.16b}, [x2], x3              // Load pix2
+
+        uabd            v30.16b, v0.16b, v1.16b
+        umull           v29.8h, v0.8b, v1.8b
+        umull2          v28.8h, v0.16b, v1.16b


This should probably be using v30 instead of v0/v1 in the umull here.
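
To illustrate the difference in scalar terms, here is a minimal C sketch. The function names are hypothetical and not taken from the patch; it only assumes that sse16 is meant to return the sum of squared differences over 16-pixel-wide rows. The first function squares the per-pixel difference, which is what uabd followed by umull/umull2 on v30 would give per lane. The second multiplies the raw inputs, which is what the quoted umull lines on v0/v1 compute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: sum of squared differences, 16 pixels per row. */
static int sse16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];  /* per-pixel difference */
            sum += d * d;               /* square of the difference */
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}

/* Scalar model of multiplying v0 by v1: products of the raw inputs,
 * which is not a squared error. */
static int sse16_product_of_inputs(const uint8_t *pix1, const uint8_t *pix2,
                                   ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += pix1[x] * pix2[x];
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

For one row where every pix1 value is 10 and every pix2 value is 7, the first function returns 16 * 9 = 144, while the second returns 16 * 70 = 1120.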

The whole codepath for non-modulo-4 heights is untested in practice. You can apply the patches from https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=7028 to make checkasm test it, so please make sure that the uncommon codepaths in the patches do work too.
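
As a rough scalar model of why this matters: a loop that handles four rows per iteration needs a separate tail for the remaining h % 4 rows, and that tail only runs when the height is not a multiple of 4. The C sketch below uses hypothetical names and deterministic filler data; it is not the checkasm test from the linked series, only an illustration of comparing an unrolled variant against a simple reference for every height.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared error of one 16-pixel row. */
static int row_sse16(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int x = 0; x < 16; x++) {
        int d = a[x] - b[x];
        sum += d * d;
    }
    return sum;
}

/* Plain reference: one row per iteration. */
static int sse16_simple(const uint8_t *pix1, const uint8_t *pix2,
                        ptrdiff_t stride, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
        sum += row_sse16(pix1 + y * stride, pix2 + y * stride);
    return sum;
}

/* Model of an unrolled implementation: four rows at a time, then a tail. */
static int sse16_unrolled(const uint8_t *pix1, const uint8_t *pix2,
                          ptrdiff_t stride, int h)
{
    int sum = 0;
    while (h >= 4) {                    /* main path, multiples of 4 */
        for (int r = 0; r < 4; r++)
            sum += row_sse16(pix1 + r * stride, pix2 + r * stride);
        pix1 += 4 * stride;
        pix2 += 4 * stride;
        h    -= 4;
    }
    while (h > 0) {                     /* tail path, only hit when h % 4 != 0 */
        sum += row_sse16(pix1, pix2);
        pix1 += stride;
        pix2 += stride;
        h--;
    }
    return sum;
}

/* Count heights in [1, max_h] (max_h <= 16) where the two variants disagree. */
static int count_mismatches(int max_h)
{
    uint8_t a[16 * 16], b[16 * 16];
    int mismatches = 0;
    for (int i = 0; i < 16 * 16; i++) {
        a[i] = (uint8_t)(i * 7 + 3);    /* arbitrary deterministic filler */
        b[i] = (uint8_t)(i * 13 + 1);
    }
    for (int h = 1; h <= max_h; h++)
        if (sse16_simple(a, b, 16, h) != sse16_unrolled(a, b, 16, h))
            mismatches++;
    return mismatches;
}
```

Testing only heights such as 8 or 16 would never execute the tail loop, so a bug there would go unnoticed.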


// Martin

