On Sun, 10 Jan 2021, [email protected] wrote:

From: Reimar Döffinger <[email protected]>

The speedup is small, around 1.5%, but these functions are fairly simple.
---
libavcodec/aarch64/hevcdsp_idct_neon.S    | 190 ++++++++++++++++++++++
libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
2 files changed, 214 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 9f67e45..edd03a0 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -36,6 +36,196 @@ const trans, align=4
        .short 31, 22, 13, 4
endconst

+.macro clip10 in1, in2, c1, c2
+        smax        \in1, \in1, \c1
+        smax        \in2, \in2, \c1
+        smin        \in1, \in1, \c2
+        smin        \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+        ld1             {v0.8H-v1.8H}, [x1]
+        ld1             {v2.S}[0], [x0], x2
+        ld1             {v2.S}[1], [x0], x2
+        ld1             {v2.S}[2], [x0], x2
+        ld1             {v2.S}[3], [x0], x2
+        sub             x0, x0, x2, lsl #2
+        uxtl            v8.8H, v2.8B
+        uxtl2           v9.8H, v2.16B
+        sqadd           v0.8H, v0.8H, v8.8H

FWIW, as a matter of taste, I dislike the shouty uppercase version of e.g. element specifiers, like .8H here. The code base contains both styles, but I'd say the lowercase form is more prevalent.

Overall, this patch looks good, nothing much to comment on I think. Not tested fully though, as it depends on the other patch, which still has a few issues (and fails checkasm).
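
For anyone wanting to sanity-check the 8-bit 4x4 case by hand, here is a minimal scalar sketch of what I'd expect the routine to compute, assuming the usual add-residual semantics (each pixel becomes dst + res, clamped to 0..255). The names clip_u8 and add_residual_4x4_8_ref are made up for illustration and are not FFmpeg identifiers.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference (not FFmpeg's own code):
 * dst points at a 4x4 block of 8-bit pixels whose rows are
 * 'stride' bytes apart; res holds 16 contiguous residuals.
 */
static void add_residual_4x4_8_ref(uint8_t *dst, const int16_t *res,
                                   ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;  /* next pixel row */
        res += 4;       /* next residual row */
    }
}
```

Feeding the same dst/res buffers to a sketch like this and to the NEON version, then comparing the resulting pixels, is roughly the kind of comparison a checkasm test makes.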

// Martin