On Sun, 10 Jan 2021, [email protected] wrote:
> From: Reimar Döffinger <[email protected]>
>
> Speedup is fairly small, around 1.5%, but these are fairly simple.
> ---
>  libavcodec/aarch64/hevcdsp_idct_neon.S    | 190 ++++++++++++++++++++++
>  libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
>  2 files changed, 214 insertions(+)
>
> diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
> index 9f67e45..edd03a0 100644
> --- a/libavcodec/aarch64/hevcdsp_idct_neon.S
> +++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
> @@ -36,6 +36,196 @@ const trans, align=4
>          .short 31, 22, 13, 4
>  endconst
>
> +.macro clip10 in1, in2, c1, c2
> +        smax            \in1, \in1, \c1
> +        smax            \in2, \in2, \c1
> +        smin            \in1, \in1, \c2
> +        smin            \in2, \in2, \c2
> +.endm
> +
> +function ff_hevc_add_residual_4x4_8_neon, export=1
> +        ld1             {v0.8H-v1.8H}, [x1]
> +        ld1             {v2.S}[0], [x0], x2
> +        ld1             {v2.S}[1], [x0], x2
> +        ld1             {v2.S}[2], [x0], x2
> +        ld1             {v2.S}[3], [x0], x2
> +        sub             x0, x0, x2, lsl #2
> +        uxtl            v8.8H, v2.8B
> +        uxtl2           v9.8H, v2.16B
> +        sqadd           v0.8H, v0.8H, v8.8H
FWIW, as a matter of taste, I dislike the shouty uppercase version of e.g. element specifiers, like .8H here. The code base contains both styles, but I'd say the lowercase form is more prevalent.
Overall, this patch looks good; nothing much to comment on, I think. Not fully tested though, as it depends on the other patch, which still has a few issues (and fails checkasm).
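For reference, since checkasm compares the assembly against the C implementation, here is a minimal scalar sketch of what the quoted 4x4 8-bit routine should compute. It assumes the argument order implied by the loads above (x0 = dst, x1 = res, x2 = stride in bytes); the names add_residual_4x4_8_ref and clip_u8 are made up for illustration and are not the actual libavcodec functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar model: add a packed 4x4 block of 16-bit residuals
 * to a 4x4 block of 8-bit pixels, clipping each result to 0..255.
 * Assumed layout: res is 16 contiguous coefficients, dst rows are
 * `stride` bytes apart. */
static void add_residual_4x4_8_ref(uint8_t *dst, const int16_t *res,
                                   ptrdiff_t stride)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + res[x]);
        dst += stride;  /* next pixel row */
        res += 4;       /* next residual row */
    }
}
```

Assuming the NEON version narrows the sums with unsigned saturation afterwards (the quoted excerpt doesn't show that part), the 16-bit saturating sqadd should not change the result: any sum it would saturate is far outside 0..255 and ends up clipped either way.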
// Martin
