Hi,

the attached patch is low-hanging fruit.

I think the code that uses the computed values could be improved (e.g. you probably need only half as many GPRs to store the results, and the data could probably be shuffled more efficiently), but that would require more effort. I'm mostly submitting this because it still applies, and I can't really spend more time on it.

-- 
Christophe
From 57819727586c186bfea733a8f06eead22ac6a1f2 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <[email protected]>
Date: Wed, 23 Jul 2014 23:21:20 +0200
Subject: [PATCH 08/13] x86: hevc_deblock: remove unnecessary masking

The unpacks/shuffles later on makes it unnecessary.

Before:
1508 decicycles in h, 2096759 runs, 393 skips
2512 decicycles in v, 2095422 runs, 1730 skips

After:
1477 decicycles in h, 2096745 runs, 407 skips
2484 decicycles in v, 2095297 runs, 1855 skips
---
 libavcodec/x86/hevc_deblock.asm | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/libavcodec/x86/hevc_deblock.asm b/libavcodec/x86/hevc_deblock.asm
index 89c0f9b..7fa0803 100644
--- a/libavcodec/x86/hevc_deblock.asm
+++ b/libavcodec/x86/hevc_deblock.asm
@@ -355,19 +355,15 @@ ALIGN 16
     psrld m8, 16
     paddw m8, m10
     movd r7d, m8
-    and r7, 0xffff; 1dp0 + 1dp3
     pshufd m8, m8, 0x4E
     movd r8d, m8
-    and r8, 0xffff; 0dp0 + 0dp3
 
     pshufd m8, m11, 0x31
     psrld m8, 16
     paddw m8, m11
     movd r9d, m8
-    and r9, 0xffff; 1dq0 + 1dq3
     pshufd m8, m8, 0x4E
     movd r10d, m8
-    and r10, 0xffff; 0dq0 + 0dq3
     ; end calc for weak filter
 
     ; filtering mask
-- 
1.9.2.msysgit.0
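
For readers who want the reasoning behind the commit message spelled out, here is a minimal scalar sketch in C. It is not the code from hevc_deblock.asm. It assumes that the later unpacks/shuffles only ever read the low 16-bit word of each GPR value, which is what the commit message implies but which is not shown in this hunk. The helper names consume_low_word and mask_is_redundant are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the later unpack/shuffle step: ASSUMED to read
 * only the low 16-bit word of the value that was moved out with movd. */
static uint16_t consume_low_word(uint32_t gpr)
{
    return (uint16_t)gpr; /* upper 16 bits are discarded here */
}

/* Returns 1 if masking with 0xffff before the consumer changes nothing. */
static int mask_is_redundant(uint32_t movd_result)
{
    uint32_t masked   = movd_result & 0xffffu; /* models: and r7, 0xffff */
    uint32_t unmasked = movd_result;           /* models: the and removed */
    return consume_low_word(masked) == consume_low_word(unmasked);
}

/* Small self-check over a few bit patterns, including non-zero upper halves. */
static void check_mask_redundant(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x0000ffffu, 0xffff0000u, 0xabcd1234u, 0xffffffffu
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(mask_is_redundant(samples[i]));
}
```

Under that assumption the upper half left behind by movd never reaches a result, so dropping the four and instructions does not change the output. If some later use did read the full 32-bit register, the masks would still be needed.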
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
