PR #20377 opened by george.zaguri
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20377
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20377.patch
Optimisations for the NEON platform, with fixes to improve performance on Mac and RPi4.
Review comments on the patch have been addressed.
Apple M2 (MacBook Air):
vvc_alf_classify_8x8_8_c: 2.6 ( 1.00x)
vvc_alf_classify_8x8_8_neon: 1.2 ( 2.06x)
vvc_alf_classify_8x8_10_c: 2.7 ( 1.00x)
vvc_alf_classify_8x8_10_neon: 1.1 ( 2.41x)
vvc_alf_classify_8x8_12_c: 2.8 ( 1.00x)
vvc_alf_classify_8x8_12_neon: 1.1 ( 2.48x)
vvc_alf_classify_16x16_8_c: 7.2 ( 1.00x)
vvc_alf_classify_16x16_8_neon: 3.4 ( 2.09x)
vvc_alf_classify_16x16_10_c: 4.3 ( 1.00x)
vvc_alf_classify_16x16_10_neon: 3.1 ( 1.38x)
vvc_alf_classify_16x16_12_c: 4.4 ( 1.00x)
vvc_alf_classify_16x16_12_neon: 3.2 ( 1.40x)
vvc_alf_classify_32x32_8_c: 13.6 ( 1.00x)
vvc_alf_classify_32x32_8_neon: 10.6 ( 1.29x)
vvc_alf_classify_32x32_10_c: 12.1 ( 1.00x)
vvc_alf_classify_32x32_10_neon: 9.6 ( 1.26x)
vvc_alf_classify_32x32_12_c: 12.3 ( 1.00x)
vvc_alf_classify_32x32_12_neon: 9.6 ( 1.28x)
vvc_alf_classify_64x64_8_c: 44.0 ( 1.00x)
vvc_alf_classify_64x64_8_neon: 38.6 ( 1.14x)
vvc_alf_classify_64x64_10_c: 41.0 ( 1.00x)
vvc_alf_classify_64x64_10_neon: 35.0 ( 1.17x)
vvc_alf_classify_64x64_12_c: 41.7 ( 1.00x)
vvc_alf_classify_64x64_12_neon: 34.9 ( 1.20x)
vvc_alf_classify_128x128_8_c: 157.8 ( 1.00x)
vvc_alf_classify_128x128_8_neon: 147.2 ( 1.07x)
vvc_alf_classify_128x128_10_c: 150.4 ( 1.00x)
vvc_alf_classify_128x128_10_neon: 131.6 ( 1.14x)
vvc_alf_classify_128x128_12_c: 150.0 ( 1.00x)
vvc_alf_classify_128x128_12_neon: 130.6 ( 1.15x)
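As a reading aid for the listings: the figure in parentheses appears to be the C timing divided by the NEON timing. Below is a minimal sketch under that assumption; `speedup_ratio` is a hypothetical helper for illustration, not a checkasm function or part of this patch. Ratios recomputed from the printed timings can differ slightly from the printed ratios for the smallest blocks, presumably because the printed timings are rounded.

```c
#include <assert.h>

/* Hypothetical helper (not part of checkasm or this patch): ratio of the
 * C reference timing to the NEON timing. Both arguments must use the same
 * unit; the result is dimensionless. */
double speedup_ratio(double c_time, double neon_time)
{
    assert(neon_time > 0.0); /* guard against division by zero */
    return c_time / neon_time;
}
```

For example, `speedup_ratio(157.8, 147.2)` evaluates to roughly 1.07, in line with the 128x128 8-bit row above.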
From 8b279086db3eb4d1c680be706756f57ca926e0b2 Mon Sep 17 00:00:00 2001
From: Georgii Zagoruiko
Date: Tue, 8 Jul 2025 23:52:18 +0400
Subject: [PATCH 1/3] avcodec/aarch64/vvc: optimised alf_classify function
8/10/12bit of vvc codec for aarch64
- vvc_alf.alf_classify [OK]
vvc_alf_classify_8x8_8_c: 1314.4 ( 1.00x)
vvc_alf_classify_8x8_8_neon: 794.3 ( 1.65x)
vvc_alf_classify_8x8_10_c: 1154.7 ( 1.00x)
vvc_alf_classify_8x8_10_neon: 770.0 ( 1.50x)
vvc_alf_classify_8x8_12_c: 1091.7 ( 1.00x)
vvc_alf_classify_8x8_12_neon: 770.7 ( 1.42x)
vvc_alf_classify_16x16_8_c: 3710.0 ( 1.00x)
vvc_alf_classify_16x16_8_neon: 2205.6 ( 1.68x)
vvc_alf_classify_16x16_10_c: 3306.2 ( 1.00x)
vvc_alf_classify_16x16_10_neon: 2087.9 ( 1.58x)
vvc_alf_classify_16x16_12_c: 3307.9 ( 1.00x)
vvc_alf_classify_16x16_12_neon: 2089.6 ( 1.58x)
vvc_alf_classify_32x32_8_c: 12770.2 ( 1.00x)
vvc_alf_classify_32x32_8_neon: 7124.6 ( 1.79x)
vvc_alf_classify_32x32_10_c: 11780.3 ( 1.00x)
vvc_alf_classify_32x32_10_neon: 6856.7 ( 1.72x)
vvc_alf_classify_32x32_12_c: 11779.2 ( 1.00x)
vvc_alf_classify_32x32_12_neon: 7002.8 ( 1.68x)
vvc_alf_classify_64x64_8_c: 49332.3 ( 1.00x)
vvc_alf_classify_64x64_8_neon: 26040.4 ( 1.89x)
vvc_alf_classify_64x64_10_c: 45353.7 ( 1.00x)
vvc_alf_classify_64x64_10_neon: 26251.5 ( 1.73x)
vvc_alf_classify_64x64_12_c: 44876.9 ( 1.00x)
vvc_alf_classify_64x64_12_neon: 26491.3 ( 1.69x)
vvc_alf_classify_128x128_8_c: 191953.5 ( 1.00x)
vvc_alf_classify_128x128_8_neon: 96166.3 ( 2.00x)
vvc_alf_classify_128x128_10_c: 177198.5 ( 1.00x)
vvc_alf_classify_128x128_10_neon: 96077.9 ( 1.84x)
vvc_alf_classify_128x128_12_c: 177461.1 ( 1.00x)
vvc_alf_classify_128x128_12_neon: 96184.4 ( 1.85x)
---
libavcodec/aarch64/vvc/alf.S | 278 ++
libavcodec/aarch64/vvc/alf_template.c | 87
libavcodec/aarch64/vvc/dsp_init.c | 6 +
3 files changed, 371 insertions(+)
diff --git a/li