[FFmpeg-devel] [PATCH] avcodec/aarch64/vvc: Implement dmvr_v_8 (PR #20563)

2025-09-20 Thread welder via ffmpeg-devel
PR #20563 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20563
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20563.patch

The primary optimization is to load the first row before entering the loop instead of loading two rows each iteration.

From 832f354be2ae0e63e8c47
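
To illustrate the row-reuse idea described in this message, here is a minimal C sketch, not the patch's NEON code. It assumes a generic 2-tap vertical filter; the function names, the fixed width `ROW_W`, and the weights `w0`/`w1` are hypothetical and chosen only for the example. The local arrays stand in for vector registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROW_W 8 /* hypothetical block width, used only in this sketch */

/* Baseline: every iteration reads both the current row and the next row. */
void vfilter_two_loads(int16_t *dst, const uint8_t *src, ptrdiff_t stride,
                       int height, int w0, int w1)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *top = src + y * stride;
        const uint8_t *bot = top + stride;
        for (int x = 0; x < ROW_W; x++)
            dst[y * ROW_W + x] = (int16_t)(top[x] * w0 + bot[x] * w1);
    }
}

/* Variant: the first row is loaded once before the loop. Each iteration
 * then loads only one new row and reuses the row kept from the previous
 * iteration. */
void vfilter_carry_row(int16_t *dst, const uint8_t *src, ptrdiff_t stride,
                       int height, int w0, int w1)
{
    uint8_t prev[ROW_W], next[ROW_W];

    memcpy(prev, src, ROW_W);                        /* load before the loop */
    for (int y = 0; y < height; y++) {
        memcpy(next, src + (y + 1) * stride, ROW_W); /* one load per row */
        for (int x = 0; x < ROW_W; x++)
            dst[y * ROW_W + x] = (int16_t)(prev[x] * w0 + next[x] * w1);
        memcpy(prev, next, ROW_W);                   /* carry the row over */
    }
}

/* Checks that both variants produce identical output on synthetic data. */
void vfilter_self_check(void)
{
    enum { H = 4, STRIDE = 16 };
    uint8_t src[(H + 1) * STRIDE];
    int16_t a[H * ROW_W], b[H * ROW_W];

    for (int i = 0; i < (H + 1) * STRIDE; i++)
        src[i] = (uint8_t)(i * 7 + 3);
    vfilter_two_loads(a, src, STRIDE, H, 12, 4);
    vfilter_carry_row(b, src, STRIDE, H, 12, 4);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Both variants read `height + 1` source rows in total; the second one just avoids fetching each interior row twice.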

[FFmpeg-devel] [PATCH] avcodec/aarch64/vvc: Unroll vvc_bdof_grad_filter_8x_neon (PR #20519)

2025-09-14 Thread welder via ffmpeg-devel
PR #20519 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20519
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20519.patch

I hope it's not overkill: I unrolled the 16-width variant and interleaved the loads, stores, and arithmetic ops to the best of my ability. Additional

[FFmpeg-devel] [PATCH] avcodec/aarch64/vvc: Optimize dmvr_hv_10 (PR #20517)

2025-09-14 Thread welder via ffmpeg-devel
PR #20517 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20517
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20517.patch

Nothing spectacular: merged a few adds and shifts into rounding shifts.

From 7809ff9746abf83bc41c1f13d9e1b2f1da6b0fb9 Mon Sep 17 00:00:00 2001
Fro
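
As a rough illustration of why an add followed by a shift can be merged into a rounding shift, the C sketch below compares the two-step form with a scalar model of a rounding shift right (the kind of operation AArch64 instructions such as SRSHR perform per lane). The function names are hypothetical, the exact instructions used in the patch are not shown here, and the code assumes `>>` on negative values is an arithmetic shift, as it is on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Two-step form: add the rounding bias 2^(n-1), then shift right by n.
 * Requires n >= 1 and x small enough that the add does not overflow. */
int32_t add_then_shift(int32_t x, unsigned n)
{
    return (x + ((int32_t)1 << (n - 1))) >> n;
}

/* Scalar model of a rounding shift right: shift, then add back the last
 * bit that was shifted out. No separate bias add is needed. */
int32_t rounding_shift_right(int32_t x, unsigned n)
{
    return (x >> n) + ((x >> (n - 1)) & 1);
}

/* Checks that both forms agree over a small range of inputs and shifts. */
void rounding_self_check(void)
{
    for (unsigned n = 1; n <= 6; n++)
        for (int32_t x = -1000; x <= 1000; x++)
            assert(add_then_shift(x, n) == rounding_shift_right(x, n));
}
```

The two forms agree because bit `n - 1` of `x` is set exactly when adding the bias carries into bit `n`.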

[FFmpeg-devel] [PATCH] avcodec/aarch64/vvc: Implement dmvr_h_8 (PR #20451)

2025-09-05 Thread welder via ffmpeg-devel
PR #20451 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20451
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20451.patch

From 93dea0cf1f04013607adb15be53f1be8061d4440 Mon Sep 17 00:00:00 2001
From: Krzysztof Pyrkosz
Date: Fri, 5 Sep 2025 22:24:55 +0200
Subject: [PATC

[FFmpeg-devel] [PATCH] Optimize vvc_apply_bdof_block_8x (PR #20448)

2025-09-05 Thread welder via ffmpeg-devel
PR #20448 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20448
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20448.patch

The speed improvement is listed in the commit message. The number of arithmetic operations is down from 10 to 6, and some cruft is cleaned up.

Fro

[FFmpeg-devel] [PATCH] Replace uxtl with umull in dmvr_hv_8 (PR #20442)

2025-09-04 Thread welder via ffmpeg-devel
PR #20442 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20442
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20442.patch

Low-hanging fruit.

Before and after on A78:
dmvr_hv_8_12x20_neon: 205.3 ( 5.21x)
dmvr_hv_8_20x12_neon: