PR #20563 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20563
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20563.patch
The primary optimization is to load the first row before entering the loop
instead of loading two rows each iteration.
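To illustrate the idea in scalar C (the patch itself is NEON assembly, and this is only a sketch under assumptions): the two-row vertical average, the function names, and the `MAX_W` limit below are all hypothetical and not taken from the PR. The first function loads both rows on every iteration; the second loads the first row once before the loop and then carries each newly loaded row over as the "previous" row, so every iteration performs a single row load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_W 64 /* hypothetical upper bound on block width */

/* Baseline: each iteration loads both the current row and the next row. */
void vavg_two_loads(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                    int w, int h)
{
    for (int y = 0; y < h; y++) {
        const uint8_t *top = src + y * stride;
        const uint8_t *bot = src + (y + 1) * stride;
        for (int x = 0; x < w; x++)
            dst[y * w + x] = (uint8_t)((top[x] + bot[x] + 1) >> 1);
    }
}

/* Carried row: load row 0 once, then load only one new row per iteration. */
void vavg_carried_row(uint8_t *dst, const uint8_t *src, ptrdiff_t stride,
                      int w, int h)
{
    uint8_t prev[MAX_W]; /* stands in for the vector register holding a row */

    if (w > MAX_W)
        return;
    memcpy(prev, src, (size_t)w); /* the load hoisted out of the loop */

    for (int y = 0; y < h; y++) {
        const uint8_t *cur = src + (y + 1) * stride; /* the only row load */
        for (int x = 0; x < w; x++) {
            uint8_t c = cur[x];
            dst[y * w + x] = (uint8_t)((prev[x] + c + 1) >> 1);
            prev[x] = c; /* this row is the "previous" row next time */
        }
    }
}
```

In a SIMD version the carried row would stay in a register, so the saving is one load per iteration rather than a copy through memory.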
>From 832f354be2ae0e63e8c47
PR #20519 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20519
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20519.patch
I hope it's not overkill: I unrolled the 16-width variant and interleaved the
loads, stores and arithmetic ops to the best of my ability. Additional
PR #20517 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20517
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20517.patch
Nothing spectacular: merged a few adds and shifts into rounding shifts.
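As a scalar sketch of what such a merge relies on (the helper names are hypothetical, and the actual instructions used in the patch are not shown here): adding the rounding constant `1 << (n - 1)` and then shifting right by `n` gives the same result as a single rounding right shift, which SIMD instruction sets such as AArch64 NEON offer as one operation.

```c
#include <stdint.h>

/* Two operations: add the rounding bias, then shift. */
static inline uint16_t add_then_shift(uint16_t x, unsigned n)
{
    return (uint16_t)(((uint32_t)x + (1u << (n - 1))) >> n);
}

/* Scalar model of a single rounding right shift: shift, then add back
 * the most significant discarded bit. Valid for 1 <= n <= 15. */
static inline uint16_t rounding_shift_right(uint16_t x, unsigned n)
{
    return (uint16_t)((x >> n) + ((x >> (n - 1)) & 1u));
}
```

The two helpers agree for every 16-bit input and every shift from 1 to 15, which is why the separate add can be dropped when a rounding shift instruction is available.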
>From 7809ff9746abf83bc41c1f13d9e1b2f1da6b0fb9 Mon Sep 17 00:00:00 2001
Fro
PR #20451 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20451
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20451.patch
>From 93dea0cf1f04013607adb15be53f1be8061d4440 Mon Sep 17 00:00:00 2001
From: Krzysztof Pyrkosz
Date: Fri, 5 Sep 2025 22:24:55 +0200
Subject: [PATC
PR #20448 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20448
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20448.patch
The speed improvement is listed in the commit message. The count of
arithmetic operations is down from 10 to 6, and some cruft is cleaned up.
>Fro
PR #20442 opened by welder
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20442
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20442.patch
Low-hanging fruit.
Before and after on A78:
dmvr_hv_8_12x20_neon: 205.3 ( 5.21x)
dmvr_hv_8_20x12_neon: