On Mon, 27 Jan 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> On Sun, Jan 26, 2025 at 01:29:38AM +0200, Martin Storsjö wrote:
>> With the following diff:
>>
>> @@ -40,8 +41,8 @@ function ff_aac_quant_bands_neon, export=1
>>          movi            v5.4s, 0x80, lsl #24
>>  .irp signed,1,0
>>  \signed:
>> -        subs            w3, w3, #4
>>          ld1             {v3.4s}, [x2], #16
>> +        subs            w3, w3, #4
>>          fmul            v3.4s, v3.4s, v0.s[0]
>>  .if \signed
>>          ld1             {v4.4s}, [x1], #16
>>
>> I'm getting the following improvement:
>>
>> Before:                       Cortex A53      A72      A78
>> quant_bands_signed_neon:          5661.0   2383.2   1113.2
>> quant_bands_unsigned_neon:        5401.5   2067.8    811.8
>> After:
>> quant_bands_signed_neon:          5402.5   2385.5   1090.0
>> quant_bands_unsigned_neon:        5145.5   2067.8    809.5
>>
>> No change on the A72 here, but apparently a (very) small improvement on
>> the A78, and a bigger improvement on the A53 as expected.
>>
>> If you don't mind these changes, we could land the change with that
>> tweaked. (I guess the numbers in the commit message could be re-measured,
>> but I'm not sure if they change enough to make much of a difference
>> there, especially on the cores you've measured on.)
>>
>> // Martin
>
> I don't mind these changes; I'm perfectly fine with applying any
> improvements on top of the patch. The speeds on A78 and x13s did not
> change significantly, so the initial benchmark values can be used.
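To put the quoted A53/A72/A78 figures in relative terms, here is a small sketch (not part of the patch or of FFmpeg's tooling) that computes what percentage of each original timing the reordering saves. It only assumes that lower benchmark numbers are better; the dictionary layout and the helper names percent_reduction and summarize are hypothetical, made up for illustration.

```python
# Benchmark figures quoted in the thread (lower is better).
BEFORE = {
    "quant_bands_signed_neon":   {"A53": 5661.0, "A72": 2383.2, "A78": 1113.2},
    "quant_bands_unsigned_neon": {"A53": 5401.5, "A72": 2067.8, "A78": 811.8},
}
AFTER = {
    "quant_bands_signed_neon":   {"A53": 5402.5, "A72": 2385.5, "A78": 1090.0},
    "quant_bands_unsigned_neon": {"A53": 5145.5, "A72": 2067.8, "A78": 809.5},
}


def percent_reduction(before: float, after: float) -> float:
    """Percentage of the original timing saved; negative means slower."""
    return (before - after) / before * 100.0


def summarize(before, after):
    """Return {(function, core): percent_reduction} for every measured pair."""
    result = {}
    for func, cores in before.items():
        for core, t_before in cores.items():
            result[(func, core)] = percent_reduction(t_before, after[func][core])
    return result


if __name__ == "__main__":
    for (func, core), pct in sorted(summarize(BEFORE, AFTER).items()):
        print(f"{func:28s} {core}: {pct:+.2f}%")
```

On the quoted numbers this comes out at about 4.6% (signed) and 4.7% (unsigned) on the A53, 2.1% and 0.3% on the A78, and -0.1% and 0.0% on the A72. Expressing the saving as a fraction of the original time, rather than as a speedup ratio, keeps near-zero differences like the A72 ones easy to read.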
Ok, great, I've pushed this patch then. Thanks for your contribution!

// Martin
