Re: [FFmpeg-devel] [PATCH] lavc/aarch64: h264qpel, add lowpass_8 based functions

Martin Storsjö Fri, 03 Sep 2021 04:26:27 -0700

On Fri, 3 Sep 2021, Martin Storsjö wrote:

+function \type\()_h264_qpel8_v_lowpass_neon_10
+        ld1             {v16.8H}, [x1], x3
+        ld1             {v18.8H}, [x1], x3
+        ld1             {v20.8H}, [x1], x3
+        ld1             {v22.8H}, [x1], x3
+        ld1             {v24.8H}, [x1], x3
+        ld1             {v26.8H}, [x1], x3
+        ld1             {v28.8H}, [x1], x3
+        ld1             {v30.8H}, [x1], x3
+        ld1             {v17.8H}, [x1], x3
+        ld1             {v19.8H}, [x1], x3
+        ld1             {v21.8H}, [x1], x3
+        ld1             {v23.8H}, [x1], x3
+        ld1             {v25.8H}, [x1]
+
+        transpose_8x8H  v16, v18, v20, v22, v24, v26, v28, v30, v0,  v1
+        transpose_8x8H  v17, v19, v21, v23, v25, v27, v29, v31, v0,  v1
+        lowpass_8_10    v16, v17, v18, v19, v16, v17
+        lowpass_8_10    v20, v21, v22, v23, v18, v19
+        lowpass_8_10    v24, v25, v26, v27, v20, v21
+        lowpass_8_10    v28, v29, v30, v31, v22, v23
+        transpose_8x8H  v16, v17, v18, v19, v20, v21, v22, v23, v0,  v1

I'm a bit surprised by doing this kind of vertical filtering by transposing and doing it horizontally, when vertical filtering can be done so efficiently as-is without needing any extra 'ext' instructions and such. But I see that the existing code does it this way. I'll try to make a PoC that rewrites the existing code for some case, to see how it behaves without the transposes.
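
For illustration only (this is not code from the patch or from FFmpeg), here is a scalar C sketch of the 8x8 vertical lowpass for 10-bit pixels. It assumes the standard H.264 six-tap kernel (1, -5, 20, 20, -5, 1) with rounding by 16 and a shift by 5. The function name, the helper, and the use of strides counted in pixels rather than bytes are all hypothetical. Each output pixel reads only from its own column, so a SIMD version can treat one whole row as one vector and combine six row vectors lane by lane, which is why no transpose or 'ext' shuffling should be needed in the vertical case.

```c
#include <stddef.h>
#include <stdint.h>

/* Round, scale and clamp a filter sum to the 10-bit range [0, 1023].
 * Negative sums are clamped before shifting so that no signed
 * right shift of a negative value is ever performed. */
static inline uint16_t round_clip10(int sum)
{
    int v = sum + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return v > 1023 ? 1023 : (uint16_t)v;
}

/* Hypothetical scalar reference for an 8x8 vertical lowpass.
 * Assumption: strides are counted in pixels (uint16_t), not bytes.
 * src points at the first output row; rows src[-2] .. src[+10] must be
 * readable, i.e. 13 input rows in total, matching the 13 loads above. */
static void qpel8_v_lowpass_10_ref(uint16_t *dst, const uint16_t *src,
                                   ptrdiff_t dst_stride, ptrdiff_t src_stride)
{
    for (int y = 0; y < 8; y++) {
        /* Every column x is independent: these are the SIMD lanes. */
        for (int x = 0; x < 8; x++) {
            const uint16_t *p = src + x;
            int sum =      p[-2 * src_stride]
                    -  5 * p[-1 * src_stride]
                    + 20 * p[ 0]
                    + 20 * p[ 1 * src_stride]
                    -  5 * p[ 2 * src_stride]
                    +      p[ 3 * src_stride];
            dst[x] = round_clip10(sum);
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```

In a NEON version of this sketch, the inner loop over x would collapse into a handful of vector multiply-accumulate operations on six row registers, with the oldest row dropped and one new row loaded per output row.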

The potential speedups for the vertical filters are huge actually; I've sent a patch that IMO simplifies this (getting rid of all transposes). I'd appreciate it if you'd remodel your patch according to it.


// Martin