me_cmp: R-V V pix_abs

flow gg Tue, 06 Feb 2024 16:01:50 -0800

I think that holds in most cases, but for this function specifically,
reducing only once at the end turns out to be slower.


The currently submitted version roughly takes:
pix_abs_0_0_rvv_i32: 136.2

The version that reduces only once takes:
pix_abs_0_0_rvv_i32: 169.2

Here is the implementation of the version that uses it only once:

func ff_pix_abs16_temp_rvv, zve32x
        vsetivli        zero, 16, e32, m4, ta, ma
        vmv.v.i         v24, 0
        vmv.s.x         v0, zero
1:
        vsetvli         zero, zero, e8, m1, tu, ma
        vle8.v          v4, (a1)
        vle8.v          v12, (a2)
        addi            a4, a4, -1
        vwsubu.vv       v16, v4, v12
        add             a1, a1, a3
        vwsubu.vv       v20, v12, v4
        vsetvli         zero, zero, e16, m2, tu, ma
        vmax.vv         v16, v16, v20
        add             a2, a2, a3
        vwadd.wv        v24, v24, v16
        bnez            a4, 1b

        vsetvli         zero, zero, e32, m4, ta, ma
        vwredsumu.vs    v0, v24, v0
        vmv.x.s         a0, v0
        ret
endfunc
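
For readers less familiar with RVV, the two strategies being compared can
be sketched in plain C. This is only an illustration of the arithmetic, not
the submitted code: the function names sad16_reduce_per_row and
sad16_reduce_once are hypothetical, and scalar C says nothing about the
relative speed of the vector versions. It only shows that both orderings
produce the same sum of absolute differences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_W 16 /* pix_abs16 compares rows of 16 pixels */

/* Hypothetical helper: collapse every row to a scalar right away,
 * i.e. one reduction per loop iteration. */
static int sad16_reduce_per_row(const uint8_t *a, const uint8_t *b,
                                ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        int row = 0;

        for (int x = 0; x < BLOCK_W; x++)
            row += abs(a[x] - b[x]);
        sum += row;
        a += stride;
        b += stride;
    }
    return sum;
}

/* Hypothetical helper: keep one wide accumulator per lane and
 * reduce a single time after the loop. */
static int sad16_reduce_once(const uint8_t *a, const uint8_t *b,
                             ptrdiff_t stride, int h)
{
    uint32_t acc[BLOCK_W] = { 0 };
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < BLOCK_W; x++)
            acc[x] += (uint32_t)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    for (int x = 0; x < BLOCK_W; x++) /* the single final reduction */
        sum += (int)acc[x];
    return sum;
}
```

Both helpers take the two pixel pointers, the line stride, and the row
count, mirroring the a1/a2, a3 and a4 registers in the assembly above.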

Rémi Denis-Courmont <[email protected]> wrote on Wed, Feb 7, 2024 at 00:58:

> Hi,
>
> To sum a vector, you should only reduce once at the end of the function,
> c.f. how it's done in existing scalar products. Reduction instructions are
> (intrinsically) slow.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/

Re: [FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs