On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly
<[email protected]> wrote:
> +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset, src
Only 8 xmm registers are used, so 8 should be used instead of 16 here.
Otherwise it causes unnecessary spilling of registers on 64-bit
Windows.
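To put a rough number on it: the Win64 ABI treats xmm6-xmm15 as callee-saved, so every declared register above xmm5 has to be saved and restored. A small C sketch of that arithmetic follows. The helper name is made up for illustration, and it assumes the prologue saves exactly the declared registers from xmm6 upward.

```c
/* Sketch only: number of XMM registers a Win64 prologue would have to
 * save, assuming a declared count of N means xmm0..xmm(N-1) are
 * clobbered and everything from xmm6 upward is callee-saved. */
int win64_xmm_saves(int declared_xmm)
{
    const int first_callee_saved = 6; /* xmm0-xmm5 are volatile on Win64 */

    if (declared_xmm <= first_callee_saved)
        return 0;
    return declared_xmm - first_callee_saved;
}
```

Under that assumption, declaring 16 costs 10 saves, while declaring 8 costs 2.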
> +%if ARCH_X86_64
> +%define ptr_size 8
[...]
> +%else
> +%define ptr_size 4
The predefined variable gprsize already exists for this purpose, so
that can be used instead.
> + movq xmm3, [ditherq]
If vpbroadcastq m3, [ditherq] is used for AVX2 here, then the following
> + vperm2i128 m3, m3, m3, 0
instruction can be eliminated.
> + punpcklwd m1, m1
> + punpckldq m1, m1
Can be replaced with pshuflw m1, m1, q0000
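For reference, a scalar C model of the three instructions on a 128-bit register, treated as eight 16-bit words with word 0 lowest. This is only a sketch: the type and function names are invented here and nothing in it comes from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit register: eight 16-bit words, word 0 lowest. */
typedef struct { uint16_t w[8]; } vec128;

/* punpcklwd a, a: interleave the low four words of a with themselves. */
vec128 model_punpcklwd_self(vec128 a)
{
    vec128 r;
    for (int i = 0; i < 4; i++) {
        r.w[2 * i]     = a.w[i];
        r.w[2 * i + 1] = a.w[i];
    }
    return r;
}

/* punpckldq a, a: interleave the low two dwords of a with themselves. */
vec128 model_punpckldq_self(vec128 a)
{
    vec128 r;
    for (int i = 0; i < 2; i++) {
        /* dword i is made of words 2i and 2i+1 */
        r.w[4 * i]     = a.w[2 * i];
        r.w[4 * i + 1] = a.w[2 * i + 1];
        r.w[4 * i + 2] = a.w[2 * i];
        r.w[4 * i + 3] = a.w[2 * i + 1];
    }
    return r;
}

/* pshuflw a, a, q0000: copy word 0 into the low four words and leave
 * the high four words untouched. */
vec128 model_pshuflw_q0000(vec128 a)
{
    vec128 r = a;
    for (int i = 0; i < 4; i++)
        r.w[i] = a.w[0];
    return r;
}

/* Returns 1 if the low 64 bits (words 0..3) of both sequences agree. */
int low_qword_matches(vec128 in)
{
    vec128 two_step = model_punpckldq_self(model_punpcklwd_self(in));
    vec128 one_step = model_pshuflw_q0000(in);
    return memcmp(two_step.w, one_step.w, 4 * sizeof(uint16_t)) == 0;
}
```

In this model the low four words come out identical. The upper four words do not: the two-instruction sequence replicates word 1 there, while pshuflw keeps the original contents. So the replacement relies on the upper half not being consumed as-is, which cannot be confirmed from the quoted hunk alone.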
> + mov srcq, [filterSizeq]
> + test srcd, srcd
test srcq, srcq should be used here, since the lower 32 bits of a
valid pointer could randomly happen to be zero on a 64-bit system.
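A minimal C illustration of the difference, modelling the register as a 64-bit integer. The function names and the example address 0x100000000 are hypothetical, picked only to show the failure mode.

```c
#include <stdint.h>

/* Models `test srcd, srcd`: only the low 32 bits are examined. */
int is_zero_test32(uint64_t reg)
{
    return (uint32_t)reg == 0;
}

/* Models `test srcq, srcq`: the full 64-bit register is examined. */
int is_zero_test64(uint64_t reg)
{
    return reg == 0;
}
```

For a pointer value of 0x100000000, is_zero_test32() reports zero even though the pointer is non-NULL, while is_zero_test64() gives the intended answer.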
> + REP_RET
Since non-temporal stores are being used, this should be replaced with
    sfence
    RET
to guarantee proper memory ordering semantics in multi-threaded use
cases. Things will usually work fine without it, but may break in
"fun to debug" ways.
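The same pattern expressed with C intrinsics, as a rough sketch: a loop of non-temporal stores followed by a store fence before returning. The function name is made up, the destination is assumed to be 16-byte aligned and the size a multiple of 16, and the memcpy fallback is there only so the sketch builds on targets without SSE2.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1
#else
#define HAVE_SSE2_SKETCH 0
#endif

/* Copy `bytes` bytes (assumed to be a multiple of 16) from src to a
 * 16-byte aligned dst using non-temporal stores, then issue sfence so
 * the stores are ordered before anything the caller does afterwards. */
void copy_nt_fenced(void *dst, const void *src, size_t bytes)
{
#if HAVE_SSE2_SKETCH
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

    _mm_sfence(); /* counterpart of the sfence before RET */
#else
    memcpy(dst, src, bytes); /* fallback for non-SSE2 targets */
#endif
}
```

Without the fence, another thread that is told the buffer is ready (for example through a flag written with an ordinary store) is not guaranteed to see the non-temporal stores yet.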