Hi,

One more thing I forgot to mention.

On Sun, May 19, 2024 at 8:46 PM Stone Chen <[email protected]> wrote:

> +pw_1: dw 1
>
[..]

> +    vpbroadcastw       m4, [pw_1]
>

We typically suggest using vpbroadcastd rather than vpbroadcastw (and then
defining the constant as pw_1: times 2 dw 1). Agner Fog's instruction tables
show that on e.g. Haswell, the former (d) is 1 uop with 5 cycles latency,
whereas the latter (w) is 3 uops with 7 cycles latency; more generally, d
is faster than w.
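To illustrate why the constant has to grow to "times 2 dw 1" when switching to vpbroadcastd, here is a small C model of the two loads. This is a sketch for illustration only, not FFmpeg code; the function and array names are hypothetical, and the ymm register is modeled as 16 words. The key point is that vpbroadcastd reads 4 bytes from memory, so with a lone "dw 1" it would also pick up whatever 2 bytes happen to follow the constant in rodata.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of the two AVX2 broadcast loads.
 * dst holds the 32 bytes of a ymm register viewed as 16 words. */

/* Like vpbroadcastw ymm, m16: load 2 bytes, replicate to all 16 word lanes. */
static void model_broadcast_w(uint16_t dst[16], const void *src)
{
    uint16_t w;
    memcpy(&w, src, sizeof(w));
    for (int i = 0; i < 16; i++)
        dst[i] = w;
}

/* Like vpbroadcastd ymm, m32: load 4 bytes, replicate to all 8 dword lanes.
 * Note that this reads 4 bytes from src, not 2. */
static void model_broadcast_d(uint16_t dst[16], const void *src)
{
    for (int i = 0; i < 8; i++)
        memcpy(&dst[2 * i], src, 4);
}

/* Models "pw_1: times 2 dw 1": 4 bytes holding two words of 1. */
static const uint16_t pw_1_x2[2] = { 1, 1 };

/* Models "pw_1: dw 1" directly followed by some unrelated
 * constant (the value 0x1234 is made up for illustration). */
static const uint16_t pw_1_x1_then_other[2] = { 1, 0x1234 };

/* Returns 1 if the dword broadcast of src yields the same register
 * contents as the word broadcast of src, 0 otherwise. */
static int broadcasts_match(const uint16_t *src)
{
    uint16_t w[16], d[16];
    model_broadcast_w(w, src);
    model_broadcast_d(d, src);
    return memcmp(w, d, sizeof(w)) == 0;
}
```

With pw_1_x2, both broadcasts give sixteen words of 1. With pw_1_x1_then_other, the dword broadcast alternates 1 and 0x1234, which is the bug you would get by switching the instruction without widening the constant.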

Ronald
_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".
