Hi, one more, I forgot.
On Sun, May 19, 2024 at 8:46 PM Stone Chen <[email protected]> wrote:
> +pw_1: dw 1
> [..]
> + vpbroadcastw m4, [pw_1]

We typically suggest using vpbroadcastd, not vpbroadcastw (and then pw_1: times 2 dw 1). Agner's tables show that on e.g. Haswell, the former (d) is 1 uop with 5 cycles latency, whereas the latter (w) is 3 uops with 7 cycles latency; more generally, d is faster than w.

Ronald
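For illustration, here is a minimal standalone sketch of the suggested form in plain NASM syntax. It assumes x86-64 with AVX2 and the SysV calling convention; the function name broadcast_pw_1 and its dst argument are hypothetical and not part of the patch. In the patch itself the same load would be written with the m4 register name, as in the quoted lines.

```nasm
default rel

section .rodata
align 4
pw_1: times 2 dw 1              ; two 16-bit ones, i.e. one dword 0x00010001

section .text
global broadcast_pw_1           ; hypothetical name, for illustration only

; void broadcast_pw_1(void *dst)
; Assumes SysV x86-64: dst arrives in rdi and points to 32 writable bytes.
broadcast_pw_1:
    vpbroadcastd ymm4, [pw_1]   ; dword broadcast: every 16-bit lane of ymm4 is 1
    vmovdqu      [rdi], ymm4    ; unaligned store of the 16 words
    vzeroupper                  ; avoid AVX/SSE transition penalties in the caller
    ret
```

Because the constant is stored twice, the dword broadcast fills the register with the same bit pattern a word broadcast of a single dw 1 would produce, so the surrounding code does not need to change.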
