On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali <[email protected]> wrote:
> 2017-12-13 17:37 GMT+01:00 Henrik Gramner <[email protected]>:
>> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
>> packuswb + vpermq.
>>
> Sorry don't understand this part
> do you mean 128 bit packuswb + movh for each lane ?
> or something else ?

    packuswb m0, m0
    vpermq m0, m0, q3120

vs.

    vextracti128 xm1, m0, 1
    packuswb xm0, xm1

The second version uses a 128-bit op instead of a 256-bit one, which is
generally preferable whenever possible.
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
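
To illustrate why the two sequences are interchangeable, here is a rough pure-Python model of the lane behaviour. It is only a sketch under assumptions: m0 is taken to hold 16 signed words, only the low 128 bits of the result are assumed to be used afterwards, and q3120 is assumed to encode as the immediate 0xD8 (the x86inc convention). The Python function names are hypothetical stand-ins for the instructions, not a real API.

```python
def sat_u8(w):
    """Saturate one signed 16-bit word to an unsigned byte, as packuswb does."""
    return max(0, min(255, w))

def packuswb_128(a, b):
    """128-bit packuswb: 8 words from a, then 8 words from b, gives 16 bytes."""
    return [sat_u8(w) for w in a] + [sat_u8(w) for w in b]

def packuswb_256(a, b):
    """256-bit packuswb works independently on each 128-bit lane."""
    return packuswb_128(a[:8], b[:8]) + packuswb_128(a[8:], b[8:])

def vpermq(src, imm8):
    """Permute the four 64-bit qwords (8 bytes each) of a 32-byte register."""
    out = []
    for i in range(4):
        sel = (imm8 >> (2 * i)) & 3  # two selector bits per destination qword
        out += src[8 * sel:8 * sel + 8]
    return out

def vextracti128(words, lane):
    """Extract one 128-bit lane (8 words) from a 16-word register."""
    return words[8 * lane:8 * lane + 8]

Q3120 = 0xD8  # assumed encoding: dest qwords 3,2,1,0 take src qwords 3,1,2,0

def pack_with_vpermq(m0):
    # packuswb m0, m0 ; vpermq m0, m0, q3120 (low 128 bits of the result)
    return vpermq(packuswb_256(m0, m0), Q3120)[:16]

def pack_with_vextract(m0):
    # vextracti128 xm1, m0, 1 ; packuswb xm0, xm1
    xm1 = vextracti128(m0, 1)
    return packuswb_128(m0[:8], xm1)

m0 = [-5, 0, 1, 127, 128, 255, 256, 300,
      1000, -32768, 32767, 7, 42, 200, 254, 99]
assert pack_with_vpermq(m0) == pack_with_vextract(m0)
```

The model only compares the low 128 bits. The upper half of the register differs between the two sequences, so the swap is only valid when nothing reads the upper half afterwards.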
