On Tue, Oct 20, 2020 at 04:39:56PM -0400, Arvind Sankar wrote:
> Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
> (tested on Broadwell Xeon) while not increasing code size too much.
>
> Signed-off-by: Arvind Sankar <nived...@alum.mit.edu>
> ---

Looks good,

Reviewed-by: Eric Biggers <ebigg...@google.com>