On Mon, 19 Mar 2018, Rahul Lakkireddy wrote:

> Use VMOVDQU AVX CPU instruction when available to do 256-bit
> IO read and write.

That's not what the patch does. See below.

> Signed-off-by: Rahul Lakkireddy <rahul.lakkire...@chelsio.com>
> Signed-off-by: Ganesh Goudar <ganes...@chelsio.com>

That Signed-off-by chain is wrong....

> +#ifdef CONFIG_AS_AVX
> +#include <asm/fpu/api.h>
> +
> +static inline u256 __readqq(const volatile void __iomem *addr)
> +{
> +	u256 ret;
> +
> +	kernel_fpu_begin();
> +	asm volatile("vmovdqu %0, %%ymm0" :
> +		     : "m" (*(volatile u256 __force *)addr));
> +	asm volatile("vmovdqu %%ymm0, %0" : "=m" (ret));
> +	kernel_fpu_end();
> +	return ret;

You _cannot_ assume that the instruction is available just because
CONFIG_AS_AVX is set. CONFIG_AS_AVX only says that the assembler can
emit the instruction. The availability on the running CPU is determined
by the runtime evaluated CPU feature flags, i.e. X86_FEATURE_AVX.

Aside from that, I very much doubt that this is faster than 4 consecutive
64bit reads/writes, as you have the full overhead of
kernel_fpu_begin()/end() for each access. You did not provide any numbers
for this, so it's even harder to determine.

As far as I can tell, the code where you are using this is a debug
facility. What's the point? Debug is hardly a performance critical
problem.

Thanks,

	tglx
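
To illustrate the two points above (runtime feature detection versus a build-time config symbol, and the plain alternative of four consecutive 64bit reads), here is a minimal userspace sketch. It is not kernel code and not taken from the patch: u256_sketch, cpu_has_avx_sketch() and read256_4x64() are hypothetical names, and the GCC/Clang builtin __builtin_cpu_supports() only stands in for what a kernel check such as boot_cpu_has(X86_FEATURE_AVX) would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-bit container; the patch's own u256 type is not shown in this mail. */
typedef struct {
	uint64_t q[4];
} u256_sketch;

/*
 * Runtime check: asks the CPU we are actually running on.
 * Userspace stand-in (assumption) for a kernel feature-flag test.
 * A build-time symbol can only tell us what the toolchain can emit.
 */
static inline bool cpu_has_avx_sketch(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx") != 0;
#else
	return false;	/* not x86 or unknown compiler: report no AVX */
#endif
}

/*
 * The plain alternative: four consecutive 64bit reads.
 * No FPU state has to be saved or restored around the access.
 */
static inline u256_sketch read256_4x64(const volatile void *addr)
{
	const volatile uint64_t *p = addr;
	u256_sketch ret;
	int i;

	assert(addr != NULL);
	for (i = 0; i < 4; i++)
		ret.q[i] = p[i];	/* one 64bit volatile load per iteration */
	return ret;
}
```

Any wide-register path would have to be gated on the runtime check, with read256_4x64() as the fallback. Whether the wide path is worth it at all is a question only measurements can answer.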