On 09/13/2016 09:10 AM, Paolo Bonzini wrote:
> @@ -177,16 +231,15 @@ bool test_buffer_is_zero_next_accel(void)
>
>  static bool select_accel_fn(const void *buf, size_t len)
>  {
> -    uintptr_t ibuf = (uintptr_t)buf;
>  #ifdef CONFIG_AVX2_OPT
> -    if (len % 128 == 0 && ibuf % 32 == 0 && (cpuid_cache & CACHE_AVX2)) {
> +    if (len >= 128 && (cpuid_cache & CACHE_AVX2)) {
>          return buffer_zero_avx2(buf, len);
>      }
> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE4)) {
> +    if (len >= 64 && (cpuid_cache & CACHE_SSE4)) {
>          return buffer_zero_sse4(buf, len);
>      }
>  #endif
> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE2)) {
> +    if (len >= 64 && (cpuid_cache & CACHE_SSE2)) {
>          return buffer_zero_sse2(buf, len);
>      }

You've dropped a major change to select_accel_fn here.

(1) The avx2 routine, as written, can support len >= 64, therefore a common
    test works for all of the vectorized functions.

(2) I had saved the pointer to the routine, so that we didn't have to
    repeatedly test multiple cpuid_cache bits.


r~
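
For reference, a rough sketch of the shape that points (1) and (2) describe, not the code from either version of the patch. The CACHE_* values, the accel_fn_t typedef, buffer_accel, init_accel and scalar_is_zero are all assumptions made up for illustration, and the three buffer_zero_* bodies are scalar stand-ins so that the sketch builds without any vector intrinsics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Feature bits named in the quoted patch; the values here are assumed. */
#define CACHE_SSE2 1u
#define CACHE_SSE4 2u
#define CACHE_AVX2 4u

/* Hypothetical type for a "is this buffer all zero" routine. */
typedef bool (*accel_fn_t)(const void *buf, size_t len);

/* Portable byte loop, used as fallback and as stand-in body below. */
static bool scalar_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Stand-ins for the vectorized routines (the real ones use SIMD). */
static bool buffer_zero_sse2(const void *buf, size_t len)
{
    return scalar_is_zero(buf, len);
}

static bool buffer_zero_sse4(const void *buf, size_t len)
{
    return scalar_is_zero(buf, len);
}

static bool buffer_zero_avx2(const void *buf, size_t len)
{
    return scalar_is_zero(buf, len);
}

/* Saved pointer to the selected routine (hypothetical name). */
static accel_fn_t buffer_accel = scalar_is_zero;

/* Inspect the cpuid bits once and remember the best routine. */
static void init_accel(unsigned cache)
{
    accel_fn_t fn = scalar_is_zero;

    if (cache & CACHE_SSE2) {
        fn = buffer_zero_sse2;
    }
    if (cache & CACHE_SSE4) {
        fn = buffer_zero_sse4;
    }
    if (cache & CACHE_AVX2) {
        fn = buffer_zero_avx2;
    }
    buffer_accel = fn;
}

/* One common length test, then a single indirect call. */
static bool select_accel_fn(const void *buf, size_t len)
{
    if (len >= 64) {
        return buffer_accel(buf, len);
    }
    return scalar_is_zero(buf, len);
}
```

With this arrangement init_accel() runs once after the cpuid probe, and every later call to select_accel_fn() pays for one length comparison and one indirect call instead of up to three cpuid_cache bit tests.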