On Wed, 18 Oct 2023, Jakub Jelinek wrote:

> On Wed, Oct 18, 2023 at 07:14:36AM +0000, Richard Biener wrote:
> > It's interesting that when the target has AVX512 enabled we get
> > AVX512 style masks used also for SSE and AVX vector sizes but the
> > OMP SIMD clones for SSE and AVX vector sizes use SSE/AVX style
> > masks and only the AVX512 size clone uses the AVX512 integer mode
> > mask.  That necessarily requires an extra setup instruction for
> > the mask argument.
>
> It is an ABI matter, the ABI of the clones shouldn't change just because
> of a supposedly non ABI changing option (ISA flags like -mavx512f etc.).
> Under the hood, if the callers are -mavx512f the expectation is that the
> AVX512 simd clone will be used, but of course that doesn't have to be the
> case either because of options requesting only 256 or 128-bit vector width
> or loops with small safelen or number of iterations or other reasons.

Yes, understood.  Just saying that with AVX10 we're going to hit that
oddball case by default (and on most Intel sub-archs the default is
256bit irrespective of AVX512 support).  Possibly extending the ABI
to add a "AVX10"(?) case with AVX vector width but AVX512 style mask
(but only up to SImode?) could make sense.

The mask fiddling for vect-simd-clone-16.c is for example

        movl    $1, %edx
..
        vpbroadcastd    %edx, %ymm5
        vmovdqa %ymm5, -144(%rbp)
..
.L6:
..
        vpblendmd       -144(%rbp), %ymm3, %ymm1{%k1}
..
        call    _ZGVdM8v_foo

so inside of the loop it's a single instruction, but involving memory
because of the call ABI.  I can't think of a more efficient way to
do %k ? { 1, .. } : { 0, .. } besides doing the %k mask producing
compare twice, for the OMP SIMD call argument with AVX style (but
that's going to be difficult for the vectorizer, the mask is not
always going to be directly produced by a compare).

Richard.
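
For readers less familiar with the two mask representations, here is a
rough scalar model in C of the %k ? { 1, .. } : { 0, .. } conversion
discussed above.  It is only a sketch: the helper names, the 8-lane
width and the a[i] > 0 compare are assumptions made for illustration,
not code from GCC or from vect-simd-clone-16.c.

```c
#include <stdint.h>

#define LANES 8  /* assumed: eight 32-bit lanes, as in a 256-bit vector */

/* Hypothetical helper: model an AVX512-style mask, one bit per lane,
   produced by a compare (a[i] > 0 is an arbitrary example predicate). */
static uint8_t compare_to_kmask(const int32_t a[LANES])
{
    uint8_t k = 0;
    for (int i = 0; i < LANES; i++)
        if (a[i] > 0)
            k |= (uint8_t)(1u << i);
    return k;
}

/* Hypothetical helper: expand the bitmask into a per-lane vector mask
   holding 1 for active lanes and 0 otherwise, i.e. the scalar
   equivalent of the masked blend of { 1, .. } into { 0, .. }. */
static void kmask_to_vector_mask(uint8_t k, int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = ((k >> i) & 1) ? 1 : 0;
}
```

In this model the expansion step plays the role of the vpblendmd in
the loop; the "compare twice" alternative would amount to computing
the per-lane 0/1 values directly from the compare instead of going
through the bitmask.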