
Richard Biener Wed, 18 Oct 2023 01:01:12 -0700

On Wed, 18 Oct 2023, Jakub Jelinek wrote:

> On Wed, Oct 18, 2023 at 07:14:36AM +0000, Richard Biener wrote:
> > It's interesting that when the target has AVX512 enabled we get
> > AVX512 style masks used also for SSE and AVX vector sizes but the
> > OMP SIMD clones for SSE and AVX vector sizes use SSE/AVX style
> > masks and only the AVX512 size clone uses the AVX512 integer mode
> > mask.  That necessarily requires an extra setup instruction for
> > the mask argument.
> 
> It is an ABI matter, the ABI of the clones shouldn't change just because
> of a supposedly non ABI changing option (ISA flags like -mavx512f etc.).
> Under the hood, if the callers are -mavx512f the expectation is that the
> AVX512 simd clone will be used, but of course that doesn't have to be the
> case either because of options requesting only 256 or 128-bit vector width
> or loops with small safelen or number of iterations or other reasons.


Yes, understood.  Just saying that with AVX10 we're going to hit that
oddball case by default (and on most Intel sub-archs the default vector
width is 256-bit irrespective of AVX512 support).  Possibly extending
the ABI to add an "AVX10"(?) case with AVX vector width but an AVX512
style mask (but only up to SImode?) could make sense.
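
For reference, the clone name that shows up in the assembly further down,
_ZGVdM8v_foo, encodes the ISA class, the masked/unmasked flag and the vector
length right after the _ZGV prefix, so a new clone flavour would presumably
have to be distinguishable there.  A minimal sketch of decoding that prefix,
assuming the usual x86 letters ('b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX512; 'M'
masked, 'N' unmasked).  The struct and function names are made up for
illustration and are not GCC internals:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields decoded from a clone name.  */
struct simd_clone_info
{
  char isa;       /* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX512 */
  int masked;     /* 1 for 'M' (inbranch), 0 for 'N' (notinbranch) */
  unsigned vlen;  /* number of lanes */
};

/* Decode the "_ZGV<isa><mask><vlen>" prefix of NAME into INFO.
   Returns 0 on success and -1 if the prefix is not recognized.
   Parameter kinds and the base function name are not parsed.  */
static int
parse_simd_clone_name (const char *name, struct simd_clone_info *info)
{
  assert (name != NULL && info != NULL);

  if (strncmp (name, "_ZGV", 4) != 0)
    return -1;
  const char *p = name + 4;

  /* ISA class letter; reject the terminator before calling strchr.  */
  if (*p == '\0' || strchr ("bcde", *p) == NULL)
    return -1;
  info->isa = *p++;

  /* Masked or unmasked clone.  */
  if (*p != 'M' && *p != 'N')
    return -1;
  info->masked = (*p++ == 'M');

  /* Vector length in decimal.  */
  if (*p < '0' || *p > '9')
    return -1;
  char *end;
  unsigned long v = strtoul (p, &end, 10);
  if (v == 0)
    return -1;
  info->vlen = (unsigned) v;
  return 0;
}
```

Feeding it "_ZGVdM8v_foo" yields isa 'd', masked, vlen 8, i.e. the AVX2
masked clone with eight lanes.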

The mask fiddling for vect-simd-clone-16.c is for example

        movl    $1, %edx
..
        vpbroadcastd    %edx, %ymm5
        vmovdqa %ymm5, -144(%rbp)
..
.L6:
..
        vpblendmd       -144(%rbp), %ymm3, %ymm1{%k1}
..
        call    _ZGVdM8v_foo

so inside the loop it's a single instruction, but one involving
memory because of the call ABI.  I can't think of a more efficient
way to do %k ? { 1, .. } : { 0, .. } besides doing the compare that
produces the %k mask twice, the second time in AVX style for the
OMP SIMD call argument (but that's going to be difficult for the
vectorizer, since the mask is not always going to be directly
produced by a compare).

Richard.

Re: [r14-4629 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64
