On Thu, 19 Aug 2021, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Wed, 18 Aug 2021, Richard Sandiford wrote:
> >> I think it would be OK/sensible to use the larger of the index or
> >> result vectors to determine the mask, if that helps.  There just
> >> wasn't any need to make a distinction for SVE, since there the
> >> mask type is determined entirely by the number of elements.
> >
> > I don't think that would help - for a V4SFmode gather with V4DImode
> > indices the x86 ISA expects the AVX2 mask to be V4SImode, that is,
> > the mask corresponds to the data mode, not to the index mode.
> 
> Ah, OK.  So the widest type determines the ISA and then the data
> type determines the mask type within that ISA?

Yes.
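
To make that concrete in intrinsics terms (hypothetical user code, not
what the vectorizer emits): the AVX2 form of that gather takes its mask
in a 128-bit register matching the V4SF data, even though the V4DI
index needs a 256-bit register:

  #include <immintrin.h>

  /* AVX2: gather 4 floats through 4 64-bit indices.  The mask is a
     V4SF-sized vector (the sign bit of each element selects a lane),
     i.e. it follows the data mode, not the index mode.  */
  __m128
  gather4f (const float *base, __m256i idx, __m128 src, __m128 mask)
  {
    return _mm256_mask_i64gather_ps (src, base, idx, mask, 4);
  }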

> > The real issue is that we're asking for a mask mode just using
> > the data mode but not the instruction.  So with V8SFmode data
> > mode and -mavx512f -mno-avx512vl it appears we want an AVX2
> > mask mode but in reality the gather instruction we can use
> > uses EVEX.512 encoding when the index mode is V8DImode
> > and thus is available w/ -mavx512f.
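
(For comparison, the EVEX.512 form is available in intrinsics terms as
hypothetical user code like this, which compiles with just -mavx512f:

  #include <immintrin.h>

  /* AVX512F: gather 8 floats through 8 64-bit indices.  The V8SF data
     and the mask need only a 256-bit vector and an 8-bit k-register,
     but the V8DI index needs a 512-bit register, so the instruction is
     EVEX.512 encoded and does not require AVX512VL.  */
  __m256
  gather8f (const float *base, __m512i idx, __m256 src, __mmask8 k)
  {
    return _mm512_mask_i64gather_ps (src, k, idx, base, 4);
  }

so the instruction itself is fine; it's only the mask mode query that
goes wrong.)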
> >
> > So it appears that we'd need to pass another mode to the
> > get_mask_mode target hook that determines the instruction
> > encoding ...
> >
> > I'm also not sure whether this wouldn't open up a can of worms
> > with the mask computation stmts, which would not know which of
> > the (maybe two) contexts they are used in.  We're fundamentally
> > using two different vector sizes and ISA subsets here :/
> 
> Yeah.
> 
> > Maybe vinfo->vector_mode should fully determine whether
> > we're using EVEX or VEX encoding in the loop and thus we should
> > reject attempts to do VEX encoding vectorization in EVEX mode?
> > I suppose this is how you don't get a mix of SVE and ADVSIMD
> > vectorization in one loop?
> 
> Yeah, that's right.  It means that for -msve-vector-bits=128
> we effectively have two V16QIs: the “real” AdvSIMD V16QI and
> the SVE VNx16QI.  I can't imagine that being popular for x86 :-)
> 
> > But since VEX/EVEX use the same modes (but SVE/ADVSIMD do not)
> > this will get interesting.
> >
> > Oh, and -mavx512bw would also be prone to using AVX2 vectors
> > (a different size again) at the same time.  Maybe the "solution"
> > is to simply not expose the EVEX modes to the vectorizer when
> > there's such a possibility, thus only expose them when all of F,
> > VL and BW are available?  At least that's relevant when thinking
> > about relaxing the vector sizes we can deal with.
> >
> > I've opened this can of worms with
> >
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index 97745a830a2..db938f79c9c 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -3749,13 +3749,15 @@ vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p,
> >  
> >    for (;;)
> >      {
> > -      tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type);
> > -      if (!offset_vectype)
> > -       return false;
> > -
> > +      tree offset_vectype
> > +       = get_related_vectype_for_scalar_type (vinfo->vector_mode, offset_type,
> > +                                              TYPE_VECTOR_SUBPARTS (vectype));
> >
> > to even get the mixed size optabs to be considered.  If we were to
> > expose the real instructions instead we'd have
> > mask_gather_loadv16sfv8di but, as said, the semantics of such an
> > optab entry would need to be defined.  That's the way the current
> > builtin_gather target hook code works, so it seems that would be
> > a working approach, side-stepping the mixed size problems.
> > Here we'd define that excess elements on either the data or the
> > index operand are ignored (on the output operand they are zeroed).
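
As a concrete, untested scalar model of those semantics (lane counts as
in the mask_gather_loadv16sfv8di example; treating masked-off lanes as
zeroed is my assumption here):

  /* Hypothetical reference semantics for mask_gather_loadv16sfv8di:
     a V16SF result gathered through a V8DI index.  Only 8 elements
     can be gathered; excess output elements are zeroed and excess
     elements of the input operands are ignored.  */
  void
  gather_v16sf_v8di (float out[16], const float *base,
                     const long long idx[8], const unsigned char mask[8])
  {
    for (int i = 0; i < 8; i++)
      out[i] = mask[i] ? base[idx[i]] : 0.0f;  /* masked-off: assumed zero */
    for (int i = 8; i < 16; i++)
      out[i] = 0.0f;                           /* excess output: zeroed */
  }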
> 
> It seems a bit ugly to expose such a low-level target-specific detail
> at the gimple level though.  Also, if we expose the v16sf at the gimple
> level then we would need to construct a v16sf vector for scatter stores
> and initialise all the elements (at least until we have a don't care
> VEC_PERM_EXPR encoding).

True - which is why I tried not to do this in the first place ...

> I'm not sure this would solve the problem with the mixture of mask
> computation stmts that you mentioned above.  E.g. what if the input
> to the v16sf scatter store above is a COND_EXPR that shares some mask
> > subexpressions with the scatter store mask?  This COND_EXPR would be a
> v8sf VEC_COND_EXPR and so would presumably use the get_mask_mode
> corresponding to v8sf rather than v16sf.  The mask computation stmts
> would still need to cope with a mixture of AVX512 and AVX2 masks.

But the smaller data mode is only exposed because of the gather
optab - the index mode is large only because the rest of the loop
is vectorized with the larger size.  This is, I think, why the current
scheme using the builtin_gather target hook works - we expose the
larger data vector here.

> Even if x86 doesn't support that combination yet (because it instead
> requires all vector sizes to be equal), it seems like something that
> is likely to be supported in future.

Maybe, yes.

That said, I'm somewhat out of options here, but I'll see whether
working around the mask mode issue is possible in the gather-specific
code.

At least if we ever try to relax the vector size constraint for
loop vectorization we'll have to find a more generic solution to
this problem... (passing an "ISA mode" to the get_mask_mode hook
does sound appealing, but whether that's enough remains to be seen).
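
Something like this, as a purely hypothetical and untested sketch of
such an extended hook for i386 (the two-mode signature and the name
are made up):

  /* Return the mask mode to use for DATA_MODE when the encoding of
     the instruction is determined by WIDEST_MODE, the largest vector
     mode among the instruction's operands.  */
  static opt_machine_mode
  ix86_get_mask_mode_for_insn (machine_mode data_mode,
                               machine_mode widest_mode)
  {
    unsigned vector_size = GET_MODE_SIZE (widest_mode);
    unsigned nunits = GET_MODE_NUNITS (data_mode);

    /* If any operand needs EVEX.512 encoding the instruction is
       AVX512 and takes a k-register mask, one bit per data element.  */
    if (TARGET_AVX512F && vector_size == 64)
      return smallest_int_mode_for_size (nunits);

    /* Otherwise use the AVX/AVX2-style vector mask mirroring the
       data mode, as the existing hook does.  */
    return ix86_get_mask_mode (data_mode);
  }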

Richard.
