Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

Richard Biener Wed, 04 Jun 2025 02:43:21 -0700

On Wed, 4 Jun 2025, Richard Sandiford wrote:

> Sorry for responding late.
> 
> Richard Biener <rguent...@suse.de> writes:
> >> > > > > > OK, so SVE VLS -msve-vector-bits=128 modes are indistinguishable from
> >> > > > > > Adv. SIMD modes by the middle-end?
> >> > > > >
> >> > > > > I believe so, the ACLE types have an annotation on them to lift some of the
> >> > > > > restrictions but the modes are the same.
> 
> Yeah, the modes are different but have the same properties (nunits,
> size, etc.).  But like you say, the tree-level types are different even
> to the middle end, thanks to a target attribute, since the Adv SIMD
> and SVE types have different ABIs.
> 
> >> > > > >
> >> > > > > > Is there a way to distinguish them, say, by cost
> >> > > > > > (target_reg_cost?)?  Since any use of a SVE reg will require a predicate reg?
> >> > > > > >
> >> > > > >
> >> > > > > We do have unpredicated SVE instructions, but yes costing could work.
> >> > > > > Essentially what we're trying to do is find the cheapest mode to perform
> >> > > > > the operation on.
> >> > > > >
> >> > > > > This could work. But how would we incorporate it into the costing? Part of
> >> > > > > the problem is that iterating over similar modes with the same element size
> >> > > > > likely requires some target input, no?  Or are you saying we should only
> >> > > > > iterate over fixed size modes?
> >> > > > >
> >> > > > > Regards,
> >> > > > > Tamar
> >> > > > >
> >> > > > > > I think we miss a critical bit of information in the middle-end here and I'm
> >> > > > > > trying to see what piece of information that actually is.  "find_subvector_type"
> >> > > > > > doesn't seem to be it, it's maybe using that hidden bit of information for
> >> > > > > > one specific use-case but it would be far better to have a way for the target
> >> > > > > > to communicate the missing bit of information in a more generic way?
> >> > > > > > We can then wrap a "find_subvector_type" around that.
> >> > > >
> >> > > > So for this one sth like targetm.mode_requires_predication ()?  But
> >> > > > as Tamar says above this really depends on the operation.  But the
> >> > > > optabs do _not_ expose this requirement (we have non-.COND_ADD for
> >> > > > SVE modes), but you want to take advantage of this difference.
> >> > > > Can we access insn attributes from optab entries?  Could we add
> >> > > > some "standard" attribute noting that an insn requires a predicate?
> >> > > > But of course that likely depends on the alternative?
> 
> Personally, I think it would be a nice model if targets that only have
> conditional instructions could define only the cond_* optab, and the
> target-independent code would provide the all-true predicate where necessary.
> That would directly give target-independent code more information,
> but it would also give target-independent code more work to do.
> Does that seem like a fair trade-off?


I'm not sure, it really depends on how common the situation is.  But
yes, it would reflect restrictions of the target so IMO it would be
a sound change.
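
As a rough illustration of that cond_*-only model, the fallback in
target-independent code could amount to something like the following.
This is a self-contained sketch, not GCC code: TargetOptabs, LoweredOp
and lower_op are hypothetical names, optabs are modelled as plain
strings, and the real cond_* optabs also take data operands and an
"else" value, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the optabs a target provides, e.g. "add" for an
// unconditional add and "cond_add" for the predicated form.
struct TargetOptabs {
  std::set<std::string> names;
  bool has(const std::string &name) const { return names.count(name) != 0; }
};

// Hypothetical lowered operation: the optab chosen and, for a cond_* optab,
// the predicate that target-independent code supplies (one bool per lane).
struct LoweredOp {
  std::string optab;
  std::optional<std::vector<bool>> mask;
};

// Prefer the unconditional optab.  If the target only defines cond_<op>,
// use that with an all-true predicate.  Return nothing if neither exists.
std::optional<LoweredOp> lower_op(const TargetOptabs &target,
                                  const std::string &op, std::size_t nlanes) {
  assert(nlanes > 0);
  if (target.has(op))
    return LoweredOp{op, std::nullopt};
  const std::string cond_name = "cond_" + op;
  if (target.has(cond_name))
    return LoweredOp{cond_name, std::vector<bool>(nlanes, true)};
  return std::nullopt;
}
```

The extra work Richard mentions is visible even in this toy: every
consumer that previously just picked an optab now has to be ready to
materialise a predicate.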

> >> > > >
> >> > >
> >> > > We'd likely also require the mask that would be used, because I think otherwise
> >> > > targetm.mode_requires_predication would be a bit ambiguous for non-flag setting
> >> > > instructions or instructions that don’t do cross lane operations.
> >> > >
> >> > > e.g. SVE has both COND_ADD and ADD. But the key here is that if we know we'll
> >> > > access the bottom 64 or 128 bits we could use an Adv. SIMD ADD.
> >> > 
> >> > But SVE ADD still requires a predicate register (with all lanes enabled),
> >> > no?  That's the whole point of the optimization we're discussing?
> >> > I see the only problem with -msve-vector-bits=N where GET_MODE_SIZE
> >> > is no longer a POLY_INT - otherwise that would be the easy
> >> > way to identify Adv. SIMD vs. SVE and heuristically prefer
> >> > fixed-size modes in the vectorizer when possible (for small known
> >> > niter <= the fixed-size mode number of lanes).  But with
> >> > -msve-vector-bits=128 GET_MODE_SIZE for Adv. SIMD and SVE is equal(?),
> >> > so we need another way to distinguish.  Because even with
> >> > -msve-vector-bits=128 you need the predicate register appropriately
> >> > set up as I understand you are not altering the SVE HW config which
> >> > would be also possible(?), but I'm not sure that would make it
> >> > possible to have a predicate register less ADD instruction.
> >> > 
> >> > What SVE register taking machine instructions do not explicitly/implicitly
> >> > use one of the SVE predicate registers?
> >> 
> >> Many; ADD, for instance, is this one:
> >> https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/ADD--vectors--unpredicated---Add-vectors--unpredicated--
> >> 
> >> And SVE2 added many more. GCC already takes advantage of this and drops
> >> predicates entirely when it can to avoid the dependency on the predicate pipe.
> >> 
> >> Those are actually different instructions not just aliases.
> >
> > I see.  So this clearly is a feature on instructions then, not modes.
> > In fact it might be profitable to use unpredicated add to avoid
> > computing the loop mask for a specific element width completely, even
> > when that would require more operations on a wide SVE implementation.
> >
> > For the patch at hand I would suggest re-posting without a new target
> > hook, ignoring the -msve-vector-bits complication for now and simply
> > keying on GET_MODE_SIZE being POLY_INT, having a vectorizer-local helper
> > like
> >
> > tree
> > get_fixed_size_vectype (tree old_vectype, unsigned nlanes_upper_bound)
> 
> I can see the attraction of that, but it doesn't seem to be conceptually
> a poly-int vs. fixed-size thing.
> 
> If a new hook seems like too much, maybe an alternative would be
> to pass an optional code_helper to TARGET_VECTORIZE_RELATED_MODE?
> That's the hook that we already use for switching between vector modes
> in a single piece of vectorisation.

I'm not against a new hook.  In fact I'd like to keep related_mode
specific here.  I was looking for the rest of the patch (which I
think still needs adjustments) to go forward.
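
For reference, the vectorizer-local get_fixed_size_vectype helper
suggested earlier might look roughly like this.  It is a standalone
sketch under assumptions: VecType is a hypothetical stand-in for a
vector tree type, the explicit candidate list replaces whatever target
query the vectorizer would really use, and "scalable" plays the role of
GET_MODE_SIZE being a POLY_INT, so it inherits the
-msve-vector-bits=128 blind spot discussed above.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a vector type: element width in bits, number of
// lanes (the minimum for scalable types), and whether the lane count is a
// runtime multiple (POLY_INT size, as for SVE) or fixed (as for Adv. SIMD).
struct VecType {
  unsigned elem_bits;
  unsigned nunits;
  bool scalable;
};

// Return the narrowest fixed-size type from CANDIDATES with the same element
// width as OLD_VECTYPE and at least NLANES_UPPER_BOUND lanes.  Return nothing
// if OLD_VECTYPE is already fixed-size or no candidate is wide enough.
// CANDIDATES is an assumption: a real helper would ask the target instead.
std::optional<VecType> get_fixed_size_vectype(const VecType &old_vectype,
                                              unsigned nlanes_upper_bound,
                                              const std::vector<VecType> &candidates) {
  assert(nlanes_upper_bound > 0);
  if (!old_vectype.scalable)
    return std::nullopt;
  std::optional<VecType> best;
  for (const VecType &c : candidates) {
    if (c.scalable || c.elem_bits != old_vectype.elem_bits
        || c.nunits < nlanes_upper_bound)
      continue;
    if (!best || c.nunits < best->nunits)
      best = c;
  }
  return best;
}
```

With 32-bit elements and candidates of 2 and 4 fixed lanes, a known trip
count of at most 3 would pick the 4-lane fixed type, at most 2 the
2-lane one, and 5 or more nothing.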

For a hook it could be like

  new_mode
  lowpart_mode_for_operation (old_mode, code_helper)

but then it's still somewhat odd, since it mixes in the user's
implementation detail (using a "lowpart" bit-field-ref/view-convert),
whereas what we're really asking for is a mode that can be used
unpredicated for code_helper, without any way of knowing whether
old_mode would have needed predication or would merely be more
costly (when the hardware has more lanes and an unpredicated
operation processes all of them).
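
To make the shape of such a hook concrete, here is a hypothetical,
self-contained model of lowpart_mode_for_operation.  Mode and Op are
stand-ins for machine_mode and code_helper, and the table of operations
with an unpredicated Adv. SIMD form is an assumption for illustration
(integer add and subtract yes, integer divide no); it is not the
AArch64 backend's actual logic.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical stand-in for code_helper: a few integer vector operations.
enum class Op { Plus, Minus, Div };

// Hypothetical stand-in for machine_mode: element width in bits, (minimum)
// lane count, and whether the size is a runtime multiple (SVE-style).
struct Mode {
  unsigned elem_bits;
  unsigned nunits;
  bool scalable;
};

// Hypothetical hook: return a fixed-size mode whose lanes can stand in for
// the low lanes of OLD_MODE and in which OP can be done unpredicated, or
// nothing if no such mode exists.
std::optional<Mode> lowpart_mode_for_operation(const Mode &old_mode, Op op) {
  assert(old_mode.elem_bits == 8 || old_mode.elem_bits == 16
         || old_mode.elem_bits == 32 || old_mode.elem_bits == 64);
  // Assumed table: integer add/sub exist unpredicated in Adv. SIMD,
  // integer divide does not.
  static const std::set<Op> advsimd_unpredicated = {Op::Plus, Op::Minus};
  if (!old_mode.scalable || advsimd_unpredicated.count(op) == 0)
    return std::nullopt;
  // A 128-bit Adv. SIMD mode with the same element width.
  return Mode{old_mode.elem_bits, 128 / old_mode.elem_bits, false};
}
```

The sketch shows the oddity: the answer only makes sense to a caller
that already plans to operate on the low part of old_mode.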

The situation with SVE and Adv. SIMD here is really awfully special,
so it's difficult to design sth that doesn't seem very target specific
without another target having the same "issue" :/

I suppose some target pass analyzing actual predicate contents
to relax SVE to Adv. SIMD (maybe similar to RISC-V vsetvl
analysis/placement) is still out of the question?  I do see that
simple combine isn't going to do the trick.
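
The core check such a pass would need is simple once a predicate is a
compile-time constant (for example one set by a PTRUE with a fixed VLn
pattern): do all active lanes lie within the low 64 or 128 bits?  A
minimal sketch, assuming the predicate is given as one bool per element
lane and that results in inactive lanes are never used;
advsimd_width_for_predicate is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// PRED holds one entry per ELEM_BITS-wide lane of the SVE register (lane 0
// is the lowest).  Return 64 or 128 if every active lane lies within the low
// 64 or 128 bits, i.e. a D- or Q-register Adv. SIMD operation would cover
// all of them; return 0 if some active lane lies above bit 128.  An
// all-false predicate trivially returns 64.
unsigned advsimd_width_for_predicate(const std::vector<bool> &pred,
                                     unsigned elem_bits) {
  assert(elem_bits == 8 || elem_bits == 16 || elem_bits == 32 || elem_bits == 64);
  std::size_t active_end_bits = 0;  // bit just past the highest active lane
  for (std::size_t i = 0; i < pred.size(); ++i)
    if (pred[i])
      active_end_bits = (i + 1) * elem_bits;
  if (active_end_bits <= 64)
    return 64;
  if (active_end_bits <= 128)
    return 128;
  return 0;
}
```

For a 256-bit implementation with 32-bit elements (8 lanes), three
leading active lanes end at bit 96 and give 128, two give 64, and any
active top lane gives 0.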

Richard.

> Richard
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
