Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

Richard Sandiford Wed, 04 Jun 2025 01:47:52 -0700

Sorry for responding late.

Richard Biener <rguent...@suse.de> writes:
>> > > > > > OK, so SVE VLS -msve-vector-bits=128 modes are indistinguishable
>> > > > > > from Adv. SIMD modes by the middle-end?
>> > > > >
>> > > > > I believe so, the ACLE types have an annotation on them to lift some
>> > > > > of the restrictions but the modes are the same.


Yeah, the modes are different but have the same properties (nunits,
size, etc.).  But like you say, the tree-level types are different even
to the middle end, thanks to a target attribute, since the Adv. SIMD
and SVE types have different ABIs.

>> > > > >
>> > > > > > Is there a way to distinguish them, say, by cost
>> > > > > > (target_reg_cost?)?  Since any use of a SVE reg will require a 
>> > > > > > predicate reg?
>> > > > > >
>> > > > >
>> > > > > We do have unpredicated SVE instructions, but yes costing could work.
>> > > > > Essentially what we're trying to do is find the cheapest mode to
>> > > > > perform the operation on.
>> > > > >
>> > > > > This could work.. But how would we incorporate it into the costing?
>> > > > > Part of the problem is that to iterate over similar modes with the
>> > > > > same element size likely requires some target input no?  Or are you
>> > > > > saying we should only iterate over fixed size modes?
>> > > > >
>> > > > > Regards,
>> > > > > Tamar
>> > > > >
>> > > > > > I think we miss a critical bit of information in the middle-end
>> > > > > > here and I'm trying to see what piece of information that
>> > > > > > actually is.  "find_subvector_type" doesn't seem to be it, it's
>> > > > > > maybe using that hidden bit of information for one specific
>> > > > > > use-case but it would be far better to have a way for the target
>> > > > > > to communicate the missing bit of information in a more generic
>> > > > > > way?  We can then wrap a "find_subvector_type" around that.
>> > > >
>> > > > So for this one sth like targetm.mode_requires_predication ()?  But
>> > > > as Tamar says above this really depends on the operation.  But the
>> > > > optabs do _not_ expose this requirement (we have non-.COND_ADD for
>> > > > SVE modes), but you want to take advantage of this difference.
>> > > > Can we access insn attributes from optab entries?  Could we add
>> > > > some "standard" attribute noting that an insn requires a predicate?
>> > > > But of course that likely depends on the alternative?

Personally, I think it would be a nice model if targets that only have
conditional instructions could define only the cond_* optab, and the
target-independent code would provide the all-true predicate where necessary.
That would directly give target-independent code more information,
but it would also give target-independent code more work to do.
Does that seem like a fair trade-off?
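
To make the trade-off concrete, here is a rough toy model of that
arrangement.  It is not GCC code: ToyTarget, toy_cond_add and expand_add
are hypothetical names invented for this sketch, and vectors and masks
are plain std::vector objects.  The point is only that, when a target
provides just the conditional operation, the generic code both builds
the all-true predicate and learns that a predicate was needed.

```cpp
// Toy model only: none of these names are GCC APIs.
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

using Vec = std::vector<int>;
using Mask = std::vector<bool>;

// Hypothetical per-target table.  An empty std::function means the
// target does not provide that operation.
struct ToyTarget {
  std::function<Vec(const Vec &, const Vec &)> add;
  std::function<Vec(const Mask &, const Vec &, const Vec &, const Vec &)>
      cond_add;
};

// Predicated add: active lanes get a + b, inactive lanes get els.
Vec toy_cond_add(const Mask &m, const Vec &a, const Vec &b, const Vec &els) {
  assert(m.size() == a.size() && a.size() == b.size() &&
         b.size() == els.size());
  Vec r(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    r[i] = m[i] ? a[i] + b[i] : els[i];
  return r;
}

// "Target-independent" expansion of an unconditional add.  If only the
// conditional form exists, supply an all-true mask and report that a
// predicate was used, so that a caller could cost the predicate setup.
std::optional<Vec> expand_add(const ToyTarget &t, const Vec &a, const Vec &b,
                              bool &used_predicate) {
  used_predicate = false;
  if (t.add)
    return t.add(a, b);
  if (t.cond_add) {
    used_predicate = true;
    Mask all_true(a.size(), true);
    // The fallback operand is irrelevant when every lane is active.
    return t.cond_add(all_true, a, b, a);
  }
  return std::nullopt;  // the target has no way to add vectors
}
```

In this model the extra work for target-independent code is the second
branch of expand_add, and the extra information is the used_predicate
flag.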

>> > > >
>> > >
>> > > We'd likely also require the mask that would be used, because I think
>> > > otherwise targetm.mode_requires_predication would be a bit ambiguous
>> > > for non-flag setting instructions or instructions that don’t do cross
>> > > lane operations.
>> > >
>> > > e.g. SVE has both COND_ADD and ADD. But the key here is that if we know
>> > > we'll access the bottom 64 or 128 bits we could use an Adv. SIMD ADD.
>> > 
>> > But SVE ADD still requires a predicate register (with all lanes enabled),
>> > no?  That's the whole point of the optimization we're discussing?
>> > I see the only problem with -msve-vector-bits=N where GET_MODE_SIZE
>> > is no longer a POLY_INT - otherwise that would be the easy
>> > way to identify Adv. SIMD vs. SVE and heuristically prefer
>> > fixed-size modes in the vectorizer when possible (for small known
>> > niter <= the fixed-size mode number of lanes).  But with
>> > -msve-vector-bits=128 GET_MODE_SIZE for Adv. SIMD and SVE is equal(?),
>> > so we need another way to distinguish.  Because even with
>> > -msve-vector-bits=128 you need the predicate register appropriately
>> > set up as I understand you are not altering the SVE HW config which
>> > would be also possible(?), but I'm not sure that would make it
>> > possible to have a predicate register less ADD instruction.
>> > 
>> > What SVE register taking machine instructions do not explicitly/implicitly
>> > use one of the SVE predicate registers?
>> 
>> Many, ADD for instance is this 
>> https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/ADD--vectors--unpredicated---Add-vectors--unpredicated--
>> 
>> And SVE2 added many more. GCC already takes advantage of this and drops
>> predicates entirely when it can to avoid the dependency on the predicate 
>> pipe.
>> 
>> Those are actually different instructions not just aliases.
>
> I see.  So this clearly is a feature on instructions then, not modes.
> In fact it might be profitable to use unpredicated add to avoid
> computing the loop mask for a specific element width completely even
> when that would require more operation for a wide SVE implementation.
>
> For the patch at hand I would suggest to re-post without a new target 
> hook, ignoring the -msve-vector-bits complication for now and simply
> key on GET_MODE_SIZE being POLY_INT, having a vectorizer local helper
> like
>
> tree
> get_fixed_size_vectype (tree old_vectype, unsigned nlanes_upper_bound)

I can see the attraction of that, but it doesn't seem to be conceptually
a poly-int vs. fixed-size thing.

If a new hook seems like too much, maybe an alternative would be
to pass an optional code_helper to TARGET_VECTORIZE_RELATED_MODE?
That's the hook that we already use for switching between vector modes
in a single piece of vectorisation.
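
As a rough illustration of the shape this could take, here is a toy
model of a related-mode query with an optional operation code.  None of
it is GCC code: ToyMode, ToyCode, sve_needs_predicate and
toy_related_mode are hypothetical names, PredicatedOnlyOp is a
placeholder rather than a real operation, and the policy (switch to
Adv. SIMD only when the SVE form of the operation would need a
predicate and the vector fits in 64 or 128 bits) is an assumption made
for the sketch, not something the patch does.

```cpp
// Toy model only: these types and names are illustrative, not GCC's.
#include <cassert>
#include <optional>

enum class ToyFamily { AdvSimd, Sve };

// Add has an unpredicated SVE form (as noted earlier in the thread).
// PredicatedOnlyOp is a placeholder for any operation that does not.
enum class ToyCode { Add, PredicatedOnlyOp };

struct ToyMode {
  ToyFamily family;
  unsigned elt_bits;
  unsigned nunits;
};

// Assumed per-operation property: does the SVE form need a predicate?
bool sve_needs_predicate(ToyCode code) { return code != ToyCode::Add; }

// Sketch of a related-mode query.  Without a code it stays in the same
// family as vector_mode; with a code it may pick another family.
std::optional<ToyMode> toy_related_mode(
    ToyMode vector_mode, unsigned elt_bits, unsigned nunits,
    std::optional<ToyCode> code = std::nullopt) {
  assert(elt_bits != 0 && nunits != 0);
  ToyFamily family = vector_mode.family;
  const unsigned bits = elt_bits * nunits;
  const bool fits_advsimd = bits == 64 || bits == 128;
  if (code && family == ToyFamily::Sve && sve_needs_predicate(*code) &&
      fits_advsimd)
    family = ToyFamily::AdvSimd;
  if (family == ToyFamily::AdvSimd && !fits_advsimd)
    return std::nullopt;  // no Adv. SIMD vector of that size
  return ToyMode{family, elt_bits, nunits};
}
```

Callers that pass no code see the same answer as before, which is the
attraction of making the argument optional.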

Richard
