Re: [PATCH 03/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions

Spencer Abson Mon, 16 Jun 2025 03:53:40 -0700

On Tue, Jun 10, 2025 at 07:43:20PM +0100, Richard Sandiford wrote:
> Spencer Abson <spencer.ab...@arm.com> writes:
> > On Mon, Jun 09, 2025 at 02:48:58PM +0100, Richard Sandiford wrote:
> >> Spencer Abson <spencer.ab...@arm.com> writes:
> >> > On Thu, Jun 05, 2025 at 09:24:27PM +0100, Richard Sandiford wrote:
> >> >> Spencer Abson <spencer.ab...@arm.com> writes:
> >> >> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c
> >> >> > new file mode 100644
> >> >> > index 00000000000..8f69232f2cf
> >> >> > --- /dev/null
> >> >> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c
> >> >> > @@ -0,0 +1,47 @@
> >> >> > +/* { dg-do compile } */
> >> >> > +/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=2048 -fno-trapping-math" } */
> >> >> 
> >> >> The =2048 is ok, but do you need it for these autovectorisation tests?
> >> >> If vectorisation is treated as not profitable without it, then perhaps
> >> >> we could switch to Tamar's -mmax-vectorization, once that's in.
> >> >
> >> > This isn't needed to make vectorization profitable, but rather to
> >> > make partial vector modes the reliably obvious choice - and hopefully
> >> > one that isn't affected by future cost model changes.  With =2048
> >> > and COUNT, each loop should be fully-unrolled into a single unpacked 
> >> > operation (plus setup and return).
> >> >
> >> > For me, this was much more flexible than using builtin vector types,
> >> > and easier to reason about.  Maybe that's just me though!  I can try
> >> > something else if it would be preferred.
> >> 
> >> I don't really agree about the "easier to reason about" bit: IMO,
> >> builtin vector types are the most direct and obvious way of testing
> >> things with fixed-length vectors, for the cases that they can handle
> >> directly.  But I agree that vectorisation is more flexible, in that
> >> it can deal with cases that fixed-length builtin vectors can't yet
> >> handle directly.
> >> 
> >> My main concern was that the tests didn't seem to have much coverage
> >> of normal VLA codegen.  If the aim is predictable costing, it might
> >> be enough to use -moverride=sve_width=2048 instead of
> >> -msve-vector-bits=2048.
> >
> > I see - yeah, -moverride=sve_width=2048 is enough.
> >
> > How about we use builtin vectors wherever possible, and fall back
> > to the current approach (but replacing -msve-vector-bits with
> > -moverride=sve_width) everywhere else?
> >
> > Alternatively, if we'd like to focus on VLA codegen, I could
> > just replace -msve-vector-bits with -moverride=sve_width throughout
> > the series.
> 
> I don't think there's any need to go back and change the way the tests
> are written.  Just replacing -msve-vector-bits with -moverride=sve_width
> for the vectoriser-based tests sounds good.
>


Hi,

> I see - yeah, -moverride=sve_width=2048 is enough.

This was a bit of an oversight on my part, sorry.  Testing these changes
in the order they are written, using VLA codegen, is quite difficult in
practice: testing each element size/container size pair requires
coercing the vectorizer into choosing a specific VF, that choice can
change even as the series itself evolves, and getting it often requires
more tuning than just -moverride=sve_width.

I'm a little worried about pushing potentially flaky tests.  If I
can't easily follow the reasoning of the cost-model/vectorizer, would
you mind if I stick to the original fixed-length format?
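For concreteness, here is a rough sketch of the fixed-length format I
mean.  The function name and the int32_t -> double pair are illustrative
only, not copied from the series; the point is the arithmetic behind
COUNT: a 2048-bit vector holds 2048 / 64 = 32 64-bit containers, so a
32-iteration loop should become a single operation on int32_t values
held in 64-bit containers.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=2048 -fno-trapping-math" } */

#include <stdint.h>

/* Must match the -msve-vector-bits value above.  */
#define VECTOR_BITS 2048

/* One vector of 64-bit containers holds VECTOR_BITS / 64 elements, so a
   loop with exactly this trip count should be fully unrolled into one
   unpacked operation (plus setup and return).  */
#define COUNT (VECTOR_BITS / 64)

_Static_assert (COUNT == 32, "2048-bit vectors hold 32 64-bit containers");

/* Hypothetical test function: conditional int32_t -> double conversion.  */
void
cond_cvtf_s32_f64 (double *restrict out, const int32_t *restrict a,
                   const double *restrict fallback,
                   const int64_t *restrict pred)
{
  for (int i = 0; i < COUNT; i++)
    out[i] = pred[i] ? (double) a[i] : fallback[i];
}
```

Because the trip count is tied to the vector length at compile time,
the VF doesn't depend on the cost model's preferences.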

Where the VF choice does seem obvious enough, I think I ought to make
sure that nothing silently fails by checking for each extending load
pattern, e.g.

/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d,} n } } */
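A rough sketch of what that could look like for a VLA-style test using
-moverride=sve_width=2048.  The macro and function names are
hypothetical, and the expected count of 2 assumes one LD1W into .d
containers per function; I haven't checked it against real output.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=2048 -fno-trapping-math" } */

#include <stdint.h>

/* Hypothetical tests: conditional 32-bit integer -> double conversions.
   If the vectorizer keeps the 32-bit inputs in 64-bit containers, each
   input load should be an extending LD1W into .d elements.  */
#define TEST_CVTF(NAME, ITYPE)                                         \
  void                                                                 \
  NAME (double *restrict out, const ITYPE *restrict a,                 \
        const double *restrict fallback, const int64_t *restrict pred, \
        int n)                                                         \
  {                                                                    \
    for (int i = 0; i < n; i++)                                        \
      out[i] = pred[i] ? (double) a[i] : fallback[i];                  \
  }

TEST_CVTF (cond_cvtf_s32_f64, int32_t)
TEST_CVTF (cond_cvtf_u32_f64, uint32_t)

/* If the vectorizer picks a different VF (for example, packed 32-bit
   inputs followed by unpacks), this count changes and the test fails
   rather than passing without exercising the unpacked patterns.  */
/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d,} 2 } } */
```

The load check acts as a guard on the container choice, so the
conversion scans elsewhere in the file can't pass vacuously.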

Thanks,
Spencer
