On Mon, Jun 2, 2025 at 5:53 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > In the comment trail for PR119966, I'd said that the validate_subreg > condition: > > /* The outer size must be ordered wrt the register size, otherwise > we wouldn't know at compile time how many registers the outer > mode occupies. */ > if (!ordered_p (osize, regsize)) > return false; > > "is also potentially relevant" for paradoxical subregs. But I'd > forgotten an important caveat. If the inner size is smaller than > a register, we know that the inner value will only occupy a single > register. Although the paradoxical subreg might extend that single > register to multiple registers by padding with undefined bits, > the register size that matters for the extension is: > > REGMODE_NATURAL_SIZE (omode) > > rather than regsize's: > > REGMODE_NATURAL_SIZE (imode) > > The ordered check is still relevant if the inner value spans > multiple registers. > > Enabling the check above for paradoxical subregs led to an ICE in the > testcase, where we tried to generate a VNx4QI paradoxical subreg of a > QI scalar. This was previously allowed, and AFAIK worked correctly. > > The patch doesn't have the effect of relaxing the condition for > non-paradoxical subregs, since: > > known_le (osize, isize) && known_le (isize, regsize) > => known_le (osize, regsize) > => ordered_p (osize, regsize) > > So even before the patch for PR119966, the condition only existed for > the maybe_gt (isize, regsize) case. > > The term "block" used in the comment is taken from the rtl.texi > documentation of subregs. > > Tested on aarch64-linux-gnu. OK to install?
OK. Thanks, Richard. > Richard > > > gcc/ > PR rtl-optimization/120447 > * emit-rtl.cc (validate_subreg): Restrict ordered_p test > between osize and regsize to cases where the inner value > occupies multiple blocks. > > gcc/testsuite/ > PR rtl-optimization/120447 > * gcc.dg/pr120447.c: New test. > --- > gcc/emit-rtl.cc | 9 +++++---- > gcc/testsuite/gcc.dg/pr120447.c | 24 ++++++++++++++++++++++++ > 2 files changed, 29 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/pr120447.c > > diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc > index 3f453cda67e..50e3bfcb777 100644 > --- a/gcc/emit-rtl.cc > +++ b/gcc/emit-rtl.cc > @@ -998,10 +998,11 @@ validate_subreg (machine_mode omode, machine_mode imode, > && known_le (osize, isize)) > return false; > > - /* The outer size must be ordered wrt the register size, otherwise > - we wouldn't know at compile time how many registers the outer > - mode occupies. */ > - if (!ordered_p (osize, regsize)) > + /* If ISIZE is greater than REGSIZE, the inner value is split into blocks > + of size REGSIZE. The outer size must then be ordered wrt REGSIZE, > + otherwise we wouldn't know at compile time how many blocks the > + outer mode occupies. */ > + if (maybe_gt (isize, regsize) && !ordered_p (osize, regsize)) > return false; > > /* For normal pseudo registers, we want most of the same checks. Namely: > diff --git a/gcc/testsuite/gcc.dg/pr120447.c b/gcc/testsuite/gcc.dg/pr120447.c > new file mode 100644 > index 00000000000..bd51f9b174d > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr120447.c > @@ -0,0 +1,24 @@ > +/* { dg-options "-Ofast" } */ > +/* { dg-additional-options "-mcpu=neoverse-v2" { target aarch64*-*-* } } */ > + > +char g; > +long h; > +typedef struct { > + void *data; > +} i; > +i* a; > +void b(i *j, char *p2); > +void c(char *d) { > + d = d ? " and " : " or "; > + b(a, d); > +} > +void b(i *j, char *p2) { > + h = __builtin_strlen(p2); > + while (g) > + ; > + int *k = j->data; > + char *l = p2, *m = p2 + h; > + l += 4; > + while (l < m) > + *k++ = *l++; > +} > -- > 2.43.0 >