Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]

Richard Biener Tue, 24 Jun 2025 04:05:54 -0700

On Tue, 24 Jun 2025, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Tue, 24 Jun 2025, Richard Sandiford wrote:
> >
> >> Tamar Christina <tamar.christ...@arm.com> writes:
> >> > store_bit_field_1 has an optimization where if a target is not a memory 
> >> > operand
> >> > and the entire value is being set from something larger we can just wrap 
> >> > a
> >> > subreg around the source and emit a move.
> >> >
> >> > For vector constructors this is however problematic because the subreg 
> >> > means
> >> > that the expansion of the constructor won't happen through vec_init 
> >> > anymore.
> >> >
> >> > Complicated constructors which aren't natively supported by targets then 
> >> > ICE as
> >> > they wouldn't have been expanded so recog fails.
> >> >
> >> > This patch blocks the optimization on non-constant vector constructors. 
> >> > Or non-uniform
> >> > non-constant vectors. I allowed constant vectors because if I read the 
> >> > code right
> >> > simplify-rtx should be able to perform the simplification of pulling out 
> >> > the element
> >> > or merging the constant values.  There are several testcases in 
> >> > aarch64-sve-pcs.exp
> >> > that test this as well. I allowed uniform non-constant vectors because 
> >> > they
> >> > would be folded into a vec_select later on.
> >> >
> >> > Note that codegen is quite horrible, for what should only be an lsr.  
> >> > But I'll
> >> > address that separately so that this patch is backportable.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> >> > -m32, -m64 and no issues.
> >> >
> >> > Ok for master? and GCC 15, 14, 13?
> >> 
> >> I was discussing this Alex off-list last week, and the fix we talked
> >> about there was:
> >> 
> >> diff --git a/gcc/explow.cc b/gcc/explow.cc
> >> index 7799a98053b..8b138f54f75 100644
> >> --- a/gcc/explow.cc
> >> +++ b/gcc/explow.cc
> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op,
> >>          machine_mode innermode, poly_uint64 byte)
> >>  {
> >>    rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
> >> -  if (x)
> >> +  if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x))))
> >>      return x;
> >>  
> >>    auto *start = get_last_insn ();
> >> 
> >> The justification is that force_subreg is somewhat like a "subreg
> >> version of force_operand", and so should try to avoid returning
> >> subregs that force_operand would have replaced.  The force_operand
> >> code I mean is:
> >
> > Yeah, in particular CONSTANT_P isn't sth documented as valid as
> > subreg operands, only registers (and memory) are.  But isn't this
> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG
> > of a non-REG/MEM?
> 
> I don't think the documentation is correct/up-to-date.  subreg is
> de facto used as a general operation, and for example there are
> patterns like:
> 
> (define_insn ""
>   [(set (match_operand:QI 0 "general_operand_dst" 
> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
>         (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" 
> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
>                                 (const_int 16)) 3))
>    (clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r"))
>    (clobber (reg:CC CC_REG))]
>   ""
>   "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0"
>   [(set_attr "length" "10")])


I see.  Is the subreg for such define_insn generated by the middle-end
though?

> (from h8300).  This is also why simplify_gen_subreg has:
> 
>   if (GET_CODE (op) == SUBREG
>       || GET_CODE (op) == CONCAT
>       || GET_MODE (op) == VOIDmode)
>     return NULL_RTX;
> 
>   if (MODE_COMPOSITE_P (outermode)
>       && (CONST_SCALAR_INT_P (op)
>         || CONST_DOUBLE_AS_FLOAT_P (op)
>         || CONST_FIXED_P (op)
>         || GET_CODE (op) == CONST_VECTOR))
>     return NULL_RTX;
> 
> rather than the !REG_P (op) && !MEM_P (op) that the documentation
> would imply.

So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC
we don't seem to ever legitmize constants wrapped in a SUBREG, so
we shouldn't generate a SUBREG of a constant (in the middle-end)?

Richard.

> 
> To take another example that I happened to be working on today,
> LRA eliminates registers recursively, so if we have:
> 
>   (subreg:SI (reg:DI R) 0)
> 
> for an eliminable register R, we end up with:
> 
>   (subreg:SI (plus:DI (reg:DI R') (const_int N)) 0)
> 
> I wouldn't object to trying to change this, but
> 
> (a) if we do, we should also restrict it to just REGs, not REGs and MEMs
> 
> (b) it would be far too invasive to backport
> 
> (c) it might be an ongoing whack-a-mole project, a bit like the recent
>     tightening of validate_subreg
> 
> But I agree it would make things cleaner in some ways.
> 
> Thanks,
> Richard
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]

Reply via email to