On Tue, 24 Jun 2025, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Tue, 24 Jun 2025, Richard Sandiford wrote: > > > >> Tamar Christina <tamar.christ...@arm.com> writes: > >> > store_bit_field_1 has an optimization where if a target is not a memory > >> > operand > >> > and the entire value is being set from something larger we can just wrap > >> > a > >> > subreg around the source and emit a move. > >> > > >> > For vector constructors this is however problematic because the subreg > >> > means > >> > that the expansion of the constructor won't happen through vec_init > >> > anymore. > >> > > >> > Complicated constructors which aren't natively supported by targets then > >> > ICE as > >> > they wouldn't have been expanded so recog fails. > >> > > >> > This patch blocks the optimization on non-constant vector constructors. > >> > Or non-uniform > >> > non-constant vectors. I allowed constant vectors because if I read the > >> > code right > >> > simplify-rtx should be able to perform the simplification of pulling out > >> > the element > >> > or merging the constant values. There are several testcases in > >> > aarch64-sve-pcs.exp > >> > that test this as well. I allowed uniform non-constant vectors because > >> > they > >> > would be folded into a vec_select later on. > >> > > >> > Note that codegen is quite horrible, for what should only be an lsr. > >> > But I'll > >> > address that separately so that this patch is backportable. > >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu, > >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > >> > -m32, -m64 and no issues. > >> > > >> > Ok for master? and GCC 15, 14, 13? > >> > >> I was discussing this Alex off-list last week, and the fix we talked > >> about there was: > >> > >> diff --git a/gcc/explow.cc b/gcc/explow.cc > >> index 7799a98053b..8b138f54f75 100644 > >> --- a/gcc/explow.cc > >> +++ b/gcc/explow.cc > >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, > >> machine_mode innermode, poly_uint64 byte) > >> { > >> rtx x = simplify_gen_subreg (outermode, op, innermode, byte); > >> - if (x) > >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x)))) > >> return x; > >> > >> auto *start = get_last_insn (); > >> > >> The justification is that force_subreg is somewhat like a "subreg > >> version of force_operand", and so should try to avoid returning > >> subregs that force_operand would have replaced. The force_operand > >> code I mean is: > > > > Yeah, in particular CONSTANT_P isn't sth documented as valid as > > subreg operands, only registers (and memory) are. But isn't this > > then a bug in simplify_gen_subreg itself, that it creates a SUBREG > > of a non-REG/MEM? > > I don't think the documentation is correct/up-to-date. subreg is > de facto used as a general operation, and for example there are > patterns like: > > (define_insn "" > [(set (match_operand:QI 0 "general_operand_dst" > "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg") > (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" > "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7") > (const_int 16)) 3)) > (clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r")) > (clobber (reg:CC CC_REG))] > "" > "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0" > [(set_attr "length" "10")])
I see. Is the subreg for such define_insn generated by the middle-end though? > (from h8300). This is also why simplify_gen_subreg has: > > if (GET_CODE (op) == SUBREG > || GET_CODE (op) == CONCAT > || GET_MODE (op) == VOIDmode) > return NULL_RTX; > > if (MODE_COMPOSITE_P (outermode) > && (CONST_SCALAR_INT_P (op) > || CONST_DOUBLE_AS_FLOAT_P (op) > || CONST_FIXED_P (op) > || GET_CODE (op) == CONST_VECTOR)) > return NULL_RTX; > > rather than the !REG_P (op) && !MEM_P (op) that the documentation > would imply. So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC we don't seem to ever legitmize constants wrapped in a SUBREG, so we shouldn't generate a SUBREG of a constant (in the middle-end)? Richard. > > To take another example that I happened to be working on today, > LRA eliminates registers recursively, so if we have: > > (subreg:SI (reg:DI R) 0) > > for an eliminable register R, we end up with: > > (subreg:SI (plus:DI (reg:DI R') (const_int N)) 0) > > I wouldn't object to trying to change this, but > > (a) if we do, we should also restrict it to just REGs, not REGs and MEMs > > (b) it would be far too invasive to backport > > (c) it might be an ongoing whack-a-mole project, a bit like the recent > tightening of validate_subreg > > But I agree it would make things cleaner in some ways. > > Thanks, > Richard > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)