Richard Biener <rguent...@suse.de> writes: > On Tue, 24 Jun 2025, Richard Sandiford wrote: > >> Richard Biener <rguent...@suse.de> writes: >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> > >> >> Tamar Christina <tamar.christ...@arm.com> writes: >> >> > store_bit_field_1 has an optimization where if a target is not a memory >> >> > operand >> >> > and the entire value is being set from something larger we can just >> >> > wrap a >> >> > subreg around the source and emit a move. >> >> > >> >> > For vector constructors this is however problematic because the subreg >> >> > means >> >> > that the expansion of the constructor won't happen through vec_init >> >> > anymore. >> >> > >> >> > Complicated constructors which aren't natively supported by targets >> >> > then ICE as >> >> > they wouldn't have been expanded so recog fails. >> >> > >> >> > This patch blocks the optimization on non-constant vector constructors. >> >> > Or non-uniform >> >> > non-constant vectors. I allowed constant vectors because if I read the >> >> > code right >> >> > simplify-rtx should be able to perform the simplification of pulling >> >> > out the element >> >> > or merging the constant values. There are several testcases in >> >> > aarch64-sve-pcs.exp >> >> > that test this as well. I allowed uniform non-constant vectors because >> >> > they >> >> > would be folded into a vec_select later on. >> >> > >> >> > Note that codegen is quite horrible, for what should only be an lsr. >> >> > But I'll >> >> > address that separately so that this patch is backportable. >> >> > >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu, >> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu >> >> > -m32, -m64 and no issues. >> >> > >> >> > Ok for master? and GCC 15, 14, 13? >> >> >> >> I was discussing this Alex off-list last week, and the fix we talked >> >> about there was: >> >> >> >> diff --git a/gcc/explow.cc b/gcc/explow.cc >> >> index 7799a98053b..8b138f54f75 100644 >> >> --- a/gcc/explow.cc >> >> +++ b/gcc/explow.cc >> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op, >> >> machine_mode innermode, poly_uint64 byte) >> >> { >> >> rtx x = simplify_gen_subreg (outermode, op, innermode, byte); >> >> - if (x) >> >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x)))) >> >> return x; >> >> >> >> auto *start = get_last_insn (); >> >> >> >> The justification is that force_subreg is somewhat like a "subreg >> >> version of force_operand", and so should try to avoid returning >> >> subregs that force_operand would have replaced. The force_operand >> >> code I mean is: >> > >> > Yeah, in particular CONSTANT_P isn't sth documented as valid as >> > subreg operands, only registers (and memory) are. But isn't this >> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG >> > of a non-REG/MEM? >> >> I don't think the documentation is correct/up-to-date. subreg is >> de facto used as a general operation, and for example there are >> patterns like: >> >> (define_insn "" >> [(set (match_operand:QI 0 "general_operand_dst" >> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg") >> (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand" >> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7") >> (const_int 16)) 3)) >> (clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r")) >> (clobber (reg:CC CC_REG))] >> "" >> "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0" >> [(set_attr "length" "10")]) > > I see. Is the subreg for such define_insn generated by the middle-end > though?
I assume it was written to match something that combine could generate. Whether it still does in another question. >> (from h8300). This is also why simplify_gen_subreg has: >> >> if (GET_CODE (op) == SUBREG >> || GET_CODE (op) == CONCAT >> || GET_MODE (op) == VOIDmode) >> return NULL_RTX; >> >> if (MODE_COMPOSITE_P (outermode) >> && (CONST_SCALAR_INT_P (op) >> || CONST_DOUBLE_AS_FLOAT_P (op) >> || CONST_FIXED_P (op) >> || GET_CODE (op) == CONST_VECTOR)) >> return NULL_RTX; >> >> rather than the !REG_P (op) && !MEM_P (op) that the documentation >> would imply. > > So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC > we don't seem to ever legitmize constants wrapped in a SUBREG, so > we shouldn't generate a SUBREG of a constant (in the middle-end)? Hmm, yeah, maybe. I'd originally rejected that because I assumed the MODE_COMPOSITE_P was there for a reason. But looking at the history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f, where the reason given was: As for the simplify_gen_subreg change, I think it would be desirable to just avoid creating SUBREGs of constants on all targets and for all constants, if simplify_immed_subreg simplified, fine, otherwise punt, but as we are late in GCC11 development, the patch instead guards this behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to powerpc{,64,64le} double double long double - and only for the cases where simplify_immed_subreg was called. Richard