Richard Biener <[email protected]> writes:
> On Tue, 24 Jun 2025, Richard Sandiford wrote:
>
>> Richard Biener <[email protected]> writes:
>> > On Tue, 24 Jun 2025, Richard Sandiford wrote:
>> >
>> >> Tamar Christina <[email protected]> writes:
>> >> > store_bit_field_1 has an optimization where if a target is not a memory
>> >> > operand
>> >> > and the entire value is being set from something larger we can just
>> >> > wrap a
>> >> > subreg around the source and emit a move.
>> >> >
>> >> > For vector constructors this is however problematic because the subreg
>> >> > means
>> >> > that the expansion of the constructor won't happen through vec_init
>> >> > anymore.
>> >> >
>> >> > Complicated constructors which aren't natively supported by targets
>> >> > then ICE as
>> >> > they wouldn't have been expanded so recog fails.
>> >> >
>> >> > This patch blocks the optimization on non-constant vector constructors.
>> >> > Or non-uniform
>> >> > non-constant vectors. I allowed constant vectors because if I read the
>> >> > code right
>> >> > simplify-rtx should be able to perform the simplification of pulling
>> >> > out the element
>> >> > or merging the constant values. There are several testcases in
>> >> > aarch64-sve-pcs.exp
>> >> > that test this as well. I allowed uniform non-constant vectors because
>> >> > they
>> >> > would be folded into a vec_select later on.
>> >> >
>> >> > Note that codegen is quite horrible, for what should only be an lsr.
>> >> > But I'll
>> >> > address that separately so that this patch is backportable.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu,
>> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
>> >> > -m32, -m64 and no issues.
>> >> >
>> >> > Ok for master? and GCC 15, 14, 13?
>> >>
>> >> I was discussing this Alex off-list last week, and the fix we talked
>> >> about there was:
>> >>
>> >> diff --git a/gcc/explow.cc b/gcc/explow.cc
>> >> index 7799a98053b..8b138f54f75 100644
>> >> --- a/gcc/explow.cc
>> >> +++ b/gcc/explow.cc
>> >> @@ -753,7 +753,7 @@ force_subreg (machine_mode outermode, rtx op,
>> >> machine_mode innermode, poly_uint64 byte)
>> >> {
>> >> rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
>> >> - if (x)
>> >> + if (x && (!SUBREG_P (x) || REG_P (SUBREG_REG (x))))
>> >> return x;
>> >>
>> >> auto *start = get_last_insn ();
>> >>
>> >> The justification is that force_subreg is somewhat like a "subreg
>> >> version of force_operand", and so should try to avoid returning
>> >> subregs that force_operand would have replaced. The force_operand
>> >> code I mean is:
>> >
>> > Yeah, in particular CONSTANT_P isn't sth documented as valid as
>> > subreg operands, only registers (and memory) are. But isn't this
>> > then a bug in simplify_gen_subreg itself, that it creates a SUBREG
>> > of a non-REG/MEM?
>>
>> I don't think the documentation is correct/up-to-date. subreg is
>> de facto used as a general operation, and for example there are
>> patterns like:
>>
>> (define_insn ""
>> [(set (match_operand:QI 0 "general_operand_dst"
>> "=rm,Za,Zb,Zc,Zd,Ze,Zf,Zh,Zg")
>> (subreg:QI (lshiftrt:SI (match_operand:SI 1 "register_operand"
>> "r,Z0,Z1,Z2,Z3,Z4,Z5,Z6,Z7")
>> (const_int 16)) 3))
>> (clobber (match_scratch:SI 2 "=&r,&r,&r,&r,&r,&r,&r,&r,&r"))
>> (clobber (reg:CC CC_REG))]
>> ""
>> "mov.w\\t%e1,%f2\;mov.b\\t%w2,%R0"
>> [(set_attr "length" "10")])
>
> I see. Is the subreg for such define_insn generated by the middle-end
> though?
I assume it was written to match something that combine could generate.
Whether it still does in another question.
>> (from h8300). This is also why simplify_gen_subreg has:
>>
>> if (GET_CODE (op) == SUBREG
>> || GET_CODE (op) == CONCAT
>> || GET_MODE (op) == VOIDmode)
>> return NULL_RTX;
>>
>> if (MODE_COMPOSITE_P (outermode)
>> && (CONST_SCALAR_INT_P (op)
>> || CONST_DOUBLE_AS_FLOAT_P (op)
>> || CONST_FIXED_P (op)
>> || GET_CODE (op) == CONST_VECTOR))
>> return NULL_RTX;
>>
>> rather than the !REG_P (op) && !MEM_P (op) that the documentation
>> would imply.
>
> So maybe we can drop the MODE_COMPOSITE_P check here, as said on IRC
> we don't seem to ever legitmize constants wrapped in a SUBREG, so
> we shouldn't generate a SUBREG of a constant (in the middle-end)?
Hmm, yeah, maybe. I'd originally rejected that because I assumed
the MODE_COMPOSITE_P was there for a reason. But looking at the
history, the check came from c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f,
where the reason given was:
As for the simplify_gen_subreg change, I think it would be desirable
to just avoid creating SUBREGs of constants on all targets and for all
constants, if simplify_immed_subreg simplified, fine, otherwise punt,
but as we are late in GCC11 development, the patch instead guards this
behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to
powerpc{,64,64le} double double long double - and only for the cases where
simplify_immed_subreg was called.
Richard