Sorry for the slow reply.

Dimitar Dimitrov <dimi...@dinux.eu> writes:
> On Fri, May 16, 2025 at 06:14:30PM +0100, Richard Sandiford wrote:
>> Dimitar Dimitrov <dimi...@dinux.eu> writes:
>> > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
>> > following RTL for pru-unknown-elf:
>> >
>> >   (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])
>> >           (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855])
>> >               (const_int 3 [0x3])))
>> >        (nil))
>> >   ...
>> >   (insn 3961 7067 3962 255 (set (reg:SI 56 r14.b0)
>> >           (zero_extend:SI (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])))
>> >        (nil))
>> >
>> > into:
>> >
>> >   (insn 3961 7067 3962 255 (set (reg:SI 56 r14.b0)
>> >           (and:SI (subreg:SI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855]) 0)
>> >               (const_int 3 [0x3])))
>> >        (nil))
>> >
>> > That caused libbacktrace build to break for pru-unknown-elf.  Register
>> > r0.b1 (regno 1) is not valid for SImode, which validate_subreg failed to
>> > reject.
>> >
>> > Fix by calling HARD_REGNO_MODE_OK to ensure that both inner and outer
>> > modes are valid for the hardware subreg.
>> >
>> > This patch fixes the broken PRU toolchain build.  It leaves only two
>> > test case regressions for PRU, caused by rnreg pass renaming a valid
>> > paradoxical subreg into an invalid one.
>> >   gcc.c-torture/execute/20040709-1.c
>> >   gcc.c-torture/execute/20040709-2.c
>> >
>> >    PR target/119966
>> >
>> > gcc/ChangeLog:
>> >
>> >    * emit-rtl.cc (validate_subreg): Validate inner
>> >    and outer mode for paradoxical hardware subregs.
>> >
>> > Co-authored-by: Andrew Pinski <pins...@gmail.com>
>> > Signed-off-by: Dimitar Dimitrov <dimi...@dinux.eu>
>> > ---
>> >  gcc/emit-rtl.cc | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
>> > index e46b0f9eac4..6c5d9b55508 100644
>> > --- a/gcc/emit-rtl.cc
>> > +++ b/gcc/emit-rtl.cc
>> > @@ -983,6 +983,9 @@ validate_subreg (machine_mode omode, machine_mode 
>> > imode,
>> >        if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
>> >      && GET_MODE_INNER (imode) == omode)
>> >    ;
>> > +      else if (!targetm.hard_regno_mode_ok (regno, imode)
>> > +         || !targetm.hard_regno_mode_ok (regno, omode))
>> > +  return false;
>> 
>> It isn't meaningful to test regno against omode, since that isn't
>> necessarily the register that would be produced by the subreg.
>
> Do you refer to the register renaming pass?  I can't think of another reason
> for the regno of a hardware register in a subreg to be changed.

It's a general property of subregs.  For example, suppose we have
(reg:SI 0) on a 16-bit target.  Then:

  (subreg:HI (reg:SI 0) 2)

folds to (reg:HI 1) rather than (reg:HI 0).  It's therefore register 1
rather than register 0 that should be tested against HImode.

>> ISTR that this is a sensitive part of the codebase.  I think there
>> are/were targets that create unfoldable subregs for argument passing
>> and return.  And I think e500 had unfoldable subregs of FP registers,
>> although that port is gone now.
>
> Could you share what is "unfoldable subreg"?  I could not find this phrase
> anywhere in the source, except in one comment in the i386 port.

Yeah, it was an ad-hoc term, sorry, but...

> Perhaps a subreg of a hardware register is "unfoldable" when the hardware
> register is not valid in the outer mode?  In which case the subreg cannot be
> replaced directly with a hardware register?

...yes, that's what I meant.

>> So I suppose the question is: when given a hard register, should
>> validate_subreg test whether the subreg can be folded to a hard
>> register?  Or is it more relaxed than that?  Do we need different
>> rules before LRA (which could fix up subregs through reloading)
>> and after LRA (where unfoldable subregs stay unfoldable).
>
> My naive answer _was_ that validate_subreg should always perform checks for
> hardware registers.  Now I see it was too naive, because I was not aware of
> the different ways targets use subregs.  Hence this patch should be dropped.
>
> Meanwhile PR119966, which this patch hoped to address, got fixed instead
> with r16-809-gf725d6765373f7.

Ah, nice!

>> If validate_subreg should test whether a subreg of a hard register
>> can be folded to a hard register, the fix would be to use
>> simplify_subreg_regno instead of the current tests.  But it looks
>> like that was deliberately not done.
>
> When validate_subreg was introduced with r0-63800-gbeb72684810c1a,
> simplify_subreg_regno simply did not exit.  The simplify_subreg_regno
> itself was added later with r0-89444-geef302d277ea42.

Ah, yeah, my bad.  I'd forgotten the history of these routines.

>> It might still be worth trying to use simplify_subreg_regno and
>> seeing what breaks.  Any fallaout would at least let us expand
>> the comments to explain the constraints.
>
> I tried simplify_subreg_regno, and some tests regressed for 
> x86_64-pc-linux-gnu:
>
>   check-gcc-c                     trunk               patched     change
>   -------------------------------+-------------------+------------+-----
>   # of expected passes            216431              216358      -73
>   # of unexpected failures        117                 197         +80
>   # of unexpected successes       25                  25          0
>   # of expected failures          1552                1552        0
>   # of unresolved testcases       0                   33          +33
>   # of unsupported tests          3888                3888        0
>
> For reference, here are a few of the newly failing tests:
>  FAIL: gcc.dg/asan/pr82517.c   -O0  (internal compiler error: in gen_reg_rtx, 
> at emit-rtl.cc:1188)
>  FAIL: c-c++-common/cpp/embed-28.c  -Wc++-compat  (internal compiler error: 
> in gen_reg_rtx, at emit-rtl.cc:1188)
>  FAIL: gcc.dg/torture/stackalign/pr16660-3.c   -Os -mforce-drap -fpic 
> (internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1188)
>
> The stack pointer register is explicitly rejected by simplify_subreg_regno.
> So gen_lowpart fails to return a regno for 'sp' in config/i386/i386.md:28561:
>
>   ;; Attempt to always use XOR for zeroing registers (including FP modes).
>   (define_peephole2
>     [(set (match_operand 0 "general_reg_operand")
>           (match_operand 1 "const0_operand"))]
>     "GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD
>      && (! TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>      && peep2_regno_dead_p (0, FLAGS_REG)"
>     [(parallel [(set (match_dup 0) (const_int 0))
>                 (clobber (reg:CC FLAGS_REG))])]
>     "operands[0] = gen_lowpart (word_mode, operands[0]);")
>
> peephole2 tries to generate the following, but gen_lowpart cannot create the
> subreg:
>
>   (insn 29 8 12 2 (parallel [
>               (set (subreg:DI (reg/v:SI 7 sp [ a ]) 0)
>                   (const_int 0 [0]))
>               (clobber (reg:CC 17 flags))
>           ])
>
> ICE backtrace:
>   .../gcc/gcc/testsuite/gcc.dg/asan/pr82517.c:34:1: internal compiler error: 
> in gen_reg_rtx, at emit-rtl.cc:1188
>   0x22d3d1f internal_error(char const*, ...)
>           
> /home/dinux/projects/pru/local-workspace/gcc/gcc/diagnostic-global-context.cc:517
>   0x669157 fancy_abort(char const*, int, char const*)
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/diagnostic.cc:1815
>   0x46df3d gen_reg_rtx(machine_mode)
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/emit-rtl.cc:1188
>   0x946c80 copy_to_reg(rtx_def*)
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/explow.cc:622
>   0xd5c64f gen_lowpart_general(machine_mode, rtx_def*)
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/rtlhooks.cc:56
>   0x1998ca9 gen_split_318(rtx_insn*, rtx_def**)
>           
> /home/dinux/projects/pru/local-workspace/gcc/gcc/config/i386/i386.md:12877
>   0x1d8a2b9 split_insns(rtx_def*, rtx_insn*)
>           
> /home/dinux/projects/pru/local-workspace/gcc/gcc/config/i386/i386.md:20885
>   0x930f8b try_split(rtx_def*, rtx_insn*, int)
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/emit-rtl.cc:3952
>   0xd18ab5 split_insn
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/recog.cc:3479
>   0xd1e877 split_all_insns()
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/recog.cc:3583
>   0xd1e928 execute
>           /home/dinux/projects/pru/local-workspace/gcc/gcc/recog.cc:4507

As an experiment, it would be interesting to see whether subregs of
the stack, frame, and argument pointers are the only exceptions,
in which case we could add a special case.

But if the PR has already been fixed, then I supose there's not much
incentive to rework such a sensitive piece of code. :)

Thanks,
Richard

Reply via email to