On Wed, Jun 18, 2025 at 04:06:14PM +0100, Stafford Horne wrote:
> On Sat, Jun 07, 2025 at 06:53:28PM +0300, Dimitar Dimitrov wrote:
> > On Sat, Jun 07, 2025 at 11:38:46AM +0100, Stafford Horne wrote:
> > > On Fri, Jun 06, 2025 at 09:54:53PM +0100, Stafford Horne wrote:
> > > > On Fri, Jun 06, 2025 at 11:31:00PM +0300, Dimitar Dimitrov wrote:
> > > > > On Fri, Jun 06, 2025 at 06:12:13PM +0100, Stafford Horne wrote:
> > > > > > On Fri, Jun 06, 2025 at 04:41:21PM +0100, Stafford Horne wrote:
> > > > > > > On Fri, Jun 06, 2025 at 04:20:10PM +0100, Richard Sandiford wrote:
> > > > > > > > Stafford Horne <[email protected]> writes:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > This seems to be causing a build regression on the or1k port.
> > > > > > > > >
> > > > > > > > > I have not quite figured it out yet but I have bisected to
> > > > > > > > > this commit.
> > > > > > > > >
> > > > > > > > > The failure is as below, this seems to be caused by the
> > > > > > > > > cstoresi4 instruction produced by
> > > > > > > > > or1k.md. So I think its likely something we are doing funny
> > > > > > > > > in the port.
> > > > > > > > >
> > > > > > > > > Thanks to Jeff for pointing this out. If you have any idea
> > > > > > > > > let me know.
> > > > > > > > >
> > > > > > > > > Log:
> > > > > > > > >
> > > > > > > > > during RTL pass: ce1
> > > > > > > > > dump file: libgcc2.c.286r.ce1
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c: In
> > > > > > > > > function '__muldi3':
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c:538:1:
> > > > > > > > > internal compiler error: in emit_move_multi_word, at
> > > > > > > > > expr.cc:4492
> > > > > > > > > 538 | }
> > > > > > > > > | ^
> > > > > > > > > 0x1b094df internal_error(char const*, ...)
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic-global-context.cc:517
> > > > > > > > > 0x6459c1 fancy_abort(char const*, int, char const*)
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/diagnostic.cc:1815
> > > > > > > > > 0x473a2e emit_move_multi_word
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4492
> > > > > > > > > 0x93fd7d emit_move_insn(rtx_def*, rtx_def*)
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:4746
> > > > > > > > > 0x9175f1 copy_to_reg(rtx_def*)
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/explow.cc:630
> > > > > > > > > 0xd18977 gen_lowpart_general(machine_mode, rtx_def*)
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/rtlhooks.cc:56
> > > > > > > > > 0x9414c7 convert_mode_scalar
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:818
> > > > > > > > > 0x9421a1 convert_modes(machine_mode, machine_mode,
> > > > > > > > > rtx_def*, int)
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/expr.cc:961
> > > > > > > > > 0xc0afa0 prepare_operand(insn_code, rtx_def*, int,
> > > > > > > > > machine_mode, machine_mode, int)
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4637
> > > > > > > > > 0xc0b33a prepare_cmp_insn
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:4538
> > > > > > > > > 0xc0f0dd emit_conditional_move(rtx_def*, rtx_comparison,
> > > > > > > > > rtx_def*, rtx_def*, machine_mode, int)
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/optabs.cc:5098
> > > > > > > > > 0x192cc7b noce_emit_cmove
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:1777
> > > > > > > > > 0x193365b try_emit_cmove_seq
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3410
> > > > > > > > > 0x193365b noce_convert_multiple_sets_1
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3705
> > > > > > > > > 0x1933c71 noce_convert_multiple_sets
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:3496
> > > > > > > > > 0x1938687 noce_process_if_block
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4045
> > > > > > > > > 0x1938687 noce_find_if_block
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4726
> > > > > > > > > 0x1938687 find_if_header
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:4931
> > > > > > > > > 0x1938687 if_convert
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6078
> > > > > > > > > 0x1938d01 rest_of_handle_if_conversion
> > > > > > > > >
> > > > > > > > > /home/shorne/work/gnu-toolchain/gcc/gcc/ifcvt.cc:6143
> > > > > > > > >
> > > > > > > > > -Stafford
> > > > > > > >
> > > > > > > > Can you attach the .i?
> > > > > > >
> > > > > > > Sure please find attached, I can reproduce with the or1k compiler
> > > > > > > using:
> > > > > > >
> > > > > > > $HOME/work/gnu-toolchain/build-gcc/./gcc/cc1 -O2 __muldi3.i
> > > > > > >
> > > > > > > I am still debugging, but I notice when reverting the patch the
> > > > > > > compiler never
> > > > > > > goes to emit_move_multi_word(). I am trying to work backwards as
> > > > > > > to where it goes
> > > > > > > different, it's just taking a bit of time as I haven't debugged
> > > > > > > GCC ICE's for a
> > > > > > > while.
> > > > > >
> > > > > > I made some progress on this. In openRISC we generate cstoresi4
> > > > > > with special
> > > > > > register SR_F (condition flag) bit register.
> > > > > >
> > > > > > This is represented as (reg:BI 34 ?sr_f).
> > > > > >
> > > > > > The RTL produced for cstoresi4 becomes something like:
> > > > > >
> > > > > > // The compare
> > > > > > (insn 49 48 50 (set (reg:BI 34 ?sr_f)
> > > > > > (leu:BI (reg/v:SI 68 [ __x2D.6783 ])
> > > > > > (reg/v:SI 69 [ __x1D.6782 ])))
> > > > > > "/home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c":532:22 -1
> > > > > > (nil)
> > > > > > // The conditional branch
> > > > > > (jump_insn 50 49 0 (set (pc)
> > > > > > (if_then_else (ne (reg:BI 34 ?sr_f)
> > > > > > (const_int 0 [0]))
> > > > > > (label_ref 0)
> > > > > > (pc)))
> > > > > > "/home/shorne/work/gnu-toolchain/gcc/libgcc/libgcc2.c":532:22 -1
> > > > > > (int_list:REG_BR_PROB 536870916 (nil)))
> > > > > >
> > > > > > During ce1 the try_emit_cmove_seq would try to convert (reg:BI 34
> > > > > > ?sr_f) to SI.
> > > > > > Before this patch it was done in gen_lowpart_general by
> > > > > > simplify_context::simplify_gen_subreg.
> > > > > >
> > > > > > The simplity to subreg operation would convert:
> > > > > > (reg:BI 34 ?sr_f)
> > > > > >
> > > > > > To:
> > > > > > (subreg:SI (reg:BI 34 ?sr_f) 0)
> > > > > >
> > > > > > But now the call to validate_subreg is returning false and
> > > > > > simplify_gen_subreg
> > > > > > returns NULL_RTX, this causes gen_lowpart_general to try to do
> > > > > > copy_to_reg,
> > > > > > which fails.
> > > > > >
> > > > > > I need to step away for a bit, but I would next look into why
> > > > > > validate_subreg
> > > > > > is now failing to allow (reg:BI 34 ?sr_f) to be subreg'd.
> > > > >
> > > > > I think that the REG_CAN_CHANGE_MODE_P call in validate_subreg is
> > > > > rejecting the subreg, because or1k_can_change_mode_class rejects the
> > > > > mode change of sr_f from BI to SI.
> > > > >
> > > > Hi Dimitar,
> > > >
> > > > That's right, I noticed the same thing and I am testing out a patch to
> > > > or1k_can_change_mode_class now, it fixes the ICE. But I'll have to
> > > > spend a bit
> > > > more time testing.
> > >
> > > Hi All,
> > >
> > > The patch I have now is as below and it looks to work ok:
> > >
> > > --- a/gcc/config/or1k/or1k.cc
> > > +++ b/gcc/config/or1k/or1k.cc
> > > @@ -1408,8 +1408,9 @@ static bool
> > > or1k_can_change_mode_class (machine_mode from, machine_mode to,
> > > reg_class_t rclass)
> > > {
> > > + /* Allow cnoverting special flags to SI mode subregs. */
> > > if (rclass == FLAG_REGS)
> > > - return from == to;
> > > + return from == to || (from == BImode && to == SImode);
> > > return true;
> > > }
> >
> > Another patch I intend to merge tomorrow will tighten the checks even
> > more: https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685837.html
>
> Hi Dimitar,
>
> Have you merged this yet?
Yes, I merged it today as r16-1565-g2dcc6dbd8a00ca.
>
> > Seeing that or1k_hard_regno_mode_ok forbids SImode for class FLAG_REGS,
> > the validate_subreg may once again start rejecting subregs for or1k.
>
> As you rightfully point out, after that patch the issue returns again. I have
> the following fix which allows convert_mode_scalar to bypass trying to use
> subregs and use "sign extension" instead as a method to move the SR[F] details
> into a register.
>
> diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md
> index 627e40084b3..66e5af10a98 100644
> --- a/gcc/config/or1k/or1k.md
> +++ b/gcc/config/or1k/or1k.md
> @@ -515,6 +515,24 @@
> (ne:SI (reg:BI SR_F_REGNUM) (const_int 0)))]
> "")
>
> +(define_expand "zero_extendbisi2"
> + [(set (match_operand:SI 0 "register_operand" "")
> + (zero_extend:SI (match_operand:BI 1 "register_operand" "")))]
> + ""
> +{
> + emit_insn(gen_sne_sr_f (operands[0]));
> + DONE;
> +})
> +
> +(define_expand "extendbisi2"
> + [(set (match_operand:SI 0 "register_operand" "")
> + (sign_extend:SI (match_operand:BI 1 "register_operand" "")))]
> + ""
> +{
> + emit_insn(gen_sne_sr_f (operands[0]));
> + DONE;
> +})
> +
>
> This fixes the ICE issue, but it's a bit of a hack and it points out that the
> ce1
> pass is not able to perform good optimizations on OpenRISC.
>
> This is because openrisc very early (during expand) converts branch statements
> separate to "set flag" and "branch if flag". If we would maintain the branch
> RTL in the original form a bit longer I think the ce1 pass could have a better
> chance to optimize.
>
> e.g. cbranchsi2 on OpenRISC produces:
>
> (insn 11 10 12 2 (set (reg:BI 34 ?sr_f)
> (le:BI (reg:SI 46)
> (reg:SI 47))) "cmov.c":30:6 -1
> (nil))
> (jump_insn 12 11 13 2 (set (pc)
> (if_then_else (ne (reg:BI 34 ?sr_f)
> (const_int 0 [0]))
> (label_ref:SI 25)
> (pc))) "cmov.c":30:6 77 {*cbranch}
> (int_list:REG_BR_PROB 365072228 (nil))
> -> 25)
>
> The ce1 pass tries to optimize "insn 12".
>
> I think it could do better if we first kept the form, as below, and let later
> split patterns generate the "*sr_f" and "*cbranch" patterns.
>
> (jump_insn (set (pc)
> (if_then_else (le (reg:SI 47)
> (reg: SI 47))
> (label_ref:SI 25)
> (pc)))
> )
>
> I will try this.
When you split, I assume you'll clobber SR_F register. In such case
you'll need to ensure it is not live. Perhaps do the split before
register allocation using a pseudo BImode register?
Regards,
Dimitar
>
> -Stafford