On 8/1/23 12:17, Vineet Gupta wrote:
Hi Jeff,

As discussed this morning, I'm sending over dumps for the optimization of the DF constant -0.0 (PR 110748) [1]. For an rv64gc_zbs build, IRA is undoing the split, which eventually leads to an ICE in the final pass.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
        (const_int [0x8000000000000000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
        (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
        (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 {*movdf_hardfloat_rv64}
     (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large constant is not involved, so IRA does not interfere.

I investigated this some more. IRA's update_equiv_regs () has code identifying potential replacements: a reg that is referenced exactly twice, set once and used once.

          if (REG_N_REFS (regno) == 2
              && (rtx_equal_p (replacement, src)
                  || ! equiv_init_varies_p (src))
              && NONJUMP_INSN_P (insn)
              && equiv_init_movable_p (PATTERN (insn), regno))
            reg_equiv[regno].replace = 1;
        }

And combine_and_move_insns () does the replacement, undoing the split1 above.

As a hack, if I disable this code, the ICE above goes away and we get the right output:

    bseti    a5,zero,63
    sd    a5,0(a0)
    ret

In fact, this is the reason many more split1 results get undone. See the suboptimal codegen for large constants in Andrew Pinski's test case [1]:

  long long f(void)
  {
    unsigned t = 0x1010101;
    long long t1 = t;
    long long t2 = ((unsigned long long)t) << 32;
    asm("":"+r"(t1));
    return t1 | t2;
  }

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620413.html

Again, with the hacked compiler the redundant immediate goes away.

      upstream          |       ira hack         |
    li   a0,0x1010000   |    li   a5,0x1010000   |
    addi a0,a0,0x101    |    addi a5,a5,0x101    |
    li   a5,0x1010000   |    mv   a0,a5          |
    slli a0,a0,32       |    slli a5,a5,32       |
    addi a5,a5,0x101    |    or   a0,a0,a5       |
    or   a0,a5,a0       |    ret                 |
    ret                 |                        |



This code has been there since 2009, when ira.c was created, so it is evidently by design / expected behavior.

I'm wondering (naively) if there is some way to tune this for a given backend. In general the replacement makes sense, but not if it changes the cost (e.g. constants can be embedded freely into x86 insns, but not on RISC-V, where materializing them is costly, and if something was split, it might have been intentional).
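One naive shape such a tweak could take, sketched against the snippet quoted above (not a tested patch; placement and in-scope variable names like `set` are assumptions, though set_src_cost () and COSTS_N_INSNS are the existing rtl cost helpers), would be to refuse the equivalence when rematerializing the source costs more than a single insn:

    /* Sketch only: additionally require that the equivalencing source is
       cheap to rematerialize, so split large constants stay split.  */
    if (REG_N_REFS (regno) == 2
        && (rtx_equal_p (replacement, src)
            || ! equiv_init_varies_p (src))
        && NONJUMP_INSN_P (insn)
        && equiv_init_movable_p (PATTERN (insn), regno)
        /* New: skip sources the backend had to synthesize expensively.  */
        && set_src_cost (src, GET_MODE (SET_DEST (set)), true)
           <= COSTS_N_INSNS (1))
      reg_equiv[regno].replace = 1;

Whether a cost check belongs here, or whether the backend should veto via a hook, is exactly the tuning question above.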

Thx,
-Vineet
