On 8/11/23 17:32, Vineet Gupta wrote:
On 8/1/23 12:17, Vineet Gupta wrote:
Hi Jeff,
As discussed this morning, I'm sending over dumps for the optimization of the DF constant -0.0 (PR 110748) [1].
For an rv64gc_zbs build, IRA undoes the split, which eventually leads to an ICE in the final pass.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
void znd(double *d) { *d = -0.0; }
*split1*
(insn 10 3 11 2 (set (reg:DI 136)
(const_int -9223372036854775808 [0x8000000000000000])) "neg.c":4:5 -1
(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
(subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
*ira*
(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
(const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
{*movdf_hardfloat_rv64}
(expr_list:REG_DEAD (reg:DI 135)
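To make the connection between the two dumps concrete, here is a hedged sketch (the helper name is mine, not from GCC): the integer that insn 10 materializes into reg 136 is exactly the IEEE-754 bit pattern of DF -0.0, i.e. only the sign bit set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name is mine): return the IEEE-754 bit pattern of
   a double -- the same integer that insn 10 above loads into reg 136. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to type-pun */
    return bits;
}
```

Since only bit 63 is set, the split lets a Zbs target build the constant with a single bseti of x0 rather than loading it from the constant pool, which is presumably why the split exists.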
In the working case, the large constant is not involved, so it is not subject to IRA's foul play.
I investigated this some more. IRA's update_equiv_regs () has code identifying potential replacements: a reg that is referenced exactly twice, set once and used once.
if (REG_N_REFS (regno) == 2
&& (rtx_equal_p (replacement, src)
|| ! equiv_init_varies_p (src))
&& NONJUMP_INSN_P (insn)
&& equiv_init_movable_p (PATTERN (insn), regno))
reg_equiv[regno].replace = 1;
}
And combine_and_move_insns () does the replacement, undoing the split1
above.
Right. This is as expected. There was actually similar code that goes
back even before the introduction of IRA -- like to the 80s and 90s.
Conceptually the idea is a value with an equivalence that has a single
set and single use isn't a good use of a hard register. Better to
narrow the live range to a single pair of instructions.
It's not always a good tradeoff. Consider if the equivalence was also a
loop invariant and hoisted out of the loop and register pressure is low.
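A minimal C sketch of that tradeoff (a hypothetical test case, not from the PR): the bit pattern of -0.0 is loop-invariant here, so if the split constant stays in its own pseudo it can be hoisted and built once before the loop; substituting the equivalence back into the store forces it to be re-materialized on every iteration.

```c
/* Hypothetical loop variant of the znd() test case: the -0.0 store is
   loop-invariant, so keeping insn 10's constant in its own pseudo lets
   it be hoisted out of the loop and built just once. */
void znd_loop(double *d, int n)
{
    for (int i = 0; i < n; i++)
        d[i] = -0.0;
}
```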
In fact this is the reason for many more split1 being undone. See the
suboptimal codegen for large const for Andrew Pinski's test case [1]
No doubt. I think it's also a problem with some of Jivan's work.
I'm wondering (naively) if there is some way to tune this for a given backend. In general it would make sense to do the replacement, but not if the cost changes (e.g. constants can be embedded in x86 insns freely, but not on RISC-V, where this is costly; if something was split, it might have been intentional).
I'm not immediately aware of a way to tune.
When it comes to tuning, the top-level question is whether we have any of the info we need at the point where the transformation occurs. The two most obvious pieces here would be loop info and register pressure.
I.e., do we have enough loop structure to know if the def is at a shallower loop nest than the use? There's a reasonable chance we have this information, as my recollection is that this analysis is done fairly early in IRA.
But that means we likely don't have any sense of register pressure at
the points between the def and use. So the most useful metric for
tuning isn't really available.
The one thing that stands out is we don't do this transformation at all
when register pressure sensitive scheduling is enabled. And we really
should be turning that on by default. Our data shows register pressure
sensitive scheduling is about a 6-7% cycle improvement on x264 as it
avoids spilling in those key satd loops.
/* Don't move insns if live range shrinkage or register
pressure-sensitive scheduling were done because it will not
improve allocation but likely worsen insn scheduling. */
if (optimize
&& !flag_live_range_shrinkage
&& !(flag_sched_pressure && flag_schedule_insns))
combine_and_move_insns ();
So you might want to look at register-pressure-sensitive scheduling first. If you go into x264_r from SPECint and look at x264_pixel_satd_8x4, first verify the loops are fully unrolled. If they are, then look for 32-bit loads/stores into the stack. If you have them, then you're spilling and getting crappy performance. Using register-pressure-sensitive scheduling should help significantly.
We've certainly seen that internally. The plan was to submit a patch to
make register pressure sensitive scheduling the default when the
scheduler is enabled. We just haven't pushed on it. If you can verify
that you're seeing spilling as well, then it'd certainly bolster the
argument that register-pressure-sensitive scheduling is desirable.
Jeff