RE: [PATCH][MIPS] P5600 scheduling
Hi Richard, Please refer to the attached patch files. gcc-p5600-noMSA.patch The patch implements P5600 pipeline and scheduling for GP and FPU instructions. gcc-p5600-noMSA-msched-weight.patch The patch implements -msched-weight option. The -msched-weight option: We are using ~650 hot-spot functions from VP9/VP8/H264/MPEG4 codes available to us as a test suite. The default Haifa-scheduler worked well for most of the functions, but excess spilling was observed in cases where register pressure was more than ~20. The -fsched-pressure flag proved good in some cases, but the algorithm focuses more on reducing register pressure. We observed increase in stalls (but less spilling) with the -fsched- pressure option. When the register pressure goes beyond a certain threshold, the -msched-weight option tries to keep it down by promoting certain instructions up in the order. It has been implemented as part of TARGET_SCHED_REORDER target hook (function mips_sched_weight). The change is generic and there is nothing specific to MIPS. When the register pressure goes beyond 15 then an instruction with maximum forward dependency is promoted ahead of the instruction at READY[NREADY-1]. Scheduling of an INSN with maximum forward dependency enables early scheduling of instructions dependent on it. When the register pressure goes beyond 25 and if consumer of the instruction in question (INSN) has higher priority than the instruction at READY[NREADY-1] then INSN is promoted. This chooses an INSN which has a high priority instruction dependent on it. This triggers the scheduling of that consumer instruction as early as possible to free up the registers used by that instruction. Change log: 2014-05-21 Jaydeep Patil Prachi Godbole * config/mips/mips.c (mips_sched_weight): New function. * config/mips/mips.opt: New option -msched-weight. 2014-05-20 Jaydeep Patil Prachi Godbole * config/mips/p5600.md: New file. * config/mips/mips-cpus.def: Added p5600. * config/mips/mips-protos.h (mips_fmadd_bypass): Add prototype. * config/mips/mips.c (mips_fmadd_bypass): New function. * config/mips/mips.h: Added p5600 default settings. * config/mips/mips.md: Included p5600.md. Regards, Jaydeep -Original Message- From: Richard Sandiford [mailto:rdsandif...@googlemail.com] Sent: 20 May 2014 AM 02:37 To: Jaydeep Patil Cc: Rich Fuhler; Matthew Fortune Subject: Re: [PATCH][MIPS] P5600 scheduling Jaydeep Patil writes: > Please refer to the attached patch which implements P5600 pipeline and > scheduling for GP and FPU instructions. > > Target option -msched-pressure has been implemented to reduce register > pressure while keeping the latency intact. Please could you submit the -msched-pressure stuff separately? Also please explain the heuristic in more detail. Is it really MIPS-specific, or should it be done in generic scheduler code? > +;; The address generation queue (AGQ) has AL2, CTISTD and LDSTA pipes > +(define_cpu_unit "p5600_agq, p5600_al2, p5600_ctistd, p5600_ldsta, > + p5600_gpdiv" "p5600_agen_pipe") Please indent to the column after the ". > +;; fadd, fsub > +(define_insn_reservation "fpu_fadd" 4 > + (eq_attr "type" "fadd,fabs,fneg") > + "p5600_fpu_long, p5600_fpu_apu") The insn reservations should have a p5600_ prefix too. > +;; fload > +(define_insn_reservation "fpu_fload" 8 > + (and (eq_attr "type" "fpload,fpidxload") > +(eq_attr "mode" "!TI")) > + "p5600_fpu_long, p5600_fpu_apu") > + > +;; fstore > +(define_insn_reservation "fpu_fstore" 1 > + (and (eq_attr "type" "fpstore,fpidxstore") > +(eq_attr "mode" "!TI")) > + "p5600_fpu_short, p5600_fpu_apu") Checking for just !TI (without TF) seems a bit odd. If you want to test for a single load or store, you should be able to use (eq_attr "insn_count" "1") instead. > +;; call > +(define_insn_reservation "int_call" 2 > + (eq_attr "jal" "indirect,direct") > + "p5600_agq_ctistd") > + Very minor nit, but no blank line at the end of the file. > @@ -13059,6 +13093,32 @@ mips_output_division (const char *division, rtx > *operands) >return s; > } > > +/* Return true if destination of IN_INSN is used as add source in > + OUT_INSN. Both IN_INSN and OUT_INSN are of type fmadd. Example: > + madd.s dst, x, y, z > + madd.s a, dst, b, c */ > + > +bool > +mips_fmadd_bypass (rtx out_insn, rtx in_insn) { > + int dst_reg, src_reg; > + > + if ((recog_memoized (in_insn) < 0) > + || (recog_memoized (out_insn) < 0)) &g
RE: [PATCH][MIPS] P5600 scheduling
Hi Richard, Thanks for the review. Let me get back to you on the results of -fsched-pressure --param sched-pressure-algorithm=2 options. Yes, we start the EBB with pressure 0. Thanks, Jaydeep -Original Message- From: Richard Sandiford [mailto:rdsandif...@googlemail.com] Sent: 22 May 2014 PM 08:23 To: Jaydeep Patil Cc: Rich Fuhler; Matthew Fortune; gcc-patches@gcc.gnu.org Subject: Re: [PATCH][MIPS] P5600 scheduling Hi Jaydeep, Thanks for the write-up and updated patches. I'll try to get to them this weekend. In the meantime... Jaydeep Patil writes: > The -msched-weight option: > We are using ~650 hot-spot functions from VP9/VP8/H264/MPEG4 codes > available to us as a test suite. The default Haifa-scheduler worked > well for most of the functions, but excess spilling was observed in > cases where register pressure was more than ~20. The -fsched-pressure > flag proved good in some cases, but the algorithm focuses more on > reducing register pressure. > We observed increase in stalls (but less spilling) with the -fsched- > pressure option. When the register pressure goes beyond a certain > threshold, the -msched-weight option tries to keep it down by > promoting certain instructions up in the order. It has been > implemented as part of TARGET_SCHED_REORDER target hook (function > mips_sched_weight). The change is generic and there is nothing > specific to MIPS. > > When the register pressure goes beyond 15 then an instruction with > maximum forward dependency is promoted ahead of the instruction at > READY[NREADY-1]. > Scheduling of an INSN with maximum forward dependency enables early > scheduling of instructions dependent on it. > > When the register pressure goes beyond 25 and if consumer of the > instruction in question (INSN) has higher priority than the > instruction at READY[NREADY-1] then INSN is promoted. This chooses an > INSN which has a high priority instruction dependent on it. This > triggers the scheduling of that consumer instruction as early as > possible to free up the registers used by that instruction. Yeah, this sounds similar to what I was seeing for Cortex-A8 with the default -fsched-pressure (which is tuned for and known to work well on x86). Did you try with: -fsched-pressure --param sched-pressure-heuristic=2 ? That isn't advertised in the documentation because we don't want to make it a user-level option that would then need to be supported in future. But if you find that the above works better than plain -fsched-pressure, which is equivalent to: -fsched-pressure --param sched-pressure-heuristic=1 then we could consider making sched-pressure-heuristic=2 the default for MIPS. I'd certainly be interested to know how it compares with -msched-weight. FWIW the write-up of the alternative -fsched-pressure is here: https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html It sounds like the Linaro guys have a performance fix pending, so if running the tests takes a lot of your time or a lot of resources, it might be worth waiting until that's submitted. E.g. one of the things I noticed with the default -fsched-pressure at the time -- not sure whether this has changed since -- is that the pressure at the start of the EBB was based on the total number of live values, including those that are live across a loop but not used in it. So in many cases the starting pressure was too high and so like you say the heuristic was too conservative. That was one of the main things that the alternative heuristic was supposed to help with. It looks like you solve that by starting with a pressure of 0 for each EBB, is that right? That's a bit more aggressive than what I did, since AIUI starting with 0 will ignore loop invariants. On the other hand, being more aggressive (i.e. closer to what you'd get with the default scheduling heuristic) also means that it's more likely to be usable by default. Thanks, Richard
RE: [PATCH][MIPS] P5600 scheduling
Hi Richard, Please refer to the attached patch files. gcc-p5600-noMSA.patch TARGET_P5600 has been removed gcc-p5600-noMSA-msched-weight.patch Per-instruction structure has been created to store both GP and VEC pressures. We are using size of a mode to differentiate GP and VEC registers. Registers of size 16bytes and above are considered as vector registers. Regards, Jaydeep -Original Message- From: Richard Sandiford [mailto:rdsandif...@googlemail.com] Sent: 25 May 2014 PM 04:18 To: Jaydeep Patil Cc: Rich Fuhler; Matthew Fortune; gcc-patches@gcc.gnu.org Subject: Re: [PATCH][MIPS] P5600 scheduling Jaydeep Patil writes: > Hi Richard, > > Please refer to the attached patch files. > > gcc-p5600-noMSA.patch > The patch implements P5600 pipeline and scheduling for GP and FPU > instructions. This patch is OK, thanks. Generally (i.e. when I remember to check) we don't define TARGET_* options until they're needed, so please leave out the TARGET_P5600 definition. > gcc-p5600-noMSA-msched-weight.patch > The patch implements -msched-weight option. > +/* Return 1 if size of REG_MODE matches size of REG_TYPE. */ > +#define GET_REGTYPE_SIZE(REG_MODE, REG_TYPE) \ > + (((REG_TYPE) == GPREG \ > +&& GET_MODE_SIZE ((REG_MODE)) <= GET_MODE_SIZE (DImode)) \ > + || ((REG_TYPE) == VECREG \ > + && GET_MODE_SIZE ((REG_MODE)) > GET_MODE_SIZE (DImode))) Is size the best way of distinguishing between these? I would have thought that SFmode values would be counted against the VEC/FP register file instead. Also, in 32-bit mode a DImode would take up 2 registers rather than 1. Maybe it would be better to count SCALAR_INT_MODE_P modes against GPREG and others against VECREG. That would still mishandle things like float<->integer conversions. We could avoid that by checking the register classes in the instruction constraints, but since this is a heuristic, in practice it might be OK to ignore corner cases like that. Rather than: > + INSN_REGTYPE_WEIGHT (insn, GPREG) > + = find_insn_regtype_weight (insn, GPREG); > + INSN_REGTYPE_WEIGHT (insn, VECREG) > + = find_insn_regtype_weight (insn, VECREG); and: > + if (regtype_weight[GPREG] != NULL) > + CURR_REGTYPE_PRESSURE (GPREG) > + += INSN_REGTYPE_WEIGHT (insn, GPREG); > + if (regtype_weight[VECREG] != NULL) > + CURR_REGTYPE_PRESSURE (VECREG) > + += INSN_REGTYPE_WEIGHT (insn, VECREG); I think it would be more efficient to have a per-insn structure that contains both the GP and VEC pressures and only process each instruction once. > + regtype_weight[GPREG] = (short *) xcalloc (old_max_uid, sizeof > + (short)); regtype_weight[VECREG] = (short *) xcalloc (old_max_uid, > + sizeof (short)); ... = XCNEWVEC (short, old_max_uid); > + FOR_EACH_BB_REVERSE_FN (b, cfun) > +{ > + rtx insn, next_tail, head, tail; > + get_ebb_head_tail (b, b, &head, &tail); > + next_tail = NEXT_INSN (tail); > + for (insn = head; insn != next_tail; insn = NEXT_INSN (insn)) > + { > + /* Handle register life information. */ > + if (!INSN_P (insn)) > + continue; AFAICT this loop doesn't carry any state between instructions, so it should be: FOR_EACH_BB_FN (b, cfun) FOR_BB_INSNS (b, insn) if (NONDEBUG_INSN_P (insn)) { ... } Thanks, Richard gcc-p5600-noMSA.patch Description: gcc-p5600-noMSA.patch gcc-p5600-noMSA-msched-weight.patch Description: gcc-p5600-noMSA-msched-weight.patch