RE: [PATCH][MIPS] P5600 scheduling

2014-05-22 Thread Jaydeep Patil
Hi Richard,

Please refer to the attached patch files.

gcc-p5600-noMSA.patch
  The patch implements P5600 pipeline and scheduling for GP and FPU 
instructions.

gcc-p5600-noMSA-msched-weight.patch
  The patch implements -msched-weight option.

The -msched-weight option:
We are using ~650 hot-spot functions from VP9/VP8/H264/MPEG4 codes 
available to us as a test suite. The default Haifa-scheduler worked 
well for most of the functions, but excess spilling was observed in 
cases where register pressure was more than ~20. The 
-fsched-pressure flag proved good in some cases, but the algorithm 
focuses more on reducing register pressure.
We observed increase in stalls (but less spilling) with the -fsched- 
pressure option. When the register pressure goes beyond a certain 
threshold, the -msched-weight option tries to keep it down by 
promoting certain instructions up in the order. It has been 
implemented as part of TARGET_SCHED_REORDER target hook (function 
mips_sched_weight). The change is generic and there is nothing 
specific to MIPS.

When the register pressure goes beyond 15 then an instruction with 
maximum forward dependency is promoted ahead of the instruction at
READY[NREADY-1].
Scheduling of an INSN with maximum forward dependency enables early 
scheduling of instructions dependent on it.

When the register pressure goes beyond 25 and if consumer of the 
instruction in question (INSN) has higher priority than the 
instruction at READY[NREADY-1] then INSN is promoted. This chooses 
an INSN which has a high priority instruction dependent on it. This 
triggers the scheduling of that consumer instruction as early as 
possible to free up the registers used by that instruction.

Change log:

2014-05-21  Jaydeep Patil  
Prachi Godbole  

* config/mips/mips.c (mips_sched_weight): New function.
* config/mips/mips.opt: New option -msched-weight.

2014-05-20  Jaydeep Patil  
Prachi Godbole  

* config/mips/p5600.md: New file.
* config/mips/mips-cpus.def: Added p5600.
* config/mips/mips-protos.h (mips_fmadd_bypass): Add prototype.
* config/mips/mips.c (mips_fmadd_bypass): New function.
* config/mips/mips.h: Added p5600 default settings.
* config/mips/mips.md: Included p5600.md.

Regards,
Jaydeep

-Original Message-
From: Richard Sandiford [mailto:rdsandif...@googlemail.com] 
Sent: 20 May 2014 AM 02:37
To: Jaydeep Patil
Cc: Rich Fuhler; Matthew Fortune
Subject: Re: [PATCH][MIPS] P5600 scheduling

Jaydeep Patil  writes:
> Please refer to the attached patch which implements P5600 pipeline and 
> scheduling for GP and FPU instructions.
>
> Target option -msched-pressure has been implemented to reduce register 
> pressure while keeping the latency intact.

Please could you submit the -msched-pressure stuff separately?
Also please explain the heuristic in more detail.  Is it really MIPS-specific, 
or should it be done in generic scheduler code?

> +;; The address generation queue (AGQ) has AL2, CTISTD and LDSTA pipes 
> +(define_cpu_unit "p5600_agq, p5600_al2, p5600_ctistd, p5600_ldsta,
> + p5600_gpdiv" "p5600_agen_pipe")

Please indent to the column after the ".

> +;; fadd, fsub
> +(define_insn_reservation "fpu_fadd" 4
> +  (eq_attr "type" "fadd,fabs,fneg")
> +  "p5600_fpu_long, p5600_fpu_apu")

The insn reservations should have a p5600_ prefix too.

> +;; fload
> +(define_insn_reservation "fpu_fload" 8
> +  (and (eq_attr "type" "fpload,fpidxload")
> +(eq_attr "mode" "!TI"))
> +  "p5600_fpu_long, p5600_fpu_apu")
> +
> +;; fstore
> +(define_insn_reservation "fpu_fstore" 1
> +  (and (eq_attr "type" "fpstore,fpidxstore")
> +(eq_attr "mode" "!TI"))
> +  "p5600_fpu_short, p5600_fpu_apu")

Checking for just !TI (without TF) seems a bit odd.  If you want to test for a 
single load or store, you should be able to use (eq_attr "insn_count" "1") 
instead.

> +;; call
> +(define_insn_reservation "int_call" 2
> +  (eq_attr "jal" "indirect,direct")
> +  "p5600_agq_ctistd")
> +

Very minor nit, but no blank line at the end of the file.

> @@ -13059,6 +13093,32 @@ mips_output_division (const char *division, rtx 
> *operands)
>return s;
>  }
>  

> +/* Return true if destination of IN_INSN is used as add source in
> +   OUT_INSN. Both IN_INSN and OUT_INSN are of type fmadd. Example:
> +   madd.s dst, x, y, z
> +   madd.s a, dst, b, c */
> +
> +bool
> +mips_fmadd_bypass (rtx out_insn, rtx in_insn) {
> +  int dst_reg, src_reg;
> +
> +  if ((recog_memoized (in_insn) < 0)
> +   || (recog_memoized (out_insn) < 0))
&g

RE: [PATCH][MIPS] P5600 scheduling

2014-05-22 Thread Jaydeep Patil
Hi Richard,

Thanks for the review.
Let me get back to you on the results of -fsched-pressure --param 
sched-pressure-algorithm=2 options.
Yes, we start the EBB with pressure 0.

Thanks,
Jaydeep

-Original Message-
From: Richard Sandiford [mailto:rdsandif...@googlemail.com] 
Sent: 22 May 2014 PM 08:23
To: Jaydeep Patil
Cc: Rich Fuhler; Matthew Fortune; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][MIPS] P5600 scheduling

Hi Jaydeep,

Thanks for the write-up and updated patches.  I'll try to get to them this 
weekend.  In the meantime...

Jaydeep Patil  writes:
> The -msched-weight option:
> We are using ~650 hot-spot functions from VP9/VP8/H264/MPEG4 codes 
> available to us as a test suite. The default Haifa-scheduler worked 
> well for most of the functions, but excess spilling was observed in 
> cases where register pressure was more than ~20. The -fsched-pressure 
> flag proved good in some cases, but the algorithm focuses more on 
> reducing register pressure.
> We observed increase in stalls (but less spilling) with the -fsched- 
> pressure option. When the register pressure goes beyond a certain 
> threshold, the -msched-weight option tries to keep it down by 
> promoting certain instructions up in the order. It has been 
> implemented as part of TARGET_SCHED_REORDER target hook (function 
> mips_sched_weight). The change is generic and there is nothing 
> specific to MIPS.
>
> When the register pressure goes beyond 15 then an instruction with 
> maximum forward dependency is promoted ahead of the instruction at 
> READY[NREADY-1].
> Scheduling of an INSN with maximum forward dependency enables early 
> scheduling of instructions dependent on it.
>
> When the register pressure goes beyond 25 and if consumer of the 
> instruction in question (INSN) has higher priority than the 
> instruction at READY[NREADY-1] then INSN is promoted. This chooses an 
> INSN which has a high priority instruction dependent on it. This 
> triggers the scheduling of that consumer instruction as early as 
> possible to free up the registers used by that instruction.

Yeah, this sounds similar to what I was seeing for Cortex-A8 with the default 
-fsched-pressure (which is tuned for and known to work well on x86).  Did you 
try with:

  -fsched-pressure --param sched-pressure-heuristic=2

?  That isn't advertised in the documentation because we don't want to make it 
a user-level option that would then need to be supported in future.  But if you 
find that the above works better than plain -fsched-pressure, which is 
equivalent to:

  -fsched-pressure --param sched-pressure-heuristic=1

then we could consider making sched-pressure-heuristic=2 the default for MIPS.  
I'd certainly be interested to know how it compares with -msched-weight.

FWIW the write-up of the alternative -fsched-pressure is here:

  https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

It sounds like the Linaro guys have a performance fix pending, so if running 
the tests takes a lot of your time or a lot of resources, it might be worth 
waiting until that's submitted.

E.g. one of the things I noticed with the default -fsched-pressure at the time 
-- not sure whether this has changed since -- is that the pressure at the start 
of the EBB was based on the total number of live values, including those that 
are live across a loop but not used in it.  So in many cases the starting 
pressure was too high and so like you say the heuristic was too conservative.  
That was one of the main things that the alternative heuristic was supposed to 
help with.

It looks like you solve that by starting with a pressure of 0 for each EBB, is 
that right?  That's a bit more aggressive than what I did, since AIUI starting 
with 0 will ignore loop invariants.
On the other hand, being more aggressive (i.e. closer to what you'd get with 
the default scheduling heuristic) also means that it's more likely to be usable 
by default.

Thanks,
Richard


RE: [PATCH][MIPS] P5600 scheduling

2014-05-27 Thread Jaydeep Patil
Hi Richard,

Please refer to the attached patch files.

gcc-p5600-noMSA.patch 
TARGET_P5600 has been removed

gcc-p5600-noMSA-msched-weight.patch
Per-instruction structure has been created to store both GP and VEC pressures. 
We are using size of a mode to differentiate GP and VEC registers. Registers of 
size 16bytes and above are considered as vector registers.

Regards,
Jaydeep

-Original Message-
From: Richard Sandiford [mailto:rdsandif...@googlemail.com] 
Sent: 25 May 2014 PM 04:18
To: Jaydeep Patil
Cc: Rich Fuhler; Matthew Fortune; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][MIPS] P5600 scheduling

Jaydeep Patil  writes:
> Hi Richard,
>
> Please refer to the attached patch files.
>
> gcc-p5600-noMSA.patch
>   The patch implements P5600 pipeline and scheduling for GP and FPU 
> instructions.

This patch is OK, thanks.  Generally (i.e. when I remember to check) we don't 
define TARGET_* options until they're needed, so please leave out the 
TARGET_P5600 definition.

> gcc-p5600-noMSA-msched-weight.patch
>   The patch implements -msched-weight option.

> +/* Return 1 if size of REG_MODE matches size of REG_TYPE.  */
> +#define GET_REGTYPE_SIZE(REG_MODE, REG_TYPE) \
> +  (((REG_TYPE) == GPREG  \
> +&& GET_MODE_SIZE ((REG_MODE)) <= GET_MODE_SIZE (DImode)) \
> +   || ((REG_TYPE) == VECREG  \
> +   && GET_MODE_SIZE ((REG_MODE)) > GET_MODE_SIZE (DImode)))

Is size the best way of distinguishing between these?  I would have thought 
that SFmode values would be counted against the VEC/FP register file instead.  
Also, in 32-bit mode a DImode would take up 2 registers rather than 1.  Maybe 
it would be better to count SCALAR_INT_MODE_P modes against GPREG and others 
against VECREG.

That would still mishandle things like float<->integer conversions.
We could avoid that by checking the register classes in the instruction 
constraints, but since this is a heuristic, in practice it might be OK to 
ignore corner cases like that.

Rather than:

> +   INSN_REGTYPE_WEIGHT (insn, GPREG)
> + = find_insn_regtype_weight (insn, GPREG);
> +   INSN_REGTYPE_WEIGHT (insn, VECREG)
> + = find_insn_regtype_weight (insn, VECREG);

and:

> +   if (regtype_weight[GPREG] != NULL)
> + CURR_REGTYPE_PRESSURE (GPREG)
> +   += INSN_REGTYPE_WEIGHT (insn, GPREG);
> +   if (regtype_weight[VECREG] != NULL)
> + CURR_REGTYPE_PRESSURE (VECREG)
> +   += INSN_REGTYPE_WEIGHT (insn, VECREG);

I think it would be more efficient to have a per-insn structure that contains 
both the GP and VEC pressures and only process each instruction once.

> +  regtype_weight[GPREG] = (short *) xcalloc (old_max_uid, sizeof 
> + (short));  regtype_weight[VECREG] = (short *) xcalloc (old_max_uid, 
> + sizeof (short));

... = XCNEWVEC (short, old_max_uid);

> +  FOR_EACH_BB_REVERSE_FN (b, cfun)
> +{
> +  rtx insn, next_tail, head, tail;
> +  get_ebb_head_tail (b, b, &head, &tail);
> +  next_tail = NEXT_INSN (tail);
> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
> + {
> +   /* Handle register life information.  */
> +   if (!INSN_P (insn))
> + continue;

AFAICT this loop doesn't carry any state between instructions, so it should be:

  FOR_EACH_BB_FN (b, cfun)
FOR_BB_INSNS (b, insn)
  if (NONDEBUG_INSN_P (insn))
{
  ...
}

Thanks,
Richard


gcc-p5600-noMSA.patch
Description: gcc-p5600-noMSA.patch


gcc-p5600-noMSA-msched-weight.patch
Description: gcc-p5600-noMSA-msched-weight.patch