I'd like to add a new experimental optimization to the trunk. This
optimization was discussed at the RA BOF of this summer's GNU Cauldron.
It relieves register pressure through live-range shrinkage. It
is implemented on top of the scheduler and uses the register-pressure
insn scheduling infrastructure. By rearranging insns we shorten pseudo
live ranges and increase the chance that they are assigned a hard
register.
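To illustrate the idea (this is not code from the patch), here is a
minimal sketch that counts how many pseudos are live at once for two
orderings of the same seven insns. The names toy_insn, max_pressure,
toy_original and toy_shrunk are hypothetical, and liveness is simplified:
a pseudo is live from its definition up to its last use in the block.

```c
#include <assert.h>

#define TOY_MAX_REGS 16

/* Hypothetical toy insn: one optional definition, up to two uses.  */
struct toy_insn {
  int def;     /* pseudo defined by the insn, or -1 */
  int use[2];  /* pseudos read by the insn, -1 for an empty slot */
};

/* Return the maximum number of pseudos live after any insn.  */
int
max_pressure (const struct toy_insn *insns, int n)
{
  int def_at[TOY_MAX_REGS], last_use[TOY_MAX_REGS];
  int i, k, r, max = 0;

  for (r = 0; r < TOY_MAX_REGS; r++)
    def_at[r] = last_use[r] = -1;

  /* Record where each pseudo is defined and last used.  */
  for (i = 0; i < n; i++)
    {
      for (k = 0; k < 2; k++)
        if (insns[i].use[k] >= 0)
          {
            assert (insns[i].use[k] < TOY_MAX_REGS);
            last_use[insns[i].use[k]] = i;
          }
      if (insns[i].def >= 0)
        {
          assert (insns[i].def < TOY_MAX_REGS);
          def_at[insns[i].def] = i;
        }
    }

  /* A pseudo is live after insn I if defined at or before I and
     still used later.  */
  for (i = 0; i < n; i++)
    {
      int live = 0;
      for (r = 0; r < TOY_MAX_REGS; r++)
        if (def_at[r] >= 0 && def_at[r] <= i && i < last_use[r])
          live++;
      if (live > max)
        max = live;
    }
  return max;
}

/* Four loads first, then the adds: pseudos 0..3 are all live at once.  */
const struct toy_insn toy_original[7] = {
  {0, {-1, -1}}, {1, {-1, -1}}, {2, {-1, -1}}, {3, {-1, -1}},
  {4, {0, 1}}, {5, {2, 3}}, {6, {4, 5}}
};

/* Same insns, with the first add moved up next to its loads.  */
const struct toy_insn toy_shrunk[7] = {
  {0, {-1, -1}}, {1, {-1, -1}}, {4, {0, 1}},
  {2, {-1, -1}}, {3, {-1, -1}}, {5, {2, 3}}, {6, {4, 5}}
};
```

Under these assumptions max_pressure (toy_original, 7) is 4 and
max_pressure (toy_shrunk, 7) is 3: moving the add closer to its operands
ends two live ranges before the next two begin.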
The code looks pretty simple, but there is a lot of work behind
this patch. I've tried about ten different versions of this code
(different heuristics for the two currently existing register-pressure
algorithms).
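The heuristic in this version of the patch is the new branch in
rank_for_schedule. A minimal standalone sketch of that ordering rule
follows. It assumes a hypothetical struct toy_cand whose cost_change
field stands in for INSN_REG_PRESSURE_EXCESS_COST_CHANGE and whose luid
field stands in for INSN_LUID. Unlike the real ready list, this toy puts
the preferred candidate first, so the signs are not those of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ready-list candidate.  */
struct toy_cand {
  int luid;         /* position in the original insn order */
  int cost_change;  /* change in register pressure excess cost */
};

/* If at least one candidate decreases the pressure excess cost and
   the costs differ, prefer the smaller cost change.  Otherwise keep
   the original insn order, which makes the sort stable as long as
   luids are distinct.  */
int
toy_rank (const void *x, const void *y)
{
  const struct toy_cand *a = x;
  const struct toy_cand *b = y;

  if ((a->cost_change < 0 || b->cost_change < 0)
      && a->cost_change != b->cost_change)
    return a->cost_change - b->cost_change;
  return a->luid - b->luid;
}

/* Sort N candidates so that the preferred one comes first.  */
void
toy_sort (struct toy_cand *cands, size_t n)
{
  assert (cands != NULL || n == 0);
  qsort (cands, n, sizeof *cands, toy_rank);
}
```

With this rule, candidates that reduce pressure are pulled forward in
order of their benefit, and everything else stays in its original order,
which keeps insn movement to a minimum.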
I think it is *up to target maintainers* to decide whether or not to
use this optimization for their targets. I'd recommend using it at
least for x86/x86-64. I think any OOO processor with a small or
moderate register file that does not use the 1st insn scheduling
might benefit from it too.
On SPEC2000 for x86/x86-64 (I use a Haswell processor, -O3 with
general tuning), using the optimization results in smaller code size
on average (for floating point and integer benchmarks in 32- and
64-bit mode). The improvement is more visible for SPECFP2000 (although
I see the same improvement on x86-64 SPECInt2000, it might be
attributed mostly to mcf benchmark instability). It is about 0.5% for
32-bit and 64-bit mode. This is understandable, as the optimization has
more opportunities to improve the code on longer BBs. Unlike
other heuristic optimizations, I don't see any significantly worse
performance. It gives practically the same or better performance (a
few benchmarks improved by 1% or more, up to 3%).
The single but significant drawback is additional compilation time
(4%-6%), as the 1st insn scheduling pass is quite expensive. So I'd
recommend that target maintainers switch it on only for -Ofast. If
somebody finds that the optimization works on processors which use
1st insn scheduling by default (which I slightly doubt), we could
improve the compilation time by reusing data for this optimization and
the 1st insn scheduling.
Any comments, questions, thoughts are appreciated.
2013-11-05 Vladimir Makarov <[email protected]>
* tree-pass.h (make_pass_live_range_shrinkage): New external.
* timevar.def (TV_LIVE_RANGE_SHRINKAGE): New.
* sched-rgn.c (gate_handle_live_range_shrinkage): New.
(rest_of_handle_live_range_shrinkage): Ditto.
(class pass_live_range_shrinkage): Ditto.
(pass_data_live_range_shrinkage): Ditto.
(make_pass_live_range_shrinkage): Ditto.
* sched-int.h (sched_relief_p): New external.
* sched-deps.c (create_insn_reg_set): Make void return value.
* passes.def: Add pass_live_range_shrinkage.
* ira.c (update_equiv_regs): Don't move if
flag_live_range_shrinkage.
* haifa-sched.c (sched_relief_p): New.
(rank_for_schedule): Add code for pressure relief through live
range shrinkage.
(schedule_insn): Print more debug info.
(sched_init): Setup SCHED_PRESSURE_WEIGHTED for pressure relief
through live range shrinkage.
* doc/invoke.texi (-flive-range-shrinkage): New.
* common.opt (flive-range-shrinkage): New.
Index: common.opt
===================================================================
--- common.opt (revision 204380)
+++ common.opt (working copy)
@@ -1738,6 +1738,10 @@ fregmove
Common Ignore
Does nothing. Preserved for backward compatibility.
+flive-range-shrinkage
+Common Report Var(flag_live_range_shrinkage) Init(0) Optimization
+Relief of register pressure through live range shrinkage
+
frename-registers
Common Report Var(flag_rename_registers) Init(2) Optimization
Perform a register renaming optimization pass
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 204216)
+++ doc/invoke.texi (working copy)
@@ -378,7 +378,7 @@ Objective-C and Objective-C++ Dialects}.
-fira-region=@var{region} -fira-hoist-pressure @gol
-fira-loop-pressure -fno-ira-share-save-slots @gol
-fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
--fivopts -fkeep-inline-functions -fkeep-static-consts @gol
+-fivopts -fkeep-inline-functions -fkeep-static-consts -flive-range-shrinkage @gol
-floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize @gol
-floop-parallelize-all -flto -flto-compression-level @gol
-flto-partition=@var{alg} -flto-report -flto-report-wpa -fmerge-all-constants @gol
@@ -7257,6 +7257,12 @@ registers after writing to their lower 3
Enabled for x86 at levels @option{-O2}, @option{-O3}.
+@item -flive-range-shrinkage
+@opindex flive-range-shrinkage
+Attempt to decrease register pressure through register live range
+shrinkage. This is helpful for fast processors with small or moderate
+size register sets.
+
@item -fira-algorithm=@var{algorithm}
Use the specified coloring algorithm for the integrated register
allocator. The @var{algorithm} argument can be @samp{priority}, which
Index: haifa-sched.c
===================================================================
--- haifa-sched.c (revision 204380)
+++ haifa-sched.c (working copy)
@@ -150,6 +150,9 @@ along with GCC; see the file COPYING3.
#ifdef INSN_SCHEDULING
+/* True if we do pressure relief pass. */
+bool sched_relief_p;
+
/* issue_rate is the number of insns that can be scheduled in the same
machine cycle. It can be defined in the config/mach/mach.h file,
otherwise we set it to 1. */
@@ -2519,7 +2522,7 @@ rank_for_schedule (const void *x, const
rtx tmp = *(const rtx *) y;
rtx tmp2 = *(const rtx *) x;
int tmp_class, tmp2_class;
- int val, priority_val, info_val;
+ int val, priority_val, info_val, diff;
if (MAY_HAVE_DEBUG_INSNS)
{
@@ -2532,6 +2535,20 @@ rank_for_schedule (const void *x, const
return INSN_LUID (tmp) - INSN_LUID (tmp2);
}
+ if (sched_relief_p)
+ {
+ gcc_assert (sched_pressure == SCHED_PRESSURE_WEIGHTED);
+ if ((INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp) < 0
+ || INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2) < 0)
+ && (diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp)
+ - INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2))) != 0)
+ return diff;
+ /* Sort by INSN_LUID (original insn order), so that we make the
+ sort stable. This minimizes instruction movement, thus
+ minimizing sched's effect on debugging and cross-jumping. */
+ return INSN_LUID (tmp) - INSN_LUID (tmp2);
+ }
+
/* The insn in a schedule group should be issued the first. */
if (flag_sched_group_heuristic &&
SCHED_GROUP_P (tmp) != SCHED_GROUP_P (tmp2))
@@ -2542,8 +2559,6 @@ rank_for_schedule (const void *x, const
if (sched_pressure != SCHED_PRESSURE_NONE)
{
- int diff;
-
/* Prefer insn whose scheduling results in the smallest register
pressure excess. */
if ((diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp)
@@ -3731,7 +3746,10 @@ schedule_insn (rtx insn)
{
fputc (':', sched_dump);
for (i = 0; i < ira_pressure_classes_num; i++)
- fprintf (sched_dump, "%s%+d(%d)",
+ fprintf (sched_dump, "%s%s%+d(%d)",
+ scheduled_insns.length () > 1
+ && INSN_LUID (insn)
+ < INSN_LUID (scheduled_insns[scheduled_insns.length () - 2]) ? "@" : "",
reg_class_names[ira_pressure_classes[i]],
pressure_info[i].set_increase, pressure_info[i].change);
}
@@ -6578,9 +6596,11 @@ sched_init (void)
if (targetm.sched.dispatch (NULL_RTX, IS_DISPATCH_ON))
targetm.sched.dispatch_do (NULL_RTX, DISPATCH_INIT);
- if (flag_sched_pressure
- && !reload_completed
- && common_sched_info->sched_pass_id == SCHED_RGN_PASS)
+ if (sched_relief_p)
+ sched_pressure = SCHED_PRESSURE_WEIGHTED;
+ else if (flag_sched_pressure
+ && !reload_completed
+ && common_sched_info->sched_pass_id == SCHED_RGN_PASS)
sched_pressure = ((enum sched_pressure_algorithm)
PARAM_VALUE (PARAM_SCHED_PRESSURE_ALGORITHM));
else
Index: ira.c
===================================================================
--- ira.c (revision 204380)
+++ ira.c (working copy)
@@ -3794,11 +3794,12 @@ update_equiv_regs (void)
if (! reg_equiv[regno].replace
|| reg_equiv[regno].loop_depth < loop_depth
- /* There is no sense to move insns if we did
- register pressure-sensitive scheduling was
- done because it will not improve allocation
- but worsen insn schedule with a big
- probability. */
+ /* There is no sense to move insns if live range
+ shrinkage or register pressure-sensitive
+ scheduling were done because it will not
+ improve allocation but worsen insn schedule
+ with a big probability. */
+ || flag_live_range_shrinkage
|| (flag_sched_pressure && flag_schedule_insns))
continue;
Index: passes.def
===================================================================
--- passes.def (revision 204380)
+++ passes.def (working copy)
@@ -358,6 +358,7 @@ along with GCC; see the file COPYING3.
NEXT_PASS (pass_mode_switching);
NEXT_PASS (pass_match_asm_constraints);
NEXT_PASS (pass_sms);
+ NEXT_PASS (pass_live_range_shrinkage);
NEXT_PASS (pass_sched);
NEXT_PASS (pass_ira);
NEXT_PASS (pass_reload);
Index: sched-deps.c
===================================================================
--- sched-deps.c (revision 204380)
+++ sched-deps.c (working copy)
@@ -1938,8 +1938,8 @@ create_insn_reg_use (int regno, rtx insn
return use;
}
-/* Allocate and return reg_set_data structure for REGNO and INSN. */
-static struct reg_set_data *
+/* Allocate reg_set_data structure for REGNO and INSN. */
+static void
create_insn_reg_set (int regno, rtx insn)
{
struct reg_set_data *set;
@@ -1949,7 +1949,6 @@ create_insn_reg_set (int regno, rtx insn
set->insn = insn;
set->next_insn_set = INSN_REG_SET_LIST (insn);
INSN_REG_SET_LIST (insn) = set;
- return set;
}
/* Set up insn register uses for INSN and dependency context DEPS. */
Index: sched-int.h
===================================================================
--- sched-int.h (revision 204380)
+++ sched-int.h (working copy)
@@ -28,6 +28,9 @@ along with GCC; see the file COPYING3.
#include "df.h"
#include "basic-block.h"
+/* True if we do pressure relief pass. */
+extern bool sched_relief_p;
+
/* Identificator of a scheduler pass. */
enum sched_pass_id_t { SCHED_PASS_UNKNOWN, SCHED_RGN_PASS, SCHED_EBB_PASS,
SCHED_SMS_PASS, SCHED_SEL_PASS };
Index: sched-rgn.c
===================================================================
--- sched-rgn.c (revision 204380)
+++ sched-rgn.c (working copy)
@@ -3565,6 +3565,33 @@ advance_target_bb (basic_block bb, rtx i
#endif
static bool
+gate_handle_live_range_shrinkage (void)
+{
+#ifdef INSN_SCHEDULING
+ return flag_live_range_shrinkage;
+#else
+ return 0;
+#endif
+}
+
+/* Run instruction scheduler. */
+static unsigned int
+rest_of_handle_live_range_shrinkage (void)
+{
+#ifdef INSN_SCHEDULING
+ int saved;
+
+ sched_relief_p = true;
+ saved = flag_schedule_interblock;
+ flag_schedule_interblock = false;
+ schedule_insns ();
+ flag_schedule_interblock = saved;
+ sched_relief_p = false;
+#endif
+ return 0;
+}
+
+static bool
gate_handle_sched (void)
{
#ifdef INSN_SCHEDULING
@@ -3621,6 +3648,45 @@ rest_of_handle_sched2 (void)
}
namespace {
+
+const pass_data pass_data_live_range_shrinkage =
+{
+ RTL_PASS, /* type */
+ "lr_shrinkage", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ true, /* has_gate */
+ true, /* has_execute */
+ TV_LIVE_RANGE_SHRINKAGE, /* tv_id */
+ 0, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ ( TODO_df_finish | TODO_verify_rtl_sharing
+ | TODO_verify_flow ), /* todo_flags_finish */
+};
+
+class pass_live_range_shrinkage : public rtl_opt_pass
+{
+public:
+ pass_live_range_shrinkage(gcc::context *ctxt)
+ : rtl_opt_pass(pass_data_live_range_shrinkage, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ bool gate () { return gate_handle_live_range_shrinkage (); }
+ unsigned int execute () { return rest_of_handle_live_range_shrinkage (); }
+
+}; // class pass_live_range_shrinkage
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_live_range_shrinkage (gcc::context *ctxt)
+{
+ return new pass_live_range_shrinkage (ctxt);
+}
+
+namespace {
const pass_data pass_data_sched =
{
Index: timevar.def
===================================================================
--- timevar.def (revision 204380)
+++ timevar.def (working copy)
@@ -223,6 +223,7 @@ DEFTIMEVAR (TV_COMBINE , "
DEFTIMEVAR (TV_IFCVT , "if-conversion")
DEFTIMEVAR (TV_MODE_SWITCH , "mode switching")
DEFTIMEVAR (TV_SMS , "sms modulo scheduling")
+DEFTIMEVAR (TV_LIVE_RANGE_SHRINKAGE , "live range shrinkage")
DEFTIMEVAR (TV_SCHED , "scheduling")
DEFTIMEVAR (TV_IRA , "integrated RA")
DEFTIMEVAR (TV_LRA , "LRA non-specific")
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 204380)
+++ tree-pass.h (working copy)
@@ -530,6 +530,7 @@ extern rtl_opt_pass *make_pass_lower_sub
extern rtl_opt_pass *make_pass_mode_switching (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_sms (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_sched (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_live_range_shrinkage (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_ira (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_reload (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_clean_state (gcc::context *ctxt);