I'd like to add a new experimental optimization to the trunk. This optimization was discussed at the RA BoF at this summer's GNU Cauldron.
It is register pressure relief through live-range shrinkage. It is implemented on top of the scheduler and uses the register-pressure insn scheduling infrastructure. By rearranging insns, we shorten pseudo live ranges and increase the chance that the pseudos will be assigned hard registers. The code looks pretty simple, but there was a lot of work behind this patch: I've tried about ten different versions of it (different heuristics for the two currently existing register-pressure algorithms).

I think it is *up to target maintainers* to decide whether to use this optimization for their targets. I'd recommend using it at least for x86/x86-64. I think any OOO processor with a small or moderate register file which does not use the 1st insn scheduling pass might benefit from it too.

On SPEC2000 for x86/x86-64 (I use a Haswell processor, -O3 with general tuning), using the optimization results in smaller code size on average (for floating point and integer benchmarks in 32- and 64-bit mode). The improvement is more visible on SPECFP2000 (I see the same improvement on x86-64 SPECInt2000, but it might be attributed mostly to mcf benchmark instability). It is about 0.5% in both 32-bit and 64-bit mode. This is understandable, as the optimization has more opportunities to improve the code on longer BBs. Unlike other heuristic optimizations, I don't see any significant performance degradation: performance is practically the same or better (a few benchmarks improved by 1% or more, up to 3%).

The single but significant drawback is additional compilation time (4%-6%), as the 1st insn scheduling pass is quite expensive. So I'd recommend that target maintainers switch it on only for -Ofast. If somebody finds that the optimization works on processors which use the 1st insn scheduling by default (which I slightly doubt), we could reduce the compilation time by reusing data between this optimization and the 1st insn scheduling.

Any comments, questions, thoughts are appreciated.
2013-11-05  Vladimir Makarov  <vmaka...@redhat.com>

	* tree-pass.h (make_pass_live_range_shrinkage): New external.
	* timevar.def (TV_LIVE_RANGE_SHRINKAGE): New.
	* sched-rgn.c (gate_handle_live_range_shrinkage): New.
	(rest_of_handle_live_range_shrinkage): Ditto.
	(class pass_live_range_shrinkage): Ditto.
	(pass_data_live_range_shrinkage): Ditto.
	(make_pass_live_range_shrinkage): Ditto.
	* sched-int.h (sched_relief_p): New external.
	* sched-deps.c (create_insn_reg_set): Make void return value.
	* passes.def: Add pass_live_range_shrinkage.
	* ira.c (update_equiv_regs): Don't move if
	flag_live_range_shrinkage.
	* haifa-sched.c (sched_relief_p): New.
	(rank_for_schedule): Add code for pressure relief through live
	range shrinkage.
	(schedule_insn): Print more debug info.
	(sched_init): Setup SCHED_PRESSURE_WEIGHTED for pressure relief
	through live range shrinkage.
	* doc/invoke.texi (-flive-range-shrinkage): New.
	* common.opt (flive-range-shrinkage): New.
Index: common.opt
===================================================================
--- common.opt (revision 204380)
+++ common.opt (working copy)
@@ -1738,6 +1738,10 @@ fregmove
 Common Ignore
 Does nothing. Preserved for backward compatibility.
 
+flive-range-shrinkage
+Common Report Var(flag_live_range_shrinkage) Init(0) Optimization
+Relief of register pressure through live range shrinkage
+
 frename-registers
 Common Report Var(flag_rename_registers) Init(2) Optimization
 Perform a register renaming optimization pass
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 204216)
+++ doc/invoke.texi (working copy)
@@ -378,7 +378,7 @@ Objective-C and Objective-C++ Dialects}.
 -fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
--fivopts -fkeep-inline-functions -fkeep-static-consts @gol
+-fivopts -fkeep-inline-functions -fkeep-static-consts -flive-range-shrinkage @gol
 -floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize @gol
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -flto-report-wpa -fmerge-all-constants @gol
@@ -7257,6 +7257,12 @@ registers after writing to their lower 3
 
 Enabled for x86 at levels @option{-O2}, @option{-O3}.
 
+@item -flive-range-shrinkage
+@opindex flive-range-shrinkage
+Attempt to decrease register pressure through register live range
+shrinkage.  This is helpful for fast processors with small or moderate
+size register sets.
+
 @item -fira-algorithm=@var{algorithm}
 Use the specified coloring algorithm for the integrated register
 allocator.  The @var{algorithm} argument can be @samp{priority}, which
Index: haifa-sched.c
===================================================================
--- haifa-sched.c (revision 204380)
+++ haifa-sched.c (working copy)
@@ -150,6 +150,9 @@ along with GCC; see the file COPYING3.
 
 #ifdef INSN_SCHEDULING
 
+/* True if we do pressure relief pass.  */
+bool sched_relief_p;
+
 /* issue_rate is the number of insns that can be scheduled in the same
    machine cycle.  It can be defined in the config/mach/mach.h file,
    otherwise we set it to 1.  */
@@ -2519,7 +2522,7 @@ rank_for_schedule (const void *x, const
   rtx tmp = *(const rtx *) y;
   rtx tmp2 = *(const rtx *) x;
   int tmp_class, tmp2_class;
-  int val, priority_val, info_val;
+  int val, priority_val, info_val, diff;
 
   if (MAY_HAVE_DEBUG_INSNS)
     {
@@ -2532,6 +2535,20 @@ rank_for_schedule (const void *x, const
         return INSN_LUID (tmp) - INSN_LUID (tmp2);
     }
 
+  if (sched_relief_p)
+    {
+      gcc_assert (sched_pressure == SCHED_PRESSURE_WEIGHTED);
+      if ((INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp) < 0
+           || INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2) < 0)
+          && (diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp)
+                      - INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2))) != 0)
+        return diff;
+      /* Sort by INSN_LUID (original insn order), so that we make the
+         sort stable.  This minimizes instruction movement, thus
+         minimizing sched's effect on debugging and cross-jumping.  */
+      return INSN_LUID (tmp) - INSN_LUID (tmp2);
+    }
+
   /* The insn in a schedule group should be issued the first.  */
   if (flag_sched_group_heuristic &&
       SCHED_GROUP_P (tmp) != SCHED_GROUP_P (tmp2))
@@ -2542,8 +2559,6 @@ rank_for_schedule (const void *x, const
 
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
-      int diff;
-
       /* Prefer insn whose scheduling results in the smallest register
          pressure excess.  */
       if ((diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp)
@@ -3731,7 +3746,10 @@ schedule_insn (rtx insn)
         {
           fputc (':', sched_dump);
           for (i = 0; i < ira_pressure_classes_num; i++)
-            fprintf (sched_dump, "%s%+d(%d)",
+            fprintf (sched_dump, "%s%s%+d(%d)",
+                     scheduled_insns.length () > 1
+                     && INSN_LUID (insn)
+                     < INSN_LUID (scheduled_insns[scheduled_insns.length () - 2]) ? "@" : "",
                      reg_class_names[ira_pressure_classes[i]],
                      pressure_info[i].set_increase, pressure_info[i].change);
         }
@@ -6578,9 +6596,11 @@ sched_init (void)
   if (targetm.sched.dispatch (NULL_RTX, IS_DISPATCH_ON))
     targetm.sched.dispatch_do (NULL_RTX, DISPATCH_INIT);
 
-  if (flag_sched_pressure
-      && !reload_completed
-      && common_sched_info->sched_pass_id == SCHED_RGN_PASS)
+  if (sched_relief_p)
+    sched_pressure = SCHED_PRESSURE_WEIGHTED;
+  else if (flag_sched_pressure
+           && !reload_completed
+           && common_sched_info->sched_pass_id == SCHED_RGN_PASS)
     sched_pressure = ((enum sched_pressure_algorithm)
                       PARAM_VALUE (PARAM_SCHED_PRESSURE_ALGORITHM));
   else
Index: ira.c
===================================================================
--- ira.c (revision 204380)
+++ ira.c (working copy)
@@ -3794,11 +3794,12 @@ update_equiv_regs (void)
 
           if (! reg_equiv[regno].replace
               || reg_equiv[regno].loop_depth < loop_depth
-              /* There is no sense to move insns if we did
-                 register pressure-sensitive scheduling was
-                 done because it will not improve allocation
-                 but worsen insn schedule with a big
-                 probability.  */
+              /* There is no sense to move insns if live range
+                 shrinkage or register pressure-sensitive
+                 scheduling were done because it will not
+                 improve allocation but worsen insn schedule
+                 with a big probability.  */
+              || flag_live_range_shrinkage
               || (flag_sched_pressure && flag_schedule_insns))
             continue;
 
Index: passes.def
===================================================================
--- passes.def (revision 204380)
+++ passes.def (working copy)
@@ -358,6 +358,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_mode_switching);
       NEXT_PASS (pass_match_asm_constraints);
       NEXT_PASS (pass_sms);
+      NEXT_PASS (pass_live_range_shrinkage);
       NEXT_PASS (pass_sched);
       NEXT_PASS (pass_ira);
       NEXT_PASS (pass_reload);
Index: sched-deps.c
===================================================================
--- sched-deps.c (revision 204380)
+++ sched-deps.c (working copy)
@@ -1938,8 +1938,8 @@ create_insn_reg_use (int regno, rtx insn
   return use;
 }
 
-/* Allocate and return reg_set_data structure for REGNO and INSN.  */
-static struct reg_set_data *
+/* Allocate reg_set_data structure for REGNO and INSN.  */
+static void
 create_insn_reg_set (int regno, rtx insn)
 {
   struct reg_set_data *set;
@@ -1949,7 +1949,6 @@ create_insn_reg_set (int regno, rtx insn
   set->insn = insn;
   set->next_insn_set = INSN_REG_SET_LIST (insn);
   INSN_REG_SET_LIST (insn) = set;
-  return set;
 }
 
 /* Set up insn register uses for INSN and dependency context DEPS.  */
Index: sched-int.h
===================================================================
--- sched-int.h (revision 204380)
+++ sched-int.h (working copy)
@@ -28,6 +28,9 @@ along with GCC; see the file COPYING3.
 #include "df.h"
 #include "basic-block.h"
 
+/* True if we do pressure relief pass.  */
+extern bool sched_relief_p;
+
 /* Identificator of a scheduler pass.  */
 enum sched_pass_id_t { SCHED_PASS_UNKNOWN, SCHED_RGN_PASS, SCHED_EBB_PASS,
                        SCHED_SMS_PASS, SCHED_SEL_PASS };
Index: sched-rgn.c
===================================================================
--- sched-rgn.c (revision 204380)
+++ sched-rgn.c (working copy)
@@ -3565,6 +3565,33 @@ advance_target_bb (basic_block bb, rtx i
 #endif
 
 static bool
+gate_handle_live_range_shrinkage (void)
+{
+#ifdef INSN_SCHEDULING
+  return flag_live_range_shrinkage;
+#else
+  return 0;
+#endif
+}
+
+/* Run instruction scheduler.  */
+static unsigned int
+rest_of_handle_live_range_shrinkage (void)
+{
+#ifdef INSN_SCHEDULING
+  int saved;
+
+  sched_relief_p = true;
+  saved = flag_schedule_interblock;
+  flag_schedule_interblock = false;
+  schedule_insns ();
+  flag_schedule_interblock = saved;
+  sched_relief_p = false;
+#endif
+  return 0;
+}
+
+static bool
 gate_handle_sched (void)
 {
 #ifdef INSN_SCHEDULING
@@ -3621,6 +3648,45 @@ rest_of_handle_sched2 (void)
 }
 
 namespace {
+
+const pass_data pass_data_live_range_shrinkage =
+{
+  RTL_PASS, /* type */
+  "lr_shrinkage", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  true, /* has_gate */
+  true, /* has_execute */
+  TV_LIVE_RANGE_SHRINKAGE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_df_finish | TODO_verify_rtl_sharing
+    | TODO_verify_flow ), /* todo_flags_finish */
+};
+
+class pass_live_range_shrinkage : public rtl_opt_pass
+{
+public:
+  pass_live_range_shrinkage(gcc::context *ctxt)
+    : rtl_opt_pass(pass_data_live_range_shrinkage, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate () { return gate_handle_live_range_shrinkage (); }
+  unsigned int execute () { return rest_of_handle_live_range_shrinkage (); }
+
+}; // class pass_live_range_shrinkage
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_live_range_shrinkage (gcc::context *ctxt)
+{
+  return new pass_live_range_shrinkage (ctxt);
+}
+
+namespace {
 
 const pass_data pass_data_sched =
 {
Index: timevar.def
===================================================================
--- timevar.def (revision 204380)
+++ timevar.def (working copy)
@@ -223,6 +223,7 @@ DEFTIMEVAR (TV_COMBINE , "
 DEFTIMEVAR (TV_IFCVT , "if-conversion")
 DEFTIMEVAR (TV_MODE_SWITCH , "mode switching")
 DEFTIMEVAR (TV_SMS , "sms modulo scheduling")
+DEFTIMEVAR (TV_LIVE_RANGE_SHRINKAGE , "live range shrinkage")
 DEFTIMEVAR (TV_SCHED , "scheduling")
 DEFTIMEVAR (TV_IRA , "integrated RA")
 DEFTIMEVAR (TV_LRA , "LRA non-specific")
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 204380)
+++ tree-pass.h (working copy)
@@ -530,6 +530,7 @@ extern rtl_opt_pass *make_pass_lower_sub
 extern rtl_opt_pass *make_pass_mode_switching (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sms (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sched (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_live_range_shrinkage (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_ira (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_clean_state (gcc::context *ctxt);