Re: Adjust debug output from SMS's get_schdedule_window
Hello Richard, > This patch adjusts the dump output from modulo-sched.c:get_schdedule_window. > Dump output is very much down to personal preference, so please feel free > just to reject the change. The output format looks great to me although I cannot approve it. Thanks, Revital
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)
Hello, > On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford > wrote: > > Ayal Zaks writes: > >> + for (i = 0; i < ira_pressure_classes_num; i++) > >> + { > >> + enum reg_class pressure_class; > >> + > >> + pressure_class = ira_pressure_classes[i]; > >> + > >> + if (max_reg_pressure[pressure_class] == 0) > >> + continue; > >> + > >> + if (dump_file) > >> + fprintf (dump_file, "%s=%d %d ", reg_class_names [pressure_class], > >> + max_reg_pressure[pressure_class], > >> + ira_available_class_regs[pressure_class]); > >> + > >> + if (max_reg_pressure[pressure_class] > >> + > ira_class_hard_regs_num[pressure_class]) > >> + { > >> + if (dump_file) > >> + fprintf (dump_file, "(pressure)\n"); > >> + > >> + pressure_p = true; > >> > >> you can "break;" now. > > > > FWIW, I thought the same thing at first, but I think Revital's wayis better. > > It's nice to know when looking at dumps whether there is excess pressure > > in more than one class. This isn't performance-critical code after all. > > > > ok > > >> however, you have everything setup to compute the amount of spill, so > >> suggest to do the following instead: > >> > >> pressure += max_reg_pressure[pressure_class] > >> - ira_class_hard_regs_num[pressure_class]); > >> > >> or better call the variable "spillage" instead of "pressure", and > >> return it. Current use will only check if it's zero or positive. > > > > I read this suggestion in the same way as Revital seems to have done: > > that we sum the pressure change over all classes. But that isn't the idea. > > Using N too many registers in one pressure class cannot be mitigated by > > leaving N registers in another pressure class unused. > > > > of-course (wasn't thinking of spilling from one register file to > another); meant to suggest doing: > > if (max_reg_pressure[pressure_class] > ira_class_hard_regs_num > [pressure_class]) > spillage += max_reg_pressure[pressure_class] - > ira_class_hard_regs_num[pressure_class]); > > > > TBH, the original version of this function looked better to me. > > > > ok, it would surely suffice for now. > Attached is an updated version with the two changes mentioned above taken from the previous patch. Tested and bootstrap with the other patch in the series on ppc64-redhat-linux, enabling SMS on loops with SC 1. Thanks again, Revital 2012-01-03 Richard Sandiford Revital Eres * loop-invariant.c (get_regno_pressure_class): Move function to... * ira.c: Here. * common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose): New flags. * doc/invoke.texi (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose): Document the flags. * ira.h (get_regno_pressure_class, reset_pseudo_classes_defined_p): Declare. * ira-costs.c (reset_pseudo_classes_defined_p): New function. * Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h. (modulo-sched-pressure.o): New. * modulo-sched.c (ira.h, modulo-sched.h): New includes. (partial_schedule_ptr, ps_insn_ptr, struct ps_insn, struct ps_reg_move_info, struct partial_schedule): Move to modulo-sched.h. (ps_rtl_insn, ps_reg_move): Remove static. (apply_reg_moves): Remove static and call df_insn_rescan only if PS is final. (undo_reg_moves): New function. (sms_schedule): Call register pressure estimation. * modulo-sched.h: New file. * modulo-sched-pressure.c: New file. (See attached file: patch_pressure_3_1_12.txt)Index: doc/invoke.texi === --- doc/invoke.texi (revision 182766) +++ doc/invoke.texi (working copy) @@ -374,6 +374,7 @@ Objective-C and Objective-C++ Dialects}. -floop-parallelize-all -flto -flto-compression-level @gol -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol +-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol -fno-default-inline @gol -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol @@ -6476,6 +6477,16 @@ deleted which will trigger the generatio life-range analysis. This option is effective only with @option{-fmodulo-sched} enabled. +@item -fmodulo-sched-reg-pressure +@opindex fmodulo-sched-reg-pressure +Do not apply @option{-fmodulo-sched} to loops if the result would lead +to register spilling within the loop. +This option is effective only with @option{-fmodulo-sched} enabled. + +@item -fmodulo-sched-verbose=@var{n} +@opindex fmodulo-sched-verbose +Set up how verbose dump file for the SMS will be. + @item -fno-branch-count-reg @opindex fno-branch-count-reg Do not use ``decrement and branch'' instructions on a count
[PATCH, SMS] Fix PR51794
Hello, The patch below fixes ICE reported in PR51794. It avoids creating DDG edges for register uses of class DF_REF_ARTIFICIAL as the latter does not have real instructions for them and thus calling BLOCK_FOR_INSN fails. Tested and bootstrap on ppc64-redhat-linux, enabling SMS on loops with SC 1. OK for mainline? Thanks, Revital Chanelog: PR rtl-optimization/51794 * ddg.c (add_cross_iteration_register_deps): Avoid creating edges for uses of class DF_REF_ARTIFICIAL. testsuite/ PR rtl-optimization/51794 * gcc.dg/sms-12.c: New test. (See attached file: patch_fix_pr51794.txt)Index: ddg.c === --- ddg.c (revision 183001) +++ ddg.c (working copy) @@ -315,7 +315,12 @@ add_cross_iteration_register_deps (ddg_p /* Create inter-loop true dependences and anti dependences. */ for (r_use = DF_REF_CHAIN (last_def); r_use != NULL; r_use = r_use->next) { - rtx use_insn = DF_REF_INSN (r_use->ref); + rtx use_insn; + + if (r_use->ref == NULL || DF_REF_CLASS (r_use->ref) == DF_REF_ARTIFICIAL) + continue; + + use_insn = DF_REF_INSN (r_use->ref); if (BLOCK_FOR_INSN (use_insn) != g->bb) continue; Index: testsuite/gcc.dg/sms-12.c === --- testsuite/gcc.dg/sms-12.c (revision 0) +++ testsuite/gcc.dg/sms-12.c (revision 0) @@ -0,0 +1,13 @@ + /* { dg-do compile } */ + /* { dg-options "-O -fmodulo-sched" } */ + + +int +res_inverse (int a) +{ + int i; + char **b = (char **) __builtin_alloca (a * sizeof (char *)); + for (i = 0; i < a; i++) +b[i] = (char *) __builtin_alloca (sizeof (*b[i])); + return 0; +}
Re: [PATCH 3/9] [SMS] Eliminate redundant edges
Hi Roman, > While building a data dependency graph for loop a ddg edge for some pair > of instructions with inter-loop dependency should be created only if > there is no edge for intra-loop dependency between these instructions. > Creating both of edges leads sometimes to the fact that function It would be much appreciated if you could provide an example for the problematic scenario which leads to the miscompilation. Thanks, Revital
Re: [PATCH 5/9] [SMS] Support new loop pattern
Hello Roman, > This patch should be applied only after pending patches by Revital. This patch > significantly enhances the existing implementation of the SMS. Patch adds > support of scheduling loops without doloop pattern. The loop should meet the > following requirements. Thanks for the patch! I will try to go through it in more details soon. I'm testing now whether the recent bootstrap failure on ARM machine (PR49789) is revolved with your patch. I also plan to see the effect of it on PowerPC and SPU which currently support doloop pattern. Thanks, Revital
Re: Patches ping
Hi, Thanks for the review! > > >Changelog: > > > > (sms_schedule_by_order): Update call to get_sched_window. > > all set_must_precede_follow. > ^^^ > call Done. > >+/* Set bitmaps TMP_FOLLOW and TMP_PRECEDE to MUST_FOLLOW and MUST_PRECEDE > >+ respectively only if cycle C falls in the scheduling window boundaries > ^^ > on the border of Done. > > > > sbitmap tmp_precede = NULL; > > sbitmap tmp_follow = NULL; > are redundantly reset in set_must_precede_follow(). Done. I removed the setting of tmp_precede and tmp_follow before calling set_must_precede_follow. (they are reset in set_must_precede_follow) > >+/* Update the sched_params for node U using the II, > ^ > (time, row and stage) > >+ the CYCLE of U and MIN_CYCLE. */ > Please also clarify that we're not simply taking > SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii); > because the stages are aligned on cycle 0. Done. I assume it should be "because the stages may not be aligned on cycle 0." > >+ /* First, normailize the partial schedualing. */ > ^ ^ > > >+ /* Try to achieve optimized SC by normalizing the partial > >+ schedule (having the cycles start from cycle zero). The branch > ^ > >+ location must be placed in row ii-1 in the final scheduling. > > >+ If that's not the case after the normalization then try to > ^^ > >+ move the branch to that row if possible. */ > ^ > If failed, shift all instructions to position the branch in row ii-1. Done. > For consistency and clarity, may be instead of: > > >+ /* Bring the branch to cycle -1. */ > >+ int amount = SCHED_TIME (g->closing_branch) + 1; > it would be better to have: > > + /* Bring the branch to cycle ii-1. */ > + int amount = SCHED_TIME (g->closing_branch) - (ii - 1); > OK, the attached patch contains this fix in other places as well where we bring the branch to cycle -1. > Some thoughts on possible improvements (not mandatory; especially > given the delay in approval, sorry..thanks for the ping): > o Have optimize_sc() take care of all possible rotations doing the > best it can, without returning an indication which leaves subsequent > (suboptimal) processing to the caller. > o Instead of removing the branch and then trying to get it back into > the same cycle if you can't place it in row ii-1, consider keeping it > in its place and removing it only if you succeed to schedule it > (also..) in row ii-1. > o May be worthwhile to apply more refactoring, so that the code to > reschedule the branch in row ii-1 reuses more of the general code for > scheduling an instruction (possibly setting end = start + end), but > avoid splitting a row. > o Would be interesting to learn of loops that still have suboptimal > SC's, to see if it's still an issue. Thanks for the suggestions, will try to address them. I'm now re-testing the attached patch on PowerPC and will commit it if there will be no further comments. Thanks, Revital (See attached file: patch_opt_sc_1_8.txt) Index: ChangeLog === --- ChangeLog (revision 176998) +++ ChangeLog (working copy) @@ -1,3 +1,18 @@ +2011-08-01 Revital Eres + + * modulo-sched.c (calculate_stage_count, + calculate_must_precede_follow, get_sched_window, + try_scheduling_node_in_cycle, remove_node_from_ps): Add + declaration. + (update_node_sched_params, set_must_precede_follow, optimize_sc): + New functions. + (reset_sched_times): Call update_node_sched_params. + (sms_schedule): Call optimize_sc. + (get_sched_window): Change function arguments. + (sms_schedule_by_order): Update call to get_sched_window. + Call set_must_precede_follow. + (calculate_stage_count): Add function argument. + 2011-07-31 Richard Henderson * config/h8300/crti.asm: Add flags to .section directive. Index: modulo-sched.c === --- modulo-sched.c (revision 176998) +++ modulo-sched.c (working copy) @@ -203,7 +203,16 @@ static void generate_prolog_epilog (part rtx, rtx); static void duplicate_insns_of_cycles (partial_schedule_ptr, int, int, int, rtx); -static int calculate_stage_count (partial_schedule_ptr ps); +static int calculate_stage_count (partial_schedule_ptr, int); +static void calculate_must_precede_follow (ddg_node_ptr, int, int, + int, int, sbitmap, sbitmap, sbitmap
Re: debug insns in SMS (was: Re: Debug_insn)
Hello Alexandre > I think this will restore proper functioning to SMS in the presence of > debug insns. A while ago, we'd never generate deps of non-debug insns > on debug insns. I introduced them to enable sched to adjust (reset) > debug insns when non-debug insns were moved before them. I believe it > is safe to leave them out of the SCCs. Even though this will end up > causing some loss of debug info, that's probably unavoidable, and the > end result after this change is pobably the best we can hope for. Your > thoughts? Thanks for the patch! I actually discussed this issue with Ayal yesterday. Ayal also suggested to reconsider the edges that are created in the DDG between real instructions and debug_insns. Currently, we create bidirectional anti deps edges between them. This leads to the problem you were trying to solve in the current patch (described below) where these extra edges influence the construction of the strongly connected component and the code generated with and w\o -g. Your patch seems to solve this problem. However I can not approve it as I'm not the maintainer (Ayal is). Another problem with the handling of debug insns in SMS is that debug_insns instructions are been ignored while scheduling which means that they do not delimit the scheduling window of the real instructions they depend on. This could lead to a scenario where the dependencies between real instruction and debug_insn could be violated as we leave the debug_insns in their original place after scheduling. I'll try to send you an example of this problem as well. Example of code gen difference when using -g and without it which this patch tries to solve: The following nodes are all belong to the same SCC running with -g. node 1 is not present in this SCC when not using -g. (the nodes marked in [] are the one that do not exist with your patch and prevent from node 1 to be added to the SCC when compiling with -g) node 0 Debug_insn (ior(MEM(ivtmp.24),MEM(ivtmp.46)) in edges: 1->0, 2->0 out edges: [0->1], [0->4], [0->2] node 1 Reg 220 = MEM (pre_inc (ivtmp.24)) in edges: [0->1], 1->1 out edges: 1->0, 1->1, 1->3 node 2 Reg 221= MEM (pre_inc (ivtmp.46)) in edges: [0->2], 4->2, 2->2 out edges: 2->0, 2->3, 2->4, 2->2 node 3 Reg 222= ior (220,221) in edges: 2->3, 1->3 out edges: 3->4 node 4 MEM[pre_inc(196)] = 222 in edges: 3->4, 4->4, [0->4], 2->4 out edges: 4->4, 4->2 Thanks, Revital
[PATCH, SMS 3/3] Skip DEBUG_INSN in loop-doloop.
Hello, (sorry for multiple copies of this email) This small fix was inserted to skip DEBUG_INSNs while recognizing doloop pattern in loop-doloop.c file. It's a fix for the already approved do-loop patch (not in mainline yet, http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) in loop-doloop.c The patch was tested together with the rest of the patches in this series and on top of the patch to support do-loop for ARM (not yet in mainline, but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html). On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? Thanks, Revital Changelog: * loop-doloop.c (doloop_condition_get): Use prev_nondebug_insn instead of PREV_INSN. --- loop-doloop.c 2011-05-07 16:08:27.0 +0300 +++ loop-doloop_new.c 2011-05-07 16:07:48.0 +0300 @@ -151,7 +151,7 @@ doloop_condition_get (rtx doloop_pat) inc = XVECEXP (PATTERN (prev_insn), 0, 1); } else -inc = PATTERN (PREV_INSN (doloop_pat)); +inc = PATTERN (prev_insn); /* We expect the condition to be of the form (reg != 0) */ cond = XEXP (SET_SRC (cmp), 0); if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx)
[PATCH, SMS] Support closing_branch_deps
Hello, The attached patch includes enhancements for SMS to support targets that their doloop part is not decoupled from the rest of the loop's instructions, as SMS currently requires. (ARM is an example for such target, where the loop's instructions might use CC which is used in the doloop part) The patch uses already existing closing_branch_deps field in the DDG to indicate whether the closing branch should be scheduled with the rest of the loop instructions to preserve the dependencies between them. In the current implementation, closing_branch_deps is always false which enables to leave the closing branch outside of the scheduling process and place it at ii-1 row at the end of the process. This patch extends the implementation by identifying loops where closing_branch_deps should be true and scheduling the closing branch with the rest of the instructions for such loops, rotating the branch to be in the ii-1 row at the end of the scheduling process. The patch also enables the scheduling when debug_insn is generated for instructions in the do-loop part. To enable this, a small fix was inserted on top of the already approved do-loop patch (not in mainline yet, http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) in loop-doloop.c file to consider prev_nondebug_insn instead of PREV_INSN as follows: (the diff is against the patch in the above link and thus it is not included in the attached file, however it is part of the tested patch and thus I appreciate an approval for this part also) @@ -610,7 +614,7 @@ +inc = XVECEXP (PATTERN (prev_insn), 0, 1); + } + else -+inc = PATTERN (PREV_INSN (doloop_pat)); ++inc = PATTERN (prev_insn); /* We expect the condition to be of the form (reg != 0) */ cond = XEXP (SET_SRC (cmp), 0); if (GET_CODE (cond) != NE || XEXP (cond, 1) != const0_rtx) The attached patch was tested with the patch to support do-loop for ARM (http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html) including the fix mentioned above; as following: On ppc64-redhat-linux regtest as well as bootstrap with flags(*); enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with flags (*) configured w and w\o --with-arch=armv7-a; and w and w\o --with-mode=thumb (four different configurations in total) enabling SMS also on loops with stage count 1. (*) flags used for bootstrap: -O2 -fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec -funsafe-math-optimizations OK for trunk once stage 1 will be open? Thanks, Revital ChangeLog: * ddg.c (check_closing_branch_deps, get_node_of_insn_uid): New functions. (create_ddg): Pass sbitmap containing do-loop related instructions instead of closing_branch_deps parameter and call check_closing_branch_deps function. * ddg.h (create_ddg): Adjust the function declaration. * modulo-sched.c (PS_STAGE_COUNT): Rename to CALC_STAGE_COUNT and redefine. (doloop_register_get): Handle NONDEBUG_INSN_P. (stage_count): New field in struct partial_schedule. (mark_doloop_insns, calculate_stage_count): New functions. (normalize_sched_times): Rename to reset_sched_times and handle incrementing the sched time of the nodes by a constant value passed as parameter. (duplicate_insns_of_cycles): Skip closing branch. (sms_schedule_by_order): Schedule closing branch when closing_branch_deps is true. (ps_insn_find_column): Handle closing branch. (sms_schedule): Call reset_sched_times and handle case where do-loop pattern is not decoupled from the other loop instructions. (ps_insert_empty_row): Update calls to normalize_sched_times and rotate_partial_schedule functions. * loop-doloop.c (doloop_condition_get): Use prev_nondebug_insn instead of PREV_INSN. testsuite Changlog: * gcc.target/arm/sms-9.c: New file. * gcc.target/arm/sms-10.c: New file. (See attached file: patch_doloop_fix_7_3_new.txt) (See attached file: sms-10.c)(See attached file: sms-9.c)Index: ddg.c === --- ddg.c (revision 170464) +++ ddg.c (working copy) @@ -60,6 +60,8 @@ static void create_ddg_dep_no_link (ddg_ static ddg_edge_ptr create_ddg_edge (ddg_node_ptr, ddg_node_ptr, dep_type, dep_data_type, int, int); static void add_edge_to_ddg (ddg_ptr g, ddg_edge_ptr); +static ddg_node_ptr get_node_of_insn_uid (ddg_ptr, int); + /* Auxiliary variable for mem_read_insn_p/mem_write_insn_p. */ static bool mem_ref_p; @@ -488,12 +490,65 @@ build_intra_loop_deps (ddg_ptr g) sched_free_deps (head, tail, false); } +/* Given DOLOOP_INSNS which holds the instructions that + belong to the do-loop part; mark closing_branch_deps field in ddg G + as TRUE if the do-