[PATCH, libstdc++, testsuite] Remove redundant -save-temps options
Hi, the attached patch removes redundant -save-temps options from some libstdc++ tests, since the option is not needed in dg-do-compile/scan-assembler tests. Thanks, Nikolai 2015-08-04 Nikolai Bozhenov * testsuite/20_util/enable_shared_from_this/cons/constexpr.cc: Remove redundant -save-temps option. * testsuite/20_util/shared_ptr/cons/constexpr.cc: Likewise. * testsuite/20_util/unique_ptr/cons/constexpr.cc: Likewise. * testsuite/20_util/weak_ptr/cons/constexpr.cc: Likewise. * testsuite/30_threads/future/cons/constexpr.cc: Likewise. * testsuite/30_threads/shared_future/cons/constexpr.cc: Likewise. diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc index 78d8f06..18bf0c7 100644 --- a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc +++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" } +// { dg-options "-std=gnu++11 -fno-inline -g0" } // { dg-final { scan-assembler-not "_ZNSt23enable_shared_from_thisIiEC2Ev" } } // { dg-final { scan-assembler-not "_ZN7derivedC2Ev" } } diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc index 0c9e9b2..63cc60b 100644 --- a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc +++ b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" } +// { dg-options "-std=gnu++11 -fno-inline -g0" } // { dg-final { scan-assembler-not "_ZNSt10shared_ptrIiEC2Ev" } } // { dg-final { scan-assembler-not "_ZNSt10shared_ptrIiEC2EDn" } } diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc index 4d6ae77..f118415 100644 --- a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc +++ b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" } +// { dg-options "-std=gnu++11 -fno-inline -g0" } // { dg-final { scan-assembler-not "_ZNSt10unique_ptrIiSt14default_deleteIiEEC2Ev" } } // { dg-final { scan-assembler-not "_ZNSt10unique_ptrIiSt14default_deleteIiEEC2EDn" } } diff --git a/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc index b5eff55..6b77e9b 100644 --- a/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc +++ b/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" } +// { dg-options "-std=gnu++11 -fno-inline -g0" } // { dg-final { scan-assembler-not "_ZNSt8weak_ptrIiEC2Ev" } } // Copyright (C) 2010-2015 Free Software Foundation, Inc. diff --git a/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc b/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc index 0ec5fda..11945ad 100644 --- a/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc +++ b/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" } +// { dg-options "-std=gnu++11 -fno-inline -g0" } // { dg-require-cstdint "" } // { dg-require-gthreads "" } // { dg-require-atomic-builtins "" } diff --git a/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc b/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc index 11826e1..eebf797 100644 --- a/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc +++ b/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc @@ -1,5 +1,5 @@ // { dg-do compile } -// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" } +// { dg-options "-std=gnu++11 -fno-inline -g0" } // { dg-require-cstdint "" } // { dg-require-gthreads "" } // { dg-require-atomic-builtins "" }
Re: [RFC, PATCH] Disable -fprofile-use related optimizations if corresponding .gcda file not found.
On 10/07/2015 08:07 PM, pins...@gmail.com wrote: Yes. But I am saying some one could do -fprofile-use -frename-registers and expect rename registers to stay on even if there is no profile. That's true, we shouldn't disable any options that were explicitly requested by the user. Why not figure out how much to improve that instead. Rename register is just doing renaming of register usage and not undoing any scheduling at all (well it might cause some moves to be removed). I think you really should dig into deeper why it is causing a performance issue. Yep, the better way is to fix all the problems in all the passes. We're working on it but it may take a while. Meanwhile, we want to use PGO and we need a way to avoid degradations caused by it. We want to fallback to the usual compilation process for a source file if we don't like how it is compiled with PGO. And what we want to do is to simply remove the corresponding gcda file and have the compiler ignore the -fprofile-use option for the translation unit. Anyway, I find current GCC's behavior to be confusing. Currently it silently ignores missed profile data but triggers some opaque internal options which affect further compilation. I think in case of a missed gcda file GCC should either ignore -fprofile-use completely or issue a warning/error. Thanks, Nikolai
Fix prototype for print_insn in rtl.h
Currently prototype for print_insn in rtl.h doesn't match it's definition in sched-vis.c The patch fixes this mismatch. Thanks, Nikolai 2015-10-13 Nikolai Bozhenov * gcc/rtl.h (print_insn): fix prototype diff --git a/gcc/rtl.h b/gcc/rtl.h index a592a1e..d6edc71 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -3574,7 +3574,7 @@ extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *, int, int); extern void print_value (pretty_printer *, const_rtx, int); extern void print_pattern (pretty_printer *, const_rtx, int); -extern void print_insn (pretty_printer *, const_rtx, int); +extern void print_insn (pretty_printer *, const rtx_insn *, int); extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block); extern const char *str_pattern_slim (const_rtx);
Re: Fix prototype for print_insn in rtl.h
On 10/13/2015 03:22 PM, Jeff Law wrote: On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote: Currently prototype for print_insn in rtl.h doesn't match it's definition in sched-vis.c The patch fixes this mismatch. I'll run this through the usual bootstrap & regression testing before installing later today. jeff I've bootstrapped it on x86_64, but I don't see much sense in regression testing this patch cause it's so small. Though, if you think it's necessary, I can test it myself and write to you when I get the results. Thanks, Nikolai
Re: Fix prototype for print_insn in rtl.h
On 10/15/2015 07:28 PM, Andrew MacLeod wrote: On 10/13/2015 11:32 AM, Jeff Law wrote: On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote: 2015-10-13 Nikolai Bozhenov * gcc/rtl.h (print_insn): fix prototype Installed on the trunk after bootstrap & regression test. jeff Sorry, a little late to the party.. but why is print_insn even in rtl.h? it seems that sched-vis.c is the only thing that uses it... Andrew I'm going to use it in the scheduler... Thanks, Nikolai
Re: Fix prototype for print_insn in rtl.h
On 10/15/2015 09:42 PM, Trevor Saunders wrote: Sorry, a little late to the party.. but why is print_insn even in rtl.h? it seems that sched-vis.c is the only thing that uses it... Andrew I'm going to use it in the scheduler... but then wouldn't something like sched-int.h make more sense? On the other hand it seems like printing insns is generally useful functionality, so I'm curious why the scheduler needs its own way of doing it. Trev As for me, I believe sched-int.h is inappropriate place for print_insn prototype because the function has nothing scheduler specific. And I like Jeff's idea of removing sched-vis.c and moving everything from it into print-rtl.[hc]. It would be nice if such code motion resulted also in some interface and implementation unification for regular and slim dumpers. Thanks, Nikolai
Change behavior of -fsched-verbose option
Hi! Currently -fsched-verbose option redirects debugging dumps to stderr if there is no dump_file for the current pass. It would be fine if there were the only scheduling pass. But for example for AArch64 there are 3 scheduling passes in the default pipeline: sched1, fusion and sched2. So, when passing options like -fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for sched1 in the file and a mess of fusion/sched2 logs in the console. In fact, currently there's no way to tell GCC that I want extremely verbose logs for sched1 and I want no logs for all other passes. Especially to the console. I suggest disabling such redirection in the scheduler and omitting debugging output for passes without dump_file. I believe a better way to redirect printing to the stderr is to use options like -fdump-rtl-sched=stderr. The attached patch implements the suggestion. Anyway, the old behavior may be reproduced with options -fsched-verbose=7 -fdump-rtl-sched1 -fdump-rtl-{sched_fusion,sched2}=stderr if it is really necessary. The patch has been bootstrapped and regtested on x86_64. Thanks, Nikolai 2015-10-22 Nikolai Bozhenov * haifa-sched.c (setup_sched_dump): Don't redirect output to stderr. * common.opt (-fsched-verbose): Set default value to 1. * invoke.texi (-fsched-verbose): Update the option's description. diff --git a/gcc/common.opt b/gcc/common.opt index 224d3ad..5c23d29 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1973,7 +1973,7 @@ Common Report Var(flag_schedule_speculative_load_dangerous) Optimization Allow speculative motion of more loads fsched-verbose= -Common RejectNegative Joined UInteger Var(sched_verbose_param) +Common RejectNegative Joined UInteger Var(sched_verbose_param) Init(1) -fsched-verbose= Set the verbosity level of the scheduler fsched2-use-superblocks diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 027ce96..d3cb88c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -7403,12 +7403,7 @@ The @var{number} should be different for every file you compile. @item -fsched-verbose=@var{n} @opindex fsched-verbose On targets that use instruction scheduling, this option controls the -amount of debugging output the scheduler prints. This information is -written to standard error, unless @option{-fdump-rtl-sched1} or -@option{-fdump-rtl-sched2} is specified, in which case it is output -to the usual dump listing file, @file{.sched1} or @file{.sched2} -respectively. However for @var{n} greater than nine, the output is -always printed to standard error. +amount of debugging output the scheduler prints to the dump files. For @var{n} greater than zero, @option{-fsched-verbose} outputs the same information as @option{-fdump-rtl-sched1} and @option{-fdump-rtl-sched2}. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 46751fe..32506cb 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -206,17 +206,14 @@ static int modulo_last_stage; /* sched-verbose controls the amount of debugging output the scheduler prints. It is controlled by -fsched-verbose=N: - N>0 and no -DSR : the output is directed to stderr. - N>=10 will direct the printouts to stderr (regardless of -dSR). - N=1: same as -dSR. + N=0: no debugging output. + N=1: default value. N=2: bb's probabilities, detailed ready list info, unit/insn info. N=3: rtl at abort point, control-flow, regions info. N=5: dependences info. */ - int sched_verbose = 0; -/* Debugging file. All printouts are sent to dump, which is always set, - either to stderr, or to the dump listing file (-dRS). */ +/* Debugging file. All printouts are sent to dump. */ FILE *sched_dump = 0; /* This is a placeholder for the scheduler parameters common @@ -7222,17 +7219,14 @@ set_priorities (rtx_insn *head, rtx_insn *tail) return n_insn; } -/* Set dump and sched_verbose for the desired debugging output. If no - dump-file was specified, but -fsched-verbose=N (any N), print to stderr. - For -fsched-verbose=N, N>=10, print everything to stderr. */ +/* Set sched_dump and sched_verbose for the desired debugging output. */ void setup_sched_dump (void) { sched_verbose = sched_verbose_param; - if (sched_verbose_param == 0 && dump_file) -sched_verbose = 1; - sched_dump = ((sched_verbose_param >= 10 || !dump_file) - ? stderr : dump_file); + sched_dump = dump_file; + if (!dump_file) +sched_verbose = 0; } /* Allocate data for register pressure sensitive scheduling. */
Re: Change behavior of -fsched-verbose option
On 10/22/2015 06:56 PM, Bernd Schmidt wrote: On 10/22/2015 05:38 PM, Nikolai Bozhenov wrote: Currently -fsched-verbose option redirects debugging dumps to stderr if there is no dump_file for the current pass. It would be fine if there were the only scheduling pass. But for example for AArch64 there are 3 scheduling passes in the default pipeline: sched1, fusion and sched2. So, when passing options like -fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for sched1 in the file and a mess of fusion/sched2 logs in the console. * haifa-sched.c (setup_sched_dump): Don't redirect output to stderr. * common.opt (-fsched-verbose): Set default value to 1. * invoke.texi (-fsched-verbose): Update the option's description. That looks sensible, I agree that dumping to stderr does not make much sense. When in the debugger you can still set sched_dump to stderr if you want. + if (!dump_file) +sched_verbose = 0; Do you even need that? Otherwise, patch OK. Bernd Most loggers look like if (sched_verbose >= 4) fprintf (sched_dump, ";;\tAdvance the current state.\n"); So, we need to make sure that sched_verbose is set to zero if there's no dump_file. Otherwise, we'll be writing to NULL file pointer. Thanks, Nikolai
[ping] Change behavior of -fsched-verbose option
Please commit if it is OK for trunk. Thanks, Nikolai On 10/22/2015 06:38 PM, Nikolai Bozhenov wrote: Hi! Currently -fsched-verbose option redirects debugging dumps to stderr if there is no dump_file for the current pass. It would be fine if there were the only scheduling pass. But for example for AArch64 there are 3 scheduling passes in the default pipeline: sched1, fusion and sched2. So, when passing options like -fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for sched1 in the file and a mess of fusion/sched2 logs in the console. In fact, currently there's no way to tell GCC that I want extremely verbose logs for sched1 and I want no logs for all other passes. Especially to the console. I suggest disabling such redirection in the scheduler and omitting debugging output for passes without dump_file. I believe a better way to redirect printing to the stderr is to use options like -fdump-rtl-sched=stderr. The attached patch implements the suggestion. Anyway, the old behavior may be reproduced with options -fsched-verbose=7 -fdump-rtl-sched1 -fdump-rtl-{sched_fusion,sched2}=stderr if it is really necessary. The patch has been bootstrapped and regtested on x86_64. Thanks, Nikolai 2015-10-22 Nikolai Bozhenov * haifa-sched.c (setup_sched_dump): Don't redirect output to stderr. * common.opt (-fsched-verbose): Set default value to 1. * invoke.texi (-fsched-verbose): Update the option's description. diff --git a/gcc/common.opt b/gcc/common.opt index 961a1b6..757ce85 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1967,17 +1967,17 @@ fsched-spec-load Common Report Var(flag_schedule_speculative_load) Optimization Allow speculative motion of some loads. fsched-spec-load-dangerous Common Report Var(flag_schedule_speculative_load_dangerous) Optimization Allow speculative motion of more loads. fsched-verbose= -Common RejectNegative Joined UInteger Var(sched_verbose_param) +Common RejectNegative Joined UInteger Var(sched_verbose_param) Init(1) -fsched-verbose= Set the verbosity level of the scheduler. fsched2-use-superblocks Common Report Var(flag_sched2_use_superblocks) Optimization If scheduling post reload, do superblock scheduling. fsched2-use-traces Common Ignore diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 4fc7d88..11b8697 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -7415,22 +7415,17 @@ place unique stamps in coverage data files and the object files that produce them. You can use the @option{-frandom-seed} option to produce reproducibly identical object files. The @var{number} should be different for every file you compile. @item -fsched-verbose=@var{n} @opindex fsched-verbose On targets that use instruction scheduling, this option controls the -amount of debugging output the scheduler prints. This information is -written to standard error, unless @option{-fdump-rtl-sched1} or -@option{-fdump-rtl-sched2} is specified, in which case it is output -to the usual dump listing file, @file{.sched1} or @file{.sched2} -respectively. However for @var{n} greater than nine, the output is -always printed to standard error. +amount of debugging output the scheduler prints to the dump files. For @var{n} greater than zero, @option{-fsched-verbose} outputs the same information as @option{-fdump-rtl-sched1} and @option{-fdump-rtl-sched2}. For @var{n} greater than one, it also output basic block probabilities, detailed ready list information and unit/insn info. For @var{n} greater than two, it includes RTL at abort point, control-flow and regions info. And for @var{n} over four, @option{-fsched-verbose} also includes dependence info. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index caadc11..835648b 100644 --- a/gcc/haifa-sched.c +++ b/gcc/haifa-sched.c @@ -198,27 +198,24 @@ static int modulo_iter0_max_uid; static int modulo_backtracks_left; /* The stage in which the last insn from the original loop was scheduled. */ static int modulo_last_stage; /* sched-verbose controls the amount of debugging output the scheduler prints. It is controlled by -fsched-verbose=N: - N>0 and no -DSR : the output is directed to stderr. - N>=10 will direct the printouts to stderr (regardless of -dSR). - N=1: same as -dSR. + N=0: no debugging output. + N=1: default value. N=2: bb's probabilities, detailed ready list info, unit/insn info. N=3: rtl at abort point, control-flow, regions info. N=5: dependences info. */ - int sched_verbose = 0; -/* Debugging file. All printouts are sent to dump, which is always set, - either to stderr, or to the dump listing file (-dRS). */ +/* Debugging file. All printouts are sent to dump. */ FILE *sched_dump = 0; /* This is a placeholder for the scheduler parameters common to all schedulers. */ struct common_sched_info_def *common_sched_info; #define INSN_TICK(INSN) (HID (INSN)->tick) #define INSN_EXACT_TICK(
[sched] Dump dependency graph to a dot file
Hi! The attached patch adds a procedure to dump the scheduler's dependency graph into a dot file. The patch has been bootstrapped and regtested on x86_64. Please commit if it is OK for trunk. Thanks, Nikolai 2015-11-04 Nikolai Bozhenov * sched-int.h (dump_rgn_dependencies_dot): Declare * sched-rgn.c (dump_rgn_dependencies_dot): New function * print-rtl.h (print_insn): Add prototype diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h index eb079af..f601d33 100644 --- a/gcc/print-rtl.h +++ b/gcc/print-rtl.h @@ -25,12 +25,14 @@ extern void print_rtl (FILE *, const_rtx); #endif extern void dump_value_slim (FILE *, const_rtx, int); extern void dump_insn_slim (FILE *, const rtx_insn *); extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *, int, int); extern void print_value (pretty_printer *, const_rtx, int); extern void print_pattern (pretty_printer *, const_rtx, int); +extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose); + extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block); extern const char *str_pattern_slim (const_rtx); #endif // GCC_PRINT_RTL_H diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 86f5821..4600347 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1492,16 +1492,19 @@ extern void sched_rgn_local_finish (void); extern void sched_rgn_local_free (void); extern void extend_regions (void); extern void rgn_make_new_region_out_of_new_block (basic_block); extern void compute_priorities (void); extern void increase_insn_priority (rtx_insn *, int); extern void debug_rgn_dependencies (int); extern void debug_dependencies (rtx_insn *, rtx_insn *); +extern void dump_rgn_dependencies_dot (FILE *); +extern void dump_rgn_dependencies_dot (const char *); + extern void free_rgn_deps (void); extern int contributes_to_priority (rtx_insn *, rtx_insn *); extern void extend_rgns (int *, int *, sbitmap, int *); extern void deps_join (struct deps_desc *, struct deps_desc *); extern void rgn_setup_common_sched_info (void); extern void rgn_setup_sched_infos (void); diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c index eafb3fd..d744d83 100644 --- a/gcc/sched-rgn.c +++ b/gcc/sched-rgn.c @@ -58,16 +58,18 @@ along with GCC; see the file COPYING3. If not see #include "insn-attr.h" #include "except.h" #include "params.h" #include "cfganal.h" #include "sched-int.h" #include "sel-sched.h" #include "tree-pass.h" #include "dbgcnt.h" +#include "pretty-print.h" +#include "print-rtl.h" #ifdef INSN_SCHEDULING /* Some accessor macros for h_i_d members only used within this file. */ #define FED_BY_SPEC_LOAD(INSN) (HID (INSN)->fed_by_spec_load) #define IS_LOAD_INSN(INSN) (HID (insn)->is_load_insn) /* nr_inter/spec counts interblock/speculative motion for the function. */ @@ -2855,16 +2857,118 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail) DEP_NONREG (dep) ? "n" : "", DEP_MULTIPLE (dep) ? "m" : ""); } fprintf (sched_dump, "\n"); } fprintf (sched_dump, "\n"); } + +/* Dump dependency graph for the current region to a file using dot syntax. */ + +void +dump_rgn_dependencies_dot (FILE *file) +{ + rtx_insn *head, *tail, *con, *pro; + sd_iterator_def sd_it; + dep_t dep; + int bb; + pretty_printer pp; + + pp.buffer->stream = file; + pp_printf (&pp, "digraph SchedDG {\n"); + + for (bb = 0; bb < current_nr_blocks; ++bb) +{ + // begin subgraph (basic block) + pp_printf (&pp, "subgraph cluster_block_%d {\n", bb); + pp_printf (&pp, "\t" "color=blue;" "\n"); + pp_printf (&pp, "\t" "style=bold;" "\n"); + pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb)); + + // setup head and tail (no support for EBBs) + gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb)); + get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail); + tail = NEXT_INSN (tail); + + // dump all insns + for (con = head; con != tail; con = NEXT_INSN (con)) + { + if (!INSN_P (con)) + continue; + + // pretty print the insn + pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con)); + pp_write_text_to_stream (&pp); + print_insn (&pp, con, /*verbose=*/false); + pp_write_text_as_dot_label_to_stream (&pp, /*for_record=*/true); + pp_write_text_to_stream (&pp); + + // dump instruction attributes + pp_printf (&pp, "|{ uid:%d | luid:%d | prio:%d }}\",shape=record]\n", + INSN_UID (con), INSN_LUID (con), INSN_PRIORITY (con)); + + /* dump all deps */ + FOR_EACH_DEP (con, SD_LIST_BACK, sd_it, dep) + { + int weight = 0; + const cha
Re: [ping] Change behavior of -fsched-verbose option
On 11/05/2015 12:26 PM, Bernd Schmidt wrote: On 11/05/2015 09:11 AM, Nikolai Bozhenov wrote: Please commit if it is OK for trunk. Hmm, do you have a copyright assignment on file? Yes I do have a copyright assignment for all past and future changes to GCC (RT:828836). Thanks, Nikolai
Re: [sched] Dump dependency graph to a dot file
On 11/05/2015 12:18 PM, Kyrill Tkachov wrote: Hi Nikolai, On 05/11/15 08:29, Nikolai Bozhenov wrote: Hi! The attached patch adds a procedure to dump the scheduler's dependency graph into a dot file. The patch has been bootstrapped and regtested on x86_64. Please commit if it is OK for trunk. Thanks, Nikolai A couple of style nits. + // begin subgraph (basic block) + pp_printf (&pp, "subgraph cluster_block_%d {\n", bb); + pp_printf (&pp, "\t" "color=blue;" "\n"); + pp_printf (&pp, "\t" "style=bold;" "\n"); + pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb)); + + // setup head and tail (no support for EBBs) + gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb)); + get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail); + tail = NEXT_INSN (tail); + + // dump all insns + for (con = head; con != tail; con = NEXT_INSN (con)) +{ + if (!INSN_P (con)) +continue; + + // pretty print the insn + pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con)); Please use C-style comments i.e. /**/ instead of //. Also, throughout the comments leave two empty spaces after a full stop i.e. /* . */. You can use the check_GNU_style.sh script in the contrib/ directory on your patches to highlight similar issues. For example: $ ./contrib/check_GNU_style.sh ~/patches/dep-graph.patch Dot, space, space, end of comment. 83:+/* Dump dependency graph for the current region to a file using dot syntax. */ 166:+/* Dump dependency graph for the current region to a file using dot syntax. */ Sentences should end with a dot. Dot, space, space, end of the comment. 127:+ /* dump all deps */ Cheers, Kyrill Thanks for your remarks, Kyrill! I've uploaded an updated patch. Nikolai 2015-11-04 Nikolai Bozhenov * sched-int.h (dump_rgn_dependencies_dot): Declare * sched-rgn.c (dump_rgn_dependencies_dot): New function * print-rtl.h (print_insn): Add prototype diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h index eb079af..f601d33 100644 --- a/gcc/print-rtl.h +++ b/gcc/print-rtl.h @@ -25,12 +25,14 @@ extern void print_rtl (FILE *, const_rtx); #endif extern void dump_value_slim (FILE *, const_rtx, int); extern void dump_insn_slim (FILE *, const rtx_insn *); extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *, int, int); extern void print_value (pretty_printer *, const_rtx, int); extern void print_pattern (pretty_printer *, const_rtx, int); +extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose); + extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block); extern const char *str_pattern_slim (const_rtx); #endif // GCC_PRINT_RTL_H diff --git a/gcc/sched-int.h b/gcc/sched-int.h index 86f5821..4600347 100644 --- a/gcc/sched-int.h +++ b/gcc/sched-int.h @@ -1492,16 +1492,19 @@ extern void sched_rgn_local_finish (void); extern void sched_rgn_local_free (void); extern void extend_regions (void); extern void rgn_make_new_region_out_of_new_block (basic_block); extern void compute_priorities (void); extern void increase_insn_priority (rtx_insn *, int); extern void debug_rgn_dependencies (int); extern void debug_dependencies (rtx_insn *, rtx_insn *); +extern void dump_rgn_dependencies_dot (FILE *); +extern void dump_rgn_dependencies_dot (const char *); + extern void free_rgn_deps (void); extern int contributes_to_priority (rtx_insn *, rtx_insn *); extern void extend_rgns (int *, int *, sbitmap, int *); extern void deps_join (struct deps_desc *, struct deps_desc *); extern void rgn_setup_common_sched_info (void); extern void rgn_setup_sched_infos (void); diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c index eafb3fd..d744d83 100644 --- a/gcc/sched-rgn.c +++ b/gcc/sched-rgn.c @@ -58,16 +58,18 @@ along with GCC; see the file COPYING3. If not see #include "insn-attr.h" #include "except.h" #include "params.h" #include "cfganal.h" #include "sched-int.h" #include "sel-sched.h" #include "tree-pass.h" #include "dbgcnt.h" +#include "pretty-print.h" +#include "print-rtl.h" #ifdef INSN_SCHEDULING /* Some accessor macros for h_i_d members only used within this file. */ #define FED_BY_SPEC_LOAD(INSN) (HID (INSN)->fed_by_spec_load) #define IS_LOAD_INSN(INSN) (HID (insn)->is_load_insn) /* nr_inter/spec counts interblock/speculative motion for the function. */ @@ -2855,16 +2857,118 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail) DEP_NONREG (dep) ? "n" : "", DEP_MULTIPLE (dep) ? "m" : ""); } fprintf (sched_dump, "\n"); } fprintf (sched_dump, "\n"); } + +/* Dump dependenc
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 10/28/2015 01:07 PM, Kyrill Tkachov wrote: Hi all, This RTL checking error occurs on aarch64 in aarch_accumulator_forwarding when processing an msubsi insn with subregs: (insn 15 14 16 3 (set (reg/v:SI 78 [ i ]) (minus:SI (subreg:SI (reg/v:DI 76 [ aul ]) 0) (mult:SI (subreg:SI (reg:DI 83) 0) (subreg:SI (reg:DI 75 [ _20 ]) 0 schedice.c:10 357 {*msubsi} The register_operand predicate for that pattern allows subregs (I think correctly). The code in aarch_accumulator_forwarding doesn't take that into account and ends up taking a REGNO of a SUBREG, causing a checking error. This patch fixes that by stripping the subregs off the accumulator rtx before checking that the inner expression is a REG and taking its REGNO. The testcase now works fine with an aarch64-none-elf toolchain configure for RTL checking. The testcase is taken verbatim from the BZ entry for PR 68088. Since this function is shared between arm and aarch64 I've bootstrapped and tested it on both and I'll need ok's for both ports. Ok for trunk? Thanks, Kyrill 2015-10-28 Kyrylo Tkachov PR target/68088 * config/arm/aarch-common.c (aarch_strip_subreg): New function. (aarch_accumulator_forwarding): Strip subregs from accumulator rtx when appropriate. 2015-10-28 Kyrylo Tkachov * gcc.target/aarch64/pr68088_1.c: New test. Hi! I faced the same issue but I had somewhat different RTL for the consumer: (insn 20 15 21 2 (set (reg/i:SI 0 r0) (minus:SI (subreg:SI (reg:DI 117) 4) (mult:SI (reg:SI 123) (reg:SI 114 gasman.c:4 48 {*mulsi3subsi}) where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case? Thanks, Nikolai
Re: Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote: Hi! I faced the same issue but I had somewhat different RTL for the consumer: (insn 20 15 21 2 (set (reg/i:SI 0 r0) (minus:SI (subreg:SI (reg:DI 117) 4) (mult:SI (reg:SI 123) (reg:SI 114 gasman.c:4 48 {*mulsi3subsi}) where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case? If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question. I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication? Thanks, Nikolai
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 11/06/2015 08:09 PM, Kyrill Tkachov wrote: On 06/11/15 17:07, Nikolai Bozhenov wrote: On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote: Hi! I faced the same issue but I had somewhat different RTL for the consumer: (insn 20 15 21 2 (set (reg/i:SI 0 r0) (minus:SI (subreg:SI (reg:DI 117) 4) (mult:SI (reg:SI 123) (reg:SI 114 gasman.c:4 48 {*mulsi3subsi}) where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case? If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question. I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication? Could you please post the full RTL instruction we're talking about here as it appears in the scheduler dump? So that we're all on the same page about which case we're talking about. Sure. I'm talking about a sequence like this: (insn 8 13 11 2 (set (reg:DI 117) (mult:DI (zero_extend:DI (reg:SI 116)) (zero_extend:DI (reg:SI 118 test.c:2 54 {*umulsidi3_v6} (expr_list:REG_DEAD (reg:SI 118) (expr_list:REG_DEAD (reg:SI 116) (expr_list:REG_EQUAL (mult:DI (zero_extend:DI (reg:SI 116)) (const_int 1374400 [0x14f8c0])) (nil) (insn 11 8 12 2 (set (reg:DI 120) (mult:DI (zero_extend:DI (subreg:SI (reg:DI 117) 4)) (zero_extend:DI (reg:SI 121 test.c:2 54 {*umulsidi3_v6} (expr_list:REG_DEAD (reg:SI 121) (expr_list:REG_EQUAL (mult:DI (zero_extend:DI (subreg:SI (reg:DI 117) 4)) (const_int 3435973837 [0xcccd])) (nil (insn 12 11 20 2 (set (reg:SI 114) (lshiftrt:SI (subreg:SI (reg:DI 120) 4) (const_int 3 [0x3]))) test.c:2 126 {*arm_shiftsi3} (expr_list:REG_DEAD (reg:DI 120) (expr_list:REG_EQUAL (udiv:SI (subreg:SI (reg:DI 117) 4) (const_int 10 [0xa])) (nil (insn 20 12 21 2 (set (reg/i:SI 0 r0) (minus:SI (subreg:SI (reg:DI 117) 4) (mult:SI (reg:SI 123) (reg:SI 114 test.c:3 48 {*mulsi3subsi} (expr_list:REG_DEAD (reg:SI 123) (expr_list:REG_DEAD (reg:DI 117) (expr_list:REG_DEAD (reg:SI 114) (nil) The last insn is a MLA-like operation which is tested by aarch_accumulator_forwarding if it is a valid forwarding consumer. Thanks, Nikolai
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 11/06/2015 08:16 PM, Kyrill Tkachov wrote: On 06/11/15 17:09, Kyrill Tkachov wrote: On 06/11/15 17:07, Nikolai Bozhenov wrote: On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote: Hi! I faced the same issue but I had somewhat different RTL for the consumer: (insn 20 15 21 2 (set (reg/i:SI 0 r0) (minus:SI (subreg:SI (reg:DI 117) 4) (mult:SI (reg:SI 123) (reg:SI 114 gasman.c:4 48 {*mulsi3subsi}) where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one cycle in this case? If the accumulator can be forwarded (i.e. a SImode register), there isn't a reason why a subreg:SI (reg:DI) will not get forwarded. The subreg:SI is an artifact before register allocation, thus it's a representation issue that the patch is fixing here unless I misunderstand your question. I mean, in my example it is not the multiplication result that is forwarded but its upper part. So, shouldn't we check that offset in a subreg expression is zero? Or is it ok to forward only the upper part of a multiplication? Could you please post the full RTL instruction we're talking about here as it appears in the scheduler dump? So that we're all on the same page about which case we're talking about. Sorry, missed the above instruction. This subreg is just a pre-register allocation representation of the instruction and will go away after reload. This particular function only really has a real effect in post-reload scheduling as it's only there when the final register numbers are known. I see. aarch_accumulator_forwarding always returns 0 for virtual registers. But isn't it overly pessimistic to assume that accumulator forwarding is never possible at sched1? I wonder if it would be better to be more optimistic about register allocation outcome. I mean, in case of virtual registers we could assume forwarding from A to B if B is the only consumer of A's result. Something like this: if (REGNO (dest) >= FIRST_VIRTUAL_REGISTER || REGNO (accumulator) >= FIRST_VIRTUAL_REGISTER) return (DF_REG_USE_COUNT (REGNO (dest)) == 1) && (DF_REF_INSN (DF_REG_USE_CHAIN (REGNO (dest))) == consumer); else return REGNO (dest) == REGNO (accumulator); Thanks, Nikolai