[PATCH, libstdc++, testsuite] Remove redundant -save-temps options

2015-08-04 Thread Nikolai Bozhenov

Hi,

the attached patch removes redundant -save-temps options from some libstdc++
tests, since the option is not needed in dg-do-compile/scan-assembler tests.

Thanks,
Nikolai

2015-08-04  Nikolai Bozhenov  

	* testsuite/20_util/enable_shared_from_this/cons/constexpr.cc: Remove
	redundant -save-temps option.
	* testsuite/20_util/shared_ptr/cons/constexpr.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/constexpr.cc: Likewise.
	* testsuite/20_util/weak_ptr/cons/constexpr.cc: Likewise.
	* testsuite/30_threads/future/cons/constexpr.cc: Likewise.
	* testsuite/30_threads/shared_future/cons/constexpr.cc: Likewise.
  

diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc
index 78d8f06..18bf0c7 100644
--- a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/cons/constexpr.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" }
+// { dg-options "-std=gnu++11 -fno-inline -g0" }
 // { dg-final { scan-assembler-not "_ZNSt23enable_shared_from_thisIiEC2Ev" } }
 // { dg-final { scan-assembler-not "_ZN7derivedC2Ev" } }
 
diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc
index 0c9e9b2..63cc60b 100644
--- a/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/shared_ptr/cons/constexpr.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" }
+// { dg-options "-std=gnu++11 -fno-inline -g0" }
 // { dg-final { scan-assembler-not "_ZNSt10shared_ptrIiEC2Ev" } }
 // { dg-final { scan-assembler-not "_ZNSt10shared_ptrIiEC2EDn" } }
 
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc
index 4d6ae77..f118415 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/constexpr.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" }
+// { dg-options "-std=gnu++11 -fno-inline -g0" }
 // { dg-final { scan-assembler-not "_ZNSt10unique_ptrIiSt14default_deleteIiEEC2Ev" } }
 // { dg-final { scan-assembler-not "_ZNSt10unique_ptrIiSt14default_deleteIiEEC2EDn" } }
 
diff --git a/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc b/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc
index b5eff55..6b77e9b 100644
--- a/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/weak_ptr/cons/constexpr.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" }
+// { dg-options "-std=gnu++11 -fno-inline -g0" }
 // { dg-final { scan-assembler-not "_ZNSt8weak_ptrIiEC2Ev" } }
 
 // Copyright (C) 2010-2015 Free Software Foundation, Inc.
diff --git a/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc b/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc
index 0ec5fda..11945ad 100644
--- a/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc
+++ b/libstdc++-v3/testsuite/30_threads/future/cons/constexpr.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" }
+// { dg-options "-std=gnu++11 -fno-inline -g0" }
 // { dg-require-cstdint "" }
 // { dg-require-gthreads "" }
 // { dg-require-atomic-builtins "" }
diff --git a/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc b/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc
index 11826e1..eebf797 100644
--- a/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc
+++ b/libstdc++-v3/testsuite/30_threads/shared_future/cons/constexpr.cc
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-std=gnu++11 -fno-inline -save-temps -g0" }
+// { dg-options "-std=gnu++11 -fno-inline -g0" }
 // { dg-require-cstdint "" }
 // { dg-require-gthreads "" }
 // { dg-require-atomic-builtins "" }


Re: [RFC, PATCH] Disable -fprofile-use related optimizations if corresponding .gcda file not found.

2015-10-08 Thread Nikolai Bozhenov


On 10/07/2015 08:07 PM, pins...@gmail.com wrote:

Yes. But I am saying some one could do -fprofile-use -frename-registers and 
expect rename registers to stay on even if there is no profile.


That's true, we shouldn't disable any options that were explicitly 
requested by

the user.


Why not figure out how much to improve that instead. Rename register is just 
doing renaming of register usage and not undoing any scheduling at all (well it 
might cause some moves to be removed).  I think you really should dig into 
deeper why it is causing a performance issue.


Yep, the better way is to fix all the problems in all the passes. We're
working on it but it may take a while. Meanwhile, we want to use PGO and we
need a way to avoid degradations caused by it. We want to fallback to the
usual compilation process for a source file if we don't like how it is
compiled with PGO. And what we want to do is to simply remove the
corresponding gcda file and have the compiler ignore the -fprofile-use 
option

for the translation unit.

Anyway, I find current GCC's behavior to be confusing. Currently it silently
ignores missed profile data but triggers some opaque internal options which
affect further compilation. I think in case of a missed gcda file GCC should
either ignore -fprofile-use completely or issue a warning/error.

Thanks,
Nikolai


Fix prototype for print_insn in rtl.h

2015-10-13 Thread Nikolai Bozhenov
Currently prototype for print_insn in rtl.h doesn't match it's 
definition in sched-vis.c The patch fixes this mismatch.


Thanks,
Nikolai
2015-10-13  Nikolai Bozhenov  

* gcc/rtl.h (print_insn): fix prototype

diff --git a/gcc/rtl.h b/gcc/rtl.h
index a592a1e..d6edc71 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3574,7 +3574,7 @@ extern void dump_rtl_slim (FILE *, const rtx_insn *, 
const rtx_insn *,
   int, int);
 extern void print_value (pretty_printer *, const_rtx, int);
 extern void print_pattern (pretty_printer *, const_rtx, int);
-extern void print_insn (pretty_printer *, const_rtx, int);
+extern void print_insn (pretty_printer *, const rtx_insn *, int);
 extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block);
 extern const char *str_pattern_slim (const_rtx);
 


Re: Fix prototype for print_insn in rtl.h

2015-10-13 Thread Nikolai Bozhenov

On 10/13/2015 03:22 PM, Jeff Law wrote:

On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote:

Currently prototype for print_insn in rtl.h doesn't match it's
definition in sched-vis.c The patch fixes this mismatch.
I'll run this through the usual bootstrap & regression testing before 
installing later today.

jeff


I've bootstrapped it on x86_64, but I don't see much sense in regression
testing this patch cause it's so small. Though, if you think it's necessary,
I can test it myself and write to you when I get the results.

Thanks,
Nikolai


Re: Fix prototype for print_insn in rtl.h

2015-10-15 Thread Nikolai Bozhenov


On 10/15/2015 07:28 PM, Andrew MacLeod wrote:

On 10/13/2015 11:32 AM, Jeff Law wrote:

On 10/13/2015 02:21 AM, Nikolai Bozhenov wrote:

2015-10-13  Nikolai Bozhenov

 * gcc/rtl.h (print_insn): fix prototype

Installed on the trunk after bootstrap & regression test.

jeff

Sorry, a little late to the party.. but why is print_insn even in 
rtl.h?  it seems that sched-vis.c is the only thing that uses it...


Andrew



I'm going to use it in the scheduler...

Thanks,
Nikolai


Re: Fix prototype for print_insn in rtl.h

2015-10-20 Thread Nikolai Bozhenov


On 10/15/2015 09:42 PM, Trevor Saunders wrote:

Sorry, a little late to the party.. but why is print_insn even in rtl.h?
it seems that sched-vis.c is the only thing that uses it...

Andrew

I'm going to use it in the scheduler...

but then wouldn't something like sched-int.h make more sense? On
the other hand it seems like printing insns is generally useful
functionality, so I'm curious why the scheduler needs its own way of
doing it.

Trev

As for me, I believe sched-int.h is inappropriate place for print_insn
prototype because the function has nothing scheduler specific. And I like
Jeff's idea of removing sched-vis.c and moving everything from it into
print-rtl.[hc]. It would be nice if such code motion resulted also in some
interface and implementation unification for regular and slim dumpers.

Thanks,
Nikolai



Change behavior of -fsched-verbose option

2015-10-22 Thread Nikolai Bozhenov

Hi!

Currently -fsched-verbose option redirects debugging dumps to stderr
if there is no dump_file for the current pass. It would be fine if
there were the only scheduling pass. But for example for AArch64
there are 3 scheduling passes in the default pipeline: sched1,
fusion and sched2. So, when passing options like
-fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for
sched1 in the file and a mess of fusion/sched2 logs in the console.

In fact, currently there's no way to tell GCC that I want extremely
verbose logs for sched1 and I want no logs for all other passes.
Especially to the console.

I suggest disabling such redirection in the scheduler and omitting
debugging output for passes without dump_file. I believe a better
way to redirect printing to the stderr is to use options like
-fdump-rtl-sched=stderr. The attached patch implements the
suggestion.

Anyway, the old behavior may be reproduced with options
-fsched-verbose=7 -fdump-rtl-sched1 -fdump-rtl-{sched_fusion,sched2}=stderr
if it is really necessary.

The patch has been bootstrapped and regtested on x86_64.

Thanks,
Nikolai
2015-10-22  Nikolai Bozhenov  

	* haifa-sched.c (setup_sched_dump): Don't redirect output to stderr.
	* common.opt (-fsched-verbose): Set default value to 1.
	* invoke.texi (-fsched-verbose): Update the option's description.

diff --git a/gcc/common.opt b/gcc/common.opt
index 224d3ad..5c23d29 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1973,7 +1973,7 @@ Common Report Var(flag_schedule_speculative_load_dangerous) Optimization
 Allow speculative motion of more loads
 
 fsched-verbose=
-Common RejectNegative Joined UInteger Var(sched_verbose_param)
+Common RejectNegative Joined UInteger Var(sched_verbose_param) Init(1)
 -fsched-verbose=	Set the verbosity level of the scheduler
 
 fsched2-use-superblocks
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 027ce96..d3cb88c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7403,12 +7403,7 @@ The @var{number} should be different for every file you compile.
 @item -fsched-verbose=@var{n}
 @opindex fsched-verbose
 On targets that use instruction scheduling, this option controls the
-amount of debugging output the scheduler prints.  This information is
-written to standard error, unless @option{-fdump-rtl-sched1} or
-@option{-fdump-rtl-sched2} is specified, in which case it is output
-to the usual dump listing file, @file{.sched1} or @file{.sched2}
-respectively.  However for @var{n} greater than nine, the output is
-always printed to standard error.
+amount of debugging output the scheduler prints to the dump files.
 
 For @var{n} greater than zero, @option{-fsched-verbose} outputs the
 same information as @option{-fdump-rtl-sched1} and @option{-fdump-rtl-sched2}.
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 46751fe..32506cb 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -206,17 +206,14 @@ static int modulo_last_stage;
 
 /* sched-verbose controls the amount of debugging output the
scheduler prints.  It is controlled by -fsched-verbose=N:
-   N>0 and no -DSR : the output is directed to stderr.
-   N>=10 will direct the printouts to stderr (regardless of -dSR).
-   N=1: same as -dSR.
+   N=0: no debugging output.
+   N=1: default value.
N=2: bb's probabilities, detailed ready list info, unit/insn info.
N=3: rtl at abort point, control-flow, regions info.
N=5: dependences info.  */
-
 int sched_verbose = 0;
 
-/* Debugging file.  All printouts are sent to dump, which is always set,
-   either to stderr, or to the dump listing file (-dRS).  */
+/* Debugging file.  All printouts are sent to dump. */
 FILE *sched_dump = 0;
 
 /* This is a placeholder for the scheduler parameters common
@@ -7222,17 +7219,14 @@ set_priorities (rtx_insn *head, rtx_insn *tail)
   return n_insn;
 }
 
-/* Set dump and sched_verbose for the desired debugging output.  If no
-   dump-file was specified, but -fsched-verbose=N (any N), print to stderr.
-   For -fsched-verbose=N, N>=10, print everything to stderr.  */
+/* Set sched_dump and sched_verbose for the desired debugging output. */
 void
 setup_sched_dump (void)
 {
   sched_verbose = sched_verbose_param;
-  if (sched_verbose_param == 0 && dump_file)
-sched_verbose = 1;
-  sched_dump = ((sched_verbose_param >= 10 || !dump_file)
-		? stderr : dump_file);
+  sched_dump = dump_file;
+  if (!dump_file)
+sched_verbose = 0;
 }
 
 /* Allocate data for register pressure sensitive scheduling.  */


Re: Change behavior of -fsched-verbose option

2015-10-22 Thread Nikolai Bozhenov

On 10/22/2015 06:56 PM, Bernd Schmidt wrote:

On 10/22/2015 05:38 PM, Nikolai Bozhenov wrote:


Currently -fsched-verbose option redirects debugging dumps to stderr
if there is no dump_file for the current pass. It would be fine if
there were the only scheduling pass. But for example for AArch64
there are 3 scheduling passes in the default pipeline: sched1,
fusion and sched2. So, when passing options like
-fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for
sched1 in the file and a mess of fusion/sched2 logs in the console.



* haifa-sched.c (setup_sched_dump): Don't redirect output to stderr.
* common.opt (-fsched-verbose): Set default value to 1.
* invoke.texi (-fsched-verbose): Update the option's description.


That looks sensible, I agree that dumping to stderr does not make much 
sense. When in the debugger you can still set sched_dump to stderr if 
you want.



+  if (!dump_file)
+sched_verbose = 0;


Do you even need that? Otherwise, patch OK.


Bernd



Most loggers look like

if (sched_verbose >= 4)
fprintf (sched_dump, ";;\tAdvance the current state.\n");

So, we need to make sure that sched_verbose is set to zero if
there's no dump_file. Otherwise, we'll be writing to NULL file
pointer.

Thanks,
Nikolai


[ping] Change behavior of -fsched-verbose option

2015-11-05 Thread Nikolai Bozhenov

Please commit if it is OK for trunk.

Thanks,
Nikolai

On 10/22/2015 06:38 PM, Nikolai Bozhenov wrote:

Hi!

Currently -fsched-verbose option redirects debugging dumps to stderr
if there is no dump_file for the current pass. It would be fine if
there were the only scheduling pass. But for example for AArch64
there are 3 scheduling passes in the default pipeline: sched1,
fusion and sched2. So, when passing options like
-fsched-verbose=7 -fdump-rtl-sched1 to GCC I get a neat dump for
sched1 in the file and a mess of fusion/sched2 logs in the console.

In fact, currently there's no way to tell GCC that I want extremely
verbose logs for sched1 and I want no logs for all other passes.
Especially to the console.

I suggest disabling such redirection in the scheduler and omitting
debugging output for passes without dump_file. I believe a better
way to redirect printing to the stderr is to use options like
-fdump-rtl-sched=stderr. The attached patch implements the
suggestion.

Anyway, the old behavior may be reproduced with options
-fsched-verbose=7 -fdump-rtl-sched1 
-fdump-rtl-{sched_fusion,sched2}=stderr

if it is really necessary.

The patch has been bootstrapped and regtested on x86_64.

Thanks,
Nikolai


2015-10-22  Nikolai Bozhenov  

	* haifa-sched.c (setup_sched_dump): Don't redirect output to stderr.
	* common.opt (-fsched-verbose): Set default value to 1.
	* invoke.texi (-fsched-verbose): Update the option's description.

diff --git a/gcc/common.opt b/gcc/common.opt
index 961a1b6..757ce85 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1967,17 +1967,17 @@ fsched-spec-load
 Common Report Var(flag_schedule_speculative_load) Optimization
 Allow speculative motion of some loads.
 
 fsched-spec-load-dangerous
 Common Report Var(flag_schedule_speculative_load_dangerous) Optimization
 Allow speculative motion of more loads.
 
 fsched-verbose=
-Common RejectNegative Joined UInteger Var(sched_verbose_param)
+Common RejectNegative Joined UInteger Var(sched_verbose_param) Init(1)
 -fsched-verbose=	Set the verbosity level of the scheduler.
 
 fsched2-use-superblocks
 Common Report Var(flag_sched2_use_superblocks) Optimization
 If scheduling post reload, do superblock scheduling.
 
 fsched2-use-traces
 Common Ignore
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4fc7d88..11b8697 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7415,22 +7415,17 @@ place unique stamps in coverage data files and the object files that
 produce them.  You can use the @option{-frandom-seed} option to produce
 reproducibly identical object files.
 
 The @var{number} should be different for every file you compile.
 
 @item -fsched-verbose=@var{n}
 @opindex fsched-verbose
 On targets that use instruction scheduling, this option controls the
-amount of debugging output the scheduler prints.  This information is
-written to standard error, unless @option{-fdump-rtl-sched1} or
-@option{-fdump-rtl-sched2} is specified, in which case it is output
-to the usual dump listing file, @file{.sched1} or @file{.sched2}
-respectively.  However for @var{n} greater than nine, the output is
-always printed to standard error.
+amount of debugging output the scheduler prints to the dump files.
 
 For @var{n} greater than zero, @option{-fsched-verbose} outputs the
 same information as @option{-fdump-rtl-sched1} and @option{-fdump-rtl-sched2}.
 For @var{n} greater than one, it also output basic block probabilities,
 detailed ready list information and unit/insn info.  For @var{n} greater
 than two, it includes RTL at abort point, control-flow and regions info.
 And for @var{n} over four, @option{-fsched-verbose} also includes
 dependence info.
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index caadc11..835648b 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -198,27 +198,24 @@ static int modulo_iter0_max_uid;
 static int modulo_backtracks_left;
 
 /* The stage in which the last insn from the original loop was
scheduled.  */
 static int modulo_last_stage;
 
 /* sched-verbose controls the amount of debugging output the
scheduler prints.  It is controlled by -fsched-verbose=N:
-   N>0 and no -DSR : the output is directed to stderr.
-   N>=10 will direct the printouts to stderr (regardless of -dSR).
-   N=1: same as -dSR.
+   N=0: no debugging output.
+   N=1: default value.
N=2: bb's probabilities, detailed ready list info, unit/insn info.
N=3: rtl at abort point, control-flow, regions info.
N=5: dependences info.  */
-
 int sched_verbose = 0;
 
-/* Debugging file.  All printouts are sent to dump, which is always set,
-   either to stderr, or to the dump listing file (-dRS).  */
+/* Debugging file.  All printouts are sent to dump. */
 FILE *sched_dump = 0;
 
 /* This is a placeholder for the scheduler parameters common
to all schedulers.  */
 struct common_sched_info_def *common_sched_info;
 
 #define INSN_TICK(INSN)	(HID (INSN)->tick)
 #define INSN_EXACT_TICK(

[sched] Dump dependency graph to a dot file

2015-11-05 Thread Nikolai Bozhenov

Hi!

The attached patch adds a procedure to dump the scheduler's dependency
graph into a dot file. The patch has been bootstrapped and regtested
on x86_64. Please commit if it is OK for trunk.

Thanks,
Nikolai
2015-11-04  Nikolai Bozhenov  

	* sched-int.h (dump_rgn_dependencies_dot): Declare
	* sched-rgn.c (dump_rgn_dependencies_dot): New function
	* print-rtl.h (print_insn): Add prototype

diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h
index eb079af..f601d33 100644
--- a/gcc/print-rtl.h
+++ b/gcc/print-rtl.h
@@ -25,12 +25,14 @@ extern void print_rtl (FILE *, const_rtx);
 #endif
 
 extern void dump_value_slim (FILE *, const_rtx, int);
 extern void dump_insn_slim (FILE *, const rtx_insn *);
 extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *,
 			   int, int);
 extern void print_value (pretty_printer *, const_rtx, int);
 extern void print_pattern (pretty_printer *, const_rtx, int);
+extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose);
+
 extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block);
 extern const char *str_pattern_slim (const_rtx);
 
 #endif  // GCC_PRINT_RTL_H
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 86f5821..4600347 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1492,16 +1492,19 @@ extern void sched_rgn_local_finish (void);
 extern void sched_rgn_local_free (void);
 extern void extend_regions (void);
 extern void rgn_make_new_region_out_of_new_block (basic_block);
 
 extern void compute_priorities (void);
 extern void increase_insn_priority (rtx_insn *, int);
 extern void debug_rgn_dependencies (int);
 extern void debug_dependencies (rtx_insn *, rtx_insn *);
+extern void dump_rgn_dependencies_dot (FILE *);
+extern void dump_rgn_dependencies_dot (const char *);
+
 extern void free_rgn_deps (void);
 extern int contributes_to_priority (rtx_insn *, rtx_insn *);
 extern void extend_rgns (int *, int *, sbitmap, int *);
 extern void deps_join (struct deps_desc *, struct deps_desc *);
 
 extern void rgn_setup_common_sched_info (void);
 extern void rgn_setup_sched_infos (void);
 
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index eafb3fd..d744d83 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -58,16 +58,18 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-attr.h"
 #include "except.h"
 #include "params.h"
 #include "cfganal.h"
 #include "sched-int.h"
 #include "sel-sched.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "pretty-print.h"
+#include "print-rtl.h"
 
 #ifdef INSN_SCHEDULING
 
 /* Some accessor macros for h_i_d members only used within this file.  */
 #define FED_BY_SPEC_LOAD(INSN) (HID (INSN)->fed_by_spec_load)
 #define IS_LOAD_INSN(INSN) (HID (insn)->is_load_insn)
 
 /* nr_inter/spec counts interblock/speculative motion for the function.  */
@@ -2855,16 +2857,118 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail)
 		   DEP_NONREG (dep) ? "n" : "",
 		   DEP_MULTIPLE (dep) ? "m" : "");
   }
   fprintf (sched_dump, "\n");
 }
 
   fprintf (sched_dump, "\n");
 }
+
+/* Dump dependency graph for the current region to a file using dot syntax. */
+
+void
+dump_rgn_dependencies_dot (FILE *file)
+{
+  rtx_insn *head, *tail, *con, *pro;
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int bb;
+  pretty_printer pp;
+
+  pp.buffer->stream = file;
+  pp_printf (&pp, "digraph SchedDG {\n");
+
+  for (bb = 0; bb < current_nr_blocks; ++bb)
+{
+  // begin subgraph (basic block)
+  pp_printf (&pp, "subgraph cluster_block_%d {\n", bb);
+  pp_printf (&pp, "\t" "color=blue;" "\n");
+  pp_printf (&pp, "\t" "style=bold;" "\n");
+  pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb));
+
+  // setup head and tail (no support for EBBs)
+  gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
+  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+  tail = NEXT_INSN (tail);
+
+  // dump all insns
+  for (con = head; con != tail; con = NEXT_INSN (con))
+	{
+	  if (!INSN_P (con))
+	continue;
+
+	  // pretty print the insn
+	  pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con));
+	  pp_write_text_to_stream (&pp);
+	  print_insn (&pp, con, /*verbose=*/false);
+	  pp_write_text_as_dot_label_to_stream (&pp, /*for_record=*/true);
+	  pp_write_text_to_stream (&pp);
+
+	  // dump instruction attributes
+	  pp_printf (&pp, "|{ uid:%d | luid:%d | prio:%d }}\",shape=record]\n",
+		 INSN_UID (con), INSN_LUID (con), INSN_PRIORITY (con));
+
+	  /* dump all deps */
+	  FOR_EACH_DEP (con, SD_LIST_BACK, sd_it, dep)
+	{
+	  int weight = 0;
+	  const cha

Re: [ping] Change behavior of -fsched-verbose option

2015-11-05 Thread Nikolai Bozhenov

On 11/05/2015 12:26 PM, Bernd Schmidt wrote:

On 11/05/2015 09:11 AM, Nikolai Bozhenov wrote:

Please commit if it is OK for trunk.


Hmm, do you have a copyright assignment on file?



Yes I do have a copyright assignment for all past and future changes
to GCC (RT:828836).

Thanks,
Nikolai


Re: [sched] Dump dependency graph to a dot file

2015-11-05 Thread Nikolai Bozhenov

On 11/05/2015 12:18 PM, Kyrill Tkachov wrote:

Hi Nikolai,

On 05/11/15 08:29, Nikolai Bozhenov wrote:

Hi!

The attached patch adds a procedure to dump the scheduler's dependency
graph into a dot file. The patch has been bootstrapped and regtested
on x86_64. Please commit if it is OK for trunk.

Thanks,
Nikolai


A couple of style nits.

+  // begin subgraph (basic block)
+  pp_printf (&pp, "subgraph cluster_block_%d {\n", bb);
+  pp_printf (&pp, "\t" "color=blue;" "\n");
+  pp_printf (&pp, "\t" "style=bold;" "\n");
+  pp_printf (&pp, "\t" "label=\"BB #%d\";\n", BB_TO_BLOCK (bb));
+
+  // setup head and tail (no support for EBBs)
+  gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
+  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, 
&tail);

+  tail = NEXT_INSN (tail);
+
+  // dump all insns
+  for (con = head; con != tail; con = NEXT_INSN (con))
+{
+  if (!INSN_P (con))
+continue;
+
+  // pretty print the insn
+  pp_printf (&pp, "\t%d [label=\"{", INSN_UID (con));

Please use C-style comments i.e. /**/ instead of //.
Also, throughout the comments leave two empty spaces after a full stop
i.e. /* .  */.

You can use the check_GNU_style.sh script in the contrib/ directory on
your patches to highlight similar issues. For example:
$ ./contrib/check_GNU_style.sh ~/patches/dep-graph.patch

Dot, space, space, end of comment.
83:+/* Dump dependency graph for the current region to a file using 
dot syntax. */
166:+/* Dump dependency graph for the current region to a file using 
dot syntax. */


Sentences should end with a dot.  Dot, space, space, end of the comment.
127:+  /* dump all deps */

Cheers,
Kyrill



Thanks for your remarks, Kyrill!

I've uploaded an updated patch.

Nikolai
2015-11-04  Nikolai Bozhenov  

	* sched-int.h (dump_rgn_dependencies_dot): Declare
	* sched-rgn.c (dump_rgn_dependencies_dot): New function
	* print-rtl.h (print_insn): Add prototype

diff --git a/gcc/print-rtl.h b/gcc/print-rtl.h
index eb079af..f601d33 100644
--- a/gcc/print-rtl.h
+++ b/gcc/print-rtl.h
@@ -25,12 +25,14 @@ extern void print_rtl (FILE *, const_rtx);
 #endif
 
 extern void dump_value_slim (FILE *, const_rtx, int);
 extern void dump_insn_slim (FILE *, const rtx_insn *);
 extern void dump_rtl_slim (FILE *, const rtx_insn *, const rtx_insn *,
 			   int, int);
 extern void print_value (pretty_printer *, const_rtx, int);
 extern void print_pattern (pretty_printer *, const_rtx, int);
+extern void print_insn (pretty_printer *pp, const rtx_insn *x, int verbose);
+
 extern void rtl_dump_bb_for_graph (pretty_printer *, basic_block);
 extern const char *str_pattern_slim (const_rtx);
 
 #endif  // GCC_PRINT_RTL_H
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 86f5821..4600347 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1492,16 +1492,19 @@ extern void sched_rgn_local_finish (void);
 extern void sched_rgn_local_free (void);
 extern void extend_regions (void);
 extern void rgn_make_new_region_out_of_new_block (basic_block);
 
 extern void compute_priorities (void);
 extern void increase_insn_priority (rtx_insn *, int);
 extern void debug_rgn_dependencies (int);
 extern void debug_dependencies (rtx_insn *, rtx_insn *);
+extern void dump_rgn_dependencies_dot (FILE *);
+extern void dump_rgn_dependencies_dot (const char *);
+
 extern void free_rgn_deps (void);
 extern int contributes_to_priority (rtx_insn *, rtx_insn *);
 extern void extend_rgns (int *, int *, sbitmap, int *);
 extern void deps_join (struct deps_desc *, struct deps_desc *);
 
 extern void rgn_setup_common_sched_info (void);
 extern void rgn_setup_sched_infos (void);
 
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index eafb3fd..d744d83 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -58,16 +58,18 @@ along with GCC; see the file COPYING3.  If not see
 #include "insn-attr.h"
 #include "except.h"
 #include "params.h"
 #include "cfganal.h"
 #include "sched-int.h"
 #include "sel-sched.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "pretty-print.h"
+#include "print-rtl.h"
 
 #ifdef INSN_SCHEDULING
 
 /* Some accessor macros for h_i_d members only used within this file.  */
 #define FED_BY_SPEC_LOAD(INSN) (HID (INSN)->fed_by_spec_load)
 #define IS_LOAD_INSN(INSN) (HID (insn)->is_load_insn)
 
 /* nr_inter/spec counts interblock/speculative motion for the function.  */
@@ -2855,16 +2857,118 @@ void debug_dependencies (rtx_insn *head, rtx_insn *tail)
 		   DEP_NONREG (dep) ? "n" : "",
 		   DEP_MULTIPLE (dep) ? "m" : "");
   }
   fprintf (sched_dump, "\n");
 }
 
   fprintf (sched_dump, "\n");
 }
+
+/* Dump dependenc

Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Nikolai Bozhenov

On 10/28/2015 01:07 PM, Kyrill Tkachov wrote:

Hi all,

This RTL checking error occurs on aarch64 in 
aarch_accumulator_forwarding when processing an msubsi insn

with subregs:
(insn 15 14 16 3 (set (reg/v:SI 78 [ i ])
(minus:SI (subreg:SI (reg/v:DI 76 [ aul ]) 0)
(mult:SI (subreg:SI (reg:DI 83) 0)
(subreg:SI (reg:DI 75 [ _20 ]) 0 schedice.c:10 357 
{*msubsi}


The register_operand predicate for that pattern allows subregs (I 
think correctly).
The code in aarch_accumulator_forwarding doesn't take that into 
account and ends up

taking a REGNO of a SUBREG, causing a checking error.

This patch fixes that by stripping the subregs off the accumulator rtx 
before

checking that the inner expression is a REG and taking its REGNO.

The testcase now works fine with an aarch64-none-elf toolchain 
configure for RTL checking.


The testcase is taken verbatim from the BZ entry for PR 68088.
Since this function is shared between arm and aarch64 I've 
bootstrapped and tested it on both

and I'll need ok's for both ports.

Ok for trunk?

Thanks,
Kyrill

2015-10-28  Kyrylo Tkachov  

PR target/68088
* config/arm/aarch-common.c (aarch_strip_subreg): New function.
(aarch_accumulator_forwarding): Strip subregs from accumulator rtx
when appropriate.

2015-10-28  Kyrylo Tkachov  

* gcc.target/aarch64/pr68088_1.c: New test.


Hi!

I faced the same issue but I had somewhat different RTL for the consumer:

(insn 20 15 21 2 (set (reg/i:SI 0 r0)
(minus:SI (subreg:SI (reg:DI 117) 4)
(mult:SI (reg:SI 123)
(reg:SI 114 gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
cycle in this case?

Thanks,
Nikolai


Re: Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Nikolai Bozhenov

On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi!

I faced the same issue but I had somewhat different RTL for the consumer:

 (insn 20 15 21 2 (set (reg/i:SI 0 r0)
 (minus:SI (subreg:SI (reg:DI 117) 4)
 (mult:SI (reg:SI 123)
 (reg:SI 114 gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
cycle in this case?

If the accumulator can be forwarded (i.e. a SImode register), there isn't a 
reason why a subreg:SI (reg:DI) will not get forwarded.

The subreg:SI is an artifact before register allocation, thus it's a 
representation issue that the patch is fixing here unless I misunderstand your 
question.


I mean, in my example it is not the multiplication result that is
forwarded but its upper part. So, shouldn't we check that offset in a
subreg expression is zero? Or is it ok to forward only the upper part
of a multiplication?

Thanks,
Nikolai


Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Nikolai Bozhenov

On 11/06/2015 08:09 PM, Kyrill Tkachov wrote:


On 06/11/15 17:07, Nikolai Bozhenov wrote:

On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi!

I faced the same issue but I had somewhat different RTL for the 
consumer:


 (insn 20 15 21 2 (set (reg/i:SI 0 r0)
 (minus:SI (subreg:SI (reg:DI 117) 4)
 (mult:SI (reg:SI 123)
 (reg:SI 114 gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
cycle in this case?
If the accumulator can be forwarded (i.e. a SImode register), there 
isn't a reason why a subreg:SI (reg:DI) will not get forwarded.


The subreg:SI is an artifact before register allocation, thus it's a 
representation issue that the patch is fixing here unless I 
misunderstand your question.



I mean, in my example it is not the multiplication result that is
forwarded but its upper part. So, shouldn't we check that offset in a
subreg expression is zero? Or is it ok to forward only the upper part
of a multiplication?


Could you please post the full RTL instruction we're talking about 
here as it appears in the scheduler dump?

So that we're all on the same page about which case we're talking about.



Sure. I'm talking about a sequence like this:

(insn 8 13 11 2 (set (reg:DI 117)
(mult:DI (zero_extend:DI (reg:SI 116))
(zero_extend:DI (reg:SI 118 test.c:2 54 {*umulsidi3_v6}
 (expr_list:REG_DEAD (reg:SI 118)
(expr_list:REG_DEAD (reg:SI 116)
(expr_list:REG_EQUAL (mult:DI (zero_extend:DI (reg:SI 116))
(const_int 1374400 [0x14f8c0]))
(nil)
(insn 11 8 12 2 (set (reg:DI 120)
(mult:DI (zero_extend:DI (subreg:SI (reg:DI 117) 4))
(zero_extend:DI (reg:SI 121 test.c:2 54 {*umulsidi3_v6}
 (expr_list:REG_DEAD (reg:SI 121)
(expr_list:REG_EQUAL (mult:DI (zero_extend:DI (subreg:SI 
(reg:DI 117) 4))

(const_int 3435973837 [0xcccd]))
(nil
(insn 12 11 20 2 (set (reg:SI 114)
(lshiftrt:SI (subreg:SI (reg:DI 120) 4)
(const_int 3 [0x3]))) test.c:2 126 {*arm_shiftsi3}
 (expr_list:REG_DEAD (reg:DI 120)
(expr_list:REG_EQUAL (udiv:SI (subreg:SI (reg:DI 117) 4)
(const_int 10 [0xa]))
(nil
(insn 20 12 21 2 (set (reg/i:SI 0 r0)
(minus:SI (subreg:SI (reg:DI 117) 4)
(mult:SI (reg:SI 123)
(reg:SI 114 test.c:3 48 {*mulsi3subsi}
 (expr_list:REG_DEAD (reg:SI 123)
(expr_list:REG_DEAD (reg:DI 117)
(expr_list:REG_DEAD (reg:SI 114)
(nil)

The last insn is a MLA-like operation which is tested by
aarch_accumulator_forwarding if it is a valid forwarding consumer.

Thanks,
Nikolai


Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-09 Thread Nikolai Bozhenov



On 11/06/2015 08:16 PM, Kyrill Tkachov wrote:


On 06/11/15 17:09, Kyrill Tkachov wrote:


On 06/11/15 17:07, Nikolai Bozhenov wrote:

On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi!

I faced the same issue but I had somewhat different RTL for the 
consumer:


 (insn 20 15 21 2 (set (reg/i:SI 0 r0)
 (minus:SI (subreg:SI (reg:DI 117) 4)
 (mult:SI (reg:SI 123)
 (reg:SI 114 gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
cycle in this case?
If the accumulator can be forwarded (i.e. a SImode register), there 
isn't a reason why a subreg:SI (reg:DI) will not get forwarded.


The subreg:SI is an artifact before register allocation, thus it's 
a representation issue that the patch is fixing here unless I 
misunderstand your question.



I mean, in my example it is not the multiplication result that is
forwarded but its upper part. So, shouldn't we check that offset in a
subreg expression is zero? Or is it ok to forward only the upper part
of a multiplication?


Could you please post the full RTL instruction we're talking about 
here as it appears in the scheduler dump?

So that we're all on the same page about which case we're talking about.



Sorry, missed the above instruction.
This subreg is just a pre-register allocation representation of the 
instruction and will go away after reload.
This particular function only really has a real effect in post-reload 
scheduling as it's only there when the final

register numbers are known.



I see. aarch_accumulator_forwarding always returns 0 for virtual
registers. But isn't it overly pessimistic to assume that accumulator
forwarding is never possible at sched1? I wonder if it would be better
to be more optimistic about register allocation outcome. I mean, in
case of virtual registers we could assume forwarding from A to B if B
is the only consumer of A's result. Something like this:

if (REGNO (dest) >= FIRST_VIRTUAL_REGISTER
|| REGNO (accumulator) >= FIRST_VIRTUAL_REGISTER)
  return (DF_REG_USE_COUNT (REGNO (dest)) == 1)
&& (DF_REF_INSN (DF_REG_USE_CHAIN (REGNO (dest))) == consumer);
else
  return REGNO (dest) == REGNO (accumulator);

Thanks,
Nikolai