Re: making the new if-converter not mangle IR that is already vectorizer-friendly

2015-08-05 Thread Richard Biener
On Fri, Jul 31, 2015 at 7:43 PM, Abe  wrote:
> [Abe wrote:]
>>>
>>> TTBOMK, that "expense" is temporary, pending fixes/improvement
>>> to the new converter.  IOW, I don`t think any of the relevant
>>> regressions are as a result of "essential complexity".
>
>
> [Richard wrote:]
>>
>> I don't see how you can fix this without scatter support in the CPU.
>
>
> Well, for some test cases, I can see in the GIMPLE dumps that
> a/the scratchpad wasn`t needed here and wasn`t needed there.
> IOW, the converter that "knows how" to indirect via scratchpads
> is doing so _always_ rather than only when it`s truly needed.
> This is causing additional indirection which the vectorizer
> rejects.  Making the new converter "smarter" should fix this.
>
> Overall, however, I see your point and agree with it.  Gather support
> is needed for the loads and scatter for the stores, in the general case.

You skipped my analysis that even with scatter/gather support as
implemented by AVX on x86_64 vectorization is not possible because
of how those instructions are designed.

Note that the old converter is already "smarter".  Note that the old
converter knows to use masked loads/stores which is a closer
match to the original code and which will be faster and trivially
vectorizable.  Also note that all x86_64 CPUs that suport gather
also support masked loads and stores.

I know no other CPU architecture with scatter/gather support
(and doubt they do not at the same time implement masked loads/stores)

Richard.

>
> [Richard wrote:]
>>
>> But then you should restrict the if-conversion
>> to targets that can do gather vectorization.
>
>
> I think adding this requirement -- to generalize, _two_ req.s:
> _gather_ support for _load_ if conversion with [a] scratchpad[s],
> and scatter support for _store_ if conversion with [a] scratchpad[s] --
> is a reasonable and probably wise goal.  Doing so should eventually
> enable us to conclude automatically "it is safe to enable this kind of
> if conversion for/on this target" and enable the relevant transformations
> automatically when given e.g. "-O3" and autovectorization.
>
>
> [Abe wrote]
>>>
>>> As of now, the relevant "improvement" in/due-to the new
>>> converter is not always effective [...] We intend to fix this.
>
>
> [Richard wrote:]
>>
>> This should be fixed before the patch can go in.
>
>
> The current plan is to add new switches that control which [if any]
> if converter is active, and to have the old one _and_ the new one
> be included in GCC trunk for the time being.  Then I/we can have
> more eyes on the new code, helping with bugs etc., without having
> any impact at all on code that only either is compiled with
> the old flags [which shall retain their old semantics
> for now] or with no relevant flags whatsoever.
>
> Is that plan OK?
>
>
> [Richard wrote:]
>>
>> There is the existing mechanism to if-convert only a copy
>> of the loop guarded with IFN_LOOP_VECOTRIZED so the
>> cost model is that of the vectorizer when it tries to vectorize
>> the if-converted copy.  If it doesn't vectorize that copy the
>> non-if-converted loop is preserved, otherwise it is deleted.
>
>
> I think I found the code to which you are referring here,
> and I tried to use it [so as to not reduplicate efforts],
> but my first effort at doing so was unsuccessful AFAIK.
> Trying again -- this time with help from Jakub,
> I hope -- is on my work agenda.
>
>
> [Richard wrote:]
>>
>> I've argued in the past that the GIMPLE if-converter should
>> always go that path (thus strictly be a vectorization enabler only).
>
>
> If I understand correctly what you mean by that, then I agree.
>
>
>> if-conversion is already on by default if vectorizing.
>
>> We only do not introduce store-data-races by default.
>
> Thanks.
>
> Do you know if the e.g. GCC 5.1 [and 5.2, if bug fix not included]
> release[s] were "safe" even without your recent bug fix [July 10]
> for code _not_ compiled with "-ftree-loop-if-convert-stores"?
> I ask b/c the test case I wrote which was able to cause GCC to
> generate crashing code [for a correct-under-low-optimization test
> case] intentionally had a crash due to dereferencing a null pointer
> in a _load_.  IIRC, getting GCC to generate that bad code required
> "-ftree-loop-if-convert-stores" despite it being a faulty _load_.
>
> FYI: I have made some progress on the SPEC testing, but I do not
> yet have the work at a point where I`m ready to report numbers.
>
>
> Regards,
>
> Abe


July/August GNU Toolchain update

2015-08-05 Thread Nick Clifton
Hi Guys,

  Sorry for the delay in bringing you this update; I have been very
  busy in the last few months.  Anyway the highlights of the changes
  to the GNU Toolchain are as follows:

   * The GDB 7.10 branch has been created.  Expect a release soon.

   * Support for tracepoints on aarch64-linux was added in GDBserver.

   * A point update of the FSF Binutils - 2.25.1 - has been released.
 No new features but lots of important bug fixes.

   * GCC 5.2 has been released.  This is a bug-fix update to the
 previous 5.1 release.

   * The linker now has experimental support for the removal of
 redundant sections from COFF and PE format files.  This is
 enabled via the --gc-sections linker command line option.

   * A new linker command line option --require-defined= has
 been added.  This behaves in much the same way as the
 --undefined= option in that it creates a reference to an
 undefined symbol that should force a library to be pulled into
 the link or garbage collection not to remove a specific section.
 The difference between --require-defined and --undefined is that
 with the former the linker will issue an error message if the
 specified symbol has not been defined by the end of the link.

  * The --disassemble (or -d) and --disassemble-all (or -D) options to
objdump have received a subtle change.  With -d objdump will
assume that any symbols present in a code section occur on the
boundary between instructions and so it will refuse to disassemble
across such a boundary.  With -D however this assumption is
suppressed.  This means that it is possible for the output of
-d and -D to differ if, for example, data is stored in a code
section.

  * GCC has a couple of new warning options available:

  -Wframe-address

This generates a warning when the __builtin_frame_address or
__builtin_return_address are called with an argument greater than
0.  Such calls may return indeterminate values or crash the
program.

  -Wtautological-compare

This generates a warning if a self-comparison always evaluates to
true or false.  This detects various mistakes such as:

  int i = 1;

  if (i > i) { ... }

  * With the Nios II port of GCC it is now possible to specify the
target architecture variant with -march=r1 or -march=r2.  It is
also possible to explicitly enable or disable the use of the r2
BMX (bit manipulation) and CDX (code density) instructions via the
use of the new -mbmx -mno-bmx -mcdx and -mno-cdx options.

Cheers
  Nick


[PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register to spill with O3 fsched-pressure fschedule-insns

2015-08-05 Thread H.J. Lu
Since ira_implicitly_set_insn_hard_regs may be called outside of
ira-lives.c, it can't use the local variable, preferred_alternatives.
This patch adds an alternative_mask argument to
ira_implicitly_set_insn_hard_regs.

OK for master and 5 branch if there are no regressions on Linux/x86-64?

H.J.
---
gcc/

PR rtl-optimization/67029
* ira-color.c: Include "recog.h" before including "ira-int.h".
* target-globals.c: Likewise.
* ira-lives.c (ira_implicitly_set_insn_hard_regs): Add an
adds an alternative_mask argument and use it instead of
preferred_alternatives.
* ira.h (ira_implicitly_set_insn_hard_regs): Moved to ...
* ira-int.h (ira_implicitly_set_insn_hard_regs): Here.
* sched-deps.c: Include "ira-int.h" after including "ira.h".
(sched_analyze_insn): Update call to
ira_implicitly_set_insn_hard_regs.
* sel-sched.c: Include "ira-int.h" after including "ira.h".
(implicit_clobber_conflict_p): Update call to
ira_implicitly_set_insn_hard_regs.

gcc/testsuite/

PR rtl-optimization/67029
* gcc.dg/pr67029.c: New test.
---
 gcc/ira-color.c|  1 +
 gcc/ira-int.h  |  2 ++
 gcc/ira-lives.c|  4 ++--
 gcc/ira.h  |  1 -
 gcc/sched-deps.c   |  4 +++-
 gcc/sel-sched.c|  4 +++-
 gcc/target-globals.c   |  1 +
 gcc/testsuite/gcc.dg/pr67029.c | 14 ++
 8 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr67029.c

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 74d2c2e..c8f33ed 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "ira.h"
 #include "alloc-pool.h"
+#include "recog.h"
 #include "ira-int.h"
 
 typedef struct allocno_hard_regs *allocno_hard_regs_t;
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index a7c0f40..a993dfc 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -1041,6 +1041,8 @@ extern void ira_debug_live_ranges (void);
 extern void ira_create_allocno_live_ranges (void);
 extern void ira_compress_allocno_live_ranges (void);
 extern void ira_finish_allocno_live_ranges (void);
+extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *,
+  alternative_mask);
 
 /* ira-conflicts.c */
 extern void ira_debug_conflicts (bool);
diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c
index 1cb05c2..011d513 100644
--- a/gcc/ira-lives.c
+++ b/gcc/ira-lives.c
@@ -831,7 +831,8 @@ single_reg_operand_class (int op_num)
might be used by insn reloads because the constraints are too
strict.  */
 void
-ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set)
+ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set,
+  alternative_mask preferred)
 {
   int i, c, regno = 0;
   enum reg_class cl;
@@ -854,7 +855,6 @@ ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set)
  mode = (GET_CODE (op) == SCRATCH
  ? GET_MODE (op) : PSEUDO_REGNO_MODE (regno));
  cl = NO_REGS;
- alternative_mask preferred = preferred_alternatives;
  for (; (c = *p); p += CONSTRAINT_LEN (c, p))
if (c == '#')
  preferred &= ~ALTERNATIVE_BIT (0);
diff --git a/gcc/ira.h b/gcc/ira.h
index 504b5e6..881674b 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -192,7 +192,6 @@ extern void ira_init (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, machine_mode);
 extern void ira_set_pseudo_classes (bool, FILE *);
-extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 extern void ira_expand_reg_equiv (void);
 extern void ira_update_equiv_info_by_shuffle_insn (int, int, rtx_insn *);
 
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 3ac66e8..0a8dcb0 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "cselib.h"
 #include "ira.h"
+#include "ira-int.h"
 #include "target.h"
 
 #ifdef INSN_SCHEDULING
@@ -2891,7 +2892,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, 
rtx_insn *insn)
 
   extract_insn (insn);
   preprocess_constraints (insn);
-  ira_implicitly_set_insn_hard_regs (&temp);
+  alternative_mask prefrred = get_preferred_alternatives (insn);
+  ira_implicitly_set_insn_hard_regs (&temp, prefrred);
   AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs);
   IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp);
 }
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index ec2ab05..1860444 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtlhooks-def.h"
 #include "emit-rtl.h"
 #include "ira.h"
+#include "ira-int.h"
 #include "rtl-iter.h"
 
 #ifdef INSN_SCHEDULING
@@ -2104,7 +2105,8 @@ implic

gcc-4.9-20150805 is now available

2015-08-05 Thread gccadmin
Snapshot gcc-4.9-20150805 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150805/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 226647

You'll find:

 gcc-4.9-20150805.tar.bz2 Complete GCC

  MD5=ec5f377410e6ccd5c6ed08b766d3ffad
  SHA1=008fd20869c46941b7e069588d5e7c5c717137ee

Diffs from 4.9-20150729 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.