date:20171205

Re: [PATCH] emit a trap for buffer overflow in placement new

2017-12-05 Thread Marc Glisse


On Mon, 4 Dec 2017, Martin Sebor wrote:


On 12/04/2017 03:44 PM, Marc Glisse wrote:

On Mon, 4 Dec 2017, Martin Sebor wrote:


The -Wplacement-new option warns for buffer overflow in placement
new expressions with objects of constant sizes, but because it's
implemented completely in the C++ front end it misses the more
interesting non-constant sizes.

The attached patch instruments both forms of operator placement
new to emit a trap when __builtin_object_size() determines that
the pointer points to an object less than the specified number
of bytes.  This is done only when _FORTIFY_SOURCE is defined
to a non-zero value.  This makes it possible to prevent buffer
overflow in most of the same cases as in built-ins like strcpy,
though without warnings when the size is nor a C++ constant
integer expression.

On x86_64-linux it passes testing with no apparent regressions.
Can anyone think of problems with this solution?  If not, given
its simplicity, would it be appropriate even at this stage?


AFAIK, one can call this operator new manually on any pointer, including
one-past-the-end pointers and null pointers. It is only with new
expressions that the limitation comes in (because it runs a constructor
afterwards). Not that people often do that...


Hmm, yes, there don't seem to be any requirements on calling
the operator by itself.  That's too bad.  I was going to move
the placement new checking into the middle-end to improve
the detection and avoid some false positives but the operator
disappears too early, before the size of the expression that's
being stuffed into the destination is known.


What's wrong with generating the same code you wanted to put in operator 
new from the front-end, next to the call to operator new, when the 
front-end handles a new expression?



The only other solution that comes to mind to detect non-constant
overflow is to do something like:

 void* operator new (size_t n, void *p)
 {
   if (__builtin_object_size (p, 0) < n)
__builtin_warn ("buffer overflow in placement new");
   return p;
 }

and emit the warning from __builtin_warn during expansion.  That
should work and be usable in other contexts as well (e.g., to
implement overflow and other error detection in user-defined
functions).  It would be kind of like the middle-end form of
Clang attribute diagnose_if.


I may have misunderstood, but that seems to have the same issue of 
applying to all calls to operator new instead of just new expressions.



Still, I wonder if tightening up the standard to require that
the pointer point to an object of at least n bytes in size would
run into problems or be met with objections.


--
Marc Glisse

[PING 2] Ability to remap file names in FILE, etc (PR other/70268)

2017-12-05 Thread Boris Kolpackov

Hi,

I would like to again ping this patch:

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01451.html

While the patch touches a few places, it is mostly reshuffling,
generalizing, and fixing existing code. It also includes tests
for the new functionality. Seeing that there is interest[1] in
reproducible builds, I think it will be useful to have this
patch considered for GCC 8, if possible.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70268

Thanks,
Boris

Re: [PATCH][gimple-interchange] Add reduction validity check

2017-12-05 Thread Richard Biener

On Mon, 4 Dec 2017, Richard Biener wrote:

> On Mon, 4 Dec 2017, Bin.Cheng wrote:
> 
> > On Mon, Dec 4, 2017 at 1:11 PM, Richard Biener  wrote:
> > >
> > > I've noticed we perform FP reduction association without the required
> > > checks for associative math.  I've added
> > > gcc.dg/tree-ssa/loop-interchange-1b.c to cover this.
> > >
> > > I also noticed we happily interchange a loop with a reduction like
> > >
> > >  sum = a[i] - sum;
> > >
> > > where a change in order of elements isn't ok.  Unfortunately bwaves
> > > is exactly a case where single_use != next_def (tried to simply remove
> > > that case for now), because reassoc didn't have a chance to fix the
> > > operand order.  Thus this patch exports the relevant handling from
> > > the vectorizer (for stage1 having a separate infrastructure gathering /
> > > analyzing of reduction/induction infrastructure would be nice...)
> > > and uses it from interchange.  We then don't handle
> > > gcc.dg/tree-ssa/loop-interchange-4.c anymore (similar vectorizer
> > > missed-opt is PR65930).  I didn't bother to split up the vectorizer
> > > code further to implement relaxed validity checking but simply XFAILed
> > > this testcase.
> > >
> > > Earlier I simplified allocation stuff in the main loop which is why
> > > this part is included in this patch.
> > >
> > > Bootstrap running on x86_64-unknown-linux-gnu.
> > >
> > > I'll see to craft a testcase with the sum = a[i] - sum; mis-handling.
> > >
> > > Ok?
> > Sure.
> > Just for the record.  There is also similar associative check in
> > predcom.  As you suggested, a path extraction/checking interface for
> > associative checking would be great, given we have multiple users now.
> 
> Yeah.  Note for interchange we don't need associativeness in the sense
> as implemented by associative_tree_code, we need left-associativeness
> which also MINUS_EXPR provides.  So my check isn't perfect.  I guess
> we instead need
> 
>  associative_tree_code ()
>  || (code == MINUS_EXPR
>  && use_p->use == gimple_assign_rhs1_ptr (single_use))
> 
> where we could also allow RDIV_EXPR and other left-associative
> tree codes (but check_reduction_path would do the wrong thing
> with those at the moment but it has MINUS_EXPR handling that
> would support sum = x - (y - sum) which the above rejects).
> 
> So I'm retesting with
> 
>   /* Check the reduction operation.  We require a left-associative 
> operation.  
>  For FP math we also need to be allowed to associate operations.  */
>   enum tree_code code = gimple_assign_rhs_code (single_use);
>   if (gassign *ass = dyn_cast  (single_use))
> {
>   enum tree_code code = gimple_assign_rhs_code (ass);
>   if (! (associative_tree_code (code)
>  || (code == MINUS_EXPR
>  && use_p->use == gimple_assign_rhs1_ptr (ass)))
>   || (FLOAT_TYPE_P (TREE_TYPE (var))
>   && ! flag_associative_math))
> return false;
> }
>   else
> return false;
> 
> which is more restrictive than before.

And here's two testcases, one for sum = x - sum and one for division
by sum which shows wrong-code without this patch.

Richard.

2017-12-05  Richard Biener  

* gcc.dg/tree-ssa/loop-interchange-12.c: New testcase.
* gcc.dg/tree-ssa/loop-interchange-13.c: Likewise.

Index: gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-12.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-12.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-12.c (working copy)
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
+
+/* Copied from graphite/interchange-4.c */
+
+#define DEBUG 0
+#if DEBUG
+#include 
+#endif
+
+unsigned u[1024];
+
+static void __attribute__((noinline,noclone,noipa))
+foo (int N, unsigned *res)
+{
+  int i, j;
+  unsigned sum = 1;
+  for (i = 0; i < N; i++)
+for (j = 0; j < N; j++)
+  sum = u[i + 2 * j] / sum;
+
+  *res = sum;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j;
+  unsigned res;
+
+  u[0] = 10;
+  u[1] = 200;
+  u[2] = 10;
+  u[3] = 10;
+
+  foo (2, &res);
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 0)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "is interchanged" "linterchange"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-13.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-13.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-13.c (working copy)
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
+
+/* Copied from graphite/interchange-4.c */
+
+#define DEBUG 0
+#if DEBUG
+#include 
+#endif
+
+unsigned u[1024];
+
+static void __attribute__((noinline,noclone,noipa))
+foo (int N, int M, unsigned *res)
+{
+  int i, j

Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-12-05 Thread Jakub Jelinek

On Tue, Nov 21, 2017 at 12:59:46PM +0100, Martin Liška wrote:
> On 10/16/2017 10:39 PM, Martin Liška wrote:
> > Hi.
> > 
> > All nits included in mainline review request I've just done:
> > https://reviews.llvm.org/D38971
> > 
> > Martin
> 
> Hi.
> 
> There's updated version of patch where I added new test-cases and it's rebased
> with latest version of libsanitizer changes. This is subject for libsanitizer 
> review process.

A slightly modified version has been finally accepted upstream, here is the
updated patch with that final version cherry-picked, plus small changes to
make it apply after the POINTER_DIFF_EXPR changes, and a lot of testsuite
tweaks, so that we don't run it 7 times with -O0, but with different
optimization levels, cleanups etc.
The most important change I've done in the testsuite was pointer-subtract-2.c
used -fsanitize=address,pointer-subtract, but the function was actually
doing pointer comparison.  Guess that needs to be propagated upstream at
some point.  Another thing is that in pointer-compare-1.c where you test
p - 1, p and p, p - 1 on the global variables, we need to ensure there is
some other array before it, otherwise we run into the issue that there is no
red zone before the first global (and when optimizing, global objects seems
to be sorted by decreasing size).

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2017-12-05  Martin Liska  
Jakub Jelinek  

gcc/
* doc/invoke.texi: Document the options.
* flag-types.h (enum sanitize_code): Add
SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
* ipa-inline.c (sanitize_attrs_match_for_inline_p): Add handling
of SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
* opts.c: Define new sanitizer options.
* sanitizer.def (BUILT_IN_ASAN_POINTER_COMPARE): Likewise.
(BUILT_IN_ASAN_POINTER_SUBTRACT): Likewise.
gcc/c/
* c-typeck.c (pointer_diff): Add new argument and instrument
pointer subtraction.
(build_binary_op): Similar for pointer comparison.
gcc/cp/
* typeck.c (pointer_diff): Add new argument and instrument
pointer subtraction.
(cp_build_binary_op): Create compound expression if doing an
instrumentation.
gcc/testsuite/
* c-c++-common/asan/pointer-compare-1.c: New test.
* c-c++-common/asan/pointer-compare-2.c: New test.
* c-c++-common/asan/pointer-subtract-1.c: New test.
* c-c++-common/asan/pointer-subtract-2.c: New test.
* c-c++-common/asan/pointer-subtract-3.c: New test.
* c-c++-common/asan/pointer-subtract-4.c: New test.
libsanitizer/
* asan/asan_descriptions.cc: Cherry-pick upstream r319668.
* asan/asan_descriptions.h: Likewise.
* asan/asan_report.cc: Likewise.
* asan/asan_thread.cc: Likewise.
* asan/asan_thread.h: Likewise.

--- gcc/c/c-typeck.c.jj 2017-12-01 19:42:09.504222186 +0100
+++ gcc/c/c-typeck.c2017-12-04 22:41:50.680290866 +0100
@@ -95,7 +95,7 @@ static tree lookup_field (tree, tree);
 static int convert_arguments (location_t, vec, tree,
  vec *, vec *, tree,
  tree);
-static tree pointer_diff (location_t, tree, tree);
+static tree pointer_diff (location_t, tree, tree, tree *);
 static tree convert_for_assignment (location_t, location_t, tree, tree, tree,
enum impl_conv, bool, tree, tree, int);
 static tree valid_compound_expr_initializer (tree, tree);
@@ -3768,10 +3768,11 @@ parser_build_binary_op (location_t locat
 }

 /* Return a tree for the difference of pointers OP0 and OP1.
-   The resulting tree has type ptrdiff_t.  */
+   The resulting tree has type ptrdiff_t.  If POINTER_SUBTRACT sanitization is
+   enabled, assign to INSTRUMENT_EXPR call to libsanitizer.  */

 static tree
-pointer_diff (location_t loc, tree op0, tree op1)
+pointer_diff (location_t loc, tree op0, tree op1, tree *instrument_expr)
 {
   tree restype = ptrdiff_type_node;
   tree result, inttype;
@@ -3815,6 +3816,17 @@ pointer_diff (location_t loc, tree op0,
 pedwarn (loc, OPT_Wpointer_arith,
 "pointer to a function used in subtraction");

+  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
+{
+  gcc_assert (current_function_decl != NULL_TREE);
+
+  op0 = save_expr (op0);
+  op1 = save_expr (op1);
+
+  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
+  *instrument_expr = build_call_expr_loc (loc, tt, 2, op0, op1);
+}
+
   /* First do the subtraction, then build the divide operator
  and only convert at the very end.
  Do not do default conversions in case restype is a short type.  */
@@ -3825,8 +3837,8 @@ pointer_diff (location_t loc, tree op0,
  space, cast the pointers to some larger integer type and do the
  computations in that type.  */
   if (TYPE_PRECISION (inttype) > TYPE_PRECISION (TREE_TYPE (op0)))
-

Re: [PATCH] ARM testsuite: force hardfp for addr-modes-float.c

2017-12-05 Thread Kyrill Tkachov



On 01/12/17 15:43, Charles Baylis wrote:

On 30 November 2017 at 15:56, Kyrill  Tkachov
 wrote:


So is it the case that you don't run any arm tests that include arm_neon.h
in your configuration?

No, it is only the case that any arm test which includes arm_neon.h
(in fact, any system header) *and* uses dg-add-options
-mfloat-abi=hard fails on my configuration (And -mfloat-abi=softfp
fails in my configurations which default to hardfp). [1]


Yes, you're right.


The only test which currently has -mfloat-abi=hard and #include
 is gcc.target/arm/pr51534.c, and it FAILs in my
arm-unknown-linux-gnueabi configuration.


If so, then I would be fine with leaving this test unsupported on this
configuration.

I don't see why, when the test can simply be fixed with
attribute((pcs)), but if you prefer I can respin the patch
accordingly.


No need, I think adding the pcs attribute to this test is the simpler 
solution here.

So your patch is ok as is, sorry for the noise...


By the way, I notice that in addr-modes-float.c the arm_neon_ok check is
placed before the dg-add-options.
I don't remember the arcane rules exactly, but I think the effective target
check should go before it, so that the test gets skipped properly.

OK, I can respin the patch with that change.


This can be done as a follow up if you want, or if you make this change 
as part of this patch

they are pre-approved.

Thanks,
Kyrill


[1] full details as follows:

$ arm-unknown-linux-gnueabi-gcc -v
COLLECT_GCC=/home/cbaylis/tools//tools-arm-unknown-linux-gnueabi-git/bin/arm-unknown-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/home/cbaylis/tools/tools-arm-unknown-linux-gnueabi-git/bin/../libexec/gcc/arm-unknown-linux-gnueabi/8.0.0/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: /home/cbaylis/srcarea/gcc/gcc-git/configure
--prefix=/home/cbaylis/tools//tools-arm-unknown-linux-gnueabi-git
--target=arm-unknown-linux-gnueabi --enable-languages=c,c++
--with-sysroot=/home/cbaylis/tools//sysroot-arm-unknown-linux-gnueabi-git
--with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
--with-float=softfp --with-mode=thumb
Thread model: posix
gcc version 8.0.0 20171124 (experimental) (GCC)

$ cat tn.c
#include 

$ arm-unknown-linux-gnueabi-gcc -mfloat-abi=hard tn.c
In file included from
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/features.h:447,
  from
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/bits/libc-header-start.h:33,
  from
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/stdio.h:27,
  from tn.c:2:
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/gnu/stubs.h:10:11:
fatal error: gnu/stubs-hard.h: Dosiero aŭ dosierujo ne ekzistas
  # include 
^~
compilation terminated.

[PATCH branch/gimple-interchange]obvious cleanup

2017-12-05 Thread Bin Cheng

Hi,
This is an obvious cleanup patch doing variable renaming, function inlining.
Is it OK?

Thanks,
bin
2017-12-05  Bin Cheng  

* gimple-loop-interchange.cc (struct induction): Rename fields.
(dump_induction, loop_cand::analyze_induction_var): Update uses.
(loop_cand::undo_simple_reduction): Ditto.
(tree_loop_interchange::map_inductions_to_loop): Ditto.
(tree_loop_interchange::can_interchange_loops): Delete.
(tree_loop_interchange::interchange): Inline can_interchange_loops.From 734217d0879c246a139d4c55ecd65837b2fa6077 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Tue, 5 Dec 2017 10:00:06 +
Subject: [PATCH] cleanup-1

---
 gcc/gimple-loop-interchange.cc | 49 +-
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 1f46509..8c396ab 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -100,10 +100,11 @@ typedef struct induction
 {
   /* IV itself.  */
   tree var;
-  /* Initializer.  */
-  tree init;
-  /* IV's base and step part of SCEV.  */
-  tree base;
+  /* IV's initializing value, which is the init arg of the IV PHI node.  */
+  tree init_val;
+  /* IV's initializing expr, which is (the expanded result of) init_val.  */
+  tree init_expr;
+  /* IV's step.  */
   tree step;
 }* induction_p;
 
@@ -161,7 +162,7 @@ dump_induction (struct loop *loop, induction_p iv)
   fprintf (dump_file, "  Induction:  ");
   print_generic_expr (dump_file, iv->var, TDF_SLIM);
   fprintf (dump_file, " = {");
-  print_generic_expr (dump_file, iv->base, TDF_SLIM);
+  print_generic_expr (dump_file, iv->init_expr, TDF_SLIM);
   fprintf (dump_file, ", ");
   print_generic_expr (dump_file, iv->step, TDF_SLIM);
   fprintf (dump_file, "}_%d\n", loop->num);
@@ -742,8 +743,8 @@ loop_cand::analyze_induction_var (tree var, tree chrec)
 {
   struct induction *iv = XCNEW (struct induction);
   iv->var = var;
-  iv->init = init;
-  iv->base = chrec;
+  iv->init_val = init;
+  iv->init_expr = chrec;
   iv->step = build_int_cst (TREE_TYPE (chrec), 0);
   m_inductions.safe_push (iv);
   return true;
@@ -757,8 +758,8 @@ loop_cand::analyze_induction_var (tree var, tree chrec)
 
   struct induction *iv = XCNEW (struct induction);
   iv->var = var;
-  iv->init = init;
-  iv->base = CHREC_LEFT (chrec);
+  iv->init_val = init;
+  iv->init_expr = CHREC_LEFT (chrec);
   iv->step = CHREC_RIGHT (chrec);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -938,14 +939,14 @@ loop_cand::undo_simple_reduction (reduction_p re, bitmap dce_seeds)
   /* Find all stmts on which expression "MEM_REF[idx]" depends.  */
   find_deps_in_bb_for_stmt (&stmts, gimple_bb (re->consumer), re->consumer);
   /* Because we generate new stmt loading from the MEM_REF to TMP.  */
-  tree tmp = copy_ssa_name (re->var);
+  tree cond, tmp = copy_ssa_name (re->var);
   stmt = gimple_build_assign (tmp, re->init_ref);
   gimple_seq_add_stmt (&stmts, stmt);
 
   /* Init new_var to MEM_REF or CONST depending on if it is the first
 	 iteration.  */
   induction_p iv = m_inductions[0];
-  tree cond = fold_build2 (NE_EXPR, boolean_type_node, iv->var, iv->init);
+  cond = fold_build2 (NE_EXPR, boolean_type_node, iv->var, iv->init_val);
   new_var = copy_ssa_name (re->var);
   stmt = gimple_build_assign (new_var, COND_EXPR, cond, tmp, re->init);
   gimple_seq_add_stmt (&stmts, stmt);
@@ -1007,7 +1008,6 @@ public:
 private:
   void update_data_info (unsigned, unsigned, vec, vec);
   bool valid_data_dependences (unsigned, unsigned, vec);
-  bool can_interchange_loops (loop_cand &, loop_cand &);
   void interchange_loops (loop_cand &, loop_cand &);
   void map_inductions_to_loop (loop_cand &, loop_cand &);
   void move_code_to_inner_loop (struct loop *, struct loop *, basic_block *);
@@ -1099,21 +1099,6 @@ tree_loop_interchange::valid_data_dependences (unsigned i_idx, unsigned o_idx,
   return true;
 }
 
-/* Return true if ILOOP and OLOOP can be interchanged in terms of code
-   transformation.  */
-
-bool
-tree_loop_interchange::can_interchange_loops (loop_cand &iloop,
-	  loop_cand &oloop)
-{
-  return (iloop.analyze_carried_vars (NULL)
-	  && iloop.analyze_lcssa_phis ()
-	  && oloop.analyze_carried_vars (&iloop)
-	  && oloop.analyze_lcssa_phis ()
-	  && iloop.can_interchange_p (NULL)
-	  && oloop.can_interchange_p (&iloop));
-}
-
 /* Interchange two loops specified by ILOOP and OLOOP.  */
 
 void
@@ -1227,7 +1212,8 @@ tree_loop_interchange::map_inductions_to_loop (loop_cand &src, loop_cand &tgt)
 	{
 	  /* Map the IV by creating the same one in target loop.  */
 	  tree var_before, var_after;
-	  tree base = unshare_expr (iv->base), step = unshare_expr (iv->step);
+	  tree base = unshare_expr (iv->init_expr);
+	  tree step = unshare_expr (iv->step);
 	  create_iv (base, step, SSA_NAM

Re: [PATCH 1/2] [SPARC] Prevent -mfix-ut699 from generating b2bst errata sequences

2017-12-05 Thread Eric Botcazou

> The sequence
>   st
>   fdivd / fsqrtd
>   std
> was generated in some cases with -mfix-ut699 when there was
> a st before the div/sqrt. This sequence could trigger the b2bst errata.
> 
> Now the following safe sequence is generated instead:
>   st
>   nop
>   fdivd / fsqrtd
>   std
> 
> gcc/ChangeLog:
> 
> 2017-11-27  Martin Aberg  
> 
>   * config/sparc/sparc.md (divdf3_fix): Add NOP and adjust length
> to prevent b2bst errata sequence.
> (sqrtdf2_fix): Likewise.

But isn't that the change I already rejected back in July?
  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00496.html

If so, what has changed since then?  If nothing, then please revert.

-- 
Eric Botcazou

Re: [PATCH branch/gimple-interchange]obvious cleanup

2017-12-05 Thread Richard Biener

On Tue, Dec 5, 2017 at 11:07 AM, Bin Cheng  wrote:
> Hi,
> This is an obvious cleanup patch doing variable renaming, function inlining.
> Is it OK?

Yes.

Thanks,
Richard.

> Thanks,
> bin
> 2017-12-05  Bin Cheng  
>
> * gimple-loop-interchange.cc (struct induction): Rename fields.
> (dump_induction, loop_cand::analyze_induction_var): Update uses.
> (loop_cand::undo_simple_reduction): Ditto.
> (tree_loop_interchange::map_inductions_to_loop): Ditto.
> (tree_loop_interchange::can_interchange_loops): Delete.
> (tree_loop_interchange::interchange): Inline can_interchange_loops.

[Ada] Improper visibility of loop parameter with Iterable aspect

2017-12-05 Thread Pierre-Marie de Rodat

This patch fixes a visibility bug in the expansion of a loop over an object
whose type carries the GNAT-specific Iterable aspect. Prior to this patch,
the loop variable remained visible after exit from the loop.

Compiling iterator_test.adb must yield:

   iterator_test.adb:26:61: "number" is not visible
   iterator_test.adb:26:61: non-visible declaration at line 21
   iterator_test.adb:26:61: non-visible declaration at line 16

---
with Ada.Text_IO;
with iterators; use iterators;
procedure iterator_test is

   it_test : constant my_iterator := create_it(from => 3, to => 10);

begin

   -- reference loop
   for number in 1 .. 10 loop
  Ada.Text_IO.Put_Line (Integer'Image (number));
   end loop;

   -- gas day iterations
   Ada.Text_IO.Put_Line ("Test foreward iteration");
   for number in it_test loop
  Ada.Text_IO.Put_Line (Integer'Image (number));
   end loop;

   Ada.Text_IO.Put_Line ("Test backward iteration");
   for number in create_it(from => 10, to => 3)
   loop
  Ada.Text_IO.Put_Line (Integer'Image (number));
   end loop;

   Ada.Text_IO.Put_Line ("Out of scope?! " & Integer'Image (number));

end iterator_test;
---
package iterators is

   type my_iterator is private with Iterable =>
(first   => iterator_first,
 next=> iterator_next,
 has_element => iterator_has_element);

   function create_it (from : Integer; to : Integer) return my_iterator;

   function iterator_first (it : my_iterator) return Integer;

   function iterator_next (it : my_iterator; prev_item : Integer)
  return Integer;

   function iterator_has_element (it : my_iterator; prev_item : Integer) 
 return Boolean;

private
   type my_iterator is record
  from : Integer;
  to   : Integer;
   end record;

end iterators;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Ed Schonberg  

* exp_ch5.adb (Expand_Formal_Container_Loop): Ensure that the loop
parameter becomes invisible after analyzing the loop, which has been
rewritten as a while-loop.

Index: exp_ch5.adb
===
--- exp_ch5.adb (revision 255408)
+++ exp_ch5.adb (working copy)
@@ -3135,6 +3135,7 @@
 
   Advance   : Node_Id;
   Init_Decl : Node_Id;
+  Init_Name : Entity_Id;
   New_Loop  : Node_Id;
 
begin
@@ -3169,7 +3170,8 @@
   --  the loop.
 
   Analyze (Init_Decl);
-  Set_Ekind (Defining_Identifier (Init_Decl), E_Loop_Parameter);
+  Init_Name := Defining_Identifier (Init_Decl);
+  Set_Ekind (Init_Name, E_Loop_Parameter);
 
   --  The cursor was marked as a loop parameter to prevent user assignments
   --  to it, however this renders the advancement step illegal as it is not
@@ -3182,9 +3184,11 @@
   --  Because we have to analyze the initial declaration of the loop
   --  parameter multiple times its scope is incorrectly set at this point
   --  to the one surrounding the block statement - so set the scope
-  --  manually to be the actual block statement.
+  --  manually to be the actual block statement, and indicate that it is
+  --  not visible after the block has been analyzed.
 
-  Set_Scope (Defining_Identifier (Init_Decl), Entity (Identifier (N)));
+  Set_Scope (Init_Name, Entity (Identifier (N)));
+  Set_Is_Immediately_Visible (Init_Name, False);
end Expand_Formal_Container_Loop;
 
--

[Ada] Propagate SLOC in iteration over array

2017-12-05 Thread Pierre-Marie de Rodat

Expand_Iterator_Loop_Over_Array turns a loop over an array:

   for Element of Array loop

into a loop with an explicit iteration variable, but it doesn't propagate the 
SLOC of Element onto the new iteration variable.  This results in inaccurate
information when precise coverage is requested, as the SLOC of the enclosing
loop statement is propagated instead.

The change fixes this issue by using the SLOC of iteration scheme for the
entire rewrite of the construct, except for the new loop statements which
still inherit the SLOC of the original loop statement.

No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Eric Botcazou  

* exp_ch5.adb (Expand_Iterator_Loop_Over_Array): Use the SLOC of the
iteration scheme throughout, except for the new loop statement(s).

Index: exp_ch5.adb
===
--- exp_ch5.adb (revision 255410)
+++ exp_ch5.adb (working copy)
@@ -3673,7 +3673,7 @@
   Array_Typ  : constant Entity_Id  := Base_Type (Etype (Array_Node));
   Array_Dim  : constant Pos:= Number_Dimensions (Array_Typ);
   Id : constant Entity_Id  := Defining_Identifier (I_Spec);
-  Loc: constant Source_Ptr := Sloc (N);
+  Loc: constant Source_Ptr := Sloc (Isc);
   Stats  : constant List_Id:= Statements (N);
   Core_Loop  : Node_Id;
   Dim1   : Int;
@@ -3734,7 +3734,7 @@
   end if;
 
   Core_Loop :=
-Make_Loop_Statement (Loc,
+Make_Loop_Statement (Sloc (N),
   Iteration_Scheme =>
 Make_Iteration_Scheme (Loc,
   Loop_Parameter_Specification =>
@@ -3771,7 +3771,7 @@
 --end loop;
 
 Core_Loop :=
-  Make_Loop_Statement (Loc,
+  Make_Loop_Statement (Sloc (N),
 Iteration_Scheme =>
   Make_Iteration_Scheme (Loc,
 Loop_Parameter_Specification =>

[Ada] Fix handling of load addresses for backtraces through SO

2017-12-05 Thread Pierre-Marie de Rodat

The processing of shared library load addresses for backtrace symbolization
suffers from disconnects between various parts in the runtime library regarding
where run-time vs shared-object-file-relative addresses are at hand.

At some points the code processes a local copy of each address coming from a
backtrace, adjusted by the load address when it comes from a shared library,
while other places keep referring to the run-time (unadjusted) address within
the backtrace array, at spots where module-relative addresses are needed as
well, e.g. when looking up debug info mmaped from the module file.

The first obvious "fix" idea for this would be to adjust the addresses in the
backtrace array as well, but we need to keep the unadjusted address someplace,
at least be able to display it to the user. It corresponds to real run-time
addresses that can be looked-up and examined while the process is running, so
is generally valuable.

This change introduces more explicit and systematic use of the load address
shift instead.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Olivier Hainque  

libgnat/

* s-trasym__dwarf.adb (spec of Module_Name.Get): Instead of
possibly adjusting the lookup address by a load address, expect
a extra argument through which the load address can be conveyed
separately.
(Multi_Module_Symbolic_Traceback): Adjust accordingly. Pass the
retrieved load address to Init_Module.
* s-tsmona__linux.adb (Get): Honor the new interface.
* s-tsmona__mingw.adb (Get): Likewise.
* s-dwalin.ads: Adjust comments to be explicit about which
addresses are from module info and which are run-time addresses,
offsetted by the module load address.
* s-dwalin.adb (Set_Load_Address): Simply set C.Load_Slide.
Do not alter the module Low and High (relative) addresses.
(Is_Inside): Improve documentation regarding the kinds of addresses
at hand and correct the test.
(Symbolic_Traceback): Use separate variables with explicit names
for the address in traceback (run-time value) and the address to
lookup within the shared object (module-relative). Adjust the
computation of address passed to Symbolic_Address for symbolization.
Index: libgnat/s-tsmona__mingw.adb
===
--- libgnat/s-tsmona__mingw.adb (revision 255408)
+++ libgnat/s-tsmona__mingw.adb (working copy)
@@ -50,15 +50,20 @@
-- Get --
-
 
-   function Get (Addr : access System.Address) return String is
+   function Get (Addr : System.Address;
+ Load_Addr : access System.Address)
+ return String
+   is
   Res : DWORD;
   hModule : aliased HANDLE;
   Path: String (1 .. 1_024);
 
begin
+  Load_Addr.all := System.Null_Address;
+
   if GetModuleHandleEx
(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
-Addr.all,
+Addr,
 hModule'Access) = Win32.TRUE
   then
  Res := GetModuleFileName (hModule, Path'Address, Path'Length);
Index: libgnat/s-tsmona__linux.adb
===
--- libgnat/s-tsmona__linux.adb (revision 255408)
+++ libgnat/s-tsmona__linux.adb (working copy)
@@ -32,8 +32,6 @@
 --  This is the GNU/Linux specific version of this package
 with Interfaces.C;  use Interfaces.C;
 
-with System.Address_Operations; use System.Address_Operations;
-
 separate (System.Traceback.Symbolic)
 
 package body Module_Name is
@@ -134,7 +132,10 @@
-- Get --
-
 
-   function Get (Addr : access System.Address) return String is
+   function Get (Addr : System.Address;
+ Load_Addr : access System.Address)
+ return String
+   is
 
   --  Dl_info record for Linux, used to get sym reloc offset
 
@@ -154,13 +155,15 @@
   info : aliased Dl_info;
 
begin
-  if dladdr (Addr.all, info'Access) /= 0 then
+  Load_Addr.all := System.Null_Address;
 
+  if dladdr (Addr, info'Access) /= 0 then
+
  --  If we have a shared library we need to adjust the address to
  --  be relative to the base address of the library.
 
  if Is_Shared_Lib (info.dli_fbase) then
-Addr.all := SubA (Addr.all, info.dli_fbase);
+Load_Addr.all := info.dli_fbase;
  end if;
 
  return Value (info.dli_fname);
Index: libgnat/s-trasym__dwarf.adb
===
--- libgnat/s-trasym__dwarf.adb (revision 255408)
+++ libgnat/s-trasym__dwarf.adb (working copy)
@@ -132,10 +132,12 @@
   procedure Build_Cache_For_All_Modules;
   --  Create the cache for all current modules
 
-  function Get (Addr : access System.Address) return String;
-  --  Returns the module name for the given address, Addr may be updated
-  --  to be set relative to a shared libr

Re: [PR 83141] Prevent SRA from removing type changing assignment

2017-12-05 Thread Martin Jambor

On Tue, Dec 05 2017, Martin Jambor wrote:
> On Tue, Dec 05 2017, Martin Jambor wrote:
> Hi,
>
>> Hi,
>>
>> this is a followup to Richi's
>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
>> 83141.  The basic idea is simple, be just as conservative about type
>> changing MEM_REFs as we are about actual VCEs.
>>
>> I have checked how that would affect compilation of SPEC 2006 and (non
>> LTO) Mozilla Firefox and am happy to report that the difference was
>> tiny.  However, I had to make the test less strict, otherwise testcase
>> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
>> and expects us to track values accross:
>>
>>   int a[] = { 1, 2, 3 };
>>   /* ... */
>>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
>>  /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
>>  /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>>  /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>>   
>> SRA is able to load replacement of a[0] directly from the temporary
>> array which is apparently necessary to generate proper debug info.  I
>> have therefore allowed the current transformation to go forward if the
>> source does not contain any padding or if it is a read-only declaration.
>
> Ah, the read-only test is of course bogus, it was a last minute addition
> when I was apparently already too tired to think it through.  Please
> disregard that line in the patch (it has passed bootstrap and testing
> without it).
>
> Sorry for the noise,
>
> Martin
>

And for the record, below is the actual patch, after a fresh round of
re-testing to double check I did not mess up anything else.  As before,
I'd like to ask for review, especially of the type_contains_padding_p
predicate and then would like to commit it to trunk.

Thanks,

Martin


2017-12-05  Martin Jambor  

PR tree-optimization/83141
* tree-sra.c (type_contains_padding_p): New function.
(contains_vce_or_bfcref_p): Move up in the file, also test for
MEM_REFs implicitely changing types with padding.  Remove inline
keyword.
(build_accesses_from_assign): Check contains_vce_or_bfcref_p
before setting bit in should_scalarize_away_bitmap.

testsuite/
* gcc.dg/tree-ssa/pr83141.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +
 gcc/tree-sra.c  | 115 ++--
 2 files changed, 127 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
new file mode 100644
index 000..86caf66025b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O -fdump-tree-esra-details" } */
+
+volatile short vs;
+volatile long vl;
+
+struct A { short s; long i; long j; };
+struct A a, b;
+void foo ()
+{
+  struct A c;
+  __builtin_memcpy (&c, &b, sizeof (struct A));
+  __builtin_memcpy (&a, &c, sizeof (struct A));
+
+  vs = c.s;
+  vl = c.i;
+  vl = c.j;
+}
+int main()
+{
+  __builtin_memset (&b, 0, sizeof (struct A));
+  b.s = 1;
+  __builtin_memcpy ((char *)&b+2, &b, 2);
+  foo ();
+  __builtin_memcpy (&a, (char *)&a+2, 2);
+  if (a.s != 1)
+__builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" 
} } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 866cff0edb0..df9f10f83b6 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
   return false;
 }
 
+/* Return false if we can determine that TYPE has no padding, otherwise return
+   true.  */
+
+static bool
+type_contains_padding_p (tree type)
+{
+  if (is_gimple_reg_type (type))
+return false;
+
+  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
+return true;
+
+  switch (TREE_CODE (type))
+{
+case RECORD_TYPE:
+  {
+   unsigned HOST_WIDE_INT pos = 0;
+   for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
+ if (TREE_CODE (fld) == FIELD_DECL)
+   {
+ tree ft = TREE_TYPE (fld);
+
+ if (DECL_BIT_FIELD (fld)
+ || DECL_PADDING_P (fld)
+ || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
+   return true;
+
+ tree t = bit_position (fld);
+ if (!tree_fits_uhwi_p (t))
+   return true;
+ unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
+ if (pos != bp)
+   return true;
+
+ pos += tree_to_uhwi (TYPE_SIZE (ft));
+ if (type_contains_padding_p (ft))
+   return true;
+   }
+
+   if (pos != tree_to_uhwi (TYPE_SIZE (type)))
+ return true;
+
+   return false;
+  }
+
+case ARRAY_TYPE:
+  return type_contains_padding_p (TYPE_SIZE (type))

[Ada] Fix end-of dwarf section detection in address symbolizer

2017-12-05 Thread Pierre-Marie de Rodat

When executing a Line Number program statement, the state machine bails
out a few bytes before the current offset reaches the end of section to
account for possible padding bytes at the end.

The current test is checking if current_offset + 4 >= section_length,
which is too early for e.g. a program terminating with

  Advance PC by constant 17 to 0x776
  Extended opcode 1: End of Sequence

at offset 101 for a section with length 105.

This change tightens the computation, allowing proper interpretation
in the example case above, and explains the rationale.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Olivier Hainque  

* s-dwalin.adb (Read_And_Execute_Isn): Adjust test checking for the end
of section. Add comments explaining the rationale of the computation.

Index: libgnat/s-dwalin.adb
===
--- libgnat/s-dwalin.adb(revision 255411)
+++ libgnat/s-dwalin.adb(working copy)
@@ -578,14 +578,21 @@
  Initialize_State_Machine (C);
   end if;
 
-  --  Read the next prologue
+  --  If we have reached the next prologue, read it. Beware of possibly
+  --  empty blocks.
 
+  --  When testing for the end of section, beware of possible zero padding
+  --  at the end. Bail out as soon as there's not even room for at least a
+  --  DW_LNE_end_sequence, 3 bytes from Off to Off+2. This resolves to
+  --  Off+2 > Last_Offset_Within_Section, that is Off+2 > Section_Length-1,
+  --  or Off+3 > Section_Length.
+
   Tell (C.Lines, Off);
   while Off = C.Next_Prologue loop
  Initialize_State_Machine (C);
  Parse_Prologue (C);
  Tell (C.Lines, Off);
- exit when Off + 4 >= Length (C.Lines);
+ exit when Off + 3 > Length (C.Lines);
   end loop;
 
   --  Test whether we're done
@@ -595,7 +602,7 @@
   --  We are finished when we either reach the end of the section, or we
   --  have reached zero padding at the end of the section.
 
-  if Prologue.Unit_Length = 0 or else Off + 4 >= Length (C.Lines) then
+  if Prologue.Unit_Length = 0 or else Off + 3 > Length (C.Lines) then
  Done := True;
  return;
   end if;

Re: [i386] Mask generation in avx2intrin.h

2017-12-05 Thread Marc Glisse


Adding Cc: Uros and Kirill
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02233.html

On Sat, 25 Nov 2017, Marc Glisse wrote:


Hello,

the way full masks are generated currently in avx2intrin.h is questionable: 
opaque for the inline functions, weird/wrong for the macros.


It is possible we may want to add code so the constant mask with all ones may 
be generated with vxorpd+vcmpeqpd instead of loading it from memory, but that 
looks like something that should be decided globally, not in each instruction 
that uses it.


Bootstrap+regtest on x86_64-pc-linux-gnu (skylake).

2017-11-27  Marc Glisse  

PR target/80885
* config/i386/avx2intrin.h (_mm_i32gather_pd): Rewrite mask generation.
(_mm256_i32gather_pd): Likewise.
(_mm_i64gather_pd): Likewise.
(_mm256_i64gather_pd): Likewise.
(_mm_i32gather_ps): Likewise.
(_mm256_i32gather_ps): Likewise.
(_mm_i64gather_ps): Likewise.
(_mm256_i64gather_ps): Likewise.


--
Marc Glisse

[Ada] Fix indirect calls to imported subprograms within generic

2017-12-05 Thread Pierre-Marie de Rodat

Our trampoline avoidance scheme based on subprogram descriptors
should always be disconnected for subprograms with foreign convention.
For access to subprogram subtypes, the Set_Convention front-end
procedure is expected to take care of this, but it currently only
does so for subtypes not within a generic instance.

This causes wrong code to be generated on many targets for the
testcase below, where the indirect call from Trigger is performed
through a (non existent) descriptor, resulting in SEGV in most
circumstances.

/* cfoo.c */
void c_foo ()
{
 printf ("hello there from C");
}

-- blob.ads
generic
package Blob is
  procedure Initialize;
  procedure Trigger;
end;

-- blob.adb
package body Blob is

  type Service_Ptr is access procedure;
  pragma Convention (C, Service_Ptr);

  Hook : Service_Ptr;

  procedure C_Foo;
  pragma Import (C, C_Foo, "c_foo");

  procedure Initialize is
  begin
 Hook := C_Foo'Access;
  end;

  procedure Trigger is
  begin
 Hook.all;
  end;
end;

The change proposed here just arranges for the bit clearing
to happen systematically in Set_Convention, which fixes the case
at hand and just confirms the bit value in case the entity was
processed previously.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Olivier Hainque  

* sem_util.adb (Set_Convention): Always clear Can_Use_Internal_Rep
on access to subprogram types with foreign convention.

Index: sem_util.adb
===
--- sem_util.adb(revision 255412)
+++ sem_util.adb(working copy)
@@ -23090,17 +23090,7 @@
 and then Is_Access_Subprogram_Type (Base_Type (E))
 and then Has_Foreign_Convention (E)
   then
-
- --  A pragma Convention in an instance may apply to the subtype
- --  created for a formal, in which case we have already verified
- --  that conventions of actual and formal match and there is nothing
- --  to flag on the subtype.
-
- if In_Instance then
-null;
- else
-Set_Can_Use_Internal_Rep (E, False);
- end if;
+ Set_Can_Use_Internal_Rep (E, False);
   end if;
 
   --  If E is an object, including a component, and the type of E is an

[Ada] Warn on weal elaboration model for SPARK

2017-12-05 Thread Pierre-Marie de Rodat

This patch introduces a check which ensures that SPARK elaboration code is
processed using the static elaboration model as it guarantees the highest
degree of safety.


-- Source --


--  spark_pack.ads

package SPARK_Pack with SPARK_Mode is
   pragma Elaborate_Body;

   type Root is tagged null record;
   procedure Prim (Obj : Root);

   type Child is new Root with null record;
   procedure Prim (Obj : Child);
end SPARK_Pack;

--  spark_pack.adb

with Ada.Text_IO; use Ada.Text_IO;

package body SPARK_Pack with SPARK_Mode is
   procedure Prim (Obj : Root) is
   begin
  Put_Line ("Root.Prim");
   end Prim;

   procedure Prim (Obj : Child) is
   begin
  Put_Line ("Child.Prim");
   end Prim;
end SPARK_Pack;


-- Compilation and output --


$ echo "Static model"
$ gcc -c spark_pack.adb
$ echo "Relaxed static model"
$ gcc -c spark_pack.adb-gnatJ
$ echo "Dynamic model"
$ gcc -c spark_pack.adb -gnatE
$ echo "Relaxed dynamic model"
$ gcc -c spark_pack.adb -gnatE -gnatJ
Static model
Relaxed static model
spark_pack.ads:7:04: warning: SPARK elaboration checks require static
  elaboration model
spark_pack.ads:7:04: warning: relaxed elaboration model is in effect
Dynamic model
spark_pack.ads:4:09: warning: SPARK elaboration checks require static
  elaboration model
spark_pack.ads:4:09: warning: dynamic elaboration model is in effect
Relaxed dynamic model
spark_pack.ads:4:09: warning: SPARK elaboration checks require static
  elaboration model
spark_pack.ads:4:09: warning: dynamic elaboration model is in effect

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Hristian Kirtchev  

* sem_elab.adb: Update the terminology and switch sections.
(Check_SPARK_Model_In_Effect): New routine.
(Check_SPARK_Scenario): Verify the model in effect for SPARK.
(Process_Conditional_ABE_Call_SPARK): Verify the model in effect for
SPARK.
(Process_Conditional_ABE_Instantiation_SPARK): Verify the model in
effect for SPARK.
(Process_Conditional_ABE_Variable_Assignment_SPARK): Verify the model
in effect for SPARK.

Index: sem_elab.adb
===
--- sem_elab.adb(revision 255412)
+++ sem_elab.adb(working copy)
@@ -117,6 +117,9 @@
-- Terminology --
-
 
+   --  * ABE - An attempt to activate, call, or instantiate a scenario which
+   --has not been fully elaborated.
+   --
--  * Bridge target - A type of target. A bridge target is a link between
--scenarios. It is usually a byproduct of expansion and does not have
--any direct ABE ramifications.
@@ -421,6 +424,8 @@
--   calls to subprograms which verify the run-time semantics of
--   the following assertion pragmas:
--
+   --  Default_Initial_Condition
+   --  Initial_Condition
--  Invariant
--  Invariant'Class
--  Post
@@ -429,8 +434,8 @@
--  Type_Invariant
--  Type_Invariant_Class
--
-   --   As a result, the assertion expressions of the pragmas will not
-   --   be processed.
+   --   As a result, the assertion expressions of the pragmas are not
+   --   processed.
--
--  -gnatd.U ignore indirect calls for static elaboration
--
@@ -1044,6 +1049,12 @@
--  Verify that expanded instance Exp_Inst does not precede the generic body
--  it instantiates (SPARK RM 7.7(6)).
 
+   procedure Check_SPARK_Model_In_Effect (N : Node_Id);
+   pragma Inline (Check_SPARK_Model_In_Effect);
+   --  Determine whether a suitable elaboration model is currently in effect
+   --  for verifying the SPARK rules of scenario N. Emit a warning if this is
+   --  not the case.
+
procedure Check_SPARK_Scenario (N : Node_Id);
pragma Inline (Check_SPARK_Scenario);
--  Top-level dispatcher for verifying SPARK scenarios which are not always
@@ -2696,12 +2707,57 @@
   end if;
end Check_SPARK_Instantiation;
 
+   -
+   -- Check_SPARK_Model_In_Effect --
+   -
+
+   SPARK_Model_Warning_Posted : Boolean := False;
+   --  This flag prevents the same SPARK model-related warning from being
+   --  emitted multiple times.
+
+   procedure Check_SPARK_Model_In_Effect (N : Node_Id) is
+   begin
+  --  Do not emit the warning multiple times as this creates useless noise
+
+  if SPARK_Model_Warning_Posted then
+ null;
+
+  --  SPARK rule verification requires the "strict" static model
+
+  elsif Static_Elaboration_Checks and not Relaxed_Elaboration_Checks then
+ null;
+
+  --  Any other combination of models does not guarantee the absence of ABE
+  --  problems for SPARK rule verification purposes. Note that there is no
+  --  need to c

Re: [PATCH 1/2] [SPARC] Prevent -mfix-ut699 from generating b2bst errata sequences

2017-12-05 Thread Daniel Cederman


But isn't that the change I already rejected back in July?


You are right, that was a mistake from our side to submit it again. We 
had rediscovered the sequence using our scanning script and I forgot 
that the sequence was harmless in this case. We will update our scripts 
and revert the patch.


The person here with write access, danielh, is currently traveling, so 
we wont be able to revert it until next week. If it is urgent, could I 
ask you to revert it for us?


Thanks for checking the patches in detail and for improving the formating.

/Daniel C

Re: C++ PATCH for c++/79228, complex literal suffixes

2017-12-05 Thread Jakub Jelinek

On Fri, Dec 01, 2017 at 03:16:23PM -0500, Jason Merrill wrote:
> commit e39b7d506d236ce7ef9f64d1bcf0b384bb2d8038
> Author: Jason Merrill 
> Date:   Fri Dec 1 07:45:03 2017 -0500
> 
> PR c++/79228 - extensions hide C++14 complex literal operators
> 
> libcpp/
> * expr.c (interpret_float_suffix): Ignore 'i' in C++14 and up.
> (interpret_int_suffix): Likewise.
> gcc/cp/
> * parser.c (cp_parser_userdef_numeric_literal): Be helpful about
> 'i' in C++14 and up.
> 
> +  /* In C++14 and up these suffixes are in the standard library, so treat
> +  them as user-defined literals.  */
> +  if (CPP_OPTION (pfile, cplusplus)
> +   && CPP_OPTION (pfile, lang) > CLK_CXX11
> +   && (!memcmp (orig_s, "i", orig_len)
> +   || !memcmp (orig_s, "if", orig_len)
> +   || !memcmp (orig_s, "il", orig_len)))

This doesn't seem right, it will invoke UB if orig_len > 2 by potentially
accessing bytes beyond 'i' and '\0' in "i" (and for orig_len > 3 also after
"if" or "il").
If you only want to return 0 if orig_len bytes starting at orig_s are 'i'
or 'i' 'f' or 'i' 'l', then I'd write instead as in the patch below.
Or if memcmp is more readable, at least check orig_len first.

> +  /* In C++14 and up these suffixes are in the standard library, so treat
> +  them as user-defined literals.  */
> +  if (CPP_OPTION (pfile, cplusplus)
> +   && CPP_OPTION (pfile, lang) > CLK_CXX11
> +   && (!memcmp (s, "i", orig_len)
> +   || !memcmp (s, "if", orig_len)
> +   || !memcmp (s, "il", orig_len)))

Similarly.  Additionally, "if" can't happen here, because we don't allow 'f'
among suffixes.

So, ok for trunk if it passes testing, or some other form thereof?

2017-12-05  Jakub Jelinek  

PR c++/79228
* expr.c (interpret_float_suffix): Avoid memcmp.
(interpret_int_suffix): Likewise.  Don't check for if.

--- libcpp/expr.c.jj2017-12-01 22:13:24.0 +0100
+++ libcpp/expr.c   2017-12-05 13:26:57.019683785 +0100
@@ -280,9 +280,10 @@ interpret_float_suffix (cpp_reader *pfil
 them as user-defined literals.  */
   if (CPP_OPTION (pfile, cplusplus)
  && CPP_OPTION (pfile, lang) > CLK_CXX11
- && (!memcmp (orig_s, "i", orig_len)
- || !memcmp (orig_s, "if", orig_len)
- || !memcmp (orig_s, "il", orig_len)))
+ && orig_s[0] == 'i'
+ && (orig_len == 1
+ || (orig_len == 2
+ && (orig_s[1] == 'f' || orig_s[1] == 'l'
return 0;
 }
 
@@ -345,9 +346,8 @@ interpret_int_suffix (cpp_reader *pfile,
 them as user-defined literals.  */
   if (CPP_OPTION (pfile, cplusplus)
  && CPP_OPTION (pfile, lang) > CLK_CXX11
- && (!memcmp (s, "i", orig_len)
- || !memcmp (s, "if", orig_len)
- || !memcmp (s, "il", orig_len)))
+ && s[0] == 'i'
+ && (orig_len == 1 || (orig_len == 2 && s[1] == 'l')))
return 0;
 }
 


Jakub

[Ada] Reject properly an aspect Predicate on a formal type

2017-12-05 Thread Pierre-Marie de Rodat

This patch adds a check to reject an aspect Predicate on a formal type
declaration.

Compiling gen.adb must yield:

   gen.ads:2:55: predicate cannot apply to formal type

---
generic
   type T is array (Integer range <>) of Integer
   with Predicate => T'First /= 1;
function Gen return Integer;
---
function Gen return Integer is
   X : T(1 .. 2);
begin
   return X'First;
end Gen;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Ed Schonberg  

* sem_ch13.adb (Analyze_Aspect_Specifications, case Predicate): A
predicate cannot apply to a formal type.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 255408)
+++ sem_ch13.adb(working copy)
@@ -2389,6 +2389,10 @@
   elsif Is_Incomplete_Type (E) then
  Error_Msg_N
("predicate cannot apply to incomplete view", Aspect);
+
+  elsif Is_Generic_Type (E) then
+ Error_Msg_N
+   ("predicate cannot apply to formal type", Aspect);
  goto Continue;
   end if;

[Ada] Improve debugging of task type discriminants

2017-12-05 Thread Pierre-Marie de Rodat

This patch improves the debugging information generated by the compiler
for task discriminant.

No test available.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-05  Javier Miranda  

* exp_ch9.adb (Install_Private_Data_Declarations): Add missing
Debug_Info_Needed decoration of internally generated discriminal
renaming declaration.

Index: exp_ch9.adb
===
--- exp_ch9.adb (revision 255412)
+++ exp_ch9.adb (working copy)
@@ -13450,6 +13450,12 @@
Selector_Name => Make_Identifier (Loc, Chars (D;
Add (Decl);
 
+   --  Set debug info needed on this renaming declaration even
+   --  though it does not come from source, so that the debugger
+   --  will get the right information for these generated names.
+
+   Set_Debug_Info_Needed (Discriminal (D));
+
Next_Discriminant (D);
 end loop;
  end;

[Ada] Spurious error with private overriding of overloaded subprogram

2017-12-05 Thread Pierre-Marie de Rodat

This patch fixds a spurious error report on a prefixed call where the
operation is a private overriding of a visible operation, and the operation
has various overloadings in the visible and private parts.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-05  Ed Schonberg  

* sem_ch4.adb (Is_Private_Overriding): If the candidate private
subprogram is overloaded, scan the list of homonyms in the same
scope, to find the inherited operation that may be overridden
by the candidate.
* exp_ch11.adb, exp_ch7.adb: Minor reformatting.

gcc/testsuite/

2017-12-05  Ed Schonberg  

* gnat.dg/private_overriding.adb: New testcase.
Index: exp_ch11.adb
===
--- exp_ch11.adb(revision 255412)
+++ exp_ch11.adb(working copy)
@@ -1419,19 +1419,28 @@
  return;
   end if;
 
-  --  Add clean up actions if required
+  --  Add cleanup actions if required. No cleanup actions are needed in
+  --  thunks associated with interfaces, because they only displace the
+  --  pointer to the object. For extended return statements, we need
+  --  cleanup actions if the Handled_Statement_Sequence contains generated
+  --  objects of controlled types, for example. We do not want to clean up
+  --  the return object.
 
   if not Nkind_In (Parent (N), N_Accept_Statement,
N_Extended_Return_Statement,
N_Package_Body)
 and then not Delay_Cleanups (Current_Scope)
-
---  No cleanup action needed in thunks associated with interfaces
---  because they only displace the pointer to the object.
-
 and then not Is_Thunk (Current_Scope)
   then
  Expand_Cleanup_Actions (Parent (N));
+
+  elsif Nkind (Parent (N)) = N_Extended_Return_Statement
+and then Handled_Statement_Sequence (Parent (N)) = N
+and then not Delay_Cleanups (Current_Scope)
+  then
+ pragma Assert (not Is_Thunk (Current_Scope));
+ Expand_Cleanup_Actions (Parent (N));
+
   else
  Set_First_Real_Statement (N, First (Statements (N)));
   end if;
Index: exp_ch7.adb
===
--- exp_ch7.adb (revision 255408)
+++ exp_ch7.adb (working copy)
@@ -310,7 +310,7 @@
function Build_Cleanup_Statements
  (N  : Node_Id;
   Additional_Cleanup : List_Id) return List_Id;
-   --  Create the clean up calls for an asynchronous call block, task master,
+   --  Create the cleanup calls for an asynchronous call block, task master,
--  protected subprogram body, task allocation block or task body, or
--  additional cleanup actions parked on a transient block. If the context
--  does not contain the above constructs, the routine returns an empty
@@ -479,7 +479,7 @@
  return False;
 
   --  Do not consider C and C++ types since it is assumed that the non-Ada
-  --  side will handle their clean up.
+  --  side will handle their cleanup.
 
   elsif Convention (Desig_Typ) = Convention_C
 or else Convention (Desig_Typ) = Convention_CPP
@@ -1554,8 +1554,8 @@
 Jump_Alts := New_List;
  end if;
 
- --  If the context requires additional clean up, the finalization
- --  machinery is added after the clean up code.
+ --  If the context requires additional cleanup, the finalization
+ --  machinery is added after the cleanup code.
 
  if Acts_As_Clean then
 Finalizer_Stmts   := Clean_Stmts;
@@ -1784,7 +1784,7 @@
  end if;
 
  --  Protect the statements with abort defer/undefer. This is only when
- --  aborts are allowed and the clean up statements require deferral or
+ --  aborts are allowed and the cleanup statements require deferral or
  --  there are controlled objects to be finalized. Note that the abort
  --  defer/undefer pair does not require an extra block because each
  --  finalization exception is caught in its corresponding finalization
@@ -1800,7 +1800,7 @@
 
  --  The local exception does not need to be reraised for library-level
  --  finalizers. Note that this action must be carried out after object
- --  clean up, secondary stack release and abort undeferral. Generate:
+ --  cleanup, secondary stack release, and abort undeferral. Generate:
 
  --if Raised and then not Abort then
  --   Raise_From_Controlled_Operation (E);
@@ -1907,7 +1907,7 @@
 Append_To (Spec_Decls, Fin_Spec);
 Analyze (Fin_Spec);
 
---  When the finalizer acts solely as a clean up routine, the body
+--  When the finalizer acts solely as a cleanup routine, the body
 --  is inserted right after the spec.
 
 if Ac

Re: [PATCH] Add testcase for PR83252 (was Re: patch to fix PR80818)

2017-12-05 Thread Jakub Jelinek

On Mon, Dec 04, 2017 at 11:52:01AM +0100, Jakub Jelinek wrote:
> On Wed, Nov 29, 2017 at 05:21:22PM -0500, Vladimir Makarov wrote:
> >   The following patch fixes
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80818
> > 
> >   The patch was successfully tested and bootstrapped on x86_64. The patch
> > has no test because it is hard to check the problem.  I checked manually
> 
> This changed fixed PR83252 which has a reasonably small testcase.
> I've further reduced it using creduce (with -O0 -W{,maybe-}uninitialized
> and/or -fsanitize=undefined checking, plus test that it succeeds with
> r255258 and fails with r255257).  Can you please double check if the
> testcase represents the same issue you were working on or if your change
> merely made the bug latent again?

Unfortunately, it broke again with r255377.  Vladimir, could you please have
a look at this PR?

Jakub

[PATCH][gimple-interchange] Final cleanup stuff

2017-12-05 Thread Richard Biener


This is my final sweep through the code doing cleanup on-the-fly.

I think the code is ready to go now (after you committed your changes).

What I'd eventually like to see is merge the two loop_cand objects
into a single interchange_cand object, having two really confuses
me when reading and trying to understand code.  But let's defer this
for next stage1.

A change that's still required is adjusting the graphite testcases
to use -floop-nest-optimize (you promised to do that) and to enable
the interchange pass at -O3 by default.

With that, can you prepare a patch that merges the changes on the
branch to trunk and provide updated performance / statistics
for SPEC (maybe also SPEC compile-time figures with/without the
patch?  it's enough to look at the Elapsed Compile time numbers in 
the log file IMHO).

Thanks,
Richard.

2017-12-05  Richard Biener  

* gimple-loop-interchange.cc (AVG_LOOP_NITER): Remove.
(loop_cand::supported_operations): Simplify.
(loop_cand::analyze_iloop_reduction_var): Use m_exit.
(loop_cand::analyze_oloop_reduction_var): Likewise.
(loop_cand::analyze_lcssa_phis): Likewise.
(find_deps_in_bb_for_stmt): Use gimple_seq_add_stmt_without_update.
(loop_cand::undo_simple_reduction): Likewise, properly release
virtual defs.
(tree_loop_interchange::interchange_loops): Likewise.  Move code
to innner loop here.
(tree_loop_interchange::map_inductions_to_loop): Remove code moving
code to inner loop.
(insert_pos_at_inner_loop): Inline into single caller...
(tree_loop_interchange::move_code_to_inner): ...here.  Properly
release virtual defs.
(proper_loop_form_for_interchange): Properly analyze/instantiate SCEV.
(prepare_perfect_loop_nest): Do not explicitely allocate vectors.

Index: gcc/gimple-loop-interchange.cc
===
--- gcc/gimple-loop-interchange.cc  (revision 255414)
+++ gcc/gimple-loop-interchange.cc  (working copy)
@@ -81,8 +81,6 @@ along with GCC; see the file COPYING3.
 #define MAX_NUM_STMT(PARAM_VALUE (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS))
 /* Maximum number of data references in loop nest.  */
 #define MAX_DATAREFS(PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
-/* Default average number of loop iterations.  */
-#define AVG_LOOP_NITER  (PARAM_VALUE (PARAM_AVG_LOOP_NITER))
 
 /* Comparison ratio of access stride between inner/outer loops to be
interchanged.  This is the minimum stride ratio for loop interchange
@@ -105,7 +103,7 @@ typedef struct induction
   /* IV's base and step part of SCEV.  */
   tree base;
   tree step;
-}* induction_p;
+} *induction_p;
 
 /* Enum type for loop reduction variable.  */
 enum reduction_type
@@ -136,7 +134,7 @@ typedef struct reduction
  reference.  */
   tree fini_ref;
   enum reduction_type type;
-}* reduction_p;
+} *reduction_p;
 
 
 /* Dump reduction RE.  */
@@ -302,24 +300,17 @@ loop_cand::supported_operations (basic_b
   if (is_gimple_debug (stmt))
continue;
 
-  if (gimple_has_volatile_ops (stmt)
- || gimple_has_side_effects (stmt))
+  if (gimple_has_side_effects (stmt))
return false;
 
   bb_num_stmts++;
-  if (is_gimple_call (stmt))
+  if (gcall *call = dyn_cast  (stmt))
{
- int cflags = gimple_call_flags (stmt);
- /* Only support const/pure calls.  */
- if (!(cflags & (ECF_CONST | ECF_PURE)))
-   return false;
-
  /* In basic block of outer loop, the call should be cheap since
 it will be moved to inner loop.  */
  if (iloop != NULL
- && !gimple_inexpensive_call_p (as_a  (stmt)))
+ && !gimple_inexpensive_call_p (call))
return false;
-
  continue;
}
 
@@ -334,6 +325,7 @@ loop_cand::supported_operations (basic_b
   tree lhs;
   /* Support loop invariant memory reference if it's only used once by
 inner loop.  */
+  /* ???  How's this checking for invariantness?  */
   if (gimple_assign_single_p (stmt)
  && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
  && TREE_CODE (lhs) == SSA_NAME
@@ -347,7 +339,7 @@ loop_cand::supported_operations (basic_b
   /* Allow PHI nodes in any basic block of inner loop, PHI nodes in outer
  loop's header, or PHI nodes in dest bb of inner loop's exit edge.  */
   if (!iloop || bb == m_loop->header
-  || bb == single_dom_exit (iloop->m_loop)->dest)
+  || bb == iloop->m_exit->dest)
 return true;
 
   /* Don't allow any other PHI nodes.  */
@@ -484,7 +476,6 @@ loop_cand::analyze_iloop_reduction_var (
   gphi *lcssa_phi = NULL, *use_phi;
   tree init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (m_loop));
   tree next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (m_loop));
-  edge e = single_exit (m_loop);
   reduction_p re;
   gimple *stmt, *next_def, *single_use = NULL;

Re: [PR 83141] Prevent SRA from removing type changing assignment

2017-12-05 Thread Richard Biener

On Tue, 5 Dec 2017, Martin Jambor wrote:

> On Tue, Dec 05 2017, Martin Jambor wrote:
> > On Tue, Dec 05 2017, Martin Jambor wrote:
> > Hi,
> >
> >> Hi,
> >>
> >> this is a followup to Richi's
> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
> >> 83141.  The basic idea is simple, be just as conservative about type
> >> changing MEM_REFs as we are about actual VCEs.
> >>
> >> I have checked how that would affect compilation of SPEC 2006 and (non
> >> LTO) Mozilla Firefox and am happy to report that the difference was
> >> tiny.  However, I had to make the test less strict, otherwise testcase
> >> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
> >> and expects us to track values accross:
> >>
> >>   int a[] = { 1, 2, 3 };
> >>   /* ... */
> >>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> >>/* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> >>/* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
> >>/* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
> >>   
> >> SRA is able to load replacement of a[0] directly from the temporary
> >> array which is apparently necessary to generate proper debug info.  I
> >> have therefore allowed the current transformation to go forward if the
> >> source does not contain any padding or if it is a read-only declaration.
> >
> > Ah, the read-only test is of course bogus, it was a last minute addition
> > when I was apparently already too tired to think it through.  Please
> > disregard that line in the patch (it has passed bootstrap and testing
> > without it).
> >
> > Sorry for the noise,
> >
> > Martin
> >
> 
> And for the record, below is the actual patch, after a fresh round of
> re-testing to double check I did not mess up anything else.  As before,
> I'd like to ask for review, especially of the type_contains_padding_p
> predicate and then would like to commit it to trunk.

Comments below...

> Thanks,
> 
> Martin
> 
> 
> 2017-12-05  Martin Jambor  
> 
>   PR tree-optimization/83141
>   * tree-sra.c (type_contains_padding_p): New function.
>   (contains_vce_or_bfcref_p): Move up in the file, also test for
>   MEM_REFs implicitely changing types with padding.  Remove inline
>   keyword.
>   (build_accesses_from_assign): Check contains_vce_or_bfcref_p
>   before setting bit in should_scalarize_away_bitmap.
> 
> testsuite/
>   * gcc.dg/tree-ssa/pr83141.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +
>  gcc/tree-sra.c  | 115 
> ++--
>  2 files changed, 127 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> new file mode 100644
> index 000..86caf66025b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fdump-tree-esra-details" } */
> +
> +volatile short vs;
> +volatile long vl;
> +
> +struct A { short s; long i; long j; };
> +struct A a, b;
> +void foo ()
> +{
> +  struct A c;
> +  __builtin_memcpy (&c, &b, sizeof (struct A));
> +  __builtin_memcpy (&a, &c, sizeof (struct A));
> +
> +  vs = c.s;
> +  vl = c.i;
> +  vl = c.j;
> +}
> +int main()
> +{
> +  __builtin_memset (&b, 0, sizeof (struct A));
> +  b.s = 1;
> +  __builtin_memcpy ((char *)&b+2, &b, 2);
> +  foo ();
> +  __builtin_memcpy (&a, (char *)&a+2, 2);
> +  if (a.s != 1)
> +__builtin_abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" 
> "esra" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 866cff0edb0..df9f10f83b6 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
>return false;
>  }
>  
> +/* Return false if we can determine that TYPE has no padding, otherwise 
> return
> +   true.  */
> +
> +static bool
> +type_contains_padding_p (tree type)
> +{
> +  if (is_gimple_reg_type (type))
> +return false;
> +
> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
> +return true;
> +
> +  switch (TREE_CODE (type))
> +{
> +case RECORD_TYPE:
> +  {
> + unsigned HOST_WIDE_INT pos = 0;
> + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> +   if (TREE_CODE (fld) == FIELD_DECL)
> + {
> +   tree ft = TREE_TYPE (fld);
> +
> +   if (DECL_BIT_FIELD (fld)
> +   || DECL_PADDING_P (fld)
> +   || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
> + return true;
> +
> +   tree t = bit_position (fld);
> +   if (!tree_fits_uhwi_p (t))
> + return true;
> +   unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
> +   if (pos != bp)
> + return true;
> +
> +   pos += tre

Re: [PATCH][gimple-interchange] Final cleanup stuff

2017-12-05 Thread Bin.Cheng

On Tue, Dec 5, 2017 at 1:02 PM, Richard Biener  wrote:
>
> This is my final sweep through the code doing cleanup on-the-fly.
Hi,
Thanks very much for all your help!
>
> I think the code is ready to go now (after you committed your changes).
>
> What I'd eventually like to see is merge the two loop_cand objects
> into a single interchange_cand object, having two really confuses
> me when reading and trying to understand code.  But let's defer this
> for next stage1.
>
> A change that's still required is adjusting the graphite testcases
> to use -floop-nest-optimize (you promised to do that) and to enable
> the interchange pass at -O3 by default.
Yeah, there will be two separated patches, one adjusting graphite
tests and the other enabling at O3 with miscellaneous test change.

>
> With that, can you prepare a patch that merges the changes on the
> branch to trunk and provide updated performance / statistics
> for SPEC (maybe also SPEC compile-time figures with/without the
> patch?  it's enough to look at the Elapsed Compile time numbers in
> the log file IMHO).
Will collect data for the final version.  Thanks again.

Thanks,
bin
>
> Thanks,
> Richard.
>
> 2017-12-05  Richard Biener  
>
> * gimple-loop-interchange.cc (AVG_LOOP_NITER): Remove.
> (loop_cand::supported_operations): Simplify.
> (loop_cand::analyze_iloop_reduction_var): Use m_exit.
> (loop_cand::analyze_oloop_reduction_var): Likewise.
> (loop_cand::analyze_lcssa_phis): Likewise.
> (find_deps_in_bb_for_stmt): Use gimple_seq_add_stmt_without_update.
> (loop_cand::undo_simple_reduction): Likewise, properly release
> virtual defs.
> (tree_loop_interchange::interchange_loops): Likewise.  Move code
> to innner loop here.
> (tree_loop_interchange::map_inductions_to_loop): Remove code moving
> code to inner loop.
> (insert_pos_at_inner_loop): Inline into single caller...
> (tree_loop_interchange::move_code_to_inner): ...here.  Properly
> release virtual defs.
> (proper_loop_form_for_interchange): Properly analyze/instantiate SCEV.
> (prepare_perfect_loop_nest): Do not explicitely allocate vectors.
>
> Index: gcc/gimple-loop-interchange.cc
> ===
> --- gcc/gimple-loop-interchange.cc  (revision 255414)
> +++ gcc/gimple-loop-interchange.cc  (working copy)
> @@ -81,8 +81,6 @@ along with GCC; see the file COPYING3.
>  #define MAX_NUM_STMT(PARAM_VALUE (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS))
>  /* Maximum number of data references in loop nest.  */
>  #define MAX_DATAREFS(PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
> -/* Default average number of loop iterations.  */
> -#define AVG_LOOP_NITER  (PARAM_VALUE (PARAM_AVG_LOOP_NITER))
>
>  /* Comparison ratio of access stride between inner/outer loops to be
> interchanged.  This is the minimum stride ratio for loop interchange
> @@ -105,7 +103,7 @@ typedef struct induction
>/* IV's base and step part of SCEV.  */
>tree base;
>tree step;
> -}* induction_p;
> +} *induction_p;
>
>  /* Enum type for loop reduction variable.  */
>  enum reduction_type
> @@ -136,7 +134,7 @@ typedef struct reduction
>   reference.  */
>tree fini_ref;
>enum reduction_type type;
> -}* reduction_p;
> +} *reduction_p;
>
>
>  /* Dump reduction RE.  */
> @@ -302,24 +300,17 @@ loop_cand::supported_operations (basic_b
>if (is_gimple_debug (stmt))
> continue;
>
> -  if (gimple_has_volatile_ops (stmt)
> - || gimple_has_side_effects (stmt))
> +  if (gimple_has_side_effects (stmt))
> return false;
>
>bb_num_stmts++;
> -  if (is_gimple_call (stmt))
> +  if (gcall *call = dyn_cast  (stmt))
> {
> - int cflags = gimple_call_flags (stmt);
> - /* Only support const/pure calls.  */
> - if (!(cflags & (ECF_CONST | ECF_PURE)))
> -   return false;
> -
>   /* In basic block of outer loop, the call should be cheap since
>  it will be moved to inner loop.  */
>   if (iloop != NULL
> - && !gimple_inexpensive_call_p (as_a  (stmt)))
> + && !gimple_inexpensive_call_p (call))
> return false;
> -
>   continue;
> }
>
> @@ -334,6 +325,7 @@ loop_cand::supported_operations (basic_b
>tree lhs;
>/* Support loop invariant memory reference if it's only used once by
>  inner loop.  */
> +  /* ???  How's this checking for invariantness?  */
>if (gimple_assign_single_p (stmt)
>   && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
>   && TREE_CODE (lhs) == SSA_NAME
> @@ -347,7 +339,7 @@ loop_cand::supported_operations (basic_b
>/* Allow PHI nodes in any basic block of inner loop, PHI nodes in outer
>   loop's header, or PHI nodes in dest bb of inner loop's exit edge.  */
>if (!iloop || bb == m_loop->he

[PATCH] Add timevars to tree-ssa-math-opts.c passes

2017-12-05 Thread Richard Biener


Just noticed they're missing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-12-05  Richard Biener   

* timevar.def (TV_TREE_RECIP, TV_TREE_SINCOS, TV_TREE_WIDEN_MUL):
Add.
* tree-ssa-math-opts.c (pass_data_cse_reciprocal): Use TV_TREE_RECIP.
(pass_data_cse_sincos): Use TV_TREE_SINCOS.
(pass_data_optimize_widening_mul): Use TV_TREE_WIDEN_MUL.

Index: gcc/timevar.def
===
--- gcc/timevar.def (revision 255414)
+++ gcc/timevar.def (working copy)
@@ -209,6 +209,9 @@ DEFTIMEVAR (TV_TREE_SSA_VERIFY   , "
 DEFTIMEVAR (TV_TREE_STMT_VERIFY  , "tree STMT verifier")
 DEFTIMEVAR (TV_TREE_SWITCH_CONVERSION, "tree switch conversion")
 DEFTIMEVAR (TV_TREE_SWITCH_LOWERING,   "tree switch lowering")
+DEFTIMEVAR (TV_TREE_RECIP, "gimple CSE reciprocals")
+DEFTIMEVAR (TV_TREE_SINCOS   , "gimple CSE sin/cos")
+DEFTIMEVAR (TV_TREE_WIDEN_MUL, "gimple widening/fma detection")
 DEFTIMEVAR (TV_TRANS_MEM , "transactional memory")
 DEFTIMEVAR (TV_TREE_STRLEN   , "tree strlen optimization")
 DEFTIMEVAR (TV_CGRAPH_VERIFY , "callgraph verifier")
Index: gcc/tree-ssa-math-opts.c
===
--- gcc/tree-ssa-math-opts.c(revision 255414)
+++ gcc/tree-ssa-math-opts.c(working copy)
@@ -688,7 +688,7 @@ const pass_data pass_data_cse_reciprocal
   GIMPLE_PASS, /* type */
   "recip", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_TREE_RECIP, /* tv_id */
   PROP_ssa, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */
@@ -1902,7 +1902,7 @@ const pass_data pass_data_cse_sincos =
   GIMPLE_PASS, /* type */
   "sincos", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_TREE_SINCOS, /* tv_id */
   PROP_ssa, /* properties_required */
   PROP_gimple_opt_math, /* properties_provided */
   0, /* properties_destroyed */
@@ -3243,7 +3243,7 @@ const pass_data pass_data_optimize_widen
   GIMPLE_PASS, /* type */
   "widening_mul", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE, /* tv_id */
+  TV_TREE_WIDEN_MUL, /* tv_id */
   PROP_ssa, /* properties_required */
   0, /* properties_provided */
   0, /* properties_destroyed */

[committed] Add testcase for already fixed PR (PR tree-optimization/83283)

2017-12-05 Thread Jakub Jelinek

Hi!

This testcase has been fixed by r253146 but that testcase is quite different
from this one, so I've committed this to trunk as obvious.

2017-12-05  Jakub Jelinek  

PR tree-optimization/83283
* g++.dg/torture/pr83283.C: New test.

--- gcc/testsuite/g++.dg/torture/pr83283.C.jj   2017-12-05 13:01:03.432144366 
+0100
+++ gcc/testsuite/g++.dg/torture/pr83283.C  2017-12-05 13:00:55.926238921 
+0100
@@ -0,0 +1,26 @@
+// PR tree-optimization/83283
+// { dg-do run }
+// { dg-additional-options "-std=c++11" }
+
+enum E : unsigned char { X = 0, Y = 1 };
+
+void __attribute__((noinline))
+foo (E *v, int size)
+{
+  for (int i = 0; i < size; ++i)
+{
+  const bool b = (v[i] == E::Y);
+  v[i] = static_cast(static_cast(b));
+}
+}
+
+int
+main ()
+{
+  constexpr int items = 32;
+  E vals[items] = {X};
+  vals[3] = Y;
+  foo (vals, items);
+  if (vals[3] != 1)
+__builtin_abort ();
+}

Jakub

[committed] Avoid i suffixes in libgomp oacc testsuite (PR testsuite/83281)

2017-12-05 Thread Jakub Jelinek

Hi!

The i suffix in C++14 does something different now, so I've changed
it to use the j suffix which stayed as is.

Regtested on x86_64-linux and i686-linux, committed to trunk.

2017-12-05  Jakub Jelinek  

PR testsuite/83281
* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c (main): Use
j suffix instead of i.
* testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c (main):
Likewise.

--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c.jj 
2017-02-09 12:59:36.0 +0100
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c
2017-12-05 13:17:55.698446879 +0100
@@ -99,7 +99,7 @@ int main (void)
 {
   float frac = ix * (1.0f / 1024) + 1.0f;
   
-  ary[ix] = frac + frac * 2.0i - 1.0i;
+  ary[ix] = frac + frac * 2.0j - 1.0j;
   sum += ary[ix];
   prod *= ary[ix];
 }
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c.jj 
2017-02-09 12:59:36.0 +0100
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-dbl.c
2017-12-05 13:17:40.953631736 +0100
@@ -99,7 +99,7 @@ int main (void)
 {
   double frac = ix * (1.0 / 1024) + 1.0;
   
-  ary[ix] = frac + frac * 2.0i - 1.0i;
+  ary[ix] = frac + frac * 2.0j - 1.0j;
   sum += ary[ix];
   prod *= ary[ix];
 }

Jakub

[PATCH] Avoid introducing UB in match.pd (T)(P+A)-(T)(P+B) and similar optimizations (PR sanitizer/81281)

2017-12-05 Thread Jakub Jelinek

Hi!

As mentioned in the PR, while (T)(P + A) - (T)P -> (T) A
shouldn't introduce UB when it wasn't there earlier,
(T)P - (T)(P + A) -> -(T) A
and
(T)(P + A) - (T)(P + B) -> (T)A - (T)B
can, so if the conversion to T isn't widening and T has undefined overflow,
then we need to perform the negation or minus in unsigned_type_for instead
of T.
Another thing is that while pointer_plus isn't commutative, plus is, so
we should use :c in that case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-05  Jakub Jelinek  

PR sanitizer/81281
* match.pd ((T)(P + A) - (T)P -> (T) A): Split into separate
simplify for plus with :c added, and pointer_plus without that.
((T)P - (T)(P + A) -> -(T) A): Likewise.  If type is integral
with undefined overflow and the conversion is not widening,
perform negation in utype and only convert to type afterwards.
((T)(P + A) - (T)(P + B) -> (T)A - (T)B): Split into separate
simplify for plus with :c added, and pointer_plus without that.
If type is integral with undefined overflow and the conversion is
not widening, perform minus in utype and only convert to type
afterwards.  Move the last pointer_diff_expr simplify into the
two outermost ifs.

* gcc.c-torture/execute/pr81281.c: New test.
* gcc.dg/pr81281-1.c: New test.
* gcc.dg/pr81281-2.c: New test.
* g++.dg/ubsan/pr81281.C: New test.
* g++.dg/ubsan/pr81281-aux.cc: New test.

--- gcc/match.pd.jj 2017-11-28 09:40:08.0 +0100
+++ gcc/match.pd2017-12-05 11:36:58.855074420 +0100
@@ -1783,28 +1783,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_not @0))
 
   /* (T)(P + A) - (T)P -> (T) A */
-  (for add (plus pointer_plus)
-   (simplify
-(minus (convert (add @@0 @1))
- (convert @0))
-(if (element_precision (type) <= element_precision (TREE_TYPE (@1))
-/* For integer types, if A has a smaller type
-   than T the result depends on the possible
-   overflow in P + A.
-   E.g. T=size_t, A=(unsigned)429497295, P>0.
-   However, if an overflow in P + A would cause
-   undefined behavior, we can assume that there
-   is no overflow.  */
-|| (INTEGRAL_TYPE_P (TREE_TYPE (@0))
-&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
-/* For pointer types, if the conversion of A to the
-   final type requires a sign- or zero-extension,
-   then we have to punt - it is not defined which
-   one is correct.  */
-|| (POINTER_TYPE_P (TREE_TYPE (@0))
-&& TREE_CODE (@1) == INTEGER_CST
-&& tree_int_cst_sign_bit (@1) == 0))
- (convert @1
+  (simplify
+   (minus (convert (plus:c @@0 @1))
+(convert @0))
+   (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
+   /* For integer types, if A has a smaller type
+  than T the result depends on the possible
+  overflow in P + A.
+  E.g. T=size_t, A=(unsigned)429497295, P>0.
+  However, if an overflow in P + A would cause
+  undefined behavior, we can assume that there
+  is no overflow.  */
+   || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0
+(convert @1)))
+  (simplify
+   (minus (convert (pointer_plus @@0 @1))
+(convert @0))
+   (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
+   /* For pointer types, if the conversion of A to the
+  final type requires a sign- or zero-extension,
+  then we have to punt - it is not defined which
+  one is correct.  */
+   || (POINTER_TYPE_P (TREE_TYPE (@0))
+   && TREE_CODE (@1) == INTEGER_CST
+   && tree_int_cst_sign_bit (@1) == 0))
+(convert @1)))
(simplify
 (pointer_diff (pointer_plus @@0 @1) @0)
 /* The second argument of pointer_plus must be interpreted as signed, and
@@ -1813,10 +1817,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (convert (convert:stype @1
 
   /* (T)P - (T)(P + A) -> -(T) A */
-  (for add (plus pointer_plus)
-   (simplify
-(minus (convert @0)
- (convert (add @@0 @1)))
+  (simplify
+   (minus (convert @0)
+(convert (plus:c @@0 @1)))
+   (if (INTEGRAL_TYPE_P (type)
+   && TYPE_OVERFLOW_UNDEFINED (type)
+   && element_precision (type) <= element_precision (TREE_TYPE (@1)))
+(with { tree utype = unsigned_type_for (type); }
+ (convert (negate (convert:utype @1
 (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
 /* For integer types, if A has a smaller type
than T the result depends on the possible
@@ -1826,7 +1834,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
undefined behavior, we can assume that there
is no overflow.  */
 || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
-&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0

Re: [PATCH] emit a trap for buffer overflow in placement new

2017-12-05 Thread Jonathan Wakely


On 05/12/17 09:48 +0100, Marc Glisse wrote:

On Mon, 4 Dec 2017, Martin Sebor wrote:


On 12/04/2017 03:44 PM, Marc Glisse wrote:

On Mon, 4 Dec 2017, Martin Sebor wrote:


The -Wplacement-new option warns for buffer overflow in placement
new expressions with objects of constant sizes, but because it's
implemented completely in the C++ front end it misses the more
interesting non-constant sizes.

The attached patch instruments both forms of operator placement
new to emit a trap when __builtin_object_size() determines that
the pointer points to an object less than the specified number
of bytes.  This is done only when _FORTIFY_SOURCE is defined
to a non-zero value.  This makes it possible to prevent buffer
overflow in most of the same cases as in built-ins like strcpy,
though without warnings when the size is nor a C++ constant
integer expression.

On x86_64-linux it passes testing with no apparent regressions.
Can anyone think of problems with this solution?  If not, given
its simplicity, would it be appropriate even at this stage?


AFAIK, one can call this operator new manually on any pointer, including
one-past-the-end pointers and null pointers. It is only with new
expressions that the limitation comes in (because it runs a constructor
afterwards). Not that people often do that...


Hmm, yes, there don't seem to be any requirements on calling
the operator by itself.  That's too bad.  I was going to move
the placement new checking into the middle-end to improve
the detection and avoid some false positives but the operator
disappears too early, before the size of the expression that's
being stuffed into the destination is known.


What's wrong with generating the same code you wanted to put in 
operator new from the front-end, next to the call to operator new, 
when the front-end handles a new expression?



The only other solution that comes to mind to detect non-constant
overflow is to do something like:

void* operator new (size_t n, void *p)
{
  if (__builtin_object_size (p, 0) < n)
   __builtin_warn ("buffer overflow in placement new");
  return p;
}

and emit the warning from __builtin_warn during expansion.  That
should work and be usable in other contexts as well (e.g., to
implement overflow and other error detection in user-defined
functions).  It would be kind of like the middle-end form of
Clang attribute diagnose_if.


I may have misunderstood, but that seems to have the same issue of 
applying to all calls to operator new instead of just new expressions.


Yes, but if it's only a warning then at least it doesn't cause
compilation to fail in the (probably very rare) cases where people are
doing something strange and calling it directly with a too-small
pointer.

Re: C++ PATCH for c++/79228, complex literal suffixes

2017-12-05 Thread Jason Merrill

OK, thanks.

On Tue, Dec 5, 2017 at 7:33 AM, Jakub Jelinek  wrote:
> On Fri, Dec 01, 2017 at 03:16:23PM -0500, Jason Merrill wrote:
>> commit e39b7d506d236ce7ef9f64d1bcf0b384bb2d8038
>> Author: Jason Merrill 
>> Date:   Fri Dec 1 07:45:03 2017 -0500
>>
>> PR c++/79228 - extensions hide C++14 complex literal operators
>>
>> libcpp/
>> * expr.c (interpret_float_suffix): Ignore 'i' in C++14 and up.
>> (interpret_int_suffix): Likewise.
>> gcc/cp/
>> * parser.c (cp_parser_userdef_numeric_literal): Be helpful about
>> 'i' in C++14 and up.
>>
>> +  /* In C++14 and up these suffixes are in the standard library, so 
>> treat
>> +  them as user-defined literals.  */
>> +  if (CPP_OPTION (pfile, cplusplus)
>> +   && CPP_OPTION (pfile, lang) > CLK_CXX11
>> +   && (!memcmp (orig_s, "i", orig_len)
>> +   || !memcmp (orig_s, "if", orig_len)
>> +   || !memcmp (orig_s, "il", orig_len)))
>
> This doesn't seem right, it will invoke UB if orig_len > 2 by potentially
> accessing bytes beyond 'i' and '\0' in "i" (and for orig_len > 3 also after
> "if" or "il").
> If you only want to return 0 if orig_len bytes starting at orig_s are 'i'
> or 'i' 'f' or 'i' 'l', then I'd write instead as in the patch below.
> Or if memcmp is more readable, at least check orig_len first.
>
>> +  /* In C++14 and up these suffixes are in the standard library, so 
>> treat
>> +  them as user-defined literals.  */
>> +  if (CPP_OPTION (pfile, cplusplus)
>> +   && CPP_OPTION (pfile, lang) > CLK_CXX11
>> +   && (!memcmp (s, "i", orig_len)
>> +   || !memcmp (s, "if", orig_len)
>> +   || !memcmp (s, "il", orig_len)))
>
> Similarly.  Additionally, "if" can't happen here, because we don't allow 'f'
> among suffixes.
>
> So, ok for trunk if it passes testing, or some other form thereof?
>
> 2017-12-05  Jakub Jelinek  
>
> PR c++/79228
> * expr.c (interpret_float_suffix): Avoid memcmp.
> (interpret_int_suffix): Likewise.  Don't check for if.
>
> --- libcpp/expr.c.jj2017-12-01 22:13:24.0 +0100
> +++ libcpp/expr.c   2017-12-05 13:26:57.019683785 +0100
> @@ -280,9 +280,10 @@ interpret_float_suffix (cpp_reader *pfil
>  them as user-defined literals.  */
>if (CPP_OPTION (pfile, cplusplus)
>   && CPP_OPTION (pfile, lang) > CLK_CXX11
> - && (!memcmp (orig_s, "i", orig_len)
> - || !memcmp (orig_s, "if", orig_len)
> - || !memcmp (orig_s, "il", orig_len)))
> + && orig_s[0] == 'i'
> + && (orig_len == 1
> + || (orig_len == 2
> + && (orig_s[1] == 'f' || orig_s[1] == 'l'
> return 0;
>  }
>
> @@ -345,9 +346,8 @@ interpret_int_suffix (cpp_reader *pfile,
>  them as user-defined literals.  */
>if (CPP_OPTION (pfile, cplusplus)
>   && CPP_OPTION (pfile, lang) > CLK_CXX11
> - && (!memcmp (s, "i", orig_len)
> - || !memcmp (s, "if", orig_len)
> - || !memcmp (s, "il", orig_len)))
> + && s[0] == 'i'
> + && (orig_len == 1 || (orig_len == 2 && s[1] == 'l')))
> return 0;
>  }
>
>
>
> Jakub

[AArch64] Fix some define_insn_and_split conditions

2017-12-05 Thread Richard Sandiford

The split conditions for aarch64_simd_bsldi_internal and
aarch64_simd_bsldi_alt were:

  "&& GP_REGNUM_P (REGNO (operands[0]))"

But since they (deliberately) can be split before reload, the operand
matched by register_operand can be a SUBREG rather than a REG.  This
triggered a boostrap failure building libgcc with rtl checking enabled.

While checking other define_insn_and_splits for the same thing,
I noticed a couple of SIMD ones were missing the leading "&&",
meaning that they would trigger even without TARGET_SIMD.  That
shouldn't matter in practice, since combine should never end up
generating matching rtl, but...

Tested on aarch64-linux-gnu.  OK to install?

Thanks,
Richard


2017-12-05  Richard Sandiford  

gcc/
* config/aarch64/aarch64-simd.md (aarch64_simd_bsldi_internal)
(aarch64_simd_bsldi_alt): Check REG_P before GP_REGNUM_P.
(aarch64_cmdi, aarch64_cmtstdi): Add leading "&&" to
split condition.

Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  2017-12-05 14:24:52.474015293 +
+++ gcc/config/aarch64/aarch64-simd.md  2017-12-05 14:25:28.128170629 +
@@ -2484,7 +2484,7 @@ (define_insn_and_split "aarch64_simd_bsl
   bit\\t%0.8b, %2.8b, %1.8b
   bif\\t%0.8b, %3.8b, %1.8b
   #"
-  "&& GP_REGNUM_P (REGNO (operands[0]))"
+  "&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(match_dup 1) (match_dup 1) (match_dup 2) (match_dup 3)]
 {
   /* Split back to individual operations.  If we're before reload, and
@@ -2526,7 +2526,7 @@ (define_insn_and_split "aarch64_simd_bsl
   bit\\t%0.8b, %3.8b, %1.8b
   bif\\t%0.8b, %2.8b, %1.8b
   #"
-  "&& GP_REGNUM_P (REGNO (operands[0]))"
+  "&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(match_dup 0) (match_dup 1) (match_dup 2) (match_dup 3)]
 {
   /* Split back to individual operations.  If we're before reload, and
@@ -4453,7 +4453,7 @@ (define_insn_and_split "aarch64_cm

Re: [PATCH] gcc: xtensa: enable address sanitizer

2017-12-05 Thread Max Filippov

On Mon, Dec 4, 2017 at 9:37 PM, augustine.sterl...@gmail.com
 wrote:
> On Mon, Dec 4, 2017 at 1:28 PM, Max Filippov  wrote:
>> gcc/
>> 2017-12-04  Max Filippov  
>>
>> * config/xtensa/xtensa.c (xtensa_asan_shadow_offset): New
>> function.
>> (TARGET_ASAN_SHADOW_OFFSET): New macro definition.
>> * config/xtensa/xtensa.h (FRAME_GROWS_DOWNWARD): Set to 1 if
>> ASAN is enabled.
>
> This is OK.

Thanks! Applied to trunk and backported to gcc-7-branch.

-- Max

Re: [AArch64] Fix some define_insn_and_split conditions

2017-12-05 Thread James Greenhalgh

On Tue, Dec 05, 2017 at 02:28:56PM +, Richard Sandiford wrote:
> The split conditions for aarch64_simd_bsldi_internal and
> aarch64_simd_bsldi_alt were:
> 
>   "&& GP_REGNUM_P (REGNO (operands[0]))"
> 
> But since they (deliberately) can be split before reload, the operand
> matched by register_operand can be a SUBREG rather than a REG.  This
> triggered a boostrap failure building libgcc with rtl checking enabled.
> 
> While checking other define_insn_and_splits for the same thing,
> I noticed a couple of SIMD ones were missing the leading "&&",
> meaning that they would trigger even without TARGET_SIMD.  That
> shouldn't matter in practice, since combine should never end up
> generating matching rtl, but...
> 
> Tested on aarch64-linux-gnu.  OK to install?

OK.

Thanks for fixing my mistake!

James

> 2017-12-05  Richard Sandiford  
> 
> gcc/
>   * config/aarch64/aarch64-simd.md (aarch64_simd_bsldi_internal)
>   (aarch64_simd_bsldi_alt): Check REG_P before GP_REGNUM_P.
>   (aarch64_cmdi, aarch64_cmtstdi): Add leading "&&" to
>   split condition.
>

Re: [Patch][aarch64] Add missing thunderx2-t99 instruction scheduling pipeline descriptions.

2017-12-05 Thread James Greenhalgh

On Mon, Dec 04, 2017 at 05:33:37PM +, Steve Ellcey wrote:
> On Mon, 2017-12-04 at 17:18 +, Kyrill Tkachov wrote:
> 
> > > +(define_insn_reservation "thunderx2t99_multiple" 1
> > > +  (and (eq_attr "tune" "thunderx2t99")
> > > +   (eq_attr "type" "multiple"))
> > > +  "thunderx2t99_i0+thunderx2t99_i1+thunderx2t99_i2+thunderx2t99_ls
> > > 0+thunderx2t99_ls1+thunderx2t99_sd+thunderx2t99_i1m1+thunderx2t99_i
> > > 1m2+thunderx2t99_i1m3+thunderx2t99_ls0d1+thunderx2t99_ls0d2+thunder
> > > x2t99_ls0d3+thunderx2t99_ls1d1+thunderx2t99_ls1d2+thunderx2t99_ls1d
> > > 3+thunderx2t99_f0+thunderx2t99_f1")
> > > +
> > We try to adhere to the 80 columns per line rule in the scheduling 
> > description files as well,
> > so can you please use "\" to break this into multiple lines.
> 
> Yes, I wasn't sure if whatever program parses this file would honor
> backslashes so I didn't break it up.  The falkor_extra definition in
> falkor.md that I looked at is more than 80 characters and that is one
> of the reasons I wasn't sure backslashes would be honored.  But I see
> other places (power8.md) where the backslashes are used so I will make
> that change if and when the patch is approved.

As long as it bootstraps I'm happy to take it - I'm sure you're better placed
to spot and understand regressions in thunderx2t99 performance than I am.

OK

Thanks,
James

[PATCH] Fix PR83277

2017-12-05 Thread Richard Biener


The following fixes a code-gen error in GRAPHITE.  liveout handling
is still somewhat flaky and the easiest fix is to force code generation
of defs where ISL scheduled them (rather than at the PHI edge src).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-12-05  Richard Biener  

PR tree-optimization/83277
* graphite-isl-ast-to-gimple.c (should_copy_to_new_region): Make sure
to code-gen liveout vars.

* gcc.dg/graphite/pr83277.c: New testcase.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 255402)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -1137,8 +1137,10 @@ should_copy_to_new_region (gimple *stmt,
   if (is_gimple_assign (stmt)
   && (lhs = gimple_assign_lhs (stmt))
   && TREE_CODE (lhs) == SSA_NAME
-  && is_gimple_reg (lhs)
-  && scev_analyzable_p (lhs, region->region))
+  && scev_analyzable_p (lhs, region->region)
+  /* But to code-generate liveouts - liveout PHI generation is
+ in generic sese.c code that cannot do code generation.  */
+  && ! bitmap_bit_p (region->liveout, SSA_NAME_VERSION (lhs)))
 return false;
 
   return true;
Index: gcc/testsuite/gcc.dg/graphite/pr83277.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83277.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr83277.c (working copy)
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int rk, si = 0;
+int jr[2];
+
+int
+wv (signed char n8)
+{
+  const int tw = 8;
+  int xq[tw];
+  int bj, pu = 0;
+
+  for (bj = 0; bj < tw; ++bj)
+xq[bj] = 0;
+
+  bj = 0;
+  while (bj < 1)
+{
+  int gs = n8 ^ 128;
+
+  if (gs != 0)
+   {
+ int u7[3];
+
+ while (bj < 2)
+   {
+ u7[bj] = 0;
+ ++bj;
+   }
+
+ jr[0] = u7[0];
+ rk = xq[0];
+ pu = n8;
+
+ if (si != 0)
+   return si;
+   }
+}
+
+  return pu;
+}
+
+int
+main (void)
+{
+  signed char ax = 1;
+
+  if (wv (ax) != ax)
+__builtin_abort ();
+  return 0;
+}

Re: libstdc++ PATCH to harmonize noexcept

2017-12-05 Thread Jonathan Wakely


On 01/12/17 15:32 +, Jonathan Wakely wrote:

On 14/11/17 13:56 -0500, Jason Merrill wrote:

While working on an unrelated issue I noticed that the compiler didn't
like some of these declarations after preprocessing, when they aren't
protected by system-header permissiveness.

I thought about limiting the permissiveness to only extern "C"
functions, but I believe that system headers are adding more C++
declarations, so we'd likely run into this issue again.

Shouldn't we build libstdc++ with -Wsystem-headers?  Maybe along with
-Wno-error=system-headers?

Jonathan approved these changes elsewhere.

Jason



commit abe66184d116f85f10108191b081f48dd0cfe3aa
Author: Jason Merrill 
Date:   Tue Nov 14 13:48:57 2017 -0500

  Correct noexcept mismatch in declarations.

  * include/bits/fs_ops.h (permissions): Add noexcept.
  * include/bits/fs_fwd.h (copy, copy_file): Remove noexcept.
  (permissions): Add noexcept.
  * include/std/string_view (find_first_of): Add noexcept.



There's another one needed too:

--- a/libstdc++-v3/libsupc++/eh_throw.cc
+++ b/libstdc++-v3/libsupc++/eh_throw.cc
@@ -53,8 +53,10 @@ __gxx_exception_cleanup (_Unwind_Reason_Code code, 
_Unwind_Exception *exc)
}

extern "C" __cxa_refcounted_exception*
-__cxxabiv1::__cxa_init_primary_exception(void *obj, std::type_info *tinfo,
- void (_GLIBCXX_CDTOR_CALLABI *dest) 
(void *))
+__cxxabiv1::
+__cxa_init_primary_exception(void *obj, std::type_info *tinfo,
+void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))
+_GLIBCXX_NOTHROW
{
 __cxa_refcounted_exception *header
   = __get_refcounted_exception_header_from_obj (obj);


Tested powerpc64le-linux and committed to trunk.

commit ce8d32a9d091879df6b3435a8c2ba83cea2dd47e
Author: Jonathan Wakely 
Date:   Tue Dec 5 14:50:52 2017 +

Correct noexcept mismatch in declarations.

2017-12-05  Jason Merrill  
Jonathan Wakely  

* include/bits/fs_fwd.h (copy, copy_file): Remove noexcept.
(permissions): Add noexcept.
* include/bits/fs_ops.h (permissions): Add noexcept.
* libsupc++/eh_throw.cc (__cxa_init_primary_exception): Add
_GLIBCXX_NOTHROW.

diff --git a/libstdc++-v3/include/bits/fs_fwd.h b/libstdc++-v3/include/bits/fs_fwd.h
index f408a39b974..66b0d20f027 100644
--- a/libstdc++-v3/include/bits/fs_fwd.h
+++ b/libstdc++-v3/include/bits/fs_fwd.h
@@ -300,11 +300,11 @@ _GLIBCXX_END_NAMESPACE_CXX11
 
   void copy(const path& __from, const path& __to, copy_options __options);
   void copy(const path& __from, const path& __to, copy_options __options,
-	error_code&) noexcept;
+	error_code&);
 
   bool copy_file(const path& __from, const path& __to, copy_options __option);
   bool copy_file(const path& __from, const path& __to, copy_options __option,
-		 error_code&) noexcept;
+		 error_code&);
 
   path current_path();
 
@@ -319,7 +319,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   file_time_type last_write_time(const path&);
   file_time_type last_write_time(const path&, error_code&) noexcept;
 
-  void permissions(const path&, perms, perm_options, error_code&);
+  void permissions(const path&, perms, perm_options, error_code&) noexcept;
 
   path proximate(const path& __p, const path& __base, error_code& __ec);
   path proximate(const path& __p, const path& __base, error_code& __ec);
diff --git a/libstdc++-v3/include/bits/fs_ops.h b/libstdc++-v3/include/bits/fs_ops.h
index 075d61e2a63..22422bd1f4d 100644
--- a/libstdc++-v3/include/bits/fs_ops.h
+++ b/libstdc++-v3/include/bits/fs_ops.h
@@ -253,7 +253,7 @@ namespace filesystem
 
   void
   permissions(const path& __p, perms __prms, perm_options __opts,
-	  error_code& __ec);
+	  error_code& __ec) noexcept;
 
   inline path proximate(const path& __p, error_code& __ec)
   { return proximate(__p, current_path(), __ec); }
diff --git a/libstdc++-v3/libsupc++/eh_throw.cc b/libstdc++-v3/libsupc++/eh_throw.cc
index 13428d92da7..daf134993d3 100644
--- a/libstdc++-v3/libsupc++/eh_throw.cc
+++ b/libstdc++-v3/libsupc++/eh_throw.cc
@@ -53,8 +53,10 @@ __gxx_exception_cleanup (_Unwind_Reason_Code code, _Unwind_Exception *exc)
 }
 
 extern "C" __cxa_refcounted_exception*
-__cxxabiv1::__cxa_init_primary_exception(void *obj, std::type_info *tinfo,
- void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))
+__cxxabiv1::
+__cxa_init_primary_exception(void *obj, std::type_info *tinfo,
+			 void (_GLIBCXX_CDTOR_CALLABI *dest) (void *))
+_GLIBCXX_NOTHROW
 {
   __cxa_refcounted_exception *header
 = __get_refcounted_exception_header_from_obj (obj);

[committed] Fix avx256-unaligned* testcases (PR testsuite/83289)

2017-12-05 Thread Jakub Jelinek

Hi!

As Segher mentioned, the -dp printed alternative numbers used to be one
based, but now are zero based.  This adjusts all the -dp testcases that
have /[1-9] regexps.

Tested on x86_64-linux and i686-linux, committed to trunk.

2017-12-05  Jakub Jelinek  

PR testsuite/83289
* gcc.target/i386/avx256-unaligned-load-1.c: Adjust for -dp
alternative numbers being 0 based instead of former 1 based.
* gcc.target/i386/avx256-unaligned-store-1.c: Likewise.
* gcc.target/i386/avx256-unaligned-store-2.c: Likewise.
* gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
* gcc.target/i386/avx256-unaligned-store-4.c: Likewise.
* gcc.target/i386/sse2-init-v2di-2.c: Likewise.

--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-1.c.jj  2016-05-22 
12:20:08.0 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-1.c 2017-12-05 
17:10:59.760515267 +0100
@@ -14,6 +14,6 @@ avx_test (void)
 c[i] = a[i] * b[i+3];
 }
 
-/* { dg-final { scan-assembler-not "vmovups\[^\n\r]*movv8sf_internal/3" } } */
-/* { dg-final { scan-assembler "movv4sf_internal/3" } } */
+/* { dg-final { scan-assembler-not "vmovups\[^\n\r]*movv8sf_internal/2" } } */
+/* { dg-final { scan-assembler "movv4sf_internal/2" } } */
 /* { dg-final { scan-assembler "vinsertf128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-1.c.jj 2016-05-22 
12:20:29.0 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-1.c2017-12-05 
17:08:43.835187190 +0100
@@ -17,6 +17,6 @@ avx_test (void)
 d[i] = c[i] * 20.0;
 }
 
-/* { dg-final { scan-assembler-not "vmovups.*movv8sf_internal/4" } } */
-/* { dg-final { scan-assembler "vmovups.*movv4sf_internal/4" } } */
+/* { dg-final { scan-assembler-not "vmovups.*movv8sf_internal/3" } } */
+/* { dg-final { scan-assembler "vmovups.*movv4sf_internal/3" } } */
 /* { dg-final { scan-assembler "vextractf128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-2.c.jj 2016-05-02 
09:22:07.0 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-2.c2017-12-05 
17:08:58.182010720 +0100
@@ -23,6 +23,6 @@ avx_test (void)
 }
 }
 
-/* { dg-final { scan-assembler-not "vmovups.*movv32qi_internal/4" } } */
-/* { dg-final { scan-assembler "vmovups.*movv16qi_internal/4" } } */
+/* { dg-final { scan-assembler-not "vmovups.*movv32qi_internal/3" } } */
+/* { dg-final { scan-assembler "vmovups.*movv16qi_internal/3" } } */
 /* { dg-final { scan-assembler "vextract.128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c.jj 2016-05-22 
12:20:05.0 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c2017-12-05 
17:09:12.409835713 +0100
@@ -17,6 +17,6 @@ avx_test (void)
 d[i] = c[i] * 20.0;
 }
 
-/* { dg-final { scan-assembler-not "vmovups.*movv4df_internal/4" } } */
-/* { dg-final { scan-assembler "vmovups.*movv2df_internal/4" } } */
+/* { dg-final { scan-assembler-not "vmovups.*movv4df_internal/3" } } */
+/* { dg-final { scan-assembler "vmovups.*movv2df_internal/3" } } */
 /* { dg-final { scan-assembler "vextractf128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-4.c.jj 2016-05-22 
12:20:09.0 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-4.c2017-12-05 
17:09:23.989693278 +0100
@@ -14,6 +14,6 @@ avx_test (void)
 b[i+3] = a[i] * c[i];
 }
 
-/* { dg-final { scan-assembler "vmovups.*movv8sf_internal/4" } } */
-/* { dg-final { scan-assembler-not "movups.*movv4sf_internal/4" } } */
+/* { dg-final { scan-assembler "vmovups.*movv8sf_internal/3" } } */
+/* { dg-final { scan-assembler-not "movups.*movv4sf_internal/3" } } */
 /* { dg-final { scan-assembler-not "vextractf128" } } */
--- gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c.jj 2016-06-02 
11:43:06.0 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-init-v2di-2.c2017-12-05 
17:07:46.449893047 +0100
@@ -10,4 +10,4 @@ test (long long b)
   return _mm_cvtsi64_si128 (b); 
 }
 
-/* { dg-final { scan-assembler-times "vec_concatv2di/5" 1 } } */
+/* { dg-final { scan-assembler-times "vec_concatv2di/4" 1 } } */

Jakub

[Patch ARM] Fix probe_stack constraint.

2017-12-05 Thread Ramana Radhakrishnan

The probe_stack pattern uses r0 as a fixed register. This can cause 
issues if we have auto-increment instructions coming out that have r0 as 
the base register.


Tested with a bootstrap and regression run. richi reports that the 
original issue was fixed in the run. I did consider whether 
probe_stack_range was affected but it all comes back to probe_stack 
pattern so I think we are ok.


I don't have a testcase that seems to provoke this but it seems to be 
default on most distributions so I'm expecting the testcoverage to come 
from there.


Applied to trunk.

Ramana

PR target/82248

* config/arm/arm.md (probe_stack) : Change to using the 'o' constraint.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 545ee257699..d60c5af551c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8612,8 +8612,11 @@
(set_attr "type" "block")]
 )
 
+;; Since we hard code r0 here use the 'o' constraint to prevent
+;; provoking undefined behaviour in the hardware with putting out
+;; auto-increment operations with potentially r0 as the base register.
 (define_insn "probe_stack"
-  [(set (match_operand:SI 0 "memory_operand" "=m")
+  [(set (match_operand:SI 0 "memory_operand" "=o")
 (unspec:SI [(const_int 0)] UNSPEC_PROBE_STACK))]
   "TARGET_32BIT"
   "str%?\\tr0, %0"

Re: [PATCH] enhance documentation of -Wstringop-truncation

2017-12-05 Thread Jeff Law

On 12/03/2017 04:55 PM, Martin Sebor wrote:
> It was suggested to me that it might be helpful to mention
> attribute nonstring as a solution to -Wstringop-truncation
> warnings.  I think it's a good idea to add a reference to
> the attribute to the manual.  The attached patch does that.
> 
> Since the attribute is suitable only under rare conditions
> suggesting it in the text of the diagnostics (e.g., in the
> form of a note) would unlikely be useful (and, if acted on,
> could even lead to other kinds of warnings) so I opted not
> to make such a change.
> 
> Martin
> 
> wstringop-trunc-nonstring.diff
> 
> 
> 2017-12-03  Martin Sebor  
> 
>   * doc/invoke.texi (-Wstringop-truncation): Mention attribute nonstring.
OK.
jeff

Re: [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better

2017-12-05 Thread Jeff Law

On 12/04/2017 02:25 PM, Michael Matz wrote:
> Hi,
> 
> On Thu, 30 Nov 2017, Jeff Law wrote:
> 
>> And after PHI propagation we have:
>>   # m_5 = PHI <10(5), 12(6), 14(7)>
>>   # _2 = PHI <10(5), 12(6), 14(7)>
>>   # _3 = PHI <320(5), 384(6), 448(7)>
>> :
>>   goto ; [100.00%]
>>
>> DCE will come along and wipe out m_5 and _2 if they are unused.
> 
> When I did something along these lines a long time ago I had to be a bit 
> careful in not regressing performance.  Every PHI node with constants will 
> generate N instructions (with N the arity), there's no coalescing 
> possible.  And if the feeding PHI nodes don't go away it increases 
> register pressure by (at least) one.
Yup.  I ran into similar problems with a different patch in this general
space.  It's actually fairly easy to restrict to single use cases if we
wanted which should result in never making things worse at the expense
of sometimes not making it better when it could.

> 
> (If I read the patch correctly you don't handle the situation of "x op y" 
> where both arguments come from PHI nodes, so it's probably not an issue 
> for you, but I thought I'd mention it)
True, I don't handle x op y where both come from PHIs.  It's just X OP
CONST where X comes from a PHI with all constant arguments.

jeff

ps.  Ironically I was wandering through the regression list yesterday
and stumbled on a BZ which *may* be helped by this code.  I still need
to dig deeper though.

Re: [035/nnn] poly_int: expand_debug_expr

2017-12-05 Thread Jeff Law

On 10/23/2017 11:14 AM, Richard Sandiford wrote:
> This patch makes expand_debug_expr track polynomial memory offsets.
> It simplifies the handling of the case in which the reference is not
> to the first byte of the base, which seemed non-trivial enough to
> make it worth splitting out as a separate patch.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree.h (get_inner_reference): Add a version that returns the
>   offset and size as poly_int64_pods rather than HOST_WIDE_INTs.
>   * cfgexpand.c (expand_debug_expr): Track polynomial offsets.  Simply
>   the case in which bitpos is not associated with the first byte.
OK.
jeff

Re: [041/nnn] poly_int: reload.c

2017-12-05 Thread Jeff Law

On 10/23/2017 11:18 AM, Richard Sandiford wrote:
> This patch makes a few small poly_int64 changes to reload.c,
> such as in the "decomposition" structure.  In practice, any
> port with polynomial-sized modes should be using LRA rather
> than reload, but it's easier to convert reload anyway than
> to sprinkle to_constants everywhere.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * reload.h (reload::inc): Change from an int to a poly_int64_pod.
>   * reload.c (combine_reloads, debug_reload_to_stream): Likewise.
>   (decomposition): Change start and end from HOST_WIDE_INT
>   to poly_int64_pod.
>   (decompose, immune_p): Update accordingly.
>   (find_inc_amount): Return a poly_int64 rather than an int.
>   * reload1.c (inc_for_reload): Take the inc_amount as a poly_int64
>   rather than an int.
OK.
jeff

Re: [PATCH] PR ada/66205 gnatbind generates invalid code when finalization is enabled in restricted runtime

2017-12-05 Thread Simon Wright

On 7 Mar 2017, at 16:20, Simon Wright  wrote:
> 
> On 19 Dec 2015, at 22:05, Simon Wright  wrote:
>> 
>> On 12 Nov 2015, at 10:02, Arnaud Charlet  wrote:
>>> 
> This situation arises, for example, with an embedded RTS that
> incorporates the
> Ada 2012 generalized container iterators.
 
 I should add, this PR is the ???other half??? of PR ada/66242, which is 
 fixed
 in GCC 6; so please can it be reviewed?
>>> 
>>> The proper patch for PR ada/66242 hasn't been committed yet (it's pending),
>>> so I'd rather review the situation once PR ada/66242 is dealt with.
>>> 
>>> I'm not convinced at all that your patch is the way to go, so I'd rather
>>> consider it only after PR ada/66242 is solved properly.
>> 
>> Looks as though PR ada/66242 has been sorted out.
>> 
>> Since we can now *compile* code that is built with finalization enabled in a 
>> restricted runtime, but we can't *bind* it, could we take another look at 
>> this? the patch I provided in this thread still applies at snapshot 20151213 
>> with minor offsets (8).
> 
> Same problem exists in gcc version 7.0.1 20170302 (experimental) (GCC).

and with gcc 8.0.0 20171102 (experimental) (r254339); and there's been no 
change to the affected file (bindgen.adb) since then.

I've come up with a considerably simpler patch, which merely causes the 
procedure adafinal to be generated with a null body if the restriction 
No_Task_Termination is set (note, this restriction is part of the Ravenscar 
profile; if tasks can't terminate, program level finalization can't happen [ARM 
10.2(25), "When the environment task completes (normally or abnormally), it 
waits for the termination of all such tasks, and then finalizes any remaining 
objects of the partition."]

I've bootstrapped the compiler (x86_64-apple-darwin15), and "make check-ada" 
produces the same results with and without the patch (the same 3 FAILs occur in 
both, in gnat.sum). For the arm-eabi compiler, I successfully make and run 
builds for an STM32F4 target without exception propagation but with and without 
finalization.

gcc/ada/Changelog:

2017-12-05  Simon Wright 

 PR ada/66205
  * bindgen.adb (Gen_AdaFinal): If the restriction
 No_Task_Termination is present, generate a null body.



bindgen.adb.diff
Description: Binary data

Re: [042/nnn] poly_int: reload1.c

2017-12-05 Thread Jeff Law

On 10/23/2017 11:18 AM, Richard Sandiford wrote:
> This patch makes a few small poly_int64 changes to reload1.c,
> mostly related to eliminations.  Again, there's no real expectation
> that reload will be used for targets that have polynomial-sized modes,
> but it seemed easier to convert it anyway.
And since all the routines are "free", there's nothing that really
prevents them from being called from elsewhere.  So better to go ahead
and convert 'em.

If someone were to look at a refactoring project, pulling all the static
objects into a class and using that to drive turning the free functions
into methods would help give us better isolation of this code.

One could argue we should do this across the board to cut down on the
amount of globals we continue to access and make it clearer what
routines need those globals vs which are truly free standing functions.


> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * reload1.c (elim_table): Change initial_offset, offset and
>   previous_offset from HOST_WIDE_INT to poly_int64_pod.
>   (offsets_at): Change the target array's element type from
>   HOST_WIDE_INT to poly_int64_pod.
>   (set_label_offsets, eliminate_regs_1, eliminate_regs_in_insn)
>   (elimination_costs_in_insn, update_eliminable_offsets)
>   (verify_initial_elim_offsets, set_offsets_for_label)
>   (init_eliminable_invariants): Update after above changes.
OK.
jeff

Re: [052/nnn] poly_int: bit_field_size/offset

2017-12-05 Thread Jeff Law

On 10/23/2017 11:22 AM, Richard Sandiford wrote:
> verify_expr ensured that the size and offset in gimple BIT_FIELD_REFs
> satisfied tree_fits_uhwi_p.  This patch extends that so that they can
> be poly_uint64s, and adds helper routines for accessing them when the
> verify_expr requirements apply.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree.h (bit_field_size, bit_field_offset): New functions.
>   * hsa-gen.c (gen_hsa_addr): Use them.
>   * tree-ssa-forwprop.c (simplify_bitfield_ref): Likewise.
>   (simplify_vector_constructor): Likewise.
>   * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>   * tree-cfg.c (verify_expr): Require the sizes and offsets of a
>   BIT_FIELD_REF to be poly_uint64s rather than uhwis.
>   * fold-const.c (fold_ternary_loc): Protect tree_to_uhwi with
>   tree_fits_uhwi_p.
> 
OK.
jeff

Re: [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use

2017-12-05 Thread Jeff Law

On 10/23/2017 11:24 AM, Richard Sandiford wrote:
> This patch makes ivopts handle polynomial address offsets
> when recording potential IV uses.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-ssa-loop-ivopts.c (iv_use::addr_offset): Change from
>   an unsigned HOST_WIDE_INT to a poly_uint64_pod.
>   (group_compare_offset): Update accordingly.
>   (split_small_address_groups_p): Likewise.
>   (record_use): Take addr_offset as a poly_uint64 rather than
>   an unsigned HOST_WIDE_INT.
>   (strip_offset): Return the offset as a poly_uint64 rather than
>   an unsigned HOST_WIDE_INT.
>   (record_group_use, split_address_groups): Track polynomial offsets.
>   (add_iv_candidate_for_use): Likewise.
>   (addr_offset_valid_p): Take the offset as a poly_int64 rather
>   than a HOST_WIDE_INT.
>   (strip_offset_1): Return the offset as a poly_int64 rather than
>   a HOST_WIDE_INT.
OK.
jeff

[PR C++/83287] Mark lookup for keeping

2017-12-05 Thread Nathan Sidwell

This fixes 83287, a case where we failed to mark a lookup occuring 
inside a template definition as being kept for instantiation.  It turns 
out that CAST_EXPR's single argument is a TREE_LIST, so we need to check 
it for OVERLOADS.  CAST_EXPR behaves this way because it's modelling a 
function call. AFAICT it is the only node of this form.


applying to trunk.

nathan
--
Nathan Sidwell
Index: cp/tree.c
===
--- cp/tree.c	(revision 255420)
+++ cp/tree.c	(working copy)
@@ -3230,6 +3230,13 @@ build_min (enum tree_code code, tree tt,
 }
 
   va_end (p);
+
+  if (code == CAST_EXPR)
+/* The single operand is a TREE_LIST, which we have to check.  */
+for (tree v = TREE_OPERAND (t, 0); v; v = TREE_CHAIN (v))
+  if (TREE_CODE (TREE_VALUE (v)) == OVERLOAD)
+	lookup_keep (TREE_VALUE (v), true);
+
   return t;
 }
 
Index: testsuite/g++.dg/lookup/pr83287.C
===
--- testsuite/g++.dg/lookup/pr83287.C	(revision 0)
+++ testsuite/g++.dg/lookup/pr83287.C	(working copy)
@@ -0,0 +1,19 @@
+// PR c++/83287 failed to keep lookup until instantiation time
+
+void foo ();
+
+namespace {
+  void foo (int);
+}
+
+template 
+void bar ()
+{
+  T (*p)() = (T (*)(void)) foo;
+}
+
+void
+baz ()
+{
+  bar ();
+}

Re: [060/nnn] poly_int: loop versioning threshold

2017-12-05 Thread Jeff Law

On 10/23/2017 11:25 AM, Richard Sandiford wrote:
> This patch splits the loop versioning threshold out from the
> cost model threshold so that the former can become a poly_uint64.
> We still use a single test to enforce both limits where possible.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold
>   field.
>   (LOOP_VINFO_VERSIONING_THRESHOLD): New macro
>   (vect_loop_versioning): Take the loop versioning threshold as a
>   separate parameter.
>   * tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>   versioning_threshold.
>   (vect_analyze_loop_2): Compute the loop versioning threshold
>   whenever loop versioning is needed, and store it in the new
>   field rather than combining it with the cost model threshold.
>   (vect_transform_loop): Update call to vect_loop_versioning.
>   Try to combine the loop versioning and cost thresholds here.
So you dropped the tests for PEELING_FOR_GAPS and PEELING_FOR_NITER in
vect_analyze_loop_2.  Was that intentional?

Otherwise it looks fine.  If the drop was intentional, then OK as-is.

jeff

Re: [062/nnn] poly_int: prune_runtime_alias_test_list

2017-12-05 Thread Jeff Law

On 10/23/2017 11:25 AM, Richard Sandiford wrote:
> This patch makes prune_runtime_alias_test_list take the iteration
> factor as a poly_int and tracks polynomial offsets internally
> as well.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-data-ref.h (prune_runtime_alias_test_list): Take the
>   factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
>   * tree-data-ref.c (prune_runtime_alias_test_list): Likewise.
>   Track polynomial offsets.
This is OK.  Note that both Richi and Bin have been pretty active in
this general area.  So adjustments may be needed.


Jeff

Re: [065/nnn] poly_int: vect_nunits_for_cost

2017-12-05 Thread Jeff Law

On 10/23/2017 11:27 AM, Richard Sandiford wrote:
> This patch adds a function for getting the number of elements in
> a vector for cost purposes, which is always constant.  It makes
> it possible for a later patch to change GET_MODE_NUNITS and
> TYPE_VECTOR_SUBPARTS to a poly_int.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vectorizer.h (vect_nunits_for_cost): New function.
>   * tree-vect-loop.c (vect_model_reduction_cost): Use it.
>   * tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise.
>   (vect_analyze_slp_cost): Likewise.
>   * tree-vect-stmts.c (vect_model_store_cost): Likewise.
>   (vect_model_load_cost): Likewise.
OK.
jeff

Re: [066/nnn] poly_int: omp_max_vf

2017-12-05 Thread Jeff Law

On 10/23/2017 11:27 AM, Richard Sandiford wrote:
> This patch makes omp_max_vf return a polynomial vectorization factor.
> We then need to be able to stash a polynomial value in
> OMP_CLAUSE_SAFELEN_EXPR too:
> 
>/* If max_vf is non-zero, then we can use only a vectorization factor
>   up to the max_vf we chose.  So stick it into the safelen clause.  */
> 
> For now the cfgloop safelen is still constant though.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * omp-general.h (omp_max_vf): Return a poly_uint64 instead of an int.
>   * omp-general.c (omp_max_vf): Likewise.
>   * omp-expand.c (omp_adjust_chunk_size): Update call to omp_max_vf.
>   (expand_omp_simd): Handle polynomial safelen.
>   * omp-low.c (omplow_simd_context): Add a default constructor.
>   (omplow_simd_context::max_vf): Change from int to poly_uint64.
>   (lower_rec_simd_input_clauses): Update accordingly.
>   (lower_rec_input_clauses): Likewise.
OK.
jeff

Re: [064/nnn] poly_int: SLP max_units

2017-12-05 Thread Jeff Law

On 10/23/2017 11:26 AM, Richard Sandiford wrote:
> This match makes tree-vect-slp.c track the maximum number of vector
> units as a poly_uint64 rather than an unsigned int.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-slp.c (vect_record_max_nunits, vect_build_slp_tree_1)
>   (vect_build_slp_tree_2, vect_build_slp_tree): Change max_nunits
>   from an unsigned int * to a poly_uint64_pod *.
>   (calculate_unrolling_factor): New function.
>   (vect_analyze_slp_instance): Use it.  Track polynomial max_nunits.
OK.

jeff

Re: [071/nnn] poly_int: vectorizable_induction

2017-12-05 Thread Jeff Law

On 10/23/2017 11:29 AM, Richard Sandiford wrote:
> This patch makes vectorizable_induction cope with variable-length
> vectors.  For now we punt on SLP inductions, but patchees after
> the main SVE submission add support for those too.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-loop.c (vectorizable_induction): Treat the number
>   of units as polynomial.  Punt on SLP inductions.  Use an integer
>   VEC_SERIES_EXPR for variable-length integer reductions.  Use a
>   cast of such a series for variable-length floating-point
>   reductions.
OK.
jeff

Re: [079/nnn] poly_int: vect_no_alias_p

2017-12-05 Thread Jeff Law

On 10/23/2017 11:32 AM, Richard Sandiford wrote:
> This patch replaces the two-state vect_no_alias_p with a three-state
> vect_compile_time_alias that handles polynomial segment lengths.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-data-refs.c (vect_no_alias_p): Replace with...
>   (vect_compile_time_alias): ...this new function.  Do the calculation
>   on poly_ints rather than trees.
>   (vect_prune_runtime_alias_test_list): Update call accordingly.
OK.
jeff

Re: [080/nnn] poly_int: tree-vect-generic.c

2017-12-05 Thread Jeff Law

On 10/23/2017 11:32 AM, Richard Sandiford wrote:
> This patch makes tree-vect-generic.c cope with variable-length vectors.
> Decomposition is only supported for constant-length vectors, since we
> should never generate unsupported variable-length operations.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-generic.c (nunits_for_known_piecewise_op): New function.
>   (expand_vector_piecewise): Use it instead of TYPE_VECTOR_SUBPARTS.
>   (expand_vector_addition, add_rshift, expand_vector_divmod): Likewise.
>   (expand_vector_condition, vector_element): Likewise.
>   (subparts_gt): New function.
>   (get_compute_type): Use subparts_gt.
>   (count_type_subparts): Delete.
>   (expand_vector_operations_1): Use subparts_gt instead of
>   count_type_subparts.
OK.
jeff

Re: [076/nnn] poly_int: vectorizable_conversion

2017-12-05 Thread Jeff Law

On 11/28/2017 11:09 AM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 10/23/2017 11:30 AM, Richard Sandiford wrote:
>>> This patch makes vectorizable_conversion cope with variable-length
>>> vectors.  We already require the number of elements in one vector
>>> to be a multiple of the number of elements in the other vector,
>>> so the patch uses that to choose between widening and narrowing.
>>>
>>>
>>> 2017-10-23  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * tree-vect-stmts.c (vectorizable_conversion): Treat the number
>>> of units as polynomial.  Choose between WIDE and NARROW based
>>> on multiple_p.
>> If I'm reding this right, if nunits_in < nunits_out, but the latter is
>> not a multiple of the former, we'll choose WIDEN, which is the opposite
>> of what we'd do before this patch.  Was that intentional?
> 
> That case isn't possible, so we'd assert:
> 
>   if (must_eq (nunits_out, nunits_in))
> modifier = NONE;
>   else if (multiple_p (nunits_out, nunits_in))
> modifier = NARROW;
>   else
> {
>   gcc_checking_assert (multiple_p (nunits_in, nunits_out));
>   modifier = WIDEN;
> }
> 
> We already implicitly rely on this, since we either widen one full
> vector to N full vectors or narrow N full vectors to one vector.
> 
> Structurally this is enforced by all vectors having the same number of
> bytes (current_vector_size) and the number of vector elements being a
> power of 2 (or in the case of poly_int, a power of 2 times a runtime
> variant, but that's good enough, since the runtime invariant is the same
> in both cases).
OK.  THanks for clarifying.

jeff

[PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Thomas Preudhomme


Hi,

dump-noaddr test FAILS when $tmpdir is not the same as the directory
where runtest is called from. Note that this does not happen when
running make check because tmpdir is set to srcdir.

In that case, file mkdir will create the directory in the current
directory while GCC is invoked from tmpdir and hence -dumpbase look
for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
be relative to tmpdir which will work in all case.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-12-05  Thomas Preud'homme  

* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base
relative to tmpdir.

Testing: Successfully ran unsorted.exp via make check and out of tree
testing using runtest from /test with tmpdir set in
/test/site.exp to .

Is this ok for stage3?

Best regards,

Thomas
diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
index d14d494570944b2be82c2575204cdbf4b15721ca..68d6c3e38325cabbdd280ecf05e663dbcda99900 100644
--- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
+++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
@@ -11,10 +11,10 @@ proc dump_compare { src options } {
 foreach option $option_list {
 	file delete -force dump1
 	file mkdir dump1
-	c-torture-compile $src "$option $options -dumpbase dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+	c-torture-compile $src "$option $options -dumpbase [pwd]/dump1/$dumpbase -DMASK=1 -x c --param ggc-min-heapsize=1 -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
 	file delete -force dump2
 	file mkdir dump2
-	c-torture-compile $src "$option $options -dumpbase dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
+	c-torture-compile $src "$option $options -dumpbase [pwd]/dump2/$dumpbase -DMASK=2 -x c -fdump-ipa-all -fdump-rtl-all -fdump-tree-all -fdump-noaddr"
 	foreach dump1 [lsort [glob -nocomplain dump1/*]] {
 	regsub dump1/ $dump1 dump2/ dump2
 	set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Andrew Pinski

On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme
 wrote:
> Hi,
>
> dump-noaddr test FAILS when $tmpdir is not the same as the directory
> where runtest is called from. Note that this does not happen when
> running make check because tmpdir is set to srcdir.
>
> In that case, file mkdir will create the directory in the current
> directory while GCC is invoked from tmpdir and hence -dumpbase look
> for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
> be relative to tmpdir which will work in all case.
>
> ChangeLog entry is as follows:
>
> *** gcc/testsuite/ChangeLog ***
>
> 2017-12-05  Thomas Preud'homme  
>
> * gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base
> relative to tmpdir.
>
> Testing: Successfully ran unsorted.exp via make check and out of tree
> testing using runtest from /test with tmpdir set in
> /test/site.exp to .
>
> Is this ok for stage3?

https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html
I don't remember where this discussion went last time.
Maybe this time there will be a resolution :).

Thanks,
Andrew


>
> Best regards,
>
> Thomas

[AArch64] Fix ICEs in aarch64_print_operand

2017-12-05 Thread Richard Sandiford

Three related regression fixes:

- We can't use asserts like:

gcc_assert (GET_MODE_SIZE (mode) == 16);

  in aarch64_print_operand because it could trigger for invalid user input.

- The output_operand_lossage in aarch64_print_address_internal:

output_operand_lossage ("invalid operand for '%%%c'", op);

  wasn't right because "op" is an rtx_code enum rather than the
  prefix character.

- aarch64_print_operand_address shouldn't call output_operand_lossage
  (because it doesn't have a prefix code) but instead fall back to
  output_addr_const.

Tested on aarch64-linux-gnu.  OK to install?

Thanks,
Richard


2017-12-05  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_print_address_internal): Return
a bool success value.  Don't call output_operand_lossage here.
(aarch64_print_ldpstp_address): Return a bool success value.
(aarch64_print_operand_address): Call output_addr_const if
aarch64_print_address_internal fails.
(aarch64_print_operand): Don't assert that the mode is 16 bytes for
'y'; call output_operand_lossage instead.  Call output_operand_lossage
if aarch64_print_ldpstp_address fails.

gcc/testsuite/
* gcc.target/aarch64/asm-2.c: New test.
* gcc.target/aarch64/asm-3.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-12-05 14:24:52.477015238 +
+++ gcc/config/aarch64/aarch64.c2017-12-05 17:54:56.466247227 +
@@ -150,7 +150,7 @@ static bool aarch64_builtin_support_vect
 bool is_packed);
 static machine_mode
 aarch64_simd_container_mode (scalar_mode mode, unsigned width);
-static void aarch64_print_ldpstp_address (FILE *f, machine_mode mode, rtx x);
+static bool aarch64_print_ldpstp_address (FILE *, machine_mode, rtx);
 
 /* Major revision number of the ARM Architecture implemented by the target.  */
 unsigned aarch64_architecture_version;
@@ -5600,22 +5600,21 @@ #define buf_size 20
   {
machine_mode mode = GET_MODE (x);
 
-   if (GET_CODE (x) != MEM)
+   if (GET_CODE (x) != MEM
+   || (code == 'y' && GET_MODE_SIZE (mode) != 16))
  {
output_operand_lossage ("invalid operand for '%%%c'", code);
return;
  }
 
if (code == 'y')
- {
-   /* LDP/STP which uses a single double-width memory operand.
-  Adjust the mode to appear like a typical LDP/STP.
-  Currently this is supported for 16-byte accesses only.  */
-   gcc_assert (GET_MODE_SIZE (mode) == 16);
-   mode = DFmode;
- }
+ /* LDP/STP which uses a single double-width memory operand.
+Adjust the mode to appear like a typical LDP/STP.
+Currently this is supported for 16-byte accesses only.  */
+ mode = DFmode;
 
-   aarch64_print_ldpstp_address (f, mode, XEXP (x, 0));
+   if (!aarch64_print_ldpstp_address (f, mode, XEXP (x, 0)))
+ output_operand_lossage ("invalid operand prefix '%%%c'", code);
   }
   break;
 
@@ -5628,7 +5627,7 @@ #define buf_size 20
 /* Print address 'x' of a memory access with mode 'mode'.
'op' is the context required by aarch64_classify_address.  It can either be
MEM for a normal memory access or PARALLEL for LDP/STP.  */
-static void
+static bool
 aarch64_print_address_internal (FILE *f, machine_mode mode, rtx x, RTX_CODE op)
 {
   struct aarch64_address_info addr;
@@ -5645,7 +5644,7 @@ aarch64_print_address_internal (FILE *f,
else
  asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)],
   INTVAL (addr.offset));
-   return;
+   return true;
 
   case ADDRESS_REG_REG:
if (addr.shift == 0)
@@ -5654,7 +5653,7 @@ aarch64_print_address_internal (FILE *f,
else
  asm_fprintf (f, "[%s, %s, lsl %u]", reg_names [REGNO (addr.base)],
   reg_names [REGNO (addr.offset)], addr.shift);
-   return;
+   return true;
 
   case ADDRESS_REG_UXTW:
if (addr.shift == 0)
@@ -5663,7 +5662,7 @@ aarch64_print_address_internal (FILE *f,
else
  asm_fprintf (f, "[%s, w%d, uxtw %u]", reg_names [REGNO (addr.base)],
   REGNO (addr.offset) - R0_REGNUM, addr.shift);
-   return;
+   return true;
 
   case ADDRESS_REG_SXTW:
if (addr.shift == 0)
@@ -5672,7 +5671,7 @@ aarch64_print_address_internal (FILE *f,
else
  asm_fprintf (f, "[%s, w%d, sxtw %u]", reg_names [REGNO (addr.base)],
   REGNO (addr.offset) - R0_REGNUM, addr.shift);
-   return;
+   return true;
 
   case ADDRESS_REG_WB:
switch (GET_CODE (x))
@@ -5680,27 +5679,27 @@ aarch64_print_address_internal (FILE *f,
  case PRE_INC:
asm_fprintf (f, "[%s, %d]!", reg_names [REGNO (addr.base)],

Re: [Patch, Fortran, F2008] PR 44549: Type-bound procedure: bogus error from list after PROCEDURE

2017-12-05 Thread Bernhard Reutner-Fischer

On 16 June 2010 at 14:56, Janus Weil  wrote:
>>> Regtested on x86_64-unknown-linux-gnu. Ok for trunk?
>>>
>>
>> OK. Kudos to Dominique for finding it and thanks, Janus, for fixing it.
>
> Thanks. Committed as r160834.

@@ -4591,12 +4591,14 @@ gfc_check_symbol_typed (gfc_symbol* sym, gfc_namespace*
ns,
list and marked `error' until symbols are committed.  */

 gfc_typebound_proc*
-gfc_get_typebound_proc (void)
+gfc_get_typebound_proc (gfc_typebound_proc *tb0)
 {
   gfc_typebound_proc *result;
   tentative_tbp *list_node;

   result = XCNEW (gfc_typebound_proc);
+  if (tb0)
+*result = *tb0;
   result->error = 1;

   list_node = XCNEW (tentative_tbp);

doesn't this leak result if tb0 != NULL ? I must be missing something?

thanks,
>
> Cheers,
> Janus

C++ PATCH for c++/82331, ICE with variadic specialization of auto template parm

2017-12-05 Thread Jason Merrill

The problem here is that deduction of F tries to substitute into the
type of F, but we haven't deduced R or A yet, so we can't do anything.
tsubst_pack_expansion knows how to handle this, but only expects to
see it within a template.  In general, substitution in the middle of
deduction needs to be treated as happening in template context.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 353564f5e2bcc9108fa3f2ef770d5b46a3be76d0
Author: Jason Merrill 
Date:   Tue Dec 5 12:41:13 2017 -0500

PR c++/82331 - ICE with variadic partial specialization of auto

* pt.c (unify) [TEMPLATE_PARM_INDEX]: Set processing_template_decl
around call to tsubst.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 500ac0c64fe..685f34a735d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20942,7 +20942,9 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
 template-parameter exactly, except that a template-argument
 deduced from an array bound may be of any integral type.
 The non-type parameter might use already deduced type parameters.  */
+  ++processing_template_decl;
   tparm = tsubst (TREE_TYPE (parm), targs, 0, NULL_TREE);
+  --processing_template_decl;
   if (tree a = type_uses_auto (tparm))
{
  tparm = do_auto_deduction (tparm, arg, a, complain, adc_unify);
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype-auto13.C 
b/gcc/testsuite/g++.dg/cpp1z/nontype-auto13.C
new file mode 100644
index 000..2152cef811e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype-auto13.C
@@ -0,0 +1,18 @@
+// PR c++/82331
+// { dg-options -std=c++17 }
+
+template 
+class X;
+
+template 
+class X {
+public:
+static R call (A... args)
+{
+return (*F)(args...);
+}
+};
+
+int func (int a, int b) { return a + b; }
+
+int test () { return X<&func>::call(1, 2); }

Re: [PATCH] PR ada/66205 gnatbind generates invalid code when finalization is enabled in restricted runtime

2017-12-05 Thread Arnaud Charlet

> I've come up with a considerably simpler patch, which merely causes the
> procedure adafinal to be generated with a null body if the restriction
> No_Task_Termination is set (note, this restriction is part of the
> Ravenscar profile; if tasks can't terminate, program level finalization
> can't happen [ARM 10.2(25), "When the environment task completes
> (normally or abnormally), it waits for the termination of all such tasks,
> and then finalizes any remaining objects of the partition."]
> 
> I've bootstrapped the compiler (x86_64-apple-darwin15), and "make
> check-ada" produces the same results with and without the patch (the same 3
> FAILs occur in both, in gnat.sum). For the arm-eabi compiler, I
> successfully make and run builds for an STM32F4 target without exception
> propagation but with and without finalization.

That's a much simpler and better approach indeed.

OK to commit.

> gcc/ada/Changelog:
> 
> 2017-12-05  Simon Wright 
> 
>  PR ada/66205
>   * bindgen.adb (Gen_AdaFinal): If the restriction
>  No_Task_Termination is present, generate a null body.
>

Re: [PATCH libstdc++/66689] comp_ellint_3 and ellint_3 return garbage values

2017-12-05 Thread David Edelsohn

On Mon, Nov 20, 2017 at 4:51 PM, Jonathan Wakely  wrote:
> On 20/11/17 21:07 +, Jonathan Wakely wrote:
>>
>> On 20/11/17 21:01 +, Jonathan Wakely wrote:
>>>
>>> On 20/11/17 21:43 +0100, Christophe Lyon wrote:

 On 20 November 2017 at 17:02, David Edelsohn  wrote:
>
> This patch has introduced new regressions on at least PowerPC and
> AArch64.
>
> FAIL: ext/special_functions/hyperg/check_value.cc execution test
> FAIL:
> tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc
> execution test
>
> Thanks, David


 On AArch64 and ARM, I have also noticed
 FAIL: special_functions/18_riemann_zeta/check_value.cc (test for excess
 errors)
 UNRESOLVED: special_functions/18_riemann_zeta/check_value.cc
 compilation failed to produce executable
 because:

 /libstdc++-v3/testsuite/special_functions/18_riemann_zeta/check_value.cc:
 In function 'void test(const testcase_riemann_zeta (&)[Num],
 Ret)':

 /libstdc++-v3/testsuite/special_functions/18_riemann_zeta/check_value.cc:292:
 error: 'riemann_zeta' is not a member of 'std'

 /libstdc++-v3/testsuite/special_functions/18_riemann_zeta/check_value.cc:292:
 note: suggested alternative: 'remainder'
 compiler exited with status 1
>>>
>>>
>>> The problem is that { dg-addition-options } was changed to dg-options,
>>> and so the first dg-options that enables the special functions is not
>>> used:
>>>
>>> ---
>>> a/libstdc++-v3/testsuite/special_functions/18_riemann_zeta/check_value.cc
>>> +++
>>> b/libstdc++-v3/testsuite/special_functions/18_riemann_zeta/check_value.cc
>>> @@ -21,7 +21,7 @@
>>> //  riemann_zeta
>>>
>>> // This can take long on simulators, timing out the test.
>>> -// { dg-additional-options "-DMAX_ITERATIONS=5" { target simulator } }
>>> +// { dg-options "-DMAX_ITERATIONS=5" { target simulator } }
>>>
>>
>>
>> I have a script to check dejagnu directives, and it says:
>>
>>
>> testsuite/tr1/5_numerical_facilities/special_functions/20_riemann_zeta/check_value_neg.cc
>> has multiple dg-options directives
>> testsuite/ext/special_functions/airy_ai/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/ext/special_functions/hyperg/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/ext/special_functions/conf_hyperg/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/ext/special_functions/airy_bi/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/02_assoc_legendre/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/14_expint/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/12_ellint_2/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/09_cyl_bessel_k/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/21_sph_neumann/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/15_hermite/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/19_sph_bessel/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/05_comp_ellint_2/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/11_ellint_1/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/17_legendre/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/10_cyl_neumann/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/06_comp_ellint_3/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/06_comp_ellint_3/pr66689.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/01_assoc_laguerre/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/16_laguerre/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/13_ellint_3/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/13_ellint_3/pr66689.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/07_cyl_bessel_i/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/18_riemann_zeta/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/18_riemann_zeta/check_value.cc has multiple
>> dg-options directives
>> testsuite/special_functions/08_cyl_bessel_j/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/04_comp_ellint_1/check_nan.cc has dg-options
>> after dg-add-options
>> testsuite/special_functions/03_beta/check_nan.cc has dg-options after
>> dg-add-options
>> testsuite/special_functions/20_sph_legendre/check_nan.cc has dg-options
>> after dg-add-options
>>
>> For now I'll just fix the multiple dg-options one causing the FAILs.
>
>
>
> I've committed this patch, which should help for Christophe's cases.
>

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Thomas Preudhomme




On 05/12/17 17:54, Andrew Pinski wrote:

On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme
 wrote:

Hi,

dump-noaddr test FAILS when $tmpdir is not the same as the directory
where runtest is called from. Note that this does not happen when
running make check because tmpdir is set to srcdir.

In that case, file mkdir will create the directory in the current
directory while GCC is invoked from tmpdir and hence -dumpbase look
for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
be relative to tmpdir which will work in all case.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-12-05  Thomas Preud'homme  

 * gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base
 relative to tmpdir.

Testing: Successfully ran unsorted.exp via make check and out of tree
testing using runtest from /test with tmpdir set in
/test/site.exp to .

Is this ok for stage3?


https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html
I don't remember where this discussion went last time.
Maybe this time there will be a resolution :).


FWIW, I agree with Matt, creating the dump in tmpdir makes more sense. I think 
his patch can be simplified though because the compiler seems to be invoked from 
tmpdir so it can at least be omitted from the -dumpbase.


Best regards,

Thomas

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Mike Stump

On Dec 5, 2017, at 9:54 AM, Andrew Pinski  wrote:
> 
> On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme
>  wrote:
>> Hi,
>> 
>> dump-noaddr test FAILS when $tmpdir is not the same as the directory
>> where runtest is called from. Note that this does not happen when
>> running make check because tmpdir is set to srcdir.
>> 
>> In that case, file mkdir will create the directory in the current
>> directory while GCC is invoked from tmpdir and hence -dumpbase look
>> for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
>> be relative to tmpdir which will work in all case.
>> 
>> ChangeLog entry is as follows:
>> 
>> *** gcc/testsuite/ChangeLog ***
>> 
>> 2017-12-05  Thomas Preud'homme  
>> 
>>* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base
>>relative to tmpdir.
>> 
>> Testing: Successfully ran unsorted.exp via make check and out of tree
>> testing using runtest from /test with tmpdir set in
>> /test/site.exp to .
>> 
>> Is this ok for stage3?
> 
> https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html
> I don't remember where this discussion went last time.

I approved his patch last time in:

  https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html

I've not seen anyone argue against it.

> Maybe this time there will be a resolution :).

A resolution would be for someone to check in the approved patch, or ask for it 
to be installed.  :-o

I've checked it in.  It might not work for canadian crosses, but we can punt 
that to the next poor soul.

Give it a try and let us know if that cures the problem.  Feel free to just fix 
it, if you notice anything wrong.  I'm thinking it will cure all the problems 
people have seen.

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Mike Stump

On Dec 5, 2017, at 11:11 AM, Thomas Preudhomme  
wrote:
> 
> On 05/12/17 17:54, Andrew Pinski wrote:
>> On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme
>>  wrote:
>>> Hi,
>>> 
>>> dump-noaddr test FAILS when $tmpdir is not the same as the directory
>>> where runtest is called from. Note that this does not happen when
>>> running make check because tmpdir is set to srcdir.
>>> 
>>> In that case, file mkdir will create the directory in the current
>>> directory while GCC is invoked from tmpdir and hence -dumpbase look
>>> for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
>>> be relative to tmpdir which will work in all case.
>>> 
>>> ChangeLog entry is as follows:
>>> 
>>> *** gcc/testsuite/ChangeLog ***
>>> 
>>> 2017-12-05  Thomas Preud'homme  
>>> 
>>> * gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base
>>> relative to tmpdir.
>>> 
>>> Testing: Successfully ran unsorted.exp via make check and out of tree
>>> testing using runtest from /test with tmpdir set in
>>> /test/site.exp to .
>>> 
>>> Is this ok for stage3?
>> https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html
>> I don't remember where this discussion went last time.
>> Maybe this time there will be a resolution :).
> 
> FWIW, I agree with Matt, creating the dump in tmpdir makes more sense. I 
> think his patch can be simplified though because the compiler seems to be 
> invoked from tmpdir so it can at least be omitted from the -dumpbase.

Sounds reasonable.  I've added that on top of his patch and checked that in.  
Let us know if it works or not.

Re: [PATCH] [pr#83069] Keep profile_count for bb under real_bb_freq_max

2017-12-05 Thread Jeff Law

On 11/28/2017 04:34 AM, Siddhesh Poyarekar wrote:
> On Friday 24 November 2017 05:36 PM, Siddhesh Poyarekar wrote:
>> freq_max < 1, i.e. highest frequency among bbs in the function being
>> higher than real_bb_freq_max means that the bb ends up with a profile
>> count larger than real_bb_freq_max and then can go all the way up to
>> and beyond profile_count::max_count.
>>
>> Bootstrapped on aarch64, testsuite in progress.
> 
> Tests came out clean (no new regressions) on aarch64 and x86_64.  Ping?
> 
> Siddhesh
> 
>>
>>  * gcc/predict.c (estimate_bb_frequencies): Don't reset freq_max.
>>  * gcc/testsuite/gcc.dg/pr83069.c: New test case.
Just a note.  Honza indicated in the associated BZ that he was going to
prepare an alternate patch to address this problem.

Jeff

Re: [PATCH] Fix PR80846, change vectorizer reduction epilogue (on x86)

2017-12-05 Thread Jeff Law

On 11/28/2017 08:15 AM, Richard Biener wrote:
> 
> The following adds a new target hook, targetm.vectorize.split_reduction,
> which allows the target to specify a preferred mode to perform the
> final reducion on using either vector shifts or scalar extractions.
> Up to that mode the vector reduction result is reduced by combining
> lowparts and highparts recursively.  This avoids lane-crossing operations
> when doing AVX256 on Zen and Bulldozer and also speeds up things on
> Haswell (I verified ~20% speedup on Broadwell).
> 
> Thus the patch implements the target hook on x86 to _always_ prefer
> SSE modes for the final reduction.
> 
> For the testcase in the bugzilla
> 
> int sumint(const int arr[]) {
> arr = __builtin_assume_aligned(arr, 64);
> int sum=0;
> for (int i=0 ; i<1024 ; i++)
>   sum+=arr[i];
> return sum;
> }
> 
> this changes -O3 -mavx512f code from
> 
> sumint:
> .LFB0:
> .cfi_startproc
> vpxord  %zmm0, %zmm0, %zmm0
> leaq4096(%rdi), %rax
> .p2align 4,,10
> .p2align 3
> .L2:
> vpaddd  (%rdi), %zmm0, %zmm0
> addq$64, %rdi
> cmpq%rdi, %rax
> jne .L2
> vpxord  %zmm1, %zmm1, %zmm1
> vshufi32x4  $78, %zmm1, %zmm0, %zmm2
> vpaddd  %zmm2, %zmm0, %zmm0
> vmovdqa64   .LC0(%rip), %zmm2
> vpermi2d%zmm1, %zmm0, %zmm2
> vpaddd  %zmm2, %zmm0, %zmm0
> vmovdqa64   .LC1(%rip), %zmm2
> vpermi2d%zmm1, %zmm0, %zmm2
> vpaddd  %zmm2, %zmm0, %zmm0
> vmovdqa64   .LC2(%rip), %zmm2
> vpermi2d%zmm1, %zmm0, %zmm2
> vpaddd  %zmm2, %zmm0, %zmm0
> vmovd   %xmm0, %eax
> 
> to
> 
> sumint:
> .LFB0:
> .cfi_startproc
> vpxord  %zmm0, %zmm0, %zmm0
> leaq4096(%rdi), %rax
> .p2align 4,,10
> .p2align 3
> .L2:
> vpaddd  (%rdi), %zmm0, %zmm0
> addq$64, %rdi
> cmpq%rdi, %rax
> jne .L2
> vextracti64x4   $0x1, %zmm0, %ymm1
> vpaddd  %ymm0, %ymm1, %ymm1
> vmovdqa %xmm1, %xmm0
> vextracti128$1, %ymm1, %xmm1
> vpaddd  %xmm1, %xmm0, %xmm0
> vpsrldq $8, %xmm0, %xmm1
> vpaddd  %xmm1, %xmm0, %xmm0
> vpsrldq $4, %xmm0, %xmm1
> vpaddd  %xmm1, %xmm0, %xmm0
> vmovd   %xmm0, %eax
> 
> and for -O3 -mavx2 from
> 
> sumint:
> .LFB0:
> .cfi_startproc
> vpxor   %xmm0, %xmm0, %xmm0
> leaq4096(%rdi), %rax
> .p2align 4,,10
> .p2align 3
> .L2:
> vpaddd  (%rdi), %ymm0, %ymm0
> addq$32, %rdi
> cmpq%rdi, %rax
> jne .L2
> vpxor   %xmm1, %xmm1, %xmm1
> vperm2i128  $33, %ymm1, %ymm0, %ymm2
> vpaddd  %ymm2, %ymm0, %ymm0
> vperm2i128  $33, %ymm1, %ymm0, %ymm2
> vpalignr$8, %ymm0, %ymm2, %ymm2
> vpaddd  %ymm2, %ymm0, %ymm0
> vperm2i128  $33, %ymm1, %ymm0, %ymm1
> vpalignr$4, %ymm0, %ymm1, %ymm1
> vpaddd  %ymm1, %ymm0, %ymm0
> vmovd   %xmm0, %eax
> 
> to
> 
> sumint:
> .LFB0:
> .cfi_startproc
> vpxor   %xmm0, %xmm0, %xmm0
> leaq4096(%rdi), %rax
> .p2align 4,,10
> .p2align 3
> .L2:
> vpaddd  (%rdi), %ymm0, %ymm0
> addq$32, %rdi
> cmpq%rdi, %rax
> jne .L2
> vmovdqa %xmm0, %xmm1
> vextracti128$1, %ymm0, %xmm0
> vpaddd  %xmm0, %xmm1, %xmm0
> vpsrldq $8, %xmm0, %xmm1
> vpaddd  %xmm1, %xmm0, %xmm0
> vpsrldq $4, %xmm0, %xmm1
> vpaddd  %xmm1, %xmm0, %xmm0
> vmovd   %xmm0, %eax
> vzeroupper
> ret
> 
> which besides being faster is also smaller (less prefixes).
> 
> SPEC 2k6 results on Haswell (thus AVX2) are neutral.  As it merely
> effects reduction vectorization epilogues I didn't expect big effects
> but for loops that do not run much (more likely with AVX512).
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Ok for trunk?
> 
> The PR mentions some more tricks to optimize the sequence but
> those look like backend only optimizations.
> 
> Thanks,
> Richard.
> 
> 2017-11-28  Richard Biener  
> 
>   PR tree-optimization/80846
>   * target.def (split_reduction): New target hook.
>   * targhooks.c (default_split_reduction): New function.
>   * targhooks.h (default_split_reduction): Declare.
>   * tree-vect-loop.c (vect_create_epilog_for_reduction): If the
>   target requests first reduce vectors by combining low and high
>   parts.
>   * tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
>   (get_vectype_for_scalar_type_and_size): Export.
>   * tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare.
> 
>   * doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document.
>   * doc/tm.texi: Regenerate.
> 
>   i386/

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Thomas Preudhomme


Hi Mike,

Thanks, I've tested after the two commits and it works both in tree and out of 
tree. It'll simplify comparing in tree results Vs out of tree for us, thanks a lot!


Would you consider a backport to stable branches if nobody complains after a 
week?

Best regards,

Thomas

On 05/12/17 19:27, Mike Stump wrote:

On Dec 5, 2017, at 11:11 AM, Thomas Preudhomme  
wrote:


On 05/12/17 17:54, Andrew Pinski wrote:

On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme
 wrote:

Hi,

dump-noaddr test FAILS when $tmpdir is not the same as the directory
where runtest is called from. Note that this does not happen when
running make check because tmpdir is set to srcdir.

In that case, file mkdir will create the directory in the current
directory while GCC is invoked from tmpdir and hence -dumpbase look
for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
be relative to tmpdir which will work in all case.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-12-05  Thomas Preud'homme  

 * gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump base
 relative to tmpdir.

Testing: Successfully ran unsorted.exp via make check and out of tree
testing using runtest from /test with tmpdir set in
/test/site.exp to .

Is this ok for stage3?

https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html
I don't remember where this discussion went last time.
Maybe this time there will be a resolution :).


FWIW, I agree with Matt, creating the dump in tmpdir makes more sense. I think 
his patch can be simplified though because the compiler seems to be invoked 
from tmpdir so it can at least be omitted from the -dumpbase.


Sounds reasonable.  I've added that on top of his patch and checked that in.  
Let us know if it works or not.

[PATCH] Fix up orig_loop_num handling during move_sese_region_to_fn (PR tree-optimization/81945)

2017-12-05 Thread Jakub Jelinek

Hi!

move_sese_region_to_fn moves a subset of the original loop tree
to the dest_cfun (and adds the outermost loop new).
Now, some loops might have non-zero orig_loop_num field.  In the caller
that is fine, if the orig_loop_num loop is moved, then get_loop will just
return NULL and we'll clear it later.  But if a loop with orig_loop_num != 0
is moved into the dest_cfun, where we create new numbers for the loops,
orig_loop_num might be too large for the larray vector, or might point
to an unrelated loop.

The following patch goes through all the loops moved into dest_cfun and if
they have non-zero orig_loop_num, it tries to remap them into a new number
if it points to a loop that was also moved to dest_cfun, or clears it
otherwise.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-05  Jakub Jelinek  

PR tree-optimization/81945
* cfgloop.h (FOR_EACH_LOOP_FN): Use FN instead of hardcoding fn.
* tree-cfg.c (move_sese_region_to_fn): If any of the loops moved
to dest_cfun has orig_loop_num set, either remap it to the new
loop number if the loop got moved too, or clear it.

* gcc.dg/graphite/pr81945.c: New test.

--- gcc/cfgloop.h.jj2017-11-27 18:52:20.0 +0100
+++ gcc/cfgloop.h   2017-12-05 16:00:07.771838975 +0100
@@ -766,7 +766,7 @@ loop_iterator::~loop_iterator ()
(LOOP) = li.next ())
 
 #define FOR_EACH_LOOP_FN(FN, LOOP, FLAGS) \
-  for (loop_iterator li(fn, &(LOOP), FLAGS); \
+  for (loop_iterator li(FN, &(LOOP), FLAGS); \
(LOOP); \
(LOOP) = li.next ())
 
--- gcc/tree-cfg.c.jj   2017-12-04 20:10:29.0 +0100
+++ gcc/tree-cfg.c  2017-12-05 15:54:53.0 +0100
@@ -7468,6 +7468,8 @@ move_sese_region_to_fn (struct function
   loops->state = LOOPS_MAY_HAVE_MULTIPLE_LATCHES;
   set_loops_for_fn (dest_cfun, loops);
 
+  vec *larray = get_loops (saved_cfun)->copy ();
+
   /* Move the outlined loop tree part.  */
   num_nodes = bbs.length ();
   FOR_EACH_VEC_ELT (bbs, i, bb)
@@ -7514,6 +7516,20 @@ move_sese_region_to_fn (struct function
   loop->aux = current_loops->tree_root;
   loop0->aux = current_loops->tree_root;
 
+  /* Fix up orig_loop_num.  If the block referenced in it has been moved
+ to dest_cfun, update orig_loop_num field, otherwise clear it.  */
+  struct loop *dloop;
+  FOR_EACH_LOOP_FN (dest_cfun, dloop, 0)
+if (dloop->orig_loop_num)
+  {
+   if ((*larray)[dloop->orig_loop_num] != NULL
+   && get_loop (saved_cfun, dloop->orig_loop_num) == NULL)
+ dloop->orig_loop_num = (*larray)[dloop->orig_loop_num]->num;
+   else
+ dloop->orig_loop_num = 0;
+  }
+  ggc_free (larray);
+
   pop_cfun ();
 
   /* Move blocks from BBS into DEST_CFUN.  */
--- gcc/testsuite/gcc.dg/graphite/pr81945.c.jj  2017-12-05 16:07:12.375610782 
+0100
+++ gcc/testsuite/gcc.dg/graphite/pr81945.c 2017-12-05 16:07:24.925456255 
+0100
@@ -0,0 +1,21 @@
+/* PR tree-optimization/81945 */
+/* { dg-do compile { target pthread } } */
+/* { dg-options "-O3 -ftree-parallelize-loops=2 -floop-nest-optimize" } */
+
+unsigned long int v;
+
+void
+foo (int x, int y, long int *a)
+{
+  do
+{
+  int **b;
+
+  while (y != 0)
+;
+  v *= 2;
+  **b = *a;
+  ++x;
+}
+  while (x < 1);
+}

Jakub

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2017-12-05 Thread Mike Stump

On Dec 5, 2017, at 12:56 PM, Thomas Preudhomme  
wrote:
> 
> Thanks, I've tested after the two commits and it works both in tree and out 
> of tree. It'll simplify comparing in tree results Vs out of tree for us, 
> thanks a lot!
> 
> Would you consider a backport to stable branches if nobody complains after a 
> week?

Yeah, back port is Ok.

Re: [024/nnn] poly_int: ira subreg liveness tracking

2017-12-05 Thread Richard Sandiford

Jeff Law  writes:
> On 10/23/2017 11:09 AM, Richard Sandiford wrote:
>> Normmaly the IRA-reload interface tries to track the liveness of
>> individual bytes of an allocno if the allocno is sometimes written
>> to as a SUBREG.  This isn't possible for variable-sized allocnos,
>> but it doesn't matter because targets with variable-sized registers
>> should use LRA instead.
>> 
>> This patch adds a get_subreg_tracking_sizes function for deciding
>> whether it is possible to model a partial read or write.  Later
>> patches make it return false if anything is variable.
>> 
>> 
>> 2017-10-23  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * ira.c (get_subreg_tracking_sizes): New function.
>>  (init_live_subregs): Take an integer size rather than a register.
>>  (build_insn_chain): Use get_subreg_tracking_sizes.  Update calls
>>  to init_live_subregs.
> OK.
>
> Note this is starting to get close to the discussion around CLOBBER_HIGH
> vs using a self set with a low subreg that we're having with Alan on
> another thread in that liveness tracking of subregs of SVE regs could
> potentially use some improvements.
>
> When I quickly looked at the subreg handling in the df infrstructure my
> first thought was that it might need some updating for SVE.  I can't
> immediately call bits for poly_int/SVE in the patches to-date.  Have you
> dug in there at all for the poly_int/SVE work?

Yeah, although the subreg tracking in this patch is specific to reload,
I thought we had something similar for LRA.  I couldn't find anything
though, and the static type checking of poly_ints would have forced
the issue.

There is the DF_WORD_LR code, which tracks the liveness of words in a
double-word pseudo.  We didn't extend that to variable-length registers
for two reasons: (1) if we did need it, we'd want it for pseudos
that map to 3 or 4 registers, not just 2, so that LD[234] and ST[234]
are handled consistently; and (2) it's only used for DCE at the moment,
and it's rare for LD[234]/ST[234]s to be dead code.

Thanks,
Richard

[PATCH] v4: C/C++: don't suggest implementation names as spelling fixes (PR c/83236)

2017-12-05 Thread David Malcolm

On Mon, 2017-12-04 at 15:34 -0500, Jason Merrill wrote:
> On Mon, Dec 4, 2017 at 12:00 PM, David Malcolm 
> wrote:
> > * don't filter suggestions if the name name being looked up
> > is itself reserved for implementation
> 
> I wonder if we want to avoid filtering if the name being looked up
> starts with a single _, on the theory that the user meant to write
> __.
> 
> Other than that, looks good.
> 
> Jason

Thanks.

Here's an updated version of the patch which implements that (and
adds a little more test coverage).

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu (in
conjunction with the setup patch).

Are they both OK for trunk?

gcc/c-family/ChangeLog:
PR c/83236
* c-common.c (selftest::c_family_tests): Call
selftest::c_spellcheck_cc_tests.
* c-common.h (selftest::c_spellcheck_cc_tests): New decl.
* c-spellcheck.cc: Include "selftest.h".
(name_reserved_for_implementation_p): New function.
(should_suggest_as_macro_p): New function.
(find_closest_macro_cpp_cb): Move the check for NT_MACRO to
should_suggest_as_macro_p and call it.
(selftest::test_name_reserved_for_implementation_p): New function.
(selftest::c_spellcheck_cc_tests): New function.
* c-spellcheck.h (name_reserved_for_implementation_p): New decl.

gcc/c/ChangeLog:
PR c/83236
* c-decl.c (lookup_name_fuzzy): Don't suggest names that are
reserved for use by the implementation.

gcc/cp/ChangeLog:
PR c/83236
* name-lookup.c (consider_binding_level): Don't suggest names that
are reserved for use by the implementation.

gcc/testsuite/ChangeLog:
PR c/83236
* c-c++-common/spellcheck-reserved.c: New test case.
---
 gcc/c-family/c-common.c  |  1 +
 gcc/c-family/c-common.h  |  1 +
 gcc/c-family/c-spellcheck.cc | 66 +++-
 gcc/c-family/c-spellcheck.h  |  2 +
 gcc/c/c-decl.c   | 12 +
 gcc/cp/name-lookup.c | 25 +++--
 gcc/testsuite/c-c++-common/spellcheck-reserved.c | 55 
 7 files changed, 157 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/spellcheck-reserved.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 1d79aee..6a343a3 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8177,6 +8177,7 @@ void
 c_family_tests (void)
 {
   c_format_c_tests ();
+  c_spellcheck_cc_tests ();
 }
 
 } // namespace selftest
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 27b1de9..d9bf8d0 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1450,6 +1450,7 @@ namespace selftest {
   /* Declarations for specific families of tests within c-family,
  by source file, in alphabetical order.  */
   extern void c_format_c_tests (void);
+  extern void c_spellcheck_cc_tests (void);
 
   /* The entrypoint for running all of the above tests.  */
   extern void c_family_tests (void);
diff --git a/gcc/c-family/c-spellcheck.cc b/gcc/c-family/c-spellcheck.cc
index db70a64..a6b9e17 100644
--- a/gcc/c-family/c-spellcheck.cc
+++ b/gcc/c-family/c-spellcheck.cc
@@ -25,6 +25,37 @@ along with GCC; see the file COPYING3.  If not see
 #include "cpplib.h"
 #include "spellcheck-tree.h"
 #include "c-family/c-spellcheck.h"
+#include "selftest.h"
+
+/* Return true iff STR begin with an underscore and either an uppercase
+   letter or another underscore, and is thus, for C and C++, reserved for
+   use by the implementation.  */
+
+bool
+name_reserved_for_implementation_p (const char *str)
+{
+  if (str[0] != '_')
+return false;
+  return (str[1] == '_' || ISUPPER(str[1]));
+}
+
+/* Return true iff HASHNODE is a macro that should be offered as a
+   suggestion for a misspelling.  */
+
+static bool
+should_suggest_as_macro_p (cpp_hashnode *hashnode)
+{
+  if (hashnode->type != NT_MACRO)
+return false;
+
+  /* Don't suggest names reserved for the implementation, but do suggest the 
builtin
+ macros such as __FILE__, __LINE__ etc.  */
+  if (name_reserved_for_implementation_p ((const char *)hashnode->ident.str)
+  && !(hashnode->flags & NODE_BUILTIN))
+return false;
+
+  return true;
+}
 
 /* A callback for cpp_forall_identifiers, for use by best_macro_match's ctor.
Process HASHNODE and update the best_macro_match instance pointed to be
@@ -34,7 +65,7 @@ static int
 find_closest_macro_cpp_cb (cpp_reader *, cpp_hashnode *hashnode,
   void *user_data)
 {
-  if (hashnode->type != NT_MACRO)
+  if (!should_suggest_as_macro_p (hashnode))
 return 1;
 
   best_macro_match *bmm = (best_macro_match *)user_data;
@@ -55,3 +86,36 @@ best_macro_match::best_macro_match (tree goal,
 {
   cpp_forall_identifiers (reader, find_closest_macro_cpp_cb, this);
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Sel

Re: [037/nnn] poly_int: get_bit_range

2017-12-05 Thread Jeff Law

On 10/23/2017 11:14 AM, Richard Sandiford wrote:
> This patch makes get_bit_range return the range and position as poly_ints.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expr.h (get_bit_range): Return the bitstart and bitend as
>   poly_uint64s rather than unsigned HOST_WIDE_INTs.  Return the bitpos
>   as a poly_int64 rather than a HOST_WIDE_INT.
>   * expr.c (get_bit_range): Likewise.
>   (expand_assignment): Update call accordingly.
>   * fold-const.c (optimize_bit_field_compare): Likewise.
OK.
jeff

Re: [048/nnn] poly_int: cfgexpand stack variables

2017-12-05 Thread Jeff Law

On 10/23/2017 11:20 AM, Richard Sandiford wrote:
> This patch changes the type of stack_var::size from HOST_WIDE_INT
> to poly_uint64.  The difference in signedness is because the
> field was set by:
> 
>   v->size = tree_to_uhwi (size);
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * cfgexpand.c (stack_var::size): Change from a HOST_WIDE_INT
>   to a poly_uint64.
>   (add_stack_var, stack_var_cmp, partition_stack_vars)
>   (dump_stack_var_partition): Update accordingly.
>   (alloc_stack_frame_space): Take the size as a poly_int64 rather
>   than a HOST_WIDE_INT.
>   (expand_stack_vars, expand_one_stack_var_1): Handle polynomial sizes.
>   (defer_stack_allocation, estimated_stack_frame_size): Likewise.
>   (account_stack_vars, expand_one_var): Likewise.  Return a poly_uint64
>   rather than a HOST_WIDE_INT.
> 
OK
jeff

Re: [051/nnn] poly_int: emit_group_load/store

2017-12-05 Thread Jeff Law

On 10/23/2017 11:21 AM, Richard Sandiford wrote:
> This patch changes the sizes passed to emit_group_load and
> emit_group_store from int to poly_int64.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expr.h (emit_group_load, emit_group_load_into_temps)
>   (emit_group_store): Take the size as a poly_int64 rather than an int.
>   * expr.c (emit_group_load_1, emit_group_load): Likewise.
>   (emit_group_load_into_temp, emit_group_store): Likewise.
> 
OK.
jeff

Re: [100/nnn] poly_int: memrefs_conflict_p

2017-12-05 Thread Jeff Law

On 10/23/2017 11:40 AM, Richard Sandiford wrote:
> The xsize and ysize arguments to memrefs_conflict_p are encode such
> that:
> 
> - 0 means the size is unknown
> - >0 means the size is known
> - <0 means that the negative of the size is a worst-case size after
>   alignment
> 
> In other words, the sign effectively encodes a boolean; it isn't
> meant to be taken literally.  With poly_ints these correspond to:
> 
> - known_zero (...)
> - may_gt (..., 0)
> - may_lt (..., 0)
> 
> respectively.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * alias.c (addr_side_effect_eval): Take the size as a poly_int64
>   rather than an int.  Use plus_constant.
>   (memrefs_conflict_p): Take the sizes as poly_int64s rather than ints.
>   Take the offset "c" as a poly_int64 rather than a HOST_WIDE_INT.
> 
Not sure why I was dreading this one and kept putting it off.  It really
wasn't too bad to work through.

OK.
jeff

Re: [086/nnn] poly_int: REGMODE_NATURAL_SIZE

2017-12-05 Thread Jeff Law

On 10/23/2017 11:34 AM, Richard Sandiford wrote:
> This patch makes target-independent code that uses REGMODE_NATURAL_SIZE
> treat it as a poly_int rather than a constant.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * combine.c (can_change_dest_mode): Handle polynomial
>   REGMODE_NATURAL_SIZE.
>   * expmed.c (store_bit_field_1): Likewise.
>   * expr.c (store_constructor): Likewise.
>   * emit-rtl.c (validate_subreg): Operate on polynomial mode sizes
>   and polynomial REGMODE_NATURAL_SIZE.
>   (gen_lowpart_common): Likewise.
>   * reginfo.c (record_subregs_of_mode): Likewise.
>   * rtlanal.c (read_modify_subreg_p): Likewise.
> 
OK.
jeff

Re: [022/nnn] poly_int: C++ bitfield regions

2017-12-05 Thread Jeff Law

On 10/23/2017 11:08 AM, Richard Sandiford wrote:
> This patch changes C++ bitregion_start/end values from constants to
> poly_ints.  Although it's unlikely that the size needs to be polynomial
> in practice, the offset could be with future language extensions.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expmed.h (store_bit_field): Change bitregion_start and
>   bitregion_end from unsigned HOST_WIDE_INT to poly_uint64.
>   * expmed.c (adjust_bit_field_mem_for_reg, strict_volatile_bitfield_p)
>   (store_bit_field_1, store_integral_bit_field, store_bit_field)
>   (store_fixed_bit_field, store_split_bit_field): Likewise.
>   * expr.c (store_constructor_field, store_field): Likewise.
>   (optimize_bitfield_assignment_op): Likewise.  Make the same change
>   to bitsize and bitpos.
>   * machmode.h (bit_field_mode_iterator): Change m_bitregion_start
>   and m_bitregion_end from HOST_WIDE_INT to poly_int64.  Make the
>   same change in the constructor arguments.
>   (get_best_mode): Change bitregion_start and bitregion_end from
>   unsigned HOST_WIDE_INT to poly_uint64.
>   * stor-layout.c (bit_field_mode_iterator::bit_field_mode_iterator):
>   Change bitregion_start and bitregion_end from HOST_WIDE_INT to
>   poly_int64.
>   (bit_field_mode_iterator::next_mode): Update for new types
>   of m_bitregion_start and m_bitregion_end.
>   (get_best_mode): Change bitregion_start and bitregion_end from
>   unsigned HOST_WIDE_INT to poly_uint64.
> 
OK.
jeff

Re: [020/nnn] poly_int: store_bit_field bitrange

2017-12-05 Thread Jeff Law

On 10/23/2017 11:08 AM, Richard Sandiford wrote:
> This patch changes the bitnum and bitsize arguments to
> store_bit_field from unsigned HOST_WIDE_INTs to poly_uint64s.
> The later part of store_bit_field_1 still needs to operate
> on constant bit positions and sizes, so the patch splits
> it out into a subfunction (store_integral_bit_field).
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expmed.h (store_bit_field): Take bitsize and bitnum as
>   poly_uint64s rather than unsigned HOST_WIDE_INTs.
>   * expmed.c (simple_mem_bitfield_p): Likewise.  Add a parameter
>   that returns the byte size.
>   (store_bit_field_1): Take bitsize and bitnum as
>   poly_uint64s rather than unsigned HOST_WIDE_INTs.  Update call
>   to simple_mem_bitfield_p.  Split the part that can only handle
>   constant bitsize and bitnum out into...
>   (store_integral_bit_field): ...this new function.
>   (store_bit_field): Take bitsize and bitnum as poly_uint64s rather
>   than unsigned HOST_WIDE_INTs.
>   (extract_bit_field_1): Update call to simple_mem_bitfield_p.
OK.
jeff

Re: [021/nnn] poly_int: extract_bit_field bitrange

2017-12-05 Thread Jeff Law

On 10/23/2017 11:08 AM, Richard Sandiford wrote:
> Similar to the previous store_bit_field patch, but for extractions
> rather than insertions.  The patch splits out the extraction-as-subreg
> handling into a new function (extract_bit_field_as_subreg), both for
> ease of writing and because a later patch will add another caller.
> 
> The simplify_gen_subreg overload is temporary; it goes away
> in a later patch.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (simplify_gen_subreg): Add a temporary overload that
>   accepts poly_uint64 offsets.
>   * expmed.h (extract_bit_field): Take bitsize and bitnum as
>   poly_uint64s rather than unsigned HOST_WIDE_INTs.
>   * expmed.c (lowpart_bit_field_p): Likewise.
>   (extract_bit_field_as_subreg): New function, split out from...
>   (extract_bit_field_1): ...here.  Take bitsize and bitnum as
>   poly_uint64s rather than unsigned HOST_WIDE_INTs.  For vector
>   extractions, check that BITSIZE matches the size of the extracted
>   value and that BITNUM is an exact multiple of that size.
>   If all else fails, try forcing the value into memory if
>   BITNUM is variable, and adjusting the address so that the
>   offset is constant.  Split the part that can only handle constant
>   bitsize and bitnum out into...
>   (extract_integral_bit_field): ...this new function.
>   (extract_bit_field): Take bitsize and bitnum as poly_uint64s
>   rather than unsigned HOST_WIDE_INTs.
OK.

jeff

[PATCH] avoid bogus -Wstringop-overflow for strncpy with _FORTIFY_SOURCE (PR 82646)

2017-12-05 Thread Martin Sebor


PR middle-end/82646 - bogus -Wstringop-overflow with
-D_FORTIFY_SOURCE=2 on strncpy with range to a member array,

The bug points out a false positive in a call to strncpy() when
_FORTIFY_SOURCE is defined that doesn't exist otherwise.

The problem is that __builtin_strncpy buffer overflow checking
is done along with the expansion of the intrinsic in one place
and __builtin___strncpy_chk is handled differently in another,
and the two are out of sync.

The attached patch corrects the choice of arguments used for
overflow detection in __builtin___strncpy_chk and aligns
the diagnostics between the two intrinsics.

Martin
PR tree-optimization/82646 - bogus -Wstringop-overflow with -D_FORTIFY_SOURCE=2 on strncpy with range to a member array

gcc/ChangeLog:

	PR tree-optimization/82646
	* builtins.c (maybe_emit_chk_warning): Use size as the bound for
	strncpy, not maxlen.

gcc/testsuite/ChangeLog:

	PR tree-optimization/82646
	* gcc.dg/builtin-stringop-chk-1.c: Adjust.
	* gcc.dg/builtin-stringop-chk-9.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 097e1b7..6b25253 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -9861,6 +9861,8 @@ maybe_emit_chk_warning (tree exp, enum built_in_function fcode)
  (such as __strncat_chk) or null if the operation isn't bounded
  (such as __strcat_chk).  */
   tree maxlen = NULL_TREE;
+  /* The exact size of the access (such as in __strncpy_chk).  */
+  tree size = NULL_TREE;
 
   switch (fcode)
 {
@@ -9888,7 +9890,7 @@ maybe_emit_chk_warning (tree exp, enum built_in_function fcode)
 case BUILT_IN_STRNCPY_CHK:
 case BUILT_IN_STPNCPY_CHK:
   srcstr = CALL_EXPR_ARG (exp, 1);
-  maxlen = CALL_EXPR_ARG (exp, 2);
+  size = CALL_EXPR_ARG (exp, 2);
   objsize = CALL_EXPR_ARG (exp, 3);
   break;
 
@@ -9911,7 +9913,7 @@ maybe_emit_chk_warning (tree exp, enum built_in_function fcode)
 }
 
   check_sizes (OPT_Wstringop_overflow_, exp,
-	   /*size=*/NULL_TREE, maxlen, srcstr, objsize);
+	   size, maxlen, srcstr, objsize);
 }
 
 /* Emit warning if a buffer overflow is detected at compile time
diff --git a/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c b/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
index 35cc6dc..10048f3 100644
--- a/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-stringop-chk-1.c
@@ -36,7 +36,7 @@ test (int arg, ...)
   vx = stpcpy (&buf2[18], "a");
   vx = stpcpy (&buf2[18], "ab"); /* { dg-warning "writing 3" "stpcpy" } */
   strncpy (&buf2[18], "a", 2);
-  strncpy (&buf2[18], "a", 3); /* { dg-warning "specified bound 3 exceeds destination size 2" "strncpy" } */
+  strncpy (&buf2[18], "a", 3); /* { dg-warning "writing 3 bytes into a region of size 2" "strncpy" } */
   strncpy (&buf2[18], "abc", 2);
   strncpy (&buf2[18], "abc", 3); /* { dg-warning "writing 3 " "strncpy" } */
   memset (buf2, '\0', sizeof (buf2));
@@ -93,7 +93,7 @@ void
 test2 (const H h)
 {
   char c;
-  strncpy (&c, str, 3); /* { dg-warning "specified bound 3 exceeds destination size 1" "strncpy" } */
+  strncpy (&c, str, 3); /* { dg-warning "writing 3 bytes into a region of size 1" "strncpy" } */
 
   struct { char b[4]; } x;
   sprintf (x.b, "%s", "ABCD"); /* { dg-warning "writing 5" "sprintf" } */
diff --git a/gcc/testsuite/gcc.dg/builtin-stringop-chk-9.c b/gcc/testsuite/gcc.dg/builtin-stringop-chk-9.c
new file mode 100644
index 000..b5464c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-stringop-chk-9.c
@@ -0,0 +1,150 @@
+/* PR middle-end/82646 - bogus -Wstringop-overflow with -D_FORTIFY_SOURCE=2
+   on strncpy with range to a member array
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-overflow -ftrack-macro-expansion=0" } */
+
+#define bos(p)   __builtin_object_size (p, 1)
+
+struct S {
+  char a[5];
+  void (*pf)(void);
+};
+
+/* Verify that none of the string function calls below triggers a warning.  */
+
+char* test_stpncpy_const_nowarn (struct S *p)
+{
+  int n = sizeof p->a;
+
+  return __builtin_stpncpy (p->a, "123456", n);
+}
+
+char* test_strncpy_const_nowarn (struct S *p)
+{
+  int n = sizeof p->a;
+
+  return __builtin_strncpy (p->a, "1234567", n);
+}
+
+char* test_stpncpy_chk_const_nowarn (struct S *p)
+{
+  int n = sizeof p->a;
+
+  return __builtin___stpncpy_chk (p->a, "12345678", n, bos (p->a));
+}
+
+char* test_strncpy_chk_const_nowarn (struct S *p)
+{
+  int n = sizeof p->a;
+
+  return __builtin___strncpy_chk (p->a, "123456789", n, bos (p->a));
+}
+
+
+char* test_stpncpy_range_nowarn (struct S *p, int n)
+{
+  if (n < sizeof p->a)
+n = sizeof p->a;
+
+  return __builtin_stpncpy (p->a, "123456", n);
+}
+
+char* test_strncpy_range_nowarn (struct S *p, int n)
+{
+  if (n < sizeof p->a)
+n = sizeof p->a;
+
+  return __builtin_strncpy (p->a, "1234567", n);
+}
+
+char* test_stpncpy_chk_range_nowarn (struct S *p, int n)
+{
+  if (n < sizeof p->a)
+n = sizeof p->a;
+
+  return __builtin___stpncpy_chk (p->a, "12345678", n, bos (p->a));   /* { dg-bogus "\\\[-Wstringop

Re: [023/nnn] poly_int: store_field & co

2017-12-05 Thread Jeff Law

On 10/23/2017 11:09 AM, Richard Sandiford wrote:
> This patch makes store_field and related routines use poly_ints
> for bit positions and sizes.  It keeps the existing choices
> between signed and unsigned types (there are a mixture of both).
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expr.c (store_constructor_field): Change bitsize from a
>   unsigned HOST_WIDE_INT to a poly_uint64 and bitpos from a
>   HOST_WIDE_INT to a poly_int64.
>   (store_constructor): Change size from a HOST_WIDE_INT to
>   a poly_int64.
>   (store_field): Likewise bitsize and bitpos.

OK
jeff

Re: [PATCH] avoid bogus -Wstringop-overflow for strncpy with _FORTIFY_SOURCE (PR 82646)

2017-12-05 Thread Jeff Law

On 12/05/2017 04:47 PM, Martin Sebor wrote:
> PR middle-end/82646 - bogus -Wstringop-overflow with
> -D_FORTIFY_SOURCE=2 on strncpy with range to a member array,
> 
> The bug points out a false positive in a call to strncpy() when
> _FORTIFY_SOURCE is defined that doesn't exist otherwise.
> 
> The problem is that __builtin_strncpy buffer overflow checking
> is done along with the expansion of the intrinsic in one place
> and __builtin___strncpy_chk is handled differently in another,
> and the two are out of sync.
> 
> The attached patch corrects the choice of arguments used for
> overflow detection in __builtin___strncpy_chk and aligns
> the diagnostics between the two intrinsics.
> 
> Martin
> 
> gcc-82646.diff
> 
> 
> PR tree-optimization/82646 - bogus -Wstringop-overflow with 
> -D_FORTIFY_SOURCE=2 on strncpy with range to a member array
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/82646
>   * builtins.c (maybe_emit_chk_warning): Use size as the bound for
>   strncpy, not maxlen.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/82646
>   * gcc.dg/builtin-stringop-chk-1.c: Adjust.
>   * gcc.dg/builtin-stringop-chk-9.c: New test.
OK.

[ Happy to see something easy fly by that isn't SVE related :-) ]

jeff

Re: [031/nnn] poly_int: aff_tree

2017-12-05 Thread Jeff Law

On 10/23/2017 11:12 AM, Richard Sandiford wrote:
> This patch changes the type of aff_tree::offset from widest_int to
> poly_widest_int and adjusts the function interfaces in the same way.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-affine.h (aff_tree::offset): Change from widest_int
>   to poly_widest_int.
>   (wide_int_ext_for_comb): Delete.
>   (aff_combination_const, aff_comb_cannot_overlap_p): Take the
>   constants as poly_widest_int rather than widest_int.
>   (aff_combination_constant_multiple_p): Return the multiplier
>   as a poly_widest_int.
>   (aff_combination_zero_p, aff_combination_singleton_var_p): Handle
>   polynomial offsets.
>   * tree-affine.c (wide_int_ext_for_comb): Make original widest_int
>   version static and add an overload for poly_widest_int.
>   (aff_combination_const, aff_combination_add_cst)
>   (wide_int_constant_multiple_p, aff_comb_cannot_overlap_p): Take
>   the constants as poly_widest_int rather than widest_int.
>   (tree_to_aff_combination): Generalize INTEGER_CST case to
>   poly_int_tree_p.
>   (aff_combination_to_tree): Track offsets as poly_widest_ints.
>   (aff_combination_add_product, aff_combination_mult): Handle
>   polynomial offsets.
>   (aff_combination_constant_multiple_p): Return the multiplier
>   as a poly_widest_int.
>   * tree-predcom.c (determine_offset): Return the offset as a
>   poly_widest_int.
>   (split_data_refs_to_components, suitable_component_p): Update
>   accordingly.
>   (valid_initializer_p): Update call to
>   aff_combination_constant_multiple_p.
>   * tree-ssa-address.c (addr_to_parts): Handle polynomial offsets.
>   * tree-ssa-loop-ivopts.c (get_address_cost_ainc): Take the step
>   as a poly_int64 rather than a HOST_WIDE_INT.
>   (get_address_cost): Handle polynomial offsets.
>   (iv_elimination_compare_lt): Likewise.
>   (rewrite_use_nonlinear_expr): Likewise.
OK.
Jeff

Re: [PATCH] emit a trap for buffer overflow in placement new

2017-12-05 Thread Martin Sebor


On 12/05/2017 06:53 AM, Jonathan Wakely wrote:

On 05/12/17 09:48 +0100, Marc Glisse wrote:

On Mon, 4 Dec 2017, Martin Sebor wrote:


On 12/04/2017 03:44 PM, Marc Glisse wrote:

On Mon, 4 Dec 2017, Martin Sebor wrote:


The -Wplacement-new option warns for buffer overflow in placement
new expressions with objects of constant sizes, but because it's
implemented completely in the C++ front end it misses the more
interesting non-constant sizes.

The attached patch instruments both forms of operator placement
new to emit a trap when __builtin_object_size() determines that
the pointer points to an object less than the specified number
of bytes.  This is done only when _FORTIFY_SOURCE is defined
to a non-zero value.  This makes it possible to prevent buffer
overflow in most of the same cases as in built-ins like strcpy,
though without warnings when the size is nor a C++ constant
integer expression.

On x86_64-linux it passes testing with no apparent regressions.
Can anyone think of problems with this solution?  If not, given
its simplicity, would it be appropriate even at this stage?


AFAIK, one can call this operator new manually on any pointer,
including
one-past-the-end pointers and null pointers. It is only with new
expressions that the limitation comes in (because it runs a constructor
afterwards). Not that people often do that...


Hmm, yes, there don't seem to be any requirements on calling
the operator by itself.  That's too bad.  I was going to move
the placement new checking into the middle-end to improve
the detection and avoid some false positives but the operator
disappears too early, before the size of the expression that's
being stuffed into the destination is known.


What's wrong with generating the same code you wanted to put in
operator new from the front-end, next to the call to operator new,
when the front-end handles a new expression?


I think it's a reasonable suggestion, thanks.  The only wrinkle
in it is that the trap is guarded by _FORTIFY_SOURCE, so if we
wanted the same behavior as Glibc provides for the string
functions, the C++ front end would have to check for it.  I'm
not sure how favorably hardwiring a Glibc macro into the compiler
would be looked upon.  I suspect not very, so unless it was okay
to trap unconditionally, some other mechanism would be needed
to control it.  Maybe a new option, such as -ftrap-undefined,
to control this behavior throughout GCC.

When changing the compiler I would also want to issue a warning
and not just trap (the trap in the patch I sent was obviously
suboptimal from that point of view).  So I think in the end,
the C++ front end will need to insert both a (conditional) trap
and a builtin to warn on expansion.  The two might be separate
or a single builtin for simplicity.  But I'm just brainstorming
now.


The only other solution that comes to mind to detect non-constant
overflow is to do something like:

void* operator new (size_t n, void *p)
{
  if (__builtin_object_size (p, 0) < n)
   __builtin_warn ("buffer overflow in placement new");
  return p;
}

and emit the warning from __builtin_warn during expansion.  That
should work and be usable in other contexts as well (e.g., to
implement overflow and other error detection in user-defined
functions).  It would be kind of like the middle-end form of
Clang attribute diagnose_if.


I may have misunderstood, but that seems to have the same issue of
applying to all calls to operator new instead of just new expressions.


Yes, but if it's only a warning then at least it doesn't cause
compilation to fail in the (probably very rare) cases where people are
doing something strange and calling it directly with a too-small
pointer.


Right.  I have never seen direct calls to placement new and
I can't think of anything they could be good for, but there
could very well be some.  If someone knows of a valid use
case I'd be interested in hearing about it.

Martin

Re: [045/nnn] poly_int: REG_ARGS_SIZE

2017-12-05 Thread Jeff Law

On 10/23/2017 11:19 AM, Richard Sandiford wrote:
> This patch adds new utility functions for manipulating REG_ARGS_SIZE
> notes and allows the notes to carry polynomial as well as constant sizes.
> 
> The code was inconsistent about whether INT_MIN or HOST_WIDE_INT_MIN
> should be used to represent an unknown size.  The patch uses
> HOST_WIDE_INT_MIN throughout.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (get_args_size, add_args_size_note): New functions.
>   (find_args_size_adjust): Return a poly_int64 rather than a
>   HOST_WIDE_INT.
>   (fixup_args_size_notes): Likewise.  Make the same change to the
>   end_args_size parameter.
>   * rtlanal.c (get_args_size, add_args_size_note): New functions.
>   * builtins.c (expand_builtin_trap): Use add_args_size_note.
>   * calls.c (emit_call_1): Likewise.
>   * explow.c (adjust_stack_1): Likewise.
>   * cfgcleanup.c (old_insns_match_p): Update use of
>   find_args_size_adjust.
>   * combine.c (distribute_notes): Track polynomial arg sizes.
>   * dwarf2cfi.c (dw_trace_info): Change beg_true_args_size,
>   end_true_args_size, beg_delay_args_size and end_delay_args_size
>   from HOST_WIDE_INT to poly_int64.
>   (add_cfi_args_size): Take the args_size as a poly_int64 rather
>   than a HOST_WIDE_INT.
>   (notice_args_size, notice_eh_throw, maybe_record_trace_start)
>   (maybe_record_trace_start_abnormal, scan_trace, connect_traces): Track
>   polynomial arg sizes.
>   * emit-rtl.c (try_split): Use get_args_size.
>   * recog.c (peep2_attempt): Likewise.
>   * reload1.c (reload_as_needed): Likewise.
>   * expr.c (find_args_size_adjust): Return the adjustment as a
>   poly_int64 rather than a HOST_WIDE_INT.
>   (fixup_args_size_notes): Change end_args_size from a HOST_WIDE_INT
>   to a poly_int64 and change the return type in the same way.
>   (emit_single_push_insn): Track polynomial arg sizes.
> 
OK.
jeff

Re: [019/nnn] poly_int: lra frame offsets

2017-12-05 Thread Jeff Law

On 10/23/2017 11:07 AM, Richard Sandiford wrote:
> This patch makes LRA use poly_int64s rather than HOST_WIDE_INTs
> to store a frame offset (including in things like eliminations).
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * lra-int.h (lra_reg): Change offset from int to poly_int64.
>   (lra_insn_recog_data): Change sp_offset from HOST_WIDE_INT
>   to poly_int64.
>   (lra_eliminate_regs_1, eliminate_regs_in_insn): Change
>   update_sp_offset from a HOST_WIDE_INT to a poly_int64.
>   (lra_update_reg_val_offset, lra_reg_val_equal_p): Take the
>   offset as a poly_int64 rather than an int.
>   * lra-assigns.c (find_hard_regno_for_1): Handle poly_int64 offsets.
>   (setup_live_pseudos_and_spill_after_risky_transforms): Likewise.
>   * lra-constraints.c (equiv_address_substitution): Track offsets
>   as poly_int64s.
>   (emit_inc): Check poly_int_rtx_p instead of CONST_INT_P.
>   (curr_insn_transform): Handle the new form of sp_offset.
>   * lra-eliminations.c (lra_elim_table): Change previous_offset
>   and offset from HOST_WIDE_INT to poly_int64.
>   (print_elim_table, update_reg_eliminate): Update accordingly.
>   (self_elim_offsets): Change from HOST_WIDE_INT to poly_int64_pod.
>   (get_elimination): Update accordingly.
>   (form_sum): Check poly_int_rtx_p instead of CONST_INT_P.
>   (lra_eliminate_regs_1, eliminate_regs_in_insn): Change
>   update_sp_offset from a HOST_WIDE_INT to a poly_int64.  Handle
>   poly_int64 offsets generally.
>   (curr_sp_change): Change from HOST_WIDE_INT to poly_int64.
>   (mark_not_eliminable, init_elimination): Update accordingly.
>   (remove_reg_equal_offset_note): Return a bool and pass the new
>   offset back by pointer as a poly_int64.
>   * lra-remat.c (change_sp_offset): Take sp_offset as a poly_int64
>   rather than a HOST_WIDE_INT.
>   (do_remat): Track offsets poly_int64s.
>   * lra.c (lra_update_insn_recog_data, setup_sp_offset): Likewise.
OK.
jeff

Re: [030/nnn] poly_int: get_addr_unit_base_and_extent

2017-12-05 Thread Jeff Law

On 10/23/2017 11:12 AM, Richard Sandiford wrote:
> This patch changes the values returned by
> get_addr_unit_base_and_extent from HOST_WIDE_INT to poly_int64.
> 
> maxsize in gimple_fold_builtin_memory_op goes from HOST_WIDE_INT
> to poly_uint64 (rather than poly_int) to match the previous use
> of tree_fits_uhwi_p.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-dfa.h (get_addr_base_and_unit_offset_1): Return the offset
>   as a poly_int64_pod rather than a HOST_WIDE_INT.
>   (get_addr_base_and_unit_offset): Likewise.
>   * tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
>   (get_addr_base_and_unit_offset): Likewise.
>   * doc/match-and-simplify.texi: Change off from HOST_WIDE_INT
>   to poly_int64 in example.
>   * fold-const.c (fold_binary_loc): Update call to
>   get_addr_base_and_unit_offset.
>   * gimple-fold.c (gimple_fold_builtin_memory_op): Likewise.
>   (maybe_canonicalize_mem_ref_addr): Likewise.
>   (gimple_fold_stmt_to_constant_1): Likewise.
>   * ipa-prop.c (ipa_modify_call_arguments): Likewise.
>   * match.pd: Likewise.
>   * omp-low.c (lower_omp_target): Likewise.
>   * tree-sra.c (build_ref_for_offset): Likewise.
>   (build_debug_ref_for_model): Likewise.
>   * tree-ssa-address.c (maybe_fold_tmr): Likewise.
>   * tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Likewise.
>   * tree-ssa-ccp.c (optimize_memcpy): Likewise.
>   * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Likewise.
>   (constant_pointer_difference): Likewise.
>   * tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
>   * tree-ssa-phiopt.c (jump_function_from_stmt): Likewise.
>   * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>   * tree-ssa-sccvn.c (vn_reference_fold_indirect): Likewise.
>   (vn_reference_maybe_forwprop_address, vn_reference_lookup_3): Likewise.
>   (set_ssa_val_to): Likewise.
>   * tree-ssa-strlen.c (get_addr_stridx, addr_stridxptr): Likewise.
>   * tree.c (build_simple_mem_ref_loc): Likewise.
OK.

Note Martin S. has some code that's ready to go into the tree that
likely will require converting some bits to poly_int and hits some of
the same areas.  Given the tree isn't poly_int aware right now there may
be some coordination between yourself and Martin S. that will need to
occur depending on which bits go in first.

jeff

Re: [070/nnn] poly_int: vectorizable_reduction

2017-12-05 Thread Jeff Law

On 11/22/2017 11:09 AM, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> This patch makes vectorizable_reduction cope with variable-length vectors.
>> We can handle the simple case of an inner loop reduction for which
>> the target has native support for the epilogue operation.  For now we
>> punt on other cases, but patches after the main SVE submission allow
>> SLP and double reductions too.
> 
> Here's an updated version that applies on top of the recent removal
> of REDUC_*_EXPR.
> 
> Thanks,
> Richard
> 
> 
> 2017-11-22  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree.h (build_index_vector): Declare.
>   * tree.c (build_index_vector): New function.
>   * tree-vect-loop.c (get_initial_def_for_reduction): Treat the number
>   of units as polynomial, forcibly converting it to a constant if
>   vectorizable_reduction has already enforced the condition.
>   (get_initial_defs_for_reduction): Likewise.
>   (vect_create_epilog_for_reduction): Likewise.  Use build_index_vector
>   to create a {1,2,3,...} vector.
>   (vectorizable_reduction): Treat the number of units as polynomial.
>   Choose vectype_in based on the largest scalar element size rather
>   than the smallest number of units.  Enforce the restrictions
>   relied on above.
I assume you'll work with Richi to address any conflicts with his patch
to allow the target to specify a preferred mode for final reductions
using shifts or extractions.

OK.
jeff

Re: [027/nnn] poly_int: DWARF CFA offsets

2017-12-05 Thread Jeff Law

On 10/23/2017 11:10 AM, Richard Sandiford wrote:
> This patch makes the DWARF code use poly_int64 rather than
> HOST_WIDE_INT for CFA offsets.  The main changes are:
> 
> - to make reg_save use a DW_CFA_expression representation when
>   the offset isn't constant and
> 
> - to record the CFA information alongside a def_cfa_expression
>   if either offset is polynomial, since it's quite difficult
>   to reconstruct the CFA information otherwise.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * gengtype.c (main): Handle poly_int64_pod.
>   * dwarf2out.h (dw_cfi_oprnd_cfa_loc): New dw_cfi_oprnd_type.
>   (dw_cfi_oprnd::dw_cfi_cfa_loc): New field.
>   (dw_cfa_location::offset, dw_cfa_location::base_offset): Change
>   from HOST_WIDE_INT to poly_int64_pod.
>   * dwarf2cfi.c (queued_reg_save::cfa_offset): Likewise.
>   (copy_cfa): New function.
>   (lookup_cfa_1): Use the cached dw_cfi_cfa_loc, if it exists.
>   (cfi_oprnd_equal_p): Handle dw_cfi_oprnd_cfa_loc.
>   (cfa_equal_p, dwarf2out_frame_debug_adjust_cfa)
>   (dwarf2out_frame_debug_cfa_offset, dwarf2out_frame_debug_expr)
>   (initial_return_save): Treat offsets as poly_ints.
>   (def_cfa_0): Likewise.  Cache the CFA in dw_cfi_cfa_loc if either
>   offset is nonconstant.
>   (reg_save): Take the offset as a poly_int64.  Fall back to
>   DW_CFA_expression for nonconstant offsets.
>   (queue_reg_save): Take the offset as a poly_int64.
>   * dwarf2out.c (dw_cfi_oprnd2_desc): Handle DW_CFA_def_cfa_expression.
OK.
jeff

Re: [056/nnn] poly_int: MEM_REF offsets

2017-12-05 Thread Jeff Law

On 10/23/2017 11:23 AM, Richard Sandiford wrote:
> This patch allows MEM_REF offsets to be polynomial, with mem_ref_offset
> now returning a poly_offset_int instead of an offset_int.  The
> non-mechanical changes to callers of mem_ref_offset were handled by
> previous patches.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * fold-const.h (mem_ref_offset): Return a poly_offset_int rather
>   than an offset_int.
>   * tree.c (mem_ref_offset): Likewise.
>   * builtins.c (get_object_alignment_2): Treat MEM_REF offsets as
>   poly_ints.
>   * expr.c (get_inner_reference, expand_expr_real_1): Likewise.
>   * gimple-fold.c (get_base_constructor): Likewise.
>   * gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
>   * ipa-polymorphic-call.c
>   (ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise.
>   * ipa-prop.c (compute_complex_assign_jump_func, get_ancestor_addr_info)
>   (ipa_get_adjustment_candidate): Likewise.
>   * match.pd: Likewise.
>   * tree-data-ref.c (dr_analyze_innermost): Likewise.
>   * tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
>   * tree-eh.c (tree_could_trap_p): Likewise.
>   * tree-object-size.c (addr_object_size): Likewise.
>   * tree-ssa-address.c (copy_ref_info): Likewise.
>   * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
>   (indirect_refs_may_alias_p): Likewise.
>   * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>   * tree-ssa.c (maybe_rewrite_mem_ref_base): Likewise.
>   (non_rewritable_mem_ref_base): Likewise.
>   * tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
>   * tree-vrp.c (search_for_addr_array): Likewise.
>   * varasm.c (decode_addr_const): Likewise.
OK.
jeff

Re: [073/nnn] poly_int: vectorizable_load/store

2017-12-05 Thread Jeff Law

On 10/23/2017 11:29 AM, Richard Sandiford wrote:
> This patch makes vectorizable_load and vectorizable_store cope with
> variable-length vectors.  The reverse and permute cases will be
> excluded by the code that checks the permutation mask (although a
> patch after the main SVE submission adds support for the reversed
> case).  Here we also need to exclude VMAT_ELEMENTWISE and
> VMAT_STRIDED_SLP, which split the operation up into a constant
> number of constant-sized operations.  We also don't try to extend
> the current widening gather/scatter support to variable-length
> vectors, since SVE uses a different approach.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vect-stmts.c (get_load_store_type): Treat the number of
>   units as polynomial.  Reject VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
>   for variable-length vectors.
>   (vectorizable_mask_load_store): Treat the number of units as
>   polynomial, asserting that it is constant if the condition has
>   already been enforced.
>   (vectorizable_store, vectorizable_load): Likewise.

OK.
jeff

Re: [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES

2017-12-05 Thread Jeff Law

On 10/23/2017 11:28 AM, Richard Sandiford wrote:
> This patch changes the type of current_vector_size to poly_uint64.
> It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills
> in a vector of possible sizes (as poly_uint64s) instead of returning
> a bitmask.  The documentation claimed that the hook didn't need to
> include the default vector size (returned by preferred_simd_mode),
> but that wasn't consistent with the omp-low.c usage.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * target.h (vector_sizes, auto_vector_sizes): New typedefs.
>   * target.def (autovectorize_vector_sizes): Return the vector sizes
>   by pointer, using vector_sizes rather than a bitmask.
>   * targhooks.h (default_autovectorize_vector_sizes): Update accordingly.
>   * targhooks.c (default_autovectorize_vector_sizes): Likewise.
>   * config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
>   Likewise.
>   * config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise.
>   * config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise.
>   * config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise.
>   * config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise.
>   * omp-general.c (omp_max_vf): Likewise.
>   * omp-low.c (omp_clause_aligned_alignment): Likewise.
>   * optabs-query.c (can_vec_mask_load_store_p): Likewise.
>   * tree-vect-loop.c (vect_analyze_loop): Likewise.
>   * tree-vect-slp.c (vect_slp_bb): Likewise.
>   * doc/tm.texi: Regenerate.
>   * tree-vectorizer.h (current_vector_size): Change from an unsigned int
>   to a poly_uint64.
>   * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take
>   the vector size as a poly_uint64 rather than an unsigned int.
>   (current_vector_size): Change from an unsigned int to a poly_uint64.
>   (get_vectype_for_scalar_type): Update accordingly.
>   * tree.h (build_truth_vector_type): Take the size and number of
>   units as a poly_uint64 rather than an unsigned int.
>   (build_vector_type): Add a temporary overload that takes
>   the number of units as a poly_uint64 rather than an unsigned int.
>   * tree.c (make_vector_type): Likewise.
>   (build_truth_vector_type): Take the number of units as a poly_uint64
>   rather than an unsigned int.

OK.

jeff

Re: [101/nnn] poly_int: GET_MODE_NUNITS

2017-12-05 Thread Jeff Law

On 10/23/2017 11:41 AM, Richard Sandiford wrote:
> This patch changes GET_MODE_NUNITS from unsigned char
> to poly_uint16, although it remains a macro when compiling
> target code with NUM_POLY_INT_COEFFS == 1.
> 
> If the number of units isn't known at compile time, we use:
> 
>   (const:M (vec_duplicate:M X))
> 
> to represent a vector in which every element is equal to X.  The code
> ensures that there is only a single instance of each constant, so that
> pointer equality is enough.  (This is a requirement for the constants
> that go in const_tiny_rtx, but we might as well do it for all constants.)
> 
> Similarly we use:
> 
>   (const:M (vec_series:M A B))
> 
> for a linear series starting at A and having step B.
> 
> The to_constant call in make_vector_type goes away in a later patch.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * machmode.h (mode_nunits): Change from unsigned char to
>   poly_uint16_pod.
>   (ONLY_FIXED_SIZE_MODES): New macro.
>   (pod_mode::measurement_type, scalar_int_mode::measurement_type)
>   (scalar_float_mode::measurement_type, scalar_mode::measurement_type)
>   (complex_mode::measurement_type, fixed_size_mode::measurement_type):
>   New typedefs.
>   (mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
>   (GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
>   or if measurement_type is not polynomial.
>   * genmodes.c (ZERO_COEFFS): New macro.
>   (emit_mode_nunits_inline): Make mode_nunits_inline return a
>   poly_uint16.
>   (emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
>   Use ZERO_COEFFS when emitting initializers.
>   * data-streamer.h (bp_pack_poly_value): New function.
>   (bp_unpack_poly_value): Likewise.
>   * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
>   for GET_MODE_NUNITS.
>   * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
>   for GET_MODE_NUNITS.
>   * tree.c (make_vector_type): Remove temporary shim and make
>   the real function take the number of units as a poly_uint64
>   rather than an int.
>   (build_vector_type_for_mode): Handle polynomial nunits.
>   * emit-rtl.c (gen_const_vec_duplicate_1): Likewise.
>   (gen_const_vec_series, gen_rtx_CONST_VECTOR): Likewise.
>   * genrecog.c (validate_pattern): Likewise.
>   * optabs-query.c (can_mult_highpart_p): Likewise.
>   * optabs-tree.c (expand_vec_cond_expr_p): Likewise.
>   * optabs.c (expand_vector_broadcast, expand_binop_directly)
>   (shift_amt_for_vec_perm_mask, expand_vec_perm, expand_vec_cond_expr)
>   (expand_mult_highpart): Likewise.
>   * rtlanal.c (subreg_get_info): Likewise.
>   * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>   (simplify_const_unary_operation, simplify_binary_operation_1)
>   (simplify_const_binary_operation, simplify_ternary_operation)
>   (test_vector_ops_duplicate, test_vector_ops): Likewise.
>   * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
>   (vect_grouped_load_supported): Likewise.
>   * tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
>   * tree-vect-loop.c (have_whole_vector_shift): Likewise.
> 
> gcc/ada/
>   * gcc-interface/misc.c (enumerate_modes): Handle polynomial
>   GET_MODE_NUNITS.
OK.
jeff

Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS

2017-12-05 Thread Jeff Law

On 10/23/2017 11:41 AM, Richard Sandiford wrote:
> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
> encoded in the 10-bit precision field and was previously always stored
> as a simple log2 value.  The challenge was to use this 10 bits to
> encode the number of elements in variable-length vectors, so that
> we didn't need to increase the size of the tree.
> 
> In practice the number of vector elements should always have the form
> N + N * X (where X is the runtime value), and as for constant-length
> vectors, N must be a power of 2 (even though X itself might not be).
> The patch therefore uses the low bit to select between constant-length
> and variable-length and uses the upper 9 bits to encode log2(N).
> Targets without variable-length vectors continue to use the old scheme.
> 
> A new valid_vector_subparts_p function tests whether a given number
> of elements can be encoded.  This is false for the vector modes that
> represent an LD3 or ST3 vector triple (which we want to treat as arrays
> of vectors rather than single vectors).
> 
> Most of the patch is mechanical; previous patches handled the changes
> that weren't entirely straightforward.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
>   polynomial numbers of units.
>   (SET_TYPE_VECTOR_SUBPARTS): Likewise.
>   (valid_vector_subparts_p): New function.
>   (build_vector_type): Remove temporary shim and take the number
>   of units as a poly_uint64 rather than an int.
>   (build_opaque_vector_type): Take the number of units as a
>   poly_uint64 rather than an int.
>   * tree.c (build_vector): Handle polynomial TYPE_VECTOR_SUBPARTS.
>   (build_vector_from_ctor, type_hash_canon_hash): Likewise.
>   (type_cache_hasher::equal, uniform_vector_p): Likewise.
>   (vector_type_mode): Likewise.
>   (build_vector_from_val): If the number of units isn't constant,
>   use build_vec_duplicate_cst for constant operands and
>   VEC_DUPLICATE_EXPR otherwise.
>   (make_vector_type): Remove temporary is_constant ().
>   (build_vector_type, build_opaque_vector_type): Take the number of
>   units as a poly_uint64 rather than an int.
>   * cfgexpand.c (expand_debug_expr): Handle polynomial
>   TYPE_VECTOR_SUBPARTS.
>   * expr.c (count_type_elements, store_constructor): Likewise.
>   * fold-const.c (const_binop, const_unop, fold_convert_const)
>   (operand_equal_p, fold_view_convert_expr, fold_vec_perm)
>   (fold_ternary_loc, fold_relational_const): Likewise.
>   (native_interpret_vector): Likewise.  Change the size from an
>   int to an unsigned int.
>   * gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
>   TYPE_VECTOR_SUBPARTS.
>   (gimple_fold_indirect_ref, gimple_build_vector): Likewise.
>   (gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
>   duplicating a non-constant operand into a variable-length vector.
>   * match.pd: Handle polynomial TYPE_VECTOR_SUBPARTS.
>   * omp-simd-clone.c (simd_clone_subparts): Likewise.
>   * print-tree.c (print_node): Likewise.
>   * stor-layout.c (layout_type): Likewise.
>   * targhooks.c (default_builtin_vectorization_cost): Likewise.
>   * tree-cfg.c (verify_gimple_comparison): Likewise.
>   (verify_gimple_assign_binary): Likewise.
>   (verify_gimple_assign_ternary): Likewise.
>   (verify_gimple_assign_single): Likewise.
>   * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
>   * tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
>   (vect_grouped_load_supported, vect_permute_load_chain): Likewise.
>   (vect_shift_permute_load_chain): Likewise.
>   * tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
>   (expand_vector_condition, optimize_vector_constructor): Likewise.
>   (lower_vec_perm, get_compute_type): Likewise.
>   * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
>   (get_initial_defs_for_reduction, vect_transform_loop): Likewise.
>   * tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
>   (vect_recog_mask_conversion_pattern): Likewise.
>   * tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
>   (vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
>   * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
>   (get_group_load_store_type, vectorizable_mask_load_store): Likewise.
>   (vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
>   (vectorizable_shift, vectorizable_operation, vectorizable_store)
>   (vect_gen_perm_mask_any, vectorizable_load, vect_is_simple_cond)
>   (vectorizable_comparison, supportable_widening_operation): Likewise.
>   (supportable_narrowing_operation): Likewise.
> 
> gcc/ada/
>

Re: [063/nnn] poly_int: vectoriser vf and uf

2017-12-05 Thread Jeff Law

On 10/23/2017 11:26 AM, Richard Sandiford wrote:
> This patch changes the type of the vectorisation factor and SLP
> unrolling factor to poly_uint64.  This in turn required some knock-on
> changes in signedness elsewhere.
> 
> Cost decisions are generally based on estimated_poly_value,
> which for VF is wrapped up as vect_vf_for_cost.
> 
> The patch doesn't on its own enable variable-length vectorisation.
> It just makes the minimum changes necessary for the code to build
> with the new VF and UF types.  Later patches also make the
> vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
> GET_MODE_NUNITS, at which point the code really does handle
> variable-length vectors.
> 
> The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
> to avoid hard-coding a particular architectural limit.
> 
> The patch includes a new test because a development version of the patch
> accidentally used file print routines instead of dump_*, which would
> fail with -fopt-info.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree-vectorizer.h (_slp_instance::unrolling_factor): Change
>   from an unsigned int to a poly_uint64.
>   (_loop_vec_info::slp_unrolling_factor): Likewise.
>   (_loop_vec_info::vectorization_factor): Change from an int
>   to a poly_uint64.
>   (MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
>   (vect_get_num_vectors): New function.
>   (vect_update_max_nunits, vect_vf_for_cost): Likewise.
>   (vect_get_num_copies): Use vect_get_num_vectors.
>   (vect_analyze_data_ref_dependences): Change max_vf from an int *
>   to an unsigned int *.
>   (vect_analyze_data_refs): Change min_vf from an int * to a
>   poly_uint64 *.
>   (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
>   than an unsigned HOST_WIDE_INT.
>   * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
>   (vect_analyze_data_ref_dependence): Change max_vf from an int *
>   to an unsigned int *.
>   (vect_analyze_data_ref_dependences): Likewise.
>   (vect_compute_data_ref_alignment): Handle polynomial vf.
>   (vect_enhance_data_refs_alignment): Likewise.
>   (vect_prune_runtime_alias_test_list): Likewise.
>   (vect_shift_permute_load_chain): Likewise.
>   (vect_supportable_dr_alignment): Likewise.
>   (dependence_distance_ge_vf): Take the vectorization factor as a
>   poly_uint64 rather than an unsigned HOST_WIDE_INT.
>   (vect_analyze_data_refs): Change min_vf from an int * to a
>   poly_uint64 *.
>   * tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
>   vfm1 as a poly_uint64 rather than an int.  Make the same change
>   for the returned bound_scalar.
>   (vect_gen_vector_loop_niters): Handle polynomial vf.
>   (vect_do_peeling): Likewise.  Update call to
>   vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
>   (vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
>   be constant.
>   * tree-vect-loop.c (vect_determine_vectorization_factor)
>   (vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
>   (vect_get_known_peeling_cost): Likewise.
>   (vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
>   (vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
>   (vect_transform_loop): Likewise.  Use the lowest possible VF when
>   updating the upper bounds of the loop.
>   (vect_min_worthwhile_factor): Make static.  Return an unsigned int
>   rather than an int.
>   * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
>   polynomial unroll factors.
>   (vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
>   (vect_make_slp_decision): Likewise.
>   (vect_supported_load_permutation_p): Likewise, and polynomial
>   vf too.
>   (vect_analyze_slp_cost): Handle polynomial vf.
>   (vect_slp_analyze_node_operations): Likewise.
>   (vect_slp_analyze_bb_1): Likewise.
>   (vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
>   than an unsigned HOST_WIDE_INT.
>   * tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
>   (vectorizable_load): Handle polynomial vf.
>   * tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
>   a poly_uint64.
>   (adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.
> 
> gcc/testsuite/
>   * gcc.dg/vect-opt-info-1.c: New test.
> 

OK.

Phew.  These are getting bigger...

As I go through this I find myself wondering if as a project we would be
better off moving to a different review model.  A whole lot of this
stuff is pretty straightforward once the basic design is agreed upon --
at which point review isn't adding much.

So I find myself wondering if we ought to (in the future for large work
like this) agree on the overall

Re: [043/nnn] poly_int: frame allocations

2017-12-05 Thread Jeff Law

On 10/23/2017 11:18 AM, Richard Sandiford wrote:
> This patch converts the frame allocation code (mostly in function.c)
> to use poly_int64 rather than HOST_WIDE_INT for frame offsets and
> sizes.
> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * function.h (frame_space): Change start and length from HOST_WIDE_INT
>   to poly_int64.
>   (get_frame_size): Return the size as a poly_int64 rather than a
>   HOST_WIDE_INT.
>   (frame_offset_overflow): Take the offset as a poly_int64 rather
>   than a HOST_WIDE_INT.
>   (assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
>   (assign_stack_temp): Likewise for the size.
>   * function.c (get_frame_size): Return a poly_int64 rather than
>   a HOST_WIDE_INT.
>   (frame_offset_overflow): Take the offset as a poly_int64 rather
>   than a HOST_WIDE_INT.
>   (try_fit_stack_local): Take the start, length and size as poly_int64s
>   rather than HOST_WIDE_INTs.  Return the offset as a poly_int64_pod
>   rather than a HOST_WIDE_INT.
>   (add_frame_space): Take the start and end as poly_int64s rather than
>   HOST_WIDE_INTs.
>   (assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
>   (assign_stack_temp): Likewise for the size.
>   (temp_slot): Change size, base_offset and full_size from HOST_WIDE_INT
>   to poly_int64.
>   (find_temp_slot_from_address): Handle polynomial offsets.
>   (combine_temp_slots): Likewise.
>   * emit-rtl.h (rtl_data::x_frame_offset): Change from HOST_WIDE_INT
>   to poly_int64.
>   * cfgexpand.c (alloc_stack_frame_space): Return the offset as a
>   poly_int64 rather than a HOST_WIDE_INT.
>   (expand_one_stack_var_at): Take the offset as a poly_int64 rather
>   than a HOST_WIDE_INT.
>   (expand_stack_vars, expand_one_stack_var_1, expand_used_vars): Handle
>   polynomial frame offsets.
>   * config/m32r/m32r-protos.h (m32r_compute_frame_size): Take the size
>   as a poly_int64 rather than an int.
>   * config/m32r/m32r.c (m32r_compute_frame_size): Likewise.
>   * config/v850/v850-protos.h (compute_frame_size): Likewise.
>   * config/v850/v850.c (compute_frame_size): Likewise.
>   * config/xtensa/xtensa-protos.h (compute_frame_size): Likewise.
>   * config/xtensa/xtensa.c (compute_frame_size): Likewise.
>   * config/pa/pa-protos.h (pa_compute_frame_size): Likewise.
>   * config/pa/pa.c (pa_compute_frame_size): Likewise.
>   * explow.h (get_dynamic_stack_base): Take the offset as a poly_int64
>   rather than a HOST_WIDE_INT.
>   * explow.c (get_dynamic_stack_base): Likewise.
>   * final.c (final_start_function): Use the constant lower bound
>   of the frame size for -Wframe-larger-than.
>   * ira.c (do_reload): Adjust for new get_frame_size return type.
>   * lra.c (lra): Likewise.
>   * reload1.c (reload): Likewise.
>   * config/avr/avr.c (avr_asm_function_end_prologue): Likewise.
>   * config/pa/pa.h (EXIT_IGNORE_STACK): Likewise.
>   * rtlanal.c (get_initial_register_offset): Return the offset as
>   a poly_int64 rather than a HOST_WIDE_INT.
> 

OK
Jeff

Re: [PATCH] Fix PR80846, change vectorizer reduction epilogue (on x86)

2017-12-05 Thread Richard Biener

On December 5, 2017 9:18:46 PM GMT+01:00, Jeff Law  wrote:
>On 11/28/2017 08:15 AM, Richard Biener wrote:
>> 
>> The following adds a new target hook,
>targetm.vectorize.split_reduction,
>> which allows the target to specify a preferred mode to perform the
>> final reducion on using either vector shifts or scalar extractions.
>> Up to that mode the vector reduction result is reduced by combining
>> lowparts and highparts recursively.  This avoids lane-crossing
>operations
>> when doing AVX256 on Zen and Bulldozer and also speeds up things on
>> Haswell (I verified ~20% speedup on Broadwell).
>> 
>> Thus the patch implements the target hook on x86 to _always_ prefer
>> SSE modes for the final reduction.
>> 
>> For the testcase in the bugzilla
>> 
>> int sumint(const int arr[]) {
>> arr = __builtin_assume_aligned(arr, 64);
>> int sum=0;
>> for (int i=0 ; i<1024 ; i++)
>>   sum+=arr[i];
>> return sum;
>> }
>> 
>> this changes -O3 -mavx512f code from
>> 
>> sumint:
>> .LFB0:
>> .cfi_startproc
>> vpxord  %zmm0, %zmm0, %zmm0
>> leaq4096(%rdi), %rax
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> vpaddd  (%rdi), %zmm0, %zmm0
>> addq$64, %rdi
>> cmpq%rdi, %rax
>> jne .L2
>> vpxord  %zmm1, %zmm1, %zmm1
>> vshufi32x4  $78, %zmm1, %zmm0, %zmm2
>> vpaddd  %zmm2, %zmm0, %zmm0
>> vmovdqa64   .LC0(%rip), %zmm2
>> vpermi2d%zmm1, %zmm0, %zmm2
>> vpaddd  %zmm2, %zmm0, %zmm0
>> vmovdqa64   .LC1(%rip), %zmm2
>> vpermi2d%zmm1, %zmm0, %zmm2
>> vpaddd  %zmm2, %zmm0, %zmm0
>> vmovdqa64   .LC2(%rip), %zmm2
>> vpermi2d%zmm1, %zmm0, %zmm2
>> vpaddd  %zmm2, %zmm0, %zmm0
>> vmovd   %xmm0, %eax
>> 
>> to
>> 
>> sumint:
>> .LFB0:
>> .cfi_startproc
>> vpxord  %zmm0, %zmm0, %zmm0
>> leaq4096(%rdi), %rax
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> vpaddd  (%rdi), %zmm0, %zmm0
>> addq$64, %rdi
>> cmpq%rdi, %rax
>> jne .L2
>> vextracti64x4   $0x1, %zmm0, %ymm1
>> vpaddd  %ymm0, %ymm1, %ymm1
>> vmovdqa %xmm1, %xmm0
>> vextracti128$1, %ymm1, %xmm1
>> vpaddd  %xmm1, %xmm0, %xmm0
>> vpsrldq $8, %xmm0, %xmm1
>> vpaddd  %xmm1, %xmm0, %xmm0
>> vpsrldq $4, %xmm0, %xmm1
>> vpaddd  %xmm1, %xmm0, %xmm0
>> vmovd   %xmm0, %eax
>> 
>> and for -O3 -mavx2 from
>> 
>> sumint:
>> .LFB0:
>> .cfi_startproc
>> vpxor   %xmm0, %xmm0, %xmm0
>> leaq4096(%rdi), %rax
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> vpaddd  (%rdi), %ymm0, %ymm0
>> addq$32, %rdi
>> cmpq%rdi, %rax
>> jne .L2
>> vpxor   %xmm1, %xmm1, %xmm1
>> vperm2i128  $33, %ymm1, %ymm0, %ymm2
>> vpaddd  %ymm2, %ymm0, %ymm0
>> vperm2i128  $33, %ymm1, %ymm0, %ymm2
>> vpalignr$8, %ymm0, %ymm2, %ymm2
>> vpaddd  %ymm2, %ymm0, %ymm0
>> vperm2i128  $33, %ymm1, %ymm0, %ymm1
>> vpalignr$4, %ymm0, %ymm1, %ymm1
>> vpaddd  %ymm1, %ymm0, %ymm0
>> vmovd   %xmm0, %eax
>> 
>> to
>> 
>> sumint:
>> .LFB0:
>> .cfi_startproc
>> vpxor   %xmm0, %xmm0, %xmm0
>> leaq4096(%rdi), %rax
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> vpaddd  (%rdi), %ymm0, %ymm0
>> addq$32, %rdi
>> cmpq%rdi, %rax
>> jne .L2
>> vmovdqa %xmm0, %xmm1
>> vextracti128$1, %ymm0, %xmm0
>> vpaddd  %xmm0, %xmm1, %xmm0
>> vpsrldq $8, %xmm0, %xmm1
>> vpaddd  %xmm1, %xmm0, %xmm0
>> vpsrldq $4, %xmm0, %xmm1
>> vpaddd  %xmm1, %xmm0, %xmm0
>> vmovd   %xmm0, %eax
>> vzeroupper
>> ret
>> 
>> which besides being faster is also smaller (less prefixes).
>> 
>> SPEC 2k6 results on Haswell (thus AVX2) are neutral.  As it merely
>> effects reduction vectorization epilogues I didn't expect big effects
>> but for loops that do not run much (more likely with AVX512).
>> 
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> 
>> Ok for trunk?
>> 
>> The PR mentions some more tricks to optimize the sequence but
>> those look like backend only optimizations.
>> 
>> Thanks,
>> Richard.
>> 
>> 2017-11-28  Richard Biener  
>> 
>>  PR tree-optimization/80846
>>  * target.def (split_reduction): New target hook.
>>  * targhooks.c (default_split_reduction): New function.
>>  * targhooks.h (default_split_reduction): Declare.
>>  * tree-vect-loop.c (vect_create_epilog_for_reduction): If the
>>  target requests first reduce vectors by combining low and high
>>  parts.
>>  * tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
>>  (get_vectype_for_scalar_type_and_size): Export

1 2 >

1 - 100 of 101 matches

Mail list logo