[committed, gomp4] Fix release_dangling_ssa_names

2015-08-05 Thread Tom de Vries

[ was: Re: Expand oacc kernels after pass_fre ]
On 04/06/15 18:02, Tom de Vries wrote:

> > Please move this out of the class body.
> >
>
> Fixed and committed (omitting patch as trivial).
>
> > > +{
> > > +  unsigned res = execute_expand_omp ();
> > > +
> > > +  /* After running pass_expand_omp_ssa to expand the oacc kernels
> > > +     directive, we are left in the original function with anonymous
> > > +     SSA_NAMEs, with a defining statement that has been deleted.  This
> > > +     pass finds those SSA_NAMEs and releases them.
> > > +     TODO: Either fix this elsewhere, or make the fix unnecessary.  */
> > > +  unsigned int i;
> > > +  for (i = 1; i < num_ssa_names; ++i)
> > > +    {
> > > +      tree name = ssa_name (i);
> > > +      if (name == NULL_TREE)
> > > +        continue;
> > > +
> > > +      gimple stmt = SSA_NAME_DEF_STMT (name);
> > > +      bool found = false;
> > > +
> > > +      ssa_op_iter op_iter;
> > > +      def_operand_p def_p;
> > > +      FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
> > > +        {
> > > +          tree def = DEF_FROM_PTR (def_p);
> > > +          if (def == name)
> > > +            {
> > > +              found = true;
> > > +              break;
> > > +            }
> > > +        }
> > > +
> > > +      if (!found)
> > > +        {
> > > +          if (dump_file)
> > > +            fprintf (dump_file, "Released dangling ssa name %u\n", i);
> > > +          release_ssa_name (name);
> > > +        }
> > > +    }
> > > +
> > > +  return res;
> > > +}


This patch implements the TODO.

The cause of the problems is that in replace_ssa_name, we create a new 
ssa_name with the def stmt of the old ssa_name, but do not reset the def 
stmt of the old ssa_name, leaving the ssa_name in the original function 
having a def stmt in the split-off function.


[ And if we don't do anything about that, at some point in another pass
we use 'gimple_bb (SSA_NAME_DEF_STMT (name))->index' (a bb index in the
split-off function) as an index into an array whose length is the number
of bbs in the original function. So the index may be out of bounds. ]


This patch fixes that by making sure we reset the def stmt to NULL. This 
means we can simplify release_dangling_ssa_names to just test for NULL 
def stmts.


Default defs are skipped by release_ssa_name, so setting the def stmt 
for default defs to NULL does not result in the name being released, but 
in an ssa-verification error. So instead, we keep the def stmt nop, and 
create a new nop for the copy in the split-off function.
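
The simplified release loop can be pictured with a toy model. This is only a
sketch under stated assumptions: struct toy_name, struct toy_stmt and
toy_release_dangling_names are hypothetical stand-ins, not GCC's real data
structures or API. It shows names with a NULL def stmt being released, and a
default def surviving because it keeps its nop def stmt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's statement and SSA_NAME bookkeeping.  */
struct toy_stmt { int id; };

struct toy_name
{
  struct toy_stmt *def_stmt;  /* NULL once the definition has moved away.  */
  bool is_default_def;        /* Default defs keep a nop as def stmt.  */
  bool in_free_list;          /* Set when the name has been released.  */
};

/* Release every name whose def stmt is NULL and return how many were
   released.  Index 0 is skipped, mirroring the loop that starts at 1.  */
unsigned
toy_release_dangling_names (struct toy_name *names, size_t n)
{
  unsigned released = 0;
  for (size_t i = 1; i < n; ++i)
    {
      struct toy_name *name = &names[i];
      if (name->in_free_list || name->def_stmt != NULL)
        continue;

      /* A default def must never get here: it keeps its nop def stmt.  */
      assert (!name->is_default_def);
      name->in_free_list = true;
      ++released;
    }
  return released;
}
```

In this model a second call finds nothing left to release, which is the
property the gcc_assert on SSA_NAME_IN_FREE_LIST in the patch checks per name.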


[ The default def bit seems only to be triggered for the default def 
created by expand_omp_target:

...
  /* If we are in ssa form, we must load the value from the default
 definition of the argument.  That should not be defined now,
 since the argument is not used uninitialized.  */
  gcc_assert (ssa_default_def (cfun, arg) == NULL);
  narg = make_ssa_name (arg, gimple_build_nop ());
  set_ssa_default_def (cfun, arg, narg);
...
]

Bootstrapped and reg-tested on x86_64.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Fix release_dangling_ssa_names

2015-08-05  Tom de Vries  

	* omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
	def stmt.
	* tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
	stmt of unused SSA_NAME to NULL.
---
 gcc/omp-low.c  | 35 +++
 gcc/tree-cfg.c | 17 -
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0ebbbe1..cd2076f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10349,11 +10349,10 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 
-/* After running pass_expand_omp_ssa to expand the oacc kernels
-   directive, we are left in the original function with anonymous
-   SSA_NAMEs, with a defining statement that has been deleted.  This
-   pass finds those SSA_NAMEs and releases them.
-   TODO: Either fix this elsewhere, or make the fix unnecessary.  */
+/* After running pass_expand_omp_ssa to expand the oacc kernels directive, we
+   are left in the original function with anonymous SSA_NAMEs, with a NULL
+   defining statement.  This function finds those SSA_NAMEs and releases
+   them.  */
 
 static void
 release_dangling_ssa_names (void)
@@ -10366,26 +10365,14 @@ release_dangling_ssa_names (void)
 	continue;
 
   gimple stmt = SSA_NAME_DEF_STMT (name);
-  bool found = false;
-
-  ssa_op_iter op_iter;
-  def_operand_p def_p;
-  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
-	{
-	  tree def = DEF_FROM_PTR (def_p);
-	  if (def == name)
-	{
-	  found = true;
-	  break;
-	}
-	}
+  if (stmt != NULL)
+	continue;
 
-  if (!found)
-	{
-	  if (dump_file)
-	fprintf (dump_file, "Released dangling ssa name %u\n", i);
-	  release_ssa_name (name);
-	}
+  release_ssa_name (name);
+  gcc_assert (SSA_NAME_IN_FREE_LIST (name));
+  if (dump_file
+	  && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Released dangling ssa name %u\n", i);
 }
 }
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cb9fe6d..6a00b25 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-

Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-08-05 Thread Andreas Schwab
Martin Sebor  writes:

> gcc/testsuite/ChangeLog
> 2015-07-28  Martin Sebor  
>
> * g++.dg/Wframe-address-in-Wall.C: New test.
> * g++.dg/Wframe-address.C: New test.
> * g++.dg/Wno-frame-address.C: New test.
> * gcc.dg/Wframe-address-in-Wall.c: New test.
> * gcc.dg/Wframe-address.c: New test.
> * gcc.dg/Wno-frame-address.c: New test.

FAIL: g++.dg/Wno-frame-address.C  -std=gnu++11 (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:42:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:43:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:44:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:45:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:65:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:66:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:67:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:68:30: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:15:28: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:16:28: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:17:28: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/g++.dg/Wframe-address.C:18:28: error: 
unsupported argument to 'void* __builtin_return_address(unsigned int)' [-Werror]

FAIL: gcc.dg/Wno-frame-address.c (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:24:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:25:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:26:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:27:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:47:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:48:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:49:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/Wframe-address.c:50:5: error: 
unsupported argument to '__builtin_return_address' [-Werror]

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [committed, gomp4] Fix release_dangling_ssa_names

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Tom de Vries wrote:

> [ was: Re: Expand oacc kernels after pass_fre ]
> On 04/06/15 18:02, Tom de Vries wrote:
> > > Please move this out of the class body.
> > > 
> > 
> > Fixed and committed (omitting patch as trivial).
> > 
> > > > +{
> > > > +  unsigned res = execute_expand_omp ();
> > > > +
> > > > +  /* After running pass_expand_omp_ssa to expand the oacc kernels
> > > > + directive, we are left in the original function with anonymous
> > > > + SSA_NAMEs, with a defining statement that has been deleted.  This
> > > > + pass finds those SSA_NAMEs and releases them.
> > > > + TODO: Either fix this elsewhere, or make the fix unnecessary.  */
> > > > +  unsigned int i;
> > > > +  for (i = 1; i < num_ssa_names; ++i)
> > > > +{
> > > > +  tree name = ssa_name (i);
> > > > +  if (name == NULL_TREE)
> > > > +continue;
> > > > +
> > > > +  gimple stmt = SSA_NAME_DEF_STMT (name);
> > > > +  bool found = false;
> > > > +
> > > > +  ssa_op_iter op_iter;
> > > > +  def_operand_p def_p;
> > > > +  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
> > > > +{
> > > > +  tree def = DEF_FROM_PTR (def_p);
> > > > +  if (def == name)
> > > > +{
> > > > +  found = true;
> > > > +  break;
> > > > +}
> > > > +}
> > > > +
> > > > +  if (!found)
> > > > +{
> > > > +  if (dump_file)
> > > > +fprintf (dump_file, "Released dangling ssa name %u\n", i);
> > > > +  release_ssa_name (name);
> > > > +}
> > > > +}
> > > > +
> > > > +  return res;
> > > > +}
> 
> This patch implements the TODO.
> 
> The cause of the problems is that in replace_ssa_name, we create a new
> ssa_name with the def stmt of the old ssa_name, but do not reset the def stmt
> of the old ssa_name, leaving the ssa_name in the original function having a
> def stmt in the split-off function.
> 
> [ And if we don't do anything about that, at some point in another pass we use
> 'gimple_bb (SSA_NAME_DEF_STMT (name))->index' (a bb index in the split-off
> function) as an index into an array whose length is the number of bbs in the
> original function. So the index may be out of bounds. ]
> 
> This patch fixes that by making sure we reset the def stmt to NULL. This means
> we can simplify release_dangling_ssa_names to just test for NULL def stmts.

Not sure if I understand the problem correctly but why are you not simply
releasing the SSA name when you remove its definition?

Richard.

> Default defs are skipped by release_ssa_name, so setting the def stmt for
> default defs to NULL does not result in the name being released, but in an
> ssa-verification error. So instead, we keep the def stmt nop, and create a new
> nop for the copy in the split-off function.
> 
> [ The default def bit seems only to be triggered for the default def created
> by expand_omp_target:
> ...
>   /* If we are in ssa form, we must load the value from the default
>  definition of the argument.  That should not be defined now,
>  since the argument is not used uninitialized.  */
>   gcc_assert (ssa_default_def (cfun, arg) == NULL);
>   narg = make_ssa_name (arg, gimple_build_nop ());
>   set_ssa_default_def (cfun, arg, narg);
> ...
> ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Committed to gomp-4_0-branch.
> 
> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


[PATCH] Fix PR67107

2015-08-05 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-08-05  Richard Biener  

PR middle-end/67107
* match.pd: Guard const_binop result checking against NULL_TREE
result.

* gcc.dg/pr67107.c: New testcase.
Index: gcc/match.pd
===
--- gcc/match.pd(revision 226551)
+++ gcc/match.pd(working copy)
@@ -1594,7 +1639,7 @@ (define_operator_list CBRT BUILT_IN_CBRT
tree tem = const_binop (op == PLUS_EXPR ? MINUS_EXPR : PLUS_EXPR,
   TREE_TYPE (@1), @2, @1);
  }
- (if (!TREE_OVERFLOW (tem))
+ (if (tem && !TREE_OVERFLOW (tem))
   (cmp @0 { tem; }))
 
  /* Likewise, we can simplify a comparison of a real constant with
@@ -1605,7 +1650,7 @@ (define_operator_list CBRT BUILT_IN_CBRT
   (simplify
(cmp (minus REAL_CST@0 @1) REAL_CST@2)
(with { tree tem = const_binop (MINUS_EXPR, TREE_TYPE (@1), @0, @2); }
-(if (!TREE_OVERFLOW (tem))
+(if (tem && !TREE_OVERFLOW (tem))
  (cmp { tem; } @1)
 
  /* Fold comparisons against built-in math functions.  */
Index: gcc/testsuite/gcc.dg/pr67107.c
===
--- gcc/testsuite/gcc.dg/pr67107.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr67107.c  (working copy)
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-frounding-math -funsafe-math-optimizations" } */
+
+int test ()
+{
+  return 5.0 < 5.0 - 0.1;
+}
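
The shape of the guard, outside of match.pd, looks roughly like the following.
This is a sketch only: struct toy_cst, toy_fold_minus and toy_result_usable
are hypothetical names, and the "decline" flag merely models a folder that may
return a null result the way const_binop may return NULL_TREE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folded constant with an overflow flag.  */
struct toy_cst
{
  double value;
  bool overflow;
};

/* Toy folder: returns NULL when it declines to fold, otherwise fills in
   STORAGE and returns it.  */
struct toy_cst *
toy_fold_minus (double a, double b, bool decline, struct toy_cst *storage)
{
  assert (storage != NULL);
  if (decline)
    return NULL;
  storage->value = a - b;
  storage->overflow = false;
  return storage;
}

/* The guarded check: test for a null result before reading the overflow
   flag, as in "tem && !TREE_OVERFLOW (tem)".  */
bool
toy_result_usable (const struct toy_cst *tem)
{
  return tem != NULL && !tem->overflow;
}
```

Without the null test, the overflow flag would be read through a null
pointer whenever the folder declines.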


Re: r226516 - in /trunk/gcc: ChangeLog testsuite/Ch...

2015-08-05 Thread Andreas Schwab
FAIL: gcc.dg/pr66314.c (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20150805/gcc/testsuite/gcc.dg/pr66314.c:1:0: warning: 
-fsanitize=address and -fsanitize=kernel-address are not supported for this 
target

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] Fix PR67109

2015-08-05 Thread Richard Biener

The following fixes invalid group detection in the vectorizer where
the size doesn't fit an unsigned int.
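
As a sketch of the check, assuming the group size is computed in a wider
integer type and then stored in an unsigned field (toy_set_group_size is a
hypothetical helper, not a vectorizer function):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store GROUPSIZE in *OUT only if it fits in an unsigned int.  Returning
   false corresponds to rejecting the group as too large; *OUT is then
   left untouched.  */
bool
toy_set_group_size (uint64_t groupsize, unsigned *out)
{
  assert (out != NULL);
  if (groupsize > UINT_MAX)
    return false;
  *out = (unsigned) groupsize;
  return true;
}
```

Without the range check, the cast would silently truncate the size and the
group would be recorded with a wrong element count.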

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-08-05  Richard Biener  

PR tree-optimization/67109
* tree-vect-data-refs.c (vect_analyze_group_access_1): Check
against too big groups.  Print whether this is a load or store
group.  Rename from ...
(vect_analyze_group_access): ... this which is now a wrapper
dissolving an invalid group.
(vect_analyze_data_ref_accesses): Print whether this is a load
or store group.

* gcc.dg/torture/pr67109.c: New testcase.
* gcc.dg/vect/vect-119.c: Adjust.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 226551)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2012,10 +2012,11 @@ vect_analyze_data_refs_alignment (loop_v
 /* Analyze groups of accesses: check that DR belongs to a group of
accesses of legal size, step, etc.  Detect gaps, single element
interleaving, and other special cases. Set grouped access info.
-   Collect groups of strided stores for further use in SLP analysis.  */
+   Collect groups of strided stores for further use in SLP analysis.
+   Worker for vect_analyze_group_access.  */
 
 static bool
-vect_analyze_group_access (struct data_reference *dr)
+vect_analyze_group_access_1 (struct data_reference *dr)
 {
   tree step = DR_STEP (dr);
   tree scalar_type = TREE_TYPE (DR_REF (dr));
@@ -2182,6 +2183,14 @@ vect_analyze_group_access (struct data_r
   if (groupsize == 0)
 groupsize = count + gaps;
 
+  if (groupsize > UINT_MAX)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"group is too large\n");
+ return false;
+   }
+
   /* Check that the size of the interleaving is equal to count for stores,
  i.e., that there are no gaps.  */
   if (groupsize != count
@@ -2203,13 +2212,18 @@ vect_analyze_group_access (struct data_r
   if (dump_enabled_p ())
{
  dump_printf_loc (MSG_NOTE, vect_location,
-  "Detected interleaving of size %d starting with ",
-  (int)groupsize);
+  "Detected interleaving ");
+ if (DR_IS_READ (dr))
+   dump_printf (MSG_NOTE, "load ");
+ else
+   dump_printf (MSG_NOTE, "store ");
+ dump_printf (MSG_NOTE, "of size %u starting with ",
+  (unsigned)groupsize);
  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
  if (GROUP_GAP (vinfo_for_stmt (stmt)) != 0)
dump_printf_loc (MSG_NOTE, vect_location,
-"There is a gap of %d elements after the group\n",
-(int)GROUP_GAP (vinfo_for_stmt (stmt)));
+"There is a gap of %u elements after the group\n",
+GROUP_GAP (vinfo_for_stmt (stmt)));
}
 
   /* SLP: create an SLP data structure for every interleaving group of
@@ -2249,6 +2263,30 @@ vect_analyze_group_access (struct data_r
   return true;
 }
 
+/* Analyze groups of accesses: check that DR belongs to a group of
+   accesses of legal size, step, etc.  Detect gaps, single element
+   interleaving, and other special cases. Set grouped access info.
+   Collect groups of strided stores for further use in SLP analysis.  */
+
+static bool
+vect_analyze_group_access (struct data_reference *dr)
+{
+  if (!vect_analyze_group_access_1 (dr))
+{
+  /* Dissolve the group if present.  */
+  gimple next, stmt = GROUP_FIRST_ELEMENT (vinfo_for_stmt (DR_STMT (dr)));
+  while (stmt)
+   {
+ stmt_vec_info vinfo = vinfo_for_stmt (stmt);
+ next = GROUP_NEXT_ELEMENT (vinfo);
+ GROUP_FIRST_ELEMENT (vinfo) = NULL;
+ GROUP_NEXT_ELEMENT (vinfo) = NULL;
+ stmt = next;
+   }
+  return false;
+}
+  return true;
+}
 
 /* Analyze the access pattern of the data-reference DR.
In case of non-consecutive accesses call vect_analyze_group_access() to
@@ -2598,6 +2636,10 @@ vect_analyze_data_ref_accesses (loop_vec
{
  dump_printf_loc (MSG_NOTE, vect_location,
   "Detected interleaving ");
+ if (DR_IS_READ (dra))
+   dump_printf (MSG_NOTE, "load ");
+ else
+   dump_printf (MSG_NOTE, "store ");
  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dra));
  dump_printf (MSG_NOTE,  " and ");
  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (drb));
Index: gcc/testsuite/gcc.dg/torture/pr67109.c
===
--- gcc/testsuite/gcc.dg/torture/pr67109.c  (revision 0)
+++ gcc/testsuite/gcc.dg/tort

Re: [fortran,patch] Extend IEEE support to all real kinds

2015-08-05 Thread Andreas Schwab
FAIL: gfortran.dg/ieee/large_1.f90   -O0  (test for excess errors)
Excess errors:
large_1.f90:(.text+0x1792): undefined reference to `logbq'

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH, i386] Disable AVX-512VL insns for scalar mode operands on -march=knl.

2015-08-05 Thread Kirill Yukhin
Hello,

Is it ok to backport the patch to gcc-5-branch?

--
Thanks, K

> On 04 Aug 15:31, Kirill Yukhin wrote:
> 
> commit 1055739cb51648794a01afd85f59efadd14378ed
> Author: Kirill Yukhin 
> Date:   Mon Aug 3 15:21:06 2015 +0300
> 
> Fix vec_concatv2df and vec_dupv2df to block wrongly enabled AVX-512VL 
> insns.
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 5c5c1fc..9ffe9aa 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -784,7 +784,8 @@
>  (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
>   sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
>   avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
> - fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq"
> + fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq,
> + avx512vl,noavx512vl"
>(const_string "base"))
>  
>  (define_attr "enabled" ""
> @@ -819,6 +820,8 @@
>(eq_attr "isa" "noavx512bw") (symbol_ref "!TARGET_AVX512BW")
>(eq_attr "isa" "avx512dq") (symbol_ref "TARGET_AVX512DQ")
>(eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
> +  (eq_attr "isa" "avx512vl") (symbol_ref "TARGET_AVX512VL")
> +  (eq_attr "isa" "noavx512vl") (symbol_ref "!TARGET_AVX512VL")
>   ]
>   (const_int 1)))
>  
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 0970f0e..ca1ec2e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -8638,44 +8638,50 @@
> (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
>  
>  (define_insn "vec_dupv2df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=x,v")
> +  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
>   (vec_duplicate:V2DF
> -   (match_operand:DF 1 "nonimmediate_operand" " 0,vm")))]
> +   (match_operand:DF 1 "nonimmediate_operand" " 0,xm,vm")))]
>"TARGET_SSE2 && "
>"@
> unpcklpd\t%0, %0
> -   %vmovddup\t{%1, %0|%0, %1}"
> -  [(set_attr "isa" "noavx,sse3")
> +   %vmovddup\t{%1, %0|%0, %1}
> +   vmovddup\t{%1, %0|%0, %1}"
> +  [(set_attr "isa" "noavx,sse3,avx512vl")
> (set_attr "type" "sselog1")
> -   (set_attr "prefix" "orig,maybe_vex")
> -   (set_attr "mode" "V2DF,DF")])
> +   (set_attr "prefix" "orig,maybe_vex,evex")
> +   (set_attr "mode" "V2DF,DF,DF")])
>  
>  (define_insn "*vec_concatv2df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=x,v,v,x,x,v,x,x")
> +  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,x,v,x,x,v,x,x")
>   (vec_concat:V2DF
> -   (match_operand:DF 1 "nonimmediate_operand" " 0,v,m,0,x,m,0,0")
> -   (match_operand:DF 2 "vector_move_operand"  " x,v,1,m,m,C,x,m")))]
> +   (match_operand:DF 1 "nonimmediate_operand" " 0,x,v,m,m,0,x,m,0,0")
> +   (match_operand:DF 2 "vector_move_operand"  " x,x,v,1,1,m,m,C,x,m")))]
>"TARGET_SSE
> && (!(MEM_P (operands[1]) && MEM_P (operands[2]))
> || (TARGET_SSE3 && rtx_equal_p (operands[1], operands[2])))"
>"@
> unpcklpd\t{%2, %0|%0, %2}
> vunpcklpd\t{%2, %1, %0|%0, %1, %2}
> +   vunpcklpd\t{%2, %1, %0|%0, %1, %2}
> %vmovddup\t{%1, %0|%0, %1}
> +   vmovddup\t{%1, %0|%0, %1}
> movhpd\t{%2, %0|%0, %2}
> vmovhpd\t{%2, %1, %0|%0, %1, %2}
> %vmovsd\t{%1, %0|%0, %1}
> movlhps\t{%2, %0|%0, %2}
> movhps\t{%2, %0|%0, %2}"
> -  [(set_attr "isa" "sse2_noavx,avx,sse3,sse2_noavx,avx,sse2,noavx,noavx")
> +  [(set_attr "isa" 
> "sse2_noavx,avx,avx512vl,sse3,avx512vl,sse2_noavx,avx,sse2,noavx,noavx")
> (set (attr "type")
>   (if_then_else
> (eq_attr "alternative" "0,1,2")
> (const_string "sselog")
> (const_string "ssemov")))
> -   (set_attr "prefix_data16" "*,*,*,1,*,*,*,*")
> -   (set_attr "prefix" "orig,vex,maybe_vex,orig,vex,maybe_vex,orig,orig")
> -   (set_attr "mode" "V2DF,V2DF,DF,V1DF,V1DF,DF,V4SF,V2SF")])
> +   (set (attr "prefix_data16")
> + (if_then_else (eq_attr "alternative" "5")
> +   (const_string "1")
> +   (const_string "*")))
> +   (set_attr "prefix" 
> "orig,vex,evex,maybe_vex,evex,orig,vex,maybe_vex,orig,orig")
> +   (set_attr "mode" "V2DF,V2DF,V2DF, DF, DF, V1DF,V1DF,DF,V4SF,V2SF")])
>  
>  ;
>  ;;


[PATCH, MIPS] Enable load/store bonding for I6400

2015-08-05 Thread Robert Suchanek
Hi,

Following up 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01730.html

The patch below enables the load-load/store-store bonding for MIPS32/MIPS64 R6.

Ok to apply?

Regards,
Robert

gcc/
* config/mips/mips.h (ENABLE_LD_ST_PAIRS): Enable load/store pairs for
I6400.
---
 gcc/config/mips/mips.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index d17a833..6e262d6 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3177,5 +3177,5 @@ extern GTY(()) struct target_globals *micromips_globals;
performance can be degraded for those targets.  Hence, do not bond for
micromips or fix_24k.  */
 #define ENABLE_LD_ST_PAIRS \
-  (TARGET_LOAD_STORE_PAIRS && TUNE_P5600 \
+  (TARGET_LOAD_STORE_PAIRS && (TUNE_P5600 || TUNE_I6400) \
&& !TARGET_MICROMIPS && !TARGET_FIX_24K)
-- 
2.4.5


[PATCH, MIPS] Remove W32 and W64 pseudo-processors

2015-08-05 Thread Robert Suchanek
Hi,

Since the I6400 scheduler is committed, W32/W64 pseudo-processors
are not needed anymore and can be removed.

Ok to commit?

Regards,
Robert

gcc/
* config/mips/mips.c (mips_rtx_cost_data): Remove costs for W32 and W64
pseudo-processors.
* config/mips/mips.md (processor): Remove w32 and w64.
---
 gcc/config/mips/mips.c  | 26 --
 gcc/config/mips/mips.md |  2 --
 2 files changed, 28 deletions(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bf0f84f..b30d7b9 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -1255,32 +1255,6 @@ static const struct mips_rtx_cost_data
2,/* branch_cost */
4 /* memory_latency */
   },
-  { /* W32 */
-COSTS_N_INSNS (4),/* fp_add */
-COSTS_N_INSNS (4),/* fp_mult_sf */
-COSTS_N_INSNS (5),/* fp_mult_df */
-COSTS_N_INSNS (17),   /* fp_div_sf */
-COSTS_N_INSNS (32),   /* fp_div_df */
-COSTS_N_INSNS (5),/* int_mult_si */
-COSTS_N_INSNS (5),/* int_mult_di */
-COSTS_N_INSNS (41),   /* int_div_si */
-COSTS_N_INSNS (41),   /* int_div_di */
-1,   /* branch_cost */
-4/* memory_latency */
-  },
-  { /* W64 */
-COSTS_N_INSNS (4),/* fp_add */
-COSTS_N_INSNS (4),/* fp_mult_sf */
-COSTS_N_INSNS (5),/* fp_mult_df */
-COSTS_N_INSNS (17),   /* fp_div_sf */
-COSTS_N_INSNS (32),   /* fp_div_df */
-COSTS_N_INSNS (5),/* int_mult_si */
-COSTS_N_INSNS (5),/* int_mult_di */
-COSTS_N_INSNS (41),   /* int_div_si */
-COSTS_N_INSNS (41),   /* int_div_di */
-1,   /* branch_cost */
-4/* memory_latency */
-  },
   { /* M5100 */
 COSTS_N_INSNS (4),/* fp_add */
 COSTS_N_INSNS (4),/* fp_mult_sf */
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 2954a12..a0079d5 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -67,8 +67,6 @@ (define_enum "processor" [
   xlr
   xlp
   p5600
-  w32
-  w64
   m5100
   i6400
 ])
-- 
2.4.5


Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2015-08-05 Thread Richard Biener
On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin  wrote:
> On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
>> We had established the use of a boolean flag have_offload in gcc::context
>> to indicate whether during compilation, we've actually seen any code to
>> be offloaded (see cited below the relevant parts of the patch by Ilya et
>> al.).  This means that currently, the whole offload machinery will not be
>> run unless we actually have any offloaded data.  This means that the
>> configured mkoffload programs (-foffload=[...], defaulting to
>> configure-time --enable-offload-targets=[...]) will not be invoked unless
>> we actually have any offloaded data.  This means that we will not
>> actually generate constructor code to call libgomp's
>> GOMP_offload_register unless we actually have any offloaded data.
>
> Yes, that was the plan.
>
>> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
>> targets have been specified during compilation.
>>
>> But: at runtime, I'd like to know which -foffload=[...] targets have been
>> specified during compilation, so that we can, for example, reliably
>> resort to host fallback execution for -foffload=disable instead of
>> getting error message that an offloaded function is missing.
>
> It's easy to fix:
>
> diff --git a/libgomp/target.c b/libgomp/target.c
> index a5fb164..f81d570 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
> *devicep,
>k.host_end = k.host_start + 1;
>splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
>gomp_mutex_unlock (&devicep->lock);
> -  if (tgt_fn == NULL)
> -   gomp_fatal ("Target function wasn't mapped");
> -
>return (void *) tgt_fn->tgt_offset;
>  }
>  }
> @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
>  return gomp_target_fallback (fn, hostaddrs);
>
>void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  if (fn_addr == NULL)
> +return gomp_target_fallback (fn, hostaddrs);
>
>struct target_mem_desc *tgt_vars
>  = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t 
> mapnum,
>  }
>
>void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  if (fn_addr == NULL)
> +return gomp_target_fallback (fn, hostaddrs);
>
>struct target_mem_desc *tgt_vars
>  = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
>
>
>> other hand, for example, for -foffload=nvptx-none, even if user program
>> code doesn't contain any offloaded data (and thus the offload machinery
>> has not been run), the user program might still contain any executable
>> directives or OpenACC runtime library calls, so we'd still like to use
>> the libgomp nvptx plugin.  However, we currently cannot detect this
>> situation.
>>
>> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
>> configuration in the executable (as a string, for example) for libgomp to
>> look that up, or b) make it a requirement that (if configured via
>> -foffload=[...]), the offload machinery is run even if there is not
>> actually any data to be offloaded, so we then reliably get the respective
>> constructor call to libgomp's GOMP_offload_register.  I once began to
>> implement a), but this got a bit ugly, so I then looked into b) instead.
>> Compared to the status quo, always running the whole offloading machinery
>> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
>> are active, certainly does introduce some overhead when there isn't
>> actually any code to be offloaded, so I'm not sure whether that is
>> acceptable?
>
> I vote for (a).

What happens for conflicting -foffload=[...] options in different TUs?

Richard.

>   -- Ilya


Re: [committed, gomp4] Fix release_dangling_ssa_names

2015-08-05 Thread Tom de Vries

On 05/08/15 09:29, Richard Biener wrote:

>> This patch fixes that by making sure we reset the def stmt to NULL. This means
>> we can simplify release_dangling_ssa_names to just test for NULL def stmts.
>
> Not sure if I understand the problem correctly but why are you not simply
> releasing the SSA name when you remove its definition?


In move_sese_region_to_fn we move a region of blocks from one function 
to another, bit by bit.


When we encounter an ssa_name as def or use in the region, we:
- generate a new ssa_name,
- set the def stmt of the old name as def stmt of the new name, and
- add a mapping from the old to the new name.
The next time we encounter the same ssa_name in another statement, we 
find it in the map.
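
The steps above can be sketched with a toy map. All names here
(struct toy_map, toy_replace_name, TOY_MAX_NAMES) are hypothetical and the
linear lookup is only for illustration; the real code uses GCC's own mapping.
The sketch also includes the reset of the old name's def stmt to NULL that
the patch adds, and leaves out the default def special case.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NAMES 16

/* Hypothetical stand-ins for a statement and an SSA name.  */
struct toy_stmt { int id; };
struct toy_name { struct toy_stmt *def_stmt; };

/* Map from names in the original function to names in the new one.  */
struct toy_map
{
  struct toy_name *old_names[TOY_MAX_NAMES];
  struct toy_name *new_names[TOY_MAX_NAMES];
  struct toy_name storage[TOY_MAX_NAMES];
  size_t count;
};

/* Return the replacement for OLD, creating it on first encounter.  */
struct toy_name *
toy_replace_name (struct toy_map *map, struct toy_name *old)
{
  /* Seen before: reuse the mapping.  */
  for (size_t i = 0; i < map->count; ++i)
    if (map->old_names[i] == old)
      return map->new_names[i];

  assert (map->count < TOY_MAX_NAMES);
  struct toy_name *fresh = &map->storage[map->count];
  fresh->def_stmt = old->def_stmt;  /* New name takes over the def stmt.  */
  old->def_stmt = NULL;             /* Old name no longer points at it.  */
  map->old_names[map->count] = old;
  map->new_names[map->count] = fresh;
  map->count++;
  return fresh;
}
```

The old name stays allocated here, so statements that still mention it keep
a valid operand until they have been rewritten through the map.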


If we release the old ssa name, we effectively create statements with
operands in the free-list. The first point where that causes breakage is
in walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign
to be defined, which is not the case if it's in the free-list:

...
    case GIMPLE_ASSIGN:
      /* Walk the RHS operands.  If the LHS is of a non-renamable type or
         is a register variable, we may use a COMPONENT_REF on the RHS.  */
      if (wi)
        {
          tree lhs = gimple_assign_lhs (stmt);
          wi->val_only
            = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
              || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
        }
...

Thanks,
- Tom


Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-08-05 Thread Richard Biener
On Tue, Aug 4, 2015 at 6:49 PM, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, August 04, 2015 4:07 PM
>> To: Kumar, Venkataramanan
>> Cc: Jeff Law; Jakub Jelinek; gcc-patches@gcc.gnu.org
>> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with
>> power 2 integer constant.
>>
>> On Tue, Aug 4, 2015 at 10:52 AM, Kumar, Venkataramanan
>>  wrote:
>> > Hi Jeff,
>> >
>> >> -Original Message-
>> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> >> ow...@gcc.gnu.org] On Behalf Of Jeff Law
>> >> Sent: Monday, August 03, 2015 11:42 PM
>> >> To: Kumar, Venkataramanan; Jakub Jelinek
>> >> Cc: Richard Beiner (richard.guent...@gmail.com);
>> >> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult
>> >> expr with power 2 integer constant.
>> >>
>> >> On 08/02/2015 05:03 AM, Kumar, Venkataramanan wrote:
>> >> > Hi Jakub,
>> >> >
>> >> > Thank you for reviewing the patch.
>> >> >
>> >> > I have incorporated your comments in the attached patch.
>> >> Note Jakub is on PTO for the next 3 weeks.
>> >
>> >  Thank you for this information.
>> >
>> >>
>> >>
>> >> >
>> >> >
>> >> >
>> >> > vectorize_mults_via_shift.diff.txt
>> >> >
>> >> >
>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
>> >> > b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
>> >> Jakub would probably like more testcases :-)
>> >>
>> >> The most obvious thing to test would be other shift factors.
>> >>
>> >> A negative test to verify we don't try to turn a multiply by
>> >> non-constant or multiply by a constant that is not a power of 2 into 
>> >> shifts.
>> >
>> > I have added negative test in the attached patch.
>> >
>> >
>> >>
>> >> [ Would it make sense, for example, to turn a multiply by 3 into a
>> >> shift-add sequence?  As Jakub said, choose_mult_variant can be your
>> >> friend. ]
>> >
>> > Yes I will do that in a follow up patch.
>> >
>> > The new change log becomes
>> >
>> > gcc/ChangeLog
>> > 2015-08-04  Venkataramanan Kumar
>> 
>> >  * tree-vect-patterns.c (vect_recog_mult_pattern): New function for
>> vectorizing
>> > multiplication patterns.
>> >  * tree-vectorizer.h: Adjust the number of patterns.
>> >
>> > gcc/testsuite/ChangeLog
>> > 2015-08-04  Venkataramanan Kumar
>> 
>> >  * gcc.dg/vect/vect-mult-pattern-1.c: New
>> > * gcc.dg/vect/vect-mult-pattern-2.c: New
>> >
>> > Bootstrapped and reg tested on aarch64-unknown-linux-gnu.
>> >
>> > Ok for trunk ?
>>
>> +  if (TREE_CODE (oprnd0) != SSA_NAME
>> +  || TREE_CODE (oprnd1) != INTEGER_CST
>> +  || TREE_CODE (itype) != INTEGER_TYPE
>>
>> INTEGRAL_TYPE_P (itype)
>>
>> +  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);  if
>> + (!optab
>> +  || optab_handler (optab, TYPE_MODE (vectype)) ==
>> CODE_FOR_nothing)
>> +   return NULL;
>> +
>>
>> indent of the return stmt looks wrong
>>
>> +  /* Handle constant operands that are postive or negative powers of 2.
>> + */  if ( wi::exact_log2 (oprnd1) != -1  ||
>> +   wi::exact_log2 (wi::neg (oprnd1)) != -1)
>>
>> no space after (, || goes to the next line.
>>
>> +{
>> +  tree shift;
>> +
>> +  if (wi::exact_log2 (oprnd1) != -1)
>>
>> please cache wi::exact_log2
>>
>> in fact the first if () looks redundant if you simply put an else return NULL
>> after a else if (wi::exact_log2 (wi::neg (oprnd1)) != -1)
>>
>> Note that the issue with INT_MIN is that wi::neg (INT_MIN) is INT_MIN
>> again, but it seems that wi::exact_log2 returns -1 in that case so you are 
>> fine
>> (and in fact not handling this case).
>>
>
> I have updated your review comments in the attached patch.
>
> For the INT_MIN case, I am getting vectorized output with the patch.  I
> believe x86_64 also vectorizes but does not negate the results.
>
> #include 
> unsigned long int  __attribute__ ((aligned (64)))arr[100];
>
> int i;
> #if 1
> void test_vector_shifts()
> {
> for(i=0; i<=99;i++)
> arr[i]=arr[i] * INT_MIN;
> }
> #endif
>
> void test_vectorshift_via_mul()
> {
> for(i=0; i<=99;i++)
> arr[i]=arr[i]*(-INT_MIN);
>
> }
>
> Before
> -
> ldr x1, [x0]
> neg x1, x1, lsl 31
> str x1, [x0], 8
> cmp x0, x2
>
> After
> ---
> ldr q0, [x0]
> shl v0.2d, v0.2d, 31
> neg v0.2d, v0.2d
> str q0, [x0], 16
> cmp x1, x0
>
> is this fine ?

The interesting case is of course LONG_MIN if you are vectorizing a
long multiplication.
Also check with arr[] being 'long', not 'unsigned long'.  Note that
with 'long' doing
arr[i]*(-LONG_MIN) invokes undefined behavior.
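
To make the wrap-around point concrete, here is a minimal C sketch.  The
helper names are made up for illustration and are not part of the patch.
It assumes two's complement and does all arithmetic in unsigned long, so
that wrap-around is well defined instead of undefined:

```c
#include <assert.h>
#include <limits.h>

/* Multiply by +2^k via a left shift (unsigned, so wrap-around is defined). */
static unsigned long mul_pow2(unsigned long x, unsigned k)
{
    assert(k < sizeof(unsigned long) * CHAR_BIT);
    return x << k;
}

/* Multiply by -(2^k): shift first, then negate, as in a shl/neg pair. */
static unsigned long mul_neg_pow2(unsigned long x, unsigned k)
{
    return 0ul - mul_pow2(x, k);
}

/* Negate the bit pattern of LONG_MIN using unsigned arithmetic.
   Assumes unsigned long has the same width as long (no padding bits). */
static unsigned long negated_long_min_bits(void)
{
    unsigned long min_bits = (unsigned long)LONG_MIN;
    return 0ul - min_bits;
}
```

Under those assumptions negated_long_min_bits () returns the same bit
pattern it started from, which is why a "negative power of two" test has
to treat the most negative value with care.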

Richard.

>> Thanks,
>> Richard.
>>
>> >>
>> >>
>> >>
>> >> > @@ -2147,6 +2152,140 @@ vect_recog_vector_vector_shift_pattern
>> >> (vec *stmts,
>> >> > return pattern_stmt;
>> >> >   }
>> >> >
>> >> > +/* Dete

Re: libgo patch committed: Kill sleep processes in testsuite

2015-08-05 Thread Andreas Schwab
PASS
kill: not enough arguments
FAIL: net
Makefile:4696: recipe for target 'net/check' failed
make[4]: *** [net/check] Error 1

$ cat net/check-testlog
PASS
kill: not enough arguments
FAIL: net
../../../libgo/testsuite/gotest: line 514: gotest-timeout: No such file or 
directory

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests

2015-08-05 Thread Andreas Schwab
Kyrill Tkachov  writes:

> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_1.c 
> b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
> new file mode 100644
> index 000..72d0838
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do assemble } */
> +/* { dg-options "-O2 -mcpu=thunderx -save-temps" } */
> +
> +__attribute__ ((target ("cpu=cortex-a72.cortex-a53")))
> +int
> +foo (int a)
> +{
> +  return a + 1;
> +}
> +
> +/* { dg-final { scan-assembler "//.tune cortex-a72.cortex-a53" } } */
> +/* { dg-final { scan-assembler-not "thunderx" } } */

FAIL: gcc.target/aarch64/target_attr_1.c (test for excess errors)
Excess errors:
Assembler messages:
Error: unknown cpu `thunderx'
Error: unrecognized option -mcpu=thunderx

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [RFC] Elimination of zext/sext - type promotion pass

2015-08-05 Thread Richard Biener
On Wed, Aug 5, 2015 at 2:12 AM, kugan  wrote:
>
>> You indeed need to use CONVERT_EXPR here, maybe you can elaborate
>> on the optimization issues.
>>
>>> 2. for inline asm (a reduced test case that might not make much as a
>>> stand alone test-case, but I ran into similar cases with valid
>>> programmes)
>>>
>>> ;; Function fn1 (fn1, funcdef_no=0, decl_uid=4220, cgraph_uid=0,
>>> symbol_order=0)
>>>
>>> fn1 (short int p1)
>>> {
>>>:
>>>__asm__("" : "=r" p1_2 : "0" p1_1(D));
>>>return;
>>>
>>> }
>>>
>>>
>>> I am generating something like the following which ICEs. What is the
>>> expected out?
>>>
>>> ;; Function fn1 (fn1, funcdef_no=0, decl_uid=4220, cgraph_uid=0,
>>> symbol_order=0)
>>>
>>> fn1 (short int p1)
>>> {
>>>int _1;
>>>int _2;
>>>short int _5;
>>>
>>>:
>>>_1 = (int) p1_4(D);
>>>_5 = (short int) _1;
>>>__asm__("" : "=r" p1_6 : "0" _5);
>>>_2 = (int) p1_6;
>>>return;
>>>
>>> }
>>
>>
>> Parameters are indeed "interesting" to handle ;)  As we now see on ARM
>> the incoming parameter (the default def) and later assignments to it
>> can require different promotions (well, different extensions for ARM).
>>
>> The only sensible way to deal with promoting parameters is to
>> promote them by changing the function signature.  Thus reflect the
>> targets ABI for parameters in the GIMPLE representation (which
>> includes TYPE_ARG_TYPES and DECL_ARGUMENTS).
>> IMHO we should do this during gimplification of parameters / call
>> arguments already.
>>
>> So for your example you'd end up with
>>
>> fn1 (int p1)
>> {
>>__asm__("" : "=r" p1_6 : "0" p1_4(D));
>>return;
>> }
>>
>> that is, promotions also apply to asm inputs/outputs (no?)
>
>
>
> Thanks for the review and answers. For the time being, I am handling
> gimple_asm as one that has to be handled in its original type. I will look
> into improving it after getting the basic framework right.

Yeah, that's always a possibility.  I also see from the dumps that we probably
want to promote function arguments and results on GIMPLE as well.  Possibly
very early during gimplification or as an early IPA pass (as it needs to adjust
the IL for calls as well, exposing ABI required promotions / extensions).

> As it is, attached patch bootstraps on x86_64-linux-gnu, arm-linux-gnu and
> aarch64-linux-gnu. There are few regressions to look into (Please see
> below).
>
> There are cases where it is working well. There are cases where it can be
> improved. I am attaching a couple of test cases (and their results). I am
> seeing some BIT_AND_EXPRs inserted by promotion that are not being optimized
> away when they are redundant. This is especially the case when I invalidate
> the VRP range info from VRP1 during the type promotion. I am looking into it.
>
> Please note that attached patch still needs to address:
> * Adding gimple_debug stmts.
> * Address review comment for expr.c handling SEXT_EXPR.
> * Address regression failures
>
> Based on the feedback, I will address the above and split the patch into
> logical patch set for easy detailed review.
>
> Here are the outputs for the testcases.
>
> --- c5.c.142t.veclower21 2015-08-05 08:50:11.367135339 +1000
> +++ c5.c.143t.promotion 2015-08-05 08:50:11.367135339 +1000
> @@ -1,34 +1,45 @@
>
>  ;; Function unPack (unPack, funcdef_no=0, decl_uid=4145, cgraph_uid=0,
> symbol_order=0)
>
>  unPack (unsigned char c)
>  {
> -  short int _1;
> -  unsigned short _4;
> -  unsigned short _5;
> -  short int _6;
> -  short int _7;
> +  int _1;
> +  unsigned int _2;
> +  unsigned int _3;
> +  unsigned int _4;
> +  unsigned int _5;
> +  int _6;
> +  int _7;
> +  unsigned int _9;
> +  int _11;
> +  int _12;
> +  short int _13;
>
>:
> -  c_3 = c_2(D) & 15;
> -  if (c_3 > 7)
> +  _2 = (unsigned int) c_10(D);
> +  _3 = _2 & 15;
> +  _9 = _3 & 255;
> +  if (_9 > 7)
>  goto ;
>else
>  goto ;
>
>:
> -  _4 = (unsigned short) c_3;
> -  _5 = _4 + 65531;
> -  _6 = (short int) _5;
> +  _4 = _3 & 65535;
> +  _5 = _4 + 4294967291;
> +  _11 = (int) _5;
> +  _6 = (_11) sext from bit (16);

Ok, so in GIMPLE we still have sign-changing conversions.  Another
thing we might want to lower at some stage ... ;)

>goto ;
>
>:
> -  _7 = (short int) c_3;
> +  _12 = (int) _3;
> +  _7 = (_12) sext from bit (16);
>
>:
># _1 = PHI <_6(3), _7(4)>
> -  return _1;
> +  _13 = (short int) _1;
> +  return _13;
>
>  }

Overall this looks like what I'd have expected - also pointing out the
missing argument/return value promotion.
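
For readers following the dump, here is a small C model of the two
versions.  It is only a sketch: the function names are invented, the
narrow version is reconstructed from the GIMPLE above rather than taken
from the real c5.c, and sext_from_bit is a hand-written stand-in for what
"sext from bit (16)" is assumed to mean (sign-extend the low 16 bits).
The final unsigned-to-signed cast relies on the usual modulo conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of X into a full 32-bit value. */
static int32_t sext_from_bit(int32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = (bits == 32) ? 0xffffffffu : ((1u << bits) - 1u);
    uint32_t low = (uint32_t)x & mask;
    uint32_t sign = 1u << (bits - 1);
    /* (low ^ sign) - sign copies the sign bit into the upper bits. */
    return (int32_t)((low ^ sign) - sign);
}

/* Narrow version, mirroring the statements before promotion. */
static short unpack_narrow(unsigned char c)
{
    c = c & 15;
    if (c > 7) {
        unsigned short t = (unsigned short)(c + 65531u);
        return (short)t;
    }
    return (short)c;
}

/* Promoted version, mirroring the statements after promotion. */
static short unpack_promoted(unsigned char c)
{
    uint32_t v3 = (uint32_t)c & 15u;
    uint32_t v9 = v3 & 255u;
    int32_t v1;
    if (v9 > 7u) {
        uint32_t v5 = (v3 & 65535u) + 4294967291u;
        v1 = sext_from_bit((int32_t)v5, 16);
    } else {
        v1 = sext_from_bit((int32_t)v3, 16);
    }
    return (short)v1;
}
```

In this model both versions agree for every input byte; the extra masks
and the sign extension in the promoted form are exactly the statements
one would hope later passes can remove.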

>
> --- c5.org.s 2015-08-05 08:51:44.619133892 +1000
> +++ c5.new.s 2015-08-05 08:51:29.643134124 +1000
> @@ -16,16 +16,14 @@
> .syntax divided
> .arm
> .type   unPack, %function
>  unPack:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> and r0, r0, #15
> cmp r0, #7
> subhi   r0, r0, #5
> -   uxth    r0, r0
> -

Re: [PR64164] drop copyrename, integrate into expand

2015-08-05 Thread Richard Biener
On Wed, Aug 5, 2015 at 2:38 AM, Alexandre Oliva  wrote:
> On Aug  4, 2015, Richard Biener  wrote:
>
>> Though I wonder on whether splitting the patch into a first one with 
>> disabling
>> coalescing of parms (their default defs(?)) and a followup implementing the
>> support for that.
>
> We can't disable coalescing of parms altogether.  With -O0, we must
> coalesce all SSA_NAMEs referencing each parm to a single partition.
> With optimization, we could coalesce parms in general, just not these
> special cases in which the parm is to live in a caller-supplied memory
> block.
>
> Now, it's not coalescing parms proper that brought so much risk to the
> patch, it is assigning rtl to SSA partitions, and having assign_parms*
> use that assignment.  Considering that sometimes a single param
> necessarily ends up in more than one partition, requiring two
> assignments, and that assign_parms* can't deal with that, I don't see
> how to easily disable the cfgexpand logic when it comes to parms, so as
> to be able to leave assign_parms alone.
>
> How about, if further problems arise that justify reverting the patch
> one more time, I'll look into splitting the patch as you suggested, but
> otherwise, I'll save myself the trouble, ok?

Sure.

>> So - is my observation correct that this is only about coalescing of the
>> default defs of parameters, not other SSA names based on parameter decls?
>
> It's more like the opposite, i.e., we *refrain* from coalescing other
> SSA_NAMEs related with byref params, so that we can easily tell when a
> partition references a byref param and whether that partition holds its
> default def.  We could have coalesced any other names that ended up in
> different partitions, and even the partition holding the default def, if
> we had other means to identify partitions with default defs of byref
> params.  For example, we could create a bitmap of byref param default
> def versions, and then, after partitioning, map those to the partitions
> they were assigned to.  In fact, I might do that as a followup.
>
>> Do you think this splitting is feasible and my concern about the
>> code-gen issues warranted?
>
> It is feasible but not exactly easy.
>
> As for codegen, I hope to have covered all cases now, but should we find
> out I haven't, I'll try the split and see what that gets us.  Did you
> have any special cases in mind that it looks like I may have missed?

It was just a hunch when you talked about BLKmode and params in memory.
As coalescing is about SSA name (thus register) coalescing I was thinking
that if you coalesce a register with incoming memory you'll end up with
more memory accesses?  But maybe I'm completely off here.

I also thought of the RTL expansion thing we do with at first copying
the hardreg incoming args to pseudos and how that interacts with
coalescing.

But I guess you have eyed code-gen changes a bit anyway.

Thanks,
Richard.

> Thanks,
>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [committed, gomp4] Fix release_dangling_ssa_names

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Tom de Vries wrote:

> On 05/08/15 09:29, Richard Biener wrote:
> > > This patch fixes that by making sure we reset the def stmt to NULL. This
> > > means
> > > >we can simplify release_dangling_ssa_names to just test for NULL def
> > > stmts.
> > Not sure if I understand the problem correctly but why are you not simply
> > releasing the SSA name when you remove its definition?
> 
> In move_sese_region_to_fn we move a region of blocks from one function to
> another, bit by bit.
> 
> When we encounter an ssa_name as def or use in the region, we:
> - generate a new ssa_name,
> - set the def stmt of the old name as def stmt of the new name, and
> - add a mapping from the old to the new name.
> The next time we encounter the same ssa_name in another statement, we find it
> in the map.
> 
> If we release the old ssa name, we effectively create statements with operands
> in the free-list. The first point where that causes breakage is in
> walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
> defined, which is not the case if it's in the free-list:
> ...
> case GIMPLE_ASSIGN:
>   /* Walk the RHS operands.  If the LHS is of a non-renamable type or
>  is a register variable, we may use a COMPONENT_REF on the RHS.*/
>   if (wi)
> {
>   tree lhs = gimple_assign_lhs (stmt);
>   wi->val_only
> = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
>|| gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
> }
> ...

Hmm, ok, probably because the stmt moving doesn't happen in DOM
order (move defs before uses).  But

+
+  if (!SSA_NAME_IS_DEFAULT_DEF (name))
+   /* The statement has been moved to the child function.  It no 
longer
+  defines name in the original function.  Mark the def stmt NULL, 
and
+  let release_dangling_ssa_names deal with it.  */
+   SSA_NAME_DEF_STMT (name) = NULL;

applies also to uses - I don't see why it couldn't happen that you
move a use but not its def (the def would be a parameter to the
split-out function).  You'd wreck the IL of the source function this way.

I think that the whole dance of actually moving things instead of
just copying them isn't worth the extra maintenance (well, if we already
have a machinery duplicating a SESE region to another function - I
suppose gimple_duplicate_sese_region could be trivially changed to
support that).

Trunk doesn't have release_dangling_ssa_names it seems but I think
it belongs to move_sese_region_to_fn and not to omp-low.c and it
could also just walk the d->vars_map replace_ssa_name fills to
iterate over the removal candidates (and if the situation of
moving uses but not defs cannot happen you don't need any
SSA_NAME_DEF_STMT dance either).
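
As a rough illustration of the "walk the map" idea, here is a toy model
in plain C.  None of this is GCC code: toy_name, toy_map and the helper
functions are hypothetical stand-ins for SSA names, for the old-to-new
map that replace_ssa_name fills, and for the release step.  It also
assumes the case of moving a use without its def does not occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAMES 8

/* Hypothetical stand-in for an SSA name; not the real GCC tree node. */
struct toy_name {
    int def_stmt;   /* id of the defining statement, -1 if none */
    bool released;
};

/* Hypothetical stand-in for the old->new map filled during the move. */
struct toy_map {
    size_t old_idx[MAX_NAMES];
    size_t new_idx[MAX_NAMES];
    size_t count;
};

/* Record a mapping and hand the def stmt over to the new name. */
static void toy_replace_name(struct toy_name *names, struct toy_map *map,
                             size_t old_i, size_t new_i)
{
    assert(map->count < MAX_NAMES);
    names[new_i].def_stmt = names[old_i].def_stmt;
    names[new_i].released = false;
    map->old_idx[map->count] = old_i;
    map->new_idx[map->count] = new_i;
    map->count++;
}

/* Once the whole region has moved, release exactly the mapped old names.
   Returns how many names were released. */
static size_t toy_release_moved(struct toy_name *names,
                                const struct toy_map *map)
{
    size_t n = 0;
    for (size_t i = 0; i < map->count; i++) {
        struct toy_name *old = &names[map->old_idx[i]];
        if (!old->released) {
            old->def_stmt = -1;
            old->released = true;
            n++;
        }
    }
    return n;
}
```

The point of the sketch is only that the map already lists the removal
candidates, so no scan over all names is needed, and the release happens
after the move so no statement still refers to a freed name mid-way.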

Thanks,
Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: Regression in target MIC compiler

2015-08-05 Thread Thomas Schwinge
Hi!

It seems as if David's »[PATCH][1/N] Change GET_MODE_INNER to always
return a non-void mode« is relevant here:

On Tue, 4 Aug 2015 16:06:23 +0300, Ilya Verbin  wrote:
> On Tue, Aug 04, 2015 at 14:35:11 +0200, Thomas Schwinge wrote:
> > On Fri, 31 Jul 2015 20:13:02 +0300, Ilya Verbin  wrote:
> > > On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> > > > > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider 
> > > > > > modes,
> > > > > > +if not found, fallback to all modes.  */
> > > > > > +  int pass;
> > > > > > +  for (pass = 0; pass < 2; pass++)
> > > > > > +   for (machine_mode mr = pass ? VOIDmode
> > > > > > +   : GET_CLASS_NARROWEST_MODE (mclass);
> > > > > > +pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > > > > +pass ? mr = (machine_mode) (m + 1)
> > > > > > + : mr = GET_MODE_WIDER_MODE (mr))
> > > > > > + if (GET_MODE_CLASS (mr) != mclass
> > > > > > + || GET_MODE_SIZE (mr) != size
> > > > > > + || GET_MODE_PRECISION (mr) != prec
> > > > > > + || GET_MODE_INNER (mr) != inner
> > > > > > + || GET_MODE_IBIT (mr) != ibit
> > > > > > + || GET_MODE_FBIT (mr) != fbit
> > > > > > + || GET_MODE_NUNITS (mr) != nunits)
> > > > > > +   continue;
> > > > > > 
> > > > > > Given that gomp-4_1-branch works ok, the problem was introduced 
> > > > > > somewhere
> > > > > > between 9 and 31 Jul.  I'll try to find the revision.
> > > > > 
> > > > > Shouldn't 'mr' be here instead of 'm'?
> > > > 
> > > > I think so.  If it works, patch preapproved.
> > > 
> > > It fixes the infinite loop, but causes an error:
> > > lto1: fatal error: unsupported mode QI
> > 
> > Confirmed.
> > 
> > > > But wonder what changed that we haven't been triggering it before.
> > > > What mode do you think it on (mclass/size/prec/inner/ibit/fbit/nunits)?
> > > 
> > > > When it hangs, mr is HImode.
> > 
> > Do you already have any further analysis, a workaround, or even a fix?
> 
> Not yet.  I thought since Jakub is the author of this function, he could
> easily point out what is wrong here :)  Actually, intelmic doesn't require
> lto_input_mode_table, so a temporary workaround is just to disable it.

Well, avoiding lto_input_mode_table doesn't help us with nvptx
offloading...  ;-)

Anyway, I found that if I revert r226328, »[PATCH][1/N] Change
GET_MODE_INNER to always return a non-void mode«, the problem
goes away.  I'm trying, but I cannot claim yet to really understand this
mode streaming code...  But, with the producer
gcc/lto-streamer-out.c:lto_write_mode_table having been changed, does
maybe the consumer gcc/lto-streamer-in.c:lto_input_mode_table also need
to be updated accordingly?

For reference, David's change to
gcc/lto-streamer-out.c:lto_write_mode_table:

@@ -2679,23 +2679,23 @@ lto_write_mode_table (void)
   /* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
  also the inner mode marked.  */
   for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
 if (streamer_mode_table[i])
   {
machine_mode m = (machine_mode) i;
-   if (GET_MODE_INNER (m) != VOIDmode)
+   if (GET_MODE_INNER (m) != m)
  streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
   }
   /* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
  so that we can refer to them afterwards.  */
   for (int pass = 0; pass < 2; pass++)
 for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
   if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) BLKmode)
{
  machine_mode m = (machine_mode) i;
- if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
+ if ((GET_MODE_INNER (m) == m) ^ (pass == 0))
continue;
  bp_pack_value (&bp, m, 8);
  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
  bp_pack_value (&bp, GET_MODE_INNER (m), 8);

(Also, the source code comments need to be updated?)
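
To spell out what the two passes in the writer do under the new
convention, here is a toy model in C.  The enum, the table and the
function name are invented for illustration; only the shape of the
predicate is taken from the hunk above.  It does not model the reader,
so it says nothing about whether lto_input_mode_table needs a matching
change.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mode { TOY_QI, TOY_V4QI, TOY_HI, TOY_V2HI, TOY_NUM_MODES };

/* New convention: a mode without an inner mode is its own inner mode. */
static const enum toy_mode toy_inner[TOY_NUM_MODES] = {
    [TOY_QI] = TOY_QI,
    [TOY_V4QI] = TOY_QI,
    [TOY_HI] = TOY_HI,
    [TOY_V2HI] = TOY_HI,
};

/* Fill OUT with the modes in streaming order and return the count. */
static size_t toy_stream_order(enum toy_mode out[TOY_NUM_MODES])
{
    size_t n = 0;
    for (int pass = 0; pass < 2; pass++)
        for (int i = 0; i < TOY_NUM_MODES; i++) {
            enum toy_mode m = (enum toy_mode)i;
            /* Pass 0 keeps self-inner modes, pass 1 keeps the rest. */
            if ((toy_inner[m] == m) ^ (pass == 0))
                continue;
            assert(n < TOY_NUM_MODES);
            out[n++] = m;
        }
    return n;
}
```

In the toy table the scalar modes come out first and the vector modes
second, so every inner mode is written before a mode that refers to it.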


Grüße,
 Thomas




RE: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-08-05 Thread Kumar, Venkataramanan
Hi Richard,

> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, August 05, 2015 2:21 PM
> To: Kumar, Venkataramanan
> Cc: Jeff Law; Jakub Jelinek; gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with
> power 2 integer constant.
> 
> On Tue, Aug 4, 2015 at 6:49 PM, Kumar, Venkataramanan
>  wrote:
> > Hi Richard,
> >
> >
> >> -Original Message-
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: Tuesday, August 04, 2015 4:07 PM
> >> To: Kumar, Venkataramanan
> >> Cc: Jeff Law; Jakub Jelinek; gcc-patches@gcc.gnu.org
> >> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult
> >> expr with power 2 integer constant.
> >>
> >> On Tue, Aug 4, 2015 at 10:52 AM, Kumar, Venkataramanan
> >>  wrote:
> >> > Hi Jeff,
> >> >
> >> >> -Original Message-
> >> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> >> ow...@gcc.gnu.org] On Behalf Of Jeff Law
> >> >> Sent: Monday, August 03, 2015 11:42 PM
> >> >> To: Kumar, Venkataramanan; Jakub Jelinek
> >> >> Cc: Richard Beiner (richard.guent...@gmail.com);
> >> >> gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult
> >> >> expr with power 2 integer constant.
> >> >>
> >> >> On 08/02/2015 05:03 AM, Kumar, Venkataramanan wrote:
> >> >> > Hi Jakub,
> >> >> >
> >> >> > Thank you for reviewing the patch.
> >> >> >
> >> >> > I have incorporated your comments in the attached patch.
> >> >> Note Jakub is on PTO for the next 3 weeks.
> >> >
> >> >  Thank you for this information.
> >> >
> >> >>
> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > vectorize_mults_via_shift.diff.txt
> >> >> >
> >> >> >
> >> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
> >> >> > b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
> >> >> Jakub would probably like more testcases :-)
> >> >>
> >> >> The most obvious thing to test would be other shift factors.
> >> >>
> >> >> A negative test to verify we don't try to turn a multiply by
> >> >> non-constant or multiply by a constant that is not a power of 2 into
> shifts.
> >> >
> >> > I have added negative test in the attached patch.
> >> >
> >> >
> >> >>
> >> >> [ Would it make sense, for example, to turn a multiply by 3 into a
> >> >> shift-add sequence?  As Jakub said, choose_mult_variant can be
> >> >> your friend. ]
> >> >
> >> > Yes I will do that in a follow up patch.
> >> >
> >> > The new change log becomes
> >> >
> >> > gcc/ChangeLog
> >> > 2015-08-04  Venkataramanan Kumar
> >> 
> >> >  * tree-vect-patterns.c (vect_recog_mult_pattern): New function
> >> > for
> >> vectorizing
> >> > multiplication patterns.
> >> >  * tree-vectorizer.h: Adjust the number of patterns.
> >> >
> >> > gcc/testsuite/ChangeLog
> >> > 2015-08-04  Venkataramanan Kumar
> >> 
> >> >  * gcc.dg/vect/vect-mult-pattern-1.c: New
> >> > * gcc.dg/vect/vect-mult-pattern-2.c: New
> >> >
> >> > Bootstrapped and reg tested on aarch64-unknown-linux-gnu.
> >> >
> >> > Ok for trunk ?
> >>
> >> +  if (TREE_CODE (oprnd0) != SSA_NAME
> >> +  || TREE_CODE (oprnd1) != INTEGER_CST
> >> +  || TREE_CODE (itype) != INTEGER_TYPE
> >>
> >> INTEGRAL_TYPE_P (itype)
> >>
> >> +  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
> >> + if (!optab
> >> +  || optab_handler (optab, TYPE_MODE (vectype)) ==
> >> CODE_FOR_nothing)
> >> +   return NULL;
> >> +
> >>
> >> indent of the return stmt looks wrong
> >>
> >> +  /* Handle constant operands that are postive or negative powers of 2.
> >> + */  if ( wi::exact_log2 (oprnd1) != -1  ||
> >> +   wi::exact_log2 (wi::neg (oprnd1)) != -1)
> >>
> >> no space after (, || goes to the next line.
> >>
> >> +{
> >> +  tree shift;
> >> +
> >> +  if (wi::exact_log2 (oprnd1) != -1)
> >>
> >> please cache wi::exact_log2
> >>
> >> in fact the first if () looks redundant if you simply put an else
> >> return NULL after a else if (wi::exact_log2 (wi::neg (oprnd1)) != -1)
> >>
> >> Note that the issue with INT_MIN is that wi::neg (INT_MIN) is INT_MIN
> >> again, but it seems that wi::exact_log2 returns -1 in that case so
> >> you are fine (and in fact not handling this case).
> >>
> >
> > I have updated your review comments in the attached patch.
> >
> > For the INT_MIN case, I am getting vectorized output with the patch.  I
> > believe x86_64 also vectorizes but does not negate the results.
> >
> > #include 
> > unsigned long int  __attribute__ ((aligned (64)))arr[100];
> >
> > int i;
> > #if 1
> > void test_vector_shifts()
> > {
> > for(i=0; i<=99;i++)
> > arr[i]=arr[i] * INT_MIN;
> > }
> > #endif
> >
> > void test_vectorshift_via_mul()
> > {
> > for(i=0; i<=99;i++)
> > arr[i]=arr[i]*(-INT_MIN);
> >
> > }
> >
> > Before
> > -
> > ldr x1, [x0]
> > neg x1, x1, lsl 31
> > str x1, [x0], 8
> > cmp x0, 

[PATCH] Fix PR67055

2015-08-05 Thread Richard Biener

The inliner decides something stupid here (IMHO - inlining a function into
a thunk) but at least we shouldn't crash (the tailcall in the thunk
has no BLOCK associated with it).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-08-05  Richard Biener  

PR tree-optimization/67055
* tree-ssa-ccp.c (fold_builtin_alloca_with_align): Handle
NULL gimple_block.

* g++.dg/torture/pr67055.C: New testcase.

Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c  (revision 226577)
+++ gcc/tree-ssa-ccp.c  (working copy)
@@ -2107,6 +2120,7 @@ fold_builtin_alloca_with_align (gimple s
  as a declared array, so we allow a larger size.  */
   block = gimple_block (stmt);
   if (!(cfun->after_inlining
+   && block
 && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
 threshold /= 10;
   if (size > threshold)
Index: gcc/testsuite/g++.dg/torture/pr67055.C
===
--- gcc/testsuite/g++.dg/torture/pr67055.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr67055.C  (working copy)
@@ -0,0 +1,44 @@
+// { dg-do compile }
+// { dg-additional-options "-std=c++14" }
+
+namespace std {
+typedef __SIZE_TYPE__ size_t;
+struct nothrow_t;
+}
+namespace vespamalloc {
+void fn1(void *);
+template  class A {
+public:
+   static unsigned long fillStack(unsigned long);
+};
+template 
+   unsigned long A::fillStack(unsigned long p1) {
+   void *retAddr[p1];
+   fn1(retAddr);
+   }
+class B {
+protected:
+   B(void *);
+};
+template  class D : B {
+public:
+   D() : B(0) {}
+   void alloc(int) { A::fillStack(StackTraceLen); }
+};
+template  class C {
+public:
+   void *malloc(unsigned long);
+};
+template 
+   void *C::malloc(unsigned long) {
+   MemBlockPtrT mem;
+   mem.alloc(0);
+   }
+C, int> *_GmemP;
+}
+void *operator new(std::size_t, std::nothrow_t &) noexcept {
+return vespamalloc::_GmemP->malloc(0);
+}
+void *operator new[](std::size_t, std::nothrow_t &) noexcept {
+return vespamalloc::_GmemP->malloc(0);
+}
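
For illustration only, the effect of the added "&& block" test can be
modelled in plain C as below.  The struct and function names are
hypothetical stand-ins, not the GCC ones; the sketch just shows that the
short-circuit keeps a NULL block from being dereferenced and that such a
statement then gets the smaller threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BLOCK whose supercontext may be a function. */
struct toy_block {
    bool supercontext_is_function;
};

/* Mirrors the shape of the patched condition. */
static unsigned long toy_alloca_threshold(unsigned long base,
                                          bool after_inlining,
                                          const struct toy_block *block)
{
    unsigned long threshold = base;
    /* && short-circuits, so block is only dereferenced when non-NULL. */
    if (!(after_inlining && block && block->supercontext_is_function))
        threshold /= 10;
    return threshold;
}

/* Small usage example: a NULL block no longer crashes. */
static void toy_threshold_examples(void)
{
    struct toy_block outermost = { true };
    assert(toy_alloca_threshold(1000, true, &outermost) == 1000);
    assert(toy_alloca_threshold(1000, true, NULL) == 100);
}
```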


Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Andreas Schwab
Richard Biener  writes:

>   * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
>   bool compares on RHS.
>   * match.pd: Add X ==/!= !X is false/true pattern.

ERROR in VTST/VTSTQ 
(/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
 line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 0x1 != 0xff  
(signed input)
FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution test
FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
\t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ \t]*v[0-9]+.[0-9]+[bshd],[ 
\t]+v[0-9]+.[0-9]+[bshd] 14
FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
\t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ \t]+d[0-9]+ 4
FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
FAIL: gcc.target/aarch64/singleton_intrinsics_1.c scan-assembler-times 
\\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests

2015-08-05 Thread Kyrill Tkachov


On 05/08/15 10:03, Andreas Schwab wrote:

> Kyrill Tkachov  writes:
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_1.c 
>> b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
>> new file mode 100644
>> index 000..72d0838
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do assemble } */
>> +/* { dg-options "-O2 -mcpu=thunderx -save-temps" } */
>> +
>> +__attribute__ ((target ("cpu=cortex-a72.cortex-a53")))
>> +int
>> +foo (int a)
>> +{
>> +  return a + 1;
>> +}
>> +
>> +/* { dg-final { scan-assembler "//.tune cortex-a72.cortex-a53" } } */
>> +/* { dg-final { scan-assembler-not "thunderx" } } */
>
> FAIL: gcc.target/aarch64/target_attr_1.c (test for excess errors)
> Excess errors:
> Assembler messages:
> Error: unknown cpu `thunderx'
> Error: unrecognized option -mcpu=thunderx

yeah, that happens if your assembler doesn't support -mcpu=thunderx.
Newer binutils should support it.

Kyrill

> Andreas.



Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Andreas Schwab wrote:

> Richard Biener  writes:
> 
> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
> > bool compares on RHS.
> > * match.pd: Add X ==/!= !X is false/true pattern.
> 
> ERROR in VTST/VTSTQ 
> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 0x1 != 
> 0xff  (signed input)
> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution test
> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
> \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ \t]*v[0-9]+.[0-9]+[bshd],[ 
> \t]+v[0-9]+.[0-9]+[bshd] 14
> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
> \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ \t]+d[0-9]+ 4
> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c scan-assembler-times 
> \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2

Ick - somebody will have to come up with a reduced testcase for one of
these (ideally an execution failure), reduced to one failing case so I can
investigate with a cross compiler.

This smells like it could be an aarch64 vector-specific issue or a latent
issue with the truth_valued_p predicate for vector types.

Richard.
 
> Andreas.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


RE: Regression in target MIC compiler

2015-08-05 Thread David Sherwood
Hi Thomas,

If this looks like my fault I am happy to look into this and fix the bug
if you can tell me how to reproduce it. I recently changed GET_MODE_INNER (m)
to return 'm' itself if there is no inner mode and I thought I'd fixed up lto,
but it seems I got it wrong. It also sounds like there is another bug in this
area too - if I want to test this do I need to apply any other patches too?

Regards,
David.

> 
> Hi!
> 
> It seems as if David's »[PATCH][1/N] Change GET_MODE_INNER to always
> return a non-void mode« is relevant here:
> 
> On Tue, 4 Aug 2015 16:06:23 +0300, Ilya Verbin  wrote:
> > On Tue, Aug 04, 2015 at 14:35:11 +0200, Thomas Schwinge wrote:
> > > On Fri, 31 Jul 2015 20:13:02 +0300, Ilya Verbin  wrote:
> > > > On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> > > > > > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > > > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to wider 
> > > > > > > modes,
> > > > > > > +  if not found, fallback to all modes.  */
> > > > > > > +  int pass;
> > > > > > > +  for (pass = 0; pass < 2; pass++)
> > > > > > > + for (machine_mode mr = pass ? VOIDmode
> > > > > > > + : GET_CLASS_NARROWEST_MODE (mclass);
> > > > > > > +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > > > > > +  pass ? mr = (machine_mode) (m + 1)
> > > > > > > +   : mr = GET_MODE_WIDER_MODE (mr))
> > > > > > > +   if (GET_MODE_CLASS (mr) != mclass
> > > > > > > +   || GET_MODE_SIZE (mr) != size
> > > > > > > +   || GET_MODE_PRECISION (mr) != prec
> > > > > > > +   || GET_MODE_INNER (mr) != inner
> > > > > > > +   || GET_MODE_IBIT (mr) != ibit
> > > > > > > +   || GET_MODE_FBIT (mr) != fbit
> > > > > > > +   || GET_MODE_NUNITS (mr) != nunits)
> > > > > > > + continue;
> > > > > > >
> > > > > > > Given that gomp-4_1-branch works ok, the problem was introduced 
> > > > > > > somewhere
> > > > > > > between 9 and 31 Jul.  I'll try to find the revision.
> > > > > >
> > > > > > Shouldn't 'mr' be here instead of 'm'?
> > > > >
> > > > > I think so.  If it works, patch preapproved.
> > > >
> > > > It fixes the infinite loop, but causes an error:
> > > > lto1: fatal error: unsupported mode QI
> > >
> > > Confirmed.
> > >
> > > > > But wonder what changed that we haven't been triggering it before.
> > > > > What mode do you think it on 
> > > > > (mclass/size/prec/inner/ibit/fbit/nunits)?
> > > >
> > > > When it hangs, mr is HImode.
> > >
> > > Do you already have any further analysis, a workaround, or even a fix?
> >
> > Not yet.  I thought since Jakub is the author of this function, he could 
> > easily
> > point what is wrong here :)  Actually, intelmic doesn't require
> > lto_input_mode_table, so temporary workaround is just to disable it.
> 
> Well, avoiding lto_input_mode_table doesn't help us with nvptx
> offloading...  ;-)
> 
> Anyway, I found that if I revert r226328, »[PATCH][1/N] Change
> GET_MODE_INNER to always return a non-void mode«, the problem
> goes away.  I'm trying, but I cannot claim yet to really understand this
> mode streaming code...  But, with the producer
> gcc/lto-streamer-out.c:lto_write_mode_table having been changed, does
> maybe the consumer gcc/lto-streamer-in.c:lto_input_mode_table also need
> to be updated accordingly?
> 
> For reference, David's change to
> gcc/lto-streamer-out.c:lto_write_mode_table:
> 
> @@ -2679,23 +2679,23 @@ lto_write_mode_table (void)
>/* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
>   also the inner mode marked.  */
>for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>  if (streamer_mode_table[i])
>{
>   machine_mode m = (machine_mode) i;
> - if (GET_MODE_INNER (m) != VOIDmode)
> + if (GET_MODE_INNER (m) != m)
> streamer_mode_table[(int) GET_MODE_INNER (m)] = 1;
>}
>/* First stream modes that have GET_MODE_INNER (m) == VOIDmode,
>   so that we can refer to them afterwards.  */
>for (int pass = 0; pass < 2; pass++)
>  for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
>if (streamer_mode_table[i] && i != (int) VOIDmode && i != (int) 
> BLKmode)
>   {
> machine_mode m = (machine_mode) i;
> -   if ((GET_MODE_INNER (m) == VOIDmode) ^ (pass == 0))
> +   if ((GET_MODE_INNER (m) == m) ^ (pass == 0))
>   continue;
> bp_pack_value (&bp, m, 8);
> bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
> bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
> bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
> bp_pack_value (&bp, GET_MODE_INNER (m), 8);
> 
> (Also, the source code comments need to be updated?)
> 
> 
> Grüße,
>  Thomas




Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Andrew Pinski
On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  wrote:
> On Wed, 5 Aug 2015, Andreas Schwab wrote:
>
>> Richard Biener  writes:
>>
>> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
>> > bool compares on RHS.
>> > * match.pd: Add X ==/!= !X is false/true pattern.
>>
>> ERROR in VTST/VTSTQ 
>> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
>>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 0x1 != 
>> 0xff  (signed input)
>> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution test
>> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
>> \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ \t]*v[0-9]+.[0-9]+[bshd],[ 
>> \t]+v[0-9]+.[0-9]+[bshd] 14
>> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
>> \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ \t]+d[0-9]+ 4
>> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
>> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c scan-assembler-times 
>> \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
>
> Ick - somebody will have to come up with a reduced testcase for one of
> this (best an execute fail).  Reduced to one failing case so I can
> investigate with a cross compiler.
>
> Eventually smells like a aarch64 vector specific issue or a latent
> issue with the truth_valued_p predicate for vector types.

Or constant_boolean_node is not returning {-1,-1,-1,-1} for true vectors.
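
For readers less familiar with the vector convention referred to here: in
GCC's generic vector extension a lane-wise comparison yields 0 for false
and -1 (all bits set) for true, which matches the 0xff the vtst.c test
expects where it got 0x1.  The following is only a sketch of that
convention using the vector_size attribute; count_true_lanes and
demo_vector_truth are made-up names, and the snippet says nothing about
where the miscompile actually comes from.

```c
#include <assert.h>

/* Four 32-bit ints in one 16-byte vector (GCC/Clang vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Count the lanes of (a == b) that are "true", i.e. all-ones.  */
int count_true_lanes (v4si a, v4si b)
{
  v4si mask = (a == b);
  int n = 0;
  for (int i = 0; i < 4; i++)
    {
      /* Each lane is either 0 (false) or -1 (true), never 1.  */
      assert (mask[i] == 0 || mask[i] == -1);
      n += (mask[i] == -1);
    }
  return n;
}

/* Lanes 0 and 2 compare equal, lanes 1 and 3 do not.  */
int demo_vector_truth (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 1, 0, 3, 0 };
  return count_true_lanes (a, b);
}
```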

Thanks,
Andrew

>
> Richard.
>
>> Andreas.
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
> Norton, HRB 21284 (AG Nuernberg)


RE: Regression in target MIC compiler

2015-08-05 Thread Thomas Schwinge
Hi!

On Wed, 5 Aug 2015 11:18:32 +0100, David Sherwood wrote:
> If this looks like my fault

Well, not necessarily your fault -- might as well just be something that
has already been lurking in gcc/lto-streamer-in.c:lto_input_mode_table,
but so far we've gotten away without tripping over it.

> I am happy to look into this and fix the bug

Thanks for helping!

> if you can tell me how to reproduce it. I recently changed GET_MODE_INNER (m)
> to return 'm' itself if there is no inner mode and I thought I'd fixed up lto,
> but it seems I got it wrong. It also sounds like there is another bug in this
> area too - if I want to test this do I need to apply any other patches too?

gcc/lto-streamer-out.c:lto_write_mode_table as well as
gcc/lto-streamer-in.c:lto_input_mode_table are not used in regular LTO,
but are only used in offloading configurations.  To reproduce this, you'd
build such a configuration (offloading to x86_64-intelmicemul-linux-gnu is
easier to build than nvptx-none).
You can use the build scripts I uploaded, or do the steps manually.
Then run the libgomp testsuite and observe, for example,
libgomp.c/examples-4/array_sections-3.c hang (or fail with »unsupported
mode QI« once the mr vs. m confusion is fixed, see below).

I'm happy to test any patches or also hypotheses that you suggest --
maybe something is obvious to you just from looking at the code?
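
The mr vs. m confusion can be modelled in a few lines.  This is a sketch
under stated assumptions, not the real lto_input_mode_table: the enum is
a toy mode list, and fallback_scan_steps, step_from_mr and cap are
hypothetical names.  It only mimics the second ("all modes") pass of the
quoted loop, whose step expression is mr = (machine_mode) (m + 1).

```c
#include <assert.h>

/* Toy mode list; GCC's real machine_mode enum is much larger.  */
enum toy_mode { T_VOID, T_QI, T_HI, T_SI, T_DI, T_MAX };

/* Walk all modes the way the fallback pass does.  If step_from_mr is 0
   the step is computed from the fixed mode m (the reported typo); if it
   is 1 the step is computed from the loop variable mr (the proposed
   fix).  Returns the number of iterations, or -1 once more than cap
   iterations have run, which stands in for "hangs".  */
int fallback_scan_steps (enum toy_mode m, int step_from_mr, int cap)
{
  assert (cap > 0);
  int steps = 0;
  for (enum toy_mode mr = T_VOID; mr < T_MAX;
       mr = (enum toy_mode) ((step_from_mr ? mr : m) + 1))
    if (++steps > cap)
      return -1;
  return steps;
}
```

With m == T_QI the buggy variant sets mr to T_HI on every step and never
reaches T_MAX, consistent with the report that mr is HImode in the hang;
the fixed variant visits each of the five toy modes once.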


For reference:

> > On Tue, 4 Aug 2015 16:06:23 +0300, Ilya Verbin  wrote:
> > > On Tue, Aug 04, 2015 at 14:35:11 +0200, Thomas Schwinge wrote:
> > > > On Fri, 31 Jul 2015 20:13:02 +0300, Ilya Verbin  
> > > > wrote:
> > > > > On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> > > > > > > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > > > > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to 
> > > > > > > > wider modes,
> > > > > > > > +if not found, fallback to all modes.  */
> > > > > > > > +  int pass;
> > > > > > > > +  for (pass = 0; pass < 2; pass++)
> > > > > > > > +   for (machine_mode mr = pass ? VOIDmode
> > > > > > > > +   : GET_CLASS_NARROWEST_MODE 
> > > > > > > > (mclass);
> > > > > > > > +pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > > > > > > +pass ? mr = (machine_mode) (m + 1)
> > > > > > > > + : mr = GET_MODE_WIDER_MODE (mr))
> > > > > > > > + if (GET_MODE_CLASS (mr) != mclass
> > > > > > > > + || GET_MODE_SIZE (mr) != size
> > > > > > > > + || GET_MODE_PRECISION (mr) != prec
> > > > > > > > + || GET_MODE_INNER (mr) != inner
> > > > > > > > + || GET_MODE_IBIT (mr) != ibit
> > > > > > > > + || GET_MODE_FBIT (mr) != fbit
> > > > > > > > + || GET_MODE_NUNITS (mr) != nunits)
> > > > > > > > +   continue;
> > > > > > > >
> > > > > > > > Given that gomp-4_1-branch works ok, the problem was introduced 
> > > > > > > > somewhere
> > > > > > > > between 9 and 31 Jul.  I'll try to find the revision.
> > > > > > >
> > > > > > > Shouldn't 'mr' be here instead of 'm'?
> > > > > >
> > > > > > I think so.  If it works, patch preapproved.
> > > > >
> > > > > It fixes the infinite loop, but causes an error:
> > > > > lto1: fatal error: unsupported mode QI
> > > >
> > > > Confirmed.
> > > >
> > > > > > But wonder what changed that we haven't been triggering it before.
> > > > > > What mode do you think it on 
> > > > > > (mclass/size/prec/inner/ibit/fbit/nunits)?
> > > > >
> > > > > When it hangs, mr is HImode.
> > > >
> > > > Do you already have any further analysis, a workaround, or even a fix?
> > >
> > > Not yet.  I thought since Jakub is the author of this function, he could 
> > > easily
> > > point what is wrong here :)  Actually, intelmic doesn't require
> > > lto_input_mode_table, so temporary workaround is just to disable it.
> > 
> > Well, avoiding lto_input_mode_table doesn't help us with nvptx
> > offloading...  ;-)
> > 
> > Anyway, I found that if I revert r226328, »[PATCH][1/N] Change
> > GET_MODE_INNER to always return a non-void mode«, the problem
> > goes away.  I'm trying, but I cannot claim yet to really understand this
> > mode streaming code...  But, with the producer
> > gcc/lto-streamer-out.c:lto_write_mode_table having been changed, does
> > maybe the consumer gcc/lto-streamer-in.c:lto_input_mode_table also need
> > to be updated accordingly?
> > 
> > For reference, David's change to
> > gcc/lto-streamer-out.c:lto_write_mode_table:
> > 
> > @@ -2679,23 +2679,23 @@ lto_write_mode_table (void)
> >/* Ensure that for GET_MODE_INNER (m) != VOIDmode we have
> >   also the inner mode marked.  */
> >for (int i = 0; i < (int) MAX_MACHINE_MODE; i++)
> >  if (streamer_mode_table[i])
> >{
> > machine_mod

Re: [committed, gomp4] Fix release_dangling_ssa_names

2015-08-05 Thread Tom de Vries

On 05/08/15 11:30, Richard Biener wrote:

On Wed, 5 Aug 2015, Tom de Vries wrote:


On 05/08/15 09:29, Richard Biener wrote:

This patch fixes that by making sure we reset the def stmt to NULL. This
means we can simplify release_dangling_ssa_names to just test for NULL def
stmts.

Not sure if I understand the problem correctly but why are you not simply
releasing the SSA name when you remove its definition?


In move_sese_region_to_fn we move a region of blocks from one function to
another, bit by bit.

When we encounter an ssa_name as def or use in the region, we:
- generate a new ssa_name,
- set the def stmt of the old name as def stmt of the new name, and
- add a mapping from the old to the new name.
The next time we encounter the same ssa_name in another statement, we find it
in the map.
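
The bookkeeping described above, together with the NULL reset this patch
adds, can be sketched with a toy data structure.  None of this is GCC
code: toy_stmt, toy_name, toy_replace_name and toy_release_dangling are
hypothetical stand-ins for gimple statements, SSA names,
replace_ssa_name and release_dangling_ssa_names, and default defs are
left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct toy_stmt { int id; };

struct toy_name
{
  struct toy_stmt *def_stmt;  /* defining statement, or NULL */
  int released;               /* nonzero once put on the free list */
};

/* Give the new name the old name's defining statement, then clear the
   old name's def stmt so the statement defines exactly one name.  */
void toy_replace_name (struct toy_name *old_name, struct toy_name *new_name)
{
  assert (old_name != new_name);
  new_name->def_stmt = old_name->def_stmt;
  new_name->released = 0;
  old_name->def_stmt = NULL;
}

/* Release every name whose def stmt is NULL; return how many.  */
int toy_release_dangling (struct toy_name *names, size_t n)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (!names[i].released && names[i].def_stmt == NULL)
      {
        names[i].released = 1;
        count++;
      }
  return count;
}

/* Two names in the original function; only the first is moved.  */
int demo_move_one_def (void)
{
  struct toy_stmt s1 = { 1 }, s2 = { 2 };
  struct toy_name orig[2] = { { &s1, 0 }, { &s2, 0 } };
  struct toy_name child = { NULL, 0 };

  toy_replace_name (&orig[0], &child);
  assert (child.def_stmt == &s1 && orig[1].def_stmt == &s2);
  return toy_release_dangling (orig, 2);
}
```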

If we release the old ssa name, we effectively create statements with operands
in the free-list. The first point where that causes breakage is in
walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
defined, which is not the case if it's in the free-list:
...
    case GIMPLE_ASSIGN:
      /* Walk the RHS operands.  If the LHS is of a non-renamable type or
         is a register variable, we may use a COMPONENT_REF on the RHS.  */
      if (wi)
        {
          tree lhs = gimple_assign_lhs (stmt);
          wi->val_only
            = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
              || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
        }
...


Hmm, ok, probably because the stmt moving doesn't happen in DOM
order (move defs before uses).  But



There seems to be similar code for the rhs, so I don't think changing 
the order would fix anything.



+
+  if (!SSA_NAME_IS_DEFAULT_DEF (name))
+    /* The statement has been moved to the child function.  It no longer
+       defines name in the original function.  Mark the def stmt NULL, and
+       let release_dangling_ssa_names deal with it.  */
+    SSA_NAME_DEF_STMT (name) = NULL;

applies also to uses - I don't see why it couldn't happen that you
move a use but not its def (the def would be a parameter to the
split-out function).  You'd wreck the IL of the source function this way.



If you first move a use, you create a mapping. When you encounter the 
def, you use the mapping. Indeed, if the def is a default def, we don't 
encounter the def. Which is why we create a nop as defining def for 
those cases. The default def in the source function still has a defining 
nop, and has no uses anymore. I don't understand what is broken here.



I think that the whole dance of actually moving things instead of
just copying it isn't worth the extra maintenance (well, if we already
have a machinery duplicating a SESE region to another function - I
suppose gimple_duplicate_sese_region could be trivially changed to
support that).



I'll mention that as todo. For now, I think the fastest way to get a 
working version is to fix move_sese_region_to_fn.



Trunk doesn't have release_dangling_ssa_names it seems


Yep, I only ran into this trouble for the kernels region handling. But I 
don't exclude the possibility it could happen for trunk as well.



but I think
it belongs to move_sese_region_to_fn and not to omp-low.c


Makes sense indeed.


and it
could also just walk the d->vars_map replace_ssa_name fills to
iterate over the removal candidates


Agreed, I suppose in general that's a win over iterating over all the 
ssa names.



(and if the situation of
moving uses but not defs cannot happen you don't need any
SSA_NAME_DEF_STMT dance either).


I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a 
stmt is the defining stmt of only one ssa-name at all times.


I'll prepare a patch for trunk then.

Thanks,
- Tom



Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread Trevor Saunders
On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
> > From: Trevor Saunders 
> >
> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
> > config/ia64/ia64-protos.h, config/ia64/ia64.c, config/ia64/ia64.h,
> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
> > the name of a function.
> > * output.h (default_output_label): New prototype.
> > * varasm.c (default_output_label): New function.
> > * vmsdbgout.c: Include tm_p.h.
> > * xcoffout.c: Likewise.
> 
> Just a general remark - the GCC output machinery is known to be slow,
> adding indirect calls might be not the very best idea without refactoring
> some of it.
> 
> Did you do any performance measurements for artificial testcases
> exercising the specific bits you change?

Sorry about the delay, but I finally got a chance to do some perf tests
of the first patch.  I took three test cases (fold-const.ii, insn-emit.ii
and a random .i from firefox) and ran 3 trials of 100 compilations each.
The only non-default flag was -std=gnu++11.

results before patch hookizing output_ascii

fold-const.ii
real    3m18.051s
user    2m41.340s
sys     0m36.544s
real    3m18.141s
user    2m42.236s
sys     0m35.740s
real    3m18.297s
user    2m42.316s
sys     0m35.804s

insn-emit.ii
real    9m58.229s
user    8m26.960s
sys     1m31.224s
real    9m57.857s
user    8m24.616s
sys     1m33.072s
real    9m57.922s
user    8m25.232s
sys     1m32.512s

mozilla.ii
real    8m5.732s
user    6m44.888s
sys     1m20.764s
real    8m5.404s
user    6m44.468s
sys     1m20.856s
real    7m59.197s
user    6m39.632s
sys     1m19.472s

after patch

fold-const.ii
real    3m18.488s
user    2m41.972s
sys     0m36.388s
real    3m18.215s
user    2m41.640s
sys     0m36.432s
real    3m18.368s
user    2m42.492s
sys     0m35.720s

insn-emit.ii
real    10m4.700s
user    8m32.536s
sys     1m32.120s
real    10m4.241s
user    8m31.456s
sys     1m32.728s
real    10m4.515s
user    8m32.056s
sys     1m32.396s

mozilla.ii
real    7m58.018s
user    6m38.008s
sys     1m19.924s
real    7m59.269s
user    6m37.736s
sys     1m21.448s
real    7m58.254s
user    6m37.828s
sys     1m20.324s

So, roughly that looks to me like a range from improving by .5% to
regressing by 1%.  I'm not sure what could cause an improvement, so I
kind of wonder how valid these results are.
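
One way to turn the raw timings into a percentage is to average the three
"real" values per test case and compare the means.  The sketch below does
that for insn-emit.ii, with the times converted to seconds by hand; mean,
percent_change and insn_emit_real_change are made-up helper names, and
three trials are of course too few to say much about noise.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples.  */
double mean (const double *v, size_t n)
{
  assert (n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum / (double) n;
}

/* Relative change of the mean, in percent; positive means slower.  */
double percent_change (const double *before, const double *after, size_t n)
{
  double b = mean (before, n);
  double a = mean (after, n);
  return 100.0 * (a - b) / b;
}

/* "real" times for insn-emit.ii from the runs above, in seconds.  */
double insn_emit_real_change (void)
{
  static const double before[] = { 598.229, 597.857, 597.922 };
  static const double after[] = { 604.700, 604.241, 604.515 };
  return percent_change (before, after, 3);
}
```

For these inputs the change in mean wall-clock time comes out at a bit
over 1%, in line with the "regressing by 1%" end of the range.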

Another question is how one can refactor the output machinery to be
faster.  My first thought is to buffer text internally before calling
stdio functions, but that seems like a giant job.
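
To make the buffering idea concrete, here is a minimal sketch of an
internal text buffer that only calls into stdio when it fills up.  It is
not based on any existing GCC code: struct outbuf, outbuf_puts,
outbuf_flush and the 64-byte size are all assumptions for illustration,
and whether this would beat stdio's own buffering is not measured here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct outbuf
{
  FILE *fp;          /* destination; may be NULL to discard output */
  size_t len;        /* bytes currently buffered */
  unsigned flushes;  /* number of times the buffer was emptied */
  char data[64];
};

/* Hand the buffered bytes to stdio in one call and reset the buffer.  */
void outbuf_flush (struct outbuf *b)
{
  if (b->len == 0)
    return;
  if (b->fp)
    fwrite (b->data, 1, b->len, b->fp);
  b->len = 0;
  b->flushes++;
}

/* Append a string, flushing whenever the buffer becomes full.  */
void outbuf_puts (struct outbuf *b, const char *s)
{
  size_t n = strlen (s);
  while (n > 0)
    {
      size_t room = sizeof b->data - b->len;
      size_t chunk = n < room ? n : room;
      memcpy (b->data + b->len, s, chunk);
      b->len += chunk;
      s += chunk;
      n -= chunk;
      if (b->len == sizeof b->data)
        outbuf_flush (b);
    }
}

/* Emit twenty 5-byte labels (100 bytes) and report the flush count.  */
unsigned demo_label_flushes (void)
{
  struct outbuf b = { NULL, 0, 0, { 0 } };
  for (int i = 0; i < 20; i++)
    outbuf_puts (&b, "foo:\n");
  outbuf_flush (&b);
  assert (b.len == 0);
  return b.flushes;
}
```

In the demo, twenty label writes turn into two flushes instead of twenty
separate stdio calls.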

thanks!

Trev

far outside of noise,
> 
> Richard.
> 
> > ---
> >  gcc/config/arc/arc.h  |  3 +--
> >  gcc/config/bfin/bfin.h|  5 +
> >  gcc/config/frv/frv.h  |  6 +-
> >  gcc/config/ia64/ia64-protos.h |  1 +
> >  gcc/config/ia64/ia64.c| 11 +++
> >  gcc/config/ia64/ia64.h|  8 +---
> >  gcc/config/lm32/lm32.h|  3 +--
> >  gcc/config/mep/mep.h  |  8 +---
> >  gcc/config/mmix/mmix.h|  3 +--
> >  gcc/config/pa/pa-protos.h |  1 +
> >  gcc/config/pa/pa.c| 12 
> >  gcc/config/pa/pa.h|  9 +
> >  gcc/config/rs6000/rs6000-protos.h |  1 +
> >  gcc/config/rs6000/rs6000.c|  8 
> >  gcc/config/rs6000/xcoff.h |  3 +--
> >  gcc/config/spu/spu.h  |  3 +--
> >  gcc/config/visium/visium.h|  3 +--
> >  gcc/defaults.h|  6 +-
> >  gcc/output.h  |  3 +++
> >  gcc/varasm.c  |  9 +
> >  gcc/vmsdbgout.c   |  1 +
> >  gcc/xcoffout.c|  1 +
> >  22 files changed, 60 insertions(+), 48 deletions(-)
> >
> > diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> > index d98cce1..d3747b9 100644
> > --- a/gcc/config/arc/arc.h
> > +++ b/gcc/config/arc/arc.h
> > @@ -1245,8 +1245,7 @@ do {  
> > \
> >
> >  /* This is how to output the definition of a user-level label named NAME,
> > such as the label on a static function or variable NAME.  */
> > -#define ASM_OUTPUT_LABEL(FILE, NAME) \
> > -do { assemble_name (FILE, NAME); fputs (":\n", FILE); } while (0)
> > +#define ASM_OUTPUT_LABEL default_output_label
> >
> >  #define ASM_NAME_P(NAME) ( NAME[0]=='*')
> >
> > diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h
> > index 26ba7c2..08906aa 100644
> > --- a/gcc/config/bfin/bfin.h
> > +++ b/gcc/config/bfin/bfin.h
> > @@ -1044,10 +1044,7 @@ typedef enum directives {
> >  ASM_OUTPUT_LABEL(FILE, NAME);  \
> >} while (0)
> >
> > -#define ASM_OUTPUT_LABEL(FILE, NAME)\
> > -  do {  assemble_name (FILE, N

Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Andrew Pinski wrote:

> On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  wrote:
> > On Wed, 5 Aug 2015, Andreas Schwab wrote:
> >
> >> Richard Biener  writes:
> >>
> >> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
> >> > bool compares on RHS.
> >> > * match.pd: Add X ==/!= !X is false/true pattern.
> >>
> >> ERROR in VTST/VTSTQ 
> >> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> >>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 0x1 != 
> >> 0xff  (signed input)
> >> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution test
> >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
> >> \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ \t]*v[0-9]+.[0-9]+[bshd],[ 
> >> \t]+v[0-9]+.[0-9]+[bshd] 14
> >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
> >> \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ \t]+d[0-9]+ 4
> >> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> >> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c scan-assembler-times 
> >> \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
> >
> > Ick - somebody will have to come up with a reduced testcase for one of
> > this (best an execute fail).  Reduced to one failing case so I can
> > investigate with a cross compiler.
> >
> > Eventually smells like a aarch64 vector specific issue or a latent
> > issue with the truth_valued_p predicate for vector types.
> 
> Or constant_boolean_node is not returning {-1,-1,-1,-1} for true vectors.

It does.

Richard.


Re: [committed, gomp4] Fix release_dangling_ssa_names

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Tom de Vries wrote:

> On 05/08/15 11:30, Richard Biener wrote:
> > On Wed, 5 Aug 2015, Tom de Vries wrote:
> > 
> > > On 05/08/15 09:29, Richard Biener wrote:
> > > > > This patch fixes that by making sure we reset the def stmt to NULL.
> > > > > This
> > > > > means
> > > > > > we can simplify release_dangling_ssa_names to just test for NULL def
> > > > > stmts.
> > > > Not sure if I understand the problem correctly but why are you not
> > > > simply
> > > > releasing the SSA name when you remove its definition?
> > > 
> > > In move_sese_region_to_fn we move a region of blocks from one function to
> > > another, bit by bit.
> > > 
> > > When we encounter an ssa_name as def or use in the region, we:
> > > - generate a new ssa_name,
> > > - set the def stmt of the old name as def stmt of the new name, and
> > > - add a mapping from the old to the new name.
> > > The next time we encounter the same ssa_name in another statement, we find
> > > it
> > > in the map.
> > > 
> > > If we release the old ssa name, we effectively create statements with
> > > operands
> > > in the free-list. The first point where that cause breakage, is in
> > > walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be
> > > defined, which is not the case if it's in the free-list:
> > > ...
> > > case GIMPLE_ASSIGN:
> > >/* Walk the RHS operands.  If the LHS is of a non-renamable type or
> > >   is a register variable, we may use a COMPONENT_REF on the RHS.*/
> > >if (wi)
> > >  {
> > >tree lhs = gimple_assign_lhs (stmt);
> > >wi->val_only
> > >  = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs))
> > > || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS;
> > >  }
> > > ...
> > 
> > Hmm, ok, probably because the stmt moving doesn't happen in DOM
> > order (move defs before uses).  But
> > 
> 
> There seems to be similar code for the rhs, so I don't think changing the
> order would fix anything.
> 
> > +
> > +  if (!SSA_NAME_IS_DEFAULT_DEF (name))
> > +   /* The statement has been moved to the child function.  It no
> > longer
> > +  defines name in the original function.  Mark the def stmt NULL,
> > and
> > +  let release_dangling_ssa_names deal with it.  */
> > +   SSA_NAME_DEF_STMT (name) = NULL;
> > 
> > applies also to uses - I don't see why it couldn't happen that you
> > move a use but not its def (the def would be a parameter to the
> > split-out function).  You'd wreck the IL of the source function this way.
> > 
> 
> If you first move a use, you create a mapping. When you encounter the def, you
> use the mapping. Indeed, if the def is a default def, we don't encounter the
> def. Which is why we create a nop as defining def for those cases. The default
> def in the source function still has a defining nop, and has no uses anymore.
> I don't understand what is broken here.

If you never encounter the DEF then it's broken.  Say, if for

foo(int a)
{
  int b = a;
  if (b)
    {
      < code using b >
    }
}

you move < code using b > to a function.  Then the def is still in 
foo but you create a mapping for its use(s).  Clearly the outlining
process in this case has to pass b as parameter to the outlined
function, something that may not happen currently.
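
Written out by hand, the outlined form of that example would look roughly
like the following.  This is only a source-level sketch of the intended
result, not what move_sese_region_to_fn produces: foo_outlined, sink and
demo_outline are made-up names, and "sink += 2 * b" merely stands in for
< code using b >.

```c
#include <assert.h>

static int sink;

/* The split-out region.  The def of b stayed behind in foo, so the
   value has to arrive as a parameter.  */
static void foo_outlined (int b)
{
  assert (b != 0);  /* the region is only entered when b is nonzero */
  sink += 2 * b;    /* stand-in for < code using b > */
}

void foo (int a)
{
  int b = a;           /* def remains in the original function */
  if (b)
    foo_outlined (b);  /* use moved out; b passed explicitly */
}

/* Run foo once and report the effect of the outlined region.  */
int demo_outline (int a)
{
  sink = 0;
  foo (a);
  return sink;
}
```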

It would probably be cleaner to split the def and use remapping into
separate functions and record whether we saw a def or not.

> > I think that the whole dance of actually moving things instead of
> > just copying it isn't worth the extra maintenance (well, if we already
> > have a machinery duplicating a SESE region to another function - I
> > suppose gimple_duplicate_sese_region could be trivially changed to
> > support that).
> > 
> 
> I'll mention that as todo. For now, I think the fastest way to get a working
> version is to fix move_sese_region_to_fn.

Sure.

> > Trunk doesn't have release_dangling_ssa_names it seems
> 
> Yep, I only ran into this trouble for the kernels region handling. But I don't
> exclude the possibility it could happen for trunk as well.
> 
> > but I think
> > it belongs to move_sese_region_to_fn and not to omp-low.c
> 
> Makes sense indeed.
> 
> > and it
> > could also just walk the d->vars_map replace_ssa_name fills to
> > iterate over the removal candidates
> 
> Agreed, I suppose in general that's a win over iterating over all the ssa
> names.
> 
> > (and if the situation of
> > moving uses but not defs cannot happen you don't need any
> > SSA_NAME_DEF_STMT dance either).
> 
> I'd prefer to keep the SSA_NAME_DEF_STMT () = NULL bit. It makes sure a stmt
> is the defining stmt of only one ssa-name at all times.
> 
> I'll prepare a patch for trunk then.

Thanks,
Richard.


Re: PR middle-end/16351 NULL dereference warnings

2015-08-05 Thread Richard Biener
On Mon, 3 Aug 2015, Manuel López-Ibáñez wrote:

> PING: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01860.html
> 
> Actually, the xfailed test was because the function folded to nothing
> and the offending code was removed without warning. Fixed in the
> attached version. Same changelog.

Ok.

Thanks,
Richard.

> 
> 
> On 22 July 2015 at 17:52, Manuel López-Ibáñez  wrote:
> > I took the patch in
> > https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01715.html and removed
> > the Wnull-attribute part, since most of it can be done from the FE as
> > shown in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01857.html  and
> > also to make the patch smaller and easier to review.
> >
> > I also fixed the comments by Florian here:
> > https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00149.html and added
> > more tests from the PR and its duplicates (one xfailed, I'll open a
> > new PR about it).
> >
> > Futher cleanups may be possible (infer_nonnull_range_by_attribute
> > checks flag_delete_null_pointer_checks, which seems weird to me but it
> > matches the existing behavior of infer_nonnull_range).
> >
> > I added this to Wall to get as much testing as possible, we can always
> > move it to Wextra or disable it by default just before the release if
> > it turns out to be too noisy.
> >
> > Boostrapped and regression tested on x86_64-linux-gnu.
> >
> > OK?
> >
> > gcc/ChangeLog:
> >
> > 2015-07-22  Manuel López-Ibáñez  
> > Jeff Law  
> >
> > PR c/16351
> > * doc/invoke.texi (Wnull-dereference): New.
> > * tree-vrp.c (infer_value_range): Update call to infer_nonnull_range.
> > * gimple-ssa-isolate-paths.c (find_implicit_erroneous_behaviour):
> > Warn for potential NULL dereferences.
> > (find_explicit_erroneous_behaviour): Warn for NULL dereferences.
> > * ubsan.c (instrument_nonnull_arg): Call
> > infer_nonnull_range_by_attribute.
> > (instrument_nonnull_return): Likewise.
> > * common.opt (Wnull-dereference); New.
> > * gimple.c (infer_nonnull_range): Remove bool arguments.
> > (infer_nonnull_range_by_dereference): New.
> > (infer_nonnull_range_by_attribute): New.
> > * gimple.h: Update declarations.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2015-07-22  Manuel López-Ibáñez  
> > Jeff Law  
> >
> > PR c/16351
> > * gcc.dg/tree-ssa/isolate-2.c: Close comment.
> > * gcc.dg/tree-ssa/isolate-4.c: Likewise.
> > * gcc.dg/tree-ssa/wnull-dereference.c: New test.
> > * gcc.dg/tree-ssa/isolate-1.c: Test warnings with -Wnull-dereference.
> > * gcc.dg/tree-ssa/isolate-3.c: Likewise.
> > * gcc.dg/tree-ssa/isolate-5.c: Likewise.
> > * c-c++-common/wnonnull-1.c: New test.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)

Re: [fortran,patch] Extend IEEE support to all real kinds

2015-08-05 Thread FX
> FAIL: gfortran.dg/ieee/large_1.f90   -O0  (test for excess errors)
> Excess errors:
> large_1.f90:(.text+0x1792): undefined reference to `logbq’

Fixed by the patch there: 
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html
Waiting for review.

FX



Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-08-05 Thread Richard Biener
On Tue, Aug 4, 2015 at 6:49 PM, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Tuesday, August 04, 2015 4:07 PM
>> To: Kumar, Venkataramanan
>> Cc: Jeff Law; Jakub Jelinek; gcc-patches@gcc.gnu.org
>> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with
>> power 2 integer constant.
>>
>> On Tue, Aug 4, 2015 at 10:52 AM, Kumar, Venkataramanan
>>  wrote:
>> > Hi Jeff,
>> >
>> >> -Original Message-
>> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> >> ow...@gcc.gnu.org] On Behalf Of Jeff Law
>> >> Sent: Monday, August 03, 2015 11:42 PM
>> >> To: Kumar, Venkataramanan; Jakub Jelinek
>> >> Cc: Richard Beiner (richard.guent...@gmail.com);
>> >> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult
>> >> expr with power 2 integer constant.
>> >>
>> >> On 08/02/2015 05:03 AM, Kumar, Venkataramanan wrote:
>> >> > Hi Jakub,
>> >> >
>> >> > Thank you for reviewing the patch.
>> >> >
>> >> > I have incorporated your comments in the attached patch.
>> >> Note Jakub is on PTO for the next 3 weeks.
>> >
>> >  Thank you for this information.
>> >
>> >>
>> >>
>> >> >
>> >> >
>> >> >
>> >> > vectorize_mults_via_shift.diff.txt
>> >> >
>> >> >
>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
>> >> > b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
>> >> Jakub would probably like more testcases :-)
>> >>
>> >> The most obvious thing to test would be other shift factors.
>> >>
>> >> A negative test to verify we don't try to turn a multiply by
>> >> non-constant or multiply by a constant that is not a power of 2 into 
>> >> shifts.
>> >
>> > I have added negative test in the attached patch.
>> >
>> >
>> >>
>> >> [ Would it make sense, for example, to turn a multiply by 3 into a
>> >> shift-add sequence?  As Jakub said, choose_mult_variant can be your
>> >> friend. ]
>> >
>> > Yes I will do that in a follow up patch.
>> >
>> > The new change log becomes
>> >
>> > gcc/ChangeLog
>> > 2015-08-04  Venkataramanan Kumar
>> 
>> >  * tree-vect-patterns.c (vect_recog_mult_pattern): New function for
>> vectorizing
>> > multiplication patterns.
>> >  * tree-vectorizer.h: Adjust the number of patterns.
>> >
>> > gcc/testsuite/ChangeLog
>> > 2015-08-04  Venkataramanan Kumar
>> 
>> >  * gcc.dg/vect/vect-mult-pattern-1.c: New
>> > * gcc.dg/vect/vect-mult-pattern-2.c: New
>> >
>> > Bootstrapped and reg tested on aarch64-unknown-linux-gnu.
>> >
>> > Ok for trunk ?
>>
>> +  if (TREE_CODE (oprnd0) != SSA_NAME
>> +  || TREE_CODE (oprnd1) != INTEGER_CST
>> +  || TREE_CODE (itype) != INTEGER_TYPE
>>
>> INTEGRAL_TYPE_P (itype)
>>
>> +  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);  if
>> + (!optab
>> +  || optab_handler (optab, TYPE_MODE (vectype)) ==
>> CODE_FOR_nothing)
>> +   return NULL;
>> +
>>
>> indent of the return stmt looks wrong
>>
>> +  /* Handle constant operands that are postive or negative powers of 2.
>> + */  if ( wi::exact_log2 (oprnd1) != -1  ||
>> +   wi::exact_log2 (wi::neg (oprnd1)) != -1)
>>
>> no space after (, || goes to the next line.
>>
>> +{
>> +  tree shift;
>> +
>> +  if (wi::exact_log2 (oprnd1) != -1)
>>
>> please cache wi::exact_log2
>>
>> in fact the first if () looks redundant if you simply put an else return NULL
>> after a else if (wi::exact_log2 (wi::neg (oprnd1)) != -1)
>>
>> Note that the issue with INT_MIN is that wi::neg (INT_MIN) is INT_MIN
>> again, but it seems that wi::exact_log2 returns -1 in that case so you are 
>> fine
>> (and in fact not handling this case).
>>
>
> I have updated your review comments in the attached patch.
>
> For the INT_MIN case, I am getting  vectorized output with the patch.   I 
> believe x86_64 also vectorizes but does not negates the results.
>
> #include 
> unsigned long int  __attribute__ ((aligned (64)))arr[100];
>
> int i;
> #if 1
> void test_vector_shifts()
> {
> for(i=0; i<=99;i++)
> arr[i]=arr[i] * INT_MIN;
> }
> #endif
>
> void test_vectorshift_via_mul()
> {
> for(i=0; i<=99;i++)
> arr[i]=arr[i]*(-INT_MIN);
>
> }
>
> Before
> -
> ldr x1, [x0]
> neg x1, x1, lsl 31
> str x1, [x0], 8
> cmp x0, x2
>
> After
> ---
> ldr q0, [x0]
> shl v0.2d, v0.2d, 31
> neg v0.2d, v0.2d
> str q0, [x0], 16
> cmp x1, x0
>
> is this fine ?

Btw, the patch is ok for trunk.  It looks like it does the correct
thing for LONG_MIN.
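
The rewrite discussed above (a multiply by a constant that is a power of
two, or the negation of one, becomes a shift plus an optional negate) can
be sketched in plain C.  This is only an illustration under stated
assumptions: toy_exact_log2 and toy_mult_by_pow2 are made-up names, not
wi::exact_log2 or anything from the patch, and the arithmetic is done
modulo 2^64 the way it is for the unsigned long array in the testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not wi::exact_log2): return log2 of X when X is
   an exact power of two, otherwise -1.  */
static int
toy_exact_log2 (uint64_t x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Try to compute V * C with a shift, plus a negate when C is the
   negation of a power of two.  All arithmetic wraps modulo 2^64.
   Returns 1 and stores the product in *OUT when the rewrite applies,
   0 otherwise.  */
static int
toy_mult_by_pow2 (uint64_t v, uint64_t c, uint64_t *out)
{
  int shift;

  assert (out != 0);

  shift = toy_exact_log2 (c);
  if (shift != -1)
    {
      *out = v << shift;
      return 1;
    }

  /* Unsigned negation wraps, so there is no overflow to worry about.  */
  shift = toy_exact_log2 (-c);
  if (shift != -1)
    {
      *out = -(v << shift);
      return 1;
    }

  return 0;
}
```

With INT_MIN converted to a 64-bit unsigned value, the second branch is
taken with a shift count of 31, which is the shl/neg pair in the "After"
assembly.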

Thanks,
Richard.

>> Thanks,
>> Richard.
>>
>> >>
>> >>
>> >>
>> >> > @@ -2147,6 +2152,140 @@ vect_recog_vector_vector_shift_pattern
>> >> (vec *stmts,
>> >> > return pattern_stmt;
>> >> >   }
>> >> >
>> >> > +/* Detect multiplication by constant which are postive or
>> >> > +negatives of power 2,
>> >> s/postive/positive/
>> >>
>> >>
>> >> J

Re: [libquadmath, patch] Add logbq() to libquadmath

2015-08-05 Thread Dominique d'Humières
> The attached patch adds logbq() to libquadmath, with code lifted from glibc.

AFAICT there is something missing in the patch: I do not see any compilation of 
math/logbq.c and indeed no trace of logbq in libquadmath. What am I missing?

TIA

Dominique



Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread Richard Biener
On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders  wrote:
> On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
>> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
>> > From: Trevor Saunders 
>> >
>> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
>> > config/ia64/ia64-protos.h, config/ia64/ia64.c, config/ia64/ia64.h,
>> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
>> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
>> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
>> > the name of a function.
>> > * output.h (default_output_label): New prototype.
>> > * varasm.c (default_output_label): New function.
>> > * vmsdbgout.c: Include tm_p.h.
>> > * xcoffout.c: Likewise.
>>
>> Just a general remark - the GCC output machinery is known to be slow,
>> adding indirect calls might be not the very best idea without refactoring
>> some of it.
>>
>> Did you do any performance measurements for artificial testcases
>> exercising the specific bits you change?
>
> sorry about the delay, but I finally got a chance to do some perf tests
> of the first patch.  I took three test cases fold-const.ii, insn-emit.ii
> and a random .i from firefox and did 3 trials of the length of 100
> compilations.  The only non default flag was -std=gnu++11.
>
> results before patch hookizing output_ascii
>
> fold-const.ii
> real 3m18.051s
> user 2m41.340s
> sys 0m36.544s
> real 3m18.141s
> user 2m42.236s
> sys 0m35.740s
> real 3m18.297s
> user 2m42.316s
> sys 0m35.804s
>
> insn-emit.ii
> real 9m58.229s
> user 8m26.960s
> sys 1m31.224s
> real 9m57.857s
> user 8m24.616s
> sys 1m33.072s
> real 9m57.922s
> user 8m25.232s
> sys 1m32.512s
>
> mozilla.ii
> real 8m5.732s
> user 6m44.888s
> sys 1m20.764s
> real 8m5.404s
> user 6m44.468s
> sys 1m20.856s
> real 7m59.197s
> user 6m39.632s
> sys 1m19.472s
>
> after patch
>
> fold-const.ii
> real 3m18.488s
> user 2m41.972s
> sys 0m36.388s
> real 3m18.215s
> user 2m41.640s
> sys 0m36.432s
> real 3m18.368s
> user 2m42.492s
> sys 0m35.720s
>
> insn-emit.ii
> real 10m4.700s
> user 8m32.536s
> sys 1m32.120s
> real 10m4.241s
> user 8m31.456s
> sys 1m32.728s
> real 10m4.515s
> user 8m32.056s
> sys 1m32.396s
>
> mozilla.ii
> real 7m58.018s
> user 6m38.008s
> sys 1m19.924s
> real 7m59.269s
> user 6m37.736s
> sys 1m21.448s
> real 7m58.254s
> user 6m37.828s
> sys 1m20.324s
>
> So, roughly that looks to me like a range from improving by .5% to
> regressing by 1%.  I'm not sure what could cause an improvement, so I
> kind of wonder how valid these results are.

Hmm, indeed.  The speedup looks suspicious.

> Another question is how one can refactor the output machinery to be
> faster.  My first thought is to buffer text internally before calling
> stdio functions, but that seems like a giant job.

stdio functions are already buffering, so I don't know either.

But yes, going the libas route would improve things here, or for
example enhancing gas to be able to eat target binary data
without the need to encode it in printable characters...

.raw_data number-of-bytes


Makes it quite unparsable to editors of course ...

Richard.

> thanks!
>
> Trev
>
> far outside of noise,
>>
>> Richard.
>>
>> > ---
>> >  gcc/config/arc/arc.h  |  3 +--
>> >  gcc/config/bfin/bfin.h|  5 +
>> >  gcc/config/frv/frv.h  |  6 +-
>> >  gcc/config/ia64/ia64-protos.h |  1 +
>> >  gcc/config/ia64/ia64.c| 11 +++
>> >  gcc/config/ia64/ia64.h|  8 +---
>> >  gcc/config/lm32/lm32.h|  3 +--
>> >  gcc/config/mep/mep.h  |  8 +---
>> >  gcc/config/mmix/mmix.h|  3 +--
>> >  gcc/config/pa/pa-protos.h |  1 +
>> >  gcc/config/pa/pa.c| 12 
>> >  gcc/config/pa/pa.h|  9 +
>> >  gcc/config/rs6000/rs6000-protos.h |  1 +
>> >  gcc/config/rs6000/rs6000.c|  8 
>> >  gcc/config/rs6000/xcoff.h |  3 +--
>> >  gcc/config/spu/spu.h  |  3 +--
>> >  gcc/config/visium/visium.h|  3 +--
>> >  gcc/defaults.h|  6 +-
>> >  gcc/output.h  |  3 +++
>> >  gcc/varasm.c  |  9 +
>> >  gcc/vmsdbgout.c   |  1 +
>> >  gcc/xcoffout.c|  1 +
>> >  22 files changed, 60 insertions(+), 48 deletions(-)
>> >
>> > diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
>> > index d98cce1..d3747b9 100644
>> > --- a/gcc/config/arc/arc.h
>> > +++ b/gcc/config/arc/arc.h
>> > @@ -1245,8 +1245,7 @@ do { 
>> >  \
>> >
>> >  /* This is how to output the definition of a user-level label named NAME,
>> >

Re: [PATCH][AArch64] elf toolchain does not pass -shared linker option

2015-08-05 Thread Christophe Lyon
On 24 July 2015 at 18:18, Szabolcs Nagy  wrote:
> On 24/07/15 14:20, Marcus Shawcroft wrote:
>> On 22 July 2015 at 18:22, Szabolcs Nagy  wrote:
>>
>>> 2015-07-22  Szabolcs Nagy  
>>>
>>> * config/aarch64/aarch64-elf-raw.h (LINK_SPEC): Handle -h, -static,
>>> -shared, -symbolic, -rdynamic.
>>
>> OK, this should be backported to 5 and 4.9 as well.
>
> Committed to trunk in r226159.
> Backported to 5 in r226166.
> Backported to 4.9 in r226171.
>
Hi,

Since these commits, I am seeing g++.dg/ipa/devirt-28a.C failing on
aarch64-none-elf target.
This is because the testcase uses -Wl,--no-undefined, and the linker
complains about undefined references to libc symbols.

As a workaround, I added '-lc' to the *libgloss entry in rdimon.specs.

Should this change be applied to newlib/libgloss, or did I
mis-configure something?

Thanks

Christophe.


Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3

2015-08-05 Thread Paul Richard Thomas
Dear All,

I had some unexpected regressions, which turned out to be associated
with mulling over FX's problem with intrinsic IEEE modules.

Sending        gcc/fortran/ChangeLog
Sending        gcc/fortran/module.c
Sending        gcc/fortran/trans-decl.c
Sending        gcc/testsuite/ChangeLog
Sending        gcc/testsuite/gfortran.dg/public_private_module_2.f90
Sending        gcc/testsuite/gfortran.dg/public_private_module_6.f90
Sending        gcc/testsuite/gfortran.dg/submodule_1.f08
Adding         gcc/testsuite/gfortran.dg/submodule_10.f08
Sending        gcc/testsuite/gfortran.dg/submodule_5.f08
Sending        gcc/testsuite/gfortran.dg/submodule_9.f08
Sending        gcc/testsuite/lib/fortran-modules.exp
Transmitting file data ...
Committed revision 226622.

The final step is documentation and wiki updates.

Cheers

Paul

On 4 August 2015 at 11:40, Paul Richard Thomas
 wrote:
> Dear Mikael,
>
> Thanks for your comments. I will commit the patch tonight. If folk get
> steamed up about .smod files appearing when they compile their
> favourite non-submodule-based code, I guess that we can put in a
> compilation flag to suppress them. We have plenty of time to tweak
> this before the release of 6 branch.
>
> Once committed, I will get on with the documentation and updating of
> gfortran wiki.
>
> Cheers
>
> Paul
>
> On 3 August 2015 at 17:39, Mikael Morin  wrote:
>> Le 03/08/2015 14:36, Paul Richard Thomas a écrit :
>>>
>>> Dear Mikael,
>>>
>>> Thanks for your green light!
>>>
>>> I have been mulling over the trans-decl part of the patch and have
>>> been wondering if it is necessary.
>>
>> You mean marking entities as public?  Or setting the hidden visibility
>> attribute?  Or both?
>> I think both are necessary.
>>
>>> Without optimization, private
>>> entities can be linked to. Given the discussion concerning the
>>> combination of submodules and private entities, I wonder if this is
>>> not sufficient? Within submodule scope, an advisory could be given for
>>> undefined references to suggest recompiling the module without
>>> optimization or making the entities public.
>>>
>> About recompiling without optimization:
>> If the module contains no code, I guess that would be OK.
>> But otherwise, it would be pretty bad.
>> And one would have to do the same for submodules of a submodule: the parent
>> submodule would be compiled without optimization. :-(
>>
>> About making the entities public:
>> I think the goal of submodules is providing a way to specify a (hopefully)
>> stable interface free of any internal implementation details that users
>> would start playing with if the opportunity was given to them.  Making all
>> entities public would go against that.
>>
>>
>> I've been reading about the hidden visibility attribute since you submitted
>> the 3/3 patch(es).  I think it's the right thing. :-)
>>
>> Mikael
>
>
>
> --
> Outside of a dog, a book is a man's best friend. Inside of a dog it's
> too dark to read.
>
> Groucho Marx



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


[PATCH] Fix PR67121

2015-08-05 Thread Richard Biener

Similar to if-combine, if-conversion invalidates range info on stmts
that are executed unconditionally only after the transform.  Thus
we have to reset it.
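
As a rough illustration of the idea (a toy model, not the GCC data
structures): every name below carries an optional value range, and when
blocks are merged into one unconditional block, ranges on names defined in
formerly predicated blocks are dropped because they may have been derived
from the predicate.  toy_name, toy_block and toy_combine_blocks are
made-up names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, not GCC data structures.  */
struct toy_name
{
  int has_range;            /* Nonzero when [min, max] is trusted.  */
  int min, max;
};

struct toy_block
{
  int predicated;           /* Nonzero if the block ran only under a
                               condition before merging.  */
  struct toy_name **defs;   /* Names defined by stmts in the block.  */
  size_t num_defs;
};

/* Model of merging NUM_BLOCKS blocks into one unconditional block: any
   range recorded for a name defined in a predicated block may have been
   derived from that predicate, so it is dropped.  Ranges on names from
   blocks that were already unconditional are kept.  */
static void
toy_combine_blocks (struct toy_block *blocks, size_t num_blocks)
{
  assert (blocks != NULL || num_blocks == 0);

  for (size_t i = 0; i < num_blocks; i++)
    {
      if (!blocks[i].predicated)
        continue;
      for (size_t j = 0; j < blocks[i].num_defs; j++)
        blocks[i].defs[j]->has_range = 0;
      blocks[i].predicated = 0;
    }
}
```

The per-block flag plays the role of the predicated[] array in the patch
below: it has to be recorded before the predicates are freed.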

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-08-05  Richard Biener  

PR tree-optimization/67121
* tree-if-conv.c (combine_blocks): Clear range-info produced
by stmts no longer executed conditionally.

* gcc.dg/torture/pr67121.c: New testcase.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 226612)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2199,9 +2199,11 @@ combine_blocks (struct loop *loop, bool
   /* Merge basic blocks: first remove all the edges in the loop,
  except for those from the exit block.  */
   exit_bb = NULL;
+  bool *predicated = XNEWVEC (bool, orig_loop_num_nodes);
   for (i = 0; i < orig_loop_num_nodes; i++)
 {
   bb = ifc_bbs[i];
+  predicated[i] = !is_true_predicate (bb_predicate (bb));
   free_bb_predicate (bb);
   if (bb_with_exit_edge_p (loop, bb))
{
@@ -2259,9 +2261,21 @@ combine_blocks (struct loop *loop, bool
   if (bb == exit_bb || bb == loop->latch)
continue;
 
-  /* Make stmts member of loop->header.  */
+  /* Make stmts member of loop->header and clear range info from all stmts
+in BB which is now no longer executed conditional on a predicate we
+could have derived it from.  */
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   gimple_set_bb (gsi_stmt (gsi), merge_target_bb);
+   {
+ gimple stmt = gsi_stmt (gsi);
+ gimple_set_bb (stmt, merge_target_bb);
+ if (predicated[i])
+   {
+ ssa_op_iter i;
+ tree op;
+ FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
+   reset_flow_sensitive_info (op);
+   }
+   }
 
   /* Update stmt list.  */
   last = gsi_last_bb (merge_target_bb);
@@ -2281,6 +2295,7 @@ combine_blocks (struct loop *loop, bool
 
   free (ifc_bbs);
   ifc_bbs = NULL;
+  free (predicated);
 }
 
 /* Version LOOP before if-converting it; the original loop
Index: gcc/testsuite/gcc.dg/torture/pr67121.c
===
--- gcc/testsuite/gcc.dg/torture/pr67121.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr67121.c  (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+
+int a[6], b, c = 226, d, e, f;
+signed char g;
+
+void
+fn1 (int p1)
+{
+  b = a[p1];
+}
+
+int
+main ()
+{
+  a[0] = 1;
+  for (f = 0; f < 9; f++)
+{
+  signed char h = c;
+  int i = 1;
+  g = h < 0 ? h : h >> i;
+  e = g;
+  for (d = 1; d; d = 0)
+   ;
+}
+  fn1 (g >> 8 & 1);
+
+  if (b != 0) 
+__builtin_abort (); 
+
+  return 0;
+}


[PATCH] Fix PR67120

2015-08-05 Thread Richard Biener

The following fixes comparing of &i and &i if i is volatile.  We were
using operand_equal_p to compare i with i, which of course results in
a false negative.  The following restricts us to the interesting
cases (SSA names and decls) and then simply uses ==.
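
A small model of the three-way answer the patch computes, assuming
made-up types (toy_base, toy_kind) rather than trees: 1 means the bases
are the same object, 0 means they differ, 2 means nothing is known.  The
toy_structural_equal function stands in for a structural comparison that
conservatively says "not equal" for volatile operands; that behaviour is
an assumption of this sketch.

```c
#include <assert.h>

enum toy_kind { TOY_DECL, TOY_SSA_NAME, TOY_OTHER };

struct toy_base
{
  enum toy_kind kind;
  int is_volatile;
};

/* Toy version of a structural comparison that conservatively answers
   "not equal" whenever a volatile operand is involved.  */
static int
toy_structural_equal (const struct toy_base *a, const struct toy_base *b)
{
  if (a->is_volatile || b->is_volatile)
    return 0;
  return a == b;
}

/* Return 1 if the two bases are the same object, 0 if they are not,
   and 2 if nothing can be concluded.  Only decls and SSA names are
   handled, and for those plain pointer identity decides; volatility
   of the object is irrelevant when only its address is compared.  */
static int
toy_bases_equal (const struct toy_base *base0, const struct toy_base *base1)
{
  assert (base0 && base1);

  if (base0->kind != TOY_OTHER && base1->kind != TOY_OTHER)
    return base0 == base1;
  return 2;
}
```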

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-08-05  Richard Biener  

PR middle-end/67120
* match.pd: Compare address bases with == if they are decls
or SSA names, not operand_equal_p.  Otherwise fail.

* gcc.dg/torture/pr67120.c: New testcase.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 226612)
+++ gcc/match.pd(working copy)
@@ -1848,13 +1920,14 @@ (define_operator_list CBRT BUILT_IN_CBRT
(if (base0 && base1)
 (with
  {
-   int equal;
+   int equal = 2;
if (decl_in_symtab_p (base0)
   && decl_in_symtab_p (base1))
  equal = symtab_node::get_create (base0)
   ->equal_address_to (symtab_node::get_create (base1));
-   else
- equal = operand_equal_p (base0, base1, 0);
+   else if ((DECL_P (base0) || TREE_CODE (base0) == SSA_NAME)
+   && (DECL_P (base1) || TREE_CODE (base1) == SSA_NAME))
+ equal = (base0 == base1);
  }
  (if (equal == 1
  && (cmp == EQ_EXPR || cmp == NE_EXPR
Index: gcc/testsuite/gcc.dg/torture/pr67120.c
===
--- gcc/testsuite/gcc.dg/torture/pr67120.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr67120.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+
+volatile int *volatile *a;
+static volatile int *volatile **b = &a;
+
+int
+main ()
+{
+  volatile int *volatile c;
+  *b = &c;
+
+  if (a != &c) 
+__builtin_abort (); 
+
+  return 0;
+}


Re: [libquadmath, patch] Add logbq() to libquadmath

2015-08-05 Thread FX
> AFAICT there is something missing in the patch: I do not see any compilation 
> of math/logbq.c and indeed no trace of logbq in libquadmath. What I am 
> missing?

Maybe you didn’t regenerate the Makefile.in?
The patch was sent without this regenerated file, as is (as I understand) the 
custom on gcc-patches.

Attached is the full diff, including Makefile.in.

FX



x.diff
Description: Binary data


[PATCH] Cleanup gimple.h accesses to ops array

2015-08-05 Thread Richard Biener

This cleans up the accesses to ops arrays in gimple accessors that already
take a code-specific class as argument.  There is no need to go through
the indirection of gimple_ops () computing the offset of the ops array
at runtime.  For all cases there is also already index checking in place
in the accessor or the class is always allocated with at least the number
of operands that are accessed.
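
To illustrate the difference (a sketch with invented toy_* types, not the
gimple class hierarchy): a generic accessor has to look up where the
operand array lives for the statement's code at runtime, while an accessor
that already receives the code-specific type can index the member
directly.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_COND, TOY_LABEL, TOY_NUM_CODES };

/* Common header shared by all toy statements.  */
struct toy_stmt
{
  enum toy_code code;
  unsigned num_ops;
};

struct toy_cond
{
  struct toy_stmt base;
  void *op[4];
};

struct toy_label
{
  struct toy_stmt base;
  void *op[1];
};

/* Offset of the operand array for each statement code.  */
static const size_t toy_ops_offset[TOY_NUM_CODES] = {
  offsetof (struct toy_cond, op),
  offsetof (struct toy_label, op)
};

/* Generic path: find the operand array through the offset table.  */
static void **
toy_ops (struct toy_stmt *s)
{
  return (void **) ((char *) s + toy_ops_offset[s->code]);
}

/* Generic setter with index checking.  */
static void
toy_set_op (struct toy_stmt *s, unsigned i, void *v)
{
  assert (i < s->num_ops);
  toy_ops (s)[i] = v;
}

/* Code-specific path: the type already says where op[] is and that it
   has at least one element, so no lookup or check is needed.  */
static void
toy_cond_set_lhs (struct toy_cond *c, void *lhs)
{
  c->op[0] = lhs;
}
```

Both paths write the same slot; the second one just skips the table
lookup.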

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2015-08-05  Richard Biener  

* gimple.h (gimple_call_set_fn): Access op member directly.
(gimple_call_chain_ptr): Likewise.
(gimple_call_set_chain): Likewise.
(gimple_cond_lhs_ptr): Likewise.
(gimple_cond_set_lhs): Likewise.
(gimple_cond_rhs_ptr): Likewise.
(gimple_cond_set_rhs): Likewise.
(gimple_cond_true_label): Likewise.
(gimple_cond_set_true_label): Likewise.
(gimple_cond_set_false_label): Likewise.
(gimple_cond_false_label): Likewise.
(gimple_label_label): Likewise.
(gimple_label_set_label): Likewise.
(gimple_goto_set_dest): Likewise.
(gimple_asm_input_op): Likewise.
(gimple_asm_input_op_ptr): Likewise.
(gimple_asm_set_input_op): Likewise.
(gimple_asm_output_op): Likewise.
(gimple_asm_output_op_ptr): Likewise.
(gimple_asm_set_output_op): Likewise.
(gimple_asm_clobber_op): Likewise.
(gimple_asm_set_clobber_op): Likewise.
(gimple_asm_label_op): Likewise.
(gimple_asm_set_label_op): Likewise.
(gimple_switch_index): Likewise.
(gimple_switch_index_ptr): Likewise.
(gimple_return_retval_ptr): Likewise.
(gimple_return_retval): Likewise.
(gimple_return_set_retval): Likewise.
(gimple_switch_set_index): Likewise.  Remove superfluous GIMPLE_CHECK.
(gimple_switch_label): Likewise.
(gimple_switch_set_label): Likewise.

Index: gcc/gimple.h
===
--- gcc/gimple.h(revision 226623)
+++ gcc/gimple.h(working copy)
@@ -2757,7 +2757,7 @@ static inline void
 gimple_call_set_fn (gcall *gs, tree fn)
 {
   gcc_gimple_checking_assert (!gimple_call_internal_p (gs));
-  gimple_set_op (gs, 1, fn);
+  gs->op[1] = fn;
 }
 
 
@@ -2826,7 +2826,7 @@ gimple_call_chain (const_gimple gs)
 static inline tree *
 gimple_call_chain_ptr (const gcall *call_stmt)
 {
-  return gimple_op_ptr (call_stmt, 2);
+  return const_cast<tree *> (&call_stmt->op[2]);
 }
 
 /* Set CHAIN to be the static chain for call statement CALL_STMT.  */
@@ -2834,7 +2834,7 @@ gimple_call_chain_ptr (const gcall *call
 static inline void
 gimple_call_set_chain (gcall *call_stmt, tree chain)
 {
-  gimple_set_op (call_stmt, 2, chain);
+  call_stmt->op[2] = chain;
 }
 
 
@@ -3099,7 +3099,7 @@ gimple_cond_lhs (const_gimple gs)
 static inline tree *
 gimple_cond_lhs_ptr (const gcond *gs)
 {
-  return gimple_op_ptr (gs, 0);
+  return const_cast<tree *> (&gs->op[0]);
 }
 
 /* Set LHS to be the LHS operand of the predicate computed by
@@ -3108,7 +3108,7 @@ gimple_cond_lhs_ptr (const gcond *gs)
 static inline void
 gimple_cond_set_lhs (gcond *gs, tree lhs)
 {
-  gimple_set_op (gs, 0, lhs);
+  gs->op[0] = lhs;
 }
 
 
@@ -3127,7 +3127,7 @@ gimple_cond_rhs (const_gimple gs)
 static inline tree *
 gimple_cond_rhs_ptr (const gcond *gs)
 {
-  return gimple_op_ptr (gs, 1);
+  return const_cast<tree *> (&gs->op[1]);
 }
 
 
@@ -3137,7 +3137,7 @@ gimple_cond_rhs_ptr (const gcond *gs)
 static inline void
 gimple_cond_set_rhs (gcond *gs, tree rhs)
 {
-  gimple_set_op (gs, 1, rhs);
+  gs->op[1] = rhs;
 }
 
 
@@ -3147,7 +3147,7 @@ gimple_cond_set_rhs (gcond *gs, tree rhs
 static inline tree
 gimple_cond_true_label (const gcond *gs)
 {
-  return gimple_op (gs, 2);
+  return gs->op[2];
 }
 
 
@@ -3157,7 +3157,7 @@ gimple_cond_true_label (const gcond *gs)
 static inline void
 gimple_cond_set_true_label (gcond *gs, tree label)
 {
-  gimple_set_op (gs, 2, label);
+  gs->op[2] = label;
 }
 
 
@@ -3167,7 +3167,7 @@ gimple_cond_set_true_label (gcond *gs, t
 static inline void
 gimple_cond_set_false_label (gcond *gs, tree label)
 {
-  gimple_set_op (gs, 3, label);
+  gs->op[3] = label;
 }
 
 
@@ -3177,8 +3177,7 @@ gimple_cond_set_false_label (gcond *gs,
 static inline tree
 gimple_cond_false_label (const gcond *gs)
 {
-
-  return gimple_op (gs, 3);
+  return gs->op[3];
 }
 
 
@@ -3269,7 +3268,7 @@ gimple_cond_set_condition (gcond *stmt,
 static inline tree
 gimple_label_label (const glabel *gs)
 {
-  return gimple_op (gs, 0);
+  return gs->op[0];
 }
 
 
@@ -3279,7 +3278,7 @@ gimple_label_label (const glabel *gs)
 static inline void
 gimple_label_set_label (glabel *gs, tree label)
 {
-  gimple_set_op (gs, 0, label);
+  gs->op[0] = label;
 }
 
 
@@ -3298,7 +3297,7 @@ gimple_goto_dest (const_gimple gs)
 static inline void
 gimple_goto_set_dest (ggoto *gs, tree dest)
 {
-  gimple_set_op (gs, 0, dest);
+  gs->op[0] = dest;
 }
 
 
@@ -3436,7 +34

Re: [PATCH] Add X != !X pattern

2015-08-05 Thread James Greenhalgh
On Wed, Aug 05, 2015 at 12:09:35PM +0100, Richard Biener wrote:
> On Wed, 5 Aug 2015, Andrew Pinski wrote:
> 
> > On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  wrote:
> > > On Wed, 5 Aug 2015, Andreas Schwab wrote:
> > >
> > >> Richard Biener  writes:
> > >>
> > >> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
> > >> > bool compares on RHS.
> > >> > * match.pd: Add X ==/!= !X is false/true pattern.
> > >>
> > >> ERROR in VTST/VTSTQ 
> > >> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> > >>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 0x1 
> > >> != 0xff  (signed input)
> > >> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution test
> > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
> > >> \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ \t]*v[0-9]+.[0-9]+[bshd],[ 
> > >> \t]+v[0-9]+.[0-9]+[bshd] 14
> > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times [ 
> > >> \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ \t]+d[0-9]+ 4
> > >> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> > >> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c scan-assembler-times 
> > >> \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
> > >
> > > Ick - somebody will have to come up with a reduced testcase for one of
> > > this (best an execute fail).  Reduced to one failing case so I can
> > > investigate with a cross compiler.
> > >
> > > Eventually smells like a aarch64 vector specific issue or a latent
> > > issue with the truth_valued_p predicate for vector types.
> > 
> > Or constant_boolean_node is not returning {-1,-1,-1,-1} for true vectors.
> 
> It does.

You could try with the attached (execute) testcase.

Output for me (x86_64/AArch64 trunk compiler) is:

  Expected: ff00 Got: 7f80807f8000

Those folded values look suspicious! We fold as so:

arg1_2 = { -128, -1, 127, -122, -128, -1, 0, 118 };
arg2_3 = { 127, -128, 127, -128, -1, 127, 127, 0 };

Visiting statement:
_5 = arg1_2 & arg2_3;
which is likely CONSTANT
Match-and-simplified arg1_2 & arg2_3 to { 0, -128, 127, -128, -128, 127, 0, 0 }
Lattice value changed to CONSTANT { 0, -128, 127, -128, -128, 127, 0, 0 }.  
Adding SSA edges to worklist.
interesting_ssa_edges: adding SSA use in _13 = VIEW_CONVERT_EXPR(_5);
marking stmt to be not simulated again

I'd have expected masks of "-1" in the true vector lanes rather than what
we end up with.

Thanks,
James 

#include <stdio.h>
#include <inttypes.h>

typedef int8_t int8x8_t __attribute__ ((vector_size (8)));
typedef uint8_t uint8x8_t __attribute__ ((vector_size (8)));
typedef uint64_t uint64x1_t __attribute__ ((vector_size (8)));

__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
vtst_s8 (int8x8_t __a, int8x8_t __b)
{
  return (uint8x8_t) ((__a & __b) != 0);
}

int
main (int argc, char** argv)
{
  int8x8_t arg1 = (int8x8_t) UINT64_C (0x7600ff80867fff80);
  int8x8_t arg2 = (int8x8_t) UINT64_C (0x007f7fff807f807f);
  uint8x8_t result;
  uint64_t got;

  /*  Expected result = ff00  */
  result = vtst_s8(arg1, arg2);
  got = ((uint64x1_t) result)[0];
  uint64_t expected = UINT64_C(0xff00);
  if(expected != got)
{
  printf("Expected: %016" PRIx64 " Got: %016" PRIx64 "\n", expected, got);
}
}



Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread Trevor Saunders
On Wed, Aug 05, 2015 at 01:47:30PM +0200, Richard Biener wrote:
> On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders  
> wrote:
> > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
> >> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
> >> > From: Trevor Saunders 
> >> >
> >> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
> >> > config/ia64/ia64-protos.h, config/ia64/ia64.c, 
> >> > config/ia64/ia64.h,
> >> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
> >> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
> >> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
> >> > the name of a function.
> >> > * output.h (default_output_label): New prototype.
> >> > * varasm.c (default_output_label): New function.
> >> > * vmsdbgout.c: Include tm_p.h.
> >> > * xcoffout.c: Likewise.
> >>
> >> Just a general remark - the GCC output machinery is known to be slow,
> >> adding indirect calls might be not the very best idea without refactoring
> >> some of it.
> >>
> >> Did you do any performance measurements for artificial testcases
> >> exercising the specific bits you change?
> >
> > sorry about the delay, but I finally got a chance to do some perf tests
> > of the first patch.  I took three test cases fold-const.ii, insn-emit.ii
> > and a random .i from firefox and did 3 trials of the length of 100
> > compilations.  The only non default flag was -std=gnu++11.
> >
> > results before patch hookizing output_ascii
> >
> > fold-const.ii
> > real 3m18.051s
> > user 2m41.340s
> > sys 0m36.544s
> > real 3m18.141s
> > user 2m42.236s
> > sys 0m35.740s
> > real 3m18.297s
> > user 2m42.316s
> > sys 0m35.804s
> >
> > insn-emit.ii
> > real 9m58.229s
> > user 8m26.960s
> > sys 1m31.224s
> > real 9m57.857s
> > user 8m24.616s
> > sys 1m33.072s
> > real 9m57.922s
> > user 8m25.232s
> > sys 1m32.512s
> >
> > mozilla.ii
> > real 8m5.732s
> > user 6m44.888s
> > sys 1m20.764s
> > real 8m5.404s
> > user 6m44.468s
> > sys 1m20.856s
> > real 7m59.197s
> > user 6m39.632s
> > sys 1m19.472s
> >
> > after patch
> >
> > fold-const.ii
> > real 3m18.488s
> > user 2m41.972s
> > sys 0m36.388s
> > real 3m18.215s
> > user 2m41.640s
> > sys 0m36.432s
> > real 3m18.368s
> > user 2m42.492s
> > sys 0m35.720s
> >
> > insn-emit.ii
> > real 10m4.700s
> > user 8m32.536s
> > sys 1m32.120s
> > real 10m4.241s
> > user 8m31.456s
> > sys 1m32.728s
> > real 10m4.515s
> > user 8m32.056s
> > sys 1m32.396s
> >
> > mozilla.ii
> > real 7m58.018s
> > user 6m38.008s
> > sys 1m19.924s
> > real 7m59.269s
> > user 6m37.736s
> > sys 1m21.448s
> > real 7m58.254s
> > user 6m37.828s
> > sys 1m20.324s
> >
> > So, roughly that looks to me like a range from improving by .5% to
> > regressing by 1%.  I'm not sure what could cause an improvement, so I
> > kind of wonder how valid these results are.
> 
> Hmm, indeed.  The speedup looks suspicious.
> 
> > Another question is how one can refactor the output machinery to be
> > faster.  My first thought is to buffer text internally before calling
> > stdio functions, but that seems like a giant job.
> 
> stdio functions are already buffering, so I don't know either.

yeah, but the overhead of calling functions in libc is higher than
that for functions in gcc (especially if they can get inlined),
right?  Especially when a lot of these things seem to loop calling
putc...
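
Roughly what such internal buffering could look like, as a sketch only:
characters are collected in a local array by an inlinable function and
handed to stdio in one fwrite call per buffer.  The toy_out names and the
num_writes counter are invented for illustration and do not exist in
gcc's output code.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_BUF_SIZE 4096

struct toy_out
{
  FILE *file;
  size_t len;                /* Bytes currently held in data[].  */
  unsigned long num_writes;  /* Calls into stdio, for illustration.  */
  char data[TOY_BUF_SIZE];
};

static void
toy_out_init (struct toy_out *o, FILE *file)
{
  assert (o && file);
  o->file = file;
  o->len = 0;
  o->num_writes = 0;
}

/* Hand everything collected so far to stdio in one call.  Returns 0 on
   success, -1 on a short write.  */
static int
toy_out_flush (struct toy_out *o)
{
  size_t n = o->len;

  o->len = 0;
  o->num_writes++;
  return fwrite (o->data, 1, n, o->file) == n ? 0 : -1;
}

/* Cheap enough to inline: the common case is a compare and a store.  */
static inline int
toy_out_putc (struct toy_out *o, char c)
{
  if (o->len == TOY_BUF_SIZE && toy_out_flush (o) != 0)
    return -1;
  o->data[o->len++] = c;
  return 0;
}

static int
toy_out_puts (struct toy_out *o, const char *s)
{
  for (; *s; s++)
    if (toy_out_putc (o, *s) != 0)
      return -1;
  return 0;
}
```

Whether this beats stdio's own buffering would have to be measured; the
sketch only shows that the per-character path never leaves the caller's
translation unit.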

> But yes, going the libas route would improve things here, or for
> example enhancing gas to be able to eat target binary data
> without the need to encode it in printable characters...
> 
> .raw_data number-of-bytes
> 
> 
> Makes it quite unparsable to editors of course ...

The idea of having .S files that aren't reasonably editable seems kind
of silly, but I guess it's up to the gas people.

Trev

> 
> Richard.
> 
> > thanks!
> >
> > Trev
> >
> > far outside of noise,
> >>
> >> Richard.
> >>
> >> > ---
> >> >  gcc/config/arc/arc.h  |  3 +--
> >> >  gcc/config/bfin/bfin.h|  5 +
> >> >  gcc/config/frv/frv.h  |  6 +-
> >> >  gcc/config/ia64/ia64-protos.h |  1 +
> >> >  gcc/config/ia64/ia64.c| 11 +++
> >> >  gcc/config/ia64/ia64.h|  8 +---
> >> >  gcc/config/lm32/lm32.h|  3 +--
> >> >  gcc/config/mep/mep.h  |  8 +---
> >> >  gcc/config/mmix/mmix.h|  3 +--
> >> >  gcc/config/pa/pa-protos.h |  1 +
> >> >  gcc/config/pa/pa.c| 12 
> >> >  gcc/config/pa/pa.h|  9 +
> >> >  gcc/config/rs6000/rs6000-protos.h |  1 +
> >> >  gcc/config/rs6000/rs6000.c|  8 
> >> >  gcc/config/rs6000/xcoff.h |  3 +--
> >> >  gcc/config/spu/spu.h  |  3 +--
> >> >  gcc/config

Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, James Greenhalgh wrote:

> On Wed, Aug 05, 2015 at 12:09:35PM +0100, Richard Biener wrote:
> > On Wed, 5 Aug 2015, Andrew Pinski wrote:
> > 
> > > On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  wrote:
> > > > On Wed, 5 Aug 2015, Andreas Schwab wrote:
> > > >
> > > >> Richard Biener  writes:
> > > >>
> > > >> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
> > > >> > bool compares on RHS.
> > > >> > * match.pd: Add X ==/!= !X is false/true pattern.
> > > >>
> > > >> ERROR in VTST/VTSTQ 
> > > >> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> > > >>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 0x1 
> > > >> != 0xff  (signed input)
> > > >> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution 
> > > >> test
> > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times 
> > > >> [ \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ \t]*v[0-9]+.[0-9]+[bshd],[ 
> > > >> \t]+v[0-9]+.[0-9]+[bshd] 14
> > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c scan-assembler-times 
> > > >> [ \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ \t]+d[0-9]+ 4
> > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> > > >> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c scan-assembler-times 
> > > >> \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
> > > >
> > > > Ick - somebody will have to come up with a reduced testcase for one of
> > > > this (best an execute fail).  Reduced to one failing case so I can
> > > > investigate with a cross compiler.
> > > >
> > > > Eventually smells like a aarch64 vector specific issue or a latent
> > > > issue with the truth_valued_p predicate for vector types.
> > > 
> > > Or constant_boolean_node is not returning {-1,-1,-1,-1} for true vectors.
> > 
> > It does.
> 
> You could try with the attached (execute) testcase.
> 
> Output for me (x86_64/AArch64 trunk compiler) is:
> 
>   Expected: ff00 Got: 7f80807f8000
> 
> Those folded values look suspicious! We fold as so:
> 
> arg1_2 = { -128, -1, 127, -122, -128, -1, 0, 118 };
> arg2_3 = { 127, -128, 127, -128, -1, 127, 127, 0 };
> 
> Visiting statement:
> _5 = arg1_2 & arg2_3;
> which is likely CONSTANT
> Match-and-simplified arg1_2 & arg2_3 to { 0, -128, 127, -128, -128, 127, 0, 0 
> }
> Lattice value changed to CONSTANT { 0, -128, 127, -128, -128, 127, 0, 0 }.  
> Adding SSA edges to worklist.
> interesting_ssa_edges: adding SSA use in _13 = 
> VIEW_CONVERT_EXPR(_5);
> marking stmt to be not simulated again
> 
> I'd have expected masks of "-1" in the true vector lanes rather than what
> we end up with.

__extension__ static __inline uint8x8_t __attribute__ 
((__always_inline__))
vtst_s8 (int8x8_t __a, int8x8_t __b)
{
  return (uint8x8_t) ((__a & __b) != 0);
}

You expect that to be a truth-and, but it is a bitwise and.  So IMHO
it works "as expected".  Does the backend actually generate a truth-and
instruction for vtst_s8!?

Anyway, trying a cross now.

> Thanks,
> James 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [libquadmath, patch] Add logbq() to libquadmath

2015-08-05 Thread Dominique d'Humières

> Le 5 août 2015 à 15:11, FX  a écrit :
> 
>> AFAICT there is something missing in the patch: I do not see any compilation 
>> of math/logbq.c and indeed no trace of logbq in libquadmath. What I am 
>> missing?
> 
> Maybe you didn’t regenerate the Makefile.in?

Indeed I did not! :-( I have never succeeded with the regeneration process (not
the right version, …).

> The patch was sent without this regenerated file, as is (as I understand) the 
> custom on gcc-patches.
> 
> Attached is the full diff, including Makefile.in.

With the updated patch the test gfortran.dg/ieee/large_1.f90 compiles, but 
fails at run time due to the lines

  if (.not. ieee_support_underflow_control(x1)) call abort

and

  if (.not. ieee_support_underflow_control(x2)) call abort

IIRC Uros said that underflow is not supported for __float128.

Thanks for the answer,

Dominique

> 
> FX
> 
> 



Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Richard Biener wrote:

> On Wed, 5 Aug 2015, James Greenhalgh wrote:
> 
> > On Wed, Aug 05, 2015 at 12:09:35PM +0100, Richard Biener wrote:
> > > On Wed, 5 Aug 2015, Andrew Pinski wrote:
> > > 
> > > > On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  
> > > > wrote:
> > > > > On Wed, 5 Aug 2015, Andreas Schwab wrote:
> > > > >
> > > > >> Richard Biener  writes:
> > > > >>
> > > > >> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
> > > > >> > bool compares on RHS.
> > > > >> > * match.pd: Add X ==/!= !X is false/true pattern.
> > > > >>
> > > > >> ERROR in VTST/VTSTQ 
> > > > >> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> > > > >>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 
> > > > >> 0x1 != 0xff  (signed input)
> > > > >> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution 
> > > > >> test
> > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c 
> > > > >> scan-assembler-times [ \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ 
> > > > >> \t]*v[0-9]+.[0-9]+[bshd],[ \t]+v[0-9]+.[0-9]+[bshd] 14
> > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c 
> > > > >> scan-assembler-times [ \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ 
> > > > >> \t]+d[0-9]+ 4
> > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> > > > >> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c 
> > > > >> scan-assembler-times \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
> > > > >
> > > > > Ick - somebody will have to come up with a reduced testcase for one of
> > > > > these (best an execute fail).  Reduced to one failing case so I can
> > > > > investigate with a cross compiler.
> > > > >
> > > > > Eventually smells like an aarch64 vector specific issue or a latent
> > > > > issue with the truth_valued_p predicate for vector types.
> > > > 
> > > > Or constant_boolean_node is not returning {-1,-1,-1,-1} for true 
> > > > vectors.
> > > 
> > > It does.
> > 
> > You could try with the attached (execute) testcase.
> > 
> > Output for me (x86_64/AArch64 trunk compiler) is:
> > 
> >   Expected: ff00 Got: 7f80807f8000
> > 
> > Those folded values look suspicious! We fold as so:
> > 
> > arg1_2 = { -128, -1, 127, -122, -128, -1, 0, 118 };
> > arg2_3 = { 127, -128, 127, -128, -1, 127, 127, 0 };
> > 
> > Visiting statement:
> > _5 = arg1_2 & arg2_3;
> > which is likely CONSTANT
> > Match-and-simplified arg1_2 & arg2_3 to { 0, -128, 127, -128, -128, 127, 0, 
> > 0 }
> > Lattice value changed to CONSTANT { 0, -128, 127, -128, -128, 127, 0, 0 }.  
> > Adding SSA edges to worklist.
> > interesting_ssa_edges: adding SSA use in _13 = 
> > VIEW_CONVERT_EXPR(_5);
> > marking stmt to be not simulated again
> > 
> > I'd have expected masks of "-1" in the true vector lanes rather than what
> > we end up with.
> 
> __extension__ static __inline uint8x8_t __attribute__ 
> ((__always_inline__))
> vtst_s8 (int8x8_t __a, int8x8_t __b)
> {
>   return (uint8x8_t) ((__a & __b) != 0);
> }
> 
> you expect that to be a truth-and, but it is a bitwise and.  So IMHO
> it works "as expected".  Does the backend actually generate a truth-and
> instruction for vtst_s8!?
> 
> Anyway, trying a cross now.

Reproduces on x86_64 as well, fix in testing.

Richard.


Re: [gofrontend-dev] Re: libgo patch committed: Kill sleep processes in testsuite

2015-08-05 Thread Ian Lance Taylor
[ + Andrew Wilkins ]

On Wed, Aug 5, 2015 at 1:58 AM, Andreas Schwab  wrote:
> PASS
> kill: not enough arguments
> FAIL: net
> Makefile:4696: recipe for target 'net/check' failed
> make[4]: *** [net/check] Error 1
>
> $ cat net/check-testlog
> PASS
> kill: not enough arguments
> FAIL: net
> ../../../libgo/testsuite/gotest: line 514: gotest-timeout: No such file or 
> directory
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] Add X != !X pattern

2015-08-05 Thread James Greenhalgh
On Wed, Aug 05, 2015 at 02:38:01PM +0100, Richard Biener wrote:
> On Wed, 5 Aug 2015, James Greenhalgh wrote:
> 
> > On Wed, Aug 05, 2015 at 12:09:35PM +0100, Richard Biener wrote:
> > > On Wed, 5 Aug 2015, Andrew Pinski wrote:
> > > 
> > > > On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  
> > > > wrote:
> > > > > On Wed, 5 Aug 2015, Andreas Schwab wrote:
> > > > >
> > > > >> Richard Biener  writes:
> > > > >>
> > > > >> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): Canonicalize
> > > > >> > bool compares on RHS.
> > > > >> > * match.pd: Add X ==/!= !X is false/true pattern.
> > > > >>
> > > > >> ERROR in VTST/VTSTQ 
> > > > >> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> > > > >>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 
> > > > >> 0x1 != 0xff  (signed input)
> > > > >> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  execution 
> > > > >> test
> > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c 
> > > > >> scan-assembler-times [ \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ 
> > > > >> \t]*v[0-9]+.[0-9]+[bshd],[ \t]+v[0-9]+.[0-9]+[bshd] 14
> > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c 
> > > > >> scan-assembler-times [ \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ 
> > > > >> \t]+d[0-9]+ 4
> > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> > > > >> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c 
> > > > >> scan-assembler-times \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
> > > > >
> > > > > Ick - somebody will have to come up with a reduced testcase for one of
> > > > > these (best an execute fail).  Reduced to one failing case so I can
> > > > > investigate with a cross compiler.
> > > > >
> > > > > Eventually smells like an aarch64 vector specific issue or a latent
> > > > > issue with the truth_valued_p predicate for vector types.
> > > > 
> > > > Or constant_boolean_node is not returning {-1,-1,-1,-1} for true 
> > > > vectors.
> > > 
> > > It does.
> > 
> > You could try with the attached (execute) testcase.
> > 
> > Output for me (x86_64/AArch64 trunk compiler) is:
> > 
> >   Expected: ff00 Got: 7f80807f8000
> > 
> > Those folded values look suspicious! We fold as so:
> > 
> > arg1_2 = { -128, -1, 127, -122, -128, -1, 0, 118 };
> > arg2_3 = { 127, -128, 127, -128, -1, 127, 127, 0 };
> > 
> > Visiting statement:
> > _5 = arg1_2 & arg2_3;
> > which is likely CONSTANT
> > Match-and-simplified arg1_2 & arg2_3 to { 0, -128, 127, -128, -128, 127, 0, 
> > 0 }
> > Lattice value changed to CONSTANT { 0, -128, 127, -128, -128, 127, 0, 0 }.  
> > Adding SSA edges to worklist.
> > interesting_ssa_edges: adding SSA use in _13 = 
> > VIEW_CONVERT_EXPR(_5);
> > marking stmt to be not simulated again
> > 
> > I'd have expected masks of "-1" in the true vector lanes rather than what
> > we end up with.
> 
> __extension__ static __inline uint8x8_t __attribute__ 
> ((__always_inline__))
> vtst_s8 (int8x8_t __a, int8x8_t __b)
> {
>   return (uint8x8_t) ((__a & __b) != 0);
> }
> 
> you expect that to be a truth-and, but it is a bitwise and.  So IMHO
> it works "as expected".  Does the backend actually generate a truth-and
> instruction for vtst_s8!?

Sorry, I pasted an unhelpful part of the log...

I'd have expected the "!= 0" to give a truth value, but we've already lost
that after early inline:

  arg1_2 = { -128, -1, 127, -122, -128, -1, 0, 118 };
  arg2_3 = { 127, -128, 127, -128, -1, 127, 127, 0 };
  _5 = arg1_2 & arg2_3;
  _13 = VIEW_CONVERT_EXPR(_5);
  _14 = _13;
  result_6 = _14;
  _7 = VIEW_CONVERT_EXPR(result_6);
  _8 = _7;
  got_9 = BIT_FIELD_REF <_8, 64, 0>;

After which, 

  _5 = arg1_2 & arg2_3;

folds to non-truth values as expected (as you say, that is a bitwise and) but
we've dropped the "!= 0" which was in vtst_s8 and that we were relying on
to get our truth value masks out.

vtst_s8 (int8x8_tD.2338 __aD.2341, int8x8_tD.2338 __bD.2342)
{
  vector(8) signed charD.17 _3;
  vector(8) signed charD.17 _4;
  uint8x8_tD.2339 _5;

;;   basic block 2, loop depth 0, count 0, freq 0, maybe hot
;;prev block 0, next block 1, flags: (NEW, REACHABLE)
;;pred:   ENTRY (FALLTHRU)
  _3 = __a_1(D) & __b_2(D);
  _4 = _3 != { 0, 0, 0, 0, 0, 0, 0, 0 };
  _5 = VIEW_CONVERT_EXPR(_4);
  # VUSE <.MEM_6(D)>
  return _5;
;;succ:   EXIT
}
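
For illustration, the lane-by-lane arithmetic can be checked with a small
scalar model.  This is only a sketch in plain C, not the arm_neon.h
implementation: vtst_s8_ref, and_only_ref and pack_lanes are hypothetical
helper names, and packing lane 0 into the low byte is an assumption chosen to
match the 64-bit value the testcase prints.

```c
#include <stdint.h>

/* Pack eight 8-bit lanes into one 64-bit value, lane 0 in the low byte
   (assumed layout, see the note above).  */
static uint64_t pack_lanes(const uint8_t lanes[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t) lanes[i] << (8 * i);
    return v;
}

/* Scalar model of what vtst_s8 is meant to compute: each lane becomes
   all-ones when (a & b) is nonzero and all-zeros otherwise.  */
uint64_t vtst_s8_ref(const int8_t a[8], const int8_t b[8])
{
    uint8_t out[8];
    for (int i = 0; i < 8; i++)
        out[i] = ((a[i] & b[i]) != 0) ? 0xff : 0x00;
    return pack_lanes(out);
}

/* Scalar model of the miscompiled form, where the "!= 0" has been dropped
   and only the bitwise and is left.  */
uint64_t and_only_ref(const int8_t a[8], const int8_t b[8])
{
    uint8_t out[8];
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t) (a[i] & b[i]);
    return pack_lanes(out);
}
```

With the arg1_2/arg2_3 constants from the dump, and_only_ref gives
0x00007f80807f8000, which is the "Got" value reported earlier, while
vtst_s8_ref gives 0x0000ffffffffff00, i.e. all-ones in every lane where the
bitwise and is nonzero.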

Sorry for the poor communication.

James

> 
> Anyway, trying a cross now.
> 
> > Thanks,
> > James 
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
> Norton, HRB 21284 (AG Nuernberg)
> 


RFA: RL78: Fix multiply costs when optimizing for size

2015-08-05 Thread Nick Clifton
Hi DJ,

  The patch below fixes a small problem with the RL78 backend.  When
  optimizing for size it is better to use a slow multiply instruction
  than a faster, but larger, shift sequence.  So the patch tweaks the
  rtx costs for MULT insns when speed is not a priority.

  Tested with no regressions on an rl78-elf toolchain.

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2015-08-05  Nick Clifton  

* config/rl78/rl78.c (rl78_rtx_costs): Treat MULT insns as cheap
if optimizing for size.

Index: gcc/config/rl78/rl78.c
===
RCS file: /cvs/cvsfiles/gnupro/gcc/config/rl78/rl78.c,v
retrieving revision 1.12.6.15
diff -u -3 -p -r1.12.6.15 rl78.c
--- gcc/config/rl78/rl78.c  29 Jul 2015 12:24:04 -  1.12.6.15
+++ gcc/config/rl78/rl78.c  30 Jul 2015 15:20:10 -
@@ -4161,7 +4161,9 @@ static bool rl78_rtx_costs (rtx   x,
   switch (code)
{
case MULT:
- if (RL78_MUL_G14)
+ if (! speed)
+   * total = COSTS_N_INSNS (5);
+ else if (RL78_MUL_G14)
*total = COSTS_N_INSNS (14);
  else if (RL78_MUL_G13)
*total = COSTS_N_INSNS (29);


Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Richard Biener
On Wed, 5 Aug 2015, Richard Biener wrote:

> On Wed, 5 Aug 2015, Richard Biener wrote:
> 
> > On Wed, 5 Aug 2015, James Greenhalgh wrote:
> > 
> > > On Wed, Aug 05, 2015 at 12:09:35PM +0100, Richard Biener wrote:
> > > > On Wed, 5 Aug 2015, Andrew Pinski wrote:
> > > > 
> > > > > On Wed, Aug 5, 2015 at 3:16 AM, Richard Biener  
> > > > > wrote:
> > > > > > On Wed, 5 Aug 2015, Andreas Schwab wrote:
> > > > > >
> > > > > >> Richard Biener  writes:
> > > > > >>
> > > > > >> > * gimple-fold.c (gimple_fold_stmt_to_constant_1): 
> > > > > >> > Canonicalize
> > > > > >> > bool compares on RHS.
> > > > > >> > * match.pd: Add X ==/!= !X is false/true pattern.
> > > > > >>
> > > > > >> ERROR in VTST/VTSTQ 
> > > > > >> (/opt/gcc/gcc-20150805/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtst.c
> > > > > >>  line 97 in buffer 'expected_signed') at type uint8x8 index 1: got 
> > > > > >> 0x1 != 0xff  (signed input)
> > > > > >> FAIL: gcc.target/aarch64/advsimd-intrinsics/vtst.c   -O1  
> > > > > >> execution test
> > > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c 
> > > > > >> scan-assembler-times [ \t]cmtst[ \t]+v[0-9]+.[0-9]+[bshd],[ 
> > > > > >> \t]*v[0-9]+.[0-9]+[bshd],[ \t]+v[0-9]+.[0-9]+[bshd] 14
> > > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_1.c 
> > > > > >> scan-assembler-times [ \t]cmtst[ \t]+d[0-9]+,[ \t]*d[0-9]+,[ 
> > > > > >> \t]+d[0-9]+ 4
> > > > > >> FAIL: gcc.target/aarch64/simd/int_comparisons_2.c execution test
> > > > > >> FAIL: gcc.target/aarch64/singleton_intrinsics_1.c 
> > > > > >> scan-assembler-times \\tcmtst\\td[0-9]+, d[0-9]+, d[0-9]+ 2
> > > > > >
> > > > > > > Ick - somebody will have to come up with a reduced testcase for one 
> > > > > > > of
> > > > > > > these (best an execute fail).  Reduced to one failing case so I can
> > > > > > > investigate with a cross compiler.
> > > > > > >
> > > > > > > Eventually smells like an aarch64 vector specific issue or a latent
> > > > > > issue with the truth_valued_p predicate for vector types.
> > > > > 
> > > > > Or constant_boolean_node is not returning {-1,-1,-1,-1} for true 
> > > > > vectors.
> > > > 
> > > > It does.
> > > 
> > > You could try with the attached (execute) testcase.
> > > 
> > > Output for me (x86_64/AArch64 trunk compiler) is:
> > > 
> > >   Expected: ff00 Got: 7f80807f8000
> > > 
> > > Those folded values look suspicious! We fold as so:
> > > 
> > > arg1_2 = { -128, -1, 127, -122, -128, -1, 0, 118 };
> > > arg2_3 = { 127, -128, 127, -128, -1, 127, 127, 0 };
> > > 
> > > Visiting statement:
> > > _5 = arg1_2 & arg2_3;
> > > which is likely CONSTANT
> > > Match-and-simplified arg1_2 & arg2_3 to { 0, -128, 127, -128, -128, 127, 
> > > 0, 0 }
> > > Lattice value changed to CONSTANT { 0, -128, 127, -128, -128, 127, 0, 0 
> > > }.  Adding SSA edges to worklist.
> > > interesting_ssa_edges: adding SSA use in _13 = 
> > > VIEW_CONVERT_EXPR(_5);
> > > marking stmt to be not simulated again
> > > 
> > > I'd have expected masks of "-1" in the true vector lanes rather than what
> > > we end up with.
> > 
> > __extension__ static __inline uint8x8_t __attribute__ 
> > ((__always_inline__))
> > vtst_s8 (int8x8_t __a, int8x8_t __b)
> > {
> >   return (uint8x8_t) ((__a & __b) != 0);
> > }
> > 
> > you expect that to be a truth-and, but it is a bitwise and.  So IMHO
> > it works "as expected".  Does the backend actually generate a truth-and
> > instruction for vtst_s8!?
> > 
> > Anyway, trying a cross now.
> 
> Reproduces on x86_64 as well, fix in testing.

For reference see below (testing on aarch64 appreciated).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-08-05  Richard Biener  

* gimple-fold.c (gimple_fold_stmt_to_constant_1): Simplify
bool comparison canonicalization and restrict to integers.

Index: gcc

Re: RFA: RL78: Remove far operand optimization in rl78_force_nonfar_3

2015-08-05 Thread Nick Clifton

Hi DJ,


This is OK, but note that it prevents some operations like:

__far int i;

foo()
{
   i ++;
}

from being implemented with a minimum set of opcodes.  This might be
particularly troublesome for volatile far things.


Right - it is something I will have to look into.

Cheers
  Nick




RE: Regression in target MIC compiler

2015-08-05 Thread David Sherwood
Hi Thomas,

In lto_input_mode_table there is the following line of code:

machine_mode inner = (machine_mode) table[bp_unpack_value (&bp, 8)];

Is this right? In lto_write_mode_table this inner mode is written out explicitly
into the stream already, so do we just need this instead?

machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);

It's possible I'm misunderstanding the code somehow though ...

Regards,
David.

> -----Original Message-----
> From: Thomas Schwinge [mailto:tho...@codesourcery.com]
> Sent: 05 August 2015 11:46
> To: David Sherwood
> Cc: Jakub Jelinek; gcc-patches@gcc.gnu.org; Kirill Yukhin; 
> nat...@codesourcery.com; Richard Sandiford;
> Ilya Verbin; Jeff Law
> Subject: RE: Regression in target MIC compiler
> 
> Hi!
> 
> On Wed, 5 Aug 2015 11:18:32 +0100, David Sherwood  
> wrote:
> > If this looks like my fault
> 
> Well, not necessarily your fault -- might as well just be something that
> has already been lurking in gcc/lto-streamer-in.c:lto_input_mode_table,
> but so far we've gotten away without tripping over it.
> 
> > I am happy to look into this and fix the bug
> 
> Thanks for helping!
> 
> > if you can tell me how to reproduce it. I recently changed GET_MODE_INNER 
> > (m)
> > to return 'm' itself if there is no inner mode and I thought I'd fixed up 
> > lto,
> > but it seems I got it wrong. It also sounds like there is another bug in 
> > this
> > area too - if I want to test this do I need to apply any other patches too?
> 
> gcc/lto-streamer-out.c:lto_write_mode_table as well as
> gcc/lto-streamer-in.c:lto_input_mode_table are not used in regular LTO,
> but are only used in offloading configurations,
> .  To reproduce this, you'd build
> such a configuration (offloading to x86_64-intelmicemul-linux-gnu is
> easier to build than nvptx-none),
> .
> You can use the build scripts I uploaded, or do the steps manually.
> Running the libgomp testsuite, then observe, for example,
> libgomp.c/examples-4/array_sections-3.c hang (or, fail with »unsupported
> mode QI« with the mr vs. m confusion fixed, see below).
> 
> I'm happy to test any patches or also hypotheses that you suggest --
> maybe something is obvious to you just from looking at the code?
> 
> 
> For reference:
> 
> > > On Tue, 4 Aug 2015 16:06:23 +0300, Ilya Verbin  wrote:
> > > > On Tue, Aug 04, 2015 at 14:35:11 +0200, Thomas Schwinge wrote:
> > > > > On Fri, 31 Jul 2015 20:13:02 +0300, Ilya Verbin  
> > > > > wrote:
> > > > > > On Fri, Jul 31, 2015 at 18:59:59 +0200, Jakub Jelinek wrote:
> > > > > > > > > On Wed, Feb 18, 2015 at 11:00:35 +0100, Jakub Jelinek wrote:
> > > > > > > > > +  /* First search just the GET_CLASS_NARROWEST_MODE to 
> > > > > > > > > wider modes,
> > > > > > > > > +  if not found, fallback to all modes.  */
> > > > > > > > > +  int pass;
> > > > > > > > > +  for (pass = 0; pass < 2; pass++)
> > > > > > > > > + for (machine_mode mr = pass ? VOIDmode
> > > > > > > > > + : GET_CLASS_NARROWEST_MODE 
> > > > > > > > > (mclass);
> > > > > > > > > +  pass ? mr < MAX_MACHINE_MODE : mr != VOIDmode;
> > > > > > > > > +  pass ? mr = (machine_mode) (m + 1)
> > > > > > > > > +   : mr = GET_MODE_WIDER_MODE (mr))
> > > > > > > > > +   if (GET_MODE_CLASS (mr) != mclass
> > > > > > > > > +   || GET_MODE_SIZE (mr) != size
> > > > > > > > > +   || GET_MODE_PRECISION (mr) != prec
> > > > > > > > > +   || GET_MODE_INNER (mr) != inner
> > > > > > > > > +   || GET_MODE_IBIT (mr) != ibit
> > > > > > > > > +   || GET_MODE_FBIT (mr) != fbit
> > > > > > > > > +   || GET_MODE_NUNITS (mr) != nunits)
> > > > > > > > > + continue;
> > > > > > > > >
> > > > > > > > > Given that gomp-4_1-branch works ok, the problem was 
> > > > > > > > > introduced somewhere
> > > > > > > > > between 9 and 31 Jul.  I'll try to find the revision.
> > > > > > > >
> > > > > > > > Shouldn't 'mr' be here instead of 'm'?
> > > > > > >
> > > > > > > I think so.  If it works, patch preapproved.
> > > > > >
> > > > > > It fixes the infinite loop, but causes an error:
> > > > > > lto1: fatal error: unsupported mode QI
> > > > >
> > > > > Confirmed.
> > > > >
> > > > > > > But wonder what changed that we haven't been triggering it before.
> > > > > > > What mode do you think it on 
> > > > > > > (mclass/size/prec/inner/ibit/fbit/nunits)?
> > > > > >
> > > > > > > When it hangs, mr is HImode.
> > > > >
> > > > > Do you already have any further analysis, a workaround, or even a fix?
> > > >
> > > > Not yet.  I thought since Jakub is the author of this function, he 
> > > > could easily
> > > > point what is wrong here :)  Actually, intelmic doesn't require
> > > > lto_input_mode_table, so temporary workaround is just to disable it.
> > >
> > > Well, avoiding lto_input_mode_table doesn't help us with nvptx
> > 

PR66311: Fix extension in mpz->wide_int conversions

2015-08-05 Thread Richard Sandiford
wi::from_mpz reads the absolute value of the mpz and then negates the
result if the mpz is negative.  When the top bit of the most-significant
HOST_WIDE_INT in the absolute value is set, we implicitly sign-
rather than zero-extend it to full precision.  For example,
1 << 63 gets mangled to (1 << prec) - (1 << 63).

This patch fixes that by ensuring we zero-extend instead.  The testcase
is taken from comment 15 in bugzilla (thanks FX).
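
As a rough illustration of the problem (a simplified model, not the wide-int.cc
code), the sketch below assumes 64-bit blocks and a representation in which
any block above the stored ones is implied by sign-extending the top stored
block.  blocks_needed, implied_block and zero_extend_abs are hypothetical
names introduced here; the canonize step from the real function is left out.

```c
#include <stdint.h>

#define BITS_PER_BLOCK 64   /* assumed width of one HOST_WIDE_INT block */

/* Number of blocks needed to hold PREC bits (local stand-in for the
   BLOCKS_NEEDED macro used in the patch).  */
static unsigned blocks_needed(unsigned prec)
{
    return prec == 0 ? 1 : (prec + BITS_PER_BLOCK - 1) / BITS_PER_BLOCK;
}

/* Block I of a value stored as COUNT blocks.  Blocks beyond COUNT are not
   stored; they are implied by sign-extending the top stored block.  */
int64_t implied_block(const int64_t *val, unsigned count, unsigned i)
{
    if (i < count)
        return val[i];
    return val[count - 1] < 0 ? -1 : 0;
}

/* Zero-extension step modelled on the patch: if the top stored block has its
   sign bit set and there is room below PREC, append an explicit zero block
   so the absolute value is not read back as negative.  VAL must have room
   for one more block.  Returns the new block count.  */
unsigned zero_extend_abs(int64_t *val, unsigned count, unsigned prec)
{
    if (count < blocks_needed(prec) && val[count - 1] < 0)
        val[count++] = 0;
    return count;
}
```

For 1 << 63 at a precision of 128 bits, a single stored block has its sign bit
set, so the implied upper block is all-ones and the value reads as
(1 << 128) - (1 << 63).  After zero_extend_abs the upper block is an explicit
zero and the value reads as 1 << 63 again.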

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard

gcc/
PR middle-end/66311
* wide-int.cc (wi::from_mpz): Make sure that absolute mpz value
is zero- rather than sign-extended.

gcc/testsuite/
2015-08-05  Francois-Xavier Coudert  

PR middle-end/66311
* gfortran.dg/pr66311.f90: New file.

diff --git a/gcc/testsuite/gfortran.dg/pr66311.f90 
b/gcc/testsuite/gfortran.dg/pr66311.f90
new file mode 100644
index 000..dc40cb6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr66311.f90
@@ -0,0 +1,60 @@
+! { dg-do run }
+! { dg-additional-options "-fno-range-check -w" }
+!
+! Check that we can print large constants
+!
+! "-fno-range-check -w" is used so the testcase compiles even with targets
+! that don't support large integer kinds.
+
+program test
+  use iso_fortran_env, only : ikinds => integer_kinds
+  implicit none
+
+  ! Largest integer kind
+  integer, parameter :: k = ikinds(size(ikinds))
+  integer, parameter :: hk = k / 2
+
+  if (k <= 8) stop
+
+  call check(900_k, "900")
+  call check(9000_k, "9000")
+  call check(int(huge(1_hk), kind=k), "9223372036854775807")
+  call check(2_k**63, "9223372036854775808")
+  call check(1000_k, "1000")
+  call check(18446744065119617024_k, "18446744065119617024")
+  call check(2_k**64 - 1, "18446744073709551615")
+  call check(2_k**64, "18446744073709551616")
+  call check(2000_k, "2000")
+  call check(huge(0_k), "170141183460469231731687303715884105727")
+  call check(huge(0_k)-1, "170141183460469231731687303715884105726")
+
+  call check(-900_k, "-900")
+  call check(-9000_k, "-9000")
+  call check(-int(huge(1_hk), kind=k), "-9223372036854775807")
+  call check(-2_k**63, "-9223372036854775808")
+  call check(-1000_k, "-1000")
+  call check(-18446744065119617024_k, "-18446744065119617024")
+  call check(-(2_k**64 - 1), "-18446744073709551615")
+  call check(-2_k**64, "-18446744073709551616")
+  call check(-2000_k, "-2000")
+  call check(-huge(0_k), "-170141183460469231731687303715884105727")
+  call check(-(huge(0_k)-1), "-170141183460469231731687303715884105726")
+  call check(-huge(0_k)-1, "-170141183460469231731687303715884105728")
+
+  call check(2_k * huge(1_hk), "18446744073709551614")
+  call check((-2_k) * huge(1_hk), "-18446744073709551614")
+
+contains
+
+  subroutine check (i, str)
+implicit none
+integer(kind=k), intent(in), value :: i
+character(len=*), intent(in) :: str
+
+character(len=100) :: buffer
+write(buffer,*) i
+if (adjustl(buffer) /= adjustl(str)) call abort
+  end subroutine
+
+end
+
diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 13ba10c..9a93660 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -252,13 +252,15 @@ wi::from_mpz (const_tree type, mpz_t x, bool wrap)
 }
 
   /* Determine the number of unsigned HOST_WIDE_INTs that are required
- for representing the value.  The code to calculate count is
+ for representing the absolute value.  The code to calculate count is
  extracted from the GMP manual, section "Integer Import and Export":
  http://gmplib.org/manual/Integer-Import-and-Export.html  */
   numb = CHAR_BIT * sizeof (HOST_WIDE_INT);
   count = (mpz_sizeinbase (x, 2) + numb - 1) / numb;
   HOST_WIDE_INT *val = res.write_val ();
-  /* Write directly to the wide_int storage if possible, otherwise leave
+  /* Read the absolute value.
+
+ Write directly to the wide_int storage if possible, otherwise leave
  GMP to allocate the memory for us.  It might be slightly more efficient
  to use mpz_tdiv_r_2exp for the latter case, but the situation is
  pathological and it seems safer to operate on the original mpz value
@@ -276,7 +278,12 @@ wi::from_mpz (const_tree type, mpz_t x, bool wrap)
   memcpy (val, valres, count * sizeof (HOST_WIDE_INT));
   free (valres);
 }
-  res.set_len (canonize (val, count, prec));
+  /* Zero-extend the absolute value to PREC bits.  */
+  if (count < BLOCKS_NEEDED (prec) && val[count - 1] < 0)
+val[count++] = 0;
+  else
+count = canonize (val, count, prec);
+  res.set_len (count);
 
   if (mpz_sgn (x) < 0)
 res = -res;



Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread Richard Biener
On Wed, Aug 5, 2015 at 3:36 PM, Trevor Saunders  wrote:
> On Wed, Aug 05, 2015 at 01:47:30PM +0200, Richard Biener wrote:
>> On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders  
>> wrote:
>> > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
>> >> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
>> >> > From: Trevor Saunders 
>> >> >
>> >> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
>> >> > config/ia64/ia64-protos.h, config/ia64/ia64.c, 
>> >> > config/ia64/ia64.h,
>> >> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
>> >> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
>> >> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
>> >> > the name of a function.
>> >> > * output.h (default_output_label): New prototype.
>> >> > * varasm.c (default_output_label): New function.
>> >> > * vmsdbgout.c: Include tm_p.h.
>> >> > * xcoffout.c: Likewise.
>> >>
>> >> Just a general remark - the GCC output machinery is known to be slow,
>> >> adding indirect calls might be not the very best idea without refactoring
>> >> some of it.
>> >>
>> >> Did you do any performance measurements for artificial testcases
>> >> exercising the specific bits you change?
>> >
>> > sorry about the delay, but I finally got a chance to do some perf tests
>> > of the first patch.  I took three test cases fold-const.ii, insn-emit.ii
>> > and a random .i from firefox and did 3 trials of the length of 100
>> > compilations.  The only non default flag was -std=gnu++11.
>> >
>> > results before patch hookizing output_ascii
>> >
>> > fold-const.ii
>> > real 3m18.051s
>> > user 2m41.340s
>> > sys 0m36.544s
>> > real 3m18.141s
>> > user 2m42.236s
>> > sys 0m35.740s
>> > real 3m18.297s
>> > user 2m42.316s
>> > sys 0m35.804s
>> >
>> > insn-emit.ii
>> > real 9m58.229s
>> > user 8m26.960s
>> > sys 1m31.224s
>> > real 9m57.857s
>> > user 8m24.616s
>> > sys 1m33.072s
>> > real 9m57.922s
>> > user 8m25.232s
>> > sys 1m32.512s
>> >
>> > mozilla.ii
>> > real 8m5.732s
>> > user 6m44.888s
>> > sys 1m20.764s
>> > real 8m5.404s
>> > user 6m44.468s
>> > sys 1m20.856s
>> > real 7m59.197s
>> > user 6m39.632s
>> > sys 1m19.472s
>> >
>> > after patch
>> >
>> > fold-const.ii
>> > real 3m18.488s
>> > user 2m41.972s
>> > sys 0m36.388s
>> > real 3m18.215s
>> > user 2m41.640s
>> > sys 0m36.432s
>> > real 3m18.368s
>> > user 2m42.492s
>> > sys 0m35.720s
>> >
>> > insn-emit.ii
>> > real 10m4.700s
>> > user 8m32.536s
>> > sys 1m32.120s
>> > real 10m4.241s
>> > user 8m31.456s
>> > sys 1m32.728s
>> > real 10m4.515s
>> > user 8m32.056s
>> > sys 1m32.396s
>> >
>> > mozilla.ii
>> > real 7m58.018s
>> > user 6m38.008s
>> > sys 1m19.924s
>> > real 7m59.269s
>> > user 6m37.736s
>> > sys 1m21.448s
>> > real 7m58.254s
>> > user 6m37.828s
>> > sys 1m20.324s
>> >
>> > So, roughly that looks to me like a range from improving by .5% to
>> > regressing by 1%.  I'm not sure what could cause an improvement, so I
>> > kind of wonder how valid these results are.
>>
>> Hmm, indeed.  The speedup looks suspicious.
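
The percentages can be rechecked from the quoted timings with a few lines of
C.  This is just a sketch of the arithmetic (mean of the three trials, then
relative change); mean_seconds and percent_change are hypothetical helper
names, and the times have to be converted to seconds by hand before being
passed in.

```c
#include <stddef.h>

/* Arithmetic mean of N timing samples in seconds (N must be nonzero).  */
double mean_seconds(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double) n;
}

/* Relative change of the mean from BEFORE to AFTER, in percent.
   Positive means the patched compiler was slower.  */
double percent_change(const double *before, const double *after, size_t n)
{
    double b = mean_seconds(before, n);
    double a = mean_seconds(after, n);
    return 100.0 * (a - b) / b;
}
```

Fed with the "real" times above, this gives roughly +1.1% for insn-emit.ii and
roughly -1.0% for mozilla.ii, so the apparent speedup is about as large as the
apparent slowdown.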
>>
>> > Another question is how one can refactor the output machinery to be
>> > faster.  My first  thought is to buffer text internally before calling
>> > stdio functions, but that seems like a giant job.
>>
>> stdio functions are already buffering, so I don't know either.
>
>  yeah, but the overhead of calling functions in libc is higher than
>  that for functions in gcc (especially if they can get inlined)
>  right?  Especially when a lot of these things seem to loop calling
>  putc...

obstacks are used elsewhere to do char buffering.  But not sure how
easy it is to pick low-hanging fruit here.
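
To make the buffering idea concrete, here is a minimal sketch of a character
buffer that collects output locally and hands it to stdio in large chunks.
It is not how GCC's output machinery or obstacks work; struct outbuf,
outbuf_init, outbuf_putc, outbuf_flush and the 4096-byte size are all made up
for illustration.

```c
#include <stdio.h>

#define OUTBUF_SIZE 4096   /* arbitrary buffer size for the sketch */

struct outbuf {
    FILE *out;               /* underlying stdio stream */
    size_t len;              /* bytes currently buffered */
    unsigned long flushes;   /* number of fwrite calls issued so far */
    char data[OUTBUF_SIZE];
};

void outbuf_init(struct outbuf *ob, FILE *out)
{
    ob->out = out;
    ob->len = 0;
    ob->flushes = 0;
}

/* Write all buffered bytes with a single fwrite call.
   Returns 0 on success, -1 on a short write.  */
int outbuf_flush(struct outbuf *ob)
{
    size_t n = ob->len;
    if (n == 0)
        return 0;
    ob->len = 0;
    ob->flushes++;
    return fwrite(ob->data, 1, n, ob->out) == n ? 0 : -1;
}

/* Append one character; only touches stdio when the buffer is full.
   Small enough to be inlined at the call site.  */
int outbuf_putc(struct outbuf *ob, char c)
{
    if (ob->len == OUTBUF_SIZE && outbuf_flush(ob) != 0)
        return -1;
    ob->data[ob->len++] = c;
    return 0;
}
```

Emitting 10000 characters this way results in three fwrite calls instead of
10000 putc calls.  Whether that is measurable in practice would still need
profiling.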

I suppose it would be nice to isolate the hot parts of the output machinery
only during a bootstrap for example.

>> But yes, going the libas route would improve things here, or for
>> example enhancing gas to be able to eat target binary data
>> without the need to encode it in printable characters...
>>
>> .raw_data number-of-bytes
>> 
>>
>> Makes it quite unparsable to editors of course ...
>
> The idea of having .S files that aren't reasonably editable seems kind
> of silly, but I guess it's up to the gas people.

Heh, indeed.  Maybe instead do

.insert_from_file  

and do that only when we are using -pipe or so.

Richard.

> Trev
>
>>
>> Richard.
>>
>> > thanks!
>> >
>> > Trev
>> >
>> > far outside of noise,
>> >>
>> >> Richard.
>> >>
>> >> > ---
>> >> >  gcc/config/arc/arc.h  |  3 +--
>> >> >  gcc/config/bfin/bfin.h|  5 +
>> >> >  gcc/config/frv/frv.h  |  6 +-
>> >> >  gcc/config/ia64/ia64-protos.h |  1 +
>> >> >  gcc/config/ia64/ia64.c| 11 +++
>> >> >  gcc/config/ia64/ia64.h|  8 +---
>> >> 

Re: PR66311: Fix extension in mpz->wide_int conversions

2015-08-05 Thread Richard Biener
On Wed, Aug 5, 2015 at 4:14 PM, Richard Sandiford
 wrote:
> wi::from_mpz reads the absolute value of the mpz and then negates the
> result if the mpz is negative.  When the top bit of the most-significant
> HOST_WIDE_INT in the absolute value is set, we implicitly sign-
> rather than zero-extend it to full precision.  For example,
> 1 << 63 gets mangled to (1 << prec) - (1 << 63).
>
> This patch fixes that by ensuring we zero-extend instead.  The testcase
> is taken from comment 15 in bugzilla (thanks FX).
>
> Tested on x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
> gcc/
> PR middle-end/66311
> * wide-int.cc (wi::from_mpz): Make sure that absolute mpz value
> is zero- rather than sign-extended.
>
> gcc/testsuite/
> 2015-08-05  Francois-Xavier Coudert  
>
> PR middle-end/66311
> * gfortran.dg/pr66311.f90: New file.
>
> diff --git a/gcc/testsuite/gfortran.dg/pr66311.f90 
> b/gcc/testsuite/gfortran.dg/pr66311.f90
> new file mode 100644
> index 000..dc40cb6
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/pr66311.f90
> @@ -0,0 +1,60 @@
> +! { dg-do run }
> +! { dg-additional-options "-fno-range-check -w" }
> +!
> +! Check that we can print large constants
> +!
> +! "-fno-range-check -w" is used so the testcase compiles even with targets
> +! that don't support large integer kinds.
> +
> +program test
> +  use iso_fortran_env, only : ikinds => integer_kinds
> +  implicit none
> +
> +  ! Largest integer kind
> +  integer, parameter :: k = ikinds(size(ikinds))
> +  integer, parameter :: hk = k / 2
> +
> +  if (k <= 8) stop
> +
> +  call check(900_k, "900")
> +  call check(9000_k, "9000")
> +  call check(int(huge(1_hk), kind=k), "9223372036854775807")
> +  call check(2_k**63, "9223372036854775808")
> +  call check(1000_k, "1000")
> +  call check(18446744065119617024_k, "18446744065119617024")
> +  call check(2_k**64 - 1, "18446744073709551615")
> +  call check(2_k**64, "18446744073709551616")
> +  call check(2000_k, "2000")
> +  call check(huge(0_k), "170141183460469231731687303715884105727")
> +  call check(huge(0_k)-1, "170141183460469231731687303715884105726")
> +
> +  call check(-900_k, "-900")
> +  call check(-9000_k, "-9000")
> +  call check(-int(huge(1_hk), kind=k), "-9223372036854775807")
> +  call check(-2_k**63, "-9223372036854775808")
> +  call check(-1000_k, "-1000")
> +  call check(-18446744065119617024_k, "-18446744065119617024")
> +  call check(-(2_k**64 - 1), "-18446744073709551615")
> +  call check(-2_k**64, "-18446744073709551616")
> +  call check(-2000_k, "-2000")
> +  call check(-huge(0_k), "-170141183460469231731687303715884105727")
> +  call check(-(huge(0_k)-1), "-170141183460469231731687303715884105726")
> +  call check(-huge(0_k)-1, "-170141183460469231731687303715884105728")
> +
> +  call check(2_k * huge(1_hk), "18446744073709551614")
> +  call check((-2_k) * huge(1_hk), "-18446744073709551614")
> +
> +contains
> +
> +  subroutine check (i, str)
> +implicit none
> +integer(kind=k), intent(in), value :: i
> +character(len=*), intent(in) :: str
> +
> +character(len=100) :: buffer
> +write(buffer,*) i
> +if (adjustl(buffer) /= adjustl(str)) call abort
> +  end subroutine
> +
> +end
> +
> diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
> index 13ba10c..9a93660 100644
> --- a/gcc/wide-int.cc
> +++ b/gcc/wide-int.cc
> @@ -252,13 +252,15 @@ wi::from_mpz (const_tree type, mpz_t x, bool wrap)
>  }
>
>/* Determine the number of unsigned HOST_WIDE_INTs that are required
> - for representing the value.  The code to calculate count is
> + for representing the absolute value.  The code to calculate count is
>   extracted from the GMP manual, section "Integer Import and Export":
>   http://gmplib.org/manual/Integer-Import-and-Export.html  */
>numb = CHAR_BIT * sizeof (HOST_WIDE_INT);
>count = (mpz_sizeinbase (x, 2) + numb - 1) / numb;
>HOST_WIDE_INT *val = res.write_val ();
> -  /* Write directly to the wide_int storage if possible, otherwise leave
> +  /* Read the absolute value.
> +
> + Write directly to the wide_int storage if possible, otherwise leave
>   GMP to allocate the memory for us.  It might be slightly more efficient
>   to use mpz_tdiv_r_2exp for the latter case, but the situation is
>   pathological and it seems safer to operate on the original mpz value
> @@ -276,7 +278,12 @@ wi::from_mpz (const_tree type, mpz_t x, bool wrap)
>memcpy (val, valres, count * sizeof (HOST_WIDE_INT));
>free (valres);
>  }
> -  res.set_len (canonize (val, count, prec));
> +  /* Zero-extend the absolute value to PREC bits.  */
> +  if (count < BLOCKS_NEEDED (prec) && val[count - 1] < 0)
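For readers following the comment about counting HOST_WIDE_INTs, here is a minimal standalone sketch of the ceiling division it describes. It assumes a 64-bit block and takes the bit length as a plain argument rather than calling mpz_sizeinbase; the helper name blocks_for_bits is hypothetical and is not part of wide-int.cc.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 64-bit blocks needed to hold a magnitude
   that is BITS bits long.  BITS plays the role of mpz_sizeinbase (x, 2).  */
static size_t
blocks_for_bits (size_t bits)
{
  const size_t numb = CHAR_BIT * sizeof (uint64_t);
  return (bits + numb - 1) / numb;
}
```

Under these assumptions the magnitude of -2_k**64 is 65 bits long and needs two blocks, while 18446744073709551614 (2**64 - 2) is 64 bits long and fits in one.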

Remove bogus can_extend

2015-08-05 Thread Richard Sandiford
Richard Henderson  writes:
> On 07/28/2015 01:36 PM, Richard Sandiford wrote:
> > Index: gcc/target-insns.def
> > ===
> > --- gcc/target-insns.def	2015-07-28 20:56:29.721512028 +0100
> > +++ gcc/target-insns.def	2015-07-28 20:56:29.713512127 +0100
> > @@ -34,6 +34,7 @@ DEF_TARGET_INSN (allocate_stack, (rtx x0
> >  DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
> >  DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
> >  DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
> > +DEF_TARGET_INSN (can_extend, (rtx x0, rtx x1))
> >  DEF_TARGET_INSN (canonicalize_funcptr_for_compare, (rtx x0, rtx x1))
> >  DEF_TARGET_INSN (casesi, (rtx x0, rtx x1, rtx x2, rtx x3, rtx x4))
> >  DEF_TARGET_INSN (check_stack, (rtx x0))
>
> Am I missing something?  Where is the can_extend hook used?

Gah.  I'd even fixed this on my local machine but committed it on the
work machine (where most of the testing was done).

Tested on x86_64-linux-gnu and committed as obvious.  Thanks for
catching it.

Richard

gcc/
* target-insns.def (can_extend): Delete.

diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 0c64a6b..ef8e6b0 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -35,7 +35,6 @@ DEF_TARGET_INSN (atomic_test_and_set, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
 DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
 DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
-DEF_TARGET_INSN (can_extend, (rtx x0, rtx x1))
 DEF_TARGET_INSN (canonicalize_funcptr_for_compare, (rtx x0, rtx x1))
 DEF_TARGET_INSN (casesi, (rtx x0, rtx x1, rtx x2, rtx x3, rtx x4))
 DEF_TARGET_INSN (check_stack, (rtx x0))
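As background on why a stray entry matters: lists like this are typically expanded several times, so every entry produces an interface whether or not anything uses it. The following is only a sketch of that list-macro technique in a single file. The SKETCH_ names and the operand counts are made up for illustration, and it does not reproduce how GCC actually consumes target-insns.def.

```c
#include <string.h>

/* Hypothetical stand-in for a .def file: one entry per named pattern.  */
#define SKETCH_TARGET_INSNS(DEF) \
  DEF (builtin_longjmp, 1) \
  DEF (builtin_setjmp_setup, 1) \
  DEF (casesi, 5)

/* First expansion: an enum with one code per entry.  */
#define DEF_ENUM(NAME, NARGS) SKETCH_##NAME,
enum sketch_insn { SKETCH_TARGET_INSNS (DEF_ENUM) SKETCH_NUM_INSNS };
#undef DEF_ENUM

/* Second expansion: a parallel table of names and operand counts.  */
struct sketch_info { const char *name; int nargs; };
#define DEF_INFO(NAME, NARGS) { #NAME, NARGS },
static const struct sketch_info sketch_table[] = {
  SKETCH_TARGET_INSNS (DEF_INFO)
};
#undef DEF_INFO

/* Return nonzero if NAME has an entry in the table.  */
static int
sketch_has_insn (const char *name)
{
  for (int i = 0; i < SKETCH_NUM_INSNS; i++)
    if (strcmp (sketch_table[i].name, name) == 0)
      return 1;
  return 0;
}
```

Deleting a line from the list removes the enum value and the table row together, which is why a one-line deletion is enough to retire an unused entry.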



Fix reload1.c warning for some targets

2015-08-05 Thread Richard Sandiford
Building some targets results in a warning about orig_dup[i] potentially
being used uninitialised.  I think the warning is fair, since it isn't
obvious that the recog_data-based loop bound remains unchanged between:

  for (i = 0; i < recog_data.n_dups; i++)
orig_dup[i] = *recog_data.dup_loc[i];

and:

  for (i = 0; i < recog_data.n_dups; i++)
*recog_data.dup_loc[i] = orig_dup[i];

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard

gcc/
* reload1.c (elimination_costs_in_insn): Make it obvious to the
compiler that the n_dups and n_operands loop bounds are invariant.

diff --git a/gcc/reload1.c b/gcc/reload1.c
index ce06e06..ad243e3 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -3708,10 +3708,12 @@ elimination_costs_in_insn (rtx_insn *insn)
   /* Eliminate all eliminable registers occurring in operands that
  can be handled by reload.  */
   extract_insn (insn);
-  for (i = 0; i < recog_data.n_dups; i++)
+  int n_dups = recog_data.n_dups;
+  for (i = 0; i < n_dups; i++)
 orig_dup[i] = *recog_data.dup_loc[i];
 
-  for (i = 0; i < recog_data.n_operands; i++)
+  int n_operands = recog_data.n_operands;
+  for (i = 0; i < n_operands; i++)
 {
   orig_operand[i] = recog_data.operand[i];
 
@@ -3756,7 +3758,7 @@ elimination_costs_in_insn (rtx_insn *insn)
}
 }
 
-  for (i = 0; i < recog_data.n_dups; i++)
+  for (i = 0; i < n_dups; i++)
 *recog_data.dup_loc[i]
   = *recog_data.operand_loc[(int) recog_data.dup_num[i]];
 
@@ -3764,9 +3766,9 @@ elimination_costs_in_insn (rtx_insn *insn)
   check_eliminable_occurrences (old_body);
 
   /* Restore the old body.  */
-  for (i = 0; i < recog_data.n_operands; i++)
+  for (i = 0; i < n_operands; i++)
 *recog_data.operand_loc[i] = orig_operand[i];
-  for (i = 0; i < recog_data.n_dups; i++)
+  for (i = 0; i < n_dups; i++)
 *recog_data.dup_loc[i] = orig_dup[i];
 
   /* Update all elimination pairs to reflect the status after the current
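
To illustrate the idea outside reload1.c, here is a small sketch, assuming a global structure that an opaque call in between could modify. All sketch_ names are hypothetical. The bound is read once into a local, so the save loop and the restore loop visibly cover the same elements.

```c
#define SKETCH_MAX_DUPS 8

/* Hypothetical stand-in for recog_data: a global that callees may modify.  */
struct sketch_recog { int n_dups; int *dup_loc[SKETCH_MAX_DUPS]; };
static struct sketch_recog sketch_data;

typedef void (*sketch_work_fn) (void);

/* Save every duplicate, run WORK, then restore.  Reading n_dups once into a
   local makes both loops share the same bound even if WORK changes it.  */
static void
sketch_save_restore (sketch_work_fn work)
{
  int orig_dup[SKETCH_MAX_DUPS];
  int n_dups = sketch_data.n_dups;

  for (int i = 0; i < n_dups; i++)
    orig_dup[i] = *sketch_data.dup_loc[i];

  work ();

  for (int i = 0; i < n_dups; i++)
    *sketch_data.dup_loc[i] = orig_dup[i];
}

static int sketch_vals[2];

/* Clobber the saved values and also enlarge the global bound.  */
static void
sketch_clobber (void)
{
  sketch_vals[0] = sketch_vals[1] = -1;
  sketch_data.n_dups = SKETCH_MAX_DUPS;
}

/* Returns 1 if both values survive the clobbering callback.  */
static int
sketch_demo (void)
{
  sketch_vals[0] = 10;
  sketch_vals[1] = 20;
  sketch_data.n_dups = 2;
  sketch_data.dup_loc[0] = &sketch_vals[0];
  sketch_data.dup_loc[1] = &sketch_vals[1];
  sketch_save_restore (sketch_clobber);
  return sketch_vals[0] == 10 && sketch_vals[1] == 20;
}
```

If the restore loop re-read sketch_data.n_dups here, it would walk past the two initialised slots of orig_dup, which is the situation the warning is worried about.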



Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests

2015-08-05 Thread Andreas Schwab
Kyrill Tkachov  writes:

> On 05/08/15 10:03, Andreas Schwab wrote:
>> Kyrill Tkachov  writes:
>>
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_1.c 
>>> b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
>>> new file mode 100644
>>> index 000..72d0838
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
>>> @@ -0,0 +1,12 @@
>>> +/* { dg-do assemble } */
>>> +/* { dg-options "-O2 -mcpu=thunderx -save-temps" } */
>>> +
>>> +__attribute__ ((target ("cpu=cortex-a72.cortex-a53")))
>>> +int
>>> +foo (int a)
>>> +{
>>> +  return a + 1;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler "//.tune cortex-a72.cortex-a53" } } */
>>> +/* { dg-final { scan-assembler-not "thunderx" } } */
>> FAIL: gcc.target/aarch64/target_attr_1.c (test for excess errors)
>> Excess errors:
>> Assembler messages:
>> Error: unknown cpu `thunderx'
>> Error: unrecognized option -mcpu=thunderx
>
> yeah, that happens if your assembler doesn't support -mcpu=thunderx.
> Newer binutils should support it.

Then there probably needs to be a dg-require- test for it.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Add X != !X pattern

2015-08-05 Thread Andreas Schwab
Richard Biener  writes:

> For reference see below (testing on aarch64 appreciated).
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2015-08-05  Richard Biener  
>
>   * gimple-fold.c (gimple_fold_stmt_to_constant_1): Simplify
>   bool comparison canonicalization and restrict to integers.

Fixes all regressions on aarch64.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Add X != !X pattern

2015-08-05 Thread James Greenhalgh
On Wed, Aug 05, 2015 at 02:56:08PM +0100, Richard Biener wrote:
> For reference see below (testing on aarch64 appreciated).

Looks good to me on aarch64-none-elf.

Thanks,
James

> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> 2015-08-05  Richard Biener  
> 
>   * gimple-fold.c (gimple_fold_stmt_to_constant_1): Simplify
>   bool comparison canonicalization and restrict to integers.



Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests

2015-08-05 Thread Kyrill Tkachov


On 05/08/15 15:27, Andreas Schwab wrote:
> Kyrill Tkachov  writes:
>
>> On 05/08/15 10:03, Andreas Schwab wrote:
>>> Kyrill Tkachov  writes:
>>>
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_1.c 
>>>> b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
>>>> new file mode 100644
>>>> index 000..72d0838
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do assemble } */
>>>> +/* { dg-options "-O2 -mcpu=thunderx -save-temps" } */
>>>> +
>>>> +__attribute__ ((target ("cpu=cortex-a72.cortex-a53")))
>>>> +int
>>>> +foo (int a)
>>>> +{
>>>> +  return a + 1;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler "//.tune cortex-a72.cortex-a53" } } */
>>>> +/* { dg-final { scan-assembler-not "thunderx" } } */
>>> FAIL: gcc.target/aarch64/target_attr_1.c (test for excess errors)
>>> Excess errors:
>>> Assembler messages:
>>> Error: unknown cpu `thunderx'
>>> Error: unrecognized option -mcpu=thunderx
>>
>> yeah, that happens if your assembler doesn't support -mcpu=thunderx.
>> Newer binutils should support it.
>
> Then there probably needs to be a dg-require- test for it.

Or just make this a dg-compile test. I suppose the assemble step doesn't add
anything to this test.

Kyrill
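
For concreteness, a sketch of what the test might look like with the assemble step dropped. Only the dg-do line differs from the quoted test. The SKETCH_TARGET guard is an assumption added purely so the snippet also builds on non-AArch64 hosts; the real test would not need it.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mcpu=thunderx -save-temps" } */

/* Assumption for this sketch only: guard the attribute so the file also
   builds on hosts that are not AArch64.  */
#if defined (__aarch64__)
# define SKETCH_TARGET __attribute__ ((target ("cpu=cortex-a72.cortex-a53")))
#else
# define SKETCH_TARGET
#endif

SKETCH_TARGET
int
foo (int a)
{
  return a + 1;
}

/* { dg-final { scan-assembler "//.tune cortex-a72.cortex-a53" } } */
/* { dg-final { scan-assembler-not "thunderx" } } */
```

With dg-do compile the compiler stops after producing assembly, so the scan-assembler checks still run but the assembler never sees -mcpu=thunderx.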



> Andreas.


Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests

2015-08-05 Thread James Greenhalgh
On Mon, Aug 03, 2015 at 12:32:15PM +0100, James Greenhalgh wrote:
> On Fri, Jul 24, 2015 at 09:40:28AM +0100, Kyrill Tkachov wrote:
> > 
> > On 21/07/15 18:14, James Greenhalgh wrote:
> > > On Thu, Jul 16, 2015 at 04:21:15PM +0100, Kyrill Tkachov wrote:
> > >> Hi all,
> > >>
> > >> These are the tests for target attributes and pragmas.
> > >> I've tried to test for the inlining rules, some of the possible errors 
> > >> and
> > >> the preprocessor macros changed from target pragmas.
> > >>
> > >> Ok for trunk?
> > > Mechanical changes in the pragma tests for the sake of grammar!
> > >
> > > s/defined but shouldn't/is defined but should not be/
> > > s/not defined but should/is not defined but should be/
> > >
> > > Note that some of the errors have different text, so you'll have to run
> > > through by hand and check these are consistent.
> > >
> > > It would be good to hand some of these target attribute tests off
> > > to the assembler to make sure we are also putting out appropriate
> > > directives in our output. Perhaps "assemble" is the more appropriate
> > > dg-do directive?
> > >
> > > Some more nits below (mostly missing comments on testcases).
> > 
> > Thanks, here's an updated version.
> > 
> > I've also added a test for the "+nothing" architectural feature
> > attribute introduced in patch 10/14 and renamed the tests to use
> > underscores in their names.
> > 
> > How's this?
> 

These tests fail for me with -fPIC, where you won't get inlining of non-static
functions.

NA->FAIL: gcc.target/aarch64/target_attr_14.c scan-assembler-not bl.*bar
NA->FAIL: gcc.target/aarch64/target_attr_5.c scan-assembler-not bl.*bar
NA->FAIL: gcc.target/aarch64/target_attr_8.c scan-assembler-not bl.*bar
NA->FAIL: gcc.target/aarch64/target_attr_14.c scan-assembler-not bl.*bar
NA->FAIL: gcc.target/aarch64/target_attr_5.c scan-assembler-not bl.*bar
NA->FAIL: gcc.target/aarch64/target_attr_8.c scan-assembler-not bl.*bar


You'll probably want to mark the functions you expect to be inlined as
static, or otherwise skip the test for -fPIC.

Thanks,
James
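
A sketch of the first option, assuming a test shaped like the ones listed above. The bodies of bar and foo are invented for illustration; I have not copied them from the target_attr tests.

```c
/* Hypothetical reduction of a test that expects bar to be inlined into foo.
   Marking bar static keeps it local to the translation unit, so it remains
   an inlining candidate when compiling with -fPIC.  */
static int
bar (int a)
{
  return a - 2;
}

int
foo (int a)
{
  return bar (a) + 1;
}
```

Without static, bar has default visibility and could be replaced by another definition at link time in a shared object, which is why the compiler declines to inline it under -fPIC.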



Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming

2015-08-05 Thread Ilya Verbin
On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
> On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin  wrote:
> > On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> >> We had established the use of a boolean flag have_offload in gcc::context
> >> to indicate whether during compilation, we've actually seen any code to
> >> be offloaded (see cited below the relevant parts of the patch by Ilya et
> >> al.).  This means that currently, the whole offload machinery will not be
> >> run unless we actually have any offloaded data.  This means that the
> >> configured mkoffload programs (-foffload=[...], defaulting to
> >> configure-time --enable-offload-targets=[...]) will not be invoked unless
> >> we actually have any offloaded data.  This means that we will not
> >> actually generate constructor code to call libgomp's
> >> GOMP_offload_register unless we actually have any offloaded data.
> >
> > Yes, that was the plan.
> >
> >> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> >> targets have been specified during compilation.
> >>
> >> But: at runtime, I'd like to know which -foffload=[...] targets have been
> >> specified during compilation, so that we can, for example, reliably
> >> resort to host fallback execution for -foffload=disable instead of
> >> getting error message that an offloaded function is missing.
> >
> > It's easy to fix:
> >
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index a5fb164..f81d570 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
> > *devicep,
> >k.host_end = k.host_start + 1;
> >splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> >gomp_mutex_unlock (&devicep->lock);
> > -  if (tgt_fn == NULL)
> > -   gomp_fatal ("Target function wasn't mapped");
> > -
> >return (void *) tgt_fn->tgt_offset;
> >  }
> >  }
> > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const 
> > void *unused,
> >  return gomp_target_fallback (fn, hostaddrs);
> >
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +return gomp_target_fallback (fn, hostaddrs);
> >
> >struct target_mem_desc *tgt_vars
> >  = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> > @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), 
> > size_t mapnum,
> >  }
> >
> >void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +return gomp_target_fallback (fn, hostaddrs);
> >
> >struct target_mem_desc *tgt_vars
> >  = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
> >
> >
> >> other hand, for example, for -foffload=nvptx-none, even if user program
> >> code doesn't contain any offloaded data (and thus the offload machinery
> >> has not been run), the user program might still contain any executable
> >> directives or OpenACC runtime library calls, so we'd still like to use
> >> the libgomp nvptx plugin.  However, we currently cannot detect this
> >> situation.
> >>
> >> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> >> configuration in the executable (as a string, for example) for libgomp to
> >> look that up, or b) make it a requirement that (if configured via
> >> -foffload=[...]), the offload machinery is run even if there is not
> >> actually any data to be offloaded, so we then reliably get the respective
> >> constructor call to libgomp's GOMP_offload_register.  I once began to
> >> implement a), but this got a bit ugly, so then looked into b) instead.
> >> Compared to the status quo, always running the whole offloading machinery
> >> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> >> are active, certainly does introduce some overhead when there isn't
> >> actually any code to be offloaded, so I'm not sure whether that is
> >> acceptable?
> >
> > I vote for (a).
> 
> What happens for conflicting -foffload=[...] options in different TUs?

If you're asking about what happens now, only the list of offload targets from
link-time -foffload=tgt1,tgt2 option matters.

I don't like plan (b) because it calls ipa_write_summaries unconditionally for
all OpenMP programs, which creates IR sections, which increases filesize and may
cause other problems, e.g. .
Also compile-time is increased because of LTO machinery, mkoffloads, etc.

If OpenACC requires some registration in libgomp even without offload, maybe you
can run this machinery only under flag_openacc?

  -- Ilya
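
To make the host-fallback idea from the quoted libgomp patch concrete, here is a self-contained sketch. Every sketch_ name is hypothetical and nothing here is libgomp's real API. The point it tries to show is that the lookup itself has to return NULL for a missing entry, before any field of the entry is read, so that the caller can fall back.

```c
#include <stddef.h>

typedef void (*sketch_fn) (void *);

/* Hypothetical mapping from a host function to its device copy.  */
struct sketch_map { sketch_fn host_fn; void *tgt_addr; };

/* Return the device address for FN, or NULL when FN was never registered.  */
static void *
sketch_get_target_fn_addr (const struct sketch_map *map, size_t n,
                           sketch_fn fn)
{
  for (size_t i = 0; i < n; i++)
    if (map[i].host_fn == fn)
      return map[i].tgt_addr;
  return NULL;
}

/* Run FN on the "device" if it is mapped, otherwise on the host.
   Returns 1 for device execution and 0 for host fallback.  */
static int
sketch_target (const struct sketch_map *map, size_t n, sketch_fn fn,
               void *data)
{
  void *fn_addr = sketch_get_target_fn_addr (map, n, fn);
  if (fn_addr == NULL)
    {
      fn (data);  /* Host fallback.  */
      return 0;
    }
  /* A real runtime would launch fn_addr on the device here.  */
  return 1;
}

static void
sketch_incr (void *p)
{
  ++*(int *) p;
}

/* Returns 1 when an unmapped function ran on the host exactly once.  */
static int
sketch_fallback_demo (void)
{
  int counter = 0;
  int ran_on_device = sketch_target (NULL, 0, sketch_incr, &counter);
  return !ran_on_device && counter == 1;
}
```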


Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.

2015-08-05 Thread Martin Liška
On 08/03/2015 07:38 PM, Jeff Law wrote:
> On 07/16/2015 05:03 AM, Martin Liška wrote:
>>> So a general question.  We're passing in STRICT to several routines, which 
>>> is fine.  But then we're also checking M_TAIL_MERGE_MODE.  What's the 
>>> difference between the two?  Can they be unified?
>>
>> Hello.
>>
>> I would say that STRICT is a bit generic mechanism that was introduced some 
>> time before. It's e.g. used for checking of THIS arguments for methods and 
>> make checking
>> more sensitive in situations that are somehow special.
>>
>> The newly added state is orthogonal to the previous one.
> Fair enough.  There's some cases where we've documented STRICT, and others 
> where we haven't.
> 
>   If STRICT flag is true, version must match strictly
> Appears as documentation for STRICT.  It seems like it'd be better to 
> describe what "strictly" means here.
> 
> Elsewhere we have comments like:
> 
>   Be strict in case of tail-merge optimization
> 
> Which tends to confuse things a bit.  Perhaps something more like:
> 
>   In the case of tail merge optimization, XYZ must match
> 
> It seems like a nit, but to someone else reading the code I don't think the 
> distinctions are all that clear right now, so if we can improve things, it'd 
> be good.

Hello Jeff.

I decided to take a slightly different approach (I hope it's much saner).
Instead of passing a new argument (called strict), I have rewritten the
func_checker class to work with a new inner state (m_comparing_sensitive_rhs),
which is turned on in cases where we compare a RHS operand. The flag is used
in the corresponding part of func_checker.

> 
> 
>>
>>>
>>> I didn't find this comment particularly useful in understanding what this 
> >>> function does.  AFAICT the function looks at the uses of the LHS of STMT 
>>> and verifies they're all in the same block as STMT, right?
>>>
> >>> It also verifies that none of the operands within STMT are part of 
>>> SSA_NAMES_SET.
>>>
>>> What role do those properties play in the meaning of "local definition"?
>>
>> I tried to explain it more deeply what's the purpose of this function.
> Thanks.  It's much clearer now.   We've actually got similar code in a couple 
> places (ifcvt).  I wonder if we could unify those implementations as a 
> follow-up cleanup.

Good point, I'm going to write it to my TODO list.

> 
> 
>>
>>>
>>>
>>>
>>>
>>>> @@ -1037,4 +1205,60 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
>>>>      return true;
>>>>    }
>>>>
>>>> +void
>>>> +ssa_names_set::build (basic_block bb)
>>> Needs a function comment.  What are the "important" names we're collecting 
>>> here?
>>>
>>> Is a single forward and backward pass really sufficient to find all the 
>>> important names?
>>>
>>> In the backward pass, do you have to consider things like ASMs?  I guess 
>>> it's difficult to understand what you need to look at because it's not 
>>> entirely clear the set of SSA_NAMEs you're building.
> These questions and lack of function comment don't seem to have been 
> addressed yet.

Fixed.

> 
> 
> 
>> > +
>> > +using namespace ipa_icf;
>> > +using namespace ipa_icf_gimple;
>>> Is this wise?  Does it significantly help with conciseness within the tail 
>>> merging pass where it wants things ipa-icf and ipa-icf-gimple?
>>>
>>> I'm not objecting at this point, I want to hear your thoughts.
>>
>> I must agree with you, as I've started using both namespaces in tree-tail 
>> merge pass,
>> it makes not much sense anymore. I suggest to come up with a namespace that 
>> will
>> encapsulate 'identical code folding'-related stuff. What about:
>>
>> namespace icf
> Sure if it helps and is clean.  GCC does not have a policy against "using 
> namespace", but other codebases do (google for example) as it does introduce 
> some long term maintenance issues.
> 
> So when I see a "using namespace" of that nature, I'm naturally going to 
> question if it really helps in a significant way.  If it does, then I won't 
> object.  If it's not helping in a significant way, then I'm likely to object.
> 
> I think the updated version is fine WRT namespaces.

Sounds good!

> 
> 
>>
>> ?
>>
>>>
>>>

>>>> /* Describes a group of bbs with the same successors.  The successor bbs are
>>>>    cached in succs, and the successor edge flags are cached in succ_flags.
>>>> @@ -1220,17 +1231,48 @@ gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse,
>>>>      }
>>>>  }
>>>>
>>>> +static bool
>>>> +check_edges_correspondence (basic_block bb1, basic_block bb2)
>>> Needs a function comment.
> Still needs function comment.
> 
> 
> I think we're pretty close here.  Most of the issues are comments -- I still 
> haven't looked *real* close at ssa_names_set::build.  With a function comment 
> I ought to be able to look at it more closely in the next (and hopefully 
> final) iteration.

Comment in this part was enhanced.

The patch can bootstrap on x86_64-linux-gnu,

Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread David Malcolm
On Wed, 2015-08-05 at 13:47 +0200, Richard Biener wrote:
> On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders  
> wrote:
> > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
> >> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
> >> > From: Trevor Saunders 
> >> >
> >> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
> >> > config/ia64/ia64-protos.h, config/ia64/ia64.c, 
> >> > config/ia64/ia64.h,
> >> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
> >> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h,
> >> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
> >> > the name of a function.
> >> > * output.h (default_output_label): New prototype.
> >> > * varasm.c (default_output_label): New function.
> >> > * vmsdbgout.c: Include tm_p.h.
> >> > * xcoffout.c: Likewise.
> >>
> >> Just a general remark - the GCC output machinery is known to be slow,
> >> adding indirect calls might be not the very best idea without refactoring
> >> some of it.
> >>
> >> Did you do any performance measurements for artificial testcases
> >> exercising the specific bits you change?
> >
> > sorry about the delay, but I finally got a chance to do some perf tests
> > of the first patch.  I took three test cases fold-const.ii, insn-emit.ii
> > and a random .i from firefox and did 3 trials of the length of 100
> > compilations.  The only non default flag was -std=gnu++11.
> >
> > results before patch hookizing output_ascii
> >
> > fold-const.ii
> > real    3m18.051s
> > user    2m41.340s
> > sys     0m36.544s
> > real    3m18.141s
> > user    2m42.236s
> > sys     0m35.740s
> > real    3m18.297s
> > user    2m42.316s
> > sys     0m35.804s
> >
> > insn-emit.ii
> > real    9m58.229s
> > user    8m26.960s
> > sys     1m31.224s
> > real    9m57.857s
> > user    8m24.616s
> > sys     1m33.072s
> > real    9m57.922s
> > user    8m25.232s
> > sys     1m32.512s
> >
> > mozilla.ii
> > real    8m5.732s
> > user    6m44.888s
> > sys     1m20.764s
> > real    8m5.404s
> > user    6m44.468s
> > sys     1m20.856s
> > real    7m59.197s
> > user    6m39.632s
> > sys     1m19.472s
> >
> > after patch
> >
> > fold-const.ii
> > real    3m18.488s
> > user    2m41.972s
> > sys     0m36.388s
> > real    3m18.215s
> > user    2m41.640s
> > sys     0m36.432s
> > real    3m18.368s
> > user    2m42.492s
> > sys     0m35.720s
> >
> > insn-emit.ii
> > real    10m4.700s
> > user    8m32.536s
> > sys     1m32.120s
> > real    10m4.241s
> > user    8m31.456s
> > sys     1m32.728s
> > real    10m4.515s
> > user    8m32.056s
> > sys     1m32.396s
> >
> > mozilla.ii
> > real    7m58.018s
> > user    6m38.008s
> > sys     1m19.924s
> > real    7m59.269s
> > user    6m37.736s
> > sys     1m21.448s
> > real    7m58.254s
> > user    6m37.828s
> > sys     1m20.324s
> >
> > So, roughly that looks to me like a range from improving by .5% to
> > regressing by 1%.  I'm not sure what could cause an improvement, so I
> > kind of wonder how valid these results are.
> 
> Hmm, indeed.  The speedup looks suspicious.
> 
> > Another question is how one can refactor the output machinery to be
> > faster.  My first  thought is to buffer text internally before calling
> > stdio functions, but that seems like a giant job.
> 
> stdio functions are already buffering, so I don't know either.
> 
> But yes, going the libas route would improve things here, or for
> example enhancing gas to be able to eat target binary data
> without the need to encode it in printable characters...
> 
> .raw_data number-of-bytes
> 
> 
> Makes it quite unparsable to editors of course ...

A middle-ground might be to do both:

.raw_data number-of-bytes



> Richard.
> 
> > thanks!
> >
> > Trev
> >
> > far outside of noise,
> >>
> >> Richard.
> >>
> >> > ---
> >> >  gcc/config/arc/arc.h  |  3 +--
> >> >  gcc/config/bfin/bfin.h|  5 +
> >> >  gcc/config/frv/frv.h  |  6 +-
> >> >  gcc/config/ia64/ia64-protos.h |  1 +
> >> >  gcc/config/ia64/ia64.c| 11 +++
> >> >  gcc/config/ia64/ia64.h|  8 +---
> >> >  gcc/config/lm32/lm32.h|  3 +--
> >> >  gcc/config/mep/mep.h  |  8 +---
> >> >  gcc/config/mmix/mmix.h|  3 +--
> >> >  gcc/config/pa/pa-protos.h |  1 +
> >> >  gcc/config/pa/pa.c| 12 
> >> >  gcc/config/pa/pa.h|  9 +
> >> >  gcc/config/rs6000/rs6000-protos.h |  1 +
> >> >  gcc/config/rs6000/rs6000.c|  8 
> >> >  gcc/config/rs6000/xcoff.h |  3 +--
> >> >  gcc/config/spu/spu.h  |  3 +--
> >> >  gcc/config/visium/visium.h|  3 +--
> >> >  gcc/defaults.h|  6 +-
> >> >  gcc/output.h  |  3 +++
> >> >  gcc/varasm.c  |  9 +
> >> >  gcc/vmsdbgout.c   |  1 +
> >> >  gcc/xcoffout.c

Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread David Malcolm
On Wed, 2015-08-05 at 11:28 -0400, David Malcolm wrote:
> On Wed, 2015-08-05 at 13:47 +0200, Richard Biener wrote:
> > On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders  
> > wrote:
> > > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
> > >> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
> > >> > From: Trevor Saunders 
> > >> >
> > >> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
> > >> > config/ia64/ia64-protos.h, config/ia64/ia64.c, 
> > >> > config/ia64/ia64.h,
> > >> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
> > >> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, 
> > >> > config/spu/spu.h,
> > >> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to
> > >> > the name of a function.
> > >> > * output.h (default_output_label): New prototype.
> > >> > * varasm.c (default_output_label): New function.
> > >> > * vmsdbgout.c: Include tm_p.h.
> > >> > * xcoffout.c: Likewise.
> > >>
> > >> Just a general remark - the GCC output machinery is known to be slow,
> > >> adding indirect calls might be not the very best idea without refactoring
> > >> some of it.
> > >>
> > >> Did you do any performance measurements for artificial testcases
> > >> exercising the specific bits you change?
> > >
> > > sorry about the delay, but I finally got a chance to do some perf tests
> > > of the first patch.  I took three test cases fold-const.ii, insn-emit.ii
> > > and a random .i from firefox and did 3 trials of the length of 100
> > > compilations.  The only non default flag was -std=gnu++11.
> > >
> > > results before patch hookizing output_ascii
> > >
> > > fold-const.ii
> > > real    3m18.051s
> > > user    2m41.340s
> > > sys     0m36.544s
> > > real    3m18.141s
> > > user    2m42.236s
> > > sys     0m35.740s
> > > real    3m18.297s
> > > user    2m42.316s
> > > sys     0m35.804s
> > >
> > > insn-emit.ii
> > > real    9m58.229s
> > > user    8m26.960s
> > > sys     1m31.224s
> > > real    9m57.857s
> > > user    8m24.616s
> > > sys     1m33.072s
> > > real    9m57.922s
> > > user    8m25.232s
> > > sys     1m32.512s
> > >
> > > mozilla.ii
> > > real    8m5.732s
> > > user    6m44.888s
> > > sys     1m20.764s
> > > real    8m5.404s
> > > user    6m44.468s
> > > sys     1m20.856s
> > > real    7m59.197s
> > > user    6m39.632s
> > > sys     1m19.472s
> > >
> > > after patch
> > >
> > > fold-const.ii
> > > real    3m18.488s
> > > user    2m41.972s
> > > sys     0m36.388s
> > > real    3m18.215s
> > > user    2m41.640s
> > > sys     0m36.432s
> > > real    3m18.368s
> > > user    2m42.492s
> > > sys     0m35.720s
> > >
> > > insn-emit.ii
> > > real    10m4.700s
> > > user    8m32.536s
> > > sys     1m32.120s
> > > real    10m4.241s
> > > user    8m31.456s
> > > sys     1m32.728s
> > > real    10m4.515s
> > > user    8m32.056s
> > > sys     1m32.396s
> > >
> > > mozilla.ii
> > > real    7m58.018s
> > > user    6m38.008s
> > > sys     1m19.924s
> > > real    7m59.269s
> > > user    6m37.736s
> > > sys     1m21.448s
> > > real    7m58.254s
> > > user    6m37.828s
> > > sys     1m20.324s
> > >
> > > So, roughly that looks to me like a range from improving by .5% to
> > > regressing by 1%.  I'm not sure what could cause an improvement, so I
> > > kind of wonder how valid these results are.
> > 
> > Hmm, indeed.  The speedup looks suspicious.
> > 
> > > Another question is how one can refactor the output machinery to be
> > > faster.  My first  thought is to buffer text internally before calling
> > > stdio functions, but that seems like a giant job.
> > 
> > stdio functions are already buffering, so I don't know either.
> > 
> > But yes, going the libas route would improve things here, or for
> > example enhancing gas to be able to eat target binary data
> > without the need to encode it in printable characters...
> > 
> > .raw_data number-of-bytes
> > 
> > 
> > Makes it quite unparsable to editors of course ...
> 
> A middle-ground might be to do both:
> 
> .raw_data number-of-bytes
> 

Sorry, I hit "Send" too early; I meant something like this as a
middle-ground:

  .raw_data number-of-bytes
  

  ; comment giving the formatted text

so that cc1 etc are doing the formatting work to make the comment, so
that human readers can see what the raw data is meant to be, but the
assembler doesn't have to do work to parse it.

FWIW, I once had a go at hiding asm_out_file behind a class interface,
trying to build up higher-level methods on top of raw text printing.
Maybe that's a viable migration strategy  (I didn't finish that patch).

[...]

Dave
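
As an aside on the timings quoted earlier in this message, one way to turn the three "real" samples per test into a single percent change is to compare their means. The sketch below does that for two of the quoted tests. The conversion to seconds and the choice of the mean are my own assumptions, not something Trevor stated.

```c
#include <stddef.h>

/* Mean of N wall-clock samples, in seconds.  */
static double
sketch_mean (const double *s, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += s[i];
  return sum / (double) n;
}

/* Percent change of the mean from BEFORE to AFTER; positive is slower.  */
static double
sketch_pct_change (const double *before, const double *after, size_t n)
{
  double b = sketch_mean (before, n);
  return 100.0 * (sketch_mean (after, n) - b) / b;
}

/* "real" times from the quoted runs, converted to seconds.  */
static const double insn_emit_before[] = { 598.229, 597.857, 597.922 };
static const double insn_emit_after[]  = { 604.700, 604.241, 604.515 };
static const double mozilla_before[]   = { 485.732, 485.404, 479.197 };
static const double mozilla_after[]    = { 478.018, 479.269, 478.254 };
```

By this arithmetic the insn-emit.ii mean grows by roughly 1% and the mozilla.ii mean shrinks by roughly 1%. The third mozilla.ii sample before the patch is about six seconds faster than the other two, so the run-to-run noise looks comparable to the differences being measured.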



[committed] Add get_num_insn_codes to gensupport

2015-08-05 Thread Richard Sandiford
This patch adds a gensupport routine that generators can use to get
the number of unique INSN_CODEs, rather than having to track it
themselves.  This is needed for a later patch that changes the
way in which INSN_CODE is calculated.

Bootstrapped and regression-tested on x86_64-linux-gnu.

Thanks,
Richard

gcc/
* gensupport.h (get_num_insn_codes): Declare.
* gensupport.c (get_num_insn_codes): New function.
* genattrtab.c (optimize_attrs): Rename max_insn_code to
num_insn_codes.
(main): Likewise.  Use get_num_insn_codes.
* gencodes.c (main): Remove "last" and use get_num_insn_codes.

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 1f57c36..932b18b 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -2952,11 +2952,11 @@ get_attr_order (struct attr_desc ***ret)
 
 /* Optimize the attribute lists by seeing if we can determine conditional
values from the known values of other attributes.  This will save subroutine
-   calls during the compilation.  MAX_INSN_CODE is the number of unique
+   calls during the compilation.  NUM_INSN_CODES is the number of unique
instruction codes.  */
 
 static void
-optimize_attrs (int max_insn_code)
+optimize_attrs (int num_insn_codes)
 {
   struct attr_desc *attr;
   struct attr_value *av;
@@ -2975,7 +2975,7 @@ optimize_attrs (int max_insn_code)
 return;
 
   /* Make 2 extra elements, for "code" values -2 and -1.  */
-  insn_code_values = XCNEWVEC (struct attr_value_list *, max_insn_code + 2);
+  insn_code_values = XCNEWVEC (struct attr_value_list *, num_insn_codes + 2);
 
   /* Offset the table address so we can index by -2 or -1.  */
   insn_code_values += 2;
@@ -3003,7 +3003,7 @@ optimize_attrs (int max_insn_code)
   gcc_assert (iv == ivbuf + num_insn_ents);
 
   /* Process one insn code at a time.  */
-  for (i = -2; i < max_insn_code; i++)
+  for (i = -2; i < num_insn_codes; i++)
 {
   /* Clear the ATTR_CURR_SIMPLIFIED_P flag everywhere relevant.
 We use it to mean "already simplified for this insn".  */
@@ -5161,7 +5161,6 @@ main (int argc, char **argv)
   struct attr_desc *attr;
   struct insn_def *id;
   int i;
-  int max_insn_code = 0;
 
   progname = "genattrtab";
 
@@ -5224,14 +5223,11 @@ main (int argc, char **argv)
}
   if (GET_CODE (info.def) != DEFINE_ASM_ATTRIBUTES)
insn_index_number++;
-  max_insn_code = info.index;
 }
 
   if (have_error)
 return FATAL_EXIT_CODE;
 
-  max_insn_code++;
-
   /* If we didn't have a DEFINE_ASM_ATTRIBUTES, make a null one.  */
   if (! got_define_asm_attributes)
 {
@@ -5248,14 +5244,15 @@ main (int argc, char **argv)
 expand_delays ();
 
   /* Make `insn_alternatives'.  */
-  insn_alternatives = oballocvec (uint64_t, max_insn_code);
+  int num_insn_codes = get_num_insn_codes ();
+  insn_alternatives = oballocvec (uint64_t, num_insn_codes);
   for (id = defs; id; id = id->next)
 if (id->insn_code >= 0)
   insn_alternatives[id->insn_code]
= (((uint64_t) 1) << id->num_alternatives) - 1;
 
   /* Make `insn_n_alternatives'.  */
-  insn_n_alternatives = oballocvec (int, max_insn_code);
+  insn_n_alternatives = oballocvec (int, num_insn_codes);
   for (id = defs; id; id = id->next)
 if (id->insn_code >= 0)
   insn_n_alternatives[id->insn_code] = id->num_alternatives;
@@ -5284,7 +5281,7 @@ main (int argc, char **argv)
   make_length_attrs ();
 
   /* Perform any possible optimizations to speed up compilation.  */
-  optimize_attrs (max_insn_code);
+  optimize_attrs (num_insn_codes);
 
   /* Now write out all the `gen_attr_...' routines.  Do these before the
  special routines so that they get defined before they are used.  */
diff --git a/gcc/gencodes.c b/gcc/gencodes.c
index b9d65a2..c747891 100644
--- a/gcc/gencodes.c
+++ b/gcc/gencodes.c
@@ -49,8 +49,6 @@ gen_insn (md_rtx_info *info)
 int
 main (int argc, char **argv)
 {
-  int last = 1;
-
   progname = "gencodes";
 
   /* We need to see all the possibilities.  Elided insns may have
@@ -79,7 +77,6 @@ enum insn_code {\n\
   case DEFINE_INSN:
   case DEFINE_EXPAND:
gen_insn (&info);
-   last = info.index + 1;
break;
 
   default:
@@ -89,7 +86,7 @@ enum insn_code {\n\
   printf ("  LAST_INSN_CODE = %d\n\
 };\n\
 \n\
-#endif /* GCC_INSN_CODES_H */\n", last);
+#endif /* GCC_INSN_CODES_H */\n", get_num_insn_codes () - 1);
 
   if (ferror (stdout) || fflush (stdout) || fclose (stdout))
 return FATAL_EXIT_CODE;
diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index b7681a2..714af03 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -2602,6 +2602,14 @@ read_md_rtx (md_rtx_info *info)
   return true;
 }
 
+/* Return the number of possible INSN_CODEs.  Only meaningful once the
+   whole file has been processed.  */
+unsigned int
+get_num_insn_codes ()
+{
+  return sequence_num;
+}
+
 /* Helper functions for insn elision.  */
 
 /* Compute a hash function of a c_test structure, which is keyed
diff --git a/g

[committed] Add a get_c_test helper function to gensupport

2015-08-05 Thread Richard Sandiford
Add a new function to return the C condition that must hold for an
.md rtx to be valid.  This is needed by the next patch but is a minor
clean-up anyway.
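
As a rough illustration of the idea (not the gensupport.c code itself), the helper can be pictured as a single switch over the kind of definition.  The names below (def_kind, md_def, c_test_of) are hypothetical stand-ins for rtx codes and XSTR; the operand positions mirror the ones the callers in the diff used to open-code: operand 2 for define_insn and define_expand, operand 1 for define_split, define_peephole and define_peephole2.

```cpp
#include <array>
#include <string>

// Hypothetical stand-ins for the rtx codes that carry a C condition.
enum class def_kind { insn, expand, split, peephole, peephole2, other };

// Hypothetical, simplified definition: a kind plus its string operands.
struct md_def {
  def_kind kind;
  std::array<std::string, 3> str_operands;
};

// Return the C condition that must hold for DEF to be valid,
// or "" if this kind of definition has no condition.
inline std::string c_test_of (const md_def &def)
{
  switch (def.kind)
    {
    case def_kind::insn:
    case def_kind::expand:
      return def.str_operands[2];
    case def_kind::split:
    case def_kind::peephole:
    case def_kind::peephole2:
      return def.str_operands[1];
    default:
      return "";
    }
}
```

With one such function, callers no longer need to know which operand holds the condition for each kind of definition.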

Bootstrapped and regression-tested on x86_64-linux-gnu.

Thanks,
Richard

gcc/
* gensupport.h (get_c_test): Declare.
* gensupport.c (get_c_test): New function.
* genconditions.c (main): Use it.
* genrecog.c (validate_pattern): Likewise.
(match_pattern_1): Likewise.  Remove c_test argument.
(match_pattern): Update accordingly and remove c_test argument.
(main): Update accordingly.

diff --git a/gcc/genconditions.c b/gcc/genconditions.c
index 23109ee..001e58e 100644
--- a/gcc/genconditions.c
+++ b/gcc/genconditions.c
@@ -222,25 +222,17 @@ main (int argc, char **argv)
   while (read_md_rtx (&info))
 {
   rtx def = info.def;
-  /* N.B. define_insn_and_split, define_cond_exec are handled
-entirely within read_md_rtx; we never see them.  */
+  add_c_test (get_c_test (def), -1);
   switch (GET_CODE (def))
{
case DEFINE_INSN:
case DEFINE_EXPAND:
- add_c_test (XSTR (def, 2), -1);
  /* except.h needs to know whether there is an eh_return
 pattern in the machine description.  */
  if (!strcmp (XSTR (def, 0), "eh_return"))
saw_eh_return = 1;
  break;
 
-   case DEFINE_SPLIT:
-   case DEFINE_PEEPHOLE:
-   case DEFINE_PEEPHOLE2:
- add_c_test (XSTR (def, 1), -1);
- break;
-
default:
  break;
}
diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 4275bd2..599121f 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -519,10 +519,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, 
int set_code)
const struct pred_data *pred;
const char *c_test;
 
-   if (GET_CODE (info->def) == DEFINE_INSN)
- c_test = XSTR (info->def, 2);
-   else
- c_test = XSTR (info->def, 1);
+   c_test = get_c_test (info->def);
 
if (pred_name[0] != 0)
  {
@@ -4080,13 +4077,13 @@ match_pattern_2 (state *s, md_rtx_info *info, position 
*pos, rtx pattern)
 
(1) the rtx doesn't match anything already matched by S
(2) the rtx matches TOP_PATTERN and
-   (3) C_TEST is true.
+   (3) the C test required by INFO->def is true
 
For peephole2, TOP_PATTERN is a SEQUENCE of the instruction patterns
to match, otherwise it is a single instruction pattern.  */
 
 static void
-match_pattern_1 (state *s, md_rtx_info *info, rtx pattern, const char *c_test,
+match_pattern_1 (state *s, md_rtx_info *info, rtx pattern,
 acceptance_type acceptance)
 {
   if (acceptance.type == PEEPHOLE2)
@@ -4120,6 +4117,7 @@ match_pattern_1 (state *s, md_rtx_info *info, rtx 
pattern, const char *c_test,
 }
 
   /* Make sure that the C test is true.  */
+  const char *c_test = get_c_test (info->def);
   if (maybe_eval_c_test (c_test) != 1)
 s = add_decision (s, rtx_test::c_test (c_test), true, false);
 
@@ -4132,7 +4130,7 @@ match_pattern_1 (state *s, md_rtx_info *info, rtx 
pattern, const char *c_test,
backtracking.  */
 
 static void
-match_pattern (state *s, md_rtx_info *info, rtx pattern, const char *c_test,
+match_pattern (state *s, md_rtx_info *info, rtx pattern,
   acceptance_type acceptance)
 {
   if (merge_states_p)
@@ -4140,11 +4138,11 @@ match_pattern (state *s, md_rtx_info *info, rtx 
pattern, const char *c_test,
   state root;
   /* Add the decisions to a fresh state and then merge the full tree
 into the existing one.  */
-  match_pattern_1 (&root, info, pattern, c_test, acceptance);
+  match_pattern_1 (&root, info, pattern, acceptance);
   merge_into_state (s, &root);
 }
   else
-match_pattern_1 (s, info, pattern, c_test, acceptance);
+match_pattern_1 (s, info, pattern, acceptance);
 }
 
 /* Begin the output file.  */
@@ -5256,15 +5254,13 @@ main (int argc, char **argv)
acceptance.u.full.u.num_clobbers = 0;
pattern = add_implicit_parallel (XVEC (def, 1));
validate_pattern (pattern, &info, NULL_RTX, 0);
-   match_pattern (&insn_root, &info, pattern,
-  XSTR (def, 2), acceptance);
+   match_pattern (&insn_root, &info, pattern, acceptance);
 
/* If the pattern is a PARALLEL with trailing CLOBBERs,
   allow recog_for_combine to match without the clobbers.  */
if (GET_CODE (pattern) == PARALLEL
&& remove_clobbers (&acceptance, &pattern))
- match_pattern (&insn_root, &info, pattern,
-XSTR (def, 2), acceptance);
+ match_pattern (&insn_root, &info, pattern, acceptance);
break;
  }
 
@@ -5272,8 +5268,7 @@ main (int argc, char **argv)
  acceptance.type = SPLIT;
  pattern = add_implicit_parallel (XVEC (def, 0));
  validate_pattern (pattern, &i

RE: Regression in target MIC compiler

2015-08-05 Thread Thomas Schwinge
Hi!

On Wed, 5 Aug 2015 15:10:40 +0100, David Sherwood  
wrote:
> In lto_input_mode_table there is the following line of code: [...]

Thanks!  That's not exactly it, but you put me on the right track.
Testing a patch.


Grüße,
 Thomas




[committed] Reduce size of insn_data

2015-08-05 Thread Richard Sandiford
At the moment insn_data contains entries for define_splits and
define_peephole2s "only for reasons of consistency and to simplify
genrecog".  We don't need that any more and it just adds bloat.
This patch instead uses separate counters for define_split and
define_peephole2.

It also removes a redundant DEFINE_SUBST case in read_md_rtx.
define_subst is a preprocessing directive and doesn't get passed
back to generators.
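
A minimal sketch of the numbering scheme, assuming a simplified model in which each definition kind draws its index from its own counter.  The type and member names (sequence_counters, next, num_insn_codes) are made up for illustration; only the "start at 1" convention and the three-counter split come from the patch.

```cpp
// Hypothetical, simplified model of the per-kind numbering.
enum class def_kind { insn, expand, split, peephole2 };

struct sequence_counters {
  // Starts at 1 so that 0 stays available for CODE_FOR_nothing.
  int insn = 1;
  // Not used as array indices, so these could start anywhere;
  // 1 is used for symmetry.
  int split = 1;
  int peephole2 = 1;

  // Hand out the next index for KIND from the matching counter.
  int next (def_kind kind)
  {
    switch (kind)
      {
      case def_kind::split:
	return split++;
      case def_kind::peephole2:
	return peephole2++;
      default:
	return insn++;
      }
  }

  // Number of possible insn codes; this bounds an insn_data-like table.
  int num_insn_codes () const { return insn; }
};
```

In this model, splits and peephole2s no longer advance the insn counter, so a table sized by num_insn_codes () has no slots reserved for them.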

Bootstrapped and regression-tested on x86_64-linux-gnu.

Thanks,
Richard

gcc/
* gensupport.c (sequence_num): Replace with...
(insn_sequence_num, split_sequence_num, peephole2_sequence_num):
...these new variables.
(init_rtx_reader_args_cb): Update accordingly.
(get_num_insn_codes): Likewise.
(read_md_rtx): Rework to use a while loop and get_c_test.
Use the new counters.  Remove redundant DEFINE_SUBST case.
* genoutput.c (gen_split): Delete.
(main): Don't call it.

diff --git a/gcc/genoutput.c b/gcc/genoutput.c
index cd7f129..ed9540d 100644
--- a/gcc/genoutput.c
+++ b/gcc/genoutput.c
@@ -973,46 +973,6 @@ gen_expand (md_rtx_info *info)
   place_operands (d);
 }
 
-/* Process a define_split just read.  Assign its code number,
-   only for reasons of consistency and to simplify genrecog.  */
-
-static void
-gen_split (md_rtx_info *info)
-{
-  struct pattern_stats stats;
-  data *d = new data;
-  int i;
-
-  d->code_number = info->index;
-  d->loc = info->loc;
-  d->name = 0;
-
-  /* Build up the list in the same order as the insns are seen
- in the machine description.  */
-  d->next = 0;
-  *idata_end = d;
-  idata_end = &d->next;
-
-  memset (d->operand, 0, sizeof (d->operand));
-
-  /* Get the number of operands by scanning all the patterns of the
- split patterns.  But ignore all the rest of the information thus
- obtained.  */
-  rtx split = info->def;
-  for (i = 0; i < XVECLEN (split, 0); i++)
-scan_operands (d, XVECEXP (split, 0, i), 0, 0);
-
-  get_pattern_stats (&stats, XVEC (split, 0));
-  d->n_generator_args = 0;
-  d->n_operands = stats.num_insn_operands;
-  d->n_dups = 0;
-  d->n_alternatives = 0;
-  d->template_code = 0;
-  d->output_format = INSN_OUTPUT_FORMAT_NONE;
-
-  place_operands (d);
-}
-
 static void
 init_insn_for_nothing (void)
 {
@@ -1055,11 +1015,6 @@ main (int argc, char **argv)
gen_expand (&info);
break;
 
-  case DEFINE_SPLIT:
-  case DEFINE_PEEPHOLE2:
-   gen_split (&info);
-   break;
-
   case DEFINE_CONSTRAINT:
   case DEFINE_REGISTER_CONSTRAINT:
   case DEFINE_ADDRESS_CONSTRAINT:
diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 6870058..9e00f13 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -43,11 +43,14 @@ int insn_elision = 1;
 static struct obstack obstack;
 struct obstack *rtl_obstack = &obstack;
 
-/* Counter for patterns that generate code: define_insn, define_expand,
-   define_split, define_peephole, and define_peephole2.  See read_md_rtx().
-   Any define_insn_and_splits are already in separate queues so that the
-   insn and the splitter get a unique number also.  */
-static int sequence_num;
+/* Counter for named patterns and INSN_CODEs.  */
+static int insn_sequence_num;
+
+/* Counter for define_splits.  */
+static int split_sequence_num;
+
+/* Counter for define_peephole2s.  */
+static int peephole2_sequence_num;
 
 static int predicable_default;
 static const char *predicable_true;
@@ -2504,7 +2507,11 @@ init_rtx_reader_args_cb (int argc, char **argv,
   obstack_init (rtl_obstack);
 
   /* Start at 1, to make 0 available for CODE_FOR_nothing.  */
-  sequence_num = 1;
+  insn_sequence_num = 1;
+
+  /* These sequences are not used as indices, so can start at 1 also.  */
+  split_sequence_num = 1;
+  peephole2_sequence_num = 1;
 
   read_md_files (argc, argv, parse_opt, rtx_handle_directive);
 
@@ -2539,30 +2546,8 @@ init_rtx_reader_args (int argc, char **argv)
 bool
 read_md_rtx (md_rtx_info *info)
 {
-  struct queue_elem **queue, *elem;
-  rtx desc;
-
- discard:
-
-  /* Read all patterns from a given queue before moving on to the next.  */
-  if (define_attr_queue != NULL)
-queue = &define_attr_queue;
-  else if (define_pred_queue != NULL)
-queue = &define_pred_queue;
-  else if (define_insn_queue != NULL)
-queue = &define_insn_queue;
-  else if (other_queue != NULL)
-queue = &other_queue;
-  else
-return false;
-
-  elem = *queue;
-  *queue = elem->next;
-  info->def = elem->data;
-  info->loc = elem->loc;
-  info->index = sequence_num;
-
-  free (elem);
+  int truth, *counter;
+  rtx def;
 
   /* Discard insn patterns which we know can never match (because
  their C test is provably always false).  If insn_elision is
@@ -2570,35 +2555,70 @@ read_md_rtx (md_rtx_info *info)
  elided patterns are never counted by the sequence numbering; it
  is the caller's responsibility, when insn_elision is false, not
  to use elided pattern numbers for anything.  */
-  desc = inf

Re: Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-08-05 Thread Jiong Wang


On 28/07/15 16:44, Martin Sebor wrote:


Attached is an updated patch with the changes above.



gcc/testsuite/ChangeLog
2015-07-28  Martin Sebor

* g++.dg/Wframe-address-in-Wall.C: New test.
* g++.dg/Wframe-address.C: New test.
* g++.dg/Wno-frame-address.C: New test.
* gcc.dg/Wframe-address-in-Wall.c: New test.
* gcc.dg/Wframe-address.c: New test.
* gcc.dg/Wno-frame-address.c: New test.

I noticed that the newly added "Wno-frame-address.c" fails in native testing
on arm-none-linux-gnueabihf.

From the comments in the testcase:

+/* Verify that -Wframe-address is not enabled by default by enabling
+   -Werror and verifying the test still compiles.  */

It seems you want to make sure -Wframe-address works correctly with -Werror.
On arm, however, the return_address hook is defined to support only level 0
and returns NULL_RTX for all other levels, so Wno-frame-address.c fails at
the tem != NULL checks for builtin_return_address.

Regards,
Jiong



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-08-05 Thread Alan Lawrence

Richard Biener wrote:


Furthermore it doesn't work for three such ops which would require
an additional pattern like

 (simplify
  (bit_and:c (op @0 (min @1 @2)) (op @0 @3))
  (op @0 (min (min @1 @2) @3)))

if that's profitable?


Shouldn't that be just a case of binding @1 in the original pattern:


+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))

>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>> +(op @0 (min @1 @2)

to (min @1 @2) in your three-way pattern, @2 in the original to @3 in yours, and 
@0 to @0 ?


Or is @1 not allowed to bind to a 'min' ?
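
To make the arithmetic behind the question concrete, here is a small sketch that only checks the identity on plain ints.  It says nothing about whether the match.pd matcher actually lets @1 bind to a min expression; the function names are invented for the illustration.

```cpp
#include <algorithm>

// Two-way form from the quoted pattern:
// (x < a && x < b) rewritten as x < min (a, b).
inline bool two_way (int x, int a, int b)
{
  return x < std::min (a, b);
}

// Three comparisons written out directly.
inline bool three_way_plain (int x, int a, int b, int c)
{
  return x < a && x < b && x < c;
}

// The same result obtained by reusing the two-way form with
// min (a, b) in the position of its first bound.
inline bool three_way_nested (int x, int a, int b, int c)
{
  return two_way (x, std::min (a, b), c);
}
```

If the binding is allowed, applying the two-way rewrite twice would give the nested min without a dedicated three-way pattern.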


Secondly: should/can Michael's fix reference PR57600 ?

Cheers,
Alan



Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-08-05 Thread Jeff Law

On 08/05/2015 10:07 AM, Alan Lawrence wrote:

Richard Biener wrote:


Furthermore it doesn't work for three such ops which would require
an additional pattern like

 (simplify
  (bit_and:c (op @0 (min @1 @2)) (op @0 @3))
  (op @0 (min (min @1 @2) @3)))

if that's profitable?


Shouldn't that be just a case of binding @1 in the original pattern:


+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))

 >> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
 >> +(op @0 (min @1 @2)

to (min @1 @2) in your three-way pattern, @2 in the original to @3 in
yours, and @0 to @0 ?

Or is @1 not allowed to bind to a 'min' ?


Secondly: should/can Michael's fix reference PR57600 ?

It probably should.

Reading that BZ ISTM that we probably are missing some expression 
equivalences in DOM.  Given


x = min (a, b)

We should be recording
x <= a -> true
x <= b -> true

in our expression equivalency tables (plus all the related 
equivalences).  I'm not sure if that's the cause of the missed 
optimizations or not though.  Just something I noticed while reading the BZ.
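
A toy sketch of the kind of recording described above, assuming a table keyed by the textual form of an expression.  This is purely illustrative: known_true_table and its members are hypothetical, and DOM's real tables work on GIMPLE expressions, not strings.

```cpp
#include <set>
#include <string>

// Hypothetical toy table of expressions known to be true.
struct known_true_table {
  std::set<std::string> facts;

  // Record the facts implied by the assignment  X = min (A, B).
  void record_min (const std::string &x, const std::string &a,
		   const std::string &b)
  {
    facts.insert (x + " <= " + a);
    facts.insert (x + " <= " + b);
  }

  // Return true if EXPR was recorded as known to be true.
  bool known_true (const std::string &expr) const
  {
    return facts.count (expr) != 0;
  }
};
```

A later comparison such as x <= a could then be folded to true by a lookup instead of being re-evaluated.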


jeff


Re: [PATCH][AARCH64]Add backend combine_bfi pattern.

2015-08-05 Thread Renlin Li

Hi Kyrill,

On 30/07/15 17:08, Kyrill Tkachov wrote:

Hi Renlin,

On 30/07/15 16:50, Renlin Li wrote:

Hi all,

This insn should match the following similar rtx pattern and remove the
redundant zero_extend operation if the width of zero_extract and
inner-size of zero_extend totally match.

(set (zero_extract:SI (reg/i:SI 0 x0)
 (const_int 8 [0x8])
 (const_int 0 [0]))
(zero_extend:SI (reg:QI 1 x1 [ y ])))
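
For readers less familiar with the operation, the following sketch models a 32-bit bitfield insert in plain C++ and shows the source shape that corresponds to the rtx above (an 8-bit value inserted at bit 0).  The function names are made up for illustration; this is a model of the semantics, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Copy the low WIDTH bits of SRC into DST starting at bit LSB,
// leaving all other bits of DST untouched.
inline std::uint32_t bitfield_insert (std::uint32_t dst, std::uint32_t src,
				      unsigned lsb, unsigned width)
{
  assert (width >= 1 && lsb + width <= 32);
  std::uint32_t field = width == 32 ? 0xffffffffu : ((1u << width) - 1u);
  std::uint32_t mask = field << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}

// Source-level shape matching the rtx: the zero-extended byte Y
// replaces the low 8 bits of X.
inline std::uint32_t insert_low_byte (std::uint32_t x, std::uint8_t y)
{
  return (x & ~0xffu) | y;
}
```

Because the inserted field is exactly as wide as the zero-extended source, the extension contributes no bits that survive the insert, which is why it is redundant.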


Regression tests on aarch64-none-elf are okay. Okay to commit?

Regards,
Renlin

gcc/ChangeLog:

2015-07-30  Renlin Li  

   * config/aarch64/aarch64.md (combine_bfi): New pattern.

gcc/testsuite/ChangeLog:

2015-07-30  Renlin Li  

   * gcc.target/aarch64/combine-bfi.c: New.

+(define_insn "*combine_bfi"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+ (match_operand 1 "const_int_operand" "n")
+ (match_operand 2 "const_int_operand" "n"))
+   (zero_extend:GPI (match_operand:ALLX 3  "register_operand" "r")))]
+  "UINTVAL (operands[1]) == "
+  "bfi\\t%0, %3, %2, %1"
+  [(set_attr "type" "bfm")]
+)

I notice we don't have any other patterns in aarch64 that start with combine_*.
Would it be better to name them something like 
"*aarch64_bfi4" instead?

Thanks for the suggestion. I have adjusted the patch accordingly.

Regards,
Renlin



gcc/ChangeLog:

2015-08-05  Renlin Li  

* config/aarch64/aarch64.md
  (*aarch64_bfi4): New pattern.

gcc/testsuite/ChangeLog:

2015-08-05  Renlin Li  

* gcc.target/aarch64/combine-bfi.c: New.



Kyrill


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1dbadc0..858fe77 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3903,6 +3903,16 @@
   [(set_attr "type" "bfm")]
 )
 
+(define_insn "*aarch64_bfi4"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(zero_extend:GPI (match_operand:ALLX 3  "register_operand" "r")))]
+  "UINTVAL (operands[1]) == "
+  "bfi\\t%0, %3, %2, %1"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*extr_insv_lower_reg"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
 			  (match_operand 1 "const_int_operand" "n")
diff --git a/gcc/testsuite/gcc.target/aarch64/combine-bfi.c b/gcc/testsuite/gcc.target/aarch64/combine-bfi.c
new file mode 100644
index 000..06331f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine-bfi.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+int
+f1 (int x, int y)
+{
+  return (x & ~0x000) | ((y << 8) & 0x000);
+}
+
+int
+f2 (int x, int y)
+{
+  return (x & ~0x0ff000) | ((y & 0x0ff) << 12);
+}
+
+int
+f3 (int x, int y)
+{
+  return (x & ~0x) | (y & 0x);
+}
+
+int
+f4 (int x, int y)
+{
+  return (x & ~0xff) | (y & 0xff);
+}
+
+long
+f5 (long x, long y)
+{
+  return (x & ~0xull) | (y & 0x);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\*aarch64_bfi" 5 "combine" } } */


Re: Fix reload1.c warning for some targets

2015-08-05 Thread Jeff Law

On 08/05/2015 08:18 AM, Richard Sandiford wrote:

Building some targets results in a warning about orig_dup[i] potentially
being used uninitialised.  I think the warning is fair, since it isn't
obvious that the recog_data-based loop bound remains unchanged between:

   for (i = 0; i < recog_data.n_dups; i++)
 orig_dup[i] = *recog_data.dup_loc[i];

and:

   for (i = 0; i < recog_data.n_dups; i++)
 *recog_data.dup_loc[i] = orig_dup[i];

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard

gcc/
* reload1.c (elimination_costs_in_insn): Make it obvious to the
compiler that the n_dups and n_operands loop bounds are invariant.

There's a BZ about this issue.  55035.

What I want is to make sure we don't lose track of the false positive in 
55035 (caused by a missed jump thread due to aliasing issues).


So perhaps the way forward is to install your change and twiddle the 
summary of 55035 in some way that makes it more obvious the bz tracks a 
false positive from -Wuninitialized and attach 55035 to the 
-Wuninitialized meta bug (24639).


Does that work for you?

Jeff



Re: RFA: RL78: Fix multiply costs when optimizing for size

2015-08-05 Thread DJ Delorie

>   OK to apply ?

Ok.  Thanks!

> gcc/ChangeLog
> 2015-08-05  Nick Clifton  
> 
>   * config/rl78/rl78.c (rl78_rtx_costs): Treat MULT insns as cheap
>   if optimizing for size.
> 
> Index: gcc/config/rl78/rl78.c
> ===
> RCS file: /cvs/cvsfiles/gnupro/gcc/config/rl78/rl78.c,v
> retrieving revision 1.12.6.15
> diff -u -3 -p -r1.12.6.15 rl78.c
> --- gcc/config/rl78/rl78.c29 Jul 2015 12:24:04 -  1.12.6.15
> +++ gcc/config/rl78/rl78.c30 Jul 2015 15:20:10 -
> @@ -4161,7 +4161,9 @@ static bool rl78_rtx_costs (rtx   x,
>switch (code)
>   {
>   case MULT:
> -   if (RL78_MUL_G14)
> +   if (! speed)
> + * total = COSTS_N_INSNS (5);
> +   else if (RL78_MUL_G14)
>   *total = COSTS_N_INSNS (14);
> else if (RL78_MUL_G13)
>   *total = COSTS_N_INSNS (29);
> 


Re: Fix reload1.c warning for some targets

2015-08-05 Thread Richard Sandiford
Jeff Law  writes:
> On 08/05/2015 08:18 AM, Richard Sandiford wrote:
>> Building some targets results in a warning about orig_dup[i] potentially
>> being used uninitialised.  I think the warning is fair, since it isn't
>> obvious that the recog_data-based loop bound remains unchanged between:
>>
>>for (i = 0; i < recog_data.n_dups; i++)
>>  orig_dup[i] = *recog_data.dup_loc[i];
>>
>> and:
>>
>>for (i = 0; i < recog_data.n_dups; i++)
>>  *recog_data.dup_loc[i] = orig_dup[i];
>>
>> Tested on x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>> gcc/
>>  * reload1.c (elimination_costs_in_insn): Make it obvious to the
>>  compiler that the n_dups and n_operands loop bounds are invariant.
> There's a BZ about this issue.  55035.
>
> What I want is to make sure we don't lose track of the false positive in 
> 55035 (caused by a missed jump thread due to aliasing issues).
>
> So perhaps the way forward is to install your change and twiddle the 
> summary of 55035 in some way that makes it more obvious the bz tracks a 
> false positive from -Wuninitialized and attach 55035 to the 
> -Wuninitialized meta bug (24639).

Is it really a false positive in this case though?  We have:

  for (i = 0; i < recog_data.n_dups; i++)
orig_dup[i] = *recog_data.dup_loc[i];

  for (i = 0; i < recog_data.n_operands; i++)
{
  orig_operand[i] = recog_data.operand[i];

  /* For an asm statement, every operand is eliminable.  */
  if (insn_is_asm || insn_data[icode].operand[i].eliminable)
{
  bool is_set_src, in_plus;

  /* Check for setting a register that we know about.  */
  if (recog_data.operand_type[i] != OP_IN
  && REG_P (orig_operand[i]))
{
  /* If we are assigning to a register that can be eliminated, it
 must be as part of a PARALLEL, since the code above handles
 single SETs.  We must indicate that we can no longer
 eliminate this reg.  */
  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
   ep++)
if (ep->from_rtx == orig_operand[i])
  ep->can_eliminate = 0;
}

  /* Companion to the above plus substitution, we can allow
 invariants as the source of a plain move.  */
  is_set_src = false;
  if (old_set && recog_data.operand_loc[i] == &SET_SRC (old_set))
is_set_src = true;
  if (is_set_src && !sets_reg_p)
note_reg_elim_costly (SET_SRC (old_set), insn);
  in_plus = false;
  if (plus_src && sets_reg_p
  && (recog_data.operand_loc[i] == &XEXP (plus_src, 0)
  || recog_data.operand_loc[i] == &XEXP (plus_src, 1)))
in_plus = true;

  eliminate_regs_1 (recog_data.operand[i], VOIDmode,
NULL_RTX,
is_set_src || in_plus, true);
  /* Terminate the search in check_eliminable_occurrences at
 this point.  */
  *recog_data.operand_loc[i] = 0;
}
}

  for (i = 0; i < recog_data.n_dups; i++)
*recog_data.dup_loc[i]
  = *recog_data.operand_loc[(int) recog_data.dup_num[i]];

  /* If any eliminable remain, they aren't eliminable anymore.  */
  check_eliminable_occurrences (old_body);

  /* Restore the old body.  */
  for (i = 0; i < recog_data.n_operands; i++)
*recog_data.operand_loc[i] = orig_operand[i];
  for (i = 0; i < recog_data.n_dups; i++)
*recog_data.dup_loc[i] = orig_dup[i];

and I don't see how GCC could prove that eliminate_regs_1 doesn't
modify the value of recog_data.n_dups between the two loops.
eliminate_regs_1 calls functions like plus_constant that are defined
outside the TU and that certainly aren't pure/const.
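
A minimal sketch of the hazard and of one way to make the bound visibly invariant, assuming the fix amounts to reading the bound once into a local (the actual reload1.c change is not shown in this message).  recog_like, g_data and opaque_work are hypothetical stand-ins for recog_data and for a call such as eliminate_regs_1.

```cpp
// Hypothetical stand-in for recog_data: global state that any
// external call may modify.
struct recog_like { int n_dups; int dup[8]; };
static recog_like g_data = { 2, { 10, 20, 30 } };

// Stand-in for a call the compiler cannot see through.  Here it
// really does change n_dups, to show why re-reading the bound after
// the call would be unsafe.
static void opaque_work () { g_data.n_dups = 3; }

// Save, clobber and restore the first n_dups entries.  The bound is
// read once, so both loops provably cover the same range.
static int restore_with_cached_bound ()
{
  const int n_dups = g_data.n_dups;
  int saved[8];
  for (int i = 0; i < n_dups; i++)
    saved[i] = g_data.dup[i];
  for (int i = 0; i < n_dups; i++)
    g_data.dup[i] = 0;
  opaque_work ();
  for (int i = 0; i < n_dups; i++)
    g_data.dup[i] = saved[i];
  return n_dups;
}
```

Had the restore loop re-read g_data.n_dups after opaque_work (), it would have copied saved[2], which was never written.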

So I think c#5 (marked as a bogus reduction) is an accurate reflection
of what reload1.c does.  c#4 looks like a genuine bug but seems different
from the reload1.c case.  If we still warn for c#4 then I think we
should keep the bugzilla entry open for that, but the warning for the
reload1.c code seems justified.  Perhaps the question is why it doesn't
trigger on more targets :-)

Thanks,
Richard



C++ PATCH for c++/65195 (variable template of reference type)

2015-08-05 Thread Jason Merrill

We were missing a convert_from_reference.
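
For context, a small self-contained sketch of the construct involved: a variable template whose specializations are references, used directly as the callee.  The names (counter, storage, ref_to) are invented here and are not taken from the testcase; the testcase's own comment records that the direct call was rejected with "cannot be used as a function".

```cpp
// A callable type, so that a reference to it can be "called".
struct counter {
  int calls = 0;
  int operator() () { return ++calls; }
};

// One object per type.
template <typename T>
T storage {};

// Variable template of reference type, bound to the object above.
template <typename T>
T &ref_to = storage<T>;

// Call through the reference without first copying it into a
// named local reference.
inline int call_through_ref ()
{
  return ref_to<counter> ();
}
```

The expression ref_to<counter> has to be treated as an lvalue of type counter, not as a bare reference, before the call operator can be looked up.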

Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit f1f2cd02b7e4549657cf6eeab1d7eae0466ac7a8
Author: Jason Merrill 
Date:   Wed Aug 5 11:46:48 2015 -0400

	PR c++/65195
	PR c++/66619
	* semantics.c (finish_id_expression): Call convert_from_reference
	for variable template.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 44f9f7a..d42838e 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3564,6 +3564,7 @@ finish_id_expression (tree id_expression,
 	{
 	  decl = finish_template_variable (decl);
 	  mark_used (decl);
+	  decl = convert_from_reference (decl);
 	}
   else if (scope)
 	{
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ37.C b/gcc/testsuite/g++.dg/cpp1y/var-templ37.C
new file mode 100644
index 000..11021a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ37.C
@@ -0,0 +1,23 @@
+// PR c++/65195
+// { dg-do compile { target c++14 } }
+
+template
+T constant {};
+
+template
+struct foo {
+int operator()() const
+{ return 3; }
+};
+
+template
+auto& f = constant>;
+
+int main()
+{
+// fine
+auto& ref = f; ref();
+
+// error: f cannot be used as a function
+f();
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ40.C b/gcc/testsuite/g++.dg/cpp1y/var-templ40.C
new file mode 100644
index 000..0a952c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ40.C
@@ -0,0 +1,9 @@
+// PR c++/66619
+// { dg-do compile { target c++14 } }
+
+int y;
+template T val1 = y;
+auto&& x1 = val1;
+
+template T val2 = 0;
+auto&& x2 = val2;


C++ PATCH for c++/66260 (various variable template specialization issues)

2015-08-05 Thread Jason Merrill
The fundamental problem here was that I was setting the type of a 
variable TEMPLATE_ID_EXPR to unknown_type_node, which is not dependent, 
so places such as finish_id_expression that check dependent_type_p to 
determine whether an expression is type-dependent were getting the wrong 
answer.


Fixing that means that variable TEMPLATE_ID_EXPRs survive until they 
become non-dependent, and so we need to handle them in 
tsubst_copy_and_build.


We were also using the wrong arguments for a partial specialization in 
instantiate_decl because we forgot to update DECL_TI_ARGS when we 
figured out the correct partial specialization in instantiate_template_1.
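
A short sketch of the two situations described, using invented names (pointer_depth, depth_of): a variable template-id that stays dependent until the enclosing template is instantiated, and a partial specialization that has to be instantiated with its own arguments (T deduced from T *) and not with those of the primary template.

```cpp
// Primary variable template.
template <typename T>
constexpr int pointer_depth = 0;

// Partial specialization: here T is the pointee type, so the
// arguments used to instantiate the initializer differ from the
// arguments written at the point of use.
template <typename T>
constexpr int pointer_depth<T *> = 1 + pointer_depth<T>;

template <typename T>
struct depth_of {
  // pointer_depth<T> is a dependent variable template-id; it can
  // only be resolved once T is known.
  static constexpr int value = pointer_depth<T>;
};
```

For pointer_depth<int **>, the specialization is instantiated with T = int *, which in turn selects it again with T = int.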


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit 2b427166d914064eaf5d20c0c293ec5e7dcf0e89
Author: Jason Merrill 
Date:   Wed Aug 5 12:02:11 2015 -0400

	PR c++/66260
	PR c++/66596
	PR c++/66649
	PR c++/66923
	* pt.c (lookup_template_variable): Use NULL_TREE for type.
	(instantiate_template_1): Also set DECL_TI_ARGS based on
	the immediate parent.
	(tsubst_copy_and_build) [TEMPLATE_ID_EXPR]: Handle variable templates.
	(finish_template_variable): Add complain parm.
	* cp-tree.h: Adjust.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 63fd6e9..e70dcb4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5954,7 +5954,7 @@ extern tree perform_koenig_lookup		(tree, vec *,
 		 tsubst_flags_t);
 extern tree finish_call_expr			(tree, vec **, bool,
 		 bool, tsubst_flags_t);
-extern tree finish_template_variable	(tree);
+extern tree finish_template_variable		(tree, tsubst_flags_t = tf_warning_or_error);
 extern tree finish_increment_expr		(tree, enum tree_code);
 extern tree finish_this_expr			(void);
 extern tree finish_pseudo_destructor_expr   (tree, tree, tree, location_t);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7ad2334..f8c123c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8187,14 +8187,14 @@ lookup_template_class (tree d1, tree arglist, tree in_decl, tree context,
 tree
 lookup_template_variable (tree templ, tree arglist)
 {
-  tree type = unknown_type_node;
+  tree type = NULL_TREE;
   return build2 (TEMPLATE_ID_EXPR, type, templ, arglist);
 }
 
 /* Instantiate a variable declaration from a TEMPLATE_ID_EXPR for use. */
 
 tree
-finish_template_variable (tree var)
+finish_template_variable (tree var, tsubst_flags_t complain)
 {
   tree templ = TREE_OPERAND (var, 0);
 
@@ -8203,7 +8203,6 @@ finish_template_variable (tree var)
   arglist = add_outermost_template_args (tmpl_args, arglist);
 
   tree parms = DECL_TEMPLATE_PARMS (templ);
-  tsubst_flags_t complain = tf_warning_or_error;
   arglist = coerce_innermost_template_parms (parms, arglist, templ, complain,
 	 /*req_all*/true,
 	 /*use_default*/true);
@@ -14750,6 +14749,17 @@ tsubst_copy_and_build (tree t,
 	if (targs)
 	  targs = tsubst_template_args (targs, args, complain, in_decl);
 
+	if (variable_template_p (templ))
+	  {
+	templ = lookup_template_variable (templ, targs);
+	if (!any_dependent_template_arguments_p (targs))
+	  {
+		templ = finish_template_variable (templ, complain);
+		mark_used (templ);
+	  }
+	RETURN (convert_from_reference (templ));
+	  }
+
 	if (TREE_CODE (templ) == COMPONENT_REF)
 	  {
 	object = TREE_OPERAND (templ, 0);
@@ -16153,6 +16163,7 @@ instantiate_template_1 (tree tmpl, tree orig_args, tsubst_flags_t complain)
   /* The DECL_TI_TEMPLATE should always be the immediate parent
  template, not the most general template.  */
   DECL_TI_TEMPLATE (fndecl) = tmpl;
+  DECL_TI_ARGS (fndecl) = targ_ptr;
 
   /* Now we know the specialization, compute access previously
  deferred.  */
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ32.C b/gcc/testsuite/g++.dg/cpp1y/var-templ32.C
index d9d2fff..80077a1 100644
--- a/gcc/testsuite/g++.dg/cpp1y/var-templ32.C
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ32.C
@@ -4,4 +4,4 @@ template
 bool V1 = true;
 
 template
-bool V1 = false; // { dg-error "primary template" }
+bool V1 = false; // { dg-error "primary template|not deducible" }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ35.C b/gcc/testsuite/g++.dg/cpp1y/var-templ35.C
index c2c58ac..5ed0abc 100644
--- a/gcc/testsuite/g++.dg/cpp1y/var-templ35.C
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ35.C
@@ -2,4 +2,4 @@
 // { dg-do compile { target c++14 } }
 
 template int typeID{42};
-template double typeID{10.10}; // { dg-error "primary template|redeclaration" }
+template double typeID{10.10}; // { dg-error "primary template|redeclaration|not deducible" }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ36.C b/gcc/testsuite/g++.dg/cpp1y/var-templ36.C
new file mode 100644
index 000..760e36f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ36.C
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++14 } }
+
+template 
+constexpr T v = T();
+
+template 
+constexpr T v = T();
+
+template 
+struct A {
+  static constexpr decltype (v) v = ::v;
+};
+
+double d1 = v;
+double 

Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-08-05 Thread Martin Sebor

On 08/05/2015 10:02 AM, Jiong Wang wrote:


On 28/07/15 16:44, Martin Sebor wrote:


Attached is an updated patch with the changes above.



gcc/testsuite/ChangeLog
2015-07-28  Martin Sebor

 * g++.dg/Wframe-address-in-Wall.C: New test.
 * g++.dg/Wframe-address.C: New test.
 * g++.dg/Wno-frame-address.C: New test.
 * gcc.dg/Wframe-address-in-Wall.c: New test.
 * gcc.dg/Wframe-address.c: New test.
 * gcc.dg/Wno-frame-address.c: New test.

I noticed that the newly added "Wno-frame-address.c" fails in native testing
on arm-none-linux-gnueabihf.

From the comments in the testcase:

+/* Verify that -Wframe-address is not enabled by default by enabling
+   -Werror and verifying the test still compiles.  */

It seems you want to make sure -Wframe-address works correctly with -Werror.
On arm, however, the return_address hook is defined to support only level 0
and returns NULL_RTX for all other levels, so Wno-frame-address.c fails at
the tem != NULL checks for builtin_return_address.


Thanks. I'm traveling the next 10 days and most likely won't be
able to get to this until I get back on the 17th.

Martin


[Patch, fortran] Rename gfc_code's expr4 field

2015-08-05 Thread Mikael Morin

Hello,

this renames the expr4 field of gfc_code to the more descriptive 
ext.lock.acquired_lock.
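
A stripped-down sketch of the resulting layout, assuming a simplified node type: the generic expression slots stay as they are, and the statement-specific one moves into a named member of the per-statement union.  expr and code_node are hypothetical stand-ins for gfc_expr and gfc_code, and the second union member is a placeholder.

```cpp
// Hypothetical stand-in for gfc_expr.
struct expr { int id; };

// Hypothetical, much reduced stand-in for gfc_code.
struct code_node {
  // Generic slots, shared by all statement kinds.
  expr *expr1 = nullptr, *expr2 = nullptr, *expr3 = nullptr;

  // Statement-specific data; only one member is meaningful for a
  // given statement kind.
  union {
    // LOCK/UNLOCK statements.
    struct { expr *acquired_lock; } lock;
    // Placeholder for the other statement kinds.
    struct { int unused; } other;
  } ext = {};
};
```

A use then reads as c->ext.lock.acquired_lock, which names the field's purpose where expr4 did not.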

Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?

Mikael
2015-08-05  Mikael Morin  

* gfortran.h (struct gfc_code): Move expr4 field to
ext.lock.acquired_lock.
* dump-parse-tree.c (show_code_node): Update field usage.
* frontend-passes.c (gfc_code_walker): Likewise.
* match.c (lock_unlock_statement): Likewise.
* resolve.c (resolve_lock_unlock): Likewise.
* trans-stmt.c (gfc_trans_lock_unlock): Likewise.
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 83ecbaa..b782f6d 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1669,10 +1669,10 @@ show_code_node (int level, gfc_code *c)
   fputs ("lock-variable=", dumpfile);
   if (c->expr1 != NULL)
 	show_expr (c->expr1);
-  if (c->expr4 != NULL)
+  if (c->ext.lock.acquired_lock != NULL)
 	{
 	  fputs (" acquired_lock=", dumpfile);
-	  show_expr (c->expr4);
+	  show_expr (c->ext.lock.acquired_lock);
 	}
   if (c->expr2 != NULL)
 	{
diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c
index bc9f621..53783ea 100644
--- a/gcc/fortran/frontend-passes.c
+++ b/gcc/fortran/frontend-passes.c
@@ -3338,6 +3338,10 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 		break;
 	  }
 
+	case EXEC_LOCK:
+	  WALK_SUBEXPR (co->ext.lock.acquired_lock);
+	  break;
+
 	case EXEC_FORALL:
 	case EXEC_DO_CONCURRENT:
 	  {
@@ -3541,7 +3545,6 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn,
 	  WALK_SUBEXPR (co->expr1);
 	  WALK_SUBEXPR (co->expr2);
 	  WALK_SUBEXPR (co->expr3);
-	  WALK_SUBEXPR (co->expr4);
 	  for (b = co->block; b; b = b->block)
 	{
 	  WALK_SUBEXPR (b->expr1);
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 69de5ad..27decfb 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2390,7 +2390,7 @@ typedef struct gfc_code
 
   gfc_st_label *here, *label1, *label2, *label3;
   gfc_symtree *symtree;
-  gfc_expr *expr1, *expr2, *expr3, *expr4;
+  gfc_expr *expr1, *expr2, *expr3;
   /* A name isn't sufficient to identify a subroutine, we need the actual
  symbol for the interface definition.
   const char *sub_name;  */
@@ -2412,6 +2412,13 @@ typedef struct gfc_code
 }
 alloc;
 
+/* LOCK/UNLOCK statements  */
+struct
+{
+  gfc_expr *acquired_lock;
+}
+lock;
+
 struct
 {
   gfc_namespace *ns;
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 523e9b2..ae8e1cf 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -2899,7 +2899,7 @@ done:
   new_st.expr1 = lockvar;
   new_st.expr2 = stat;
   new_st.expr3 = errmsg;
-  new_st.expr4 = acq_lock;
+  new_st.ext.lock.acquired_lock = acq_lock;
 
   return MATCH_YES;
 
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 641a3bd..c9e379d 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -8730,15 +8730,16 @@ resolve_lock_unlock (gfc_code *code)
 return;
 
   /* Check ACQUIRED_LOCK.  */
-  if (code->expr4
-  && (code->expr4->ts.type != BT_LOGICAL || code->expr4->rank != 0
-	  || code->expr4->expr_type != EXPR_VARIABLE))
+  if (code->ext.lock.acquired_lock
+  && (code->ext.lock.acquired_lock->ts.type != BT_LOGICAL
+	  || code->ext.lock.acquired_lock->rank != 0
+	  || code->ext.lock.acquired_lock->expr_type != EXPR_VARIABLE))
 gfc_error ("ACQUIRED_LOCK= argument at %L must be a scalar LOGICAL "
-	   "variable", &code->expr4->where);
+	   "variable", &code->ext.lock.acquired_lock->where);
 
-  if (code->expr4
-  && !gfc_check_vardef_context (code->expr4, false, false, false,
-_("ACQUIRED_LOCK variable")))
+  if (code->ext.lock.acquired_lock
+  && !gfc_check_vardef_context (code->ext.lock.acquired_lock, false, false,
+false, _("ACQUIRED_LOCK variable")))
 return;
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 6409f7f..5d140d6 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -681,7 +681,9 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
 
   /* Short cut: For single images without STAT= or LOCK_ACQUIRED
  return early. (ERRMSG= is always untouched for -fcoarray=single.)  */
-  if (!code->expr2 && !code->expr4 && flag_coarray != GFC_FCOARRAY_LIB)
+  if (!code->expr2
+  && !code->ext.lock.acquired_lock
+  && flag_coarray != GFC_FCOARRAY_LIB)
 return NULL_TREE;
 
   if (code->expr2)
@@ -694,11 +696,11 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
   else if (flag_coarray == GFC_FCOARRAY_LIB)
 stat = null_pointer_node;
 
-  if (code->expr4)
+  if (code->ext.lock.acquired_lock)
 {
-  gcc_assert (code->expr4->expr_type == EXPR_VARIABLE);
+  gcc_assert (code->ext.lock.acquired_lock->expr_type == EXPR_VARIABLE

Re: [Patch, fortran] Rename gfc_code's expr4 field

2015-08-05 Thread Steve Kargl
On Wed, Aug 05, 2015 at 08:54:21PM +0200, Mikael Morin wrote:
> 
> this renames the expr4 field of gfc_code to the more descriptive 
> ext.lock.acquired_lock.
> Regression tested on x86_64-unknown-linux-gnu.  OK for trunk?
> 

I don't have a problem with a name change.

> +/* LOCK/UNLOCK statements  */
> +struct
> +{
> +  gfc_expr *acquired_lock;
> +}
> +lock;

Just curious.  Why add a struct with a single member?
I would have thought s/expr4/acquired_lock should suffice.

-- 
Steve


Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread Trevor Saunders
On Wed, Aug 05, 2015 at 11:34:28AM -0400, David Malcolm wrote:
> On Wed, 2015-08-05 at 11:28 -0400, David Malcolm wrote:
> > On Wed, 2015-08-05 at 13:47 +0200, Richard Biener wrote:
> > > On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders  
> > > wrote:
> > > > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
> > > >> On Sat, Jul 25, 2015 at 4:37 AM,   wrote:
> > > >> > From: Trevor Saunders 
> > > >> >
> > > >> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
> > > >> > config/ia64/ia64-protos.h, config/ia64/ia64.c, 
> > > >> > config/ia64/ia64.h,
> > > >> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
> > > >> > config/rs6000/rs6000.c, config/rs6000/xcoff.h, 
> > > >> > config/spu/spu.h,
> > > >> > config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL 
> > > >> > to
> > > >> > the name of a function.
> > > >> > * output.h (default_output_label): New prototype.
> > > >> > * varasm.c (default_output_label): New function.
> > > >> > * vmsdbgout.c: Include tm_p.h.
> > > >> > * xcoffout.c: Likewise.
> > > >>
> > > >> Just a general remark - the GCC output machinery is known to be slow,
> > > >> adding indirect calls might be not the very best idea without 
> > > >> refactoring
> > > >> some of it.
> > > >>
> > > >> Did you do any performance measurements for artificial testcases
> > > >> exercising the specific bits you change?
> > > >
> > > > Sorry about the delay, but I finally got a chance to do some perf tests
> > > > of the first patch.  I took three test cases, fold-const.ii, insn-emit.ii
> > > > and a random .i from firefox, and did 3 trials of 100 compilations
> > > > each.  The only non-default flag was -std=gnu++11.
> > > >
> > > > results before patch hookizing output_ascii
> > > >
> > > > fold-const.ii
> > > > real 3m18.051s
> > > > user 2m41.340s
> > > > sys 0m36.544s
> > > > real 3m18.141s
> > > > user 2m42.236s
> > > > sys 0m35.740s
> > > > real 3m18.297s
> > > > user 2m42.316s
> > > > sys 0m35.804s
> > > >
> > > > insn-emit.ii
> > > > real 9m58.229s
> > > > user 8m26.960s
> > > > sys 1m31.224s
> > > > real 9m57.857s
> > > > user 8m24.616s
> > > > sys 1m33.072s
> > > > real 9m57.922s
> > > > user 8m25.232s
> > > > sys 1m32.512s
> > > >
> > > > mozilla.ii
> > > > real 8m5.732s
> > > > user 6m44.888s
> > > > sys 1m20.764s
> > > > real 8m5.404s
> > > > user 6m44.468s
> > > > sys 1m20.856s
> > > > real 7m59.197s
> > > > user 6m39.632s
> > > > sys 1m19.472s
> > > >
> > > > after patch
> > > >
> > > > fold-const.ii
> > > > real 3m18.488s
> > > > user 2m41.972s
> > > > sys 0m36.388s
> > > > real 3m18.215s
> > > > user 2m41.640s
> > > > sys 0m36.432s
> > > > real 3m18.368s
> > > > user 2m42.492s
> > > > sys 0m35.720s
> > > >
> > > > insn-emit.ii
> > > > real 10m4.700s
> > > > user 8m32.536s
> > > > sys 1m32.120s
> > > > real 10m4.241s
> > > > user 8m31.456s
> > > > sys 1m32.728s
> > > > real 10m4.515s
> > > > user 8m32.056s
> > > > sys 1m32.396s
> > > >
> > > > mozilla.ii
> > > > real 7m58.018s
> > > > user 6m38.008s
> > > > sys 1m19.924s
> > > > real 7m59.269s
> > > > user 6m37.736s
> > > > sys 1m21.448s
> > > > real 7m58.254s
> > > > user 6m37.828s
> > > > sys 1m20.324s
> > > >
> > > > So, roughly that looks to me like a range from improving by .5% to
> > > > regressing by 1%.  I'm not sure what could cause an improvement, so I
> > > > kind of wonder how valid these results are.
> > > 
> > > Hmm, indeed.  The speedup looks suspicious.
> > > 
> > > > Another question is how one can refactor the output machinery to be
> > > > faster.  My first thought is to buffer text internally before calling
> > > > stdio functions, but that seems like a giant job.
> > > 
> > > stdio functions are already buffering, so I don't know either.
> > > 
> > > But yes, going the libas route would improve things here, or for
> > > example enhancing gas to be able to eat target binary data
> > > without the need to encode it in printable characters...
> > > 
> > > .raw_data number-of-bytes
> > > 
> > > 
> > > Makes it quite unparsable to editors of course ...
> > 
> > A middle-ground might be to do both:
> > 
> > .raw_data number-of-bytes
> > 
> 
> Sorry, I hit "Send" too early; I meant something like this as a
> middle-ground:
> 
>   .raw_data number-of-bytes
>   
> 
>   ; comment giving the formatted text
> 
> so that cc1 etc are doing the formatting work to make the comment, so
> that human readers can see what the raw data is meant to be, but the
> assembler doesn't have to do work to parse it.

Well, having random bytes in the file might still screw up editors, and
I'd kind of expect that to be slower overall since gcc still does the
formatting, and both gcc and as do more IO.
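
As an aside on the timings quoted earlier in this message, the
before/after comparison can be reproduced with a little arithmetic.
The sketch below averages the three "real" samples per test case and
reports the relative change.  The function and array names are made up
for this illustration; the sample values are the quoted wall-clock
times converted to seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of N wall-clock samples, in seconds.  */
double
mean_seconds (const double *samples, size_t n)
{
  assert (n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += samples[i];
  return sum / n;
}

/* Relative change from BEFORE to AFTER in percent.
   Positive means the patched compiler was slower.  */
double
percent_change (double before, double after)
{
  return (after - before) / before * 100.0;
}

/* "real" times from the quoted runs, converted to seconds.  */
const double insn_emit_before[] = { 598.229, 597.857, 597.922 };
const double insn_emit_after[] = { 604.700, 604.241, 604.515 };
const double mozilla_before[] = { 485.732, 485.404, 479.197 };
const double mozilla_after[] = { 478.018, 479.269, 478.254 };
```

With these inputs insn-emit.ii comes out a little over 1% slower, while
mozilla.ii comes out faster, which is the part of the result the thread
treats as suspicious.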

> FWIW, I once had a go at hiding asm_out_file

[PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register to spill with O3 fsched-pressure fschedule-insns

2015-08-05 Thread H.J. Lu
- Forwarded message from "H.J. Lu"  -

Date: Wed, 5 Aug 2015 13:24:20 -0700
From: "H.J. Lu" 
To: g...@gcc.gnu.org
Cc: Eric Botcazou , Steven Bosscher
, Richard Sandiford 
Subject: [PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register
to spill with O3 fsched-pressure fschedule-insns
User-Agent: Mutt/1.5.23 (2014-03-12)

Since ira_implicitly_set_insn_hard_regs may be called outside of
ira-lives.c, it can't use the local variable, preferred_alternatives.
This patch adds an alternative_mask argument to
ira_implicitly_set_insn_hard_regs.
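
The idea can be sketched in isolation as follows.  This is not the GCC
code: alt_mask, current_preferred and the three functions are stand-ins
invented for illustration.  It only shows why a function that is
callable from other files is safer taking the mask as a parameter than
reading a file-local variable that only the owning pass keeps up to
date.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an alternative mask: one bit per constraint alternative.  */
typedef uint64_t alt_mask;

/* File-local state.  It is only meaningful while the code in this file
   has set it for the instruction being processed.  */
static alt_mask current_preferred;

/* Hypothetical setter used by the owning pass.  */
void
set_current_preferred (alt_mask m)
{
  current_preferred = m;
}

/* Count the enabled alternatives among the first N.  Taking the mask
   as a parameter lets callers in other files supply their own value
   instead of relying on current_preferred having been set.  */
int
count_preferred (alt_mask preferred, int n)
{
  assert (n >= 0 && n <= 64);
  int count = 0;
  for (int i = 0; i < n; i++)
    if ((preferred >> i) & 1)
      count++;
  return count;
}

/* Callers inside the owning pass can still forward the local state.  */
int
count_current_preferred (int n)
{
  return count_preferred (current_preferred, n);
}
```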

OK for master and 5 branch if there are no regressions on Linux/x86-64?

H.J.
---
gcc/

PR rtl-optimization/67029
* ira-color.c: Include "recog.h" before including "ira-int.h".
* target-globals.c: Likewise.
* ira-lives.c (ira_implicitly_set_insn_hard_regs): Add an
alternative_mask argument and use it instead of
preferred_alternatives.
* ira.h (ira_implicitly_set_insn_hard_regs): Moved to ...
* ira-int.h (ira_implicitly_set_insn_hard_regs): Here.
* sched-deps.c: Include "ira-int.h" after including "ira.h".
(sched_analyze_insn): Update call to
ira_implicitly_set_insn_hard_regs.
* sel-sched.c: Include "ira-int.h" after including "ira.h".
(implicit_clobber_conflict_p): Update call to
ira_implicitly_set_insn_hard_regs.

gcc/testsuite/

PR rtl-optimization/67029
* gcc.dg/pr67029.c: New test.
---
 gcc/ira-color.c|  1 +
 gcc/ira-int.h  |  2 ++
 gcc/ira-lives.c|  4 ++--
 gcc/ira.h  |  1 -
 gcc/sched-deps.c   |  4 +++-
 gcc/sel-sched.c|  4 +++-
 gcc/target-globals.c   |  1 +
 gcc/testsuite/gcc.dg/pr67029.c | 14 ++
 8 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr67029.c

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 74d2c2e..c8f33ed 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "ira.h"
 #include "alloc-pool.h"
+#include "recog.h"
 #include "ira-int.h"
 
 typedef struct allocno_hard_regs *allocno_hard_regs_t;
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index a7c0f40..a993dfc 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -1041,6 +1041,8 @@ extern void ira_debug_live_ranges (void);
 extern void ira_create_allocno_live_ranges (void);
 extern void ira_compress_allocno_live_ranges (void);
 extern void ira_finish_allocno_live_ranges (void);
+extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *,
+  alternative_mask);
 
 /* ira-conflicts.c */
 extern void ira_debug_conflicts (bool);
diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c
index 1cb05c2..011d513 100644
--- a/gcc/ira-lives.c
+++ b/gcc/ira-lives.c
@@ -831,7 +831,8 @@ single_reg_operand_class (int op_num)
might be used by insn reloads because the constraints are too
strict.  */
 void
-ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set)
+ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set,
+  alternative_mask preferred)
 {
   int i, c, regno = 0;
   enum reg_class cl;
@@ -854,7 +855,6 @@ ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set)
  mode = (GET_CODE (op) == SCRATCH
  ? GET_MODE (op) : PSEUDO_REGNO_MODE (regno));
  cl = NO_REGS;
- alternative_mask preferred = preferred_alternatives;
  for (; (c = *p); p += CONSTRAINT_LEN (c, p))
if (c == '#')
  preferred &= ~ALTERNATIVE_BIT (0);
diff --git a/gcc/ira.h b/gcc/ira.h
index 504b5e6..881674b 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -192,7 +192,6 @@ extern void ira_init (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, machine_mode);
 extern void ira_set_pseudo_classes (bool, FILE *);
-extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 extern void ira_expand_reg_equiv (void);
 extern void ira_update_equiv_info_by_shuffle_insn (int, int, rtx_insn *);
 
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 3ac66e8..0a8dcb0 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "cselib.h"
 #include "ira.h"
+#include "ira-int.h"
 #include "target.h"
 
 #ifdef INSN_SCHEDULING
@@ -2891,7 +2892,8 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx_insn *insn)
 
   extract_insn (insn);
   preprocess_constraints (insn);
-  ira_implicitly_set_insn_hard_regs (&temp);
+  alternative_mask prefrred = get_preferred_alternatives (insn);
+  ira_implicitly_set_insn_hard_regs (&temp, prefrred);
   AND_COMPL_HARD_REG_SET (temp, ira_no_alloc_regs);
   IOR_HARD_REG_SET (implicit_reg_pending_clobbers, temp);
  

Re: [Patch, fortran] Rename gfc_code's expr4 field

2015-08-05 Thread Mikael Morin

On 05/08/2015 21:24, Steve Kargl wrote:

On Wed, Aug 05, 2015 at 08:54:21PM +0200, Mikael Morin wrote:


+/* LOCK/UNLOCK statements  */
+struct
+{
+  gfc_expr *acquired_lock;
+}
+lock;


Just curious.  Why add a struct with a single member?
I would have thought s/expr4/acquired_lock should suffice.

Well, I plan to move the other LOCK/UNLOCK arguments there, and I will 
need the struct to pack them together.
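
A reduced sketch of the layout being described, for illustration only.
The expr and code types are simplified stand-ins for gfc_expr and
gfc_code, and the sketch assumes ext is a union of per-statement data,
as the neighbouring alloc member in the patch suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for gfc_expr; the real type is far richer.  */
typedef struct expr { int id; } expr;

/* Simplified stand-in for gfc_code: generic operands plus a union of
   per-statement extras, of which only the LOCK/UNLOCK one is shown.  */
typedef struct code
{
  expr *expr1, *expr2, *expr3;
  union
  {
    /* LOCK/UNLOCK statements.  A one-member struct leaves room for
       further operands to be grouped here later.  */
    struct
    {
      expr *acquired_lock;
    }
    lock;
  }
  ext;
}
code;

/* Return nonzero if the statement carries an ACQUIRED_LOCK= operand.  */
int
has_acquired_lock (const code *c)
{
  assert (c != NULL);
  return c->ext.lock.acquired_lock != NULL;
}
```

Compared with a bare acquired_lock field, the named struct makes it
clear at each use site which statement kind the operand belongs to.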


Mikael


Re: [PATCH, libstdc++, testsuite] Remove redundant -save-temps options

2015-08-05 Thread Jonathan Wakely

On 04/08/15 20:09 +0300, Nikolai Bozhenov wrote:

Hi,

the attached patch removes redundant -save-temps options from some libstdc++
tests, since the option is not needed in dg-do-compile/scan-assembler tests.


Thanks, committed.


Re: Eliminate duplication of get_catalogs in different abi

2015-08-05 Thread Jonathan Wakely

On 30/07/15 21:57 +0200, François Dumont wrote:

It seems that this patch results in unresolved symbols.

I am quite sure that the code is right, but the build system needs to be
adapted.

I noticed that *_cow.cc files are built with -fimplicit-templates. I tried
to apply the same to the old ABI, but I still get unresolved
symbols.

Any help is welcome.


OK, I'll look into it next week, when I'm back from the GNU Cauldron.



Re: [Patch, fortran] Rename gfc_code's expr4 field

2015-08-05 Thread Steve Kargl
On Wed, Aug 05, 2015 at 10:53:29PM +0200, Mikael Morin wrote:
> On 05/08/2015 21:24, Steve Kargl wrote:
> > On Wed, Aug 05, 2015 at 08:54:21PM +0200, Mikael Morin wrote:
> >>
> >> +/* LOCK/UNLOCK statements  */
> >> +struct
> >> +{
> >> +  gfc_expr *acquired_lock;
> >> +}
> >> +lock;
> >
> > Just curious.  Why add a struct with a single member?
> > I would have thought s/expr4/acquired_lock should suffice.
> >
> Well, I plan to move the other LOCK/UNLOCK arguments there, and I will 
> need the struct to pack them together.
> 

Oh, I see.  As I stated I have no problems
with the patch, so it's OK to commit.

-- 
Steve


Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function

2015-08-05 Thread Segher Boessenkool
On Wed, Aug 05, 2015 at 03:59:18PM +0200, Richard Biener wrote:
> >> Makes it quite unparsable to editors of course ...
> >
> > The idea of having .S files that aren't reasonably editable seems kind
> > of silly, but I guess it's up to the gas people.
> 
> Heh, indeed.  Maybe instead do
> 
> .insert_from_file  
> 
> and do that only when we are using -pipe or so.

That's ".incbin".  Do we really want to go through the headaches of using
extra files though?  Is this really a bottleneck?  Will it even help?


Segher


Re: Minor typo fixes

2015-08-05 Thread Abe

Thank you, sir.  :-)

Regards,

Abe




On 8/3/15 10:53 PM, Jeff Law wrote:


I was starting to look at Abe's changes to the gimple if-converter and realized
a handful of the changes were just fixing comments.  No reason those shouldn't
go in immediately.  So I pulled them out and applied those changes to the trunk.



Abe -- if you find more of those kinds of changes, don't hesitate to break
them out into their own patch and they can go forward very quickly.



Attached are the fixes that were actually applied.  They were bootstrapped for 
completeness.


Re: [PR64164] drop copyrename, integrate into expand

2015-08-05 Thread Alexandre Oliva
On Aug  5, 2015, Richard Biener  wrote:

> It was just a hunch when you talked about BLKmode and params in memory.
> As coalescing is about SSA name (thus register) coalescing I was thinking
> that if you coalesce a register with incoming memory you'll end up with
> more memory accesses?

Since we only coalesce variables whose promoted mode is the same, if one
of them gets BLKmode and has to live in memory, so would all the others
it might coalesce with.  So, even though we have gimple_regs, we can't
have pseudos.  This was observed with vector types for which no native
vector mode is available.

It would still make sense to share them when possible, to reduce the
number of mem-to-mem copies.  And we don't want to copy incoming BLKmode
parms to *another* memory location if we can help it.

Now, maybe you're concerned about incoming parms passed by reference
that *can* be held in pseudos.  For those, we will perform a load from
memory to a pseudo and use that, even if the pseudo ends up allocated in
memory.

> I also thought of the RTL expansion thing we do with at first copying
> the hardreg incoming args to pseudos and how that interacts with
> coalescing.

Most of what changed now is who gets to choose the pseudo; it used to be
assign_parms, now it's cfgexpand.  The other significant change is that
now, when cfgexpand detects a BLKmode parm, it will choose MEM, but it
won't set up the address, so that assign_parms still does what it used
to, namely, copy the incoming hard reg to a pseudo, and then use the
pseudo as the MEM address.

> But I guess you have eyed code-gen changes a bit anyway.

Yeah.  Not much has changed in the before parm_birth area; expected
changes have to do with the pseudo numbering.  IIRC, anything else would
be unexpected.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


2 C++ cleanup PATCHes

2015-08-05 Thread Jason Merrill
1) While working on 66260, it struck me as odd that finish_id_expression 
had its own code for determining whether an id-expression is dependent, 
rather than using type_dependent_expression_p.  So this patch tears out 
a bunch of code and replaces it with a call; t_d_e_p already handled 
everything except a TEMPLATE_ID_EXPR with an IDENTIFIER_NODE as its lhs.
I also needed to tweak handling of CONST_DECLs so that we return them
unchanged even if there is an explicit scope.


2) When we reject an explicit specialization because there is no 
template that matches it, it would be friendly to print a list of the 
candidates considered.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8b381ca4290de9a06b1ae437d04703ae84308270
Author: Jason Merrill 
Date:   Wed Aug 5 12:04:27 2015 -0400

	* decl.c (cp_finish_decl): Tidy.
	* typeck.c (finish_class_member_access_expr): Use
	type_dependent_expression_p.
	* semantics.c (finish_id_expression): Use
	type_dependent_expression_p.  Don't build_qualified_name for a
	decl in non-dependent scope.
	* pt.c (type_dependent_expression_p): A TEMPLATE_ID_EXPR of an
	identifier is dependent.  Remove variable_template_p check.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 52584c5..208173a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6525,11 +6525,10 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
 	 then it can be used in future constant expressions, so its value
 	 must be available. */
 
-  if (!VAR_P (decl) || dependent_type_p (type))
+  if (!VAR_P (decl) || type_dependent_p)
 	/* We can't do anything if the decl has dependent type.  */;
   else if (init
 	   && init_const_expr_p
-	   && !type_dependent_p
 	   && TREE_CODE (type) != REFERENCE_TYPE
 	   && decl_maybe_constant_var_p (decl)
 	   && !type_dependent_init_p (init)
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f8c123c..5f28f1b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -21671,11 +21671,10 @@ type_dependent_expression_p (tree expression)
 	  (TREE_OPERAND (expression, 1)))
 	return true;
 	  expression = TREE_OPERAND (expression, 0);
+	  if (identifier_p (expression))
+	return true;
 	}
 
-  if (variable_template_p (expression))
-return dependent_type_p (TREE_TYPE (expression));
-
   gcc_assert (TREE_CODE (expression) == OVERLOAD
 		  || TREE_CODE (expression) == FUNCTION_DECL);
 
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index d42838e..17b0a14 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3362,7 +3362,7 @@ finish_id_expression (tree id_expression,
 }
   else
 {
-  bool dependent_p;
+  bool dependent_p = type_dependent_expression_p (decl);
 
   /* If the declaration was explicitly qualified indicate
 	 that.  The semantics of `A::f(3)' are different than
@@ -3371,79 +3371,25 @@ finish_id_expression (tree id_expression,
 	  ? CP_ID_KIND_QUALIFIED
 	  : (TREE_CODE (decl) == TEMPLATE_ID_EXPR
 		 ? CP_ID_KIND_TEMPLATE_ID
-		 : CP_ID_KIND_UNQUALIFIED));
-
-
-  /* [temp.dep.expr]
-
-	 An id-expression is type-dependent if it contains an
-	 identifier that was declared with a dependent type.
-
-	 The standard is not very specific about an id-expression that
-	 names a set of overloaded functions.  What if some of them
-	 have dependent types and some of them do not?  Presumably,
-	 such a name should be treated as a dependent name.  */
-  /* Assume the name is not dependent.  */
-  dependent_p = false;
-  if (!processing_template_decl)
-	/* No names are dependent outside a template.  */
-	;
-  else if (TREE_CODE (decl) == CONST_DECL)
-	/* We don't want to treat enumerators as dependent.  */
-	;
-  /* A template-id where the name of the template was not resolved
-	 is definitely dependent.  */
-  else if (TREE_CODE (decl) == TEMPLATE_ID_EXPR
-	   && (identifier_p (TREE_OPERAND (decl, 0
-	dependent_p = true;
-  /* For anything except an overloaded function, just check its
-	 type.  */
-  else if (!is_overloaded_fn (decl))
-	dependent_p
-	  = dependent_type_p (TREE_TYPE (decl));
-  /* For a set of overloaded functions, check each of the
-	 functions.  */
-  else
-	{
-	  tree fns = decl;
-
-	  if (BASELINK_P (fns))
-	fns = BASELINK_FUNCTIONS (fns);
-
-	  /* For a template-id, check to see if the template
-	 arguments are dependent.  */
-	  if (TREE_CODE (fns) == TEMPLATE_ID_EXPR)
-	{
-	  tree args = TREE_OPERAND (fns, 1);
-	  dependent_p = any_dependent_template_arguments_p (args);
-	  /* The functions are those referred to by the
-		 template-id.  */
-	  fns = TREE_OPERAND (fns, 0);
-	}
-
-	  /* If there are no dependent template arguments, go through
-	 the overloaded functions.  */
-	  while (fns && !dependent_p)
-	{
-	  tree fn = OVL_CURRENT (fns);
-
-	  /* Member functions of dependent classes are
-		 dependent.  */
-	  if (TREE_CODE (fn) == 

RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass

2015-08-05 Thread Thomas Preud'homme
Hi,

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Tuesday, July 28, 2015 3:04 PM
> 
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> >
> > ChangeLog entry is as follows:
> >
> > 2015-07-28  Thomas Preud'homme  
> >
> > PR tree-optimization/66828
> > * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc
> > from int64_t to uint64_t.

Can I backport this change to GCC 5 branch? The patch applies cleanly on
GCC 5 and shows no regression on a native x86_64-linux-gnu bootstrapped
GCC and an arm-none-eabi GCC cross-compiler.
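
To illustrate the class of problem the type change addresses, here is a
small standalone sketch.  It is not the code from
perform_symbolic_merge; replicate_byte is a made-up function that
happens to shift an increment left by one byte per iteration.  With
uint64_t the shifts stay well defined, whereas shifting a set bit into
or past the sign bit of an int64_t is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate VALUE into each of the low SIZE bytes of a 64-bit word by
   repeated left shifts.  Because INC is unsigned, bits shifted past
   bit 63 are simply discarded.  */
uint64_t
replicate_byte (uint8_t value, int size)
{
  assert (size >= 0 && size <= 8);
  uint64_t inc = value;
  uint64_t result = 0;
  for (int i = 0; i < size; i++, inc <<= 8)
    result |= inc;
  return result;
}
```

Had inc been declared int64_t, a call such as replicate_byte (0x80, 8)
would shift a set bit into the sign position on the last iterations.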

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ba37d96..a301c23 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-08-04  Thomas Preud'homme  
+
+   Backport from mainline
+   2015-07-28  Thomas Preud'homme  
+
+   PR tree-optimization/66828
+   * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc
+   from int64_t to uint64_t.
+
 2015-08-03  John David Anglin  
 
PR target/67060
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c22a677..c699dcadb 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1856,7 +1856,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
  the same base (array, structure, ...).  */
   if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2))
 {
-  int64_t inc;
+  uint64_t inc;
   HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
   struct symbolic_number *toinc_n_ptr, *n_end;


Best regards,

Thomas




Re: [PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register to spill with O3 fsched-pressure fschedule-insns

2015-08-05 Thread H.J. Lu
On Wed, Aug 05, 2015 at 01:24:20PM -0700, H.J. Lu wrote:
> Since ira_implicitly_set_insn_hard_regs may be called outside of
> ira-lives.c, it can't use the local variable, preferred_alternatives.
> This patch adds an alternative_mask argument to
> ira_implicitly_set_insn_hard_regs.
> 
> OK for master and 5 branch if there are no regressions on Linux/x86-64?
> 
> H.J.
> ---
> gcc/
> 
>   PR rtl-optimization/67029
>   * ira-color.c: Include "recog.h" before including "ira-int.h".
>   * target-globals.c: Likewise.
>   * ira-lives.c (ira_implicitly_set_insn_hard_regs): Add an
>   alternative_mask argument and use it instead of
>   preferred_alternatives.
>   * ira.h (ira_implicitly_set_insn_hard_regs): Moved to ...
>   * ira-int.h (ira_implicitly_set_insn_hard_regs): Here.
>   * sched-deps.c: Include "ira-int.h" after including "ira.h".
>   (sched_analyze_insn): Update call to
>   ira_implicitly_set_insn_hard_regs.
>   * sel-sched.c: Include "ira-int.h" after including "ira.h".
>   (implicit_clobber_conflict_p): Update call to
>   ira_implicitly_set_insn_hard_regs.
> 

Here is a simpler patch to add preferred_alternatives to recog_data_d so
that preferred_alternatives is available in recog_data_d when needed.

OK for master and 5 branch if there are no regressions on Linux/x86-64?

H.J.
--
gcc/

PR rtl-optimization/67029
* ira-lives.c (preferred_alternatives): Removed.
(check_and_make_def_conflict): Replace preferred_alternatives
with recog_data.preferred_alternatives.
(make_early_clobber_and_input_conflicts): Likewise.
(single_reg_class): Likewise.
(ira_implicitly_set_insn_hard_regs): Likewise.
(process_bb_node_lives): Pass true to extract_insn.  Don't set
preferred_alternatives.
* recog.c (extract_insn): Add an argument, preferred.  Call
get_preferred_alternatives to initialize preferred_alternatives
if preferred is true.
* recog.h (extract_insn): Add an argument, preferred, and
default to false.
(recog_data_d): Add preferred_alternatives.
* sched-deps.c (sched_analyze_insn): Pass true to extract_insn.
* sel-sched.c (implicit_clobber_conflict_p): Likewise.

gcc/testsuite/

PR rtl-optimization/67029
* gcc.dg/pr67029.c: New test.
---
 gcc/ira-lives.c| 15 +--
 gcc/recog.c|  6 +-
 gcc/recog.h|  6 +-
 gcc/sched-deps.c   |  2 +-
 gcc/sel-sched.c|  2 +-
 gcc/testsuite/gcc.dg/pr67029.c | 14 ++
 6 files changed, 31 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr67029.c

diff --git a/gcc/ira-lives.c b/gcc/ira-lives.c
index 1cb05c2..aad3224 100644
--- a/gcc/ira-lives.c
+++ b/gcc/ira-lives.c
@@ -86,10 +86,6 @@ static int last_call_num;
 /* The number of last call at which given allocno was saved.  */
 static int *allocno_saved_at_call;
 
-/* The value of get_preferred_alternatives for the current instruction,
-   supplemental to recog_data.  */
-static alternative_mask preferred_alternatives;
-
 /* Record the birth of hard register REGNO, updating hard_regs_live and
hard reg conflict information for living allocnos.  */
 static void
@@ -648,7 +644,7 @@ check_and_make_def_conflict (int alt, int def, enum reg_class def_cl)
 instruction due to the earlyclobber, reload must fix it up.  */
   for (alt1 = 0; alt1 < recog_data.n_alternatives; alt1++)
{
- if (!TEST_BIT (preferred_alternatives, alt1))
+ if (!TEST_BIT (recog_data.preferred_alternatives, alt1))
continue;
  const operand_alternative *op_alt1
= &recog_op_alt[alt1 * n_operands];
@@ -698,7 +694,7 @@ make_early_clobber_and_input_conflicts (void)
   int n_operands = recog_data.n_operands;
   const operand_alternative *op_alt = recog_op_alt;
   for (alt = 0; alt < n_alternatives; alt++, op_alt += n_operands)
-if (TEST_BIT (preferred_alternatives, alt))
+if (TEST_BIT (recog_data.preferred_alternatives, alt))
   for (def = 0; def < n_operands; def++)
{
  def_cl = NO_REGS;
@@ -765,7 +761,7 @@ single_reg_class (const char *constraints, rtx op, rtx equiv_const)
   enum constraint_num cn;
 
   cl = NO_REGS;
-  alternative_mask preferred = preferred_alternatives;
+  alternative_mask preferred = recog_data.preferred_alternatives;
   for (; (c = *constraints); constraints += CONSTRAINT_LEN (c, constraints))
 if (c == '#')
   preferred &= ~ALTERNATIVE_BIT (0);
@@ -854,7 +850,7 @@ ira_implicitly_set_insn_hard_regs (HARD_REG_SET *set)
  mode = (GET_CODE (op) == SCRATCH
  ? GET_MODE (op) : PSEUDO_REGNO_MODE (regno));
  cl = NO_REGS;
- alternative_mask preferred = preferred_alternatives;
+ alternative_mask preferred = recog_data.preferred_alternatives;
  for (; (c = *p)