Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-18 Thread Marc Glisse

On Thu, 17 Sep 2015, Michael Collison wrote:

Here is the patch, modified with test cases for MIN_EXPR and MAX_EXPR 
expressions. I need some assistance; this test case will fail on targets that 
don't have support for MIN/MAX, such as 68k. Is there any way to remedy this 
short of enumerating whether a target supports MIN/MAX in 
testsuite/lib/target_support?


2015-07-24  Michael Collison 
   Andrew Pinski 

   * match.pd ((x < y) && (x < z) -> x < min (y,z),
   (x > y) and (x > z) -> x > max (y,z))
   * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
(convert (bit_and (op (convert:utype @0) (convert:utype @1))
  (convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify


You seem to be missing all indentation.


+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use op:s 
since this is mostly useful if it removes the 2 original comparisons.



+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


How did you choose this restriction? It seems safe enough, but the 
transformation could make sense in other cases as well. It can always be 
generalized later, though.
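One concrete reason the INTEGRAL_TYPE_P restriction is a safe starting point: for floating point, a NaN operand makes the conjunction and the min-based form disagree. A standalone C++ sketch, not part of the patch:

```cpp
#include <cmath>
#include <limits>

// With integer operands, (x < y && x < z) == (x < min(y, z)) for all
// inputs.  With floating point, a NaN breaks the equivalence: x < NaN
// is false, but std::fmin ignores a NaN operand, so the min-based form
// can still be true.
bool conj_form(double x, double y, double z) { return x < y && x < z; }
bool min_form(double x, double y, double z) { return x < std::fmin(y, z); }

double a_nan() { return std::numeric_limits<double>::quiet_NaN(); }
```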



+(op @0 (min @1 @2)
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)


Note that you could unify the patterns with something like:
(for op (lt le gt ge)
 ext (min min max max)
 (simplify ...
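For reference, the integer equivalence these patterns rely on can be checked directly; a standalone C++ sketch, not part of the patch or its testsuite:

```cpp
#include <algorithm>

// The rule rewrites (x < y && x < z) as x < min(y, z), and
// (x > y && x > z) as x > max(y, z); for integer operands the two
// forms agree for every input.
bool lt_conj(int x, int y, int z) { return x < y && x < z; }
bool lt_min(int x, int y, int z)  { return x < std::min(y, z); }
bool gt_conj(int x, int y, int z) { return x > y && x > z; }
bool gt_max(int x, int y, int z)  { return x > std::max(y, z); }
```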


+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c

new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)


Maybe writing '&' instead of '&&' would make it depend less on the target. 
Also, both tests seem to be for GENERIC (i.e. I expect that you are 
already seeing the optimized version with -fdump-tree-original or 
-fdump-tree-gimple). Maybe something as simple as:

int f(long a, long b, long c) {
  int cmp1 = a < b;
  int cmp2 = a < c;
  return cmp1 & cmp2;
}


+a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */


--
Marc Glisse


Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-18 Thread Marc Glisse
Just a couple extra points. We can end up with a mix of < and >, which 
might prevent the pattern from matching:


  _3 = b_1(D) > a_2(D);
  _5 = a_2(D) < c_4(D);
  _8 = _3 & _5;

Just like with &, we could also transform:
x < y | x < z  --->  x < max(y, z)

(but maybe wait to make sure reviewers are ok with the first 
transformation before generalizing)
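The disjunction variant is the dual fact; a standalone C++ check, not part of any proposed patch:

```cpp
#include <algorithm>

// Dual of the && / min rule: (x < y || x < z) holds exactly when
// x < max(y, z), for integer operands.
bool lt_disj(int x, int y, int z) { return x < y || x < z; }
bool lt_max(int x, int y, int z)  { return x < std::max(y, z); }
```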


On Fri, 18 Sep 2015, Marc Glisse wrote:


[...]





--
Marc Glisse


[PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex

2015-09-18 Thread Chung-Lin Tang
Hi,
this patch fixes the uninitialized acc_device_lock mutex situation
reported in PR 67141. The patch attached to the bugzilla page
tries to solve it with constructor priorities, which we think will
probably be less manageable in general.

This patch changes goacc_host_init() to be called from
goacc_runtime_initialize() instead, thereby ensuring the init order.
libgomp testsuite was re-run without regressions, okay for trunk?

Thanks,
Chung-Lin

2015-09-18  Chung-Lin Tang  

PR libgomp/67141

* oacc-int.h (goacc_host_init): Add declaration.
* oacc-host.c (goacc_host_init): Remove static and
constructor attribute
* oacc-init.c (goacc_runtime_initialize): Call goacc_host_init()
at end.
Index: oacc-host.c
===
--- oacc-host.c	(revision 227895)
+++ oacc-host.c	(working copy)
@@ -256,7 +256,7 @@ static struct gomp_device_descr host_dispatch =
   };
 
 /* Initialize and register this device type.  */
-static __attribute__ ((constructor)) void
+void
 goacc_host_init (void)
 {
   gomp_mutex_init (&host_dispatch.lock);
Index: oacc-int.h
===
--- oacc-int.h	(revision 227895)
+++ oacc-int.h	(working copy)
@@ -97,6 +97,7 @@ void goacc_runtime_initialize (void);
 void goacc_save_and_set_bind (acc_device_t);
 void goacc_restore_bind (void);
 void goacc_lazy_initialize (void);
+void goacc_host_init (void);
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility pop
Index: oacc-init.c
===
--- oacc-init.c	(revision 227895)
+++ oacc-init.c	(working copy)
@@ -644,6 +644,9 @@ goacc_runtime_initialize (void)
 
   goacc_threads = NULL;
   gomp_mutex_init (&goacc_thread_lock);
+
+  /* Initialize and register the 'host' device type.  */
+  goacc_host_init ();
 }
 
 /* Compiler helper functions */


[PATCH] Work towards fixing PR66142

2015-09-18 Thread Richard Biener

The following patch fixes PR66142 up to the point where we run into
an alias disambiguation issue.  One step at a time...

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-18  Richard Biener  

PR tree-optimization/66142
* fold-const.c (operand_equal_p): When OEP_ADDRESS_OF
treat MEM[&x] and x the same.
* tree-ssa-sccvn.h (vn_reference_fold_indirect): Remove.
* tree-ssa-sccvn.c (vn_reference_fold_indirect): Return true
when we simplified sth.
(vn_reference_maybe_forwprop_address): Likewise.
(valueize_refs_1): When we simplified through
vn_reference_fold_indirect or vn_reference_maybe_forwprop_address
set valueized_anything to true.
(vn_reference_lookup_3): Use stmt_kills_ref_p to see whether
one ref kills the other instead of just an offset-based test.
* tree-ssa-alias.c (stmt_kills_ref_p): Use OEP_ADDRESS_OF
for the operand_equal_p test to compare bases and also compare
sizes.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 227859)
--- gcc/fold-const.c(working copy)
*** operand_equal_p (const_tree arg0, const_
*** 2752,2761 
   TREE_OPERAND (arg1, 0), flags);
  }
  
!   if (TREE_CODE (arg0) != TREE_CODE (arg1)
/* NOP_EXPR and CONVERT_EXPR are considered equal.  */
!   && !(CONVERT_EXPR_P (arg0) && CONVERT_EXPR_P (arg1)))
! return 0;
  
/* This is needed for conversions and for COMPONENT_REF.
   Might as well play it safe and always test this.  */
--- 2759,2791 
   TREE_OPERAND (arg1, 0), flags);
  }
  
!   if (TREE_CODE (arg0) != TREE_CODE (arg1))
! {
/* NOP_EXPR and CONVERT_EXPR are considered equal.  */
!   if (CONVERT_EXPR_P (arg0) && CONVERT_EXPR_P (arg1))
!   ;
!   else if (flags & OEP_ADDRESS_OF)
!   {
! /* If we are interested in comparing addresses ignore
!MEM_REF wrappings of the base that can appear just for
!TBAA reasons.  */
! if (TREE_CODE (arg0) == MEM_REF
! && DECL_P (arg1)
! && TREE_CODE (TREE_OPERAND (arg0, 0)) == ADDR_EXPR
! && TREE_OPERAND (TREE_OPERAND (arg0, 0), 0) == arg1
! && integer_zerop (TREE_OPERAND (arg0, 1)))
!   return 1;
! else if (TREE_CODE (arg1) == MEM_REF
!  && DECL_P (arg0)
!  && TREE_CODE (TREE_OPERAND (arg1, 0)) == ADDR_EXPR
!  && TREE_OPERAND (TREE_OPERAND (arg1, 0), 0) == arg0
!  && integer_zerop (TREE_OPERAND (arg1, 1)))
!   return 1;
! return 0;
!   }
!   else
!   return 0;
! }
  
/* This is needed for conversions and for COMPONENT_REF.
   Might as well play it safe and always test this.  */
Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 227859)
--- gcc/tree-ssa-sccvn.h(working copy)
*** vn_nary_op_t vn_nary_op_insert (tree, tr
*** 204,211 
  vn_nary_op_t vn_nary_op_insert_stmt (gimple, tree);
  vn_nary_op_t vn_nary_op_insert_pieces (unsigned int, enum tree_code,
   tree, tree *, tree, unsigned int);
- void vn_reference_fold_indirect (vec *,
-unsigned int *);
  bool ao_ref_init_from_vn_reference (ao_ref *, alias_set_type, tree,
vec );
  tree vn_reference_lookup_pieces (tree, alias_set_type, tree,
--- 204,209 
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 227859)
--- gcc/tree-ssa-sccvn.c(working copy)
*** copy_reference_ops_from_call (gcall *cal
*** 1184,1190 
  
  /* Fold *& at position *I_P in a vn_reference_op_s vector *OPS.  Updates
 *I_P to point to the last element of the replacement.  */
! void
  vn_reference_fold_indirect (vec *ops,
unsigned int *i_p)
  {
--- 1200,1206 
  
  /* Fold *& at position *I_P in a vn_reference_op_s vector *OPS.  Updates
 *I_P to point to the last element of the replacement.  */
! static bool
  vn_reference_fold_indirect (vec *ops,
unsigned int *i_p)
  {
*** vn_reference_fold_indirect (vecoff = tree_to_shwi (mem_op->op0);
else
mem_op->off = -1;
  }
  }
  
  /* Fold *& at position *I_P in a vn_reference_op_s vector *OPS.  Updates
 *I_P to point to the last element of the replacement.  */
! static void
  vn_reference_maybe_forwprop_address (vec *ops,
 unsigned int *i_p)
  {
--- 1226,1239 
mem_op->off = tree_to_shwi (mem_op->op0);
else
mem_op->off = -1;
+   r

Re: [AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.

2015-09-18 Thread James Greenhalgh
On Thu, Sep 17, 2015 at 05:37:55PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch series adds the
> instructions to GCC, making them available with -march=armv8.1-a or
> -march=armv8+lse, and uses them to implement the __sync and __atomic
> builtins.
> 
> The ARMv8.1 swap instruction swaps the value in a register with a value
> in memory. The load-operate instructions load a value from memory,
> update it with the result of an operation and store the result in
> memory.
> 
> This series uses the swap instruction to implement the atomic_exchange
> patterns and the load-operate instructions to implement the
> atomic_fetch_ and atomic__fetch patterns. For the
> atomic__fetch patterns, the value returned as the result of the
> operation has to be recalculated from the loaded data. The ARMv8 BIC
> instruction is added so that it can be used for this recalculation.
> 
> The patches in this series
> - add and use the atomic swap instruction.
> - add the Aarch64 BIC instruction,
> - add the ARMv8.1 load-operate instructions,
> - use the load-operate instructions to implement the atomic_fetch_
>patterns,
> - use the load-operate instructions to implement the patterns
>atomic__fetch patterns,
> 
> The code-generation changes in this series are based around a new
> function, aarch64_gen_atomic_ldop, which takes the operation to be
> implemented and emits the appropriate code, making use of the atomic
> instruction. This follows the existing use of aarch64_split_atomic_op for
> the same purpose when atomic instructions aren't available.
> 
> This patch adds the ARMv8.1 SWAP instruction and function
> aarch64_gen_atomic_ldop and changes the implementation of the
> atomic_exchange pattern to the atomic instruction when it is available.
> 
> The general form of the code generated for an atomic_exchange, with
> destination D, source S, memory address A and memory order MO is:
> 
> swp S, D, [A]
> 
> where
>  is one of {'', 'a', 'l', 'al'} depending on memory order MO.
>  is one of {'', 'b', 'h'} depending on the data size.
> 
> This patch also adds tests for the changes. These reuse the support code
> introduced for the atomic CAS tests, adding macros to test functions
> taking one memory ordering argument. These are used to iteratively
> define functions using the __atomic_exchange builtins, which should be
> implemented using the atomic swap.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?
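As background to the quoted description: at the source level the pattern being improved is GCC's __atomic_exchange_n builtin. A minimal C++ sketch of its semantics (whether it compiles down to a single swp depends on -march=armv8.1-a or +lse; this is not one of the patch's tests):

```cpp
// __atomic_exchange_n atomically stores the new value and returns the
// previous one; with ARMv8.1 LSE this can become a single swp
// instruction instead of a load-exclusive/store-exclusive loop.
int exchange_relaxed(int *p, int newval)
{
  return __atomic_exchange_n(p, newval, __ATOMIC_RELAXED);
}
```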

OK, though I have a question on one patch hunk.

> gcc/
> 2015-09-17  Matthew Wahab  
> 
>   * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
>   Declare.
>   * config/aarch64/aarch64.c (aarch64_emit_atomic_swp): New.
>   (aarch64_gen_atomic_ldop): New.
>   (aarch64_split_atomic_op): Fix whitespace and add a comment.
>   * config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New.
>   (atomic_compare_and_swap_lse): Remove comments and fix
>   whitespace.
>   (atomic_exchange): Replace with an expander.
>   (aarch64_atomic_exchange): New.
>   (aarch64_atomic_exchange_lse): New.
>   (aarch64_atomic_): Fix some whitespace.
>   (aarch64_atomic_swp): New.
> 
> 
> gcc/testsuite/
> 2015-09-17  Matthew Wahab  
> 
>   * gcc.target/aarch64/atomic-inst-ops.inc: (TEST_MODEL): New.
>   (TEST_ONE): New.
>   * gcc.target/aarch64/atomic-inst-swap.c: New.
> 

> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 65d2cc9..0e71002 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -27,6 +27,7 @@
>  UNSPECV_ATOMIC_CMPSW ; Represent an atomic compare swap.
>  UNSPECV_ATOMIC_EXCHG ; Represent an atomic exchange.
>  UNSPECV_ATOMIC_CAS   ; Represent an atomic CAS.
> +UNSPECV_ATOMIC_SWP   ; Represent an atomic SWP.
>  UNSPECV_ATOMIC_OP; Represent an atomic operation.
>  ])
>  
> @@ -122,19 +123,19 @@
>  )
>  
>  (define_insn_and_split "aarch64_compare_and_swap_lse"
> -  [(set (reg:CC CC_REGNUM)   ;; bool out
> +  [(set (reg:CC CC_REGNUM)
>  (unspec_volatile:CC [(const_int 0)] UNSPECV_ATOMIC_CMPSW))
> -   (set (match_operand:GPI 0 "register_operand" "=&r")   ;; val 
> out
> -(match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
> +   (set (match_operand:GPI 0 "register_operand" "=&r")
> +(match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))
> (set (match_dup 1)
>  (unspec_volatile:GPI
> -  [(match_operand:GPI 2 "aarch64_plus_operand" "rI") ;; expect
> -   (match_operand:GPI 3 "register_operand" "r")  ;; desir

Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 02:02, Ville Voutilainen
 wrote:
> Ahem, oops, the patch doesn't do any sort of a pedwarn for standard versions
> below cpp1z; I'll do a new patch taking that into account tomorrow. I don't
> think we have maybe_warn_cpp1z or anything like that? Any preferences
> how to deal with that?

Here. Tested on Linux-PPC64.

/cp
2015-09-18  Ville Voutilainen  

Implement nested namespace definitions.
* parser.c (cp_parser_namespace_definition): Grok nested namespace
definitions.

/testsuite
2015-09-18  Ville Voutilainen  

Implement nested namespace definitions.
* g++.dg/cpp1z/nested-namespace-def1.C: New.
* g++.dg/cpp1z/nested-namespace-def2.C: Likewise.
* g++.dg/cpp1z/nested-namespace-def3.C: Likewise.
* g++.dg/lookup/name-clash5.C: Adjust.
* g++.dg/lookup/name-clash6.C: Likewise.
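For readers unfamiliar with N4230: a nested namespace definition is shorthand for the fully expanded nesting. A small illustrative sketch (requires -std=c++17 or later; not one of the new tests):

```cpp
// N4230: "namespace A::B::C { ... }" behaves exactly like
// "namespace A { namespace B { namespace C { ... } } }".
namespace A::B::C {
int value() { return 42; }
}

// The expanded form reopens the very same namespaces.
namespace A { namespace B { namespace C {
int value2() { return value() + 1; }
} } }
```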
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4f424b6..602a90b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16953,6 +16953,8 @@ cp_parser_namespace_definition (cp_parser* parser)
   tree identifier, attribs;
   bool has_visibility;
   bool is_inline;
+  cp_token* token;
+  int nested_definition_count = 0;
 
   cp_ensure_no_omp_declare_simd (parser);
   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_INLINE))
@@ -16965,7 +16967,7 @@ cp_parser_namespace_definition (cp_parser* parser)
 is_inline = false;
 
   /* Look for the `namespace' keyword.  */
-  cp_parser_require_keyword (parser, RID_NAMESPACE, RT_NAMESPACE);
+  token = cp_parser_require_keyword (parser, RID_NAMESPACE, RT_NAMESPACE);
 
   /* Get the name of the namespace.  We do not attempt to distinguish
  between an original-namespace-definition and an
@@ -16979,11 +16981,36 @@ cp_parser_namespace_definition (cp_parser* parser)
   /* Parse any specified attributes.  */
   attribs = cp_parser_attributes_opt (parser);
 
-  /* Look for the `{' to start the namespace.  */
-  cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
   /* Start the namespace.  */
   push_namespace (identifier);
 
+  /* Parse any nested namespace definition. */
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE))
+{
+  if (cxx_dialect < cxx1z)
+pedwarn (input_location, OPT_Wpedantic,
+ "nested namespace definitions only available with "
+ "-std=c++17 or -std=gnu++17");
+  if (is_inline)
+error_at (token->location, "a nested namespace definition cannot be inline");
+  while (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE))
+{
+  cp_lexer_consume_token (parser->lexer);
+  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
+identifier = cp_parser_identifier (parser);
+  else
+{
+  cp_parser_error (parser, "nested identifier required");
+  break;
+}
+  ++nested_definition_count;
+  push_namespace (identifier);
+}
+}
+
+  /* Look for the `{' to validate starting the namespace.  */
+  cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
+
   /* "inline namespace" is equivalent to a stub namespace definition
  followed by a strong using directive.  */
   if (is_inline)
@@ -17007,6 +17034,10 @@ cp_parser_namespace_definition (cp_parser* parser)
   if (has_visibility)
 pop_visibility (1);
 
+  /* Finish the nested namespace definitions.  */
+  while (nested_definition_count--)
+pop_namespace ();
+
   /* Finish the namespace.  */
   pop_namespace ();
   /* Look for the final `}'.  */
diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def1.C b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def1.C
new file mode 100644
index 000..d710ef1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def1.C
@@ -0,0 +1,14 @@
+// { dg-options "-std=c++1z" }
+
+namespace A::B::C
+{
+   struct X {};
+   namespace T::U::V { struct Y {}; }
+}
+
+A::B::C::X x;
+A::B::C::T::U::V::Y y;
+
+inline namespace D::E {} // { dg-error "cannot be inline" }
+
+namespace F::G:: {} // { dg-error "nested identifier required" }
diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def2.C b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def2.C
new file mode 100644
index 000..c47a94a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def2.C
@@ -0,0 +1,5 @@
+// { dg-options "-std=c++11 -pedantic-errors" }
+
+namespace A::B::C // { dg-error "nested namespace definitions only available with" }
+{
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def3.C b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def3.C
new file mode 100644
index 000..f2dac8f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def3.C
@@ -0,0 +1,5 @@
+// { dg-options "-std=c++11" }
+
+namespace A::B::C
+{
+}
diff --git a/gcc/testsuite/g++.dg/lookup/name-clash5.C b/gcc/testsuite/g++.dg/lookup/name-clash5.C
index 74595c2..9673bb9 100644
--- a/gcc/testsuite/g++.dg/lookup/name-clash5.C
+++ b/gcc/testsuite/g++.dg/lookup

Re: [PATCH, libgomp] PR 67141, uninitialized acc_device_lock mutex

2015-09-18 Thread Jakub Jelinek
On Fri, Sep 18, 2015 at 03:41:30PM +0800, Chung-Lin Tang wrote:
> this patch fixes the uninitialized acc_device_lock mutex situation
> reported in PR 67141. The patch attached on the bugzilla page
> tries to solve it by constructor priorities, which we think will
> probably be less manageable in general.
> 
> This patch changes goacc_host_init() to be called from
> goacc_runtime_initialize() instead, thereby ensuring the init order.
> libgomp testsuite was re-run without regressions, okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2015-09-18  Chung-Lin Tang  
> 
>   PR libgomp/67141
> 

No vertical space in between PR line and subsequent entries.

>   * oacc-int.h (goacc_host_init): Add declaration.
>   * oacc-host.c (goacc_host_init): Remove static and
>   constructor attribute

Full stop at the end of entry.

>   * oacc-init.c (goacc_runtime_initialize): Call goacc_host_init()
>   at end.

The patch is ok.  Though, perhaps as a follow-up, I think I'd prefer getting
rid of pthread_key_create (&goacc_cleanup_key, goacc_destroy_thread);
it is wasteful if we do the same thing in initialize_team.  As the
goacc_tls_data pointer is __thread anyway, I think we could just put it into
struct gomp_thread, arrange for init_team to be called from the env.c
ctor, and have the team TLS destructor also do some oacc freeing if
the goacc_tls_data pointer is non-NULL (perhaps with __builtin_expect
unlikely).

Jakub


Re: [AArch64][PATCH 2/5] Add BIC instruction.

2015-09-18 Thread James Greenhalgh
On Thu, Sep 17, 2015 at 05:40:48PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch adds an expander to
> generate a BIC instruction that can be explicitly called when
> implementing the atomic__fetch pattern to calculate the value to
> be returned by the operation.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?

Why not make the "*_one_cmpl_3" pattern
named (remove the leading *) and call that in your atomic__fetch
patterns as:

  and_one_cmpl_3

I'd rather that than add a pattern that simply expands to the same
thing.

Thanks,
James

> 
> 2015-09-17  Matthew Wahab  
> 
>   * config/aarch64/aarch64.md (bic_3): New.
> 
> 

> From 14e122ee98aa20826ee070d20c58c94206cdd91b Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Mon, 17 Aug 2015 17:48:27 +0100
> Subject: [PATCH 2/5] Add BIC instruction
> 
> Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410
> ---
>  gcc/config/aarch64/aarch64.md | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 88ba72e..bae4af4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3351,6 +3351,19 @@
> (set_attr "simd" "*,yes")]
>  )
>  
> +(define_expand "bic_3"
> + [(set (match_operand:GPI 0 "register_operand" "=r")
> +   (and:GPI
> +(not:GPI
> + (SHIFT:GPI
> +  (match_operand:GPI 1 "register_operand" "r")
> +  (match_operand:QI 2 "aarch64_shift_imm_si" "n")))
> +(match_operand:GPI 3 "register_operand" "r")))]
> + ""
> + ""
> + [(set_attr "type" "logics_shift_imm")]
> +)
> +
>  (define_insn "*and_one_cmpl3_compare0"
>[(set (reg:CC_NZ CC_REGNUM)
>   (compare:CC_NZ
> -- 
> 2.1.4
> 
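For background, BIC is an and-with-complement: it ANDs the first operand with the bitwise NOT of the (optionally shifted) second operand, which is what lets the atomic_<op>_fetch result be recomputed from the loaded value. The define_expand quoted above corresponds to this scalar operation (standalone sketch, helper names hypothetical):

```cpp
#include <cstdint>

// AArch64 BIC: result = x & ~(y shifted).  The quoted define_expand
// matches the shifted form; the plain form is the shift == 0 case.
uint64_t bic(uint64_t x, uint64_t y) { return x & ~y; }
uint64_t bic_lsl(uint64_t x, uint64_t y, unsigned shift)
{
  return x & ~(y << shift);
}
```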



Re: [gomp4] OpenACC reduction tests

2015-09-18 Thread Thomas Schwinge
Hi Cesar!

Great progress with your OpenACC reductions work!

On Fri, 17 Jul 2015 11:13:59 -0700, Cesar Philippidis  
wrote:
> This patch updates the libgomp OpenACC reduction test cases [...]

> --- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
> @@ -4,9 +4,12 @@
>  
>  program reduction
>integer, parameter:: n = 40, c = 10
> -  integer   :: i, vsum, sum
> +  integer   :: i, vsum, gs, ws, vs, cs
>  
> -  call redsub (sum, n, c)
> +  call redsub_gang (gs, n, c)
> +  call redsub_worker (gs, n, c)
> +  call redsub_vector (vs, n, c)
> +  call redsub_combined (cs, n, c)
>  
>vsum = 0
>  
> @@ -15,10 +18,11 @@ program reduction
>   vsum = vsum + c
>end do
>  
> -  if (sum.ne.vsum) call abort ()
> +  if (gs .ne. vsum) call abort ()
> +  if (vs .ne. vsum) call abort ()
>  end program reduction

This looks incomplete to me, so I extended it as follows.

With -O0, I frequently see this test FAIL (thus XFAILed), both for nvptx
offloading and host-fallback execution.  Adding a few printfs, I observe
redsub_gang compute "random" results.  Given the following
-Wuninitialized/-Wmaybe-uninitialized warnings (for -O1, for example),
maybe there's some initialization of (internal) variables missing?
(These user-visible warnings about compiler internals need to be
addressed regardless.)  Would you please have a look at that?

source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In 
function 'redsub_combined_._omp_fn.0':
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:73:0: 
warning: '' is used uninitialized in this function [-Wuninitialized]
   !$acc loop reduction(+:sum) gang worker vector
^
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In 
function 'redsub_vector_._omp_fn.1':
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:60:0: 
warning: '' is used uninitialized in this function [-Wuninitialized]
   !$acc loop reduction(+:sum) vector
^
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In 
function 'redsub_worker_._omp_fn.2':
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:47:0: 
warning: '' is used uninitialized in this function [-Wuninitialized]
   !$acc loop reduction(+:sum) worker
^
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90: In 
function 'redsub_gang_._omp_fn.3':
source-gcc/libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90:34:0: 
warning: 'sum.43' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
   !$acc loop reduction(+:sum) gang
^

Committed to gomp-4_0-branch in r227897:

commit 0a1cca2cc3c1d1e2310c6438299e63a7bd99396b
Author: tschwinge 
Date:   Fri Sep 18 08:07:47 2015 +

Extend OpenACC reduction test case, XFAIL for -O0

libgomp/
* testsuite/libgomp.oacc-fortran/reduction-5.f90: Extend.  XFAIL
execution test for -O0.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@227897 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |5 +
 libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 |5 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 63bc7dc..0c0e697 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-09-18  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-fortran/reduction-5.f90: Extend.  XFAIL
+   execution test for -O0.
+
 2015-09-15  Nathan Sidwell  
 
* oacc-parallel.c (GOACC_parallel_keyed): Use GOMP_DIM constants.
diff --git libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
index 304fe7f..f787e7d 100644
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-xfail-run-if "TODO" { *-*-* } { "-O0" } }
 
 ! subroutine reduction
 
@@ -7,7 +8,7 @@ program reduction
   integer   :: i, vsum, gs, ws, vs, cs
 
   call redsub_gang (gs, n, c)
-  call redsub_worker (gs, n, c)
+  call redsub_worker (ws, n, c)
   call redsub_vector (vs, n, c)
   call redsub_combined (cs, n, c)
 
@@ -19,7 +20,9 @@ program reduction
   end do
 
   if (gs .ne. vsum) call abort ()
+  if (ws .ne. vsum) call abort ()
   if (vs .ne. vsum) call abort ()
+  if (cs .ne. vsum) call abort ()
 end program reduction
 
 subroutine redsub_gang(sum, n, c)

> -subroutine redsub(sum, n, c)
> +subroutine redsub_gang(sum, n, c)
>integer :: sum, n, c
>  
>sum = 0
> @@ -29,4 +33,43 @@ subroutine redsub(sum, n, c)
>   sum = sum + c
>end do
>!$acc end parallel
> -end subroutine redsub
> +end subroutine redsub_gang
> +
> +subroutine redsub_worker(sum, n, c)
> +  integer :: sum, n, c
> +
> +  sum = 0
> +
> + 

[gomp4] Address texinfo warnings

2015-09-18 Thread Thomas Schwinge
Hi!

The generator tool interpreted "[]" as the name of a formal parameter,
which resulted in the following texinfo warnings:

[...]/source-gcc/gcc/doc//tm.texi:5744: warning: unlikely character [ in @var.
[...]/source-gcc/gcc/doc//tm.texi:5744: warning: unlikely character ] in @var.
[...]/source-gcc/gcc/doc//tm.texi:5757: warning: unlikely character [ in @var.
[...]/source-gcc/gcc/doc//tm.texi:5757: warning: unlikely character ] in @var.
[...]/source-gcc/gcc/doc//tm.texi:5764: warning: unlikely character [ in @var.
[...]/source-gcc/gcc/doc//tm.texi:5764: warning: unlikely character ] in @var.

Fixing this by spelling out the formal parameters' names.  Also fixing a
few typos.  Committed to gomp-4_0-branch in r227898:

commit 1ee012489f5c3f0d175156fc3718c023f86137c1
Author: tschwinge 
Date:   Fri Sep 18 08:28:08 2015 +

Address texinfo warnings

gcc/
* target.def  (validate_dims, dim_limit, fork_join, lock):
Spell out the formal parameters' names.  Fix typos.
* doc/tm.texi: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@227898 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |6 ++
 gcc/doc/tm.texi|   20 ++--
 gcc/omp-low.c  |2 +-
 gcc/target.def |   20 ++--
 4 files changed, 27 insertions(+), 21 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index f4a83e2..b551413 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2015-09-18  Thomas Schwinge  
+
+   * target.def  (validate_dims, dim_limit, fork_join, lock):
+   Spell out the formal parameters' names.  Fix typos.
+   * doc/tm.texi: Regenerate.
+
 2015-09-16  Nathan Sidwell  
 
* omp-low.c (oacc_validate_dims): New function, broken out of ...
diff --git gcc/doc/tm.texi gcc/doc/tm.texi
index a151a10..b618a0e 100644
--- gcc/doc/tm.texi
+++ gcc/doc/tm.texi
@@ -5740,32 +5740,32 @@ usable.  In that case, the smaller the number is, the 
more desirable it is
 to use it.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree, int @var{[]}, 
@var{int})
+@deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int 
@var{dims[]}, int @var{fn_level})
 This hook should check the launch dimensions provided.  It should fill
-in anything that needs default to non-unity and verify non-defaults.
-Defaults are represented as -1.  Diagnostics should be issuedas 
-ppropriate.  Return true if changes have been made.  You must override
+in anything that needs to default to non-unity and verify non-defaults.
+Defaults are represented as -1.  Diagnostics should be issued as
+appropriate.  Return true if changes have been made.  You must override
 this hook to provide dimensions larger than 1.
 @end deftypefn
 
-@deftypefn {Target Hook} unsigned TARGET_GOACC_DIM_LIMIT (unsigned)
+@deftypefn {Target Hook} unsigned TARGET_GOACC_DIM_LIMIT (unsigned @var{axis})
 This hook should return the maximum size of a particular dimension,
 or zero if unbounded.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_GOACC_FORK_JOIN (gimple, const 
@var{int[]}, @var{bool})
+@deftypefn {Target Hook} bool TARGET_GOACC_FORK_JOIN (gimple @var{stmt}, const 
int @var{dims[]}, bool @var{is_fork})
 This hook should convert IFN_GOACC_FORK and IFN_GOACC_JOIN function
 calls to target-specific gimple.  It is executed during the oacc_xform
 pass.  It should return true, if the functions should be deleted.  The
-default hook returns true, if there is no RTL expanders for them.
+default hook returns true, if there are no RTL expanders for them.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_GOACC_LOCK (gimple, const @var{int[]}, 
@var{unsigned})
+@deftypefn {Target Hook} bool TARGET_GOACC_LOCK (gimple @var{stmt}, const int 
@var{dims[]}, unsigned @var{ifn_code})
 This hook should convert IFN_GOACC_LOCK, IFN_GOACC_UNLOCK,
-IFN_GOACC_LOCK_INIT  function calls to target-specific gimple.  It is
+IFN_GOACC_LOCK_INIT function calls to target-specific gimple.  It is
 executed during the oacc_xform pass.  It should return true, if the
 functions should be deleted.  The default hook returns true, if there
-is no RTL expanders for them.
+are no RTL expanders for them.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_GOACC_REDUCTION (gimple @var{call})
diff --git gcc/omp-low.c gcc/omp-low.c
index 487e7a7..33c8caf 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -9412,7 +9412,7 @@ oacc_launch_pack (unsigned code, tree device, unsigned op)
represented as a list of INTEGER_CST.  Those that are runtime
expres are represented as an INTEGER_CST of zero.  Defaults are set
as NULL_TREE and will be filled in later by the target hook
-   TARGET_OACC_VALIDATE_DIMS.
+   TARGET_GOACC_VALIDATE_DIMS.
 
TOOO. Normally the attribute will just contain a single such list.  If
however it contains a list of lists, this will represen

Re: [RFC] Masking vectorized loops with bound not aligned to VF.

2015-09-18 Thread Richard Biener
On Thu, 17 Sep 2015, Ilya Enkovich wrote:

> 2015-09-16 15:30 GMT+03:00 Richard Biener :
> > On Mon, 14 Sep 2015, Kirill Yukhin wrote:
> >
> >> Hello,
> >> I'd like to initiate discussion on vectorization of loops which
> >> boundaries are not aligned to VF. Main target for this optimization
> >> right now is x86's AVX-512, which features per-element embedded masking
> >> for all instructions. The main goal for this mail is to agree on overall
> >> design of the feature.
> >>
> >> This approach was presented @ GNU Cauldron 2015 by Ilya Enkovich [1].
> >>
> >> Here's a sketch of the algorithm:
> >>   1. Add check on basic stmts for masking: possibility to introduce index 
> >> vector and
> >>  corresponding mask
> >>   2. At the check if statements are vectorizable we additionally check if 
> >> stmts
> >>  need and can be masked and compute masking cost. Result is stored in 
> >> `stmt_vinfo`.
> >>  We are going  to mask only mem. accesses, reductions and modify mask 
> >> for already
> >>  masked stmts (mask load, mask store and vect. condition)
> >
> > I think you also need to mask divisions (for integer divide by zero) and
> > want to mask FP ops which may result in NaNs or denormals (because that's
> > generally to slow down execution a lot in my experience).
> >
> > Why not simply mask all stmts?
> 
> Hi,
> 
> Statement masking may be not free. Especially if we need to transform
> mask somehow to do it. It also may be unsupported on a platform (e.g.
> for AVX-512 not all instructions support masking) but still not be a
> problem to mask a loop. BTW for AVX-512 masking doesn't boost
> performance even if we have some special cases like NaNs. We don't
> consider exceptions in vector code (and it seems to be a case now?)
> otherwise we would need to mask them also.

Well, we do need to honor

  if (x != 0.)
   y[i] = z[i] / x;

in some way.  I think if-conversion currently simply gives up here.
So if we have the epilogue and using masked loads what are the
contents of the 'masked' elements (IIRC they are zero or all-ones, 
right)?  If they end up as zero then even simple code like

  for (i;;)
   a[i] = b[i] / c[i];

cannot be transformed in the suggested way with -ftrapping-math
and the remainder iteration might get slow if processing NaN
operands is still as slow as it was 10 years ago.

IMHO for if-converting possibly trapping stmts (like the above
example) we need some masking support anyway (and a way to express
the masking in GIMPLE).

> >
> >>   3. Make a decision about masking: take computed costs and est. 
> >> iterations count
> >>  into consideration
> >>   4. Modify prologue/epilogue generation according decision made at 
> >> analysis. Three
> >>  options available:
> >> a. Use scalar remainder
> >> b. Use masked remainder. Won't be supported in first version
> >> c. Mask main loop
> >>   5.Support vectorized loop masking:
> >> - Create stmts for mask generation
> >> - Support generation of masked vector code (create generic vector code 
> >> then
> >>   patch it w/ masks)
> >>   -  Mask loads/stores/vconds/reductions only
> >>
> >>  In first version (targeted v6) we're not going to support 4.b and loop
> >> mask pack/unpack. No `pack/unpack` means that masking will be supported
> >> only for types w/ the same size as index variable
> >
> > This means that if ncopies for any stmt is > 1 masking won't be supported,
> > right?  (you'd need two or more different masks)
> 
> We don't think it is a very important feature to have in initial
> version. It can be added later and shouldn't affect overall
> implementation design much. BTW currently masked loads and stores
> don't support masks of other sizes and don't do masks pack/unpack.

I think masked loads/stores support this just fine.  Remember the
masks are regular vectors generated by cond exprs in the current code.

> >
> >>
> >> [1] - 
> >> https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf
> >>
> >> What do you think?
> >
> > There was the idea some time ago to use single-iteration vector
> > variants for prologues/epilogues by simply overlapping them with
> > the vector loop (and either making sure to mask out the overlap
> > area or make sure the result stays the same).  This kind-of is
> > similar to 4b and thus IMHO it's better to get 4b implemented
> > rather than trying 4c.  So for example
> >
> >  int a[];
> >  for (i=0; i < 13; ++i)
> >a[i] = i;
> >
> > would be vectorized (with v4si) as
> >
> >  for (i=0; i < 13 / 4; ++i)
> >((v4si *)a)[i] = { ... };
> >  *(v4si *)(&a[9]) = { ... };
> >
> > where the epilogue store of course would be unaligned.  The masked
> > variant can avoid the data pointer adjustment and instead use a masked
> > store.
> >
> > OTOH it might be that the unaligned scheme is as efficient as the
> > masked version.  Only the masked version is more trivially correct,
> > data dependences can make the 

Re: [patch] libstdc++/65142 Check read() result in std::random_device.

2015-09-18 Thread Christophe Lyon
On 18 September 2015 at 01:18, Moore, Catherine
 wrote:
>
>
>> -Original Message-
>> From: Jonathan Wakely [mailto:jwak...@redhat.com]
>> Sent: Thursday, September 17, 2015 6:54 PM
>> To: Moore, Catherine; fdum...@gcc.gnu.org
>> Cc: Gerald Pfeifer; libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org
>> Subject: Re: [patch] libstdc++/65142 Check read() result in
>> std::random_device.
>>
>> On 17/09/15 22:32 +, Moore, Catherine wrote:
>> >
>> >
>> >> -Original Message-
>> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> >> ow...@gcc.gnu.org] On Behalf Of Jonathan Wakely
>> >> Sent: Thursday, September 17, 2015 5:28 PM
>> >> To: Gerald Pfeifer
>> >> Cc: libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org
>> >> Subject: Re: [patch] libstdc++/65142 Check read() result in
>> >> std::random_device.
>> >>
>> >> On 17/09/15 22:21 +0200, Gerald Pfeifer wrote:
>> >> >On Thu, 17 Sep 2015, Jonathan Wakely wrote:
>> >> >>> Any comments on this version?
>> >> >> Committed to trunk.
>> >> >
>> >> >Unfortunately this broke bootstrap on FreeBSD 10.1.
>> >> >
>> >> >/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc: In
>> >> member function 'std::random_device::result_type
>> >> std::random_device::_M_getval()':
>> >> >/scratch/tmp/gerald/gcc-HEAD/libstdc++-
>> v3/src/c++11/random.cc:144:22:
>> >> >error: 'errno' was not declared in this scope
>> >> >  else if (e != -1 || errno != EINTR)
>> >> >  ^
>> >> >/scratch/tmp/gerald/gcc-HEAD/libstdc++-
>> v3/src/c++11/random.cc:144:31:
>> >> >error: 'EINTR' was not declared in this scope
>> >> >  else if (e != -1 || errno != EINTR)
>> >> >   ^
>> >> >Makefile:545: recipe for target 'random.lo' failed
>> >> >
>> >> >I probably won't be able to dig in deeper today, but figured this
>> >> >might already send you on the right path?
>> >> >
>> >> >Actually...
>> >> >
>> >> >...how about he patch below?  Bootstraps on
>> >> >i386-unknown-freebsd10.1, no regressions.
>> >>
>> >> Sorry about that, I've committed your patch.
>> >>
>> >> >Gerald
>> >> >
>> >> >
>> >I'm still seeing errors for a build of the mips-sde-elf target with these
>> patches.
>> >
>> >Errors are:
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc: In function 'void
>> {anonymous}::print_word({anonymous
>> >}::PrintContext&, const char*, std::ptrdiff_t)':
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc:573:10: error: 'stderr' was not declared in this scop
>> >e
>> >  fprintf(stderr, "\n");
>> >  ^
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc:573:22: error: 'fprintf' was not declared in this sco
>> >pe
>> >  fprintf(stderr, "\n");
>> >  ^
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc:596:14: error: 'stderr' was not declared in this scop e
>> >  fprintf(stderr, "%s", spacing);
>> >  ^
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc:596:35: error: 'fprintf' was not declared in this sco pe
>> >  fprintf(stderr, "%s", spacing);
>> >   ^
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc:600:24: error: 'stderr' was not declared in this
>> >scope
>> >  int written = fprintf(stderr, "%s", word);
>> >^
>> >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s
>> >rc/c++11/debug.cc:600:42: error: 'fprintf' was not declared in this
>> >scop e
>> >  int written = fprintf(stderr, "%s", word);
>>
>> That's a different problem, due to https://gcc.gnu.org/r227885
>>
>> François, could you take a look please?
>>
>
> I've now committed this patch to solve this problem (pre-approved by 
> Jonathan).
>
> 2015-09-17  Catherine Moore  
>
> * src/c++11/debug.cc: Include .
>
>
Thanks Catherine, I confirm that it fixes the arm*elf and aarch64*elf
builds too.

Christophe.

> Index: src/c++11/debug.cc
> ===
> --- src/c++11/debug.cc  (revision 227887)
> +++ src/c++11/debug.cc  (working copy)
> @@ -32,6 +32,7 @@
>  #include 
>
>  #include 
> +#include 
>
>  #include  // for std::min
>  #include  // for _Hash_impl


Re: [PATCH, rs6000] Add expansions for min/max vector reductions

2015-09-18 Thread Richard Biener
On Thu, 17 Sep 2015, Bill Schmidt wrote:

> On Thu, 2015-09-17 at 09:18 -0500, Bill Schmidt wrote:
> > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote:
> > > On Wed, 16 Sep 2015, Alan Lawrence wrote:
> > > 
> > > > On 16/09/15 17:10, Bill Schmidt wrote:
> > > > > 
> > > > > On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
> > > > > > On 16/09/15 15:28, Bill Schmidt wrote:
> > > > > > > 2015-09-16  Bill Schmidt  
> > > > > > > 
> > > > > > >   * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX,
> > > > > > > UNSPEC_REDUC_SMIN,
> > > > > > >   UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, 
> > > > > > > UNSPEC_REDUC_SMAX_SCAL,
> > > > > > >   UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
> > > > > > >   UNSPEC_REDUC_UMIN_SCAL): New enumerated constants.
> > > > > > >   (reduc_smax_v2di): New define_expand.
> > > > > > >   (reduc_smax_scal_v2di): Likewise.
> > > > > > >   (reduc_smin_v2di): Likewise.
> > > > > > >   (reduc_smin_scal_v2di): Likewise.
> > > > > > >   (reduc_umax_v2di): Likewise.
> > > > > > >   (reduc_umax_scal_v2di): Likewise.
> > > > > > >   (reduc_umin_v2di): Likewise.
> > > > > > >   (reduc_umin_scal_v2di): Likewise.
> > > > > > >   (reduc_smax_v4si): Likewise.
> > > > > > >   (reduc_smin_v4si): Likewise.
> > > > > > >   (reduc_umax_v4si): Likewise.
> > > > > > >   (reduc_umin_v4si): Likewise.
> > > > > > >   (reduc_smax_v8hi): Likewise.
> > > > > > >   (reduc_smin_v8hi): Likewise.
> > > > > > >   (reduc_umax_v8hi): Likewise.
> > > > > > >   (reduc_umin_v8hi): Likewise.
> > > > > > >   (reduc_smax_v16qi): Likewise.
> > > > > > >   (reduc_smin_v16qi): Likewise.
> > > > > > >   (reduc_umax_v16qi): Likewise.
> > > > > > >   (reduc_umin_v16qi): Likewise.
> > > > > > >   (reduc_smax_scal_): Likewise.
> > > > > > >   (reduc_smin_scal_): Likewise.
> > > > > > >   (reduc_umax_scal_): Likewise.
> > > > > > >   (reduc_umin_scal_): Likewise.
> > > > > > 
> > > > > > You shouldn't need the non-_scal reductions. Indeed, they shouldn't 
> > > > > > be
> > > > > > used if
> > > > > > the _scal are present. The non-_scal's were previously defined as
> > > > > > producing a
> > > > > > vector with one element holding the result and the other elements 
> > > > > > all
> > > > > > zero, and
> > > > > > this was only ever used with a vec_extract immediately after; the 
> > > > > > _scal
> > > > > > pattern
> > > > > > now includes the vec_extract as well. Hence the non-_scal patterns 
> > > > > > are
> > > > > > deprecated / considered legacy, as per md.texi.
> > > > > 
> > > > > Thanks -- I had misread the description of the non-scalar versions,
> > > > > missing the part where the other elements are zero.  What I really
> > > > > want/need is an optab defined as computing the maximum value in all
> > > > > elements of the vector.
> > > > 
> > > > Yes, indeed. It seems reasonable to me that this would coexist with an 
> > > > optab
> > > > which computes only a single value (i.e. a scalar).
> > > 
> > > So just to clarify - you need to reduce the vector with max to a scalar
> > > but want the (same) result in all vector elements?
> > 
> > Yes.  Alan Hayward's cond-reduction patch is set up to perform a
> > reduction to scalar, followed by a scalar broadcast to get the value
> > into all positions.  It happens that our most efficient expansion to
> > reduce to scalar will naturally produce the value in all positions.
> > Using the reduc_smax_scal_ expander, we are required to extract
> > this into a scalar and then perform the broadcast.  Those two operations
> > are unnecessary.  We can clean up after the fact, but the cost model
> > will still reflect the unnecessary activity.
> > 
> > > 
> > > > At that point it might be appropriate to change the cond-reduction code 
> > > > to
> > > > generate the reduce-to-vector in all cases, and optabs.c expand it to
> > > > reduce-to-scalar + broadcast if reduce-to-vector was not available. 
> > > > Along with
> > > > the (parallel) changes to cost model already proposed, does that cover 
> > > > all the
> > > > cases? It does add a new tree code, yes, but I'm feeling that could be
> > > > justified if we go down this route.
> > > 
> > > I'd rather have combine recognize an insn that does both (if it
> > > exists).  As I understand powerpc doesn't have reduction to scalar
> > > (I think no ISA actually has this, but all ISAs have reduce to
> > > one vector element plus a cheap way of extraction (effectively a subreg))
> > > but it's reduction already has all result vector elements set to the
> > > same value (as opposed to having some undefined or zero or whatever).
> > 
> > The extraction is not necessarily so cheap.  For floating-point values
> > in a register, we can indeed use a subreg.  But in this case we are
> > talking about integers, an

Re: [PATCH 2/5] completely_scalarize arrays as well as records.

2015-09-18 Thread Richard Biener
On Thu, 17 Sep 2015, Alan Lawrence wrote:

> On 15/09/15 08:43, Richard Biener wrote:
> >
> > Sorry for chiming in so late...
> 
> Not at all, TYVM for your help!
> 
> > TREE_CONSTANT isn't the correct thing to test.  You should use
> > TREE_CODE () == INTEGER_CST instead.
> 
> Done (in some cases, via tree_fits_shwi_p).
> 
> > Also you need to handle
> > NULL_TREE TYPE_MIN/MAX_VALUE as that can happen as well.
> 
> I've not found any documentation as to what these mean, but from experiment,
> it seems that a zero-length array has MIN_VALUE 0 and MAX_VALUE null (as well
> as zero TYPE_SIZE) - so I allow that. In contrast a variable-length array also
> has zero TYPE_SIZE, but a large range of MIN-MAX, and I think I want to rule
> those out.
> 
> Another awkward case is Ada arrays with large offset (e.g. 
> array(2^32...2^32+1)
> which has only two elements); I don't see either of tree_to_shwi or 
> tree_to_uhwi
> as necessarily being "better" here, each will handle (some) (rare) cases the
> other will not, so I've tried to use tree_to_shwi throughout for consistency.
> 
> Summary: taken advantage of tree_fits_shwi_p, as this includes a check against
> NULL_TREE and that TREE_CODE () == INTEGER_CST.
> 
> > +  if (DECL_P (elem) && DECL_BIT_FIELD (elem))
> > +   return false;
> >
> > that can't happen (TREE_TYPE (array-type) is never a DECL).
> 
> Removed.
> 
> > +   int el_size = tree_to_uhwi (elem_size);
> > +   gcc_assert (el_size);
> >
> > so you assert on el_size being > 0 later but above you test
> > only array size ...
> 
> Good point, thanks.
> 
> > +   tree t_idx = build_int_cst (TYPE_DOMAIN (decl_type), idx);
> >
> > use t_idx = size_int (idx);
> 
> Done.
> 
> I've also added another test, of scalarizing a structure containing a
> zero-length array, as earlier attempts accidentally prevented this.
> 
> Bootstrapped + check-gcc,g++,ada,fortran on ARM and x86_64;
> Bootstrapped + check-gcc,g++,fortran on AArch64.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Alan
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/67283
>   * tree-sra.c (type_consists_of_records_p): Rename to...
>   (scalarizable_type_p): ...this, add case for ARRAY_TYPE.
>   (completely_scalarize_record): Rename to...
>   (completely_scalarize): ...this, add ARRAY_TYPE case, move some code to:
>   (scalarize_elem): New.
>   (analyze_all_variable_accesses): Follow renamings.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/67283
>   * gcc.dg/tree-ssa/sra-15.c: New.
>   * gcc.dg/tree-ssa/sra-16.c: New.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/sra-15.c |  37 
>  gcc/testsuite/gcc.dg/tree-ssa/sra-16.c |  37 
>  gcc/tree-sra.c | 165 
> +++--
>  3 files changed, 191 insertions(+), 48 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-16.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
> new file mode 100644
> index 000..a22062e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
> @@ -0,0 +1,37 @@
> +/* Verify that SRA total scalarization works on records containing arrays.  
> */
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fdump-tree-release_ssa --param 
> sra-max-scalarization-size-Ospeed=32" } */
> +
> +extern void abort (void);
> +
> +struct S
> +{
> +  char c;
> +  unsigned short f[2][2];
> +  int i;
> +  unsigned short f3, f4;
> +};
> +
> +
> +int __attribute__ ((noinline))
> +foo (struct S *p)
> +{
> +  struct S l;
> +
> +  l = *p;
> +  l.i++;
> +  l.f[1][0] += 3;
> +  *p = l;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  struct S a = {0, { {5, 7}, {9, 11} }, 4, 0, 0};
> +  foo (&a);
> +  if (a.i != 5 || a.f[1][0] != 12)
> +abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c
> new file mode 100644
> index 000..fef34c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c
> @@ -0,0 +1,37 @@
> +/* Verify that SRA total scalarization works on records containing arrays.  
> */
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fdump-tree-release_ssa --param 
> sra-max-scalarization-size-Ospeed=16" } */
> +
> +extern void abort (void);
> +
> +struct S
> +{
> +  long zilch[0];
> +  char c;
> +  int i;
> +  unsigned short f3, f4;
> +};
> +
> +
> +int __attribute__ ((noinline))
> +foo (struct S *p)
> +{
> +  struct S l;
> +
> +  l = *p;
> +  l.i++;
> +  l.f3++;
> +  *p = l;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  struct S a = { { }, 0, 4, 0, 0};
> +  foo (&a);
> +  if (a.i != 5 || a.f3 != 1)
> +abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 8b3a0ad..8

Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux

2015-09-18 Thread Paul Richard Thomas
Dear Mikael,

Thank you very much for the review. I'll give consideration to your
remarks over the weekend. You will have guessed from the comment that
I too was uneasy about forcing the break. As for your last remark,
yes, the code rewriting is indeed in the wrong place. It should be
rather easy to accomplish both the checks and defined assignments.

Thanks again

Paul

On 17 September 2015 at 15:43, Mikael Morin  wrote:
> Le 06/09/2015 18:40, Paul Richard Thomas a écrit :
>>
>> It helps to attach the patch :-)
>>
>> Paul
>>
>> On 6 September 2015 at 13:42, Paul Richard Thomas
>>  wrote:
>>>
>>> Dear All,
>>>
>>> The attached patch more or less implements the assignment of
>>> expressions to the result of a pointer function. To wit:
>>>
>>> my_ptr_fcn (arg1, arg2...) = expr
>>>
>>> arg1 would usually be the target, pointed to by the function. The
>>> patch parses these statements and resolves them into:
>>>
>>> temp_ptr => my_ptr_fcn (arg1, arg2...)
>>> temp_ptr = expr
>>>
>>> I say more or less implemented because I have ducked one of the
>>> headaches here. At the end of the specification block, there is an
>>> ambiguity between statement functions and pointer function
>>> assignments. I do not even try to resolve this ambiguity and require
>>> that there be at least one other type of executable statement before
>>> these beasts. This can undoubtedly be fixed but the effort seems to me
>>> to be unwarranted at the present time.
>>>
>>> This version of the patch extends the coverage of allowed rvalues to
>>> any legal expression. Also, all the problems with error.c have been
>>> dealt with by Manuel's patch.
>>>
>>> I am grateful to Dominique for reminding me of PR40054 and pointing
>>> out PR63921. After a remark of his on #gfortran, I fixed the checking
>>> of the standard to pick up all the offending lines with F2003 and
>>> earlier.
>>>
>>>
>>> Bootstraps and regtests on FC21/x86_64 - OK for trunk?
>>>
> Hello Paul,
>
> I'm mostly concerned about the position where the code rewriting happens.
> Details below.
>
> Mikael
>
>
>>
>> submit_2.diff
>>
>
>> Index: gcc/fortran/parse.c
>> ===
>> *** gcc/fortran/parse.c (revision 227508)
>> --- gcc/fortran/parse.c (working copy)
>> *** decode_statement (void)
>> *** 356,362 
>> --- 357,371 
>>
>> match (NULL, gfc_match_assignment, ST_ASSIGNMENT);
>> match (NULL, gfc_match_pointer_assignment, ST_POINTER_ASSIGNMENT);
>> +
>> +   if (in_specification_block)
>> + {
>> match (NULL, gfc_match_st_function, ST_STATEMENT_FUNCTION);
>> + }
>> +   else if (!gfc_notification_std (GFC_STD_F2008))
>> + {
>> +   match (NULL, gfc_match_ptr_fcn_assign, ST_ASSIGNMENT);
>> + }
>>
> As match exits the function upon success, I think it makes sense to move
> match (... gfc_match_ptr_fcn_assign ...) out of the else, namely:
>
>   if (in_specification_block)
> {
>   /* match statement function */
> }
>
>   /* match pointer fonction assignment */
>
> so that non-ambiguous cases are recognized with gfc_match_ptr_fcn_assign.
> Non-ambiguous cases are for example the ones where one of the function
> arguments is a non-variable, or a variable with a subreference, or when
> there is one keyword argument. Example (rejected with unclassifiable
> statement):
>
> program p
>   integer, parameter :: b = 3
>   integer, target:: a = 2
>
>   func(arg=b) = 1
>   if (a /= 1) call abort
>
>   func(b + b - 3) = -1
>   if (a /= -1) call abort
>
> contains
>   function func(arg) result(r)
> integer, pointer :: r
> integer :: arg
>
> if (arg == 3) then
>   r => a
> else
>   r => null()
> end if
>   end function func
> end program p
>
>
>> Index: gcc/fortran/resolve.c
>> ===
>> *** gcc/fortran/resolve.c   (revision 227508)
>> --- gcc/fortran/resolve.c   (working copy)
>> *** generate_component_assignments (gfc_code
>> *** 10133,10138 
>> --- 10141,10205 
>>   }
>>
>>
>> + /* F2008: Pointer function assignments are of the form:
>> +   ptr_fcn (args) = expr
>> +This function breaks these assignments into two statements:
>> +   temporary_pointer => ptr_fcn(args)
>> +   temporary_pointer = expr  */
>> +
>> + static bool
>> + resolve_ptr_fcn_assign (gfc_code **code, gfc_namespace *ns)
>> + {
>> +   gfc_expr *tmp_ptr_expr;
>> +   gfc_code *this_code;
>> +   gfc_component *comp;
>> +   gfc_symbol *s;
>> +
>> +   if ((*code)->expr1->expr_type != EXPR_FUNCTION)
>> + return false;
>> +
>> +   /* Even if standard does not support this feature, continue to build
>> +  the two statements to avoid upsetting frontend_passes.c.  */
>
> I don't mind this, but maybe we should return false at the end, when an
> error has been emitted?
>
>> +   gfc_notify_std (GFC_STD_F2008, "Pointer procedure assignment at "
>> + "%L", &(*code)->loc);
>

Re: [PATCH, rs6000] Add expansions for min/max vector reductions

2015-09-18 Thread Richard Biener
On Thu, 17 Sep 2015, Segher Boessenkool wrote:

> On Thu, Sep 17, 2015 at 09:18:42AM -0500, Bill Schmidt wrote:
> > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote:
> > > So just to clarify - you need to reduce the vector with max to a scalar
> > > but want the (same) result in all vector elements?
> > 
> > Yes.  Alan Hayward's cond-reduction patch is set up to perform a
> > reduction to scalar, followed by a scalar broadcast to get the value
> > into all positions.  It happens that our most efficient expansion to
> > reduce to scalar will naturally produce the value in all positions.
> 
> It also is many insns after expand, so relying on combine to combine
> all that plus the following splat (as Richard suggests below) is not
> really going to work.
> 
> If there also are targets where the _scal version is cheaper, maybe
> we should keep both, and have expand expand to whatever the target
> supports?

Wait .. so you don't actually have an instruction to do, say,
REDUC_MAX_EXPR (neither to scalar nor to vector)?  Then it's better
to _not_ define such pattern and let the vectorizer generate
its fallback code.  If the fallback code isn't "best" then better
think of a way to make it choose the best variant out of its
available ones (and maybe add another).  I think it tests
availability of the building blocks for the variants and simply
picks the first that works without checking the cost model.

Richard.

> 
> Segher
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [AArch64][PATCH 3/5] Add atomic load-operate instructions.

2015-09-18 Thread James Greenhalgh
On Thu, Sep 17, 2015 at 05:42:35PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch adds the ARMv8.1 atomic
> load-operate instructions.
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?
> Matthew
> 
> 2015-09-17  Matthew Wahab  
> 
>   * config/aarch64/aarch64/atomics.md (UNSPECV_ATOMIC_LDOP): New.
>   (UNSPECV_ATOMIC_LDOP_OR): New.
>   (UNSPECV_ATOMIC_LDOP_BIC): New.
>   (UNSPECV_ATOMIC_LDOP_XOR): New.
>   (UNSPECV_ATOMIC_LDOP_PLUS): New.
>   (ATOMIC_LDOP): New.
>   (atomic_ldop): New.
>   (aarch64_atomic_load): New.
> 

> From 6a8a83c4efbd607924f0630779d4915c9dad079c Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Mon, 10 Aug 2015 17:02:08 +0100
> Subject: [PATCH 3/5] Add atomic load-operate instructions.
> 
> Change-Id: I3746875bad7469403bee7df952f0ba565e4abc71
> ---
>  gcc/config/aarch64/atomics.md | 41 +
>  1 file changed, 41 insertions(+)
> 
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 0e71002..b7b6fb5 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -29,8 +29,25 @@
>  UNSPECV_ATOMIC_CAS   ; Represent an atomic CAS.
>  UNSPECV_ATOMIC_SWP   ; Represent an atomic SWP.
>  UNSPECV_ATOMIC_OP; Represent an atomic operation.
> +UNSPECV_ATOMIC_LDOP  ; Represent an atomic 
> load-operation
> +UNSPECV_ATOMIC_LDOP_OR   ; Represent an atomic load-or
> +UNSPECV_ATOMIC_LDOP_BIC  ; Represent an atomic load-bic
> +UNSPECV_ATOMIC_LDOP_XOR  ; Represent an atomic load-xor
> +UNSPECV_ATOMIC_LDOP_PLUS ; Represent an atomic load-add
>  ])
>  
> +;; Iterators for load-operate instructions.
> +
> +(define_int_iterator ATOMIC_LDOP
> + [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC
> +  UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS])
> +
> +(define_int_attr atomic_ldop
> + [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr")
> +  (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")])

There is precedent (atomic_optab, atomic_op_operand, const_atomic, etc.) for
these living in config/aarch64/iterators.md so they should be moved there.
Presumably the difficulty with that is to do with the position of the
"unspecv" define_c_enum? I'd argue that is in the wrong place too...

If you want to leave this to a cleanup patch in stage 3 that is fine.

This patch is OK for trunk.

Thanks,
James

> +
> +;; Instruction patterns.
> +
>  (define_expand "atomic_compare_and_swap"
>[(match_operand:SI 0 "register_operand" "");; bool 
> out
> (match_operand:ALLI 1 "register_operand" "")  ;; val 
> out
> @@ -541,3 +558,27 @@
>  else
>return "casal\t%0, %2, %1";
>  })
> +
> +;; Atomic load-op: Load data, operate, store result, keep data.
> +
> +(define_insn "aarch64_atomic_load"
> + [(set (match_operand:ALLI 0 "register_operand" "=r")
> +   (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q"))
> +  (set (match_dup 1)
> +   (unspec_volatile:ALLI
> +[(match_dup 1)
> + (match_operand:ALLI 2 "register_operand")
> + (match_operand:SI 3 "const_int_operand")]
> +ATOMIC_LDOP))]
> + "TARGET_LSE && reload_completed"
> + {
> +   enum memmodel model = memmodel_from_int (INTVAL (operands[3]));
> +   if (is_mm_relaxed (model))
> + return "ld\t%2, %0, %1";
> +   else if (is_mm_acquire (model) || is_mm_consume (model))
> + return "lda\t%2, %0, %1";
> +   else if (is_mm_release (model))
> + return "ldl\t%2, %0, %1";
> +   else
> + return "ldal\t%2, %0, %1";
> + })
> -- 
> 2.1.4
> 



Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-09-18 Thread Thomas Schwinge
Hi!

On Tue, 1 Sep 2015 18:29:55 +0200, Tom de Vries  wrote:
> On 27/08/15 03:37, Cesar Philippidis wrote:
> > -  ctx->ganglocal_size_host = align_and_expand (&gl_host, host_size, align);
> 
> I suspect this caused a bootstrap failure (align_and_expand unused). 
> Worked-around as attached.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1450,7 +1450,7 @@ omp_copy_decl (tree var, copy_body_data *cb)
>  
>  /* Modify the old size *POLDSZ to align it up to ALIGN, and then return
> a value with SIZE added to it.  */
> -static tree
> +static tree ATTRIBUTE_UNUSED
>  align_and_expand (tree *poldsz, tree size, unsigned int align)
>  {
>tree oldsz = *poldsz;

If I remember correctly, this has only ever been used in the "ganglocal"
implementation -- which is now gone.  So, should align_and_expand also be
elided (Cesar)?


Regards,
 Thomas




Re: [PATCH 3/4] [ARM] Add attribute/pragma target fpu=

2015-09-18 Thread Kyrill Tkachov

Hi Christian,
(going through the patches...)

On 14/09/15 12:39, Christian Bruel wrote:

This patch splits the neon_builtins initialization into 2 internals
functions. One for NEON and one for CRYPTO, each one guarded by its own
predicate. arm_init_neon_builtins is now global to be called from
arm_valid_target_attribute_tree if needed.

 
diff -ruN gnu_trunk.p2/gcc/gcc/config/arm/arm-protos.h gnu_trunk.p3/gcc/gcc/config/arm/arm-protos.h

--- gnu_trunk.p2/gcc/gcc/config/arm/arm-protos.h	2015-09-11 15:23:51.852687891 +0200
+++ gnu_trunk.p3/gcc/gcc/config/arm/arm-protos.h	2015-09-11 16:30:15.833511559 +0200
@@ -213,7 +213,10 @@
 extern bool arm_change_mode_p (tree);
 #endif
 
-extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,

+extern void arm_init_neon_builtins (void);
+
+extern tree arm_valid_target_attribute_tree (tree,
+struct gcc_options *,
 struct gcc_options *);

Why the arm_valid_target_attribute_tree declaration here?
I don't see its relevance in this patch, and it's not mentioned in the 
ChangeLog.

Kyrill



[PATCH] Do not add_location_or_const_value_attribute in early dwarf

2015-09-18 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, gdb testsuite without
regressions, applied.

Richard.

2015-09-18  Richard Biener  

* dwarf2out.c (add_location_or_const_value_attribute): Do nothing
in early-dwarf.

Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c (revision 227898)
+++ gcc/dwarf2out.c (working copy)
@@ -16145,6 +16145,9 @@ add_location_or_const_value_attribute (d
   var_loc_list *loc_list;
   cached_dw_loc_list *cache;
 
+  if (early_dwarf)
+return false;
+
   if (TREE_CODE (decl) == ERROR_MARK)
 return false;
 


Re: [AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.

2015-09-18 Thread James Greenhalgh
On Thu, Sep 17, 2015 at 05:47:43PM +0100, Matthew Wahab wrote:
> Hello,
> 
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch uses the ARMv8.1 atomic
> load-operate instructions to implement the atomic_fetch_
> patterns. This patch also updates the implementation of the atomic_
> patterns, which are treated as versions of the atomic_fetch_ which
> discard the loaded data.
> 
> The general form of the code generated for an atomic_fetch_, with
> destination D, source S, memory address A and memory order MO, depends
> on whether the operation is directly supported by the instruction. If
>  is one of PLUS, IOR or XOR, the code generated is:
> 
>  ld S, D, [A]
> 
> where
> is one of {add, set, eor}
> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
> is one of {'', 'b', 'h'} depending on the data size.
> 
> If  is SUB, the code generated, with scratch register r, is:
> 
>  neg r, S
>  ldadd r, D, [A]
> 
> If  is AND, the code generated is:
>  not r, S
>  ldclr r, D, [A]
> 
> Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
> existing aarch64_split_atomic_op function, to implement the operation
> using sequences built with the ARMv8 load-exclusive/store-exclusive
> instructions
> 
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> Ok for trunk?

Some comments in line below.
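As a source-level illustration of the mapping Matthew describes, the patterns correspond to the `__atomic_fetch_*` builtins. The helper names below are purely illustrative; the instruction spellings in the comments come from the patch description, and on a compiler without LSE support the same builtins simply fall back to load-exclusive/store-exclusive loops.

```c
#include <assert.h>

/* Each __atomic_fetch_<op> returns the value memory held *before*
   the operation -- exactly what the atomic load-operate instructions
   provide.  */
static int fetch_add_relaxed (int *p, int v)
{
  /* PLUS is supported directly (ldadd, per the patch description).  */
  return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
}

static int fetch_sub_acquire (int *p, int v)
{
  /* SUB has no direct instruction: negate, then an ldadd variant.  */
  return __atomic_fetch_sub (p, v, __ATOMIC_ACQUIRE);
}

static int fetch_and_seq_cst (int *p, int v)
{
  /* AND has no direct instruction: invert, then an ldclr variant.  */
  return __atomic_fetch_and (p, v, __ATOMIC_SEQ_CST);
}
```

Inspecting the generated assembly with `gcc -O2 -S` (assuming a target and flags that enable the LSE extension) should show the ld-op forms instead of exclusive-access loops.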

> From c4b8eb6d2ca62c57f4a011e06335b918f603ad2a Mon Sep 17 00:00:00 2001
> From: Matthew Wahab 
> Date: Fri, 7 Aug 2015 17:10:42 +0100
> Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns.
> 
> Change-Id: I39759f02e61039067ccaabfd52039e4804eddf2f
> ---
>  gcc/config/aarch64/aarch64-protos.h|   2 +
>  gcc/config/aarch64/aarch64.c   | 176 -
>  gcc/config/aarch64/atomics.md  | 109 -
>  .../gcc.target/aarch64/atomic-inst-ldadd.c |  58 +++
>  .../gcc.target/aarch64/atomic-inst-ldlogic.c   | 109 +
>  5 files changed, 444 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index eba4c76..76ebd6f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx);
>  void aarch64_expand_compare_and_swap (rtx op[]);
>  void aarch64_split_compare_and_swap (rtx op[]);
>  void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx);
> +
> +bool aarch64_atomic_ldop_supported_p (enum rtx_code);
>  void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx);
>  void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
>  
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index dc05c6e..33f9ef3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[])
>emit_insn (gen_rtx_SET (bval, x));
>  }
>  
> +/* Test whether the target supports using an atomic load-operate instruction.
> +   CODE is the operation and AFTER is TRUE if the data in memory after the
> +   operation should be returned and FALSE if the data before the operation
> +   should be returned.  Returns FALSE if the operation isn't supported by the
> +   architecture.
> +  */

Stray newline, leave the */ on the line before.

> +
> +bool
> +aarch64_atomic_ldop_supported_p (enum rtx_code code)
> +{
> +  if (!TARGET_LSE)
> +return false;
> +
> +  switch (code)
> +{
> +case SET:
> +case AND:
> +case IOR:
> +case XOR:
> +case MINUS:
> +case PLUS:
> +  return true;
> +default:
> +  return false;
> +}
> +}
> +
>  /* Emit a barrier, that is appropriate for memory model MODEL, at the end of
>     a sequence implementing an atomic operation.  */
>  
> @@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value,
>emit_insn (gen (dst, mem, value, model));
>  }
>  
> -/* Emit an atomic operation where the architecture supports it.  */
> +/* Operations supported by aarch64_emit_atomic_load_op.  */
> +
> +enum aarch64_atomic_load_op_code
> +{
> +  AARCH64_LDOP_PLUS, /* A + B  */
> +  AARCH64_LDOP_XOR,  /* A ^ B  */
> +  AARCH64_LDOP_OR,   /* A | B  */
> +  AARCH64_LDOP_BIC   /* A & ~B  */
> +};

I have a small preference to calling these the same name as the
instructions they will generate, so AARCH64_LDOP_ADD, AARCH64_LDOP_EOR,
AARCH64_LDOP_SET and AARCH64_LDOP_CLR, but I'm happy for you to leave it
this way if you prefer.

> +
> +/* Emit an atomic load-operate.  */
> +
>

Re: [PATCH 4/4] [ARM] Add attribute/pragma target fpu=

2015-09-18 Thread Kyrill Tkachov


On 15/09/15 11:47, Christian Bruel wrote:


On 09/14/2015 04:30 PM, Christian Bruel wrote:

Finally, the final part of the patch set does the attribute target
parsing and checking, redefines the preprocessor macros and implements
the inlining rules.

testcases and documentation included.


new version to remove a shadowed remnant piece of code.


  > thanks
  >
  > Christian
  >


+  /* OK to inline between different modes.
+ Function with mode specific instructions, e.g using asm,
+ must be explicitely protected with noinline.  */

s/explicitely/explicitly/


+  const struct arm_fpu_desc *fpu_desc1
+= &all_fpus[caller_opts->x_arm_fpu_index];
+  const struct arm_fpu_desc *fpu_desc2
+= &all_fpus[callee_opts->x_arm_fpu_index];

Please call these caller_fpu and callee_fpu, it's much easier to reason about 
the inlining rules that way

+
+  /* Can't inline NEON extension if the caller doesn't support it.  */
+  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON)
+  && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON))
+return false;
+
+  /* Can't inline CRYPTO extension if the caller doesn't support it.  */
+  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO)
+  && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO))
+return false;
+

We also need to take into account FPU_FL_FP16...
In general what we want is for the callee FPU features to be
a subset of the callers features, similar to the way we handle
the x_aarch64_isa_flags handling in aarch64_can_inline_p from the
aarch64 port. I think that's the way to go here rather than explicitly
writing down a check for each feature.
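A minimal sketch of that subset rule, with plain bitmasks standing in for the real feature sets (`can_inline_fpu_p` and the masks are illustrative names, not the port's actual API):

```c
#include <assert.h>

typedef unsigned int fpu_feature_set;

/* Inlining is allowed only when every FPU feature the callee relies on
   is also available in the caller, i.e. callee_fpu is a subset of
   caller_fpu.  One check covers NEON, CRYPTO, FP16 and any future
   feature bit, instead of a hand-written test per feature.  */
static int
can_inline_fpu_p (fpu_feature_set caller_fpu, fpu_feature_set callee_fpu)
{
  return (callee_fpu & ~caller_fpu) == 0;
}
```

This is the same shape as the `x_aarch64_isa_flags` comparison in `aarch64_can_inline_p` that the review points to.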

@@ -242,6 +239,8 @@
 
   /* Update macros.  */

   gcc_assert (cur_opt->x_target_flags == target_flags);
+  /* This one can be redefined by the pragma without warning.  */
+  cpp_undef (parse_in, "__ARM_FP");
   arm_cpu_builtins (parse_in);
 
Could you elaborate why the cpp_undef here?

If you want to undefine __ARM_FP so you can redefine it to a new value
in arm_cpu_builtins then I think you should just undefine it in that function.


diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
--- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-10 12:21:00.698911244 +0200
+++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-14 10:27:20.281932581 +0200
@@ -13360,6 +13363,8 @@
 floating-point arithmetic (in particular denormal values are treated as
 zero), so the use of NEON instructions may lead to a loss of precision.
 
+You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).

+

s/"mfpu="/"fpu="


--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.0 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-14 16:12:08.449698268 +0200
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */
+
+void
+f3(int n, int x[], int y[]) {
+  int i;
+  for (i = 0; i < n; ++i)
+y[i] = x[i] << 3;
+}
+

What if GCC has been configured with --with-fpu=neon?
Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the 
dg-options.





Re: [gomp4] Remove more gang local bits

2015-09-18 Thread Thomas Schwinge
Hi!

On Thu, 10 Sep 2015 13:48:56 -0400, Nathan Sidwell  wrote:
> I've committed this to gomp4 branch.  It removes more now-obsolete bits of 
> gang 
> local handling.

> --- libgomp/target.c  (revision 227633)
> +++ libgomp/target.c  (working copy)
> @@ -373,12 +373,7 @@ gomp_map_vars (struct gomp_device_descr
>   k->tgt_offset = tgt_size;
>   tgt_size += k->host_end - k->host_start;
>   k->copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
> - k->dealloc_host = (kind & typemask)
> -   == GOMP_MAP_FORCE_TO_GANGLOCAL;
> - if (GOMP_MAP_POINTER_P (kind & typemask) && i < 0 &&
> - (get_kind (is_openacc, kinds, i-1) & typemask)
> - == GOMP_MAP_FORCE_TO_GANGLOCAL)
> -   k->dealloc_host = true;
> + k->dealloc_host = false;
>   k->refcount = 1;
>   k->async_refcount = 0;
>   tgt->refcount++;

The dealloc_host flag had only been used in the "ganglocal"
implementation, which is now gone, so this can now also go; committed to
gomp-4_0-branch in r227900:

commit 108d67ade49b25931ba14788e39d6fd91259c37d
Author: tschwinge 
Date:   Fri Sep 18 09:11:54 2015 +

libgomp: Remove dealloc_host member of struct splay_tree_key_s

libgomp/
* libgomp.h (struct splay_tree_key_s): Remove dealloc_host member.
Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@227900 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |3 +++
 libgomp/libgomp.h  |2 --
 libgomp/target.c   |5 -
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 0c0e697..12cf8aa 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2015-09-18  Thomas Schwinge  
 
+   * libgomp.h (struct splay_tree_key_s): Remove dealloc_host member.
+   Adjust all users.
+
* testsuite/libgomp.oacc-fortran/reduction-5.f90: Extend.  XFAIL
execution test for -O0.
 
diff --git libgomp/libgomp.h libgomp/libgomp.h
index e976850..d51b08b 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -678,8 +678,6 @@ struct splay_tree_key_s {
   uintptr_t async_refcount;
   /* True if data should be copied from device to host at the end.  */
   bool copy_from;
-  /* True if data should be freed on the host, e.g. for ganglocal vars.  */
-  bool dealloc_host;
 };
 
 #include "splay-tree.h"
diff --git libgomp/target.c libgomp/target.c
index 5b77f3c..6ca80ad 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -373,7 +373,6 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
k->tgt_offset = tgt_size;
tgt_size += k->host_end - k->host_start;
k->copy_from = GOMP_MAP_COPY_FROM_P (kind & typemask);
-   k->dealloc_host = false;
k->refcount = 1;
k->async_refcount = 0;
tgt->refcount++;
@@ -569,8 +568,6 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
  devicep->dev2host_func (devicep->target_id, (void *) k->host_start,
  (void *) (k->tgt->tgt_start + k->tgt_offset),
  k->host_end - k->host_start);
-   if (k->dealloc_host)
- free ((void *)k->host_start);
splay_tree_remove (&devicep->mem_map, k);
if (k->tgt->refcount > 1)
  k->tgt->refcount--;
@@ -712,7 +709,6 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->refcount = 1;
   k->async_refcount = 0;
   k->copy_from = false;
-  k->dealloc_host = false;
   tgt->list[i] = k;
   tgt->refcount++;
   array->left = NULL;
@@ -741,7 +737,6 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->refcount = 1;
   k->async_refcount = 0;
   k->copy_from = false;
-  k->dealloc_host = false;
   tgt->list[i] = k;
   tgt->refcount++;
   array->left = NULL;


Regards,
 Thomas




Re: Openacc launch API

2015-09-18 Thread Bernd Schmidt

On 09/17/2015 04:40 PM, Nathan Sidwell wrote:


Added a call to gomp_fatal, indicating libgomp is out of date. Also added
a default to the following switch with the same effect.  The trouble
with implementing handling of device_type here now is the difficulty of
testing its correctness.  If it were buggy, we'd be in a worse position
than not having it.


Is that so difficult though? See if nvptx ignores (let's say) intelmic 
arguments in favour of the default and accepts nvptx ones.



+  if (num_waits > 8)
+gomp_fatal ("too many waits for legacy interface");
+
+  va_start (ap, num_waits);
+  for (ix = 0; ix != num_waits; ix++)
+waits[ix] = va_arg (ap, int);
+  waits[ix] = 0;
+  va_end (ap);


I still don't like this. I think there are at least two better 
alternatives: add a new GOMP_LAUNCH_key which makes GOACC_parallel read 
a number of waits from a va_list * pointer passed after it, or just 
admit that the legacy function always does host fallback and just 
truncate the current version after


  if (host_fallback)
{
  goacc_save_and_set_bind (acc_device_host);
  fn (hostaddrs);
  goacc_restore_bind ();
  return;
}

(which incidentally ignores all the wait arguments).

Other than that the patch is fine with me, but Jakub should have the 
last word.



Bernd


Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-18 Thread Rainer Orth
Hi Kyrill,

> Bootstrapped and tested on aarch64 and x86_64.
> Rainer, could you please try this patch in combination with the one I sent
> earlier at:
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

will do, however, Solaris/SPARC bootstrap is broken right now (PR
bootstrap/67622) and I'll have to hunt that down first.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-18 Thread Richard Biener
On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse  wrote:
> Just a couple extra points. We can end up with a mix of < and >, which might
> prevent from matching:
>
>   _3 = b_1(D) > a_2(D);
>   _5 = a_2(D) < c_4(D);
>   _8 = _3 & _5;
>
> Just like with &, we could also transform:
> x < y | x < z  --->  x < max(y, z)
>
> (but maybe wait to make sure reviewers are ok with the first transformation
> before generalizing)

Please merge the patterns as suggested and do the :c/:s changes as well.

The issue with getting mixed < and > is indeed there - I've wanted to
extend :c to handle tcc_comparison in some way at some point but
didn't get to how best to implement that yet...

So to fix that currently you have to replicate the merged pattern
with swapped comparison operands.
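The identity behind the transformation is easy to check in plain C. The scalar helpers below are only illustrative — the actual rewrite happens on GIMPLE via the match.pd pattern under discussion:

```c
#include <assert.h>

static long min2 (long a, long b) { return a < b ? a : b; }
static long max2 (long a, long b) { return a > b ? a : b; }

/* (x < y) && (x < z)  is equivalent to  x < min (y, z).  */
static int both_lt (long x, long y, long z) { return (x < y) && (x < z); }
static int lt_min  (long x, long y, long z) { return x < min2 (y, z); }

/* (x > y) && (x > z)  is equivalent to  x > max (y, z).  */
static int both_gt (long x, long y, long z) { return (x > y) && (x > z); }
static int gt_max  (long x, long y, long z) { return x > max2 (y, z); }
```

The same reasoning gives Marc's `|` variant: `x < y | x < z` becomes `x < max (y, z)`, since the disjunction only needs the larger bound to be exceeded.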

Otherwise I'm fine with the general approach.

Richard.

>
> On Fri, 18 Sep 2015, Marc Glisse wrote:
>
>> On Thu, 17 Sep 2015, Michael Collison wrote:
>>
>>> Here is the the patch modified with test cases for MIN_EXPR and MAX_EXPR
>>> expressions. I need some assistance; this test case will fail on targets
>>> that don't have support for MIN/MAX such as 68k. Is there any way to remedy
>>> this short of enumerating whether a target support MIN/MAX in
>>> testsuite/lib/target_support?
>>>
>>> 2015-07-24  Michael Collison 
>>>Andrew Pinski 
>>>
>>>* match.pd ((x < y) && (x < z) -> x < min (y,z),
>>>(x > y) and (x > z) -> x > max (y,z))
>>>* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>>>
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 5e8fd32..8691710 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>>> (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>>   (convert:utype @4)))
>>>
>>> +
>>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>>> +(for op (lt le)
>>> +(simplify
>>
>>
>> You seem to be missing all indentation.
>>
>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>
>>
>> :c seems useless here. On the other hand, it might make sense to use op:s
>> since this is mostly useful if it removes the 2 original comparisons.
>>
>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>
>>
>> How did you chose this restriction? It seems safe enough, but the
>> transformation could make sense in other cases as well. It can always be
>> generalized later though.
>>
>>> +(op @0 (min @1 @2)
>>> +
>>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>>> +(for op (gt ge)
>>
>>
>> Note that you could unify the patterns with something like:
>> (for op (lt le gt ge)
>> ext (min min max max)
>> (simplify ...
>>
>>> +(simplify
>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>> +(op @0 (max @1 @2)
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>> new file mode 100644
>>> index 000..cc0189a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>> @@ -0,0 +1,23 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>> +
>>> +#define N 1024
>>> +
>>> +int a[N], b[N], c[N];
>>> +
>>> +void add (unsigned int m, unsigned int n)
>>> +{
>>> +  unsigned int i;
>>> +  for (i = 0; i < m && i < n; ++i)
>>
>>
>> Maybe writing '&' instead of '&&' would make it depend less on the target.
>> Also, both tests seem to be for GENERIC (i.e. I expect that you are already
>> seeing the optimized version with -fdump-tree-original or
>> -fdump-tree-gimple). Maybe something as simple as:
>> int f(long a, long b, long c) {
>>  int cmp1 = a < b;
>>  int cmp2 = a < c;
>>  return cmp1 & cmp2;
>> }
>>
>>> +a[i] = b[i] + c[i];
>>> +}
>>> +
>>> +void add2 (unsigned int m, unsigned int n)
>>> +{
>>> +  unsigned int i;
>>> +  for (i = N-1; i > m && i > n; --i)
>>> +a[i] = b[i] + c[i];
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
>>> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
>>
>>
>>
>
> --
> Marc Glisse


Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249)

2015-09-18 Thread Marek Polacek
On Thu, Sep 17, 2015 at 10:37:40AM -0600, Martin Sebor wrote:
> >>The patch currently issues a false positive for the test case
> >>below. I suspect the chain might need to be cleared after each
> >>condition that involves a side-effect.
> >>
> >>   int foo (int a)
> >>   {
> >> if (a) return 1; else if (++a) return 2; else if (a) return 3;
> >> return 0;
> >>   }
> >
> >But the last branch here can never be reached, right?  If a == 0, foo
> >returns 2, otherwise it just returns 1.  So I think we should diagnose
> >this.
> 
> It probably wasn't the best example. The general issue here is
> that the second condition has a side-effect that can change (in
> this case clearly does) the value of the expression.
> 
> Here's a better example:
> 
> int a;
> 
> int bar (void) { a = 1; return 0; }
> 
> int foo (void) {
> if (a) return 1;
> else if (foo ()) return 2;
> else if (a) return 3;
> return 0;
> }
> 
> Since we don't know bar's side-effects we must assume they change
> the value of a and so we must avoid diagnosing the third if.

Ok, I'm convinced now.  We have something similar in the codebase:
libsupc++/eh_catch.cc has

  int count = header->handlerCount;
  if (count < 0)
{   
  // This exception was rethrown.  Decrement the (inverted) catch
  // count and remove it from the chain when it reaches zero.
  if (++count == 0)
globals->caughtExceptions = header->nextException;
}   
  else if (--count == 0)
{   
  // Handling for this exception is complete.  Destroy the object.
  globals->caughtExceptions = header->nextException;
  _Unwind_DeleteException (&header->unwindHeader);
  return;
}   
  else if (count < 0)
// A bug in the exception handling library or compiler.
std::terminate ();

Here all arms are reachable.  I guess I need to kill the chain of conditions
when we find something with side-effects, exactly as you suggested.

Again, thanks for your comments.

Marek


[PATCH][AARCH64] Emulating aligned mask loads on AArch64

2015-09-18 Thread Pawel Kupidura

This patch uses max reductions to emulate aligned masked loads on AArch64.
It reduces the mask to a scalar that is nonzero if any mask element is true,
then uses that scalar to select between the real address and a scratchpad
address.

The idea is that if the vector load is aligned, it cannot cross a page
boundary and so cannot partially fault.  It is safe to load from the
address (and use only some of the result) if any mask element is true.
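A scalar model of the address-select step described above (the names and the loop standing in for the unsigned-max reduction are illustrative only; the real implementation emits a vector `reduc_umax` and a conditional select on the address):

```c
#include <assert.h>

static int scratchpad[16];   /* safe dummy target when no lane is active */

/* Reduce the mask with unsigned max: the result is nonzero iff any
   element is true.  If so, the aligned load from REAL cannot cross a
   page boundary and so cannot fault; otherwise redirect the load to
   the scratchpad so it is always safe.  */
static const int *
masked_load_addr (const unsigned *mask, int n, const int *real)
{
  unsigned any = 0;
  for (int i = 0; i < n; ++i)
    any = mask[i] > any ? mask[i] : any;
  return any ? real : (const int *) scratchpad;
}
```

The vectorizer then performs one full-width load from the chosen address and uses only the lanes selected by the original mask.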

The patch provided a 15% speed improvement for simple microbenchmarks.

There were several spec2k6 benchmarks affected by the patch: 400.perlbench,
403.gcc, 436.cactusADM, 454.calculix and 464.h264.  However, the changes
had no measurable effect on performance.

Regression-tested on x86_64-linux-gnu, aarch64-linux-gnu and 
arm-linux-gnueabi.


Thanks,
Pawel
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 73f2729..066d133 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -134,5 +134,6 @@ bool can_vec_mask_load_store_p (machine_mode, bool);
 bool can_compare_and_swap_p (machine_mode, bool);
 bool can_atomic_exchange_p (machine_mode, bool);
 bool lshift_cheap_p (bool);
+bool supports_umax_reduction ();
 
 #endif
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 254089f..23a85a4 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -463,6 +463,21 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
   return 0;
 }
 
+/* Return true if target supports unsigned max reduction for any mode.  */
+
+bool
+supports_umax_reduction ()
+{
+  machine_mode mode;
+
+  for (mode = MIN_MODE_VECTOR_INT; mode <= MAX_MODE_VECTOR_INT;
+   mode = (machine_mode) (mode + 1))
+if (optab_handler (reduc_umax_scal_optab, mode) != CODE_FOR_nothing)
+  return true;
+
+  return false;
+}
+
 /* Return true if target supports vector masked load/store for mode.  */
 
 bool
diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-4.c b/gcc/testsuite/gcc.dg/vect/vect-align-4.c
new file mode 100644
index 000..98db8e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-align-4.c
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target umax_reduction } */
+
+#define N 512
+#define K 32
+
+extern void abort (void) __attribute__((noreturn));
+
+int a[N] __attribute__ ((aligned (16)));
+int b[N] __attribute__ ((aligned (16)));
+int c[N] __attribute__ ((aligned (16)));
+
+__attribute__ ((noinline)) void
+init_arrays () {
+  int i;
+
+  for (i = 0; i < N / 4; ++i)
+a[i] = K + 1;
+
+  for (i = N / 4; i < N / 2; ++i)
+a[i] = (i % 2 == 0) ? K - 1 : K + 1;
+
+  for (i = N / 2; i < N; ++i)
+a[i] = K - 1;
+
+  for (i = 0; i < N; ++i)
+b[i] = i;
+}
+
+__attribute__ ((noinline)) void
+check_array () {
+  int i = 0;
+
+  for (i = 0; i < N / 4; ++i)
+if (c[i] != a[i])
+  abort ();
+
+  for (i = N / 4; i < N / 2; ++i)
+if (c[i] != ((i % 2 == 0) ? b[i] : a[i]))
+  abort ();
+
+  for (i = N / 2; i < N; ++i)
+if (c[i] != b[i])
+  abort ();
+}
+
+__attribute__ ((noinline)) void
+main1 (int* bp) {
+  int i;
+
+  for (i = 0; i < N; ++i)
+c[i] = a[i] < K ? bp[i] : a[i];
+
+  check_array ();
+}
+
+int main (void) {
+  init_arrays ();
+
+  main1 (b);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-5.c b/gcc/testsuite/gcc.dg/vect/vect-align-5.c
new file mode 100644
index 000..93bfaa1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-align-5.c
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target umax_reduction } */
+
+#define N 512
+#define K 32
+
+extern void abort (void) __attribute__((noreturn));
+
+int a[N] __attribute__ ((aligned (16)));
+int b[N];
+int c[N] __attribute__ ((aligned (16)));
+
+__attribute__ ((noinline)) void
+init_arrays () {
+  int i;
+
+  for (i = 0; i < N / 4; ++i)
+a[i] = K + 1;
+
+  for (i = N / 4; i < N / 2; ++i)
+a[i] = (i % 2 == 0) ? K - 1 : K + 1;
+
+  for (i = N / 2; i < N; ++i)
+a[i] = K - 1;
+
+  for (i = 0; i < N; ++i)
+b[i] = i;
+}
+
+__attribute__ ((noinline)) void
+check_array () {
+  int i = 0;
+
+  for (i = 0; i < N / 4; ++i)
+if (c[i] != a[i])
+  abort ();
+
+  for (i = N / 4; i < N / 2; ++i)
+if (c[i] != ((i % 2 == 0) ? b[i] : a[i]))
+  abort ();
+
+  for (i = N / 2; i < N; ++i)
+if (c[i] != b[i])
+  abort ();
+}
+
+__attribute__ ((noinline)) void
+main1 (int* bp) {
+  int i;
+
+  for (i = 0; i < N; ++i)
+c[i] = a[i] < K ? bp[i] : a[i];
+
+  check_array ();
+}
+
+int main (void) {
+  init_arrays ();
+
+  main1 (b);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a465eb1..9b1c338 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6449,3 +6449,14 @@ proc check_effective_target_comdat_group {} {
int (*fn) () = foo;
 }]
 }
+
+# Return 1 if the target supports unsigned max 

Re: [PATCH][AARCH64] Emulating aligned mask loads on AArch64

2015-09-18 Thread James Greenhalgh
On Fri, Sep 18, 2015 at 11:24:50AM +0100, Pawel Kupidura wrote:
> This patch uses max reductions to emulate aligned masked loads on AArch64.
> It reduces the mask to a scalar that is nonzero if any mask element is true,
> then uses that scalar to select between the real address and a scratchpad
> address.
> 
> The idea is that if the vector load is aligned, it cannot cross a page
> boundary and so cannot partially fault.  It is safe to load from the
> address (and use only some of the result) if any mask element is true.
> 
> The patch provided a 15% speed improvement for simple microbenchmarks.
> 
> There were several spec2k6 benchmarks affected by the patch: 400.perlbench,
> 403.gcc, 436.cactusADM, 454.calculix and 464.h264.  However, the changes
> had no measurable effect on performance.
> 
> Regression-tested on x86_64-linux-gnu, aarch64-linux-gnu and 
> arm-linux-gnueabi.

Hi Pawel, this patch doesn't look AArch64 specific to me. You will probably
get more traction with reviews if you post it tagged appropriately and
with the relevant maintainers on CC, in this case - as an auto-vectorizer
patch, Richard Biener and Zdenek Dvorak.

It is also customary to include a ChangeLog in your submissions; this can
be useful for seeing at a glance what your patch modifies.

Thanks,
James

> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index 73f2729..066d133 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -134,5 +134,6 @@ bool can_vec_mask_load_store_p (machine_mode, bool);
>  bool can_compare_and_swap_p (machine_mode, bool);
>  bool can_atomic_exchange_p (machine_mode, bool);
>  bool lshift_cheap_p (bool);
> +bool supports_umax_reduction ();
>  
>  #endif
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 254089f..23a85a4 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -463,6 +463,21 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
>return 0;
>  }
>  
> +/* Return true if target supports unsigned max reduction for any mode.  */
> +
> +bool
> +supports_umax_reduction ()
> +{
> +  machine_mode mode;
> +
> +  for (mode = MIN_MODE_VECTOR_INT; mode <= MAX_MODE_VECTOR_INT;
> +   mode = (machine_mode) (mode + 1))
> +if (optab_handler (reduc_umax_scal_optab, mode) != CODE_FOR_nothing)
> +  return true;
> +
> +  return false;
> +}
> +
>  /* Return true if target supports vector masked load/store for mode.  */
>  
>  bool
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-4.c b/gcc/testsuite/gcc.dg/vect/vect-align-4.c
> new file mode 100644
> index 000..98db8e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-align-4.c
> @@ -0,0 +1,65 @@
> +/* { dg-require-effective-target umax_reduction } */
> +
> +#define N 512
> +#define K 32
> +
> +extern void abort (void) __attribute__((noreturn));
> +
> +int a[N] __attribute__ ((aligned (16)));
> +int b[N] __attribute__ ((aligned (16)));
> +int c[N] __attribute__ ((aligned (16)));
> +
> +__attribute__ ((noinline)) void
> +init_arrays () {
> +  int i;
> +
> +  for (i = 0; i < N / 4; ++i)
> +a[i] = K + 1;
> +
> +  for (i = N / 4; i < N / 2; ++i)
> +a[i] = (i % 2 == 0) ? K - 1 : K + 1;
> +
> +  for (i = N / 2; i < N; ++i)
> +a[i] = K - 1;
> +
> +  for (i = 0; i < N; ++i)
> +b[i] = i;
> +}
> +
> +__attribute__ ((noinline)) void
> +check_array () {
> +  int i = 0;
> +
> +  for (i = 0; i < N / 4; ++i)
> +if (c[i] != a[i])
> +  abort ();
> +
> +  for (i = N / 4; i < N / 2; ++i)
> +if (c[i] != ((i % 2 == 0) ? b[i] : a[i]))
> +  abort ();
> +
> +  for (i = N / 2; i < N; ++i)
> +if (c[i] != b[i])
> +  abort ();
> +}
> +
> +__attribute__ ((noinline)) void
> +main1 (int* bp) {
> +  int i;
> +
> +  for (i = 0; i < N; ++i)
> +c[i] = a[i] < K ? bp[i] : a[i];
> +
> +  check_array ();
> +}
> +
> +int main (void) {
> +  init_arrays ();
> +
> +  main1 (b);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-5.c 
> b/gcc/testsuite/gcc.dg/vect/vect-align-5.c
> new file mode 100644
> index 000..93bfaa1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-align-5.c
> @@ -0,0 +1,65 @@
> +/* { dg-require-effective-target umax_reduction } */
> +
> +#define N 512
> +#define K 32
> +
> +extern void abort (void) __attribute__((noreturn));
> +
> +int a[N] __attribute__ ((aligned (16)));
> +int b[N];
> +int c[N] __attribute__ ((aligned (16)));
> +
> +__attribute__ ((noinline)) void
> +init_arrays () {
> +  int i;
> +
> +  for (i = 0; i < N / 4; ++i)
> +a[i] = K + 1;
> +
> +  for (i = N / 4; i < N / 2; ++i)
> +a[i] = (i % 2 == 0) ? K - 1 : K + 1;
> +
> +  for (i = N / 2; i < N; ++i)
> +a[i] = K - 1;
> +
> +  for (i = 0; i < N; ++i)
> +b[i] = i;
> +}
> +
> +__attribute__ ((noinline)) void
> +check_array () {
> +  int i = 0;
> +
> +  for (i = 0; i < N / 4; ++i)
> +if (c[i] != a[i])
> +  abort ();
> +
> +  for (i = N / 4; i < N / 2; ++i)
> +if (c[

Re: debug mode symbols cleanup

2015-09-18 Thread Ulrich Weigand
Francois Dumont wrote:

> * include/debug/formatter.h
> (_Error_formatter::_Parameter::_M_print_field): Delete.
> (_Error_formatter::_Parameter::_M_print_description): Likewise.
> (_Error_formatter::_M_format_word): Likewise.
> (_Error_formatter::_M_print_word): Likewise.
> (_Error_formatter::_M_print_string): Likewise.
> (_Error_formatter::_M_get_max_length): Likewise.
> (_Error_formatter::_M_max_length): Likewise.
> (_Error_formatter::_M_indent): Likewise.
> (_Error_formatter::_M_column): Likewise.
> (_Error_formatter::_M_first_line): Likewise.
> (_Error_formatter::_M_wordwrap): Likewise.
> * src/c++11/debug.cc: Adapt.

This seems to break building an spu-elf cross-compiler:

/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:
 In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const 
char*, std::ptrdiff_t)':
/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10:
 error: 'stderr' was not declared in this scope
  fprintf(stderr, "\n");
  ^
/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22:
 error: 'fprintf' was not declared in this scope
  fprintf(stderr, "\n");
  ^

Bye,
Ulrich



Re: debug mode symbols cleanup

2015-09-18 Thread Jonathan Wakely

On 18/09/15 13:00 +0200, Ulrich Weigand wrote:

Francois Dumont wrote:


* include/debug/formatter.h
(_Error_formatter::_Parameter::_M_print_field): Delete.
(_Error_formatter::_Parameter::_M_print_description): Likewise.
(_Error_formatter::_M_format_word): Likewise.
(_Error_formatter::_M_print_word): Likewise.
(_Error_formatter::_M_print_string): Likewise.
(_Error_formatter::_M_get_max_length): Likewise.
(_Error_formatter::_M_max_length): Likewise.
(_Error_formatter::_M_indent): Likewise.
(_Error_formatter::_M_column): Likewise.
(_Error_formatter::_M_first_line): Likewise.
(_Error_formatter::_M_wordwrap): Likewise.
* src/c++11/debug.cc: Adapt.


This seems to break building an spu-elf cross-compiler:

/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:
 In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const 
char*, std::ptrdiff_t)':
/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10:
 error: 'stderr' was not declared in this scope
 fprintf(stderr, "\n");
 ^
/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22:
 error: 'fprintf' was not declared in this scope
 fprintf(stderr, "\n");
 ^


Catherine fixed this with r227888.


FW: [PATCH] Target hook for disabling the delay slot filler.

2015-09-18 Thread Simon Dardis
> Are you trying to say that you have the option as to what kind of 
> branch to use?  ie, "ordinary", presumably without a delay slot or one 
> with a delay slot?

> Is the "ordinary" actually just a nullified delay slot or some form of 
> likely/not likely static hint?

Specifically for MIPSR6: the ISA possesses traditional delay slot branches and
normal branches (no delay slots, no annulling, no hints, a subtle static hazard),
aka "compact branches" in MIPS terminology. They could be described as
nullify-on-taken delay slot branches, but we saw little to no value in that view.

Matthew Fortune provided a writeup with their handling in GCC: 

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html

> But what is the compact form at the micro-architectural level?  My
> mips-fu has diminished greatly, but my recollection is the bubble is
> always there.   Is that not the case?

The pipeline bubble will exist, but the performance impact varies across
R6 cores. High-end out-of-order cores won't be impacted as much, but
lower-end cores will. microMIPSR6 removes delay slot branches altogether,
which pushes the simplest micro-architectures to optimize away the cost of
the pipeline bubble.

For non-microMIPSR6, this is why we have different branch policies implemented
in the MIPS backend to allow branch usage to be tuned. By default, if a delay
slot can be filled then we use a delay slot branch; otherwise we use a compact
branch, as the only thing in the delay slot would be a NOP anyway.

Compact branches have an odd restriction in that they cannot be followed by a
CTI. This is apparently to simplify branch predictors, but it may be lifted in
future ISA releases.

> If it is able to find insns from the commonly executed path that don't 
> have a long latency, then the fill is usually profitable (since the 
> pipeline bubble always exists).  However, pulling a long latency 
> instruction (say anything that might cache miss or an fdiv/fsqrt) off 
> the slow path and conditionally nullifying it can be *awful*.
> Everything else is in-between.

I agree. The variability in profit/loss is a concern, and I see two ways to deal
with it:

A) Modify the delay slot filler so that it chooses speculative instructions
below some cost threshold and avoids instruction duplication when the eager
filler picks an instruction from a block with multiple predecessors. Making such
changes would be invasive and require more target-specific hooks.

B) Use compact branches instead of speculative delay slot execution, forsaking
variable performance for a consistent pipeline bubble by not using the
speculative delay filler at all.

Between these two choices, B seems the better option due to its sheer simplicity.
Choosing neither gives speculative instruction execution when there could be a
small, consistent penalty instead.

Thanks,
Simon

From: Jeff Law [l...@redhat.com]
Sent: 17 September 2015 17:55
To: Simon Dardis; Bernd Schmidt
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Target hook for disabling the delay slot filler.

On 09/17/2015 03:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch 
> depends on how the delay slot is filled. If a delay slot can be filled 
> from an instruction preceding the branch or instructions proceeding 
> that must be executed on both sides then it is profitable to use a delay slot 
> branch.
Agreed.  It's an over-simplification, but for the purposes of this discussion 
it's close enough.


>
> For cases when instructions are chosen from one side of the branch, 
> the proposed optimization strategy is to not speculatively execute 
> instructions when ordinary branches could be used. Performance-wise 
> this avoids executing instructions which the eager delay filler picked 
> wrongly.
Are you trying to say that you have the option as to what kind of branch to 
use?  ie, "ordinary", presumably without a delay slot or one with a delay slot?

Is the "ordinary" actually just a nullified delay slot or some form of 
likely/not likely static hint?



>
> Since most branches have a compact form disabling the eager delay 
> filler should be no worse than altering it not to fill delay slots in this 
> case.
But what is the compact form at the micro-architectural level?  My mips-fu has 
diminished greatly, but my recollection is the bubble is
always there.   Is that not the case?

fill_eager_delay_slots is most definitely speculative and its profitability is 
largely dependent on the cost of what insns it finds to fill those delay slots 
and whether they're from the common or uncommon path.

If it is able to find insns from the commonly executed path that don't have a 
long latency, then the fill is usually profitable (since the pipeline bubble 
always exists).  However, pulling a long latency instruction (say anything that 
might cache miss or an fdiv/fsqrt) off the slow path and conditionally 
nullifying it can be *awful*.
Everything else is in-between.



J

Re: debug mode symbols cleanup

2015-09-18 Thread Ulrich Weigand
Jonathan Wakely wrote:
> On 18/09/15 13:00 +0200, Ulrich Weigand wrote:
> >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:
> > In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const 
> >char*, std::ptrdiff_t)':
> >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:10:
> > error: 'stderr' was not declared in this scope
> >  fprintf(stderr, "\n");
> >  ^
> >/home/uweigand/dailybuild/spu-tc-2015-09-17/gcc-head/src/libstdc++-v3/src/c++11/debug.cc:573:22:
> > error: 'fprintf' was not declared in this scope
> >  fprintf(stderr, "\n");
> >  ^
> 
> Catherine fixed this with r227888.

Ah, OK.  Thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [patch] libstdc++/64857 Rationalise PCH headers and 17_intro/headers tests.

2015-09-18 Thread Jonathan Wakely

On 11/09/15 13:25 +0100, Jonathan Wakely wrote:

diff --git a/libstdc++-v3/include/precompiled/extc++.h 
b/libstdc++-v3/include/precompiled/extc++.h
index de3775b..8883e47 100644
--- a/libstdc++-v3/include/precompiled/extc++.h
+++ b/libstdc++-v3/include/precompiled/extc++.h
@@ -28,15 +28,25 @@

#if __cplusplus < 201103L
#include 
+#else
+#include 
#endif

#include 
+#if __cplusplus >= 201103L
+# include 
+#endif
+#include 
#include 
#include 
#include 
#include 
+#if __cplusplus >= 201103L
+# include 
+#endif
#include 
#include 
+#include 


Kai pointed out that  is already included further
down, and guarded by _GLIBCXX_HAVE_ICONV, so doing it here breaks some
targets.


#include 
#include 
#include 
@@ -45,9 +55,13 @@
#include 
#include 
#include 
+#include 
#include 
#include 
#include 
+#if __cplusplus >= 201103L
+# include 
+#endif


This breaks bootstrap on (at least) NetBSD 5.x because <ext/random>
uses UINT32_C. That fails due to https://gcc.gnu.org/PR65806 so I'm
checking for that macro in <ext/random> (and I also have a patch
coming to use GCC's own stdint.h on NetBSD 5).

Fixed by the attached patch.

Tested powerpc64le-linux and x86_64-netbsd5.1.

Committed to trunk.

commit c8d1af248beaa22b81d7606a14a2742028d20056
Author: Jonathan Wakely 
Date:   Fri Sep 18 11:35:11 2015 +0100

Fix errors due to extra includes in extc++.h

	* include/precompiled/extc++.h: Fix bootstrap error due to
	unconditional inclusion of .
	* include/ext/random: Check for definition of UINT32_C.

diff --git a/libstdc++-v3/include/ext/random b/libstdc++-v3/include/ext/random
index be6db5d..0bcfa4a 100644
--- a/libstdc++-v3/include/ext/random
+++ b/libstdc++-v3/include/ext/random
@@ -43,7 +43,7 @@
 # include 
 #endif
 
-#ifdef _GLIBCXX_USE_C99_STDINT_TR1
+#if defined(_GLIBCXX_USE_C99_STDINT_TR1) && defined(UINT32_C)
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
@@ -3499,7 +3499,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 #include "ext/opt_random.h"
 #include "random.tcc"
 
-#endif // _GLIBCXX_USE_C99_STDINT_TR1
+#endif // _GLIBCXX_USE_C99_STDINT_TR1 && UINT32_C
 
 #endif // C++11
 
diff --git a/libstdc++-v3/include/precompiled/extc++.h b/libstdc++-v3/include/precompiled/extc++.h
index 8883e47..31f5ec8 100644
--- a/libstdc++-v3/include/precompiled/extc++.h
+++ b/libstdc++-v3/include/precompiled/extc++.h
@@ -46,7 +46,6 @@
 #endif
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 


Re: [PATCH][AARCH64] Emulating aligned mask loads on AArch64

2015-09-18 Thread Richard Biener
On Fri, 18 Sep 2015, James Greenhalgh wrote:

> On Fri, Sep 18, 2015 at 11:24:50AM +0100, Pawel Kupidura wrote:
> > This patch uses max reductions to emulate aligned masked loads on AArch64.
> > It reduces the mask to a scalar that is nonzero if any mask element is true,
> > then uses that scalar to select between the real address and a scratchpad
> > address.
> > 
> > The idea is that if the vector load is aligned, it cannot cross a page
> > boundary and so cannot partially fault.  It is safe to load from the
> > address (and use only some of the result) if any mask element is true.
> > 
> > The patch provided a 15% speed improvement for simple microbenchmarks.
> > 
> > There were several spec2k6 benchmarks affected by patch: 400.perlbench,
> > 403.gcc, 436.cactusADM, 454.calculix and 464.h264.  However, the changes
> > had no measurable effect on performance.
> > 
> > Regression-tested on x86_64-linux-gnu, aarch64-linux-gnu and 
> > arm-linux-gnueabi.
> 
> Hi Pawel, this patch doesn't look AArch64 specific to me. You will probably
> get more traction with reviews if you post it tagged appropriately and
> with the relevant maintainers on CC, in this case - as an auto-vectorizer
> patch, Richard Biener and Zdenek Dvorak.
> 
> It is also customary to include a ChangeLog in your submissions; this can
> be useful for seeing at a glance what your patch modifies.

Some comments - first of all, you don't need REDUC_MAX_EXPR; you only
need to do a test for a non-zero mask.  You can do this by querying
an integer mode of vec_mask's size and then using

  VIEW_CONVERT_EXPR (vec_mask) != 0 ? ...

Now to your assumption that an "aligned" vector load cannot trap.  I
think that assumption is wrong if the alignment requirement of the
target is not the size of the vector.  Note that the current
implementation inside the vectorizer has a notion of
aligned == aligned to vector size, so I think you are fine here.

As you are converting the masked load to a conditional load the
target may have support for that and thus you could avoid the
scratch memory in that case (you can also avoid creating more than
one scratch memory per vectorized function - all vectors are of the
same size).  Not sure if there is any target this applies to,
though.  IIRC AMD XOP had a real vector "cond-expr" that allowed
a memory operand in one of the arms (but I don't remember whether
the specification said anything about not performing the load
if the mask specifies all bits come from the other operand).

Other than that, a minor comment on the patch below...

> Thanks,
> James
> 
> > diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> > index 73f2729..066d133 100644
> > --- a/gcc/optabs-query.h
> > +++ b/gcc/optabs-query.h
> > @@ -134,5 +134,6 @@ bool can_vec_mask_load_store_p (machine_mode, bool);
> >  bool can_compare_and_swap_p (machine_mode, bool);
> >  bool can_atomic_exchange_p (machine_mode, bool);
> >  bool lshift_cheap_p (bool);
> > +bool supports_umax_reduction ();
> >  
> >  #endif
> > diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> > index 254089f..23a85a4 100644
> > --- a/gcc/optabs-query.c
> > +++ b/gcc/optabs-query.c
> > @@ -463,6 +463,21 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
> >return 0;
> >  }
> >  
> > +/* Return true if target supports unsigned max reduction for any mode.  */
> > +
> > +bool
> > +supports_umax_reduction ()
> > +{
> > +  machine_mode mode;
> > +
> > +  for (mode = MIN_MODE_VECTOR_INT; mode <= MAX_MODE_VECTOR_INT;
> > +   mode = (machine_mode) (mode + 1))
> > +if (optab_handler (reduc_umax_scal_optab, mode) != CODE_FOR_nothing)
> > +  return true;
> > +
> > +  return false;
> > +}
> > +
> >  /* Return true if target supports vector masked load/store for mode.  */
> >  
> >  bool
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-4.c 
> > b/gcc/testsuite/gcc.dg/vect/vect-align-4.c
> > new file mode 100644
> > index 000..98db8e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-align-4.c
> > @@ -0,0 +1,65 @@
> > +/* { dg-require-effective-target umax_reduction } */
> > +
> > +#define N 512
> > +#define K 32
> > +
> > +extern void abort (void) __attribute__((noreturn));
> > +
> > +int a[N] __attribute__ ((aligned (16)));
> > +int b[N] __attribute__ ((aligned (16)));
> > +int c[N] __attribute__ ((aligned (16)));
> > +
> > +__attribute__ ((noinline)) void
> > +init_arrays () {
> > +  int i;
> > +
> > +  for (i = 0; i < N / 4; ++i)
> > +a[i] = K + 1;
> > +
> > +  for (i = N / 4; i < N / 2; ++i)
> > +a[i] = (i % 2 == 0) ? K - 1 : K + 1;
> > +
> > +  for (i = N / 2; i < N; ++i)
> > +a[i] = K - 1;
> > +
> > +  for (i = 0; i < N; ++i)
> > +b[i] = i;
> > +}
> > +
> > +__attribute__ ((noinline)) void
> > +check_array () {
> > +  int i = 0;
> > +
> > +  for (i = 0; i < N / 4; ++i)
> > +if (c[i] != a[i])
> > +  abort ();
> > +
> > +  for (i = N / 4; i < N / 2; ++i)
> > +if (c[i] != (i % 2 == 0) ? b[i] : a[i])
> > +  abort ();
> > +
> >

Re: [RFC][Scalar masks 1/x] Introduce GEN_MASK_EXPR.

2015-09-18 Thread Richard Biener
On Tue, Aug 25, 2015 at 11:02 PM, Jeff Law  wrote:
> On 08/21/2015 10:30 AM, Ilya Enkovich wrote:
>>>
>>> If we're checking an optab to drive an optimization, then we're probably
>>> on
>>> the wrong track.
>>
>>
>> That's totally similar to VEC_COND_EXPR which we generate comparison into.
>
> It is.  The vectorizer is riddled with this stuff.  Sigh.  So I won't
> consider this a negative for the scalar mask support.

Hey, just because it's already bad don't make it worse!

>>
>>> I think this ties into the overall discussion about
>>> whether or not to represent these masks in gimple or try to handle them
>>> later during gimple->rtl expansion.
>>
>>
>> Currently we don't have any abstraction for masks, it is supported
>> using vector of integers. When I expand it I have no idea whether it
>> is just a vector of integers to be stored or a mask to be used for
>> MASK_LOAD. Actually it may be used in both cases at the same time.
>>
>> Handling it in RTL means we have to undo bool->int transformation made
>> in GIMPLE. For trivial cases it may be easy but in generic it can be
>> challenging.  I want to avoid it from the beginning.
>
> I wasn't suggesting handling them in RTL, but at the border between gimple
> and RTL.
>
> But if we can't reliably determine how a particular mask is going to be used
> at that point, then doing things at the gimple/RTL border may be a
> spectacularly bad idea ;-)

Yeah, I also see Ilya's point here.  Basically the new integer masks are
generated by if-conversion only.  What I was suggesting is to differentiate
both cases by making vector comparisons always result in vector.
Thus if you have

  if (a < b || d > e)
   x = c;

then vectorized if-converted code would generate

  vector cnd = a < b;
  vector cnd2 = d > e;
  vector cnd3 = cnd | cnd2;
  x = cnd3 ? c : x;

when the target supports vector and otherwise

  vector cnd = a < b ? -1 : 0;
  vector cnd2 = d > e ? -1 : 0;
  vector cnd3 = cnd | cnd2;
  x = cnd3 ? c : x;

which means only VEC_COND exprs would support both kinds of masks
(well, and the IFNs for masked loads/stores).  Basically the vectorizer
needs to handle vectorizing conditions like

   cnd = a < b;

dependent on target support for masks (that is vectorizing of 'bool'
operations).

Reworking the existing vectorization of bool operations to not depend
on pattern detection would be a first step (and already solving a few
PRs on the way).

Richard.

>
> jeff


[gomp4] default reduction expansion

2015-09-18 Thread Nathan Sidwell
The default reduction expander was confusingly not placed with the other openacc 
default hooks; it also indirected through a bunch of worker functions, all doing 
essentially the same thing, which obscured what was happening.


Reimplemented thusly.

nathan
2015-09-18  Nathan Sidwell  

	* omp-low.c (default_goacc_reduction): Move to other default
	handlers, reimplement to simplify.
	(default_goacc_reduction_setup, default_goacc_reduction_teardown,
	default_goacc_reduction_init_fini): Remove.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 227822)
+++ gcc/omp-low.c	(working copy)
@@ -15027,6 +15027,60 @@ default_goacc_lock (gimple ARG_UNUSED (s
   return false;
 }
 
+/* Default goacc.reduction early expander.
+
+   LHS-opt = IFN_RED_ (RES_PTR-opt, VAR, LEVEL, OP, LID, RID)
+   If RES_PTR is not integer-zerop:
+   SETUP - emit 'LHS = *RES_PTR', LHS = NULL
+   TEARDOWN - emit '*RES_PTR = VAR'
+   If LHS is not NULL
+   emit 'LHS = VAR'
+
+ Return false -- does not need a rescan.  */
+
+bool
+default_goacc_reduction (gimple call)
+{
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
+  tree lhs = gimple_call_lhs (call);
+  tree var = gimple_call_arg (call, 1);
+  unsigned code = gimple_call_internal_fn (call);
+  gimple_seq seq = NULL;
+
+  /* Mark the function for SSA renaming.  */
+  mark_virtual_operands_for_renaming (cfun);
+
+  if (code == IFN_GOACC_REDUCTION_SETUP
+  || code == IFN_GOACC_REDUCTION_TEARDOWN)
+{
+  /* Setup and Teardown need to copy from/to the receiver object,
+	 if there is one.  */
+  tree ref_to_res = gimple_call_arg (call, 0);
+  
+  if (!integer_zerop (ref_to_res))
+	{
+	  tree dst = build_simple_mem_ref (ref_to_res);
+	  tree src = var;
+	  
+	  if (code == IFN_GOACC_REDUCTION_SETUP)
+	{
+	  src = dst;
+	  dst = lhs;
+	  lhs = NULL;
+	}
+	  gimple_seq_add_stmt (&seq, gimple_build_assign (dst, src));
+	}
+}
+
+  /* Copy VAR to LHS, if there is an LHS.  */
+  if (lhs)
+gimple_seq_add_stmt (&seq, gimple_build_assign (lhs, var));
+
+  gsi_replace_with_seq (&gsi, seq, true);
+
+  return false;
+}
+
 namespace {
 
 const pass_data pass_data_oacc_transform =
@@ -15070,145 +15124,4 @@ make_pass_oacc_transform (gcc::context *
   return new pass_oacc_transform (ctxt);
 }
 
-/* Default implementation of targetm.goacc.reduction_setup.  This hook
-   provides a baseline implementation for the internal function
-   GOACC_REDUCTION_SETUP for a single-threaded target.  I.e. num_gangs =
-   num_workers = vector_length = 1.
-
-   Given:
-
- V = IFN_RED_SETUP (RES_PTR, LOCAL, LEVEL, OP. LID, RID)
-
-   Expand to:
-
- V = RES_PTR ? *RES_PTR : LOCAL;
-*/
-
-static bool
-default_goacc_reduction_setup (gimple call)
-{
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  tree v = gimple_call_lhs (call);
-  tree ref_to_res = gimple_call_arg (call, 0);
-  tree local_var = gimple_call_arg (call, 1);
-  gimple_seq seq = NULL;
-
-  push_gimplify_context (true);
-
-  if (!integer_zerop (ref_to_res))
-{
-  tree x = build_simple_mem_ref (ref_to_res);
-  gimplify_assign (v, x, &seq);
-}
-  else
-gimplify_assign (v, local_var, &seq);
-
-  pop_gimplify_context (NULL);
-
-  gsi_replace_with_seq (&gsi, seq, true);
-
-  return false;
-}
-
-/* Default implementation for both targetm.goacc.reduction_init and
-   reduction_fini.  This hook provides a baseline implementation for the
-   internal functions GOACC_REDUCTION_INIT and GOACC_REDUCTION_FINI for a
-   single-threaded target.
-
-   Given:
-
- V = IFN_RED_INIT (RES_PTR, LOCAL, LEVEL, OP, LID, RID)
-
-   or
-
- V = IFN_RED_FINI (RES_PTR, LOCAL, LEVEL, OP, LID, RID)
-
-   Expand to:
-
- V = LOCAL;
-*/
-
-static bool
-default_goacc_reduction_init_fini (gimple call)
-{
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  tree v = gimple_call_lhs (call);
-  tree local_var = gimple_call_arg (call, 1);
-  gimple g;
-
-  g = gimple_build_assign (v, local_var);
-  gsi_replace (&gsi, g, true);
-
-  return false;
-}
-
-/* Default implementation of targetm.goacc.reduction_teardown.  This hook
-   provides a baseline implementation for the internal function
-   GOACC_REDUCTION_TEARDOWN for a single-threaded target.
-
-   Given:
-
- IFN_RED_TEARDOWN (RES_PTR, LOCAL, LEVEL, OP, LID, RID)
-
-   Expand to:
-
- if (RES_PTR)
-   *RES_PTR = LOCAL;
-
-V = LOCAL;
-*/
-
-static bool
-default_goacc_reduction_teardown (gimple call)
-{
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  tree lhs = gimple_call_lhs (call);
-  tree ref_to_res = gimple_call_arg (call, 0);
-  tree var = gimple_call_arg (call, 1);
-  gimple_seq seq = NULL;
-
-  push_gimplify_context (true);
-
-  if (!integer_zerop (ref_to_res))
-{
-  tree x = build_simple_mem_ref (ref_to_res);
-  gimplify_assign (x, var, &seq);
-}
-
-  if (lhs != NULL_TREE)
-gimplify_assign (lhs, var, &seq);
-
-  pop_gimplify_context (NU

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-18 Thread Richard Biener
On Tue, Sep 15, 2015 at 5:32 PM, Alan Hayward  wrote:
>
>
> On 15/09/2015 13:09, "Richard Biener"  wrote:
>
>>
>>Now comments on the patch itself.
>>
>>+  if (code == COND_EXPR)
>>+   *v_reduc_type = COND_REDUCTION;
>>
>>so why not simply use COND_EXPR instead of the new v_reduc_type?
>
> v_reduc_type is also dependent on check_reduction (which comes from
> !nested_cycle in vectorizable_reduction).
> It seemed messy to keep on checking for both those things throughout.
>
> In my patch to catch simpler condition reductions, I’ll be adding another
> value to this enum too. v_reduc_type will be set to this new value based
> on the same properties for COND_REDUCTION plus some additional constraints.
>
>>
>>+  if (check_reduction && code != COND_EXPR &&
>>+  vect_is_slp_reduction (loop_info, phi, def_stmt))
>>
>>&&s go to the next line
>
> ok
>
>>
>>+ /* Reduction of the max index and a reduction of the found
>>+values.  */
>>+ epilogue_cost += add_stmt_cost (target_cost_data, 1,
>>+ vec_to_scalar, stmt_info, 0,
>>+ vect_epilogue);
>>
>>vec_to_scalar once isn't what the comment suggests.  Instead the
>>comment suggests twice what a regular reduction would do
>>but I guess we can "hide" the vec_to_scalar cost and "merge" it
>>with the broadcast.  Thus make the above two vector_stmt costs?
>>
>>+ /* A broadcast of the max value.  */
>>+ epilogue_cost += add_stmt_cost (target_cost_data, 2,
>>+ scalar_to_vec, stmt_info, 0,
>>+ vect_epilogue);
>>
>>comment suggests a single broadcast.
>
> I’ve made a copy/paste error here. Just need to swap the 1 and the 2.
>
>
>>
>>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
>>  the final vector of induction results:  */
>>   exit_phi = NULL;
>>   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
>>-{
>>+   {
>>  gimple use_stmt = USE_STMT (use_p);
>>  if (is_gimple_debug (use_stmt))
>>continue;
>>
>>please avoid unrelated whitespace changes.
>
> Ok. I was changing “8 spaces” to a tab, but happy to revert.
>
>>
>>+  case COND_EXPR:
>>+   if (v_reduc_type == COND_REDUCTION)
>>+ {
>>...
>>+   /* Fall through.  */
>>+
>>   case MIN_EXPR:
>>   case MAX_EXPR:
>>-  case COND_EXPR:
>>
>>aww, so we could already handle COND_EXPR reductions?  How do they
>>differ from what you add?  Did you see if that path is even exercised
>>today?
>
> Today, COND_EXPRs are only supported when they are nested inside a loop.
> See the vect-cond-*.c tests.
> For example:
>
> for (j = 0; j < M; j++)
> {
>   x = x_in[j];
>   curr_a = a[0];
>
> for (i = 0; i < N; i++)
>   {
> next_a = a[i+1];
> curr_a = x > c[i] ? curr_a : next_a;
>   }
>   x_out[j] = curr_a;
> }
>
>
> In that case, only the outer loop is vectorised.
>
>>
>>+   /* Create a vector of {init_value, 0, 0, 0...}.  */
>>+   vec *v;
>>+   vec_alloc (v, nunits);
>>+   CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
>>+   if (SCALAR_FLOAT_TYPE_P (scalar_type))
>>+ for (i = 1; i < nunits; ++i)
>>+   CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+   build_real (scalar_type,
>>dconst0));
>>+   else
>>+ for (i = 1; i < nunits; ++i)
>>+   CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+   build_int_cst (scalar_type, 0));
>>+   init_def = build_constructor (vectype, v);
>>
>>you can unify the float/int case by using build_zero_cst (scalar_type).
>>Note that you should build a vector constant instead of a constructor
>>if init_val is a constant.  The convenient way is to build the vector
>>elements into a tree[] array and use build_vector_stat in that case.
>
> Ok, will switch to build_zero_cst.
> Also, I will switch my vector to  {init_value, init_value, init_value…}.
> I had {init_value, 0, 0, 0…} because I was going to have the option of
> ADD_REDUC_EXPR,
> but that got removed along the way.

You can then simply use build_vector_from_val ().

>>
>>+  /* Find maximum value from the vector of found indexes.  */
>>+  tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
>>
>>just use make_ssa_name (index_scalar_type);
>
> Ok
>
>>
>>+  /* Convert the vector of data to the same type as the EQ.  */
>>+  tree vec_data_cast;
>>+  if ( TYPE_UNSIGNED (index_vec_type))
>>+   {
>>
>>How come it never happens the element
>>sizes do not match?  (int index type and double data type?)
>
> This was a little unclear.
> The induction index is originally created as an unsigned version of the
> same type as the data vector.
> (see the definition of cr_index_vector_type in vectorizable_reduction(),

Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Richard Biener
On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
> 2015-09-03 15:11 GMT+03:00 Richard Biener :
>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  wrote:
>>> Adding CCs.
>>>
>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
 2015-09-01 17:25 GMT+03:00 Richard Biener :

 Totally disabling old style vector comparison and bool pattern is a
 goal but doing hat would mean a lot of regressions for many targets.
 Do you want to it to be tried to estimate amount of changes required
 and reveal possible issues? What would be integration plan for these
 changes? Do you want to just introduce new vector in GIMPLE
 disabling bool patterns and then resolving vectorization regression on
 all targets or allow them live together with following target switch
 one by one from bool patterns with finally removing them? Not all
 targets are likely to be adopted fast I suppose.
>>
>> Well, the frontends already create vec_cond exprs I believe.  So for
>> bool patterns the vectorizer would have to do the same, but the
>> comparison result in there would still use vec.  Thus the scalar
>>
>>  _Bool a = b < c;
>>  _Bool c = a || d;
>>  if (c)
>>
>> would become
>>
>>  vec a = VEC_COND ;
>>  vec c = a | d;
>
> This should be identical to
>
> vec<_Bool> a = a < b;
> vec<_Bool> c = a | d;
>
> where vec<_Bool> has VxSI mode. And we should prefer it in case target
> supports vector comparison into vec, right?
>
>>
>> when the target does not have vecs directly and otherwise
>> vec directly (dropping the VEC_COND).
>>
>> Just the vector comparison inside the VEC_COND would always
>> have vec type.
>
> I don't really understand what you mean by 'doesn't have vecs
> directly' here. Currently I have a hook to ask for a vec mode
> and assume target doesn't support it in case it returns VOIDmode. But
> in such case I have no mode to use for vec inside VEC_COND
> either.

I was thinking about targets not supporting generating vec
(of whatever mode) from a comparison directly but only via
a COND_EXPR.

> In default implementation of the new target hook I always return
> integer vector mode (to have default behavior similar to the current
> one). It should allow me to use vec for conditions in all
> vec_cond. But we'd need some other trigger for bool patterns to apply.
> Probably check vec_cmp optab in check_bool_pattern and don't convert
> in case comparison is supported by target? Or control it via
> additional hook.

Not sure if we are always talking about the same thing for
"bool patterns".  I'd remove bool patterns completely, IMHO
they are not necessary at all.

>>
>> And the "bool patterns" I am talking about are those in
>> tree-vect-patterns.c, not any targets instruction patterns.
>
> I refer to them also. BTW bool patterns also pull comparison into
> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
> think with vector comparisons in place we should allow SSA_NAME as
> conditions in VEC_COND for better CSE. That should require new vcond
> optabs though.

I think we do allow this, just the vectorizer doesn't expect it.  In the long
run I want to get rid of the GENERIC exprs in both COND_EXPR and
VEC_COND_EXPR.  Just didn't have the time to do this...

Richard.

> Ilya
>
>>
>> Richard.
>>

 Ilya


Re: [RFC] Try vector<bool> as a new representation for vector masks

2015-09-18 Thread Richard Biener
On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich  wrote:
> On 08 Sep 15:37, Ilya Enkovich wrote:
>> 2015-09-04 23:42 GMT+03:00 Jeff Law :
>> >
>> > So do we have enough confidence in this representation that we want to go
>> > ahead and commit to it?
>>
>> I think new representation fits nice mostly. There are some places
>> where I have to make some exceptions for vector of bools to make it
>> work. This is mostly to avoid target modifications. I'd like to avoid
>> necessity to change all targets currently supporting vec_cond. It
>> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
>> a special code in vect_init_vector to build vec<bool> invariants with
>> proper casting to int. Otherwise I'd need to do it on a target side.
>>
>> I made several fixes and current patch (still allowing integer vector
>> result for vector comparison and applying bool patterns) passes
>> bootstrap and regression testing on x86_64. Now I'll try to fully
>> switch to vec<bool> and see how it goes.
>>
>> Thanks,
>> Ilya
>>
>
> Hi,
>
> I made a step forward forcing vector comparisons to have a mask (vec<bool>) 
> result and disabling bool patterns in case vector comparison is supported by 
> target.  Several issues were met.
>
>  - c/c++ front-ends generate vector comparison with integer vector result.  I 
> had to make some modifications to use vec_cond instead.  Don't know if there 
> are other front-ends producing vector comparisons.
>  - vector lowering fails to expand vector masks due to mismatch of type and 
> mode sizes.  I fixed vector type size computation to match mode size and 
> added a special handling of mask expand.
>  - I disabled canonical type creation for vector mask because we can't layout 
> it with VOID mode. I don't know why we may need a canonical type here.  But 
> get_mask_mode call may be moved into type layout to get it.
>  - Expand of vec<bool> constants/constructors requires special handling.  
> Common case should require target hooks/optabs to expand vector into required 
> mode.  But I suppose we want to have a generic code to handle vector of int 
> mode case to avoid modification of multiple targets which use default 
> vec<bool> modes.

One complication you might run into currently is that at the moment we
require the comparison result to be
of the same size as the comparison operands.  This means that
vector<bool> with, say, 4 elements has
to support different modes for v4si < v4si vs. v4df < v4df (if you
think of x86 with its multiple vector sizes).
That's for the "fallback" non-mask vector<bool> only of course.  Does
that mean we have to use different
vector<bool> types with different modes here?

So the other possibility is to never expose the fallback vector<bool>
anywhere but make sure to lower to
vector<int> via VEC_COND_EXPRs.  After all it's only the vectorizer
that should create stmts with
vector<bool> LHS and the vectorizer is already careful to only
generate code supported by the target.

> Currently 'make check' shows two types of regression.
>   - missed vector expression pattern recognition (MIN, MAX, ABS, VEC_COND).  
> This must be due to my front-end changes.  Hope it will be easy to fix.
>   - missed vectorization. All of them appear due to bool patterns disabling.  
> I didn't look into all of them but it seems the main problem is in mixed type 
> sizes.  With bool patterns and integer vector masks we just put int->(other 
> sized int) conversion for masks and it gives us required mask transformation. 
>  With boolean mask we don't have a proper scalar statements to do that.  I 
> think mask widening/narrowing may be directly supported in masked statements 
> vectorization.  Going to look into it.
>
> I attach what I currently have for a prototype.  It grows bigger so I split 
> into several parts.
>
> Thanks,
> Ilya
> --
> * avx512-vec-bool-01-add-truth-vector.ChangeLog
>
> 2015-09-15  Ilya Enkovich  
>
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
> * stor-layout.c (layout_type): Use mode to get vector mask size.
> (vector_type_mode): Likewise.
> * target.def (get_mask_mode): New.
> * targhooks.c (default_vector_alignment): Use mode alignment
> for vector masks.
> (default_get_mask_mode): New.
> * targhooks.h (default_get_mask_mode): New.
> * tree.c (make_vector_type): Vector mask has no canonical type.
> (build_truth_vector_type): New.
> (build_same_sized_truth_vector_type): New.
> (truth_type_for): Support vector masks.
> * tree.h (VECTOR_MASK_TYPE_P): New.
> (build_truth_vector_type): New.
> (build_same_sized_truth_vector_type): New.
>
> * avx512-vec-bool-02-no-int-vec-cmp.ChangeLog
>
> gcc/
>
> 2015-09-15  Ilya Enkovich  
>
> * tree-cfg.c (verify_gimple_comparison) Require vector mask
> type for vector comparison.
> (verify_gimple_assign_ternary): Likewise.
>
> gcc/c
>
> 2015-09-15  Ilya Enkovich  
>
> * c-typeck.c (build_con

Re: [gomp] more ptx builtins

2015-09-18 Thread Thomas Schwinge
Hi!

On Thu, 30 Jul 2015 15:44:41 -0400, Nathan Sidwell  wrote:
> I've committed this to  gomp4.  It adds spinlock builtins [...]

These two test cases actually got committed to gomp-4_0-branch in the
later/unrelated r226508.  In gomp-4_0-branch r227904, I now fixed these
as follows:

commit 5ab915af300d470df125fcf2445f56b601fbd80b
Author: tschwinge 
Date:   Fri Sep 18 12:43:37 2015 +

Fix DejaGnu directives in nvptx spinlock tests

gcc/testsuite/
* gcc.target/nvptx/spinlock-1.c: Fix DejaGnu directives.
* gcc.target/nvptx/spinlock-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@227904 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp|5 +
 gcc/testsuite/gcc.target/nvptx/spinlock-1.c |4 ++--
 gcc/testsuite/gcc.target/nvptx/spinlock-2.c |4 ++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 1135e84..b14167e 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-09-18  Thomas Schwinge  
+
+   * gcc.target/nvptx/spinlock-1.c: Fix DejaGnu directives.
+   * gcc.target/nvptx/spinlock-2.c: Likewise.
+
 2015-09-11  Cesar Philippidis  
 
* c-c++-common/goacc/parallel-reduction.c: Enclose the parallel
diff --git gcc/testsuite/gcc.target/nvptx/spinlock-1.c 
gcc/testsuite/gcc.target/nvptx/spinlock-1.c
index 0b458c6..b464ad9 100644
--- gcc/testsuite/gcc.target/nvptx/spinlock-1.c
+++ gcc/testsuite/gcc.target/nvptx/spinlock-1.c
@@ -7,5 +7,5 @@ void Foo ()
 
 
 /* { dg-final { scan-assembler-times ".atom.global.cas.b32" 2 } } */
-/* { dg-final { scan-assember ".global .u32 __global_lock;" } } */
-/* { dg-final { scan-assember-not ".shared .u32 __shared_lock;" } } */
+/* { dg-final { scan-assembler ".global .u32 __global_lock;" } } */
+/* { dg-final { scan-assembler-not ".shared .u32 __shared_lock;" } } */
diff --git gcc/testsuite/gcc.target/nvptx/spinlock-2.c 
gcc/testsuite/gcc.target/nvptx/spinlock-2.c
index a327179..9a51d3f 100644
--- gcc/testsuite/gcc.target/nvptx/spinlock-2.c
+++ gcc/testsuite/gcc.target/nvptx/spinlock-2.c
@@ -6,5 +6,5 @@ void Foo ()
 }
 
 /* { dg-final { scan-assembler-times ".atom.shared.cas.b32" 2 } } */
-/* { dg-final { scan-assember ".shared .u32 __shared_lock;" } } */
-/* { dg-final { scan-assember-not ".shared .u32 __shared_lock;" } } */
+/* { dg-final { scan-assembler ".shared .u32 __shared_lock;" } } */
+/* { dg-final { scan-assembler-not ".global .u32 __global_lock;" } } */


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-09-18 Thread David Edelsohn
On Thu, Sep 17, 2015 at 3:13 PM, Lynn A. Boger
 wrote:
> Here is my updated patch, with the changes suggested by
> Ian for gcc/gospec.c and David for gcc/configure.ac.
>
> Bootstrap built and tested on ppc64le, ppc64 multilib.
>
> 2015-09-17  Lynn Boger 
> gcc/
> PR target/66870
> config/rs6000/sysv4.h:  Define TARGET_CAN_SPLIT_STACK_64BIT
> config.in:  Set up HAVE_GOLD_ALTERNATE_SPLIT_STACK
> configure.ac:  Define HAVE_GOLD_ALTERNATE_SPLIT_STACK
> on Power based on gold linker version
> configure:  Regenerate
> gcc.c:  Add -fuse-ld=gold to STACK_SPLIT_SPEC if
> HAVE_GOLD_ALTERNATE_SPLIT_STACK defined
> go/gospec.c:  (lang_specific_driver):  Set appropriate split
> stack
> options for 64 bit compiles based on
> TARGET_CAN_SPLIT_STACK_64BIT

The rs6000 bits are okay with me.

Ian needs to approve the go bits.  And Ian or a configure maintainer
needs to approve the other bits.

Thanks, David


[PATCH] Use stdint-wrap.h on *-*-netbsd[56]*

2015-09-18 Thread Jonathan Wakely

This patch adjusts config.gcc so that it installs <stdint.h> for NetBSD
5.x and 6.x, which is necessary for the C++ library because the host
<stdint.h> has:

#if !defined(__cplusplus) || defined(__STDC_LIMIT_MACROS)
#include 
#endif

#if !defined(__cplusplus) || defined(__STDC_CONSTANT_MACROS)
#include 
#endif

This means that contrary to the C++11 standard the stdint macros are
only defined when __STDC_CONSTANT_MACROS / __STDC_LIMIT_MACROS are
defined.

I first noted the problem earlier this year and opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65806 


I rediscovered the problem when I broke netbsd bootstrap by including
 during bootstrap with https://gcc.gnu.org/r227684

That header uses UINT32_C, which is not defined without this patch.

NetBSD 7.x should be OK, because it knows about C++11 (see the link in
the PR for details).

Tested x86_64-unknown-netbsd5.1, OK for trunk?


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index affc5ba..9450dcb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-09-16  Jonathan Wakely  
+
+   * config.gcc (*-*-netbsd[5-6]*): Set use_gcc_stdint=wrap.
+
 2015-09-15  Alan Lawrence  
 
* config/aarch64/aarch64-simd.md
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 75807f5..394ded3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -788,6 +788,14 @@ case ${target} in
   default_use_cxa_atexit=yes
   ;;
   esac
+
+  # NetBSD 5.x and 6.x provide  but require
+  # __STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS for C++.
+  case ${target} in
+*-*-netbsd[5-6]* | *-*-netbsdelf[5-6]*)
+  use_gcc_stdint=wrap
+  ;;
+  esac
   ;;
 *-*-openbsd*)
   tmake_file="t-openbsd"


Re: [PATCH, rs6000] Add expansions for min/max vector reductions

2015-09-18 Thread Bill Schmidt
On Fri, 2015-09-18 at 10:38 +0200, Richard Biener wrote:
> On Thu, 17 Sep 2015, Segher Boessenkool wrote:
> 
> > On Thu, Sep 17, 2015 at 09:18:42AM -0500, Bill Schmidt wrote:
> > > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote:
> > > > So just to clarify - you need to reduce the vector with max to a scalar
> > > > but want the (same) result in all vector elements?
> > > 
> > > Yes.  Alan Hayward's cond-reduction patch is set up to perform a
> > > reduction to scalar, followed by a scalar broadcast to get the value
> > > into all positions.  It happens that our most efficient expansion to
> > > reduce to scalar will naturally produce the value in all positions.
> > 
> > It also is many insns after expand, so relying on combine to combine
> > all that plus the following splat (as Richard suggests below) is not
> > really going to work.
> > 
> > If there also are targets where the _scal version is cheaper, maybe
> > we should keep both, and have expand expand to whatever the target
> > supports?
> 
> Wait .. so you don't actually have an instruction to do, say,
> REDUC_MAX_EXPR (neither to scalar nor to vector)?  Then it's better
> to _not_ define such pattern and let the vectorizer generate
> its fallback code.  If the fallback code isn't "best" then better
> think of a way to make it choose the best variant out of its
> available ones (and maybe add another).  I think it tests
> availability of the building blocks for the variants and simply
> picks the first that works without checking the cost model.

That's what we were considering per Alan Lawrence's suggestion elsewhere
in this thread, but there isn't currently a way to represent a
whole-vector rotate in gimple.  So we'd either have to add that or fall
back to an inferior code sequence, I believe.

Bill

> 
> Richard.
> 
> > 
> > Segher
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
> 




Re: [PATCH, rs6000] Add expansions for min/max vector reductions

2015-09-18 Thread Richard Biener
On Fri, 18 Sep 2015, Bill Schmidt wrote:

> On Fri, 2015-09-18 at 10:38 +0200, Richard Biener wrote:
> > On Thu, 17 Sep 2015, Segher Boessenkool wrote:
> > 
> > > On Thu, Sep 17, 2015 at 09:18:42AM -0500, Bill Schmidt wrote:
> > > > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote:
> > > > > So just to clarify - you need to reduce the vector with max to a 
> > > > > scalar
> > > > > but want the (same) result in all vector elements?
> > > > 
> > > > Yes.  Alan Hayward's cond-reduction patch is set up to perform a
> > > > reduction to scalar, followed by a scalar broadcast to get the value
> > > > into all positions.  It happens that our most efficient expansion to
> > > > reduce to scalar will naturally produce the value in all positions.
> > > 
> > > It also is many insns after expand, so relying on combine to combine
> > > all that plus the following splat (as Richard suggests below) is not
> > > really going to work.
> > > 
> > > If there also are targets where the _scal version is cheaper, maybe
> > > we should keep both, and have expand expand to whatever the target
> > > supports?
> > 
> > Wait .. so you don't actually have an instruction to do, say,
> > REDUC_MAX_EXPR (neither to scalar nor to vector)?  Then it's better
> > to _not_ define such pattern and let the vectorizer generate
> > its fallback code.  If the fallback code isn't "best" then better
> > think of a way to make it choose the best variant out of its
> > available ones (and maybe add another).  I think it tests
> > availability of the building blocks for the variants and simply
> > picks the first that works without checking the cost model.
> 
> That's what we were considering per Alan Lawrence's suggestion elsewhere
> in this thread, but there isn't currently a way to represent a
> whole-vector rotate in gimple.  So we'd either have to add that or fall
> back to an inferior code sequence, I believe.

A whole-vector rotate is just a VEC_PERM with a proper constant mask.
Of course the target would have to detect these cases and use
vector rotate instructions (x86 does that for example).

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC] Try vector<bool> as a new representation for vector masks

2015-09-18 Thread Ilya Enkovich
2015-09-17 20:35 GMT+03:00 Richard Henderson :
> On 09/15/2015 06:52 AM, Ilya Enkovich wrote:
>> I made a step forward forcing vector comparisons to have a mask (vec<bool>) 
>> result and disabling bool patterns in case vector comparison is supported by 
>> target.  Several issues were met.
>>
>>  - c/c++ front-ends generate vector comparison with integer vector result.  
>> I had to make some modifications to use vec_cond instead.  Don't know if 
>> there are other front-ends producing vector comparisons.
>>  - vector lowering fails to expand vector masks due to mismatch of type and 
>> mode sizes.  I fixed vector type size computation to match mode size and 
>> added a special handling of mask expand.
>>  - I disabled canonical type creation for vector mask because we can't 
>> layout it with VOID mode. I don't know why we may need a canonical type 
>> here.  But get_mask_mode call may be moved into type layout to get it.
>>  - Expand of vec<bool> constants/constructors requires special handling.  
>> Common case should require target hooks/optabs to expand vector into 
>> required mode.  But I suppose we want to have a generic code to handle 
>> vector of int mode case to avoid modification of multiple targets which use 
>> default vec<bool> modes.
>>
>> Currently 'make check' shows two types of regression.
>>   - missed vector expression pattern recognition (MIN, MAX, ABS, VEC_COND). 
>>  This must be due to my front-end changes.  Hope it will be easy to fix.
>>   - missed vectorization. All of them appear due to bool patterns disabling. 
>>  I didn't look into all of them but it seems the main problem is in mixed 
>> type sizes.  With bool patterns and integer vector masks we just put 
>> int->(other sized int) conversion for masks and it gives us required mask 
>> transformation.  With boolean mask we don't have a proper scalar statements 
>> to do that.  I think mask widening/narrowing may be directly supported in 
>> masked statements vectorization.  Going to look into it.
>>
>> I attach what I currently have for a prototype.  It grows bigger so I split 
>> into several parts.
>
> The general approach looks good.
>

Great!

>
>> +/* By default a vector of integers is used as a mask.  */
>> +
>> +machine_mode
>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>> +{
>> +  unsigned elem_size = vector_size / nunits;
>> +  machine_mode elem_mode
>> += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>
> Why these arguments as opposed to passing elem_size?  It seems that every hook
> is going to have to do this division...

Every target would have nunits = vector_size / elem_size because
nunits is used to create a vector mode. Thus no difference.

>
>> +#define VECTOR_MASK_TYPE_P(TYPE) \
>> +  (TREE_CODE (TYPE) == VECTOR_TYPE   \
>> +   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)
>
> Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being 
> tested?

OK

>
>> @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree 
>> op1)
>>return true;
>>  }
>>  }
>> -  /* Or an integer vector type with the same size and element count
>> +  /* Or a boolean vector type with the same element count
>>   as the comparison operand types.  */
>>else if (TREE_CODE (type) == VECTOR_TYPE
>> -&& TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
>> +&& TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>
> VECTOR_BOOLEAN_TYPE_P.
>
>> @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
>> tree t, tree bitsize, tree bitpos)
>>  {
>>if (bitpos)
>> -return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
>> +{
>> +  if (TREE_CODE (type) == BOOLEAN_TYPE)
>> + {
>> +   tree itype
>> + = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
>> +   tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
>> + bitsize, bitpos);
>> +   return gimplify_build2 (gsi, NE_EXPR, type, field,
>> +   build_zero_cst (itype));
>> + }
>> +  else
>> + return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
>> +}
>>else
>>  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
>>  }
>
> So... this is us lowering vector operations on a target that doesn't support
> them.  Which means that we get to decide what value is produced for a
> comparison?  Which means that we can settle on the "normal" -1, correct?
>
> Which means that we ought not need to extract the entire element and then
> compare for non-zero, but rather simply extract a single bit from the element,
> and directly call that a boolean result, correct?

Didn't think about that. I'll give it a try.

>
> I assume you tested all this code with -mno-sse or equivalent arch default?

I didn't make any special runs for that. Just used regression testing.

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-18 Thread Alan Lawrence

On 18/09/15 13:17, Richard Biener wrote:


Ok, I see.

That this case is already vectorized is because it implements MAX_EXPR,
modifying it slightly to

int foo (int *a)
{
   int val = 0;
   for (int i = 0; i < 1024; ++i)
 if (a[i] > val)
   val = a[i] + 1;
   return val;
}

makes it no longer handled by current code.



Yes. I believe the idea for the patch is to handle arbitrary expressions like

int foo (int *a)
{
   int val = 0;
   for (int i = 0; i < 1024; ++i)
 if (some_expression (i))
   val = another_expression (i);
   return val;
}

Cheers,
Alan



Re: (patch,rfc) s/gimple/gimple */

2015-09-18 Thread Trevor Saunders
On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:
> On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:
> > Hi,
> > 
> > I gave changing from gimple to gimple * a shot last week.  It turned out
> > to be not too hard.  As you might expect the patch is huge so its
> > attached compressed.
> > 
> > patch was bootstrapped + regtested on x86_64-linux-gnu, and run through
> > config-list.mk.  However I needed to update it some for changes made
> > while testing.  Do people want to make this change now?  If so I'll try
> > and commit the patch over the weekend when less is changing.
> 
> 
> FWIW there are some big changes in gcc/tree-vect-slp.c:vectorizable_load
> that looks like unrelated whitespace changes, e.g. the following (and
> there are some followup hunks).  Did something change underneath, or was
> there a stray whitespace cleanup here?  (I skimmed through the patch,
> and this was the only file I spotted where something looked wrong)

yeah, it was a stray whitespace cleanup, but I reverted it.

Given the few but only positive comments I've seen I'm planning to
commit this over the weekend.

Trev



Re: [gomp4] OpenACC reduction tests

2015-09-18 Thread Thomas Schwinge
Hi Cesar!

On Fri, 17 Jul 2015 11:13:59 -0700, Cesar Philippidis  
wrote:
> This patch updates the libgomp OpenACC reduction test cases to check
> worker, vector and combined gang worker vector reductions. I tried to
> use some macros to simplify the c test cases a bit. I probably could
> have made them more generic with an additional header file/macro, but
> then that makes it too confusing too debug. The fortran tests are a bit
of a lost cause, unless someone knows how to use the preprocessor with
> !$acc loops.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-2.c

> +static void
> +test_reductions (void)
>  {

> -  [...]
> +  const int n = 100;
>int i;
> -  [...]
> +  float array[n];
>  
>for (i = 0; i < n; i++)
> -[...]
> +array[i] = i+1;
>  
> -  [...]
> +  /* Gang reductions.  */
> +  check_reduction_op (float, +, 0, array[i], num_gangs (ng), gang);
> +  check_reduction_op (float, *, 1, array[i], num_gangs (ng), gang);

I see this one reproducibly FAIL in the x86_64 -m32 multilib's
host-fallback testing (there is no nvptx offloading for 32-bit
configurations).  (The -m32 multilib is configured/enabled by default, so
fixing this is a prerequisite for trunk integration.)  From a very quick
glance, might it be that we're overflowing the float data type with the
"1 * 2 * 3 * [...] * 1000" computation?  The OpenACC reduction computes
"inf" which is then compared against a very high finite reference value
-- or the other way round (I lost my debugging session).  Instead of
multiplying these "big" numbers, I guess we should just do a more
idiomatic floating point computation?

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-4.c

>  /* complex reductions.  */

> +static void
> +test_reductions (void)
>  {

> +  double _Complex array[n];
> +
> +  for (i = 0; i < n; i++)
> +array[i] = i+1;
> +
> +  /* Gang reductions.  */
> +  check_reduction_op (double, +, 0, creal (array[i]), num_gangs (ng), gang);

Given that in the check_reduction_op instantiations you're specifying a
"double" data type (instead of "double _Complex", for example), and
"creal (array[i])" reduction operands (instead of "array[i]", for
example), we're not actually testing reductions with complex data types,
so I guess that should be changed.  :-)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction.h
> @@ -0,0 +1,43 @@
> +#ifndef REDUCTION_H
> +#define REDUCTION_H
> +
> +#define DO_PRAGMA(x) _Pragma (#x)
> +
> +#define check_reduction_op(type, op, init, b, gwv_par, gwv_loop) \
> +  {  \
> +type res, vres;  \
> +res = (init);\
> +DO_PRAGMA (acc parallel gwv_par copy (res))  \
> +DO_PRAGMA (acc loop gwv_loop reduction (op:res)) \
> +for (i = 0; i < n; i++)  \
> +  res = res op (b); \
> + \
> +vres = (init);   \
> +for (i = 0; i < n; i++)  \
> +  vres = vres op (b);\
> + \
> +if (res != vres) \
> +  abort (); \
> +  }

It's the right thing for integer data types, but for anything floating
point, we should be allowing for some small difference (epsilon) between
res and vres, due to rounding differences in the OpenACC reduction
(possibly offloaded) and reference value computation, and similar.

> +#define check_reduction_macro(type, op, init, b, gwv_par, gwv_loop)  \
> +  {  \
> +type res, vres;  \
> +res = (init);\
> +DO_PRAGMA (acc parallel gwv_par copy(res)) \
> +DO_PRAGMA (acc loop gwv_loop reduction (op:res)) \
> +for (i = 0; i < n; i++)  \
> +  res = op (res, (b));   \
> + \
> +vres = (init);   \
> +for (i = 0; i < n; i++)  \
> +  vres = op (vres, (b)); \
> +  

Re: [RFC] Try vector<bool> as a new representation for vector masks

2015-09-18 Thread Ilya Enkovich
2015-09-18 15:22 GMT+03:00 Richard Biener :
> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
>> 2015-09-03 15:11 GMT+03:00 Richard Biener :
>>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  
>>> wrote:
 Adding CCs.

 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>
> Totally disabling old style vector comparison and bool pattern is a
> goal but doing hat would mean a lot of regressions for many targets.
> Do you want to it to be tried to estimate amount of changes required
> and reveal possible issues? What would be integration plan for these
> changes? Do you want to just introduce new vector<bool> in GIMPLE
> disabling bool patterns and then resolving vectorization regression on
> all targets or allow them live together with following target switch
> one by one from bool patterns with finally removing them? Not all
> targets are likely to be adopted fast I suppose.
>>>
>>> Well, the frontends already create vec_cond exprs I believe.  So for
>>> bool patterns the vectorizer would have to do the same, but the
>>> comparison result in there would still use vec<int>.  Thus the scalar
>>>
>>>  _Bool a = b < c;
>>>  _Bool c = a || d;
>>>  if (c)
>>>
>>> would become
>>>
>>>  vec<int> a = VEC_COND <...>;
>>>  vec<int> c = a | d;
>>
>> This should be identical to
>>
>> vec<_Bool> a = a < b;
>> vec<_Bool> c = a | d;
>>
>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>> supports vector comparison into vec<_Bool>, right?
>>
>>>
>>> when the target does not have vec<bool>s directly and otherwise
>>> vec<bool> directly (dropping the VEC_COND).
>>>
>>> Just the vector comparison inside the VEC_COND would always
>>> have vec<bool> type.
>>
>> I don't really understand what you mean by 'doesn't have vec<bool>s
>> directly' here. Currently I have a hook to ask for a vec<bool> mode
>> and assume target doesn't support it in case it returns VOIDmode. But
>> in such case I have no mode to use for vec<bool> inside VEC_COND
>> either.
>
> I was thinking about targets not supporting generating vec<bool>
> (of whatever mode) from a comparison directly but only via
> a COND_EXPR.

Where may these direct comparisons come from? Vectorizer never
generates unsupported statements. It means we get them from
gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
Actually vect lowering checks if we are able to make comparison and
expand also uses vec_cond to expand vector comparison, so probably we
may live with them.

>
>> In default implementation of the new target hook I always return
>> integer vector mode (to have default behavior similar to the current
>> one). It should allow me to use vec<bool> for conditions in all
>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>> in case comparison is supported by target? Or control it via
>> additional hook.
>
> Not sure if we are always talking about the same thing for
> "bool patterns".  I'd remove bool patterns completely, IMHO
> they are not necessary at all.

I refer to transformations made by vect_recog_bool_pattern. Don't see
how to remove them completely for targets not supporting comparison
vectorization.

>
>>>
>>> And the "bool patterns" I am talking about are those in
>>> tree-vect-patterns.c, not any targets instruction patterns.
>>
>> I refer to them also. BTW bool patterns also pull comparison into
>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>> think with vector comparisons in place we should allow SSA_NAME as
>> conditions in VEC_COND for better CSE. That should require new vcond
>> optabs though.
>
> I think we do allow this, just the vectorizer doesn't expect it.  In the long
> run I want to get rid of the GENERIC exprs in both COND_EXPR and
> VEC_COND_EXPR.  Just didn't have the time to do this...

That would be nice. As a first step I'd like to support optabs for
VEC_COND_EXPR directly using vec<bool>.

Thanks,
Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
>
> Ilya


Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take 2)

2015-09-18 Thread Marek Polacek
In-Reply-To: <20150918100606.gf27...@redhat.com>

On Fri, Sep 18, 2015 at 12:06:06PM +0200, Marek Polacek wrote:
> > Since we don't know bar's side-effects we must assume they change
> > the value of a and so we must avoid diagnosing the third if.
> 
> Ok, I'm convinced now.  We have something similar in the codebase:
> libsupc++/eh_catch.cc has
> 
>   int count = header->handlerCount;
>   if (count < 0)
> {   
>   // This exception was rethrown.  Decrement the (inverted) catch
>   // count and remove it from the chain when it reaches zero.
>   if (++count == 0)
> globals->caughtExceptions = header->nextException;
> }   
>   else if (--count == 0)
> {   
>   // Handling for this exception is complete.  Destroy the object.
>   globals->caughtExceptions = header->nextException;
>   _Unwind_DeleteException (&header->unwindHeader);
>   return;
> }   
>   else if (count < 0)
> // A bug in the exception handling library or compiler.
> std::terminate ();
> 
> Here all arms are reachable.  I guess I need to kill the chain of conditions
> when we find something with side-effects, exactly as you suggested.

Done in the below.  This version actually bootstraps, because I've added
-Wno-duplicated-cond for insn-dfatab.o and insn-latencytab.o (don't know
how to fix these) + I've tweaked a condition in genemit.c.  The problem
here is that we have

  if (INTVAL (x) == 0)
printf ("const0_rtx");
  else if (INTVAL (x) == 1)
printf ("const1_rtx");
  else if (INTVAL (x) == -1) 
printf ("constm1_rtx");
  // ...
  else if (INTVAL (x) == STORE_FLAG_VALUE)
printf ("const_true_rtx");

and STORE_FLAG_VALUE happens to be 1, so we have two same conditions.
STORE_FLAG_VALUE is 1 or -1, but according to the documentation it can
also be some other number so we should keep this if statement.  I've
avoided the warning by adding STORE_FLAG_VALUE > 1 check.

How does this look now?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-09-18  Marek Polacek  

PR c/64249
* c-common.c (warn_duplicated_cond_add_or_warn): New function.
* c-common.h (warn_duplicated_cond_add_or_warn): Declare.
* c.opt (Wduplicated-cond): New option.

* c-parser.c (c_parser_statement_after_labels): Add CHAIN parameter
and pass it down to c_parser_if_statement.
(c_parser_else_body): Add CHAIN parameter and pass it down to
c_parser_statement_after_labels.
(c_parser_if_statement): Add CHAIN parameter.  Add code to warn about
duplicated if-else-if conditions.

* parser.c (cp_parser_statement): Add CHAIN parameter and pass it
down to cp_parser_selection_statement.
(cp_parser_selection_statement): Add CHAIN parameter.  Add code to
warn about duplicated if-else-if conditions.
(cp_parser_implicitly_scoped_statement): Add CHAIN parameter and pass
it down to cp_parser_statement.

* doc/invoke.texi: Document -Wduplicated-cond.
* Makefile.in (insn-latencytab.o): Use -Wno-duplicated-cond.
(insn-dfatab.o): Likewise.
* genemit.c (gen_exp): Rewrite condition to avoid -Wduplicated-cond
warning.

* c-c++-common/Wduplicated-cond-1.c: New test.
* c-c++-common/Wduplicated-cond-2.c: New test.
* c-c++-common/Wduplicated-cond-3.c: New test.
* c-c++-common/Wduplicated-cond-4.c: New test.
* c-c++-common/Wmisleading-indentation.c (fn_37): Avoid
-Wduplicated-cond warning.

diff --git gcc/Makefile.in gcc/Makefile.in
index c2df21d..d7caa76 100644
--- gcc/Makefile.in
+++ gcc/Makefile.in
@@ -217,6 +217,8 @@ libgcov-merge-tool.o-warn = -Wno-error
 gimple-match.o-warn = -Wno-unused
 generic-match.o-warn = -Wno-unused
 dfp.o-warn = -Wno-strict-aliasing
+insn-latencytab.o-warn = -Wno-duplicated-cond
+insn-dfatab.o-warn = -Wno-duplicated-cond
 
 # All warnings have to be shut off in stage1 if the compiler used then
 # isn't gcc; configure determines that.  WARN_CFLAGS will be either
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 4b922bf..8991215 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -12919,4 +12919,45 @@ reject_gcc_builtin (const_tree expr, location_t loc /* 
= UNKNOWN_LOCATION */)
   return false;
 }
 
+/* If we're creating an if-else-if condition chain, first see if we
+   already have this COND in the CHAIN.  If so, warn and don't add COND
+   into the vector, otherwise add the COND there.  LOC is the location
+   of COND.  */
+
+void
+warn_duplicated_cond_add_or_warn (location_t loc, tree cond, vec<tree> **chain)
+{
+  /* No chain has been created yet.  Do nothing.  */
+  if (*chain == NULL)
+return;
+
+  if (TREE_SIDE_EFFECTS (cond))
+{
+  /* Uh-oh!  This condition has a side-effect, thus invalidating
+     the whole chain.  */
+  delete *chain;
+  *chain = NULL;
+  return;
+}
+
+

Re: (patch,rfc) s/gimple/gimple */

2015-09-18 Thread Richard Biener
On Fri, Sep 18, 2015 at 3:32 PM, Trevor Saunders  wrote:
> On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:
>> On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:
>> > Hi,
>> >
>> > I gave changing from gimple to gimple * a shot last week.  It turned out
>> > to be not too hard.  As you might expect the patch is huge so its
>> > attached compressed.
>> >
>> > patch was bootstrapped + regtested on x86_64-linux-gnu, and run through
>> > config-list.mk.  However I needed to update it some for changes made
>> > while testing.  Do people want to make this change now?  If so I'll try
>> > and commit the patch over the weekend when less is changing.
>>
>>
>> FWIW there are some big changes in gcc/tree-vect-slp.c:vectorizable_load
>> that look like unrelated whitespace changes, e.g. the following (and
>> there are some followup hunks).  Did something change underneath, or was
>> there a stray whitespace cleanup here?  (I skimmed through the patch,
>> and this was the only file I spotted where something looked wrong)
>
> yeah, it was a stray whitespace cleanup, but I reverted it.
>
> Given the few but only positive comments I've seen I'm planning to
> commit this over the weekend.

Thanks a lot!

If you are still in a refactoring mood then I have something else here.  When
streamlining the gimple accessors I noticed the glaring const-correctness
issue in

/* Return a pointer to the LHS of assignment statement GS.  */

static inline tree *
gimple_assign_lhs_ptr (const gassign *gs)
{
  return const_cast <tree *> (&gs->op[0]);
}

and was thinking to either "fix" it by removing the 'const' or by
merging gimple_assign_lhs and gimple_assign_lhs_ptr into

static inline const tree&
gimple_assign_lhs (const gassign *);

static inline tree&
gimple_assign_lhs (gassign *);

thus forgo X() vs. X_ptr() by using const/non-const references
as return values.  It's a little cost on the simple accessor
(extra dereference in the caller) but a cleaner API.

It would also preserve constness of users of the _ptr variant
if they don't end up modifying the thing.
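A minimal sketch of the proposed merged accessor pair, using a simplified stand-in type (the real gassign and tree types are far more involved; the names here are purely illustrative):

```cpp
// Stand-in statement type demonstrating the const/non-const
// reference-overload pattern proposed above.
struct stmt
{
  int op[3];

  // The const overload preserves the caller's constness...
  const int &lhs () const { return op[0]; }

  // ...while the non-const overload subsumes the old "_ptr" accessor:
  // callers can assign through the returned reference.
  int &lhs () { return op[0]; }
};
```

With this shape, `s.lhs () = x;` replaces `*lhs_ptr (s) = x;`, and no const_cast is needed in either overload.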

Did I mention I never liked the "_ptr" notion?

Richard.

> Trev
>


Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-09-18 Thread Cesar Philippidis
On 09/18/2015 01:39 AM, Thomas Schwinge wrote:

> On Tue, 1 Sep 2015 18:29:55 +0200, Tom de Vries  
> wrote:
>> On 27/08/15 03:37, Cesar Philippidis wrote:
>>> -  ctx->ganglocal_size_host = align_and_expand (&gl_host, host_size, align);
>>
>> I suspect this caused a bootstrap failure (align_and_expand unused). 
>> Worked-around as attached.
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -1450,7 +1450,7 @@ omp_copy_decl (tree var, copy_body_data *cb)
>>  
>>  /* Modify the old size *POLDSZ to align it up to ALIGN, and then return
>> a value with SIZE added to it.  */
>> -static tree
>> +static tree ATTRIBUTE_UNUSED
>>  align_and_expand (tree *poldsz, tree size, unsigned int align)
>>  {
>>tree oldsz = *poldsz;
> 
> If I remember correctly, this has only ever been used in the "ganglocal"
> implementation -- which is now gone.  So, should align_and_expand also be
> elided (Cesar)?

Most likely. I probably overlooked it when I was working on that
ganglocal removal patch. Can you remove it please? I'm already juggling
a couple of patches right now.

Thanks,
Cesar





Re: [PATCH, rs6000] Add expansions for min/max vector reductions

2015-09-18 Thread Bill Schmidt
On Fri, 2015-09-18 at 15:15 +0200, Richard Biener wrote:
> On Fri, 18 Sep 2015, Bill Schmidt wrote:
> 
> > On Fri, 2015-09-18 at 10:38 +0200, Richard Biener wrote:
> > > On Thu, 17 Sep 2015, Segher Boessenkool wrote:
> > > 
> > > > On Thu, Sep 17, 2015 at 09:18:42AM -0500, Bill Schmidt wrote:
> > > > > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote:
> > > > > > So just to clarify - you need to reduce the vector with max to a 
> > > > > > scalar
> > > > > > but want the (same) result in all vector elements?
> > > > > 
> > > > > Yes.  Alan Hayward's cond-reduction patch is set up to perform a
> > > > > reduction to scalar, followed by a scalar broadcast to get the value
> > > > > into all positions.  It happens that our most efficient expansion to
> > > > > reduce to scalar will naturally produce the value in all positions.
> > > > 
> > > > It also is many insns after expand, so relying on combine to combine
> > > > all that plus the following splat (as Richard suggests below) is not
> > > > really going to work.
> > > > 
> > > > If there also are targets where the _scal version is cheaper, maybe
> > > > we should keep both, and have expand expand to whatever the target
> > > > supports?
> > > 
> > > Wait .. so you don't actually have an instruction to do, say,
> > > REDUC_MAX_EXPR (neither to scalar nor to vector)?  Then it's better
> > > to _not_ define such pattern and let the vectorizer generate
> > > its fallback code.  If the fallback code isn't "best" then better
> > > think of a way to make it choose the best variant out of its
> > > available ones (and maybe add another).  I think it tests
> > > availability of the building blocks for the variants and simply
> > > picks the first that works without checking the cost model.
> > 
> > That's what we were considering per Alan Lawrence's suggestion elsewhere
> > in this thread, but there isn't currently a way to represent a
> > whole-vector rotate in gimple.  So we'd either have to add that or fall
> > back to an inferior code sequence, I believe.
> 
> A whole-vector rotate is just a VEC_PERM with a proper constant mask.
> Of course the target would have to detect these cases and use
> vector rotate instructions (x86 does that for example).

Hm, yes, that's right.  And we should already have those special-permute
recognitions in place; I had just forgotten about them.  Ok, I agree
this is probably the best approach, then.
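As a concrete illustration (assuming GCC's generic vector extensions; this is not code from any patch in the thread), a whole-vector rotate is just a permute with a constant mask, which is what VEC_PERM_EXPR sees at the gimple level:

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Rotate the vector left by one element via a constant permute mask;
   __builtin_shuffle maps onto VEC_PERM_EXPR, and a target can
   recognise this particular mask as a vector rotate instruction.  */
static v4si
rotate_left_one (v4si v)
{
  const v4si mask = { 1, 2, 3, 0 };
  return __builtin_shuffle (v, mask);
}
```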

I'll have to refresh my memory on Alan H's patch, but we may need to add
logic to do this sort of epilogue expansion for his new reduction.  I
think right now it is just giving up if the REDUC_MAX_EXPR isn't
supported.

Thanks,
Bill


> 
> Richard.
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
> 




Re: [RFC] Try vector<bool> as a new representation for vector masks

2015-09-18 Thread Ilya Enkovich
2015-09-18 15:29 GMT+03:00 Richard Biener :
> On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich  wrote:
>> On 08 Sep 15:37, Ilya Enkovich wrote:
>>> 2015-09-04 23:42 GMT+03:00 Jeff Law :
>>> >
>>> > So do we have enough confidence in this representation that we want to go
>>> > ahead and commit to it?
>>>
>>> I think the new representation mostly fits nicely. There are some places
>>> where I have to make some exceptions for vectors of bools to make it
>>> work. This is mostly to avoid target modifications. I'd like to avoid
>>> the necessity to change all targets currently supporting vec_cond. It
>>> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
>>> special code in vect_init_vector to build vec<bool> invariants with
>>> proper casting to int. Otherwise I'd need to do it on the target side.
>>>
>>> I made several fixes and current patch (still allowing integer vector
>>> result for vector comparison and applying bool patterns) passes
>>> bootstrap and regression testing on x86_64. Now I'll try to fully
>>> switch to vec<bool> and see how it goes.
>>>
>>> Thanks,
>>> Ilya
>>>
>>
>> Hi,
>>
>> I made a step forward forcing vector comparisons to have a mask (vec<bool>) 
>> result and disabling bool patterns in case vector comparison is supported by 
>> the target.  Several issues came up.
>>
>>  - c/c++ front-ends generate vector comparisons with an integer vector result.  
>> I had to make some modifications to use vec_cond instead.  Don't know if 
>> there are other front-ends producing vector comparisons.
>>  - vector lowering fails to expand vector masks due to mismatch of type and 
>> mode sizes.  I fixed vector type size computation to match mode size and 
>> added a special handling of mask expand.
>>  - I disabled canonical type creation for vector mask because we can't 
>> layout it with VOID mode. I don't know why we may need a canonical type 
>> here.  But get_mask_mode call may be moved into type layout to get it.
>>  - Expansion of vec<bool> constants/constructors requires special handling.  
>> The common case should require target hooks/optabs to expand the vector into 
>> the required mode.  But I suppose we want to have generic code to handle the 
>> vector-of-int-mode case to avoid modification of multiple targets which use 
>> the default vec<bool> modes.
>
> One complication you might run into currently is that at the moment we
> require the comparison result to be
> of the same size as the comparison operands.  This means that
> vector<bool> with, say, 4 elements has
> to support different modes for v4si < v4si vs. v4df < v4df (if you
> think of x86 with its multiple vector sizes).
> That's for the "fallback" non-mask vector only of course.  Does
> that mean we have to use different
> bool types with different modes here?

I thought about boolean types with different sizes/modes. I still avoid
them but it causes some ugliness. E.g. sizeof(innertype)*nelems !=
sizeof(vectortype) for vec<bool>. It causes some special handling in
type layout and problems in lowering because BIT_FIELD_REF uses more
bits than the resulting type has. I use an additional comparison to handle
it. Richard also proposed to extract one bit only for bools. Don't
know if differently sized boolean types may help to resolve this issue
or create more problems.

>
> So the other possibility is to never expose the fallback vector
> anywhere but make sure to lower to
> vector via VEC_COND_EXPRs.  After all it's only the vectorizer
> that should create stmts with
> vector LHS and the vectorizer is already careful to only
> generate code supported by the target.

In case vec<bool> has an integer vector mode, comparison should be
handled similarly to VEC_COND_EXPR by vect lowering and expand, which
should be enough to have it properly handled on targets with no
vec<bool> support.

Thanks,
Ilya

>
>> Currently 'make check' shows two types of regression.
>>   - missed vector expression pattern recognition (MIN, MAX, ABS, VEC_COND). 
>>  This must be due to my front-end changes.  Hope it will be easy to fix.
>>   - missed vectorization.  All of them appear due to the disabled bool patterns. 
>>  I didn't look into all of them but it seems the main problem is in mixed 
>> type sizes.  With bool patterns and integer vector masks we just put 
>> int->(other sized int) conversion for masks and it gives us required mask 
>> transformation.  With a boolean mask we don't have proper scalar statements 
>> to do that.  I think mask widening/narrowing may be directly supported in 
>> masked statements vectorization.  Going to look into it.
>>
>> I attach what I currently have for a prototype.  It has grown bigger, so I split 
>> into several parts.
>>
>> Thanks,
>> Ilya


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-18 Thread Alan Hayward

On 18/09/2015 14:26, "Alan Lawrence"  wrote:

>On 18/09/15 13:17, Richard Biener wrote:
>>
>> Ok, I see.
>>
>> That this case is already vectorized is because it implements MAX_EXPR,
>> modifying it slightly to
>>
>> int foo (int *a)
>> {
>>int val = 0;
>>for (int i = 0; i < 1024; ++i)
>>  if (a[i] > val)
>>val = a[i] + 1;
>>return val;
>> }
>>
>> makes it no longer handled by current code.
>>
>
>Yes. I believe the idea for the patch is to handle arbitrary expressions
>like
>
>int foo (int *a)
>{
>int val = 0;
>for (int i = 0; i < 1024; ++i)
>  if (some_expression (i))
>val = another_expression (i);
>return val;
>}


Yes, that’s correct. Hopefully my new test cases cover everything.


Alan.




Re: [PATCH][AArch64][1/5] Improve immediate generation

2015-09-18 Thread James Greenhalgh
On Wed, Sep 02, 2015 at 01:34:48PM +0100, Wilco Dijkstra wrote:
> This patch reimplements aarch64_bitmask_imm using bitwise arithmetic rather
> than a slow binary search. The algorithm searches for a sequence of set bits.
> If there are no more set bits and not all bits are set, it is a valid mask.
> Otherwise it determines the distance to the next set bit and checks the mask
> is repeated across the full 64 bits. Native performance is 5-6x faster on
> typical queries.
> 
> No change in generated code, passes GCC regression/bootstrap.
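The first step of the algorithm described above can be sketched in isolation (a simplified reconstruction; only the early single-run test is shown, not the full repeated-mask check that follows it in the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Adding the lowest set bit of VAL to VAL clears one contiguous run of
   ones, so if the result is zero or a power of two, VAL was a single
   run of set bits.  The (val + 1) > 1 test rejects all-zeroes and
   all-ones, which are not valid bitmask immediates.  */
static bool
single_run_of_ones (uint64_t val)
{
  uint64_t tmp = val + (val & -val);

  if ((tmp & (tmp - 1)) == 0)
    return val + 1 > 1;

  return false;   /* Multiple runs: the real code keeps checking.  */
}
```

This is the constant-time early exit that lets the patch avoid the old binary search over the 5334-entry table for the common case.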

OK.

Thanks,
James

> ChangeLog:
> 2015-09-02  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_bitmask_imm):
>   Reimplement using faster algorithm.
> 
> ---
>  gcc/config/aarch64/aarch64.c | 62 
> +---
>  1 file changed, 53 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c666dce..ba1b77e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3301,19 +3301,63 @@ aarch64_movw_imm (HOST_WIDE_INT val, machine_mode 
> mode)
> || (val & (((HOST_WIDE_INT) 0x) << 16)) == val);
>  }
>  
> +/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
> +
> +static const unsigned HOST_WIDE_INT bitmask_imm_mul[] =
> +  {
> +0x00010001ull,
> +0x0001000100010001ull,
> +0x0101010101010101ull,
> +0xull,
> +0xull,
> +  };
> +
>  
>  /* Return true if val is a valid bitmask immediate.  */
> +
>  bool
> -aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode mode)
> +aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
>  {
> -  if (GET_MODE_SIZE (mode) < 8)
> -{
> -  /* Replicate bit pattern.  */
> -  val &= (HOST_WIDE_INT) 0x;
> -  val |= val << 32;
> -}
> -  return bsearch (&val, aarch64_bitmasks, AARCH64_NUM_BITMASKS,
> -   sizeof (aarch64_bitmasks[0]), aarch64_bitmasks_cmp) != NULL;
> +  unsigned HOST_WIDE_INT val, tmp, mask, first_one, next_one;
> +  int bits;
> +
> +  /* Check for a single sequence of one bits and return quickly if so.
> + The special cases of all ones and all zeroes returns false.  */
> +  val = (unsigned HOST_WIDE_INT) val_in;
> +  tmp = val + (val & -val);
> +
> +  if (tmp == (tmp & -tmp))
> +return (val + 1) > 1;
> +
> +  /* Replicate 32-bit immediates so we can treat them as 64-bit.  */
> +  if (mode == SImode)
> +val = (val << 32) | (val & 0x);
> +
> +  /* Invert if the immediate doesn't start with a zero bit - this means we
> + only need to search for sequences of one bits.  */
> +  if (val & 1)
> +val = ~val;
> +
> +  /* Find the first set bit and set tmp to val with the first sequence of one
> + bits removed.  Return success if there is a single sequence of ones.  */
> +  first_one = val & -val;
> +  tmp = val & (val + first_one);
> +
> +  if (tmp == 0)
> +return true;
> +
> +  /* Find the next set bit and compute the difference in bit position.  */
> +  next_one = tmp & -tmp;
> +  bits = clz_hwi (first_one) - clz_hwi (next_one);
> +  mask = val ^ tmp;
> +
> +  /* Check the bit position difference is a power of 2, and that the first
> + sequence of one bits fits within 'bits' bits.  */
> +  if ((mask >> bits) != 0 || bits != (bits & -bits))
> +return false;
> +
> +  /* Check the sequence of one bits is repeated 64/bits times.  */
> +  return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>  }
>  
>  
> -- 
> 1.8.3
> 
> 


Re: [PATCH, ARM]: Fix static interworking call

2015-09-18 Thread Richard Earnshaw
On 17/09/15 09:46, Christian Bruel wrote:
> As obvious, bad operand number.
> 
> OK for trunk ?
> 
> Christian
> 
> 
> p1.patch
> 
> 
> 2015-09-18  Christian Bruel  
> 
>   * config/arm/arm.md (*call_value_symbol): Fix operand for interworking.
> 
> 2015-09-18  Christian Bruel  
> 
>   * gcc.target/arm/attr_thumb-static2.c: New test.
> 
> --- gnu_trunk.ref/gcc/gcc/config/arm/arm.md   2015-09-14 09:52:37.697264500 
> +0200
> +++ gnu_trunk.p0/gcc/gcc/config/arm/arm.md2015-09-17 10:03:33.849451705 
> +0200
> @@ -7891,7 +7891,7 @@
> /* Switch mode now when possible.  */
> if (SYMBOL_REF_DECL (op) && !TREE_PUBLIC (SYMBOL_REF_DECL (op))
>  && arm_arch5 && arm_change_mode_p (SYMBOL_REF_DECL (op)))
> -  return NEED_PLT_RELOC ? \"blx%?\\t%a0(PLT)\" : \"blx%?\\t(%a0)\";
> +  return NEED_PLT_RELOC ? \"blx%?\\t%a1(PLT)\" : \"blx%?\\t(%a1)\";
>  
>  return NEED_PLT_RELOC ? \"bl%?\\t%a1(PLT)\" : \"bl%?\\t%a1\";
>}"
> diff -ruNp 
> gnu_trunk.ref/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c 
> gnu_trunk.p0/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
> --- gnu_trunk.ref/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c   
> 1970-01-01 01:00:00.0 +0100
> +++ gnu_trunk.p0/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
> 2015-09-17 10:08:08.350064131 +0200
> @@ -0,0 +1,40 @@
> +/* Check that interwork between static functions is correctly resolved. */
> +
> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
> +/* { dg-options "-O0 -march=armv7-a -mfloat-abi=hard" } */

You can't have thumb1 and hard float, so the skip unless thumb1 seems
like nonsense.

R.

> +/* { dg-do compile } */
> +
> +struct _NSPoint
> +{
> +  float x;
> +  float y;
> +};
> +
> +typedef struct _NSPoint NSPoint;
> +
> +static NSPoint
> +__attribute__ ((target("arm")))
> +NSMakePoint (float x, float y)
> +{
> +  NSPoint point;
> +  point.x = x;
> +  point.y = y;
> +  return point;
> +}
> +
> +static NSPoint
> +__attribute__ ((target("thumb")))
> +RelativePoint (NSPoint point, NSPoint refPoint)
> +{
> +  return NSMakePoint (refPoint.x + point.x, refPoint.y + point.y);
> +}
> +
> +NSPoint
> +__attribute__ ((target("arm")))
> +g(NSPoint refPoint)
> +{
> +  float pointA, pointB;
> +  return RelativePoint (NSMakePoint (0, pointA), refPoint);
> +}
> +
> +/* { dg-final { scan-assembler-times "blx" 2 } } */
> 



Re: [PATCH][AArch64][2/5] Improve immediate generation

2015-09-18 Thread James Greenhalgh
On Wed, Sep 02, 2015 at 01:35:19PM +0100, Wilco Dijkstra wrote:
> aarch64_internal_mov_immediate uses loops iterating over all legal bitmask
> immediates to find 2-instruction immediate combinations. One loop is
> quadratic and despite being extremely expensive very rarely finds a matching
> immediate (43 matches in all of SPEC2006 but none are emitted in final code),
> so it can be removed without any effect on code quality. The other loop can
> be replaced by a constant-time search: rather than iterating over all legal
> bitmask values, reconstruct a potential bitmask and query the fast
> aarch64_bitmask_imm.
> 
> No change in generated code, passes GCC regression tests/bootstrap.
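The constant-time search can be sketched as follows (a simplified reconstruction: `is_bitmask_imm` stands in for the real aarch64_bitmask_imm, and the toy predicate used in the test only accepts single runs of ones, a strict subset of real bitmask immediates):

```c
#include <stdbool.h>
#include <stdint.h>

/* For each aligned 16-bit chunk of VAL, construct the candidates
   reachable by one movk: the chunk forced to all-zeroes, to all-ones,
   or copied from the other 32-bit half (to catch repeating patterns),
   and ask the validity predicate about each.  */
static bool
find_bitmask_candidate (uint64_t val, bool (*is_bitmask_imm) (uint64_t),
                        uint64_t *out)
{
  uint64_t mask = 0xffff;

  for (int i = 0; i < 64; i += 16, mask <<= 16)
    {
      uint64_t zeroed = val & ~mask;
      uint64_t candidates[3];
      candidates[0] = zeroed;
      candidates[1] = val | mask;
      candidates[2] = zeroed | (((zeroed >> 32) | (zeroed << 32)) & mask);

      for (int j = 0; j < 3; j++)
        if (candidates[j] != val && is_bitmask_imm (candidates[j]))
          {
            *out = candidates[j];
            return true;
          }
    }
  return false;
}

/* Toy predicate for demonstration only: accepts a single run of ones.  */
static bool
toy_is_bitmask (uint64_t v)
{
  uint64_t t = v + (v & -v);
  return (t & (t - 1)) == 0 && v + 1 > 1;
}
```

The point of the rewrite is that only a handful of candidates are ever tested, instead of iterating over every legal bitmask immediate.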

Well, presumably those 43 cases in SPEC2006 changed...

The code looks OK, and I haven't seen any objections from Marcus or
Richard on the overall direction of the patch set.

OK for trunk.

Thanks,
James

> 
> ChangeLog:
> 2015-09-02  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
>   Replace slow immediate matching loops with a faster algorithm.
> 
> ---
>  gcc/config/aarch64/aarch64.c | 96 
> +++-
>  1 file changed, 23 insertions(+), 73 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c0280e6..d6f7cb0 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1376,7 +1376,7 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
> generate,
>unsigned HOST_WIDE_INT mask;
>int i;
>bool first;
> -  unsigned HOST_WIDE_INT val;
> +  unsigned HOST_WIDE_INT val, val2;
>bool subtargets;
>rtx subtarget;
>int one_match, zero_match, first_not__match;
> @@ -1503,85 +1503,35 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, 
> bool generate,
>   }
>  }
>  
> -  /* See if we can do it by arithmetically combining two
> - immediates.  */
> -  for (i = 0; i < AARCH64_NUM_BITMASKS; i++)
> +  if (zero_match != 2 && one_match != 2)
>  {
> -  int j;
> -  mask = 0x;
> +  /* Try emitting a bitmask immediate with a movk replacing 16 bits.
> +  For a 64-bit bitmask try whether changing 16 bits to all ones or
> +  zeroes creates a valid bitmask.  To check any repeated bitmask,
> +  try using 16 bits from the other 32-bit half of val.  */
>  
> -  if (aarch64_uimm12_shift (val - aarch64_bitmasks[i])
> -   || aarch64_uimm12_shift (-val + aarch64_bitmasks[i]))
> +  for (i = 0; i < 64; i += 16, mask <<= 16)
>   {
> -   if (generate)
> - {
> -   subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
> -   emit_insn (gen_rtx_SET (subtarget,
> -   GEN_INT (aarch64_bitmasks[i])));
> -   emit_insn (gen_adddi3 (dest, subtarget,
> -  GEN_INT (val - aarch64_bitmasks[i])));
> - }
> -   num_insns += 2;
> -   return num_insns;
> +   val2 = val & ~mask;
> +   if (val2 != val && aarch64_bitmask_imm (val2, mode))
> + break;
> +   val2 = val | mask;
> +   if (val2 != val && aarch64_bitmask_imm (val2, mode))
> + break;
> +   val2 = val2 & ~mask;
> +   val2 = val2 | (((val2 >> 32) | (val2 << 32)) & mask);
> +   if (val2 != val && aarch64_bitmask_imm (val2, mode))
> + break;
>   }
> -
> -  for (j = 0; j < 64; j += 16, mask <<= 16)
> +  if (i != 64)
>   {
> -   if ((aarch64_bitmasks[i] & ~mask) == (val & ~mask))
> +   if (generate)
>   {
> -   if (generate)
> - {
> -   emit_insn (gen_rtx_SET (dest,
> -   GEN_INT (aarch64_bitmasks[i])));
> -   emit_insn (gen_insv_immdi (dest, GEN_INT (j),
> -  GEN_INT ((val >> j) & 0x)));
> - }
> -   num_insns += 2;
> -   return num_insns;
> +   emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
> +   emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> +  GEN_INT ((val >> i) & 0x)));
>   }
> - }
> -}
> -
> -  /* See if we can do it by logically combining two immediates.  */
> -  for (i = 0; i < AARCH64_NUM_BITMASKS; i++)
> -{
> -  if ((aarch64_bitmasks[i] & val) == aarch64_bitmasks[i])
> - {
> -   int j;
> -
> -   for (j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
> - if (val == (aarch64_bitmasks[i] | aarch64_bitmasks[j]))
> -   {
> - if (generate)
> -   {
> - subtarget = subtargets ? gen_reg_rtx (mode) : dest;
> - emit_insn (gen_rtx_SET (subtarget,
> - GEN_INT (aarch64_bitmasks[i])));
> - emit_insn (gen_iordi3 (dest, subtarget,
> -GEN_INT (aarch64_bitmasks[j])));
> -   }
> - num_insn

Re: [PATCH][AArch64][3/5] Improve immediate generation

2015-09-18 Thread James Greenhalgh
On Wed, Sep 02, 2015 at 01:35:33PM +0100, Wilco Dijkstra wrote:
> Remove aarch64_bitmasks, aarch64_build_bitmask_table and aarch64_bitmasks_cmp
> as they are no longer used by the immediate generation code.
> 
> No change in generated code, passes GCC regression tests/bootstrap.

OK.

Thanks,
James

> 
> ChangeLog:
> 2015-09-02  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_bitmasks): Remove.
>   (AARCH64_NUM_BITMASKS) remove.  (aarch64_bitmasks_cmp): Remove.
>   (aarch64_build_bitmask_table): Remove.
> 
> ---
>  gcc/config/aarch64/aarch64.c | 69 
> 
>  1 file changed, 69 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 070c68b..0bc6b19 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -563,12 +563,6 @@ static const struct aarch64_option_extension 
> all_extensions[] =
> increment address.  */
>  static machine_mode aarch64_memory_reference_mode;
>  
> -/* A table of valid AArch64 "bitmask immediate" values for
> -   logical instructions.  */
> -
> -#define AARCH64_NUM_BITMASKS  5334
> -static unsigned HOST_WIDE_INT aarch64_bitmasks[AARCH64_NUM_BITMASKS];
> -
>  typedef enum aarch64_cond_code
>  {
>AARCH64_EQ = 0, AARCH64_NE, AARCH64_CS, AARCH64_CC, AARCH64_MI, AARCH64_PL,
> @@ -3172,67 +3166,6 @@ aarch64_tls_referenced_p (rtx x)
>  }
>  
>  
> -static int
> -aarch64_bitmasks_cmp (const void *i1, const void *i2)
> -{
> -  const unsigned HOST_WIDE_INT *imm1 = (const unsigned HOST_WIDE_INT *) i1;
> -  const unsigned HOST_WIDE_INT *imm2 = (const unsigned HOST_WIDE_INT *) i2;
> -
> -  if (*imm1 < *imm2)
> -return -1;
> -  if (*imm1 > *imm2)
> -return +1;
> -  return 0;
> -}
> -
> -
> -static void
> -aarch64_build_bitmask_table (void)
> -{
> -  unsigned HOST_WIDE_INT mask, imm;
> -  unsigned int log_e, e, s, r;
> -  unsigned int nimms = 0;
> -
> -  for (log_e = 1; log_e <= 6; log_e++)
> -{
> -  e = 1 << log_e;
> -  if (e == 64)
> - mask = ~(HOST_WIDE_INT) 0;
> -  else
> - mask = ((HOST_WIDE_INT) 1 << e) - 1;
> -  for (s = 1; s < e; s++)
> - {
> -   for (r = 0; r < e; r++)
> - {
> -   /* set s consecutive bits to 1 (s < 64) */
> -   imm = ((unsigned HOST_WIDE_INT)1 << s) - 1;
> -   /* rotate right by r */
> -   if (r != 0)
> - imm = ((imm >> r) | (imm << (e - r))) & mask;
> -   /* replicate the constant depending on SIMD size */
> -   switch (log_e) {
> -   case 1: imm |= (imm <<  2);
> -   case 2: imm |= (imm <<  4);
> -   case 3: imm |= (imm <<  8);
> -   case 4: imm |= (imm << 16);
> -   case 5: imm |= (imm << 32);
> -   case 6:
> - break;
> -   default:
> - gcc_unreachable ();
> -   }
> -   gcc_assert (nimms < AARCH64_NUM_BITMASKS);
> -   aarch64_bitmasks[nimms++] = imm;
> - }
> - }
> -}
> -
> -  gcc_assert (nimms == AARCH64_NUM_BITMASKS);
> -  qsort (aarch64_bitmasks, nimms, sizeof (aarch64_bitmasks[0]),
> -  aarch64_bitmasks_cmp);
> -}
> -
> -
>  /* Return true if val can be encoded as a 12-bit unsigned immediate with
> a left shift of 0 or 12 bits.  */
>  bool
> @@ -7828,8 +7761,6 @@ aarch64_override_options (void)
> || (aarch64_arch_string && valid_arch))
>  gcc_assert (explicit_arch != aarch64_no_arch);
>  
> -  aarch64_build_bitmask_table ();
> -
>aarch64_override_options_internal (&global_options);
>  
>/* Save these options as the default ones in case we push and pop them 
> later
> -- 
> 1.8.3
> 
> 
> 


Re: [PATCH][AArch64][4/5] Improve immediate generation

2015-09-18 Thread James Greenhalgh
On Wed, Sep 02, 2015 at 01:36:03PM +0100, Wilco Dijkstra wrote:
> The code that emits a movw with an add/sub is hardly ever used, and all cases
> in actual code are already covered by mov+movk, so it is redundant (an
> example of such an immediate is 0x00000abcul).
> 
> Passes GCC regression tests/bootstrap. Minor changes in generated code due to
> movk being used instead of add/sub (codesize remains the same).

OK.

Thanks,
James

> 
> ChangeLog:
> 2015-09-02  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
>   Remove redundant immediate generation code.
> 
> ---
>  gcc/config/aarch64/aarch64.c | 60 
> 
>  1 file changed, 60 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0bc6b19..bd6e522 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1371,8 +1371,6 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
> generate,
>int i;
>bool first;
>unsigned HOST_WIDE_INT val, val2;
> -  bool subtargets;
> -  rtx subtarget;
>int one_match, zero_match, first_not__match;
>int num_insns = 0;
>  
> @@ -1402,7 +1400,6 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool 
> generate,
>/* Remaining cases are all for DImode.  */
>  
>val = INTVAL (imm);
> -  subtargets = optimize && can_create_pseudo_p ();
>  
>one_match = 0;
>zero_match = 0;
> @@ -1440,63 +1437,6 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, 
> bool generate,
>if (zero_match == 2)
>  goto simple_sequence;
>  
> -  mask = 0x0UL;
> -  for (i = 16; i < 64; i += 16, mask <<= 16)
> -{
> -  HOST_WIDE_INT comp = mask & ~(mask - 1);
> -
> -  if (aarch64_uimm12_shift (val - (val & mask)))
> - {
> -   if (generate)
> - {
> -   subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
> -   emit_insn (gen_rtx_SET (subtarget, GEN_INT (val & mask)));
> -   emit_insn (gen_adddi3 (dest, subtarget,
> -  GEN_INT (val - (val & mask;
> - }
> -   num_insns += 2;
> -   return num_insns;
> - }
> -  else if (aarch64_uimm12_shift (-(val - ((val + comp) & mask
> - {
> -   if (generate)
> - {
> -   subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
> -   emit_insn (gen_rtx_SET (subtarget,
> -   GEN_INT ((val + comp) & mask)));
> -   emit_insn (gen_adddi3 (dest, subtarget,
> -  GEN_INT (val - ((val + comp) & mask;
> - }
> -   num_insns += 2;
> -   return num_insns;
> - }
> -  else if (aarch64_uimm12_shift (val - ((val - comp) | ~mask)))
> - {
> -   if (generate)
> - {
> -   subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
> -   emit_insn (gen_rtx_SET (subtarget,
> -   GEN_INT ((val - comp) | ~mask)));
> -   emit_insn (gen_adddi3 (dest, subtarget,
> -  GEN_INT (val - ((val - comp) | ~mask;
> - }
> -   num_insns += 2;
> -   return num_insns;
> - }
> -  else if (aarch64_uimm12_shift (-(val - (val | ~mask
> - {
> -   if (generate)
> - {
> -   subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
> -   emit_insn (gen_rtx_SET (subtarget, GEN_INT (val | ~mask)));
> -   emit_insn (gen_adddi3 (dest, subtarget,
> -  GEN_INT (val - (val | ~mask;
> - }
> -   num_insns += 2;
> -   return num_insns;
> - }
> -}
> -
>if (zero_match != 2 && one_match != 2)
>  {
>for (i = 0; i < 64; i += 16, mask <<= 16)
> -- 
> 1.8.3
> 
> 
> 


Re: [PATCH][AArch64][5/5] Improve immediate generation

2015-09-18 Thread James Greenhalgh
On Wed, Sep 02, 2015 at 01:36:28PM +0100, Wilco Dijkstra wrote:
> Cleanup the remainder of aarch64_internal_mov_immediate. Compute the number
> of 16-bit aligned 16-bit masks that are all-zeroes or all-ones, and emit the
> smallest sequence using a single loop skipping either all-ones or all-zeroes.
> 
> Passes GCC regression tests/bootstrap. Minor changes in generated code for
> some special cases but codesize is identical.
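The chunk counting the cleanup is built around can be sketched like this (a simplified reconstruction, not the patch code itself):

```c
#include <stdint.h>

/* Count how many aligned 16-bit chunks of VAL are all-zeroes or
   all-ones.  The emitter can then start from whichever kind dominates
   and patch only the remaining chunks with movk, skipping the chunks
   the first instruction already produced.  */
static void
count_chunks (uint64_t val, int *zero_match, int *one_match)
{
  *zero_match = 0;
  *one_match = 0;

  for (int i = 0; i < 64; i += 16)
    {
      uint64_t chunk = (val >> i) & 0xffff;

      if (chunk == 0)
        ++*zero_match;
      else if (chunk == 0xffff)
        ++*one_match;
    }
}
```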

OK.

Thanks,
James

> 
> ChangeLog:
> 2015-09-02  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
>   Cleanup immediate generation code.
> 
> ---
>  gcc/config/aarch64/aarch64.c | 137 
> ---
>  1 file changed, 39 insertions(+), 98 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index af9a3d3..ca4428a 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1367,75 +1367,42 @@ static int
>  aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
>   machine_mode mode)
>  {
> -  unsigned HOST_WIDE_INT mask;
>int i;
> -  bool first;
> -  unsigned HOST_WIDE_INT val, val2;
> -  int one_match, zero_match, first_not__match;
> -  int num_insns = 0;
> +  unsigned HOST_WIDE_INT val, val2, mask;
> +  int one_match, zero_match;
> +  int num_insns;
>  
> -  if (CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode))
> +  val = INTVAL (imm);
> +
> +  if (aarch64_move_imm (val, mode))
>  {
>if (generate)
>   emit_insn (gen_rtx_SET (dest, imm));
> -  num_insns++;
> -  return num_insns;
> +  return 1;
>  }
>  
> -  if (mode == SImode)
> +  if ((val >> 32) == 0 || mode == SImode)
>  {
> -  /* We know we can't do this in 1 insn, and we must be able to do it
> -  in two; so don't mess around looking for sequences that don't buy
> -  us anything.  */
>if (generate)
>   {
> -   emit_insn (gen_rtx_SET (dest, GEN_INT (INTVAL (imm) & 0x)));
> -   emit_insn (gen_insv_immsi (dest, GEN_INT (16),
> -  GEN_INT ((INTVAL (imm) >> 16) & 0x)));
> +   emit_insn (gen_rtx_SET (dest, GEN_INT (val & 0x)));
> +   if (mode == SImode)
> + emit_insn (gen_insv_immsi (dest, GEN_INT (16),
> +GEN_INT ((val >> 16) & 0x)));
> +   else
> + emit_insn (gen_insv_immdi (dest, GEN_INT (16),
> +GEN_INT ((val >> 16) & 0x)));
>   }
> -  num_insns += 2;
> -  return num_insns;
> +  return 2;
>  }
>  
>/* Remaining cases are all for DImode.  */
>  
> -  val = INTVAL (imm);
> -
> -  one_match = 0;
> -  zero_match = 0;
>mask = 0x;
> -  first_not__match = -1;
> -
> -  for (i = 0; i < 64; i += 16, mask <<= 16)
> -{
> -  if ((val & mask) == mask)
> - one_match++;
> -  else
> - {
> -   if (first_not__match < 0)
> - first_not__match = i;
> -   if ((val & mask) == 0)
> - zero_match++;
> - }
> -}
> -
> -  if (one_match == 2)
> -{
> -  /* Set one of the quarters and then insert back into result.  */
> -  mask = 0xll << first_not__match;
> -  if (generate)
> - {
> -   emit_insn (gen_rtx_SET (dest, GEN_INT (val | mask)));
> -   emit_insn (gen_insv_immdi (dest, GEN_INT (first_not__match),
> -  GEN_INT ((val >> first_not__match)
> -   & 0x)));
> - }
> -  num_insns += 2;
> -  return num_insns;
> -}
> -
> -  if (zero_match == 2)
> -goto simple_sequence;
> +  zero_match = ((val & mask) == 0) + ((val & (mask << 16)) == 0) +
> +((val & (mask << 32)) == 0) + ((val & (mask << 48)) == 0);
> +  one_match = ((~val & mask) == 0) + ((~val & (mask << 16)) == 0) +
> +((~val & (mask << 32)) == 0) + ((~val & (mask << 48)) == 0);
>  
>if (zero_match != 2 && one_match != 2)
>  {
> @@ -1463,58 +1430,32 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, 
> bool generate,
>   {
> emit_insn (gen_rtx_SET (dest, GEN_INT (val2)));
> emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> -  GEN_INT ((val >> i) & 0x)));
> +  GEN_INT ((val >> i) & 0x)));
>   }
> -   return 2;
>   }
>  }
>  
> -  if (one_match > zero_match)
> -{
> -  /* Set either first three quarters or all but the third.*/
> -  mask = 0xll << (16 - first_not__match);
> -  if (generate)
> - emit_insn (gen_rtx_SET (dest,
> - GEN_INT (val | mask | 0xull)));
> -  num_insns ++;
> +  /* Generate 2-4 instructions, skipping 16 bits of all zeroes or ones which
> + are emitted by the initial mov.  If one_match > zero_match, skip set 
> bits,
> + otherwise

Re: [PATCH] PR target/67480: AVX512 bitwise logic insns pattern is incorrect

2015-09-18 Thread Alexander Fomin
Hi,
On Tue, Sep 08, 2015 at 11:41:50AM +0300, Kirill Yukhin wrote:
> Hi,
> On 07 Sep 19:07, Alexander Fomin wrote:
> > +  tmp = TARGET_AVX512VL ? "p" : "p";
> Suppose masking is applied and 1st alternative chosen...
> > +  ops = "%s\t{%%2, %%0|%%0, %%2}";
> We'll reach here having p %xmm17, %xmm18 w/o even mention of mask register.
> I think we need to check here if masking is needed and emit EVEX version (3 
> args + mask).
> >tmp = TARGET_AVX512VL ? "pq" : "p";
> Regardless of the alternative chosen, you force the insn to be pq (when
> compiled w/ -mavx512vl).
> >ops = "%s\t{%%2, %%0|%%0, %%2}";
> So, here you'll emit, e.g. "pandq %xmm16, %xmm17"
> I think it'll be better to attach the AVX-512VL related suffix while
> discriminating alternatives.
> 
> --
> Thanks, K
I've fixed the problems related to masks/alternatives for both patterns.
Is updated version OK for trunk?

Regards,
Alexander
---
gcc/

PR target/67480
* config/i386/sse.md (define_mode_iterator VI48_AVX_AVX512F): New.
(define_mode_iterator VI12_AVX_AVX512F): New.
(define_insn "3"): Change
all iterators to VI48_AVX_AVX512F. Extract remaining modes ...
(define_insn "*3"): ... Into new pattern using
VI12_AVX_AVX512F iterators without masking.

gcc/testsuite
PR target/67480
* gcc.target/i386/pr67480.c: New test.
---
 gcc/config/i386/sse.md  | 129 +---
 gcc/testsuite/gcc.target/i386/pr67480.c |  10 +++
 2 files changed, 127 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..0a57db0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -416,6 +416,14 @@
   [(V16SI "TARGET_AVX512F") V8SI V4SI
(V8DI "TARGET_AVX512F") V4DI V2DI])
 
+(define_mode_iterator VI48_AVX_AVX512F
+  [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI])
+
+(define_mode_iterator VI12_AVX_AVX512F
+  [ (V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
+(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI])
+
 (define_mode_iterator V48_AVX2
   [V4SF V2DF
V8SF V4DF
@@ -11077,10 +11085,10 @@
 })
 
 (define_insn "3"
-  [(set (match_operand:VI 0 "register_operand" "=x,v")
-   (any_logic:VI
- (match_operand:VI 1 "nonimmediate_operand" "%0,v")
- (match_operand:VI 2 "nonimmediate_operand" "xm,vm")))]
+  [(set (match_operand:VI48_AVX_AVX512F 0 "register_operand" "=x,v")
+   (any_logic:VI48_AVX_AVX512F
+ (match_operand:VI48_AVX_AVX512F 1 "nonimmediate_operand" "%0,v")
+ (match_operand:VI48_AVX_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
   "TARGET_SSE && 
&& ix86_binary_operator_ok (, mode, operands)"
 {
@@ -11109,24 +7,120 @@
 case V4DImode:
 case V4SImode:
 case V2DImode:
-  if (TARGET_AVX512VL)
+  tmp = TARGET_AVX512VL ? "p" : "p";
+  break;
+default:
+  gcc_unreachable ();
+  }
+  break;
+
+   case MODE_V8SF:
+  gcc_assert (TARGET_AVX);
+   case MODE_V4SF:
+  gcc_assert (TARGET_SSE);
+  gcc_assert (!);
+  tmp = "ps";
+  break;
+
+   default:
+  gcc_unreachable ();
+   }
+
+  switch (which_alternative)
+{
+case 0:
+  if ()
+ops = "v%s\t{%%2, %%0, %%0|%%0, %%0, 
%%2}";
+  else
+ops = "%s\t{%%2, %%0|%%0, %%2}";
+  break;
+case 1:
+  ops = "v%s\t{%%2, %%1, %%0|%%0, %%1, 
%%2}";
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  snprintf (buf, sizeof (buf), ops, tmp);
+  return buf;
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog")
+   (set (attr "prefix_data16")
+ (if_then_else
+   (and (eq_attr "alternative" "0")
+   (eq_attr "mode" "TI"))
+   (const_string "1")
+   (const_string "*")))
+   (set_attr "prefix" "")
+   (set (attr "mode")
+   (cond [(and (match_test " == 16")
+   (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
+(const_string "")
+  (match_test "TARGET_AVX2")
+(const_string "")
+  (match_test "TARGET_AVX")
+(if_then_else
+  (match_test " > 16")
+  (const_string "V8SF")
+  (const_string ""))
+  (ior (not (match_test "TARGET_SSE2"))
+   (match_test "optimize_function_for_size_p (cfun)"))
+(const_string "V4SF")
+ ]
+ (const_string "")))])
+
+(define_insn "*3"
+  [(set (match_operand:VI12_AVX_AVX512F 0 "register_operand" "=x,v")
+   (any_logic: VI12_AVX_AVX512F
+ (match_operand:VI12_AVX_AVX512F 1 "nonimmediate_operand" "%0,v")
+ (match_operand:VI12_AVX_AVX512F 2 "nonimmediate_operand" "xm,vm")))]
+  "TARGET_SSE && ix86_binary_operator_ok (, mode, operands)"
+{
+  static char buf[64];
+  const char *ops;
+  const char *tmp;
+  const char *

Re: [PATCH, ARM]: Fix static interworking call

2015-09-18 Thread Richard Earnshaw
On 18/09/15 15:38, Christian Bruel wrote:
> 
> 
> On 09/18/2015 04:16 PM, Richard Earnshaw wrote:
>> On 17/09/15 09:46, Christian Bruel wrote:
>>> As obvious, bad operand number.
>>>
>>> OK for trunk ?
>>>
>>> Christian
>>>
>>>
>>> p1.patch
>>>
>>>
>>> 2015-09-18  Christian Bruel  
>>>
>>> * config/arm/arm.md (*call_value_symbol): Fix operand for
>>> interworking.
>>>
>>> 2015-09-18  Christian Bruel  
>>>
>>> * gcc.target/arm/attr_thumb-static2.c: New test.
>>>
>>> --- gnu_trunk.ref/gcc/gcc/config/arm/arm.md2015-09-14
>>> 09:52:37.697264500 +0200
>>> +++ gnu_trunk.p0/gcc/gcc/config/arm/arm.md2015-09-17
>>> 10:03:33.849451705 +0200
>>> @@ -7891,7 +7891,7 @@
>>>  /* Switch mode now when possible.  */
>>>  if (SYMBOL_REF_DECL (op) && !TREE_PUBLIC (SYMBOL_REF_DECL (op))
>>>   && arm_arch5 && arm_change_mode_p (SYMBOL_REF_DECL (op)))
>>> -  return NEED_PLT_RELOC ? \"blx%?\\t%a0(PLT)\" : \"blx%?\\t(%a0)\";
>>> +  return NEED_PLT_RELOC ? \"blx%?\\t%a1(PLT)\" : \"blx%?\\t(%a1)\";
>>>
>>>   return NEED_PLT_RELOC ? \"bl%?\\t%a1(PLT)\" : \"bl%?\\t%a1\";
>>> }"
>>> diff -ruNp
>>> gnu_trunk.ref/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
>>> gnu_trunk.p0/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
>>> ---
>>> gnu_trunk.ref/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c
>>> 1970-01-01
>>> 01:00:00.0 +0100
>>> +++
>>> gnu_trunk.p0/gcc/gcc/testsuite/gcc.target/arm/attr_thumb-static2.c   
>>> 2015-09-17 10:08:08.350064131 +0200
>>> @@ -0,0 +1,40 @@
>>> +/* Check that interwork between static functions is correctly
>>> resolved. */
>>> +
>>> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>>> +/* { dg-options "-O0 -march=armv7-a -mfloat-abi=hard" } */
>>
>> You can't have thumb1 and hard float,
> 
> Ah OK, I didn't know that. Is it that there was no FPU before V5?
> 

Thumb1 had no instruction encodings for accessing the FPU.

>> so the skip unless thumb1 seems a nonsense.
> 
> And there is no thumb1 with -march=armv7-a! So indeed the skip unless
> thumb1 is nonsense.
> Is the attached patch OK to clean this up?
> 
> thanks,
> 
> 
>>
>> R.
>>
>>> +/* { dg-do compile } */
>>> +
>>> +struct _NSPoint
>>> +{
>>> +  float x;
>>> +  float y;
>>> +};
>>> +
>>> +typedef struct _NSPoint NSPoint;
>>> +
>>> +static NSPoint
>>> +__attribute__ ((target("arm")))
>>> +NSMakePoint (float x, float y)
>>> +{
>>> +  NSPoint point;
>>> +  point.x = x;
>>> +  point.y = y;
>>> +  return point;
>>> +}
>>> +
>>> +static NSPoint
>>> +__attribute__ ((target("thumb")))
>>> +RelativePoint (NSPoint point, NSPoint refPoint)
>>> +{
>>> +  return NSMakePoint (refPoint.x + point.x, refPoint.y + point.y);
>>> +}
>>> +
>>> +NSPoint
>>> +__attribute__ ((target("arm")))
>>> +g(NSPoint refPoint)
>>> +{
>>> +  float pointA, pointB;
>>> +  return RelativePoint (NSMakePoint (0, pointA), refPoint);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "blx" 2 } } */
>>>
>>
> 
> 1.patch
> 
> 
> 2015-09-17  Christian Bruel  
> 
>   * gcc.target/arm/attr_thumb-static2.c: Test only for thumb2.
> 
> Index: attr_thumb-static2.c
> ===
> --- attr_thumb-static2.c  (revision 227904)
> +++ attr_thumb-static2.c  (working copy)
> @@ -1,6 +1,6 @@
>  /* Check that interwork between static functions is correctly resolved. */
>  
> -/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
> +/* { dg-require-effective-target arm_thumb2_ok } */
>  /* { dg-options "-O0 -march=armv7-a -mfloat-abi=hard" } */
>  /* { dg-do compile } */
>  
> 

Do you really need -mfloat-abi=hard for this test?  If so, I think you
also need "dg-require-effective-target arm_hard_vfp_ok".  See
gcc.target/arm/pr65729.c

R.



[PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-09-18 Thread Jennifer Yao
A number of functions in libstdc++ are guarded by the _GLIBCXX_USE_C99
preprocessor macro, which is only defined on systems that pass all of
the checks for a large set of C99 functions. Consequently, on systems
which lack any of the required C99 facilities (e.g. Cygwin, which
lacks some C99 complex math functions), the numeric conversion
functions (std::stoi(), std::stol(), std::to_string(), etc.) are not
defined—a rather silly outcome, as none of the numeric conversion
functions are implemented using C99 math functions.

This patch enables numeric conversion functions on the aforementioned
systems by splitting the checks for C99 support and defining several
new macros (_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, and
_GLIBCXX_USE_C99_WCHAR), which replace the use of _GLIBCXX_USE_C99 in
#if conditionals where appropriate.

Tested on x86_64-pc-cygwin.

Note: Several of the testcases that are newly enabled by the patch
fail on Cygwin due to defects in newlib (e.g. strtof() and wcstof() do
not set errno, strtold() and wcstold() do not handle trailing
non-numeric characters in input strings correctly, etc.). Also, at
least one testcase may fail due to PR 66530 (to circumvent,
install the build tree or add the directory containing the built DLL
and import library to PATH).

libstdc++-v3/ChangeLog:

2015-09-18  Jennifer Yao  

PR libstdc++/58393
PR libstdc++/61580
* acinclude.m4 (GLIBCXX_ENABLE_C99, GLIBCXX_CHECK_C99_TR1): Use -std=c++0x
instead of -std=c++98 in CXXFLAGS. Cache the results of checking for complex
math and wide character functions. Define preprocessor macros
_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, and _GLIBCXX_USE_C99_WCHAR.
Reformat to improve readability.
* config.h.in: Regenerate.
* config/locale/dragonfly/c_locale.h (std::__convert_from_v): Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* config/locale/generic/c_locale.h (std::__convert_from_v): Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* config/locale/gnu/c_locale.h (std::__convert_from_v): Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* config/os/bsd/dragonfly/os_defines.h: Define _GLIBCXX_USE_C99_STDIO,
_GLIBCXX_USE_C99_STDLIB, and _GLIBCXX_USE_C99_WCHAR.
* configure: Regenerate.
* include/bits/basic_string.h: Change and add preprocessor #if conditionals
so that numeric conversion functions are defined when
_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, or _GLIBCXX_USE_C99_WCHAR
are defined, instead of _GLIBCXX_USE_C99.
* include/bits/locale_facets.tcc (std::num_put::_M_insert_float): Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* include/bits/locale_facets_nonio.tcc (std::money_put::do_put): Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* include/c_compatibility/math.h: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_MATH.
* include/c_compatibility/wchar.h: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_WCHAR.
* include/c_global/cstdio: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDIO.
* include/c_global/cstdlib: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDLIB.
* include/c_global/cwchar: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_WCHAR.
* include/c_std/cstdio: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDIO.
* include/c_std/cstdlib: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDLIB.
* include/c_std/cwchar: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_WCHAR.
* include/ext/vstring.h: Change and add preprocessor #if conditionals
so that numeric conversion functions are defined when
_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, or _GLIBCXX_USE_C99_WCHAR
are defined, instead of _GLIBCXX_USE_C99.
* include/tr1/cstdio: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDIO.
* include/tr1/cstdlib: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDLIB.
* include/tr1/cwchar: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_WCHAR.
* include/tr1/stdlib.h: Replace _GLIBCXX_USE_C99 with
_GLIBCXX_USE_C99_STDLIB.
* src/c++11/debug.cc (__gnu_debug::_Error_formatter::_M_format_word):
Replace _GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* src/c++98/locale_facets.cc (std::__num_base::_S_format_float): Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDIO.
* testsuite/18_support/exception_ptr/60612-terminate.cc: Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDLIB.
* testsuite/18_support/exception_ptr/60612-unexpected.cc: Replace
_GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_STDLIB.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/stod.cc
(test01): Replace _GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_WCHAR.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/stof.cc
(test01): Replace _GLIBCXX_USE_C99 with _GLIBCXX_USE_C99_WCHAR.
* testsuite/21_strings/b

Re: [RFC][PATCH] Preferred rename register in regrename pass

2015-09-18 Thread Bernd Schmidt

On 09/17/2015 04:38 PM, Robert Suchanek wrote:

We came across a situation on MIPS64 where sign-extension moves were
not converted into a nop because IRA spilled some of the allocnos and
assigned a different hard register to the output operand of the move.
LRA does not fix this up, most likely because the move was not introduced by
LRA itself.  I found it hard to fix this in LRA and looked for
an alternative solution, for which the regrename pass appeared to be the best candidate.


For reference, please post examples of the insn pattern(s) where you 
would hope to get an improvement. Do they use matching constraints 
between the input and output operands in at least one alternative?


So this does look like something that could be addressed in regrename, 
but I think the patch is not quite the way to do it.



+/* Return a preferred rename register for HEAD.  */


Function comments ideally ought to be a little more detailed. Preferred 
how and why?



+static int
+find_preferred_rename_reg (du_head_p head)
+{
+  struct du_chain *this_du;
+  int preferred_reg = -1;
+
+  for (this_du = head->first; this_du; this_du = this_du->next_use)


This loop seems to search for the insn where the chain terminates (i.e. 
the register dies). It seems strange to do this here rather than during 
the initial scan in record_out_operands where we visit every insn and 
already look for REG_DEAD notes.



+  rtx note;
+  insn_rr_info *p;
+
+  /* The preferred rename register is an output register iff an input
+register dies in an instruction but the candidate must be validated by
+check_new_reg_p.  */
+  for (note = REG_NOTES (this_du->insn); note; note = XEXP (note, 1))
+   if (insn_rr.exists()
+   && REG_NOTE_KIND (note) == REG_DEAD
+   && REGNO (XEXP (note, 0)) == head->regno
+   && (p = &insn_rr[INSN_UID (this_du->insn)])
+   && p->op_info)
+ {
+   int i;
+   for (i = 0; i < p->op_info->n_chains; i++)
+ {
+   struct du_head *next_head = p->op_info->heads[i];
+   if (head != next_head)


Here you're not actually verifying the chosen preferred reg is an 
output? Is the use of plain "p->op_info" (which is actually an array) 
intentional as a guess that operand 0 is the output? I'm not thrilled 
with this, and at the very least it should be "p->op_info[0]." to avoid 
reader confusion.
It's also not verifying that this is indeed a case where choosing a 
preferred reg has a beneficial effect at all.


The use of insn_rr would probably also be unnecessary if this was done 
during the scan phase.



+   preferred_reg = next_head->regno;


The problem here is that there's an ordering issue. What if next_head 
gets renamed afterwards? The choice of preferred register hasn't bought 
us anything in that case.


For all these reasons I'd suggest a different approach, looking for such 
situations during the scan. Try to detect a situation where

 * we have a REG_DEAD note for an existing chain
 * the insn fulfils certain conditions (i.e. it's a move, or maybe one
   of the alternatives has a matching constraint). After all, there's
   not much point in tying if the reg that dies was used in a memory
   address.
 * a new chain is started for a single output
Then, instead of picking a best register, mark the two chains as tied. 
Then, when choosing a rename register, see if a tied chain already was 
renamed, and try to pick the same register first.



@@ -1826,7 +1900,7 @@ regrename_optimize (void)
df_analyze ();
df_set_flags (DF_DEFER_INSN_RESCAN);

-  regrename_init (false);
+  regrename_init (true);


It would be good to avoid this as it makes the renamer more expensive. I 
expect that if you follow the strategy described above, this won't be 
necessary.



-  struct du_chain *chains[MAX_REGS_PER_ADDRESS];
-  struct du_head *heads[MAX_REGS_PER_ADDRESS];
+  vec chains;
+  vec heads;


Given that MAX_REGS_PER_ADDRESS tends to be 1 or 2 this appears to make 
things more heavyweight, especially with the extra loop needed to free 
the vecs. If possible, try to avoid this. (Again, AFAICS this 
information shouldn't really be necessary for what you're trying to do).



Bernd


[PATCH] Break out phi-only cprop into its own file

2015-09-18 Thread Jeff Law


While tackling another bit of infrastructure (eliminating various
file-scoped statics) needed in DOM to address PR 47679, I figured the time to
split out the phi-only-cprop code into its own file had long since passed.


That's precisely what this patch does.  It moves all the phi-only-cprop
bits into a new file, eliminates the file-scoped static variables in
that file, and minimizes the set of headers included by tree-ssa-dom.c.


Bootstrapped and regression tested on x86_64-linux-gnu.  For some extra
sanity I also built with config-list.mk.  While there were failures in
that build, they are totally unrelated to these changes (presumably 
Kaz's recent change to sh.c will fix the vast majority of those failures :-)


Installed on the trunk.


Jeff
* Makefile.in (OBJS): Add tree-ssa-phionlycprop.o
* tree-ssa-dom.c: Remove unnecessary header includes.
(remove_stmt_or_phi): Moved from here into tree-ssa-phionlycprop.c
(get_rhs_or_phi_arg, get_lhs_or_phi_result): Likewise.
(propagate_rhs_into_lhs, eliminate_const_or_copy): Likewise.
(eliminate_degenerate_phis_1, pass_phi_only_cprop): Likewise.
(pass_phi_only_cprop::execute): Likewise.
(make_pass_phi_only_cprop): Likewise.
* tree-ssa-phionlycprop.c: New file with moved code.  Eliminate
uses of file scoped statics by passing the required objects
as parameters wherever needed.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 254837e..8801207 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1462,6 +1462,7 @@ OBJS = \
tree-ssa-loop.o \
tree-ssa-math-opts.o \
tree-ssa-operands.o \
+   tree-ssa-phionlycprop.o \
tree-ssa-phiopt.o \
tree-ssa-phiprop.o \
tree-ssa-pre.o \
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 1b44bd1..963dea9 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -22,20 +22,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
-#include "cfghooks.h"
 #include "tree.h"
 #include "gimple.h"
-#include "hard-reg-set.h"
 #include "ssa.h"
-#include "alias.h"
 #include "fold-const.h"
-#include "stor-layout.h"
-#include "flags.h"
-#include "tm_p.h"
 #include "cfganal.h"
 #include "cfgloop.h"
 #include "gimple-pretty-print.h"
-#include "internal-fn.h"
 #include "gimple-fold.h"
 #include "tree-eh.h"
 #include "gimple-iterator.h"
@@ -45,7 +38,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
 #include "tree-ssa-threadupdate.h"
-#include "langhooks.h"
 #include "params.h"
 #include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
@@ -1986,523 +1978,3 @@ lookup_avail_expr (gimple stmt, bool insert)
 
   return lhs;
 }
-
-/* PHI-ONLY copy and constant propagation.  This pass is meant to clean
-   up degenerate PHIs created by or exposed by jump threading.  */
-
-/* Given a statement STMT, which is either a PHI node or an assignment,
-   remove it from the IL.  */
-
-static void
-remove_stmt_or_phi (gimple stmt)
-{
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-
-  if (gimple_code (stmt) == GIMPLE_PHI)
-remove_phi_node (&gsi, true);
-  else
-{
-  gsi_remove (&gsi, true);
-  release_defs (stmt);
-}
-}
-
-/* Given a statement STMT, which is either a PHI node or an assignment,
-   return the "rhs" of the node, in the case of a non-degenerate
-   phi, NULL is returned.  */
-
-static tree
-get_rhs_or_phi_arg (gimple stmt)
-{
-  if (gimple_code (stmt) == GIMPLE_PHI)
-return degenerate_phi_result (as_a  (stmt));
-  else if (gimple_assign_single_p (stmt))
-return gimple_assign_rhs1 (stmt);
-  else
-gcc_unreachable ();
-}
-
-
-/* Given a statement STMT, which is either a PHI node or an assignment,
-   return the "lhs" of the node.  */
-
-static tree
-get_lhs_or_phi_result (gimple stmt)
-{
-  if (gimple_code (stmt) == GIMPLE_PHI)
-return gimple_phi_result (stmt);
-  else if (is_gimple_assign (stmt))
-return gimple_assign_lhs (stmt);
-  else
-gcc_unreachable ();
-}
-
-/* Propagate RHS into all uses of LHS (when possible).
-
-   RHS and LHS are derived from STMT, which is passed in solely so
-   that we can remove it if propagation is successful.
-
-   When propagating into a PHI node or into a statement which turns
-   into a trivial copy or constant initialization, set the
-   appropriate bit in INTERESTING_NAMEs so that we will visit those
-   nodes as well in an effort to pick up secondary optimization
-   opportunities.  */
-
-static void
-propagate_rhs_into_lhs (gimple stmt, tree lhs, tree rhs, bitmap 
interesting_names)
-{
-  /* First verify that propagation is valid.  */
-  if (may_propagate_copy (lhs, rhs))
-{
-  use_operand_p use_p;
-  imm_use_iterator iter;
-  gimple use_stmt;
-  bool all = true;
-
-  /* Dump details.  */
-  if (dump_file && (dump_flags & TDF_DETAILS))

Re: (patch,rfc) s/gimple/gimple */

2015-09-18 Thread Jeff Law

On 09/18/2015 07:32 AM, Trevor Saunders wrote:

On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:

On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:

Hi,

I gave changing from gimple to gimple * a shot last week.  It turned out
not to be too hard.  As you might expect, the patch is huge, so it's
attached compressed.

patch was bootstrapped + regtested on x86_64-linux-gnu, and run through
config-list.mk.  However I needed to update it some for changes made
while testing.  Do people want to make this change now?  If so I'll try
and commit the patch over the weekend when less is changing.



FWIW there are some big changes in gcc/tree-vect-slp.c:vectorizable_load
that look like unrelated whitespace changes, e.g. the following (and
there are some followup hunks).  Did something change underneath, or was
there a stray whitespace cleanup here?  (I skimmed through the patch,
and this was the only file I spotted where something looked wrong)


yeah, it was a stray whitespace cleanup, but I reverted it.

Given the few but only positive comments I've seen I'm planning to
commit this over the weekend.

That works for me.

jeff


[PATCH] shrink-wrap: Handle multiple predecessors of prologue

2015-09-18 Thread Segher Boessenkool
The caller of try_shrink_wrapping wants to be returned a single edge to
put the prologue on.  To make that work even if there are multiple edges
(all pointing to the PRO block) that need the prologue, add a new block
that becomes the destination of all such edges, and then jumps to PRO.

In the general case, some edges to PRO will need to be redirected, and
not all edges *can* be redirected.  This adds a can_get_prologue function
that detects such cases.  Happily, this can then also handle the "prologue
clobbers some reg that is live on the edge we want to insert it on" case.

Not all (if any?) EDGE_CROSSING edges can be redirected, so handle those
the same as EDGE_COMPLEX edges.  Maybe they should *be* EDGE_COMPLEX?

The "clobber" test checks all registers live in, which is a bit pessimistic
(only those live on edges that get the prologue need to be considered).
This is the same test as was there before; I haven't measured what the
impact of this suboptimality is.

Bootstrapped and tested on powerpc64-linux.  Is this okay for mainline?


Segher


2015-09-18  Segher Boessenkool  

* function.c (thread_prologue_and_epilogue_insns): Delete
orig_entry_edge argument to try_shrink_wrapping.
* shrink-wrap.c (can_get_prologue): New function.
(can_dup_for_shrink_wrapping): Also handle EDGE_CROSSING.
(try_shrink_wrapping): Delete orig_entry_edge argument.  Use
can_get_prologue where needed.  Remove code that finds a single
edge for the prologue.  Remove code that tests if any reg clobbered
by the prologue is live on the prologue edge.  Remove code that finds
the new prologue edge after duplicating blocks.  Make a new prologue
block and edge.
* shrink-wrap.h (try_shrink_wrapping): Delete orig_entry_edge argument.

---
 gcc/function.c|   2 +-
 gcc/shrink-wrap.c | 170 ++
 gcc/shrink-wrap.h |   4 +-
 3 files changed, 99 insertions(+), 77 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index bb75b1c..1095465 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -6144,7 +6144,7 @@ thread_prologue_and_epilogue_insns (void)
  prologue/epilogue is emitted only around those parts of the
  function that require it.  */
 
-  try_shrink_wrapping (&entry_edge, orig_entry_edge, &bb_flags, prologue_seq);
+  try_shrink_wrapping (&entry_edge, &bb_flags, prologue_seq);
 
   if (split_prologue_seq != NULL_RTX)
 {
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 1387594..d2d665f 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -462,6 +462,30 @@ prepare_shrink_wrap (basic_block entry_block)
   }
 }
 
+/* Return whether basic block PRO can get the prologue.  It can not if it
+   has incoming complex edges that need a prologue inserted (we make a new
+   block for the prologue, so those edges would need to be redirected, which
+   does not work).  It also can not if there exist registers live on entry
+   to PRO that are clobbered by the prologue.  */
+
+static bool
+can_get_prologue (basic_block pro, HARD_REG_SET prologue_clobbered)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, pro->preds)
+if (e->flags & (EDGE_COMPLEX | EDGE_CROSSING)
+   && !dominated_by_p (CDI_DOMINATORS, e->src, pro))
+  return false;
+
+  HARD_REG_SET live;
+  REG_SET_TO_HARD_REG_SET (live, df_get_live_in (pro));
+  if (hard_reg_set_intersect_p (live, prologue_clobbered))
+return false;
+
+  return true;
+}
+
 /* Return whether we can duplicate basic block BB for shrink wrapping.  We
cannot if the block cannot be duplicated at all, or if any of its incoming
edges are complex and come from a block that does not require a prologue
@@ -478,7 +502,7 @@ can_dup_for_shrink_wrapping (basic_block bb, basic_block 
pro, unsigned max_size)
   edge e;
   edge_iterator ei;
   FOR_EACH_EDGE (e, ei, bb->preds)
-if (e->flags & EDGE_COMPLEX
+if (e->flags & (EDGE_COMPLEX | EDGE_CROSSING)
&& !dominated_by_p (CDI_DOMINATORS, e->src, pro))
   return false;
 
@@ -577,14 +601,13 @@ fix_fake_fallthrough_edge (edge e)
(bb 4 is duplicated to 5; the prologue is inserted on the edge 5->3).
 
ENTRY_EDGE is the edge where the prologue will be placed, possibly
-   changed by this function.  ORIG_ENTRY_EDGE is the edge where it
-   would be placed without shrink-wrapping.  BB_WITH is a bitmap that,
-   if we do shrink-wrap, will on return contain the interesting blocks
-   that run with prologue.  PROLOGUE_SEQ is the prologue we will insert.  */
+   changed by this function.  BB_WITH is a bitmap that, if we do shrink-
+   wrap, will on return contain the interesting blocks that run with
+   prologue.  PROLOGUE_SEQ is the prologue we will insert.  */
 
 void
-try_shrink_wrapping (edge *entry_edge, edge orig_entry_edge,
-bitmap_head *bb_with, rtx_insn *prologue_seq)
+try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
+  

[PATCH][RS6000] Migrate from reduc_xxx to reduc_xxx_scal optabs

2015-09-18 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html
after discovering that patch was broken on power64le - thanks to Bill Schmidt
for pointing out that gcc112 is the opposite endianness to gcc110...

This time I decided to avoid any funny business with making RTL match other
patterns in other .md files, and instead to directly call the relevant
expanders. This should thus preserve the codegen of the previous expansion path.
Moreover, combining the uplus and splus expansion paths (as addition is the same
regardless of signedness) causes some additional examples to be reduced directly
via patterns.

Bootstrapped + check-g{cc,++,fortran}
on powerpc64-none-linux-gnu (--with-cpu=power7)
and powerpc64le-none-linux-gnu (--with-cpu=power8).

gcc/ChangeLog:

* config/rs6000/altivec.md (reduc_splus_): Rename to...
(reduc_plus_scal_): ...this, add rs6000_expand_vector_extract.
(reduc_uplus_v16qi): Remove.

* config/rs6000/vector.md (VEC_reduc_name): Change "splus" to "plus".
(reduc__v2df): Remove.
(reduc__v4sf): Remove.
(reduc__scal_): New.

* config/rs6000/vsx.md (vsx_reduc__v2df): Declare
gen_ function by removing * prefix.
(vsx_reduc__v4sf): Likewise.
---
 gcc/config/rs6000/altivec.md | 25 ++-
 gcc/config/rs6000/vector.md  | 47 ++--
 gcc/config/rs6000/vsx.md |  4 ++--
 3 files changed, 27 insertions(+), 49 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 4170f38..93ce1f0 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2648,35 +2648,22 @@
   operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
 })
 
-(define_expand "reduc_splus_"
-  [(set (match_operand:VIshort 0 "register_operand" "=v")
+(define_expand "reduc_plus_scal_"
+  [(set (match_operand: 0 "register_operand" "=v")
 (unspec:VIshort [(match_operand:VIshort 1 "register_operand" "v")]
UNSPEC_REDUC_PLUS))]
   "TARGET_ALTIVEC"
 {
   rtx vzero = gen_reg_rtx (V4SImode);
   rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
+  rtx vtmp2 = gen_reg_rtx (mode);
+  rtx dest = gen_lowpart (V4SImode, vtmp2);
+  int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 : 0;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4ss (vtmp1, operands[1], vzero));
   emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
-  DONE;
-})
-
-(define_expand "reduc_uplus_v16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v")
-(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
- UNSPEC_REDUC_PLUS))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx vtmp1 = gen_reg_rtx (V4SImode);
-  rtx dest = gen_lowpart (V4SImode, operands[0]);
-
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-  emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
+  rs6000_expand_vector_extract (operands[0], vtmp2, elt);
   DONE;
 })
 
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 8821dec..d8699c8 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -78,7 +78,7 @@
 ;; Vector reduction code iterators
 (define_code_iterator VEC_reduc [plus smin smax])
 
-(define_code_attr VEC_reduc_name [(plus "splus")
+(define_code_attr VEC_reduc_name [(plus "plus")
  (smin "smin")
  (smax "smax")])
 
@@ -1061,38 +1061,29 @@
   "")
 
 ;; Vector reduction expanders for VSX
-
-(define_expand "reduc__v2df"
-  [(parallel [(set (match_operand:V2DF 0 "vfloat_operand" "")
-  (VEC_reduc:V2DF
-   (vec_concat:V2DF
-(vec_select:DF
- (match_operand:V2DF 1 "vfloat_operand" "")
- (parallel [(const_int 1)]))
-(vec_select:DF
- (match_dup 1)
- (parallel [(const_int 0)])))
-   (match_dup 1)))
- (clobber (match_scratch:V2DF 2 ""))])]
-  "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "")
-
-; The (VEC_reduc:V4SF
+; The (VEC_reduc:...
 ;  (op1)
-;  (unspec:V4SF [(const_int 0)] UNSPEC_REDUC))
+;  (unspec:... [(const_int 0)] UNSPEC_REDUC))
 ;
 ; is to allow us to use a code iterator, but not completely list all of the
 ; vector rotates, etc. to prevent canonicalization
 
-(define_expand "reduc__v4sf"
-  [(parallel [(set (match_operand:V4SF 0 "vfloat_operand" "")
-  (VEC_reduc:V4SF
-   (unspec:V4SF [(const_int 0)] UNSPEC_REDUC)
-   (match_operand:V4SF 1 "vfloat_operand" "")))
- (clobber (match_scratch:V4SF 2 ""))
- (clobber (match_scratch:V4SF 3 ""))])]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
-  "")
+
+

Re: [PATCH, rs6000] Add expansions for min/max vector reductions

2015-09-18 Thread Alan Lawrence

On 18/09/15 09:35, Richard Biener wrote:


Btw, we ditched the original reduce-to-vector variant due to its
endianness issues (it only had _one_ element of the vector containing
the reduction result).  Re-introducing reduce-to-vector but with
the reduction result in all elements wouldn't have any issue like
that.

Of course it would introduce a third optab variant similar to the
old (now legacy) version but with different semantics...


I am working (again) on finishing this migration, and think I have fixed the 
endianness issues on PowerPC, patch posted just now.  MIPS/Loongson, I'm still 
struggling with; AFAICT, they implement their own reduc_ pattern for most cases, 
which for the most part repeats the midend's approach using shifts, but the 
first step uses a funky permute, as they have a faster instruction for this...


--Alan



Re: [PR64164] drop copyrename, integrate into expand

2015-09-18 Thread Alan Lawrence

On 02/09/15 23:12, Alexandre Oliva wrote:

On Sep  2, 2015, Alan Lawrence  wrote:


One more failure to report, I'm afraid. On AArch64 Bigendian,
aapcs64/func-ret-4.c ICEs in simplify_subreg (line refs here are from
r227348):


Thanks.  The failure mode was different in the current, revamped git
branch aoliva/pr64164, but I've just fixed it there.

I'm almost ready to post a new patch, with a new, simpler, less fragile
and more maintainable approach to integrate cfgexpand and assign_parms'
RTL assignment, so if you could give it a spin on big and little endian
aarch64 natives, that would be very much appreciated!



On trunk, aarch64_be is still ICEing in gcc.target/aarch64/aapcs64/func-ret-4.c 
(complex numbers).


With the latest git commit 2b27ef197ece54c4573c5a748b0d40076e35412c on branch 
aoliva/pr64164, I am now able to build a cross toolchain for aarch64 and 
aarch64_be, and can confirm the ABI failure is fixed on the branch.


HTH,
Alan



Re: Fwd: [PATCH] Enable libstdc++ numeric conversions on Cygwin

2015-09-18 Thread Jonathan Wakely

On 18/09/15 11:17 -0400, Jennifer Yao wrote:

A number of functions in libstdc++ are guarded by the _GLIBCXX_USE_C99
preprocessor macro, which is only defined on systems that pass all of
the checks for a large set of C99 functions. Consequently, on systems
which lack any of the required C99 facilities (e.g. Cygwin, which
lacks some C99 complex math functions), the numeric conversion
functions (std::stoi(), std::stol(), std::to_string(), etc.) are not
defined—a rather silly outcome, as none of the numeric conversion
functions are implemented using C99 math functions.

This patch enables numeric conversion functions on the aforementioned
systems by splitting the checks for C99 support and defining several
new macros (_GLIBCXX_USE_C99_STDIO, _GLIBCXX_USE_C99_STDLIB, and
_GLIBCXX_USE_C99_WCHAR), which replace the use of _GLIBCXX_USE_C99 in
#if conditionals where appropriate.


Awesome! This has been on my TODO list for ages, but I've not made
much progress. I *definitely* want to see this change happen, but
there are some legal prerequisites that need to be met before that can
happen, see
https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html#contrib.list

Do you already have a copyright assignment for GCC?

If not, would you be willing to complete one?

Thanks for doing this work; it will help several other platforms, not
only Cygwin.


N.B. I don't see a patch attached to your mail, but that's not a
problem for now as I don't want to look at it until I know the status
of your copyright assignment (if we don't end up using your patch and
I do it myself then I don't want to plagiarise your work!)





[Patch/ccmp] Cost instruction sequences to choose better expand order

2015-09-18 Thread Jiong Wang

Current conditional compare (CCMP) support in GCC aims to optimize
short-circuit evaluation of cascaded comparisons.  Given a simple
conditional compare candidate:

  if (a == 17 || a == 32)

it's represented like the following in IR:

  t0 = a == 17
  t1 = a == 32
  t2 = t0 || t1


Normally, CCMP expansion has two parts: the first part calculates the
initial condition using a normal compare instruction like "cmp"; the
second part then uses the "ccmp" instruction, executed conditionally
based on the condition generated in the first step.

The problem is that the current implementation always expands t0 first,
then t1.  The expand order needs to take rtx costs into account, because
"cmp" and "ccmp" may have different restrictions, so the expand order
can result in performance differences.

For example, on AArch64 "ccmp" only accepts immediates within -31 to 31
while "cmp" accepts a wider range, so if we expand "a == 32" in the
second step it will use "ccmp", and an extra "mov reg, 32" instruction
is generated because 32 is out of range.  If we expand "a == 32" first,
then it's fine, as "cmp" can encode it.

Instruction difference for a simple testcase is listed below:

int foo(int a)
{
  if (a == 17 || a == 32)
return 1;
  else
return 2;
}

before
===
foo:
mov w1, 32
cmp w0, 17
ccmp w0, w1, 4, ne
cset w0, ne
add w0, w0, 1
ret

after
===
foo:
cmp w0, 32
ccmp w0, 17, 4, ne
cset w0, ne
add w0, w0, 1
ret

This patch still doesn't fix other, more complex situations; for
example, given:

 if (a == 1 || a == 29 || a == 32)

there is a recursive call of expand_ccmp_expr_1, while this patch only
handles the innermost call, where the incoming gimple has both operands
as comparison operations.

NOTE: the AArch64 backend can't cost the CCMP instruction accurately, so
I marked the testcase as XFAIL; the XFAIL will be removed once we fix
the cost issue.

2015-09-18  Jiong Wang  

gcc/
  * ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences
  generated from different expand orders.
  
gcc/testsuite/
  * gcc.target/aarch64/ccmp_1.c: New testcase.
  
-- 
Regards,
Jiong

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 3c3fbcd..dc985f6 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-outof-ssa.h"
 #include "cfgexpand.h"
 #include "ccmp.h"
+#include "predict.h"

 /* The following functions expand conditional compare (CCMP) instructions.
Here is a short description about the over all algorithm:
@@ -165,6 +166,8 @@ expand_ccmp_next (gimple g, enum tree_code code, rtx prev,
 static rtx
 expand_ccmp_expr_1 (gimple g, rtx *prep_seq, rtx *gen_seq)
 {
+  rtx prep_seq_1, gen_seq_1;
+  rtx prep_seq_2, gen_seq_2;
   tree exp = gimple_assign_rhs_to_tree (g);
   enum tree_code code = TREE_CODE (exp);
   gimple gs0 = get_gimple_for_ssa_name (TREE_OPERAND (exp, 0));
@@ -180,19 +183,62 @@ expand_ccmp_expr_1 (gimple g, rtx *prep_seq, rtx *gen_seq)
 {
   if (TREE_CODE_CLASS (code1) == tcc_comparison)
 	{
-	  int unsignedp0;
-	  enum rtx_code rcode0;
+	  int unsignedp0, unsignedp1;
+	  enum rtx_code rcode0, rcode1;
+	  int speed_p = optimize_insn_for_speed_p ();
+	  rtx tmp2, ret, ret2;
+	  unsigned cost1 = MAX_COST;
+	  unsigned cost2 = MAX_COST;
+	  bool first_only_p = false;
+	  bool second_only_p = false;

 	  unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs0)));
+	  unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (gs1)));
 	  rcode0 = get_rtx_code (code0, unsignedp0);
-
-	  tmp = targetm.gen_ccmp_first (prep_seq, gen_seq, rcode0,
+	  rcode1 = get_rtx_code (code1, unsignedp1);
+	  tmp = targetm.gen_ccmp_first (&prep_seq_1, &gen_seq_1, rcode0,
 	gimple_assign_rhs1 (gs0),
 	gimple_assign_rhs2 (gs0));
-	  if (!tmp)
-	return NULL_RTX;
+	  tmp2 = targetm.gen_ccmp_first (&prep_seq_2, &gen_seq_2, rcode1,
+	 gimple_assign_rhs1 (gs1),
+	 gimple_assign_rhs2 (gs1));

-	  return expand_ccmp_next (gs1, code, tmp, prep_seq, gen_seq);
+	  if (! tmp && ! tmp2)
+	return NULL_RTX;
+	  else if (! tmp)
+	second_only_p = true;
+	  else if (! tmp2)
+	first_only_p = true;
+
+
+	  if (! second_only_p)
+	{
+	  ret = expand_ccmp_next (gs1, code, tmp, &prep_seq_1, &gen_seq_1);
+	  cost1 = seq_cost (safe_as_a <rtx_insn *> (prep_seq_1),
+speed_p);
+	  cost1 += seq_cost (safe_as_a <rtx_insn *> (gen_seq_1), speed_p);
+	}
+	  if (! first_only_p)
+	{
+	  ret2 = expand_ccmp_next (gs0, code, tmp2, &prep_seq_2,
+   &gen_seq_2);
+	  cost2 = seq_cost (safe_as_a <rtx_insn *> (prep_seq_2),
+speed_p);
+	  cost2 += seq_cost (safe_as_a <rtx_insn *> (gen_seq_2), speed_p);
+	}
+
+	  if (cost1 > cost2)
+	{
+	  *prep_seq = prep_seq_2;
+	  *gen_seq = gen_seq_2;
+	  ret = ret2;
+	}
+	  else
+	{
+	  *prep_seq = prep_seq_1;
+	  *gen_seq = gen_seq_1;
+	}
+	  return ret;
 	}
   else
 	{
diff --git a/gcc/

Re: [RFC] Masking vectorized loops with bound not aligned to VF.

2015-09-18 Thread Kirill Yukhin
Hello,
On 18 Sep 10:31, Richard Biener wrote:
> On Thu, 17 Sep 2015, Ilya Enkovich wrote:
> 
> > 2015-09-16 15:30 GMT+03:00 Richard Biener :
> > > On Mon, 14 Sep 2015, Kirill Yukhin wrote:
> > >
> > >> Hello,
> > >> I'd like to initiate discussion on vectorization of loops whose
> > >> boundaries are not aligned to VF. Main target for this optimization
> > >> right now is x86's AVX-512, which features per-element embedded masking
> > >> for all instructions. The main goal for this mail is to agree on overall
> > >> design of the feature.
> > >>
> > >> This approach was presented @ GNU Cauldron 2015 by Ilya Enkovich [1].
> > >>
> > >> Here's a sketch of the algorithm:
> > >>   1. Add check on basic stmts for masking: possibility to introduce 
> > >> index vector and
> > >>  corresponding mask
> > >>   2. At the check if statements are vectorizable we additionally check 
> > >> if stmts
> > >>  need and can be masked and compute masking cost. Result is stored 
> > >> in `stmt_vinfo`.
> > >>  We are going  to mask only mem. accesses, reductions and modify 
> > >> mask for already
> > >>  masked stmts (mask load, mask store and vect. condition)
> > >
> > > I think you also need to mask divisions (for integer divide by zero) and
> > > want to mask FP ops which may result in NaNs or denormals (because that's
> > > generally to slow down execution a lot in my experience).
> > >
> > > Why not simply mask all stmts?
> > 
> > Hi,
> > 
> > Statement masking may be not free. Especially if we need to transform
> > mask somehow to do it. It also may be unsupported on a platform (e.g.
> > for AVX-512 not all instructions support masking) but still not be a
> > problem to mask a loop. BTW for AVX-512 masking doesn't boost
> > performance even if we have some special cases like NaNs. We don't
> > consider exceptions in vector code (and it seems to be a case now?)
> > otherwise we would need to mask them also.
> 
> Well, we do need to honor
> 
>   if (x != 0.)
>y[i] = z[i] / x;
> 
> in some way.  I think if-conversion currently simply gives up here.
> So if we have the epilogue and using masked loads what are the
> contents of the 'masked' elements (IIRC they are zero or all-ones, 
> right)?  If the end up as zero then even simple code like
> 
>   for (i;;)
>a[i] = b[i] / c[i];
> 
> cannot be transformed in the suggested way with -ftrapping-math
> and the remainder iteration might get slow if processing NaN
> operands is still as slow as it was 10 years ago.
> 
> IMHO for if-converting possibly trapping stmts (like the above
> example) we need some masking support anyway (and a way to express
> the masking in GIMPLE).
We'll use the if-cvt technique: if an op is trapping, we do not apply masking
to the loop remainder.  This is a subject for further development.  Currently
we don't try to truly mask existing GIMPLE stmts; all masking is achieved
using `vec_cond`, and we're not sure that trapping is a really useful feature
while vectorization is on.

> > >>   3. Make a decision about masking: take computed costs and est. 
> > >> iterations count
> > >>  into consideration
> > >>   4. Modify prologue/epilogue generation according decision made at 
> > >> analysis. Three
> > >>  options available:
> > >> a. Use scalar remainder
> > >> b. Use masked remainder. Won't be supported in first version
> > >> c. Mask main loop
> > >>   5.Support vectorized loop masking:
> > >> - Create stmts for mask generation
> > >> - Support generation of masked vector code (create generic vector 
> > >> code then
> > >>   patch it w/ masks)
> > >>   -  Mask loads/stores/vconds/reductions only
> > >>
> > >>  In first version (targeted v6) we're not going to support 4.b and loop
> > >> mask pack/unpack. No `pack/unpack` means that masking will be supported
> > >> only for types w/ the same size as index variable
> > >
> > > This means that if ncopies for any stmt is > 1 masking won't be supported,
> > > right?  (you'd need two or more different masks)
> > 
> > We don't think it is a very important feature to have in initial
> > version. It can be added later and shouldn't affect overall
> > implementation design much. BTW currently masked loads and stores
> > don't support masks of other sizes and don't do masks pack/unpack.
> 
> I think masked loads/stores support this just fine.  Remember the
> masks are regular vectors generated by cond exprs in the current code.
Not quite true; masked loads/stores are not supported for masks of a different size.
E.g. this example is not vectorized:
  int a[LENGTH], b[LENGTH];
  long long c[LENGTH];

  int test ()
  {
int i;
#pragma omp simd safelen(16)
for (i = 0; i < LENGTH; i++)
  if (a[i] > b[i])
c[i] = 1;
  }

> > >> [1] - 
> > >> https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf
> > >>
> > >> What do you think?
> > >
> > > There was the idea some time ago to use single-iteration vector
> > > variants 

Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.

2015-09-18 Thread Ramana Radhakrishnan
Hi Andrew,


>>
>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>> make check. Also tested for aarch64-none-elf with cross-compiled
>> check-gcc on an ARMv8.1 emulator with +lse enabled by default.
> 
> 
> Are you going to add some builtins for MIN/MAX support too?

The ACLE does not specify special intrinsics for any of the atomic
instructions; it relies on the GCC intrinsics for atomics.  The AArch64
backend is consistent with the ACLE in terms of the intrinsics that have
been added to it.

Having had some internal discussions on this topic, I've been made aware
that there are proposals such as
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3696.htm which,
though not considered for C++14, might be a useful avenue to explore.  It
would be better to add intrinsics that match this proposal, or are
otherwise suitable, to the general GCC language extensions rather than to
the AArch64 backend.

We have no immediate plans of doing so.  Is this something you can help with?

regards
Ramana

> 
> Thanks,
> Andrew Pinski
> 
>>
>> Ok for trunk?
>> Matthew
>>
>> 2015-09-17  Matthew Wahab  
>>
>> * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
>> Adjust declaration.
>> * config/aarch64/aarch64.c (aarch64_emit_bic): New.
>> (aarch64_gen_atomic_load_op): Adjust comment.  Add parameter
>> out_result.  Update to support update-fetch operations.
>> * config/aarch64/atomics.md (aarch64_atomic_exchange_lse):
>> Adjust for change to aarch64_gen_atomic_ldop.
>> (aarch64_atomic__lse): Likewise.
>> (aarch64_atomic_fetch__lse): Likewise.
>> (atomic__fetch): Change to an expander.
>> (aarch64_atomic__fetch): New.
>> (aarch64_atomic__fetch_lse): New.
>>
>> gcc/testsuite
>> 2015-09-17  Matthew Wahab  
>>
>> * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for
>> update-fetch operations.
>> * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise.
>>


Re: [c++-delayed-folding] cp_fold_r

2015-09-18 Thread Jason Merrill

On 09/18/2015 02:19 AM, Kai Tietz wrote:

Hi Jason,

Sounds like an interesting idea.  Do you already have a specific approach in mind?

My idea might just be hard to model, as we can't be sure we have already
walked the complete chain.  Because cp_fold caches results, we won't try to
fold an expression a second time, but we don't cover all EXPRs in cp_fold,
which makes it hard to tell whether we ended up walking the complete
expression tree or stopped at an unknown expression.
So we could add an additional return value to cp_fold which indicates
whether we ended with an unknown expression (i.e. the default case).  We
could use this later on to decide whether we need to walk the sub-tree or
not.  Not sure if that is the best approach, but it could help to avoid
some double runs.


That makes sense, but maybe cp_fold should handle all expressions?


2015-09-17 8:10 GMT+02:00 Jason Merrill :

I think we want to clear *walk_subtrees a lot more often in cp_fold_r; as it
is, for most expressions we end up calling cp_fold on the full-expression,
then uselessly on the subexpressions after we already folded the containing
expression.




Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 19:27, Jason Merrill  wrote:
> On 09/17/2015 06:23 PM, Ville Voutilainen wrote:
>>
>> This patch doesn't handle attributes yet, it looks to
>> me as if gcc doesn't support namespace attributes in the location that
>> the standard grammar puts them into.
> Mind fixing that, too?

Can we please do that separately?

>
>> + "-std=c++17 or -std=gnu++17");
> Please use "1z" until C++17 is final.


Will do.


Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Jason Merrill

On 09/17/2015 06:23 PM, Ville Voutilainen wrote:

This patch doesn't handle attributes yet, it looks to
me as if gcc doesn't support namespace attributes in the location that
the standard grammar puts them into.


Mind fixing that, too?


+ "-std=c++17 or -std=gnu++17");


Please use "1z" until C++17 is final.

Jason



Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Jason Merrill

On 09/18/2015 12:30 PM, Ville Voutilainen wrote:

On 18 September 2015 at 19:27, Jason Merrill  wrote:

On 09/17/2015 06:23 PM, Ville Voutilainen wrote:


This patch doesn't handle attributes yet, it looks to
me as if gcc doesn't support namespace attributes in the location that
the standard grammar puts them into.

Mind fixing that, too?


Can we please do that separately?


I suppose so, but it seems pretty trivial.  In any case, looks like your 
patch would accept the odd


namespace A __attribute ((visibility ("default"))) ::B { }

Jason



[PATCH, middle-end]: Fix PR67619, ICE at -O1 and above in int_mode_for_mode, at stor-layout.c

2015-09-18 Thread Uros Bizjak
Hello!

When expanding __builtin_eh_return, its address pointers can degrade to
modeless constants.  Use copy_addr_to_reg to always give the temporary
register Pmode.

2015-09-18  Uros Bizjak  

PR middle-end/67619
* except.c (expand_builtin_eh_return): Use copy_addr_to_reg to copy
the address to a register.

testsuite/ChangeLog:

2015-09-18  Uros Bizjak  

PR middle-end/67619
* gcc.dg/torture/pr67619.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN as obvious, the patch will be backported to
other release branches in a couple of days.

Uros.
Index: testsuite/gcc.dg/torture/pr67619.c
===
--- testsuite/gcc.dg/torture/pr67619.c  (revision 0)
+++ testsuite/gcc.dg/torture/pr67619.c  (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+void
+foo ()
+{
+  unsigned long l;
+  void *p = 0; 
+
+  __builtin_unwind_init ();
+  l = 0; 
+  __builtin_eh_return (l, p);
+}
Index: except.c
===
--- except.c(revision 227896)
+++ except.c(working copy)
@@ -2219,7 +2219,7 @@ expand_builtin_eh_return (tree stackadj_tree ATTRI
 VOIDmode, EXPAND_NORMAL);
   tmp = convert_memory_address (Pmode, tmp);
   if (!crtl->eh.ehr_stackadj)
-crtl->eh.ehr_stackadj = copy_to_reg (tmp);
+crtl->eh.ehr_stackadj = copy_addr_to_reg (tmp);
   else if (tmp != crtl->eh.ehr_stackadj)
 emit_move_insn (crtl->eh.ehr_stackadj, tmp);
 #endif
@@ -2228,7 +2228,7 @@ expand_builtin_eh_return (tree stackadj_tree ATTRI
 VOIDmode, EXPAND_NORMAL);
   tmp = convert_memory_address (Pmode, tmp);
   if (!crtl->eh.ehr_handler)
-crtl->eh.ehr_handler = copy_to_reg (tmp);
+crtl->eh.ehr_handler = copy_addr_to_reg (tmp);
   else if (tmp != crtl->eh.ehr_handler)
 emit_move_insn (crtl->eh.ehr_handler, tmp);
 


Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take

2015-09-18 Thread Martin Sebor

Done in the below.  This version actually bootstraps, because I've added
-Wno-duplicated-cond for insn-dfatab.o and insn-latencytab.o (don't know
how to fix these) + I've tweaked a condition in genemit.c.  The problem
here is that we have

   if (INTVAL (x) == 0)
 printf ("const0_rtx");
   else if (INTVAL (x) == 1)
 printf ("const1_rtx");
   else if (INTVAL (x) == -1)
 printf ("constm1_rtx");
   // ...
   else if (INTVAL (x) == STORE_FLAG_VALUE)
 printf ("const_true_rtx");

and STORE_FLAG_VALUE happens to be 1, so we have two identical conditions.
STORE_FLAG_VALUE is 1 or -1, but according to the documentation it can
also be some other number, so we should keep this if statement.  I've
avoided the warning by adding a STORE_FLAG_VALUE > 1 check.


Binutils and GLIBC also fail to build due to similar problems (in
addition to several errors triggered by the new -Wunused-const-variable
warning).

The one in GLIBC is trivial to fix by guarding the code with
#if N != 1:

In file included from ../sysdeps/x86_64/ldbl2mpn.c:1:0:
../sysdeps/x86_64/../i386/ldbl2mpn.c: In function ‘__mpn_extract_long_double’:
../sysdeps/x86_64/../i386/ldbl2mpn.c:89:24: error: duplicated ‘if’ condition [-Werror=duplicated-cond]
   else if (res_ptr[0] != 0)
           ^
../sysdeps/x86_64/../i386/ldbl2mpn.c:74:23: note: previously used here
   if (res_ptr[N - 1] != 0)
      ^

The one in Binutils is pretty easy to fix too:

In function ‘stab_int_type’:
/home/msebor/scm/fsf/binutils-gdb/binutils/wrstabs.c:665:18: error: duplicated ‘if’ condition [-Werror=duplicated-cond]
   else if (size == 8)
        ^
/home/msebor/scm/fsf/binutils-gdb/binutils/wrstabs.c:663:18: note: previously used here
   else if (size == sizeof (long))
        ^

but it makes me wonder how common this pattern is in portable
code and whether adding workarounds for it is the right solution
(or if it might prompt people to disable the warning, which would
be a shame).

As an aside, I would have expected the change you implemented
in GCC to get around this to trigger some other warning (such
as -Wtautological-compare) e.g., if (a > 1 && a == -1), but it
doesn't seem to, either in GCC or Clang.



How does this look like now?


If no one else is concerned about the above it looks good to
me. I was hoping to see the warning emitted for conditional
expressions as well but that can be considered an enhancement.

FWIW, while testing the patch I noticed the following bug: 67629.
It seems like the same logic we discussed in this context is
needed there as well.

Martin



Re: [RFC] Try vector as a new representation for vector masks

2015-09-18 Thread Richard Henderson
On 09/18/2015 06:21 AM, Ilya Enkovich wrote:
>>> +machine_mode
>>> +default_get_mask_mode (unsigned nunits, unsigned vector_size)
>>> +{
>>> +  unsigned elem_size = vector_size / nunits;
>>> +  machine_mode elem_mode
>>> += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
>>
>> Why these arguments as opposed to passing elem_size?  It seems that every 
>> hook
>> is going to have to do this division...
> 
> Every target would have nunits = vector_size / elem_size because
> nunits is used to create a vector mode. Thus no difference.

I meant passing nunits and elem_size, but not vector_size.  Thus no division
required.  If the target does require the vector size, it could be obtained by
multiplication, which is cheaper.  But in cases like this we'd not require
either mult or div.

>>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
>>>create_output_operand (&ops[0], target, TYPE_MODE (type));
>>>create_fixed_operand (&ops[1], mem);
>>>create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
>>> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
>>> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
>>> +   TYPE_MODE (TREE_TYPE (maskt))),
>>> +3, ops);
>>
>> Why do we now need a conversion here?
> 
> Mask mode was implicit for masked loads and stores. Now it becomes
> explicit because we may load the same value using different masks.
> E.g. for i386 we may load 256bit vector using both vector and scalar
> masks.

Ok, sure, the mask mode is needed, I get that.  But that doesn't answer the
question regarding conversion.  Why would convert_optab_handler be needed to
*change* the mode of the mask.  I assume that's not actually possible, with the
target hook already having chosen the proper mode for the mask.


r~


Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 19:34, Jason Merrill  wrote:
 This patch doesn't handle attributes yet, it looks to
 me as if gcc doesn't support namespace attributes in the location that
 the standard grammar puts them into.
>>> Mind fixing that, too?
>> Can we please do that separately?
> I suppose so, but it seems pretty trivial.  In any case, looks like your
> patch would accept the odd
> namespace A __attribute ((visibility ("default"))) ::B { }


Yes, or namespace A[[nonsense]]::B {}.  Those cases are easy to fix,
but namespace [[attribute_in_proper_location]] A {} seemingly caused
weird barfing.  That's why I didn't put in the rejection of the former:
I'd prefer to figure out the latter and the former at the same time,
and I'd prefer doing that once the basic facility is in.  Yes, partly
because I'll travel tomorrow. :)


[gomp4.1] DOACROSS expansion and various fixes

2015-09-18 Thread Jakub Jelinek
Hi!

This patch implements DOACROSS expansion (both tweaks the omp for
expansion to set up everything that is needed and call new APIs,
and expands ordered depend regions too).  In addition to that
it fixes some bugs in lower_omp_ordered_clauses, in particular
the indices other than the first one (or for collapsed loops more)
should be indices of the lexically latest iteration, so for forward
loops and ordered(2) it is actually maximum, not minimum, and for
say ordered(3) collapse(1) loops it shouldn't find maximum or minimum
of each index individually, but find one that has the outermost
dimension after collapse maximal or minimal (and if multiple sink vectors
have the same outermost one, then the second, etc.).

Various things are still not implemented, like loops with unsigned long
and long long/unsigned long long iterators.  Or apparently we can't
use GCD if the first POST in the loop is not dominated by the WAITs
(that will mean we probably have to move that optimization from lowering
to expansion).  Collapse > 1 is not handled in the optimization either.
And for unsigned iterators I have various questions to be clarified in the
standard.

The library side is almost missing for now, all I've done is implemented
the loop start APIs, so that I can at least test the expand_omp_for_generic
expansion somewhat.  The next week I'm going to create the needed data
structures during the initialization and actually implement (perhaps only
busy waiting for now) the post/wait calls.

2015-09-18  Jakub Jelinek  

* gimplify.c (gimplify_omp_for): Push into
loop_iter_var vector both the original and new decl.
(gimplify_omp_ordered): Update the decl in TREE_VALUE
from the original to the new decl.
* omp-low.c (struct omp_region): Adjust comments,
add ord_stmt field.
(extract_omp_for_data): Canonicalize cond_code even for
ordered loops after collapsed ones.  If loops is non-NULL,
fd->collapse == 1 and fd->ordered > 1, treat the outermost
loop similarly to collapsed ones, n1 == 0, step == 1, n2 == constant
or variable number of iterations.
(check_omp_nesting_restrictions): Only check outer context
when verifying ordered depend construct is closely nested in
for ordered construct.
(expand_omp_for_init_counts): Rename zero_iter_bb argument to
zero_iter1_bb and first_zero_iter to first_zero_iter1, add
zero_iter2_bb and first_zero_iter2 arguments, handle computation
of counts even for ordered loops.
(expand_omp_ordered_source, expand_omp_ordered_sink,
expand_omp_ordered_source_sink): New functions.
(expand_omp_for_ordered_loops): Add counts argument, initialize
the counts vars if needed.  Fix up !gsi_end_p (gsi) handling,
use the right step for each loop.
(expand_omp_for_generic): Handle expansion of doacross loops.
(expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
expand_omp_simd, expand_omp_taskloop_for_outer,
expand_omp_taskloop_for_inner): Adjust expand_omp_for_init_counts
callers.
(expand_omp_for): Handle doacross loops.
(expand_omp): Don't expand ordered depend constructs here, record
ord_stmt instead for later expand_omp_for_generic.
(lower_omp_ordered_clauses): Don't ICE on collapsed loops, just
give up on them for now.  For loops other than the first or
collapsed ones compute lexically latest loop rather than minimum
or maximum from each constant separately.  Simplify.
* omp-builtins.def (BUILT_IN_GOMP_LOOP_DOACROSS_STATIC_START,
BUILT_IN_GOMP_LOOP_DOACROSS_DYNAMIC_START,
BUILT_IN_GOMP_LOOP_DOACROSS_GUIDED_START,
BUILT_IN_GOMP_LOOP_DOACROSS_RUNTIME_START,
BUILT_IN_GOMP_DOACROSS_POST, BUILT_IN_GOMP_DOACROSS_WAIT): New.
* builtin-types.def (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR, BT_FN_VOID_LONG_VAR):
New.
gcc/fortran/
* types.def (BT_FN_BOOL_UINT_LONGPTR_LONGPTR_LONGPTR,
BT_FN_BOOL_UINT_LONGPTR_LONG_LONGPTR_LONGPTR, BT_FN_VOID_LONG_VAR):
New.
* f95-lang.c (DEF_FUNCTION_TYPE_VAR_1): Define.
gcc/testsuite/
* c-c++-common/gomp/sink-4.c: Don't expect the constant to have
pointer type.
* gcc.dg/gomp/sink-fold-3.c: Likewise.
* gcc.dg/gomp/sink-fold-1.c (k): New variable.
(funk): Add another ordered loop, use better test values and
adjust the expected result.
libgomp/
* libgomp.map (GOMP_4.1): Add GOMP_loop_doacross_dynamic_start,
GOMP_loop_doacross_guided_start, GOMP_loop_doacross_runtime_start,
GOMP_loop_doacross_static_start, GOMP_doacross_post and
GOMP_doacross_wait exports.
* ordered.c: Include stdarg.h.
(GOMP_doacross_post, GOMP_doacross_wait): New functions.
* loop.c (gomp_loop_doacross

Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Jason Merrill

On 09/18/2015 12:58 PM, Ville Voutilainen wrote:

On 18 September 2015 at 19:34, Jason Merrill  wrote:

This patch doesn't handle attributes yet, it looks to
me as if gcc doesn't support namespace attributes in the location that
the standard grammar puts them into.

Mind fixing that, too?

Can we please do that separately?

I suppose so, but it seems pretty trivial.  In any case, looks like your
patch would accept the odd
namespace A __attribute ((visibility ("default"))) ::B { }


Yes, or namespace A[[nonsense]]::B {}. Those cases are easy to fix,
but namespace [[attribute_in_proper_location]] A {} seemingly caused
weird barfing. That's why I didn't put in the rejection of the former, I'd 
prefer
to figure out the latter and the former at the same time, and I'd prefer doing
that once the basic facility is in. Yes, partly because I'll travel tomorrow. :)


To fix the former, you just need to keep


   /* Parse any specified attributes.  */
   attribs = cp_parser_attributes_opt (parser);


next to the open brace.  OK with that change, I suppose the other can wait.

Jason



[PATCH,committed] Read-only DWARF2 frame tables on AIX

2015-09-18 Thread David Edelsohn
Attached is the final version of the patch that allows AIX to use a
read-only version of DWARF2 frame tables for exception handling
instead of placing the tables in the data section.  This allows the
tables to be shared by multiple processes on AIX and reduces memory
requirements.

The common bits were approved by Jason Merrill during an earlier round
of reviews.

Bootstrapped on powerpc-ibm-aix7.1.0.0.

libgcc/

* config.host (powerpc-ibm-aix*): Add crtdbase.o to extra_parts.
* config/rs6000/crtdbase.S: New file.
* config/rs6000/t-aix-cxa: Build crtdbase.o.

gcc/

* defaults.h (EH_FRAME_SECTION_NAME): Depend on
EH_FRAME_THROUGH_COLLECT2.
* dwarf2asm.c (dw2_asm_output_encoded_addr_rtx): Add case for
DW_EH_PE_datarel.
* dwarf2out.c (switch_to_eh_frame_section): Use a read-only section
even if EH_FRAME_SECTION_NAME is undefined.  Restrict special
collect2 labels to EH_FRAME_THROUGH_COLLECT2.
* except.c (switch_to_exception_section): Use a read-only section
even if EH_FRAME_SECTION_NAME is undefined.
* system.h (EH_FRAME_IN_DATA_SECTION): Poison.
* collect2.c (write_c_file_stat): Provide dbase on AIX.
(scan_prog_file): Don't export __dso_handle nor
__gcc_unwind_dbase.
* config/rs6000/aix.h (ASM_PREFERRED_EH_DATA_FORMAT): Define.
(EH_TABLES_CAN_BE_READ_ONLY): Define.
(ASM_OUTPUT_DWARF_PCREL): Define.
(ASM_OUTPUT_DWARF_DATAREL): Define.
(EH_FRAME_THROUGH_COLLECT2): Define.
(EH_FRAME_IN_DATA_SECTION): Delete.
* config/rs6000/aix61.h (STARTFILE_SPEC): Add crtdbase.o.
* config/rs6000/rs6000-protos.h (rs6000_asm_output_dwarf_pcrel):
Declare.
(rs6000_asm_output_dwarf_datarel): Declare.
* config/rs6000/rs6000.c (rs6000_aix_asm_output_dwarf_pcrel): New.
(rs6000_aix_asm_output_dwarf_datarel): New.
(rs6000_xcoff_asm_init_sections): Don't set exception_section.
* config/spu/spu-elf.h (EH_FRAME_IN_DATA_SECTION): Delete.
(EH_FRAME_THROUGH_COLLECT2): Define.
* config/i386/i386-interix.h (EH_FRAME_IN_DATA_SECTION): Delete.
(EH_FRAME_THROUGH_COLLECT2): Define.
(EH_TABLES_CAN_BE_READ_ONLY): Define.
* doc/tm.texi.in (EH_FRAME_IN_DATA_SECTION): Delete.
(EH_FRAME_THROUGH_COLLECT2): New.
(ASM_OUTPUT_DWARF_DATAREL): New.
* doc/tm.texi: Regenerate.
Index: libgcc/config.host
===
--- libgcc/config.host  (revision 227905)
+++ libgcc/config.host  (working copy)
@@ -1085,7 +1085,7 @@
 rs6000-ibm-aix[56789].* | powerpc-ibm-aix[56789].*)
md_unwind_header=rs6000/aix-unwind.h
tmake_file="t-fdpbit rs6000/t-ppc64-fp rs6000/t-slibgcc-aix 
rs6000/t-ibm-ldouble rs6000/t-aix-cxa"
-   extra_parts="crtcxa.o crtcxa_s.o"
+   extra_parts="crtcxa.o crtcxa_s.o crtdbase.o"
;;
 rl78-*-elf)
tmake_file="$tm_file t-fdpbit rl78/t-rl78"
Index: libgcc/config/rs6000/crtdbase.S
===
--- libgcc/config/rs6000/crtdbase.S (revision 0)
+++ libgcc/config/rs6000/crtdbase.S (revision 0)
@@ -0,0 +1,31 @@
+/* Defines __gcc_unwind_dbase
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Symbol used as an arbitrary base for offsets inside the data
+ * segment for unwind information. */
+   .file "crtdbase.S"
+   .globl __gcc_unwind_dbase
+   .csect __gcc_unwind_dbase[RW],2
+   .align 2
+__gcc_unwind_dbase:
+   .long 0
Index: libgcc/config/rs6000/t-aix-cxa
===
--- libgcc/config/rs6000/t-aix-cxa  (revision 227905)
+++ libgcc/config/rs6000/t-aix-cxa  (working copy)
@@ -5,6 +5,9 @@
 
 SHLIB_MAPFILES += $(srcdir)/config/rs6000/libgcc-aix-cxa.ver
 
+crtdbase.o: $(srcdir)/config/rs6000/crtdbase.S
+   $(crt_compile) -c $<
+
 crtcxa.o: $(srcdir)/config/rs6000/crtcxa.c
$(crt_

Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 20:26, Jason Merrill  wrote:
>>> I suppose so, but it seems pretty trivial.  In any case, looks like your
>>> patch would accept the odd
>>> namespace A __attribute ((visibility ("default"))) ::B { }
>> Yes, or namespace A[[nonsense]]::B {}. Those cases are easy to fix,
>> but namespace [[attribute_in_proper_location]] A {} seemingly caused
>> weird barfing. That's why I didn't put in the rejection of the former, I'd
>> prefer
>> to figure out the latter and the former at the same time, and I'd prefer
>> doing
>> that once the basic facility is in. Yes, partly because I'll travel
>> tomorrow. :)
> To fix the former, you just need to keep
>>/* Parse any specified attributes.  */
>>attribs = cp_parser_attributes_opt (parser);
> next to the open brace.  OK with that change, I suppose the other can wait.


I also need to diagnose the use of attributes with a nested namespace
definition,
so I need to add the error emission and test it. ;)


Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 20:30, Ville Voutilainen
 wrote:
> On 18 September 2015 at 20:26, Jason Merrill  wrote:
 I suppose so, but it seems pretty trivial.  In any case, looks like your
 patch would accept the odd
 namespace A __attribute ((visibility ("default"))) ::B { }
>>> Yes, or namespace A[[nonsense]]::B {}. Those cases are easy to fix,
>>> but namespace [[attribute_in_proper_location]] A {} seemingly caused
>>> weird barfing. That's why I didn't put in the rejection of the former, I'd
>>> prefer
>>> to figure out the latter and the former at the same time, and I'd prefer
>>> doing
>>> that once the basic facility is in. Yes, partly because I'll travel
>>> tomorrow. :)
>> To fix the former, you just need to keep
>>>/* Parse any specified attributes.  */
>>>attribs = cp_parser_attributes_opt (parser);
>> next to the open brace.  OK with that change, I suppose the other can wait.
>
>
> I also need to diagnose the use of attributes with a nested namespace
> definition,
> so I need to add the error emission and test it. ;)

Hmm, I already do that, the nested namespace definition parsing
effectively requires
an identifier. Ok, I'll give it a spin, I'll send an updated patch for
review. :)


Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 20:38, Ville Voutilainen
 wrote:
> On 18 September 2015 at 20:30, Ville Voutilainen
>  wrote:
>> On 18 September 2015 at 20:26, Jason Merrill  wrote:
> I suppose so, but it seems pretty trivial.  In any case, looks like your
> patch would accept the odd
> namespace A __attribute ((visibility ("default"))) ::B { }
 Yes, or namespace A[[nonsense]]::B {}. Those cases are easy to fix,
 but namespace [[attribute_in_proper_location]] A {} seemingly caused
 weird barfing. That's why I didn't put in the rejection of the former, I'd
 prefer
 to figure out the latter and the former at the same time, and I'd prefer
 doing
 that once the basic facility is in. Yes, partly because I'll travel
 tomorrow. :)
>>> To fix the former, you just need to keep
/* Parse any specified attributes.  */
attribs = cp_parser_attributes_opt (parser);
>>> next to the open brace.  OK with that change, I suppose the other can wait.
>>
>>
>> I also need to diagnose the use of attributes with a nested namespace
>> definition,
>> so I need to add the error emission and test it. ;)
>
> Hmm, I already do that, the nested namespace definition parsing
> effectively requires
> an identifier. Ok, I'll give it a spin, I'll send an updated patch for
> review. :)

Argh, no. An attribute immediately following a nesting namespace would need
to be parsed before the nested namespace definition handling is done, otherwise
the nested namespace definition handling is never entered because the next token
is not CPP_SCOPE. So the attributes should be parsed and rejected where they are
parsed now if they are followed by a CPP_SCOPE. That's easy, I'll just check
for non-null attribs and diagnose.


Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take

2015-09-18 Thread Manuel López-Ibáñez

On 18/09/15 18:45, Martin Sebor wrote:

but it makes me wonder how common this pattern is in portable
code and whether adding workarounds for it is the right solution
(or if it might prompt people to disable the warning, which would
be a shame).


Perhaps if we are going to warn, we could look for sizeof() and virtual 
locations in the operands, and skip the warning. It would be nice to find a 
heuristic that allows warning in most cases but skip those that appear often in 
common code.


Another alternative is to have a more heuristic version of operand_equal(): if 
two operands are equal only because of "compilation-dependent" code (macro 
expansion, sizeof, etc.), then they are not considered equal. That is, 
operand_equal_2(sizeof(long), sizeof(long)) returns true, but 
operand_equal_2(8, sizeof(long)) returns false.


I have no idea whether it is possible to implement operand_equal_2 in a sane 
way.

Cheers,

Manuel.


Re: [PATCH][RS6000] Migrate from reduc_xxx to reduc_xxx_scal optabs

2015-09-18 Thread Bill Schmidt
On Fri, 2015-09-18 at 16:39 +0100, Alan Lawrence wrote:
> This is a respin of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html
> after discovering that patch was broken on power64le - thanks to Bill Schmidt
> for pointing out that gcc112 is the opposite endianness to gcc110...
> 
> This time I decided to avoid any funny business with making RTL match other
> patterns in other .md files, and instead to directly call the relevant
> expanders. This should thus preserve the codegen of the previous expansion 
> path.
> Moreover, combining the uplus and splus expansion paths (as addition is the 
> same
> regardless of signedness) causes some additional examples to be reduced 
> directly
> via patterns.

Alan, thanks for the patch!  David will have to approve it, but this
endian-corrected version looks good to me.

Regards,
Bill

> 
> Bootstrapped + check-g{cc,++,fortran}
> on powerpc64-none-linux-gnu (--with-cpu=power7)
> and powerpc64le-none-linux-gnu (--with-cpu=power8).
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/altivec.md (reduc_splus_): Rename to...
>   (reduc_plus_scal_): ...this, add rs6000_expand_vector_extract.
>   (reduc_uplus_v16qi): Remove.
> 
>   * config/rs6000/vector.md (VEC_reduc_name): Change "splus" to "plus".
>   (reduc__v2df): Remove.
>   (reduc__v4sf): Remove.
>   (reduc__scal_): New.
> 
>   * config/rs6000/vsx.md (vsx_reduc__v2df): Declare
>   gen_ function by removing * prefix.
>   (vsx_reduc__v4sf): Likewise.
> ---
>  gcc/config/rs6000/altivec.md | 25 ++-
>  gcc/config/rs6000/vector.md  | 47 
> ++--
>  gcc/config/rs6000/vsx.md |  4 ++--
>  3 files changed, 27 insertions(+), 49 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 4170f38..93ce1f0 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2648,35 +2648,22 @@
>operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
>  })
>  
> -(define_expand "reduc_splus_"
> -  [(set (match_operand:VIshort 0 "register_operand" "=v")
> +(define_expand "reduc_plus_scal_"
> +  [(set (match_operand: 0 "register_operand" "=v")
>  (unspec:VIshort [(match_operand:VIshort 1 "register_operand" "v")]
>   UNSPEC_REDUC_PLUS))]
>"TARGET_ALTIVEC"
>  {
>rtx vzero = gen_reg_rtx (V4SImode);
>rtx vtmp1 = gen_reg_rtx (V4SImode);
> -  rtx dest = gen_lowpart (V4SImode, operands[0]);
> +  rtx vtmp2 = gen_reg_rtx (mode);
> +  rtx dest = gen_lowpart (V4SImode, vtmp2);
> +  int elt = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 : 0;
>  
>emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>emit_insn (gen_altivec_vsum4ss (vtmp1, operands[1], vzero));
>emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
> -  DONE;
> -})
> -
> -(define_expand "reduc_uplus_v16qi"
> -  [(set (match_operand:V16QI 0 "register_operand" "=v")
> -(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")]
> -   UNSPEC_REDUC_PLUS))]
> -  "TARGET_ALTIVEC"
> -{
> -  rtx vzero = gen_reg_rtx (V4SImode);
> -  rtx vtmp1 = gen_reg_rtx (V4SImode);
> -  rtx dest = gen_lowpart (V4SImode, operands[0]);
> -
> -  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
> -  emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
> -  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
> +  rs6000_expand_vector_extract (operands[0], vtmp2, elt);
>DONE;
>  })
>  
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 8821dec..d8699c8 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -78,7 +78,7 @@
>  ;; Vector reduction code iterators
>  (define_code_iterator VEC_reduc [plus smin smax])
>  
> -(define_code_attr VEC_reduc_name [(plus "splus")
> +(define_code_attr VEC_reduc_name [(plus "plus")
> (smin "smin")
> (smax "smax")])
>  
> @@ -1061,38 +1061,29 @@
>"")
>  
>  ;; Vector reduction expanders for VSX
> -
> -(define_expand "reduc__v2df"
> -  [(parallel [(set (match_operand:V2DF 0 "vfloat_operand" "")
> -(VEC_reduc:V2DF
> - (vec_concat:V2DF
> -  (vec_select:DF
> -   (match_operand:V2DF 1 "vfloat_operand" "")
> -   (parallel [(const_int 1)]))
> -  (vec_select:DF
> -   (match_dup 1)
> -   (parallel [(const_int 0)])))
> - (match_dup 1)))
> -   (clobber (match_scratch:V2DF 2 ""))])]
> -  "VECTOR_UNIT_VSX_P (V2DFmode)"
> -  "")
> -
> -; The (VEC_reduc:V4SF
> +; The (VEC_reduc:...
>  ;(op1)
> -;(unspec:V4SF [(const_int 0)] UNSPEC_REDUC))
> +;(unspec:... [(const_int 0)] UNSPEC_REDUC))
>  ;
>  ; is to allow us to use a code iterator, but not completely list all of the
>  ; vector rotates, etc. to prevent canonicalization
>  
> -(define_exp

Re: [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file

2015-09-18 Thread David Malcolm
On Mon, 2015-09-14 at 13:35 -0600, Jeff Law wrote:
> On 09/10/2015 02:28 PM, David Malcolm wrote:
> > The function "diagnostic_show_locus" gains new functionality in the
> > next patch, so this preliminary patch breaks it out into a new source
> > file, diagnostic-show-locus.c, along with a couple of related functions.
> >
> > gcc/ChangeLog:
> > * Makefile.in (OBJS-libcommon): Add diagnostic-show-locus.o.
> > * diagnostic.c (adjust_line): Move to diagnostic-show-locus.c.
> > (diagnostic_show_locus): Likewise.
> > (diagnostic_print_caret_line): Likewise.
> > * diagnostic-show-locus.c: New file.
> This is fine for the trunk.

Thanks; bootstrapped & regtested; committed to trunk as r227915.

> So much for the easy stuff :-)

FWIW, I'm working on a much simpler version of the patch kit, addressing
some of the issues already raised.

> jeff
> 




Re: [PATCH][RS6000] Migrate from reduc_xxx to reduc_xxx_scal optabs

2015-09-18 Thread David Edelsohn
On Fri, Sep 18, 2015 at 2:26 PM, Bill Schmidt
 wrote:
> On Fri, 2015-09-18 at 16:39 +0100, Alan Lawrence wrote:
>> This is a respin of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html
>> after discovering that patch was broken on power64le - thanks to Bill Schmidt
>> for pointing out that gcc112 is the opposite endianness to gcc110...
>>
>> This time I decided to avoid any funny business with making RTL match other
>> patterns in other .md files, and instead to directly call the relevant
>> expanders. This should thus preserve the codegen of the previous expansion 
>> path.
>> Moreover, combining the uplus and splus expansion paths (as addition is the 
>> same
>> regardless of signedness) causes some additional examples to be reduced 
>> directly
>> via patterns.
>
> Alan, thanks for the patch!  David will have to approve it, but this
> endian-corrected version looks good to me.

Okay with me.

Thanks, David


Re: [PATCH tree-inline] do not say "called from here" with UNKNOWN_LOCATION

2015-09-18 Thread Manuel López-Ibáñez
And now with the patch.

On 18 September 2015 at 20:40, Manuel López-Ibáñez
 wrote:
> In https://sourceware.org/ml/libc-alpha/2014-12/msg00300.html, we give a
> "called from here" note without actually having a location, which looks
> strange. I haven't been able to generate such a testcase. In this patch, we
> assert this cannot happen when checking and simply skip the extra note in
> release mode.
>
> Boot&tested on x86_64-linux-gnu
>
> OK?
>
>
> gcc/testsuite/ChangeLog:
>
> 2015-09-18  Manuel López-Ibáñez  
>
> * gcc.target/i386/inline_error.c (int bar): Use dg-message for note.
> * gcc.target/i386/pr57756.c (static __inline int caller): Likewise.
> * gcc.target/i386/pr59789.c (f1): Likewise.
> * gcc.target/i386/intrinsics_5.c (__m128i foo): Likewise.
> * gcc.target/i386/intrinsics_6.c: Likewise.
> * gcc.dg/winline-5.c (int t): Likewise.
> * gcc.dg/winline-9.c (t): Likewise.
> * gcc.dg/always_inline2.c (q): Likewise.
> * gcc.dg/winline-2.c (inline int t): Likewise.
> * gcc.dg/winline-6.c: Likewise.
> * gcc.dg/winline-10.c (void g): Likewise.
> * gcc.dg/pr49243.c (void parse): Likewise.
> * gcc.dg/always_inline3.c (q2): Likewise.
> * gcc.dg/winline-3.c: Likewise.
> * gcc.dg/winline-7.c (inline void *t): Likewise.
>
> gcc/ChangeLog:
>
> 2015-09-18  Manuel López-Ibáñez  
>
> * tree-inline.c (expand_call_inline): Use inform for extra note.
> Do not give "called from here" with UNKNOWN_LOCATION.
Index: gcc/testsuite/gcc.target/i386/inline_error.c
===
--- gcc/testsuite/gcc.target/i386/inline_error.c(revision 227880)
+++ gcc/testsuite/gcc.target/i386/inline_error.c(working copy)
@@ -7,7 +7,7 @@ foo () /* { dg-error "inlining failed in
   return 0;
 }
 
 int bar()
 {
-  return foo (); /* { dg-error "called from here" } */
+  return foo (); /* { dg-message "called from here" } */
 }
Index: gcc/testsuite/gcc.target/i386/pr57756.c
===
--- gcc/testsuite/gcc.target/i386/pr57756.c (revision 227880)
+++ gcc/testsuite/gcc.target/i386/pr57756.c (working copy)
@@ -9,11 +9,11 @@ __inline int callee () /* { dg-error "in
 }
 
 __attribute__((target("sse")))
 static __inline int caller ()
 {
-  return callee(); /* { dg-error "called from here" }  */
+  return callee(); /* { dg-message "called from here" }  */
 }
 
 int main ()
 {
   return caller();
Index: gcc/testsuite/gcc.target/i386/pr59789.c
===
--- gcc/testsuite/gcc.target/i386/pr59789.c (revision 227880)
+++ gcc/testsuite/gcc.target/i386/pr59789.c (working copy)
@@ -16,7 +16,7 @@ _mm_set_epi32 (int __q3, int __q2, int _
 
 
 __m128i
 f1(void)
 { /* { dg-message "warning: SSE vector return without SSE enabled changes the 
ABI" } */
-  return _mm_set_epi32 (0, 0, 0, 0); /* { dg-error "called from here" } */
+  return _mm_set_epi32 (0, 0, 0, 0); /* { dg-message "called from here" } */
 }
Index: gcc/testsuite/gcc.target/i386/intrinsics_5.c
===
--- gcc/testsuite/gcc.target/i386/intrinsics_5.c(revision 227880)
+++ gcc/testsuite/gcc.target/i386/intrinsics_5.c(working copy)
@@ -8,9 +8,9 @@
 
 #include 
 
 __m128i foo(__m128i *V)
 {
-return _mm_stream_load_si128(V); /* { dg-error "called from here" } */
+return _mm_stream_load_si128(V); /* { dg-message "called from here" } */
 }
 
 /* { dg-prune-output ".*inlining failed.*" }  */
Index: gcc/testsuite/gcc.target/i386/intrinsics_6.c
===
--- gcc/testsuite/gcc.target/i386/intrinsics_6.c(revision 227880)
+++ gcc/testsuite/gcc.target/i386/intrinsics_6.c(working copy)
@@ -8,9 +8,9 @@
 
 #include 
 
 __m128i foo(__m128i *V)
 {
-return _mm_stream_load_si128(V); /* { dg-error "called from here" } */
+return _mm_stream_load_si128(V); /* { dg-message "called from here" } */
 }
 
 /* { dg-prune-output ".*inlining failed.*" }  */
Index: gcc/testsuite/gcc.dg/winline-5.c
===
--- gcc/testsuite/gcc.dg/winline-5.c(revision 227880)
+++ gcc/testsuite/gcc.dg/winline-5.c(working copy)
@@ -15,7 +15,7 @@ inline int q(void) /* { dg-warning "inli
big();
big();
 }
 int t (void)
 {
-   return q (); /* { dg-warning "called from here" } */
+   return q (); /* { dg-message "called from here" } */
 }
Index: gcc/testsuite/gcc.dg/winline-9.c
===
--- gcc/testsuite/gcc.dg/winline-9.c(revision 227880)
+++ gcc/testsuite/gcc.dg/winline-9.c(working copy)
@@ -20,7 +20,7 @@ int
 t()
 {
   if (a)
 aa();
   if (b)
-bb();  /* { dg-warning "called from here" "" } */
+bb();  /

[PATCH tree-inline] do not say "called from here" with UNKNOWN_LOCATION

2015-09-18 Thread Manuel López-Ibáñez
In https://sourceware.org/ml/libc-alpha/2014-12/msg00300.html, we give a
"called from here" note without actually having a location, which looks
strange. I haven't been able to generate such a testcase. In this patch, we
assert this cannot happen when checking and simply skip the extra note in
release mode.

Boot&tested on x86_64-linux-gnu

OK?


gcc/testsuite/ChangeLog:

2015-09-18  Manuel López-Ibáñez  

* gcc.target/i386/inline_error.c (int bar): Use dg-message for note.
* gcc.target/i386/pr57756.c (static __inline int caller): Likewise.
* gcc.target/i386/pr59789.c (f1): Likewise.
* gcc.target/i386/intrinsics_5.c (__m128i foo): Likewise.
* gcc.target/i386/intrinsics_6.c: Likewise.
* gcc.dg/winline-5.c (int t): Likewise.
* gcc.dg/winline-9.c (t): Likewise.
* gcc.dg/always_inline2.c (q): Likewise.
* gcc.dg/winline-2.c (inline int t): Likewise.
* gcc.dg/winline-6.c: Likewise.
* gcc.dg/winline-10.c (void g): Likewise.
* gcc.dg/pr49243.c (void parse): Likewise.
* gcc.dg/always_inline3.c (q2): Likewise.
* gcc.dg/winline-3.c: Likewise.
* gcc.dg/winline-7.c (inline void *t): Likewise.

gcc/ChangeLog:

2015-09-18  Manuel López-Ibáñez  

* tree-inline.c (expand_call_inline): Use inform for extra note.
Do not give "called from here" with UNKNOWN_LOCATION.


Re: Openacc launch API

2015-09-18 Thread Nathan Sidwell

On 09/18/15 05:13, Bernd Schmidt wrote:


Is that so difficult though? See if nvptx ignores (let's say) intelmic arguments
in favour of the default and accepts nvptx ones.


I'm sorry, I think it is unreasonable to require support in this patch for 
something that's not yet implemented in the rest of the toolchain.  The 
likelihood of implementing it correctly without  all the other bits filled in is 
low -- there are bound to be unanticipated subtleties.



I still don't like this. I think there are at least two better alternatives: add
a new GOMP_LAUNCH_key which makes GOACC_parallel read a number of waits from a
va_list * pointer passed after it, or just admit that the legacy function always
does host fallback and just truncate the current version after


I think you're worrying about  something that will not happen in practice.  As I 
said, the most waits I've seen on code has been 2, so a limit of 8 seems 
perfectly adequate.


nathan


Re: [PATCH, RFC] Implement N4230, Nested namespace definition

2015-09-18 Thread Ville Voutilainen
On 18 September 2015 at 20:46, Ville Voutilainen
 wrote:
> Argh, no. An attribute immediately following a nesting namespace would need
> to be parsed before the nested namespace definition handling is done, 
> otherwise
> the nested namespace definition handling is never entered because the next 
> token
> is not CPP_SCOPE. So the attributes should be parsed and rejected where they 
> are
> parsed now if they are followed by a CPP_SCOPE. That's easy, I'll just check
> for non-null attribs and diagnose.

New patch attached. I added tests for the partial attribute support in
it. Ok for trunk?
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4f424b6..e9353b0 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16953,6 +16953,8 @@ cp_parser_namespace_definition (cp_parser* parser)
   tree identifier, attribs;
   bool has_visibility;
   bool is_inline;
+  cp_token* token;
+  int nested_definition_count = 0;
 
   cp_ensure_no_omp_declare_simd (parser);
   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_INLINE))
@@ -16965,7 +16967,7 @@ cp_parser_namespace_definition (cp_parser* parser)
 is_inline = false;
 
   /* Look for the `namespace' keyword.  */
-  cp_parser_require_keyword (parser, RID_NAMESPACE, RT_NAMESPACE);
+  token = cp_parser_require_keyword (parser, RID_NAMESPACE, RT_NAMESPACE);
 
   /* Get the name of the namespace.  We do not attempt to distinguish
  between an original-namespace-definition and an
@@ -16979,11 +16981,38 @@ cp_parser_namespace_definition (cp_parser* parser)
   /* Parse any specified attributes.  */
   attribs = cp_parser_attributes_opt (parser);
 
-  /* Look for the `{' to start the namespace.  */
-  cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
   /* Start the namespace.  */
   push_namespace (identifier);
 
+  /* Parse any nested namespace definition. */
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE))
+{
+  if (attribs)
+error_at (token->location, "a nested namespace definition cannot have 
attributes");
+  if (cxx_dialect < cxx1z)
+pedwarn (input_location, OPT_Wpedantic,
+ "nested namespace definitions only available with "
+ "-std=c++1z or -std=gnu++1z");
+  if (is_inline)
+error_at (token->location, "a nested namespace definition cannot be 
inline");
+  while (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE))
+{
+  cp_lexer_consume_token (parser->lexer);
+  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
+identifier = cp_parser_identifier (parser);
+  else
+{
+  cp_parser_error (parser, "nested identifier required");
+  break;
+}
+  ++nested_definition_count;
+  push_namespace (identifier);
+}
+}
+
+  /* Look for the `{' to validate starting the namespace.  */
+  cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
+
   /* "inline namespace" is equivalent to a stub namespace definition
  followed by a strong using directive.  */
   if (is_inline)
@@ -17007,6 +17036,10 @@ cp_parser_namespace_definition (cp_parser* parser)
   if (has_visibility)
 pop_visibility (1);
 
+  /* Finish the nested namespace definitions.  */
+  while (nested_definition_count--)
+pop_namespace ();
+
   /* Finish the namespace.  */
   pop_namespace ();
   /* Look for the final `}'.  */
diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def1.C 
b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def1.C
new file mode 100644
index 000..ebdb70b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def1.C
@@ -0,0 +1,19 @@
+// { dg-options "-std=c++1z" }
+
+namespace A::B::C
+{
+   struct X {};
+   namespace T::U::V { struct Y {}; }
+}
+
+A::B::C::X x;
+A::B::C::T::U::V::Y y;
+
+inline namespace D::E {} // { dg-error "cannot be inline" }
+
+namespace F::G:: {} // { dg-error "nested identifier required" }
+
+namespace G __attribute ((visibility ("default"))) ::H {} // { dg-error 
"cannot have attributes" }
+
+namespace H [[deprecated]] ::I {} // { dg-error "cannot have 
attributes|ignored" }
+
diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def2.C 
b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def2.C
new file mode 100644
index 000..c47a94a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def2.C
@@ -0,0 +1,5 @@
+// { dg-options "-std=c++11 -pedantic-errors" }
+
+namespace A::B::C // { dg-error "nested namespace definitions only available 
with" }
+{
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def3.C 
b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def3.C
new file mode 100644
index 000..f2dac8f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def3.C
@@ -0,0 +1,5 @@
+// { dg-options "-std=c++11" }
+
+namespace A::B::C
+{
+}
diff --git a/gcc/testsuite/g++.dg/lookup/name-clash5.C 
b/gcc/testsuite/g++.dg/lookup/name-clash5.C
index 74595c2..9673bb9 100644
--- a/gcc/testsuite/g+

[committed, pa] Change argument type and casts in pa_ldil_cint_p and pa_cint_ok_for_move to unsigned

2015-09-18 Thread John David Anglin
The attached change fixes a build error on trunk and makes the argument types for the various constant checks more consistent.

Tested on hppa2.0w-hp-hpux11.11 and hppa-unknown-linux-gnu.  Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2015-09-18  John David Anglin  

* config/pa/pa-protos.h (pa_cint_ok_for_move): Change argument type to
unsigned.
(pa_ldil_cint_p): Likewise.
* config/pa/pa.c (pa_cint_ok_for_move): Likewise.
(pa_ldil_cint_p): Likewise. Change signed casts to unsigned.
Update callers.
* config/pa/pa.md: Likewise.

Index: config/pa/pa-protos.h
===
--- config/pa/pa-protos.h   (revision 227798)
+++ config/pa/pa-protos.h   (working copy)
@@ -82,9 +82,9 @@
 #endif /* RTX_CODE */
 
 extern int pa_and_mask_p (unsigned HOST_WIDE_INT);
-extern int pa_cint_ok_for_move (HOST_WIDE_INT);
+extern int pa_cint_ok_for_move (unsigned HOST_WIDE_INT);
 extern int pa_ior_mask_p (unsigned HOST_WIDE_INT);
-extern int pa_ldil_cint_p (HOST_WIDE_INT);
+extern int pa_ldil_cint_p (unsigned HOST_WIDE_INT);
 extern int pa_mem_shadd_constant_p (int);
 extern int pa_shadd_constant_p (int);
 extern int pa_zdepi_cint_p (unsigned HOST_WIDE_INT);
Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 227798)
+++ config/pa/pa.c  (working copy)
@@ -707,7 +707,7 @@
 /* Accept any constant that can be moved in one instruction into a
general register.  */
 int
-pa_cint_ok_for_move (HOST_WIDE_INT ival)
+pa_cint_ok_for_move (unsigned HOST_WIDE_INT ival)
 {
   /* OK if ldo, ldil, or zdepi, can be used.  */
   return (VAL_14_BITS_P (ival)
@@ -719,11 +719,12 @@
significant 11 bits of the value must be zero and the value must
not change sign when extended from 32 to 64 bits.  */
 int
-pa_ldil_cint_p (HOST_WIDE_INT ival)
+pa_ldil_cint_p (unsigned HOST_WIDE_INT ival)
 {
-  HOST_WIDE_INT x = ival & (((HOST_WIDE_INT) -1 << 31) | 0x7ff);
+  unsigned HOST_WIDE_INT x;
 
-  return x == 0 || x == ((HOST_WIDE_INT) -1 << 31);
+  x = ival & (((unsigned HOST_WIDE_INT) -1 << 31) | 0x7ff);
+  return x == 0 || x == ((unsigned HOST_WIDE_INT) -1 << 31);
 }
 
 /* True iff zdepi can be used to generate this CONST_INT.
@@ -1858,7 +1859,7 @@
 
   if (register_operand (operand1, mode)
  || (GET_CODE (operand1) == CONST_INT
- && pa_cint_ok_for_move (INTVAL (operand1)))
+ && pa_cint_ok_for_move (UINTVAL (operand1)))
  || (operand1 == CONST0_RTX (mode))
  || (GET_CODE (operand1) == HIGH
  && !symbolic_operand (XEXP (operand1, 0), VOIDmode))
@@ -2134,7 +2135,7 @@
  operands[1] = tmp;
}
   else if (GET_CODE (operand1) != CONST_INT
-  || !pa_cint_ok_for_move (INTVAL (operand1)))
+  || !pa_cint_ok_for_move (UINTVAL (operand1)))
{
  rtx temp;
  rtx_insn *insn;
@@ -10252,7 +10253,7 @@
   && !reload_in_progress
   && !reload_completed
   && !LEGITIMATE_64BIT_CONST_INT_P (INTVAL (x))
-  && !pa_cint_ok_for_move (INTVAL (x)))
+  && !pa_cint_ok_for_move (UINTVAL (x)))
 return false;
 
   if (function_label_operand (x, mode))
Index: config/pa/pa.md
===
--- config/pa/pa.md (revision 227798)
+++ config/pa/pa.md (working copy)
@@ -5035,7 +5035,7 @@
(plus:SI (match_operand:SI 1 "register_operand" "")
 (match_operand:SI 2 "const_int_operand" "")))
(clobber (match_operand:SI 4 "register_operand" ""))]
-  "! pa_cint_ok_for_move (INTVAL (operands[2]))
+  "! pa_cint_ok_for_move (UINTVAL (operands[2]))
&& VAL_14_BITS_P (INTVAL (operands[2]) >> 1)"
   [(set (match_dup 4) (plus:SI (match_dup 1) (match_dup 2)))
(set (match_dup 0) (plus:SI (match_dup 4) (match_dup 3)))]
@@ -5054,13 +5054,13 @@
(plus:SI (match_operand:SI 1 "register_operand" "")
 (match_operand:SI 2 "const_int_operand" "")))
(clobber (match_operand:SI 4 "register_operand" ""))]
-  "! pa_cint_ok_for_move (INTVAL (operands[2]))"
+  "! pa_cint_ok_for_move (UINTVAL (operands[2]))"
   [(set (match_dup 4) (match_dup 2))
(set (match_dup 0) (plus:SI (ashift:SI (match_dup 4) (match_dup 3))
   (match_dup 1)))]
   "
 {
-  HOST_WIDE_INT intval = INTVAL (operands[2]);
+  unsigned HOST_WIDE_INT intval = UINTVAL (operands[2]);
 
   /* Try dividing the constant by 2, then 4, and finally 8 to see
  if we can get a constant which can be loaded into a register


[PATCH] avail_exprs is no longer file scoped

2015-09-18 Thread Jeff Law


So the next step in fixing 47679 is to continue the process of allowing 
tree-ssa-threadedge to modify the expression hash tables in well defined 
ways.  The key is we want to be able to call 
record_temporary_equivalences in the threader.


record_temporary_equivalences will modify the const_and_copies_stack by 
way of record_equality and avail_exprs/avail_exprs_stack by way of 
record_cond.


record_equality/record_cond and their children currently have access to 
those objects by way of file scoped static variables.  While this works, 
it seems horribly unclean.


Instead we need to be passing suitable objects into 
record_temporary_equivalences and more generally, we should eliminate 
the file scoped statics that allow access to those 3 key objects.


This is the first of 3 patches to eliminate those file scoped statics. 
This one eliminates avail_exprs (by far the easiest).  It's already 
stuffed into the avail_exprs_stack object.  Just adding an accessor 
method is all it takes.


This slightly cleans up the debug/dump support along the way and fixes 
some nitty whitespace stuff.


Bootstrapped and regression tested on x86_64-linux-gnu.  Installed on 
the trunk.


Jeff


* tree-ssa-dom.c (avail_exprs): No longer file scoped.  Bury
it into the avail_exprs_stack class.
(pass_dominator::execute): Corresponding changes to declaration
and initialization of avail_exprs.  Pass avail_exprs to
dump_dominator_optimization_stats.
(record_cond): Extract avail_exprs from avail_exprs_stack.
(lookup_avail_expr): Similarly.
(htab_statistics): Remove unnecessary prototype.  Move to earlier
position in file.
(dump_dominator_optimization_stats): Make static and prototype.
Add argument for the hash table to dump.
(debug_dominator_optimization_stats): Remove.
* tree-ssa-dom.h (dump_dominator_optimization_stats): Remove
prototype.
(debug_dominator_optimization_stats): Similarly.
* tree-ssa-scopedtables.h (class avail_exprs_stack): Add missing
"void" in prototype for pop_to_marker method.  Add accessor method
for the underlying avail_exprs table.

* tree-ssa-threadedge.c: Remove trailing whitespace.


diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 963dea9..bf5e8a1 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -78,15 +78,6 @@ struct edge_info
   vec<cond_equivalence> cond_equivalences;
 };
 
-/* Hash table with expressions made available during the renaming process.
-   When an assignment of the form X_i = EXPR is found, the statement is
-   stored in this table.  If the same expression EXPR is later found on the
-   RHS of another statement, it is replaced with X_i (thus performing
-   global redundancy elimination).  Similarly as we pass through conditionals
-   we record the conditional itself as having either a true or false value
-   in this table.  */
-static hash_table<expr_elt_hasher> *avail_exprs;
-
 /* Unwindable equivalences, both const/copy and expression varieties.  */
 static const_and_copies *const_and_copies;
 static avail_exprs_stack *avail_exprs_stack;
@@ -114,8 +105,6 @@ static struct opt_stats_d opt_stats;
 /* Local functions.  */
 static void optimize_stmt (basic_block, gimple_stmt_iterator);
 static tree lookup_avail_expr (gimple, bool);
-static void htab_statistics (FILE *,
-			     const hash_table<expr_elt_hasher> &);
 static void record_cond (cond_equivalence *);
 static void record_equality (tree, tree);
 static void record_equivalences_from_phis (basic_block);
@@ -123,6 +112,9 @@ static void record_equivalences_from_incoming_edge (basic_block);
 static void eliminate_redundant_computations (gimple_stmt_iterator *);
 static void record_equivalences_from_stmt (gimple, int);
 static edge single_incoming_edge_ignoring_loop_edges (basic_block);
+static void dump_dominator_optimization_stats (FILE *file,
+					       hash_table<expr_elt_hasher> *);
+
 
 /* Free the edge_info data attached to E, if it exists.  */
 
@@ -548,7 +540,8 @@ pass_dominator::execute (function *fun)
   memset (&opt_stats, 0, sizeof (opt_stats));
 
   /* Create our hash tables.  */
-  avail_exprs = new hash_table<expr_elt_hasher> (1024);
+  hash_table<expr_elt_hasher> *avail_exprs
+    = new hash_table<expr_elt_hasher> (1024);
   avail_exprs_stack = new class avail_exprs_stack (avail_exprs);
   const_and_copies = new class const_and_copies ();
   need_eh_cleanup = BITMAP_ALLOC (NULL);
@@ -671,7 +664,7 @@ pass_dominator::execute (function *fun)
 
   /* Debugging dumps.  */
   if (dump_file && (dump_flags & TDF_STATS))
-dump_dominator_optimization_stats (dump_file);
+dump_dominator_optimization_stats (dump_file, avail_exprs);
 
   loop_optimizer_finalize ();
 
@@ -1008,10 +1001,22 @@ record_equivalences_from_incoming_edge (basic_block bb)
 record_temporary_equivalences (e);
 }
 
+/* Dump statistics for the hash table HTAB.  */
+
+static void
+htab_statistics (FILE *file, const hash_table<expr_elt_hasher> &htab)
+{
