Re: [PATCH] Simplify vector compare-not-select sequence

2015-08-02 Thread Andreas Schwab
Bill Schmidt  writes:

>   * gcc.target/powerpc/vec-cmp-sel.c: New test.

FAIL: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors)
Excess errors:
/daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:14:1: 
error: use of 'long long' in AltiVec types is invalid without -mvsx
/daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:15:3: 
error: use of 'long long' in AltiVec types is invalid without -mvsx
/daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:16:3: 
error: use of 'long long' in AltiVec types is invalid without -mvsx
/daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:3: 
error: use of 'long long' in AltiVec types is invalid without -mvsx
/daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:29: 
error: incompatible types when initializing type '__vector unsigned long long' 
using type '__vector __bool long long'
/daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:18:3: 
error: use of 'long long' in AltiVec types is invalid without -mvsx

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


RE: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-08-02 Thread Kumar, Venkataramanan
Hi Jakub, 

Thank you for reviewing the patch.

I have incorporated your comments in the attached patch.


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, July 29, 2015 1:24 AM
> To: Kumar, Venkataramanan
> Cc: Richard Beiner (richard.guent...@gmail.com); gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with
> power 2 integer constant.
> 
> Hi!
> 
> > This is just an initial patch and tries to optimize integer type power
> > 2 constants.  I wanted to get feedback on this .  I bootstrapped and
> > reg tested on aarch64-none-linux-gnu .
> 
> Thanks for working on it.
> ChangeLog entry for the patch is missing, probably also some testcases.
> 
> > @@ -90,6 +94,7 @@ static vect_recog_func_ptr
> vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
> > vect_recog_rotate_pattern,
> > vect_recog_vector_vector_shift_pattern,
> > vect_recog_divmod_pattern,
> > +vect_recog_multconst_pattern,
> > vect_recog_mixed_size_cond_pattern,
> > vect_recog_bool_pattern};
> 
> Please watch formatting, the other lines are tab indented, so please use a
> tab rather than 8 spaces.
> 
> > @@ -2147,6 +2152,87 @@ vect_recog_vector_vector_shift_pattern
> (vec *stmts,
> >return pattern_stmt;
> >  }
> >
> 
> Function comment is missing here.
> 
> > +static gimple
> > +vect_recog_multconst_pattern (vec *stmts,
> > +   tree *type_in, tree *type_out)
> 
> About the function name, wonder if just vect_recog_mult_pattern wouldn't
> be enough.
> 
> > +  rhs_code = gimple_assign_rhs_code (last_stmt);  switch (rhs_code)
> > +{
> > +case MULT_EXPR:
> > +  break;
> > +default:
> > +  return NULL;
> > +}
> 
> This looks too weird, I'd just do
>   if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
> return NULL;
> (you handle just one pattern).
> 
> > +  /* If the target can handle vectorized multiplication natively,
> > + don't attempt to optimize this.  */  optab = optab_for_tree_code
> > + (rhs_code, vectype, optab_default);
> 
> Supposedly you can use MULT_EXPR directly here.
> 
> > +  /* If target cannot handle vector left shift then we cannot
> > + optimize and bail out.  */
> > +  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
> > + if (!optab
> > +  || optab_handler (optab, TYPE_MODE (vectype)) ==
> CODE_FOR_nothing)
> > +return NULL;
> > +
> > +  if (integer_pow2p (oprnd1))
> > +{
> > +  /* Pattern detected.  */
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_NOTE, vect_location,
> > +"vect_recog_multconst_pattern: detected:\n");
> > +
> > +  tree shift;
> > +  shift = build_int_cst (itype, tree_log2 (oprnd1));
> > +  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var
> (itype, NULL),
> > + LSHIFT_EXPR, oprnd0, shift);
> > +  if (dump_enabled_p ())
> > +   dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM,
> pattern_stmt,
> > +  0);
> > +  stmts->safe_push (last_stmt);
> > +  *type_in = vectype;
> > +  *type_out = vectype;
> > +  return pattern_stmt;
> > +}
> 
> Trailing whitespace.
> The integer_pow2p case (have you checked signed multiply by INT_MIN?) is
> only one of the cases you can actually handle, you can look at expand_mult
> for many other cases - e.g. multiplication by negated powers of 2, or call
> choose_mult_variant and handle whatever it returns.

I have added patterns that detect mults with power of 2 constants and convert 
them to shifts in this patch.
I will do a follow up patch for other variants as done in "expand_mult" and 
"choose_mult_variant".

INT_MIN case, I have not yet checked. 

 I thought of adding like this. 
if (wi::eq_p (oprnd1, HOST_WIDE_INT_MIN))
return NULL;

But wanted to confirm with you since I don't understand  this clearly.  Is this 
because Var * INT_MIN !=  Var << 31  ?

 I tried this case for "int" or long int type arr  

void test_vector_shifts()
{
for(i=0; i<=99;i++)
arr[i]=arr[i]*INT_MIN;
}

Int type-  Looks like  without my patch aarch64 vectorizes it using shifts.
ldr q0, [x0]
shl v0.4s, v0.4s, 31
str q0, [x0], 16

For long int  type -  without my patch it converts to shifts but vectorization 
did not happen.
ldr x1, [x0]
neg x1, x1, lsl 31
str x1, [x0], 8

> 
>   Jakub

gcc/ChangeLog
2015-08-02  Venkataramanan Kumar  
 * tree-vect-patterns.c (vect_recog_mult_pattern): New function for 
vectorizing
multiplication patterns.
 * tree-vectorizer.h: Adjust the number of patterns.

gcc/testsuite/ChangeLog
2015-08-02  Venkataramanan Kumar  
 * gcc.dg/vect/vect-mult-patterns.c: New
 

Regards,
Venkat.
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c 
b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c
new file mode 

[PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI

2015-08-02 Thread Patrick Palka
In the uninit pass, the function is_use_properly_guarded is called
consecutively by find_uninit_use, each time being passed the same PHI
parameter, and each time we needlessly compute, simplify and normalize
the same defining predicate-chain of that PHI.

This patch fixes this inefficiency by tweaking is_use_properly_guarded
and its callers to memoize the computation of def_preds.  This should
make the pass less expensive when analyzing a PHI that has multiple
uses.

Bootstrapped and regtested on x86_64 Linux with no regressions, OK to commit?

gcc/ChangeLog:

* gcc/tree-ssa-uninit.c (find_uninit_use): Declare and pass to
is_use_properly_guarded the variable def_preds.  Free its
contents before returning.
(prune_uninit_phi_opnds_in_unrealizable_paths): Same.
(is_use_properly_guarded): Replace local variable def_preds with
a parameter.  Adjust accordingly.  Only update *def_preds if it's
empty.
---
 gcc/tree-ssa-uninit.c | 69 +--
 1 file changed, 45 insertions(+), 24 deletions(-)

diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 0ed05e1..e2ac902 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -980,6 +980,7 @@ is_use_properly_guarded (gimple use_stmt,
  basic_block use_bb,
  gphi *phi,
  unsigned uninit_opnds,
+pred_chain_union *def_preds,
  hash_set *visited_phis);
 
 /* Returns true if all uninitialized opnds are pruned. Returns false
@@ -1098,14 +1099,19 @@ prune_uninit_phi_opnds_in_unrealizable_paths (gphi *phi,
   edge opnd_edge;
   unsigned uninit_opnds2
   = compute_uninit_opnds_pos (opnd_def_phi);
+  pred_chain_union def_preds = vNULL;
+  bool ok;
   gcc_assert (!MASK_EMPTY (uninit_opnds2));
   opnd_edge = gimple_phi_arg_edge (phi, i);
-  if (!is_use_properly_guarded (phi,
-opnd_edge->src,
-opnd_def_phi,
-uninit_opnds2,
-visited_phis))
-  return false;
+  ok = is_use_properly_guarded (phi,
+   opnd_edge->src,
+   opnd_def_phi,
+   uninit_opnds2,
+   &def_preds,
+   visited_phis);
+ destroy_predicate_vecs (def_preds);
+ if (!ok)
+   return false;
 }
   else
 return false;
@@ -2158,23 +2164,31 @@ normalize_preds (pred_chain_union preds, gimple 
use_or_def, bool is_use)
true if it can be determined that the use of PHI's def in
USE_STMT is guarded with a predicate set not overlapping with
predicate sets of all runtime paths that do not have a definition.
+
Returns false if it is not or it can not be determined. USE_BB is
the bb of the use (for phi operand use, the bb is not the bb of
-   the phi stmt, but the src bb of the operand edge). UNINIT_OPNDS
-   is a bit vector. If an operand of PHI is uninitialized, the
+   the phi stmt, but the src bb of the operand edge).
+
+   UNINIT_OPNDS is a bit vector. If an operand of PHI is uninitialized, the
corresponding bit in the vector is 1.  VISIED_PHIS is a pointer
-   set of phis being visted.  */
+   set of phis being visted.
+
+   *DEF_PREDS contains the (memoized) defining predicate chains of PHI.
+   If *DEF_PREDS is the empty vector, the defining predicate chains of
+   PHI will be computed and stored into *DEF_PREDS as needed.
+
+   VISITED_PHIS is a pointer set of phis being visited.  */
 
 static bool
 is_use_properly_guarded (gimple use_stmt,
  basic_block use_bb,
  gphi *phi,
  unsigned uninit_opnds,
+pred_chain_union *def_preds,
  hash_set *visited_phis)
 {
   basic_block phi_bb;
   pred_chain_union preds = vNULL;
-  pred_chain_union def_preds = vNULL;
   bool has_valid_preds = false;
   bool is_properly_guarded = false;
 
@@ -2205,25 +2219,26 @@ is_use_properly_guarded (gimple use_stmt,
   return true;
 }
 
-  has_valid_preds = find_def_preds (&def_preds, phi);
-
-  if (!has_valid_preds)
+  if (def_preds->is_empty ())
 {
-  destroy_predicate_vecs (preds);
-  destroy_predicate_vecs (def_preds);
-  return false;
+  has_valid_preds = find_def_preds (def_preds, phi);
+
+  if (!has_valid_preds)
+   {
+ destroy_predicate_vecs (preds);
+ return false;
+   }
+
+  simplify_preds (def_preds, phi, false);
+  *def_preds = normalize_preds (*def

Re: [PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI

2015-08-02 Thread Bernhard Reutner-Fischer
On August 2, 2015 8:18:51 PM GMT+02:00, Patrick Palka  
wrote:

No comment on the patch itself, but please 
s/visted/visited/
s/VISIED_PHIS/VISITED_PHIS/

while at it.
TIA,



Re: [PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI

2015-08-02 Thread Patrick Palka
On Sun, Aug 2, 2015 at 2:44 PM, Bernhard Reutner-Fischer
 wrote:
> On August 2, 2015 8:18:51 PM GMT+02:00, Patrick Palka  
> wrote:
>
> No comment on the patch itself, but please
> s/visted/visited/
> s/VISIED_PHIS/VISITED_PHIS/
>
> while at it.
> TIA,
>

Sure, no problem.


Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-08-02 Thread Martin Sebor

OK for the trunk.  Sorry for the delay.


Thank you. Committed in revision 226480.

Martin



[gomp4] expand acc_on_device earlier

2015-08-02 Thread Nathan Sidwell
I've committed this to gomp4 branch.  It expands the acc_on_device builtin 
earlier in the new oacc_xform pass.  This will allow more optimization earlier on.


The existing expansion point now only needs to deal with the host-side case.

nathan
2015-08-02  Nathan Sidwell  

	gcc/
	* omp-low.c (oacc_xform_on_device): New function.
	(execute_oacc_transform): Use get_oacc_fn_attrib.  Call
	oacc_xform_on_device.
	* builtins.c (expand_builtin_on_device): Only expect to be
	expanded on host compiler.

	libgcc/
	* config/nvptx/comp-acc_on_device.c: Include gomp-constants.h.
	(acc_on_device): Code directly here.

	libgomp/
	* openacc.h (acc_on_device): Take int and explain why.
	* oacc-init.c (acc_on_device): Likewise.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 226462)
+++ gcc/omp-low.c	(working copy)
@@ -14510,29 +14510,65 @@ make_pass_late_lower_omp (gcc::context *
   return new pass_late_lower_omp (ctxt);
 }
 
+/* Transform an acc_on_device call.  The std requires this folded at
+   compile time for constant operands.  We always fold it.  In an
+   offloaded function we're never 'none'.  We cannot detect
+   host_nonshm here, as that's a dynamic feature of the runtime.
+   However, users shouldn't be using host_nonshm anyway, only the
+   test harness.  */
+
+static void
+oacc_xform_on_device (gimple_stmt_iterator *gsi, gimple stmt)
+{
+  tree arg = gimple_call_arg (stmt, 0);
+  unsigned val = GOMP_DEVICE_HOST;
+	  
+#ifdef ACCEL_COMPILER
+  val = GOMP_DEVICE_NOT_HOST;
+#endif
+  tree result = build2 (EQ_EXPR, boolean_type_node, arg,
+			build_int_cst (integer_type_node, val));
+#ifdef ACCEL_COMPILER
+  {
+tree dev  = build2 (EQ_EXPR, boolean_type_node, arg,
+			build_int_cst (integer_type_node,
+   ACCEL_COMPILER_acc_device));
+result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
+  }
+#endif
+  result = fold_convert (integer_type_node, result);
+  tree lhs = gimple_call_lhs (stmt);
+  gimple_seq replace = NULL;
+
+  push_gimplify_context (true);
+  gimplify_assign (lhs, result, &replace);
+  pop_gimplify_context (NULL);
+  gsi_replace_with_seq (gsi, replace, false);
+}
+
 /* Main entry point for oacc transformations which run on the device
-   compiler.  */
+   compilerafter LTO, so we know what the target device is at this
+   point (including the host fallback).  */
 
 static unsigned int
 execute_oacc_transform ()
 {
   basic_block bb;
-  gimple_stmt_iterator gsi;
-  gimple stmt;
 
-  if (!lookup_attribute ("oacc function",
-			 DECL_ATTRIBUTES (current_function_decl)))
+  if (!get_oacc_fn_attrib (current_function_decl))
 return 0;
 
-
   FOR_ALL_BB_FN (bb, cfun)
 {
-  gsi = gsi_start_bb (bb);
-
-  while (!gsi_end_p (gsi))
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
 	{
-	  stmt = gsi_stmt (gsi);
-	  gsi_next (&gsi);
+	  gimple stmt = gsi_stmt (gsi);
+
+	  /* acc_on_device must be evaluated at compile time for
+	 constant arguments.  */
+	  if (gimple_call_builtin_p (stmt, BUILT_IN_ACC_ON_DEVICE))
+	oacc_xform_on_device (&gsi, stmt);
 	}
 }
 
Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 226462)
+++ gcc/builtins.c	(working copy)
@@ -5880,43 +5880,39 @@ expand_stack_save (void)
 }
 
 
-/* Expand OpenACC acc_on_device.
-
-   This has to happen late (that is, not in early folding; expand_builtin_*,
-   rather than fold_builtin_*), as we have to act differently for host and
-   acceleration device (ACCEL_COMPILER conditional).  */
+/* Expand OpenACC acc_on_device.  This is expanded in the openacc
+   transform pass, but if the user has this outside of an offloaded
+   region, we'll find it here.  In that case we must be host or none.  */
 
 static rtx
 expand_builtin_acc_on_device (tree exp, rtx target)
 {
+#ifdef ACCEL_COMPILER
+  gcc_unreachable ();
+#else
+  gcc_assert (!get_oacc_fn_attrib (current_function_decl));
+  
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
 
   tree arg = CALL_EXPR_ARG (exp, 0);
-
-  /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
-  machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
-  rtx v = expand_normal (arg), v1, v2;
-#ifdef ACCEL_COMPILER
-  v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
-  v2 = GEN_INT (ACCEL_COMPILER_acc_device);
-#else
-  v1 = GEN_INT (GOMP_DEVICE_NONE);
-  v2 = GEN_INT (GOMP_DEVICE_HOST);
-#endif
+  rtx val = expand_normal (arg);
   machine_mode target_mode = TYPE_MODE (integer_type_node);
   if (!target || !register_operand (target, target_mode))
 target = gen_reg_rtx (target_mode);
   emit_move_insn (target, const1_rtx);
   rtx_code_label *done_label = gen_label_rtx ();
-  do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX,
+  do_compare_rtx_and_jump (val, GEN_INT (GOMP_DEVICE_HOST), EQ,
+			   false, GET_MODE (val), NULL_RTX,
 			   NULL, done_label, PROB_EVE

Re: [C++/66443] virtual base of abstract class

2015-08-02 Thread Jason Merrill

On 08/01/2015 07:31 PM, Nathan Sidwell wrote:

Ok, this patch fixes things up.  The previous version was a little too
lax, extending the logic of DR1611 to all synthesized functions.
However, this broke virtual synthesized dtors, in that an abstract
class's synthesized dtor's exception specification would not take
account of any virtual base dtor exception specs.  This would mean that
a non-abstract derived class's synthesized dtor might end up with a
throwing exception spec (because the virtual base's dtor did), and that
would be looser than the exception spec on the abstract base's
non-callable synthesized dtor.  And that fails the virtual overriding
checks.


It seems to me that DR 1658 ignores vbases of abstract classes for 
determining whether a destructor is deleted, but says nothing about 
exception specifications.


DR 1351 specifically ignores vbases of abstract classes for determining 
the exception specification of a constructor, but only for constructors.


So I think that for destructors we want to walk the base, but pass in a 
fake delete_p.


Why the check for inherited_parms?  I would think that inheriting 
constructors would be handled like other ctors.


Jason



Re: C++ delayed folding branch review

2015-08-02 Thread Jason Merrill

On 07/31/2015 05:54 PM, Kai Tietz wrote:

The "STRIP_NOPS-requirement in 'reduced_constant_expression_p'" I could remove, 
but for one case in constexpr.  Without folding we don't do type-sinking/raising.


Right.


So binary/unary operations might be containing cast, which were in the past 
unexpected.


Why aren't the casts folded away?


On verify_constant we check by reduced_constant_expression_p, if value is a 
constant.  We don't handle here, that NOP_EXPRs are something we want to look 
through here, as it doesn't change anything if this is a constant, or not.


NOPs around constants should have been folded away by the time we get there.

Jason