Re: [PATCH] Simplify vector compare-not-select sequence
Bill Schmidt writes: > * gcc.target/powerpc/vec-cmp-sel.c: New test. FAIL: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors) Excess errors: /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:14:1: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:15:3: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:16:3: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:3: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:29: error: incompatible types when initializing type '__vector unsigned long long' using type '__vector __bool long long' /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:18:3: error: use of 'long long' in AltiVec types is invalid without -mvsx Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
RE: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.
Hi Jakub, Thank you for reviewing the patch. I have incorporated your comments in the attached patch. > -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Wednesday, July 29, 2015 1:24 AM > To: Kumar, Venkataramanan > Cc: Richard Beiner (richard.guent...@gmail.com); gcc-patches@gcc.gnu.org > Subject: Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with > power 2 integer constant. > > Hi! > > > This is just an initial patch and tries to optimize integer type power > > 2 constants. I wanted to get feedback on this . I bootstrapped and > > reg tested on aarch64-none-linux-gnu . > > Thanks for working on it. > ChangeLog entry for the patch is missing, probably also some testcases. > > > @@ -90,6 +94,7 @@ static vect_recog_func_ptr > vect_vect_recog_func_ptrs[NUM_PATTERNS] = { > > vect_recog_rotate_pattern, > > vect_recog_vector_vector_shift_pattern, > > vect_recog_divmod_pattern, > > +vect_recog_multconst_pattern, > > vect_recog_mixed_size_cond_pattern, > > vect_recog_bool_pattern}; > > Please watch formatting, the other lines are tab indented, so please use a > tab rather than 8 spaces. > > > @@ -2147,6 +2152,87 @@ vect_recog_vector_vector_shift_pattern > (vec *stmts, > >return pattern_stmt; > > } > > > > Function comment is missing here. > > > +static gimple > > +vect_recog_multconst_pattern (vec *stmts, > > + tree *type_in, tree *type_out) > > About the function name, wonder if just vect_recog_mult_pattern wouldn't > be enough. > > > + rhs_code = gimple_assign_rhs_code (last_stmt); switch (rhs_code) > > +{ > > +case MULT_EXPR: > > + break; > > +default: > > + return NULL; > > +} > > This looks too weird, I'd just do > if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR) > return NULL; > (you handle just one pattern). > > > + /* If the target can handle vectorized multiplication natively, > > + don't attempt to optimize this. */ optab = optab_for_tree_code > > + (rhs_code, vectype, optab_default); > > Supposedly you can use MULT_EXPR directly here. > > > + /* If target cannot handle vector left shift then we cannot > > + optimize and bail out. */ > > + optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector); > > + if (!optab > > + || optab_handler (optab, TYPE_MODE (vectype)) == > CODE_FOR_nothing) > > +return NULL; > > + > > + if (integer_pow2p (oprnd1)) > > +{ > > + /* Pattern detected. */ > > + if (dump_enabled_p ()) > > + dump_printf_loc (MSG_NOTE, vect_location, > > +"vect_recog_multconst_pattern: detected:\n"); > > + > > + tree shift; > > + shift = build_int_cst (itype, tree_log2 (oprnd1)); > > + pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var > (itype, NULL), > > + LSHIFT_EXPR, oprnd0, shift); > > + if (dump_enabled_p ()) > > + dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM, > pattern_stmt, > > + 0); > > + stmts->safe_push (last_stmt); > > + *type_in = vectype; > > + *type_out = vectype; > > + return pattern_stmt; > > +} > > Trailing whitespace. > The integer_pow2p case (have you checked signed multiply by INT_MIN?) is > only one of the cases you can actually handle, you can look at expand_mult > for many other cases - e.g. multiplication by negated powers of 2, or call > choose_mult_variant and handle whatever it returns. I have added patterns that detect mults with power of 2 constants and convert them to shifts in this patch. I will do a follow up patch for other variants as done in "expand_mult" and "choose_mult_variant". INT_MIN case, I have not yet checked. I thought of adding like this. if (wi::eq_p (oprnd1, HOST_WIDE_INT_MIN)) return NULL; But wanted to confirm with you since I don't understand this clearly. Is this because Var * INT_MIN != Var << 31 ? I tried this case for "int" or long int type arr void test_vector_shifts() { for(i=0; i<=99;i++) arr[i]=arr[i]*INT_MIN; } Int type- Looks like without my patch aarch64 vectorizes it using shifts. ldr q0, [x0] shl v0.4s, v0.4s, 31 str q0, [x0], 16 For long int type - without my patch it converts to shifts but vectorization did not happen. ldr x1, [x0] neg x1, x1, lsl 31 str x1, [x0], 8 > > Jakub gcc/ChangeLog 2015-08-02 Venkataramanan Kumar * tree-vect-patterns.c (vect_recog_mult_pattern): New function for vectorizing multiplication patterns. * tree-vectorizer.h: Adjust the number of patterns. gcc/testsuite/ChangeLog 2015-08-02 Venkataramanan Kumar * gcc.dg/vect/vect-mult-patterns.c: New Regards, Venkat. diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c new file mode
[PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI
In the uninit pass, the function is_use_properly_guarded is called consecutively by find_uninit_use, each time being passed the same PHI parameter, and each time we needlessly compute, simplify and normalize the same defining predicate-chain of that PHI. This patch fixes this inefficiency by tweaking is_use_properly_guarded and its callers to memoize the computation of def_preds. This should make the pass less expensive when analyzing a PHI that has multiple uses. Bootstrapped and regtested on x86_64 Linux with no regressions, OK to commit? gcc/ChangeLog: * gcc/tree-ssa-uninit.c (find_uninit_use): Declare and pass to is_use_properly_guarded the variable def_preds. Free its contents before returning. (prune_uninit_phi_opnds_in_unrealizable_paths): Same. (is_use_properly_guarded): Replace local variable def_preds with a parameter. Adjust accordingly. Only update *def_preds if it's empty. --- gcc/tree-ssa-uninit.c | 69 +-- 1 file changed, 45 insertions(+), 24 deletions(-) diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c index 0ed05e1..e2ac902 100644 --- a/gcc/tree-ssa-uninit.c +++ b/gcc/tree-ssa-uninit.c @@ -980,6 +980,7 @@ is_use_properly_guarded (gimple use_stmt, basic_block use_bb, gphi *phi, unsigned uninit_opnds, +pred_chain_union *def_preds, hash_set *visited_phis); /* Returns true if all uninitialized opnds are pruned. Returns false @@ -1098,14 +1099,19 @@ prune_uninit_phi_opnds_in_unrealizable_paths (gphi *phi, edge opnd_edge; unsigned uninit_opnds2 = compute_uninit_opnds_pos (opnd_def_phi); + pred_chain_union def_preds = vNULL; + bool ok; gcc_assert (!MASK_EMPTY (uninit_opnds2)); opnd_edge = gimple_phi_arg_edge (phi, i); - if (!is_use_properly_guarded (phi, -opnd_edge->src, -opnd_def_phi, -uninit_opnds2, -visited_phis)) - return false; + ok = is_use_properly_guarded (phi, + opnd_edge->src, + opnd_def_phi, + uninit_opnds2, + &def_preds, + visited_phis); + destroy_predicate_vecs (def_preds); + if (!ok) + return false; } else return false; @@ -2158,23 +2164,31 @@ normalize_preds (pred_chain_union preds, gimple use_or_def, bool is_use) true if it can be determined that the use of PHI's def in USE_STMT is guarded with a predicate set not overlapping with predicate sets of all runtime paths that do not have a definition. + Returns false if it is not or it can not be determined. USE_BB is the bb of the use (for phi operand use, the bb is not the bb of - the phi stmt, but the src bb of the operand edge). UNINIT_OPNDS - is a bit vector. If an operand of PHI is uninitialized, the + the phi stmt, but the src bb of the operand edge). + + UNINIT_OPNDS is a bit vector. If an operand of PHI is uninitialized, the corresponding bit in the vector is 1. VISIED_PHIS is a pointer - set of phis being visted. */ + set of phis being visted. + + *DEF_PREDS contains the (memoized) defining predicate chains of PHI. + If *DEF_PREDS is the empty vector, the defining predicate chains of + PHI will be computed and stored into *DEF_PREDS as needed. + + VISITED_PHIS is a pointer set of phis being visited. */ static bool is_use_properly_guarded (gimple use_stmt, basic_block use_bb, gphi *phi, unsigned uninit_opnds, +pred_chain_union *def_preds, hash_set *visited_phis) { basic_block phi_bb; pred_chain_union preds = vNULL; - pred_chain_union def_preds = vNULL; bool has_valid_preds = false; bool is_properly_guarded = false; @@ -2205,25 +2219,26 @@ is_use_properly_guarded (gimple use_stmt, return true; } - has_valid_preds = find_def_preds (&def_preds, phi); - - if (!has_valid_preds) + if (def_preds->is_empty ()) { - destroy_predicate_vecs (preds); - destroy_predicate_vecs (def_preds); - return false; + has_valid_preds = find_def_preds (def_preds, phi); + + if (!has_valid_preds) + { + destroy_predicate_vecs (preds); + return false; + } + + simplify_preds (def_preds, phi, false); + *def_preds = normalize_preds (*def
Re: [PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI
On August 2, 2015 8:18:51 PM GMT+02:00, Patrick Palka wrote: No comment on the patch itself, but please s/visted/visited/ s/VISIED_PHIS/VISITED_PHIS/ while at it. TIA,
Re: [PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI
On Sun, Aug 2, 2015 at 2:44 PM, Bernhard Reutner-Fischer wrote: > On August 2, 2015 8:18:51 PM GMT+02:00, Patrick Palka > wrote: > > No comment on the patch itself, but please > s/visted/visited/ > s/VISIED_PHIS/VISITED_PHIS/ > > while at it. > TIA, > Sure, no problem.
Re: [PATCH] warn for unsafe calls to __builtin_return_address
OK for the trunk. Sorry for the delay. Thank you. Committed in revision 226480. Martin
[gomp4] expand acc_on_device earlier
I've committed this to gomp4 branch. It expands the acc_on_device builtin earlier in the new oacc_xform pass. This will allow more optimization earlier on. The existing expansion point now only needs to deal with the host-side case. nathan 2015-08-02 Nathan Sidwell gcc/ * omp-low.c (oacc_xform_on_device): New function. (execute_oacc_transform): Use get_oacc_fn_attrib. Call oacc_xform_on_device. * builtins.c (expand_builtin_on_device): Only expect to be expanded on host compiler. libgcc/ * config/nvptx/comp-acc_on_device.c: Include gomp-constants.h. (acc_on_device): Code directly here. libgomp/ * openacc.h (acc_on_device): Take int and explain why. * oacc-init.c (acc_on_device): Likewise. Index: gcc/omp-low.c === --- gcc/omp-low.c (revision 226462) +++ gcc/omp-low.c (working copy) @@ -14510,29 +14510,65 @@ make_pass_late_lower_omp (gcc::context * return new pass_late_lower_omp (ctxt); } +/* Transform an acc_on_device call. The std requires this folded at + compile time for constant operands. We always fold it. In an + offloaded function we're never 'none'. We cannot detect + host_nonshm here, as that's a dynamic feature of the runtime. + However, users shouldn't be using host_nonshm anyway, only the + test harness. */ + +static void +oacc_xform_on_device (gimple_stmt_iterator *gsi, gimple stmt) +{ + tree arg = gimple_call_arg (stmt, 0); + unsigned val = GOMP_DEVICE_HOST; + +#ifdef ACCEL_COMPILER + val = GOMP_DEVICE_NOT_HOST; +#endif + tree result = build2 (EQ_EXPR, boolean_type_node, arg, + build_int_cst (integer_type_node, val)); +#ifdef ACCEL_COMPILER + { +tree dev = build2 (EQ_EXPR, boolean_type_node, arg, + build_int_cst (integer_type_node, + ACCEL_COMPILER_acc_device)); +result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev); + } +#endif + result = fold_convert (integer_type_node, result); + tree lhs = gimple_call_lhs (stmt); + gimple_seq replace = NULL; + + push_gimplify_context (true); + gimplify_assign (lhs, result, &replace); + pop_gimplify_context (NULL); + gsi_replace_with_seq (gsi, replace, false); +} + /* Main entry point for oacc transformations which run on the device - compiler. */ + compilerafter LTO, so we know what the target device is at this + point (including the host fallback). */ static unsigned int execute_oacc_transform () { basic_block bb; - gimple_stmt_iterator gsi; - gimple stmt; - if (!lookup_attribute ("oacc function", - DECL_ATTRIBUTES (current_function_decl))) + if (!get_oacc_fn_attrib (current_function_decl)) return 0; - FOR_ALL_BB_FN (bb, cfun) { - gsi = gsi_start_bb (bb); - - while (!gsi_end_p (gsi)) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) { - stmt = gsi_stmt (gsi); - gsi_next (&gsi); + gimple stmt = gsi_stmt (gsi); + + /* acc_on_device must be evaluated at compile time for + constant arguments. */ + if (gimple_call_builtin_p (stmt, BUILT_IN_ACC_ON_DEVICE)) + oacc_xform_on_device (&gsi, stmt); } } Index: gcc/builtins.c === --- gcc/builtins.c (revision 226462) +++ gcc/builtins.c (working copy) @@ -5880,43 +5880,39 @@ expand_stack_save (void) } -/* Expand OpenACC acc_on_device. - - This has to happen late (that is, not in early folding; expand_builtin_*, - rather than fold_builtin_*), as we have to act differently for host and - acceleration device (ACCEL_COMPILER conditional). */ +/* Expand OpenACC acc_on_device. This is expanded in the openacc + transform pass, but if the user has this outside of an offloaded + region, we'll find it here. In that case we must be host or none. */ static rtx expand_builtin_acc_on_device (tree exp, rtx target) { +#ifdef ACCEL_COMPILER + gcc_unreachable (); +#else + gcc_assert (!get_oacc_fn_attrib (current_function_decl)); + if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE)) return NULL_RTX; tree arg = CALL_EXPR_ARG (exp, 0); - - /* Return (arg == v1 || arg == v2) ? 1 : 0. */ - machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg)); - rtx v = expand_normal (arg), v1, v2; -#ifdef ACCEL_COMPILER - v1 = GEN_INT (GOMP_DEVICE_NOT_HOST); - v2 = GEN_INT (ACCEL_COMPILER_acc_device); -#else - v1 = GEN_INT (GOMP_DEVICE_NONE); - v2 = GEN_INT (GOMP_DEVICE_HOST); -#endif + rtx val = expand_normal (arg); machine_mode target_mode = TYPE_MODE (integer_type_node); if (!target || !register_operand (target, target_mode)) target = gen_reg_rtx (target_mode); emit_move_insn (target, const1_rtx); rtx_code_label *done_label = gen_label_rtx (); - do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX, + do_compare_rtx_and_jump (val, GEN_INT (GOMP_DEVICE_HOST), EQ, + false, GET_MODE (val), NULL_RTX, NULL, done_label, PROB_EVE
Re: [C++/66443] virtual base of abstract class
On 08/01/2015 07:31 PM, Nathan Sidwell wrote: Ok, this patch fixes things up. The previous version was a little too lax, extending the logic of DR1611 to all synthesized functions. However, this broke virtual synthesized dtors, in that an abstract class's synthesized dtor's exception specification would not take account of any virtual base dtor exception specs. This would mean that a non-abstract derived class's synthesized dtor might end up with a throwing exception spec (because the virtual base's dtor did), and that would be looser than the exception spec on the abstract base's non-callable synthesized dtor. And that fails the virtual overriding checks. It seems to me that DR 1658 ignores vbases of abstract classes for determining whether a destructor is deleted, but says nothing about exception specifications. DR 1351 specifically ignores vbases of abstract classes for determining the exception specification of a constructor, but only for constructors. So I think that for destructors we want to walk the base, but pass in a fake delete_p. Why the check for inherited_parms? I would think that inheriting constructors would be handled like other ctors. Jason
Re: C++ delayed folding branch review
On 07/31/2015 05:54 PM, Kai Tietz wrote: The "STRIP_NOPS-requirement in 'reduced_constant_expression_p'" I could remove, but for one case in constexpr. Without folding we don't do type-sinking/raising. Right. So binary/unary operations might be containing cast, which were in the past unexpected. Why aren't the casts folded away? On verify_constant we check by reduced_constant_expression_p, if value is a constant. We don't handle here, that NOP_EXPRs are something we want to look through here, as it doesn't change anything if this is a constant, or not. NOPs around constants should have been folded away by the time we get there. Jason