Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts
on 2019/9/12 下午4:14, Richard Biener wrote: > On Wed, 11 Sep 2019, Kewen.Lin wrote: > >> Hi, >> >> Sorry for the late update. I've updated the words of target hooks part. >> >> Could someone help to review it? Thanks in advance! >> >> By the way, as previous emails in this thread, Bin has approved the IVOPTs >> part, while Segher has approved the rs6000 part. > > The target hooks part is OK. I guess we'll have to extend it eventually > in case other targets want to make use of it. > Thanks Richard! Committed by r275713. Yes, it's enough when doloop IV costs zero or infinite for generic/address use, but if one target wants some other values, we may have to take it as one common cost shared for all generic/address uses. It's like IV candidate cost but not the same since it's only needed when doloop IV is used for generic/address uses, I guess it requires some changes in candidate set cost calculation. I chose to keep it simple at the first place, but radar on for any other target adoptions. Thanks, Kewen > Thanks, > Richard. > >> >> Thanks, >> Kewen >> >> - >> >> gcc/ChangeLog >> >> 2019-09-11 Kewen Lin >> >> PR middle-end/80791 >> * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro. >> (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. >> (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. >> * target.def (have_count_reg_decr_p): New hook. >> (doloop_cost_for_generic): Likewise. >> (doloop_cost_for_address): Likewise. >> * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise. >> (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. >> (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. >> * doc/tm.texi: Regenerate. >> * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost >> addend. >> (record_group): Init doloop_p. >> (add_candidate_1): Add optional argument doloop, change the handlings >> accordingly. >> (add_candidate): Likewise. >> (generic_predict_doloop_p): Update attribute. >> (force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/ >> LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/ >> UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/ >> MIN_EXPR. >> (get_computation_cost): Update for doloop IV cand extra cost. >> (determine_group_iv_cost_cond): Update for doloop IV cand. >> (determine_iv_cost): Likewise. >> (ivopts_estimate_reg_pressure): Likewise. >> (may_eliminate_iv): Update handlings for doloop IV cand. >> (add_iv_candidate_for_doloop): New function. >> (find_iv_candidates): Call function add_iv_candidate_for_doloop. >> (iv_ca_set_no_cp): Update for doloop IV cand. >> (iv_ca_set_cp): Likewise. >> (iv_ca_dump): Dump register cost. >> (find_doloop_use): New function. >> (analyze_and_mark_doloop_use): Likewise. >> (tree_ssa_iv_optimize_loop): Call function analyze_and_mark_doloop_use. >> >> gcc/testsuite/ChangeLog >> >> 2019-09-11 Kewen Lin >> >> PR middle-end/80791 >> * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change. >> * gcc.dg/tree-ssa/ivopts-lt.c: Likewise. >> * gcc.dg/tree-ssa/pr32044.c: Likewise. >> >> >> on 2019/8/23 下午6:18, Segher Boessenkool wrote: >>> Hi! >>> >>> On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote: On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin wrote: Not sure if non-ivopts parts are already approved? If so, the patch is okay with above issues addressed. >>> >>> The rs6000 part is fine. The target.def entries need some spell check >>> and copy-editing, but are obvious and trivial otherwise, and/or you can >>> approve it as ivopts maintainer. >>> Thanks very much for your time! >>> >>> And thank you as well Bin :-) >>> >>> >>> Segher >>> >> >
[patch,committed][og9][openacc-gcc-9-branch] Fix segfault with plugin-gcn under libgomp
Hi all, exactly the same as for plugin-hsa on the trunk: https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00718.html Only a different file (which is only in this branch). Tobias commit 392044a8db285d9aea0a280983ce7c5014a4e99c Author: Tobias Burnus Date: Thu Sep 12 18:07:53 2019 +0200 libgomp plugin-gcn - init string libgomp/ 2019-09-13 Tobias Burnus * plugin/plugin-gcn.c (hsa_warn, hsa_fatal, hsa_error): Ensure string is initialized. diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc index 355e406d4e3..14ed4e0ec2c 100644 --- a/libgomp/ChangeLog.openacc +++ b/libgomp/ChangeLog.openacc @@ -1,3 +1,8 @@ +2019-09-13 Tobias Burnus + + * plugin/plugin-gcn.c (hsa_warn, hsa_fatal, hsa_error): Ensure + string is initialized. + 2019-09-10 Julian Brown * plugin/plugin-gcn.c (GOMP_hsa_kernel_dispatch): Remove diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index f7e3554f297..b8ec96391f7 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -489,7 +489,7 @@ hsa_warn (const char *str, hsa_status_t status) if (!debug) return; - const char *hsa_error_msg; + const char *hsa_error_msg = "[unknown]"; hsa_fns.hsa_status_string_fn (status, &hsa_error_msg); fprintf (stderr, "GCN warning: %s\nRuntime message: %s\n", str, @@ -502,7 +502,7 @@ hsa_warn (const char *str, hsa_status_t status) static void hsa_fatal (const char *str, hsa_status_t status) { - const char *hsa_error_msg; + const char *hsa_error_msg = "[unknown]"; hsa_fns.hsa_status_string_fn (status, &hsa_error_msg); GOMP_PLUGIN_fatal ("GCN fatal error: %s\nRuntime message: %s\n", str, hsa_error_msg); @@ -514,7 +514,7 @@ hsa_fatal (const char *str, hsa_status_t status) static bool hsa_error (const char *str, hsa_status_t status) { - const char *hsa_error_msg; + const char *hsa_error_msg = "[unknown]"; hsa_fns.hsa_status_string_fn (status, &hsa_error_msg); GOMP_PLUGIN_error ("GCN fatal error: %s\nRuntime message: %s\n", str, hsa_error_msg);
[patch, fortran] Improve error messages for mismatched arguments
Hello world, the attached patch improves the rather hard to read error messages for argument mismatches. With this patch, this reads argument_checking_21.f90:7:11: 6 | call foo(1.0) ! { dg-warning "Rank mismatch" } | 2 7 | call foo(b) ! { dg-warning "Rank mismatch" } | 1 Fehler: Rank mismatch between actual argument at (1) and actual argument at (2) (scalar and rank-2) which I think is fairly clear. It also makes sure that warnings are always emitted by -fallow-argument-mismatch by removing -Wargument-mismatch. Finally, for people who do not want to have too many warnings cluttering up their logs, a type mismatch is only shown once if it is a warning. While I was going on about fixing warnings, I also fixed PR 91557 with the bit in trans-expr.c. This part is trivial, I will backport it to the other affected branches. After this, I think we can close PR 91556. Regression-tested. OK for trunk? 2019-09-13 Thomas Koenig PR fortran/91557 PR fortran/91556 * frontend-passes.c (check_externals_procedure): Reformat argument list. Use gfc_compare_actual_formal instead of gfc_procedure_use. * gfortran.h (gfc_symbol): Add flag error. * interface.c (gfc_compare_interfaces): Reformat. (argument_rank_mismatch): Add where_formal argument. If it is present, note that the error is between different calls. (compare_parameter): Change warnings that previously dependended on -Wargument-mismatch to unconditional. Issue an error / warning on type mismatch only once. Pass where_formal to argument_rank_mismatch for artificial variables. (compare_actual_formal): Change warnings that previously dependeded on -Wargument-mismatch to unconditional. (gfc_check_typebound_override): Likewise. (gfc_get_formal_from_actual_arglist): Set declared_at for artificial symbol. * invoke.texi: Extend description of -fallow-argument-mismatch. Delete -Wargument-mismatch. * lang.opt: Change -Wargument-mismatch to do-nothing option. * resolve.c (resolve_structure_cons): Change warnings that previously depended on -Wargument-mismatch to unconditional. * trans-decl.c (generate_local_decl): Do not warn if the symbol is artificial. 2019-09-13 Thomas Koenig PR fortran/91557 PR fortran/91556 * gfortran.dg/argument_checking_20.f90: New test. * gfortran.dg/argument_checking_21.f90: New test. * gfortran.dg/argument_checking_22.f90: New test. * gfortran.dg/argument_checking_23.f90: New test. * gfortran.dg/warn_unused_dummy_argument_5.f90: New test. * gfortran.dg/bessel_3.f90: Add pattern for type mismatch. * gfortran.dg/g77/20010519-1.f: Adjust dg-warning messages to new handling. * gfortran.dg/pr24823.f: Likewise. * gfortran.dg/pr39937.f: Likewise. Index: fortran/frontend-passes.c === --- fortran/frontend-passes.c (Revision 275713) +++ fortran/frontend-passes.c (Arbeitskopie) @@ -5373,7 +5373,8 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code /* Common tests for argument checking for both functions and subroutines. */ static int -check_externals_procedure (gfc_symbol *sym, locus *loc, gfc_actual_arglist *actual) +check_externals_procedure (gfc_symbol *sym, locus *loc, + gfc_actual_arglist *actual) { gfc_gsymbol *gsym; gfc_symbol *def_sym = NULL; @@ -5396,7 +5397,7 @@ static int if (def_sym) { - gfc_procedure_use (def_sym, &actual, loc); + gfc_compare_actual_formal (&actual, def_sym->formal, 0, 0, 0, loc); return 0; } Index: fortran/gfortran.h === --- fortran/gfortran.h (Revision 275713) +++ fortran/gfortran.h (Arbeitskopie) @@ -1610,6 +1610,9 @@ typedef struct gfc_symbol /* Set if this is a module function or subroutine with the abreviated declaration in a submodule. */ unsigned abr_modproc_decl:1; + /* Set if a previous error or warning has occurred and no other + should be reported. */ + unsigned error:1; int refs; struct gfc_namespace *ns; /* namespace containing this symbol */ Index: fortran/interface.c === --- fortran/interface.c (Revision 275713) +++ fortran/interface.c (Arbeitskopie) @@ -1807,9 +1807,9 @@ gfc_compare_interfaces (gfc_symbol *s1, gfc_symbol if (!compare_rank (f2->sym, f1->sym)) { if (errmsg != NULL) - snprintf (errmsg, err_len, "Rank mismatch in argument '%s' " - "(%i/%i)", f1->sym->name, symbol_rank (f1->sym), - symbol_rank (f2->sym)); + snprintf (errmsg, err_len, "Rank mismatch in argument " + "'%s' (%i/%i)", f1->sym->name, + symbol_rank (f1->sym), symbol_rank (f2->sym)
[PR target/85401] initialize the move cost table before using it
This seems to be the way the rest of ira-color.c does it. I hope it's OK. It does fix the segfault. 2019-09-10 Maya Rashish PR target/85401 * ira-color.c: (allocno_copy_cost_saving) Call ira_init_register_move_cost_if_necessary diff --git a/gcc/ira-color.c b/gcc/ira-color.c index 99236994d64..5d721102e19 100644 --- a/gcc/ira-color.c +++ b/gcc/ira-color.c @@ -2828,6 +2828,7 @@ allocno_copy_cost_saving (ira_allocno_t allocno, int hard_regno) } else gcc_unreachable (); + ira_init_register_move_cost_if_necessary(allocno_mode); cost += cp->freq * ira_register_move_cost[allocno_mode][rclass][rclass]; } return cost;
Re: [patch, fortran] Improve error messages for mismatched arguments
On Sat, Sep 14, 2019 at 02:27:15PM +0200, Thomas Koenig wrote: > > the attached patch improves the rather hard to read error > messages for argument mismatches. With this patch, this reads > > argument_checking_21.f90:7:11: > > 6 | call foo(1.0) ! { dg-warning "Rank mismatch" } >| 2 > 7 | call foo(b) ! { dg-warning "Rank mismatch" } >| 1 > Fehler: Rank mismatch between actual argument at (1) and actual argument > at (2) (scalar and rank-2) > > which I think is fairly clear. It also makes sure that warnings are > always emitted by -fallow-argument-mismatch by removing > -Wargument-mismatch. Finally, for people who do not want to have too > many warnings cluttering up their logs, a type mismatch is only > shown once if it is a warning. > > While I was going on about fixing warnings, I also fixed PR 91557 with > the bit in trans-expr.c. This part is trivial, I will backport it > to the other affected branches. > > After this, I think we can close PR 91556. Regression-tested. OK for > trunk? > Looks good to me. Thanks for working of this issue. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Re: [PATCH V2] Setup predicate for switch default case in IPA (PR ipa/91089)
> It is somewhat hard to write a testcase to show role of range info only with > this patch. If another patch "Generalized predicate/condition for parameter > reference in IPA (PR ipa/91088)" is accepted, it will become easy and I will > update this testcase to show that. > > And this new version also resolves a problem that we might generate very > complicated predicate for basic block in convergence point reached from > all switch cases. > >switch (index) > / | \ > case1 case2 ... default > \ | / >convergence point > > For switch/if statement, current algorithm synthesizes predicate of > convergence point by OR-combining predicates of all its cases/branches, which > might give us a very complicated one. Actually, we can get simplified > predicate > by using the fact that predicate for a basic block must also hold true for > its post > dominators. To be specific, convergence point should include (at least equal > to) > predicate of the switch/if statement. > > Honza & Martin, > > Would please review this new patch when you have time? Thanks. > > Feng > > - > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c > index 278bf606661..dc5752fc005 100644 > --- a/gcc/ipa-fnsummary.c > +++ b/gcc/ipa-fnsummary.c > @@ -1198,7 +1198,8 @@ set_cond_stmt_execution_predicate (struct > ipa_func_body_info *fbi, > /* invert_tree_comparison will return ERROR_MARK on FP >comparsions that are not EQ/NE instead of returning proper >unordered one. Be sure it is not confused with NON_CONSTANT. */ > - if (this_code != ERROR_MARK) > + if (this_code != ERROR_MARK > + && !dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)) > { > predicate p > = add_condition (summary, index, size, &aggpos, this_code, So this change is handling the diamond conditional you describe above? This is bit of hack since it leaves the edge predicate unnecesarily conservative though I see it saves some conditions to be inserted and makes things to go smoother. Please add a comment that explain this and reffers to the other places where we do this (in the switch handling below). > @@ -1269,13 +1270,31 @@ set_switch_stmt_execution_predicate (struct > ipa_func_body_info *fbi, >if (!unmodified_parm_or_parm_agg_item (fbi, last, op, &index, &size, > &aggpos)) > return; > > + auto_vec > ranges; > + tree type = TREE_TYPE (op); > + tree one_cst = build_one_cst (type); > + int bound_limit = PARAM_VALUE (PARAM_IPA_MAX_SWITCH_PREDICATE_BOUNDS); > + int bound_count = 0; > + value_range_base vr; > + > + get_range_info (op, vr); > + >FOR_EACH_EDGE (e, ei, bb->succs) > { >e->aux = edge_predicate_pool.allocate (); >*(predicate *) e->aux = false; > } > + > + e = gimple_switch_edge (cfun, last, 0); > + /* Set BOUND_COUNT to maximum count to bypass computing predicate for > + default case if its target basic block is in convergence point of all > + switch cases, which can be determined by checking whether it > + post-dominates the switch statement. */ > + if (dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)) > +bound_count = INT_MAX; > + >n = gimple_switch_num_labels (last); > - for (case_idx = 0; case_idx < n; ++case_idx) > + for (case_idx = 1; case_idx < n; ++case_idx) > { >tree cl = gimple_switch_label (last, case_idx); >tree min, max; > @@ -1285,10 +1304,10 @@ set_switch_stmt_execution_predicate (struct > ipa_func_body_info *fbi, >min = CASE_LOW (cl); >max = CASE_HIGH (cl); > > - /* For default we might want to construct predicate that none > - of cases is met, but it is bit hard to do not having negations > - of conditionals handy. */ > - if (!min && !max) > + /* The case's target basic block is in convergence point of all switch > + cases, its predicate should be at least as that of the switch > + statement. */ > + if (dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)) > p = true; >else if (!max) > p = add_condition (summary, index, size, &aggpos, EQ_EXPR, > @@ -1304,7 +1323,111 @@ set_switch_stmt_execution_predicate (struct > ipa_func_body_info *fbi, > } >*(class predicate *) e->aux > = p.or_with (summary->conds, *(class predicate *) e->aux); > + > + /* If there are too many disjoint case ranges, predicate for default > + case might become too complicated. So add a limit here. */ > + if (bound_count > bound_limit) > + continue; > + > + bool new_range = true; > + > + if (!ranges.is_empty ()) > + { > + tree last_max_plus_one > + = int_const_binop (PLUS_EXPR, ranges.last ().second, one_cst); > + > + /* Merge case ranges if they are continuous. */ > + if (tree_int_cst_equal (min, last_max_plus_one)) > + new_range = false; > +
Re: [PATCH V3] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)
> +/* Analyze EXPR if it represents a series of simple operations performed on > + a function parameter and return true if so. FBI, STMT, EXPR, INDEX_P and > + AGGPOS have the same meaning like in unmodified_parm_or_parm_agg_item. > + Type of the parameter or load from an aggregate via the parameter is > + stored in *TYPE_P. Operations on the parameter are recorded to > + PARAM_OPS_P if it is not NULL. */ > + > +static bool > +decompose_param_expr (struct ipa_func_body_info *fbi, > + gimple *stmt, tree expr, > + int *index_p, tree *type_p, > + struct agg_position_info *aggpos, > + expr_eval_ops *param_ops_p = NULL) > +{ > + int op_limit = PARAM_VALUE (PARAM_IPA_MAX_PARAM_EXPR_OPS); > + int op_count = 0; > + > + if (param_ops_p) > +*param_ops_p = NULL; > + > + while (true) > +{ > + expr_eval_op eval_op; > + poly_int64 size; > + > + if (unmodified_parm_or_parm_agg_item (fbi, stmt, expr, index_p, &size, > + aggpos)) > + { > + tree type = TREE_TYPE (expr); > + > + /* Stop if found bit-field whose size does not equal any natural > + integral type. */ > + if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (type)), size)) > + break; Why is this needed? > + > + *type_p = type; > + return true; > + } > + > + if (TREE_CODE (expr) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (expr)) > + break; > + > + if (!is_gimple_assign (stmt = SSA_NAME_DEF_STMT (expr))) > + break; > + > + switch (gimple_assign_rhs_class (stmt)) > + { > + case GIMPLE_SINGLE_RHS: > + expr = gimple_assign_rhs1 (stmt); > + continue; > + > + case GIMPLE_UNARY_RHS: > + expr = gimple_assign_rhs1 (stmt); > + > + eval_op.val_is_rhs = false; I find val_is_rhs to be confusing, lhs/rhs is usually used for the gimple assignments. Here everything is rhs, but it depends whether the val is operand0 or operand1. So I guess we could do val_is_op1. > @@ -1183,10 +1301,8 @@ set_cond_stmt_execution_predicate (struct > ipa_func_body_info *fbi, >if (!is_gimple_ip_invariant (gimple_cond_rhs (last))) > return; >op = gimple_cond_lhs (last); > - /* TODO: handle conditionals like > - var = op0 < 4; > - if (var != 0). */ > - if (unmodified_parm_or_parm_agg_item (fbi, last, op, &index, &size, > &aggpos)) > + > + if (decompose_param_expr (fbi, last, op, &index, &type, &aggpos, > ¶m_ops)) > { >code = gimple_cond_code (last); >inverted_code = invert_tree_comparison (code, HONOR_NANS (op)); > @@ -1201,13 +1317,13 @@ set_cond_stmt_execution_predicate (struct > ipa_func_body_info *fbi, > if (this_code != ERROR_MARK) > { > predicate p > - = add_condition (summary, index, size, &aggpos, this_code, > - unshare_expr_without_location > - (gimple_cond_rhs (last))); > + = add_condition (summary, index, type, &aggpos, this_code, > + gimple_cond_rhs (last), param_ops); An why unsharing is no longer needed here? It is importnat to avoid anything which reffers to function local blocks to get into the global LTO stream. > +/* Check whether two set of operations have same effects. */ > +static bool > +expr_eval_ops_equal_p (expr_eval_ops ops1, expr_eval_ops ops2) > +{ > + if (ops1) > +{ > + if (!ops2 || ops1->length () != ops2->length ()) > + return false; > + > + for (unsigned i = 0; i < ops1->length (); i++) > + { > + expr_eval_op &op1 = (*ops1)[i]; > + expr_eval_op &op2 = (*ops2)[i]; > + > + if (op1.code != op2.code > + || op1.val_is_rhs != op2.val_is_rhs > + || !vrp_operand_equal_p (op1.val, op2.val) Why you need vrp_operand_equal_p instead of operand_equal_p here? Overall the patch looks very reasonable. We may end up with bit more general expressions (i.e. supporting arithmetics involving more than one operand). I see you also fixed a lot of typos in comments, please go head and commit them separately as obvious. Thank for all the work! Honza
[testsuite, obvious] Remove explicit "dg-do run" from g++.dg/vect/pr87914.cc
I found another instance of PR testsuite/83889 in the g++ testsuite. I've committed the attached patch to fix it in the same way as all the others. -Sandra 2019-09-14 Sandra Loosemore PR testsuite/83889 gcc/testsuite/ * g++.dg/vect/pr87914.cc: Remove explicit dg-do run. Index: gcc/testsuite/g++.dg/vect/pr87914.cc === --- gcc/testsuite/g++.dg/vect/pr87914.cc (revision 275717) +++ gcc/testsuite/g++.dg/vect/pr87914.cc (working copy) @@ -1,4 +1,3 @@ -// { dg-do run } // { dg-additional-options "-fopenmp-simd" } // { dg-additional-options "-mavx2" { target { avx2_runtime } } }
Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line
On 9/5/19 10:35 AM, Wilco Dijkstra wrote: > Agreed. I've got a couple of general comments: > > * The option name -matomic-ool sounds too abbreviated. I think eg. > -moutline-atomics is more descriptive and user friendlier. Changed. > * Similarly the exported __aa64_have_atomics variable could be named > __aarch64_have_lse_atomics so it's clear that it is about LSE atomics. Changed. > +@item -matomic-ool > +@itemx -mno-atomic-ool > +Enable or disable calls to out-of-line helpers to implement atomic > operations. > +These helpers will, at runtime, determine if ARMv8.1-Atomics instructions > +should be used; if not, they will use the load/store-exclusive instructions > +that are present in the base ARMv8.0 ISA. > + > +This option is only applicable when compiling for the base ARMv8.0 > +instruction set. If using a later revision, e.g. @option{-march=armv8.1-a} > +or @option{-march=armv8-a+lse}, the ARMv8.1-Atomics instructions will be > +used directly. > > So what is the behaviour when you explicitly select a specific CPU? Selecting a specific cpu selects the specific architecture that the cpu supports, does it not? Thus the architecture example above still applies. Unless I don't understand what distinction that you're making? > +/* Branch to LABEL if LSE is enabled. > + The branch should be easily predicted, in that it will, after > constructors, > + always branch the same way. The expectation is that systems that > implement > + ARMv8.1-Atomics are "beefier" than those that omit the extension. > + By arranging for the fall-through path to use load-store-exclusive insns, > + we aid the branch predictor of the smallest cpus. */ > > I'd say that by the time GCC10 is released and used in distros, systems > without > LSE atomics would be practically non-existent. So we should favour LSE atomics > by default. I suppose. Does it not continue to be true that an a53 is more impacted by the branch prediction than an a76? r~
Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)
On Wed, 11 Sep 2019 at 11:50, Wilco Dijkstra wrote: > > While code hoisting generally improves codesize, it can affect performance > negatively. Benchmarking shows it doesn't help SPEC and negatively affects > embedded benchmarks, so only enable code hoisting with -Os on Arm. > > Bootstrap OK, OK for commit? Hi Wilco, My only concern with the patch is that the issue isn't specific to code-hoisting. For this particular case (reproducible with pr77445-2.c), disabling jump threading doesn't cause the register spill with hoisting enabled. Likewise disabling forwprop3 and forwprop4 prevents the spill. The last time I tried, setting setting max-jump-thread-duplication-stmts to 20 and fsm-scale-path-stmts to 3, not only removed the spill but also resulted in 9 more hoistings, but regressed some other test on jump threading. So I was wondering whether we should look into fine-tuning relevant params, instead of disabling code hoisting entirely (we will end up regressing some test cases either way) ? Thanks, Prathamesh > > ChangeLog: > 2019-09-11 Wilco Dijkstra > > > PR tree-optimization/80155 > * common/config/arm/arm-common.c (arm_option_optimization_table): > Enable -fcode-hoisting with -Os. > > -- > diff --git a/gcc/common/config/arm/arm-common.c > b/gcc/common/config/arm/arm-common.c > index > 41a920f6dc96833e778faa8dbcc19beac483734c..b0d5fb300bf01acc1fb6f4631635f8a1bfe6441c > 100644 > --- a/gcc/common/config/arm/arm-common.c > +++ b/gcc/common/config/arm/arm-common.c > @@ -39,6 +39,8 @@ static const struct default_options > arm_option_optimization_table[] = > /* Enable section anchors by default at -O1 or higher. */ > { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > +/* Enable code hoisting only with -Os. */ > +{ OPT_LEVELS_SIZE, OPT_fcode_hoisting, NULL, 1 }, > { OPT_LEVELS_NONE, 0, NULL, 0 } >}; > >