[c-family] Plug small loophole in the C/C++ binding for Ada
This plugs a small loophole in the improved handling of forward declarations in the C/C++ binding for Ada, whereby a forward declaration for a type is not generated if the typedef declaration making it necessary uses the same name as the tag of the type.

Tested on x86-64/Linux, applied on the mainline and 8 branch.

2018-06-01  Eric Botcazou

c-family/
	* c-ada-spec.c (dump_ada_declaration): Generate a forward
	declaration for a typedef independently of whether the
	declaration of the subtype is generated.

-- 
Eric Botcazou

Index: c-ada-spec.c
===
--- c-ada-spec.c	(revision 260913)
+++ c-ada-spec.c	(working copy)
@@ -2734,19 +2734,25 @@ dump_ada_declaration (pretty_printer *bu
       if (TYPE_NAME (typ))
 	{
-	  /* If types have same representation, and same name (ignoring
-	     casing), then ignore the second type.  */
+	  /* If the types have the same name (ignoring casing), then ignore
+	     the second type, but forward declare the first if need be.  */
 	  if (type_name (typ) == type_name (TREE_TYPE (t))
 	      || !strcasecmp (type_name (typ), type_name (TREE_TYPE (t))))
 	    {
+	      if (RECORD_OR_UNION_TYPE_P (typ) && !TREE_VISITED (stub))
+		{
+		  INDENT (spc);
+		  dump_forward_type (buffer, typ, t, 0);
+		}
+
	      TREE_VISITED (t) = 1;
	      return 0;
	    }

	  INDENT (spc);

-	  if (RECORD_OR_UNION_TYPE_P (typ))
-	    dump_forward_type (buffer, stub, t, spc);
+	  if (RECORD_OR_UNION_TYPE_P (typ) && !TREE_VISITED (stub))
+	    dump_forward_type (buffer, typ, t, spc);

	  pp_string (buffer, "subtype ");
	  dump_ada_node (buffer, t, type, spc, false, true);
Re: fwprop addressing costs
On Fri, Jun 1, 2018 at 8:59 AM Robin Dapp wrote: > > Hi, > > when investigating a regression, I realized that we create a superfluous > load on S390. The snippet looks something like > > LA %r10, 0(%r8,%r9) > LLH %r4, 0(%r10) > > meaning the address in r10 is computed by an LA even though LLH supports > the addressing already. The same address is used multiple times so > combine cannot do something about it. > > Looking into fwprop, I realized it actually tries to propagate the > address but exits because we specify higher costs for an address with > index than for one without. This was meant to account for the fact > that, in general and all other things being equal, not every instruction > can handle indexed addressing mode. > > Now, in this case, fwprop actually knows the instructions it propagates > into and could decide based on the full costs, seeing that it would not > be more expensive. Currently, it recursively descends to the parts that > are going to be propagated or replaced and compares the costs of both > without regarding the full instruction. > > Would it make sense to enhance fwprop with a more detailed cost > evaluation or are there other passes that should do the same - i.e. > what's the preferred way to solve this? Is changing the address costs in > the backend depending on addressing mode sensible at all? As far as I > can see, the x86 backend also changes costs depending on global > properties (i.e. to prefer fewer registers when addressing). forwprop could possibly use the new insn_cost hook > Regards > Robin >
Re: [PATCH 09/10] Experiment with using optinfo in gimple-loop-interchange.cc
On Tue, May 29, 2018 at 10:33 PM David Malcolm wrote: > > This was an experiment to try to capture information on a > loop optimization. > > gcc/ChangeLog: > * gimple-loop-interchange.cc (should_interchange_loops): Add > optinfo note when interchange gives better data locality behavior. > (tree_loop_interchange::interchange): Add OPTINFO_SCOPE. > Add optinfo for successful and unsuccessful interchanges. > (prepare_perfect_loop_nest): Add OPTINFO_SCOPE. Add > optinfo note. > (pass_linterchange::execute): Add OPTINFO_SCOPE. > --- > gcc/gimple-loop-interchange.cc | 36 +++- > 1 file changed, 35 insertions(+), 1 deletion(-) > > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc > index eb35263..cd32288 100644 > --- a/gcc/gimple-loop-interchange.cc > +++ b/gcc/gimple-loop-interchange.cc > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx, unsigned > o_idx, >ratio = innermost_loops_p ? INNER_STRIDE_RATIO : OUTER_STRIDE_RATIO; >/* Do interchange if it gives better data locality behavior. */ >if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio))) > -return true; > +{ > + if (optinfo_enabled_p ()) > + OPTINFO_NOTE ((gimple *)NULL) // FIXME > + << "interchange gives better data locality behavior: " > + << "iloop_strides: " > + << decu (iloop_strides) > + << " > (oloop_strides: " > + << decu (oloop_strides) > + << " * ratio: " > + << decu (ratio) > + << ")"; Just randomly inside the thread. NOO! :/ Please do _not_ add more stream-like APIs. How do you expect translators to deal with those? Yes, I'm aware of the graphite-* ones and I dislike those very much. What's wrong with the existing dump API? Richard. > + return true; > +} >if (wi::gtu_p (iloop_strides, oloop_strides)) > { >/* Or it creates more invariant memory references. 
*/ > @@ -1578,6 +1590,8 @@ bool > tree_loop_interchange::interchange (vec datarefs, > vec ddrs) > { > + OPTINFO_SCOPE ("tree_loop_interchange::interchange", m_loop_nest[0]); > + >location_t loc = find_loop_location (m_loop_nest[0]); >bool changed_p = false; >/* In each iteration we try to interchange I-th loop with (I+1)-th loop. > @@ -1628,6 +1642,10 @@ tree_loop_interchange::interchange > (vec datarefs, > fprintf (dump_file, > "Loop_pair is interchanged\n\n", > oloop.m_loop->num, iloop.m_loop->num); > + if (optinfo_enabled_p ()) > + OPTINFO_SUCCESS (oloop.m_loop) > + << optinfo_printf ("Loop_pair is > interchanged", > +oloop.m_loop->num, iloop.m_loop->num); > > changed_p = true; > interchange_loops (iloop, oloop); > @@ -1641,6 +1659,10 @@ tree_loop_interchange::interchange > (vec datarefs, > fprintf (dump_file, > "Loop_pair is not interchanged\n\n", > oloop.m_loop->num, iloop.m_loop->num); > + if (optinfo_enabled_p ()) > + OPTINFO_FAILURE (oloop.m_loop) > + << optinfo_printf ("Loop_pair is not > interchanged", > +oloop.m_loop->num, iloop.m_loop->num); > } > } >simple_dce_from_worklist (m_dce_seeds); > @@ -1648,6 +1670,9 @@ tree_loop_interchange::interchange > (vec datarefs, >if (changed_p) > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, > "loops interchanged in loop nest\n"); > + if (optinfo_enabled_p ()) > +OPTINFO_SUCCESS (m_loop_nest[0]) > + << "loops interchanged in loop nest"; > >return changed_p; > } > @@ -1971,6 +1996,8 @@ static bool > prepare_perfect_loop_nest (struct loop *loop, vec *loop_nest, >vec *datarefs, vec *ddrs) > { > + OPTINFO_SCOPE ("prepare_perfect_loop_nest", loop); > + >struct loop *start_loop = NULL, *innermost = loop; >struct loop *outermost = loops_for_fn (cfun)->tree_root; > > @@ -2029,6 +2056,12 @@ prepare_perfect_loop_nest (struct loop *loop, > vec *loop_nest, > fprintf (dump_file, >"\nConsider loop interchange for loop_nest<%d - %d>\n", >start_loop->num, innermost->num); > + if (optinfo_enabled_p ()) > + { > + OPTINFO_NOTE 
(start_loop) > + << optinfo_printf ("consider loop interchange for loop_nest<%d > - %d>", > +start_loop->num, innermost->num); > + } > > if (loop != start_loop) > prune_access_strides_not_in_loop (start_loop, innermost, *datarefs); > @@ -2061,6 +2094,7 @@ pass_linterchange::execute (function *fun) >struct loop *loop; >FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST) > { > + OPTINFO_SCOPE ("considering loop for in
Re: [RFC] [aarch64] Add HiSilicon tsv110 CPU support
Hi Ramana, Sorry for the late reply; I was on a short leave. On 2018/5/23 18:41, Ramana Radhakrishnan wrote: > > > On 23/05/2018 03:50, Zhangshaokun wrote: >> Hi Ramana, >> >> On 2018/5/22 18:28, Ramana Radhakrishnan wrote: >>> On Tue, May 22, 2018 at 9:40 AM, Shaokun Zhang >>> wrote: tsv110 is designed by HiSilicon and supports v8_4A, it also optimizes L1 Icache which can access L1 Dcache. Therefore, DC CVAU is not necessary in __aarch64_sync_cache_range for tsv110, is there any good idea to skip DC CVAU operation for tsv110. >>> >>> A solution would be to use an ifunc but on a cpu variant. >>> >> >> ifunc, can you give further explanation? >> If on a cpu variant, for HiSilicon tsv110, we have two versions and CPU >> variants >> are 0 and 1. Both are expected to skip DC CVAU operation in sync icache and >> dcache. > >>> Since it is not necessary for sync icache and dcache, it is beneficial for >>> performance to skip the redundant DC CVAU and do IC IVAU only. >>> For JVM, __clear_cache is called many times. >>> > > Thanks for some more detail as to where you think you want to use this. Have > you investigated whether the jvm can actually elide such a call rather than > trying to fix this in the toolchain ? > In fact, we (HiSilicon) want to optimize DC CVAU not only in the toolchain, but also for LLVM and others. > If you really need to think about solutions in the toolchain - > > The simplest first step would be to implement the changes hinted at by the > comment in aarch64.h . > > If you read the comment above CLEAR_INSN_CACHE in aarch64.h you would see > that > > /* This definition should be relocated to aarch64-elf-raw.h. This macro >should be undefined in aarch64-linux.h and a clear_cache pattern >implmented to emit either the call to __aarch64_sync_cache_range() >directly or preferably the appropriate sycall or cache clear >instructions inline. 
*/ > #define CLEAR_INSN_CACHE(beg, end) \ > extern void __aarch64_sync_cache_range (void *, void *); \ > __aarch64_sync_cache_range (beg, end) > > Thus I would expect that by implementing the clear_cache pattern and deciding > whether to put out the call to the __aarch64_sync_cache_range function or not > depending on whether you had the tsv110 chosen on the command line would > allow you to have an idea of what the performance gain actually is by > compiling the jvm with -mcpu=tsv110 vs -march=armv8-a. You probably also want > to clean up the trampoline_init code while you are here. > Thanks for your nice explanation and guidance. For our next-generation CPU core tsv200, we will optimize IC IVAU so that there is no need to flush the Icache, keeping clear_cache as a no-op. Shall I consider this, or maybe I am missing something in what you said. Thanks, Shaokun > I do think that's something that should be easy enough to do and the subject > of a patch series in its own right. If your users can rebuild the world for > tsv110 then this is sufficient. > > If you want to have a single jvm binary without any run time checks, then you > need to investigate the use of ifuncs which are a mechanism in the GNU > toolchain for some of this kind of stuff. We tend not to use ifuncs on a per CPU > basis unless there is a very good reason and the performance improvement is > worth it (but probably more on a per architecture or per architectural basis) > and you will need to make the case for it including what sort of performance > benefits it gives. Some introduction about this feature can be found here. > https://sourceware.org/glibc/wiki/GNU_IFUNC > > regards > Ramana > >> >> Hi ARM guys, >> are you happy to share yours idea about this? >> >>> Is this really that important for performance and on what workloads ? >>> >> >> Since it is not necessary for sync icache and dcache, it is beneficial for >> performance to skip the redundant DC CVAU and do IC IVAU only. 
>> For JVM, __clear_cache is called many times. >> >> Thanks, >> Shaokun >> >>> regards >>> Ramana >>> Any thoughts and ideas are welcome. Shaokun Zhang (1): [aarch64] Add HiSilicon tsv110 CPU support. gcc/ChangeLog| 9 +++ gcc/config/aarch64/aarch64-cores.def | 5 ++ gcc/config/aarch64/aarch64-cost-tables.h | 103 +++ gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/config/aarch64/aarch64.c | 79 gcc/doc/invoke.texi | 2 +- 6 files changed, 198 insertions(+), 2 deletions(-) -- 2.7.4 >>> >>> >> > > . >
Re: [RFC][PR82479] missing popcount builtin detection
Hi Bin, Thanks a lot for the review. On 1 June 2018 at 03:45, Bin.Cheng wrote: > On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah > wrote: >> Hi Bin, >> >> Thanks for the review. Please find the revised patch based on the >> review comments. >> >> Thanks, >> Kugan >> >> On 17 May 2018 at 19:56, Bin.Cheng wrote: >>> On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah >>> wrote: Hi Richard, On 6 March 2018 at 02:24, Richard Biener wrote: > On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah > wrote: >> Hi Richard, >> > > Hi, > Thanks very much for working on this. > >> +/* Utility function to check if OP is defined by a stmt >> + that is a val - 1. If that is the case, set it to STMT. */ >> + >> +static bool >> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt) > This is checking if op is defined as val - 1, so name it as > ssa_defined_by_minus_one_stmt_p? > >> +{ >> + if (TREE_CODE (op) == SSA_NAME >> + && (*stmt = SSA_NAME_DEF_STMT (op)) >> + && is_gimple_assign (*stmt) >> + && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR) >> + && val == gimple_assign_rhs1 (*stmt) >> + && integer_minus_onep (gimple_assign_rhs2 (*stmt))) >> +return true; >> + else >> +return false; > You can simply return the boolean condition. Done. > >> +} >> + >> +/* See if LOOP is a popcount implementation of the form > ... >> + rhs1 = gimple_assign_rhs1 (and_stmt); >> + rhs2 = gimple_assign_rhs2 (and_stmt); >> + >> + if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one)) >> +rhs1 = rhs2; >> + else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, &and_minus_one)) >> +; >> + else >> +return false; >> + >> + /* Check the recurrence. */ >> + phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); > So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true? Please > use rhs1 directly. Done. >> + gimple *src_phi = SSA_NAME_DEF_STMT (rhs2); > I think this is checking wrong thing and is redundant. 
Either rhs2 > equals to rhs1 or is defined as (rhs1 - 1). For (rhs2 == rhs1) case, > the check duplicates checking on phi; for the latter, it's never a PHI > stmt and shouldn't be checked. > >> + if (gimple_code (phi) != GIMPLE_PHI >> + || gimple_code (src_phi) != GIMPLE_PHI) >> +return false; >> + >> + dest = gimple_assign_lhs (count_stmt); >> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); >> + tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge >> (loop)->dest_idx); >> + if (adjust) >> +iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), >> +build_call_expr (fn, 1, src), >> +build_int_cst (TREE_TYPE (dest), 1)); >> + else >> +iter = build_call_expr (fn, 1, src); > Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as > niters type. Though unsigned type is unnecessary in this case, but > better to follow existing behavior? Done. > >> + max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest))); > As richi suggested, max should be the number of bits in type of IV. > >> + >> + niter->assumptions = boolean_false_node; > Redundant. Not sure I understand. If I remove this, I am getting ICE (segmentation fault). What is the expectation here? >> + niter->control.base = NULL_TREE; >> + niter->control.step = NULL_TREE; >> + niter->control.no_overflow = false; >> + niter->niter = iter; >> + niter->assumptions = boolean_true_node; >> + niter->may_be_zero = boolean_false_node; >> + niter->max = max; >> + niter->bound = NULL_TREE; >> + niter->cmp = ERROR_MARK; >> + return true; >> +} >> + >> + > Appology if these are nitpickings. Thanks for the review. 
I am happy to make the changes needed to get it to how it should be :) Thanks, Kugan > > Thanks, > bin From f45179c777d846731d2d899a142c45a36ab35fd1 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah Date: Thu, 10 May 2018 21:41:53 +1000 Subject: [PATCH] popcount Change-Id: I10c1f990e5b9c9900cf7c93678df924f0463b72e --- gcc/ipa-fnsummary.c | 4 + gcc/testsuite/gcc.dg/tree-ssa/popcount.c | 41 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c | 29 ++ gcc/testsuite/gcc.dg/tree-ssa/popcount3.c | 28 ++ gcc/tree-scalar-evolution.c | 7 ++ gcc/tree-ssa-loop-ivopts.c| 10 ++ gcc/tree-ssa-loop-niter.c | 152 +- 7 files changed, 270 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index bdf9ba1..4dc156a 100644 --- a/gcc/ipa-fnsummary.c +++ b/gcc/ipa-fnsummary.c @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info, nonconstant_names); return p2.or_with (summary->conds, p1); } +
Re: [PATCH 01/10] Convert dump and optgroup flags to enums
On Tue, May 29, 2018 at 10:32 PM David Malcolm wrote: > > The dump machinery uses "int" in a few places, for two different > sets of bitmasks. > > This patch makes things more self-documenting and type-safe by using > a new pair of enums: one for the dump_flags_t and another for the > optgroup_flags. Great! This should also make them accessible in gdb w/o using -g3. > This requires adding some overloaded bit operations to the enums > in question, which, in this patch is done for each enum . If the basic > idea is OK, should I add a template for this? (with some kind of > magic to express that bitmasking operations are only supported on > certain opt-in enums). Does C++ allow > int enums? I think we want some way of knowing when either enum exceeds int (dump_flags_t was already uint64_t but you now make it effectively int again). That is, general wrapping for enum ops should chose an appropriate unsigned integer for the operation. So yes, a common implementation looks useful to me. I think this patch is independently useful. Thanks, Richard. > > gcc/c-family/ChangeLog: > * c-pretty-print.c (c_pretty_printer::statement): Use TDF_NONE > rather than 0. > > gcc/ChangeLog: > * cfg.c (debug): Use TDF_NONE rather than 0. > * cfghooks.c (debug): Likewise. > * dumpfile.c (DUMP_FILE_INFO): Likewise; also for OPTGROUP. > (struct dump_option_value_info): Convert to... > (struct kv_pair): ...this template type. > (dump_options): Convert to kv_pair; use TDF_NONE > rather than 0. > (optinfo_verbosity_options): Likewise. > (optgroup_options): Convert to kv_pair; use > OPTGROUP_NONE. > (gcc::dump_manager::dump_register): Use optgroup_flags_t rather > than int for "optgroup_flags" param. > (dump_generic_expr_loc): Use dump_flags_t rather than int for > "dump_kind" param. > (dump_finish): Use TDF_NONE rather than 0. > (gcc::dump_manager::opt_info_enable_passes): Use optgroup_flags_t > rather than int for "optgroup_flags" param. Use TDF_NONE rather > than 0. 
Update for change to option_ptr. > (opt_info_switch_p_1): Convert "optgroup_flags" param from int * > to optgroup_flags_t *. Use TDF_NONE and OPTGROUP_NONE rather than > 0. Update for changes to optinfo_verbosity_options and > optgroup_options. > (opt_info_switch_p): Convert optgroup_flags from int to > optgroup_flags_t. > * dumpfile.h (TDF_ADDRESS, TDF_SLIM, TDF_RAW, TDF_DETAILS, > TDF_STATS, TDF_BLOCKS, TDF_VOPS, TDF_LINENO, TDF_UID) > TDF_STMTADDR, TDF_GRAPH, TDF_MEMSYMS, TDF_RHS_ONLY, TDF_ASMNAME, > TDF_EH, TDF_NOUID, TDF_ALIAS, TDF_ENUMERATE_LOCALS, TDF_CSELIB, > TDF_SCEV, TDF_GIMPLE, TDF_FOLDING, MSG_OPTIMIZED_LOCATIONS, > MSG_MISSED_OPTIMIZATION, MSG_NOTE, MSG_ALL, TDF_COMPARE_DEBUG, > TDF_NONE): Convert from macros to... > (enum dump_flag): ...this new enum. > (dump_flags_t): Update to use enum. > (operator|, operator&, operator~, operator|=, operator&=): > Implement for dump_flags_t. > (OPTGROUP_NONE, OPTGROUP_IPA, OPTGROUP_LOOP, OPTGROUP_INLINE, > OPTGROUP_OMP, OPTGROUP_VEC, OPTGROUP_OTHER, OPTGROUP_ALL): > Convert from macros to... > (enum optgroup_flag): ...this new enum. > (optgroup_flags_t): New typedef. > (operator|, operator|=): Implement for optgroup_flags_t. > (struct dump_file_info): Convert field "alt_flags" to > dump_flags_t. Convert field "optgroup_flags" to > optgroup_flags_t. > (dump_register): Convert param "optgroup_flags" to > optgroup_flags_t. > (opt_info_enable_passes): Likewise. > * early-remat.c (early_remat::dump_edge_list): Use TDF_NONE rather > than 0. > * gimple-pretty-print.c (debug): Likewise. > * gimple-ssa-store-merging.c (bswap_replace): Likewise. > (merged_store_group::apply_stores): Likewise. > * gimple-ssa-strength-reduction.c (insert_initializers): Likewise. > * gimple.c (verify_gimple_pp): Likewise. > * passes.c (pass_manager::register_one_dump_file): Convert > local "optgroup_flags" to optgroup_flags_t. > * print-tree.c (print_node): Use TDF_NONE rather than 0. > (debug): Likewise. > (debug_body): Likewise. 
> * tree-pass.h (struct pass_data): Convert field "optgroup_flags" > to optgroup_flags_t. > * tree-pretty-print.c (print_struct_decl): Use TDF_NONE rather > than 0. > * tree-ssa-math-opts.c (convert_mult_to_fma_1): Likewise. > (convert_mult_to_fma): Likewise. > * tree-ssa-reassoc.c (undistribute_ops_list): Likewise. > * tree-ssa-sccvn.c (vn_eliminate): Likewise. > * tree-vect-data-refs.c (dump_lower_bound): Convert param > "dump_kind" to dump_flags_t
Re: [RFC][PR82479] missing popcount builtin detection
On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah wrote: > Hi Bin, > > Thanks a lo for the review. > > On 1 June 2018 at 03:45, Bin.Cheng wrote: >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah >> wrote: >>> Hi Bin, >>> >>> Thanks for the review. Please find the revised patch based on the >>> review comments. >>> >>> Thanks, >>> Kugan >>> >>> On 17 May 2018 at 19:56, Bin.Cheng wrote: On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah wrote: > Hi Richard, > > On 6 March 2018 at 02:24, Richard Biener > wrote: >> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah >> wrote: >>> Hi Richard, >>> >> >> Hi, >> Thanks very much for working. >> >>> +/* Utility function to check if OP is defined by a stmt >>> + that is a val - 1. If that is the case, set it to STMT. */ >>> + >>> +static bool >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt) >> This is checking if op is defined as val - 1, so name it as >> ssa_defined_by_minus_one_stmt_p? >> >>> +{ >>> + if (TREE_CODE (op) == SSA_NAME >>> + && (*stmt = SSA_NAME_DEF_STMT (op)) >>> + && is_gimple_assign (*stmt) >>> + && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR) >>> + && val == gimple_assign_rhs1 (*stmt) >>> + && integer_minus_onep (gimple_assign_rhs2 (*stmt))) >>> +return true; >>> + else >>> +return false; >> You can simply return the boolean condition. > Done. > >> >>> +} >>> + >>> +/* See if LOOP is a popcout implementation of the form >> ... >>> + rhs1 = gimple_assign_rhs1 (and_stmt); >>> + rhs2 = gimple_assign_rhs2 (and_stmt); >>> + >>> + if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one)) >>> +rhs1 = rhs2; >>> + else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, >>> &and_minus_one)) >>> +; >>> + else >>> +return false; >>> + >>> + /* Check the recurrence. */ >>> + phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true? Please >> use rhs1 directly. > > Done. 
>>> + gimple *src_phi = SSA_NAME_DEF_STMT (rhs2); >> I think this is checking wrong thing and is redundant. Either rhs2 >> equals to rhs1 or is defined as (rhs1 - 1). For (rhs2 == rhs1) case, >> the check duplicates checking on phi; for the latter, it's never a PHI >> stmt and shouldn't be checked. >> >>> + if (gimple_code (phi) != GIMPLE_PHI >>> + || gimple_code (src_phi) != GIMPLE_PHI) >>> +return false; >>> + >>> + dest = gimple_assign_lhs (count_stmt); >>> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); >>> + tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge >>> (loop)->dest_idx); >>> + if (adjust) >>> +iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), >>> +build_call_expr (fn, 1, src), >>> +build_int_cst (TREE_TYPE (dest), 1)); >>> + else >>> +iter = build_call_expr (fn, 1, src); >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as >> niters type. Though unsigned type is unnecessary in this case, but >> better to follow existing behavior? > > Done. >> >>> + max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest))); >> As richi suggested, max should be the number of bits in type of IV. >> >>> + >>> + niter->assumptions = boolean_false_node; >> Redundant. > > Not sure I understand. If I remove this, I am getting ICE > (segmentation fault). What is the expectation here? Is it a simple typo? Because assumptions is set to boolean_true_node just 5 lines below? The niters part looks good for me with this change. You may need richi's approval for other parts? Thanks, bin > >>> + niter->control.base = NULL_TREE; >>> + niter->control.step = NULL_TREE; >>> + niter->control.no_overflow = false; >>> + niter->niter = iter; >>> + niter->assumptions = boolean_true_node; >>> + niter->may_be_zero = boolean_false_node; >>> + niter->max = max; >>> + niter->bound = NULL_TREE; >>> + niter->cmp = ERROR_MARK; >>> + return true; >>> +} >>> + >>> + >> Appology if these are nitpickings. > Thanks for the review. 
I am happy to make the changes needed to get it > to how it should be :) > > Thanks, > Kugan > >> >> Thanks, >> bin
[gomp5] Same variable can't appear in multiple nontemporal clauses
Hi!

Just a simple restriction, committed to gomp-5_0-branch.

2018-06-01  Jakub Jelinek

	* c-typeck.c (c_finish_omp_clauses): Diagnose more than one
	nontemporal clause referring to the same variable.

	* semantics.c (finish_omp_clauses): Diagnose more than one
	nontemporal clause referring to the same variable.

	* c-c++-common/gomp/nontemporal-2.c: New test.

--- gcc/c/c-typeck.c.jj	2018-05-04 19:08:55.309273302 +0200
+++ gcc/c/c-typeck.c	2018-06-01 09:52:06.498468358 +0200
@@ -13061,8 +13061,10 @@ c_finish_omp_clauses (tree clauses, enum
   bitmap_initialize (&firstprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&lastprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&aligned_head, &bitmap_default_obstack);
+  /* If ort == C_ORT_OMP_DECLARE_SIMD used as uniform_head instead.  */
   bitmap_initialize (&map_head, &bitmap_default_obstack);
   bitmap_initialize (&map_field_head, &bitmap_default_obstack);
+  /* If ort == C_ORT_OMP used as nontemporal_head instead.  */
   bitmap_initialize (&oacc_reduction_head, &bitmap_default_obstack);
 
   if (ort & C_ORT_ACC)
@@ -13534,6 +13536,15 @@ c_finish_omp_clauses (tree clauses, enum
		      "%qE is not a variable in %<nontemporal%> clause", t);
	    remove = true;
	  }
+	else if (bitmap_bit_p (&oacc_reduction_head, DECL_UID (t)))
+	  {
+	    error_at (OMP_CLAUSE_LOCATION (c),
+		      "%qE appears more than once in %<nontemporal%> "
+		      "clauses", t);
+	    remove = true;
+	  }
+	else
+	  bitmap_set_bit (&oacc_reduction_head, DECL_UID (t));
	break;
 
       case OMP_CLAUSE_DEPEND:
--- gcc/cp/semantics.c.jj	2018-05-31 15:18:22.659856911 +0200
+++ gcc/cp/semantics.c	2018-06-01 10:15:48.760084496 +0200
@@ -5871,8 +5871,10 @@ finish_omp_clauses (tree clauses, enum c
   bitmap_initialize (&firstprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&lastprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&aligned_head, &bitmap_default_obstack);
+  /* If ort == C_ORT_OMP_DECLARE_SIMD used as uniform_head instead.  */
   bitmap_initialize (&map_head, &bitmap_default_obstack);
   bitmap_initialize (&map_field_head, &bitmap_default_obstack);
+  /* If ort == C_ORT_OMP used as nontemporal_head instead.  */
   bitmap_initialize (&oacc_reduction_head, &bitmap_default_obstack);
 
   if (ort & C_ORT_ACC)
@@ -6625,6 +6627,14 @@ finish_omp_clauses (tree clauses, enum c
	      error ("%qE is not a variable in %<nontemporal%> clause", t);
	      remove = true;
	    }
+	  else if (bitmap_bit_p (&oacc_reduction_head, DECL_UID (t)))
+	    {
+	      error ("%qD appears more than once in %<nontemporal%> clauses",
+		     t);
+	      remove = true;
+	    }
+	  else
+	    bitmap_set_bit (&oacc_reduction_head, DECL_UID (t));
	  break;
 
	case OMP_CLAUSE_DEPEND:
--- gcc/testsuite/c-c++-common/gomp/nontemporal-2.c.jj	2018-06-01 10:37:11.305873349 +0200
+++ gcc/testsuite/c-c++-common/gomp/nontemporal-2.c	2018-06-01 12:13:16.106230308 +0200
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#define N 1024
+extern int a[N], b[N], c[N], d[N];
+
+void
+foo (void)
+{
+  int i;
+  #pragma omp simd nontemporal (a, b) aligned (a, b, c)
+  for (i = 0; i < N; ++i)
+    a[i] = b[i] + c[i];
+  #pragma omp simd nontemporal (d) nontemporal (d)	/* { dg-error "'d' appears more than once in 'nontemporal' clauses" } */
+  for (i = 0; i < N; ++i)
+    d[i] = 2 * c[i];
+  #pragma omp simd nontemporal (a, b, b)	/* { dg-error "'b' appears more than once in 'nontemporal' clauses" } */
+  for (i = 0; i < N; ++i)
+    a[i] += b[i] + c[i];
+}

	Jakub
Re: Use conditional internal functions in if-conversion
On Fri, May 25, 2018 at 3:06 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Fri, May 25, 2018 at 1:49 PM Richard Sandiford < > > richard.sandif...@linaro.org> wrote: > > > >> Richard Biener writes: > >> > On Fri, May 25, 2018 at 12:12 PM Richard Sandiford < > >> > richard.sandif...@linaro.org> wrote: > >> > > >> >> Richard Biener writes: > >> >> > On Wed, May 16, 2018 at 12:17 PM Richard Sandiford < > >> >> > richard.sandif...@linaro.org> wrote: > >> >> >> This patch uses IFN_COND_* to vectorise conditionally-executed, > >> >> >> potentially-trapping arithmetic, such as most floating-point > >> >> >> ops with -ftrapping-math. E.g.: > >> >> > > >> >> >> if (cond) { ... x = a + b; ... } > >> >> > > >> >> >> becomes: > >> >> > > >> >> >> ... > >> >> >> x = IFN_COND_ADD (cond, a, b); > >> >> >> ... > >> >> > > >> >> >> When this transformation is done on its own, the value of x for > >> >> >> !cond isn't important. > >> >> > > >> >> >> However, the patch also looks for the equivalent of: > >> >> > > >> >> >> y = cond ? x : a; > >> >> > > >> >> > As generated by predicate_all_scalar_phis, right? So this tries to > >> > capture > >> >> > > >> >> > if (cond) > >> >> > y = a / b; > >> >> > else > >> >> > y = a; > >> >> > > >> >> > But I think it would be more useful to represent the else value > >> > explicitely > >> >> > as extra operand given a constant like 0 is sth that will happen in > >> > practice > >> >> > and is also sth that is I think supported as a result by some > > targets. > >> >> > > >> >> > So to support the flow of current if-conversion you'd convert to > >> >> > > >> >> > IFN_COND_ADD (cond, a, b, a); // or any other reasonable "default" > >> > value > >> >> > (target controlled?) > >> >> > > >> >> > and you merge that with a single-use COND_EXPR by replacing that > >> > operand. > >> >> > > >> >> > If a target then cannot support the default value it needs to emit a > >> > blend > >> >> > afterwards. 
> >> > > >> >> Yeah, that's definitely better. I was a bit worried about the extra > >> >> generality, but it turns out that (on SVE) having the else argument and > >> >> getting the .md patterns to blend the input can avoid an unnecessary > >> >> move that would be difficult to remove otherwise. > >> > > >> > Btw, I've checked AVX512 and their masked operations either blend > >> > into the destination or zero the masked elements. Possibly useful > >> > for plus reductions. > > > >> OK. > > > >> >> I've committed the patches to add the else argument and the target hook > >> >> to select the preferred else value (thanks for the reviews). The patch > >> >> below now uses that preferred else value if there's no ?: or if the > >> >> else value of the ?: isn't available at the point of the conditional > >> >> operation. > >> > > >> >> This version also copes with: > >> > > >> >> if (cond) { ... x = a + b; ... } > >> >> y = !cond ? c : x; > >> > > >> >> by adding a new utility function inverse_conditions_p. > >> > > >> > interpret_as_condition_p is a bit of a layering violation in > > fold-const.c > >> > since it looks at SSA_NAME_DEF_STMT. In gimple.c we have a related > >> > thing, mainly used by forwprop - canonicalize_cond_expr_cond which is > >> > used to interpret a GENERIC folded thing as condition. > >> > > >> > Do you really need the def stmt lookup? > > > >> Yeah, since these internal functions always take a gimple value > >> as input, not a comparison code. But I can put the code elsewhere. > > > >> > Note that one item on my TODO list is to remove [VEC_]COND_EXPR from > >> > gimple in favor of using a quaternary assign stmt with tcc_comparison > > code > >> > so we'd have > >> > > >> > op0 = op1 < op2 ? op3 : op4; > >> > > >> > that means currently valid op0 = op1 < op2; would become > >> > a more explicit op0 = op1 < op2 ? true : false; > >> > > >> > That means you may, for example, need to handle _1 != 0 being > >> > fed by _1 = _2 < _3 ?
true : false; -- or simply rely on those > >> > to be combined. > >> > > >> > Sooo - maybe you feel like implementing the above? ;) > > > >> So is just requiring a gimple value in COND_EXPR and VEC_COND_EXPR > >> a non-starter? That seems neater IMO. (And probably replacing > >> them with a single code.) > > > > That's the alternative. OTOH with the above we have a single canonical > > way to compute the value of a comparison rather than two as we have now: > > _1 = _2 < _3; > > vs. > > _1 = _2 < _3 ? true : false; > > > > so I somewhat settled on that "fix" to the GENERIC expression in > > our [VEC_]COND_EXPRs. > > But the second feels as redundant as x + 0 to me :-) Booleans should > be 0 or 1 wherever they come from, whether it's directly from a > comparison, from a BIT_FIELD_REF, from a logical operation, etc. > So if we don't fold the second to the first at the moment, that seems > like a missed opportunity. One of the "achievements" of the re-defined COND_EXPRs
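To make the transform under discussion concrete for readers outside the vectoriser, the scalar semantics of IFN_COND_ADD with the explicit else operand can be modelled in plain C. This is only an illustrative sketch (cond_add is a made-up helper, not a GCC internal function):

```c
#include <assert.h>

/* Scalar model of IFN_COND_ADD (cond, a, b, els): the addition is
   performed only where the condition holds, so a potentially trapping
   operation is never executed for an inactive element; elsewhere the
   explicit else value flows through unchanged.  */
static double
cond_add (int cond, double a, double b, double els)
{
  return cond ? a + b : els;
}
```

Merging a single-use COND_EXPR then simply amounts to replacing the else operand, as the review suggests.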
[PATCH] Fix PR86017
The following fixes the testcase in PR86017 where failure to inline-expand single-byte memsets on GIMPLE causes us to miss store-merging opportunities. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-06-01 Richard Biener PR middle-end/86017 * gimple-fold.c (var_decl_component_p): Also allow offsetted vars wrapped in MEM_REFs. * gcc.dg/tree-ssa/pr86017.c: New testcase. Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 261055) +++ gcc/gimple-fold.c (working copy) @@ -632,7 +632,9 @@ var_decl_component_p (tree var) tree inner = var; while (handled_component_p (inner)) inner = TREE_OPERAND (inner, 0); - return SSA_VAR_P (inner); + return (DECL_P (inner) + || (TREE_CODE (inner) == MEM_REF + && TREE_CODE (TREE_OPERAND (inner, 0)) == ADDR_EXPR)); } /* If the SIZE argument representing the size of an object is in a range Index: gcc/testsuite/gcc.dg/tree-ssa/pr86017.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr86017.c (nonexistent) +++ gcc/testsuite/gcc.dg/tree-ssa/pr86017.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-store-merging" } */ + +void f (void*); + +void g (void) +{ + char a[8]; + __builtin_memset (a, 0, 8); + + f (a); +} + +void h (void) +{ + char a[8]; + __builtin_memset (a, 0, 1); + __builtin_memset (a + 1, 0, 1); + __builtin_memset (a + 2, 0, 1); + __builtin_memset (a + 3, 0, 1); + __builtin_memset (a + 4, 0, 1); + __builtin_memset (a + 5, 0, 1); + __builtin_memset (a + 6, 0, 1); + __builtin_memset (a + 7, 0, 1); + + f (a); +} + +/* { dg-final { scan-tree-dump "Merged into 1 stores" "store-merging" } } */
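For readers who want to convince themselves that the two testcase functions are equivalent, here is a minimal stand-alone sketch (the helper names are made up for illustration) showing that eight single-byte memsets leave the buffer in the same state as one eight-byte memset, which is exactly the store-merging opportunity the fix exposes:

```c
#include <assert.h>
#include <string.h>

/* One 8-byte clear, as in g () above.  */
static void
zero_whole (char *a)
{
  memset (a, 0, 8);
}

/* Eight 1-byte clears, as in h () above; once the single-byte memsets
   are inline-expanded on GIMPLE, the store-merging pass can fold them
   into a single wide store.  */
static void
zero_bytes (char *a)
{
  for (int i = 0; i < 8; i++)
    memset (a + i, 0, 1);
}
```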
Re: [patch] Enhance GIMPLE store-merging pass for bit-fields
On Thu, May 31, 2018 at 3:25 PM Eric Botcazou wrote: > > Hi, > > this enhances the GIMPLE store-merging pass by teaching it to deal with > generic stores to bit-fields, i.e. not just stores of immediates. The > motivating example is: > > struct S { > unsigned int flag : 1; > unsigned int size : 31; > }; > > void foo (struct S *s, unsigned int size) > { > s->flag = 1; > s->size = size & 0x7FFFFFFF; > } > > which may seem a little contrived but is the direct translation of something > very natural in Ada, and for which the compiler currently generates at -O2: > > orb $1, (%rdi) > leal (%rsi,%rsi), %eax > movl (%rdi), %esi > andl $1, %esi > orl %eax, %esi > movl %esi, (%rdi) > ret > > With the change, the generated code is optimal: > > leal 1(%rsi,%rsi), %esi > movl %esi, (%rdi) > ret > > The patch adds a 4th class of stores (with the BIT_INSERT_EXPR code) that can > be merged into groups containing other BIT_INSERT_EXPR or INTEGER_CST stores. > These stores are merged like constant stores, but the immediate value is not > updated (unlike the mask) and instead output_merged_store synthesizes the bit > insertion sequences from the original stores. It also contains a few cleanups > for the dumping code and other minor fixes. > > Tested on x86-64/Linux and SPARC/Solaris, OK for the mainline? Just a general remark, the many non-functional but stylistic changes do not make the patch easier to review ;) + /* If bit insertion is required, we use the source as an accumulator + into which the successive bit-field values are manually inserted. + FIXME: perhaps use BIT_INSERT_EXPR instead in some cases? */ so exactly - why not use BIT_INSERT_EXPR here? Some reasoning would be nice to have mentioned here. Is it because BIT_INSERT_EXPR will basically go the store_bit_field expansion route and thus generate the same code as the unmerged one? 
For code-generating adjacent inserts wouldn't it be more efficient to do accum = first-to-be-inserted-val; accum <<= shift-to-next-inserted-val; accum |= next-to-be-inserted-val; ... accum <<= final-shift; instead of shifting the to-be-inserted value? x86 with BMI2 should be able to expand the bit-inserts directly using pdep (it's a little bit of an awkward instruction though). BMI doesn't seem to have a single reg-to-bitfield insertion instruction mimicking bextr. Otherwise the patch looks OK to me. Thanks, Richard. > > 2018-05-31 Eric Botcazou > > * gimple-ssa-store-merging.c: Include gimple-fold.h. > (struct store_immediate_info): Document BIT_INSERT_EXPR stores. > (struct merged_store_group): Add bit_insertion field. > (dump_char_array): Use standard hexadecimal format. > (merged_store_group::merged_store_group): Set bit_insertion to false. > (merged_store_group::apply_stores): Use optimal buffer size. Deal > with BIT_INSERT_EXPR stores. Move up code updating the mask and > also print the mask in the dump file. > (pass_store_merging::gate): Minor tweak. > (imm_store_chain_info::coalesce_immediate): Fix wrong association > of stores with groups in dump. Allow coalescing of BIT_INSERT_EXPR > with INTEGER_CST stores. > (count_multiple_uses) : New case. > (imm_store_chain_info::output_merged_store): Add try_bitpos variable > and use it throughout. Generate bit insertion sequences if need be. > (pass_store_merging::process_store): Remove redundant condition. > Record store from a SSA name to a bit-field with BIT_INSERT_EXPR. > > > 2018-05-31 Eric Botcazou > > * gcc.dg/store_merging_20.c: New test. > * gnat.dg/opt71.adb: Likewise. > * gnat.dg/opt71_pkg.ads: New helper. > > -- > Eric Botcazou
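Concretely, for the struct S example above (flag in the low bit, size in bits 1..31), the accumulator scheme suggested in the review amounts to the following hand-written sketch. It mirrors the optimal single-leal sequence and is illustrative only, not the pass's actual output:

```c
#include <assert.h>

/* Pack struct S { unsigned int flag : 1; unsigned int size : 31; }
   into a word by shifting the accumulator between inserts rather
   than shifting each to-be-inserted value into place.  */
static unsigned int
pack_s (unsigned int size)
{
  unsigned int accum = size & 0x7FFFFFFF;  /* insert the size field */
  accum <<= 1;                             /* shift to the next field */
  accum |= 1;                              /* insert flag = 1 */
  return accum;                            /* equals (size << 1) | 1 */
}
```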
[PATCH] Do not modify DR_STMT in vectorizer pattern recog
Modifying DR_STMT stands in the way of re-using dataref and dependence analysis between vector size analyses. Thus this introduces vect_dr_stmt which wraps DR_STMT and returns the pattern stmt if appropriate. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-06-01 Richard Biener * tree-vectorizer.h (vect_dr_stmt): New function. (vect_get_load_cost): Adjust. (vect_get_store_cost): Likewise. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use vect_dr_stmt instead of DR_STMT. (vect_record_base_alignments): Likewise. (vect_calculate_target_alignment): Likewise. (vect_compute_data_ref_alignment): Likewise and make static. (vect_update_misalignment_for_peel): Likewise. (vect_verify_datarefs_alignment): Likewise. (vector_alignment_reachable_p): Likewise. (vect_get_data_access_cost): Likewise. Pass down vinfo to vect_get_load_cost/vect_get_store_cost instead of DR. (vect_get_peeling_costs_all_drs): Likewise. (vect_peeling_hash_get_lowest_cost): Likewise. (vect_enhance_data_refs_alignment): Likewise. (vect_find_same_alignment_drs): Likewise. (vect_analyze_data_refs_alignment): Likewise. (vect_analyze_group_access_1): Likewise. (vect_analyze_group_access): Likewise. (vect_analyze_data_ref_access): Likewise. (vect_analyze_data_ref_accesses): Likewise. (vect_vfa_segment_size): Likewise. (vect_small_gap_p): Likewise. (vectorizable_with_step_bound_p): Likewise. (vect_prune_runtime_alias_test_list): Likewise. (vect_analyze_data_refs): Likewise. (vect_supportable_dr_alignment): Likewise. * tree-vect-loop-manip.c (get_misalign_in_elems): Likewise. (vect_gen_prolog_loop_niters): Likewise. * tree-vect-loop.c (vect_analyze_loop_2): Likewise. * tree-vect-patterns.c (vect_recog_bool_pattern): Do not modify DR_STMT. (vect_recog_mask_conversion_pattern): Likewise. (vect_try_gather_scatter_pattern): Likewise. * tree-vect-stmts.c (vect_model_store_cost): Pass stmt_info to vect_get_store_cost. (vect_get_store_cost): Get stmt_info instead of DR. 
(vect_model_load_cost): Pass stmt_info to vect_get_load_cost. (vect_get_load_cost): Get stmt_info instead of DR. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c (revision 261061) +++ gcc/tree-vect-data-refs.c (working copy) @@ -290,8 +290,8 @@ vect_analyze_data_ref_dependence (struct struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); struct data_reference *dra = DDR_A (ddr); struct data_reference *drb = DDR_B (ddr); - stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra)); - stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb)); + stmt_vec_info stmtinfo_a = vinfo_for_stmt (vect_dr_stmt (dra)); + stmt_vec_info stmtinfo_b = vinfo_for_stmt (vect_dr_stmt (drb)); lambda_vector dist_v; unsigned int loop_depth; @@ -467,7 +467,8 @@ vect_analyze_data_ref_dependence (struct ... = a[i]; a[i+1] = ...; where loads from the group interleave with the store. */ - if (!vect_preserves_scalar_order_p (DR_STMT (dra), DR_STMT (drb))) + if (!vect_preserves_scalar_order_p (vect_dr_stmt(dra), + vect_dr_stmt (drb))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -618,9 +619,9 @@ vect_slp_analyze_data_ref_dependence (st /* If dra and drb are part of the same interleaving chain consider them independent. */ - if (STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (DR_STMT (dra))) - && (DR_GROUP_FIRST_ELEMENT (vinfo_for_stmt (DR_STMT (dra))) - == DR_GROUP_FIRST_ELEMENT (vinfo_for_stmt (DR_STMT (drb) + if (STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (vect_dr_stmt (dra))) + && (DR_GROUP_FIRST_ELEMENT (vinfo_for_stmt (vect_dr_stmt (dra))) + == DR_GROUP_FIRST_ELEMENT (vinfo_for_stmt (vect_dr_stmt (drb) return false; /* Unknown data dependence. 
*/ @@ -834,22 +835,21 @@ vect_record_base_alignments (vec_info *v unsigned int i; FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) { - gimple *stmt = DR_STMT (dr); -if (!DR_IS_CONDITIONAL_IN_STMT (dr) - && STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) - { - gimple *stmt = DR_STMT (dr); - vect_record_base_alignment (vinfo, stmt, &DR_INNERMOST (dr)); - - /* If DR is nested in the loop that is being vectorized, we can also - record the alignment of the base wrt the outer loop. */ - if (loop && nested_in_vect_loop_p (loop, stmt)) - { - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - vect_record
[PATCH] MIPS: Add i6500 processor as an alias for i6400
Hi, This patch adds i6500 CPU as an alias for i6400. Regards, Robert gcc/ChangeLog: 2018-06-01 Matthew Fortune * config/mips/mips-cpus.def: New MIPS_CPU for i6500. * config/mips/mips-tables.opt: Regenerate. * config/mips/mips.h (MIPS_ISA_LEVEL_SPEC): Mark i6500 as mips64r6. * doc/invoke.texi: Document -march=i6500. --- gcc/config/mips/mips-cpus.def | 1 + gcc/config/mips/mips-tables.opt | 3 +++ gcc/config/mips/mips.h | 2 +- gcc/doc/invoke.texi | 2 +- 4 files changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/config/mips/mips-cpus.def b/gcc/config/mips/mips-cpus.def index d0640e5..7314335 100644 --- a/gcc/config/mips/mips-cpus.def +++ b/gcc/config/mips/mips-cpus.def @@ -171,3 +171,4 @@ MIPS_CPU ("xlp", PROCESSOR_XLP, 65, PTF_AVOID_BRANCHLIKELY_SPEED) /* MIPS64 Release 6 processors. */ MIPS_CPU ("i6400", PROCESSOR_I6400, 69, 0) +MIPS_CPU ("i6500", PROCESSOR_I6400, 69, 0) diff --git a/gcc/config/mips/mips-tables.opt b/gcc/config/mips/mips-tables.opt index daccefb..d8e50b2 100644 --- a/gcc/config/mips/mips-tables.opt +++ b/gcc/config/mips/mips-tables.opt @@ -696,3 +696,6 @@ Enum(mips_arch_opt_value) String(xlp) Value(101) Canonical EnumValue Enum(mips_arch_opt_value) String(i6400) Value(102) Canonical +EnumValue +Enum(mips_arch_opt_value) String(i6500) Value(103) Canonical + diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index f290560..705434e 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -782,7 +782,7 @@ struct mips_cpu_info { %{march=mips64r2|march=loongson3a|march=octeon|march=xlp: -mips64r2} \ %{march=mips64r3: -mips64r3} \ %{march=mips64r5: -mips64r5} \ - %{march=mips64r6|march=i6400: -mips64r6}}" + %{march=mips64r6|march=i6400|march=i6500: -mips64r6}}" /* A spec that injects the default multilib ISA if no architecture is specified. 
*/ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 53ef14c..0d3a912 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -20220,7 +20220,7 @@ The processor names are: @samp{34kc}, @samp{34kf2_1}, @samp{34kf1_1}, @samp{34kn}, @samp{74kc}, @samp{74kf2_1}, @samp{74kf1_1}, @samp{74kf3_2}, @samp{1004kc}, @samp{1004kf2_1}, @samp{1004kf1_1}, -@samp{i6400}, +@samp{i6400}, @samp{i6500}, @samp{interaptiv}, @samp{loongson2e}, @samp{loongson2f}, @samp{loongson3a}, @samp{m4k}, -- 2.7.4
[PATCH] MIPS: Add support for -mcrc and -mginv options
Hi, This patch adds the -mcrc and -mginv options and passes them through to the assembler. Regards, Robert gcc/ChangeLog: 2018-06-01 Matthew Fortune * config/mips/mips.h (ASM_SPEC): Pass through -mcrc, -mno-crc, -mginv and -mno-ginv to the assembler. * config/mips/mips.opt (-mcrc): New option. (-mginv): Likewise. * doc/invoke.texi (-mcrc): Document. (-mginv): Likewise. --- gcc/config/mips/mips.h | 2 ++ gcc/config/mips/mips.opt | 8 gcc/doc/invoke.texi | 14 ++ 3 files changed, 24 insertions(+) diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index f290560..13c30f0 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -1354,6 +1354,8 @@ struct mips_cpu_info { %{meva} %{mno-eva} \ %{mvirt} %{mno-virt} \ %{mxpa} %{mno-xpa} \ +%{mcrc} %{mno-crc} \ +%{mginv} %{mno-ginv} \ %{mmsa} %{mno-msa} \ %{msmartmips} %{mno-smartmips} \ %{mmt} %{mno-mt} \ diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt index 545da54..5a9f255 100644 --- a/gcc/config/mips/mips.opt +++ b/gcc/config/mips/mips.opt @@ -412,6 +412,14 @@ mxpa Target Report Var(TARGET_XPA) Use eXtended Physical Address (XPA) instructions. +mcrc +Target Report Var(TARGET_CRC) +Use Cyclic Redundancy Check (CRC) instructions. + +mginv +Target Report Var(TARGET_GINV) +Use Global INValidate (GINV) instructions. + mvr4130-align Target Report Mask(VR4130_ALIGN) Perform VR4130-specific alignment optimizations. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 53ef14c..bcb4c14 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -888,6 +888,8 @@ Objective-C and Objective-C++ Dialects}. -meva -mno-eva @gol -mvirt -mno-virt @gol -mxpa -mno-xpa @gol +-mcrc -mno-crc @gol +-mginv -mno-ginv @gol -mmicromips -mno-micromips @gol -mmsa -mno-msa @gol -mfpu=@var{fpu-type} @gol @@ -20701,6 +20703,18 @@ Use (do not use) the MIPS Virtualization (VZ) instructions. @opindex mno-xpa Use (do not use) the MIPS eXtended Physical Address (XPA) instructions. 
+@item -mcrc +@itemx -mno-crc +@opindex mcrc +@opindex mno-crc +Use (do not use) the MIPS Cyclic Redundancy Check (CRC) instructions. + +@item -mginv +@itemx -mno-ginv +@opindex mginv +@opindex mno-ginv +Use (do not use) the MIPS Global INValidate (GINV) instructions. + @item -mlong64 @opindex mlong64 Force @code{long} types to be 64 bits wide. See @option{-mlong32} for -- 2.7.4
[PATCH] MIPS: Update I6400 scheduler
Hi, This patch updates the i6400 scheduler. Regards, Robert gcc/ChangeLog: 2018-06-01 Prachi Godbole * config/mips/i6400.md (i6400_gpmuldiv): Remove cpu_unit. (i6400_gpmul): Add cpu_unit. (i6400_gpdiv): Likewise. (i6400_msa_add_d): Update reservations. (i6400_msa_int_add): Likewise. (i6400_msa_short_logic3): Likewise. (i6400_msa_short_logic2): Likewise. (i6400_msa_short_logic): Likewise. (i6400_msa_move): Likewise. (i6400_msa_cmp): Likewise. (i6400_msa_short_float2): Likewise. (i6400_msa_div_d): Likewise. (i6400_msa_long_logic1): Likewise. (i6400_msa_long_logic2): Likewise. (i6400_msa_mult): Likewise. (i6400_msa_long_float2): Likewise. (i6400_msa_long_float4): Likewise. (i6400_msa_long_float5): Likewise. (i6400_msa_long_float8): Likewise. (i6400_fpu_minmax): New define_insn_reservation. (i6400_fpu_fadd): Include frint type. (i6400_fpu_store): New define_insn_reservation. (i6400_fpu_load): Likewise. (i6400_fpu_move): Likewise. (i6400_fpu_fcmp): Likewise. (i6400_fpu_fmadd): Likewise. (i6400_int_mult): Include imul3nc type and update reservation. (i6400_int_div): Include idiv3 type and update reservation. (i6400_int_load): Update to check type not move_type. (i6400_int_store): Likewise. (i6400_int_prefetch): Set zero latency. 
--- gcc/config/mips/i6400.md | 86 ++-- 1 file changed, 61 insertions(+), 25 deletions(-) diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md index 413e9e8..a985401 100644 --- a/gcc/config/mips/i6400.md +++ b/gcc/config/mips/i6400.md @@ -21,7 +21,7 @@ (define_automaton "i6400_int_pipe, i6400_mdu_pipe, i6400_fpu_short_pipe, i6400_fpu_long_pipe") -(define_cpu_unit "i6400_gpmuldiv" "i6400_mdu_pipe") +(define_cpu_unit "i6400_gpmul, i6400_gpdiv" "i6400_mdu_pipe") (define_cpu_unit "i6400_agen, i6400_alu1, i6400_lsu" "i6400_int_pipe") (define_cpu_unit "i6400_control, i6400_ctu, i6400_alu0" "i6400_int_pipe") @@ -50,49 +50,49 @@ (define_insn_reservation "i6400_msa_add_d" 1 (and (eq_attr "cpu" "i6400") (and (eq_attr "mode" "!V2DI") (eq_attr "alu_type" "simd_add"))) - "i6400_fpu_short, i6400_fpu_intadd") + "i6400_fpu_short+i6400_fpu_intadd*2") ;; add, hadd, sub, hsub, average, min, max, compare (define_insn_reservation "i6400_msa_int_add" 2 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_int_arith")) - "i6400_fpu_short, i6400_fpu_intadd") + "i6400_fpu_short+i6400_fpu_intadd*2") ;; sat, pcnt (define_insn_reservation "i6400_msa_short_logic3" 3 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_sat,simd_pcnt")) - "i6400_fpu_short, i6400_fpu_logic") + "i6400_fpu_short+i6400_fpu_logic*2") ;; shifts, nloc, nlzc, bneg, bclr, shf (define_insn_reservation "i6400_msa_short_logic2" 2 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_shift,simd_shf,simd_bit")) - "i6400_fpu_short, i6400_fpu_logic") + "i6400_fpu_short+i6400_fpu_logic*2") ;; and, or, xor, ilv, pck, fill, splat (define_insn_reservation "i6400_msa_short_logic" 1 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_permute,simd_logic,simd_splat,simd_fill")) - "i6400_fpu_short, i6400_fpu_logic") + "i6400_fpu_short+i6400_fpu_logic*2") ;; move.v, ldi (define_insn_reservation "i6400_msa_move" 1 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_move")) - "i6400_fpu_short, i6400_fpu_logic") + 
"i6400_fpu_short+i6400_fpu_logic*2") ;; Float compare New: CMP.cond.fmt (define_insn_reservation "i6400_msa_cmp" 2 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_fcmp")) - "i6400_fpu_short, i6400_fpu_cmp") + "i6400_fpu_short+i6400_fpu_cmp*2") ;; Float min, max, class (define_insn_reservation "i6400_msa_short_float2" 2 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_fminmax,simd_fclass")) - "i6400_fpu_short, i6400_fpu_float") + "i6400_fpu_short+i6400_fpu_float*2") ;; div.d, mod.d (non-pipelined) (define_insn_reservation "i6400_msa_div_d" 36 @@ -158,43 +158,43 @@ (define_insn_reservation "i6400_fpu_msa_move" 1 (define_insn_reservation "i6400_msa_long_logic1" 1 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_bitmov,simd_insert")) - "i6400_fpu_long, i6400_fpu_logic_l") + "i6400_fpu_long+i6400_fpu_logic_l*2") ;; binsl, binsr, vshf, sld (define_insn_reservation "i6400_msa_long_logic2" 2 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_bitins,simd_sld")) - "i6400_fpu_long, i6400_fpu_logic_l") + "i6400_fpu_long+i6400_fpu_logic_l*2") ;; Vector mul, dotp, madd, msub (define_insn_reservation "i6400_msa_mult" 5 (and (eq_attr "cpu" "i6400") (eq_attr "type" "simd_mul")) - "i6400_fpu_long, i6400_fpu_mult") + "i6400_fpu_long+i6400_fpu_mult*2") ;; Float flog2 (defin
[PATCH] MIPS: Add support for P6600
Hi, The below adds support for -march=p6600. It includes a new scheduler plus performance tweaks. gcc/ChangeLog: 2018-06-01 Matthew Fortune Prachi Godbole * config/mips/mips-cpus.def: Define P6600. * config/mips/mips-tables.opt: Regenerate. * config/mips/mips.c (mips_multipass_dfa_lookahead): Add P6600. * config/mips/mips.h (TUNE_P6600): New define. (MIPS_ISA_LEVEL_SPEC): Infer mips64r6 from p6600. * doc/invoke.texi: Add p6600 to supported architectures. --- gcc/config/mips/mips-cpus.def | 1 + gcc/config/mips/mips-tables.opt | 3 + gcc/config/mips/mips.c | 84 +- gcc/config/mips/mips.h | 6 +- gcc/config/mips/mips.md | 2 + gcc/config/mips/p6600.md| 342 gcc/doc/invoke.texi | 2 +- 7 files changed, 434 insertions(+), 6 deletions(-) create mode 100644 gcc/config/mips/p6600.md diff --git a/gcc/config/mips/mips-cpus.def b/gcc/config/mips/mips-cpus.def index d0640e5..c6d27b6 100644 --- a/gcc/config/mips/mips-cpus.def +++ b/gcc/config/mips/mips-cpus.def @@ -171,3 +171,4 @@ MIPS_CPU ("xlp", PROCESSOR_XLP, 65, PTF_AVOID_BRANCHLIKELY_SPEED) /* MIPS64 Release 6 processors. */ MIPS_CPU ("i6400", PROCESSOR_I6400, 69, 0) +MIPS_CPU ("p6600", PROCESSOR_P6600, 69, 0) diff --git a/gcc/config/mips/mips-tables.opt b/gcc/config/mips/mips-tables.opt index daccefb..edf06af 100644 --- a/gcc/config/mips/mips-tables.opt +++ b/gcc/config/mips/mips-tables.opt @@ -696,3 +696,6 @@ Enum(mips_arch_opt_value) String(xlp) Value(101) Canonical EnumValue Enum(mips_arch_opt_value) String(i6400) Value(102) Canonical +EnumValue +Enum(mips_arch_opt_value) String(p6600) Value(105) Canonical + diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index bfe64bb..9c77c13 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -198,6 +198,15 @@ enum mips_address_type { ADDRESS_SYMBOLIC }; +/* Classifies an unconditional branch of interest for the P6600. */ + +enum mips_ucbranch_type { + /* May not even be a branch. 
*/ + UC_UNDEFINED, + UC_BALC, + UC_OTHER +}; + /* Macros to create an enumeration identifier for a function prototype. */ #define MIPS_FTYPE_NAME1(A, B) MIPS_##A##_FTYPE_##B #define MIPS_FTYPE_NAME2(A, B, C) MIPS_##A##_FTYPE_##B##_##C @@ -1127,6 +1136,19 @@ static const struct mips_rtx_cost_data COSTS_N_INSNS (36), /* int_div_di */ 2,/* branch_cost */ 4 /* memory_latency */ + }, + { /* P6600 */ +COSTS_N_INSNS (4),/* fp_add */ +COSTS_N_INSNS (5),/* fp_mult_sf */ +COSTS_N_INSNS (5),/* fp_mult_df */ +COSTS_N_INSNS (17), /* fp_div_sf */ +COSTS_N_INSNS (17), /* fp_div_df */ +COSTS_N_INSNS (5),/* int_mult_si */ +COSTS_N_INSNS (5),/* int_mult_di */ +COSTS_N_INSNS (8),/* int_div_si */ +COSTS_N_INSNS (8),/* int_div_di */ + 2,/* branch_cost */ + 4 /* memory_latency */ } }; @@ -14592,6 +14614,7 @@ mips_issue_rate (void) case PROCESSOR_LOONGSON_2F: case PROCESSOR_LOONGSON_3A: case PROCESSOR_P5600: +case PROCESSOR_P6600: return 4; case PROCESSOR_XLP: @@ -14727,7 +14750,7 @@ mips_multipass_dfa_lookahead (void) if (TUNE_OCTEON) return 2; - if (TUNE_P5600 || TUNE_I6400) + if (TUNE_P5600 || TUNE_P6600 || TUNE_I6400) return 4; return 0; @@ -18647,6 +18670,28 @@ mips_orphaned_high_part_p (mips_offset_table *htab, rtx_insn *insn) return false; } +/* Subroutine of mips_avoid_hazard. We classify unconditional branches + of interest for the P6600 for performance reasons. We're interested + in differentiating BALC from JIC, JIALC and BC. */ + +static enum mips_ucbranch_type +mips_classify_branch_p6600 (rtx_insn *insn) +{ + if (!(insn + && USEFUL_INSN_P (insn) + && GET_CODE (PATTERN (insn)) != SEQUENCE)) +return UC_UNDEFINED; + + if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC. */ + || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose jump. */ +return UC_OTHER; + + if (CALL_P (insn) && get_attr_jal (insn) == JAL_DIRECT) +return UC_BALC; + + return UC_UNDEFINED; +} + /* Subroutine of mips_reorg_process_insns. 
If there is a hazard between INSN and a previous instruction, avoid it by inserting nops after instruction AFTER. @@ -18699,14 +18744,36 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, int *hilo_delay, && GET_CODE (pattern) != ASM_INPUT && asm_noperands (pattern) < 0) nops = 1; + /* The P6600's branch predictor does not handle certain static + sequences of back-to-back jumps well. Inserting a no-op only + costs space as the dispatch unit will disregard the nop. + Here we handle
Re: [RFC][PR64946] "abs" vectorization fails for char/short types
On Fri, Jun 1, 2018 at 4:12 AM Kugan Vivekanandarajah wrote: > > Hi Richard, > > This is the revised patch based on the review and the discussion in > https://gcc.gnu.org/ml/gcc/2018-05/msg00179.html. > > In summary: > - I skipped (element_precision (type) < element_precision (TREE_TYPE > (@0))) in the match.pd pattern as this would prevent transformation > for the case in PR. > that is, what I am interested in is something like: > char t = (char) ABS_EXPR <(int) x> > and I want to generate > char t = (char) ABSU_EXPR > > - I also haven't added all the necessary match.pd changes for > ABSU_EXPR. I have a patch for that but will submit separately based on > this review. > > - I also tried to add ABSU_EXPR support in the places as necessary by > grepping for ABS_EXPR. > > - I also had to add special casing in vectorizer for ABSU_EXPR as its > result is of unsigned type. > > Is this OK? Patch bootstraps and the regression test is ongoing. The c/c-typeck.c:build_unary_op change looks unnecessary - the C FE should never generate this directly (the c-common one might be triggered by early folding I guess). @@ -1761,6 +1762,9 @@ const_unop (enum tree_code code, tree type, tree arg0) if (TREE_CODE (arg0) == INTEGER_CST || TREE_CODE (arg0) == REAL_CST) return fold_abs_const (arg0, type); break; +case ABSU_EXPR: + return fold_convert (type, fold_abs_const (arg0, + signed_type_for (type))); case CONJ_EXPR: I think this will get you bogus TREE_OVERFLOW flags set on ABSU (-INT_MIN). I think you want to change fold_abs_const to properly deal with arg0 being signed and type unsigned. That is, sth like diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 6f80f1b1d69..f60f9c77e91 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13843,18 +13843,19 @@ fold_abs_const (tree arg0, tree type) { /* If the value is unsigned or non-negative, then the absolute value is the same as the ordinary value. 
*/ - if (!wi::neg_p (wi::to_wide (arg0), TYPE_SIGN (type))) - t = arg0; + wide_int val = wi::to_wide (arg0); + bool overflow = false; + if (!wi::neg_p (val, TYPE_SIGN (TREE_TYPE (arg0 + ; /* If the value is negative, then the absolute value is its negation. */ else - { - bool overflow; - wide_int val = wi::neg (wi::to_wide (arg0), &overflow); - t = force_fit_type (type, val, -1, - overflow | TREE_OVERFLOW (arg0)); - } + wide_int val = wi::neg (val, &overflow); + + /* Force to the destination type, set TREE_OVERFLOW for signed + TYPE only. */ + t = force_fit_type (type, val, 1, overflow | TREE_OVERFLOW (arg0)); } break; and then simply share the const_unop code with ABS_EXPR. diff --git a/gcc/match.pd b/gcc/match.pd index 14386da..7d7c132 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -102,6 +102,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (match (nop_convert @0) @0) +(simplify (abs (convert @0)) + (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && !TYPE_UNSIGNED (TREE_TYPE (@0)) + && !TYPE_UNSIGNED (type)) + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); } + (convert (absu:utype @0) + + please put a comment before the pattern. I believe there's no need to check for !TYPE_UNSIGNED (type). Note this also converts abs ((char)int-var) to (char)absu(int-var) which doesn't make sense. The original issue you want to address here is the case where TYPE_PRECISION of @0 is less than the precision of type. That is, you want to remove language introduced integer promotion of @0 which only is possible with ABSU. So please do add such precision check (I simply suggested the bogus direction of the test). 
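The semantics that make ABSU_EXPR attractive here, taking the absolute value in the unsigned type so that even the most negative input has a well-defined result and no overflow flag is needed, can be modelled in plain C (absu below is a hand-written helper for illustration, not a GCC function):

```c
#include <assert.h>
#include <limits.h>

/* Absolute value of a signed int delivered as unsigned.  The
   negation 0u - x happens in unsigned arithmetic, so unlike abs ()
   the result is well defined even for INT_MIN.  */
static unsigned int
absu (int x)
{
  return x < 0 ? 0u - (unsigned int) x : (unsigned int) x;
}
```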
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 68f4fd3..9b62583 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -3685,6 +3685,12 @@ verify_gimple_assign_unary (gassign *stmt) case PAREN_EXPR: case CONJ_EXPR: break; +case ABSU_EXPR: + if (!TYPE_UNSIGNED (lhs_type) + || !ANY_INTEGRAL_TYPE_P (rhs1_type)) if (!ANY_INTEGRAL_TYPE_P (lhs_type) || !TYPE_UNSIGNED (lhs_type) || !ANY_INTEGRAL_TYPE_P (rhs1_type) || TYPE_UNSIGNED (rhs1_type) || element_precision (lhs_type) != element_precision (rhs1_type)) { error ("invalid types for ABSU_EXPR"); debug_generic_expr (lhs_type); debug_generic_expr (rhs1_type); return true; } + return true; + return false; + break; diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c index 30c6d9e..44b1399 100644 --- a/gcc/tree-eh.c +++ b/gcc/tree-eh.c @@ -2465,6 +2465,7 @@ operation_could_trap_helper_p (enum tree_code op, case NEGATE_EXPR: case ABS_EXPR: +case ABSU_EXPR: case CONJ_EXPR: /* These operations don't trap with floating point. */ if (honor_t
Re: [PATCH] DWARF5: Don't generate DW_AT_loclists_base for split compile unit DIEs.
On Sat, 2018-05-26 at 21:31 +0200, Mark Wielaard wrote: > The loclists_base attribute is used to point to the beginning of the > loclists index of a DWARF5 loclists table when using DW_FORM_loclistsx. > For split compile units the base is not given by the attribute, but is > either the first (and only) index in the .debug_loclists.dwo section, > or (when placed in a .dwp file) given by the DW_SECT_LOCLISTS row in > the .debug_cu_index section. > > The loclists_base attribute is only valid for the full (or skeleton) > compile unit DIE in the main (relocatable) object. But GCC only ever > generates a loclists table index for the .debug_loclists section put > into the split DWARF .dwo file. > > For split compile unit DIEs it is confusing (and not according to spec) > to also have a DW_AT_loclists_base attribute (which might be wrong, > since its relocatable offset won't actually be relocated). Ping. > gcc/ChangeLog > > * dwarf2out.c (dwarf2out_finish): Remove generation of > DW_AT_loclists_base. > --- > > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index c05bfe4..103ded0 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -31292,11 +31292,17 @@ dwarf2out_finish (const char *) > if (dwarf_split_debug_info) > { > if (have_location_lists) > > { > > - if (dwarf_version >= 5) > > - add_AT_loclistsptr (comp_unit_die (), DW_AT_loclists_base, > > - loc_section_label); > > + /* Since we generate the loclists in the split DWARF .dwo > > + file itself, we don't need to generate a loclists_base > > + attribute for the split compile unit DIE. That attribute > > + (and using relocatable sec_offset FORMs) isn't allowed > > + for a split compile unit. Only if the .debug_loclists > > + section was in the main file, would we need to generate a > > + loclists_base attribute here (for the full or skeleton > > + unit DIE). */ > + > > /* optimize_location_lists calculates the size of the lists, > > so index them first, and assign indices to the entries. 
> > Although optimize_location_lists will remove entries from > > the table, it only does so for duplicates, and therefore
Fix phi backedge detection in backprop (PR85989)
This PR is a nasty wrong code bug due to my fluffing a test for a backedge in gimple-ssa-backprop.c. Backedges are supposed to be from definitions in the statement we're visiting to uses in statements that we haven't visited yet. However, the check failed to account for PHIs in the current block that had already been processed, so if two PHIs in the same block referenced each other, we'd treat both references as backedges. In more detail: The first phase of the pass goes through all statements in an arbitrary order, making optimistic assumptions about any statements that haven't been visited yet. The second phase then calculates a correct (supposedly maximal) fixed point. Although the first phase order is arbitrary in principle, we still use the CFG rpo to cut down on the backedges. This means that the only thing that's truly arbitrary is the order that we process the PHIs in a block. Any order should be OK and should eventually give the same results. But we have to follow whatever order we pick, and the pass wasn't doing that. Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and x86_64-linux-gnu. OK for trunk and all release branches? Sorry for what in hindsight was a really stupid bug. Richard 2018-06-01 Richard Sandiford gcc/ PR tree-optimization/85989 * gimple-ssa-backprop.c (backprop::m_visited_phis): New member variable. (backprop::intersect_uses): Check it when deciding whether this is a backedge reference. (backprop::process_block): Add each phi to m_visited_phis after visiting it, then clear it at the end. gcc/testsuite/ PR tree-optimization/85989 * gcc.dg/torture/pr85989.c: New test. Index: gcc/gimple-ssa-backprop.c === --- gcc/gimple-ssa-backprop.c 2018-05-18 09:26:37.726714678 +0100 +++ gcc/gimple-ssa-backprop.c 2018-06-01 13:36:04.223449400 +0100 @@ -260,6 +260,11 @@ dump_usage_info (FILE *file, tree var, u post-order walk. 
*/ auto_sbitmap m_visited_blocks; + /* A bitmap of phis that we have finished processing in the initial + post-order walk, excluding those from blocks mentioned in + M_VISITED_BLOCKS. */ + auto_bitmap m_visited_phis; + /* A worklist of SSA names whose definitions need to be reconsidered. */ auto_vec<tree> m_worklist; @@ -496,8 +501,11 @@ backprop::intersect_uses (tree var, usag { if (is_gimple_debug (stmt)) continue; - if (is_a <gphi *> (stmt) - && !bitmap_bit_p (m_visited_blocks, gimple_bb (stmt)->index)) + gphi *phi = dyn_cast <gphi *> (stmt); + if (phi + && !bitmap_bit_p (m_visited_blocks, gimple_bb (phi)->index) + && !bitmap_bit_p (m_visited_phis, + SSA_NAME_VERSION (gimple_phi_result (phi)))) { /* Skip unprocessed phis. */ if (dump_file && (dump_flags & TDF_DETAILS)) @@ -505,7 +513,7 @@ backprop::intersect_uses (tree var, usag fprintf (dump_file, "[BACKEDGE] "); print_generic_expr (dump_file, var); fprintf (dump_file, " in "); - print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); + print_gimple_stmt (dump_file, phi, 0, TDF_SLIM); } } else @@ -629,7 +637,12 @@ backprop::process_block (basic_block bb) } for (gphi_iterator gpi = gsi_start_phis (bb); !gsi_end_p (gpi); gsi_next (&gpi)) -process_var (gimple_phi_result (gpi.phi ())); +{ + tree result = gimple_phi_result (gpi.phi ()); + process_var (result); + bitmap_set_bit (m_visited_phis, SSA_NAME_VERSION (result)); +} + bitmap_clear (m_visited_phis); } /* Delete the definition of VAR, which has no uses.
*/ Index: gcc/testsuite/gcc.dg/torture/pr85989.c === --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/torture/pr85989.c 2018-06-01 13:36:04.223449400 +0100 @@ -0,0 +1,31 @@ +/* { dg-do run } */ + +#define N 9 + +void __attribute__((noipa)) +f (double x, double y, double *res) +{ + y = -y; + for (int i = 0; i < N; ++i) +{ + double tmp = y; + y = x; + x = tmp; + res[i] = i; +} + res[N] = y * y; + res[N + 1] = x; +} + +int +main (void) +{ + double res[N + 2]; + f (10, 20, res); + for (int i = 0; i < N; ++i) +if (res[i] != i) + __builtin_abort (); + if (res[N] != 100 || res[N + 1] != -20) +__builtin_abort (); + return 0; +}
Re: Fix phi backedge detection in backprop (PR85989)
On Fri, Jun 1, 2018 at 2:38 PM Richard Sandiford wrote: > > This PR is a nasty wrong code bug due to my fluffing a test for a > backedge in gimple-ssa-backprop.c. Backedges are supposed to be > from definitions in the statement we're visiting to uses in statements > that we haven't visited yet. However, the check failed to account for > PHIs in the current block that had already been processed, so if two > PHIs in the same block referenced each other, we'd treat both > references as backedges. > > In more detail: > > The first phase of the pass goes through all statements in an arbitrary > order, making optimistic assumptions about any statements that haven't > been visited yet. The second phase then calculates a correct > (supposedly maximal) fixed point. > > Although the first phase order is arbitrary in principle, we still use > the CFG rpo to cut down on the backedges. This means that the only > thing that's truly arbitrary is the order that we process the PHIs > in a block. Any order should be OK and should eventually give the > same results. But we have to follow whatever order we pick, > and the pass wasn't doing that. > > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf > and x86_64-linux-gnu. OK for trunk and all release branches? OK. Thanks, Richard. > Sorry for what in hindsight was a really stupid bug. > > Richard > > > > 2018-06-01 Richard Sandiford > > gcc/ > PR tree-optimization/85989 > * gimple-ssa-backprop.c (backprop::m_visited_phis): New member > variable. > (backprop::intersect_uses): Check it when deciding whether this > is a backedge reference. > (backprop::process_block): Add each phi to m_visited_phis > after visiting it, then clear it at the end. > > gcc/testsuite/ > PR tree-optimization/85989 > * gcc.dg/torture/pr85989.c: New test. 
> > Index: gcc/gimple-ssa-backprop.c > === > --- gcc/gimple-ssa-backprop.c 2018-05-18 09:26:37.726714678 +0100 > +++ gcc/gimple-ssa-backprop.c 2018-06-01 13:36:04.223449400 +0100 > @@ -260,6 +260,11 @@ dump_usage_info (FILE *file, tree var, u > post-order walk. */ >auto_sbitmap m_visited_blocks; > > + /* A bitmap of phis that we have finished processing in the initial > + post-order walk, excluding those from blocks mentioned in > + M_VISITED_BLOCKS. */ > + auto_bitmap m_visited_phis; > + >/* A worklist of SSA names whose definitions need to be reconsidered. */ >auto_vec m_worklist; > > @@ -496,8 +501,11 @@ backprop::intersect_uses (tree var, usag > { >if (is_gimple_debug (stmt)) > continue; > - if (is_a (stmt) > - && !bitmap_bit_p (m_visited_blocks, gimple_bb (stmt)->index)) > + gphi *phi = dyn_cast (stmt); > + if (phi > + && !bitmap_bit_p (m_visited_blocks, gimple_bb (phi)->index) > + && !bitmap_bit_p (m_visited_phis, > + SSA_NAME_VERSION (gimple_phi_result (phi > { > /* Skip unprocessed phis. */ > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -505,7 +513,7 @@ backprop::intersect_uses (tree var, usag > fprintf (dump_file, "[BACKEDGE] "); > print_generic_expr (dump_file, var); > fprintf (dump_file, " in "); > - print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); > + print_gimple_stmt (dump_file, phi, 0, TDF_SLIM); > } > } >else > @@ -629,7 +637,12 @@ backprop::process_block (basic_block bb) > } >for (gphi_iterator gpi = gsi_start_phis (bb); !gsi_end_p (gpi); > gsi_next (&gpi)) > -process_var (gimple_phi_result (gpi.phi ())); > +{ > + tree result = gimple_phi_result (gpi.phi ()); > + process_var (result); > + bitmap_set_bit (m_visited_phis, SSA_NAME_VERSION (result)); > +} > + bitmap_clear (m_visited_phis); > } > > /* Delete the definition of VAR, which has no uses. 
*/ > Index: gcc/testsuite/gcc.dg/torture/pr85989.c > === > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/torture/pr85989.c 2018-06-01 13:36:04.223449400 > +0100 > @@ -0,0 +1,31 @@ > +/* { dg-do run } */ > + > +#define N 9 > + > +void __attribute__((noipa)) > +f (double x, double y, double *res) > +{ > + y = -y; > + for (int i = 0; i < N; ++i) > +{ > + double tmp = y; > + y = x; > + x = tmp; > + res[i] = i; > +} > + res[N] = y * y; > + res[N + 1] = x; > +} > + > +int > +main (void) > +{ > + double res[N + 2]; > + f (10, 20, res); > + for (int i = 0; i < N; ++i) > +if (res[i] != i) > + __builtin_abort (); > + if (res[N] != 100 || res[N + 1] != -20) > +__builtin_abort (); > + return 0; > +}
Ping: Extend tree code folds to IFN_COND_*
Ping for the below, which is the only unreviewed IFN_COND_* patch. Thanks for the reviews of the others. Richard Sandiford writes: > This patch adds match.pd support for applying normal folds to their > IFN_COND_* forms. E.g. the rule: > > (plus @0 (negate @1)) -> (minus @0 @1) > > also allows the fold: > > (IFN_COND_ADD @0 @1 (negate @2) @3) -> (IFN_COND_SUB @0 @1 @2 @3) > > Actually doing this by direct matches in gimple-match.c would > probably lead to combinatorial explosion, so instead, the patch > makes gimple_match_op carry a condition under which the operation > happens ("cond"), and the value to use when the condition is false > ("else_value"). Thus in the example above we'd do the following > > (a) convert: > > cond:NULL_TREE (IFN_COND_ADD @0 @1 @4 @3) else_value:NULL_TREE > > to: > > cond:@0 (plus @1 @4) else_value:@3 > > (b) apply gimple_resimplify to (plus @1 @4) > > (c) reintroduce cond and else_value when constructing the result. > > Nested operations inherit the condition of the outer operation > (so that we don't introduce extra faults) but have a null else_value. > If we try to build such an operation, the target gets to choose what > else_value it can handle efficiently: obvious choices include one of > the operands or a zero constant. (The alternative would be to have some > representation for an undefined value, but that seems a bit invasive, > and isn't likely to be useful here.) > > I've made the condition a mandatory part of the gimple_match_op > constructor so that it doesn't accidentally get dropped. > > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf > and x86_64-linux-gnu. OK to install? > > Richard > > > 2018-05-24 Richard Sandiford > > gcc/ > * target.def (preferred_else_value): New target hook. > * doc/tm.texi.in (TARGET_PREFERRED_ELSE_VALUE): New hook. > * doc/tm.texi: Regenerate. > * targhooks.h (default_preferred_else_value): Declare. > * targhooks.c (default_preferred_else_value): New function. 
> * internal-fn.h (conditional_internal_fn_code): Declare. > * internal-fn.c (FOR_EACH_CODE_MAPPING): New macro. > (get_conditional_internal_fn): Use it. > (conditional_internal_fn_code): New function. > * gimple-match.h (gimple_match_cond): New struct. > (gimple_match_op): Add a cond member function. > (gimple_match_op::gimple_match_op): Update all forms to take a > gimple_match_cond. > * genmatch.c (expr::gen_transform): Use the same condition as res_op > for the suboperation, but don't specify a particular else_value. > * tree-ssa-sccvn.c (vn_nary_simplify, vn_reference_lookup_3) > (visit_nary_op, visit_reference_op_load): Pass > gimple_match_cond::UNCOND to the gimple_match_op constructor. > * gimple-match-head.c: Include tree-eh.h > (convert_conditional_op): New function. > (maybe_resimplify_conditional_op): Likewise. > (gimple_resimplify1): Call maybe_resimplify_conditional_op. > (gimple_resimplify2): Likewise. > (gimple_resimplify3): Likewise. > (gimple_resimplify4): Likewise. > (maybe_push_res_to_seq): Return null for conditional operations. > (try_conditional_simplification): New function. > (gimple_simplify): Call it. Pass conditions to the gimple_match_op > constructor. > * match.pd: Fold VEC_COND_EXPRs of an IFN_COND_* call to a new > IFN_COND_* call. > * config/aarch64/aarch64.c (aarch64_preferred_else_value): New > function. > (TARGET_PREFERRED_ELSE_VALUE): Redefine. > > gcc/testsuite/ > * gcc.dg/vect/vect-cond-arith-2.c: New test. > * gcc.target/aarch64/sve/loop_add_6.c: Likewise. > > Index: gcc/target.def > === > --- gcc/target.def2018-05-01 19:30:30.159632586 +0100 > +++ gcc/target.def2018-05-24 10:33:30.871095132 +0100 > @@ -2040,6 +2040,25 @@ HOOK_VECTOR_END (vectorize) > #define HOOK_PREFIX "TARGET_" > > DEFHOOK > +(preferred_else_value, > + "This hook returns the target's preferred final argument for a call\n\ > +to conditional internal function @var{ifn} (really of type\n\ > +@code{internal_fn}). 
@var{type} specifies the return type of the\n\ > +function and @var{ops} are the operands to the conditional operation,\n\ > +of which there are @var{nops}.\n\ > +\n\ > +For example, if @var{ifn} is @code{IFN_COND_ADD}, the hook returns\n\ > +a value of type @var{type} that should be used when @samp{@var{ops}[0]}\n\ > +and @samp{@var{ops}[1]} are conditionally added together.\n\ > +\n\ > +This hook is only relevant if the target supports conditional patterns\n\ > +like @code{cond_add@var{m}}. The default implementation returns a zero\n\ > +constant of type @var{type}.", > + tree, > + (unsigned ifn, tree type, unsigned nops, tree *ops), > + default_preferred_else_value) > + > +DEFHOOK > (record_offload_symbol, > "Used when offloa
Re: [patch] Enhance GIMPLE store-merging pass for bit-fields
On Fri, 1 Jun 2018, Richard Biener wrote: > For code-generating adjacent inserts wouldn't it be more efficient > to do > > accum = first-to-be-inserted-val; > accum <<= shift-to-next-inserted-val; > accum |= next-to-be-inserted-val; > ... > accum <<= final-shift; > > instead of shifting the to-be-inserted value? Perhaps I'm missing some context, but I don't see why this is expected to be more efficient. This sequence has both ors and shifts on the critical path, whereas shifting to-be-inserted values has more parallelism (shifts are then independent of ors), and even allows ors to be reassociated, shortening the critical path even more. Alexander
Re: [RFC][PR82479] missing popcount builtin detection
On Fri, Jun 1, 2018 at 12:06 PM Bin.Cheng wrote: > > On Fri, Jun 1, 2018 at 9:56 AM, Kugan Vivekanandarajah > wrote: > > Hi Bin, > > > > Thanks a lo for the review. > > > > On 1 June 2018 at 03:45, Bin.Cheng wrote: > >> On Thu, May 31, 2018 at 3:51 AM, Kugan Vivekanandarajah > >> wrote: > >>> Hi Bin, > >>> > >>> Thanks for the review. Please find the revised patch based on the > >>> review comments. > >>> > >>> Thanks, > >>> Kugan > >>> > >>> On 17 May 2018 at 19:56, Bin.Cheng wrote: > On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah > wrote: > > Hi Richard, > > > > On 6 March 2018 at 02:24, Richard Biener > > wrote: > >> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah > >> wrote: > >>> Hi Richard, > >>> > >> > >> Hi, > >> Thanks very much for working. > >> > >>> +/* Utility function to check if OP is defined by a stmt > >>> + that is a val - 1. If that is the case, set it to STMT. */ > >>> + > >>> +static bool > >>> +ssa_defined_by_and_minus_one_stmt_p (tree op, tree val, gimple **stmt) > >> This is checking if op is defined as val - 1, so name it as > >> ssa_defined_by_minus_one_stmt_p? > >> > >>> +{ > >>> + if (TREE_CODE (op) == SSA_NAME > >>> + && (*stmt = SSA_NAME_DEF_STMT (op)) > >>> + && is_gimple_assign (*stmt) > >>> + && (gimple_assign_rhs_code (*stmt) == PLUS_EXPR) > >>> + && val == gimple_assign_rhs1 (*stmt) > >>> + && integer_minus_onep (gimple_assign_rhs2 (*stmt))) > >>> +return true; > >>> + else > >>> +return false; > >> You can simply return the boolean condition. > > Done. > > > >> > >>> +} > >>> + > >>> +/* See if LOOP is a popcout implementation of the form > >> ... > >>> + rhs1 = gimple_assign_rhs1 (and_stmt); > >>> + rhs2 = gimple_assign_rhs2 (and_stmt); > >>> + > >>> + if (ssa_defined_by_and_minus_one_stmt_p (rhs1, rhs2, &and_minus_one)) > >>> +rhs1 = rhs2; > >>> + else if (ssa_defined_by_and_minus_one_stmt_p (rhs2, rhs1, > >>> &and_minus_one)) > >>> +; > >>> + else > >>> +return false; > >>> + > >>> + /* Check the recurrence. 
*/ > >>> + phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); > >> So gimple_assign_rhs1 (and_minus_one) == rhs1 is always true? Please > >> use rhs1 directly. > > > > Done. > >>> + gimple *src_phi = SSA_NAME_DEF_STMT (rhs2); > >> I think this is checking wrong thing and is redundant. Either rhs2 > >> equals to rhs1 or is defined as (rhs1 - 1). For (rhs2 == rhs1) case, > >> the check duplicates checking on phi; for the latter, it's never a PHI > >> stmt and shouldn't be checked. > >> > >>> + if (gimple_code (phi) != GIMPLE_PHI > >>> + || gimple_code (src_phi) != GIMPLE_PHI) > >>> +return false; > >>> + > >>> + dest = gimple_assign_lhs (count_stmt); > >>> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); > >>> + tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge > >>> (loop)->dest_idx); > >>> + if (adjust) > >>> +iter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), > >>> +build_call_expr (fn, 1, src), > >>> +build_int_cst (TREE_TYPE (dest), 1)); > >>> + else > >>> +iter = build_call_expr (fn, 1, src); > >> Note tree-ssa-loop-niters.c always use unsigned_type_for (IV-type) as > >> niters type. Though unsigned type is unnecessary in this case, but > >> better to follow existing behavior? > > > > Done. > >> > >>> + max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest))); > >> As richi suggested, max should be the number of bits in type of IV. > >> > >>> + > >>> + niter->assumptions = boolean_false_node; > >> Redundant. > > > > Not sure I understand. If I remove this, I am getting ICE > > (segmentation fault). What is the expectation here? > Is it a simple typo? Because assumptions is set to boolean_true_node > just 5 lines below? > The niters part looks good for me with this change. You may need > richi's approval for other parts? 
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index bdf9ba1..4dc156a 100644 --- a/gcc/ipa-fnsummary.c +++ b/gcc/ipa-fnsummary.c @@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info, nonconstant_names); return p2.or_with (summary->conds, p1); } + else if (TREE_CODE (expr) == CALL_EXPR) +{ + return false; +} else { debug_tree (expr); I'd return true here instead - we don't want to track popcount() in predicates. Also unnecessary braces. @@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr) if (!integer_pow2p (TREE_OPERAND (expr, 1))) return true; } + if (code == CALL_EXPR + && (fndecl = get_callee_fndecl (expr)) + && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL + && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl))) +return false; please use if (code == CALL_EXPR) { if (!is_inexpensive_builtin (get_callee_fndecl (expr))) return t
Re: [PATCH 09/10] Experiment with using optinfo in gimple-loop-interchange.cc
On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote: > On Tue, May 29, 2018 at 10:33 PM David Malcolm > wrote: > > > > This was an experiment to try to capture information on a > > loop optimization. > > > > gcc/ChangeLog: > > * gimple-loop-interchange.cc (should_interchange_loops): > > Add > > optinfo note when interchange gives better data locality > > behavior. > > (tree_loop_interchange::interchange): Add OPTINFO_SCOPE. > > Add optinfo for successful and unsuccessful interchanges. > > (prepare_perfect_loop_nest): Add OPTINFO_SCOPE. Add > > optinfo note. > > (pass_linterchange::execute): Add OPTINFO_SCOPE. > > --- > > gcc/gimple-loop-interchange.cc | 36 > > +++- > > 1 file changed, 35 insertions(+), 1 deletion(-) > > > > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop- > > interchange.cc > > index eb35263..cd32288 100644 > > --- a/gcc/gimple-loop-interchange.cc > > +++ b/gcc/gimple-loop-interchange.cc > > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx, > > unsigned o_idx, > >ratio = innermost_loops_p ? INNER_STRIDE_RATIO : > > OUTER_STRIDE_RATIO; > >/* Do interchange if it gives better data locality behavior. */ > >if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio))) > > -return true; > > +{ > > + if (optinfo_enabled_p ()) > > + OPTINFO_NOTE ((gimple *)NULL) // FIXME > > + << "interchange gives better data locality behavior: " > > + << "iloop_strides: " > > + << decu (iloop_strides) > > + << " > (oloop_strides: " > > + << decu (oloop_strides) > > + << " * ratio: " > > + << decu (ratio) > > + << ")"; > > Just randomly inside the thread. > > NOO! > > :/ > Please do _not_ add more stream-like APIs. How do you expect > translators to deal with those? > > Yes, I'm aware of the graphite-* ones and I dislike those very much. > > What's wrong with the existing dump API? 
The existing API suffers from a "wall of text" problem: * although it's possible to filter based on various criteria (the optgroup tags, specific passes, success vs failure), it's not possible to filter base on code hotness: the -fopt-info API works purely in terms of location_t. So all of the messages about the hottest functions in the workload are intermingled with all of the other messages about all of the other functions. * some of the text notes refer to function entry, but all of these are emitted "at the same level": there's no way to see the nesting of these function-entry logs, and where other notes are in relation to them. For example, in: test.c:8:3: note: === analyzing loop === test.c:8:3: note: === analyze_loop_nest === test.c:8:3: note: === vect_analyze_loop_form === test.c:8:3: note: === get_loop_niters === test.c:8:3: note: symbolic number of iterations is (unsigned int) n_9(D) test.c:8:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed test.c:8:3: note: vectorized 0 loops in function there's no way to tell that the "vect_analyze_loop_form" is in fact inside the call to "analyze_loop_nest", and where the "symbolic number of iterations" messages is coming from in relation to them. This may not seem significant here, but I'm quoting a small example; vectorization typically leads to dozens of messages, with a deep nesting structure (where that structure isn't visible in the -fopt-info output). The existing API is throwing data away: * as noted above, by working purely with a location_t, the execution count isn't associated with the messages. The output format purely gives file/line/column information, but doesn't cover the inlining chain. For C++ templates it doesn't specify which instance of a template is being optimized. * there's no metadata about where the messages are coming from. It's easy to get at the current pass internally, but the messages don't capture that. 
Figuring out where a message came from requires grepping the GCC source code. The prototype I posted captures the __FILE__ and __LINE__ within the gcc source for every message emitted, and which pass instance emitted it. * The current output format is of the form: "FILE:LINE:COLUMN: free-form text\n" This is only machine-readable up to a point: if a program is parsing it, all it has is the free-form text. The prototype I posted captures what kinds of things are in the text (statements, trees, symtab_nodes), and captures location information for them, so that visualizations of the dumps can provide useful links. There's no API-level grouping of messages, beyond looking for newline characters. I'm probably re-hashing a lot of the material in the cover letter here: "[PATCH 00/10] RFC: Prototype of compiler-assisted performance analysis" https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01675.html I'd like to provide a machine-readable ou
Re: [PATCH] Use --push-state --as-needed and --pop-state instead of --as-needed and --no-as-needed for libgcc
Hi! On Wed, 11 Apr 2018 20:55:49 +0200, Jakub Jelinek wrote: > On Wed, Apr 11, 2018 at 06:07:17PM +0200, Matthias Klose wrote: > > On 11.04.2018 12:31, Jakub Jelinek wrote: > > > As discussed, using --as-needed and --no-as-needed is dangerous, because > > > it results in --no-as-needed even for libraries after -lgcc_s, even when > > > the > > > default is --as-needed or --as-needed has been specified earlier on the > > > command line. > > > > > > If the linker supports --push-state/--pop-state, we should IMHO use it. > > > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1? > > this is problematic for binutils versions with --push-state/--pop-state > > support > > in the BFD linker but not in gold, and then using -fuse-ld=gold. So maybe > > the > > version check for the BFD linker should only succeed for the first binutils > > version which also has -push-state/--pop-state support in gold. > > Does anybody use -fuse-ld=gold? Probably the question is not whether anybody's using that or not, but rather whether it's supported or not, and last time I looked, it was documented in GCC's manual. ;-) > > The BFD linker is only able to save exactly one state, and nested > > --push-state > > calls override the state (binutils PR23043). Otoh, there is not much linked > > after libgcc, so maybe this is not an issue. > > In any case, here is updated patch that will use it only for 2.28+ which > should have both ld.bfd and ld.gold --push-state support. 
> --- gcc/configure.ac.jj 2018-04-10 14:35:49.764788806 +0200 > +++ gcc/configure.ac 2018-04-11 18:26:16.745563830 +0200 > @@ -5517,11 +5517,25 @@ if test $in_tree_ld = yes ; then >if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" > -ge 16 -o "$gcc_cv_gld_major_version" -gt 2 \ > && test $in_tree_ld_is_elf = yes; then > gcc_cv_ld_as_needed=yes > +if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" > -ge 28; then > + gcc_cv_ld_as_needed_option='--push-state --as-needed' > + gcc_cv_ld_no_as_needed_option='--pop-state' > +fi >fi > elif test x$gcc_cv_ld != x; then ># Check if linker supports --as-needed and --no-as-needed options >if $gcc_cv_ld --help 2>&1 | grep as-needed > /dev/null; then > gcc_cv_ld_as_needed=yes > +if $gcc_cv_ld --help 2>&1 | grep push-state > /dev/null \ > + && $gcc_cv_ld --help 2>&1 | grep pop-state > /dev/null \ > + && echo "$ld_ver" | grep GNU > /dev/null \ > + && test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -ge 28; then > + # Use these options only when both ld.bfd and ld.gold support > + # --push-state/--pop-state, which unfortunately wasn't added > + # at the same time. > + gcc_cv_ld_as_needed_option='--push-state --as-needed' > + gcc_cv_ld_no_as_needed_option='--pop-state' > +fi Is binutils/ld going to stay at major version 2 forever, or should these checks also include '-o "$gcc_cv_gld_major_version" -gt 2', like done a few lines above, for example? (I have not checked now whether that might be a problem for other existing "feature" checks, too.) Grüße Thomas
Re: [PATCH] Improve bootstrap times
Hi! On Thu, 26 Apr 2018 10:08:09 +0200 (CEST), Richard Biener wrote: > --- Makefile.tpl (revision 259638) > +++ Makefile.tpl (working copy) > @@ -452,11 +452,21 @@ STAGE1_CONFIGURE_FLAGS = --disable-inter > --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \ > --disable-build-format-warnings > > +# When using the slow stage1 compiler disable IL verification and forcefully > +# enable it when using the stage2 compiler instead. As we later compare > +# stage2 and stage3 we are merely avoid doing redundant work, plus we apply > +# checking when building all target libraries for release builds. Good idea to delay checking until late build stages! > +STAGE1_TFLAGS += -fno-checking > +STAGE2_CFLAGS += -fno-checking > +STAGE2_TFLAGS += -fno-checking > +STAGE3_CFLAGS += -fchecking > +STAGE3_TFLAGS += -fchecking This however means that when a user configured with "extra" checking enabled (for example, "--enable-checking=yes,extra", meaning a default of "-fchecking=2"), stage 3 now is not actually build anymore with "extra" checking enabled (configured default, or explicit "-fchecking=2"), but instead only with "normal checking" ("-fchecking" being equivalent to "-fchecking=1", as the above was changed to in the trunk r259755 change). That's probably to be considered surprising, unexpected. Given that, as documented, "extra" checking may affect code generation, we have three options: a) disable "extra" checking here (status quo), or b) disable the "-fno-checking optimization" if "extra" checking has been requested, or c) rework "extra" checking to not affect code generation. c) would seem best, but probably there's a reason for its current behavior, given that it has only recently been introduced. Also, the later trunk r259755 change had the following: * dwarf2out.c (gen_producer_string): Ignore -fchecking[=]. 
--- gcc/dwarf2out.c (revision 259754) +++ gcc/dwarf2out.c (working copy) @@ -24234,6 +24234,8 @@ gen_producer_string (void) case OPT_fmacro_prefix_map_: case OPT_ffile_prefix_map_: case OPT_fcompare_debug: + case OPT_fchecking: + case OPT_fchecking_: /* Ignore these. */ continue; default: ..., and I wonder whether "-fchecking=2" should actually continue to be included in the producer string as far as it affects code generation, and only "-fchecking=1" and "-fno-checking" be filtered out? Grüße Thomas
Re: [PATCH] Improve bootstrap times
On Fri, Jun 01, 2018 at 03:43:38PM +0200, Thomas Schwinge wrote: > > +STAGE1_TFLAGS += -fno-checking > > +STAGE2_CFLAGS += -fno-checking > > +STAGE2_TFLAGS += -fno-checking > > +STAGE3_CFLAGS += -fchecking > > +STAGE3_TFLAGS += -fchecking > > This however means that when a user configured with "extra" checking > enabled (for example, "--enable-checking=yes,extra", meaning a default of > "-fchecking=2"), stage 3 now is not actually build anymore with "extra" > checking enabled (configured default, or explicit "-fchecking=2"), but > instead only with "normal checking" ("-fchecking" being equivalent to > "-fchecking=1", as the above was changed to in the trunk r259755 change). > That's probably to be considered surprising, unexpected. > > Given that, as documented, "extra" checking may affect code generation, > we have three options: a) disable "extra" checking here (status quo), or > b) disable the "-fno-checking optimization" if "extra" checking has been > requested, or c) rework "extra" checking to not affect code generation. > c) would seem best, but probably there's a reason for its current > behavior, given that it has only recently been introduced. c) would essentially mean getting rid of the "extra" mode, because checking that doesn't affect code generation is the normal checking. The a) case is what we've decided to do, it disables the extra checking during bootstrap, sure, but it doesn't disable it for the target libraries that aren't bootstrapped, nor make check, nor anything else you compile with the compiler. Jakub
[MAINTAINERS, committed] Update email address
Hi, I've updated my email address in the MAINTAINERS file. Thanks, - Tom [MAINTAINERS] Update email address 2018-06-01 Tom de Vries * MAINTAINERS: Update my email address. --- MAINTAINERS | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 5ffbbdf..c72b200 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -88,7 +88,7 @@ nds32 portChung-Ju Wu nds32 port Shiva Chen nios2 port Chung-Lin Tang nios2 port Sandra Loosemore -nvptx port Tom de Vries +nvptx port Tom de Vries pdp11 port Paul Koning powerpcspe portAndrew Jenner riscv port Kito Cheng @@ -623,7 +623,7 @@ Ilya Verbin Andre Vieira Kugan Vivekanandarajah Ville Voutilainen -Tom de Vries +Tom de Vries Nenad Vukicevic Feng Wang Jiong Wang
Re: [PATCH, rs6000 8/9] enable gimple folding for vec_xl, vec_xst
On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote: > On Thu, May 31, 2018 at 9:59 PM Will Schmidt > wrote: > > > > Hi, > > Add support for gimple folding for unaligned vector loads and stores. > > testcases posted separately in this thread. > > > > Regtest completed across variety of systems, P6,P7,P8,P9. > > > > OK for trunk? > > Thanks, > > -Will > > > > [gcc] > > > > 2018-05-31 Will Schmidt > > > > * config/rs6000/rs6000.c: (rs6000_builtin_valid_without_lhs) Add > > vec_xst > > variants to the list. (rs6000_gimple_fold_builtin) Add support for > > folding unaligned vector loads and stores. > > > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > > index d62abdf..54b7de2 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum > > rs6000_builtins fn_code) > > case ALTIVEC_BUILTIN_STVX_V8HI: > > case ALTIVEC_BUILTIN_STVX_V4SI: > > case ALTIVEC_BUILTIN_STVX_V4SF: > > case ALTIVEC_BUILTIN_STVX_V2DI: > > case ALTIVEC_BUILTIN_STVX_V2DF: > > +case VSX_BUILTIN_STXVW4X_V16QI: > > +case VSX_BUILTIN_STXVW4X_V8HI: > > +case VSX_BUILTIN_STXVW4X_V4SF: > > +case VSX_BUILTIN_STXVW4X_V4SI: > > +case VSX_BUILTIN_STXVD2X_V2DF: > > +case VSX_BUILTIN_STXVD2X_V2DI: > >return true; > > default: > >return false; > > } > > } > > @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator > > *gsi) > > gimple_set_location (g, loc); > > gsi_replace (gsi, g, true); > > return true; > >} > > > > +/* unaligned Vector loads. 
*/ > > +case VSX_BUILTIN_LXVW4X_V16QI: > > +case VSX_BUILTIN_LXVW4X_V8HI: > > +case VSX_BUILTIN_LXVW4X_V4SF: > > +case VSX_BUILTIN_LXVW4X_V4SI: > > +case VSX_BUILTIN_LXVD2X_V2DF: > > +case VSX_BUILTIN_LXVD2X_V2DI: > > + { > > +arg0 = gimple_call_arg (stmt, 0); // offset > > +arg1 = gimple_call_arg (stmt, 1); // address > > +lhs = gimple_call_lhs (stmt); > > +location_t loc = gimple_location (stmt); > > +/* Since arg1 may be cast to a different type, just use > > ptr_type_node > > + here instead of trying to enforce TBAA on pointer types. */ > > +tree arg1_type = ptr_type_node; > > +tree lhs_type = TREE_TYPE (lhs); > > +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. > > Create > > + the tree using the value from arg0. The resulting type will > > match > > + the type of arg1. */ > > +gimple_seq stmts = NULL; > > +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0); > > +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR, > > + arg1_type, arg1, temp_offset); > > +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > > +/* Use the build2 helper to set up the mem_ref. The MEM_REF could > > also > > + take an offset, but since we've already incorporated the offset > > + above, here we just pass in a zero. */ > > +gimple *g; > > +g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, temp_addr, > > + build_int_cst (arg1_type, > > 0))); > > So in GIMPLE the type of the MEM_REF specifies the alignment so my question > is what type does the lhs usually have here? I'd simply guess V4SF, etc.? In yes. (double-checking). my reference for the intrinsic signatures shows the lhs is a vector of type. The rhs can be either *type or *vector of type. vector double vec_vsx_ld (int, const vector double *); vector double vec_vsx_ld (int, const double *); With similar/same for the assorted other types. These are also on my list as 'unaligned' vector loads. I'm not certain if that adds a twist to how I should answer the below.. Bill? 
> this case you are missing a > tree ltype = build_aligned_type (lhs_type, desired-alignment); > > and use that ltype for building the MEM_REF. I suppose in this case the known > alignment is either BITS_PER_UNIT or element alignment (thus > TYPE_ALIGN (TREE_TYPE (lhs_type)))? I'd think element alignment. but no longer certain. :-) > Or is the type of the load the element types? So, In any case.. I'll build up / modify some tests to look at data being loaded, and see if I can see alignment issues here. Thanks, -Will > Richard. > > > +gimple_set_location (g, loc); > > +gsi_replace (gsi, g, true); > > +return true; > > + } > > + > > +/* unaligned Vector stores. */ > > +case VSX_BUILTIN_STXVW4X_V16QI: > > +case VSX_BUILTIN_STXVW4X_V8HI: > > +case VSX_BUILTIN_STXVW4X_V4SF: > > +case VSX_BUILTIN_STXVW4X_V4SI: > > +case VSX_BUILTIN_STXVD2X_V2DF: > > +case VSX_BUILTIN_STXVD2X_V2DI: > > + { > > +arg0 = gimple_
Re: [PATCH, rs6000 8/9] enable gimple folding for vec_xl, vec_xst
On Jun 1, 2018, at 10:11 AM, Will Schmidt wrote: > > On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote: >> On Thu, May 31, 2018 at 9:59 PM Will Schmidt >> wrote: >>> >>> Hi, >>> Add support for gimple folding for unaligned vector loads and stores. >>> testcases posted separately in this thread. >>> >>> Regtest completed across variety of systems, P6,P7,P8,P9. >>> >>> OK for trunk? >>> Thanks, >>> -Will >>> >>> [gcc] >>> >>> 2018-05-31 Will Schmidt >>> >>>* config/rs6000/rs6000.c: (rs6000_builtin_valid_without_lhs) Add >>> vec_xst >>>variants to the list. (rs6000_gimple_fold_builtin) Add support for >>>folding unaligned vector loads and stores. >>> >>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >>> index d62abdf..54b7de2 100644 >>> --- a/gcc/config/rs6000/rs6000.c >>> +++ b/gcc/config/rs6000/rs6000.c >>> @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum >>> rs6000_builtins fn_code) >>> case ALTIVEC_BUILTIN_STVX_V8HI: >>> case ALTIVEC_BUILTIN_STVX_V4SI: >>> case ALTIVEC_BUILTIN_STVX_V4SF: >>> case ALTIVEC_BUILTIN_STVX_V2DI: >>> case ALTIVEC_BUILTIN_STVX_V2DF: >>> +case VSX_BUILTIN_STXVW4X_V16QI: >>> +case VSX_BUILTIN_STXVW4X_V8HI: >>> +case VSX_BUILTIN_STXVW4X_V4SF: >>> +case VSX_BUILTIN_STXVW4X_V4SI: >>> +case VSX_BUILTIN_STXVD2X_V2DF: >>> +case VSX_BUILTIN_STXVD2X_V2DI: >>> return true; >>> default: >>> return false; >>> } >>> } >>> @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator >>> *gsi) >>>gimple_set_location (g, loc); >>>gsi_replace (gsi, g, true); >>>return true; >>> } >>> >>> +/* unaligned Vector loads. 
*/ >>> +case VSX_BUILTIN_LXVW4X_V16QI: >>> +case VSX_BUILTIN_LXVW4X_V8HI: >>> +case VSX_BUILTIN_LXVW4X_V4SF: >>> +case VSX_BUILTIN_LXVW4X_V4SI: >>> +case VSX_BUILTIN_LXVD2X_V2DF: >>> +case VSX_BUILTIN_LXVD2X_V2DI: >>> + { >>> +arg0 = gimple_call_arg (stmt, 0); // offset >>> +arg1 = gimple_call_arg (stmt, 1); // address >>> +lhs = gimple_call_lhs (stmt); >>> +location_t loc = gimple_location (stmt); >>> +/* Since arg1 may be cast to a different type, just use >>> ptr_type_node >>> + here instead of trying to enforce TBAA on pointer types. */ >>> +tree arg1_type = ptr_type_node; >>> +tree lhs_type = TREE_TYPE (lhs); >>> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. >>> Create >>> + the tree using the value from arg0. The resulting type will >>> match >>> + the type of arg1. */ >>> +gimple_seq stmts = NULL; >>> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0); >>> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR, >>> + arg1_type, arg1, temp_offset); >>> +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); >>> +/* Use the build2 helper to set up the mem_ref. The MEM_REF could >>> also >>> + take an offset, but since we've already incorporated the offset >>> + above, here we just pass in a zero. */ >>> +gimple *g; >>> +g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, temp_addr, >>> + build_int_cst (arg1_type, >>> 0))); >> >> So in GIMPLE the type of the MEM_REF specifies the alignment so my question >> is what type does the lhs usually have here? I'd simply guess V4SF, etc.? >> In > > yes. (double-checking). my reference for the intrinsic signatures > shows the lhs is a vector of type. The rhs can be either *type or > *vector of type. > > vector double vec_vsx_ld (int, const vector double *); > vector double vec_vsx_ld (int, const double *); > With similar/same for the assorted other types. > > These are also on my list as 'unaligned' vector loads. 
I'm not certain > if that adds a twist to how I should answer the below.. > > Bill? 'unaligned' means not necessarily aligned on a vector boundary. They are guaranteed to be aligned on an element boundary. > >> this case you are missing a >> tree ltype = build_aligned_type (lhs_type, desired-alignment); >> >> and use that ltype for building the MEM_REF. I suppose in this case the >> known >> alignment is either BITS_PER_UNIT or element alignment (thus >> TYPE_ALIGN (TREE_TYPE (lhs_type)))? > > I'd think element alignment. but no longer certain. :-) Yep, element alignment. Thanks, Bill > >> Or is the type of the load the element types? > > > So, In any case.. I'll build up / modify some tests to look at data > being loaded, and see if I can see alignment issues here. > > Thanks, > -Will > > > >> Richard. >> >>> +gimple_set_location (g, loc); >>> +gsi_replace (gsi, g, true); >>> +return true; >>> + } >>> + >>> +/* unaligned Vector stores. */
[gomp5] Use error_at rather than error for OpenMP clause checking
Hi! I've noticed a lot of spots where we weren't using exact location, even when we have OMP_CLAUSE_LOCATION we can use. Fixed thusly, committed to gomp-5_0-branch. 2018-06-01 Jakub Jelinek * c-typeck.c (c_finish_omp_clauses): Use error_at with OMP_CLAUSE_LOCATION (c) as first argument instead of error. Formatting fix. * semantics.c (finish_omp_reduction_clause): Use error_at with OMP_CLAUSE_LOCATION (c) as first argument instead of error. (finish_omp_clauses): * g++.dg/gomp/pr33372-1.C: Adjust location of the expected diagnostics. * g++.dg/gomp/pr33372-3.C: Likewise. --- gcc/c/c-typeck.c.jj 2018-06-01 09:52:06.498468358 +0200 +++ gcc/c/c-typeck.c2018-06-01 16:30:56.051806429 +0200 @@ -13359,7 +13359,7 @@ c_finish_omp_clauses (tree clauses, enum if (TYPE_ATOMIC (TREE_TYPE (t))) { error_at (OMP_CLAUSE_LOCATION (c), - "%<_Atomic%> %qD in % clause", t); + "%<_Atomic%> %qD in % clause", t); remove = true; break; } @@ -13417,7 +13417,9 @@ c_finish_omp_clauses (tree clauses, enum { if (bitmap_bit_p (&oacc_reduction_head, DECL_UID (t))) { - error ("%qD appears more than once in reduction clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in reduction clauses", + t); remove = true; } else @@ -13435,9 +13437,11 @@ c_finish_omp_clauses (tree clauses, enum && bitmap_bit_p (&map_head, DECL_UID (t))) { if (ort == C_ORT_ACC) - error ("%qD appears more than once in data clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in data clauses", t); else - error ("%qD appears both in data and map clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears both in data and map clauses", t); remove = true; } else @@ -13464,9 +13468,11 @@ c_finish_omp_clauses (tree clauses, enum else if (bitmap_bit_p (&map_head, DECL_UID (t))) { if (ort == C_ORT_ACC) - error ("%qD appears more than once in data clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in data clauses", t); else - error ("%qD appears 
both in data and map clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears both in data and map clauses", t); remove = true; } else @@ -13641,14 +13647,17 @@ c_finish_omp_clauses (tree clauses, enum if (bitmap_bit_p (&map_head, DECL_UID (t))) { if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP) - error ("%qD appears more than once in motion" - " clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in motion " + "clauses", t); else if (ort == C_ORT_ACC) - error ("%qD appears more than once in data" - " clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in data " + "clauses", t); else - error ("%qD appears more than once in map" - " clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in map " + "clauses", t); remove = true; } else @@ -13758,15 +13767,18 @@ c_finish_omp_clauses (tree clauses, enum if (bitmap_bit_p (&generic_head, DECL_UID (t)) || bitmap_bit_p (&firstprivate_head, DECL_UID (t))) { - error ("%qD appears more than once in data clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), + "%qD appears more than once in data clauses", t); remove = true; } else if (bitmap_bit_p (&map_head, DECL_UID (t))) { if (ort == C_ORT_ACC) - error ("%qD appears more than once in data clauses", t); + error_at (OMP_CLAUSE_LOCATION (c), +
Re: [PATCH 09/10] Experiment with using optinfo in gimple-loop-interchange.cc
On June 1, 2018 3:40:15 PM GMT+02:00, David Malcolm wrote: >On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote: >> On Tue, May 29, 2018 at 10:33 PM David Malcolm >> wrote: >> > >> > This was an experiment to try to capture information on a >> > loop optimization. >> > >> > gcc/ChangeLog: >> > * gimple-loop-interchange.cc (should_interchange_loops): >> > Add >> > optinfo note when interchange gives better data locality >> > behavior. >> > (tree_loop_interchange::interchange): Add OPTINFO_SCOPE. >> > Add optinfo for successful and unsuccessful interchanges. >> > (prepare_perfect_loop_nest): Add OPTINFO_SCOPE. Add >> > optinfo note. >> > (pass_linterchange::execute): Add OPTINFO_SCOPE. >> > --- >> > gcc/gimple-loop-interchange.cc | 36 >> > +++- >> > 1 file changed, 35 insertions(+), 1 deletion(-) >> > >> > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop- >> > interchange.cc >> > index eb35263..cd32288 100644 >> > --- a/gcc/gimple-loop-interchange.cc >> > +++ b/gcc/gimple-loop-interchange.cc >> > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx, >> > unsigned o_idx, >> >ratio = innermost_loops_p ? INNER_STRIDE_RATIO : >> > OUTER_STRIDE_RATIO; >> >/* Do interchange if it gives better data locality behavior. */ >> >if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio))) >> > -return true; >> > +{ >> > + if (optinfo_enabled_p ()) >> > + OPTINFO_NOTE ((gimple *)NULL) // FIXME >> > + << "interchange gives better data locality behavior: " >> > + << "iloop_strides: " >> > + << decu (iloop_strides) >> > + << " > (oloop_strides: " >> > + << decu (oloop_strides) >> > + << " * ratio: " >> > + << decu (ratio) >> > + << ")"; >> >> Just randomly inside the thread. >> >> NOO! >> >> :/ > >> Please do _not_ add more stream-like APIs. How do you expect >> translators to deal with those? >> >> Yes, I'm aware of the graphite-* ones and I dislike those very much. >> >> What's wrong with the existing dump API? 
>The existing API suffers from a "wall of text" problem:
>
>* although it's possible to filter based on various criteria (the
>optgroup tags, specific passes, success vs failure), it's not possible
>to filter based on code hotness: the -fopt-info API works purely in
>terms of location_t.  So all of the messages about the hottest
>functions in the workload are intermingled with all of the other
>messages about all of the other functions.

True

>* some of the text notes refer to function entry, but all of these are
>emitted "at the same level": there's no way to see the nesting of these
>function-entry logs, and where other notes are in relation to them.
>For example, in:
>
>  test.c:8:3: note: === analyzing loop ===
>  test.c:8:3: note: === analyze_loop_nest ===
>  test.c:8:3: note: === vect_analyze_loop_form ===
>  test.c:8:3: note: === get_loop_niters ===
>  test.c:8:3: note: symbolic number of iterations is (unsigned int) n_9(D)
>  test.c:8:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
>  test.c:8:3: note: vectorized 0 loops in function
>
>there's no way to tell that the "vect_analyze_loop_form" is in fact
>inside the call to "analyze_loop_nest", and where the "symbolic number
>of iterations" message is coming from in relation to them.  This may
>not seem significant here, but I'm quoting a small example;
>vectorization typically leads to dozens of messages, with a deep
>nesting structure (where that structure isn't visible in the -fopt-info
>output).

True.  The same issue exists for diagnostics BTW.  Indeed, being able to
collapse 'sections' in dump files, opt-info or diagnostics sounds useful.
Note that for dump files and opt-info the level argument was sort of
designed to do that.

>The existing API is throwing data away:
>
>* as noted above, by working purely with a location_t, the execution
>count isn't associated with the messages.
>The output format purely
>gives file/line/column information, but doesn't cover the inlining
>chain.  For C++ templates it doesn't specify which instance of a
>template is being optimized.

It might be useful to enhance the interface by allowing different kinds
of 'locations'.

>* there's no metadata about where the messages are coming from.  It's
>easy to get at the current pass internally, but the messages don't
>capture that.  Figuring out where a message came from requires grepping
>the GCC source code.  The prototype I posted captures the __FILE__ and
>__LINE__ within the gcc source for every message emitted, and which
>pass instance emitted it.

The opt info group was supposed to capture this to the level interesting
for a user.

>* The current output format is of the form:
>  "FILE:LINE:COLUMN: free-form text\n"
>This is only machine-readable up to a point: if a program is
Re: [PATCH, rs6000 8/9] enable gimple folding for vec_xl, vec_xst
On June 1, 2018 5:15:58 PM GMT+02:00, Bill Schmidt wrote: >On Jun 1, 2018, at 10:11 AM, Will Schmidt >wrote: >> >> On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote: >>> On Thu, May 31, 2018 at 9:59 PM Will Schmidt > wrote: Hi, Add support for gimple folding for unaligned vector loads and >stores. testcases posted separately in this thread. Regtest completed across variety of systems, P6,P7,P8,P9. OK for trunk? Thanks, -Will [gcc] 2018-05-31 Will Schmidt * config/rs6000/rs6000.c: (rs6000_builtin_valid_without_lhs) >Add vec_xst variants to the list. (rs6000_gimple_fold_builtin) Add >support for folding unaligned vector loads and stores. diff --git a/gcc/config/rs6000/rs6000.c >b/gcc/config/rs6000/rs6000.c index d62abdf..54b7de2 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum >rs6000_builtins fn_code) case ALTIVEC_BUILTIN_STVX_V8HI: case ALTIVEC_BUILTIN_STVX_V4SI: case ALTIVEC_BUILTIN_STVX_V4SF: case ALTIVEC_BUILTIN_STVX_V2DI: case ALTIVEC_BUILTIN_STVX_V2DF: +case VSX_BUILTIN_STXVW4X_V16QI: +case VSX_BUILTIN_STXVW4X_V8HI: +case VSX_BUILTIN_STXVW4X_V4SF: +case VSX_BUILTIN_STXVW4X_V4SI: +case VSX_BUILTIN_STXVD2X_V2DF: +case VSX_BUILTIN_STXVD2X_V2DI: return true; default: return false; } } @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin >(gimple_stmt_iterator *gsi) gimple_set_location (g, loc); gsi_replace (gsi, g, true); return true; } +/* unaligned Vector loads. */ +case VSX_BUILTIN_LXVW4X_V16QI: +case VSX_BUILTIN_LXVW4X_V8HI: +case VSX_BUILTIN_LXVW4X_V4SF: +case VSX_BUILTIN_LXVW4X_V4SI: +case VSX_BUILTIN_LXVD2X_V2DF: +case VSX_BUILTIN_LXVD2X_V2DI: + { +arg0 = gimple_call_arg (stmt, 0); // offset +arg1 = gimple_call_arg (stmt, 1); // address +lhs = gimple_call_lhs (stmt); +location_t loc = gimple_location (stmt); +/* Since arg1 may be cast to a different type, just use >ptr_type_node + here instead of trying to enforce TBAA on pointer >types. 
*/ +tree arg1_type = ptr_type_node; +tree lhs_type = TREE_TYPE (lhs); +/* POINTER_PLUS_EXPR wants the offset to be of type >'sizetype'. Create + the tree using the value from arg0. The resulting type >will match + the type of arg1. */ +gimple_seq stmts = NULL; +tree temp_offset = gimple_convert (&stmts, loc, sizetype, >arg0); +tree temp_addr = gimple_build (&stmts, loc, >POINTER_PLUS_EXPR, + arg1_type, arg1, >temp_offset); +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); +/* Use the build2 helper to set up the mem_ref. The >MEM_REF could also + take an offset, but since we've already incorporated >the offset + above, here we just pass in a zero. */ +gimple *g; +g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, >temp_addr, + build_int_cst >(arg1_type, 0))); >>> >>> So in GIMPLE the type of the MEM_REF specifies the alignment so my >question >>> is what type does the lhs usually have here? I'd simply guess V4SF, >etc.? In >> >> yes. (double-checking). my reference for the intrinsic signatures >> shows the lhs is a vector of type. The rhs can be either *type or >> *vector of type. >> >> vector double vec_vsx_ld (int, const vector double *); >> vector double vec_vsx_ld (int, const double *); >> With similar/same for the assorted other types. >> >> These are also on my list as 'unaligned' vector loads. I'm not >certain >> if that adds a twist to how I should answer the below.. >> >> Bill? > >'unaligned' means not necessarily aligned on a vector boundary. >They are guaranteed to be aligned on an element boundary. >> >>> this case you are missing a >>> tree ltype = build_aligned_type (lhs_type, desired-alignment); >>> >>> and use that ltype for building the MEM_REF. I suppose in this case >the known >>> alignment is either BITS_PER_UNIT or element alignment (thus >>> TYPE_ALIGN (TREE_TYPE (lhs_type)))? >> >> I'd think element alignment. but no longer certain. :-) > >Yep, element alignment. Note the x86 unaligned intrinsics support arbitray unaligned loads. 
So that's not available for power? Does the HW implementation require element alignment? Richard. >Thanks, >Bill >> >>> Or is the type of the load the element types? >> >> >> So, In any case.. I'll bu
Re: [PATCH, rs6000 8/9] enable gimple folding for vec_xl, vec_xst
On Jun 1, 2018, at 10:35 AM, Richard Biener wrote: > > On June 1, 2018 5:15:58 PM GMT+02:00, Bill Schmidt > wrote: >> On Jun 1, 2018, at 10:11 AM, Will Schmidt >> wrote: >>> >>> On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote: On Thu, May 31, 2018 at 9:59 PM Will Schmidt >> wrote: > > Hi, > Add support for gimple folding for unaligned vector loads and >> stores. > testcases posted separately in this thread. > > Regtest completed across variety of systems, P6,P7,P8,P9. > > OK for trunk? > Thanks, > -Will > > [gcc] > > 2018-05-31 Will Schmidt > > * config/rs6000/rs6000.c: (rs6000_builtin_valid_without_lhs) >> Add vec_xst > variants to the list. (rs6000_gimple_fold_builtin) Add >> support for > folding unaligned vector loads and stores. > > diff --git a/gcc/config/rs6000/rs6000.c >> b/gcc/config/rs6000/rs6000.c > index d62abdf..54b7de2 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum >> rs6000_builtins fn_code) >case ALTIVEC_BUILTIN_STVX_V8HI: >case ALTIVEC_BUILTIN_STVX_V4SI: >case ALTIVEC_BUILTIN_STVX_V4SF: >case ALTIVEC_BUILTIN_STVX_V2DI: >case ALTIVEC_BUILTIN_STVX_V2DF: > +case VSX_BUILTIN_STXVW4X_V16QI: > +case VSX_BUILTIN_STXVW4X_V8HI: > +case VSX_BUILTIN_STXVW4X_V4SF: > +case VSX_BUILTIN_STXVW4X_V4SI: > +case VSX_BUILTIN_STXVD2X_V2DF: > +case VSX_BUILTIN_STXVD2X_V2DI: > return true; >default: > return false; >} > } > @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin >> (gimple_stmt_iterator *gsi) > gimple_set_location (g, loc); > gsi_replace (gsi, g, true); > return true; > } > > +/* unaligned Vector loads. 
*/ > +case VSX_BUILTIN_LXVW4X_V16QI: > +case VSX_BUILTIN_LXVW4X_V8HI: > +case VSX_BUILTIN_LXVW4X_V4SF: > +case VSX_BUILTIN_LXVW4X_V4SI: > +case VSX_BUILTIN_LXVD2X_V2DF: > +case VSX_BUILTIN_LXVD2X_V2DI: > + { > +arg0 = gimple_call_arg (stmt, 0); // offset > +arg1 = gimple_call_arg (stmt, 1); // address > +lhs = gimple_call_lhs (stmt); > +location_t loc = gimple_location (stmt); > +/* Since arg1 may be cast to a different type, just use >> ptr_type_node > + here instead of trying to enforce TBAA on pointer >> types. */ > +tree arg1_type = ptr_type_node; > +tree lhs_type = TREE_TYPE (lhs); > +/* POINTER_PLUS_EXPR wants the offset to be of type >> 'sizetype'. Create > + the tree using the value from arg0. The resulting type >> will match > + the type of arg1. */ > +gimple_seq stmts = NULL; > +tree temp_offset = gimple_convert (&stmts, loc, sizetype, >> arg0); > +tree temp_addr = gimple_build (&stmts, loc, >> POINTER_PLUS_EXPR, > + arg1_type, arg1, >> temp_offset); > +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); > +/* Use the build2 helper to set up the mem_ref. The >> MEM_REF could also > + take an offset, but since we've already incorporated >> the offset > + above, here we just pass in a zero. */ > +gimple *g; > +g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, >> temp_addr, > + build_int_cst >> (arg1_type, 0))); So in GIMPLE the type of the MEM_REF specifies the alignment so my >> question is what type does the lhs usually have here? I'd simply guess V4SF, >> etc.? In >>> >>> yes. (double-checking). my reference for the intrinsic signatures >>> shows the lhs is a vector of type. The rhs can be either *type or >>> *vector of type. >>> >>> vector double vec_vsx_ld (int, const vector double *); >>> vector double vec_vsx_ld (int, const double *); >>> With similar/same for the assorted other types. >>> >>> These are also on my list as 'unaligned' vector loads. I'm not >> certain >>> if that adds a twist to how I should answer the below.. >>> >>> Bill? 
>> >> 'unaligned' means not necessarily aligned on a vector boundary. >> They are guaranteed to be aligned on an element boundary. >>> this case you are missing a tree ltype = build_aligned_type (lhs_type, desired-alignment); and use that ltype for building the MEM_REF. I suppose in this case >> the known alignment is either BITS_PER_UNIT or element alignment (thus TYPE_ALIGN (TREE_TYPE (lhs_type)))? >>> >>> I'd think element alignment. but no longer certain. :-) >> >> Yep, element alignment. > > Note the x86 unaligned intrinsics support arbitray unaligned loads. So
[PATCH][AArch64] Use preferred aliases SXTL(2) and UXTL(2)
Hi all,

The SSHLL and USHLL instructions are used with a shift operand of zero to
sign- and zero-extend integer vectors into wider modes. GCC makes extensive
use of them to "unpack" vectors. AArch64 defines a shorthand alias for that
case. Instead of writing:
  SSHLL ., ., 0
we can write:
  SXTL ., .
Similarly for the unsigned versions and the high-part versions
(SSHLL2 -> SXTL2). This makes the assembly of vectorised functions a bit
more readable.

Bootstrapped and tested on aarch64-none-linux-gnu.
Committing to trunk as obvious.

Thanks,
Kyrill

2018-06-01  Kyrylo Tkachov

	* config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpack_lo_):
	Use UXTL and SXTL assembler mnemonics.
	(aarch64_simd_vec_unpack_hi_): Use UXTL2 and SXTL2 assembler
	mnemonics.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index de059f6923546d3359a9b86dc7011b8ae8c80b82..d886b61f03751f7b76758cafed41c31e553b8e2a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1567,7 +1567,7 @@ (define_insn "aarch64_simd_vec_unpack_lo_"
           (match_operand:VQW 2 "vect_par_cnst_lo_half" "")
        )))]
   "TARGET_SIMD"
-  "shll\t%0., %1., 0"
+  "xtl\t%0., %1."
   [(set_attr "type" "neon_shift_imm_long")]
 )

@@ -1578,7 +1578,7 @@ (define_insn "aarch64_simd_vec_unpack_hi_"
           (match_operand:VQW 2 "vect_par_cnst_hi_half" "")
        )))]
   "TARGET_SIMD"
-  "shll2\t%0., %1., 0"
+  "xtl2\t%0., %1."
   [(set_attr "type" "neon_shift_imm_long")]
 )
Re: C++ PATCH for c++/85977, array reference size deduction failure
On Thu, May 31, 2018 at 10:11:27AM -0400, Jason Merrill wrote: > On Wed, May 30, 2018 at 5:23 PM, Marek Polacek wrote: > > We are failing to deduce the template parameter N here > > > > template > > void foo(const long int (&)[N]) {} > > > > void bar() { > > foo ({1,2,3}); > > } > > > > because of the type mismatch; parm is long int (element type of the array), > > while arg is int (element type of {1, 2, 3}), and unify doesn't like that: > > > > 21789 /* We have already checked cv-qualification at the top of the > > 21790 function. */ > > 21791 if (!same_type_ignoring_top_level_qualifiers_p (arg, parm)) > > 21792 return unify_type_mismatch (explain_p, parm, arg); > > > > But since the parameter type is array, we should see if there exists an > > implicit conversion sequence for each element of the array from the > > corresponding element of the initializer list, and that's what I tried, > > and it seems to work. > > > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > > > 2018-05-30 Marek Polacek > > > > PR c++/85977 > > * pt.c (unify): Handle the [over.ics.list]/6 case. > > > > + /* [over.ics.list]/6 says we should try an implicit conversion > > +from each list element to the corresponding array element > > +type. */ > > [over.ics.list] doesn't apply during template argument deduction, but > rather [temp.deduct.call]. Ah, I see. > > + if (TREE_CODE (parm) == ARRAY_TYPE) > > + { > > + tree x = perform_implicit_conversion (elttype, elt, > > complain); > > Rather than check this immediately here, we want to do more or less > what unify_one_argument does: > > if (strict != DEDUCE_EXACT > && TYPE_P (parm) && !uses_deducible_template_parms (parm)) > /* For function parameters with no deducible template parameters, >just return. We'll check non-dependent conversions later. */ > return unify_success (explain_p); > > so if elttype has no deducible template parms, don't do deduction from > the list elements at all. 
If I return unify_success if !uses_deducible_template_parms (elttype) from unify, it breaks even the test that worked before (with int instead of long int), because in type_unification_real we end up calling 20338 return unify_parameter_deduction_failure (explain_p, tparm); because tparms has N, but targs is empty (i.e. we weren't able to deduce the template parameter N to 3 in the unify_one_argument call before). And so... > And then we want to check convertibility of the elements in > type_unification_real, when we check convertibility of other function > parameters that don't involve template parameters: > > /* DR 1391: All parameters have args, now check non-dependent > parms for > convertibility. */ ...we'll never get here. What am I missing? I suppose what you want to do is to somehow defer the deducing for later, but I admit I'm lost here. Marek
Re: C++ PATCH for c++/85977, array reference size deduction failure
On Fri, Jun 1, 2018 at 11:52 AM, Marek Polacek wrote: > On Thu, May 31, 2018 at 10:11:27AM -0400, Jason Merrill wrote: >> On Wed, May 30, 2018 at 5:23 PM, Marek Polacek wrote: >> > We are failing to deduce the template parameter N here >> > >> > template >> > void foo(const long int (&)[N]) {} >> > >> > void bar() { >> > foo ({1,2,3}); >> > } >> > >> > because of the type mismatch; parm is long int (element type of the array), >> > while arg is int (element type of {1, 2, 3}), and unify doesn't like that: >> > >> > 21789 /* We have already checked cv-qualification at the top of the >> > 21790 function. */ >> > 21791 if (!same_type_ignoring_top_level_qualifiers_p (arg, parm)) >> > 21792 return unify_type_mismatch (explain_p, parm, arg); >> > >> > But since the parameter type is array, we should see if there exists an >> > implicit conversion sequence for each element of the array from the >> > corresponding element of the initializer list, and that's what I tried, >> > and it seems to work. >> > >> > Bootstrapped/regtested on x86_64-linux, ok for trunk? >> > >> > 2018-05-30 Marek Polacek >> > >> > PR c++/85977 >> > * pt.c (unify): Handle the [over.ics.list]/6 case. >> > >> > + /* [over.ics.list]/6 says we should try an implicit >> > conversion >> > +from each list element to the corresponding array element >> > +type. */ >> >> [over.ics.list] doesn't apply during template argument deduction, but >> rather [temp.deduct.call]. > > Ah, I see. > >> > + if (TREE_CODE (parm) == ARRAY_TYPE) >> > + { >> > + tree x = perform_implicit_conversion (elttype, elt, >> > complain); >> >> Rather than check this immediately here, we want to do more or less >> what unify_one_argument does: >> >> if (strict != DEDUCE_EXACT >> && TYPE_P (parm) && !uses_deducible_template_parms (parm)) >> /* For function parameters with no deducible template parameters, >>just return. We'll check non-dependent conversions later. 
*/ >> return unify_success (explain_p); >> >> so if elttype has no deducible template parms, don't do deduction from >> the list elements at all. > > If I return unify_success if !uses_deducible_template_parms (elttype) from > unify, it breaks even the test that worked before (with int instead of long > int), because in type_unification_real we end up calling > 20338 return unify_parameter_deduction_failure (explain_p, tparm); > because tparms has N, but targs is empty (i.e. we weren't able to deduce > the template parameter N to 3 in the unify_one_argument call before). Yes, we still need to do deduction on the array bound, so we can't return immediately. We just want to avoid the FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg), i, elt) loop. Jason
Re: [PATCH] Implement Fortran 2018's RANDOM_INIT
On Fri, Jun 01, 2018 at 09:50:06AM +0300, Janne Blomqvist wrote: > On Mon, May 28, 2018 at 8:06 PM, Steve Kargl < > s...@troutmask.apl.washington.edu> wrote: > > > The attached patch implements the RANDOM_INIT intrinsic > > subroutine specified in Fortran 2018. I have had this > > patch in my local tree for the last 5+ months. Now that > > 8.1 is out, it is time to submit it. It has been built > > and regression tested on x86_64-*-freebsd. OK to commit? > > > > Note, I have only tested with -fcoarray=single as I don't > > have OpenCoarray set up to build with trunk. Testing with > > OpenCoarray is encouraged. > > > > 2018-05-28 Steven G. Kargl > > > > * check.c (gfc_check_random_init): New function. Check arguments of > > RANDOM_INIT. > > * gfortran.h (GFC_ISYM_RANDOM_INIT): New enum token. > > * intrinsic.c (add_subroutines): Add RANDOM_INIT to list of > > subroutines. > > * intrinsic.h: Add prototypes for gfc_check_random_init and > > gfc_resolve_random_init > > * intrinsic.texi: Document new intrinsic subprogram. > > * iresolve.c (gfc_resolve_random_init): Resolve routine name. > > * trans-decl.c: Declare gfor_fndecl_random_init > > * trans-intrinsic.c (conv_intrinsic_random_init): New function. > > Translate call to RANDOM_INIT. > > (gfc_conv_intrinsic_subroutine): Call it. > > * trans.h: Declare gfor_fndecl_random_init > > > > 2018-05-28 Steven G. Kargl > > > > * gfortran.dg/random_init_1.f90: New test. > > * gfortran.dg/random_init_2.f90: New test. > > * gfortran.dg/random_init_3.f90: New test. > > * gfortran.dg/random_init_4.f90: New test. > > * gfortran.dg/random_init_5.f90: New test. > > * gfortran.dg/random_init_6.f90: New test. > > > > 2018-05-28 Steven G. Kargl > > > > * libgfortran/Makefile.am: Add random_init.f90 to build. > > * libgfortran/Makefile.in: Regenerated. > > * libgfortran/gfortran.map: Expose symbol for > > _gfortran_random_init. > > * libgfortran/intrinsics/random_init.f90: Implementation. > > > > Looks good, thanks for the patch! 
> Committed as revision 261075. This patch will not be backported. Ping me if something seems broken. -- Steve
[PATCH, rs6000] Clean up implementation of built-in functions
This patch improves maintainability of the rs6000 built-in functions by adding a comment to describe the non-traditional implementation of the __builtin_vec_vsx_ld and __builtin_vec_vsx_st functions, and by removing eight redundant entries from the altivec_overloaded_builtins array. Note, in the patch file, that the lines immediately preceding each of the deletions from altivec_overloaded_builtins exactly matches the deleted lines. This redundancy may have been accidentally introduced by manual resolution of merge conflicts. I did not investigate the origin of the redundancy. The redundant entries cause trouble to tools that automate consistency checking between implementation and documentation of built-in functions. Additionally, they are a likely cause of future bugs if any future efforts need to make corrections or changes to the associated functions. This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8). Is this ok for trunk? gcc/ChangeLog: 2018-06-01 Kelvin Nilsen * config/rs6000/rs6000-builtin.def (VSX_BUILTIN_VEC_LD, VSX_BUILTIN_VEC_ST): Add comment to explain non-traditional uses. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove several redundant entries. Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 261067) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1811,6 +1811,15 @@ BU_VSX_OVERLOAD_1 (VUNSIGNEDE, "vunsignede") BU_VSX_OVERLOAD_1 (VUNSIGNEDO, "vunsignedo") /* VSX builtins that are handled as special cases. */ + + +/* NON-TRADITIONAL BEHAVIOR HERE: Besides introducing the + __builtin_vec_ld and __builtin_vec_st built-in functions, + the VSX_BUILTIN_VEC_LD and VSX_BUILTIN_VEC_ST symbolic constants + introduced below are also affiliated with the __builtin_vec_vsx_ld + and __builtin_vec_vsx_st functions respectively. This unnatural + binding is formed with explicit calls to the def_builtin function + found in rs6000.c. 
*/ BU_VSX_OVERLOAD_X (LD, "ld") BU_VSX_OVERLOAD_X (ST, "st") BU_VSX_OVERLOAD_X (XL, "xl") Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 261067) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -1375,28 +1375,16 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTSW, ALTIVEC_BUILTIN_VCMPGTSW, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTSW, ALTIVEC_BUILTIN_VCMPGTSW, -RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTUW, ALTIVEC_BUILTIN_VCMPGTUW, RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTUW, ALTIVEC_BUILTIN_VCMPGTUW, -RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTSH, ALTIVEC_BUILTIN_VCMPGTSH, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTSH, ALTIVEC_BUILTIN_VCMPGTSH, -RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTUH, ALTIVEC_BUILTIN_VCMPGTUH, RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTUH, ALTIVEC_BUILTIN_VCMPGTUH, -RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTSB, ALTIVEC_BUILTIN_VCMPGTSB, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTSB, ALTIVEC_BUILTIN_VCMPGTSB, -RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTUB, ALTIVEC_BUILTIN_VCMPGTUB, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTUB, ALTIVEC_BUILTIN_VCMPGTUB, -RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPLE, ALTIVEC_BUILTIN_VCMPGEFP, 
RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_XVCMPGEDP, @@ -4249,8 +4237,6 @@ const struct altivec_builtin_types altivec_overloa { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 }, { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, -RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 }, - { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 }, { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, @@ -5620,8 +5606,6 @@ const struct
[PATCH] rs6000: Fix mangling for 128-bit float
This patch changes the (C++) mangling of the 128-bit float types. IBM long double ("double-double") is mangled as "g", as before, and IEEE 128-bit long double is mangled as "u9__ieee128". Bootstrapped and tested on powerpc64-linux {-m32,-m64} (Power7) and on powerpc64le-linux (Power9). Also tested manually; testsuite patches will follow soon. Committing to trunk. Segher 2018-06-01 Segher Boessenkool * config/rs6000/rs6000.c (rs6000_mangle_type): Change the mangling of the 128-bit floating point types. Fix function comment. --- gcc/config/rs6000/rs6000.c | 31 +++ 1 file changed, 7 insertions(+), 24 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index fab8ee3..a755491 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -31983,8 +31983,9 @@ rs6000_handle_altivec_attribute (tree *node, return NULL_TREE; } -/* AltiVec defines four built-in scalar types that serve as vector - elements; we must teach the compiler how to mangle them. */ +/* AltiVec defines five built-in scalar types that serve as vector + elements; we must teach the compiler how to mangle them. The 128-bit + floating point mangling is target-specific as well. */ static const char * rs6000_mangle_type (const_tree type) @@ -32001,30 +32002,12 @@ rs6000_mangle_type (const_tree type) if (type == bool_int_type_node) return "U6__booli"; if (type == bool_long_long_type_node) return "U6__boolx"; - /* Use a unique name for __float128 rather than trying to use "e" or "g". Use - "g" for IBM extended double, no matter whether it is long double (using - -mabi=ibmlongdouble) or the distinct __ibm128 type. */ - if (TARGET_FLOAT128_TYPE) -{ - if (type == ieee128_float_type_node) - return "U10__float128"; - - if (type == ibm128_float_type_node) - return "u8__ibm128"; - - if (TARGET_LONG_DOUBLE_128 && type == long_double_type_node) - return (TARGET_IEEEQUAD) ? 
"U10__float128" : "g"; -} - - /* Mangle IBM extended float long double as `g' (__float128) on - powerpc*-linux where long-double-64 previously was the default. */ - if (TYPE_MAIN_VARIANT (type) == long_double_type_node - && TARGET_ELF - && TARGET_LONG_DOUBLE_128 - && !TARGET_IEEEQUAD) + if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IBM_P (TYPE_MODE (type))) return "g"; + if (SCALAR_FLOAT_TYPE_P (type) && FLOAT128_IEEE_P (TYPE_MODE (type))) +return "u9__ieee128"; - /* For all other types, use normal C++ mangling. */ + /* For all other types, use the default mangling. */ return NULL; } -- 1.8.3.1
Re: [patch] Enhance GIMPLE store-merging pass for bit-fields
> Just a general remark, the many non-functional but stylistic changes do not > make the patch easier to review ;) Sorry about that (in this case the ChangeLog is supposed to help a little). > + /* If bit insertion is required, we use the source as an > accumulator +into which the successive bit-field values are > manually inserted. +FIXME: perhaps use BIT_INSERT_EXPR instead > in some cases? */ > > so exactly - why not use BIT_INSERT_EXPR here? Some reasoning would > be nice to have mentioned here. Is it because BIT_INSERT_EXPR will > basically go the store_bit_field expansion route and thus generate the > same code as the unmerged one? I just copied an existing comment from Jakub a few lines below: /* FIXME: If there is a single chunk of zero bits in mask, perhaps use BIT_INSERT_EXPR instead? */ stmt = gimple_build_assign (make_ssa_name (int_type), BIT_AND_EXPR, tem, mask); so I didn't actually experiment. But since I use BIT_INSERT_EXPR as rhs_code, I wanted to acknowledge the possibility of also generating it. > For code-generating adjacent inserts wouldn't it be more efficient > to do > > accum = first-to-be-inserted-val; > accum <<= shift-to-next-inserted-val; > accum |= next-to-be-inserted-val; > ... > accum <<= final-shift; > > instead of shifting the to-be-inserted value? Alexander provided a valid rebuttal I think. > Otherwise the patch looks OK to me. Thanks for the review. -- Eric Botcazou
Re: [PATCH, rs6000] Add missing builtin test cases, fix arguments to match specifications.
Hi Carl, On Tue, May 29, 2018 at 08:37:01AM -0700, Carl Love wrote: > * gcc.target/powerpc/builtins-3.c: Add tests > test_sll_vuill_vuill_vuc, Stray tab. > +/* { dg-final { scan-assembler-times "vupklpx" 1 { target le } } } */ > +/* { dg-final { scan-assembler-times "vupklpx" 1 { target be } } } */ > +/* { dg-final { scan-assembler-times "vupkhpx" 1 { target le } } } */ > +/* { dg-final { scan-assembler-times "vupkhpx" 1 { target be } } } */ That is fine of course, but looks a bit silly ;-) Merge them? > +/* { dg-final { scan-assembler-times "xxlor" 11 { target { be && ilp32 } } } > } */ > +/* { dg-final { scan-assembler-times "xxlor" 7 { target { be && lp64 } } } > } */ > +/* { dg-final { scan-assembler-times "xxlor" 7 { target le } } } */ You can do this with just "11 { target ilp32 }" and "7 { target lp64 }", if that makes sense for the test. > +/* { dg-final { scan-assembler-times {\mrldic\M} 0 { target { be && ilp32 } > } } } */ > +/* { dg-final { scan-assembler-times {\mrldic\M} 64 { target { be && lp64 } > } } } */ > +/* { dg-final { scan-assembler-times {\mrldic\M} 64 { target le } } } */ Similar. Looks great otherwise. Okay for trunk. Thanks! Segher
[MAINTAINERS, committed] Update my email address
Hi, I've updated my email address in the MAINTAINERS file. I don't have an FSF copyright assignment record for am...@gcc.gnu.org but will sort it out before my next commit to GCC. Thanks, bin --- trunk/ChangeLog 2018/06/01 18:46:23 261078 +++ trunk/ChangeLog 2018/06/01 19:53:31 261079 @@ -1,3 +1,7 @@ +2018-06-01 Bin Cheng + + * MAINTAINERS: Update my email address. + 2018-06-01 Tom de Vries * MAINTAINERS: Update my email address. --- trunk/MAINTAINERS 2018/06/01 18:46:23 261078 +++ trunk/MAINTAINERS 2018/06/01 19:53:31 261079 @@ -237,8 +237,8 @@ auto-vectorizer Richard Biener auto-vectorizer Zdenek Dvorak loop infrastructure Zdenek Dvorak -loop ivopts Bin Cheng -loop optimizer Bin Cheng +loop ivopts Bin Cheng +loop optimizer Bin Cheng OpenMP Jakub Jelinek testsuite Rainer Orth testsuite Mike Stump @@ -343,7 +343,6 @@ Chandra Chavva Dehao Chen Fabien Chêne -Bin Cheng Harshit Chopra Tamar Christina Eric Christopher
Re: [PATCH, rs6000] Add builtin tests for vec_madds, vec_extract_fp32_from_shortl and vec_extract_fp32_from_shorth, vec_xst_be
Hi! On Tue, May 29, 2018 at 08:54:47AM -0700, Carl Love wrote: > 2018-05-29 Carl Love > * gcc.target/powerpc/altivec-35.c (foo): Add builtin test vec_madds. > * gcc.target/powerpc/builtins-3-p9.c (main): Add tests for > vec_extract_fp32_from_shortl and vec_extract_fp32_from_shorth. > * gcc.target/powerpc/builtins-6-runnable.c (main): Fix typo for output. > Add vec_xst_be for signed and unsigned arguments. Okay for trunk. Thanks! Segher
Re: [PATCH] PR target/85358: Add target hook to prevent default widening
I'm wondering if there are other suggestions to make this patch acceptable. As I mentioned previously, the initialization process needs to go through all of the widening tables in order to initialize all FP types, so we can't just arbitrarily eliminate IFmode from the widening table. I could imagine having an alternative *_FLOAT_MODE that essentially marks which modes shouldn't be widened to/from and binop/unop could use. Or I could imagine making two widening tables, one for initialization, and one for binop/unop, or other possibilities. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[MAINTAINERS] Added myself to write after approval
Added myself to the write-after-approval list in MAINTAINERS. Index: ChangeLog === --- ChangeLog (revision 261081) +++ ChangeLog (revision 261082) @@ -1,3 +1,7 @@ +2018-06-01 Jozef Lawrynowicz + + * MAINTAINERS (write after approval): Add myself. + 2018-06-01 Bin Cheng * MAINTAINERS: Update my email address. Index: MAINTAINERS === --- MAINTAINERS (revision 261081) +++ MAINTAINERS (revision 261082) @@ -472,6 +472,7 @@ Chris Lattner Terry Laurenzo Alan Lawrence +Jozef Lawrynowicz Georg-Johann Lay Marc Lehmann James Lemke
C++ PATCH for CWG 1581, when are constexpr member functions defined?
Since 5.2, GCC has deferred instantiation of constexpr functions until the point that the definition is necessary for evaluating a constant-expression. The resolution of core issue 1581 says that we need to be a bit more eager about instantiation: we should instantiate a constexpr function that is mentioned in an expression that we evaluate as a constant-expression, even if the constexpr function isn't actually needed. While playing with this I noticed two other issues; these patches aren't necessary for the final 1581 patch, but still seem correct. * 58281 was a problem with nothing ever clearing DECL_EXTERNAL on the instantiation of f; the patch for 65942 fixed it by deferring instantiation long enough for mark_used to send f to note_vague_linkage_fn so that EOF processing would clear DECL_EXTERNAL, but it would be good to clear it directly when we see the second explicit instantiation. * A function defaulted in class is just as defined as one defaulted out of class. Tested x86_64-pc-linux-gnu, applying to trunk. commit d0c8a45932d244d27201505c8c3d52bd3dbc2a67 Author: Jason Merrill Date: Thu May 31 17:09:25 2018 -0400 CWG 1581: When are constexpr member functions defined? * constexpr.c (instantiate_cx_fn_r, instantiate_constexpr_fns): New. (cxx_eval_outermost_constant_expr): Call instantiate_constexpr_fns. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index a099408dd28..944c1cdf11e 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -4813,6 +4813,46 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, return r; } +/* P0859: A function is needed for constant evaluation if it is a constexpr + function that is named by an expression ([basic.def.odr]) that is + potentially constant evaluated. + + So we need to instantiate any constexpr functions mentioned by the + expression even if the definition isn't needed for evaluating the + expression. 
*/ + +static tree +instantiate_cx_fn_r (tree *tp, int *walk_subtrees, void */*data*/) +{ + if (TREE_CODE (*tp) == FUNCTION_DECL + && DECL_DECLARED_CONSTEXPR_P (*tp) + && !DECL_INITIAL (*tp) + && DECL_TEMPLOID_INSTANTIATION (*tp)) +{ + ++function_depth; + instantiate_decl (*tp, /*defer_ok*/false, /*expl_inst*/false); + --function_depth; +} + else if (TREE_CODE (*tp) == CALL_EXPR + || TREE_CODE (*tp) == AGGR_INIT_EXPR) +{ + if (EXPR_HAS_LOCATION (*tp)) + input_location = EXPR_LOCATION (*tp); +} + + if (!EXPR_P (*tp)) +*walk_subtrees = 0; + + return NULL_TREE; +} +static void +instantiate_constexpr_fns (tree t) +{ + location_t loc = input_location; + cp_walk_tree_without_duplicates (&t, instantiate_cx_fn_r, NULL); + input_location = loc; +} + static tree cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant, bool strict = true, tree object = NULL_TREE) @@ -4858,6 +4898,7 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant, r = TARGET_EXPR_INITIAL (r); } + instantiate_constexpr_fns (r); r = cxx_eval_constant_expression (&ctx, r, false, &non_constant_p, &overflow_p); @@ -4959,6 +5000,7 @@ is_sub_constant_expr (tree t) constexpr_ctx ctx = { NULL, &map, NULL, NULL, NULL, NULL, true, true }; + instantiate_constexpr_fns (t); cxx_eval_constant_expression (&ctx, t, false, &non_constant_p, &overflow_p); return !non_constant_p && !overflow_p; diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-inst1.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-inst1.C new file mode 100644 index 000..1016bec9d3e --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-inst1.C @@ -0,0 +1,13 @@ +// Testcase from P0859 +// { dg-do compile { target c++14 } } + +template constexpr int f() { return T::value; } // { dg-error "int" } +template void g(decltype(B ? f() : 0)); +template void g(...); +template void h(decltype(int{B ? f() : 0})); +template void h(...); +void x() { + g(0); // OK, B ? 
f() : 0 is not potentially constant evaluated + h(0); // error, instantiates f even though B evaluates to false and +// list-initialization of int from int cannot be narrowing +} commit 34c29b27b770d76de9a55682d46670e0bcbfe98c Author: Jason Merrill Date: Thu May 31 17:21:31 2018 -0400 PR c++/58281 - explicit instantiation of constexpr * pt.c (mark_decl_instantiated): Clear DECL_EXTERNAL. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index b97cd3013be..4c5890deeb8 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -22168,6 +22168,12 @@ mark_decl_instantiated (tree result, int extern_p) linkonce sections. */ else if (TREE_PUBLIC (result)) maybe_make_one_only (result); + if (TREE_CODE (result) == FUNCTION_DECL + && DECL_TEMPLATE_INSTANTIATED (result)) + /* If the function has already been instantiated, clear DECL_EXTERNAL, + since start_preparsed_function wouldn't have if we had an earlier + extern explicit in
Re: [ARM/FDPIC 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
On Mon, 28 May 2018, Christophe Lyon wrote: > On 25/05/2018 18:32, Joseph Myers wrote: > > On Fri, 25 May 2018, Christophe Lyon wrote: > > > > > In libtool.m4, we use uclinuxfdpiceabi in cases where ELF shared > > > libraries support is required, as uclinux does not guarantee that. > > > > To confirm: has this libtool.m4 patch gone upstream (or at least been > > submitted upstream, if not yet reviewed)? > > > Hi Joseph, > > No sorry, I didn't realize I had to post this patch elsewhere than > gcc-patches. > > I'm going to post it to the libtool project. > > Looking at gcc/libtool.m4's history, it seems the process would then be to > cherry-pick my patch into gcc, rather than merge with the upstream version? In practice, probably. We'd like to update autoconf/automake/libtool from upstream, but updating any one of those components is quite risky when they are very out of date as at present, and there may be dependencies between them (and as previously discussed, updating libtool will require reverting one upstream libtool patch incompatible with how the GCC build handles sysroots). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, rs6000 1/9] Testcase coverage for vec_xl() intrinsics
On Thu, May 31, 2018 at 02:57:21PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/fold-vec-load-vec_xl-char.c > * gcc.target/powerpc/fold-vec-load-vec_xl-double.c > * gcc.target/powerpc/fold-vec-load-vec_xl-float.c > * gcc.target/powerpc/fold-vec-load-vec_xl-int.c > * gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c > * gcc.target/powerpc/fold-vec-load-vec_xl-short.c "* gcc.target/powerpc/fold-vec-load-vec_xl-char.c: New testcase." etc.? > +/* { dg-final { scan-assembler-times "lxvw4x|lxvd2x|lxvx|lvx" 12 } } */ This works, but some \m\M would be better. > +/* { dg-do compile { target { powerpc*-*-linux* } } } */ Tests in gcc.target/powerpc can be just { dg-do compile } btw. (Or leave even that out, it is the default; whichever you prefer). Okay for trunk if you make a proper changelog ;-) Segher
Re: [PATCH, rs6000 2/9] Testcase coverage for builtin_vec_xl() intrinsics
On Thu, May 31, 2018 at 02:57:34PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c: New. > * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c: New. > * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c: New. > * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c: New. > * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c: New. > * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c: New. OK, thanks. Segher
Re: [PATCH, rs6000 3/9] Testcase coverage for vec_vsx_ld() intrinsics
On Thu, May 31, 2018 at 02:57:48PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c : New. > * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c : New. > * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c : New. > * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c : New. > * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c : New. > * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c : New. No space before : . Looks fine otherwise :-) Segher
[C++ Patch] Fix some locations
Hi, while working on some bugs I noticed that in a few places in decl.c we could do better in terms of locations within the current infrastructure, some simple, straightforward improvements. I'm attaching below a tiny first patch, more to follow, I hope. For example, a function which could be improved in many places is grok_ctor_properties and, before going ahead, I'd like to ask about something I noticed yesterday: /* There can be no default arguments. */ for (tree arg = argtypes; arg != void_list_node; arg = TREE_CHAIN (arg)) if (TREE_PURPOSE (arg)) { TREE_PURPOSE (arg) = NULL_TREE; if (operator_code == POSTINCREMENT_EXPR || operator_code == POSTDECREMENT_EXPR) pedwarn (input_location, OPT_Wpedantic, "%qD cannot have default arguments", decl); else { error ("%qD cannot have default arguments", decl); return false; } } the special casing seems weird, so far I haven't been able to find anything in the standard about it and all the other compilers I have at hand (Oracle, Intel, Clang) don't seem to have it. Shall we keep it or not? Maybe with an additional comment explaining the rationale? The affected testcases would be parse/defarg11.C and g++.jason/operator.C. Thanks, Paolo. /// /cp 2018-06-01 Paolo Carlini * decl.c (grokfndecl): Use the location_t argument in two more places. /testsuite 2018-06-01 Paolo Carlini * g++.dg/template/friend64.C: New. * g++.old-deja/g++.other/friend4.C: Test the location too. * g++.old-deja/g++.pt/crash23.C: Likewise. 
Index: cp/decl.c === --- cp/decl.c (revision 261078) +++ cp/decl.c (working copy) @@ -8674,8 +8674,9 @@ grokfndecl (tree ctype, if (friendp && TREE_CODE (orig_declarator) == TEMPLATE_ID_EXPR) { if (funcdef_flag) - error ("defining explicit specialization %qD in friend declaration", - orig_declarator); + error_at (location, + "defining explicit specialization %qD in friend declaration", + orig_declarator); else { tree fns = TREE_OPERAND (orig_declarator, 0); @@ -8684,9 +8685,10 @@ grokfndecl (tree ctype, if (PROCESSING_REAL_TEMPLATE_DECL_P ()) { /* Something like `template friend void f()'. */ - error ("invalid use of template-id %qD in declaration " -"of primary template", -orig_declarator); + error_at (location, + "invalid use of template-id %qD in declaration " + "of primary template", + orig_declarator); return NULL_TREE; } Index: testsuite/g++.dg/template/friend64.C === --- testsuite/g++.dg/template/friend64.C(nonexistent) +++ testsuite/g++.dg/template/friend64.C(working copy) @@ -0,0 +1,6 @@ +template void foo (int); + +template +class Q { + friend void foo (int) { } // { dg-error "15:defining explicit specialization" } +}; Index: testsuite/g++.old-deja/g++.other/friend4.C === --- testsuite/g++.old-deja/g++.other/friend4.C (revision 261078) +++ testsuite/g++.old-deja/g++.other/friend4.C (working copy) @@ -11,7 +11,7 @@ template void foo(); template class bar { int i; // { dg-message "" } private - template friend void foo(); // { dg-error "" } bogus declaration + template friend void foo(); // { dg-error "34:invalid use of template-id" } }; template void foo() { bar baz; baz.i = 1; // { dg-error "" } foo cannot access bar::i Index: testsuite/g++.old-deja/g++.pt/crash23.C === --- testsuite/g++.old-deja/g++.pt/crash23.C (revision 261078) +++ testsuite/g++.old-deja/g++.pt/crash23.C (working copy) @@ -4,7 +4,7 @@ template void foo(); template class bar { public: int i; - template friend void foo(); // { dg-error "" } template-id + template friend void foo(); // { 
dg-error "34:invalid use of template-id" } }; template void foo() { bar baz; baz.i = 1;
Re: [PATCH] add udivhi3, umodhi3 functions to libgcc
On Tue, 29 May 2018, Paul Koning wrote: > +unsigned short udivmodhi4(unsigned short, unsigned short, int); libgcc should not have any such non-static functions in the user namespace; they should all start with __. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, rs6000 6/9] Testcase coverage for vec_vsx_st() intrinsics
On Thu, May 31, 2018 at 02:58:29PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c: New. > * gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c: New. > * gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c: New. > * gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c: New. > * gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c: New. > * gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c: New. Okay, thanks! Segher
Re: [PATCH, rs6000 5/9] Testcase coverage for builtin_vec_xst() intrinsics
On Thu, May 31, 2018 at 02:58:11PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c: New. > * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c: New. > * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c: New. > * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c: New. > * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-longlong.c: New. > * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c: New. Okido. Segher
Re: [PATCH] add udivhi3, umodhi3 functions to libgcc
> On Jun 1, 2018, at 5:04 PM, Joseph Myers wrote: > > On Tue, 29 May 2018, Paul Koning wrote: > >> +unsigned short udivmodhi4(unsigned short, unsigned short, int); > > libgcc should not have any such non-static functions in the user > namespace; they should all start with __. That too is a problem in the previous code. Ok, it sounds like the right answer is to withdraw this patch and create a new one that handles both this issue and cleans up the way udivsi is done. That would replace the existing udivmod.c and udivmodsi4.c files. paul
Re: [PATCH] c/55976 -Werror=return-type should error on returning a value from a void function
Thanks for pointing this out. I'll check out what's going on and fix the issue --Dave On 05/31/2018 05:53 AM, H.J. Lu wrote: On Wed, May 30, 2018 at 3:56 PM, Jeff Law wrote: On 04/22/2018 01:17 PM, dave.pa...@oracle.com wrote: This patch fixes handling of -Werror=return-type as well as -Wno-return-type. Currently, -Werror=return-type does not turn the warnings into errors and -Wno-return-type does not turn off warning/error. Now they both work as expected. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55976 Initialize warn_return_type only for C++/C++ with ObjC extensions. In C, this allows us to differentiate between default (no option), or cases where -Wreturn-type/-Wno-return-type are specified. Elsewhere, update references to warn_return_type (for C) to reflect change in initialization. Patch was successfully bootstrapped and tested on x86_64-linux. --Dave CL-55976 /c 2018-04-22 David Pagan PR c/55976 * c-decl.c (grokdeclarator): Update check for return type warnings. (start_function): Likewise. (finish_function): Likewise. * c-typeck.c (c_finish_return): Update check for return type warnings. Pass OPT_Wreturn_type to pedwarn when appropriate. * c-opts.c (c_common_post_options): Set default for warn_return_type for C++/C++ with ObjC extensions only. For C, makes it possible to differentiate between default (no option), -Wreturn-type, and -Wno-return-type. /testsuite 2018-04-22 David Pagan PR c/55976 * gcc.dg/noncompile/pr55976-1.c: New test. * gcc.dg/noncompile/pr55976-2.c: New test. Thanks. Installed on the trunk. 
On x86, I got FAIL: gcc.dg/noncompile/pr55976-1.c -O0 (test for excess errors) FAIL: gcc.dg/noncompile/pr55976-1.c -O1 (test for excess errors) FAIL: gcc.dg/noncompile/pr55976-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/noncompile/pr55976-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: gcc.dg/noncompile/pr55976-1.c -O2 (test for excess errors) FAIL: gcc.dg/noncompile/pr55976-1.c -O3 -g (test for excess errors) FAIL: gcc.dg/noncompile/pr55976-1.c -Os (test for excess errors) [hjl@gnu-skx-1 testsuite]$ /export/ssd/git/gcc-test-native/bld/gcc/xgcc -B/export/ssd/git/gcc-test-native/bld/gcc/ /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c -mx32 -B/export/ssd/git/gcc-test-native/bld/x86_64-pc-linux-gnu/32/libmpx/ -B/export/ssd/git/gcc-test-native/bld/x86_64-pc-linux-gnu/32/libmpx/mpxrt -L/export/ssd/git/gcc-test-native/bld/x86_64-pc-linux-gnu/32/libmpx/mpxrt/.libs -B/export/ssd/git/gcc-test-native/bld/x86_64-pc-linux-gnu/32/libmpx/ -B/export/ssd/git/gcc-test-native/bld/x86_64-pc-linux-gnu/32/libmpx/mpxwrap -L/export/ssd/git/gcc-test-native/bld/x86_64-pc-linux-gnu/32/libmpx/mpxwrap/.libs -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -Werror=return-type -S -o pr55976-1.s /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c: In function ‘t’: /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c:7:20: error: ‘return’ with a value, in function returning void [-Werror=return-type] /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c:7:6: note: declared here /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c: In function ‘b’: /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c:8:12: error:
‘return’ with no value, in function returning non-void [-Werror=return-type] /export/ssd/git/gcc-test-native/src-trunk/gcc/testsuite/gcc.dg/noncompile/pr55976-1.c:8:5: note: declared here cc1: some warnings being treated as errors [hjl@gnu-skx-1 testsuite]$
Re: [PATCH, rs6000 7/9] testcase updates for unaligned loads/stores
On Thu, May 31, 2018 at 02:58:40PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/p8-vec-xl-xst-v2.c: New. > * gcc.target/powerpc/p8-vec-xl-xst.c: Disable gimple-folding. > * gcc.target/powerpc/swaps-p8-17.c: Same. Only one space after : . > diff --git a/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c > b/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c > new file mode 100644 > index 000..3315c5f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/p8-vec-xl-xst-v2.c > @@ -0,0 +1,64 @@ > +/* { dg-do compile { target { powerpc64le-*-* } } } */ Please use the new "le" selector here (and other files). Looks fine otherwise, thanks. Segher
Re: [PATCH, rs6000 4/9] Testcase coverage for vec_xst() intrinsics
On Thu, May 31, 2018 at 02:57:57PM -0500, Will Schmidt wrote: > 2018-05-31 Will Schmidt > > * gcc.target/powerpc/fold-vec-store-vec_xst-char.c : New. > * gcc.target/powerpc/fold-vec-store-vec_xst-double.c : New. > * gcc.target/powerpc/fold-vec-store-vec_xst-float.c : New. > * gcc.target/powerpc/fold-vec-store-vec_xst-int.c : New. > * gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c : New. > * gcc.target/powerpc/fold-vec-store-vec_xst-short.c : New. No space before : . The actual patch is fine :-) Segher
Re: [C++ Patch] Fix some locations
On Fri, Jun 1, 2018 at 5:03 PM, Paolo Carlini wrote: > while working on some bugs I noticed that in a few places in decl.c we could > do better in terms of locations within the current infrastructure, some > simple, straightforward improvements. I'm attaching below a tiny first > patch, more to follow, I hope. > > For example, a function which could be improved in many places is > grok_ctor_properties and, before going ahead, I'd like to ask about > something I noticed yesterday: > > /* There can be no default arguments. */ > for (tree arg = argtypes; arg != void_list_node; arg = TREE_CHAIN (arg)) > if (TREE_PURPOSE (arg)) > { > TREE_PURPOSE (arg) = NULL_TREE; > if (operator_code == POSTINCREMENT_EXPR > || operator_code == POSTDECREMENT_EXPR) > pedwarn (input_location, OPT_Wpedantic, >"%qD cannot have default arguments", decl); > else > { > error ("%qD cannot have default arguments", decl); > return false; > } > } > > the special casing seems weird, so far I haven't been able to find anything > in the standard about it and all the other compilers I have at hand (Oracle, > Intel, Clang) don't seem to have it. Shall we keep it or not? Maybe with an > additional comment explaining the rationale? The affected testcases would be > parse/defarg11.C and g++.jason/operator.C. I think in the olden days, with a default argument one function could support both prefix and postfix inc/decrement syntax: struct A { A& operator++(int = 0); }; int main() { A a; ++a; a++; } ...but that hasn't actually worked in forever, so it shouldn't be a problem to make the ++ case an error, too. Jason
Re: [PATCH] DWARF5: Don't generate DW_AT_loclists_base for split compile unit DIEs.
OK. On Fri, Jun 1, 2018 at 8:29 AM, Mark Wielaard wrote: > On Sat, 2018-05-26 at 21:31 +0200, Mark Wielaard wrote: >> The loclists_base attribute is used to point to the beginning of the >> loclists index of a DWARF5 loclists table when using DW_FORM_loclistsx. >> For split compile units the base is not given by the attribute, but is >> either the first (and only) index in the .debug_loclists.dwo section, >> or (when placed in a .dwp file) given by the DW_SECT_LOCLISTS row in >> the .debug_cu_index section. >> >> The loclists_base attribute is only valid for the full (or skeleton) >> compile unit DIE in the main (relocatable) object. But GCC only ever >> generates a loclists table index for the .debug_loclists section put >> into the split DWARF .dwo file. >> >> For split compile unit DIEs it is confusing (and not according to spec) >> to also have a DW_AT_loclists_base attribute (which might be wrong, >> since its relocatable offset won't actually be relocated). > > Ping. > >> gcc/ChangeLog >> >> * dwarf2out.c (dwarf2out_finish): Remove generation of >> DW_AT_loclists_base. >> --- >> >> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c >> index c05bfe4..103ded0 100644 >> --- a/gcc/dwarf2out.c >> +++ b/gcc/dwarf2out.c >> @@ -31292,11 +31292,17 @@ dwarf2out_finish (const char *) >>if (dwarf_split_debug_info) >> { >>if (have_location_lists) >> > { >> > - if (dwarf_version >= 5) >> > - add_AT_loclistsptr (comp_unit_die (), DW_AT_loclists_base, >> > - loc_section_label); >> > + /* Since we generate the loclists in the split DWARF .dwo >> > +file itself, we don't need to generate a loclists_base >> > +attribute for the split compile unit DIE. That attribute >> > +(and using relocatable sec_offset FORMs) isn't allowed >> > +for a split compile unit. Only if the .debug_loclists >> > +section was in the main file, would we need to generate a >> > +loclists_base attribute here (for the full or skeleton >> > +unit DIE). 
*/ >> + >> > /* optimize_location_lists calculates the size of the lists, >> > so index them first, and assign indices to the entries. >> > Although optimize_location_lists will remove entries from >> > the table, it only does so for duplicates, and therefore
Re: [PATCH] rs6000: Fix mangling for 128-bit float
On Fri, 1 Jun 2018, Segher Boessenkool wrote: > This patch changes the (C++) mangling of the 128-bit float types. > IBM long double ("double-double") is mangled as "g", as before, and > IEEE 128-bit long double is mangled as "u9__ieee128". To be clear: given this mangling (which certainly simplifies the ABI), is the intent that only one type with double-double format, and only one type with binary128 format, will be accessible in any given C++ compilation, to avoid ICEs (bugs 85075 and 85518) from different types having the same mangling? So __ibm128 *will* be the same type as long double when those have the same format, and likewise long double, __float128, __ieee128 and __typeof (__builtin_inff128 ()) will have the same type when they have the same format? You can have the simple mangling that's compatible between different choices of long double, or you can have the types being consistently different so people can e.g. overload and write templates using them without worrying about the possibility that __ieee128 might or might not be the same type as long double depending on the compiler options, but you can't have both without running into problems. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] rs6000: Fix mangling for 128-bit float
On Fri, Jun 01, 2018 at 09:33:57PM +, Joseph Myers wrote: > On Fri, 1 Jun 2018, Segher Boessenkool wrote: > > > This patch changes the (C++) mangling of the 128-bit float types. > > IBM long double ("double-double") is mangled as "g", as before, and > > IEEE 128-bit long double is mangled as "u9__ieee128". > > To be clear: given this mangling (which certainly simplifies the ABI), is > the intent that only one type with double-double format, and only one type > with binary128 format, will be accessible in any given C++ compilation, to > avoid ICEs (bugs 85075 and 85518) from different types having the same > mangling? So __ibm128 *will* be the same type as long double when those > have the same format, and likewise long double, __float128, __ieee128 and > __typeof (__builtin_inff128 ()) will have the same type when they have the > same format? Yes. (__float128 is just a #define btw.) > You can have the simple mangling that's compatible between different > choices of long double, or you can have the types being consistently > different so people can e.g. overload and write templates using them > without worrying about the possibility that __ieee128 might or might not > be the same type as long double depending on the compiler options, but you > can't have both without running into problems. Right. But given how much this simplifies, we'll just have to bite the bullet and get this to work properly. Segher
Re: [PATCH 09/10] Experiment with using optinfo in gimple-loop-interchange.cc
On Fri, 2018-06-01 at 17:31 +0200, Richard Biener wrote: > On June 1, 2018 3:40:15 PM GMT+02:00, David Malcolm com> wrote: > > On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote: > > > On Tue, May 29, 2018 at 10:33 PM David Malcolm > > om> > > > wrote: > > > > > > > > This was an experiment to try to capture information on a > > > > loop optimization. > > > > > > > > gcc/ChangeLog: > > > > * gimple-loop-interchange.cc > > > > (should_interchange_loops): > > > > Add > > > > optinfo note when interchange gives better data > > > > locality > > > > behavior. > > > > (tree_loop_interchange::interchange): Add > > > > OPTINFO_SCOPE. > > > > Add optinfo for successful and unsuccessful > > > > interchanges. > > > > (prepare_perfect_loop_nest): Add OPTINFO_SCOPE. Add > > > > optinfo note. > > > > (pass_linterchange::execute): Add OPTINFO_SCOPE. > > > > --- > > > > gcc/gimple-loop-interchange.cc | 36 > > > > +++- > > > > 1 file changed, 35 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop- > > > > interchange.cc > > > > index eb35263..cd32288 100644 > > > > --- a/gcc/gimple-loop-interchange.cc > > > > +++ b/gcc/gimple-loop-interchange.cc > > > > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned > > > > i_idx, > > > > unsigned o_idx, > > > >ratio = innermost_loops_p ? INNER_STRIDE_RATIO : > > > > OUTER_STRIDE_RATIO; > > > >/* Do interchange if it gives better data locality > > > > behavior. */ > > > >if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, > > > > ratio))) > > > > -return true; > > > > +{ > > > > + if (optinfo_enabled_p ()) > > > > + OPTINFO_NOTE ((gimple *)NULL) // FIXME > > > > + << "interchange gives better data locality behavior: > > > > " > > > > + << "iloop_strides: " > > > > + << decu (iloop_strides) > > > > + << " > (oloop_strides: " > > > > + << decu (oloop_strides) > > > > + << " * ratio: " > > > > + << decu (ratio) > > > > + << ")"; > > > > > > Just randomly inside the thread. 
> > > > > > NOO! > > > > > > :/ > > > Please do _not_ add more stream-like APIs. How do you expect > > > translators to deal with those? > > > > > > Yes, I'm aware of the graphite-* ones and I dislike those very > > > much. > > > > > > What's wrong with the existing dump API? > > > > The existing API suffers from a "wall of text" problem: > > > > * although it's possible to filter based on various criteria (the > > optgroup tags, specific passes, success vs failure), it's not > > possible > > to filter base on code hotness: the -fopt-info API works purely in > > terms of location_t. So all of the messages about the hottest > > functions in the workload are intermingled with all of the other > > messages about all of the other functions. > > True > > > * some of the text notes refer to function entry, but all of these > > are > > emitted "at the same level": there's no way to see the nesting of > > these > > function-entry logs, and where other notes are in relation to > > them. > > For example, in: > > > > test.c:8:3: note: === analyzing loop === > > test.c:8:3: note: === analyze_loop_nest === > > test.c:8:3: note: === vect_analyze_loop_form === > > test.c:8:3: note: === get_loop_niters === > > test.c:8:3: note: symbolic number of iterations is (unsigned int) > > n_9(D) > > test.c:8:3: note: not vectorized: loop contains function calls or > > data > > references that cannot be analyzed > > test.c:8:3: note: vectorized 0 loops in function > > > > there's no way to tell that the "vect_analyze_loop_form" is in fact > > inside the call to "analyze_loop_nest", and where the "symbolic > > number > > of iterations" messages is coming from in relation to them. This > > may > > not seem significant here, but I'm quoting a small example; > > vectorization typically leads to dozens of messages, with a deep > > nesting structure (where that structure isn't visible in the -fopt- > > info > > > > output). > > True. The same issue exists for diagnostics BTW. 
Indeed, being able > to collapse 'sections' in dump files, opt-info or diagnostics sounds > useful. > > Note that for dump files and opt-info the level argument was sort of > designed to do that. Are you referring to the indentation argument here? > > > > The existing API is throwing data away: > > > > * as noted above, by working purely with a location_t, the > > execution > > count isn't associated with the messages. The output format purely > > gives file/line/column information, but doesn't cover the inlining > > chain. For C++ templates it doesn't specify which instance of a > > template is being optimized. > > It might be useful to enhance the interface by allowing different > kind of 'locations'. In patch 3 of the kit there's a class optinfo_location, which can be constructed
Re: [PATCH] PR target/85358: Add target hook to prevent default widening
On Fri, 1 Jun 2018, Michael Meissner wrote: > I'm wondering if there are other suggestions to make this patch acceptable. > > As I mentioned previously, the initialization process needs to go through all > of the widening tables in order to initialize all FP types, so we can't just > arbitrarily eliminate IFmode from the widening table. Initialization that's meant to cover all floating-point modes logically should not rely on everything being reachable by the "wider" relation. That is, I'd expect it to do something equivalent to FOR_EACH_MODE_IN_CLASS (mode, MODE_FLOAT) rather than something based on "wider". But if the initialization relies on some form of partial ordering (initializing some modes relies on other modes having been initialized first, in a way somehow related to the "wider" relation), it may be more complicated than that. -- Joseph S. Myers jos...@codesourcery.com
[PATCH], Add weak references to bridge old/new PowerPC ieee 128-bit name mangling
The new mangling scheme for PowerPC now uses "u9__ieee128" as the mangled name instead of "U10__float128". This would break old code that used __float128 and called into libraries built with the new compiler. This patch adds a hook to emit a weak reference for the old mangled name when we are creating the new mangled name with a different spelling. I added an undocumented switch (-malias-old-name-mangling) to not create the weak reference if it is a problem. This patch also makes __ibm128 or __float128 use the long double mode if long double uses the IBM extended double or IEEE 128-bit representations. This allows templates to work again with those types (the template code aborts if you have two distinct types that use the same mangling). However, overloading on those types won't work, but I suspect these days people use templates over overloaded functions. I fixed up the test case for this (pr85657.C) so that it only tests for templates. I have tested it on a little endian power8 system and there were no regressions. It does fix the error in pr85657.C that was due to the previous name mangling changes. Can I check this into the trunk? We will need to backport the name mangling changes and these patches back to GCC 8.2 as well. I will need to re-issue the patch for __builtin_{,un}pack_longdouble and __builtin_{,un}pack_ibm128 due to changes made by these patches. [gcc] 2018-06-01 Michael Meissner * config/rs6000/rs6000.c (rs6000_passes_ieee128): New static boolean to remember if we passed or returned IEEE 128-bit values. (old_mangling): New static boolean to say whether to use the old mangling strings for IEEE 128-bit floating point or the new mangling. (TARGET_ASM_GLOBALIZE_DECL_NAME): Define target hook. (rs6000_debug_reg_global): Print whether we support creating weak references for the old mangling name. (rs6000_option_override_internal): Add support for -malias-old-name-mangling. (init_cumulative_args): Note if we ever have passed or returned IEEE 128-bit floating point. 
(rs6000_function_arg_advance_1): Likewise. (rs6000_init_builtins): If long double is IBM extended double, make __ibm128 use the TFmode type. If long double is IEEE 128-bit floating point, make __float128 use the TFmode type. (rs6000_mangle_type): Return either the old mangling or new mangling string for IEEE 128-bit floating point types. (rs6000_globalize_decl_name): If we are putting out a .globl for a C++ mangled name that contains an IEEE 128-bit floating point value, put out a weak reference for the old name if it was different. * config/rs6000/rs6000.opt (-malias-old-name-mangling): Add undocumented switch to control whether weak references are put out for the old mangled name if the mangling changed. [gcc/testsuite] 2018-06-01 Michael Meissner * g++.dg/pr85657.C: Update test, only test whether templates can be used on all 3 128-bit floating point types. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 261078) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -200,6 +200,15 @@ int dot_symbols; of this machine mode. */ scalar_int_mode rs6000_pmode; +/* Note whether IEEE 128-bit floating point was passed or returned, either as + the __float128/_Float128 explicit type, or when long double is IEEE 128-bit + floating point. We changed the default C++ mangling for these types and we + may want to generate a weak alias to the old mangling. */ +static bool rs6000_passes_ieee128; + +/* Generate old mangled name, not new name. */ +static bool old_mangling; + /* Width in bits of a pointer. 
*/ unsigned rs6000_pointer_size; @@ -1973,6 +1982,11 @@ static const struct attribute_spec rs600 #undef TARGET_STARTING_FRAME_OFFSET #define TARGET_STARTING_FRAME_OFFSET rs6000_starting_frame_offset + +#if TARGET_ELF && RS6000_WEAK +#undef TARGET_ASM_GLOBALIZE_DECL_NAME +#define TARGET_ASM_GLOBALIZE_DECL_NAME rs6000_globalize_decl_name +#endif /* Processor table. */ @@ -2835,6 +2849,9 @@ rs6000_debug_reg_global (void) fprintf (stderr, DEBUG_FMT_S, "abi", abi_str); + if (TARGET_OLD_NAME_MANGLING) +fprintf (stderr, DEBUG_FMT_S, "old name mangling", "true"); + if (rs6000_altivec_abi) fprintf (stderr, DEBUG_FMT_S, "altivec_abi", "true"); @@ -4635,6 +4652,16 @@ rs6000_option_override_internal (bool gl rs6000_isa_flags &= ~OPTION_MASK_FLOAT128_HW; } + /* Enable making an alias by default to map old mangled names to the new + names on systems that support float128. */ + if (TARGET_OLD_NAME_MANGLING && !TARGET_FLOAT128_TYPE) +{ +
C++ PATCH for c++/85764, bogus 'this' not captured error
Here, because we're inside a member function, resolvable_dummy_lambda thought we could resolve a reference to 'this'. But because it's a static member function, we can't. Using nonlambda_method_basetype instead of current_nonlambda_class_type fixes that, but then I needed to improve n_m_b to handle NSDMI context. Tested x86_64-pc-linux-gnu, applying to trunk. commit 100e3e46cc5e8000ebe27640beb34d027b414beb Author: Jason Merrill Date: Fri Jun 1 19:10:39 2018 -0400 PR c++/85764 - bogus 'this' not captured error. * lambda.c (resolvable_dummy_lambda): Use nonlambda_method_basetype. (nonlambda_method_basetype): Handle NSDMI. diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c index 9c1b49b4d8e..231490fbe4e 100644 --- a/gcc/cp/lambda.c +++ b/gcc/cp/lambda.c @@ -884,7 +884,7 @@ resolvable_dummy_lambda (tree object) && current_class_type && LAMBDA_TYPE_P (current_class_type) && lambda_function (current_class_type) - && DERIVED_FROM_P (type, current_nonlambda_class_type ())) + && DERIVED_FROM_P (type, nonlambda_method_basetype())) return CLASSTYPE_LAMBDA_EXPR (current_class_type); return NULL_TREE; @@ -949,30 +949,37 @@ current_nonlambda_function (void) return fn; } -/* Returns the method basetype of the innermost non-lambda function, or - NULL_TREE if none. */ +/* Returns the method basetype of the innermost non-lambda function, including + a hypothetical constructor if inside an NSDMI, or NULL_TREE if none. */ tree nonlambda_method_basetype (void) { - tree fn, type; if (!current_class_ref) return NULL_TREE; - type = current_class_type; + tree type = current_class_type; if (!type || !LAMBDA_TYPE_P (type)) return type; - /* Find the nearest enclosing non-lambda function. */ - fn = TYPE_NAME (type); - do -fn = decl_function_context (fn); - while (fn && LAMBDA_FUNCTION_P (fn)); + while (true) +{ + tree lam = CLASSTYPE_LAMBDA_EXPR (type); + tree ex = LAMBDA_EXPR_EXTRA_SCOPE (lam); + if (ex && TREE_CODE (ex) == FIELD_DECL) + /* Lambda in an NSDMI. 
*/ + return DECL_CONTEXT (ex); - if (!fn || !DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)) -return NULL_TREE; - - return TYPE_METHOD_BASETYPE (TREE_TYPE (fn)); + tree fn = TYPE_CONTEXT (type); + if (!fn || TREE_CODE (fn) != FUNCTION_DECL + || !DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)) + /* No enclosing non-lambda method. */ + return NULL_TREE; + if (!LAMBDA_FUNCTION_P (fn)) + /* Found an enclosing non-lambda method. */ + return TYPE_METHOD_BASETYPE (TREE_TYPE (fn)); + type = DECL_CONTEXT (fn); +} } /* Like current_scope, but looking through lambdas. */ diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-this2.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-this2.C new file mode 100644 index 000..d27ed713fff --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-this2.C @@ -0,0 +1,13 @@ +// PR c++/85764 +// { dg-do compile { target c++14 } } + +template +class trie { +static void for_each(int & f, trie const & n, Key & prefix) { +[&](trie const & c) { + for_each(f, c, prefix); +}; +} +void for_each(int & f) const { +} +};
Re: [PATCH], Add weak references to bridge old/new PowerPC ieee 128-bit name mangling
On Jun 01 2018, Michael Meissner wrote: > This patch adds a hook to emit a weak reference for the old mangled name when > we are creating the new mangled name with a different spelling. I added an > undocumented switch (-malias-old-name-mangling) to not create the weak > reference if it is a problem. Please avoid "old" in the option name, at least add a word that specifies it further. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."