Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote: > On 12/10/13 23:35, Bin.Cheng wrote: > >I know little about GC, so when ggc_collect may be called (between two > >passes)? If so, I have to call free_affine_expand_cache just after the > >calling to tree_to_affine_combination_expand in SCEV because it's an > >analyzer and as you pointed out, the analyzing result is used by > >different optimizers. The pointer map would be maintained between > >different passes if it's not instantly freed. > The garbage collector only runs between passes. If you have roots > in static storage, then you'll need to decorate them so the garbage > collector scans them. > > There's a fairly extensive discussion of the garbage collector in > the GCC internals manual. It isn't that easy, pointer_map isn't a data structure recognized by the garbage collector at all (plus, as I've said before, the code is inserting XNEW allocated structures into the pointer_map, which isn't GC friendly either, but making them GC allocated might increase garbage unnecessarily, as all the structs are short lived). I've looked through scev_{initialize,finalize} callers, and almost all passes initialize it and finalize within the same pass, it is only the loop optimizations: NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_dce_loop); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); NEXT_PASS (pass_check_data_deps); NEXT_PASS (pass_loop_distribution); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_graphite); PUSH_INSERT_PASSES_WITHIN (pass_graphite) NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); NEXT_PASS (pass_if_conversion); /* pass_vectorize must immediately follow pass_if_conversion. Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); NEXT_PASS (pass_slp_vectorize); NEXT_PASS (pass_loop_prefetch); NEXT_PASS (pass_iv_optimize); NEXT_PASS (pass_lim); NEXT_PASS (pass_tree_loop_done); where scev is live in between the passes, and unfortunately there is no call that each pass calls that you could easily add the free_affine.* call to (except for execute_function_todo but you'd need to invent another TODO_* flag for it and the question is how you'd ensure it will be handled for the non-loop specific passes that are just run in between these passes (say pass_copy_prop, would we need pass_copy_prop_loop?). Perhaps you could replace in tree-affine.c and all it's users {,struct} pointer_map_t * with struct GTY((user)) affine_expand_cache { pointer_map_t *cache; }; and write custom gt_ggc_mx (affine_expand_cache *) function that would would either pointer_map_traverse the cache and for each cache entry call gt_ggc_mx on the embedded trees in the structures, or drop the cache (dunno if the cache is really just a cache and it doesn't matter if it is dropped any time, or if it can affect code generation). There are also PCH related functions that need to be defined, though I guess you could just assert there that the cache is NULL and not walk anything. Jakub
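To make the suggestion a bit more concrete, here is a rough sketch of what the GTY((user)) wrapper and its marker could look like. It assumes the cached values are tree-affine.c's XNEWed expansion records (an aff_tree plus a flag) and that a gt_ggc_mx overload for plain trees is available to user markers; none of this is verified against the tree, it only illustrates the shape of the idea.

/* Sketch only -- the wrapper described above; entry layout and helper
   names are assumptions, not taken from the tree.  */
struct GTY((user)) affine_expand_cache {
  pointer_map_t *cache;
};

/* pointer_map_traverse callback: mark the trees embedded in one cached
   expansion record so the collector keeps them alive.  */
static bool
mark_expansion_entry (const void *key ATTRIBUTE_UNUSED, void **value,
		      void *data ATTRIBUTE_UNUSED)
{
  struct name_expansion *exp = (struct name_expansion *) *value;
  for (unsigned i = 0; i < exp->expansion.n; i++)
    gt_ggc_mx (exp->expansion.elts[i].val);
  if (exp->expansion.rest)
    gt_ggc_mx (exp->expansion.rest);
  if (exp->expansion.type)
    gt_ggc_mx (exp->expansion.type);
  return true;
}

/* Custom marker required for the GTY((user)) type: either walk the cache
   as below, or simply drop it if it really is only a cache.  */
void
gt_ggc_mx (affine_expand_cache *c)
{
  if (c->cache)
    pointer_map_traverse (c->cache, mark_expansion_entry, NULL);
}

/* PCH walker: as suggested, just insist the cache is empty there.  */
void
gt_pch_nx (affine_expand_cache *c)
{
  gcc_assert (c->cache == NULL);
}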
[C++ PATCH] Fix GC related issues in C++ FE (PR c++/58627)
Hi! fn_type_unification does: /* We can't free this if a pending_template entry or * last_error_tinst_level is pointing at it. */ if (last_pending_template == old_last_pend && last_error_tinst_level == old_error_tinst) ggc_free (tinst); because sometimes tinst (and targs) is added to pending templates or tinst level and thus still GC reachable, but often it isn't and it is called often enough that it creates too much garbage. But, the resolve_address_of_overloaded_function caller also calls ggc_free (targs); for similar reasons, but without checking if it might be stored somewhere. The options I see: 1) drop the ggc_free (targs); call from resolve_address_of_overloaded_function altogether 2) remember and check last_pending_template/last_error_tinst_level also in resolve_address_of_overloaded_function for the same purposes as fn_type_unification does; the problem with that is that those vars are private to pt.c 3) what the patch does, add another parameter to fn_type_unification which will tell the caller whether it is safe to ggc_free targs or not The following patch implements 3), and adds a bunch of ggc_free calls to other callers of fn_type_unification when it has that info already. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-11 Jakub Jelinek PR c++/58627 * cp-tree.h (fn_type_unification): Add bool * argument. * pt.c (fn_type_unification): Add targs_unused_p argument, set what it points to to true if targs argument isn't stored anywhere. (get_bindings): Adjust fn_type_unification caller. If targs_unused_p, ggc_free targs. * call.c (add_template_candidate_real): Likewise. (print_z_candidate): Likewise. * class.c (resolve_address_of_overloaded_function): Don't ggc_free targs if targs_unused_p is false. --- gcc/cp/cp-tree.h.jj 2013-11-25 10:20:12.0 +0100 +++ gcc/cp/cp-tree.h2013-12-10 09:13:55.202703361 +0100 @@ -5495,7 +5495,7 @@ extern tree instantiate_template (tree, extern tree fn_type_unification(tree, tree, tree, const tree *, unsigned int, tree, unification_kind_t, int, -bool, bool); +bool, bool, bool *); extern void mark_decl_instantiated (tree, int); extern int more_specialized_fn (tree, tree, int); extern void do_decl_instantiation (tree, tree); --- gcc/cp/call.c.jj2013-11-25 10:20:12.0 +0100 +++ gcc/cp/call.c 2013-12-10 09:19:20.034025608 +0100 @@ -2919,22 +2919,25 @@ add_template_candidate_real (struct z_ca gcc_assert (ia == nargs_without_in_chrg); errs = errorcount+sorrycount; + bool targs_unused_p; fn = fn_type_unification (tmpl, explicit_targs, targs, args_without_in_chrg, nargs_without_in_chrg, return_type, strict, flags, false, - complain & tf_decltype); + complain & tf_decltype, &targs_unused_p); if (fn == error_mark_node) { /* Don't repeat unification later if it already resulted in errors. */ - if (errorcount+sorrycount == errs) + if (errorcount + sorrycount == errs) reason = template_unification_rejection (tmpl, explicit_targs, targs, args_without_in_chrg, nargs_without_in_chrg, return_type, strict, flags); else reason = template_unification_error_rejection (); + if (targs_unused_p) + ggc_free (targs); goto fail; } @@ -3230,16 +3233,19 @@ print_z_candidate (location_t loc, const } /* Re-run template unification with diagnostics. */ inform (cloc, " template argument deduction/substitution failed:"); + bool targs_unused_p; + tree targs; + targs = make_tree_vec (r->u.template_unification.num_targs); fn_type_unification (r->u.template_unification.tmpl, - r->u.template_unification.explicit_targs, - (make_tree_vec -
[PATCH] Fix invalid tree sharing caused by the inliner (PR tree-optimization/59386)
Hi! If id->retvar isn't a decl (can happen for DECL_BY_REFERENCE returns), then we can end up with invalid tree sharing because we reuse it more than once (on the following testcase id->retvar is a COMPONENT_REF). Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-11 Jakub Jelinek PR tree-optimization/59386 * tree-inline.c (remap_gimple_stmt): If not id->do_not_unshare, unshare_expr (id->retval) before passing it to gimple_build_assign. * gcc.c-torture/compile/pr59386.c: New test. --- gcc/tree-inline.c.jj2013-12-10 08:52:13.0 +0100 +++ gcc/tree-inline.c 2013-12-10 17:04:09.022069945 +0100 @@ -1273,7 +1273,9 @@ remap_gimple_stmt (gimple stmt, copy_bod || ! SSA_NAME_VAR (retval) || TREE_CODE (SSA_NAME_VAR (retval)) != RESULT_DECL))) { - copy = gimple_build_assign (id->retvar, retval); + copy = gimple_build_assign (id->do_not_unshare + ? id->retvar : unshare_expr (id->retvar), + retval); /* id->retvar is already substituted. Skip it on later remapping. */ skip_first = true; } --- gcc/testsuite/gcc.c-torture/compile/pr59386.c.jj2013-12-10 17:06:39.389291052 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr59386.c 2013-12-10 17:05:37.0 +0100 @@ -0,0 +1,24 @@ +/* PR tree-optimization/59386 */ + +struct S { int s; }; +struct T { int t; struct S u; } c; +int b; + +struct S +foo () +{ + struct T d; + if (b) +while (c.t) + ; + else +return d.u; +} + +struct S +bar () +{ + struct T a; + a.u = foo (); + return a.u; +} Jakub
Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 4:17 PM, Jakub Jelinek wrote: > On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote: >> On 12/10/13 23:35, Bin.Cheng wrote: >> >I know little about GC, so when ggc_collect may be called (between two >> >passes)? If so, I have to call free_affine_expand_cache just after the >> >calling to tree_to_affine_combination_expand in SCEV because it's an >> >analyzer and as you pointed out, the analyzing result is used by >> >different optimizers. The pointer map would be maintained between >> >different passes if it's not instantly freed. >> The garbage collector only runs between passes. If you have roots >> in static storage, then you'll need to decorate them so the garbage >> collector scans them. >> >> There's a fairly extensive discussion of the garbage collector in >> the GCC internals manual. > > It isn't that easy, pointer_map isn't a data structure recognized by the > garbage collector at all (plus, as I've said before, the code is inserting > XNEW allocated structures into the pointer_map, which isn't GC friendly > either, but making them GC allocated might increase garbage unnecessarily, > as all the structs are short lived). > > I've looked through scev_{initialize,finalize} callers, and almost all > passes initialize it and finalize within the same pass, it is only the loop > optimizations: > NEXT_PASS (pass_tree_loop_init); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce_loop); > NEXT_PASS (pass_tree_unswitch); > NEXT_PASS (pass_scev_cprop); > NEXT_PASS (pass_record_bounds); > NEXT_PASS (pass_check_data_deps); > NEXT_PASS (pass_loop_distribution); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_graphite); > PUSH_INSERT_PASSES_WITHIN (pass_graphite) > NEXT_PASS (pass_graphite_transforms); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_copy_prop); > NEXT_PASS (pass_dce_loop); > POP_INSERT_PASSES () > NEXT_PASS (pass_iv_canon); > NEXT_PASS (pass_parallelize_loops); > NEXT_PASS (pass_if_conversion); > /* pass_vectorize must immediately follow pass_if_conversion. > Please do not add any other passes in between. */ > NEXT_PASS (pass_vectorize); > PUSH_INSERT_PASSES_WITHIN (pass_vectorize) > NEXT_PASS (pass_dce_loop); > POP_INSERT_PASSES () > NEXT_PASS (pass_predcom); > NEXT_PASS (pass_complete_unroll); > NEXT_PASS (pass_slp_vectorize); > NEXT_PASS (pass_loop_prefetch); > NEXT_PASS (pass_iv_optimize); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_tree_loop_done); > where scev is live in between the passes, and unfortunately there > is no call that each pass calls that you could easily add the free_affine.* > call to (except for execute_function_todo but you'd need to invent another > TODO_* flag for it and the question is how you'd ensure it will be handled > for the non-loop specific passes that are just run in between these > passes (say pass_copy_prop, would we need pass_copy_prop_loop?). > > Perhaps you could replace in tree-affine.c and all it's users > {,struct} pointer_map_t * with > struct GTY((user)) affine_expand_cache { > pointer_map_t *cache; > }; > and write custom gt_ggc_mx (affine_expand_cache *) function that would > would either pointer_map_traverse the cache and for each cache entry > call gt_ggc_mx on the embedded trees in the structures, or drop the cache > (dunno if the cache is really just a cache and it doesn't matter if it is > dropped any time, or if it can affect code generation). 
> There are also PCH related functions that need to be defined, though I guess you could just assert there that the cache is NULL and not walk anything. Thanks to both of you for being patient with this patch. I went through the documentation quickly and realized that I would have to modify the pointer-map structure to make it recognized by GC (plus maybe the further work suggested by Jakub). It seems I shouldn't take on that task in this patch during stage 3, so I am thinking of just calling the free_affine* function at the place the cache is created for SCEV. Of course, that makes it more expensive. I also just found that the patch fixes PR58296 in ivopts, and I do want to include it in 4.9 if we are OK with it. So what's your opinion? Thanks, bin > > Jakub -- Best Regards.
[PATCH] Range info handling fixes (PR tree-optimization/59417)
Hi! As discussed in the PR, range info as well as alignment info for pointers can be position dependent, can be e.g. computed from VRP ASSERT_EXPRs, so we can't blindly propagate it to SSA_NAMEs that the SSA_NAME with range info has been set to. The following patch limits that to SSA_NAMEs defined within the same bb, in that case it should be safe. Also, the patch makes determine_value_range more robust, insert of asserting it will just punt and use solely range info of var, rather than also considering ranges of PHI results. If VR_UNDEFINED comes into play, range info can be weird (of course on undefined behavior only code, but we shouldn't ICE on it). Either of these changes fixes the ICE on it's own. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-11 Jakub Jelinek PR tree-optimization/59417 * tree-ssa-copy.c (fini_copy_prop): If copy_of[i].value is defined in a different bb rhan var, only duplicate points-to info and not alignment info and don't duplicate range info. * tree-ssa-loop-niter.c (determine_value_range): Instead of assertion failure handle inconsistencies in range info by only using var's range info and not PHI result range infos. * gcc.c-torture/compile/pr59417.c: New test. --- gcc/tree-ssa-copy.c.jj 2013-12-10 08:52:13.0 +0100 +++ gcc/tree-ssa-copy.c 2013-12-10 17:55:16.819293842 +0100 @@ -567,14 +567,28 @@ fini_copy_prop (void) if (copy_of[i].value != var && TREE_CODE (copy_of[i].value) == SSA_NAME) { + basic_block copy_of_bb + = gimple_bb (SSA_NAME_DEF_STMT (copy_of[i].value)); + basic_block var_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); if (POINTER_TYPE_P (TREE_TYPE (var)) && SSA_NAME_PTR_INFO (var) && !SSA_NAME_PTR_INFO (copy_of[i].value)) - duplicate_ssa_name_ptr_info (copy_of[i].value, -SSA_NAME_PTR_INFO (var)); + { + duplicate_ssa_name_ptr_info (copy_of[i].value, + SSA_NAME_PTR_INFO (var)); + /* Points-to information is cfg insensitive, +but alignment info might be cfg sensitive, if it +e.g. is derived from VRP derived non-zero bits. +So, do not copy alignment info if the two SSA_NAMEs +aren't defined in the same basic block. */ + if (var_bb != copy_of_bb) + mark_ptr_info_alignment_unknown + (SSA_NAME_PTR_INFO (copy_of[i].value)); + } else if (!POINTER_TYPE_P (TREE_TYPE (var)) && SSA_NAME_RANGE_INFO (var) - && !SSA_NAME_RANGE_INFO (copy_of[i].value)) + && !SSA_NAME_RANGE_INFO (copy_of[i].value) + && var_bb == copy_of_bb) duplicate_ssa_name_range_info (copy_of[i].value, SSA_NAME_RANGE_TYPE (var), SSA_NAME_RANGE_INFO (var)); --- gcc/tree-ssa-loop-niter.c.jj2013-12-02 23:34:02.0 +0100 +++ gcc/tree-ssa-loop-niter.c 2013-12-10 17:59:07.744105878 +0100 @@ -173,7 +173,15 @@ determine_value_range (struct loop *loop { minv = minv.max (minc, TYPE_UNSIGNED (type)); maxv = maxv.min (maxc, TYPE_UNSIGNED (type)); - gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0); + /* If the PHI result range are inconsistent with +the VAR range, give up on looking at the PHI +results. This can happen if VR_UNDEFINED is +involved. 
*/ + if (minv.cmp (maxv, TYPE_UNSIGNED (type)) > 0) + { + rtype = get_range_info (var, &minv, &maxv); + break; + } } } } --- gcc/testsuite/gcc.c-torture/compile/pr59417.c.jj2013-12-10 17:59:29.235996171 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr59417.c 2013-12-09 16:26:53.0 +0100 @@ -0,0 +1,39 @@ +/* PR tree-optimization/59417 */ + +int a, b, d; +short c; + +void +f (void) +{ + if (b) +{ + int *e; + + if (d) + { + for (; b; a++) + lbl1: + d = 0; + + for (; d <= 1; d++) + { + int **q = &e; + for (**q = 0; **q <= 0; **q++) + d = 0; + } + } +} + + else +{ + int t; + for (c = 0; c < 77; c++) + for (c = 0; c < 46; c++); + for (; t <= 0; t++) + lbl2: + ; + goto lbl1; +} + goto lbl2; +} Jakub
Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: > Thank both of you for being patient on this patch. I went through the documentation quickly and realized that I have to modify the pointer-map structure to make it recognized by GC (maybe more work suggested by Jakub). Changing pointer_map at this point is IMHO not appropriate. > It seems I shouldn't include that task in this patch at this stage 3, I am thinking just call free_affine* function in the place it is created for SCEV. Of course, that makes it more expensive. Perhaps you could call free_affine_expand_cache say from analyze_scalar_evolution, which is the innermost enclosing exported API that indirectly calls your new code, but it seems it is also called recursively from analyze_scalar_evolution_1 or the functions it calls. So maybe you'd need to rename analyze_scalar_evolution to analyze_scalar_evolution_2, adjust some calls to analyze_scalar_evolution to the latter in tree-scalar-evolution.c, and add an analyze_scalar_evolution wrapper that calls analyze_scalar_evolution_2 and then free_affine_expand_cache. Whether this would work depends on a detailed analysis of the tree-scalar-evolution.c callgraph: there are tons of functions, most of them static, and the question is whether there is a single non-static function (or at most a few of them) that covers all calls to your new code in the static functions it (indirectly) calls, i.e. whether there isn't any other external entry point that might call your new code without doing free_affine_expand_cache afterwards. If you can find that, I'd say it would be the safest thing to do. But let's see what Richard thinks about it too. Jakub
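A rough sketch of the wrapper arrangement suggested here, purely illustrative: peeled_chrec_map is assumed to be the pointer map introduced by Bin's patch, and the real work is auditing which internal callers must switch to the renamed worker.

/* The existing implementation, renamed; its recursive uses and the other
   internal callers in tree-scalar-evolution.c would call this directly.  */
static tree analyze_scalar_evolution_2 (struct loop *loop, tree var);

/* New exported entry point: run the analysis, then immediately drop the
   affine expansion cache so no pointer_map is live at a GC point.  */
tree
analyze_scalar_evolution (struct loop *loop, tree var)
{
  tree res = analyze_scalar_evolution_2 (loop, var);
  free_affine_expand_cache (&peeled_chrec_map);
  return res;
}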
[PATCH, nds32] Missing target_cpu_default in TARGET_DEFAULT_TARGET_FLAGS.
Hi, I recently configured an nds32 GCC with --target=nds32be-elf, and it seems that big endian is not set as the default. In config.gcc, I noticed that target_cpu_default is defined as: nds32le-*-*) target_cpu_default="0" nds32be-*-*) target_cpu_default="0|MASK_BIG_ENDIAN" But TARGET_CPU_DEFAULT is not added into TARGET_DEFAULT_TARGET_FLAGS, so MASK_BIG_ENDIAN is not enabled for nds32be-elf. The following patch fixes this issue. Tested on nds32be-elf. OK to apply? Index: common/config/nds32/nds32-common.c === --- common/config/nds32/nds32-common.c (revision 205880) +++ common/config/nds32/nds32-common.c (working copy) @@ -93,7 +93,8 @@ TARGET_CMOV : Generate conditional move instruction. */ #undef TARGET_DEFAULT_TARGET_FLAGS #define TARGET_DEFAULT_TARGET_FLAGS\ - (MASK_GP_DIRECT \ + (TARGET_CPU_DEFAULT \ + | MASK_GP_DIRECT\ | MASK_16_BIT \ | MASK_PERF_EXT \ | MASK_CMOV) Index: ChangeLog === --- ChangeLog (revision 205880) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2013-12-11 Monk Chiang + + * common/config/nds32/nds32-common.c (TARGET_DEFAULT_TARGET_FLAGS): + Redefine. + 2013-12-11 Bin Cheng Reverted: Monk
Re: [PATCH] Fix invalid tree sharing caused by the inliner (PR tree-optimization/59386)
On Wed, 11 Dec 2013, Jakub Jelinek wrote: > Hi! > > If id->retvar isn't a decl (can happen for DECL_BY_REFERENCE returns), > then we can end up with invalid tree sharing because we reuse it more than > once (on the following testcase id->retvar is a COMPONENT_REF). > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Ok. Thanks, Richard. > 2013-12-11 Jakub Jelinek > > PR tree-optimization/59386 > * tree-inline.c (remap_gimple_stmt): If not id->do_not_unshare, > unshare_expr (id->retval) before passing it to gimple_build_assign. > > * gcc.c-torture/compile/pr59386.c: New test. > > --- gcc/tree-inline.c.jj 2013-12-10 08:52:13.0 +0100 > +++ gcc/tree-inline.c 2013-12-10 17:04:09.022069945 +0100 > @@ -1273,7 +1273,9 @@ remap_gimple_stmt (gimple stmt, copy_bod > || ! SSA_NAME_VAR (retval) > || TREE_CODE (SSA_NAME_VAR (retval)) != RESULT_DECL))) > { > - copy = gimple_build_assign (id->retvar, retval); > + copy = gimple_build_assign (id->do_not_unshare > + ? id->retvar : unshare_expr (id->retvar), > + retval); > /* id->retvar is already substituted. Skip it on later remapping. */ > skip_first = true; > } > --- gcc/testsuite/gcc.c-torture/compile/pr59386.c.jj 2013-12-10 > 17:06:39.389291052 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr59386.c 2013-12-10 > 17:05:37.0 +0100 > @@ -0,0 +1,24 @@ > +/* PR tree-optimization/59386 */ > + > +struct S { int s; }; > +struct T { int t; struct S u; } c; > +int b; > + > +struct S > +foo () > +{ > + struct T d; > + if (b) > +while (c.t) > + ; > + else > +return d.u; > +} > + > +struct S > +bar () > +{ > + struct T a; > + a.u = foo (); > + return a.u; > +} > > Jakub > > -- Richard Biener SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer
Re: [PATCH] Range info handling fixes (PR tree-optimization/59417)
On Wed, 11 Dec 2013, Jakub Jelinek wrote: > Hi! > > As discussed in the PR, range info as well as alignment info for pointers > can be position dependent, can be e.g. computed from VRP ASSERT_EXPRs, > so we can't blindly propagate it to SSA_NAMEs that the SSA_NAME with range > info has been set to. The following patch limits that to SSA_NAMEs defined > within the same bb, in that case it should be safe. > > Also, the patch makes determine_value_range more robust, insert of asserting > it will just punt and use solely range info of var, rather than also > considering ranges of PHI results. If VR_UNDEFINED comes into play, range > info can be weird (of course on undefined behavior only code, but we > shouldn't ICE on it). > > Either of these changes fixes the ICE on it's own. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2013-12-11 Jakub Jelinek > > PR tree-optimization/59417 > * tree-ssa-copy.c (fini_copy_prop): If copy_of[i].value is defined > in a different bb rhan var, only duplicate points-to info and > not alignment info and don't duplicate range info. > * tree-ssa-loop-niter.c (determine_value_range): Instead of > assertion failure handle inconsistencies in range info by only > using var's range info and not PHI result range infos. > > * gcc.c-torture/compile/pr59417.c: New test. > > --- gcc/tree-ssa-copy.c.jj2013-12-10 08:52:13.0 +0100 > +++ gcc/tree-ssa-copy.c 2013-12-10 17:55:16.819293842 +0100 > @@ -567,14 +567,28 @@ fini_copy_prop (void) >if (copy_of[i].value != var > && TREE_CODE (copy_of[i].value) == SSA_NAME) > { > + basic_block copy_of_bb > + = gimple_bb (SSA_NAME_DEF_STMT (copy_of[i].value)); > + basic_block var_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); > if (POINTER_TYPE_P (TREE_TYPE (var)) > && SSA_NAME_PTR_INFO (var) > && !SSA_NAME_PTR_INFO (copy_of[i].value)) > - duplicate_ssa_name_ptr_info (copy_of[i].value, > - SSA_NAME_PTR_INFO (var)); > + { > + duplicate_ssa_name_ptr_info (copy_of[i].value, > +SSA_NAME_PTR_INFO (var)); > + /* Points-to information is cfg insensitive, > + but alignment info might be cfg sensitive, if it > + e.g. is derived from VRP derived non-zero bits. > + So, do not copy alignment info if the two SSA_NAMEs > + aren't defined in the same basic block. */ > + if (var_bb != copy_of_bb) > + mark_ptr_info_alignment_unknown > + (SSA_NAME_PTR_INFO (copy_of[i].value)); > + } > else if (!POINTER_TYPE_P (TREE_TYPE (var)) > && SSA_NAME_RANGE_INFO (var) > -&& !SSA_NAME_RANGE_INFO (copy_of[i].value)) > +&& !SSA_NAME_RANGE_INFO (copy_of[i].value) > +&& var_bb == copy_of_bb) > duplicate_ssa_name_range_info (copy_of[i].value, > SSA_NAME_RANGE_TYPE (var), > SSA_NAME_RANGE_INFO (var)); > --- gcc/tree-ssa-loop-niter.c.jj 2013-12-02 23:34:02.0 +0100 > +++ gcc/tree-ssa-loop-niter.c 2013-12-10 17:59:07.744105878 +0100 > @@ -173,7 +173,15 @@ determine_value_range (struct loop *loop > { > minv = minv.max (minc, TYPE_UNSIGNED (type)); > maxv = maxv.min (maxc, TYPE_UNSIGNED (type)); > - gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) <= 0); > + /* If the PHI result range are inconsistent with > + the VAR range, give up on looking at the PHI > + results. This can happen if VR_UNDEFINED is > + involved. 
*/ > + if (minv.cmp (maxv, TYPE_UNSIGNED (type)) > 0) > + { > + rtype = get_range_info (var, &minv, &maxv); > + break; > + } > } > } > } > --- gcc/testsuite/gcc.c-torture/compile/pr59417.c.jj 2013-12-10 > 17:59:29.235996171 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr59417.c 2013-12-09 > 16:26:53.0 +0100 > @@ -0,0 +1,39 @@ > +/* PR tree-optimization/59417 */ > + > +int a, b, d; > +short c; > + > +void > +f (void) > +{ > + if (b) > +{ > + int *e; > + > + if (d) > + { > + for (; b; a++) > + lbl1: > + d = 0; > + > + for (; d <= 1; d++) > + { > + int **q = &e; > + for (**q = 0; **q <= 0; **q++) > + d = 0; > + } > + } > +} > + > + else > +{ > + int t; > + for (c = 0; c < 77; c++) > + for (c = 0; c < 46; c++); > + for (; t <= 0; t++) > + lbl2: > + ; > + goto lbl1; > +} > + goto lbl2; > +} > > Jakub
Re: [PATCH] S/390: Function hotpatching option and function attribute
Backporting this patch to 4.8, 4.4 and 4.1 brought some issues to light. The new version of the patch cleans up some code and warnings and adapts the test cases so that they run without modification on 4.8 and earlier. Original message: On Thu, Dec 05, 2013 at 09:06:31AM +0100, Dominik Vogt wrote: > Andreas Krebbel and I have ported the function hotpatching feature > from i386 (aka the ms_hook_prologue attribute) to S/390. Andreas has > already internally approved the attached patch and will commit it > soon (for legal reasons he has to do that himself). > > The attached patch introduces command line options as well as a > function attribute to control the function hotpatching prologue: > > -m[no-]hotpatch > Enable or disable hotpatching prologues for all functions in > the compilation unit (with the default prologue size). > > -mhotpatch=n > Same as -mhotpatch, but the size of the prologue is n > halfwords (i.e. the trampoline area is n halfwords). > > __attribute__ ((hotpatch)) and > __attribute__ ((hotpatch(n))) > Work like the option, but only for the specific function. If > the attribute and one of the options is given, the attribute > always wins. > > The n above is limited to values 0 through 100 (default 12). > Functions that are explicitly inline cannot be hotpatched, and > hotpatched functions are never implicitly inlined. > > The layout of a hotpatchable function prologue is > > nopr %r7 (n times, 2 bytes each, aka the "trampoline area") > function_label: > nop 0 (four bytes) > > The function label alignment is changed to eight bytes to make > sure that the eight bytes directly in front of the label reside > in the same cacheline. > > To hotpatch a function, the program first writes the address of > the new version of the hotpatched function to the eight bytes > before the label (four bytes on 32-bit), then writes code to read > that address and jump to it into the trampoline area, and then > replaces the nop behind the label with a relative backwards jump > into the trampoline area. (Some maintenance of the Icache is > necessary in this procedure.) > > If the program can make sure that the patched functions are always > close enough to the original in the memory map, the trampoline > area can be omitted (n = 0), and a short relative jump to the > patched function can be written over the four-byte nop. > > The patch includes a number of test cases that all run without > trouble here. The tests validate the syntax of the options and the > attribute and check for the correct number of nops. I have > verified the correct layout of the trampoline manually (i.e. the > correct placement of the nops). 
Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany >From a307cee52819be37cb44d8f2a972200b132697e8 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Mon, 18 Nov 2013 15:41:46 + Subject: [PATCH 1/2] S/390: Function hotpatching option and function attribute --- gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c | 201 + gcc/config/s390/s390.h | 5 +- gcc/config/s390/s390.opt | 8 + gcc/doc/extend.texi| 11 ++ gcc/doc/invoke.texi| 18 +- gcc/testsuite/gcc.target/s390/hotpatch-1.c | 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-10.c| 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-11.c| 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-12.c| 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-2.c | 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-3.c | 20 ++ gcc/testsuite/gcc.target/s390/hotpatch-4.c | 26 +++ gcc/testsuite/gcc.target/s390/hotpatch-5.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-6.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-7.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-8.c | 28 +++ gcc/testsuite/gcc.target/s390/hotpatch-9.c | 21 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-1.c | 27 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-2.c | 27 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-3.c | 27 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-4.c | 11 ++ gcc/testsuite/gcc.target/s390/hotpatch-compile-5.c | 28 +++ gcc/testsuite/gcc.target/s390/hotpatch-compile-6.c | 11 ++ gcc/testsuite/gcc.target/s390/hotpatch-compile-7.c | 68 +++ 25 files changed, 700 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-1.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-10.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-11.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-12.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-2.c create mode 100644 gcc/testsuite/gcc.target/s390/hotpatch-3.c create mode 100644 gcc/testsuite/g
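For reference, a hypothetical usage sketch of the attribute forms described above; the function names are invented, only the attribute syntax and the size limits come from the patch description.

/* Hotpatch prologue with the default 12-halfword trampoline area.  */
void service_request (void) __attribute__ ((hotpatch));

/* Explicit trampoline size of 16 halfwords (allowed range 0..100).  */
void handle_event (void) __attribute__ ((hotpatch (16)));

/* n = 0: no trampoline area; only the 4-byte nop after the label remains,
   for patch targets reachable with a short relative jump.  */
void fast_path (void) __attribute__ ((hotpatch (0)));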
Re: [Patch, RTL] Eliminate redundant vec_select moves.
Richard Henderson writes: > On 12/10/2013 10:44 AM, Richard Sandiford wrote: >> Sorry, I don't understand. I never said it was invalid. I said >> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents >> a single register. On a little-endian target, the offset cannot be >> anything other than 0 in that case. >> >> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for >> something that is always invalid, regardless of the target. That kind >> of situation should be rejected by target-independent code instead. > > But, we want to disable the subreg before we know whether or not (reg:V4SF X) > will be allocated to a single hard register. That is something that we can't > know in target-independent code before register allocation. I was thinking that if we've got a class, we've also got things like CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. But even in the padding cases an offset-based check in C_C_M_C could be derived from other information. subreg_get_info handles padding with: nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); if (GET_MODE_INNER (xmode) == VOIDmode) xmode_unit = xmode; else xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode == (GET_MODE_NUNITS (xmode) * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by picking a different register class, or doing it in memory if necessary.) An example of a value with holes is XCmode on 32-bit x86 with -m128bit-long-double; it's represented in 6 32-bit registers, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < GET_MODE_NUNITS (xmode)) && (offset / GET_MODE_SIZE (xmode_unit) != ((offset + GET_MODE_SIZE (ymode) - 1) / GET_MODE_SIZE (xmode_unit { info->representable_p = false; rknown = true; } and I wouldn't really want to force targets to individually reproduce that kind of logic at the class level. If the worst comes to the worst we could cache the difficult cases. Thanks, Richard
Re: [PATCH] Fix PR 59390
On 8 December 2013 16:53, Uros Bizjak wrote: > On Fri, Dec 6, 2013 at 8:33 PM, Sriraman Tallam wrote: >> Patch updated with two more tests to check if the vfmadd insn is being >> produced when possible. >> >> Thanks >> Sri >> >> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam wrote: >>> Hi, >>> >>> I have attached a patch to fix >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. >>> >>> Here is the problem. GCC adds target-specific builtins on demand. The >>> FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via >>> this declaration: >>> >>> void fun() __attribute__((target("fma"))); >>> >>> Specifically, the builtin __builtin_ia32_vfmaddpd gets added when >>> ix86_add_new_builtins is called from ix86_valid_target_attribute_tree >>> when processing this target attribute. >>> >>> Now, when the vectorizer is processing the builtin "__builtin_fma" in >>> function other_fn(), it checks to see if this function is vectorizable >>> and calls ix86_builtin_vectorized_function in i386.c. That returns the >>> builtin stored here: >>> >>> >>> case BUILT_IN_FMA: >>> if (out_mode == DFmode && in_mode == DFmode) >>> { >>> if (out_n == 2 && in_n == 2) >>>return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>> >>> >>> ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE >>> had the builtin not been added by the previous target attribute. That >>> is why the code works if we remove the previous declaration. >>> >>> The fix is to not just return the builtin but to also check if the >>> current function's isa allows the use of the builtin. For instance, >>> this patch would solve the problem: >>> >>> @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>>if (out_mode == DFmode && in_mode == DFmode) >>> { >>>if (out_n == 2 && in_n == 2) >>> -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>> +{ >>> + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa >>> + & global_options.x_ix86_isa_flags) >>> +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; >>> + else >>> + return NULL_TREE; >>> +} >>> >>> >>> but there are many instances of this usage in >>> ix86_builtin_vectorized_function. This patch covers all the cases. > >> PR target/59390 >> * gcc.target/i386/pr59390.c: New test. >> * gcc.target/i386/pr59390_1.c: New test. >> * gcc.target/i386/pr59390_2.c: New test. >> * config/i386/i386.c (get_builtin): New function. >> (ix86_builtin_vectorized_function): Replace all instances of >> ix86_builtins[...] with get_builtin(...). >> (ix86_builtin_reciprocal): Ditto. > > OK, with a couple of nits: > > +static tree get_builtin (enum ix86_builtins code) > > Please name this function ix86_get_builtin. > > + if (current_function_decl) > +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > + if (target_tree) > +opts = TREE_TARGET_OPTION (target_tree); > + else > +opts = TREE_TARGET_OPTION (target_option_default_node); > + > > opts = TREE_TARGET_OPTION (target_tree ? target_tree : > target_option_default_node); Just curious: > +static tree get_builtin (enum ix86_builtins code) > +{ [] > +> [] > @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >if (out_mode == DFmode && in_mode == DFmode) > { > if (out_n == 2 && in_n == 2) > -return ix86_builtins[IX86_BUILTIN_SQRTPD]; > +get_builtin (IX86_BUILTIN_SQRTPD); > else if (out_n == 4 && in_n == 4) > -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; > +get_builtin (IX86_BUILTIN_SQRTPD256); > } >break; I must be missing something? Don't you have to return ix86_get_builtin(...) ? thanks,
Re: [PATCH] Fix PR 59390
On Wed, 11 Dec 2013, Bernhard Reutner-Fischer wrote: > On 8 December 2013 16:53, Uros Bizjak wrote: > > On Fri, Dec 6, 2013 at 8:33 PM, Sriraman Tallam wrote: > >> Patch updated with two more tests to check if the vfmadd insn is being > >> produced when possible. > >> > >> Thanks > >> Sri > >> > >> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam > >> wrote: > >>> Hi, > >>> > >>> I have attached a patch to fix > >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. > >>> > >>> Here is the problem. GCC adds target-specific builtins on demand. The > >>> FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via > >>> this declaration: > >>> > >>> void fun() __attribute__((target("fma"))); > >>> > >>> Specifically, the builtin __builtin_ia32_vfmaddpd gets added when > >>> ix86_add_new_builtins is called from ix86_valid_target_attribute_tree > >>> when processing this target attribute. > >>> > >>> Now, when the vectorizer is processing the builtin "__builtin_fma" in > >>> function other_fn(), it checks to see if this function is vectorizable > >>> and calls ix86_builtin_vectorized_function in i386.c. That returns the > >>> builtin stored here: > >>> > >>> > >>> case BUILT_IN_FMA: > >>> if (out_mode == DFmode && in_mode == DFmode) > >>> { > >>> if (out_n == 2 && in_n == 2) > >>>return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > >>> > >>> > >>> ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE > >>> had the builtin not been added by the previous target attribute. That > >>> is why the code works if we remove the previous declaration. > >>> > >>> The fix is to not just return the builtin but to also check if the > >>> current function's isa allows the use of the builtin. For instance, > >>> this patch would solve the problem: > >>> > >>> @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, > >>> tre > >>>if (out_mode == DFmode && in_mode == DFmode) > >>> { > >>>if (out_n == 2 && in_n == 2) > >>> -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > >>> +{ > >>> + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa > >>> + & global_options.x_ix86_isa_flags) > >>> +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > >>> + else > >>> + return NULL_TREE; > >>> +} > >>> > >>> > >>> but there are many instances of this usage in > >>> ix86_builtin_vectorized_function. This patch covers all the cases. > > > >> PR target/59390 > >> * gcc.target/i386/pr59390.c: New test. > >> * gcc.target/i386/pr59390_1.c: New test. > >> * gcc.target/i386/pr59390_2.c: New test. > >> * config/i386/i386.c (get_builtin): New function. > >> (ix86_builtin_vectorized_function): Replace all instances of > >> ix86_builtins[...] with get_builtin(...). > >> (ix86_builtin_reciprocal): Ditto. > > > > OK, with a couple of nits: > > > > +static tree get_builtin (enum ix86_builtins code) > > > > Please name this function ix86_get_builtin. > > > > + if (current_function_decl) > > +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); > > + if (target_tree) > > +opts = TREE_TARGET_OPTION (target_tree); > > + else > > +opts = TREE_TARGET_OPTION (target_option_default_node); > > + > > > > opts = TREE_TARGET_OPTION (target_tree ? 
target_tree : > > target_option_default_node); > > Just curious: > > +static tree get_builtin (enum ix86_builtins code) > > +{ > [] > > +> > [] > > @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre > >if (out_mode == DFmode && in_mode == DFmode) > > { > > if (out_n == 2 && in_n == 2) > > -return ix86_builtins[IX86_BUILTIN_SQRTPD]; > > +get_builtin (IX86_BUILTIN_SQRTPD); > > else if (out_n == 4 && in_n == 4) > > -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; > > +get_builtin (IX86_BUILTIN_SQRTPD256); > > } > >break; > > I must be missing something? > Don't you have to return ix86_get_builtin(...) ? Of course you have to. Richard.
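Right -- for reference, the hunk Bernhard quotes needs to keep the returns, i.e. something along these lines (with the helper renamed to ix86_get_builtin as requested earlier; this is a sketch of the intended form, not the committed code):

      if (out_mode == DFmode && in_mode == DFmode)
	{
	  if (out_n == 2 && in_n == 2)
	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD);
	  else if (out_n == 4 && in_n == 4)
	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD256);
	}
      break;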
Re: [RFA][PATCH][PR tree-optimization/45685]
On Wed, Dec 11, 2013 at 7:54 AM, Jeff Law wrote: > > So for this source, compiled for x86_64 with -O3: > > typedef unsigned long int uint64_t; > typedef long int int64_t; > int summation_helper_1(int64_t* products, uint64_t count) > { > int s = 0; > uint64_t i; > for(i=0; i { > int64_t val = (products[i]>0) ? 1 : -1; > products[i] *= val; > if(products[i] != i) > val = -val; > products[i] = val; > s += val; > } > return s; > } > > > int summation_helper_2(int64_t* products, uint64_t count) > { > int s = 0; > uint64_t i; > for(i=0; i { > int val = (products[i]>0) ? 1 : -1; > products[i] *= val; > if(products[i] != i) > val = -val; > products[i] = val; > s += val; > } > return s; > } > > > The loops we generate are pretty bad and have regressed relative to older > versions of GCC. > > For the first loop, we have the following .optimized output for the loop: > > : > # s_28 = PHI > # i_27 = PHI > _11 = MEM[base: products_9(D), index: i_27, step: 8, offset: 0B]; > val_4 = _11 > 0 ? 1 : -1; > prephitmp_38 = _11 > 0 ? -1 : 1; > prephitmp_39 = _11 > 0 ? 4294967295 : 1; > prephitmp_41 = _11 > 0 ? 1 : 4294967295; > _12 = val_4 * _11; > _14 = (long unsigned int) _12; > val_3 = _14 != i_27 ? prephitmp_38 : val_4; > prephitmp_44 = _14 != i_27 ? prephitmp_39 : prephitmp_41; > MEM[base: products_9(D), index: i_27, step: 8, offset: 0B] = val_3; > s.1_18 = (unsigned int) s_28; > _19 = prephitmp_44 + s.1_18; > s_20 = (int) _19; > i_21 = i_27 + 1; > if (i_21 != count_7(D)) > goto ; > else > goto ; > > : > goto ; > > > Note the series of COND_EXPRs. A couple are just conditional negation which > can be implemented with a straight-line code sequence. Using that > straight-line sequence results in: > > : > # s_31 = PHI > # i_32 = PHI > _11 = MEM[base: products_9(D), index: i_32, step: 8, offset: 0B]; > val_4 = _11 > 0 ? 1 : -1; > _12 = val_4 * _11; > _14 = (long unsigned int) _12; > _24 = _14 != i_32; > _25 = (int64_t) _24; > _29 = -_25; > _28 = _29 ^ val_4; > _27 = _28 + _25; > MEM[base: products_9(D), index: i_32, step: 8, offset: 0B] = _27; > _17 = (unsigned int) _27; > s.1_18 = (unsigned int) s_31; > _19 = _17 + s.1_18; > s_20 = (int) _19; > i_21 = i_32 + 1; > if (i_21 != count_7(D)) > goto ; > else > goto ; > > : > goto ; > > Which *appears* worse. However, that code can much more easily be handled > by the RTL optimizers.When we look at what the trunk generates at the > assembly level we have: > > .L3: > movq(%rdi,%rcx,8), %rdx > testq %rdx, %rdx > setg%r8b > movzbl %r8b, %r10d > movzbl %r8b, %r8d > leaq-1(%r10,%r10), %r10 > leal-1(%r8,%r8), %r8d > movq%r10, %r11 > imulq %rdx, %r11 > testq %rdx, %rdx > setle %dl > movzbl %dl, %r9d > movzbl %dl, %edx > leaq-1(%r9,%r9), %r9 > leal-1(%rdx,%rdx), %edx > cmpq%rcx, %r11 > cmove %r10, %r9 > cmove %r8d, %edx > movq%r9, (%rdi,%rcx,8) > addq$1, %rcx > addl%edx, %eax > cmpq%rsi, %rcx > jne .L3 > (Ick) > > With the conditional negation patch that turns into: > > L3: > movq(%rdi,%rcx,8), %r8 > xorl%edx, %edx > testq %r8, %r8 > setg%dl > leaq-1(%rdx,%rdx), %rdx > imulq %rdx, %r8 > cmpq%rcx, %r8 > setne %r8b > movzbl %r8b, %r8d > movq%r8, %r9 > negq%r9 > xorq%r9, %rdx > addq%r8, %rdx > movq%rdx, (%rdi,%rcx,8) > addq$1, %rcx > addl%edx, %eax > cmpq%rsi, %rcx > jne .L3 > > No branches within the loop, no conditional moves either. In all it's 5 > instructions shorter. > > > The second loop shows similar effects, though they're not as dramatic. 
> > Before: > : > # s_27 = PHI > # i_26 = PHI > _11 = MEM[base: products_9(D), index: i_26, step: 8, offset: 0B]; > val_4 = _11 > 0 ? 1 : -1; > prephitmp_32 = _11 > 0 ? 1 : -1; > prephitmp_33 = _11 > 0 ? -1 : 1; > prephitmp_34 = _11 > 0 ? -1 : 1; > _13 = _11 * prephitmp_32; > _15 = (long unsigned int) _13; > val_3 = _15 != i_26 ? prephitmp_33 : val_4; > prephitmp_36 = _15 != i_26 ? prephitmp_34 : prephitmp_32; > MEM[base: products_9(D), index: i_26, step: 8, offset: 0B] = prephitmp_36; > s_19 = val_3 + s_27; > i_20 = i_26 + 1; > if (i_20 != count_7(D)) > goto ; > else > goto ; > > : > goto ; > > > Which results in the following assembly: > >
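The straight-line sequence Jeff describes replaces a conditional negation with branch-free arithmetic. As a stand-alone illustration of the identity the expanded gimple relies on (plain C, not GCC-internal code):

#include <assert.h>
#include <stdint.h>

/* Branchless form of "if (cond) val = -val;": with c in {0,1},
   (val ^ -c) + c is val when c == 0 and -val when c == 1, because
   two's complement negation is bitwise-not plus one.  */
static int64_t
cond_neg (int64_t val, int cond)
{
  int64_t c = (int64_t) (cond != 0);
  return (val ^ -c) + c;
}

int
main (void)
{
  assert (cond_neg (5, 0) == 5);
  assert (cond_neg (5, 1) == -5);
  assert (cond_neg (-7, 1) == 7);
  return 0;
}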
Re: [PATCH PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution
On Wed, Dec 11, 2013 at 9:50 AM, Jakub Jelinek wrote: > On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote: >> Thank both of you for being patient on this patch. >> I went through the documentation quickly and realized that I have to >> modify pointer-map structure to make it recognized by GC (maybe more > > Changing pointer_map at this point is IMHO not appropriate. > >> work suggested by Jakub). It seems I shouldn't include that task in >> this patch at this stage 3, I am thinking just call free_affine* >> function in the place it is created for SCEV. Of course, that makes >> it more expensive. > > Perhaps you could call free_affine_expand_cache say from > analyze_scalar_evolution, that is the innermost enclosing exported > API that indirectly calls your new code, but it seems it is also called > recursively from analyze_scalar_evolution_1 or functions it calls. > So maybe you'd need to rename analyze_scalar_evolution to > analyze_scalar_evolution_2 and adjust some calls to analyze_scalar_evolution > to the latter in tree-scalar-evolution.c, and add analyze_scalar_evolution > wrapper that calls analyze_scalar_evolution_2 and free_affine_expand_cache. > Whether this would work depends on detailed analysis of the > tree-scalar-evolution.c callgraph, there are tons of functions, most of > them static, and the question is if there is a single non-static (or at most > a few of them) function that cover all calls to your new code in the > static functions it (indirectly) calls, i.e. if there isn't any other > external entry point that might call your new code without doing > free_affine_expand_cache afterwards. > > If you can find that, I'd say it would be the safest thing to do. > But let's see what say Richard thinks about it too. I think keeping the SCEV cache live over multiple passes is fragile (we do re-use SSA names and passes do not appropriately call scev_reset[_htab] in all cases). With fixing that the easiest approach is to associate a affine-expand cache with the scev info. Without that there isn't a very good solution other than discarding the cache after each expansion at the point you use it. The SCEV cache should effectively make most of the affine expansion caching moot (but you can try instrumenting that to verify it). That is, I suggest to try freeing the cache right after the second tree_to_aff_combination_expand. Btw, + /* Try harder to check if they are equal. */ + tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map); + tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map); + aff_combination_scale (&aff2, double_int_minus_one); + aff_combination_add (&aff1, &aff2); + left = fold_convert (type, aff_combination_to_tree (&aff1)); + + /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP + if "left" equals to "init + right". */ + if (operand_equal_p (left, integer_zero_node, 0)) you don't need aff_combination_to_tree, simply check aff1.n == 0 && aff1.offset.is_zero () (you may want to add aff_combination_zero_p or even aff_combination_equal_p) Thanks, Richard. > Jakub
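A possible shape for the helper suggested above, matching the aff1.n == 0 && aff1.offset.is_zero () check; this is only a sketch against the pre-wide-int aff_tree layout, and where it would live (tree-affine.h, presumably) is left open.

/* Return true if AFF is a constant zero affine combination, i.e. it has
   no elements and a zero offset.  */
static inline bool
aff_combination_zero_p (aff_tree *aff)
{
  if (!aff)
    return false;
  return aff->n == 0 && aff->offset.is_zero ();
}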
New tsan tests.
Hi all, I've added new tests for tsan from LLVM. Tested on x86_64. Ok to commit? -Maxim diff --git a/gcc/testsuite/c-c++-common/tsan/free_race2.c b/gcc/testsuite/c-c++-common/tsan/free_race2.c new file mode 100644 index 000..3c15d2d --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/free_race2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include + +void __attribute__((noinline)) foo(int *mem) { + free(mem); +} + +void __attribute__((noinline)) bar(int *mem) { + mem[0] = 42; +} + +int main() { + int *mem = (int*)malloc(100); + foo(mem); + bar(mem); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: heap-use-after-free.*(\n|\r\n|\r)" } */ +/* { dg-output " Write of size 4.* by main thread:(\n|\r\n|\r)" } */ +/* { dg-output "#0 bar.*" } */ +/* { dg-output "#1 main .*" } */ +/* { dg-output " Previous write of size 8 at .* by main thread:(\n|\r\n|\r)" } */ +/* { dg-output "#0 free .*" } */ +/* { dg-output "#\(1|2\) foo.*(\n|\r\n|\r)" } */ +/* { dg-output "#\(2|3\) main .*" } */ + diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_barrier2.c b/gcc/testsuite/c-c++-common/tsan/race_on_barrier2.c new file mode 100644 index 000..9576c67 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/race_on_barrier2.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include +#include +#include +#include + +pthread_barrier_t B; +int Global; + +void *Thread1(void *x) { + if (pthread_barrier_wait(&B) == PTHREAD_BARRIER_SERIAL_THREAD) +pthread_barrier_destroy(&B); + return NULL; +} + +void *Thread2(void *x) { + if (pthread_barrier_wait(&B) == PTHREAD_BARRIER_SERIAL_THREAD) +pthread_barrier_destroy(&B); + return NULL; +} + +int main() { + pthread_barrier_init(&B, 0, 2); + pthread_t t; + pthread_create(&t, NULL, Thread1, NULL); + Thread2(0); + pthread_join(t, NULL); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c b/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c new file mode 100644 index 000..f112d09 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include +#include +#include +#include + +pthread_mutex_t Mtx; +int Global; + +void *Thread1(void *x) { + pthread_mutex_init(&Mtx, 0); + pthread_mutex_lock(&Mtx); + Global = 42; + pthread_mutex_unlock(&Mtx); + return NULL; +} + +void *Thread2(void *x) { + sleep(1); + pthread_mutex_lock(&Mtx); + Global = 43; + pthread_mutex_unlock(&Mtx); + return NULL; +} + +int main() { + pthread_t t[2]; + pthread_create(&t[0], NULL, Thread1, NULL); + pthread_create(&t[1], NULL, Thread2, NULL); + pthread_join(t[0], NULL); + pthread_join(t[1], NULL); + pthread_mutex_destroy(&Mtx); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ +/* { dg-output " Atomic read of size 1 at .* by thread T2:(\n|\r\n|\r)" } */ +/* { dg-output "#0 pthread_mutex_lock.*" } */ +/* { dg-output "#1 Thread2.* .*(race_on_mutex.c:22|\\?{2}:0) (.*)" } */ +/* { dg-output " Previous write of size 1 at .* by thread T1:(\n|\r\n|\r)" } */ +/* { dg-output "#0 pthread_mutex_init .* (.)*" } */ +/* { dg-output "#1 Thread1.* .*(race_on_mutex.c:13|\\?{2}:0) .*" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_mutex2.c b/gcc/testsuite/c-c++-common/tsan/race_on_mutex2.c new file mode 100644 index 000..d8a6980 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/race_on_mutex2.c @@ -0,0 +1,26 @@ +/* { dg-do run } */ +/* { 
dg-shouldfail "tsan" } */ + +#include +#include +#include +#include + +void *Thread(void *x) { + pthread_mutex_lock((pthread_mutex_t*)x); + pthread_mutex_unlock((pthread_mutex_t*)x); + return 0; +} + +int main() { + pthread_mutex_t Mtx; + pthread_mutex_init(&Mtx, 0); + pthread_t t; + pthread_create(&t, 0, Thread, &Mtx); + sleep(1); + pthread_mutex_destroy(&Mtx); + pthread_join(t, 0); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/simple_race.c b/gcc/testsuite/c-c++-common/tsan/simple_race.c new file mode 100644 index 000..24b88e8 --- /dev/null +++ b/gcc/testsuite/c-c++-common/tsan/simple_race.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-shouldfail "tsan" } */ + +#include +#include + +int Global; + +void *Thread1(void *x) { + Global = 42; + return NULL; +} + +void *Thread2(void *x) { + Global = 43; + return NULL; +} + +int main() { + pthread_t t[2]; + pthread_create(&t[0], NULL, Thread1, NULL); + pthread_create(&t[1], NULL, Thread2, NULL); + pthread_join(t[0], NULL); + pthread_join(t[1], NULL); + return 0; +} + +/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ diff --git a/gcc/testsuite/c-c++-common/tsan/simple_stack.c b/gcc/testsuite/c-c++-common/tsan/simple_stack.c new file
Re: New tsan tests.
On Wed, Dec 11, 2013 at 02:12:35PM +0400, Maxim Ostapenko wrote: > I've added new tests for tsan from LLVM. > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/aligned_vs_unaligned_race.C > @@ -0,0 +1,31 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > + > +uint64_t Global[2]; > + > +void *Thread1(void *x) { > + Global[1]++; > + return NULL; > +} > + > +void *Thread2(void *x) { > + char *p1 = reinterpret_cast(&Global[0]); > + uint64_t *p4 = reinterpret_cast(p1 + 1); This test would fail on strict alignment targets. Right now tsan is supported only on x86_64-linux, so this isn't fatal right now, but something to consider for the future. Why is the test C++ only though? Just replace the reinterpret_cast stuff with normal C cast? > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/benign_race.C > @@ -0,0 +1,40 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > + > +int Global; > +int WTFGlobal; Again, why C++ only? Just guard the extern "C" { and } with #ifdef __cplusplus. > + > +extern "C" { > +void AnnotateBenignRaceSized(const char *f, int l, > + void *mem, unsigned int size, const char *desc); > +void WTFAnnotateBenignRaceSized(const char *f, int l, > +void *mem, unsigned int size, > +const char *desc); > +} > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/default_options.C > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-shouldfail "tsan" } */ > + > +#include > +#include > + > +extern "C" const char *__tsan_default_options() { > + return "report_bugs=0"; Similarly. > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/fd_close_norace.C > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +void *Thread1(void *x) { > + int f = open("/dev/random", O_RDONLY); > + close(f); > + return NULL; > +} > + > +void *Thread2(void *x) { > + sleep(1); > + int f = open("/dev/random", O_RDONLY); > + close(f); > + return NULL; > +} > + > +int main() { > + pthread_t t[2]; > + pthread_create(&t[0], NULL, Thread1, NULL); > + pthread_create(&t[1], NULL, Thread2, NULL); > + pthread_join(t[0], NULL); > + pthread_join(t[1], NULL); > + printf("OK\n"); > +} Ditto. Is the only non-C thing here the missing return 0; from main? > + > +/* { dg-prune-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ > diff --git a/gcc/testsuite/g++.dg/tsan/fd_close_norace2.C > b/gcc/testsuite/g++.dg/tsan/fd_close_norace2.C > new file mode 100644 > index 000..f2d394c > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tsan/fd_close_norace2.C Likewise. Jakub
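The guard suggested above, for making these tests compile as both C and C++, would look roughly like this (sketched against the declarations quoted from the benign_race test):

/* Allow the file to live in c-c++-common: wrap only the extern "C"
   bracketing, keep the declarations themselves shared.  */
#ifdef __cplusplus
extern "C" {
#endif
void AnnotateBenignRaceSized (const char *f, int l,
                              void *mem, unsigned int size,
                              const char *desc);
void WTFAnnotateBenignRaceSized (const char *f, int l,
                                 void *mem, unsigned int size,
                                 const char *desc);
#ifdef __cplusplus
}
#endif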
RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
Hi Uros! Accommodated the changes that you mentioned. Completed the bootstrap testing too. Regards Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Wednesday, December 04, 2013 3:17 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org; Richard Guenther (richard.guent...@gmail.com) Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh wrote: > Attached is the revised patch. > The target independent part has been already approved and added. > > This revision of the patch adds a x86 tune definition and checks it while > deciding the unroll factor. > > Accommodated the comments given by you except one. > >> *x will never be null for active insns. > Since every rtx in the insn is checked for memory references, the NULL_RTX > check is required. Yes you are correct. for_each_rtx also passes NULL_RTX, I was distracted by "There are no sub-expressions." comment. +if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1) Do you need to check for INSN_CODE here? IIRC, checking for NONDEBUG_INSN_P is enough. +for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count); +} + free (bbs); + + if (mem_count <=32) +return 32/mem_count; Ouch... mem_count can be zero. Is there a reason to change this part from previous patch? Uros. unroll-adjust.patch Description: unroll-adjust.patch
Re: patch for elimination to SP when it is changed in RTL (PR57293)
Hi Vladimir, I've some regressions on ARM after this SP elimination patch, and they are execution failures. Here is the list: g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus gcc.c-torture/execute/va-arg-22.c -O2 gcc.dg/atomic/c11-atomic-exec-5.c -O0 gfortran.dg/direct_io_12.f90 -O[23] gfortran.dg/elemental_dependency_1.f90 -O2 gfortran.dg/matmul_2.f90 -O2 gfortran.dg/matmul_6.f90 -O2 gfortran.dg/mvbits_7.f90 -O3 gfortran.dg/unlimited_polymorphic_1.f03 -O3 I reduced and looked at var-arg-22.c and the issue is that in lra_eliminate_regs_1 (called by get_equiv_with_elimination) we transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What we try to do here is to change the pseudo 195 of the insn 118 below : (insn 118 114 112 8 (set (reg:DI 195) (unspec:DI [ (mem:DI (plus:SI (reg/f:SI 215) (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 + 64B]+8 S8 A8]) ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) (const_int 8 [0x8])) [7 a35+8 S8 A32]) (nil))) with its equivalent (x arg of lra_eliminate_regs_1): (mem/c:DI (plus:SI (reg/f:SI 102 sfp) (const_int 76 [0x4c])) [7 a35+8 S8 A32]) lra_eliminate_regs_1 is called with full_p = true (it is not really clear for what it means), but in the PLUS switch case, we have offset = 0xb (given by ep->offset) and as lra_get_insn_recog_data (insn)->sp_offset value is 0, we will indeed add 0xb to the original 0x4c offset. So, here I don't get if it is the sp_offset value of the lra_insn_recog_data element which is not well updated or if lra_ eliminate_regs_1 has to be called with update_p and not full_p (which fixed the value in that particular case). Is it more obvious for you ? Thanks Yvan On 3 December 2013 16:39, Vladimir Makarov wrote: > On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote: >> >> On 2 December 2013 23:44, Vladimir Makarov wrote: >> >>> If somebody with the rights approves, I can commit it tomorrow. >>> >>> 2013-12-02 Vladimir Makarov >>> >>> * config/aarch64/aarch64.c (aarch64_frame_pointer_required): >>> Check >>> LR_REGNUM. >>> (aarch64_can_eliminate): Don't check elimination source when >>> frame_pointer_requred is false. >>> >> >> >> This is fine with me, go ahead and commit it. Thanks /Marcus >> > Committed as rev. 205637 with changelog fix of a typo found by Jeff. >
RE: [REPOST] Invalid Code when reading from unaligned zero-sized array
Hi, On Tue, 10 Dec 2013 16:14:43, Richard Biener wrote: > > On Tue, Dec 10, 2013 at 4:02 PM, Richard Biener > wrote: >> On Tue, Dec 10, 2013 at 11:53 AM, Eric Botcazou >> wrote: What we support is out of bounds accesses for heap vars if the var's type has flexible array member or something we treat similarly and there is the possibility that there could be payload after the heap var that could be accessed from the flexible array members or similar arrays. >>> >>> My question was about the above similar arrays, i.e. whether we consider all >>> trailing arrays in structures as flexible-like or not. No strong opinion. >> >> Yes we do, even for struct { struct { int a; char a[1] } }; (note the not >> really >> "trailing" as there is padding after the trailing array). We do take >> size limitations from a DECL (if we see one) into account to limit the >> effect of this trailing-array-supporting, so it effectively only applies to >> indirect accesses (and the padding example above, you can use the whole >> padding if DECL_SIZE allows that). >> So, I don't see what is the big deal with BLKmode, because all the cases which actually could have flexible array member extra payloads (or similar) must necessarily live in memory, if it is the compiler that decides whether to put it into memory or keep in registers etc., then it can't be heap allocated. >>> >>> The invariant is that types for which objects can effectively have variable >>> size must have BLKmode, otherwise you need to add very ugly code in the RTL >>> expander to mask the lie. >> >> I wonder if we can make the expander more rely on the DECLs mode >> and optimize only the DECLs mode (if we can constrain its size via >> DECL_SIZE) to non-BLKmode instead of doing that for the TYPEs mode. >> Yes, you'd have DECL_MODE != TYPE_MODE that way. >> >> Or rather I wonder if the expander doesn't already work that way >> (looks at DECL_MODE). > > To speak in patches (completely untested) - sth like the following ontop > of making more types have BLKmode: > > Index: gcc/stor-layout.c > === > --- gcc/stor-layout.c (revision 205857) > +++ gcc/stor-layout.c (working copy) > @@ -569,8 +569,17 @@ layout_decl (tree decl, unsigned int kno > bitsize_unit_node)); > > if (code != FIELD_DECL) > - /* For non-fields, update the alignment from the type. */ > - do_type_align (type, decl); > + { > + /* For non-fields, update the alignment from the type. */ > + do_type_align (type, decl); > + /* Optimize the mode of the decl. > + ??? Ensure choosen mode alignment fits decl alignment. */ > + if (DECL_MODE (decl) == BLKmode > + && DECL_SIZE (decl) > + && compare_tree_int (DECL_SIZE (t), MAX_FIXED_MODE_SIZE) <= 0) > + DECL_MODE (decl) > + = mode_for_size (TREE_INT_CST_LOW (DECL_SIZE (decl)), MODE_INT, 1); > + } > else > /* For fields, it's a bit more complicated... */ > { > > >> Richard. >> >>> -- >>> Eric Botcazou Whatever the fix will be, it should contain at least the two test cases from my patch, and, maybe - if that is possible - it would be nice to test it with with SLOW_UNALIGNED_ACCESS defined like this: config/i386/i386.h: #define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) 1 Bernd.
Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Dec 11, 2013 at 11:27 AM, Gopalasubramanian, Ganesh wrote: > Accommodated the changes that you mentioned. > Completed the bootstrap testing too. Please provide updated ChangeLog. The function comment is missing. Maybe you should also describe magic number 32 here? +static unsigned +ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop) + if (!ix86_tune_features [X86_TUNE_ADJUST_UNROLL]) + return nunroll; Please add #define TARGET_ADJUST_UNROLL \ ix86_tune_features[X86_TUNE_ADJUST_UNROLL] to i386.h and use definition here. Otherwise, the patch looks OK. BTW: Please avoid top-posting, see e.g. [1] for reasons... [1] http://gcc.gnu.org/ml/gcc/2008-08/msg00133.html Uros.
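Spelled out, the requested macro would look like this (indentation and exact placement in i386.h among the other TARGET_* tune macros are guesses):

/* i386.h  */
#define TARGET_ADJUST_UNROLL \
  ix86_tune_features[X86_TUNE_ADJUST_UNROLL]

/* ... and the check in ix86_loop_unroll_adjust then reads:  */
  if (!TARGET_ADJUST_UNROLL)
    return nunroll;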
Re: [PATCH] Fix PR 59390
On Wed, Dec 11, 2013 at 10:15 AM, Bernhard Reutner-Fischer wrote: >>> Patch updated with two more tests to check if the vfmadd insn is being >>> produced when possible. >>> >>> Thanks >>> Sri >>> >>> On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam >>> wrote: Hi, I have attached a patch to fix http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. Here is the problem. GCC adds target-specific builtins on demand. The FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via this declaration: void fun() __attribute__((target("fma"))); Specifically, the builtin __builtin_ia32_vfmaddpd gets added when ix86_add_new_builtins is called from ix86_valid_target_attribute_tree when processing this target attribute. Now, when the vectorizer is processing the builtin "__builtin_fma" in function other_fn(), it checks to see if this function is vectorizable and calls ix86_builtin_vectorized_function in i386.c. That returns the builtin stored here: case BUILT_IN_FMA: if (out_mode == DFmode && in_mode == DFmode) { if (out_n == 2 && in_n == 2) return ix86_builtins[IX86_BUILTIN_VFMADDPD]; ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE had the builtin not been added by the previous target attribute. That is why the code works if we remove the previous declaration. The fix is to not just return the builtin but to also check if the current function's isa allows the use of the builtin. For instance, this patch would solve the problem: @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, tre if (out_mode == DFmode && in_mode == DFmode) { if (out_n == 2 && in_n == 2) -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; +{ + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa + & global_options.x_ix86_isa_flags) +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; + else + return NULL_TREE; +} but there are many instances of this usage in ix86_builtin_vectorized_function. This patch covers all the cases. >> >>> PR target/59390 >>> * gcc.target/i386/pr59390.c: New test. >>> * gcc.target/i386/pr59390_1.c: New test. >>> * gcc.target/i386/pr59390_2.c: New test. >>> * config/i386/i386.c (get_builtin): New function. >>> (ix86_builtin_vectorized_function): Replace all instances of >>> ix86_builtins[...] with get_builtin(...). >>> (ix86_builtin_reciprocal): Ditto. >> >> OK, with a couple of nits: >> >> +static tree get_builtin (enum ix86_builtins code) >> >> Please name this function ix86_get_builtin. >> >> + if (current_function_decl) >> +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); >> + if (target_tree) >> +opts = TREE_TARGET_OPTION (target_tree); >> + else >> +opts = TREE_TARGET_OPTION (target_option_default_node); >> + >> >> opts = TREE_TARGET_OPTION (target_tree ? target_tree : >> target_option_default_node); > > Just curious: >> +static tree get_builtin (enum ix86_builtins code) >> +{ > [] >> +> > [] >> @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>if (out_mode == DFmode && in_mode == DFmode) >> { >> if (out_n == 2 && in_n == 2) >> -return ix86_builtins[IX86_BUILTIN_SQRTPD]; >> +get_builtin (IX86_BUILTIN_SQRTPD); >> else if (out_n == 4 && in_n == 4) >> -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; >> +get_builtin (IX86_BUILTIN_SQRTPD256); >> } >>break; > > I must be missing something? > Don't you have to return ix86_get_builtin(...) ? Huh, I didn't even notice this mistake. Uros.
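Putting the review comments together, the helper would end up looking roughly like this. This is only a sketch: the exact ISA test and the cl_target_option field name are assumptions based on the snippets quoted above, not the committed code:

static tree
ix86_get_builtin (enum ix86_builtins code)
{
  tree target_tree = NULL_TREE;
  struct cl_target_option *opts;

  /* Use the target options of the current function, if any, otherwise
     fall back to the default target options.  */
  if (current_function_decl)
    target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);

  opts = TREE_TARGET_OPTION (target_tree ? target_tree
                             : target_option_default_node);

  /* Only hand out the builtin if the ISA it requires is enabled in this
     context (PR 59390); callers fall back to NULL_TREE otherwise.  */
  if (ix86_builtins_isa[(int) code].isa & opts->x_ix86_isa_flags)
    return ix86_builtins[(int) code];

  return NULL_TREE;
}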
Re: New tsan tests.
> This test would fail on strict alignment targets.

Ok, we'll throw in a dg-skip-if.

> Why is the test C++ only though?
> Again, why C++ only?
> Ditto.
> Likewise.

I think the main reason is having the same versions of the code as Clang. I had the impression that minimizing the changes would be preferred, and C vs. C++ should not make much difference for TSan anyway.

We can convert the tests if necessary though.

-Y
Re: New tsan tests.
On Wed, Dec 11, 2013 at 03:05:22PM +0400, Yury Gribov wrote: > > This test would fail on strict alignment targets. > > Ok, we'll throw in a dg-skip-if. > > > Why is the test C++ only though? > > Again, why C++ only? > > Ditto. > > Likewise. > > I think main reason is having same versions of code with Clang. I > had an impression that minimization of changes would be preferred > and C vs. C++ should not make much difference for TSan anyway. > > We can convert the tests if necessary though. I guess I can live with it as is, just was wondering why the tests were written so carelessly that when they don't really need to test any C++ features they are still written in C++. Jakub
Re: patch for elimination to SP when it is changed in RTL (PR57293)
Yvan, On Wed, Dec 11, 2013 at 10:35 AM, Yvan Roux wrote: > Hi Vladimir, > > I've some regressions on ARM after this SP elimination patch, and they > are execution failures. Here is the list: Pragmatically, I think it's time we turned LRA on by default now that we are in stage3 and that would help with getting more issues out of the auto-testers quicker than anything else. Given we are now well into stage3, we should make sure that the LRA support gets as much testing as it can get in the run-up to the release. Can you prepare a patch for this please ? regards Ramana > > g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus > gcc.c-torture/execute/va-arg-22.c -O2 > gcc.dg/atomic/c11-atomic-exec-5.c -O0 > gfortran.dg/direct_io_12.f90 -O[23] > gfortran.dg/elemental_dependency_1.f90 -O2 > gfortran.dg/matmul_2.f90 -O2 > gfortran.dg/matmul_6.f90 -O2 > gfortran.dg/mvbits_7.f90 -O3 > gfortran.dg/unlimited_polymorphic_1.f03 -O3 > > I reduced and looked at var-arg-22.c and the issue is that in > lra_eliminate_regs_1 (called by get_equiv_with_elimination) we > transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What > we try to do here is to change the pseudo 195 of the insn 118 below : > > (insn 118 114 112 8 (set (reg:DI 195) > (unspec:DI [ > (mem:DI (plus:SI (reg/f:SI 215) > (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 > + 64B]+8 S8 A8]) > ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} > (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) > (const_int 8 [0x8])) [7 a35+8 S8 A32]) > (nil))) > > with its equivalent (x arg of lra_eliminate_regs_1): > > (mem/c:DI (plus:SI (reg/f:SI 102 sfp) > (const_int 76 [0x4c])) [7 a35+8 S8 A32]) > > lra_eliminate_regs_1 is called with full_p = true (it is not really > clear for what it means), but in the PLUS switch case, we have offset > = 0xb (given by ep->offset) and as lra_get_insn_recog_data > (insn)->sp_offset value is 0, we will indeed add 0xb to the original > 0x4c offset. > > So, here I don't get if it is the sp_offset value of the > lra_insn_recog_data element which is not well updated or if lra_ > eliminate_regs_1 has to be called with update_p and not full_p (which > fixed the value in that particular case). Is it more obvious for you > ? > > Thanks > Yvan > > > On 3 December 2013 16:39, Vladimir Makarov wrote: >> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote: >>> >>> On 2 December 2013 23:44, Vladimir Makarov wrote: >>> If somebody with the rights approves, I can commit it tomorrow. 2013-12-02 Vladimir Makarov * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check LR_REGNUM. (aarch64_can_eliminate): Don't check elimination source when frame_pointer_requred is false. >>> >>> >>> This is fine with me, go ahead and commit it. Thanks /Marcus >>> >> Committed as rev. 205637 with changelog fix of a typo found by Jeff. >>
Re: Two build != host fixes
Hi,

I'm having problems with that patch.

I start on x86_64-linux-gnu and want to get GCC running on arm-linux-gnueabihf. I grabbed the system headers and libraries from the target and put them in the prefix path.

In the first step I do

../gcc-4.9-20131208/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

This GCC runs on the PC and generates arm-linux-gnueabihf executables.

Then I try this

../gcc-4.9-20131208/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

But it fails because auto-build.h contains nonsense. That is probably because almost every check has a fatal error: #include not found.

I personally prefer to have gmp, mpfr and mpc in-tree (using contrib/download_prerequisites).

I experimented a bit, and at least the attached patch improves the situation for me.

Maybe I never had any problems with GMP before because the in-tree configuration of GMP uses -DNO_ASM?

Regards
Bernd.

patch-configure.diff
Description: Binary data
Re: New tsan tests.
> I guess I can live with it as is, just was wondering why the tests > were written so carelessly that when they don't really need to test > any C++ features they are still written in C++. Adding Kostya to comment. -Y
Re: questions about COND_EXEC and SMS
On Thu, Dec 5, 2013 at 12:11 PM, dxq wrote:
> hi all,
>
> * We found that COND_EXEC is better than IF_THEN_ELSE for expressing
> conditional move insns, because in sched an IF_THEN_ELSE insn has a
> dependence on itself, and COND_EXEC has not.
> * Besides, IF_THEN_ELSE is not good for SMS. Some backends (frv) expand a
> conditional move as an IF_THEN_ELSE pattern before reload and split
> IF_THEN_ELSE into COND_EXEC at the split2 pass, which is a post-reload
> pass. However, in the SMS pass (pre-reload), we can't get accurate
> information about IF_THEN_ELSE insns.
> * However, as far as I know, COND_EXEC is not supported in pre-reload
> passes.
>
> So I'm asking for suggestions on supporting COND_EXEC in pre-reload
> passes; for me, maybe from split1 to the reload pass is good enough. What
> work needs to be done to support that, and has anyone done any work on
> that?

Supporting COND_EXEC before register allocation (RA) is in itself not a big
challenge. The problem is handling them *during* register allocation: how to
do spilling/reloading of registers used in a predicated instruction, and how
to spill/reload the predicate condition (condition code or BImode register).

If I had to add support for this myself, I'd do it as follows:

1. Make your port work with LRA instead of reload.

2. Make pre-RA passes handle COND_EXEC with the predicate (COND_EXEC_COND)
explicitly assigned to a hard register and only predicated instructions
(COND_EXEC_EXPR) that don't need reloads (e.g. only reg->reg and imm->reg
moves).

3. Try to get the simple test cases from (1) to get through IRA/LRA. I
wouldn't try doing it with reload. With LRA it's likely simpler to add
support for this because you can spill to pseudo-registers.

4. Gradually relax the constraints on the kind of COND_EXEC instructions you
accept before reload, and play whack-a-mole on all the bugs that undoubtedly
will pop up.

Supporting COND_EXEC before reload has been discussed before on this mailing
list, but I don't think anyone has ever started a serious effort to implement
it. There are so few targets that use COND_EXEC, and the ones that do (ia64)
are falling out of favor. I expect (or at least: hope) that LRA makes it a
lot easier to implement.

Ciao!
Steven
Re: New tsan tests.
What's wrong with C++? :) Upstream we have 16 .c tests and 106 .cc tests in compiler-rt/lib/tsan/lit_tests. We typically prefer .cc because imo C++ is a better language (even when using what looks like the C subset). But we need some .c tests to make sure that tsan still works w/o C++ run-time. --kcc On Wed, Dec 11, 2013 at 3:12 PM, Yury Gribov wrote: >> I guess I can live with it as is, just was wondering why the tests >> were written so carelessly that when they don't really need to test >> any C++ features they are still written in C++. > > Adding Kostya to comment. > > -Y
Re: New tsan tests.
On Wed, Dec 11, 2013 at 03:21:32PM +0400, Konstantin Serebryany wrote: > What's wrong with C++? :) > Upstream we have 16 .c tests and 106 .cc tests in > compiler-rt/lib/tsan/lit_tests. > We typically prefer .cc because imo C++ is a better language (even That is a matter of opinion. > when using what looks like the C subset). The question is why don't you limit to the subset of the two languages when it doesn't cost anything. As I've mentioned on the patch, some of the tests were C++ only (well, also C99) just because there wasn't return 0; in main, others just because they used reinterpret_cast instead of C cast where it didn't result in any advantage. Some just because they used new instead of malloc, though in this case I can understand that you might want to test how is operator new instrumented. Jakub
Re: patch for elimination to SP when it is changed in RTL (PR57293)
> Pragmatically, I think it's time we turned LRA on by default now that > we are in stage3 and that would help with getting more issues out of > the auto-testers quicker than anything else. Given we are now well > into stage3, we should make sure that the LRA support gets as much > testing as it can get in the run-up to the release. I agree. > Can you prepare a patch for this please ? I'll post the patch on the list. Thanks, Yvan
Re: New tsan tests.
On Wed, Dec 11, 2013 at 3:27 PM, Jakub Jelinek wrote: > On Wed, Dec 11, 2013 at 03:21:32PM +0400, Konstantin Serebryany wrote: >> What's wrong with C++? :) >> Upstream we have 16 .c tests and 106 .cc tests in >> compiler-rt/lib/tsan/lit_tests. >> We typically prefer .cc because imo C++ is a better language (even > > That is a matter of opinion. Of course! (I did say "imo") > >> when using what looks like the C subset). > > The question is why don't you limit to the subset of the two languages > when it doesn't cost anything. Mostly because of my (our) opinion above. :) GCC test suite may have more .c tests than .cc tests; I don't have an opinion here. We need some C++specific tests (e.g. with operator new) and we need at least one C test. --kcc > As I've mentioned on the patch, some of > the tests were C++ only (well, also C99) just because there wasn't return 0; > in main, others just because they used reinterpret_cast instead of C cast > where it didn't result in any advantage. Some just because they used > new instead of malloc, though in this case I can understand that you might > want to test how is operator new instrumented. > > Jakub
[PATCH, ARM, LRA] Switch on LRA on ARM.
Hi,

here is the patch to turn LRA on by default on ARM. There are still some regressions in the testsuite, as reported in the thread below, but as Ramana said in the same thread, we now need to find the remaining issues as fast as possible.

http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01074.html

Thanks
Yvan

2013-12-11  Yvan Roux

	* config/arm/arm.opt (mlra): Enable LRA by default.

Index: gcc/config/arm/arm.opt
===
--- gcc/config/arm/arm.opt	(revision 205885)
+++ gcc/config/arm/arm.opt	(working copy)
@@ -144,7 +144,7 @@
 Specify the name of the target floating point hardware/format
 
 mlra
-Target Report Var(arm_lra_flag) Init(0) Save
+Target Report Var(arm_lra_flag) Init(1) Save
 Use LRA instead of reload (transitional)
 
 mhard-float
Re: [PATCH, ARM, LRA] Switch on LRA on ARM.
On 11/12/13 11:47, Yvan Roux wrote: > Hi, > > here is the patch to turn LRA on by default on ARM, there is still > some regressions in the testsuite, as reported in the thread below, > but as Ramana said in the same thread, we now need to find the > remaining issue as fast as possible. > > http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01074.html > > Thanks > Yvan > > > 2013-12-11 Yvan Roux > > * config/arm/arm.opt (mlra): Enable LRA by default. > > > lra-on.diff > > > Index: gcc/config/arm/arm.opt > === > --- gcc/config/arm/arm.opt(revision 205885) > +++ gcc/config/arm/arm.opt(working copy) > @@ -144,7 +144,7 @@ > Specify the name of the target floating point hardware/format > > mlra > -Target Report Var(arm_lra_flag) Init(0) Save > +Target Report Var(arm_lra_flag) Init(1) Save > Use LRA instead of reload (transitional) > > mhard-float > OK. R.
Re: New tsan tests.
On Wed, Dec 11, 2013 at 03:35:37PM +0400, Konstantin Serebryany wrote: > >> when using what looks like the C subset). > > > > The question is why don't you limit to the subset of the two languages > > when it doesn't cost anything. > > Mostly because of my (our) opinion above. :) > GCC test suite may have more .c tests than .cc tests; I don't have an > opinion here. > We need some C++specific tests (e.g. with operator new) and we need at > least one C test. In the GCC testsuite, you can put tests that are written in the common subset of C and C++ that you want to test by both frontends into c-c++-common (they even can have different expected errors etc. through { target c } or { target c++ }), you can use #ifdef __cplusplus in the tests too. Anyway, let's keep the current tests as is, the patch is ok for trunk, and if in the future you wouldn't mind making more tests written in the common subset of C/C++, it would be certainly appreciated. Jakub
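As an illustration of the kind of shared test Jakub describes, something like the sketch below would compile with both front ends; the file placement under c-c++-common and the exact diagnostic wording are made up for the example, not taken from the tsan patch:

/* { dg-do compile } */

void
foo (void *p)
{
  /* Implicit void* -> int* conversion is valid C but an error in C++,
     so the diagnostic is only expected for the C++ front end.  */
  int *q = p;	/* { dg-error "invalid conversion" "" { target c++ } } */

#ifdef __cplusplus
  q = static_cast<int *> (p);	/* C++-only code can be guarded like this.  */
#endif
  (void) q;
}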
Re: [ARM] Fix register r3 wrongly used to save ip in nested APCS frame
> Revised patch attached, your testcase passes on arm-eabi with it. Does it > look OK to you? If so, I'll run a testing cycle on arm-vxworks and > arm-eabi. > > > * config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with > arguments to push onto the stack and no varargs, save ip into the last > stack slot if r3 isn't available on entry. No problems detected for the patch on arm-vxworks or arm-eabi. -- Eric Botcazou
Re: AARCH64 configure check for gas -mabi support
On 10/12/13 20:23, Kugan wrote: gcc/ +2013-12-11 Kugan Vivekanandarajah + * configure.ac: Add check for aarch64 assembler -mabi support. + * configure: Regenerate. + * config.in: Regenerate. + * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define. + (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC. + * config/aarch64/aarch64.h (aarch64_override_options): Issue error if + assembler does not support -mabi and option ilp32 is selected. + * doc/install.texi: Added note that building gcc 4.9 and after with pre + 2.24 binutils will not support -mabi=ilp32. + Kugan, Thanks for sorting this out. OK to commit. /Marcus
Re: Two build != host fixes
On Wed, Dec 11, 2013 at 12:10:04PM +0100, Bernd Edlinger wrote: > Hi, > > I'm having problems with that patch. Sorry to hear that. > I try to start at X86_64-linux-gnu, and I want to get the GCC running on > arm-linux-gnueabihf. > I grabbed system headers and libraries from the target and put it in the > prefix path. > > In the first step I do > > ../gcc-4.9-20131208/configure > --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 > --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran > --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 > --with-float=hard > > This GCC runs on PC and generates arm-linux-gnueabihf executables. > > Then I try this > > ../gcc-4.9-20131208/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross > --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf > --enable-languages=c,c++,fortran --with-arch=armv7-a --with-tune=cortex-a9 > --with-fpu=vfpv3-d16 --with-float=hard > > But It fails because auto-build.h contains nonsense. That is probably because > almost every check > has a fatal error #include not found. > > I personally prefer to have gmp, mpfr, mpc in-tree (using > contrib/download_prerequisites). > > I experimented a bit and at least this attached patch improves the situation > for me. > > Maybe I never had any problems with GMP before, because the in-tree > configuration of GMP does -DNO_ASM ? GMPINC really shouldn't be used to find build headers, since it is used to find host headers. See the top level Makefile.in. When gmp has been installed, using GMPINC means you pull in a whole lot of host headers for the build compiler. Which might work in rare cases, but it's a lot more likely to fail. Even with in-tree gmp, how do you get things like GMP_LIMB_BITS correct if your build machine is 64-bit and your host is 32-bit? (Perhaps there is some build magic that allows this to work, I'll investigate when I get back from vacation.) Incidentally, we've been using a couple of other patches for build != host that I haven't posted because I wasn't sure who authored them. It's possible the first one might help you. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index aad927c..7995e64 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -747,7 +747,8 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) # Native linker and preprocessor flags. For x-fragment overrides. BUILD_LDFLAGS=@BUILD_LDFLAGS@ -BUILD_CPPFLAGS=$(ALL_CPPFLAGS) +BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ + -I$(srcdir)/../include $(CPPINC) # Actual name to use when installing a native compiler. 
GCC_INSTALL_NAME := $(shell echo gcc|sed '$(program_transform_name)') diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in index 57f9009..e1d3ed6 100644 --- a/gcc/ada/gcc-interface/Make-lang.in +++ b/gcc/ada/gcc-interface/Make-lang.in @@ -625,7 +625,7 @@ ada.tags: force ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb -$(MKDIR) ada/doctools $(CP) $^ ada/doctools - cd ada/doctools && $(GNATMAKE) -q xgnatugn + cd ada/doctools && gnatmake -q xgnatugn # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in index 794d374..6b0d5e8 100644 --- a/gnattools/Makefile.in +++ b/gnattools/Makefile.in @@ -23,6 +23,7 @@ SHELL = @SHELL@ srcdir = @srcdir@ libdir = @libdir@ build = @build@ +host = @host@ target = @target@ prefix = @prefix@ INSTALL = @INSTALL@ @@ -31,6 +32,7 @@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ # Nonstandard autoconf-set variables. LN_S=@LN_S@ +host_alias=@host_alias@ target_noncanonical=@target_noncanonical@ # Variables for the user (or the top level) to override. @@ -183,7 +185,11 @@ regnattools: $(GCC_DIR)/stamp-gnatlib-rts # put the host RTS dir first in the PATH to hide the default runtime # files that are among the sources # FIXME: This should be done in configure. +ifeq ($(host), $(build)) RTS_DIR:=$(strip $(subst \,/,$(shell gnatls -v | grep adalib ))) +else +RTS_DIR:=$(strip $(subst \,/,$(shell $(host_alias)-gnatls -v | grep adalib ))) +endif gnattools-cross: $(GCC_DIR)/stamp-tools # gnattools1-re $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ -- Alan Modra Australia Development Lab, IBM
Re: AARCH64 configure check for gas -mabi support
Committed on Kugan's behalf as rev 205891. On 11 December 2013 13:27, Marcus Shawcroft wrote: > On 10/12/13 20:23, Kugan wrote: > >> gcc/ >> >> +2013-12-11 Kugan Vivekanandarajah >> + * configure.ac: Add check for aarch64 assembler -mabi support. >> + * configure: Regenerate. >> + * config.in: Regenerate. >> + * config/aarch64/aarch64-elf.h (ASM_MABI_SPEC): New define. >> + (ASM_SPEC): Update to substitute -mabi with ASM_MABI_SPEC. >> + * config/aarch64/aarch64.h (aarch64_override_options): Issue >> error if >> + assembler does not support -mabi and option ilp32 is selected. >> + * doc/install.texi: Added note that building gcc 4.9 and after >> with pre >> + 2.24 binutils will not support -mabi=ilp32. >> + >> > > Kugan, Thanks for sorting this out. OK to commit. > > /Marcus >
[Patch, Fortran, F03] PR 58916: Allocation of scalar with array source not rejected
Hi all, attached is a small patch which fixes accepts-invalid and ICE-on-invalid problems on allocation with source. Regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2013-12-11 Janus Weil PR fortran/58916 * resolve.c (conformable_arrays): Treat scalar 'e2'. (resolve_allocate_expr): Check rank also for unlimited-polymorphic variables. 2013-12-11 Janus Weil PR fortran/58916 * gfortran.dg/allocate_with_source_4.f90: New. Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (revision 205872) +++ gcc/fortran/resolve.c (working copy) @@ -6597,7 +6597,8 @@ conformable_arrays (gfc_expr *e1, gfc_expr *e2) for (tail = e2->ref; tail && tail->next; tail = tail->next); /* First compare rank. */ - if (tail && e1->rank != tail->u.ar.as->rank) + if ((tail && e1->rank != tail->u.ar.as->rank) + || (!tail && e1->rank != e2->rank)) { gfc_error ("Source-expr at %L must be scalar or have the " "same rank as the allocate-object at %L", @@ -6794,8 +6795,7 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code } /* Check F03:C632 and restriction following Note 6.18. */ - if (code->expr3->rank > 0 && !unlimited - && !conformable_arrays (code->expr3, e)) + if (code->expr3->rank > 0 && !conformable_arrays (code->expr3, e)) goto failure; /* Check F03:C633. */ ! { dg-do compile } ! ! PR 58916: [F03] Allocation of scalar with array source not rejected ! ! Contributed by Vladimir Fuka class(*), allocatable :: a1 real, allocatable :: a2 real b(1) allocate(a1, source=b) ! { dg-error "must be scalar or have the same rank" } allocate(a2, source=b) ! { dg-error "must be scalar or have the same rank" } end
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: > Richard Henderson writes: >> On 12/10/2013 10:44 AM, Richard Sandiford wrote: >>> Sorry, I don't understand. I never said it was invalid. I said >>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents >>> a single register. On a little-endian target, the offset cannot be >>> anything other than 0 in that case. >>> >>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for >>> something that is always invalid, regardless of the target. That kind >>> of situation should be rejected by target-independent code instead. >> >> But, we want to disable the subreg before we know whether or not (reg:V4SF X) >> will be allocated to a single hard register. That is something that we can't >> know in target-independent code before register allocation. > > I was thinking that if we've got a class, we've also got things like > CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. > But even in the padding cases an offset-based check in C_C_M_C could > be derived from other information. > > subreg_get_info handles padding with: > > nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); > if (GET_MODE_INNER (xmode) == VOIDmode) > xmode_unit = xmode; > else > xmode_unit = GET_MODE_INNER (xmode); > gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); > gcc_assert (nregs_xmode > == (GET_MODE_NUNITS (xmode) > * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); > gcc_assert (hard_regno_nregs[xregno][xmode] > == (hard_regno_nregs[xregno][xmode_unit] > * GET_MODE_NUNITS (xmode))); > > /* You can only ask for a SUBREG of a value with holes in the middle > if you don't cross the holes. (Such a SUBREG should be done by > picking a different register class, or doing it in memory if > necessary.) An example of a value with holes is XCmode on 32-bit > x86 with -m128bit-long-double; it's represented in 6 32-bit > registers, > 3 for each part, but in memory it's two 128-bit parts. > Padding is assumed to be at the end (not necessarily the 'high part') > of each unit. */ > if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >< GET_MODE_NUNITS (xmode)) > && (offset / GET_MODE_SIZE (xmode_unit) > != ((offset + GET_MODE_SIZE (ymode) - 1) > / GET_MODE_SIZE (xmode_unit > { > info->representable_p = false; > rknown = true; > } > > and I wouldn't really want to force targets to individually reproduce > that kind of logic at the class level. If the worst comes to the worst > we could cache the difficult cases. > My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. Am I correct that mode can't change if subreg byte is non-zero? A target can just check subreg byte != 0, like my patch does. Here is a patch to add SUBREG_BYTE to CANNOT_CHANGE_MODE_CLASS. Tested on Linux/x86-64. Does it look OK? Thanks. -- H.J. --- 2013-12-11 H.J. Lu * combine.c (subst): Pass subreg byte to REG_CANNOT_CHANGE_MODE_P. (simplify_set): Likewise. * emit-rtl.c (validate_subreg): Likewise. * recog.c (register_operand): Likewise. * rtlanal.c (simplify_subreg_regno): Likewise. * hard-reg-set.h (REG_CANNOT_CHANGE_MODE_P): Add SUBREG_BYTE and pass it to CANNOT_CHANGE_MODE_CLASS. * regcprop.c (mode_change_ok): Pass unknown subreg byte to REG_CANNOT_CHANGE_MODE_P. 
* reginfo.c (record_subregs_of_mode): Pass unknown subreg byte to CANNOT_CHANGE_MODE_CLASS. * postreload.c (reload_cse_simplify_set): Pass subreg byte to CANNOT_CHANGE_MODE_CLASS. (reload_cse_simplify_operands): Likewise. * reload.c (push_reload): Likewise. * reload1.c (choose_reload_regs): Pass subreg byte to REG_CANNOT_CHANGE_MODE_P. (inherit_piecemeal_p): Pass unknown subreg byte to REG_CANNOT_CHANGE_MODE_P. * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Add and ignore subreg byte. * config/alpha/alpha.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/ia64/ia64.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/m32c/m32c.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/mep/mep.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/mips/mips.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/msp430/msp430.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/pa/pa32-regs.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/pa/pa64-regs.h (CANNOT_CHANGE_MODE_CLASS): Likewise. * config/pdp11/pdp11.h (CANNO
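The hard-reg-set.h part of that ChangeLog presumably boils down to something like the following sketch of the intent (not the actual hunk from the attached patch):

#ifdef CANNOT_CHANGE_MODE_CLASS
#define REG_CANNOT_CHANGE_MODE_P(REGNO, FROM, TO, BYTE) \
  CANNOT_CHANGE_MODE_CLASS (FROM, TO, REGNO_REG_CLASS (REGNO), BYTE)
#else
#define REG_CANNOT_CHANGE_MODE_P(REGNO, FROM, TO, BYTE) 0
#endif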
RE: Two build != host fixes
On Wed, 11 Dec 2013 23:11:46, Alan Modra wrote: > > On Wed, Dec 11, 2013 at 12:10:04PM +0100, Bernd Edlinger wrote: >> Hi, >> >> I'm having problems with that patch. > > Sorry to hear that. > Never mind. I have similar patches, but I did not >> I try to start at X86_64-linux-gnu, and I want to get the GCC running on >> arm-linux-gnueabihf. >> I grabbed system headers and libraries from the target and put it in the >> prefix path. >> >> In the first step I do >> >> ../gcc-4.9-20131208/configure >> --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 >> --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran >> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 >> --with-float=hard >> >> This GCC runs on PC and generates arm-linux-gnueabihf executables. >> >> Then I try this >> >> ../gcc-4.9-20131208/configure >> --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross --host=arm-linux-gnueabihf >> --target=arm-linux-gnueabihf --enable-languages=c,c++,fortran >> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 >> --with-float=hard >> >> But It fails because auto-build.h contains nonsense. That is probably >> because almost every check >> has a fatal error #include not found. >> >> I personally prefer to have gmp, mpfr, mpc in-tree (using >> contrib/download_prerequisites). >> >> I experimented a bit and at least this attached patch improves the situation >> for me. >> >> Maybe I never had any problems with GMP before, because the in-tree >> configuration of GMP does -DNO_ASM ? > > GMPINC really shouldn't be used to find build headers, since it is used > to find host headers. See the top level Makefile.in. When gmp has > been installed, using GMPINC means you pull in a whole lot of host > headers for the build compiler. Which might work in rare cases, but > it's a lot more likely to fail. Even with in-tree gmp, how do you get > things like GMP_LIMB_BITS correct if your build machine is 64-bit and > your host is 32-bit? (Perhaps there is some build magic that allows > this to work, I'll investigate when I get back from vacation.) > I do not know, but until last week the only problem was a missing SSIZE_MAX in gcc/config/host-linux.c (glimits.h does not define this, and fix-include replaced mine!) We need the auto-build only to build something that translates .md files to .c, so I would'nt care about GMP, but some other things, like the right prototype for printf make a difference. now the auto-build.h has #define HAVE_DECL_SBRK 0 last week that was #define HAVE_DECL_SBRK 1 I can give you my sys-root files, and you can play with it if you like. > Incidentally, we've been using a couple of other patches for > build != host that I haven't posted because I wasn't sure who authored > them. It's possible the first one might help you. > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index aad927c..7995e64 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -747,7 +747,8 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) > > # Native linker and preprocessor flags. For x-fragment overrides. > BUILD_LDFLAGS=@BUILD_LDFLAGS@ > -BUILD_CPPFLAGS=$(ALL_CPPFLAGS) > +BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ > + -I$(srcdir)/../include $(CPPINC) > I did not have this one. What is it good for? > # Actual name to use when installing a native compiler. 
> GCC_INSTALL_NAME := $(shell echo gcc|sed '$(program_transform_name)') > diff --git a/gcc/ada/gcc-interface/Make-lang.in > b/gcc/ada/gcc-interface/Make-lang.in > index 57f9009..e1d3ed6 100644 > --- a/gcc/ada/gcc-interface/Make-lang.in > +++ b/gcc/ada/gcc-interface/Make-lang.in > @@ -625,7 +625,7 @@ ada.tags: force > ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb > -$(MKDIR) ada/doctools > $(CP) $^ ada/doctools > - cd ada/doctools && $(GNATMAKE) -q xgnatugn > + cd ada/doctools && gnatmake -q xgnatugn > Yes, I also have that. It's a show-stopper for Ada. > # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on > # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info > diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in > index 794d374..6b0d5e8 100644 > --- a/gnattools/Makefile.in > +++ b/gnattools/Makefile.in > @@ -23,6 +23,7 @@ SHELL = @SHELL@ > srcdir = @srcdir@ > libdir = @libdir@ > build = @build@ > +host = @host@ > target = @target@ > prefix = @prefix@ > INSTALL = @INSTALL@ > @@ -31,6 +32,7 @@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ > > # Nonstandard autoconf-set variables. > LN_S=@LN_S@ > +host_alias=@host_alias@ > target_noncanonical=@target_noncanonical@ > > # Variables for the user (or the top level) to override. > @@ -183,7 +185,11 @@ regnattools: $(GCC_DIR)/stamp-gnatlib-rts > # put the host RTS dir first in the PATH to hide the default runtime > # files that are among the sources > # FIXME: This should be done in configure. > +ifeq ($(host), $(build)) > RTS_DIR:=$(strip $(subst \,/,$(shell gnatls -v | grep adalib ))) > +else > +RTS_DIR:=$(strip $(subst \,
Re: Two build != host fixes
On 11 Dec 2013, at 13:11, Bernd Edlinger wrote: > I did not have this one. > What is it good for? > >> # Actual name to use when installing a native compiler. >> GCC_INSTALL_NAME := $(shell echo gcc|sed '$(program_transform_name)') >> diff --git a/gcc/ada/gcc-interface/Make-lang.in >> b/gcc/ada/gcc-interface/Make-lang.in >> index 57f9009..e1d3ed6 100644 >> --- a/gcc/ada/gcc-interface/Make-lang.in >> +++ b/gcc/ada/gcc-interface/Make-lang.in >> @@ -625,7 +625,7 @@ ada.tags: force >> ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb >> -$(MKDIR) ada/doctools >> $(CP) $^ ada/doctools >> - cd ada/doctools && $(GNATMAKE) -q xgnatugn >> + cd ada/doctools && gnatmake -q xgnatugn >> > > Yes, I also have that. It's a show-stopper for Ada. I have some more fixes for Ada cross-builds that Eric commented on but need a little more work - will try to re-test this evening and re-post tomorrow. Iain
Re: [Patch, Fortran, F03] PR 58916: Allocation of scalar with array source not rejected
Dear Janus, It looks good to me - OK for trunk. Thanks for the patch. Paul On 11 December 2013 14:02, Janus Weil wrote: > Hi all, > > attached is a small patch which fixes accepts-invalid and > ICE-on-invalid problems on allocation with source. Regtested on > x86_64-unknown-linux-gnu. Ok for trunk? > > Cheers, > Janus > > > 2013-12-11 Janus Weil > > PR fortran/58916 > * resolve.c (conformable_arrays): Treat scalar 'e2'. > (resolve_allocate_expr): Check rank also for unlimited-polymorphic > variables. > > > 2013-12-11 Janus Weil > > PR fortran/58916 > * gfortran.dg/allocate_with_source_4.f90: New. -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/10/2013 04:48 AM, Jan Hubicka wrote: > >The case where I played with local comdats was the following. > >I made cgraph_body_availability to get context argument (i.e. instead of > >saying > >if something is available in current unit, it was saying if it is available > >in current function/variable). > > > >If two symbols are in the same comdat group and refering each other, they are > >available even though they may seem overwritable to others. I then started to > >produce local symbols for those local references that are equivalent to your > >comdat > >local. > > > >We probably want to get in this extension too. (I did not get data on how > >often > >it fires, but it does i.e. for recursive calls of C++ inlines). > > I wouldn't expect it to affect anything other than recursive calls, > since before my patch functions in the same COMDAT don't call each > other, and with my patch they only call functions that are already > local. > > Also, this optimization would seem to apply to all recursive > functions, not just those in comdat groups. Agreed, I already do the conversion for many functions (based on "inline" keyword implying that there is no overwrite changing semantic). So far the conversion does not happen for comdats, since it would lead to local comdat and I also ignored the conversion rule. I have patch for that that handles it post-inlining + inliner patch that takes advantage of function context to allow recursive inlining. > > Are you thinking to add this after my patch is in? Yes, lets do that incrementally. > > >>+ /* Don't clone decls local to a comdat group. */ > >>+ if (comdat_local_p (node)) > >>+return false; > > > >Why? It seems this should work if the clone becomes another comdat local? > > Right, I think the problem was that the clone wasn't becoming comdat > local. And for the specific case of decloning, there's no point in > cloning the decloned constructor. If it does not make sense, how we ended up cloning it? Can you show me some code sample of decloning? I assume that we basically turn original cloned constructors into wrappers around common "worker" that is now comdat local. I think ipa-cp may end up deciding on clonning the wrapper that will break because it will end up static calling the local comdat function. On the other hand it may decide to clone both the wrapper and the actual function and in that case bringing both static is fine. So to be on safe side, we probably want to consider functions calling local comdat unclonable but we do not need to worry about local comdats themselves. For good codegen, I think ipa-cp will need to understand it needs to either clone both or nothing. I think it is something Martin can look into? > > >On the other hand, I think you want to prevent ipa-cp propagating addresses > >of comdat > >locals out of the function. (i.e. make it to check can_refer predicate for > >its subtitutions) > > Right, I wasn't worrying about that because it can't come up with decloning. > > >So we should check here if both caller and callee are in the same group and > >allow > >inlining then? > > Makes sense. > > >Probably you want to get make_decl_local to preserve comdat group; then you > >won't need > >to care about it here and clonning could work. > > OK. > > >Probably also when declaring a comdat symbol local, we want to turn all > >associated > >comdats to local, right? (i.e. 
with LTO and hidden comdat) > > I don't think so; if we change a public comdat symbol to local, > that's a change to the ABI of the comdat, so it can't be the same > comdat anymore, and dissolving the comdat seems appropriate. Yep, this is what I had in mind. When we declare to turn comdat into non-comdat we need to make sure that the comdat locals are dissolved, too. > > >Also i think this change needs more work. FROM_DECL is not the function you > >are going to get the reference, it is variable whose constructor the value > >was > >take from. > >I think we need to rename it to can_refer_decl_in_symbol and add symbol > >argument > >to get proper check. I am not sure where we use it without being sure the > >symbol > >is current_function_decl. We definitly make use of it during devirt and we > >need to > >start using it in ipa-cp. > > Would it be OK for me to just drop this change, since it isn't > needed for decloning? OK, I will make followup patch for this. Thanks, Honza
Re: [Patch, Fortran, F03] PR 58916: Allocation of scalar with array source not rejected
> It looks good to me - OK for trunk. Thanks, Paul. Committed as r205894. Cheers, Janus > On 11 December 2013 14:02, Janus Weil wrote: >> Hi all, >> >> attached is a small patch which fixes accepts-invalid and >> ICE-on-invalid problems on allocation with source. Regtested on >> x86_64-unknown-linux-gnu. Ok for trunk? >> >> Cheers, >> Janus >> >> >> 2013-12-11 Janus Weil >> >> PR fortran/58916 >> * resolve.c (conformable_arrays): Treat scalar 'e2'. >> (resolve_allocate_expr): Check rank also for unlimited-polymorphic >> variables. >> >> >> 2013-12-11 Janus Weil >> >> PR fortran/58916 >> * gfortran.dg/allocate_with_source_4.f90: New. > > > > -- > The knack of flying is learning how to throw yourself at the ground and miss. >--Hitchhikers Guide to the Galaxy
Re: [PATCH, testsuite] Fix alignment in movapd tests
On 13-12-10 01:13 PM, Uros Bizjak wrote: Hello! 2013-12-10 Ryan Mansfield PR testsuite/59442 * gcc.target/i386/sse2-movapd-1.c: Fix alignment attributes. * gcc.target/i386/sse2-movapd-2.c: Likewise. * gcc.target/i386/avx-vmovapd-256-1.c: Likewise. * gcc.target/i386/avx-vmovapd-256-2.c: Likewise. OK for mainline and release branches. Thanks. Could someone please apply it for me? Regards, Ryan Mansfield
Re: Two build != host fixes
> I have some more fixes for Ada cross-builds that Eric commented on but need > a little more work - will try to re-test this evening and re-post tomorrow. It's also PR ada/55946. Would mind trying the attached patch? -- Eric BotcazouIndex: gnattools/Makefile.in === --- gnattools/Makefile.in (revision 205881) +++ gnattools/Makefile.in (working copy) @@ -24,6 +24,8 @@ srcdir = @srcdir@ libdir = @libdir@ build = @build@ target = @target@ +host = @host@ +host_alias= @host_alias@ prefix = @prefix@ INSTALL = @INSTALL@ INSTALL_DATA = @INSTALL_DATA@ @@ -92,6 +94,7 @@ TOOLS_FLAGS_TO_PASS_RE= \ "CC=../../xgcc -B../../" \ "CXX=../../xg++ -B../../ $(CXX_LFLAGS)" \ "CFLAGS=$(CFLAGS)" \ + "LDFLAGS=$(LDFLAGS)" \ "ADAFLAGS=$(ADAFLAGS)" \ "ADA_CFLAGS=$(ADA_CFLAGS)" \ "INCLUDES=$(INCLUDES_FOR_SUBDIR)" \ @@ -105,6 +108,22 @@ TOOLS_FLAGS_TO_PASS_RE= \ "TOOLSCASE=cross" # Variables for gnattools, cross +ifeq ($(build), $(host)) + GNATMAKE_FOR_HOST=gnatmake + GNATLINK_FOR_HOST=gnatlink + GNATBIND_FOR_HOST=gnatbind + GNATLS_FOR_HOST=gnatls +else + GNATMAKE_FOR_HOST=$(host_alias)-gnatmake + GNATLINK_FOR_HOST=$(host_alias)-gnatlink + GNATBIND_FOR_HOST=$(host_alias)-gnatbind + GNATLS_FOR_HOST=$(host_alias)-gnatls +endif + +# Put the host RTS dir first in the PATH to hide the default runtime +# files that are among the sources +RTS_DIR:=$(strip $(subst \,/,$(shell $(GNATLS_FOR_HOST) -v | grep adalib ))) + TOOLS_FLAGS_TO_PASS_CROSS= \ "CC=$(CC)" \ "CXX=$(CXX)" \ @@ -117,9 +136,9 @@ TOOLS_FLAGS_TO_PASS_CROSS= \ "exeext=$(exeext)" \ "fsrcdir=$(fsrcdir)" \ "srcdir=$(fsrcdir)" \ - "GNATMAKE=gnatmake" \ - "GNATLINK=gnatlink" \ - "GNATBIND=gnatbind" \ + "GNATMAKE=$(GNATMAKE_FOR_HOST)" \ + "GNATLINK=$(GNATLINK_FOR_HOST)" \ + "GNATBIND=$(GNATBIND_FOR_HOST)" \ "TOOLSCASE=cross" \ "LIBGNAT=" @@ -188,11 +207,6 @@ regnattools: $(GCC_DIR)/stamp-gnatlib-rt $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ $(TOOLS_FLAGS_TO_PASS_NATIVE) common-tools -# For cross builds of gnattools, -# put the host RTS dir first in the PATH to hide the default runtime -# files that are among the sources -# FIXME: This should be done in configure. -RTS_DIR:=$(strip $(subst \,/,$(shell gnatls -v | grep adalib ))) gnattools-cross: $(GCC_DIR)/stamp-tools # gnattools1-re $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ Index: gcc/ada/gcc-interface/Make-lang.in === --- gcc/ada/gcc-interface/Make-lang.in (revision 205881) +++ gcc/ada/gcc-interface/Make-lang.in (working copy) @@ -658,7 +658,7 @@ ada.tags: force ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb -$(MKDIR) ada/doctools $(CP) $^ ada/doctools - cd ada/doctools && $(GNATMAKE) -q xgnatugn + cd ada/doctools && gnatmake -q xgnatugn # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info
Re: [Patch Ada/build] deal with some cross/native cross issues
Hello Eric, I had made the mods to this and done some light testing - then got side-tracked by other priorities. however, since the topic has come up on the list: On 6 Nov 2013, at 12:57, Eric Botcazou wrote: >> I've been trying to improve the building and testing of Darwin for crosses >> and native crosses. >> 1. xgnatugn needs to be run on the build system, so needs to be built with >> the build system's gnatmake. I haven't put a canonical prefix on this since >> this doesn't appear to be done elsewhere. Defined as GNATMAKE_FOR_BUILD and >> passed to sub-processes. > > Why do you need to pass it to ADA_TOOLS_FLAGS_TO_PASS though? Just replace > $(GNATMAKE) with gnatmake. done (FWIW, I think that GNATMAKE_FOR_BUILD would make it obvious for a future reader, but not a big deal) >> 2. Some builds might need to pass LDFLAGS to the gnat* builds. Appended >> LDFLAGS to GCC_LINK. Passed on in gnattools/Make. > > OK. > >> 3. In gnattools, the RTS dir must be for the host and not for the build; >> This actually only showed up when I tried a cross from a 64bit pointer >> machine to a 32bit pointer one (i.e it is easy for it to go unnoticed). > > OK, but don't you need to do the same for gnatmake/gnatbind/gnatlink here? > See gcc-interface/Make-lang.in, line 171 and below, for similar code. it did appear odd that one path had the test and the other did not, however the comment on the native-x case is somewhat misleading since it implies (at least to me) that the *intention* is to use the newly-built target(=host) lib? In the current patch this is changed to place the test and setting RTS_DIR to cover both native and canadian X cases, at least this should be safe. Note that I have *not* tested any canadian-crosses, just cross and native-cross, if you need someone to do a canadian X before this is applied, let me know, and I'll try to set something up. (unless it gets covered by Alan or Bernd's cases). I am re-testing the attached, rebased to tot, but that will take a while, given the machines I have available, OK to apply if [cross & native-cross] testing passes? (if the other folks doing cross-build stuff want to incorporate/take this on, that's OK too). Iain gcc/ada/gcc-interface/Make-lang.in | 9 + gcc/ada/gcc-interface/Makefile.in | 2 +- gnattools/Makefile.in | 6 ++ 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in index cd3676f..f7aafc0 100644 --- a/gcc/ada/gcc-interface/Make-lang.in +++ b/gcc/ada/gcc-interface/Make-lang.in @@ -178,6 +178,10 @@ else GNATLINK_FOR_HOST=$(host)-gnatlink GNATLS_FOR_HOST=$(host)-gnatls + ifneq ($(findstring ada,$(LANGUAGES)),) +RTS_DIR:=$(strip $(subst \,/,$(shell $(GNATLS_FOR_HOST) -v | grep adalib ))) + endif + ifeq ($(host), $(target)) # This is a cross native. All the sources are taken from the currently # built runtime. @@ -193,9 +197,6 @@ else else # This is a canadian cross. We should use a toolchain running on the # build platform and targeting the host platform. 
-ifneq ($(findstring ada,$(LANGUAGES)),) - RTS_DIR:=$(strip $(subst \,/,$(shell $(GNATLS_FOR_HOST) -v | grep adalib ))) -endif ADA_TOOLS_FLAGS_TO_PASS=\ CC="$(CC)" \ CXX="$(CXX)" \ @@ -658,7 +659,7 @@ ada.tags: force ada/doctools/xgnatugn$(build_exeext): ada/xgnatugn.adb -$(MKDIR) ada/doctools $(CP) $^ ada/doctools - cd ada/doctools && $(GNATMAKE) -q xgnatugn + cd ada/doctools && gnatmake -q xgnatugn # Note that doc/gnat_ugn.texi and doc/projects.texi do not depend on # xgnatugn being built so we can distribute a pregenerated doc/gnat_ugn.info diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in index 885a5ed..6b675f2 100644 --- a/gcc/ada/gcc-interface/Makefile.in +++ b/gcc/ada/gcc-interface/Makefile.in @@ -2415,7 +2415,7 @@ TOOLS_FLAGS_TO_PASS= \ "GNATLINK=$(GNATLINK)" \ "GNATBIND=$(GNATBIND)" -GCC_LINK=$(CXX) $(GCC_LINK_FLAGS) $(ADA_INCLUDES) +GCC_LINK=$(CXX) $(GCC_LINK_FLAGS) $(ADA_INCLUDES) $(LDFLAGS) # Build directory for the tools. Let's copy the target-dependent # sources using the same mechanism as for gnatlib. The other sources are diff --git a/gnattools/Makefile.in b/gnattools/Makefile.in index fdd6491..118847c 100644 --- a/gnattools/Makefile.in +++ b/gnattools/Makefile.in @@ -24,6 +24,7 @@ srcdir = @srcdir@ libdir = @libdir@ build = @build@ target = @target@ +host = @host@ prefix = @prefix@ INSTALL = @INSTALL@ INSTALL_DATA = @INSTALL_DATA@ @@ -92,6 +93,7 @@ TOOLS_FLAGS_TO_PASS_RE= \ "CC=../../xgcc -B../../" \ "CXX=../../xg++ -B../../ $(CXX_LFLAGS)" \ "CFLAGS=$(CFLAGS)" \ + "LDFLAGS=$(LDFLAGS)" \ "ADAFLAGS=$(ADAFLAGS)" \ "ADA_CFLAGS=$(ADA_CFLAGS)" \ "INCLUDES=$(INCLUDES_FOR_SUBDIR)" \ @@ -192,7 +194,11 @@ regnattools: $(GCC_DIR
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 02:51, Richard Biener wrote: First of all phiopt runs unconditionally for -On with n > 0 but the conversion is clearly not suitable for non-speed optimizations. Thus I'd guard it with at least !optimize_size && optimize >= 2. As you are targeting a worse transformation done by if-conversion you may want to add || (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) && flag_tree_loop_if_convert != 0) || flag_tree_loop_if_convert == 1 || flag_tree_loop_if_convert_stores == 1) (ugh). That's a hell of a condition to guard a transformation. But, yea, I agree. + + /* If inversion is needed, first try to invert the test since + that's cheapest. */ + if (invert) +{ + enum tree_code new_code + = invert_tree_comparison (cond_code, + HONOR_NANS (TYPE_MODE (TREE_TYPE (rhs; That looks wrong - you want to look at HONOR_NANS on the mode of one of the comparison operands, not of the actual value you want to negate (it's integer and thus never has NaNs). Bah. That was supposed to be HONOR_SIGNED_ZEROS. Which as far as I can tell is a property of the value being tested. So it seems to me it should be HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (one of the conditional's arguments))) much like is done in abs_replacement. With the difference we want to look at the comparison (which may have different arguments than the PHI we're converting) and that we can still apply the optimization, just in a slightly different way. jeff
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 08:55 AM, Jan Hubicka wrote: + /* Don't clone decls local to a comdat group. */ + if (comdat_local_p (node)) +return false; Why? It seems this should work if the clone becomes another comdat local? Right, I think the problem was that the clone wasn't becoming comdat local. And for the specific case of decloning, there's no point in cloning the decloned constructor. If it does not make sense, how we ended up cloning it? I guess the heuristics are making a mistake. Can you show me some code sample of decloning? I assume that we basically turn original cloned constructors into wrappers around common "worker" that is now comdat local. Right. I think ipa-cp may end up deciding on clonning the wrapper that will break because it will end up static calling the local comdat function. On the other hand it may decide to clone both the wrapper and the actual function and in that case bringing both static is fine. In the case I'm looking at (g++.dg/torture/pr41257.C, with decloning forced on), we're cloning the local function in order to propagate in a '0' passed in from one of the wrappers. This is pointless because the wrapper contains just the one call, so in any situation where cloning makes sense, inlining is better. So, it's probably possible to make it work to clone the comdat local function into another comdat local function, but it's not useful, and it's easier to just prevent it. Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On Wed, Dec 11, 2013 at 3:51 PM, Jeff Law wrote: > On 12/11/13 02:51, Richard Biener wrote: >> >> >> First of all phiopt runs unconditionally for -On with n > 0 but the >> conversion >> is clearly not suitable for non-speed optimizations. Thus I'd guard it >> with at least !optimize_size && optimize >= 2. As you are targeting >> a worse transformation done by if-conversion you may want to >> add || (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) >> && flag_tree_loop_if_convert != 0) >>|| flag_tree_loop_if_convert == 1 >>|| flag_tree_loop_if_convert_stores == 1) >> (ugh). > > That's a hell of a condition to guard a transformation. But, yea, I agree. > > >>> + >>> + /* If inversion is needed, first try to invert the test since >>> + that's cheapest. */ >>> + if (invert) >>> +{ >>> + enum tree_code new_code >>> + = invert_tree_comparison (cond_code, >>> + HONOR_NANS (TYPE_MODE (TREE_TYPE >>> (rhs; >> >> >> That looks wrong - you want to look at HONOR_NANS on the mode >> of one of the comparison operands, not of the actual value you want >> to negate (it's integer and thus never has NaNs). > > Bah. That was supposed to be HONOR_SIGNED_ZEROS. Which as far as I can > tell is a property of the value being tested. No, it's invert_tree_comparison (enum tree_code code, bool honor_nans) so indeed HONOR_NANS. And yes, on a conditional argument (it can be a FP comparison but a integer negate). Richard. > So it seems to me it should be > > HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (one of the conditional's > arguments))) > > much like is done in abs_replacement. With the difference we want to look > at the comparison (which may have different arguments than the PHI we're > converting) and that we can still apply the optimization, just in a slightly > different way. > > jeff >
Re: [Patch, RTL] Eliminate redundant vec_select moves.
"H.J. Lu" writes: > On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford > wrote: >> Richard Henderson writes: >>> On 12/10/2013 10:44 AM, Richard Sandiford wrote: Sorry, I don't understand. I never said it was invalid. I said (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents a single register. On a little-endian target, the offset cannot be anything other than 0 in that case. So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for something that is always invalid, regardless of the target. That kind of situation should be rejected by target-independent code instead. >>> >>> But, we want to disable the subreg before we know whether or not (reg:V4SF >>> X) >>> will be allocated to a single hard register. That is something that we >>> can't >>> know in target-independent code before register allocation. >> >> I was thinking that if we've got a class, we've also got things like >> CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. >> But even in the padding cases an offset-based check in C_C_M_C could >> be derived from other information. >> >> subreg_get_info handles padding with: >> >> nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); >> if (GET_MODE_INNER (xmode) == VOIDmode) >> xmode_unit = xmode; >> else >> xmode_unit = GET_MODE_INNER (xmode); >> gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); >> gcc_assert (nregs_xmode >> == (GET_MODE_NUNITS (xmode) >> * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); >> gcc_assert (hard_regno_nregs[xregno][xmode] >> == (hard_regno_nregs[xregno][xmode_unit] >> * GET_MODE_NUNITS (xmode))); >> >> /* You can only ask for a SUBREG of a value with holes in the middle >> if you don't cross the holes. (Such a SUBREG should be done by >> picking a different register class, or doing it in memory if >> necessary.) An example of a value with holes is XCmode on 32-bit >> x86 with -m128bit-long-double; it's represented in 6 32-bit >> registers, >> 3 for each part, but in memory it's two 128-bit parts. >> Padding is assumed to be at the end (not necessarily the 'high >> part') >> of each unit. */ >> if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >>< GET_MODE_NUNITS (xmode)) >> && (offset / GET_MODE_SIZE (xmode_unit) >> != ((offset + GET_MODE_SIZE (ymode) - 1) >> / GET_MODE_SIZE (xmode_unit >> { >> info->representable_p = false; >> rknown = true; >> } >> >> and I wouldn't really want to force targets to individually reproduce >> that kind of logic at the class level. If the worst comes to the worst >> we could cache the difficult cases. >> > > My case is x86 CANNOT_CHANGE_MODE_CLASS only needs > to know if the subreg byte is zero or not. It doesn't care about mode > padding. You are concerned about information passed to > CANNOT_CHANGE_MODE_CLASS is too expensive for target > to process. It isn't the case for x86. No, I'm concerned that by going this route, we're forcing every target (or at least every target with wider-than-word registers, which is most of the common ones) to implement the same target-independent restriction. This is not an x86-specific issue. Thanks, Richard
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford wrote: > "H.J. Lu" writes: >> On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford >> wrote: >>> Richard Henderson writes: On 12/10/2013 10:44 AM, Richard Sandiford wrote: > Sorry, I don't understand. I never said it was invalid. I said > (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents > a single register. On a little-endian target, the offset cannot be > anything other than 0 in that case. > > So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for > something that is always invalid, regardless of the target. That kind > of situation should be rejected by target-independent code instead. But, we want to disable the subreg before we know whether or not (reg:V4SF X) will be allocated to a single hard register. That is something that we can't know in target-independent code before register allocation. >>> >>> I was thinking that if we've got a class, we've also got things like >>> CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. >>> But even in the padding cases an offset-based check in C_C_M_C could >>> be derived from other information. >>> >>> subreg_get_info handles padding with: >>> >>> nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); >>> if (GET_MODE_INNER (xmode) == VOIDmode) >>> xmode_unit = xmode; >>> else >>> xmode_unit = GET_MODE_INNER (xmode); >>> gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); >>> gcc_assert (nregs_xmode >>> == (GET_MODE_NUNITS (xmode) >>> * HARD_REGNO_NREGS_WITH_PADDING (xregno, >>> xmode_unit))); >>> gcc_assert (hard_regno_nregs[xregno][xmode] >>> == (hard_regno_nregs[xregno][xmode_unit] >>> * GET_MODE_NUNITS (xmode))); >>> >>> /* You can only ask for a SUBREG of a value with holes in the middle >>> if you don't cross the holes. (Such a SUBREG should be done by >>> picking a different register class, or doing it in memory if >>> necessary.) An example of a value with holes is XCmode on 32-bit >>> x86 with -m128bit-long-double; it's represented in 6 32-bit >>> registers, >>> 3 for each part, but in memory it's two 128-bit parts. >>> Padding is assumed to be at the end (not necessarily the 'high >>> part') >>> of each unit. */ >>> if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >>>< GET_MODE_NUNITS (xmode)) >>> && (offset / GET_MODE_SIZE (xmode_unit) >>> != ((offset + GET_MODE_SIZE (ymode) - 1) >>> / GET_MODE_SIZE (xmode_unit >>> { >>> info->representable_p = false; >>> rknown = true; >>> } >>> >>> and I wouldn't really want to force targets to individually reproduce >>> that kind of logic at the class level. If the worst comes to the worst >>> we could cache the difficult cases. >>> >> >> My case is x86 CANNOT_CHANGE_MODE_CLASS only needs >> to know if the subreg byte is zero or not. It doesn't care about mode >> padding. You are concerned about information passed to >> CANNOT_CHANGE_MODE_CLASS is too expensive for target >> to process. It isn't the case for x86. > > No, I'm concerned that by going this route, we're forcing every target > (or at least every target with wider-than-word registers, which is most > of the common ones) to implement the same target-independent restriction. > This is not an x86-specific issue. > So you prefer a generic solution which makes CANNOT_CHANGE_MODE_CLASS return true for vector mode subreg if subreg byte != 0. Is this correct? Thanks. -- H.J.
[wwwdocs] Buildstat update for 4.8
Latest results for gcc 4.8.x. This patch supersedes the previous one sent on 2013-11-03 as it was never applied. -tgc Testresults for 4.8.2 arm-unknown-linux-gnueabi i386-pc-solaris2.9 i686-pc-linux-gnu (2) i686-unknown-linux-gnu mips-unknown-linux-gnu mipsel-unknown-linux-gnu powerpc-apple-darwin8.11.0 powerpc-unknown-linux-gnu sparc-sun-solaris2.9 sparc64-sun-solaris2.10 (2) sparc-unknown-linux-gnu x86_64-apple-darwin13 x86_64-unknown-linux-gnu (3) Index: buildstat.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/buildstat.html,v retrieving revision 1.5 diff -u -r1.5 buildstat.html --- buildstat.html 2 Oct 2013 08:58:33 - 1.5 +++ buildstat.html 11 Dec 2013 16:15:47 - @@ -26,6 +26,7 @@ arm-unknown-linux-gnueabi  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01757.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01612.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02658.html";>4.8.0 @@ -85,6 +86,7 @@ i386-pc-solaris2.9  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg02068.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00230.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00416.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00415.html";>4.8.0, @@ -112,15 +114,26 @@ i686-pc-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01348.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01313.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-07/msg02349.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-04/msg03091.html";>4.8.0 +i686-unknown-linux-gnu + +Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01351.html";>4.8.2 + + + + mips-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01758.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg02891.html";>4.8.1 @@ -129,6 +142,7 @@ mipsel-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01759.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01568.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02562.html";>4.8.0 @@ -138,6 +152,7 @@ powerpc-apple-darwin8.11.0  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01405.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg00237.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-04/msg01012.html";>4.8.0 @@ -163,6 +178,7 @@ powerpc-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00014.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg01486.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02452.html";>4.8.0 @@ -180,6 +196,7 @@ sparc-sun-solaris2.9  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg02067.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-08/msg00243.html";>4.8.1, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02774.html";>4.8.0, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02717.html";>4.8.0 @@ -206,6 +223,8 @@ sparc64-sun-solaris2.10  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01666.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01379.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02413.html";>4.8.0 @@ -214,6 +233,7 @@ sparc-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01785.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-06/msg02503.html";>4.8.1, 
http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02660.html";>4.8.0 @@ -244,6 +264,14 @@ +x86_64-apple-darwin13 + +Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01562.html";>4.8.2 + + + + x86_64-apple-darwin13.0.0  Test results: @@ -255,6 +283,9 @@ x86_64-unknown-linux-gnu  Test results: +http://gcc.gnu.org/ml/gcc-testresults/2013-11/msg00373.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01854.html";>4.8.2, +http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01777.html";>4.8.2, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02881.html";>4.8.0, http://gcc.gnu.org/ml/gcc-testresults/2013-03/msg02623.html";>4.8.0
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 10:11 AM, Jason Merrill wrote: So, it's probably possible to make it work to clone the comdat local function into another comdat local function, but it's not useful, and it's easier to just prevent it. Well, not that much easier actually. I'm attaching both a delta from my last patch and a complete patch against trunk. Two questions: Do you have an opinion about which way to implement symtab_in_same_comdat_p? Is ipa_passes the right place to initialize calls_comdat_local? Jason diff --git a/gcc/cgraph.c b/gcc/cgraph.c index bb6b382..8b5b786 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2623,11 +2623,6 @@ verify_cgraph_node (struct cgraph_node *node) error ("execution count is negative"); error_found = true; } - if (node->global.inlined_to && node->same_comdat_group) -{ - error ("inline clone in same comdat group list"); - error_found = true; -} if (!node->definition && !node->in_other_partition && node->local.local) { error ("local symbols must be defined"); @@ -2666,15 +2661,13 @@ verify_cgraph_node (struct cgraph_node *node) error_found = true; } } - tree required_group = NULL_TREE; - if (comdat_local_p (node)) -required_group = DECL_COMDAT_GROUP (node->decl); + bool check_group = symtab_comdat_local_p (node); for (e = node->callers; e; e = e->next_caller) { if (verify_edge_count_and_frequency (e)) error_found = true; - if (required_group - && DECL_COMDAT_GROUP (e->caller->decl) != required_group) + if (check_group + && !symtab_in_same_comdat_p (e->caller, node)) { error ("comdat-local function called by %s outside its comdat", identifier_to_locale (e->caller->name ())); diff --git a/gcc/cgraph.h b/gcc/cgraph.h index a4de294..62b7a7c 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -428,6 +428,8 @@ public: unsigned tm_clone : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; + /* True if this decl calls a COMDAT-local function. */ + unsigned calls_comdat_local : 1; }; @@ -1496,8 +1498,25 @@ symtab_can_be_discarded (symtab_node *node) constructors. */ static inline bool -comdat_local_p (symtab_node *node) +symtab_comdat_local_p (symtab_node *node) { return (node->same_comdat_group && !TREE_PUBLIC (node->decl)); } + +/* Return true if ONE and TWO are part of the same COMDAT group. */ + +static inline bool +symtab_in_same_comdat_p (symtab_node *one, symtab_node *two) +{ +#if 1 + return DECL_COMDAT_GROUP (one->decl) == DECL_COMDAT_GROUP (two->decl); +#else + if (one->same_comdat_group && two->same_comdat_group) +for (symtab_node *n = one->same_comdat_group; n != one; + n = n->same_comdat_group) + if (n == two) + return true; + return false; +#endif +} #endif /* GCC_CGRAPH_H */ diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index 90ef901..cff9a45 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -216,6 +216,8 @@ cgraph_clone_node (struct cgraph_node *n, tree decl, gcov_type count, int freq, new_node->clone = n->clone; new_node->clone.tree_map = NULL; new_node->tp_first_run = n->tp_first_run; + if (n->same_comdat_group) +symtab_add_to_same_comdat_group (new_node, n); if (n->count) { if (new_node->count > n->count) @@ -446,13 +448,10 @@ cgraph_create_virtual_clone (struct cgraph_node *old_node, redirect_callers, false, NULL); /* Update the properties. Make clone visible only within this translation unit. Make sure - that is not weak also. - ??? We cannot use COMDAT linkage because there is no - ABI support for this. */ + that is not weak also. 
*/ DECL_EXTERNAL (new_node->decl) = 0; if (DECL_ONE_ONLY (old_decl)) DECL_SECTION_NAME (new_node->decl) = NULL; - DECL_COMDAT_GROUP (new_node->decl) = 0; TREE_PUBLIC (new_node->decl) = 0; DECL_COMDAT (new_node->decl) = 0; DECL_WEAK (new_node->decl) = 0; diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 44f3afd..a1294ae 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -1244,7 +1244,8 @@ mark_functions_to_output (void) for (next = cgraph (node->same_comdat_group); next != node; next = cgraph (next->same_comdat_group)) - if (!next->thunk.thunk_p && !next->alias) + if (!next->thunk.thunk_p && !next->alias + && !symtab_comdat_local_p (next)) next->process = 1; } } @@ -1990,6 +1991,12 @@ ipa_passes (void) invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL); + cgraph_node *node; + FOR_EACH_FUNCTION (node) +if (symtab_comdat_local_p (node)) + for (struct cgraph_edge *e = node->callers; e; e = e->next_caller) + e->caller->calls_comdat_local = true; + if (!in_lto_p) { execute_ipa_pass_list (passes->all_small_ipa_passes); diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index e326e75..850abb8 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -97,7 +97
Re: [Patch, RTL] Eliminate redundant vec_select moves.
H.J. Lu wrote: On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford wrote: "H.J. Lu" writes: On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: Richard Henderson writes: On 12/10/2013 10:44 AM, Richard Sandiford wrote: Sorry, I don't understand. I never said it was invalid. I said (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents a single register. On a little-endian target, the offset cannot be anything other than 0 in that case. So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for something that is always invalid, regardless of the target. That kind of situation should be rejected by target-independent code instead. But, we want to disable the subreg before we know whether or not (reg:V4SF X) will be allocated to a single hard register. That is something that we can't know in target-independent code before register allocation. I was thinking that if we've got a class, we've also got things like CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. But even in the padding cases an offset-based check in C_C_M_C could be derived from other information. subreg_get_info handles padding with: nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); if (GET_MODE_INNER (xmode) == VOIDmode) xmode_unit = xmode; else xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode == (GET_MODE_NUNITS (xmode) * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by picking a different register class, or doing it in memory if necessary.) An example of a value with holes is XCmode on 32-bit x86 with -m128bit-long-double; it's represented in 6 32-bit registers, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < GET_MODE_NUNITS (xmode)) && (offset / GET_MODE_SIZE (xmode_unit) != ((offset + GET_MODE_SIZE (ymode) - 1) / GET_MODE_SIZE (xmode_unit { info->representable_p = false; rknown = true; } and I wouldn't really want to force targets to individually reproduce that kind of logic at the class level. If the worst comes to the worst we could cache the difficult cases. My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. No, I'm concerned that by going this route, we're forcing every target (or at least every target with wider-than-word registers, which is most of the common ones) to implement the same target-independent restriction. This is not an x86-specific issue. So you prefer a generic solution which makes CANNOT_CHANGE_MODE_CLASS return true for vector mode subreg if subreg byte != 0. Is this correct? Do you mean a generic solution for C_C_M_C to return true for non-zero byte_offset vector subregs in the context of x86? I want to clarify because in the context of 32-bit ARM little-endian, a non-zero byte-offset vector subreg is still a valid full hardreg. eg. for (subreg:DI (reg:V4SF) 8) C_C_M_C can return 'false' as this can be resolved to a full D-reg. Thanks, Tejas.
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 11:24 AM, Jason Merrill wrote: Well, not that much easier actually. I'm attaching both a delta from my last patch and a complete patch against trunk. Oops, forgot to remove the gimple-fold change, doing that now. Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 08:11, Richard Biener wrote: Bah. That was supposed to be HONOR_SIGNED_ZEROS. Which as far as I can tell is a property of the value being tested. No, it's invert_tree_comparison (enum tree_code code, bool honor_nans) so indeed HONOR_NANS. And yes, on a conditional argument (it can be a FP comparison but a integer negate). I realized I was wrong after I went downstairs for breakfast :-) Ignore my last message WRT HONOR_NANs and HONOR_SIGNED_ZEROs. Jeff
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod wrote: > H.J. Lu wrote: >> >> On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford >> wrote: >>> >>> "H.J. Lu" writes: On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: > > Richard Henderson writes: >> >> On 12/10/2013 10:44 AM, Richard Sandiford wrote: >>> >>> Sorry, I don't understand. I never said it was invalid. I said >>> (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents >>> a single register. On a little-endian target, the offset cannot be >>> anything other than 0 in that case. >>> >>> So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for >>> something that is always invalid, regardless of the target. That >>> kind >>> of situation should be rejected by target-independent code instead. >> >> But, we want to disable the subreg before we know whether or not >> (reg:V4SF X) >> will be allocated to a single hard register. That is something that >> we can't >> know in target-independent code before register allocation. > > I was thinking that if we've got a class, we've also got things like > CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. > But even in the padding cases an offset-based check in C_C_M_C could > be derived from other information. > > subreg_get_info handles padding with: > > nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); > if (GET_MODE_INNER (xmode) == VOIDmode) > xmode_unit = xmode; > else > xmode_unit = GET_MODE_INNER (xmode); > gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); > gcc_assert (nregs_xmode > == (GET_MODE_NUNITS (xmode) > * HARD_REGNO_NREGS_WITH_PADDING (xregno, > xmode_unit))); > gcc_assert (hard_regno_nregs[xregno][xmode] > == (hard_regno_nregs[xregno][xmode_unit] > * GET_MODE_NUNITS (xmode))); > > /* You can only ask for a SUBREG of a value with holes in the > middle > if you don't cross the holes. (Such a SUBREG should be done > by > picking a different register class, or doing it in memory if > necessary.) An example of a value with holes is XCmode on > 32-bit > x86 with -m128bit-long-double; it's represented in 6 32-bit > registers, > 3 for each part, but in memory it's two 128-bit parts. > Padding is assumed to be at the end (not necessarily the 'high > part') > of each unit. */ > if ((offset / GET_MODE_SIZE (xmode_unit) + 1 >< GET_MODE_NUNITS (xmode)) > && (offset / GET_MODE_SIZE (xmode_unit) > != ((offset + GET_MODE_SIZE (ymode) - 1) > / GET_MODE_SIZE (xmode_unit > { > info->representable_p = false; > rknown = true; > } > > and I wouldn't really want to force targets to individually reproduce > that kind of logic at the class level. If the worst comes to the worst > we could cache the difficult cases. > My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. >>> >>> No, I'm concerned that by going this route, we're forcing every target >>> (or at least every target with wider-than-word registers, which is most >>> of the common ones) to implement the same target-independent restriction. >>> This is not an x86-specific issue. >>> >> >> So you prefer a generic solution which makes >> CANNOT_CHANGE_MODE_CLASS return true >> for vector mode subreg if subreg byte != 0. Is this >> correct? 
> > > Do you mean a generic solution for C_C_M_C to return true for non-zero > byte_offset vector subregs in the context of x86? > > I want to clarify because in the context of 32-bit ARM little-endian, a > non-zero byte-offset vector subreg is still a valid full hardreg. eg. for > > (subreg:DI (reg:V4SF) 8) > > C_C_M_C can return 'false' as this can be resolved to a full D-reg. > Does that mean subreg byte interpretation is endian-dependent? Both little endian (subreg:DI (reg:V4SF) 0) and big endian (subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT) refer to the same lower 64 bits of reg:V4SF. Is this correct? -- H.J.
Re: _Cilk_spawn and _Cilk_sync for C++
On 12/10/2013 06:03 PM, Iyer, Balaji V wrote: Fixed patch and ChangeLog entries are attached. Is it Ok to install? OK. Jason
Re: [Patch, RTL] Eliminate redundant vec_select moves.
H.J. Lu wrote: On Wed, Dec 11, 2013 at 8:26 AM, Tejas Belagod wrote: H.J. Lu wrote: On Wed, Dec 11, 2013 at 7:49 AM, Richard Sandiford wrote: "H.J. Lu" writes: On Wed, Dec 11, 2013 at 1:13 AM, Richard Sandiford wrote: Richard Henderson writes: On 12/10/2013 10:44 AM, Richard Sandiford wrote: Sorry, I don't understand. I never said it was invalid. I said (subreg:SF (reg:V4SF X) 1) was invalid if (reg:V4SF X) represents a single register. On a little-endian target, the offset cannot be anything other than 0 in that case. So the CANNOT_CHANGE_MODE_CLASS code above seems to be checking for something that is always invalid, regardless of the target. That kind of situation should be rejected by target-independent code instead. But, we want to disable the subreg before we know whether or not (reg:V4SF X) will be allocated to a single hard register. That is something that we can't know in target-independent code before register allocation. I was thinking that if we've got a class, we've also got things like CLASS_MAX_NREGS. Maybe that doesn't cope with padding properly though. But even in the padding cases an offset-based check in C_C_M_C could be derived from other information. subreg_get_info handles padding with: nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode); if (GET_MODE_INNER (xmode) == VOIDmode) xmode_unit = xmode; else xmode_unit = GET_MODE_INNER (xmode); gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit)); gcc_assert (nregs_xmode == (GET_MODE_NUNITS (xmode) * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit))); gcc_assert (hard_regno_nregs[xregno][xmode] == (hard_regno_nregs[xregno][xmode_unit] * GET_MODE_NUNITS (xmode))); /* You can only ask for a SUBREG of a value with holes in the middle if you don't cross the holes. (Such a SUBREG should be done by picking a different register class, or doing it in memory if necessary.) An example of a value with holes is XCmode on 32-bit x86 with -m128bit-long-double; it's represented in 6 32-bit registers, 3 for each part, but in memory it's two 128-bit parts. Padding is assumed to be at the end (not necessarily the 'high part') of each unit. */ if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < GET_MODE_NUNITS (xmode)) && (offset / GET_MODE_SIZE (xmode_unit) != ((offset + GET_MODE_SIZE (ymode) - 1) / GET_MODE_SIZE (xmode_unit { info->representable_p = false; rknown = true; } and I wouldn't really want to force targets to individually reproduce that kind of logic at the class level. If the worst comes to the worst we could cache the difficult cases. My case is x86 CANNOT_CHANGE_MODE_CLASS only needs to know if the subreg byte is zero or not. It doesn't care about mode padding. You are concerned about information passed to CANNOT_CHANGE_MODE_CLASS is too expensive for target to process. It isn't the case for x86. No, I'm concerned that by going this route, we're forcing every target (or at least every target with wider-than-word registers, which is most of the common ones) to implement the same target-independent restriction. This is not an x86-specific issue. So you prefer a generic solution which makes CANNOT_CHANGE_MODE_CLASS return true for vector mode subreg if subreg byte != 0. Is this correct? Do you mean a generic solution for C_C_M_C to return true for non-zero byte_offset vector subregs in the context of x86? I want to clarify because in the context of 32-bit ARM little-endian, a non-zero byte-offset vector subreg is still a valid full hardreg. eg. 
for (subreg:DI (reg:V4SF) 8) C_C_M_C can return 'false' as this can be resolved to a full D-reg. Does that mean subreg byte interpretation is endian-dependent? Both little endian (subreg:DI (reg:V4SF) 0) and big endian (subreg:DI (reg:V4SF) MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT) refer to the same lower 64 bits of reg:V4SF. Is this correct? If my understanding of endianness representation in RTL registers is correct, yes. I said little-endian because C_C_M_C is currently gated on TARGET_BIG_ENDIAN in arm.h. Thanks, Tejas.
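To make H.J.'s point concrete: the x86 check he describes only needs to know whether the subreg starts at byte 0 of the vector register, information the CANNOT_CHANGE_MODE_CLASS macro of this era is never given. The sketch below is therefore a hypothetical, offset-aware variant; the function name and signature are invented for illustration and are not an existing interface.

/* Hypothetical sketch only: an offset-aware variant of the x86
   CANNOT_CHANGE_MODE_CLASS check under discussion.  The real macro
   takes (FROM, TO, CLASS) and never sees SUBREG_BYTE; handing it that
   information is exactly what this thread is debating.  A real
   implementation would additionally restrict this to the SSE
   register classes.  */
static bool
ix86_cannot_change_mode_class_sketch (enum machine_mode from,
                                      enum machine_mode to ATTRIBUTE_UNUSED,
                                      unsigned int subreg_byte)
{
  /* For vector registers, reject only subregs that do not start at
     byte 0 of the inner register.  */
  return VECTOR_MODE_P (from) && subreg_byte != 0;
}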
RFA: patch to fix PR59466 (inefficient address generation on ppc )
The following patch fixes PR59466 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59466 LRA on PPC sometimes generates inefficient code addis 9,2,.LC77@toc@ha addi 9,9,.LC77@toc@l ld 9,0(9) instead of addis 9,2,.LC77@toc@ha ld 9,.LC77@toc@l(9) I can not create a small test for this. The smallest file when I found is bzip2.c from SPEC2000. LRA generates move insn with invalid memory [unspec[`*.LC29',%2:DI] 45] but it can handle it (make it valid) very efficiently after that trying different numerous transformations. PPC target machinary through validize_mem just put all address in a pseudo. I could prevent use validize_mem in rs6000.c but I prefer to do it in change_addres_1 as other targets might have the same problem and it is better to have one for all solution. Still it does not fully solve the problem as insn r257:DI=[unspec[`*.LC29',%2:DI] 45] cant be recognized as *movdi... pattern has operand predicates rejecting memory because of invalid address. To fix this a change in general_operand is done. As LRA can not work properly with regular insn recognition, I added an assert for this in lra_set_insn_recog_data to figure out this situation earlier. Again, LRA has a very good code for legitimize address by itself and it is better to use it. After applying patch I see code size reduction on SPEC2000. Before the patch (this is relative reload generated code): CFP2000- -0.471% 27171 27043 168.wupwise -0.457% 14006 13942 171.swim -0.392% 24515 24419 172.mgrid 0.226% 85079 85271 173.applu 0.751% 728891 734363 177.mesa 0.194% 214357 214773 178.galgel -0.295% 21683 21619 179.art 0.412% 31089 31217 183.equake -0.520% 79976 79560 187.facerec 0.000% 152504 152504 188.ammp 0.000% 43758 43758 189.lucas -0.181%10622651060337 191.fma3d 1.035%10416841052468 200.sixtrack -0.105% 151944 151784 301.apsi Average = 0.0139775% CINT2000- 0.252% 76242 76434 164.gzip 0.172% 186152 186472 175.vpr -0.215%20846122080132 176.gcc 0.000% 16716 16716 181.mcf 0.085% 225316 225508 186.crafty -0.015% 210100 210068 197.parser 0.622% 433635 436332 252.eon -0.298% 762014 759742 253.perlbmk =0.000% 904784 904784 254.gap -0.285% 706432 704416 255.vortex 0.220% 58297 58425 256.bzip2 0.314% 265334 266166 300.twolf Average = 0.070863% After the patch: CFP2000- -0.589% 27171 27011 168.wupwise -0.457% 14006 13942 171.swim -0.392% 24515 24419 172.mgrid -0.113% 85079 84983 173.applu 0.654% 728891 733659 177.mesa 0.060% 214357 214485 178.galgel -0.295% 21683 21619 179.art -0.412% 31089 30961 183.equake -0.520% 79976 79560 187.facerec 0.000% 152504 152504 188.ammp 0.000% 43758 43758 189.lucas -0.317%10622651058897 191.fma3d 0.356%10416841045396 200.sixtrack -0.105% 151944 151784 301.apsi Average = -0.152103% CINT2000- 0.084% 76242 76306 164.gzip -0.052% 186152 186056 175.vpr -0.284%20846122078692 176.gcc 0.000% 16716 16716 181.mcf -0.312% 225316 224612 186.crafty -0.091% 210100 209908 197.parser 0.622% 433635 436332 252.eon -0.340% 762014 759422 253.perlbmk 0.000% 904784 904784 254.gap -0.335% 706432 704064 255.vortex 0.110% 58297 58361 256.bzip2 -0.241% 265334 264694 300.twolf Average = -0.070023% Code size reduction for PPC in this case means also faster code generation. I see it but cannot provide reliable SPEC2000 rate change. The patch was successfully bootstrapped and tested on i686, x86_64, and PPC64. Ok to commit? 2013-12-11 Vladimir Makarov PR rtl-optimization/59466 * emit-rtl.c (change_address_1): Don't validate address for LRA. * recog.c (general_operand): Accept any memory for LRA. 
* lra.c (lra_set_insn_recog_data): Add an assert. Index: emit-rtl.c === --- emit-rtl.c (revision 205870) +++ emit-rtl.c (working copy) @@ -1951,7 +1951,9 @@ change_address_1 (rtx memref, enum machi && (!validate || memory_address_addr_space_p (mode, addr, as))) return memref; - if (validate) + /*
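The ChangeLog above only names the functions touched; the recog.c part of the idea (accept any MEM while LRA is running and let LRA legitimize the address itself) boils down to something like the sketch below. It assumes the existing lra_in_progress flag is the gate and is a paraphrase, not the committed hunk; the helper name is invented.

/* Sketch of the general_operand change described above: while LRA is
   rewriting insns it can repair invalid addresses itself, so a MEM
   operand OP should not be rejected merely because its address is not
   (yet) legitimate.  */
static bool
mem_ok_for_general_operand_p (rtx op, enum machine_mode mode,
                              addr_space_t as)
{
  if (lra_in_progress)
    return true;
  return memory_address_addr_space_p (mode, XEXP (op, 0), as);
}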
Re: [C++ PATCH] Fix GC related issues in C++ FE (PR c++/58627)
It's only safe to free the targs if they weren't used to instantiate any templates, so I lean toward option #1. Did you test this with strict gc? Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 02:51, Richard Biener wrote:
+  /* If inversion is needed, first try to invert the test since
+     that's cheapest.  */
+  if (invert)
+    {
+      enum tree_code new_code
+        = invert_tree_comparison (cond_code,
+                                  HONOR_NANS (TYPE_MODE (TREE_TYPE (rhs))));
That looks wrong - you want to look at HONOR_NANS on the mode of one of the comparison operands, not of the actual value you want to negate (it's integer and thus never has NaNs).
HONOR_NANS (TYPE_MODE (TREE_TYPE (gimple_cond_lhs (cond))))
Right? Jeff
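Putting Richard's correction and Jeff's proposed fix together, the call converges on the form below; a sketch built from the functions already quoted in this thread, not the final committed hunk.

/* Sketch: invert the comparison code using HONOR_NANS on the mode of
   one of the comparison's own operands (here the LHS of the
   GIMPLE_COND), not on the type of the value being negated.  */
enum tree_code new_code
  = invert_tree_comparison (cond_code,
                            HONOR_NANS (TYPE_MODE (TREE_TYPE
                                        (gimple_cond_lhs (cond)))));
if (new_code != ERROR_MARK)
  {
    /* The test is cheaply invertible; rewrite it to NEW_CODE.  */
  }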
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 08:55 AM, Jan Hubicka wrote: > + /* Don't clone decls local to a comdat group. */ > + if (comdat_local_p (node)) > +return false; > >>> > >>>Why? It seems this should work if the clone becomes another comdat local? > >> > >>Right, I think the problem was that the clone wasn't becoming comdat > >>local. And for the specific case of decloning, there's no point in > >>cloning the decloned constructor. > > > >If it does not make sense, how we ended up cloning it? > > I guess the heuristics are making a mistake. > > >Can you show me some code sample of decloning? > > >I assume that we basically turn original cloned constructors into wrappers > >around common "worker" that is now comdat local. > > Right. > > >I think ipa-cp may end up > >deciding on clonning the wrapper that will break because it will end up > >static > >calling the local comdat function. > >On the other hand it may decide to clone both the wrapper and the actual > >function > >and in that case bringing both static is fine. > > In the case I'm looking at (g++.dg/torture/pr41257.C, with decloning > forced on), we're cloning the local function in order to propagate > in a '0' passed in from one of the wrappers. This is pointless > because the wrapper contains just the one call, so in any situation > where cloning makes sense, inlining is better. ipa-cp does not really contain heuristic to figure out where cloning makes more sense than inlining basically because if we produce such clone, inliner will inline it anyway. It is harder than it may seem to isolate testcases where inlining is always a win. > > So, it's probably possible to make it work to clone the comdat local > function into another comdat local function, but it's not useful, > and it's easier to just prevent it. Yep, with current limited use of comdat locals I guess it makes sense. Honza > > Jason
[PATCH][ARM] Add C++ name mangling entry for poly64x2_t type
Hi all, Following up on the crypto intrinsics for AArch32 (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00365.html), here is a patch adding C++ name mangling for the poly64x2_t container type. This of course depends on the Crypto intrinsics patches going in. Tested arm-none-eabi on a model. Ok for trunk after the prerequisites go in? Thanks, Kyrill 2013-12-11 Kyrylo Tkachov * config/arm/arm.c (arm_mangle_map): Add entry for __builtin_neon_poly64.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 5e306d0..72c94e5 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -29340,6 +29340,7 @@ static arm_mangle_map_entry arm_mangle_map[] = { { V4SFmode, "__builtin_neon_sf", "19__simd128_float32_t" }, { V16QImode, "__builtin_neon_poly8", "17__simd128_poly8_t" }, { V8HImode, "__builtin_neon_poly16", "18__simd128_poly16_t" }, + { V2DImode, "__builtin_neon_poly64", "18__simd128_poly64_t" }, { VOIDmode, NULL, NULL } };
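For context on what the new arm_mangle_map entry buys: it gives the poly64x2_t container type a stable AAPCS name so that C++ overloads on it mangle and link consistently, just like the existing NEON entries. A small hypothetical illustration follows (the function name is made up; the expected symbols are derived from the length-prefixed strings in the table, and compiling it assumes the crypto extensions are enabled).

// Hypothetical illustration of the mangling effect of the entry above.
#include <arm_neon.h>

void f (poly64x2_t v);   // expected symbol: _Z1f18__simd128_poly64_t
void f (poly8x16_t v);   // expected symbol: _Z1f17__simd128_poly8_t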
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 10:11 AM, Jason Merrill wrote: > >So, it's probably possible to make it work to clone the comdat local > >function into another comdat local function, but it's not useful, and > >it's easier to just prevent it. > > Well, not that much easier actually. I'm attaching both a delta > from my last patch and a complete patch against trunk. Two > questions: > > Do you have an opinion about which way to implement symtab_in_same_comdat_p? I would go with the #if 1 case. There doesn't seem to be a justification for a loop that is trivially replaceable by the comdat group comparison. In the longer term the plan is to move all those visibility fields from trees into symbols, and in that case I am sure we will end up having the comdat group name in there too. > > Is ipa_passes the right place to initialize calls_comdat_local? The flag is probably needed in both the early inliner and the IPA inliner. A conservative place to initialize it would be in inline_analyze_function. (The early inliner analyzes the function twice, first before inlining and again after early optimization, so the update should also clear the flag if the call disappeared.) I think we also want to clear it if a call to a comdat local function is marked inline. This is the only way we can end up inlining those constructors now, right? Inlining ctors/dtors is of course rather important for further optimization, so we should be careful not to regress optimization here too much. We will still lose some propagation since the inliner won't work out that it can propagate something useful through the wrapper call (unlike ipa-cp, which would if it knew how to clone local comdats). I am on my way home; I will look at the patch in detail next. Honza
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > Sent: Tuesday, December 10, 2013 1:03 PM > To: Iyer, Balaji V > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > > >> But aren't both OpenMP and Cilk Plus simd clones marked as "omp > >> declare simd"? In which case you shouldn't have to do anything? > >> Are the Cilk Plus clones not being marked as "omp declare simd" in > >> the front-ends? > >> > > > > Didn't you mention in this thread > > (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk > > Plus SIMD-enabled functions must be marked as "cilk plus elementals" > > (as it wsa called then)? Did I misunderstand you? > > Both OpenMP and Cilk Plus clones should be marked with "omp declare > simd". But Cilk Plus should _also_ be marked with "cilk plus elementals" (or > "cilk simd function" now). In which case the aforementioned function > doesn't require any changes because Cilk Plus clones will be seen as they are > marked with "omp declare simd". > > So... > > OpenMP declare simd gets tagged as: > "omp declare simd" > Cilk Plus vector functions gets tagged as: > "omp declare simd" > "cilk simd function" > Ok, so you want the same clauses included in both omp declare simd and cilk simd function tree lists? For example in the c-parser.c, I have something like this: if (is_cilkplus_cilk_simd_fn) c = build_tree_list (get_identifier ("cilk simd function"), c); else c = build_tree_list (get_identifier ("omp declare simd"), c); TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c; You want to change it something like this? If (is_cilkplus_cilk_simd_fn) { tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), c); TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c_cilk; } c = build_tree_list (get_identififer ("omp declare simd"), c); TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c;
Re: [PATCH] Fix PR 59390
On Wed, Dec 11, 2013 at 2:42 AM, Uros Bizjak wrote: > On Wed, Dec 11, 2013 at 10:15 AM, Bernhard Reutner-Fischer > wrote: > Patch updated with two more tests to check if the vfmadd insn is being produced when possible. Thanks Sri On Fri, Dec 6, 2013 at 11:12 AM, Sriraman Tallam wrote: > Hi, > > I have attached a patch to fix > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59390. Please review. > > Here is the problem. GCC adds target-specific builtins on demand. The > FMA target-specific builtin __builtin_ia32_vfmaddpd gets added via > this declaration: > > void fun() __attribute__((target("fma"))); > > Specifically, the builtin __builtin_ia32_vfmaddpd gets added when > ix86_add_new_builtins is called from ix86_valid_target_attribute_tree > when processing this target attribute. > > Now, when the vectorizer is processing the builtin "__builtin_fma" in > function other_fn(), it checks to see if this function is vectorizable > and calls ix86_builtin_vectorized_function in i386.c. That returns the > builtin stored here: > > > case BUILT_IN_FMA: > if (out_mode == DFmode && in_mode == DFmode) > { > if (out_n == 2 && in_n == 2) >return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > > > ix86_builtins[IX86_BUILTIN_VFMADDPD] would have contained NULL_TREE > had the builtin not been added by the previous target attribute. That > is why the code works if we remove the previous declaration. > > The fix is to not just return the builtin but to also check if the > current function's isa allows the use of the builtin. For instance, > this patch would solve the problem: > > @@ -33977,7 +33977,13 @@ ix86_builtin_vectorized_function (tree fndecl, > tre >if (out_mode == DFmode && in_mode == DFmode) > { >if (out_n == 2 && in_n == 2) > -return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > +{ > + if (ix86_builtins_isa[IX86_BUILTIN_VFMADDPD].isa > + & global_options.x_ix86_isa_flags) > +return ix86_builtins[IX86_BUILTIN_VFMADDPD]; > + else > + return NULL_TREE; > +} > > > but there are many instances of this usage in > ix86_builtin_vectorized_function. This patch covers all the cases. >>> PR target/59390 * gcc.target/i386/pr59390.c: New test. * gcc.target/i386/pr59390_1.c: New test. * gcc.target/i386/pr59390_2.c: New test. * config/i386/i386.c (get_builtin): New function. (ix86_builtin_vectorized_function): Replace all instances of ix86_builtins[...] with get_builtin(...). (ix86_builtin_reciprocal): Ditto. >>> >>> OK, with a couple of nits: >>> >>> +static tree get_builtin (enum ix86_builtins code) >>> >>> Please name this function ix86_get_builtin. >>> >>> + if (current_function_decl) >>> +target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); >>> + if (target_tree) >>> +opts = TREE_TARGET_OPTION (target_tree); >>> + else >>> +opts = TREE_TARGET_OPTION (target_option_default_node); >>> + >>> >>> opts = TREE_TARGET_OPTION (target_tree ? target_tree : >>> target_option_default_node); >> >> Just curious: >>> +static tree get_builtin (enum ix86_builtins code) >>> +{ >> [] >>> +> >> [] >>> @@ -33677,9 +33701,9 @@ ix86_builtin_vectorized_function (tree fndecl, tre >>>if (out_mode == DFmode && in_mode == DFmode) >>> { >>> if (out_n == 2 && in_n == 2) >>> -return ix86_builtins[IX86_BUILTIN_SQRTPD]; >>> +get_builtin (IX86_BUILTIN_SQRTPD); >>> else if (out_n == 4 && in_n == 4) >>> -return ix86_builtins[IX86_BUILTIN_SQRTPD256]; >>> +get_builtin (IX86_BUILTIN_SQRTPD256); >>> } >>>break; >> >> I must be missing something? >> Don't you have to return ix86_get_builtin(...) ? 
Yes, I noticed it last evening when the vect tests broke. I fixed it with adding the return (which was a typo when I did a bulk edit) and running tests. I will include Uros's changes and commit the patch. Thanks Sri > > Huh, I didn't even notice this mistake. > > Uros.
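For reference, the helper discussed here (ix86_get_builtin, per Uros's review) amounts to the sketch below: return the builtin decl only when the ISA it needs is enabled for the current function, falling back to the default target options when the function has no target attribute. The names are taken from the snippets quoted in this thread; treat the exact body as an approximation rather than the committed code.

/* Approximate sketch of ix86_get_builtin as described in this thread.  */
static tree
ix86_get_builtin (enum ix86_builtins code)
{
  tree target_tree = NULL_TREE;
  struct cl_target_option *opts;

  if (current_function_decl)
    target_tree = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);

  opts = TREE_TARGET_OPTION (target_tree
                             ? target_tree : target_option_default_node);

  if (ix86_builtins_isa[code].isa & opts->x_ix86_isa_flags)
    return ix86_builtins[code];

  return NULL_TREE;
}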
Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
On 12/11/13 09:31, Iyer, Balaji V wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez Sent: Tuesday, December 10, 2013 1:03 PM To: Iyer, Balaji V Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C But aren't both OpenMP and Cilk Plus simd clones marked as "omp declare simd"? In which case you shouldn't have to do anything? Are the Cilk Plus clones not being marked as "omp declare simd" in the front-ends? Didn't you mention in this thread (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk Plus SIMD-enabled functions must be marked as "cilk plus elementals" (as it wsa called then)? Did I misunderstand you? Both OpenMP and Cilk Plus clones should be marked with "omp declare simd". But Cilk Plus should _also_ be marked with "cilk plus elementals" (or "cilk simd function" now). In which case the aforementioned function doesn't require any changes because Cilk Plus clones will be seen as they are marked with "omp declare simd". So... OpenMP declare simd gets tagged as: "omp declare simd" Cilk Plus vector functions gets tagged as: "omp declare simd" "cilk simd function" Ok, so you want the same clauses included in both omp declare simd and cilk simd function tree lists? For example in the c-parser.c, I have something like this: if (is_cilkplus_cilk_simd_fn) c = build_tree_list (get_identifier ("cilk simd function"), c); else c = build_tree_list (get_identifier ("omp declare simd"), c); TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c; You want to change it something like this? If (is_cilkplus_cilk_simd_fn) { tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), c); TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c_cilk; } c = build_tree_list (get_identififer ("omp declare simd"), c); TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = c; Yes.
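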
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
Hi, while thinking through the details of how to handle comdat locals through ipa-cp/inliner/folding, I wonder if we won't end up with better code going the opposite way. We can declare those functions static first and then have a post-inliner IPA pass that walks the callgraph and marks all static functions/variables that are accessed only from one comdat as comdat local. We already have issues here - for example when ipa-split decides to split a comdat function, its split part will end up output outside the comdat group. Similarly, when we decide to produce a clone only to optimize a comdat function, the clone may easily become dead. I also think that with the C++ include jungle, people will end up having static things referenced only from comdats, but I may be wrong. This approach would have issues of its own though:
1) we will need to do it at -O0 too, and in this case we may want to have a flag for those decloned functions since we need to do it only for those
2) since I dropped the ball on merging the function labels patch, we may end up privatizing a function with a label whose address is taken from a static variable that is not privatized. We may just disable localization for such functions, I suppose.
3) we may surprise users who forget to annotate with the used attribute.
4) the inliner does not handle statics and comdats the same way, since it assumes a comdat will be optimized out with some probability. This won't match on statics that will later become part of a comdat group. Not a big deal, probably.
On the other hand, my feeling is that this has a potential for code size savings and may end up more robust/easier to maintain than supporting this somewhat counter-intuitive context throughout the whole IPA optimization queue. What do you think? Honza
C++ edge_iterator (was: Re: [SH] PR 53976 - Add RTL pass to eliminate clrt, sett insns)
On Thu, 2013-11-21 at 00:04 +0100, Steven Bosscher wrote: > Declaring the edge_iterator inside the for() is not a good argument > against FOR_EACH_EDGE. Of course, brownie points are up for grabs for > the brave soul daring enough to make edge iterators be proper C++ > iterators... ;-) So, I gave it a try -- see the attached patch. It allows edge iteration to look more like STL container iteration: for (basic_block::edge_iterator ei = bb->pred_edges ().begin (); ei != bb->pred_edges ().end (); ++ei) { basic_block pred_bb = (*ei)->src; ... } The idea is to first make class vec look more like an STL container. This would also allow iterating it with std::for_each and use C++11 range based for loop (in a few years maybe). It would also make it easier to replace vec uses with STL containers if needed and vice versa. Then the typedef struct basic_block_def* basic_block; is replaced with a wrapper class 'basic_block', which is just a simple POD wrapper around a basic_block_def*. There should be no penalties compared to passing/storing raw pointers. Because of the union with constructor restriction of C++98 an additional wrapper class 'basic_block_in_union' is required, which doesn't have any constructors defined. Having 'basic_block' as a class allows putting typedefs for the edge iterator types in there (initially I tried putting the typedefs into struct basic_block_def, but gengtype would bail out). It would also be possible to have a free standing definition / typedef of edge_iterator, but it would conflict with the existing one and require too many changes at once. Moreover, the iterator type actually depends on the container type, which is vec, and the container type is defined/selected by the basic_block class. The following basic_block pred_bb = (*ei)->src; can also be written as basic_block pred_bb = ei->src; after converting the edge typedef to a wrapper of edge_def*. The idea of the approach is to allow co-existence of the new edge_iterator and the old and thus be able to gradually convert code. The wrappers around raw pointers also helo encapsulating the underlying memory management issues. For example, it would be much easier to replace garbage collected objects with intrusive reference counting. Comments and feedback appreciated. Cheers, Oleg Index: gcc/coretypes.h === --- gcc/coretypes.h (revision 205801) +++ gcc/coretypes.h (working copy) @@ -153,8 +153,8 @@ typedef struct edge_def *edge; typedef const struct edge_def *const_edge; struct basic_block_def; -typedef struct basic_block_def *basic_block; -typedef const struct basic_block_def *const_basic_block; +class basic_block; +class const_basic_block; #define obstack_chunk_alloc ((void *(*) (long)) xmalloc) #define obstack_chunk_free ((void (*) (void *)) free) Index: gcc/tracer.c === --- gcc/tracer.c (revision 205801) +++ gcc/tracer.c (working copy) @@ -102,7 +102,7 @@ /* A transaction is a single entry multiple exit region. It must be duplicated in its entirety or not at all. */ - g = last_stmt (CONST_CAST_BB (bb)); + g = last_stmt (basic_block (bb)); if (g && gimple_code (g) == GIMPLE_TRANSACTION) return true; Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 205801) +++ gcc/emit-rtl.c (working copy) @@ -4446,7 +4446,7 @@ emit_note_after (enum insn_note subtype, rtx after) { rtx note = make_note_raw (subtype); - basic_block bb = BARRIER_P (after) ? NULL : BLOCK_FOR_INSN (after); + basic_block bb = BARRIER_P (after) ? 
(basic_block_def*)NULL : (basic_block_def*)BLOCK_FOR_INSN (after); bool on_bb_boundary_p = (bb != NULL && BB_END (bb) == after); if (note_outside_basic_block_p (subtype, on_bb_boundary_p)) @@ -4462,7 +4462,7 @@ emit_note_before (enum insn_note subtype, rtx before) { rtx note = make_note_raw (subtype); - basic_block bb = BARRIER_P (before) ? NULL : BLOCK_FOR_INSN (before); + basic_block bb = BARRIER_P (before) ? (basic_block_def*)NULL : (basic_block_def*)BLOCK_FOR_INSN (before); bool on_bb_boundary_p = (bb != NULL && BB_HEAD (bb) == before); if (note_outside_basic_block_p (subtype, on_bb_boundary_p)) Index: gcc/vec.h === --- gcc/vec.h (revision 205801) +++ gcc/vec.h (working copy) @@ -482,6 +482,15 @@ void quick_grow (unsigned len); void quick_grow_cleared (unsigned len); + /* STL like iterator interface. */ + typedef T* iterator; + typedef const T* const_iterator; + + iterator begin (void) { return &m_vecdata[0]; } + iterator end (void) { return &m_vecdata[m_vecpfx.m_num]; } + const_iterator begin (void) const { return &m_vecdata[0]; } + const_iterator end (void) const { &m_vecdata[m_vecpfx.m_num]; } + /* vec class can access our internal data and functions. */ te
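As a self-contained illustration of what the begin()/end() additions to vec buy (independent of GCC's garbage-collected types): raw pointers over contiguous storage are already valid random-access iterators, so exposing them is enough for STL algorithms and iterator-style loops. The mini container below is an invented stand-in, not GCC code.

// Standalone sketch: pointer iterators over contiguous storage.
#include <algorithm>
#include <cstdio>

template <typename T, int N>
struct mini_vec
{
  T data[N];
  int num;

  typedef T *iterator;
  typedef const T *const_iterator;

  iterator begin () { return &data[0]; }
  iterator end () { return &data[num]; }
  const_iterator begin () const { return &data[0]; }
  const_iterator end () const { return &data[num]; }
};

int
main ()
{
  mini_vec<int, 4> v = { { 3, 1, 2, 0 }, 4 };
  std::sort (v.begin (), v.end ());
  for (mini_vec<int, 4>::const_iterator it = v.begin (); it != v.end (); ++it)
    std::printf ("%d\n", *it);
  return 0;
}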
Re: [wide-int] small cleanup in wide-int.*
On 12/09/2013 10:01 AM, Richard Biener wrote: On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck wrote: On 12/08/2013 05:35 AM, Richard Biener wrote: Richard Sandiford wrote: Kenneth Zadeck writes: #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we are) I think we should change it to something that makes sense. All we want is 1 extra bit. We don't need to round up the size and then add a whole HWI on top. Doing that will potentially pessimise targets that don't need anything beyond SImode. I agree. Richard. Done, ok to commit? + early examination of the target's mode file. The WIDE_INT_MAX_ELTS + can accomodate at least 1 more bit so that unsigned numbers of that + mode can be represented. Note that it is still possible to create + fixed_wide_ints that have precisions greater than + MAX_BITSIZE_MODE_ANY_INT. This can be useful when representing a + double-width multiplication result, for example. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) + that doesn't reserve 1 more bit, it just rounds up. For one more bit you need to add 1, so #define WIDE_INT_MAX_ELTS \ + (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) no? Otherwise ok. Thanks, Richard. kenny It's not like I can approve the patch anyway though, so I'll leave it there. Thanks, Richard committed as revision 205900 after rebootstrap. 
kenny Index: gcc/tree.h === --- gcc/tree.h (revision 205897) +++ gcc/tree.h (working copy) @@ -4545,7 +4545,7 @@ namespace wi static const unsigned int precision = N; }; - generic_wide_int > + generic_wide_int > to_widest (const_tree); generic_wide_int > to_offset (const_tree); @@ -4566,7 +4566,7 @@ wi::int_traits ::decompose ( precision); } -inline generic_wide_int > +inline generic_wide_int > wi::to_widest (const_tree t) { return t; @@ -4605,7 +4605,7 @@ wi::extended_tree ::get_len () const { if (N == ADDR_MAX_PRECISION) return TREE_INT_CST_OFFSET_NUNITS (m_t); - else if (N == MAX_BITSIZE_MODE_ANY_INT) + else if (N >= WIDE_INT_MAX_PRECISION) return TREE_INT_CST_EXT_NUNITS (m_t); else /* This class is designed to be used for specific output precisions Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205897) +++ gcc/tree-vrp.c (working copy) @@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; + typedef generic_wide_int + > vrp_int_cst; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); - + vrp_int min0 = vrp_int_cst (vr0.min); + vrp_int max0 = vrp_int_cst (vr0.max); + vrp_int min1 = vrp_int_cst (vr1.min); + vrp_int max1 = vrp_int_cst (vr1.max); /* Canonicalize the intervals. */ if (sign == UNSIGNED)
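The disagreement above is plain integer arithmetic, so a standalone check makes it concrete: rounding the division up does not by itself reserve room for an extra bit when the bit-size is already an exact multiple of HOST_BITS_PER_WIDE_INT, which is why the '+ 1' has to go inside the division (or be added to an un-rounded quotient). The sizes below are made-up stand-ins, not values from any real target.

// Standalone demo of the three candidate WIDE_INT_MAX_ELTS formulas,
// with HWI = 64 and hypothetical MAX_BITSIZE_MODE_ANY_INT values.
#include <cstdio>

int
main ()
{
  const int hwi = 64;
  const int sizes[] = { 128, 129, 192 };

  for (int i = 0; i < 3; i++)
    {
      int bits = sizes[i];
      int round_up = (bits + hwi - 1) / hwi;               // rounds up only
      int quot_plus_one = bits / hwi + 1;                  // Richard Sandiford's form
      int round_up_plus_bit = (bits + 1 + hwi - 1) / hwi;  // Richard Biener's form
      std::printf ("bits=%3d  rounded-up=%d  bits/hwi+1=%d  (bits+1) rounded-up=%d\n",
                   bits, round_up, quot_plus_one, round_up_plus_bit);
    }
  return 0;
}

// Only the last two leave room for one extra bit when bits is an exact
// multiple of hwi (128 and 192 here); all three agree at bits = 129.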
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
> -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Wednesday, December 11, 2013 12:38 PM > To: Iyer, Balaji V > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > On 12/11/13 09:31, Iyer, Balaji V wrote: > > > > > >> -Original Message- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > >> Sent: Tuesday, December 10, 2013 1:03 PM > >> To: Iyer, Balaji V > >> Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > >> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > >> Elemental functions) for C > >> > >> > But aren't both OpenMP and Cilk Plus simd clones marked as "omp > declare simd"? In which case you shouldn't have to do anything? > Are the Cilk Plus clones not being marked as "omp declare simd" in > the front-ends? > > >>> > >>> Didn't you mention in this thread > >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk > >>> Plus SIMD-enabled functions must be marked as "cilk plus elementals" > >>> (as it wsa called then)? Did I misunderstand you? > >> > >> Both OpenMP and Cilk Plus clones should be marked with "omp declare > >> simd". But Cilk Plus should _also_ be marked with "cilk plus > >> elementals" (or "cilk simd function" now). In which case the > >> aforementioned function doesn't require any changes because Cilk Plus > >> clones will be seen as they are marked with "omp declare simd". > >> > >> So... > >> > >>OpenMP declare simd gets tagged as: > >>"omp declare simd" > >>Cilk Plus vector functions gets tagged as: > >>"omp declare simd" > >>"cilk simd function" > >> > > > > Ok, so you want the same clauses included in both omp declare simd and > cilk simd function tree lists? > > > > For example in the c-parser.c, I have something like this: > > > >if (is_cilkplus_cilk_simd_fn) > > c = build_tree_list (get_identifier ("cilk simd function"), c); > >else > > c = build_tree_list (get_identifier ("omp declare simd"), c); > >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > >DECL_ATTRIBUTES (fndecl) = c; > > > > > > You want to change it something like this? > > > > > > If (is_cilkplus_cilk_simd_fn) > >{ > > tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), > > c); > > TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); > > DECL_ATTRIBUTES (fndecl) = c_cilk; > > } > > c = build_tree_list (get_identififer ("omp declare simd"), c); > > TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = > > c; > > > > > > Yes. The issue with doing this is that it is creating duplicate clones. If I just kept the patch as is, it does not. For example: __attribute__((vector (vectorlength(sizeof(int) int func3 (int x) { return x; } Should create two clones: mask and unmask (_ZGVbN4v_func3 and _ZGVbM4v_func3). It is doing that if I kept the patch AS-IS. Now, if I make the modification that I mentioned above and remove the handling of cilk simd function in omp-low.c it is creating 6 clones: _ZGVdM4v_func3, _ZGVbN4v_func3, _ZGVcN4v_func3, _ZGVcM4v_func3, _ZGVdN4v_func3 Thanks, Balaji V. Iyer.
Re: patch for elimination to SP when it is changed in RTL (PR57293)
On 12/11/2013, 5:35 AM, Yvan Roux wrote: Hi Vladimir, I've some regressions on ARM after this SP elimination patch, and they are execution failures. Here is the list: g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus gcc.c-torture/execute/va-arg-22.c -O2 gcc.dg/atomic/c11-atomic-exec-5.c -O0 gfortran.dg/direct_io_12.f90 -O[23] gfortran.dg/elemental_dependency_1.f90 -O2 gfortran.dg/matmul_2.f90 -O2 gfortran.dg/matmul_6.f90 -O2 gfortran.dg/mvbits_7.f90 -O3 gfortran.dg/unlimited_polymorphic_1.f03 -O3 I reduced and looked at var-arg-22.c and the issue is that in lra_eliminate_regs_1 (called by get_equiv_with_elimination) we transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What we try to do here is to change the pseudo 195 of the insn 118 below : (insn 118 114 112 8 (set (reg:DI 195) (unspec:DI [ (mem:DI (plus:SI (reg/f:SI 215) (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 + 64B]+8 S8 A8]) ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) (const_int 8 [0x8])) [7 a35+8 S8 A32]) (nil))) with its equivalent (x arg of lra_eliminate_regs_1): (mem/c:DI (plus:SI (reg/f:SI 102 sfp) (const_int 76 [0x4c])) [7 a35+8 S8 A32]) lra_eliminate_regs_1 is called with full_p = true (it is not really clear for what it means), It means we use full offset between the regs, otherwise we use change in the full offset from the previous iteration (it can be changed as we reserve stack memory for spilled pseudos and the reservation can be done several times). As equiv value is stored as it was before any elimination, we need always to use full offset to make elimination. but in the PLUS switch case, we have offset = 0xb (given by ep->offset) and as lra_get_insn_recog_data (insn)->sp_offset value is 0, we will indeed add 0xb to the original 0x4c offset. 0 value is suspicious because it is default. We might have not set up it from neighbor insns. So, here I don't get if it is the sp_offset value of the lra_insn_recog_data element which is not well updated or if lra_ eliminate_regs_1 has to be called with update_p and not full_p (which fixed the value in that particular case). Is it more obvious for you ? Yvan, could you send me the reduced preprocessed case and the options for cc1 to reproduce it.
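For reference, a simplified standalone model of the full_p distinction described above (hypothetical types and made-up offsets, not the lra-eliminations.c code): an elimination records both the full offset between the register pair and the offset applied in the previous round, and full_p selects whether the whole offset or only the change since the last round is added.

/* Toy model (hypothetical types and offsets, not lra-eliminations.c): the
   full offset between the eliminated register pair vs. only the change
   since the previous round.  REG_EQUIV values are recorded before any
   elimination, so they always need the full offset.  */
#include <stdio.h>

struct elimination
{
  long offset;           /* full offset between from-reg and to-reg */
  long previous_offset;  /* offset already applied in the previous round */
};

static long
apply_elimination (long displacement, const struct elimination *ep, int full_p)
{
  long adjust = full_p ? ep->offset : ep->offset - ep->previous_offset;
  return displacement + adjust;
}

int
main (void)
{
  struct elimination sfp_to_sp = { 0xb0, 0xa8 };  /* made-up numbers */

  /* Address already rewritten in a previous round: apply only the delta.  */
  printf ("update: 0x%lx\n", apply_elimination (0x4c, &sfp_to_sp, 0));

  /* Stored equivalence, still pre-elimination: apply the full offset.  */
  printf ("full:   0x%lx\n", apply_elimination (0x4c, &sfp_to_sp, 1));
  return 0;
}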
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
Can you investigate why it is creating multiple clones? On Dec 11, 2013 10:06 AM, "Iyer, Balaji V" wrote: > > > > > -Original Message- > > From: Aldy Hernandez [mailto:ald > -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Wednesday, December 11, 2013 12:38 PM > To: Iyer, Balaji V > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > On 12/11/13 09:31, Iyer, Balaji V wrote: > > > > > >> -Original Message- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > >> Sent: Tuesday, December 10, 2013 1:03 PM > >> To: Iyer, Balaji V > >> Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > >> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > >> Elemental functions) for C > >> > >> > But aren't both OpenMP and Cilk Plus simd clones marked as "omp > declare simd"? In which case you shouldn't have to do anything? > Are the Cilk Plus clones not being marked as "omp declare simd" in > the front-ends? > > >>> > >>> Didn't you mention in this thread > >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that Cilk > >>> Plus SIMD-enabled functions must be marked as "cilk plus elementals" > >>> (as it wsa called then)? Did I misunderstand you? > >> > >> Both OpenMP and Cilk Plus clones should be marked with "omp declare > >> simd". But Cilk Plus should _also_ be marked with "cilk plus > >> elementals" (or "cilk simd function" now). In which case the > >> aforementioned function doesn't require any changes because Cilk Plus > >> clones will be seen as they are marked with "omp declare simd". > >> > >> So... > >> > >>OpenMP declare simd gets tagged as: > >>"omp declare simd" > >>Cilk Plus vector functions gets tagged as: > >>"omp declare simd" > >>"cilk simd function" > >> > > > > Ok, so you want the same clauses included in both omp declare simd and > cilk simd function tree lists? > > > > For example in the c-parser.c, I have something like this: > > > >if (is_cilkplus_cilk_simd_fn) > > c = build_tree_list (get_identifier ("cilk simd function"), c); > >else > > c = build_tree_list (get_identifier ("omp declare simd"), c); > >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > >DECL_ATTRIBUTES (fndecl) = c; > > > > > > You want to change it something like this? > > > > > > If (is_cilkplus_cilk_simd_fn) > >{ > > tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), > > c); > > TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); > > DECL_ATTRIBUTES (fndecl) = c_cilk; > > } > > c = build_tree_list (get_identififer ("omp declare simd"), c); > > TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) = > > c; > > > > > > Yes. The issue with doing this is that it is creating duplicate clones. If I just kept the patch as is, it does not. For example: __attribute__((vector (vectorlength(sizeof(int) int func3 (int x) { return x; } Should create two clones: mask and unmask (_ZGVbN4v_func3 and _ZGVbM4v_func3). It is doing that if I kept the patch AS-IS. Now, if I make the modification that I mentioned above and remove the handling of cilk simd function in omp-low.c it is creating 6 clones: _ZGVdM4v_func3, _ZGVbN4v_func3, _ZGVcN4v_func3, _ZGVcM4v_func3, _ZGVdN4v_func3 Thanks, Balaji V. Iyer.
RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C
Hi Aldy, > -Original Message- > From: Aldy Hernandez [mailto:al...@redhat.com] > Sent: Wednesday, December 11, 2013 1:27 PM > To: Iyer, Balaji V > Cc: Jakub Jelinek; 'gcc-patches@gcc.gnu.org' > Subject: RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > Elemental functions) for C > > Can you investigate why it is creating multiple clones? > I can and will let you know as soon as I find it. I am working on GOMP4 branch, could that cause anything? Just out of curiosity, why can't I keep it as-is? It is giving the correct output/behavior and doesn't seem to interfere with anything else. The only extra thing I am doing is to add an extra if-statement while recursing through all the functions to check for cilk simd function and then calling the function to handle it, which the OMP will have to do anyway. The only extra thing I added was an extra if-statement. Please don't read my above paragraph as an argument. I am just asking for clarification and to understand more about the underlying routines. Thanks, Balaji V. Iyer. > On Dec 11, 2013 10:06 AM, "Iyer, Balaji V" wrote: > > > > > > > > > -Original Message- > > > From: Aldy Hernandez [mailto:ald > > > -Original Message- > > From: Aldy Hernandez [mailto:al...@redhat.com] > > Sent: Wednesday, December 11, 2013 12:38 PM > > To: Iyer, Balaji V > > Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > > Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly > > Elemental functions) for C > > > > On 12/11/13 09:31, Iyer, Balaji V wrote: > > > > > > > > >> -Original Message- > > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > >> ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez > > >> Sent: Tuesday, December 10, 2013 1:03 PM > > >> To: Iyer, Balaji V > > >> Cc: 'Jakub Jelinek'; 'gcc-patches@gcc.gnu.org' > > >> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions > > >> (formerly Elemental functions) for C > > >> > > >> > > But aren't both OpenMP and Cilk Plus simd clones marked as "omp > > declare simd"? In which case you shouldn't have to do anything? > > Are the Cilk Plus clones not being marked as "omp declare simd" > > in the front-ends? > > > > >>> > > >>> Didn't you mention in this thread > > >>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03506.html) that > > >>> Cilk Plus SIMD-enabled functions must be marked as "cilk plus > elementals" > > >>> (as it wsa called then)? Did I misunderstand you? > > >> > > >> Both OpenMP and Cilk Plus clones should be marked with "omp declare > > >> simd". But Cilk Plus should _also_ be marked with "cilk plus > > >> elementals" (or "cilk simd function" now). In which case the > > >> aforementioned function doesn't require any changes because Cilk > > >> Plus clones will be seen as they are marked with "omp declare simd". > > >> > > >> So... > > >> > > >> OpenMP declare simd gets tagged as: > > >> "omp declare simd" > > >> Cilk Plus vector functions gets tagged as: > > >> "omp declare simd" > > >> "cilk simd function" > > >> > > > > > > Ok, so you want the same clauses included in both omp declare simd > > > and > > cilk simd function tree lists? > > > > > > For example in the c-parser.c, I have something like this: > > > > > >if (is_cilkplus_cilk_simd_fn) > > > c = build_tree_list (get_identifier ("cilk simd function"), c); > > >else > > > c = build_tree_list (get_identifier ("omp declare simd"), c); > > >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > > >DECL_ATTRIBUTES (fndecl) = c; > > > > > > > > > You want to change it something like this? 
> > > > > > > > > If (is_cilkplus_cilk_simd_fn) > > >{ > > > tree c_cilk = build_tree_list (get_identifier ("cilk simd function"), > > > c); > > > TREE_CHAIN (c_cilk) = DECL_ATTRIBUTES (fndecl); > > > DECL_ATTRIBUTES (fndecl) = c_cilk; > > > } > > > c = build_tree_list (get_identififer ("omp declare simd"), c); > > > TREE_CHAIN (c) =DECL_ATTRIBUTES (fndecl); DECL_ATTRIBUTES (fndecl) > = > > > c; > > > > > > > > > > Yes. > > The issue with doing this is that it is creating duplicate clones. If I just > kept the > patch as is, it does not. > > For example: > > __attribute__((vector (vectorlength(sizeof(int) int func3 (int x) { > return x; > } > > Should create two clones: mask and unmask (_ZGVbN4v_func3 and > _ZGVbM4v_func3). It is doing that if I kept the patch AS-IS. > > Now, if I make the modification that I mentioned above and remove the > handling of cilk simd function in omp-low.c it is creating 6 clones: > _ZGVdM4v_func3, _ZGVbN4v_func3, _ZGVcN4v_func3, _ZGVcM4v_func3, > _ZGVdN4v_func3 > > Thanks, > > Balaji V. Iyer.
Re: patch for elimination to SP when it is changed in RTL (PR57293)
On 11 December 2013 19:25, Vladimir Makarov wrote: > On 12/11/2013, 5:35 AM, Yvan Roux wrote: >> >> Hi Vladimir, >> >> I've some regressions on ARM after this SP elimination patch, and they >> are execution failures. Here is the list: >> >> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc -O3 -fcilkplus >> gcc.c-torture/execute/va-arg-22.c -O2 >> gcc.dg/atomic/c11-atomic-exec-5.c -O0 >> gfortran.dg/direct_io_12.f90 -O[23] >> gfortran.dg/elemental_dependency_1.f90 -O2 >> gfortran.dg/matmul_2.f90 -O2 >> gfortran.dg/matmul_6.f90 -O2 >> gfortran.dg/mvbits_7.f90 -O3 >> gfortran.dg/unlimited_polymorphic_1.f03 -O3 >> >> I reduced and looked at var-arg-22.c and the issue is that in >> lra_eliminate_regs_1 (called by get_equiv_with_elimination) we >> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset. What >> we try to do here is to change the pseudo 195 of the insn 118 below : >> >> (insn 118 114 112 8 (set (reg:DI 195) >> (unspec:DI [ >> (mem:DI (plus:SI (reg/f:SI 215) >> (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12 >> + 64B]+8 S8 A8]) >> ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi} >> (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192) >> (const_int 8 [0x8])) [7 a35+8 S8 A32]) >> (nil))) >> >> with its equivalent (x arg of lra_eliminate_regs_1): >> >> (mem/c:DI (plus:SI (reg/f:SI 102 sfp) >> (const_int 76 [0x4c])) [7 a35+8 S8 A32]) >> >> lra_eliminate_regs_1 is called with full_p = true (it is not really >> clear for what it means), > > > It means we use full offset between the regs, otherwise we use change in the > full offset from the previous iteration (it can be changed as we reserve > stack memory for spilled pseudos and the reservation can be done several > times). As equiv value is stored as it was before any elimination, we need > always to use full offset to make elimination. Ok thanks it's clearer. > but in the PLUS switch case, we have offset >> >> = 0xb (given by ep->offset) and as lra_get_insn_recog_data >> (insn)->sp_offset value is 0, we will indeed add 0xb to the original >> 0x4c offset. >> > > 0 value is suspicious because it is default. We might have not set up it > from neighbor insns. > > > >> So, here I don't get if it is the sp_offset value of the >> lra_insn_recog_data element which is not well updated or if lra_ >> eliminate_regs_1 has to be called with update_p and not full_p (which >> fixed the value in that particular case). Is it more obvious for you >> ? >> > > Yvan, could you send me the reduced preprocessed case and the options for > cc1 to reproduce it. Here is cc1 command line : cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard -mfpu=neon -mthumb v2.c -O2 I use a native build on a chromebook, but it's reproducible with a cross compiler. With the attached test case the issue is when processing insn 118. Thanks, Yvan typedef __builtin_va_list __gnuc_va_list; typedef __gnuc_va_list va_list; extern void abort (void); extern void exit (int); void bar (int n, int c) { static int lastn = -1, lastc = -1; if (lastn != n) { if (lastc != lastn) abort (); lastc = 0; lastn = n; } if (c != (char) (lastc ^ (n << 3))) abort (); lastc++; } typedef struct { char x[31]; } A31; typedef struct { char x[32]; } A32; typedef struct { char x[35]; } A35; typedef struct { char x[72]; } A72; void foo (int size, ...) 
{
  A31 a31;
  A32 a32;
  A35 a35;
  A72 a72;
  va_list ap;
  int i;

  if (size != 21)
    abort ();
  __builtin_va_start(ap,size);
  a31 = __builtin_va_arg(ap,typeof (a31));
  for (i = 0; i < 31; i++)
    bar (31, a31.x[i]);
  a32 = __builtin_va_arg(ap,typeof (a32));
  for (i = 0; i < 32; i++)
    bar (32, a32.x[i]);
  a35 = __builtin_va_arg(ap,typeof (a35));
  for (i = 0; i < 35; i++)
    bar (35, a35.x[i]);
  a72 = __builtin_va_arg(ap,typeof (a72));
  for (i = 0; i < 72; i++)
    bar (72, a72.x[i]);
  __builtin_va_end(ap);
}

int main (void)
{
  A31 a31;
  A32 a32;
  A35 a35;
  A72 a72;
  int i;

  for (i = 0; i < 31; i++)
    a31.x[i] = i ^ (31 << 3);
  for (i = 0; i < 32; i++)
    a32.x[i] = i ^ (32 << 3);
  for (i = 0; i < 35; i++)
    a35.x[i] = i ^ (35 << 3);
  for (i = 0; i < 72; i++)
    a72.x[i] = i ^ (72 << 3);
  foo (21, a31, a32, a35, a72);
  exit (0);
}
[PATCH] Enable Cilk keywords in Cilk Runtime
Hello Everyone, Since we have _Cilk_spawn and _Cilk_sync support in C++ compiler, we can enable the keyword usage in runtime. This patch should do so. Is it Ok to install? Here are the ChangeLog entries: 2013-12-11 Balaji V. Iyer * Makefile.am (GENERAL_FLAGS): Removed un-defining of Cilk keywords. * Makefile.in: Regenerate. * runtime/symbol_test.c: Added a #define to clear out _Cilk_for. Thanks, Balaji V. Iyer. Here is the patch. diff --git a/libcilkrts/Makefile.am b/libcilkrts/Makefile.am index 56bc9eb..d252087 100644 --- a/libcilkrts/Makefile.am +++ b/libcilkrts/Makefile.am @@ -38,7 +38,7 @@ ACLOCAL_AMFLAGS = -I .. -I ../config # Compiler and linker flags. GENERAL_FLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/runtime -I$(top_srcdir)/runtime/config/$(config_dir) -DIN_CILK_RUNTIME=1 -GENERAL_FLAGS += -D_Cilk_spawn="" -D_Cilk_sync="" -D_Cilk_for=for +# GENERAL_FLAGS += -D_Cilk_spawn="" -D_Cilk_sync="" -D_Cilk_for=for # Enable Intel Cilk Plus extension GENERAL_FLAGS += -fcilkplus diff --git a/libcilkrts/Makefile.in b/libcilkrts/Makefile.in index 5066bef..94902e1 100644 --- a/libcilkrts/Makefile.in +++ b/libcilkrts/Makefile.in @@ -342,12 +342,12 @@ AUTOMAKE_OPTIONS = foreign ACLOCAL_AMFLAGS = -I .. -I ../config # Compiler and linker flags. +# GENERAL_FLAGS += -D_Cilk_spawn="" -D_Cilk_sync="" -D_Cilk_for=for # Enable Intel Cilk Plus extension GENERAL_FLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/runtime \ -I$(top_srcdir)/runtime/config/$(config_dir) \ - -DIN_CILK_RUNTIME=1 -D_Cilk_spawn="" -D_Cilk_sync="" \ - -D_Cilk_for=for -fcilkplus + -DIN_CILK_RUNTIME=1 -fcilkplus AM_CFLAGS = $(GENERAL_FLAGS) -std=c99 AM_CPPFLAGS = $(GENERAL_FLAGS) AM_LDFLAGS = -lpthread diff --git a/libcilkrts/runtime/symbol_test.c b/libcilkrts/runtime/symbol_test.c index 1113ecd..8291d36 100644 --- a/libcilkrts/runtime/symbol_test.c +++ b/libcilkrts/runtime/symbol_test.c @@ -41,6 +41,7 @@ * will cause a linker error. */ +#define _Cilk_for for extern void* __cilkrts_global_state; void *volatile p;
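For reference, a minimal sketch of the keyword usage the runtime sources can now rely on once the -D_Cilk_spawn="" style defines are gone (not taken from the runtime sources; it only builds with a compiler that accepts -fcilkplus, which is exactly what the Makefile change assumes).

/* Minimal _Cilk_spawn/_Cilk_sync usage sketch (not taken from the runtime
   sources); it requires a compiler built with -fcilkplus support.  */
#include <stdio.h>

static int
fib (int n)
{
  if (n < 2)
    return n;
  int x = _Cilk_spawn fib (n - 1);  /* may run in parallel */
  int y = fib (n - 2);
  _Cilk_sync;                       /* wait for the spawned call */
  return x + y;
}

int
main (void)
{
  printf ("fib (10) = %d\n", fib (10));
  return 0;
}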
Re: [REPOST] Invalid Code when reading from unaligned zero-sized array
> Yes we do, even for struct { struct { int a; char a[1] } }; (note the not > really "trailing" as there is padding after the trailing array). We do > take size limitations from a DECL (if we see one) into account to limit the > effect of this trailing-array-supporting, so it effectively only applies to > indirect accesses (and the padding example above, you can use the whole > padding if DECL_SIZE allows that). OK, so we want the attached patch? FWIW it passed make -k check-c check-c++ RUNTESTFLAGS="compat.exp struct-layout-1.exp" on x86/Linux, x86-64/Linux, PowerPC/Linux [*], IA-64/Linux, SPARC/Solaris and SPARC64/Solaris with ALT_CC_UNDER_TEST set to the unpatched compiler. [*] the failures (DFP related) are the same as with the unpatched compiler. -- Eric BotcazouIndex: stor-layout.c === --- stor-layout.c (revision 205881) +++ stor-layout.c (working copy) @@ -1605,6 +1605,15 @@ compute_record_mode (tree type) || ! tree_fits_uhwi_p (DECL_SIZE (field))) return; + /* As a GNU extension, we support out-of-bounds accesses for a trailing + array in a record type. In this case, if the element type has a non + zero size, then the record type effectively has variable size so it + needs to have BLKmode. */ + if (!DECL_CHAIN (field) + && TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE + && !integer_zerop (TYPE_SIZE (TREE_TYPE (TREE_TYPE (field) + return; + /* If this field is the whole struct, remember its mode so that, say, we can put a double in a class into a DF register instead of forcing it to live in the stack. */
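For reference, the GNU extension in question is the classic trailing-array idiom; a minimal standalone example (not from the testsuite) of the kind of access the BLKmode change is meant to keep working:

/* The trailing-array idiom the compute_record_mode change is about
   (standalone example, not from the testsuite): the object is declared with
   a one-element trailing array but allocated with extra storage, and
   elements past data[0] are accessed through a pointer.  */
#include <stdlib.h>
#include <stdio.h>

struct S
{
  int count;
  char data[1];   /* trailing array; really 'count' bytes at run time */
};

int
main (void)
{
  int n = 16;
  struct S *s = (struct S *) malloc (sizeof (struct S) + n - 1);
  int i;

  s->count = n;
  for (i = 0; i < s->count; i++)
    s->data[i] = (char) i;          /* runs past the declared bound */
  printf ("%d\n", s->data[10]);
  free (s);
  return 0;
}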
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 12:14 PM, Jan Hubicka wrote:
>> Is ipa_passes the right place to initialize calls_comdat_local?
>
> The flag is probably needed in both the early inliner and the IPA inliner. A conservative place to initialize it would be in inline_analyze_function. (The early inliner analyzes a function twice, first before inlining and next after early optimization, so the update should also clear the flag if the call disappeared.)

Unfortunately early inlining doesn't call inline_analyze_function on all functions, so we need to initialize it somewhere else. Is there a reason not to set up an initial value in ipa_passes?

> I think we also want to clear it if a call to a comdat-local function is marked inline.

Makes sense.

Jason
Re: [RFA][PATCH][PR tree-optimization/45685]
On 12/11/13 02:51, Richard Biener wrote: That looks wrong - you want to look at HONOR_NANS on the mode of one of the comparison operands, not of the actual value you want to negate (it's integer and thus never has NaNs). The rest of the patch looks ok to me. Here's the updated version. It addresses the two issues you raised. Specifically it adds the hairy condition to avoid this code in cases where it's not likely to be useful and it fixes the call to invert_tree_comparison. Bootstrapped and regression tested on x86_64-unknown-linux-gnu OK for the trunk? jeff PR tree-optimization/45685 * tree-ssa-phiopt.c (neg_replacement): New function. (tree_ssa_phiopt_worker): Call it. PR tree-optimization/45685 * gcc.dg/tree-ssa/pr45685.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45685.c b/gcc/testsuite/gcc.dg/tree-ssa/pr45685.c new file mode 100644 index 000..0628943 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45685.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-phiopt1-details" } */ + +typedef unsigned long int uint64_t; +typedef long int int64_t; +int summation_helper_1(int64_t* products, uint64_t count) +{ + int s = 0; + uint64_t i; + for(i=0; i0) ? 1 : -1; + products[i] *= val; + if(products[i] != i) + val = -val; + products[i] = val; + s += val; + } + return s; +} + + +int summation_helper_2(int64_t* products, uint64_t count) +{ + int s = 0; + uint64_t i; + for(i=0; i0) ? 1 : -1; + products[i] *= val; + if(products[i] != i) + val = -val; + products[i] = val; + s += val; + } + return s; +} + +/* { dg-final { scan-tree-dump-times "converted to straightline code" 2 "phiopt1" } } */ +/* { dg-final { cleanup-tree-dump "phiopt1" } } */ + diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index 11e565f..96154fb 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -69,6 +69,8 @@ static bool minmax_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); static bool abs_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); +static bool neg_replacement (basic_block, basic_block, +edge, edge, gimple, tree, tree); static bool cond_store_replacement (basic_block, basic_block, edge, edge, struct pointer_set_t *); static bool cond_if_else_store_replacement (basic_block, basic_block, basic_block); @@ -336,6 +338,23 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) /* Calculate the set of non-trapping memory accesses. */ nontrap = get_non_trapping (); + /* The replacement of conditional negation with a non-branching + sequence is really only a win when optimizing for speed and we + can avoid transformations by gimple if-conversion that result + in poor RTL generation. + + Ideally either gimple if-conversion or the RTL expanders will + be improved and the code to emit branchless conditional negation + can be removed. */ + bool replace_conditional_negation = false; + if (!do_store_elim) +replace_conditional_negation + = ((!optimize_size && optimize >= 2) +|| (((flag_tree_loop_vectorize || cfun->has_force_vect_loops) + && flag_tree_loop_if_convert != 0) +|| flag_tree_loop_if_convert == 1 +|| flag_tree_loop_if_convert_stores == 1)); + /* Search every basic block for COND_EXPR we may be able to optimize. 
We walk the blocks in order that guarantees that a block with @@ -489,6 +508,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) cfgchanged = true; else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; + else if (replace_conditional_negation + && neg_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) + cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; } @@ -1285,6 +1307,143 @@ abs_replacement (basic_block cond_bb, basic_block middle_bb, return true; } +/* The function neg_replacement replaces conditional negation with +equivalent straight line code. Returns TRUE if replacement is done, +otherwise returns FALSE. + +COND_BB branches around negation occuring in MIDDLE_BB. + +E0 and E1 are edges out of COND_BB. E0 reaches MIDDLE_BB and +E1 reaches the other successor which should contain PHI with +arguments ARG0 and ARG1. + +Assuming negation is to occur when the condition is true, +then the non-branching sequence is: + + result = (rhs ^ -cond) + cond + +Inverting t
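For reference, the identity used by neg_replacement can be checked in isolation; a standalone test (hypothetical example, not part of the patch) of (rhs ^ -cond) + cond against cond ? -rhs : rhs for cond in {0, 1}:

/* Standalone check (not part of the patch) of the branchless identity:
   (rhs ^ -cond) + cond == (cond ? -rhs : rhs) for cond in {0, 1}, relying
   on two's-complement negation.  */
#include <stdint.h>
#include <stdio.h>

static int64_t
cond_neg_branchless (int64_t rhs, int64_t cond)
{
  return (rhs ^ -cond) + cond;
}

int
main (void)
{
  int64_t vals[] = { 0, 1, -1, 42, -42, 123456789, -987654321 };
  unsigned int i;
  int64_t cond;

  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i++)
    for (cond = 0; cond <= 1; cond++)
      {
        int64_t expect = cond ? -vals[i] : vals[i];
        if (cond_neg_branchless (vals[i], cond) != expect)
          {
            printf ("mismatch for rhs=%lld cond=%lld\n",
                    (long long) vals[i], (long long) cond);
            return 1;
          }
      }
  printf ("identity holds for all tested values\n");
  return 0;
}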
Re: [PATCH] Enable Cilk keywords in Cilk Runtime
On 12/11/13 12:04, Iyer, Balaji V wrote: Hello Everyone, Since we have _Cilk_spawn and _Cilk_sync support in C++ compiler, we can enable the keyword usage in runtime. This patch should do so. Is it Ok to install? Here are the ChangeLog entries: 2013-12-11 Balaji V. Iyer * Makefile.am (GENERAL_FLAGS): Removed un-defining of Cilk keywords. * Makefile.in: Regenerate. * runtime/symbol_test.c: Added a #define to clear out _Cilk_for. This is fine. jeff
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 12:45 PM, Jan Hubicka wrote:
> I wonder if we won't end up with better code going the opposite way. We can declare those functions static first and then have a post-inliner IPA pass that will travel the callgraph and mark all static functions/variables that are accessed only from one comdat to be comdat-local.

With that scheme I would be concerned that we would tend to inline the wrappers so that we can't move the worker function into the comdat, and we end up with copies of it in lots of object files.

Jason
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 12:45 PM, Jan Hubicka wrote:
> > I wonder if we won't end up with better code going the opposite way. We can declare those functions static first and then have a post-inliner IPA pass that will travel the callgraph and mark all static functions/variables that are accessed only from one comdat to be comdat-local.
>
> With that scheme I would be concerned that we would tend to inline the wrappers so that we can't move the worker function into the comdat, and we end up with copies of it in lots of object files.

Hmm, you are right. The inliner and ipa-cp will need a cost model for dragging stuff local in any case. Perhaps we even want to "privatize" as much as possible prior to inlining. I will play with that incrementally. Let's go with the minimal version of the patch that makes things work, and I will take care of solving the inliner limitations.

Honza

> Jason
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
> On 12/11/2013 12:14 PM, Jan Hubicka wrote:
> >> Is ipa_passes the right place to initialize calls_comdat_local?
> >
> > The flag is probably needed in both the early inliner and the IPA inliner. A conservative place to initialize it would be in inline_analyze_function. (The early inliner analyzes a function twice, first before inlining and next after early optimization, so the update should also clear the flag if the call disappeared.)
>
> Unfortunately early inlining doesn't call inline_analyze_function on all functions, so we need to initialize it somewhere else. Is there a reason not to set up an initial value in ipa_passes?

Well, the fewer things we do behind the pass manager's back, the better. ipa_passes should really just set up the compiler to execute the IPA passes; it should not do any analysis by itself. Every function we inline or clone needs to go through the inliner's analysis. But looking into it now, the place is probably compute_inline_parameters, since inline_analyze_function is just a wrapper for the IPA pass, but pass_data_inline_parameters bypasses it (this is a bit confusing - I will clean it up).

Honza

> > I think we also want to clear it if a call to a comdat-local function is marked inline.
>
> Makes sense.
>
> Jason
Re: [PATCH] Enable Cilk keywords in Cilk Runtime
On Wed, Dec 11, 2013 at 11:04 AM, Iyer, Balaji V wrote: > Hello Everyone, > Since we have _Cilk_spawn and _Cilk_sync support in C++ compiler, we > can enable the keyword usage in runtime. This patch should do so. > > Is it Ok to install? > > Here are the ChangeLog entries: > > 2013-12-11 Balaji V. Iyer > > * Makefile.am (GENERAL_FLAGS): Removed un-defining of Cilk keywords. > * Makefile.in: Regenerate. > * runtime/symbol_test.c: Added a #define to clear out _Cilk_for. Those tests failed on Linux/i686. H.J. --- FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -g -O2 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -g -O3 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O1 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O2 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test FAIL: g++.dg/cilk-plus/CK/catch_exc.cc -O3 -fcilkplus -L/export/gnu/import/git/gcc-test-intel64corei7/bld/x86_64-unknown-linux-gnu/32/libcilkrts/.libs execution test
C++ edge_iterator (was: Re: [SH] PR 53976 - Add RTL pass to eliminate clrt, sett insns)
Same message, but now with the correct patch attached. Sorry. On Thu, 2013-11-21 at 00:04 +0100, Steven Bosscher wrote: > Declaring the edge_iterator inside the for() is not a good argument > against FOR_EACH_EDGE. Of course, brownie points are up for grabs for > the brave soul daring enough to make edge iterators be proper C++ > iterators... ;-) So, I gave it a try -- see the attached patch. It allows edge iteration to look more like STL container iteration: for (basic_block::edge_iterator ei = bb->pred_edges ().begin (); ei != bb->pred_edges ().end (); ++ei) { basic_block pred_bb = (*ei)->src; ... } The idea is to first make class vec look more like an STL container. This would also allow iterating it with std::for_each and use C++11 range based for loop (in a few years maybe). It would also make it easier to replace vec uses with STL containers if needed and vice versa. Then the typedef struct basic_block_def* basic_block; is replaced with a wrapper class 'basic_block', which is just a simple POD wrapper around a basic_block_def*. There should be no penalties compared to passing/storing raw pointers. Because of the union with constructor restriction of C++98 an additional wrapper class 'basic_block_in_union' is required, which doesn't have any constructors defined. Having 'basic_block' as a class allows putting typedefs for the edge iterator types in there (initially I tried putting the typedefs into struct basic_block_def, but gengtype would bail out). It would also be possible to have a free standing definition / typedef of edge_iterator, but it would conflict with the existing one and require too many changes at once. Moreover, the iterator type actually depends on the container type, which is vec, and the container type is defined/selected by the basic_block class. The following basic_block pred_bb = (*ei)->src; can also be written as basic_block pred_bb = ei->src; after converting the edge typedef to a wrapper of edge_def*. The idea of the approach is to allow co-existence of the new edge_iterator and the old and thus be able to gradually convert code. The wrappers around raw pointers also helo encapsulating the underlying memory management issues. For example, it would be much easier to replace garbage collected objects with intrusive reference counting. Comments and feedback appreciated. Cheers, Oleg Index: gcc/config/sh/sh_optimize_sett_clrt.cc === --- gcc/config/sh/sh_optimize_sett_clrt.cc (revision 205866) +++ gcc/config/sh/sh_optimize_sett_clrt.cc (working copy) @@ -390,10 +390,10 @@ { prev_visited_bb.push_back (bb); - for (edge_iterator ei = ei_start (bb->preds); !ei_end_p (ei); - ei_next (&ei)) + for (basic_block::edge_iterator ei = bb->pred_edges ().begin (); + ei != bb->pred_edges ().end (); ++ei) { - basic_block pred_bb = ei_edge (ei)->src; + basic_block pred_bb = (*ei)->src; pred_bb_count += 1; find_last_ccreg_values (BB_END (pred_bb), pred_bb, values_out, prev_visited_bb); Index: gcc/dumpfile.c === --- gcc/dumpfile.c (revision 205866) +++ gcc/dumpfile.c (working copy) @@ -25,6 +25,7 @@ #include "tree.h" #include "gimple-pretty-print.h" #include "context.h" +#include "basic-block2.h" /* If non-NULL, return one past-the-end of the matching SUBPART of the WHOLE string. */ Index: gcc/tracer.c === --- gcc/tracer.c (revision 205866) +++ gcc/tracer.c (working copy) @@ -102,7 +102,7 @@ /* A transaction is a single entry multiple exit region. It must be duplicated in its entirety or not at all. 
*/ - g = last_stmt (CONST_CAST_BB (bb)); + g = last_stmt (basic_block (bb)); if (g && gimple_code (g) == GIMPLE_TRANSACTION) return true; Index: gcc/dominance.c === --- gcc/dominance.c (revision 205866) +++ gcc/dominance.c (working copy) @@ -503,6 +503,8 @@ TBB v, w, k, par; basic_block en_block; edge_iterator ei, einext; + TBB k1; + basic_block b; if (reverse) en_block = EXIT_BLOCK_PTR_FOR_FN (cfun); @@ -538,9 +540,6 @@ semidominator. */ while (!ei_end_p (ei)) { - TBB k1; - basic_block b; - e = ei_edge (ei); b = (reverse) ? e->dest : e->src; einext = ei; Index: gcc/vec.h === --- gcc/vec.h (revision 205866) +++ gcc/vec.h (working copy) @@ -482,6 +482,15 @@ void quick_grow (unsigned len); void quick_grow_cleared (unsigned len); + /* STL like iterator interface. */ + typedef T* iterator; + typedef const T* const_iterator; + + iterator begin (void) { return &m_vecdata[0]; } + iterator end (void) { return &m_vecdata[m_vecpfx.m_num]; } + const_iterator begin (void) const { return &m_vecdata[0]; } + const_i
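For reference, a self-contained sketch of the pattern the patch introduces (simplified stand-ins, not the GCC vec or basic_block classes): the container gains begin()/end(), and a thin wrapper around the raw pointer exposes the edge container plus its iterator type so call sites can iterate STL-style.

/* Self-contained sketch (simplified stand-ins, not the GCC vec or
   basic_block classes) of the pattern in the patch.  */
#include <stdio.h>

struct edge_def
{
  int src_index;                /* stands in for the source block */
};
typedef edge_def *edge;

struct edge_vec                 /* stands in for vec<edge, va_gc> */
{
  edge m_data[4];
  unsigned m_num;

  typedef edge *iterator;
  iterator begin (void) { return &m_data[0]; }
  iterator end (void) { return &m_data[m_num]; }
};

struct basic_block_def
{
  edge_vec preds;
};

struct basic_block              /* thin wrapper around basic_block_def* */
{
  basic_block_def *m_bb;
  basic_block (basic_block_def *bb) : m_bb (bb) {}

  typedef edge_vec::iterator edge_iterator;
  edge_vec &pred_edges (void) const { return m_bb->preds; }
};

int
main (void)
{
  edge_def e0 = { 3 };
  edge_def e1 = { 7 };
  basic_block_def bb_storage = { { { &e0, &e1 }, 2 } };
  basic_block bb (&bb_storage);

  for (basic_block::edge_iterator ei = bb.pred_edges ().begin ();
       ei != bb.pred_edges ().end (); ++ei)
    printf ("pred edge from block %d\n", (*ei)->src_index);
  return 0;
}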
[Patch, Fortran, OOP] PR 59450: ICE for type-bound-procedure expression in module procedure interface
Hi all, the PR in the subject line involves a module procedure whose result has array bounds which contain a type-bound procedure call. Since the procedure interface is written to the .mod file, also the TBP expression needs to be written. This leads to an ICE, since it simply has not been implemented yet. When writing a function reference, we now discriminate between three cases: (1) 'ordinary' procedures, (2) type-bound procedures and (3) intrinsic procedures. We first read/write an integer value to indicate which case we're dealing with, and then do the specific I/O for this case (see patch). Up to now we basically did the same already, but only including two cases (ordinary and intrinsic functions). The patch has been regtested on x86_64-unknown-linux-gnu. Ok for trunk? Should I also bump the MOD_VERSION? Cheers, Janus 2013-12-11 Janus Weil PR fortran/59450 * module.c (mio_expr): Handle type-bound function expressions. 2013-12-11 Janus Weil PR fortran/59450 * gfortran.dg/typebound_proc_31.f90: New. Index: gcc/fortran/module.c === --- gcc/fortran/module.c(revision 205857) +++ gcc/fortran/module.c(working copy) @@ -3358,12 +3358,24 @@ mio_expr (gfc_expr **ep) { e->value.function.name = mio_allocated_string (e->value.function.name); - flag = e->value.function.esym != NULL; + if (e->value.function.esym) + flag = 1; + else if (e->ref) + flag = 2; + else + flag = 0; mio_integer (&flag); - if (flag) - mio_symbol_ref (&e->value.function.esym); - else - write_atom (ATOM_STRING, e->value.function.isym->name); + switch (flag) + { + case 1: + mio_symbol_ref (&e->value.function.esym); + break; + case 2: + mio_ref_list (&e->ref); + break; + default: + write_atom (ATOM_STRING, e->value.function.isym->name); + } } else { @@ -3372,10 +3384,15 @@ mio_expr (gfc_expr **ep) free (atom_string); mio_integer (&flag); - if (flag) - mio_symbol_ref (&e->value.function.esym); - else + switch (flag) { + case 1: + mio_symbol_ref (&e->value.function.esym); + break; + case 2: + mio_ref_list (&e->ref); + break; + default: require_atom (ATOM_STRING); e->value.function.isym = gfc_find_function (atom_string); free (atom_string); ! { dg-do compile } ! ! PR 59450: [OOP] ICE for type-bound-procedure expression in module procedure interface ! ! Contributed by module classes implicit none type :: base_class contains procedure, nopass :: get_num end type contains pure integer function get_num() end function function get_array( this ) result(array) class(base_class), intent(in) :: this integer, dimension( this%get_num() ) :: array end function end module ! { dg-final { cleanup-modules "classes" } }
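For reference, a standalone sketch of the tag-then-payload scheme the module I/O change relies on (not the real gfortran module format): write a small integer flag first, then the payload matching that flag, and dispatch on the same flag when reading the module back.

/* Standalone sketch of the tag-then-payload scheme (not the real gfortran
   module format).  */
#include <stdio.h>

enum fn_kind { FN_INTRINSIC = 0, FN_EXTERNAL = 1, FN_TYPEBOUND = 2 };

static void
write_function_ref (FILE *f, enum fn_kind kind, const char *payload)
{
  fprintf (f, "%d %s\n", (int) kind, payload);
}

static void
read_function_ref (FILE *f)
{
  int flag;
  char payload[64];

  if (fscanf (f, "%d %63s", &flag, payload) != 2)
    return;
  switch (flag)
    {
    case FN_EXTERNAL:
      printf ("resolve the symbol reference: %s\n", payload);
      break;
    case FN_TYPEBOUND:
      printf ("rebuild the reference chain: %s\n", payload);
      break;
    default:
      printf ("look up the intrinsic: %s\n", payload);
      break;
    }
}

int
main (void)
{
  FILE *f = tmpfile ();

  if (!f)
    return 1;
  write_function_ref (f, FN_TYPEBOUND, "this%get_num");
  rewind (f);
  read_function_ref (f);
  fclose (f);
  return 0;
}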
[PATCH][PR rtl-optimization/59446] Don't pessimize threading through loop exits
59446 is a missed-optimization error where we want to thread through multiple loop exits and that request is being cancelled. The problem is I was looking at changes in the loop_father as a proxy for crossing loop headers. That was utterly dumb as we can test directly for crossing a loop header. Anyway, we saw the changes in the loop structure, figured we were threading through too mean headers and truncated the jump thread. Dumb. Fixing that happens to clean up the code ever-so-slightly, which is good. Bootstrapped and regression tested on x86_64-unknown-linux-gnu and installed on the trunk. PR rtl-optimization/59446 * tree-ssa-threadupdate.c (mark_threaded_blocks): Properly test for crossing a loop header. diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 6f978e2..af8fd85 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -1449,44 +1449,32 @@ mark_threaded_blocks (bitmap threaded_blocks) { vec *path = THREAD_PATH (e); - /* Basically we're looking for a situation where we can see -3 or more loop structures on a jump threading path. */ - - struct loop *first_father = (*path)[0]->e->src->loop_father; - struct loop *second_father = NULL; - for (unsigned int i = 0; i < path->length (); i++) + for (unsigned int i = 0, crossed_headers = 0; + i < path->length (); + i++) { - /* See if this is a loop father we have not seen before. */ - if ((*path)[i]->e->dest->loop_father != first_father - && (*path)[i]->e->dest->loop_father != second_father) + basic_block dest = (*path)[i]->e->dest; + crossed_headers += (dest == dest->loop_father->header); + if (crossed_headers > 1) { - /* We've already seen two loop fathers, so we -need to trim this jump threading path. */ - if (second_father != NULL) - { - /* Trim from entry I onwards. */ - for (unsigned int j = i; j < path->length (); j++) - delete (*path)[j]; - path->truncate (i); - - /* Now that we've truncated the path, make sure -what's left is still valid. We need at least -two edges on the path and the last edge can not -be a joiner. This should never happen, but let's -be safe. */ - if (path->length () < 2 - || (path->last ()->type - == EDGE_COPY_SRC_JOINER_BLOCK)) - { - delete_jump_thread_path (path); - e->aux = NULL; - } - break; - } - else + /* Trim from entry I onwards. */ + for (unsigned int j = i; j < path->length (); j++) + delete (*path)[j]; + path->truncate (i); + + /* Now that we've truncated the path, make sure +what's left is still valid. We need at least +two edges on the path and the last edge can not +be a joiner. This should never happen, but let's +be safe. */ + if (path->length () < 2 + || (path->last ()->type + == EDGE_COPY_SRC_JOINER_BLOCK)) { - second_father = (*path)[i]->e->dest->loop_father; + delete_jump_thread_path (path); + e->aux = NULL; } + break; } } }
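For reference, a standalone sketch of the new test (hypothetical types, not GCC's jump-threading data structures): walk the candidate path, count edges whose destination is its loop's header, and trim the path from the point where a second header would be crossed.

/* Standalone sketch (hypothetical types, not GCC's jump-threading data
   structures) of counting header crossings and truncating the path.  */
#include <stdio.h>
#include <vector>

struct block
{
  int index;
  bool is_loop_header;   /* dest == dest->loop_father->header in GCC terms */
};

static void
truncate_at_second_header (std::vector<block> &path)
{
  unsigned int crossed_headers = 0;
  for (unsigned int i = 0; i < path.size (); i++)
    {
      crossed_headers += path[i].is_loop_header;
      if (crossed_headers > 1)
        {
          path.resize (i);   /* trim from entry i onwards */
          break;
        }
    }
}

int
main (void)
{
  block blocks[] = { { 1, false }, { 2, true }, { 3, false },
                     { 4, true }, { 5, false } };
  std::vector<block> path (blocks, blocks + 5);

  truncate_at_second_header (path);
  for (unsigned int i = 0; i < path.size (); i++)
    printf ("%d ", path[i].index);
  printf ("\n");   /* prints: 1 2 3 */
  return 0;
}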
Go patch committed: Minor fixes for recover thunks
This patch to the Go frontend fixes a couple of problems with recover thunks. First, it avoids name collisions when methods call recover; previously if two different methods with the same name (for different types) called recover, the assembler symbols would be the same. Second, it avoids crashing if a method with an unnamed receiver calls recover. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r ba0adcfb4e6a go/gogo.cc --- a/go/gogo.cc Fri Dec 06 10:24:09 2013 -0800 +++ b/go/gogo.cc Wed Dec 11 14:21:36 2013 -0800 @@ -2822,7 +2822,10 @@ if (orig_fntype->is_varargs()) new_fntype->set_is_varargs(); - std::string name = orig_no->name() + "$recover"; + std::string name = orig_no->name(); + if (orig_fntype->is_method()) +name += "$" + orig_fntype->receiver()->type()->mangled_name(gogo); + name += "$recover"; Named_object *new_no = gogo->start_function(name, new_fntype, false, location); Function *new_func = new_no->func_value(); @@ -2916,7 +2919,25 @@ && !orig_rec_no->var_value()->is_receiver()); orig_rec_no->var_value()->set_is_receiver(); - const std::string& new_receiver_name(orig_fntype->receiver()->name()); + std::string new_receiver_name(orig_fntype->receiver()->name()); + if (new_receiver_name.empty()) + { + // Find the receiver. It was named "r.NNN" in + // Gogo::start_function. + for (Bindings::const_definitions_iterator p = + new_bindings->begin_definitions(); + p != new_bindings->end_definitions(); + ++p) + { + const std::string& pname((*p)->name()); + if (pname[0] == 'r' && pname[1] == '.') + { + new_receiver_name = pname; + break; + } + } + go_assert(!new_receiver_name.empty()); + } Named_object* new_rec_no = new_bindings->lookup_local(new_receiver_name); if (new_rec_no == NULL) go_assert(saw_errors());
libgo patch committed: Let MakeFunc functions call recover
The only Go functions that may call recover are those that are immediately deferred. This permits the correct handling of a panic in a function called by a deferred function. In gccgo this is implemented by recording the return address of a function that may call recover, and checking that return address in the recover call. That works for normal functions, but fails for functions created by reflect.MakeFunc. For a MakeFunc function the caller will be some function from libffi. This patch lets that work correctly by hooking into the stub used to start functions created by reflect.MakeFunc. That stub now does the checking for whether the function may call recover, and the function itself checks whether it was called directly from libffi. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r 4863a5e8f8a3 libgo/go/reflect/makefunc_386.S --- a/libgo/go/reflect/makefunc_386.S Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/go/reflect/makefunc_386.S Wed Dec 11 15:35:56 2013 -0800 @@ -48,6 +48,15 @@ leal 8(%ebp), %eax /* Set esp field in struct. */ movl %eax, -24(%ebp) + /* For MakeFunc functions that call recover. */ + movl 4(%ebp), %eax + movl %eax, (%esp) +#ifdef __PIC__ + call __go_makefunc_can_recover@PLT +#else + call __go_makefunc_can_recover +#endif + #ifdef __PIC__ call __go_get_closure@PLT #else @@ -65,6 +74,13 @@ call reflect.MakeFuncStubGo #endif + /* MakeFunc functions can no longer call recover. */ +#ifdef __PIC__ + call __go_makefunc_returning@PLT +#else + call __go_makefunc_returning +#endif + /* Set return registers. */ movl -20(%ebp), %eax diff -r 4863a5e8f8a3 libgo/go/reflect/makefunc_amd64.S --- a/libgo/go/reflect/makefunc_amd64.S Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/go/reflect/makefunc_amd64.S Wed Dec 11 15:35:56 2013 -0800 @@ -61,6 +61,14 @@ movdqa %xmm6, 0xa0(%rsp) movdqa %xmm7, 0xb0(%rsp) + /* For MakeFunc functions that call recover. */ + movq 8(%rbp), %rdi +#ifdef __PIC__ + call __go_makefunc_can_recover@PLT +#else + call __go_makefunc_can_recover +#endif + # Get function type. #ifdef __PIC__ call __go_get_closure@PLT @@ -77,6 +85,13 @@ call reflect.MakeFuncStubGo #endif + /* MakeFunc functions can no longer call recover. */ +#ifdef __PIC__ + call __go_makefunc_returning@PLT +#else + call __go_makefunc_returning +#endif + # The structure will be updated with any return values. Load # all possible return registers before returning to the caller. diff -r 4863a5e8f8a3 libgo/runtime/go-defer.c --- a/libgo/runtime/go-defer.c Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/runtime/go-defer.c Wed Dec 11 15:35:56 2013 -0800 @@ -27,6 +27,7 @@ n->__pfn = pfn; n->__arg = arg; n->__retaddr = NULL; + n->__makefunc_can_recover = 0; g->defer = n; } diff -r 4863a5e8f8a3 libgo/runtime/go-defer.h --- a/libgo/runtime/go-defer.h Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/runtime/go-defer.h Wed Dec 11 15:35:56 2013 -0800 @@ -34,4 +34,10 @@ set by __go_set_defer_retaddr which is called by the thunks created by defer statements. */ const void *__retaddr; + + /* Set to true if a function created by reflect.MakeFunc is + permitted to recover. The return address of such a function + function will be somewhere in libffi, so __retaddr is not + useful. */ + _Bool __makefunc_can_recover; }; diff -r 4863a5e8f8a3 libgo/runtime/go-recover.c --- a/libgo/runtime/go-recover.c Wed Dec 11 14:34:28 2013 -0800 +++ b/libgo/runtime/go-recover.c Wed Dec 11 15:35:56 2013 -0800 @@ -16,12 +16,14 @@ __go_can_recover--this is, the thunk. 
*/ _Bool -__go_can_recover (const void* retaddr) +__go_can_recover (const void *retaddr) { G *g; struct __go_defer_stack *d; const char* ret; const char* dret; + Location loc; + const byte *name; g = runtime_g (); @@ -52,7 +54,73 @@ #endif dret = (const char *) d->__retaddr; - return ret <= dret && ret + 16 >= dret; + if (ret <= dret && ret + 16 >= dret) +return 1; + + /* If the function calling recover was created by reflect.MakeFunc, + then RETADDR will be somewhere in libffi. Our caller is + permitted to recover if it was called from libffi. */ + if (!d->__makefunc_can_recover) +return 0; + + if (runtime_callers (2, &loc, 1) < 1) +return 0; + + /* If we have no function name, then we weren't called by Go code. + Guess that we were called by libffi. */ + if (loc.function.len == 0) +return 1; + + if (loc.function.len < 4) +return 0; + name = loc.function.str; + if (*name == '_') +{ + if (loc.function.len < 5) + return 0; + ++name; +} + + if (name[0] == 'f' && name[1] == 'f' && name[2] == 'i' && name[3] == '_') +return 1; + + return 0; +} + +/* This function is called when code is about to enter a function + created by reflect.MakeFunc. It is called by the function stub + used by MakeFunc. If the stub is permitted to call recover,
Re: Two build != host fixes
On Wed, Dec 11, 2013 at 02:11:49PM +0100, Bernd Edlinger wrote: > We need the auto-build only to build something that translates .md files to > .c, > so I would'nt care about GMP, but some other things, like the right prototype > for > printf make a difference. Right, but when you get some of the HAVE_* wrong, libiberty for the build compiler provides some of the "missing" functions and declarations. The declarations can clash with system header declarations giving you bootstrap failures for no good reason.. > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > index aad927c..7995e64 100644 > > --- a/gcc/Makefile.in > > +++ b/gcc/Makefile.in > > @@ -747,7 +747,8 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) > > > > # Native linker and preprocessor flags. For x-fragment overrides. > > BUILD_LDFLAGS=@BUILD_LDFLAGS@ > > -BUILD_CPPFLAGS=$(ALL_CPPFLAGS) > > +BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \ > > + -I$(srcdir)/../include $(CPPINC) > > > > I did not have this one. > What is it good for? It fixes another case of host header directories being searched for the build compiler. Important when GMPINC and other *INC point at installed locations for the host compiler. Trouble is, you don't just get the headers you want (eg. gmp.h) but all the other host headers too. -- Alan Modra Australia Development Lab, IBM
Go patch committed: Implement method values in reflect package
This patch to the Go frontend and libgo implements method values in the reflect package. Working with method values and reflect now works correctly, at least on x86. This changes the type signature for type methods in the reflect package to match the gc compiler. That in turn required changing the reflect package to mark method values with a new flag, as previously they were detected by the type signature. The MakeFunc support needed to create a function that takes a value and passes a pointer to the method, since all methods take pointers. It also needed to create a function that holds a receiver value. And the recover code needed to handle these new cases. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r e165d3ccf1e4 go/types.cc --- a/go/types.cc Wed Dec 11 15:38:00 2013 -0800 +++ b/go/types.cc Wed Dec 11 16:57:53 2013 -0800 @@ -2261,26 +2261,9 @@ ++p; go_assert(p->is_field_name("typ")); - if (!only_value_methods && m->is_value_method()) -{ - // This is a value method on a pointer type. Change the type of - // the method to use a pointer receiver. The implementation - // always uses a pointer receiver anyhow. - Type* rtype = mtype->receiver()->type(); - Type* prtype = Type::make_pointer_type(rtype); - Typed_identifier* receiver = - new Typed_identifier(mtype->receiver()->name(), prtype, - mtype->receiver()->location()); - mtype = Type::make_function_type(receiver, - (mtype->parameters() == NULL - ? NULL - : mtype->parameters()->copy()), - (mtype->results() == NULL - ? NULL - : mtype->results()->copy()), - mtype->location()); -} - vals->push_back(Expression::make_type_descriptor(mtype, bloc)); + bool want_pointer_receiver = !only_value_methods && m->is_value_method(); + nonmethod_type = mtype->copy_with_receiver_as_param(want_pointer_receiver); + vals->push_back(Expression::make_type_descriptor(nonmethod_type, bloc)); ++p; go_assert(p->is_field_name("tfn")); @@ -4008,6 +3991,32 @@ return ret; } +// Make a copy of a function type with the receiver as the first +// parameter. + +Function_type* +Function_type::copy_with_receiver_as_param(bool want_pointer_receiver) const +{ + go_assert(this->is_method()); + Typed_identifier_list* new_params = new Typed_identifier_list(); + Type* rtype = this->receiver_->type(); + if (want_pointer_receiver) +rtype = Type::make_pointer_type(rtype); + Typed_identifier receiver(this->receiver_->name(), rtype, + this->receiver_->location()); + new_params->push_back(receiver); + const Typed_identifier_list* orig_params = this->parameters_; + if (orig_params != NULL && !orig_params->empty()) +{ + for (Typed_identifier_list::const_iterator p = orig_params->begin(); + p != orig_params->end(); + ++p) + new_params->push_back(*p); +} + return Type::make_function_type(NULL, new_params, this->results_, + this->location_); +} + // Make a copy of a function type ignoring any receiver and adding a // closure parameter. diff -r e165d3ccf1e4 go/types.h --- a/go/types.h Wed Dec 11 15:38:00 2013 -0800 +++ b/go/types.h Wed Dec 11 16:57:53 2013 -0800 @@ -1797,6 +1797,12 @@ Function_type* copy_with_receiver(Type*) const; + // Return a copy of this type with the receiver treated as the first + // parameter. If WANT_POINTER_RECEIVER is true, the receiver is + // forced to be a pointer. + Function_type* + copy_with_receiver_as_param(bool want_pointer_receiver) const; + // Return a copy of this type ignoring any receiver and using dummy // names for all parameters. This is used for thunks for method // values. 
diff -r e165d3ccf1e4 libgo/go/reflect/all_test.go --- a/libgo/go/reflect/all_test.go Wed Dec 11 15:38:00 2013 -0800 +++ b/libgo/go/reflect/all_test.go Wed Dec 11 16:57:53 2013 -0800 @@ -1631,9 +1631,13 @@ } } -/* Not yet implemented for gccgo - func TestMethodValue(t *testing.T) { + switch runtime.GOARCH { + case "amd64", "386": + default: + t.Skip("reflect method values not implemented for " + runtime.GOARCH) + } + p := Point{3, 4} var i int64 @@ -1721,8 +1725,6 @@ } } -*/ - // Reflect version of $GOROOT/test/method5.go // Concrete types implementing M method. @@ -1807,14 +1809,18 @@ func (t4 Tm4) M(x int, b byte) (byte, int) { return b, x + 40 } func TestMethod5(t *testing.T) { - /* Not yet used for gccgo + switch runtime.GOARCH { + case "amd64", "386": + default: + t.Skip("reflect method values not implemented for " + runtime.GOARCH) + } + CheckF := func(name string, f func(int, byte) (byte, int), inc int) { b, x := f(1000, 99) if b != 99 || x != 1000+inc { t.Errorf("%s(1000, 99) = %v, %v, want 99, %v", name, b, x, 1000+inc) } } - */ CheckV := func(name string, i Value, inc int) { bx := i.Method(0).Call([]Value{ValueOf(1000), ValueOf(byte(99))}) @@ -1824,9 +1830,7 @@ t.Error
[trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
This patch is for the trunk, but it solves a problem that comes up for wide-int. For wide-int we need to have the BITS_PER_UNIT available earlier.So this patch sets the default value (8) in genmodes.c so that it is available by anyone who includes insn-modes.h. The generator for tm.h was modified to include insn-modes.h.The value for BITS_PER_UNIT can be overridden by any port by placing a define for it in their target modes file. This patch removes the definition of BITS_PER_UNIT from 7 platform .h files. All of those platforms initialized it to the default value so there was no need for additions to their target modes file. In addition, this target also changes the way that MAX_BITSIZE_MODE_ANY_INT is calculated.The value is heavily used on the wide-int branch to allocate buffers that are used to hold large integer values. The change in the way it is computed was motivated by the i386 port, but there may be other ports that have the same problem. The i386 port defines two very large integer modes that are only used as containers for large vectors. They are never used for large integers. The new way of computing this allows a port to say (very early) that some of these integer modes are never used to hold numbers and so smaller buffers can be used for integer calculations. Other ports that play the same game should follow suit. This patch has been bootstrapped and regression tested on x86-64. Ok to commit? Kenny Index: gcc/config/arc/arc.h === --- gcc/config/arc/arc.h (revision 205895) +++ gcc/config/arc/arc.h (working copy) @@ -303,9 +303,6 @@ along with GCC; see the file COPYING3. numbered. */ #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN) -/* Number of bits in an addressable storage unit. */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/bfin/bfin.h === --- gcc/config/bfin/bfin.h (revision 205895) +++ gcc/config/bfin/bfin.h (working copy) @@ -859,9 +859,6 @@ typedef struct { /* Define this if most significant word of a multiword number is numbered. */ #define WORDS_BIG_ENDIAN 0 -/* number of bits in an addressable storage unit */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/lm32/lm32.h === --- gcc/config/lm32/lm32.h (revision 205895) +++ gcc/config/lm32/lm32.h (working copy) @@ -73,7 +73,6 @@ #define BYTES_BIG_ENDIAN 1 #define WORDS_BIG_ENDIAN 1 -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 Index: gcc/config/m32c/m32c.h === --- gcc/config/m32c/m32c.h (revision 205895) +++ gcc/config/m32c/m32c.h (working copy) @@ -140,7 +140,6 @@ machine_function; matches "int". Pointers are 16 bits for R8C/M16C (when TARGET_A16 is true) and 24 bits for M32CM/M32C (when TARGET_A24 is true), but 24-bit pointers are stored in 32-bit words. */ -#define BITS_PER_UNIT 8 #define UNITS_PER_WORD 2 #define POINTER_SIZE (TARGET_A16 ? 
16 : 32) #define POINTERS_EXTEND_UNSIGNED 1 Index: gcc/config/microblaze/microblaze.h === --- gcc/config/microblaze/microblaze.h (revision 205895) +++ gcc/config/microblaze/microblaze.h (working copy) @@ -193,7 +193,6 @@ extern enum pipeline_type microblaze_pip #define BITS_BIG_ENDIAN 0 #define BYTES_BIG_ENDIAN (TARGET_LITTLE_ENDIAN == 0) #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN) -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 #define MIN_UNITS_PER_WORD 4 Index: gcc/config/picochip/picochip.h === --- gcc/config/picochip/picochip.h (revision 205895) +++ gcc/config/picochip/picochip.h (working copy) @@ -92,8 +92,6 @@ extern enum picochip_dfa_type picochip_s #define BYTES_BIG_ENDIAN 0 #define WORDS_BIG_ENDIAN 0 -#define BITS_PER_UNIT 8 - #define BITS_PER_WORD 16 #define UNITS_PER_WORD (BITS_PER_WORD / BITS_PER_UNIT) Index: gcc/config/spu/spu.h === --- gcc/config/spu/spu.h (revision 205895) +++ gcc/config/spu/spu.h (working copy) @@ -54,8 +54,6 @@ extern GTY(()) int spu_tune; #define WORDS_BIG_ENDIAN 1 -#define BITS_PER_UNIT 8 - /* GCC uses word_mode in many places, assuming that it is the fastest integer mode. That is not the case for SPU though. We can't us
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
The m32c part is OK.
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
On 12/11/2013 08:42 PM, DJ Delorie wrote: The m32c part is OK. thanks for the fast reply. kenny
Go patch committed: Disable sibling calls by default
Go programs expect to be able to call runtime.Callers to get their call stack. The libbacktrace library currently can not handle tail calls, and it's not clear that it can ever be completely reliable in handling tail calls. This patch to the Go frontend disables tail call optimization, unless it is specifically requested. This will permit Go programs that call runtime.Callers to work as expected. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian 2013-12-11 Ian Lance Taylor * go-lang.c (go_langhook_post_options): Disable sibling calls by default. Index: go-lang.c === --- go-lang.c (revision 205872) +++ go-lang.c (working copy) @@ -270,6 +270,10 @@ go_langhook_post_options (const char **p if (flag_excess_precision_cmdline == EXCESS_PRECISION_DEFAULT) flag_excess_precision_cmdline = EXCESS_PRECISION_STANDARD; + /* Tail call optimizations can confuse uses of runtime.Callers. */ + if (!global_options_set.x_flag_optimize_sibling_calls) +global_options.x_flag_optimize_sibling_calls = 0; + /* Returning false means that the backend should be used. */ return false; }
Re: RFA (cgraph): C++ 'structor decloning patch, Mark III
On 12/11/2013 03:53 PM, Jan Hubicka wrote:
> Lets go with minimal version of patch that makes things working and I
> will take care of solving the inliner limitations.

OK, this patch adjusts calls_comdat_local in compute_inline_parameters, as you suggested. What do you think of the change to inline_call? The case of inlining a comdat-local function looks O(N^2), but that shouldn't be a problem in practice since the wrappers only have one callee anyway.

Jason

commit 841b0c8a4dfd38f7f2dc5e0e05f5e4ded0aad4f1
Author: Jason Merrill
Date:   Thu Jan 12 14:04:42 2012 -0500

    PR c++/41090
    Add -fdeclone-ctor-dtor.

gcc/cp/
	* optimize.c (can_alias_cdtor, populate_clone_array): Split out
	from maybe_clone_body.
	(maybe_thunk_body): New function.
	(maybe_clone_body): Call it.
	* mangle.c (write_mangled_name): Remove code to suppress writing
	of mangled name for cloned constructor or destructor.
	(write_special_name_constructor): Handle decloned constructor.
	(write_special_name_destructor): Handle decloned destructor.
	* method.c (trivial_fn_p): Handle decloning.
	* semantics.c (expand_or_defer_fn_1): Clone after setting linkage.
gcc/c-family/
	* c.opt: Add -fdeclone-ctor-dtor.
	* c-opts.c (c_common_post_options): Default to on iff -Os.
gcc/
	* cgraph.h (struct cgraph_node): Add calls_comdat_local.
	(symtab_comdat_local_p, symtab_in_same_comdat_p): New.
	* symtab.c (verify_symtab_base): Make sure we don't refer to a
	comdat-local symbol from outside its comdat.
	* cgraph.c (verify_cgraph_node): Likewise.
	* cgraphunit.c (mark_functions_to_output): Don't mark comdat-locals.
	* ipa.c (symtab_remove_unreachable_nodes): Likewise.
	(function_and_variable_visibility): Handle comdat-local fns.
	* ipa-cp.c (decide_about_value): Don't clone comdat-locals.
	* ipa-inline-analysis.c (compute_inline_parameters): Update
	calls_comdat_local.
	* ipa-inline-transform.c (inline_call): Likewise.
	* ipa-inline.c (can_inline_edge_p): Check calls_comdat_local.
include/
	* demangle.h (enum gnu_v3_ctor_kinds): Added literal
	gnu_v3_unified_ctor.
	(enum gnu_v3_dtor_kinds): Added literal gnu_v3_unified_dtor.
libiberty/
	* cp-demangle.c (cplus_demangle_fill_ctor, cplus_demangle_fill_dtor):
	Handle unified ctor/dtor.
	(d_ctor_dtor_name): Handle unified ctor/dtor.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index f368cab..3576f7d 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -899,6 +899,10 @@ c_common_post_options (const char **pfilename)
   if (warn_implicit_function_declaration == -1)
     warn_implicit_function_declaration = flag_isoc99;
 
+  /* Declone C++ 'structors if -Os.  */
+  if (flag_declone_ctor_dtor == -1)
+    flag_declone_ctor_dtor = optimize_size;
+
   if (cxx_dialect >= cxx11)
     {
       /* If we're allowing C++0x constructs, don't warn about C++98
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index bfca1e0..d270f77 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -890,6 +890,10 @@ fdeduce-init-list
 C++ ObjC++ Var(flag_deduce_init_list) Init(0)
 -fdeduce-init-list	enable deduction of std::initializer_list for a template type parameter from a brace-enclosed initializer-list
 
+fdeclone-ctor-dtor
+C++ ObjC++ Var(flag_declone_ctor_dtor) Init(-1)
+Factor complex constructors and destructors to favor space over speed
+
 fdefault-inline
 C++ ObjC++ Ignore
 Does nothing.  Preserved for backward compatibility.
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9501afa..ccd150c 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2666,10 +2666,18 @@ verify_cgraph_node (struct cgraph_node *node)
 	  error_found = true;
 	}
     }
+  bool check_comdat = symtab_comdat_local_p (node);
   for (e = node->callers; e; e = e->next_caller)
     {
       if (verify_edge_count_and_frequency (e))
 	error_found = true;
+      if (check_comdat
+	  && !symtab_in_same_comdat_p (e->caller, node))
+	{
+	  error ("comdat-local function called by %s outside its comdat",
+		 identifier_to_locale (e->caller->name ()));
+	  error_found = true;
+	}
       if (!e->inline_failed)
 	{
 	  if (node->global.inlined_to
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0a88da3..69b97a7 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -428,6 +428,9 @@ public:
   unsigned tm_clone : 1;
   /* True if this decl is a dispatcher for function versions.  */
   unsigned dispatcher_function : 1;
+  /* True if this decl calls a COMDAT-local function.  This is set up in
+     compute_inline_parameters and inline_call.  */
+  unsigned calls_comdat_local : 1;
 };
 
@@ -1490,4 +1493,22 @@ symtab_can_be_discarded (symtab_node *node)
 	  && node->resolution != LDPR_PREVAILING_DEF_IRONLY
 	  && node->resolution != LDPR_PREVAILING_DEF_IRONLY_EXP));
 }
+
+/* Return true if NODE is local to a particular COMDAT group, and must not
+   be named from outside the COMDAT.  This is us
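For readers skimming the thread, here is a rough source-level analogy of what the decloning amounts to. It is only an illustration of the idea, not code from the patch: the helper name unified_init is invented, and real constructors with virtual bases involve more than this.

struct Big
{
  int a, b, c;
  Big ();
};

/* Without -fdeclone-ctor-dtor the complete-object and base-object
   constructor clones each carry a full copy of the initialization code.
   Decloning keeps one shared body...  */
static void
unified_init (Big *self)
{
  self->a = 1;
  self->b = 2;
  self->c = 3;
  /* ... imagine many more statements here ...  */
}

/* ... and turns the clones into small wrappers that call it, trading an
   extra call for the removed duplication, which is why the c-opts.c hunk
   above only turns it on by default at -Os.  */
Big::Big ()
{
  unified_init (this);
}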