[PATCH] Fix PR66396
Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-06-09  Richard Biener

	PR tree-optimization/66396
	* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl):
	Rename virtual operands.

Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c	(revision 224212)
+++ gcc/graphite-isl-ast-to-gimple.c	(working copy)
@@ -1069,6 +1069,10 @@ graphite_regenerate_ast_isl (scop_p scop
   translate_isl_ast (context_loop, root_node, if_region->true_region->entry,
 		     ip);
+
+  mark_virtual_operands_for_renaming (cfun);
+  update_ssa (TODO_update_ssa);
+
   graphite_verify ();
   scev_reset ();
   recompute_all_dominators ();
Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1
This is not my patch, but I'd like to ping it anyway as it also fixes PR66467 (you might want to add a testcase from this PR as well). On Sun, Apr 26, 2015 at 02:11:14PM +0300, Mikhail Maltsev wrote: > The attached patch was bootstrapped and regtested on > x86_64-unknown-linux-gnu. > > -- > Regards, > Mikhail Maltsev > gcc/cp/ChangeLog: > > 2015-04-26 Mikhail Maltsev > > * call.c (build_new_op_1): Check tf_warning flag in all cases > > gcc/testsuite/ChangeLog: > > 2015-04-26 Mikhail Maltsev > > * g++.dg/diagnostic/inhibit-warn.C: New test. > > > diff --git a/gcc/cp/call.c b/gcc/cp/call.c > index 7bdf236..689d542 100644 > --- a/gcc/cp/call.c > +++ b/gcc/cp/call.c > @@ -5677,8 +5677,9 @@ build_new_op_1 (location_t loc, enum tree_code code, > int flags, tree arg1, > case TRUTH_ORIF_EXPR: > case TRUTH_AND_EXPR: > case TRUTH_OR_EXPR: > - warn_logical_operator (loc, code, boolean_type_node, > - code_orig_arg1, arg1, code_orig_arg2, arg2); > + if (complain & tf_warning) > + warn_logical_operator (loc, code, boolean_type_node, > +code_orig_arg1, arg1, code_orig_arg2, arg2); >/* Fall through. */ > case GT_EXPR: > case LT_EXPR: > @@ -5686,8 +5687,9 @@ build_new_op_1 (location_t loc, enum tree_code code, > int flags, tree arg1, > case LE_EXPR: > case EQ_EXPR: > case NE_EXPR: > - if ((code_orig_arg1 == BOOLEAN_TYPE) > - ^ (code_orig_arg2 == BOOLEAN_TYPE)) > + if ((complain & tf_warning) > + && ((code_orig_arg1 == BOOLEAN_TYPE) > + ^ (code_orig_arg2 == BOOLEAN_TYPE))) > maybe_warn_bool_compare (loc, code, arg1, arg2); >/* Fall through. */ > case PLUS_EXPR: > diff --git a/gcc/testsuite/g++.dg/diagnostic/inhibit-warn.C > b/gcc/testsuite/g++.dg/diagnostic/inhibit-warn.C > new file mode 100644 > index 000..5655eb4 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/diagnostic/inhibit-warn.C > @@ -0,0 +1,32 @@ > +// PR c++/65882 > +// { dg-do compile { target c++11 } } > +// { dg-options "-Wbool-compare" } > + > +// Check that we don't ICE because of reentering error reporting routines > while > +// evaluating template parameters > + > +template > +struct type_function { > + static constexpr bool value = false; > +}; > + > +template > +struct dependent_type { > + typedef int type; > +}; > + > +template > +typename dependent_type<(5 > type_function::value)>::type > +bar(); > + > +template > +typename dependent_type<(5 > type_function::value)>::type > +foo() > +{ > + return bar(); > +} > + > +int main() > +{ > + foo(); > +} Marek
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Mon, Jun 8, 2015 at 7:03 PM, Marc Glisse wrote: > On Mon, 8 Jun 2015, Marek Polacek wrote: > >> PR tree-optimization/66299 >> * match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) >> ((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New > > > You are braver than I am, I would have abbreviated ctz (CST2) - ctz (CST1) > to CST3 in the ChangeLog ;-) > >> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) >> + (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1) >> + if CST2 != 0. */ >> +(for cmp (ne eq) >> + (simplify >> + (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2) >> + (with { >> + unsigned int cand = wi::ctz (@2) - wi::ctz (@0); } >> + (if (!integer_zerop (@2) > > > You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be > indented one space more? Yes, one space more. I suppose using integer_zerop might in theory allow for handling vector shifts at some point ...? >> + && wi::eq_p (wi::lshift (@0, cand), @2)) >> + (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); }) > > > Making 'cand' signed, you could return 0 when cand<0, like (2< could also return 0 when the candidate turns out not to work: (3< Tweaking it so that (6<=31 for TYPE_OVERFLOW_WRAPS and > false for TYPE_OVERFLOW_UNDEFINED is probably more controversial. Hm, yes. I think signed overflow != shift amount overflow, so testing the overflow macros for this isn't valid. Otherwise the patch looks ok to me as well - mind doing the improvement above? Thanks, Richard. > FWIW, the patch looks good to me, thanks. > > -- > Marc Glisse
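To make the simplification under discussion concrete, here is a small source-level illustration (a sketch only; it is not taken from the patch or its testcases):

/* Illustration of the match.pd rule discussed above:
   (CST1 << A) == CST2 can be rewritten as A == CST3 when a suitable
   shift count exists, and folded to a constant when it does not.  */

int
f1 (int a)
{
  /* 2 << a == 16 holds exactly for a == 3, so this comparison can
     become a == 3.  */
  return (2 << a) == 16;
}

int
f2 (int a)
{
  /* No shift count makes 3 << a equal to 2, so this can fold to 0
     (one of the follow-up suggestions in the thread).  */
  return (3 << a) == 2;
}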
Re: [patch] Adjust gcc-plugin.h
On Mon, Jun 8, 2015 at 7:52 PM, Andrew MacLeod wrote: > On 06/08/2015 09:32 AM, Richard Biener wrote: >> >> On Mon, Jun 8, 2015 at 2:07 PM, Andrew MacLeod >> wrote: >>> >>> During the original flattening process I decided to use gcc-plugin.h as >>> the >>> kitchen sink for all includes that plugins might need. I think this has >>> worked well for plugins, drastically reducing their dependency on gcc >>> internal header file structure. >>> >>> What I didn't realize was that gcc's internal header (plugin.h) also >>> includes gcc-plugin.h. This means that every file which may need to do >>> something for plugins ends up indirectly including the gcc world again >>> :-P >>> >>> Easy fix. (ha). This patch leaves all the #includes in gcc-plugin.h >>> making >>> the change transparent to plugins. All the remaining declarations and >>> such >>> are moved into a new file gcc-plugin-common.h. Both gcc-plugin.h and >>> gcc's >>> internal header plugin.h now include this common file. >>> >>> The effect is that gcc's source files no longer get anything but the >>> required plugin info. Great.. Except there were a few files which were >>> apparently picking up some required headers from gcc-plugins.h :-P >>> This >>> patch also adds the required headers to those source files. >>> >>> Compiles on x86_64-unknown-linux-gnu with no new regressions. Also >>> compiles >>> across all targets in config-list.mk. OK for trunk? >> >> Err - why not simply remove the gcc-plugin.h include from plugin.h and >> instead >> include plugin.h from gcc-plugin.h? >> >> > > the gcc source files need to see the internal bits in plugin.h, as well as > the common decls in gcc-plugin.h. So we could change the includes as you > suggest, but we'd have to copy all the decls from gcc-inlcude.h to plugin.h > so the gcc functions can see them. And then the plugins would be exposed to > all the internal APIs and decls present in plugins.h plugins are exposed to all internals of GCC anyway. gcc-plugin.h should really just be a #include kitchen-sink. > Adding the 3rd file which contains all the common decls between both sides > is the only way to isolate both. If you were OK with exposing the > internal parts of plugin.h to plugin clients I could do that. Yes. > Im presuming > we didnt want to do that and thats why there were 2 files to start with. No, gcc-plugin.h was introduced to make the set of includes required for plugins "stable". Richard. > I > hijacked the external interface in gcc-plugin.h file to provide all the > includes when instead the right thing would have been to probably create a > new in the first place. > > Andrew > > >
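For background on why gcc-plugin.h is deliberately a kitchen-sink header, a minimal (hypothetical) out-of-tree plugin looks roughly like this; the point of the discussion is that plugins should keep getting everything from this one include, while GCC's own sources should not pay for that through plugin.h:

/* Minimal plugin skeleton, for illustration only.  */
#include "gcc-plugin.h"
#include "plugin-version.h"

int plugin_is_GPL_compatible;

int
plugin_init (struct plugin_name_args *plugin_info,
	     struct plugin_gcc_version *version)
{
  /* Refuse to load into a GCC we were not built against.  */
  if (!plugin_default_version_check (version, &gcc_version))
    return 1;

  /* Callbacks would be registered here with register_callback ().  */
  return 0;
}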
Re: debug-early branch merged into mainline
On Mon, Jun 8, 2015 at 10:25 PM, Aldy Hernandez wrote: > On 06/08/2015 02:59 PM, Richard Biener wrote: >> >> On June 8, 2015 7:14:19 PM GMT+02:00, Aldy Hernandez >> wrote: >>> >>> On 06/08/2015 09:30 AM, Richard Biener wrote: On Mon, Jun 8, 2015 at 2:05 PM, Aldy Hernandez >>> >>> wrote: > > On 06/08/2015 04:26 AM, Richard Biener wrote: >> >> >> On Mon, Jun 8, 2015 at 3:23 AM, Aldy Hernandez > > >>> What about if the comparison routine gets a named section and an >>> unnamed >>> section? How to compare? That's why I was giving priority to one over >>> >>> the other originally, but I didn't know about problematic qsort >>> implementations. >> >> >> Obviously unnamed and a named section can be sorted like you did in the >> original patch. > > > Obviously I'm not understanding :). > > How about this? Ok with adding v.create (object_block_htab->elements ()); and using v.quick_push () (avoids re-allocations) and with adding a v.release (); at the end of the function. And re-writing + return f1 < f2 ? -1 : (f1 > f2 ? 1 : 0); to if (f1 == f2) return 0; return f1 < f2 ? -1 : 1; Thanks, Richard. > Tested on x86-64 and ppc64le. > > Aldy
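For readers skimming the thread, the comparator shape being agreed on here is the usual three-way qsort idiom; a standalone sketch follows (names invented for illustration, not the committed code, which appears later in this digest):

#include <stdlib.h>

/* Three-way comparator of the shape discussed above: return 0 on
   equality and otherwise a strict -1/1 ordering, so the sort result is
   deterministic across qsort implementations.  */
static int
compare_flags (const void *x, const void *y)
{
  unsigned f1 = *(const unsigned *) x;
  unsigned f2 = *(const unsigned *) y;
  if (f1 == f2)
    return 0;
  return f1 < f2 ? -1 : 1;
}

int
main (void)
{
  unsigned v[] = { 4, 1, 3 };
  qsort (v, 3, sizeof (unsigned), compare_flags);
  return (v[0] == 1 && v[1] == 3 && v[2] == 4) ? 0 : 1;
}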
[PATCH] Fix assert in get_loop_body_in_bfs_order
Noticed when it crashed rather than asserted with broken loop structure.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-06-09  Richard Biener

	* cfgloop.c (get_loop_body_in_bfs_order): Fix assert.

Index: gcc/cfgloop.c
===================================================================
--- gcc/cfgloop.c	(revision 224266)
+++ gcc/cfgloop.c	(working copy)
@@ -954,7 +954,7 @@ get_loop_body_in_bfs_order (const struct
 	    }
 	}
 
-      gcc_assert (i >= vc);
+      gcc_assert (i > vc);
       bb = blocks[vc++];
     }
Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code
On 05/06/15 14:14, Kyrill Tkachov wrote: On 05/06/15 14:11, Richard Earnshaw wrote: On 05/06/15 14:08, Kyrill Tkachov wrote: Hi Shiva, On 05/06/15 10:42, Shiva Chen wrote: Hi, Kyrill I add the testcase as stl-cond.c. Could you help to check the testcase ? If it's OK, Could you help me to apply the patch ? This looks ok to me. One nit on the testcase: diff --git a/gcc/testsuite/gcc.target/arm/stl-cond.c b/gcc/testsuite/gcc.target/arm/stl-cond.c new file mode 100755 index 000..44c6249 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/stl-cond.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ This should also have -marm as the problem exhibited itself in arm state. I'll commit this patch with this change in 24 hours on your behalf if no one objects. Explicit use of -marm will break multi-lib testing. I've forgotten the correct hook, but there's most-likely something that will give you the right behaviour, even if it means that thumb-only multi-lib testing skips this test. So I think what we want is: dg-require-effective-target arm_arm_ok The comment in target-supports.exp is: # Return 1 if this is an ARM target where -marm causes ARM to be # used (not Thumb) I've committed the attached patch to trunk on Shiva's behalf with r224269. It gates the test on arm_arm_ok and adds -marm, like other similar tests. The ChangeLog I used is below: 2015-06-09 Shiva Chen * sync.md (atomic_load): Add conditional code for lda/ldr (atomic_store): Likewise. 2015-06-09 Shiva Chen * gcc.target/arm/stl-cond.c: New test. Thanks, Kyrill Kyrill R. Ramana, Richard, we need to backport it to GCC 5 as well, right? Thanks, Kyrill Thanks, Shiva 2015-06-05 16:34 GMT+08:00 Kyrill Tkachov : Hi Shiva, On 05/06/15 09:29, Shiva Chen wrote: Hi, Kyrill I update the patch as Richard's suggestion. - return \"str\t%1, %0\"; + return \"str%(%)\t%1, %0\"; else - return \"stl\t%1, %0\"; + return \"stl%?\t%1, %0\"; } -) + [(set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no")]) + [(set_attr "predicable" "yes") + (set_attr "predicable_short_it" "no")]) Let me sum up. We add predicable attribute to allow gcc do if-conversion in ce1/ce2/ce3 not only in final phase by final_prescan_insn finite state machine. We set predicalble_short_it to "no" to restrict conditional code generation on armv8 with thumb mode. However, we could use the flags -mno-restrict-it to force generating conditional code on thumb mode. Therefore, we have to consider the assembly output format for strb with condition code on arm/thumb mode. Because arm/thumb mode use different syntax for strb, we output the assembly as str%(%) which will put the condition code in the right place according to TARGET_UNIFIED_ASM. Is there still missing something ? That's all correct, and well summarised :) The patch looks good to me, but please include the testcase (test.c from earlier) appropriately marked up for the testsuite. I think to the level of dg-assemble, just so we know everything is wired up properly. Thanks for dealing with this. Kyrill Thanks, Shiva 2015-06-04 18:00 GMT+08:00 Kyrill Tkachov : Hi Shiva, On 04/06/15 10:57, Shiva Chen wrote: Hi, Kyrill Thanks for the tips of syntax. 
It seems that correct syntax for ldrb with condition code is ldreqb ldab with condition code is ldabeq So I modified the pattern as follow { enum memmodel model = (enum memmodel) INTVAL (operands[2]); if (model == MEMMODEL_RELAXED || model == MEMMODEL_CONSUME || model == MEMMODEL_RELEASE) return \"ldr%?\\t%0, %1\"; else return \"lda%?\\t%0, %1\"; } [(set_attr "predicable" "yes") (set_attr "predicable_short_it" "no")]) It seems we don't have to worry about thumb mode, I suggest you use Richard's suggestion from: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00384.html to write this in a clean way. Because we already set "predicable" "yes" and predicable_short_it" "no" for the pattern. That's not quite true. The user may compile for armv8-a with -mno-restrict-it which will turn off this restriction for Thumb and allow the conditional execution of this. In any case, I think Richard's suggestion above should work. Thanks, Kyrill The new patch could build gcc and run gcc regression test successfully. Please correct me if I still missing something. Thanks, Shiva -Original Message- From: Richard Earnshaw [mailto:richard.earns...@foss.arm.com] Sent: Thursday, June 04, 2015 4:42 PM To: Kyrill Tkachov; Shiva Chen Cc: Ramana Radhakrishnan; GCC Patches; ni...@redhat.com; Shiva Chen Subject: Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code On 04/06/15 09:17, Kyrill Tkachov wrote: Hi Shiva, On 04/06/15 04:13, Shiva Chen wrote: Hi, Ramana Currently
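For reference, the source constructs that exercise these sync.md patterns are simply acquire loads and release stores; a minimal example (illustrative only):

/* The acquire load and release store below are what expand to the
   lda/ldr and stl/str alternatives discussed above; the bug was about
   emitting the correct assembly when the compiler conditionalizes
   these instructions on ARMv8.  */
int
load_acquire (const int *p)
{
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
}

void
store_release (int *p, int v)
{
  __atomic_store_n (p, v, __ATOMIC_RELEASE);
}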
Re: [PR64164] drop copyrename, integrate into expand
On 8 June 2015 at 10:14, Richard Biener wrote: > On Sat, Jun 6, 2015 at 3:14 AM, Alexandre Oliva wrote: >> On Apr 27, 2015, Richard Biener wrote: >> >>> This should also mention that is_gimple_reg vars do not have their >>> address taken. >> >> check >> +static tree +leader_merge (tree cur, tree next) >> >>> Ick - presumably you can't use sth better than a TREE_LIST here? >> >> The list was an experiment that never really worked, and when I tried to >> make it work after the patch, it proved to be unworkable, so I dropped >> it, and rewrote leader_merge to choose either of the params, preferring >> anonymous over ignored over named, so as to reduce the likelihood of >> misreading of debug dumps, since that's all they're used for. >> static void -expand_one_stack_var (tree var) +expand_one_stack_var_1 (tree var) { HOST_WIDE_INT size, offset; unsigned byte_align; - size = tree_to_uhwi (DECL_SIZE_UNIT (SSAVAR (var))); - byte_align = align_local_variable (SSAVAR (var)); + if (TREE_CODE (var) != SSA_NAME || SSA_NAME_VAR (var)) +{ + size = tree_to_uhwi (DECL_SIZE_UNIT (SSAVAR (var))); + byte_align = align_local_variable (SSAVAR (var)); +} + else >> >>> I'd go here for all TREE_CODE (var) == SSA_NAME >> >> Check >> >>> (and get rid of the SSAVAR macro?) >> >> There are remaining uses that don't seem worth dropping it for. >> +/* Return the promoted mode for name. If it is a named SSA_NAME, it + is the same as promote_decl_mode. Otherwise, it is the promoted + mode of a temp decl of same type as the SSA_NAME, if we had created + one. */ + +machine_mode +promote_ssa_mode (const_tree name, int *punsignedp) +{ + gcc_assert (TREE_CODE (name) == SSA_NAME); + + if (SSA_NAME_VAR (name)) +return promote_decl_mode (SSA_NAME_VAR (name), punsignedp); >> >>> As above I'd rather not have different paths for anonymous vs. non-anonymous >>> vars (so just delete the above two lines). >> >> Check >> @@ -9668,6 +9678,11 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, pmode = promote_function_mode (type, mode, &unsignedp, gimple_call_fntype (g), 2); + else if (!exp) + { + gcc_assert (code == SSA_NAME); >> >>> promote_ssa_mode should assert this. >> + pmode = promote_ssa_mode (ssa_name, &unsignedp); >> >> It does, so... check. >> >> @@ -2121,6 +2122,15 @@ aggregate_value_p (const_tree exp, const_tree fntype) bool use_register_for_decl (const_tree decl) { + if (TREE_CODE (decl) == SSA_NAME) +{ + if (!SSA_NAME_VAR (decl)) + return TYPE_MODE (TREE_TYPE (decl)) != BLKmode + && !(flag_float_store && FLOAT_TYPE_P (TREE_TYPE (decl))); + + decl = SSA_NAME_VAR (decl); >> >>> See above. Please drop the SSA_NAME_VAR != NULL path. >> >> Check, then taken back, after a bootstrap failure and some debugging >> made me realize this would be wrong. Here are the nearly-added comments >> that explain why: >> >> /* We often try to use the SSA_NAME, instead of its underlying >> decl, to get type information and guide decisions, to avoid >> differences of behavior between anonymous and named >> variables, but in this one case we have to go for the actual >> variable if there is one. The main reason is that, at least >> at -O0, we want to place user variables on the stack, but we >> don't mind using pseudos for anonymous or ignored temps. >> Should we take the SSA_NAME, we'd conclude all SSA_NAMEs >> should go in pseudos, whereas their corresponding variables >> might have to go on the stack. 
So, disregarding the decl >> here would negatively impact debug info at -O0, enable >> coalescing between SSA_NAMEs that ought to get different >> stack/pseudo assignments, and get the incoming argument >> processing thoroughly confused by PARM_DECLs expected to live >> in stack slots but assigned to pseudos. */ >> >> +++ b/gcc/gimple-expr.h +/* Defined in tree-ssa-coalesce.c. */ +extern bool gimple_can_coalesce_p (tree, tree); >> >>> Err, put it to tree-ssa-coalesce.h? >> >> Check. Lots of additional headers required to be able to include >> tree-ssa-coalesce.h, though. >> >> - gcc_assert (src_mode == TYPE_MODE (TREE_TYPE (var))); + gcc_assert (src_mode == TYPE_MODE (TREE_TYPE (var ? var : name))); >> >>> The TREE_TYPE of name and its SSA_NAME_VAR are always the same. So just >>> use TREE_TYPE (name) here. >> >> Check >> gcc_assert (!REG_P (dest_rtx) - || dest_mode == promote_decl_mode (var, &unsignedp)); + || dest_mode == promote_ssa_mode (name, &unsignedp)); if (src_mode != dest
Re: [patch] libstdc++/66030 fix codecvt exports for mingw32
On 08/06/15 09:59 -0600, Martin Sebor wrote:
> On 06/08/2015 09:12 AM, Jonathan Wakely wrote:
>> The linker script assumes that std::mbstate_t has the name __mbstate_t
>> for linkage purposes, but that's not necessarily true.  For mingw32 it's
>> just a typedef for int, so the patterns don't match.  This adds a new
>> mingw32-specific pattern for codecvt_byname's constructors and
>> destructors, and relaxes the patterns for codecvt so they match
>> __mbstate_t or int.
>
> As a data point, in case other targets have a similar issue, mbstate_t
> is a typedef for char* on AIX, and (based on my old notes) typedef
> struct mbstate_t on HP-UX. (It is a typedef struct __mbstate_t on
> Darwin and Solaris.)

That is useful, thanks. AIX and HP-UX don't use that linker script, so
it's not a problem (for now) that it assumes __mbstate_t.
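To see why the exported names differ between targets, note that whatever type std::mbstate_t resolves to is baked into the mangled names of the codecvt instantiations; a small illustration (the printed name is target- and ABI-dependent):

#include <cwchar>
#include <iostream>
#include <locale>
#include <typeinfo>

int
main ()
{
  // On GCC the name printed here is the mangled type, which embeds the
  // target's definition of std::mbstate_t (__mbstate_t, int, ...) --
  // the same spelling the version-script patterns have to match.
  std::cout << typeid (std::codecvt<char, char, std::mbstate_t>).name ()
	    << '\n';
  return 0;
}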
Re: debug-early branch merged into mainline
On 06/09/2015 04:00 AM, Richard Biener wrote: On Mon, Jun 8, 2015 at 10:25 PM, Aldy Hernandez wrote: On 06/08/2015 02:59 PM, Richard Biener wrote: On June 8, 2015 7:14:19 PM GMT+02:00, Aldy Hernandez wrote: On 06/08/2015 09:30 AM, Richard Biener wrote: On Mon, Jun 8, 2015 at 2:05 PM, Aldy Hernandez wrote: On 06/08/2015 04:26 AM, Richard Biener wrote: On Mon, Jun 8, 2015 at 3:23 AM, Aldy Hernandez What about if the comparison routine gets a named section and an unnamed section? How to compare? That's why I was giving priority to one over the other originally, but I didn't know about problematic qsort implementations. Obviously unnamed and a named section can be sorted like you did in the original patch. Obviously I'm not understanding :). How about this? Ok with adding I've committed the attached patch. v.create (object_block_htab->elements ()); and using v.quick_push () (avoids re-allocations) and with adding a v.release (); For some reason I thought the new C++ vector world released stuff on its own if allocated on the heap. Oh well... Thanks. Aldy commit d2f1a9cd6e91c400ab4fa569565eece1ae08b64d Author: Aldy Hernandez Date: Tue Jun 9 05:39:34 2015 -0400 * varasm.c (output_object_block_htab): Remove. (output_object_block_compare): New. (output_object_blocks): Sort named object_blocks before outputting them. diff --git a/gcc/varasm.c b/gcc/varasm.c index 18f3eac..86a70db 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -7420,14 +7420,31 @@ output_object_block (struct object_block *block) } } -/* A htab_traverse callback used to call output_object_block for - each member of object_block_htab. */ +/* A callback for qsort to compare object_blocks. */ -int -output_object_block_htab (object_block **slot, void *) +static int +output_object_block_compare (const void *x, const void *y) { - output_object_block (*slot); - return 1; + object_block *p1 = *(object_block * const*)x; + object_block *p2 = *(object_block * const*)y; + + if (p1->sect->common.flags & SECTION_NAMED + && !(p2->sect->common.flags & SECTION_NAMED)) +return 1; + + if (!(p1->sect->common.flags & SECTION_NAMED) + && p2->sect->common.flags & SECTION_NAMED) +return -1; + + if (p1->sect->common.flags & SECTION_NAMED + && p2->sect->common.flags & SECTION_NAMED) +return strcmp (p1->sect->named.name, p2->sect->named.name); + + unsigned f1 = p1->sect->common.flags; + unsigned f2 = p2->sect->common.flags; + if (f1 == f2) +return 0; + return f1 < f2 ? -1 : 1; } /* Output the definitions of all object_blocks. */ @@ -7435,7 +7452,23 @@ output_object_block_htab (object_block **slot, void *) void output_object_blocks (void) { - object_block_htab->traverse (NULL); + vec v; + v.create (object_block_htab->elements ()); + object_block *obj; + hash_table::iterator hi; + + FOR_EACH_HASH_TABLE_ELEMENT (*object_block_htab, obj, object_block *, hi) +v.quick_push (obj); + + /* Sort them in order to output them in a deterministic manner, + otherwise we may get .rodata sections in different orders with + and without -g. */ + v.qsort (output_object_block_compare); + unsigned i; + FOR_EACH_VEC_ELT (v, i, obj) +output_object_block (obj); + + v.release (); } /* This function provides a possible implementation of the
Re: Fix more of C/fortran canonical type issues
On Mon, 8 Jun 2015, Jan Hubicka wrote:
> > I think we should instead work towards eliminating the get_alias_set
> > langhook first.  The LTO langhook variant contains the same handling, btw,
> > so just inline that into get_alias_set and see what remains?
>
> I see, i completely missed existence of gimple_get_alias_set. It makes more
> sense now.
>
> Is moving everyting to alias.c realy a desirable thing? If non-C languages do
> not have this rule, why we want to reduce the code quality when compiling
> those?

Well, for consistency and for getting rid of one langhook ;)

Richard.

> Honza

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu,
Graham Norton, HRB 21284 (AG Nuernberg)
Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h
On 22/05/15 17:56, Torvald Riegel wrote: On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote: Hi, While writing atomic_word.h for the ARM backend to fix PR target/66200 I thought it would make more sense to write it all up with atomic primitives instead of providing various fragile bits of inline asssembler. Thus this patch came about. I intend to ask for a specialized version of this to be applied for the ARM backend in any case after testing completes. If this goes in as a cleanup I expect all the ports to be deleting their atomic_word.h implementation in the various ports directories and I'll follow up with patches asking port maintainers to test and apply for each of the individual cases if this is deemed to be good enough. Could you use C++11 atomics right away instead of adapting the wrappers? I spent some time trying to massage guard.cc into using C++11 atomics but it was just easier to write it with the builtins - I consider this to be a step improvement compared to where we are currently. Rewritten to use the builtins in guard.cc instead of std::atomic as that appears to be a bigger project than moving forward compared to where we are. I prefer small steps rather than big steps in these areas. Further I do not believe you can use the C++11 language features in the headers as they were not necessarily part of the standard for tr1 etc. and thus it's better to remain with the builtins, after all I am also continuing with existing practice in the headers. I think the more non-C++11 atomic operations/wrappers we can remove the better. diff --git a/libstdc++-v3/config/cpu/generic/atomic_word.h b/libstdc ++-v3/config/cpu/generic/atomic_word.h index 19038bb..bedce31 100644 --- a/libstdc++-v3/config/cpu/generic/atomic_word.h +++ b/libstdc++-v3/config/cpu/generic/atomic_word.h @@ -29,19 +29,46 @@ #ifndef _GLIBCXX_ATOMIC_WORD_H #define _GLIBCXX_ATOMIC_WORD_H 1 +#include + typedef int _Atomic_word; -// Define these two macros using the appropriate memory barrier for the target. -// The commented out versions below are the defaults. -// See ia64/atomic_word.h for an alternative approach. + +namespace __gnu_cxx _GLIBCXX_VISIBILITY(default) +{ + // Test the first byte of __g and ensure that no loads are hoisted across + // the test. That comment is not quite correct. I'd just say that this is a memory_order_acquire load and a comparison. Done. + inline bool + __test_and_acquire (__cxxabiv1::__guard *__g) + { +unsigned char __c; +unsigned char *__p = reinterpret_cast(__g); +__atomic_load (__p, &__c, __ATOMIC_ACQUIRE); +return __c != 0; + } + + // Set the first byte of __g to 1 and ensure that no stores are sunk + // across the store. Likewise; I'd just say this is a memory_order_release store with 1 as value. Ok. I've now realized that this is superfluous and better to fix the one definition in guard.cc to do the right thing instead of something like this. + inline void + __set_and_release (__cxxabiv1::__guard *__g) + { +unsigned char *__p = reinterpret_cast(__g); +unsigned char val = 1; +__atomic_store (__p, &val, __ATOMIC_RELEASE); + } + +#define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G) __gnu_cxx::__test_and_acquire (G) +#define _GLIBCXX_GUARD_SET_AND_RELEASE(G) __gnu_cxx::__set_and_release (G) // This one prevents loads from being hoisted across the barrier; // in other words, this is a Load-Load acquire barrier. Likewise; I'd just say that this is an mo_acquire fence. -// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h. 
-// #define _GLIBCXX_READ_MEM_BARRIER __asm __volatile ("":::"memory") +// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h. +#define _GLIBCXX_READ_MEM_BARRIER __atomic_thread_fence (__ATOMIC_ACQUIRE) // This one prevents stores from being sunk across the barrier; in other // words, a Store-Store release barrier. Likewise; mo_release fence. -// #define _GLIBCXX_WRITE_MEM_BARRIER __asm __volatile ("":::"memory") +#define _GLIBCXX_WRITE_MEM_BARRIER __atomic_thread_fence (__ATOMIC_RELEASE) + +} I don't think we can remove _GLIBCXX_READ_MEM_BARRIER and _GLIBCXX_WRITE_MEM_BARRIER from atomic_word.h even though they are superseded by the atomics as it is published in the documentation as available macros. It was obvious that alpha, powerpc, aix, ia64 could just fall back to the standard implementations. The cris and sparc implementations have different types for _Atomic_word from generic and the rest - my guess is sparc can remove the _GLIBCXX_READ_MEM_BARRIER and _GLIBCXX_WRITE_MEM_BARRIER but I have no way of testing this is sane. I'll leave that to the respective target maintainers for sparc and cris to deal as appropriate. 2015-06-09 Ramana Radhakrishnan PR c++/66192 PR target/66200 * config/cpu/generic/atomic_word.h (_GLIBCXX_READ_MEM_BARRIER): Define (_GLIBCXX_WRITE_MEM_BARRIER): Like
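As a reader aid, the acquire/release pairing above corresponds to the usual one-time-initialization guard idiom; a deliberately simplified sketch follows (hypothetical, not the actual libsupc++/guard.cc code, which also handles contention and exceptions):

static unsigned char guard;
static int value;

static int
get_value ()
{
  unsigned char c;
  __atomic_load (&guard, &c, __ATOMIC_ACQUIRE);	/* test-and-acquire */
  if (c == 0)
    {
      /* The real guard code serializes concurrent initializers here;
	 omitted for brevity.  */
      value = 42;				/* guarded initialization */
      unsigned char one = 1;
      __atomic_store (&guard, &one, __ATOMIC_RELEASE);	/* set-and-release */
    }
  return value;
}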
Re: [PATCH, ARM] attribute target (thumb,arm) [6/6] respin (5th)
Hi Christian, On 18/05/15 09:41, Christian Bruel wrote: On 05/06/2015 04:29 PM, Christian Bruel wrote: >Implement the -mflip-thump option. Undocumented for internal testing >only. This option artificially inserts alternative attribute thumb/modes >on functions. > >This close the patch set. Thanks for your review, > >Christian > to close the series. redo patch rebased. static bool diff '--exclude=.svn' -ruN gnu_trunk.p5/gcc/gcc/config/arm/arm.opt gnu_trunk.p6/gcc/gcc/config/arm/arm.opt --- gnu_trunk.p5/gcc/gcc/config/arm/arm.opt 2015-05-18 10:20:26.545835080 +0200 +++ gnu_trunk.p6/gcc/gcc/config/arm/arm.opt 2015-05-13 13:13:11.014686529 +0200 @@ -122,6 +122,10 @@ EnumValue Enum(float_abi_type) String(hard) Value(ARM_FLOAT_ABI_HARD) +mflip-thumb +Target Report Var(TARGET_FLIP_THUMB) +Switch ARM/Thumb modes on alternating functions for compiler testing How about adding 'Undocumented' to the properties here? Can we also get a test or two just to sanity check the option? Otherwise the patch looks ok to me. Thanks, Kyrill
[PATCH] Add extensions to dwarf2.def
Hello, I'm currently working on migrating debugging information for Ada from GNAT encodings to standard DWARF. At the moment, I have worked on two topics that I believe are not (completely) supported in standard DWARF: - fixed point types with arbitrary scale factors; - scalar types with biased representations. My goal is to submit an issue on dwarfstd.org in an attempt to introduce these extensions to the next DWARF standard. Before that, though, I would like to make sure that these extensions actually fit the need by having them supported both in GCC and GDB. The two attached patches make these extensions "public" so that no other vendor-specific tags/attributes conflict with them in the future. I cannot submit the patches that actually use these right now because I need first to port them from the 4.9 branch onto mainline (I hope I will be able to do this on early July). May I commit them? I also attached two documents that describe how to use these extensions. I guess this should go to the wiki just like for DW_AT_GNAT_descriptive_type (https://gcc.gnu.org/wiki/DW_AT_GNAT_descriptive_type). I will do this if the patches are integrated. Thank you in advance! include/ * dwarf2.def (DW_TAG_GNU_rational_constant): New tag. (DW_AT_GNU_numerator, DW_AT_GNU_denominator): New attributes. include/ * dwarf2.def (DW_AT_GNU_bias): New attribute. -- Pierre-Marie de Rodat == Introduction == Ada makes it possible to define fixed point types. In order to specify how these will be represented at runtime, the compiler selects a size in bytes and a "scale factor", which is a strictly positive rational number. Assuming type T is associated with the S scale factor, then values V will be represented at runtime by R so that V = S * R. == Fixed points in DWARF today == Since DWARFv3, there are several ways to describe fixed point types, all of these require to use DW_TAG_base_type DIEs. The encoding must be DW_ATE_signed_fixed or DW_ATE_unsigned_fixed and the scale factor must be encoded using one of the following attributes: * DW_AT_decimal_scale, which makes it possible to represent scale factors of the form 10**N (with N any signed integer). * DW_AT_binary_scale, which makes it possible to represent scale factors of the form 2**N (with N any signed integer). * DW_AT_small, which is a reference to a DW_TAG_constant entry providing the scale factor itself. == Limitations == While DW_AT_small enables one to describe any scale factor the two other attributes could, there are still some scale factors that are not encodable as of today. In Ada, it is possible to define fixed point types with scale factors such as 1/3. For instance: type T is delta 0.1 range 0.0 .. 1.0 with Small => 1.0/3.0; Trying to represent 1/3 with a DW_TAG_constant leads to a chicken an egg problem today: the constant is encoded in the DW_AT_const_value attribute, which must be "the actual constant value [...] represented as it would be on the target architecture". 1/3 can be represented on the target architecture, but only with a fixed point type... whose scalar factor is 1/3, or 1/30, etc. == Proposed extension == One new tag and two new attributes are proposed to solve this problem: * DW_TAG_GNU_rational_constant, whose purpose is to describe any rational constant. It must have exactly two attributes: * DW_AT_GNU_numerator and DW_AT_GNU_denominator, whose forms are any constant form (DW_FORM_*data*) and which represent an integer (unsigned, unless the form represents a signed one). 
The rational constant represented by DW_TAG_GNU_rational_constant is N/D where: * N is the integer represented by the DW_AT_GNU_numerator attribute; * D is the integer represented by the DW_AT_GNU_denominator attribute; With this new tag, it is possible to represent arbitrary rational scale factors for fixed point types, enabling one to describe all possible Ada fixed point types. == Introduction == In Ada, it is possible for a scalar type to have a biased representation in some contexts. For instance: type Small_Type is range 50 .. 57; type Record_Type is record A, B : Small_Type; end record; for Record_Type use record A at 0 range 0 .. 2; B at 0 range 3 .. 5; end record; In Record_Type, the A and B fields will be represented on 3 bits each. This is possible because they have a biased representation: * the 000 bit pattern represents 50; * the 001 one represents 51; * etc. So in this example, the representation for A and B have a bias of 50. The general principle is that given some type T, a bias B and a value V, the runtime representation for V is: V - B There is no way to describe this representation in DWARF today. == Proposed extension == One new attribute is proposed to lift this limitation: DW_AT_GNU_bias, whose form is any constant form and which represents an integer (unsigned, unless the form represents a sign
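To make the biased-representation proposal concrete for non-Ada readers, here is a C-style analogue of the runtime encoding described above (this only illustrates what DW_AT_GNU_bias describes; it says nothing about the DWARF encoding itself):

/* Values in [50, 57] are stored in 3 bits as value - 50; a debugger
   needs the bias (50) to turn the raw bits back into user-level
   values, which is exactly what DW_AT_GNU_bias would tell it.  */
struct record_type
{
  unsigned a : 3;	/* raw bits, bias 50 */
  unsigned b : 3;	/* raw bits, bias 50 */
};

static unsigned
encode (unsigned value)
{
  return value - 50;
}

static unsigned
decode (unsigned raw)
{
  return raw + 50;
}

int
main ()
{
  struct record_type r;
  r.a = encode (53);
  r.b = encode (57);
  return (decode (r.a) == 53 && decode (r.b) == 57) ? 0 : 1;
}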
Re: [patch] Adjust hash-table.h and it's pre-requisite includes.
On 06/05/2015 05:24 PM, Andrew MacLeod wrote: > There is a horrible morass of include dependencies between hash-map.h, > mem-stats.h and hash-table.h. There are even includes in both directions > (mem-stats.h and hash-map.h include each other, as do hash-map.h and > hash-table.h.. blech). Some of those files need parts of the other file to > compile, and those whole mess is quite awful. They also manage to include > vec.h into their little party 3 times as well, and it also has some icky > #ifdefs. > > So I spent some time sorting out the situation, and reduced it down to a > straight dependency list, rooted by hash-table.h. There are no double > direction includes, and no header is included more than once. Once sorted > out, I moved the root of this tree into coretypes.h since pretty much every > file requires everything in the dependency chain. This chain consists of > statistics.h, ggc.h, vec.h, hashtab.h, inchash.h, mem-stats-traits.h, > hash-map-traits.h, mem-stats.h, hash-map.h and hash-table.h. Hello Andrew. Thank you for solving issue related to the aforementioned cycle. Few weeks ago, I implemented memory statistics support for hash-{set|table|map}, which internally uses a hash-map and that's the reason why it was a bit awful. Anyway, nice work! Martin > > With hash-table.h at the root of the dependency list, I wondered how many > files actually need just that. So I flattened a source tree such that > coretypes.h included the other required include files, but each .c file > included hash-table.h. Then I tried removing the includes. It turned out > that virtually every file needs hash-table.h. Part of that is due to how > tightly integrated with mem-stats.h it is (they still need each other), and > that is used throughout the compiler. So I think it makes sense to put that > in coretypes.h. > > I also noticed that hash-set.h is included in a lot of places as well. > Wondering how much it was actually needed, I preformed the same flattening > exercise and found that only about 10% of the files in gcc core didn't need > it to compile... the rest all needed it due to hash_set being in a > prototype parameter list or in a structure declared in a commonly used header > file (function.h, gimple-walk.h, tree-core.h, tree.h,...) . It would be a > lot of work to remove this dependency (if its even possible), so I added > hash-set.h to coretypes.h as well. rtl.h needed hash-table.h added to the > GENERATOR list, but not hash-set.. I guess the generators don't use it much > :-) > > The only other thing of note is the change to vec.h. It had an ugly set of > checks to see whether it was being used in a generator file, and if not > whether GC was available, then included it if it wasn't or provided 3 > prototypes if it wasn't suppose to be included. These allows it to compile > when GC isn't available (those routines referencing the GC functions would > never be referred to when GC isnt available).With my other changes, most > of those checks weren't necessary. I also figured it was best to simply > include those 3 prototypes for ggc_free, ggc_round_alloc_size, and > ggc_realloc all the time. When there isn't ggc.h, things remain as they are > now. If there is a ggc.h included, it will provide static confirmation that > those prototypes are up-to-date and in sync with how ggc.h defines them. > > The first patch contains all of those changes. The second one is fully > automated and removes all these headers from every other .c and .h file in > the compiler. 
This also included changes to many of the gen*.c routines. I > adjusted the #include list for all the *generated* .c files to also be up to > date with this patchset as well at the previous one which moved wide-int and > friends into coretypes.h > > This bootstraps with all languages enabled on x86_64-unknown-linux-gnu with > no new regressions. It also causes no failures for all the targets in > config-list.mk. > > OK for trunk? > > Andrew
Re: [PING^2] [PATCH][5a/5] Postpone expanding va_arg until pass_stdarg
Hmmm. One side effect of this is that the line number information available in the target hook gimplify_va_arg_expr, is now just the name of the containing function, rather than the specific use of va_arg. Is there some way to get this more precise location (e.g. gimple_location(stmt) in expand_ifn_va_arg_1, the only caller of said hook)? I don't really want to have to add an extra parameter to the target hook... --Alan Richard Biener wrote: On Thu, 16 Apr 2015, Tom de Vries wrote: [stage1 ping^2] On 10-03-15 16:30, Tom de Vries wrote: [stage1 ping] On 22-02-15 14:13, Tom de Vries wrote: On 19-02-15 14:03, Richard Biener wrote: On Thu, 19 Feb 2015, Tom de Vries wrote: On 19-02-15 11:29, Tom de Vries wrote: Hi, I'm posting this patch series for stage1: - 0001-Disable-lang_hooks.gimplify_expr-in-free_lang_data.patch - 0002-Add-gimple_find_sub_bbs.patch - 0003-Factor-optimize_va_list_gpr_fpr_size-out-of-pass_std.patch - 0004-Handle-internal_fn-in-operand_equal_p.patch - 0005-Postpone-expanding-va_arg-until-pass_stdarg.patch The patch series - based on Michael's initial patch - postpones expanding va_arg until pass_stdarg, which makes pass_stdarg more robust. Bootstrapped and reg-tested on x86_64 using all languages, with unix/ and unix/-m32 testing. I'll post the patches in reply to this email. This patch postpones expanding va_arg until pass_stdarg. We add a new internal function IFN_VA_ARG. During gimplification, we map VA_ARG_EXPR onto a CALL_EXPR with IFN_VA_ARG, which is then gimplified in to a gimple_call. At pass_stdarg, we expand the IFN_VA_ARG gimple_call into actual code. There are a few implementation details worth mentioning: - passing the type beyond gimplification is done by adding a NULL pointer- to-type to IFN_VA_ARG. - there is special handling for IFN_VA_ARG that would be most suited to be placed in gimplify_va_arg_expr. However, that function lacks the scope for the special handling, so it's placed awkwardly in gimplify_modify_expr. - there's special handling in case the va_arg type is variable-sized. gimplify_modify_expr adds a WITH_SIZE_EXPR to the CALL_EXPR IFN_VA_ARG for variable-sized types. However, this is gimplified into a gimple_call which does not have the possibility to wrap it's result in a WITH_SIZE_EXPR. So we're adding the size argument of the WITH_SIZE_EXPR as argument to IFN_VA_ARG, and at expansion in pass_stdarg, wrap the result of the gimplification of IFN_VA_ARG in a WITH_SIZE_EXPR, such that the subsequent gimplify_assign will generate a memcpy if necessary. - when gimplifying the va_arg argument ap, it may not be addressable. So gimplification will generate a copy ap.1 = ap, and use &ap.1 as argument. This means that we have to copy back the ap.1 value to ap after IFN_VA_ARG. The copy is classified by the va_list_gpr/fpr_size optimization as an escape, so it inhibits optimization. The tree-ssa/stdarg-2.c f15 update is because of that. OK for stage1? Looks mostly good, though it looks like with -O0 this doesn't delay lowering of va-arg and thus won't "fix" offloading. Can you instead introduce a PROP_gimple_lva, provide it by the stdarg pass and add a pass_lower_vaarg somewhere where pass_lower_complex_O0 is run that runs of !PROP_gimple_lva (and also provides it), and require PROP_gimple_lva by pass_expand? (just look for PROP_gimple_lcx for the complex stuff to get an idea what needs to be touched) Updated according to comments. Furthermore (having updated the patch series to recent trunk), I'm dropping the ACCEL_COMPILER bit in pass_stdarg::gate. 
AFAIU the comment there relates to this patch. Retested as before. OK for stage1? Ping. Ping again. Patch originally posted at: https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01332.html . Ok. Thanks, Richard. Thanks, - Tom Btw, I'm wondering if as run-time optimization we can tentatively set PROP_gimple_lva at the start of the gimple pass, and unset it in gimplify_va_arg_expr. That way we would avoid the loop in expand_ifn_va_arg_1 (over all bbs and gimples) in functions without va_arg. Taken care of in follow-up patch 5b.
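For readers less familiar with the stdarg machinery, the affected source construct is any use of va_arg; a small example, with comments summarizing the new lowering described in this thread:

#include <stdarg.h>

/* With the patch series, the va_arg below is gimplified into an
   internal IFN_VA_ARG call (carrying the requested type as an extra
   pointer argument) and only expanded into target-specific code later,
   in pass_stdarg / pass_lower_vaarg, rather than during
   gimplification.  */
int
sum (int count, ...)
{
  va_list ap;
  int total = 0;
  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}

int
main ()
{
  return sum (3, 1, 2, 3) == 6 ? 0 : 1;
}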
Re: [PING^2] [PATCH][5a/5] Postpone expanding va_arg until pass_stdarg
On Tue, 9 Jun 2015, Alan Lawrence wrote: > Hmmm. One side effect of this is that the line number information available in > the target hook gimplify_va_arg_expr, is now just the name of the containing > function, rather than the specific use of va_arg. Is there some way to get > this more precise location (e.g. gimple_location(stmt) in expand_ifn_va_arg_1, > the only caller of said hook)? I don't really want to have to add an extra > parameter to the target hook... The x86 variant doesn't use any locations but if then the caller of the target hook (expand_ifn_va_arg_1) should assign the IFNs location to all statements expanded from it (it could set input_location to that during the target hook call...) Richard. > --Alan > > Richard Biener wrote: > > On Thu, 16 Apr 2015, Tom de Vries wrote: > > > > > [stage1 ping^2] > > > On 10-03-15 16:30, Tom de Vries wrote: > > > > [stage1 ping] > > > > On 22-02-15 14:13, Tom de Vries wrote: > > > > > On 19-02-15 14:03, Richard Biener wrote: > > > > > > On Thu, 19 Feb 2015, Tom de Vries wrote: > > > > > > > > > > > > > On 19-02-15 11:29, Tom de Vries wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I'm posting this patch series for stage1: > > > > > > > > - 0001-Disable-lang_hooks.gimplify_expr-in-free_lang_data.patch > > > > > > > > - 0002-Add-gimple_find_sub_bbs.patch > > > > > > > > - > > > > > > > > 0003-Factor-optimize_va_list_gpr_fpr_size-out-of-pass_std.patch > > > > > > > > - 0004-Handle-internal_fn-in-operand_equal_p.patch > > > > > > > > - 0005-Postpone-expanding-va_arg-until-pass_stdarg.patch > > > > > > > > > > > > > > > > The patch series - based on Michael's initial patch - postpones > > > > > > > > expanding > > > > > > > > va_arg > > > > > > > > until pass_stdarg, which makes pass_stdarg more robust. > > > > > > > > > > > > > > > > Bootstrapped and reg-tested on x86_64 using all languages, with > > > > > > > > unix/ and > > > > > > > > unix/-m32 testing. > > > > > > > > > > > > > > > > I'll post the patches in reply to this email. > > > > > > > > > > > > > > > This patch postpones expanding va_arg until pass_stdarg. > > > > > > > > > > > > > > We add a new internal function IFN_VA_ARG. During gimplification, > > > > > > > we > > > > > > > map > > > > > > > VA_ARG_EXPR onto a CALL_EXPR with IFN_VA_ARG, which is then > > > > > > > gimplified > > > > > > > in to a > > > > > > > gimple_call. At pass_stdarg, we expand the IFN_VA_ARG gimple_call > > > > > > > into > > > > > > > actual > > > > > > > code. > > > > > > > > > > > > > > There are a few implementation details worth mentioning: > > > > > > > - passing the type beyond gimplification is done by adding a NULL > > > > > > > pointer- > > > > > > >to-type to IFN_VA_ARG. > > > > > > > - there is special handling for IFN_VA_ARG that would be most > > > > > > > suited > > > > > > > to be > > > > > > >placed in gimplify_va_arg_expr. However, that function lacks > > > > > > > the > > > > > > > scope for > > > > > > >the special handling, so it's placed awkwardly in > > > > > > > gimplify_modify_expr. > > > > > > > - there's special handling in case the va_arg type is > > > > > > > variable-sized. > > > > > > >gimplify_modify_expr adds a WITH_SIZE_EXPR to the CALL_EXPR > > > > > > > IFN_VA_ARG for > > > > > > >variable-sized types. However, this is gimplified into a > > > > > > > gimple_call which > > > > > > >does not have the possibility to wrap it's result in a > > > > > > > WITH_SIZE_EXPR. 
So > > > > > > >we're adding the size argument of the WITH_SIZE_EXPR as > > > > > > > argument to > > > > > > >IFN_VA_ARG, and at expansion in pass_stdarg, wrap the result of > > > > > > > the > > > > > > >gimplification of IFN_VA_ARG in a WITH_SIZE_EXPR, such that the > > > > > > > subsequent > > > > > > >gimplify_assign will generate a memcpy if necessary. > > > > > > > - when gimplifying the va_arg argument ap, it may not be > > > > > > > addressable. > > > > > > > So > > > > > > >gimplification will generate a copy ap.1 = ap, and use &ap.1 as > > > > > > > argument. > > > > > > >This means that we have to copy back the ap.1 value to ap after > > > > > > > IFN_VA_ARG. > > > > > > >The copy is classified by the va_list_gpr/fpr_size optimization > > > > > > > as > > > > > > > an > > > > > > >escape, so it inhibits optimization. The tree-ssa/stdarg-2.c > > > > > > > f15 > > > > > > > update is > > > > > > >because of that. > > > > > > > > > > > > > > OK for stage1? > > > > > > Looks mostly good, though it looks like with -O0 this doesn't delay > > > > > > lowering of va-arg and thus won't "fix" offloading. Can you instead > > > > > > introduce a PROP_gimple_lva, provide it by the stdarg pass and add > > > > > > a pass_lower_vaarg somewhere where pass_lower_complex_O0 is run > > > > > > that runs of !PROP_gimple_lva (and also provides it), and require > > > > > > PROP_gimple_lva by pass_expand? (just look for PROP_gimple_lcx for > > > >
Re: [PING^2][PATCH][3/3][PR65460] Mark offloaded functions as parallelized
On Mon, 8 Jun 2015, Tom de Vries wrote: > On 17/04/15 12:08, Tom de Vries wrote: > > On 20-03-15 12:38, Tom de Vries wrote: > > > On 19-03-15 12:05, Tom de Vries wrote: > > > > On 18-03-15 18:22, Tom de Vries wrote: > > > > > Hi, > > > > > > > > > > this patch fixes PR65460. > > > > > > > > > > The patch marks offloaded functions as parallelized, which means the > > > > > parloops > > > > > pass no longer attempts to modify that function. > > > > > > > > Updated patch to postpone mark_parallelized_function until the > > > > corresponding > > > > cgraph_node is available, to ensure it works with the updated > > > > mark_parallelized_function from patch 2/3. > > > > > > > > > > Updated to eliminate mark_parallelized_function. > > > > > > Bootstrapped and reg-tested on x86_64. > > > > > > OK for stage4? > > > > > > > ping. > > ping^2. Original post at > https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01063.html . Ok, but shouldn't it be set before calling add_new_function as add_new_function might run passes that wouldn't identify the function as parallelized? Richard. > > > > OK for stage1? > > > > Thanks, > - Tom > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, 9 Jun 2015, Richard Biener wrote:

>> Tweaking it so that (6<<X)==0 is turned into X>=31 for TYPE_OVERFLOW_WRAPS
>> and false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
>
> Hm, yes.  I think signed overflow != shift amount overflow, so testing the
> overflow macros for this isn't valid.

Would it be ok to always turn it to X>=31 then? (the value 31 is
conveniently already computed in 'cand')

--
Marc Glisse
Re: conditional lim
On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova wrote: > Hi Richard, > > Here is some explanation. I hope you let me know if I need to clarify > something. > > Also, you asked me about concrete example, to make sure you don’t miss > my answer here is the link: > https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. > > Also, I doubt whether it’s convenient for you to create a build with > my patch or not. May be to clarify things you could send me some > examples/concrete cases, then I’ll compile them with > –fdump-tree-loopinit-details and –fdump-tree-lim-details and send you > these dumps. May be these dumps will be useful. (I’ll only disable > cleanup_cfg TODO after lim to let you know the exact picture after > lim). > > What do you think? > > 1. invariantness _dom_walker – > > 1.1 for each GIMPLE_COND in given bb calls handle_cond_stmt to call > for true and false edges handle_branch_edge, which calls SET_TARGET_OF > for all bb ‘predicated’ by given GIMPLE_COND. > > SET_TARGET_OF sets in basic_blocks aux 2 facts: > > a) this is true or false edge; > > b) link to cond stmt; > > Handle_branch_edge works this way: > > If (cond1) > > { > > bb1; > > if (cond2} > >{ > >bb2; > > } > >Being called for cond1, it sets cond1 as condition for both bb1 and > bb2 (the whole branch for cond1, ie also for bb containing cond2), > then this method will be called (as there is dominance order) for > cond2 to correct things (ie to set cond2 as condition for bb2). Hmm, why not track the current condition as state during the DOM walk and thus avoid processing more than one basic-block in handle_branch_edge? Thus get rid of handle_branch_edge and instead do everything in handle_cond_stmt plus the dom-walkers BB visitor? I see you don't handle BBs with multiple predecessors - that's ok, but are you sure you don't run into correctness issues when not marking such BBs as predicated? This misses handling of, say if (a || b) bb; which is a pity (but can be fixed later if desired). I note that collecting predicates has similarities to what if-conversion does in tree-ifcvt.c (even if its implementation is completely different, of course). > 1.2 As 1.1 goes we identify whether some bb is predicated by some > condition or not. > > bb->aux->type will be [TRUE/FALSE]_TARGET_OF and > bb->aux->cond_stmt=cond stmt (the nearest condition). > > If bb is always executed bb->aux->type = ALWAYS_EXECUTED_IN, > bb->loop->loop (this info was available in the clean build). > > 1.3 As this walker is called in dominance order, information about > condition is available when invariantness_dom_walker is called for > given bb. So we can make some computations based on bb->aux > structure. This is done in check_conditionally_executed. The main goal > of this method is to check that the root condition is always executed > in the loop. I did so to avoid situation like this > > Loop: > >Jmp somewhere; > > If (cond1) > > If (cond2) > > Etc > > By ‘root condition’ I mean cond1 in this case (the first cond in > dominance order). Ok, but this can't happen (an uncontional jump to somehwere). It will always be a conditional jump (a loop exit) and thus should be handled already. Did you have a testcase that was mishandled? > If ‘root condition’ is not always executed we disable (free) all the > info in bb->aux, ie all the blocks becomes neither conditionally nor > always executed according to bb->aux. > > There is some optimization to don’t go each time trough the conditions > chain (cond2->cond1), let me skip such details for now. 
> > > > 1.4 Then we calculate tgt_level (which is used in move_computations later) > > The patch is the same for phi and regular stmt (calculate_tgt_level) > > 1) If stmt is conditionally executed we compute max movement > (determine_max_movement) starting from > get_lim_data(condition)->max_loop. > > 2) If stmt is not cond executed as start level for > determine_max_movement computations we choose ALWAYS_EXECUTED_IN. > > To clarify why, see the comment > > /* The reason why we don't set start_level for MOVE_POSSIBLE > > statements to more outer level is: this statement could depend on > > conditional statements and even probably is not defined without this > > condition. So either we should analyze these ones and move > > under condition somehow or keep more strong start_level . */ > > As you noted in your review there was some refactoring. Of course, I > had no goal to refactor existing code, I intended to remove some > duplication which I caused by my changes. I hope we discuss in > details later if you don’t like some refactoring. Refactoring makes a big patch harder to review because it ends up containing no-op changes. It also makes it appear bigger than it really is ;) A reasonable solution is to factor out the refactoring to a separate patch. > 2. store_
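Since the concrete example is only referenced by link above, here is a small self-contained function of the shape this work targets (purely illustrative):

/* The division below is loop invariant but only executed under a
   condition, so plain LIM cannot hoist it without possibly introducing
   a trap; the approach described above records the guarding condition
   (in bb->aux) so the computation can be moved together with, or
   under, that condition.  */
int
f (const int *a, int n, int x, int y)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    {
      if (x != 0)	/* condition recorded for the guarded block */
	s += y / x;	/* invariant, but only safe under x != 0 */
      s += a[i];
    }
  return s;
}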
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, 9 Jun 2015, Marc Glisse wrote:

> On Tue, 9 Jun 2015, Richard Biener wrote:
>
> > > Tweaking it so that (6<<X)==0 is turned into X>=31 for TYPE_OVERFLOW_WRAPS
> > > and false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
> >
> > Hm, yes.  I think signed overflow != shift amount overflow, so testing the
> > overflow macros for this isn't valid.
>
> Would it be ok to always turn it to X>=31 then? (the value 31 is conveniently
> already computed in 'cand')

I think so.

Richard.
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, 9 Jun 2015, Richard Biener wrote:

> On Tue, 9 Jun 2015, Marc Glisse wrote:
>
> > On Tue, 9 Jun 2015, Richard Biener wrote:
> >
> > > > Tweaking it so that (6<<X)==0 is turned into X>=31 for TYPE_OVERFLOW_WRAPS
> > > > and false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
> > >
> > > Hm, yes.  I think signed overflow != shift amount overflow, so testing the
> > > overflow macros for this isn't valid.
> >
> > Would it be ok to always turn it to X>=31 then? (the value 31 is
> > conveniently already computed in 'cand')
>
> I think so.

Or even ((unsigned)X - 31) < 1 (I probably got that wrong) to properly
say X>=29 && X<32, that is, preserve the implicit upper bound on X we
have because it is used in a shift.

Richard.

> Richard.

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu,
Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
Hi! As mentioned in the PR, when trying to split: (insn 7 15 13 2 (set (reg:DI 0 ax [92]) (mem:DI (plus:SI (plus:SI (mult:SI (reg/v:SI 1 dx [orig:89 b ] [89]) (const_int 8 [0x8])) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_TP)) (reg:SI 0 ax [91])) [1 a S8 A64])) rh1212265.i:2 85 {*movdi_internal} (nil)) which has collisions == 2 (both ax and dx used on lhs and both ax and dx used in the memory address), we generate invalid insn - lea with %gs: or %fs: in it. This patch fixes it by using normal lea instead (so remove the unspec UNSPEC_TP from the address for lea) and duplicating the unspec UNSPEC_TP to all the memory loads. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5/4.9/4.8? 2015-06-09 Jakub Jelinek PR target/66470 * config/i386/i386.c (ix86_split_long_move): For collisions involving direct tls segment refs, move the UNSPEC_TP out of the address for lea, to each of the memory loads. * gcc.dg/tls/pr66470.c: New test. --- gcc/config/i386/i386.c.jj 2015-06-08 15:41:19.0 +0200 +++ gcc/config/i386/i386.c 2015-06-09 11:50:29.960859723 +0200 @@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[]) Do an lea to the last part and use only one colliding move. */ else if (collisions > 1) { - rtx base; + rtx base, addr, tls_base = NULL_RTX; collisions = 1; @@ -22877,10 +22877,45 @@ ix86_split_long_move (rtx operands[]) if (GET_MODE (base) != Pmode) base = gen_rtx_REG (Pmode, REGNO (base)); - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); + addr = XEXP (part[1][0], 0); + if (TARGET_TLS_DIRECT_SEG_REFS) + { + struct ix86_address parts; + int ok = ix86_decompose_address (addr, &parts); + gcc_assert (ok); + if (parts.seg == DEFAULT_TLS_SEG_REG) + { + /* It is not valid to use %gs: or %fs: in +lea though, so we need to remove it from the +address used for lea and add it to each individual +memory loads instead. */ + addr = copy_rtx (addr); + rtx *x = &addr; + while (GET_CODE (*x) == PLUS) + { + for (i = 0; i < 2; i++) + if (GET_CODE (XEXP (*x, i)) == UNSPEC + && XINT (XEXP (*x, i), 1) == UNSPEC_TP) + { + tls_base = XEXP (*x, i); + *x = XEXP (*x, 1 - i); + break; + } + if (tls_base) + break; + x = &XEXP (*x, 0); + } + gcc_assert (tls_base); + } + } + emit_insn (gen_rtx_SET (base, addr)); + if (tls_base) + base = gen_rtx_PLUS (GET_MODE (base), base, tls_base); part[1][0] = replace_equiv_address (part[1][0], base); for (i = 1; i < nparts; i++) { + if (tls_base) + base = copy_rtx (base); tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i); part[1][i] = replace_equiv_address (part[1][i], tmp); } --- gcc/testsuite/gcc.dg/tls/pr66470.c.jj 2015-06-09 11:59:05.543954781 +0200 +++ gcc/testsuite/gcc.dg/tls/pr66470.c 2015-06-09 11:58:43.0 +0200 @@ -0,0 +1,29 @@ +/* PR target/66470 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target tls } */ + +extern __thread unsigned long long a[10]; +extern __thread struct S { int a, b; } b[10]; + +unsigned long long +foo (long x) +{ + return a[x]; +} + +struct S +bar (long x) +{ + return b[x]; +} + +#ifdef __SIZEOF_INT128__ +extern __thread unsigned __int128 c[10]; + +unsigned __int128 +baz (long x) +{ + return c[x]; +} +#endif Jakub
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, 9 Jun 2015, Richard Biener wrote: On Tue, 9 Jun 2015, Richard Biener wrote: On Tue, 9 Jun 2015, Marc Glisse wrote: On Tue, 9 Jun 2015, Richard Biener wrote: Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS and false for TYPE_OVERFLOW_UNDEFINED is probably more controversial. Hm, yes. I think signed overflow != shift amount overflow, so testing the overflow macros for this isn't valid. Would it be ok to always turn it to X>=31 then? (the value 31 is conveniently already computed in 'cand') I think so. Or even ((unsigned)X - 31) < 1 (I probably got that wrong) to properly say X>=29 && X<32, that is, preserve the implicit upper bound on X we have because it is used in a shift. I don't understand in what sense this preserves the upper bound. I would understand storing a range for X (when it is an SSA_NAME, and it would require a lot of care not to propagate backwards too far), or more simply introducing if(X>=32) __builtin_unreachable();. But you seem to be talking about generating more complicated code so that if someone checks (6<<123)==0 it returns false? -- Marc Glisse
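A small sketch of the alternative Marc mentions, i.e. recording the known bound on a shift count as a range hint instead of rewriting the comparison (illustrative only; the function and constants are made up):

unsigned
foo (unsigned x)
{
  /* Tell the optimizers x is a valid shift count for a 32-bit type;
     value-range propagation can then rely on x < 32 downstream.  */
  if (x >= 32)
    __builtin_unreachable ();
  return (6u << x) == 0 ? 1u : 0u;
}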
Re: [PATCH] Simple optimization for MASK_STORE.
On Wed, May 20, 2015 at 4:00 PM, Yuri Rumyantsev wrote: > Hi All, > > Here is updated patch to optimize mask stores. The main goal of it is > to avoid execution of mask store if its mask is zero vector since > loads that follow it can be blocked. > The following changes were done: > 1. A test on sink legality was added - it simply prohibits to cross > statements with non-null vdef or vuse. > 2. New phi node is created in join block for moved MASK_STORE statements. > 3. Test was changed to check that 2 MASK_STORE statements are not > moved to the same block. > 4. New field was added to loop_vec_info structure to mark loops > having MASK_STORE's. > > Any comments will be appreciated. > Yuri. I still don't understand why you need the new target hook. If we have a masked load/store then the mask is computed by an assignment with a VEC_COND_EXPR (in your example) and thus a test for a zero mask is expressible as just if (vect__ifc__41.17_167 == { 0, 0, 0, 0... }) or am I missing something? As we dont' want this transform on all targets (or always) we can control it by either a --param or a new flag which default targets can adjust. [Is the hazard still present with Skylake or other AVX512 implementations for example?] +/* Helper for optimize_mask_stores: returns true if STMT sinking to end + of BB is valid and false otherwise. */ + +static bool +is_valid_sink (gimple stmt) +{ so STMT is either a masked store or a masked load? You shouldn't walk all stmts here but instead rely on the factored use-def def-use chains of virtual operands. +void +optimize_mask_stores (struct loop *loop) +{ + basic_block bb = loop->header; + gimple_stmt_iterator gsi; + gimple stmt; + auto_vec worklist; + + if (loop->dont_vectorize + || loop->num_nodes > 2) +return; looks like no longer needed given the place this is called from now (or does it guard against outer loop vectorization as well?) Also put it into tree-vect-loop.c and make it static. + /* Loop was not vectorized if mask does not have vector type. */ + if (!VECTOR_TYPE_P (TREE_TYPE (mask))) + return; this should be always false + store_bb->frequency = bb->frequency / 2; + efalse->probability = REG_BR_PROB_BASE / 2; I think the == 0 case should end up as unlikely. + if (dom_info_available_p (CDI_POST_DOMINATORS)) + set_immediate_dominator (CDI_POST_DOMINATORS, bb, store_bb); post dominators are not available in the vectorizer. + /* Put definition statement of stored value in STORE_BB +if possible. */ + arg3 = gimple_call_arg (last, 3); + if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3)) + { + def_stmt = SSA_NAME_DEF_STMT (arg3); + /* Move def_stmt to STORE_BB if it is in the same bb and +it is legal. */ + if (gimple_bb (def_stmt) == bb + && is_valid_sink (def_stmt)) ok, so you move arbitrary statements. You can always move non-memory statements and if you keep track of the last store you moved you can check if gimple_vuse () is the same as on that last store and be done with that, or if you sink another store (under the same condition) then just update the last store. Otherwise this looks better now. Thus - get rid of the target hook and of is_valid_sink. Thanks, Richard. > 2015-05-20 Yuri Rumyantsev > > * config/i386/i386.c: Include files stringpool.h and tree-ssanames.h. > (ix86_vectorize_is_zero_vector): New function. > (TARGET_VECTORIZE_IS_ZERO_VECTOR): New target macro > * doc/tm.texi.in: Add @hook TARGET_VECTORIZE_IS_ZERO_VECTOR. > * doc/tm.texi: Updated. > * target.def (is_zero_vector): New DEFHOOK. 
> * tree-vect-stmts.c : Include tree-into-ssa.h. > (vectorizable_mask_load_store): Initialize has_mask_store field. > (is_valid_sink): New function. > (optimize_mask_stores): New function. > * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for > loops having masked stores. > * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and > correspondent macros. > (optimize_mask_stores): Update prototype. > > gcc/testsuite/ChangeLog: > * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
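For reference, a source-level sketch of the kind of loop this work targets (illustrative only, not the new testcase): after if-conversion and vectorization the guarded store becomes a MASK_STORE, and the proposed transform skips it - together with the loads that can be blocked behind it - whenever the whole mask is zero.

void
foo (int *restrict p, const int *restrict q, int n)
{
  for (int i = 0; i < n; i++)
    if (q[i] > 0)        /* becomes the vector comparison mask */
      p[i] = q[i] + 1;   /* becomes a MASK_STORE guarded by mask != {0,...} */
}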
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, 9 Jun 2015, Marc Glisse wrote: > On Tue, 9 Jun 2015, Richard Biener wrote: > > > On Tue, 9 Jun 2015, Richard Biener wrote: > > > > > On Tue, 9 Jun 2015, Marc Glisse wrote: > > > > > > > On Tue, 9 Jun 2015, Richard Biener wrote: > > > > > > > > > > Tweaking it so that (6<<X)==0 becomes X>=31 for TYPE_OVERFLOW_WRAPS > > > > > > and > > > > > > false for TYPE_OVERFLOW_UNDEFINED is probably more controversial. > > > > > > > > > > Hm, yes. I think signed overflow != shift amount overflow, so testing > > > > > the > > > > > overflow macros for this isn't valid. > > > > > > > > Would it be ok to always turn it to X>=31 then? (the value 31 is > > > > conveniently > > > > already computed in 'cand') > > > > > > I think so. > > > > Or even ((unsigned)X - 31) < 1 (I probably got that wrong) to properly > > say X>=29 && X<32, that is, preserve the implicit upper bound on X > > we have because it is used in a shift. > > I don't understand in what sense this preserves the upper bound. I would > understand storing a range for X (when it is an SSA_NAME, and it would require > a lot of care not to propagate backwards too far), or more simply introducing > if(X>=32) __builtin_unreachable();. But you seem to be talking about > generating more complicated code so that if someone checks (6<<123)==0 it > returns false? Well, I'm mixing simplifying the computation and preserving extra info we got from the complex computation. So yes, you are right. Richard.
[PATCH] Fix PR66423
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-06-09 Richard Biener PR middle-end/66423 * match.pd: Handle A % (unsigned)(1 << B). * gcc.dg/fold-modpow2.c: New testcase. Index: gcc/match.pd === *** gcc/match.pd(revision 224271) --- gcc/match.pd(working copy) *** (define_operator_list swapped_tcc_compar *** 248,258 (lshift INTEGER_CST@1 @2)) (for mod (trunc_mod floor_mod) (simplify ! (mod @0 (power_of_two_cand@1 @2)) (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0)) && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0) !(bit_and @0 (minus @1 { build_int_cst (TREE_TYPE (@1), 1); }) /* X % Y is smaller than Y. */ (for cmp (lt ge) --- 248,259 (lshift INTEGER_CST@1 @2)) (for mod (trunc_mod floor_mod) (simplify ! (mod @0 (convert?@3 (power_of_two_cand@1 @2))) (if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0)) + && tree_nop_conversion_p (type, TREE_TYPE (@3)) && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0) !(bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); })) /* X % Y is smaller than Y. */ (for cmp (lt ge) Index: gcc/testsuite/gcc.dg/fold-modpow2.c === *** gcc/testsuite/gcc.dg/fold-modpow2.c (revision 0) --- gcc/testsuite/gcc.dg/fold-modpow2.c (working copy) *** *** 0 --- 1,11 + /* { dg-do compile } */ + /* { dg-options "-fdump-tree-original" } */ + + unsigned int + my_mod (unsigned int a, unsigned int b) + { + return a % (1 << b); + } + + /* The above should be simplified to (unsigned int) ((1 << b) + -1) & a */ + /* { dg-final { scan-tree-dump "& a;" "original" } } */
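A quick standalone check of the identity behind the fold (plain C, illustrative; it assumes an unsigned left operand and a power-of-two divisor, as the pattern requires):

#include <assert.h>

int
main (void)
{
  for (unsigned b = 0; b < 16; b++)
    for (unsigned a = 0; a < 4096; a++)
      assert (a % (1u << b) == (a & ((1u << b) - 1)));
  return 0;
}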
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 1:57 PM, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, when trying to split: > (insn 7 15 13 2 (set (reg:DI 0 ax [92]) > (mem:DI (plus:SI (plus:SI (mult:SI (reg/v:SI 1 dx [orig:89 b ] [89]) > (const_int 8 [0x8])) > (unspec:SI [ > (const_int 0 [0]) > ] UNSPEC_TP)) > (reg:SI 0 ax [91])) [1 a S8 A64])) rh1212265.i:2 85 > {*movdi_internal} > (nil)) > which has collisions == 2 (both ax and dx used on lhs and both > ax and dx used in the memory address), we generate invalid insn > - lea with %gs: or %fs: in it. This patch fixes it by using > normal lea instead (so remove the unspec UNSPEC_TP from the address > for lea) and duplicating the unspec UNSPEC_TP to all the memory loads. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk/5/4.9/4.8? > > 2015-06-09 Jakub Jelinek > > PR target/66470 > * config/i386/i386.c (ix86_split_long_move): For collisions > involving direct tls segment refs, move the UNSPEC_TP out of > the address for lea, to each of the memory loads. > > * gcc.dg/tls/pr66470.c: New test. > > --- gcc/config/i386/i386.c.jj 2015-06-08 15:41:19.0 +0200 > +++ gcc/config/i386/i386.c 2015-06-09 11:50:29.960859723 +0200 > @@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[]) > Do an lea to the last part and use only one colliding move. */ >else if (collisions > 1) > { > - rtx base; > + rtx base, addr, tls_base = NULL_RTX; > > collisions = 1; > > @@ -22877,10 +22877,45 @@ ix86_split_long_move (rtx operands[]) > if (GET_MODE (base) != Pmode) > base = gen_rtx_REG (Pmode, REGNO (base)); > > - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); > + addr = XEXP (part[1][0], 0); > + if (TARGET_TLS_DIRECT_SEG_REFS) > + { > + struct ix86_address parts; > + int ok = ix86_decompose_address (addr, &parts); > + gcc_assert (ok); > + if (parts.seg == DEFAULT_TLS_SEG_REG) > + { > + /* It is not valid to use %gs: or %fs: in > +lea though, so we need to remove it from the > +address used for lea and add it to each individual > +memory loads instead. */ > + addr = copy_rtx (addr); > + rtx *x = &addr; > + while (GET_CODE (*x) == PLUS) Why not use RTX iterators here? IMO, it would be much more readable. Uros. > + { > + for (i = 0; i < 2; i++) > + if (GET_CODE (XEXP (*x, i)) == UNSPEC > + && XINT (XEXP (*x, i), 1) == UNSPEC_TP) > + { > + tls_base = XEXP (*x, i); > + *x = XEXP (*x, 1 - i); > + break; > + } > + if (tls_base) > + break; > + x = &XEXP (*x, 0); > + } > + gcc_assert (tls_base); > + } > + } > + emit_insn (gen_rtx_SET (base, addr)); > + if (tls_base) > + base = gen_rtx_PLUS (GET_MODE (base), base, tls_base); > part[1][0] = replace_equiv_address (part[1][0], base); > for (i = 1; i < nparts; i++) > { > + if (tls_base) > + base = copy_rtx (base); > tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i); > part[1][i] = replace_equiv_address (part[1][i], tmp); > } > --- gcc/testsuite/gcc.dg/tls/pr66470.c.jj 2015-06-09 11:59:05.543954781 > +0200 > +++ gcc/testsuite/gcc.dg/tls/pr66470.c 2015-06-09 11:58:43.0 +0200 > @@ -0,0 +1,29 @@ > +/* PR target/66470 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-require-effective-target tls } */ > + > +extern __thread unsigned long long a[10]; > +extern __thread struct S { int a, b; } b[10]; > + > +unsigned long long > +foo (long x) > +{ > + return a[x]; > +} > + > +struct S > +bar (long x) > +{ > + return b[x]; > +} > + > +#ifdef __SIZEOF_INT128__ > +extern __thread unsigned __int128 c[10]; > + > +unsigned __int128 > +baz (long x) > +{ > + return c[x]; > +} > +#endif > > Jakub
Re: [patch] Adjust gcc-plugin.h
On 06/09/2015 03:56 AM, Richard Biener wrote: On Mon, Jun 8, 2015 at 7:52 PM, Andrew MacLeod wrote: the gcc source files need to see the internal bits in plugin.h, as well as the common decls in gcc-plugin.h. So we could change the includes as you suggest, but we'd have to copy all the decls from gcc-inlcude.h to plugin.h so the gcc functions can see them. And then the plugins would be exposed to all the internal APIs and decls present in plugins.h plugins are exposed to all internals of GCC anyway. gcc-plugin.h should really just be a #include kitchen-sink. Im presuming we didnt want to do that and thats why there were 2 files to start with. No, gcc-plugin.h was introduced to make the set of includes required for plugins "stable". Richard. I didn't actually realize that.. at least its serving its true purpose now :-) So hows this. Bootstrapping and testruns proceeding now, but it should have the same results as the other patch. Assuming no regressions, OK? Andrew * gcc-plugin.h: Move decls to plugin.h and include it. * plugin.h: Relocate decls from gcc-plugin.h * ggc-page.c: Include required header files. * passes.c: Likewise. * cgraphunit.c: Likewise. Index: gcc-plugin.h === *** gcc-plugin.h (revision 224250) --- gcc-plugin.h (working copy) *** along with GCC; see the file COPYING3. *** 27,33 #include "config.h" #include "system.h" #include "coretypes.h" - #include "highlev-plugin-common.h" #include "tm.h" #include "hard-reg-set.h" #include "input.h" --- 27,32 *** along with GCC; see the file COPYING3. *** 49,186 #include "tree-core.h" #include "fold-const.h" #include "tree-check.h" ! ! /* Event names. */ ! enum plugin_event ! { ! # define DEFEVENT(NAME) NAME, ! # include "plugin.def" ! # undef DEFEVENT ! PLUGIN_EVENT_FIRST_DYNAMIC ! }; ! ! /* All globals declared here have C linkage to reduce link compatibility !issues with implementation language choice and mangling. */ ! #ifdef __cplusplus ! extern "C" { ! #endif ! ! extern const char **plugin_event_name; ! ! struct plugin_argument ! { ! char *key;/* key of the argument. */ ! char *value; /* value is optional and can be NULL. */ ! }; ! ! /* Additional information about the plugin. Used by --help and --version. */ ! ! struct plugin_info ! { ! const char *version; ! const char *help; ! }; ! ! /* Represents the gcc version. Used to avoid using an incompatible plugin. */ ! ! struct plugin_gcc_version ! { ! const char *basever; ! const char *datestamp; ! const char *devphase; ! const char *revision; ! const char *configuration_arguments; ! }; ! ! /* Object that keeps track of the plugin name and its arguments. */ ! struct plugin_name_args ! { ! char *base_name; /* Short name of the plugin (filename without !.so suffix). */ ! const char *full_name;/* Path to the plugin as specified with !-fplugin=. */ ! int argc; /* Number of arguments specified with !-fplugin-arg-... */ ! struct plugin_argument *argv; /* Array of ARGC key-value pairs. */ ! const char *version; /* Version string provided by plugin. */ ! const char *help; /* Help string provided by plugin. */ ! }; ! ! /* The default version check. Compares every field in VERSION. */ ! ! extern bool plugin_default_version_check (struct plugin_gcc_version *, ! struct plugin_gcc_version *); ! ! /* Function type for the plugin initialization routine. Each plugin module !should define this as an externally-visible function with name !"plugin_init." ! !PLUGIN_INFO - plugin invocation information. !VERSION - the plugin_gcc_version symbol of GCC. ! !Returns 0 if initialization finishes successfully. 
*/ ! ! typedef int (*plugin_init_func) (struct plugin_name_args *plugin_info, ! struct plugin_gcc_version *version); ! ! /* Declaration for "plugin_init" function so that it doesn't need to be !duplicated in every plugin. */ ! extern int plugin_init (struct plugin_name_args *plugin_info, ! struct plugin_gcc_version *version); ! ! /* Function type for a plugin callback routine. ! !GCC_DATA - event-specific data provided by GCC !USER_DATA - plugin-specific data provided by the plugin */ ! ! typedef void (*plugin_callback_func) (void *gcc_data, void *user_data); ! ! /* Called from the plugin's initialization code. Register a single callback. !This function can be called multiple times. ! !PLUGIN_NAME - display name for this plugin !EVENT - which event the callback is for !CALLBACK- the callback to be called at the event !USER_DATA - plugin-provided data. ! */ ! ! /* Number of event ids / names registered
Re: [patch] Adjust gcc-plugin.h
On Tue, Jun 9, 2015 at 2:32 PM, Andrew MacLeod wrote: > On 06/09/2015 03:56 AM, Richard Biener wrote: >> >> On Mon, Jun 8, 2015 at 7:52 PM, Andrew MacLeod >> wrote: >>> >>> >>> the gcc source files need to see the internal bits in plugin.h, as well >>> as >>> the common decls in gcc-plugin.h. So we could change the includes as you >>> suggest, but we'd have to copy all the decls from gcc-inlcude.h to >>> plugin.h >>> so the gcc functions can see them. And then the plugins would be exposed >>> to >>> all the internal APIs and decls present in plugins.h >> >> plugins are exposed to all internals of GCC anyway. gcc-plugin.h should >> really >> just be a #include kitchen-sink. >> >> >>> Im presuming >>> we didnt want to do that and thats why there were 2 files to start with. >> >> No, gcc-plugin.h was introduced to make the set of includes required >> for plugins "stable". >> >> Richard. >> >> > I didn't actually realize that.. at least its serving its true purpose now > :-) > > So hows this. Bootstrapping and testruns proceeding now, but it should have > the same results as the other patch. > Assuming no regressions, OK? Ok. Richard. > Andrew > > >
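For context, the usual minimal plugin skeleton that relies on gcc-plugin.h being the single stable include (a sketch only; callbacks and error handling omitted):

#include "gcc-plugin.h"
#include "plugin-version.h"

int plugin_is_GPL_compatible;

int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  /* Refuse to load into a compiler we were not built against.  */
  if (!plugin_default_version_check (version, &gcc_version))
    return 1;
  return 0;
}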
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 09, 2015 at 02:32:07PM +0200, Uros Bizjak wrote: > > - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); > > + addr = XEXP (part[1][0], 0); > > + if (TARGET_TLS_DIRECT_SEG_REFS) > > + { > > + struct ix86_address parts; > > + int ok = ix86_decompose_address (addr, &parts); > > + gcc_assert (ok); > > + if (parts.seg == DEFAULT_TLS_SEG_REG) > > + { > > + /* It is not valid to use %gs: or %fs: in > > +lea though, so we need to remove it from the > > +address used for lea and add it to each individual > > +memory loads instead. */ > > + addr = copy_rtx (addr); > > + rtx *x = &addr; > > + while (GET_CODE (*x) == PLUS) > > Why not use RTX iterators here? IMO, it would be much more readable. Do you mean something like this? It is larger and don't see readability advantages there at all (plus the 4.8/4.9 backports can't use that anyway). But if you prefer it, I can retest it. I still need to look for a PLUS and scan the individual operands of it, because the desired change is that the PLUS is replaced with its operand (one that isn't UNSPEC_TP). And getting rid of the ix86_decompose_address would mean either unconditionally performing (wasteful) copy_rtx, or two RTX iterators cycles. The PLUS it is looking for has to be a toplevel PLUS or PLUS in XEXP (x, 0) of that (recursively), otherwise ix86_decompose_address wouldn't recognize it. 2015-06-09 Jakub Jelinek PR target/66470 * config/i386/i386.c (ix86_split_long_move): For collisions involving direct tls segment refs, move the UNSPEC_TP out of the address for lea, to each of the memory loads. * gcc.dg/tls/pr66470.c: New test. --- gcc/config/i386/i386.c.jj 2015-06-08 15:41:19.0 +0200 +++ gcc/config/i386/i386.c 2015-06-09 14:42:18.357849227 +0200 @@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[]) Do an lea to the last part and use only one colliding move. */ else if (collisions > 1) { - rtx base; + rtx base, addr, tls_base = NULL_RTX; collisions = 1; @@ -22877,10 +22877,48 @@ ix86_split_long_move (rtx operands[]) if (GET_MODE (base) != Pmode) base = gen_rtx_REG (Pmode, REGNO (base)); - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); + addr = XEXP (part[1][0], 0); + if (TARGET_TLS_DIRECT_SEG_REFS) + { + struct ix86_address parts; + int ok = ix86_decompose_address (addr, &parts); + gcc_assert (ok); + if (parts.seg == DEFAULT_TLS_SEG_REG) + { + /* It is not valid to use %gs: or %fs: in +lea though, so we need to remove it from the +address used for lea and add it to each individual +memory loads instead. 
*/ + addr = copy_rtx (addr); + subrtx_ptr_iterator::array_type array; + FOR_EACH_SUBRTX_PTR (iter, array, &addr, NONCONST) + { + rtx *x = *iter; + if (GET_CODE (*x) == PLUS) + { + for (i = 0; i < 2; i++) + if (GET_CODE (XEXP (*x, i)) == UNSPEC + && XINT (XEXP (*x, i), 1) == UNSPEC_TP) + { + tls_base = XEXP (*x, i); + *x = XEXP (*x, 1 - i); + break; + } + if (tls_base) + break; + } + } + gcc_assert (tls_base); + } + } + emit_insn (gen_rtx_SET (base, addr)); + if (tls_base) + base = gen_rtx_PLUS (GET_MODE (base), base, tls_base); part[1][0] = replace_equiv_address (part[1][0], base); for (i = 1; i < nparts; i++) { + if (tls_base) + base = copy_rtx (base); tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i); part[1][i] = replace_equiv_address (part[1][i], tmp); } --- gcc/testsuite/gcc.dg/tls/pr66470.c.jj 2015-06-09 11:59:05.543954781 +0200 +++ gcc/testsuite/gcc.dg/tls/pr66470.c 2015-06-09 11:58:43.0 +0200 @@ -0,0 +1,29 @@ +/* PR target/66470 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target tls } */ + +extern __thread unsigned long long a[10]; +extern __thread struct S { int a, b; } b[10]; + +unsigned long long +foo (long x) +{ + return a[x]; +} + +struct S +bar (long x) +{ + return b[x]; +} + +#ifdef __SIZEOF_INT128__ +extern __thread unsigned __int128 c
Re: [PING^2] [PATCH][5a/5] Postpone expanding va_arg until pass_stdarg
On 09/06/15 13:03, Richard Biener wrote: On Tue, 9 Jun 2015, Alan Lawrence wrote: Hmmm. One side effect of this is that the line number information available in the target hook gimplify_va_arg_expr, is now just the name of the containing function, rather than the specific use of va_arg. Is there some way to get this more precise location (e.g. gimple_location(stmt) in expand_ifn_va_arg_1, the only caller of said hook)? I don't really want to have to add an extra parameter to the target hook... The x86 variant doesn't use any locations but if then the caller of the target hook (expand_ifn_va_arg_1) should assign the IFNs location to all statements expanded from it (it could set input_location to that during the target hook call...) That seems to work. The scan-assembler-not test in the testcase in attached patch: - fails without the expand_ifn_va_arg_1 patch hunk, and - passes with that hunk. I'll put it through bootstrap and reg-test on x86_64. OK for trunk if that goes well? Thanks, - Tom Handle location in expand_ifn_va_arg_1 2015-06-09 Tom de Vries * tree-stdarg.c (expand_ifn_va_arg_1): Handle location. * gcc.target/i386/vararg-loc.c: New test. --- gcc/testsuite/gcc.target/i386/vararg-loc.c | 27 +++ gcc/tree-stdarg.c | 4 2 files changed, 31 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/vararg-loc.c diff --git a/gcc/testsuite/gcc.target/i386/vararg-loc.c b/gcc/testsuite/gcc.target/i386/vararg-loc.c new file mode 100644 index 000..f236fe3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vararg-loc.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-g -O0" } */ + +#include + +int/* 6. */ +/* 7. */ +f (int a, ...) /* 8. */ +/* 9. */ +{ + + int sum = a; + + va_list ap; + + va_start (ap, a); + + sum += va_arg (ap, int); /* 18. */ + + sum += va_arg (ap, int); /* 20. */ + + return sum; +} + +/* { dg-final { scan-assembler-not "\\.loc 1 \[6789\] 0" } } */ +/* { dg-final { scan-assembler-times "\\.loc 1 18 0" 1 } } */ +/* { dg-final { scan-assembler-times "\\.loc 1 20 0" 1 } } */ diff --git a/gcc/tree-stdarg.c b/gcc/tree-stdarg.c index 08d10b5..65fe9f9 100644 --- a/gcc/tree-stdarg.c +++ b/gcc/tree-stdarg.c @@ -1031,6 +1031,7 @@ expand_ifn_va_arg_1 (function *fun) bool modified = false; basic_block bb; gimple_stmt_iterator i; + location_t saved_location; FOR_EACH_BB_FN (bb, fun) for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i)) @@ -1051,6 +1052,8 @@ expand_ifn_va_arg_1 (function *fun) ap = build_fold_indirect_ref (ap); push_gimplify_context (false); + saved_location = input_location; + input_location = gimple_location (stmt); /* Make it easier for the backends by protecting the valist argument from multiple evaluations. */ @@ -1081,6 +1084,7 @@ expand_ifn_va_arg_1 (function *fun) else gimplify_expr (&expr, &pre, &post, is_gimple_lvalue, fb_lvalue); + input_location = saved_location; pop_gimplify_context (NULL); gimple_seq_add_seq (&pre, post); -- 1.9.1
Re: [PING^2] [PATCH][5a/5] Postpone expanding va_arg until pass_stdarg
On Tue, 9 Jun 2015, Tom de Vries wrote: > On 09/06/15 13:03, Richard Biener wrote: > > On Tue, 9 Jun 2015, Alan Lawrence wrote: > > > > > Hmmm. One side effect of this is that the line number information > > > available in > > > the target hook gimplify_va_arg_expr, is now just the name of the > > > containing > > > function, rather than the specific use of va_arg. Is there some way to get > > > this more precise location (e.g. gimple_location(stmt) in > > > expand_ifn_va_arg_1, > > > the only caller of said hook)? I don't really want to have to add an extra > > > parameter to the target hook... > > > > The x86 variant doesn't use any locations but if then the caller of > > the target hook (expand_ifn_va_arg_1) should assign the IFNs location > > to all statements expanded from it (it could set input_location to > > that during the target hook call...) > > > > That seems to work. > > The scan-assembler-not test in the testcase in attached patch: > - fails without the expand_ifn_va_arg_1 patch hunk, and > - passes with that hunk. > > I'll put it through bootstrap and reg-test on x86_64. > > OK for trunk if that goes well? Ok. Thanks, Richard. > Thanks, > - Tom > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
[Patch testsuite obvious] g++.dg/ext/pr57735.C should not run if the testsuite is explicitly passing -mfloat-abi=hard
Hi, g++.dg/ext/pr57735.C is failing for test runs which explicitly pass -mfloat-abi=hard. Looking at the test, it seems the best fix would be to check before adding -mfloat-abi=soft that we are not testing some other float-abi. We also fail to check that it is OK to add -march=armv5te and -marm. Fixed using the same mechanisms we use elsewhere in the gcc.target/arm/ tests with the attached, applied as obvious as revision 224280. Thanks, James --- gcc/testsuite/ 2015-06-09 James Greenhalgh * g++.dg/ext/pr57735.C: Do not override -mfloat-abi directives passed by the testsuite driver. diff --git a/gcc/testsuite/g++.dg/ext/pr57735.C b/gcc/testsuite/g++.dg/ext/pr57735.C index 0eb9500..a8f7d05 100644 --- a/gcc/testsuite/g++.dg/ext/pr57735.C +++ b/gcc/testsuite/g++.dg/ext/pr57735.C @@ -1,4 +1,7 @@ /* { dg-do compile { target arm*-*-* } } */ +/* { dg-require-effective-target arm_arch_v5te_ok } */ +/* { dg-require-effective-target arm_arm_ok } */ +/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */ /* { dg-options "-march=armv5te -marm -mtune=xscale -mfloat-abi=soft -O1" } */ typedef unsigned int size_t;
[PATCH] Fix PR bootstrap/66471
Hello. Following patch renames an enum values so that they do not clash with a MinGW reserved keyword. Patch can bootstrap on MinGW and there's no regression seen on x86_64-linux-unknown-gnu. Ready for trunk? Thanks, Martin >From 740811e0af76af3e1e2f39f149c35bd49ef5fed3 Mon Sep 17 00:00:00 2001 From: mliska Date: Tue, 9 Jun 2015 13:27:53 +0200 Subject: [PATCH] Fix BITMAP identifier clash. gcc/ChangeLog: 2015-06-09 Martin Liska PR bootstrap/66471 * mem-stats-traits.h (enum mem_alloc_origin): Add _ORIGIN suffix for all enum values in mem_alloc_origin. * alloc-pool.c (dump_alloc_pool_statistics): Use newly changed enum name. * alloc-pool.h (pool_allocator::pool_allocator): Likewise. * bitmap.c (bitmap_register): Likewise. (dump_bitmap_statistics): Likewise. * ggc-common.c (dump_ggc_loc_statistics): Likewise. (ggc_record_overhead): Likewise. * hash-map.h: Likewise. * hash-set.h: Likewise. * hash-table.c (void dump_hash_table_loc_statistics): Likewise. * hash-table.h: Likewise. * vec.c (vec_prefix::register_overhead): Likewise. (vec_prefix::release_overhead): Likewise. (dump_vec_loc_statistics): Likewise. --- gcc/alloc-pool.c | 2 +- gcc/alloc-pool.h | 2 +- gcc/bitmap.c | 5 +++-- gcc/ggc-common.c | 4 ++-- gcc/hash-map.h | 2 +- gcc/hash-set.h | 2 +- gcc/hash-table.c | 2 +- gcc/hash-table.h | 4 ++-- gcc/mem-stats-traits.h | 14 +++--- gcc/vec.c | 8 +--- 10 files changed, 24 insertions(+), 21 deletions(-) diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c index bdf184a..7e25915 100644 --- a/gcc/alloc-pool.c +++ b/gcc/alloc-pool.c @@ -33,5 +33,5 @@ dump_alloc_pool_statistics (void) if (! GATHER_STATISTICS) return; - pool_allocator_usage.dump (ALLOC_POOL); + pool_allocator_usage.dump (ALLOC_POOL_ORIGIN); } diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h index af3adde..1785df5 100644 --- a/gcc/alloc-pool.h +++ b/gcc/alloc-pool.h @@ -232,7 +232,7 @@ pool_allocator::pool_allocator (const char *name, size_t num, m_elts_free (0), m_blocks_allocated (0), m_block_list (NULL), m_block_size (0), m_ignore_type_size (ignore_type_size), m_extra_size (extra_size), m_initialized (false), - m_location (ALLOC_POOL, false PASS_MEM_STAT) {} + m_location (ALLOC_POOL_ORIGIN, false PASS_MEM_STAT) {} /* Initialize a pool allocator. */ diff --git a/gcc/bitmap.c b/gcc/bitmap.c index d600ace..733c767 100644 --- a/gcc/bitmap.c +++ b/gcc/bitmap.c @@ -30,7 +30,8 @@ mem_alloc_description bitmap_mem_desc; void bitmap_register (bitmap b MEM_STAT_DECL) { - bitmap_mem_desc.register_descriptor (b, BITMAP, false FINAL_PASS_MEM_STAT); + bitmap_mem_desc.register_descriptor (b, BITMAP_ORIGIN, false + FINAL_PASS_MEM_STAT); } /* Account the overhead. */ @@ -2076,7 +2077,7 @@ dump_bitmap_statistics (void) if (! GATHER_STATISTICS) return; - bitmap_mem_desc.dump (BITMAP); + bitmap_mem_desc.dump (BITMAP_ORIGIN); } DEBUG_FUNCTION void diff --git a/gcc/ggc-common.c b/gcc/ggc-common.c index 6021c2a..60d427f 100644 --- a/gcc/ggc-common.c +++ b/gcc/ggc-common.c @@ -977,7 +977,7 @@ dump_ggc_loc_statistics (bool final) ggc_force_collect = true; ggc_collect (); - ggc_mem_desc.dump (GGC, final ? ggc_usage::compare_final : NULL); + ggc_mem_desc.dump (GGC_ORIGIN, final ? 
ggc_usage::compare_final : NULL); ggc_force_collect = false; } @@ -986,7 +986,7 @@ dump_ggc_loc_statistics (bool final) void ggc_record_overhead (size_t allocated, size_t overhead, void *ptr MEM_STAT_DECL) { - ggc_usage *usage = ggc_mem_desc.register_descriptor (ptr, GGC, false + ggc_usage *usage = ggc_mem_desc.register_descriptor (ptr, GGC_ORIGIN, false FINAL_PASS_MEM_STAT); ggc_mem_desc.register_object_overhead (usage, allocated + overhead, ptr); diff --git a/gcc/hash-map.h b/gcc/hash-map.h index f28f74f..2f1bca4 100644 --- a/gcc/hash-map.h +++ b/gcc/hash-map.h @@ -107,7 +107,7 @@ class GTY((user)) hash_map public: explicit hash_map (size_t n = 13, bool ggc = false, bool gather_mem_stats = true CXX_MEM_STAT_INFO) -: m_table (n, ggc, gather_mem_stats, HASH_MAP PASS_MEM_STAT) {} +: m_table (n, ggc, gather_mem_stats, HASH_MAP_ORIGIN PASS_MEM_STAT) {} /* Create a hash_map in ggc memory. */ static hash_map *create_ggc (size_t size, bool gather_mem_stats = true diff --git a/gcc/hash-set.h b/gcc/hash-set.h index af91e31..3ec0b15 100644 --- a/gcc/hash-set.h +++ b/gcc/hash-set.h @@ -179,7 +179,7 @@ class hash_set public: explicit hash_set (size_t n = 13, bool ggc = false CXX_MEM_STAT_INFO) -: m_table (n, ggc, true, HASH_SET PASS_MEM_STAT) {} +: m_table (n, ggc, true, HASH_SET_ORIGIN PASS_MEM_STAT) {} /* Create a hash_set in gc memory with space for at least n elements. */ diff --git a/gcc/hash-table.c b/gcc/hash-table.c index 012b241..a42b884 100644 --- a/gcc/hash-table.c +++ b/gcc/hash-table.c @@ -103,7 +103,7 @@ mem_alloc_descrip
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 2:57 PM, Jakub Jelinek wrote: > On Tue, Jun 09, 2015 at 02:32:07PM +0200, Uros Bizjak wrote: >> > - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); >> > + addr = XEXP (part[1][0], 0); >> > + if (TARGET_TLS_DIRECT_SEG_REFS) >> > + { >> > + struct ix86_address parts; >> > + int ok = ix86_decompose_address (addr, &parts); >> > + gcc_assert (ok); >> > + if (parts.seg == DEFAULT_TLS_SEG_REG) >> > + { >> > + /* It is not valid to use %gs: or %fs: in >> > +lea though, so we need to remove it from the >> > +address used for lea and add it to each individual >> > +memory loads instead. */ >> > + addr = copy_rtx (addr); >> > + rtx *x = &addr; >> > + while (GET_CODE (*x) == PLUS) >> >> Why not use RTX iterators here? IMO, it would be much more readable. > > Do you mean something like this? It is larger and don't see readability > advantages there at all (plus the 4.8/4.9 backports can't use that anyway). > But if you prefer it, I can retest it. I'm afraid that simple scan loop won't work correctly on x32. There are some issues with UNSPEC_TP for this target, so we have to generate zero_extend of SImode UNSPEC, e.g.: (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) as can be seen in get_thread_pointer to construct the address. It looks that your loop won't find the UNSPEC_TP tag in the above case. Uros.
Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev wrote: > Hi All, > > Here is a simple fix which allows duplication of outer loops to > perform peeling for number of iterations if outer loop is marked with > pragma omp simd. > > Bootstrap and regression testing did not show any new failures. > Is it OK for trunk? Hmm, I don't remember needing to adjust rename_variables_in_bb when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg on non-innermost loops... (I just copied, I never called slpeel_can_duplicate_loop_p though). So - you should just remove the loop->inner condition from slpeel_can_duplicate_loop_p as it is used by non-vectorizer code as well (yeah, I never merged the nested loop support for loop distribution...). Index: tree-vect-loop.c === --- tree-vect-loop.c(revision 224100) +++ tree-vect-loop.c(working copy) @@ -1879,6 +1879,10 @@ return false; } + /* Peeling for alignment is not supported for outer-loop vectorization. */ + if (LOOP_VINFO_LOOP (loop_vinfo)->inner) +LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0; I think you can't simply do this - if vect_enhance_data_refs_alignment decided to peel for alignment then it has adjusted the DRs alignment info already. So instead of the above simply disallow peeling for alignment in vect_enhance_data_refs_alignment? Thus add || ->inner to /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) do_peeling = false; ? I also can't see why the improvement has to be gated on force_vect, it surely looks profitable to enable more outer loop vectorization in general, no? How do the cost model calculations end up with peeling the outer loop for niter? On targets which don't support unaligned accesses we're left with versioning for alignment. Isn't peeling for alignment better there? Thus only disallow peeling for alignment if there is no unhandled alignment? Thanks, Richard. > ChangeLog: > > 2015-06-08 Yuri Rumyantsev > > * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument > to allow renaming of PHI arguments on edges incoming from outer > loop header, add corresponding check before start PHI iterator. > (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool > variable DUPLICATE_OUTER_LOOP and set it to true for outer loops > with true force_vectorize. Set-up dominator for outer loop too. > Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. > (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it > was marked with force_vectorize and has restricted cfg. > * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling > for outer loops. > > gcc/testsuite/ChangeLog: > * gcc.dg/vect/vect-outer-simd-2.c: New test.
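An illustrative shape of the loop nests affected (a made-up example, not the new vect-outer-simd-2.c testcase): the simd pragma sits on the outer loop, so force_vectorize is set there, and peeling for the number of iterations has to duplicate the outer loop rather than the inner one.

void
foo (float *restrict a, const float *restrict b, int n, int m)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    {
      float s = 0.f;
      for (int j = 0; j < m; j++)   /* inner loop stays scalar per lane */
        s += b[i * m + j];
      a[i] = s;
    }
}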
Re: [PATCH] Fix PR bootstrap/66471
On Tue, Jun 9, 2015 at 3:11 PM, Martin Liška wrote: > Hello. > > Following patch renames an enum values so that they do not clash with a MinGW > reserved keyword. Reserved keyword as in a #define? Thus simply #undef it in system.h? > Patch can bootstrap on MinGW and there's no regression seen on > x86_64-linux-unknown-gnu. Does c++04 have scoped enums? Thus can you use mem_alloc_origin::BITMAP? Thanks, Richard. > Ready for trunk? > Thanks, > Martin
Re: [PING^2] [PATCH][5a/5] Postpone expanding va_arg until pass_stdarg
Tom de Vries wrote: On 09/06/15 13:03, Richard Biener wrote: On Tue, 9 Jun 2015, Alan Lawrence wrote: Hmmm. One side effect of this is that the line number information available in the target hook gimplify_va_arg_expr, is now just the name of the containing function, rather than the specific use of va_arg. Is there some way to get this more precise location (e.g. gimple_location(stmt) in expand_ifn_va_arg_1, the only caller of said hook)? I don't really want to have to add an extra parameter to the target hook... The x86 variant doesn't use any locations but if then the caller of the target hook (expand_ifn_va_arg_1) should assign the IFNs location to all statements expanded from it (it could set input_location to that during the target hook call...) That seems to work. The scan-assembler-not test in the testcase in attached patch: - fails without the expand_ifn_va_arg_1 patch hunk, and - passes with that hunk. I'll put it through bootstrap and reg-test on x86_64. OK for trunk if that goes well? Thanks, - Tom That fixes the issue for me (I'm working with on a patch which uses the line info in error output) - thank you very much! Cheers, Alan
Re: [PATCH] Fix PR bootstrap/66471
Hi, On Tue, Jun 9, 2015 at 4:28 PM, Richard Biener wrote: > On Tue, Jun 9, 2015 at 3:11 PM, Martin Liška wrote: >> Hello. >> >> Following patch renames an enum values so that they do not clash with a >> MinGW reserved keyword. > > Reserved keyword as in a #define? Thus simply #undef it in system.h? No, wingdi.h has the following typedef struct tagBITMAP { LONG bmType; LONG bmWidth; LONG bmHeight; LONG bmWidthBytes; WORD bmPlanes; WORD bmBitsPixel; LPVOID bmBits; } BITMAP,*PBITMAP,*NPBITMAP,*LPBITMAP;
Re: [PATCH] Fix PR bootstrap/66471
On Tue, Jun 9, 2015 at 3:30 PM, İsmail Dönmez wrote: > Hi, > > On Tue, Jun 9, 2015 at 4:28 PM, Richard Biener > wrote: >> On Tue, Jun 9, 2015 at 3:11 PM, Martin Liška wrote: >>> Hello. >>> >>> Following patch renames an enum values so that they do not clash with a >>> MinGW reserved keyword. >> >> Reserved keyword as in a #define? Thus simply #undef it in system.h? > > No, wingdi.h has the following > > typedef struct tagBITMAP { > LONG bmType; > LONG bmWidth; > LONG bmHeight; > LONG bmWidthBytes; > WORD bmPlanes; > WORD bmBitsPixel; > LPVOID bmBits; > } BITMAP,*PBITMAP,*NPBITMAP,*LPBITMAP; Hmm. I see. Patch is ok. Thanks, Richard.
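A reduced illustration of the clash (hypothetical, with the real headers stripped down): once wingdi.h's typedef is visible, a plain enumerator of the same name is rejected, which is exactly what the _ORIGIN suffix sidesteps.

typedef struct tagBITMAP { long bmType; } BITMAP;     /* as in wingdi.h */

/* enum mem_alloc_origin { BITMAP, GGC };      error: 'BITMAP' redeclared */
enum mem_alloc_origin { BITMAP_ORIGIN, GGC_ORIGIN };  /* builds fine */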
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: > I'm afraid that simple scan loop won't work correctly on x32. There > are some issues with UNSPEC_TP for this target, so we have to generate > zero_extend of SImode UNSPEC, e.g.: > > (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) > > as can be seen in get_thread_pointer to construct the address. It > looks that your loop won't find the UNSPEC_TP tag in the above case. You're right, for -m32 it would need to start with rtx *x = &addr; + while (GET_CODE (*x) == ZERO_EXTEND +|| GET_CODE (*x) == AND +|| GET_CODE (*x) == SUBREG) +x = &XEXP (*x, 0); to get at the PLUS. Now, with either the original patch with the above ammendment, or with the iterators, the question is what to do with the UNSPEC_TP for -m32. Is it ok to just use addr32 on the lea and use normal Pmode (== DImode) addressing for the memory reads (if I read the code well, that is what it does right now for the non-TLS ones: if (GET_MODE (base) != Pmode) base = gen_rtx_REG (Pmode, REGNO (base)); )? Then we'd need to change tls_base mode to Pmode. Jakub
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, Jun 09, 2015 at 09:53:21AM +0200, Richard Biener wrote: > >> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) > >> + (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1) > >> + if CST2 != 0. */ > >> +(for cmp (ne eq) > >> + (simplify > >> + (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2) > >> + (with { > >> + unsigned int cand = wi::ctz (@2) - wi::ctz (@0); } > >> + (if (!integer_zerop (@2) > > > > > > You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be > > indented one space more? > > Yes, one space more. I suppose using integer_zerop might in theory > allow for handling vector shifts at some point ...? Fixed the formatting. But I kept the integer_zerop. > >> + && wi::eq_p (wi::lshift (@0, cand), @2)) > >> + (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); }) > > > > > > Making 'cand' signed, you could return 0 when cand<0, like (2< > could also return 0 when the candidate turns out not to work: (3< > Sounds like a good improvement. Yeah, it makes sense to do that while I'm on this. Done, and a new test added. > Otherwise the patch looks ok to me as well - mind doing the improvement above? Thank you both. How does this look now? Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-06-09 Marek Polacek PR tree-optimization/66299 * match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) ((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New patterns. * gcc.dg/pr66299-1.c: New test. * gcc.dg/pr66299-2.c: New test. * gcc.dg/pr66299-3.c: New test. diff --git gcc/match.pd gcc/match.pd index abd7851..560b218 100644 --- gcc/match.pd +++ gcc/match.pd @@ -676,6 +676,21 @@ along with GCC; see the file COPYING3. If not see (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop) (icmp @0 { build_zero_cst (TREE_TYPE (@0)); }))) +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) + (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1) + if CST2 != 0. */ +(for cmp (ne eq) + (simplify + (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2) + (with { int cand = wi::ctz (@2) - wi::ctz (@0); } + (if (cand < 0 + || (!integer_zerop (@2) + && wi::ne_p (wi::lshift (@0, cand), @2))) +{ constant_boolean_node (cmp == NE_EXPR, type); }) + (if (!integer_zerop (@2) + && wi::eq_p (wi::lshift (@0, cand), @2)) +(cmp @1 { build_int_cst (TREE_TYPE (@1), cand); }) + /* Simplifications of conversions. */ /* Basic strip-useless-type-conversions / strip_nops. 
*/ diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c index e69de29..e75146b 100644 --- gcc/testsuite/gcc.dg/pr66299-1.c +++ gcc/testsuite/gcc.dg/pr66299-1.c @@ -0,0 +1,92 @@ +/* PR tree-optimization/66299 */ +/* { dg-do run } */ +/* { dg-options "-fdump-tree-original" } */ + +void +test1 (int x) +{ + if ((0 << x) != 0 + || (1 << x) != 2 + || (2 << x) != 4 + || (3 << x) != 6 + || (4 << x) != 8 + || (5 << x) != 10 + || (6 << x) != 12 + || (7 << x) != 14 + || (8 << x) != 16 + || (9 << x) != 18 + || (10 << x) != 20) +__builtin_abort (); +} + +void +test2 (int x) +{ + if (!((0 << x) == 0 + && (1 << x) == 4 + && (2 << x) == 8 + && (3 << x) == 12 + && (4 << x) == 16 + && (5 << x) == 20 + && (6 << x) == 24 + && (7 << x) == 28 + && (8 << x) == 32 + && (9 << x) == 36 + && (10 << x) == 40)) +__builtin_abort (); +} + +void +test3 (unsigned int x) +{ + if ((0U << x) != 0U + || (1U << x) != 16U + || (2U << x) != 32U + || (3U << x) != 48U + || (4U << x) != 64U + || (5U << x) != 80U + || (6U << x) != 96U + || (7U << x) != 112U + || (8U << x) != 128U + || (9U << x) != 144U + || (10U << x) != 160U) +__builtin_abort (); +} + +void +test4 (unsigned int x) +{ + if (!((0U << x) == 0U + || (1U << x) == 8U + || (2U << x) == 16U + || (3U << x) == 24U + || (4U << x) == 32U + || (5U << x) == 40U + || (6U << x) == 48U + || (7U << x) == 56U + || (8U << x) == 64U + || (9U << x) == 72U + || (10U << x) == 80U)) +__builtin_abort (); +} + +void +test5 (int x) +{ + if ((0 << x) == 1 + || (0 << x) != 0 + || (0x8001U << x) != 0x2U) +__builtin_abort (); +} + +int +main (void) +{ + test1 (1); + test2 (2); + test3 (4U); + test4 (3U); + test5 (17); +} + +/* { dg-final { scan-tree-dump-not "<<" "original" } } */ diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c index e69de29..45e9218 100644 --- gcc/testsuite/gcc.dg/pr66299-2.c +++ gcc/testsuite/gcc.dg/pr66299-2.c @@ -0,0 +1,33 @@ +/* PR tree-optimization/66299 */ +/* { dg-do run } */ +/* { dg-options "-fdump-tree-optimized -O" } */ + +void +test1 (int x, unsigned u) +{ + if ((1U << x) != 64 + || (2 << x) != u + || (x << x) != 384 + || (3 << x) ==
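A standalone sanity check of the two constant-fold cases added on top of the original pattern (plain C, illustrative): a negative candidate such as (2 << x) == 1, and a candidate that fails to reproduce the constant such as (3 << x) == 2, can never hold for a valid shift count and now fold to false directly.

#include <assert.h>

int
main (void)
{
  for (unsigned x = 0; x < 32; x++)
    {
      assert ((2u << x) != 1u);   /* ctz (1) - ctz (2) < 0 */
      assert ((3u << x) != 2u);   /* 3 << 1 is 6, not 2 */
    }
  return 0;
}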
Re: [C++/58583] ICE instantiating NSDMIs
On 06/08/15 13:47, Jason Merrill wrote: How about using a DECL_LANG_FLAG instead of creating a garbage DEFAULT_ARG? This patch uses DECL_LANG_FLAG_2 on the FIELD_DECL to mark that its NSDMI is being instantiated. I also discovered I'd flubbed the markup on the testcase, not sure why I didn't notice that in the test results. I've double checked this time, and all looks good. built & tested on x86_64-linux-gnu nathan 2015-06-09 Nathan Sidwell cp/ PR c++/58583 * cp-tree.h (DECL_INSTANTIATING_NSDMI_P): New. * init.c (get_nsdmi): Check for DEFAULT_ARG in template case and protect it from recursive instantiation. testsuite/ PR c++/58583 * g++.dg/cpp0x/nsdmi-template14.C: New test. Index: cp/cp-tree.h === --- cp/cp-tree.h (revision 224198) +++ cp/cp-tree.h (working copy) @@ -160,6 +160,7 @@ c-common.h, not after. LABEL_DECL_CONTINUE (in LABEL_DECL) 2: DECL_THIS_EXTERN (in VAR_DECL or FUNCTION_DECL). DECL_IMPLICIT_TYPEDEF_P (in a TYPE_DECL) + DECL_INSTANTIATING_NSDMI_P (in a FIELD_DECL) 3: DECL_IN_AGGR_P. 4: DECL_C_BIT_FIELD (in a FIELD_DECL) DECL_ANON_UNION_VAR_P (in a VAR_DECL) @@ -3785,6 +3786,11 @@ more_aggr_init_expr_args_p (const aggr_i #define DECL_ARRAY_PARAMETER_P(NODE) \ DECL_LANG_FLAG_1 (PARM_DECL_CHECK (NODE)) +/* Nonzero for a FIELD_DECL who's NSMDI is currently being + instantiated. */ +#define DECL_INSTANTIATING_NSDMI_P(NODE) \ + DECL_LANG_FLAG_2 (FIELD_DECL_CHECK (NODE)) + /* Nonzero for FIELD_DECL node means that this field is a base class of the parent object, as opposed to a member field. */ #define DECL_FIELD_IS_BASE(NODE) \ Index: cp/init.c === --- cp/init.c (revision 224198) +++ cp/init.c (working copy) @@ -544,6 +544,7 @@ get_nsdmi (tree member, bool in_ctor) tree init; tree save_ccp = current_class_ptr; tree save_ccr = current_class_ref; + if (!in_ctor) { /* Use a PLACEHOLDER_EXPR when we don't have a 'this' parameter to @@ -551,22 +552,40 @@ get_nsdmi (tree member, bool in_ctor) current_class_ref = build0 (PLACEHOLDER_EXPR, DECL_CONTEXT (member)); current_class_ptr = build_address (current_class_ref); } + if (DECL_LANG_SPECIFIC (member) && DECL_TEMPLATE_INFO (member)) { - /* Do deferred instantiation of the NSDMI. */ - init = (tsubst_copy_and_build - (DECL_INITIAL (DECL_TI_TEMPLATE (member)), - DECL_TI_ARGS (member), - tf_warning_or_error, member, /*function_p=*/false, - /*integral_constant_expression_p=*/false)); + init = DECL_INITIAL (DECL_TI_TEMPLATE (member)); + if (TREE_CODE (init) == DEFAULT_ARG) + goto unparsed; - init = digest_nsdmi_init (member, init); + /* Check recursive instantiation. */ + if (DECL_INSTANTIATING_NSDMI_P (member)) + { + error ("recursive instantiation of non-static data member " + "initializer for %qD", member); + init = error_mark_node; + } + else + { + DECL_INSTANTIATING_NSDMI_P (member) = 1; + + /* Do deferred instantiation of the NSDMI. 
*/ + init = (tsubst_copy_and_build + (init, DECL_TI_ARGS (member), + tf_warning_or_error, member, /*function_p=*/false, + /*integral_constant_expression_p=*/false)); + init = digest_nsdmi_init (member, init); + + DECL_INSTANTIATING_NSDMI_P (member) = 0; + } } else { init = DECL_INITIAL (member); if (init && TREE_CODE (init) == DEFAULT_ARG) { + unparsed: error ("constructor required before non-static data member " "for %qD has been parsed", member); DECL_INITIAL (member) = error_mark_node; Index: testsuite/g++.dg/cpp0x/nsdmi-template14.C === --- testsuite/g++.dg/cpp0x/nsdmi-template14.C (revision 0) +++ testsuite/g++.dg/cpp0x/nsdmi-template14.C (working copy) @@ -0,0 +1,22 @@ +// PR c++/58583 +// { dg-do compile { target c++11 } } + +template struct A // { dg-error "has been parsed" } +{ + int i = (A<0>(), 0); // { dg-error "has been parsed" } +}; + +template struct B +{ + B* p = new B; +}; + +B<1> x; // { dg-error "recursive instantiation of non-static data" } + +struct C +{ + template struct D + { +D* p = new D<0>; + }; +};
Re: [C++ Patch] PR 65815
OK, thanks. Jason
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek wrote: > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: >> I'm afraid that simple scan loop won't work correctly on x32. There >> are some issues with UNSPEC_TP for this target, so we have to generate >> zero_extend of SImode UNSPEC, e.g.: >> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) >> >> as can be seen in get_thread_pointer to construct the address. It >> looks that your loop won't find the UNSPEC_TP tag in the above case. > > You're right, for -m32 it would need to start with >rtx *x = &addr; > + while (GET_CODE (*x) == ZERO_EXTEND > +|| GET_CODE (*x) == AND > +|| GET_CODE (*x) == SUBREG) > +x = &XEXP (*x, 0); > to get at the PLUS. Now, with either the original patch with the above > ammendment, or with the iterators, the question is what to do > with the UNSPEC_TP for -m32. Is it ok to just use addr32 on the > lea and use normal Pmode (== DImode) addressing for the memory reads > (if I read the code well, that is what it does right now for the non-TLS > ones: > if (GET_MODE (base) != Pmode) > base = gen_rtx_REG (Pmode, REGNO (base)); > )? Then we'd need to change tls_base mode to Pmode. (I assume you referred to -mx32 above. Please also note that -mx32 uses Pmode == SImode by default). Yes, please emit zero-extended address load to a DImode register, and use this register with UNSPEC_TP to form final DImode address. The LEA that will be generated from address load will use DImode inputs due to %H operand modifier and will output to (implicitly zero-extended) SImode register. FYI, you just hit two special cases here: - addresses that mention %fs and %gs are always in DImode - LEA input operands are also always in DImode, but compiler already takes care of it. Uros.
Re: [PATCH] Optimize (CST1 << A) == CST2 (PR tree-optimization/66299)
On Tue, 9 Jun 2015, Marek Polacek wrote: > On Tue, Jun 09, 2015 at 09:53:21AM +0200, Richard Biener wrote: > > >> +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) > > >> + (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1) > > >> + if CST2 != 0. */ > > >> +(for cmp (ne eq) > > >> + (simplify > > >> + (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2) > > >> + (with { > > >> + unsigned int cand = wi::ctz (@2) - wi::ctz (@0); } > > >> + (if (!integer_zerop (@2) > > > > > > > > > You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be > > > indented one space more? > > > > Yes, one space more. I suppose using integer_zerop might in theory > > allow for handling vector shifts at some point ...? > > Fixed the formatting. But I kept the integer_zerop. > > > >> + && wi::eq_p (wi::lshift (@0, cand), @2)) > > >> + (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); }) > > > > > > > > > Making 'cand' signed, you could return 0 when cand<0, like (2< > > could also return 0 when the candidate turns out not to work: (3< > > > Sounds like a good improvement. > > Yeah, it makes sense to do that while I'm on this. Done, and a new > test added. > > > Otherwise the patch looks ok to me as well - mind doing the improvement > > above? > > Thank you both. How does this look now? > > Bootstrapped/regtested on x86_64-linux, ok for trunk? Ok. Thanks, Richard. > 2015-06-09 Marek Polacek > > PR tree-optimization/66299 > * match.pd ((CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) > ((CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1)): New > patterns. > > * gcc.dg/pr66299-1.c: New test. > * gcc.dg/pr66299-2.c: New test. > * gcc.dg/pr66299-3.c: New test. > > diff --git gcc/match.pd gcc/match.pd > index abd7851..560b218 100644 > --- gcc/match.pd > +++ gcc/match.pd > @@ -676,6 +676,21 @@ along with GCC; see the file COPYING3. If not see >(cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop) >(icmp @0 { build_zero_cst (TREE_TYPE (@0)); }))) > > +/* (CST1 << A) == CST2 -> A == ctz (CST2) - ctz (CST1) > + (CST1 << A) != CST2 -> A != ctz (CST2) - ctz (CST1) > + if CST2 != 0. */ > +(for cmp (ne eq) > + (simplify > + (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2) > + (with { int cand = wi::ctz (@2) - wi::ctz (@0); } > + (if (cand < 0 > + || (!integer_zerop (@2) > + && wi::ne_p (wi::lshift (@0, cand), @2))) > +{ constant_boolean_node (cmp == NE_EXPR, type); }) > + (if (!integer_zerop (@2) > + && wi::eq_p (wi::lshift (@0, cand), @2)) > +(cmp @1 { build_int_cst (TREE_TYPE (@1), cand); }) > + > /* Simplifications of conversions. */ > > /* Basic strip-useless-type-conversions / strip_nops. 
*/ > diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c > index e69de29..e75146b 100644 > --- gcc/testsuite/gcc.dg/pr66299-1.c > +++ gcc/testsuite/gcc.dg/pr66299-1.c > @@ -0,0 +1,92 @@ > +/* PR tree-optimization/66299 */ > +/* { dg-do run } */ > +/* { dg-options "-fdump-tree-original" } */ > + > +void > +test1 (int x) > +{ > + if ((0 << x) != 0 > + || (1 << x) != 2 > + || (2 << x) != 4 > + || (3 << x) != 6 > + || (4 << x) != 8 > + || (5 << x) != 10 > + || (6 << x) != 12 > + || (7 << x) != 14 > + || (8 << x) != 16 > + || (9 << x) != 18 > + || (10 << x) != 20) > +__builtin_abort (); > +} > + > +void > +test2 (int x) > +{ > + if (!((0 << x) == 0 > + && (1 << x) == 4 > + && (2 << x) == 8 > + && (3 << x) == 12 > + && (4 << x) == 16 > + && (5 << x) == 20 > + && (6 << x) == 24 > + && (7 << x) == 28 > + && (8 << x) == 32 > + && (9 << x) == 36 > + && (10 << x) == 40)) > +__builtin_abort (); > +} > + > +void > +test3 (unsigned int x) > +{ > + if ((0U << x) != 0U > + || (1U << x) != 16U > + || (2U << x) != 32U > + || (3U << x) != 48U > + || (4U << x) != 64U > + || (5U << x) != 80U > + || (6U << x) != 96U > + || (7U << x) != 112U > + || (8U << x) != 128U > + || (9U << x) != 144U > + || (10U << x) != 160U) > +__builtin_abort (); > +} > + > +void > +test4 (unsigned int x) > +{ > + if (!((0U << x) == 0U > + || (1U << x) == 8U > + || (2U << x) == 16U > + || (3U << x) == 24U > + || (4U << x) == 32U > + || (5U << x) == 40U > + || (6U << x) == 48U > + || (7U << x) == 56U > + || (8U << x) == 64U > + || (9U << x) == 72U > + || (10U << x) == 80U)) > +__builtin_abort (); > +} > + > +void > +test5 (int x) > +{ > + if ((0 << x) == 1 > + || (0 << x) != 0 > + || (0x8001U << x) != 0x2U) > +__builtin_abort (); > +} > + > +int > +main (void) > +{ > + test1 (1); > + test2 (2); > + test3 (4U); > + test4 (3U); > + test5 (17); > +} > + > +/* { dg-final { scan-tree-dump-not "<<" "original" } } */ > diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c > i
[PATCH] Remove restrictions on group gaps in SLP
The following patch does $subject by fixing the bogus removal of load-permutations. This means gaps are now handled by the permutation support (well, if the required permutation is supported). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-06-09 Richard Biener * tree-vect-slp.c (vect_build_slp_tree_1): Remove bailout on gaps. (vect_analyze_slp_instance): Instead do not falsely drop load permutations. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 224271) --- gcc/tree-vect-slp.c (working copy) *** vect_build_slp_tree_1 (loop_vec_info loo *** 762,794 else { /* Load. */ - unsigned unrolling_factor - = least_common_multiple - (*max_nunits, group_size) / group_size; - /* FORNOW: Check that there is no gap between the loads -and no gap between the groups when we need to load -multiple groups at once. */ - if (unrolling_factor > 1 - && ((GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt - && GROUP_GAP (vinfo_for_stmt (stmt)) != 0) - /* If the group is split up then GROUP_GAP -isn't correct here, nor is GROUP_FIRST_ELEMENT. */ - || GROUP_SIZE (vinfo_for_stmt (stmt)) > group_size)) - { - if (dump_enabled_p ()) - { - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "Build SLP failed: grouped " - "loads have gaps "); - dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, - stmt, 0); - dump_printf (MSG_MISSED_OPTIMIZATION, "\n"); - } - /* Fatal mismatch. */ - matches[0] = false; - return false; - } - /* Check that the size of interleaved loads group is not greater than the SLP group size. */ unsigned ncopies --- 762,767 *** vect_analyze_slp_instance (loop_vec_info *** 1846,1852 this_load_permuted = true; load_permutation.safe_push (load_place); } ! if (!this_load_permuted) { load_permutation.release (); continue; --- 1834,1846 this_load_permuted = true; load_permutation.safe_push (load_place); } ! if (!this_load_permuted ! /* The load requires permutation when unrolling exposes !a gap either because the group is larger than the SLP !group-size or because there is a gap between the groups. */ ! && (unrolling_factor == 1 ! || (group_size == GROUP_SIZE (vinfo_for_stmt (first_stmt)) ! && GROUP_GAP (vinfo_for_stmt (first_stmt)) == 0))) { load_permutation.release (); continue;
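An illustrative example of a load group with a gap affected by this (made up, not from the testsuite): only two of the three structure fields are loaded, so the interleaved load group has a gap that is now expressed through the load permutation instead of rejecting SLP outright.

struct s { int a, b, c; };

void
foo (int *restrict q, const struct s *restrict p, int n)
{
  for (int i = 0; i < n; i++)
    {
      q[2 * i]     = p[i].a;
      q[2 * i + 1] = p[i].c;   /* p[i].b is never loaded - the gap */
    }
}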
Re: [PING^2][PATCH][3/3][PR65460] Mark offloaded functions as parallelized
On 09/06/15 13:07, Richard Biener wrote: On Mon, 8 Jun 2015, Tom de Vries wrote: On 17/04/15 12:08, Tom de Vries wrote: On 20-03-15 12:38, Tom de Vries wrote: On 19-03-15 12:05, Tom de Vries wrote: On 18-03-15 18:22, Tom de Vries wrote: Hi, this patch fixes PR65460. The patch marks offloaded functions as parallelized, which means the parloops pass no longer attempts to modify that function. Updated patch to postpone mark_parallelized_function until the corresponding cgraph_node is available, to ensure it works with the updated mark_parallelized_function from patch 2/3. Updated to eliminate mark_parallelized_function. Bootstrapped and reg-tested on x86_64. OK for stage4? ping. ping^2. Original post at https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01063.html . Ok, but shouldn't it be set before calling add_new_function as add_new_function might run passes that wouldn't identify the function as parallelized? Hm, indeed sometimes add_new_function executes some passes itself, besides queueing the function for further processing. I suppose the existing settings of parallelized_function should be modified in a similar way. I'll bootstrap and reg-test attached two patches on x86_64, and commit unless objections. Thanks, - Tom Mark function parallelized_function before add_new_function 2015-06-09 Tom de Vries * omp-low.c (finalize_task_copyfn, expand_omp_taskreg): Mark function parallelized_function before add_new_function. --- gcc/omp-low.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index f322416..2045e48 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -1552,8 +1552,9 @@ finalize_task_copyfn (gomp_task *task_stmt) pop_cfun (); /* Inform the callgraph about the new function. */ + cgraph_node *node = cgraph_node::get_create (child_fn); + node->parallelized_function = 1; cgraph_node::add_new_function (child_fn, false); - cgraph_node::get (child_fn)->parallelized_function = 1; } /* Destroy a omp_context data structures. Called through the splay tree @@ -5589,8 +5590,9 @@ expand_omp_taskreg (struct omp_region *region) /* Inform the callgraph about the new function. */ DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties; + cgraph_node *node = cgraph_node::get_create (child_fn); + node->parallelized_function = 1; cgraph_node::add_new_function (child_fn, true); - cgraph_node::get (child_fn)->parallelized_function = 1; /* Fix the callgraph edges for child_cfun. Those for cfun will be fixed in a following pass. */ -- 1.9.1 Mark offloaded functions as parallelized 2015-06-09 Tom de Vries PR tree-optimization/65460 * omp-low.c (expand_omp_target): Set parallelized_function on cgraph_node for child_fn. --- gcc/omp-low.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 2045e48..77716bf6 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -8959,6 +8959,8 @@ expand_omp_target (struct omp_region *region) /* Inform the callgraph about the new function. */ DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties; + cgraph_node *node = cgraph_node::get_create (child_fn); + node->parallelized_function = 1; cgraph_node::add_new_function (child_fn, true); #ifdef ENABLE_OFFLOADING -- 1.9.1
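For context, a rough sketch (mine, not the PR65460 testcase; the name saxpy is invented) of the situation the parallelized_function bit guards against: compiled with -fopenmp -ftree-parallelize-loops=2, the body of the target region below is split off into a child function by expand_omp_target, and marking that child's cgraph_node as parallelized keeps the parloops pass from trying to auto-parallelize the offloaded function a second time.

void
saxpy (float *x, float *y, float a, int n)
{
  #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}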
C++ PATCH for c++/66383 (aggregate NSDMI issue)
In this testcase we end up with the initializer for the A subobject referring to the address of the B PLACEHOLDER_EXPR, which wasn't getting replaced properly because when we see the A TARGET_EXPR we stopped looking for more PLACEHOLDER_EXPRs. Instead, we need to keep looking, and learn how to handle placeholders for enclosing objects as well as the innermost one. Tested x86_64-pc-linux-gnu, applying to trunk and 5. commit 26b958fac8f8c42ab0158583c6a362caf4a9ab73 Author: Jason Merrill Date: Fri Jun 5 17:07:23 2015 -0400 PR c++/66383 * tree.c (replace_placeholders_r): Handle placeholders for an outer object. * typeck2.c (store_init_value): Only replace_placeholders for objects of class type. diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index 3c4b527..3553d7c 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -2552,15 +2552,15 @@ replace_placeholders_r (tree* t, int* walk_subtrees, void* data_) switch (TREE_CODE (*t)) { case PLACEHOLDER_EXPR: - gcc_assert (same_type_ignoring_top_level_qualifiers_p - (TREE_TYPE (*t), TREE_TYPE (obj))); - *t = obj; - *walk_subtrees = false; - break; - -case TARGET_EXPR: - /* Don't mess with placeholders in an unrelated object. */ - *walk_subtrees = false; + { + tree x = obj; + for (; !(same_type_ignoring_top_level_qualifiers_p + (TREE_TYPE (*t), TREE_TYPE (x))); + x = TREE_OPERAND (x, 0)) + gcc_assert (TREE_CODE (x) == COMPONENT_REF); + *t = x; + *walk_subtrees = false; + } break; case CONSTRUCTOR: diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index affa265..22a5580 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -836,7 +836,7 @@ store_init_value (tree decl, tree init, vec** cleanups, int flags) TREE_CONSTANT (decl) = const_init && decl_maybe_constant_var_p (decl); } - if (cxx_dialect >= cxx14) + if (cxx_dialect >= cxx14 && CLASS_TYPE_P (strip_array_types (type))) /* Handle aggregate NSDMI in non-constant initializers, too. */ value = replace_placeholders (value, decl); diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr3.C b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr3.C new file mode 100644 index 000..185ea10 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr3.C @@ -0,0 +1,43 @@ +// PR c++/66383 +// { dg-do compile { target c++11 } } + +namespace N1 { + struct B; + + struct A + { +B* b; +A(B* b); + }; + + struct B + { +A a{ this }; + }; + + A::A(B* b): b{ b } {} + + void foo() + { +auto b = B{}; + } +} + +namespace N2 { + struct B; + + struct A + { +B* b; + }; + + struct B + { +A a{ this }; + }; + + void foo() + { +auto b = B{}; + } +}
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek wrote: > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: >> I'm afraid that simple scan loop won't work correctly on x32. There >> are some issues with UNSPEC_TP for this target, so we have to generate >> zero_extend of SImode UNSPEC, e.g.: >> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) >> >> as can be seen in get_thread_pointer to construct the address. It >> looks that your loop won't find the UNSPEC_TP tag in the above case. > > You're right, for -m32 it would need to start with >rtx *x = &addr; > + while (GET_CODE (*x) == ZERO_EXTEND > +|| GET_CODE (*x) == AND > +|| GET_CODE (*x) == SUBREG) > +x = &XEXP (*x, 0); Oh, you can use SImode_address_operand predicate here. Uros.
[PATCH] Remove last permutation restriction for SLP vectorizing
This removes the restriction in place on reductions. To not regress gfortran.dg/vect/fast-math-pr37021.f90 the patch also implements SLP permutations for strided group loads. With those two pieces we can finally SLP vectorize the complex multiplication (which in the gfortran.dg/vect/fast-math-pr37021.f90 testcase uses variable stride). Code generated on x86_64 suffers from PR56766, combine not being able to use sse3_addsubv2df3 because the RTL doesn't use the expected vec_merge but (vec_select (vec_concat )) instead. Previously we vectorized this with unrolling and interleaving. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2015-06-09 Richard Biener * tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Split out from ... (vect_supported_load_permutation_p): ... here. Handle supportable permutations in reductions. * tree-vect-stmts.c (vectorizable_load): Handle SLP permutations for vectorizing strided group loads. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 224271) --- gcc/tree-vect-slp.c (working copy) *** vect_slp_rearrange_stmts (slp_tree node, *** 1326,1331 --- 1270,1336 } + /* Attempt to reorder stmts in a reduction chain so that we don't +require any load permutation. Return true if that was possible, +otherwise return false. */ + + static bool + vect_attempt_slp_rearrange_stmts (slp_instance slp_instn) + { + unsigned int group_size = SLP_INSTANCE_GROUP_SIZE (slp_instn); + unsigned int i, j; + sbitmap load_index; + unsigned int lidx; + slp_tree node, load; + + /* Compare all the permutation sequences to the first one. We know + that at least one load is permuted. */ + node = SLP_INSTANCE_LOADS (slp_instn)[0]; + if (!node->load_permutation.exists ()) + return false; + for (i = 1; SLP_INSTANCE_LOADS (slp_instn).iterate (i, &load); ++i) + { + if (!load->load_permutation.exists ()) + return false; + FOR_EACH_VEC_ELT (load->load_permutation, j, lidx) + if (lidx != node->load_permutation[j]) + return false; + } + + /* Check that the loads in the first sequence are different and there + are no gaps between them. */ + load_index = sbitmap_alloc (group_size); + bitmap_clear (load_index); + FOR_EACH_VEC_ELT (node->load_permutation, i, lidx) + { + if (bitmap_bit_p (load_index, lidx)) + { + sbitmap_free (load_index); + return false; + } + bitmap_set_bit (load_index, lidx); + } + for (i = 0; i < group_size; i++) + if (!bitmap_bit_p (load_index, i)) + { + sbitmap_free (load_index); + return false; + } + sbitmap_free (load_index); + + /* This permutation is valid for reduction. Since the order of the + statements in the nodes is not important unless they are memory + accesses, we can rearrange the statements in all the nodes + according to the order of the loads. */ + vect_slp_rearrange_stmts (SLP_INSTANCE_TREE (slp_instn), group_size, + node->load_permutation); + + /* We are done, no actual permutations need to be generated. */ + FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node) + SLP_TREE_LOAD_PERMUTATION (node).release (); + return true; + } + /* Check if the required load permutations in the SLP instance SLP_INSTN are supported. 
*/ *** vect_supported_load_permutation_p (slp_i *** 1334,1340 { unsigned int group_size = SLP_INSTANCE_GROUP_SIZE (slp_instn); unsigned int i, j, k, next; - sbitmap load_index; slp_tree node; gimple stmt, load, next_load, first_load; struct data_reference *dr; --- 1339,1344 *** vect_supported_load_permutation_p (slp_i *** 1369,1427 stmt = SLP_TREE_SCALAR_STMTS (node)[0]; /* Reduction (there are no data-refs in the root). ! In reduction chain the order of the loads is important. */ if (!STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)) && !GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) { ! slp_tree load; ! unsigned int lidx; ! ! /* Compare all the permutation sequences to the first one. We know ! that at least one load is permuted. */ ! node = SLP_INSTANCE_LOADS (slp_instn)[0]; ! if (!node->load_permutation.exists ()) ! return false; ! for (i = 1; SLP_INSTANCE_LOADS (slp_instn).iterate (i, &load); ++i) ! { ! if (!load->load_permutation.exists ()) ! return false; ! FOR_EACH_VEC_ELT (load->load_permutation, j, lidx) ! if (lidx != node->load_permutation[j]) ! return false; ! } ! /* Check that the loads in the first sequence are different and there !are no gaps between them. */ ! load_index = sbitmap_alloc (group_size); !
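Since the Fortran testcase is not quoted above, here is a rough C analogue (my own sketch, not fast-math-pr37021.f90 itself; the name cmul is invented) of complex multiplication with a variable stride: interleaved {re, im} loads whose group is strided, which is exactly where the new SLP load permutations come in -- and where the addsub pattern of PR56766 would eventually help on x86_64.

void
cmul (double *a, double *b, double *c, int n, int stride)
{
  for (int i = 0; i < n; i++)
    {
      /* Each iteration loads an interleaved real/imaginary pair with a
         runtime stride, so the grouped loads need a permutation.  */
      double ar = a[2 * i * stride], ai = a[2 * i * stride + 1];
      double br = b[2 * i * stride], bi = b[2 * i * stride + 1];
      c[2 * i]     = ar * br - ai * bi;
      c[2 * i + 1] = ar * bi + ai * br;
    }
}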
[PATCH] top-level for libvtv: use normal (not raw_cxx) target exports
Hi build machinery maintainers, since we always build the C++ compiler now, I fail to see the need to still use RAW_CXX_TARGET_EXPORTS for libvtv. The situation to expose the problem is: * Use a multilib-enabled x86_64-linux box. * Use a 64-bit (multilib-disabled) bootstrap compiler (binary image). $ configure --enable-multilib --with-system-zlib $ make bootstrap When it comes to build the 32-bit libvtv, it breaks because of using "CC=/build/prev-gcc/xgcc -m32" "CXX=g++ -m32", while it should use "CC=/build/prev-gcc/xgcc -m32" "CXX=/build/prev-gcc/xg++ -m32" instead. However, I'm not sure about the general question behind: Should it work to bootstrap the multilib-compiler using a non-multilib one? This also needs above configure flags to work around two more but minor issues, which I'm unsure about whether I can/should fix at all: * --enable-multilib: Without this, the "user friendly check" is breaking, since https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=205975 * --with-system-zlib: Without this, --enable-multilib tries to build a 32-bit zlib with "CC=/build/32/./prev-gcc/xgcc" Thanks! /haubi/ 2015-06-09 Michael Haubenwallner * Makefile.def (libvtv): Drop raw_cxx=true. * Makefile.in: Regenerate. Index: Makefile.def === --- Makefile.def (revision 224279) +++ Makefile.def (working copy) @@ -135,8 +135,7 @@ lib_path=.libs; }; target_modules = { module= libvtv; bootstrap=true; - lib_path=.libs; - raw_cxx=true; }; + lib_path=.libs; }; target_modules = { module= libcilkrts; lib_path=.libs; }; target_modules = { module= liboffloadmic; Index: Makefile.in === --- Makefile.in (revision 224279) +++ Makefile.in (working copy) @@ -35290,7 +35290,7 @@ fi; \ test ! -f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ $(SHELL) $(srcdir)/mkinstalldirs $(TARGET_SUBDIR)/libvtv; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ echo Configuring in $(TARGET_SUBDIR)/libvtv; \ cd "$(TARGET_SUBDIR)/libvtv" || exit 1; \ case $(srcdir) in \ @@ -35333,7 +35333,7 @@ mv $(TARGET_SUBDIR)/libvtv/multilib.tmp $(TARGET_SUBDIR)/libvtv/multilib.out; \ fi; \ test ! -f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ CFLAGS="$(CFLAGS_FOR_TARGET)"; export CFLAGS; \ CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \ LIBCFLAGS="$(LIBCFLAGS_FOR_TARGET)"; export LIBCFLAGS; \ @@ -35377,7 +35377,7 @@ mv $(TARGET_SUBDIR)/libvtv/multilib.tmp $(TARGET_SUBDIR)/libvtv/multilib.out; \ fi; \ test ! -f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ \ CFLAGS="$(CFLAGS_FOR_TARGET)"; export CFLAGS; \ CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \ @@ -35422,7 +35422,7 @@ mv $(TARGET_SUBDIR)/libvtv/multilib.tmp $(TARGET_SUBDIR)/libvtv/multilib.out; \ fi; \ test ! -f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ \ CFLAGS="$(CFLAGS_FOR_TARGET)"; export CFLAGS; \ CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \ @@ -35467,7 +35467,7 @@ mv $(TARGET_SUBDIR)/libvtv/multilib.tmp $(TARGET_SUBDIR)/libvtv/multilib.out; \ fi; \ test ! -f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ \ CFLAGS="$(CFLAGS_FOR_TARGET)"; export CFLAGS; \ CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \ @@ -35512,7 +35512,7 @@ mv $(TARGET_SUBDIR)/libvtv/multilib.tmp $(TARGET_SUBDIR)/libvtv/multilib.out; \ fi; \ test ! 
-f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ \ CFLAGS="$(CFLAGS_FOR_TARGET)"; export CFLAGS; \ CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \ @@ -35557,7 +35557,7 @@ mv $(TARGET_SUBDIR)/libvtv/multilib.tmp $(TARGET_SUBDIR)/libvtv/multilib.out; \ fi; \ test ! -f $(TARGET_SUBDIR)/libvtv/Makefile || exit 0; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ \ CFLAGS="$(CFLAGS_FOR_TARGET)"; export CFLAGS; \ CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \ @@ -35594,9 +35594,9 @@ all-target-libvtv: configure-target-libvtv @r=`${PWD_COMMAND}`; export r; \ s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ (cd $(TARGET_SUBDIR)/libvtv && \ - $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS) 'CXX=$$(RAW_CXX_FOR_TARGET)' 'CXX_FOR_TARGET=$$(RAW_CXX_FOR_TARGET)' \ + $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS) \ $(TARGET-target-libvtv)) @endif target-libvtv @@ -35615,7 +35615,7 @@ @r=`${PWD_COMMAND}`; export r; \ s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \ TFLAGS="$(STAGE1_TFLAGS)"; \ - $(RAW_CXX_TARGET_EXPORTS) \ + $(NORMAL_TARGET_EXPORTS) \ cd $(TARGET_SUBDIR)/libvtv &&
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 09, 2015 at 04:12:33PM +0200, Uros Bizjak wrote: > On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek wrote: > > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: > >> I'm afraid that simple scan loop won't work correctly on x32. There > >> are some issues with UNSPEC_TP for this target, so we have to generate > >> zero_extend of SImode UNSPEC, e.g.: > >> > >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) > >> > >> as can be seen in get_thread_pointer to construct the address. It > >> looks that your loop won't find the UNSPEC_TP tag in the above case. > > > > You're right, for -m32 it would need to start with Yeah, I meant -mx32 (which I have no experience with nor spare time for). > >rtx *x = &addr; > > + while (GET_CODE (*x) == ZERO_EXTEND > > +|| GET_CODE (*x) == AND > > +|| GET_CODE (*x) == SUBREG) > > +x = &XEXP (*x, 0); > > Oh, you can use SImode_address_operand predicate here. Do I need to loop, or can there be just one SImode_address_operand code? Do you want to use the iterators (as in the second patch) or not (then is if (SImode_address_operand (addr, VOIDmode)) x = &XEXP (addr, 0); ok)? Is Pmode always SImode for -mx32, or depending on some switch or something? Would it be acceptable to just guard the changes in the patch with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost when to ZERO_EXTEND addr (if needed at all), etc. Jakub
[hsa] Integrate into the existing accelerator framework, libgomp plugin
Hi, the big patch below brings HSA substantially closer to nice co-existence with the existing accelerator framework in trunk. Above all, it implements a libgomp plugin, and with this patch applied, the branch uses it to run the target constructs of OpenMP 4. While there may be mistakes, I hope that the plugin already implements all that is necessary and should be very close to the version I will propose for inclusion to trunk late in the summer. The changes in libgomp itself are not big, I assume the only notable thing is that it I made it skip any and all re-mapping and transport of data (as opposed to functions) if the plugin has the GOMP_OFFLOAD_CAP_SHARED_MEM capability. The automake/autoconf scripts have been changed so that the plugin is built only when hsa is listed in --enable-offload-targets during configuration of gcc. There is also a new option --with-hsa-runtime (and more specific options --with-hsa-runtime-include and --with-hsa-runtime-lib) to provide the path to the HSA run-time header files and library, which you are now likely to have to use to play with the branch. The file hsa.c and the hsa headers in the libgomp proper have not gone away yet, so that the prototype entry point can still be used for a bit longer. But it will be removed in the big next step. The majority of changes to the compiler itself are changes to autoconf scripts to recognize hsa in the list --enable-offload-targets and define ENABLE_HSA if it si present but also to *not* define ENABLE_OFFLOADING when it is the only accelerator listed so that gcc does not create the offloading LTO elf sections. Similarly, I have changed the driver not to attempt to find an external compiler for HSA, and the common option handling so that it is usable to disable hsa offloading even when it has been configured. The fact that HSA could be enabled even though ENABLE_OFFLOADING is not, however, is slightly confusing, especially since we happy use the offload flag of cgraph_node. Would the rest of the community be opposed to changing the name of the macro to something that would imply that stuff is being streamed to disk? I have tested the patch passes our small HSA testsuite, wrote myself a simple multi-threaded shared-library loading and unloading test running an assignment in a target OppenMP construct and have also verified that the Intel accelerator works and tests on the branch as well as on corresponding trunk revision. I'm looking forward to any comments and suggestions, meanwhile I have committed the patch to the branch as r224284. Thanks, Martin 2015-06-09 Martin Jambor libgomp/ * Makefile.in: Regenerated. * config.h.in: Likewise. * configure: Likewise. * libgomp-plugin.h (offload_target_type): Added OFFLOAD_TARGET_TYPE_HSA. * plugin/Makefrag.am: Conditionally build HSA plugin. * plugin/configfrag.ac: New options for providing path to HSA run-time. Test that it is available if building the plugin. * plugin/plugin-hsa.c: New file. * target.c (GOMP_target): Do not re-map arguments when the device is capable of sharing memory. (GOMP_target_data): Likewise. * testsuite/Makefile.in (): Regenerated. gcc/ * builtin-types.def (BT_FN_VOID_PTR_INT_PTR): New. * common.opt (flag_disable_hsa): Likewise. * config.in: Regenrated. * configure : Likewise. * configure.ac (accel_dir_suffix): Set ENABLE_HSA when hsa is listed as accelerator, do not set ENABLE_OFFLOADING if it is the only one. * hsa-brig.c (hsa_dtor_statements): New variable. (hsa_output_kernel_mapping): Generate libgomp registration and unregistration calls. 
* hsa-gen.c (generate_hsa): New parameter kernel, use it rather than parameters. (pass_gen_hsail::gate): Make conditional on ENABLE_HSA. (pass_gen_hsail::execute): Detect kernels. * lto-wrapper.c (compile_images_for_offload_targets): Do not attempt to invoke an external hsa accelerator compiler. * omp-builtins.def (BUILT_IN_GOMP_OFFLOAD_REGISTER): New. (BUILT_IN_GOMP_OFFLOAD_UNREGISTER): Likewise. * opts.c (common_handle_option): Disable HSA if requested on the command line. gcc/fortran/ * types.def (BT_FN_VOID_PTR_INT_PTR): New. diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 0e34531..f462d8a 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -450,6 +450,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_BOOL_ULONG_ULONG_ULONGPTR, BT_BOOL, BT_ULONG, BT_ULONG, BT_PTR_ULONG) DEF_FUNCTION_TYPE_3 (BT_FN_BOOL_ULONGLONG_ULONGLONG_ULONGLONGPTR, BT_BOOL, BT_ULONGLONG, BT_ULONGLONG, BT_PTR_ULONGLONG) +DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_PTR, BT_VOID, BT_PTR, BT_INT, BT_PTR) DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR, BT_SIZE, BT_CONST_PTR, BT
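As an illustration (my own minimal sketch, not part of the patch) of what now goes through the plugin: an OpenMP 4 target region like the one below is handed to the GOMP_OFFLOAD_* entry points of plugin-hsa.c, and because the HSA device advertises GOMP_OFFLOAD_CAP_SHARED_MEM the map clause causes no copying -- libgomp simply passes the host pointers through.

int
main (void)
{
  int v[1024];

  /* Runs on the HSA device when offloading is configured; with shared
     memory the mapping below does not transport any data.  */
  #pragma omp target map(tofrom: v[0:1024])
  for (int i = 0; i < 1024; i++)
    v[i] = i;

  return v[17] == 17 ? 0 : 1;
}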
PATCH to check_global_declaration for bootstrap/66448
On targets without alias support (Darwin) bootstrap was failing with a warning about one of the constructor clones being unused. We shouldn't warn about that, since clones are artificial; we should only warn if the abstract constructor is unused. Tested x86_64-pc-linux-gnu, applying to trunk. commit 16232674086f330d05f7279681978eb683990ed5 Author: Jason Merrill Date: Tue Jun 9 10:12:47 2015 -0400 PR bootstrap/66448 * toplev.c (check_global_declaration): Don't warn about a clone. diff --git a/gcc/testsuite/g++.dg/warn/Wunused-function1.C b/gcc/testsuite/g++.dg/warn/Wunused-function1.C new file mode 100644 index 000..86b319a --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wunused-function1.C @@ -0,0 +1,12 @@ +// PR bootstrap/66448 +// { dg-options "-Wunused-function" } + +struct A { A(); }; +namespace { + struct B: virtual A { B(); }; + B::B() { } + B b; + + struct C: virtual A { C(); }; + C::C() { } // { dg-warning "defined but not used" } +} diff --git a/gcc/toplev.c b/gcc/toplev.c index 306d008..b1ccc18 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -529,6 +529,7 @@ check_global_declaration (tree decl) && ! (DECL_NAME (decl) && TREE_USED (DECL_NAME (decl))) && ! DECL_EXTERNAL (decl) && ! DECL_ARTIFICIAL (decl) + && ! DECL_ABSTRACT_ORIGIN (decl) && ! TREE_PUBLIC (decl) /* A volatile variable might be used in some non-obvious way. */ && ! TREE_THIS_VOLATILE (decl)
[PATCH] xtensa: implement trap pattern
gcc/ * config/xtensa/xtensa.h (TARGET_DEBUG): New definition. * config/xtensa/xtensa.md (define_attr "type"): New type "trap". (define_insn "trap"): New definition. --- gcc/config/xtensa/xtensa.h | 1 + gcc/config/xtensa/xtensa.md | 15 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h index 011411c..584080b 100644 --- a/gcc/config/xtensa/xtensa.h +++ b/gcc/config/xtensa/xtensa.h @@ -67,6 +67,7 @@ extern unsigned xtensa_current_frame_size; #define TARGET_THREADPTR XCHAL_HAVE_THREADPTR #define TARGET_LOOPS XCHAL_HAVE_LOOPS #define TARGET_WINDOWED_ABI(XSHAL_ABI == XTHAL_ABI_WINDOWED) +#define TARGET_DEBUG XCHAL_HAVE_DEBUG #define TARGET_DEFAULT \ ((XCHAL_HAVE_L32R? 0 : MASK_CONST16) | \ diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index 6d84384..fed7586 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -86,7 +86,7 @@ ;; Attributes. (define_attr "type" - "unknown,jump,call,load,store,move,arith,multi,nop,farith,fmadd,fconv,fload,fstore,mul16,mul32,div32,mac16,rsr,wsr,entry" + "unknown,jump,call,load,store,move,arith,multi,nop,farith,fmadd,fconv,fload,fstore,mul16,mul32,div32,mac16,rsr,wsr,entry,trap" (const_string "unknown")) (define_attr "mode" @@ -1764,6 +1764,19 @@ [(set_attr "length" "0") (set_attr "type" "nop")]) +(define_insn "trap" + [(trap_if (const_int 1) (const_int 0))] + "" +{ + if (TARGET_DEBUG) +return (TARGET_DENSITY ? "break.n\t0" : "break\t0, 0"); + else +return (TARGET_DENSITY ? "ill.n" : "ill"); +} + [(set_attr "type""trap") + (set_attr "mode""none") + (set_attr "length" "3")]) + ;; Setting up a frame pointer is tricky for Xtensa because GCC doesn't ;; know if a frame pointer is required until the reload pass, and ;; because there may be an incoming argument value in the hard frame -- 1.8.1.4
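For reference, a tiny example (a sketch, not a test from the patch; the name check is invented) of source that should now expand through the new pattern: __builtin_trap is emitted via the named "trap" insn, so on xtensa it becomes break.n / break 0, 0 when the debug option is configured and ill.n / ill otherwise.

void
check (int *p)
{
  if (!p)
    __builtin_trap ();   /* expands to the new "trap" insn */
  *p = 1;
}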
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 4:26 PM, Jakub Jelinek wrote: >> >> I'm afraid that simple scan loop won't work correctly on x32. There >> >> are some issues with UNSPEC_TP for this target, so we have to generate >> >> zero_extend of SImode UNSPEC, e.g.: >> >> >> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) >> >> >> >> as can be seen in get_thread_pointer to construct the address. It >> >> looks that your loop won't find the UNSPEC_TP tag in the above case. >> > >> > You're right, for -m32 it would need to start with > > Yeah, I meant -mx32 (which I have no experience with nor spare time for). > >> >rtx *x = &addr; >> > + while (GET_CODE (*x) == ZERO_EXTEND >> > +|| GET_CODE (*x) == AND >> > +|| GET_CODE (*x) == SUBREG) >> > +x = &XEXP (*x, 0); >> >> Oh, you can use SImode_address_operand predicate here. > > Do I need to loop, or can there be just one SImode_address_operand IIRC, apart from the whole address, only UNSPEC_TP can be zero_extended. It is a hardware "feature" (== HW bug) that addr32 doesn't apply to segment registers. > code? Do you want to use the iterators (as in the second patch) or not > (then is > if (SImode_address_operand (addr, VOIDmode)) > x = &XEXP (addr, 0); > ok)? Is Pmode always SImode for -mx32, or depending on some switch or Nope, it depends on -maddress-mode switch, and can be SImode or DImode. > something? Would it be acceptable to just guard the changes in the patch > with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost > when to ZERO_EXTEND addr (if needed at all), etc. If you wish, I can take your patch and take if further. -mx32 is a delicate beast... Uros. > > Jakub
[patch] [jit] Install headers using INSTALL_DATA
Install the gcc/jit header files without the x bit set. Ok for the trunk and the 5 branch? Matthias gcc/jit/ 2015-06-09 Matthias Klose * Make-lang.in (jit.install-common): Install headers using INSTALL_DATA. Index: gcc/jit/Make-lang.in === --- gcc/jit/Make-lang.in(revision 224282) +++ gcc/jit/Make-lang.in(working copy) @@ -268,9 +268,9 @@ ln -sf \ $(LIBGCCJIT_SONAME_SYMLINK)\ $(DESTDIR)/$(libdir)/$(LIBGCCJIT_LINKER_NAME_SYMLINK) - $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit.h \ + $(INSTALL_DATA) $(srcdir)/jit/libgccjit.h \ $(DESTDIR)/$(includedir)/libgccjit.h - $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit++.h \ + $(INSTALL_DATA) $(srcdir)/jit/libgccjit++.h \ $(DESTDIR)/$(includedir)/libgccjit++.h jit.install-man:
Re: [PR64164] drop copyrename, integrate into expand
This also broke bootstrap on PPC64 LE Linux with the same error. - David
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote: > > something? Would it be acceptable to just guard the changes in the patch > > with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost > > when to ZERO_EXTEND addr (if needed at all), etc. > > If you wish, I can take your patch and take if further. -mx32 is a > delicate beast... If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff now. Note that for -m64/-mx32 it will be much harder to create a reproducer, because to trigger the bug one has to convince the register allocator to allocate the lhs of the load in certain registers (not that hard), but also the index register (to be scaled, also not that hard) and also the register holding the tls symbol immediate. Wonder if one has to keep all but the two registers live across the load or something similar. Jakub
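Purely as a sketch of the shape of code involved (not a reproducer -- per the above, getting the register allocator to pick the colliding registers is the hard part; the name tls_arr is invented): a multi-word load, e.g. long long with -m32, from a TLS array with a scaled index is where ix86_split_long_move may have to rewrite the %gs:-based address into an lea.

__thread unsigned long long tls_arr[64];

unsigned long long
get (long i, long j)
{
  /* Two multi-word TLS loads with scaled indices; with -m32 each one is
     split into two SImode moves, and the split takes the lea path when the
     destination overlaps the address registers.  */
  return tls_arr[i] + tls_arr[j];
}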
Re: C++ PATCH for c++/51747 (list-initialization from same type)
On 05/07/2015 12:45 PM, Jason Merrill wrote: We also need to adjust digest_init_r. This was only needed because we weren't calling reshape_init. Now that Paolo has fixed that, this can become an assert. commit 8d2793c42fc3de4a0b665f4c2ff2a2946ae0beda Author: Jason Merrill Date: Tue Jun 9 09:51:31 2015 -0400 DR 1467 PR c++/51747 * typeck2.c (digest_init_r): Replace previous change with gcc_unreachable. diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index b077f02..709875c 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -1089,6 +1089,7 @@ digest_init_r (tree type, tree init, bool nested, int flags, || TREE_CODE (type) == UNION_TYPE || TREE_CODE (type) == COMPLEX_TYPE); +#ifdef ENABLE_CHECKING /* "If T is a class type and the initializer list has a single element of type cv U, where U is T or a class derived from T, the object is initialized from that element." */ @@ -1099,8 +1100,10 @@ digest_init_r (tree type, tree init, bool nested, int flags, { tree elt = CONSTRUCTOR_ELT (init, 0)->value; if (reference_related_p (type, TREE_TYPE (elt))) - init = elt; + /* We should have fixed this in reshape_init. */ + gcc_unreachable (); } +#endif if (BRACE_ENCLOSED_INITIALIZER_P (init) && !TYPE_NON_AGGREGATE_CLASS (type))
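A small illustration (mine, not from the patch; types T and D are invented) of the rule the removed digest_init_r code was duplicating and that reshape_init now handles: when a braced list contains a single element whose type is T itself (or a class derived from T), the object is initialized from that element instead of treating the list as aggregate initialization of T's members.

struct T { int i; };
struct D : T { };

T a{1};    // plain aggregate initialization
T b{a};    // DR 1467: b is initialized from 'a' itself, not from a.i
D d{};
T c{d};    // likewise: initialized from 'd' via T's copy constructor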
Re: Fix more of C/fortran canonical type issues
> On Mon, 8 Jun 2015, Jan Hubicka wrote: > > > > > > > I think we should instead work towards eliminating the get_alias_set > > > langhook first. The LTO langhook variant contains the same handling, btw, > > > so just inline that into get_alias_set and see what remains? > > > > I see, i completely missed existence of gimple_get_alias_set. It makes more > > sense now. > > > > Is moving everyting to alias.c realy a desirable thing? If non-C languages > > do > > not have this rule, why we want to reduce the code quality when compiling > > those? > > Well, for consistency and for getting rid of one langhook ;) :) In a way this particular langhook makes sense to me - TBAA rules are language specific. We also may with explicit streaming of the TBAA dag, like LLVM does. Anyway, this is the updated patch fixing the Fortran's interoperability with size_t and signed char. I will send separate patch for the extra lto-symtab warnings shortly. I will be happy looking into the TYPE_CANONICAL (int) to be different from TYPE_CANONICAL (unsigned int) if that seems desirable. There are two things that needs to be solved - hash_canonical_type/gimple_canonical_types_compatible_p can't use TYPE_CNAONICAL of subtypes in all cases (that is easy) and we will need some way to recognize the conflict in lto-symtab other thanjust comparing TYPE_CANONICAL to not warn when a variable is declared signed in Fortran unit and unsigned in C. Bootstrapped/regtested ppc64le-linux. * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED of INTEGER_TYPE. * tree.c (gimple_canonical_types_compatible_p): Do not compare TYPE_UNSIGNED of INTEGER_TYPE. * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling ahead the canonical type lookup. * gfortran.dg/lto/bind_c-2_0.f90: New testcase * gfortran.dg/lto/bind_c-2_1.c: New testcase * gfortran.dg/lto/bind_c-3_0.f90: New testcase * gfortran.dg/lto/bind_c-3_1.c: New testcase * gfortran.dg/lto/bind_c-4_0.f90: New testcase * gfortran.dg/lto/bind_c-4_1.c: New testcase Index: lto/lto.c === --- lto/lto.c (revision 224252) +++ lto/lto.c (working copy) @@ -298,6 +298,7 @@ hash_canonical_type (tree type) { inchash::hash hstate; + enum tree_code code; /* We compute alias sets only for types that needs them. Be sure we do not recurse to something else as we can not hash incomplete @@ -309,7 +310,8 @@ smaller sets; when searching for existing matching types to merge, only existing types having the same features as the new type will be checked. */ - hstate.add_int (tree_code_for_canonical_type_merging (TREE_CODE (type))); + code = tree_code_for_canonical_type_merging (TREE_CODE (type)); + hstate.add_int (code); hstate.add_int (TYPE_MODE (type)); /* Incorporate common features of numerical types. */ @@ -319,7 +321,14 @@ || TREE_CODE (type) == OFFSET_TYPE || POINTER_TYPE_P (type)) { - hstate.add_int (TYPE_UNSIGNED (type)); + /* Some Fortran integer types are interoperable with C types regardless +their signedness. We need to ignore sign on these to make it +possible for structure containing unsigned type to interoperate +with structure containing signed type which is also required by +the standard. It is thus not enough to keep alias set of signed +type same with alias set of unsigned type. 
*/ + if (code != INTEGER_TYPE) +hstate.add_int (TYPE_UNSIGNED (type)); hstate.add_int (TYPE_PRECISION (type)); } Index: tree.c === --- tree.c (revision 224252) +++ tree.c (working copy) @@ -12879,6 +12879,7 @@ gimple_canonical_types_compatible_p (const_tree t1, const_tree t2, bool trust_type_canonical) { + enum tree_code code; /* Type variants should be same as the main variant. When not doing sanity checking to verify this fact, go to main variants and save some work. */ if (trust_type_canonical) @@ -12918,9 +12919,9 @@ && trust_type_canonical) return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2); + code = tree_code_for_canonical_type_merging (TREE_CODE (t1)); /* Can't be the same type if the types don't have the same code. */ - if (tree_code_for_canonical_type_merging (TREE_CODE (t1)) - != tree_code_for_canonical_type_merging (TREE_CODE (t2))) + if (code != tree_code_for_canonical_type_merging (TREE_CODE (t2))) return false; /* Qualifiers do not matter for canonical type comparison purposes. */ @@ -12945,7 +12946,14 @@ { /* Can't be the same type if they have different sign or precision. */ if (TYPE_PRECISION (t1) != TYPE_PRECISION (t2) - || TYPE_UNSIGNED (t1) != TYPE_UNSIGNED (t2)) + /* Some
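The new bind_c-* testcases are only listed above, so purely as an illustration (my sketch, not the committed tests; the names struct s, consume and fill are invented), this is the C half of the kind of mixed-language LTO scenario involved: Fortran's integer(c_size_t) is a signed integer type, yet it must interoperate with C's unsigned size_t, so canonical type merging -- and hence hash_canonical_type -- cannot distinguish the two by TYPE_UNSIGNED.

#include <stddef.h>

/* The matching Fortran unit would declare an interoperable bind(c) derived
   type with an integer(c_size_t) component and a bind(c) procedure taking
   it; only the C side is sketched here.  */
struct s
{
  size_t n;
};

extern void consume (struct s *p);  /* assumed to be implemented in Fortran */

void
fill (struct s *p, size_t n)
{
  p->n = n;
  consume (p);
}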
[PATCH] PowerPC atomic memmodel cleanup
The infrastructure for atomic operations in the rs6000 port has not utilized the cleaner GCC internal APIs for manipulating memmodel. The appended patch converts the direct casts and use of memmodel_from_int to consistently use memmodel_base. This also allows the SYNC variants to be removed from switch case statements. This should not change any semantics or behavior of atomics in PowerPC code generation. Comments? Bootstrapped on powerpc-ibm-aix7.1.0.0 * config/rs6000/rs6000.c (rs6000_pre_atomic_barrier): Remove SYNC variants cases from switch. (rs6000_post_atomic_barrier): Same. (rs6000_expand_atomic_compare_and_swap): Use memmodel_base. (rs6000_expand_atomic_exchange): Same. (rs6000_expand_atomic_op): Same. * config/rs6000/sync.md (mem_thread_fence): Use memodel_base. Remove SYNC variants cases from switch. (atomic_load): Same. (atomic_store): Same. Thanks, David * config/rs6000/rs6000.c (rs6000_pre_atomic_barrier): Remove SYNC variants cases from switch. (rs6000_post_atomic_barrier): Same. (rs6000_expand_atomic_compare_and_swap): Use memmodel_base. (rs6000_expand_atomic_exchange): Same. (rs6000_expand_atomic_op): Same. * config/rs6000/sync.md (mem_thread_fence): Use memodel_base. Remove SYNC variants cases from switch. (atomic_load): Same. (atomic_store): Same. Index: rs6000.c === --- rs6000.c(revision 224288) +++ rs6000.c(working copy) @@ -20558,15 +20558,12 @@ rs6000_pre_atomic_barrier (rtx mem, enum memmodel case MEMMODEL_RELAXED: case MEMMODEL_CONSUME: case MEMMODEL_ACQUIRE: -case MEMMODEL_SYNC_ACQUIRE: break; case MEMMODEL_RELEASE: -case MEMMODEL_SYNC_RELEASE: case MEMMODEL_ACQ_REL: emit_insn (gen_lwsync ()); break; case MEMMODEL_SEQ_CST: -case MEMMODEL_SYNC_SEQ_CST: emit_insn (gen_hwsync ()); break; default: @@ -20583,13 +20580,10 @@ rs6000_post_atomic_barrier (enum memmodel model) case MEMMODEL_RELAXED: case MEMMODEL_CONSUME: case MEMMODEL_RELEASE: -case MEMMODEL_SYNC_RELEASE: break; case MEMMODEL_ACQUIRE: -case MEMMODEL_SYNC_ACQUIRE: case MEMMODEL_ACQ_REL: case MEMMODEL_SEQ_CST: -case MEMMODEL_SYNC_SEQ_CST: emit_insn (gen_isync ()); break; default: @@ -20690,8 +20684,8 @@ rs6000_expand_atomic_compare_and_swap (rtx operand oldval = operands[3]; newval = operands[4]; is_weak = (INTVAL (operands[5]) != 0); - mod_s = memmodel_from_int (INTVAL (operands[6])); - mod_f = memmodel_from_int (INTVAL (operands[7])); + mod_s = memmodel_base (INTVAL (operands[6])); + mod_f = memmodel_base (INTVAL (operands[7])); orig_mode = mode = GET_MODE (mem); mask = shift = NULL_RTX; @@ -20810,7 +20804,7 @@ rs6000_expand_atomic_exchange (rtx operands[]) retval = operands[0]; mem = operands[1]; val = operands[2]; - model = (enum memmodel) INTVAL (operands[3]); + model = memmodel_base (INTVAL (operands[3])); mode = GET_MODE (mem); mask = shift = NULL_RTX; @@ -20861,7 +20855,7 @@ void rs6000_expand_atomic_op (enum rtx_code code, rtx mem, rtx val, rtx orig_before, rtx orig_after, rtx model_rtx) { - enum memmodel model = (enum memmodel) INTVAL (model_rtx); + enum memmodel model = memmodel_base (INTVAL (model_rtx)); machine_mode mode = GET_MODE (mem); machine_mode store_mode = mode; rtx label, x, cond, mask, shift; Index: sync.md === --- sync.md (revision 224288) +++ sync.md (working copy) @@ -41,7 +41,7 @@ [(match_operand:SI 0 "const_int_operand" "")];; model "" { - enum memmodel model = memmodel_from_int (INTVAL (operands[0])); + enum memmodel model = memmodel_base (INTVAL (operands[0])); switch (model) { case MEMMODEL_RELAXED: @@ -48,14 +48,11 @@ break; case MEMMODEL_CONSUME: case MEMMODEL_ACQUIRE: -case 
MEMMODEL_SYNC_ACQUIRE: case MEMMODEL_RELEASE: -case MEMMODEL_SYNC_RELEASE: case MEMMODEL_ACQ_REL: emit_insn (gen_lwsync ()); break; case MEMMODEL_SEQ_CST: -case MEMMODEL_SYNC_SEQ_CST: emit_insn (gen_hwsync ()); break; default: @@ -147,7 +144,7 @@ if (mode == TImode && !TARGET_SYNC_TI) FAIL; - enum memmodel model = memmodel_from_int (INTVAL (operands[2])); + enum memmodel model = memmodel_base (INTVAL (operands[2])); if (is_mm_seq_cst (model)) emit_insn (gen_hwsync ()); @@ -185,9 +182,7 @@ break; case MEMMODEL_CONSUME: case MEMMODEL_ACQUIRE: -case MEMMODEL_SYNC_ACQUIRE: case MEMMODEL_SEQ_CST: -case MEMMODEL_SYNC_SEQ_CST: emit_insn (gen_loadsync_ (operands[0])); break; default: @@ -214,17 +209,15 @@ if (mode =
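A user-level view of the two memmodel flavours this touches (a sketch, not a test from the patch; the name fetch_add_both is invented): the __atomic_* builtins use the plain C11 models, while the legacy __sync_* builtins are expanded with the internal MEMMODEL_SYNC_* variants. memmodel_base strips the SYNC bit, so both end up in the same switch cases in the rs6000 expanders and code generation stays the same.

long
fetch_add_both (long *p)
{
  /* Expanded with MEMMODEL_SEQ_CST.  */
  long a = __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);
  /* Expanded with MEMMODEL_SYNC_SEQ_CST, which memmodel_base reduces to
     MEMMODEL_SEQ_CST before the switch.  */
  long b = __sync_fetch_and_add (p, 1);
  return a + b;
}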
[PATCH 3/3] Improve -Wmissing-indentation heuristics
This patch improves the heuristics of the warning in a number of ways. The improvements are hopefully adequately documented in the code comments. The additions to the test case also highlight the improvements. I tested an earlier version of this patch on more than a dozen C code bases. I only found one class of bogus warnings yet emitted, in the libpng and bdwgc projects. These projects have a coding style which indents code inside #ifdefs as if this code was guarded by an if(), e.g. if (foo != 0) x = 10; else // GUARD y = 100; // BODY #ifdef BAR blah (); // NEXT #endif These bogus warnings are pre-existing, however (i.e. not caused by this patch). gcc/c-family/ChangeLog: * c-indentation.c (should_warn_for_misleading_indentation): Improve heuristics. gcc/testsuite/ChangeLog: * c-c++-common/Wmisleading-indentation.c: Add more tests. --- gcc/c-family/c-indentation.c | 167 ++--- .../c-c++-common/Wmisleading-indentation.c | 166 2 files changed, 313 insertions(+), 20 deletions(-) diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c index f43aee6..8de3bc5 100644 --- a/gcc/c-family/c-indentation.c +++ b/gcc/c-family/c-indentation.c @@ -185,6 +185,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, location_t body_loc = body_tinfo.location; location_t next_stmt_loc = next_tinfo.location; + enum cpp_ttype body_type = body_tinfo.type; enum cpp_ttype next_tok_type = next_tinfo.type; /* Don't attempt to compare the indentation of BODY_LOC and NEXT_STMT_LOC @@ -230,12 +231,33 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, || next_tinfo.keyword == RID_ELSE) return false; + /* Likewise, if the body of the guard is a compound statement then control + flow is quite visually explicit regardless of the code's possibly poor + indentation, e.g. + + while (foo) <- GUARD + { <- BODY + bar (); + } + baz ();<- NEXT + +Things only get muddy when the body of the guard does not have +braces, e.g. + +if (foo) <- GUARD + bar (); <- BODY + baz (); <- NEXT + */ + if (body_type == CPP_OPEN_BRACE) +return false; + /* Don't warn here about spurious semicolons. */ if (next_tok_type == CPP_SEMICOLON) return false; expanded_location body_exploc = expand_location (body_loc); expanded_location next_stmt_exploc = expand_location (next_stmt_loc); + expanded_location guard_exploc = expand_location (guard_loc); /* They must be in the same file. */ if (next_stmt_exploc.file != body_exploc.file) @@ -253,13 +275,20 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, if (flag) foo (); bar (); ^ WARN HERE + + if (flag) ; { + ^ WARN HERE + + if (flag) +; { + ^ WARN HERE + Cases where we don't want to issue a warning: various_code (); if (flag) foo (); bar (); more_code (); ^ DON'T WARN HERE. */ if (next_stmt_exploc.line == body_exploc.line) { - expanded_location guard_exploc = expand_location (guard_loc); if (guard_exploc.file != body_exploc.file) return true; if (guard_exploc.line < body_exploc.line) @@ -307,14 +336,10 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, bar (); ^ DON'T WARN HERE -if (flag) { - foo (); -} else -{ - bar (); -} -baz (); -^ DON'T WARN HERE + if (flag) + ; + foo (); + ^ DON'T WARN HERE */ if (next_stmt_exploc.line > body_exploc.line) { @@ -322,29 +347,84 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, "visual column"... 
*/ unsigned int next_stmt_vis_column; unsigned int body_vis_column; + unsigned int body_line_first_nws; /* If we can't determine it, don't issue a warning. This is sometimes the case for input files containing #line directives, and these are often for autogenerated sources (e.g. from .md files), where it's not clear that it's meaningful to look at indentation. */ if (!get_visual_column (next_stmt_exploc, &next_stmt_vis_column)) return false; - if (!get_visual_column (body_exploc, &body_vis_column)) + if (!get_visual_column (body_exploc, + &body_vis_column, + &body_line_first_nws)) return false; - if (next_stmt_vis_column == body_vis_column) + if ((body_type != CPP_SEMICOLON + && next_stmt_vis_column == body_vis_column) + /* As a special case handle the case where the body is a semicolo
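To make the new heuristics concrete, a small sketch (mine, distilled from the comments added above rather than copied from the new tests; f, bar and baz are invented names): a null-statement body followed by an open brace on the same line now warns, while a brace-enclosed body suppresses the warning no matter how the following statement is indented.

void bar (void);
void baz (void);

void
f (int flag)
{
  if (flag) ; {        /* warns: the '{' looks guarded, but is not */
    bar ();
  }

  if (flag)
    {                  /* no warning: the body is a compound statement */
      bar ();
    }
    baz ();            /* ... even though this line is indented oddly */
}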
[PATCH 1/3] Refactor entry point to -Wmisleading-indentation
This patch refactors the entry point of -Wmisleading-indentation from: void warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, enum cpp_ttype next_tok_type, const char *guard_kind); to struct token_indent_info { location_t location; cpp_ttype type; rid keyword; }; void warn_for_misleading_indentation (const token_indent_info &guard_tinfo, const token_indent_info &body_tinfo, const token_indent_info &next_tinfo); The purpose of this refactoring is to expose more information to the -Wmisleading-indentation implementation to allow for more advanced heuristics and for better coverage. (I decided to keep the usage of const references because nobody seems to mind. Also I added a new header file, c-indentation.h.) gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Define. (get_token_indent_info): Define. (warn_for_misleading_information): Declare. * c-common.h (warn_for_misleading_information): Remove. * c-identation.c (warn_for_misleading_indentation): Change declaration to take three token_indent_infos. Adjust accordingly. * c-identation.c (should_warn_for_misleading_indentation): Likewise. Bail out early if the body is a compound statement. (guard_tinfo_to_string): Define. gcc/c/ChangeLog: * c-parser.c (c_parser_if_body): Take token_indent_info argument. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. (c_parser_else_body): Likewise. (c_parser_if_statement): Likewise. (c_parser_while_statement): Likewise. (c_parser_for_statement): Likewise. gcc/cp/ChangeLog: * parser.c (cp_parser_selection_statement): Move handling of semicolon body to ... (cp_parser_implicitly_scoped_statement): .. here. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. Take token_indent_info argument. (cp_parser_already_scoped_statement): Likewise. (cp_parser_selection_statement, cp_parser_iteration_statement): Extract a token_indent_info corresponding to the guard token. --- gcc/c-family/c-common.h | 7 --- gcc/c-family/c-indentation.c | 79 ++ gcc/c-family/c-indentation.h | 52 ++ gcc/c/c-parser.c | 81 +++ gcc/cp/parser.c | 100 +++ 5 files changed, 201 insertions(+), 118 deletions(-) create mode 100644 gcc/c-family/c-indentation.h diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 7ba81c4..a78a13c 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -1429,12 +1429,5 @@ extern bool contains_cilk_spawn_stmt (tree); extern tree cilk_for_number_of_iterations (tree); extern bool check_no_cilk (tree, const char *, const char *, location_t loc = UNKNOWN_LOCATION); -/* In c-indentation.c. */ -extern void -warn_for_misleading_indentation (location_t guard_loc, -location_t body_loc, -location_t next_stmt_loc, -enum cpp_ttype next_tok_type, -const char *guard_kind); #endif /* ! GCC_C_COMMON_H */ diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c index f8c28cf..9043f8c 100644 --- a/gcc/c-family/c-indentation.c +++ b/gcc/c-family/c-indentation.c @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. 
If not see #include "stor-layout.h" #include "input.h" #include "c-common.h" +#include "c-indentation.h" extern cpp_options *cpp_opts; @@ -187,11 +188,16 @@ detect_preprocessor_logic (expanded_location body_exploc, description of that function below. */ static bool -should_warn_for_misleading_indentation (location_t guard_loc, - location_t body_loc, - location_t next_stmt_loc, - enum cpp_ttype next_tok_type) +should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, + const token_indent_info &body_tinfo, + const token_indent_info &next_tinfo) { + location_t guard_loc = guard_tinfo.locati
[PATCH 2/3] Remove is_first_nonwhitespace_on_line(), instead improve get_visual_column()
This patch removes the function is_first_nonwhitespace_on_line() in favor of augmenting the function get_visual_column() to optionally return the visual column corresponding to the first non-whitespace character on the line. Existing usage of is_first_nonwhitespace_on_line() can be trivially replaced by calling get_visual_column() and comparing *out with *first_nws. The rationale for this change is that in many cases it is better to use the visual column of the first non-whitespace character rather than the visual column of the token. Consider: if (p) { foo (1); } else // GUARD if (q) // BODY foo (2); foo (3); // NEXT Here, with current heuristics, we do not emit a warning because we notice that the visual columns of each token line up ("suggesting" autogenerated code). Yet it is obvious that we should warn here because it misleadingly looks like the foo (3); statement is guarded by the else. If we instead consider the visual column of the first non-whitespace character on the guard line, the columns will not line up thus we will emit the warning. This will be done in the next patch. gcc/c-family/ChangeLog: * c-indentation.c (get_visual_column): Add parameter first_nws, use it. Update comment documenting the function. (is_first_nonwhitespace_on_line): Remove. (should_warn_for_misleading_indentation): Replace usage of of is_first_nonwhitespace_on_line with get_visual_column. --- gcc/c-family/c-indentation.c | 53 1 file changed, 24 insertions(+), 29 deletions(-) diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c index 9043f8c..f43aee6 100644 --- a/gcc/c-family/c-indentation.c +++ b/gcc/c-family/c-indentation.c @@ -36,11 +36,16 @@ extern cpp_options *cpp_opts; /* Convert libcpp's notion of a column (a 1-based char count) to the "visual column" (0-based column, respecting tabs), by reading the relevant line. + Returns true if a conversion was possible, writing the result to OUT, - otherwise returns false. */ + otherwise returns false. If FIRST_NWS is not NULL, then write to it + the visual column corresponding to the first non-whitespace character + on the line. */ static bool -get_visual_column (expanded_location exploc, unsigned int *out) +get_visual_column (expanded_location exploc, + unsigned int *out, + unsigned int *first_nws = NULL) { int line_len; const char *line = location_get_source_line (exploc, &line_len); @@ -50,6 +55,13 @@ get_visual_column (expanded_location exploc, unsigned int *out) for (int i = 1; i < exploc.column; i++) { unsigned char ch = line[i - 1]; + + if (first_nws != NULL && !ISSPACE (ch)) + { + *first_nws = vis_column; + first_nws = NULL; + } + if (ch == '\t') { /* Round up to nearest tab stop. */ @@ -60,36 +72,13 @@ get_visual_column (expanded_location exploc, unsigned int *out) vis_column++; } + if (first_nws != NULL) +*first_nws = vis_column; + *out = vis_column; return true; } -/* Is the token at LOC the first non-whitespace on its line? - Helper function for should_warn_for_misleading_indentation. */ - -static bool -is_first_nonwhitespace_on_line (expanded_location exploc) -{ - int line_len; - const char *line = location_get_source_line (exploc, &line_len); - - /* If we can't determine it, return false so that we don't issue a - warning. This is sometimes the case for input files - containing #line directives, and these are often for autogenerated - sources (e.g. from .md files), where it's not clear that it's - meaningful to look at indentation. 
*/ - if (!line) -return false; - - for (int i = 1; i < exploc.column; i++) -{ - unsigned char ch = line[i - 1]; - if (!ISSPACE (ch)) - return false; -} - return true; -} - /* Does the given source line appear to contain a #if directive? (or #ifdef/#ifndef). Ignore the possibility of it being inside a comment, for simplicity. @@ -282,9 +271,15 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo, /* They're all on the same line. */ gcc_assert (guard_exploc.file == next_stmt_exploc.file); gcc_assert (guard_exploc.line == next_stmt_exploc.line); + unsigned int guard_vis_column; + unsigned int guard_line_first_nws; + if (!get_visual_column (guard_exploc, + &guard_vis_column, + &guard_line_first_nws)) + return false; /* Heuristic: only warn if the guard is the first thing on its line. */ - if (is_first_nonwhitespace_on_line (guard_exploc)) + if (guard_vis_column == guard_line_first_nws) return true; } } -- 2.4.3.368.g7974889.dirty
[PATCH] PR 66474, Document using %x for VSX registers on PowerPC
A user pointed out that we never documented the requirement to use %x in the ouptut template when VSX registers are used. This patch adds the necessary documentation. Is it ok to install in the trunk and the open release branches? 2015-06-09 Michael Meissner PR target/66474 * doc/md.texi (Machine Constraints): Document that on the PowerPC if you use a constraint that targets a VSX register, you must use %x in the template. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/doc/md.texi === --- gcc/doc/md.texi (revision 224165) +++ gcc/doc/md.texi (working copy) @@ -3070,6 +3070,26 @@ Altivec vector register @item wa Any VSX register if the -mvsx option was used or NO_REGS. +When using any of the register constraints (@code{wa}, @code{wd}, +@code{wf}, @code{wg}, @code{wh}, @code{wi}, @code{wj}, @code{wk}, +@code{wl}, @code{wm}, @code{ws}, @code{wt}, @code{wu}, @code{wv}, +@code{ww}, or @code{wy}) that take VSX registers, you must use +@code{%x} in the template so that the correct register is used. +Otherwise, the register number will be incorrect if an Altivec +register is used in a place where a VSX register is expected. + +@smallexample +asm ("xvadddp %x0,%x1,%x2" "=wa" (v1) : "wa" (v2), "wa" (v3)); +@end smallexample + +is correct, but: + +@smallexample +asm ("xvadddp %0,%1,%2" "=wa" (v1) : "wa" (v2), "wa" (v3)); +@end smallexample + +is not correct. + @item wd VSX vector register to hold vector double data or NO_REGS.
Re: [PR64164] drop copyrename, integrate into expand
On Jun 9, 2015, David Edelsohn wrote: > This also broke bootstrap on PPC64 LE Linux with the same error. Thanks for your reports. I'm looking into the problem. I'd appreciate a preprocessed testcase from either of you to confirm the fix, if not to help debug it. Thanks in advance, -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist | Red Hat Brasil GNU Toolchain Engineer
Re: [gomp4.1] Initial support for some OpenMP 4.1 construct parsing
On Wed, Apr 29, 2015 at 14:06:44 +0200, Jakub Jelinek wrote: > [...] The draft requires only alloc or to > (or always, variants) for enter data and only from or delete (or always, > variants) for exit data, so in theory it is possible to figure that from > the call without extra args, but not so for update - enter data is supposed > to increment reference counts, exit data decrement. [...] TR3.pdf also mentions a 'release' map-type for exit data, but it is not described in the document. > [...] And, we'll need new > arguments for async (pass through info that it is async (nowait) or sync, > and the depend clause address(es)). Given "If a depend clause is present, then it is treated as if it had appeared on the implicit task construct that encloses the target construct", I don't quite understand whether #pragma omp target depend(inout: x) is equivalent to #pragma omp task depend(inout: x) #pragma omp target or not. In other words, can't we just generate GOMP_task (...) with GOMP_target (...) inside, without any new arguments? Thanks, -- Ilya
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 6:21 PM, Jakub Jelinek wrote: > On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote: >> > something? Would it be acceptable to just guard the changes in the patch >> > with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost >> > when to ZERO_EXTEND addr (if needed at all), etc. >> >> If you wish, I can take your patch and take if further. -mx32 is a >> delicate beast... > > If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff > now. > Note that for -m64/-mx32 it will be much harder to create a reproducer, > because to trigger the bug one has to convince the register allocator > to allocate the lhs of the load in certain registers (not that hard), > but also the index register (to be scaled, also not that hard) and > also the register holding the tls symbol immediate. Wonder if one has to > keep all but the two registers live across the load or something similar. Please find attach a patch that takes your idea slightly further. We find perhaps zero-extended UNSPEC_TP, and copy it for further use. At its place, we simply slap const0_rtx. We know that address to multi-word values has to be offsettable, which in case of x32 means that it is NOT zero-extended address. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 224292) +++ config/i386/i386.c (working copy) @@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[]) Do an lea to the last part and use only one colliding move. */ else if (collisions > 1) { - rtx base; + rtx base, addr, tls_base = NULL_RTX; collisions = 1; @@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[]) if (GET_MODE (base) != Pmode) base = gen_rtx_REG (Pmode, REGNO (base)); - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); + addr = XEXP (part[1][0], 0); + if (TARGET_TLS_DIRECT_SEG_REFS) + { + struct ix86_address parts; + int ok = ix86_decompose_address (addr, &parts); + gcc_assert (ok); + if (parts.seg != SEG_DEFAULT) + { + /* It is not valid to use %gs: or %fs: in +lea though, so we need to remove it from the +address used for lea and add it to each individual +memory loads instead. */ + rtx *x = &addr; + while (GET_CODE (*x) == PLUS) +{ + for (i = 0; i < 2; i++) + { + rtx op = XEXP (*x, i); + if ((GET_CODE (op) == UNSPEC +&& XINT (op, 1) == UNSPEC_TP) + || (GET_CODE (op) == ZERO_EXTEND + && GET_CODE (XEXP (op, 0)) == UNSPEC + && (XINT (XEXP (op, 0), 1) + == UNSPEC_TP))) + { + tls_base = XEXP (*x, i); + XEXP (*x, i) = const0_rtx; + break; + } + } + + if (tls_base) + break; + x = &XEXP (*x, 0); + } + gcc_assert (tls_base); + } + } + emit_insn (gen_rtx_SET (base, addr)); + if (tls_base) + base = gen_rtx_PLUS (GET_MODE (base), base, tls_base); part[1][0] = replace_equiv_address (part[1][0], base); for (i = 1; i < nparts; i++) { + if (tls_base) + base = copy_rtx (base); tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i); part[1][i] = replace_equiv_address (part[1][i], tmp); }
Re: [PATCH] PR 66474, Document using %x for VSX registers on PowerPC
On Tue, Jun 09, 2015 at 02:00:48PM -0400, Michael Meissner wrote: > +asm ("xvadddp %x0,%x1,%x2" "=wa" (v1) : "wa" (v2), "wa" (v3)); A colon went missing? ^^^ Segher
Re: [PATCH] PR 66474, Document using %x for VSX registers on PowerPC
On Tue, Jun 09, 2015 at 02:17:19PM -0500, Segher Boessenkool wrote: > On Tue, Jun 09, 2015 at 02:00:48PM -0400, Michael Meissner wrote: > > +asm ("xvadddp %x0,%x1,%x2" "=wa" (v1) : "wa" (v2), "wa" (v3)); > > A colon went missing? ^^^ Yes, I will correct it when I check it in. Thanks. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
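For reference, a self-contained form of the corrected example (a sketch assuming vector double operands and a VSX-enabled PowerPC target):

#include <altivec.h>

vector double
vsx_add (vector double v2, vector double v3)
{
  vector double v1;
  /* %x prints the full VSX register number for operands using the
     "wa" (any VSX register) constraint.  */
  __asm__ ("xvadddp %x0,%x1,%x2" : "=wa" (v1) : "wa" (v2), "wa" (v3));
  return v1;
}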
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote: > Please find attach a patch that takes your idea slightly further. We > find perhaps zero-extended UNSPEC_TP, and copy it for further use. At > its place, we simply slap const0_rtx. We know that address to Is that safe? I mean, the address, even if offsetable, can have some immediate already (seems e.g. the offsettable_memref_p predicate just checks you can plus_constant some small integer and be recognized again) and if you turn the %gs: into a const0_rtx, it would fail next decompose. And when you already have the PLUS which has UNSPEC_TP as one of its arguments, replacing that PLUS with the other argument is IMHO very easy. Perhaps you are right that there is no need to copy_rtx, supposedly the rtx shouldn't be shared with anything and thus can be modified in place. If -mx32 is a non-issue here, then perhaps my initial patch is good enough? > Index: config/i386/i386.c > === > --- config/i386/i386.c(revision 224292) > +++ config/i386/i386.c(working copy) > @@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[]) >Do an lea to the last part and use only one colliding move. */ >else if (collisions > 1) > { > - rtx base; > + rtx base, addr, tls_base = NULL_RTX; > > collisions = 1; > > @@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[]) > if (GET_MODE (base) != Pmode) > base = gen_rtx_REG (Pmode, REGNO (base)); > > - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); > + addr = XEXP (part[1][0], 0); > + if (TARGET_TLS_DIRECT_SEG_REFS) > + { > + struct ix86_address parts; > + int ok = ix86_decompose_address (addr, &parts); > + gcc_assert (ok); > + if (parts.seg != SEG_DEFAULT) > + { > + /* It is not valid to use %gs: or %fs: in > + lea though, so we need to remove it from the > + address used for lea and add it to each individual > + memory loads instead. */ > + rtx *x = &addr; > + while (GET_CODE (*x) == PLUS) > +{ > + for (i = 0; i < 2; i++) > + { > + rtx op = XEXP (*x, i); > + if ((GET_CODE (op) == UNSPEC > + && XINT (op, 1) == UNSPEC_TP) > + || (GET_CODE (op) == ZERO_EXTEND > + && GET_CODE (XEXP (op, 0)) == UNSPEC > + && (XINT (XEXP (op, 0), 1) > + == UNSPEC_TP))) > + { > + tls_base = XEXP (*x, i); > + XEXP (*x, i) = const0_rtx; > + break; > + } > + } > + > + if (tls_base) > + break; > + x = &XEXP (*x, 0); > + } > + gcc_assert (tls_base); > + } > + } > + emit_insn (gen_rtx_SET (base, addr)); > + if (tls_base) > + base = gen_rtx_PLUS (GET_MODE (base), base, tls_base); > part[1][0] = replace_equiv_address (part[1][0], base); > for (i = 1; i < nparts; i++) > { > + if (tls_base) > + base = copy_rtx (base); > tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i); > part[1][i] = replace_equiv_address (part[1][i], tmp); > } Jakub
Re: conditional lim
On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener wrote: > On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova > wrote: >> Hi Richard, >> >> Here is some explanation. I hope you let me know if I need to clarify >> something. >> >> Also, you asked me about concrete example, to make sure you don’t miss >> my answer here is the link: >> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. >> >> Also, I doubt whether it’s convenient for you to create a build with >> my patch or not. May be to clarify things you could send me some >> examples/concrete cases, then I’ll compile them with >> –fdump-tree-loopinit-details and –fdump-tree-lim-details and send you >> these dumps. May be these dumps will be useful. (I’ll only disable >> cleanup_cfg TODO after lim to let you know the exact picture after >> lim). >> >> What do you think? >> >> 1. invariantness _dom_walker – >> >> 1.1 for each GIMPLE_COND in given bb calls handle_cond_stmt to call >> for true and false edges handle_branch_edge, which calls SET_TARGET_OF >> for all bb ‘predicated’ by given GIMPLE_COND. >> >> SET_TARGET_OF sets in basic_blocks aux 2 facts: >> >> a) this is true or false edge; >> >> b) link to cond stmt; >> >> Handle_branch_edge works this way: >> >> If (cond1) >> >> { >> >> bb1; >> >> if (cond2} >> >>{ >> >>bb2; >> >> } >> >>Being called for cond1, it sets cond1 as condition for both bb1 and >> bb2 (the whole branch for cond1, ie also for bb containing cond2), >> then this method will be called (as there is dominance order) for >> cond2 to correct things (ie to set cond2 as condition for bb2). > > Hmm, why not track the current condition as state during the DOM walk > and thus avoid processing more than one basic-block in handle_branch_edge? > Thus get rid of handle_branch_edge and instead do everything in > handle_cond_stmt > plus the dom-walkers BB visitor? > I need to look more carefully how to implement it, but I think I understand what you mean and this optimization of course looks reasonable to me. Will do. > I see you don't handle BBs with multiple predecessors - that's ok, but > are you sure you don't run into correctness issues when not marking such > BBs as predicated? This misses handling of, say > > if (a || b) >bb; > > which is a pity (but can be fixed later if desired). > I had some test (in gcc testsuite or bootstrap build) which worked incorrectly because of multiple predecessors. As far as I remember the situation was (further, will make some notes about such tests to clarify this better), I mean with previous version of my code which handled bb with 2 predecessors: if (a) tmpvar=something; while() if (a || b) basic_block {do something with tmpvar;} // I mean basic block predicated by bb with a and bb with b So, if a is false, I mean we didn't do tmpvar=something (outside loop), BUT we are in basick_block (we went by bb with b), we got Segmentation falt in basic_block {do something with tmpvar;}. I think we can reproduce all the details of this test if I remove not handling bb with 2 predecessors. So I wouldn't move bb with 2 predecessors (this is not always executed bb in any loop, not conditionally, they will not be moved at all). This is my more detail explanation on this point. Perhaps, I didn't understand your question about correctness. Could you repeat it in other words (based on this new clarification). So I think according to current code it will not be moved. What incorrectness do you mean? 
> I note that collecting predicates has similarities to what if-conversion > does in tree-ifcvt.c (even if its implementation is completely different, > of course). > Ok, I'll look at this. But could you please clarify your point? (Should I just take this into account with low priority and look at this later or you want some refactoring?) >> 1.2 As 1.1 goes we identify whether some bb is predicated by some >> condition or not. >> >> bb->aux->type will be [TRUE/FALSE]_TARGET_OF and >> bb->aux->cond_stmt=cond stmt (the nearest condition). >> >> If bb is always executed bb->aux->type = ALWAYS_EXECUTED_IN, >> bb->loop->loop (this info was available in the clean build). >> >> 1.3 As this walker is called in dominance order, information about >> condition is available when invariantness_dom_walker is called for >> given bb. So we can make some computations based on bb->aux >> structure. This is done in check_conditionally_executed. The main goal >> of this method is to check that the root condition is always executed >> in the loop. I did so to avoid situation like this >> >> Loop: >> >>Jmp somewhere; >> >> If (cond1) >> >> If (cond2) >> >> Etc >> >> By ‘root condition’ I mean cond1 in this case (the first cond in >> dominance order). > > Ok, but this can't happen (an uncontional jump to somehwere). It > will always be a conditional jump (a loop exit) and thus should > be handled already. > > Did you have a testcase
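A minimal C sketch of the problematic shape described above (all identifiers hypothetical): the block inside the loop has two predecessors, so it must not be treated as predicated by a alone.

extern int *get_buf (void);
extern void use (int);

void
f (int a, int b, int n)
{
  int *tmpvar = 0;

  if (a)
    tmpvar = get_buf ();   /* hoisted computation, guarded by a only */

  for (int i = 0; i < n; i++)
    if (a || b)            /* block with two predecessors */
      use (tmpvar[i]);     /* reached via the b edge even when a is false,
                              so tmpvar is still null and this faults */
}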
Re: [PR64164] drop copyrename, integrate into expand
On Jun 9, 2015, Alexandre Oliva wrote: > On Jun 9, 2015, David Edelsohn wrote: >> This also broke bootstrap on PPC64 LE Linux with the same error. > Thanks for your reports. I'm looking into the problem. > I'd appreciate a preprocessed testcase from either of you to confirm the > fix, if not to help debug it. The first potential source for this problem that jumped at me would be silenced with this change: diff --git a/gcc/function.c b/gcc/function.c index 8bcc352..9201ed9 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -2974,7 +2974,8 @@ assign_parm_setup_block (struct assign_parm_data_all *all, stack_parm = copy_rtx (stack_parm); if (GET_MODE_SIZE (GET_MODE (entry_parm)) == size) PUT_MODE (stack_parm, GET_MODE (entry_parm)); - set_mem_attributes (stack_parm, parm, 1); + if (GET_CODE (stack_parm) == MEM) + set_mem_attributes (stack_parm, parm, 1); } /* If a BLKmode arrives in registers, copy it to a stack slot. Handle but I suspect there might be other similar issues lurking in function.c after my attempt to turn parm assignment upside down ;-) (namely, it used to assume it could pick stack slots and pseudos in a whim, but after this change it must give way to out-of-SSA's partition assignments.) I'll look into cross-building some embedded targets and see if any further issues surface. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [gomp4.1] Initial support for some OpenMP 4.1 construct parsing
On Tue, Jun 09, 2015 at 09:36:08PM +0300, Ilya Verbin wrote: > On Wed, Apr 29, 2015 at 14:06:44 +0200, Jakub Jelinek wrote: > > [...] The draft requires only alloc or to > > (or always, variants) for enter data and only from or delete (or always, > > variants) for exit data, so in theory it is possible to figure that from > > the call without extra args, but not so for update - enter data is supposed > > to increment reference counts, exit data decrement. [...] > > TR3.pdf also says about 'release' map-type for exit data, but it is not > described in the document. Seems that is the case even in the latest draft. Can you file a ticket for this or discuss on omp-lang? Or should I? > > [...] And, we'll need new > > arguments for async (pass through info that it is async (nowait) or sync, > > and the depend clause address(es)). > > I don't quite understand from "If a depend clause is present, then it is > treated > as if it had appeared on the implicit task construct that encloses the target > construct", is > > #pragma omp target depend(inout: x) > > equivalent to > > #pragma omp task depend(inout: x) > #pragma omp target > > or not? > > In other words, can't we just generate GOMP_task (...) with GOMP_target (...) > inside, without any new arguments? No, that would be an explicit task. Furthermore, the implicit task isn't on the host side, but on the offloading device side. The implicit task is what is executed when you enter the #pragma omp target. Ignoring the teams construct which is there mainly for NVidia GPGPUs, when you enter the #pragma omp target construct, there is an implicit parallel with num_threads(1) (like there is an implicit parallel with num_threads(1) when you enter main () of a host program), and that implicit parallel has a single implicit task, which executes the statements inside of #pragma omp target body, until you encounter #pragma omp teams or #pragma omp parallel. And the above statement simply says that no statements from the #pragma omp target body are executed until the depend dependency is satisfied. Whether these dependencies are host addresses, or offloading device addresses, is something that really needs to be figured out, I admit I haven't read the whole async offloading text carefully yet, nor participated in the telecons about it. Jakub
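To make the distinction concrete, a sketch of the two forms being contrasted (OpenMP 4.1 draft syntax; the map clauses are added here only to make the example self-contained):

int x;

void
example (void)
{
  /* The depend clause applies to the implicit task that executes the
     target region: nothing in the region runs until the dependence on
     x is satisfied, but no explicit host task is created.  */
  #pragma omp target map(tofrom: x) depend(inout: x)
  x++;

  /* An explicit host task wrapping the target region; as explained
     above, this is not equivalent to the form above.  */
  #pragma omp task depend(inout: x)
  {
    #pragma omp target map(tofrom: x)
    x++;
  }
}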
Re: [PR64164] drop copyrename, integrate into expand
On Tue, Jun 09, 2015 at 05:11:45PM -0300, Alexandre Oliva wrote: > On Jun 9, 2015, Alexandre Oliva wrote: > > > On Jun 9, 2015, David Edelsohn wrote: > >> This also broke bootstrap on PPC64 LE Linux with the same error. > > > Thanks for your reports. I'm looking into the problem. > > > I'd appreciate a preprocessed testcase from either of you to confirm the > > fix, if not to help debug it. > > The first potential source for this problem that jumped at me would be > silenced with this change: > > diff --git a/gcc/function.c b/gcc/function.c > index 8bcc352..9201ed9 100644 > --- a/gcc/function.c > +++ b/gcc/function.c > @@ -2974,7 +2974,8 @@ assign_parm_setup_block (struct assign_parm_data_all > *all, > stack_parm = copy_rtx (stack_parm); >if (GET_MODE_SIZE (GET_MODE (entry_parm)) == size) > PUT_MODE (stack_parm, GET_MODE (entry_parm)); > - set_mem_attributes (stack_parm, parm, 1); > + if (GET_CODE (stack_parm) == MEM) FYI, this is preferrably if (MEM_P (stack_parm)) these days. > + set_mem_attributes (stack_parm, parm, 1); > } Jakub
Re: [patch] [jit] Install headers using INSTALL_DATA
On Tue, 2015-06-09 at 18:19 +0200, Matthias Klose wrote: > Install the gcc/jit header files without the x bit set. Ok for the trunk and > the > 5 branch? OK.
Re: [PR64164] drop copyrename, integrate into expand
> I'll look into cross-building some embedded targets and see if any > further issues surface. SPARC is also broken, see my message and the testcase under the PR. -- Eric Botcazou
Re: [PR64164] drop copyrename, integrate into expand
Alex, I sent you a pre-processed file off-list. You could try bootstrapping on PPC64 on the GCC Compile Farm. The SPARC failure reports a different error than PPC and ARM. The PPC and ARM failures are the same message, but seem to be on different files. After the breakage from Aldy's patch all weekend, this failure is very frustrating. If this is not fixed within 24 hours, your patch must be reverted. This patch clearly should have been tested on more architectures than x86 before being approved and merged. Thanks, David On Tue, Jun 9, 2015 at 5:27 PM, Eric Botcazou wrote: >> I'll look into cross-building some embedded targets and see if any >> further issues surface. > > SPARC is also broken, see my message and the tescase under the PR. > > -- > Eric Botcazou
[PATCH][PR debug/65549] Restore DW_AT_abstract_origin for cross-unit call sites
Hello, With the recent work for PR debug/65549, we observed a regression in the generated debugging information. Take the attached reproducer, build it and look at the output DWARF: $ gcc -c -O1 -g foo.c $ objdump --dwarf=info foo.o [...] <2><4e>: Abbrev Number: 3 (DW_TAG_GNU_call_site) <4f> DW_AT_low_pc : 0x9 <2><57>: Abbrev Number: 0 There is no DW_AT_abstract_origin attribute anymore, whereas there used to be one. On PowerPC with -mlongcall, for instance, call instructions are indirect and we (at AdaCore) rely on this attribute to determine what function is actually called (it's easier this way than interpreting machine code...). The DWARF proposal for call sites also say debuggers should be able to use this attribute (to build virtual call stacks in the presence of tail calls), but I could not observe this in practice with GDB. The attached patch is an attempt to restore this attribute. The logic behind it is: this is what we used to do previously for contexts that are alien translation unit (through the call to force_decl_die), so this should not harm. Bootstrapped and regtested on x86_64-linux. Ok to commit? Thank you in advance! gcc/ChangeLog * dwarf2out.c (lookup_context_die): Return the DIE for the current compilation unit when the input context is a TRANSLATION_UNIT_DECL. -- Pierre-Marie de Rodat extern int bar(int a); int foo(int a) { a += 1; return a; } int main(void) { bar(1); } >From b2ededa4a5eac77fc0d5d95011de935a1dff7eba Mon Sep 17 00:00:00 2001 From: Pierre-Marie de Rodat Date: Mon, 8 Jun 2015 16:27:04 +0200 Subject: [PATCH] DWARF: restore DW_AT_abstract_origin for cross-unit call sites gcc/ChangeLog * dwarf2out.c (lookup_context_die): Return the DIE for the current compilation unit when the input context is a TRANSLATION_UNIT_DECL. --- gcc/dwarf2out.c | 25 - 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index ee2bcb1..4e204df 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -21053,21 +21053,20 @@ is_naming_typedef_decl (const_tree decl) static inline dw_die_ref lookup_context_die (tree context) { - if (context) + if (context == NULL_TREE || TREE_CODE (context) == TRANSLATION_UNIT_DECL) +return comp_unit_die (); + + /* Find die that represents this context. */ + if (TYPE_P (context)) { - /* Find die that represents this context. */ - if (TYPE_P (context)) - { - context = TYPE_MAIN_VARIANT (context); - dw_die_ref ctx = lookup_type_die (context); - if (!ctx) - return NULL; - return strip_naming_typedef (context, ctx); - } - else - return lookup_decl_die (context); + context = TYPE_MAIN_VARIANT (context); + dw_die_ref ctx = lookup_type_die (context); + if (!ctx) + return NULL; + return strip_naming_typedef (context, ctx); } - return comp_unit_die (); + else +return lookup_decl_die (context); } /* Returns the DIE for a context. */ -- 1.7.10.4
Add extra C/Fortran interoperability testcase
Hi, I constructed this testcase as I anticipated a bogus warning. It does not happen because C_FUNPTR is function pointer and useless_type_conversion does not look into function signature, but it is still useful testcase to have. Once we start to do TBAA for pointers, we will end up in need to glob C_FUNPTR with void * to make it universal. regtested ppc64le-linux, will commit it shortly. Honza * gfortran.dg/lto/bind_c-5_0.f90: New testcase. * gfortran.dg/lto/bind_c-5_1.c: New testcase. Index: bind_c-5_0.f90 === --- bind_c-5_0.f90 (revision 0) +++ bind_c-5_0.f90 (working copy) @@ -0,0 +1,17 @@ +! { dg-lto-do run } +! { dg-lto-options {{ -O3 -flto }} } +! This testcase will abort if C_FUNPTR is not interoperable with both int * +! and float * +module lto_type_merge_test + use, intrinsic :: iso_c_binding + implicit none + + type(c_funptr), bind(c, name="myVar") :: myVar + type(c_funptr), bind(c, name="myVar2") :: myVar2 + +contains + subroutine types_test() bind(c) +myVar = myVar2 + end subroutine types_test +end module lto_type_merge_test + Index: bind_c-5_1.c === --- bind_c-5_1.c(revision 0) +++ bind_c-5_1.c(working copy) @@ -0,0 +1,31 @@ +#include +/* declared in the fortran module */ +extern int (*myVar) (int); +extern float (*myVar2) (float); +void types_test(void); + + +extern void abort(void); + +int main(int argc, char **argv) +{ + int (**myptr) (int); + float (**myptr2) (float); + asm("":"=r"(myptr):"0"(&myVar)); + asm("":"=r"(myptr2):"0"(&myVar2)); + *myptr = (int (*) (int)) (size_t) (void *)1; + *myptr2 = (float (*) (float)) (size_t) (void *)2; + types_test(); + if (*myptr != (int (*) (int)) (size_t) (void *)2) + abort (); + if (*myptr2 != (float (*) (float)) (size_t) (void *)2) + abort (); + *myptr2 = (float (*) (float)) (size_t) (void *)3; + types_test(); + if (*myptr != (int (*) (int)) (size_t) (void *)3) + abort (); + if (*myptr2 != (float (*) (float)) (size_t) (void *)3) + abort (); + return 0; +} +
[PATCH v2] xtensa: implement trap pattern
gcc/ * config/xtensa/xtensa.h (TARGET_DEBUG): New definition. * config/xtensa/xtensa.md (define_attr "type"): New type "trap". (define_insn "trap"): New definition. --- Changes v1->v2: - drop break.n, replace break 0, 0 with break 1, 15, coded breakpoint that transfers control to debugger if present. gcc/config/xtensa/xtensa.h | 1 + gcc/config/xtensa/xtensa.md | 15 ++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h index 011411c..584080b 100644 --- a/gcc/config/xtensa/xtensa.h +++ b/gcc/config/xtensa/xtensa.h @@ -67,6 +67,7 @@ extern unsigned xtensa_current_frame_size; #define TARGET_THREADPTR XCHAL_HAVE_THREADPTR #define TARGET_LOOPS XCHAL_HAVE_LOOPS #define TARGET_WINDOWED_ABI(XSHAL_ABI == XTHAL_ABI_WINDOWED) +#define TARGET_DEBUG XCHAL_HAVE_DEBUG #define TARGET_DEFAULT \ ((XCHAL_HAVE_L32R? 0 : MASK_CONST16) | \ diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index 6d84384..a577aa3 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -86,7 +86,7 @@ ;; Attributes. (define_attr "type" - "unknown,jump,call,load,store,move,arith,multi,nop,farith,fmadd,fconv,fload,fstore,mul16,mul32,div32,mac16,rsr,wsr,entry" + "unknown,jump,call,load,store,move,arith,multi,nop,farith,fmadd,fconv,fload,fstore,mul16,mul32,div32,mac16,rsr,wsr,entry,trap" (const_string "unknown")) (define_attr "mode" @@ -1764,6 +1764,19 @@ [(set_attr "length" "0") (set_attr "type" "nop")]) +(define_insn "trap" + [(trap_if (const_int 1) (const_int 0))] + "" +{ + if (TARGET_DEBUG) +return "break\t1, 15"; + else +return (TARGET_DENSITY ? "ill.n" : "ill"); +} + [(set_attr "type""trap") + (set_attr "mode""none") + (set_attr "length" "3")]) + ;; Setting up a frame pointer is tricky for Xtensa because GCC doesn't ;; know if a frame pointer is required until the reload pass, and ;; because there may be an incoming argument value in the hard frame -- 1.8.1.4
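For illustration, a minimal C function that exercises the new pattern; whether break 1, 15 or ill/ill.n is emitted depends on the configured core options, as above.

/* __builtin_trap expands through the "trap" insn pattern when the
   target provides one.  */
void
force_trap (void)
{
  __builtin_trap ();
}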
Re: [PING^2][PATCH][3/3][PR65460] Mark offloaded functions as parallelized
Hi Tom! On Tue, 9 Jun 2015 16:12:12 +0200, Tom de Vries wrote: > On 09/06/15 13:07, Richard Biener wrote: > > On Mon, 8 Jun 2015, Tom de Vries wrote: > > > >> On 17/04/15 12:08, Tom de Vries wrote: > >>> On 20-03-15 12:38, Tom de Vries wrote: > On 19-03-15 12:05, Tom de Vries wrote: > > On 18-03-15 18:22, Tom de Vries wrote: > >> this patch fixes PR65460. > >> > >> The patch marks offloaded functions as parallelized, which means the > >> parloops > >> pass no longer attempts to modify that function. > > Ok, but shouldn't it be set before calling add_new_function as > > add_new_function might run passes that wouldn't identify the > > function as parallelized? > > Hm, indeed sometimes add_new_function executes some passes itself, > besides queueing the function for further processing. I suppose the > existing settings of parallelized_function should be modified in a > similar way. (I took note of this small change of the trunk patch vs. the version present on gomp-4_0-branch, and will adapt the latter as required as part of the next merge.) > I'll bootstrap and reg-test attached two patches on x86_64, and commit > unless objections. > Mark offloaded functions as parallelized > > 2015-06-09 Tom de Vries > > PR tree-optimization/65460 > * omp-low.c (expand_omp_target): Set parallelized_function on > cgraph_node for child_fn. (Committed to trunk in r224303.) > --- a/gcc/omp-low.c > +++ b/gcc/omp-low.c > @@ -8959,6 +8959,8 @@ expand_omp_target (struct omp_region *region) > >/* Inform the callgraph about the new function. */ >DECL_STRUCT_FUNCTION (child_fn)->curr_properties = > cfun->curr_properties; > + cgraph_node *node = cgraph_node::get_create (child_fn); Would you please committ a fix: in offloading configurations (ENABLE_OFFLOADING), this new node variable will clash with an existing definition a little later: > + node->parallelized_function = 1; >cgraph_node::add_new_function (child_fn, true); > > #ifdef ENABLE_OFFLOADING |/* Add the new function to the offload table. */ |vec_safe_push (offload_funcs, child_fn); | #endif | |/* Fix the callgraph edges for child_cfun. Those for cfun will be |fixed in a following pass. */ |push_cfun (child_cfun); |cgraph_edge::rebuild_edges (); | | #ifdef ENABLE_OFFLOADING |/* Prevent IPA from removing child_fn as unreachable, since there are no |refs from the parent function to child_fn in offload LTO mode. */ |struct cgraph_node *node = cgraph_node::get (child_fn); |node->mark_force_output (); | #endif Grüße, Thomas signature.asc Description: PGP signature
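A sketch of one way to resolve the clash (not necessarily the fix that will be committed): obtain the node once and reuse it for both the parallelized_function flag and the later mark_force_output call.

  /* Inform the callgraph about the new function.  */
  DECL_STRUCT_FUNCTION (child_fn)->curr_properties = cfun->curr_properties;
  cgraph_node *node = cgraph_node::get_create (child_fn);
  node->parallelized_function = 1;
  cgraph_node::add_new_function (child_fn, true);

#ifdef ENABLE_OFFLOADING
  /* Add the new function to the offload table.  */
  vec_safe_push (offload_funcs, child_fn);
#endif

  /* ... push_cfun / rebuild_edges unchanged ... */

#ifdef ENABLE_OFFLOADING
  /* Prevent IPA from removing child_fn as unreachable; reuse NODE
     instead of redeclaring it here.  */
  node->mark_force_output ();
#endif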
[PATCH, Google] Notify df framework when removing an insn in simplify-got
Hi I forgot to notify df framework when I removed an insn, it caused df verification failure described in google bug b/16155462. The following patch passed regression test on arm qemu in both thumb and arm modes. OK for google 4.9 branch? Index: simplify-got.c === --- simplify-got.c (revision 224174) +++ simplify-got.c (working copy) @@ -169,7 +169,10 @@ /* Since there is no usage of pic_reg now, we can remove it. */ if (use) -remove_insn (use); +{ + df_insn_delete (use); + remove_insn (use); +} targetm.got_access.clear_pic_reg (); free (got_accesses); htab_delete (var_table);
Re: [PATCH, Google] Notify df framework when removing an insn in simplify-got
ok. David On Tue, Jun 9, 2015 at 4:46 PM, Carrot Wei wrote: > Hi > > I forgot to notify df framework when I removed an insn, it caused df > verification failure described in google bug b/16155462. > > The following patch passed regression test on arm qemu in both thumb > and arm modes. > OK for google 4.9 branch? > > > Index: simplify-got.c > === > --- simplify-got.c (revision 224174) > +++ simplify-got.c (working copy) > @@ -169,7 +169,10 @@ > >/* Since there is no usage of pic_reg now, we can remove it. */ >if (use) > -remove_insn (use); > +{ > + df_insn_delete (use); > + remove_insn (use); > +} >targetm.got_access.clear_pic_reg (); >free (got_accesses); >htab_delete (var_table);
Re: [PR64164] drop copyrename, integrate into expand
On Jun 5, 2015, Alexandre Oliva wrote: > On Apr 27, 2015, Richard Biener wrote: >>> +/* Return the promoted mode for name. If it is a named SSA_NAME, it >>> + is the same as promote_decl_mode. Otherwise, it is the promoted >>> + mode of a temp decl of same type as the SSA_NAME, if we had created >>> + one. */ >>> + >>> +machine_mode >>> +promote_ssa_mode (const_tree name, int *punsignedp) >>> +{ >>> + gcc_assert (TREE_CODE (name) == SSA_NAME); >>> + >>> + if (SSA_NAME_VAR (name)) >>> +return promote_decl_mode (SSA_NAME_VAR (name), punsignedp); >> As above I'd rather not have different paths for anonymous vs. non-anonymous >> vars (so just delete the above two lines). > Check This caused the sparc regression reported by Eric in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164#c37 We need to match the mode of the rtl created for the partition and the promoted mode expected for the parm. I recall working to make parm and result decls the partition leaders, so that promote_ssa_mode would DTRT, but this escaped my mind when revisiting the patch after some time on another project. So we either restore promote_ssa_mode's check for an underlying decl, at least for PARM_ and RESULT_DECLs, or further massage function.c to deal with the mode difference. Any preference? I'm reverting the patch for now, so that we don't have to rush to a fix on this, and I can have more time to test and fix other arches. It was a terrible mistake to not do so before submitting the final version of the patch, or at least before installing it. I apologize for that. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
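To make the first option concrete, a sketch (untested) of restoring the underlying-decl check in promote_ssa_mode only for parameters and results:

  /* Keep the decl-based promotion for SSA names backed by PARM_DECLs and
     RESULT_DECLs, so the partition's mode matches what function.c expects
     for incoming arguments and return values.  */
  if (SSA_NAME_VAR (name)
      && (TREE_CODE (SSA_NAME_VAR (name)) == PARM_DECL
	  || TREE_CODE (SSA_NAME_VAR (name)) == RESULT_DECL))
    return promote_decl_mode (SSA_NAME_VAR (name), punsignedp);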
Re: [PATCH] PR 66474, Document using %x for VSX registers on PowerPC
On Tue, Jun 9, 2015 at 3:26 PM, Michael Meissner wrote: > On Tue, Jun 09, 2015 at 02:17:19PM -0500, Segher Boessenkool wrote: >> On Tue, Jun 09, 2015 at 02:00:48PM -0400, Michael Meissner wrote: >> > +asm ("xvadddp %x0,%x1,%x2" "=wa" (v1) : "wa" (v2), "wa" (v3)); >> >> A colon went missing? ^^^ > > Yes, I will correct it when I check it in. Thanks. Mike, VSX registers are a superset of Altivec registers, so the statement about an Altivec register used where a VSX register is expected is a little confusing. How about: Otherwise the register number output in the assembly file will be incorrect if an Altivec register is an operand of a VSX instruction that expects VSX register numbering. Thanks, David
Re: [PATCH] Emit -Waddress warnings for comparing address of reference against NULL
On Sun, May 3, 2015 at 5:29 PM, Patrick Palka wrote: > On Sun, Apr 26, 2015 at 8:56 PM, Patrick Palka wrote: >> Here is an updated version of the patch with, hopefully, all your >> suggestions made. I decided to add calls to STRIP_NOPS before emitting >> the warning so that we properly warn for cases where there's a cast in >> between the whole thing, e.g. >> >> if (!&(int &)a) >> >> I also added guards to emit the warnings only when the stripped operand >> is actually a decl so that we don't pass a non-decl argument to >> warning_at() which can happen in a case like >> >> if (!&(int &)*(int *)0) >> >> Does this look OK now after testing? >> >> gcc/c-family/ChangeLog: >> >> PR c++/65168 >> * c-common.c (c_common_truthvalue_conversion): Warn when >> converting an address of a reference to a truth value. >> >> gcc/cp/ChangeLog: >> >> PR c++/65168 >> * typeck.c (cp_build_binary_op): Warn when comparing an address >> of a reference against NULL. >> >> gcc/testsuite/ChangeLog: >> >> PR c++/65168 >> g++.dg/warn/Walways-true-3.C: New test. >> --- >> gcc/c-family/c-common.c| 14 ++ >> gcc/cp/typeck.c| 34 ++ >> gcc/testsuite/g++.dg/warn/Walways-true-3.C | 45 >> ++ >> 3 files changed, 93 insertions(+) >> create mode 100644 gcc/testsuite/g++.dg/warn/Walways-true-3.C >> >> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c >> index 9797e17..c353c72 100644 >> --- a/gcc/c-family/c-common.c >> +++ b/gcc/c-family/c-common.c >> @@ -4806,6 +4806,20 @@ c_common_truthvalue_conversion (location_t location, >> tree expr) >> tree totype = TREE_TYPE (expr); >> tree fromtype = TREE_TYPE (TREE_OPERAND (expr, 0)); >> >> + if (POINTER_TYPE_P (totype) >> + && TREE_CODE (fromtype) == REFERENCE_TYPE) >> + { >> + tree inner = expr; >> + STRIP_NOPS (inner); >> + >> + if (DECL_P (inner)) >> + warning_at (location, >> + OPT_Waddress, >> + "the compiler can assume that the address of " >> + "%qD will always evaluate to %", >> + inner); >> + } >> + >> /* Don't cancel the effect of a CONVERT_EXPR from a REFERENCE_TYPE, >>since that affects how `default_conversion' will behave. 
*/ >> if (TREE_CODE (totype) == REFERENCE_TYPE >> diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c >> index 91db32a..13fb401 100644 >> --- a/gcc/cp/typeck.c >> +++ b/gcc/cp/typeck.c >> @@ -4423,6 +4423,23 @@ cp_build_binary_op (location_t location, >> warning (OPT_Waddress, "the address of %qD will never be >> NULL", >> TREE_OPERAND (op0, 0)); >> } >> + >> + if (CONVERT_EXPR_P (op0) >> + && TREE_CODE (TREE_TYPE (TREE_OPERAND (op0, 0))) >> +== REFERENCE_TYPE) >> + { >> + tree inner_op0 = op0; >> + STRIP_NOPS (inner_op0); >> + >> + if ((complain & tf_warning) >> + && c_inhibit_evaluation_warnings == 0 >> + && !TREE_NO_WARNING (op0) >> + && DECL_P (inner_op0)) >> + warning (OPT_Waddress, >> +"the compiler can assume that the address of " >> +"%qD will never be NULL", >> +inner_op0); >> + } >> } >>else if (((code1 == POINTER_TYPE || TYPE_PTRDATAMEM_P (type1)) >> && null_ptr_cst_p (op0)) >> @@ -4445,6 +4462,23 @@ cp_build_binary_op (location_t location, >> warning (OPT_Waddress, "the address of %qD will never be >> NULL", >> TREE_OPERAND (op1, 0)); >> } >> + >> + if (CONVERT_EXPR_P (op1) >> + && TREE_CODE (TREE_TYPE (TREE_OPERAND (op1, 0))) >> +== REFERENCE_TYPE) >> + { >> + tree inner_op1 = op1; >> + STRIP_NOPS (inner_op1); >> + >> + if ((complain & tf_warning) >> + && c_inhibit_evaluation_warnings == 0 >> + && !TREE_NO_WARNING (op1) >> + && DECL_P (inner_op1)) >> + warning (OPT_Waddress, >> +"the compiler can assume that the address of " >> +"%qD will never be NULL", >> +inner_op1); >> + } >> } >>else if ((code0 == POINTER_TYPE && code1 == POINTER_TYPE) >>|| (TYPE_PTRDATAMEM_P (type0) && TYPE_PTRDATAMEM_P (type1))) >> diff --git a/gcc/testsuite/g++.dg/warn/Walways-true-3.C >> b/gcc/testsuite/g++.dg/warn/Walways-true-3.C >> new file mode 100644 >> index 000..0d13d3f >> --- /dev/null >> +++ b/gcc/testsuite/g++.dg/warn/Walways-true-3.C >> @@ -0,0 +1,45 @@ >> +//
RE: backport the fixes of PR target/64011 and /61749 to 4.9 gcc
Updated patches were attached. Rebased on the latest 4.9 branch. Tested on aarch64-linux (big-endian and little-endian) with qemu. OK for 4.9? The first patch: Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 223867) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,11 @@ +2015-06-09 Xiangyu Wei + + Backport from mainline r219717: + 2015-01-15 Jiong Wang + PR rtl-optimization/64011 + * expmed.c (store_bit_field_using_insv): Warn and truncate bitsize when + there is partial overflow. + Index: gcc/expmed.c === --- gcc/expmed.c(revision 223867) +++ gcc/expmed.c(working copy) @@ -539,7 +539,21 @@ store_bit_field_using_insv (const extraction_insn xop0 = tem; copy_back = true; } - + + /* There are similar overflow check at the start of store_bit_field_1, + but that only check the situation where the field lies completely + outside the register, while there do have situation where the field + lies partialy in the register, we need to adjust bitsize for this + partial overflow situation. Without this fix, pr48335-2.c on big-endian + will broken on those arch support bit insert instruction, like arm, aarch64 + etc. */ + if (bitsize + bitnum > unit && bitnum < unit) +{ + warning (OPT_Wextra, "write of "HOST_WIDE_INT_PRINT_UNSIGNED"bit data " + "outside the bound of destination object, data truncated into " + HOST_WIDE_INT_PRINT_UNSIGNED"bit", bitsize, unit - bitnum); + bitsize = unit - bitnum; +} /* If BITS_BIG_ENDIAN is zero on a BYTES_BIG_ENDIAN machine, we count "backwards" from the size of the unit we are inserting into. Otherwise, we count bits from the most significant on a And the second one: Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 223867) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,28 @@ +2015-06-09 Xiangyu Wei + + Backport from mainline r215046: + 2014-09-09 Kyrylo Tkachov + PR target/61749 + * config/aarch64/aarch64-builtins.c (aarch64_types_quadop_qualifiers): + Use qualifier_immediate for last operand. Rename to... + (aarch64_types_ternop_lane_qualifiers): ... This. + (TYPES_QUADOP): Rename to... + (TYPES_TERNOP_LANE): ... This. + (aarch64_simd_expand_args): Return const0_rtx when encountering user + error. Change return of 0 to return of NULL_RTX. + (aarch64_crc32_expand_builtin): Likewise. + (aarch64_expand_builtin): Return NULL_RTX instead of 0. + ICE when expanding unknown builtin. + * config/aarch64/aarch64-simd-builtins.def (sqdmlal_lane): Use + TERNOP_LANE qualifiers. + (sqdmlsl_lane): Likewise. + (sqdmlal_laneq): Likewise. + (sqdmlsl_laneq): Likewise. + (sqdmlal2_lane): Likewise. + (sqdmlsl2_lane): Likewise. + (sqdmlal2_laneq): Likewise. + (sqdmlsl2_laneq): Likewise. + * gcc.target/aarch64/vqdml_lane_intrinsics-bad_1.c: New test. 
Index: gcc/config/aarch64/aarch64-builtins.c === --- gcc/config/aarch64/aarch64-builtins.c (revision 223867) +++ gcc/config/aarch64/aarch64-builtins.c (working copy) @@ -172,10 +172,10 @@ aarch64_types_ternopu_qualifiers[SIMD_MAX_BUILTIN_ #define TYPES_TERNOPU (aarch64_types_ternopu_qualifiers) static enum aarch64_type_qualifiers -aarch64_types_quadop_qualifiers[SIMD_MAX_BUILTIN_ARGS] +aarch64_types_ternop_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, - qualifier_none, qualifier_none }; -#define TYPES_QUADOP (aarch64_types_quadop_qualifiers) + qualifier_none, qualifier_immediate }; +#define TYPES_TERNOP_LANE (aarch64_types_ternop_lane_qualifiers) static enum aarch64_type_qualifiers aarch64_types_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -818,8 +818,11 @@ aarch64_simd_expand_args (rtx target, int icode, i case SIMD_ARG_CONSTANT: if (!(*insn_data[icode].operand[argc + have_retval].predicate) (op[argc], mode[argc])) - error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, " + { + error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, " "expected %", argc + 1); + return const0_rtx; + } break; case SIMD_ARG_STOP: @@ -886,7 +889,7 @@ aarch64_simd_expand_args (rtx target, int icode, i } if (!pat) -return 0; +return NULL_RTX; > -Original Message- > From: James Greenhalgh [mailto:james.greenha...@arm.com] > Sent: Thursday, May 28, 2015 9:58 PM > To: weixiangyu > Cc: gcc-patches@
Refine lto-symtab warnings
Hi, this patch makes warn_type_compatibility_p to use TYPE_CANONICAL rather than types_compatible_p to decide about warnings on type mismatches. types_compatible_p is way too precise. It insist on oprations among the types to be compatible. This is not necessary here as accesses are wrapped in MEM_REF that does the proper cast - we only want types to be compatible by their representaiton and by TBAA rules. Comparing TYPE_CANONICAL is still not completely fitting the bill, as it won't work for incomplete types (we can hit these in array declarations and return values of structures) and if we get TYPE_CANONICAl for pointers to not be fully globbed (I hope so), we will need to add exception these same way as get_alias_set does some exception. An option would be to simply compare alias set values. This however is bit too coarse as it won't get size mismatches nor complete/incomplete type incompatibilities, so I think it makes sense to simply build from TYPE_CANONICAL and implement whatever globbing is needed here. Even if we may need to implement similar globbing in get_alias_set and here, it is not necessarily the same: the definition of interoperability is not precisely following TBAA compatibility. Bootstrapped/regtested ppc64le-linux, OK? The testcase here is based on mixing signedness between C and Fortran units. It won't work unless the earlier patch is applied. But even if we get more discussion in the TYPE_UNSIGNED globbing, I think this patch makes sense by itself, possibly w/o a testcase. Honza * lto-symtab.c (warn_type_compatibility_p): Use TYPE_CANONICAL to compare memory representation equality. * gfortran.dg/lto/bind_c-6_0.f90: New testcase. * gfortran.dg/lto/bind_c-6_1.c: New testcase. Index: lto/lto-symtab.c === --- lto/lto-symtab.c(revision 224250) +++ lto/lto-symtab.c(working copy) @@ -199,13 +199,19 @@ /* Return non-zero if we want to output waring about T1 and T2. Return value is a bitmask of reasons of violation: - Bit 0 indicates that types are not compatible of memory layout. - Bot 1 indicates that types are not compatible because of C++ ODR rule. */ + Bit 0 indicates that types are not compatible in memory representation + or by TBAA. + Bit 1 indicates that types are not compatible because of C++ ODR rule. + We may have false positives for bit 0. We may also output more strict + warnings in mismatches between types originating from same compilation + units. For example, we may spot qualification differences for C. */ + static int warn_type_compatibility_p (tree prevailing_type, tree type) { int lev = 0; + /* C++ provide a robust way to check for type compatibility via the ODR rule. */ if (odr_or_derived_type_p (prevailing_type) && odr_or_derived_type_p (type) @@ -245,54 +251,48 @@ incomplete pointed-to types more aggressively here, ignoring mismatches in both field and tag names. It's difficult though to guarantee that this does not have side-effects on merging - more compatible types from other translation units though. */ + more compatible types from other translation units though. + For C programs doing so is however valid by WG14 + as response to DR#314 */ - /* We can tolerate differences in type qualification, the - qualification of the prevailing definition will prevail. - ??? In principle we might want to only warn for structurally - incompatible types here, but unless we have protective measures - for TBAA in place that would hide useful information. 
*/ prevailing_type = TYPE_MAIN_VARIANT (prevailing_type); type = TYPE_MAIN_VARIANT (type); - if (!types_compatible_p (prevailing_type, type)) + /* Check type compatibility by the canonical type machinery. This only works + for complete types. */ + if (COMPLETE_TYPE_P (type) && COMPLETE_TYPE_P (prevailing_type)) { - if (TREE_CODE (prevailing_type) == FUNCTION_TYPE - || TREE_CODE (type) == METHOD_TYPE) + /* For mixed C/non-C programs we only can warn if the types are incompatible +in their representation. +For example it is possible to declare the same variable as C_PTR in +Fortran unit and as float * in C unit, because C_PTR (represented as +void *) is defined to be interoperable. +It is however definitely possible to check more here than what we +do currently. */ + gcc_assert (TYPE_CANONICAL (prevailing_type) && TYPE_CANONICAL (type)); + if (TYPE_CANONICAL (prevailing_type) != TYPE_CANONICAL (type)) return 1 | lev; - if (COMPLETE_TYPE_P (type) && COMPLETE_TYPE_P (prevailing_type)) - return 1 | lev; + return lev; +} - /* If type is incomplete then avoid warnings in the cases -that TBAA handles just fine. */ + /* Incomplete structure is compatible with every other structure.
[PATCH] Move gen_* stubs from defaults.h to genflags
Hi, all. I noticed that defaults.h file contains stub generator functions which simply call gcc_unreachable. FWIW, Trevor added them to remove some conditional compilation which depends on HAVE_ macros (I mean something like r223624). Because we still have ~80 more such conditions in GCC, and probably some of them will be fixed in the same way, I propose a patch, which allows to generate required stubs in genflags. Bootstrapped and regtested on x86_64-unknown-linux-gnu, testing for full target list in progress. OK for trunk? -- Regards, Mikhail Maltsev gcc/ChangeLog: 2015-06-10 Mikhail Maltsev * defaults.h (gen_simple_return, gen_return, gen_mem_thread_fence, gen_memory_barrier, gen_mem_signal_fence, gen_load_multiple, gen_store_multiple, gen_tablejump): Remove. Generate by genflags. (HAVE_*): Likewise. * genflags.c (struct stub_info_t): Define. (stubs): Define variable. (struct gflg_string_hasher): Define. (stubs_map): Define variable. (mark_as_done): New function. (gen_dummy): New function (move code out of gen_proto, generalize). (gen_proto): Use gen_dummy. (gen_insn): Call mark_as_done. (gen_stub): New function. (main): Initialize stubs_map, call gen_stubs for missing stubs. diff --git a/gcc/defaults.h b/gcc/defaults.h index 057b646..5beddea 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1426,96 +1426,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define TARGET_VTABLE_USES_DESCRIPTORS 0 #endif -#ifndef HAVE_simple_return -#define HAVE_simple_return 0 -static inline rtx -gen_simple_return () -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_return -#define HAVE_return 0 -static inline rtx -gen_return () -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_epilogue -#define HAVE_epilogue 0 -static inline rtx -gen_epilogue () -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_mem_thread_fence -#define HAVE_mem_thread_fence 0 -static inline rtx -gen_mem_thread_fence (rtx) -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_memory_barrier -#define HAVE_memory_barrier 0 -static inline rtx -gen_memory_barrier () -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_mem_signal_fence -#define HAVE_mem_signal_fence 0 -static inline rtx -gen_mem_signal_fence (rtx) -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_load_multiple -#define HAVE_load_multiple 0 -static inline rtx -gen_load_multiple (rtx, rtx, rtx) -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_store_multiple -#define HAVE_store_multiple 0 -static inline rtx -gen_store_multiple (rtx, rtx, rtx) -{ - gcc_unreachable (); - return NULL; -} -#endif - -#ifndef HAVE_tablejump -#define HAVE_tablejump 0 -static inline rtx -gen_tablejump (rtx, rtx) -{ - gcc_unreachable (); - return NULL; -} -#endif - #endif /* GCC_INSN_FLAGS_H */ #endif /* ! GCC_DEFAULTS_H */ diff --git a/gcc/genflags.c b/gcc/genflags.c index 0da15f1..2a70b56 100644 --- a/gcc/genflags.c +++ b/gcc/genflags.c @@ -26,10 +26,65 @@ along with GCC; see the file COPYING3. If not see #include "tm.h" #include "rtl.h" #include "obstack.h" +#include "hash-map.h" #include "errors.h" #include "read-md.h" #include "gensupport.h" +/* Structure which holds data, required for generating stub gen_* function. */ +struct stub_info_t +{ + /* Instruction name. */ + const char *name; + /* Number of arguments (i.e., instruction operands). */ + int opno; + /* Set to true when generator is output, so no stub is needed. 
*/ + bool done; +}; + +/* These instructions require default stub function. Stubs are never called. + They allow to reduce the amount of conditionally-compiled code: if a stub is + defined there is no need to guard the call by #ifdef (plain if statement can + be used instead). */ + +#define DEF_STUB(name, opno) { name, opno, false } + +static stub_info_t stubs[] = { +DEF_STUB ("simple_return", 0), +DEF_STUB ("return", 0), +DEF_STUB ("epilogue", 0), +DEF_STUB ("mem_thread_fence",1), +DEF_STUB ("memory_barrier", 0), +DEF_STUB ("mem_signal_fence",1), +DEF_STUB ("load_multiple", 3), +DEF_STUB ("store_multiple", 3), +DEF_STUB ("tablejump", 2) +}; + +#undef DEF_STUB + +/* Helper traits for using null-terminated strings as keys in hash map. + FIXME: Unify various "string hashers" and move them to hash-map-traits.h. */ +struct gflg_string_hasher : default_hashmap_traits +{ + typedef const char *value_type; + typedef const char *compare_type; + + static inline hashval_t hash (const char *s) + { +return htab_hash_string (s); + } + + static inline bool equal_keys (const char *p1, const char *p2) + { +return strcmp (p1, p2) == 0; + } +}; + +/* Mapping from insn nam
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
On Tue, Jun 9, 2015 at 9:30 PM, Jakub Jelinek wrote: > On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote: >> Please find attach a patch that takes your idea slightly further. We >> find perhaps zero-extended UNSPEC_TP, and copy it for further use. At >> its place, we simply slap const0_rtx. We know that address to > > Is that safe? I mean, the address, even if offsetable, can have some > immediate already (seems e.g. the offsettable_memref_p predicate just checks > you can plus_constant some small integer and be recognized again) and if you > turn the %gs: into a const0_rtx, it would fail next decompose. > And when you already have the PLUS which has UNSPEC_TP as one of its > arguments, replacing that PLUS with the other argument is IMHO very easy. > Perhaps you are right that there is no need to copy_rtx, supposedly > the rtx shouldn't be shared with anything and thus can be modified in place. Hm, you are right. I was under the impression that decompose_address can handle multiple CONST_INT addends, which is unfortunately not the case. > If -mx32 is a non-issue here, then perhaps my initial patch is good enough? It looks to me that if you detect and record the zero-extended UNSPEC_TP, your original patch would also handle -mx32. Can you please repost your original patch with the above addition? Thanks, Uros.
Re: [PATCH] Fix ix86_split_long_move collision handling with TLS (PR target/66470)
Uros Bizjak writes: > On Tue, Jun 9, 2015 at 9:30 PM, Jakub Jelinek wrote: >> On Tue, Jun 09, 2015 at 08:09:28PM +0200, Uros Bizjak wrote: >>> Please find attach a patch that takes your idea slightly further. We >>> find perhaps zero-extended UNSPEC_TP, and copy it for further use. At >>> its place, we simply slap const0_rtx. We know that address to >> >> Is that safe? I mean, the address, even if offsetable, can have some >> immediate already (seems e.g. the offsettable_memref_p predicate just checks >> you can plus_constant some small integer and be recognized again) and if you >> turn the %gs: into a const0_rtx, it would fail next decompose. >> And when you already have the PLUS which has UNSPEC_TP as one of its >> arguments, replacing that PLUS with the other argument is IMHO very easy. >> Perhaps you are right that there is no need to copy_rtx, supposedly >> the rtx shouldn't be shared with anything and thus can be modified in place. > > Hm, you are right. I was under impression that decompose_address can > handle multiple CONST_INT addends, which is unfortunatelly not the > case. That's in some ways a feature though. I don't think we want to support multiple offsets, since that implies having more than one representation for the same address. Thanks, Richard
Re: [PATCH, Google] Notify df framework when removing an insn in simplify-got
Carrot Wei writes: > Index: simplify-got.c > === > --- simplify-got.c (revision 224174) > +++ simplify-got.c (working copy) > @@ -169,7 +169,10 @@ > >/* Since there is no usage of pic_reg now, we can remove it. */ >if (use) > -remove_insn (use); > +{ > + df_insn_delete (use); > + remove_insn (use); > +} >targetm.got_access.clear_pic_reg (); >free (got_accesses); >htab_delete (var_table); Why not just use delete_insn ()? Thanks, Richard
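A sketch of the suggested alternative, assuming delete_insn's usual behaviour of handling the df bookkeeping itself:

  /* Since there is no usage of pic_reg now, we can remove it.  */
  if (use)
    delete_insn (use);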
Re: [PATCH][PR debug/65549] Restore DW_AT_abstract_origin for cross-unit call sites
On Tue, 9 Jun 2015, Pierre-Marie de Rodat wrote: > Hello, > > With the recent work for PR debug/65549, we observed a regression in the > generated debugging information. Take the attached reproducer, build it and > look at the output DWARF: > > $ gcc -c -O1 -g foo.c > $ objdump --dwarf=info foo.o > [...] > <2><4e>: Abbrev Number: 3 (DW_TAG_GNU_call_site) > <4f> DW_AT_low_pc : 0x9 > <2><57>: Abbrev Number: 0 > > There is no DW_AT_abstract_origin attribute anymore, whereas there used to be > one. > > On PowerPC with -mlongcall, for instance, call instructions are indirect and > we (at AdaCore) rely on this attribute to determine what function is actually > called (it's easier this way than interpreting machine code...). The DWARF > proposal for call sites also say debuggers should be able to use this > attribute (to build virtual call stacks in the presence of tail calls), but I > could not observe this in practice with GDB. > > The attached patch is an attempt to restore this attribute. The logic behind > it is: this is what we used to do previously for contexts that are alien > translation unit (through the call to force_decl_die), so this should not > harm. > > Bootstrapped and regtested on x86_64-linux. Ok to commit? Hmm, so the underlying issue is that we don't associate comp_unit_die () with any TRANSLATION_UNIT_DECL. For the LTO early debug work I did the following at the very beginning of dwarf2out_early_finish: /* Pick the first TRANSLATION_UNIT_DECL we didn't create a DIE for and equate it with our default CU DIE. LTO output needs to be able to lookup DIEs for translation unit decls. */ unsigned i; tree decl; FOR_EACH_VEC_SAFE_ELT (all_translation_units, i, decl) if (!lookup_decl_die (decl)) equate_decl_number_to_die (decl, comp_unit_die ()); We create some DIEs for builtin types too early before frontends got a chance to build their TRANSLATION_UNIT_DECL, so comp_unit_die gets created before there is any TRANSLATION_UNIT_DECL. Another approach would of course be that the Frontends register their main TRANSLATION_UNIT_DECL with dwarf2out via a debug hook. But - does the above work for you and fix the regression? If so adding a debug hook would probably be the cleaner solution still. Thanks, Richard. > Thank you in advance! > > gcc/ChangeLog > * dwarf2out.c (lookup_context_die): Return the DIE for the > current compilation unit when the input context is a > TRANSLATION_UNIT_DECL. > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)