Re: [PATCH 1/2] Add missing page rounding of a page_entry
On Wed, Oct 19, 2011 at 08:40:07AM +0200, Andi Kleen wrote: > diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c > index 2da99db..ba88e3f 100644 > --- a/gcc/ggc-page.c > +++ b/gcc/ggc-page.c > @@ -736,6 +736,7 @@ alloc_page (unsigned order) >entry_size = num_objects * OBJECT_SIZE (order); >if (entry_size < G.pagesize) > entry_size = G.pagesize; > + entry_size = ROUND_UP (entry_size, G.pagesize); Isn't the "if (entry_size < G.pagesize) entry_size = G.pagesize;" above this now redundant? I'm fairly sure we never call this with zero num_objects or zero OBJECT_SIZE (order) and for anything else ROUND_UP should round < pagesize to pagesize, right? Jakub
[PATCH, PR50769] Fix ICE in phi_alternatives_equal.
Richard, For the example from the PR, -ftree-tail-merge discovers that blocks 6 and 7 are equal, removes block 7 and redirects the outgoing edge of it's predecessor block 11 to block 6. Block 6 contains a phi, which after redirecting the edge from block 11 looks like this: ... .MEM_54 = PHI <.MEM_27(18), .MEM_26(17), (11)> ... Since update_vops is false, TODO_update_ssa_only_virtuals will be run and -ftree-tail-merge doesn't update the phi. However during cleanup_tree_cfg (which is done before updating the ssa), phi_alternatives_equal is called which asserts that the phi arguments are defined. The patch fixes the ICE by setting the empty phi argument to .MEM. bootstrapped and reg-tested on x86_64. Ok for trunk? Thanks, - Tom 2011-10-19 Tom de Vries PR 50769/tree-optimization * tree-ssa-tail-merge.c (replace_block_by): Calculate phi_vuse2 unconditionally. Handle case that phi_vuse2 is not an SSA_NAME. Add dummy argument .MEM to phi when increasing number of arguments of phi by redirecting edges to the block with phi. Index: gcc/tree-ssa-tail-merge.c === --- gcc/tree-ssa-tail-merge.c (revision 179773) +++ gcc/tree-ssa-tail-merge.c (working copy) @@ -1470,11 +1470,14 @@ replace_block_by (basic_block bb1, basic edge e; edge_iterator ei; + phi_vuse2 = vop_at_entry (bb2); + if (phi_vuse2 != NULL_TREE && TREE_CODE (phi_vuse2) != SSA_NAME) +phi_vuse2 = NULL_TREE; + if (update_vops) { /* Find the vops at entry of bb1 and bb2. */ phi_vuse1 = vop_at_entry (bb1); - phi_vuse2 = vop_at_entry (bb2); /* If one of the 2 not found, it means there's no need to update. */ update_vops = phi_vuse1 != NULL_TREE && phi_vuse2 != NULL_TREE; @@ -1507,6 +1510,9 @@ replace_block_by (basic_block bb1, basic gcc_assert (pred_edge != NULL); if (update_vops) VEC_safe_push (edge, heap, redirected_edges, pred_edge); + else if (phi_vuse2 && gimple_bb (SSA_NAME_DEF_STMT (phi_vuse2)) == bb2) + add_phi_arg (SSA_NAME_DEF_STMT (phi_vuse2), SSA_NAME_VAR (phi_vuse2), + pred_edge, UNKNOWN_LOCATION); } /* Update the vops. */
Re: [4.6] Fix type of SRAed enum accesses
On Tue, Oct 18, 2011 at 2:21 PM, Jakub Jelinek wrote: > On Tue, Sep 27, 2011 at 03:26:03PM +0100, Richard Sandiford wrote: >> This patch fixes a miscompilation of stage1 c-parser.o in an ARM bootstrap. >> When an access to an enum field was SRAed, a component ref used the type >> of the integer temporary variable instead of the type of the enum. >> It therefore didn't alias other accesses to the same structure, >> and was scheduled after a copy-load. >> >> Tested on x86_64-linux-gnu, and by verifying that c-parser.o is correctly >> compiled for ARM after the patch. Martin says he's going to test on ia64 >> too (thanks) -- I'll add 50326 to the changelog if that goes OK. >> >> OK to install if there are no regressions on ia64? > > As mentioned in bugzilla, the patch that caused this regression has been > committed to 4.6 branch as well. > > Here is a backport of that patch to 4.6 branch, bootstrapped/regtested on > x86_64-linux and i686-linux and Matthias AFAIK bootstrapped it on ia64-linux > (where it previously failed to bootstrap because of this), regtest pending. > > Ok for 4.6? Ok. Richard. > 2011-10-18 Jakub Jelinek > > PR target/50350 > Backport from mainline > 2011-09-27 Richard Sandiford > > PR middle-end/50386 > PR middle-end/50326 > * tree-sra.c (build_ref_for_model): Use the type of the field as > the type of the COMPONENT_REF. > > --- gcc/tree-sra.c.jj 2011-09-30 17:19:10.0 +0200 > +++ gcc/tree-sra.c 2011-10-18 08:01:28.682647833 +0200 > @@ -1448,12 +1448,12 @@ build_ref_for_model (location_t loc, tre > { > if (TREE_CODE (model->expr) == COMPONENT_REF) > { > - tree t, exp_type; > - offset -= int_bit_position (TREE_OPERAND (model->expr, 1)); > + tree t, exp_type, fld = TREE_OPERAND (model->expr, 1); > + offset -= int_bit_position (fld); > exp_type = TREE_TYPE (TREE_OPERAND (model->expr, 0)); > t = build_ref_for_offset (loc, base, offset, exp_type, gsi, > insert_after); > - return fold_build3_loc (loc, COMPONENT_REF, model->type, t, > - TREE_OPERAND (model->expr, 1), NULL_TREE); > + return fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (fld), t, fld, > + NULL_TREE); > } > else > return build_ref_for_offset (loc, base, offset, model->type, > > Jakub >
Re: [patch testsuite]: Adjust tree-ssa/builtin-expect-*.c tests for high cost-branching optimization
On Tue, Oct 18, 2011 at 1:33 PM, Kai Tietz wrote: > Hello, > > this patch adjusts __builtin_expect tests in tree-ssa so, that short-circuit > branch optimization don't lead to fallout. I've used here a multiplication, > as simple substraction/addition might get optimized away. > > ChangeLog > > 2011-10-18 Kai Tietz > > * gcc.dg/tree-ssa/builtin-expect-1.c: Adjust test. > * gcc.dg/tree-ssa/builtin-expect-2.c: Adjust test. > * gcc.dg/tree-ssa/builtin-expect-3.c: Adjust test. > * gcc.dg/tree-ssa/builtin-expect-4.c: Adjust test. > * gcc.dg/tree-ssa/builtin-expect-5.c: Adjust test. > > > Regression tested for x86_64-unknown-linux-gnu. Ok for apply? Ok. Thanks, Richard. > Regards, > Kai > > Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c > === > --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c > +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -fdump-tree-gimple" } */ > > -f (int i, float j) > +f (int i, float j, int i2, float j2) > { > - if (__builtin_expect (i > 0 && j, 0)) > + if (__builtin_expect ((i * i2) > 0 && (j * j2), 0)) > g (); > } > > Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c > === > --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c > +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -fdump-tree-gimple" } */ > > -f (int i, float j) > +f (int i, float j, int i2, float j2) > { > - if (__builtin_expect (i > 0 || j, 0)) > + if (__builtin_expect ((i * i2) > 0 || (j * j2), 0)) > ; > else > g (); > Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c > === > --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c > +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -fdump-tree-gimple" } */ > > -f (int i, float j) > +f (int i, float j, int i2, float j2) > { > - if (__builtin_expect (i > 0 && j, 0)) > + if (__builtin_expect ((i * i2) > 0 && (j * j2), 0)) > a (); > else > b (); > Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c > === > --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c > +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -fdump-tree-gimple" } */ > > -f (int i, float j) > +f (int i, float j, int i2, float j2) > { > - if (__builtin_expect (i > 0 || j, 0)) > + if (__builtin_expect ((i * i2) > 0 || (j * j2), 0)) > a (); > else > b (); > Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c > === > --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c > +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c > @@ -1,9 +1,9 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -fdump-tree-forwprop" } */ > > -f (int i, float j) > +f (int i, float j, int i2, float j2) > { > - if (__builtin_expect (i > 0 && __builtin_expect (j != 0, 1), 0)) > + if (__builtin_expect ((i * i2) > 0 && __builtin_expect ((j * j2) != 0, 1), > 0)) > a (); > else > b (); >
Re: [PATCH] Fix gcc.dg/guality/asm-1.c regressions on 4.6 branch
On Tue, Oct 18, 2011 at 5:37 PM, Jakub Jelinek wrote: > Hi! > > Apparently a recent change (2011-10-1[23] or so) regressed > gcc.dg/guality/asm-1.c test on x86_64-linux on the 4.6 branch. > This turned out to be a bug in loc_descriptor which has been passing wrong > mode on the recursive call, so when we had var_location note containing > SImode subreg of some DImode expression, the inner loc_descriptor would > just give up because of the mode mismatch. > > Fixed thusly (here the trunk version of the patch, attached 4.6 version), > bootstrapped/regtested on x86_64-linux and i686-linux (both branches in each > case), ok for trunk/4.6? Ok. Thanks, Richard. > 2011-10-18 Jakub Jelinek > > * dwarf2out.c (loc_descriptor): For SUBREG pass SUBREG_REG's mode as > second argument instead of mode. > > --- gcc/dwarf2out.c.jj 2011-09-08 11:21:10.0 +0200 > +++ gcc/dwarf2out.c 2011-10-18 14:09:46.0 +0200 > @@ -12502,7 +12502,8 @@ loc_descriptor (rtx rtl, enum machine_mo > legitimate to make the Dwarf info refer to the whole register which > contains the given subreg. */ > if (REG_P (SUBREG_REG (rtl)) && subreg_lowpart_p (rtl)) > - loc_result = loc_descriptor (SUBREG_REG (rtl), mode, initialized); > + loc_result = loc_descriptor (SUBREG_REG (rtl), > + GET_MODE (SUBREG_REG (rtl)), > initialized); > else > goto do_default; > break; > > Jakub >
Re: [PATCH 2/2] Free large chunks in ggc
On Wed, Oct 19, 2011 at 08:40:08AM +0200, Andi Kleen wrote: > From: Andi Kleen > > This implements the freeing back of large chunks in the ggc madvise path > Richard Guenther asked for. This way on systems with limited > address space malloc() and other allocators still have > a chance to get back at some of the memory ggc freed. The > fragmented pages are still just given back, but the address space > stays allocated. > > I tried freeing only aligned 2MB areas to optimize for 2MB huge > pages, but the hit rate was quite low, so I switched to 1MB+ > unaligned areas. The target size is a param now. If the size to free is smaller than the quirk size, then it has the very undesirable effect that with using GC only you might run unnecessarily out of virtual address space, because it allocates pages in 2MB chunks, but if they are released in 1MB chunks, those released chunks will never be usable again for GC. Consider on 32-bit address space allocating 3GB of GC memory, then freeing stuff in every odd 1MB chunk of pages, then wanting to allocate through GC the 1.5GB back. IMHO we should munmap immediately in release_pages the > G.pagesize pages, those are not very likely to be reused anyway (and it had one in between ggc_collect cycle to be reused anyway), and for the == G.pagesize (the usual case, the only ones that are allocated in GGC_QUIRK_SIZE sets) we should note which page was the first one in the GGC_QUIRK_SIZE chunk and munmap exactly those 2MB starting at the first page only. Jakub
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Oct 19, 2011 at 1:58 AM, Maxim Kuvyrkov wrote: > On 29/09/2011, at 10:23 PM, Richard Guenther wrote: > >> On Thu, Sep 29, 2011 at 10:27 AM, Tom de Vries >> wrote: > >> + && SSA_NAME_IS_DEFAULT_DEF (expr) >> + && TREE_CODE (var) == PARM_DECL >> + && POINTER_TYPE_P (TREE_TYPE (var)) >> + && TYPE_USER_ALIGN (TREE_TYPE (TREE_TYPE (var >> + val = get_align_value (TYPE_ALIGN (TREE_TYPE (TREE_TYPE (var))), >> + TREE_TYPE (var), 0); >> >> I realize that the scope where this applies is pretty narrow (and thus >> bad fallout is quite unlikely). But I suppose we should document >> somewhere that GCC treats alignment attributes on parameters >> more strict than say, on assignments. > > Richard, the intention here is for a follow up patch to change "&& > TYPE_USER_ALIGN" to "&& (TYPE_USER_ALIGN || flag_assume_aligned_parameters)". > I understand that you will probably not like the idea of adding > flag_assume_aligned_parameters, but it wouldn't make such an option any less > valuable for experienced users of GCC. For some users this option will the > additional piece of rope to hang themselves; for others it will produce good > benefits of better performance. > > [Disclaimer: I've put significant amount of my time into investigation of > this problem and hate to see it all go to waste :-).] It's not wasted, but using it as part of a bad solution isn't good either ;) I do want a start to a more general solution that also will fix the ever present wrong-code bugs on strict-align targets with respect to excessive alignment we assume for them. This doesn't seem to be one, but instead it seems to introduce more inconsistencies which is very bad. What exact problems do you try to solve (I presume not the artificial testcase in the PR)? Thanks, Richard.
Weakref and cgraph part 1
Hi, this patch makes cgraph code to also handle wekrefs as long as their destination declaration is known. The cases where it is not still needs updating at lto-symtab I plan to do incrementally. The basic idea is to handle weakrefs as external declaration (so they are subject of unrechable code removal) and make cgraph code to output those that are still needed. I had this in my tree for a while, so hope it will not cause much surprises. Bootstrapped/regtested x86_64-linux & tested on Mozilla/libreoffice LTO Honza * cgraphunit.c (handle_alias_pairs): Also handle wekref with destination declared. (output_weakrefs): New function. * varpool.c (varpool_create_variable_alias): Handle external aliases. Index: cgraphunit.c === *** cgraphunit.c(revision 180053) --- cgraphunit.c(working copy) *** handle_alias_pairs (void) *** 1219,1225 for (i = 0; VEC_iterate (alias_pair, alias_pairs, i, p);) { if (TREE_CODE (p->decl) == FUNCTION_DECL - && !lookup_attribute ("weakref", DECL_ATTRIBUTES (p->decl)) && (target_node = cgraph_node_for_asm (p->target)) != NULL) { src_node = cgraph_get_node (p->decl); --- 1219,1224 *** handle_alias_pairs (void) *** 1231,1242 However for weakref we insist on EXTERNAL flag being set. See gcc.dg/attr-alias-5.c */ if (DECL_EXTERNAL (p->decl)) ! DECL_EXTERNAL (p->decl) = 0; cgraph_create_function_alias (p->decl, target_node->decl); VEC_unordered_remove (alias_pair, alias_pairs, i); } else if (TREE_CODE (p->decl) == VAR_DECL - && !lookup_attribute ("weakref", DECL_ATTRIBUTES (p->decl)) && (target_vnode = varpool_node_for_asm (p->target)) != NULL) { /* Normally EXTERNAL flag is used to mark external inlines, --- 1230,1241 However for weakref we insist on EXTERNAL flag being set. See gcc.dg/attr-alias-5.c */ if (DECL_EXTERNAL (p->decl)) ! DECL_EXTERNAL (p->decl) = lookup_attribute ("weakref", ! DECL_ATTRIBUTES (p->decl)) != NULL; cgraph_create_function_alias (p->decl, target_node->decl); VEC_unordered_remove (alias_pair, alias_pairs, i); } else if (TREE_CODE (p->decl) == VAR_DECL && (target_vnode = varpool_node_for_asm (p->target)) != NULL) { /* Normally EXTERNAL flag is used to mark external inlines, *** handle_alias_pairs (void) *** 1245,1251 However for weakref we insist on EXTERNAL flag being set. See gcc.dg/attr-alias-5.c */ if (DECL_EXTERNAL (p->decl)) ! DECL_EXTERNAL (p->decl) = 0; varpool_create_variable_alias (p->decl, target_vnode->decl); VEC_unordered_remove (alias_pair, alias_pairs, i); } --- 1244,1251 However for weakref we insist on EXTERNAL flag being set. See gcc.dg/attr-alias-5.c */ if (DECL_EXTERNAL (p->decl)) ! DECL_EXTERNAL (p->decl) = lookup_attribute ("weakref", ! DECL_ATTRIBUTES (p->decl)) != NULL; varpool_create_variable_alias (p->decl, target_vnode->decl); VEC_unordered_remove (alias_pair, alias_pairs, i); } *** ipa_passes (void) *** 2064,2069 --- 2064,2089 bitmap_obstack_release (NULL); } + /* Weakrefs may be associated to external decls and thus not output +at expansion time. Emit all neccesary aliases. */ + + void + output_weakrefs (void) + { + struct cgraph_node *node; + struct varpool_node *vnode; + for (node = cgraph_nodes; node; node = node->next) + if (node->alias && node->thunk.alias && DECL_EXTERNAL (node->decl) + && !TREE_ASM_WRITTEN (node->decl)) + assemble_alias (node->decl, + DECL_ASSEMBLER_NAME (node->thunk.alias)); + for (vnode = varpool_nodes; vnode; vnode = vnode->next) + if (vnode->alias && vnode->alias_of && DECL_EXTERNAL (vnode->decl) + && !TREE_ASM_WRITTEN (vnode->decl)) + assemble_alias (vnode->decl, + DECL_ASSEMBLER_NAME (vnode->alias_of)); + } + /* Perform simple optimizations based on callgraph. */ *** cgraph_optimize (void) *** 2150,2155 --- 2170,2177 varpool_assemble_pending_decls (); } + + output_weakrefs (); cgraph_process_new_functions (); cgraph_state = CGRAPH_STATE_FINISHED; Index: varpool.c === *** varpool.c (revision 180052) --- varpool.c (working copy) *** varpool_create_variable_alias (tree alia *** 703,711 gcc_assert (TREE_CODE (alias) =
Re: patch ping: Dump the "degree of overlap" to compare static profile with instrumentation profile
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01293.html Hi, this seems fine for mainline. I am not 100% sure it is best choice of metric (i.e. mostly one is interested if hot blocks identified by static profile cover hot blocks of real profile), but we could get some experience with it. Thanks, Honza > > Thanks! > Dehao
Re: [PATCH] Add an intermediate coverage format for gcov
> On Oct 18, 2011, at 4:19 PM, Sharad Singhai wrote: > > Okay, I liked the idea of self-descriptive tags. I have updated the > > patch based on your suggestions. I have simplified the format > > somewhat. Instead of repeating function name, I use a 'function' tag > > with the format > > > > function:,, > > Sound nice. > > > I also dropped the unmangled function names, they were turning out to > > be too unreadable and not really useful in this context. > > Ah, I'd argue for mangled names. Every one knows they can stream through > c++filt and get unmangled, but once you unmangle, you can never go back. > Also, the mangled version is significantly smaller. For c, it is irrelevant, > for c++, it makes a big difference. I would also support unmangled variant. Otherwise the patch seems resonable to me. Honza > >
Re: [Patch,AVR]: PR50447: Tweak addhi3
2011/10/19 Georg-Johann Lay : > Georg-Johann Lay schrieb: >> >> Denis Chertykov schrieb: >> >>> What difference in size of avr-libc ? >> >> I have no tool for smart size analysis, so here is just a diff: >> >> After rebuilding avr-libc with respective compiler version, did >> respectively: >> >> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt >> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt >> >> and then >> >> $ diff -U 0 size-orig.txt size-patch.txt > size.diff >> >> As far as I can see, there is not a big gain but no object increases in >> size. >> >> For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o size gain is 3%. >> For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7% and for others just one >> instruction better. > > Actually there are some cases where the size increases by one instruction: > >> - 464 0 0 464 1d0 realloc.o (ex >> ./avr/lib/avr31/libc.a) >> + 466 0 0 466 1d2 realloc.o (ex >> ./avr/lib/avr31/libc.a) > >> - 464 0 0 464 1d0 realloc.o (ex >> ./avr/lib/avr3/libc.a) >> + 466 0 0 466 1d2 realloc.o (ex >> ./avr/lib/avr3/libc.a) > > Will have a look tomorrow; presumably it's adding +/-1 that force a scratch > whilst the old code does not. > However results are good. The patch can be committed. Denis.
Re: [PATCH][ARM] -m{cpu,tune,arch}=native
On 18/10/11 15:42, Andrew Stubbs wrote: Which still jumps to not_found without closing f. Hmmm, I know I fixed that, I know I did! But I appear to have lost the change somewhere when I updated my checkout? I'll fix it now. Fixed and committed as attached. Apologies for the cock-up. :( Andrew 2011-10-18 Andrew Stubbs * config/arm/driver-arm.c (host_detect_local_cpu): Close the file before exiting. --- src/gcc-mainline/gcc/config/arm/driver-arm.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/src/gcc-mainline/gcc/config/arm/driver-arm.c b/src/gcc-mainline/gcc/config/arm/driver-arm.c index 43b6e58..9a6762b 100644 --- a/src/gcc-mainline/gcc/config/arm/driver-arm.c +++ b/src/gcc-mainline/gcc/config/arm/driver-arm.c @@ -75,7 +75,7 @@ host_detect_local_cpu (int argc, const char **argv) { const char *val = NULL; char buf[128]; - FILE *f; + FILE *f = NULL; bool arch; const struct vendor_cpu *cpu_table = NULL; @@ -134,6 +134,10 @@ not_found: unsigned int i; unsigned int opt; const char *search[] = {NULL, "arch"}; + +if (f) + fclose (f); + search[0] = argv[0]; for (opt = 0; opt < ARRAY_SIZE (search); opt++) for (i = 0; i < ARRAY_SIZE (configure_default_options); i++)
[PATCH] Fix PR50768, rewrite gimplify_and_update_call_from_tree
So my previous patch has caused some bugs - not unreasonable given the complexity of that function. Thus the following patch makes it readable and understandable again, by exchanging the intricate one-stage processing with a two-stage one, first assigning VDEFs and then assigning VUSEs in a second stage. Also doing the lhs assignment processing upfront which removes the need for all the duplicate code at the tail. Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, let's hope it can stay this simple ;) Thanks, Richard. 2011-10-19 Richard Guenther PR middle-end/50768 * gimple-fold.c (gimplify_and_update_call_from_tree): Rewrite. * gcc.dg/torture/pr50768.c: New testcase. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 180166) --- gcc/gimple-fold.c (working copy) *** void *** 534,558 gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr) { tree lhs; - tree tmp = NULL_TREE; /* Silence warning. */ gimple stmt, new_stmt; gimple_stmt_iterator i; gimple_seq stmts = gimple_seq_alloc(); struct gimplify_ctx gctx; ! gimple last = NULL; ! gimple laststore = NULL; tree reaching_vuse; stmt = gsi_stmt (*si_p); gcc_assert (is_gimple_call (stmt)); - lhs = gimple_call_lhs (stmt); - reaching_vuse = gimple_vuse (stmt); - push_gimplify_context (&gctx); gctx.into_ssa = gimple_in_ssa_p (cfun); if (lhs == NULL_TREE) { gimplify_and_add (expr, &stmts); --- 534,555 gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr) { tree lhs; gimple stmt, new_stmt; gimple_stmt_iterator i; gimple_seq stmts = gimple_seq_alloc(); struct gimplify_ctx gctx; ! gimple last; ! gimple laststore; tree reaching_vuse; stmt = gsi_stmt (*si_p); gcc_assert (is_gimple_call (stmt)); push_gimplify_context (&gctx); gctx.into_ssa = gimple_in_ssa_p (cfun); + lhs = gimple_call_lhs (stmt); if (lhs == NULL_TREE) { gimplify_and_add (expr, &stmts); *** gimplify_and_update_call_from_tree (gimp *** 571,675 } } else ! tmp = get_initialized_tmp_var (expr, &stmts, NULL); pop_gimplify_context (NULL); if (gimple_has_location (stmt)) annotate_all_with_location (stmts, gimple_location (stmt)); ! /* The replacement can expose previously unreferenced variables. */ for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i)) { if (last) { gsi_insert_before (si_p, last, GSI_NEW_STMT); gsi_next (si_p); } new_stmt = gsi_stmt (i); if (gimple_in_ssa_p (cfun)) find_new_referenced_vars (new_stmt); /* If the new statement possibly has a VUSE, update it with exact SSA name we know will reach this one. */ if (gimple_has_mem_ops (new_stmt)) ! { ! /* If we've also seen a previous store create a new VDEF for !the latter one, and make that the new reaching VUSE. */ ! if (laststore) ! { ! reaching_vuse = make_ssa_name (gimple_vop (cfun), laststore); ! gimple_set_vdef (laststore, reaching_vuse); ! update_stmt (laststore); ! laststore = NULL; ! } ! gimple_set_vuse (new_stmt, reaching_vuse); ! gimple_set_modified (new_stmt, true); ! } ! if (gimple_assign_single_p (new_stmt) ! && !is_gimple_reg (gimple_assign_lhs (new_stmt))) ! { ! laststore = new_stmt; ! } last = new_stmt; } ! if (lhs == NULL_TREE) { ! /* If we replace a call without LHS that has a VDEF and our new ! sequence ends with a store we must make that store have the same !vdef in order not to break the sequencing. This can happen !for instance when folding memcpy calls into assignments. */ ! if (gimple_vdef (stmt) && laststore) ! { ! gimple_set_vdef (laststore, gimple_vdef (stmt)); ! if (TREE_CODE (gimple_vdef (stmt)) == SSA_NAME) ! SSA_NAME_DEF_STMT (gimple_vdef (stmt)) = laststore; ! update_stmt (laststore); ! } ! else if (gimple_in_ssa_p (cfun)) { unlink_stmt_vdef (stmt); ! release_defs (stmt); ! } ! new_stmt = last; ! } ! else ! { ! if (last) ! { ! gsi_insert_before (si_p, last, GSI_NEW_STMT); ! gsi_next (si_p); ! } ! if (laststore && is_gimple_reg (lhs)) ! { ! gimple_set_vdef (laststore, gimple_vdef (stmt)); ! update_stmt (laststore); ! if (TREE_CODE (gimple_vdef (stmt)) == SSA_NAME) ! SSA_NAME_DEF_STMT (gimple_vdef (stmt)) = laststore; ! laststore = NULL; ! } ! else if (laststore) ! { ! reaching_
Re: [trans-mem] Rename/split __transaction into __transaction_atomic and __transaction_relaxed.
On Tue, 2011-10-18 at 14:29 -0500, Aldy Hernandez wrote: > > Several ICEs in the TM tests on C++, but I think they are old. C TM > > tests work except some missing optimizations (old failures too). > > c-c++-common/tm/wrap-[12].c fail on C++, but not on C: > > wrap-1.c:5:57: error: 'transaction_wrap' argument not an identifier > > I believe this is an old error too? > > (I don't yet have a testcase summary of before the changes, so I can't > > compare properly. But will send an follow-up if there are regressions.) > > Can you run the tests before your patch to make sure there are no > regressions? That is, let's be sure that these are indeed old failures. Indeed old failures, no new regressions. Tested and boot-strapped on x86_64. OK for branch?
Re: [C++ Patch] __builtin_choose_expr
Andy Gibbs wrote on 18th October 2011 at 1:36 PM: > Paolo Carlini wrote on 18th October 2011 at 12:00 PM: >> I'm under the impression that some tests are written in C, wouldn't better >> fit in c-c++-common? > > Ok, I can do this but I think only 2 out of the 8 tests can be moved > across. I will then move the duplicates out of the gcc.dg folder too. Actually, I managed to get 3 common tests moved out of gcc.dg (pre-existing) and g++.dg/ext (from my original patch) into c-c++-common. > I will post a replacement patch later this afternoon. And here it is... Is it now ready to be committed to trunk? Regards, Andy = Changelog: 2011-10-19 Andy Gibbs * c-family/c-common.c: Enable __builtin_choose_expr * cp/cp-tree.def: Add tree node for handling __builtin_choose_expr within a template scenario * cp/cp-tree.h: Ditto * cp/cp-objcp-common.c: Ditto * cp/decl.c: Ditto * cp/parser.c (cp_parser_skip_to_closing_parenthesis): Remove the unnecessary requirement for 'recovering' flag to be set when 'or_comma' flag is set * cp/parser.c (cp_parser_skip_to_end_of_parameter): New function * cp/parser.c (cp_parser_builtin_choose_expr): New function * cp/parser.c (cp_parser_primary_expression): Use it * cp/semantics.c (potential_constant_expression_1): Mark the use of __builtin_choose_expr as a potentially constant expression * cp/semantics.c (evaluate_builtin_choose_expr_condition): New function * cp/semantics.c (finish_builtin_choose_expr): New function * cp/pt.c (tsubst_copy_and_build): Use it * cp/cxx-pretty-print.h: New function pp_cxx_choose_expression * cp/cxx-pretty-print.c: Document, define and use it * cp/error.c: Use it * doc/extend.text: Update docs * testsuite/gcc.dg/builtin-choose-expr.c: Moved... * testsuite/c-c++-common/builtin-choose-expr.c: ...to here * testsuite/gcc.dg/builtin-choose-expr-2.c: Moved... * testsuite/c-c++-common/builtin-choose-expr-2.c: ...to here * testsuite/gcc.dg/Wunused-var-2.c: Moved... * testsuite/c-c++-common/Wunused-var-15.c: ...to here * testsuite/g++.dg/ext/builtin-choose-expr1.C: New test * testsuite/g++.dg/ext/builtin-choose-expr2.C: New test * testsuite/g++.dg/ext/builtin-choose-expr3.C: New test * testsuite/g++.dg/ext/builtin-choose-expr4.C: New test * testsuite/g++.dg/ext/builtin-choose-expr5.C: New test --- gcc-4.7.0-r180072/gcc/c-family/c-common.c +++ gcc-4.7.0-patched/gcc/c-family/c-common.c @@ -423,7 +423,7 @@ const struct c_common_resword c_common_r { "__asm__", RID_ASM,0 }, { "__attribute", RID_ATTRIBUTE, 0 }, { "__attribute__", RID_ATTRIBUTE, 0 }, - { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY }, + { "__builtin_choose_expr", RID_CHOOSE_EXPR, 0 }, { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY }, { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY }, { "__builtin_offsetof", RID_OFFSETOF, 0 }, --- gcc-4.7.0-r180072/gcc/cp/cp-objcp-common.c +++ gcc-4.7.0-patched/gcc/cp/cp-objcp-common.c @@ -100,6 +100,8 @@ cp_tree_size (enum tree_code code) case TEMPLATE_INFO: return sizeof (struct tree_template_info); +case CHOOSE_EXPR: return sizeof (struct tree_choose_expr); + default: gcc_unreachable (); } @@ -301,6 +303,7 @@ cp_common_init_ts (void) MARK_TS_TYPED (USING_STMT); MARK_TS_TYPED (LAMBDA_EXPR); MARK_TS_TYPED (CTOR_INITIALIZER); + MARK_TS_TYPED (CHOOSE_EXPR); } #include "gt-cp-cp-objcp-common.h" --- gcc-4.7.0-r180072/gcc/cp/cp-tree.def +++ gcc-4.7.0-patched/gcc/cp/cp-tree.def @@ -332,6 +332,9 @@ DEFTREECODE (TAG_DEFN, "tag_defn", tcc_e /* Represents an 'offsetof' expression during template expansion. */ DEFTREECODE (OFFSETOF_EXPR, "offsetof_expr", tcc_expression, 1) +/* Represents an '__builtin_choose_expr' expression during template expansion. */ +DEFTREECODE (CHOOSE_EXPR, "choose_expr", tcc_exceptional, 0) + /* Represents a 'sizeof' expression during template expansion. */ DEFTREECODE (SIZEOF_EXPR, "sizeof_expr", tcc_expression, 1) --- gcc-4.7.0-r180072/gcc/cp/cp-tree.h +++ gcc-4.7.0-patched/gcc/cp/cp-tree.h @@ -532,6 +532,33 @@ struct GTY (()) tree_deferred_noexcept { }; +/* The condition associated with __builtin_choose_expr. This must be + an integral constant expression. */ +#define CHOOSE_EXPR_CONDITION(NODE) \ + (((struct tree_choose_expr *)CHOOSE_EXPR_CHECK (NODE))->condition) + +/* The expression to be used when condition in __builtin_choose_expr + is true*/ +#define CHOOSE_EXPR_TRUE(NODE) \ + (((struct tree_choose_expr *)CHOOSE_EXPR_CHECK (NODE))->expr_true) + +/* The expression to be used when condition in __builtin_choose_expr + is false*/ +#define CHOOSE_EXPR_FALSE(NODE) \ + (((struct tree_choose_expr *)CHOOSE_EXPR_CHECK (NODE)
[C++ Patch] PR 38761, 40872
Hi, these simple PRs are both about conditional expressions in error calls which apparently make the life very hard to gettext: today I double checked and current 0.18.1 is still unable to cope correctly with those, it just doesn't extract the second message. Thus the below, which simply uses G_() in a few more places. Tested x86_64-linux (the make gcc.pot output too, of course) Ok? Thanks, Paolo. / 2011-10-19 Paolo Carlini PR c++/38761 PR c++/40872 * decl.c (duplicate_decls, make_typename_type, grokdeclarator): Use G_() in error message strings to facilitate translation. * semantics.c (finish_id_expression): Likewise. * parser.c (cp_parser_nested_name_specifier_opt, cp_parser_parameter_declaration): Likewise. Index: decl.c === --- decl.c (revision 180180) +++ decl.c (working copy) @@ -1542,8 +1542,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool error_at (DECL_SOURCE_LOCATION (newdecl), errmsg, newdecl); if (DECL_NAME (olddecl) != NULL_TREE) error ((DECL_INITIAL (olddecl) && namespace_bindings_p ()) -? "%q+#D previously defined here" -: "%q+#D previously declared here", olddecl); + ? G_("%q+#D previously defined here") + : G_("%q+#D previously declared here"), olddecl); return error_mark_node; } else if (TREE_CODE (olddecl) == FUNCTION_DECL @@ -3236,8 +3236,8 @@ make_typename_type (tree context, tree name, enum if (!t) { if (complain & tf_error) - error (want_template ? "no class template named %q#T in %q#T" - : "no type named %q#T in %q#T", name, context); + error (want_template ? G_("no class template named %q#T in %q#T") + : G_("no type named %q#T in %q#T"), name, context); return error_mark_node; } @@ -9143,13 +9143,13 @@ grokdeclarator (const cp_declarator *declarator, virtual. A constructor may not be static. */ if (staticp == 2) error ((flags == DTOR_FLAG) -? "destructor cannot be static member function" -: "constructor cannot be static member function"); +? G_("destructor cannot be static member function") +: G_("constructor cannot be static member function")); if (memfn_quals) { error ((flags == DTOR_FLAG) - ? "destructors may not be cv-qualified" - : "constructors may not be cv-qualified"); + ? G_("destructors may not be cv-qualified") + : G_("constructors may not be cv-qualified")); memfn_quals = TYPE_UNQUALIFIED; } @@ -9502,8 +9502,10 @@ grokdeclarator (const cp_declarator *declarator, && (!friendp || funcdef_flag)) { error (funcdef_flag -? "cannot define member function %<%T::%s%> within %<%T%>" -: "cannot declare member function %<%T::%s%> within %<%T%>", +? G_("cannot define member function %<%T::%s%> " + "within %<%T%>") +: G_("cannot declare member function %<%T::%s%> " + "within %<%T%>"), ctype, name, current_class_type); return error_mark_node; } @@ -10223,8 +10225,8 @@ grokdeclarator (const cp_declarator *declarator, || sfk == sfk_destructor) { error (funcdef_flag - ? "%qs defined in a non-class scope" - : "%qs declared in a non-class scope", name); + ? G_("%qs defined in a non-class scope") + : G_("%qs declared in a non-class scope"), name); sfk = sfk_none; } } Index: semantics.c === --- semantics.c (revision 180180) +++ semantics.c (working copy) @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "c-family/c-objc.h" #include "tree-inline.h" #include "tree-mudflap.h" +#include "intl.h" #include "toplev.h" #include "flags.h" #include "output.h" @@ -2985,8 +2986,8 @@ finish_id_expression (tree id_expression, else { error (TREE_CODE (decl) == VAR_DECL -? "use of % variable from containing function" -: "use of parameter from containing function"); +? G_("use of % variable from containing function") +: G_("use of parameter from containing function")); error (" %q+#D de
[PATCH PR50572] Tune loop alignment for Atom
Hi! Here is a patch to address PR50572. Successfully bootstrapped and passed make check on x86_64-linux. regards, Sergos 2011-10-19 Sergey Ostanevich * gcc/config/i386/i386.c (ix86_option_override_internal): use loop align by 16 bytes for Atom platform diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..7a93144 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3497,6 +3497,8 @@ ix86_option_override_internal (bool main_args_p) { align_loops = processor_target_table[ix86_tune].align_loop; align_loops_max_skip = processor_target_table[ix86_tune].align_loop_max_skip; + if (ix86_tune == PROCESSOR_ATOM && optimize > 1) +align_loops_max_skip = 15; } if (align_jumps == 0) {
Re: [PATCH PR50572] Tune loop alignment for Atom
Hello! > 2011-10-19 Sergey Ostanevich > > * gcc/config/i386/i386.c (ix86_option_override_internal): use loop > align by 16 bytes for Atom platform > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 2c53423..7a93144 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -3497,6 +3497,8 @@ ix86_option_override_internal (bool main_args_p) > { >align_loops = processor_target_table[ix86_tune].align_loop; >align_loops_max_skip = > processor_target_table[ix86_tune].align_loop_max_skip; > + if (ix86_tune == PROCESSOR_ATOM && optimize > 1) > +align_loops_max_skip = 15; > } >if (align_jumps == 0) > { You can just change the default in processor_target_table. Uros.
[commit, spu] Fix PR target/50310 (unordered vector compares)
Hello, the pr50310 test cases are failing on SPU with internal compiler errors. It seems that if the back-end provides a vcond pattern, it must support all comparison operations, including unordered ones which are missing in the SPU implementation. Since we don't actually have any specific instruction support for those operations on SPU, the following patch open-codes their semantics. Tested on spu-elf, fixes the pr50310 test cases. Committed to mainline. Bye, Ulrich ChangeLog: PR target/50310 * config/spu/spu.c (spu_emit_vector_compare): Support unordered floating-point comparisons. Index: gcc/config/spu/spu.c === *** gcc/config/spu/spu.c(revision 180156) --- gcc/config/spu/spu.c(working copy) *** spu_emit_vector_compare (enum rtx_code r *** 6415,6427 try_again = true; break; case NE: /* Treat A != B as ~(A==B). */ { enum insn_code nor_code; ! rtx eq_rtx = spu_emit_vector_compare (EQ, op0, op1, dest_mode); nor_code = optab_handler (one_cmpl_optab, dest_mode); gcc_assert (nor_code != CODE_FOR_nothing); ! emit_insn (GEN_FCN (nor_code) (mask, eq_rtx)); if (dmode != dest_mode) { rtx temp = gen_reg_rtx (dest_mode); --- 6415,6438 try_again = true; break; case NE: + case UNEQ: + case UNLE: + case UNLT: + case UNGE: + case UNGT: + case UNORDERED: /* Treat A != B as ~(A==B). */ { + enum rtx_code rev_code; enum insn_code nor_code; ! rtx rev_mask; ! ! rev_code = reverse_condition_maybe_unordered (rcode); ! rev_mask = spu_emit_vector_compare (rev_code, op0, op1, dest_mode); ! nor_code = optab_handler (one_cmpl_optab, dest_mode); gcc_assert (nor_code != CODE_FOR_nothing); ! emit_insn (GEN_FCN (nor_code) (mask, rev_mask)); if (dmode != dest_mode) { rtx temp = gen_reg_rtx (dest_mode); *** spu_emit_vector_compare (enum rtx_code r *** 6466,6471 --- 6477,6524 return mask; } break; + case LTGT: + /* Try LT OR GT */ + { + rtx lt_rtx, gt_rtx; + enum insn_code ior_code; + + lt_rtx = spu_emit_vector_compare (LT, op0, op1, dest_mode); + gt_rtx = spu_emit_vector_compare (GT, op0, op1, dest_mode); + + ior_code = optab_handler (ior_optab, dest_mode); + gcc_assert (ior_code != CODE_FOR_nothing); + emit_insn (GEN_FCN (ior_code) (mask, lt_rtx, gt_rtx)); + if (dmode != dest_mode) + { + rtx temp = gen_reg_rtx (dest_mode); + convert_move (temp, mask, 0); + return temp; + } + return mask; + } + break; + case ORDERED: + /* Implement as (A==A) & (B==B) */ + { + rtx a_rtx, b_rtx; + enum insn_code and_code; + + a_rtx = spu_emit_vector_compare (EQ, op0, op0, dest_mode); + b_rtx = spu_emit_vector_compare (EQ, op1, op1, dest_mode); + + and_code = optab_handler (and_optab, dest_mode); + gcc_assert (and_code != CODE_FOR_nothing); + emit_insn (GEN_FCN (and_code) (mask, a_rtx, b_rtx)); + if (dmode != dest_mode) + { + rtx temp = gen_reg_rtx (dest_mode); + convert_move (temp, mask, 0); + return temp; + } + return mask; + } + break; default: gcc_unreachable (); } -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 4:14 PM, Uros Bizjak wrote: > > You can just change the default in processor_target_table. > > Uros. > Will it be applicable during optimizations for size? It will hurt, although not much (see PR). New patch is below. Ok for trunk as obvious? Sergos 2011-10-19 Sergey Ostanevich * gcc/config/i386/i386.c (ix86_option_override_internal): use loop align by 16 bytes for Atom platform diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {&bdver1_cost, 32, 24, 32, 7, 32}, {&bdver2_cost, 32, 24, 32, 7, 32}, {&btver1_cost, 32, 24, 32, 7, 32}, - {&atom_cost, 16, 7, 16, 7, 16} + {&atom_cost, 16, 15, 16, 7, 16} };
Re: [C++ Patch] PR 38761, 40872
OK. Jason
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich wrote: >> You can just change the default in processor_target_table. >> >> Uros. >> > > Will it be applicable during optimizations for size? It will hurt, > although not much (see PR). Looking at the code, I'd say that we don't handle -Os in different way. > New patch is below. Ok for trunk as obvious? > > Sergos > > 2011-10-19 Sergey Ostanevich > > * gcc/config/i386/i386.c (ix86_option_override_internal): use loop > align by 16 bytes for Atom platform Please update ChangeLog, like: * gcc/config/i386/i386.c (processor_target_table): Change Atom align_loop_max_skip to 15. > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 2c53423..8c60086 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2596,7 +2596,7 @@ static const struct ptt > processor_target_table[PROCESSOR_max] = > {&bdver1_cost, 32, 24, 32, 7, 32}, > {&bdver2_cost, 32, 24, 32, 7, 32}, > {&btver1_cost, 32, 24, 32, 7, 32}, > - {&atom_cost, 16, 7, 16, 7, 16} > + {&atom_cost, 16, 15, 16, 7, 16} > }; OK. Thanks, Uros.
[PATCH, i386, PR50766] Fix incorrect mem/reg operands order
Hi, Here is (almost obvous) patch, which fixes PR50766. ChangeLog entry: 2011-10-19 Kirill Yukhin * config/i386/i386.md (bmi_bextr_): Update register/ memory operand order. (bmi2_bzhi_3): Ditto. (bmi2_pdep_3): Ditto. (bmi2_pext_3): Ditto. Bootstrapped, test (from bug) passing. Could you please have a look? Thanks, K pr50766.gcc.patch Description: Binary data
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 4:46 PM, Uros Bizjak wrote: > On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich > wrote: > >>> You can just change the default in processor_target_table. >>> >>> Uros. >>> >> >> Will it be applicable during optimizations for size? It will hurt, >> although not much (see PR). > > Looking at the code, I'd say that we don't handle -Os in different way. > >> New patch is below. Ok for trunk as obvious? >> >> Sergos >> >> 2011-10-19 Sergey Ostanevich >> >> * gcc/config/i386/i386.c (ix86_option_override_internal): use loop >> align by 16 bytes for Atom platform > > Please update ChangeLog, like: > > * gcc/config/i386/i386.c (processor_target_table): Change Atom > align_loop_max_skip to 15. > >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index 2c53423..8c60086 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -2596,7 +2596,7 @@ static const struct ptt >> processor_target_table[PROCESSOR_max] = >> {&bdver1_cost, 32, 24, 32, 7, 32}, >> {&bdver2_cost, 32, 24, 32, 7, 32}, >> {&btver1_cost, 32, 24, 32, 7, 32}, >> - {&atom_cost, 16, 7, 16, 7, 16} >> + {&atom_cost, 16, 15, 16, 7, 16} >> }; > > > OK. > > Thanks, > Uros. > Thanks for comments! I double checked: for -Os there's no .p2align appeared. For -O2 I see ".p2align 4,,15" instead of ".p2align 4,,7", as expected. Can someone commit it please? Regards, Sergos 2011-10-18 Sergey Ostanevich * gcc/config/i386/i386.c (processor_target_table): Change Atom align_loops_max_skip to 15. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {&bdver1_cost, 32, 24, 32, 7, 32}, {&bdver2_cost, 32, 24, 32, 7, 32}, {&btver1_cost, 32, 24, 32, 7, 32}, - {&atom_cost, 16, 7, 16, 7, 16} + {&atom_cost, 16, 15, 16, 7, 16} }; static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
[translation patch] Bunch of trivial PRs: 48638, 49517, 49704
Hi, a bunch of trivial fixes from Roland. The only one I deem *slightly* more controversial is "AST" uppercase instead of "ast" in cp/semantics.c, I'm gonna wait a bit in case Jason has a different opinion... Thanks, Paolo. 2011-10-19 Roland Stigge PR translation/48638 * plugin.c (add_new_plugin): Fix typo in fatal_error message. 2011-10-19 Roland Stigge PR translation/49517 * config/rx/rx.c (rx_print_operand): Fix typo in warning message. /cp 2011-10-19 Roland Stigge PR translation/49704 * semantics.c (potential_constant_expression_1): Use AST instead of ast in sorry message. Index: cp/semantics.c === --- cp/semantics.c (revision 180185) +++ cp/semantics.c (working copy) @@ -8341,7 +8341,7 @@ potential_constant_expression_1 (tree t, bool want return false; default: - sorry ("unexpected ast of kind %s", tree_code_name[TREE_CODE (t)]); + sorry ("unexpected AST of kind %s", tree_code_name[TREE_CODE (t)]); gcc_unreachable(); return false; } Index: plugin.c === --- plugin.c(revision 180181) +++ plugin.c(working copy) @@ -1,5 +1,5 @@ /* Support for GCC plugin mechanism. - Copyright (C) 2009, 2010 Free Software Foundation, Inc. + Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc. This file is part of GCC. @@ -149,7 +149,7 @@ add_new_plugin (const char* plugin_name) plugin_name, ".so", NULL); if (access (plugin_name, R_OK)) fatal_error - ("inacessible plugin file %s expanded from short plugin name %s: %m", + ("inaccessible plugin file %s expanded from short plugin name %s: %m", plugin_name, base_name); } else Index: config/rx/rx.c === --- config/rx/rx.c (revision 180181) +++ config/rx/rx.c (working copy) @@ -637,7 +637,7 @@ rx_print_operand (FILE * file, rtx op, int letter) case 0xb: fprintf (file, "fintv"); break; case 0xc: fprintf (file, "intb"); break; default: - warning (0, "unreocgnized control register number: %d - using 'psw'", + warning (0, "unrecognized control register number: %d - using 'psw'", (int) INTVAL (op)); fprintf (file, "psw"); break;
Re: [translation patch] Bunch of trivial PRs: 48638, 49517, 49704
OK. Jason
Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order
On Wed, Oct 19, 2011 at 3:07 PM, Kirill Yukhin wrote: > Here is (almost obvous) patch, which fixes PR50766. > > ChangeLog entry: > 2011-10-19 Kirill Yukhin > > * config/i386/i386.md (bmi_bextr_): Update register/ > memory operand order. > (bmi2_bzhi_3): Ditto. > (bmi2_pdep_3): Ditto. > (bmi2_pext_3): Ditto. > > Bootstrapped, test (from bug) passing. > > Could you please have a look? Please also add the testcase from the PR. You can use { dg-do assemble }, but you have to check for BMI2 effective target support. BTW: I can't find BMI2 instruction reference documentation, so I'm just rubberstamping the patch as obvious. So, OK with the testcase. Thanks, Uros.
[PATCH] Fix PR50780, make comparisons able to throw again ...
This fixes PR50780 by validating a condition is gimple before substituting it into a COND_EXPR during forwprop. When looking at this I noticed that comparisons no longer are marked as possibly throwing - because of an old tuplification bug :/ Thus, fixed as well, hopefully without fallout. Bootstrap and regtest running on x86_64-unknown-linux-gnu ... Richard. 2011-10-19 Richard Guenther PR middle-end/50780 * tree-ssa-forwprop.c (forward_propagate_into_cond): Verify the condition is properly gimple before using it. * tree-eh (stmt_could_throw_1_p): Properly extract the operation type from comparisons. Index: gcc/tree-ssa-forwprop.c === *** gcc/tree-ssa-forwprop.c (revision 180186) --- gcc/tree-ssa-forwprop.c (working copy) *** forward_propagate_into_cond (gimple_stmt *** 597,603 } } ! if (tmp) { if (dump_file && tmp) { --- 597,604 } } ! if (tmp ! && is_gimple_condexpr (tmp)) { if (dump_file && tmp) { Index: gcc/tree-eh.c === *** gcc/tree-eh.c (revision 180186) --- gcc/tree-eh.c (working copy) *** stmt_could_throw_1_p (gimple stmt) *** 2512,2518 || TREE_CODE_CLASS (code) == tcc_unary || TREE_CODE_CLASS (code) == tcc_binary) { ! t = gimple_expr_type (stmt); fp_operation = FLOAT_TYPE_P (t); if (fp_operation) { --- 2512,2524 || TREE_CODE_CLASS (code) == tcc_unary || TREE_CODE_CLASS (code) == tcc_binary) { ! if (is_gimple_assign (stmt) ! && TREE_CODE_CLASS (code) == tcc_comparison) ! t = TREE_TYPE (gimple_assign_rhs1 (stmt)); ! else if (gimple_code (stmt) == GIMPLE_COND) ! t = TREE_TYPE (gimple_cond_lhs (stmt)); ! else ! t = gimple_expr_type (stmt); fp_operation = FLOAT_TYPE_P (t); if (fp_operation) {
Re: [PATCH 2/6] Generate virtual locations for tokens
Dodji Seketeli wrote: > libcpp/ChangeLog > 2011-10-15 Tom Tromey > Dodji Seketeli > > (cpp_get_token_1): New static function. Split and extended from > cpp_get_token. Use reached_end_of_context and > consume_next_token_from_context. Unify its return point. Move > the location tweaking from cpp_get_token_with_location in here. > (cpp_get_token): Use cpp_get_token_1 Since this set of patches went in, I'm seeing weird errors when building newlib for SPU: /home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h: In function '_recipd2': /home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:129:3: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts /home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:129:13: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) unsigned int' /home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:131:10: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) unsigned int' /home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:132:10: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) unsigned int' The code in question looks like: vec_uint4 isinf, iszero, isdenorm0; [snip] isdenorm0 = spu_cmpeq(spu_shuffle((vec_uint4)exp, (vec_uint4)exp, splat_hi), 0); isinf = spu_cmpeq((vec_uint4)value_abs, (vec_uint4)expmask); iszero = spu_cmpeq((vec_uint4)value_abs, 0); Note that isinf etc. *are* defined as *unsigned* vector types. The problem seems to be that the preprocessor somehow stripped off the "unsigned" keyword. A reduced test case is: #define isinf(__x) #define vec_uint4 __vector unsigned int vec_uint4 isinf; (Using the name of a function-like macro as identifer is maybe a bit odd, but should be perfectly legal C as far as I know ...) Running this through "cc1 -E" on a SPU target before the patch set yields: __attribute__((__spu_vector__)) unsigned int isinf; as expected, but after the patch set we get: __attribute__((__spu_vector__)) int isinf; instead. The problem is clearly related to the platform-specific "macro_to_expand" routine that is used on SPU to implement the context-sensitive __vector keyword. With your changes to cpp_get_token (which is the sole caller of the macro_to_expand callback), are there any changes in the interface to the callback? Any suggestions what could be going on here? Note that the implementation of the callback is in config/spu/spu-c.c:spu_macro_to_expand Thanks, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
[build] Properly test for madvise on Solaris 10 (PR bootstrap/50777)
As described in the PR, Solaris 10 bootstrap is currently broken compiling ggc-page.c in stage2 due to no declaration for madvise(). This happens because g++ defines _XOPEN_SOURCE=600, which hides the declaration, and configure doesn't check for a declaration at all. The following patch fixes both issues by checking for a madvise() declaration separately and doing so with the C++ compiler. Testing in progress on i386-pc-solaris2.8 (where it worked before), i386-pc-solaris2.10 (bootstrap broken), i386-pc-solaris2.11 (worked before), and x86_64-unknown-linux-gnu. HAVE_MADVISE is 1 everywhere, HAVE_DECL_MADVISE is 1 everwhere except on Solaris 10 in stages 2 and 3 where g++ is used. All bootstraps are beyond the point of the breakage now. Ok for mainline if they pass? Thanks. Rainer 2011-10-19 Rainer Orth PR bootstrap/50777 * configure.ac: Save and restore CXXFLAGS around gcc_AC_CHECK_DECLS uses. Check for madvise() declaration with g++ if --enable-build-with-cxx. * configure: Regenerate. * config.in: Regenerate. * ggc-page.c (USING_MADVISE): Also check HAVE_DECL_MADVISE. # HG changeset patch # Parent 1fa7cb4d63ec8cf992373dd9ba92615a589542c3 Properly test for madvise on Solaris 10 (PR bootstrap/50777) diff --git a/gcc/configure.ac b/gcc/configure.ac --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -1094,6 +1094,8 @@ AM_LANGINFO_CODESET # We will need to find libiberty.h and ansidecl.h saved_CFLAGS="$CFLAGS" CFLAGS="$CFLAGS -I${srcdir} -I${srcdir}/../include" +saved_CXXFLAGS="$CXXFLAGS" +CXXFLAGS="$CXXFLAGS -I${srcdir} -I${srcdir}/../include" gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \ strsignal strstr strverscmp \ errno snprintf vsnprintf vasprintf malloc realloc calloc \ @@ -1146,6 +1148,21 @@ gcc_AC_CHECK_DECLS(sigaltstack, , ,[ #include ]) +# g++ on Solaris 10+ defines _XOPEN_SOURCE=600, which hides the madvise() +# prototype. +AS_IF([test "$ENABLE_BUILD_WITH_CXX" = "yes"], + [AC_LANG_PUSH([C++]) + gcc_AC_CHECK_DECLS(madvise, , ,[ + #include "ansidecl.h" + #include "system.h" + ]) + AC_LANG_POP([C++])], + [gcc_AC_CHECK_DECLS(madvise, , ,[ + #include "ansidecl.h" + #include "system.h" + ]) +]) + # More time-related stuff. AC_CACHE_CHECK(for struct tms, ac_cv_struct_tms, [ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ @@ -1172,8 +1189,9 @@ if test $gcc_cv_type_clock_t = yes; then [Define if defines clock_t.]) fi -# Restore CFLAGS from before the gcc_AC_NEED_DECLARATIONS tests. +# Restore CFLAGS, CXXFLAGS from before the gcc_AC_NEED_DECLARATIONS tests. CFLAGS="$saved_CFLAGS" +CXXFLAGS="$saved_CXXFLAGS" gcc_AC_INITFINI_ARRAY diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c --- a/gcc/ggc-page.c +++ b/gcc/ggc-page.c @@ -1,6 +1,6 @@ /* "Bag-of-pages" garbage collector for the GNU compiler. Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, - 2010 Free Software Foundation, Inc. + 2010, 2011 Free Software Foundation, Inc. This file is part of GCC. @@ -50,7 +50,8 @@ along with GCC; see the file COPYING3. #define USING_MALLOC_PAGE_GROUPS #endif -#if defined(HAVE_MADVISE) && defined(MADV_DONTNEED) && defined(USING_MMAP) +#if defined(HAVE_MADVISE) && HAVE_DECL_MADVISE && defined(MADV_DONTNEED) \ +&& defined(USING_MMAP) # define USING_MADVISE #endif -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [build] Properly test for madvise on Solaris 10 (PR bootstrap/50777)
On 10/19/2011 04:17 PM, Rainer Orth wrote: As described in the PR, Solaris 10 bootstrap is currently broken compiling ggc-page.c in stage2 due to no declaration for madvise(). This happens because g++ defines _XOPEN_SOURCE=600, which hides the declaration, and configure doesn't check for a declaration at all. The following patch fixes both issues by checking for a madvise() declaration separately and doing so with the C++ compiler. Testing in progress on i386-pc-solaris2.8 (where it worked before), i386-pc-solaris2.10 (bootstrap broken), i386-pc-solaris2.11 (worked before), and x86_64-unknown-linux-gnu. HAVE_MADVISE is 1 everywhere, HAVE_DECL_MADVISE is 1 everwhere except on Solaris 10 in stages 2 and 3 where g++ is used. All bootstraps are beyond the point of the breakage now. Ok for mainline if they pass? Thanks. Rainer 2011-10-19 Rainer Orth PR bootstrap/50777 * configure.ac: Save and restore CXXFLAGS around gcc_AC_CHECK_DECLS uses. Check for madvise() declaration with g++ if --enable-build-with-cxx. * configure: Regenerate. * config.in: Regenerate. * ggc-page.c (USING_MADVISE): Also check HAVE_DECL_MADVISE. Ok. Paolo
Re: [PATCH 2/2] Free large chunks in ggc
> If the size to free is smaller than the quirk size, then it has the very > undesirable effect that with using GC only you might run unnecessarily out > of virtual address space, because it allocates pages in 2MB chunks, but > if they are released in 1MB chunks, those released chunks will never be 1MB is just the minimum, it frees it in whatever it can find (but only for a single GC cycle usually). So when enough continuous memory is free it will be reused by GC. I guess it would be possible to add a fallback to allocate a smaller chunk if the large chunk fails, but unless someone actually comes up with a test case I have doubts it is really needed. > usable again for GC. Consider on 32-bit address space allocating 3GB > of GC memory, then freeing stuff in every odd 1MB chunk of pages, then > wanting to allocate through GC the 1.5GB back. > > IMHO we should munmap immediately in release_pages the > G.pagesize pages, Then you get the fragmentation problem back in full force. > those are not very likely to be reused anyway (and it had one in between > ggc_collect cycle to be reused anyway), and for the == G.pagesize > (the usual case, the only ones that are allocated in GGC_QUIRK_SIZE sets) > we should note which page was the first one in the GGC_QUIRK_SIZE chunk > and munmap exactly those 2MB starting at the first page only. I tried this first with aligned 2MB chunks, but it doesn't trigger ever in a normal (non LTO) bootstrap. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [trans-mem] Rename/split __transaction into __transaction_atomic and __transaction_relaxed.
On 10/19/2011 03:16 AM, Torvald Riegel wrote: > On Tue, 2011-10-18 at 14:29 -0500, Aldy Hernandez wrote: >>> Several ICEs in the TM tests on C++, but I think they are old. C TM >>> tests work except some missing optimizations (old failures too). >>> c-c++-common/tm/wrap-[12].c fail on C++, but not on C: >>> wrap-1.c:5:57: error: 'transaction_wrap' argument not an identifier >>> I believe this is an old error too? >>> (I don't yet have a testcase summary of before the changes, so I can't >>> compare properly. But will send an follow-up if there are regressions.) >> >> Can you run the tests before your patch to make sure there are no >> regressions? That is, let's be sure that these are indeed old failures. > > Indeed old failures, no new regressions. Tested and boot-strapped on > x86_64. > > OK for branch? > Yep. r~
Re: [PATCH 2/2] Free large chunks in ggc
On Wed, Oct 19, 2011 at 04:37:45PM +0200, Andi Kleen wrote: > > If the size to free is smaller than the quirk size, then it has the very > > undesirable effect that with using GC only you might run unnecessarily out > > of virtual address space, because it allocates pages in 2MB chunks, but > > if they are released in 1MB chunks, those released chunks will never be > > 1MB is just the minimum, it frees it in whatever it can find > (but only for a single GC cycle usually). So when enough continuous > memory is free it will be reused by GC. > > I guess it would be possible to add a fallback to allocate a smaller > chunk if the large chunk fails, but unless someone actually comes > up with a test case I have doubts it is really needed. > > > usable again for GC. Consider on 32-bit address space allocating 3GB > > of GC memory, then freeing stuff in every odd 1MB chunk of pages, then > > wanting to allocate through GC the 1.5GB back. > > > > IMHO we should munmap immediately in release_pages the > G.pagesize pages, > > Then you get the fragmentation problem back in full force. Why? For one, such allocations are very rare (you only get them when a single GC allocation requests > page of memory, like perhaps a string literal over 4KB or similar or function call with over 1000 arguments etc.). And if they are unlikely to be reused, not munmapping them means wasting more virtual address space than needed. Jakub
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 6:09 AM, Sergey Ostanevich wrote: > On Wed, Oct 19, 2011 at 4:46 PM, Uros Bizjak wrote: >> On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich >> wrote: >> You can just change the default in processor_target_table. Uros. >>> >>> Will it be applicable during optimizations for size? It will hurt, >>> although not much (see PR). >> >> Looking at the code, I'd say that we don't handle -Os in different way. >> >>> New patch is below. Ok for trunk as obvious? >>> >>> Sergos >>> >>> 2011-10-19 Sergey Ostanevich >>> >>> * gcc/config/i386/i386.c (ix86_option_override_internal): use loop >>> align by 16 bytes for Atom platform >> >> Please update ChangeLog, like: >> >> * gcc/config/i386/i386.c (processor_target_table): Change Atom >> align_loop_max_skip to 15. >> >>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >>> index 2c53423..8c60086 100644 >>> --- a/gcc/config/i386/i386.c >>> +++ b/gcc/config/i386/i386.c >>> @@ -2596,7 +2596,7 @@ static const struct ptt >>> processor_target_table[PROCESSOR_max] = >>> {&bdver1_cost, 32, 24, 32, 7, 32}, >>> {&bdver2_cost, 32, 24, 32, 7, 32}, >>> {&btver1_cost, 32, 24, 32, 7, 32}, >>> - {&atom_cost, 16, 7, 16, 7, 16} >>> + {&atom_cost, 16, 15, 16, 7, 16} >>> }; >> >> >> OK. >> >> Thanks, >> Uros. >> > > Thanks for comments! > I double checked: for -Os there's no .p2align appeared. > For -O2 I see ".p2align 4,,15" instead of ".p2align 4,,7", as expected. > > Can someone commit it please? > > Regards, > Sergos > > > 2011-10-18 Sergey Ostanevich > > * gcc/config/i386/i386.c (processor_target_table): Change Atom Please remove gcc/. > align_loops_max_skip to 15. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 2c53423..8c60086 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2596,7 +2596,7 @@ static const struct ptt > processor_target_table[PROCESSOR_max] = > {&bdver1_cost, 32, 24, 32, 7, 32}, > {&bdver2_cost, 32, 24, 32, 7, 32}, > {&btver1_cost, 32, 24, 32, 7, 32}, > - {&atom_cost, 16, 7, 16, 7, 16} > + {&atom_cost, 16, 15, 16, 7, 16} > }; > > static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = > -- H.J.
Re: [Patch,AVR]: PR50447: Tweak addhi3
Denis Chertykov schrieb: >> Georg-Johann Lay schrieb: >>> Denis Chertykov schrieb: >>> What difference in size of avr-libc ? >>> I have no tool for smart size analysis, so here is just a diff: >>> >>> After rebuilding avr-libc with respective compiler version, did >>> respectively: >>> >>> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt >>> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt >>> >>> and then >>> >>> $ diff -U 0 size-orig.txt size-patch.txt > size.diff >>> >>> As far as I can see, there is not a big gain but no object increases in >>> size. >>> >>> For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o size gain is 3%. >>> For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7% and for others just one >>> instruction better. >> Actually there are some cases where the size increases by one instruction: >> >>> -464 0 0 464 1d0 realloc.o (ex >>> ./avr/lib/avr31/libc.a) >>> +466 0 0 466 1d2 realloc.o (ex >>> ./avr/lib/avr31/libc.a) >>> -464 0 0 464 1d0 realloc.o (ex >>> ./avr/lib/avr3/libc.a) >>> +466 0 0 466 1d2 realloc.o (ex >>> ./avr/lib/avr3/libc.a) >> Will have a look tomorrow; presumably it's adding +/-1 that force a scratch >> whilst the old code does not. Is because different spill generation. Both cases do spilling (same frame size) but a bit different. If there was a MOVW instruction the code would be smaller, so I think it's a rare corner case. > However results are good. > The patch can be committed. > > Denis. Applied with the proposed can_create_pseudo_p() Johann
Re: [cxx-mem-model] compare_exchange implementation
On 10/18/2011 06:25 PM, Andrew MacLeod wrote: Its impossible to implement a weak compare and swap unless you return both parameters in one operation. the compare_exchange matches the atomic interface for c++, and if we can't resolve it with a lock free instruction sequence, we have to leave an external call with this format to a library, so I that why I provide this built-in. Neither rth nor I like the addressable parameter, so thats why I left the rtl pattern for weak and strong compare and swap without that addressable argument, and let this builtin generate wrapper code around it. You could provide an __atomic version of the bool and val routines with memory model we toyed with making the compare_and_swap return both values so we could implement a weak version, but getting 2 return values is not pretty. It could be done with 2 separate built-ins that relate to each other, but thats not great either. I've thought about various longer term schemes, but havent been able to settle on one I really like. Ideally, if we know we can generate lock free instructions, we expose the wrapper code to the tree optimizers. I've considered: 1) adding tree support for a CAS primitive which has 2 results.. that pretty invasive but has nice features. 2) a 2 part built-in.. one which returns a value, and a second one which takes that value and then returns the boolean: ie val = __atomic_compare_and_swap (&mem, expected, desired, model) if (__atomic_compare_and_swap_success (val)) ... and during expansion from SSA to RTL, you look for uses of the result of __atomic_compare_and_swap in __atomic_compare_and_swap_success, and you can decide what RTL pattern to use and 'merge' the 2 builtins into one pattern. You can optimize the RTL pattern used based on what, if any, other uses there are of the 2 results. I think this should work OK... 3) Waiting for a flash of brilliance (may never come) or suggestion> :-) I decided for the moment to punt on exploring those and give us more time to think about the best way to do it. We do need to be able to call this specific interface for external library calls, but we are not locked into this for inline expansion of lock free instructions. In c-common.c where we turn __atomic_exchange_compare into __atomic_compare_exchange_{1,2,4,8,16}, we can instead turn it into a code sequence using a new __atomic_compare_and_swap builtin or tree code. The wrapper code that is currently emitted as RTL could then be emitted as tree expressions before the SSA optimizers see anything. So no later than the next release of GCC, I would expect to have a fully flushed out solution that gives us all the nice inlines and removes addr-taken flags and such.I just don't feel like there is time at the moment to make the correct decision while I'm trying to get the library ABI right. If I find some time, I may experiment with #2 next week. Andrew
Re: [cxx-mem-model] compare_exchange implementation
On 10/18/2011 03:07 PM, Jakub Jelinek wrote: > On Tue, Oct 18, 2011 at 06:03:16PM -0400, Andrew MacLeod wrote: >> >> Here's the last of the missing intrinsics. compare_exchange is >> slightly different than the others since it doesn't map directly to >> an rtl pattern. Its format is : >> >> bool __atomic_compare_exchange (T* mem, T* expected, T desired, >> bool weak, memory_order success, memory_order failure) >> >> note that the expected parameter is a pointer as well. In the case >> where false is returned, the value of expected is updated. > > I think the __sync_* way here was much better (two intrinsics instead of > one, one bool-ish and one returning val). With mandating T*expected > you force it to be addressable, probably until expansion time at which point > it will be just forced into memory, which is highly undesirable. In one of Andrew's previous ideas he split _ace into two separate builtins: T __atomic_compare_exchange (T* mem, T oldval, T newval, ...); bool __atomic_compare_exchange_success (T); where the input to __ace_success *must* be the ssa_name output of _ace. We then use magic to wire up both results to the same rtl pattern. We can still probably do something exactly like that, but Andrew is concerned about getting the generic user-level interface correct. r~
[cxx-mem-model] fix some trivial renaming fallout
As was pointed out, __get_builtin_sync_mem was renamed to __get_builtin_atomic which isn't really right, so this patch renames it back. It also includes some formatting fixes which resulted from __sync_mem having a different number of letters than __atomic. Andrew * optabs.c (add_op): Fix formatting from rename. (expand_atomic_fetch_op): Fix formatting from rename. * builtins.c (get_builtin_sync_mem): Reverse renamed get_builtin_atomic. (expand_builtin_sync_operation): Fix rename issues. (expand_builtin_compare_and_swap): Fix rename issues. (expand_builtin_sync_lock_test_and_set): Fix rename issues. (expand_builtin_sync_lock_release): Fix rename issues. (expand_builtin_atomic_exchange): Fix rename issues. (expand_builtin_atomic_compare_exchange): Fix rename issues. (expand_builtin_atomic_load): Fix rename issues. (expand_builtin_atomic_store): Fix rename issues. (expand_builtin_atomic_fetch_op): Fix rename issues. (expand_builtin): Fix formatting from rename. Index: optabs.c === *** optabs.c(revision 180174) --- optabs.c(working copy) *** struct op_functions { *** 7228,7239 /* Initialize the fields for each supported opcode. */ static const struct op_functions add_op = { atomic_fetch_add_optab, ! atomic_add_fetch_optab, ! atomic_add_optab, ! sync_old_add_optab, ! sync_new_add_optab, ! sync_add_optab, ! MINUS }; static const struct op_functions sub_op = { atomic_fetch_sub_optab, --- 7228,7239 /* Initialize the fields for each supported opcode. */ static const struct op_functions add_op = { atomic_fetch_add_optab, ! atomic_add_fetch_optab, ! atomic_add_optab, ! sync_old_add_optab, ! sync_new_add_optab, ! sync_add_optab, ! MINUS }; static const struct op_functions sub_op = { atomic_fetch_sub_optab, *** maybe_emit_op (const struct op_functions *** 7358,7364 AFTER is false to return the value before the operation (fetch_OP). */ rtx expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code, ! enum memmodel model, bool after) { enum machine_mode mode = GET_MODE (mem); const struct op_functions *optab; --- 7358,7364 AFTER is false to return the value before the operation (fetch_OP). */ rtx expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code, ! enum memmodel model, bool after) { enum machine_mode mode = GET_MODE (mem); const struct op_functions *optab; Index: builtins.c === *** builtins.c (revision 180174) --- builtins.c (working copy) *** get_builtin_sync_mode (int fcode_diff) *** 5072,5078 for the builtin_sync operations. */ static rtx ! get_builtin_atomic (tree loc, enum machine_mode mode) { rtx addr, mem; --- 5072,5078 for the builtin_sync operations. */ static rtx ! get_builtin_sync_mem (tree loc, enum machine_mode mode) { rtx addr, mem; *** expand_builtin_sync_operation (enum mach *** 5173,5183 } /* Expand the operands. */ ! mem = get_builtin_atomic (CALL_EXPR_ARG (exp, 0), mode); val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 1), mode); return expand_atomic_fetch_op (target, mem, val, code, MEMMODEL_SEQ_CST, !after); } /* Expand the __sync_val_compare_and_swap and __sync_bool_compare_and_swap --- 5173,5183 } /* Expand the operands. */ ! mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode); val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 1), mode); return expand_atomic_fetch_op (target, mem, val, code, MEMMODEL_SEQ_CST, !after); } /* Expand the __sync_val_compare_and_swap and __sync_bool_compare_and_swap *** expand_builtin_compare_and_swap (enum ma *** 5192,5198 rtx old_val, new_val, mem; /* Expand the operands. */ ! mem = get_builtin_atomic (CALL_EXPR_ARG (exp, 0), mode); old_val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 1), mode); new_val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode); --- 5192,5198 rtx old_val, new_val, mem; /* Expand the operan
Re: [cxx-mem-model] compare_exchange implementation
On Wed, Oct 19, 2011 at 08:05:26AM -0700, Richard Henderson wrote: > In one of Andrew's previous ideas he split _ace into two separate builtins: > > T __atomic_compare_exchange (T* mem, T oldval, T newval, ...); > bool __atomic_compare_exchange_success (T); > > where the input to __ace_success *must* be the ssa_name output of _ace. > We then use magic to wire up both results to the same rtl pattern. > > We can still probably do something exactly like that, but Andrew is > concerned about getting the generic user-level interface correct. Until we support more than one lhs stmts in GIMPLE, there is always an option to return a _Complex or vector or struct from the builtin, where one part of the return value would contain the bool whether it has succeeded and the other part would contain the old value. That of course doesn't need to be the externally exposed builtin, it could be an internal builtin (perhaps with space in the name or something). Or perhaps the user builtin if (__atomic_compare_exchange (&mem, &expected, newval)) could be lowered into tmp1 = expected; tmp2 = __atomic val_compare_exchange (&mem, tmp1, newval); expected = tmp2; if (tmp1 == tmp2) if it is known that it can be handled by the backend. Jakub
Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit
On Mon, Oct 17, 2011 at 23:52, Janne Blomqvist wrote: > On Mon, Oct 17, 2011 at 19:03, Tobias Burnus wrote: >> Hi Janne, >> >> On 10/17/2011 05:30 PM, Janne Blomqvist wrote: >>> >>> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus wrote: This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine and the FLUSH statement. It removes the _commit from gfortran's buf_flush. >>> >>> Like I argued in this message >>> http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross >>> mistake. >> >> [...] >> >> And I think it is a mistake to not make the data available to other >> processes as it is indicated by the Fortran 2008 standard: >> >> "Execution of a FLUSH statement causes data written to an external le to be >> available to other processes, or causes data placed in an external file by >> means other than Fortran to be available to a READ statement. These actions >> are processor dependent." >> >> Thus, I think it makes sense for FLUSH to call _commit on Windows. > > I'm not actually sure we can draw such conclusions. What we know is > that metadata updates to the directory are delayed. It wouldn't > surprise me if opening a file (which presumably is a atomic operation > in order to avoid race conditions just like on POSIX) forces the > kernel to sync metadata of other handles to the same file. I did some further googling, and http://stackoverflow.com/questions/2883691/fflush-on-stdout seems to suggest that the my explanation above is roughly what is happening. That is, opening and closing the file in another program, "type" in the link above, flushes the metadata to the directory. Also from the MSDN link referenced there "The only guarantee about a file timestamp is that the file time is correctly reflected when the handle that makes the change is closed. " (which would suggest that the same hold for other file metadata as well, such as the size). Explanation of file caching in Windows: http://msdn.microsoft.com/en-us/library/aa364218%28v=VS.85%29.aspx Some MS presentation about how to make apps behave nicely with SMB: http://mschnlnine.vo.llnwd.net/d1/pdc08/PPTX/ES23.pptx Under the heading of "Platform Support for Metadata Caching": "Metadata caching is best effort and there are very limited consistency guarantees Metadata caches expire after a fixed time " (Though it's not entirely clear if the above "platform support" means only SMB or Windows filesystem semantics in general.) So, I think the picture is essentially: - write() (which is a wrapper around WriteFile(EX)) transfers data and metadata to the system cache (kernel page cache in unix terminology). - However, the metadata is written to the directory lazily, allowing applications that do stat() (or the equivalent Win32 API call(s)) on a pathname to see stale data. This, incidentally, is what gfortran is doing and what caused the issue that led to the introduction of _commit in the first place. - Closing the file, or _commit() (a wrapper around FlushFileBuffers()), or (per the stackoverflow link above) opening/closing the file in another process will force the directory flush immediately. Some investigation into how other language support libraries handle this: - For MS-DOS compatibility (back when OS file caching wasn't that advanced, one presumes), the MS C runtime allows the user to link in an extra object COMMODE.OBJ which makes fflush() also call _commit() recreating the MS-DOS behavior. By default, however, this is not done. Also, there are some nonstandard flags that can be passed to fopen() to indicate that one wants the MS-DOS behavior. - For the MS C++ compiler, the same COMMODE.OBJ linking can be done, and there is some nonstandard extension allowing one to get the fd from a C++ stream and then call _commit on it. By default flush() on a stream does not call _commit/FlushFileBuffers, it just does a WriteFile(). - For .NET, there is the FileStream.Flush() method, which flushes the user-space buffer without calling _commit/FlushFileBuffers. In the latest version of .NET, there is a new FileStream.Flush(bool) method which can be used to call FlushFileBuffers. In previous .NET versions, people used PInvoke (the native code calling interface) to call FlushFileBuffers if needed. - For the runtimes provided with GCC, grepping the source tree shows that libgfortran is the only occurence of _commit. In libjava there is a FlushFileBuffers call, but it's #if 0'ed away. That is, except for libgfortran, no other language runtime by default makes an effort to synchronize the metadata. In conclusion, I still think that my previous patch which got rid of the _commit and the reliance on using stat() via pathname is the correct approach. "dir" might show stale data for the file size, but this seems to be the norm on Windows, and opening the file in another process will give the correct data. So in practice I don't think there will be any problems from getting rid of _commit. -- Janne
Re: [PATCH, PR50769] Fix ICE in phi_alternatives_equal.
On Wed, Oct 19, 2011 at 10:23 AM, Tom de Vries wrote: > Richard, > > For the example from the PR, -ftree-tail-merge discovers that blocks 6 and 7 > are > equal, removes block 7 and redirects the outgoing edge of it's predecessor > block > 11 to block 6. > > Block 6 contains a phi, which after redirecting the edge from block 11 looks > like this: > ... > .MEM_54 = PHI <.MEM_27(18), .MEM_26(17), (11)> > ... > > Since update_vops is false, TODO_update_ssa_only_virtuals will be run > and -ftree-tail-merge doesn't update the phi. > > However during cleanup_tree_cfg (which is done before updating the ssa), > phi_alternatives_equal is called which asserts that the phi arguments are > defined. > > The patch fixes the ICE by setting the empty phi argument to .MEM. > > bootstrapped and reg-tested on x86_64. > > Ok for trunk? Ok. Thanks, Richard. > Thanks, > - Tom > > 2011-10-19 Tom de Vries > > PR 50769/tree-optimization > * tree-ssa-tail-merge.c (replace_block_by): Calculate phi_vuse2 > unconditionally. Handle case that phi_vuse2 is not an SSA_NAME. Add > dummy argument .MEM to phi when increasing number of arguments of phi > by > redirecting edges to the block with phi. >
Re: [cxx-mem-model] compare_exchange implementation
> ! weak = expand_expr (CALL_EXPR_ARG (exp, 3), NULL_RTX, ptr_mode, > ! EXPAND_NORMAL); > ! > ! if (weak != const0_rtx && weak != const1_rtx) > ! { > ! error ("strong/weak parameter must be 0 or 1 for > %<__atomic_compare_exchange%>"); > ! return NULL_RTX; > ! } Really? Are you even allowed to do that, like unto the variable memory model parameters? > + /* Emit the compare_and_swap. */ > create_output_operand (&ops[0], target, QImode); > create_output_operand (&ops[1], mem, mode); > ! create_output_operand (&ops[2], current_val, mode); > ! create_convert_operand_to (&ops[3], expected_val, mode, true); > ! create_convert_operand_to (&ops[4], desired, mode, true); > ! create_integer_operand (&ops[5], success); > ! expand_insn (icode, 6, ops); The mem operand must use create_fixed_operand, as with all of the other atomic builtins. I strongly suggest swapping op 1 and op2, so that all of the "real" outputs come first. I suggested on IRC that you examine the mode of the boolean output operand, and not hard-code QImode. Most targets will produce a word-sized boolean output that need not be zero-extended. > ! emit_cmp_and_jump_insns (ops[0].value, const0_rtx, NE, const0_rtx, > !QImode, 1, true_label); > ! > ! /* if not successful copy expected_val into *expected and issue the > ! failure fence. */ > ! emit_move_insn (gen_rtx_MEM (mode, expected), current_val); > ! expand_builtin_mem_thread_fence (failure); > ! > ! /* If no success fence is required, we're done. Otherwise jump around > the > ! code for TRUE and emitthe fence. */ > ! if (true_label != done_label) > ! { > ! emit_jump_insn (gen_jump (done_label)); > ! > ! emit_label (true_label); > ! if (success != MEMMODEL_RELAXED && success != MEMMODEL_RELEASE) > ! expand_builtin_mem_thread_fence (success); > ! } Really? We can do better than this. In particular, by leaving it up to the target. In the x86 case, cmpxchg is a full barrier. Always. No need for any followup barriers. The ll/sc targets are going to generate code that looks like start-barrier restart: ll curval, mem mov t, 0 cmp curval, oldval bne fail mov t, newval sc t, mem beq t, restart // strong-version only fail: end-barrier // outputs: t = 1/0 for success/failure // curval = the current value of the memory, to be stored back into expected where the position of fail: before or after the end-barrier depends on the relative strength of the memory models. No one is going to want to insert an extra jump around a barrier, especially for a not-expected failure path. We should encode the two memory model parameters and the weak parameter into a single CONST_INT operand. Probably with some macros in optabs.h to extract those parameters in the targets. r~
Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order
On Wed, Oct 19, 2011 at 6:21 AM, Uros Bizjak wrote: > On Wed, Oct 19, 2011 at 3:07 PM, Kirill Yukhin > wrote: > >> Here is (almost obvous) patch, which fixes PR50766. >> >> ChangeLog entry: >> 2011-10-19 Kirill Yukhin >> >> * config/i386/i386.md (bmi_bextr_): Update register/ >> memory operand order. >> (bmi2_bzhi_3): Ditto. >> (bmi2_pdep_3): Ditto. >> (bmi2_pext_3): Ditto. Please mention PR #. >> Bootstrapped, test (from bug) passing. >> >> Could you please have a look? > > Please also add the testcase from the PR. You can use { dg-do > assemble }, but you have to check for BMI2 effective target support. > > BTW: I can't find BMI2 instruction reference documentation, so I'm It is in Intel AVX spec: http://software.intel.com/en-us/avx/ > just rubberstamping the patch as obvious. > -- H.J.
Re: [cxx-mem-model] compare_exchange implementation
On 10/19/2011 11:16 AM, Jakub Jelinek wrote: On Wed, Oct 19, 2011 at 08:05:26AM -0700, Richard Henderson wrote: In one of Andrew's previous ideas he split _ace into two separate builtins: T __atomic_compare_exchange (T* mem, T oldval, T newval, ...); bool __atomic_compare_exchange_success (T); where the input to __ace_success *must* be the ssa_name output of _ace. We then use magic to wire up both results to the same rtl pattern. We can still probably do something exactly like that, but Andrew is concerned about getting the generic user-level interface correct. Until we support more than one lhs stmts in GIMPLE, there is always an option to return a _Complex or vector or struct from the builtin, where one part of the return value would contain the bool whether it has succeeded and the other part would contain the old value. That of course doesn't need to be the externally exposed builtin, it could be an internal builtin (perhaps with space in the name or thats a possible 3rd option. create a custom struct for the return value. Accessing the values in the struct are still going to be memory accesses until RTL expansion time aren't they? user writes: atomic_compare_exchange (&mem, &expected, desired, model)) gets turned into something like typedef struct { int val; bool success; } __atomic_CAS_result; { __atomic_CAS_result __res; tmp = *expected; __res = __atomic_CAS (&mem, tmp, desired, model) if (!res.success) *expected = res.val; return res.success; } since __res is a struct, that will have to be allocated on the stack and then see if the RTL optimizations can remove it, right? The user variable would be absolved off all the address taken crud though. something). Or perhaps the user builtin if (__atomic_compare_exchange (&mem,&expected, newval)) could be lowered into tmp1 = expected; tmp2 = __atomic val_compare_exchange (&mem, tmp1, newval); expected = tmp2; if (tmp1 == tmp2) if it is known that it can be handled by the backend. Thats what we generate now if we fall back to the __sync implementations, but that only works for strong CAS. Of course, we're only falling back to that at RTL time rather than the tree optimizers... we could fudge around earlier perhaps for the strong version... Andrew
Re: [translation patch] Bunch of trivial PRs: 48638, 49517, 49704
On Wed, Oct 19, 2011 at 6:11 AM, Paolo Carlini wrote: > Hi, > > a bunch of trivial fixes from Roland. The only one I deem *slightly* more > controversial is "AST" uppercase instead of "ast" in cp/semantics.c, I'm > gonna wait a bit in case Jason has a different opinion... > > Thanks, > Paolo. > We also need to update plugin tests: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50796 -- H.J.
Re: [PATCH 2/2] Free large chunks in ggc
> Why? For one, such allocations are very rare (you only get them when > a single GC allocation requests > page of memory, like perhaps a string > literal over 4KB or similar or function call with over 1000 arguments etc.). > And if they are unlikely to be reused, not munmapping them means wasting > more virtual address space than needed. We produce quite bit of large hashtables in GGC and those will happily be bigger than 4KB. Honza
Re: [PATCH, m68k] Enable building for ColdFire Linux
Julian Brown writes: > Maxim Kuvyrkov > > gcc/ > * config/m68k/t-linux (M68K_MLIB_CPU): Add ColdFire CPUs. Ok. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [PATCH 2/6] Generate virtual locations for tokens
"Ulrich Weigand" writes: > The problem seems to be that the preprocessor somehow stripped > off the "unsigned" keyword. A reduced test case is: > > #define isinf(__x) > > #define vec_uint4 __vector unsigned int > > vec_uint4 isinf; > > (Using the name of a function-like macro as identifer is maybe a bit odd, > but should be perfectly legal C as far as I know ...) > > Running this through "cc1 -E" on a SPU target before the patch set yields: > > __attribute__((__spu_vector__)) unsigned int isinf; > > as expected, but after the patch set we get: > > __attribute__((__spu_vector__)) int isinf; > > instead. > > The problem is clearly related to the platform-specific "macro_to_expand" > routine that is used on SPU to implement the context-sensitive __vector > keyword. > > With your changes to cpp_get_token (which is the sole caller of the > macro_to_expand callback), are there any changes in the interface > to the callback? Not that I would expect. > Any suggestions what could be going on here? Nothing obvious is coming to my mind right now, I am still looking at the issue. -- Dodji
PR bootstrap/50709 (bootstrap miscompare)
Hi, the ENABLE_CHECKING code verifying consistency of inliner cache has effect on changing order of querries to the fibheap that consequentely makes entries with same key come in different orders. Fixed by enabling the checking aways. I also added extra check that the keys are conservatively correct. Bootstrapped/regtested x86_64-linux. Index: ChangeLog === --- ChangeLog (revision 180190) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-10-19 Jan Hubicka + + PR bootstrap/50709 + * ipa-inline.c (inline_small_functions): Fix checking code to not make + effect on fibheap stability. + 2011-10-19 Roland Stigge PR translation/48638 Index: ipa-inline.c === --- ipa-inline.c(revision 180190) +++ ipa-inline.c(working copy) @@ -1384,6 +1384,7 @@ inline_small_functions (void) struct cgraph_node *where, *callee; int badness = fibheap_min_key (heap); int current_badness; + int cached_badness = -1; int growth; edge = (struct cgraph_edge *) fibheap_extract_min (heap); @@ -1392,16 +1393,20 @@ inline_small_functions (void) if (!edge->inline_failed) continue; - /* Be sure that caches are maintained consistent. */ #ifdef ENABLE_CHECKING + /* Be sure that caches are maintained conservatively consistent. +This means that cached badness is allways smaller or equal +to the real badness. */ + cached_badness = edge_badness (edge, false); +#endif reset_edge_growth_cache (edge); reset_node_growth_cache (edge->callee); -#endif /* When updating the edge costs, we only decrease badness in the keys. Increases of badness are handled lazilly; when we see key with out of date value on it, we re-insert it now. */ current_badness = edge_badness (edge, false); + gcc_assert (cached_badness == -1 || cached_badness <= current_badness); gcc_assert (current_badness >= badness); if (current_badness != badness) {
Re: [PATCH PR50572] Tune loop alignment for Atom
On Wed, Oct 19, 2011 at 9:30 AM, Sergey Ostanevich wrote: > 2011-10-18 Sergey Ostanevich > > * config/i386/i386.c (processor_target_table): Change Atom > align_loops_max_skip to 15. Please add PR # in ChangeLog. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 2c53423..8c60086 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2596,7 +2596,7 @@ static const struct ptt > processor_target_table[PROCESSOR_max] = > {&bdver1_cost, 32, 24, 32, 7, 32}, > {&bdver2_cost, 32, 24, 32, 7, 32}, > {&btver1_cost, 32, 24, 32, 7, 32}, > - {&atom_cost, 16, 7, 16, 7, 16} > + {&atom_cost, 16, 15, 16, 7, 16} > }; > > static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = Please provide a patch which can be applied. Cut/paste doesn't create a working patch. Please attach it. -- H.J.
Fwd: Re: [cxx-mem-model] compare_exchange implementation
grr, reposting due to thunderbird apparently sneaking some HTML in and it being rejected... sigh. On 10/19/2011 11:34 AM, Richard Henderson wrote: ! weak = expand_expr (CALL_EXPR_ARG (exp, 3), NULL_RTX, ptr_mode, ! EXPAND_NORMAL); ! ! if (weak != const0_rtx&& weak != const1_rtx) ! { ! error ("strong/weak parameter must be 0 or 1 for %<__atomic_compare_exchange%>"); ! return NULL_RTX; ! } Really? Are you even allowed to do that, like unto the variable memory model parameters? well, they are explicitly'compare_exchange_strong' and 'compare_exchange_weak' calls... so yes, they have 'hardcoded' one or the other :-) we could alternatively do 2 separate builtins, but I didn't see the point. But Im ambivalent. + /* Emit the compare_and_swap. */ create_output_operand (&ops[0], target, QImode); create_output_operand (&ops[1], mem, mode); ! create_output_operand (&ops[2], current_val, mode); ! create_convert_operand_to (&ops[3], expected_val, mode, true); ! create_convert_operand_to (&ops[4], desired, mode, true); ! create_integer_operand (&ops[5], success); ! expand_insn (icode, 6, ops); The mem operand must use create_fixed_operand, as with all of the other atomic builtins. I strongly suggest swapping op 1 and op2, so that all of the "real" outputs come first. oh, right, my error on the output operand... I will shuffle. I suggested on IRC that you examine the mode of the boolean output operand, and not hard-code QImode. Most targets will produce a word-sized boolean output that need not be zero-extended. I clearly misunderstood what you said then, I thought you said use QImode :-P. So, look at target and use that mode? target may not be set sometimes tho right? what do you use then? QImode? ! emit_cmp_and_jump_insns (ops[0].value, const0_rtx, NE, const0_rtx, ! QImode, 1, true_label); ! ! /* if not successful copy expected_val into *expected and issue the !failure fence. */ ! emit_move_insn (gen_rtx_MEM (mode, expected), current_val); ! expand_builtin_mem_thread_fence (failure); ! ! /* If no success fence is required, we're done. Otherwise jump around the ! code for TRUE and emitthe fence. */ ! if (true_label != done_label) ! { ! emit_jump_insn (gen_jump (done_label)); ! ! emit_label (true_label); ! if (success != MEMMODEL_RELAXED&& success != MEMMODEL_RELEASE) ! expand_builtin_mem_thread_fence (success); ! } Really? We can do better than this. In particular, by leaving it up to the target. In the x86 case, cmpxchg is a full barrier. Always. No need for any followup barriers. This is the tricky part. Actually, it is wrong but for a different reason. I don't think the CAS needs a model. If I understand my previous conversation with Lawrence, we need to know what mode the native CAS is. If its seq-cst, then we don't need to do anything else. If it is relaxed though, then additional fences are required. Worst case, if the native CAS is relaxed, the fences need to be issued as such to implement compare_exchange (T* ptr, T* old, new, success, failure) memory_fence (success) // release modes val = *old; res = relaxed_CAS (ptr, val, new) if (success) { memory_fence (success); // acquire modes return true; } *old = res; memory_fence (failure) return false; so leaving it up to the target for the CAS isnt enough.. we may need to issue one or more of these fences if the native CAS mode is weaker than the success and/or failure mode. Or perhaps I misunderstood... Lawrence? if instead it maps to val = *old; memory_fence (success) // release modes res = relaxed_CAS (ptr, val, new) if (success_of_CAS) { memory_fence (success); // acquire modes return true; } memory_fence (failure) *old = res; return false; then the CAS could take care of issuing all the required fences without a problem. I'd be very happy with that :-) The ll/sc targets are going to generate code that looks like start-barrier restart: ll curval, mem mov t, 0 cmp curval, oldval bne fail mov t, newval sc t, mem beq t, restart // strong-version only fail: end-barrier // outputs: t = 1/0 for success/failure // curval = the current value of the memory, to be stored back into expected where the position of fail: before or after the end-barrier depends on the relative strength of the memory models. No one is going to want to insert an extra jump around a barrier, especially for a not-expected failure path. We should encode the two memory model parameters and the weak parameter into a single CONST_INT operand. Prob
[4.6,patch,libgfortran/io] Fix build issue with _commit [PR5016]
I intent to commit this patch - created mostly by Janne - soon to gcc-4_6-branch; it should solve a build issue. The patch was build and regtested on x86-64-linux and build and tested on MinGW and Cygwin by Kai. It is a follow up to http://gcc.gnu.org/ml/fortran/2011-10/msg00132.html / http://gcc.gnu.org/ml/gcc-cvs/2011-10/msg00734.html Please speak up immediately, if you have objects against the committal. (Another option would be to revert previous patch, which solved a severe performance regression on Windows - thus that patch plus this patch is better.) Sorry for disrupting the 4.6.2 process a bit. Tobias libgfortran/ 2011-10-19 Janne Blomqvist Kai Tietz Tobias Burnus PR fortran/50016 * io/unix.h (flush_sync): Add new libgfortran-internal prototype. * io/unix.c (flush_sync): New function, which calls sflush and on MinGW(-w64) also _commit. (flush_all_units, flush_all_units_1): Replace sflush/_commit by flush_sync. * io/file_pos.c (st_flush): Replace sflush/_commit by flush_sync. * io/intrinsics.c (flush_i4, flush_i8): Ditto. Index: libgfortran/io/file_pos.c === --- libgfortran/io/file_pos.c (revision 180180) +++ libgfortran/io/file_pos.c (working copy) @@ -451,11 +451,7 @@ st_flush (st_parameter_filepos *fpp) if (u->flags.form == FORM_FORMATTED) fbuf_flush (u, u->mode); - sflush (u->s); -#ifdef _WIN32 - /* Without _commit, changes are not visible to other file descriptors. */ - _commit (u->s->fd); -#endif + flush_sync (u->s); unlock_unit (u); } else Index: libgfortran/io/unix.c === --- libgfortran/io/unix.c (revision 180180) +++ libgfortran/io/unix.c (working copy) @@ -1522,6 +1522,23 @@ retry: return u; } + +/* Flush dirty data, making sure that OS metadata is updated as + well. Note that this is VERY slow on mingw due to committing data + to stable storage. */ +int +flush_sync (stream * s) +{ + if (sflush (s) == -1) +return -1; +#ifdef __MINGW32__ + if (_commit (((unix_stream *)s)->fd) == -1) +return -1; +#endif + return 0; +} + + static gfc_unit * flush_all_units_1 (gfc_unit *u, int min_unit) { @@ -1538,14 +1555,7 @@ flush_all_units_1 (gfc_unit *u, int min_unit) if (__gthread_mutex_trylock (&u->lock)) return u; if (u->s) - { - sflush (u->s); -#ifdef _WIN32 - /* Without _commit, changes are not visible to other - file descriptors. */ - _commit (u->s->fd); -#endif - } + flush_sync (u->s); __gthread_mutex_unlock (&u->lock); } u = u->right; @@ -1575,12 +1585,7 @@ flush_all_units (void) if (u->closed == 0) { - sflush (u->s); -#ifdef _WIN32 - /* Without _commit, changes are not visible to other - file descriptors. */ - _commit (u->s->fd); -#endif + flush_sync (u->s); __gthread_mutex_lock (&unit_lock); __gthread_mutex_unlock (&u->lock); (void) predec_waiting_locked (u); Index: libgfortran/io/intrinsics.c === --- libgfortran/io/intrinsics.c (revision 180180) +++ libgfortran/io/intrinsics.c (working copy) @@ -206,12 +206,7 @@ flush_i4 (GFC_INTEGER_4 *unit) us = find_unit (*unit); if (us != NULL) { - sflush (us->s); -#ifdef _WIN32 - /* Without _commit, changes are not visible - to other file descriptors. */ - _commit (u->s->fd); -#endif + flush_sync (us->s); unlock_unit (us); } } @@ -234,12 +229,7 @@ flush_i8 (GFC_INTEGER_8 *unit) us = find_unit (*unit); if (us != NULL) { - sflush (us->s); -#ifdef _WIN32 - /* Without _commit, changes are not visible - to other file descriptors. */ - _commit (u->s->fd); -#endif + flush_sync (us->s); unlock_unit (us); } } Index: libgfortran/io/unix.h === --- libgfortran/io/unix.h (revision 180180) +++ libgfortran/io/unix.h (working copy) @@ -167,6 +167,9 @@ internal_proto(is_special); extern void flush_if_preconnected (stream *); internal_proto(flush_if_preconnected); +extern int flush_sync (stream *); +internal_proto(flush_sync); + extern int stream_isatty (stream *); internal_proto(stream_isatty);
[PATCH] Handle VEC_PERM_EXPR and WIDEN_LSHIFT_EXPR in expand_debug_expr (PR middle-end/50754)
Hi! These two tree codes have been added recently, but not to expand_debug_expr. The following patch adds them there, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-19 Jakub Jelinek PR middle-end/50754 * cfgexpand.c (expand_debug_expr): Handle WIDEN_LSHIFT_EXPR, ignore VEC_PERM_EXPR. --- gcc/cfgexpand.c.jj 2011-10-18 23:52:04.0 +0200 +++ gcc/cfgexpand.c 2011-10-18 23:54:22.0 +0200 @@ -3267,6 +3267,7 @@ expand_debug_expr (tree exp) case VEC_WIDEN_MULT_LO_EXPR: case VEC_WIDEN_LSHIFT_HI_EXPR: case VEC_WIDEN_LSHIFT_LO_EXPR: +case VEC_PERM_EXPR: return NULL; /* Misc codes. */ @@ -3321,6 +3322,7 @@ expand_debug_expr (tree exp) return NULL; case WIDEN_SUM_EXPR: +case WIDEN_LSHIFT_EXPR: if (SCALAR_INT_MODE_P (GET_MODE (op0)) && SCALAR_INT_MODE_P (mode)) { @@ -3329,7 +3331,8 @@ expand_debug_expr (tree exp) 0))) ? ZERO_EXTEND : SIGN_EXTEND, mode, op0, inner_mode); - return simplify_gen_binary (PLUS, mode, op0, op1); + return simplify_gen_binary (TREE_CODE (exp) == WIDEN_LSHIFT_EXPR + ? ASHIFT : PLUS, mode, op0, op1); } return NULL; Jakub
[PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)
Hi! Similarly to casts of bool to integer, even stores into bool arrays can be handled similarly. Just we need to ensure tree-vect-data-refs.c doesn't reject vectorization before tree-vect-patterns.c has a chance to optimize it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-19 Jakub Jelinek PR tree-optimization/50596 * tree-vect-stmts.c (vect_mark_relevant): Only use FOR_EACH_IMM_USE_FAST if lhs is SSA_NAME. (vectorizable_store): If is_pattern_stmt_p look through VIEW_CONVERT_EXPR on lhs. * tree-vect-patterns.c (vect_recog_bool_pattern): Optimize also stores into bool memory in addition to casts from bool to integral types. (vect_mark_pattern_stmts): If pattern_stmt already has vinfo created, don't create it again. * tree-vect-data-refs.c (vect_analyze_data_refs): For stores into bool memory use vectype for integral type corresponding to bool's mode. * tree-vect-loop.c (vect_determine_vectorization_factor): Give up if a store into bool memory hasn't been replaced by the pattern recognizer. * gcc.dg/vect/vect-cond-10.c: New test. --- gcc/tree-vect-stmts.c.jj2011-10-18 23:52:07.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-19 14:19:00.0 +0200 @@ -159,19 +159,20 @@ vect_mark_relevant (VEC(gimple,heap) **w /* This use is out of pattern use, if LHS has other uses that are pattern uses, we should mark the stmt itself, and not the pattern stmt. */ - FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs) -{ - if (is_gimple_debug (USE_STMT (use_p))) -continue; - use_stmt = USE_STMT (use_p); + if (TREE_CODE (lhs) == SSA_NAME) + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs) + { + if (is_gimple_debug (USE_STMT (use_p))) + continue; + use_stmt = USE_STMT (use_p); - if (vinfo_for_stmt (use_stmt) - && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt))) -{ - found = true; - break; -} -} + if (vinfo_for_stmt (use_stmt) + && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt))) + { + found = true; + break; + } + } } if (!found) @@ -3656,6 +3657,9 @@ vectorizable_store (gimple stmt, gimple_ return false; scalar_dest = gimple_assign_lhs (stmt); + if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR + && is_pattern_stmt_p (stmt_info)) +scalar_dest = TREE_OPERAND (scalar_dest, 0); if (TREE_CODE (scalar_dest) != ARRAY_REF && TREE_CODE (scalar_dest) != INDIRECT_REF && TREE_CODE (scalar_dest) != COMPONENT_REF --- gcc/tree-vect-patterns.c.jj 2011-10-18 23:52:05.0 +0200 +++ gcc/tree-vect-patterns.c2011-10-19 13:55:27.0 +0200 @@ -1933,6 +1933,50 @@ vect_recog_bool_pattern (VEC (gimple, he VEC_safe_push (gimple, heap, *stmts, last_stmt); return pattern_stmt; } + else if (rhs_code == SSA_NAME + && STMT_VINFO_DATA_REF (stmt_vinfo)) +{ + stmt_vec_info pattern_stmt_info; + vectype = STMT_VINFO_VECTYPE (stmt_vinfo); + gcc_assert (vectype != NULL_TREE); + if (!check_bool_pattern (var, loop_vinfo)) + return NULL; + + rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts); + if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF) + { + lhs = copy_node (lhs); + TREE_TYPE (lhs) = TREE_TYPE (vectype); + } + else + lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs); + if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs))) + { + tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL); + gimple cast_stmt + = gimple_build_assign_with_ops (NOP_EXPR, rhs2, rhs, NULL_TREE); + STMT_VINFO_PATTERN_DEF_STMT (stmt_vinfo) = cast_stmt; + rhs = rhs2; + } + pattern_stmt + = gimple_build_assign_with_ops (SSA_NAME, lhs, rhs, NULL_TREE); + pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL); + set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info); + STMT_VINFO_DATA_REF (pattern_stmt_info) + = STMT_VINFO_DATA_REF (stmt_vinfo); + STMT_VINFO_DR_BASE_ADDRESS (pattern_stmt_info) + = STMT_VINFO_DR_BASE_ADDRESS (stmt_vinfo); + STMT_VINFO_DR_INIT (pattern_stmt_info) = STMT_VINFO_DR_INIT (stmt_vinfo); + STMT_VINFO_DR_OFFSET (pattern_stmt_info) + = STMT_VINFO_DR_OFFSET (stmt_vinfo); + STMT_VINFO_DR_STEP (pattern_stmt_info) = STMT_VINFO_DR_STEP (stmt_vinfo); + STMT_VINFO_DR_ALIGNED_TO (pattern_stmt_info) + = STMT_VINFO_DR_
Re: [VTA, PR49310] O(n+m)-ish emit_notes
On Tue, Oct 18, 2011 at 7:50 AM, Alexandre Oliva wrote: > On Oct 17, 2011, Jakub Jelinek wrote: > >> On Tue, Oct 11, 2011 at 04:19:52PM -0300, Alexandre Oliva wrote: >>> Here's what I've got so far. Regstrapped on x86_64-linux-gnu and >>> i686-linux-gnu. Ok to install? > >> I see >> +FAIL: gcc.c-torture/compile/pr19080.c -O3 -g (internal compiler error) >> +FAIL: gcc.c-torture/compile/pr19080.c -O3 -g (test for excess errors) >> regression with this patch on x86_64-linux (--enable-checking=yes,rtl), ICE >> in var-tracking. Can you please look at that? > > Hey, I'd already fixed that! It turns out that I posted an outdated > version of the patch, from before moving an incorrect assertion check > into a test. > >> Just a style nit, isn't { supposed to go on the next line? > > Not sure. Nearly all multi-line enums in gcc/ have brackets at the end > of the line rather than in the beginning of the subsequent line; I could > only find one that followed what we both think the conventions require. > I changed it, thanks. > >> Otherwise looks ok. > > Thanks, here's what I'm going to install as soon as I complete a new > regression test with the corrected patch, just to be sure I didn't > accidentally drop any other fixes. > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50799 -- H.J.
Re: [cxx-mem-model] compare_exchange implementation
On 10/19/2011 09:26 AM, Andrew MacLeod wrote: > well, they are explicitly'compare_exchange_strong' and > 'compare_exchange_weak' calls... so yes, they have 'hardcoded' one or the > other :-) > we could alternatively do 2 separate builtins, but I didn't see the point. > But Im ambivalent. I'm just not keen on issuing errors this late. If we 'know' it's always going to be 0/1 because of other things the compiler does, we might as well simply do weak = (opN == const1_rtx) and leave it at that. > I clearly misunderstood what you said then, I thought you said use > QImode :-P. So, look at target and use that mode? target may not > be set sometimes tho right? what do you use then? QImode? You never look at target, you look at insn_data[icode].operand[opno].mode if target is set and doesn't match that mode, then create a new pseudo of the proper mode and let your caller do the conversion. > If I understand my previous conversation with Lawrence, we need to > know what mode the native CAS is. We don't have that knowledge for any of the other rtl patterns; we ask the rtl pattern to enforce a given mode. We should do the same for CAS. > Worst case, if the native CAS is relaxed, the fences need to be > issued as such to implement > > compare_exchange (T* ptr, T* old, new, success, failure) > > memory_fence (success) // release modes > val = *old; > res = relaxed_CAS (ptr, val, new) > if (success) > { > memory_fence (success); // acquire modes > return true; > } > *old = res; > memory_fence (failure) > return false; If the target uses LL/SC, then native CAS is likely relaxed. However, we will already be emitting a number of jumps in order to implement the LL/SC sequence and stuffing the fences into those jumps is going to be much better than you trying to do it in the middle-end outside of that sequence. Really. > So if the CAS can handle it all, why does it matter if the pattern > has a single "compressed" parameter for the 3 values, or whether it > simply explicitly lists all three? *shrug* Just for less memory consumption. It *could* remain 3 values. r~
Re: [PATCH] Handle VEC_PERM_EXPR and WIDEN_LSHIFT_EXPR in expand_debug_expr (PR middle-end/50754)
On 10/19/2011 10:11 AM, Jakub Jelinek wrote: > Hi! > > These two tree codes have been added recently, but not to expand_debug_expr. > The following patch adds them there, bootstrapped/regtested on x86_64-linux > and i686-linux, ok for trunk? > > 2011-10-19 Jakub Jelinek > > PR middle-end/50754 > * cfgexpand.c (expand_debug_expr): Handle WIDEN_LSHIFT_EXPR, ignore > VEC_PERM_EXPR. Ok. r~
Re: [cxx-mem-model] compare_exchange implementation
On 10/19/2011 01:55 PM, Richard Henderson wrote: On 10/19/2011 09:26 AM, Andrew MacLeod wrote: well, they are explicitly'compare_exchange_strong' and 'compare_exchange_weak' calls... so yes, they have 'hardcoded' one or the other :-) we could alternatively do 2 separate builtins, but I didn't see the point. But Im ambivalent. I'm just not keen on issuing errors this late. If we 'know' it's always going to be 0/1 because of other things the compiler does, we might as well simply do weak = (opN == const1_rtx) and leave it at that. We issue all the memory model errors at this same location . A user *could* use this builtin and write bool weak; <...> __atomic_compare_exchange (&mem, &expected, desired, weak, m1, m2); The c++ wrappers will always call it with 0 or 1, but direct callers could do other things. The optimizers may well decide that its a 1 or 0 by the time we get to expansion. Its documented as requiring a compile time constant. We either report errors, or remove the possibility by making them separate routines. Or silently make it strong. I suppose we could also check the value when we turn it from __atomic_compare_exchange to __atomic_compare_exchange_{1,2,4,8,16}... thats before going into SSA. Im not sure it really buys a lot tho. Maybe they should just be separate routines... I clearly misunderstood what you said then, I thought you said use QImode :-P. So, look at target and use that mode? target may not be set sometimes tho right? what do you use then? QImode? You never look at target, you look at insn_data[icode].operand[opno].mode if target is set and doesn't match that mode, then create a new pseudo of the proper mode and let your caller do the conversion. ah. Worst case, if the native CAS is relaxed, the fences need to be issued as such to implement compare_exchange (T* ptr, T* old, new, success, failure) memory_fence (success) // release modes val = *old; res = relaxed_CAS (ptr, val, new) if (success) { memory_fence (success); // acquire modes return true; } *old = res; memory_fence (failure) return false; If the target uses LL/SC, then native CAS is likely relaxed. However, we will already be emitting a number of jumps in order to implement the LL/SC sequence and stuffing the fences into those jumps is going to be much better than you trying to do it in the middle-end outside of that sequence. Really. My point is, IF the fence has to go BEFORE 'val = *old' and AFTER '*old = res', its impossible to do it within the CAS pattern, unless you also have it do the copies... If, which is more likely as I think about it, thats not the case and the fences can look like they do in the second example (closer to the CAS than the copies) , then the CAS can take care of both fences, which would be much more preferable and clean.. So if the CAS can handle it all, why does it matter if the pattern has a single "compressed" parameter for the 3 values, or whether it simply explicitly lists all three? *shrug* Just for less memory consumption. It *could* remain 3 values. Does it really make a measurable memory difference? I can certainly jam them together... Andrew
Re: [4.6,patch,libgfortran/io] Fix build issue with _commit [PR5016]
On Wed, Oct 19, 2011 at 07:11:02PM +0200, Tobias Burnus wrote: > I intent to commit this patch - created mostly by Janne - soon to > gcc-4_6-branch; it should solve a build issue. > > The patch was build and regtested on x86-64-linux and build and tested > on MinGW and Cygwin by Kai. > > It is a follow up to http://gcc.gnu.org/ml/fortran/2011-10/msg00132.html > / http://gcc.gnu.org/ml/gcc-cvs/2011-10/msg00734.html > > Please speak up immediately, if you have objects against the committal. > (Another option would be to revert previous patch, which solved a severe > performance regression on Windows - thus that patch plus this patch is > better.) > > Sorry for disrupting the 4.6.2 process a bit. > I don't have an objection, just a question. Does this issue appear on trunk? Comment #10 in PR 50016 appears to suggest it does. In which case, I object to the initial commit into the 4.6-branch without testing the patch of trunk first. Rushing a patch into a branch to beat a release dead line is simply a poor choice. -- Steve
Fix g++.dg/compat/struct-layout-1_generate.c options for Windows targets
g++.dg/compat/struct-layout-1_generate.c generates a dg-options line using -mno-mmx -Wno-abi for i?86-*-* x86_64-*-* - and one using -fno-common, which takes precedence, for alpha*-dec-osf* hppa*-*-hpux* powerpc*-*-darwin* *-*-mingw32* *-*-cygwin*. But at least i686-mingw32 still needs the -Wno-abi option. This patch adds the Windows targets to the case for i?86-*-darwin* x86_64-*-darwin* which already passes both sets of options. Tested for cross to i686-mingw32. OK to commit? 2011-10-19 Joseph Myers * g++.dg/compat/struct-layout-1_generate.c: Also pass -mno-mmx -Wno-abi for i?86-*-mingw32* x86_64-*-mingw32* i?86-*-cygwin*. Index: gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c === --- gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c (revision 180200) +++ gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c (working copy) @@ -47,7 +47,7 @@ "/* { dg-options \"%s-I%s\" } */\n", "/* { dg-options \"%s-I%s -mno-mmx -Wno-abi\" { target i?86-*-* x86_64-*-* } } */\n", "/* { dg-options \"%s-I%s -fno-common\" { target alpha*-dec-osf* hppa*-*-hpux* powerpc*-*-darwin* *-*-mingw32* *-*-cygwin* } } */\n", -"/* { dg-options \"%s-I%s -mno-mmx -fno-common -Wno-abi\" { target i?86-*-darwin* x86_64-*-darwin* } } */\n", +"/* { dg-options \"%s-I%s -mno-mmx -fno-common -Wno-abi\" { target i?86-*-darwin* x86_64-*-darwin* i?86-*-mingw32* x86_64-*-mingw32* i?86-*-cygwin* } } */\n", "/* { dg-options \"%s-I%s -mno-base-addresses\" { target mmix-*-* } } */\n", "/* { dg-options \"%s-I%s -mlongcalls -mtext-section-literals\" { target xtensa*-*-* } } */\n" #define NDG_OPTIONS (sizeof (dg_options) / sizeof (dg_options[0])) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Add an intermediate coverage format for gcov
Since the updated patch already uses unmangled function names, is it good to commit then? Sharad On Wed, Oct 19, 2011 at 1:48 AM, Jan Hubicka wrote: >> On Oct 18, 2011, at 4:19 PM, Sharad Singhai wrote: >> > Okay, I liked the idea of self-descriptive tags. I have updated the >> > patch based on your suggestions. I have simplified the format >> > somewhat. Instead of repeating function name, I use a 'function' tag >> > with the format >> > >> > function:,, >> >> Sound nice. >> >> > I also dropped the unmangled function names, they were turning out to >> > be too unreadable and not really useful in this context. >> >> Ah, I'd argue for mangled names. Every one knows they can stream through >> c++filt and get unmangled, but once you unmangle, you can never go back. >> Also, the mangled version is significantly smaller. For c, it is >> irrelevant, for c++, it makes a big difference. > > I would also support unmangled variant. Otherwise the patch seems resonable > to me. > > Honza >> > >
Re: [PATCH] Add an intermediate coverage format for gcov
Sorry, I misunderstood your comment. I see that you are asking for unmangled function names whereas the current patch supports only mangled names. I can print unmangled names under another option. Would that work? Thanks, Sharad On Wed, Oct 19, 2011 at 12:06 PM, Sharad Singhai wrote: > Since the updated patch already uses unmangled function names, is it > good to commit then? > > Sharad > > On Wed, Oct 19, 2011 at 1:48 AM, Jan Hubicka wrote: >>> On Oct 18, 2011, at 4:19 PM, Sharad Singhai wrote: >>> > Okay, I liked the idea of self-descriptive tags. I have updated the >>> > patch based on your suggestions. I have simplified the format >>> > somewhat. Instead of repeating function name, I use a 'function' tag >>> > with the format >>> > >>> > function:,, >>> >>> Sound nice. >>> >>> > I also dropped the unmangled function names, they were turning out to >>> > be too unreadable and not really useful in this context. >>> >>> Ah, I'd argue for mangled names. Every one knows they can stream through >>> c++filt and get unmangled, but once you unmangle, you can never go back. >>> Also, the mangled version is significantly smaller. For c, it is >>> irrelevant, for c++, it makes a big difference. >> >> I would also support unmangled variant. Otherwise the patch seems resonable >> to me. >> >> Honza >>> > >> >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Wed, Oct 19, 2011 at 12:02 PM, wrote: > Minimized the crash to this: > > struct Foo { > unsigned bf1:1; > unsigned bf2:1; > unsigned bf3:1; > }; > > void foo (struct Foo *ob) { > ob->bf2 = 1; > } > > > > D.2731_4 = &ob_1(D)->bf2; > __asan_base_addr.2 = (long unsigned int) D.2731_4; > D.2732_5 = __asan_base_addr.2 >> 3; > D.2733_6 = 1 << 44; > D.2734_7 = D.2732_5 + D.2733_6; > D.2735_8 = VIEW_CONVERT_EXPR(D.2734_7); > # VUSE <.MEM> > __asan_shadow.3 = *D.2735_8; > D.2737_9 = __asan_shadow.3 != 0; > D.2738_10 = __asan_base_addr.2 & 7; > D.2739_11 = (char) D.2738_10; > D.2740_12 = D.2739_11 + 0; > D.2741_13 = D.2740_12 >= __asan_shadow.3; > __asan_crash_cond.4 = D.2737_9 & D.2741_13; > if (__asan_crash_cond.4 != 0) > ./expand_expr_addr_expr_1_err.c: In function ‘foo’: > ./expand_expr_addr_expr_1_err.c:8:11: internal compiler error: in > expand_expr_addr_expr_1, at expr.c:7381 > > > > How do I avoid instrumenting bitfields? Use get_inner_reference to compute the bitpos, and check if it is multiple of bits_per_unit. David > > http://codereview.appspot.com/5272048/ >
Re: [trans-mem] Add gl_wt TM method.
> Add support for TM-method-specific begin code. > > * libitm_i.h (GTM::gtm_restart_reason): Re-arrange and clean up > declarations. > * dispatch.h (GTM::abi_dispatch::begin_or_restart): New. > * method-serial.cc: Implement begin_or_restart(). > * beginend.cc (GTM::gtm_thread::begin_transaction): Call > dispatch-specific begin_or_restart(). > (GTM::gtm_thread::restart): Same. Ok except, > + // Run dispatch-specific restart code. Retry until we succeed. > + GTM::gtm_restart_reason rr; > + while ((rr = disp->begin_or_restart()) > + != NUM_RESTARTS) Please add NO_RESTART = NUM_RESTARTS (or it's own number *after* NUM_RESTARTS, or -1, or something) to the enumeration and use that name. Using num_restarts here is confusing. > Fixed gtm_thread::serialirr_mode to actually use serialirr, not serial. > > * method-serial.cc (GTM::gtm_thread::serialirr_mode): Fixed: Use > serial-irrevocable dispatch, not serial. Ok. > Do not free transaction-local memory when committing a nested transaction. > > * alloc.cc (commit_allocations_2): Do not free transaction-local > memory when committing a nested transaction. Ok. > Handle re-initialization of the current method group. > > * retry.cc (GTM::gtm_thread::decide_retry_strategy): Handle > re-initialization of the current method group. > * libitm_i.h (GTM::gtm_restart_reason): Add restart reason for this. Ok. > Undo log is used for both thread-local and shared data. > > * libitm_i.h: Renamed gtm_local_undo to gtm_undolog_entry. > (GTM::gtm_thread): Renamed local_undo to undolog. Renamed > undolog-related member functions from *_local to *_undolog. > * local.cc (gtm_thread::commit_undolog): Same. > * beginend.cc (GTM::gtm_thread::trycommit): Same. > (GTM::gtm_thread::rollback): Roll back undolog before > dispatch-specific rollback. Ok. > Ensure privatization safety if requested by a TM method. > > * beginend.cc (GTM::gtm_thread::trycommit): Ensure privatization > safety if requested by a TM method. > * dispatch.h (GTM::abi_dispatch::trycommit): Add parameter for > privatization safety. > * method-serial.cc: Same. Ok. > Add gl_wt TM method. > > * libitm_i.h: Add gl_wt dispatch. > * retry.cc (parse_default_method): Same. > * method-gl.cc: New file. > * Makefile.am: Use method-gl.cc. > * Makefile.in: Rebuild. Ok with... > Fix gl_wt commit/rollback when serial lock has been acquired. > > * method-gl.cc (gl_wt_dispatch::trycommit): Fix interaction with > gtm_thread::shared_state when the serial lock is acquired. > (gl_wt_dispatch::rollback): Same. ... this merged with the previous commit. > Fix TLS read accesses on Linux/x86. > > * config/linux/x86/tls.h (abi_disp): Make TLS slot read volatile. > (gtm_thr): Same. Ok. r~
Re: [cxx-mem-model] compare_exchange implementation
On 10/19/2011 11:31 AM, Andrew MacLeod wrote: > If, which is more likely as I think about it, thats not the case and > the fences can look like they do in the second example (closer to the > CAS than the copies) , then the CAS can take care of both fences, > which would be much more preferable and clean.. Indeed. >>> So if the CAS can handle it all, why does it matter if the pattern >>> has a single "compressed" parameter for the 3 values, or whether it >>> simply explicitly lists all three? >> *shrug* Just for less memory consumption. It *could* remain 3 values. >> > Does it really make a measurable memory difference? I can certainly jam them > together... Eh, you're right, probably not a measurable difference. r~
[PATCH] Further i?86 vector permutation fixes
Hi! This patch adds two new permutations to the vshuf* tests (VEC_PACK_TRUNC style) and fixes problems related to them. The first two fix issues with AVX2, the rest of the patch deals with the problem that if the shuffle mask suggests two operands, testing whether there is a permutation supported by the hw is done with d.op0 != d.op1, but then when we find out both arguments are the same, it is performed for d.op0 == d.op1. Some routines handle just equality and some non-equality, so it is possible the check might succeed even when d.op0 == d.op1 using the adjusted mask won't (which results in ICEs). This patch in that case retries using d.op0 != d.op1. Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested with -mavx2 on sde. 2011-10-19 Jakub Jelinek * config/i386/i386.c (expand_vec_perm_vpshufb2_vpermq_even_odd): Use d->op1 instead of d->op0 for the second vpshufb. (expand_vec_perm_even_odd_1): For V8SImode fix vpshufd immediates. (ix86_expand_vec_perm_const): If mask indicates two operands are needed, but both are the same and expanding them as d.op0 == d.op1 failed, retry with d.op0 != d.op1. (ix86_expand_vec_perm_builtin): Likewise. Handle sorry printing also for d.nelt == 32. * gcc.dg/torture/vshuf-32.inc: Add interleave permutations. * gcc.dg/torture/vshuf-16.inc: Likewise. * gcc.dg/torture/vshuf-8.inc: Likewise. * gcc.dg/torture/vshuf-4.inc: Likewise. --- gcc/config/i386/i386.c.jj 2011-10-18 23:52:02.0 +0200 +++ gcc/config/i386/i386.c 2011-10-19 19:02:57.0 +0200 @@ -35992,7 +35992,7 @@ expand_vec_perm_vpshufb2_vpermq_even_odd vperm = force_reg (V32QImode, vperm); h = gen_reg_rtx (V32QImode); - op = gen_lowpart (V32QImode, d->op0); + op = gen_lowpart (V32QImode, d->op1); emit_insn (gen_avx2_pshufbv32qi3 (h, op, vperm)); ior = gen_reg_rtx (V32QImode); @@ -36154,9 +36154,9 @@ expand_vec_perm_even_odd_1 (struct expan /* Swap the 2nd and 3rd position in each lane into { 0 2 1 3 8 a 9 b } and { 4 6 5 7 c e d f }. */ emit_insn (gen_avx2_pshufdv3 (t1, t1, - GEN_INT (2 * 2 + 1 * 16 + 3 * 64))); + GEN_INT (2 * 4 + 1 * 16 + 3 * 64))); emit_insn (gen_avx2_pshufdv3 (t2, t2, - GEN_INT (2 * 2 + 1 * 16 + 3 * 64))); + GEN_INT (2 * 4 + 1 * 16 + 3 * 64))); /* Now an vpunpck[lh]qdq will produce { 0 2 4 6 8 a c e } resp. { 1 3 5 7 9 b d f }. */ @@ -36498,6 +36498,7 @@ ix86_expand_vec_perm_builtin (tree exp) { struct expand_vec_perm_d d; tree arg0, arg1, arg2; + bool maybe_retry = false; arg0 = CALL_EXPR_ARG (exp, 0); arg1 = CALL_EXPR_ARG (exp, 1); @@ -36543,6 +36544,7 @@ ix86_expand_vec_perm_builtin (tree exp) for (i = 0; i < nelt; ++i) if (d.perm[i] >= nelt) d.perm[i] -= nelt; + maybe_retry = true; } /* FALLTHRU */ @@ -36563,6 +36565,28 @@ ix86_expand_vec_perm_builtin (tree exp) if (ix86_expand_vec_perm_builtin_1 (&d)) return d.target; + /* If the mask says both arguments are needed, but they are the same, + the above tried to expand with d.op0 == d.op1. If that didn't work, + retry with d.op0 != d.op1 as that is what testing has been done with. */ + if (maybe_retry) +{ + rtx seq; + bool ok; + + extract_vec_perm_cst (&d, arg2); + d.op1 = gen_reg_rtx (d.vmode); + start_sequence (); + ok = ix86_expand_vec_perm_builtin_1 (&d); + seq = get_insns (); + end_sequence (); + if (ok) + { + emit_move_insn (d.op1, d.op0); + emit_insn (seq); + return d.target; + } +} + /* For compiler generated permutations, we should never got here, because the compiler should also be checking the ok hook. But since this is a builtin the user has access too, so don't abort. */ @@ -36588,6 +36612,19 @@ ix86_expand_vec_perm_builtin (tree exp) d.perm[8], d.perm[9], d.perm[10], d.perm[11], d.perm[12], d.perm[13], d.perm[14], d.perm[15]); break; +case 32: + sorry ("vector permutation " +"(%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d " +"%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d)", +d.perm[0], d.perm[1], d.perm[2], d.perm[3], +d.perm[4], d.perm[5], d.perm[6], d.perm[7], +d.perm[8], d.perm[9], d.perm[10], d.perm[11], +d.perm[12], d.perm[13], d.perm[14], d.perm[15], +d.perm[16], d.perm[17], d.perm[18], d.perm[19], +d.perm[20], d.perm[21], d.perm[22], d.perm[23], +d.perm[24], d.perm[25], d.perm[26], d.perm[27], +d.perm[28], d.perm[29], d.perm[30], d.perm[31]); + break; default: gcc_unreachable (); } @@ -36599,6 +36636,7 @@ b
[Patch ARM] Fix PR target/50106
Hi, This fixes PR target/50106 which was missing handling return register size from 1-3 for Thumb1. Fixed thusly. Final testing on-going with arm-linux-gnueabi with thumb1. Ok to backport to 4.6 branch given it is branch freeze time ? I'll be able to commit this to the branch latest by tomorrow A.M. UK time. cheers Ramana 2011-10-19 Ramana Radhakrishnan PR target/50106 * config/arm/arm.c (thumb_unexpanded_epilogue): Handle return reg size from 1-3. Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 180200) +++ gcc/config/arm/arm.c(working copy) @@ -21652,7 +21652,7 @@ if (extra_pop > 0) { unsigned long extra_mask = (1 << extra_pop) - 1; - live_regs_mask |= extra_mask << (size / UNITS_PER_WORD); + live_regs_mask |= extra_mask << ((size + 3) / UNITS_PER_WORD); } /* The prolog may have pushed some high registers to use as
Re: [Patch ARM] Fix PR target/50106
On 10/19/2011 3:27 PM, Ramana Radhakrishnan wrote: Index: gcc/config/arm/arm.c - live_regs_mask |= extra_mask<< (size / UNITS_PER_WORD); + live_regs_mask |= extra_mask<< ((size + 3) / UNITS_PER_WORD); IIUC, wouldn't ((size + UNITS_PER_WORD - 1) / UNITS_PER_WORD) be clearer? -Nathan
[PATCH, testsuite]: Require non_strict_align effective target for gcc.dg/ipa/ipa-sra-[26].c
Hello! These two tests require non_strict_aligned effective target, since IPA fails in tree_non_mode_aligned_mem_p () for "cow" and "calf" candidates for STRICT_ALIGNMENT targets. Mode alignment requires 32 bytes, while data is aligned to 8 bytes. 2011-10-19 Uros Bizjak * gcc.dg/ipa/ipa-sra-2.c: Add dg-require-effective-target non_strict_align. * gcc.dg/ipa/ipa-sra-6.c: Ditto. Tested on x86_64-pc-linux-gnu and alphaev68-pc-linux-gnu, where the patch "fixes": FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace expr cow_.*D.->red with \\*ISRA" FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace expr cow_.*D.->green with ISRA" FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace expr calf_.*D.->red with \\*ISRA" FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace expr calf_.*D.->green with ISRA" FAIL: gcc.dg/ipa/ipa-sra-6.c scan-tree-dump-times eipa_sra "foo " 1 OK for mainline and 4.6 branch? Uros. Index: gcc.dg/ipa/ipa-sra-2.c === --- gcc.dg/ipa/ipa-sra-2.c (revision 180193) +++ gcc.dg/ipa/ipa-sra-2.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fipa-sra -fdump-tree-eipa_sra-details" } */ +/* { dg-require-effective-target non_strict_align } */ struct bovid { Index: gcc.dg/ipa/ipa-sra-6.c === --- gcc.dg/ipa/ipa-sra-6.c (revision 180193) +++ gcc.dg/ipa/ipa-sra-6.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fipa-sra -fdump-tree-eipa_sra-slim" } */ +/* { dg-require-effective-target non_strict_align } */ struct bovid {
Re: [PATCH, testsuite]: Require non_strict_align effective target for gcc.dg/ipa/ipa-sra-[26].c
From: Uros Bizjak Date: Wed, 19 Oct 2011 21:50:14 +0200 > Hello! > > These two tests require non_strict_aligned effective target, since IPA > fails in tree_non_mode_aligned_mem_p () for "cow" and "calf" > candidates for STRICT_ALIGNMENT targets. Mode alignment requires 32 > bytes, while data is aligned to 8 bytes. > > 2011-10-19 Uros Bizjak > > * gcc.dg/ipa/ipa-sra-2.c: Add dg-require-effective-target > non_strict_align. > * gcc.dg/ipa/ipa-sra-6.c: Ditto. > > Tested on x86_64-pc-linux-gnu and alphaev68-pc-linux-gnu, where the > patch "fixes": Thanks for taking care of this, I've been seeing these on sparc as well.
Re: [PATCH 2/6] Generate virtual locations for tokens
This is probably going to take some back and forth as I don't have any SPU target at hand to debug, so I have filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50801 to track this issue, along with a candidate fix. Could you please test it and reply to the bug with your results? Thank you for your time and I apologize for the inconvenience. -- Dodji
Re: Fix g++.dg/compat/struct-layout-1_generate.c options for Windows targets
On Oct 19, 2011, at 11:58 AM, "Joseph S. Myers" wrote: > g++.dg/compat/struct-layout-1_generate.c generates a dg-options line > using -mno-mmx -Wno-abi for i?86-*-* x86_64-*-* - and one using > -fno-common, which takes precedence, for alpha*-dec-osf* hppa*-*-hpux* > powerpc*-*-darwin* *-*-mingw32* *-*-cygwin*. But at least > i686-mingw32 still needs the -Wno-abi option. This patch adds the > Windows targets to the case for i?86-*-darwin* x86_64-*-darwin* which > already passes both sets of options. > OK to commit? Ok.
Re: [PATCH] Further i?86 vector permutation fixes
On 10/19/2011 12:25 PM, Jakub Jelinek wrote: > 2011-10-19 Jakub Jelinek > > * config/i386/i386.c (expand_vec_perm_vpshufb2_vpermq_even_odd): Use > d->op1 instead of d->op0 for the second vpshufb. > (expand_vec_perm_even_odd_1): For V8SImode fix vpshufd immediates. > (ix86_expand_vec_perm_const): If mask indicates two operands are > needed, but both are the same and expanding them as d.op0 == d.op1 > failed, retry with d.op0 != d.op1. > (ix86_expand_vec_perm_builtin): Likewise. Handle sorry printing > also for d.nelt == 32. > > * gcc.dg/torture/vshuf-32.inc: Add interleave permutations. > * gcc.dg/torture/vshuf-16.inc: Likewise. > * gcc.dg/torture/vshuf-8.inc: Likewise. > * gcc.dg/torture/vshuf-4.inc: Likewise. Ok. Although I think a good followup would be to fix > + if (which == 3 && d.op0 == d.op1) > +{ > + rtx seq; > + bool ok; > + > + memcpy (d.perm, perm, sizeof (perm)); > + d.op1 = gen_reg_rtx (d.vmode); > + start_sequence (); > + ok = ix86_expand_vec_perm_builtin_1 (&d); > + seq = get_insns (); > + end_sequence (); > + if (ok) > + { > + emit_move_insn (d.op1, d.op0); > + emit_insn (seq); this so that we don't need a copy to another register. That could be done by adding a d.one_operand field, and using that test instead of explicit equality everywhere. r~
Re: C++ PATCH for c++/49172 (allow constexpr reference variables)
On 06/22/2011 12:19 AM, Jason Merrill wrote: We decided at the Madrid meeting to allow constexpr reference variables, and this patch adds that support to GCC. The cp_parser_initializer_clause hunk also fixes 50787, a bug with direct binding of references in 4.6, so I'm applying that one there. Tested x86_64-pc-linux-gnu.
Re: [4.6,patch,libgfortran/io] Fix build issue with _commit [PR5016]
Pre-remark: Jakub (4.6.2 RM) had no objects - and thus I have committed it to the branch. Steve Kargl wrote: I don't have an objection, just a question. Does this issue appear on trunk? Yes and no. The MinGW build issue this patch fixes was unique to the branch. However, the actual issue - the slow down - exists both on the 4.6 branch and on the trunk. The problem is that Janne and I couldn't agree when to call _commit(). We solved that by deferring it to later for the trunk (still time before the 4.7.0 release) while we wanted to have some fix for 4.6.2 - thus, we agreed on some fix for 4.6. The fix on the branch now calls significatly less often _commit() than before - but it probably still calls it too often. Regarding rushing with the fix: I agree that is was less optimal; the main problem is that we do not have any MinGW gfortran developer who can properly test such patches and who can detect such regressions more timely. (The performance regression dates back over a year.) Tobias
Re: [PATCH] Add an intermediate coverage format for gcov
On Oct 19, 2011, at 12:12 PM, Sharad Singhai wrote: > Sorry, I misunderstood your comment. I see that you are asking for > unmangled function names whereas the current patch supports only > mangled names. I can print unmangled names under another option. Would > that work? I defer to the person that says ok. :-)
Re: [PATCH] Further i?86 vector permutation fixes
On Wed, Oct 19, 2011 at 01:36:25PM -0700, Richard Henderson wrote: > Although I think a good followup would be to fix > this so that we don't need a copy to another register. > That could be done by adding a d.one_operand field, and > using that test instead of explicit equality everywhere. Perhaps. BTW, I've recently noticed that CSE isn't able to CSE when SET_DEST is a SUBREG, even when it is same size SUBREG. Apparently there are hundreds of places in at least the i386 backend that do emit_insn (gen_* (gen_lowpart (V???mode, something), ...)); Some of them could be changed without big difficulties (if something is used in the same routine, just create the pseudo with different mode and use gen_lowpart on the use side instead), but there are many places that use say gen_lowpart (V???mode, target) on the LHS. Should CSE be taught to handle this, or should CSE early on replace these (set (subreg:M (reg:N X) 0) ...) with (set (reg:M Y) ...) (set (reg:N X) (subreg:N (reg:M Y) 0)) so that CSE etc. can optimize it, something else? Jakub
[PATCH, testsuite]: Fix PR 50796, FAIL: gcc.dg/plugin/plugindir[1234].c
Hello! Update dg-prune-output due to inacessible -> inaccessible change. 2011-10-19 Uros Bizjak PR testsuite/50796 * gcc.dg/plugin/plugindir?.c Update dg-prune-output. Tested on x86_64-pc-linux-gnu, committed to mainline SVN. Uros. Index: gcc.dg/plugin/plugindir1.c === --- gcc.dg/plugin/plugindir1.c (revision 180206) +++ gcc.dg/plugin/plugindir1.c (working copy) @@ -1,4 +1,4 @@ /* { dg-do compile } */ /* { dg-options "-c -fplugin=foo" } */ -/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short plugin name.*" } */ +/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from short plugin name.*" } */ Index: gcc.dg/plugin/plugindir2.c === --- gcc.dg/plugin/plugindir2.c (revision 180206) +++ gcc.dg/plugin/plugindir2.c (working copy) @@ -1,4 +1,4 @@ /* { dg-do compile } */ /* { dg-options "-save-temps -c -fplugin=foo" } */ -/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short plugin name.*" } */ +/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from short plugin name.*" } */ Index: gcc.dg/plugin/plugindir3.c === --- gcc.dg/plugin/plugindir3.c (revision 180206) +++ gcc.dg/plugin/plugindir3.c (working copy) @@ -1,4 +1,4 @@ /* { dg-do preprocess } */ /* { dg-options "-fplugin=foo" } */ -/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short plugin name.*" } */ +/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from short plugin name.*" } */ Index: gcc.dg/plugin/plugindir4.c === --- gcc.dg/plugin/plugindir4.c (revision 180206) +++ gcc.dg/plugin/plugindir4.c (working copy) @@ -1,4 +1,4 @@ /* { dg-do preprocess } */ /* { dg-options "-iplugindir=my-plugindir -fplugin=foo" } */ -/* { dg-prune-output ".*inacessible plugin file.*my-plugindir/foo\.so expanded from short plugin name.*" } */ +/* { dg-prune-output ".*inaccessible plugin file.*my-plugindir/foo\.so expanded from short plugin name.*" } */
Re: [PATCH, testsuite]: Fix PR 50796, FAIL: gcc.dg/plugin/plugindir[1234].c
Uros Bizjak writes: > Index: gcc.dg/plugin/plugindir1.c > === > --- gcc.dg/plugin/plugindir1.c(revision 180206) > +++ gcc.dg/plugin/plugindir1.c(working copy) > @@ -1,4 +1,4 @@ > /* { dg-do compile } */ > /* { dg-options "-c -fplugin=foo" } */ > > -/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short > plugin name.*" } */ > +/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from > short plugin name.*" } */ Another case of pruning too much. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
regcprop.c bug fix
So while tracking down a hairy address reload for an output reload bug, copyprop_hardreg_forward_1 was faulting because it was trying to extract move patterns that didn't work out, and when it came back to the code, it then tries to access recog_data, but the problem is, the exploration of other instructions to see if they match, overwrites that data, and there is nothing that restores the data to a point in which the code below this point expects. It uses recog_data.operand[i], where i is limited by n_ops, but that value corresponded to the old data in recog_data. The recog and extract_insn in insn_invalid_p called from verify_changes called from apply_change_group called from validate_change wipes the `old' recog_data with new data. This data, for example, might only have 2 operands, with an invalid value for the third operand. The old n_ops, might well be 3 from the original data. Accessing that data can cause a crash. So, I don't know if people want to data regenerated, or it they want to cache the data, or if they want to save and restore it underneath a called api... Ok? Index: regcprop.c === --- regcprop.c (revision 1831) +++ regcprop.c (working copy) @@ -866,6 +866,10 @@ copyprop_hardreg_forward_1 (basic_block } } } + extract_insn (insn); + if (! constrain_operands (1)) + fatal_insn_not_found (insn); + preprocess_constraints (); no_move_special_case: any_replacements = false;
[PATCH, m68k] Fix floating-point comparisons with zero
Hi, It appears that m68k floating-point comparisons may have been subtly broken since 2007 when the following patch was applied: http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01570.html The bug shows with our internal 4.6-based branch, but appears to be latent on mainline (generated code is perturbed by r171236, an apparently-unrelated change -- if that commit alone is reverted, the bug reappears on current trunk). The discussion below was written wrt. the previously-mentioned 4.6 branch, so details may be slightly different on mainline, but the outcome is the same. The trouble is the code introduced in m68k.c:notice_update_cc, intended to reverse the sense of comparisons for one particular *cmp_68881/*cmp_cf alternative, inadvertently also inverts the sense for tst_68881/tst_cf (FP comparisons against zero). For the test case gcc.c-torture/execute/ieee/20010114-2.c, the badness only triggers when -freorder-blocks is in effect (-O2 and above). Two comparisons (of the same variable) against zero are munged into one in final.c after basic-block reordering makes them consecutive, and we get: ftst.s %d0 | x fjgt .L13 | (final.c omits a redundant "ftst.s %d0" here) fjngt .L2 | when we ought to have got: ftst.s %d0 | x fjgt .L13 | fjnlt .L2 | The bug only triggers due to the omission of the test, else it would happen (far) more often: if the ftst is actually emitted, the flags get reset before any harm can be done: (define_insn "tst_68881" [(set (cc0) (compare (match_operand:FP 0 "general_operand" "fm") (match_operand:FP 1 "const0_operand" "H")))] "TARGET_68881" { cc_status.flags = CC_IN_68881; /* unsets CC_REVERSED & other flags */ ... but, when final.c decides the second ftst is redundant, nothing unsets CC_REVERSED, so we actually (incorrectly) treat the comparison as having been done with reversed operands. We can fix this by never setting CC_REVERSED for ftst instructions in the first place. The problem is that the test (in notice_update_cc), if (cc_status.value2 && GET_CODE (cc_status.value2) == COMPARE && GET_MODE_CLASS (GET_MODE (XEXP (cc_status.value2, 0))) == MODE_FLOAT) { cc_status.flags = CC_IN_68881; if (!FP_REG_P (XEXP (cc_status.value2, 0))) cc_status.flags |= CC_REVERSED; } is not strict enough, and triggers for ftst instructions (even elided ones), as well as for the intended third alternative of fcmp. So, the fix is to tighten up the inner condition to not trigger for ftst. The attached patch does that, and we get the following test result delta (as well as some noise in libstdc++ results, but I think those are environment-related): FAIL -> PASS: gcc.c-torture/execute/ieee/20010114-2.c execution, -O2 FAIL -> PASS: gcc.c-torture/execute/ieee/20010114-2.c execution, -O3 -fomit-frame-pointer FAIL -> PASS: gcc.c-torture/execute/ieee/20010114-2.c execution, -O3 -g FAIL -> PASS: gcc.c-torture/execute/ieee/20030331-1.c execution, -O2 FAIL -> PASS: gcc.c-torture/execute/ieee/20030331-1.c execution, -O3 -fomit-frame-pointer FAIL -> PASS: gcc.c-torture/execute/ieee/20030331-1.c execution, -O3 -g These results are against mainline *with r171236 reverted*, cross to ColdFire Linux -- otherwise results are the same before/after. I believe the patch is still needed regardless though. OK to apply? Thanks, Julian ChangeLog gcc/ * config/m68k/m68k.c (notice_update_cc): Tighten condition for setting CC_REVERSED for FP comparisons. Index: gcc/config/m68k/m68k.c === --- gcc/config/m68k/m68k.c (revision 180197) +++ gcc/config/m68k/m68k.c (working copy) @@ -4206,7 +4206,8 @@ notice_update_cc (rtx exp, rtx insn) && GET_MODE_CLASS (GET_MODE (XEXP (cc_status.value2, 0))) == MODE_FLOAT) { cc_status.flags = CC_IN_68881; - if (!FP_REG_P (XEXP (cc_status.value2, 0))) + if (!FP_REG_P (XEXP (cc_status.value2, 0)) + && FP_REG_P (XEXP (cc_status.value2, 1))) cc_status.flags |= CC_REVERSED; } }
[C++ Patch] PR 48630 (PR 31423)
Hi, in these two twin PRs filed by Ian and Gerald, it is pointed out that cases like: struct C { int f() { return 1; } }; int f(C&c) { return ( 1 == c.f ); } where the user actually forgot the open-closed round braces are much more likely than cases where an ampersand is missing, still we output invalid use of member (did you forget the ‘&’ ?) Thus, the idea of saying instead invalid use of member (did you forget the ‘()’ ?) which I implemented in the patchlet below. Alternately we could give both hints, or refer to the argument list. Tested x86_64-linux. Thanks, Paolo. /// /cp 2011-10-19 Paolo Carlini PR c++/31423 PR c++/48630 * typeck2.c (cxx_incomplete_type_diagnostic): Improve error message for invalid use of member. /testsuite 2011-10-19 Paolo Carlini PR c++/31423 PR c++/48630 * g++.dg/parse/error42.C: New. Index: testsuite/g++.dg/parse/error42.C === --- testsuite/g++.dg/parse/error42.C(revision 0) +++ testsuite/g++.dg/parse/error42.C(revision 0) @@ -0,0 +1,5 @@ +// PR c++/48630 +// { dg-options "" } + +class C { public: C* f(); int get(); }; // { dg-error "forget the '\\(\\)'|base operand" } +int f(C* p) { return p->f->get(); } Index: cp/typeck2.c === --- cp/typeck2.c(revision 180206) +++ cp/typeck2.c(working copy) @@ -429,7 +429,7 @@ cxx_incomplete_type_diagnostic (const_tree value, case OFFSET_TYPE: bad_member: emit_diagnostic (diag_kind, input_location, 0, - "invalid use of member (did you forget the %<&%> ?)"); + "invalid use of member (did you forget the %<()%> ?)"); break; case TEMPLATE_TYPE_PARM:
Re: [C++ Patch] PR 48630 (PR 31423)
On Wed, Oct 19, 2011 at 4:34 PM, Paolo Carlini wrote: > Hi, > > in these two twin PRs filed by Ian and Gerald, it is pointed out that cases > like: > > struct C { > int f() { return 1; } > }; > > int f(C&c) { > return ( 1 == c.f ); > } > > where the user actually forgot the open-closed round braces are much more > likely than cases where an ampersand is missing, still we output > > invalid use of member (did you forget the ‘&’ ?) > > Thus, the idea of saying instead > > invalid use of member (did you forget the ‘()’ ?) > > which I implemented in the patchlet below. Alternately we could give both > hints, or refer to the argument list. I agree that '()' is a better default. But we can do better. We can use the type context, e.g. in initialization or return-statement etc., to decide whether () is intended (by looking at the TREE_TYPE of a member function type), and only in cases where we can't tell we suggest both alternative: (did you intend a function call or address of non-static member function?)
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
what kind of failures? David On Wed, Oct 19, 2011 at 2:04 PM, wrote: > On 2011/10/19 20:38:35, kcc wrote: >> >> Added code to avoid bitfields. > > Is there anything I should know about splitting basic blocks in the > presence of EH? > Currently, asan fails on 483.xalancbmk, 453.povray and 450.soplex, which > are known to have EH. > > http://codereview.appspot.com/5272048/ >
Re: [PATCH] Account for devirtualization opportunities in inliner
On 28/09/2011, at 4:56 PM, Maxim Kuvyrkov wrote: > Jan, > > The following patch starts a series of patches which improve devirtualization > optimizations in GCC. > > This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing > parameters and jump functions and adds basic estimation of devirtualization > benefit from inlining an edge. E.g., if inlining A across edge E into B will > allow some of the indirect edges of A to be resolved, then inlining cost of > edge E is reduced. > > The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 and > -m64 multilibs. > > OK to commit? Ping. The primary change of this patch is to make evaluate_conditions_for_edge to output KNOWN_VALS and KNOWN_BINFOS arrays in addition to conditions for a callsite. KNOWN_VALS and KNOWN_BINFOS are then passed on to a subroutine of estimate_calls_size_and_time, which uses ipa-prop.c infrastructure to check if it will be possible to devirtualize any of the indirect edged within callee. If possible, then *size and *time returned by estimate_calls_size_and_time are reduced to account for the devirtualization benefits. OK for trunk? -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics fsf-gcc-devirt-account.ChangeLog Description: Binary data fsf-gcc-devirt-account.patch Description: Binary data
Re: [Patch, Fortran] PR 47023: C_Sizeof: Rejects valid code
>> If I saw it correctly, one still needs to fix the vendor-extension SIZEOF >> for procedure pointers (it currently returns one byte!) > > Right. Actually I don't quite understand where that 1 byte comes from. I have just committed the patch from comment #23 of the PR (which has been approved by Tobias privately), rejecting procedure pointers as actual arguments of SIZEOF: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180210 Patch includes documentation in intrinsic.texi (where I also fixed another minor nit). Cheers, Janus
Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence
gcc-patches-ow...@gcc.gnu.org wrote on 17/10/2011 09:03:59 AM: > From: Richard Guenther > To: Razya Ladelsky/Haifa/IBM@IBMIL > Cc: gcc-patches@gcc.gnu.org, Sebastian Pop > Date: 17/10/2011 09:04 AM > Subject: Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence > Sent by: gcc-patches-ow...@gcc.gnu.org > > On Mon, Oct 17, 2011 at 8:23 AM, Razya Ladelsky wrote: > > This patch fixes the failures described in > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960 > > It also fixes bzips when run with autopar enabled. > > > > In both cases the self dependences are not handled correctly. > > In the first case, a non affine access is analyzed: > > in the second, the distance vector is not calculated correctly (the > > distance vector considered for for self dependences is always (0,0,...)) > > > > As a result, the loops get wrongfully parallelized. > > > > The patch avoids the special handling of self dependences, and analyzes > > all dependences in the same way. Specific adjustments > > and support for the self dependence cases were made. > > Can you elaborate on > > @@ -3119,8 +3135,11 @@ add_other_self_distances (struct data_dependence_r > { > if (DDR_NUM_SUBSCRIPTS (ddr) != 1) >{ > -DDR_ARE_DEPENDENT (ddr) = chrec_dont_know; > -return; > +if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop (DR_ACCESS_FN > (DDR_A (ddr), 1))) > + { > +DDR_ARE_DEPENDENT (ddr) = chrec_dont_know; > +return; > + } >} > > access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > > ? It needed a comment before, and now so even more. > > The rest of the patch is ok, I suppose the above hunk is to enhance > something, not > to fix the bug? For fortran code like: DO 140 J=1,MB DO 130 K=1,NA BKJ=B(K,J) IF(BKJ.EQ.ZERO) GO TO 130 DO 120 I=1,MA C(I,J)=C(I,J)+A(K,I)*BKJ 120 CONTINUE 130CONTINUE 140 CONTINUE RETURN The access functions for the C(i j) self dependence are: (Data Dep: #(Data Ref: # bb: 9 # stmt: D.1427_79 = *c_78(D)[D.1426_77]; # ref: *c_78(D)[D.1426_77]; # base_object: *c_78(D); # Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, stride.12_25}_1, +, 1}_3 # Access function 1: 0B #) #(Data Ref: # bb: 9 # stmt: *c_78(D)[D.1426_77] = D.1433_88; # ref: *c_78(D)[D.1426_77]; # base_object: *c_78(D); # Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, stride.12_25}_1, +, 1}_3 # Access function 1: 0B #) Two dimesions are created to describe C(i j) although there's no need for access function 1 which is just 0B. If this was a C code, we would have these two access functions for C[i][j]: (Data Dep: #(Data Ref: # bb: 5 # stmt: t_10 = C[i_33][j_37]; # ref: C[i_33][j_37]; # base_object: C # Access function 0: {3, +, 1}_3 # Access function 1: {3, +, 1}_2 #) #(Data Ref: # bb: 5 # stmt: C[i_33][j_37] = D.3852_15; # ref: C[i_33][j_37]; # base_object: C # Access function 0: {3, +, 1}_3 # Access function 1: {3, +, 1}_2 #) In order to handle the Fortran data accesses, even for simple cases as above, I would need to handle multivariate accesses. The data dependence analysis doesn't know how to handle such dependences if there's more than one subscript. The above Frotran code doesn't actually have two subscripts, but one, and thus should be handled. The reason this issue came up only with the changes of this patch is that now add_other_self_distances is called from build_classic_dist_vector, which is called also for self dependences. Before the patch, the distance vector for self dependences was always determined as a vector of 0's, and build_classic_dist_vector was not called. I hope it's clearer now, I will add a comment to the code, and submit it before committing it. Thanks, Razya > > Thanks, > Richard. > > > Bootstrap and testsuite pass successfully for ppc64-redhat-linux. > > > > OK for trunk? > > Thank you, > > Razya > > > > > > ChangeLog: > > > >PR tree-optimization/49960 > >* tree-data-ref.c (compute_self_dependence): Remove. > > (initialize_data_dependence_relation): Add intializations. > > Remove compute_self_dependence. > > (add_other_self_distances): Add support for two dimensions if > > the second is zero. > > (compute_affine_dependence): Remove the !DDR_SELF_REFERENCE > > condition. > > (compute_all_dependences): Remove call to > > compute_self_dependence. Add call to compute_affine_dependence > > > testsuite/ChangeLog: > > > >PR tree-optimization/49660 > >* gcc.dg/autopar/pr49660.c: New test. > > * gcc.dg/autopar/pr49660-1.c: New test. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: [trans-mem] Add gl_wt TM method.
Committed with the changes below. On Wed, 2011-10-19 at 12:17 -0700, Richard Henderson wrote: > > Add support for TM-method-specific begin code. > > > > * libitm_i.h (GTM::gtm_restart_reason): Re-arrange and clean up > > declarations. > > * dispatch.h (GTM::abi_dispatch::begin_or_restart): New. > > * method-serial.cc: Implement begin_or_restart(). > > * beginend.cc (GTM::gtm_thread::begin_transaction): Call > > dispatch-specific begin_or_restart(). > > (GTM::gtm_thread::restart): Same. > > Ok except, > > > + // Run dispatch-specific restart code. Retry until we succeed. > > + GTM::gtm_restart_reason rr; > > + while ((rr = disp->begin_or_restart()) > > + != NUM_RESTARTS) > > Please add > > NO_RESTART = NUM_RESTARTS > > (or it's own number *after* NUM_RESTARTS, or -1, or something) > to the enumeration and use that name. Using num_restarts here is confusing. > > > Fixed gtm_thread::serialirr_mode to actually use serialirr, not serial. > > > > * method-serial.cc (GTM::gtm_thread::serialirr_mode): Fixed: Use > > serial-irrevocable dispatch, not serial. > > Ok. > > > Do not free transaction-local memory when committing a nested > > transaction. > > > > * alloc.cc (commit_allocations_2): Do not free transaction-local > > memory when committing a nested transaction. > > Ok. > > > Handle re-initialization of the current method group. > > > > * retry.cc (GTM::gtm_thread::decide_retry_strategy): Handle > > re-initialization of the current method group. > > * libitm_i.h (GTM::gtm_restart_reason): Add restart reason for > > this. > > Ok. > > > Undo log is used for both thread-local and shared data. > > > > * libitm_i.h: Renamed gtm_local_undo to gtm_undolog_entry. > > (GTM::gtm_thread): Renamed local_undo to undolog. Renamed > > undolog-related member functions from *_local to *_undolog. > > * local.cc (gtm_thread::commit_undolog): Same. > > * beginend.cc (GTM::gtm_thread::trycommit): Same. > > (GTM::gtm_thread::rollback): Roll back undolog before > > dispatch-specific rollback. > > Ok. > > > Ensure privatization safety if requested by a TM method. > > > > * beginend.cc (GTM::gtm_thread::trycommit): Ensure privatization > > safety if requested by a TM method. > > * dispatch.h (GTM::abi_dispatch::trycommit): Add parameter for > > privatization safety. > > * method-serial.cc: Same. > > Ok. > > > Add gl_wt TM method. > > > > * libitm_i.h: Add gl_wt dispatch. > > * retry.cc (parse_default_method): Same. > > * method-gl.cc: New file. > > * Makefile.am: Use method-gl.cc. > > * Makefile.in: Rebuild. > > Ok with... > > > Fix gl_wt commit/rollback when serial lock has been acquired. > > > > * method-gl.cc (gl_wt_dispatch::trycommit): Fix interaction with > > gtm_thread::shared_state when the serial lock is acquired. > > (gl_wt_dispatch::rollback): Same. > > ... this merged with the previous commit. > > > Fix TLS read accesses on Linux/x86. > > > > * config/linux/x86/tls.h (abi_disp): Make TLS slot read > > volatile. > > (gtm_thr): Same. > > Ok. > > > r~ >
Re: [PATCH] Don't assume that constants can clobber vtbl
On 11/10/2011, at 4:05 AM, Martin Jambor wrote: > > Just before this, there is the following test (guarded by > flag_strict_aliasing per explicit Richi's request). > > if (flag_strict_aliasing > && !POINTER_TYPE_P (TREE_TYPE (lhs))) > return false; > > Why is it not enough to catch integer constants? It appears to be enough. The patch I sent was developed against GCC 4.6 and I didn't notice your change when updated it for 4.7. Thanks for fixing this. -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
C++ PATCH for c++/50793 (incomplete value-initialization with default argument)
When bot_manip regenerates TARGET_EXPR/AGGR_INIT_EXPR, it needs to preserve the AGGR_INIT_ZERO_FIRST flag. Tested x86_64-pc-linux-gnu, applying to trunk and release branches. commit f56f4ec3e3620b4ae4e07986a17338d607952c65 Author: Jason Merrill Date: Wed Oct 19 17:21:34 2011 -0400 PR c++/50793 * tree.c (bot_manip): Propagate AGGR_INIT_ZERO_FIRST. diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index 75aa265..d023eb8 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -1879,8 +1879,12 @@ bot_manip (tree* tp, int* walk_subtrees, void* data) tree u; if (TREE_CODE (TREE_OPERAND (t, 1)) == AGGR_INIT_EXPR) - u = build_cplus_new (TREE_TYPE (t), TREE_OPERAND (t, 1), - tf_warning_or_error); + { + u = build_cplus_new (TREE_TYPE (t), TREE_OPERAND (t, 1), + tf_warning_or_error); + if (AGGR_INIT_ZERO_FIRST (TREE_OPERAND (t, 1))) + AGGR_INIT_ZERO_FIRST (TREE_OPERAND (u, 1)) = true; + } else u = build_target_expr_with_type (TREE_OPERAND (t, 1), TREE_TYPE (t), tf_warning_or_error); diff --git a/gcc/testsuite/g++.dg/init/value9.C b/gcc/testsuite/g++.dg/init/value9.C new file mode 100644 index 000..4899bd8 --- /dev/null +++ b/gcc/testsuite/g++.dg/init/value9.C @@ -0,0 +1,32 @@ +// PR c++/50793 +// { dg-do run } + +struct NonTrivial +{ + NonTrivial() { } +}; + +struct S +{ + NonTrivial nt; + int i; +}; + +int f(S s) +{ + s.i = 0xdeadbeef; + return s.i; +} + +int g(S s = S()) +{ + return s.i; +} + +int main() +{ + f(S()); // make stack dirty + + if ( g() ) +__builtin_abort(); +}
Re: Intrinsics for N2965: Type traits and base classes
> Please post a ChangeLog entry with a patch. Someone added one for > you: > > 2011-10-17 Michael Spertus > > * gcc/c-family/c-common.c (c_common_reswords): Add __bases, > __direct_bases. > * gcc/c-family/c-common.h: Add RID_BASES and RID_DIRECT_BASES. > > but it is in the wrong file (c-family has its own ChangeLog) and > shouldn't contain the gcc/c-family prefix (paths are relative to the > directory). Sorry Eric, this was my fault. I see that Paolo has already fixed this up. Thanks Paolo! best, benjamin
[C++ Patch] PR 13657
Hi, this tries to implement what Jason suggested in the audit trail: do not refer to an argument and use instead 'cannot convert...", like, eg, in cvt.c. Tested x86_64-linux. Thanks, Paolo. // /cp 2011-10-19 Paolo Carlini PR c++/13657 * class.c (instantiate_type): Fix error message. /testsuite 2011-10-19 Paolo Carlini PR c++/13657 * g++.dg/parse/error42.C: New. Index: testsuite/g++.dg/parse/error42.C === --- testsuite/g++.dg/parse/error42.C(revision 0) +++ testsuite/g++.dg/parse/error42.C(revision 0) @@ -0,0 +1,4 @@ +// PR c++/13657 + +class C { public: int (*f())(); int bar(); }; +int (*C::f())() { return C::bar; } // { dg-error "cannot convert 'C::bar'" } Index: cp/class.c === --- cp/class.c (revision 180208) +++ cp/class.c (working copy) @@ -6867,8 +6867,8 @@ instantiate_type (tree lhstype, tree rhs, tsubst_f else { if (flags & tf_error) - error ("argument of type %qT does not match %qT", - TREE_TYPE (rhs), lhstype); + error ("cannot convert %qE from type %qT to type %qT", + rhs, TREE_TYPE (rhs), lhstype); return error_mark_node; } }
[SH] PR 50694 - SH2A little endian does not actually work
Hello, the attached patch addresses PR 50694 and also does some cleanups in sh.md. I haven't run the testsuite with this one since it's basically just text replacements, except for the changed cmpgeusi_t insn. However, CSiBE didn't show any code size changes with the changed insn. As far as I could observe it, the check for the zero special case is already being done before insn matching. Or am I missing something there? If desired I could run the testsuite anyways of course. Cheers, Oleg ChangeLog: 2011-10-19 Oleg Endo PR target/50694 * config/sh/sh.h (IS_LITTLE_ENDIAN_OPTION, UNSUPPORTED_SH1, UNSUPPORTED_SH2A): New macros. (DRIVER_SELF_SPECS): Filter out unsupported options taking the default configuration into account. * config/sh/sh.c (MSW, LSW): Move macro definitions to header file. * config/sh/sh.md: Replace TARGET_LITTLE_ENDIAN conditionals with MSW and LSW macros where applicable. (cmpgeusi_t): Change insn_and_split to insn. Remove check for unsigned greater or equal zero comparison and its split. (divsi_inv_qitable, divsi_inv_hitable): Fix warning for redundant '@' with single alternative. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 180080) +++ gcc/config/sh/sh.c (working copy) @@ -61,9 +61,6 @@ int code_for_indirect_jump_scratch = CODE_FOR_indirect_jump_scratch; -#define MSW (TARGET_LITTLE_ENDIAN ? 1 : 0) -#define LSW (TARGET_LITTLE_ENDIAN ? 0 : 1) - /* These are some macros to abstract register modes. */ #define CONST_OK_FOR_ADD(size) \ (TARGET_SHMEDIA ? CONST_OK_FOR_I10 (size) : CONST_OK_FOR_I08 (size)) Index: gcc/config/sh/sh.h === --- gcc/config/sh/sh.h (revision 180080) +++ gcc/config/sh/sh.h (working copy) @@ -417,10 +417,37 @@ #define SH_DIV_STR_FOR_SIZE "call" #endif -#define DRIVER_SELF_SPECS "%{m2a:%{ml:%eSH2a does not support little-endian}}" +/* SH1 and SH2A do not support little-endian. Catch such combinations + taking into account the default configuration. */ +#if TARGET_ENDIAN_DEFAULT == MASK_BIG_ENDIAN +#define IS_LITTLE_ENDIAN_OPTION "%{ml:" +#else +#define IS_LITTLE_ENDIAN_OPTION "%{!mb:" +#endif +#if TARGET_CPU_DEFAULT == SELECT_SH1 +#define UNSUPPORTED_SH1 IS_LITTLE_ENDIAN_OPTION \ +"%{m1|!m2*:%{!m3*:%{!m4*:%{!m5*:%eSH1 does not support little-endian}" +#else +#define UNSUPPORTED_SH1 IS_LITTLE_ENDIAN_OPTION \ +"%{m1:%eSH1 does not support little-endian}}" +#endif + +#if TARGET_CPU_DEFAULT & MASK_HARD_SH2A +#define UNSUPPORTED_SH2A IS_LITTLE_ENDIAN_OPTION \ +"%{m2a*|!m1:%{!m2*:%{!m3*:%{!m4*:{!m5*:%eSH2a does not supportlittle-endian}}" +#else +#define UNSUPPORTED_SH2A IS_LITTLE_ENDIAN_OPTION \ +"%{m2a*:%eSH2a does not support little-endian}}" +#endif + +#define DRIVER_SELF_SPECS UNSUPPORTED_SH1, UNSUPPORTED_SH2A + #define ASSEMBLER_DIALECT assembler_dialect +#define MSW (TARGET_LITTLE_ENDIAN ? 1 : 0) +#define LSW (TARGET_LITTLE_ENDIAN ? 0 : 1) + extern int assembler_dialect; enum sh_divide_strategy_e { Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 180080) +++ gcc/config/sh/sh.md (working copy) @@ -806,19 +806,12 @@ ;; SImode unsigned integer comparisons ;; - -(define_insn_and_split "cmpgeusi_t" +(define_insn "cmpgeusi_t" [(set (reg:SI T_REG) (geu:SI (match_operand:SI 0 "arith_reg_operand" "r") - (match_operand:SI 1 "arith_reg_or_0_operand" "rN")))] + (match_operand:SI 1 "arith_reg_operand" "r")))] "TARGET_SH1" "cmp/hs %1,%0" - "&& operands[1] == CONST0_RTX (SImode)" - [(pc)] - " -{ - emit_insn (gen_sett ()); - DONE; -}" [(set_attr "type" "mt_group")]) (define_insn "cmpgtusi_t" @@ -943,15 +936,12 @@ (match_dup 6)] " { - operands[2] -= gen_rtx_REG (SImode, - true_regnum (operands[0]) + (TARGET_LITTLE_ENDIAN ? 1 : 0)); + operands[2] = gen_rtx_REG (SImode, true_regnum (operands[0]) + MSW); operands[3] = (operands[1] == const0_rtx ? const0_rtx - : gen_rtx_REG (SImode, - true_regnum (operands[1]) - + (TARGET_LITTLE_ENDIAN ? 1 : 0))); + : gen_rtx_REG (SImode, true_regnum (operands[1]) + MSW)); + operands[4] = gen_lowpart (SImode, operands[0]); operands[5] = gen_lowpart (SImode, operands[1]); operands[6] = gen_label_rtx (); @@ -1465,12 +1455,9 @@ " { rtx high0, high2, low0 = gen_lowpart (SImode, operands[0]); - high0 = gen_rtx_REG (SImode, - true_regnum (operands[0]) - + (TARGET_LITTLE_ENDIAN ? 1 : 0)); - high2 = gen_rtx_REG (SImode, - true_regnum (operands[2]) - + (TARGET_LITTLE_ENDIAN ? 1 : 0)); + high0 = gen_rtx_REG (SImode, true_regnum (operands[0]) + MSW); + high2 = gen_rtx_REG (SImode, true
Re: [C++ Patch] PR 13657
OK. Jason
Re: [C++ Patch] PR 48630 (PR 31423)
Surely we should only make this change for function members. Jason
[v3] tr2: bool_set, dynamic_bitset, ratio
Hi Edward! I have reviewed and selectively merged your development branch, libstdcxx-tr2-ideas-branch, into trunk. As ISO C++ is now looking at new library efforts for TR2 (and N2965 has already been checked in to trunk) I thought it well-timed. Please check my work, patch attached. On libstdcxx-tr2-ideas-branch, I found: 1) N1718 + extensions for polynomial. 2) N2136 bool_set 3) N2050 dynamic_bitset 4) ratio extensions for binary 5) cmath extensions 6) C++0x rope Of these, I selected 2, 3, and 4 for merging. It is sad to leave N1718 unmerged, I think there is a lot of good work here and frankly see N1744 (marked with open status for LWG) as a weaker candidate. But it's hard for me to justify putting this in given that it was explicitly rejected by LWG. Perhaps you have current status or thoughts? For the rope extensions, I would suggest just adding C++0x bits to ext/rope, instead of making tr2/rope. The additions look interesting but I just saw declarations and no actual implementation so I punted. I would encourage you to continue working on TR2 projects! This future work can now proceed on mainline, without a special branch. tested x86_64/linux -benjamin 20111019-1.patch.bz2 Description: application/bzip
Avoid gcc.dg/tree-prof/val-prof-7.c dependence on
The testcase gcc.dg/tree-prof/val-prof-7.c includes to get a declaration of bzero. This causes it to fail on targets where bzero (a legacy function removed in the latest version of POSIX) is not declared in that header; declaring it explicitly in the testcase is more reliable. This patch changes the include to an explicit declaration. Tested with cross to i686-mingw32 (where the header just includes and does not provide a declaration of bzero). OK to commit? 2011-10-19 Joseph Myers * gcc.dg/tree-prof/val-prof-7.c: Declare bzero instead of including . Index: gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c === --- gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (revision 180200) +++ gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (working copy) @@ -1,7 +1,7 @@ /* { dg-options "-O2 -fdump-ipa-profile -mtune=core2" } */ /* { dg-skip-if "" { ! { i?86-*-* x86_64-*-* } } { "*" } { "" } } */ -#include +extern void bzero (void *, __SIZE_TYPE__); int foo(int len) { -- Joseph S. Myers jos...@codesourcery.com
new patches using -fopt-info (issue5294043)
After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. 2011-10-19 Rong Xu * gcc/common.opt (fopt-info): New flag. (fopt-info=) Ditto. * gcc/opts.c (common_handle_option): Handle OPT_fopt_info_. * gcc/flag-types.h (opt_info_verbosity_levels): New enum. * gcc/value-prof.c (check_ic_counter): guard warnings/notes by flag_opt_info. (find_func_by_funcdef_no): Ditto. (check_ic_target): Ditto. (check_counter): Ditto. (check_ic_counter): Ditto. * gcc/mcf.c (find_minimum_cost_flow): Ditto. * gcc/profile.c (read_profile_edge_counts): Ditto. (compute_branch_probabilities): Ditto. * gcc/coverage.c (read_counts_file): Ditto. (get_coverage_counts): Ditto. * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto. (maybe_issue_profile_use_note): Ditto. (optimize_reusedist): Ditto. * gcc/testsuite/gcc.dg/pr32773.c: add -fopt-info. * gcc/testsuite/gcc.dg/pr40209.c: Ditto. * gcc/testsuite/gcc.dg/pr26570.c: Ditto. * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto. Index: gcc/value-prof.c === --- gcc/value-prof.c(revision 180106) +++ gcc/value-prof.c(working copy) @@ -472,9 +472,10 @@ : DECL_SOURCE_LOCATION (current_function_decl); if (flag_profile_correction) { - inform (locus, "correcting inconsistent value profile: " - "%s profiler overall count (%d) does not match BB count " - "(%d)", name, (int)*all, (int)bb_count); + if (flag_opt_info >= OPT_INFO_MAX) +inform (locus, "correcting inconsistent value profile: %s " + "profiler overall count (%d) does not match BB count " +"(%d)", name, (int)*all, (int)bb_count); *all = bb_count; if (*count > *all) *count = *all; @@ -510,33 +511,42 @@ location_t locus; if (*count1 > all && flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, "Correcting inconsistent value profile: " - "ic (topn) profiler top target count (%ld) exceeds " - "BB count (%ld)", (long)*count1, (long)all); + if (flag_opt_info >= OPT_INFO_MAX) +{ + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, "Correcting inconsistent value profile: " + "ic (topn) profiler top target count (%ld) exceeds " + "BB count (%ld)", (long)*count1, (long)all); +} *count1 = all; } if (*count2 > all && flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, "Correcting inconsistent value profile: " - "ic (topn) profiler second target count (%ld) exceeds " - "BB count (%ld)", (long)*count2, (long)all); + if (flag_opt_info >= OPT_INFO_MAX) +{ + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, "Correcting inconsistent value profile: " + "ic (topn) profiler second target count (%ld) exceeds " + "BB count (%ld)", (long)*count2, (long)all); +} *count2 = all; } if (*count2 > *count1) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, "Corrupted topn ic value profile: " - "first target count (%ld) is less than the second " - "target count (%ld)", (long)*count1, (long)*count2); + if (flag_opt_info >= OPT_INFO_MAX) +{ + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, "Corrupted topn ic value profile: " + "first target count (%ld) is less than the second " + "target count (%ld)", (long)*count1, (long)*count2); +} return true; } @@ -548,12 +558,16 @@ *count2 = all - *count1; else
Re: new patches using -fopt-info (issue5294043)
x...@google.com (Rong Xu) writes: > After some off-line discussion, we decided to use a more general approach > to control the printing of optimization messages/warnings. We will > introduce a new option -fopt-info: > * fopt-info=0 or fno-opt-info: no message will be emitted. > * fopt-info or fopt-info=1: emit important warnings and optimization >messages with large performance impact. > * fopt-info=2: warnings and optimization messages targeting power users. > * fopt-info=3: informational messages for compiler developers. It would be interested to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only