Re: [PATCH 1/2] Add missing page rounding of a page_entry

2011-10-19 Thread Jakub Jelinek
On Wed, Oct 19, 2011 at 08:40:07AM +0200, Andi Kleen wrote:
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index 2da99db..ba88e3f 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -736,6 +736,7 @@ alloc_page (unsigned order)
>entry_size = num_objects * OBJECT_SIZE (order);
>if (entry_size < G.pagesize)
>  entry_size = G.pagesize;
> +  entry_size = ROUND_UP (entry_size, G.pagesize);

Isn't the "if (entry_size < G.pagesize) entry_size = G.pagesize;"
above this now redundant?  I'm fairly sure we never call this with
zero num_objects or zero OBJECT_SIZE (order) and for anything
else ROUND_UP should round < pagesize to pagesize, right?

Jakub


[PATCH, PR50769] Fix ICE in phi_alternatives_equal.

2011-10-19 Thread Tom de Vries
Richard,

For the example from the PR, -ftree-tail-merge discovers that blocks 6 and 7 are
equal, removes block 7 and redirects the outgoing edge of it's predecessor block
11 to block 6.

Block 6 contains a phi, which after redirecting the edge from block 11 looks
like this:
...
.MEM_54 = PHI <.MEM_27(18), .MEM_26(17), (11)>
...

Since update_vops is false, TODO_update_ssa_only_virtuals will be run
and -ftree-tail-merge doesn't update the phi.

However during cleanup_tree_cfg (which is done before updating the ssa),
phi_alternatives_equal is called which asserts that the phi arguments are 
defined.

The patch fixes the ICE by setting the empty phi argument to .MEM.

bootstrapped and reg-tested on x86_64.

Ok for trunk?

Thanks,
- Tom

2011-10-19  Tom de Vries  

PR 50769/tree-optimization
* tree-ssa-tail-merge.c (replace_block_by): Calculate phi_vuse2
unconditionally.  Handle case that phi_vuse2 is not an SSA_NAME.  Add
dummy argument .MEM to phi when increasing number of arguments of phi by
redirecting edges to the block with phi.
Index: gcc/tree-ssa-tail-merge.c
===
--- gcc/tree-ssa-tail-merge.c (revision 179773)
+++ gcc/tree-ssa-tail-merge.c (working copy)
@@ -1470,11 +1470,14 @@ replace_block_by (basic_block bb1, basic
   edge e;
   edge_iterator ei;
 
+  phi_vuse2 = vop_at_entry (bb2);
+  if (phi_vuse2 != NULL_TREE && TREE_CODE (phi_vuse2) != SSA_NAME)
+phi_vuse2 = NULL_TREE;
+
   if (update_vops)
 {
   /* Find the vops at entry of bb1 and bb2.  */
   phi_vuse1 = vop_at_entry (bb1);
-  phi_vuse2 = vop_at_entry (bb2);
 
   /* If one of the 2 not found, it means there's no need to update.  */
   update_vops = phi_vuse1 != NULL_TREE && phi_vuse2 != NULL_TREE;
@@ -1507,6 +1510,9 @@ replace_block_by (basic_block bb1, basic
   gcc_assert (pred_edge != NULL);
   if (update_vops)
 	VEC_safe_push (edge, heap, redirected_edges, pred_edge);
+  else if (phi_vuse2 && gimple_bb (SSA_NAME_DEF_STMT (phi_vuse2)) == bb2)
+	add_phi_arg (SSA_NAME_DEF_STMT (phi_vuse2), SSA_NAME_VAR (phi_vuse2),
+		 pred_edge, UNKNOWN_LOCATION);
 }
 
   /* Update the vops.  */


Re: [4.6] Fix type of SRAed enum accesses

2011-10-19 Thread Richard Guenther
On Tue, Oct 18, 2011 at 2:21 PM, Jakub Jelinek  wrote:
> On Tue, Sep 27, 2011 at 03:26:03PM +0100, Richard Sandiford wrote:
>> This patch fixes a miscompilation of stage1 c-parser.o in an ARM bootstrap.
>> When an access to an enum field was SRAed, a component ref used the type
>> of the integer temporary variable instead of the type of the enum.
>> It therefore didn't alias other accesses to the same structure,
>> and was scheduled after a copy-load.
>>
>> Tested on x86_64-linux-gnu, and by verifying that c-parser.o is correctly
>> compiled for ARM after the patch.  Martin says he's going to test on ia64
>> too (thanks) -- I'll add 50326 to the changelog if that goes OK.
>>
>> OK to install if there are no regressions on ia64?
>
> As mentioned in bugzilla, the patch that caused this regression has been
> committed to 4.6 branch as well.
>
> Here is a backport of that patch to 4.6 branch, bootstrapped/regtested on
> x86_64-linux and i686-linux and Matthias AFAIK bootstrapped it on ia64-linux
> (where it previously failed to bootstrap because of this), regtest pending.
>
> Ok for 4.6?

Ok.

Richard.

> 2011-10-18  Jakub Jelinek  
>
>        PR target/50350
>        Backport from mainline
>        2011-09-27  Richard Sandiford  
>
>        PR middle-end/50386
>        PR middle-end/50326
>        * tree-sra.c (build_ref_for_model): Use the type of the field as
>        the type of the COMPONENT_REF.
>
> --- gcc/tree-sra.c.jj   2011-09-30 17:19:10.0 +0200
> +++ gcc/tree-sra.c      2011-10-18 08:01:28.682647833 +0200
> @@ -1448,12 +1448,12 @@ build_ref_for_model (location_t loc, tre
>  {
>   if (TREE_CODE (model->expr) == COMPONENT_REF)
>     {
> -      tree t, exp_type;
> -      offset -= int_bit_position (TREE_OPERAND (model->expr, 1));
> +      tree t, exp_type, fld = TREE_OPERAND (model->expr, 1);
> +      offset -= int_bit_position (fld);
>       exp_type = TREE_TYPE (TREE_OPERAND (model->expr, 0));
>       t = build_ref_for_offset (loc, base, offset, exp_type, gsi, 
> insert_after);
> -      return fold_build3_loc (loc, COMPONENT_REF, model->type, t,
> -                             TREE_OPERAND (model->expr, 1), NULL_TREE);
> +      return fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (fld), t, fld,
> +                             NULL_TREE);
>     }
>   else
>     return build_ref_for_offset (loc, base, offset, model->type,
>
>        Jakub
>


Re: [patch testsuite]: Adjust tree-ssa/builtin-expect-*.c tests for high cost-branching optimization

2011-10-19 Thread Richard Guenther
On Tue, Oct 18, 2011 at 1:33 PM, Kai Tietz  wrote:
> Hello,
>
> this patch adjusts __builtin_expect tests in tree-ssa so, that short-circuit 
> branch optimization don't lead to fallout.  I've used here a multiplication, 
> as simple substraction/addition might get optimized away.
>
> ChangeLog
>
> 2011-10-18  Kai Tietz  
>
>        * gcc.dg/tree-ssa/builtin-expect-1.c: Adjust test.
>        * gcc.dg/tree-ssa/builtin-expect-2.c: Adjust test.
>        * gcc.dg/tree-ssa/builtin-expect-3.c: Adjust test.
>        * gcc.dg/tree-ssa/builtin-expect-4.c: Adjust test.
>        * gcc.dg/tree-ssa/builtin-expect-5.c: Adjust test.
>
>
> Regression tested for x86_64-unknown-linux-gnu. Ok for apply?

Ok.

Thanks,
Richard.

> Regards,
> Kai
>
> Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c
> ===
> --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c
> +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-1.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-gimple" } */
>
> -f (int i, float j)
> +f (int i, float j, int i2, float j2)
>  {
> -  if (__builtin_expect (i > 0 && j, 0))
> +  if (__builtin_expect ((i * i2) > 0 && (j * j2), 0))
>     g ();
>  }
>
> Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c
> ===
> --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c
> +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-2.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-gimple" } */
>
> -f (int i, float j)
> +f (int i, float j, int i2, float j2)
>  {
> -  if (__builtin_expect (i > 0 || j, 0))
> +  if (__builtin_expect ((i * i2) > 0 || (j * j2), 0))
>     ;
>   else
>     g ();
> Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c
> ===
> --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c
> +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-3.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-gimple" } */
>
> -f (int i, float j)
> +f (int i, float j, int i2, float j2)
>  {
> -  if (__builtin_expect (i > 0 && j, 0))
> +  if (__builtin_expect ((i * i2) > 0 && (j * j2), 0))
>     a ();
>   else
>     b ();
> Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c
> ===
> --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c
> +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-4.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-gimple" } */
>
> -f (int i, float j)
> +f (int i, float j, int i2, float j2)
>  {
> -  if (__builtin_expect (i > 0 || j, 0))
> +  if (__builtin_expect ((i * i2) > 0 || (j * j2), 0))
>     a ();
>   else
>     b ();
> Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c
> ===
> --- gcc-head.orig/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c
> +++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/builtin-expect-5.c
> @@ -1,9 +1,9 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-forwprop" } */
>
> -f (int i, float j)
> +f (int i, float j, int i2, float j2)
>  {
> -  if (__builtin_expect (i > 0 && __builtin_expect (j != 0, 1), 0))
> +  if (__builtin_expect ((i * i2) > 0 && __builtin_expect ((j * j2) != 0, 1), 
> 0))
>     a ();
>   else
>     b ();
>


Re: [PATCH] Fix gcc.dg/guality/asm-1.c regressions on 4.6 branch

2011-10-19 Thread Richard Guenther
On Tue, Oct 18, 2011 at 5:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> Apparently a recent change (2011-10-1[23] or so) regressed
> gcc.dg/guality/asm-1.c test on x86_64-linux on the 4.6 branch.
> This turned out to be a bug in loc_descriptor which has been passing wrong
> mode on the recursive call, so when we had var_location note containing
> SImode subreg of some DImode expression, the inner loc_descriptor would
> just give up because of the mode mismatch.
>
> Fixed thusly (here the trunk version of the patch, attached 4.6 version),
> bootstrapped/regtested on x86_64-linux and i686-linux (both branches in each
> case), ok for trunk/4.6?

Ok.

Thanks,
Richard.

> 2011-10-18  Jakub Jelinek  
>
>        * dwarf2out.c (loc_descriptor): For SUBREG pass SUBREG_REG's mode as
>        second argument instead of mode.
>
> --- gcc/dwarf2out.c.jj  2011-09-08 11:21:10.0 +0200
> +++ gcc/dwarf2out.c     2011-10-18 14:09:46.0 +0200
> @@ -12502,7 +12502,8 @@ loc_descriptor (rtx rtl, enum machine_mo
>         legitimate to make the Dwarf info refer to the whole register which
>         contains the given subreg.  */
>       if (REG_P (SUBREG_REG (rtl)) && subreg_lowpart_p (rtl))
> -       loc_result = loc_descriptor (SUBREG_REG (rtl), mode, initialized);
> +       loc_result = loc_descriptor (SUBREG_REG (rtl),
> +                                    GET_MODE (SUBREG_REG (rtl)), 
> initialized);
>       else
>        goto do_default;
>       break;
>
>        Jakub
>


Re: [PATCH 2/2] Free large chunks in ggc

2011-10-19 Thread Jakub Jelinek
On Wed, Oct 19, 2011 at 08:40:08AM +0200, Andi Kleen wrote:
> From: Andi Kleen 
> 
> This implements the freeing back of large chunks in the ggc madvise path
> Richard Guenther asked for.  This way on systems with limited
> address space malloc() and other allocators still have
> a chance to get back at some of the memory ggc freed. The
> fragmented pages are still just given back, but the address space
> stays allocated.
> 
> I tried freeing only aligned 2MB areas to optimize for 2MB huge
> pages, but the hit rate was quite low, so I switched to 1MB+
> unaligned areas. The target size is a param now.

If the size to free is smaller than the quirk size, then it has the very
undesirable effect that with using GC only you might run unnecessarily out
of virtual address space, because it allocates pages in 2MB chunks, but
if they are released in 1MB chunks, those released chunks will never be
usable again for GC.  Consider on 32-bit address space allocating 3GB
of GC memory, then freeing stuff in every odd 1MB chunk of pages, then
wanting to allocate through GC the 1.5GB back.

IMHO we should munmap immediately in release_pages the > G.pagesize pages,
those are not very likely to be reused anyway (and it had one in between
ggc_collect cycle to be reused anyway), and for the == G.pagesize
(the usual case, the only ones that are allocated in GGC_QUIRK_SIZE sets)
we should note which page was the first one in the GGC_QUIRK_SIZE chunk
and munmap exactly those 2MB starting at the first page only.

Jakub


Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.

2011-10-19 Thread Richard Guenther
On Wed, Oct 19, 2011 at 1:58 AM, Maxim Kuvyrkov  wrote:
> On 29/09/2011, at 10:23 PM, Richard Guenther wrote:
>
>> On Thu, Sep 29, 2011 at 10:27 AM, Tom de Vries  
>> wrote:
>
>> +              && SSA_NAME_IS_DEFAULT_DEF (expr)
>> +              && TREE_CODE (var) == PARM_DECL
>> +              && POINTER_TYPE_P (TREE_TYPE (var))
>> +              && TYPE_USER_ALIGN (TREE_TYPE (TREE_TYPE (var
>> +       val = get_align_value (TYPE_ALIGN (TREE_TYPE (TREE_TYPE (var))),
>> +                              TREE_TYPE (var), 0);
>>
>> I realize that the scope where this applies is pretty narrow (and thus
>> bad fallout is quite unlikely).  But I suppose we should document
>> somewhere that GCC treats alignment attributes on parameters
>> more strict than say, on assignments.
>
> Richard, the intention here is for a follow up patch to change "&& 
> TYPE_USER_ALIGN" to "&& (TYPE_USER_ALIGN || flag_assume_aligned_parameters)". 
>  I understand that you will probably not like the idea of adding 
> flag_assume_aligned_parameters, but it wouldn't make such an option any less 
> valuable for experienced users of GCC.  For some users this option will the 
> additional piece of rope to hang themselves; for others it will produce good 
> benefits of better performance.
>
> [Disclaimer: I've put significant amount of my time into investigation of 
> this problem and hate to see it all go to waste :-).]

It's not wasted, but using it as part of a bad solution isn't good either ;)

I do want a start to a more general solution that also will fix the ever present
wrong-code bugs on strict-align targets with respect to excessive alignment
we assume for them.  This doesn't seem to be one, but instead it seems
to introduce more inconsistencies which is very bad.

What exact problems do you try to solve (I presume not the artificial
testcase in the PR)?

Thanks,
Richard.


Weakref and cgraph part 1

2011-10-19 Thread Jan Hubicka
Hi,
this patch makes cgraph code to also handle wekrefs as long as their
destination declaration is known.  The cases where it is not still needs
updating at lto-symtab I plan to do incrementally.

The basic idea is to handle weakrefs as external declaration (so they
are subject of unrechable code removal) and make cgraph code to output
those that are still needed.

I had this in my tree for a while, so hope it will not cause much surprises.
Bootstrapped/regtested x86_64-linux & tested on Mozilla/libreoffice LTO

Honza

* cgraphunit.c (handle_alias_pairs): Also handle wekref with destination
declared.
(output_weakrefs): New function.
* varpool.c (varpool_create_variable_alias): Handle external aliases.
Index: cgraphunit.c
===
*** cgraphunit.c(revision 180053)
--- cgraphunit.c(working copy)
*** handle_alias_pairs (void)
*** 1219,1225 
for (i = 0; VEC_iterate (alias_pair, alias_pairs, i, p);)
  {
if (TREE_CODE (p->decl) == FUNCTION_DECL
-  && !lookup_attribute ("weakref", DECL_ATTRIBUTES (p->decl))
  && (target_node = cgraph_node_for_asm (p->target)) != NULL)
{
  src_node = cgraph_get_node (p->decl);
--- 1219,1224 
*** handle_alias_pairs (void)
*** 1231,1242 
 However for weakref we insist on EXTERNAL flag being set.
 See gcc.dg/attr-alias-5.c  */
  if (DECL_EXTERNAL (p->decl))
!   DECL_EXTERNAL (p->decl) = 0;
  cgraph_create_function_alias (p->decl, target_node->decl);
  VEC_unordered_remove (alias_pair, alias_pairs, i);
}
else if (TREE_CODE (p->decl) == VAR_DECL
-  && !lookup_attribute ("weakref", DECL_ATTRIBUTES (p->decl))
   && (target_vnode = varpool_node_for_asm (p->target)) != NULL)
{
  /* Normally EXTERNAL flag is used to mark external inlines,
--- 1230,1241 
 However for weakref we insist on EXTERNAL flag being set.
 See gcc.dg/attr-alias-5.c  */
  if (DECL_EXTERNAL (p->decl))
!   DECL_EXTERNAL (p->decl) = lookup_attribute ("weakref",
!   DECL_ATTRIBUTES 
(p->decl)) != NULL;
  cgraph_create_function_alias (p->decl, target_node->decl);
  VEC_unordered_remove (alias_pair, alias_pairs, i);
}
else if (TREE_CODE (p->decl) == VAR_DECL
   && (target_vnode = varpool_node_for_asm (p->target)) != NULL)
{
  /* Normally EXTERNAL flag is used to mark external inlines,
*** handle_alias_pairs (void)
*** 1245,1251 
 However for weakref we insist on EXTERNAL flag being set.
 See gcc.dg/attr-alias-5.c  */
  if (DECL_EXTERNAL (p->decl))
!   DECL_EXTERNAL (p->decl) = 0;
  varpool_create_variable_alias (p->decl, target_vnode->decl);
  VEC_unordered_remove (alias_pair, alias_pairs, i);
}
--- 1244,1251 
 However for weakref we insist on EXTERNAL flag being set.
 See gcc.dg/attr-alias-5.c  */
  if (DECL_EXTERNAL (p->decl))
!   DECL_EXTERNAL (p->decl) = lookup_attribute ("weakref",
!   DECL_ATTRIBUTES 
(p->decl)) != NULL;
  varpool_create_variable_alias (p->decl, target_vnode->decl);
  VEC_unordered_remove (alias_pair, alias_pairs, i);
}
*** ipa_passes (void)
*** 2064,2069 
--- 2064,2089 
bitmap_obstack_release (NULL);
  }
  
+ /* Weakrefs may be associated to external decls and thus not output
+at expansion time.  Emit all neccesary aliases.  */
+ 
+ void
+ output_weakrefs (void)
+ {
+   struct cgraph_node *node;
+   struct varpool_node *vnode;
+   for (node = cgraph_nodes; node; node = node->next)
+ if (node->alias && node->thunk.alias && DECL_EXTERNAL (node->decl)
+ && !TREE_ASM_WRITTEN (node->decl))
+   assemble_alias (node->decl,
+ DECL_ASSEMBLER_NAME (node->thunk.alias));
+   for (vnode = varpool_nodes; vnode; vnode = vnode->next)
+ if (vnode->alias && vnode->alias_of && DECL_EXTERNAL (vnode->decl)
+ && !TREE_ASM_WRITTEN (vnode->decl))
+   assemble_alias (vnode->decl,
+ DECL_ASSEMBLER_NAME (vnode->alias_of));
+ }
+ 
  
  /* Perform simple optimizations based on callgraph.  */
  
*** cgraph_optimize (void)
*** 2150,2155 
--- 2170,2177 
  
varpool_assemble_pending_decls ();
  }
+ 
+   output_weakrefs ();
cgraph_process_new_functions ();
cgraph_state = CGRAPH_STATE_FINISHED;
  
Index: varpool.c
===
*** varpool.c   (revision 180052)
--- varpool.c   (working copy)
*** varpool_create_variable_alias (tree alia
*** 703,711 
gcc_assert (TREE_CODE (alias) =

Re: patch ping: Dump the "degree of overlap" to compare static profile with instrumentation profile

2011-10-19 Thread Jan Hubicka
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01293.html

Hi,
this seems fine for mainline.  I am not 100% sure it is best choice of metric
(i.e. mostly one is interested if hot blocks identified by static profile cover
hot blocks of real profile), but we could get some experience with it.

Thanks,
Honza
> 
> Thanks!
> Dehao


Re: [PATCH] Add an intermediate coverage format for gcov

2011-10-19 Thread Jan Hubicka
> On Oct 18, 2011, at 4:19 PM, Sharad Singhai  wrote:
> > Okay, I liked the idea of self-descriptive tags. I have updated the
> > patch based on your suggestions. I have simplified the format
> > somewhat. Instead of repeating function name, I use a 'function' tag
> > with the format
> > 
> > function:,,
> 
> Sound nice.
> 
> > I also dropped the unmangled function names, they were turning out to
> > be too unreadable and not really useful in this context.
> 
> Ah, I'd argue for mangled names.  Every one knows they can stream through 
> c++filt and get unmangled, but once you unmangle, you can never go back.  
> Also, the mangled version is significantly smaller.  For c, it is irrelevant, 
> for c++, it makes a big difference.

I would also support unmangled variant. Otherwise the patch seems resonable to 
me.

Honza
> > 


Re: [Patch,AVR]: PR50447: Tweak addhi3

2011-10-19 Thread Denis Chertykov
2011/10/19 Georg-Johann Lay :
> Georg-Johann Lay schrieb:
>>
>> Denis Chertykov schrieb:
>>
>>> What difference in size of avr-libc ?
>>
>> I have no tool for smart size analysis, so here is just a diff:
>>
>> After rebuilding avr-libc with respective compiler version, did
>> respectively:
>>
>> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt
>> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt
>>
>> and then
>>
>> $ diff -U 0 size-orig.txt size-patch.txt > size.diff
>>
>> As far as I can see, there is not a big gain but no object increases in
>> size.
>>
>> For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o size gain is 3%.
>> For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7% and for others just one
>> instruction better.
>
> Actually there are some cases where the size increases by one instruction:
>
>> -    464              0       0     464     1d0 realloc.o (ex
>> ./avr/lib/avr31/libc.a)
>> +    466              0       0     466     1d2 realloc.o (ex
>> ./avr/lib/avr31/libc.a)
>
>> -    464              0       0     464     1d0 realloc.o (ex
>> ./avr/lib/avr3/libc.a)
>> +    466              0       0     466     1d2 realloc.o (ex
>> ./avr/lib/avr3/libc.a)
>
> Will have a look tomorrow; presumably it's adding +/-1 that force a scratch
> whilst the old code does not.
>

However results are good.
The patch can be committed.

Denis.


Re: [PATCH][ARM] -m{cpu,tune,arch}=native

2011-10-19 Thread Andrew Stubbs

On 18/10/11 15:42, Andrew Stubbs wrote:

Which still jumps to not_found without closing f.


Hmmm, I know I fixed that, I know I did!

But I appear to have lost the change somewhere when I updated my checkout?

I'll fix it now.


Fixed and committed as attached.

Apologies for the cock-up. :(

Andrew
2011-10-18  Andrew Stubbs  

	* config/arm/driver-arm.c (host_detect_local_cpu): Close the file
	before exiting.

---
 src/gcc-mainline/gcc/config/arm/driver-arm.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/src/gcc-mainline/gcc/config/arm/driver-arm.c b/src/gcc-mainline/gcc/config/arm/driver-arm.c
index 43b6e58..9a6762b 100644
--- a/src/gcc-mainline/gcc/config/arm/driver-arm.c
+++ b/src/gcc-mainline/gcc/config/arm/driver-arm.c
@@ -75,7 +75,7 @@ host_detect_local_cpu (int argc, const char **argv)
 {
   const char *val = NULL;
   char buf[128];
-  FILE *f;
+  FILE *f = NULL;
   bool arch;
   const struct vendor_cpu *cpu_table = NULL;
 
@@ -134,6 +134,10 @@ not_found:
 unsigned int i;
 unsigned int opt;
 const char *search[] = {NULL, "arch"};
+
+if (f)
+  fclose (f);
+
 search[0] = argv[0];
 for (opt = 0; opt < ARRAY_SIZE (search); opt++)
   for (i = 0; i < ARRAY_SIZE (configure_default_options); i++)


[PATCH] Fix PR50768, rewrite gimplify_and_update_call_from_tree

2011-10-19 Thread Richard Guenther

So my previous patch has caused some bugs - not unreasonable given
the complexity of that function.  Thus the following patch makes
it readable and understandable again, by exchanging the intricate
one-stage processing with a two-stage one, first assigning VDEFs
and then assigning VUSEs in a second stage.  Also doing the
lhs assignment processing upfront which removes the need for
all the duplicate code at the tail.

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu, let's
hope it can stay this simple ;)

Thanks,
Richard.

2011-10-19  Richard Guenther  

PR middle-end/50768
* gimple-fold.c (gimplify_and_update_call_from_tree): Rewrite.

* gcc.dg/torture/pr50768.c: New testcase.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 180166)
--- gcc/gimple-fold.c   (working copy)
*** void
*** 534,558 
  gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr)
  {
tree lhs;
-   tree tmp = NULL_TREE;  /* Silence warning.  */
gimple stmt, new_stmt;
gimple_stmt_iterator i;
gimple_seq stmts = gimple_seq_alloc();
struct gimplify_ctx gctx;
!   gimple last = NULL;
!   gimple laststore = NULL;
tree reaching_vuse;
  
stmt = gsi_stmt (*si_p);
  
gcc_assert (is_gimple_call (stmt));
  
-   lhs = gimple_call_lhs (stmt);
-   reaching_vuse = gimple_vuse (stmt);
- 
push_gimplify_context (&gctx);
gctx.into_ssa = gimple_in_ssa_p (cfun);
  
if (lhs == NULL_TREE)
  {
gimplify_and_add (expr, &stmts);
--- 534,555 
  gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr)
  {
tree lhs;
gimple stmt, new_stmt;
gimple_stmt_iterator i;
gimple_seq stmts = gimple_seq_alloc();
struct gimplify_ctx gctx;
!   gimple last;
!   gimple laststore;
tree reaching_vuse;
  
stmt = gsi_stmt (*si_p);
  
gcc_assert (is_gimple_call (stmt));
  
push_gimplify_context (&gctx);
gctx.into_ssa = gimple_in_ssa_p (cfun);
  
+   lhs = gimple_call_lhs (stmt);
if (lhs == NULL_TREE)
  {
gimplify_and_add (expr, &stmts);
*** gimplify_and_update_call_from_tree (gimp
*** 571,675 
}
  }
else
! tmp = get_initialized_tmp_var (expr, &stmts, NULL);
  
pop_gimplify_context (NULL);
  
if (gimple_has_location (stmt))
  annotate_all_with_location (stmts, gimple_location (stmt));
  
!   /* The replacement can expose previously unreferenced variables.  */
for (i = gsi_start (stmts); !gsi_end_p (i); gsi_next (&i))
  {
if (last)
{
  gsi_insert_before (si_p, last, GSI_NEW_STMT);
  gsi_next (si_p);
}
new_stmt = gsi_stmt (i);
if (gimple_in_ssa_p (cfun))
find_new_referenced_vars (new_stmt);
/* If the new statement possibly has a VUSE, update it with exact SSA
 name we know will reach this one.  */
if (gimple_has_mem_ops (new_stmt))
!   {
! /* If we've also seen a previous store create a new VDEF for
!the latter one, and make that the new reaching VUSE.  */
! if (laststore)
!   {
! reaching_vuse = make_ssa_name (gimple_vop (cfun), laststore);
! gimple_set_vdef (laststore, reaching_vuse);
! update_stmt (laststore);
! laststore = NULL;
!   }
! gimple_set_vuse (new_stmt, reaching_vuse);
! gimple_set_modified (new_stmt, true);
!   }
!   if (gimple_assign_single_p (new_stmt)
! && !is_gimple_reg (gimple_assign_lhs (new_stmt)))
!   {
! laststore = new_stmt;
!   }
last = new_stmt;
  }
  
!   if (lhs == NULL_TREE)
  {
!   /* If we replace a call without LHS that has a VDEF and our new
!  sequence ends with a store we must make that store have the same
!vdef in order not to break the sequencing.  This can happen
!for instance when folding memcpy calls into assignments.  */
!   if (gimple_vdef (stmt) && laststore)
!   {
! gimple_set_vdef (laststore, gimple_vdef (stmt));
! if (TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
!   SSA_NAME_DEF_STMT (gimple_vdef (stmt)) = laststore;
! update_stmt (laststore);
!   }
!   else if (gimple_in_ssa_p (cfun))
{
  unlink_stmt_vdef (stmt);
! release_defs (stmt);
!   }
!   new_stmt = last;
! }
!   else
! {
!   if (last)
!   {
! gsi_insert_before (si_p, last, GSI_NEW_STMT);
! gsi_next (si_p);
!   }
!   if (laststore && is_gimple_reg (lhs))
!   {
! gimple_set_vdef (laststore, gimple_vdef (stmt));
! update_stmt (laststore);
! if (TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
!   SSA_NAME_DEF_STMT (gimple_vdef (stmt)) = laststore;
! laststore = NULL;
!   }
!   else if (laststore)
!   {
! reaching_

Re: [trans-mem] Rename/split __transaction into __transaction_atomic and __transaction_relaxed.

2011-10-19 Thread Torvald Riegel
On Tue, 2011-10-18 at 14:29 -0500, Aldy Hernandez wrote:
> > Several ICEs in the TM tests on C++, but I think they are old. C TM
> > tests work except some missing optimizations (old failures too).
> > c-c++-common/tm/wrap-[12].c fail on C++, but not on C:
> > wrap-1.c:5:57: error: 'transaction_wrap' argument not an identifier
> > I believe this is an old error too?
> > (I don't yet have a testcase summary of before the changes, so I can't
> > compare properly. But will send an follow-up if there are regressions.)
> 
> Can you run the tests before your patch to make sure there are no 
> regressions?  That is, let's be sure that these are indeed old failures.

Indeed old failures, no new regressions. Tested and boot-strapped on
x86_64.

OK for branch?



Re: [C++ Patch] __builtin_choose_expr

2011-10-19 Thread Andy Gibbs
Andy Gibbs wrote on 18th October 2011 at 1:36 PM:
> Paolo Carlini wrote on 18th October 2011 at 12:00 PM:
>> I'm under the impression that some tests are written in C, wouldn't better
>> fit in c-c++-common?
> 
> Ok, I can do this but I think only 2 out of the 8 tests can be moved
> across.  I will then move the duplicates out of the gcc.dg folder too.

Actually, I managed to get 3 common tests moved out of gcc.dg (pre-existing)
and g++.dg/ext (from my original patch) into c-c++-common.

> I will post a replacement patch later this afternoon.

And here it is...  Is it now ready to be committed to trunk?

Regards,
Andy


=

Changelog:

2011-10-19  Andy Gibbs 

* c-family/c-common.c: Enable __builtin_choose_expr

* cp/cp-tree.def: Add tree node for handling __builtin_choose_expr
  within a template scenario
* cp/cp-tree.h: Ditto
* cp/cp-objcp-common.c: Ditto
* cp/decl.c: Ditto

* cp/parser.c (cp_parser_skip_to_closing_parenthesis): Remove the
  unnecessary requirement for 'recovering' flag to be set when
  'or_comma' flag is set
* cp/parser.c (cp_parser_skip_to_end_of_parameter): New function
* cp/parser.c (cp_parser_builtin_choose_expr): New function
* cp/parser.c (cp_parser_primary_expression): Use it
* cp/semantics.c (potential_constant_expression_1): Mark the use
  of __builtin_choose_expr as a potentially constant expression
* cp/semantics.c (evaluate_builtin_choose_expr_condition): New
  function
* cp/semantics.c (finish_builtin_choose_expr): New function
* cp/pt.c (tsubst_copy_and_build): Use it

* cp/cxx-pretty-print.h: New function pp_cxx_choose_expression
* cp/cxx-pretty-print.c: Document, define and use it
* cp/error.c: Use it

* doc/extend.text: Update docs

* testsuite/gcc.dg/builtin-choose-expr.c: Moved...
* testsuite/c-c++-common/builtin-choose-expr.c: ...to here
* testsuite/gcc.dg/builtin-choose-expr-2.c: Moved...
* testsuite/c-c++-common/builtin-choose-expr-2.c: ...to here
* testsuite/gcc.dg/Wunused-var-2.c: Moved...
* testsuite/c-c++-common/Wunused-var-15.c: ...to here
* testsuite/g++.dg/ext/builtin-choose-expr1.C: New test
* testsuite/g++.dg/ext/builtin-choose-expr2.C: New test
* testsuite/g++.dg/ext/builtin-choose-expr3.C: New test
* testsuite/g++.dg/ext/builtin-choose-expr4.C: New test
* testsuite/g++.dg/ext/builtin-choose-expr5.C: New test


--- gcc-4.7.0-r180072/gcc/c-family/c-common.c
+++ gcc-4.7.0-patched/gcc/c-family/c-common.c
@@ -423,7 +423,7 @@ const struct c_common_resword c_common_r
   { "__asm__", RID_ASM,0 },
   { "__attribute", RID_ATTRIBUTE,  0 },
   { "__attribute__",   RID_ATTRIBUTE,  0 },
-  { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
+  { "__builtin_choose_expr", RID_CHOOSE_EXPR, 0 },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
--- gcc-4.7.0-r180072/gcc/cp/cp-objcp-common.c
+++ gcc-4.7.0-patched/gcc/cp/cp-objcp-common.c
@@ -100,6 +100,8 @@ cp_tree_size (enum tree_code code)
 
 case TEMPLATE_INFO: return sizeof (struct tree_template_info);
 
+case CHOOSE_EXPR:   return sizeof (struct tree_choose_expr);
+
 default:
   gcc_unreachable ();
 }
@@ -301,6 +303,7 @@ cp_common_init_ts (void)
   MARK_TS_TYPED (USING_STMT);
   MARK_TS_TYPED (LAMBDA_EXPR);
   MARK_TS_TYPED (CTOR_INITIALIZER);
+  MARK_TS_TYPED (CHOOSE_EXPR);
 }
 
 #include "gt-cp-cp-objcp-common.h"
--- gcc-4.7.0-r180072/gcc/cp/cp-tree.def
+++ gcc-4.7.0-patched/gcc/cp/cp-tree.def
@@ -332,6 +332,9 @@ DEFTREECODE (TAG_DEFN, "tag_defn", tcc_e
 /* Represents an 'offsetof' expression during template expansion.  */
 DEFTREECODE (OFFSETOF_EXPR, "offsetof_expr", tcc_expression, 1)
 
+/* Represents an '__builtin_choose_expr' expression during template expansion. 
 */
+DEFTREECODE (CHOOSE_EXPR, "choose_expr", tcc_exceptional, 0)
+
 /* Represents a 'sizeof' expression during template expansion.  */
 DEFTREECODE (SIZEOF_EXPR, "sizeof_expr", tcc_expression, 1)
 
--- gcc-4.7.0-r180072/gcc/cp/cp-tree.h
+++ gcc-4.7.0-patched/gcc/cp/cp-tree.h
@@ -532,6 +532,33 @@ struct GTY (()) tree_deferred_noexcept {
 };
 
 
+/* The condition associated with __builtin_choose_expr.  This must be
+   an integral constant expression.  */
+#define CHOOSE_EXPR_CONDITION(NODE) \
+  (((struct tree_choose_expr *)CHOOSE_EXPR_CHECK (NODE))->condition)
+
+/* The expression to be used when condition in __builtin_choose_expr
+   is true*/
+#define CHOOSE_EXPR_TRUE(NODE) \
+  (((struct tree_choose_expr *)CHOOSE_EXPR_CHECK (NODE))->expr_true)
+
+/* The expression to be used when condition in __builtin_choose_expr
+   is false*/
+#define CHOOSE_EXPR_FALSE(NODE) \
+  (((struct tree_choose_expr *)CHOOSE_EXPR_CHECK (NODE)

[C++ Patch] PR 38761, 40872

2011-10-19 Thread Paolo Carlini

Hi,

these simple PRs are both about conditional expressions in error calls 
which apparently make the life very hard to gettext: today I double 
checked and current 0.18.1 is still unable to cope correctly with those, 
it just doesn't extract the second message. Thus the below, which simply 
uses G_() in a few more places.


Tested x86_64-linux (the make gcc.pot output too, of course)

Ok?

Thanks,
Paolo.

/
2011-10-19  Paolo Carlini  

PR c++/38761
PR c++/40872
* decl.c (duplicate_decls, make_typename_type, grokdeclarator): Use
G_() in error message strings to facilitate translation.
* semantics.c (finish_id_expression): Likewise.
* parser.c (cp_parser_nested_name_specifier_opt,
cp_parser_parameter_declaration): Likewise.
Index: decl.c
===
--- decl.c  (revision 180180)
+++ decl.c  (working copy)
@@ -1542,8 +1542,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool
  error_at (DECL_SOURCE_LOCATION (newdecl), errmsg, newdecl);
  if (DECL_NAME (olddecl) != NULL_TREE)
error ((DECL_INITIAL (olddecl) && namespace_bindings_p ())
-? "%q+#D previously defined here"
-: "%q+#D previously declared here", olddecl);
+  ? G_("%q+#D previously defined here")
+  : G_("%q+#D previously declared here"), olddecl);
  return error_mark_node;
}
   else if (TREE_CODE (olddecl) == FUNCTION_DECL
@@ -3236,8 +3236,8 @@ make_typename_type (tree context, tree name, enum
   if (!t)
 {
   if (complain & tf_error)
-   error (want_template ? "no class template named %q#T in %q#T"
-  : "no type named %q#T in %q#T", name, context);
+   error (want_template ? G_("no class template named %q#T in %q#T")
+  : G_("no type named %q#T in %q#T"), name, context);
   return error_mark_node;
 }
   
@@ -9143,13 +9143,13 @@ grokdeclarator (const cp_declarator *declarator,
   virtual.  A constructor may not be static.  */
if (staticp == 2)
  error ((flags == DTOR_FLAG)
-? "destructor cannot be static member function"
-: "constructor cannot be static member function");
+? G_("destructor cannot be static member function")
+: G_("constructor cannot be static member function"));
if (memfn_quals)
  {
error ((flags == DTOR_FLAG)
-  ? "destructors may not be cv-qualified"
-  : "constructors may not be cv-qualified");
+  ? G_("destructors may not be cv-qualified")
+  : G_("constructors may not be cv-qualified"));
memfn_quals = TYPE_UNQUALIFIED;
  }
 
@@ -9502,8 +9502,10 @@ grokdeclarator (const cp_declarator *declarator,
  && (!friendp || funcdef_flag))
{
  error (funcdef_flag
-? "cannot define member function %<%T::%s%> within %<%T%>"
-: "cannot declare member function %<%T::%s%> within 
%<%T%>",
+? G_("cannot define member function %<%T::%s%> "
+ "within %<%T%>")
+: G_("cannot declare member function %<%T::%s%> "
+ "within %<%T%>"),
 ctype, name, current_class_type);
  return error_mark_node;
}
@@ -10223,8 +10225,8 @@ grokdeclarator (const cp_declarator *declarator,
 || sfk == sfk_destructor)
  {
error (funcdef_flag
-  ? "%qs defined in a non-class scope"
-  : "%qs declared in a non-class scope", name);
+  ? G_("%qs defined in a non-class scope")
+  : G_("%qs declared in a non-class scope"), name);
sfk = sfk_none;
  }
  }
Index: semantics.c
===
--- semantics.c (revision 180180)
+++ semantics.c (working copy)
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-objc.h"
 #include "tree-inline.h"
 #include "tree-mudflap.h"
+#include "intl.h"
 #include "toplev.h"
 #include "flags.h"
 #include "output.h"
@@ -2985,8 +2986,8 @@ finish_id_expression (tree id_expression,
  else
{
  error (TREE_CODE (decl) == VAR_DECL
-? "use of % variable from containing function"
-: "use of parameter from containing function");
+? G_("use of % variable from containing function")
+: G_("use of parameter from containing function"));
  error ("  %q+#D de

[PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Sergey Ostanevich
Hi!

Here is a patch to address PR50572.
Successfully bootstrapped and passed make check on x86_64-linux.

regards,
Sergos



2011-10-19  Sergey Ostanevich  

* gcc/config/i386/i386.c (ix86_option_override_internal): use loop
align by 16 bytes for Atom platform

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..7a93144 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3497,6 +3497,8 @@ ix86_option_override_internal (bool main_args_p)
 {
   align_loops = processor_target_table[ix86_tune].align_loop;
   align_loops_max_skip =
processor_target_table[ix86_tune].align_loop_max_skip;
+  if (ix86_tune == PROCESSOR_ATOM && optimize > 1)
+align_loops_max_skip = 15;
 }
   if (align_jumps == 0)
 {


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Uros Bizjak
Hello!

> 2011-10-19  Sergey Ostanevich  
>
>   * gcc/config/i386/i386.c (ix86_option_override_internal): use loop
> align by 16 bytes for Atom platform
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 2c53423..7a93144 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3497,6 +3497,8 @@ ix86_option_override_internal (bool main_args_p)
>  {
>align_loops = processor_target_table[ix86_tune].align_loop;
>align_loops_max_skip =
> processor_target_table[ix86_tune].align_loop_max_skip;
> +  if (ix86_tune == PROCESSOR_ATOM && optimize > 1)
> +align_loops_max_skip = 15;
>  }
>if (align_jumps == 0)
>  {

You can just change the default in processor_target_table.

Uros.


[commit, spu] Fix PR target/50310 (unordered vector compares)

2011-10-19 Thread Ulrich Weigand
Hello,

the pr50310 test cases are failing on SPU with internal compiler errors.
It seems that if the back-end provides a vcond pattern, it must support
all comparison operations, including unordered ones which are missing
in the SPU implementation.

Since we don't actually have any specific instruction support for those
operations on SPU, the following patch open-codes their semantics.

Tested on spu-elf, fixes the pr50310 test cases.
Committed to mainline.

Bye,
Ulrich


ChangeLog:

PR target/50310
* config/spu/spu.c (spu_emit_vector_compare): Support unordered
floating-point comparisons.


Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 180156)
--- gcc/config/spu/spu.c(working copy)
*** spu_emit_vector_compare (enum rtx_code r
*** 6415,6427 
try_again = true;
break;
  case NE:
/* Treat A != B as ~(A==B).  */
{
  enum insn_code nor_code;
! rtx eq_rtx = spu_emit_vector_compare (EQ, op0, op1, dest_mode);
  nor_code = optab_handler (one_cmpl_optab, dest_mode);
  gcc_assert (nor_code != CODE_FOR_nothing);
! emit_insn (GEN_FCN (nor_code) (mask, eq_rtx));
  if (dmode != dest_mode)
{
  rtx temp = gen_reg_rtx (dest_mode);
--- 6415,6438 
try_again = true;
break;
  case NE:
+   case UNEQ:
+   case UNLE:
+   case UNLT:
+   case UNGE:
+   case UNGT:
+   case UNORDERED:
/* Treat A != B as ~(A==B).  */
{
+   enum rtx_code rev_code;
  enum insn_code nor_code;
!   rtx rev_mask;
! 
!   rev_code = reverse_condition_maybe_unordered (rcode);
! rev_mask = spu_emit_vector_compare (rev_code, op0, op1, 
dest_mode);
! 
  nor_code = optab_handler (one_cmpl_optab, dest_mode);
  gcc_assert (nor_code != CODE_FOR_nothing);
! emit_insn (GEN_FCN (nor_code) (mask, rev_mask));
  if (dmode != dest_mode)
{
  rtx temp = gen_reg_rtx (dest_mode);
*** spu_emit_vector_compare (enum rtx_code r
*** 6466,6471 
--- 6477,6524 
  return mask;
}
break;
+ case LTGT:
+   /* Try LT OR GT */
+   {
+ rtx lt_rtx, gt_rtx;
+ enum insn_code ior_code;
+ 
+ lt_rtx = spu_emit_vector_compare (LT, op0, op1, dest_mode);
+ gt_rtx = spu_emit_vector_compare (GT, op0, op1, dest_mode);
+ 
+ ior_code = optab_handler (ior_optab, dest_mode);
+ gcc_assert (ior_code != CODE_FOR_nothing);
+ emit_insn (GEN_FCN (ior_code) (mask, lt_rtx, gt_rtx));
+ if (dmode != dest_mode)
+   {
+ rtx temp = gen_reg_rtx (dest_mode);
+ convert_move (temp, mask, 0);
+ return temp;
+   }
+ return mask;
+   }
+   break;
+ case ORDERED:
+   /* Implement as (A==A) & (B==B) */
+   {
+ rtx a_rtx, b_rtx;
+ enum insn_code and_code;
+ 
+ a_rtx = spu_emit_vector_compare (EQ, op0, op0, dest_mode);
+ b_rtx = spu_emit_vector_compare (EQ, op1, op1, dest_mode);
+ 
+ and_code = optab_handler (and_optab, dest_mode);
+ gcc_assert (and_code != CODE_FOR_nothing);
+ emit_insn (GEN_FCN (and_code) (mask, a_rtx, b_rtx));
+ if (dmode != dest_mode)
+   {
+ rtx temp = gen_reg_rtx (dest_mode);
+ convert_move (temp, mask, 0);
+ return temp;
+   }
+ return mask;
+   }
+   break;
  default:
gcc_unreachable ();
  }
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Sergey Ostanevich
On Wed, Oct 19, 2011 at 4:14 PM, Uros Bizjak  wrote:
>
> You can just change the default in processor_target_table.
>
> Uros.
>

Will it be applicable during optimizations for size? It will hurt,
although not much (see PR).
New patch is below. Ok for trunk as obvious?

Sergos

2011-10-19  Sergey Ostanevich  

* gcc/config/i386/i386.c (ix86_option_override_internal): use loop
align by 16 bytes for Atom platform

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..8c60086 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2596,7 +2596,7 @@ static const struct ptt
processor_target_table[PROCESSOR_max] =
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
-  {&atom_cost, 16, 7, 16, 7, 16}
+  {&atom_cost, 16, 15, 16, 7, 16}
 };


Re: [C++ Patch] PR 38761, 40872

2011-10-19 Thread Jason Merrill

OK.

Jason


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Uros Bizjak
On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich  wrote:

>> You can just change the default in processor_target_table.
>>
>> Uros.
>>
>
> Will it be applicable during optimizations for size? It will hurt,
> although not much (see PR).

Looking at the code, I'd say that we don't handle -Os in different way.

> New patch is below. Ok for trunk as obvious?
>
> Sergos
>
> 2011-10-19  Sergey Ostanevich  
>
>        * gcc/config/i386/i386.c (ix86_option_override_internal): use loop
>        align by 16 bytes for Atom platform

Please update ChangeLog, like:

* gcc/config/i386/i386.c (processor_target_table): Change Atom
align_loop_max_skip to 15.

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 2c53423..8c60086 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2596,7 +2596,7 @@ static const struct ptt
> processor_target_table[PROCESSOR_max] =
>   {&bdver1_cost, 32, 24, 32, 7, 32},
>   {&bdver2_cost, 32, 24, 32, 7, 32},
>   {&btver1_cost, 32, 24, 32, 7, 32},
> -  {&atom_cost, 16, 7, 16, 7, 16}
> +  {&atom_cost, 16, 15, 16, 7, 16}
>  };


OK.

Thanks,
Uros.


[PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-19 Thread Kirill Yukhin
Hi,
Here is (almost obvous) patch, which fixes PR50766.

ChangeLog  entry:
2011-10-19  Kirill Yukhin  

* config/i386/i386.md (bmi_bextr_): Update register/
memory operand order.
(bmi2_bzhi_3): Ditto.
(bmi2_pdep_3): Ditto.
(bmi2_pext_3): Ditto.

Bootstrapped, test (from bug) passing.

Could you please have a look?

Thanks, K


pr50766.gcc.patch
Description: Binary data


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread Sergey Ostanevich
On Wed, Oct 19, 2011 at 4:46 PM, Uros Bizjak  wrote:
> On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich  
> wrote:
>
>>> You can just change the default in processor_target_table.
>>>
>>> Uros.
>>>
>>
>> Will it be applicable during optimizations for size? It will hurt,
>> although not much (see PR).
>
> Looking at the code, I'd say that we don't handle -Os in different way.
>
>> New patch is below. Ok for trunk as obvious?
>>
>> Sergos
>>
>> 2011-10-19  Sergey Ostanevich  
>>
>>        * gcc/config/i386/i386.c (ix86_option_override_internal): use loop
>>        align by 16 bytes for Atom platform
>
> Please update ChangeLog, like:
>
> * gcc/config/i386/i386.c (processor_target_table): Change Atom
> align_loop_max_skip to 15.
>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 2c53423..8c60086 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -2596,7 +2596,7 @@ static const struct ptt
>> processor_target_table[PROCESSOR_max] =
>>   {&bdver1_cost, 32, 24, 32, 7, 32},
>>   {&bdver2_cost, 32, 24, 32, 7, 32},
>>   {&btver1_cost, 32, 24, 32, 7, 32},
>> -  {&atom_cost, 16, 7, 16, 7, 16}
>> +  {&atom_cost, 16, 15, 16, 7, 16}
>>  };
>
>
> OK.
>
> Thanks,
> Uros.
>

Thanks for comments!
I double checked: for -Os there's no .p2align appeared.
For -O2 I see ".p2align 4,,15" instead of ".p2align 4,,7", as expected.

Can someone commit it please?

Regards,
Sergos


2011-10-18  Sergey Ostanevich 

* gcc/config/i386/i386.c (processor_target_table): Change Atom
align_loops_max_skip to 15.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..8c60086 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2596,7 +2596,7 @@ static const struct ptt
processor_target_table[PROCESSOR_max] =
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
-  {&atom_cost, 16, 7, 16, 7, 16}
+  {&atom_cost, 16, 15, 16, 7, 16}
 };

 static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =


[translation patch] Bunch of trivial PRs: 48638, 49517, 49704

2011-10-19 Thread Paolo Carlini

Hi,

a bunch of trivial fixes from Roland. The only one I deem *slightly* 
more controversial is "AST" uppercase instead of "ast" in 
cp/semantics.c, I'm gonna wait a bit in case Jason has a different 
opinion...


Thanks,
Paolo.


2011-10-19  Roland Stigge

PR translation/48638
* plugin.c (add_new_plugin): Fix typo in fatal_error message.

2011-10-19  Roland Stigge

PR translation/49517
* config/rx/rx.c (rx_print_operand): Fix typo in warning message.

/cp
2011-10-19  Roland Stigge  

PR translation/49704
* semantics.c (potential_constant_expression_1): Use AST instead of
ast in sorry message.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 180185)
+++ cp/semantics.c  (working copy)
@@ -8341,7 +8341,7 @@ potential_constant_expression_1 (tree t, bool want
   return false;
 
 default:
-  sorry ("unexpected ast of kind %s", tree_code_name[TREE_CODE (t)]);
+  sorry ("unexpected AST of kind %s", tree_code_name[TREE_CODE (t)]);
   gcc_unreachable();
   return false;
 }
Index: plugin.c
===
--- plugin.c(revision 180181)
+++ plugin.c(working copy)
@@ -1,5 +1,5 @@
 /* Support for GCC plugin mechanism.
-   Copyright (C) 2009, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -149,7 +149,7 @@ add_new_plugin (const char* plugin_name)
plugin_name, ".so", NULL);
   if (access (plugin_name, R_OK))
fatal_error
- ("inacessible plugin file %s expanded from short plugin name %s: %m",
+ ("inaccessible plugin file %s expanded from short plugin name %s: %m",
   plugin_name, base_name);
 }
   else
Index: config/rx/rx.c
===
--- config/rx/rx.c  (revision 180181)
+++ config/rx/rx.c  (working copy)
@@ -637,7 +637,7 @@ rx_print_operand (FILE * file, rtx op, int letter)
case 0xb: fprintf (file, "fintv"); break;
case 0xc: fprintf (file, "intb"); break;
default:
- warning (0, "unreocgnized control register number: %d - using 'psw'",
+ warning (0, "unrecognized control register number: %d - using 'psw'",
   (int) INTVAL (op));
  fprintf (file, "psw");
  break;


Re: [translation patch] Bunch of trivial PRs: 48638, 49517, 49704

2011-10-19 Thread Jason Merrill

OK.

Jason


Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-19 Thread Uros Bizjak
On Wed, Oct 19, 2011 at 3:07 PM, Kirill Yukhin  wrote:

> Here is (almost obvous) patch, which fixes PR50766.
>
> ChangeLog  entry:
> 2011-10-19  Kirill Yukhin  
>
>        * config/i386/i386.md (bmi_bextr_): Update register/
>        memory operand order.
>        (bmi2_bzhi_3): Ditto.
>        (bmi2_pdep_3): Ditto.
>        (bmi2_pext_3): Ditto.
>
> Bootstrapped, test (from bug) passing.
>
> Could you please have a look?

Please also add the testcase from the PR. You can use  { dg-do
assemble }, but you have to check for BMI2 effective target support.

BTW: I can't find BMI2 instruction reference documentation, so I'm
just rubberstamping the patch as obvious.

So, OK with the testcase.

Thanks,
Uros.


[PATCH] Fix PR50780, make comparisons able to throw again ...

2011-10-19 Thread Richard Guenther

This fixes PR50780 by validating a condition is gimple before
substituting it into a COND_EXPR during forwprop.  When looking
at this I noticed that comparisons no longer are marked as
possibly throwing - because of an old tuplification bug :/
Thus, fixed as well, hopefully without fallout.

Bootstrap and regtest running on x86_64-unknown-linux-gnu ...

Richard.

2011-10-19  Richard Guenther  

PR middle-end/50780
* tree-ssa-forwprop.c (forward_propagate_into_cond): Verify
the condition is properly gimple before using it.
* tree-eh (stmt_could_throw_1_p): Properly extract the
operation type from comparisons.

Index: gcc/tree-ssa-forwprop.c
===
*** gcc/tree-ssa-forwprop.c (revision 180186)
--- gcc/tree-ssa-forwprop.c (working copy)
*** forward_propagate_into_cond (gimple_stmt
*** 597,603 
}
  }
  
!   if (tmp)
  {
if (dump_file && tmp)
{
--- 597,604 
}
  }
  
!   if (tmp
!   && is_gimple_condexpr (tmp))
  {
if (dump_file && tmp)
{
Index: gcc/tree-eh.c
===
*** gcc/tree-eh.c   (revision 180186)
--- gcc/tree-eh.c   (working copy)
*** stmt_could_throw_1_p (gimple stmt)
*** 2512,2518 
|| TREE_CODE_CLASS (code) == tcc_unary
|| TREE_CODE_CLASS (code) == tcc_binary)
  {
!   t = gimple_expr_type (stmt);
fp_operation = FLOAT_TYPE_P (t);
if (fp_operation)
{
--- 2512,2524 
|| TREE_CODE_CLASS (code) == tcc_unary
|| TREE_CODE_CLASS (code) == tcc_binary)
  {
!   if (is_gimple_assign (stmt)
! && TREE_CODE_CLASS (code) == tcc_comparison)
!   t = TREE_TYPE (gimple_assign_rhs1 (stmt));
!   else if (gimple_code (stmt) == GIMPLE_COND)
!   t = TREE_TYPE (gimple_cond_lhs (stmt));
!   else
!   t = gimple_expr_type (stmt);
fp_operation = FLOAT_TYPE_P (t);
if (fp_operation)
{


Re: [PATCH 2/6] Generate virtual locations for tokens

2011-10-19 Thread Ulrich Weigand
Dodji Seketeli wrote:

> libcpp/ChangeLog
> 2011-10-15  Tom Tromey  
>   Dodji Seketeli  
> 
>   (cpp_get_token_1): New static function. Split and extended from
>   cpp_get_token.  Use reached_end_of_context and
>   consume_next_token_from_context.  Unify its return point.  Move
>   the location tweaking from cpp_get_token_with_location in here.
>   (cpp_get_token): Use cpp_get_token_1

Since this set of patches went in, I'm seeing weird errors when building
newlib for SPU:

/home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:
 In function '_recipd2':
/home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:129:3:
 note: use -flax-vector-conversions to permit conversions between vectors with 
differing element types or numbers of subparts
/home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:129:13:
 error: incompatible types when assigning to type '__vector(4) int' from type 
'__vector(4) unsigned int'
/home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:131:10:
 error: incompatible types when assigning to type '__vector(4) int' from type 
'__vector(4) unsigned int'
/home/kwerner/dailybuild/spu-tc-2011-10-18/newlib-head/src/newlib/libm/machine/spu/headers/recipd2.h:132:10:
 error: incompatible types when assigning to type '__vector(4) int' from type 
'__vector(4) unsigned int'

The code in question looks like:

  vec_uint4 isinf, iszero, isdenorm0;
[snip]
  isdenorm0 = spu_cmpeq(spu_shuffle((vec_uint4)exp, (vec_uint4)exp, splat_hi), 
0);

  isinf  = spu_cmpeq((vec_uint4)value_abs, (vec_uint4)expmask);
  iszero = spu_cmpeq((vec_uint4)value_abs, 0);

Note that isinf etc. *are* defined as *unsigned* vector types.

The problem seems to be that the preprocessor somehow stripped
off the "unsigned" keyword.  A reduced test case is:

#define isinf(__x)

#define vec_uint4 __vector unsigned int

vec_uint4 isinf;

(Using the name of a function-like macro as identifer is maybe a bit odd,
but should be perfectly legal C as far as I know ...)

Running this through "cc1 -E" on a SPU target before the patch set yields:

__attribute__((__spu_vector__)) unsigned int isinf;

as expected, but after the patch set we get:

__attribute__((__spu_vector__)) int isinf;

instead.

The problem is clearly related to the platform-specific "macro_to_expand"
routine that is used on SPU to implement the context-sensitive __vector
keyword.

With your changes to cpp_get_token (which is the sole caller of the
macro_to_expand callback), are there any changes in the interface
to the callback?  Any suggestions what could be going on here?

Note that the implementation of the callback is in
  config/spu/spu-c.c:spu_macro_to_expand


Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


[build] Properly test for madvise on Solaris 10 (PR bootstrap/50777)

2011-10-19 Thread Rainer Orth
As described in the PR, Solaris 10 bootstrap is currently broken
compiling ggc-page.c in stage2 due to no declaration for madvise().
This happens because g++ defines _XOPEN_SOURCE=600, which hides the
declaration, and configure doesn't check for a declaration at all.

The following patch fixes both issues by checking for a madvise()
declaration separately and doing so with the C++ compiler.

Testing in progress on i386-pc-solaris2.8 (where it worked before),
i386-pc-solaris2.10 (bootstrap broken), i386-pc-solaris2.11 (worked
before), and x86_64-unknown-linux-gnu.

HAVE_MADVISE is 1 everywhere, HAVE_DECL_MADVISE is 1 everwhere except on
Solaris 10 in stages 2 and 3 where g++ is used.

All bootstraps are beyond the point of the breakage now.  Ok for
mainline if they pass?

Thanks.
Rainer


2011-10-19  Rainer Orth  

PR bootstrap/50777
* configure.ac: Save and restore CXXFLAGS around
gcc_AC_CHECK_DECLS uses.
Check for madvise() declaration with g++ if --enable-build-with-cxx.
* configure: Regenerate.
* config.in: Regenerate.
* ggc-page.c (USING_MADVISE): Also check HAVE_DECL_MADVISE.

# HG changeset patch
# Parent 1fa7cb4d63ec8cf992373dd9ba92615a589542c3
Properly test for madvise on Solaris 10 (PR bootstrap/50777)

diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1094,6 +1094,8 @@ AM_LANGINFO_CODESET
 # We will need to find libiberty.h and ansidecl.h
 saved_CFLAGS="$CFLAGS"
 CFLAGS="$CFLAGS -I${srcdir} -I${srcdir}/../include"
+saved_CXXFLAGS="$CXXFLAGS"
+CXXFLAGS="$CXXFLAGS -I${srcdir} -I${srcdir}/../include"
 gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \
 	strsignal strstr strverscmp \
 	errno snprintf vsnprintf vasprintf malloc realloc calloc \
@@ -1146,6 +1148,21 @@ gcc_AC_CHECK_DECLS(sigaltstack, , ,[
 #include 
 ])
 
+# g++ on Solaris 10+ defines _XOPEN_SOURCE=600, which hides the madvise()
+# prototype.
+AS_IF([test "$ENABLE_BUILD_WITH_CXX" = "yes"],
+  [AC_LANG_PUSH([C++])
+   gcc_AC_CHECK_DECLS(madvise, , ,[
+ #include "ansidecl.h"
+ #include "system.h"
+   ])
+   AC_LANG_POP([C++])],
+  [gcc_AC_CHECK_DECLS(madvise, , ,[
+ #include "ansidecl.h"
+ #include "system.h"
+   ])
+])
+
 # More time-related stuff.
 AC_CACHE_CHECK(for struct tms, ac_cv_struct_tms, [
 AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
@@ -1172,8 +1189,9 @@ if test $gcc_cv_type_clock_t = yes; then
   [Define if  defines clock_t.])
 fi
 
-# Restore CFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
+# Restore CFLAGS, CXXFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
 CFLAGS="$saved_CFLAGS"
+CXXFLAGS="$saved_CXXFLAGS"
 
 gcc_AC_INITFINI_ARRAY
 
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -1,6 +1,6 @@
 /* "Bag-of-pages" garbage collector for the GNU compiler.
Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009,
-   2010 Free Software Foundation, Inc.
+   2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -50,7 +50,8 @@ along with GCC; see the file COPYING3.  
 #define USING_MALLOC_PAGE_GROUPS
 #endif
 
-#if defined(HAVE_MADVISE) && defined(MADV_DONTNEED) && defined(USING_MMAP)
+#if defined(HAVE_MADVISE) && HAVE_DECL_MADVISE && defined(MADV_DONTNEED) \
+&& defined(USING_MMAP)
 # define USING_MADVISE
 #endif
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [build] Properly test for madvise on Solaris 10 (PR bootstrap/50777)

2011-10-19 Thread Paolo Bonzini

On 10/19/2011 04:17 PM, Rainer Orth wrote:

As described in the PR, Solaris 10 bootstrap is currently broken
compiling ggc-page.c in stage2 due to no declaration for madvise().
This happens because g++ defines _XOPEN_SOURCE=600, which hides the
declaration, and configure doesn't check for a declaration at all.

The following patch fixes both issues by checking for a madvise()
declaration separately and doing so with the C++ compiler.

Testing in progress on i386-pc-solaris2.8 (where it worked before),
i386-pc-solaris2.10 (bootstrap broken), i386-pc-solaris2.11 (worked
before), and x86_64-unknown-linux-gnu.

HAVE_MADVISE is 1 everywhere, HAVE_DECL_MADVISE is 1 everwhere except on
Solaris 10 in stages 2 and 3 where g++ is used.

All bootstraps are beyond the point of the breakage now.  Ok for
mainline if they pass?

Thanks.
 Rainer


2011-10-19  Rainer Orth

PR bootstrap/50777
* configure.ac: Save and restore CXXFLAGS around
gcc_AC_CHECK_DECLS uses.
Check for madvise() declaration with g++ if --enable-build-with-cxx.
* configure: Regenerate.
* config.in: Regenerate.
* ggc-page.c (USING_MADVISE): Also check HAVE_DECL_MADVISE.







Ok.

Paolo


Re: [PATCH 2/2] Free large chunks in ggc

2011-10-19 Thread Andi Kleen
> If the size to free is smaller than the quirk size, then it has the very
> undesirable effect that with using GC only you might run unnecessarily out
> of virtual address space, because it allocates pages in 2MB chunks, but
> if they are released in 1MB chunks, those released chunks will never be

1MB is just the minimum, it frees it in whatever it can find
(but only for a single GC cycle usually). So when enough continuous
memory is free it will be reused by GC.

I guess it would be possible to add a fallback to allocate a smaller
chunk if the large chunk fails, but unless someone actually comes
up with a test case I have doubts it is really needed.

> usable again for GC.  Consider on 32-bit address space allocating 3GB
> of GC memory, then freeing stuff in every odd 1MB chunk of pages, then
> wanting to allocate through GC the 1.5GB back.
> 
> IMHO we should munmap immediately in release_pages the > G.pagesize pages,

Then you get the fragmentation problem back in full force.

> those are not very likely to be reused anyway (and it had one in between
> ggc_collect cycle to be reused anyway), and for the == G.pagesize
> (the usual case, the only ones that are allocated in GGC_QUIRK_SIZE sets)
> we should note which page was the first one in the GGC_QUIRK_SIZE chunk
> and munmap exactly those 2MB starting at the first page only.

I tried this first with aligned 2MB chunks, but it doesn't trigger ever in a 
normal (non LTO) bootstrap.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [trans-mem] Rename/split __transaction into __transaction_atomic and __transaction_relaxed.

2011-10-19 Thread Richard Henderson
On 10/19/2011 03:16 AM, Torvald Riegel wrote:
> On Tue, 2011-10-18 at 14:29 -0500, Aldy Hernandez wrote:
>>> Several ICEs in the TM tests on C++, but I think they are old. C TM
>>> tests work except some missing optimizations (old failures too).
>>> c-c++-common/tm/wrap-[12].c fail on C++, but not on C:
>>> wrap-1.c:5:57: error: 'transaction_wrap' argument not an identifier
>>> I believe this is an old error too?
>>> (I don't yet have a testcase summary of before the changes, so I can't
>>> compare properly. But will send an follow-up if there are regressions.)
>>
>> Can you run the tests before your patch to make sure there are no 
>> regressions?  That is, let's be sure that these are indeed old failures.
> 
> Indeed old failures, no new regressions. Tested and boot-strapped on
> x86_64.
> 
> OK for branch?
> 

Yep.


r~


Re: [PATCH 2/2] Free large chunks in ggc

2011-10-19 Thread Jakub Jelinek
On Wed, Oct 19, 2011 at 04:37:45PM +0200, Andi Kleen wrote:
> > If the size to free is smaller than the quirk size, then it has the very
> > undesirable effect that with using GC only you might run unnecessarily out
> > of virtual address space, because it allocates pages in 2MB chunks, but
> > if they are released in 1MB chunks, those released chunks will never be
> 
> 1MB is just the minimum, it frees it in whatever it can find
> (but only for a single GC cycle usually). So when enough continuous
> memory is free it will be reused by GC.
> 
> I guess it would be possible to add a fallback to allocate a smaller
> chunk if the large chunk fails, but unless someone actually comes
> up with a test case I have doubts it is really needed.
> 
> > usable again for GC.  Consider on 32-bit address space allocating 3GB
> > of GC memory, then freeing stuff in every odd 1MB chunk of pages, then
> > wanting to allocate through GC the 1.5GB back.
> > 
> > IMHO we should munmap immediately in release_pages the > G.pagesize pages,
> 
> Then you get the fragmentation problem back in full force.

Why?  For one, such allocations are very rare (you only get them when
a single GC allocation requests > page of memory, like perhaps a string
literal over 4KB or similar or function call with over 1000 arguments etc.).
And if they are unlikely to be reused, not munmapping them means wasting
more virtual address space than needed.

Jakub


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread H.J. Lu
On Wed, Oct 19, 2011 at 6:09 AM, Sergey Ostanevich  wrote:
> On Wed, Oct 19, 2011 at 4:46 PM, Uros Bizjak  wrote:
>> On Wed, Oct 19, 2011 at 2:26 PM, Sergey Ostanevich  
>> wrote:
>>
 You can just change the default in processor_target_table.

 Uros.

>>>
>>> Will it be applicable during optimizations for size? It will hurt,
>>> although not much (see PR).
>>
>> Looking at the code, I'd say that we don't handle -Os in different way.
>>
>>> New patch is below. Ok for trunk as obvious?
>>>
>>> Sergos
>>>
>>> 2011-10-19  Sergey Ostanevich  
>>>
>>>        * gcc/config/i386/i386.c (ix86_option_override_internal): use loop
>>>        align by 16 bytes for Atom platform
>>
>> Please update ChangeLog, like:
>>
>> * gcc/config/i386/i386.c (processor_target_table): Change Atom
>> align_loop_max_skip to 15.
>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index 2c53423..8c60086 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -2596,7 +2596,7 @@ static const struct ptt
>>> processor_target_table[PROCESSOR_max] =
>>>   {&bdver1_cost, 32, 24, 32, 7, 32},
>>>   {&bdver2_cost, 32, 24, 32, 7, 32},
>>>   {&btver1_cost, 32, 24, 32, 7, 32},
>>> -  {&atom_cost, 16, 7, 16, 7, 16}
>>> +  {&atom_cost, 16, 15, 16, 7, 16}
>>>  };
>>
>>
>> OK.
>>
>> Thanks,
>> Uros.
>>
>
> Thanks for comments!
> I double checked: for -Os there's no .p2align appeared.
> For -O2 I see ".p2align 4,,15" instead of ".p2align 4,,7", as expected.
>
> Can someone commit it please?
>
> Regards,
> Sergos
>
>
> 2011-10-18  Sergey Ostanevich 
>
>        * gcc/config/i386/i386.c (processor_target_table): Change Atom
 Please remove gcc/.

>        align_loops_max_skip to 15.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 2c53423..8c60086 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2596,7 +2596,7 @@ static const struct ptt
> processor_target_table[PROCESSOR_max] =
>   {&bdver1_cost, 32, 24, 32, 7, 32},
>   {&bdver2_cost, 32, 24, 32, 7, 32},
>   {&btver1_cost, 32, 24, 32, 7, 32},
> -  {&atom_cost, 16, 7, 16, 7, 16}
> +  {&atom_cost, 16, 15, 16, 7, 16}
>  };
>
>  static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
>



-- 
H.J.


Re: [Patch,AVR]: PR50447: Tweak addhi3

2011-10-19 Thread Georg-Johann Lay
Denis Chertykov schrieb:
>> Georg-Johann Lay schrieb:
>>> Denis Chertykov schrieb:
>>>
 What difference in size of avr-libc ?
>>> I have no tool for smart size analysis, so here is just a diff:
>>>
>>> After rebuilding avr-libc with respective compiler version, did
>>> respectively:
>>>
>>> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt
>>> $ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt
>>>
>>> and then
>>>
>>> $ diff -U 0 size-orig.txt size-patch.txt > size.diff
>>>
>>> As far as I can see, there is not a big gain but no object increases in
>>> size.
>>>
>>> For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o size gain is 3%.
>>> For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7% and for others just one
>>> instruction better.
>> Actually there are some cases where the size increases by one instruction:
>>
>>> -464  0   0 464 1d0 realloc.o (ex
>>> ./avr/lib/avr31/libc.a)
>>> +466  0   0 466 1d2 realloc.o (ex
>>> ./avr/lib/avr31/libc.a)
>>> -464  0   0 464 1d0 realloc.o (ex
>>> ./avr/lib/avr3/libc.a)
>>> +466  0   0 466 1d2 realloc.o (ex
>>> ./avr/lib/avr3/libc.a)
>> Will have a look tomorrow; presumably it's adding +/-1 that force a scratch
>> whilst the old code does not.

Is because different spill generation. Both cases do spilling (same frame size)
but a bit different. If there was a MOVW instruction the code would be smaller,
so I think it's a rare corner case.

> However results are good.
> The patch can be committed.
> 
> Denis.

Applied with the proposed can_create_pseudo_p()

Johann



Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Andrew MacLeod

On 10/18/2011 06:25 PM, Andrew MacLeod wrote:


Its impossible to implement a weak compare and swap unless you return 
both parameters in one operation.


the compare_exchange matches the atomic interface for c++, and if we 
can't resolve it with a lock free instruction sequence, we have to 
leave an external call with this format to a library, so I that why I 
provide this built-in.


Neither rth nor I like the addressable parameter, so thats why I left 
the rtl pattern for weak and strong compare and swap without that 
addressable argument, and let this builtin generate wrapper code 
around it.


You could provide an __atomic version of the bool and val routines 
with memory model we toyed with making the compare_and_swap return 
both values so we could implement a weak version, but getting 2 return 
values is not pretty.  It could be done with 2 separate built-ins that 
relate to each other, but thats not great either.


I've thought about various longer term schemes, but havent been able to 
settle on one I really like.   Ideally, if we know we can generate lock 
free instructions, we expose the wrapper code to the tree optimizers.


I've considered:
1)  adding tree support for a CAS primitive which has 2 results.. that 
pretty invasive but has nice features.


2) a 2 part built-in.. one which returns a value, and a second one which 
takes that value and then returns the boolean:  ie

   val = __atomic_compare_and_swap (&mem, expected, desired, model)
   if (__atomic_compare_and_swap_success (val))
 ...

and during expansion from SSA to RTL, you look for uses of the result of 
__atomic_compare_and_swap in  __atomic_compare_and_swap_success, and you 
can decide what RTL pattern to use and 'merge' the 2 builtins into one 
pattern. You can optimize the RTL pattern used based on what, if any, 
other uses there are of the 2 results.   I think this should work OK...


3) Waiting for a flash of brilliance (may never come) or suggestion>  :-)


I decided for the moment to punt on exploring those and give us more 
time to think about the best way to do it.   We do need to be able to 
call this specific interface for external library calls,  but we are not 
locked into this for inline expansion of lock free instructions.   In 
c-common.c where we turn __atomic_exchange_compare into  
__atomic_compare_exchange_{1,2,4,8,16}, we can instead turn it into a 
code sequence using a new __atomic_compare_and_swap builtin or tree 
code.  The wrapper code that is currently emitted as RTL could then be 
emitted as tree expressions before the SSA optimizers see anything.   So 
no later than the next release of GCC, I would expect to have a fully 
flushed out solution that gives us all the nice inlines and removes 
addr-taken flags and such.I just don't feel like there is time at 
the moment to make the correct decision while I'm trying to get the 
library ABI right.


If I find some time, I may experiment with #2 next week.

Andrew




Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Richard Henderson
On 10/18/2011 03:07 PM, Jakub Jelinek wrote:
> On Tue, Oct 18, 2011 at 06:03:16PM -0400, Andrew MacLeod wrote:
>>
>> Here's the last of the missing intrinsics.   compare_exchange is
>> slightly different than the others since it doesn't map directly to
>> an rtl pattern. Its format is :
>>
>>   bool __atomic_compare_exchange (T* mem, T* expected, T desired,
>> bool weak, memory_order success, memory_order failure)
>>
>> note that the expected parameter is a pointer as well.  In the case
>> where false is returned, the value of expected is updated.
> 
> I think the __sync_* way here was much better (two intrinsics instead of
> one, one bool-ish and one returning val).  With mandating T*expected
> you force it to be addressable, probably until expansion time at which point
> it will be just forced into memory, which is highly undesirable.

In one of Andrew's previous ideas he split _ace into two separate builtins:

  T __atomic_compare_exchange (T* mem, T oldval, T newval, ...);
  bool __atomic_compare_exchange_success (T);

where the input to __ace_success *must* be the ssa_name output of _ace.
We then use magic to wire up both results to the same rtl pattern.

We can still probably do something exactly like that, but Andrew is 
concerned about getting the generic user-level interface correct.


r~
  


[cxx-mem-model] fix some trivial renaming fallout

2011-10-19 Thread Andrew MacLeod
As was pointed out, __get_builtin_sync_mem was renamed to 
__get_builtin_atomic  which isn't really right, so this patch renames it 
back.


It also includes some formatting fixes which resulted from __sync_mem 
having a different number of letters than __atomic.


Andrew


* optabs.c (add_op): Fix formatting from rename.
(expand_atomic_fetch_op): Fix formatting from rename.
* builtins.c (get_builtin_sync_mem): Reverse renamed get_builtin_atomic.
(expand_builtin_sync_operation): Fix rename issues.
(expand_builtin_compare_and_swap): Fix rename issues.
(expand_builtin_sync_lock_test_and_set): Fix rename issues.
(expand_builtin_sync_lock_release): Fix rename issues.
(expand_builtin_atomic_exchange): Fix rename issues.
(expand_builtin_atomic_compare_exchange): Fix rename issues.
(expand_builtin_atomic_load): Fix rename issues.
(expand_builtin_atomic_store): Fix rename issues.
(expand_builtin_atomic_fetch_op): Fix rename issues.
(expand_builtin): Fix formatting from rename.

Index: optabs.c
===
*** optabs.c(revision 180174)
--- optabs.c(working copy)
*** struct op_functions {
*** 7228,7239 
  
  /* Initialize the fields for each supported opcode.  */
  static const struct op_functions add_op = { atomic_fetch_add_optab,
! atomic_add_fetch_optab,
! atomic_add_optab,
! sync_old_add_optab,
! sync_new_add_optab,
! sync_add_optab,
! MINUS
};
  
  static const struct op_functions sub_op = { atomic_fetch_sub_optab,
--- 7228,7239 
  
  /* Initialize the fields for each supported opcode.  */
  static const struct op_functions add_op = { atomic_fetch_add_optab,
!   atomic_add_fetch_optab,
!   atomic_add_optab,
!   sync_old_add_optab,
!   sync_new_add_optab,
!   sync_add_optab,
!   MINUS
};
  
  static const struct op_functions sub_op = { atomic_fetch_sub_optab,
*** maybe_emit_op (const struct op_functions
*** 7358,7364 
 AFTER is false to return the value before the operation (fetch_OP).  */
  rtx
  expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code,
! enum memmodel model, bool after)
  {
enum machine_mode mode = GET_MODE (mem);
const struct op_functions *optab;
--- 7358,7364 
 AFTER is false to return the value before the operation (fetch_OP).  */
  rtx
  expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code,
!   enum memmodel model, bool after)
  {
enum machine_mode mode = GET_MODE (mem);
const struct op_functions *optab;
Index: builtins.c
===
*** builtins.c  (revision 180174)
--- builtins.c  (working copy)
*** get_builtin_sync_mode (int fcode_diff)
*** 5072,5078 
 for the builtin_sync operations.  */
  
  static rtx
! get_builtin_atomic (tree loc, enum machine_mode mode)
  {
rtx addr, mem;
  
--- 5072,5078 
 for the builtin_sync operations.  */
  
  static rtx
! get_builtin_sync_mem (tree loc, enum machine_mode mode)
  {
rtx addr, mem;
  
*** expand_builtin_sync_operation (enum mach
*** 5173,5183 
  }
  
/* Expand the operands.  */
!   mem = get_builtin_atomic (CALL_EXPR_ARG (exp, 0), mode);
val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 1), mode);
  
return expand_atomic_fetch_op (target, mem, val, code, MEMMODEL_SEQ_CST,
!after);
  }
  
  /* Expand the __sync_val_compare_and_swap and __sync_bool_compare_and_swap
--- 5173,5183 
  }
  
/* Expand the operands.  */
!   mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode);
val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 1), mode);
  
return expand_atomic_fetch_op (target, mem, val, code, MEMMODEL_SEQ_CST,
!after);
  }
  
  /* Expand the __sync_val_compare_and_swap and __sync_bool_compare_and_swap
*** expand_builtin_compare_and_swap (enum ma
*** 5192,5198 
rtx old_val, new_val, mem;
  
/* Expand the operands.  */
!   mem = get_builtin_atomic (CALL_EXPR_ARG (exp, 0), mode);
old_val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 1), mode);
new_val = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode);
  
--- 5192,5198 
rtx old_val, new_val, mem;
  
/* Expand the operan

Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Jakub Jelinek
On Wed, Oct 19, 2011 at 08:05:26AM -0700, Richard Henderson wrote:
> In one of Andrew's previous ideas he split _ace into two separate builtins:
> 
>   T __atomic_compare_exchange (T* mem, T oldval, T newval, ...);
>   bool __atomic_compare_exchange_success (T);
> 
> where the input to __ace_success *must* be the ssa_name output of _ace.
> We then use magic to wire up both results to the same rtl pattern.
> 
> We can still probably do something exactly like that, but Andrew is 
> concerned about getting the generic user-level interface correct.

Until we support more than one lhs stmts in GIMPLE, there is always
an option to return a _Complex or vector or struct from the builtin,
where one part of the return value would contain the bool whether
it has succeeded and the other part would contain the old value.
That of course doesn't need to be the externally exposed builtin,
it could be an internal builtin (perhaps with space in the name or
something).  Or perhaps the user builtin
if (__atomic_compare_exchange (&mem, &expected, newval))
could be lowered into
tmp1 = expected;
tmp2 = __atomic val_compare_exchange (&mem, tmp1, newval);
expected = tmp2;
if (tmp1 == tmp2)
if it is known that it can be handled by the backend.

Jakub


Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit

2011-10-19 Thread Janne Blomqvist
On Mon, Oct 17, 2011 at 23:52, Janne Blomqvist
 wrote:
> On Mon, Oct 17, 2011 at 19:03, Tobias Burnus  wrote:
>> Hi Janne,
>>
>> On 10/17/2011 05:30 PM, Janne Blomqvist wrote:
>>>
>>> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus  wrote:

 This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine
 and
 the FLUSH statement. It removes the _commit from gfortran's buf_flush.
>>>
>>> Like I argued in this message
>>> http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross
>>> mistake.
>>
>> [...]
>>
>> And I think it is a mistake to not make the data available to other
>> processes as it is indicated by the Fortran 2008 standard:
>>
>> "Execution of a FLUSH statement causes data written to an external le to be
>> available to other processes, or causes data placed in an external file by
>> means other than Fortran to be available to a READ statement. These actions
>> are processor dependent."
>>
>> Thus, I think it makes sense for FLUSH to call _commit on Windows.
>
> I'm not actually sure we can draw such conclusions. What we know is
> that metadata updates to the directory are delayed. It wouldn't
> surprise me if opening a file (which presumably is a atomic operation
> in order to avoid race conditions just like on POSIX) forces the
> kernel to sync metadata of other handles to the same file.

I did some further googling, and

http://stackoverflow.com/questions/2883691/fflush-on-stdout

seems to suggest that the my explanation above is roughly what is
happening. That is, opening and closing the file in another program,
"type" in the link above, flushes the metadata to the directory.  Also
from  the MSDN link referenced there

"The only guarantee about a file timestamp is that the file time is
correctly reflected when the handle that makes the change is closed. "

(which would suggest that the same hold for other file metadata as
well, such as the size).


Explanation of file caching in Windows:
http://msdn.microsoft.com/en-us/library/aa364218%28v=VS.85%29.aspx

Some MS presentation about how to make apps behave nicely with SMB:
http://mschnlnine.vo.llnwd.net/d1/pdc08/PPTX/ES23.pptx

Under the heading of "Platform Support for Metadata Caching":

"Metadata caching is best effort and there
are very limited consistency guarantees

Metadata caches expire after a fixed time
"

(Though it's not entirely clear if the above "platform support" means
only SMB or Windows filesystem semantics in general.)

So, I think the picture is essentially:

- write() (which is a wrapper around WriteFile(EX)) transfers data and
metadata to the system cache (kernel page cache in unix terminology).

- However, the metadata is written to the directory lazily, allowing
applications that do stat() (or the equivalent Win32 API call(s)) on a
pathname to see stale data. This, incidentally, is what gfortran is
doing and what caused the issue that led to the introduction of
_commit in the first place.

- Closing the file, or _commit() (a wrapper around
FlushFileBuffers()), or (per the stackoverflow link above)
opening/closing the file in another process will force the directory
flush immediately.

Some investigation into how other language support libraries handle this:

- For MS-DOS compatibility (back when OS file caching wasn't that
advanced, one presumes), the MS C runtime allows the user to link in
an extra object COMMODE.OBJ which makes fflush() also call _commit()
recreating the MS-DOS behavior. By default, however, this is not done.
Also, there are some nonstandard flags that can be passed to fopen()
to indicate that one wants the MS-DOS behavior.

- For the MS C++ compiler, the same COMMODE.OBJ linking can be done,
and there is some nonstandard extension allowing one to get the fd
from a C++ stream and then call _commit on it. By default flush() on a
stream does not call _commit/FlushFileBuffers, it just does a
WriteFile().

- For .NET, there is the FileStream.Flush() method, which flushes the
user-space buffer without calling _commit/FlushFileBuffers. In the
latest version of .NET, there is a new FileStream.Flush(bool) method
which can be used to call FlushFileBuffers. In previous .NET versions,
people used PInvoke (the native code calling interface) to call
FlushFileBuffers if needed.

- For the runtimes provided with GCC, grepping the source tree shows
that libgfortran is the only occurence of _commit. In libjava there is
a FlushFileBuffers call, but it's #if 0'ed away.

That is, except for libgfortran, no other language runtime by default
makes an effort to synchronize the metadata.

In conclusion, I still think that my previous patch which got rid of
the _commit and the reliance on using stat() via pathname is the
correct approach. "dir" might show stale data for the file size, but
this seems to be the norm on Windows, and opening the file in another
process will give the correct data. So in practice I don't think there
will be any problems from getting rid of _commit.

-- 
Janne

Re: [PATCH, PR50769] Fix ICE in phi_alternatives_equal.

2011-10-19 Thread Richard Guenther
On Wed, Oct 19, 2011 at 10:23 AM, Tom de Vries  wrote:
> Richard,
>
> For the example from the PR, -ftree-tail-merge discovers that blocks 6 and 7 
> are
> equal, removes block 7 and redirects the outgoing edge of it's predecessor 
> block
> 11 to block 6.
>
> Block 6 contains a phi, which after redirecting the edge from block 11 looks
> like this:
> ...
> .MEM_54 = PHI <.MEM_27(18), .MEM_26(17), (11)>
> ...
>
> Since update_vops is false, TODO_update_ssa_only_virtuals will be run
> and -ftree-tail-merge doesn't update the phi.
>
> However during cleanup_tree_cfg (which is done before updating the ssa),
> phi_alternatives_equal is called which asserts that the phi arguments are 
> defined.
>
> The patch fixes the ICE by setting the empty phi argument to .MEM.
>
> bootstrapped and reg-tested on x86_64.
>
> Ok for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2011-10-19  Tom de Vries  
>
>        PR 50769/tree-optimization
>        * tree-ssa-tail-merge.c (replace_block_by): Calculate phi_vuse2
>        unconditionally.  Handle case that phi_vuse2 is not an SSA_NAME.  Add
>        dummy argument .MEM to phi when increasing number of arguments of phi 
> by
>        redirecting edges to the block with phi.
>


Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Richard Henderson
> !   weak = expand_expr (CALL_EXPR_ARG (exp, 3), NULL_RTX, ptr_mode,
> !   EXPAND_NORMAL);
> ! 
> !   if (weak != const0_rtx && weak != const1_rtx)
> ! {
> !   error ("strong/weak parameter must be 0 or 1 for 
> %<__atomic_compare_exchange%>");
> !   return NULL_RTX;
> ! }

Really?  Are you even allowed to do that, like unto the variable memory model 
parameters?

> +   /* Emit the compare_and_swap.  */
> create_output_operand (&ops[0], target, QImode);
> create_output_operand (&ops[1], mem, mode);
> !   create_output_operand (&ops[2], current_val, mode);
> !   create_convert_operand_to (&ops[3], expected_val, mode, true);
> !   create_convert_operand_to (&ops[4], desired, mode, true);
> !   create_integer_operand (&ops[5], success);
> !   expand_insn (icode, 6, ops);

The mem operand must use create_fixed_operand, as with all of the other atomic 
builtins.
I strongly suggest swapping op 1 and op2, so that all of the "real" outputs 
come first.

I suggested on IRC that you examine the mode of the boolean output operand, and 
not
hard-code QImode.  Most targets will produce a word-sized boolean output that 
need not
be zero-extended.

> !   emit_cmp_and_jump_insns (ops[0].value, const0_rtx, NE, const0_rtx,
> !QImode, 1, true_label);
> ! 
> !   /* if not successful copy expected_val into *expected and issue the 
> !  failure fence.  */
> !   emit_move_insn (gen_rtx_MEM (mode, expected), current_val);
> !   expand_builtin_mem_thread_fence (failure);
> ! 
> !   /* If no success fence is required, we're done. Otherwise jump around 
> the
> !  code for TRUE and emitthe fence.  */
> !   if (true_label != done_label)
> ! {
> !   emit_jump_insn (gen_jump (done_label));
> ! 
> !   emit_label (true_label);
> !   if (success != MEMMODEL_RELAXED && success != MEMMODEL_RELEASE)
> ! expand_builtin_mem_thread_fence (success);
> ! }

Really?  We can do better than this.  In particular, by leaving it up to the 
target.
In the x86 case, cmpxchg is a full barrier.  Always.  No need for any followup 
barriers.

The ll/sc targets are going to generate code that looks like

start-barrier
 restart:
ll  curval, mem
mov t, 0
cmp curval, oldval
bne fail
mov t, newval
sc  t, mem
beq t, restart  // strong-version only
 fail:
end-barrier

  // outputs: t = 1/0 for success/failure
  //  curval = the current value of the memory, to be stored back into 
expected

where the position of fail: before or after the end-barrier depends on the 
relative
strength of the memory models.  No one is going to want to insert an extra jump 
around a barrier, especially for a not-expected failure path.

We should encode the two memory model parameters and the weak parameter into a 
single
CONST_INT operand.  Probably with some macros in optabs.h to extract those 
parameters
in the targets.


r~


Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order

2011-10-19 Thread H.J. Lu
On Wed, Oct 19, 2011 at 6:21 AM, Uros Bizjak  wrote:
> On Wed, Oct 19, 2011 at 3:07 PM, Kirill Yukhin  
> wrote:
>
>> Here is (almost obvous) patch, which fixes PR50766.
>>
>> ChangeLog  entry:
>> 2011-10-19  Kirill Yukhin  
>>
>>        * config/i386/i386.md (bmi_bextr_): Update register/
>>        memory operand order.
>>        (bmi2_bzhi_3): Ditto.
>>        (bmi2_pdep_3): Ditto.
>>        (bmi2_pext_3): Ditto.

Please mention PR #.

>> Bootstrapped, test (from bug) passing.
>>
>> Could you please have a look?
>
> Please also add the testcase from the PR. You can use  { dg-do
> assemble }, but you have to check for BMI2 effective target support.
>
> BTW: I can't find BMI2 instruction reference documentation, so I'm

It is in Intel AVX spec:

http://software.intel.com/en-us/avx/

> just rubberstamping the patch as obvious.
>

-- 
H.J.


Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Andrew MacLeod

On 10/19/2011 11:16 AM, Jakub Jelinek wrote:

On Wed, Oct 19, 2011 at 08:05:26AM -0700, Richard Henderson wrote:

In one of Andrew's previous ideas he split _ace into two separate builtins:

   T __atomic_compare_exchange (T* mem, T oldval, T newval, ...);
   bool __atomic_compare_exchange_success (T);

where the input to __ace_success *must* be the ssa_name output of _ace.
We then use magic to wire up both results to the same rtl pattern.

We can still probably do something exactly like that, but Andrew is
concerned about getting the generic user-level interface correct.

Until we support more than one lhs stmts in GIMPLE, there is always
an option to return a _Complex or vector or struct from the builtin,
where one part of the return value would contain the bool whether
it has succeeded and the other part would contain the old value.
That of course doesn't need to be the externally exposed builtin,
it could be an internal builtin (perhaps with space in the name or

thats a possible 3rd option. create a custom struct for the return value.
Accessing the values in the struct are still going to be memory accesses 
until RTL expansion time aren't they?

user writes:

atomic_compare_exchange (&mem, &expected, desired, model))

gets turned into something like

typedef struct { int val;  bool success; } __atomic_CAS_result;

{
__atomic_CAS_result __res;
tmp = *expected;
__res = __atomic_CAS (&mem, tmp, desired, model)
if (!res.success)
  *expected = res.val;
return res.success;
}

since __res is a struct, that will have to be allocated on the stack and 
then see if the RTL optimizations can remove it, right?   The user 
variable would be absolved off all the address taken crud though.



something).  Or perhaps the user builtin
if (__atomic_compare_exchange (&mem,&expected, newval))
could be lowered into
tmp1 = expected;
tmp2 = __atomic val_compare_exchange (&mem, tmp1, newval);
expected = tmp2;
if (tmp1 == tmp2)
if it is known that it can be handled by the backend.

Thats what we generate now if we fall back to the __sync 
implementations, but that only works for strong CAS.  Of course, we're 
only falling back to that at RTL time rather than the tree 
optimizers...  we could fudge around earlier perhaps for the strong 
version...


Andrew


Re: [translation patch] Bunch of trivial PRs: 48638, 49517, 49704

2011-10-19 Thread H.J. Lu
On Wed, Oct 19, 2011 at 6:11 AM, Paolo Carlini  wrote:
> Hi,
>
> a bunch of trivial fixes from Roland. The only one I deem *slightly* more
> controversial is "AST" uppercase instead of "ast" in cp/semantics.c, I'm
> gonna wait a bit in case Jason has a different opinion...
>
> Thanks,
> Paolo.
>

We also need to update plugin tests:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50796

-- 
H.J.


Re: [PATCH 2/2] Free large chunks in ggc

2011-10-19 Thread Jan Hubicka
> Why?  For one, such allocations are very rare (you only get them when
> a single GC allocation requests > page of memory, like perhaps a string
> literal over 4KB or similar or function call with over 1000 arguments etc.).
> And if they are unlikely to be reused, not munmapping them means wasting
> more virtual address space than needed.

We produce quite bit of large hashtables in GGC and those will happily be
bigger than 4KB.

Honza


Re: [PATCH, m68k] Enable building for ColdFire Linux

2011-10-19 Thread Andreas Schwab
Julian Brown  writes:

>   Maxim Kuvyrkov  
>
> gcc/
> * config/m68k/t-linux (M68K_MLIB_CPU): Add ColdFire CPUs.

Ok.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH 2/6] Generate virtual locations for tokens

2011-10-19 Thread Dodji Seketeli
"Ulrich Weigand"  writes:

> The problem seems to be that the preprocessor somehow stripped
> off the "unsigned" keyword.  A reduced test case is:
>
> #define isinf(__x)
>
> #define vec_uint4 __vector unsigned int
>
> vec_uint4 isinf;
>
> (Using the name of a function-like macro as identifer is maybe a bit odd,
> but should be perfectly legal C as far as I know ...)
>
> Running this through "cc1 -E" on a SPU target before the patch set yields:
>
> __attribute__((__spu_vector__)) unsigned int isinf;
>
> as expected, but after the patch set we get:
>
> __attribute__((__spu_vector__)) int isinf;
>
> instead.
>
> The problem is clearly related to the platform-specific "macro_to_expand"
> routine that is used on SPU to implement the context-sensitive __vector
> keyword.
>
> With your changes to cpp_get_token (which is the sole caller of the
> macro_to_expand callback), are there any changes in the interface
> to the callback?

Not that I would expect.

>  Any suggestions what could be going on here?

Nothing obvious is coming to my mind right now, I am still looking
at the issue.

-- 
Dodji


PR bootstrap/50709 (bootstrap miscompare)

2011-10-19 Thread Jan Hubicka
Hi,
the ENABLE_CHECKING code verifying consistency of inliner cache has effect on 
changing
order of querries to the fibheap that consequentely makes entries with same key 
come
in different orders.

Fixed by enabling the checking aways.  I also added extra check that the keys 
are conservatively
correct.

Bootstrapped/regtested x86_64-linux.


Index: ChangeLog
===
--- ChangeLog   (revision 180190)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2011-10-19  Jan Hubicka  
+
+   PR bootstrap/50709
+   * ipa-inline.c (inline_small_functions): Fix checking code to not make
+   effect on fibheap stability.
+
 2011-10-19  Roland Stigge  
 
PR translation/48638
Index: ipa-inline.c
===
--- ipa-inline.c(revision 180190)
+++ ipa-inline.c(working copy)
@@ -1384,6 +1384,7 @@ inline_small_functions (void)
   struct cgraph_node *where, *callee;
   int badness = fibheap_min_key (heap);
   int current_badness;
+  int cached_badness = -1;
   int growth;
 
   edge = (struct cgraph_edge *) fibheap_extract_min (heap);
@@ -1392,16 +1393,20 @@ inline_small_functions (void)
   if (!edge->inline_failed)
continue;
 
-  /* Be sure that caches are maintained consistent.  */
 #ifdef ENABLE_CHECKING
+  /* Be sure that caches are maintained conservatively consistent.
+This means that cached badness is allways smaller or equal
+to the real badness.  */
+  cached_badness = edge_badness (edge, false);
+#endif
   reset_edge_growth_cache (edge);
   reset_node_growth_cache (edge->callee);
-#endif
 
   /* When updating the edge costs, we only decrease badness in the keys.
 Increases of badness are handled lazilly; when we see key with out
 of date value on it, we re-insert it now.  */
   current_badness = edge_badness (edge, false);
+  gcc_assert (cached_badness == -1 || cached_badness <= current_badness);
   gcc_assert (current_badness >= badness);
   if (current_badness != badness)
{


Re: [PATCH PR50572] Tune loop alignment for Atom

2011-10-19 Thread H.J. Lu
On Wed, Oct 19, 2011 at 9:30 AM, Sergey Ostanevich  wrote:

> 2011-10-18  Sergey Ostanevich 
>
>    * config/i386/i386.c (processor_target_table): Change Atom
>    align_loops_max_skip to 15.

Please add PR # in ChangeLog.

>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 2c53423..8c60086 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2596,7 +2596,7 @@ static const struct ptt
> processor_target_table[PROCESSOR_max] =
>   {&bdver1_cost, 32, 24, 32, 7, 32},
>   {&bdver2_cost, 32, 24, 32, 7, 32},
>   {&btver1_cost, 32, 24, 32, 7, 32},
> -  {&atom_cost, 16, 7, 16, 7, 16}
> +  {&atom_cost, 16, 15, 16, 7, 16}
> };
>
> static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =

Please provide a patch which can be applied.  Cut/paste doesn't create
a working patch.  Please attach it.

-- 
H.J.


Fwd: Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Andrew MacLeod


grr, reposting due to thunderbird apparently sneaking some HTML in and 
it being rejected... sigh.


On 10/19/2011 11:34 AM, Richard Henderson wrote:

!   weak = expand_expr (CALL_EXPR_ARG (exp, 3), NULL_RTX, ptr_mode,
! EXPAND_NORMAL);
!
!   if (weak != const0_rtx&&  weak != const1_rtx)
! {
!   error ("strong/weak parameter must be 0 or 1 for 
%<__atomic_compare_exchange%>");
!   return NULL_RTX;
! }

Really?  Are you even allowed to do that, like unto the variable memory model 
parameters?

well, they are explicitly'compare_exchange_strong'  and 
'compare_exchange_weak' calls...   so yes, they have 'hardcoded' one or 
the other :-)
we could alternatively do 2 separate builtins, but I didn't see the 
point.  But Im ambivalent.

+   /* Emit the compare_and_swap.  */
 create_output_operand (&ops[0], target, QImode);
 create_output_operand (&ops[1], mem, mode);
!   create_output_operand (&ops[2], current_val, mode);
!   create_convert_operand_to (&ops[3], expected_val, mode, true);
!   create_convert_operand_to (&ops[4], desired, mode, true);
!   create_integer_operand (&ops[5], success);
!   expand_insn (icode, 6, ops);

The mem operand must use create_fixed_operand, as with all of the other atomic 
builtins.
I strongly suggest swapping op 1 and op2, so that all of the "real" outputs 
come first.

oh, right, my error on the output operand...  I will shuffle.


I suggested on IRC that you examine the mode of the boolean output operand, and 
not
hard-code QImode.  Most targets will produce a word-sized boolean output that 
need not
be zero-extended.

I clearly misunderstood what you said then, I thought you said use 
QImode :-P.   So, look at target and use that mode?  target may not be 
set sometimes tho right? what do you use then?  QImode?

!   emit_cmp_and_jump_insns (ops[0].value, const0_rtx, NE, const0_rtx,
!  QImode, 1, true_label);
!
!   /* if not successful copy expected_val into *expected and issue the
!failure fence.  */
!   emit_move_insn (gen_rtx_MEM (mode, expected), current_val);
!   expand_builtin_mem_thread_fence (failure);
!
!   /* If no success fence is required, we're done. Otherwise jump around 
the
!  code for TRUE and emitthe fence.  */
!   if (true_label != done_label)
! {
! emit_jump_insn (gen_jump (done_label));
!
! emit_label (true_label);
! if (success != MEMMODEL_RELAXED&&  success != MEMMODEL_RELEASE)
!   expand_builtin_mem_thread_fence (success);
!   }

Really?  We can do better than this.  In particular, by leaving it up to the 
target.
In the x86 case, cmpxchg is a full barrier.  Always.  No need for any followup 
barriers.


This is the tricky part.  Actually, it is wrong but for a different 
reason. I don't think the CAS needs a model.  If I understand my 
previous conversation with Lawrence, we need to know what mode the 
native CAS is.  If its seq-cst, then we don't need to do anything else. 
If it is relaxed though, then additional fences are required. Worst 
case, if the native CAS is relaxed, the fences need to be issued as such 
to implement


 compare_exchange (T* ptr, T* old, new, success, failure)

   memory_fence (success) // release modes
   val = *old;
   res = relaxed_CAS (ptr, val, new)
   if (success)
  {
memory_fence (success);   // acquire modes
return true;
  }
   *old = res;
   memory_fence (failure)
   return false;


so leaving it up to the target for the CAS isnt enough..  we may need to 
issue one or more of these fences if the native CAS mode is weaker than 
the success and/or failure mode.


Or perhaps I misunderstood... Lawrence?

if instead it maps to

   val = *old;
   memory_fence (success) // release modes
   res = relaxed_CAS (ptr, val, new)
   if (success_of_CAS)
  {
memory_fence (success);   // acquire modes
return true;
  }
   memory_fence (failure)
   *old = res;
   return false;

then the CAS could take care of issuing all the required fences  without 
a problem.   I'd be very happy with that :-)



The ll/sc targets are going to generate code that looks like

start-barrier
  restart:
ll  curval, mem
mov t, 0
cmp curval, oldval
bne fail
mov t, newval
sc  t, mem
beq t, restart  // strong-version only
  fail:
end-barrier

   // outputs: t = 1/0 for success/failure
   //  curval = the current value of the memory, to be stored back into 
expected

where the position of fail: before or after the end-barrier depends on the 
relative
strength of the memory models.  No one is going to want to insert an extra jump
around a barrier, especially for a not-expected failure path.

We should encode the two memory model parameters and the weak parameter into a 
single
CONST_INT operand.  Prob

[4.6,patch,libgfortran/io] Fix build issue with _commit [PR5016]

2011-10-19 Thread Tobias Burnus
I intent to commit this patch - created mostly by Janne - soon to 
gcc-4_6-branch; it should solve a build issue.


The patch was build and regtested on x86-64-linux and build and tested 
on MinGW and Cygwin by Kai.


It is a follow up to http://gcc.gnu.org/ml/fortran/2011-10/msg00132.html 
/ http://gcc.gnu.org/ml/gcc-cvs/2011-10/msg00734.html


Please speak up immediately, if you have objects against the committal. 
(Another option would be to revert previous patch, which solved a severe 
performance regression on Windows - thus that patch plus this patch is 
better.)


Sorry for disrupting the 4.6.2 process a bit.

Tobias
libgfortran/
2011-10-19  Janne Blomqvist  
	Kai Tietz  
	Tobias Burnus  

	PR fortran/50016
	* io/unix.h (flush_sync): Add new libgfortran-internal prototype.
	* io/unix.c (flush_sync): New function, which calls sflush and
	on MinGW(-w64) also _commit.
	(flush_all_units, flush_all_units_1): Replace sflush/_commit by
	flush_sync.
	* io/file_pos.c (st_flush): Replace sflush/_commit by flush_sync.
	* io/intrinsics.c (flush_i4, flush_i8): Ditto.

Index: libgfortran/io/file_pos.c
===
--- libgfortran/io/file_pos.c	(revision 180180)
+++ libgfortran/io/file_pos.c	(working copy)
@@ -451,11 +451,7 @@ st_flush (st_parameter_filepos *fpp)
   if (u->flags.form == FORM_FORMATTED)
 fbuf_flush (u, u->mode);
 
-  sflush (u->s);
-#ifdef _WIN32
-  /* Without _commit, changes are not visible to other file descriptors.  */
-  _commit (u->s->fd);
-#endif
+  flush_sync (u->s);
   unlock_unit (u);
 }
   else
Index: libgfortran/io/unix.c
===
--- libgfortran/io/unix.c	(revision 180180)
+++ libgfortran/io/unix.c	(working copy)
@@ -1522,6 +1522,23 @@ retry:
   return u;
 }
 
+
+/* Flush dirty data, making sure that OS metadata is updated as
+   well. Note that this is VERY slow on mingw due to committing data
+   to stable storage.  */
+int
+flush_sync (stream * s)
+{
+  if (sflush (s) == -1)
+return -1;
+#ifdef __MINGW32__
+  if (_commit (((unix_stream *)s)->fd) == -1)
+return -1;
+#endif
+  return 0;
+}
+
+
 static gfc_unit *
 flush_all_units_1 (gfc_unit *u, int min_unit)
 {
@@ -1538,14 +1555,7 @@ flush_all_units_1 (gfc_unit *u, int min_unit)
 	  if (__gthread_mutex_trylock (&u->lock))
 	return u;
 	  if (u->s)
-	{
-	  sflush (u->s);
-#ifdef _WIN32
-	  /* Without _commit, changes are not visible to other
-		 file descriptors.  */
-	  _commit (u->s->fd);
-#endif
-	}
+	flush_sync (u->s);
 	  __gthread_mutex_unlock (&u->lock);
 	}
   u = u->right;
@@ -1575,12 +1585,7 @@ flush_all_units (void)
 
   if (u->closed == 0)
 	{
-	  sflush (u->s);
-#ifdef _WIN32
-	  /* Without _commit, changes are not visible to other
-	 file descriptors.  */
-	  _commit (u->s->fd);
-#endif
+	  flush_sync (u->s);
 	  __gthread_mutex_lock (&unit_lock);
 	  __gthread_mutex_unlock (&u->lock);
 	  (void) predec_waiting_locked (u);
Index: libgfortran/io/intrinsics.c
===
--- libgfortran/io/intrinsics.c	(revision 180180)
+++ libgfortran/io/intrinsics.c	(working copy)
@@ -206,12 +206,7 @@ flush_i4 (GFC_INTEGER_4 *unit)
   us = find_unit (*unit);
   if (us != NULL)
 	{
-	  sflush (us->s);
-#ifdef _WIN32
-	  /* Without _commit, changes are not visible
-	 to other file descriptors.  */
-	  _commit (u->s->fd);
-#endif
+	  flush_sync (us->s);
 	  unlock_unit (us);
 	}
 }
@@ -234,12 +229,7 @@ flush_i8 (GFC_INTEGER_8 *unit)
   us = find_unit (*unit);
   if (us != NULL)
 	{
-	  sflush (us->s);
-#ifdef _WIN32
-	  /* Without _commit, changes are not visible
-	 to other file descriptors.  */
-	  _commit (u->s->fd);
-#endif
+	  flush_sync (us->s);
 	  unlock_unit (us);
 	}
 }
Index: libgfortran/io/unix.h
===
--- libgfortran/io/unix.h	(revision 180180)
+++ libgfortran/io/unix.h	(working copy)
@@ -167,6 +167,9 @@ internal_proto(is_special);
 extern void flush_if_preconnected (stream *);
 internal_proto(flush_if_preconnected);
 
+extern int flush_sync (stream *);
+internal_proto(flush_sync);
+
 extern int stream_isatty (stream *);
 internal_proto(stream_isatty);
 


[PATCH] Handle VEC_PERM_EXPR and WIDEN_LSHIFT_EXPR in expand_debug_expr (PR middle-end/50754)

2011-10-19 Thread Jakub Jelinek
Hi!

These two tree codes have been added recently, but not to expand_debug_expr.
The following patch adds them there, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?

2011-10-19  Jakub Jelinek  

PR middle-end/50754
* cfgexpand.c (expand_debug_expr): Handle WIDEN_LSHIFT_EXPR, ignore
VEC_PERM_EXPR.

--- gcc/cfgexpand.c.jj  2011-10-18 23:52:04.0 +0200
+++ gcc/cfgexpand.c 2011-10-18 23:54:22.0 +0200
@@ -3267,6 +3267,7 @@ expand_debug_expr (tree exp)
 case VEC_WIDEN_MULT_LO_EXPR:
 case VEC_WIDEN_LSHIFT_HI_EXPR:
 case VEC_WIDEN_LSHIFT_LO_EXPR:
+case VEC_PERM_EXPR:
   return NULL;
 
/* Misc codes.  */
@@ -3321,6 +3322,7 @@ expand_debug_expr (tree exp)
   return NULL;
 
 case WIDEN_SUM_EXPR:
+case WIDEN_LSHIFT_EXPR:
   if (SCALAR_INT_MODE_P (GET_MODE (op0))
  && SCALAR_INT_MODE_P (mode))
{
@@ -3329,7 +3331,8 @@ expand_debug_expr (tree exp)
  0)))
  ? ZERO_EXTEND : SIGN_EXTEND, mode, op0,
  inner_mode);
- return simplify_gen_binary (PLUS, mode, op0, op1);
+ return simplify_gen_binary (TREE_CODE (exp) == WIDEN_LSHIFT_EXPR
+ ? ASHIFT : PLUS, mode, op0, op1);
}
   return NULL;
 

Jakub


[PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)

2011-10-19 Thread Jakub Jelinek
Hi!

Similarly to casts of bool to integer, even stores into bool arrays
can be handled similarly.  Just we need to ensure tree-vect-data-refs.c
doesn't reject vectorization before tree-vect-patterns.c has a chance
to optimize it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-19  Jakub Jelinek  

PR tree-optimization/50596
* tree-vect-stmts.c (vect_mark_relevant): Only use
FOR_EACH_IMM_USE_FAST if lhs is SSA_NAME.
(vectorizable_store): If is_pattern_stmt_p look through
VIEW_CONVERT_EXPR on lhs.
* tree-vect-patterns.c (vect_recog_bool_pattern): Optimize
also stores into bool memory in addition to casts from bool
to integral types.
(vect_mark_pattern_stmts): If pattern_stmt already has vinfo
created, don't create it again.
* tree-vect-data-refs.c (vect_analyze_data_refs): For stores
into bool memory use vectype for integral type corresponding
to bool's mode.
* tree-vect-loop.c (vect_determine_vectorization_factor): Give up
if a store into bool memory hasn't been replaced by the pattern
recognizer.

* gcc.dg/vect/vect-cond-10.c: New test.

--- gcc/tree-vect-stmts.c.jj2011-10-18 23:52:07.0 +0200
+++ gcc/tree-vect-stmts.c   2011-10-19 14:19:00.0 +0200
@@ -159,19 +159,20 @@ vect_mark_relevant (VEC(gimple,heap) **w
   /* This use is out of pattern use, if LHS has other uses that are
  pattern uses, we should mark the stmt itself, and not the pattern
  stmt.  */
-  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
-{
-  if (is_gimple_debug (USE_STMT (use_p)))
-continue;
-  use_stmt = USE_STMT (use_p);
+ if (TREE_CODE (lhs) == SSA_NAME)
+   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+ {
+   if (is_gimple_debug (USE_STMT (use_p)))
+ continue;
+   use_stmt = USE_STMT (use_p);
 
-  if (vinfo_for_stmt (use_stmt)
-  && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt)))
-{
-  found = true;
-  break;
-}
-}
+   if (vinfo_for_stmt (use_stmt)
+   && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt)))
+ {
+   found = true;
+   break;
+ }
+ }
 }
 
   if (!found)
@@ -3656,6 +3657,9 @@ vectorizable_store (gimple stmt, gimple_
 return false;
 
   scalar_dest = gimple_assign_lhs (stmt);
+  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
+  && is_pattern_stmt_p (stmt_info))
+scalar_dest = TREE_OPERAND (scalar_dest, 0);
   if (TREE_CODE (scalar_dest) != ARRAY_REF
   && TREE_CODE (scalar_dest) != INDIRECT_REF
   && TREE_CODE (scalar_dest) != COMPONENT_REF
--- gcc/tree-vect-patterns.c.jj 2011-10-18 23:52:05.0 +0200
+++ gcc/tree-vect-patterns.c2011-10-19 13:55:27.0 +0200
@@ -1933,6 +1933,50 @@ vect_recog_bool_pattern (VEC (gimple, he
   VEC_safe_push (gimple, heap, *stmts, last_stmt);
   return pattern_stmt;
 }
+  else if (rhs_code == SSA_NAME
+  && STMT_VINFO_DATA_REF (stmt_vinfo))
+{
+  stmt_vec_info pattern_stmt_info;
+  vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
+  gcc_assert (vectype != NULL_TREE);
+  if (!check_bool_pattern (var, loop_vinfo))
+   return NULL;
+
+  rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
+  if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
+   {
+ lhs = copy_node (lhs);
+ TREE_TYPE (lhs) = TREE_TYPE (vectype);
+   }
+  else
+   lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
+  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+   {
+ tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+ gimple cast_stmt
+   = gimple_build_assign_with_ops (NOP_EXPR, rhs2, rhs, NULL_TREE);
+ STMT_VINFO_PATTERN_DEF_STMT (stmt_vinfo) = cast_stmt;
+ rhs = rhs2;
+   }
+  pattern_stmt
+   = gimple_build_assign_with_ops (SSA_NAME, lhs, rhs, NULL_TREE);
+  pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL);
+  set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
+  STMT_VINFO_DATA_REF (pattern_stmt_info)
+   = STMT_VINFO_DATA_REF (stmt_vinfo);
+  STMT_VINFO_DR_BASE_ADDRESS (pattern_stmt_info)
+   = STMT_VINFO_DR_BASE_ADDRESS (stmt_vinfo);
+  STMT_VINFO_DR_INIT (pattern_stmt_info) = STMT_VINFO_DR_INIT (stmt_vinfo);
+  STMT_VINFO_DR_OFFSET (pattern_stmt_info)
+   = STMT_VINFO_DR_OFFSET (stmt_vinfo);
+  STMT_VINFO_DR_STEP (pattern_stmt_info) = STMT_VINFO_DR_STEP (stmt_vinfo);
+  STMT_VINFO_DR_ALIGNED_TO (pattern_stmt_info)
+   = STMT_VINFO_DR_

Re: [VTA, PR49310] O(n+m)-ish emit_notes

2011-10-19 Thread H.J. Lu
On Tue, Oct 18, 2011 at 7:50 AM, Alexandre Oliva  wrote:
> On Oct 17, 2011, Jakub Jelinek  wrote:
>
>> On Tue, Oct 11, 2011 at 04:19:52PM -0300, Alexandre Oliva wrote:
>>> Here's what I've got so far.  Regstrapped on x86_64-linux-gnu and
>>> i686-linux-gnu.  Ok to install?
>
>> I see
>> +FAIL: gcc.c-torture/compile/pr19080.c  -O3 -g  (internal compiler error)
>> +FAIL: gcc.c-torture/compile/pr19080.c  -O3 -g  (test for excess errors)
>> regression with this patch on x86_64-linux (--enable-checking=yes,rtl), ICE
>> in var-tracking.  Can you please look at that?
>
> Hey, I'd already fixed that!  It turns out that I posted an outdated
> version of the patch, from before moving an incorrect assertion check
> into a test.
>
>> Just a style nit, isn't { supposed to go on the next line?
>
> Not sure.  Nearly all multi-line enums in gcc/ have brackets at the end
> of the line rather than in the beginning of the subsequent line; I could
> only find one that followed what we both think the conventions require.
> I changed it, thanks.
>
>> Otherwise looks ok.
>
> Thanks, here's what I'm going to install as soon as I complete a new
> regression test with the corrected patch, just to be sure I didn't
> accidentally drop any other fixes.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50799

-- 
H.J.


Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Richard Henderson
On 10/19/2011 09:26 AM, Andrew MacLeod wrote:
> well, they are explicitly'compare_exchange_strong'  and 
> 'compare_exchange_weak' calls...   so yes, they have 'hardcoded' one or the 
> other :-)
> we could alternatively do 2 separate builtins, but I didn't see the point.  
> But Im ambivalent.  

I'm just not keen on issuing errors this late.  If we 'know' it's always going 
to
be 0/1 because of other things the compiler does, we might as well simply do

weak = (opN == const1_rtx)

and leave it at that.

> I clearly misunderstood what you said then, I thought you said use
> QImode :-P.   So, look at target and use that mode?  target may not
> be set sometimes tho right? what do you use then?  QImode?

You never look at target, you look at 

insn_data[icode].operand[opno].mode

if target is set and doesn't match that mode, then create a new pseudo
of the proper mode and let your caller do the conversion.

> If I understand my previous conversation with Lawrence, we need to
> know what mode the native CAS is.

We don't have that knowledge for any of the other rtl patterns; we
ask the rtl pattern to enforce a given mode.  We should do the same
for CAS.

> Worst case, if the native CAS is relaxed, the fences need to be
> issued as such to implement
> 
>  compare_exchange (T* ptr, T* old, new, success, failure)
> 
> memory_fence (success) // release modes
> val = *old;
> res = relaxed_CAS (ptr, val, new)
> if (success)
>   {
> memory_fence (success);   // acquire modes
> return true;
>   }
> *old = res;
> memory_fence (failure)  
> return false;

If the target uses LL/SC, then native CAS is likely relaxed.  However,
we will already be emitting a number of jumps in order to implement
the LL/SC sequence and stuffing the fences into those jumps is going
to be much better than you trying to do it in the middle-end outside
of that sequence.  Really.

> So if the CAS can handle it all, why does it matter if the pattern
> has a single "compressed" parameter for the 3 values, or whether it
> simply explicitly lists all three?

*shrug* Just for less memory consumption.  It *could* remain 3 values.


r~


Re: [PATCH] Handle VEC_PERM_EXPR and WIDEN_LSHIFT_EXPR in expand_debug_expr (PR middle-end/50754)

2011-10-19 Thread Richard Henderson
On 10/19/2011 10:11 AM, Jakub Jelinek wrote:
> Hi!
> 
> These two tree codes have been added recently, but not to expand_debug_expr.
> The following patch adds them there, bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk?
> 
> 2011-10-19  Jakub Jelinek  
> 
>   PR middle-end/50754
>   * cfgexpand.c (expand_debug_expr): Handle WIDEN_LSHIFT_EXPR, ignore
>   VEC_PERM_EXPR.

Ok.


r~


Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Andrew MacLeod

On 10/19/2011 01:55 PM, Richard Henderson wrote:

On 10/19/2011 09:26 AM, Andrew MacLeod wrote:

well, they are explicitly'compare_exchange_strong'  and 
'compare_exchange_weak' calls...   so yes, they have 'hardcoded' one or the 
other :-)
we could alternatively do 2 separate builtins, but I didn't see the point.  But 
Im ambivalent.

I'm just not keen on issuing errors this late.  If we 'know' it's always going 
to
be 0/1 because of other things the compiler does, we might as well simply do

weak = (opN == const1_rtx)

and leave it at that.


We issue all the memory model errors at this same location
.
A user *could* use this builtin and write

bool weak;
<...>
__atomic_compare_exchange (&mem, &expected, desired, weak, m1, m2);

The c++ wrappers will always call it with 0 or 1, but direct callers 
could do other things.  The optimizers may well decide
that its a 1 or 0 by the time we get to expansion.  Its documented as 
requiring a compile time constant.


We either report errors, or remove the possibility by making them 
separate routines.  Or silently make it strong.
I suppose we could also check the value when we turn it from 
__atomic_compare_exchange to __atomic_compare_exchange_{1,2,4,8,16}... 
thats before going into SSA. Im not sure it really buys a lot tho.


Maybe they should just be separate routines...


I clearly misunderstood what you said then, I thought you said use
QImode :-P.   So, look at target and use that mode?  target may not
be set sometimes tho right? what do you use then?  QImode?

You never look at target, you look at

insn_data[icode].operand[opno].mode

if target is set and doesn't match that mode, then create a new pseudo
of the proper mode and let your caller do the conversion.


ah.



Worst case, if the native CAS is relaxed, the fences need to be
issued as such to implement

  compare_exchange (T* ptr, T* old, new, success, failure)

 memory_fence (success) // release modes
 val = *old;
 res = relaxed_CAS (ptr, val, new)
 if (success)
   {
 memory_fence (success);   // acquire modes
 return true;
   }
 *old = res;
 memory_fence (failure)
 return false;

If the target uses LL/SC, then native CAS is likely relaxed.  However,
we will already be emitting a number of jumps in order to implement
the LL/SC sequence and stuffing the fences into those jumps is going
to be much better than you trying to do it in the middle-end outside
of that sequence.  Really.



My point is, IF the fence has to go BEFORE 'val = *old' and AFTER '*old 
= res', its impossible to do it within the CAS pattern,  unless you also 
have it do the copies...


If, which is more likely as I think about it, thats not the case and the 
fences can look like they do in the second example (closer to the CAS 
than the copies) , then the CAS can take care of both fences, which 
would be much more preferable and clean..

So if the CAS can handle it all, why does it matter if the pattern
has a single "compressed" parameter for the 3 values, or whether it
simply explicitly lists all three?

*shrug* Just for less memory consumption.  It *could* remain 3 values.

Does it really make a measurable memory difference?  I can certainly jam 
them together...


Andrew



Re: [4.6,patch,libgfortran/io] Fix build issue with _commit [PR5016]

2011-10-19 Thread Steve Kargl
On Wed, Oct 19, 2011 at 07:11:02PM +0200, Tobias Burnus wrote:
> I intent to commit this patch - created mostly by Janne - soon to 
> gcc-4_6-branch; it should solve a build issue.
> 
> The patch was build and regtested on x86-64-linux and build and tested 
> on MinGW and Cygwin by Kai.
> 
> It is a follow up to http://gcc.gnu.org/ml/fortran/2011-10/msg00132.html 
> / http://gcc.gnu.org/ml/gcc-cvs/2011-10/msg00734.html
> 
> Please speak up immediately, if you have objects against the committal. 
> (Another option would be to revert previous patch, which solved a severe 
> performance regression on Windows - thus that patch plus this patch is 
> better.)
> 
> Sorry for disrupting the 4.6.2 process a bit.
> 

I don't have an objection, just a question.  Does
this issue appear on trunk?  Comment #10 in PR 50016
appears to suggest it does.  In which case, I object
to the initial commit into the 4.6-branch without
testing the patch of trunk first.  Rushing a patch
into a branch to beat a release dead line is simply
a poor choice.

-- 
Steve


Fix g++.dg/compat/struct-layout-1_generate.c options for Windows targets

2011-10-19 Thread Joseph S. Myers
g++.dg/compat/struct-layout-1_generate.c generates a dg-options line
using -mno-mmx -Wno-abi for i?86-*-* x86_64-*-* - and one using
-fno-common, which takes precedence, for alpha*-dec-osf* hppa*-*-hpux*
powerpc*-*-darwin* *-*-mingw32* *-*-cygwin*.  But at least
i686-mingw32 still needs the -Wno-abi option.  This patch adds the
Windows targets to the case for i?86-*-darwin* x86_64-*-darwin* which
already passes both sets of options.

Tested for cross to i686-mingw32.  OK to commit?

2011-10-19  Joseph Myers  

* g++.dg/compat/struct-layout-1_generate.c: Also pass -mno-mmx
-Wno-abi for i?86-*-mingw32* x86_64-*-mingw32* i?86-*-cygwin*.

Index: gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c
===
--- gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c  (revision 
180200)
+++ gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c  (working copy)
@@ -47,7 +47,7 @@
 "/* { dg-options \"%s-I%s\" } */\n",
 "/* { dg-options \"%s-I%s -mno-mmx -Wno-abi\" { target i?86-*-* x86_64-*-* } } 
*/\n",
 "/* { dg-options \"%s-I%s -fno-common\" { target alpha*-dec-osf* hppa*-*-hpux* 
powerpc*-*-darwin* *-*-mingw32* *-*-cygwin* } } */\n",
-"/* { dg-options \"%s-I%s -mno-mmx -fno-common -Wno-abi\" { target 
i?86-*-darwin* x86_64-*-darwin* } } */\n",
+"/* { dg-options \"%s-I%s -mno-mmx -fno-common -Wno-abi\" { target 
i?86-*-darwin* x86_64-*-darwin* i?86-*-mingw32* x86_64-*-mingw32* 
i?86-*-cygwin* } } */\n",
 "/* { dg-options \"%s-I%s -mno-base-addresses\" { target mmix-*-* } } */\n",
 "/* { dg-options \"%s-I%s -mlongcalls -mtext-section-literals\" { target 
xtensa*-*-* } } */\n"
 #define NDG_OPTIONS (sizeof (dg_options) / sizeof (dg_options[0]))

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add an intermediate coverage format for gcov

2011-10-19 Thread Sharad Singhai
Since the updated patch already uses unmangled function names, is it
good to commit then?

Sharad

On Wed, Oct 19, 2011 at 1:48 AM, Jan Hubicka  wrote:
>> On Oct 18, 2011, at 4:19 PM, Sharad Singhai  wrote:
>> > Okay, I liked the idea of self-descriptive tags. I have updated the
>> > patch based on your suggestions. I have simplified the format
>> > somewhat. Instead of repeating function name, I use a 'function' tag
>> > with the format
>> >
>> > function:,,
>>
>> Sound nice.
>>
>> > I also dropped the unmangled function names, they were turning out to
>> > be too unreadable and not really useful in this context.
>>
>> Ah, I'd argue for mangled names.  Every one knows they can stream through 
>> c++filt and get unmangled, but once you unmangle, you can never go back.  
>> Also, the mangled version is significantly smaller.  For c, it is 
>> irrelevant, for c++, it makes a big difference.
>
> I would also support unmangled variant. Otherwise the patch seems resonable 
> to me.
>
> Honza
>> >
>


Re: [PATCH] Add an intermediate coverage format for gcov

2011-10-19 Thread Sharad Singhai
Sorry, I misunderstood your comment. I see that you are asking for
unmangled function names whereas the current patch supports only
mangled names. I can print unmangled names under another option. Would
that work?

Thanks,
Sharad

On Wed, Oct 19, 2011 at 12:06 PM, Sharad Singhai  wrote:
> Since the updated patch already uses unmangled function names, is it
> good to commit then?
>
> Sharad
>
> On Wed, Oct 19, 2011 at 1:48 AM, Jan Hubicka  wrote:
>>> On Oct 18, 2011, at 4:19 PM, Sharad Singhai  wrote:
>>> > Okay, I liked the idea of self-descriptive tags. I have updated the
>>> > patch based on your suggestions. I have simplified the format
>>> > somewhat. Instead of repeating function name, I use a 'function' tag
>>> > with the format
>>> >
>>> > function:,,
>>>
>>> Sound nice.
>>>
>>> > I also dropped the unmangled function names, they were turning out to
>>> > be too unreadable and not really useful in this context.
>>>
>>> Ah, I'd argue for mangled names.  Every one knows they can stream through 
>>> c++filt and get unmangled, but once you unmangle, you can never go back.  
>>> Also, the mangled version is significantly smaller.  For c, it is 
>>> irrelevant, for c++, it makes a big difference.
>>
>> I would also support unmangled variant. Otherwise the patch seems resonable 
>> to me.
>>
>> Honza
>>> >
>>
>


Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-10-19 Thread Xinliang David Li
On Wed, Oct 19, 2011 at 12:02 PM,   wrote:
> Minimized the crash to this:
>
> struct Foo {
>  unsigned bf1:1;
>  unsigned bf2:1;
>  unsigned bf3:1;
> };
>
> void foo (struct Foo *ob) {
>  ob->bf2 = 1;
> }
>
>
>
> D.2731_4 = &ob_1(D)->bf2;
> __asan_base_addr.2 = (long unsigned int) D.2731_4;
> D.2732_5 = __asan_base_addr.2 >> 3;
> D.2733_6 = 1 << 44;
> D.2734_7 = D.2732_5 + D.2733_6;
> D.2735_8 = VIEW_CONVERT_EXPR(D.2734_7);
> # VUSE <.MEM>
> __asan_shadow.3 = *D.2735_8;
> D.2737_9 = __asan_shadow.3 != 0;
> D.2738_10 = __asan_base_addr.2 & 7;
> D.2739_11 = (char) D.2738_10;
> D.2740_12 = D.2739_11 + 0;
> D.2741_13 = D.2740_12 >= __asan_shadow.3;
> __asan_crash_cond.4 = D.2737_9 & D.2741_13;
> if (__asan_crash_cond.4 != 0)
> ./expand_expr_addr_expr_1_err.c: In function ‘foo’:
> ./expand_expr_addr_expr_1_err.c:8:11: internal compiler error: in
> expand_expr_addr_expr_1, at expr.c:7381
>
>
>
> How do I avoid instrumenting bitfields?

Use get_inner_reference to compute the bitpos, and check if it is
multiple of bits_per_unit.

David

>
> http://codereview.appspot.com/5272048/
>


Re: [trans-mem] Add gl_wt TM method.

2011-10-19 Thread Richard Henderson
> Add support for TM-method-specific begin code.
> 
>   * libitm_i.h (GTM::gtm_restart_reason): Re-arrange and clean up
>   declarations.
>   * dispatch.h (GTM::abi_dispatch::begin_or_restart): New.
>   * method-serial.cc: Implement begin_or_restart().
>   * beginend.cc (GTM::gtm_thread::begin_transaction): Call
>   dispatch-specific begin_or_restart().
>   (GTM::gtm_thread::restart): Same.

Ok except,

> +  // Run dispatch-specific restart code. Retry until we succeed.
> +  GTM::gtm_restart_reason rr;
> +  while ((rr = disp->begin_or_restart())
> +  != NUM_RESTARTS)

Please add

  NO_RESTART = NUM_RESTARTS

(or it's own number *after* NUM_RESTARTS, or -1, or something)
to the enumeration and use that name.  Using num_restarts here is confusing.

> Fixed gtm_thread::serialirr_mode to actually use serialirr, not serial.
> 
>   * method-serial.cc (GTM::gtm_thread::serialirr_mode): Fixed: Use
>   serial-irrevocable dispatch, not serial.

Ok.

> Do not free transaction-local memory when committing a nested transaction.
> 
>   * alloc.cc (commit_allocations_2): Do not free transaction-local
>   memory when committing a nested transaction.

Ok.

> Handle re-initialization of the current method group.
> 
>   * retry.cc (GTM::gtm_thread::decide_retry_strategy): Handle
>   re-initialization of the current method group.
>   * libitm_i.h (GTM::gtm_restart_reason): Add restart reason for this.

Ok.

> Undo log is used for both thread-local and shared data.
> 
>   * libitm_i.h: Renamed gtm_local_undo to gtm_undolog_entry.
>   (GTM::gtm_thread): Renamed local_undo to undolog. Renamed
>   undolog-related member functions from *_local to *_undolog.
>   * local.cc (gtm_thread::commit_undolog): Same.
>   * beginend.cc (GTM::gtm_thread::trycommit): Same.
>   (GTM::gtm_thread::rollback): Roll back undolog before
>   dispatch-specific rollback.

Ok.

> Ensure privatization safety if requested by a TM method.
> 
>   * beginend.cc (GTM::gtm_thread::trycommit): Ensure privatization
>   safety if requested by a TM method.
>   * dispatch.h (GTM::abi_dispatch::trycommit): Add parameter for
>   privatization safety.
>   * method-serial.cc: Same.

Ok.

> Add gl_wt TM method.
> 
>   * libitm_i.h: Add gl_wt dispatch.
>   * retry.cc (parse_default_method): Same.
>   * method-gl.cc: New file.
>   * Makefile.am: Use method-gl.cc.
>   * Makefile.in: Rebuild.

Ok with...

> Fix gl_wt commit/rollback when serial lock has been acquired.
> 
>   * method-gl.cc (gl_wt_dispatch::trycommit): Fix interaction with
>   gtm_thread::shared_state when the serial lock is acquired.
>   (gl_wt_dispatch::rollback): Same.

... this merged with the previous commit.

> Fix TLS read accesses on Linux/x86.
> 
>   * config/linux/x86/tls.h (abi_disp): Make TLS slot read volatile.
>   (gtm_thr): Same.

Ok.


r~



Re: [cxx-mem-model] compare_exchange implementation

2011-10-19 Thread Richard Henderson
On 10/19/2011 11:31 AM, Andrew MacLeod wrote:
> If, which is more likely as I think about it, thats not the case and
> the fences can look like they do in the second example (closer to the
> CAS than the copies) , then the CAS can take care of both fences,
> which would be much more preferable and clean..

Indeed.

>>> So if the CAS can handle it all, why does it matter if the pattern
>>> has a single "compressed" parameter for the 3 values, or whether it
>>> simply explicitly lists all three?
>> *shrug* Just for less memory consumption.  It *could* remain 3 values.
>>
> Does it really make a measurable memory difference?  I can certainly jam them 
> together...

Eh, you're right, probably not a measurable difference.


r~


[PATCH] Further i?86 vector permutation fixes

2011-10-19 Thread Jakub Jelinek
Hi!

This patch adds two new permutations to the vshuf* tests (VEC_PACK_TRUNC
style) and fixes problems related to them.  The first two fix issues with
AVX2, the rest of the patch deals with the problem that if the shuffle
mask suggests two operands, testing whether there is a permutation supported
by the hw is done with d.op0 != d.op1, but then when we find out both
arguments are the same, it is performed for d.op0 == d.op1.  Some routines
handle just equality and some non-equality, so it is possible
the check might succeed even when d.op0 == d.op1 using the adjusted mask
won't (which results in ICEs).  This patch in that case retries
using d.op0 != d.op1.

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
with -mavx2 on sde.

2011-10-19  Jakub Jelinek  

* config/i386/i386.c (expand_vec_perm_vpshufb2_vpermq_even_odd): Use
d->op1 instead of d->op0 for the second vpshufb.
(expand_vec_perm_even_odd_1): For V8SImode fix vpshufd immediates.
(ix86_expand_vec_perm_const): If mask indicates two operands are
needed, but both are the same and expanding them as d.op0 == d.op1
failed, retry with d.op0 != d.op1.
(ix86_expand_vec_perm_builtin): Likewise.  Handle sorry printing
also for d.nelt == 32.

* gcc.dg/torture/vshuf-32.inc: Add interleave permutations.
* gcc.dg/torture/vshuf-16.inc: Likewise.
* gcc.dg/torture/vshuf-8.inc: Likewise.
* gcc.dg/torture/vshuf-4.inc: Likewise.

--- gcc/config/i386/i386.c.jj   2011-10-18 23:52:02.0 +0200
+++ gcc/config/i386/i386.c  2011-10-19 19:02:57.0 +0200
@@ -35992,7 +35992,7 @@ expand_vec_perm_vpshufb2_vpermq_even_odd
   vperm = force_reg (V32QImode, vperm);
 
   h = gen_reg_rtx (V32QImode);
-  op = gen_lowpart (V32QImode, d->op0);
+  op = gen_lowpart (V32QImode, d->op1);
   emit_insn (gen_avx2_pshufbv32qi3 (h, op, vperm));
 
   ior = gen_reg_rtx (V32QImode);
@@ -36154,9 +36154,9 @@ expand_vec_perm_even_odd_1 (struct expan
   /* Swap the 2nd and 3rd position in each lane into
 { 0 2 1 3 8 a 9 b } and { 4 6 5 7 c e d f }.  */
   emit_insn (gen_avx2_pshufdv3 (t1, t1,
-   GEN_INT (2 * 2 + 1 * 16 + 3 * 64)));
+   GEN_INT (2 * 4 + 1 * 16 + 3 * 64)));
   emit_insn (gen_avx2_pshufdv3 (t2, t2,
-   GEN_INT (2 * 2 + 1 * 16 + 3 * 64)));
+   GEN_INT (2 * 4 + 1 * 16 + 3 * 64)));
 
   /* Now an vpunpck[lh]qdq will produce
 { 0 2 4 6 8 a c e } resp. { 1 3 5 7 9 b d f }.  */
@@ -36498,6 +36498,7 @@ ix86_expand_vec_perm_builtin (tree exp)
 {
   struct expand_vec_perm_d d;
   tree arg0, arg1, arg2;
+  bool maybe_retry = false;
 
   arg0 = CALL_EXPR_ARG (exp, 0);
   arg1 = CALL_EXPR_ARG (exp, 1);
@@ -36543,6 +36544,7 @@ ix86_expand_vec_perm_builtin (tree exp)
for (i = 0; i < nelt; ++i)
  if (d.perm[i] >= nelt)
d.perm[i] -= nelt;
+   maybe_retry = true;
   }
   /* FALLTHRU */
 
@@ -36563,6 +36565,28 @@ ix86_expand_vec_perm_builtin (tree exp)
   if (ix86_expand_vec_perm_builtin_1 (&d))
 return d.target;
 
+  /* If the mask says both arguments are needed, but they are the same,
+ the above tried to expand with d.op0 == d.op1.  If that didn't work,
+ retry with d.op0 != d.op1 as that is what testing has been done with.  */
+  if (maybe_retry)
+{
+  rtx seq;
+  bool ok;
+
+  extract_vec_perm_cst (&d, arg2);
+  d.op1 = gen_reg_rtx (d.vmode);
+  start_sequence ();
+  ok = ix86_expand_vec_perm_builtin_1 (&d);
+  seq = get_insns ();
+  end_sequence ();
+  if (ok)
+   {
+ emit_move_insn (d.op1, d.op0);
+ emit_insn (seq);
+ return d.target;
+   }
+}
+
   /* For compiler generated permutations, we should never got here, because
  the compiler should also be checking the ok hook.  But since this is a
  builtin the user has access too, so don't abort.  */
@@ -36588,6 +36612,19 @@ ix86_expand_vec_perm_builtin (tree exp)
 d.perm[8], d.perm[9], d.perm[10], d.perm[11],
 d.perm[12], d.perm[13], d.perm[14], d.perm[15]);
   break;
+case 32:
+  sorry ("vector permutation "
+"(%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
+"%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d)",
+d.perm[0], d.perm[1], d.perm[2], d.perm[3],
+d.perm[4], d.perm[5], d.perm[6], d.perm[7],
+d.perm[8], d.perm[9], d.perm[10], d.perm[11],
+d.perm[12], d.perm[13], d.perm[14], d.perm[15],
+d.perm[16], d.perm[17], d.perm[18], d.perm[19],
+d.perm[20], d.perm[21], d.perm[22], d.perm[23],
+d.perm[24], d.perm[25], d.perm[26], d.perm[27],
+d.perm[28], d.perm[29], d.perm[30], d.perm[31]);
+  break;
 default:
   gcc_unreachable ();
 }
@@ -36599,6 +36636,7 @@ b

[Patch ARM] Fix PR target/50106

2011-10-19 Thread Ramana Radhakrishnan
Hi,

This fixes PR target/50106 which was missing handling return register
size from 1-3 for Thumb1. Fixed thusly. Final testing on-going with
arm-linux-gnueabi with thumb1.

Ok to backport to 4.6 branch given it is branch freeze time ? I'll be
able to commit this to the branch latest by tomorrow A.M. UK time.

cheers
Ramana


2011-10-19  Ramana Radhakrishnan  

PR target/50106
* config/arm/arm.c (thumb_unexpanded_epilogue): Handle return
reg size from 1-3.


Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 180200)
+++ gcc/config/arm/arm.c(working copy)
@@ -21652,7 +21652,7 @@
   if (extra_pop > 0)
 {
   unsigned long extra_mask = (1 << extra_pop) - 1;
-  live_regs_mask |= extra_mask << (size / UNITS_PER_WORD);
+  live_regs_mask |= extra_mask << ((size + 3) / UNITS_PER_WORD);
 }

   /* The prolog may have pushed some high registers to use as


Re: [Patch ARM] Fix PR target/50106

2011-10-19 Thread Nathan Froyd

On 10/19/2011 3:27 PM, Ramana Radhakrishnan wrote:

Index: gcc/config/arm/arm.c
-  live_regs_mask |= extra_mask<<  (size / UNITS_PER_WORD);
+  live_regs_mask |= extra_mask<<  ((size + 3) / UNITS_PER_WORD);


IIUC, wouldn't ((size + UNITS_PER_WORD - 1) / UNITS_PER_WORD) be clearer?

-Nathan


[PATCH, testsuite]: Require non_strict_align effective target for gcc.dg/ipa/ipa-sra-[26].c

2011-10-19 Thread Uros Bizjak
Hello!

These two tests require non_strict_aligned effective target, since IPA
fails in tree_non_mode_aligned_mem_p () for "cow" and "calf"
candidates for STRICT_ALIGNMENT targets. Mode alignment requires 32
bytes, while data is aligned to 8 bytes.

2011-10-19  Uros Bizjak  

* gcc.dg/ipa/ipa-sra-2.c: Add dg-require-effective-target
non_strict_align.
* gcc.dg/ipa/ipa-sra-6.c: Ditto.

Tested on x86_64-pc-linux-gnu and alphaev68-pc-linux-gnu, where the
patch "fixes":

FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace
expr cow_.*D.->red with \\*ISRA"
FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace
expr cow_.*D.->green with ISRA"
FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace
expr calf_.*D.->red with \\*ISRA"
FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra "About to replace
expr calf_.*D.->green with ISRA"
FAIL: gcc.dg/ipa/ipa-sra-6.c scan-tree-dump-times eipa_sra "foo " 1

OK for mainline and 4.6 branch?

Uros.
Index: gcc.dg/ipa/ipa-sra-2.c
===
--- gcc.dg/ipa/ipa-sra-2.c  (revision 180193)
+++ gcc.dg/ipa/ipa-sra-2.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fipa-sra -fdump-tree-eipa_sra-details"  } */
+/* { dg-require-effective-target non_strict_align } */
 
 struct bovid
 {
Index: gcc.dg/ipa/ipa-sra-6.c
===
--- gcc.dg/ipa/ipa-sra-6.c  (revision 180193)
+++ gcc.dg/ipa/ipa-sra-6.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fipa-sra -fdump-tree-eipa_sra-slim"  } */
+/* { dg-require-effective-target non_strict_align } */
 
 struct bovid
 {


Re: [PATCH, testsuite]: Require non_strict_align effective target for gcc.dg/ipa/ipa-sra-[26].c

2011-10-19 Thread David Miller
From: Uros Bizjak 
Date: Wed, 19 Oct 2011 21:50:14 +0200

> Hello!
> 
> These two tests require non_strict_aligned effective target, since IPA
> fails in tree_non_mode_aligned_mem_p () for "cow" and "calf"
> candidates for STRICT_ALIGNMENT targets. Mode alignment requires 32
> bytes, while data is aligned to 8 bytes.
> 
> 2011-10-19  Uros Bizjak  
> 
>   * gcc.dg/ipa/ipa-sra-2.c: Add dg-require-effective-target
>   non_strict_align.
>   * gcc.dg/ipa/ipa-sra-6.c: Ditto.
> 
> Tested on x86_64-pc-linux-gnu and alphaev68-pc-linux-gnu, where the
> patch "fixes":

Thanks for taking care of this, I've been seeing these on sparc as well.


Re: [PATCH 2/6] Generate virtual locations for tokens

2011-10-19 Thread Dodji Seketeli
This is probably going to take some back and forth as I don't have any
SPU target at hand to debug, so I have filed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50801 to track this issue,
along with a candidate fix.  Could you please test it and reply to the
bug with your results?

Thank you for your time and I apologize for the inconvenience.

-- 
Dodji


Re: Fix g++.dg/compat/struct-layout-1_generate.c options for Windows targets

2011-10-19 Thread Mike Stump
On Oct 19, 2011, at 11:58 AM, "Joseph S. Myers"  wrote:
> g++.dg/compat/struct-layout-1_generate.c generates a dg-options line
> using -mno-mmx -Wno-abi for i?86-*-* x86_64-*-* - and one using
> -fno-common, which takes precedence, for alpha*-dec-osf* hppa*-*-hpux*
> powerpc*-*-darwin* *-*-mingw32* *-*-cygwin*.  But at least
> i686-mingw32 still needs the -Wno-abi option.  This patch adds the
> Windows targets to the case for i?86-*-darwin* x86_64-*-darwin* which
> already passes both sets of options.

> OK to commit?

Ok.


Re: [PATCH] Further i?86 vector permutation fixes

2011-10-19 Thread Richard Henderson
On 10/19/2011 12:25 PM, Jakub Jelinek wrote:
> 2011-10-19  Jakub Jelinek  
> 
>   * config/i386/i386.c (expand_vec_perm_vpshufb2_vpermq_even_odd): Use
>   d->op1 instead of d->op0 for the second vpshufb.
>   (expand_vec_perm_even_odd_1): For V8SImode fix vpshufd immediates.
>   (ix86_expand_vec_perm_const): If mask indicates two operands are
>   needed, but both are the same and expanding them as d.op0 == d.op1
>   failed, retry with d.op0 != d.op1.
>   (ix86_expand_vec_perm_builtin): Likewise.  Handle sorry printing
>   also for d.nelt == 32.
> 
>   * gcc.dg/torture/vshuf-32.inc: Add interleave permutations.
>   * gcc.dg/torture/vshuf-16.inc: Likewise.
>   * gcc.dg/torture/vshuf-8.inc: Likewise.
>   * gcc.dg/torture/vshuf-4.inc: Likewise.

Ok.

Although I think a good followup would be to fix

> +  if (which == 3 && d.op0 == d.op1)
> +{
> +  rtx seq;
> +  bool ok;
> +
> +  memcpy (d.perm, perm, sizeof (perm));
> +  d.op1 = gen_reg_rtx (d.vmode);
> +  start_sequence ();
> +  ok = ix86_expand_vec_perm_builtin_1 (&d);
> +  seq = get_insns ();
> +  end_sequence ();
> +  if (ok)
> + {
> +   emit_move_insn (d.op1, d.op0);
> +   emit_insn (seq);

this so that we don't need a copy to another register.
That could be done by adding a d.one_operand field, and
using that test instead of explicit equality everywhere.


r~


Re: C++ PATCH for c++/49172 (allow constexpr reference variables)

2011-10-19 Thread Jason Merrill

On 06/22/2011 12:19 AM, Jason Merrill wrote:

We decided at the Madrid meeting to allow constexpr reference variables,
and this patch adds that support to GCC.


The cp_parser_initializer_clause hunk also fixes 50787, a bug with 
direct binding of references in 4.6, so I'm applying that one there.


Tested x86_64-pc-linux-gnu.



Re: [4.6,patch,libgfortran/io] Fix build issue with _commit [PR5016]

2011-10-19 Thread Tobias Burnus
Pre-remark: Jakub (4.6.2 RM) had no objects - and thus I have committed 
it to the branch.


Steve Kargl wrote:
I don't have an objection, just a question. Does this issue appear on 
trunk?


Yes and no. The MinGW build issue this patch fixes was unique to the branch.

However, the actual issue - the slow down - exists both on the 4.6 
branch and on the trunk. The problem is that Janne and I couldn't agree 
when to call _commit(). We solved that by deferring it to later for the 
trunk (still time before the 4.7.0 release) while we wanted to have some 
fix for 4.6.2 - thus, we agreed on some fix for 4.6. The fix on the 
branch now calls significatly less often _commit() than before - but it 
probably still calls it too often.


Regarding rushing with the fix: I agree that is was less optimal; the 
main problem is that we do not have any MinGW gfortran developer who can 
properly test such patches and who can detect such regressions more 
timely. (The performance regression dates back over a year.)


Tobias


Re: [PATCH] Add an intermediate coverage format for gcov

2011-10-19 Thread Mike Stump
On Oct 19, 2011, at 12:12 PM, Sharad Singhai  wrote:
> Sorry, I misunderstood your comment. I see that you are asking for
> unmangled function names whereas the current patch supports only
> mangled names. I can print unmangled names under another option. Would
> that work?

I defer to the person that says ok.  :-)


Re: [PATCH] Further i?86 vector permutation fixes

2011-10-19 Thread Jakub Jelinek
On Wed, Oct 19, 2011 at 01:36:25PM -0700, Richard Henderson wrote:
> Although I think a good followup would be to fix
> this so that we don't need a copy to another register.
> That could be done by adding a d.one_operand field, and
> using that test instead of explicit equality everywhere.

Perhaps.

BTW, I've recently noticed that CSE isn't able to CSE when
SET_DEST is a SUBREG, even when it is same size SUBREG.  Apparently
there are hundreds of places in at least the i386 backend that
do emit_insn (gen_* (gen_lowpart (V???mode, something), ...));
Some of them could be changed without big difficulties
(if something is used in the same routine, just create the pseudo
with different mode and use gen_lowpart on the use side instead),
but there are many places that use say gen_lowpart (V???mode, target)
on the LHS.
Should CSE be taught to handle this, or should CSE early on
replace these (set (subreg:M (reg:N X) 0) ...) with
(set (reg:M Y) ...)
(set (reg:N X) (subreg:N (reg:M Y) 0))
so that CSE etc. can optimize it, something else?

Jakub


[PATCH, testsuite]: Fix PR 50796, FAIL: gcc.dg/plugin/plugindir[1234].c

2011-10-19 Thread Uros Bizjak
Hello!

Update dg-prune-output due to inacessible -> inaccessible change.

2011-10-19  Uros Bizjak  

PR testsuite/50796
* gcc.dg/plugin/plugindir?.c Update dg-prune-output.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.
Index: gcc.dg/plugin/plugindir1.c
===
--- gcc.dg/plugin/plugindir1.c  (revision 180206)
+++ gcc.dg/plugin/plugindir1.c  (working copy)
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "-c -fplugin=foo" } */
 
-/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short 
plugin name.*" } */
+/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from short 
plugin name.*" } */
Index: gcc.dg/plugin/plugindir2.c
===
--- gcc.dg/plugin/plugindir2.c  (revision 180206)
+++ gcc.dg/plugin/plugindir2.c  (working copy)
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "-save-temps -c -fplugin=foo" } */
 
-/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short 
plugin name.*" } */
+/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from short 
plugin name.*" } */
Index: gcc.dg/plugin/plugindir3.c
===
--- gcc.dg/plugin/plugindir3.c  (revision 180206)
+++ gcc.dg/plugin/plugindir3.c  (working copy)
@@ -1,4 +1,4 @@
 /* { dg-do preprocess } */
 /* { dg-options "-fplugin=foo" } */
 
-/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short 
plugin name.*" } */
+/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from short 
plugin name.*" } */
Index: gcc.dg/plugin/plugindir4.c
===
--- gcc.dg/plugin/plugindir4.c  (revision 180206)
+++ gcc.dg/plugin/plugindir4.c  (working copy)
@@ -1,4 +1,4 @@
 /* { dg-do preprocess } */
 /* { dg-options "-iplugindir=my-plugindir -fplugin=foo" } */
 
-/* { dg-prune-output ".*inacessible plugin file.*my-plugindir/foo\.so expanded 
from short plugin name.*" } */
+/* { dg-prune-output ".*inaccessible plugin file.*my-plugindir/foo\.so 
expanded from short plugin name.*" } */


Re: [PATCH, testsuite]: Fix PR 50796, FAIL: gcc.dg/plugin/plugindir[1234].c

2011-10-19 Thread Andreas Schwab
Uros Bizjak  writes:

> Index: gcc.dg/plugin/plugindir1.c
> ===
> --- gcc.dg/plugin/plugindir1.c(revision 180206)
> +++ gcc.dg/plugin/plugindir1.c(working copy)
> @@ -1,4 +1,4 @@
>  /* { dg-do compile } */
>  /* { dg-options "-c -fplugin=foo" } */
>  
> -/* { dg-prune-output ".*inacessible plugin file.*foo\.so expanded from short 
> plugin name.*" } */
> +/* { dg-prune-output ".*inaccessible plugin file.*foo\.so expanded from 
> short plugin name.*" } */

Another case of pruning too much.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


regcprop.c bug fix

2011-10-19 Thread Mike Stump
So while tracking down a hairy address reload for an output reload bug, 
copyprop_hardreg_forward_1 was faulting because it was trying to extract move 
patterns that didn't work out, and when it came back to the code, it then tries 
to access recog_data, but the problem is, the exploration of other instructions 
to see if they match, overwrites that data, and there is nothing that restores 
the data to a point in which the code below this point expects.  It uses 
recog_data.operand[i], where i is limited by n_ops, but that value corresponded 
to the old data in recog_data.  The recog and extract_insn in insn_invalid_p 
called from verify_changes called from apply_change_group called from 
validate_change wipes the `old' recog_data with new data.  This data, for 
example, might only have 2 operands, with an invalid value for the third 
operand.  The old n_ops, might well be 3 from the original data.  Accessing 
that data can cause a crash.

So, I don't know if people want to data regenerated, or it they want to cache 
the data, or if they want to save and restore it underneath a called api...

Ok?

Index: regcprop.c
===
--- regcprop.c  (revision 1831)
+++ regcprop.c  (working copy)
@@ -866,6 +866,10 @@ copyprop_hardreg_forward_1 (basic_block
}
}
}
+  extract_insn (insn);
+  if (! constrain_operands (1))
+   fatal_insn_not_found (insn);
+  preprocess_constraints ();
   no_move_special_case:
 
   any_replacements = false;



[PATCH, m68k] Fix floating-point comparisons with zero

2011-10-19 Thread Julian Brown
Hi,

It appears that m68k floating-point comparisons may have been
subtly broken since 2007 when the following patch was applied:

  http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01570.html

The bug shows with our internal 4.6-based branch, but appears to be
latent on mainline (generated code is perturbed by r171236, an
apparently-unrelated change -- if that commit alone is reverted, the bug
reappears on current trunk). The discussion below was written wrt. the
previously-mentioned 4.6 branch, so details may be slightly different on
mainline, but the outcome is the same.

The trouble is the code introduced in m68k.c:notice_update_cc, intended
to reverse the sense of comparisons for one particular
*cmp_68881/*cmp_cf alternative, inadvertently also inverts
the sense for tst_68881/tst_cf (FP comparisons against
zero).

For the test case gcc.c-torture/execute/ieee/20010114-2.c, the
badness only triggers when -freorder-blocks is in effect (-O2 and
above). Two comparisons (of the same variable) against zero are munged
into one in final.c after basic-block reordering makes them
consecutive, and we get:

ftst.s %d0  | x
fjgt .L13   |
(final.c omits a redundant "ftst.s %d0" here)
fjngt .L2   |

when we ought to have got:

ftst.s %d0  | x
fjgt .L13   |
fjnlt .L2   |

The bug only triggers due to the omission of the test, else it would
happen (far) more often: if the ftst is actually emitted, the flags get
reset before any harm can be done:

(define_insn "tst_68881"
  [(set (cc0)
(compare (match_operand:FP 0 "general_operand" "fm")
 (match_operand:FP 1 "const0_operand" "H")))]
  "TARGET_68881"
{
  cc_status.flags = CC_IN_68881;  /* unsets CC_REVERSED & other flags */
  ...

but, when final.c decides the second ftst is redundant, nothing unsets
CC_REVERSED, so we actually (incorrectly) treat the comparison as
having been done with reversed operands.

We can fix this by never setting CC_REVERSED for ftst instructions in
the first place. The problem is that the test (in notice_update_cc),

  if (cc_status.value2 && GET_CODE (cc_status.value2) == COMPARE
  && GET_MODE_CLASS (GET_MODE (XEXP (cc_status.value2, 0))) == MODE_FLOAT)
{
  cc_status.flags = CC_IN_68881;
  if (!FP_REG_P (XEXP (cc_status.value2, 0)))
cc_status.flags |= CC_REVERSED;
}

is not strict enough, and triggers for ftst instructions (even elided
ones), as well as for the intended third alternative of fcmp.

So, the fix is to tighten up the inner condition to not trigger for
ftst. The attached patch does that, and we get the following test result
delta (as well as some noise in libstdc++ results, but I think those
are environment-related):

FAIL -> PASS: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O2 
FAIL -> PASS: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O3 
-fomit-frame-pointer 
FAIL -> PASS: gcc.c-torture/execute/ieee/20010114-2.c execution,  -O3 -g 
FAIL -> PASS: gcc.c-torture/execute/ieee/20030331-1.c execution,  -O2 
FAIL -> PASS: gcc.c-torture/execute/ieee/20030331-1.c execution,  -O3 
-fomit-frame-pointer 
FAIL -> PASS: gcc.c-torture/execute/ieee/20030331-1.c execution,  -O3 -g 

These results are against mainline *with r171236 reverted*, cross to
ColdFire Linux -- otherwise results are the same before/after. I believe
the patch is still needed regardless though.

OK to apply?

Thanks,

Julian

ChangeLog

gcc/
* config/m68k/m68k.c (notice_update_cc): Tighten condition for
setting CC_REVERSED for FP comparisons.
Index: gcc/config/m68k/m68k.c
===
--- gcc/config/m68k/m68k.c	(revision 180197)
+++ gcc/config/m68k/m68k.c	(working copy)
@@ -4206,7 +4206,8 @@ notice_update_cc (rtx exp, rtx insn)
   && GET_MODE_CLASS (GET_MODE (XEXP (cc_status.value2, 0))) == MODE_FLOAT)
 {
   cc_status.flags = CC_IN_68881;
-  if (!FP_REG_P (XEXP (cc_status.value2, 0)))
+  if (!FP_REG_P (XEXP (cc_status.value2, 0))
+	  && FP_REG_P (XEXP (cc_status.value2, 1)))
 	cc_status.flags |= CC_REVERSED;
 }
 }


[C++ Patch] PR 48630 (PR 31423)

2011-10-19 Thread Paolo Carlini

Hi,

in these two twin PRs filed by Ian and Gerald, it is pointed out that 
cases like:


   struct C {
 int f() { return 1; }
   };

   int f(C&c) {
 return ( 1 == c.f );
   }

where the user actually forgot the open-closed round braces are much 
more likely than cases where an ampersand is missing, still we output


invalid use of member (did you forget the ‘&’ ?)

Thus, the idea of saying instead

invalid use of member (did you forget the ‘()’ ?)

which I implemented in the patchlet below. Alternately we could give 
both hints, or refer to the argument list.


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2011-10-19  Paolo Carlini  

PR c++/31423
PR c++/48630
* typeck2.c (cxx_incomplete_type_diagnostic): Improve error message
for invalid use of member.

/testsuite
2011-10-19  Paolo Carlini  

PR c++/31423
PR c++/48630
* g++.dg/parse/error42.C: New.



Index: testsuite/g++.dg/parse/error42.C
===
--- testsuite/g++.dg/parse/error42.C(revision 0)
+++ testsuite/g++.dg/parse/error42.C(revision 0)
@@ -0,0 +1,5 @@
+// PR c++/48630
+// { dg-options "" }
+
+class C { public: C* f(); int get(); }; // { dg-error "forget the 
'\\(\\)'|base operand" }
+int f(C* p) { return p->f->get(); }
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 180206)
+++ cp/typeck2.c(working copy)
@@ -429,7 +429,7 @@ cxx_incomplete_type_diagnostic (const_tree value,
 case OFFSET_TYPE:
 bad_member:
   emit_diagnostic (diag_kind, input_location, 0,
-  "invalid use of member (did you forget the %<&%> ?)");
+  "invalid use of member (did you forget the %<()%> ?)");
   break;
 
 case TEMPLATE_TYPE_PARM:


Re: [C++ Patch] PR 48630 (PR 31423)

2011-10-19 Thread Gabriel Dos Reis
On Wed, Oct 19, 2011 at 4:34 PM, Paolo Carlini  wrote:
> Hi,
>
> in these two twin PRs filed by Ian and Gerald, it is pointed out that cases
> like:
>
>   struct C {
>     int f() { return 1; }
>   };
>
>   int f(C&c) {
>     return ( 1 == c.f );
>   }
>
> where the user actually forgot the open-closed round braces are much more
> likely than cases where an ampersand is missing, still we output
>
> invalid use of member (did you forget the ‘&’ ?)
>
> Thus, the idea of saying instead
>
> invalid use of member (did you forget the ‘()’ ?)
>
> which I implemented in the patchlet below. Alternately we could give both
> hints, or refer to the argument list.

I agree that '()' is a better default.  But we can do better.
We can use the type context, e.g. in initialization or
return-statement etc., to decide
whether () is intended (by looking at the TREE_TYPE of a member function type),
and only in cases where we can't tell we suggest both alternative:

 (did you intend a function call or address of non-static member function?)


Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)

2011-10-19 Thread Xinliang David Li
what kind of failures?

David

On Wed, Oct 19, 2011 at 2:04 PM,   wrote:
> On 2011/10/19 20:38:35, kcc wrote:
>>
>> Added code to avoid bitfields.
>
> Is there anything I should know about splitting basic blocks in the
> presence of EH?
> Currently, asan fails on 483.xalancbmk, 453.povray and 450.soplex, which
> are known to have EH.
>
> http://codereview.appspot.com/5272048/
>


Re: [PATCH] Account for devirtualization opportunities in inliner

2011-10-19 Thread Maxim Kuvyrkov
On 28/09/2011, at 4:56 PM, Maxim Kuvyrkov wrote:

> Jan,
> 
> The following patch starts a series of patches which improve devirtualization 
> optimizations in GCC.
> 
> This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing 
> parameters and jump functions and adds basic estimation of devirtualization 
> benefit from inlining an edge.  E.g., if inlining A across edge E into B will 
> allow some of the indirect edges of A to be resolved, then inlining cost of 
> edge E is reduced.
> 
> The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 and 
> -m64 multilibs.
> 
> OK to commit?

Ping.

The primary change of this patch is to make evaluate_conditions_for_edge to 
output KNOWN_VALS and KNOWN_BINFOS arrays in addition to conditions for a 
callsite.  KNOWN_VALS and KNOWN_BINFOS are then passed on to a subroutine of 
estimate_calls_size_and_time, which uses ipa-prop.c infrastructure to check if 
it will be possible to devirtualize any of the indirect edged within callee.  
If possible, then *size and *time returned by estimate_calls_size_and_time are 
reduced to account for the devirtualization benefits.

OK for trunk?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



fsf-gcc-devirt-account.ChangeLog
Description: Binary data


fsf-gcc-devirt-account.patch
Description: Binary data


Re: [Patch, Fortran] PR 47023: C_Sizeof: Rejects valid code

2011-10-19 Thread Janus Weil
>> If I saw it correctly, one still needs to fix the vendor-extension SIZEOF
>> for procedure pointers (it currently returns one byte!)
>
> Right. Actually I don't quite understand where that 1 byte comes from.

I have just committed the patch from comment #23 of the PR (which has
been approved by Tobias privately), rejecting procedure pointers as
actual arguments of SIZEOF:

http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180210

Patch includes documentation in intrinsic.texi (where I also fixed
another minor nit).

Cheers,
Janus


Re: [patch] Fix PR tree-optimization/49960 ,Fix self data dependence

2011-10-19 Thread Razya Ladelsky
gcc-patches-ow...@gcc.gnu.org wrote on 17/10/2011 09:03:59 AM:

> From: Richard Guenther 
> To: Razya Ladelsky/Haifa/IBM@IBMIL
> Cc: gcc-patches@gcc.gnu.org, Sebastian Pop 
> Date: 17/10/2011 09:04 AM
> Subject: Re: [patch] Fix PR tree-optimization/49960 ,Fix self data 
dependence
> Sent by: gcc-patches-ow...@gcc.gnu.org
> 
> On Mon, Oct 17, 2011 at 8:23 AM, Razya Ladelsky  
wrote:
> > This patch fixes the failures described in
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49960
> > It also fixes bzips when run with autopar enabled.
> >
> > In both cases the self dependences are not handled correctly.
> > In the first case, a non affine access is analyzed:
> > in the second, the distance vector is not calculated correctly (the
> > distance vector considered for for self dependences is always 
(0,0,...))
> >
> > As  a result, the loops get wrongfully parallelized.
> >
> > The patch avoids the special handling of  self dependences, and 
analyzes
> > all dependences in the same way. Specific adjustments
> > and support for the self dependence cases were made.
> 
> Can you elaborate on
> 
> @@ -3119,8 +3135,11 @@ add_other_self_distances (struct 
data_dependence_r
> {
>   if (DDR_NUM_SUBSCRIPTS (ddr) != 1)
>{
> -DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
> -return;
> +if (DDR_NUM_SUBSCRIPTS (ddr) != 2 || !integer_zerop 
(DR_ACCESS_FN
> (DDR_A (ddr), 1)))
> +  {
> +DDR_ARE_DEPENDENT (ddr) = chrec_dont_know;
> +return;
> +  }
>}
> 
>   access_fun = DR_ACCESS_FN (DDR_A (ddr), 0);
> 
> ?  It needed a comment before, and now so even more.
> 
> The rest of the patch is ok, I suppose the above hunk is to enhance
> something, not
> to fix the bug?

For fortran code like:

  DO 140 J=1,MB
 DO 130 K=1,NA
BKJ=B(K,J)
IF(BKJ.EQ.ZERO) GO TO 130
   DO 120 I=1,MA
  C(I,J)=C(I,J)+A(K,I)*BKJ
  120  CONTINUE
  130CONTINUE
  140 CONTINUE
  RETURN


The access functions for the C(i j) self dependence are:

(Data Dep: 
#(Data Ref: 
#  bb: 9 
#  stmt: D.1427_79 = *c_78(D)[D.1426_77];
#  ref: *c_78(D)[D.1426_77];
#  base_object: *c_78(D);
#  Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, 
stride.12_25}_1, +, 1}_3
#  Access function 1: 0B
#)
#(Data Ref: 
#  bb: 9 
#  stmt: *c_78(D)[D.1426_77] = D.1433_88;
#  ref: *c_78(D)[D.1426_77];
#  base_object: *c_78(D);
#  Access function 0: {{(stride.12_25 + 1) + offset.13_36, +, 
stride.12_25}_1, +, 1}_3
#  Access function 1: 0B
#)


Two dimesions are created to describe C(i j) although there's no need for 
access function 1 which is just 0B.


If this was a C code, we would have these two access functions for 
C[i][j]:

(Data Dep: 
#(Data Ref: 
#  bb: 5 
#  stmt: t_10 = C[i_33][j_37];
#  ref: C[i_33][j_37];
#  base_object: C
#  Access function 0: {3, +, 1}_3
#  Access function 1: {3, +, 1}_2
#)
#(Data Ref: 
#  bb: 5 
#  stmt: C[i_33][j_37] = D.3852_15;
#  ref: C[i_33][j_37];
#  base_object: C
#  Access function 0: {3, +, 1}_3
#  Access function 1: {3, +, 1}_2
#)


In order to handle the Fortran data accesses, even for simple cases as 
above, 
I would need to handle multivariate accesses.
The data dependence analysis doesn't know how to handle such dependences 
if there's more than one subscript.
The above Frotran code doesn't actually have two subscripts, but one, and 
thus should be handled.
 
The reason this issue came up only with the changes of this patch is that 
now 
add_other_self_distances is called from build_classic_dist_vector, which 
is called also for self dependences.
Before the patch, the distance vector for self dependences was always 
determined as a vector of 0's, and build_classic_dist_vector
was not called.

I hope it's clearer now, I will add a comment to the code, and submit it 
before committing it.

Thanks,
Razya


> 
> Thanks,
> Richard.
> 
> > Bootstrap and testsuite pass successfully for ppc64-redhat-linux.
> >
> > OK for trunk?
> > Thank you,
> > Razya
> >
> >
> > ChangeLog:
> >
> >PR tree-optimization/49960
> >* tree-data-ref.c (compute_self_dependence): Remove.
> > (initialize_data_dependence_relation): Add intializations.
> > Remove compute_self_dependence.
> > (add_other_self_distances): Add support for two dimensions 
if
> > the second is zero.
> > (compute_affine_dependence): Remove the 
!DDR_SELF_REFERENCE
> > condition.
> > (compute_all_dependences): Remove call to
> > compute_self_dependence. Add call to compute_affine_dependence
> 
> > testsuite/ChangeLog:
> >
> >PR tree-optimization/49660
> >* gcc.dg/autopar/pr49660.c: New test.
> >   * gcc.dg/autopar/pr49660-1.c: New test.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >



Re: [trans-mem] Add gl_wt TM method.

2011-10-19 Thread Torvald Riegel
Committed with the changes below.

On Wed, 2011-10-19 at 12:17 -0700, Richard Henderson wrote:
> > Add support for TM-method-specific begin code.
> > 
> > * libitm_i.h (GTM::gtm_restart_reason): Re-arrange and clean up
> > declarations.
> > * dispatch.h (GTM::abi_dispatch::begin_or_restart): New.
> > * method-serial.cc: Implement begin_or_restart().
> > * beginend.cc (GTM::gtm_thread::begin_transaction): Call
> > dispatch-specific begin_or_restart().
> > (GTM::gtm_thread::restart): Same.
> 
> Ok except,
> 
> > +  // Run dispatch-specific restart code. Retry until we succeed.
> > +  GTM::gtm_restart_reason rr;
> > +  while ((rr = disp->begin_or_restart())
> > +  != NUM_RESTARTS)
> 
> Please add
> 
>   NO_RESTART = NUM_RESTARTS
> 
> (or it's own number *after* NUM_RESTARTS, or -1, or something)
> to the enumeration and use that name.  Using num_restarts here is confusing.
> 
> > Fixed gtm_thread::serialirr_mode to actually use serialirr, not serial.
> > 
> > * method-serial.cc (GTM::gtm_thread::serialirr_mode): Fixed: Use
> > serial-irrevocable dispatch, not serial.
> 
> Ok.
> 
> > Do not free transaction-local memory when committing a nested 
> > transaction.
> > 
> > * alloc.cc (commit_allocations_2): Do not free transaction-local
> > memory when committing a nested transaction.
> 
> Ok.
> 
> > Handle re-initialization of the current method group.
> > 
> > * retry.cc (GTM::gtm_thread::decide_retry_strategy): Handle
> > re-initialization of the current method group.
> > * libitm_i.h (GTM::gtm_restart_reason): Add restart reason for 
> > this.
> 
> Ok.
> 
> > Undo log is used for both thread-local and shared data.
> > 
> > * libitm_i.h: Renamed gtm_local_undo to gtm_undolog_entry.
> > (GTM::gtm_thread): Renamed local_undo to undolog. Renamed
> > undolog-related member functions from *_local to *_undolog.
> > * local.cc (gtm_thread::commit_undolog): Same.
> > * beginend.cc (GTM::gtm_thread::trycommit): Same.
> > (GTM::gtm_thread::rollback): Roll back undolog before
> > dispatch-specific rollback.
> 
> Ok.
> 
> > Ensure privatization safety if requested by a TM method.
> > 
> > * beginend.cc (GTM::gtm_thread::trycommit): Ensure privatization
> > safety if requested by a TM method.
> > * dispatch.h (GTM::abi_dispatch::trycommit): Add parameter for
> > privatization safety.
> > * method-serial.cc: Same.
> 
> Ok.
> 
> > Add gl_wt TM method.
> > 
> > * libitm_i.h: Add gl_wt dispatch.
> > * retry.cc (parse_default_method): Same.
> > * method-gl.cc: New file.
> > * Makefile.am: Use method-gl.cc.
> > * Makefile.in: Rebuild.
> 
> Ok with...
> 
> > Fix gl_wt commit/rollback when serial lock has been acquired.
> > 
> > * method-gl.cc (gl_wt_dispatch::trycommit): Fix interaction with
> > gtm_thread::shared_state when the serial lock is acquired.
> > (gl_wt_dispatch::rollback): Same.
> 
> ... this merged with the previous commit.
> 
> > Fix TLS read accesses on Linux/x86.
> > 
> > * config/linux/x86/tls.h (abi_disp): Make TLS slot read 
> > volatile.
> > (gtm_thr): Same.
> 
> Ok.
> 
> 
> r~
> 




Re: [PATCH] Don't assume that constants can clobber vtbl

2011-10-19 Thread Maxim Kuvyrkov
On 11/10/2011, at 4:05 AM, Martin Jambor wrote:

> 
> Just before this, there is the following test (guarded by
> flag_strict_aliasing per explicit Richi's request). 
> 
> if (flag_strict_aliasing
> && !POINTER_TYPE_P (TREE_TYPE (lhs)))
>   return false;
> 
> Why is it not enough to catch integer constants?

It appears to be enough.  The patch I sent was developed against GCC 4.6 and I 
didn't notice your change when updated it for 4.7.  Thanks for fixing this.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



C++ PATCH for c++/50793 (incomplete value-initialization with default argument)

2011-10-19 Thread Jason Merrill
When bot_manip regenerates TARGET_EXPR/AGGR_INIT_EXPR, it needs to 
preserve the AGGR_INIT_ZERO_FIRST flag.


Tested x86_64-pc-linux-gnu, applying to trunk and release branches.
commit f56f4ec3e3620b4ae4e07986a17338d607952c65
Author: Jason Merrill 
Date:   Wed Oct 19 17:21:34 2011 -0400

	PR c++/50793
	* tree.c (bot_manip): Propagate AGGR_INIT_ZERO_FIRST.

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 75aa265..d023eb8 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -1879,8 +1879,12 @@ bot_manip (tree* tp, int* walk_subtrees, void* data)
   tree u;
 
   if (TREE_CODE (TREE_OPERAND (t, 1)) == AGGR_INIT_EXPR)
-	u = build_cplus_new (TREE_TYPE (t), TREE_OPERAND (t, 1),
-			 tf_warning_or_error);
+	{
+	  u = build_cplus_new (TREE_TYPE (t), TREE_OPERAND (t, 1),
+			   tf_warning_or_error);
+	  if (AGGR_INIT_ZERO_FIRST (TREE_OPERAND (t, 1)))
+	AGGR_INIT_ZERO_FIRST (TREE_OPERAND (u, 1)) = true;
+	}
   else
 	u = build_target_expr_with_type (TREE_OPERAND (t, 1), TREE_TYPE (t),
 	 tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/init/value9.C b/gcc/testsuite/g++.dg/init/value9.C
new file mode 100644
index 000..4899bd8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/value9.C
@@ -0,0 +1,32 @@
+// PR c++/50793
+// { dg-do run }
+
+struct NonTrivial
+{
+  NonTrivial() { }
+};
+
+struct S
+{
+  NonTrivial nt;
+  int i;
+};
+
+int f(S s)
+{
+  s.i = 0xdeadbeef;
+  return s.i;
+}
+
+int g(S s = S())
+{
+  return s.i;
+}
+
+int main()
+{
+  f(S());  // make stack dirty
+
+  if ( g() )
+__builtin_abort();
+}


Re: Intrinsics for N2965: Type traits and base classes

2011-10-19 Thread Benjamin Kosnik

> Please post a ChangeLog entry with a patch.  Someone added one for
> you:
> 
> 2011-10-17  Michael Spertus  
> 
>   * gcc/c-family/c-common.c (c_common_reswords): Add __bases,
>   __direct_bases.
>   * gcc/c-family/c-common.h: Add RID_BASES and RID_DIRECT_BASES.
> 
> but it is in the wrong file (c-family has its own ChangeLog) and
> shouldn't contain the gcc/c-family prefix (paths are relative to the
> directory).

Sorry Eric, this was my fault. I see that Paolo has already fixed this
up. 

Thanks Paolo!

best,
benjamin


[C++ Patch] PR 13657

2011-10-19 Thread Paolo Carlini

Hi,

this tries to implement what Jason suggested in the audit trail: do not 
refer to an argument and use instead 'cannot convert...", like, eg, in 
cvt.c.


Tested x86_64-linux.

Thanks,
Paolo.

//
/cp
2011-10-19  Paolo Carlini  

PR c++/13657
* class.c (instantiate_type): Fix error message.

/testsuite
2011-10-19  Paolo Carlini  

PR c++/13657
* g++.dg/parse/error42.C: New.

Index: testsuite/g++.dg/parse/error42.C
===
--- testsuite/g++.dg/parse/error42.C(revision 0)
+++ testsuite/g++.dg/parse/error42.C(revision 0)
@@ -0,0 +1,4 @@
+// PR c++/13657
+
+class C { public: int (*f())(); int bar(); };
+int (*C::f())() { return C::bar; } // { dg-error "cannot convert 'C::bar'" }
Index: cp/class.c
===
--- cp/class.c  (revision 180208)
+++ cp/class.c  (working copy)
@@ -6867,8 +6867,8 @@ instantiate_type (tree lhstype, tree rhs, tsubst_f
   else
{
  if (flags & tf_error)
-   error ("argument of type %qT does not match %qT",
-  TREE_TYPE (rhs), lhstype);
+   error ("cannot convert %qE from type %qT to type %qT",
+  rhs, TREE_TYPE (rhs), lhstype);
  return error_mark_node;
}
 }


[SH] PR 50694 - SH2A little endian does not actually work

2011-10-19 Thread Oleg Endo
Hello,

the attached patch addresses PR 50694 and also does some cleanups in
sh.md.

I haven't run the testsuite with this one since it's basically just text
replacements, except for the changed cmpgeusi_t insn.  However, CSiBE
didn't show any code size changes with the changed insn.  As far as I
could observe it, the check for the zero special case is already being
done before insn matching. Or am I missing something there?

If desired I could run the testsuite anyways of course.

Cheers,
Oleg


ChangeLog:

2011-10-19  Oleg Endo  

PR target/50694
* config/sh/sh.h (IS_LITTLE_ENDIAN_OPTION, UNSUPPORTED_SH1,
UNSUPPORTED_SH2A): New macros.
(DRIVER_SELF_SPECS): Filter out unsupported options taking the
default configuration into account.

* config/sh/sh.c (MSW, LSW): Move macro definitions to header
file.
* config/sh/sh.md: Replace TARGET_LITTLE_ENDIAN conditionals
with MSW and LSW macros where applicable.
(cmpgeusi_t): Change insn_and_split to insn.  Remove check
for unsigned greater or equal zero comparison and its split.
(divsi_inv_qitable, divsi_inv_hitable): Fix warning for
redundant '@' with single alternative.

Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 180080)
+++ gcc/config/sh/sh.c	(working copy)
@@ -61,9 +61,6 @@
 
 int code_for_indirect_jump_scratch = CODE_FOR_indirect_jump_scratch;
 
-#define MSW (TARGET_LITTLE_ENDIAN ? 1 : 0)
-#define LSW (TARGET_LITTLE_ENDIAN ? 0 : 1)
-
 /* These are some macros to abstract register modes.  */
 #define CONST_OK_FOR_ADD(size) \
   (TARGET_SHMEDIA ? CONST_OK_FOR_I10 (size) : CONST_OK_FOR_I08 (size))
Index: gcc/config/sh/sh.h
===
--- gcc/config/sh/sh.h	(revision 180080)
+++ gcc/config/sh/sh.h	(working copy)
@@ -417,10 +417,37 @@
 #define SH_DIV_STR_FOR_SIZE "call"
 #endif
 
-#define DRIVER_SELF_SPECS "%{m2a:%{ml:%eSH2a does not support little-endian}}"
+/* SH1 and SH2A do not support little-endian.  Catch such combinations
+   taking into account the default configuration.  */
+#if TARGET_ENDIAN_DEFAULT == MASK_BIG_ENDIAN
+#define IS_LITTLE_ENDIAN_OPTION "%{ml:"
+#else
+#define IS_LITTLE_ENDIAN_OPTION "%{!mb:"
+#endif
 
+#if TARGET_CPU_DEFAULT == SELECT_SH1
+#define UNSUPPORTED_SH1 IS_LITTLE_ENDIAN_OPTION \
+"%{m1|!m2*:%{!m3*:%{!m4*:%{!m5*:%eSH1 does not support little-endian}"
+#else
+#define UNSUPPORTED_SH1 IS_LITTLE_ENDIAN_OPTION \
+"%{m1:%eSH1 does not support little-endian}}"
+#endif
+
+#if TARGET_CPU_DEFAULT & MASK_HARD_SH2A
+#define UNSUPPORTED_SH2A IS_LITTLE_ENDIAN_OPTION \
+"%{m2a*|!m1:%{!m2*:%{!m3*:%{!m4*:{!m5*:%eSH2a does not supportlittle-endian}}"
+#else
+#define UNSUPPORTED_SH2A IS_LITTLE_ENDIAN_OPTION \
+"%{m2a*:%eSH2a does not support little-endian}}"
+#endif
+
+#define DRIVER_SELF_SPECS UNSUPPORTED_SH1, UNSUPPORTED_SH2A
+
 #define ASSEMBLER_DIALECT assembler_dialect
 
+#define MSW (TARGET_LITTLE_ENDIAN ? 1 : 0)
+#define LSW (TARGET_LITTLE_ENDIAN ? 0 : 1)
+
 extern int assembler_dialect;
 
 enum sh_divide_strategy_e {
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 180080)
+++ gcc/config/sh/sh.md	(working copy)
@@ -806,19 +806,12 @@
 ;; SImode unsigned integer comparisons
 ;; -
 
-(define_insn_and_split "cmpgeusi_t"
+(define_insn "cmpgeusi_t"
   [(set (reg:SI T_REG)
 	(geu:SI (match_operand:SI 0 "arith_reg_operand" "r")
-		(match_operand:SI 1 "arith_reg_or_0_operand" "rN")))]
+		(match_operand:SI 1 "arith_reg_operand" "r")))]
   "TARGET_SH1"
   "cmp/hs	%1,%0"
-  "&& operands[1] == CONST0_RTX (SImode)"
-  [(pc)]
-  "
-{
-  emit_insn (gen_sett ());
-  DONE;
-}"
[(set_attr "type" "mt_group")])
 
 (define_insn "cmpgtusi_t"
@@ -943,15 +936,12 @@
(match_dup 6)]
   "
 {
-  operands[2]
-= gen_rtx_REG (SImode,
-		   true_regnum (operands[0]) + (TARGET_LITTLE_ENDIAN ? 1 : 0));
+  operands[2] = gen_rtx_REG (SImode, true_regnum (operands[0]) + MSW);
   operands[3]
 = (operands[1] == const0_rtx
? const0_rtx
-   : gen_rtx_REG (SImode,
-		  true_regnum (operands[1])
-		  + (TARGET_LITTLE_ENDIAN ? 1 : 0)));
+   : gen_rtx_REG (SImode, true_regnum (operands[1]) + MSW));
+
   operands[4] = gen_lowpart (SImode, operands[0]);
   operands[5] = gen_lowpart (SImode, operands[1]);
   operands[6] = gen_label_rtx ();
@@ -1465,12 +1455,9 @@
   "
 {
   rtx high0, high2, low0 = gen_lowpart (SImode, operands[0]);
-  high0 = gen_rtx_REG (SImode,
-		   true_regnum (operands[0])
-		   + (TARGET_LITTLE_ENDIAN ? 1 : 0));
-  high2 = gen_rtx_REG (SImode,
-		   true_regnum (operands[2])
-		   + (TARGET_LITTLE_ENDIAN ? 1 : 0));
+  high0 = gen_rtx_REG (SImode, true_regnum (operands[0]) + MSW);
+  high2 = gen_rtx_REG (SImode, true

Re: [C++ Patch] PR 13657

2011-10-19 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] PR 48630 (PR 31423)

2011-10-19 Thread Jason Merrill

Surely we should only make this change for function members.

Jason


[v3] tr2: bool_set, dynamic_bitset, ratio

2011-10-19 Thread Benjamin Kosnik

Hi Edward! 

I have reviewed and selectively merged your development
branch, libstdcxx-tr2-ideas-branch, into trunk. As ISO C++ is now
looking at new library efforts for TR2 (and N2965 has already been
checked in to trunk) I thought it well-timed.

Please check my work, patch attached. 

On  libstdcxx-tr2-ideas-branch, I found:

1) N1718 + extensions for polynomial. 
2) N2136 bool_set
3) N2050 dynamic_bitset
4) ratio extensions for binary
5) cmath extensions
6) C++0x rope

Of these, I selected 2, 3, and 4 for merging.

It is sad to leave N1718 unmerged, I think there is a lot of good work
here and frankly see N1744 (marked with open status for LWG) as a weaker
candidate. But it's hard for me to justify putting this in given that it
was explicitly rejected by LWG. Perhaps you have current status or
thoughts?

For the rope extensions, I would suggest just adding C++0x bits to
ext/rope, instead of making tr2/rope. 

The  additions look interesting but I just saw declarations and
no actual implementation so I punted.

I would encourage you to continue working on TR2 projects! This
future work can now proceed on mainline, without a special branch.

tested x86_64/linux

-benjamin








20111019-1.patch.bz2
Description: application/bzip


Avoid gcc.dg/tree-prof/val-prof-7.c dependence on

2011-10-19 Thread Joseph S. Myers
The testcase gcc.dg/tree-prof/val-prof-7.c includes  to get
a declaration of bzero.  This causes it to fail on targets where bzero
(a legacy function removed in the latest version of POSIX) is not
declared in that header; declaring it explicitly in the testcase is
more reliable.  This patch changes the include to an explicit
declaration.

Tested with cross to i686-mingw32 (where the header just includes
 and does not provide a declaration of bzero).  OK to
commit?

2011-10-19  Joseph Myers  

* gcc.dg/tree-prof/val-prof-7.c: Declare bzero instead of
including .

Index: gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c
===
--- gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (revision 180200)
+++ gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (working copy)
@@ -1,7 +1,7 @@
 /* { dg-options "-O2 -fdump-ipa-profile -mtune=core2" } */
 /* { dg-skip-if "" { ! { i?86-*-* x86_64-*-* } } { "*" } { "" } } */
 
-#include 
+extern void bzero (void *, __SIZE_TYPE__);
 
 int foo(int len)
 {

-- 
Joseph S. Myers
jos...@codesourcery.com


new patches using -fopt-info (issue5294043)

2011-10-19 Thread Rong Xu
After some off-line discussion, we decided to use a more general approach
to control the printing of optimization messages/warnings. We will
introduce a new option -fopt-info:
 * fopt-info=0 or fno-opt-info: no message will be emitted.
 * fopt-info or fopt-info=1: emit important warnings and optimization
   messages with large performance impact.
 * fopt-info=2: warnings and optimization messages targeting power users.
 * fopt-info=3: informational messages for compiler developers.

2011-10-19   Rong Xu  

* gcc/common.opt (fopt-info): New flag. (fopt-info=) Ditto.
* gcc/opts.c (common_handle_option): Handle OPT_fopt_info_.
* gcc/flag-types.h (opt_info_verbosity_levels): New enum.
* gcc/value-prof.c (check_ic_counter): guard warnings/notes by
  flag_opt_info.
  (find_func_by_funcdef_no): Ditto. 
  (check_ic_target): Ditto.
  (check_counter): Ditto.
  (check_ic_counter): Ditto.
* gcc/mcf.c (find_minimum_cost_flow): Ditto.
* gcc/profile.c (read_profile_edge_counts): Ditto.
  (compute_branch_probabilities): Ditto.
* gcc/coverage.c (read_counts_file): Ditto.
  (get_coverage_counts): Ditto.
* gcc/tree-profile.c: (gimple_gen_reusedist): Ditto.
  (maybe_issue_profile_use_note): Ditto.
  (optimize_reusedist): Ditto.
* gcc/testsuite/gcc.dg/pr32773.c: add -fopt-info.
* gcc/testsuite/gcc.dg/pr40209.c: Ditto.
* gcc/testsuite/gcc.dg/pr26570.c: Ditto.
* gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.


Index: gcc/value-prof.c
===
--- gcc/value-prof.c(revision 180106)
+++ gcc/value-prof.c(working copy)
@@ -472,9 +472,10 @@
   : DECL_SOURCE_LOCATION (current_function_decl);
   if (flag_profile_correction)
 {
- inform (locus, "correcting inconsistent value profile: "
- "%s profiler overall count (%d) does not match BB count "
-  "(%d)", name, (int)*all, (int)bb_count);
+  if (flag_opt_info >= OPT_INFO_MAX)
+inform (locus, "correcting inconsistent value profile: %s "
+   "profiler overall count (%d) does not match BB count "
+"(%d)", name, (int)*all, (int)bb_count);
  *all = bb_count;
  if (*count > *all)
 *count = *all;
@@ -510,33 +511,42 @@
   location_t locus;
   if (*count1 > all && flag_profile_correction)
 {
-  locus = (stmt != NULL)
-  ? gimple_location (stmt)
-  : DECL_SOURCE_LOCATION (current_function_decl);
-  inform (locus, "Correcting inconsistent value profile: "
-  "ic (topn) profiler top target count (%ld) exceeds "
- "BB count (%ld)", (long)*count1, (long)all);
+  if (flag_opt_info >= OPT_INFO_MAX)
+{
+  locus = (stmt != NULL)
+  ? gimple_location (stmt)
+  : DECL_SOURCE_LOCATION (current_function_decl);
+  inform (locus, "Correcting inconsistent value profile: "
+  "ic (topn) profiler top target count (%ld) exceeds "
+  "BB count (%ld)", (long)*count1, (long)all);
+}
   *count1 = all;
 }
   if (*count2 > all && flag_profile_correction)
 {
-  locus = (stmt != NULL)
-  ? gimple_location (stmt)
-  : DECL_SOURCE_LOCATION (current_function_decl);
-  inform (locus, "Correcting inconsistent value profile: "
-  "ic (topn) profiler second target count (%ld) exceeds "
- "BB count (%ld)", (long)*count2, (long)all);
+  if (flag_opt_info >= OPT_INFO_MAX)
+{
+  locus = (stmt != NULL)
+  ? gimple_location (stmt)
+  : DECL_SOURCE_LOCATION (current_function_decl);
+  inform (locus, "Correcting inconsistent value profile: "
+  "ic (topn) profiler second target count (%ld) exceeds "
+ "BB count (%ld)", (long)*count2, (long)all);
+}
   *count2 = all;
 }
   
   if (*count2 > *count1)
 {
-  locus = (stmt != NULL)
-  ? gimple_location (stmt)
-  : DECL_SOURCE_LOCATION (current_function_decl);
-  inform (locus, "Corrupted topn ic value profile: "
- "first target count (%ld) is less than the second "
- "target count (%ld)", (long)*count1, (long)*count2);
+  if (flag_opt_info >= OPT_INFO_MAX)
+{
+  locus = (stmt != NULL)
+  ? gimple_location (stmt)
+  : DECL_SOURCE_LOCATION (current_function_decl);
+  inform (locus, "Corrupted topn ic value profile: "
+ "first target count (%ld) is less than the second "
+ "target count (%ld)", (long)*count1, (long)*count2);
+}
   return true;
 }
 
@@ -548,12 +558,16 @@
*count2 = all - *count1;
   else
   

Re: new patches using -fopt-info (issue5294043)

2011-10-19 Thread Andi Kleen
x...@google.com (Rong Xu) writes:

> After some off-line discussion, we decided to use a more general approach
> to control the printing of optimization messages/warnings. We will
> introduce a new option -fopt-info:
>  * fopt-info=0 or fno-opt-info: no message will be emitted.
>  * fopt-info or fopt-info=1: emit important warnings and optimization
>messages with large performance impact.
>  * fopt-info=2: warnings and optimization messages targeting power users.
>  * fopt-info=3: informational messages for compiler developers.

It would be interested to have some warnings about missing SRA
opportunities in =1 or =2. I found that sometimes fixing those can give a
large speedup.

Right now a common case that prevents SRA on structure field 
is simply a memset or memcpy.

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only


  1   2   >