date:20120906

Re: [PATCH] Fix PR 54494, removal of volatile store in strlen optimization because of the inliner

2012-09-06 Thread Jakub Jelinek

On Wed, Sep 05, 2012 at 09:10:53PM -0700, Andrew Pinski wrote:
>   The inliner likes to recreate some MEM_REFs; it copies most of the
> bits (TREE_THIS_NOTRAP, TREE_THIS_VOLATILE, etc.) but forgets about
> TREE_SIDE_EFFECTS.  This causes the strlen optimization to think the
> memory store does not have side effects.
> 
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> 
> Thanks,
> Andrew Pinski
> 
> ChangeLog:
> 
> * tree-inline.c (remap_gimple_op_r): Copy TREE_SIDE_EFFECTS also.
> 
> testsuite/ChangeLog:
> * gcc.dg/tree-ssa/strlen-1.c: New testcase.

Patch preapproved, but you've attached a different patch.

I'd say copy_tree_body_r's MEM_REF handling should also copy
TREE_SIDE_EFFECTS/TREE_THIS_NOTRAP (what about TREE_READONLY?),
maybe copy_decl_to_var/copy_result_decl_to_var should also
copy TREE_SIDE_EFFECTS, perhaps omp-low.c copy_var_decl, install_var_field,
tree-nested.c lookup_field_for_decl, tree-sra.c sra_ipa_reset_debug_stmts
(grep TREE_THIS_VOLATILE.*TREE_THIS_VOLATILE, looking for missing
TREE_SIDE_EFFECTS copy nearby).  That said, perhaps the tree-ssa-strlen.c
change is desirable too, unless we add some checking that TREE_THIS_VOLATILE
references/decls have TREE_SIDE_EFFECTS bit set in the IL.
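
As an illustration only, here is a minimal sketch of the two ideas in the paragraph above: copying all related flags in one place, and checking that a volatile reference also carries the side-effects bit.  The struct and both helpers are hypothetical toy stand-ins, not GCC's tree representation or any existing GCC function.

```c
#include <stdbool.h>

/* Toy stand-in for a memory reference node; NOT GCC's tree structure.  */
struct toy_ref
{
  bool this_volatile;   /* models TREE_THIS_VOLATILE */
  bool this_notrap;     /* models TREE_THIS_NOTRAP */
  bool side_effects;    /* models TREE_SIDE_EFFECTS */
};

/* Hypothetical helper: copy every flag a rebuilt reference must inherit,
   so that no call site can forget one of them.  */
static void
toy_copy_ref_flags (struct toy_ref *dst, const struct toy_ref *src)
{
  dst->this_volatile = src->this_volatile;
  dst->this_notrap = src->this_notrap;
  dst->side_effects = src->side_effects;
}

/* Hypothetical checker: a volatile reference must have side effects.  */
static bool
toy_ref_flags_consistent (const struct toy_ref *ref)
{
  return !ref->this_volatile || ref->side_effects;
}
```

In this toy model, copying only this_volatile from a volatile reference leaves the copy inconsistent, which is the kind of mismatch an IL checker could flag.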

> --- testsuite/gcc.c-torture/compile/pr49474.c (revision 0)
> +++ testsuite/gcc.c-torture/compile/pr49474.c (revision 0)
> --- cprop.c   (revision 176187)
> +++ cprop.c   (working copy)

Jakub

Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-06 Thread Dominique Dhumieres

Oleg,

Bootstrap fails at revision 190996 on powerpc-apple-darwin9 with:

g++ -c   -g -DIN_GCC   -fno-exceptions -fno-rtti -W -Wall -Wno-narrowing 
-Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
-I. -I../../work/gcc -I../../work/gcc/. -I../../work/gcc/../include -I./../intl 
-I../../work/gcc/../libcpp/include -I/opt/mp/include  
-I../../work/gcc/../libdecnumber -I../../work/gcc/../libdecnumber/dpd 
-I../libdecnumber -DCLOOG_INT_GMP  -I/opt/mp/include  \
../../work/gcc/config/rs6000/rs6000.c -o rs6000.o
../../work/gcc/config/rs6000/rs6000.c: In function 'int rs6000_debug_address_cost(rtx, machine_mode, addr_space_t, bool)':
../../work/gcc/config/rs6000/rs6000.c:26077:42: error: cannot convert 'bool' to 'machine_mode' for argument '2' to 'int hook_int_rtx_mode_as_bool_0(rtx, machine_mode, addr_space_t, bool)'
   int ret = TARGET_ADDRESS_COST (x, speed);

Obvious(?) fix

--- ../_gcc_clean/gcc/config/rs6000/rs6000.c  2012-09-05 20:25:39.0 +0200
+++ ../work/gcc/config/rs6000/rs6000.c  2012-09-06 00:56:21.0 +0200
@@ -26074,7 +26074,7 @@ static int
 rs6000_debug_address_cost (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED,
   addr_space_t as ATTRIBUTE_UNUSED, bool speed)
 {
-  int ret = TARGET_ADDRESS_COST (x, speed);
+  int ret = TARGET_ADDRESS_COST (x, mode, as, speed);
 
   fprintf (stderr, "\nrs6000_address_cost, return = %d, speed = %s, x:\n",
   ret, speed ? "true" : "false");

TIA

Dominique

Re: [PATCH] Fix PR 54494, removal of volatile store in strlen optimization because of the inliner

2012-09-06 Thread Andrew Pinski

On Thu, Sep 6, 2012 at 12:07 AM, Jakub Jelinek  wrote:
> On Wed, Sep 05, 2012 at 09:10:53PM -0700, Andrew Pinski wrote:
>>   The inliner likes to recreate some MEM_REFs; it copies most of the
>> bits (TREE_THIS_NOTRAP, TREE_THIS_VOLATILE, etc.) but forgets about
>> TREE_SIDE_EFFECTS.  This causes the strlen optimization to think the
>> memory store does not have side effects.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>>
>> * tree-inline.c (remap_gimple_op_r): Copy TREE_SIDE_EFFECTS also.
>>
>> testsuite/ChangeLog:
>> * gcc.dg/tree-ssa/strlen-1.c: New testcase.
>
> Patch preapproved, but you've attached a different patch.

Sorry about that.  Here is the correct one.

Also is this ok for the 4.7 branch?

Thanks,
Andrew Pinski

>
> I'd say copy_tree_body_r's MEM_REF handling should also copy
> TREE_SIDE_EFFECTS/TREE_THIS_NOTRAP (what about TREE_READONLY?),
> maybe copy_decl_to_var/copy_result_decl_to_var should also
> copy TREE_SIDE_EFFECTS, perhaps omp-low.c copy_var_decl, install_var_field,
> tree-nested.c lookup_field_for_decl, tree-sra.c sra_ipa_reset_debug_stmts
> (grep TREE_THIS_VOLATILE.*TREE_THIS_VOLATILE, looking for missing
> TREE_SIDE_EFFECTS copy nearby).  That said, perhaps the tree-ssa-strlen.c
> change is desirable too, unless we add some checking that TREE_THIS_VOLATILE
> references/decls have TREE_SIDE_EFFECTS bit set in the IL.
>
>> --- testsuite/gcc.c-torture/compile/pr49474.c (revision 0)
>> +++ testsuite/gcc.c-torture/compile/pr49474.c (revision 0)
>> --- cprop.c   (revision 176187)
>> +++ cprop.c   (working copy)
>
> Jakub
Index: testsuite/gcc.dg/tree-ssa/strlen-1.c
===
--- testsuite/gcc.dg/tree-ssa/strlen-1.c  (revision 0)
+++ testsuite/gcc.dg/tree-ssa/strlen-1.c  (revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+extern const unsigned long base;
+static inline void wreg(unsigned char val, unsigned long addr) __attribute__((always_inline));
+static inline void wreg(unsigned char val, unsigned long addr)
+{
+   *((volatile unsigned char *) (__SIZE_TYPE__) (base + addr)) = val;
+}
+void wreg_twice(void)
+{
+   wreg(0, 42);
+   wreg(0, 42);
+}
+
+/* We should not remove the second null character store to (base+42) address.  */
+/* { dg-final { scan-tree-dump-times " ={v} 0;" 2 "optimized" } }  */
+/* { dg-final { cleanup-tree-dump "optimized" } }  */
Index: tree-inline.c
===
--- tree-inline.c   (revision 191004)
+++ tree-inline.c   (working copy)
@@ -848,6 +848,7 @@ remap_gimple_op_r (tree *tp, int *walk_s
 ptr, TREE_OPERAND (*tp, 1));
  TREE_THIS_NOTRAP (*tp) = TREE_THIS_NOTRAP (old);
  TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old);
+ TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old);
  TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old);
  *walk_subtrees = 0;
  return NULL;

Re: [PATCH] Fix PR 54494, removal of volatile store in strlen optimization because of the inliner

2012-09-06 Thread Jakub Jelinek

On Thu, Sep 06, 2012 at 12:50:44AM -0700, Andrew Pinski wrote:
> > Patch preapproved, but you've attached a different patch.
> 
> Sorry about that.  Here is the correct one.
> 
> Also is this ok for the 4.7 branch?

Yes, thanks.

> --- tree-inline.c (revision 191004)
> +++ tree-inline.c (working copy)
> @@ -848,6 +848,7 @@ remap_gimple_op_r (tree *tp, int *walk_s
>ptr, TREE_OPERAND (*tp, 1));
> TREE_THIS_NOTRAP (*tp) = TREE_THIS_NOTRAP (old);
> TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old);
> +   TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old);
> TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old);
> *walk_subtrees = 0;
> return NULL;


Jakub

Re: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-09-06 Thread Richard Earnshaw

On 06/09/12 06:41, Bin Cheng wrote:
> Hi Richard, 
> Thanks very much for comments.
> 
>>> Ping?
>>>
>>> Hi Ramana, could you help me review this patch?
>>> Hi Eric, Richard, could you help me review the change in regcprop.c?
>>
>> Subtraction of zero isn't canonical rtl though.  Passes after peephole2 would
>> be well within their rights to simplify the expression back to a move.
>> From that point of view, making the passes recognise (plus X 0) and (minus X 0)
>> as special cases would be inconsistent.
>>
>> Rather than make the Thumb 1 CC usage implicit in the rtl stream, and carry
>> the current state around in cfun->machine, it seems like it would be better to
>> get md_reorg to rewrite the instructions into a form that makes the use of
>> condition codes explicit.
> 
> The problem here is that the Thumb1 instruction set supports two versions
> of the MOV instruction.  The flag-setting version can only take low register
> operands, while the non-flag-setting version can take high register
> operands.  So we cannot rewrite a non-flag-setting MOV (with high
> register operands) into an explicitly flag-setting one, and that's why it is
> currently rewritten into a subtract-of-zero instruction.
> 
>>
>> md_reorg also sounds like a better place in the pipeline than peephole2 to be
>> doing this kind of transformation, although I admit I have zero evidence to
>> back that up...
> 
> Yes, it may be feasible to rewrite the instruction in the machine reorg
> pass, rather than peephole2.  But that needs a bigger change in the ARM
> back end.
> Hi Ramana, Richard, what's your opinion on this?
> 
> Thanks very much.
> 
> 

I side with Richard on this one.  The mid-end should only have to deal
with RTL that's in canonical form.

R.

RE: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-09-06 Thread Bin Cheng

> >
> > Yes, it may be feasible to rewrite the instruction in the machine reorg
> > pass, rather than peephole2.  But that needs a bigger change in the ARM
> > back end.
> > Hi Ramana, Richard, what's your opinion on this?
> >
> > Thanks very much.
> >
> >
> 
> I side with Richard on this one.  The mid-end should only have to deal with
> RTL that's in canonical form.
> 
So how about rewriting the mov insn into a sub in the machine reorg pass and
removing the current peephole2 code?

Thanks

Re: [Patch ARM] implement bswap16

2012-09-06 Thread Richard Earnshaw

On 05/09/12 17:01, Christophe Lyon wrote:
> Hi,
> 
> This patch implements __builtin_bswap16() on ARM (v6 and above) using
> revsh with a signed input and rev16 with an unsigned input.
> 
> It is pretty much equal to the patch posted some time ago
> http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00962.html, but it's hard
> to write such patterns differently ;-)
> 
> I have added a testcase.
> 
> OK for trunk?
> 
> Christophe.
> 
> 

+(define_insn "*arm_revsh"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (sign_extend:SI (bswap:HI (match_operand:HI 1 "s_register_operand" "r"))))]
+  "TARGET_32BIT && arm_arch6"
+  "revsh%?\t%0, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "length" "4")]

Can you add additional constraints for the t1 encoding for this and the other
TARGET_32BIT patterns?  Then the compiler will get the length calculations
correct.  Something like:


(define_insn "*arm_revsh"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+   (sign_extend:SI (bswap:HI (match_operand:HI 1 "s_register_operand" "l,r"))))]
  "TARGET_32BIT && arm_arch6"
  "revsh%?\t%0, %1"
  [(set_attr "predicable" "yes")
+   (set_attr "arch" "t2,*")
+   (set_attr "length" "2,4")]

Brownie points for retro-fitting this to the existing rev patterns.

+(define_expand "bswaphi2"
+  [(set (match_operand:HI 0 "s_register_operand" "=r")
+   (bswap:HI (match_operand:HI 1 "s_register_operand" "r")))]

Define_expand doesn't take constraints.

Finally, these patterns should be grouped with the other byte-reversal patterns 
in arm.md, not placed at the end of the file.

R.

Re: Change double_int calls to new interface.

2012-09-06 Thread Richard Guenther

On Wed, 5 Sep 2012, Lawrence Crowl wrote:

> On 9/5/12, Richard Guenther  wrote:
> > On Tue, 4 Sep 2012, Lawrence Crowl wrote:
> >> Modify gcc/*.[hc] double_int call sites to use the new interface.
> >> This change entailed adding a few new methods to double_int.
> >>
> >> Other changes will happen in separate patches.  Once all uses of
> >> the old interface are gone, they will be removed.
> >>
> >> The change results in a 0.163% time improvement with a 70%
> >> confidence.
> >>
> >> Tested on x86_64.
> >> Index: gcc/ChangeLog
> >
> > - double_int_lshift
> > -   (double_int_one,
> > -TREE_INT_CST_LOW (vr1.min),
> > -TYPE_PRECISION (expr_type),
> > -false));
> > + double_int_one
> > + .llshift (TREE_INT_CST_LOW (vr1.min),
> > +   TYPE_PRECISION (expr_type)));
> >
> > Ick - is that what our coding-conventions say?  I mean the
> > .llshift on the next line.
> 
> Our conventions say nothing about that, but method calls seem
> somewhat analogous to binary operators, and hence this formatting
> was probably the least objectionable.
> 
> > Otherwise ok.
> 
> As in you want me to do something else?

No, just asking ;)

> > The tmin.cmp (tmax, uns) > 0 kind of things look odd - definitely
> > methods like tmin.gt (tmax, uns) would be nice to have.  Or even
> > better, get rid of the 'uns' parameters and provide a
> >
> > struct double_int_with_signedness {
> >   double_int val;
> >   bool uns;
> > };
> >
> > struct double_uint : double_int_with_signedness {
> >   double_uint (double_int);
> > };
> >
> > ...
> >
> > and comparison operators which take double_uint/sint.
> 
> It would, I think, be better to have separate signed and unsigned
> types.  That change was significantly structural, and I don't know
> where the wide_int work sits in relation to that choice.

double_int is supposed to be a twos-complement thing; on top of it
we have functions that model signed/unsigned ints of a specific
precision.
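
To make that split concrete, here is a minimal sketch assuming a toy two-word type with a fixed 128-bit precision.  The names toy_double_int and toy_cmp are hypothetical and are not the double_int interface under discussion; the point is only that the stored bits are sign-agnostic and the comparison function is told how to read them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy two-word integer; the raw bits carry no signedness.  */
struct toy_double_int
{
  uint64_t low;
  uint64_t high;
};

/* Compare A and B, returning -1, 0 or 1.  UNS selects whether the high
   word is read as unsigned or as two's-complement signed.  */
static int
toy_cmp (struct toy_double_int a, struct toy_double_int b, bool uns)
{
  if (a.high != b.high)
    {
      if (uns)
        return a.high < b.high ? -1 : 1;
      /* Flipping the sign bit maps signed order onto unsigned order.  */
      uint64_t ah = a.high ^ (UINT64_C (1) << 63);
      uint64_t bh = b.high ^ (UINT64_C (1) << 63);
      return ah < bh ? -1 : 1;
    }
  /* The low word is always an unsigned magnitude.  */
  if (a.low != b.low)
    return a.low < b.low ? -1 : 1;
  return 0;
}
```

With all bits set in both words, the same value compares below 1 when read as signed and above 1 when read as unsigned, which is why call sites such as tmin.cmp (tmax, uns) have to carry the flag somewhere.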

> > You didn't remove any of the old interfaces, so I think we are
> > going to bitrot quickly again.
> 
> I couldn't remove the old interface yet because I haven't updated
> all the code yet.

Ah, I see.

Richard.

[PATCH] Make -fno-tree-pta work

2012-09-06 Thread Richard Guenther


This fixes -fno-tree-pta - noticed it doesn't work when trying
to compile a testcase where PTA uses too much memory.

Bootstrapped & tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-09-06  Richard Guenther  

* passes.c (execute_function_todo): Call compute_may_aliases
only if flag_tree_pta is set.

Index: gcc/passes.c
===
--- gcc/passes.c  (revision 190983)
+++ gcc/passes.c  (working copy)
@@ -1776,7 +1776,8 @@ execute_function_todo (void *data)
   if (flags & TODO_rebuild_alias)
 {
   execute_update_addresses_taken ();
-  compute_may_aliases ();
+  if (flag_tree_pta)
+   compute_may_aliases ();
 }
   else if (optimize && (flags & TODO_update_address_taken))
 execute_update_addresses_taken ();

Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-06 Thread Oleg Endo

On Thu, 2012-09-06 at 09:49 +0200, Dominique Dhumieres wrote:
> Oleg,
> 
> Bootstrap fails at revision 190996 on powerpc-apple-darwin9 with:
> 
> g++ -c   -g -DIN_GCC   -fno-exceptions -fno-rtti -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic 
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
> -DHAVE_CONFIG_H -I. -I. -I../../work/gcc -I../../work/gcc/. 
> -I../../work/gcc/../include -I./../intl -I../../work/gcc/../libcpp/include 
> -I/opt/mp/include  -I../../work/gcc/../libdecnumber 
> -I../../work/gcc/../libdecnumber/dpd -I../libdecnumber -DCLOOG_INT_GMP  
> -I/opt/mp/include  \
>   ../../work/gcc/config/rs6000/rs6000.c -o rs6000.o
> ../../work/gcc/config/rs6000/rs6000.c: In function 'int 
> rs6000_debug_address_cost(rtx, machine_mode, addr_space_t, bool)':
> ../../work/gcc/config/rs6000/rs6000.c:26077:42: error: cannot convert 'bool' 
> to 'machine_mode' for argument '2' to 'int hook_int_rtx_mode_as_bool_0(rtx, 
> machine_mode, addr_space_t, bool)'
>int ret = TARGET_ADDRESS_COST (x, speed);
> 
> Obvious(?) fix
> 
> --- ../_gcc_clean/gcc/config/rs6000/rs6000.c  2012-09-05 20:25:39.0 +0200
> +++ ../work/gcc/config/rs6000/rs6000.c  2012-09-06 00:56:21.0 +0200
> @@ -26074,7 +26074,7 @@ static int
>  rs6000_debug_address_cost (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED,
>  addr_space_t as ATTRIBUTE_UNUSED, bool speed)
>  {
> -  int ret = TARGET_ADDRESS_COST (x, speed);
> +  int ret = TARGET_ADDRESS_COST (x, mode, as, speed);
>  
>fprintf (stderr, "\nrs6000_address_cost, return = %d, speed = %s, x:\n",
>  ret, speed ? "true" : "false");
> 

Argh, sorry.  Yes, you're right.  The unused attrs can also go away in
this case:

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 190990)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -26071,10 +26071,10 @@
 /* Debug form of ADDRESS_COST that is selected if -mdebug=cost.  */
 
 static int
-rs6000_debug_address_cost (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED,
-  addr_space_t as ATTRIBUTE_UNUSED, bool speed)
+rs6000_debug_address_cost (rtx x, enum machine_mode mode, addr_space_t as,
+  bool speed)
 {
-  int ret = TARGET_ADDRESS_COST (x, speed);
+  int ret = TARGET_ADDRESS_COST (x, mode, as, speed);
 
   fprintf (stderr, "\nrs6000_address_cost, return = %d, speed = %s, x:\n",
   ret, speed ? "true" : "false");


Could you please commit this (I can't at the moment)?

Cheers,
Oleg

Re: [patch] Random cleanups

2012-09-06 Thread Richard Guenther

On Wed, Sep 5, 2012 at 10:34 PM, Steven Bosscher  wrote:
> Hello,
>
> Just some cleanups I did while working on something bigger.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> Ciao!
> Steven
>
> * graphite.c (print_global_statistics): Use EDGE_COUNT instead
> of VEC_length.
> (print_graphite_scop_statistics): Likewise.
> * graphite-scop-detection.c (get_bb_type): Use single_succ_p.
> (print_graphite_scop_statistics): Use EDGE_COUNT, not VEC_length.
> (canonicalize_loop_closed_ssa): Use single_pred_p.
>
> * alias.c (reg_seen): Make this an sbitmap.
> (record_set, init_alias_analysis): Update.

Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-06 Thread Dominique Dhumieres

> Could you please commit this (I can't at the moment)?

Sorry, I don't have write access.

Dominique

Re: [patch] Fix bitmap_last_set_bit

2012-09-06 Thread Richard Guenther

On Wed, Sep 5, 2012 at 10:40 PM, Steven Bosscher  wrote:
> Hi,
>
> bitmap.c:bitmap_last_set_bit() is not used by any code in the current
> GCC trunk, but I'm using it and I noticed it returns an incorrect
> result. This patch rewrites most of the function to return the correct
> result.
>
> Not sure how to test this other than to say that my code, that uses
> this function, works with the patch and breaks without it. I've also
> unleashed bitmap_last_set_bit (and bitmap_first_set_bit) on a large
> number of randomly generated bitmaps and rather expensive verification
> code that doesn't accept the results of the pre-patch
> bitmap_last_set_bit and is happy with my new implementation.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> Ciao!
> Steven
>
> * bitmap.c (bitmap_last_set_bit): Rewrite to return the correct bit.
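
The testing approach described above (a fast routine checked against an expensive but obviously correct one) can be sketched as follows.  This is a toy over a flat array of words, with hypothetical names; GCC's bitmap is a different data structure and this is not the patched function.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_WORD_BITS (sizeof (unsigned long) * CHAR_BIT)

/* Return the index of the highest set bit in WORDS[0..N-1], where bit 0
   is the least significant bit of WORDS[0], or -1 if no bit is set.  */
static long
toy_last_set_bit (const unsigned long *words, size_t n)
{
  for (size_t i = n; i-- > 0; )
    {
      unsigned long w = words[i];
      if (w == 0)
        continue;
      /* Count how many times W can be shifted right before it is zero.  */
      long bit = 0;
      while ((w >>= 1) != 0)
        bit++;
      return (long) (i * TOY_WORD_BITS) + bit;
    }
  return -1;
}

/* Slow reference implementation: test every bit from the top down.  */
static long
toy_last_set_bit_slow (const unsigned long *words, size_t n)
{
  for (size_t b = n * TOY_WORD_BITS; b-- > 0; )
    if ((words[b / TOY_WORD_BITS] >> (b % TOY_WORD_BITS)) & 1UL)
      return (long) b;
  return -1;
}
```

Running both functions over many randomly generated arrays and comparing the results is one way to reproduce the kind of verification mentioned in the message.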

Re: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-09-06 Thread Richard Earnshaw

On 06/09/12 09:33, Bin Cheng wrote:
>>>
>>> Yes, it may be feasible to rewrite the instruction in the machine reorg
>>> pass, rather than peephole2.  But that needs a bigger change in the ARM
>>> back end.
>>> Hi Ramana, Richard, what's your opinion on this?
>>>
>>> Thanks very much.
>>>
>>>
>>
>> I side with Richard on this one.  The mid-end should only have to deal with
>> RTL that's in canonical form.
>>
> So how about rewriting the mov insn into a sub in the machine reorg pass and
> removing the current peephole2 code?
> 
> Thanks
> 
> 

I think you should keep a peephole that transforms

X := Y
cbranch (cmp (Y, 0) tgt)

into

X := Y
cbranch (cmp (X, 0) tgt)

since that gives you a dataflow dependency between the two patterns that
is easier to track.

Once you've got that the mdreorg can track the dataflow and transform
the mov into a sub.

R.

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Richard Guenther

On Thu, Sep 6, 2012 at 8:06 AM, Marc Glisse  wrote:
> On Wed, 5 Sep 2012, Gabriel Dos Reis wrote:
>
>> On Wed, Sep 5, 2012 at 5:09 PM, Iyer, Balaji V 
>> wrote:
>>>
>>> Let's say we have two for loops like this:
>>>
>>> int my_func (int x, int y);
>>>
>>> For (ii = 0; ii < 1; ii++)
>>> X[ii] = my_func (Y[ii], Z[ii]);
>
>
> I assume X, Y and Z are __restrict pointers (or something the compiler can
> detect doesn't alias).
>
>
>> 2. Considering this example, won't you get the same behaviour
>> if my_func was declared with "pure" attribute?  If not, why?
>
>
> AFAIU, my_func is defined in a separate library and because of the attribute
> on the definition, it will actually export overloads:
> int myfunc(int,int);
> v2si myfunc(v2si,v2si);
> v4si myfunc(v4si,v4si);
> etc (where does it stop? seems problematic if the library is compiled for
> sse4 and I then compile and link an avx program)
>
> (hopefully with implementations more clever than breaking the vectors into
> pieces and calling the basic myfunc on each)
>
> The attribute on the declaration then lets gcc's vectorizer know it can call
> those overloads.

And as the overloads definitions are not guaranteed to be generated by GCC
you need to specify the ABI and mangling of those overloads.
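
For readers following along, a rough sketch of what such a pair of entry points could look like in plain C with the GCC/Clang vector extension.  Everything here is an assumption for illustration: the body of my_func, the name my_func_v4si (C has no overloading, so the vector variant needs its own name, which is exactly the mangling question), and the driver loop.  It is not what the Cilk Plus patch generates.

```c
#include <string.h>

/* Scalar function; the thread only gives the prototype, so this body
   is made up for the example.  */
static int
my_func (int x, int y)
{
  return x * y + 1;
}

/* GCC/Clang vector extension: four ints in one 16-byte vector.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical 4-wide variant computing my_func on each lane.  */
static v4si
my_func_v4si (v4si x, v4si y)
{
  const v4si ones = { 1, 1, 1, 1 };
  return x * y + ones;
}

/* Call the vector variant four elements at a time, then finish the
   remainder with the scalar function.  X, Y and Z must not overlap.  */
static void
apply_my_func (int *x, const int *y, const int *z, int n)
{
  int i = 0;
  for (; i + 4 <= n; i += 4)
    {
      v4si a, b, r;
      memcpy (&a, y + i, sizeof a);
      memcpy (&b, z + i, sizeof b);
      r = my_func_v4si (a, b);
      memcpy (x + i, &r, sizeof r);
    }
  for (; i < n; i++)
    x[i] = my_func (y[i], z[i]);
}
```

If the vector variant lives in a separately compiled library, caller and callee have to agree on its symbol name and on how v4si arguments are passed, which is why the ABI needs to be written down.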

+static tree
+handle_vector_attribute (tree *node, tree name ATTRIBUTE_UNUSED,
+tree args ATTRIBUTE_UNUSED,
+int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  tree opt_list;
+  VEC(tree,gc) *opt_vec = NULL;
+  opt_vec = make_tree_vector ();
+  VEC_safe_push (tree, gc, opt_vec, build_string (2, "O3"));
+  opt_list = build_tree_list_vec (opt_vec);
+  release_tree_vector (opt_vec);
+  handle_optimize_attribute (node, get_identifier ("optimize"), opt_list,
+flags, no_add_attrs);

Please no - do not use "optimize" attributes from inside the implementation.
What happens if the user also specifies an optimize attribute?
The above also doesn't make sense to me, so please elaborate on why you
want to enable -O3 for a function marked with the vector attribute.

This all sounds awfully like a worse way to do the multi-versioning stuff
that is still pending review.

+  if (flag_enable_cilkplus
+  && gimple_code (stmt) == GIMPLE_CALL
+  && is_elem_fn (gimple_call_fndecl (stmt)))
+{
+  parm_type = find_elem_fn_parm_type (stmt, op, &step_size);
+  if (parm_type == TYPE_UNIFORM || parm_type == TYPE_LINEAR)
+   dt = vect_external_def;

the middle-end should not care if CILK+ is enabled or not.  Otherwise
this will not work with LTO.  Please use generic infrastructure for the
implementation or enhance generic infrastructure.

If the vectorizer should be able to vectorize non-inlined functions then
there should be an IPA pass analyzing functions for whether they can
be "elemental" (propagating this alongside the callgraph).  Then
you either decide up-front whether to "clone" those functions for
various vector sizes, or, IMHO better, make sure to ship the function
bodies to all LTRANS units that make use of them (much similar to
how we handle inlines) and make the vectorizer emit the clones.

All in all, this seems unrelated to the CILK+ work (even if you make use of
this from within CILK+).

Richard.

> With suitable pure/const attribute you could unroll the loop a bit and
> reorder the calls to myfunc, but without myfunc's body, you couldn't do as
> much.
>
> Note that this is my guess from reading the example and completely ignoring
> the patch, it could be miles from the truth, and it needs better explanation
> (the doc patch is coming later in the series IIRC).
>
> --
> Marc Glisse

Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-06 Thread Oleg Endo






On 6 Sep 2012, at 11:18, domi...@lps.ens.fr (Dominique Dhumieres) wrote:

>> Could you please commit this (I can't at the moment)?
>
> Sorry, I don't have write access.

OK, then I will commit it later today.

Cheers,
Oleg

Re: [PATCH] Combine location with block using block_locations

2012-09-06 Thread Richard Guenther

On Wed, Aug 22, 2012 at 1:54 AM, Dehao Chen  wrote:
> On Tue, Aug 21, 2012 at 6:25 AM, Richard Guenther
>  wrote:
>> On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen  wrote:
>>> ping
>>
>> Conceptually I like the change.  Can a libcpp maintainer please have a 2nd
>> look?
>>
>> Dehao, did you do any compile-time and memory-usage benchmarks?
>
> I don't have a memory benchmarks at hand. But I've tested it through
> some huge apps, each of which takes more than 1 hour to build on a
> modern machine. None of them had observed noticeable memory footprint
> and compile time increase.

Thanks.

I'd like to see a 2nd "ok", but hereby I give a first one ;)  It's a big change
but definitely a good one.

Thanks again,
Richard.

> Thanks,
> Dehao
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Dehao
>>>
>>> On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen  wrote:
>>>> Hi, Dodji,
>>>>
>>>> Thanks for the review. I've fixed all the addressed issues. I'm
>>>> attaching the related changes:
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> libcpp/ChangeLog:
>>>> 2012-08-01  Dehao Chen  
>>>>
>>>> * include/line-map.h (MAX_SOURCE_LOCATION): New value.
>>>> (location_adhoc_data_init): New.
>>>> (location_adhoc_data_fini): New.
>>>> (get_combined_adhoc_loc): New.
>>>> (get_data_from_adhoc_loc): New.
>>>> (get_location_from_adhoc_loc): New.
>>>> (COMBINE_LOCATION_DATA): New.
>>>> (IS_ADHOC_LOC): New.
>>>> (expanded_location): New field.
>>>> * line-map.c (location_adhoc_data): New.
>>>> (location_adhoc_data_htab): New.
>>>> (curr_adhoc_loc): New.
>>>> (location_adhoc_data): New.
>>>> (allocated_location_adhoc_data): New.
>>>> (location_adhoc_data_hash): New.
>>>> (location_adhoc_data_eq): New.
>>>> (location_adhoc_data_update): New.
>>>> (get_combined_adhoc_loc): New.
>>>> (get_data_from_adhoc_loc): New.
>>>> (get_location_from_adhoc_loc): New.
>>>> (location_adhoc_data_init): New.
>>>> (location_adhoc_data_fini): New.
>>>> (linemap_lookup): Change to use new location.
>>>> (linemap_ordinary_map_lookup): Likewise.
>>>> (linemap_macro_map_lookup): Likewise.
>>>> (linemap_macro_map_loc_to_def_point): Likewise.
>>>> (linemap_macro_map_loc_unwind_toward_spel): Likewise.
>>>> (linemap_get_expansion_line): Likewise.
>>>> (linemap_get_expansion_filename): Likewise.
>>>> (linemap_location_in_system_header_p): Likewise.
>>>> (linemap_location_from_macro_expansion_p): Likewise.
>>>> (linemap_macro_loc_to_spelling_point): Likewise.
>>>> (linemap_macro_loc_to_def_point): Likewise.
>>>> (linemap_macro_loc_to_exp_point): Likewise.
>>>> (linemap_resolve_location): Likewise.
>>>> (linemap_unwind_toward_expansion): Likewise.
>>>> (linemap_unwind_to_first_non_reserved_loc): Likewise.
>>>> (linemap_expand_location): Likewise.
>>>> (linemap_dump_location): Likewise.
>>>>
>>>> Index: libcpp/line-map.c
>>>> ===
>>>> --- libcpp/line-map.c   (revision 190209)
>>>> +++ libcpp/line-map.c   (working copy)
>>>> @@ -25,6 +25,7 @@
>>>>  #include "line-map.h"
>>>>  #include "cpplib.h"
>>>>  #include "internal.h"
>>>> +#include "hashtab.h"
>>>>
>>>>  static void trace_include (const struct line_maps *, const struct line_map *);
>>>>  static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *,
>>>> @@ -50,6 +51,135 @@
>>>>  extern unsigned num_expanded_macros_counter;
>>>>  extern unsigned num_macro_tokens_counter;
>>>>
>>>> +/* Data structure to associate an arbitrary data to a source location.  */
>>>> +struct location_adhoc_data {
>>>> +  source_location locus;
>>>> +  void *data;
>>>> +};
>>>> +
>>>> +/* The following data structure encodes a location with some adhoc data
>>>> +   and maps it to a new unsigned integer (called an adhoc location)
>>>> +   that replaces the original location to represent the mapping.
>>>> +
>>>> +   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
>>>> +   highest bit is 1, then the number is adhoc_loc. Otherwise, it serves as
>>>> +   the original location. Once identified as the adhoc_loc, the lower 31
>>>> +   bits of the integer is used to index the location_adhoc_data array,
>>>> +   in which the locus and associated data is stored.  */
>>>> +
>>>> +static htab_t location_adhoc_data_htab;
>>>> +static source_location curr_adhoc_loc;
>>>> +static struct location_adhoc_data *location_adhoc_data;
>>>> +static unsigned int allocated_location_adhoc_data;
>>>> +
>>>> +/* Hash function for location_adhoc_data hashtable.  */
>>>> +
>>>> +static hashval_t
>>>> +location_adhoc_data_hash (const void *l)
>>>> +{
>>>> +  const struct location_adhoc_data *lb =
>>>> +  (const struct location_adhoc_data *) l;
>>>> +  retur
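
The quoted patch is cut off at this point in the archive.  The encoding described in its comment (highest bit as the adhoc marker, lower 31 bits as an index into a table of locus/data pairs) can be sketched independently as below.  All toy_ names are hypothetical, a fixed-size array with linear search stands in for the patch's hash table, and returning the plain locus when the table is full is a simplification made for this sketch only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_location;

#define TOY_ADHOC_BIT UINT32_C (0x80000000)
#define TOY_MAX_ADHOC 16

struct toy_adhoc_data
{
  toy_location locus;
  void *data;
};

static struct toy_adhoc_data toy_table[TOY_MAX_ADHOC];
static uint32_t toy_table_used;

/* True if LOC is an index into toy_table rather than a plain location.  */
static bool
toy_is_adhoc (toy_location loc)
{
  return (loc & TOY_ADHOC_BIT) != 0;
}

/* Return an adhoc location standing for the pair (LOCUS, DATA), reusing
   an existing table entry when the same pair was combined before.
   LOCUS is assumed to be a plain location with the top bit clear.  */
static toy_location
toy_combine (toy_location locus, void *data)
{
  for (uint32_t i = 0; i < toy_table_used; i++)
    if (toy_table[i].locus == locus && toy_table[i].data == data)
      return i | TOY_ADHOC_BIT;
  if (toy_table_used == TOY_MAX_ADHOC)
    return locus;   /* Toy behaviour: drop DATA when the table is full.  */
  toy_table[toy_table_used].locus = locus;
  toy_table[toy_table_used].data = data;
  return toy_table_used++ | TOY_ADHOC_BIT;
}

/* Recover the original location from either kind of value.  */
static toy_location
toy_get_locus (toy_location loc)
{
  return toy_is_adhoc (loc) ? toy_table[loc & ~TOY_ADHOC_BIT].locus : loc;
}

/* Recover the attached data, or NULL for a plain location.  */
static void *
toy_get_data (toy_location loc)
{
  return toy_is_adhoc (loc) ? toy_table[loc & ~TOY_ADHOC_BIT].data : NULL;
}
```

The appeal of such a scheme is that a location plus its attached data still fits in a single 32-bit value, at the cost of one table lookup whenever an adhoc value is decoded.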

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Gabriel Dos Reis

On Thu, Sep 6, 2012 at 1:06 AM, Marc Glisse  wrote:
> On Wed, 5 Sep 2012, Gabriel Dos Reis wrote:
>
>> On Wed, Sep 5, 2012 at 5:09 PM, Iyer, Balaji V 
>> wrote:
>>>
>>> Let's say we have two for loops like this:
>>>
>>> int my_func (int x, int y);
>>>
>>> For (ii = 0; ii < 1; ii++)
>>> X[ii] = my_func (Y[ii], Z[ii]);
>
>
> I assume X, Y and Z are __restrict pointers (or something the compiler can
> detect doesn't alias).

I assumed this much.


>
>
>> 2. Considering this example, won't you get the same behaviour
>> if my_func was declared with "pure" attribute?  If not, why?
>
>
> AFAIU, my_func is defined in a separate library and because of the attribute
> on the definition, it will actually export overloads:
> int myfunc(int,int);
> v2si myfunc(v2si,v2si);
> v4si myfunc(v4si,v4si);
> etc (where does it stop? seems problematic if the library is compiled for
> sse4 and I then compile and link an avx program)

Thanks but I was not talking of anything this complicated.
The "pure" attribute has nothing to do with overloading?


>
> (hopefully with implementations more clever than breaking the vectors into
> pieces and calling the basic myfunc on each)
>
> The attribute on the declaration then lets gcc's vectorizer know it can call
> those overloads.

My question was why the same conclusion won't be reached on the
example given if the function was declared with the "pure" attribute.

>
> With suitable pure/const attribute you could unroll the loop a bit and
> reorder the calls to myfunc, but without myfunc's body, you couldn't do as
> much.
>
> Note that this is my guess from reading the example and completely ignoring
> the patch, it could be miles from the truth, and it needs better explanation
> (the doc patch is coming later in the series IIRC).

Note that the example given was a function taking two ints and returning
an int. How would a function with "pure" attribute fool the vectorizer?

-- Gaby

RE: [PATCH] Enable bbro for -Os

2012-09-06 Thread Zhenqiang Chen

Thank you for the comments.

> -----Original Message-----
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Wednesday, September 05, 2012 9:39 PM
> To: Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Enable bbro for -Os
> 
> > Basic block reordering is disabled for -Os from gcc 4.7 since the pass
> > will lead to big code size regression. But benchmarks logs also show
> > there are lots of regression due to poor code layout compared with 4.6.
> >
> > The patch is to enable bbro for -Os. When optimizing for size, it
> > * avoid duplicating block.
> > * keep its original order if there is no chance to fall through.
> > * ignore edge frequency and probability.
> > * handle predecessor first if its index is smaller to break long trace.
> > * only connect Trace n with Trace n + 1 to reduce long jump.
> >
> > Here are the CSiBE code size benchmark results:
> > * For ARM, code size reduces 0.21%.
> > * For MIPS, code size reduces 0.25%.
> > * For PPC, code size reduces 0.33%.
> > * For X86, code size reduces 0.22%.
> 
> Interesting figures.  The patch looks good overall but, since the -Os path
> deviates substantially from the original algorithm, it needs to be clearly
> documented in the comment at the top of the file (before "Reference"), e.g.
> 
> "The above description is for the full algorithm, which is used when the
> function is optimized for speed.  When the function is optimized for size, in
> order to <...insert reasons here...>, the algorithm is modified as follows:
> <...list modifications here...>"

Add the following comments:
+   The above description is for the full algorithm, which is used when the
+   function is optimized for speed.  When the function is optimized for size,
+   in order to reduce long jump and connect more fall through edges, the
+   algorithm is modified as follows:
+   (1) Break long trace to short ones.  The trace is broken at a block, which
+   has multi-predecessors/successors during finding traces.  When connecting
+   traces, only connect Trace n with Trace n + 1.  This change reduces most
+   long jumps compared with the above algorithm.
+   (2) Ignore the edge probability and frequency for fall through edges.
+   (3) Keep its original order when there is no chance to fall through.  bbro
+   bases on the result of cfg_cleanup, which does lots of optimizations on cfg.
+   So the order is expected to be kept if no fall through.
+
+   To implement the change for code size optimization, block's index is
+   selected as the key and all traces are found in one round.
 
 
> @@ -558,6 +564,14 @@ find_traces_1_round (int branch_th, int exec_th,
> gcov_type count_th,
> /* Add all non-selected successors to the heaps.  */
> FOR_EACH_EDGE (e, ei, bb->succs)
>   {
> +   /* Wait for the predecessors.  */
> +   if ((e == best_edge) && for_size
> +   && (EDGE_COUNT (best_edge->dest->succs) > 1
> +   || EDGE_COUNT (best_edge->dest->preds) > 1))
> + {
> +   best_edge = NULL;
> + }
> +
> if (e == best_edge
> || e->dest == EXIT_BLOCK_PTR
> || bb_visited_trace (e->dest))
> 
> I don't really understand what this means and why this is done here.  If you
> don't want to add the best destination in some cases, why not do it just
> before the loop and explicitly state the reason?  And you don't need
> parentheses.

The code segment should be before the loop. Add the following comments for
the code:

+ /* If the best destination has multiple successors or predecessors,
+don't allow it to be added when optimizing for size.  This makes
+sure predecessors with smaller index handled before the best
+destination.  It breaks long trace and reduces long jumps.
+
+Take if-then-else as an example.
+   A
+  / \
+ B   C
+  \ /
+   D
+If we do not remove the best edge B->D/C->D.  The final order is
+A B D ... C.  C is at the end of the program.  If D or successors
+of D are complicated, might need long jump for A->C and C->D.
+Similar issue for order: A C D ... B.
+
+After removing the best edge, the final result will be ABCD/ACBD.
+It does not add jump compared with the previous order. But it
+reduce the possibility of long jump.  */
+ if (best_edge && for_size
+ && (EDGE_COUNT (best_edge->dest->succs) > 1
+|| EDGE_COUNT (best_edge->dest->preds) > 1))
+   best_edge = NULL;
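
The if-then-else argument in the comment above can be checked with a small stand-alone model.  This is only a sketch: the block sizes, the toy_ names and the extra block T (standing in for the "..." after D) are assumptions made for illustration and are not part of the patch or of bb-reorder.c.

```c
#include <stdlib.h>

/* Hypothetical model, not GCC code: blocks A, B, C, D of the if-then-else
   example plus T, a large block standing in for the "..." that follows D.  */
enum { TOY_A, TOY_B, TOY_C, TOY_D, TOY_T, TOY_NBLOCKS };

/* Assumed byte sizes; only T is large.  */
static const long toy_size[TOY_NBLOCKS] = { 4, 4, 4, 4, 1000 };

/* CFG edges of the example: A->B, A->C, B->D, C->D, D->T.  */
static const int toy_edges[][2] = {
  { TOY_A, TOY_B }, { TOY_A, TOY_C }, { TOY_B, TOY_D },
  { TOY_C, TOY_D }, { TOY_D, TOY_T }
};

/* Return the largest distance in bytes between the end of an edge's source
   block and the start of its destination block for the given layout.
   A fall-through edge has distance 0.  */
long
toy_max_jump_span (const int order[TOY_NBLOCKS])
{
  long start[TOY_NBLOCKS], pos = 0, worst = 0;
  size_t i;

  for (i = 0; i < TOY_NBLOCKS; i++)
    {
      start[order[i]] = pos;
      pos += toy_size[order[i]];
    }
  for (i = 0; i < sizeof toy_edges / sizeof toy_edges[0]; i++)
    {
      int src = toy_edges[i][0], dest = toy_edges[i][1];
      long span = labs (start[dest] - (start[src] + toy_size[src]));
      if (span > worst)
        worst = span;
    }
  return worst;
}

/* Layout "A B D ... C": the best edge B->D was followed.  */
long
toy_span_best_edge_kept (void)
{
  static const int order[TOY_NBLOCKS] = { TOY_A, TOY_B, TOY_D, TOY_T, TOY_C };
  return toy_max_jump_span (order);
}

/* Layout "A B C D ...": D waits until both predecessors are placed.  */
long
toy_span_best_edge_removed (void)
{
  static const int order[TOY_NBLOCKS] = { TOY_A, TOY_B, TOY_C, TOY_D, TOY_T };
  return toy_max_jump_span (order);
}
```

With these assumed sizes, the first layout needs jumps spanning 1008 bytes for A->C and C->D, while the second never jumps further than 4 bytes.  The number of jumps is the same in both layouts; only their reach differs.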

All other comments are accepted. 

The updated patch is attached. Is it OK?

Thanks!
-Zhenqiang

Enable-bbro-for-size-updated2.patch
Description: Binary data

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Marc Glisse


On Thu, 6 Sep 2012, Marc Glisse wrote:

AFAIU, my_func is defined in a separate library and because of the attribute 
on the definition, it will actually export overloads:

int myfunc(int,int);
v2si myfunc(v2si,v2si);
v4si myfunc(v4si,v4si);
etc (where does it stop? seems problematic if the library is compiled for 
sse4 and I then compile and link an avx program)


According to the doc, it only generates one of these vector versions (even 
more risk of mismatch).


Does it actually create the extra declaration in the front-end, i.e. can I 
explicitly call myfunc on a v4si that I created myself, or is the 
middle-end the only user?


--
Marc Glisse

[PATCH] Remove unused MOVE_NONTEMPORAL

2012-09-06 Thread Richard Guenther


This removes the MOVE_NONTEMPORAL tree flag which is nowhere set.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-09-06  Richard Guenther  

* tree.h (MOVE_NONTEMPORAL): Remove.
* tree-pretty-print.c (dump_generic_node): Remove
MOVE_NONTEMPORAL handling.
* expr.c (expand_expr_real_1): Likewise.

Index: gcc/tree-pretty-print.c
===
--- gcc/tree-pretty-print.c (revision 191014)
+++ gcc/tree-pretty-print.c (working copy)
@@ -1436,9 +1436,6 @@ dump_generic_node (pretty_printer *buffe
 false);
   pp_space (buffer);
   pp_character (buffer, '=');
-  if (TREE_CODE (node) == MODIFY_EXPR
- && MOVE_NONTEMPORAL (node))
-   pp_string (buffer, "{nt}");
   pp_space (buffer);
   dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags,
 false);
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 191014)
+++ gcc/tree.h  (working copy)
@@ -524,9 +524,6 @@ struct GTY(()) tree_base {
TYPE_REF_CAN_ALIAS_ALL in
POINTER_TYPE, REFERENCE_TYPE
 
-   MOVE_NONTEMPORAL in
-   MODIFY_EXPR
-
CASE_HIGH_SEEN in
CASE_LABEL_EXPR
 
@@ -1162,10 +1159,6 @@ extern void omp_clause_range_check_faile
 #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
   (PTR_OR_REF_CHECK (NODE)->base.static_flag)
 
-/* In a MODIFY_EXPR, means that the store in the expression is nontemporal.  */
-#define MOVE_NONTEMPORAL(NODE) \
-  (EXPR_CHECK (NODE)->base.static_flag)
-
 /* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
there was an overflow in folding.  */
 
Index: gcc/expr.c
===
--- gcc/expr.c  (revision 191014)
+++ gcc/expr.c  (working copy)
@@ -10319,13 +10319,13 @@ expand_expr_real_1 (tree exp, rtx target
 value ? label : 0,
 value ? 0 : label, -1);
expand_assignment (lhs, build_int_cst (TREE_TYPE (rhs), value),
-  MOVE_NONTEMPORAL (exp));
+  false);
do_pending_stack_adjust ();
emit_label (label);
return const0_rtx;
  }
 
-   expand_assignment (lhs, rhs, MOVE_NONTEMPORAL (exp));
+   expand_assignment (lhs, rhs, false);
return const0_rtx;
   }

Re: [PATCH] Fix SCCVN aggregate copy code some more

2012-09-06 Thread H.J. Lu

On Thu, Aug 11, 2011 at 8:31 AM, Richard Guenther  wrote:
>
> When adjusting the C++ FE code generation for base type copies
> I noticed the XFAILs on g++.dg/tree-ssa/pr41186.C and investigated
> things a bit.  We can fix the testcase easily iff the frontend
> emits the MEM_REF variants, thus the following patch.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2011-08-11  Richard Guenther  
>
> * tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid redundant
> lookups, make looking through aggregate copies stronger.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54498

-- 
H.J.

Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-06 Thread Georg-Johann Lay


Oleg Endo wrote:

On Wed, 2012-09-05 at 14:39 -0400, DJ Delorie wrote:

I don't feel the m32c change needs my specific ack, it's a harmless
change that goes with the ack for the feature itself.

However, I will note that m32c does have different costs for addresses
in different address spaces, at least when -Os.


I have created http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54496
for this.


The same is true for avr.  The address costs highly depend on the
address space, no matter what -O is used.

Can you explain how this works?

For example, I don't see a single call of address_cost in
lower-subreg.c, which means that at least that module does
not have a reasonable cost model.  The costs go through this odyssey:

lower-subreg.c:compute_costs()
-> rtlanal.c:insn_rtx_cost()
-> rtl.h:set_src_cost()
-> rtlanal.c:rtx_cost()
-> targetm.rtx_costs()

Each call level adds some abstraction, i.e. removes information
about the insn; actually it's no longer an insn, not even a pattern,
by the time the target hook is entered...

So at this point it appears a bit pointless to add mode
and addr_space to address_cost if the call sites don't use
that hook where it is needed.

The change is definitely in the right direction, but I wonder
how it helps to fix code bloat of 300%-400% as in PR52543?

The avr backend currently hacks around that by expanding MEM
for non-generic address space to UNSPEC.  Not nice.

Describing the cost will simply have no effect (provided that
the MEM -> UNSPEC hack were reverted).

Johann

[PATCH] Fix PR54498

2012-09-06 Thread Richard Guenther


The following fixes PR54498 - we may not treat already visited
parts of the CFG as non-aliasing after re-writing the expression
we look up.  Just abort the walk in this case.
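
The change in behaviour can be pictured with a toy walk over a chain of nodes.  This is a rough sketch only: toy_skip_until, toy_walk_demo and the array-based chain are hypothetical stand-ins, not the GCC routines, and they model nothing beyond the "visited ends the walk" decision.

```c
#include <stdbool.h>

/* Toy stand-in for walking a chain of definitions; not the GCC routine.
   next[i] is the node reached from node i, or -1 at the end of the chain.  */
bool
toy_skip_until (const int *next, int start, int target,
                bool *visited, bool abort_on_visited)
{
  int cur = start;

  while (cur != target)
    {
      if (cur < 0)
        return false;   /* Ran off the chain without reaching TARGET.  */
      if (visited[cur])
        /* An already visited node used to end the walk successfully.  When
           the caller has rewritten the reference, give up instead.  */
        return !abort_on_visited;
      visited[cur] = true;
      cur = next[cur];
    }
  return true;
}

/* Walk 0 -> 1 -> 2 -> 3 with node 1 marked as visited by an earlier walk.  */
bool
toy_walk_demo (bool abort_on_visited)
{
  const int next[4] = { 1, 2, 3, -1 };
  bool visited[4] = { false, true, false, false };

  return toy_skip_until (next, 0, 3, visited, abort_on_visited);
}
```

Without the flag the toy walk reports success as soon as it meets the visited node; with the flag set it reports failure at the same point, which is the conservative answer once the looked-up expression is no longer the one the earlier visit was made for.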

Bootstrapped and tested on x86_64-unknown-linux-gnu on the 4.7
branch, trunk testing in progress.

Richard.

2012-09-06  Richard Guenther  

PR tree-optimization/54498
* tree-ssa-alias.h (get_continuation_for_phi): Add flag to
abort when reaching an already visited region.
* tree-ssa-alias.c (maybe_skip_until): Likewise.  And do it.
(get_continuation_for_phi_1): Likewise.
(walk_non_aliased_vuses): When we translated the reference,
abort when we re-visit a region.
* tree-ssa-pre.c (translate_vuse_through_block): Adjust.

Index: gcc/tree-ssa-alias.h
===
--- gcc/tree-ssa-alias.h(revision 191016)
+++ gcc/tree-ssa-alias.h(working copy)
@@ -108,7 +108,7 @@ extern bool stmt_may_clobber_ref_p (gimp
 extern bool stmt_may_clobber_ref_p_1 (gimple, ao_ref *);
 extern bool call_may_clobber_ref_p (gimple, tree);
 extern bool stmt_kills_ref_p (gimple, tree);
-extern tree get_continuation_for_phi (gimple, ao_ref *, bitmap *);
+extern tree get_continuation_for_phi (gimple, ao_ref *, bitmap *, bool);
 extern void *walk_non_aliased_vuses (ao_ref *, tree,
 void *(*)(ao_ref *, tree, void *),
 void *(*)(ao_ref *, tree, void *), void *);
Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 191016)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -1886,7 +1886,7 @@ stmt_kills_ref_p (gimple stmt, tree ref)
 
 static bool
 maybe_skip_until (gimple phi, tree target, ao_ref *ref,
- tree vuse, bitmap *visited)
+ tree vuse, bitmap *visited, bool abort_on_visited)
 {
   basic_block bb = gimple_bb (phi);
 
@@ -1904,8 +1904,9 @@ maybe_skip_until (gimple phi, tree targe
{
  /* An already visited PHI node ends the walk successfully.  */
  if (bitmap_bit_p (*visited, SSA_NAME_VERSION (PHI_RESULT (def_stmt))))
-   return true;
- vuse = get_continuation_for_phi (def_stmt, ref, visited);
+   return !abort_on_visited;
+ vuse = get_continuation_for_phi (def_stmt, ref,
+  visited, abort_on_visited);
  if (!vuse)
return false;
  continue;
@@ -1919,7 +1920,7 @@ maybe_skip_until (gimple phi, tree targe
   if (gimple_bb (def_stmt) != bb)
{
  if (!bitmap_set_bit (*visited, SSA_NAME_VERSION (vuse)))
-   return true;
+   return !abort_on_visited;
  bb = gimple_bb (def_stmt);
}
   vuse = gimple_vuse (def_stmt);
@@ -1933,7 +1934,8 @@ maybe_skip_until (gimple phi, tree targe
 
 static tree
 get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1,
-   ao_ref *ref, bitmap *visited)
+   ao_ref *ref, bitmap *visited,
+   bool abort_on_visited)
 {
   gimple def0 = SSA_NAME_DEF_STMT (arg0);
   gimple def1 = SSA_NAME_DEF_STMT (arg1);
@@ -1946,14 +1948,14 @@ get_continuation_for_phi_1 (gimple phi,
   && dominated_by_p (CDI_DOMINATORS,
  gimple_bb (def1), gimple_bb (def0))))
 {
-  if (maybe_skip_until (phi, arg0, ref, arg1, visited))
+  if (maybe_skip_until (phi, arg0, ref, arg1, visited, abort_on_visited))
return arg0;
 }
   else if (gimple_nop_p (def1)
   || dominated_by_p (CDI_DOMINATORS,
  gimple_bb (def0), gimple_bb (def1)))
 {
-  if (maybe_skip_until (phi, arg1, ref, arg0, visited))
+  if (maybe_skip_until (phi, arg1, ref, arg0, visited, abort_on_visited))
return arg1;
 }
   /* Special case of a diamond:
@@ -1988,7 +1990,8 @@ get_continuation_for_phi_1 (gimple phi,
be found.  */
 
 tree
-get_continuation_for_phi (gimple phi, ao_ref *ref, bitmap *visited)
+get_continuation_for_phi (gimple phi, ao_ref *ref, bitmap *visited,
+ bool abort_on_visited)
 {
   unsigned nargs = gimple_phi_num_args (phi);
 
@@ -2025,7 +2028,8 @@ get_continuation_for_phi (gimple phi, ao
   for (i = 0; i < nargs; ++i)
{
  arg1 = PHI_ARG_DEF (phi, i);
- arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref, visited);
+ arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref, visited,
+abort_on_visited);
  if (!arg0)
return NULL_TREE;
}
@@ -2061,6 +2065,7 @@ walk_non_aliased_vuses (ao_ref *ref, tre
 {
   bitmap visited = NULL;
   void *res;
+  bool translated = false;
 
   timevar_push (TV_ALIAS_STMT_WALK);
 
@@ -2077,7 +2082,7 @@ walk_non_aliased_vuses (ao_ref *ref, tre

[PATCH] rs6000 TARGET_ADDRESS_COST

2012-09-06 Thread David Edelsohn

rs6000_debug_address_cost() needs to call TARGET_ADDRESS_COST with the
new arguments.

Committed.

- David

* config/rs6000/rs6000.c (rs6000_xcoff_asm_named_section): Add TLS
section.
* config/rs6000/rs6000.c (rs6000_debug_address_cost): Add new
arguments to TARGET_ADDRESS_COST call.

Index: rs6000.c
===
--- rs6000.c(revision 191020)
+++ rs6000.c(working copy)
@@ -25547,10 +25547,12 @@
tree decl ATTRIBUTE_UNUSED)
 {
   int smclass;
-  static const char * const suffix[3] = { "PR", "RO", "RW" };
+  static const char * const suffix[4] = { "PR", "RO", "RW", "TL" };

   if (flags & SECTION_CODE)
 smclass = 0;
+  else if (flags & SECTION_TLS)
+smclass = 3;
   else if (flags & SECTION_WRITE)
 smclass = 2;
   else
@@ -26071,10 +26073,10 @@
 /* Debug form of ADDRESS_COST that is selected if -mdebug=cost.  */

 static int
-rs6000_debug_address_cost (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED,
-  addr_space_t as ATTRIBUTE_UNUSED, bool speed)
+rs6000_debug_address_cost (rtx x, enum machine_mode mode,
+  addr_space_t as, bool speed)
 {
-  int ret = TARGET_ADDRESS_COST (x, speed);
+  int ret = TARGET_ADDRESS_COST (x, mode, as, speed);

   fprintf (stderr, "\nrs6000_address_cost, return = %d, speed = %s, x:\n",
   ret, speed ? "true" : "false");
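
The first hunk picks the XCOFF storage-mapping suffix from the section flags, and it tests for TLS before it tests for writable data.  A minimal sketch of that selection follows; the TOY_SECTION_* bit values are made up for this example (the real SECTION_* values are defined by GCC and are not reproduced here), and the reading that the order matters because a TLS section may also carry the write flag is an assumption.

```c
/* Hypothetical flag bits for this sketch only.  */
#define TOY_SECTION_CODE  0x1u
#define TOY_SECTION_WRITE 0x2u
#define TOY_SECTION_TLS   0x4u

/* Same suffix table as in the patch.  */
static const char *const toy_suffix[4] = { "PR", "RO", "RW", "TL" };

/* Return the index into toy_suffix for the given flags, testing the flags
   in the same order as the patched function.  */
int
toy_smclass (unsigned int flags)
{
  if (flags & TOY_SECTION_CODE)
    return 0;
  else if (flags & TOY_SECTION_TLS)
    return 3;
  else if (flags & TOY_SECTION_WRITE)
    return 2;
  else
    return 1;
}

/* Return the suffix string itself.  */
const char *
toy_suffix_for (unsigned int flags)
{
  return toy_suffix[toy_smclass (flags)];
}
```

If the write test came first, a section flagged as both TLS and writable would get "RW" rather than "TL" in this model.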

[PATCH] New PARAM_IPA_MAX_AGG_ITEMS instead of a #define

2012-09-06 Thread Martin Jambor

Hi,

the following patch replaces #define IPA_MAX_AFF_JF_ITEMS with a new
parameter PARAM_IPA_MAX_AGG_ITEMS (yeah, without the typo).
Bootstrapped and tested on x86_64-linux, I consider it obvious enough
that I will commit it next week if there are no objections.

Thanks,

Martin


2012-09-05  Martin Jambor  

* params.def (PARAM_IPA_MAX_AGG_ITEMS): New parameter.
* ipa-prop.c: Include params.h.
(IPA_MAX_AFF_JF_ITEMS): Removed.
(determine_known_aggregate_parts): Use param value of
PARAM_IPA_MAX_AGG_ITEMS instead of IPA_MAX_AFF_JF_ITEMS.
* Makefile.in (ipa-prop.o): Add PARAMS_H dependency.

Index: src/gcc/Makefile.in
===
--- src.orig/gcc/Makefile.in
+++ src/gcc/Makefile.in
@@ -2851,7 +2851,7 @@ ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SY
$(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
$(TREE_INLINE_H) $(GIMPLE_H) \
$(GIMPLE_PRETTY_PRINT_H) $(LTO_STREAMER_H) \
-   $(DATA_STREAMER_H) $(TREE_STREAMER_H)
+   $(DATA_STREAMER_H) $(TREE_STREAMER_H) $(PARAMS_H)
 ipa-ref.o : ipa-ref.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H)  $(TREE_H) $(TARGET_H) \
$(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) $(GGC_H) 
Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "lto-streamer.h"
 #include "data-streamer.h"
 #include "tree-streamer.h"
+#include "params.h"
 
 
 /* Intermediate information about a parameter that is only useful during the
@@ -1145,9 +1146,6 @@ get_ssa_def_if_simple_copy (tree rhs)
   return rhs;
 }
 
-/* TODO: Turn this into a PARAM.  */
-#define IPA_MAX_AFF_JF_ITEMS 16
-
 /* Simple linked list, describing known contents of an aggregate beforere
call.  */
 
@@ -1327,8 +1325,8 @@ determine_known_aggregate_parts (gimple
   *p = n;
 
   item_count++;
-  if (const_count == IPA_MAX_AFF_JF_ITEMS
- || item_count == 2 * IPA_MAX_AFF_JF_ITEMS)
+  if (const_count == PARAM_VALUE (PARAM_IPA_MAX_AGG_ITEMS)
+ || item_count == 2 * PARAM_VALUE (PARAM_IPA_MAX_AGG_ITEMS))
break;
 }
 
Index: src/gcc/params.def
===
--- src.orig/gcc/params.def
+++ src/gcc/params.def
@@ -885,6 +885,12 @@ DEFPARAM (PARAM_IPA_CP_EVAL_THRESHOLD,
  "beneficial to clone.",
  500, 0, 0)
 
+DEFPARAM (PARAM_IPA_MAX_AGG_ITEMS,
+ "ipa-max-agg-items",
+ "Maximum number of aggregate content items for a parameter in "
+ "jump functions and lattices",
+ 16, 0, 0)
+
 /* WHOPR partitioning configuration.  */
 
 DEFPARAM (PARAM_LTO_PARTITIONS,
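
The two limits in the determine_known_aggregate_parts hunk can be sketched outside the compiler as follows.  This is an illustration only: toy_count_known_items and its flat array of "is constant" flags are hypothetical, and max_items plays the role of the value read through PARAM_VALUE.

```c
/* Scan up to N items and return how many constant ones were recorded before
   one of the two caps was hit: MAX_ITEMS constants, or 2 * MAX_ITEMS items
   examined in total.  */
int
toy_count_known_items (const int *is_const, int n, int max_items)
{
  int const_count = 0, item_count = 0;

  for (int i = 0; i < n; i++)
    {
      if (is_const[i])
        const_count++;
      item_count++;
      if (const_count == max_items || item_count == 2 * max_items)
        break;
    }
  return const_count;
}
```

With the limit exposed as a parameter, it should be adjustable from the command line in the usual way, e.g. --param ipa-max-agg-items=32, instead of requiring a rebuild.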

[AArch64] allow 16 bytes constants in constant pool.

2012-09-06 Thread Marcus Shawcroft

Relax the logic that prevents TFmode constants being addressed in the 
constant pool.


2012-09-06  Marcus Shawcroft  

* config/aarch64/aarch64.c (aarch64_classify_address):
Allow 16 byte modes in constant pool.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 310c1a0..aa90402 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2835,9 +2835,7 @@ aarch64_classify_address (struct aarch64_address_info *info,
 case LABEL_REF:
   /* load literal: pc-relative constant pool entry.  */
   info->type = ADDRESS_SYMBOLIC;
-  if (outer_code != PARALLEL
-	  && (GET_MODE_SIZE (mode) == 4
-	  || GET_MODE_SIZE (mode) == 8))
+  if (outer_code != PARALLEL)
 	{
 	  rtx sym, addend;

[AArch64] Implement section anchors

2012-09-06 Thread James Greenhalgh


Hi,

This patch implements section anchors for the AArch64 port.

OK for aarch64-branch?

Regards,
James Greenhalgh

--
2012-09-06  James Greenhalgh  
Richard Earnshaw  

* common/config/aarch64/aarch64-common.c
(aarch_option_optimization_table): New.
(TARGET_OPTION_OPTIMIZATION_TABLE): Define.
* gcc/config.gcc ([aarch64] target_has_targetm_common): Set to yes.
* gcc/config/aarch64/aarch64-elf.h (ASM_OUTPUT_DEF): New definition.
* gcc/config/aarch64/aarch64.c (TARGET_MIN_ANCHOR_OFFSET): Define.
(TARGET_MAX_ANCHOR_OFFSET): Likewise.
diff --git gcc/common/config/aarch64/aarch64-common.c gcc/common/config/aarch64/aarch64-common.c
index df72406..bd249e1 100644
--- gcc/common/config/aarch64/aarch64-common.c
+++ gcc/common/config/aarch64/aarch64-common.c
@@ -36,6 +36,17 @@
 #undef  TARGET_HANDLE_OPTION
 #define TARGET_HANDLE_OPTION aarch64_handle_option
 
+#undef	TARGET_OPTION_OPTIMIZATION_TABLE
+#define TARGET_OPTION_OPTIMIZATION_TABLE aarch_option_optimization_table
+
+/* Set default optimization options.  */
+static const struct default_options aarch_option_optimization_table[] =
+  {
+/* Enable section anchors by default at -O1 or higher.  */
+{ OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+{ OPT_LEVELS_NONE, 0, NULL, 0 }
+  };
+
 /* Implement TARGET_HANDLE_OPTION.
This function handles the target specific options for CPU/target selection.
 
diff --git gcc/config.gcc gcc/config.gcc
index 72ef1ca..27a93b5 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -314,6 +314,7 @@ aarch64*-*-*)
 	need_64bit_hwint=yes
 	extra_headers="arm_neon.h"
 	extra_objs="aarch64-builtins.o"
+	target_has_targetm_common=yes
 	;;
 alpha*-*-*)
 	cpu_type=alpha
diff --git gcc/config/aarch64/aarch64-elf.h gcc/config/aarch64/aarch64-elf.h
index 6d8b933..1c021d0 100644
--- gcc/config/aarch64/aarch64-elf.h
+++ gcc/config/aarch64/aarch64-elf.h
@@ -25,6 +25,15 @@
 #define ASM_OUTPUT_LABELREF(FILE, NAME) \
   aarch64_asm_output_labelref (FILE, NAME)
 
+#define ASM_OUTPUT_DEF(FILE, NAME1, NAME2)	\
+  do		\
+{		\
+  assemble_name (FILE, NAME1);		\
+  fputs (" = ", FILE);			\
+  assemble_name (FILE, NAME2);		\
+  fputc ('\n', FILE);			\
+} while (0)
+
 #define TEXT_SECTION_ASM_OP	"\t.text"
 #define DATA_SECTION_ASM_OP	"\t.data"
 #define BSS_SECTION_ASM_OP	"\t.bss"
diff --git gcc/config/aarch64/aarch64.c gcc/config/aarch64/aarch64.c
index 310c1a0..98f43e1 100644
--- gcc/config/aarch64/aarch64.c
+++ gcc/config/aarch64/aarch64.c
@@ -6799,6 +6799,17 @@ aarch64_c_mode_for_suffix (char suffix)
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE aarch64_preferred_simd_mode
 
+/* Section anchor support.  */
+
+#undef TARGET_MIN_ANCHOR_OFFSET
+#define TARGET_MIN_ANCHOR_OFFSET -256
+
+/* Limit the maximum anchor offset to 4k-1, since that's the limit for a
+   byte offset; we can do much more for larger data types, but have no way
+   to determine the size of the access.  We assume accesses are aligned.  */
+#undef TARGET_MAX_ANCHOR_OFFSET
+#define TARGET_MAX_ANCHOR_OFFSET 4095
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
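
As a rough illustration of what the two limits mean, the check below tests whether an object at a given byte offset from an anchor falls inside the window the patch defines.  The function name is hypothetical; only the two constants come from the patch.

```c
#include <stdbool.h>

/* Values taken from the patch.  */
#define TOY_MIN_ANCHOR_OFFSET (-256)
#define TOY_MAX_ANCHOR_OFFSET 4095

/* Return true if OFFSET (in bytes, relative to a section anchor) lies in
   the window within which anchor-relative addressing is allowed.  */
bool
toy_offset_reachable_from_anchor (long offset)
{
  return offset >= TOY_MIN_ANCHOR_OFFSET && offset <= TOY_MAX_ANCHOR_OFFSET;
}
```

The window is asymmetric: 256 bytes below the anchor and 4095 bytes above it, the latter being the byte-offset limit mentioned in the comment.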

[AArch64] support long double exceptions and rounding mode

2012-09-06 Thread Marcus Shawcroft

Enable the raising of exceptions in long double soft float and support 
for rounding mode.


2012-09-06  Marcus Shawcroft  

* config/aarch64/sfp-machine.h (FP_EX_INVALID, FP_EX_DIVZERO)
(FP_EX_OVERFLOW, FP_EX_UNDERFLOW, FP_EX_INEXACT)
(FP_HANDLE_EXCEPTIONS, FP_RND_NEAREST, FP_RND_ZERO, FP_RND_PINF)
(FP_RND_MINF, _FP_DECL_EX, FP_INIT_ROUNDMODE, FP_ROUNDMODE): New.
diff --git a/libgcc/config/aarch64/sfp-machine.h b/libgcc/config/aarch64/sfp-machine.h
index a697348..3a09ae7 100644
--- a/libgcc/config/aarch64/sfp-machine.h
+++ b/libgcc/config/aarch64/sfp-machine.h
@@ -64,6 +64,79 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
 R##_c = FP_CLS_NAN;		\
   } while (0)
 
+#define FP_EX_INVALID	0x01
+#define FP_EX_DIVZERO	0x02
+#define FP_EX_OVERFLOW	0x04
+#define FP_EX_UNDERFLOW	0x08
+#define FP_EX_INEXACT	0x10
+
+#define FP_HANDLE_EXCEPTIONS		\
+  do {	\
+const float fp_max = __FLT_MAX__;	\
+const float fp_min = __FLT_MIN__;	\
+const float fp_1e32 = 1.0e32f;	\
+const float fp_zero = 0.0;		\
+const float fp_one = 1.0;		\
+unsigned fpsr;			\
+if (_fex & FP_EX_INVALID)		\
+  {	\
+__asm__ __volatile__ ("fdiv\ts0, %s0, %s0"			\
+			  :		\
+			  : "w" (fp_zero)\
+			  : "s0");	\
+	__asm__ __volatile__ ("mrs\t%0, fpsr" : "=r" (fpsr));		\
+  }	\
+if (_fex & FP_EX_DIVZERO)		\
+  {	\
+	__asm__ __volatile__ ("fdiv\ts0, %s0, %s1"			\
+			  :		\
+			  : "w" (fp_one), "w" (fp_zero)		\
+			  : "s0");	\
+	__asm__ __volatile__ ("mrs\t%0, fpsr" : "=r" (fpsr));		\
+  }	\
+if (_fex & FP_EX_OVERFLOW)		\
+  {	\
+__asm__ __volatile__ ("fadd\ts0, %s0, %s1"			\
+			  :		\
+			  : "w" (fp_max), "w" (fp_1e32)		\
+			  : "s0");	\
+__asm__ __volatile__ ("mrs\t%0, fpsr" : "=r" (fpsr));		\
+  }	\
+if (_fex & FP_EX_UNDERFLOW)		\
+  {	\
+	__asm__ __volatile__ ("fmul\ts0, %s0, %s0"			\
+			  :		\
+			  : "w" (fp_min)\
+			  : "s0");	\
+	__asm__ __volatile__ ("mrs\t%0, fpsr" : "=r" (fpsr));		\
+  }	\
+if (_fex & FP_EX_INEXACT)		\
+  {	\
+	__asm__ __volatile__ ("fsub\ts0, %s0, %s1"			\
+			  :		\
+			  : "w" (fp_max), "w" (fp_one)		\
+			  : "s0");	\
+	__asm__ __volatile__ ("mrs\t%0, fpsr" : "=r" (fpsr));		\
+  }	\
+  } while (0)
+
+
+#define FP_RND_NEAREST		0
+#define FP_RND_ZERO		0xc0
+#define FP_RND_PINF		0x40
+#define FP_RND_MINF		0x80
+
+#define _FP_DECL_EX \
+  unsigned long int _fpcr __attribute__ ((unused)) = FP_RND_NEAREST
+
+#define FP_INIT_ROUNDMODE			\
+  do {		\
+__asm__ __volatile__ ("mrs	%0, fpcr"	\
+			  : "=r" (_fpcr));	\
+  } while (0)
+
+#define FP_ROUNDMODE (_fpcr & 0xc0)
+
 #define	__LITTLE_ENDIAN	1234
 #define	__BIG_ENDIAN	4321
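
The rounding-mode macros boil down to masking a saved control-register value and comparing it with four constants.  A small sketch of that decoding, using the constants exactly as the patch defines them (whether they match the hardware register layout is not checked here), with hypothetical toy_ names:

```c
/* Constants copied from the patch.  */
#define TOY_FP_RND_NEAREST 0
#define TOY_FP_RND_ZERO    0xc0
#define TOY_FP_RND_PINF    0x40
#define TOY_FP_RND_MINF    0x80

/* Same masking as FP_ROUNDMODE in the patch, applied to a value passed in
   rather than to the _fpcr local.  */
unsigned long
toy_roundmode (unsigned long fpcr)
{
  return fpcr & 0xc0;
}

/* Human-readable name for the decoded mode.  */
const char *
toy_roundmode_name (unsigned long fpcr)
{
  switch (toy_roundmode (fpcr))
    {
    case TOY_FP_RND_ZERO:
      return "toward zero";
    case TOY_FP_RND_PINF:
      return "toward +inf";
    case TOY_FP_RND_MINF:
      return "toward -inf";
    default:
      return "to nearest";
    }
}
```

Bits outside the mask are ignored, so a value such as 0x1c0 still decodes as TOY_FP_RND_ZERO.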

Re: [AArch64] Implement section anchors

2012-09-06 Thread Richard Earnshaw

On 06/09/12 15:31, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch implements section anchors for the AArch64 port.
> 
> OK for aarch64-branch?
> 
> Regards,
> James Greenhalgh
> 
> --
> 2012-09-06  James Greenhalgh  
>   Richard Earnshaw  
> 
>   * common/config/aarch64/aarch64-common.c
>   (aarch_option_optimization_table): New.
>   (TARGET_OPTION_OPTIMIZATION_TABLE): Define.
>   * gcc/config.gcc ([aarch64] target_has_targetm_common): Set to yes.
>   * gcc/config/aarch64/aarch64-elf.h (ASM_OUTPUT_DEF): New definition.
>   * gcc/config/aarch64/aarch64.c (TARGET_MIN_ANCHOR_OFFSET): Define.
>   (TARGET_MAX_ANCHOR_OFFSET): Likewise.
> 
> 

OK.

R.

RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access

2012-09-06 Thread Ian Bolton

> On 2012-08-31 07:49, Ian Bolton wrote:
> > +(define_split
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +   (const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" "S")
> > +  (match_operand:DI 2 "const_int_operand" "i"))))]
> > +  ""
> > +  [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1)
> > + (match_dup 2)))))
> > +   (set (match_dup 0) (lo_sum:DI (match_dup 0)
> > +(const:DI (plus:DI (match_dup 1)
> > +   (match_dup 2)))))]
> > +  ""
> > +)
> 
> You ought not need this as a separate split, since (CONST ...) should
> be handled exactly like (SYMBOL_REF).

I see in combine.c that it does get done for a MEM (which is how my
earlier patch worked), but not when it's being used for other reasons
(hence the title of this email).

See below for current code from find_split_point:

case MEM:
#ifdef HAVE_lo_sum
  /* If we have (mem (const ..)) or (mem (symbol_ref ...)), split it
 using LO_SUM and HIGH.  */
  if (GET_CODE (XEXP (x, 0)) == CONST
  || GET_CODE (XEXP (x, 0)) == SYMBOL_REF)
{
  enum machine_mode address_mode
= targetm.addr_space.address_mode (MEM_ADDR_SPACE (x));

  SUBST (XEXP (x, 0),
 gen_rtx_LO_SUM (address_mode,
 gen_rtx_HIGH (address_mode, XEXP (x, 0)),
 XEXP (x, 0)));
  return &XEXP (XEXP (x, 0), 0);
}
#endif


If I don't use my split pattern, I could alter combine to remove the
requirement that parent is a MEM.

What do you think?

> 
> Also note that constraints ("=r" etc) aren't used for splits.
> 

If I keep the pattern, I will remove the constraints.  Thanks for the
pointers in this regard.


Cheers,
Ian

Re: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access

2012-09-06 Thread Richard Henderson

On 09/06/2012 08:06 AM, Ian Bolton wrote:
> If I don't use my split pattern, I could alter combine to remove the
> requirement that parent is a MEM.
> 
> What do you think?

I merely question the calling out of CONST as special.

Either you've got some pattern that handles SYMBOL_REF
the same way, or you're missing something.

r~

C++ PATCH for c++/54341, c++/54253 (constexpr and virtual functions)

2012-09-06 Thread Jason Merrill


Vtables were causing several different problems for constexpr:

1) Value-initializing a nearly-empty class (that has a vptr but no data) 
meant two initializers for a single base.  Fixed by not bothering to 
zero out a type with no data before calling its constructor.


2) A primary base is allocated at offset 0 even if it isn't at the 
beginning of the base-clause, but constructors initialize bases in the 
order of the base-clause.  So we need to do some adjustment to get our 
CONSTRUCTOR in the right order.


Looking at issue 2 also led me to notice that we were failing to ignore 
base fields as intended in cx_check_missing_mem_inits.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 439320363040fdcdadc1a500cf4319f5f8325bad
Author: Jason Merrill 
Date:   Wed Sep 5 22:26:43 2012 -0400

	PR c++/54341
	PR c++/54253
	* semantics.c (sort_constexpr_mem_initializers): New.
	(build_constexpr_constructor_member_initializers): Use it.
	(cx_check_missing_mem_inits): Skip artificial fields.
	* init.c (expand_aggr_init_1): Don't zero out a class
	with no data.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 09288f8..561477a 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -1742,8 +1742,10 @@ expand_aggr_init_1 (tree binfo, tree true_exp, tree exp, tree init, int flags,
  that's value-initialization.  */
   if (init == void_type_node)
 {
-  /* If no user-provided ctor, we need to zero out the object.  */
-  if (!type_has_user_provided_constructor (type))
+  /* If the type has data but no user-provided ctor, we need to zero
+	 out the object.  */
+  if (!type_has_user_provided_constructor (type)
+	  && !is_really_empty_class (type))
 	{
 	  tree field_size = NULL_TREE;
 	  if (exp != true_exp && CLASSTYPE_AS_BASE (type) != type)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index f64246d..7cd1468 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5895,6 +5895,35 @@ check_constexpr_ctor_body (tree last, tree list)
   return ok;
 }
 
+/* VEC is a vector of constructor elements built up for the base and member
+   initializers of a constructor for TYPE.  They need to be in increasing
+   offset order, which they might not be yet if TYPE has a primary base
+   which is not first in the base-clause.  */
+
+static VEC(constructor_elt,gc) *
+sort_constexpr_mem_initializers (tree type, VEC(constructor_elt,gc) *vec)
+{
+  if (!CLASSTYPE_HAS_PRIMARY_BASE_P (type)
+  || (CLASSTYPE_PRIMARY_BINFO (type)
+	  == BINFO_BASE_BINFO (TYPE_BINFO (type), 0)))
+return vec;
+
+  /* Find the element for the primary base and move it to the beginning of
+ the vec.  */
+  tree pri = BINFO_TYPE (CLASSTYPE_PRIMARY_BINFO (type));
+  VEC(constructor_elt,gc) &v = *vec;
+  int pri_idx;
+
+  for (pri_idx = 1; ; ++pri_idx)
+if (TREE_TYPE (v[pri_idx].index) == pri)
+  break;
+  constructor_elt pri_elt = v[pri_idx];
+  for (int i = 0; i < pri_idx; ++i)
+v[i+1] = v[i];
+  v[0] = pri_elt;
+  return vec;
+}
+
 /* Build compile-time evalable representations of member-initializer list
for a constexpr constructor.  */
 
@@ -5957,6 +5986,7 @@ build_constexpr_constructor_member_initializers (tree type, tree body)
 	  return body;
 	}
 	}
+  vec = sort_constexpr_mem_initializers (type, vec);
   return build_constructor (type, vec);
 }
   else
@@ -6075,14 +6105,16 @@ cx_check_missing_mem_inits (tree fun, tree body, bool complain)
 	{
 	  index = CONSTRUCTOR_ELT (body, i)->index;
 	  /* Skip base and vtable inits.  */
-	  if (TREE_CODE (index) != FIELD_DECL)
+	  if (TREE_CODE (index) != FIELD_DECL
+	  || DECL_ARTIFICIAL (index))
 	continue;
 	}
   for (; field != index; field = DECL_CHAIN (field))
 	{
 	  tree ftype;
 	  if (TREE_CODE (field) != FIELD_DECL
-	  || (DECL_C_BIT_FIELD (field) && !DECL_NAME (field)))
+	  || (DECL_C_BIT_FIELD (field) && !DECL_NAME (field))
+	  || DECL_ARTIFICIAL (field))
 	continue;
 	  ftype = strip_array_types (TREE_TYPE (field));
 	  if (type_has_constexpr_default_constructor (ftype))
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-virtual2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-virtual2.C
new file mode 100644
index 000..86040a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-virtual2.C
@@ -0,0 +1,24 @@
+// PR c++/54341
+// { dg-do compile { target c++11 } }
+
+template
+struct enable_shared_from_this
+{
+  constexpr enable_shared_from_this(); // { dg-warning "used but never defined" }
+
+private:
+  int mem;
+};
+
+class VTableClass {
+public:
+virtual void someVirtualMethod() { }
+};
+
+class SomeClass : public enable_shared_from_this< SomeClass >, public
+VTableClass { };
+
+SomeClass* createInstance()
+{
+return new SomeClass;
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-virtual3.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-virtual3.C
new file mode 100644
index 000..de446bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-virtual3.C
@@ -0,

PR testsuite/54184: rewriting failing data race test

2012-09-06 Thread Aldy Hernandez

The current test is failing on some architectures because the underlying 
gimple has changed.  I believe the best way to test the speculative 
store data race is with the simulate-thread/ harness.  We already have a 
speculative store test in the harness, so this will be a nice addition.


I have manually verified that we perform the speculative store 
without --param allow-store-data-races=0, and avoid it with that setting.


Tested on x86-64 with and without the --param.

OK for trunk?
testsuite/
PR testsuite/54184
* gcc.dg/pr52558-1.c: Delete.
* gcc.dg/simulate-thread/speculative-store-2.c: New.

diff --git a/gcc/testsuite/gcc.dg/pr52558-1.c b/gcc/testsuite/gcc.dg/pr52558-1.c
deleted file mode 100644
index c34ad06..000
--- a/gcc/testsuite/gcc.dg/pr52558-1.c
+++ /dev/null
@@ -1,22 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
-
-/* Test that `count' is not written to unless p->data > 0.  */
-
-int count;
-
-struct obj {
-int data;
-struct obj *next;
-} *q;
-
-void func()
-{
-  struct obj *p;
-  for (p = q; p; p = p->next)
-if (p->data > 0)
-  count++;
-}
-
-/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
-/* { dg-final { cleanup-tree-dump "lim1" } } */
diff --git a/gcc/testsuite/gcc.dg/simulate-thread/speculative-store-2.c b/gcc/testsuite/gcc.dg/simulate-thread/speculative-store-2.c
new file mode 100644
index 000..d4d28f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/simulate-thread/speculative-store-2.c
@@ -0,0 +1,74 @@
+/* { dg-do link } */
+/* { dg-options "--param allow-store-data-races=0 -O2" } */
+/* { dg-final { simulate-thread } } */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "simulate-thread.h"
+
+/* Test that speculative stores do not happen for --param
+   allow-store-data-races=0.  */
+
+int count, insns;
+
+struct obj {
+int data;
+struct obj *next;
+} *q;
+
+void simulate_thread_other_threads ()
+{
+  ++insns;
+  ++count;
+}
+
+int simulate_thread_step_verify ()
+{
+  return 0;
+}
+
+int simulate_thread_final_verify ()
+{
+  /* If count != insns, someone must have cached `count' and stored a
+ racy value into it.  */
+  if (count != insns)
+{
+  printf("FAIL: count was incorrectly cached\n");
+  return 1;
+}
+  return 0;
+}
+
+/* Test that `count' is not written to unless p->data > 0.  */
+
+__attribute__((noinline))
+void simulate_thread_main()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+if (p->data > 0)
+  count++;
+}
+
+struct obj *
+insert(struct obj *head, int data)
+{
+  struct obj *t = (struct obj *) malloc (sizeof (struct obj));
+  t->next = head;
+  t->data = data;
+  return t;
+}
+
+int main()
+{
+  q = insert (0, 0);
+  q = insert (q, 0);
+  q = insert (q, 0);
+  q = insert (q, 0);
+  q = insert (q, 0);
+
+  simulate_thread_main ();
+  simulate_thread_done ();
+  return 0;
+}

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Richard Henderson

On 09/06/2012 02:37 AM, Richard Guenther wrote:
> In all this seems unrelated to CILK+ work (even if you make use of this
> from within CILK+).

While true, we also asked him to split up the work.  And this piece,
done correctly, seems useful even if the rest of cilk is ignored.


r~

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Iyer, Balaji V

Hello Joseph,
Thanks for reviewing my patch. Please see my responses below:

>-Original Message-
>From: Joseph Myers [mailto:jos...@codesourcery.com]
>Sent: Wednesday, September 05, 2012 8:07 PM
>To: Iyer, Balaji V
>Cc: gcc-patches@gcc.gnu.org; Aldy Hernandez (al...@redhat.com); Jeff Law;
>r...@redhat.com
>Subject: Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On Wed, 5 Sep 2012, Iyer, Balaji V wrote:
>
>>  Attached, please find the 1st of ~22 patches that implements Cilk
>> Plus. This patch will implement Elemental Functions into the C compiler.
>> Please check it in to the trunk if it looks OK.
>>
>>  Below, I will give you a small example about what elemental function
>> is and how it can be useful. Details about elemental function can be
>> found in the following link
>> (http://software.intel.com/en-us/articles/elemental-functions-writing-
>> data-parallel-code-in-cc-using-intel-cilk-plus)
>
>That page says "To continue reading the article, click on the link below."
>but I don't see such a link below.

Sorry about that. We were recently updating the website and I think it may have 
gotten messed up in the process. I have contacted the appropriate people to fix 
it.  I will let you know as soon as the link is fixed.

>
>> * c-cpp-elem-function.c (create_processor_attribute): Likewise.
>
>I don't see a ChangeLog entry for the addition of this file at all.  When a
>new file is added, "New file." is enough entry; you don't describe particular
>things within the file.

Ok, I was mistaken there. I thought we had to add a changelog entry for every 
function and not every file. I will fix it in the updated patch I send soon.

>
>This file includes tm.h and tm_p.h.  Inclusion of these headers from front-end
>code is deprecated.  If they are really needed, please put comments on the
>includes about exactly what target macros are being used in this front-end
>code.  Similarly, use of hard-reg-set.h in front-end code is doubtful.
>Generally, please check all #includes in all new source files and make sure
>that each include is actually needed because some functionality from the
>relevant header is used in the source file; do not just copy the headers
>included by some existing source file.

OK, I will look into this. I don't believe I am using tm_p.h or tm.h. I 
put them in just in case.

>
>create_processor_attribute contains hardcoded references to x86-specific
>functionality.  This is not OK; all such target dependencies need to be kept
>within the back ends, and handled from the rest of the compiler via target
>hooks (in most cases, new target dependencies must use target hooks not target
>macros).

The only thing I am doing in that function is adding the appropriate attribute. 
In an elemental function, there is a processor clause that allows users to set 
the type of processor they want the function compiled for. All I am doing is 
mapping that information to the appropriate "arch" attribute. I didn't think it 
had any back end peculiarity.

>
>Please make sure every new function has a comment explicitly describing the
>semantics of every parameter and the return value as well as anything else the
>function does.

I was trying to follow suit with other functions nearby. Most functions had 1- 
or 2-line header comments that give a quick description of the function. 

>
>Where there are alternative versions of functions/macros with/without explicit
>locations, please use the forms with explicit locations (e.g.
>build2_loc instead of build2), and try to link the locations to particular
>source code tokens and pass those locations down explicitly to each function as
>needed.

I have tried to preserve the location wherever appropriate. In many cases I 
used UNKNOWN_LOCATION or omitted the location information because the code is 
internally generated and has no line number.

>
>There may be more issues; I'll await a revised patch before doing further
>review.

I will send another one ASAP.

>
>--
>Joseph S. Myers
>jos...@codesourcery.com

Yours Sincerely,

Balaji V. Iyer.

Re: [Patch ARM] implement bswap16

2012-09-06 Thread Christophe Lyon

On 6 September 2012 10:48, Richard Earnshaw  wrote:
> On 05/09/12 17:01, Christophe Lyon wrote:
>
> +(define_insn "*arm_revsh"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
> +   (sign_extend:SI (bswap:HI (match_operand:HI 1 "s_register_operand" "r"))))]
> +  "TARGET_32BIT && arm_arch6"
> +  "revsh%?\t%0, %1"
> +  [(set_attr "predicable" "yes")
> +   (set_attr "length" "4")]
>
> Can you add additional constraints for the t1 encoding for this and the other 
> TARGET_32BIT patterns.  Then the compiler will get the length calculations 
> correct.  Something like:
>
>
> (define_insn "*arm_revsh"
> +  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
> +   (sign_extend:SI (bswap:HI (match_operand:HI 1 "s_register_operand" "l,r"))))]
>   "TARGET_32BIT && arm_arch6"
>   "revsh%?\t%0, %1"
>   [(set_attr "predicable" "yes")
> +   (set_attr "arch" "t2,*")
> +   (set_attr "length" "2,4")]
>
> Brownie points for retro-fitting this to the existing rev patterns.

OK I will do it.

But why are the thumb1_XXX patterns still necessary?
I tried removing them, but compiling the testcase with -march=armv6
-mthumb makes the compiler fail (internal compiler error:
output_operand: invalid %-code)

> +(define_expand "bswaphi2"
> +  [(set (match_operand:HI 0 "s_register_operand" "=r")
> +   (bswap:HI (match_operand:HI 1 "s_register_operand" "r")))]
>
> Define_expand doesn't take constraints.
Oops. So I'll also have to clean bswapsi2 :-)

> Finally, these patterns should be grouped with the other byte-reversal 
> patterns in arm.md, not placed at the end of the file.
I am not sure I understand: I added them right after bswapsi2; do
they need to be before it?

Christophe.

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Richard Henderson

On 09/05/2012 03:09 PM, Iyer, Balaji V wrote:
> If we annotate *both* the function declaration and the function with the 
> following attribute, the compiler will create a vector and scalar version of 
> the function. 
> 
> __attribute__((vector)) my_func (int x, int y);
> 
> __attribute__((vector)) my_func (int x, int y) 
> {
>   ... /* Body of the function.  */
> }

I know Marc Glisse has already brought this up down-thread, but I'll
re-iterate for emphasis: You cannot possibly form a stable, exportable
ABI with this alone.

At minimum I would say that the vectorlength parameter would have to
be mandatory on declarations to be useful.  One could reasonably leave
them to default on definitions, or have explicit vectorlength(default)
for declarations internal to a project (i.e. assert that the file that
contains the elemental function is compiled with the same compile flags
and so will make the same choices for default).

I see that Intel does not even begin to address this within their own
documentation.  Is the problem of ABIs vs target cpus really being ignored?

r~

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Iyer, Balaji V

Hello Marc,
Please see my response below. 

Thanks for looking at my patch!

Sincerely,

Balaji V. Iyer.

>-Original Message-
>From: Marc Glisse [mailto:marc.gli...@inria.fr]
>Sent: Thursday, September 06, 2012 2:06 AM
>To: Gabriel Dos Reis
>Cc: Iyer, Balaji V; gcc-patches@gcc.gnu.org; Aldy Hernandez
>(al...@redhat.com); Jeff Law; r...@redhat.com
>Subject: Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On Wed, 5 Sep 2012, Gabriel Dos Reis wrote:
>
>> On Wed, Sep 5, 2012 at 5:09 PM, Iyer, Balaji V  
>> wrote:
>>> Let's say we have two for loops like this:
>>>
>>> int my_func (int x, int y);
>>>
>>> For (ii = 0; ii < 1; ii++)
>>> X[ii] = my_func (Y[ii], Z[ii]);
>
>I assume X, Y and Z are __restrict pointers (or something the compiler can
>detect doesn't alias).

Yes, the compiler must detect that.

>
>> 2. Considering this example, won't you get the same behaviour
>> if my_func was declared with "pure" attribute?  If not, why?
>
>AFAIU, my_func is defined in a separate library and because of the attribute on
>the definition, it will actually export overloads:
>int myfunc(int,int);
>v2si myfunc(v2si,v2si);
>v4si myfunc(v4si,v4si);
>etc (where does it stop? seems problematic if the library is compiled for
>sse4 and I then compile and link an avx program)

The user can provide at most 1 vector length and the compiler will map it to 
appropriate vector value. If the user omits the vectorlength clause then the 
compiler picks a vectorlength based on the architecture's vector units and the 
data width. So, it will stop at 2 (1 scalar and 1 vector) :-).
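
To make the "1 scalar and 1 vector" outcome concrete, here is a rough sketch written with GCC's generic vector extension rather than the Cilk Plus attribute itself. The function body, the name my_func_v4si, and the fixed vector length of 4 are all assumptions for illustration; the patch generates its vector variant internally and its naming is not shown here.

```c
#include <string.h>

/* Four ints in one 16-byte vector (GCC generic vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Scalar version; the body is hypothetical.  */
static int my_func (int x, int y)
{
  return x * y + 1;
}

/* Hand-written stand-in for the vector version the compiler would
   create for a vector length of 4.  */
static v4si my_func_v4si (v4si x, v4si y)
{
  v4si one = { 1, 1, 1, 1 };
  return x * y + one;
}

/* The original loop, calling the scalar version once per element.  */
static void apply_scalar (int *X, const int *Y, const int *Z, size_t n)
{
  size_t ii;
  for (ii = 0; ii < n; ii++)
    X[ii] = my_func (Y[ii], Z[ii]);
}

/* The same loop after vectorization: one call per four elements.
   N must be a multiple of 4 in this sketch.  */
static void apply_vector (int *X, const int *Y, const int *Z, size_t n)
{
  size_t ii;
  for (ii = 0; ii < n; ii += 4)
    {
      v4si y, z, r;
      memcpy (&y, &Y[ii], sizeof y);
      memcpy (&z, &Z[ii], sizeof z);
      r = my_func_v4si (y, z);
      memcpy (&X[ii], &r, sizeof r);
    }
}
```

The caller of apply_vector has to agree with the library on the vector width used by my_func_v4si, which is where the vector length choice becomes visible across translation units.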

>
>(hopefully with implementations more clever than breaking the vectors into
>pieces and calling the basic myfunc on each)
>
>The attribute on the declaration then lets gcc's vectorizer know it can call
>those overloads.
>
>With suitable pure/const attribute you could unroll the loop a bit and reorder
>the calls to myfunc, but without myfunc's body, you couldn't do as much.
>
>Note that this is my guess from reading the example and completely ignoring the
>patch, it could be miles from the truth, and it needs better explanation (the
>doc patch is coming later in the series IIRC).
>
>--
>Marc Glisse

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Gabriel Dos Reis

On Thu, Sep 6, 2012 at 10:51 AM, Richard Henderson  wrote:
> On 09/06/2012 02:37 AM, Richard Guenther wrote:
>> In all this seems unrelated to CILK+ work (even if you make use of this
>> from within CILK+).
>
> While true, we also asked him to split up the work.  And this piece,
> done correctly, seems useful even if the rest of cilk is ignored.

Fully agreed.

The language/front-end modifications need more discussion and
explanation though.

-- Gaby

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Joseph S. Myers

On Thu, 6 Sep 2012, Iyer, Balaji V wrote:

> Ok, I was mistaken there. I thought we had to add a changelog entry for 
> every function and not every file. I will fix it in the updated patch I 
> send soon.

For functions in existing files you do need to mention each function - but 
not for new files.

> >create_processor_attribute contains hardcoded references to x86-specific
> >functionality.  This is not OK; all such target dependencies need to be kept
> >within the back ends, and handled from the rest of the compiler via target
> >hooks (in most cases, new target dependencies must use target hooks not
> >target macros).
> 
> The only thing I am doing in that function is to add appropriate 
> attribute. In elemental function, there is a processor clause that will 
> allow users to set the type of processor they want the function compiled 
> for. All I am doing is to map that information to the appropriate "arch" 
> attribute. I didn't think it had any back end peculiarity.

Concepts such as "pentium_4" are architecture-specific and have no place 
in front-end files.  This whole mapping from one sort of string to another 
belongs within the back end.
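
As a rough sketch of that separation (not GCC's actual hook machinery): the front end below only sees an opaque hook, and the string table lives with the target. The struct, the hook name processor_to_arch, and the single "pentium_4" to "pentium4" entry are hypothetical and chosen purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified hook table.  */
struct target_hooks
{
  /* Map a processor-clause string to an "arch" attribute string,
     or return NULL if the target does not recognize it.  */
  const char *(*processor_to_arch) (const char *processor);
};

/* --- Back-end side: the only place that knows x86 names.  --- */
static const char *x86_processor_to_arch (const char *processor)
{
  static const struct { const char *clause; const char *arch; } map[] = {
    { "pentium_4", "pentium4" },   /* illustrative entry only */
  };
  size_t i;

  for (i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strcmp (processor, map[i].clause) == 0)
      return map[i].arch;
  return NULL;
}

static const struct target_hooks x86_hooks = { x86_processor_to_arch };

/* A target that provides no mapping at all.  */
static const struct target_hooks default_hooks = { NULL };

/* --- Front-end side: no target names appear here.  --- */
static const char *frontend_arch_for_clause (const struct target_hooks *targ,
                                             const char *clause)
{
  if (!targ->processor_to_arch)
    return NULL;
  return targ->processor_to_arch (clause);
}
```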

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C++ Patch] PR 18747

2012-09-06 Thread Paolo Carlini


Hi,

On 09/06/2012 02:03 AM, Jason Merrill wrote:

but note that for:

template 
struct A
{
int select() { return 0; }
};

we have parser->num_template_parameter_lists == 1 and num_templates ==
0. Thus it seems that the case 'num_templates + 1' isn't (just) about
member templates...


That's odd, num_templates should be 1.

Yes, it seems odd, the same happens for template functions, like simply:

template 
void foo();

Really, whatever we do at least the comment should be adjusted not to 
mention only member templates.

  And I notice that cp_parser_check_declarator_template_parameters has
another copy of the num_template_headers_for_class logic; they should
be merged.
Something like the attached passes testing, the clean-up is nice, but 
I'm not sure about the specific details of the existing code vs 
num_template_headers_for_class, whether we can just call the latter as I 
did and be done.

I think the problem with 24314 is that we try to decide how many
template headers we want before we determine what declaration we're
looking at.  When we have a redefinition or specialization, we know
exactly how many headers we want, and we should check accordingly
rather than say N or N+1.
I see. Yesterday at some point I wondered whether we could do that 
already when cp_parser_check_template_parameters is called, that is 
remove the + 1 case and make the callers more precise. But I understand 
now that it's too early. Then, are you under the impression that we 
should still have cp_parser_check_template_parameters as-is and add a 
check later, or have only the late one?


I'm looking more into this.

Thanks,
Paolo.

Index: parser.c
===
--- parser.c(revision 191016)
+++ parser.c(working copy)
@@ -20670,55 +20670,25 @@ cp_parser_check_declarator_template_parameters (cp
cp_declarator *declarator,
location_t declarator_location)
 {
-  unsigned num_templates;
-
-  /* We haven't seen any classes that involve template parameters yet.  */
-  num_templates = 0;
-
   switch (declarator->kind)
 {
 case cdk_id:
-  if (declarator->u.id.qualifying_scope)
-   {
- tree scope;
+  {
+   unsigned num_templates = 0;
+   tree scope = declarator->u.id.qualifying_scope;
 
- scope = declarator->u.id.qualifying_scope;
+   if (scope)
+ num_templates = num_template_headers_for_class (scope);
+   else if (TREE_CODE (declarator->u.id.unqualified_name)
+== TEMPLATE_ID_EXPR)
+ /* If the DECLARATOR has the form `X' then it uses one
+additional level of template parameters.  */
+ ++num_templates;
 
- while (scope && CLASS_TYPE_P (scope))
-   {
- /* You're supposed to have one `template <...>'
-for every template class, but you don't need one
-for a full specialization.  For example:
+   return cp_parser_check_template_parameters 
+ (parser, num_templates, declarator_location, declarator);
+  }
 
-template  struct S{};
-template <> struct S { void f(); };
-void S::f () {}
-
-is correct; there shouldn't be a `template <>' for
-the definition of `S::f'.  */
- if (!CLASSTYPE_TEMPLATE_INFO (scope))
-   /* If SCOPE does not have template information of any
-  kind, then it is not a template, nor is it nested
-  within a template.  */
-   break;
- if (explicit_class_specialization_p (scope))
-   break;
- if (PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (scope)))
-   ++num_templates;
-
- scope = TYPE_CONTEXT (scope);
-   }
-   }
-  else if (TREE_CODE (declarator->u.id.unqualified_name)
-  == TEMPLATE_ID_EXPR)
-   /* If the DECLARATOR has the form `X' then it uses one
-  additional level of template parameters.  */
-   ++num_templates;
-
-  return cp_parser_check_template_parameters 
-   (parser, num_templates, declarator_location, declarator);
-
-
 case cdk_function:
 case cdk_array:
 case cdk_pointer:

Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Gabriel Dos Reis

On Thu, Sep 6, 2012 at 11:11 AM, Iyer, Balaji V  wrote:

>>> On Wed, Sep 5, 2012 at 5:09 PM, Iyer, Balaji V  
>>> wrote:
>>>> Let's say we have two for loops like this:
>>>>
>>>> int my_func (int x, int y);
>>>>
>>>> For (ii = 0; ii < 1; ii++)
>>>> X[ii] = my_func (Y[ii], Z[ii]);
>>
>>I assume X, Y and Z are __restrict pointers (or something the compiler can
>>detect doesn't alias).
>
> Yes, the compiler must detect that.

Exactly what do you mean by detect?
That the user has supplied somewhere a declaration saying the pointers
are restricted? (Note that C++11 does not have restrict).  Or that the
compiler must perform an alias analysis?

>>> 2. Considering this example, won't you get the same behaviour
>>> if my_func was declared with "pure" attribute?  If not, why?
>>
>>AFAIU, my_func is defined in a separate library and because of the attribute
>>on the definition, it will actually export overloads:
>>int myfunc(int,int);
>>v2si myfunc(v2si,v2si);
>>v4si myfunc(v4si,v4si);
>>etc (where does it stop? seems problematic if the library is compiled for
>>sse4 and I then compile and link an avx program)
>
> The user can provide at most 1 vector length and the compiler will map it to 
> appropriate vector value. If the user omits the vectorlength clause then the 
> compiler picks a vectorlength based on the architecture's vector units and 
> the data width. So, it will stop at 2 (1 scalar and 1 vector) :-).

Is that part of the ABI and the function declaration?
Note that in C++11 (and I suppose in C++1y), attributes are supposed to be
semantics-neutral, in the sense that if a program compiles with attributes, then
ignoring those attributes should also lead to a well-formed program with the
same observable behaviour.

This brings us to the question: do you expect your proposal to the C++ committee
to be adopted as is, or do you anticipate or expect changes based on committee
feedback?  If you expect changes, what policy to propose for changes that would
reflect any feedback you would get from WG21?  The reason why this is important
is because WG21 has its own schedule, independent of GCC, and GCC has to
deal with forward/backward compatibility.

-- Gaby

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Iyer, Balaji V

Sorry, I didn't see this message. Please see my responses below:

>-Original Message-
>From: Marc Glisse [mailto:marc.gli...@inria.fr]
>Sent: Thursday, September 06, 2012 7:04 AM
>To: gcc-patches@gcc.gnu.org
>Cc: Gabriel Dos Reis; Iyer, Balaji V; Aldy Hernandez (al...@redhat.com);
>Jeff Law; r...@redhat.com
>Subject: Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On Thu, 6 Sep 2012, Marc Glisse wrote:
>
>> AFAIU, my_func is defined in a separate library and because of the
>> attribute on the definition, it will actually export overloads:
>> int myfunc(int,int);
>> v2si myfunc(v2si,v2si);
>> v4si myfunc(v4si,v4si);
>> etc (where does it stop? seems problematic if the library is compiled
>> for
>> sse4 and I then compile and link an avx program)
>
>According to the doc, it only generates one of these vector versions (even more
>risk of mismatch).
>
>Does it actually create the extra declaration in the front-end, i.e. can I
>explicitly call myfunc on a v4si that I created myself, or is the middle-end
>the only user?


Yes you can call the function yourself.

>
>--
>Marc Glisse

Re: [Patch ARM] implement bswap16

2012-09-06 Thread Richard Earnshaw

On 06/09/12 17:07, Christophe Lyon wrote:
> On 6 September 2012 10:48, Richard Earnshaw  wrote:
>> On 05/09/12 17:01, Christophe Lyon wrote:
>>
>> +(define_insn "*arm_revsh"
>> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
>> +   (sign_extend:SI (bswap:HI (match_operand:HI 1 "s_register_operand" "r"))))]
>> +  "TARGET_32BIT && arm_arch6"
>> +  "revsh%?\t%0, %1"
>> +  [(set_attr "predicable" "yes")
>> +   (set_attr "length" "4")]
>>
>> Can you add additional constraints for the t1 encoding for this and the 
>> other TARGET_32BIT patterns.  Then the compiler will get the length 
>> calculations correct.  Something like:
>>
>>
>> (define_insn "*arm_revsh"
>> +  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
>> +   (sign_extend:SI (bswap:HI (match_operand:HI 1 "s_register_operand" "l,r"))))]
>>   "TARGET_32BIT && arm_arch6"
>>   "revsh%?\t%0, %1"
>>   [(set_attr "predicable" "yes")
>> +   (set_attr "arch" "t2,*")
>> +   (set_attr "length" "2,4")]
>>
>> Brownie points for retro-fitting this to the existing rev patterns.
> 
> OK I will do it.
> 
> But why are the thumb1_XXX patterns still necessary?
> I tried removing them, but compiling the testcase with -march=armv6
> -mthumb makes the compiler fail (internal compiler error:
> output_operand: invalid %-code)
> 

They probably aren't necessary.  It should be possible to combine these
patterns into

(define_insn "*arm_revsh"
  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
        (sign_extend:SI (bswap:HI (match_operand:HI 1
                                   "s_register_operand" "l,r"))))]
  "arm_arch6"
  "revsh%?\t%0, %1"
  [(set_attr "predicable" "yes")
   (set_attr "arch" "t,32")
   (set_attr "length" "2,4")]


>> +(define_expand "bswaphi2"
>> +  [(set (match_operand:HI 0 "s_register_operand" "=r")
>> +   (bswap:HI (match_operand:HI 1 "s_register_operand" "r")))]
>>
>> Define_expand doesn't take constraints.
> Oops. So I'll also have to clean bswapsi2 :-)
> 
>> Finally, these patterns should be grouped with the other byte-reversal 
>> patterns in arm.md, not placed at the end of the file.
> I am not sure I understand: I added them right after bswapsi2; do
> they need to be before it?
> 

Strike that bit, then.  I saw the reference to "include "ldmstm.md"" in
the tail of your patch and hadn't realised that the bswap patterns were
already immediately before that.
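
For reference, the RTL in the *arm_revsh pattern, (sign_extend:SI (bswap:HI ...)), corresponds to the following plain C model. The function name is made up for this sketch; it only mirrors what the pattern describes and says nothing about which encoding the assembler picks.

```c
#include <stdint.h>

/* Model of (sign_extend:SI (bswap:HI x)): swap the two bytes of a
   halfword, then sign-extend the 16-bit result to 32 bits.  */
static int32_t revsh_model (uint16_t x)
{
  uint16_t swapped = (uint16_t) ((x << 8) | (x >> 8));

  /* Sign-extend by hand to avoid relying on implementation-defined
     narrowing conversions.  */
  if (swapped & 0x8000)
    return (int32_t) swapped - 0x10000;
  return (int32_t) swapped;
}
```

For example, 0x1234 becomes 0x3412, while 0x0080 becomes 0x8000 before extension and therefore yields -32768.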

Re: [PATCH, ARM] Cleanup: Replace GET_CODE comparisons with predicates

2012-09-06 Thread Richard Earnshaw

On 06/09/12 12:29, Kyrylo Tkachov wrote:
> 2012-09-06  Kyrylo Tkachov  
>  
>  * config/arm/arm.c: Use CONST_INT_P, CONST_DOUBLE_P, REG_P, MEM_P,
>  LABEL_P, JUMP_P, CALL_P, NOTE_P, BARRIER_P consistently.
>  * config/arm/arm.h: Use REG_P, MEM_P consistently.
>  * config/arm/arm.md: Use CONST_INT_P, REG_P, MEM_P, CONST_DOUBLE_P
>  consistently.
>  * config/arm/neon.md: Use REG_P consistently.
>  * config/arm/predicates.md: Use CONST_INT_P, REG_P, MEM_P
> consistently.
>  * config/arm/thumb2.md: Use CONST_INT_P, REG_P consistently.
>  * config/arm/vec-common.md: Use REG_P consistently.

Thanks, I've put this in.

R.

RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access

2012-09-06 Thread Ian Bolton

> From: Richard Henderson [mailto:r...@redhat.com]
> On 09/06/2012 08:06 AM, Ian Bolton wrote:
> > If I don't use my split pattern, I could alter combine to remove the
> > requirement that parent is a MEM.
> >
> > What do you think?
> 
> I merely question the calling out of CONST as special.
> 
> Either you've got some pattern that handles SYMBOL_REF
> the same way, or you're missing something.

Oh, I understand now.  Thanks for clarifying.

Some digging has shown me that the transformation keys off
the equivalence, as highlighted below.  It's always phrased in
terms of a const and never a symbol_ref.


after ud_dce:

   6 r82:DI=high(`arr')
   7 r81:DI=r82:DI+low(`arr')
 REG_DEAD: r82:DI
 REG_EQUAL: `arr'
   8 r80:DI=r81:DI+0xc
 REG_DEAD: r81:DI
 REG_EQUAL: const(`arr'+0xc)   <- this equivalence

after combine:

   7 r80:DI=high(const(`arr'+0xc))
   8 r80:DI=r80:DI+low(const(`arr'+0xc))
 REG_EQUAL: const(`arr'+0xc)   <- this equivalence
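
A small arithmetic model of why the two sequences above are equivalent. It assumes high() keeps everything above the low 12 bits and low() keeps the low 12 bits, in the style of an adrp/add pair; the RTL high/low operators themselves are more abstract than this, and the helper names are invented for the sketch.

```c
#include <stdint.h>

/* Assumed split: 4 KiB page part and 12-bit offset part.  */
static uint64_t high_part (uint64_t addr) { return addr & ~(uint64_t) 0xfff; }
static uint64_t low_part (uint64_t addr)  { return addr & 0xfff; }

/* Before combine: three insns, the offset is added last.  */
static uint64_t before_combine (uint64_t arr)
{
  uint64_t r82 = high_part (arr);
  uint64_t r81 = r82 + low_part (arr);
  uint64_t r80 = r81 + 0xc;
  return r80;
}

/* After combine: two insns, the offset is folded into the constant.  */
static uint64_t after_combine (uint64_t arr)
{
  uint64_t r80 = high_part (arr + 0xc);
  r80 = r80 + low_part (arr + 0xc);
  return r80;
}
```

Both functions return arr + 0xc, including when the offset carries into the high part (for instance arr == 0x412ff8).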


Based on that, and assuming I remove the constraints on the
pattern, would you say the patch is worthy of commit?

Thanks,
Ian

Re: [C++ Patch] PR 18747

2012-09-06 Thread Jason Merrill


On 09/06/2012 12:18 PM, Paolo Carlini wrote:

On 09/06/2012 02:03 AM, Jason Merrill wrote:

but note that for:

template 
struct A
{
int select() { return 0; }
};

we have parser->num_template_parameter_lists == 1 and num_templates ==
0. Thus it seems that the case 'num_templates + 1' isn't (just) about
member templates...


That's odd, num_templates should be 1.

Yes, it seems odd, the same happens for template functions, like simply:

template 
void foo();


That seems right; there is one more set of template parameters than 
required for the scope, so foo itself is a primary template.  If above 
you were talking about when we're looking at "struct A" it is right, too.



Really, whatever we do at least the comment should be adjusted not to
mention only member templates.


Yes, it's any case of declaring a primary template.


Something like the attached passes testing, the clean-up is nice, but
I'm not sure about the specific details of the existing code vs
num_template_headers_for_class, whether we can just call the latter as I
did and be done.


I think I prefer the code from here and would change 
num_template_headers_for_class to match.



I see. Yesterday at some point I wondered whether we could do that
already when cp_parser_check_template_parameters is called, that is
remove the + 1 case and make the callers more precise. But I understand
now that it's too early. Then, are you under the impression that we
should still have cp_parser_check_template_parameters as-is and add a
check later, or have only the late one?


I think it ought to work to only have a late check.

Jason

Bump minimum gmp version to 4.2.3

2012-09-06 Thread Diego Novillo

Doug was trying to build with gmp 4.2.1 and ran into this error:

gmp-4.2.1/include/gmp.h:515:12: error: 'std::FILE' has not been declared

This bug was fixed in gmp 4.2.3, so I've bumped the minimum
acceptable version in configure.ac

Tested on x86_64 with gmp 4.2.3.  Committed to trunk.
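
The version check in the patch below packs major/minor/patchlevel into one integer so that a single comparison orders versions correctly. A standalone sketch of the same arithmetic follows; the macro is copied from the patch, while gmp_version_ok is a helper invented here.

```c
/* Same packing as in configure.ac: one byte each for minor and
   patchlevel, major above them.  */
#define GCC_GMP_VERSION_NUM(a,b,c) (((a) << 16L) | ((b) << 8) | (c))

/* Nonzero if MAJOR.MINOR.PATCH meets the new 4.2.3 minimum.  */
static int gmp_version_ok (int major, int minor, int patch)
{
  return GCC_GMP_VERSION_NUM (major, minor, patch)
         >= GCC_GMP_VERSION_NUM (4, 2, 3);
}
```

With this, 4.2.1 is rejected, while 4.2.3, 4.3.0 and 5.0.0 are accepted.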


Diego.

* configure.ac: Bump minimum GMP version to 4.2.3.
* configure: Re-generate.

diff --git a/configure b/configure
index 8272ed0..a5df0c8 100755
--- a/configure
+++ b/configure
@@ -5318,7 +5318,7 @@ main ()
 
   #define GCC_GMP_VERSION_NUM(a,b,c) (((a) << 16L) | ((b) << 8) | (c))
   #define GCC_GMP_VERSION GCC_GMP_VERSION_NUM(__GNU_MP_VERSION,__GNU_MP_VERSION_MINOR,__GNU_MP_VERSION_PATCHLEVEL)
-  #if GCC_GMP_VERSION < GCC_GMP_VERSION_NUM(4,2,0)
+  #if GCC_GMP_VERSION < GCC_GMP_VERSION_NUM(4,2,3)
   choke me
   #endif
 
diff --git a/configure.ac b/configure.ac
index 36830d1..a6f5828 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1358,7 +1358,7 @@ if test -d ${srcdir}/gcc && test "x$have_gmp" = xno; then
   AC_TRY_COMPILE([#include "gmp.h"],[
   #define GCC_GMP_VERSION_NUM(a,b,c) (((a) << 16L) | ((b) << 8) | (c))
   #define GCC_GMP_VERSION GCC_GMP_VERSION_NUM(__GNU_MP_VERSION,__GNU_MP_VERSION_MINOR,__GNU_MP_VERSION_PATCHLEVEL)
-  #if GCC_GMP_VERSION < GCC_GMP_VERSION_NUM(4,2,0)
+  #if GCC_GMP_VERSION < GCC_GMP_VERSION_NUM(4,2,3)
   choke me
   #endif
   ], [AC_TRY_COMPILE([#include <gmp.h>],[

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Iyer, Balaji V

Hello Richard,
I forgot to answer one of your questions. Please see it below:

Thanks,

Balaji V. Iyer.


>+static tree
>+handle_vector_attribute (tree *node, tree name ATTRIBUTE_UNUSED,
>+   tree args ATTRIBUTE_UNUSED,
>+   int ARG_UNUSED (flags), bool *no_add_attrs) {
>+  tree opt_list;
>+  VEC(tree,gc) *opt_vec = NULL;
>+  opt_vec = make_tree_vector ();
>+  VEC_safe_push (tree, gc, opt_vec, build_string (2, "O3"));
>+  opt_list = build_tree_list_vec (opt_vec);
>+  release_tree_vector (opt_vec);
>+  handle_optimize_attribute (node, get_identifier ("optimize"), opt_list,
>+   flags, no_add_attrs);
>
>Please no - do not use "optimize" attributes from inside the implementation.
>What happens if the user also specifies an optimize attribute?
>The above also doesn't make sense to me, so please elaborate on why you want
>to enable -O3 for a function marked with the vector attribute.

The reason why I used optimize is because I would like to turn on the 
vectorizer. As far as I can tell, the only way to do that is to have -O3. 
Please advise if there is a better way to do so.

[PATCH] OpenBSD/hppa support

2012-09-06 Thread Mark Kettenis

Most bits are stolen from Linux, but there are a few subtle
differences since our assembler is configured to be slightly more
HP-UX-ish.


libgcc/:

2012-09-06  Mark Kettenis  

* config.host (hppa-*-openbsd*): New target.
* config/pa/t-openbsd: New file.

gcc/:

2012-09-06  Mark Kettenis  

* config.gcc (hppa*-*-openbsd*): New target.
* config/pa/pa-openbsd.h: New file.
* config/pa/pa32-openbsd.h: New file.
* config/host-openbsd.c (TRY_EXCEPT_VM_SPACE): Define for
OpenBSD/hppa.


Index: libgcc/config/pa/t-openbsd
===
--- libgcc/config/pa/t-openbsd  (revision 0)
+++ libgcc/config/pa/t-openbsd  (working copy)
@@ -0,0 +1,9 @@
+#Plug millicode routines into libgcc.a  We want these on both native and
+#cross compiles.  We use the "64-bit" routines because the "32-bit" code
+#is broken for certain corner cases.
+LIB1ASMSRC = pa/milli64.S
+LIB1ASMFUNCS = _divI _divU _remI _remU _div_const _mulI _dyncall
+
+HOST_LIBGCC2_CFLAGS += -DELF=1 -DLINUX=1
+
+LIB2ADD = $(srcdir)/config/pa/fptr.c
Index: libgcc/config.host
===
--- libgcc/config.host  (revision 190881)
+++ libgcc/config.host  (working copy)
@@ -496,6 +496,9 @@
extra_parts="libgcc_stub.a"
md_unwind_header=pa/hpux-unwind.h
;;
+hppa*-*-openbsd*)
+   tmake_file="$tmake_file pa/t-openbsd"
+   ;;
 i[34567]86-*-darwin*)
tmake_file="$tmake_file i386/t-crtpc i386/t-crtfm"
tm_file="$tm_file i386/darwin-lib.h"
Index: gcc/config/pa/pa-openbsd.h
===
--- gcc/config/pa/pa-openbsd.h  (revision 0)
+++ gcc/config/pa/pa-openbsd.h  (working copy)
@@ -0,0 +1,162 @@
+/* Definitions for PA_RISC with ELF format
+   Copyright 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010,
+   2011
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+#undef TARGET_OS_CPP_BUILTINS
+#define TARGET_OS_CPP_BUILTINS()   \
+  do   \
+{  \
+   OPENBSD_OS_CPP_BUILTINS();  \
+   builtin_assert ("machine=bigendian");   \
+}  \
+  while (0)
+
+/* Our profiling scheme doesn't LP labels and counter words.  */
+#define NO_DEFERRED_PROFILE_COUNTERS 1
+
+#undef STRING_ASM_OP
+#define STRING_ASM_OP   "\t.stringz\t"
+
+#define TEXT_SECTION_ASM_OP "\t.text"
+#define DATA_SECTION_ASM_OP "\t.data"
+#define BSS_SECTION_ASM_OP "\t.section\t.bss"
+
+/* We want local labels to start with period if made with asm_fprintf.  */
+#undef LOCAL_LABEL_PREFIX
+#define LOCAL_LABEL_PREFIX "."
+
+/* Define these to generate the Linux/ELF/SysV style of internal
+   labels all the time - i.e. to be compatible with
+   ASM_GENERATE_INTERNAL_LABEL in <elfos.h>.  Compare these with the
+   ones in pa.h and note the lack of dollar signs in these.  FIXME:
+   shouldn't we fix pa.h to use ASM_GENERATE_INTERNAL_LABEL instead? */
+
+#undef ASM_OUTPUT_ADDR_VEC_ELT
+#define ASM_OUTPUT_ADDR_VEC_ELT(FILE, VALUE) \
+  if (TARGET_BIG_SWITCH)   \
+fprintf (FILE, "\t.word .L%d\n", VALUE);   \
+  else \
+fprintf (FILE, "\tb .L%d\n\tnop\n", VALUE)
+
+#undef ASM_OUTPUT_ADDR_DIFF_ELT
+#define ASM_OUTPUT_ADDR_DIFF_ELT(FILE, BODY, VALUE, REL) \
+  if (TARGET_BIG_SWITCH)   \
+fprintf (FILE, "\t.word .L%d-.L%d\n", VALUE, REL); \
+  else \
+fprintf (FILE, "\tb .L%d\n\tnop\n", VALUE)
+
+/* Use the default.  */
+#undef ASM_OUTPUT_LABEL
+
+/* NOTE: (*targetm.asm_out.internal_label)() is defined for us by elfos.h, and
+   does what we want (i.e. uses colons).  It must be compatible with
+   ASM_GENERATE_INTERNAL_LABEL(), so do not define it here.  */
+
+/* Use the default.  */
+#undef ASM_OUTPUT_INTERNAL_LABEL
+
+/* Use the default.  */
+#undef TARGET_ASM_GLOBALIZE_LABEL
+
+/* FIXME: Hacked from the  one so that we avoid multiple
+   labels in a function declaration (since pa.c seems determined to do
+   it differently)  */
+
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DEC

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Iyer, Balaji V



>-Original Message-
>From: Joseph Myers [mailto:jos...@codesourcery.com]
>Sent: Thursday, September 06, 2012 12:18 PM
>To: Iyer, Balaji V
>Cc: gcc-patches@gcc.gnu.org; Aldy Hernandez (al...@redhat.com); Jeff Law;
>r...@redhat.com
>Subject: RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
>
>On Thu, 6 Sep 2012, Iyer, Balaji V wrote:
>
>> Ok, I was mistaken there. I thought we had to add a changelog entry
>> for every function and not every file. I will fix it in the updated
>> patch I send soon.
>
>For functions in existing files you do need to mention each function - but not
>for new files.
>
>> >create_processor_attribute contains hardcoded references to
>> >x86-specific functionality.  This is not OK; all such target
>> >dependencies need to be kept within the back ends, and handled from
>> >the rest of the compiler via target hooks (in most cases, new target
>> >dependencies must use target hooks not target macros).
>>
>> The only thing I am doing in that function is to add appropriate
>> attribute. In elemental function, there is a processor clause that
>> will allow users to set the type of processor they want the function
>> compiled for. All I am doing is to map that information to the appropriate
>> "arch" attribute. I didn't think it had any back end peculiarity.
>
>Concepts such as "pentium_4" are architecture-specific and have no place in
>front-end files.  This whole mapping from one sort of string to another belongs
>within the back end.

Please excuse me if I am "beating this horse to death." I am asking this to 
make sure I am understanding this correctly before I start re-implementing 
things. I am not very clear about whether the problem is the function's 
location or the place where it is called? Can you please clarify? Things like 
pentium_4 are part of the language  (please see processor clause in the pg 34 
of the spec) and all I was doing was to parse that and was doing a string 
matching and substituting one string for the next. All the processing and 
picking of instructions are done by the existing backend.

Thanks,

Balaji V. Iyer.


>
>--
>Joseph S. Myers
>jos...@codesourcery.com

[PATCH] Fix PR bootstrap/54419

2012-09-06 Thread Jack Howarth

   The attached patch eliminates the bootstrap failures in libstdc++-v3
of PR 54419 by having configure check for assembler support of the new
rdrnd opcode and defining  _GLIBCXX_X86_RDRAND in config.h if supported.
Tested on x86_64-apple-darwin12 against the assembler from Xcode 4.4.1.
Okay for gcc trunk?
 Jack
ps Regenerated config.h.in in libstdc++-v3 with...

autoheader -I. -I../config

and configure with...

autoconf -I. -I../config

libstdc++-v3/

2012-09-06  Ulrich Drepper  
Dominique d'Humieres 
Jack Howarth 

PR bootstrap/54419
* configure.ac: Test for rdrnd support in assembler.
* src/c++11/random.cc: (__x86_rdrand): Depend on _GLIBCXX_X86_RDRAND.
(random_device::_M_init): Likewise.
(random_device::_M_getval): Likewise.
* configure: Regenerated.
* config.h.in: Regenerated.

Index: libstdc++-v3/src/c++11/random.cc
===
--- libstdc++-v3/src/c++11/random.cc(revision 191031)
+++ libstdc++-v3/src/c++11/random.cc(working copy)
@@ -50,7 +50,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   return __ret;
 }
 
-#if defined __i386__ || defined __x86_64__
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
 unsigned int
 __attribute__ ((target("rdrnd")))
 __x86_rdrand(void)
@@ -75,7 +75,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 
 if (token == "default")
   {
-#if defined __i386__ || defined __x86_64__
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
unsigned int eax, ebx, ecx, edx;
// Check availability of cpuid and, for now at least, also the
// CPU signature for Intel's
@@ -118,7 +118,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   random_device::result_type
   random_device::_M_getval()
   {
-#if (defined __i386__ || defined __x86_64__)
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
 if (! _M_file)
   return __x86_rdrand();
 #endif
Index: libstdc++-v3/configure.ac
===
--- libstdc++-v3/configure.ac   (revision 191031)
+++ libstdc++-v3/configure.ac   (working copy)
@@ -330,6 +330,18 @@ case "$target" in
 esac
 GLIBCXX_CONDITIONAL(GLIBCXX_LDBL_COMPAT, test $ac_ldbl_compat = yes)
 
+ac_cv_x86_rdrand=no
+case "$target" in
+  i?86-*-* | \
+  x86_64-*-*)
+  AC_TRY_COMPILE(, [asm("rdrand %eax");],
+[ac_cv_x86_rdrand=yes], [ac_cv_x86_rdrand=no])
+esac
+if test $ac_cv_x86_rdrand = yes; then
+  AC_DEFINE(_GLIBCXX_X86_RDRAND, 1,
+   [ Defined if as can handle rdrand. ])
+fi
+
 # This depends on GLIBCXX_ENABLE_SYMVERS and GLIBCXX_IS_NATIVE.
 GLIBCXX_CONFIGURE_TESTSUITE

Re: [PATCH, libstdc++] Improve slightly __cxa_guard_acquire

2012-09-06 Thread Benjamin De Kosnik

On Thu, 30 Aug 2012 12:48:34 +0200
Thiago Macieira  wrote:

> Hello
> 
> The attached patch is a simple improvement to make a thread that
> failed to set the waiting bit to exit the function earlier, if it
> detects that another thread has successfully finished initialising.
> It matches the CAS code from a few lines above.
> 
> The change from RELAXED to ACQUIRE is noted in the previous patch
> I've just sent.

I like this, but want

// make a thread that failed to set the waiting bit to exit the function
// earlier, if it detects that another thread has successfully finished
// initialising

added as a comment in the

if (expected == pending_bit)

branch. 

I would like to put this in trunk + comment and give it 2-3 days at
least before 4.7 branch.

-benjamin

Re: [PATCH] Fix PR bootstrap/54419

2012-09-06 Thread Ulrich Drepper

On Thu, Sep 6, 2012 at 2:40 PM, Jack Howarth  wrote:
> Okay for gcc trunk?

One typo:

> * configure.ac: Test for rdrnd support in assembler.

It's rdrand.  I wouldn't be pedantic if the opcode hadn't
changed from rdrnd to rdrand at some point; using the old name
could be confusing.

RE: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)

2012-09-06 Thread Joseph S. Myers

On Thu, 6 Sep 2012, Iyer, Balaji V wrote:

> >Concepts such as "pentium_4" are architecture-specific and have no place in
> >front-end files.  This whole mapping from one sort of string to another
> >belongs within the back end.
> 
> Please excuse me if I am "beating this horse to death." I am asking this 
> to make sure I am understanding this correctly before I start 
> re-implementing things. I am not very clear about whether the problem is 
> the function's location or the place where it is called? Can you please 
> clarify? Things like pentium_4 are part of the language (please see 
> processor clause in the pg 34 of the spec) and all I was doing was to 
> parse that and was doing a string matching and substituting one string 
> for the next. All the processing and picking of instructions are done by 
> the existing backend.

That indicates a significant deficiency in the structure of the 
specification.  It would best be reworked to separate the 
architecture-independent specification from architecture-specific annexes.  
Every reference to something architecture-specific should instead say how 
things are defined by the architecture annex (for example, that the 
architecture annex specifies the tokens accepted for the processor clause
and the default vector length).  There should be a defined way for 
architecture annexes to be added or updated for new architectures.

In addition, descriptions such as "Calls to functions other than other 
elemental functions and the intrinsic short vector math libraries provided 
with the Intel compilers" are clearly unsuitable for a specification of a 
language extension.

Until the specification is cleaned up to follow normal good practice for 
such specifications, the documentation included with GCC will need, along
with a pointer to the specification, to detail how it is amended to be
properly architecture-independent and what are considered to be the 
architecture annexes followed by GCC.  And those parts of the 
specification that clearly would naturally vary from architecture to 
architecture will need to have the architecture-specific parts implemented 
through target hooks, not directly in the architecture-independent 
compiler.  And make it as easy as possible for people to add test coverage 
for new architectures.
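
As an illustration of that split, here is a minimal sketch. Everything in
it is hypothetical: the hook name, the table and the "pentium_4" to
"pentium4" mapping are invented for the example and are not GCC's actual
target-hook interface or the mapping used in the patch.

```cpp
#include <cstring>

// Hypothetical hook table; GCC's real hooks live in targetm and are
// declared differently.
struct sketch_target_hooks
{
  // Maps a processor-clause token to an "arch" attribute string, or
  // returns nullptr if the target does not know the token.
  const char *(*processor_clause_arch) (const char *token);
};

// "Back end" side: the only place that knows x86 names.
static const char *
sketch_i386_processor_clause_arch (const char *token)
{
  if (std::strcmp (token, "pentium_4") == 0)
    return "pentium4";   // illustrative mapping, not taken from the patch
  return nullptr;
}

static const sketch_target_hooks sketch_targetm
  = { sketch_i386_processor_clause_arch };

// "Front end" side: no architecture names, only the hook call.
const char *
sketch_map_processor_clause (const char *token)
{
  return sketch_targetm.processor_clause_arch (token);
}
```

The front-end function stays the same for every architecture; a new port
would only supply its own hook implementation.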

Having looked at bits of the specification now, I may as well point out 
another, unrelated, issue that needs fixing in the specification: 
"Elemental functions cannot be virtual, and can only be called directly, 
not through a function pointer".  As I noted on the WG14 reflector when a 
similar issue appeared in a draft of the IEEE 754-2008 bindings, all C 
function calls are through function pointers - that's how the C standard 
defines them - so you need to say (in the specification, not here) what's 
meant in actual C standard terms, and make sure the implementation follows 
that, and make sure there are testcases verifying that this constraint is 
diagnosed.
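
A small sketch of why "called directly" needs a precise definition. It is
written in C++ for consistency with the other examples in this thread
(the same call forms are valid C), and the function names are made up.
All five calls below reach the same function, and only the spelling
differs:

```cpp
// Hypothetical function used only to demonstrate the call forms.
int
sketch_twice (int x)
{
  return 2 * x;
}

int
sketch_call_all_forms ()
{
  int (*fp) (int) = sketch_twice;  // explicit function pointer

  int a = sketch_twice (3);        // the "direct" spelling
  int b = (&sketch_twice) (3);     // address-of, then call
  int c = (*sketch_twice) (3);     // dereference, then call
  int d = fp (3);                  // call through a pointer variable
  int e = (*fp) (3);               // dereferenced pointer variable

  return a + b + c + d + e;
}
```

A constraint on elemental functions would have to say which of these
spellings it rejects.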

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [middle-end] Add machine_mode to address_cost target hook

2012-09-06 Thread Oleg Endo

On Thu, 2012-09-06 at 14:41 +0200, Georg-Johann Lay wrote:
> Oleg Endo wrote:
> > On Wed, 2012-09-05 at 14:39 -0400, DJ Delorie wrote:
> >> I don't feel the m32c change needs my specific ack, it's a harmless
> >> change that goes with the ack for the feature itself.
> >>
> >> However, I will note that m32c does have different costs for addresses
> >> in different address spaces, at least when -Os.
> > 
> > I have created http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54496
> > for this.
> 
> The same is true for avr.

As it is for other targets, too...

> Can you explain how this works?
> 

I'm not sure, however ... 

> For example I don't see a single call of address_cost in
> lower-subreg.c, which means that at least that module does
> not have a reasonable cost model.  The costs go through this odyssey:
> 
> lower-subreg.c:compute_costs()
> -> rtlanal.c:insn_rtx_cost()
> -> rtl.h:set_src_cost()
> -> rtlanal.c:rtx_cost()
> -> targetm.rtx_costs()

... lower-subreg obviously is simply not dealing with mem loads/stores
when it collects the 'SET' costs.  Also notice that it calls into the
target once during startup and not for each individual insn.

> Each call level add some abstraction, i.e. removes information
> about the insn; actually it's no more an insn, not even a pattern,
> if the target hook is entered...
> 
> So at this point it appears a bit pointless to add mode
> and addr_space to address_cost if the call sites don't use
> that hook if it is needed.

I've noticed that some passes (e.g. auto-inc-dec) would construct a MEM
pattern and use set_src_cost instead of using address_cost.  This can be
handled by simply implementing the MEM case in the rtx_costs hook and
re-using the target's address_cost function.  In fact some targets do
exactly that.
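
A toy sketch of that arrangement, for illustration only. The types and
cost numbers below are hypothetical and far simpler than GCC's rtx,
machine_mode and addr_space_t; the point is just that the MEM case of the
rtx_costs-style function delegates to the address_cost-style function.

```cpp
// Toy model only: these types are hypothetical stand-ins.
enum sketch_code { SK_REG, SK_PLUS, SK_MEM };

struct sketch_rtx
{
  sketch_code code;
  const sketch_rtx *op0;   // address for SK_MEM, first operand for SK_PLUS
  const sketch_rtx *op1;   // second operand for SK_PLUS, else nullptr
  int mode_size;           // stand-in for the machine mode
  int addr_space;          // stand-in for the address space
};

// Stand-in for the target's address_cost hook with the new mode argument.
int
sketch_address_cost (const sketch_rtx *addr, int mode_size, int addr_space)
{
  int cost = (addr->code == SK_PLUS) ? 2 : 1;   // reg+offset dearer than reg
  if (addr_space != 0)
    cost += 4;                                  // non-generic space penalty
  if (mode_size > 4)
    cost += 1;                                  // wide access penalty
  return cost;
}

// Stand-in for the rtx_costs hook: the MEM case reuses address_cost, so
// passes that price a MEM see the same numbers.
bool
sketch_rtx_costs (const sketch_rtx *x, int *total)
{
  switch (x->code)
    {
    case SK_MEM:
      *total = sketch_address_cost (x->op0, x->mode_size, x->addr_space);
      return true;
    default:
      return false;   // let the generic code handle everything else
    }
}
```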

> The change is definitely in the right direction, but I wonder
> how it helps to fix code bloats of 300%-400% as in PR52543?

I'm not familiar with the AVR parts.
BTW, there was a small change in lower-subreg which required some
adaptations in targets:
http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00425.html

See also gcc/config/sh/sh.c (sh_rtx_costs): case SET: ...
Not sure whether it helps in your case.

> The avr backend currently hacks around that by expanding MEM
> for non-generic address space to UNSPEC.  Not nice.
> 
> Describing the cost will simply have no effect (provided that
> MEM -> UNSPEC hack would be reverted).

Because the address cost is not considered when doing the splits.
Maybe lower-subreg could be extended to pay attention to MEMs when
splitting (which would complicate its costs calculations -- it probably
would have to call into the target for every insn it looks at).

Maybe Richard or Ian have better ideas regarding this issue.

Cheers,
Oleg

Re: Bump minimum gmp version to 4.2.3

2012-09-06 Thread NightStrike

On Thu, Sep 6, 2012 at 2:13 PM, Diego Novillo  wrote:
> Doug was trying to build with gmp 4.2.1 and ran into this error:
>
> gmp-4.2.1/include/gmp.h:515:12: error: 'std::FILE' has not been declared
>
> This bug was fixed in gmp 4.2.3, so I've bumped the minimum
> acceptable version in configure.ac
>
> Tested on x86_64 with gmp 4.2.3.  Committed to trunk.

Don't you also have to modify the contrib/download_prerequisites
script, as well as the contents of
ftp://gcc.gnu.org/pub/gcc/infrastructure ?

Re: Bump minimum gmp version to 4.2.3

2012-09-06 Thread NightStrike

On Thu, Sep 6, 2012 at 3:13 PM, NightStrike  wrote:
> On Thu, Sep 6, 2012 at 2:13 PM, Diego Novillo  wrote:
>> Doug was trying to build with gmp 4.2.1 and ran into this error:
>>
>> gmp-4.2.1/include/gmp.h:515:12: error: 'std::FILE' has not been declared
>>
>> This bug was fixed in gmp 4.2.3, so I've bumped the minimum
>> acceptable version in configure.ac
>>
>> Tested on x86_64 with gmp 4.2.3.  Committed to trunk.
>
> Don't you also have to modify the contrib/download_prerequisites
> script, as well as the contents of
> ftp://gcc.gnu.org/pub/gcc/infrastructure ?

Actually, on closer inspection, other places list the minimum version
as 4.3.2, including:
http://gcc.gnu.org/install/prerequisites.html

as well as the two locations that I mention above.  Perhaps configure
should do likewise.

[PATCH][revised] Fix PR bootstrap/54419

2012-09-06 Thread Jack Howarth

   The attached patch eliminates the bootstrap failures in libstdc++-v3
of PR 54419 by having configure check for assembler support of the new
rdrand opcode and defining  _GLIBCXX_X86_RDRAND in config.h if supported.
Tested on x86_64-apple-darwin12 against the assembler from Xcode 4.4.1.
Okay for gcc trunk?
 Jack
ps Regenerated config.h.in in libstdc++-v3 with...

autoheader -I. -I../config

and configure with...

autoconf -I. -I../config

libstdc++-v3/

2012-09-06  Ulrich Drepper  
Dominique d'Humieres 
Jack Howarth 

PR bootstrap/54419
* configure.ac: Test for rdrand support in assembler.
* src/c++11/random.cc: (__x86_rdrand): Depend on _GLIBCXX_X86_RDRAND.
(random_device::_M_init): Likewise.
(random_device::_M_getval): Likewise.
* configure: Regenerated.
* config.h.in: Regenerated.

Index: libstdc++-v3/src/c++11/random.cc
===
--- libstdc++-v3/src/c++11/random.cc(revision 191031)
+++ libstdc++-v3/src/c++11/random.cc(working copy)
@@ -50,7 +50,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   return __ret;
 }
 
-#if defined __i386__ || defined __x86_64__
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
 unsigned int
 __attribute__ ((target("rdrnd")))
 __x86_rdrand(void)
@@ -75,7 +75,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 
 if (token == "default")
   {
-#if defined __i386__ || defined __x86_64__
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
unsigned int eax, ebx, ecx, edx;
// Check availability of cpuid and, for now at least, also the
// CPU signature for Intel's
@@ -118,7 +118,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   random_device::result_type
   random_device::_M_getval()
   {
-#if (defined __i386__ || defined __x86_64__)
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
 if (! _M_file)
   return __x86_rdrand();
 #endif
Index: libstdc++-v3/configure.ac
===
--- libstdc++-v3/configure.ac   (revision 191031)
+++ libstdc++-v3/configure.ac   (working copy)
@@ -330,6 +330,18 @@ case "$target" in
 esac
 GLIBCXX_CONDITIONAL(GLIBCXX_LDBL_COMPAT, test $ac_ldbl_compat = yes)
 
+ac_cv_x86_rdrand=no
+case "$target" in
+  i?86-*-* | \
+  x86_64-*-*)
+  AC_TRY_COMPILE(, [asm("rdrand %eax");],
+[ac_cv_x86_rdrand=yes], [ac_cv_x86_rdrand=no])
+esac
+if test $ac_cv_x86_rdrand = yes; then
+  AC_DEFINE(_GLIBCXX_X86_RDRAND, 1,
+   [ Defined if as can handle rdrand. ])
+fi
+
 # This depends on GLIBCXX_ENABLE_SYMVERS and GLIBCXX_IS_NATIVE.
 GLIBCXX_CONFIGURE_TESTSUITE

Re: [PATCH][revised] Fix PR bootstrap/54419

2012-09-06 Thread Jakub Jelinek

On Thu, Sep 06, 2012 at 03:22:10PM -0400, Jack Howarth wrote:
>The attached patch eliminates the bootstrap failures in libstdc++-v3
> of PR 54419 by having configure check for assembler support of the new
> rdrand opcode and defining  _GLIBCXX_X86_RDRAND in config.h if supported.
> Tested on x86_64-apple-darwin12 against the assembler from Xcode 4.4.1.
> Okay for gcc trunk?

Paolo requested in the PR that instead we define a new
AC_DEFUN([GLIBCXX_CHECK_X86_RDRAND], [
...
])
(or similar) in libstdc++-v3/acinclude.m4 and add just
GLIBCXX_CHECK_X86_RDRAND
(or similar) line to libstdc++-v3/configure.ac.

Jakub

Re: Bump minimum gmp version to 4.2.3

2012-09-06 Thread Diego Novillo


On 2012-09-06 15:13, NightStrike wrote:
> On Thu, Sep 6, 2012 at 2:13 PM, Diego Novillo  wrote:
>> Doug was trying to build with gmp 4.2.1 and ran into this error:
>>
>> gmp-4.2.1/include/gmp.h:515:12: error: 'std::FILE' has not been declared
>>
>> This bug was fixed in gmp 4.2.3, so I've bumped the minimum
>> acceptable version in configure.ac
>>
>> Tested on x86_64 with gmp 4.2.3.  Committed to trunk.
>
> Don't you also have to modify the contrib/download_prerequisites
> script, as well as the contents of
> ftp://gcc.gnu.org/pub/gcc/infrastructure ?

No. The version I changed is the minimum *tolerable* version (the one
configure declares as buggy but usable).

4.3.2 is still the minimum we require as acceptable.


Diego.
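
To make the two thresholds concrete, here is a minimal sketch of the
classification described above. The version numbers 4.2.3 and 4.3.2 come
from this thread; the function and type names are hypothetical, and this
is not how configure itself performs the check.

```cpp
#include <tuple>

// Thresholds taken from the thread; everything else is illustrative.
using sketch_version = std::tuple<int, int, int>;

enum class sketch_gmp_status { too_old, buggy_but_usable, ok };

sketch_gmp_status
sketch_classify_gmp (int major, int minor, int patch)
{
  const sketch_version v (major, minor, patch);
  const sketch_version tolerable (4, 2, 3);    // minimum tolerable
  const sketch_version acceptable (4, 3, 2);   // minimum acceptable

  // std::tuple compares lexicographically: major, then minor, then patch.
  if (v < tolerable)
    return sketch_gmp_status::too_old;
  if (v < acceptable)
    return sketch_gmp_status::buggy_but_usable;
  return sketch_gmp_status::ok;
}
```

Under this reading gmp 4.2.1 is rejected, 4.2.3 is usable with a warning,
and 4.3.2 or later is accepted without complaint.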

Re: Bump minimum gmp version to 4.2.3

2012-09-06 Thread Diego Novillo


On 2012-09-06 15:16, NightStrike wrote:
> Actually, on closer inspection, other places list the minimum version
> as 4.3.2, including:
> http://gcc.gnu.org/install/prerequisites.html
>
> as well as the two locations that I mention above.  Perhaps configure
> should do likewise.

It already does (configure.ac:1367).


Diego.

[PATCH][revisedx2] Fix PR bootstrap/54419

2012-09-06 Thread Jack Howarth

   The attached patch eliminates the bootstrap failures in libstdc++-v3
of PR 54419 by adding a define to acinclude.m4, GLIBCXX_CHECK_X86_RDRAND, that
checks for assembler support of the new rdrand opcode, using this new define
in configure.ac and also defining _GLIBCXX_X86_RDRAND in config.h if supported.
Tested on x86_64-apple-darwin12 against the assembler from Xcode 4.4.1.
Okay for gcc trunk?
 Jack

libstdc++-v3/


2012-09-06  Ulrich Drepper  
Dominique d'Humieres 
Jack Howarth 

PR bootstrap/54419
* acinclude.m4: Define GLIBCXX_CHECK_X86_RDRAND.
* configure.ac: Use GLIBCXX_CHECK_X86_RDRAND to test for rdrand 
support in assembler.
* src/c++11/random.cc: (__x86_rdrand): Depend on _GLIBCXX_X86_RDRAND.
(random_device::_M_init): Likewise.
(random_device::_M_getval): Likewise.
* configure: Regenerated.
* config.h.in: Regenerated.


Index: libstdc++-v3/src/c++11/random.cc
===
--- libstdc++-v3/src/c++11/random.cc(revision 191041)
+++ libstdc++-v3/src/c++11/random.cc(working copy)
@@ -50,7 +50,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   return __ret;
 }
 
-#if defined __i386__ || defined __x86_64__
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
 unsigned int
 __attribute__ ((target("rdrnd")))
 __x86_rdrand(void)
@@ -75,7 +75,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
 
 if (token == "default")
   {
-#if defined __i386__ || defined __x86_64__
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
unsigned int eax, ebx, ecx, edx;
// Check availability of cpuid and, for now at least, also the
// CPU signature for Intel's
@@ -118,7 +118,7 @@ namespace std _GLIBCXX_VISIBILITY(defaul
   random_device::result_type
   random_device::_M_getval()
   {
-#if (defined __i386__ || defined __x86_64__)
+#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
 if (! _M_file)
   return __x86_rdrand();
 #endif
Index: libstdc++-v3/configure.ac
===
--- libstdc++-v3/configure.ac   (revision 191041)
+++ libstdc++-v3/configure.ac   (working copy)
@@ -330,6 +330,9 @@ case "$target" in
 esac
 GLIBCXX_CONDITIONAL(GLIBCXX_LDBL_COMPAT, test $ac_ldbl_compat = yes)
 
+# Check if assembler supports rdrand opcode.
+GLIBCXX_CHECK_X86_RDRAND
+
 # This depends on GLIBCXX_ENABLE_SYMVERS and GLIBCXX_IS_NATIVE.
 GLIBCXX_CONFIGURE_TESTSUITE
 
Index: libstdc++-v3/acinclude.m4
===
--- libstdc++-v3/acinclude.m4   (revision 191041)
+++ libstdc++-v3/acinclude.m4   (working copy)
@@ -3360,6 +3360,22 @@ AC_DEFUN([AC_LC_MESSAGES], [
 ])
 
 dnl
+dnl Check whether rdrand is supported in the assembler.
+AC_DEFUN([GLIBCXX_CHECK_X86_RDRAND], [
+  ac_cv_x86_rdrand=no
+  case "$target" in
+i?86-*-* | \
+x86_64-*-*)
+AC_TRY_COMPILE(, [asm("rdrand %eax");],
+   [ac_cv_x86_rdrand=yes], [ac_cv_x86_rdrand=no])
+  esac
+  if test $ac_cv_x86_rdrand = yes; then
+AC_DEFINE(_GLIBCXX_X86_RDRAND, 1,
+   [ Defined if as can handle rdrand. ])
+  fi
+])
+
+dnl
 dnl Check whether get_nprocs is available in , and define _GLIBCXX_USE_GET_NPROCS.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_GET_NPROCS], [

Re: [PATCH, libstdc++] Improve slightly __cxa_guard_acquire

2012-09-06 Thread Benjamin De Kosnik


Here's the patch as applied to trunk in rev. 191042. I'll apply it to
4.7 this weekend as long as nobody yelps.

-benjamin

2012-09-06  Thiago Macieira  

	PR libstdc++/54172
* libsupc++/guard.cc (__cxa_guard_acquire): Exit the loop earlier if
we detect that another thread has had success. Don't compare_exchange
from a finished state back to a waiting state. Comment.

diff --git a/libstdc++-v3/libsupc++/guard.cc b/libstdc++-v3/libsupc++/guard.cc
index adc9608..60165cd 100644
--- a/libstdc++-v3/libsupc++/guard.cc
+++ b/libstdc++-v3/libsupc++/guard.cc
@@ -244,13 +244,13 @@ namespace __cxxabiv1
 if (__gthread_active_p ())
   {
 	int *gi = (int *) (void *) g;
-	int expected(0);
 	const int guard_bit = _GLIBCXX_GUARD_BIT;
 	const int pending_bit = _GLIBCXX_GUARD_PENDING_BIT;
 	const int waiting_bit = _GLIBCXX_GUARD_WAITING_BIT;
 
 	while (1)
 	  {
+	int expected(0);
 	if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
 	__ATOMIC_ACQ_REL,
 	__ATOMIC_RELAXED))
@@ -264,13 +264,26 @@ namespace __cxxabiv1
 		// Already initialized.
 		return 0;	
 	  }
+
 	 if (expected == pending_bit)
 	   {
+		 // Use acquire here.
 		 int newv = expected | waiting_bit;
 		 if (!__atomic_compare_exchange_n(gi, &expected, newv, false,
 		  __ATOMIC_ACQ_REL, 
-		  __ATOMIC_RELAXED))
-		   continue;
+		  __ATOMIC_ACQUIRE))
+		   {
+		 if (expected == guard_bit)
+		   {
+			 // Make a thread that failed to set the
+			 // waiting bit exit the function earlier,
+			 // if it detects that another thread has
+			 // successfully finished initialising.
+			 return 0;
+		   }
+		 if (expected == 0)
+		   continue;
+		   }
 		 
 		 expected = newv;
 	   }

Re: [PATCH, libstdc++] Improve slightly __cxa_guard_acquire

2012-09-06 Thread Thiago Macieira

On Thursday, 6 September 2012 13.33.11, Benjamin De Kosnik wrote:
> Here's the patch as applied to trunk in rev. 191042. I'll apply it to
> 4.7 this weekend as long as nobody yelps.

Thanks. The change to ACQUIRE is also a bugfix.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
 Intel Sweden AB - Registration Number: 556189-6027
 Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden


Re: [PATCH, libstdc++] Improve slightly __cxa_guard_acquire

2012-09-06 Thread Jakub Jelinek

On Thu, Sep 06, 2012 at 01:33:11PM -0700, Benjamin De Kosnik wrote:
> Here's the patch as applied to trunk in rev. 191042. I'll apply it to
> 4.7 this weekend as long as nobody yelps.

> 2012-09-06  Thiago Macieira  
> 
>   PR libstdc++/54172
> * libsupc++/guard.cc (__cxa_guard_acquire): Exit the loop earlier if
> we detect that another thread has had success. Don't compare_exchange
> from a finished state back to a waiting state. Comment.
> 
> diff --git a/libstdc++-v3/libsupc++/guard.cc b/libstdc++-v3/libsupc++/guard.cc
> index adc9608..60165cd 100644
> --- a/libstdc++-v3/libsupc++/guard.cc
> +++ b/libstdc++-v3/libsupc++/guard.cc
> @@ -244,13 +244,13 @@ namespace __cxxabiv1
>  if (__gthread_active_p ())
>{
>   int *gi = (int *) (void *) g;
> - int expected(0);
>   const int guard_bit = _GLIBCXX_GUARD_BIT;
>   const int pending_bit = _GLIBCXX_GUARD_PENDING_BIT;
>   const int waiting_bit = _GLIBCXX_GUARD_WAITING_BIT;
>  
>   while (1)
> {
> + int expected(0);
>   if (__atomic_compare_exchange_n(gi, &expected, pending_bit, false,
>   __ATOMIC_ACQ_REL,
>   __ATOMIC_RELAXED))

Shouldn't this __ATOMIC_RELAXED be also __ATOMIC_ACQUIRE?  If expected ends
up being guard_bit, then the code will return 0; right away.

> @@ -264,13 +264,26 @@ namespace __cxxabiv1
>   // Already initialized.
>   return 0;   
> }
> +
>if (expected == pending_bit)
>  {
> +  // Use acquire here.
>int newv = expected | waiting_bit;
>if (!__atomic_compare_exchange_n(gi, &expected, newv, false,
> __ATOMIC_ACQ_REL, 
> -   __ATOMIC_RELAXED))
> -continue;
> +   __ATOMIC_ACQUIRE))
> +{
> +  if (expected == guard_bit)
> +{
> +  // Make a thread that failed to set the
> +  // waiting bit exit the function earlier,
> +  // if it detects that another thread has
> +  // successfully finished initialising.
> +  return 0;
> +}
> +  if (expected == 0)
> +continue;
> +}
>
>expected = newv;
>  }


Jakub
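
For reference, a minimal sketch of the state machine under discussion,
using std::atomic rather than the __atomic builtins. It is not the
libsupc++ code: the bit values and names are hypothetical, the futex wait
is replaced by a must_wait result, and the first compare-exchange uses
acquire on failure as suggested above rather than the relaxed ordering in
the committed patch.

```cpp
#include <atomic>

// Hypothetical bit values for this sketch only; libsupc++ derives the
// real ones from the _GLIBCXX_GUARD_* macros.
constexpr int sketch_guard_bit = 1;
constexpr int sketch_pending_bit = 2;
constexpr int sketch_waiting_bit = 4;

enum class sketch_guard_result { initialized, must_initialize, must_wait };

sketch_guard_result
sketch_guard_try_acquire (std::atomic<int> &g)
{
  while (true)
    {
      int expected = 0;   // reset on every iteration, as in the patch
      if (g.compare_exchange_strong (expected, sketch_pending_bit,
                                     std::memory_order_acq_rel,
                                     std::memory_order_acquire))
        return sketch_guard_result::must_initialize;   // we own the init

      if (expected == sketch_guard_bit)
        return sketch_guard_result::initialized;       // already done

      if (expected == sketch_pending_bit)
        {
          int newv = expected | sketch_waiting_bit;
          if (!g.compare_exchange_strong (expected, newv,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire))
            {
              if (expected == sketch_guard_bit)
                return sketch_guard_result::initialized;  // finished meanwhile
              if (expected == 0)
                continue;                                 // aborted, retry
            }
        }

      // The real code would block here until the initialising thread
      // wakes the waiters; the sketch just reports that.
      return sketch_guard_result::must_wait;
    }
}
```

Acquire on the failure paths matters because a thread that returns
"initialized" goes on to read the object the other thread constructed.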

[PATCH, i386]: Remove .code64 from HLE configure test

2012-09-06 Thread Uros Bizjak

Hello!

We can use 32bit address without .code64 directive; this test is valid
for 32bit and 64bit targets.

2012-09-06  Uros Bizjak  

* configure.ac (hle prefixes): Remove .code64.
* configure: Regenerated.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.
Index: configure.ac
===
--- configure.ac(revision 191042)
+++ configure.ac(working copy)
@@ -3581,9 +3581,7 @@
 
 gcc_GAS_CHECK_FEATURE([hle prefixes],
   gcc_cv_as_ix86_hle,,,
-  [.code64
-   lock xacquire cmpxchg %esi, (%rcx)
-   ],,
+  [lock xacquire cmpxchg %esi, (%ecx)],,
   [AC_DEFINE(HAVE_AS_IX86_HLE, 1,
 [Define if your assembler supports HLE prefixes.])])
 
Index: configure
===
--- configure   (revision 191042)
+++ configure   (working copy)
@@ -24417,9 +24417,7 @@
 else
   gcc_cv_as_ix86_hle=no
   if test x$gcc_cv_as != x; then
-$as_echo '.code64
-   lock xacquire cmpxchg %esi, (%rcx)
-   ' > conftest.s
+$as_echo 'lock xacquire cmpxchg %esi, (%ecx)' > conftest.s
 if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5

[google] Added new dump flag -pmu to display pmu data in pass summaries (issue6489092)

2012-09-06 Thread Chris Manghane

This patch adds a new dump flag that dumps PMU profile information using
the -pmu dump option.

This patch should be applied to google/main.

Tested with crosstools.

2012-09-06  Chris Manghane  

* gcc/doc/invoke.texi: Modified pmu-profile-use option.
* gcc/tree-dump.c: Added new dump flag.
* gcc/tree-pretty-print.c
(dump_load_latency_details): New function.
(dump_pmu): New function.
(dump_generic_node): Added support for new dump flag.
* gcc/tree-pretty-print.h: Added new function to global header.
* gcc/tree-pass.h (enum tree_dump_index): Added new dump flag.
* gcc/gcov.c:
(process_pmu_profile): Fixed assertion conditions.
* gcc/gcov-io.h (struct gcov_pmu_summary): Added new struct.
* gcc/opts.c (common_handle_option): Added support for modified option.
* gcc/gimple-pretty-print.c
(dump_gimple_phi): Added support for new dump flag.
(dump_gimple_stmt): Ditto.
* gcc/coverage.c
(htab_counts_entry_hash): Added new hash table for PMU info.
(htab_pmu_entry_hash): Ditto.
(htab_counts_entry_eq): Ditto.
(htab_pmu_entry_eq): Ditto.
(htab_counts_entry_del): Ditto.
(htab_pmu_entry_del): Ditto.
(read_counts_file): Ditto.
(static void read_pmu_file): Ditto.
(get_coverage_pmu_latency): Ditto.
(get_coverage_pmu_branch_mispredict): Ditto.
(pmu_data_present): Added new function.
(coverage_init): Added pmu file reading support.
* gcc/coverage.h: Added pmu functions to global header.
* gcc/common.opt: Modified pmu-profile-use option.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 190817)
+++ gcc/doc/invoke.texi (working copy)
@@ -399,7 +399,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fpmu-profile-generate=@var{pmuoption} @gol
--fpmu-profile-use=@var{pmuoption} @gol
+-fpmu-profile-use=@var{pmudata} @gol
 -freciprocal-math -free -fregmove -frename-registers -freorder-blocks @gol
 -frecord-gcc-switches-in-elf@gol
 -freorder-blocks-and-partition -freorder-functions @gol
@@ -8381,12 +8381,11 @@ displayed using coverage tool gcov. The params var
 "pmu_profile_n_addresses" can be used to restrict PMU data collection
 to only this many addresses.
 
-@item -fpmu-profile-use=@var{pmuoption}
+@item -fpmu-profile-use=@var{pmudata}
 @opindex fpmu-profile-use
 
-Enable performance monitoring unit (PMU) profiling based
-optimizations.  Currently only @var{load-latency} and
-@var{branch-mispredict} are supported.
+If @var{pmudata} is specified, GCC will read PMU data from @var{pmudata}. If
+unspecified, PMU data will be read from 'pmuprofile.gcda'.
 
 @item -fprofile-strip=@var{base_suffix}
 @opindex fprofile-strip
Index: gcc/tree-dump.c
===
--- gcc/tree-dump.c (revision 190817)
+++ gcc/tree-dump.c (working copy)
@@ -824,9 +824,11 @@ static const struct dump_option_value_info dump_op
   {"nouid", TDF_NOUID},
   {"enumerate_locals", TDF_ENUMERATE_LOCALS},
   {"scev", TDF_SCEV},
+  {"pmu", TDF_PMU},
   {"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | TDF_IPA
| TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
-   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV)},
+   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV
+| TDF_PMU)},
   {NULL, 0}
 };
 
Index: gcc/tree-pretty-print.c
===
--- gcc/tree-pretty-print.c (revision 190817)
+++ gcc/tree-pretty-print.c (working copy)
@@ -25,6 +25,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "tree.h"
 #include "output.h"
+#include "basic-block.h"
+#include "gcov-io.h"
+#include "coverage.h"
 #include "tree-pretty-print.h"
 #include "hashtab.h"
 #include "tree-flow.h"
@@ -51,6 +54,7 @@ static void do_niy (pretty_printer *, const_tree);
 
 static pretty_printer buffer;
 static int initialized = 0;
+static char *file_prefix = NULL;
 
 /* Try to print something for an unknown tree code.  */
 
@@ -461,7 +465,32 @@ dump_omp_clauses (pretty_printer *buffer, tree cla
 }
 }
 
+/* Dump detailed information about pmu load latency events */
 
+void
+dump_load_latency_details (pretty_printer *buffer, gcov_pmu_ll_info_t *ll_info)
+{
+  if (ll_info == NULL)
+return;
+
+  pp_string (buffer, "\n[load latency contribution: ");
+  pp_scalar (buffer, "%.2f%%\n", ll_info->self / 100.f);
+  pp_string (buffer, "average cycle distribution:\n");
+  pp_scalar (buffer, "%.2f%% <= 10 cycles\n",
+ ll_info->lt_10 / 100.f);
+  pp_scalar (buffer, "%.2f%% <= 32 cycles\n",
+ ll_info->lt_32 / 100.f);

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread Andrew Pinski

On Thu, Sep 6, 2012 at 2:55 PM, James Lemke  wrote:
> Attached are the patches for this gcc port.
>
> On a recent checkout (r191027) I have run the DejaGNU suite with no new
> failures for binutils, gas, ld, gcc, g++, gfortran.  A bootstrap is in
> progress.

Could you explain why you are changing system.h ?
Also, it seems like TARGET_VLE_ISEL should not be needed; TARGET_ISEL is
always set for VLE targets.

Thanks,
Andrew

>
> Comments?
> OK to commit?
>
> Thanks, Jim.
>
> --
> Jim Lemke
> Mentor Graphics / CodeSourcery
> Orillia Ontario

Re: [C++ Patch] PR 18747

2012-09-06 Thread Paolo Carlini


Hi,

On 09/06/2012 07:47 PM, Jason Merrill wrote:

On 09/06/2012 12:18 PM, Paolo Carlini wrote:

On 09/06/2012 02:03 AM, Jason Merrill wrote:

but note that for:

template 
struct A
{
int select() { return 0; }
};

we have parser->num_template_parameter_lists == 1 and num_templates ==
0. Thus it seems that the case 'num_templates + 1' isn't (just) about
member templates...


That's odd, num_templates should be 1.

Yes, it seems odd, the same happens for template functions, like simply:

template 
void foo();


That seems right; there is one more set of template parameters than 
required for the scope, so foo itself is a primary template. If above 
you were talking about when we're looking at "struct A" it is right, too.

I was talking about that, yes. Ok then, per se the current check is 
loose but correct...

Really, whatever we do at least the comment should be adjusted not to
mention only member templates.


Yes, it's any case of declaring a primary template.

... modulo the comment.



Something like the attached passes testing, the clean-up is nice, but
I'm not sure about the specific details of the existing code vs
num_template_headers_for_class, whether we can just call the latter as I
did and be done.


I think I prefer the code from here and would change 
num_template_headers_for_class to match.

Ok, I did that in the below, also passes testing.

I see. Yesterday at some point I wondered whether we could do that
already when cp_parser_check_template_parameters is called, that is
remove the + 1 case and make the callers more precise. But I understand
now that it's too early. Then, are you under the impression that we
should still have cp_parser_check_template_parameters as-is and add a
check later, or have only the late one?


I think it ought to work to only have a late check.

I will look into that.

Thanks!
Paolo.
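
For readers following the pt.c hunk below, here is a minimal sketch of the reworked counting loop, written as plain C against a mock structure. The struct sketch_class type and its field names are assumptions made only for illustration; GCC itself works on tree nodes through the macros named in the comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a class type; GCC's real representation is a
   tree node queried through the macros named in the comments below.  */
struct sketch_class
{
  bool has_template_info;  /* models CLASSTYPE_TEMPLATE_INFO (ctype) != NULL */
  bool explicit_spec;      /* models explicit_class_specialization_p (ctype) */
  bool primary_template;   /* models PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (ctype)) */
  const struct sketch_class *context;  /* models TYPE_CONTEXT (ctype) when it
                                          is a class; NULL otherwise */
};

/* Count the `template <...>' headers expected for CTYPE and its
   enclosing classes, walking outward the way the reworked loop does.  */
static int
sketch_num_template_headers (const struct sketch_class *ctype)
{
  int num_templates = 0;

  while (ctype)
    {
      /* No template information: not a template, nor nested in one.  */
      if (!ctype->has_template_info)
        break;
      /* An explicit specialization ends the walk.  */
      if (ctype->explicit_spec)
        break;
      if (ctype->primary_template)
        ++num_templates;
      ctype = ctype->context;
    }

  return num_templates;
}
```

Under this model a member class template nested in a class template counts two headers, a non-template member class of a class template counts one, and the walk stops at the first explicit specialization.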

//
2012-09-07  Paolo Carlini  

* pt.c (num_template_headers_for_class): Rework per the code
inline in cp_parser_check_declarator_template_parameters.
* parser.c (cp_parser_check_declarator_template_parameters):
Use num_template_headers_for_class.
Index: pt.c
===
--- pt.c(revision 191032)
+++ pt.c(working copy)
@@ -2214,9 +2214,9 @@ copy_default_args_to_explicit_spec (tree decl)
 int
 num_template_headers_for_class (tree ctype)
 {
-  int template_count = 0;
-  tree t = ctype;
-  while (t != NULL_TREE && CLASS_TYPE_P (t))
+  int num_templates = 0;
+
+  while (ctype && CLASS_TYPE_P (ctype))
 {
   /* You're supposed to have one `template <...>' for every
 template class, but you don't need one for a full
@@ -2228,21 +2228,20 @@ num_template_headers_for_class (tree ctype)
 
 is correct; there shouldn't be a `template <>' for the
 definition of `S::f'.  */
-  if (CLASSTYPE_TEMPLATE_SPECIALIZATION (t)
- && !any_dependent_template_arguments_p (CLASSTYPE_TI_ARGS (t)))
-   /* T is an explicit (not partial) specialization.  All
-  containing classes must therefore also be explicitly
-  specialized.  */
+  if (!CLASSTYPE_TEMPLATE_INFO (ctype))
+   /* If CTYPE does not have template information of any
+  kind,  then it is not a template, nor is it nested
+  within a template.  */
break;
-  if ((CLASSTYPE_USE_TEMPLATE (t) || CLASSTYPE_IS_TEMPLATE (t))
- && PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (t)))
-   template_count += 1;
+  if (explicit_class_specialization_p (ctype))
+   break;
+  if (PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (ctype)))
+   ++num_templates;
 
-  t = TYPE_MAIN_DECL (t);
-  t = DECL_CONTEXT (t);
+  ctype = TYPE_CONTEXT (ctype);
 }
 
-  return template_count;
+  return num_templates;
 }
 
 /* Do a simple sanity check on the template headers that precede the
Index: parser.c
===
--- parser.c(revision 191032)
+++ parser.c(working copy)
@@ -20670,55 +20670,25 @@ cp_parser_check_declarator_template_parameters (cp
cp_declarator *declarator,
location_t declarator_location)
 {
-  unsigned num_templates;
-
-  /* We haven't seen any classes that involve template parameters yet.  */
-  num_templates = 0;
-
   switch (declarator->kind)
 {
 case cdk_id:
-  if (declarator->u.id.qualifying_scope)
-   {
- tree scope;
+  {
+   unsigned num_templates = 0;
+   tree scope = declarator->u.id.qualifying_scope;
 
- scope = declarator->u.id.qualifying_scope;
+   if (scope)
+ num_templates = num_template_headers_for_class (scope);
+   else if (TREE_CODE (declarator->u.id.unqualified_name)
+== TEMPLATE_ID_EXPR)
+ /* If the DECLARATOR has the form `X' then it uses one
+

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread Maciej W. Rozycki

On Thu, 6 Sep 2012, Andrew Pinski wrote:

> Could you explain why you are changing system.h ?
> Also seems like TARGET_VLE_ISEL should not be needed TARGET_ISEL is
> always set for VLE targets.

 You mean this:

+  POWERPC_E200_MASK = MASK_VLE | MASK_ISEL | MASK_MULTIPLE

?  Well, this just marks that the e200 processor supports ISEL regardless 
of the mode selected (standard vs VLE).  Then with -mvle ISEL is supposed 
to be enabled regardless of the processor setting in effect (ISEL is a 
part of the base VLE instruction set, while it is optional in the standard 
mode).

  Maciej

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread Andrew Pinski

On Thu, Sep 6, 2012 at 3:39 PM, Maciej W. Rozycki
 wrote:
> On Thu, 6 Sep 2012, Andrew Pinski wrote:
>
>> Could you explain why you are changing system.h ?
>> Also seems like TARGET_VLE_ISEL should not be needed TARGET_ISEL is
>> always set for VLE targets.
>
>  You mean this:
>
> +  POWERPC_E200_MASK = MASK_VLE | MASK_ISEL | MASK_MULTIPLE
>
> ?  Well, this just marks that the e200 processor supports ISEL regardless
> of the mode selected (standard vs VLE).  Then with -mvle ISEL is supposed
> to be enabled regardless of the processor setting in effect (ISEL is a
> part of the base VLE instruction set, while it is optional in the standard
> mode).

What I mean is set TARGET_ISEL to true when -mvle is supplied.

Thanks,
Andrew

Re: [PATCH][revisedx2] Fix PR bootstrap/54419

2012-09-06 Thread Paolo Carlini


Hi,

On 09/06/2012 10:32 PM, howa...@frodo.msbb.uc.edu wrote:

The attached patch eliminates the bootstrap failures in libstdc++-v3
of PR 54419 by adding a define to acinclude.m4, GLIBCXX_CHECK_X86_RDRAND, that
checks for assembler support of the new rdrand opcode, using this new define
in configure.ac and also defining _GLIBCXX_X86_RDRAND in config.h if supported.
Tested on x86_64-apple-darwin12 against the assembler from Xcode 4.4.1.
Okay for gcc trunk?

Patch is basically Ok. Any particular reason for not using 
AC_MSG_CHECKING, AC_CACHE_VAL, and AC_MSG_RESULT, like we normally do in 
acinclude.m4? (I think testing in C is fine)


Paolo.

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread Maciej W. Rozycki

On Thu, 6 Sep 2012, Andrew Pinski wrote:

> >  You mean this:
> >
> > +  POWERPC_E200_MASK = MASK_VLE | MASK_ISEL | MASK_MULTIPLE
> >
> > ?  Well, this just marks that the e200 processor supports ISEL regardless
> > of the mode selected (standard vs VLE).  Then with -mvle ISEL is supposed
> > to be enabled regardless of the processor setting in effect (ISEL is a
> > part of the base VLE instruction set, while it is optional in the standard
> > mode).
> 
> What I mean is set TARGET_ISEL to true when -mvle is supplied.

1. Will it work (switch back to -mno-isel) if -mno-vle is requested 
   further on the command line?

2. Separating the settings will help when/if per-function VLE/non-VLE 
   switching support is implemented, e.g. along the lines of 
   attribute((mips16)) and attribute((nomips16)).
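
The two points above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the SKETCH_MASK_* values and the helper names are hypothetical and are not the rs6000 back end's actual option handling. It contrasts forcing the ISEL bit when -mvle is seen with keeping the bits separate and deriving ISEL availability.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch.  */
enum
{
  SKETCH_MASK_ISEL = 1 << 0,
  SKETCH_MASK_VLE = 1 << 1
};

/* Approach A (assumed): -mvle simply forces the ISEL bit on.  */
static unsigned
sketch_apply_mvle (unsigned flags)
{
  return flags | SKETCH_MASK_VLE | SKETCH_MASK_ISEL;
}

/* A later -mno-vle clears only the VLE bit; whether ISEL was set
   before -mvle was seen is no longer known.  */
static unsigned
sketch_apply_mno_vle (unsigned flags)
{
  return flags & ~(unsigned) SKETCH_MASK_VLE;
}

/* Approach B (assumed): keep the bits separate and derive the answer.
   ISEL is usable if requested explicitly or if VLE mode is active.  */
static bool
sketch_isel_available (unsigned flags)
{
  return (flags & (SKETCH_MASK_ISEL | SKETCH_MASK_VLE)) != 0;
}
```

With approach A, "-mvle -mno-vle" on a configuration that started without ISEL leaves the ISEL bit set. With approach B, clearing the VLE bit is enough to return to the original state, which is also what a per-function mode switch would need.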

  Maciej

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread Joseph S. Myers

On Thu, 6 Sep 2012, James Lemke wrote:

> Attached are the patches for this gcc port.
> 
> On a recent checkout (r191027) I have run the DejaGNU suite with no new
> failures for binutils, gas, ld, gcc, g++, gfortran.  A bootstrap is in
> progress.

The -t* options added duplicate -mcpu= options; the only existing 
precedent appears to be arm-vxworks and I don't think the options are 
appropriate for generic PowerPC target files (not specific to an OS port 
such as VxWorks with its own special selection of multilibs).  Instead, it 
would be better to make the -mcpu= options imply appropriate other 
options.

They also aren't mentioned in invoke.texi at all; all new options need 
documenting in invoke.texi.  The only change to invoke.texi is listing 
-mvle and -mno-vle in the initial option summary.  The main section of 
PowerPC options documentation needs the actual substantive documentation 
of the semantics of -mvle added; just the summary isn't enough.

How did any target using eabivle.h manage to build when it uses 
MASK_NEW_MNEMONICS, which no longer exists?  Maybe this separate target 
isn't needed at all.

Wouldn't it be better for longlong.h to have actual support for VLE rather 
than just disabling the present code, or is such support much harder to do 
than the present code?  (Using built-in functions, e.g. __builtin_clz if 
that expands inline for VLE, is fine.)
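
As a rough illustration of the built-in approach, a longlong.h-style macro could be written as below. The macro name is hypothetical, the sketch assumes the argument has type unsigned int, and it says nothing about whether __builtin_clz actually expands inline for VLE, which is the open question above.

```c
/* Sketch of a count_leading_zeros-style macro built on the GCC
   built-in.  The result of __builtin_clz is undefined for x == 0,
   so callers must not pass zero.  */
#define SKETCH_COUNT_LEADING_ZEROS(count, x) \
  ((count) = __builtin_clz (x))

/* Small wrapper so the macro can be exercised as a function.  */
static unsigned
sketch_clz (unsigned x)
{
  unsigned count;
  SKETCH_COUNT_LEADING_ZEROS (count, x);
  return count;
}
```

For a nonzero argument the wrapper returns the number of zero bits above the most significant set bit, so an argument of 1 gives the width of unsigned int minus one.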

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread James Lemke


On 09/06/2012 06:09 PM, Andrew Pinski wrote:

Could you explain why you are changing system.h ?


That was a convenience to me at one point.
It should have been deleted from the patch set.

--
Jim Lemke
Mentor Graphics / CodeSourcery
Orillia Ontario,  +1-613-963-1073

[gccgo branch] Add cherry picked bug fixes

2012-09-06 Thread Ian Lance Taylor

I have committed this patch to the gccgo branch to add some cherry
picked bug fixes to the Go library.  This is for internal release
purposes.  The mainline and 4.7 tree remain at the Go 1.0.2 release.  I
expect that most of these patches will be in Go 1.0.3.

The complete list of cherry-picked patches, with reference to the master
Go mercurial repository:

# Node ID 8eae6e390d201d578d0b816077c68b8904fee838
# Parent  1384d7510575837707113161ea907468c2c6fb15
net/http: ignore paths on CONNECT requests in ServeMux

Fixes issue 3538

R=golang-dev, adg, rsc
CC=golang-dev
http://codereview.appspot.com/6117058

# Date 1338227929 25200
# Node ID 40dc2beda77fd6d445e5babd9a7b383b000fe514
# Parent  1786f1514ab5cc008c4110c80f8ee5d92925628e
net/http: speed up ServeMux when no patterns contain hostnames

R=golang-dev, r
CC=golang-dev
http://codereview.appspot.com/6248053

# Node ID 2033e1b11a20b7fc2d5bdccfeb0af3ff97e74b8b
# Parent  a3c2d3c41c45eb9d10b85c382aa086ca198760cf
go/ast: multiple "blank" imports are permitted

R=rsc, dsymonds
CC=golang-dev
http://codereview.appspot.com/6303099

# Node ID 070604630d24623f594faed7602f6753698b3cd5
# Parent  f8a77e2b7d0b85f3774b90199075d06d3582283f
net/http: support multiple byte ranges in ServeContent

Fixes issue 3784

R=golang-dev, adg
CC=golang-dev
http://codereview.appspot.com/6351052

# Node ID d32138d8d05fcefd6b522c8134d7e26d5ca5bb51
# Parent  aee6a01a9f9ee734801d0873755d3ce9e7668eac
net/http: ignore malicious or dumb Range requests

R=golang-dev, adg
CC=golang-dev
http://codereview.appspot.com/6356050

# Node ID d5754b3d9f444d28085b12fcc9614ec8ee2c2735
# Parent  1c10c31995d6b7f1a9669a771057ea4c2b720700
encoding/gob: fix check for short input in slice decode

R=golang-dev, dsymonds, r, nigeltao
CC=golang-dev
http://codereview.appspot.com/6374059

# Node ID 5e7fd762f3565b67b0c461ca8000b16b36515e3f
# Parent  6c441dee919c5c1ce2651141bce6213912918869
testing: fix memory blowup when formatting many lines.

Fixes issue 3830.

R=golang-dev, r
CC=golang-dev, remy
http://codereview.appspot.com/6373047

# Node ID 7a67d277c7e8f5d1003e2fac0c1579e1d25eddca
# Parent  6eb7e61b5286fb84218bd6a5a971e57072c00055
testing: allow concurrent use of T and B

Notably, allow concurrent logging and failing.

R=golang-dev, r
CC=golang-dev
http://codereview.appspot.com/6453045

# Node ID 8d39afcd18b1abfb5c50738fb988561d0d0b723c
# Parent  52a0395d0e81b1677d001932b6c666bdb53b8f86
net/http: Set TLSClientConfig.ServerName on every HTTP request.

This makes SNI "just work" for callers using the standard http.Client.

Since we now have a test that depends on the httptest.Server cert, change
the cert to be a CA (keeping all other fields the same).

R=bradfitz
CC=agl, dsymonds, gobot, golang-dev
http://codereview.appspot.com/6448154

# Node ID c552fb2b6a6c1495508191c2f3f89b9bc8a1180b
# Parent  57039cf95e8901f4324aea658c1a986f5436194c
net/http: send an explicit zero Content-Length when Handler never Writes

Fixes issue 4004

NOTE: Edited to change .get to .Get, to avoid needing to bring in CL 6255053,
which refers to far too many other changes.

R=golang-dev, r
CC=golang-dev
http://codereview.appspot.com/6472055

# Node ID c8cc7270808012f382029b13d4d950137eb1c81a
# Parent  18a0bd67b4b475049162093a21e4a53e637d3cad
net/http: fix inserting of implicit redirects in serve mux

In serve mux, if pattern contains a host name, pass only the path to
the redirect handler.

Add tests for serve mux redirections.

R=rsc
CC=bradfitz, gobot, golang-dev
http://codereview.appspot.com/6329045

# Node ID 5d47297457972e289d6d759ded49693f38ad9500
# Parent  c8cc7270808012f382029b13d4d950137eb1c81a
net/http: add (*ServeMux).Handler method

The Handler method makes the ServeMux dispatch logic
available to wrappers that enforce additional constraints
on requests.

R=golang-dev, bradfitz, dsymonds
CC=golang-dev
http://codereview.appspot.com/6450165

Ian

Index: encoding/gob/encoder_test.go
===
--- encoding/gob/encoder_test.go	(revision 190560)
+++ encoding/gob/encoder_test.go	(working copy)
@@ -736,3 +736,32 @@ func TestPtrToMapOfMap(t *testing.T) {
 		t.Fatalf("expected %v got %v", data, newData)
 	}
 }
+
+// There was an error check comparing the length of the input with the
+// length of the slice being decoded. It was wrong because the next
+// thing in the input might be a type definition, which would lead to
+// an incorrect length check.  This test reproduces the corner case.
+
+type Z struct {
+}
+
+func Test29ElementSlice(t *testing.T) {
+	Register(Z{})
+	src := make([]interface{}, 100) // Size needs to be bigger than size of type definition.
+	for i := range src {
+		src[i] = Z{}
+	}
+	buf := new(bytes.Buffer)
+	err := NewEncoder(buf).Encode(src)
+	if err != nil {
+		t.Fatalf("encode: %v", err)
+		return
+	}
+
+	var dst []interface{}
+	err = NewDecoder(buf).Decode(&dst)
+	if err != nil {
+		t.Errorf("decode: %v", err)
+		return
+	}
+}
Index: encoding/gob/decode.go
=

[patch trivial] Fix comment in dwarf2.def

2012-09-06 Thread Cary Coutant

I've committed this trivial comment patch to src/include/dwarf2.def
(and will shortly do the same in gcc/include).

-cary


2012-09-06  Cary Coutant  

include/
* dwarf2.def: Edit comment.

diff --git a/include/dwarf2.def b/include/dwarf2.def
index 3c3dfcc..7fe2df1 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -586,7 +586,7 @@ DW_OP (DW_OP_GNU_convert, 0xf7)
 DW_OP (DW_OP_GNU_reinterpret, 0xf9)
 /* The GNU parameter ref extension.  */
 DW_OP (DW_OP_GNU_parameter_ref, 0xfa)
-/* Extension for Fission.  See http://gcc.gnu.org/wiki/DebugFission.  */
+/* Extensions for Fission.  See http://gcc.gnu.org/wiki/DebugFission.  */
 DW_OP (DW_OP_GNU_addr_index, 0xfb)
 DW_OP (DW_OP_GNU_const_index, 0xfc)
 /* HP extensions.  */

Re: [PATCH] PowerPC VLE port

2012-09-06 Thread James Lemke


On 09/06/2012 07:07 PM, Joseph S. Myers wrote:

The -t* options added duplicate -mcpu= options; the only existing
precedent appears to be arm-vxworks and I don't think the options are
appropriate for generic PowerPC target files (not specific to an OS port
such as VxWorks with its own special selection of multilibs).  Instead, it
would be better to make the -mcpu= options imply appropriate other
options.


Agreed.  I will remove the -t* options.

--
Jim Lemke
Mentor Graphics / CodeSourcery
Orillia Ontario,  +1-613-963-1073

Re: [google] Added new dump flag -pmu to display pmu data in pass summaries (issue6489092)

2012-09-06 Thread Teresa Johnson

On Thu, Sep 6, 2012 at 2:49 PM, Chris Manghane  wrote:
> This patch adds a new dump flag that dumps PMU profile information using
> the -pmu dump option.
>
> This patch should be applied to google/main.
>
> Tested with crosstools.
>
> 2012-09-06  Chris Manghane  
>
> * gcc/doc/invoke.texi: Modified pmu-profile-use option.
> * gcc/tree-dump.c: Added new dump flag.
> * gcc/tree-pretty-print.c
> (dump_load_latency_details): New function.
> (dump_pmu): New function.
> (dump_generic_node): Added support for new dump flag.
> * gcc/tree-pretty-print.h: Added new function to global header.
> * gcc/tree-pass.h (enum tree_dump_index): Added new dump flag.
> * gcc/gcov.c:
> (process_pmu_profile): Fixed assertion conditions.
> * gcc/gcov-io.h (struct gcov_pmu_summary): Added new struct.
> * gcc/opts.c (common_handle_option): Added support for modified 
> option.
> * gcc/gimple-pretty-print.c
> (dump_gimple_phi): Added support for new dump flag.
> (dump_gimple_stmt): Ditto.
> * gcc/coverage.c
> (htab_counts_entry_hash): Added new hash table for PMU info.
> (htab_pmu_entry_hash): Ditto.
> (htab_counts_entry_eq): Ditto.
> (htab_pmu_entry_eq): Ditto.
> (htab_counts_entry_del): Ditto.
> (htab_pmu_entry_del): Ditto.
> (read_counts_file): Ditto.
> (static void read_pmu_file): Ditto.
> (get_coverage_pmu_latency): Ditto.
> (get_coverage_pmu_branch_mispredict): Ditto.
> (pmu_data_present): Added new function.
> (coverage_init): Added pmu file reading support.
> * gcc/coverage.h: Added pmu functions to global header.
> * gcc/common.opt: Modified pmu-profile-use option.
>
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi (revision 190817)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -399,7 +399,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
>  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
>  -fpmu-profile-generate=@var{pmuoption} @gol
> --fpmu-profile-use=@var{pmuoption} @gol
> +-fpmu-profile-use=@var{pmudata} @gol
>  -freciprocal-math -free -fregmove -frename-registers -freorder-blocks @gol
>  -frecord-gcc-switches-in-elf@gol
>  -freorder-blocks-and-partition -freorder-functions @gol
> @@ -8381,12 +8381,11 @@ displayed using coverage tool gcov. The params var
>  "pmu_profile_n_addresses" can be used to restrict PMU data collection
>  to only this many addresses.
>
> -@item -fpmu-profile-use=@var{pmuoption}
> +@item -fpmu-profile-use=@var{pmudata}
>  @opindex fpmu-profile-use
>
> -Enable performance monitoring unit (PMU) profiling based
> -optimizations.  Currently only @var{load-latency} and
> -@var{branch-mispredict} are supported.
> +If @var{pmudata} is specified, GCC will read PMU data from @var{pmudata}. If
> +unspecified, PMU data will be read from 'pmuprofile.gcda'.
>
>  @item -fprofile-strip=@var{base_suffix}
>  @opindex fprofile-strip
> Index: gcc/tree-dump.c
> ===
> --- gcc/tree-dump.c (revision 190817)
> +++ gcc/tree-dump.c (working copy)
> @@ -824,9 +824,11 @@ static const struct dump_option_value_info dump_op
>{"nouid", TDF_NOUID},
>{"enumerate_locals", TDF_ENUMERATE_LOCALS},
>{"scev", TDF_SCEV},
> +  {"pmu", TDF_PMU},
>{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | TDF_IPA
> | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
> -   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV)},
> +   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV
> +| TDF_PMU)},
>{NULL, 0}
>  };
>
> Index: gcc/tree-pretty-print.c
> ===
> --- gcc/tree-pretty-print.c (revision 190817)
> +++ gcc/tree-pretty-print.c (working copy)
> @@ -25,6 +25,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm.h"
>  #include "tree.h"
>  #include "output.h"
> +#include "basic-block.h"
> +#include "gcov-io.h"
> +#include "coverage.h"
>  #include "tree-pretty-print.h"
>  #include "hashtab.h"
>  #include "tree-flow.h"
> @@ -51,6 +54,7 @@ static void do_niy (pretty_printer *, const_tree);
>
>  static pretty_printer buffer;
>  static int initialized = 0;
> +static char *file_prefix = NULL;
>
>  /* Try to print something for an unknown tree code.  */
>
> @@ -461,7 +465,32 @@ dump_omp_clauses (pretty_printer *buffer, tree cla
>  }
>  }
>
> +/* Dump detailed information about pmu load latency events */
>
> +void
> +dump_load_latency_details (pretty_printer *buffer, gcov_pmu_ll_info_t 
> *ll_info)
> +{
> +  if (ll_info == NULL)
> +return;
> +
> +  pp_string (buffer, "\n[load latency contribution: ");
> +  p

Re: [google] Added new dump flag -pmu to display pmu data in pass summaries (issue6489092)

2012-09-06 Thread Teresa Johnson

On Thu, Sep 6, 2012 at 5:34 PM, Chris Manghane  wrote:
>
>
> On Thu, Sep 6, 2012 at 5:08 PM, Teresa Johnson  wrote:
>>
>> On Thu, Sep 6, 2012 at 2:49 PM, Chris Manghane  wrote:
>> > This patch adds a new dump flag that dumps PMU profile information using
>> > the -pmu dump option.
>> >
>> > This patch should be applied to google/main.
>> >
>> > Tested with crosstools.
>> >
>> > 2012-09-06  Chris Manghane  
>> >
>> > * gcc/doc/invoke.texi: Modified pmu-profile-use option.
>> > * gcc/tree-dump.c: Added new dump flag.
>> > * gcc/tree-pretty-print.c
>> > (dump_load_latency_details): New function.
>> > (dump_pmu): New function.
>> > (dump_generic_node): Added support for new dump flag.
>> > * gcc/tree-pretty-print.h: Added new function to global header.
>> > * gcc/tree-pass.h (enum tree_dump_index): Added new dump flag.
>> > * gcc/gcov.c:
>> > (process_pmu_profile): Fixed assertion conditions.
>> > * gcc/gcov-io.h (struct gcov_pmu_summary): Added new struct.
>> > * gcc/opts.c (common_handle_option): Added support for modified
>> > option.
>> > * gcc/gimple-pretty-print.c
>> > (dump_gimple_phi): Added support for new dump flag.
>> > (dump_gimple_stmt): Ditto.
>> > * gcc/coverage.c
>> > (htab_counts_entry_hash): Added new hash table for PMU info.
>> > (htab_pmu_entry_hash): Ditto.
>> > (htab_counts_entry_eq): Ditto.
>> > (htab_pmu_entry_eq): Ditto.
>> > (htab_counts_entry_del): Ditto.
>> > (htab_pmu_entry_del): Ditto.
>> > (read_counts_file): Ditto.
>> > (static void read_pmu_file): Ditto.
>> > (get_coverage_pmu_latency): Ditto.
>> > (get_coverage_pmu_branch_mispredict): Ditto.
>> > (pmu_data_present): Added new function.
>> > (coverage_init): Added pmu file reading support.
>> > * gcc/coverage.h: Added pmu functions to global header.
>> > * gcc/common.opt: Modified pmu-profile-use option.
>> >
>> > Index: gcc/doc/invoke.texi
>> > ===
>> > --- gcc/doc/invoke.texi (revision 190817)
>> > +++ gcc/doc/invoke.texi (working copy)
>> > @@ -399,7 +399,7 @@ Objective-C and Objective-C++ Dialects}.
>> >  -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
>> >  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
>> >  -fpmu-profile-generate=@var{pmuoption} @gol
>> > --fpmu-profile-use=@var{pmuoption} @gol
>> > +-fpmu-profile-use=@var{pmudata} @gol
>> >  -freciprocal-math -free -fregmove -frename-registers -freorder-blocks
>> > @gol
>> >  -frecord-gcc-switches-in-elf@gol
>> >  -freorder-blocks-and-partition -freorder-functions @gol
>> > @@ -8381,12 +8381,11 @@ displayed using coverage tool gcov. The params
>> > var
>> >  "pmu_profile_n_addresses" can be used to restrict PMU data collection
>> >  to only this many addresses.
>> >
>> > -@item -fpmu-profile-use=@var{pmuoption}
>> > +@item -fpmu-profile-use=@var{pmudata}
>> >  @opindex fpmu-profile-use
>> >
>> > -Enable performance monitoring unit (PMU) profiling based
>> > -optimizations.  Currently only @var{load-latency} and
>> > -@var{branch-mispredict} are supported.
>> > +If @var{pmudata} is specified, GCC will read PMU data from
>> > @var{pmudata}. If
>> > +unspecified, PMU data will be read from 'pmuprofile.gcda'.
>> >
>> >  @item -fprofile-strip=@var{base_suffix}
>> >  @opindex fprofile-strip
>> > Index: gcc/tree-dump.c
>> > ===
>> > --- gcc/tree-dump.c (revision 190817)
>> > +++ gcc/tree-dump.c (working copy)
>> > @@ -824,9 +824,11 @@ static const struct dump_option_value_info dump_op
>> >{"nouid", TDF_NOUID},
>> >{"enumerate_locals", TDF_ENUMERATE_LOCALS},
>> >{"scev", TDF_SCEV},
>> > +  {"pmu", TDF_PMU},
>> >{"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL |
>> > TDF_IPA
>> > | TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
>> > -   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS |
>> > TDF_SCEV)},
>> > +   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV
>> > +| TDF_PMU)},
>> >{NULL, 0}
>> >  };
>> >
>> > Index: gcc/tree-pretty-print.c
>> > ===
>> > --- gcc/tree-pretty-print.c (revision 190817)
>> > +++ gcc/tree-pretty-print.c (working copy)
>> > @@ -25,6 +25,9 @@ along with GCC; see the file COPYING3.  If not see
>> >  #include "tm.h"
>> >  #include "tree.h"
>> >  #include "output.h"
>> > +#include "basic-block.h"
>> > +#include "gcov-io.h"
>> > +#include "coverage.h"
>> >  #include "tree-pretty-print.h"
>> >  #include "hashtab.h"
>> >  #include "tree-flow.h"
>> > @@ -51,6 +54,7 @@ static void do_niy (pretty_printer *, const_tree);
>> >
>> >  static pretty_printer buffer;
>> >  static int initialized = 0;
>> >

Re: [patch, mips] New mips triplet for multilib linux builds

2012-09-06 Thread Steve Ellcey

On Thu, 2012-09-06 at 06:47 +0100, Richard Sandiford wrote:

> > Is this an 'if-then-else' usage?
> 
> Yeah, but I typoed, sorry.  It should be:
> 
> %{mips32r2|mips64r2:-msynci;:-mno-synci}
> 
> Richard
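
For reference, the if-then-else spec quoted above selects -msynci when either -mips32r2 or -mips64r2 is on the command line and -mno-synci otherwise. A minimal C sketch of that selection logic follows; the function name is hypothetical and the code only mimics what the driver's spec processing would decide, it is not how the driver is implemented.

```c
#include <string.h>

/* Mimic %{mips32r2|mips64r2:-msynci;:-mno-synci}: return the option
   the spec would substitute for the given driver arguments.  */
static const char *
sketch_synci_option (int argc, const char *const *argv)
{
  for (int i = 0; i < argc; i++)
    if (strcmp (argv[i], "-mips32r2") == 0
        || strcmp (argv[i], "-mips64r2") == 0)
      return "-msynci";     /* the "then" branch, before the ';' */

  return "-mno-synci";      /* the default branch, after ";:" */
}
```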

OK, I got that working now.  I am still having some issues though.  My
original patch was set up to include mti-linux.h before mips.h and I
think that is good for setting MIPS_ABI_DEFAULT and MIPS_ISA_DEFAULT
because mips.h is going to look and see if those values are set.

But now that I am setting DRIVER_SELF_SPECS in the header it seems
like I should include it after mips.h (so I can override the setting
of DRIVER_SELF_SPECS in mips.h).

Do I need two header files?  One to include before mips.h and one to
include after mips.h?

Steve Ellcey
sell...@mips.com

Re: [PATCH] Remove RS6000_CALL_GLUE

2012-09-06 Thread David Edelsohn

On Sun, Aug 26, 2012 at 7:03 PM, Segher Boessenkool
 wrote:
> On all supported targets, plain "nop" is the correct thing
> to use ("cror 31,31,31" was for POWER).
>
> Tested on powerpc64-linux --enable-languages=c,c++,fortran; no
> regressions.  Okay for mainline?
>
>
> Segher
>
>
> 2012-08-26  Segher Boessenkool  
>
> gcc/
> * config/rs6000/aix43.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/aix51.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/aix52.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/aix53.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/aix61.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/freebsd64.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/linux64.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/rs6000.c (print_operand) ['.']: Delete.
> * config/rs6000/rs6000.h (RS6000_CALL_GLUE): Delete.
> * config/rs6000/rs6000.md (tls_gd_aix):
> Replace %. with nop.
> (tls_gd_call_aix): Ditto.
> (tls_ld_aix): Ditto.
> (tls_ld_call_aix): Ditto.
> (call_nonlocal_aix32): Ditto.
> (call_nonlocal_aix64): Ditto.
> (call_value_nonlocal_aix32): Ditto.
> (call_value_nonlocal_aix64): Ditto.

There has been no objection from others, so go ahead.

Thanks, David

[google] Added new dump flag -pmu to display pmu data in pass summaries (issue6489092)

2012-09-06 Thread Chris Manghane

Fixed spacing and condensed repeated code into helper.

This patch should be applied to google/main.

Tested with crosstools.

2012-09-06  Chris Manghane  

* gcc/doc/invoke.texi: Modified pmu-profile-use option.
* gcc/tree-dump.c: Added new dump flag.
* gcc/tree-pretty-print.c
(dump_load_latency_details): New function.
(dump_pmu): New function.
(dump_generic_node): Added support for new dump flag.
* gcc/tree-pretty-print.h: Added new function to global header.
* gcc/tree-pass.h (enum tree_dump_index): Added new dump flag.
* gcc/gcov.c:
(process_pmu_profile): Fixed assertion conditions.
* gcc/gcov-io.h (struct gcov_pmu_summary): Added new struct.
* gcc/opts.c (common_handle_option): Added support for modified option.
* gcc/gimple-pretty-print.c
(dump_pmu_data): New function.
(dump_gimple_phi): Added support for new dump flag.
(dump_gimple_stmt): Ditto.
* gcc/coverage.c
(htab_counts_entry_hash): Added new hash table for PMU info.
(htab_pmu_entry_hash): Ditto.
(htab_counts_entry_eq): Ditto.
(htab_pmu_entry_eq): Ditto.
(htab_counts_entry_del): Ditto.
(htab_pmu_entry_del): Ditto.
(read_counts_file): Ditto.
(static void read_pmu_file): Ditto.
(get_coverage_pmu_latency): Ditto.
(get_coverage_pmu_branch_mispredict): Ditto.
(pmu_data_present): Added new function.
(coverage_init): Added pmu file reading support.
* gcc/coverage.h: Added pmu functions to global header.
* gcc/common.opt: Modified pmu-profile-use option.
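
One detail of the tree-dump.c hunk is easy to miss: TDF_PMU is added both as its own table entry and to the list that "all" masks out, so "all" does not imply "pmu". The sketch below shows that pattern with a cut-down table. The SK_TDF_* values, the struct and the lookup helper are hypothetical stand-ins for illustration; the real constants and table are the ones in tree-pass.h and tree-dump.c.

```c
#include <string.h>

/* Hypothetical flag values for this sketch only.  */
enum
{
  SK_TDF_RAW = 1 << 0,
  SK_TDF_SLIM = 1 << 1,
  SK_TDF_SCEV = 1 << 2,
  SK_TDF_PMU = 1 << 3,
  SK_TDF_DETAILS = 1 << 4
};

struct sketch_dump_option
{
  const char *name;
  int value;
};

/* "all" is every bit except the ones listed, mirroring the ~(...)
   expression in the patch.  */
static const struct sketch_dump_option sketch_dump_options[] = {
  {"raw", SK_TDF_RAW},
  {"slim", SK_TDF_SLIM},
  {"details", SK_TDF_DETAILS},
  {"scev", SK_TDF_SCEV},
  {"pmu", SK_TDF_PMU},
  {"all", ~(SK_TDF_RAW | SK_TDF_SLIM | SK_TDF_SCEV | SK_TDF_PMU)},
  {NULL, 0}
};

/* Return the flag bits for NAME, or 0 if NAME is not in the table.  */
static int
sketch_lookup_dump_flag (const char *name)
{
  for (const struct sketch_dump_option *p = sketch_dump_options;
       p->name != NULL; p++)
    if (strcmp (p->name, name) == 0)
      return p->value;
  return 0;
}
```

In this model, looking up "all" yields a mask with the details bit set and the pmu bit clear, so PMU output has to be requested explicitly.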

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 190817)
+++ gcc/doc/invoke.texi (working copy)
@@ -399,7 +399,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fpmu-profile-generate=@var{pmuoption} @gol
--fpmu-profile-use=@var{pmuoption} @gol
+-fpmu-profile-use=@var{pmudata} @gol
 -freciprocal-math -free -fregmove -frename-registers -freorder-blocks @gol
 -frecord-gcc-switches-in-elf@gol
 -freorder-blocks-and-partition -freorder-functions @gol
@@ -8381,12 +8381,11 @@ displayed using coverage tool gcov. The params var
 "pmu_profile_n_addresses" can be used to restrict PMU data collection
 to only this many addresses.
 
-@item -fpmu-profile-use=@var{pmuoption}
+@item -fpmu-profile-use=@var{pmudata}
 @opindex fpmu-profile-use
 
-Enable performance monitoring unit (PMU) profiling based
-optimizations.  Currently only @var{load-latency} and
-@var{branch-mispredict} are supported.
+If @var{pmudata} is specified, GCC will read PMU data from @var{pmudata}. If
+unspecified, PMU data will be read from 'pmuprofile.gcda'.
 
 @item -fprofile-strip=@var{base_suffix}
 @opindex fprofile-strip
Index: gcc/tree-dump.c
===
--- gcc/tree-dump.c (revision 190817)
+++ gcc/tree-dump.c (working copy)
@@ -824,9 +824,11 @@ static const struct dump_option_value_info dump_op
   {"nouid", TDF_NOUID},
   {"enumerate_locals", TDF_ENUMERATE_LOCALS},
   {"scev", TDF_SCEV},
+  {"pmu", TDF_PMU},
   {"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_TREE | TDF_RTL | TDF_IPA
| TDF_STMTADDR | TDF_GRAPH | TDF_DIAGNOSTIC | TDF_VERBOSE
-   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV)},
+   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV
+| TDF_PMU)},
   {NULL, 0}
 };
 
Index: gcc/tree-pretty-print.c
===
--- gcc/tree-pretty-print.c (revision 190817)
+++ gcc/tree-pretty-print.c (working copy)
@@ -25,6 +25,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "tree.h"
 #include "output.h"
+#include "basic-block.h"
+#include "gcov-io.h"
+#include "coverage.h"
 #include "tree-pretty-print.h"
 #include "hashtab.h"
 #include "tree-flow.h"
@@ -51,6 +54,7 @@ static void do_niy (pretty_printer *, const_tree);
 
 static pretty_printer buffer;
 static int initialized = 0;
+static char *file_prefix = NULL;
 
 /* Try to print something for an unknown tree code.  */
 
@@ -461,7 +465,32 @@ dump_omp_clauses (pretty_printer *buffer, tree cla
 }
 }
 
+/* Dump detailed information about pmu load latency events */
 
+void
+dump_load_latency_details (pretty_printer *buffer, gcov_pmu_ll_info_t *ll_info)
+{
+  if (ll_info == NULL)
+return;
+
+  pp_string (buffer, "\n[load latency contribution: ");
+  pp_scalar (buffer, "%.2f%%\n", ll_info->self / 100.f);
+  pp_string (buffer, "average cycle distribution:\n");
+  pp_scalar (buffer, "%.2f%% <= 10 cycles\n",
+ ll_info->lt_10 / 100.f);
+  pp_scalar (buffer, "%.2f%% <= 32 cycles\n",
+ ll_info->lt_32 / 100.f);
+

[PATCH] Add -fmem-report-wpa

2012-09-06 Thread Andi Kleen

From: Andi Kleen 

For parallel LTO builds, setting -fmem-report does not work very well
because all the LTRANS phases dump their reports in parallel, and the
interleaved output is typically unreadable.

Since the memory bottleneck is usually WPA, add a flag that dumps
the memory report for that phase only.

Passed bootstrap and testsuite on x86_64.

2012-09-06  Andi Kleen  

* gcc/common.opt (-fmem-report-wpa): Add.
* gcc/doc/invoke.texi (-fmem-report-wpa): Document.
* gcc/lto/lto.c (do_whole_program_analysis): Run mem_report
when mem_report_wpa is set.
---
 gcc/common.opt  |4 
 gcc/doc/invoke.texi |7 ++-
 gcc/lto/lto.c   |2 ++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 87e28b5..73eebf4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1470,6 +1470,10 @@ fmem-report
 Common Report Var(mem_report)
 Report on permanent memory allocation
 
+fmem-report-wpa
+Common Report Var(mem_report_wpa)
+Report on permanent memory allocation in WPA only
+
 ; This will attempt to merge constant section constants, if 1 only
 ; string constants and constants from constant pool, if 2 also constant
 ; variables.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6cf7cec..5a5f9d2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -328,7 +328,7 @@ Objective-C and Objective-C++ Dialects}.
 -feliminate-unused-debug-symbols -femit-class-debug-always @gol
 -fenable-@var{kind}-@var{pass} @gol
 -fenable-@var{kind}-@var{pass}=@var{range-list} @gol
--fdebug-types-section @gol
+-fdebug-types-section -fmem-report-wpa @gol
 -fmem-report -fpre-ipa-mem-report -fpost-ipa-mem-report -fprofile-arcs @gol
 -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
@@ -5132,6 +5132,11 @@ pass when it finishes.
 Makes the compiler print some statistics about permanent memory
 allocation when it finishes.
 
+@item -fmem-report-wpa
+@opindex fmem-report-wpa
+Makes the compiler print some statistics about permanent memory
+allocation for the WPA phase only.
+
 @item -fpre-ipa-mem-report
 @opindex fpre-ipa-mem-report
 @item -fpost-ipa-mem-report
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 5da5412..3c59299 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2016,6 +2016,8 @@ do_whole_program_analysis (void)
   /* Show the LTO report before launching LTRANS.  */
   if (flag_lto_report)
 print_lto_report ();
+  if (mem_report_wpa)
+dump_mem_report ();
 }
 
 
-- 
1.7.7

[PATCH] Fix part of PR gcov-profile/54487 (issue6501100)

2012-09-06 Thread Teresa Johnson

This fixes part of the issue described in PR gcov-profile/54487 where
there were warnings about mismatches due to slight differences in the
merged histograms in different object files. This can happen due to
the truncating integer division in the merge routine, which could result
in slightly different histograms when summaries are merged in different
orders.
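
To illustrate the order dependence described above, here is a minimal
sketch. It is not the libgcov merge routine: the struct, its fields and
both function names are hypothetical, and a count-weighted mean stands in
for whatever the real merge computes. In exact arithmetic the result would
not depend on merge order; with truncating integer division it does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary: a mean value and the number of runs behind it.  */
struct toy_summary
{
  unsigned long mean;
  unsigned long runs;
};

/* Merge two summaries with a run-weighted mean.  The division truncates,
   which is what makes the final result depend on merge order.  */
static struct toy_summary
toy_merge (struct toy_summary a, struct toy_summary b)
{
  struct toy_summary out;

  out.runs = a.runs + b.runs;
  assert (out.runs > 0);
  out.mean = (a.mean * a.runs + b.mean * b.runs) / out.runs;
  return out;
}

/* Fold N single-run values (N > 0) left to right and return the mean.  */
unsigned long
toy_merge_all (const unsigned long *vals, size_t n)
{
  struct toy_summary acc;
  size_t i;

  assert (n > 0);
  acc.mean = vals[0];
  acc.runs = 1;
  for (i = 1; i < n; i++)
    {
      struct toy_summary next;

      next.mean = vals[i];
      next.runs = 1;
      acc = toy_merge (acc, next);
    }
  return acc.mean;
}
```

Merging the values 1, 2, 6 in that order gives 2, while merging 6, 2, 1
gives 3, even though the exact mean is 3 in both cases.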

Tested with bootstrap and profiledbootstrap on x86_64-unknown-linux-gnu.
Ok for trunk?

Teresa

2012-09-06  Teresa Johnson  

PR gcov-profile/54487
* libgcc/libgcov.c (gcov_exit): Avoid warning on histogram
differences.

Index: libgcc/libgcov.c
===
--- libgcc/libgcov.c(revision 191035)
+++ libgcc/libgcov.c(working copy)
@@ -707,7 +707,13 @@ gcov_exit (void)
memcpy (cs_all, cs_prg, sizeof (*cs_all));
  else if (!all_prg.checksum
   && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs)
-  && memcmp (cs_all, cs_prg, sizeof (*cs_all)))
+   /* Don't compare the histograms, which may have slight
+  variations depending on the order they were updated
+  due to the truncating integer divides used in the
+  merge.  */
+   && memcmp (cs_all, cs_prg,
+  sizeof (*cs_all) - (sizeof (gcov_bucket_type)
+  * GCOV_HISTOGRAM_SIZE)))
{
  fprintf (stderr, "profiling:%s:Invocation mismatch - some data files may have been removed%s\n",
   gi_filename, GCOV_LOCKED

--
This patch is available for review at http://codereview.appspot.com/6501100
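
The prefix comparison in the libgcov.c hunk above can be sketched in
isolation as follows. The struct layout, field names and TOY_HISTOGRAM_SIZE
are assumptions made for illustration, not the real gcov definitions. The
sketch relies on the histogram being the last member with no trailing
padding after it, which the assert checks with offsetof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HISTOGRAM_SIZE 4	/* stand-in for GCOV_HISTOGRAM_SIZE */

/* Hypothetical bucket and summary types; the histogram comes last.  */
struct toy_bucket
{
  unsigned num_counters;
  unsigned long long min_value;
  unsigned long long cum_value;
};

struct toy_ctr_summary
{
  unsigned num;
  unsigned runs;
  unsigned long long sum_all;
  struct toy_bucket histogram[TOY_HISTOGRAM_SIZE];
};

/* Return nonzero if A and B differ in anything other than the histogram.
   Both objects should have been zero-initialized (e.g. with memset) so
   that padding bytes compare equal.  */
int
toy_summaries_differ (const struct toy_ctr_summary *a,
		      const struct toy_ctr_summary *b)
{
  size_t prefix = sizeof (*a)
		  - sizeof (struct toy_bucket) * TOY_HISTOGRAM_SIZE;

  /* The subtraction only yields the right prefix if the histogram is the
     final member and nothing follows it.  */
  assert (prefix == offsetof (struct toy_ctr_summary, histogram));
  return memcmp (a, b, prefix) != 0;
}
```

With this helper, two summaries that differ only in a histogram bucket
compare equal, while a difference in runs is still detected.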

PING: [PATCH, ARM] New CPU support for Marvell PJ4 cores

2012-09-06 Thread Yi-Hsiu Hsu

Ping!!

Cheers,
Yi-Hsiu, Hsu


-Original Message-
From: Yi-Hsiu Hsu 
Sent: Tuesday, June 26, 2012 1:51 PM
To: 'Chung-Lin Tang'
Cc: Ramana Radhakrishnan; gcc-patches@gcc.gnu.org
Subject: RE: [PATCH, ARM] New CPU support for Marvell PJ4 cores

Hi Chung-Lin,

I think the tune_marvell attribute had better be kept for future Marvell core
extensions.
Thanks!

B.R.
Yi-Hsiu, Hsu


-Original Message-
From: Chung-Lin Tang [mailto:clt...@codesourcery.com]
Sent: Tuesday, June 26, 2012 12:12 PM
To: Yi-Hsiu Hsu
Cc: Ramana Radhakrishnan; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] New CPU support for Marvell PJ4 cores

On 2012/6/26 09:37 AM, Yi-Hsiu Hsu wrote:
> Updated changelog.
> 
> * config/arm/marvell-pj4.md: New marvell-pj4 pipeline description.
> * config/arm/arm.c (arm_issue_rate): Add marvell_pj4.
> * config/arm/arm.md (tune_marvell): Add marvell_pj4.
> * config/arm/arm-cores.def: Add core marvell-pj4.
> * config/arm/arm-tune.md: Regenerated.
> * config/arm/arm-tables.opt: Regenerated.
> * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell_pj4.
> * doc/invoke.texi: Added entry for marvell-pj4.

Another nit, I think the tune_marvell attribute now looks a bit unneeded. You 
can just fold it into the "tune" clause of "generic_sched".

Thanks,
Chung-Lin

Re: [patch, mips] New mips triplet for multilib linux builds

2012-09-06 Thread Richard Sandiford

Steve Ellcey  writes:
> On Thu, 2012-09-06 at 06:47 +0100, Richard Sandiford wrote:
>> > Is this an 'if-then-else' usage?
>> 
>> Yeah, but I typoed, sorry.  It should be:
>> 
>> %{mips32r2|mips64r2:-msynci;:-mno-synci}
>> 
>> Richard
>
> OK, I got that working now.  I am still having some issues though.  My
> original patch was setup to include mti-linux.h before mips.h and I
> think that is good for setting MIPS_ABI_DEFAULT and MIPS_ISA_DEFAULT
> because mips.h is going to look and see if those values are set.
>
> But now that I am setting DRIVER_SELF_SPECS in the header it seems
> like I should include it after mips.h (so I can override the setting
> of DRIVER_SELF_SPECS in mips.h).
>
> Do I need two header files?  One to include before mips.h and one to
> include after mips.h?

MIPS_ABI_DEFAULT and MIPS_ISA_DEFAULT are better set in config.gcc.
That also allows you to handle (say) mipsisa32-mti-linux-gnu vs.
mipsisa64-mti-linux-gnu.

I think in general we want more specific header files to come after
less specific ones, so that the more specific ones can override
whatever they like.  E.g. the order for the generic config/ *.hs
is "elfos.h gnu-user.h linux.h" and the order for the MIPS ones
is "mips.h gnu-user.h gnu-user64.h linux64.h".  linux-common.h
coming after linux64.h is an odd-one-out really.

Richard

Small ira.c:setup_pressure_classes tweak

2012-09-06 Thread Richard Sandiford

ira.c:setup_pressure_classes treats "leaf" classes as pressure classes
even if moves between them are more expensive than moves to or from memory:

  if (ira_class_hard_regs_num[cl] != 1
  /* A register class without subclasses may contain a few
 hard registers and movement between them is costly
 (e.g. SPARC FPCC registers).  We still should consider it
 as a candidate for a pressure class.  */
  && alloc_reg_class_subclasses[cl][0] != LIM_REG_CLASSES)
{

MIPS accumulators are another case where this inclusion is important.

Everything works as expected with current sources, where:

MD_REGS = MD0_REG + MD1_REG
ACC_REGS = MD_REGS + DSP_ACC_REGS

So the three leaf classes are the singleton MD0_REG and MD1_REG (available
on all MIPS targets) and DSP_ACC_REGS (available only when using the DSP ASE).
All three are treated as pressure classes where appropriate.

However, MD0 and MD1 are no longer independently allocatable.  Splitting
what is effectively one register into two classes creates some confusion,
so I'd like to get rid of them.  We're then left with just:

ACC_REGS = MD_REGS + DSP_ACC_REGS

where MD_REGS and DSP_ACC_REGS are both leaf classes.  Without the DSP ASE,
that reduces to:

ACC_REGS = MD_REGS

The problem is that alloc_reg_class_subclasses (unlike reg_class_subclasses)
includes higher-numbered classes too.  So when ACC_REGS = MD_REGS,
alloc_reg_class_subclasses lists ACC_REGS as a subset of MD_REGS
and MD_REGS as a subset of ACC_REGS.  The condition:

  alloc_reg_class_subclasses[cl][0] != LIM_REG_CLASSES

therefore holds for both of them.  This is similar to the situation
mentioned in setup_uniform_class_p:

  /* We can not use alloc_reg_class_subclasses here because move
 cost hooks does not take into account that some registers are
 unavailable for the subtarget.  E.g. for i686, INT_SSE_REGS
 is element of alloc_reg_class_subclasses for GENERAL_REGS
 because SSE regs are unavailable.  */
  for (i = 0; (cl2 = reg_class_subclasses[cl][i]) != LIM_REG_CLASSES; i++)

When we have several equivalent leaf classes, I think we should continue
to treat the lowest-numbered one as a pressure class.  This produces the
expected results with the MIPS patch described above.
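
As a rough model of the situation described above (not the ira.c code
itself), the sketch below hard-codes a three-class target where ACC_REGS
and MD_REGS contain the same registers. The table contents and the two
helper names are assumptions for illustration. It also assumes each
subclass list is in ascending class order, so entry 0 is the
lowest-numbered class.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_reg_class { NO_REGS, MD_REGS, ACC_REGS, LIM_REG_CLASSES };

/* Model of alloc_reg_class_subclasses when ACC_REGS == MD_REGS: each
   class lists the other one as a subset.  Lists are terminated by
   LIM_REG_CLASSES.  */
static const enum toy_reg_class toy_subclasses[LIM_REG_CLASSES][LIM_REG_CLASSES] = {
  [NO_REGS] = { LIM_REG_CLASSES },
  [MD_REGS] = { ACC_REGS, LIM_REG_CLASSES },
  [ACC_REGS] = { MD_REGS, LIM_REG_CLASSES },
};

/* Old test: a class is a leaf only if its subclass list is empty.  */
bool
toy_is_leaf_old (enum toy_reg_class cl)
{
  assert (cl < LIM_REG_CLASSES);
  return !(toy_subclasses[cl][0] != LIM_REG_CLASSES);
}

/* New test: a class is a leaf unless it has a lower-numbered subclass.
   An empty list starts with LIM_REG_CLASSES, which is never below CL,
   so true leaves still qualify.  */
bool
toy_is_leaf_new (enum toy_reg_class cl)
{
  assert (cl < LIM_REG_CLASSES);
  return !(toy_subclasses[cl][0] < cl);
}
```

Under the old test neither class counts as a leaf. Under the new test
MD_REGS, the lowest-numbered of the two synonymous classes, does, while
ACC_REGS does not.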

Tested on x86_64-linux-gnu and mipsisa64-elf.  Also tested by making sure
that there were no changes in cc1 .ii files for x86_64-linux-gnu.
OK to install?


gcc/
* ira.c (setup_pressure_classes): Handle synonymous classes.

Index: gcc/ira.c
===
--- gcc/ira.c   2012-09-05 21:33:00.115078796 +0100
+++ gcc/ira.c   2012-09-05 22:01:05.747029786 +0100
@@ -789,7 +789,7 @@ setup_pressure_classes (void)
 hard registers and movement between them is costly
 (e.g. SPARC FPCC registers).  We still should consider it
 as a candidate for a pressure class.  */
- && alloc_reg_class_subclasses[cl][0] != LIM_REG_CLASSES)
+ && alloc_reg_class_subclasses[cl][0] < cl)
{
  /* Check that the moves between any hard registers of the
 current class are not more expensive for a legal mode
