Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-17 Thread Richard Sandiford
Vladimir Makarov  writes:
> On 04/10/2012 09:35 AM, Richard Sandiford wrote:
>> Hi Vlad,
>>
>> Back in December, when we were still very much in stage 3, I sent
>> an RFC about an alternative implementation of -fsched-pressure.
>> Just wanted to send a reminder now that we're in the proper stage:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html
>>
>> Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
>> reasonable results.  (I mentioned bad Power 7 results in that message,
>> because of the way the VSX_REGS class is handled.  Ulrich's results
>> are without -mvsx though.)
>>
>> The condition I originally set myself was that this patch should only
>> go in if it becomes the default on at least one architecture,
>> specifically ARM.  Ulrich tells me that Linaro have now made it
>> the default for ARM in their GCC 4.7 release, so hopefully Ramana
>> would be OK with doing the same in upstream 4.8.
>>
>> I realise the whole thing is probably more complicated and ad-hoc
>> than you'd like.  Saying it can't go in is a perfectly acceptable
>> answer IMO.
>>
> I have mixed feelings about the patch.  I've tried it on SPEC2000 on
> x86/x86-64 and ARM.  The model algorithm generates bigger code, by up to
> 3.5% (SPECFP on x86), 2% (SPECFP on x86-64), and 0.23% (SPECFP on ARM),
> in comparison with the current algorithm.

Yeah.  That's expected really, since one of the points of the patch is
to allow more spilling than the current algorithm does.

One of the main problems I was seeing with the current algorithm was
that it was too conservative, and prevented spilling even when it would
be beneficial.  This included spilling outside loops.  E.g. if you have
14 available GPRs, as on ARM, and have 10 pseudos live across a loop
but not used within it, the current algorithm uses 10 registers as the
starting pressure when scheduling the loop.  So the current algorithm
tries hard to avoid spilling those 10 registers, even if that restricts
the amount of reordering within the loop.  The new algorithm was supposed
to allow such registers to be spilled, but those extra spills would
increase code size.
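
To put numbers on that example, here is a minimal sketch of the
arithmetic.  The function name is made up for illustration and is not
part of the patch or of the scheduler:

```c
#include <assert.h>

/* Registers left for reordering inside the loop once the pseudos that
   the scheduler insists on keeping in registers are counted as occupied.
   Hypothetical helper, for illustration only.  */
int loop_headroom(int available_gprs, int kept_in_regs)
{
    assert(kept_in_regs >= 0 && kept_in_regs <= available_gprs);
    return available_gprs - kept_in_regs;
}
```

With the figures above, loop_headroom (14, 10) leaves 4 registers for the
loop body, whereas spilling all 10 pseudos around the loop would leave 14,
at the price of the extra spill code.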

> It is slower too.  Although the difference is quite insignificant on
> Corei7, the compiler slowdown reaches 0.4% on SPECFP2000 on ARM.

Hmm, that's not good.

> The algorithm also generates slower code on x86 (1.5% on SPECINT and
> 5% on SPECFP2000) and practically the same average code on x86-64 and
> ARM (I've tried only SPECINT on ARM).

Yeah, that's underwhelming too.  It looks like there's a danger that
this would become an A8- and A9-specific pass, so we would really need
better ARM results than that.

> On the other hand, I don't think that 1st insn scheduling will ever be
> used for x86.  And although the SPECFP2000 rate is the same on x86-64 I
> saw that some SPECFP2000 tests benefit from your algorithm on x86-64 
> (one amazing difference is 70% improvement on swim on x86-64 although it 
> might be because of different reasons like alignment or cache 
> behaviour).  So I think the algorithm might work better on processors 
> with more registers.

Notwithstanding that this is a geomean, I assume there were some bad
results to cancel out the gain?
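
A rough sketch of the arithmetic behind that question.  The helper names
are invented for this illustration and nothing here comes from the SPEC
tools; only the 70% figure is taken from the message above:

```c
#include <assert.h>
#include <stddef.h>

/* Product of per-benchmark ratios (new score / old score).  The
   geometric mean is the n-th root of this product, so the geomean is
   1.0 exactly when the product is 1.0.  */
double ratio_product(const double *ratios, size_t n)
{
    double product = 1.0;
    for (size_t i = 0; i < n; i++)
        product *= ratios[i];
    return product;
}

/* Combined ratio that all the remaining benchmarks must have for the
   overall geomean to stay at 1.0 when one benchmark has ratio OUTLIER.  */
double offsetting_product(double outlier)
{
    assert(outlier > 0.0);
    return 1.0 / outlier;
}
```

For an outlier of 1.70, the remaining tests would need a combined ratio
of about 0.59 for the overall rate to come out unchanged, however that
loss happens to be spread across them.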

> As for the patch itself, I think you should document the option in
> doc/invoke.texi.  It is missing.

I forgot to say, but that was deliberate.  I see this as a developer
option rather than a user option.  The important thing was to allow
backends to default to the new algorithm if they want to.  Providing
command-line control gives developers an easier way of seeing which
default makes sense, but I don't think the option should be advertised
to users.

> Another minor mistake I found is one line of garbage (I guess from
> -fira-algorithm) in the description of -fsched-pressure-algorithm in
> common.opt.

Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.

Richard


Re: [patch] Remove strange case cost code

2012-04-17 Thread Richard Guenther
On Tue, Apr 17, 2012 at 8:49 AM, Jan Hubicka  wrote:
>> Hello,
>>
>> There is code in stmt.c since the initial checkin, that tries to
>> balance a switch tree according to some ascii heuristics. I see a
>> couple of problems with this code:
>>
>> 1. It doesn't seem to help much. With the attached patch to remove the
>> code, I see no compile time changes to e.g. compile GCC itself.
>>
>> 2. It isn't clear what the heuristic is based on (no reference to any
>> testing done, or a reference to a book or paper).
>>
>> 3. The heuristic is applied for case values in the range <-1,127>
>> (inclusive) even if the type of the switch expression isn't char or
>> int but e.g. an enum. This results in funny application of this
>> heuristic in GCC itself to e.g. some cases of enum rtx_code and enum
>> tree_code.
>
> Note that it would make a lot of sense to teach this heuristic to predict.c
> and properly identify chars.

Indeed this would be the proper place to implement this logic.

> Also it is possible to get histograms from profile feedback into
> switch expansion. I always wanted to do that once switch expansion code
> is cleaned up and moved to gimple level...

Indeed.  At least the parts that expand switch stmts to (balanced) trees
should be moved to the GIMPLE level, retaining only the table-jump-like
expansions as switch stmts.

>>
>>
>> The attached patch removes the heuristic.
>>
>> Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?

Ok.

Thanks,
Richard.

>> Ciao!
>> Steven
>
>


Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
> > Note that it would make a lot of sense to teach this heuristic to predict.c
> > and properly identify chars.
> 
> Indeed this would be the proper place to implement this logic.

To a degree - switch expansion needs more info than it can obtain from the
edge profile.  Having
switch
  case 1,3,5,7,9,11: aaa
  case 2,4,6,8,10,12: bbb
to produce a well balanced decision tree, it is not enough to know how
often the value is even and how often it is odd...

Thus there is a need for value histograms.
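
A small sketch of the difference, assuming the intent of the example is
that odd values go to aaa and even values to bbb.  The structures and the
record function are hypothetical and are not GCC's profiling interfaces:

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CASE = 12 };

/* What an edge profile records: one counter per switch destination.  */
struct edge_profile {
    unsigned long to_aaa, to_bbb, to_default;
};

/* What a value histogram records: one counter per case value.  */
struct value_histogram {
    unsigned long count[MAX_CASE + 1];
    unsigned long other;
};

/* Record one execution of the switch with controlling value V.  */
void record(struct edge_profile *e, struct value_histogram *h, int v)
{
    assert(e != NULL && h != NULL);
    if (v >= 1 && v <= MAX_CASE) {
        h->count[v]++;
        if (v % 2)
            e->to_aaa++;    /* odd values reach aaa */
        else
            e->to_bbb++;    /* even values reach bbb */
    } else {
        h->other++;
        e->to_default++;
    }
}
```

After a run, the edge profile can only say how often aaa was reached; the
histogram also says which of 1, 3, 5, ... got it there, and that is what
decides which comparisons belong near the root of the decision tree.
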
> 
> > Also it is possible to get histograms from profile feedback into
> > switch expansion. I always wanted to do that once switch expansion code
> > is cleaned up and moved to gimple level...
> 
> Indeed.  At least the parts that expand switch stmts to (balanced) trees
> should be moved to the GIMPLE level, retaining only the table-jump-like
> expansions as switch stmts.

Yep.
Honza
> 
> >>
> >>
> >> The attached patch removes the heuristic.
> >>
> >> Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?
> 
> Ok.
> 
> Thanks,
> Richard.
> 
> >> Ciao!
> >> Steven
> >
> >


Re: [patch] Remove strange case cost code

2012-04-17 Thread Paolo Bonzini
Il 17/04/2012 10:45, Richard Guenther ha scritto:
> > Also it is possible to get histograms from profile feedback into
> > switch expansion. I always wanted to do that once switch expansion code
> > is cleaned up and moved to gimple level...
> 
> Indeed.  At least the parts that expand switch stmts to (balanced) trees
> should be moved to the GIMPLE level, retaining only the table-jump-like
> expansions as switch stmts.

This would also make it much easier to drop the range checking from
switch statements (VRP would just fold those away).  Also, targets could
choose between casesi and tablejump.  ARM can benefit from that.

Paolo


Re: [PATCH] Prevent 'control reaches end of non-void function' warning for DO_WHILE

2012-04-17 Thread Tom de Vries
On 16/04/12 16:23, Jason Merrill wrote:
> On 04/14/2012 05:43 PM, Tom de Vries wrote:
>>  +  tree expr = NULL;
>>  +  append_to_statement_list (*block,&expr);
>>  +  *block = expr;

  Rather than doing this dance here, I think it would be better to enhance
  append_to_statement_list to handle the case of the list argument being a
  non-list.

>> Added return value to append_to_statement_list, so now it's:
>>
>> *block = append_to_statement_list (*block, NULL);
> 
> That's different from what I was suggesting; if the list argument is a
> pointer to a non-list, we can build up a list for it at that time, so we
> don't need the
> 
>> +  *block = append_to_statement_list (*block, NULL);
> 
> line at all; when we see
> 
>> +  append_to_statement_list (build1 (LABEL_EXPR, void_type_node, label),
>> +block);
> 
> if *block isn't a STATEMENT_LIST we just make the necessary adjustments.
> 

I see. Patch adapted, bootstrapped and reg-tested on x86_64.
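
For readers who want the idea without the GCC tree machinery, here is a
self-contained sketch.  The node type and helpers are toy stand-ins
invented for this illustration; the real change is the tree-iterator.c
hunk in the patch that follows:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for GCC trees: a node is either a single statement or a
   list of statements.  */
enum kind { STMT, LIST };

struct node {
    enum kind kind;
    int id;               /* payload for STMT nodes */
    struct node **items;  /* elements for LIST nodes */
    size_t len;
};

static struct node *new_list(void)
{
    struct node *l = calloc(1, sizeof *l);
    if (l)
        l->kind = LIST;
    return l;
}

static int push(struct node *list, struct node *item)
{
    struct node **grown =
        realloc(list->items, (list->len + 1) * sizeof *grown);
    if (!grown)
        return -1;
    grown[list->len++] = item;
    list->items = grown;
    return 0;
}

/* Append T to *LIST_P.  If *LIST_P is empty or holds a single non-list
   node, wrap it in a fresh list first so that callers never have to.
   Returns 0 on success, -1 on allocation failure.  */
int append(struct node *t, struct node **list_p)
{
    assert(t != NULL && list_p != NULL);
    struct node *cur = *list_p;
    if (cur == NULL || cur->kind != LIST) {
        struct node *list = new_list();
        if (!list)
            return -1;
        if (cur && push(list, cur) != 0) {
            free(list);
            return -1;
        }
        *list_p = list;
        cur = list;
    }
    return push(cur, t);
}
```

A caller holding a single statement in block can then just call
append (&label, &block) and end up with a two-element list, which is the
behaviour finish_bc_block relies on.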

ok for trunk?

Thanks,
- Tom

> Jason

2012-04-17  Tom de Vries  

* tree-iterator.c (append_to_statement_list_1): Handle case that *list_p
is not a STMT_LIST.

* cp-gimplify.c (begin_bc_block): Add location parameter and use as
location argument to create_artificial_label.
(finish_bc_block): Change return type to void.  Remove body_seq
parameter, and add block parameter.  Append label to STMT_LIST and
return in block.
(gimplify_cp_loop, gimplify_for_stmt, gimplify_while_stmt)
(gimplify_do_stmt, gimplify_switch_stmt): Remove function.
(genericize_cp_loop, genericize_for_stmt, genericize_while_stmt)
(genericize_do_stmt, genericize_switch_stmt, genericize_continue_stmt)
(genericize_break_stmt, genericize_omp_for_stmt): New function.
(cp_gimplify_omp_for): Remove bc_continue processing.
(cp_gimplify_expr): Genericize VEC_INIT_EXPR.
(cp_gimplify_expr): Mark FOR_STMT, WHILE_STMT, DO_STMT, SWITCH_STMT,
CONTINUE_STMT, and BREAK_STMT as unreachable.
(cp_genericize_r): Genericize FOR_STMT, WHILE_STMT, DO_STMT,
SWITCH_STMT, CONTINUE_STMT, BREAK_STMT and OMP_FOR.
(cp_genericize_tree): New function, factored out of ...
(cp_genericize): ... this function.

* g++.dg/pr51264-4.C: New test.
Index: gcc/tree-iterator.c
===
--- gcc/tree-iterator.c (revision 185028)
+++ gcc/tree-iterator.c (working copy)
@@ -74,6 +74,13 @@ append_to_statement_list_1 (tree t, tree
 	}
   *list_p = list = alloc_stmt_list ();
 }
+  else if (TREE_CODE (list) != STATEMENT_LIST)
+{
+  tree first = list;
+  *list_p = list = alloc_stmt_list ();
+  i = tsi_last (list);
+  tsi_link_after (&i, first, TSI_CONTINUE_LINKING);
+}
 
   i = tsi_last (list);
   tsi_link_after (&i, t, TSI_CONTINUE_LINKING);
Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c (revision 185028)
+++ gcc/cp/cp-gimplify.c (working copy)
@@ -34,6 +34,11 @@ along with GCC; see the file COPYING3.
 #include "flags.h"
 #include "splay-tree.h"
 
+/* Forward declarations.  */
+
+static tree cp_genericize_r (tree *, int *, void *);
+static void cp_genericize_tree (tree*);
+
 /* Local declarations.  */
 
 enum bc_t { bc_break = 0, bc_continue = 1 };
@@ -45,37 +50,36 @@ static tree bc_label[2];
 /* Begin a scope which can be exited by a break or continue statement.  BC
indicates which.
 
-   Just creates a label and pushes it into the current context.  */
+   Just creates a label with location LOCATION and pushes it into the current
+   context.  */
 
 static tree
-begin_bc_block (enum bc_t bc)
+begin_bc_block (enum bc_t bc, location_t location)
 {
-  tree label = create_artificial_label (input_location);
+  tree label = create_artificial_label (location);
   DECL_CHAIN (label) = bc_label[bc];
   bc_label[bc] = label;
   return label;
 }
 
 /* Finish a scope which can be exited by a break or continue statement.
-   LABEL was returned from the most recent call to begin_bc_block.  BODY is
+   LABEL was returned from the most recent call to begin_bc_block.  BLOCK is
an expression for the contents of the scope.
 
If we saw a break (or continue) in the scope, append a LABEL_EXPR to
-   body.  Otherwise, just forget the label.  */
+   BLOCK.  Otherwise, just forget the label.  */
 
-static gimple_seq
-finish_bc_block (enum bc_t bc, tree label, gimple_seq body)
+static void
+finish_bc_block (tree *block, enum bc_t bc, tree label)
 {
   gcc_assert (label == bc_label[bc]);
 
   if (TREE_USED (label))
-{
-  gimple_seq_add_stmt (&body, gimple_build_label (label));
-}
+append_to_statement_list (build1 (LABEL_EXPR, void_type_node, label),
+			  block);
 
   bc_label[bc] = DECL_CHAIN (label);
   DECL_CHAIN (label) = NULL_TREE;
- 

Re: [PATCH] Dissociate store_expr's temp from exp so that it is not marked as addressable

2012-04-17 Thread Martin Jambor
Hi,

On Thu, Apr 12, 2012 at 07:21:12PM +0200, Eric Botcazou wrote:
> > Well, the commit did not add a testcase and when I looked up the patch
> > in the mailing list archive
> > (http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01449.html) it said it
> > was fixing problems not reproducible on trunk so it's basically
> > impossible for me to evaluate whether it is still necessary by some
> > simple testing.  Having said that, I guess I can give it a round of
> > regular testing on all the platforms I have currently set up.
> 
> The problem was that, for the same address, you had the alias set of the type 
> on one MEM and the alias set of the reference on the other MEM.  If the alias 
> set of the reference doesn't conflict with that of the type (this can happen 
> in Ada because of DECL_NONADDRESSABLE_P), the RAW dependency may be missed.
> 
> If we don't put the alias set of the reference on one of the MEM, then I
> don't think that we need to put it on the other MEM.  That's what's done
> for the first, non-bitfield temporary now.
> 
> > 2012-04-10  Martin Jambor  
> >
> > * expr.c (expand_expr_real_1): Pass type, not the expression, to
> > set_mem_attributes for a memory temporary. Do not call the function
> > for the memory temporary created for a bitfield.
> 
> Fine with me, but the now dangling code in the bitfield case is a bit 
> annoying.

In order to alleviate that feeling, I'd like to propose the following
patch, which I have successfully bootstrapped and tested on
x86_64-linux (including Ada and obj-c++), i686-linux (likewise),
sparc64-linux (with Ada but without Java), ia64-linux (default
languages, i.e. without Ada) and ppc64-linux (likewise).  Testsuite
run (no bootstrap) on hppa-linux (C and C++ only) is still running and
I expect to have results tomorrow.

Thus, OK for trunk?

Thanks,

Martin


2012-04-16  Martin Jambor  

* expr.c (expand_expr_real_1): Remove setting parent's alias set for
temporaries created for a bitfield (reverting revision 122014).

Index: src/gcc/expr.c
===
--- src.orig/gcc/expr.c
+++ src/gcc/expr.c
@@ -9866,19 +9866,11 @@ expand_expr_real_1 (tree exp, rtx target
   necessarily be constant.  */
if (mode == BLKmode)
  {
-   HOST_WIDE_INT size = GET_MODE_BITSIZE (ext_mode);
rtx new_rtx;
 
-   /* If the reference doesn't use the alias set of its type,
-  we cannot create the temporary using that type.  */
-   if (component_uses_parent_alias_set (exp))
- {
-   new_rtx = assign_stack_local (ext_mode, size, 0);
-   set_mem_alias_set (new_rtx, get_alias_set (exp));
- }
-   else
- new_rtx = assign_stack_temp_for_type (ext_mode, size, 0, type);
-
+   new_rtx = assign_stack_temp_for_type (ext_mode,
+  GET_MODE_BITSIZE (ext_mode),
+  0, type);
emit_move_insn (new_rtx, op0);
op0 = copy_rtx (new_rtx);
PUT_MODE (op0, BLKmode);



Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Ilya Enkovich
>>
>> It has nothing but defines for Android. It did not move any existing
>> code to this file.
>>
>
> Adding linux-common.h to i386 backend needs approval from
> i386 backend maintainer.   If a patch also adds Android support,
> i386 backend maintainer may not feel comfortable to review it.
> However, if you simply add linux-common.h with XXX_SPEC,
> i386 backend maintainer can review it easily.
>
> --
> H.J.

All XXX_SPEC in linux-common.h are Android related. I also believe
that my Android specific changes need i386 backend maintainer approval
anyway because wrong Android support implementation may break other
targets.

Could one of the maintainers please tell me whether this patch needs to
be split?

Thanks,
Ilya


Re: [patch] Remove strange case cost code

2012-04-17 Thread Steven Bosscher
On Tue, Apr 17, 2012 at 10:45 AM, Richard Guenther wrote:
>> Also it is possible to get histograms from profile feedback into
>> switch expansion. I always wanted to do that once switch expansion code
>> is cleaned up and moved to gimple level...
>
> Indeed.  At least the parts that expand switch stmts to (balanced) trees
> should be moved to the GIMPLE level, retaining only the table-jump-like
> expansions as switch stmts.

My goal for GCC 4.8 is to do just that: Move switch expansion to
GIMPLE and add value profiling for switch expressions. I may put back
that heuristic as a branch predictor, but I doubt it makes much of a
difference. Besides, it is actually hard to figure out whether a
switch expression is for characters in an ascii string because char is
promoted to int.
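
At the source level the promotion can be seen with a few lines of C11.
This only illustrates the language rule (the integer promotions are
applied to a switch controlling expression); it says nothing about how
GCC represents the expression internally, and the function names are
made up:

```c
#include <assert.h>

/* The sketch assumes int is wider than char, which is the usual case.  */
static_assert(sizeof(int) > sizeof(char), "int must be wider than char");

/* Returns 1: the type of a plain char object is char.  */
int plain_is_char(char c)
{
    return _Generic(c, char: 1, default: 0);
}

/* Returns 1: unary plus applies the integer promotions, the same
   conversion applied to a switch controlling expression, so the
   resulting type is int.  */
int promoted_is_int(char c)
{
    return _Generic(+c, int: 1, default: 0);
}
```

So by the time the value is being switched on, its type alone no longer
says that it started out as a char.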

Ciao!
Steven


[C++ Patch] PR 52599

2012-04-17 Thread Paolo Carlini

Hi,

in order to avoid this ICE on invalid, I think it makes sense to
explicitly check for a try-block in massage_constexpr_body, since 7.1.5/4
explicitly rules out such a function-body for constexpr constructors.


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2012-04-17  Paolo Carlini  

PR c++/52599
* semantics.c (massage_constexpr_body): Check for function-try-block
as constructor function-body.

/testsuite
2012-04-17  Paolo Carlini  

PR c++/52599
* g++.dg/cpp0x/constexpr-ctor10.C: New.
Index: testsuite/g++.dg/cpp0x/constexpr-ctor10.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
@@ -0,0 +1,6 @@
+// PR c++/52599
+// { dg-options -std=c++11 }
+
+struct foo {
+  constexpr foo() try { } catch(...) { };  // { dg-error "constructor" }
+};
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 186523)
+++ cp/semantics.c  (working copy)
@@ -6001,8 +6001,18 @@ static tree
 massage_constexpr_body (tree fun, tree body)
 {
   if (DECL_CONSTRUCTOR_P (fun))
-body = build_constexpr_constructor_member_initializers
-  (DECL_CONTEXT (fun), body);
+{
+  if (TREE_CODE (body) == BIND_EXPR
+ && TREE_CODE (BIND_EXPR_BODY (body)) == TRY_BLOCK)
+   {
+ error ("body of %<constexpr%> constructor cannot be a "
+"function-try-block");
+ return error_mark_node;
+   }
+
+  body = build_constexpr_constructor_member_initializers
+   (DECL_CONTEXT (fun), body);
+}
   else
 {
   if (TREE_CODE (body) == EH_SPEC_BLOCK)


[PATCH] Fix loop bound computation based on undefined behavior

2012-04-17 Thread Richard Guenther

Loop bound computation uses undefined behavior when accessing arrays
outside of their domain.  Unfortunately, while it tries to account for
trailing arrays in allocated storage, its implementation
is broken (for one, it does consider a TYPE_DECL after the array
as a sign that the array is not at struct end).  The following patch
moves array_at_struct_end_p to expr.c near its natural user
(it's also used by graphite) and re-implements it.  It also adjusts
array_ref_up_bound to not return any bound in the case of an
access to a trailing array - at present what it returns is a
conservative answer in the wrong sense in two of its four callers
(it returns a lower bound for the upper bound).  Given that
array_ref_low_bound returns an exact answer, not returning any
lower / upper bound for the upper bound, but only what we would
consider exact, sounds like the most reasonable solution.
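
For reference, the kind of code this is about looks roughly like the
following.  The struct and function are invented for illustration; the
idiom is formally outside ISO C, but it is long-standing and is what
array_at_struct_end_p is meant to recognise:

```c
#include <assert.h>
#include <stdlib.h>

/* The declared domain of data is [0, 0], but the struct is
   over-allocated so that more elements follow it in memory.  */
struct packet {
    size_t len;
    int data[1];    /* trailing array, last field of the struct */
};

/* Allocate a packet with room for N elements and fill them in.  */
struct packet *packet_new(size_t n)
{
    assert(n > 0);
    struct packet *p = malloc(sizeof *p + (n - 1) * sizeof p->data[0]);
    if (!p)
        return NULL;
    p->len = n;
    /* This loop runs past the declared bound of the array.  Deriving an
       iteration count from that bound would be wrong here.  */
    for (size_t i = 0; i < n; i++)
        p->data[i] = (int) i;
    return p;
}
```

Here p is a pointer rather than a declared object, so the base of the
reference is not a DECL and the array really can be larger than its type
says.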

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This does not yet fully recover bootstrap if you make use of
undefined behavior loop bound detection in VRP, but two miscompiles
of GCC vanish.

Richard.

2012-04-17  Richard Guenther  

* tree-flow.h (array_at_struct_end_p): Move declaration ...
* tree.h (array_at_struct_end_p): ... here.
* tree-ssa-loop-niter.c (idx_infer_loop_bounds): Infer nothing
from array references at struct ends.
(array_at_struct_end_p): Move ...
* expr.c (array_at_struct_end_p): ... here.  Rewrite.
(array_ref_up_bound): Return NULL_TREE for array references
at struct ends.

Index: gcc/tree.h
===
*** gcc/tree.h  (revision 186496)
--- gcc/tree.h  (working copy)
*** extern bool contains_packed_reference (c
*** 5068,5073 
--- 5068,5075 
  
  extern tree array_ref_element_size (tree);
  
+ bool array_at_struct_end_p (tree);
+ 
  /* Return a tree representing the lower bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
Index: gcc/expr.c
===
*** gcc/expr.c  (revision 186496)
--- gcc/expr.c  (working copy)
*** array_ref_low_bound (tree exp)
*** 6778,6783 
--- 6778,6820 
return build_int_cst (TREE_TYPE (TREE_OPERAND (exp, 1)), 0);
  }
  
+ /* Returns true if REF is an array reference to an array at the end of
+a structure.  If this is the case, the array may be allocated larger
+than its upper bound implies.  */
+ 
+ bool
+ array_at_struct_end_p (tree ref)
+ {
+   if (TREE_CODE (ref) != ARRAY_REF
+   && TREE_CODE (ref) != ARRAY_RANGE_REF)
+ return false;
+ 
+   while (handled_component_p (ref))
+ {
+   /* If the reference chain contains a component reference to a
+  non-union type and there follows another field the reference
+is not at the end of a structure.  */
+   if (TREE_CODE (ref) == COMPONENT_REF
+ && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)
+   {
+ tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));
+ while (nextf && TREE_CODE (nextf) != FIELD_DECL)
+   nextf = DECL_CHAIN (nextf);
+ if (nextf)
+   return false;
+   }
+ 
+   ref = TREE_OPERAND (ref, 0);
+ }
+ 
+   /* If the reference is based on a declared entity, the size of the array
+  is constrained by its given domain.  */
+   if (DECL_P (ref))
+ return false;
+ 
+   return true;
+ }
+ 
  /* Return a tree representing the upper bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
*** array_ref_up_bound (tree exp)
*** 6789,6795 
/* If there is a domain type and it has an upper bound, use it, substituting
   for a PLACEHOLDER_EXPR as needed.  */
if (domain_type && TYPE_MAX_VALUE (domain_type))
! return SUBSTITUTE_PLACEHOLDER_IN_EXPR (TYPE_MAX_VALUE (domain_type), exp);
  
/* Otherwise fail.  */
return NULL_TREE;
--- 6826,6843 
/* If there is a domain type and it has an upper bound, use it, substituting
   for a PLACEHOLDER_EXPR as needed.  */
if (domain_type && TYPE_MAX_VALUE (domain_type))
! {
!   tree max = TYPE_MAX_VALUE (domain_type);
! 
!   /* For references to arrays at the end of dynamically allocated
!  structures TYPE_MAX_VALUE is not an upper bound for the array
!size.  */
!   if (TREE_CODE (max) == INTEGER_CST
! && array_at_struct_end_p (exp))
!   return NULL_TREE;
! 
!   return SUBSTITUTE_PLACEHOLDER_IN_EXPR (max, exp);
! }
  
/* Otherwise fail.  */
return NULL_TREE;
Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186496)
--- gcc/tree-flow.h (working copy)
*** tree find_loop_niter (struct loop *, edg
*** 686,692 
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_ni

Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
> On Tue, Apr 17, 2012 at 10:45 AM, Richard Guenther wrote:
> >> Also it is possible to get histograms from profile feedback into
> >> switch expansion. I always wanted to do that once switch expansion code
> >> is cleaned up and moved to gimple level...
> >
> > Indeed.  At least the parts that expand switch stmts to (balanced) trees
> > should be moved to the GIMPLE level, retaining only the table-jump-like
> > expansions as switch stmts.
> 
> My goal for GCC 4.8 is to do just that: Move switch expansion to
> GIMPLE and add value profiling for switch expressions. I may put back
> that heuristic as a branch predictor, but I doubt it makes much of a
> difference. Besides, it is actually hard to figure out whether a

I have my doubts, too; this is why it is not implemented.  Let's see if the
removal changes anything.
Currently the branch prediction code is extraordinarily stupid on switches.
We still could do better - i.e. be able to combine other types of predictions
on them (i.e. switch edge leading to abort() is unlikely) and eventually we
could teach VRP about value range histograms.

> switch expression is for characters in an ascii string because char is
> promoted to int.

Good plan! Can you two synchronize your efforts, please?

Honza
> 
> Ciao!
> Steven


Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-17 Thread Richard Guenther
On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries  wrote:
> On 27/01/12 21:37, Tom de Vries wrote:
>> On 24/01/12 11:40, Richard Guenther wrote:
>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries wrote:
 Richard,
 Jakub,

 the following patch fixes PR51879.

 Consider the following test-case:
 ...
 int bar (int);
 void baz (int);

 void
 foo (int y)
 {
  int a;
  if (y == 6)
    a = bar (7);
  else
    a = bar (7);
  baz (a);
 }
 ...

 after compiling at -O2, the representation looks like this before 
 tail-merging:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_3 = barD.1703 (7);
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_4 = barD.1703 (7);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:1
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1705 (aD.1709_1);
  # VUSE <.MEMD.1714_9>
  return;
 ...

 the patch allows aD.1709_4 to be value numbered to aD.1709_3, and 
 .MEMD.1714_8
 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.

 The patch makes sure non-const/pure call results (gimple_vdef and
 gimple_call_lhs) are properly value numbered.

 Bootstrapped and reg-tested on x86_64.

 ok for stage1?
>>>
>>> The following cannot really work:
>>>
>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>>    if (result)
>>>      {
>>> -      changed = set_ssa_val_to (lhs, result);
>>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>> +      if (vdef)
>>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>>> +      changed |= set_ssa_val_to (lhs, result);
>>>
>>> because 'result' may be not an SSA name.  It might also not have
>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>> is true).  So you at least need to guard things properly.
>>>
>>
>> Right. And that also doesn't work if the function is without lhs, such as
>> in the new test-case pr51879-6.c.
>>
>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>> the vdef from the lhs.
>>
>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>> at some point ...)
>>>
>>
>> Doing so will hurt performance of tail-merging in its current form.
>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>> value numbering as used in tail-merging has its limitations too.
>> Do you have any ideas how to address that one?
>>
>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>           /* ???  We should handle stores from calls.  */
>>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>>             {
>>> +             tree vuse = gimple_vuse (stmt);
>>>               if (!gimple_call_internal_p (stmt)
>>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>                 changed = visit_reference_op_call (lhs, stmt);
>>>               else
>>>                 changed = defs_to_varying (stmt);
>>>
>>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>>> vuse should have been visited here or be part of our SCC, so, why do
>>> you need this check?
>>>
>>
>> Removed now; that was a workaround for a bug in an earlier version of the
>> patch that I didn't need anymore.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> OK for stage1?
>>
>
> Richard,
>
> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
> ...
> I think these fixes hint at that we should
> use "structural" equality as fallback if value-numbering doesn't equate
> two stmt effects.  Thus, treat two stmts with exactly the same operands
> and flags as equal and using value-numbering to canonicalize operands
> (when they are SSA names) for that comparison, or use VN entirely
> if there are no side-effects on the stmt.
>
> Changing value-numbering of virtual operands, even if it looks correct in the
> simple cases you change, doesn't look like a general solution for the missed
> tail merging opportunities.
> ...
>
> The test-case pr51879-6.c shows a case where improving value numbering will
> help tail-merging, but structural equality comparison will not:
> ...
>  # BLOCK 3 freq:1991
>  # PRED: 2 [19.9%]  (true,exec)
>  # .MEM

[Fixinclude]: Fix typo and default to twoprocess on VMS

2012-04-17 Thread Tristan Gingold
Hi,

one-process methodology cannot be used on VMS because fork/pipe/dup2 aren't 
fully supported.  To avoid a build failure, it is therefore better to build 
using two-process methodology.

But, when twoprocess is selected, gcc emits a warning due to a missing 
specifier in printf.  The patch fixes that.
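
The general shape of the problem, as a stand-alone sketch.  The function
below is made up for illustration and only mirrors the format string
touched by the patch: four arguments follow the format, so it needs four
conversions, otherwise the last argument is silently dropped and gcc
warns about too many arguments for the format.

```c
#include <assert.h>
#include <stdio.h>

/* Write " <fix number> '<fix file>' '<source>' '<temp>'" into BUF.
   Returns what snprintf returns: the length the full string needs,
   not counting the terminating null byte.  */
int format_fix_args(char *buf, size_t size, long fix_no,
                    const char *fix_file, const char *source,
                    const char *temp)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, " %ld '%s' '%s' '%s'",
                    fix_no, fix_file, source, temp);
}
```

With only three conversions in the format, the temporary file name would
never reach the generated command line.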

Manually tested on x86_64-darwin by configuring with --enable-twoprocess.

I am pretty sure that fixinclude cannot be used as-is on VMS due to the 
filename convention and missing shell, but at least we can build a cross and a 
native canadian on UNIX.

Ok for trunk ?

Tristan.

fixincludes/
2012-04-17  Tristan Gingold  

* fixincl.c (fix_with_system): Add missing specifier.
* configure.ac: Default to twoprocess on vms.
* configure: Regenerate.

diff --git a/fixincludes/configure.ac b/fixincludes/configure.ac
index e7de791..f1fb2ff 100644
--- a/fixincludes/configure.ac
+++ b/fixincludes/configure.ac
@@ -53,7 +53,8 @@ fi],
i?86-*-msdosdjgpp* | \
i?86-*-mingw32* | \
x86_64-*-mingw32* | \
-   *-*-beos* )
+   *-*-beos* | \
+*-*-*vms*)
TARGET=twoprocess
;;
 
diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 9f399ab..1133534 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -829,7 +829,7 @@ fix_with_system (tFixDesc* p_fixd,
   /*
*  Now add the fix number and file names that may be needed
*/
-  sprintf (pz_scan, " %ld '%s' '%s'",  (long) (p_fixd - fixDescList),
+  sprintf (pz_scan, " %ld '%s' '%s' '%s'", (long) (p_fixd - fixDescList),
   pz_fix_file, pz_file_source, pz_temp_file);
 }
   else /* NOT an "internal" fix: */



Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Uros Bizjak
On Tue, Apr 17, 2012 at 12:16 PM, Ilya Enkovich  wrote:
>>>
>>> It has nothing but defines for Android. It did not move any existing
>>> code to this file.
>>>
>>
>> Adding linux-common.h to i386 backend needs approval from
>> i386 backend maintainer.   If a patch also adds Android support,
>> i386 backend maintainer may not feel comfortable to review it.
>> However, if you simply add linux-common.h with XXX_SPEC,
>> i386 backend maintainer can review it easily.
>>
>> --
>> H.J.
>
> All XXX_SPEC in linux-common.h are Android related. I also believe
> that my Android specific changes need i386 backend maintainer approval
> anyway because wrong Android support implementation may break other
> targets.

+#undef  ENDFILE_SPEC
+#define ENDFILE_SPEC \
+  GNU_USER_TARGET_MATHFILE_SPEC " " \
+  GNU_USER_TARGET_ENDFILE_SPEC

Where is GNU_USER_TARGET_ENDFILE_SPEC defined?

Uros.


Re: [C++ Patch] PR 52599

2012-04-17 Thread Jason Merrill
I think build_constexpr_constructor_member_initializers is a better 
place for that check, since it's already looking at the tree structure.


Jason


Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Uros Bizjak
On Tue, Apr 17, 2012 at 3:16 PM, Uros Bizjak  wrote:
> On Tue, Apr 17, 2012 at 12:16 PM, Ilya Enkovich  
> wrote:

>>>> It has nothing but defines for Android. It did not move any existing
>>>> code to this file.

>>>
>>> Adding linux-common.h to i386 backend needs approval from
>>> i386 backend maintainer.   If a patch also adds Android support,
>>> i386 backend maintainer may not feel comfortable to review it.
>>> However, if you simply add linux-common.h with XXX_SPEC,
>>> i386 backend maintainer can review it easily.
>>>
>>> --
>>> H.J.
>>
>> All XXX_SPEC in linux-common.h are Android related. I also believe
>> that my Android specific changes need i386 backend maintainer approval
>> anyway because wrong Android support implementation may break other
>> targets.
>
> +#undef  ENDFILE_SPEC
> +#define ENDFILE_SPEC \
> +  GNU_USER_TARGET_MATHFILE_SPEC " " \
> +  GNU_USER_TARGET_ENDFILE_SPEC
>
> Where is GNU_USER_TARGET_ENDFILE_SPEC defined?

Oh, I found it.

The patch looks OK to me in the sense that there is no difference for
x86 targets.

So, OK for x86.

Thanks,
Uros.


Re: [PATCH] Fix loop bound computation based on undefined behavior

2012-04-17 Thread Richard Guenther
On Tue, 17 Apr 2012, Richard Guenther wrote:

> 
> Loop bound computation uses undefined behavior when accessing arrays
> outside of their domain.  Unfortunately while it tries to honor
> issues with trailing arrays in allocated storage its implementation
> is broken (for one, it does consider a TYPE_DECL after the array
> as a sign that the array is not at struct end).  The following patch
> moves array_at_struct_end_p to expr.c near its natural user
> (it's also used by graphite) and re-implements it.  It also adjusts
> array_ref_up_bound to not return any bound in the case of an
> access to a trailing array - at present what it returns is a
> conservative answer in the wrong sense in two of its four callers
> (it returns a lower bound for the upper bound).  Given the fact
> that array_ref_low_bound returns an exact answer not returning
> any lower / upper bound for the upper bound but only what we would
> consider exact sounds like the most reasonable solution.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> This does not yet fully recover bootstrap if you make use of
> undefined behavior loop bound detection in VRP, but two miscompiles
> of GCC vanish.

I ended up with the following simplified variant because we have
testcases that test that niter still records an estimate for
a trailing a[5].
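
To make the trailing-array case concrete, here is a minimal sketch in plain
C (not GCC internals).  The struct, its field names and the helper functions
are hypothetical; the only point is that an array declared as the last field
of a heap-allocated record may have more storage behind it than its declared
domain suggests, so the declared bound of 5 is at best an estimate for the
loop, not a hard limit.  Indexing past the declared bound relies on the
traditional trailing-array idiom that GCC accepts; it is not strictly
conforming ISO C.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record whose last field is an array.  The declared
   domain is [0, 4], but callers may allocate more storage behind it.  */
struct packet
{
  int len;
  int a[5];
};

/* Allocate a packet with room for N elements in the trailing array
   (N may exceed 5).  Returns NULL on allocation failure.  */
struct packet *
packet_alloc (int n)
{
  size_t extra = n > 5 ? (size_t) (n - 5) * sizeof (int) : 0;
  struct packet *p = malloc (sizeof (struct packet) + extra);
  if (p)
    p->len = n;
  return p;
}

/* Sum the first P->len elements.  Because P->a is the last field and
   P is not a declared object, the loop may legitimately run more than
   five times.  */
int
packet_sum (const struct packet *p)
{
  int sum = 0;
  for (int i = 0; i < p->len; i++)
    sum += p->a[i];
  return sum;
}
```

If the same struct were a declared variable rather than heap storage, the
array could not be larger than its domain, which corresponds to the DECL_P
check in the patch below.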

Bootstrapped and tested on x86_64-unknown-linux-gnu, committed.

Richard.

2012-04-17  Richard Guenther  

* tree-flow.h (array_at_struct_end_p): Move declaration ...
* tree.h (array_at_struct_end_p): ... here.
* tree-ssa-loop-niter.c (array_at_struct_end_p): Move ...
* expr.c (array_at_struct_end_p): ... here.  Rewrite.

Index: gcc/tree.h
===
*** gcc/tree.h  (revision 186496)
--- gcc/tree.h  (working copy)
*** extern bool contains_packed_reference (c
*** 5068,5073 
--- 5068,5075 
  
  extern tree array_ref_element_size (tree);
  
+ bool array_at_struct_end_p (tree);
+ 
  /* Return a tree representing the lower bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
Index: gcc/expr.c
===
*** gcc/expr.c  (revision 186496)
--- gcc/expr.c  (working copy)
*** array_ref_low_bound (tree exp)
*** 6778,6783 
--- 6778,6820 
return build_int_cst (TREE_TYPE (TREE_OPERAND (exp, 1)), 0);
  }
  
+ /* Returns true if REF is an array reference to an array at the end of
+a structure.  If this is the case, the array may be allocated larger
+than its upper bound implies.  */
+ 
+ bool
+ array_at_struct_end_p (tree ref)
+ {
+   if (TREE_CODE (ref) != ARRAY_REF
+   && TREE_CODE (ref) != ARRAY_RANGE_REF)
+ return false;
+ 
+   while (handled_component_p (ref))
+ {
+   /* If the reference chain contains a component reference to a
+  non-union type and there follows another field the reference
+is not at the end of a structure.  */
+   if (TREE_CODE (ref) == COMPONENT_REF
+ && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)
+   {
+ tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));
+ while (nextf && TREE_CODE (nextf) != FIELD_DECL)
+   nextf = DECL_CHAIN (nextf);
+ if (nextf)
+   return false;
+   }
+ 
+   ref = TREE_OPERAND (ref, 0);
+ }
+ 
+   /* If the reference is based on a declared entity, the size of the array
+  is constrained by its given domain.  */
+   if (DECL_P (ref))
+ return false;
+ 
+   return true;
+ }
+ 
  /* Return a tree representing the upper bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186496)
--- gcc/tree-flow.h (working copy)
*** tree find_loop_niter (struct loop *, edg
*** 686,692 
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_niter_by_eval (struct loop *, edge *);
  void estimate_numbers_of_iterations (bool);
- bool array_at_struct_end_p (tree);
  bool scev_probably_wraps_p (tree, tree, gimple, struct loop *, bool);
  bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple, bool);
  
--- 686,691 
Index: gcc/tree-ssa-loop-niter.c
===
*** gcc/tree-ssa-loop-niter.c   (revision 186496)
--- gcc/tree-ssa-loop-niter.c   (working copy)
*** record_nonwrapping_iv (struct loop *loop
*** 2640,2686 
record_estimate (loop, niter_bound, max, stmt, false, realistic, upper);
  }
  
- /* Returns true if REF is a reference to an array at the end of a dynamically
-allocated structure.  If this is the case, the array may be allocated larger
-than its upper bound implies.  */
- 
- bool
- array_at_struct_end_p (tree ref)
- {
-   tree base = ge

Re: [PATCH] Prevent 'control reaches end of non-void function' warning for DO_WHILE

2012-04-17 Thread Jason Merrill

OK, thanks.

Jason


Re: Change initialization order in sel-sched

2012-04-17 Thread Alexander Monakov

On Wed, 11 Apr 2012, Richard Guenther wrote:

> On Wed, Apr 11, 2012 at 4:16 PM, Bernd Schmidt  
> wrote:
> > The order of calls to sched_rgn_init and sched_init differs between
> > sched-rgn and sel-sched. This caused a scheduler patch I was working on
> > to segfault once sel-sched was enabled. The following patch swaps the
> > two function calls.
> >
> > Bootstrapped & tested on i686-linux. Ok?
> 
> Ok.
> 
> Thanks,
> Richard.

Actually, this causes miscompilations with selective scheduler when
-fsel-sched-pipelining is enabled (as it is with -O3 on ia64).  The reason is,
with that flag we build custom regions that consist of a loop body and its
preheader in sel_find_rgns, which is called from sched_rgn_init.  We require
that sched_init is called afterwards, so that DF data is computed for any new
blocks that might have been created (i.e. preheaders); it's possible that DF
is not the only thing that forces this order.

Bernd, could you elaborate on the segfault you had seen?  Perhaps we could
offer some advice on fixing it then.

In the meanwhile, could you revert your patch?  I'm sorry to point out this
problem after the patch had been committed, but it's not immediately obvious :)
Andrey or I will add an explanatory comment in sel-sched afterwards.

Alexander


[PATCH] Fix PR53011

2012-04-17 Thread Richard Guenther

This fixes PR53011 - EH cleanup needs to cater for loops now
(or avoid some transforms).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

Index: gcc/tree-eh.c
===
*** gcc/tree-eh.c   (revision 186523)
--- gcc/tree-eh.c   (working copy)
*** cleanup_empty_eh_merge_phis (basic_block
*** 3916,3921 
--- 3916,3936 
for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)); )
  if (e->flags & EDGE_EH)
{
+   /* ???  CFG manipulation routines do not try to update loop
+  form on edge redirection.  Do so manually here for now.  */
+   /* If we redirect a loop entry or latch edge that will either create
+  a multiple entry loop or rotate the loop.  If the loops merge
+  we may have created a loop with multiple latches.
+  All of this isn't easily fixed thus cancel the affected loop
+  and mark the other loop as possibly having multiple latches.  */
+   if (current_loops
+   && e->dest == e->dest->loop_father->header)
+ {
+   e->dest->loop_father->header = NULL;
+   e->dest->loop_father->latch = NULL;
+   new_bb->loop_father->latch = NULL;
+   loops_state_set (LOOPS_NEED_FIXUP|LOOPS_MAY_HAVE_MULTIPLE_LATCHES);
+ }
redirect_eh_edge_1 (e, new_bb, change_region);
redirect_edge_succ (e, new_bb);
flush_pending_stmts (e);
Index: gcc/testsuite/g++.dg/torture/pr53011.C
===
*** gcc/testsuite/g++.dg/torture/pr53011.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr53011.C  (revision 0)
***
*** 0 
--- 1,66 
+ // { dg-do compile }
+ 
+ extern "C" class WvFastString;
+ typedef WvFastString& WvStringParm;
+ struct WvFastString {
+   ~WvFastString();
+   operator char* () {}
+ };
+ class WvString : WvFastString {};
+ class WvAddr {};
+ class WvIPAddr : WvAddr {};
+ struct WvIPNet : WvIPAddr {
+   bool is_default() {}
+ };
+ template<class T> struct WvTraits_Helper {
+   static void release(T *obj) {
+ delete obj;
+   }
+ };
+ template<class From> struct WvTraits {
+   static void release(From *obj) {
+ WvTraits_Helper<From>::release(obj);
+   }
+ };
+ struct WvLink {
+   void   *data;
+   WvLink *next;
+   bool   autofree;
+   WvLink(bool, int) : autofree() {}
+   bool get_autofree() {}
+ 
+   void unlink() {
+ delete this;
+   }
+ };
+ struct WvListBase {
+   WvLink head, *tail;
+   WvListBase() : head(0, 0) {}
+ };
+ template<class T> struct WvList : WvListBase {
+   ~WvList() {
+ zap();
+   }
+ 
+   void zap(bool destroy = 1) {
+ while (head.next) unlink_after(&head, destroy);
+   }
+ 
+   void unlink_after(WvLink *after, bool destroy) {
+ WvLink *next = 0;
+ T *obj   = (destroy && next->get_autofree()) ? 
+static_cast<T*>(next->data) : 0;
+ 
+ if (tail) tail = after;
+ next->unlink();
+ WvTraits<T>::release(obj);
+   }
+ };
+ typedef WvList<WvString> WvStringListBase;
+ class WvStringList : WvStringListBase {};
+ class WvSubProc {
+   WvStringList last_args, env;
+ };
+ void addroute(WvIPNet& dest, WvStringParm table) {
+   if (dest.is_default() || (table != "default")) WvSubProc checkProc;
+ }


Re: Change initialization order in sel-sched

2012-04-17 Thread Bernd Schmidt
On 04/17/2012 03:33 PM, Alexander Monakov wrote:
> Bernd, could you elaborate on the segfault you had seen?  Perhaps we could
> offer some advice on fixing it then.

It was only seen with another patch which modified the sched-rgn
initialization code.

> In the meanwhile, could you revert your patch?  I'm sorry to point out this
> problem after the patch had been committed, but it's not immediately obvious :)
> Andrey or I will add an explanatory comment in sel-sched afterwards.

Will revert for now. In general I think it would be better to change
sel-sched so that the init functions can always be called in the same
order, so as to avoid unnecessary surprises.


Bernd


Re: [C++ Patch] PR 53003

2012-04-17 Thread Jason Merrill

I have various thoughts:

It's odd that we still treat 'return' as starting a function body long 
after we removed that extension.


Maybe we shouldn't look for a function body if we already have an 
initializer and aren't dealing with a function declarator.


I guess we should set initializer_token_start for {} initializers as well.

But your patch is certainly the smallest change, and OK.

Jason


[PATCH] Use all bells and whistles for number of iteration analysis in VRP

2012-04-17 Thread Richard Guenther

This patch reverts the change originally done when adding 
number-of-iteration analysis uses to VRP, to have a flag
to toggle whether to derive number of iterations from undefined
behavior.  To be able to do so one error in VRP has to be fixed - we
have to check for the number of stmt executions, not for the number
of latch block executions.  Otherwise we miscompile IRA during bootstrap.
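
The difference between the two counts can be shown with a small sketch in
plain C.  This is only an illustration, not GCC code; the struct and
function names are hypothetical.  A statement that sits before the exit
test runs once more than the latch (the back edge), so a bound on statement
executions is the more conservative of the two.

```c
/* Counts collected for one run of the loop below.  */
struct exec_counts
{
  int header_stmt;  /* executions of a statement before the exit test */
  int latch;        /* executions of the latch (back edge) */
};

/* Run a loop that exits once I reaches N and count how often the
   header statement and the latch execute.  */
struct exec_counts
count_executions (int n)
{
  struct exec_counts c = { 0, 0 };
  for (int i = 0; ; i++)
    {
      c.header_stmt++;   /* statement before the exit test */
      if (i >= n)
        break;
      c.latch++;         /* reached only when the loop iterates again */
    }
  return c;
}
```

For n = 5 the header statement runs six times while the latch runs five
times.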

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

With this I can finally make the max loop bound preserved and
set it from loop version producers.  Yay.

Richard.

2012-04-17  Richard Guenther  

* cfgloop.h (estimate_numbers_of_iterations_loop): Remove
use_undefined_p parameter.
* tree-flow.h (estimate_numbers_of_iterations): Likewise.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations_loop):
Likewise.
(estimate_numbers_of_iterations): Likewise.
(estimated_loop_iterations): Adjust.
(max_loop_iterations): Likewise.
(scev_probably_wraps_p): Likewise.
* tree-ssa-loop.c (tree_ssa_loop_bounds): Likewise.
* tree-vrp.c (adjust_range_with_scev): Use max_stmt_executions,
not max_loop_iterations.
(execute_vrp): Remove explicit number of iterations estimation.

Index: gcc/cfgloop.h
===
*** gcc/cfgloop.h   (revision 186526)
--- gcc/cfgloop.h   (working copy)
*** gcov_type expected_loop_iterations_unbou
*** 278,284 
  extern unsigned expected_loop_iterations (const struct loop *);
  extern rtx doloop_condition_get (rtx);
  
! void estimate_numbers_of_iterations_loop (struct loop *, bool);
  bool estimated_loop_iterations (struct loop *, double_int *);
  bool max_loop_iterations (struct loop *, double_int *);
  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *);
--- 278,284 
  extern unsigned expected_loop_iterations (const struct loop *);
  extern rtx doloop_condition_get (rtx);
  
! void estimate_numbers_of_iterations_loop (struct loop *);
  bool estimated_loop_iterations (struct loop *, double_int *);
  bool max_loop_iterations (struct loop *, double_int *);
  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *);
Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186527)
--- gcc/tree-flow.h (working copy)
*** bool number_of_iterations_exit (struct l
*** 685,691 
  tree find_loop_niter (struct loop *, edge *);
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_niter_by_eval (struct loop *, edge *);
! void estimate_numbers_of_iterations (bool);
  bool scev_probably_wraps_p (tree, tree, gimple, struct loop *, bool);
  bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple, bool);
  
--- 685,691 
  tree find_loop_niter (struct loop *, edge *);
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_niter_by_eval (struct loop *, edge *);
! void estimate_numbers_of_iterations (void);
  bool scev_probably_wraps_p (tree, tree, gimple, struct loop *, bool);
  bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple, bool);
  
Index: gcc/tree-ssa-loop-niter.c
===
*** gcc/tree-ssa-loop-niter.c   (revision 186527)
--- gcc/tree-ssa-loop-niter.c   (working copy)
*** gcov_type_to_double_int (gcov_type val)
*** 2950,2956 
 is true also use estimates derived from undefined behavior.  */
  
  void
! estimate_numbers_of_iterations_loop (struct loop *loop, bool use_undefined_p)
  {
VEC (edge, heap) *exits;
tree niter, type;
--- 2950,2956 
 is true also use estimates derived from undefined behavior.  */
  
  void
! estimate_numbers_of_iterations_loop (struct loop *loop)
  {
VEC (edge, heap) *exits;
tree niter, type;
*** estimate_numbers_of_iterations_loop (str
*** 2984,2991 
  }
VEC_free (edge, heap, exits);
  
!   if (use_undefined_p)
! infer_loop_bounds_from_undefined (loop);
  
/* If we have a measured profile, use it to estimate the number of
   iterations.  */
--- 2984,2990 
  }
VEC_free (edge, heap, exits);
  
!   infer_loop_bounds_from_undefined (loop);
  
/* If we have a measured profile, use it to estimate the number of
   iterations.  */
*** estimate_numbers_of_iterations_loop (str
*** 3013,3019 
  bool
  estimated_loop_iterations (struct loop *loop, double_int *nit)
  {
!   estimate_numbers_of_iterations_loop (loop, true);
if (!loop->any_estimate)
  return false;
  
--- 3012,3018 
  bool
  estimated_loop_iterations (struct loop *loop, double_int *nit)
  {
!   estimate_numbers_of_iterations_loop (loop);
if (!loop->any_estimate)
  return false;
  
*** estimated_loop_iterations (struct loop *
*** 3028,3034 
  bool
  max_loop_iterations (struct loop *loop, doub

Re: RFA: Clean up ADDRESS handling in alias.c

2012-04-17 Thread Richard Guenther
On Sun, Apr 15, 2012 at 5:11 PM, Richard Sandiford
 wrote:
> The comment in alias.c says:
>
>   The contents of an ADDRESS is not normally used, the mode of the
>   ADDRESS determines whether the ADDRESS is a function argument or some
>   other special value.  Pointer equality, not rtx_equal_p, determines whether
>   two ADDRESS expressions refer to the same base address.
>
>   The only use of the contents of an ADDRESS is for determining if the
>   current function performs nonlocal memory memory references for the
>   purposes of marking the function as a constant function.  */
>
> The first paragraph is a bit misleading IMO.  AFAICT, rtx_equal_p has
> always given ADDRESS the full recursive treatment, rather than saying
> that pointer equality determines ADDRESS equality.  (This is in contrast
> to something like VALUE, where pointer equality is used.)  And AFAICT
> we've always had:
>
> static int
> base_alias_check (rtx x, rtx y, enum machine_mode x_mode,
>                  enum machine_mode y_mode)
> {
>  ...
>  /* If the base addresses are equal nothing is known about aliasing.  */
>  if (rtx_equal_p (x_base, y_base))
>    return 1;
>  ...
> }
>
> So I think the contents of an ADDRESS _are_ used to distinguish
> between different bases.
>
> The second paragraph ceased to be true in 2005 when the pure/const
> analysis moved to its own IPA pass.  Nothing now looks at the contents
> beyond rtx_equal_p.
>
> Also, base_alias_check effectively treats all arguments as a single base.
> That makes conceptual sense, because this analysis isn't strong enough
> to determine whether arguments are base values at all, never mind whether
> accesses based on different arguments conflict.  But the fact that we have
> a single base isn't obvious from the way the code is written, because we
> create several separate, non-rtx_equal_p, ADDRESSes to represent arguments.
> See:
>
>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>    /* Check whether this register can hold an incoming pointer
>       argument.  FUNCTION_ARG_REGNO_P tests outgoing register
>       numbers, so translate if necessary due to register windows.  */
>    if (FUNCTION_ARG_REGNO_P (OUTGOING_REGNO (i))
>        && HARD_REGNO_MODE_OK (i, Pmode))
>      static_reg_base_value[i]
>        = gen_rtx_ADDRESS (VOIDmode, gen_rtx_REG (Pmode, i));
>
> and:
>
>      /* Check for an argument passed in memory.  Only record in the
>         copying-arguments block; it is too hard to track changes
>         otherwise.  */
>      if (copying_arguments
>          && (XEXP (src, 0) == arg_pointer_rtx
>              || (GET_CODE (XEXP (src, 0)) == PLUS
>                  && XEXP (XEXP (src, 0), 0) == arg_pointer_rtx)))
>        return gen_rtx_ADDRESS (VOIDmode, src);
>
> I think it would be cleaner and less wasteful to use a single rtx for
> the single "base" (really "potential base").
>
> So if we wanted to, we could now remove the operand from ADDRESS and
> simply rely on pointer equality.  I'm a bit reluctant to do that though.
> It would make debugging harder, and it would mean either adding knowledge
> of this alias-specific code to other files (specifically rtl.c:rtx_equal_p),
> or adding special ADDRESS shortcuts to alias.c.  But I think the code
> would be more obvious if we replaced the rtx operand with a unique id,
> which is what we already use for the REG_NOALIAS case:
>
>      new_reg_base_value[regno] = gen_rtx_ADDRESS (Pmode,
>                                                   GEN_INT (unique_id++));
>
> And if we do that, we can make the id a direct operand of the ADDRESS,
> rather than a CONST_INT subrtx[*].  That should make rtx_equal_p cheaper too.
>
>  [*] I'm trying to get rid of CONST_INTs like these that have
>      no obvious mode.
>
> All of which led to the patch below.  I checked that it didn't change
> the code generated at -O2 for a recent set of cc1 .ii files.  Also
> bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?
>
> To cover my back: I'm just trying to rewrite the current code according
> to its current assumptions.  Whether those assumptions are correct or not
> is always open to debate...

This all looks reasonable and matches what I discovered by reverse
engineering the last time I ran into ADDRESSes ...

So, ok, given that nobody else has commented yet.

Thanks,
Richard.

> Richard
>
>
> gcc/
>        * rtl.def (ADDRESS): Turn operand into a HOST_WIDE_INT.
>        * alias.c (reg_base_value): Expand and update comment.
>        (arg_base_value): New variable.
>        (unique_id): Move up file.
>        (unique_base_value, unique_base_value_p, known_base_value_p): New.
>        (find_base_value): Use arg_base_value and known_base_value_p.
>        (record_set): Document REG_NOALIAS handling.  Use unique_base_value.
>        (find_base_term): Use known_base_value_p.
>        (base_alias_check): Use unique_base_value_p.
>        (init_alias_target): Initialize arg_base_value.  Use unique_base_value.
>        (init_

Re: [Fixinclude]: Fix typo and default to twoprocess on VMS

2012-04-17 Thread Bruce Korb
Hi Tristan,

On Tue, Apr 17, 2012 at 5:57 AM, Tristan Gingold  wrote:
> Hi,
>
> one-process methodology cannot be used on VMS[...]
> But, when twoprocess is selected, gcc emits a warning[...]
> Ok for trunk ?

> diff --git a/fixincludes/configure.ac b/fixincludes/configure.ac
> index e7de791..f1fb2ff 100644
> --- a/fixincludes/configure.ac
> +++ b/fixincludes/configure.ac
> @@ -53,7 +53,8 @@ fi],
>        i?86-*-msdosdjgpp* | \
>        i?86-*-mingw32* | \
>        x86_64-*-mingw32* | \
> -       *-*-beos* )
> +       *-*-beos* | \
> +        *-*-*vms*)
>                TARGET=twoprocess
>                ;;

This, definitely.

> diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
> index 9f399ab..1133534 100644
> --- a/fixincludes/fixincl.c
> +++ b/fixincludes/fixincl.c
> @@ -829,7 +829,7 @@ fix_with_system (tFixDesc* p_fixd,
>       /*
>        *  Now add the fix number and file names that may be needed
>        */
> -      sprintf (pz_scan, " %ld '%s' '%s'",  (long) (p_fixd - fixDescList),
> +      sprintf (pz_scan, " %ld '%s' '%s' '%s'", (long) (p_fixd - fixDescList),
>               pz_fix_file, pz_file_source, pz_temp_file);
>     }
>   else /* NOT an "internal" fix: */

This, almost certainly.  I'll take a peek at the source and convince myself of
this decade old mistake tomorrow & send my grateful thanks and approval then.
(No access to source today.)
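
For readers wondering why the missing specifier went unnoticed for so long:
in C, surplus arguments to the printf family are evaluated and then
silently ignored, so the old call simply dropped the temp file name.  A
minimal sketch follows; the helper name and its parameters are hypothetical
and only mirror the shape of the call in the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Format a fix number and up to three file names into BUF using FMT.
   FMT is a parameter so the old and the new format string can be
   compared with the same argument list.  Returns what snprintf
   returns: the length the full output needs.  */
int
format_fix_args (char *buf, size_t size, const char *fmt, long fix_no,
                 const char *fix_file, const char *source,
                 const char *temp)
{
  return snprintf (buf, size, fmt, fix_no, fix_file, source, temp);
}
```

With " %ld '%s' '%s'" the TEMP argument never appears in the output; with
" %ld '%s' '%s' '%s'" all three names are written.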

Thank you!  Cheers - Bruce


Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Ilya Enkovich
> On Tue, Apr 17, 2012 at 3:16 PM, Uros Bizjak  wrote:
>> On Tue, Apr 17, 2012 at 12:16 PM, Ilya Enkovich  
>> wrote:
>>>>>
>>>>> It has nothing but defines for Android. It did not move any existing
>>>>> code to this file.
>>>>>
>>>>
>>>> Adding linux-common.h to i386 backend needs approval from
>>>> i386 backend maintainer.   If a patch also adds Android support,
>>>> i386 backend maintainer may not feel comfortable to review it.
>>>> However, if you simply add linux-common.h with XXX_SPEC,
>>>> i386 backend maintainer can review it easily.
>>>>
>>>> --
>>>> H.J.
>>>
>>> All XXX_SPEC in linux-common.h are Android related. I also believe
>>> that my Android specific changes need i386 backend maintainer approval
>>> anyway because wrong Android support implementation may break other
>>> targets.
>>
>> +#undef  ENDFILE_SPEC
>> +#define ENDFILE_SPEC \
>> +  GNU_USER_TARGET_MATHFILE_SPEC " " \
>> +  GNU_USER_TARGET_ENDFILE_SPEC
>>
>> Where is GNU_USER_TARGET_ENDFILE_SPEC defined?
>
> Oh, I found it.
>
> The patch looks OK to me in the sense, that there is no difference for
> x86 targets.
>
> So, OK for x86.
>
> Thanks,
> Uros.

Thanks, Uros!

Maxim, could you please look at the patch?

Thanks,
Ilya


CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread Roman Zhuykov
Hello,

I found the following problem while investigating SMS on x86-64.
When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
latencies in data dependency graph become zeros. The global pointer
"insn_default_latency" points to insn_default_latency_none, which
returns zero for any instruction.
This happens because ix86_schedule cpu attribute is set to CPU_NONE for nocona.
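
The mechanism can be sketched in a few lines of plain C.  This is a
simplified model, not the generated GCC code: the enum, the function names
and the latency numbers are all made up for illustration.  The point is
only that a single function pointer is switched according to the selected
scheduling model, and the "none" variant answers zero for every insn.

```c
/* Hypothetical scheduling models; in GCC the real ones come from the
   machine description.  */
enum sched_model { MODEL_NONE, MODEL_GENERIC };

/* Latency query used when no pipeline description is selected.  */
static int
latency_none (int insn_code)
{
  (void) insn_code;
  return 0;
}

/* Latency query for a model with a pipeline description.  The numbers
   are made up.  */
static int
latency_generic (int insn_code)
{
  return insn_code == 0 ? 1 : 3;
}

/* Global hook that the scheduler would call for every dependence.  */
int (*default_latency) (int);

/* Point the hook at the implementation matching MODEL.  */
void
select_model (enum sched_model model)
{
  default_latency = model == MODEL_NONE ? latency_none : latency_generic;
}
```

After select_model (MODEL_NONE), every call through default_latency returns
zero, which is the situation described above for -march=nocona.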

CPU_NONE was introduced by this patch:
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html

I think we don't want any scheduler to work with zero latencies on
such processors (with such -march).
The following patch fixes the problem for my case with -march=nocona.
Is it correct to fix the problem like this?
What to do with 32bit architectures (i386, i486, pentium4, pentium4m,
prescott) ?

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index af4af7c..38d64e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
   PTA_MMX | PTA_SSE | PTA_SSE2},
  {"prescott", PROCESSOR_NOCONA, CPU_NONE,
   PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
-  {"nocona", PROCESSOR_NOCONA, CPU_NONE,
+  {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
   | PTA_CX16 | PTA_NO_SAHF},
  {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
--
Roman Zhuykov


Re: [i386, patch, RFC] HLE support in GCC

2012-04-17 Thread Sergey Ostanevich
>
> Any other inputs?
>

I would suggest using "snprintf" in b/gcc/config/i386/i386-c.c to avoid
a possible buffer overrun.
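
Something along these lines is what I have in mind.  It is only a sketch:
the function name, the buffer and the "%s_%s" format are hypothetical and
not taken from the patch.  snprintf never writes more than SIZE bytes, and
its return value tells the caller whether the output was truncated.

```c
#include <stdio.h>
#include <stddef.h>

/* Write "PREFIX_NAME" into BUF, which holds SIZE bytes.  Returns 0 on
   success and -1 if the result did not fit; in the latter case BUF
   still holds a NUL-terminated, truncated string (when SIZE > 0).  */
int
build_macro_name (char *buf, size_t size, const char *prefix,
                  const char *name)
{
  int n = snprintf (buf, size, "%s_%s", prefix, name);
  if (n < 0 || (size_t) n >= size)
    return -1;
  return 0;
}
```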

I also have a question regarding AS compatibility. If GCC is built
using an AS that supports HLE, then using this GCC on a machine with
an old AS will fail because of the unsupported prefix. Can we support it
at compile time rather than at configure time?

regards,
Sergos


Re: [i386, patch, RFC] HLE support in GCC

2012-04-17 Thread Andi Kleen
> I also have a question regarding AS compatibility. In case one built
> GCC using AS with support of HLE then using this GCC on a machine with
> old AS will cause a failure because of the unsupported prefix. Can we support it

I don't think that's a supported use case for gcc.
It also doesn't work with .cfi* intrinsics and some other things.

> at compile time rather than configure time?

The only way to do that would be to always generate .byte,
but the people who read the assembler output would hate you 
for it.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread H.J. Lu
On Tue, Apr 17, 2012 at 7:35 AM, Roman Zhuykov  wrote:
> Hello,
>
> I found the following problem while investigating SMS on x86-64.
> When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
> latencies in data dependency graph become zeros. The global pointer
> "insn_default_latency" points to insn_default_latency_none, which
> returns zero for any instruction.
> This happens because ix86_schedule cpu attribute is set to CPU_NONE for 
> nocona.
>
> CPU_NONE was introduced by this patch:
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html
>
> I think we don't want any scheduler to work with zero latencies on
> such processors (with such -march).
> The following patch fixes the problem for my case with -march=nocona.
> Is it correct to fix the problem like this?
> What to do with 32bit architectures (i386, i486, pentium4, pentium4m,
> prescott) ?
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index af4af7c..38d64e9 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
>       PTA_MMX | PTA_SSE | PTA_SSE2},
>      {"prescott", PROCESSOR_NOCONA, CPU_NONE,
>       PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
> -      {"nocona", PROCESSOR_NOCONA, CPU_NONE,
> +      {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
>       PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
>       | PTA_CX16 | PTA_NO_SAHF},
>      {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
> --

Should we replace all CPU_NONE with CPU_GENERIC32/CPU_GENERIC64?

-- 
H.J.


[PATCH, rs6000] Remove DImode from SLOW_UNALIGNED_ACCESS

2012-04-17 Thread Pat Haugen

Unlike the float types, DImode references do not suffer a major performance
hit when accessed with less than 4-byte alignment.
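
As a reading aid, the scalar part of the predicate after this change can be
modelled as a small C function.  This is a sketch, not the macro itself:
the enum is a hypothetical stand-in for the machine modes, and the
STRICT_ALIGNMENT and vector-mode terms are left out.  Note that ALIGN is
measured in bits, so "< 32" is the "< 4-byte" case mentioned above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few machine modes.  */
enum mode { SFmode, DFmode, TFmode, SDmode, DDmode, TDmode, DImode, SImode };

/* Model of the scalar-mode test after the patch: only the binary and
   decimal float modes count as slow when the alignment, given in
   bits, is below 32.  */
bool
slow_unaligned_scalar (enum mode m, unsigned align_bits)
{
  bool is_float = (m == SFmode || m == DFmode || m == TFmode
                   || m == SDmode || m == DDmode || m == TDmode);
  return is_float && align_bits < 32;
}
```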

Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?

-Pat


2012-04-17  Pat Haugen 

* config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Remove DImode.

Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h	(revision 186389)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -771,8 +771,7 @@ extern unsigned rs6000_pointer_size;
 #define SLOW_UNALIGNED_ACCESS(MODE, ALIGN)\
   (STRICT_ALIGNMENT			\
|| (((MODE) == SFmode || (MODE) == DFmode || (MODE) == TFmode	\
-	|| (MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode	\
-	|| (MODE) == DImode)		\
+	|| (MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode)	\
&& (ALIGN) < 32)			\
|| (VECTOR_MODE_P ((MODE)) && (((int)(ALIGN)) < VECTOR_ALIGN (MODE
 


Re: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-17 Thread Aldy Hernandez

On 04/13/12 18:22, Boehm, Hans wrote:




-Original Message-
From: Aldy Hernandez [mailto:al...@redhat.com]
Sent: Thursday, April 12, 2012 3:12 PM
To: Richard Guenther
Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel
Subject: [PR tree-optimization/52558]: RFC: questions on store data
race

Here we have a testcase that affects both the C++ memory model and
transactional memory.

[Hans, this is caused by the same problem that is causing the
speculative register promotion issue you and Torvald pointed me at].




In the following testcase (adapted from the PR), the loop invariant
motion pass caches a pre-existing value for g_2, and then performs a
store to g_2 on every path, causing a store data race:

int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
    int l;
    for (l = 0; l < 1234; l++)
    {
      if (g_1)
        return l;
      else
        g_2 = 0;  <-- Store to g_2 should only happen if !g_1
    }
    return 999;
}

This gets transformed into something like:

g_2_lsm = g_2;
if (g_1) {
    g_2 = g_2_lsm;  // boo! hiss!
    return 0;
} else {
    g_2_lsm = 0;
    g_2 = g_2_lsm;
}

The spurious write to g_2 could cause a data race.

Presumably the g_2_lsm = g_2 is actually outside the loop?

Why does the second g_2 = g_2_lsm; get introduced?  I would have expected it 
before the return.  Was the example just over-abbreviated?


There is some abbreviation going on :).  To be exact, this is what -O2 
currently produces for the lim1 pass.


:
  pretmp.4_1 = g_1;
  g_2_lsm.6_12 = g_2;

:
  # l_13 = PHI 
  # g_2_lsm.6_10 = PHI 
  g_1.0_4 = pretmp.4_1;
  if (g_1.0_4 != 0)
goto ;
  else
goto ;

:
  g_2_lsm.6_11 = 0;
  l_6 = l_13 + 1;
  if (l_6 != 1234)
goto ;
  else
goto ;

:
  # g_2_lsm.6_18 = PHI 
  g_2 = g_2_lsm.6_18;
  goto ;

:
  goto ;

:
  # g_2_lsm.6_17 = PHI 
  # l_19 = PHI 
  g_2 = g_2_lsm.6_17;

:
  # l_2 = PHI 
  return l_2;

So yes, there seems to be another write to g_2 inside the else, but 
probably because we have merged some basic blocks along the way.




Other than that, this sounds right to me.  So does Richard's flag-based 
version, which is the approach I would have originally expected.  But that 
clearly costs you a register.  It would be interesting to see how they compare.


I am working on the flag based approach.
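
For reference, this is roughly the shape I expect the flag-based version to
have at the source level.  It is a hand-written sketch of the idea, not
what the pass emits; the names func_1_flagged and g_2_flag are made up.
The store to g_2 is still sunk out of the loop, but it is guarded by a flag
that is only set on paths where the original loop would have stored.

```c
#include <stdbool.h>

int g_1 = 1;
int g_2 = 0;

/* Hand-written equivalent of func_1 with the store to g_2 moved out
   of the loop and guarded by a flag.  */
int
func_1_flagged (void)
{
  int g_2_lsm = 0;        /* value the original loop would have stored */
  bool g_2_flag = false;  /* true once the original loop would have stored */
  int l;

  for (l = 0; l < 1234; l++)
    {
      if (g_1)
        {
          /* Early exit: write back only if a store really happened.  */
          if (g_2_flag)
            g_2 = g_2_lsm;
          return l;
        }
      g_2_lsm = 0;
      g_2_flag = true;
    }

  if (g_2_flag)
    g_2 = g_2_lsm;
  return 999;
}
```

When g_1 is nonzero on entry, g_2 is neither read nor written, so no new
store is introduced on that path.  The cost is the extra flag, which is the
register Hans mentions.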

Thanks to both of you.


Re: CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread Alexander Monakov


On Tue, 17 Apr 2012, H.J. Lu wrote:

> On Tue, Apr 17, 2012 at 7:35 AM, Roman Zhuykov  wrote:
> > Hello,
> >
> > I found the following problem while investigating SMS on x86-64.
> > When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
> > latencies in data dependency graph become zeros. The global pointer
> > "insn_default_latency" points to insn_default_latency_none, which
> > returns zero for any instruction.
> > This happens because ix86_schedule cpu attribute is set to CPU_NONE for nocona.
> >
> > CPU_NONE was introduced by this patch:
> > http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html
> >
> > I think we don't want any scheduler to work with zero latencies on
> > such processors (with such -march).
> > The following patch fixes the problem for my case with -march=nocona.
> > Is it correct to fix the problem like this?
> > What should be done for the 32-bit architectures (i386, i486, pentium4,
> > pentium4m, prescott)?
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index af4af7c..38d64e9 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
> >       PTA_MMX | PTA_SSE | PTA_SSE2},
> >      {"prescott", PROCESSOR_NOCONA, CPU_NONE,
> >       PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
> > -      {"nocona", PROCESSOR_NOCONA, CPU_NONE,
> > +      {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
> >       PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
> >       | PTA_CX16 | PTA_NO_SAHF},
> >      {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
> > --
> 
> Should we replace all CPU_NONE with CPU_GENERIC32/CPU_GENERIC64?

CPU_GENERIC32 had been removed by the 2008 patch Roman was referring to.  Did
you mean CPU_PENTIUMPRO?
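
To make the symptom Roman describes concrete, here is a minimal C sketch of a latency hook reached through a function pointer.  It is not GCC's generated code: the enum, the function names and the latency values are all invented for illustration.  It only shows why every dependence latency collapses to zero once the "none" variant is the one selected.

```c
/* Hypothetical stand-ins for the generated latency functions.  Only the
   idea (a "none" variant returning zero for everything) comes from the
   thread; the names and numbers here are made up.  */
typedef int (*latency_fn) (int insn_code);

static int latency_none (int insn_code)
{
  (void) insn_code;
  return 0;                      /* no scheduling description: always 0 */
}

static int latency_generic (int insn_code)
{
  return insn_code == 0 ? 1 : 3; /* invented latencies */
}

enum sched_cpu { SCHED_CPU_NONE, SCHED_CPU_GENERIC };

/* Pick the latency function for the scheduling model in use.  */
static latency_fn select_latency (enum sched_cpu cpu)
{
  return cpu == SCHED_CPU_NONE ? latency_none : latency_generic;
}
```

Any scheduler that builds its dependence graph from such a hook sees only zero-cost edges in the "none" case, whatever the instruction.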

PowerPC prologue and epilogue

2012-04-17 Thread Alan Modra
This is the first in a series of patches cleaning up rs6000 prologue
and epilogue generating code.  This one is just the formatting/style
changes plus renaming two variables to better reflect their usage,
and moving code around.

The patch series has been bootstrapped and regression tested
powerpc-linux, powerpc64-linux and powerpc-linux-gnuspe.  Please test
on darwin and aix.

* config/rs6000/rs6000.c (rs6000_emit_savres_rtx): Formatting.
(rs6000_emit_prologue, rs6000_emit_epilogue): Likewise.  Rename
sp_offset to frame_off.  Move world save code earlier.

diff -urp gcc-virgin/gcc/config/rs6000/rs6000.c gcc-alan1/gcc/config/rs6000/rs6000.c
--- gcc-virgin/gcc/config/rs6000/rs6000.c   2012-04-14 22:48:44.108432893 +0930
+++ gcc-alan1/gcc/config/rs6000/rs6000.c   2012-04-16 11:57:37.282242636 +0930
@@ -19212,9 +19212,9 @@ rs6000_emit_savres_rtx (rs6000_stack_t *
 
   sym = rs6000_savres_routine_sym (info, savep, gpr, lr);
   RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym);
-  use_reg = DEFAULT_ABI == ABI_AIX ? (gpr && !lr ? 12 : 1)
-  : DEFAULT_ABI == ABI_DARWIN && !gpr ? 1
-  : 11;
+  use_reg = (DEFAULT_ABI == ABI_AIX ? (gpr && !lr ? 12 : 1)
+: DEFAULT_ABI == ABI_DARWIN && !gpr ? 1
+: 11);
   RTVEC_ELT (p, offset++)
 = gen_rtx_USE (VOIDmode,
   gen_rtx_REG (Pmode, use_reg));
@@ -19224,7 +19224,7 @@ rs6000_emit_savres_rtx (rs6000_stack_t *
   rtx addr, reg, mem;
   reg = gen_rtx_REG (reg_mode, start_reg + i);
   addr = gen_rtx_PLUS (Pmode, frame_reg_rtx,
-  GEN_INT (save_area_offset + reg_size*i));
+  GEN_INT (save_area_offset + reg_size * i));
   mem = gen_frame_mem (reg_mode, addr);
 
   RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode,
@@ -19293,9 +19293,9 @@ rs6000_emit_prologue (void)
   int saving_GPRs_inline;
   int using_store_multiple;
   int using_static_chain_p = (cfun->static_chain_decl != NULL_TREE
-  && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
+ && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
  && call_used_regs[STATIC_CHAIN_REGNUM]);
-  HOST_WIDE_INT sp_offset = 0;
+  HOST_WIDE_INT frame_off = 0;
 
   if (flag_stack_usage_info)
 current_function_static_stack_size = info->total_size;
@@ -19323,52 +19323,6 @@ rs6000_emit_prologue (void)
   reg_size = 8;
 }
 
-  strategy = info->savres_strategy;
-  using_store_multiple = strategy & SAVRES_MULTIPLE;
-  saving_FPRs_inline = strategy & SAVE_INLINE_FPRS;
-  saving_GPRs_inline = strategy & SAVE_INLINE_GPRS;
-
-  /* For V.4, update stack before we do any saving and set back pointer.  */
-  if (! WORLD_SAVE_P (info)
-  && info->push_p
-  && (DEFAULT_ABI == ABI_V4
- || crtl->calls_eh_return))
-{
-  bool need_r11 = (TARGET_SPE
-  ? (!saving_GPRs_inline
- && info->spe_64bit_regs_used == 0)
-  : (!saving_FPRs_inline || !saving_GPRs_inline));
-  rtx copy_reg = need_r11 ? gen_rtx_REG (Pmode, 11) : NULL;
-
-  if (info->total_size < 32767)
-   sp_offset = info->total_size;
-  else if (need_r11)
-   frame_reg_rtx = copy_reg;
-  else if (info->cr_save_p
-  || info->lr_save_p
-  || info->first_fp_reg_save < 64
-  || info->first_gp_reg_save < 32
-  || info->altivec_size != 0
-  || info->vrsave_mask != 0
-  || crtl->calls_eh_return)
-   {
- copy_reg = frame_ptr_rtx;
- frame_reg_rtx = copy_reg;
-   }
-  else
-   {
- /* The prologue won't be saving any regs so there is no need
-to set up a frame register to access any frame save area.
-We also won't be using sp_offset anywhere below, but set
-the correct value anyway to protect against future
-changes to this function.  */
- sp_offset = info->total_size;
-   }
-  rs6000_emit_allocate_stack (info->total_size, copy_reg);
-  if (frame_reg_rtx != sp_reg_rtx)
-   rs6000_emit_stack_tie (frame_reg_rtx, false);
-}
-
   /* Handle world saves specially here.  */
   if (WORLD_SAVE_P (info))
 {
@@ -19396,7 +19350,7 @@ rs6000_emit_prologue (void)
  && info->push_p
  && info->lr_save_p
  && (!crtl->calls_eh_return
-  || info->ehrd_offset == -432)
+ || info->ehrd_offset == -432)
  && info->vrsave_save_offset == -224
  && info->altivec_save_offset == -416);
 
@@ -19423,14 +19377,14 @@ rs6000_emit_prologue (void)
 properly.  */
   for (i = 0; i < 64 - info->first_fp_reg_save; i++)
{
- rtx reg = gen_rtx_REG (((TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT

Re: [PATCH, rs6000] Remove DImode from SLOW_UNALIGNED_ACCESS

2012-04-17 Thread David Edelsohn
On Tue, Apr 17, 2012 at 10:55 AM, Pat Haugen
 wrote:
> DImode references do not suffer a major performance hit for < 4-byte aligned
> access like the float types.
>
> Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?
>
> 2012-04-17  Pat Haugen 
>
>        * config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Remove DImode.

Okay.

I think this may have been introduced at the time the port was 32-bit
and DImode values were sometimes allocated to FPRs.

Thanks, David


PowerPC prologue and epilogue 2

2012-04-17 Thread Alan Modra
This fixes a lot of confusion in rs6000_frame_related call arguments.
At the time rs6000_frame_related first appeared, the prologue only
used sp_reg_rtx (r1) or frame_ptr_rtx (r12) as frame_reg_rtx to access
register save slots.  If r12 was used, it was necessary to add a note
that gave the equivalent offset relative to r1.

Nowadays, r11 is used as frame_reg_rtx too, when abiv4 and saving regs
out-of-line with a large frame.  When that change was made the calls
to rs6000_frame_related were not updated.  So rs6000_frame_related
won't replace r11 in register save rtl.  As it happens this isn't a
bug because when you look closely, out-of-line saves are disabled with
a large frame!  A fix for that will come later in this patch series.
I also optimize rs6000_frame_related a little to save generating
duplicate rtl.

* config/rs6000/rs6000.c (rs6000_frame_related): Don't emit a
REG_FRAME_RELATED_EXPR note when the instruction exactly matches
the replacement.
(emit_frame_save): Delete frame_ptr param.  Rename total_size to
frame_reg_to_sp.
(rs6000_emit_prologue): Add sp_off.  Update rs6000_frame_related
and emit_frame_save calls.  Cope with possibly missing note.

diff -urp gcc-alan1/gcc/config/rs6000/rs6000.c gcc-alan2/gcc/config/rs6000/rs6000.c
--- gcc-alan1/gcc/config/rs6000/rs6000.c   2012-04-16 11:57:37.282242636 +0930
+++ gcc-alan2/gcc/config/rs6000/rs6000.c   2012-04-16 11:58:01.50108 +0930
@@ -18751,7 +18751,10 @@ output_probe_stack_range (rtx reg1, rtx
with (plus:P (reg 1) VAL), and with REG2 replaced with RREG if REG2
is not NULL.  It would be nice if dwarf2out_frame_debug_expr could
deduce these equivalences by itself so it wasn't necessary to hold
-   its hand so much.  */
+   its hand so much.  Don't be tempted to always supply d2_f_d_e with
+   the actual cfa register, ie. r31 when we are using a hard frame
+   pointer.  That fails when saving regs off r1, and sched moves the
+   r31 setup past the reg saves.  */
 
 static rtx
 rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
@@ -18759,6 +18762,25 @@ rs6000_frame_related (rtx insn, rtx reg,
 {
   rtx real, temp;
 
+  if (REGNO (reg) == 1 && reg2 == NULL_RTX)
+{
+  /* No need for any replacement.  Just set RTX_FRAME_RELATED_P.  */
+  int i;
+
+  gcc_checking_assert (val == 0);
+  real = PATTERN (insn);
+  if (GET_CODE (real) == PARALLEL)
+   for (i = 0; i < XVECLEN (real, 0); i++)
+ if (GET_CODE (XVECEXP (real, 0, i)) == SET)
+   {
+ rtx set = XVECEXP (real, 0, i);
+
+ RTX_FRAME_RELATED_P (set) = 1;
+   }
+  RTX_FRAME_RELATED_P (insn) = 1;
+  return insn;
+}
+
   /* copy_rtx will not make unique copies of registers, so we need to
  ensure we don't have unwanted sharing here.  */
   if (reg == reg2)
@@ -18772,10 +18794,13 @@ rs6000_frame_related (rtx insn, rtx reg,
   if (reg2 != NULL_RTX)
 real = replace_rtx (real, reg2, rreg);
 
-  real = replace_rtx (real, reg,
- gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode,
-   STACK_POINTER_REGNUM),
-   GEN_INT (val)));
+  if (REGNO (reg) == 1)
+gcc_checking_assert (val == 0);
+  else
+real = replace_rtx (real, reg,
+   gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode,
+ STACK_POINTER_REGNUM),
+ GEN_INT (val)));
 
   /* We expect that 'real' is either a SET or a PARALLEL containing
  SETs (and possibly other stuff).  In a PARALLEL, all the SETs
@@ -18893,8 +18918,8 @@ generate_set_vrsave (rtx reg, rs6000_sta
Save REGNO into [FRAME_REG + OFFSET] in mode MODE.  */
 
 static rtx
-emit_frame_save (rtx frame_reg, rtx frame_ptr, enum machine_mode mode,
-unsigned int regno, int offset, HOST_WIDE_INT total_size)
+emit_frame_save (rtx frame_reg, enum machine_mode mode,
+unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp)
 {
   rtx reg, offset_rtx, insn, mem, addr, int_rtx;
   rtx replacea, replaceb;
@@ -18930,7 +18955,8 @@ emit_frame_save (rtx frame_reg, rtx fram
 
   insn = emit_move_insn (mem, reg);
 
-  return rs6000_frame_related (insn, frame_ptr, total_size, replacea, replaceb);
+  return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
+  replacea, replaceb);
 }
 
 /* Emit an offset memory reference suitable for a frame store, while
@@ -19295,7 +19321,9 @@ rs6000_emit_prologue (void)
   int using_static_chain_p = (cfun->static_chain_decl != NULL_TREE
  && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
  && call_used_regs[STATIC_CHAIN_REGNUM]);
+  /* Offset to top of frame for frame_reg and sp respectively.  */
   HOST_WIDE_INT frame_off = 0;
+  HOST_WIDE_INT sp_off = 0;
 
   if (flag_st

PowerPC prologue and epilogue 3

2012-04-17 Thread Alan Modra
This continues the prologue and epilogue cleanup.  Not many user-visible
changes here, except for:
- a bugfix to the LR save RTL emitted by rs6000_emit_savres_rtx which
  may affect SPE,
- a bugfix for SPE code emitted when using a static chain,
- vector saves will be done using r1 for large frames just over 32k in
  size, and,
- using r11 as a frame pointer whenever we need to set up r11 for
  out-of-line saves, and merging two pointer reg setup insns.
The latter is a necessary prerequisite to enabling out-of-line
save/restore for large frames, as I do in a later patch.  Currently
this will only affect abiv4 -Os when using out-of-line saves.

eg. -m32 -Os -mno-multiple
int f (double x)
{
  char a[33];
  __asm __volatile ("#%0" : "=m" (a) : : "fr31", "r27", "r28");
  return (int) x;
}
old                     new
stwu 1,-96(1)           mflr 0
mflr 0                  addi 11,1,-8
addi 11,1,88            stwu 1,-96(1)
stw 0,100(1)            stw 0,12(11)
stfd 31,88(1)           bl _savegpr_27
bl _savegpr_27          stfd 31,0(11)


* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Delete forward
decl.  Move logic selecting update reg to callers.  Update all callers.
(rs6000_emit_allocate_stack): Add copy_off param.
(emit_frame_save): Don't handle reg+reg addressing.
(ptr_regno_for_savres): New function, extracted from..
(rs6000_emit_savres_rtx): ..here.  Add lr_offset param.
(rs6000_emit_prologue): Generate frame_ptr_rtx as we need it.
Set frame_reg_rtx to r11 whenever r11 is needed, and merge
frame offset adjustment for out-of-line save with copy from sp.
Simplify condition controlling whether cr is saved early or
late.  Use ptr_regno_for_savres to verify correct reg is set
up for out-of-line saves.  Pass the actual pointer reg used to
rs6000_emit_savres_rtx so rtl matches insns in out-of-line
function.  Rearrange spe vars so code is similar to that
elsewhere in this function.  Don't update frame_off when spe
save code will restore r11.  Use emit_frame_save for spe and
gpr saves.  Consolidate darwin out-of-line gpr setup with that
for other abis.  Don't assume frame_offset is zero and frame
reg is sp when setting up altivec reg saves, and calculate
exact offset requirement.
(rs6000_emit_epilogue): Use HOST_WIDE_INT for frame_off.  Tidy
spe restore code.  Consolidate darwin out-of-line gpr setup
with that for other abis.

diff -urp gcc-alan2/gcc/config/rs6000/rs6000.c gcc-alan3/gcc/config/rs6000/rs6000.c
--- gcc-alan2/gcc/config/rs6000/rs6000.c   2012-04-16 11:58:01.50108 +0930
+++ gcc-alan3/gcc/config/rs6000/rs6000.c   2012-04-17 07:19:42.927931887 +0930
@@ -951,7 +951,6 @@ static void rs6000_eliminate_indexed_mem
 static const char *rs6000_mangle_type (const_tree);
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
-static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -18534,7 +18533,7 @@ rs6000_emit_stack_tie (rtx fp, bool hard
The generated code may use hard register 0 as a temporary.  */
 
 static void
-rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg)
+rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
 {
   rtx insn;
   rtx stack_reg = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
@@ -18578,7 +18577,12 @@ rs6000_emit_allocate_stack (HOST_WIDE_IN
 }
 
   if (copy_reg)
-emit_move_insn (copy_reg, stack_reg);
+{
+  if (copy_off != 0)
+   emit_insn (gen_add3_insn (copy_reg, stack_reg, GEN_INT (copy_off)));
+  else
+   emit_move_insn (copy_reg, stack_reg);
+}
 
   if (size > 32767)
 {
@@ -18921,42 +18925,22 @@ static rtx
 emit_frame_save (rtx frame_reg, enum machine_mode mode,
 unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp)
 {
-  rtx reg, offset_rtx, insn, mem, addr, int_rtx;
-  rtx replacea, replaceb;
-
-  int_rtx = GEN_INT (offset);
+  rtx reg, insn, mem, addr;
 
   /* Some cases that need register indexed addressing.  */
-  if ((TARGET_ALTIVEC_ABI && ALTIVEC_VECTOR_MODE (mode))
-  || (TARGET_VSX && ALTIVEC_OR_VSX_VECTOR_MODE (mode))
-  || (TARGET_E500_DOUBLE && mode == DFmode)
-  || (TARGET_SPE_ABI
- && SPE_VECTOR_MODE (mode)
- && !SPE_CONST_OFFSET_OK (offset)))
-{
-  /* Whomever calls us must make sure r11 is available in the
-flow path of instructions in the prologue.  */
-  offset_rtx = gen_rtx_REG (Pmode, 11);
-  emit_move_insn (offset_rtx, int_rtx);
-
-  replacea = offset_rtx;
-  replaceb = int

PowerPC prologue and epilogue 4

2012-04-17 Thread Alan Modra
This provides some protection against misuse of r0, r11 and r12.  I
found it useful when enabling out-of-line saves for large frames.  ;-)

* config/rs6000/rs6000.c (START_USE, END_USE, NOT_INUSE): Define.
(rs6000_emit_prologue): Use the above to catch register overlap.
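
The idea behind the checking macros can be tried outside GCC with a small standalone sketch.  The assumptions here are mine: plain assert stands in for gcc_assert, reg_inuse is a file-scope variable rather than a local of rs6000_emit_prologue, and demo_register_tracking is an invented driver that only mimics the claim/release pattern.

```c
#include <assert.h>

/* Bit N of reg_inuse is set while scratch register N is claimed.
   Standalone stand-in: assert replaces gcc_assert.  */
static unsigned int reg_inuse;

#define START_USE(R) do \
  {						\
    assert ((reg_inuse & (1u << (R))) == 0);	\
    reg_inuse |= 1u << (R);			\
  } while (0)
#define END_USE(R) do \
  {						\
    assert ((reg_inuse & (1u << (R))) != 0);	\
    reg_inuse &= ~(1u << (R));			\
  } while (0)
#define NOT_INUSE(R) do \
  {						\
    assert ((reg_inuse & (1u << (R))) == 0);	\
  } while (0)

/* Hypothetical driver: claim r0 as a temporary, claim r11 unless the
   static chain already occupies it, release r0, and check that r12 is
   still free.  Returns the resulting in-use mask.  */
unsigned int
demo_register_tracking (int using_static_chain_p)
{
  reg_inuse = using_static_chain_p ? 1u << 11 : 0;
  START_USE (0);
  if (!using_static_chain_p)
    START_USE (11);
  END_USE (0);
  NOT_INUSE (12);
  return reg_inuse;
}
```

Claiming r11 a second time while the static chain holds it would trip the assertion in START_USE, which is exactly the kind of overlap the patch is meant to catch.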

diff -urp gcc-alan3/gcc/config/rs6000/rs6000.c gcc-alan4/gcc/config/rs6000/rs6000.c
--- gcc-alan3/gcc/config/rs6000/rs6000.c   2012-04-17 07:19:42.927931887 +0930
+++ gcc-alan4/gcc/config/rs6000/rs6000.c   2012-04-17 09:11:31.760669589 +0930
@@ -19301,6 +19301,29 @@ rs6000_emit_prologue (void)
   HOST_WIDE_INT frame_off = 0;
   HOST_WIDE_INT sp_off = 0;
 
+#ifdef ENABLE_CHECKING
+  /* Track and check usage of r0, r11, r12.  */
+  int reg_inuse = using_static_chain_p ? 1 << 11 : 0;
+#define START_USE(R) do \
+  {\
+gcc_assert ((reg_inuse & (1 << (R))) == 0);\
+reg_inuse |= 1 << (R); \
+  } while (0)
+#define END_USE(R) do \
+  {\
+gcc_assert ((reg_inuse & (1 << (R))) != 0);\
+reg_inuse &= ~(1 << (R));  \
+  } while (0)
+#define NOT_INUSE(R) do \
+  {\
+gcc_assert ((reg_inuse & (1 << (R))) == 0);\
+  } while (0)
+#else
+#define START_USE(R) do {} while (0)
+#define END_USE(R) do {} while (0)
+#define NOT_INUSE(R) do {} while (0)
+#endif
+
   if (flag_stack_usage_info)
 current_function_static_stack_size = info->total_size;
 
@@ -19465,6 +19488,7 @@ rs6000_emit_prologue (void)
   if (need_r11)
{
  ptr_reg = gen_rtx_REG (Pmode, 11);
+ START_USE (11);
}
   else if (info->total_size < 32767)
frame_off = info->total_size;
@@ -19477,6 +19501,7 @@ rs6000_emit_prologue (void)
   || crtl->calls_eh_return)
{
  ptr_reg = gen_rtx_REG (Pmode, 12);
+ START_USE (12);
}
   else
{
@@ -19509,6 +19534,7 @@ rs6000_emit_prologue (void)
   rtx addr, reg, mem;
 
   reg = gen_rtx_REG (Pmode, 0);
+  START_USE (0);
   insn = emit_move_insn (reg, gen_rtx_REG (Pmode, LR_REGNO));
   RTX_FRAME_RELATED_P (insn) = 1;
 
@@ -19524,6 +19550,7 @@ rs6000_emit_prologue (void)
  insn = emit_move_insn (mem, reg);
  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
NULL_RTX, NULL_RTX);
+ END_USE (0);
}
 }
 
@@ -19536,6 +19563,7 @@ rs6000_emit_prologue (void)
   rtx set;
 
   cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno);
+  START_USE (cr_save_regno);
   insn = emit_insn (gen_movesi_from_cr (cr_save_rtx));
   RTX_FRAME_RELATED_P (insn) = 1;
   /* Now, there's no way that dwarf2out_frame_debug_expr is going
@@ -19579,6 +19607,8 @@ rs6000_emit_prologue (void)
 /*savep=*/true, /*gpr=*/false, lr);
   rs6000_frame_related (insn, frame_reg_rtx, sp_off,
NULL_RTX, NULL_RTX);
+  if (lr)
+   END_USE (0);
 }
 
   /* Save GPRs.  This is done as a PARALLEL if we are using
@@ -19623,10 +19653,15 @@ rs6000_emit_prologue (void)
  if (using_static_chain_p)
{
  rtx r0 = gen_rtx_REG (Pmode, 0);
+
+ START_USE (0);
  gcc_assert (info->first_gp_reg_save > 11);
 
  emit_move_insn (r0, spe_save_area_ptr);
}
+ else if (REGNO (frame_reg_rtx) != 11)
+   START_USE (11);
+
  emit_insn (gen_addsi3 (spe_save_area_ptr,
 frame_reg_rtx, GEN_INT (offset)));
  if (!using_static_chain_p && REGNO (frame_reg_rtx) == 11)
@@ -19657,8 +19692,16 @@ rs6000_emit_prologue (void)
}
 
   /* Move the static chain pointer back.  */
-  if (using_static_chain_p && !spe_regs_addressable)
-   emit_move_insn (spe_save_area_ptr, gen_rtx_REG (Pmode, 0));
+  if (!spe_regs_addressable)
+   {
+ if (using_static_chain_p)
+   {
+ emit_move_insn (spe_save_area_ptr, gen_rtx_REG (Pmode, 0));
+ END_USE (0);
+   }
+ else if (REGNO (frame_reg_rtx) != 11)
+   END_USE (11);
+   }
 }
   else if (!WORLD_SAVE_P (info) && !saving_GPRs_inline)
 {
@@ -19679,10 +19722,13 @@ rs6000_emit_prologue (void)
 
  if (ptr_set_up)
frame_off = -end_save;
+ else
+   NOT_INUSE (ptr_regno);
  emit_insn (gen_add3_insn (ptr_reg, frame_reg_rtx, offset));
}
   else if (!ptr_set_up)
{
+ NOT_INUSE (ptr_regno);
  emit_move_insn (ptr_reg, frame_reg_rtx);
}
   ptr_off = -end_save;
@@ -19693,6 +19739,8 @@ rs6000_emit_prologue (void)
 /*savep=*/true, /*gpr=*/true, lr);
   rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
NULL_RTX,

PowerPC prologue and epilogue 5

2012-04-17 Thread Alan Modra
This enables out-of-line save and restore for large frames, and for
ABI_AIX when using the static chain.

* config/rs6000/rs6000.c (rs6000_savres_strategy): Allow
out-of-line save/restore for large frames.  Don't disable
out-of-line saves on ABI_AIX when using static chain reg.
(rs6000_emit_prologue): Adjust cr_save_regno on ABI_AIX to not
clobber static chain reg, and tweak for out-of-line gpr saves
that use r1.

diff -urp gcc-alan4/gcc/config/rs6000/rs6000.c gcc-alan5/gcc/config/rs6000/rs6000.c
--- gcc-alan4/gcc/config/rs6000/rs6000.c   2012-04-17 09:11:31.760669589 +0930
+++ gcc-alan5/gcc/config/rs6000/rs6000.c   2012-04-17 11:16:09.369537832 +0930
@@ -17432,8 +17432,7 @@ rs6000_savres_strategy (rs6000_stack_t *
 strategy |= SAVRES_MULTIPLE;
 
   if (crtl->calls_eh_return
-  || cfun->machine->ra_need_lr
-  || info->total_size > 32767)
+  || cfun->machine->ra_need_lr)
 strategy |= (SAVE_INLINE_FPRS | REST_INLINE_FPRS
 | SAVE_INLINE_GPRS | REST_INLINE_GPRS);
 
@@ -17454,10 +17453,10 @@ rs6000_savres_strategy (rs6000_stack_t *
   /* Don't bother to try to save things out-of-line if r11 is occupied
  by the static chain.  It would require too much fiddling and the
  static chain is rarely used anyway.  FPRs are saved w.r.t the stack
- pointer on Darwin.  */
-  if (using_static_chain_p)
-strategy |= (DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
-   | SAVE_INLINE_GPRS;
+ pointer on Darwin, and AIX uses r1 or r12.  */
+  if (using_static_chain_p && DEFAULT_ABI != ABI_AIX)
+strategy |= ((DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
+| SAVE_INLINE_GPRS);
 
   /* If we are going to use store multiple, then don't even bother
  with the out-of-line routines, since the store-multiple
@@ -19555,7 +19554,10 @@ rs6000_emit_prologue (void)
 }
 
   /* If we need to save CR, put it into r12 or r11.  */
-  cr_save_regno = DEFAULT_ABI == ABI_AIX && !saving_GPRs_inline ? 11 : 12;
+  cr_save_regno = (DEFAULT_ABI == ABI_AIX
+  && (strategy & SAVE_INLINE_GPRS) == 0
+  && (strategy & SAVE_NOINLINE_GPRS_SAVES_LR) == 0
+  && !using_static_chain_p ? 11 : 12);
   if (!WORLD_SAVE_P (info)
   && info->cr_save_p
   && REGNO (frame_reg_rtx) != cr_save_regno)

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Atom: Enabling unroll at O2 optimization level

2012-04-17 Thread Igor Zamyatin
On Thu, Apr 12, 2012 at 3:16 PM, Richard Guenther wrote:
> On Thu, Apr 12, 2012 at 1:05 PM, Igor Zamyatin  wrote:
>> On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther wrote:
>>> On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin  wrote:
>>>> Hi All!
>>>>
>>>> Here is a patch that enables unroll at O2 for Atom.
>>>>
>>>> This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32
>>>> bits) with quite moderate code size increase (~5% for EEMBC2.0, 32
>>>> bits).
>>>
>>> 5% is not moderate.  Your patch does enable unrolling at -O2 but not -O3,
>>> why? Why do you disable register renaming?  check_imull requires a function
>>> comment.
>>
>> Sure, enabling unroll for O3 could be the next step.
>> We can't avoid code size increase with unroll - what number do you
>> think will be appropriate?
>> Register renaming was the cause of several degradations during the tuning
>> process.
>> A comment for check_imull was added.
>>
>>>
>>> This completely looks like a hack for EEMBC2.0, so it's definitely not ok.
>>
>> Why? EEMBC was measured and the results provided here just because this
>> benchmark is considered to be very relevant for Atom
>
> I'd say that SPEC INT (2000 / 2006) is more relevant for Atom (SPEC FP
> would be irrelevant OTOH).  Similar code size for, say, Mozilla Firefox
> or GCC itself would be important.
>
>>> -O2 is not supposed to give best benchmark results.
>>
>> O2 is widely used, so a performance improvement could be important for users.
>
> But not at a 5% size cost.  Please also always check the compile-time effect
> which is important for -O2 as well.

What size and compile-time increase would be acceptable for O2 and O3
on EEMBC, SPEC INT 2000 and Mozilla?

Would it be possible in general to put the Atom-specific unroll heuristics
under an option that could be mentioned in the GCC docs?

>
> Richard.
>
>>>
>>> Thanks,
>>> Richard.
>>>

>>>> Tested for i386 and x86-64, ok for trunk?
>>
>> Updated patch attached
>>

>>>> Thanks,
>>>> Igor
>>>>
>>>> ChangeLog:
>>>>
>>>> 2012-04-10  Yakovlev Vladimir
>>>>
>>>>         * gcc/config/i386/i386.c (check_imul): New routine.
>>>>         (ix86_loop_unroll_adjust): New target hook.
>>>>         (ix86_option_override_internal): Enable unrolling on Atom at -O2.
>>>>         (TARGET_LOOP_UNROLL_ADJUST): New define.

Thanks,
Igor


[PATCH] Allow un-distribution with repeated factors (PR52976 follow-up)

2012-04-17 Thread William J. Schmidt
The emergency reassociation patch for PR52976 disabled un-distribution
in the presence of repeated factors to avoid ICEs in zero_one_operation.
This patch fixes such cases properly by teaching zero_one_operation
about __builtin_pow* calls.
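
As a worked illustration of what un-distribution with a repeated factor buys, take the expression c1 + 2*P*c2 + 3*P**2*c3 from the reassoc_7.f test.  The C sketch below is hand-written and only shows the algebra; the function names are made up, and it says nothing about the order in which the pass actually rewrites operands.  Once the constant products 2*c2 and 3*c3 are folded, the factored form needs two multiplies involving P.

```c
/* Constants taken from the reassoc_7.f test case.  */
static const double c1 = 0.1, c2 = 0.2, c3 = 0.3;

/* Straightforward evaluation: P appears in two separate terms, once
   as a repeated factor (P*P).  Hypothetical name.  */
double dvdph_direct (double p)
{
  return c1 + 2.0 * p * c2 + 3.0 * p * p * c3;
}

/* After factoring P out of both terms.  2.0 * c2 and 3.0 * c3 are
   constants a compiler can fold, leaving two multiplies by P.
   Hypothetical name.  */
double dvdph_undistributed (double p)
{
  return c1 + p * (2.0 * c2 + (3.0 * c3) * p);
}
```

The two forms agree only up to floating-point rounding, which is why the tests enable -ffast-math.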

Bootstrapped with no new regressions on powerpc64-linux.  Also built
SPEC cpu2000 and cpu2006 successfully.  Ok for trunk?

Thanks,
Bill


gcc:

2012-04-17  Bill Schmidt  

* tree-ssa-reassoc.c (stmt_is_power_of_op): New function.
(decrement_power): Likewise.
(propagate_op_to_single_use): Likewise.
(zero_one_operation): Handle __builtin_pow* calls in linearized
expression trees; factor logic into propagate_op_to_single_use.
(undistribute_ops_list): Allow operands with repeat counts > 1.


gcc/testsuite:

2012-04-17  Bill Schmidt  

* gfortran.dg/reassoc_7.f: New test.
* gfortran.dg/reassoc_8.f: Likewise.
* gfortran.dg/reassoc_9.f: Likewise.
* gfortran.dg/reassoc_10.f: Likewise.


Index: gcc/testsuite/gfortran.dg/reassoc_10.f
===
--- gcc/testsuite/gfortran.dg/reassoc_10.f  (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_10.f  (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Q,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Q,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P*c2 + 3.*P**2*Q**3*c3
+  END
+
+! There should be five multiplies following un-distribution
+! and power expansion.
+
+! { dg-final { scan-tree-dump-times " \\\* " 5 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/testsuite/gfortran.dg/reassoc_7.f
===
--- gcc/testsuite/gfortran.dg/reassoc_7.f   (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_7.f   (revision 0)
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P*c2 + 3.*P**2*c3
+  END
+
+! There should be two multiplies following un-distribution.
+
+! { dg-final { scan-tree-dump-times " \\\* " 2 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/testsuite/gfortran.dg/reassoc_8.f
===
--- gcc/testsuite/gfortran.dg/reassoc_8.f   (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_8.f   (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P**2*c2 + 3.*P**3*c3
+  END
+
+! There should be three multiplies following un-distribution
+! and power expansion.
+
+! { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/testsuite/gfortran.dg/reassoc_9.f
===
--- gcc/testsuite/gfortran.dg/reassoc_9.f   (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_9.f   (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P**2*c2 + 3.*P**4*c3
+  END
+
+! There should be three multiplies following un-distribution
+! and power expansion.
+
+! { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 186495)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -1020,6 +1020,98 @@ oecount_cmp (const void *p1, const void *p2)
 return c1->id - c2->id;
 }
 
+/* Return TRUE iff STMT represents a builtin call that raises OP
+   to some exponent.  */
+
+static bool
+stmt_is_power_of_op (gimple stmt, tree op)
+{
+  tree fndecl;
+
+  if (!is_gimple_call (stmt))
+return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  if (!fndecl
+  || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return false;
+
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+{
+CASE_FLT_FN (BUILT_IN_POW):
+CASE_FLT_FN (BUILT_IN_POWI):
+  return (operand_equal_p (gimple_call_arg (stmt, 0), op, 0));
+  
+default:
+  return false;
+}
+}
+
+/* Given STMT which is a __builtin_pow* call, decrement its exponent
+   in place and return the result.  Assumes that stmt_is_power_of_op
+   was previously called for STMT and returne

Re: CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread H.J. Lu
On Tue, Apr 17, 2012 at 8:04 AM, Alexander Monakov  wrote:
>
>
> On Tue, 17 Apr 2012, H.J. Lu wrote:
>
>> On Tue, Apr 17, 2012 at 7:35 AM, Roman Zhuykov  wrote:
>> > Hello,
>> >
>> > I found the following problem while investigating SMS on x86-64.
>> > When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
>> > latencies in data dependency graph become zeros. The global pointer
>> > "insn_default_latency" points to insn_default_latency_none, which
>> > returns zero for any instruction.
>> > This happens because ix86_schedule cpu attribute is set to CPU_NONE for nocona.
>> >
>> > CPU_NONE was introduced by this patch:
>> > http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html
>> >
>> > I think we don't want any scheduler to work with zero latencies on
>> > such processors (with such -march).
>> > The following patch fixes the problem for my case with -march=nocona.
>> > Is it correct to fix the problem like this?
>> > What should be done for the 32-bit architectures (i386, i486, pentium4,
>> > pentium4m, prescott)?
>> >
>> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > index af4af7c..38d64e9 100644
>> > --- a/gcc/config/i386/i386.c
>> > +++ b/gcc/config/i386/i386.c
>> > @@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
>> >       PTA_MMX | PTA_SSE | PTA_SSE2},
>> >      {"prescott", PROCESSOR_NOCONA, CPU_NONE,
>> >       PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
>> > -      {"nocona", PROCESSOR_NOCONA, CPU_NONE,
>> > +      {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
>> >       PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
>> >       | PTA_CX16 | PTA_NO_SAHF},
>> >      {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
>> > --
>>
>> Should we replace all CPU_NONE with CPU_GENERIC32/CPU_GENERIC64?
>
> CPU_GENERIC32 had been removed by the 2008 patch Roman was referring to.  Did
> you mean CPU_PENTIUMPRO?

Yes.

-- 
H.J.


Re: [C++ Patch] PR 52599

2012-04-17 Thread Paolo Carlini

Hi,
I think build_constexpr_constructor_member_initializers is a better 
place for that check, since it's already looking at the tree structure.

Indeed. I'm finishing testing the below. Ok if it passes?

Thanks,
Paolo.


/cp
2012-04-17  Paolo Carlini  

PR c++/52599
* semantics.c (build_constexpr_constructor_member_initializers):
Check for function-try-block as function-body.

/testsuite
2012-04-17  Paolo Carlini  

PR c++/52599
* g++.dg/cpp0x/constexpr-ctor10.C: New.
Index: testsuite/g++.dg/cpp0x/constexpr-ctor10.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
@@ -0,0 +1,6 @@
+// PR c++/52599
+// { dg-options -std=c++11 }
+
+struct foo {
+  constexpr foo() try { } catch(...) { };  // { dg-error "constructor" }
+};
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 186523)
+++ cp/semantics.c  (working copy)
@@ -5921,6 +5921,8 @@ build_constexpr_constructor_member_initializers (t
break;
}
 }
+  else if (TREE_CODE (body) == TRY_BLOCK)
+error ("body of %<constexpr%> constructor cannot be a function-try-block");
   else if (EXPR_P (body))
 ok = build_data_member_initialization (body, &vec);
   else


Re: [C++ Patch] PR 53003

2012-04-17 Thread Paolo Carlini

On 04/17/2012 03:55 PM, Jason Merrill wrote:

I have various thoughts:

It's odd that we still treat 'return' as starting a function body long 
after we removed that extension.


Maybe we shouldn't look for a function body if we already have an 
initializer and aren't dealing with a function declarator.


I guess we should set initializer_token_start for {} initializers as 
well.


But your patch is certainly the smallest change, and OK.
Thanks. So let's say I apply the very safe patchlet to mainline and the
branch and then, when time allows, I'll try to clean up mainline a bit
in this area.


Thanks,
Paolo.


Re: [C++ Patch] PR 52599

2012-04-17 Thread Paolo Carlini

On 04/17/2012 05:35 PM, Paolo Carlini wrote:

Hi,
I think build_constexpr_constructor_member_initializers is a better 
place for that check, since it's already looking at the tree structure.

Indeed. I'm finishing testing the below. Ok if it passes?
... uhm, actually this variant seems more correct to me, so I'm testing it
instead. Sorry.


Paolo.

///
Index: testsuite/g++.dg/cpp0x/constexpr-ctor10.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
@@ -0,0 +1,6 @@
+// PR c++/52599
+// { dg-options -std=c++11 }
+
+struct foo {
+  constexpr foo() try { } catch(...) { };  // { dg-error "constructor" }
+};
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 186523)
+++ cp/semantics.c  (working copy)
@@ -5921,6 +5921,12 @@ build_constexpr_constructor_member_initializers (t
break;
}
 }
+  else if (TREE_CODE (body) == TRY_BLOCK)
+{
+  error ("body of %<constexpr%> constructor cannot be "
+"a function-try-block");
+  return error_mark_node;
+}
   else if (EXPR_P (body))
 ok = build_data_member_initialization (body, &vec);
   else
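
For readers unfamiliar with the construct being rejected, here is a minimal sketch (not part of the patch; Resource, Holder, Point and holder_rejects_negative are hypothetical names). It shows a function-try-block on an ordinary constructor, which is valid, next to a constexpr constructor with a plain body, which is the form the C++11 test above expects.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical member type whose constructor may throw.
struct Resource {
  explicit Resource(int n) : size(n) {
    if (n < 0)
      throw std::invalid_argument("negative size");
  }
  int size;
};

// Ordinary (non-constexpr) constructor: a function-try-block is allowed
// and can observe exceptions thrown from the mem-initializer list.
struct Holder {
  static int failures;
  explicit Holder(int n)
  try : res(n) {
  } catch (const std::invalid_argument&) {
    ++failures;  // the exception is rethrown automatically after the handler
  }
  Resource res;
};
int Holder::failures = 0;

// constexpr constructor: written with a plain body, no try/catch.
struct Point {
  constexpr Point(int px, int py) : x(px), y(py) {}
  int x, y;
};

// Returns true if constructing Holder(-1) ran the handler exactly once
// and the exception still reached the caller.
bool holder_rejects_negative() {
  const int before = Holder::failures;
  try {
    Holder h(-1);
  } catch (const std::invalid_argument&) {
    return Holder::failures == before + 1;
  }
  return false;
}
```

Writing `constexpr Holder(int n) try ...` instead is what the new diagnostic is meant to catch under -std=c++11.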


[patch] Cleanup tree-switch-conversion a bit

2012-04-17 Thread Steven Bosscher
> My goal for GCC 4.8 is to do just that: Move switch expansion to
> GIMPLE and add value profiling for switch expressions.

And the idea is to put all that code in tree-switch-conversion.c. But
there are a few clean-ups I wish to do on that code before that.
First, there is a global pass info structure that contains useful data
for all forms of GIMPLE_SWITCH lowering. I've "un-globalized" that
data with the attached patch. While there, I made the dump messages
uniform.
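
A rough sketch of the "un-globalizing" pattern, for illustration only: SwitchInfo, kBranchRatio and the array-based interface are simplified assumptions, not the GCC data structures. The state lives in the driver function and is passed by pointer to helpers, and the driver returns a failure-reason string (or a null pointer on success) instead of leaving it in a global.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-switch pass state.
struct SwitchInfo {
  long range_min = 0;
  unsigned long range_size = 0;
  const char *reason = nullptr;
};

// Illustrative value; plays the role of SWITCH_CONVERSION_BRANCH_RATIO.
const unsigned long kBranchRatio = 8;

// Helper receives the state explicitly instead of touching a global.
// Assumes n > 0 and case_values sorted in ascending order.
bool check_range(const long *case_values, std::size_t n, SwitchInfo *info) {
  info->range_min = case_values[0];
  info->range_size =
      static_cast<unsigned long>(case_values[n - 1] - case_values[0]);
  if (info->range_size > n * kBranchRatio) {
    info->reason = "the maximum range-branch ratio exceeded";
    return false;
  }
  return true;
}

// Driver owns the state; returns nullptr on success, else the reason.
const char *process_switch(const long *case_values, std::size_t n) {
  SwitchInfo info;
  if (n == 0)
    return "no case labels";
  if (!check_range(case_values, n, &info))
    return info.reason;
  return nullptr;
}
```

Keeping the reason strings free of trailing punctuation and newlines lets the caller format every dump message the same way.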

Bootstrapped and tested on powerpc-unknown-linux-gnu. OK?

Ciao!
Steven
* tree-switch-conversion.c (info): Remove global pass info.
(check_range, check_process_case, check_final_bb, create_temp_arrays,
free_temp_arrays, gather_default_values, build_constructors,
array_value_type, build_one_array, build_arrays, gen_def_assigns,
fix_phi_nodes, gen_inbound_check): Pass info around from ...
(process_switch): ... here.  Unify message format.  Return a const
char pointer to the failure reason message.
(do_switchconv): Unify message format.  Update process_switch usage.

Index: tree-switch-conversion.c
===
--- tree-switch-conversion.c(revision 186526)
+++ tree-switch-conversion.c(working copy)
@@ -24,8 +24,8 @@ Software Foundation, 51 Franklin Street, Fifth Flo
  Switch initialization conversion
 
 The following pass changes simple initializations of scalars in a switch
-statement into initializations from a static array.  Obviously, the values must
-be constant and known at compile time and a default branch must be
+statement into initializations from a static array.  Obviously, the values
+must be constant and known at compile time and a default branch must be
 provided.  For example, the following code:
 
 int a,b;
@@ -162,16 +162,12 @@ struct switch_conv_info
   basic_block bit_test_bb[2];
 };
 
-/* Global pass info.  */
-static struct switch_conv_info info;
-
-
 /* Checks whether the range given by individual case statements of the SWTCH
switch statement isn't too big and whether the number of branches actually
satisfies the size of the new array.  */
 
 static bool
-check_range (gimple swtch)
+check_range (gimple swtch, struct switch_conv_info *info)
 {
   tree min_case, max_case;
   unsigned int branch_num = gimple_switch_num_labels (swtch);
@@ -181,7 +177,7 @@ static bool
  is a default label which is the first in the vector.  */
 
   min_case = gimple_switch_label (swtch, 1);
-  info.range_min = CASE_LOW (min_case);
+  info->range_min = CASE_LOW (min_case);
 
   gcc_assert (branch_num > 1);
   gcc_assert (CASE_LOW (gimple_switch_label (swtch, 0)) == NULL_TREE);
@@ -191,22 +187,22 @@ static bool
   else
 range_max = CASE_LOW (max_case);
 
-  gcc_assert (info.range_min);
+  gcc_assert (info->range_min);
   gcc_assert (range_max);
 
-  info.range_size = int_const_binop (MINUS_EXPR, range_max, info.range_min);
+  info->range_size = int_const_binop (MINUS_EXPR, range_max, info->range_min);
 
-  gcc_assert (info.range_size);
-  if (!host_integerp (info.range_size, 1))
+  gcc_assert (info->range_size);
+  if (!host_integerp (info->range_size, 1))
 {
-  info.reason = "index range way too large or otherwise unusable.\n";
+  info->reason = "index range way too large or otherwise unusable";
   return false;
 }
 
-  if ((unsigned HOST_WIDE_INT) tree_low_cst (info.range_size, 1)
+  if ((unsigned HOST_WIDE_INT) tree_low_cst (info->range_size, 1)
   > ((unsigned) branch_num * SWITCH_CONVERSION_BRANCH_RATIO))
 {
-  info.reason = "the maximum range-branch ratio exceeded.\n";
+  info->reason = "the maximum range-branch ratio exceeded";
   return false;
 }
 
@@ -219,7 +215,7 @@ static bool
and returns true.  Otherwise returns false.  */
 
 static bool
-check_process_case (tree cs)
+check_process_case (tree cs, struct switch_conv_info *info)
 {
   tree ldecl;
   basic_block label_bb, following_bb;
@@ -228,48 +224,48 @@ static bool
   ldecl = CASE_LABEL (cs);
   label_bb = label_to_block (ldecl);
 
-  e = find_edge (info.switch_bb, label_bb);
+  e = find_edge (info->switch_bb, label_bb);
   gcc_assert (e);
 
   if (CASE_LOW (cs) == NULL_TREE)
 {
   /* Default branch.  */
-  info.default_prob = e->probability;
-  info.default_count = e->count;
+  info->default_prob = e->probability;
+  info->default_count = e->count;
 }
   else
 {
   int i;
-  info.other_count += e->count;
+  info->other_count += e->count;
   for (i = 0; i < 2; i++)
-   if (info.bit_test_bb[i] == label_bb)
+   if (info->bit_test_bb[i] == label_bb)
  break;
-   else if (info.bit_test_bb[i] == NULL)
+   else if (info->bit_test_bb[i] == NULL)
  {
-   info.bit_test_bb[i] = label_bb;
-   info.bit_test_uniq++;
+   info->bit_test_bb[i] = label_bb;
+   info->bit_test_uniq++;
   

Vector subscripts in C++

2012-04-17 Thread Marc Glisse

Hello,

this patch adds vector subscripting to C++ by reusing the C code. 
build_array_ref and cp_build_array_ref could probably share more, but I 
don't understand them enough to do it.
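
As a small usage sketch of what the patch enables (assuming a compiler that implements the GNU vector extensions and accepts subscripts on vector types in C++; v4si, sum_lanes and set_lane are names chosen for this example):

```cpp
#include <cassert>

// GNU vector extension: four ints packed into a 16-byte vector.
typedef int v4si __attribute__((vector_size(16)));

// Read each lane with ordinary subscript syntax.
int sum_lanes(v4si v) {
  int total = 0;
  for (int i = 0; i < 4; ++i)
    total += v[i];
  return total;
}

// Write a single lane through a subscript and return the updated vector.
v4si set_lane(v4si v, int lane, int value) {
  v[lane] = value;
  return v;
}
```

With a constant index outside 0..3, the new code path is where the -Warray-bounds warning in the patch would be issued.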


(note that I can't commit, so if you like the patch...)

gcc/cp/ChangeLog
2012-04-17  Marc Glisse  

PR c++/51033
* typeck.c (cp_build_array_ref): Handle VECTOR_TYPE.
* decl2.c (grok_array_decl): Likewise.

gcc/c-family/ChangeLog
2012-04-17  Marc Glisse  

PR c++/51033
* c-common.c (convert_vector_to_pointer_for_subscript): New function.
* c-common.h (convert_vector_to_pointer_for_subscript): Declare it.

gcc/ChangeLog
2012-04-17  Marc Glisse  

PR c++/51033
* c-typeck.c (build_array_ref): Call
convert_vector_to_pointer_for_subscript.
* doc/extend.texi (Vector Extensions): Subscripting not just for C.

gcc/testsuite/ChangeLog
2012-04-17  Marc Glisse  

PR c++/51033
* gcc.dg/vector-1.c: Move to ...
* c-c++-common/vector-1.c: ... here.
* gcc.dg/vector-2.c: Move to ...
* c-c++-common/vector-2.c: ... here.
* gcc.dg/vector-3.c: Move to ...
* c-c++-common/vector-3.c: ... here. Adapt to C++.
* gcc.dg/vector-4.c: Move to ...
* c-c++-common/vector-4.c: ... here.
* gcc.dg/vector-init-1.c: Move to ...
* c-c++-common/vector-init-1.c: ... here.
* gcc.dg/vector-init-2.c: Move to ...
* c-c++-common/vector-init-2.c: ... here.
* gcc.dg/vector-subscript-1.c: Move to ...
* c-c++-common/vector-subscript-1.c: ... here. Adapt to C++.
* gcc.dg/vector-subscript-2.c: Move to ...
* c-c++-common/vector-subscript-2.c: ... here.
* gcc.dg/vector-subscript-3.c: Move to ...
* c-c++-common/vector-subscript-3.c: ... here.

--
Marc Glisse
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 186523)
+++ cp/decl2.c  (working copy)
@@ -373,7 +373,7 @@
 It is a little-known fact that, if `a' is an array and `i' is
 an int, you can write `i[a]', which means the same thing as
 `a[i]'.  */
-  if (TREE_CODE (type) == ARRAY_TYPE)
+  if (TREE_CODE (type) == ARRAY_TYPE || TREE_CODE (type) == VECTOR_TYPE)
p1 = array_expr;
   else
p1 = build_expr_type_conversion (WANT_POINTER, array_expr, false);
Index: cp/typeck.c
===
--- cp/typeck.c (revision 186523)
+++ cp/typeck.c (working copy)
@@ -2902,6 +2902,8 @@
   break;
 }
 
+  convert_vector_to_pointer_for_subscript (loc, &array, idx);
+
   if (TREE_CODE (TREE_TYPE (array)) == ARRAY_TYPE)
 {
   tree rval, type;
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 186523)
+++ c-family/c-common.c (working copy)
@@ -10831,4 +10831,29 @@
   return literal;
 }
 
+/* For vector[index], convert the vector to a
+   pointer of the underlying type.  */
+void
+convert_vector_to_pointer_for_subscript (location_t loc, tree* vecp, tree index)
+{
+  if (TREE_CODE (TREE_TYPE (*vecp)) == VECTOR_TYPE)
+{
+  tree type = TREE_TYPE (*vecp);
+  tree type1;
+
+  if (TREE_CODE (index) == INTEGER_CST)
+if (!host_integerp (index, 1)
+|| ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1)
+   >= TYPE_VECTOR_SUBPARTS (type)))
+  warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
+
+  c_common_mark_addressable_vec (*vecp);
+  type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
+  type = build_pointer_type (type);
+  type1 = build_pointer_type (TREE_TYPE (*vecp));
+  *vecp = build1 (ADDR_EXPR, type1, *vecp);
+  *vecp = convert (type, *vecp);
+}
+}
+
 #include "gt-c-family-c-common.h"
Index: c-family/c-common.h
===
--- c-family/c-common.h (revision 186523)
+++ c-family/c-common.h (working copy)
@@ -1119,4 +1119,6 @@
 
 extern tree build_userdef_literal (tree suffix_id, tree value, tree num_string);
 
+extern void convert_vector_to_pointer_for_subscript (location_t, tree*, tree);
+
 #endif /* ! GCC_C_COMMON_H */
Index: testsuite/c-c++-common/vector-3.c
===
--- testsuite/c-c++-common/vector-3.c   (revision 186523)
+++ testsuite/c-c++-common/vector-3.c   (working copy)
@@ -2,4 +2,7 @@
 
 /* Check that we error out when using vector_size on the bool type. */
 
+#ifdef __cplusplus
+#define _Bool bool
+#endif
 __attribute__((vector_size(16) )) _Bool a; /* { dg-error "" } */
Index: testsuite/c-c++-common/vector-subscript-1.c
===
--- testsuite/c-c++-common/vector-subscript-1.c (revision 186523)
+++ testsuite/c-c++-common/vector-subscript-1.c (working copy)
@@ -6,7 +6,7 

Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-17 Thread Vladimir Makarov

On 04/17/2012 04:29 AM, Richard Sandiford wrote:

Vladimir Makarov  writes:

On the other hand, I don't think that 1st insn scheduling will be ever
used for x86.  And although the SPECFP2000 rate is the same on x86-64 I
saw that some SPECFP2000 tests benefit from your algorithm on x86-64
(one amazing difference is 70% improvement on swim on x86-64 although it
might be because of different reasons like alignment or cache
behaviour).  So I think the algorithm might work better on processors
with more registers.
Notwithstanding that this is a geomean, I assume there were some bad
results to cancel out the gain?
Yes, mgrid had 4% degradation, mesa had 2%, facerec and ammp had 2.5%, 
lucas had 15%.  On the other hand, galgel had 2% improvement and equake 
had 1%.  All other differences are not considerable.
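
To make the geomean point concrete, here is a small sketch with made-up ratios (not the SPEC numbers quoted above; geomean is a helper written for this example). A geometric mean multiplies the per-test ratios, so one large gain can be offset by a few moderate losses.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (new score / old score).
// Assumes a non-empty vector of strictly positive values.
double geomean(const std::vector<double>& ratios) {
  double log_sum = 0.0;
  for (double r : ratios)
    log_sum += std::log(r);
  return std::exp(log_sum / ratios.size());
}
```

For instance, hypothetical ratios of 4.0, 0.5 and 0.5 multiply to 1.0, so their geomean shows no overall change even though one test became four times faster.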


Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.


It is up to you, Richard.  I don't mind if you commit it into the trunk.

There is something in your approach too.  If it would be possible to get 
the best of the two methods, we could see a big improvement.




[v3] std::uninitialized_copy test

2012-04-17 Thread Benjamin De Kosnik

Found this bug in 4.4 branches, fixed with the following in later
branches:
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01616.html

But the test case is useful anyway.
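
The message does not spell out what went wrong on the 4.4 branches, so the following is only a sketch of why a type shaped like the one in the test is sensitive to how library code copies it (Tagged and the two helpers are hypothetical). With an explicit constructor template taking T&, copying from a non-const lvalue selects the template rather than the copy constructor.

```cpp
#include <cassert>

// Records which constructor built the object:
// 0 = default, 1 = copy constructor, 2 = constructor template.
struct Tagged {
  int how;
  Tagged() : how(0) {}
  Tagged(const Tagged&) : how(1) {}
  template <typename T>
  explicit Tagged(T&) : how(2) {}
};

// Source is const: the copy constructor and the template tie on
// conversions, and the non-template copy constructor is preferred.
int built_from_const() {
  const Tagged src;
  Tagged dst(src);
  return dst.how;
}

// Source is a non-const lvalue: the template deduces T = Tagged and
// binds Tagged& directly, which beats binding const Tagged&.
int built_from_nonconst() {
  Tagged src;
  Tagged dst(src);
  return dst.how;
}
```

In the test below, the template constructor stores a marker pointer, so any copy made through it instead of the copy constructor shows up as a mismatch.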

tested x86/linux

-benjamin

2012-04-16  Benjamin Kosnik

	* testsuite/20_util/specialized_algorithms/uninitialized_copy/
	808590.cc: New.


diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
new file mode 100644
index 000..7ccd8da
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <vector>
+#include <stdexcept>
+
+// 4.4.x only
+struct c 
+{
+  void *m;
+
+  c(void* o = 0) : m(o) {}
+  c(const c &r) : m(r.m) {}
+
+  template<typename T>
+explicit c(T &o) : m((void*)0xdeadfbeef) { }
+};
+
+int main() 
+{
+  std::vector<c> cbs;
+  const c cb((void*)0xcafebabe);
+
+  for (int fd = 62; fd < 67; ++fd) 
+{
+  cbs.resize(fd + 1);
+  cbs[fd] = cb;
+}
+
+  for (int fd = 62; fd< 67; ++fd)
+if (cb.m != cbs[fd].m)
+  throw std::runtime_error("wrong");
+  return 0;
+}


Re: [C++ Patch] PR 52599

2012-04-17 Thread Jason Merrill

OK.

Jason


Re: [v3, testsuite] Fix merging default libstdc++.log

2012-04-17 Thread Rainer Orth
Hi Mike,

> On Apr 16, 2012, at 8:03 AM, Rainer Orth wrote:
>> I've long noticed that libstdc++.log (unlike libstdc++.sum) doesn't
>> contain log entries for tests run from abi.exp, but hadn't looked
>> closer, getting used to check libstdc++.log.sep instead which contained
>> everything I expected.
>
>> ok for mainline?
>
> Ok.  Would have been nice to see the before and after log file...

the full thing is pretty boring, but the gist of the change can be seen
by diffing the output of dg-extract-results.sh -L libstdc++.log.sep:

--- 11-gcc.old/i386-pc-solaris2.11/libstdc++-v3/testsuite/libstdc++.log.dist
2012-04-17 18:59:05.777535114 +0200
+++ 11-gcc/i386-pc-solaris2.11/libstdc++-v3/testsuite/libstdc++.log.fixed   
2012-04-17 18:57:26.890396807 +0200
@@ -1,4 +1,4 @@
-Test Run By ro on Sat Apr 14 19:57:36 2012
+Test Run By ro on Sun Apr 15 22:11:21 2012
 Native configuration is i386-pc-solaris2.11
 
=== libstdc++ tests ===
@@ -15,8 +15,8 @@
 libgomp support detected
 set_ld_library_path_env_vars: 
ld_library_path=:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64
 LD_LIBRARY_PATH = 
:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libmudflap/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libssp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libitm/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/./gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/./prev-gcc

[differences due to tmp file names omitted ...]

@@ -54,6 +54,226 @@
 Setting LD_LIBRARY_PATH to 
:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64::/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libmudflap/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libssp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libitm/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/./gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/./prev-gcc
 spawn [open ...]
 
+    libstdc++-v3 check-abi Summary 
+
+# of added symbols: 0
+# of missing symbols:   0
+# of undesignated symbols:  0
+# of incompatible symbols:  0
+
+using: baseline_symbols.txt
+PASS: libstdc++-abi/abi_check
+testcase 
/vol/gcc/src/hg/trunk/solaris/libstdc++-v3/testsuite/libstdc++-abi/abi.exp 
completed in 35 seconds
+Running 
/vol/gcc/src/hg/trunk/solaris/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
 ...
+libgomp support detected
[rest of prettyprinters.exp tests omitted ...]
+testcase 
/vol/gcc/src/hg/trunk/solaris/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
 completed in 45 seconds
+
=== libstdc++ Summary for unix ===
 
 Running target unix/-m64

[similar change for 64-bit variant omitted...]

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Use target_alias in validate_failures.py

2012-04-17 Thread Rainer Orth
Diego Novillo  writes:

> On 4/16/12 7:32 AM, Rainer Orth wrote:
>
>> Btw., it occurred to me that it might be useful to add an option to
>> locate out-of-tree manifests.  I often have several source trees
>> (unmodified sources, ones with local patches) and would like to share
>> manifests between them.  While this can be achieved with symlinks, a
>> --manifest_dir or similar option might be an alternative.  Thoughts?
>
> That would be fantastic.  This is not the first time someone requests this,
> but I've never gotten around to implementing it.  The only thing there is
> that multiple manifests means that versioning needs to be handled
> externally to the script.  But that's not a big deal.

Indeed, but the advantage is that people can choose whatever naming
scheme they like for the different manifests instead of implementing
some (probably limited) scheme inside validate_failures.py.

I'll give it a whirl, but probably only in early May, once I return from
a trip to California.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [v3] std::uninitialized_copy test

2012-04-17 Thread Paolo Carlini

On 04/17/2012 06:52 PM, Benjamin De Kosnik wrote:

Found this bug in 4.4 branches, fixed with the following in later
branches:
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01616.html

But the test case is useful anyway.
Definitely, thanks! The name of the new testcase seems a bit weird (for 
the FSF branches): shall we maybe refer to the original (Fedora or RHEL 
Bugzilla, I suppose) PR in a comment and then use either an explicative 
name (our current practice) or just a small number for the name of the 
testcase itself?


Thanks again,
Paolo.


Re: [v3] std::uninitialized_copy test

2012-04-17 Thread Benjamin Kosnik

> Definitely, thanks! The name of the new testcase seems a bit weird
> (for the FSF branches): shall we maybe refer to the original (Fedora
> or RHEL Bugzilla, I suppose) PR in a comment and then use either an
> explicative name (our current practice) or just a small number for
> the name of the testcase itself?

Yes, agreed. I couldn't think of anything descriptive for this test,
but if you can please feel free to assign it something meaningful.

-benjamin


[PATCH, i386]: Fix PR 53020, another victim of IOR vs OR naming difference

2012-04-17 Thread Uros Bizjak
Hello!

Correct name of atomic "or" named pattern is "atomic_orM", not "atomic_iorM".

Attached patch fixes this oversight.

2012-04-17  Uros Bizjak  

PR target/53020
* config/i386/sync.md (atomic_<code><mode>): Rename to
atomic_<logic><mode>.

Patch was bootstrapped and tested on x86_64-pc-linux-gnu, will be
committed to all release branches.

Uros.
Index: config/i386/sync.md
===
--- config/i386/sync.md (revision 186539)
+++ config/i386/sync.md (working copy)
@@ -576,7 +576,7 @@
   return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
 })
 
-(define_insn "atomic_<code><mode>"
+(define_insn "atomic_<logic><mode>"
   [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(any_logic:SWI (match_dup 0)


[PATCH] Fix __builtin_powi placement (PR52976 follow-up)

2012-04-17 Thread William J. Schmidt
The emergency patch for PR52976 manipulated the operand rank system to
force inserted __builtin_powi calls to occur before uses of the call
results.  However, this is generally the wrong approach, as it forces
other computations to move unnecessarily, and extends the lifetimes of
other operands.

This patch fixes the problem in the proper way, by letting the rank
system determine where the __builtin_powi call belongs, and moving the
call to that location during the expression rewrite.
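
For context, a source-level sketch of the kind of rewrite involved (illustrative only; this is not what the pass emits, and the two function names are made up). Under -ffast-math, reassociation may express repeated multiplications as __builtin_powi calls, and each such call has to be placed before the statement that consumes its result.

```cpp
#include <cassert>

// Repeated multiplications written out by hand.
double product_expanded(double x, double y) {
  return x * x * x * y * y;
}

// The same product using GCC's __builtin_powi (double base, int exponent).
// Each powi result must be computed before the multiply that uses it.
double product_powi(double x, double y) {
  double x3 = __builtin_powi(x, 3);
  double y2 = __builtin_powi(y, 2);
  return x3 * y2;
}
```

For small integral inputs both forms give the same exact result; in general the powi form may round differently, which is why the transformation is limited to unsafe-math modes.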

Bootstrapped with no new regressions on powerpc64-linux.  SPEC cpu2000
and cpu2006 also build cleanly.  Ok for trunk?

Thanks,
Bill


gcc:

2012-04-17  Bill Schmidt  

* tree-ssa-reassoc.c (add_to_ops_vec_max_rank): Delete.
(possibly_move_powi): New function.
(rewrite_expr_tree): Call possibly_move_powi.
(rewrite_expr_tree_parallel): Likewise.
(attempt_builtin_powi): Change call of add_to_ops_vec_max_rank to
call add_to_ops_vec instead.


gcc/testsuite:

2012-04-17  Bill Schmidt  

* gfortran.dg/reassoc_11.f: New test.



Index: gcc/testsuite/gfortran.dg/reassoc_11.f
===
--- gcc/testsuite/gfortran.dg/reassoc_11.f  (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_11.f  (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math" }
+
+! This tests only for compile-time failure, which formerly occurred
+! when a __builtin_powi was introduced by reassociation in a bad place.
+
+  SUBROUTINE GRDURBAN(URBWSTR, ZIURB, GRIDHT)
+
+  IMPLICIT NONE
+  INTEGER :: I
+  REAL :: SW2, URBWSTR, ZIURB, GRIDHT(87)
+
+  SAVE 
+
+  SW2 = 1.6*(GRIDHT(I)/ZIURB)**0.667*URBWSTR**2
+
+  END
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 186495)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -544,28 +544,6 @@ add_repeat_to_ops_vec (VEC(operand_entry_t, heap)
   reassociate_stats.pows_encountered++;
 }
 
-/* Add an operand entry to *OPS for the tree operand OP, giving the
-   new entry a larger rank than any other operand already in *OPS.  */
-
-static void
-add_to_ops_vec_max_rank (VEC(operand_entry_t, heap) **ops, tree op)
-{
-  operand_entry_t oe = (operand_entry_t) pool_alloc (operand_entry_pool);
-  operand_entry_t oe1;
-  unsigned i;
-  unsigned max_rank = 0;
-
-  FOR_EACH_VEC_ELT (operand_entry_t, *ops, i, oe1)
-if (oe1->rank > max_rank)
-  max_rank = oe1->rank;
-
-  oe->op = op;
-  oe->rank = max_rank + 1;
-  oe->id = next_operand_entry_id++;
-  oe->count = 1;
-  VEC_safe_push (operand_entry_t, heap, *ops, oe);
-}
-
 /* Return true if STMT is reassociable operation containing a binary
operation with tree code CODE, and is inside LOOP.  */
 
@@ -2162,6 +2242,47 @@ remove_visited_stmt_chain (tree var)
 }
 }
 
+/* If OP is an SSA name, find its definition and determine whether it
+   is a call to __builtin_powi.  If so, move the definition prior to
+   STMT.  Only do this during early reassociation.  */
+
+static void
+possibly_move_powi (gimple stmt, tree op)
+{
+  gimple stmt2;
+  tree fndecl;
+  gimple_stmt_iterator gsi1, gsi2;
+
+  if (!first_pass_instance
+  || !flag_unsafe_math_optimizations
+  || TREE_CODE (op) != SSA_NAME)
+return;
+  
+  stmt2 = SSA_NAME_DEF_STMT (op);
+
+  if (!is_gimple_call (stmt2)
+  || !has_single_use (gimple_call_lhs (stmt2)))
+return;
+
+  fndecl = gimple_call_fndecl (stmt2);
+
+  if (!fndecl
+  || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return;
+
+  switch (DECL_FUNCTION_CODE (fndecl))
+{
+CASE_FLT_FN (BUILT_IN_POWI):
+  break;
+default:
+  return;
+}
+
+  gsi1 = gsi_for_stmt (stmt);
+  gsi2 = gsi_for_stmt (stmt2);
+  gsi_move_before (&gsi2, &gsi1);
+}
+
 /* This function checks three consequtive operands in
passed operands vector OPS starting from OPINDEX and
swaps two operands if it is profitable for binary operation
@@ -2267,6 +2388,8 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
  print_gimple_stmt (dump_file, stmt, 0, 0);
}
 
+ possibly_move_powi (stmt, oe1->op);
+ possibly_move_powi (stmt, oe2->op);
}
   return;
 }
@@ -2312,6 +2435,8 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
  fprintf (dump_file, " into ");
  print_gimple_stmt (dump_file, stmt, 0, 0);
}
+
+  possibly_move_powi (stmt, oe->op);
 }
   /* Recurse on the LHS of the binary operator, which is guaranteed to
  be the non-leaf side.  */
@@ -2485,6 +2610,9 @@ rewrite_expr_tree_parallel (gimple stmt, int width
  fprintf (dump_file, " into ");
  print_gimple_stmt (dump_file, stmts[i], 0, 0);
}
+
+  possibly_move_powi (stmts[i], op1);
+  possibly_move_powi (stmts[i], op2);
 }
 
   remove_visited_stmt_chain (last_rhs1);
@@ -3196,6 +3324,8 @@ attempt_builti

[PATCH, i386] V4DF __builtin_shuffle

2012-04-17 Thread Marc Glisse

Hello,

this patch expands __builtin_shuffle for V4DF mode in at most 3 insns. It
is simple and works really well, often generating only 2 insns. It is not
very generic, because other modes don't have an instruction equivalent to
vshufpd. For V8SF (and likely V4DI and V8SI with AVX2, but I still need to
do that), my "default case" patch in PR 52607 seems more interesting.


I tried calling this new function after expand_vec_perm_vperm2f128_vblend 
(instead of before as in the patch), but it generated more instructions 
for some permutations, and never less. That function is still useful for 
V8SF though.


I bootstrapped gcc on a non-avx platform, compiled a program that tests 
all 4096 shuffles with -mavx/-mavx2, and ran the result using Intel's 
emulator (SDE).


There are still a few V4DF permutations that don't generate an optimal
sequence (3 insns instead of 2), but not that many I think. Of course, I am
assuming a constant cost of 1 per insn, which is completely false, but 
seems like a sensible first approximation.
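
The decomposition can be checked with a scalar model. This is a sketch only: V4, Perm and the function names are invented for the example, and plain arrays stand in for the AVX registers. Two lane-selecting steps play the role of the vperm2f128 insns, and a final step picks one element from each pair, as vshufpd does.

```cpp
#include <array>
#include <cassert>

typedef std::array<double, 4> V4;
typedef std::array<int, 4> Perm;  // each index in 0..7 selects from (a, b)

// Reference semantics of a two-operand permutation.
V4 apply_perm(const V4& a, const V4& b, const Perm& p) {
  V4 out;
  for (int i = 0; i < 4; ++i)
    out[i] = p[i] < 4 ? a[p[i]] : b[p[i] - 4];
  return out;
}

// Same permutation in three steps, mirroring the index arithmetic of
// expand_vec_perm_2vperm2f128_vshuf in the patch.
V4 shuffle_in_three_steps(const V4& a, const V4& b, const Perm& p) {
  // Steps 1 and 2: gather the aligned 128-bit pairs holding the wanted
  // elements (even result slots into t1, odd result slots into t2).
  Perm first = {{p[0] & ~1, (p[0] & ~1) + 1, p[2] & ~1, (p[2] & ~1) + 1}};
  Perm second = {{p[1] & ~1, (p[1] & ~1) + 1, p[3] & ~1, (p[3] & ~1) + 1}};
  // Step 3: within each pair, pick the low or high element.
  Perm third = {{p[0] % 2, p[1] % 2 + 4, p[2] % 2 + 2, p[3] % 2 + 6}};
  V4 t1 = apply_perm(a, b, first);
  V4 t2 = apply_perm(a, b, second);
  return apply_perm(t1, t2, third);
}

// Exhaustively compare both forms over all 8^4 = 4096 index choices.
bool all_4096_shuffles_match() {
  const V4 a = {{0.0, 1.0, 2.0, 3.0}};
  const V4 b = {{4.0, 5.0, 6.0, 7.0}};
  for (int code = 0; code < 4096; ++code) {
    Perm p = {{code & 7, (code >> 3) & 7, (code >> 6) & 7, (code >> 9) & 7}};
    if (shuffle_in_three_steps(a, b, p) != apply_perm(a, b, p))
      return false;
  }
  return true;
}
```

The model says nothing about instruction cost; it only confirms that the index arithmetic covers every selector.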


(note that I can't commit)


2012-04-17  Marc Glisse  

PR target/52607
* config/i386/i386.c (ix86_expand_vec_perm_const): Move code to ...
(canonicalize_perm): ... new function.
(expand_vec_perm_2vperm2f128_vshuf): New function.
(ix86_expand_vec_perm_const_1): Call it.

--
Marc Glisse
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 186523)
+++ config/i386/i386.c  (working copy)
@@ -32946,6 +32946,7 @@
   bool testing_p;
 };
 
+static bool canonicalize_perm (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
 
@@ -37003,6 +37004,57 @@
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Implement a V4DF
+   permutation using two vperm2f128, followed by a vshufpd insn blending
+   the two vectors together.  */
+
+static bool
+expand_vec_perm_2vperm2f128_vshuf (struct expand_vec_perm_d *d)
+{
+  struct expand_vec_perm_d dfirst, dsecond, dthird;
+  bool ok;
+
+  if (!TARGET_AVX || (d->vmode != V4DFmode))
+return false;
+
+  if (d->testing_p)
+return true;
+
+  dfirst = *d;
+  dsecond = *d;
+  dthird = *d;
+
+  dfirst.perm[0] = (d->perm[0] & ~1);
+  dfirst.perm[1] = (d->perm[0] & ~1) + 1;
+  dfirst.perm[2] = (d->perm[2] & ~1);
+  dfirst.perm[3] = (d->perm[2] & ~1) + 1;
+  dsecond.perm[0] = (d->perm[1] & ~1);
+  dsecond.perm[1] = (d->perm[1] & ~1) + 1;
+  dsecond.perm[2] = (d->perm[3] & ~1);
+  dsecond.perm[3] = (d->perm[3] & ~1) + 1;
+  dthird.perm[0] = (d->perm[0] % 2);
+  dthird.perm[1] = (d->perm[1] % 2) + 4;
+  dthird.perm[2] = (d->perm[2] % 2) + 2;
+  dthird.perm[3] = (d->perm[3] % 2) + 6;
+
+  dfirst.target = gen_reg_rtx (dfirst.vmode);
+  dsecond.target = gen_reg_rtx (dsecond.vmode);
+  dthird.op0 = dfirst.target;
+  dthird.op1 = dsecond.target;
+  dthird.one_operand_p = false;
+
+  canonicalize_perm (&dfirst);
+  canonicalize_perm (&dsecond);
+
+  ok = expand_vec_perm_1 (&dfirst)
+   && expand_vec_perm_1 (&dsecond)
+   && expand_vec_perm_1 (&dthird);
+
+  gcc_assert (ok);
+
+  return true;
+}
+
 /* A subroutine of expand_vec_perm_even_odd_1.  Implement the double-word
permutation with two pshufb insns and an ior.  We should have already
failed all two instruction sequences.  */
@@ -37652,6 +37704,9 @@
 
   /* Try sequences of three instructions.  */
 
+  if (expand_vec_perm_2vperm2f128_vshuf (d))
+return true;
+
   if (expand_vec_perm_pshufb2 (d))
 return true;
 
@@ -37689,12 +37744,56 @@
   return false;
 }
 
+/* If a permutation only uses one operand, make it clear. Returns true
+   if the permutation references both operands.  */
+
+static bool
+canonicalize_perm (struct expand_vec_perm_d *d)
+{
+  int i, which, nelt = d->nelt;
+
+  for (i = which = 0; i < nelt; ++i)
+  which |= (d->perm[i] < nelt ? 1 : 2);
+
+  d->one_operand_p = true;
+  switch (which)
+{
+default:
+  gcc_unreachable();
+
+case 3:
+  if (!rtx_equal_p (d->op0, d->op1))
+{
+ d->one_operand_p = false;
+ break;
+}
+  /* The elements of PERM do not suggest that only the first operand
+is used, but both operands are identical.  Allow easier matching
+of the permutation by folding the permutation into the single
+input vector.  */
+  /* FALLTHRU */
+
+case 2:
+  for (i = 0; i < nelt; ++i)
+d->perm[i] &= nelt - 1;
+  d->op0 = d->op1;
+  break;
+
+case 1:
+  d->op1 = d->op0;
+  break;
+}
+
+  return (which == 3);
+}
+
 bool
 ix86_expand_vec_perm_const (rtx operands[4])
 {
   struct expand_vec_perm_d d;
   unsigned char perm[MAX_VECT_LEN];
-  int i, nelt, which;
+  int i, nelt;
+  bool two_args;
   rtx sel;
 
   d.target = operands[0];
@@ -37711,45 +37810,16 @@
   gcc_assert (XVECLEN (sel, 0) == nelt);
   gcc_ch

Re: RFA: Clean up ADDRESS handling in alias.c

2012-04-17 Thread H.J. Lu
On Sun, Apr 15, 2012 at 8:11 AM, Richard Sandiford
 wrote:
> The comment in alias.c says:
>
>   The contents of an ADDRESS is not normally used, the mode of the
>   ADDRESS determines whether the ADDRESS is a function argument or some
>   other special value.  Pointer equality, not rtx_equal_p, determines whether
>   two ADDRESS expressions refer to the same base address.
>
>   The only use of the contents of an ADDRESS is for determining if the
>   current function performs nonlocal memory memory references for the
>   purposes of marking the function as a constant function.  */
>
> The first paragraph is a bit misleading IMO.  AFAICT, rtx_equal_p has
> always given ADDRESS the full recursive treatment, rather than saying
> that pointer equality determines ADDRESS equality.  (This is in contrast
> to something like VALUE, where pointer equality is used.)  And AFAICT
> we've always had:
>
> static int
> base_alias_check (rtx x, rtx y, enum machine_mode x_mode,
>                  enum machine_mode y_mode)
> {
>  ...
>  /* If the base addresses are equal nothing is known about aliasing.  */
>  if (rtx_equal_p (x_base, y_base))
>    return 1;
>  ...
> }
>
> So I think the contents of an ADDRESS _are_ used to distinguish
> between different bases.
>
> The second paragraph ceased to be true in 2005 when the pure/const
> analysis moved to its own IPA pass.  Nothing now looks at the contents
> beyond rtx_equal_p.
>
> Also, base_alias_check effectively treats all arguments as a single base.
> That makes conceptual sense, because this analysis isn't strong enough
> to determine whether arguments are base values at all, never mind whether
> accesses based on different arguments conflict.  But the fact that we have
> a single base isn't obvious from the way the code is written, because we
> create several separate, non-rtx_equal_p, ADDRESSes to represent arguments.
> See:
>
>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>    /* Check whether this register can hold an incoming pointer
>       argument.  FUNCTION_ARG_REGNO_P tests outgoing register
>       numbers, so translate if necessary due to register windows.  */
>    if (FUNCTION_ARG_REGNO_P (OUTGOING_REGNO (i))
>        && HARD_REGNO_MODE_OK (i, Pmode))
>      static_reg_base_value[i]
>        = gen_rtx_ADDRESS (VOIDmode, gen_rtx_REG (Pmode, i));
>
> and:
>
>      /* Check for an argument passed in memory.  Only record in the
>         copying-arguments block; it is too hard to track changes
>         otherwise.  */
>      if (copying_arguments
>          && (XEXP (src, 0) == arg_pointer_rtx
>              || (GET_CODE (XEXP (src, 0)) == PLUS
>                  && XEXP (XEXP (src, 0), 0) == arg_pointer_rtx)))
>        return gen_rtx_ADDRESS (VOIDmode, src);
>
> I think it would be cleaner and less wasteful to use a single rtx for
> the single "base" (really "potential base").
>
> So if we wanted to, we could now remove the operand from ADDRESS and
> simply rely on pointer equality.  I'm a bit reluctant to do that though.
> It would make debugging harder, and it would mean either adding knowledge
> of this alias-specific code to other files (specifically rtl.c:rtx_equal_p),
> or adding special ADDRESS shortcuts to alias.c.  But I think the code
> would be more obvious if we replaced the rtx operand with a unique id,
> which is what we already use for the REG_NOALIAS case:
>
>      new_reg_base_value[regno] = gen_rtx_ADDRESS (Pmode,
>                                                   GEN_INT (unique_id++));
>
> And if we do that, we can make the id a direct operand of the ADDRESS,
> rather than a CONST_INT subrtx[*].  That should make rtx_equal_p cheaper too.
>
>  [*] I'm trying to get rid of CONST_INTs like these that have
>      no obvious mode.
>
> All of which led to the patch below.  I checked that it didn't change
> the code generated at -O2 for a recent set of cc1 .ii files.  Also
> bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?
>
> To cover my back: I'm just trying to rewrite the current code according
> to its current assumptions.  Whether those assumptions are correct or not
> is always open to debate...
>
> Richard
>
>
> gcc/
>        * rtl.def (ADDRESS): Turn operand into a HOST_WIDE_INT.
>        * alias.c (reg_base_value): Expand and update comment.
>        (arg_base_value): New variable.
>        (unique_id): Move up file.
>        (unique_base_value, unique_base_value_p, known_base_value_p): New.
>        (find_base_value): Use arg_base_value and known_base_value_p.
>        (record_set): Document REG_NOALIAS handling.  Use unique_base_value.
>        (find_base_term): Use known_base_value_p.
>        (base_alias_check): Use unique_base_value_p.
>        (init_alias_target): Initialize arg_base_value.  Use unique_base_value.
>        (init_alias_analysis): Use 1 as the first id for REG_NOALIAS bases.
>
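
For readers skimming the thread, here is a minimal sketch (not taken from the patch) of the idea in the quoted proposal: give each base a plain integer id, so that two bases are equal exactly when their ids match. Every name below (toy_address, toy_arg_base, toy_unique_base, toy_same_base) is hypothetical, and using id 0 for the shared "arguments" base is an assumption of the sketch. Only the choice of 1 as the first unique id comes from the ChangeLog above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an ADDRESS whose only operand is an integer id.  */
struct toy_address
{
  long id;
};

/* Assumption of this sketch: one shared base, with id 0, stands for
   every incoming argument.  */
static const struct toy_address toy_arg_base = { 0 };

/* Next id to hand out; the first unique id is 1.  */
static long toy_next_id = 1;

/* Create a base that compares unequal to every base created before it.  */
static struct toy_address
toy_unique_base (void)
{
  struct toy_address a;

  assert (toy_next_id > 0);
  a.id = toy_next_id++;
  return a;
}

/* Two bases are the same base exactly when their ids match; no
   recursive structural comparison is needed.  */
static bool
toy_same_base (struct toy_address x, struct toy_address y)
{
  return x.id == y.id;
}
```

In this model, comparing two bases is a single integer comparison, which is the sense in which the quoted text expects the equality test to get cheaper.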

This breaks bootstrap on Linux/x86:


home/regress/tbox/native/build/./gcc/xgcc
-B/home/regress/tbox/native/build/./gcc/
-

[google/google-main] Fix for unused variable warning in libgcov.c (issue6052049)

2012-04-17 Thread Teresa Johnson
I have a patch to fix a compile time warning about an unused variable due
to the use being guarded by #ifndef __GCOV_KERNEL__.
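
For illustration only, a minimal sketch of the pattern behind the warning. TOY_KERNEL, toy_cur_module_id and toy_next_module_id are hypothetical names, not libgcov identifiers. A file-scope static whose only use sits inside a guard has to be defined inside the same guard; otherwise builds that define the macro see a variable that is never used.

```c
#include <assert.h>

#ifndef TOY_KERNEL
/* Defined inside the guard because its only use is inside the guard.  */
static unsigned int toy_cur_module_id = 0;

/* Hand out the next module id.  Only compiled when TOY_KERNEL is
   not defined.  */
unsigned int
toy_next_module_id (void)
{
  toy_cur_module_id++;
  assert (toy_cur_module_id != 0);  /* Catch wrap-around.  */
  return toy_cur_module_id;
}
#endif
```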

Tested with bootstrap. Ok for google-main?

Teresa

2012-04-17   Teresa Johnson  

Google ref b/5910724.
* libgcc/libgcov.c (gcov_cur_module_id): Guard definition under
#ifndef __GCOV_KERNEL__.

Index: libgcov.c
===
--- libgcov.c   (revision 186282)
+++ libgcov.c   (working copy)
@@ -153,10 +153,10 @@ static gcov_unsigned_t gcov_crc32;
 
 /* Size of the longest file name. */
 static size_t gcov_max_filename = 0;
-#endif /* __GCOV_KERNEL__ */
 
 /* Unique identifier assigned to each module (object file).  */
 static gcov_unsigned_t gcov_cur_module_id = 0;
+#endif /* __GCOV_KERNEL__ */
 
 /* Pointer to the direct-call counters (per call-site counters).
Initialized by the caller.  */

--
This patch is available for review at http://codereview.appspot.com/6052049


Re: [patch] Remove strange case cost code

2012-04-17 Thread Xinliang David Li
On Tue, Apr 17, 2012 at 1:48 AM, Jan Hubicka  wrote:
>> > Note that it would make a lot of sense to teach this heuristic to predict.c
>> > and properly identify chars.
>>
>> Indeed this would be the proper place to implement this logic.
>
> To a degree - switch expansion needs more info than it can obtain from edge
> profile.  Given
> switch
>  case 1,3,5,7,8,9: aaa
>  case 2,4,6,8,10,12: bbb
> to produce a well-balanced decision tree, it is not enough to know how
> often the value is even and how often it is odd...

Why is that? In this case, the expanded switch does not use a BST;
it tests against bit patterns.
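
A rough sketch of what "testing against bit patterns" means here, written as plain C rather than as what GCC actually emits. The names (toy_target, TOY_MASK_AAA, TOY_MASK_BBB, toy_dispatch) are made up for the example. The quoted example lists 8 under both labels, so this sketch assumes 8 belongs only to the second one.

```c
#include <assert.h>

enum toy_target
{
  TOY_DEFAULT,
  TOY_AAA,
  TOY_BBB
};

/* One bit per case value.  Assumption: 8 is dropped from the first
   list so that the two sets are disjoint.  */
#define TOY_MASK_AAA \
  ((1u << 1) | (1u << 3) | (1u << 5) | (1u << 7) | (1u << 9))
#define TOY_MASK_BBB \
  ((1u << 2) | (1u << 4) | (1u << 6) | (1u << 8) | (1u << 10) | (1u << 12))

/* Pick the target with one range check and at most two mask tests,
   instead of walking a tree of comparisons.  */
static enum toy_target
toy_dispatch (unsigned int x)
{
  unsigned int bit;

  assert ((TOY_MASK_AAA & TOY_MASK_BBB) == 0);

  if (x > 12)
    return TOY_DEFAULT;

  bit = 1u << x;
  if (bit & TOY_MASK_AAA)
    return TOY_AAA;
  if (bit & TOY_MASK_BBB)
    return TOY_BBB;
  return TOY_DEFAULT;
}
```

With this shape the cost does not depend on which individual value occurs most often, only on which mask is tested first.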

>
> Thus there is a need for value histograms.

None of the existing value profilers is powerful enough for this,
though: the one_value profiler only tracks one value. The interval
profiler can potentially be used if the switch case range is small --
otherwise the runtime memory overhead will be too large.

Thanks,

David

>>
>> > Also it is possible to get histograms from profile feedback into
>> > switch expansion. I always wanted to do that once switch expansion code
>> > is cleaned up and moved to gimple level...
>>
>> Indeed.  At least the parts that expand switch stmts to (balanced) trees
>> should be moved to the GIMPLE level, retaining only the table-jump-like
>> expansions as switch stmts.
>
> Yep.
> Honza
>>
>> >>
>> >>
>> >> The attached patch removes the heuristic.
>> >>
>> >> Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>> >> Ciao!
>> >> Steven
>> >
>> >


Re: [google/google-main] Fix for unused variable warning in libgcov.c (issue6052049)

2012-04-17 Thread Xinliang David Li
ok.

David

On Tue, Apr 17, 2012 at 11:40 AM, Teresa Johnson  wrote:
> I have a patch to fix a compile time warning about an unused variable due
> to the use being guarded by #ifndef __GCOV_KERNEL__.
>
> Tested with bootstrap. Ok for google-main?
>
> Teresa
>
> 2012-04-17   Teresa Johnson  
>
>        Google ref b/5910724.
>        * libgcc/libgcov.c (gcov_cur_module_id): Guard definition under
>        #ifndef __GCOV_KERNEL__.
>
> Index: libgcov.c
> ===
> --- libgcov.c   (revision 186282)
> +++ libgcov.c   (working copy)
> @@ -153,10 +153,10 @@ static gcov_unsigned_t gcov_crc32;
>
>  /* Size of the longest file name. */
>  static size_t gcov_max_filename = 0;
> -#endif /* __GCOV_KERNEL__ */
>
>  /* Unique identifier assigned to each module (object file).  */
>  static gcov_unsigned_t gcov_cur_module_id = 0;
> +#endif /* __GCOV_KERNEL__ */
>
>  /* Pointer to the direct-call counters (per call-site counters).
>    Initialized by the caller.  */
>
> --
> This patch is available for review at http://codereview.appspot.com/6052049


Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Maxim Kuvyrkov
On 18/04/2012, at 2:32 AM, Ilya Enkovich wrote:

>> On Tue, Apr 17, 2012 at 3:16 PM, Uros Bizjak  wrote:
...
>> 
>> The patch looks OK to me in the sense, that there is no difference for
>> x86 targets.
>> 
>> So, OK for x86.
>> 
>> Thanks,
>> Uros.
> 
> Thanks, Uros!
> 
> Maxim, could you please look at the patch?

The Android parts of the patch are also good.  Given that Uros approved the x86 
pieces, you are clear to check in.

Ilya, thanks for bearing with us and reworking your patch after the reviews.
--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




Re: RFA: Clean up ADDRESS handling in alias.c

2012-04-17 Thread Richard Sandiford
"H.J. Lu"  writes:
> On Sun, Apr 15, 2012 at 8:11 AM, Richard Sandiford
>  wrote:
>> The comment in alias.c says:
>>
>>   The contents of an ADDRESS is not normally used, the mode of the
>>   ADDRESS determines whether the ADDRESS is a function argument or some
>>   other special value.  Pointer equality, not rtx_equal_p, determines whether
>>   two ADDRESS expressions refer to the same base address.
>>
>>   The only use of the contents of an ADDRESS is for determining if the
>>   current function performs nonlocal memory memory references for the
>>   purposes of marking the function as a constant function.  */
>>
>> The first paragraph is a bit misleading IMO.  AFAICT, rtx_equal_p has
>> always given ADDRESS the full recursive treatment, rather than saying
>> that pointer equality determines ADDRESS equality.  (This is in contrast
>> to something like VALUE, where pointer equality is used.)  And AFAICT
>> we've always had:
>>
>> static int
>> base_alias_check (rtx x, rtx y, enum machine_mode x_mode,
>>                  enum machine_mode y_mode)
>> {
>>  ...
>>  /* If the base addresses are equal nothing is known about aliasing.  */
>>  if (rtx_equal_p (x_base, y_base))
>>    return 1;
>>  ...
>> }
>>
>> So I think the contents of an ADDRESS _are_ used to distinguish
>> between different bases.
>>
>> The second paragraph ceased to be true in 2005 when the pure/const
>> analysis moved to its own IPA pass.  Nothing now looks at the contents
>> beyond rtx_equal_p.
>>
>> Also, base_alias_check effectively treats all arguments as a single base.
>> That makes conceptual sense, because this analysis isn't strong enough
>> to determine whether arguments are base values at all, never mind whether
>> accesses based on different arguments conflict.  But the fact that we have
>> a single base isn't obvious from the way the code is written, because we
>> create several separate, non-rtx_equal_p, ADDRESSes to represent arguments.
>> See:
>>
>>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>>    /* Check whether this register can hold an incoming pointer
>>       argument.  FUNCTION_ARG_REGNO_P tests outgoing register
>>       numbers, so translate if necessary due to register windows.  */
>>    if (FUNCTION_ARG_REGNO_P (OUTGOING_REGNO (i))
>>        && HARD_REGNO_MODE_OK (i, Pmode))
>>      static_reg_base_value[i]
>>        = gen_rtx_ADDRESS (VOIDmode, gen_rtx_REG (Pmode, i));
>>
>> and:
>>
>>      /* Check for an argument passed in memory.  Only record in the
>>         copying-arguments block; it is too hard to track changes
>>         otherwise.  */
>>      if (copying_arguments
>>          && (XEXP (src, 0) == arg_pointer_rtx
>>              || (GET_CODE (XEXP (src, 0)) == PLUS
>>                  && XEXP (XEXP (src, 0), 0) == arg_pointer_rtx)))
>>        return gen_rtx_ADDRESS (VOIDmode, src);
>>
>> I think it would be cleaner and less wasteful to use a single rtx for
>> the single "base" (really "potential base").
>>
>> So if we wanted to, we could now remove the operand from ADDRESS and
>> simply rely on pointer equality.  I'm a bit reluctant to do that though.
>> It would make debugging harder, and it would mean either adding knowledge
>> of this alias-specific code to other files (specifically rtl.c:rtx_equal_p),
>> or adding special ADDRESS shortcuts to alias.c.  But I think the code
>> would be more obvious if we replaced the rtx operand with a unique id,
>> which is what we already use for the REG_NOALIAS case:
>>
>>      new_reg_base_value[regno] = gen_rtx_ADDRESS (Pmode,
>>                                                   GEN_INT (unique_id++));
>>
>> And if we do that, we can make the id a direct operand of the ADDRESS,
>> rather than a CONST_INT subrtx[*].  That should make rtx_equal_p cheaper too.
>>
>>  [*] I'm trying to get rid of CONST_INTs like these that have
>>      no obvious mode.
>>
>> All of which led to the patch below.  I checked that it didn't change
>> the code generated at -O2 for a recent set of cc1 .ii files.  Also
>> bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?
>>
>> To cover my back: I'm just trying to rewrite the current code according
>> to its current assumptions.  Whether those assumptions are correct or not
>> is always open to debate...
>>
>> Richard
>>
>>
>> gcc/
>>        * rtl.def (ADDRESS): Turn operand into a HOST_WIDE_INT.
>>        * alias.c (reg_base_value): Expand and update comment.
>>        (arg_base_value): New variable.
>>        (unique_id): Move up file.
>>        (unique_base_value, unique_base_value_p, known_base_value_p): New.
>>        (find_base_value): Use arg_base_value and known_base_value_p.
>>        (record_set): Document REG_NOALIAS handling.  Use unique_base_value.
>>        (find_base_term): Use known_base_value_p.
>>        (base_alias_check): Use unique_base_value_p.
>>        (init_alias_target): Initialize arg_base_value.  Use unique_base_value.
>>        (init_alias_analysis): Use 1 as the first id for REG_NOALIAS

Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
> On Tue, Apr 17, 2012 at 1:48 AM, Jan Hubicka  wrote:
> >> > Note that it would make a lot of sense to teach this heuristic to predict.c
> >> > and properly identify chars.
> >>
> >> Indeed this would be the proper place to implement this logic.
> >
> > To a degree - switch expansion needs more info than it can obtain from edge
> > profile.  Given
> > switch
> >  case 1,3,5,7,8,9: aaa
> >  case 2,4,6,8,10,12: bbb
> > to produce a well-balanced decision tree, it is not enough to know how
> > often the value is even and how often it is odd...
> 
> Why is that? In this case, the expanded switch does not use a BST;
> it tests against bit patterns.

Yep, oversimplified example... 
> 
> >
> > Thus there is a need for value histograms.
> 
> None of the existing value profilers is powerful enough for this,
> though: the one_value profiler only tracks one value. The interval
> profiler can potentially be used if the switch case range is small --
> otherwise the runtime memory overhead will be too large.

Adding a profiler for individual value ranges is not that hard...
But indeed, at the moment we only have the single-value profiler...
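
As a hedged sketch of what such a range profiler could look like (this is not an existing GCC profiler, and toy_range_counter and toy_profile_value are invented names): keep one counter per case range, sorted by lower bound, and bump the matching counter with a binary search. Memory then grows with the number of case ranges rather than with the span of values they cover.

```c
#include <assert.h>
#include <stddef.h>

/* One counter per case range [low, high], bounds inclusive.  */
struct toy_range_counter
{
  long low;
  long high;
  unsigned long count;
};

/* RANGES holds N disjoint ranges sorted by ascending LOW.  Bump the
   counter of the range containing VALUE and return its index, or
   return -1 if no range contains VALUE (the default label).  */
static int
toy_profile_value (struct toy_range_counter *ranges, size_t n, long value)
{
  size_t lo = 0, hi = n;

  assert (ranges != NULL || n == 0);

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (value < ranges[mid].low)
        hi = mid;
      else if (value > ranges[mid].high)
        lo = mid + 1;
      else
        {
          ranges[mid].count++;
          return (int) mid;
        }
    }
  return -1;
}
```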

Honza


[PING] iwMMXt patches

2012-04-17 Thread Matt Turner
Are these patches ready to go in? It looks like they were ack'd.

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html

We (OLPC) will need these patches for reasonable iwMMXt performance
and the ability to use VFP and iwMMXt together.

Thanks,
Matt


Wider modes for partial int modes

2012-04-17 Thread Bernd Schmidt
This patch enables GET_MODE_WIDER_MODE for MODE_PARTIAL_INT (by setting
the wider mode to the one the partial mode is based on), which is useful
for the port I'm working on: I can avoid defining operations on the
partial modes. Also, convert_modes is changed so that unsignedp is taken
into account when widening partial modes.

I've tested this on m32c-elf as well as on my port, and bootstrapped on
i686-linux. Ok?
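
To make the unsignedp point concrete, here is a small stand-alone sketch in plain C, not GCC code. toy_widen_partial is a hypothetical helper and the 20-bit width in the usage note is only an example. It widens a value held in the low PRECISION bits to 32 bits, zero-extending when UNSIGNEDP is nonzero and sign-extending otherwise, which is the choice between zext_optab and sext_optab that the patch makes.

```c
#include <assert.h>
#include <stdint.h>

/* Widen the low PRECISION bits of VALUE to a full 32-bit value.
   Zero-extend if UNSIGNEDP is nonzero, otherwise sign-extend.  */
static uint32_t
toy_widen_partial (uint32_t value, unsigned int precision, int unsignedp)
{
  uint32_t mask, sign;

  assert (precision > 0 && precision < 32);

  mask = (UINT32_C (1) << precision) - 1;
  value &= mask;
  if (unsignedp)
    return value;

  /* Standard sign-extension trick: flip the sign bit, then subtract
     it again so that the upper bits fill with copies of it.  */
  sign = UINT32_C (1) << (precision - 1);
  return (value ^ sign) - sign;
}
```

For a 20-bit value with all bits set, the unsigned widening gives 0x000FFFFF while the signed one gives 0xFFFFFFFF, so always using the sign-extending pattern would be wrong for unsigned conversions.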


Bernd
* machmode.h (CLASS_HAS_WIDER_MODES_P): True for MODE_PARTIAL_INT.
* expr.c (convert_move): Honor unsignedp when extending partial int
modes.
* genmodes.c (power_of_two_p, regular_mode, make_complex_modes,
emit_mode_wider): Revert Spider hacks.
(complete_mode): Don't clear component field of partial int modes.
(emit_mode_inner): Don't emit it however.
(calc_wider_mode): Partial int modes widen to their component.

Index: machmode.h
===
--- machmode.h  (revision 186270)
+++ machmode.h  (working copy)
@@ -166,6 +166,7 @@ extern const unsigned char mode_class[NU
 /* Nonzero if CLASS modes can be widened.  */
 #define CLASS_HAS_WIDER_MODES_P(CLASS) \
   (CLASS == MODE_INT                   \
+   || CLASS == MODE_PARTIAL_INT        \
    || CLASS == MODE_FLOAT              \
    || CLASS == MODE_DECIMAL_FLOAT      \
    || CLASS == MODE_COMPLEX_FLOAT      \
Index: expr.c
===
--- expr.c  (revision 186270)
+++ expr.c  (working copy)
@@ -438,21 +438,20 @@ convert_move (rtx to, rtx from, int unsi
   rtx new_from;
   enum machine_mode full_mode
= smallest_mode_for_size (GET_MODE_BITSIZE (from_mode), MODE_INT);
+  convert_optab ctab = unsignedp ? zext_optab : sext_optab;
+  enum insn_code icode;
 
-  gcc_assert (convert_optab_handler (sext_optab, full_mode, from_mode)
- != CODE_FOR_nothing);
+  icode = convert_optab_handler (ctab, full_mode, from_mode);
+  gcc_assert (icode != CODE_FOR_nothing);
 
   if (to_mode == full_mode)
{
- emit_unop_insn (convert_optab_handler (sext_optab, full_mode,
-from_mode),
- to, from, UNKNOWN);
+ emit_unop_insn (icode, to, from, UNKNOWN);
  return;
}
 
   new_from = gen_reg_rtx (full_mode);
-  emit_unop_insn (convert_optab_handler (sext_optab, full_mode, from_mode),
- new_from, from, UNKNOWN);
+  emit_unop_insn (icode, new_from, from, UNKNOWN);
 
   /* else proceed to integer conversions below.  */
   from_mode = full_mode;
Index: genmodes.c
===
--- genmodes.c  (revision 186270)
+++ genmodes.c  (working copy)
@@ -360,7 +360,6 @@ complete_mode (struct mode_data *m)
   m->bytesize = m->component->bytesize;
 
   m->ncomponents = 1;
-  m->component = 0;  /* ??? preserve this */
   break;
 
 case MODE_COMPLEX_INT:
@@ -823,7 +822,13 @@ calc_wider_mode (void)
 
  sortbuf[i] = 0;
  for (j = 0; j < i; j++)
-   sortbuf[j]->next = sortbuf[j]->wider = sortbuf[j + 1];
+   {
+ sortbuf[j]->next = sortbuf[j + 1];
+ if (c == MODE_PARTIAL_INT)
+   sortbuf[j]->wider = sortbuf[j]->component;
+ else
+   sortbuf[j]->wider = sortbuf[j]->next;
+   }
 
  modes[c] = sortbuf[0];
}
@@ -1120,7 +1125,8 @@ emit_mode_inner (void)
 
   for_all_modes (c, m)
 tagged_printf ("%smode",
-  m->component ? m->component->name : void_mode->name,
+  c != MODE_PARTIAL_INT && m->component
+  ? m->component->name : void_mode->name,
   m->name);
 
   print_closer ();


[patch] Cleanup tree-switch-conversion a bit

2012-04-17 Thread Steven Bosscher
Hello,

This is another step towards moving GIMPLE_SWITCH expansion to an
earlier point in the pipeline.

With the attached patch, some of the logic from stmt.c:add_case_node()
is moved to gimplify.c:gimplify_switch_expr(). This includes:

* Code to drop case labels that are out of range for the switch index
expression. (Actually, I suspect this code hasn't worked properly
since gimplification was introduced, because the switch index
expression can be promoted by language specific gimplification, so
expand_case never actually sees the proper type with the current
implementation in stmt.c.)

* Code to fold_convert case label values to the right type. I've opted
to go for folding to the original type of the SWITCH_EXPR, rather than
to the post-gimplification switch index type.

* Code to canonicalize CASE_LABEL's subnodes, CASE_LOW and CASE_HIGH.
I've chosen to impose strict requirements that CASE_HIGH > CASE_LOW if
CASE_HIGH is non-zero. This is different from what add_case_node does,
but I think it makes sense to go for the minimal representation here:
The case labels in stmt.c never lived very long (only during expand)
but GIMPLE_SWITCH statements stay around for much of the compilation
process and can also be streamed out, etc.
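
The three steps above can be sketched on plain integers as follows. This is only a toy model, not the gimplify.c code: toy_case_label and toy_canonicalize_case are invented names, a has_high flag stands in for a null CASE_HIGH, and dropping an empty range (high below low) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy case label: a single value when HAS_HIGH is false, otherwise
   the inclusive range [LOW, HIGH].  */
struct toy_case_label
{
  long low;
  long high;
  bool has_high;
};

/* Canonicalize C against an index type with range [TYPE_MIN, TYPE_MAX].
   Return false if the label can never match and should be dropped.
   On success the bounds are clamped to the type, and HAS_HIGH is set
   only if HIGH is strictly greater than LOW.  */
static bool
toy_canonicalize_case (struct toy_case_label *c, long type_min, long type_max)
{
  long low = c->low;
  long high = c->has_high ? c->high : c->low;

  assert (type_min <= type_max);

  /* Assumption: an empty range is simply dropped.  */
  if (high < low)
    return false;

  /* Entirely outside the index type: unreachable.  */
  if (high < type_min || low > type_max)
    return false;

  /* Partially outside: truncate to the index type.  */
  if (low < type_min)
    low = type_min;
  if (high > type_max)
    high = type_max;

  c->low = low;
  c->has_high = high > low;
  c->high = c->has_high ? high : 0;
  return true;
}
```

The point of the strict HIGH > LOW rule is that a one-value label has exactly one representation, whichever way it was originally written.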

Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?

Ciao!
Steven
* targhooks.c (default_case_values_threshold): Fix code style nit.

* stmt.c (add_case_node, expand_case): Move logic to remove/reduce
case range and type folding from here...
* gimplify.c (gimplify_switch_expr): ... to here.

Index: targhooks.c
===
--- targhooks.c (revision 186526)
+++ targhooks.c (working copy)
@@ -1200,7 +1200,8 @@ default_target_can_inline_p (tree caller, tree cal
this means extra overhead for dispatch tables, which raises the
threshold for using them.  */
 
-unsigned int default_case_values_threshold (void)
+unsigned int
+default_case_values_threshold (void)
 {
   return (HAVE_casesi ? 4 : 5);
 }
Index: tree-switch-conversion.c
===
--- tree-switch-conversion.c(revision 186526)
+++ tree-switch-conversion.c(working copy)
@@ -24,8 +24,8 @@ Software Foundation, 51 Franklin Street, Fifth Flo
  Switch initialization conversion
 
 The following pass changes simple initializations of scalars in a switch
-statement into initializations from a static array.  Obviously, the values must
-be constant and known at compile time and a default branch must be
+statement into initializations from a static array.  Obviously, the values
+must be constant and known at compile time and a default branch must be
 provided.  For example, the following code:
 
 int a,b;
@@ -162,16 +162,12 @@ struct switch_conv_info
   basic_block bit_test_bb[2];
 };
 
-/* Global pass info.  */
-static struct switch_conv_info info;
-
-
 /* Checks whether the range given by individual case statements of the SWTCH
switch statement isn't too big and whether the number of branches actually
satisfies the size of the new array.  */
 
 static bool
-check_range (gimple swtch)
+check_range (gimple swtch, struct switch_conv_info *info)
 {
   tree min_case, max_case;
   unsigned int branch_num = gimple_switch_num_labels (swtch);
@@ -181,7 +177,7 @@ static bool
  is a default label which is the first in the vector.  */
 
   min_case = gimple_switch_label (swtch, 1);
-  info.range_min = CASE_LOW (min_case);
+  info->range_min = CASE_LOW (min_case);
 
   gcc_assert (branch_num > 1);
   gcc_assert (CASE_LOW (gimple_switch_label (swtch, 0)) == NULL_TREE);
@@ -191,22 +187,22 @@ static bool
   else
 range_max = CASE_LOW (max_case);
 
-  gcc_assert (info.range_min);
+  gcc_assert (info->range_min);
   gcc_assert (range_max);
 
-  info.range_size = int_const_binop (MINUS_EXPR, range_max, info.range_min);
+  info->range_size = int_const_binop (MINUS_EXPR, range_max, info->range_min);
 
-  gcc_assert (info.range_size);
-  if (!host_integerp (info.range_size, 1))
+  gcc_assert (info->range_size);
+  if (!host_integerp (info->range_size, 1))
 {
-  info.reason = "index range way too large or otherwise unusable.\n";
+  info->reason = "index range way too large or otherwise unusable";
   return false;
 }
 
-  if ((unsigned HOST_WIDE_INT) tree_low_cst (info.range_size, 1)
+  if ((unsigned HOST_WIDE_INT) tree_low_cst (info->range_size, 1)
   > ((unsigned) branch_num * SWITCH_CONVERSION_BRANCH_RATIO))
 {
-  info.reason = "the maximum range-branch ratio exceeded.\n";
+  info->reason = "the maximum range-branch ratio exceeded";
   return false;
 }
 
@@ -219,7 +215,7 @@ static bool
and returns true.  Otherwise returns false.  */
 
 static bool
-check_process_case (tree cs)
+check_process_case (tree cs, struct switch_conv_info *info)
 {
   tree ldecl;
   basic_block label_bb, following_bb;
@@ -228,48 +224,48

[patch] Move add_case_node logic from stmt.c to gimplify.c

2012-04-17 Thread Steven Bosscher
On Wed, Apr 18, 2012 at 12:04 AM, Steven Bosscher  wrote:
> Hello,
>
> This is another step towards moving GIMPLE_SWITCH expansion to an
> earlier point in the pipeline.
>
> With the attached patch, some of the logic from stmt.c:add_case_node()
> is moved to gimplify.c:gimplify_switch_expr(). This includes:
>
> * Code to drop case labels that are out of range for the switch index
> expression. (Actually, I suspect this code hasn't worked properly
> since gimplification was introduced, because the switch index
> expression can be promoted by language specific gimplification, so
> expand_case never actually sees the proper type with the current
> implementation in stmt.c.)
>
> * Code to fold_convert case label values to the right type. I've opted
> to go for folding to the original type of the SWITCH_EXPR, rather than
> to the post-gimplification switch index type.
>
> * Code to canonicalize CASE_LABEL's subnodes, CASE_LOW and CASE_HIGH.
> I've chosen to impose strict requirements that CASE_HIGH > CASE_LOW if
> CASE_HIGH is non-zero. This is different from what add_case_node does,
> but I think it makes sense to go for the minimal representation here:
> The case labels in stmt.c never lived very long (only during expand)
> but GIMPLE_SWITCH statements stay around for much of the compilation
> process and can also be streamed out, etc.
>
> Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?
>
> Ciao!
> Steven

And this time with the right subject and the right patch attached.
Sorry for the inconvenience!
* targhooks.c (default_case_values_threshold): Fix code style nit.

* stmt.c (add_case_node, expand_case): Move logic to remove/reduce
case range and type folding from here...
* gimplify.c (gimplify_switch_expr): ... to here.

Index: targhooks.c
===
--- targhooks.c (revision 186526)
+++ targhooks.c (working copy)
@@ -1200,7 +1200,8 @@ default_target_can_inline_p (tree caller, tree cal
this means extra overhead for dispatch tables, which raises the
threshold for using them.  */
 
-unsigned int default_case_values_threshold (void)
+unsigned int
+default_case_values_threshold (void)
 {
   return (HAVE_casesi ? 4 : 5);
 }
Index: stmt.c
===
--- stmt.c  (revision 186526)
+++ stmt.c  (working copy)
@@ -1822,66 +1822,25 @@ expand_stack_restore (tree var)
fed to us in descending order from the sorted vector of case labels used
in the tree part of the middle end.  So the list we construct is
sorted in ascending order.  The bounds on the case range, LOW and HIGH,
-   are converted to case's index type TYPE.  */
+   are converted to case's index type TYPE.  Note that the original type
+   of the case index in the source code is usually "lost" during
+   gimplification due to type promotion, but the case labels retain the
+   original type.  */
 
 static struct case_node *
 add_case_node (struct case_node *head, tree type, tree low, tree high,
tree label, alloc_pool case_node_pool)
 {
-  tree min_value, max_value;
   struct case_node *r;
 
-  gcc_assert (TREE_CODE (low) == INTEGER_CST);
-  gcc_assert (!high || TREE_CODE (high) == INTEGER_CST);
+  gcc_checking_assert (low);
+  gcc_checking_assert (! high || (TREE_TYPE (low) == TREE_TYPE (high)));
 
-  min_value = TYPE_MIN_VALUE (type);
-  max_value = TYPE_MAX_VALUE (type);
-
-  /* If there's no HIGH value, then this is not a case range; it's
- just a simple case label.  But that's just a degenerate case
- range.
- If the bounds are equal, turn this into the one-value case.  */
-  if (!high || tree_int_cst_equal (low, high))
-{
-  /* If the simple case value is unreachable, ignore it.  */
-  if ((TREE_CODE (min_value) == INTEGER_CST
-&& tree_int_cst_compare (low, min_value) < 0)
- || (TREE_CODE (max_value) == INTEGER_CST
- && tree_int_cst_compare (low, max_value) > 0))
-   return head;
-  low = fold_convert (type, low);
-  high = low;
-}
-  else
-{
-  /* If the entire case range is unreachable, ignore it.  */
-  if ((TREE_CODE (min_value) == INTEGER_CST
-&& tree_int_cst_compare (high, min_value) < 0)
- || (TREE_CODE (max_value) == INTEGER_CST
- && tree_int_cst_compare (low, max_value) > 0))
-   return head;
-
-  /* If the lower bound is less than the index type's minimum
-value, truncate the range bounds.  */
-  if (TREE_CODE (min_value) == INTEGER_CST
-&& tree_int_cst_compare (low, min_value) < 0)
-   low = min_value;
-  low = fold_convert (type, low);
-
-  /* If the upper bound is greater than the index type's maximum
-value, truncate the range bounds.  */
-  if (TREE_CODE (max_value) == INTEGER_CST
- && tree_int_cst_compare (high, max_value) > 0)
-   high = max_value;
- 

[committed] avoid @opindex before @item in invoke.texi

2012-04-17 Thread Manuel López-Ibáñez
Otherwise, it starts a new paragraph.  Tested by inspecting the
resulting html. Committed as obvious.

Cheers,

Manuel.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 186552)
+++ gcc/doc/invoke.texi (working copy)
@@ -2875,8 +2875,8 @@
 line-wrapping is done; each error message appears on a single
 line.

+@item -fdiagnostics-show-location=once
 @opindex fdiagnostics-show-location
-@item -fdiagnostics-show-location=once
 Only meaningful in line-wrapping mode.  Instructs the diagnostic messages
 reporter to emit @emph{once} source location information; that is, in
 case the message is too long to fit on a single physical line and has to
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 186552)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2012-04-18  Manuel López-Ibáñez  
+
+* doc/invoke.texi (Language Independent Options): @item should be
+   before @opindex.
+


various minor obvious fixes in the track-macro-expansion code

2012-04-17 Thread Manuel López-Ibáñez
Hi Dodji,

I was going to commit this as obvious, but I want to make sure that it
doesn't conflict with your new track-macro-expansion patches. It can
also wait until you commit all your patches.

Cheers,

Manuel.

2012-04-18  Manuel López-Ibáñez  

* tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Fix
comment. Delete unused parameter first_exp_point_map.
(virt_loc_aware_diagnostic_finalizer): Update call.
libcpp/
* line-map.c (linemap_resolve_location): Synchronize comments with
those in line-map.h.
* include/line-map.h (linemap_resolve_location): Fix spelling in
comment.


macro-fixes.diff
Description: Binary data


New Vietnamese PO file for 'cpplib' (version 4.7.0)

2012-04-17 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/vi.po

(This file, 'cpplib-4.7.0.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-4.7.0.vi.po'

2012-04-17 Thread Translation Project Robot


cpplib-4.7.0.vi.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



Re: [PATCH, Android] MIPS support

2012-04-17 Thread Maxim Kuvyrkov
On 5/04/2012, at 10:16 AM, Maxim Kuvyrkov wrote:

> Chao,
> 
> Let's take discussion of MIPS changes to gcc-patches@.  Please follow up here.
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 
> On 5/04/2012, at 10:10 AM, Fu, Chao-Ying wrote:
> 
>> For now, two MIPS changes in gnu-user.h and unwind-dw2-fde-dip.c can be 
>> posted for comment.
>> (I haven't tested this patch, though.)

You need to test your patches before posting them for review.  Below are a 
couple of comments on your current version.

>> After starting to build toolchains for Android with Bionic, we may find new
>> files to patch.  E.g., comment out getpagesize() for Bionic.
>> 
>> Any comment?  Thanks a lot!
>> 
>> Regards,
>> Chao-ying
>> 
>> Index: gcc/gcc/config/mips/gnu-user.h
>> ===
>> --- gcc.orig/gcc/config/mips/gnu-user.h  2012-04-03 17:39:50.0 -0700
>> +++ gcc/gcc/config/mips/gnu-user.h   2012-04-04 14:31:50.804236000 -0700
>> @@ -45,8 +45,8 @@ along with GCC; see the file COPYING3.  
>> /* A standard GNU/Linux mapping.  On most targets, it is included in
>>   CC1_SPEC itself by config/linux.h, but mips.h overrides CC1_SPEC
>>   and provides this hook instead.  */
>> -#undef SUBTARGET_CC1_SPEC
>> -#define SUBTARGET_CC1_SPEC "%{profile:-p}"
>> +#undef GNU_USER_SUBTARGET_CC1_SPEC
>> +#define GNU_USER_SUBTARGET_CC1_SPEC "%{profile:-p}"
>> 
>> /* -G is incompatible with -KPIC which is the default, so only allow objects
>>   in the small data section if the user explicitly asks for it.  */
>> @@ -54,8 +54,8 @@ along with GCC; see the file COPYING3.  
>> #define MIPS_DEFAULT_GVALUE 0
>> 
>> /* Borrowed from sparc/linux.h */
>> -#undef LINK_SPEC
>> -#define LINK_SPEC \
>> +#undef GNU_USER_TARGET_LINK_SPEC
>> +#define GNU_USER_TARGET_LINK_SPEC \
>> "%(endian_spec) \
>>  %{shared:-shared} \
>>  %{!shared: \
>> @@ -89,8 +89,8 @@ along with GCC; see the file COPYING3.  
>> #undef ASM_OUTPUT_REG_PUSH
>> #undef ASM_OUTPUT_REG_POP
>> 
>> -#undef LIB_SPEC
>> -#define LIB_SPEC "\
>> +#undef GNU_USER_TARGET_LIB_SPEC
>> +#define GNU_USER_TARGET_LIB_SPEC "\
>> %{pthread:-lpthread} \
>> %{shared:-lc} \
>> %{!shared: \
>> @@ -133,7 +133,34 @@ extern const char *host_detect_local_cpu
>>  LINUX_DRIVER_SELF_SPECS
>> 
>> /* Similar to standard Linux, but adding -ffast-math support.  */
>> -#undef  ENDFILE_SPEC
>> -#define ENDFILE_SPEC \
>> +#undef  GNU_USER_TARGET_ENDFILE_SPEC
>> +#define GNU_USER_TARGET_ENDFILE_SPEC \
>>  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
>>   %{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s"

Above definitions are OK.

>> +
>> +#undef  LINK_SPEC
>> +#define LINK_SPEC   \
>> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LINK_SPEC,   \
>> +   GNU_USER_TARGET_LINK_SPEC " " ANDROID_LINK_SPEC)
>> +
>> +#undef  SUBTARGET_CC1_SPEC
>> +#define SUBTARGET_CC1_SPEC  \
>> +  LINUX_OR_ANDROID_CC (GNU_USER_SUBTARGET_CC1_SPEC, \
>> +   GNU_USER_SUBTARGET_CC1_SPEC " " ANDROID_CC1_SPEC)
>> +
>> +#undef  CC1PLUS_SPEC
>> +#define CC1PLUS_SPEC \
>> +  LINUX_OR_ANDROID_CC ("", ANDROID_CC1PLUS_SPEC)
>> +
>> +#undef  LIB_SPEC
>> +#define LIB_SPEC\
>> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LIB_SPEC,\
>> +   GNU_USER_TARGET_LIB_SPEC " " ANDROID_LIB_SPEC)
>> +
>> +#undef  STARTFILE_SPEC
>> +#define STARTFILE_SPEC \
>> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_STARTFILE_SPEC, ANDROID_STARTFILE_SPEC)
>> +
>> +#undef  ENDFILE_SPEC
>> +#define ENDFILE_SPEC \
>> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_ENDFILE_SPEC, ANDROID_ENDFILE_SPEC)

The LINUX_OR_ANDROID_* definitions should be moved out of gnu-user.h, as this 
header is used for systems besides Linux, e.g., kFreeBSD and Hurd.  Please move 
these definitions to a new file, mips/linux-common.h, similar to what i386 did 
in http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html .

>> Index: gcc/libgcc/unwind-dw2-fde-dip.c
>> ===
>> --- gcc.orig/libgcc/unwind-dw2-fde-dip.c 2012-04-03 17:07:28.0 -0700
>> +++ gcc/libgcc/unwind-dw2-fde-dip.c  2012-04-04 14:51:01.338074000 -0700
>> @@ -48,8 +48,9 @@
>> #include "gthr.h"
>> 
>> #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
>> -&& (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
>> -|| (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
>> +&& ((defined(__BIONIC__) && (defined(mips) || defined(__mips__))) \
>> +|| (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
>> +|| (__GLIBC__ == 

RE: [PATCH, Android] MIPS support

2012-04-17 Thread Fu, Chao-Ying
Maxim Kuvyrkov wrote:

> > 
> >> For now, two MIPS changes in gnu-user.h and 
> unwind-dw2-fde-dip.c can be posted for comment.
> >> (I didn't test this patch, though.)
> 
> You need to test your patches before posting them for review. 
>  Below are a couple of comments on your current version.

  I can test whether this patch breaks the existing MIPS Linux GCC build.

> 
> >> After starting to build toolchains for Android with 
> Bionic, we may find new files to
> >> patch.  Ex: Comment out getpagesize() for bionic.
> >> 
> >> Any comment?  Thanks a lot!
> >> 
> >> Regards,
> >> Chao-ying
> >> 
> >> Index: gcc/gcc/config/mips/gnu-user.h
> >> ===
> >> --- gcc.orig/gcc/config/mips/gnu-user.h2012-04-03 
> 17:39:50.0 -0700
> >> +++ gcc/gcc/config/mips/gnu-user.h 2012-04-04 
> 14:31:50.804236000 -0700
> >> @@ -45,8 +45,8 @@ along with GCC; see the file COPYING3.  
> >> /* A standard GNU/Linux mapping.  On most targets, it is 
> included in
> >>   CC1_SPEC itself by config/linux.h, but mips.h overrides CC1_SPEC
> >>   and provides this hook instead.  */
> >> -#undef SUBTARGET_CC1_SPEC
> >> -#define SUBTARGET_CC1_SPEC "%{profile:-p}"
> >> +#undef GNU_USER_SUBTARGET_CC1_SPEC
> >> +#define GNU_USER_SUBTARGET_CC1_SPEC "%{profile:-p}"
> >> 
> >> /* -G is incompatible with -KPIC which is the default, so 
> only allow objects
> >>   in the small data section if the user explicitly asks for it.  */
> >> @@ -54,8 +54,8 @@ along with GCC; see the file COPYING3.  
> >> #define MIPS_DEFAULT_GVALUE 0
> >> 
> >> /* Borrowed from sparc/linux.h */
> >> -#undef LINK_SPEC
> >> -#define LINK_SPEC \
> >> +#undef GNU_USER_TARGET_LINK_SPEC
> >> +#define GNU_USER_TARGET_LINK_SPEC \
> >> "%(endian_spec) \
> >>  %{shared:-shared} \
> >>  %{!shared: \
> >> @@ -89,8 +89,8 @@ along with GCC; see the file COPYING3.  
> >> #undef ASM_OUTPUT_REG_PUSH
> >> #undef ASM_OUTPUT_REG_POP
> >> 
> >> -#undef LIB_SPEC
> >> -#define LIB_SPEC "\
> >> +#undef GNU_USER_TARGET_LIB_SPEC
> >> +#define GNU_USER_TARGET_LIB_SPEC "\
> >> %{pthread:-lpthread} \
> >> %{shared:-lc} \
> >> %{!shared: \
> >> @@ -133,7 +133,34 @@ extern const char *host_detect_local_cpu
> >>  LINUX_DRIVER_SELF_SPECS
> >> 
> >> /* Similar to standard Linux, but adding -ffast-math support.  */
> >> -#undef  ENDFILE_SPEC
> >> -#define ENDFILE_SPEC \
> >> +#undef  GNU_USER_TARGET_ENDFILE_SPEC
> >> +#define GNU_USER_TARGET_ENDFILE_SPEC \
> >>  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
> >>   %{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s"
> 
> Above definitions are OK.

  Thanks!

> 
> >> +
> >> +#undef  LINK_SPEC
> >> +#define LINK_SPEC 
>   \
> >> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LINK_SPEC, 
>   \
> >> + GNU_USER_TARGET_LINK_SPEC " " ANDROID_LINK_SPEC)
> >> +
> >> +#undef  SUBTARGET_CC1_SPEC
> >> +#define SUBTARGET_CC1_SPEC
>   \
> >> +  LINUX_OR_ANDROID_CC (GNU_USER_SUBTARGET_CC1_SPEC,   
>   \
> >> + GNU_USER_SUBTARGET_CC1_SPEC " " ANDROID_CC1_SPEC)
> >> +
> >> +#undef  CC1PLUS_SPEC
> >> +#define CC1PLUS_SPEC  
>   \
> >> +  LINUX_OR_ANDROID_CC ("", ANDROID_CC1PLUS_SPEC)
> >> +
> >> +#undef  LIB_SPEC
> >> +#define LIB_SPEC  
>   \
> >> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LIB_SPEC,  
>   \
> >> + GNU_USER_TARGET_LIB_SPEC " " ANDROID_LIB_SPEC)
> >> +
> >> +#undef  STARTFILE_SPEC
> >> +#define STARTFILE_SPEC
>   \
> >> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_STARTFILE_SPEC, 
> ANDROID_STARTFILE_SPEC)
> >> +
> >> +#undef  ENDFILE_SPEC
> >> +#define ENDFILE_SPEC  
>   \
> >> +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_ENDFILE_SPEC, 
> ANDROID_ENDFILE_SPEC)
> 
> The LINUX_OR_ANDROID_* definitions should be moved out of 
> gnu-user.h, as this header is used for systems besides Linux, 
> e.g., kFreeBSD and Hurd.  Please move these definitions to 
> mips/linux-common.h, which will be a new file, similarly as 
> i386 did in http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html .

  I will check this message.

> 
> >> Index: gcc/libgcc/unwind-dw2-fde-dip.c
> >> ===
> >> --- gcc.orig/libgcc/unwind-dw2-fde-dip.c   2012-04-03 
> 17:07:28.0 -0700
> >> +++ gcc/libgcc/unwind-dw2-fde-dip.c2012-04-04 
> 14:51:01.338074000 -0700
> >> @@ -48,8 +48,9 @@
> >> #include "gthr.h"
> >> 
> >> #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
> >> -&& (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
> >> -  || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && 
> defined(DT_CONFIG)))
> >> +&& ((defined(__BIONIC__) && (defined(mips) || 
> defined(__m

Re: [PATCH, Android] MIPS support

2012-04-17 Thread Maxim Kuvyrkov
On 18/04/2012, at 1:10 PM, Fu, Chao-Ying wrote:

> Maxim Kuvyrkov wrote:
> 
>> Above definitions are OK.
> 
>  Thanks!

For avoidance of doubt, please wait for the whole patch to be approved before 
committing it.

>>>> Index: gcc/libgcc/unwind-dw2-fde-dip.c
>>>> ===
>>>> --- gcc.orig/libgcc/unwind-dw2-fde-dip.c   2012-04-03 17:07:28.0 -0700
>>>> +++ gcc/libgcc/unwind-dw2-fde-dip.c        2012-04-04 14:51:01.338074000 -0700
>>>> @@ -48,8 +48,9 @@
>>>> #include "gthr.h"
>>>> 
>>>> #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
>>>> -&& (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
>>>> -  || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
>>>> +&& ((defined(__BIONIC__) && (defined(mips) || defined(__mips__))) \
>>>> +|| (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
>>>> +  || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG))))
>>>> # define USE_PT_GNU_EH_FRAME
>>>> #endif
>> 
>> What is this change for?
> 
>  For stack unwinding, MIPS needs supporting functions in libgcc to 
> work with eh_frame for Android.
> (Note that ARM has its own unwinding functions in gcc/config/arm/.  It 
> doesn't use eh_frame.)
> The file is enabled for GLIBC originally.  Thus, I add a new test to enable it
> for MIPS Android BIONIC build.

Please use the format that other C libraries use (instead of mixing together 
the GLIBC and Bionic definitions):

#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
&& defined(__BIONIC__)
# define USE_PT_GNU_EH_FRAME
#endif

Also, as far as I can tell, this change would also apply for x86, and for ARM 
having USE_PT_GNU_EH_FRAME defined will not hurt.  So please make the 
definition architecture-agnostic.
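
One way to read the suggestion above, as a minimal sketch rather than the
committed change: leave the existing glibc test untouched and add a separate
Bionic block with no architecture test.  HAVE_LD_EH_FRAME_HDR is defined by
hand here only so that the fragment stands alone (in libgcc it comes from
configure), <stdlib.h> is included only so that the C library's version
macros are visible, and uses_pt_gnu_eh_frame is a hypothetical helper added
to make the outcome observable.

```c
#include <stdlib.h>   /* Pulls in the C library's feature macros.  */

/* Stand-alone illustration only; libgcc gets this from configure.  */
#ifndef HAVE_LD_EH_FRAME_HDR
# define HAVE_LD_EH_FRAME_HDR 1
#endif

/* Existing glibc guard, as it was before the patch.  */
#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
# define USE_PT_GNU_EH_FRAME
#endif

/* Separate Bionic guard, with no test for mips or any other target.  */
#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
    && defined(__BIONIC__)
# define USE_PT_GNU_EH_FRAME
#endif

/* Hypothetical helper: report whether either guard fired in this build.  */
int
uses_pt_gnu_eh_frame (void)
{
#ifdef USE_PT_GNU_EH_FRAME
  return 1;
#else
  return 0;
#endif
}
```

Keeping the two blocks apart means the glibc version arithmetic never has to
be read together with the Bionic condition, and a further C library could be
added as one more block of the same shape.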

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



[C++ Patch] PR 52422 (new patch)

2012-04-17 Thread Paolo Carlini

Hi Jason,

I have a new patch for this issue, another SFINAE issue noticed by 
Daniel. Compared to the last version, I extended the complain-ization ;) 
to a few more functions in typeck.c (I think the set is more consistent 
now).  I also thoroughly double checked that the return values of all the 
functions which now get a tsubst_flags_t argument are checked for 
error_mark_node and that, in that case, the caller returns error_mark_node 
early, as you requested last time.
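
The pattern being applied can be sketched in isolation.  The snippet below is
a simplified stand-in, not GCC code: complain_flags, TF_NONE, TF_ERROR,
ERROR_MARK, diagnostics_emitted, decay_conversion_sketch and
build_call_sketch are hypothetical names that only mimic tsubst_flags_t,
tf_none, tf_error and error_mark_node.  It shows the two rules involved: a
diagnostic is emitted only when the caller asks for one, and the failure
value is returned and propagated either way.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC's tsubst_flags_t values.  */
typedef unsigned int complain_flags;
enum { TF_NONE = 0, TF_ERROR = 1 << 0 };

/* Hypothetical stand-in for error_mark_node: a unique sentinel address.  */
static const char error_mark_obj = 0;
#define ERROR_MARK ((const void *) &error_mark_obj)

/* Counts diagnostics so that the effect of the flag is observable.  */
static int diagnostics_emitted;

static void
report_error (const char *msg)
{
  (void) msg;
  diagnostics_emitted++;
}

/* Callee: a null EXP plays the role of an invalid operand.  The error is
   reported only if COMPLAIN allows it, but the sentinel is returned in
   both cases.  */
const void *
decay_conversion_sketch (const void *exp, complain_flags complain)
{
  if (exp == NULL)
    {
      if (complain & TF_ERROR)
        report_error ("void value not ignored as it ought to be");
      return ERROR_MARK;
    }
  return exp;
}

/* Caller: forwards COMPLAIN, checks the result, and returns the sentinel
   early instead of continuing with a bad value.  */
const void *
build_call_sketch (const void *fn, complain_flags complain)
{
  const void *decayed = decay_conversion_sketch (fn, complain);
  if (decayed == ERROR_MARK)
    return ERROR_MARK;
  return decayed;
}
```

With TF_NONE the failure is silent, which is the behaviour SFINAE needs; with
TF_ERROR the same failure is reported once, at the point where it is detected.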


As usual, tested x86_64-linux.

Ok for mainline?

Thanks,
Paolo.

/
/cp
2012-04-17  Paolo Carlini  

PR c++/52422
* cp-tree.h (build_addr_func, decay_conversion,
get_member_function_from_ptrfunc,
build_m_component_ref, convert_member_func_to_ptr):
Add tsubst_flags_t parameter.
* typeck.c (cp_default_conversion): Add.
(decay_conversion, default_conversion,
get_member_function_from_ptrfunc, convert_member_func_to_ptr):
Add tsubst_flags_t parameter and use it throughout.
(cp_build_indirect_ref, cp_build_array_ref,
cp_build_function_call_vec, convert_arguments, build_x_binary_op,
cp_build_binary_op, cp_build_unary_op, build_reinterpret_cast_1,
build_const_cast_1, expand_ptrmemfunc_cst,
convert_for_initialization): Adjust.
* init.c (build_vec_init): Adjust.
* decl.c (grok_reference_init, get_atexit_node): Likewise.
* rtti.c (build_dynamic_cast_1, tinfo_base_init): Likewise.
* except.c (build_throw): Likewise.
* typeck2.c (build_x_arrow): Likewise.
(build_m_component_ref): Add tsubst_flags_t parameter and
use it throughout.
* pt.c (convert_nontype_argument): Adjust.
* semantics.c (finish_asm_stmt, maybe_add_lambda_conv_op): Likewise.
* decl2.c (build_offset_ref_call_from_tree): Likewise.
* call.c (build_addr_func): Add tsubst_flags_t parameter and
use it throughout.
(build_call_a, build_conditional_expr_1, build_new_op_1,
convert_like_real, convert_arg_to_ellipsis, build_over_call,
build_special_member_call): Adjust.
* cvt.c (cp_convert_to_pointer, force_rvalue,
build_expr_type_conversion): Likewise.

/testsuite
2012-04-17  Paolo Carlini  

PR c++/52422
* g++.dg/cpp0x/sfinae33.C: New.
* g++.dg/cpp0x/sfinae34.C: Likewise.
Index: testsuite/g++.dg/cpp0x/sfinae33.C
===
--- testsuite/g++.dg/cpp0x/sfinae33.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/sfinae33.C   (revision 0)
@@ -0,0 +1,27 @@
+// PR c++/52422
+// { dg-options -std=c++11 }
+
+template
+struct add_rval_ref
+{
+  typedef T&& type;
+};
+
+template<>
+struct add_rval_ref
+{
+  typedef void type;
+};
+
+template
+typename add_rval_ref::type create();
+
+template()())
+>
+auto f(int) -> char(&)[1];
+
+template
+auto f(...) -> char(&)[2];
+
+static_assert(sizeof(f(0)) != 1, "");
Index: testsuite/g++.dg/cpp0x/sfinae34.C
===
--- testsuite/g++.dg/cpp0x/sfinae34.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/sfinae34.C   (revision 0)
@@ -0,0 +1,27 @@
+// PR c++/52422
+// { dg-options -std=c++11 }
+
+template
+struct add_rval_ref
+{
+  typedef T&& type;
+};
+
+template<>
+struct add_rval_ref
+{
+  typedef void type;
+};
+
+template
+typename add_rval_ref::type create();
+
+template().*create())() )
+>
+auto f(int) -> char(&)[1];
+
+template
+auto f(...) -> char(&)[2];
+
+static_assert(sizeof(f(0)) != 1, "");
Index: cp/typeck.c
===
--- cp/typeck.c (revision 186552)
+++ cp/typeck.c (working copy)
@@ -1818,7 +1818,7 @@ unlowered_expr_type (const_tree exp)
that the return value is no longer an lvalue.  */
 
 tree
-decay_conversion (tree exp)
+decay_conversion (tree exp, tsubst_flags_t complain)
 {
   tree type;
   enum tree_code code;
@@ -1832,7 +1832,8 @@ tree
   exp = resolve_nondeduced_context (exp);
   if (type_unknown_p (exp))
 {
-  cxx_incomplete_type_error (exp, TREE_TYPE (exp));
+  if (complain & tf_error)
+   cxx_incomplete_type_error (exp, TREE_TYPE (exp));
   return error_mark_node;
 }
 
@@ -1851,13 +1852,14 @@ tree
   code = TREE_CODE (type);
   if (code == VOID_TYPE)
 {
-  error ("void value not ignored as it ought to be");
+  if (complain & tf_error)
+   error ("void value not ignored as it ought to be");
   return error_mark_node;
 }
-  if (invalid_nonstatic_memfn_p (exp, tf_warning_or_error))
+  if (invalid_nonstatic_memfn_p (exp, complain))
 return error_mark_node;
   if (code == FUNCTION_TYPE || is_overloaded_fn (exp))
-return cp_build_addr_expr (exp, tf_warning_or_error);
+return cp_build_addr_expr (exp, complain);
   if (code == ARRAY_TYPE)
 {
   tree adr;
@@ -1869,7 +1871,9 @@ tree
 
   if (TREE_CODE (exp) == COMPOUND_EXPR)
{
- tr

[PATCH, PR38785] Throttle PRE at -O3

2012-04-17 Thread Maxim Kuvyrkov
Steven,
Jörn,

I am looking into fixing a performance regression on EEMBC's bitmnp01, and a 
version of your combined patch attached to PR38785 still works very well.  
Would you mind me getting it through upstream review, or are there any issues 
with contributing this patch to GCC mainline?

We (CodeSourcery/Mentor) have been carrying this patch in our toolchains since 
GCC 4.4, and it has not shown any performance or correctness problems on x86, 
ARM, MIPS, and other architectures.  It also reliably fixes the bitmnp01 
regression, which is still present in current mainline.

I have tested this patch on recent mainline on i686-linux-gnu with no 
regressions.  Unless I hear from you to the contrary, I will push this patch 
for upstream review and, hopefully, get it checked in.

Previous discussion of this patch is at 
http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



pr38785.ChangeLog
Description: Binary data


pr38785.patch
Description: Binary data


[google/integration] Add -Xclang-only option (issue6047048)

2012-04-17 Thread Ollie Wild
To be submitted to the google/integration branch and merged into
google/{main,gcc-4_6,gcc-4_7}.

Add -Xclang-only option (which is ignored).

This is used by certain drivers to pass options selectively to clang.  Adding
support to the gcc driver makes it easier to test GCC in the absence of these
drivers.
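
As a rough illustration of what "ignored" means for a joined option, and not
the code in gcc.c, the sketch below filters an argument vector and drops every
argument that starts with -Xclang-only.  The helper drop_clang_only_args and
its signature are hypothetical; the real driver reaches the same effect
through its generated option tables.

```c
#include <string.h>

/* Hypothetical helper, not part of gcc.c: copy the ARGC entries of ARGV
   into OUT, skipping every argument of the joined form
   -Xclang-only<text>.  OUT must have room for ARGC entries.  Returns the
   number of arguments kept.  */
int
drop_clang_only_args (int argc, const char *const *argv, const char **out)
{
  static const char prefix[] = "-Xclang-only";
  int kept = 0;

  for (int i = 0; i < argc; i++)
    {
      /* Compare only the prefix; whatever is joined to it is ignored.  */
      if (strncmp (argv[i], prefix, sizeof prefix - 1) == 0)
        continue;
      out[kept++] = argv[i];
    }
  return kept;
}
```

The remaining arguments keep their original order, so a command line written
for a clang-aware wrapper behaves the same once the clang-only pieces are
removed.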

Google ref 6302116.

2012-04-17   Ollie Wild  

* gcc/common.opt (Xclang-only): New option.
* gcc/doc/invoke.texi (Xclang-only): Document new option.
* gcc/gcc.c (display_help): Print new option.
(driver_handle_option): Support new option (ignoring args).


diff --git a/gcc/common.opt b/gcc/common.opt
index 4a751a9..39f0843 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -743,6 +743,9 @@ Warn when a vector operation is compiled outside the SIMD
 Xassembler
 Driver Separate
 
+Xclang-only
+Driver Joined
+
 Xlinker
 Driver Separate
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d980e9f..1b61e76 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9560,6 +9560,11 @@ systems using the GNU linker.  On some targets, such as bare-board
 targets without an operating system, the @option{-T} option may be required
 when linking to avoid references to undefined symbols.
 
+@item -Xclang-only @var{option}
+@opindex Xclang-only
+Ignore @var{option}.  This is used by some custom drivers to pass options
+to Clang but not GCC.
+
 @item -Xlinker @var{option}
 @opindex Xlinker
 Pass @var{option} as an option to the linker.  You can use this to
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 5f789fd..c6b48a6 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -2983,6 +2983,7 @@ display_help (void)
   fputs (_("  -Xassembler <arg>        Pass <arg> on to the assembler\n"), stdout);
   fputs (_("  -Xpreprocessor <arg>     Pass <arg> on to the preprocessor\n"), stdout);
   fputs (_("  -Xlinker <arg>           Pass <arg> on to the linker\n"), stdout);
+  fputs (_("  -Xclang-only=<arg>       Ignore <arg>\n"), stdout);
   fputs (_("  -save-temps              Do not delete intermediate files\n"), stdout);
   fputs (_("  -save-temps=<arg>        Do not delete intermediate files\n"), stdout);
   fputs (_("\
@@ -3353,6 +3354,11 @@ driver_handle_option (struct gcc_options *opts,
   do_save = false;
   break;
 
+case OPT_Xclang_only:
+  /* Ignore the argument.  Used by some drivers to selectively pass
+ arguments to clang.  */
+  break;
+
 case OPT_Xlinker:
   add_infile (arg, "*");
   do_save = false;

--
This patch is available for review at http://codereview.appspot.com/6047048


[PATCH][ARM][Testsute] Skip thumb1 test in non-thumb1 target

2012-04-17 Thread Joey Ye
Fix the test case that fails in ARM state.

* gcc.target/arm/thumb1-imm.c: Skip it in non-thumb1 target

Index: gcc/testsuite/gcc.target/arm/thumb1-imm.c
===
--- gcc/testsuite/gcc.target/arm/thumb1-imm.c   (revision 186517)
+++ gcc/testsuite/gcc.target/arm/thumb1-imm.c   (working copy)
@@ -1,5 +1,7 @@
 /* Check for thumb1 imm [255-510] moves.  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-options "-Os" } */
+/* { dg-skip-if "" { ! { arm_thumb1 } } } */
 
 int f()
 {







Symbol table 7/many: debug output facilities

2012-04-17 Thread Jan Hubicka
Hi,
this patch adds dump_symtab and dump_symtab_node for debugging symbol tables.
It also reorganizes the existing varpool and cgraph dumping code to use the
same format.
I decided to switch from identifier names to assembler names.  It is useless
to see all those ctor/dtor functions without knowing what they are, and it
also makes things look more like a real symbol table.
I left the mangled dumping in some places where the testsuite requires it and
will work on this incrementally.

Regtested/bootstrapped x86_64-linux, will commit it shortly.

Honza

* cgraph.c (cgraph_node_name): Remove.
(dump_cgraph_node): Use dump_symtab_base; reformat.
* cgraph.h (symtab_node_asm_name, symtab_node_name, dump_symtab,
debug_symtab, dump_symtab_node, debug_symtab_node, dump_symtab_base):
Declare.
(cgraph_node_name, varpool_node_name): Remove.
(cgraph_node_asm_name, varpool_node_asm_name,
cgraph_node_name, varpool_node_name): New.
* tree-pass.h (TODO_dump_cgraph): Rename to ...
(TODO_dump_symtab): ... this one.
* ipa-cp.c (pass_ipa_cp): Update.
* ipa-reference.c (generate_summary, read_write_all_from_decl,
propagate, ipa_reference_read_optimization_summary): Update.
* cgraphunit.c (cgraph_analyze_functions): Update.
(cgraph_optimize): Update.
* ipa-ref.c (ipa_dump_references): Update.
(ipa_dump_refering): Update.
* ipa-inline.c (pass_ipa_inline): Update.
* matrix-reorg.c (pass_ipa_matrix_reorg): Update.
* ipa.c (pass_ipa_function_visibility,
pass_ipa_whole_program_visibility): Update.
* tree-sra.c (pass_early_ipa_sra): Update.
* symtab.c: Include langhooks.h
(symtab_node_asm_name): New.
(symtab_node_name): New.
(symtab_type_names): New static var.
(dump_symtab_base): New.
(dump_symtab_node, dump_symtab): New.
(debug_symtab_node,  debug_symtab): New.
* tree-ssa-structalias.c: Dump symbol table.
* passes.c (execute_todo): Handle TODO_dump_symtab instead
of TODO_dump_cgraph.
* varpool.c (varpool_node_name): Remove.
(dump_varpool_node): Use dump_symtab_base; reformat.
Index: cgraph.c
===
*** cgraph.c(revision 186525)
--- cgraph.c(working copy)
*** cgraph_inline_failed_string (cgraph_inli
*** 1605,1617 
return cif_string_table[reason];
  }
  
- /* Return name of the node used in debug output.  */
- const char *
- cgraph_node_name (struct cgraph_node *node)
- {
-   return lang_hooks.decl_printable_name (node->symbol.decl, 2);
- }
- 
  /* Names used to print out the availability enum.  */
  const char * const cgraph_availability_names[] =
{"unset", "not_available", "overwritable", "available", "local"};
--- 1605,1610 
*** dump_cgraph_node (FILE *f, struct cgraph
*** 1625,1684 
struct cgraph_edge *edge;
int indirect_calls_count = 0;
  
!   fprintf (f, "%s/%i", cgraph_node_name (node), node->uid);
!   dump_addr (f, " @", (void *)node);
!   if (DECL_ASSEMBLER_NAME_SET_P (node->symbol.decl))
! fprintf (f, " (asm: %s)",
!IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->symbol.decl)));
if (node->global.inlined_to)
! fprintf (f, " (inline copy in %s/%i)",
 cgraph_node_name (node->global.inlined_to),
!node->global.inlined_to->uid);
!   if (node->symbol.same_comdat_group)
! fprintf (f, " (same comdat group as %s/%i)",
!cgraph_node_name (cgraph (node->symbol.same_comdat_group)),
!cgraph (node->symbol.same_comdat_group)->uid);
if (node->clone_of)
! fprintf (f, " (clone of %s/%i)",
!cgraph_node_name (node->clone_of),
!node->clone_of->uid);
if (cgraph_function_flags_ready)
! fprintf (f, " availability:%s",
 cgraph_availability_names [cgraph_function_body_availability (node)]);
if (node->analyzed)
  fprintf (f, " analyzed");
-   if (node->symbol.in_other_partition)
- fprintf (f, " in_other_partition");
if (node->count)
  fprintf (f, " executed "HOST_WIDEST_INT_PRINT_DEC"x",
 (HOST_WIDEST_INT)node->count);
if (node->origin)
! fprintf (f, " nested in: %s", cgraph_node_name (node->origin));
if (node->needed)
  fprintf (f, " needed");
-   if (node->symbol.address_taken)
- fprintf (f, " address_taken");
else if (node->reachable)
  fprintf (f, " reachable");
-   else if (node->symbol.used_from_other_partition)
- fprintf (f, " used_from_other_partition");
if (gimple_has_body_p (node->symbol.decl))
  fprintf (f, " body");
if (node->process)
  fprintf (f, " process");
if (node->local.local)
  fprintf (f, " local");
-   if (node->symbol.externally_visible)
- fprintf (f, " externally_visible");
-   if (node->symbol.resolution != LDPR_UNKNOWN)
- fprintf (f, " %s",
-