date:20120321

Re: [Patch/cfgexpand]: also consider assembler_name to call expand_main_function

2012-03-21 Thread Richard Guenther

On Tue, 20 Mar 2012, Tristan Gingold wrote:

> 
> On Mar 20, 2012, at 5:01 PM, Richard Guenther wrote:
> 
> > On Tue, 20 Mar 2012, Tristan Gingold wrote:
> > 
> >> 
> >> On Mar 20, 2012, at 3:19 PM, Richard Guenther wrote:
> >> 
> >> [...]
> >>> 
> >>> I'd rather get away from using a global main_identifier_node, instead
> >>> make that frontend specific, and introduce targetm.main_assembler_name
> >>> which the assembler-name creating langhook would make sure to use
> >>> when mangling what the FE thinks main is.  main_identifier_node should
> >>> not serve any purpose outside of Frontends.
> >>> 
> >>> But I see both as a possible cleanup opportunity, not a necessary change.
> >> 
> >> Something along these lines ?
> > 
> > Yes, but I'd simply call the hook at the places you now use
> > main_assembler_name and not create a global tree node for it.
> 
> But we use it at the beginning of graph_finalize_function, so caching it
> makes sense, doesn't it ?

Well, maybe ;)  I have no strong opinion here.

Richard.

Re: [PATCH] Replace a SRA FIXME with an assert

2012-03-21 Thread Richard Guenther

On Tue, 20 Mar 2012, Martin Jambor wrote:

> Hi,
> 
> On Tue, Mar 20, 2012 at 04:08:31PM +0100, Richard Guenther wrote:
> > On Tue, 20 Mar 2012, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > this patch which removes one of only two FIXMEs in tree-sra.c has been
> > > sitting in my patch queue for over a year.  Yesterday I noticed it
> > > there, bootstrapped and tested it on x86_64-linux and it passed.
> > > 
> > > I'd like to either commit it or just remove the comment, if there
> > > likely still are size inconsistencies in assignments but we are not
> > > planning to do anything with them in foreseeable future (and perhaps
> > > add a note to the bug).
> > > 
> > > So, which should it be?
> > 
> > Well.  Aggregate assignments can still be off I think, especially
> > because of the disconnect between TYPE_SIZE and DECL_SIZE in
> > some cases, considering *p = x; with typeof (x) == typeof (*p)
> > (tail-padding re-use).
> > 
> > The comments in PR40058 hint at that that issue might be fixed,
> > but I also remember issues with Ada.
> 
> The other FIXME in tree-sra.c suggests that Ada can produce
> VIEW_CONVERT_EXPRs with a different size than its argument, perhaps
> that is it (I'll try removing that one too).

Yeah, it does that.

> > 
> > GIMPLE verification ensures compatible types (but not a match
> > of type_size / decl_size which will be exposed by get_ref_base_and_extent)
> > 
> > But the real question is what do you want to guard against here?
> > The assert at least looks like it is going to triggert at some point,
> > but, would it be a problem if the sizes to not match?
> > 
> 
> I really can't remember what exactly happened but I do remember it did
> lead to a bug (it's been already part of the chck-in of new SRA so svn
> history does not help).  We copy access tree children accross
> assignments and also change the type of the LHS access to a scalar if
> the RHS access is a scalar (assignments into a structure containing
> just one scalar) and both could lead to some access tree children
> covering larger part of the aggregate than the parent, making the
> children un-findable or even creating overlaps which are prohibited
> for SRA candidates.
> 
> But as I wrote before, I'll be happy to just remove the FIXME comment.

I'd just remove the comment then.

Richard.

> Martin
> 
> 
> > Richard.
> > 
> > 
> > > 2011-01-06  Martin Jambor  
> > > 
> > >   * tree-sra.c (build_accesses_from_assign): Make size equality test
> > >   an assert.
> > > 
> > > Index: src/gcc/tree-sra.c
> > > ===
> > > --- src.orig/gcc/tree-sra.c
> > > +++ src/gcc/tree-sra.c
> > > @@ -1175,13 +1175,11 @@ build_accesses_from_assign (gimple stmt)
> > >&& !lacc->grp_unscalarizable_region
> > >&& !racc->grp_unscalarizable_region
> > >&& AGGREGATE_TYPE_P (TREE_TYPE (lhs))
> > > -  /* FIXME: Turn the following line into an assert after PR 40058 is
> > > -  fixed.  */
> > > -  && lacc->size == racc->size
> > >&& useless_type_conversion_p (lacc->type, racc->type))
> > >  {
> > >struct assign_link *link;
> > >  
> > > +  gcc_assert (lacc->size == racc->size);
> > >link = (struct assign_link *) pool_alloc (link_pool);
> > >memset (link, 0, sizeof (struct assign_link));
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix ImendÃ¶rffer

[PATCH] Fix PR52636

2012-03-21 Thread Richard Guenther


This fixes PR52636 now that we treat all constants as constants
we need to convert them to the appropriate vector type.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-03-20  Richard Guenther  

PR tree-optimizer/52636
* tree-vect-slp.c (vect_get_constant_vectors): Convert constants
to the appropriate type.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 185563)
--- gcc/tree-vect-slp.c (working copy)
*** vect_get_constant_vectors (tree op, slp_
*** 2363,2368 
--- 2363,2374 
  
/* Create 'vect_ = {op0,op1,...,opn}'.  */
number_of_places_left_in_vector--;
+ if (constant_p
+ && !types_compatible_p (TREE_TYPE (vector_type), TREE_TYPE (op)))
+   {
+ op = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (vector_type), op);
+ gcc_assert (op && CONSTANT_CLASS_P (op));
+   }
  elts[number_of_places_left_in_vector] = op;
  
if (number_of_places_left_in_vector == 0)

Re: [Patch/cfgexpand]: also consider assembler_name to call expand_main_function

2012-03-21 Thread Tristan Gingold


On Mar 20, 2012, at 6:17 PM, Jan Hubicka wrote:

>> On Tue, 20 Mar 2012, Tristan Gingold wrote:
>> 
>>> 
>>> On Mar 15, 2012, at 10:37 AM, Richard Guenther wrote:
>>> 
 On Wed, 14 Mar 2012, Tristan Gingold wrote:
>>> [?]
>>> 
 
 Well.  To make this work in LTO the "main" function (thus, the program
 entry point) should be marked at cgraph level and all users of
 MAIN_NAME_P should instead check a flag on the cgraph node.
 
> Will write a predicate in tree.[ch].
 
 Please instead transition "main-ness" to the graph.
> 
> Yep, I also agree that it is something cgraph code should care about instead 
> of
> random placess across the whole middle-end.
>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>>> index bd21169..7a7a774 100644
>>> --- a/gcc/cfgexpand.c
>>> +++ b/gcc/cfgexpand.c
>>> @@ -4513,9 +4513,8 @@ gimple_expand_cfg (void)
>>> 
>>>   /* If this function is `main', emit a call to `__main'
>>>  to run global initializers, etc.  */
>>> -  if (DECL_NAME (current_function_decl)
>>> -  && MAIN_NAME_P (DECL_NAME (current_function_decl))
>>> -  && DECL_FILE_SCOPE_P (current_function_decl))
>>> +  if (DECL_FILE_SCOPE_P (current_function_decl)
>>> +  && cgraph_main_function_p (cgraph_get_node (current_function_decl)))
>>> expand_main_function ();
>> 
>> The DECL_FILE_SCOPE_P check is redundant, please remove them everywhere
>> you call cgraph_main_function_p.  I suppose returning false if the
>> cgraph node is NULL in cgraph_main_function_p would be good.
> 
> How do we handle the cases before cgraph is built with this approach?

Only front-end code need to check wether a function is main before they add
it in cgraph.  As each front-end should know which function is main, this is
not an issue for them.

>>> +/* Return true iff NODE is the main function (main in C).  */
>>> +static inline bool
>>> +cgraph_main_function_p (struct cgraph_node *node)
>>> +{
>>> +  return node->local.main_function;
>> 
>> node && node->local.main_function
> 
> Well, cgraph strategy is ito ICE when NODE is NULL :)
> We could have cgraph_main_function_decl_p wrapper that does the NULL 
> handling, but I still don't
> see how this helps - i.e. when you don't have cgraph node you don't have info 
> whether function
> is main or not, so you should not even try to ask.
> In what cases we ICE here?

We don't ICE here - as long as graph_main_function_p is called after front-end.

>>> +}
>>> +
>>> /* Walk all functions with body defined.  */
>>> #define FOR_EACH_FUNCTION_WITH_GIMPLE_BODY(node) \
>>>for ((node) = cgraph_first_function_with_gimple_body (); (node); \
>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>>> index 516f187..4a59f63 100644
>>> --- a/gcc/cgraphunit.c
>>> +++ b/gcc/cgraphunit.c
>>> @@ -346,6 +346,10 @@ cgraph_finalize_function (tree decl, bool nested)
>>>   notice_global_symbol (decl);
>>>   node->local.finalized = true;
>>>   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>>> +  node->local.main_function =
>>> +DECL_FILE_SCOPE_P (decl)
>>> +&& ((!DECL_ASSEMBLER_NAME_SET_P (decl) && MAIN_NAME_P (DECL_NAME 
>>> (decl)))
>>> +   ||decl_assembler_name_equal (decl, main_identifier_node));
>> 
>> If we finalize a function we should always create an assembler name,
>> thus I'd change the above to
>> 
>>  node->local.main_function = decl_assembler_name_equal (decl, 
>> main_identifier_node);
>> 
>> btw, decl_assembler_name_equal doesn't seem to remove target-specific
>> mangling - do some OSes "mangle" main differently (I'm thinking of
>> leading underscores or complete renames)?  Thus, I guess the
>> targets might want to be able to provide the main_identifier_assember_name
>> you use here.
> 
> Yes, name function is mangled, i.e. it is _main on djgpp as long as I 
> remember.
> This is why we have the main_identifier_node to go through the mandling 
> procedure.


USER_LABEL_PREFIX is handled by decl_assembler_name_equal.

One way to simplify that is to change the NESTED argument of 
cgraph_finalize_function
to LEVEL, which could be either main, top or nested.  With this mechanism, every
front-end will explicitly tell to the middle-end which function is the main 
entry point.

Thoughts ?

Tristan.

whic

[PATCH] Bug fix in store_bit_field_1 for big endian targets (issue 51893)

2012-03-21 Thread Aurelien Buhrig

Hi,

This patch (for 4.6) fixes a wrong subword index computation in
store_bit_field_1 for big endian targets when value is at least 4 times
bigger than a word (DI REG value with HI words).

It fixes a regression on gcc.c-torture/execute/bitfld-3.c for my current
backend port.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51893

OK to commit?

Aurélien
--- gcc-4.6.1.orig/gcc/expmed.c 2011-05-22 21:02:59.0 +0200
+++ src/gcc/expmed.c2012-01-19 09:32:04.0 +0100
@@ -589,7 +589,10 @@
{
  /* If I is 0, use the low-order word in both field and target;
 if I is 1, use the next to lowest word; and so on.  */
- unsigned int wordnum = (backwards ? nwords - i - 1 : i);
+ unsigned int wordnum = (backwards
+  ? GET_MODE_SIZE(fieldmode)/UNITS_PER_WORD
+- i - 1 
+  : i);
  unsigned int bit_offset = (backwards
 ? MAX ((int) bitsize - ((int) i + 1)
* BITS_PER_WORD,

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread Richard Guenther

On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski  wrote:
> On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
>  wrote:
>> Greetings,
>>
>> Now that we're into stage 1 again, I'd like to submit the first round of
>> changes for dominator-based strength reduction, which will address
>> issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
>> attaching two patches: the smaller (slsr-part1) is the patch I'm
>> submitting for approval today, while the larger (slsr-fyi) is for
>> reference only, but may be useful if questions arise about how the small
>> patch fits into the intended whole.
>>
>> This patch contains the logic for identifying strength reduction
>> candidates, and makes replacements only for those candidates where the
>> stride is a fixed constant.  Replacement for candidates with fixed but
>> unknown strides are not implemented herein, but that logic can be viewed
>> in the larger patch.  This patch does not address strength reduction of
>> data reference expressions, or candidates with conditional increments;
>> those issues will be dealt with in future patches.
>>
>> The cost model is built on the one used by tree-ssa-ivopts.c, and I've
>> added some new instruction costs to that model in place.  It might
>> eventually be good to divorce that modeling code from IVOPTS, but that's
>> an orthogonal patch and somewhat messy.
>
> I think this is the wrong way to do straight line strength reduction
> considering we have a nice value numbering system which should be easy
> to extended to support it.

Well, it is easy to handle very specific easy cases like

a = i * 2;
b = i * 3;
c = i * 4;

to transform it to

a = i * 2;
b = a + i;
c = b + i;

but already

a = i * 2;
b = i * 4;
c = i * 6;

would need extra special code.  The easy case could be handled in eliminate ()
by, when seeing A * CST, looking up A * (CST - 1) and if that
succeeds, transform
it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done this
simple case at some point, but it failed to handle the common associated cases,
when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
3, etc.  I think
it is the re-association in case of a strength-reduction opportunity
that makes the
separate pass better?  How would you suggest handling this case in the
VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
a * 2 and one for val + 2?  But then we still don't have a value for a + 1
to re-use ...

Bill, experimenting with pattern detection in eliminate () would be a
possibility.

Thanks,
Richard.



> Thanks,
> Andrew pinski
>
>
>>
>> Thanks,
>> Bill
>>
>>
>> gcc:
>>
>> 2012-03-18  Bill Schmidt  
>>
>>        * tree-pass.h (pass_strength_reduction): New decl.
>>        * tree-ssa-loop-ivopts.c (add_cost): Remove #undef; rename to
>>        add_regs_cost.
>>        (multiply_regs_cost): New function.
>>        (add_const_cost): Likewise.
>>        (extend_or_trunc_cost): Likewise.
>>        (negate_cost): Likewise.
>>        (get_address_cost): Rename add_cost to add_regs_cost.
>>        (force_expr_to_var_cost): Likewise.
>>        (get_computation_cost_at): Likewise.
>>        (determine_iv_cost): Likewise.
>>        * timevar.def (TV_TREE_SLSR): New timevar.
>>        * tree-ssa-strength-reduction.c: New.
>>        * tree-flow.h (add_regs_cost): New decl.
>>        (multiply_regs_cost): Likewise.
>>        (add_const_cost): Likewise.
>>        (extend_or_trunc_cost): Likewise.
>>        (negate_cost): Likewise.
>>        * Makefile.in (tree-ssa-strength-reduction.o): New dependencies.
>>        * passes.c (init_optimization_passes): Add pass_strength_reduction.
>>
>> gcc/testsuite:
>>
>> 2012-03-18  Bill Schmidt  
>>
>>        * gcc.dg/tree-ssa/slsr-1.c: New test.
>>        * gcc.dg/tree-ssa/slsr-2.c: Likewise.
>>        * gcc.dg/tree-ssa/slsr-3.c: Likewise.
>>        * gcc.dg/tree-ssa/slsr-4.c: Likewise.
>>

Re: [patch tree-optimization]: Fix for PR 45397 part 2 of 2

2012-03-21 Thread Richard Guenther

On Thu, Mar 15, 2012 at 3:45 PM, Kai Tietz  wrote:
> 2012/3/15 Richard Guenther :
>> On Thu, Mar 15, 2012 at 2:46 PM, Kai Tietz  wrote:
>>> 2012/3/15 Richard Guenther :
 On Thu, Mar 15, 2012 at 2:09 PM, Kai Tietz  wrote:
> Hi,
>
> this is the second part of the patch for this problem.  It adds some
> basic simplifications for ==/!=
> comparisons for eliminating redudant operands.
>
> It adds the following patterns:
>  -X ==/!= Z - X -> Z ==/!= 0.
>  ~X ==/!= Z ^ X -> Z ==/!= ~0
>  X ==/!= X - Y -> Y == 0
>  X ==/!= X + Y -> Y == 0
>  X ==/!= X ^ Y -> Y == 0
>  (X - Y) ==/!= (Z - Y) -> X ==/!= Z
>  (Y - X) ==/!= (Y - Z) -> X ==/!= Z
>  (X + Y) ==/!= (X + Z) -> Y ==/!= Z
>  (X + Y) ==/!= (Z + X) -> Y ==/!= Z
>  (X ^ Y) ==/!= (Z ^ X) -> Y ==/!= Z

 Can you re-base this patch to work without the previous one?  Also
 please coordinate with Andrew.  Note that all of these(?) simplifications
 are already done by fold_comparison which we could share if you'd split
 out the EXPR_P op0/op1 cases with separated operands/code.

 Richard.
>>>
>>> Hmm, fold_comparison doesn't do the same thing as it checks for
>>> possible overflow.  This is true for comparisons not being ==/!= or
>>> having operands of none-integral-type.  But for ==/!= with integral
>>> typed arguments  the overflow doesn't matter at all.  And exactly this
>>> is what patch implements here.
>>
>> fold_comparison does not check for overflow for ==/!=.
>>
>>> This optimization of course is just desired in non-AST form, as we
>>> otherwise loose information in FE.  Therefore I didn't added it to
>>> fold_const.
>>
>> Which pieces are not already in fold-const btw?  forwprop already
>> re-constructs trees for the defs of the lhs/rhs of a comparison.
>>
>> Richard.
>
> I have tried to use here instead a call to fold_build2 instead, and I
> had to notice that it didn't optimized a single case (beside the - and
> ~ case on both sides).
>
> I see in fold const for example in the pattern 'X +- C1 CMP Y +- C2'
> to 'X CMP Y +- C2 +- C1' explicit the check for it.
>
> ...
> /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>   X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>   the resulting offset is smaller in absolute value than the
>   original one.  */
> if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
>    && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
> ...

Because the transform is not valid if Y +- C2 +- C1 overflows.  It is not valid
because overflow is undefined, not because the comparison would do the
wrong thing.  You'd have to change the addition to unsigned.

> The same for pattern X +- C1 CMP C2 to X CMP C2 +- C1.

Well, this is obviously just a missed optimization in fold-const.c then.  Mind
conditionalizing the overflow check to codes not NE_EXPR or EQ_EXPR?

> The cases for '(X + Y) ==/!= (Z + X)' and co have the same issue or
> are simply not present.

That's true.  I suppose they were considered too special to worry about.
Did you see these cases in real code?

> Sorry fold_const doesn't cover this at all.

It covers part of it.

> Kai

Re: [patch tree-optimization]: Fix for PR 45397 part 2 of 2

2012-03-21 Thread Kai Tietz

2012/3/21 Richard Guenther :
> On Thu, Mar 15, 2012 at 3:45 PM, Kai Tietz  wrote:
>> 2012/3/15 Richard Guenther :
>>> On Thu, Mar 15, 2012 at 2:46 PM, Kai Tietz  wrote:
 2012/3/15 Richard Guenther :
> On Thu, Mar 15, 2012 at 2:09 PM, Kai Tietz  
> wrote:
>> Hi,
>>
>> this is the second part of the patch for this problem.  It adds some
>> basic simplifications for ==/!=
>> comparisons for eliminating redudant operands.
>>
>> It adds the following patterns:
>>  -X ==/!= Z - X -> Z ==/!= 0.
>>  ~X ==/!= Z ^ X -> Z ==/!= ~0
>>  X ==/!= X - Y -> Y == 0
>>  X ==/!= X + Y -> Y == 0
>>  X ==/!= X ^ Y -> Y == 0
>>  (X - Y) ==/!= (Z - Y) -> X ==/!= Z
>>  (Y - X) ==/!= (Y - Z) -> X ==/!= Z
>>  (X + Y) ==/!= (X + Z) -> Y ==/!= Z
>>  (X + Y) ==/!= (Z + X) -> Y ==/!= Z
>>  (X ^ Y) ==/!= (Z ^ X) -> Y ==/!= Z
>
> Can you re-base this patch to work without the previous one?  Also
> please coordinate with Andrew.  Note that all of these(?) simplifications
> are already done by fold_comparison which we could share if you'd split
> out the EXPR_P op0/op1 cases with separated operands/code.
>
> Richard.

 Hmm, fold_comparison doesn't do the same thing as it checks for
 possible overflow.  This is true for comparisons not being ==/!= or
 having operands of none-integral-type.  But for ==/!= with integral
 typed arguments  the overflow doesn't matter at all.  And exactly this
 is what patch implements here.
>>>
>>> fold_comparison does not check for overflow for ==/!=.
>>>
 This optimization of course is just desired in non-AST form, as we
 otherwise loose information in FE.  Therefore I didn't added it to
 fold_const.
>>>
>>> Which pieces are not already in fold-const btw?  forwprop already
>>> re-constructs trees for the defs of the lhs/rhs of a comparison.
>>>
>>> Richard.
>>
>> I have tried to use here instead a call to fold_build2 instead, and I
>> had to notice that it didn't optimized a single case (beside the - and
>> ~ case on both sides).
>>
>> I see in fold const for example in the pattern 'X +- C1 CMP Y +- C2'
>> to 'X CMP Y +- C2 +- C1' explicit the check for it.
>>
>> ...
>> /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>>   X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>>   the resulting offset is smaller in absolute value than the
>>   original one.  */
>> if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
>>    && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
>> ...
>
> Because the transform is not valid if Y +- C2 +- C1 overflows.  It is not 
> valid
> because overflow is undefined, not because the comparison would do the
> wrong thing.  You'd have to change the addition to unsigned.
>
>> The same for pattern X +- C1 CMP C2 to X CMP C2 +- C1.
>
> Well, this is obviously just a missed optimization in fold-const.c then.  Mind
> conditionalizing the overflow check to codes not NE_EXPR or EQ_EXPR?
>
>> The cases for '(X + Y) ==/!= (Z + X)' and co have the same issue or
>> are simply not present.
>
> That's true.  I suppose they were considered too special to worry about.
> Did you see these cases in real code?
>
>> Sorry fold_const doesn't cover this at all.
>
> It covers part of it.
>
>> Kai

Sure, the test code shown in this patch isn't that unusual.
Especially in gimple (by using different statements) such construct
are happening.

Eg.:

int f1 (int a, int b, int c)
{
  if ((a + b) == (c + a))
   return 1;
  return 0;
}

int f2 (int a, int b, int c)
{
  if ((a ^ b) == (a  ^ c))
   return 1;
  return 0;
}


int f2 (int a, int b)
{
  if (-a == (b - a))
   return 1;
  return 0;
}

In all those cases the use of variable should be optimized out.
Instead we are producing pretty weak code for those cases.

Kai

Re: [patch tree-optimization]: Fix for PR 45397 part 2 of 2

2012-03-21 Thread Richard Guenther

On Wed, Mar 21, 2012 at 10:56 AM, Kai Tietz  wrote:
> 2012/3/21 Richard Guenther :
>> On Thu, Mar 15, 2012 at 3:45 PM, Kai Tietz  wrote:
>>> 2012/3/15 Richard Guenther :
 On Thu, Mar 15, 2012 at 2:46 PM, Kai Tietz  wrote:
> 2012/3/15 Richard Guenther :
>> On Thu, Mar 15, 2012 at 2:09 PM, Kai Tietz  
>> wrote:
>>> Hi,
>>>
>>> this is the second part of the patch for this problem.  It adds some
>>> basic simplifications for ==/!=
>>> comparisons for eliminating redudant operands.
>>>
>>> It adds the following patterns:
>>>  -X ==/!= Z - X -> Z ==/!= 0.
>>>  ~X ==/!= Z ^ X -> Z ==/!= ~0
>>>  X ==/!= X - Y -> Y == 0
>>>  X ==/!= X + Y -> Y == 0
>>>  X ==/!= X ^ Y -> Y == 0
>>>  (X - Y) ==/!= (Z - Y) -> X ==/!= Z
>>>  (Y - X) ==/!= (Y - Z) -> X ==/!= Z
>>>  (X + Y) ==/!= (X + Z) -> Y ==/!= Z
>>>  (X + Y) ==/!= (Z + X) -> Y ==/!= Z
>>>  (X ^ Y) ==/!= (Z ^ X) -> Y ==/!= Z
>>
>> Can you re-base this patch to work without the previous one?  Also
>> please coordinate with Andrew.  Note that all of these(?) simplifications
>> are already done by fold_comparison which we could share if you'd split
>> out the EXPR_P op0/op1 cases with separated operands/code.
>>
>> Richard.
>
> Hmm, fold_comparison doesn't do the same thing as it checks for
> possible overflow.  This is true for comparisons not being ==/!= or
> having operands of none-integral-type.  But for ==/!= with integral
> typed arguments  the overflow doesn't matter at all.  And exactly this
> is what patch implements here.

 fold_comparison does not check for overflow for ==/!=.

> This optimization of course is just desired in non-AST form, as we
> otherwise loose information in FE.  Therefore I didn't added it to
> fold_const.

 Which pieces are not already in fold-const btw?  forwprop already
 re-constructs trees for the defs of the lhs/rhs of a comparison.

 Richard.
>>>
>>> I have tried to use here instead a call to fold_build2 instead, and I
>>> had to notice that it didn't optimized a single case (beside the - and
>>> ~ case on both sides).
>>>
>>> I see in fold const for example in the pattern 'X +- C1 CMP Y +- C2'
>>> to 'X CMP Y +- C2 +- C1' explicit the check for it.
>>>
>>> ...
>>> /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
>>>   X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
>>>   the resulting offset is smaller in absolute value than the
>>>   original one.  */
>>> if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
>>>    && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
>>> ...
>>
>> Because the transform is not valid if Y +- C2 +- C1 overflows.  It is not 
>> valid
>> because overflow is undefined, not because the comparison would do the
>> wrong thing.  You'd have to change the addition to unsigned.
>>
>>> The same for pattern X +- C1 CMP C2 to X CMP C2 +- C1.
>>
>> Well, this is obviously just a missed optimization in fold-const.c then.  
>> Mind
>> conditionalizing the overflow check to codes not NE_EXPR or EQ_EXPR?
>>
>>> The cases for '(X + Y) ==/!= (Z + X)' and co have the same issue or
>>> are simply not present.
>>
>> That's true.  I suppose they were considered too special to worry about.
>> Did you see these cases in real code?
>>
>>> Sorry fold_const doesn't cover this at all.
>>
>> It covers part of it.
>>
>>> Kai
>
> Sure, the test code shown in this patch isn't that unusual.
> Especially in gimple (by using different statements) such construct
> are happening.
>
> Eg.:
>
> int f1 (int a, int b, int c)
> {
>  if ((a + b) == (c + a))
>   return 1;
>  return 0;
> }
>
> int f2 (int a, int b, int c)
> {
>  if ((a ^ b) == (a  ^ c))
>   return 1;
>  return 0;
> }
>
>
> int f2 (int a, int b)
> {
>  if (-a == (b - a))
>   return 1;
>  return 0;
> }
>
> In all those cases the use of variable should be optimized out.
> Instead we are producing pretty weak code for those cases.

True, I agree we should try to handle these.  Did you talk to Andrew
with respect to the gimple-combining thing he is working on?

Richard.

> Kai

Re: [RFC PATCH 0/3] Fixing expansion of misaligned MEM_REFs on strict-alignment targets

2012-03-21 Thread Martin Jambor

Hi,

On Tue, Mar 20, 2012 at 08:16:04PM +0100, Georg-Johann Lay wrote:
> Martin Jambor wrote:
> > Hi,
> > 
> > this is another iteration of my attempts to fix expansion of
> > misaligned memory accesses on strict-alignment platforms (which was
> > suggested by Richi in
> > http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00931.html and my first
> > attempt was posted as
> > http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00319.html).
> > 
> > This time I got further, to big extent thanks to parts of Richi's
> > fixes of PR 50444 which cleaned up expr.c considerably.  I have
> > successfully bootstrapped the combined patch on x86_64-linux,
> > i686-linux, ia64-linux (without Ada) and sparc64-linux (without Java).
> > I have run the c and c++ testsuites on individual patches on sparc64
> > and ia64 too.
> > 
> > Nevertheless, since I still lack experience in this area, there will
> > almost certainly be comments and suggestions and therefore I have
> > divided the three main changes to three different patches, so that
> > they are easier to comment on by both me and anybody reviewing them.
> > 
> > Thanks in advance for any comments,
> > 
> > Martin
> 
> Hi Martin.
> 
> The new test cases make implications on the size of int: for example
>  they crash for targets with sizeof(int) == 2
> 

Crash?  I assume the tests misaligned-expand-[12].c abort because of
the comparison with 0xdeadbeef constant, misaligned-expand-3.c should
not be affected by size of int.  If that is the case, does the
following patch fixes the issue for you?

Sorry for the inconvenience,

Martin


2012-03-21  Martin Jambor  

* gcc.dg/misaligned-expand-1.c (cst): Cast to int.
* gcc.dg/misaligned-expand-2.c (cst): Likewise.

Index: src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
===
--- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-1.c
+++ src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
@@ -14,7 +14,7 @@ foo (myint *p)
   return *p;
 }
 
-#define cst 0xdeadbeef
+#define cst (int) 0xdeadbeef
 #define NUM 8
 
 struct blah
Index: src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
===
--- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-2.c
+++ src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
@@ -14,7 +14,7 @@ foo (myint *p, unsigned int i)
   *p = i;
 }
 
-#define cst 0xdeadbeef
+#define cst (int) 0xdeadbeef
 #define NUM 8
 
 struct blah
@@ -25,8 +25,6 @@ struct blah
 
 struct blah g;
 
-#define cst 0xdeadbeef
-
 int
 main (int argc, char **argv)
 {

Re: [PATCH][ARM] Improve use of conditional execution in thumb mode.

2012-03-21 Thread Andrew Stubbs


On 19/03/12 14:48, Richard Earnshaw wrote:

OK.


Committed.

Andrew

[PATCH, ARM] Don't force vget_lane returning a 64-bit result to transfer to core registers

2012-03-21 Thread Richard Earnshaw

Semantically the neon intrinsic vgetq_lane_[su]64 returns a 64 bit
sub-object of a 128-bit vector; there's no real need for the intrinsic
to map onto a specific machine instruction.

Indeed, if force a particular instruction that moves the result into a
core register, but then want to use the result in the vector unit, we
don't really want to have to move the result back to the other register
bank.  However, that's what we do today.

This patch changes the way we expand these operations so that we
no-longer force selection of the get-lane operation.

A side effect of this change is that we now spit out the fmrrd mnemonic
rather than the vmov equivalent.  As a consequence I've updated the
testsuite to allow for this change.  The changes to the ML files are
pretty mechanical, but I don't speak ML so it would be helpful if
another pair of eyes could check that bit over and tell me if I've
missed something subtle.

Tested on trunk and gcc-4.7, but only installed on trunk.

R.

* neon.md (neon_vget_lanev2di): Use gen_lowpart and gen_highpart.
* config/arm/neon.ml (Fixed_return_reg): Renamed to fixed_vector_reg.
All callers changed.
(Fixed_core_reg): New feature.
(Vget_lane [sizes S64 and U64]): Add Fixed_core_reg.  Allow
fmrrd in disassembly.
* neon-testgen.ml: Handle Fixed_core_reg.

testsuite/
* gcc.target/arm/neon/vgetQ_laneu64.c: Regenerated.
* gcc.target/arm/neon/vgetQ_lanes64.c: Likewise.--- config/arm/neon-testgen.ml  (revision 185587)
+++ config/arm/neon-testgen.ml  (local)
@@ -79,9 +79,12 @@ let emit_automatics chan c_types feature
   (* The intrinsic returns a value.  We need to do explict register
  allocation for vget_low tests or they fail because of copy
  elimination.  *)
-  ((if List.mem Fixed_return_reg features then
+  ((if List.mem Fixed_vector_reg features then
   Printf.fprintf chan "  register %s out_%s asm (\"d18\");\n"
  return_ty return_ty
+else if List.mem Fixed_core_reg features then
+  Printf.fprintf chan "  register %s out_%s asm (\"r0\");\n"
+ return_ty return_ty
 else
   Printf.fprintf chan "  %s out_%s;\n" return_ty return_ty);
   emit ())
--- config/arm/neon.md  (revision 185587)
+++ config/arm/neon.md  (local)
@@ -2720,14 +2720,24 @@ (define_expand "neon_vget_lanedi"
 })
 
 (define_expand "neon_vget_lanev2di"
-  [(match_operand:DI 0 "s_register_operand" "=r")
-   (match_operand:V2DI 1 "s_register_operand" "w")
-   (match_operand:SI 2 "immediate_operand" "i")
-   (match_operand:SI 3 "immediate_operand" "i")]
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:V2DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "immediate_operand" "")
+   (match_operand:SI 3 "immediate_operand" "")]
   "TARGET_NEON"
 {
-  neon_lane_bounds (operands[2], 0, 2);
-  emit_insn (gen_vec_extractv2di (operands[0], operands[1], operands[2]));
+  switch (INTVAL (operands[2]))
+{
+case 0:
+  emit_move_insn (operands[0], gen_lowpart (DImode, operands[1]));
+  break;
+case 1:
+  emit_move_insn (operands[0], gen_highpart (DImode, operands[1]));
+  break;
+default:
+  neon_lane_bounds (operands[2], 0, 1);
+  FAIL;
+}
   DONE;
 })
 
--- config/arm/neon.ml  (revision 185587)
+++ config/arm/neon.ml  (local)
@@ -234,7 +234,8 @@ type features =
cases.  The function supplied must return the integer to be written
into the testcase for the argument number (0-based) supplied to it.  *)
   | Const_valuator of (int -> int)
-  | Fixed_return_reg
+  | Fixed_vector_reg
+  | Fixed_core_reg
 
 exception MixedMode of elts * elts
 
@@ -1009,7 +1010,8 @@ let ops =
 Vget_lane,
   [InfoWord;
Disassembles_as [Use_operands [| Corereg; Corereg; Dreg |]];
-   Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)],
+   Instruction_name ["vmov"; "fmrrd"]; Const_valuator (fun _ -> 0);
+   Fixed_core_reg],
   Use_operands [| Corereg; Qreg; Immed |],
   "vgetQ_lane", notype_2, [S64; U64];
 
@@ -1125,7 +1127,7 @@ let ops =
   notype_1, pf_su_8_64;
 Vget_low, [Instruction_name ["vmov"];
Disassembles_as [Use_operands [| Dreg; Dreg |]];
-  Fixed_return_reg],
+  Fixed_vector_reg],
   Use_operands [| Dreg; Qreg |], "vget_low",
   notype_1, pf_su_8_32;
  Vget_low, [No_op],
--- testsuite/gcc.target/arm/neon/vgetQ_lanes64.c   (revision 185587)
+++ testsuite/gcc.target/arm/neon/vgetQ_lanes64.c   (local)
@@ -10,11 +10,11 @@
 
 void test_vgetQ_lanes64 (void)
 {
-  int64_t out_int64_t;
+  register int64_t out_int64_t asm ("r0");
   int64x2_t arg0_int64x2_t;
 
   out_int64_t = vgetq_lane_s64 (arg0_int64x2_t, 0);
 }
 
-/* { dg-final { scan-assembler "vmov\[ \]+\[rR\]\[0-9\]+, 
\[rR\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\

Re: C++ PATCH to mangling of 'new auto'

2012-03-21 Thread Jakub Jelinek

On Wed, Mar 21, 2012 at 12:04:18AM -0400, Jason Merrill wrote:
> This also seems like it might be a candidate for 4.7.0.  What do you
> think, Jakub?

Ok for 4.7.0. 

Jakub

RE: [Patch,AVR]: Hack around PR rtl-optimization/52543, Take #2

2012-03-21 Thread Weddington, Eric



> -Original Message-
> From: Georg-Johann Lay 
> Sent: Tuesday, March 20, 2012 1:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Denis Chertykov; Weddington, Eric
> Subject: Re: [Patch,AVR]: Hack around PR rtl-optimization/52543, Take
#2
> 
> And here is the patch...
> 
> Georg-Johann Lay wrote:
> 

Please commit.

Re: [PATCH] Fix PR52636

2012-03-21 Thread Paolo Carlini


On 03/21/2012 09:06 AM, Richard Guenther wrote:

2012-03-20  Richard Guenther

PR tree-optimizer/52636

Typo ;)

Paolo.

Re: [RFC PATCH 0/3] Fixing expansion of misaligned MEM_REFs on strict-alignment targets

2012-03-21 Thread Georg-Johann Lay

Martin Jambor wrote:
> Hi,
> 
> On Tue, Mar 20, 2012 at 08:16:04PM +0100, Georg-Johann Lay wrote:
>> Martin Jambor wrote:
>>> Hi,
>>>
>>> this is another iteration of my attempts to fix expansion of
>>> misaligned memory accesses on strict-alignment platforms (which was
>>> suggested by Richi in
>>> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00931.html and my first
>>> attempt was posted as
>>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00319.html).
>>>
>>> This time I got further, to big extent thanks to parts of Richi's
>>> fixes of PR 50444 which cleaned up expr.c considerably.  I have
>>> successfully bootstrapped the combined patch on x86_64-linux,
>>> i686-linux, ia64-linux (without Ada) and sparc64-linux (without Java).
>>> I have run the c and c++ testsuites on individual patches on sparc64
>>> and ia64 too.
>>>
>>> Nevertheless, since I still lack experience in this area, there will
>>> almost certainly be comments and suggestions and therefore I have
>>> divided the three main changes to three different patches, so that
>>> they are easier to comment on by both me and anybody reviewing them.
>>>
>>> Thanks in advance for any comments,
>>>
>>> Martin
>> Hi Martin.
>>
>> The new test cases make implications on the size of int: for example
>>  they crash for targets with sizeof(int) == 2
>>
> 
> Crash?  I assume the tests misaligned-expand-[12].c abort because of
> the comparison with 0xdeadbeef constant, misaligned-expand-3.c should

Yes. With "crash" I meant "failing at run-time" as opposed to a fail at
compile-time.

> not be affected by size of int.  If that is the case, does the
> following patch fixes the issue for you?

Yes, the tests pass now. I went ahead and applied your patch:

http://gcc.gnu.org/viewcvs?view=revision&revision=185602

Johann


> Martin
> 
> 
> 2012-03-21  Martin Jambor  
> 
>   * gcc.dg/misaligned-expand-1.c (cst): Cast to int.
>   * gcc.dg/misaligned-expand-2.c (cst): Likewise.
> 
> Index: src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
> ===
> --- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-1.c
> +++ src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
> @@ -14,7 +14,7 @@ foo (myint *p)
>return *p;
>  }
>  
> -#define cst 0xdeadbeef
> +#define cst (int) 0xdeadbeef
>  #define NUM 8
>  
>  struct blah
> Index: src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
> ===
> --- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-2.c
> +++ src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
> @@ -14,7 +14,7 @@ foo (myint *p, unsigned int i)
>*p = i;
>  }
>  
> -#define cst 0xdeadbeef
> +#define cst (int) 0xdeadbeef
>  #define NUM 8
>  
>  struct blah
> @@ -25,8 +25,6 @@ struct blah
>  
>  struct blah g;
>  
> -#define cst 0xdeadbeef
> -
>  int
>  main (int argc, char **argv)
>  {
> 
> 
>

Re: remove wrong code in immed_double_const

2012-03-21 Thread Richard Sandiford

Mike Stump  writes:
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index de45a22..0c6dc45 100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -1530,7 +1530,9 @@ Represents either a floating-point constant of mode 
> @var{m} or an
>  integer constant too large to fit into @code{HOST_BITS_PER_WIDE_INT}
>  bits but small enough to fit within twice that number of bits (GCC
>  does not provide a mechanism to represent even larger constants).  In
> -the latter case, @var{m} will be @code{VOIDmode}.
> +the latter case, @var{m} will be @code{VOIDmode}.  For integral values
> +the value is a signed value, meaning the top bit of
> +@code{CONST_DOUBLE_HIGH} is a sign bit.
>  
>  @findex CONST_DOUBLE_LOW
>  If @var{m} is @code{VOIDmode}, the bits of the value are stored in

Sounds good.

> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index 78ddfc3..c0b24e4 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -531,10 +531,9 @@ immed_double_const (HOST_WIDE_INT i0, HOST_WIDE_INT i1, 
> enum machine_mode mode)
>  
>   1) If GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT, then we use
>   gen_int_mode.
> - 2) GET_MODE_BITSIZE (mode) == 2 * HOST_BITS_PER_WIDE_INT, but the value 
> of
> - the integer fits into HOST_WIDE_INT anyway (i.e., i1 consists only
> - from copies of the sign bit, and sign of i0 and i1 are the same),  then
> - we return a CONST_INT for i0.
> + 2) If the value of the integer fits into HOST_WIDE_INT anyway
> +(i.e., i1 consists only from copies of the sign bit, and sign
> + of i0 and i1 are the same), then we return a CONST_INT for i0.
>   3) Otherwise, we create a CONST_DOUBLE for i0 and i1.  */
>if (mode != VOIDmode)
>  {

This too.

> diff --git a/gcc/explow.c b/gcc/explow.c
> index 2fae1a1..6284d61 100644
> --- a/gcc/explow.c
> +++ b/gcc/explow.c
> @@ -96,6 +96,9 @@ plus_constant (rtx x, HOST_WIDE_INT c)

For this I think we should make plus_constant a wrapper:

/* Return an rtx for the sum of X and the integer C.  */

rtx
plus_constant (rtx x, HOST_WIDE_INT c)
{
  return plus_constant_mode (GET_MODE (x), x, c);
}

/* Return an rtx for the sum of X and the integer C, given that X
   has mode MODE.  */

rtx
plus_constant_mode (enum machine_mode mode, rtx x, HOST_WIDE_INT c)
{
  ...innards of current plus_constant, without the "mode = "...
}

Reason being...

>switch (code)
>  {
>  case CONST_INT:
> +  if (GET_MODE_BITSIZE (mode) > HOST_WIDE_INT)
> + /* Punt for now.  */
> + goto overflow;
>return GEN_INT (INTVAL (x) + c);
>  
>  case CONST_DOUBLE:

...this won't work as things stand, since CONST_INT always has VOIDmode.
(I'm on a slow mission to fix that.)

I agree that this is a pre-existing bug.  Callers that want to be
CONST_INT-safe should use the new plus_constant_mode instead of
plus_constant.  Once they do, we should assert here that mode isn't
VOIDmode.  But since it's an existing bug that also affects 2-HWI
constants, I agree that replacing calls to plus_constant with calls
to plus_constant_mode is a separate fix.

I don't think it's a good idea to punt to a PLUS though.
(plus (const_int X) (const_int Y)) isn't canonical rtl,
and could cause other problems.

Suggest instead we reuse the CONST_DOUBLE code for CONST_INT,
with l1 set to INTVAL and h1 set to the sign extension.

> @@ -103,10 +106,14 @@ plus_constant (rtx x, HOST_WIDE_INT c)
>   unsigned HOST_WIDE_INT l1 = CONST_DOUBLE_LOW (x);
>   HOST_WIDE_INT h1 = CONST_DOUBLE_HIGH (x);
>   unsigned HOST_WIDE_INT l2 = c;
> - HOST_WIDE_INT h2 = c < 0 ? ~0 : 0;
> + HOST_WIDE_INT h2 = c < 0 ? ~(HOST_WIDE_INT)0 : 0;
>   unsigned HOST_WIDE_INT lv;
>   HOST_WIDE_INT hv;
>  
> + if (GET_MODE_BITSIZE (mode) > 2*HOST_WIDE_INT)
> +   /* Punt for now.  */
> +   goto overflow;
> +
>   add_double (l1, h1, l2, h2, &lv, &hv);

Nicely, add_double returns true on overflow, so I think
we should replace the punt with:

   if (add_double_with_sign (l1, h1, l2, h2, &lv, &hv, false))
 gcc_assert (GET_MODE_BITSIZE (mode) <= 2 * HOST_WIDE_INT);

(Seems better to explicitly specify the sign, even though
add_double would be equivalent.)

> @@ -141,6 +148,9 @@ plus_constant (rtx x, HOST_WIDE_INT c)
>break;
>  
>  case PLUS:
> +  if (GET_MODE_BITSIZE (mode) > HOST_WIDE_INT)
> + /* Punt for now.  */
> + goto overflow;
>/* The interesting case is adding the integer to a sum.
>Look for constant term in the sum and combine
>with C.  For an integer constant term, we make a combined

For this I think we should change the recursive CONSTANT_P call
to use plus_constant_mode (with the mode of the PLUS) instead of
plus_constant.  It will then be correct for CONST_INT, and we can
remove the special CONST_INT case.

> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index ce4eab4..37e46b1 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -101,6 +101,7 @@ m

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread William J. Schmidt

On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote:
> On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski  wrote:
> > On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
> >  wrote:
> >> Greetings,
> >>
> >> Now that we're into stage 1 again, I'd like to submit the first round of
> >> changes for dominator-based strength reduction, which will address
> >> issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
> >> attaching two patches: the smaller (slsr-part1) is the patch I'm
> >> submitting for approval today, while the larger (slsr-fyi) is for
> >> reference only, but may be useful if questions arise about how the small
> >> patch fits into the intended whole.
> >>
> >> This patch contains the logic for identifying strength reduction
> >> candidates, and makes replacements only for those candidates where the
> >> stride is a fixed constant.  Replacement for candidates with fixed but
> >> unknown strides are not implemented herein, but that logic can be viewed
> >> in the larger patch.  This patch does not address strength reduction of
> >> data reference expressions, or candidates with conditional increments;
> >> those issues will be dealt with in future patches.
> >>
> >> The cost model is built on the one used by tree-ssa-ivopts.c, and I've
> >> added some new instruction costs to that model in place.  It might
> >> eventually be good to divorce that modeling code from IVOPTS, but that's
> >> an orthogonal patch and somewhat messy.
> >
> > I think this is the wrong way to do straight line strength reduction
> > considering we have a nice value numbering system which should be easy
> > to extended to support it.
> 
> Well, it is easy to handle very specific easy cases like
> 
> a = i * 2;
> b = i * 3;
> c = i * 4;
> 
> to transform it to
> 
> a = i * 2;
> b = a + i;
> c = b + i;
> 
> but already
> 
> a = i * 2;
> b = i * 4;
> c = i * 6;
> 
> would need extra special code.  The easy case could be handled in eliminate ()
> by, when seeing A * CST, looking up A * (CST - 1) and if that
> succeeds, transform
> it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done 
> this
> simple case at some point, but it failed to handle the common associated 
> cases,
> when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
> 3, etc.  I think
> it is the re-association in case of a strength-reduction opportunity
> that makes the
> separate pass better?  How would you suggest handling this case in the
> VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
> a * 2 and one for val + 2?  But then we still don't have a value for a + 1
> to re-use ...

And it becomes even more difficult with more complex scenarios.
Consider:

a = x + (3 * s);
b = x + (5 * s);
c = x + (7 * s);

The framework I've developed recognizes that this group of instructions
is related, and that it is profitable to replace them as follows:

a = x + (3 * s);
t = 2 * s;
b = a + t;
c = b + t;

The introduced multiply by 2 (one shift) is far cheaper than the two
multiplies that it replaces.  However, suppose you have instead:

a = x + (2 * s);
b = x + (8 * s);

Now it isn't profitable to replace this by:

a = x + (2 * s);
t = 6 * s;
b = a + t;

since a multiply by 6 (2 shifts, one add) is more costly than a multiply
by 8 (one shift).  To make these decisions correctly requires analyzing
all the related statements together, which value numbering as it stands
is not equipped to do.  Logic to handle these cases is included in my
larger "fyi" patch.

As another example, consider conditionally-executed increments:

a = i * 5;
if (...)
  i = i + 1;
b = i * 5;

This can be correctly and profitably strength-reduced as:

a = i * 5;
t = a;
if (...)
  {
i = i + 1;
t = t + 5;
  }
b = t;

(This is an approximation to the actual phi representation, which I've
omitted for clarity.)  Again, this kind of analysis is not something
that fits naturally into value numbering.  I don't yet have this in the
"fyi" patch, but have it largely working in a private version.

My conclusion is that if strength reduction is done in value numbering,
it must either be a very limited form of strength reduction, or the kind
of logic I've developed that considers chains of related candidates
together must be "glued onto" value numbering.  I think the latter would
be a mistake, as it would introduce much unnecessary complexity to what
is currently a very clean approach to PRE; the strength reduction would
become an ugly wart that people would complain about.  I think it's far
cleaner to keep the two issues separate.

> 
> Bill, experimenting with pattern detection in eliminate () would be a
> possibility.

For the reasons expressed above, I don't think that would get very far
or make anyone very happy...

I appreciate Andrew's view that value numbering is a logical place to do
strength reduction, but after considering the problem over the last few
months I have to disagree.  If you don't mind, at this point I would

Re: remove wrong code in immed_double_const

2012-03-21 Thread Michael Matz

Hi,

On Tue, 20 Mar 2012, Mike Stump wrote:

> > Actually you did.  I've tried yesterday to come up with a text that 
> > would do the same (because I agree with you that deleting the assert 
> > changes the spec of the function,
> 
> The spec of the function is the text above the definition of the 
> function, coupled with the information in the .texi file, would you 
> agree?

Actually, I wouldn't.  The real spec includes many pieces of information, 
the comments (that can be incomplete or become out of date), the .texi 
docu (which can be even more incomplete and out of date), the code (which 
can conflict with the comments and still be the correct variant) and the 
current usage (which can conflict with everything of the above).  asserts 
are IMO even a nice way of documenting parts of the spec.

> If so, could you please quote the text of the spec which would 
> be violated by removing the assert?  Could you please give a specific 
> value with which we could talk about that shows a violation of the spec.

Richard did so.  If the high bit of i1 is set then currently you will get 
a negative number produced no matter the absolute value of it.  That's IMO 
part of the spec, high bit set --> negative number.  negative as defined 
by the various routines dealing with CONST_INT or CONST_DOUBLE interpreted 
in the modes allowed for creating them.

If you were to allow modes of larger size than what could be completely 
specified with the two HWI arguments i0 and i1, then it suddenly depends 
on the absolute value if you get a negative or positive number.  For small 
negative numbers (those for which i1 is ~0 and i0 < 0) you'll still get a 
negative CONST_INT.  For large negative numbers you'll get a CONST_DOUBLE, 
that when interpreted in the large requested mode (which is the only thing 
that makes sense) is positive.  It doesn't matter that it's still 
negative when interpreted in a smaller mode.

Hence all values where i1 is between (HWI)1 << (hwibits-1)) and 
((HWI)~0)-1 are the values you're searching for, that show the problem.  
As you correctly note the routine will of course generate the exact same 
CONST_DOUBLE object no matter the mode given, but they have to be 
interpreted together with the given mode.

This positive/negative inconsistency doesn't make sense to allow, and the 
assert ensures that it isn't allowed.

Now, this inconsistency can also be avoided via different means.  By 
extending the block comment in front of the function for instance, but 
then the assert would make even more sense.  So Richards proposal to move 
the assert is better: The problem occurs only with large positive or 
negative values (i1 not 0 or ~0), so the mode-size check can be moved 
after the GEN_INT call.

This would have the seemingly strange effect of disallowing too large 
modes only for large values, but that's simply a side-effect of 
CONST_DOUBLE and the whole associated machinery not being able to 
consistently deal with constants wider than 2*HWI_BITS.

> My position is simple, the spec is what is above the definition and the 
> .texi files, and the stuff inside the definition are interesting 
> implementation details of that spec, which _never_ modify the spec.

As an abstract goal that's good.  But reality is that this isn't the case.  
GCC is quite excellently commented, but it doesn't fit the ideal.  Using 
the fact that it isn't ideal to claim that the spec doesn't say anything 
about large modes (when the assert clearly disallows them) is absurd.

> My position is that 0 is a value which the spec defines, and for which 
> we assert.  Please quote the line from the spec that defines what we do 
> in that case.  I've never seen anyone quote such a line.  To support 
> your position, I will insist on a direct quote from the spec.

This line disallows the value 0 with large modes:

  gcc_assert (GET_MODE_BITSIZE (mode) == 2 * HOST_BITS_PER_WIDE_INT);

I insist on it being part of the spec.  Moving the assert changes the spec 
to allow 0 (and generally small positive and negative numbers) to also be 
generated for larger modes.  If you so want to change the spec nobody 
would be opposed.

> if I is 42, we abort.  To back the position that spec must not be 
> changed, you need to explain at least one thing for which the wrong 
> thing will happen if the spec did change.  If you want to go down that 
> path, you will need to furnish one example where badness happens with 0, 
> not 2, not 3, but 0.

Huh.  Removing the assert wouldn't only allow 0, but also other values.  
I don't understand your argumentation: "because for 0 nothing bad happens, 
that proves that nothing bad happens for any other values which we would 
also allow, hence I can remove the assert"?  It of course doesn't prove 
anything at all.  In any case, above I have given the values that will be 
problematic (and they don't include 0), and a way of changing the spec to 
disallow only them, instead of all values.  Actually Richard S.

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread Richard Guenther

On Wed, 21 Mar 2012, William J. Schmidt wrote:

> On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote:
> > On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski  wrote:
> > > On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
> > >  wrote:
> > >> Greetings,
> > >>
> > >> Now that we're into stage 1 again, I'd like to submit the first round of
> > >> changes for dominator-based strength reduction, which will address
> > >> issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
> > >> attaching two patches: the smaller (slsr-part1) is the patch I'm
> > >> submitting for approval today, while the larger (slsr-fyi) is for
> > >> reference only, but may be useful if questions arise about how the small
> > >> patch fits into the intended whole.
> > >>
> > >> This patch contains the logic for identifying strength reduction
> > >> candidates, and makes replacements only for those candidates where the
> > >> stride is a fixed constant.  Replacement for candidates with fixed but
> > >> unknown strides are not implemented herein, but that logic can be viewed
> > >> in the larger patch.  This patch does not address strength reduction of
> > >> data reference expressions, or candidates with conditional increments;
> > >> those issues will be dealt with in future patches.
> > >>
> > >> The cost model is built on the one used by tree-ssa-ivopts.c, and I've
> > >> added some new instruction costs to that model in place.  It might
> > >> eventually be good to divorce that modeling code from IVOPTS, but that's
> > >> an orthogonal patch and somewhat messy.
> > >
> > > I think this is the wrong way to do straight line strength reduction
> > > considering we have a nice value numbering system which should be easy
> > > to extended to support it.
> > 
> > Well, it is easy to handle very specific easy cases like
> > 
> > a = i * 2;
> > b = i * 3;
> > c = i * 4;
> > 
> > to transform it to
> > 
> > a = i * 2;
> > b = a + i;
> > c = b + i;
> > 
> > but already
> > 
> > a = i * 2;
> > b = i * 4;
> > c = i * 6;
> > 
> > would need extra special code.  The easy case could be handled in eliminate 
> > ()
> > by, when seeing A * CST, looking up A * (CST - 1) and if that
> > succeeds, transform
> > it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done 
> > this
> > simple case at some point, but it failed to handle the common associated 
> > cases,
> > when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
> > 3, etc.  I think
> > it is the re-association in case of a strength-reduction opportunity
> > that makes the
> > separate pass better?  How would you suggest handling this case in the
> > VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
> > a * 2 and one for val + 2?  But then we still don't have a value for a + 1
> > to re-use ...
> 
> And it becomes even more difficult with more complex scenarios.
> Consider:
> 
> a = x + (3 * s);
> b = x + (5 * s);
> c = x + (7 * s);
> 
> The framework I've developed recognizes that this group of instructions
> is related, and that it is profitable to replace them as follows:
> 
> a = x + (3 * s);
> t = 2 * s;
> b = a + t;
> c = b + t;
> 
> The introduced multiply by 2 (one shift) is far cheaper than the two
> multiplies that it replaces.  However, suppose you have instead:
> 
> a = x + (2 * s);
> b = x + (8 * s);
> 
> Now it isn't profitable to replace this by:
> 
> a = x + (2 * s);
> t = 6 * s;
> b = a + t;
> 
> since a multiply by 6 (2 shifts, one add) is more costly than a multiply
> by 8 (one shift).  To make these decisions correctly requires analyzing
> all the related statements together, which value numbering as it stands
> is not equipped to do.  Logic to handle these cases is included in my
> larger "fyi" patch.
> 
> As another example, consider conditionally-executed increments:
> 
> a = i * 5;
> if (...)
>   i = i + 1;
> b = i * 5;
> 
> This can be correctly and profitably strength-reduced as:
> 
> a = i * 5;
> t = a;
> if (...)
>   {
> i = i + 1;
> t = t + 5;
>   }
> b = t;
> 
> (This is an approximation to the actual phi representation, which I've
> omitted for clarity.)  Again, this kind of analysis is not something
> that fits naturally into value numbering.  I don't yet have this in the
> "fyi" patch, but have it largely working in a private version.
> 
> My conclusion is that if strength reduction is done in value numbering,
> it must either be a very limited form of strength reduction, or the kind
> of logic I've developed that considers chains of related candidates
> together must be "glued onto" value numbering.  I think the latter would
> be a mistake, as it would introduce much unnecessary complexity to what
> is currently a very clean approach to PRE; the strength reduction would
> become an ugly wart that people would complain about.  I think it's far
> cleaner to keep the two issues separate.

I agree.

> > 
> > Bill, experimenting with pattern detection in eliminate () would be a
> > p

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread Richard Earnshaw

On 21/03/12 13:40, William J. Schmidt wrote:
> On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote:
>> On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski  wrote:
>>> On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
>>>  wrote:
 Greetings,

 Now that we're into stage 1 again, I'd like to submit the first round of
 changes for dominator-based strength reduction, which will address
 issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
 attaching two patches: the smaller (slsr-part1) is the patch I'm
 submitting for approval today, while the larger (slsr-fyi) is for
 reference only, but may be useful if questions arise about how the small
 patch fits into the intended whole.

 This patch contains the logic for identifying strength reduction
 candidates, and makes replacements only for those candidates where the
 stride is a fixed constant.  Replacement for candidates with fixed but
 unknown strides are not implemented herein, but that logic can be viewed
 in the larger patch.  This patch does not address strength reduction of
 data reference expressions, or candidates with conditional increments;
 those issues will be dealt with in future patches.

 The cost model is built on the one used by tree-ssa-ivopts.c, and I've
 added some new instruction costs to that model in place.  It might
 eventually be good to divorce that modeling code from IVOPTS, but that's
 an orthogonal patch and somewhat messy.
>>>
>>> I think this is the wrong way to do straight line strength reduction
>>> considering we have a nice value numbering system which should be easy
>>> to extended to support it.
>>
>> Well, it is easy to handle very specific easy cases like
>>
>> a = i * 2;
>> b = i * 3;
>> c = i * 4;
>>
>> to transform it to
>>
>> a = i * 2;
>> b = a + i;
>> c = b + i;
>>
>> but already
>>
>> a = i * 2;
>> b = i * 4;
>> c = i * 6;
>>
>> would need extra special code.  The easy case could be handled in eliminate 
>> ()
>> by, when seeing A * CST, looking up A * (CST - 1) and if that
>> succeeds, transform
>> it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done 
>> this
>> simple case at some point, but it failed to handle the common associated 
>> cases,
>> when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
>> 3, etc.  I think
>> it is the re-association in case of a strength-reduction opportunity
>> that makes the
>> separate pass better?  How would you suggest handling this case in the
>> VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
>> a * 2 and one for val + 2?  But then we still don't have a value for a + 1
>> to re-use ...
> 
> And it becomes even more difficult with more complex scenarios.
> Consider:
> 
> a = x + (3 * s);
> b = x + (5 * s);
> c = x + (7 * s);
> 
> The framework I've developed recognizes that this group of instructions
> is related, and that it is profitable to replace them as follows:
> 
> a = x + (3 * s);
> t = 2 * s;
> b = a + t;
> c = b + t;
> 

Given that CPUs often have shift+add, that's not necessarily best
either.  Also, on pipelined super-scalar systems you're serializing a
problem when it might be better to improve the parallelism.

The best sequence on ARM would probably be something like

a = x + (3 * s);
b = a + (2 * s); (ADD b, a, s, LSL #1)
c = a + (4 * s); (ADD b, a, s, LSL #2).

R.

Re: [Patch,AVR]: Hack around PR rtl-optimization/52543, Take #2

2012-03-21 Thread Georg-Johann Lay

Steven Bosscher wrote:
> On Tue, Mar 20, 2012 at 8:54 PM, Georg-Johann Lay  wrote:
>> Dropping the first patch which does not work because at expand-time there
>> must not be pre-/post-modify addressing :-(
> 
> Have you tried to fix that, instead? Or at least ask around a bit to
> see what people would think about that idea? The reasons why things
> are the way they are, may not be applicable anymore.

No, I didn't try to fix it. I am not experienced enough in that field.

Moreover, at least as far as avr is concerned, using post-inc would
just be a hack, too.

> For example, perhaps the only reason for not having pre-/post-modify
> addressing modes earlier is that the old "flow" dataflow frame work
> didn't handle them. And it doesn't seem to be so black-and-white: The
> very pass you ran into problems with first, cprop, does handle
> pre-/post-modify addresses in local cprop. Some other passes simply
> take the conservative path and drop pre-/post-modify (like CSE, which

The problems were not only in cprop but also in cselib.

> doesn't record values from them). It may be a relatively small job to
> make everything accept them, and you may be something that's also
> helpful for other targets.
> 
> Ciao!
> Steven

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread William J. Schmidt

On Wed, 2012-03-21 at 13:57 +, Richard Earnshaw wrote:
> On 21/03/12 13:40, William J. Schmidt wrote:
> > 
> > And it becomes even more difficult with more complex scenarios.
> > Consider:
> > 
> > a = x + (3 * s);
> > b = x + (5 * s);
> > c = x + (7 * s);
> > 
> > The framework I've developed recognizes that this group of instructions
> > is related, and that it is profitable to replace them as follows:
> > 
> > a = x + (3 * s);
> > t = 2 * s;
> > b = a + t;
> > c = b + t;
> > 
> 
> Given that CPUs often have shift+add, that's not necessarily best
> either.  Also, on pipelined super-scalar systems you're serializing a
> problem when it might be better to improve the parallelism.
> 
> The best sequence on ARM would probably be something like
> 
> a = x + (3 * s);
> b = a + (2 * s); (ADD b, a, s, LSL #1)
> c = a + (4 * s); (ADD b, a, s, LSL #2).
> 

These are good points, and I hope you'll keep an eye on this work as it
proceeds.  I should have been less categorical about stating the
profitability of the transformation.  My intent is that the cost model
will reflect the capabilities of the target machine, and for the machine
I'm most familiar with the transformation as shown is best.  Getting to
the optimal sequence that you show for ARM could be an interesting
challenge that might require additional logic in the cost model.  I'll
add it to my list of things to think about.

Thanks,
Bill

> 
> R.
>

Re: struct siginfo vs. siginfo_t

2012-03-21 Thread Thomas Schwinge

Hi!

On Thu, 15 Mar 2012 11:57:00 -0400, Carlos O'Donell  
wrote:
> On Thu, Mar 15, 2012 at 11:05 AM, Thomas Schwinge
>  wrote:
> > On 26 Feb 2012 18:17:52 -, drep...@sourceware.org wrote:
> >> http://sources.redhat.com/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=4efeffc1d583597e4f52985b9747269e47b754e2
> >>
> >> commit 4efeffc1d583597e4f52985b9747269e47b754e2
> >> Author: Ulrich Drepper 
> >> Date:   Sun Feb 26 13:17:27 2012 -0500
> >>
> >>     Fix up POSIX testing in conformtest
> >
> >> [...]
> >> +     * sysdeps/unix/sysv/linux/bits/siginfo.h: Don't name siginfo_t
> >> +     struct.  [...]
> >> [...]
> >
> >> diff --git a/sysdeps/unix/sysv/linux/bits/siginfo.h 
> >> b/sysdeps/unix/sysv/linux/bits/siginfo.h
> >> index ecef39d..0635e2f 100644
> >> --- a/sysdeps/unix/sysv/linux/bits/siginfo.h
> >> +++ b/sysdeps/unix/sysv/linux/bits/siginfo.h
> >> [...]
> >> @@ -47,7 +47,7 @@ typedef union sigval
> >>  #  define __SI_PAD_SIZE     ((__SI_MAX_SIZE / sizeof (int)) - 3)
> >>  # endif
> >>
> >> -typedef struct siginfo
> >> +typedef struct
> >>    {
> >>      int si_signo;            /* Signal number.  */
> >>      int si_errno;            /* If non-zero, an errno value associated 
> >> with
> >> [...]
> >
> > This change breaks GCC:
> >
> >    In file included from 
> > /scratch/tschwing/FM_sh-linux-gnu-mk2/src/gcc-mainline/libgcc/unwind-dw2.c:377:0:
> >    ./md-unwind-support.h: In function 'sh_fallback_frame_state':
> >    ./md-unwind-support.h:182:17: error: field 'info' has incomplete type
> >
> > In my case, this is really libgcc/config/sh/linux-unwind.h:
> >
> >    [...]
> >       181            struct rt_sigframe {
> >       182              struct siginfo info;
> >       183              struct ucontext uc;
> >       184            } *rt_ = context->cfa;
> >    [...]
> 
> POSIX says you get "siginto_t" *not* "struct siginfo," please fix the code.

There is one usage in boehm-gc/os_dep.c, but it is only used if
SUNOS5SIGS is defined, which it is only if one of SUNOS5, DRSNX, HPUX, or
FREEBSD is defined, which are all not using Linux-based glibc ports.

Likewise, gcc/ada/init.c has a struct __siginfo occurence, but only for
__FreeBSD__.

config/rs6000/linux-unwind.h uses ``char siginfo[128]'', and
config/s390/linux-unwind.h also uses a constant.

I tested the following patch for sh-linux-gnu.  This only covers one
configuration, but the change is pretty mechanic anyway and every place
that used to refer to struct siginfo already must have had  in
its include path, which is the same file that declares siginfo_t.

OK to commit?  This should probably also go into any active release
branches, to keep them buildable once this glibc change ripples through?

libgcc/
* config/alpha/linux-unwind.h (alpha_fallback_frame_state): Use
siginfo_t instead of struct siginfo.
* config/bfin/linux-unwind.h (bfin_fallback_frame_state): Likewise.
* config/i386/linux-unwind.h (x86_fallback_frame_state): Likewise.
* config/ia64/linux-unwind.h (ia64_fallback_frame_state)
(ia64_handle_unwabi): Likewise.
* config/mips/linux-unwind.h (mips_fallback_frame_state): Likewise.
* config/pa/linux-unwind.h (pa32_fallback_frame_state): Likewise.
* config/sh/linux-unwind.h (shmedia_fallback_frame_state)
(sh_fallback_frame_state): Likewise.
* config/tilepro/linux-unwind.h (tile_fallback_frame_state): Likewise.
* config/xtensa/linux-unwind.h (xtensa_fallback_frame_state): Likewise.

diff --git a/libgcc/config/alpha/linux-unwind.h 
b/libgcc/config/alpha/linux-unwind.h
index 4c811dc..f747053 100644
--- a/libgcc/config/alpha/linux-unwind.h
+++ b/libgcc/config/alpha/linux-unwind.h
@@ -49,7 +49,7 @@ alpha_fallback_frame_state (struct _Unwind_Context *context,
   else if (pc[1] == 0x201f015f)/* lda $0,NR_rt_sigreturn */
 {
   struct rt_sigframe {
-   struct siginfo info;
+   siginfo_t info;
struct ucontext uc;
   } *rt_ = context->cfa;
   sc = &rt_->uc.uc_mcontext;
diff --git a/libgcc/config/bfin/linux-unwind.h 
b/libgcc/config/bfin/linux-unwind.h
index 88c8285..6e8f1ad 100644
--- a/libgcc/config/bfin/linux-unwind.h
+++ b/libgcc/config/bfin/linux-unwind.h
@@ -48,10 +48,10 @@ bfin_fallback_frame_state (struct _Unwind_Context *context,
 {
   struct rt_sigframe {
int sig;
-   struct siginfo *pinfo;
+   siginfo_t *pinfo;
void *puc;
char retcode[8];
-   struct siginfo info;
+   siginfo_t info;
struct ucontext uc;
   } *rt_ = context->cfa;

diff --git a/libgcc/config/i386/linux-unwind.h 
b/libgcc/config/i386/linux-unwind.h
index f17a46c..33810c5 100644
--- a/libgcc/config/i386/linux-unwind.h
+++ b/libgcc/config/i386/linux-unwind.h
@@ -139,9 +139,9 @@ x86_fallback_frame_state (struct _Unwind_Context *context,
 {
   struct rt_sigframe {
int sig;
-   struct siginfo *pinfo;
+   siginfo_t *pinfo;
void *puc;
-   struct sigi

[RFC][PATCH] A change to do_while_loop_p()

2012-03-21 Thread Razya Ladelsky

Hi,

I need to use do_while_loop_p, but I'm not sure its functionality is what 
I expected it to be.

This is the part that I do not understand:

/* If the header contains just a condition, it is not a do-while loop.  */
  stmt = last_and_only_stmt (loop->header);
 if (stmt
  && gimple_code (stmt) == GIMPLE_COND)
return false;

The header could contain a condition which is not the loop's exit 
condition,
but rather a part of its body, then  why do we rule out this loop as a 
do_while loop?

I ran into this in a loop (the outer loop) extracted from bwaves 
benchmark:

  do k=1,nz
 km1=mod(k+nz-2,nz)+1
 kp1=mod(k,nz)+1
 do j=1,ny
jm1=mod(j+ny-2,ny)+1
jp1=mod(j,ny)+1
.
 enddo
  enddo
 
which was translated to:

D.2361_17 = *ny_16(D);

:
  # k_3 = PHI <1(4), k_562(25)>
  if (D.2361_17 > 0)
goto ;
  else
goto ;

:
  k_562 = k_3 + 1;
  # DEBUG k => k_562
  if (k_3 == D.1583_270)
goto ;  --->   return
  else
goto ;

:
  goto ;

:  --> starting the body of the the second loop
  pretmp.318_776 = (integer(kind=8)) k_3;
  pretmp.318_777 = stride.92_20 * pretmp.318_776;
... 



bb 5 is the header of the outer loop, and bb 25 is the latch.
According to do_while_loop_p ()  this is NOT a do while loop, but it
seems that it should be.

 I am attaching a patch to change do_while_loop_p() assuming that what I 
understand is indeed correct,
 Please let me know if I'm right,

Thank you,
Razya

Index: tree-ssa-loop-ch.c
===
--- tree-ssa-loop-ch.c  (revision 185604)
+++ tree-ssa-loop-ch.c  (working copy)
@@ -107,6 +107,8 @@ should_duplicate_loop_header_p (basic_block header
 bool
 do_while_loop_p (struct loop *loop)
 {
+  edge exit_edge;
+  gimple cond_stmt;
   gimple stmt = last_stmt (loop->latch);
 
   /* If the latch of the loop is not empty, it is not a do-while loop.  */
@@ -116,8 +118,14 @@ do_while_loop_p (struct loop *loop)
 
   /* If the header contains just a condition, it is not a do-while loop.  */
   stmt = last_and_only_stmt (loop->header);
+  exit_edge = single_dom_exit (loop);
+  if (exit_edge)
+cond_stmt = last_stmt (exit_edge->src);
+  else
+cond_stmt =stmt;
   if (stmt
-  && gimple_code (stmt) == GIMPLE_COND)
+  && gimple_code (stmt) == GIMPLE_COND
+  && stmt == cond_stmt)
 return false;
 
   return true;
=

[fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Rainer Orth

As reported in PR other/52626, make check in fixincludes is currently
failing since I neglected to adapt the baseline for the Solaris 8
removal ;-(  I always meant to run make check, but forgot.

On the other hand, it would be really helpful if fixincludes make check
could emit DejaGnu-style fixincludes.{sum, log} files which would
automatically be picked up by make mail-report.log and make failures
immediately obvious.

The following patch fixes this, tested with make check in fixincludes on
i386-pc-solaris2.11.

Ok for mainline?

Rainer


2012-03-21  Rainer Orth  

PR other/52626
* tests/base/pthread.h [SOLARIS_MUTEX_INIT_2_CHECK]
(PTHREAD_COND_INITIALIZER): Adapt for solaris_cond_init removal.

# HG changeset patch
# Parent aa297e98c9489d9734c7a503fc3275455c33985e
Fix pthread.h failure (PR other/52626)

diff --git a/fixincludes/tests/base/pthread.h b/fixincludes/tests/base/pthread.h
--- a/fixincludes/tests/base/pthread.h
+++ b/fixincludes/tests/base/pthread.h
@@ -83,9 +83,9 @@ extern int __sigsetjmp (struct __jmp_buf
 #define PTHREAD_MUTEX_INITIALIZER	{{{0},0}, {{{0}}}, {0}}
 #endif
 #if __STDC__ - 0 == 0 && !defined(_NO_LONGLONG)
-#define PTHREAD_COND_INITIALIZER	{{{0}, 0, 0x4356}, 0}	/* DEFAULTCV */
+#define PTHREAD_COND_INITIALIZER	{{{0}, 0}, 0}	/* DEFAULTCV */
 #else
-#define PTHREAD_COND_INITIALIZER	{{{0}, 0, 0x4356}, {0}}	/* DEFAULTCV */
+#define PTHREAD_COND_INITIALIZER	{{{0}, 0}, {0}}	/* DEFAULTCV */
 #endif
 #if __STDC__ - 0 == 0 && !defined(_NO_LONGLONG)
 #define	PTHREAD_MUTEX_INITIALIZER		/* = DEFAULTMUTEX */	\

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: remove wrong code in immed_double_const

2012-03-21 Thread Mike Stump

On Mar 21, 2012, at 6:47 AM, Michael Matz wrote:
> Actually, I wouldn't.

Ok, thanks for explaining.  In light of that, I'd just say, I want to change 
the spec, the details don't change any for me, only the terminology I might 
use.  The problem is the spec is causing aborts on valid code, and hence, 
either, the code must be duplicated and fixed, or the code has to be fixed.  I 
don't see any value in duplicating the code, so, I am left with fixing the spec 
so that valid programs produce valid code.

> If the high bit of i1 is set then currently you will get 
> a negative number produced no matter the absolute value of it.

Ok, in the new patch, I'm pushing to change the spec so that the value is sign 
extended and fixing all the code that doesn't conform to that spec.  Richard 
seems to be agreeable with the basic idea, though, we are now sorting out all 
the little details to make that happen.  If there is any down-side or details 
we missed or got wrong, please chime in.

> For large negative numbers you'll get a CONST_DOUBLE, 
> that when interpreted in the large requested mode (which is the only thing 
> that makes sense) is positive.

In the new patch, we use sign extension, and when the high bit is set, the 
value is interpreted as a negative number is a larger mode.  I'll test signed 
and unsigned constants coming in from above to ensure the right thing happens.  
Above the signededness is tracked via the type.  In the rtl constant, it isn't, 
so that code will need an assert to prevent large unsigned values from taking 
this code path.

> Hence all values where i1 is between (HWI)1 << (hwibits-1)) and 
> ((HWI)~0)-1 are the values you're searching for, that show the problem.

Presently, I am fixing _all_ problems shown with those values.  If you know of 
any that we don't address, love to hear about them.

> This positive/negative inconsistency doesn't make sense to allow, and the 
> assert ensures that it isn't allowed.

Don't need the assert when there is no inconsistency, I believe that resolving 
any inconsistencies should remove the need for the assert.

> This would have the seemingly strange effect of disallowing too large 
> modes only for large values, but that's simply a side-effect of 
> CONST_DOUBLE and the whole associated machinery not being able to 
> consistently deal with constants wider than 2*HWI_BITS.

I'll move that assert up to the code that has the type information for the 
constant.

>> if I is 42, we abort.  To back the position that spec must not be 
>> changed, you need to explain at least one thing for which the wrong 
>> thing will happen if the spec did change.  If you want to go down that 
>> path, you will need to furnish one example where badness happens with 0, 
>> not 2, not 3, but 0.
> 
> Huh.  Removing the assert wouldn't only allow 0, but also other values.  
> I don't understand your argumentation: "because for 0 nothing bad happens, 
> that proves that nothing bad happens for any other values which we would 
> also allow, hence I can remove the assert"?

Right, it merely proves that the assert is wrong and needs fixing.  Once you 
accept that, then we progress on exactly what it should be.  This is now all 
mostly moot, given that I'm fine with changing the spec as Richard suggested to 
be a sign-extended constant.  Once we have that nice are concrete definition, 
the any code conflicts with that, is buggy, and we just fix.  Seems like a nice 
way forward to me.

> It of course doesn't prove anything at all.

:-)  Only the point I wanted to make; that 0 is safe.  As such, it proves that 
the spec, as you might call it, is wrong.  Once that spec is proven wrong, 
trivially, fixing it, isn't unreasonable.

[IA-64] Work around bug in unwinder

2012-03-21 Thread Eric Botcazou

Another latent issue exposed on IA-64 (both Linux and VMS) by GCC 4.7: the LC 
(Loop Counter) register isn't preserved by the unwinder.

The compiler generates unwind info for LC and unwind-ia64.c:uw_install_context 
restores it if this is deemed necessary.  The hitch is that "deemed necessary" 
means "register saved at some point along the path between thrower and catcher 
and going through _Unwind_RaiseException". Now if a register isn't saved along 
this path but clobbered very late, namely in uw_install_context, then nothing 
restores it before the longjmp.

unwind-ia64.c:uw_install_context reads:

static void __attribute__((noreturn))
uw_install_context (struct _Unwind_Context *current __attribute__((unused)),
struct _Unwind_Context *target)
{
  unsigned long ireg_buf[4], ireg_nat = 0, ireg_pr = 0;
  long i;

  /* Copy integer register data from the target context to a
 temporary buffer.  Do this so that we can frob AR.UNAT
 to get the NaT bits for these registers set properly.  */
  for (i = 4; i <= 7; ++i)
{
  char nat;
  void *t = target->ireg[i - 2].loc;
  if (t)
{
  unw_access_gr (target, i, &ireg_buf[i - 4], &nat, 0);
  ireg_nat |= (long)nat << (((size_t)&ireg_buf[i - 4] >> 3) & 0x3f);
  /* Set p6 - p9.  */
  ireg_pr |= 4L << i;
}
}

and it clobbers LC because of the loop when compiled with GCC 4.7 and above.

Bootstrapped/regtested on IA-64/Linux, OK for the mainline?  Do we also want it 
for 4.7.1?


2012-03-21  Eric Botcazou  

* config/ia64/unwind-ia64.c (uw_install_context): Manually save LC
if it hasn't been previously saved.


-- 
Eric Botcazou
Index: libgcc/config/ia64/unwind-ia64.c
===
--- libgcc/config/ia64/unwind-ia64.c	(revision 185395)
+++ libgcc/config/ia64/unwind-ia64.c	(working copy)
@@ -2171,8 +2171,20 @@ uw_install_context (struct _Unwind_Conte
 		struct _Unwind_Context *target)
 {
   unsigned long ireg_buf[4], ireg_nat = 0, ireg_pr = 0;
+  unsigned long saved_lc;
   long i;
 
+  /* ??? LC is a fixed register so the call to __builtin_unwind_init in
+ uw_init_context doesn't cause it to be saved.  In case it isn't in
+ the user frames either, we need to manually do it here, lest it be
+ clobbered by the loop just below.  */
+  if (target->lc_loc == NULL)
+{
+  register unsigned long lc asm ("ar.lc");
+  saved_lc = lc;
+  target->lc_loc = &saved_lc;
+}
+
   /* Copy integer register data from the target context to a
  temporary buffer.  Do this so that we can frob AR.UNAT
  to get the NaT bits for these registers set properly.  */

Re: [fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Bruce Korb

Hi Rainer,

On Wed, Mar 21, 2012 at 9:25 AM, Rainer Orth
 wrote:
> As reported in PR other/52626, make check in fixincludes is currently
> failing since I neglected to adapt the baseline for the Solaris 8
> removal ;-(  I always meant to run make check, but forgot.
>
> On the other hand, it would be really helpful if fixincludes make check
> could emit DejaGnu-style fixincludes.{sum, log} files which would
> automatically be picked up by make mail-report.log and make failures
> immediately obvious.

Patch welcome!  I, myself, don't know what "emit DejaGnu-style
fixincludes.{sum, log} files" would mean.

> The following patch fixes this, tested with make check in fixincludes on
> i386-pc-solaris2.11.
>
> Ok for mainline?

I'm sure you examined the difference by hand and confirmed that the change is
expected.  Then yes, please, by all means.  Thank you!  - Bruce

[pph] Fix x1mbstate_t.h (issue5872043)

2012-03-21 Thread Diego Novillo

Fix x1mbstate_t.h.

This patch fixes the parser segmentation fault caused by a name
lookup failure (details in
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01369.html).

I am not 100% sure that this is the right fix, but Jason seems to
think that the theory behind this is fine (parser does no allow a
USING_DECL to be set in the bindings of an identifier).  I left a
FIXME note to help future debugging.


2012-03-21   Diego Novillo  

cp/ChangeLog.pph
* name-lookup.c (pph_set_namespace_decl_binding): Ignore
USING_DECLs.

testsuite/ChangeLog.pph
* g++.dg/pph/x1mbstate_t.h: Mark fixed.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 947708d..1b33ce3 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -6271,7 +6271,11 @@ pph_set_namespace_decl_binding (tree decl, 
cp_binding_level *bl, int flags)
 {
   /* Set the namespace identifier binding for a single decl.  */
   tree id = DECL_NAME (decl);
-  if (id)
+  /* FIXME pph.  USING_DECLs do not seem to be used in bindings by
+ the parser. This was causing the SEGV in
+ testsuite/g++.dg/pph/x1mbstate_t.h.  It's unclear whether this is
+ the right fix.  */
+  if (id && TREE_CODE (decl) != USING_DECL)
 pph_set_identifier_binding (id, decl, bl, flags);
 }
 
diff --git a/gcc/testsuite/g++.dg/pph/x1mbstate_t.h 
b/gcc/testsuite/g++.dg/pph/x1mbstate_t.h
index c07a0cc..4d473e4 100644
--- a/gcc/testsuite/g++.dg/pph/x1mbstate_t.h
+++ b/gcc/testsuite/g++.dg/pph/x1mbstate_t.h
@@ -1,9 +1,8 @@
-// { dg-xfail-if "identifier bindings not set properly" { "*-*-*" } { 
"-fpph-map=pph.map"} }
-// { dg-bogus ".*Segmentation fault" "ICE trying to parse std::mbstate_t"  { 
xfail *-*-* } 0 }
 #ifndef _X1_MBSTATE_H
 #define _X1_MBSTATE_H
 #include "x0mbstate_t.h"
-// Name lookup for std::mbstate_t fails here.  Instead of returning the global
-// type_decl for mbstate_t, it returns the "usings ::mbstate_t" declaration.
+// Name lookup for std::mbstate_t was failingfails here.  Instead of returning
+// the global type_decl for mbstate_t, it was returning the
+// "usings ::mbstate_t" declaration.
 typedef std::mbstate_t state_type;
 #endif

--
This patch is available for review at http://codereview.appspot.com/5872043

[PATCH] Partial Transition fix attempt

2012-03-21 Thread redbrain

in tree-mudflap.c to change usage of fold_conver to build_int_cst 
http://gcc.gnu.org/wiki/Partial_Transitions. I am not 100% sure this is correct 
but maybe someone can shed some light.

---
 gcc/tree-mudflap.c |   38 --
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-mudflap.c b/gcc/tree-mudflap.c
index e4f6ec0..684b7b4 100644
--- a/gcc/tree-mudflap.c
+++ b/gcc/tree-mudflap.c
@@ -567,8 +567,7 @@ mf_build_check_statement_for (tree base, tree limit,
 
   /* Build: __mf_base = (uintptr_t) .  */
   seq = gimple_seq_alloc ();
-  t = fold_convert_loc (location, mf_uintptr_type,
-   unshare_expr (base));
+  t = build_int_cst_type (mf_uintptr_type, TREE_INT_CST_LOW (unshare_expr 
(base)));
   t = force_gimple_operand (t, &stmts, false, NULL_TREE);
   gimple_seq_add_seq (&seq, stmts);
   g = gimple_build_assign (mf_base, t);
@@ -576,8 +575,7 @@ mf_build_check_statement_for (tree base, tree limit,
   gimple_seq_add_stmt (&seq, g);
 
   /* Build: __mf_limit = (uintptr_t) .  */
-  t = fold_convert_loc (location, mf_uintptr_type,
-   unshare_expr (limit));
+  t = build_int_cst_type (mf_uintptr_type, TREE_INT_CST_LOW (unshare_expr 
(limit)));
   t = force_gimple_operand (t, &stmts, false, NULL_TREE);
   gimple_seq_add_seq (&seq, stmts);
   g = gimple_build_assign (mf_limit, t);
@@ -853,18 +851,21 @@ mf_xform_derefs_1 (gimple_stmt_iterator *iter, tree *tp,
if (elt)
  elt = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (elt)),
elt);
-addr = fold_convert_loc (location, ptr_type_node, elt ? elt : 
base);
+   addr = build_int_cst_type (ptr_type_node, elt ? TREE_INT_CST_LOW 
(elt) :
+  TREE_INT_CST_LOW (base));
 addr = fold_build_pointer_plus_loc (location,
addr, byte_position (field));
   }
 else
   addr = build1 (ADDR_EXPR, build_pointer_type (type), t);
-
+   
 limit = fold_build2_loc (location, MINUS_EXPR, mf_uintptr_type,
- fold_build2_loc (location, PLUS_EXPR, 
mf_uintptr_type,
- fold_convert (mf_uintptr_type, addr),
- size),
- integer_one_node);
+fold_build2_loc (location, PLUS_EXPR, 
mf_uintptr_type,
+ build_int_cst_type 
(mf_uintptr_type,
+ 
TREE_INT_CST_LOW (addr)),
+ 
+ size),
+integer_one_node);
   }
   break;
 
@@ -908,17 +909,17 @@ mf_xform_derefs_1 (gimple_stmt_iterator *iter, tree *tp,
   return;
 
 bpu = bitsize_int (BITS_PER_UNIT);
-ofs = fold_convert (bitsizetype, TREE_OPERAND (t, 2));
+   ofs = build_int_cst_type (bitsizetype, TREE_INT_CST_LOW (TREE_OPERAND 
(t, 2)));
 rem = size_binop_loc (location, TRUNC_MOD_EXPR, ofs, bpu);
 ofs = size_binop_loc (location, TRUNC_DIV_EXPR, ofs, bpu);
 
-size = fold_convert (bitsizetype, TREE_OPERAND (t, 1));
+size = build_int_cst_type (bitsizetype, TREE_INT_CST_LOW (TREE_OPERAND 
(t, 1)));
 size = size_binop_loc (location, PLUS_EXPR, size, rem);
 size = size_binop_loc (location, CEIL_DIV_EXPR, size, bpu);
-size = fold_convert (sizetype, size);
+size = build_int_cst_type (sizetype, TREE_INT_CST_LOW (size));
 
 addr = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
-addr = fold_convert (ptr_type_node, addr);
+addr = build_int_cst_type (ptr_type_node, TREE_INT_CST_LOW (addr));
 addr = fold_build_pointer_plus_loc (location, addr, ofs);
 
 base = addr;
@@ -1048,8 +1049,8 @@ mx_register_decls (tree decl, gimple_seq seq, location_t 
location)
 
  /* Variable-sized objects should have sizes already been
 gimplified when we got here. */
- size = fold_convert (size_type_node,
-  TYPE_SIZE_UNIT (TREE_TYPE (decl)));
+ size = build_int_cst_type (size_type_node,
+TREE_INT_CST_LOW (TYPE_SIZE_UNIT 
(TREE_TYPE (decl;
  gcc_assert (is_gimple_val (size));
 
 
@@ -1233,11 +1234,12 @@ mudflap_register_call (tree obj, tree object_size, tree 
varname)
   tree arg, call_stmt;
 
   arg = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (obj)), obj);
-  arg = fold_convert (ptr_type_node, arg);
+  arg = build_int_cst_type (ptr_type_node, TREE_INT_CST_LOW (arg));
 
   call_stmt = build_call_expr (mf_register_fndecl, 4,
   arg,
-  fold_convert (size_type_node, object_size),
+

Re: [C11-atomic] test invalid hoist across and acquire load

2012-03-21 Thread Andrew MacLeod


On 03/21/2012 01:35 PM, Aldy Hernandez wrote:
In the test below, we cannot cache either [x] or [y] neither before 
the load of flag1 nor the load of flag2.  This is because the 
corresponding store/release can flush a different value of x or y:


+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+i = x + y;
+
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+a = 10;
+  j = x + y;



Actually, does it need to be that complicated?

can't you simply have the "other_thread" process monotonically increase 
x by 1 every cycle?


then if the load is hoisted and commoned,  
simulate_thread_final_verify() can simply check that if  i == j,  it 
knows that x was loaded as a common value and reused when calculating 
j.   with the other thread increasing x eveyr sycle, they should never 
be the same value.


Andrew

Re: [C11-atomic] test invalid hoist across and acquire load

2012-03-21 Thread Andrew MacLeod


On 03/21/2012 01:35 PM, Aldy Hernandez wrote:


The pass at fault here is the combine stack adjustment RTL pass.  I 
have not looked into why this is happening, but I wanted to get this 
test into the branch lest we forget about it.


Is this OK for the branch?  Is my understanding correct?


Fine for the C11-atomic branch..  we'll accumulate all the failing tests 
there for now... and then address them when we go memory-model 
compliance hunting...


Andrew

[google][4.6]Bump param value of default function size limit for auto cloning

2012-03-21 Thread Sriraman Tallam

Hi,

  I am bumping up the default param value of  function size limit for
auto cloning. Since auto cloning happens on inlined functions, the
original value does not catch some cases in one of our benchmarks.

  Automatic function versioning is only available in the
google/gcc-4_6 branch. I am working on porting this to trunk. Please
see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01230.html for
description.

Thanks,
-Sri.


Index: params.def
===
--- params.def  (revision 185514)
+++ params.def  (working copy)
@@ -1040,7 +1040,7 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
 DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
  "autoclone-function-size-limit",
  "Do not auto clone functions beyond this size.",
- 450, 0, 10)
+ 5000, 0, 10)

 /*
 Local variables:

[C11-atomic] test invalid hoist across and acquire load

2012-03-21 Thread Aldy Hernandez

In the test below, we cannot cache either [x] or [y] neither before the 
load of flag1 nor the load of flag2.  This is because the corresponding 
store/release can flush a different value of x or y:


+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+i = x + y;
+
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+a = 10;
+  j = x + y;

For example, on x86-64, we are hoisting "x" and "y" before the load of 
flag2:


movlflag1(%rip), %eax
movlx(%rip), %edx   <-- hoist of X
testl   %eax, %eax
movly(%rip), %eax   <-- hoist of Y
je  .L2
leal(%edx,%eax), %ecx
movl%ecx, i(%rip)
.L2:
movlflag2(%rip), %ecx
testl   %ecx, %ecx
je  .L3
movl$10, a(%rip)
.L3:
addl%edx, %eax  <-- x/y may have changed by the
acquire of flag2.
movl%eax, j(%rip)
ret

(For that matter, we are also hoisting "x" before the actual test of 
flag1 as well, but I believe this is allowed since flag1 has already 
been loaded.)


The pass at fault here is the combine stack adjustment RTL pass.  I have 
not looked into why this is happening, but I wanted to get this test 
into the branch lest we forget about it.


Is this OK for the branch?  Is my understanding correct?

Aldy

Index: atomic-hoist-1.c
===
--- atomic-hoist-1.c(revision 0)
+++ atomic-hoist-1.c(revision 0)
@@ -0,0 +1,96 @@
+/* { dg-do link } */
+/* { dg-require-effective-target sync_int_long } */
+/* { dg-final { simulate-thread } } */
+
+/* Test that a hoist is not performed across an acquire barrier.  */
+
+#include 
+#include "simulate-thread.h"
+
+int flag1=1, flag2=1;
+
+unsigned int x=1, y=2, i=0x1234, j=0x5678, a;
+
+/* These two tables are random numbers such that there are no two
+   pairs between the both tables that yield the same sum.  */
+
+unsigned int table1[16] = {
+  24747, 19512, 3692, 25985,
+  25079, 24, 3310, 22073,
+  4026, 25641, 35240, 35542,
+  24783, 17378, 12184, 23755
+};
+
+unsigned int table2[16] = {
+  2467, 37461, 14064, 36460,
+  46434, 8387, 42174, 36763,
+  49205, 48759, 10526, 3446,
+  14035, 2195, 6798, 38782
+};
+
+int table_cycle_size = 16;
+
+/* At each instruction, get a new X and Y from the tables to later
+   verify that we have not reused a value incorrectly.  */
+void simulate_thread_other_threads ()
+{
+  static int current = 0;
+
+  if (++current >= table_cycle_size)
+current = 0;
+  x = table1[current];
+  y = table2[current];
+}
+
+/* Return true if error, otherwise 0.  */
+int verify_result ()
+{
+  /* [i] should not equal [j], because that would mean that we hoisted
+ [x] and [y] instead of loading them again.  */
+  int fail = i == j;
+  if (fail)
+printf("FAIL: i (%u) should not equal j (%u)\n", i, j);
+  return fail;
+}
+
+int simulate_thread_step_verify ()
+{
+  return verify_result ();
+}
+
+int simulate_thread_final_verify ()
+{
+  return verify_result ();
+}
+
+__attribute__((noinline))
+void simulate_thread_main()
+{
+  /* The values of x or y should not be hoisted across reads of
+ flag[12].
+
+ For example, when the second load below synchronizes with another
+ thread, the synchronization is with a release, and that release
+ may cause a stored value of x/y to be flushed and become visible.
+ So, for this case, it is incorrect for CSE/CSA/and-others to
+ hoist x or y above the load of flag2.  */
+
+  /* Execute loads with value changing at various cyclic values.  */
+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+i = x + y;
+ 
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+a = 10;
+  j = x + y;
+
+  /* Since x and y have been changing at each instruction above, i and j
+ should be different.  If they are the same, we have hoisted
+ something incorrectly.  */
+}
+
+main()
+{
+  simulate_thread_main ();
+  simulate_thread_done ();
+  return 0;
+}

Re: [google][4.6]Bump param value of default function size limit for auto cloning

2012-03-21 Thread Xinliang David Li

ok.

thanks,

David

On Wed, Mar 21, 2012 at 11:20 AM, Sriraman Tallam  wrote:
> Hi,
>
>  I am bumping up the default param value of  function size limit for
> auto cloning. Since auto cloning happens on inlined functions, the
> original value does not catch some cases in one of our benchmarks.
>
>  Automatic function versioning is only available in the
> google/gcc-4_6 branch. I am working on porting this to trunk. Please
> see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01230.html for
> description.
>
> Thanks,
> -Sri.
>
>
> Index: params.def
> ===
> --- params.def  (revision 185514)
> +++ params.def  (working copy)
> @@ -1040,7 +1040,7 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
>  DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
>          "autoclone-function-size-limit",
>          "Do not auto clone functions beyond this size.",
> -         450, 0, 10)
> +         5000, 0, 10)
>
>  /*
>  Local variables:

[PATCH, alpha]: Use generic config/elfos.h headers

2012-03-21 Thread Uros Bizjak

Hello!

Attached patch enables alpha to use generic config/elfos.h headers on
linux and *bsd targets. The most important difference to generic
elfos.h is in

* config/alpha/elf.h (TARGET_ASM_FILE_START_FILE_DIRECTIVE): Undefine.

and

* config/alpha/alpha.h (NO_DOLLAR_IN_LABEL): Undefine.

Otherwise, the patch is very carefully written to not change anything
in alpha.h (it only #undefines some symbols before #define and removes
READOLNY_DATA_SECTION_ASM_OP that is always defined elsewhere), so VMS
target should continue to work without problems.

2012-03-21  Uros Bizjak  

* config.gcc (alpha*-*-linux*): Add elfos.h to tm_file.
(alpha*-*-freebsd*): Ditto.
(alpha*-*-netbsd*): Ditto.
(alpha*-*-openbsd*): Ditto.
* config/alpha/elf.h (OBJECT_FORMAT_ELF): Remove.
(DWARF2_DEBUGGING_INFO): Remove.
(PREFERRED_DEBUGGING_TYPE): Remove.
(ASM_FINAL_SPEC): Remove.
(IDENT_ASM_OP): Remove.
(ASM_OUTPUT_IDENT): Remove.
(SKIP_ASM_OP): Remove.
(ASM_OUTPUT_SKIP): Remove.
(ALIGN_ASM_OP): Remove.
(ASM_OUTPUT_BEFORE_CASE_LABEL): Remove.
(ASM_OUTPUT_CASE_LABEL): Remove.
(ASM_OUTPUT_EXTERNAL_LIBCALL): Remove.
(COMMON_ASM_OP): Remove.
(ASM_OUTPUT_ALIGNED_COMMON): Remove.
(ASCII_DATA_ASM_OP): Remove.
(READONLY_DATA_SECTION_ASM_OP): Remove.
(INIT_SECTION_ASM_OP): Remove.
(FINI_SECTION_ASM_OP): Remove.
(ASM_SECTION_START_OP): Remove.
(ASM_OUTPUT_SECTION_START_FILE): Remove.
(TARGET_ASM_NAMED_SECTION): Remove.
(TARGET_ASM_SELECT_SECTION): Remove.
(MAKE_DECL_ONE_ONLY): Remove.
(TYPE_ASM_OP): Remove.
(SIZE_ASM_OP): Remove.
(ASM_WEAKEN_LABEL): Remove.
(TYPE_OPERAND_FMT): Remove.
(ASM_DECLARE_RESULT): Remove.
(ASM_DECLARE_OBJECT_NAME): Remove.
(ASM_FINISH_DECLARE_OBJECT): Remove.
(ELF_ASCII_ESCAPES): Remove.
(ELF_STRING_LIMIT): Remove.
(STRING_ASM_OP): Remove.
(ASM_OUTPUT_EXTERNAL): Remove.
(TARGET_ASM_FILE_START_FILE_DIRECTIVE): Undefine.
* config/alpha/alpha.h (PCC_BITFIELD_TYPE_MATTERS): Undefine
before define.
(ASM_DECLARE_FUNCTION_NAME): Ditto.
(ASM_DECLARE_FUNCTION_SIZE): Ditto.
(USER_LABEL_PREFIX): Ditto.
(ASM_GENERATE_INTERNAL_LABEL): Ditto.
(ASM_OUTPUT_ASCII): Ditto.
(ASM_OUTPUT_SKIP): Ditto.
(READONLY_DATA_SECTION_ASM_OP): Remove.
(ASM_OUTPUT_CASE_LABEL): Remove.
(NO_DOLLAR_IN_LABEL): Undefine.

Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu.

OK for mainline?

Uros.
Index: config/alpha/alpha.h
===
--- config/alpha/alpha.h(revision 185578)
+++ config/alpha/alpha.h(working copy)
@@ -284,6 +284,7 @@
 #define STRUCTURE_SIZE_BOUNDARY 8
 
 /* A bit-field declared as `int' forces `int' alignment for the struct.  */
+#undef PCC_BITFILED_TYPE_MATTERS
 #define PCC_BITFIELD_TYPE_MATTERS 1
 
 /* No data type wants to be aligned rounder than this.  */
@@ -700,11 +701,13 @@
 
 /* This macro produces the initial definition of a function.  */
 
+#undef ASM_DECLARE_FUNCTION_NAME
 #define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \
   alpha_start_function(FILE,NAME,DECL);
 
 /* This macro closes up a function definition for the assembler.  */
 
+#undef ASM_DECLARE_FUNCTION_SIZE
 #define ASM_DECLARE_FUNCTION_SIZE(FILE,NAME,DECL) \
   alpha_end_function(FILE,NAME,DECL)
 
@@ -982,10 +985,6 @@
 
 #define TEXT_SECTION_ASM_OP "\t.text"
 
-/* Output before read-only data.  */
-
-#define READONLY_DATA_SECTION_ASM_OP "\t.rdata"
-
 /* Output before writable data.  */
 
 #define DATA_SECTION_ASM_OP "\t.data"
@@ -1022,12 +1021,14 @@
 
 /* The prefix to add to user-visible assembler symbols.  */
 
+#undef USER_LABEL_PREFIX
 #define USER_LABEL_PREFIX ""
 
 /* This is how to output a label for a jump table.  Arguments are the same as
for (*targetm.asm_out.internal_label), except the insn for the jump table is
passed.  */
 
+#undef ASM_OUTPUT_CASE_LABEL
 #define ASM_OUTPUT_CASE_LABEL(FILE,PREFIX,NUM,TABLEINSN)   \
 { ASM_OUTPUT_ALIGN (FILE, 2); (*targetm.asm_out.internal_label) (FILE, PREFIX, 
NUM); }
 
@@ -1036,12 +1037,14 @@
PREFIX is the class of label and NUM is the number within the class.
This is suitable for output with `assemble_name'.  */
 
+#undef ASM_GENERATE_INTERNAL_LABEL
 #define ASM_GENERATE_INTERNAL_LABEL(LABEL,PREFIX,NUM)  \
   sprintf ((LABEL), "*$%s%ld", (PREFIX), (long)(NUM))
 
 /* We use the default ASCII-output routine, except that we don't write more
than 50 characters since the assembler doesn't support very long lines.  */
 
+#undef ASM_OUTPUT_ASCII
 #define ASM_OUTPUT_ASCII(MYFILE, MYSTRING, MYLENGTH) \
   do {   \
 FILE *_hide_asm_

[patch][PR52640] Fix quadratic behavior with many referenced extern functions

2012-03-21 Thread Steven Bosscher

Hello,

The test case for this bug triggeres O(extern_delcs**2) behavior
because value_member traverses the pending_assemble_externals list
from start to end for every new extern decl.

The solution I've picked, is to add a pointer set, and while there I
made pending_assemble_externals a VEC instead of a TREE_LIST. I've
also added a FIXME to clarify that this whole situation of having
places calling assemble_external is not desirable.

On gcc110, the test case compiles in ~4s with the patch, and ~24s
without. If I add LIM5(Y) and LIM5(Z), the compile time is ~15s with
the patch, and ~372s without the patch. I am not sure what is
reasonable for this kind of test case, but on a smaller machine the
test case will probably blow up if I add those two extra LIM5s.

Anyway. Bootstrapped&tested on powerpc64-unknown-linux-gnu.
OK for trunk?
OK for all open release branches too?

Ciao!
Steven

gcc/
* varasm.c (pending_assemble_externals): Make a VEC.
(pending_assemble_externals_set): New pointer set.
(process_pending_assemble_externals): Traverse the VEC instead
of the TREE_LIST. Destroy the pointer set.
(assemble_external): See if decl is in pending_assemble_externals_set,
and add it to pending_assemble_externals if necessary.
(init_varasm_once): Allocate pending_assemble_externals and
pending_assemble_externals_set.

testsuite/
* gcc.c-torture/compile/limits-externdecl.c: New test for PR62640.
gcc/
* varasm.c (pending_assemble_externals): Make a VEC.
(pending_assemble_externals_set): New pointer set.
(process_pending_assemble_externals): Traverse the VEC instead
of the TREE_LIST. Destroy the pointer set.
(assemble_external): See if decl is in pending_assemble_externals_set,
and add it to pending_assemble_externals if necessary.
(init_varasm_once): Allocate pending_assemble_externals and
pending_assemble_externals_set.

testsuite/
* gcc.c-torture/compile/limits-externdecl.c: New test for PR62640.

Index: varasm.c
===
--- varasm.c(revision 185603)
+++ varasm.c(working copy)
@@ -2097,8 +2097,15 @@ contains_pointers_p (tree type)
the compilation unit is finalized.  This is the best we can do for
right now (i.e. stage 3 of GCC 4.0) - the right thing is to delay
it all the way to final.  See PR 17982 for further discussion.  */
-static GTY(()) tree pending_assemble_externals;
+static GTY(()) VEC(tree,gc) *pending_assemble_externals;
 
+/* FIXME: Trunk is at GCC 4.8 now and the above problem still hasn't been
+   addressed properly.  This caused PR 52640 due to O(external_decls**2)
+   lookups in the pending_assemble_externals queue in assemble_external.
+   Paper over with this pointer set.  (And pending_assemble_externals even
+   was a TREE_LIST before?!)  */
+static struct pointer_set_t *pending_assemble_externals_set;
+
 #ifdef ASM_OUTPUT_EXTERNAL
 /* True if DECL is a function decl for which no out-of-line copy exists.
It is assumed that DECL's assembler name has been set.  */
@@ -2146,11 +2153,14 @@ void
 process_pending_assemble_externals (void)
 {
 #ifdef ASM_OUTPUT_EXTERNAL
-  tree list;
-  for (list = pending_assemble_externals; list; list = TREE_CHAIN (list))
-assemble_external_real (TREE_VALUE (list));
+  size_t i;
+  tree decl;
 
-  pending_assemble_externals = 0;
+  FOR_EACH_VEC_ELT (tree, pending_assemble_externals, i, decl)
+assemble_external_real (decl);
+  VEC_free (tree, gc, pending_assemble_externals);
+
+  pointer_set_destroy (pending_assemble_externals_set);
 #endif
 }
 
@@ -2191,9 +2201,8 @@ assemble_external (tree decl ATTRIBUTE_UNUSED)
 weak_decls = tree_cons (NULL, decl, weak_decls);
 
 #ifdef ASM_OUTPUT_EXTERNAL
-  if (value_member (decl, pending_assemble_externals) == NULL_TREE)
-pending_assemble_externals = tree_cons (NULL, decl,
-   pending_assemble_externals);
+  if (! pointer_set_insert (pending_assemble_externals_set, decl))
+VEC_safe_push (tree, gc, pending_assemble_externals, decl);
 #endif
 }
 
@@ -6168,6 +6177,11 @@ init_varasm_once (void)
 
   if (readonly_data_section == NULL)
 readonly_data_section = text_section;
+
+#ifdef ASM_OUTPUT_EXTERNAL
+  pending_assemble_externals = VEC_alloc (tree, gc, 12);
+  pending_assemble_externals_set = pointer_set_create ();
+#endif
 }
 
 enum tls_model
Index: testsuite/gcc.c-torture/compile/limits-externdecl.c
===
--- testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 0)
+++ testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 0)
@@ -0,0 +1,56 @@
+/* Inspired by the test case for PR middle-end/52640.  */
+
+typedef struct
+{
+char *value;
+} REFERENCE;
+
+/* Add a few "extern int Xx ();" declarations.  */
+#undef DEF
+#undef LIM1
+#undef LIM2
+#undef LIM3
+#undef LI

Re: [RFC, patch] powerpc64 FreeBSD support

2012-03-21 Thread Andreas Tobler


On 02.03.12 17:28, David Edelsohn wrote:

On Fri, Mar 2, 2012 at 11:04 AM, Andreas Tobler  wrote:


the attached patch adds support for powerpc64-*-freebsd*.
Results are/were sent to the test results list.

A few words about the patch.
I have chosen the way to add separate freebsd* files because FreeBSD
supports not as much as linux does in regard of the PowerPC CPU's. For
example we only have FreeBSD running on 970 CPU's. POWER support on the
kernel side is wip, but it grows very slow.

Also, powerpc FreeBSD has 64-bit long doubles while linux-ppc has 128-bit.

The architecture itself is a 64-bit one which can execute 32-bit
binaries. But it its primary bit width is 64-bit. It is not wanted that
we have a 32-bit compiler which can build 64-bit objects.

To have a clearer picture and not to influence linux-ppc I thought
having its own set of files would be easier.

I'd appreciate feedback on the patch.

There are some issues with the tests, I'm cleaning and trying to make
them work. In certain areas I need to fix the kernel or the libc first
and this takes some time.

One thing is the libstdc++ test I had to skip. Here the test tries to
allocate int_max mem. And this blows my machine :)

Here the CL's.

Thank you!
Andreas

gcc:


2011-12-22  Andreas Tobler

* configure.ac (HAVE_LD_NO_DOT_SYMBOLS): Add powerpc64-*-freebsd*.
Introduce emul_name to select the right linker emulation for
powerpc64-*-freebsd*.
* configure: Regenerate.
* config.gcc: Add bits to support powerpc64-*-freebsd*.
* config/rs6000/freebsd.h (POWERPC_FREEBSD): Define.
* config/rs6000/freebsd64.h: New file.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Use
POWERPC_FREEBSD.
(rs6000_savres_strategy): Likewise.
(rs6000_savres_routine_name): Likewise.
(rs6000_elf_file_end): Likewise.
* config/rs6000/t-freebsd64: New file.
* config/rs6000/sysv4.h (SUBTARGET_OVERRIDE_OPTIONS): Set the
rs6000_current_abi for 64-bit FreeBSD to ABI_AIX.

libgcc:
---

2011-12-22  Andreas Tobler

* config.host: Add bits to support powerpc64-*-freebsd*.
* config/rs6000/freebsd-unwind.h: New file.
* config/rs6000/t-freebsd64: New file.

libstdc++:
--
2011-12-22  Andreas Tobler

* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc:
Skip
this test on powerpc64-*-freebsd*.


This patch is okay for trunk now that it has opened for 4.8 release.
I thought that Loren might comment.


Thank you David!

I commited this patch to trunk, r185613-r185615.

Results are here:

http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg02067.html

I'll continue to improve the port...

Again, thanks.
Andreas

Re: [PATCH] Preserve loops from tree to RTL loop optimizers

2012-03-21 Thread Richard Sandiford

Richard Guenther  writes:
> This patch makes us preserve loop structures from the start of tree
> loop optimizers to the end of RTL loop optimizers.  It uses a new
> property, PROP_loops to indicate we want to preserve them and
> massages loop_optimizer_init/finalize to honor that.
>
> On the RTL side the CFG manipulation was not prepared to handle
> updating loops, so this patch fills in enough to pass bootstrap
> and regtesting.  We still do too much loop fixing from cleanup_cfg
> basically because callers do not tell cleanup_cfg if they have
> modified the CFG themselves (CSE for example does in some cases).
> It was suggested to use a new flag to cleanup_cfg to do that,
> other suggestions welcome.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing shows some
> remaining libstdc++ errors, I am investigating them now but
> don't expect major issues.
>
> Comments?  The patch is ontop of the early RTL pass merge.

Thanks for doing this (and for keeping the ~PROP_loops case around for
passes after rtl_loop_done -- I have a patch that uses it for SMS).

Richard

Re: [patch][PR52640] Fix quadratic behavior with many referenced extern functions

2012-03-21 Thread Diego Novillo


On 3/21/12 3:30 PM, Steven Bosscher wrote:


+/* FIXME: Trunk is at GCC 4.8 now and the above problem still hasn't been
+   addressed properly.  This caused PR 52640 due to O(external_decls**2)
+   lookups in the pending_assemble_externals queue in assemble_external.
+   Paper over with this pointer set.  (And pending_assemble_externals even
+   was a TREE_LIST before?!)  */
+static struct pointer_set_t *pending_assemble_externals_set;


Can you add some description on what this pointer set does?



+
 #ifdef ASM_OUTPUT_EXTERNAL
 /* True if DECL is a function decl for which no out-of-line copy exists.
It is assumed that DECL's assembler name has been set.  */
@@ -2146,11 +2153,14 @@ void
 process_pending_assemble_externals (void)
 {
 #ifdef ASM_OUTPUT_EXTERNAL
-  tree list;
-  for (list = pending_assemble_externals; list; list = TREE_CHAIN (list))
-assemble_external_real (TREE_VALUE (list));
+  size_t i;
+  tree decl;

-  pending_assemble_externals = 0;
+  FOR_EACH_VEC_ELT (tree, pending_assemble_externals, i, decl)
+assemble_external_real (decl);
+  VEC_free (tree, gc, pending_assemble_externals);


Just pending_assemble_externals = NULL;

OK for all release branches (if they are open for fixes).


Diego.

[patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher

Hello,

There is no reason for the ObjC front end to call assemble_external on
these symbols, the middle-end handles this just fine via
add_builtin_function.

Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven

objc/
* objc-act (objc_build_ivar_assignment): Do not call assemble_external.
(objc_build_global_assignment): Likewise.
(objc_build_strong_cast_assignment): Likewise.
* objc-next-runtime-abi-01.c: Cleanup commented-out assemble_external.
* objc-next-runtime-abi-02.c: Likewise.
* objc-gnu-runtime-abi-01.c: Likewise.
objc/
	* objc-act (objc_build_ivar_assignment): Do not call assemble_external.
	(objc_build_global_assignment): Likewise.
	(objc_build_strong_cast_assignment): Likewise.
	* objc-next-runtime-abi-01.c: Cleanup commented-out assemble_external.
	* objc-next-runtime-abi-02.c: Likewise.
	* objc-gnu-runtime-abi-01.c: Likewise.

Index: objc-act.c
===
--- objc-act.c	(revision 185603)
+++ objc-act.c	(working copy)
@@ -3553,7 +3553,6 @@ objc_build_ivar_assignment (tree outervar, tree lh
 		tree_cons (NULL_TREE, offs,
 		NULL_TREE)));
 
-  assemble_external (func);
   return build_function_call (input_location, func, func_params);
 }
 
@@ -3566,7 +3565,6 @@ objc_build_global_assignment (tree lhs, tree rhs)
 		  build_unary_op (input_location, ADDR_EXPR, lhs, 0)),
 		NULL_TREE));
 
-  assemble_external (objc_assign_global_decl);
   return build_function_call (input_location,
 			  objc_assign_global_decl, func_params);
 }
@@ -3580,7 +3578,6 @@ objc_build_strong_cast_assignment (tree lhs, tree
 		  build_unary_op (input_location, ADDR_EXPR, lhs, 0)),
 		NULL_TREE));
 
-  assemble_external (objc_assign_strong_cast_decl);
   return build_function_call (input_location,
 			  objc_assign_strong_cast_decl, func_params);
 }
Index: objc-next-runtime-abi-01.c
===
--- objc-next-runtime-abi-01.c	(revision 185603)
+++ objc-next-runtime-abi-01.c	(working copy)
@@ -977,7 +977,6 @@ next_runtime_abi_01_get_category_super_ref (locati
   /* else do it the slow way.  */
   add_class_reference (super_name);
   super_class = (inst_meth ? objc_get_class_decl : objc_get_meta_class_decl);
-/* assemble_external (super_class);*/
   super_name = my_build_string_pointer (IDENTIFIER_LENGTH (super_name) + 1,
 	IDENTIFIER_POINTER (super_name));
   /* super_class = objc_get{Meta}Class("CLASS_SUPER_NAME"); */
Index: objc-next-runtime-abi-02.c
===
--- objc-next-runtime-abi-02.c	(revision 185603)
+++ objc-next-runtime-abi-02.c	(working copy)
@@ -1509,7 +1509,6 @@ next_runtime_abi_02_get_category_super_ref (locati
   /* ??? Do we need to add the class ref anway for zero-link?  */
   /* else do it the slow way.  */
   super_class = (inst_meth ? objc_get_class_decl : objc_get_meta_class_decl);
-  /* assemble_external (super_class); */
   super_name = my_build_string_pointer (IDENTIFIER_LENGTH (super_name) + 1,
 	IDENTIFIER_POINTER (super_name));
   /* super_class = objc_get{Meta}Class("CLASS_SUPER_NAME"); */
Index: objc-gnu-runtime-abi-01.c
===
--- objc-gnu-runtime-abi-01.c	(revision 185603)
+++ objc-gnu-runtime-abi-01.c	(working copy)
@@ -574,8 +574,6 @@ gnu_runtime_abi_01_get_class_reference (tree ident
 		(IDENTIFIER_LENGTH (ident) + 1,
 		 IDENTIFIER_POINTER (ident)));
 
-  /* FIXME: Do we need this assemble_external() ? */
-  /* assemble_external (objc_get_class_decl);*/
   return build_function_call (input_location, objc_get_class_decl, params);
 }
 
@@ -839,8 +837,6 @@ gnu_runtime_abi_01_get_category_super_ref (locatio
 
   add_class_reference (super_name);
   super_class = (inst_meth ? objc_get_class_decl : objc_get_meta_class_decl);
-  /* FIXME: Do we need this assemble_external() ? */
-  /* assemble_external (super_class);*/
   super_name = my_build_string_pointer (IDENTIFIER_LENGTH (super_name) + 1,
 	IDENTIFIER_POINTER (super_name));
   /* super_class = get_{meta_}class("CLASS_SUPER_NAME");  */

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Iain Sandoe


Hi Steven,

On 21 Mar 2012, at 21:09, Steven Bosscher wrote:

There is no reason for the ObjC front end to call assemble_external on
these symbols, the middle-end handles this just fine via
add_builtin_function.


Ah, that's the bit I'd yet to figure out ...


Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven

objc/
	* objc-act (objc_build_ivar_assignment): Do not call  
assemble_external.

(objc_build_global_assignment): Likewise.
(objc_build_strong_cast_assignment): Likewise.
	* objc-next-runtime-abi-01.c: Cleanup commented-out  
assemble_external.

* objc-next-runtime-abi-02.c: Likewise.
* objc-gnu-runtime-abi-01.c: Likewise.



... this would allow us to close PR17982?

... and make progress on PR24777? (I'm not sure where exactly we need  
to go with this one - we have different sets of calls depending on the  
runtime)


Iain

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher

On Wed, Mar 21, 2012 at 10:23 PM, Iain Sandoe
 wrote:
>> objc/
>>        * objc-act (objc_build_ivar_assignment): Do not call
>> assemble_external.
>>        (objc_build_global_assignment): Likewise.
>>        (objc_build_strong_cast_assignment): Likewise.
>>        * objc-next-runtime-abi-01.c: Cleanup commented-out
>> assemble_external.
>>        * objc-next-runtime-abi-02.c: Likewise.
>>        * objc-gnu-runtime-abi-01.c: Likewise.
>> 
>
>
> ... this would allow us to close PR17982?

I believe so, but I have to look into it a but deeper to understand
what the remaining assemble_external calls are for. The ones in
config/* are OK, they are all for writing out multiple-inheritance
thunks. The ones that need checking are:

calls.c:1649: assemble_external (fndecl);
expr.c:1422:  assemble_external (block_move_fn);
expr.c:2794:  assemble_external (block_clear_fn);
expr.c:7423:  assemble_external (exp);
expr.c:9022:  assemble_external (exp);
final.c:2745: assemble_external (t);
final.c:3497:   assemble_external (t);
final.c:3568:   assemble_external (SYMBOL_REF_DECL (x));
toplev.c:489:  assemble_external (decl);

> ... and make progress on PR24777? (I'm not sure where exactly we need to go
> with this one - we have different sets of calls depending on the runtime)

The FIXMEs in that PR do not exist anymore. Perhaps this removed them:

2011-10-11  Michael Meissner  

* objc-next-runtime-abi-01.c (objc_build_exc_ptr): Delete old
interface with two parallel arrays to hold standard builtin
declarations, and replace it with a function based interface that
can support creating builtins on the fly in the future.  Change
all uses, and poison the old names.  Make sure 0 is not a
legitimate builtin index.
* objc-next-runtime-abi-02.c (objc_build_exc_ptr): Ditto.
* objc-gnu-runtime-abi-01.c (objc_build_exc_ptr): Ditto.

In any case, if there's nothing left to fix for PR24777, I suppose it
can be closed as FIXED.

Ciao!
Steven

Re: remove wrong code in immed_double_const

2012-03-21 Thread Mike Stump

On Mar 21, 2012, at 6:17 AM, Richard Sandiford wrote:
> Sounds good.

Cool, a path forward.

> For this I think we should make plus_constant a wrapper:

Ah, yes, much better, thanks.  I'd expanded the comments on plus_constant_mode 
so that one might have a better idea why the code is that way and put in a 
TODO, so that people have an idea of what direction the code wants to move.

> I don't think it's a good idea to punt to a PLUS though.

Fixed, thanks for the code.

> Nicely, add_double returns true on overflow, so I think
> we should replace the punt with:

Ah, yes, nice, fixed.

> For this I think we should change the recursive CONSTANT_P call
> to use plus_constant_mode

Done.

> and we can remove the special CONST_INT case.

Ok, ah, yes, because the recursive call will already combine them, done.

>>   if (width < HOST_BITS_PER_WIDE_INT)
>> val &= ((unsigned HOST_WIDE_INT) 1 << width) - 1;
>> +  /* FIXME: audit for TImode and wider correctness.  */
>>   return val == ((unsigned HOST_WIDE_INT) 1 << (width - 1));
> 
> We've already returned false in that case though.

Ah, I saw it, but missed it anyway, thanks for the pointer, fixed.

> I'm happy for this function to be left as-is, but we could instead add a 
> comment like:
> 
>/* FIXME: We don't yet have a representation for wider modes.  */

Done.

>> @@ -1227,7 +1228,7 @@ simplify_const_unary_operation (enum rtx_code code, 
>> enum machine_mode mode,
>>  return 0;
>>  }
>>   else if (GET_MODE_BITSIZE (op_mode) >= HOST_BITS_PER_WIDE_INT * 2)
>> -;
>> +return 0;
>>   else
>>  hv = 0, lv &= GET_MODE_MASK (op_mode);
> 
> Should keep the current behaviour for == HOST_BITS_PER_WIDE_INT * 2.


> I'd slightly prefer an assert rather than a "return false", but I won't
> argue if another maintainer agrees with the return.

Ah, yes, I agree an assert would be better, as really, no one should ask for an 
unsigned conversion to floating point on a value that is negative, more likely 
it is just a spot we missed adding an assert to earlier and probably wants a 
larger constant that can represent a large unsigned number.  Fixed.

> This is changing the real case, where sign extension doesn't make
> much sense.

I'm not sure I followed.  Do you want me to remove the change for the second 
case, leave it as it, or something else?  I've left it as I had modified it.

> I think the expand_mult CONST_DOUBLE code needs changing

Agreed.  Fixed.  I preserve the code for smaller modes, and for small enough 
shift amounts. 1<<67 for example, or any mode, is still handled.  For large 
enough things, we just don't return the shift.

> Same for:

Fixed, in same way as previous case.

> simplify_const_unary_operation needs to check for overflow
> when handling 2-HWI arithmetic, since we can no longer assume
> that the result is <= 2 HWIs in size.  E.g.:
> 
>   case NEG:
> neg_double (l1, h1, &lv, &hv);
> break;
> 
> should probably be:
> 
>   case NEG:
> if (neg_double (l1, h1, &lv, &hv))
>gcc_assert (GET_MODE_BITSIZE (mode) <= 2 * HOST_WIDE_INT);
> break;

Are you talking about the block of code inside:

  /* We can do some operations on integer CONST_DOUBLEs.  Also allow
  
 for a DImode operation on a CONST_INT.  */
  else if (GET_MODE (op) == VOIDmode
   && width <= HOST_BITS_PER_WIDE_INT * 2

?  If so, we already know that the width is <= HOST_BITS_PER_WIDE_INT * 2.  :-)

Thanks for spotting the bits I missed.


Current patch:

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texiindex de45a22..0c6dc45 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -1530,7 +1530,9 @@ Represents either a floating-point constant of mode 
@var{m} or an
 integer constant too large to fit into @code{HOST_BITS_PER_WIDE_INT}
 bits but small enough to fit within twice that number of bits (GCC
 does not provide a mechanism to represent even larger constants).  In
-the latter case, @var{m} will be @code{VOIDmode}.
+the latter case, @var{m} will be @code{VOIDmode}.  For integral values
+the value is a signed value, meaning the top bit of
+@code{CONST_DOUBLE_HIGH} is a sign bit.
 
 @findex CONST_DOUBLE_LOW
 If @var{m} is @code{VOIDmode}, the bits of the value are stored in
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 78ddfc3..c0b24e4 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -531,10 +531,9 @@ immed_double_const (HOST_WIDE_INT i0, HOST_WIDE_INT i1, 
enum machine_mode mode)
 
  1) If GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT, then we use
gen_int_mode.
- 2) GET_MODE_BITSIZE (mode) == 2 * HOST_BITS_PER_WIDE_INT, but the value of
-   the integer fits into HOST_WIDE_INT anyway (i.e., i1 consists only
-   from copies of the sign bit, and sign of i0 and i1 are the same),  then
-   we return a CONST_INT for i0.
+ 2) If the value of the integer fits into HOST_WIDE_INT any

Re: [Patch, fortran] PR 49010/24518 MOD/MODULO fixes

2012-03-21 Thread Janne Blomqvist

PING

On Wed, Mar 14, 2012 at 01:03, Janne Blomqvist
 wrote:
> Hi,
>
> the attached patch implements a few fixes and cleanups for the MOD and
> MODULO intrinsics.
>
> - When the arguments are constant, use mpfr_fmod instead of the naive
> algorithms which are numerically unstable for large arguments. This
> extends the PR 24518 fix to constant arguments as well, and makes the
> compile-time evaluation match the runtime implementation which also
> uses fmod in the same manner.
>
> - Remove the old fallback path for the case builtin_fmod is not
> available, as the builtin is AFAICS always available.
>
> The patch does not per se fix the corner-case bug as reported in PR
> 49010, in fact it makes it worse in a way as with the patch the result
> if the arguments are parameters is the same as the runtime result
> (previously, the compile-time result was correct). But, I think we
> should leave it as it is. Due to the reasons above, we're not using
> the naive algorithms anyway, and IMHO -0.0 is quite a good
> approximation for +0.0 anyway. One might even argue that due to the
> numerical instability, specifying the naive algorithms is a bug in the
> standard.
>
> The patch adds notes to the documentation about the usage of fmod, so
> users interested in corner-case behavior can look up how that function
> is supposed to behave on their target. FWIW, AFAICS MPFR and glibc
> fmod conform to the behavior specified in C99 Annex F.
>
> Regtested on x86_64-unknown-linux-gnu, Ok for trunk?
>
> 2012-03-14  Janne Blomqvist  
>
>        PR fortran/49010
>        PR fortran/24518
>        * intrinsic.texi (MOD,MODULO): Mention usage of fmod instead of
>        naive algorithm.
>        * simplify.c (gfc_simplify_mod): Use mpfr_fmod.
>        (gfc_simplify_modulo): Likewise.
>        * trans-intrinsic.c (gfc_conv_intrinsic_mod): Remove fallback as
>        builtin_fmod is always available.
>
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist

[google] Minor cleanup and test fixes for -mpatch-functions-for-instrumentation. (issue5877043)

2012-03-21 Thread Harshit Chopra

2012-03-21   Harshit Chopra  

  Minor changes:
i386.c: made check_should_patch_current_function C90 compatible.
i386.md: Added '\t' to bytes generated by
 ix86_output_function_nops_prologue_epilogue for proper formatting
 of assembly.
patch-functions-*.c: Fixed verification in tests. Added a test to verify
 nop-bytes generated for sibling calls and another test
 to verify a binary with nop-bytes runs properly.

* gcc/config/i386/i386.c (check_should_patch_current_function):
* gcc/config/i386/i386.md:
* gcc/testsuite/gcc.target/i386/patch-functions-1.c (void foo):
(int main):
* gcc/testsuite/gcc.target/i386/patch-functions-2.c:
* gcc/testsuite/gcc.target/i386/patch-functions-3.c:
* gcc/testsuite/gcc.target/i386/patch-functions-4.c:
* gcc/testsuite/gcc.target/i386/patch-functions-5.c:
* gcc/testsuite/gcc.target/i386/patch-functions-6.c:
* gcc/testsuite/gcc.target/i386/patch-functions-7.c:
* gcc/testsuite/gcc.target/i386/patch-functions-8.c (int foo):
(int bar):
* gcc/testsuite/gcc.target/i386/patch-functions-sibling-call.c:

Testing method:
  make check-gcc RUNTESTFLAGS="i386.exp=patch-functions* 
--target_board=\"unix{-m32,}\""

Patch to be applied to google/main.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 08bd5f0..be1f7a4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10981,6 +10981,7 @@ check_should_patch_current_function (void)
   const char* func_name = NULL;
   struct loops loops;
   int num_loops = 0;
+  int min_functions_instructions;
 
   /* Patch the function if it has at least a loop.  */
   if (!patch_functions_ignore_loops)
@@ -11007,7 +11008,7 @@ check_should_patch_current_function (void)
   strcmp("main", func_name) == 0)
 return true;
 
-  int min_functions_instructions =
+  min_functions_instructions =
   PARAM_VALUE (PARAM_FUNCTION_PATCH_MIN_INSTRUCTIONS);
   if (min_functions_instructions > 0)
 {
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 08353ff..38a04ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11688,7 +11688,7 @@
   /* Emit 10 nop bytes after ret.  */
   if (ix86_output_function_nops_prologue_epilogue (asm_out_file,
   
FUNCTION_PATCH_EPILOGUE_SECTION,
-  "ret",
+  "\tret",
   10))
return "";
 }
@@ -11712,7 +11712,7 @@
   /* Emit 9 nop bytes after rep;ret.  */
   if (ix86_output_function_nops_prologue_epilogue (asm_out_file,
   
FUNCTION_PATCH_EPILOGUE_SECTION,
-  "rep\;ret",
+  "\trep\;ret",
   9))
return "";
 }
diff --git a/gcc/testsuite/gcc.target/i386/patch-functions-1.c 
b/gcc/testsuite/gcc.target/i386/patch-functions-1.c
index 308e8c3..aa1f424 100644
--- a/gcc/testsuite/gcc.target/i386/patch-functions-1.c
+++ b/gcc/testsuite/gcc.target/i386/patch-functions-1.c
@@ -1,5 +1,5 @@
 /* Verify -mpatch-functions-for-instrumentation works.  */
-/* { dg-do run } */
+/* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-mpatch-functions-for-instrumentation" } */
 
@@ -8,13 +8,16 @@
 /* Check nop-bytes at end.  */
 /* { dg-final { scan-assembler "ret(.*).byte\t0x90(.*).byte\t0x90" } } */
 
-void foo() {
+__attribute__ ((noinline))
+void foo()
+{
   /* Dummy loop.  */
   int x = 0;
   while (++x);
 }
 
-int main() {
+int main()
+{
   foo();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/patch-functions-2.c 
b/gcc/testsuite/gcc.target/i386/patch-functions-2.c
index 6baad32..78de867 100644
--- a/gcc/testsuite/gcc.target/i386/patch-functions-2.c
+++ b/gcc/testsuite/gcc.target/i386/patch-functions-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-mpatch-functions-for-instrumentation 
-mno-patch-functions-main-always" } */
 
@@ -8,11 +8,14 @@
 /* { dg-final { scan-assembler-not ".byte\t0xeb,0x09(.*).byte\t0x90" } } */
 /* { dg-final { scan-assembler-not "ret(.*).byte\t0x90(.*).byte\t0x90" } } */
 
-void foo() {
+__attribute__ ((noinline))
+void foo()
+{
   int x = 0;
 }
 
-int main() {
+int main()
+{
   foo();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/patch-functions-3.c 
b/gcc/testsuite/gcc.target/i386/patch-functions-3.c
index 49b57a8..9e8eb52 100644
--- a/gcc/testsuite/gcc.target/i386/patch-functions-3.c
+++ b/gcc/testsuite/gcc.target/i386/patch-functions-3.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do c

Re: [fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Mike Stump

On Mar 21, 2012, at 10:16 AM, Bruce Korb wrote:
> Patch welcome!  I, myself, don't know what "emit DejaGnu-style
> fixincludes.{sum, log} files" would mean.

Rather simple...  In a file called fixinclude.sum, put

  PASS: unique string

or

  FAIL: unique string

one per line, as many times as you want.  The unique strings should be 
meaningful to you in some way, and be stable over long periods of time (no 
`pwd` or `date` in them for example).  You can write a python script, a awk 
script, a bash script or a c program to generate this.  You can synthesize this 
from any source you can pull from, for example, the existing testing code or 
test report you might have.

Bonus points if you can total passes and failures:

exec >>$file
echo
echo "# of expected passes$(cat $file | grep 'PASS:' | wc -l)"
echo "# of expected failures$(cat $file | grep 'FAIL:' | wc -l)"

at the end of the file.  The above by the way, will add it (/bin/sh style), if 
you just create the $file.  That's it, done.  For example, if you just had a 
single, it all worked flawlessly thing, you could do:

if check-cmd; then
 echo "PASS: fixinclude"
else
 echo "FAIL: fixinclude"
fi

and then the code above, and viola, you're done.  To create the .log file, cp 
fixincludes.sum fixincludes.log, if you have nothing better to do.

Re: [fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Mike Stump

On Mar 21, 2012, at 2:46 PM, Mike Stump wrote:
> echo "# of expected failures$(cat $file | grep 'FAIL:' | wc -l)"

Oh, and if you expect perfection, you should use:

echo "# of unexpected failures$(cat $file | grep 'FAIL:' | wc -l)"

instead.

Re: [PATCH] Bug fix in store_bit_field_1 for big endian targets (issue 51893)

2012-03-21 Thread Eric Botcazou

> This patch (for 4.6) fixes a wrong subword index computation in
> store_bit_field_1 for big endian targets when value is at least 4 times
> bigger than a word (DI REG value with HI words).
>
> It fixes a regression on gcc.c-torture/execute/bitfld-3.c for my current
> backend port.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51893
>
> OK to commit?

It is OK for mainline on principle but there is no ChangeLog entry and the 
patch doesn't follow the GNU Coding Style: TABs instead of spaces, spaces 
before parentheses, etc.  See the equivalent code in extract_bit_field_1.
Moreover you need to properly test it on a mainstream big-endian platform.

See http://gcc.gnu.org/contribute.html for a more complete reference.

-- 
Eric Botcazou

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Mike Stump

On Mar 21, 2012, at 2:09 PM, Steven Bosscher wrote:
> There is no reason for the ObjC front end to call assemble_external on
> these symbols,

> OK for trunk?

Ok.  Watch for hate mail from Jack, if you guess wrong.  :-)

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Mike Stump

On Mar 21, 2012, at 2:32 PM, Steven Bosscher wrote:
> In any case, if there's nothing left to fix for PR24777, I suppose it
> can be closed as FIXED.

I see all sorts of FIXME: in c-decl.c still...  Anyway, someone needs to sort 
out what is done and remains undone and update the FIXMEs...  I don't know 
which ones are dead.

Re: [PATCH, alpha]: Use generic config/elfos.h headers

2012-03-21 Thread Richard Henderson

On 03/21/12 12:28, Uros Bizjak wrote:
> Hello!
> 
> Attached patch enables alpha to use generic config/elfos.h headers on
> linux and *bsd targets. The most important difference to generic
> elfos.h is in
> 
>   * config/alpha/elf.h (TARGET_ASM_FILE_START_FILE_DIRECTIVE): Undefine.

This one can be deleted, since

2012-03-14  Rainer Orth  

...
* config/alpha/alpha.c (alpha_file_start): Always assume
OBJECT_FORMAT_ELF.
Don't set targetm.asm_file_start_file_directive.
[!OBJECT_FORMAT_ELF]: Remove.
(TARGET_ASM_FILE_START_FILE_DIRECTIVE): Remove.

The patch looks ok, with,

> +#undef USER_LABEL_PREFIX
>  #define USER_LABEL_PREFIX ""

This ought to be deletable.

>  /* We use the default ASCII-output routine, except that we don't write more
> than 50 characters since the assembler doesn't support very long lines.  
> */
>  
> +#undef ASM_OUTPUT_ASCII
>  #define ASM_OUTPUT_ASCII(MYFILE, MYSTRING, MYLENGTH) \

Likewise.  Since RO deleted all non-gas support.

r~

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher

On Wed, Mar 21, 2012 at 11:11 PM, Mike Stump  wrote:
> On Mar 21, 2012, at 2:32 PM, Steven Bosscher wrote:
>> In any case, if there's nothing left to fix for PR24777, I suppose it
>> can be closed as FIXED.
>
> I see all sorts of FIXME: in c-decl.c still...  Anyway, someone needs to sort 
> out what is done and remains undone and update the FIXMEs...  I don't know 
> which ones are dead.


Ehm, yes. I was looking in the wrong place (objc/*). The "weird
not-really-builtin functions" are the ones added via
add_builtin_function but with builtin type NOT_BUILT_IN. Those
problems still appear to be there:

objc/objc-act.c:= add_builtin_function (TAG_EXCEPTIONTHROW,
temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-act.c:= add_builtin_function (TAG_SYNCENTER, temp_type,
0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_SYNCEXIT, temp_type,
0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_ENUMERATION_MUTATION,
type, 0, NOT_BUILT_IN,
objc/objc-gnu-runtime-abi-01.c:= add_builtin_function
(TAG_GETCLASS, type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_GETCLASS, type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_GETMETACLASS, type, 0, NOT_BUILT_IN, NULL, NULL_TREE);
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_SETJMP, temp_type, 0, NOT_BUILT_IN, NULL, NULL_TREE);
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_EXCEPTIONEXTRACT, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_EXCEPTIONTRYENTER, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_EXCEPTIONTRYEXIT, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_EXCEPTIONMATCH, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_ASSIGNIVAR, temp_type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_ASSIGNGLOBAL, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_ASSIGNSTRONGCAST, temp_type, 0, NOT_BUILT_IN, NULL,
(+ some not found by grep because the add_builtin_function call spans
multiple lines)

FWIW, Java does this too:
java/decl.c:= add_builtin_function ("_Jv_MonitorEnter", t, 0, NOT_BUILT_IN,
java/decl.c:= add_builtin_function ("_Jv_MonitorExit", t, 0, NOT_BUILT_IN,

Ciao!
Steven

Re: [patch] Split parts of cse_insn out to a few new functions

2012-03-21 Thread Steven Bosscher

On Wed, Mar 21, 2012 at 1:13 AM, Ian Lance Taylor wrote:
> On Tue, Mar 20, 2012 at 2:06 PM, Steven Bosscher wrote:
>>
>> This patch splits a couple of pieces of cse_insn out to new functions.
>> There are no functional changes, and no code generation differences as
>> far as I could tell on x86_64 (-m64 and -m32).

Likewise for the attached patch.

>> The purpose of the patch is and, loto hopefully make cse_insn easier
>> to understand. In a follow-up patch, I will make canonicalize_insn run
>> only once per insn (it currently, i.e. before and after this patch,
>> runs multiple times for CSE on extended basic blocks if a block is in
>> multiple extended basic blocks).

That is what the attached patch does.

Bootstrapped&tested on x86_64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven

* cse.c (cse_canonicalized_basic_blocks): New simple bitmap to
tag basic blocks that have already been traversed at least once,
so that all insns have been canonicalized.
(cse_insn): Call canonicalize_insn only if the basic block that
contains insn is visited for the first time.
(cse_extended_basic_block): After visiting all insns in a basic
block, mark the block in cse_canonicalized_basic_blocks.
(cse_main): Setup and destroy cse_canonicalized_basic_blocks.

(cse_find_path): Micro-optimization, reorder one condition to
avoid a reference to cfun.
	* cse.c (cse_canonicalized_basic_blocks): New simple bitmap to
	tag basic blocks that have already been traversed at least once,
	so that all insns have been canonicalized.
	(cse_insn): Call canonicalize_insn only if the basic block that
	contains insn is visited for the first time.
	(cse_extended_basic_block): After visiting all insns in a basic
	block, mark the block in cse_canonicalized_basic_blocks.
	(cse_main): Setup and destroy cse_canonicalized_basic_blocks.

	(cse_find_path): Micro-optimization, reorder one condition to
	avoid a reference to cfun.

Index: cse.c
===
--- cse.c	(revision 185622)
+++ cse.c	(working copy)
@@ -551,6 +551,10 @@ static bitmap cse_ebb_live_in, cse_ebb_l
already as part of an already processed extended basic block.  */
 static sbitmap cse_visited_basic_blocks;
 
+/* A simple bitmap to track for which basic blocks all insns have been
+   canonicalized already.  */
+static sbitmap cse_canonicalized_basic_blocks;
+
 static bool fixed_base_plus_p (rtx x);
 static int notreg_cost (rtx, enum rtx_code, int);
 static int approx_reg_cost_1 (rtx *, void *);
@@ -4492,8 +4496,10 @@ cse_insn (rtx insn)
   /* Record all the SETs in this instruction.  */
   n_sets = find_sets_in_insn (insn, &sets);
 
-  /* Substitute the canonical register where possible.  */
-  canonicalize_insn (insn, &sets, n_sets);
+  /* If we have not visited this block before (as part of another extended
+ basic block, substitute the canonical register where possible.  */
+  if (!TEST_BIT (cse_canonicalized_basic_blocks, BLOCK_FOR_INSN (insn)->index))
+canonicalize_insn (insn, &sets, n_sets);
 
   /* If this insn has a REG_EQUAL note, store the equivalent value in SRC_EQV,
  if different, or if the DEST is a STRICT_LOW_PART.  The latter condition
@@ -6254,10 +6260,9 @@ cse_find_path (basic_block first_bb, str
 	  else
 	e = NULL;
 
-	  if (e
-	  && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
-	  && e->dest != EXIT_BLOCK_PTR
+	  if (e && e->dest != EXIT_BLOCK_PTR
 	  && single_pred_p (e->dest)
+	  && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
 	  /* Avoid visiting basic blocks twice.  The large comment
 		 above explains why this can happen.  */
 	  && !TEST_BIT (cse_visited_basic_blocks, e->dest->index))
@@ -6452,6 +6457,9 @@ cse_extended_basic_block (struct cse_bas
 	}
 	}
 
+  /* We have now canonicalized all insns in this basic block.  */
+  SET_BIT (cse_canonicalized_basic_blocks, bb->index);
+
   /* With non-call exceptions, we are not always able to update
 	 the CFG properly inside cse_insn.  So clean up possibly
 	 redundant EH edges here.  */
@@ -6555,6 +6563,10 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nr
   cse_visited_basic_blocks = sbitmap_alloc (last_basic_block);
   sbitmap_zero (cse_visited_basic_blocks);
 
+  /* Set up the table of already canonicalized basic blocks.  */
+  cse_canonicalized_basic_blocks = sbitmap_alloc (last_basic_block);
+  sbitmap_zero (cse_canonicalized_basic_blocks);
+
   /* Loop over basic blocks in reverse completion order (RPO),
  excluding the ENTRY and EXIT blocks.  */
   n_blocks = pre_and_rev_post_order_compute (NULL, rc_order, false);
@@ -6598,6 +6610,7 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nr
   free (reg_eqv_table);
   free (ebb_data.path);
   sbitmap_free (cse_visited_basic_blocks);
+  sbitmap_free (cse_canonicalized_basic_blocks);
   free (rc_order);
   rtl_hooks = general_rtl_hooks;

Re: [patch] Split parts of cse_insn out to a few new functions

2012-03-21 Thread Steven Bosscher

On Thu, Mar 22, 2012 at 12:09 AM, Steven Bosscher  wrote:

>        (cse_find_path): Micro-optimization, reorder one condition to
>        avoid a reference to cfun.

Ah, and please ignore this bit. I don't know what I was thinking...

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Iain Sandoe



On 21 Mar 2012, at 22:45, Steven Bosscher wrote:

On Wed, Mar 21, 2012 at 11:11 PM, Mike Stump   
wrote:

On Mar 21, 2012, at 2:32 PM, Steven Bosscher wrote:
In any case, if there's nothing left to fix for PR24777, I suppose  
it

can be closed as FIXED.


I see all sorts of FIXME: in c-decl.c still...  Anyway, someone  
needs to sort out what is done and remains undone and update the  
FIXMEs...  I don't know which ones are dead.



Ehm, yes. I was looking in the wrong place (objc/*). The "weird
not-really-builtin functions" are the ones added via
add_builtin_function but with builtin type NOT_BUILT_IN. Those
problems still appear to be there:

objc/objc-act.c:= add_builtin_function (TAG_EXCEPTIONTHROW,
temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-act.c:= add_builtin_function (TAG_SYNCENTER, temp_type,
0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_SYNCEXIT, temp_type,
0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_ENUMERATION_MUTATION,
type, 0, NOT_BUILT_IN,
objc/objc-gnu-runtime-abi-01.c:= add_builtin_function
(TAG_GETCLASS, type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_GETCLASS, type, 0, NOT_BUILT_IN,





(TAG_ASSIGNGLOBAL, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_ASSIGNSTRONGCAST, temp_type, 0, NOT_BUILT_IN, NULL,
(+ some not found by grep because the add_builtin_function call spans
multiple lines)


conceptually, the issue is that there are multiple sets of built-ins  
(potentially, one set for each runtime, and the sets are of different  
sizes).  Thus, it's not just a case of turning these into regular  
built-ins, without some mechanism to cater for overloading or presence/ 
absence of particular ones.

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher

On Thu, Mar 22, 2012 at 12:53 AM, Iain Sandoe
 wrote:

> conceptually, the issue is that there are multiple sets of built-ins
> (potentially, one set for each runtime, and the sets are of different
> sizes).  Thus, it's not just a case of turning these into regular built-ins,
> without some mechanism to cater for overloading or presence/absence of
> particular ones.

I don't understand this. We're committed to one runtime per
compilation, right? If so, then we should create only one of the sets
at any time.

Ciao!
Steven

Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Iain Sandoe



On 22 Mar 2012, at 00:00, Steven Bosscher wrote:


On Thu, Mar 22, 2012 at 12:53 AM, Iain Sandoe
 wrote:


conceptually, the issue is that there are multiple sets of built-ins
(potentially, one set for each runtime, and the sets are of different
sizes).  Thus, it's not just a case of turning these into regular  
built-ins,
without some mechanism to cater for overloading or presence/absence  
of

particular ones.


I don't understand this. We're committed to one runtime per
compilation, right? If so, then we should create only one of the sets
at any time.


Yes, that's true [notwithstanding an erroneous invocation of LTO with  
mixed inputs, which we should detect by alternate means]


but ...
.. don't the indices for built-ins need to be constant?
  (maybe that means that we'd just allocate as many as needed to  
cover all runtimes?)

.. or, otherwise, how does LTO know which set to invoke?

... sorry, I think my observation about pr24777 has pushed this thread  
off course - perhaps we'd be better putting this in the PR thread?


cheers
Iain

[C++ Patch] PR 49152

2012-03-21 Thread Paolo Carlini


Hi,

this diagnostic issue is about not even trying to print expressions in 
error messages involving operators, and print operand types instead. 
Just as an example, for:


struct X { int x; };
void trigger (X x []) { x [01] = 0; }

we currently print:

error: no match for ‘operator=’ in ‘*(x + 4u) = 0’

which the patch changes to:

error: no match for ‘operator=’ (operand types are ‘X’ and ‘int’)

Or, for the existing other/error10.C, from:

error: no match for ‘operator-’ in ‘-(* & a)’

to

error: no match for ‘operator-’ (operand type is ‘A<0>’)

Jon and Manuel checked clang and I checked what icc does: without the 
caret, I don't think we can do *much* better here, but, wrt the audit 
trail discussion, I'm proposing printing the actual operand types 
between parentheses - I got the general idea from icc - because we don't 
want to confuse parameters and arguments.


Tested x86_64-linux.

Thanks,
Paolo.

/




/cp
2012-03-21  Paolo Carlini  

PR c++/49152
* call.c (op_error): Don't try to print expressions, print types.

/testsuite
2012-03-21  Paolo Carlini  

PR c++/49152
* g++.dg/diagnostic/operator1.C: New.
* g++.dg/ext/label5.C: Adjust.
* g++.dg/ext/va-arg1.C: Likewise.
* g++.dg/other/error20.C: Likewise.
* g++.dg/other/error20.C: Likewise.
* g++.dg/other/error16.C: Likewise.
* g++.dg/other/error10.C: Likewise.
* g++.dg/parse/error30.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-err1.C: Likewise.
Index: testsuite/g++.dg/ext/label5.C
===
--- testsuite/g++.dg/ext/label5.C   (revision 185603)
+++ testsuite/g++.dg/ext/label5.C   (working copy)
@@ -2,5 +2,5 @@
 // PR c++/24052
 
 struct A { };
-int main() { b: A() && && b; } // { dg-error "A\\(\\) && && *b" }
+int main() { b: A() && && b; } // { dg-error "operand types are 'A' and 
'void\\*'" }
 // { dg-message "candidate|operator&&|no known conversion" "additional" { 
target *-*-* } 5 }
Index: testsuite/g++.dg/ext/va-arg1.C
===
--- testsuite/g++.dg/ext/va-arg1.C  (revision 185603)
+++ testsuite/g++.dg/ext/va-arg1.C  (working copy)
@@ -4,5 +4,5 @@ struct A {};
 
 void foo()
 {
-  ++__builtin_va_arg(0, A); // { dg-error "'\\+\\+va_arg\\(0, A\\)'" }
+  ++__builtin_va_arg(0, A); // { dg-error "operand type is 'A'" }
 }
Index: testsuite/g++.dg/other/error20.C
===
--- testsuite/g++.dg/other/error20.C(revision 185603)
+++ testsuite/g++.dg/other/error20.C(working copy)
@@ -8,6 +8,6 @@ struct A// { dg-message "operator=|no 
known con
 
 void bar (A& a)
 {
-  a.foo () = 0; // { dg-error "A::foo\\(\\) = 0" }
+  a.foo () = 0; // { dg-error "operand types are 'A' and 'int'" }
   // { dg-message "candidate" "candidate note" { target *-*-* } 11 }
 }   
Index: testsuite/g++.dg/other/error16.C
===
--- testsuite/g++.dg/other/error16.C(revision 185603)
+++ testsuite/g++.dg/other/error16.C(working copy)
@@ -10,5 +10,5 @@ typedef Outer XOuter;
 
 int main() {
   Outer  ab;
-  ab.foo() == 1; // { dg-error "ab.Outer" }
+  ab.foo() == 1; // { dg-error "operand types are 'Outer::Inner' and 
'int'" }
 }
Index: testsuite/g++.dg/other/error10.C
===
--- testsuite/g++.dg/other/error10.C(revision 185603)
+++ testsuite/g++.dg/other/error10.C(working copy)
@@ -6,10 +6,9 @@ template struct A {};
 
 template
 void foo(const A &a)
-{ -A(a); } // { dg-error "\\(\\* & a\\)" "" }
+{ -A(a); } // { dg-error "operand type is 'A<0>'" }
 
 void bar()
 {
 foo(A<0>()); // { dg-message "required from here" "" }
 }
-
Index: testsuite/g++.dg/diagnostic/operator1.C
===
--- testsuite/g++.dg/diagnostic/operator1.C (revision 0)
+++ testsuite/g++.dg/diagnostic/operator1.C (revision 0)
@@ -0,0 +1,4 @@
+// PR c++/49152
+
+struct X { int x; }; 
+void trigger (X x []) { x [01] = 0; } // { dg-error "operand types are 'X' and 
'int'" }
Index: testsuite/g++.dg/parse/error30.C
===
--- testsuite/g++.dg/parse/error30.C(revision 185603)
+++ testsuite/g++.dg/parse/error30.C(working copy)
@@ -8,5 +8,5 @@ struct A
   A(int);
 };
 
-A a = -A();// { dg-error "10:no match for.*operator-.*in.*-A\\(\\)" }
-A b = -A(5);   // { dg-error "11:no match for.*operator-.*in.*-A\\(5\\)" }
+A a = -A();// { dg-error "operand type is 'A'" }
+A b = -A(5);   // { dg-error "operand type is 'A'" }
Index: testsuite/g++.dg/cpp0x/lambda/lambda-err1.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-err1.C (revision 185603)
+++ testsuite/g+

Re: [google][4.6] Bug fixes to function reordering linker plugin to handle local and comdat functions. (issue 5851044)

2012-03-21 Thread tmsriram


Committed to google/gcc-4_6 after validation.

On 2012/03/21 05:07:33, davidxl wrote:

ok for google branches after checkin validation.



David




http://codereview.appspot.com/5851044/

[PATCH][Testsuite] XFAIL scev-3/4.c and add scev-5.c

2012-03-21 Thread Jiangning Liu

Hi,

This patch is to XFAIL scev-3.c and scev-5.c. 

The bug is going to be fixed after Richard Guenther fix a serials of
problems related to POINTER_PLUS_EXPR and sizetype precision.

Thanks,
-Jiangning 

ChangeLog for testsuite:

2012-03-21  Jiangning Liu  

PR tree-optimization/52563
* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-5.c: New.

Thanks,
-Jiangning

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
index 28d5c93..ed63a18 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -14,5 +14,5 @@ f(int k)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } 
+} } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */ diff --git
a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
index 6c1e530..a538c32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -19,5 +19,5 @@ f(int k)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } 
+} } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */ diff --git
a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
new file mode 100644
index 000..b9de36a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+long long i;
+
+for (i=k; i<1000; i+=k) {
+a_p = &a[i];
+*a_p = 100;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Re: [google][4.6]Bump param value of default function size limit for auto cloning

2012-03-21 Thread Sriraman Tallam

Submitted to google/gcc-4_6.

Thanks,
-Sri.

On Wed, Mar 21, 2012 at 11:38 AM, Xinliang David Li  wrote:
> ok.
>
> thanks,
>
> David
>
> On Wed, Mar 21, 2012 at 11:20 AM, Sriraman Tallam  wrote:
>> Hi,
>>
>>  I am bumping up the default param value of  function size limit for
>> auto cloning. Since auto cloning happens on inlined functions, the
>> original value does not catch some cases in one of our benchmarks.
>>
>>  Automatic function versioning is only available in the
>> google/gcc-4_6 branch. I am working on porting this to trunk. Please
>> see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01230.html for
>> description.
>>
>> Thanks,
>> -Sri.
>>
>>
>> Index: params.def
>> ===
>> --- params.def  (revision 185514)
>> +++ params.def  (working copy)
>> @@ -1040,7 +1040,7 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
>>  DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
>>          "autoclone-function-size-limit",
>>          "Do not auto clone functions beyond this size.",
>> -         450, 0, 10)
>> +         5000, 0, 10)
>>
>>  /*
>>  Local variables:

Re: [C++ Patch] PR 49152

2012-03-21 Thread Gabriel Dos Reis

On Wed, Mar 21, 2012 at 7:22 PM, Paolo Carlini  wrote:
> Hi,
>
> this diagnostic issue is about not even trying to print expressions in error
> messages involving operators, and print operand types instead. Just as an
> example, for:
>
> struct X { int x; };
> void trigger (X x []) { x [01] = 0; }
>
> we currently print:
>
> error: no match for ‘operator=’ in ‘*(x + 4u) = 0’
>
> which the patch changes to:
>
> error: no match for ‘operator=’ (operand types are ‘X’ and ‘int’)
>
> Or, for the existing other/error10.C, from:
>
> error: no match for ‘operator-’ in ‘-(* & a)’
>
> to
>
> error: no match for ‘operator-’ (operand type is ‘A<0>’)
>

Usually these things appear in much less simpler expressions, possibly involving
the same symbol but withe different meanings.  There out be the a
way to give an indication of which symbol the diagnostic is about.
Withe the (imperfect) approach of printing expressions, at least some indication
is given on the expression involved.  Just printing the types with no indication
of what expression is causing trouble is more head-scratching.

> Jon and Manuel checked clang and I checked what icc does: without the caret,
> I don't think we can do *much* better here, but, wrt the audit trail

I think printing part or all of the expression is better in absence of carets.
We can improve on the pretty printing of expressions, for example.

> discussion, I'm proposing printing the actual operand types between
> parentheses - I got the general idea from icc - because we don't want to
> confuse parameters and arguments.
>
> Tested x86_64-linux.
>
> Thanks,
> Paolo.
>
> /
>
>
>
>

PATCH COMMITTED: Add notes about Go to gcc-4.7/changes.html

2012-03-21 Thread Ian Lance Taylor

I committed this patch to add some notes about Go to
gcc-4.7/changes.html on the web site.

Ian

Index: gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.100
diff -u -r1.100 changes.html
--- gcc-4.7/changes.html	19 Mar 2012 16:05:46 -	1.100
+++ gcc-4.7/changes.html	22 Mar 2012 04:51:20 -
@@ -609,6 +609,17 @@
   
   
 
+Go
+  
+GCC 4.7 implements
+  the http://weekly.golang.org/doc/go1.html";>Go 1
+  language standard.  The library support in 4.7.0 is not
+  quite complete, due to release timing.  Release 4.7.1 is
+  expected to include complete support.
+Go has been tested on GNU/Linux and Solaris platforms.  It may
+  work on other platforms as well.
+  
+

65 matches

Mail list logo