date:20110614

[google] Fix a bug leading to inconsistent comdat group in LIPO mode (issue4616041)

2011-06-14 Thread David Li

This patch will be committed to google/main to fix a problem in LIPO mode
that leads to a 'reference to discarded comdat section' ld warning. The problem
is caused by inconsistent comdat groups between the primary and aux modules,
because thunks were skipped in the aux module.

2011-06-14   David Li  

* cp/semantics.c (emit_associated_thunks):
Do not omit thunk emission for aux modules.

Index: cp/semantics.c
===
--- cp/semantics.c  (revision 174851)
+++ cp/semantics.c  (working copy)
@@ -3415,8 +3415,7 @@ emit_associated_thunks (tree fn)
  enabling you to output all the thunks with the function itself.  */
   if (DECL_VIRTUAL_P (fn)
   /* Do not emit thunks for extern template instantiations.  */
-  && ! DECL_REALLY_EXTERN (fn)
-  && ! cgraph_is_auxiliary (fn))
+  && ! DECL_REALLY_EXTERN (fn))
 {
   tree thunk;
 

--
This patch is available for review at http://codereview.appspot.com/4616041

Re: [patch, libgfortran] PR48906 Wrong rounding results with -m32

2011-06-14 Thread Thomas Henlich

On Tue, Jun 14, 2011 at 06:51, jerry DeLisle  wrote:
>> It should be easy to implement:
>>
>> After the switch between F and E editing, we just need to shift the
>> decimal point and decrement the exponent. No new rounding is required,
>> because we keep the number of significant digits.
>>
>
> OK, after a little bit of experimentation, I have arrived at the updated
> patch attached.
>
> This has been regression tested and passes all test cases I am aware of.  I
> also have included a new test case gcc/testsuite/gfortran.dg/fmt_g.f90.
>
> OK for trunk?

I have reviewed your patch, and I noticed that you placed the
digit-shifting code right at the top of output_float(), where the
final value of e is not yet known. Due to rounding, e can be modified
after this point, so your code will generate invalid output in some
cases, for example:

print "(-2PG0)", nearest(0.1d0, -1.0d0) ! 1.E+001
expected .002E+001

Please put the code where it belongs, after the switch between F and E
editing (based on the final value of e).

The same applies to the scale factor in general, e.g.

print "(-2pg12.3)", 0.096    ! 1.00E+01   expected 0.001E+02
print "(-1pg12.3)", 0.0996   ! 1.00E+00   expected 0.010E+01
print "(-2pg12.3)", 0.09996  ! 1.00E+01   expected 0.100
print "(-1pg12.3)", 0.09996  ! 1.00E+00   expected 0.100
print "(1pg12.3)",  0.06     ! 1.000E-01  expected 0.100
print "(2pg12.3)",  0.06     ! 10.00E-02  expected 0.100
print "(-2pg12.3)", 999.6    ! 0.100E+04  expected 0.001E+06
print "(-1pg12.3)", 999.6    ! 0.100E+04  expected 0.010E+05
print "(1pg12.3)",  999.6    ! 0.100E+04  expected 9.996E+02
print "(2pg12.3)",  999.6    ! 0.100E+04  expected 99.96E+01

Please revise your code to fix this. I have outlined a working approach in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48906#c28
and an (alpha) implementation is here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48906#c31

Thomas
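
For illustration only, the "shift the decimal point and decrement the
exponent" step discussed above can be sketched in C for a positive scale
factor. This is not the libgfortran code: the name apply_scale_factor and
its digit-string interface are assumptions made for this example, it only
handles 0 < k <= number of digits, and it expects digits that have already
been rounded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libgfortran.  DIGITS holds the
   already-rounded significant digits of a value 0.DIGITS * 10**E.
   For a scale factor K with 0 < K <= strlen (DIGITS), move the decimal
   point K places to the right and lower the exponent by K.  The digits
   themselves are unchanged, so no new rounding is needed.
   Returns 0 on success, -1 if the arguments are out of range or the
   result does not fit in OUT.  */
int
apply_scale_factor (const char *digits, int e, int k, char *out, size_t outlen)
{
  size_t n;
  int written;

  assert (digits != NULL && out != NULL);
  n = strlen (digits);

  if (k <= 0 || (size_t) k > n)
    return -1;

  e -= k;
  /* First K digits, the decimal point, the remaining digits, then a
     signed two-digit exponent.  */
  written = snprintf (out, outlen, "%.*s.%sE%c%02d",
                      k, digits, digits + k,
                      e < 0 ? '-' : '+', e < 0 ? -e : e);
  return (written < 0 || (size_t) written >= outlen) ? -1 : 0;
}
```

With digits "9996" and e = 3 (that is, 999.6), k = 1 gives 9.996E+02 and
k = 2 gives 99.96E+01, which are the expected results listed above for
1pg12.3 and 2pg12.3.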

Re: [PATCH, SMS] Fix violation of memory dependence

2011-06-14 Thread Ayal Zaks

Revital Eres  wrote on 13/06/2011 10:29:06 AM:

> From: Revital Eres 
> To: Ayal Zaks/Haifa/IBM@IBMIL
> Cc: gcc-patches@gcc.gnu.org, Patch Tracking 
> Date: 13/06/2011 10:29 AM
> Subject: [PATCH, SMS] Fix violation of memory dependence
>
> Hello,
>
> The attached patch fixes violation of memory dependencies. The
> problematic scenario happens when -fmodulo-sched-allow-regmoves flag
> is set and certain anti-dep edges are not created.
>
> For example, consider the following three instructions and the edges
> between them.  When -fmodulo-sched-allow-regmoves is set the edge (63 -
> Anti, 0 -> 64) is not created. (probably due to transitivity)
>
> Insn 63)  r168 = MEM[176]
> Out edges: (63 - Anti, 0 -> 64)
> In edges: (64 - True, 1 -> 63), (68 - True, 1 -> 63)
>
> insn 64)  176 = 176 + 4
> Out edges: (64 - True, 1 -> 63), (64 - True, 0-> 68)
> In edges: (63 - Anti, 0 -> 64)
>
> insn 68)  MEM[176 - 4] =  193
> Out edges: (68 - True, 1 -> 63)
> In edges: (64 - True, 0-> 68)
>
> This anti-dep edge is on the path from one memory instruction to another
> --- from 63 to 68; such that removing the edge caused a violation of
> the memory dependencies as insn 63 was scheduled after insn 68.
>
> This patch adds intra edges between every two memory instructions in
> this case.  It fixes recent bootstrap failure on ARM. (with SMS flags)
>
> The patch was tested as follows:
> On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
> enabling SMS also on loops with stage count 1.  Regtested on SPU.
> On arm-linux-gnueabi bootstrap c language with SMS
> flags enabling SMS also on loops with stage count 1
> and currently regression testing on c,c++.
>
> OK for mainline once regtest on arm-linux-gnueabi completes?
>

Yes, this is a straightforward fix to a wrong-code bug, as discussed
offline. Other alternatives that might introduce fewer edges:
o connect predecessors of u with v, and u with successors of v, when
removing edge (u,v). Maybe there are other cases which rely on transitivity
(?).
o have a version of sched_analyze that avoids creating register anti-deps
to begin with, and thus will create memory-deps in the absence of
transitivity.


>> * ddg.c (add_intra_loop_mem_dep): New function.

You could check first thing if (from->cuid == to->cuid), for code clarity.

Nice catch,
Ayal.


> Thanks,
> Revital
>
> Changelog:
>
> gcc/
> * ddg.c (add_intra_loop_mem_dep): New function.
> (build_intra_loop_deps): Call it.
>
> testsuite/
> * gcc.dg/sms-9.c: New file.
> [attachment "patch_fix_regmoves_12_6.txt" deleted by Ayal Zaks/Haifa/IBM]
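
As a rough illustration of the fix and of the suggested early cuid check,
the idea of connecting every pair of memory instructions can be sketched as
below. The structures and function names are hypothetical stand-ins, not the
ddg.c data structures or the real add_intra_loop_mem_dep, whose body is not
shown in this thread. The insn numbers from the example are reused as cuid
values purely for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ddg.c data structures.  */
struct node_sketch
{
  int cuid;    /* Identifies the insn within the loop body.  */
  int is_mem;  /* Nonzero if the insn reads or writes memory.  */
};

struct edge_sketch
{
  int from_cuid;
  int to_cuid;
};

/* Record an intra-loop dependence FROM -> TO in EDGES if both nodes
   touch memory.  The caller guarantees room for one more edge.
   Returns the new edge count.  */
size_t
add_mem_dep_sketch (struct edge_sketch *edges, size_t n_edges,
                    const struct node_sketch *from,
                    const struct node_sketch *to)
{
  assert (edges != NULL && from != NULL && to != NULL);

  /* Checked first, as suggested in the review: a node gets no edge
     to itself.  */
  if (from->cuid == to->cuid)
    return n_edges;
  if (!from->is_mem || !to->is_mem)
    return n_edges;

  edges[n_edges].from_cuid = from->cuid;
  edges[n_edges].to_cuid = to->cuid;
  return n_edges + 1;
}

/* Connect every earlier memory insn with every later one, so the
   ordering no longer relies on a chain through register edges.
   EDGES must have room for N_NODES * (N_NODES - 1) / 2 entries.  */
size_t
build_mem_deps_sketch (const struct node_sketch *nodes, size_t n_nodes,
                       struct edge_sketch *edges)
{
  size_t n_edges = 0;
  size_t i, j;

  for (i = 0; i < n_nodes; i++)
    for (j = i + 1; j < n_nodes; j++)
      n_edges = add_mem_dep_sketch (edges, n_edges, &nodes[i], &nodes[j]);
  return n_edges;
}
```

For the three instructions in the example (63 and 68 access memory, 64 does
not), this yields a single direct edge from 63 to 68, so the ordering of the
two memory accesses no longer depends on the anti-dep edge through insn 64.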

Re: [PATCH, SMS] Fix violation of memory dependence

2011-06-14 Thread Revital Eres

Hello,

> Yes, this is a straightforward fix to a wrong-code bug, as discussed
> offline. Other alternatives that might introduce fewer edges:
> o connect predecessors of u with v, and u with successors of v, when
> removing edge (u,v). Maybe there are other cases which rely on transitivity
> (?).

Right. As discussed offline, I will think further about whether we
currently cover all the cases.
>
>>>         * ddg.c (add_intra_loop_mem_dep): New function.
>
> You could check first thing if (from->cuid == to->cuid), for code clarity.

I will address this point separately and commit the current version of
the patch as is if that's OK.

Thanks,
Revital

Re: RFA (fold): PATCH for c++/49290 (folding (T)(ar+10))

2011-06-14 Thread Richard Guenther

On Mon, 13 Jun 2011, Jason Merrill wrote:

> On 06/13/2011 06:51 AM, Richard Guenther wrote:
> > But I suppose you want the array-ref be folded to a constant eventually?
> 
> Right.
> 
> I'm not going to keep arguing about VIEW_CONVERT_EXPR, but that brings me back
> to my original question: is it OK to add a permissive mode to the function, or
> should I copy the whole thing into the front end?

I think you should copy the whole thing into the front end for now.

Note that we want to arrive at a point where our constant folding
can handle the MEM_REF case for arbitrary constant constructors.
See fold_const_aggregate_ref in gimple-fold.c - probably not usable
from the frontend directly though.  And it doesn't yet handle
non-array constructors without having a component-ref tree.
But if we eventually have all the code in that routine you might
switch to it instead.

Richard.

Re: RFC: Fix GCSE exp_equiv_p on MEMs with different MEM_ATTRS (PR rtl-optimization/49390)

2011-06-14 Thread Richard Guenther

On Mon, 13 Jun 2011, Jakub Jelinek wrote:

> Hi!
> 
> As the testcase shows, the
> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02945.html
> patch looks wrong, MEM_ATTRS matters quite a lot for the
> alias oracle, so ignoring it leads to miscompilations.
> 
> Instead of just reverting the patch, this patch attempts to add
> some exceptions, notably when MEM_ATTRS are indirect with MEM_REF
> containing an SSA_NAME as base, and the SSA_NAMEs have the same
> var (maybe that check is unnecessary) and both SSA_NAMEs have the
> same points-to info, we can consider them interchangeable.

Hum, I think this is bogus.  When the SSA names are not exactly
the same we miss the must-alias check which prevents TBAA from
being applied.

So if you really want equivalency for the alias oracle then
you have to preserve whether the SSA names are exactly the
same or not.

The patch that reverted the MEM_ATTR comparison didn't come
with a single testcase (ugh, I realize I approved it though ;)).

What I suspect is that we are not good with sharing MEM_ATTRS
with MEM_EXPRs, esp. using operand_equal_p for comparing MEM_EXPRs
does not do a structural comparison of the trees (that was ok
as long as we didn't have INDIRECT_REFs as bases for MEM_EXPRs
but NULLed them).  Maybe it was already fixed with my patch
to treat the base operand of MEM_REFs specially via 
OEP_CONSTANT_ADDRESS_OF?

So, please consider reverting Bernd's patch instead.

Bernd, do you have any testcases?

Thanks,
Richard.

Re: [PATCH, SMS] Fix violation of memory dependence

2011-06-14 Thread Revital Eres

Hello,

>> You could check first thing if (from->cuid == to->cuid), for code clarity.
>
> I will address this point separately and commit the current version of
> the patch as is if that's OK.

Re-thinking about that, I'll prepare a new version of the patch which
addresses this and re-send it.

Sorry for the confusion,
Revital

Re: [PATCH] Only run pr48377.c testcase on i?86/x86_64

2011-06-14 Thread Eric Botcazou

> This limits this testcase to i?86/x86_64 (moving to gcc.target/ would
> be harder because it relies on all the weirdo vectorization options to be
> passed), because apparently on strict alignment targets we don't handle
> aligned (1) non-aggregates correctly.  Or should it be instead xfailed
> just on selected strict-aligned targets?

The 4.6.1 release is approaching so please install the patch for now.  TIA.

-- 
Eric Botcazou

Re: [testsuite]: Skip tests for targets with int < 32 bits

2011-06-14 Thread Georg-Johann Lay

Jakub Jelinek wrote:
> On Mon, Jun 13, 2011 at 08:18:52PM +0200, Georg-Johann Lay wrote:
>> For example, I added this line to, e.g.
>> * gcc.c-torture/execute/cmpsi-2.c
>> * gcc.c-torture/execute/pr45262.c
>> in trunk r172757
>> http://gcc.gnu.org/viewcvs?view=revision&revision=172757
> 
> That was a mistake.
> 
> gcc.c-torture/execute/ doesn't use the dg framework, you need
> to instead add cmpsi-2.x resp. pr45262.x file alongside with
> the testcase.  Look at other *.x files there for details on what they look
> like.
> 
>   Jakub

Thanks for your help Mike and Jakub.

Updated patch and testrun looks cleaner now.

Johann

--

testsuite/

* gcc.c-torture/execute/cmpsi-2.c: Undo 172757.
* gcc.c-torture/execute/cmpsi-2.x: New file.
* gcc.c-torture/execute/pr45262.c: Undo 172757.
* gcc.c-torture/execute/pr45262.x: New file.
* gcc.c-torture/compile/pr46534.c: Skip for AVR.
* gcc.c-torture/compile/pr49029.c: Add dg-require-effective-target
int32plus
* gcc.c-torture/compile/pr49163.c: Ditto.
Index: gcc.c-torture/execute/cmpsi-2.c
===
--- gcc.c-torture/execute/cmpsi-2.c	(Revision 174701)
+++ gcc.c-torture/execute/cmpsi-2.c	(working copy)
@@ -1,5 +1,3 @@
-/* { dg-require-effective-target int32plus } */
-
 #define F 140
 #define T 13
 
Index: gcc.c-torture/execute/cmpsi-2.x
===
--- gcc.c-torture/execute/cmpsi-2.x	(Revision 0)
+++ gcc.c-torture/execute/cmpsi-2.x	(Revision 0)
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { [check_effective_target_int16] } {
+	return 1
+}
+
+return 0;
Index: gcc.c-torture/execute/pr45262.c
===
--- gcc.c-torture/execute/pr45262.c	(Revision 174701)
+++ gcc.c-torture/execute/pr45262.c	(working copy)
@@ -1,5 +1,4 @@
 /* PR middle-end/45262 */
-/* { dg-require-effective-target int32plus } */
 
 extern void abort (void);
 
Index: gcc.c-torture/execute/pr45262.x
===
--- gcc.c-torture/execute/pr45262.x	(Revision 0)
+++ gcc.c-torture/execute/pr45262.x	(Revision 0)
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { [check_effective_target_int16] } {
+	return 1
+}
+
+return 0;
Index: gcc.c-torture/compile/pr46534.c
===
--- gcc.c-torture/compile/pr46534.c	(Revision 174701)
+++ gcc.c-torture/compile/pr46534.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-skip-if "too big" { pdp11-*-* } { "*" } { "" } } */
+/* { dg-skip-if "too big" { avr-*-* pdp11-*-* } { "*" } { "" } } */
 /* PR middle-end/46534 */
 
 extern int printf (const char *, ...);
Index: gcc.c-torture/compile/pr49029.c
===
--- gcc.c-torture/compile/pr49029.c	(Revision 174701)
+++ gcc.c-torture/compile/pr49029.c	(working copy)
@@ -1,4 +1,5 @@
 /* PR middle-end/49029 */
+/* { dg-require-effective-target int32plus } */
 struct S { volatile unsigned f : 11; signed g : 30; } __attribute__((packed));
 struct T { volatile struct S h; } __attribute__((packed)) a;
 void foo (int);
Index: gcc.c-torture/compile/pr49163.c
===
--- gcc.c-torture/compile/pr49163.c	(Revision 174701)
+++ gcc.c-torture/compile/pr49163.c	(working copy)
@@ -1,4 +1,5 @@
 /* PR target/49163 */
+/* { dg-require-effective-target int32plus } */
 struct S1
 {
  unsigned f0:18;
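
To illustrate why these tests need an int of at least 32 bits: pr49029.c
declares a 30-bit bit-field and pr49163.c an 18-bit one, and a bit-field may
not be wider than its declared type. The sketch below only demonstrates that
constraint on the host; the function names are made up for this example and
it is not how the int32plus effective target is implemented in the
testsuite.

```c
#include <assert.h>
#include <limits.h>

/* Width of int on the host, in bits.  */
#define INT_BITS (sizeof (int) * CHAR_BIT)

/* Nonzero if a bit-field of WIDTH bits can be declared with type int,
   i.e. if WIDTH does not exceed the width of int.  */
int
bitfield_fits_in_int (unsigned width)
{
  assert (width > 0);
  return width <= INT_BITS;
}

/* Rough host-side equivalent of requiring an int of 32 bits or more.  */
int
int_is_32_bits_or_wider (void)
{
  return INT_BITS >= 32;
}
```

On a target with a 16-bit int, bitfield_fits_in_int (30) and
bitfield_fits_in_int (18) would both be false, which is why the tests are
skipped there rather than changed.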

Re: RFC: Fix GCSE exp_equiv_p on MEMs with different MEM_ATTRS (PR rtl-optimization/49390)

2011-06-14 Thread Bernd Schmidt

On 06/14/2011 10:43 AM, Richard Guenther wrote:
> The patch that reverted the MEM_ATTR comparison didn't come
> with a single testcase (ugh, I realize I approved it though ;)).

> Bernd, do you have any testcases?

It was a missed-optimization problem, but I think it only showed up with
a modified ARM backend, and it was a set of changes I threw away in the
end since I found a better fix. So, from that angle no objections if
it's reverted.

Judging from the variable names the testcase was 253.perlbmk/op.c, but I
can't make the problem reappear at the moment - quite possibly because
I'm not fully remembering what I had changed in arm.c.

Bernd

Re: [PATCH] Only run pr48377.c testcase on i?86/x86_64

2011-06-14 Thread Jakub Jelinek

On Tue, Jun 14, 2011 at 11:10:13AM +0200, Eric Botcazou wrote:
> > This limits this testcase to i?86/x86_64 (moving to gcc.target/ would
> > be harder because it relies on all the weirdo vectorization options to be
> > passed), because apparently on strict alignment targets we don't handle
> > aligned (1) non-aggregates correctly.  Or should it be instead xfailed
> > just on selected strict-aligned targets?
> 
> The 4.6.1 release is approaching so please install the patch for now.  TIA.

Well, Steve has a patch for non_strict_align effective_target
in http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00673.html
(with s/strict_align/non_strict_align/g ), I was hoping it would be reviewed
and I'd just adjust the testcase to use it as well.

Jakub

Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Joern Rennecke


Except for the fortran/java bits (committed), this patch hasn't been
reviewed for five weeks:
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00582.html

Re: RFC: Fix GCSE exp_equiv_p on MEMs with different MEM_ATTRS (PR rtl-optimization/49390)

2011-06-14 Thread Richard Guenther

On Tue, 14 Jun 2011, Bernd Schmidt wrote:

> On 06/14/2011 10:43 AM, Richard Guenther wrote:
> > The patch that reverted the MEM_ATTR comparison didn't come
> > with a single testcase (ugh, I realize I approved it though ;)).
> 
> > Bernd, do you have any testcases?
> 
> It was a missed-optimization problem, but I think it only showed up with
> a modified ARM backend, and it was a set of changes I threw away in the
> end since I found a better fix. So, from that angle no objections if
> it's reverted.
> 
> Judging from the variable names the testcase was 253.perlbmk/op.c, but I
> can't make the problem reappear at the moment - quite possibly because
> I'm not fully remembering what I had changed in arm.c.

It's likely that due to MEM_REFs on the tree level we now detect more
cases there.  Btw, if we'd re-arrange the code to use NULL MEM_ATTRS
for the canonical MEM whenever we see two non-equivalent MEM_ATTRS
it should work again (no need to compare MEM_ALIAS_SET either then).
Not sure where to do that check and MEM_ATTRS adjustment though
(probably at hashtable lookup time).

So I'd say we revert your patch for now and if somebody feels like
implementing the above ...

Richard.

Re: [patch] Don't insert pattern statements into the code (was Fix PR tree-optimization/49318)

2011-06-14 Thread Richard Guenther

On Mon, Jun 13, 2011 at 2:43 PM, Ira Rosen  wrote:
> On 10 June 2011 12:14, Richard Guenther  wrote:
>> In the end I think we should not generate the pattern stmt during
>> pattern matching but only mark the relevant statements with a
>> pattern kind.  Say, for each pattern we have a "main" statement
>> that has related stmts belonging to the pattern that define uses
>> of the "main" statement - mark those to refer to that "main" statement.
>> For that "main" statement simply record an enum value, like,
>> widening_mult.  Then only at vectorized statement
>> generation time actually generate the vectorized form of the
>> pattern statement.
>
> I ended up with the following: during pattern detection a new scalar
> pattern statement is created but not inserted into the code, it is
> only recorded as a related statement of the last statement in the
> detected pattern. Every time the last statement is being
> analyzed/transformed, we switch to the pattern statement instead. It
> is much more difficult just to mark the last stmt with an enum value,
> since we have to retrieve the relevant operands every time.
>
> I am not sure if we need to free the pattern stmt at the end.
>
> Bootstrapped and now testing on powerpc64-suse-linux (tested the
> vectorizer testsuite on powerpc64-suse-linux and x86_64-suse-linux).
>
> What do you think?

   /* Mark the stmts that are involved in the pattern. */
-  gsi_insert_before (&si, pattern_stmt, GSI_SAME_STMT);
   set_vinfo_for_stmt (pattern_stmt,
  new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL));
+  gimple_set_bb (pattern_stmt, gimple_bb (stmt));

Do you really need this?  Otherwise it looks reasonable.  Btw,
we can probably remove the simple DCE done in
slpeel_tree_peel_loop_to_edge (remove_dead_stmts_from_loop)
with this patch.

Thanks,
Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
>     * tree-vect-loop.c (vect_determine_vectorization_factor): Don't
>     remove irrelevant pattern statements.  For irrelevant statements
>     check if it is the last statement of a detected pattern, use
>     corresponding pattern statement instead.
>     (destroy_loop_vec_info): No need to remove pattern statements,
>     only free stmt_vec_info.
>     (vect_transform_loop): For irrelevant statements check if it is
>     the last statement of a detected pattern, use corresponding
>     pattern statement instead.
>     * tree-vect-patterns.c (vect_pattern_recog_1): Don't insert
>     pattern statements.  Set basic block for the new statement.
>     (vect_pattern_recog): Update documentation.
>     * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Scan
>     operands of pattern statements.
>     (vectorizable_call): Fix printing.  In case of a pattern statement
>     use the lhs of the original statement when creating a dummy
>     statement to replace the original call.
>     (vect_analyze_stmt): For irrelevant statements check if it is
>     the last statement of a detected pattern, use corresponding
>     pattern statement instead.
>     * tree-vect-slp.c (vect_schedule_slp_instance): For pattern
>     statements use gsi of the original statement.
>
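
The approach described above (record the pattern statement as a related
statement of the last statement in the pattern, and switch to it whenever
that statement is analyzed or transformed) can be sketched as follows. The
structure and names are hypothetical simplifications for illustration, not
the vectorizer's stmt_vec_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-statement info.  */
struct stmt_sketch
{
  const char *text;                  /* Printable form, for debugging.  */
  int in_pattern;                    /* Nonzero for the last stmt of a
                                        detected pattern.  */
  struct stmt_sketch *related_stmt;  /* Pattern stmt; recorded here but
                                        never inserted into the code.  */
};

/* Return the statement that analysis or transformation should work
   on: the recorded pattern statement if STMT ends a detected pattern,
   otherwise STMT itself.  */
struct stmt_sketch *
stmt_to_process (struct stmt_sketch *stmt)
{
  assert (stmt != NULL);
  if (stmt->in_pattern && stmt->related_stmt != NULL)
    return stmt->related_stmt;
  return stmt;
}
```

Because the pattern statement lives only behind the related_stmt link, the
original statement sequence is left untouched until vectorized code is
actually generated.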

Re: Ping: [testsuite]: Skip tests for targets with int < 32 bits

2011-06-14 Thread Richard Guenther

On Mon, Jun 13, 2011 at 2:45 PM, Georg-Johann Lay  wrote:
> Ping #1 for:
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00746.html

Ok.

Thanks,
Richard.

>
> Georg-Johann Lay:
>>
>> This patch fixes testsuite failures because the testcases assume
>> sizeof(int) >= 4.
>>
>>        * gcc.c-torture/compile/pr49029.c: Add dg-require-effective-target
>>        int32plus
>>        * gcc.c-torture/compile/pr49163.c: Ditto.
>

Re: Do not stream BINFO_VIRTUALs to ltrans unit

2011-06-14 Thread Richard Guenther

On Mon, Jun 13, 2011 at 2:54 PM, Jan Hubicka  wrote:
> Hi,
> by accident I noticed that BINFO_VIRTUALS streaming is really expensive. It
> about doubles the amount of IL and types streamed by Mozilla.
>
> One obvious optimization is to not stream it into the ltrans units, where it
> is too late to do any useful devirtualization anyway.
> Doing so reduces /tmp usage from 1.7GB to 1.1GB and proportionally reduces
> streaming-out time.
>
> Bootstrapped/regtested x86_64-linux.
> OK?

Ok.

Thanks,
Richard.

> Honza
>        * lto-streamer-out.c (lto_output_ts_binfo_tree_pointers): Do not
>        stream BINFO_VIRTUALS to ltrans units.
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c  (revision 174985)
> +++ lto-streamer-out.c  (working copy)
> @@ -1117,7 +1117,11 @@
>
>   lto_output_tree_or_ref (ob, BINFO_OFFSET (expr), ref_p);
>   lto_output_tree_or_ref (ob, BINFO_VTABLE (expr), ref_p);
> -  lto_output_tree_or_ref (ob, BINFO_VIRTUALS (expr), ref_p);
> +  /* BINFO_VIRTUALS is used to drive type based devirtualization.  It often links
> +     together large portions of programs making it harder to partition.  Because
> +     devirtualization is interesting before inlining, only, there is no real
> +     need to ship it into ltrans partition.  */
> +  lto_output_tree_or_ref (ob, flag_wpa ? NULL : BINFO_VIRTUALS (expr), ref_p);
>   lto_output_tree_or_ref (ob, BINFO_VPTR_FIELD (expr), ref_p);
>
>   output_uleb128 (ob, VEC_length (tree, BINFO_BASE_ACCESSES (expr)));
>

Re: [4.6 PATCH] Workaround for stack slot sharing problems with unrolling (PR fortran/49103)

2011-06-14 Thread Richard Guenther

On Mon, Jun 13, 2011 at 10:38 PM, Jakub Jelinek  wrote:
> On Tue, Jun 07, 2011 at 12:24:06PM +0200, Richard Guenther wrote:
>> Probably easier and more complete to do
>>
>>             if (lhs && TREE_CODE (lhs) != SSA_NAME)
>>               {
>>                  tree base = get_base_address (lhs);
>
> Done in the patch below, bootstrapped/regtested again on x86_64-linux and
> i686-linux on the 4.6 branch.
>
>> I don't like the patch too much, but it looks reasonable.  At least reverting
>> your patch doesn't really fix anything.
>>
>> Any opinions from others?
>
> Michael said the same, anyone else has any opinion or can I check it in for
> 4.6?

Yes, and for trunk.  Micha can revert it there when his patch goes in.

Thanks,
Richard.

> 2011-06-13  Jakub Jelinek  
>
>        PR fortran/49103
>        * tree.h (DECL_NONSHAREABLE): Define.
>        (struct tree_decl_common): Change decl_common_unused to
>        decl_nonshareable_flag.
>        * cfgexpand.c (expand_used_vars_for_block, clear_tree_used):
>        Ignore vars with DECL_NONSHAREABLE bit set.
>        * tree-cfg.c (gimple_duplicate_bb): Set DECL_NONSHAREABLE
>        on stores to automatic aggregate vars.
>
>        * gfortran.dg/pr49103.f90: New test.
>
> --- gcc/tree.h.jj       2011-03-14 14:12:15.0 +0100
> +++ gcc/tree.h  2011-05-31 14:05:34.0 +0200
> @@ -1330,6 +1330,10 @@ extern void omp_clause_range_check_faile
>  #define DECL_READ_P(NODE) \
>   (TREE_CHECK2 (NODE, VAR_DECL, PARM_DECL)->decl_common.decl_read_flag)
>
> +#define DECL_NONSHAREABLE(NODE) \
> +  (TREE_CHECK2 (NODE, VAR_DECL, \
> +               RESULT_DECL)->decl_common.decl_nonshareable_flag)
> +
>  /* In a CALL_EXPR, means that the call is the jump from a thunk to the
>    thunked-to function.  */
>  #define CALL_FROM_THUNK_P(NODE) (CALL_EXPR_CHECK (NODE)->base.protected_flag)
> @@ -2787,8 +2791,9 @@ struct GTY(()) tree_decl_common {
>      being set.  */
>   unsigned decl_read_flag : 1;
>
> -  /* Padding so that 'off_align' can be on a 32-bit boundary.  */
> -  unsigned decl_common_unused : 1;
> +  /* In VAR_DECL or RESULT_DECL set when significant code movement precludes
> +     attempting to share the stack slot with some other variable.  */
> +  unsigned decl_nonshareable_flag : 1;
>
>   /* DECL_OFFSET_ALIGN, used only for FIELD_DECLs.  */
>   unsigned int off_align : 8;
> --- gcc/cfgexpand.c.jj  2011-05-04 10:46:52.0 +0200
> +++ gcc/cfgexpand.c     2011-05-31 14:08:36.0 +0200
> @@ -1134,7 +1134,9 @@ expand_used_vars_for_block (tree block,
>
>   /* Expand all variables at this level.  */
>   for (t = BLOCK_VARS (block); t ; t = DECL_CHAIN (t))
> -    if (TREE_USED (t))
> +    if (TREE_USED (t)
> +        && ((TREE_CODE (t) != VAR_DECL && TREE_CODE (t) != RESULT_DECL)
> +           || !DECL_NONSHAREABLE (t)))
>       expand_one_var (t, toplevel, true);
>
>   this_sv_num = stack_vars_num;
> @@ -1167,6 +1169,8 @@ clear_tree_used (tree block)
>
>   for (t = BLOCK_VARS (block); t ; t = DECL_CHAIN (t))
>     /* if (!TREE_STATIC (t) && !DECL_EXTERNAL (t)) */
> +    if ((TREE_CODE (t) != VAR_DECL && TREE_CODE (t) != RESULT_DECL)
> +       || !DECL_NONSHAREABLE (t))
>       TREE_USED (t) = 0;
>
>   for (t = BLOCK_SUBBLOCKS (block); t ; t = BLOCK_CHAIN (t))
> --- gcc/tree-cfg.c.jj   2011-03-14 14:12:15.0 +0100
> +++ gcc/tree-cfg.c      2011-06-13 19:34:18.0 +0200
> @@ -5117,6 +5117,7 @@ gimple_duplicate_bb (basic_block bb)
>     {
>       def_operand_p def_p;
>       ssa_op_iter op_iter;
> +      tree lhs;
>
>       stmt = gsi_stmt (gsi);
>       if (gimple_code (stmt) == GIMPLE_LABEL)
> @@ -5130,6 +5131,24 @@ gimple_duplicate_bb (basic_block bb)
>       maybe_duplicate_eh_stmt (copy, stmt);
>       gimple_duplicate_stmt_histograms (cfun, copy, cfun, stmt);
>
> +      /* When copying around a stmt writing into a local non-user
> +        aggregate, make sure it won't share stack slot with other
> +        vars.  */
> +      lhs = gimple_get_lhs (stmt);
> +      if (lhs && TREE_CODE (lhs) != SSA_NAME)
> +       {
> +         tree base = get_base_address (lhs);
> +         if (base
> +             && (TREE_CODE (base) == VAR_DECL
> +                 || TREE_CODE (base) == RESULT_DECL)
> +             && DECL_IGNORED_P (base)
> +             && !TREE_STATIC (base)
> +             && !DECL_EXTERNAL (base)
> +             && (TREE_CODE (base) != VAR_DECL
> +                 || !DECL_HAS_VALUE_EXPR_P (base)))
> +           DECL_NONSHAREABLE (base) = 1;
> +       }
> +
>       /* Create new names for all the definitions created by COPY and
>         add replacement mappings for each new name.  */
>       FOR_EACH_SSA_DEF_OPERAND (def_p, copy, op_iter, SSA_OP_ALL_DEFS)
> --- gcc/testsuite/gfortran.dg/pr49103.f90.jj    2011-05-31 13:52:43.0 +0200
> +++ gcc/testsuite/gfortran.dg/pr49103.f90       2011-05-31 13:57:16.0 +0200
> @@ -0,0 +1,19 @@
> +! PR fortran/49103
> +! { dg-do run }
> +  integer :: a(2), b(2), i, j
> +
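
The effect of the new flag on the two cfgexpand.c loops in the patch can be
sketched in isolation as below. The structure and function names are
hypothetical simplifications, not GCC's tree representation; the sketch only
mirrors the two conditions added by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a local variable decl.  */
struct var_sketch
{
  unsigned used : 1;          /* Plays the role of TREE_USED.  */
  unsigned nonshareable : 1;  /* Plays the role of DECL_NONSHAREABLE.  */
};

/* Count the variables that block-level expansion would handle: those
   that are used and not marked nonshareable.  */
size_t
count_block_level_vars (const struct var_sketch *vars, size_t n)
{
  size_t count = 0;
  size_t i;

  assert (vars != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (vars[i].used && !vars[i].nonshareable)
      count++;
  return count;
}

/* Clear the used flag on every variable except the nonshareable
   ones, which keep it.  */
void
clear_used_sketch (struct var_sketch *vars, size_t n)
{
  size_t i;

  assert (vars != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (!vars[i].nonshareable)
      vars[i].used = 0;
}
```

A variable marked nonshareable is therefore never a candidate in the
per-block loop and keeps its used flag across the clearing pass.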

Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng  wrote:
> Hi,
>
> The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt )
> introduces splitting of AVX256 unaligned loads.
> However, we found that it causes significant regressions for cpu2006
> ( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>
> In this work, we introduce a tune option that enables splitting of unaligned
> loads by default only for CPUs where such splitting is beneficial.
>
> The patch passed bootstrapping and regression tests on an
> x86_64-unknown-linux-gnu system.
>
> Is it OK to commit?

It probably should go to the 4.6 branch as well.  Note that I find the
X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?

I'll defer to x86 maintainers for approval.

Richard.

> Thanks,
>
> Changpeng
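
A per-CPU tune option of the kind proposed here is commonly expressed as a
bit mask over processor types. The sketch below shows that idea only; the
processor list, the mask contents, and all names are assumptions made for
this example and are not taken from the patch or from the i386 back end.

```c
#include <assert.h>

/* Hypothetical processor list; the real back end has many more.  */
enum processor_sketch
{
  PROC_COREI7,
  PROC_BDVER1,
  PROC_GENERIC,
  PROC_MAX
};

#define PROC_MASK(proc) (1u << (proc))

/* Assumption for this sketch: splitting pays off only on
   PROC_COREI7, so bdver1 and generic are left out of the mask.  */
static const unsigned split_unaligned_load_mask = PROC_MASK (PROC_COREI7);

/* Nonzero if 256-bit unaligned loads should be split by default when
   tuning for TUNE.  */
int
split_avx256_unaligned_load_p (enum processor_sketch tune)
{
  assert ((unsigned) tune < PROC_MAX);
  return (split_unaligned_load_mask & PROC_MASK (tune)) != 0;
}
```

Changing the default for one CPU then means editing a single mask rather
than the code that emits the loads.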

Re: [google] Add intermediate text format for gcov (issue4595053)

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 8:14 AM, Sharad Singhai  wrote:
> This patch adds an intermediate gcov text format which does not require
> source code. This format can be used by lcov or other tools.
>
> I have bootstrapped it on x86 and all tests pass. Okay for main?

I think there should be either a specification of the format in gcov.texi
or a reference to a specification if it exists elsewhere.

Richard.

> Thanks,
> Sharad
>
> 2011-06-13   Sharad Singhai  
>
>        Google Ref 3
>
>        * doc/gcov.texi: Document gcov intermediate format.
>        * gcov.c (get_gcov_file_intermediate_name): New function.
>        (output_intermediate_file): New function.
>        * testsuite/g++.dg/gcov/gcov-7.C: New test.
>
>
> Index: doc/gcov.texi
> ===
> --- doc/gcov.texi       (revision 174926)
> +++ doc/gcov.texi       (working copy)
> @@ -130,6 +130,7 @@
>      [@option{-f}|@option{--function-summaries}]
>      [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles}
>      [@option{-u}|@option{--unconditional-branches}]
> +     [@option{-i}|@option{--intermediate-format}]
>      [@option{-d}|@option{--display-progress}]
>  @c man end
>  @c man begin SEEALSO
> @@ -216,6 +217,12 @@
>  @itemx --display-progress
>  Display the progress on the standard output.
>
> +@item -i
> +@itemx --intermediate-format
> +Output gcov file in an intermediate text format that can be used by
> +@command{lcov} or other applications. It will output a single *.gcov file per
> +*gcda file. No source code is required.
> +
>  @end table
>
>  @command{gcov} should be run with the current directory the same as that
> Index: gcov.c
> ===
> --- gcov.c      (revision 174926)
> +++ gcov.c      (working copy)
> @@ -38,6 +38,7 @@
>  #include "tm.h"
>  #include "intl.h"
>  #include "version.h"
> +#include "demangle.h"
>
>  #include 
>
> @@ -310,6 +311,9 @@
>
>  static int flag_display_progress = 0;
>
> +/* Output *.gcov file in intermediate format used by 'lcov'.  */
> +static int flag_intermediate_format = 0;
> +
>  /* For included files, make the gcov output file name include the name
>    of the input source file.  For example, if x.h is included in a.c,
>    then the output file name is a.c##x.h.gcov instead of x.h.gcov.  */
> @@ -436,6 +440,11 @@
>   fnotice (file, "  -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n");
>   fnotice (file, "  -p, --preserve-paths            Preserve all pathname components\n");
>   fnotice (file, "  -u, --unconditional-branches    Show unconditional branch counts too\n");
> +  fnotice (file, "  -i, --intermediate-format       Output .gcov file in an intermediate text\n\
> +                                    format that can be used by 'lcov' or other\n\
> +                                    applications.  It will output a single\n\
> +                                    .gcov file per .gcda file.  No source file\n\
> +                                    is required.\n");
>   fnotice (file, "  -d, --display-progress          Display progress information\n");
>   fnotice (file, "\nFor bug reporting instructions, please see:\n%s.\n",
>           bug_report_url);
> @@ -472,6 +481,7 @@
>   { "object-file",          required_argument, NULL, 'o' },
>   { "unconditional-branches", no_argument,     NULL, 'u' },
>   { "display-progress",     no_argument,       NULL, 'd' },
> +  { "intermediate-format",  no_argument,       NULL, 'i' },
>   { 0, 0, 0, 0 }
>  };
>
> @@ -482,7 +492,8 @@
>  {
>   int opt;
>
> -  while ((opt = getopt_long (argc, argv, "abcdfhlno:puv", options, NULL)) != -1)
> +  while ((opt = getopt_long (argc, argv, "abcdfhilno:puv", options, NULL)) !=
> +         -1)
>     {
>       switch (opt)
>        {
> @@ -516,6 +527,10 @@
>        case 'u':
>          flag_unconditional = 1;
>          break;
> +       case 'i':
> +          flag_intermediate_format = 1;
> +          flag_gcov_file = 1;
> +          break;
>         case 'd':
>           flag_display_progress = 1;
>           break;
> @@ -531,6 +546,109 @@
>   return optind;
>  }
>
> +/* Get the name of the gcov file.  The return value must be free'd.
> +
> +   It appends the '.gcov' extension to the *basename* of the file.
> +   The resulting file name will be in PWD.
> +
> +   e.g.,
> +   input: foo.da,       output: foo.da.gcov
> +   input: a/b/foo.cc,   output: foo.cc.gcov  */
> +
> +static char *
> +get_gcov_file_intermediate_name (const char *file_name)
> +{
> +  const char *gcov = ".gcov";
> +  char *result;
> +  const char *cptr;
> +
> +  /* Find the 'basename'.  */
> +  cptr = lbasename (file_name);
> +
> +  result = XNEWVEC(char, strlen (cptr) + strlen (gcov) + 1);
> +  sprintf (result, "%s%s", cptr, gcov);
> +
> +  return result;
> +}
> +
> +/* Output the result in intermediate format used by 'lcov'.
> +
> +This format contains a sin

Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

2011-06-14 Thread Jakub Jelinek

On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng  
> wrote:
> > > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) 
> > > introduces splitting of avx256 unaligned loads.
> > > However, we found that it causes significant regressions for cpu2006 ( 
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
> > >
> > > In this work, we introduce a tune option that enables splitting of 
> > > unaligned loads by default only for those CPUs where such splitting
> > > is beneficial.
> >
> > The patch passed bootstrapping and regression tests on 
> > x86_64-unknown-linux-gnu system.
> >
> > Is it OK to commit?
> 
> It probably should go to the 4.6 branch as well.  Note that I find the
> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?

I also wonder what we should do for -mtune=generic.  Should we split or not?
How big an improvement is it on Intel chips, and how big a degradation does it
cause on AMD chips (I assume no other chip maker currently supports AVX)?

Jakub

Re: [Patch, AVR]: Fix PR46779

2011-06-14 Thread Georg-Johann Lay

Denis Chertykov schrieb:
> 2011/6/13 Georg-Johann Lay :
>> So you think it is pointless/discouraged to give a more realistic
>> description of AVR addressing by means of MODE_CODE_BASE_REG_CLASS (instead
>> of BASE_REG_CLASS) resp. REGNO_MODE_CODE_OK_FOR_BASE_P?
>>
>>> Look carefully at `out_movqi_r_mr'.
>>> There are even two fake addressing modes:
>>> 1. [Y + infinite-displacement];
>>> 2. [X + (0...63)].
>> Yes, I know. The first is introduced by avr_legitimate_address_p and the
>> second appears to be an artifact of LEGITIMIZE_RELOAD_ADDRESS.
>>
>> The changes are basically MODE_CODE_BASE_REG_CLASS (introduced in 4.2) and a
>> rewrite of avr_legitimate_address_p. The changes aim at a better addressing
>> for X and to minimize fake addresses.
>>
>>> I have spent many hours (days, months) debugging GCC (especially the avr
>>> port and reload) to get the addressing modes right.
>>> I have stopped at this code.
>>> AVR has limited memory addressing and GCC can't handle it in native
>>> form.
>>> Because of that I have supported fake addressing modes.
>> I assume the code is from prior to 4.2 when REGNO_MODE_CODE_OK_FOR_BASE_P
>> and MODE_CODE_BASE_REG_CLASS had not been available so that supporting X
>> required some hacking.
>> All that would still be fine; however the new register allocator leads to
>> code that no one would accept. Accessing a structure through a pointer is not
>> uncommon, not even on AVR. So if Z is used for, say, accessing flash, X
>> appears to be the best register.
>>
>> The shortcoming in GCC is that there is no way to give costs of addressing
>> (TARGET_ADDRESS_COST does different things).
>>
>> So take a look what avr-gcc compiles here:
>>  http://gcc.gnu.org/bugzilla/attachment.cgi?id=22242
>> I saw similar complaints in forums on the web.
>>
>>> (Richard Henderson has a different opinion: GCC can, AVR port can't)
>> What does he mean by that?
>>
>>> IMHO three limited pointer registers are not enough for a C compiler.
>>> Even worse, with a frame pointer it's only two, and X is very limited.
>> The current implementation has several oddities like
>>
>> * allowing SUBREGs in avr_legitimate_address_p
>> * changing BASE_REG_CLASS on the fly (by means of reload_completed)
>>
>> isn't that supposed to cause trouble?
> 
> You can try to remove all oddities and check results.
> Definitely something changed in GCC core since I wrote addressing code.
> 
> 
> Denis.

For your interest, here is a patch that shows the changes in
addressing mode.

Note that

* LEGITIMIZE_RELOAD_ADDRESS is disabled. This is because I am
  unsure what it should look like. The special cases for X
  are no longer needed, and for Y and Z it might be good to have
  intermediate addresses with, say, offset = 0 mod 60, so that
  big offsets can be reached with addr + const, 0 <= const < 60.

* the patch already includes the patch for PR46779 from
  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00810.html

* As I said above, I removed orphan DI insns.

So if you have a look into reload, it might also be interesting to see what
it does with these changes.
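
To illustrate the intermediate-address idea for Y and Z mentioned above,
here is a minimal sketch in plain C. It is not taken from the patch or from
the AVR port; the struct and the function name split_displacement are made
up for this illustration, and the constant 60 is just the "say, 60" from
the note.

```c
#include <assert.h>

/* Hypothetical helper, not part of the AVR port: split a non-negative
   displacement into a part that is 0 mod 60 (to be folded into an
   intermediate base address) and a remainder with 0 <= remainder < 60
   (to be reached as addr + const).  */
struct split_offset
{
  long base_part;   /* multiple of 60 */
  long remainder;   /* 0 <= remainder < 60 */
};

struct split_offset
split_displacement (long offset)
{
  struct split_offset s;

  /* Only non-negative displacements are handled in this sketch.  */
  assert (offset >= 0);

  s.remainder = offset % 60;
  s.base_part = offset - s.remainder;
  return s;
}
```

For example, a displacement of 130 would be split into an intermediate
address at base + 120 and a remaining constant of 10.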

Johann

--

* config/avr/avr.h (BASE_REG_CLASS): Remove.
(REG_OK_FOR_BASE_NOSTRICT_P): Remove.
(REG_OK_FOR_BASE_STRICT_P): Remove.
(LEGITIMIZE_RELOAD_ADDRESS): Remove.
(MODE_CODE_BASE_REG_CLASS): New Define.
(REGNO_MODE_CODE_OK_FOR_BASE_P): New Define.

* config/avr/avr.c (avr_legitimate_address_p): Rewrite to allow
addresses that can actually be handled by hardware.
(avr_regno_mode_code_ok_for_base_p): New global Function.
(avr_mode_code_base_reg_class): New global Function.
(avr_hard_regno_mode_ok): Allow QI in all GPRs.
(avr_reg_ok_for_addr): New static function.
(avr_regno_reg_class): Change return type from enum reg_class to
reg_class_t.
(reg_class_tab): Set base type to reg_class_t. Return smallest
register class for each register.

* config/avr/avr.md ("*sbrx_branch"): Disallow DI in mode.
("rotl3"): Ditto.
("*movqi"): Remove constraint 'Q'.
("*movsi"): Ditto.
("*movsf"): Ditto.
("*ashlqi3", "ashrqi3", "*lshrqi3"): Ditto.
("ashlhi3", "ashrhi3", "lshrhi3"): Ditto.
("ashlsi3", "ashrsi3", "lshrsi3"): Ditto.
("*movhi_sp"): Remove insn.
("zero_extendqidi2"): Remove insn_and_split.
("zero_extendhidi2"): Remove insn_and_split.
("zero_extendsidi2"): Remove insn_and_split.

* config/avr/avr-protos.h
(secondary_input_reload_class): Remove prototype.
(avr_mode_code_base_reg_class): New prototype.
(avr_regno_mode_code_ok_for_base_p): New prototype.
(avr_legitimize_reload_address): New prototype.
Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(Revision 175011)
+++ config/avr/avr-protos.h	(Arbeitskopie)
@@ -24,7 +24,7 @@
 
 extern int function_arg_regno_

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 11:40 AM, Joern Rennecke  wrote:
> Except for the fortran/java bits (committed), this patch hasn't been
> reviewed for five weeks:
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00582.html

A patch doing s/CUMULATIVE_ARGS*/cumulative_args_t/ only
is ok.  Posting compressed attached patches makes it too easy
to not review things btw ...

After that patch the "meat" of the patch should be much much smaller
and easier to review (if there is anything left besides the renaming?).

Thanks,
Richard.

Re: [patch] Don't insert pattern statements into the code (was Fix PR tree-optimization/49318)

2011-06-14 Thread Ira Rosen

On 14 June 2011 13:02, Richard Guenther  wrote:
> On Mon, Jun 13, 2011 at 2:43 PM, Ira Rosen  wrote:
>> On 10 June 2011 12:14, Richard Guenther  wrote:
>>> In the end I think we should not generate the pattern stmt during
>>> pattern matching but only mark the relevant statements with a
>>> pattern kind.  Say, for each pattern we have a "main" statement
>>> that has related stmts belonging to the pattern that define uses
>>> of the "main" statement - mark those to refer to that "main" statement.
>>> For that "main" statement simply record an enum value, like,
>>> widening_mult.  Then only at vectorized statement
>>> generation time actually generate the vectorized form of the
>>> pattern statement.
>>
>> I ended up with the following: during pattern detection a new scalar
>> pattern statement is created but not inserted into the code, it is
>> only recorded as a related statement of the last statement in the
>> detected pattern. Every time the last statement is being
>> analyzed/transformed, we switch to the pattern statement instead. It
>> is much more difficult just to mark the last stmt with an enum value,
>> since we have to retrieve the relevant operands every time.
>>
>> I am not sure if we need to free the pattern stmt at the end.
>>
>> Bootstrapped and now testing on powerpc64-suse-linux (tested
>> vectorizer testsuite on powerpc64-suse-linux and x86_64-suse-linux).
>>
>> What do you think?
>
>   /* Mark the stmts that are involved in the pattern. */
> -  gsi_insert_before (&si, pattern_stmt, GSI_SAME_STMT);
>   set_vinfo_for_stmt (pattern_stmt,
>                      new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL));
> +  gimple_set_bb (pattern_stmt, gimple_bb (stmt));
>
> do you really need this?

Yes, there are a lot of uses of gimple_bb (stmt). Otherwise, we'd have
to check there that bb exists (or that this is not a pattern stmt) and
use the bb of the original statement if not.
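
As a rough illustration of the scheme (the pattern statement is recorded as
a related statement of the last statement, gets the same bb, and is used
whenever the last statement is analyzed or transformed), here is a small
standalone sketch. It does not use the real vectorizer data structures;
struct stmt_info and both functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a statement with its vectorizer info.  */
struct stmt_info
{
  const char *desc;                /* description, for illustration only */
  int in_pattern;                  /* nonzero: last stmt of a detected pattern */
  struct stmt_info *related_stmt;  /* pattern stmt, never inserted into the code */
  int bb_index;                    /* index of the basic block of the stmt */
};

/* Record PATTERN as the related statement of LAST.  The pattern statement
   is not inserted anywhere; it only inherits the basic block of LAST so
   that later queries of its bb need no special casing.  */
void
record_pattern_stmt (struct stmt_info *last, struct stmt_info *pattern)
{
  assert (last != NULL && pattern != NULL);
  pattern->bb_index = last->bb_index;
  last->related_stmt = pattern;
  last->in_pattern = 1;
}

/* Return the statement to analyze/transform: the pattern statement if STMT
   is the last statement of a detected pattern, STMT itself otherwise.  */
struct stmt_info *
stmt_to_process (struct stmt_info *stmt)
{
  if (stmt->in_pattern && stmt->related_stmt != NULL)
    return stmt->related_stmt;
  return stmt;
}
```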

> Otherwise it looks reasonable.  Btw,
> we can probably remove the simple DCE done in
> slpeel_tree_peel_loop_to_edge (remove_dead_stmts_from_loop)
> with this patch.

I'll try that.

Thanks,
Ira

>
> Thanks,
> Richard.
>
>> Thanks,
>> Ira
>>
>> ChangeLog:
>>
>>     * tree-vect-loop.c (vect_determine_vectorization_factor): Don't
>>     remove irrelevant pattern statements.  For irrelevant statements
>>     check if it is the last statement of a detected pattern, use
>>     corresponding pattern statement instead.
>>     (destroy_loop_vec_info): No need to remove pattern statements,
>>     only free stmt_vec_info.
>>     (vect_transform_loop): For irrelevant statements check if it is
>>     the last statement of a detected pattern, use corresponding
>>     pattern statement instead.
>>     * tree-vect-patterns.c (vect_pattern_recog_1): Don't insert
>>     pattern statements.  Set basic block for the new statement.
>>     (vect_pattern_recog): Update documentation.
>>     * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Scan
>>     operands of pattern statements.
>>     (vectorizable_call): Fix printing.  In case of a pattern statement
>>     use the lhs of the original statement when creating a dummy
>>     statement to replace the original call.
>>     (vect_analyze_stmt): For irrelevant statements check if it is
>>     the last statement of a detected pattern, use corresponding
>>     pattern statement instead.
>>     * tree-vect-slp.c (vect_schedule_slp_instance): For pattern
>>     statements use gsi of the original statement.
>>
>

Re: Ping: The TI C6X port

2011-06-14 Thread Bernd Schmidt

Ping^4 for the C6X port.

> Additional preliminary scheduler tweaks:
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02408.html
> 
> Allow alternatives in attr "predicable":
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00094.html
> 
> regrename across basic block boundaries:
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02193.html
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02194.html
> 
>> 6/11: REG_WORDS_BIG_ENDIAN
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00757.html
>>
>> 7/11: Cope with using a section name other than ".rodata".
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00909.html
>>
>> 8/11: Round function arg sizes to more than PARM_BOUNDARY
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02170.html
>>
>> 9/11: Make eq_attr work if an attribute uses (attr "...")
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00761.html
>>
>> 10/11: The port
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00764.html
>>
>> 11/11: Testcases
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00762.html
>

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Joern Rennecke


Quoting Richard Guenther :

> On Tue, Jun 14, 2011 at 11:40 AM, Joern Rennecke  wrote:
>> Except for the fortran/java bits (committed), this patch hasn't been
>> reviewed for five weeks:
>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00582.html
>
> A patch doing s/CUMULATIVE_ARGS*/cumulative_args_t/ only
> is ok.

It's not quite that simple.  The patch makes a distinction between pointers
to the target-specific type CUMULATIVE_ARGS and the target-independent
cumulative_args_t.

Is it still OK if I selectively do the replacement where the
target-independent type is meant, and add a provisional
typedef CUMULATIVE_ARGS *cumulative_args_t to tie it together?

> Posting compressed attached patches makes it too easy
> to not review things btw ...

The mailing list size limits didn't allow this patch to be posted
without compression.

> After that patch the "meat" of the patch should be much much smaller
> and easier to review (if there is anything left besides the renaming?).

It should be somewhat smaller, but there are lots of places where we have
to convert between cumulative_args_t and CUMULATIVE_ARGS *.
Where a target-independent interface is required, we need cumulative_args_t.
Where a target accesses struct components, it needs CUMULATIVE_ARGS *.
There are some places that just pass CUMULATIVE_ARGS * around, both in
rtl-centric middle-end/rtl-optimizer code and in target code, which
could be electively converted.  In general, I haven't done such optional
conversions.  They could be added according to taste once the interface
has been straightened out.  There is also a judgement call in each place
how closely the code is tied to the cumulative_args_t side or the
CUMULATIVE_ARGS * side.
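
To make the distinction concrete, here is a minimal standalone sketch of
the provisional-typedef variant asked about above. It is an illustration
only, not code from the patch: the struct contents, the two conversion
helpers and the hook-like function are hypothetical names chosen for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a target-specific CUMULATIVE_ARGS; the field is made up.  */
typedef struct
{
  int nregs;   /* argument registers used so far (hypothetical) */
} CUMULATIVE_ARGS;

/* The provisional typedef that ties the two sides together.  */
typedef CUMULATIVE_ARGS *cumulative_args_t;

/* Hypothetical conversion helpers.  With the provisional typedef they are
   trivial, but they mark the places where code crosses between the
   target-independent and the target-specific side.  */
CUMULATIVE_ARGS *
unpack_cumulative_args (cumulative_args_t arg)
{
  return arg;
}

cumulative_args_t
pack_cumulative_args (CUMULATIVE_ARGS *arg)
{
  return arg;
}

/* A hook-like function: the target-independent interface passes
   cumulative_args_t, and the target converts it before accessing
   struct components.  */
void
example_arg_advance (cumulative_args_t cum_v)
{
  CUMULATIVE_ARGS *cum = unpack_cumulative_args (cum_v);

  assert (cum != NULL);
  cum->nregs++;
}
```

Code that only passes the value around never needs to unpack it, which is
the kind of place where the conversion is optional.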

Re: [PATH] PR/49139 fix always_inline failures diagnostics

2011-06-14 Thread Christian Bruel



>> Unfortunately still not satisfactory, I've been testing it against a few
>> packages, and I notice excessive warnings with the use of __typeof (__error)
>> that doesn't propagate the inline keyword.
>>
>> For instance, a reduced use extracted from the glibc
>>
>> extern __inline __attribute__ ((__always_inline__))  void
>> error ()
>> {
>>
>> }
>>
>> extern void
>> __error ()
>> {
>> }
>>
>> extern __typeof (__error) error __attribute__ ((weak, alias ("__error")));
>>
>> emits an annoying warning on the error redefinition.
>>
>> So, a check in addition to DECL_DECLARED_INLINE_P is needed;
>> TREE_ADDRESSABLE seems appropriate, since in the case of missing inline the
>> function would be emitted. So I'm testing:
>>
>> if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (decl))
>>   &&  !DECL_DECLARED_INLINE_P (decl)
>>   &&  TREE_ADDRESSABLE (decl))
>>
>> Any other idea? Or should we just drop this warning?
>
> Hmm.  Honza, any idea on the above?  Christian, I suppose you
> could check if the cgraph node for that decl has the redefined_extern_inline
> flag set (checking TREE_ADDRESSABLE looks bogus to me).
> I'm not sure how the frontend merges those two decls - I suppose
> it will have a weak always-inline function with body :/


redefined_extern_inline is not set here. But DECL_STRUCT_FUNCTION (decl) 
seems even better, since it makes no sense to try the inline if the body is 
not available. So the former works well here.


Now, another annoying case, that I reduced from Xorg packages built from 
the glibc with -fstack-protector-all -D_FORTIFY_SOURCE=2.


to the following code:
-
#include 

char *
realpath(path, resolved)
const char *path;
char *resolved;
{
   ...
   return (NULL);
}

preprocesses as:

extern char *__realpath_alias (__const char *__restrict __name, char 
*__restrict __resolved) __asm__ ("" "realpath") __attribute__ 
((__nothrow__)) __attribute__ ((__warn_unused_result__));


extern __inline __attribute__ ((__always_inline__)) __attribute__ 
((__artificial__)) __attribute__ ((__warn_unused_result__)) char *
__attribute__ ((__nothrow__)) realpath (__const char *__restrict __name, 
char *__restrict __resolved)

{
  return __realpath_alias (__name, __resolved);
}

char *
realpath(path, resolved)
 const char *path;
 char *resolved;
{
 return (((void *)0));
}

---

The problem is that the second redefinition inherits the attributes 
from the first extern inline declaration. So we get the warning emitted 
for this one.


It should be fine to redefine an extern inline function, shouldn't it? So it 
would be a frontend bug not to reset the attributes when meeting the 
redefinition, unless the first definition is a declaration and the 
attribute applies to all.


I also thought of testing for the artificial attribute before emitting the 
warning, but that doesn't look correct, since it is only used for 
debugging information.


So the question is: is the redefinition of an extern inline function OK 
(I think yes), and should it inherit the attributes of the first one?


Any ideas?

Many thanks

Christian




> Richard.

Re: [patch] Don't insert pattern statements into the code (was Fix PR tree-optimization/49318)

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 12:38 PM, Ira Rosen  wrote:
> On 14 June 2011 13:02, Richard Guenther  wrote:
>> On Mon, Jun 13, 2011 at 2:43 PM, Ira Rosen  wrote:
>>> On 10 June 2011 12:14, Richard Guenther  wrote:
>>>> In the end I think we should not generate the pattern stmt during
>>>> pattern matching but only mark the relevant statements with a
>>>> pattern kind.  Say, for each pattern we have a "main" statement
>>>> that has related stmts belonging to the pattern that define uses
>>>> of the "main" statement - mark those to refer to that "main" statement.
>>>> For that "main" statement simply record an enum value, like,
>>>> widening_mult.  Then only at vectorized statement
>>>> generation time actually generate the vectorized form of the
>>>> pattern statement.
>>>
>>> I ended up with the following: during pattern detection a new scalar
>>> pattern statement is created but not inserted into the code, it is
>>> only recorded as a related statement of the last statement in the
>>> detected pattern. Every time the last statement is being
>>> analyzed/transformed, we switch to the pattern statement instead. It
>>> is much more difficult just to mark the last stmt with an enum value,
>>> since we have to retrieve the relevant operands every time.
>>>
>>> I am not sure if we need to free the pattern stmt at the end.

No, they are going to be garbage collected.

>>> Bootstrapped and now testing on powerpc64-suse-linux (tested
>>> vectorizer testsuite on powerpc64-suse-linux and x86_64-suse-linux).
>>>
>>> What do you think?
>>
>>   /* Mark the stmts that are involved in the pattern. */
>> -  gsi_insert_before (&si, pattern_stmt, GSI_SAME_STMT);
>>   set_vinfo_for_stmt (pattern_stmt,
>>                      new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL));
>> +  gimple_set_bb (pattern_stmt, gimple_bb (stmt));
>>
>> do you really need this?
>
> Yes, there are a lot of uses of gimple_bb (stmt). Otherwise, we'd have
> to check there that bb exists (or that this is not a pattern stmt) and
> use the bb of the original statement if not.

I see.  It's not really uglier than the part where you have to special-case
them when walking use-operands, so ...

Still a lot better than when inserting them for real.

>> Otherwise it looks reasonable.  Btw,
>> we can probably remove the simple DCE done in
>> slpeel_tree_peel_loop_to_edge (remove_dead_stmts_from_loop)
>> with this patch.
>
> I'll try that.

Thanks,
Richard.

> Thanks,
> Ira
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Ira
>>>
>>> ChangeLog:
>>>
>>>     * tree-vect-loop.c (vect_determine_vectorization_factor): Don't
>>>     remove irrelevant pattern statements.  For irrelevant statements
>>>     check if it is the last statement of a detected pattern, use
>>>     corresponding pattern statement instead.
>>>     (destroy_loop_vec_info): No need to remove pattern statements,
>>>     only free stmt_vec_info.
>>>     (vect_transform_loop): For irrelevant statements check if it is
>>>     the last statement of a detected pattern, use corresponding
>>>     pattern statement instead.
>>>     * tree-vect-patterns.c (vect_pattern_recog_1): Don't insert
>>>     pattern statements.  Set basic block for the new statement.
>>>     (vect_pattern_recog): Update documentation.
>>>     * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Scan
>>>     operands of pattern statements.
>>>     (vectorizable_call): Fix printing.  In case of a pattern statement
>>>     use the lhs of the original statement when creating a dummy
>>>     statement to replace the original call.
>>>     (vect_analyze_stmt): For irrelevant statements check if it is
>>>     the last statement of a detected pattern, use corresponding
>>>     pattern statement instead.
>>>     * tree-vect-slp.c (vect_schedule_slp_instance): For pattern
>>>     statements use gsi of the original statement.
>>>
>>
>

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 1:16 PM, Joern Rennecke  wrote:
> Quoting Richard Guenther :
>
>> On Tue, Jun 14, 2011 at 11:40 AM, Joern Rennecke 
>> wrote:
>>>
>>> Except for the fortran/java bits (committed), this patch hasn't been
>>> reviewed for five weeks:
>>> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00582.html
>>
>> A patch doing s/CUMULATIVE_ARGS*/cumulative_args_t/ only
>> is ok.
>
> It's not quite that simple.  The patch makes a distinction between pointers
> to the target specific types CUMULATIVE_ARGS, and the target-independent
> cumulative_args_t.
>
> Is it still OK if I selectively do the replacement where the
> target-independent type is meant, and add a provisional
> typedef CUMULATIVE_ARGS *cumulative_args_t to tie it together?
>
>> Posting compressed attached patches makes it too easy
>> to not review things btw ...
>
> The mailing list size limits didn't allow this patch to be posted
> without compression.
>
>> After that patch the "meat" of the patch should be much much smaller
>> and easier to review (if there is anything left besides the renaming?).
>
> It should be somewhat smaller, but there are lots of places where we have
> to convert between cumulative_args_t and CUMULATIVE_ARGS *.
> Where a target-independent interface is required, we need cumulative_args_t.
> Where a target accesses struct components, it needs CUMULATIVE_ARGS *.
> There are some places that just pass CUMULATIVE_ARGS * around, both in
> rtl-centric middle-end/ rtl-optimizer code and in target code, which
> could be electively converted.  In general, I haven't done such optional
> conversions.  They could be added according to taste once the interface
> has been straightened out.  There is also a judgement call in each place
> how closely the code is tied to the cumulative_args_t side or the
> CUMULATIVE_ARGS * side.

Hmm, I see.  Maybe a GWP wants to ack your patch in whole then.

Ian?

Thanks,
Richard.

Re: [Patch, AVR]: Fix PR46779

2011-06-14 Thread Denis Chertykov

2011/6/14 Georg-Johann Lay :
> Denis Chertykov schrieb:
>> 2011/6/13 Georg-Johann Lay :
>>> So you think it is pointless/discouraged to give a more realistic
>>> description of AVR addressing by means of MODE_CODE_BASE_REG_CLASS (instead
>>> of BASE_REG_CLASS) resp. REGNO_MODE_CODE_OK_FOR_BASE_P?
>>>
>>>> Look carefully at `out_movqi_r_mr'.
>>>> There are even two fake addressing modes:
>>>> 1. [Y + infinite-displacement];
>>>> 2. [X + (0...63)].
>>> Yes, I know. The first is introduced by avr_legitimate_address_p and the
>>> second appears to be an artifact of LEGITIMIZE_RELOAD_ADDRESS.
>>>
>>> The changes are basically MODE_CODE_BASE_REG_CLASS (introduced in 4.2) and a
>>> rewrite of avr_legitimate_address_p. The changes aim at a better addressing
>>> for X and to minimize fake addresses.
>>>
>>>> I have spent many hours (days, months) debugging GCC (especially the avr
>>>> port and reload) to get the addressing modes right.
>>>> I have stopped at this code.
>>>> AVR has limited memory addressing and GCC can't handle it in native
>>>> form.
>>>> Because of that I have supported fake addressing modes.
>>> I assume the code is from prior to 4.2 when REGNO_MODE_CODE_OK_FOR_BASE_P
>>> and MODE_CODE_BASE_REG_CLASS had not been available so that supporting X
>>> required some hacking.
>>> All that would still be fine; however the new register allocator leads to
>>> code that no one would accept. Accessing a structure through a pointer is not
>>> uncommon, not even on AVR. So if Z is used for, say, accessing flash, X
>>> appears to be the best register.
>>>
>>> The shortcoming in GCC is that there is no way to give costs of addressing
>>> (TARGET_ADDRESS_COST does different things).
>>>
>>> So take a look what avr-gcc compiles here:
>>>  http://gcc.gnu.org/bugzilla/attachment.cgi?id=22242
>>> I saw similar complaints in forums on the web.
>>>
>>>> (Richard Henderson has a different opinion: GCC can, AVR port can't)
>>> What does he mean by that?
>>>
>>>> IMHO three limited pointer registers are not enough for a C compiler.
>>>> Even worse, with a frame pointer it's only two, and X is very limited.
>>> The current implementation has several oddities like
>>>
>>> * allowing SUBREGs in avr_legitimate_address_p
>>> * changing BASE_REG_CLASS on the fly (by means of reload_completed)
>>>
>>> isn't that supposed to cause trouble?
>>
>> You can try to remove all oddities and check results.
>> Definitely something changed in GCC core since I wrote addressing code.
>>
>>
>> Denis.
>
> For your interest, here is a patch that shows the changes in
> addressing mode.

Generally, the patch looks like the "right thing". I like it.

How about regression testing and code quality?

Denis.

[PATCH] sel-sched: Avoid placing bookkeeping code above a fence (PR49349)

2011-06-14 Thread Alexander Monakov

Hello,

Quoting myself from the PR audit trail,

It's a rare bug in sel-sched: we fail to schedule some code in non-pipelining
mode.  The root cause is that we put bookkeeping instructions above a fence
that is placed on the last insn (uncond. jump) of the bookkeeping block.  We
could either make such blocks ineligible for bookkeeping or rewind such fences
from the jump back to the bookkeeping code (there's also a more involved
approach of re-introducing the idea of using local nops as placeholders for
fences).  I'm testing the following patch that implements the second approach
(as it should result in a bit cleaner code in such situations).

I'm also removing a conditional that allows NULL place_to_insert in
generate_bookkeeping_insn, as I don't see how it can possibly happen with
current implementation of find_place_for_bookkeeping.

Bootstrapped and regtested on ia64-linux, OK for trunk?  Steve Ellcey
confirmed that HP-UX testing is OK as well.

2011-06-14  Alexander Monakov  

PR target/49349
* sel-sched.c (find_place_for_bookkeeping): Add new parameter
(fence_to_rewind).  Use it to notice when bookkeeping will be placed
above a fence.  Update comments.
(generate_bookkeeping_insn): Rewind fence when bookkeeping code is
placed just above it.  Do not allow NULL place_to_insert.

diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 3f22a3c..92ba222 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4663,9 +4663,10 @@ create_block_for_bookkeeping (edge e1, edge e2)
 }
 
 /* Return insn after which we must insert bookkeeping code for path(s) incoming
-   into E2->dest, except from E1->src.  */
+   into E2->dest, except from E1->src.  If the returned insn immediately
+   precedes a fence, assign that fence to *FENCE_TO_REWIND.  */
 static insn_t
-find_place_for_bookkeeping (edge e1, edge e2)
+find_place_for_bookkeeping (edge e1, edge e2, fence_t *fence_to_rewind)
 {
   insn_t place_to_insert;
   /* Find a basic block that can hold bookkeeping.  If it can be found, do not
@@ -4707,9 +4708,14 @@ find_place_for_bookkeeping (edge e1, edge e2)
sel_print ("Pre-existing bookkeeping block is %i\n", book_block->index);
 }
 
-  /* If basic block ends with a jump, insert bookkeeping code right before it.  */
+  *fence_to_rewind = NULL;
+  /* If basic block ends with a jump, insert bookkeeping code right before it.
+ Notice if we are crossing a fence when taking PREV_INSN.  */
   if (INSN_P (place_to_insert) && control_flow_insn_p (place_to_insert))
-place_to_insert = PREV_INSN (place_to_insert);
+{
+  *fence_to_rewind = flist_lookup (fences, place_to_insert);
+  place_to_insert = PREV_INSN (place_to_insert);
+}
 
   return place_to_insert;
 }
@@ -4784,21 +4790,23 @@ generate_bookkeeping_insn (expr_t c_expr, edge e1, edge e2)
   insn_t join_point, place_to_insert, new_insn;
   int new_seqno;
   bool need_to_exchange_data_sets;
+  fence_t fence_to_rewind;
 
   if (sched_verbose >= 4)
 sel_print ("Generating bookkeeping insn (%d->%d)\n", e1->src->index,
   e2->dest->index);
 
   join_point = sel_bb_head (e2->dest);
-  place_to_insert = find_place_for_bookkeeping (e1, e2);
-  if (!place_to_insert)
-return NULL;
+  place_to_insert = find_place_for_bookkeeping (e1, e2, &fence_to_rewind);
   new_seqno = find_seqno_for_bookkeeping (place_to_insert, join_point);
   need_to_exchange_data_sets
 = sel_bb_empty_p (BLOCK_FOR_INSN (place_to_insert));
 
   new_insn = emit_bookkeeping_insn (place_to_insert, c_expr, new_seqno);
 
+  if (fence_to_rewind)
+FENCE_INSN (fence_to_rewind) = new_insn;
+
   /* When inserting bookkeeping insn in new block, av sets should be
  following: old basic block (that now holds bookkeeping) data sets are
  the same as was before generation of bookkeeping, and new basic block

Re: [patch] Don't insert pattern statements into the code (was Fix PR tree-optimization/49318)

2011-06-14 Thread Ira Rosen

On 14 June 2011 14:27, Richard Guenther  wrote:

>>>
>>>   /* Mark the stmts that are involved in the pattern. */
>>> -  gsi_insert_before (&si, pattern_stmt, GSI_SAME_STMT);
>>>   set_vinfo_for_stmt (pattern_stmt,
>>>                      new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL));
>>> +  gimple_set_bb (pattern_stmt, gimple_bb (stmt));
>>>
>>> do you really need this?
>>
>> Yes, there are a lot of uses of gimple_bb (stmt). Otherwise, we'd have
>> to check there that bb exists (or that this is not a pattern stmt) and
>> use the bb of the original statement if not.
>
> I see.  It's not really uglier than the part where you have to special-case
> them when walking use-operands, so ...

I think it is uglier, because there are 42 cases to handle instead of
a single place that you mentioned. (Probably not all the 42 can be
really reached with a pattern stmt, but still it's a lot).

Thanks,
Ira

>
> Still a lot better than when inserting them for real.
>
>>> Otherwise it looks reasonable.  Btw,
>>> we can probably remove the simple DCE done in
>>> slpeel_tree_peel_loop_to_edge (remove_dead_stmts_from_loop)
>>> with this patch.
>>
>> I'll try that.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Ira
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Ira
>>>>
>>>> ChangeLog:
>>>>
>>>>     * tree-vect-loop.c (vect_determine_vectorization_factor): Don't
>>>>     remove irrelevant pattern statements.  For irrelevant statements
>>>>     check if it is the last statement of a detected pattern, use
>>>>     corresponding pattern statement instead.
>>>>     (destroy_loop_vec_info): No need to remove pattern statements,
>>>>     only free stmt_vec_info.
>>>>     (vect_transform_loop): For irrelevant statements check if it is
>>>>     the last statement of a detected pattern, use corresponding
>>>>     pattern statement instead.
>>>>     * tree-vect-patterns.c (vect_pattern_recog_1): Don't insert
>>>>     pattern statements.  Set basic block for the new statement.
>>>>     (vect_pattern_recog): Update documentation.
>>>>     * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Scan
>>>>     operands of pattern statements.
>>>>     (vectorizable_call): Fix printing.  In case of a pattern statement
>>>>     use the lhs of the original statement when creating a dummy
>>>>     statement to replace the original call.
>>>>     (vect_analyze_stmt): For irrelevant statements check if it is
>>>>     the last statement of a detected pattern, use corresponding
>>>>     pattern statement instead.
>>>>     * tree-vect-slp.c (vect_schedule_slp_instance): For pattern
>>>>     statements use gsi of the original statement.

>>>
>>
>

Re: [patch] Don't insert pattern statements into the code (was Fix PR tree-optimization/49318)

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 1:38 PM, Ira Rosen  wrote:
> On 14 June 2011 14:27, Richard Guenther  wrote:
>

   /* Mark the stmts that are involved in the pattern. */
 -  gsi_insert_before (&si, pattern_stmt, GSI_SAME_STMT);
   set_vinfo_for_stmt (pattern_stmt,
                      new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL));
 +  gimple_set_bb (pattern_stmt, gimple_bb (stmt));

 do you really need this?
>>>
>>> Yes, there are a lot of uses of gimple_bb (stmt). Otherwise, we'd have
>>> to check there that bb exists (or that this is not a pattern stmt) and
>>> use the bb of the original statement if not.
>>
>> I see.  It's not really uglier than the part where you have to special-case
>> them when walking use-operands, so ...
>
> I think it is uglier, because there are 42 cases to handle instead of
> a single place that you mentioned. (Probably not all the 42 can be
> really reached with a pattern stmt, but still it's a lot).

Well, yes - I meant setting the BB isn't uglier which means setting BB
is ok.

Richard.

> Thanks,
> Ira
>
>>
>> Still a lot better than when inserting them for real.
>>
 Otherwise it looks reasonable.  Btw,
 we can probably remove the simple DCE done in
 slpeel_tree_peel_loop_to_edge (remove_dead_stmts_from_loop)
 with this patch.
>>>
>>> I'll try that.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Ira
>>>

 Thanks,
 Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
>     * tree-vect-loop.c (vect_determine_vectorization_factor): Don't
>     remove irrelevant pattern statements.  For irrelevant statements
>     check if it is the last statement of a detected pattern, use
>     corresponding pattern statement instead.
>     (destroy_loop_vec_info): No need to remove pattern statements,
>     only free stmt_vec_info.
>     (vect_transform_loop): For irrelevant statements check if it is
>     the last statement of a detected pattern, use corresponding
>     pattern statement instead.
>     * tree-vect-patterns.c (vect_pattern_recog_1): Don't insert
>     pattern statements.  Set basic block for the new statement.
>     (vect_pattern_recog): Update documentation.
>     * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Scan
>     operands of pattern statements.
>     (vectorizable_call): Fix printing.  In case of a pattern statement
>     use the lhs of the original statement when creating a dummy
>     statement to replace the original call.
>     (vect_analyze_stmt): For irrelevant statements check if it is
>     the last statement of a detected pattern, use corresponding
>     pattern statement instead.
>     * tree-vect-slp.c (vect_schedule_slp_instance): For pattern
>     statements use gsi of the original statement.
>

>>>
>>
>

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Bernd Schmidt

On 06/14/2011 01:29 PM, Richard Guenther wrote:
> On Tue, Jun 14, 2011 at 1:16 PM, Joern Rennecke  wrote:
>> Quoting Richard Guenther :
>>
>>> On Tue, Jun 14, 2011 at 11:40 AM, Joern Rennecke 
>>> wrote:

 Except for the fortran/java bits (committed), this patch hasn't been
 reviewed for five weeks:
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00582.html
>>>
>>> A patch doing s/CUMULATIVE_ARGS*/cumulative_args_t/ only
>>> is ok.
>>
>> It's not quite that simple.  The patch makes a distinction between pointers
>> to the target specific types CUMULATIVE_ARGS, and the target-independent
>> cumulative_args_t.
>>
>> Is it still OK if I selectively do the replacement where the
>> target-independent type is meant, and add a provisional
>> typedef CUMULATIVE_ARGS *cumulative_args_t to tie it together?
>>
>>> Posting compressed attached patches makes it too easy
>>> to not review things btw ...
>>
>> The mailing list size limits didn't allow this patch to be posted
>> without compression.
>>
>>> After that patch the "meat" of the patch should be much much smaller
>>> and easier to review (if there is anything left besides the renaming?).
>>
>> It should be somewhat smaller, but there are lots of places where we have
>> to convert between cumulative_args_t and CUMULATIVE_ARGS *.
>> Where a target-independent interface is required, we need cumulative_args_t.
>> Where a target accesses struct components, it needs CUMULATIVE_ARGS *.
>> There are some places that just pass CUMULATIVE_ARGS * around, both in
>> rtl-centric middle-end/ rtl-optimizer code and in target code, which
>> could be electively converted.  In general, I haven't done such optional
>> conversions.  They could be added according to taste once the interface
>> has been straightened out.  There is also a judgement call in each place
>> how closely the code is tied to the cumulative_args_t side or the
>> CUMULATIVE_ARGS * side.
> 
> Hmm, I see.  Maybe a GWP wants to ack your patch in whole then.

I'm not getting the point of the use of attribute((transparent_union)).
That should be removed to eliminate potential differences when compiling
other compilers, and to eliminate a potential source of bugs when
passing cumulative_args_t arguments.

Some of the formatting changes to avoid long lines are unfortunate (and
it's not done consistently); I think I'd prefer to add temporary
variables to hold the return value of pack_cumulative_args and
get_cumulative_args.

-  targetm.calls.setup_incoming_varargs (&all->args_so_far,
-   data->promoted_mode,
-   data->passed_type,
-   &varargs_pretend_bytes, no_rtl);
+  (targetm.calls.setup_incoming_varargs
+(pack_cumulative_args (&all->args_so_far), data->promoted_mode,
+  data->passed_type, &varargs_pretend_bytes, no_rtl));

No need for parentheses around the expression. Occurs in three places.
See previous comment about using temporary variables to avoid ugly
formatting.
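
For illustration only, the temporary-variable style suggested above might look roughly like the sketch below. It is not taken from the patch: the struct layouts, the hook setup_incoming_varargs_hook and the wrapper call_hook_with_temporary are hypothetical stand-ins, and only the names CUMULATIVE_ARGS, cumulative_args_t and pack_cumulative_args come from the thread.

```c
/* Hypothetical stand-ins for the real GCC types; illustration only.  */
typedef struct { int nregs; } CUMULATIVE_ARGS;
typedef struct { void *p; } cumulative_args_t;

/* Wrap a target-specific pointer in the target-independent handle.  */
static cumulative_args_t
pack_cumulative_args (CUMULATIVE_ARGS *arg)
{
  cumulative_args_t ret;
  ret.p = arg;
  return ret;
}

/* Hypothetical hook that takes the target-independent handle.  */
static int
setup_incoming_varargs_hook (cumulative_args_t cum_v, int mode, int no_rtl)
{
  CUMULATIVE_ARGS *cum = (CUMULATIVE_ARGS *) cum_v.p;
  return cum->nregs + mode + no_rtl;
}

int
call_hook_with_temporary (CUMULATIVE_ARGS *args_so_far, int mode, int no_rtl)
{
  /* Holding the packed value in a temporary keeps the call on short
     lines without wrapping the whole expression in parentheses.  */
  cumulative_args_t args_so_far_v = pack_cumulative_args (args_so_far);

  return setup_incoming_varargs_hook (args_so_far_v, mode, no_rtl);
}
```

The temporary costs one extra line per call site but leaves the argument list itself unchanged.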

I think it would be best just to minimize changes in backends as much as
possible by using the following pattern everywhere:

 static void
-ix86_function_arg_advance (CUMULATIVE_ARGS *cum, enum machine_mode mode,
+ix86_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode,
   const_tree type, bool named)
 {
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);

I.e., avoid changes such as the one in mn10300_function_arg_advance.

Also,

-   if (iq2000_function_arg (&temp, mode, type, named) != 0)
+   if (iq2000_function_arg (pack_cumulative_args (&temp), mode, type, named)  != 0)

Extra tab character before !=.

-  if (targetm.calls.strict_argument_naming (arg_regs_used_so_far))
+  /* ??? the code inside is a pointer increment.  */
+  if (targetm.calls.strict_argument_naming (arg_regs_used_so_far_v))

What does this comment mean?

Finally, I could do without the comments squished to the right-hand side
like this:

+#include "tm.h" /* For INTMAX_TYPE, INT8_TYPE, INT16_TYPE, INT32_TYPE,
+  INT64_TYPE, INT_LEAST8_TYPE, INT_LEAST16_TYPE,
+  INT_LEAST32_TYPE, INT_LEAST64_TYPE, INT_FAST8_TYPE,
+  INT_FAST16_TYPE, INT_FAST32_TYPE, INT_FAST64_TYPE,
+  BOOL_TYPE_SIZE, BITS_PER_UNIT, POINTER_SIZE,
+  INT_TYPE_SIZE, CHAR_TYPE_SIZE, SHORT_TYPE_SIZE,
+  LONG_TYPE_SIZE, LONG_LONG_TYPE_SIZE,
+  FLOAT_TYPE_SIZE, DOUBLE_TYPE_SIZE,
+  LONG_DOUBLE_TYPE_SIZE and LIBGCC2_HAS_TF_MODE.  */

(I could do without these comments entirely but I see from the archives
that Joseph requested it.)

With these changes I think it'll be OK, but I'd like to see a new patch
version first.


Bernd

Fix dealII LTO link error

2011-06-14 Thread Jan Hubicka

Hi,
this patch solves a problem with DealII and WHOPR.  The code to handle comdat
groups was written with the assumption that everything in the group is COMDAT,
which is not always true.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* cgraph.c (cgraph_make_decl_local): Handle DECL_ONE_ONLY
similarly to DECL_COMDAT.
* cgraphunit.c (cgraph_analyze_function): Likewise.
* ipa.c (function_and_variable_visibility): Likewise.
Index: cgraph.c
===
--- cgraph.c(revision 175001)
+++ cgraph.c(working copy)
@@ -2487,7 +2487,7 @@ cgraph_make_decl_local (tree decl)
 DECL_COMMON (decl) = 0;
   else gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
-  if (DECL_COMDAT (decl))
+  if (DECL_ONE_ONLY (decl) || DECL_COMDAT (decl))
 {
   /* It is possible that we are linking against library defining same COMDAT
 function.  To avoid conflict we need to rename our local name of the
Index: cgraphunit.c
===
--- cgraphunit.c(revision 175001)
+++ cgraphunit.c(working copy)
@@ -830,9 +830,9 @@ cgraph_analyze_function (struct cgraph_n
   if (TREE_PUBLIC (node->decl) && node->same_body_alias)
{
   DECL_EXTERNAL (node->decl) = DECL_EXTERNAL (node->thunk.alias);
- if (DECL_COMDAT (node->thunk.alias))
+ if (DECL_ONE_ONLY (node->thunk.alias))
{
- DECL_COMDAT (node->decl) = 1;
+ DECL_COMDAT (node->decl) = DECL_COMDAT (node->thunk.alias);
  DECL_COMDAT_GROUP (node->decl) = DECL_COMDAT_GROUP (node->thunk.alias);
  if (DECL_ONE_ONLY (node->thunk.alias) && !node->same_comdat_group)
{
Index: ipa.c
===
--- ipa.c   (revision 175001)
+++ ipa.c   (working copy)
@@ -904,9 +904,9 @@ function_and_variable_visibility (bool w
 
 We also need to arrange the thunk into the same comdat group as
 the function it reffers to.  */
- if (DECL_COMDAT (decl_node->decl))
+ if (DECL_ONE_ONLY (decl_node->decl))
{
- DECL_COMDAT (node->decl) = 1;
+ DECL_COMDAT (node->decl) = DECL_COMDAT (decl_node->decl);
  DECL_COMDAT_GROUP (node->decl) = DECL_COMDAT_GROUP (decl_node->decl);
  if (DECL_ONE_ONLY (decl_node->decl) && !node->same_comdat_group)
{

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Joern Rennecke


Quoting Bernd Schmidt :


> I'm not getting the point of the use of attribute((transparent_union)).


Without that attribute, lots of ABIs add a lot of overhead for function
argument and return value passing.  E.g. instead of putting the argument
in a register, put it on the stack, and place a pointer to that stack
location in a register; instead of returning the result in a register,
have the caller pass a pointer to a location in the stack, then have the
callee write the result via that pointer into the stack, and return the
pointer.
With the transparent union attribute, you get the same straightforward
compiled code as before with CUMULATIVE_ARGS used throughout.


> That should be removed to eliminate potential differences when compiling
> other compilers,


I'm not sure what you mean here.  Do you want to have compilation units
of ENABLE_CHECKING compilers be compatible with !ENABLE_CHECKING ones?
In that case, we'd have to revamp vec.h, among others.


> and to eliminate a potential source of bugs when
> passing cumulative_args_t arguments.


Is that about not trusting the bootstrap gcc to implement that attribute
correctly?
Or do you want the integrity check from the ENABLE_CHECKING case to be
always present?


[fr30.c:fr30_setup_incoming_varargs]

> -  if (targetm.calls.strict_argument_naming (arg_regs_used_so_far))
> +  /* ??? the code inside is a pointer increment.  */
> +  if (targetm.calls.strict_argument_naming (arg_regs_used_so_far_v))
>
> What does this comment mean?


It means that the code inside the if clause, i.e.:

arg_regs_used_so_far += fr30_num_arg_regs (mode, type);

is nonsense.  arg_regs_used_so_far is a pointer to int.  The pointed-to int
is supposed to record the number of words used for argument passing.
The statement increments the pointer.
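
To make the distinction concrete, here is a small standalone sketch; it is not the fr30 code. The helper names advance_buggy and advance_fixed are hypothetical, and the "fixed" form is only the presumably intended behaviour as described above.

```c
/* Mirrors the statement quoted above: the compound assignment moves the
   local pointer, so the counter it pointed to is left unchanged.  */
static void
advance_buggy (int *arg_regs_used_so_far, int nregs)
{
  arg_regs_used_so_far += nregs;
  (void) arg_regs_used_so_far;
}

/* Presumably intended form: add to the pointed-to counter instead.  */
static void
advance_fixed (int *arg_regs_used_so_far, int nregs)
{
  *arg_regs_used_so_far += nregs;
}
```

The two bodies differ by a single dereference, and the buggy form compiles without any diagnostic, which is why it is easy to miss.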

Re: Cgraph alias reorg 8/14 (ipa-cp and ipa-prop update)

2011-06-14 Thread Jan Hubicka

> > Index: ipa-cp.c
> > ===
> > --- ipa-cp.c(revision 174905)
> > +++ ipa-cp.c(working copy)
> > @@ -818,7 +828,7 @@ ipcp_iterate_stage (void)
> >  /* Some lattices have changed from IPA_TOP to IPA_BOTTOM.
> > This change should be propagated.  */
> >  {
> > -  gcc_assert (n_cloning_candidates);
> > +  /*gcc_assert (n_cloning_candidates);*/
> >ipcp_propagate_stage ();
> >  }
> >if (dump_file)
> 
> 
> I know this assert can be horribly irritating but so far it has been
> very useful at spotting all kinds of errors at various places.  (In
> fact, you added it :-)
> 
> But as I want to get the whole IPA-CP replaced, I don't care all that
> much.
I reverted this change now.

Thanks,
Honza

Re: [PATCH, PR 49089] Don't split AVX256 unaligned loads by default on bdver1 and generic

2011-06-14 Thread H.J. Lu

On Tue, Jun 14, 2011 at 3:16 AM, Jakub Jelinek  wrote:
> On Tue, Jun 14, 2011 at 12:13:47PM +0200, Richard Guenther wrote:
>> On Tue, Jun 14, 2011 at 1:59 AM, Fang, Changpeng  
>> wrote:
>> > The patch ( http://gcc.gnu.org/ml/gcc-patches/2011-02/txt00059.txt ) which 
>> > introduces splitting avx256 unaligned loads.
>> > However, we found that it causes significant regressions for cpu2006 ( 
>> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49089 ).
>> >
>> > In this work, we introduce a tune option that sets splitting unaligned 
>> > loads default only for such CPUs that such splitting
>> > is beneficial.
>> >
>> > The patch passed bootstrapping and regression tests on 
>> > x86_64-unknown-linux-gnu system.
>> >
>> > Is it OK to commit?
>>
>> It probably should go to the 4.6 branch as well.  Note that I find the
>> X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD_OPTIMAL odd,
>> why not call it simply X86_TUNE_AVX256_SPLIT_UNALIGNED_LOAD?
>
> I also wonder what we should do for -mtune=generic.  Should we split or not?
> How big improvement is it on Intel chips, how big degradation does it
> cause on AMD chips (I assume no other chip maker currently supports AVX)?
>

Simply turning off the 32-byte unaligned load split isn't an appropriate
solution, since that introduces performance regressions on Intel Sandy
Bridge processors.

I am proposing a different approach so that we can improve
-mtune=generic performance on current Intel and AMD processors.

The current default GCC tuning, -mtune=generic, was implemented in 2005
for Intel Pentium 4, Core 2 and AMD K8 processors.  Many of its
optimization choices are no longer applicable to current Intel or AMD
processors.

We should choose a set of optimization choices for -mtune=generic,
including the 32-byte unaligned load split, for the current Intel and AMD
processors, which should improve performance with no performance
regressions.

-- 
H.J.

PING^4 APPROVED patch for AMD64 targets running GNU/kFreeBSD, anyone?

2011-06-14 Thread Robert Millan

This patch for AMD64 targets running GNU/kFreeBSD has been approved
already, would anyone be so kind to commit it?  I'm afraid I don't have
write perms currently.

See: http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00884.html

Thank you very much :-)

2011/6/10 Richard Henderson :
> On 06/10/2011 01:59 PM, Robert Millan wrote:
>> 2011-06-02  Robert Millan  
>>
>>   * config/i386/kfreebsd-gnu.h: Resync with `config/i386/linux.h'.
>>   * config/kfreebsd-gnu.h (GNU_USER_DYNAMIC_LINKER): Resync with
>>   `config/linux.h'.
>>
>>   * config/i386/kfreebsd-gnu64.h: New file.
>>   * config.gcc (x86_64-*-kfreebsd*-gnu): Replace `i386/kfreebsd-gnu.h'
>>   with `i386/kfreebsd-gnu64.h'.
>>
>>   * config/i386/linux64.h (GNU_USER_LINK_EMULATION32)
>>   (GNU_USER_LINK_EMULATION64): New macros.
>>   * config/i386/gnu-user64.h (LINK_SPEC): Rely on
>>   `GNU_USER_LINK_EMULATION32' and `GNU_USER_LINK_EMULATION64' instead
>>   of hardcoding `elf_i386' and `elf_x86_64'.
>
> Ok.
>
>
> r~
>

-- 
Robert Millan
2011-06-02  Robert Millan  

* config/i386/kfreebsd-gnu.h: Resync with `config/i386/linux.h'.
* config/kfreebsd-gnu.h (GNU_USER_DYNAMIC_LINKER): Resync with
`config/linux.h'.

* config/i386/kfreebsd-gnu64.h: New file.
* config.gcc (x86_64-*-kfreebsd*-gnu): Replace `i386/kfreebsd-gnu.h'
with `i386/kfreebsd-gnu64.h'.

* config/i386/linux64.h (GNU_USER_LINK_EMULATION32)
(GNU_USER_LINK_EMULATION64): New macros.
* config/i386/gnu-user64.h (LINK_SPEC): Rely on
`GNU_USER_LINK_EMULATION32' and `GNU_USER_LINK_EMULATION64' instead
of hardcoding `elf_i386' and `elf_x86_64'.

Index: gcc/config/i386/kfreebsd-gnu64.h
===
--- gcc/config/i386/kfreebsd-gnu64.h(revision 0)
+++ gcc/config/i386/kfreebsd-gnu64.h(revision 0)
@@ -0,0 +1,26 @@
+/* Definitions for AMD x86-64 running kFreeBSD-based GNU systems with ELF format
+   Copyright (C) 2011
+   Free Software Foundation, Inc.
+   Contributed by Robert Millan.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#define GNU_USER_LINK_EMULATION32 "elf_i386_fbsd"
+#define GNU_USER_LINK_EMULATION64 "elf_x86_64_fbsd"
+
+#define GLIBC_DYNAMIC_LINKER32 "/lib/ld.so.1"
+#define GLIBC_DYNAMIC_LINKER64 "/lib/ld-kfreebsd-x86-64.so.1"
Index: gcc/config/i386/kfreebsd-gnu.h
===
--- gcc/config/i386/kfreebsd-gnu.h  (revision 174566)
+++ gcc/config/i386/kfreebsd-gnu.h  (working copy)
@@ -1,5 +1,5 @@
 /* Definitions for Intel 386 running kFreeBSD-based GNU systems with ELF format
-   Copyright (C) 2004, 2007, 2011
+   Copyright (C) 2011
Free Software Foundation, Inc.
Contributed by Robert Millan.
 
@@ -19,11 +19,5 @@
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#undef GNU_USER_LINK_EMULATION
 #define GNU_USER_LINK_EMULATION "elf_i386_fbsd"
-
-#undef GNU_USER_DYNAMIC_LINKER32
-#define GNU_USER_DYNAMIC_LINKER32 "/lib/ld.so.1"
-
-#undef GNU_USER_DYNAMIC_LINKER64
-#define GNU_USER_DYNAMIC_LINKER64 "/lib/ld-kfreebsd-x86-64.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld.so.1"
Index: gcc/config/i386/linux64.h
===
--- gcc/config/i386/linux64.h   (revision 174566)
+++ gcc/config/i386/linux64.h   (working copy)
@@ -24,6 +24,9 @@
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
+#define GNU_USER_LINK_EMULATION32 "elf_i386"
+#define GNU_USER_LINK_EMULATION64 "elf_x86_64"
+
 #define GLIBC_DYNAMIC_LINKER32 "/lib/ld-linux.so.2"
 #define GLIBC_DYNAMIC_LINKER64 "/lib64/ld-linux-x86-64.so.2"
 
Index: gcc/config/i386/gnu-user64.h
===
--- gcc/config/i386/gnu-user64.h(revision 174566)
+++ gcc/config/i386/gnu-user64.h(working copy)
@@ -69,7 +69,8 @@
  %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"
 
 #undef LINK_SPEC
-#define LINK_SPEC "%{" SPEC_64 ":-m elf_x86_64} %{" SPEC_32 ":-m elf_i386} \
+#define LINK_SPEC "%{" SPEC_64 ":-m " GNU_USER_LINK_EMULATION64 "} \
+   %{" SPEC_32 ":-m " GNU_USER_LINK_EMULATION32 "} \
   %{shared:-shared} \
   %{!shared: \
 %{!static: \
Index: gcc/config/kfreebsd

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Bernd Schmidt

On 06/14/2011 02:53 PM, Joern Rennecke wrote:
> Quoting Bernd Schmidt :
> 
>> I'm not getting the point of the use of attribute((transparent_union)).
> 
> Without that attribute, lots of ABIs add a lot of overhead for function
> argument and return value passing.

* These functions are not hotspots.
* Most sane ABIs pass single-word structs in registers
* For the most part, gcc runs on i686 and there it doesn't make a
  difference. If ARM takes over the world, it still does not make a
  difference.

This is an unnecessary and premature microoptimization. Please remove it.

>> and to eliminate a potential source of bugs when
>> passing cumulative_args_t arguments.
> 
> Is that about not trusting the bootstrap gcc to implement that attribute
> correctly?
> Or do you want the integrity check from the ENABLE_CHECKING case to be
> always present?

According to the transparent union documentation, the compiler will
accept void * rather than cumulative_args_t if the latter is declared as
a transparent union.

If the point of your ENABLE_CHECKING machinery (which I also don't
really understand) is to avoid exactly that kind of bug, then the
ENABLE_CHECKING code should go away along with the use of transparent_union.
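
The concern about void * can be shown with a minimal sketch, assuming GCC's transparent_union extension in C. The union layout, regs_used and demo are hypothetical and do not reproduce the patch; only the names CUMULATIVE_ARGS and cumulative_args_t are borrowed from the thread.

```c
/* Hypothetical miniature of the types under discussion.  */
typedef struct { int nregs; } CUMULATIVE_ARGS;

/* GCC extension: an argument of this type may be passed as any member
   type without a cast.  */
typedef union __attribute__ ((__transparent_union__))
{
  CUMULATIVE_ARGS *p;
} cumulative_args_t;

static int
regs_used (cumulative_args_t cum)
{
  return cum.p->nregs;
}

int
demo (void)
{
  CUMULATIVE_ARGS args = { 3 };
  void *untyped = &args;

  /* Both calls compile without a cast: the typed pointer matches the
     member, and a void pointer is accepted because the union contains a
     pointer type.  A void pointer to anything else would be accepted
     just as silently.  */
  return regs_used (&args) + regs_used (untyped);
}
```

In this sketch the void pointer happens to point at a real CUMULATIVE_ARGS, so the result is well defined; the point is only that the compiler does not check.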

> [fr30.c:fr30_setup_incoming_varargs]
>> -  if (targetm.calls.strict_argument_naming (arg_regs_used_so_far))
>> +  /* ??? the code inside is a pointer increment.  */
>> +  if (targetm.calls.strict_argument_naming (arg_regs_used_so_far_v))
>>
>> What does this comment mean?
> 
> It means that the code inside the if clause. i.e.:
> 
> arg_regs_used_so_far += fr30_num_arg_regs (mode, type);
> 
> is nonsense.  arg_regs_used_so_far is a pointer to int.  The pointed-to int
> is supposed to record the number of words used for argument passing.
> The statement increments the pointer.

Ok, so move the comment before that statement then and adjust it to say
this.

Bernd

Re: [Design notes, RFC] Address-lowering prototype design (PR46556)

2011-06-14 Thread Richard Guenther

On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
 wrote:
> On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
>> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
>>  wrote:
>
> 
>
>> >> > Loss of aliasing information
>> >> > 
>> >> > The most serious problem I've run into is degraded performance due to 
>> >> > poorer
>> >> > instruction scheduling choices.  I tracked this down to
>> >> > alias.c:nonoverlapping_component_refs_p.
>> >> >
>> >> > This code proves that two memory accesses don't overlap by attempting 
>> >> > to prove
>> >> > that they access different fields of the same structure.  This is done 
>> >> > using
>> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that 
>> >> > were
>> >> > translated into the rtx's during expand.  When address lowering is not
>> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
>> >> > example.  However, address lowering changes the simple COMPONENT_REF 
>> >> > into a
>> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> >> > reference.  Thus the aliasing machinery can no longer prove that two 
>> >> > such
>> >> > field references are disjoint.
>> >> >
>> >> > This has severe consequences for performance, and has to be dealt with 
>> >> > if
>> >> > address lowering is to be successful.
>> >> >
>> >> > I've worked around this with an admittedly fragile solution; I'll 
>> >> > discuss the
>> >> > drawbacks below.  The idea is to construct a mapping from replacement 
>> >> > mem_refs
>> >> > to the original expressions that they replaced.  When a MEM_EXPR is 
>> >> > being set
>> >> > during expand, we first look up the mem_ref in the mapping.  If 
>> >> > present, the
>> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref. 
>> >> >  This
>> >> > essentially duplicates the behavior in the absence of address lowering.
>> >>
>> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
>> >> of problems.  Removing it didn't cause much degradation because we now
>> >> preserve points-to information.
>> >>
>> >> Originally I played with lowering all memory accesses to MEM_REFs
>> >> (see the old mem-ref branch), and the loss of type-based alias
>> >> disambiguation was indeed an issue.
>> >>
>> >> But - I definitely do not like the idea of preserving something similar
>> >> to TMR_ORIGINAL.  Instead we can try preserving some information
>> >> we derive from it.  We keep the original access type that we can use
>> >> for TBAA but do not retain knowledge on whether the type of the
>> >> MEM_REF is valid for TBAA or if it is view-converted.
>> >
>> > Yes, I really don't like what I have at the moment, either.  I put it in
>> > place as a stopgap to let me proceed to look for other performance
>> > problems.
>> >
>> > The question is how we can infer useful information for TBAA from the
>> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
>> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
>> > many cases to maintain correctness, and couldn't prove the type
>> > information for the important cases in SPEC I was trying to address.
>> >
>> > Unfortunately, the whole design goes down the drain if we can't find a
>> > way to solve the TBAA issue.  The performance degradations are too
>> > costly.
>>
>> If you look at what basic TBAA the alias oracle performs then it boils
>> down to the fact that get_alias_set for a.b.c might end up using the
>> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
>> of the type of a.  The tree alias-oracle extracts both alias sets, that
>> of the outermost valid type and that of the innermost as both are
>> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
>> only have storage for one such alias-set.  Thus my idea at some point
>> was to store the other one as well in some form.  It will not be
>> the full information (after all, the complete access path does provide
>> some extra information - see aliasing_component_refs_p).
>
> This is what concerns me.  TBAA information for the outer and inner
> components doesn't seem sufficient to provide what
> nonoverlapping_component_refs_p is currently able to prove.  The latter
> searches for a common RECORD_TYPE somewhere along the two access paths,
> and then disambiguates if the two associated referenced fields differ.
> For a simple case like "struct x { int a; int b; };", a and b have the
> same type and alias-set, so the alias-set information doesn't add
> anything.  It isn't sufficient alone for the disambiguation of x1.a =
> MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
>
> Obviously the offset is sufficient to disambiguate for this simple case
> with a common base type, but when the shared record types aren't at the
> outermost level, we can't detect whether it is.
>
> At the moment I don't see how we can avoid degradation unless we keep
> the full access path
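
As a rough illustration of the field-based reasoning described above (not the algorithm in alias.c), the sketch below treats two accesses as provably disjoint when both go through the same record type and touch non-overlapping byte ranges within it. struct field_access, its record_type_id member and fields_disjoint_p are hypothetical names, and the sketch assumes both accesses really address objects of that record type.

```c
#include <stddef.h>

struct x { int a; int b; };

/* Hypothetical summary of an access path: which record type it goes
   through, and which bytes of that record it touches.  */
struct field_access
{
  int record_type_id;
  size_t offset;
  size_t size;
};

/* Return 1 when the two accesses provably do not overlap, 0 when
   nothing can be concluded.  */
static int
fields_disjoint_p (struct field_access r1, struct field_access r2)
{
  /* Without a common record type the offsets are not comparable.  */
  if (r1.record_type_id != r2.record_type_id)
    return 0;

  return (r1.offset + r1.size <= r2.offset
	  || r2.offset + r2.size <= r1.offset);
}
```

Once the access is lowered to a base plus a constant offset, the record type needed for the first test is exactly the piece of information that is no longer available.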

[testsuite] Require lto support in g++.dg/torture/pr48954.C

2011-06-14 Thread Rainer Orth

The new g++.dg/torture/pr48954.C testcase FAILs on alpha-dec-osf5.1b:

FAIL: g++.dg/torture/pr48954.C  -O0  (test for excess errors)
Excess errors:
cc1plus: error: LTO support has not been enabled in this configuration

The following patch fixes this, tested with the appropriate runtest
invocation, installed on mainline.

Rainer


2011-06-14  Rainer Orth  

* g++.dg/torture/pr48954.C: Use dg-require-effective-target lto.

Index: gcc/testsuite/g++.dg/torture/pr48954.C
===
--- gcc/testsuite/g++.dg/torture/pr48954.C  (revision 175018)
+++ gcc/testsuite/g++.dg/torture/pr48954.C  (working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -flto -fno-early-inlining -fkeep-inline-functions" } */
+/* { dg-require-effective-target lto } */
+
 struct A
 {
   virtual void foo () = 0;


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Dump before flag

2011-06-14 Thread Richard Guenther

On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li  wrote:
> This is the revised patch as suggested.
>
> How does it look?

 }

+static void
+execute_function_dump (void *data ATTRIBUTE_UNUSED)

function needs a comment.

Ok with that change.

Please always specify how you tested the patch - the past fallouts
suggest you didn't do the required testing carefully.

A changelog is missing as well.

Thanks,
Richard.

> Thanks,
>
> David
>
> On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li  wrote:
>> On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther
>>  wrote:
>>> On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li  
>>> wrote:
 See attached.
>>>
>>> Hmm.  I don't like how you still wire dumping in the TODO routines.
>>> Doesn't it work to just dump the body from pass_fini_dump_file ()?
>>> Or if that doesn't sound clean from (a subset of) places where it
>>> is called? (we might want to exclude the ipa read/write/summary
>>> stages)
>>
>> That may require another round of function traversal -- but probably
>> not a big deal -- it sounds cleaner.
>>
>> David
>>
>>>
>>> Richard.
>>>
 Thanks,

 David

 On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther
  wrote:
> On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li  
> wrote:
>> this is the patch that just removes the TODO_dump flag and forces it
>> to dump. The original code cfun->last_verified = flags &
>> TODO_verify_all looks weird -- depending on TODO_dump is set or not,
>> the behavior of the update is different (when no other todo flags is
>> set).
>>
>> Ok for trunk?
>
> -ENOPATCH.
>
> Richard.
>
>> David
>>
>> On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li  
>> wrote:
>>> On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
>>>  wrote:
 On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li  
 wrote:
> The following is the patch that does the job. Most of the changes are
> just  removing TODO_dump_func. The major change is in passes.c and
> tree-pass.h.
>
> -fdump-xxx-yyy-start       <-- dump before TODO_start
> -fdump-xxx-yyy-before    <-- dump before main pass after TODO_pass
> -fdump-xxx-yyy-after       <-- dump after main pass before TODO_finish
> -fdump-xxx-yyy-finish      <-- dump after TODO_finish

 Can we bikeshed a bit more about these names?
>>>
>>> These names may be less confusing:
>>>
>>> before_preparation
>>> before
>>> after
>>> after_cleanup
>>>
>>> David
>>>
 "start" and "before"
 have no semantical difference to me ... as the dump before TODO_start
 of a pass and the dump after TODO_finish of the previous pass are
 identical (hopefully ;)), maybe merge those into a -between flag?
 If you'd specify it for a single pass then you'd get both -start and 
 -finish
 (using your naming scheme).  Splitting that dump(s) to different files
 then might make sense (not sure about the name to use).

 Note that I find it extremely useful to have dumping done in
 chronological order - splitting some of it to different files destroys
 this, especially a dump after TODO_start or before TODO_finish
 should appear in the same file (or we could also start splitting
 individual TODO_ output into sub-dump-files).  I guess what would
 be nice instead would be a fancy dump-file viewer that could
 show diffs, hide things like SCEV output, etc.

 I suppose a patch that removes the dump TODO and unconditionally
 dumps at the current point would be a good preparation for this
 enhancing patch.

 Richard.

> The default is 'finish'.
>
> Does it look ok?
>
> Thanks,
>
> David
>
> On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
>  wrote:
>> On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li 
>>  wrote:

 Your patch doesn't really improve this but adds to the confusion.

 +  /* Override dump TODOs.  */
 +  if (dump_file && (pass->todo_flags_finish & TODO_dump_func)
 +      && (dump_flags & TDF_BEFORE))
 +    {
 +      pass->todo_flags_finish &= ~TODO_dump_func;
 +      pass->todo_flags_start |= TODO_dump_func;
 +    }

 and certainly writing to pass is not ok.  And the TDF_BEFORE flag
 looks misplaced as it controls TODOs, not dumping behavior.
 Yes, it's a mess right now but the above looks like a hack on top
 of that mess (maybe because of it, but well ...).

>>>
>>> How about removing dumping TODO completely -- this can be done 
>>> easily
>

Re: Ping^5: Re: Updated^2: RFA: Fix middle-end/46500 (void * encapsulated)

2011-06-14 Thread Joern Rennecke


Quoting Bernd Schmidt :


> If the point of your ENABLE_CHECKING machinery (which I also don't
> really understand) is to avoid exactly that kind of bug, then the
> ENABLE_CHECKING code should go away along with the use of transparent_union.


No, it does a lot more than that.  It gives a sanity check that the contents
of a cumulative_args_t have actually been packed with pack_cumulative_args,
and in case more than one target is supported, the check will also
verify that the target on which get_cumulative_args is used is the same as
the one on which pack_cumulative_args was called before to pack the
cumulative_args_t.

Bugs where a target hook for the wrong target is called, or a data structure
from the wrong target is used, are hard to track down when all the optimizers
are operating in garbage-in-garbage-out mode.
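
The kind of check described above can be sketched roughly as follows.
This is not the code from the patch: the handle layout, the magic value
and the target tag are all hypothetical names made up for illustration,
and a real checking build would abort rather than return NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are GCC's real definitions.  */
enum target_id { TARGET_A, TARGET_B };

struct cumulative_args { int regs_used; };

/* Handle carrying bookkeeping that exists only so it can be checked.  */
struct packed_args
{
  unsigned magic;             /* written by pack_args */
  enum target_id target;      /* target that packed the handle */
  struct cumulative_args *p;  /* the real payload */
};

#define PACKED_MAGIC 0xC0FFEEu

struct packed_args
pack_args (enum target_id target, struct cumulative_args *ca)
{
  struct packed_args h;
  assert (ca != NULL);
  h.magic = PACKED_MAGIC;
  h.target = target;
  h.p = ca;
  return h;
}

/* Return the payload, or NULL when the handle was never packed or was
   packed for another target.  */
struct cumulative_args *
get_args (enum target_id target, struct packed_args h)
{
  if (h.magic != PACKED_MAGIC || h.target != target)
    return NULL;
  return h.p;
}
```

Both failure modes mentioned above (an unpacked handle, and a handle
from the wrong target) are caught at the point of use instead of
surfacing later as garbage in an optimizer.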

Re: [Design notes, RFC] Address-lowering prototype design (PR46556)

2011-06-14 Thread William J. Schmidt

On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
>  wrote:
> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >>  wrote:
> >
> > 
> >
> >> >> > Loss of aliasing information
> >> >> > 
> >> >> > The most serious problem I've run into is degraded performance due to poorer
> >> >> > instruction scheduling choices.  I tracked this down to
> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >
> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> >> > that they access different fields of the same structure.  This is done using
> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> >> > translated into the rtx's during expand.  When address lowering is not
> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
> >> >> > field references are disjoint.
> >> >> >
> >> >> > This has severe consequences for performance, and has to be dealt with if
> >> >> > address lowering is to be successful.
> >> >> >
> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> >> >> > essentially duplicates the behavior in the absence of address lowering.
> >> >>
> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> of problems.  Removing it didn't cause much degradation because we now
> >> >> preserve points-to information.
> >> >>
> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> disambiguation was indeed an issue.
> >> >>
> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
> >> >> we derive from it.  We keep the original access type that we can use
> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >
> >> > Yes, I really don't like what I have at the moment, either.  I put it in
> >> > place as a stopgap to let me proceed to look for other performance
> >> > problems.
> >> >
> >> > The question is how we can infer useful information for TBAA from the
> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> > many cases to maintain correctness, and couldn't prove the type
> >> > information for the important cases in SPEC I was trying to address.
> >> >
> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> > way to solve the TBAA issue.  The performance degradations are too
> >> > costly.
> >>
> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
> >> of the outermost valid type and that of the innermost as both are
> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
> >> only have storage for one such alias-set.  Thus my idea at some point
> >> was to store the other one as well in some form.  It will not be
> >> the full information (after all, the complete access path does provide
> >> some extra information - see aliasing_component_refs_p).
> >
> > This is what concerns me.  TBAA information for the outer and inner
> > components doesn't seem sufficient to provide what
> > nonoverlapping_component_refs_p is currently able to prove.  The latter
> > searches for a common RECORD_TYPE somewhere along the two access paths,
> > and then disambiguates if the two associated referenced fields differ.
> > For a simple case like "struct x { int a; int b; };", a and b have the
> > same type and alias-set, so the alias-set information doesn't add
> > anything.  It isn't sufficient alone for the disambiguation of x1.a =
> > MEM_REF[&x1, 0] and x2.b = MEM_REF[&x2, 4].
> >
> > Obviously the offset is suffici

Fix comdat unsharing

2011-06-14 Thread Jan Hubicka

Hi,
cgraph_address_taken_from_non_vtable_p was written with the assumption that all
references to functions take addresses.  This is not true for aliases.

Bootstrapped/regtested x86_64-linux, committed.

Honza

Index: ChangeLog
===
--- ChangeLog   (revision 175020)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2011-06-13  Jan Hubicka  
+
+   * ipa.c (cgraph_address_taken_from_non_vtable_p): Check the ref type.
+
 2011-06-14  Richard Henderson  
 
PR debug/48459
Index: ipa.c
===
--- ipa.c   (revision 175015)
+++ ipa.c   (working copy)
@@ -543,14 +543,15 @@ cgraph_address_taken_from_non_vtable_p (
   int i;
   struct ipa_ref *ref;
   for (i = 0; ipa_ref_list_reference_iterate (&node->ref_list, i, ref); i++)
-{
-  struct varpool_node *node;
-  if (ref->refered_type == IPA_REF_CGRAPH)
-   return true;
-  node = ipa_ref_varpool_node (ref);
-  if (!DECL_VIRTUAL_P (node->decl))
-   return true;
-}
+if (ref->use == IPA_REF_ADDR)
+  {
+   struct varpool_node *node;
+   if (ref->refered_type == IPA_REF_CGRAPH)
+ return true;
+   node = ipa_ref_varpool_node (ref);
+   if (!DECL_VIRTUAL_P (node->decl))
+ return true;
+  }
   return false;
 }
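
For readers without the ipa-ref sources at hand, the logic of the fixed
loop can be modelled in isolation.  The sketch below is a simplified
stand-in, not the GCC code: struct ref, its fields and the enum values
are hypothetical and only mirror the shape of the check in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of a reference list; the real
   ipa_ref structures carry more state than this.  */
enum ref_use { REF_LOAD, REF_STORE, REF_ADDR, REF_ALIAS };
enum ref_kind { REF_FROM_FUNCTION, REF_FROM_VARIABLE };

struct ref
{
  enum ref_use use;     /* how the function is referenced */
  enum ref_kind from;   /* what kind of entity holds the reference */
  bool from_vtable;     /* the referring variable is a vtable */
};

/* True if some reference takes the function's address from anywhere
   other than a vtable.  Non-address uses such as aliases are skipped,
   which is the point of the fix.  */
bool
address_taken_from_non_vtable_p (const struct ref *refs, size_t n)
{
  size_t i;
  assert (refs != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (refs[i].use == REF_ADDR)
      {
        if (refs[i].from == REF_FROM_FUNCTION)
          return true;
        if (!refs[i].from_vtable)
          return true;
      }
  return false;
}
```

Without the test on the use kind, an alias reference alone would make
the function look address-taken.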

Re: [PATCH] Only run pr48377.c testcase on i?86/x86_64

2011-06-14 Thread Eric Botcazou

> Well, Steve has a patch for non_strict_align effective_target
> in http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00673.html
> (with s/strict_align/non_strict_align/g ), I was hoping it would be
> reviewed and I'd just adjust the testcase to use it as well.

Would it be applied to the 4.6 branch as well?  If not, I think you should apply
your patch to trunk and 4.6 branch and let Steve adjust it on trunk later.

-- 
Eric Botcazou

[build, libgcc] Correctly apply c_flags in shared-object.mk

2011-06-14 Thread Rainer Orth

When I first did a Solaris 11/x86 bootstrap with gld after checking in
my ENABLE_EXECUTE_STACK patch, I found that several acats and gnat.dg
tests were failing.  This hadn't happened with Sun ld.

Reghunting revealed that this had been introduced by that patch.
Fortunately, the code itself was not at fault.  Instead, before the
patch _enable_execute_stack.o had been compiled without -fexceptions,
while afterwards (with enable-execute-stack.c added to LIB2ADD)
-fexceptions was used.  I don't yet understand why this is a problem,
and only with gld, but clearly this is not how the LIB2ADD and
LIB2ADD_ST objects are supposed to be compiled.  In libgcc/Makefile.in,
we have

# Build LIB2ADD and LIB2ADD_ST.
[...]
c_flags :=
iter-items := $(LIB2ADD) $(LIB2ADD_ST)
include $(iterator)

with

iterator = $(srcdir)/empty.mk $(patsubst %,$(srcdir)/shared-object.mk,$(iter-items))

The problem is that the rule created from shared-object.mk to compile
LIB2ADD members refers to $(c_flags), but that variable is evaluated at
the point the rule is invoked, not when it is created.

Makefile.in sets c_flags 3 times:

# Build LIB2ADD and LIB2ADD_ST.
[...]
c_flags :=

# Build LIB2ADDEH, LIB2ADDEHSTATIC, and LIB2ADDEHSHARED.  If we don't have
[...]
c_flags := -fexceptions

# Build LIBUNWIND.
[...]
c_flags := -fexceptions

The effect is that not only LIB2ADDEH* and LIBUNWIND sources are
compiled with -fexceptions, but everything in LIB2ADD{, _ST}.

The following patch fixes this by storing the current value of c_flags
in a per-source variable and using that in the generated rules.  I've
checked that the LIB2ADD members are no longer compiled with
-fexceptions.  With that patch, all the acats and gnat.dg failures are
gone.

Bootstrapped without regressions on i386-pc-solaris2.11.

Ok for mainline?

Rainer


2011-06-12  Rainer Orth  

* shared-object.mk ($o-opt): Save c_flags.
($(base)$(objext)): Use it.
($(base)_s$(objext)): Likewise.

diff --git a/libgcc/shared-object.mk b/libgcc/shared-object.mk
--- a/libgcc/shared-object.mk
+++ b/libgcc/shared-object.mk
@@ -6,13 +6,17 @@ iter-items := $(filter-out $o,$(iter-ite
 
 base := $(basename $(notdir $o))
 
+$o-opt := $(c_flags)
+
+#$(info $o: c_flags=$(c_flags) o-opt=$($(o)-opt))
+
 ifeq ($(suffix $o),.c)
 
 $(base)$(objext): $o
-   $(gcc_compile) $(c_flags) -c $< $(vis_hide)
+   $(gcc_compile) $($<-opt) -c $< $(vis_hide)
 
 $(base)_s$(objext): $o
-   $(gcc_s_compile) $(c_flags) -c $<
+   $(gcc_s_compile) $($<-opt) -c $<
 
 else
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Cgraph alias reorg 13/14 (disable inlining functions called once at -O0

2011-06-14 Thread Eric Botcazou

> I think we also suggested at some point that -O1 optimizations
> shouldn't interfere with debugging too much.  But if it is what we did before
> it's certainly fine.

FWIW we have some evidence that -finline-functions-called-once really helps
at -O1 in terms of performance (with the 4.5 back-end) and doesn't damage
debugging too much.

-- 
Eric Botcazou

Re: [PATCH, PR43864] Gimple level duplicate block cleanup.

2011-06-14 Thread Richard Guenther

On Fri, Jun 10, 2011 at 6:54 PM, Tom de Vries  wrote:
> Hi Richard,
>
> thanks for the review.
>
> On 06/08/2011 11:55 AM, Richard Guenther wrote:
>> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries  wrote:
>>> Hi Richard,
>>>
>>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the following
>>> table (%, lower is better).
>>>
>>>                     none            pic
>>>                thumb1  thumb2  thumb1 thumb2
>>> spec2000         99.9    99.9    99.8   99.8
>>>
>>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that the
>>> optimizations proposed in PR20070 would fix this PR.
>>>
>>> The problem in this PR is that when compiling with -O2, the example below should
>>> only have one call to free. The original problem is formulated in terms of -Os,
>>> but currently we generate one call to free with -Os, although still not the
>>> smallest code possible. I'll show here the -O2 case, since that's similar to the
>>> original PR.
>>>
>
> Example A. (naming it for reference below)
>
>>> #include 
>>> void foo (char*, FILE*);
>>> char* hprofStartupp(char *outputFileName, char *ctx)
>>> {
>>>    char fileName[1000];
>>>    FILE *fp;
>>>    sprintf(fileName, outputFileName);
>>>    if (access(fileName, 1) == 0) {
>>>        free(ctx);
>>>        return 0;
>>>    }
>>>
>>>    fp = fopen(fileName, 0);
>>>    if (fp == 0) {
>>>        free(ctx);
>>>        return 0;
>>>    }
>>>
>>>    foo(outputFileName, fp);
>>>
>>>    return ctx;
>>> }
>>>
>>> AFAIU, there are 2 complementary methods of rtl optimizations proposed in PR20070.
>>> - Merging 2 blocks which are identical except for input registers, by using a
>>>  conditional move to choose between the different input registers.
>>> - Merging 2 blocks which have different local registers, by ignoring those
>>>  differences
>>>
>>> Blocks .L6 and .L7 have no difference in local registers, but they have a
>>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>>> conditional move would probably be beneficial in terms of size, but it's not
>>> clear what condition the conditional move should be using. Calculating such a
>>> condition would add in size and increase the execution path.
>>>
>>> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
>>> ...
>>>        push    {r4, r5, lr}
>>>        mov     r4, r0
>>>        sub     sp, sp, #1004
>>>        mov     r5, r1
>>>        mov     r0, sp
>>>        mov     r1, r4
>>>        bl      sprintf
>>>        mov     r0, sp
>>>        movs    r1, #1
>>>        bl      access
>>>        mov     r3, r0
>>>        cbz     r0, .L6
>>>        movs    r1, #0
>>>        mov     r0, sp
>>>        bl      fopen
>>>        mov     r1, r0
>>>        cbz     r0, .L7
>>>        mov     r0, r4
>>>        bl      foo
>>> .L3:
>>>        mov     r0, r5
>>>        add     sp, sp, #1004
>>>        pop     {r4, r5, pc}
>>> .L6:
>>>        mov     r0, r5
>>>        mov     r5, r3
>>>        bl      free
>>>        b       .L3
>>> .L7:
>>>        mov     r0, r5
>>>        mov     r5, r1
>>>        bl      free
>>>        b       .L3
>>> ...
>>>
>>> The proposed patch solves the problem by dealing with the 2 blocks at a level
>>> when they are still identical: at gimple level. It detects that the 2 blocks are
>>> identical, and removes one of them.
>>>
>>> The following table shows the impact of the patch on the example in terms of
>>> size for -march=armv7-a:
>>>
>>>          without     with    delta
>>> Os      :     108      104       -4
>>> O2      :     120      104      -16
>>> Os thumb:      68       64       -4
>>> O2 thumb:      76       64      -12
>>>
>>> The gain in size for -O2 is that of removing the entire block, plus the
>>> replacement of 2 moves by a constant set, which also decreases the execution
>>> path. The patch ensures optimal code for both -O2 and -Os.
>>>
>>>
>>> By keeping track of equivalent definitions in the 2 blocks, we can ignore those
>>> differences in comparison. Without this feature, we would only match blocks with
>>> resultless operations, due to the ssa-nature of gimples.
>>> For example, with this feature, we reduce the following function to its minimum
>>> at gimple level, rather than at rtl level.
>>>
>
> Example B. (naming it for reference below)
>
>>> int f(int c, int b, int d)
>>> {
>>>  int r, e;
>>>
>>>  if (c)
>>>    r = b + d;
>>>  else
>>>    {
>>>      e = b + d;
>>>      r = e;
>>>    }
>>>
>>>  return r;
>>> }
>>>
>>> ;; Function f (f)
>>>
>>> f (int c, int b, int d)
>>> {
>>>  int e;
>>>
>>> :
>>>  e_6 = b_3(D) + d_4(D);
>>>  return e_6;
>>>
>>> }
>>>
>>> I'll send the patch with the testcases in a separate email.
>>>
>>> OK for trunk?
>>
>> I don't like that you hook this into cleanup_tree_cfg - that is called
>> _way_

[v3] Use noexcept in and tempbuf

2011-06-14 Thread Paolo Carlini


Hi,

tested x86_64-linux, committed.

Paolo.

///
2011-06-14  Paolo Carlini  

* include/std/functional: Use noexcept.
* include/bits/stl_tempbuf.h: Likewise.
Index: include/std/functional
===
--- include/std/functional  (revision 175020)
+++ include/std/functional  (working copy)
@@ -437,28 +437,28 @@
 public:
   typedef _Tp type;
 
-  reference_wrapper(_Tp& __indata)
+  reference_wrapper(_Tp& __indata) noexcept
   : _M_data(std::__addressof(__indata))
   { }
 
   reference_wrapper(_Tp&&) = delete;
 
-  reference_wrapper(const reference_wrapper<_Tp>& __inref):
-  _M_data(__inref._M_data)
+  reference_wrapper(const reference_wrapper<_Tp>& __inref) noexcept
+  : _M_data(__inref._M_data)
   { }
 
   reference_wrapper&
-  operator=(const reference_wrapper<_Tp>& __inref)
+  operator=(const reference_wrapper<_Tp>& __inref) noexcept
   {
_M_data = __inref._M_data;
return *this;
   }
 
-  operator _Tp&() const
+  operator _Tp&() const noexcept
   { return this->get(); }
 
   _Tp&
-  get() const
+  get() const noexcept
   { return *_M_data; }
 
   template
@@ -473,13 +473,13 @@
   /// Denotes a reference should be taken to a variable.
   template
 inline reference_wrapper<_Tp>
-ref(_Tp& __t)
+ref(_Tp& __t) noexcept
 { return reference_wrapper<_Tp>(__t); }
 
   /// Denotes a const reference should be taken to a variable.
   template
 inline reference_wrapper
-cref(const _Tp& __t)
+cref(const _Tp& __t) noexcept
 { return reference_wrapper(__t); }
 
   template
@@ -491,13 +491,13 @@
   /// Partial specialization.
   template
 inline reference_wrapper<_Tp>
-ref(reference_wrapper<_Tp> __t)
+ref(reference_wrapper<_Tp> __t) noexcept
 { return ref(__t.get()); }
 
   /// Partial specialization.
   template
 inline reference_wrapper
-cref(reference_wrapper<_Tp> __t)
+cref(reference_wrapper<_Tp> __t) noexcept
 { return cref(__t.get()); }
 
   // @} group functors
@@ -1913,13 +1913,15 @@
*  @brief Default construct creates an empty function call wrapper.
*  @post @c !(bool)*this
*/
-  function() : _Function_base() { }
+  function() noexcept
+  : _Function_base() { }
 
   /**
*  @brief Creates an empty function call wrapper.
*  @post @c !(bool)*this
*/
-  function(nullptr_t) : _Function_base() { }
+  function(nullptr_t) noexcept
+  : _Function_base() { }
 
   /**
*  @brief %Function copy constructor.
@@ -2050,7 +2052,7 @@
   /// @overload
   template
typename enable_if::value, function&>::type
-   operator=(reference_wrapper<_Functor> __f)
+   operator=(reference_wrapper<_Functor> __f) noexcept
{
  function(__f).swap(*this);
  return *this;
@@ -2093,7 +2095,7 @@
*
*  This function will not throw an %exception.
*/
-  explicit operator bool() const
+  explicit operator bool() const noexcept
   { return !_M_empty(); }
 
   // [3.7.2.4] function invocation
@@ -2119,7 +2121,7 @@
*
*  This function will not throw an %exception.
*/
-  const type_info& target_type() const;
+  const type_info& target_type() const noexcept;
 
   /**
*  @brief Access the stored target function object.
@@ -2130,10 +2132,10 @@
*
* This function will not throw an %exception.
*/
-  template   _Functor* target();
+  template   _Functor* target() noexcept;
 
   /// @overload
-  template const _Functor* target() const;
+  template const _Functor* target() const noexcept;
 #endif
 
 private:
@@ -2187,7 +2189,7 @@
   template
 const type_info&
 function<_Res(_ArgTypes...)>::
-target_type() const
+target_type() const noexcept
 {
   if (_M_manager)
{
@@ -2203,7 +2205,7 @@
 template
   _Functor*
   function<_Res(_ArgTypes...)>::
-  target()
+  target() noexcept
   {
if (typeid(_Functor) == target_type() && _M_manager)
  {
@@ -,7 +2224,7 @@
 template
   const _Functor*
   function<_Res(_ArgTypes...)>::
-  target() const
+  target() const noexcept
   {
if (typeid(_Functor) == target_type() && _M_manager)
  {
@@ -2246,13 +2248,13 @@
*/
   template
 inline bool
-operator==(const function<_Res(_Args...)>& __f, nullptr_t)
+operator==(const function<_Res(_Args...)>& __f, nullptr_t) noexcept
 { return !static_cast(__f); }
 
   /// @overload
   template
 inline bool
-operator==(nullptr_t, const function<_Res(_Args...)>& __f)
+operator==(nullptr_t, const function<_Res(_Args...)>& __f) noexcept
 { return !static_cast(__f); }
 
   /**
@@ -2264,13 +2266,13 @@
*/
   template

Re: PATCH [1/n]: Prepare x32: PR middle-end/47364: internal compiler error: in emit_move_insn, at expr.c:3355

2011-06-14 Thread Richard Guenther

On Sun, Jun 12, 2011 at 6:28 PM, H.J. Lu  wrote:
> On Sun, Jun 12, 2011 at 7:33 AM, H.J. Lu  wrote:
>> On Sun, Jun 12, 2011 at 7:00 AM, H.J. Lu  wrote:
>>> On Sun, Jun 12, 2011 at 6:50 AM, Richard Guenther
>>>  wrote:
 On Sun, Jun 12, 2011 at 3:18 PM, H.J. Lu  wrote:
> On Sun, Jun 12, 2011 at 3:48 AM, Richard Guenther
>  wrote:
>> On Sat, Jun 11, 2011 at 5:09 PM, H.J. Lu  wrote:
>>> Hi,
>>>
>>> expand_builtin_strlen has
>>>
>>> src_reg = gen_reg_rtx (Pmode);
>>> ...
>>> pat = expand_expr (src, src_reg, ptr_mode, EXPAND_NORMAL);
>>> if (pat != src_reg)
>>>  emit_move_insn (src_reg, pat);
>>>
>>> But src_reg may be in ptr_mode, which may not be the same as Pmode.
>>> This patch checks it.  OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> H.J.
>>> ---
>>> 2011-06-11  H.J. Lu  
>>>
>>>        PR middle-end/47364
>>>        * builtins.c (expand_builtin_strlen): Properly handle target
>>>        not in Pmode.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index 7b24a0c..4e2cf31 100644
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -2941,7 +2941,11 @@ expand_builtin_strlen (tree exp, rtx target,
>>>       start_sequence ();
>>>       pat = expand_expr (src, src_reg, ptr_mode, EXPAND_NORMAL);
>>>       if (pat != src_reg)
>>> -       emit_move_insn (src_reg, pat);
>>> +       {
>>> +         if (GET_MODE (pat) != Pmode)
>>> +           pat = convert_to_mode (Pmode, pat, 1);
>>
>> Shouldn't this be POINTERS_EXTEND_UNSIGNED instead of 1?
>>
>>> +         emit_move_insn (src_reg, pat);
>>
>> Why not use convert_move unconditionally?
>>
>> Or, why not expand src in Pmode from the start?  After all, src_reg is
>> created as Pmode reg.
>>
>
> This patch works for my testcase.  OK for trunk?

 Ok if it passes bootstrap & regtest on a ptr_mode != Pmode target.

>>>
>>> Only the following targets expand strlen:
>>>
>>> avr/avr.md:(define_expand "strlenhi"
>>> avr/avr.md:(define_insn "*strlenhi"
>>> i386/i386.md:(define_expand "strlen"
>>> i386/i386.md: if (ix86_expand_strlen (operands[0], operands[1],
>>> operands[2], operands[3]))
>>> i386/i386.md:(define_expand "strlenqi_1"
>>> i386/i386.md:(define_insn "*strlenqi_1"
>>> rs6000/rs6000.md:(define_expand "strlensi"
>>> s390/s390.md:; strlenM instruction pattern(s).
>>> s390/s390.md:(define_expand "strlen"
>>> s390/s390.md:(define_insn "*strlen"
>>>
>>> None of them, except for my x32 port, are ptr_mode != Pmode targets.
>>> I will bootstrap and test it on my x32 branch.
>>>
>>
>> It doesn't work on x32. I got
>>
>> /export/gnu/import/git/gcc-x32/libssp/gets-chk.c:74:14: internal
>> compiler error: in emit_move_insn, at expr.c:3319
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> See  for instructions.
>>
>> How about this patch?
>>
>> Thanks.
>
> No regressions on x32 branch.  OK for trunk?

Does it also work when the expansion is done in Pmode in the first
place?  If so, ok with that change.

Thanks,
Richard.

> Thanks.
>
>> --
>> H.J.
>> ---
>> 2011-06-12  H.J. Lu  
>>
>>        PR middle-end/47364
>>        * builtins.c (expand_builtin_strlen): Properly handle target
>>        not in Pmode.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index 7b24a0c..a2f175d 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -2941,7 +2941,14 @@ expand_builtin_strlen (tree exp, rtx target,
>>       start_sequence ();
>>       pat = expand_expr (src, src_reg, ptr_mode, EXPAND_NORMAL);
>>       if (pat != src_reg)
>> -       emit_move_insn (src_reg, pat);
>> +       {
>> +#ifdef POINTERS_EXTEND_UNSIGNED
>> +         if (GET_MODE (pat) != Pmode)
>> +           pat = convert_to_mode (Pmode, pat,
>> +                                  POINTERS_EXTEND_UNSIGNED);
>> +#endif
>> +         emit_move_insn (src_reg, pat);
>> +       }
>>       pat = get_insns ();
>>       end_sequence ();
>>
>
>
>
> --
> H.J.
>
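
On the question of passing POINTERS_EXTEND_UNSIGNED rather than a
hard-coded 1: the two extensions only differ once bit 31 of the narrow
pointer is set.  A rough model in plain C, assuming a 32-bit ptr_mode
and a 64-bit Pmode; the function names are hypothetical and the flag is
modelled as 0 or 1 only, which is narrower than what a target may
define.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: the upper 32 bits become 0.  */
uint64_t
extend_unsigned (uint32_t ptr)
{
  return (uint64_t) ptr;
}

/* Sign extension: the upper 32 bits copy bit 31.  */
uint64_t
extend_signed (uint32_t ptr)
{
  uint64_t wide = ptr;
  if (ptr & UINT32_C (0x80000000))
    wide |= UINT64_C (0xFFFFFFFF00000000);
  return wide;
}

/* Pick the extension from a target-wide setting instead of assuming
   zero extension at every call site.  */
uint64_t
widen_pointer (uint32_t ptr, int extend_unsigned_p)
{
  assert (extend_unsigned_p == 0 || extend_unsigned_p == 1);
  return extend_unsigned_p ? extend_unsigned (ptr) : extend_signed (ptr);
}
```

For pointers below 0x80000000 both settings give the same wide value,
so a wrong constant can go unnoticed in light testing.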

Re: [Design notes, RFC] Address-lowering prototype design (PR46556)

2011-06-14 Thread Richard Guenther

On Tue, Jun 14, 2011 at 4:18 PM, William J. Schmidt
 wrote:
> On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
>> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
>>  wrote:
>> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
>> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
>> >>  wrote:
>> >
>> > 
>> >
>> >> >> > Loss of aliasing information
>> >> >> > 
>> >> >> > The most serious problem I've run into is degraded performance due to poorer
>> >> >> > instruction scheduling choices.  I tracked this down to
>> >> >> > alias.c:nonoverlapping_component_refs_p.
>> >> >> >
>> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
>> >> >> > that they access different fields of the same structure.  This is done using
>> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
>> >> >> > translated into the rtx's during expand.  When address lowering is not
>> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
>> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
>> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
>> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
>> >> >> > field references are disjoint.
>> >> >> >
>> >> >> > This has severe consequences for performance, and has to be dealt with if
>> >> >> > address lowering is to be successful.
>> >> >> >
>> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
>> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
>> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
>> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
>> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
>> >> >> > essentially duplicates the behavior in the absence of address lowering.
>> >> >>
>> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
>> >> >> of problems.  Removing it didn't cause much degradation because we now
>> >> >> preserve points-to information.
>> >> >>
>> >> >> Originally I played with lowering all memory accesses to MEM_REFs
>> >> >> (see the old mem-ref branch), and the loss of type-based alias
>> >> >> disambiguation was indeed an issue.
>> >> >>
>> >> >> But - I definitely do not like the idea of preserving something similar
>> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
>> >> >> we derive from it.  We keep the original access type that we can use
>> >> >> for TBAA but do not retain knowledge on whether the type of the
>> >> >> MEM_REF is valid for TBAA or if it is view-converted.
>> >> >
>> >> > Yes, I really don't like what I have at the moment, either.  I put it in
>> >> > place as a stopgap to let me proceed to look for other performance
>> >> > problems.
>> >> >
>> >> > The question is how we can infer useful information for TBAA from the
>> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
>> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
>> >> > many cases to maintain correctness, and couldn't prove the type
>> >> > information for the important cases in SPEC I was trying to address.
>> >> >
>> >> > Unfortunately, the whole design goes down the drain if we can't find a
>> >> > way to solve the TBAA issue.  The performance degradations are too
>> >> > costly.
>> >>
>> >> If you look at what basic TBAA the alias oracle performs then it boils
>> >> down to the fact that get_alias_set for a.b.c might end up using the
>> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
>> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
>> >> of the outermost valid type and that of the innermost as both are
>> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
>> >> only have storage for one such alias-set.  Thus my idea at some point
>> >> was to store the other one as well in some form.  It will not be
>> >> the full information (after all, the complete access path does provide
>> >> some extra information - see aliasing_component_refs_p).
>> >
>> > This is what concerns me.  TBAA information for the outer and inner
>> > components doesn't seem sufficient to provide what
>> > nonoverlapping_component_refs_p is currently able to prove.  The latter
>> > searches for a common RECORD_TYPE somewhere along the two access paths,
>> > and then disambiguates if the two associated referenced fields differ.
>> > For a simple case like "struct x { int a; int b; };", a and b have the
>> > same type and alias-set, so the alia

Re: RFC: Fix GCSE exp_equiv_p on MEMs with different MEM_ATTRS (PR rtl-optimization/49390)

2011-06-14 Thread Jakub Jelinek

On Tue, Jun 14, 2011 at 11:49:08AM +0200, Richard Guenther wrote:
> So I'd say we revert your patch for now and if somebody feels like
> implementing the above ...

Ok, here is what I've bootstrapped/regtested on x86_64-linux and i686-linux
and committed to trunk and 4.6 branch:

2011-06-14  Jakub Jelinek  

PR rtl-optimization/49390
Revert:
2010-06-29  Bernd Schmidt  

* cse.c (exp_equiv_p): For MEMs, if for_gcse, only compare
MEM_ALIAS_SET.

* gcc.c-torture/execute/pr49390.c: New test.

--- gcc/cse.c.jj(revision 161534)
+++ gcc/cse.c   (revision 161533)
@@ -2669,16 +2669,26 @@
 case MEM:
   if (for_gcse)
{
- /* Can't merge two expressions in different alias sets, since we
-can decide that the expression is transparent in a block when
-it isn't, due to it being set with the different alias set.  */
- if (MEM_ALIAS_SET (x) != MEM_ALIAS_SET (y))
-   return 0;
-
  /* A volatile mem should not be considered equivalent to any
 other.  */
  if (MEM_VOLATILE_P (x) || MEM_VOLATILE_P (y))
return 0;
+
+ /* Can't merge two expressions in different alias sets, since we
+can decide that the expression is transparent in a block when
+it isn't, due to it being set with the different alias set.
+
+Also, can't merge two expressions with different MEM_ATTRS.
+They could e.g. be two different entities allocated into the
+same space on the stack (see e.g. PR25130).  In that case, the
+MEM addresses can be the same, even though the two MEMs are
+absolutely not equivalent.
+
+But because really all MEM attributes should be the same for
+equivalent MEMs, we just use the invariant that MEMs that have
+the same attributes share the same mem_attrs data structure.  */
+ if (MEM_ATTRS (x) != MEM_ATTRS (y))
+   return 0;
}
   break;
 
--- gcc/testsuite/gcc.c-torture/execute/pr49390.c.jj 2011-06-13 17:28:09.0 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr49390.c   2011-06-13 17:27:49.0 +0200
@@ -0,0 +1,88 @@
+/* PR rtl-optimization/49390 */
+
+struct S { unsigned int s1; unsigned int s2; };
+struct T { unsigned int t1; struct S t2; };
+struct U { unsigned short u1; unsigned short u2; };
+struct V { struct U v1; struct T v2; };
+struct S a;
+char *b;
+union { char b[64]; struct V v; } u;
+volatile int v;
+extern void abort (void);
+
+__attribute__((noinline, noclone)) void
+foo (int x, void *y, unsigned int z, unsigned int w)
+{
+  if (x != 4 || y != (void *) &u.v.v2)
+abort ();
+  v = z + w;
+  v = 16384;
+}
+
+__attribute__((noinline, noclone)) void
+bar (struct S x)
+{
+  v = x.s1;
+  v = x.s2;
+}
+
+__attribute__((noinline, noclone)) int
+baz (struct S *x)
+{
+  v = x->s1;
+  v = x->s2;
+  v = 0;
+  return v + 1;
+}
+
+__attribute__((noinline, noclone)) void
+test (struct S *c)
+{
+  struct T *d;
+  struct S e = a;
+  unsigned int f, g;
+  if (c == 0)
+c = &e;
+  else
+{
+  if (c->s2 % 8192 <= 15 || (8192 - c->s2 % 8192) <= 31)
+   foo (1, 0, c->s1, c->s2);
+}
+  if (!baz (c))
+return;
+  g = (((struct U *) b)->u2 & 2) ? 32 : __builtin_offsetof (struct V, v2);
+  f = c->s2 % 8192;
+  if (f == 0)
+{
+  e.s2 += g;
+  f = g;
+}
+  else if (f < g)
+{
+  foo (2, 0, c->s1, c->s2);
+  return;
+}
+  if ((((struct U *) b)->u2 & 1) && f == g)
+{
+  bar (*c);
+  foo (3, 0, c->s1, c->s2);
+  return;
+}
+  d = (struct T *) (b + c->s2 % 8192);
+  if (d->t2.s1 >= c->s1 && (d->t2.s1 != c->s1 || d->t2.s2 >= c->s2))
+foo (4, d, c->s1, c->s2);
+  return;
+}
+
+int
+main ()
+{
+  struct S *c = 0;
+  asm ("" : "+r" (c) : "r" (&a));
+  u.v.v2.t2.s1 = 8192;
+  b = u.b;
+  test (c);
+  if (v != 16384)
+abort ();
+  return 0;
+}

Jakub

Unreviewed libffi patch

2011-06-14 Thread Rainer Orth

The following patch has remained unreviewed for a week:

[libffi] Fix libffi.call/huge_struct.c on Tru64 UNIX
http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00644.html

It needs a libffi maintainer or global reviewer.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: PING^4 APPROVED patch for AMD64 targets running GNU/kFreeBSD, anyone?

2011-06-14 Thread Uros Bizjak

Hello!

> This patch for AMD64 targets running GNU/kFreeBSD has been approved
> already, would anyone be so kind to commit it?  I'm afraid I don't have
> write perms currently.

I have committed your patch to SVN mainline after bootstrapping it on
x86_64-pc-linux-gnu.

Thanks,
Uros.

RFA PR middle-end/48770

2011-06-14 Thread Jeff Law


This version incorporates suggestions from Bernd.  Basically we have
reload1.c set reload_completed internally rather than deferring it into
ira.c.  That allows the call to reload() to return whether or not a DCE
pass is desirable at the end of reload.

That in turn allows us to avoid the DF clumsiness of the previous version.
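
The control flow described above can be modelled in a few lines of standalone C. This is only a sketch, not GCC code: `reload_sketch`, `ira_sketch`, and the `dce_runs` counter are hypothetical stand-ins for reload (), the tail of ira (), and run_fast_dce ().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC globals; not the real definitions.  */
static int reload_completed;
static int optimize = 1;
static int dce_runs;

/* Models reload (): it sets reload_completed itself and reports
   whether a DCE pass afterwards would be worthwhile.  */
static bool
reload_sketch (bool deleted_dead_insn)
{
  bool need_dce = deleted_dead_insn;
  reload_completed = 1;
  return need_dce;
}

/* Models the tail of ira (): run fast DCE only when reload asked
   for it and we are optimizing.  */
static void
ira_sketch (bool deleted_dead_insn)
{
  bool need_dce = reload_sketch (deleted_dead_insn);
  if (need_dce && optimize)
    dce_runs++;  /* stands in for run_fast_dce () */
}
```

The point of the shape is that the caller no longer needs to inspect DF state to decide whether to run DCE; the boolean return value carries that decision.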

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.

Bernd is still seeing some differences on mips64-linux; I've been unable
to reproduce those.  Bernd, if you can send me the dump files privately,
I'm more than happy to take a look at any remaining codegen differences
this patch is triggering.

Thanks,
Jeff
PR middle-end/48770
* reload.h (reload): Change to return a bool.
* ira.c (ira): If requested by reload, run a fast DCE pass after
reload has completed.  Fix comment typo.
* reload1.c (need_dce): New file scoped static.
(reload): Set reload_completed here.  Return whether or not a DCE
pass after reload is needed.
(delete_dead_insn): Set need_dce as needed.

PR middle-end/48770
* gcc.dg/pr48770.c: New test.

Index: reload.h
===
*** reload.h(revision 174696)
--- reload.h(working copy)
*** extern void reload_cse_regs (rtx);
*** 420,426 
  extern void init_reload (void);
  
  /* The reload pass itself.  */
! extern int reload (rtx, int);
  
  /* Mark the slots in regs_ever_live for the hard regs
 used by pseudo-reg number REGNO.  */
--- 420,426 
  extern void init_reload (void);
  
  /* The reload pass itself.  */
! extern bool reload (rtx, int);
  
  /* Mark the slots in regs_ever_live for the hard regs
 used by pseudo-reg number REGNO.  */
Index: testsuite/gcc.dg/pr48770.c
===
*** testsuite/gcc.dg/pr48770.c  (revision 0)
--- testsuite/gcc.dg/pr48770.c  (revision 0)
***
*** 0 
--- 1,21 
+ /* { dg-do run } */
+ /* { dg-options "-O -fprofile-arcs -fPIC -fno-dce -fno-forward-propagate" } */
+ 
+ int test_goto2 (int f)
+ {
+   int i;
+   for (i = 0; ({_Bool a = i < 10;a;}); i++)
+   {
+ if (i == f)
+   goto lab2;
+   }
+   return 4;
+ lab2:
+   return 8;
+ }
+ 
+ int main ()
+ {
+   test_goto2 (30);
+   return 0;
+ }
Index: ira.c
===
*** ira.c   (revision 174759)
--- ira.c   (working copy)
*** along with GCC; see the file COPYING3.  
*** 383,388 
--- 383,389 
  #include "integrate.h"
  #include "ggc.h"
  #include "ira-int.h"
+ #include "dce.h"
  
  
  struct target_ira default_target_ira;
*** ira (FILE *f)
*** 3526,3531 
--- 3527,3533 
int rebuild_p;
int saved_flag_ira_share_spill_slots;
basic_block bb;
+   bool need_dce;
  
timevar_push (TV_IRA);
  
*** ira (FILE *f)
*** 3717,3723 
df_set_flags (DF_NO_INSN_RESCAN);
build_insn_chain ();
  
!   reload_completed = !reload (get_insns (), ira_conflicts_p);
  
timevar_pop (TV_RELOAD);
  
--- 3719,3725 
df_set_flags (DF_NO_INSN_RESCAN);
build_insn_chain ();
  
!   need_dce = reload (get_insns (), ira_conflicts_p);
  
timevar_pop (TV_RELOAD);
  
*** ira (FILE *f)
*** 3760,3766 
  #endif
  
/* The code after the reload has changed so much that at this point
!  we might as well just rescan everything.  Not that
   df_rescan_all_insns is not going to help here because it does not
   touch the artificial uses and defs.  */
df_finish_pass (true);
--- 3762,3768 
  #endif
  
/* The code after the reload has changed so much that at this point
!  we might as well just rescan everything.  Note that
   df_rescan_all_insns is not going to help here because it does not
   touch the artificial uses and defs.  */
df_finish_pass (true);
*** ira (FILE *f)
*** 3772,3777 
--- 3774,3782 
if (optimize)
  df_analyze ();
  
+   if (need_dce && optimize)
+ run_fast_dce ();
+ 
timevar_pop (TV_IRA);
  }
  
Index: reload1.c
===
*** reload1.c   (revision 174759)
--- reload1.c   (working copy)
*** static char *reload_insn_firstobj;
*** 250,255 
--- 250,259

Re: Unreviewed libffi patch

2011-06-14 Thread Andreas Tobler


On 14.06.11 17:22, Rainer Orth wrote:

> The following patch has remained unreviewed for a week:
>
> [libffi] Fix libffi.call/huge_struct.c on Tru64 UNIX
>  http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00644.html
>
> It needs a libffi maintainer or global reviewer.



From the test suite point of view it looks OK. Verified with a test run on
x86_64 darwin.


Thanks,
Andreas

Re: Unreviewed libffi patch

2011-06-14 Thread Andrew Haley

On 06/14/2011 04:22 PM, Rainer Orth wrote:
> The following patch has remained unreviewed for a week:

I think it wasn't cc'd to libffi-disc...@sourceware.org

>   [libffi] Fix libffi.call/huge_struct.c on Tru64 UNIX
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00644.html
> 
> It needs a libffi maintainer or global reviewer.

This is OK.

Andrew.

[PATCH] Ensure incoming location is available in debug info for parameters (PR debug/49382)

2011-06-14 Thread Jakub Jelinek

Hi!

As detailed in the PR, when gdb uses call site info to print the values
originally passed to parameters instead of their current values, and the
parameter is already modified before the first real instruction in the
function, gdb will find only the modified value there.
E.g. void foo (int x) { x++; }
or the larger testcase in the PR, where the first insn in the function
is call (x++); and x is unused afterwards.
In this case we say x lives in DW_OP_breg5 1 DW_OP_stack_value
from the beginning of the function till the end (in the first case)
or the middle of the call (in the PR testcase).
Unfortunately that means GDB doesn't know where x was originally
passed and thus can't look up in call site info what was passed to it.

This patch special-cases parameters, so that the very first
location in a VAR_LOCATION note is emitted even as an empty range
and is not optimized away even if some other VAR_LOCATION note for
the parameter precedes the first real insn.
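
The effect of the force flag on location-list output can be sketched in standalone C. This is a simplified model, not the dwarf2out.c code: `struct loc_range` and `count_emitted` are hypothetical names, and the labels are plain strings compared with strcmp in the same way the patched output_loc_list compares begin and end.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified location-list entry: begin/end are label
   names, force mirrors the new field added by the patch.  */
struct loc_range
{
  const char *begin;
  const char *end;
  bool force;
};

/* Count the entries that would be emitted: empty ranges
   (begin == end) are skipped unless force is set.  */
static size_t
count_emitted (const struct loc_range *r, size_t n)
{
  size_t emitted = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (r[i].begin, r[i].end) == 0 && !r[i].force)
        continue;
      emitted++;
    }
  return emitted;
}
```

With a forced empty first range for a parameter, the incoming location survives in the list even though it covers no instructions, which is what a debugger needs to match the parameter against call site info.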

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-06-14  Jakub Jelinek  

PR debug/49382
* dwarf2out.c (dw_loc_list_node): Add force field.
(add_var_loc_to_decl): For PARM_DECL, attempt to keep
the incoming location in the list, even if it is modified
before first real insn.
(output_loc_list): Emit empty ranges with force flag set.
(dw_loc_list): If first range of a PARM_DECL is empty,
set force flag.

--- gcc/dwarf2out.c.jj  2011-06-09 19:15:26.0 +0200
+++ gcc/dwarf2out.c 2011-06-14 12:04:39.0 +0200
@@ -4466,6 +4466,9 @@ typedef struct GTY(()) dw_loc_list_struc
   /* True if this list has been replaced by dw_loc_next.  */
   bool replaced;
   bool emitted;
+  /* True if the range should be emitted even if begin and end
+ are the same.  */
+  bool force;
 } dw_loc_list_node;
 
 static dw_loc_descr_ref int_loc_descriptor (HOST_WIDE_INT);
@@ -8619,7 +8622,30 @@ add_var_loc_to_decl (tree decl, rtx loc_
   else
 temp = (var_loc_list *) *slot;
 
-  if (temp->last)
+  /* For PARM_DECLs try to keep around the original incoming value,
+ even if that means we'll emit a zero-range .debug_loc entry.  */
+  if (temp->last
+  && temp->first == temp->last
+  && TREE_CODE (decl) == PARM_DECL
+  && GET_CODE (temp->first->loc) == NOTE
+  && NOTE_VAR_LOCATION_DECL (temp->first->loc) == decl
+  && DECL_INCOMING_RTL (decl)
+  && NOTE_VAR_LOCATION_LOC (temp->first->loc)
+  && GET_CODE (NOTE_VAR_LOCATION_LOC (temp->first->loc))
+== GET_CODE (DECL_INCOMING_RTL (decl))
+  && prev_real_insn (temp->first->loc) == NULL_RTX
+  && (bitsize != -1
+ || !rtx_equal_p (NOTE_VAR_LOCATION_LOC (temp->first->loc),
+  NOTE_VAR_LOCATION_LOC (loc_note))
+ || (NOTE_VAR_LOCATION_STATUS (temp->first->loc)
+ != NOTE_VAR_LOCATION_STATUS (loc_note))))
+{
+  loc = ggc_alloc_cleared_var_loc_node ();
+  temp->first->next = loc;
+  temp->last = loc;
+  loc->loc = construct_piece_list (loc_note, bitpos, bitsize);
+}
+  else if (temp->last)
 {
   struct var_loc_node *last = temp->last, *unused = NULL;
   rtx *piece_loc = NULL, last_loc_note;
@@ -8665,7 +8691,9 @@ add_var_loc_to_decl (tree decl, rtx loc_
}
  else
{
- gcc_assert (temp->first == temp->last);
+ gcc_assert (temp->first == temp->last
+ || (temp->first->next == temp->last
+ && TREE_CODE (decl) == PARM_DECL));
  memset (temp->last, '\0', sizeof (*temp->last));
  temp->last->loc = construct_piece_list (loc_note, bitpos, bitsize);
  return temp->last;
@@ -11392,7 +11420,7 @@ output_loc_list (dw_loc_list_ref list_he
 {
   unsigned long size;
   /* Don't output an entry that starts and ends at the same address.  */
-  if (strcmp (curr->begin, curr->end) == 0)
+  if (strcmp (curr->begin, curr->end) == 0 && !curr->force)
continue;
   if (!have_multiple_function_sections)
{
@@ -16087,6 +16115,11 @@ dw_loc_list (var_loc_list *loc_list, tre
  }
 
*listp = new_loc_list (descr, node->label, endname, secname);
+   if (TREE_CODE (decl) == PARM_DECL
+   && node == loc_list->first
+   && GET_CODE (node->loc) == NOTE
+   && strcmp (node->label, endname) == 0)
+ (*listp)->force = true;
listp = &(*listp)->dw_loc_next;
 
if (range_across_switch)

Jakub

Re: [testsuite]: Skip tests for targets with int < 32 bits

2011-06-14 Thread Mike Stump

On Jun 14, 2011, at 2:20 AM, Georg-Johann Lay wrote:

> testsuite/
> 
>   * gcc.c-torture/execute/cmpsi-2.c: Undo 172757.

Please always include the PR number in the changelog entries when there is one.
This autolinks the work to the PR.  Use the exact formatting found in the
changelog file.  It goes just before the above line.

Re: [PATCH] Only run pr48377.c testcase on i?86/x86_64

2011-06-14 Thread Jakub Jelinek

On Tue, Jun 14, 2011 at 04:52:18PM +0200, Eric Botcazou wrote:
> > Well, Steve has a patch for non_strict_align effective_target
> > in http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00673.html
> > (with s/strict_align/non_strict_align/g ), I was hoping it would be
> > reviewed and I'd just adjust the testcase to use it as well.
> 
> Would it be applied to the 4.6 branch as well?  If no, I think you should
> apply your patch to trunk and 4.6 branch and let Steve adjust it on trunk later.

I'd say it should be applied there as well.

Here is what I've just bootstrapped/regtested, Steve's patch with that
s/strict_align/non_strict_align/g plus a smallish change on top of that.

Mike, is this ok for trunk/4.6?

2011-06-14  Jakub Jelinek  

PR tree-optimization/48377
* gcc.dg/vect/pr48377.c: Add dg-require-effective-target
non_strict_align.

2011-06-14  Steve Ellcey  

PR middle-end/49191
* lib/target-supports.exp (check_effective_target_non_strict_align):
New.
* gcc.dg/memcpy-3.c: Add dg-require-effective-target non_strict_align.

--- gcc/testsuite/lib/target-supports.exp   (revision 174336)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -3901,3 +3901,11 @@
 return 1
 }
 
+proc check_effective_target_non_strict_align {} {
+return [check_no_compiler_messages non_strict_align assembly {
+   char *y;
+   typedef char __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))) c;
+   c *z;
+   void foo(void) { z = (c *) y; }
+} "-Wcast-align"]
+}
--- gcc/testsuite/gcc.dg/memcpy-3.c (revision 174336)
+++ gcc/testsuite/gcc.dg/memcpy-3.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-require-effective-target non_strict_align } */
 
 int get_int(const void *p)
 {
--- gcc/testsuite/gcc.dg/vect/pr48377.c.jj  2011-05-02 18:39:10.0 +0200
+++ gcc/testsuite/gcc.dg/vect/pr48377.c 2011-06-03 13:19:53.0 +0200
@@ -1,4 +1,5 @@
 /* PR tree-optimization/48377 */
+/* { dg-require-effective-target non_strict_align } */
 
 typedef unsigned int U __attribute__((__aligned__ (1), __may_alias__));
 


Jakub

Re: Dump before flag

2011-06-14 Thread Xinliang David Li

On Tue, Jun 14, 2011 at 6:58 AM, Richard Guenther
 wrote:
> On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li  wrote:
>> This is the revised patch as suggested.
>>
>> How does it look?
>
>  }
>
> +static void
> +execute_function_dump (void *data ATTRIBUTE_UNUSED)
>
> function needs a comment.
>
> Ok with that change.
>
> Please always specify how you tested the patch - the past fallouts
> suggest you didn't do the required testing carefully.

I think I did -- the fallout was probably due to a different
'--enable-checking' setting.  I have now turned it to 'yes'.

Thanks,

David

>
> A changelog is missing as well.
>
> Thanks,
> Richard.
>
>> Thanks,
>>
>> David
>>
>> On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li  
>> wrote:
>>> On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther
>>>  wrote:
 On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li  
 wrote:
> See attached.

 Hmm.  I don't like how you still wire dumping in the TODO routines.
 Doesn't it work to just dump the body from pass_fini_dump_file ()?
 Or if that doesn't sound clean from (a subset of) places where it
 is called? (we might want to exclude the ipa read/write/summary
 stages)
>>>
>>> That may require another round of function traversal -- but probably
>>> not a big deal -- it sounds cleaner.
>>>
>>> David
>>>

 Richard.

> Thanks,
>
> David
>
> On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther
>  wrote:
>> On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li  
>> wrote:
>>> this is the patch that just removes the TODO_dump flag and forces it
>>> to dump. The original code cfun->last_verified = flags &
>>> TODO_verify_all looks weird -- depending on TODO_dump is set or not,
>>> the behavior of the update is different (when no other todo flags is
>>> set).
>>>
>>> Ok for trunk?
>>
>> -ENOPATCH.
>>
>> Richard.
>>
>>> David
>>>
>>> On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li  
>>> wrote:
 On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
  wrote:
> On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li 
>  wrote:
>> The following is the patch that does the job. Most of the changes are
>> just  removing TODO_dump_func. The major change is in passes.c and
>> tree-pass.h.
>>
>> -fdump-xxx-yyy-start       <-- dump before TODO_start
>> -fdump-xxx-yyy-before    <-- dump before main pass after TODO_pass
>> -fdump-xxx-yyy-after       <-- dump after main pass before TODO_finish
>> -fdump-xxx-yyy-finish      <-- dump after TODO_finish
>
> Can we bikeshed a bit more about these names?

 These names may be less confusing:

 before_preparation
 before
 after
 after_cleanup

 David

> "start" and "before"
> have no semantical difference to me ... as the dump before TODO_start
> of a pass and the dump after TODO_finish of the previous pass are
> identical (hopefully ;)), maybe merge those into a -between flag?
> If you'd specify it for a single pass then you'd get both -start and -finish
> (using your naming scheme).  Splitting that dump(s) to different files
> then might make sense (not sure about the name to use).
>
> Note that I find it extremely useful to have dumping done in
> chronological order - splitting some of it to different files destroys
> this, especially a dump after TODO_start or before TODO_finish
> should appear in the same file (or we could also start splitting
> individual TODO_ output into sub-dump-files).  I guess what would
> be nice instead would be a fancy dump-file viewer that could
> show diffs, hide things like SCEV output, etc.
>
> I suppose a patch that removes the dump TODO and unconditionally
> dumps at the current point would be a good preparation for this
> enhancing patch.
>
> Richard.
>
>> The default is 'finish'.
>>
>> Does it look ok?
>>
>> Thanks,
>>
>> David
>>
>> On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
>>  wrote:
>>> On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li 
>>>  wrote:
>
> Your patch doesn't really improve this but adds to the confusion.
>
> +  /* Override dump TODOs.  */
> +  if (dump_file && (pass->todo_flags_finish & TODO_dump_func)
> +      && (dump_flags & TDF_BEFORE))
> +    {
> +      pass->todo_flags_finish &= ~TODO_dump_func;
> +      pass->todo_flags_start |= TODO_dump_func;
> +    }
>
> and certainly writing to pass is not ok.  And

PATCH [6/n]: Prepare x32: PR middle-end/47449: Don't propagate hard register non-local goto save area

2011-06-14 Thread H.J. Lu

Hi,

The RTL-based forward propagation pass shouldn't propagate hard registers.
OK for trunk?
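
The added test relies on GCC's register numbering convention: hard registers are numbered below FIRST_PSEUDO_REGISTER and pseudos at or above it. A minimal standalone sketch of that check follows; the constant's value and the helper name `may_propagate_reg` are assumptions for illustration only, since the real FIRST_PSEUDO_REGISTER is defined per target.

```c
#include <stdbool.h>

/* Assumed value for illustration only; the real
   FIRST_PSEUDO_REGISTER is target-specific.  */
#define FIRST_PSEUDO_REGISTER_SKETCH 53u

/* Hypothetical helper: only pseudo registers qualify for
   forward propagation; hard registers are rejected.  */
static bool
may_propagate_reg (unsigned int regno)
{
  return regno >= FIRST_PSEUDO_REGISTER_SKETCH;
}
```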

Thanks.


H.J.
---
2011-06-14  H.J. Lu  

PR middle-end/47449
* fwprop.c (forward_propagate_subreg): Don't propagate hard
register nor zero/sign extended hard register.

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index b2fd955..c8009d0 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -1101,6 +1101,7 @@ forward_propagate_subreg (df_ref use, rtx def_insn, rtx def_set)
   src = SET_SRC (def_set);
   if (GET_CODE (src) == SUBREG
  && REG_P (SUBREG_REG (src))
+ && REGNO (SUBREG_REG (src)) >= FIRST_PSEUDO_REGISTER
  && GET_MODE (SUBREG_REG (src)) == use_mode
  && subreg_lowpart_p (src)
  && all_uses_available_at (def_insn, use_insn))
@@ -1119,6 +1120,7 @@ forward_propagate_subreg (df_ref use, rtx def_insn, rtx def_set)
   if ((GET_CODE (src) == ZERO_EXTEND
   || GET_CODE (src) == SIGN_EXTEND)
  && REG_P (XEXP (src, 0))
+ && REGNO (XEXP (src, 0)) >= FIRST_PSEUDO_REGISTER
  && GET_MODE (XEXP (src, 0)) == use_mode
  && !free_load_extend (src, def_insn)
  && all_uses_available_at (def_insn, use_insn))

Re: Unreviewed libffi patch

2011-06-14 Thread Rainer Orth

Andrew Haley  writes:

> On 06/14/2011 04:22 PM, Rainer Orth wrote:
>> The following patch has remained unreviewed for a week:
>
> I think it wasn't cc'd to libffi-disc...@sourceware.org

Right, I hadn't known/had forgotten about that since all my libffi fixes
happen in GCC context.  I'd only Cc'ed it to Anthony, but he isn't
listed as libffi maintainer in MAINTAINERS anymore as I just noticed.

>>  [libffi] Fix libffi.call/huge_struct.c on Tru64 UNIX
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00644.html
>> 
>> It needs a libffi maintainer or global reviewer.
>
> This is OK.

Installed, thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

PATCH [7/n]: Prepare x32: Use long long builtin for x86-64

2011-06-14 Thread H.J. Lu

Hi,

long may be 32-bit for x86-64, but long long is always 64-bit.  This
patch uses the long long builtins for 64-bit operands.  OK for trunk?
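
As a small illustration of why the `ll` variants are the safe choice for 64-bit operands, here is a standalone sketch. The wrapper names `clz64` and `ctz64` are hypothetical, and the code assumes GCC or a compatible compiler that provides `__builtin_clzll` and `__builtin_ctzll` and a 64-bit `unsigned long long`.

```c
/* Assumes GCC or a compatible compiler.  The argument must be
   nonzero; the builtins' result is undefined for 0.  */

/* Count leading zero bits of a 64-bit value.  */
static int
clz64 (unsigned long long x)
{
  return __builtin_clzll (x);
}

/* Count trailing zero bits of a 64-bit value.  */
static int
ctz64 (unsigned long long x)
{
  return __builtin_ctzll (x);
}
```

The `l` variants take `unsigned long`, so on an ABI where long is 32 bits they would silently truncate a 64-bit operand; the `ll` variants do not depend on the width of long.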

Thanks.


H.J.
---
2011-06-14  H.J. Lu  

* longlong.h (count_leading_zeros): Use long long builtin for
x86-64.
(count_trailing_zeros): Likewise.

diff --git a/gcc/longlong.h b/gcc/longlong.h
index 1bab76d..d5c0cd9 100644
--- a/gcc/longlong.h
+++ b/gcc/longlong.h
@@ -430,8 +430,8 @@ UDItype __umulsidi3 (USItype, USItype);
   : "0" ((UDItype) (n0)),  \
 "1" ((UDItype) (n1)),  \
 "rm" ((UDItype) (dv)))
-#define count_leading_zeros(count, x)  ((count) = __builtin_clzl (x))
-#define count_trailing_zeros(count, x) ((count) = __builtin_ctzl (x))
+#define count_leading_zeros(count, x)  ((count) = __builtin_clzll (x))
+#define count_trailing_zeros(count, x) ((count) = __builtin_ctzll (x))
 #define UMUL_TIME 40
 #define UDIV_TIME 40
 #endif /* x86_64 */

RFA minor DF cleanup

2011-06-14 Thread Jeff Law


As I've noted in prior messages, I'm looking to improve our path
isolation to improve code generation and reduce false positives from
warnings.

The patch that's been in my queue for some time now (and I suspect it's
the final patch to our current implementation of jump threading) is
capable of isolating more paths, but is certainly not capable of fully
isolating every optimizable path through the CFG and eliminating all
unexecutable paths through the CFG (neither of which is actually
desirable due to potential code bloat issues).

As a result of this better, but not full, isolation, we can end up
exposing a constant propagation in an unexecutable path through the CFG
that isn't detected as unexecutable.  Exposing that constant propagation
can in turn trigger a bogus warning from -Warray-bounds.

The problem is we might have something like this:

>   # BLOCK 11 freq:4946
>   # PRED: 9 [50.0%]  (false,exec) 10 [100.0%]  (fallthru,exec) 8 [28.0%]  (false,exec)
> Invalid sum of incoming frequencies 2819, should be 4946
>   # D.39048_1 = PHI <3(9), D.39048_19(10), 4294967295(8)>
>   # VUSE <.MEM_38(D)>
>   D.39016_24 = default_target_hard_regs.x_fixed_regs[D.39048_1];
> 


-Warray-bounds won't warn for this as it only triggers when we propagate
a constant for an array index and the constant is out of bounds of the
array.  In this case D.39048_1 is not a constant and thus
-Warray-bounds does not issue a warning.


The patch I've got queued up will isolate the path 8->9 (to optimize
elsewhere).  This results in a new block which looks like:

temp = PHI (4294967295);
D.39016_xx = default_target_hard_regs.x_fixed_regs[temp];

We then propagate the constant into the use of temp, triggering the
-Warray-bounds warning.

This is caused by this code fragment:

>   /* Any constant, or pseudo with constant equivalences, may
>  require reloading from memory using the pic register.  */
>   if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
>   && fixed_regs[PIC_OFFSET_TABLE_REGNUM])
> bitmap_set_bit (regular_block_artificial_uses, PIC_OFFSET_TABLE_REGNUM);

combined with this code from the x86 backend:

> #define PIC_OFFSET_TABLE_REGNUM \
>   ((TARGET_64BIT && ix86_cmodel == CM_SMALL_PIC)\
>|| !flag_pic ? INVALID_REGNUM\
>: reload_completed ? REGNO (pic_offset_table_rtx)\
>: REAL_PIC_OFFSET_TABLE_REGNUM)


While the new code can significantly improve path isolation, it's unable
to fully isolate the paths in this code, leading to the partial
isolation and exposing the constant propagation in the dead path which
triggers the -Warray-bounds warning.

I'm hoping the ideas I'm working on for revamping how we handle path
isolation may fix this, but it's hard to be sure right now.  In the mean
time, this patch fixes the instances where the next improvements to jump
threading expose the bogus -Warray-bounds warning.
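
The "manual CSE" idea can be shown in a standalone sketch: evaluate the conditional macro once into a local, so the validity check and the array index use the same value. All names ending in `_sketch` are hypothetical simplifications; the macro below only loosely mirrors the shape of the x86 PIC_OFFSET_TABLE_REGNUM definition.

```c
#define INVALID_REGNUM_SKETCH (~0u)

/* Hypothetical stand-ins for compiler state.  */
static int flag_pic_sketch;
static unsigned int real_pic_regnum_sketch = 3;
static unsigned char fixed_regs_sketch[8];
static unsigned long hardware_regs_used_sketch;

/* A conditional macro, loosely shaped like the x86 one.  */
#define PIC_REGNUM_SKETCH \
  (!flag_pic_sketch ? INVALID_REGNUM_SKETCH : real_pic_regnum_sketch)

static void
mark_pic_reg_used (void)
{
  /* Evaluate the macro once; the check and the index below then
     share a single value instead of re-expanding the conditional.  */
  unsigned int regnum = PIC_REGNUM_SKETCH;

  if (regnum != INVALID_REGNUM_SKETCH && fixed_regs_sketch[regnum])
    hardware_regs_used_sketch |= 1ul << regnum;
}
```

With a single local, there is no second expansion of the conditional whose invalid arm could be propagated into the array index on a partially isolated path.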

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK for
mainline?

Thanks,
Jeff





* df-problems.c (df_lr_local_compute): Manually CSE
PIC_OFFSET_TABLE_REGNUM.
* df-scan.c (df_get_regular_block_artificial_uses): Likewise.
(df_get_entry_block_def_set, df_get_exit_block_use_set): Likewise.

Index: df-problems.c
===
*** df-problems.c   (revision 174927)
--- df-problems.c   (working copy)
*** df_lr_local_compute (bitmap all_blocks A
*** 906,911 
--- 906,912 
   blocks within infinite loops.  */
if (!reload_completed)
  {
+   unsigned int pic_offset_table_regnum = PIC_OFFSET_TABLE_REGNUM;
/* Any reference to any pseudo before reload is a potential
 reference of the frame pointer.  */
bitmap_set_bit (&df->hardware_regs_used, FRAME_POINTER_REGNUM);
*** df_lr_local_compute (bitmap all_blocks A
*** 919,927 
  
/* Any constant, or pseudo with constant equivalences, may
 require reloading from memory using the pic register.  */
!   if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM
! && fixed_regs[PIC_OFFSET_TABLE_REGNUM])
!   bitmap_set_bit (&df->hardware_regs_used, PIC_OFFSET_TABLE_REGNUM);
  }
  
EXECUTE_IF_SET_IN_BITMAP (df_lr->out_of_date_transfer_functions, 0, bb_index,

Re: [Design notes, RFC] Address-lowering prototype design (PR46556)

2011-06-14 Thread William J. Schmidt

On Tue, 2011-06-14 at 17:21 +0200, Richard Guenther wrote:
> On Tue, Jun 14, 2011 at 4:18 PM, William J. Schmidt
>  wrote:
> > On Tue, 2011-06-14 at 15:39 +0200, Richard Guenther wrote:
> >> On Fri, Jun 10, 2011 at 5:11 PM, William J. Schmidt
> >>  wrote:
> >> > On Tue, 2011-06-07 at 16:49 +0200, Richard Guenther wrote:
> >> >> On Tue, Jun 7, 2011 at 4:14 PM, William J. Schmidt
> >> >>  wrote:
> >> >
> >> > 
> >> >
> >> >> >> > Loss of aliasing information
> >> >> >> > 
> >> >> >> > The most serious problem I've run into is degraded performance due to poorer
> >> >> >> > instruction scheduling choices.  I tracked this down to
> >> >> >> > alias.c:nonoverlapping_component_refs_p.
> >> >> >> >
> >> >> >> > This code proves that two memory accesses don't overlap by attempting to prove
> >> >> >> > that they access different fields of the same structure.  This is done using
> >> >> >> > the MEM_EXPRs of the two rtx's, which record the expression trees that were
> >> >> >> > translated into the rtx's during expand.  When address lowering is not
> >> >> >> > present, a simple COMPONENT_REF will appear in the MEM_EXPR:  x.a, for
> >> >> >> > example.  However, address lowering changes the simple COMPONENT_REF into a
> >> >> >> > [TARGET_]MEM_REF that is no longer necessarily identifiable as a field
> >> >> >> > reference.  Thus the aliasing machinery can no longer prove that two such
> >> >> >> > field references are disjoint.
> >> >> >> >
> >> >> >> > This has severe consequences for performance, and has to be dealt with if
> >> >> >> > address lowering is to be successful.
> >> >> >> >
> >> >> >> > I've worked around this with an admittedly fragile solution; I'll discuss the
> >> >> >> > drawbacks below.  The idea is to construct a mapping from replacement mem_refs
> >> >> >> > to the original expressions that they replaced.  When a MEM_EXPR is being set
> >> >> >> > during expand, we first look up the mem_ref in the mapping.  If present, the
> >> >> >> > MEM_EXPR is set to the original expression, rather than to the mem_ref.  This
> >> >> >> > essentially duplicates the behavior in the absence of address lowering.
> >> >> >>
> >> >> >> Ick.  We had this in the past via TMR_ORIGINAL which caused all sorts
> >> >> >> of problems.  Removing it didn't cause much degradation because we now
> >> >> >> preserve points-to information.
> >> >> >>
> >> >> >> Originally I played with lowering all memory accesses to MEM_REFs
> >> >> >> (see the old mem-ref branch), and the loss of type-based alias
> >> >> >> disambiguation was indeed an issue.
> >> >> >>
> >> >> >> But - I definitely do not like the idea of preserving something similar
> >> >> >> to TMR_ORIGINAL.  Instead we can try preserving some information
> >> >> >> we derive from it.  We keep the original access type that we can use
> >> >> >> for TBAA but do not retain knowledge on whether the type of the
> >> >> >> MEM_REF is valid for TBAA or if it is view-converted.
> >> >> >
> >> >> > Yes, I really don't like what I have at the moment, either.  I put it in
> >> >> > place as a stopgap to let me proceed to look for other performance
> >> >> > problems.
> >> >> >
> >> >> > The question is how we can infer useful information for TBAA from the
> >> >> > MEM_REFs and TMRs.  I poked at trying to identify types and offsets from
> >> >> > the MEM_EXPRs, but this ended up being useless; I had to constrain too
> >> >> > many cases to maintain correctness, and couldn't prove the type
> >> >> > information for the important cases in SPEC I was trying to address.
> >> >> >
> >> >> > Unfortunately, the whole design goes down the drain if we can't find a
> >> >> > way to solve the TBAA issue.  The performance degradations are too
> >> >> > costly.
> >> >>
> >> >> If you look at what basic TBAA the alias oracle performs then it boils
> >> >> down to the fact that get_alias_set for a.b.c might end up using the
> >> >> alias-set of the type of C but for MEM[&a + 4] it will use the alias set
> >> >> of the type of a.  The tree alias-oracle extracts both alias sets, that
> >> >> of the outermost valid type and that of the innermost as both are
> >> >> equally useful.  But the MEM_REF (or TARGET_MEM_REF) tree
> >> >> only have storage for one such alias-set.  Thus my idea at some point
> >> >> was to store the other one as well in some form.  It will not be
> >> >> the full information (after all, the complete access path does provide
> >> >> some extra information - see aliasing_component_refs_p).
> >> >
> >> > This is what concerns me.  TBAA information for the outer and inner
> >> > components doesn't seem sufficient to provide what
> >> > nonoverlapping_component_refs_p is currently ab

[google] backport r174930 to google/main

2011-06-14 Thread Xinliang David Li

Backported r174930 to google/main.

David

Re: [PATCH] Ensure incoming location is available in debug info for parameters (PR debug/49382)

2011-06-14 Thread Jeff Law

On 06/14/11 09:51, Jakub Jelinek wrote:
> Hi!
> 
> As detailed in the PR, when gdb attempts to print originally passed
> values to parameters instead of current values using call site info,
> if the parameter is modified already before the first real instruction
> in the function, it will find there already the modified value.
> E.g. void foo (int x) { x++; }
> or the larger testcase in the PR where first insn in the function
> is call (x++); and x is unused afterwards.
> In this case we say x lives in DW_OP_breg5 1 DW_OP_stack_value
> from the beginning of the function till the end (in the first case)
> or middle of the call (in the PR testcase).
> Unfortunately that means GDB doesn't know where x has been originally
> passed and thus can't look up in call site info what was passed to it.
> 
> This patch special cases the parameters, such that the very first
> location in VAR_LOCATION note will be emitted even as empty range
> and won't be optimized away even if before the first real insn
> is some other VAR_LOCATION note for the parameter.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2011-06-14  Jakub Jelinek  
> 
>   PR debug/49382
>   * dwarf2out.c (dw_loc_list_node): Add force field.
>   (add_var_loc_to_decl): For PARM_DECL, attempt to keep
>   the incoming location in the list, even if it is modified
>   before first real insn.
>   (output_loc_list): Emit empty ranges with force flag set.
>   (dw_loc_list): If first range of a PARM_DECL is empty,
>   set force flag.
OK.
Jeff
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN94wFAAoJEBRtltQi2kC7UcsIAJZNzCJqYUB0/axMZlooxpBq
OCZZTM1m/GRH0NZMTOx5gebvfXyJazcbM2z/tQAaYvKNfFiQNus9W3shzSDW3jzP
vIFKnc4mMZIJnulrtZ1zCrxN6ahyCj3LPgOOhoIr/FJqCjetLDIeLexlbQPST2fo
UOIscpkfL8e9QztBivkcMCMc7EDsmiwAyZeHzzUrO+WUK6vnWUpqXR0cCFwOewyv
76c+ce+7/LW6BG50PmmdvuvaDOoyz1jrj4PUNxlNTdnuYm0IwL7FA43iE0I2Vk/0
TUgBHJgYxMTijk+TeXWknRO1b0WgBMZQTIvWYknxOUyeO3P7XUIfxGmykGYr5vw=
=0NBt
-END PGP SIGNATURE-
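
For readers following the PR debug/49382 discussion, here is a minimal sketch modelled on the `void foo (int x) { x++; }` example quoted above. The names `bump` and `caller` are hypothetical and not taken from the PR testcase, and the comments restate the thread's description rather than any verified debugger output.

```cpp
#include <cassert>

// Illustrative only: 'bump' is a hypothetical stand-in for the 'foo' example
// quoted in the thread.  The parameter is modified before anything else
// happens, so a debugger that looks only at the current location of 'x'
// sees the incremented value rather than the value the caller passed.
int bump(int x)
{
    x++;        // 'x' no longer holds the incoming value after this point
    return x;
}

// The caller knows what it passed.  Call site info can record that, but the
// debugger can use it only if it also knows where 'x' lived on entry.
int caller()
{
    int result = bump(41);
    assert(result == 42);   // incoming 41, incremented once
    return result;
}
```

The point of the patch, as described in the thread, is that the incoming location of `x` stays in the location list even though it is overwritten straight away.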

Re: PATCH [1/n]: Prepare x32: PR middle-end/47364: internal compiler error: in emit_move_insn, at expr.c:3355

2011-06-14 Thread H.J. Lu

On Tue, Jun 14, 2011 at 8:11 AM, Richard Guenther
 wrote:
> On Sun, Jun 12, 2011 at 6:28 PM, H.J. Lu  wrote:
>> On Sun, Jun 12, 2011 at 7:33 AM, H.J. Lu  wrote:
>>> On Sun, Jun 12, 2011 at 7:00 AM, H.J. Lu  wrote:
 On Sun, Jun 12, 2011 at 6:50 AM, Richard Guenther
  wrote:
> On Sun, Jun 12, 2011 at 3:18 PM, H.J. Lu  wrote:
>> On Sun, Jun 12, 2011 at 3:48 AM, Richard Guenther
>>  wrote:
>>> On Sat, Jun 11, 2011 at 5:09 PM, H.J. Lu  wrote:
 Hi,

 expand_builtin_strlen has

 src_reg = gen_reg_rtx (Pmode);
 ...
 pat = expand_expr (src, src_reg, ptr_mode, EXPAND_NORMAL);
 if (pat != src_reg)
  emit_move_insn (src_reg, pat);

 But src_reg may be in ptr_mode, which may not be the same as Pmode.
 This patch checks it.  OK for trunk?

 Thanks.


 H.J.
 ---
 2011-06-11  H.J. Lu  

        PR middle-end/47364
        * builtins.c (expand_builtin_strlen): Properly handle target
        not in Pmode.

 diff --git a/gcc/builtins.c b/gcc/builtins.c
 index 7b24a0c..4e2cf31 100644
 --- a/gcc/builtins.c
 +++ b/gcc/builtins.c
 @@ -2941,7 +2941,11 @@ expand_builtin_strlen (tree exp, rtx target,
       start_sequence ();
       pat = expand_expr (src, src_reg, ptr_mode, EXPAND_NORMAL);
       if (pat != src_reg)
 -       emit_move_insn (src_reg, pat);
 +       {
 +         if (GET_MODE (pat) != Pmode)
 +           pat = convert_to_mode (Pmode, pat, 1);
>>>
>>> Shouldn't this be POINTERS_EXTEND_UNSIGNED instead of 1?
>>>
 +         emit_move_insn (src_reg, pat);
>>>
>>> Why not use convert_move unconditionally?
>>>
>>> Or, why not expand src in Pmode from the start?  After all, src_reg is
>>> created as Pmode reg.
>>>
>>
>> This patch works for my testcase.  OK for trunk?
>
> Ok if it passes bootstrap & regtest on a ptr_mode != Pmode target.
>

 Only the following targets expand strlen:

 avr/avr.md:(define_expand "strlenhi"
 avr/avr.md:(define_insn "*strlenhi"
 i386/i386.md:(define_expand "strlen"
 i386/i386.md: if (ix86_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
 i386/i386.md:(define_expand "strlenqi_1"
 i386/i386.md:(define_insn "*strlenqi_1"
 rs6000/rs6000.md:(define_expand "strlensi"
 s390/s390.md:; strlenM instruction pattern(s).
 s390/s390.md:(define_expand "strlen"
 s390/s390.md:(define_insn "*strlen"

 None of them, except for my x32 port, are ptr_mode != Pmode targets.
 I will bootstrap and test it on my x32 branch.

>>>
>>> It doesn't work on x32. I got
>>>
>>> /export/gnu/import/git/gcc-x32/libssp/gets-chk.c:74:14: internal
>>> compiler error: in emit_move_insn, at expr.c:3319
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> See  for instructions.
>>>
>>> How about this patch?
>>>
>>> Thanks.
>>
>> No regressions on x32 branch.  OK for trunk?
>
> Does it work with also doing the expansion to Pmode in the first
> place?  If so, ok with that change.
>

This is the patch I checked in.

Thanks.


-- 
H.J.
---
Index: builtins.c
===
--- builtins.c  (revision 175033)
+++ builtins.c  (working copy)
@@ -2939,9 +2939,16 @@ expand_builtin_strlen (tree exp, rtx tar

   /* Now that we are assured of success, expand the source.  */
   start_sequence ();
-  pat = expand_expr (src, src_reg, ptr_mode, EXPAND_NORMAL);
+  pat = expand_expr (src, src_reg, Pmode, EXPAND_NORMAL);
   if (pat != src_reg)
-   emit_move_insn (src_reg, pat);
+   {
+#ifdef POINTERS_EXTEND_UNSIGNED
+ if (GET_MODE (pat) != Pmode)
+   pat = convert_to_mode (Pmode, pat,
+  POINTERS_EXTEND_UNSIGNED);
+#endif
+ emit_move_insn (src_reg, pat);
+   }
   pat = get_insns ();
   end_sequence ();

Index: ChangeLog
===
--- ChangeLog   (revision 175033)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2011-06-14  H.J. Lu  
+
+   PR middle-end/47364
+   * builtins.c (expand_builtin_strlen): Expand strlen to Pmode
+   and properly handle result not in Pmode.
+
 2011-06-14  Robert Millan  

* config/i386/kfreebsd-gnu.h: Resync with `config/i386/linux.h'.
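
The conversion step in the checked-in patch can be pictured with a small model. This is a sketch under stated assumptions, not GCC code: it assumes a 32-bit value standing in for ptr_mode and a 64-bit value standing in for Pmode (the thread only says the two modes differ on x32), and `widen_pointer` is a hypothetical helper whose flag plays the role of POINTERS_EXTEND_UNSIGNED.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not GCC code: widen a 32-bit pointer value (standing in
// for ptr_mode) to a 64-bit register value (standing in for Pmode).
std::uint64_t widen_pointer(std::uint32_t ptr32, bool extend_unsigned)
{
    if (extend_unsigned)
        return static_cast<std::uint64_t>(ptr32);  // zero-extension
    // Sign-extension: reinterpret as signed 32-bit, then widen.
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(ptr32)));
}

// Small self-check showing where the two choices differ.
void check_widen_pointer()
{
    // Low addresses widen identically either way.
    assert(widen_pointer(0x1000u, true) == widen_pointer(0x1000u, false));
    // Addresses with the top bit set do not.
    assert(widen_pointer(0x80000000u, true) == 0x0000000080000000ull);
    assert(widen_pointer(0x80000000u, false) == 0xFFFFFFFF80000000ull);
}
```

This is why the review asked for POINTERS_EXTEND_UNSIGNED rather than a hard-coded 1: the correct kind of extension is a property of the target.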

Re: Improve DSE in the presence of calls

2011-06-14 Thread Jeff Law

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 05/10/11 13:18, Easwaran Raman wrote:

>>> I am not sure I understand the problem here.  If there is a wild read
>>> from asm, the instruction has the wild_read flag set. The if statement
>>> checks if that flag is set and if so it clears the bitmap - which was
>>> the original behavior. Originally, only if read_rec is non NULL you
>>> need to recompute the kill set. Now, even if read_rec is NULL,
>>> non_frame_wild_read could be set requiring the kill set to be
>>> modified, which is what this patch does.  In fact, isn't what you have
>>> written above the equivalent to what is in the patch as '/* Leave this
>>> clause unchanged */' is the same as
>>>
>>>  if (dump_file)
>>>fprintf (dump_file, "regular read\n");
>>>  scan_reads_nospill (insn_info, v, NULL);
>>>
>>>
>>> -Easwaran
>>>
> 
>> Ping.  I have changed the test case to use int and added another test
>> case that shows DSE doesn't happen when  the struct instance is
>> volatile (wild_read gets set in that case)
> 
> 
> What's the purpose behind using uint64_t in the testcase?  Somehow I
> suspect using int64_t means the test is unlikely to work
> across targets with different word sizes.
Sorry for the exceedingly long wait.  Things have been a bit crazy the
last several weeks.

On a positive note, re-reading things now I think my objection/comment
was mis-guided.

Patch approved, and again, sorry for the absurdly long period of
non-responsiveness.

jeff
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJN942IAAoJEBRtltQi2kC7Fj4IAIUvXsEKHZEKHS2k/psJWyaM
Uo/vW3CLydRP0+Np/VVSwzHlmWfdWmOj1WPw1Svhvr4gP8BrZ13okVv5jbw1Hh3Y
R4mShXFK5eYzmGx5wL54hOze5zViN3gomNGbDAAhk6TCzNXmPyLT/6V1tLFTNhD5
6zOiW8pXh9ik6qTTCKbG0EMuJXDnIbYrJs4d/gHFerUgmRPc8adKjF3PCngD3F4r
40n9W/UxUejYUddavDW1fIdALWYc56F3glplFsII7SMnOmih8MTFYOvk6SZsLS5O
G2nzmnUuwt6tPWTyk9bpVKQi5dn8MmLkM13w22t36GKIg6OER2KfUdv44dgE7yw=
=o7AI
-END PGP SIGNATURE-

Re: Create common hooks structure shared between driver and cc1

2011-06-14 Thread Ian Lance Taylor

On Wed, May 25, 2011 at 12:21 PM, Joseph S. Myers
 wrote:
> Here is a revised version of my patch
>  to create
> the common hooks structure.  Tested in the same way as the original
> patch.  OK to commit?
>
> 2011-05-25  Joseph Myers  
>
>        * common/common-target-def.h, common/common-target.def,
>        common/common-target.h, common/config/default-common.c,
>        common/config/pa/pa-common.c: New files.
>        * Makefile.in (common_out_file, common_out_object_file,
>        COMMON_TARGET_H, COMMON_TARGET_DEF_H): New.
>        (OBJS-libcommon-target): Include $(common_out_object_file).
>        (prefix.o): Update dependencies.
>        ($(common_out_object_file), common/common-target-hooks-def.h,
>        s-common-target-hooks-def-h): New.
>        (s-tm-texi): Also check timestamp on common-target.def.
>        (build/genhooks.o): Update dependencies.
>        * config.gcc (common_out_file, target_has_targetm_common): Define.
>        * config/pa/som.h (ALWAYS_STRIP_DOTDOT): Replace with
>        TARGET_ALWAYS_STRIP_DOTDOT.
>        * configure.ac (common_out_object_file): Define.
>        (common_out_file, common_out_object_file): Substitute.
>        (common): Create directory.
>        * configure: Regenerate.
>        * doc/tm.texi.in (targetm_common): Document.
>        (TARGET_ALWAYS_STRIP_DOTDOT): Add @hook entry.
>        * doc/tm.texi: Regenerate.
>        * genhooks.c (hook_array): Also include common/common-target.def.
>        * prefix.c (tm.h): Don't include.
>        (common/common-target.h): Include.
>        (ALWAYS_STRIP_DOTDOT): Don't define.
>        (update_path): Use targetm_common.always_strip_dotdot instead of
>        ALWAYS_STRIP_DOTDOT.
>        * system.h (ALWAYS_STRIP_DOTDOT): Poison.

This is OK.

Thanks.

Ian

Re: Move option-related hooks to common structure

2011-06-14 Thread Ian Lance Taylor

On Fri, May 27, 2011 at 9:13 AM, Joseph S. Myers
 wrote:
>
> 2011-05-27  Joseph Myers  
>
>        * target-def.h (TARGET_HAVE_NAMED_SECTIONS): Move to
>        common/common-target-def.h.
>        * target.def (default_target_flags, handle_option,
>        supports_split_stack, optimization_table, init_struct,
>        except_unwind_info, unwind_tables_default, have_named_sections):
>        Move to common/common-target.def.
>        * target.h (enum opt_levels, struct default_options): Move to
>        common/common-target.h.
>        * targhooks.c (default_except_unwind_info,
>        dwarf2_except_unwind_info, sjlj_except_unwind_info,
>        default_target_handle_option, empty_optimization_table): Move to
>        common/common-targhooks.c.
>        * targhooks.h (default_except_unwind_info,
>        dwarf2_except_unwind_info, sjlj_except_unwind_info,
>        default_target_handle_option, empty_optimization_table): Move to
>        common/common-targhooks.h.
>        * common/common-target-def.h: Include common/common-targhooks.h.
>        (TARGET_HAVE_NAMED_SECTIONS): Define if TARGET_ASM_NAMED_SECTION
>        defined.
>        * common/common-target.def (handle_option, option_init_struct,
>        option_optimization_table, default_target_flags,
>        except_unwind_info, supports_split_stack, unwind_tables_default,
>        have_named_sections): Move from target.def.
>        (HOOK_PREFIX): Undefine at end of file.
>        * common/common-target.h: Include input.h.
>        (enum opt_levels, struct default_options): Move from target.h.
>        * common/common-targhooks.c, common/common-targhooks.h: New.
>        * config.gcc (target_has_targetm_common): Default to yes.
>        (moxie*): Set target_has_targetm_common=no.
>        (hppa*-*-*): Don't set target_has_targetm_common=yes.
>        * doc/tm.texi: Regenerate.
>        * Makefile.in (COMMON_TARGET_H): Add $(INPUT_H).
>        (C_TARGET_DEF_H): Add common/common-targhooks.h.
>        (GCC_OBJS): Remove vec.o.
>        (OBJS): Remove hooks.o and vec.o.
>        (OBJS-libcommon-target): Add vec.o, hooks.o and
>        common/common-targhooks.o.
>        (c-family/c-common.o, c-family/c-cppbuiltin.o, lto-opts.o, tree.o,
>        tree-tailcall.o, opts.o, toplev.o, varasm.o, function.o, except.o,
>        expr.o, explow.o, dbxout.o, dwarf2out.o, cfgrtl.o, haifa-sched.o,
>        cfglayout.o, $(out_object_file), $(common_out_object_file)):
>        Update dependencies.
>        (common/common-targhooks.o): New.
>        * common/config/default-common.c: Include tm.h.  Add FIXME
>        comment.
>        * common/config/pa/pa-common.c: Include more headers.  Take
>        copyright dates from pa.c.
>        (pa_option_optimization_table, pa_handle_option,
>        TARGET_OPTION_OPTIMIZATION_TABLE, TARGET_DEFAULT_TARGET_FLAGS,
>        TARGET_HANDLE_OPTION): Move from pa.c.
>        * common/config/alpha/alpha-common.c,
>        common/config/arm/arm-common.c, common/config/avr/avr-common.c,
>        common/config/bfin/bfin-common.c,
>        common/config/cris/cris-common.c,
>        common/config/fr30/fr30-common.c, common/config/frv/frv-common.c,
>        common/config/h8300/h8300-common.c,
>        common/config/i386/i386-common.c,
>        common/config/ia64/ia64-common.c,
>        common/config/iq2000/iq2000-common.c,
>        common/config/lm32/lm32-common.c,
>        common/config/m32c/m32c-common.c,
>        common/config/m32r/m32r-common.c,
>        common/config/m68k/m68k-common.c,
>        common/config/mcore/mcore-common.c,
>        common/config/mep/mep-common.c,
>        common/config/microblaze/microblaze-common.c,
>        common/config/mips/mips-common.c,
>        common/config/mmix/mmix-common.c,
>        common/config/mn10300/mn10300-common.c,
>        common/config/pdp11/pdp11-common.c,
>        common/config/picochip/picochip-common.c,
>        common/config/rs6000/rs6000-common.c,
>        common/config/rx/rx-common.c, common/config/s390/s390-common.c,
>        common/config/score/score-common.c, common/config/sh/sh-common.c,
>        common/config/sparc/sparc-common.c,
>        common/config/spu/spu-common.c, common/config/v850/v850-common.c,
>        common/config/vax/vax-common.c,
>        common/config/xstormy16/xstormy16-common.c,
>        common/config/xtensa/xtensa-common.c: New.
>        * config/alpha/alpha.c: Include common/common-target.h.
>        (alpha_option_optimization_table, alpha_handle_option,
>        TARGET_DEFAULT_TARGET_FLAGS, TARGET_HANDLE_OPTION,
>        TARGET_OPTION_OPTIMIZATION_TABLE): Move to alpha-common.c.
>        * config/arm/arm-protos.h (arm_except_unwind_info): Declare.
>        * config/arm/arm.c (arm_option_optimization_table,
>        TARGET_DEFAULT_TARGET_FLAGS, TARGET_OPTION_OPTIMIZATION_TABLE,
>        TARGET_EXCEPT_UNWIND_INFO, arm_except_unwind_info): Move to
>        arm-common.c.
>        * config/avr/avr.c (avr_option_optimization_table,
>        TARGET_OPTION_OPTIMIZATI

C++ PATCH for c++/49290 (ICE regression on (T)(ar+10))

2011-06-14 Thread Jason Merrill

In this testcase, we were hitting an assert that I put in to make sure 
that fold_indirect_ref_1 was doing its job and folding everything that 
ought to be folded.  But fold_indirect_ref_1 doesn't want to mess with 
type identity, so it can't fold if, say, the array element type has 
different cv-quals from the desired result.  After some discussion, I'm 
copying fold_indirect_ref_1 into the front end so I can be more flexible 
about type matching.


For 4.6 I'll just disable the assert to avoid the regression on 
non-constexpr code and treat the expression as non-constant.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit e1aaed01f033819e6dab19ef6c01bf9f249dc5d4
Author: Jason Merrill 
Date:   Tue Jun 7 00:00:36 2011 -0400

	PR c++/49290
	* semantics.c (cxx_fold_indirect_ref): Local, more permissive copy
	of fold_indirect_ref_1.
	(cxx_eval_indirect_ref): Use it.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 481318e..55f9519 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6755,28 +6755,16 @@ cxx_eval_vec_init (const constexpr_call *call, tree t,
because we're dealing with things like ADDR_EXPR of INTEGER_CST which
don't really make sense outside of constant expression evaluation.  Also
we want to allow folding to COMPONENT_REF, which could cause trouble
-   with TBAA in fold_indirect_ref_1.  */
+   with TBAA in fold_indirect_ref_1.
+
+   Try to keep this function synced with fold_indirect_ref_1.  */
 
 static tree
-cxx_eval_indirect_ref (const constexpr_call *call, tree t,
-		   bool allow_non_constant, bool addr,
-		   bool *non_constant_p)
+cxx_fold_indirect_ref (location_t loc, tree type, tree op0, bool *empty_base)
 {
-  tree orig_op0 = TREE_OPERAND (t, 0);
-  tree op0 = cxx_eval_constant_expression (call, orig_op0, allow_non_constant,
-	   /*addr*/false, non_constant_p);
-  tree type, sub, subtype, r;
-  bool empty_base;
+  tree sub, subtype;
 
-  /* Don't VERIFY_CONSTANT here.  */
-  if (*non_constant_p)
-return t;
-
-  type = TREE_TYPE (t);
   sub = op0;
-  r = NULL_TREE;
-  empty_base = false;
-
   STRIP_NOPS (sub);
   subtype = TREE_TYPE (sub);
   gcc_assert (POINTER_TYPE_P (subtype));
@@ -6786,16 +6774,52 @@ cxx_eval_indirect_ref (const constexpr_call *call, tree t,
   tree op = TREE_OPERAND (sub, 0);
   tree optype = TREE_TYPE (op);
 
+  /* *&CONST_DECL -> to the value of the const decl.  */
+  if (TREE_CODE (op) == CONST_DECL)
+	return DECL_INITIAL (op);
+  /* *&p => p;  make sure to handle *&"str"[cst] here.  */
   if (same_type_ignoring_top_level_qualifiers_p (optype, type))
-	r = op;
+	{
+	  tree fop = fold_read_from_constant_string (op);
+	  if (fop)
+	return fop;
+	  else
+	return op;
+	}
+  /* *(foo *)&fooarray => fooarray[0] */
+  else if (TREE_CODE (optype) == ARRAY_TYPE
+	   && (same_type_ignoring_top_level_qualifiers_p
+		   (type, TREE_TYPE (optype
+	{
+	  tree type_domain = TYPE_DOMAIN (optype);
+	  tree min_val = size_zero_node;
+	  if (type_domain && TYPE_MIN_VALUE (type_domain))
+	min_val = TYPE_MIN_VALUE (type_domain);
+	  return build4_loc (loc, ARRAY_REF, type, op, min_val,
+			 NULL_TREE, NULL_TREE);
+	}
+  /* *(foo *)&complexfoo => __real__ complexfoo */
+  else if (TREE_CODE (optype) == COMPLEX_TYPE
+	   && (same_type_ignoring_top_level_qualifiers_p
+		   (type, TREE_TYPE (optype
+	return fold_build1_loc (loc, REALPART_EXPR, type, op);
+  /* *(foo *)&vectorfoo => BIT_FIELD_REF */
+  else if (TREE_CODE (optype) == VECTOR_TYPE
+	   && (same_type_ignoring_top_level_qualifiers_p
+		   (type, TREE_TYPE (optype
+	{
+	  tree part_width = TYPE_SIZE (type);
+	  tree index = bitsize_int (0);
+	  return fold_build3_loc (loc, BIT_FIELD_REF, type, op, part_width, index);
+	}
   /* Also handle conversion to an empty base class, which
 	 is represented with a NOP_EXPR.  */
-  else if (!addr && is_empty_class (type)
+  else if (is_empty_class (type)
 	   && CLASS_TYPE_P (optype)
 	   && DERIVED_FROM_P (type, optype))
 	{
-	  r = op;
-	  empty_base = true;
+	  *empty_base = true;
+	  return op;
 	}
   /* *(foo *)&struct_with_foo_field => COMPONENT_REF */
   else if (RECORD_OR_UNION_TYPE_P (optype))
@@ -6807,7 +6831,7 @@ cxx_eval_indirect_ref (const constexpr_call *call, tree t,
 		&& (same_type_ignoring_top_level_qualifiers_p
 		(TREE_TYPE (field), type)))
 	  {
-		r = fold_build3 (COMPONENT_REF, type, op, field, NULL_TREE);
+		return fold_build3 (COMPONENT_REF, type, op, field, NULL_TREE);
 		break;
 	  }
 	}
@@ -6825,8 +6849,49 @@ cxx_eval_indirect_ref (const constexpr_call *call, tree t,
 	  op00 = TREE_OPERAND (op00, 0);
 	  op00type = TREE_TYPE (op00);
 
+	  /* ((foo*)&vectorfoo)[1] => BIT_FIELD_REF */
+	  if (TREE_CODE (op00type) == VECTOR_TYPE
+	  && (same_type_ignoring_top_level_qualifiers_p
+		  (type, TREE_TYPE (op00type
+	{
+	  HOST_WIDE_INT offset = tree_low_cst

C++ PATCH for c++/49369 (wrong cv-quals on base member in unevaluated context)

2011-06-14 Thread Jason Merrill

We were forgetting to propagate cv-quals from 'this' to the result along 
one code path.  Fixed by moving the cv-qual propagation up so it's 
shared by all code paths.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.6.
commit a7eeb9dc7b67d159f46e9d8e7976332bd73332ca
Author: Jason Merrill 
Date:   Mon Jun 13 17:26:38 2011 -0400

	PR c++/49369
	* class.c (build_base_path): Fix cv-quals in unevaluated context.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 69627cb..09444fb 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -289,6 +289,12 @@ build_base_path (enum tree_code code,
   offset = BINFO_OFFSET (binfo);
   fixed_type_p = resolves_to_fixed_type_p (expr, &nonnull);
   target_type = code == PLUS_EXPR ? BINFO_TYPE (binfo) : BINFO_TYPE (d_binfo);
+  /* TARGET_TYPE has been extracted from BINFO, and, is therefore always
+ cv-unqualified.  Extract the cv-qualifiers from EXPR so that the
+ expression returned matches the input.  */
+  target_type = cp_build_qualified_type
+(target_type, cp_type_quals (TREE_TYPE (TREE_TYPE (expr;
+  ptr_target_type = build_pointer_type (target_type);
 
   /* Do we need to look in the vtable for the real offset?  */
   virtual_access = (v_binfo && fixed_type_p <= 0);
@@ -297,7 +303,7 @@ build_base_path (enum tree_code code,
  source type is incomplete and the pointer value doesn't matter.  */
   if (cp_unevaluated_operand != 0)
 {
-  expr = build_nop (build_pointer_type (target_type), expr);
+  expr = build_nop (ptr_target_type, expr);
   if (!want_pointer)
 	expr = build_indirect_ref (EXPR_LOCATION (expr), expr, RO_NULL);
   return expr;
@@ -312,18 +318,7 @@ build_base_path (enum tree_code code,
 	 field, because other parts of the compiler know that such
 	 expressions are always non-NULL.  */
   if (!virtual_access && integer_zerop (offset))
-	{
-	  tree class_type;
-	  /* TARGET_TYPE has been extracted from BINFO, and, is
-	 therefore always cv-unqualified.  Extract the
-	 cv-qualifiers from EXPR so that the expression returned
-	 matches the input.  */
-	  class_type = TREE_TYPE (TREE_TYPE (expr));
-	  target_type
-	= cp_build_qualified_type (target_type,
-   cp_type_quals (class_type));
-	  return build_nop (build_pointer_type (target_type), expr);
-	}
+	return build_nop (ptr_target_type, expr);
   null_test = error_mark_node;
 }
 
@@ -407,9 +402,6 @@ build_base_path (enum tree_code code,
 	offset = v_offset;
 }
 
-  target_type = cp_build_qualified_type
-(target_type, cp_type_quals (TREE_TYPE (TREE_TYPE (expr;
-  ptr_target_type = build_pointer_type (target_type);
   if (want_pointer)
 target_type = ptr_target_type;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype30.C b/gcc/testsuite/g++.dg/cpp0x/decltype30.C
new file mode 100644
index 000..b23c9a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype30.C
@@ -0,0 +1,17 @@
+// PR c++/49369
+// { dg-options -std=c++0x }
+
+template  struct assert_same;
+template  struct assert_same {};
+
+struct B {
+  int member;
+};
+
+struct C: B {
+  void method() const;
+};
+
+void C::method() const {
+  assert_same a;
+}
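
The behaviour this fix restores can be checked with a short standalone sketch. It is not the committed testcase; `Base`, `Derived` and the method names are illustrative. It relies only on the standard rule that `decltype((e))` yields an lvalue reference to the type of `e`, including its cv-qualifiers.

```cpp
#include <cassert>
#include <type_traits>

struct Base { int member; };

struct Derived : Base {
    // In a const member function 'this' is 'const Derived *', so the base
    // member named through it is a const lvalue.
    bool const_method_sees_const_member() const
    {
        return std::is_same<decltype((this->member)), const int &>::value;
    }

    // In a non-const member function the same expression is a plain lvalue.
    bool plain_method_sees_plain_member()
    {
        return std::is_same<decltype((this->member)), int &>::value;
    }
};

// Both queries are made in an unevaluated context (inside decltype),
// which is the code path the patch touches.
void check_cv_quals()
{
    Derived d;
    assert(d.const_method_sees_const_member());
    assert(d.plain_method_sees_plain_member());
}
```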

Re: PATCH [7/n]: Prepare x32: Use long long builtin for x86-64

2011-06-14 Thread Uros Bizjak

On Tue, Jun 14, 2011 at 6:04 PM, H.J. Lu  wrote:

> long may be 32bit for x86-64. But long long is always 64bit.  This
> patch uses long long builtin for 64bit.  OK for trunk?
>
> Thanks.
>
>
> H.J.
> ---
> 2011-06-14  H.J. Lu  
>
>        * longlong.h (count_leading_zeros): Use long long builtin for
>        x86-64.
>        (count_trailing_zeros): Likewise.
>
> diff --git a/gcc/longlong.h b/gcc/longlong.h
> index 1bab76d..d5c0cd9 100644
> --- a/gcc/longlong.h
> +++ b/gcc/longlong.h
> @@ -430,8 +430,8 @@ UDItype __umulsidi3 (USItype, USItype);
>           : "0" ((UDItype) (n0)),                                      \
>             "1" ((UDItype) (n1)),                                      \
>             "rm" ((UDItype) (dv)))
> -#define count_leading_zeros(count, x)  ((count) = __builtin_clzl (x))
> -#define count_trailing_zeros(count, x) ((count) = __builtin_ctzl (x))
> +#define count_leading_zeros(count, x)  ((count) = __builtin_clzll (x))
> +#define count_trailing_zeros(count, x) ((count) = __builtin_ctzll (x))
>  #define UMUL_TIME 40
>  #define UDIV_TIME 40
>  #endif /* x86_64 */

Uh, this is also needed for MinGW (LLP64 target).

The patch is OK for SVN and release branches, but please also wait for
approval from the MinGW maintainer.

Do we need to update glibc as well?

Thanks,
Uros.
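
A short sketch of why the `ll` variants are the safer choice for a 64-bit operand: the width of `long` varies by ABI, while `long long` is 64 bits on the targets discussed here. The wrapper names are hypothetical; `__builtin_clzll` and `__builtin_ctzll` are the GCC builtins named in the patch, and both are undefined for a zero argument.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// 'long' is 32 bits on LLP64 targets and, per the thread, may be 32 bits on
// x86-64, so a builtin taking 'long' could see only half of the operand.
static_assert(sizeof(unsigned long long) * CHAR_BIT == 64,
              "this sketch assumes a 64-bit long long");

// Hypothetical wrappers; both builtins are undefined for x == 0.
int leading_zeros_64(std::uint64_t x)  { return __builtin_clzll(x); }
int trailing_zeros_64(std::uint64_t x) { return __builtin_ctzll(x); }

void check_bit_counts()
{
    assert(leading_zeros_64(1) == 63);
    assert(trailing_zeros_64(8) == 3);
    // Bit 40 is above the range of a 32-bit 'long'.
    assert(leading_zeros_64(std::uint64_t(1) << 40) == 23);
    assert(trailing_zeros_64(std::uint64_t(1) << 40) == 40);
}
```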

C++ PATCH for c++/49117 (error message regression on conversion failure)

2011-06-14 Thread Jason Merrill

PR 49117 complains that the error message given on conversion failure 
regressed from 4.5 to 4.6 in that it no longer prints the source type. 
So I've added it back in.


While I was at it, I've also tweaked the compiler to also print the 
typedef-stripped version of a type when appropriate, which should help 
with understanding template error messages.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.6.
commit 2978b60371c26f46ba5ac44244d94ef100cf9cf2
Author: Jason Merrill 
Date:   Tue Jun 14 09:43:04 2011 -0400

	PR c++/49117
	* call.c (perform_implicit_conversion_flags): Print source type as
	well as expression.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 4ee0eaf..b43d078 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8296,7 +8296,8 @@ perform_implicit_conversion_flags (tree type, tree expr, tsubst_flags_t complain
 	  else if (invalid_nonstatic_memfn_p (expr, complain))
 	/* We gave an error.  */;
 	  else
-	error ("could not convert %qE to %qT", expr, type);
+	error ("could not convert %qE from %qT to %qT", expr,
+		   TREE_TYPE (expr), type);
 	}
   expr = error_mark_node;
 }
diff --git a/gcc/testsuite/g++.dg/other/error23.C b/gcc/testsuite/g++.dg/other/error23.C
index 0ff1915..959fe40 100644
--- a/gcc/testsuite/g++.dg/other/error23.C
+++ b/gcc/testsuite/g++.dg/other/error23.C
@@ -2,4 +2,4 @@
 // { dg-do compile }
 
 int v __attribute ((vector_size (8)));
-bool b = !(v - v);	// { dg-error "could not convert .\\(__vector.2. int\\)\\{0, 0\\}. to .bool.|in argument to unary" }
+bool b = !(v - v);	// { dg-error "could not convert .\\(__vector.2. int\\)\\{0, 0\\}. from .__vector.2. int. to .bool.|in argument to unary" }
diff --git a/gcc/testsuite/g++.dg/other/error32.C b/gcc/testsuite/g++.dg/other/error32.C
index 35c64c4..56d3b7a 100644
--- a/gcc/testsuite/g++.dg/other/error32.C
+++ b/gcc/testsuite/g++.dg/other/error32.C
@@ -3,6 +3,6 @@
 
 void foo()
 {
-  if (throw 0) // { dg-error "could not convert .\\. to .bool." }
+  if (throw 0) // { dg-error "could not convert .\\. from .void. to .bool." }
 ;
 }
commit 16136651e85c19a1e8338a0bd1b2b1a453413c23
Author: Jason Merrill 
Date:   Tue Jun 14 09:43:25 2011 -0400

	* error.c (type_to_string): Print typedef-stripped version too.

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 96796c2..22470dc 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -2632,6 +2632,15 @@ type_to_string (tree typ, int verbose)
 
   reinit_cxx_pp ();
   dump_type (typ, flags);
+  if (typ && TYPE_P (typ) && typ != TYPE_CANONICAL (typ)
+  && !uses_template_parms (typ))
+{
+  tree aka = strip_typedefs (typ);
+  pp_string (cxx_pp, " {aka");
+  pp_cxx_whitespace (cxx_pp);
+  dump_type (aka, flags);
+  pp_character (cxx_pp, '}');
+}
   return pp_formatted_text (cxx_pp);
 }
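
The shape of the resulting messages can be modelled outside the compiler. This is a hypothetical sketch, not GCC's pretty-printer: `type_with_aka` and `conversion_error` are invented names, plain ASCII quotes stand in for the quoting done by `%qE` and `%qT`, and the "spelled differs from stripped" test is a simplification of the `TYPE_CANONICAL` and `uses_template_parms` checks in the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of type_to_string after the patch: append the
// typedef-stripped spelling when it differs from the spelling as written.
std::string type_with_aka(const std::string &spelled,
                          const std::string &stripped)
{
    if (spelled == stripped)
        return spelled;                      // nothing to add
    return spelled + " {aka " + stripped + "}";
}

// Hypothetical model of the new error text, which names the source type
// as well as the expression and the target type.
std::string conversion_error(const std::string &expr,
                             const std::string &from,
                             const std::string &to)
{
    return "could not convert '" + expr + "' from '" + from
           + "' to '" + to + "'";
}

void check_messages()
{
    assert(type_with_aka("int", "int") == "int");
    assert(type_with_aka("size_type", "unsigned long")
           == "size_type {aka unsigned long}");
}
```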

[x32] PATCH: Add GLIBC_DYNAMIC_LINKERX32 to kfreebsd-gnu64.h

2011-06-14 Thread H.J. Lu

Hi,

I checked this patch into x32 branch.


H.J.
---
commit 7cce5a5ab2012d170287e705741ed29828a8af0e
Author: H.J. Lu 
Date:   Tue Jun 14 10:40:05 2011 -0700

Add GLIBC_DYNAMIC_LINKERX32 to kfreebsd-gnu64.h.

diff --git a/gcc/ChangeLog.x32 b/gcc/ChangeLog.x32
index 64a40a6..afea916 100644
--- a/gcc/ChangeLog.x32
+++ b/gcc/ChangeLog.x32
@@ -1,5 +1,9 @@
 2011-06-14  H.J. Lu  
 
+   * config/i386/kfreebsd-gnu64.h (GLIBC_DYNAMIC_LINKERX32): New.
+
+2011-06-14  H.J. Lu  
+
* config/i386/gnu-user64.h (LINK_SPEC): Use
GNU_USER_LINK_EMULATIONX32.
 
diff --git a/gcc/config/i386/kfreebsd-gnu64.h b/gcc/config/i386/kfreebsd-gnu64.h
index bdb2aeb..2085ca5 100644
--- a/gcc/config/i386/kfreebsd-gnu64.h
+++ b/gcc/config/i386/kfreebsd-gnu64.h
@@ -25,3 +25,4 @@ along with GCC; see the file COPYING3.  If not see
 
 #define GLIBC_DYNAMIC_LINKER32 "/lib/ld.so.1"
 #define GLIBC_DYNAMIC_LINKER64 "/lib/ld-kfreebsd-x86-64.so.1"
+#define GLIBC_DYNAMIC_LINKERX32 "/lib/ld-kfreebsd-x32.so.1"

[testsuite] ARM tests should ignore warning about conflicting switches

2011-06-14 Thread Janis Johnson

Many tests in gcc.target/arm that specify "-march=" fail compilation
when multilib flags include "-mcpu=" due to warnings about conflicts in
switches, but then go on to pass the remainder of the test.  This patch
causes some of those tests to ignore that compiler warning; I'll get to
the rest later.

Alternative options for tests that specify -march are to skip them for
multilibs that include -mcpu, or to add a new test directive or effective
target that skips a test if the options used generate a warning.

Tested on arm-none-linux-gnueabi with a variety of multilib flags,
including some with "-mcpu=".  OK for trunk and 4.6?

Janis
2011-06-14  Janis Johnson  

* mla-1.c: Ignore warnings about conflicting switches.
* pr39839.c: Likewise.
* pr40657-2.c: Likewise.
* pr40956.c: Likewise.
* pr41679.c: Likewise.
* pr42235.c: Likewise.
* pr42495.c: Likewise.
* pr42505.c: Likewise.
* pr42574.c: Likewise.
* pr46883.c: Likewise.
* pr46934.c: Likewise.
* xor-and.c: Likewise.

Index: mla-1.c
===
--- mla-1.c (revision 174920)
+++ mla-1.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=armv5te" } */
+/* { dg-prune-output "switch .* conflicts with" } */
 
 
 int
Index: pr39839.c
===
--- pr39839.c   (revision 174920)
+++ pr39839.c   (working copy)
@@ -1,5 +1,6 @@
 /* { dg-options "-mthumb -Os -march=armv5te -mthumb-interwork -fpic" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler-not "str\[\\t \]*r.,\[\\t \]*.sp," } } */
 
 struct S
Index: pr40657-2.c
===
--- pr40657-2.c (revision 174920)
+++ pr40657-2.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-options "-Os -march=armv4t -mthumb" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler-not "sub\[\\t \]*sp,\[\\t \]*sp" } } */
 /* { dg-final { scan-assembler-not "add\[\\t \]*sp,\[\\t \]*sp" } } */
 
Index: pr40956.c
===
--- pr40956.c   (revision 174920)
+++ pr40956.c   (working copy)
@@ -1,6 +1,7 @@
 /* { dg-options "-mthumb -Os -fpic -march=armv5te" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
 /* { dg-require-effective-target fpic } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* Make sure the constant "0" is loaded into register only once.  */
 /* { dg-final { scan-assembler-times "mov\[\\t \]*r., #0" 1 } } */
 
Index: pr41679.c
===
--- pr41679.c   (revision 174920)
+++ pr41679.c   (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=armv5te -g -O2" } */
+/* { dg-prune-output "switch .* conflicts with" } */
 
 extern int a;
 extern char b;
Index: pr42235.c
===
--- pr42235.c   (revision 174920)
+++ pr42235.c   (working copy)
@@ -1,5 +1,6 @@
 /* { dg-options "-mthumb -O2 -march=armv5te" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler-not "add\[\\t \]*r.,\[\\t \]*r.,\[\\t \]*\#1" } } */
 /* { dg-final { scan-assembler-not "add\[\\t \]*r.,\[\\t \]*\#1" } } */
 
Index: pr42495.c
===
--- pr42495.c   (revision 174920)
+++ pr42495.c   (working copy)
@@ -1,6 +1,7 @@
 /* { dg-options "-mthumb -Os -fpic -march=armv5te -fdump-rtl-hoist" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
 /* { dg-require-effective-target fpic } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* Make sure all calculations of gObj's address get hoisted to one location.  */
 /* { dg-final { scan-rtl-dump "PRE/HOIST: end of bb .* copying expression" "hoist" } } */
 
Index: pr42505.c
===
--- pr42505.c   (revision 174920)
+++ pr42505.c   (working copy)
@@ -1,5 +1,6 @@
 /* { dg-options "-mthumb -Os -march=armv5te" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler-not "str\[\\t \]*r.,\[\\t \]*.sp," } } */
 
 struct A {
Index: pr42574.c
===
--- pr42574.c   (revision 174920)
+++ pr42574.c   (working copy)
@@ -1,6 +1,7 @@
 /* { dg-options "-mthumb -Os -fpic -march=armv5te" }  */
 /* { dg-require-effective-target arm_thumb1_ok } */
 /* { dg-require-effective-target fpic } */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* Make sure the address of glob.c is calculated only on

Re: PATCH [1/n]: Add initial -x32 support

2011-06-14 Thread H.J. Lu

On Sun, Jun 05, 2011 at 12:54:41PM -0700, H.J. Lu wrote:
> Hi,
> 
> I'd like to start submitting a series of patches to enable x32:
> 
> https://sites.google.com/site/x32abi/
> 
> The GCC x32 branch is very stable. There are no unexpected failures in
> C, C++, Fortran and Objective C testsuites.  SPEC CPU 2K/2006 compile
> and run correctly at -O2 and -O3. 
> 
> More than 90% of changes are in x86 backend.  This is the first patch to
> support x32.  By default, x32 is disabled and x32 run-time support
> isn't required.  OK for trunk?
> 
> Thanks.
> 
> 

Here is the updated patch based on the feedback.

Thanks.


H.J.
---
2011-06-14  H.J. Lu  

* config.gcc: Support --enable-x32/--enable-ia32 for x86 Linux
targets.

* configure.ac: Support --enable-x32/--enable-ia32.
* configure: Regenerated.

* config/i386/gnu-user64.h (SPEC_64): Support x32.
(SPEC_32): Likewise.
(ASM_SPEC): Likewise.
(LINK_SPEC): Likewise.
(TARGET_THREAD_SSP_OFFSET): Likewise.
(TARGET_THREAD_SPLIT_STACK_OFFSET): Likewise.
(SPEC_X32): New.

* config/i386/i386.h (TARGET_X32): New.
(TARGET_LP64): New.
(LONG_TYPE_SIZE): Likewise.
(POINTER_SIZE): Likewise.
(POINTERS_EXTEND_UNSIGNED): Likewise.
(OPT_ARCH64): Support x32.
(OPT_ARCH32): Likewise.

* config/i386/i386.opt (mx32): New.

* config/i386/kfreebsd-gnu64.h (GNU_USER_LINK_EMULATIONX32): New.
(GLIBC_DYNAMIC_LINKERX32): Likewise.
* config/i386/linux64.h (GNU_USER_LINK_EMULATIONX32): Likewise.
(GLIBC_DYNAMIC_LINKERX32): Likewise.

* config/i386/t-linux-x32: New.
* config/i386/t-linux64-x32: Likewise.

* config/linux.h (UCLIBC_DYNAMIC_LINKERX32): New.
(BIONIC_DYNAMIC_LINKERX32): Likewise.
(GNU_USER_DYNAMIC_LINKERX32): Likewise.

* doc/install.texi: Document --enable-ia32 and --enable-x32.

* doc/invoke.texi: Document -mx32.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e9704f3..e2b72df 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1232,7 +1232,17 @@ i[34567]86-*-linux* | i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | i
if test x$enable_targets = xall; then
tm_file="${tm_file} i386/x86-64.h i386/gnu-user64.h i386/linux64.h"
tm_defines="${tm_defines} TARGET_BI_ARCH=1"
-   tmake_file="${tmake_file} i386/t-linux64"
+   case x${enable_x32}${enable_ia32} in
+   xyesyes)
+   tmake_file="${tmake_file} i386/t-linux-x32"
+   ;;
+   xyesno)
+   tmake_file="${tmake_file} i386/t-linux64-x32"
+   ;;
+   *)
+   tmake_file="${tmake_file} i386/t-linux64"
+   ;;
+   esac
need_64bit_hwint=yes
need_64bit_isa=yes
case X"${with_cpu}" in
@@ -1270,7 +1280,18 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu | x86_64-*-knetbsd*-gnu)
x86_64-*-kfreebsd*-gnu) tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h" ;;
x86_64-*-knetbsd*-gnu) tm_file="${tm_file} knetbsd-gnu.h" ;;
esac
-   tmake_file="${tmake_file} i386/t-linux64 i386/t-crtstuff i386/t-crtpc i386/t-crtfm t-dfprules"
+   case x${enable_x32}${enable_ia32} in
+   xyesyes)
+   tmake_file="${tmake_file} i386/t-linux-x32"
+   ;;
+   xyesno)
+   tmake_file="${tmake_file} i386/t-linux64-x32"
+   ;;
+   *)
+   tmake_file="${tmake_file} i386/t-linux64"
+   ;;
+   esac
+   tmake_file="${tmake_file} i386/t-crtstuff i386/t-crtpc i386/t-crtfm t-dfprules"
;;
 i[34567]86-pc-msdosdjgpp*)
xm_file=i386/xm-djgpp.h
diff --git a/gcc/config/i386/gnu-user64.h b/gcc/config/i386/gnu-user64.h
index b069975..954f3b2 100644
--- a/gcc/config/i386/gnu-user64.h
+++ b/gcc/config/i386/gnu-user64.h
@@ -58,25 +58,31 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #if TARGET_64BIT_DEFAULT
 #define SPEC_32 "m32"
-#define SPEC_64 "!m32"
+#define SPEC_64 "m32|mx32:;"
+#define SPEC_X32 "mx32"
 #else
-#define SPEC_32 "!m64"
+#define SPEC_32 "m64|mx32:;"
 #define SPEC_64 "m64"
+#define SPEC_X32 "mx32"
 #endif
 
 #undef ASM_SPEC
-#define ASM_SPEC "%{" SPEC_32 ":--32} %{" SPEC_64 ":--64} \
+#define ASM_SPEC "%{" SPEC_32 ":--32} \
+ %{" SPEC_64 ":--64} \
+ %{" SPEC_X32 ":--x32} \
  %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"
 
 #undef LINK_SPEC
 #define LINK_SPEC "%{" SPEC_64 ":-m " GNU_USER_LINK_EMULATION64 "} \
%{" SPEC_32 ":-m " GNU_USER_LINK_EMULATION32 "} \
+   %{" SPEC_X32 ":-m " GNU_USER_LINK_EMUL

[v2] Mark noexcept some destructors, add tests

2011-06-14 Thread Paolo Carlini


Hi,

tested x86_64-linux, committed to mainline.

Paolo.


2011-06-14  Paolo Carlini  

* include/std/valarray (~valarray): Use noexcept.
* include/bits/unique_ptr.h (~unique_ptr): Likewise.
* testsuite/26_numerics/valarray/noexcept_move_construct.cc: New.
* testsuite/20_util/shared_ptr/cons/noexcept_move_construct.cc:
Likewise.
* testsuite/20_util/unique_ptr/cons/noexcept_move_construct.cc:
Likewise.
* testsuite/20_util/weak_ptr/cons/noexcept_move_construct.cc:
Likewise.
Index: include/std/valarray
===
--- include/std/valarray(revision 175025)
+++ include/std/valarray(working copy)
@@ -165,7 +165,7 @@
   template
valarray(const _Expr<_Dom, _Tp>& __e);
 
-  ~valarray();
+  ~valarray() _GLIBCXX_NOEXCEPT;
 
   // _lib.valarray.assign_ assignment:
   /**
@@ -697,7 +697,7 @@
 
   template
 inline
-valarray<_Tp>::~valarray()
+valarray<_Tp>::~valarray() _GLIBCXX_NOEXCEPT
 {
   std::__valarray_destroy_elements(_M_data, _M_data + _M_size);
   std::__valarray_release_memory(_M_data);
Index: include/bits/unique_ptr.h
===
--- include/bits/unique_ptr.h   (revision 175025)
+++ include/bits/unique_ptr.h   (working copy)
@@ -166,7 +166,7 @@
 #endif
 
   // Destructor.
-  ~unique_ptr() { reset(); }
+  ~unique_ptr() noexcept { reset(); }
 
   // Assignment.
   unique_ptr&
Index: testsuite/26_numerics/valarray/noexcept_move_construct.cc
===
--- testsuite/26_numerics/valarray/noexcept_move_construct.cc   (revision 0)
+++ testsuite/26_numerics/valarray/noexcept_move_construct.cc   (revision 0)
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++0x" }
+
+// 2011-06-14  Paolo Carlini  
+//
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+
+typedef std::valarray vtype;
+
+static_assert(std::is_nothrow_move_constructible::value, "Error");
Index: testsuite/20_util/shared_ptr/cons/noexcept_move_construct.cc
===
--- testsuite/20_util/shared_ptr/cons/noexcept_move_construct.cc (revision 0)
+++ testsuite/20_util/shared_ptr/cons/noexcept_move_construct.cc (revision 0)
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++0x" }
+
+// 2011-06-14  Paolo Carlini  
+//
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+
+typedef std::shared_ptr sptype;
+
+static_assert(std::is_nothrow_move_constructible::value, "Error");
Index: testsuite/20_util/unique_ptr/cons/noexcept_move_construct.cc
===
--- testsuite/20_util/unique_ptr/cons/noexcept_move_construct.cc (revision 0)
+++ testsuite/20_util/unique_ptr/cons/noexcept_move_construct.cc (revision 0)
@@ -0,0 +1,27 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++0x" }
+
+// 2011-06-14  Paolo Carlini  
+//
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed

Re: PATCH [7/n]: Prepare x32: Use long long builtin for x86-64

2011-06-14 Thread H.J. Lu

On Tue, Jun 14, 2011 at 10:37 AM, Uros Bizjak  wrote:
> On Tue, Jun 14, 2011 at 6:04 PM, H.J. Lu  wrote:
>
>> long may be 32bit for x86-64. But long long is always 64bit.  This
>> patch uses long long builtin for 64bit.  OK for trunk?
>>
>> Thanks.
>>
>>
>> H.J.
>> ---
>> 2011-06-14  H.J. Lu  
>>
>>        * longlong.h (count_leading_zeros): Use long long builtin for
>>        x86-64.
>>        (count_trailing_zeros): Likewise.
>>
>> diff --git a/gcc/longlong.h b/gcc/longlong.h
>> index 1bab76d..d5c0cd9 100644
>> --- a/gcc/longlong.h
>> +++ b/gcc/longlong.h
>> @@ -430,8 +430,8 @@ UDItype __umulsidi3 (USItype, USItype);
>>           : "0" ((UDItype) (n0)),                                      \
>>             "1" ((UDItype) (n1)),                                      \
>>             "rm" ((UDItype) (dv)))
>> -#define count_leading_zeros(count, x)  ((count) = __builtin_clzl (x))
>> -#define count_trailing_zeros(count, x) ((count) = __builtin_ctzl (x))
>> +#define count_leading_zeros(count, x)  ((count) = __builtin_clzll (x))
>> +#define count_trailing_zeros(count, x) ((count) = __builtin_ctzll (x))
>>  #define UMUL_TIME 40
>>  #define UDIV_TIME 40
>>  #endif /* x86_64 */
>
> Uh, this is also needed for MingW (LLP64 target).
>
> The patch is OK for SVN and release branches, but please also wait for
> approval from MingW maintainer.
>
> Do we need to update glibc as well?
>

Yes:

http://git.kernel.org/?p=devel/glibc/hjl/x86.git;a=commit;h=196911a6e77bbe851caff25ba260a25ceb9cf376
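
A minimal sketch of the point above, assuming a GCC-compatible compiler that provides `__builtin_clzll` and `__builtin_ctzll`. The wrapper names `clz64` and `ctz64` are hypothetical and are not part of longlong.h. Because the `ll` builtins take `unsigned long long`, the count is taken over 64 bits whether `long` is 32 bits wide (x32, LLP64) or 64 bits wide (LP64).

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// The sketch only makes sense if unsigned long long is exactly 64 bits,
// which is an assumption stated here rather than a guarantee of the language.
static_assert(sizeof(unsigned long long) * CHAR_BIT == 64,
              "sketch assumes a 64-bit unsigned long long");

// Hypothetical helper: number of leading zero bits in a 64-bit value.
// The builtin's result is undefined for 0, so callers must pass x != 0.
inline int clz64(std::uint64_t x) {
  assert(x != 0);
  return __builtin_clzll(x);
}

// Hypothetical helper: number of trailing zero bits in a 64-bit value.
inline int ctz64(std::uint64_t x) {
  assert(x != 0);
  return __builtin_ctzll(x);
}
```

With `__builtin_clzl` the same call would only examine 32 bits on a target where `long` is 32 bits wide, which is the case the patch guards against.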


-- 
H.J.

[testsuite] skip ARM tests if no THUMB support

2011-06-14 Thread Janis Johnson

Fix three ARM tests so they are skipped for multilibs that don't support
THUMB.  OK for trunk and 4.6?

Janis
2011-06-14  Janis Johnson  

* gcc.target/arm/pr45701-1.c: Require thumb support.
* gcc.target/arm/pr45701-2.c: Likewise.
* gcc.target/arm/thumb-branch1.c: Likewise.

Index: gcc.target/arm/pr45701-1.c
===
--- gcc.target/arm/pr45701-1.c  (revision 174920)
+++ gcc.target/arm/pr45701-1.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-march=armv7-a -mthumb -Os" }  */
 /* { dg-final { scan-assembler "push\t\{r3" } } */
 /* { dg-final { scan-assembler-not "r8" } } */
Index: gcc.target/arm/pr45701-2.c
===
--- gcc.target/arm/pr45701-2.c  (revision 174920)
+++ gcc.target/arm/pr45701-2.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-march=armv7-a -mthumb -Os" }  */
 /* { dg-final { scan-assembler "push\t\{r3" } } */
 /* { dg-final { scan-assembler-not "r8" } } */
Index: gcc.target/arm/thumb-branch1.c
===
--- gcc.target/arm/thumb-branch1.c  (revision 174920)
+++ gcc.target/arm/thumb-branch1.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-Os -mthumb -march=armv5te" } */
 
 int returnbool(int a, int b)

Re: [testsuite] ARM tests should ignore warning about conflicting switches

2011-06-14 Thread Janis Johnson

On 06/14/2011 10:47 AM, Janis Johnson wrote:
> Many tests in gcc.target/arm that specify "-march=" fail compilation
> when multilib flags include "-mcpu=" due to warnings about conflicts in
> switches, but then go on to pass the remainder of the test.  This patch
> causes some of those tests to ignore that compiler warning; I'll get to
> the rest later.
> 
> Alternative options for tests that specify -march are to skip for multilibs
> that include -mcpu, or to add a new test directive or effective target that
> skips a test if the options used generate a warning.
> 
> Tested on arm-none-linux-gnueabi with a variety of multilib flags,
> including some with "-mcpu=".  OK for trunk and 4.6?
> 
> Janis

The ChangeLog entry should include gcc.target/arm for each of these tests.

[pph] pph_in_binding_level fixing shadowed_labels read (issue4589054)

2011-06-14 Thread Gabriel Charette

We weren't reading in shadowed labels properly.

The local variable *sl also turned out to be useless; the compiler just didn't
mention it until now because it was "used" by the bad VEC_iterate call.

This doesn't fix any currently exposed pph bugs, but it does help me with
the patch I'm currently writing.

This was tested with a bootstrap build and pph regression testing.


2011-06-14  Gabriel Charette  

* pph-streamer-in.c (pph_in_binding_level): Fix read
of shadowed_labels.
(pph_in_binding_level): Removed *sl.

Index: pph-streamer-in.c
===
--- pph-streamer-in.c   (revision 174998)
+++ pph-streamer-in.c   (working copy)
@@ -427,7 +427,6 @@
 pph_in_binding_level (pph_stream *stream)
 {
   unsigned i, num, ix;
-  cp_label_binding *sl;
   struct cp_binding_level *bl;
   struct bitpack_d bp;
   enum pph_record_marker marker;
@@ -461,7 +460,7 @@
 
   num = pph_in_uint (stream);
   bl->shadowed_labels = NULL;
-  for (i = 0; VEC_iterate (cp_label_binding, bl->shadowed_labels, i, sl); i++)
+  for (i = 0; i < num; i++)
 {
   cp_label_binding *sl = pph_in_label_binding (stream);
   VEC_safe_push (cp_label_binding, gc, bl->shadowed_labels, sl);

--
This patch is available for review at http://codereview.appspot.com/4589054
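
The shape of the fix can be sketched outside of GCC. The example below is not the pph code: `record_stream` and `read_counted` are hypothetical names, and `std::vector` stands in for GCC's VEC. It shows the corrected pattern, where the loop runs over the count read from the stream instead of iterating over the destination container, which is still empty at that point.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical input stream: a flat list of integers consumed front to back.
struct record_stream {
  std::vector<int> data;
  std::size_t pos = 0;
  int next() { return data[pos++]; }
};

// Reads a count followed by that many records.
// Looping on "i < num" consumes exactly num records; looping over the
// freshly emptied output container would consume none of them.
inline std::vector<int> read_counted(record_stream &s) {
  const int num = s.next();
  std::vector<int> out;
  for (int i = 0; i < num; i++)
    out.push_back(s.next());
  return out;
}
```

After the call, the stream position sits just past the last record, so whatever is read next starts at the right place.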

C++ PATCH for c++/49389 (wrong value category for .*)

2011-06-14 Thread Jason Merrill


If the object expression is an rvalue, the result should be as well.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 93619457bb3756b091d86a13d1aa72880bb1ac62
Author: Jason Merrill 
Date:   Mon Jun 13 22:19:24 2011 -0400

	PR c++/49389
	* typeck2.c (build_m_component_ref): Preserve rvalueness.

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index fa64d1d..d72f57e 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1551,6 +1551,7 @@ build_m_component_ref (tree datum, tree component)
 
   if (TYPE_PTRMEM_P (ptrmem_type))
 {
+  bool is_lval = real_lvalue_p (datum);
   tree ptype;
 
   /* Compute the type of the field, as described in [expr.ref].
@@ -1573,7 +1574,11 @@ build_m_component_ref (tree datum, tree component)
   datum = build2 (POINTER_PLUS_EXPR, ptype,
 		  fold_convert (ptype, datum),
 		  build_nop (sizetype, component));
-  return cp_build_indirect_ref (datum, RO_NULL, tf_warning_or_error);
+  datum = cp_build_indirect_ref (datum, RO_NULL, tf_warning_or_error);
+  /* If the object expression was an rvalue, return an rvalue.  */
+  if (!is_lval)
+	datum = move (datum);
+  return datum;
 }
   else
 return build2 (OFFSET_REF, type, datum, component);
diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-dotstar.C b/gcc/testsuite/g++.dg/cpp0x/rv-dotstar.C
new file mode 100644
index 000..65aac8d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-dotstar.C
@@ -0,0 +1,13 @@
+// PR c++/49389
+// { dg-options -std=c++0x }
+
+template T&& val();
+
+struct A {};
+
+typedef decltype(val().*val()) type;
+
+template struct assert_type;
+template<> struct assert_type {};
+
+assert_type test;
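
The rule can also be checked at compile time with standard library traits. The following is a sketch rather than the testcase from the patch: it relies on `std::declval` and `std::is_same`, and `S`, `lvalue_result`, `rvalue_result` and `rvalue_preserved` are hypothetical names.

```cpp
#include <type_traits>
#include <utility>

struct S { int m; };

// Object expression is an lvalue: the .* result is an lvalue,
// so decltype yields int&.
using lvalue_result =
    decltype(std::declval<S &>().*std::declval<int S::*>());

// Object expression is an xvalue: the .* result is an xvalue,
// so decltype yields int&&.
using rvalue_result =
    decltype(std::declval<S &&>().*std::declval<int S::*>());

static_assert(std::is_same<lvalue_result, int &>::value,
              "lvalue object gives an lvalue result");
static_assert(std::is_same<rvalue_result, int &&>::value,
              "rvalue object gives an rvalue result");

// Small runtime-visible hook so the property can also be asserted in a test.
constexpr bool rvalue_preserved() {
  return std::is_same<rvalue_result, int &&>::value;
}
```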

fix pr48459

2011-06-14 Thread Richard Henderson

In this PR, during the initialization of the dwarf2 backend, we attempt
to cache a translation from a local stack frame address to the CFA.  We
do this optimistically, hoping to cut down the work later for every
local stack frame address that we find in the actual variables dumped.

Unfortunately, AVR has problems with the edge condition of no local
stack frame allocated.  In this case, the results of its register
elimination are different from what dwarf2out expects.  IMO, AVR is
justified in this, because the combination that dwarf2out wants is
invalid according to TARGET_CAN_ELIMINATE.

That said, this really shouldn't matter since, for the edge condition
in question, we won't actually use the translation to the CFA.  The 
moment that we generate an actual reference to the stack frame, we'll
actually generate a frame pointer, and everything else will DTRT.

Thus we can avoid the explosion by deferring the sanity check until
the translation is actually used.
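
As a rough illustration of the idea, and not the dwarf2out code itself, the sketch below records a validity flag when the value is cached and asserts on it only at the point of use. The names `frame_offset_cache`, `compute_offset` and `apply_offset` are hypothetical.

```cpp
#include <cassert>

// Hypothetical cache: the offset plus a flag saying whether it can be trusted.
struct frame_offset_cache {
  long offset = 0;
  bool valid = false;
};

// Computing the cache never aborts; it only remembers whether the
// precondition held (here passed in as elimination_ok).
inline void compute_offset(frame_offset_cache &c, long offset,
                           bool elimination_ok) {
  c.offset = -offset;
  c.valid = elimination_ok;
}

// The sanity check is deferred to the first real use of the cached value.
inline long apply_offset(const frame_offset_cache &c, long offset) {
  assert(c.valid);
  return offset + c.offset;
}
```

A target that never calls `apply_offset` with an invalid cache is unaffected, which mirrors the argument above that the edge case never uses the translation.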

Committed to HEAD; testing for 4.6 is still going.


r~
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 776066b..b33da64 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6471,6 +6471,7 @@ static GTY(()) VEC(tree,gc) *generic_type_instances;
 /* Offset from the "steady-state frame pointer" to the frame base,
within the current function.  */
 static HOST_WIDE_INT frame_pointer_fb_offset;
+static bool frame_pointer_fb_offset_valid;
 
 static VEC (dw_die_ref, heap) *base_types;
 
@@ -13613,6 +13614,7 @@ based_loc_descr (rtx reg, HOST_WIDE_INT offset,
  return new_reg_loc_descr (base_reg, offset);
}
 
+  gcc_assert (frame_pointer_fb_offset_valid);
  offset += frame_pointer_fb_offset;
  return new_loc_descr (DW_OP_fbreg, offset, 0);
}
@@ -18336,14 +18338,20 @@ compute_frame_pointer_to_fb_displacement (HOST_WIDE_INT offset)
   elim = XEXP (elim, 0);
 }
 
-  gcc_assert ((SUPPORTS_STACK_ALIGNMENT
-  && (elim == hard_frame_pointer_rtx
-  || elim == stack_pointer_rtx))
- || elim == (frame_pointer_needed
- ? hard_frame_pointer_rtx
- : stack_pointer_rtx));
-
   frame_pointer_fb_offset = -offset;
+
+  /* ??? AVR doesn't set up valid eliminations when there is no stack frame
+ in which to eliminate.  This is because it's stack pointer isn't 
+ directly accessible as a register within the ISA.  To work around
+ this, assume that while we cannot provide a proper value for
+ frame_pointer_fb_offset, we won't need one either.  */
+  frame_pointer_fb_offset_valid
+= ((SUPPORTS_STACK_ALIGNMENT
+   && (elim == hard_frame_pointer_rtx
+   || elim == stack_pointer_rtx))
+   || elim == (frame_pointer_needed
+  ? hard_frame_pointer_rtx
+  : stack_pointer_rtx));
 }
 
 /* Generate a DW_AT_name attribute given some string value to be included as

[google] Merge r173574 to google/gcc-4_6 to fix an incompatibility between C++98 and C++0x (issue4592057)

2011-06-14 Thread Jeffrey Yasskin

In C++0x mode, without this patch, calls to a user-defined trunc() function 
with an argument in namespace std and a parameter type that has an implicit 
conversion from the argument's type cause infinite recursion in std::trunc().

This patch also includes 
http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/26_numerics/headers/cmath/overloads_c%2B%2B0x_neg.cc?view=markup&pathrev=173574
 and 
http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/tr1/8_c_compatibility/cmath/overloads_neg.cc?view=markup&pathrev=173574,
 but `svn diff` didn't capture them.

Tested with `make check-c++` on x86_64-unknown-linux-gnu.
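
The effect of constraining the return type can be reproduced in a small standalone sketch. It uses the standard `std::enable_if` and `std::is_integral` instead of libstdc++'s internal `__enable_if` and `__is_integer`, and `lib`, `tag`, `wrapper` and `trunc_like` are hypothetical stand-ins for namespace std, the argument type, the user's parameter type and trunc().

```cpp
#include <cmath>
#include <type_traits>

namespace lib {
struct tag {};  // hypothetical argument type living in the library namespace

inline double trunc_like(double x) { return std::trunc(x); }

// Constrained overload: participates only for integer arguments, so it is
// removed from the candidate set when called with a lib::tag.
template <typename T>
inline typename std::enable_if<std::is_integral<T>::value, double>::type
trunc_like(T x) {
  return std::trunc(static_cast<double>(x));
}
}  // namespace lib

// User code: a type implicitly convertible from lib::tag and a
// user-defined overload taking it.
struct wrapper {
  wrapper(lib::tag) {}
};
inline int trunc_like(wrapper) { return 42; }

// Argument-dependent lookup also finds the lib:: overloads, but none of
// them is viable for lib::tag, so the user-defined function is chosen.
inline int call_user_overload() {
  lib::tag t;
  return trunc_like(t);
}
```

With an unconstrained template in `lib`, the template would be an exact match for `lib::tag` and would win overload resolution over the user's function, which is the situation the merge removes.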

2011-06-14  Jeffrey Yasskin  

Merge r173574 to google/gcc-4_6.
* include/c_global/cmath (acosh, asinh, atanh, cbrt, copysign,
erf, erfc, exp2, expm1, fdim, fma, fmax, hypot, ilogb, lgamma,
llrint, llround, log1p, log2, logb, lrint, lround, nearbyint,
nextafter, nexttoward, remainder, remquo, rint, round, scalbln,
scalbn, tgamma, trunc): Use __enable_if on the return type.
* include/tr1/cmath: Likewise.
* testsuite/26_numerics/headers/cmath/overloads_c++0x_neg.cc: New.
* testsuite/tr1/8_c_compatibility/cmath/overloads_neg.cc: Likewise.

Property changes on: .
___
Modified: svn:mergeinfo
   Merged /trunk:r173574

Index: libstdc++-v3/include/c_global/cmath
===
--- libstdc++-v3/include/c_global/cmath (revision 175001)
+++ libstdc++-v3/include/c_global/cmath (working copy)
@@ -1,7 +1,7 @@
 // -*- C++ -*- C forwarding header.
 
 // Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-// 2006, 2007, 2008, 2009, 2010
+// 2006, 2007, 2008, 2009, 2010, 2011
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -1120,12 +1120,10 @@
   { return __builtin_acoshl(__x); }
 
   template
-inline typename __gnu_cxx::__promote<_Tp>::__type 
+inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+  double>::__type
 acosh(_Tp __x)
-{
-  typedef typename __gnu_cxx::__promote<_Tp>::__type __type;
-  return acosh(__type(__x));
-}
+{ return __builtin_acosh(__x); }
 
   inline float
   asinh(float __x)
@@ -1136,12 +1134,10 @@
   { return __builtin_asinhl(__x); }
 
   template
-inline typename __gnu_cxx::__promote<_Tp>::__type 
+inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+  double>::__type
 asinh(_Tp __x)
-{
-  typedef typename __gnu_cxx::__promote<_Tp>::__type __type;
-  return asinh(__type(__x));
-}
+{ return __builtin_asinh(__x); }
 
   inline float
   atanh(float __x)
@@ -1152,12 +1148,10 @@
   { return __builtin_atanhl(__x); }
 
   template
-inline typename __gnu_cxx::__promote<_Tp>::__type 
+inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+  double>::__type
 atanh(_Tp __x)
-{
-  typedef typename __gnu_cxx::__promote<_Tp>::__type __type;
-  return atanh(__type(__x));
-}
+{ return __builtin_atanh(__x); }
 
   inline float
   cbrt(float __x)
@@ -1168,12 +1162,10 @@
   { return __builtin_cbrtl(__x); }
 
   template
-inline typename __gnu_cxx::__promote<_Tp>::__type 
+inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+  double>::__type
 cbrt(_Tp __x)
-{
-  typedef typename __gnu_cxx::__promote<_Tp>::__type __type;
-  return cbrt(__type(__x));
-}
+{ return __builtin_cbrt(__x); }
 
   inline float
   copysign(float __x, float __y)
@@ -1184,7 +1176,11 @@
   { return __builtin_copysignl(__x, __y); }
 
   template
-inline typename __gnu_cxx::__promote_2<_Tp, _Up>::__type
+inline
+typename __gnu_cxx::__promote_2<
+typename __gnu_cxx::__enable_if<__is_arithmetic<_Tp>::__value
+   && __is_arithmetic<_Up>::__value,
+   _Tp>::__type, _Up>::__type
 copysign(_Tp __x, _Up __y)
 {
   typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
@@ -1200,12 +1196,10 @@
   { return __builtin_erfl(__x); }
 
   template
-inline typename __gnu_cxx::__promote<_Tp>::__type 
+inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+  double>::__type
 erf(_Tp __x)
-{
-  typedef typename __gnu_cxx::__promote<_Tp>::__type __type;
-  return erf(__type(__x));
-}
+{ return __builtin_erf(__x); }
 
   inline float
   erfc(float __x)
@@ -1216,12 +1210,10 @@
   { return __builtin_erfcl(__x); }
 
   template
-inline typename __gnu_cxx::__promote<_Tp>::__type 
+inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,

Re: [testsuite] ARM tests should ignore warning about conflicting switches

2011-06-14 Thread Mike Stump

On Jun 14, 2011, at 10:47 AM, Janis Johnson wrote:
> Many tests in gcc.target/arm that specify "-march=" fail compilation
> when multilib flags include "-mcpu=" due to warnings about conflicts in
> switches, but then go on to pass the remainder of the test.

> OK for trunk and 4.6?

Ok.  As usual, please wait for any trunk fallout before committing to 4.6.

Re: [testsuite] skip ARM tests if no THUMB support

2011-06-14 Thread Mike Stump

On Jun 14, 2011, at 10:58 AM, Janis Johnson wrote:
> Fix three ARM tests so they are skipped for multilibs that don't support
> THUMB.  OK for trunk and 4.6?

Ok.

Re: [pph] pph_in_binding_level fixing shadowed_labels read (issue4589054)

2011-06-14 Thread Diego Novillo

On Tue, Jun 14, 2011 at 13:59, Gabriel Charette  wrote:

> 2011-06-14  Gabriel Charette  
>
>        * pph-streamer-in.c (pph_in_binding_level): Fix read
>        of shadowed_labels.
>        (pph_in_binding_level): Removed *sl.

OK, committed as rev 175050.


Diego.

Re: [google] Merge r173574 to google/gcc-4_6 to fix an incompatibility between C++98 and C++0x (issue4592057)

2011-06-14 Thread Diego Novillo

On Tue, Jun 14, 2011 at 14:45, Jeffrey Yasskin  wrote:
> In C++0x mode, without this patch, calls to a user-defined trunc() function 
> with an argument in namespace std and
> a parameter type that has an implicit conversion from the argument's type, 
> cause infinite recursion in std::trunc().
>
> This patch also includes 
> http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/26_numerics/headers/cmath/overloads_c%2B%2B0x_neg.cc?view=markup&pathrev=173574
>  and 
> http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/tr1/8_c_compatibility/cmath/overloads_neg.cc?view=markup&pathrev=173574,
>  but `svn diff` didn't capture them.

Yeah, svn diff never picks up added files.  Silly thing.

>
> Tested with `make check-c++` on x86_64-unknown-linux-gnu.
>
> 2011-06-14  Jeffrey Yasskin  
>
>        Merge r173574 to google/gcc-4_6.
>        * include/c_global/cmath (acosh, asinh, atanh, cbrt, copysign,
>        erf, erfc, exp2, expm1, fdim, fma, fmax, hypot, ilogb, lgamma,
>        llrint, llround, log1p, log2, logb, lrint, lround, nearbyint,
>        nextafter, nexttoward, remainder, remquo, rint, round, scalbln,
>        scalbn, tgamma, trunc): Use __enable_if on the return type.
>        * include/tr1/cmath: Likewise.
>        * testsuite/26_numerics/headers/cmath/overloads_c++0x_neg.cc: New.
>        * testsuite/tr1/8_c_compatibility/cmath/overloads_neg.cc: Likewise.
>

Any reason not to put this in google/main?


Diego.

Re: Dump before flag

2011-06-14 Thread Xinliang David Li

Committed after Bootstrapping and regression testing on x86-64/linux.
The follow up patch will come soon.

Thanks,

David

On Tue, Jun 14, 2011 at 8:57 AM, Xinliang David Li  wrote:
> On Tue, Jun 14, 2011 at 6:58 AM, Richard Guenther
>  wrote:
>> On Fri, Jun 10, 2011 at 8:44 PM, Xinliang David Li  
>> wrote:
>>> This is the revised patch as suggested.
>>>
>>> How does it look?
>>
>>  }
>>
>> +static void
>> +execute_function_dump (void *data ATTRIBUTE_UNUSED)
>>
>> function needs a comment.
>>
>> Ok with that change.
>>
>> Please always specify how you tested the patch - the past fallouts
>> suggest you didn't do the required testing carefully.
>
> I think I did -- the fallout was probably due to different
> '--enable-checking' setting. I have now turned it to 'yes'
>
> Thanks,
>
> David
>
>>
>> A changelog is missing as well.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>>
>>> David
>>>
>>> On Fri, Jun 10, 2011 at 9:22 AM, Xinliang David Li  
>>> wrote:
 On Fri, Jun 10, 2011 at 1:52 AM, Richard Guenther
  wrote:
> On Thu, Jun 9, 2011 at 5:47 PM, Xinliang David Li  
> wrote:
>> See attached.
>
> Hmm.  I don't like how you still wire dumping in the TODO routines.
> Doesn't it work to just dump the body from pass_fini_dump_file ()?
> Or if that doesn't sound clean from (a subset of) places where it
> is called? (we might want to exclude the ipa read/write/summary
> stages)

 That may require another round of function traversal -- but probably
 not a big deal -- it sounds cleaner.

 David

>
> Richard.
>
>> Thanks,
>>
>> David
>>
>> On Thu, Jun 9, 2011 at 2:02 AM, Richard Guenther
>>  wrote:
>>> On Thu, Jun 9, 2011 at 12:31 AM, Xinliang David Li  
>>> wrote:
 this is the patch that just removes the TODO_dump flag and forces it
 to dump. The original code cfun->last_verified = flags &
 TODO_verify_all looks weird -- depending on TODO_dump is set or not,
 the behavior of the update is different (when no other todo flags is
 set).

 Ok for trunk?
>>>
>>> -ENOPATCH.
>>>
>>> Richard.
>>>
 David

 On Wed, Jun 8, 2011 at 9:52 AM, Xinliang David Li  
 wrote:
> On Wed, Jun 8, 2011 at 2:06 AM, Richard Guenther
>  wrote:
>> On Wed, Jun 8, 2011 at 1:08 AM, Xinliang David Li 
>>  wrote:
>>> The following is the patch that does the job. Most of the changes 
>>> are
>>> just  removing TODO_dump_func. The major change is in passes.c and
>>> tree-pass.h.
>>>
>>> -fdump-xxx-yyy-start       <-- dump before TODO_start
>>> -fdump-xxx-yyy-before    <-- dump before main pass after TODO_pass
>>> -fdump-xxx-yyy-after       <-- dump after main pass before 
>>> TODO_finish
>>> -fdump-xxx-yyy-finish      <-- dump after TODO_finish
>>
>> Can we bikeshed a bit more about these names?
>
> These names may be less confusing:
>
> before_preparation
> before
> after
> after_cleanup
>
> David
>
>> "start" and "before"
>> have no semantical difference to me ... as the dump before TODO_start
>> of a pass and the dump after TODO_finish of the previous pass are
>> identical (hopefully ;)), maybe merge those into a -between flag?
>> If you'd specify it for a single pass then you'd get both -start and 
>> -finish
>> (using your naming scheme).  Splitting that dump(s) to different 
>> files
>> then might make sense (not sure about the name to use).
>>
>> Note that I find it extremely useful to have dumping done in
>> chronological order - splitting some of it to different files 
>> destroys
>> this, especially a dump after TODO_start or before TODO_finish
>> should appear in the same file (or we could also start splitting
>> individual TODO_ output into sub-dump-files).  I guess what would
>> be nice instread would be a fancy dump-file viewer that could
>> show diffs, hide things like SCEV output, etc.
>>
>> I suppose a patch that removes the dump TODO and unconditionally
>> dumps at the current point would be a good preparation for this
>> enhancing patch.
>>
>> Richard.
>>
>>> The default is 'finish'.
>>>
>>> Does it look ok?
>>>
>>> Thanks,
>>>
>>> David
>>>
>>> On Tue, Jun 7, 2011 at 2:36 AM, Richard Guenther
>>>  wrote:
 On Mon, Jun 6, 2011 at 6:20 PM, Xinliang David Li 
  wrote:
>>
>> Your patch doesn't really improve this but adds to the confusion.
>>
>> +  /* Override dump TODOs.

Re: [google] Merge r173574 to google/gcc-4_6 to fix an incompatibility between C++98 and C++0x (issue4592057)

2011-06-14 Thread Jeffrey Yasskin

On Tue, Jun 14, 2011 at 12:38 PM, Diego Novillo  wrote:
> On Tue, Jun 14, 2011 at 14:45, Jeffrey Yasskin  wrote:
>> In C++0x mode, without this patch, calls to a user-defined trunc() function 
>> with an argument in namespace std and
>> a parameter type that has an implicit conversion from the argument's type, 
>> cause infinite recursion in std::trunc().
>>
>> This patch also includes 
>> http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/26_numerics/headers/cmath/overloads_c%2B%2B0x_neg.cc?view=markup&pathrev=173574
>>  and 
>> http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/tr1/8_c_compatibility/cmath/overloads_neg.cc?view=markup&pathrev=173574,
>>  but `svn diff` didn't capture them.
>
> Yeah, svn diff never picks up added files.  Silly thing.
>
>>
>> Tested with `make check-c++` on x86_64-unknown-linux-gnu.
>>
>> 2011-06-14  Jeffrey Yasskin  
>>
>>        Merge r173574 to google/gcc-4_6.
>>        * include/c_global/cmath (acosh, asinh, atanh, cbrt, copysign,
>>        erf, erfc, exp2, expm1, fdim, fma, fmax, hypot, ilogb, lgamma,
>>        llrint, llround, log1p, log2, logb, lrint, lround, nearbyint,
>>        nextafter, nexttoward, remainder, remquo, rint, round, scalbln,
>>        scalbn, tgamma, trunc): Use __enable_if on the return type.
>>        * include/tr1/cmath: Likewise.
>>        * testsuite/26_numerics/headers/cmath/overloads_c++0x_neg.cc: New.
>>        * testsuite/tr1/8_c_compatibility/cmath/overloads_neg.cc: Likewise.
>>
>
> Any reason not to put this in google/main?

It's already in trunk, so my impression was that it was going to be
automatically merged to google/main. I only need a manual merge to get
it into our release branches.

Jeffrey

Re: [google] Merge r173574 to google/gcc-4_6 to fix an incompatibility between C++98 and C++0x (issue4592057)

2011-06-14 Thread Diego Novillo

On Tue, Jun 14, 2011 at 15:59, Jeffrey Yasskin  wrote:

> It's already in trunk, so my impression was that it was going to be
> automatically merged to google/main. I only need a manual merge to get
> it into our release branches.

Yeah, in this case it's not too different since we'll be switching
google/main to trunk soonish.

OK for google/gcc-4_6.

Diego.

Re: RFA (fold): PATCH for c++/49290 (folding (T)(ar+10))

2011-06-14 Thread Mike Stump

On Jun 13, 2011, at 3:57 AM, Richard Guenther wrote:
> That's not exactly an example - I can't think of how you want or need
> to use VIEW_CONVERT_EXPRs to implement said divmod instruction or why
> you would need anything special for the _argument_ of said instruction.

Oh, I completely misunderstood your question.  My case, as I previously
stated, was with a vector type that was identical, save for the name of the type:

mod = a%b

where mod didn't have the type of the expression (a%b), so someone created the 
VIEW_CONVERT_EXPR on the mod.  The person creating it _thought_ it would be an
rvalue context, but ultimately, it was an lvalue context.  We discover the
lvalue/rvalue state of the expression at target_fold_builtin time.  The actual 
code looks more like:

  __builtin_divmod (div, mod, a, b);

In fold_builtin, we do all the processing to handle the semantics.

> An
> instruction or call with multiple outputs would simply be something
> like
> 
> { div_1, mod_2 } = __builtin_divmod (arg_3);
> 
> with two SSA defs. A nice representation for the tree for { div_1,
> mod_2 } remains to be found (if it should be a single tree at all, or
> possibly multiple ones).

At target_fold_builtin time we regenerate it as:

s = builtin_divmod_final (a, b);
div_1 = s.div
mod_2 = s.mod

and generate a type { div, mod } on the fly.  We expect the optimizer to handle 
extra moves reasonably, and we want to keep the one instruction as one unit.
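
The regenerated form described above (one call producing a two-field
result that is then split into div and mod) can be sketched in plain C++.
This is only an illustration under assumptions: DivMod and divmod_final are
hypothetical names, scalar long operands replace the vector types from the
testcase, and nothing here reflects the actual target builtin.

```cpp
// Hypothetical two-field result type, standing in for the type
// "{ div, mod }" that is generated on the fly.
struct DivMod {
  long div;
  long mod;
};

// One operation, one result object.  The caller must pass b != 0.
constexpr DivMod divmod_final(long a, long b)
{
  return DivMod{a / b, a % b};
}
```

A caller then writes s = divmod_final (a, b) and reads s.div and s.mod,
leaving it to the optimizer to remove the extra moves.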

> We already play tricks for sincos for example via
> 
> tem_1 = __builtin_cexpi (arg_2);
> sin_3 = REALPART_EXPR <tem_1>;
> cos_4 = IMAGPART_EXPR <tem_1>;
> 
> which avoids the two defs by using a single def which is then decomposed.
> 
> So, can you elaborate a bit more on what you want to do with special
> argument kinds?  Elaborate with an actual example, not words.
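
The cexpi trick quoted above can be tried outside the compiler.  The sketch
below is an assumption-laden illustration, not GCC code: std::polar (1.0, x)
plays the role of __builtin_cexpi, and SinCos and sincos_via_cexpi are
hypothetical names.  Since exp (i*x) = cos (x) + i*sin (x), the real part of
the single result is the cosine and the imaginary part is the sine.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical holder for the two decomposed values.
struct SinCos {
  double sine;
  double cosine;
};

// One complex-valued definition, decomposed into two scalars afterwards.
inline SinCos sincos_via_cexpi(double x)
{
  assert(std::isfinite(x));  // this sketch only handles finite inputs
  const std::complex<double> tem = std::polar(1.0, x);  // cos(x) + i*sin(x)
  return SinCos{tem.imag(), tem.real()};
}
```

The point of the shape is that there is a single definition (tem) and the
two scalar values are plain reads of its parts.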

We support tagging any parameter to a builtin as define_outputs, define_inputs
or define_in_outs in a part of the .md file that describes the builtins for the
machine.  The actual divmod builtin, for example, is:

(define_builtin "divmod" "divmod_"
  [
    (define_outputs [(var_operand:T_ALL_DI 0)   ;;dividend
                     (var_operand:T_ALL_DI 1)]) ;;mod
    (define_inputs  [(var_operand:T_ALL_DI 2)
                     (var_operand:T_ALL_DI 3)])
    (define_rtl_pattern "divmod4" [0 1 2 3])
    (attributes [pure])
  ]
)

That's the actual code.  The testcase looks like:

  t_v4udi_0 = divmodu_t_v4udi (t_v4udi_1, t_v4udi_2, t_v4udi_3);

The VIEW_CONVERT_EXPR looks like:

unit size 
align 64 symtab 0 alias set -1 canonical type 0x77e8c690 
precision 64 min  max\

pointer_to_this  reference_to_this 
>
unsigned V4DI
size 
unit size 
align 256 symtab 0 alias set -1 canonical type 0x77f4b930 nunits 4 
reference_to_this >

arg 0 
unsigned V4DI size  unit size 

align 256 symtab 0 alias set -1 canonical type 0x75ac3888 
nunits 4>
used public static unsigned V4DI defer-output file t22.c line 262 col 
48 size  unit \
size 
align 256>>

Hopefully, somewhere above is an example of what you wanted to see; if not, let
me know what you'd like to see.

Re: [PATCH] sel-sched: Avoid placing bookkeeping code above a fence (PR49349)

2011-06-14 Thread Vladimir Makarov


On 06/14/2011 07:34 AM, Alexander Monakov wrote:

Hello,

Quoting myself from the PR audit trail,

It's a rare bug in sel-sched: we fail to schedule some code in non-pipelining
mode.  The root cause is that we put bookkeeping instructions above a fence
that is placed on the last insn (uncond. jump) of the bookkeeping block.  We
could either make such blocks ineligible for bookkeeping or rewind such fences
from the jump back to the bookkeeping code (there's also a more involved
approach of re-introducing the idea of using local nops as placeholders for
fences).  I'm testing the following patch that implements the second approach
(as it should result in a bit cleaner code in such situations).

I'm also removing a conditional that allows NULL place_to_insert in
generate_bookkeeping_insn, as I don't see how it can possibly happen with
current implementation of find_place_for_bookkeeping.

Bootstrapped and regtested on ia64-linux, OK for trunk?  Steve Ellcey
confirmed that HP-UX testing is OK as well.


Ok.  Thanks, Alexander.

2011-06-14  Alexander Monakov

PR target/49349
* sel-sched.c (find_place_for_bookkeeping): Add new parameter
(fence_to_rewind).  Use it to notice when bookkeeping will be placed
above a fence.  Update comments.
(generate_bookkeeping_insn): Rewind fence when bookkeeping code is
placed just above it.  Do not allow NULL place_to_insert.

[testsuite] (committed) let more ARM tests ignore warnings about conflicting switches

2011-06-14 Thread Janis Johnson

I made other changes to these tests earlier today, then the patch to
ignore warnings for conflicting options was approved.  I've committed
this to trunk.

Janis
2011-06-14  Janis Johnson  

* gcc.target/arm/pr45701-1.c: Ignore warnings about conflicting 
switches.
* gcc.target/arm/pr45701-2.c: Likewise.
* gcc.target/arm/thumb-branch1.c: Likewise.

Index: gcc.target/arm/pr45701-1.c
===
--- gcc.target/arm/pr45701-1.c  (revision 175047)
+++ gcc.target/arm/pr45701-1.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-march=armv7-a -mthumb -Os" }  */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler "push\t\{r3" } } */
 /* { dg-final { scan-assembler-not "r8" } } */
 
Index: gcc.target/arm/pr45701-2.c
===
--- gcc.target/arm/pr45701-2.c  (revision 175047)
+++ gcc.target/arm/pr45701-2.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-march=armv7-a -mthumb -Os" }  */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler "push\t\{r3" } } */
 /* { dg-final { scan-assembler-not "r8" } } */
 
Index: gcc.target/arm/thumb-branch1.c
===
--- gcc.target/arm/thumb-branch1.c  (revision 175047)
+++ gcc.target/arm/thumb-branch1.c  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-Os -mthumb -march=armv5te" } */
+/* { dg-prune-output "switch .* conflicts with" } */
 
 int returnbool(int a, int b)
 {

[testsuite] skip ARM tests if no thumb2 support

2011-06-14 Thread Janis Johnson

These tests apparently require thumb2 support (I don't yet know much
about ARM).  OK for trunk, and later 4.6?

Janis
2011-06-14  Janis Johnson  

* gcc.target/arm/pr42879.c: Skip if no thumb2 support, ignore
compiler warning about switch conflicts.
* gcc.target/arm/pr45701-3.c: Likewise.

Index: gcc.target/arm/pr42879.c
===
--- gcc.target/arm/pr42879.c(revision 175047)
+++ gcc.target/arm/pr42879.c(working copy)
@@ -1,4 +1,6 @@
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-march=armv7-a -mthumb -Os" }  */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler "lsls" } } */
 
 struct A
Index: gcc.target/arm/pr45701-3.c
===
--- gcc.target/arm/pr45701-3.c  (revision 175047)
+++ gcc.target/arm/pr45701-3.c  (working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-march=armv7-a -mthumb -Os" }  */
+/* { dg-prune-output "switch .* conflicts with" } */
 /* { dg-final { scan-assembler "push\t.*r8" } } */
 /* { dg-final { scan-assembler-not "push\t*r3" } } */

[PATCH, PR 48613] Don't stream jump functions if there are none

2011-06-14 Thread Martin Jambor

Hi,

the patch below fixes PR 48613 which is an ICE with -O0
-findirect-inlining.  Rather than adding "&& optimize" here and there,
at this place we can easily see whether there is something to do or
not by testing ipa_node_params_vector for NULL.  And the
flag-triggering combinations can be, and are, dealt with elsewhere.

Bootstrapped and tested on trunk on x86_64-linux.  OK for trunk and
subsequently for the 4.6 branch too?

Thanks,

Martin

2011-06-13  Martin Jambor  

PR tree-optimization/48613
* ipa-prop.c (ipa_prop_write_jump_functions): Return immediately if
ipa_node_params_vector is NULL.

Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -2900,12 +2900,15 @@ void
 ipa_prop_write_jump_functions (cgraph_node_set set)
 {
   struct cgraph_node *node;
-  struct output_block *ob = create_output_block (LTO_section_jump_functions);
+  struct output_block *ob;
   unsigned int count = 0;
   cgraph_node_set_iterator csi;
 
-  ob->cgraph_node = NULL;
+  if (!ipa_node_params_vector)
+    return;
 
+  ob = create_output_block (LTO_section_jump_functions);
+  ob->cgraph_node = NULL;
   for (csi = csi_start (set); !csi_end_p (csi); csi_next (&csi))
     {
       node = csi_node (csi);
