date:20121008

Re: Fixup INTEGER_CST

2012-10-08 Thread Richard Guenther

On Sun, Oct 7, 2012 at 7:22 PM, Jan Hubicka  wrote:
>> On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka  wrote:
>> > Hi,
>> > I added a sanity check that after fixup all types that lost in the merging
>> > are really dead.  And it turns out we have some zombies around.
>> >
>> > INTEGER_CST needs special care because it is special-cased by the
>> > streamer.  We also do not want to do in-place modifications on it because
>> > that would corrupt the hashtable used by tree.c's sharing code.
>> >
>> > Bootstrapped/regtested x86_64-linux, OK?
>>
>> No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
>> want to fixup
>> them where they end up used unfixed.
>
> Erm, I think it is what the patch does?

Ah, indeed.

> It replaces pointers to integer_cst with type that did not survive by pointer
> to new integer_cst. (with the optimization that INTEGER_CST with overflow
> is changed in place because it is allowed to do so).

Btw ...

>> > @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
>> >LTO_FIXUP_TREE (t->type_non_common.binfo);
>> >
>> >LTO_FIXUP_TREE (TYPE_CONTEXT (t));
>> > +
>> > +  if (TREE_CODE (t) == METHOD_TYPE)
>> > +TYPE_METHOD_BASETYPE (t);
>> > +  if (TREE_CODE (t) == OFFSET_TYPE)
>> > +TYPE_OFFSET_BASETYPE (t);

that looks like a no-op to me ... (both are TYPE_MAXVAL which
is already fixed up).

Thus, ok with removing the above hunk.

Thanks,
Richard.

>> >  }
>> >
>> >  /* Fix up fields of a BINFO T.  */

Re: handle isl and cloog in contrib/download_prerequisites

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 3:16 AM, Jonathan Wakely  wrote:
> On 7 October 2012 21:31, Manuel López-Ibáñez wrote:
>> On 7 October 2012 22:13, Jonathan Wakely  wrote:
>>>
>>> On Oct 7, 2012 12:00 AM, "NightStrike"  wrote:

>>>> On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
>>>>  wrote:
>>>> > Hi,
>>>> >
>>>> > GCC now requires ISL and a very new CLOOG but download_prerequisites
>>>> > does not download those. Also, there is only one sensible place to
>>>>
>>>> As of what version is isl/cloog no longer optional?
>>>
>>> If they're really no longer optional then the prerequisites page and 4.8
>>> changes page need to be updated.
>>>
>>> The patch downloads isl and cloog unconditionally, does gcc build them
>>> unconditionally if they're found in the source dir?  If they are still
>>> optional I don't want download_prerequisites to fetch files that will slow
>>> down building gcc by building libs and enabling features I don't use.
>>
>> I guess they are optional in the sense that you can configure gcc to
>> not require them. But the default configure in x86_64-gnu-linux
>> requires isl and cloog.
>
> Are you sure?
>
> Seems to me the default is still the same as it always has been, i.e.
> Graphite optimisations can be enabled if ISL and cloog are present,
> but they're not "required".  I can bootstrap without ISL anyway.

If good enough ISL and cloog are not found, graphite is simply disabled
unless you explicitly enabled it via specifying either of ISL or cloog
configury.

Richard.

Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Richard Guenther

On Sun, Oct 7, 2012 at 11:27 PM, Steven Bosscher  wrote:
> On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
>> The following patch speeds LRA up more on PR54146.  Below times for
>> compilation of the test on gcc17.fsffrance.org (an AMD machine):
>>
>> Before:
>> real=1214.71 user=1192.05 system=22.48
>> After:
>> real=1144.37 user=1124.31 system=20.11
>
> Hi Vlad,
>
> The next bottle-neck in my timings is in
> lra-eliminate.c:lra_eliminate(), in this loop:
>
>FOR_EACH_BB (bb)
>  FOR_BB_INSNS_SAFE (bb, insn, temp)
>{
>if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
>   process_insn_for_elimination (insn, final_p);
>}
>
> The problem is in bitmap_bit_p. Random access to a large bitmap can be
> very slow.
>
> I'm playing with a patch to expand the insns_with_changed_offsets
> bitmap to an sbitmap, and will send a patch if this works better.

Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

Richard.

> Ciao!
> Steven
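
To illustrate the data-structure point in this exchange: a dense bit array answers "is this UID in the set?" with a single indexed load, which is what a per-insn test inside a walk over all insns wants. The following is a minimal sketch in plain C, not GCC's bitmap or sbitmap code; the names dense_set, dense_set_add and dense_set_contains are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)

/* Hypothetical dense bit set: one bit per possible UID, so a membership
   test is an indexed load rather than a search through list elements.  */
typedef struct
{
  size_t nbits;
  unsigned long *words;
} dense_set;

static dense_set *
dense_set_alloc (size_t nbits)
{
  size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;
  dense_set *s = malloc (sizeof *s);
  assert (s != NULL);
  if (nwords == 0)
    nwords = 1;                 /* keep the allocation non-empty */
  s->nbits = nbits;
  s->words = calloc (nwords, sizeof (unsigned long));
  assert (s->words != NULL);
  return s;
}

static void
dense_set_add (dense_set *s, size_t uid)
{
  assert (uid < s->nbits);
  s->words[uid / WORD_BITS] |= 1UL << (uid % WORD_BITS);
}

static int
dense_set_contains (const dense_set *s, size_t uid)
{
  assert (uid < s->nbits);
  return (int) ((s->words[uid / WORD_BITS] >> (uid % WORD_BITS)) & 1);
}

static void
dense_set_free (dense_set *s)
{
  free (s->words);
  free (s);
}
```

The trade-off is memory: the array is sized by the largest UID rather than by the number of members, which is presumably acceptable only while the set is short-lived.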

Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen  wrote:
> Attached is the updated patch. Yes, if we add a VRP pass before
> profile pass, this patch would be unnecessary. Should we add a VRP
> pass?

No, we don't want VRP in early optimizations.

Richard.

> Thanks,
> Dehao
>
> On Sat, Oct 6, 2012 at 9:38 AM, Jan Hubicka  wrote:
>>> ping^2
>>>
>>> Honza, do you think this patch can make into 4.8 stage 1?
>>
>> +  if (check_value_one ^ integer_onep (val))
>>
>> Probably better as !=
>> (especially because GNU coding standard allows predicates to return more than
>> just boolean)
>>
>>
>> +{
>> +  edge e1;
>> +  edge_iterator ei;
>> +  tree val = gimple_phi_arg_def (phi_stmt, i);
>> +  edge e = gimple_phi_arg_edge (phi_stmt, i);
>> +
>> +  if (!TREE_CONSTANT (val) || !(integer_zerop (val) || integer_onep (val)))
>> +   continue;
>> +  if (check_value_one ^ integer_onep (val))
>> +   continue;
>> +  if (VEC_length (edge, e->src->succs) != 1)
>> +   {
>> + if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
>> + && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
>> + && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
>> +   predict_edge_def (e, PRED_LOOP_EXIT, NOT_TAKEN);
>> + continue;
>> +   }
>> +
>> +  FOR_EACH_EDGE (e1, ei, e->src->preds)
>> +   if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
>> +   && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
>> +   && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
>> + predict_edge_def (e1, PRED_LOOP_EXIT, NOT_TAKEN);
>>
>> Here you found an edge that you know is going to terminate the loop
>> and you want to predict all paths to this edge as unlikely.
>> Perhaps you want to use predict_paths_leading_to_edge for the edge?
>>
>> You do not need to check PRED_LOOP_ITERATIONS and
>> PRED_LOOP_ITERATIONS_GUESSED because those never go to the non-exit edges.
>>
>> The nature of predict_paths_for_bb type heuristics is that they are not really
>> additive: if the path leads to two different aborts it does not make it more
>> sure that it will be unlikely.  So perhaps you can add
>> !predicted_by_p (e, pred) prior to the predict_edge_def call in the function?
>>
>> I wonder if we did VRP just before branch prediction to jump thread the
>> shortcut conditions into loopback edges, would there still be cases where
>> this optimization will match?
>>
>> Honza
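
On the review remark about `check_value_one ^ integer_onep (val)`: the concern is a predicate that reports "true" as some nonzero value other than 1. The sketch below is plain C with made-up names (low_byte_set, differs_xor, differs_normalized), not code from the patch. It shows one way to make such a test robust, namely normalising both sides with `!` before comparing; a bare `!=` on un-normalised values would have the same weakness as `^`.

```c
#include <assert.h>

/* Hypothetical predicate that reports "true" as an arbitrary nonzero
   value, which C permits for anything used as a truth value.  */
static int
low_byte_set (int x)
{
  return x & 0xff;      /* nonzero when any low bit is set, not always 1 */
}

/* "Do the two truth values differ?" written with xor.  Only correct when
   both operands are exactly 0 or 1.  */
static int
differs_xor (int flag, int pred)
{
  return flag ^ pred;
}

/* Same question with both operands normalised by !, so any nonzero value
   counts as true.  */
static int
differs_normalized (int flag, int pred)
{
  return !flag != !pred;
}
```

With flag == 1 and a predicate result of 2, the xor form yields 3 and so claims the truth values differ, while the normalised form yields 0.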

[Patch ARM] Fix that miss DMB instruction for ARMv6-M

2012-10-08 Thread Terry Guo

Hi,

When running libstdc++ regression test on Cortex-M0, the case 49445.cc fails
with error message:

/tmp/ccMqZdgc.o: In function `std::atomic::load(std::memory_order)
const':
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
/tmp/ccMqZdgc.o: In function `std::atomic::load(std::memory_order)
const':
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
collect2: error: ld returned 1 exit status
compiler exited with status 1

After investigation, the reason is that current gcc doesn't think armv6-m has
the DMB instruction, although according to the ARM manuals it does. With this
wrong assumption, expand_mem_thread_fence generates a call to the library
function __sync_synchronize rather than a DMB instruction. There is no code
implementing this library function, hence the link error.

The attached patch intends to fix this issue by letting gcc also think
armv6-m has the DMB instruction. Is it OK for trunk?

BR,
Terry

2012-10-08  Terry Guo  

* config/arm/arm.c (arm_arch6m): New variable to denote armv6-m
architecture.
* config/arm/arm.h (TARGET_HAVE_DMB): The armv6-m also has DMB
instruction.



armv6m-dmb.patch
Description: Binary data
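
For readers unfamiliar with the symbol in the link error: `__sync_synchronize ()` is the GCC builtin for a full memory barrier. Where the compiler believes the target has a barrier instruction it can expand the builtin inline; otherwise it emits a call to a function of the same name, which is the undefined reference reported above. The sketch below only shows where such a barrier typically sits; payload, ready, publish and try_consume are hypothetical names, and the code assumes a GCC-compatible compiler.

```c
#include <assert.h>

/* Hypothetical shared state: a payload and a flag announcing it is ready.  */
static int payload;
static volatile int ready;

/* Writer: store the payload, then issue a full barrier before setting the
   flag so the two stores cannot be observed out of order.  */
static void
publish (int value)
{
  payload = value;
  __sync_synchronize ();        /* GCC builtin: full memory barrier */
  ready = 1;
}

/* Reader: returns 1 and fills *OUT once the flag is visible, else 0.  */
static int
try_consume (int *out)
{
  if (!ready)
    return 0;
  __sync_synchronize ();        /* order the flag load before the payload load */
  *out = payload;
  return 1;
}
```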

Re: handle isl and cloog in contrib/download_prerequisites

2012-10-08 Thread Manuel López-Ibáñez

On 8 October 2012 09:18, Richard Guenther  wrote:
> On Mon, Oct 8, 2012 at 3:16 AM, Jonathan Wakely  wrote:
>> On 7 October 2012 21:31, Manuel López-Ibáñez wrote:
>>> On 7 October 2012 22:13, Jonathan Wakely  wrote:

>>>> On Oct 7, 2012 12:00 AM, "NightStrike"  wrote:
>>>>>
>>>>> On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
>>>>>  wrote:
>>>>> > Hi,
>>>>> >
>>>>> > GCC now requires ISL and a very new CLOOG but download_prerequisites
>>>>> > does not download those. Also, there is only one sensible place to
>>>>>
>>>>> As of what version is isl/cloog no longer optional?
>>>>
>>>> If they're really no longer optional then the prerequisites page and 4.8
>>>> changes page need to be updated.
>>>>
>>>> The patch downloads isl and cloog unconditionally, does gcc build them
>>>> unconditionally if they're found in the source dir?  If they are still
>>>> optional I don't want download_prerequisites to fetch files that will slow
>>>> down building gcc by building libs and enabling features I don't use.
>>>
>>> I guess they are optional in the sense that you can configure gcc to
>>> not require them. But the default configure in x86_64-gnu-linux
>>> requires isl and cloog.
>>
>> Are you sure?
>>
>> Seems to me the default is still the same as it always has been, i.e.
>> Graphite optimisations can be enabled if ISL and cloog are present,
>> but they're not "required".  I can bootstrap without ISL anyway.
>
> If good enough ISL and cloog are not found, graphite is simply disabled
> unless you explicitly enabled it via specifying either of ISL or cloog
> configury.

As I said, this didn't work for me, after trying quite a few things
(not specifying anything, using with-isl/with-cloog, building cloog/isl
in several ways...). I could try to reproduce the issues and open PRs,
but it doesn't seem worth the time. My advice would be: use the script
or disable graphite, and be happy. In any case, I am not going to
commit the patch, I'll keep it local. Anyone feel free to take it and
do what you wish with it. I think there are some nice parts even if
cloog and isl are removed.

[testsuite] Minor housekeeping work

2012-10-08 Thread Eric Botcazou

Recent tests added to gcc.dg/tree-ssa don't clean up after themselves.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2012-10-08  Eric Botcazou  

* gcc.dg/tree-ssa/slsr-30.c: Use correct cleanup directive.
* gcc.dg/tree-ssa/attr-hotcold-2.c: Likewise.
* gcc.dg/tree-ssa/ldist-21.c: Add missing cleanup directive.


-- 
Eric Botcazou

Index: gcc.dg/tree-ssa/slsr-30.c
===
--- gcc.dg/tree-ssa/slsr-30.c	(revision 192137)
+++ gcc.dg/tree-ssa/slsr-30.c	(working copy)
@@ -21,4 +21,4 @@ f (int s, long c)
 }
 
 /* { dg-final { scan-tree-dump-times " \\* " 3 "dom2" } } */
-/* { dg-final { cleanup-tree-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc.dg/tree-ssa/attr-hotcold-2.c
===
--- gcc.dg/tree-ssa/attr-hotcold-2.c	(revision 192137)
+++ gcc.dg/tree-ssa/attr-hotcold-2.c	(working copy)
@@ -25,4 +25,4 @@ void f(int x, int y)
the testcase around too much.  */
 /* { dg-final { scan-ipa-dump-times "block 5, loop depth 0, count 0, freq \[6-9\]\[0-9\]\[0-9\]\[0-9\]" 1 "profile_estimate" } } */
 
-/* { dg-final { cleanup-tree-dump "profile_estimate" } } */
+/* { dg-final { cleanup-ipa-dump "profile_estimate" } } */
Index: gcc.dg/tree-ssa/ldist-21.c
===
--- gcc.dg/tree-ssa/ldist-21.c	(revision 192137)
+++ gcc.dg/tree-ssa/ldist-21.c	(working copy)
@@ -9,3 +9,4 @@ void bar(char *p, int n)
 }
 
 /* { dg-final { scan-tree-dump "generated memmove" "ldist" } } */
+/* { dg-final { cleanup-tree-dump "ldist" } } */

Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Jakub Jelinek

On Mon, Oct 08, 2012 at 09:20:47AM +0200, Richard Guenther wrote:
> On Sun, Oct 7, 2012 at 11:27 PM, Steven Bosscher  
> wrote:
> > The next bottle-neck in my timings is in
> > lra-eliminate.c:lra_eliminate(), in this loop:
> >
> >FOR_EACH_BB (bb)
> >  FOR_BB_INSNS_SAFE (bb, insn, temp)
> >{
> >if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
> >   process_insn_for_elimination (insn, final_p);
> >}
> >
> > The problem is in bitmap_bit_p. Random access to a large bitmap can be
> > very slow.
> >
> > I'm playing with a patch to expand the insns_with_changed_offsets
> > bitmap to an sbitmap, and will send a patch if this works better.
> 
> Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

Or temporarily use some rtx flag on the insns.  From what I can see,
in_struct on *INSN is right now only used during scheduling and from reorg
till the end of compilation, so for LRA, which sits between the two
scheduling passes, it might be possible to use that bit too.

Jakub
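
The flag-bit idea can be pictured as follows. This is a sketch with a made-up node type, not GCC's rtx: each object carries a spare bit, producers set it, and a single walk processes the marked objects and clears the bit so it is free for its other users again. All names here (struct node, changed, collect_marked) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn: a linked node with one spare flag bit.  */
struct node
{
  struct node *next;
  unsigned int uid;
  unsigned int changed : 1;     /* spare bit borrowed as a "needs work" mark */
};

/* Walk the chain, record the UID of every marked node in OUT (at most CAP
   entries), and clear the mark.  Returns the number of UIDs recorded.  */
static size_t
collect_marked (struct node *head, unsigned int *out, size_t cap)
{
  size_t n = 0;
  for (struct node *p = head; p != NULL; p = p->next)
    if (p->changed && n < cap)
      {
        out[n++] = p->uid;
        p->changed = 0;
      }
  return n;
}
```

The membership test is then a field read on an object the loop already has in hand, with no side table to index at all.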

Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-10-08 Thread Richard Guenther

On Fri, 5 Oct 2012, Steven Bosscher wrote:

> On Fri, Sep 14, 2012 at 2:26 PM, Richard Guenther  wrote:
> > If you can figure out a better name for the function we should
> > probably move it to cfganal.c
> 
> It looks like my previous e-mail about this got lost
> somehow, so retry:
> 
> Your my_rev_post_order_compute is simply inverted_post_order_compute.
> The only difference is that you'll want to ignore EXIT_BLOCK, which is
> always added to the list by inverted_post_order_compute.

Indeed.  inverted_post_order_compute seems to handle a CFG without
infinite-loop and noreturns connected to exit though.  Possibly
that's why it doesn't care for not handling entry/exit.

I'm testing a patch to use inverted_post_order_compute from PRE.

Richard.
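
As a rough picture of what "inverted post order" means here: a depth-first walk over predecessor edges starting at the exit block, emitting each block after its predecessors, with the exit block itself left out as discussed. The sketch below uses a toy CFG representation with made-up names (struct cfg, visit, inverted_post_order) and is not the cfganal.c implementation. In particular it only reaches blocks from which the exit is reachable, so blocks in infinite loops or ending in noreturn calls are simply not visited.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16
#define MAX_PREDS 4

/* Hypothetical CFG: each block lists the indices of its predecessors.  */
struct cfg
{
  int n_blocks;
  int n_preds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_PREDS];
};

/* Depth-first walk over predecessor edges; a block is appended to ORDER
   only after all blocks reachable through its predecessors.  */
static void
visit (const struct cfg *g, int bb, int exit_bb, char *seen, int *order,
       int *n)
{
  seen[bb] = 1;
  for (int i = 0; i < g->n_preds[bb]; i++)
    if (!seen[g->preds[bb][i]])
      visit (g, g->preds[bb][i], exit_bb, seen, order, n);
  if (bb != exit_bb)            /* leave the exit block out of the result */
    order[(*n)++] = bb;
}

/* Post-order of the reversed CFG, starting from EXIT_BB.  Returns the
   number of blocks written to ORDER.  */
static int
inverted_post_order (const struct cfg *g, int exit_bb, int *order)
{
  char seen[MAX_BLOCKS] = { 0 };
  int n = 0;
  assert (g->n_blocks <= MAX_BLOCKS);
  visit (g, exit_bb, exit_bb, seen, order, &n);
  return n;
}
```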

Re: vec_cond_expr adjustments

2012-10-08 Thread Richard Guenther

On Fri, Oct 5, 2012 at 5:01 PM, Marc Glisse  wrote:
> [I am still a little confused, sorry for the long email...]
>
>
> On Tue, 2 Oct 2012, Richard Guenther wrote:
>
> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
> +{
> +  int count = VECTOR_CST_NELTS (op0);
> +  tree *elts =  XALLOCAVEC (tree, count);
> +  gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
> +
> +  for (int i = 0; i < count; i++)
> +   {
> + tree elem_type = TREE_TYPE (type);
> + tree elem0 = VECTOR_CST_ELT (op0, i);
> + tree elem1 = VECTOR_CST_ELT (op1, i);
> +
> + elts[i] = fold_relational_const (code, elem_type,
> +  elem0, elem1);
> +
> + if(elts[i] == NULL_TREE)
> +   return NULL_TREE;
> +
> + elts[i] = fold_negate_const (elts[i], elem_type);



>>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>>> or use STORE_FLAG_VALUE here.  With the above you try to map
>>>> {0, 1} to {0, -1} which is only true if the operation on the element types
>>>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>>
>>>
>>> Er, seems to me that constant folding of a scalar comparison in the
>>> front/middle-end only returns {0, 1}.
>
> [and later]
>
>> I'd say adjust your fold-const patch to not negate the scalar result
>> but build a proper -1 / 0 value based on integer_zerop().
>
>
> I don't mind doing it that way, but I would like to understand first.
> LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So negating
> should be the same as testing with integer_zerop to build -1 or 0. Is it
> just a matter of style (then I am ok), or am I missing a reason which makes
> the negation wrong?

Just a matter of style.  Negating is a lot less descriptive for the actual
set of return values we produce.
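
The style point can be shown on plain arrays. The sketch below is not the fold-const.c patch; fold_vec_lt is a made-up name and the vectors are ordinary int arrays. It builds each lane of the result explicitly as 0 or -1 from the scalar truth value, which says directly what the set of results is, instead of negating a 0/1 value.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise a < b producing 0 or -1 per lane, built explicitly from the
   scalar truth value rather than by negating a 0/1 result.  */
static void
fold_vec_lt (const int *a, const int *b, int *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = (a[i] < b[i]) ? -1 : 0;
}
```

For the inputs { 1, 2, 3, 4 } and { 4, 3, 2, 1 } used later in this thread, the result is { -1, -1, 0, 0 }, the same values that negating the scalar 0/1 results would give.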

>> The point is we need to define some semantics for vector comparison
>> results.
>
>
> Yes. I think a documentation patch should come first: generic.texi is
> missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't mention
> vectors. But before that we need to decide what to put there...
>
>
>> One variant is to make it target independent which in turn
>> would inhibit (or make it more difficult) to exploit some target features.
>> You for example use {0, -1} for truth values - probably to exploit target
>> features -
>
>
> Actually it was mostly because that is the meaning in the language. OpenCL
> says that a<b is a vector of 0 and -1, and that a?b:c only looks at the MSB of
> the elements in the condition. The fact that it matches what some targets do
> is a simple consequence of the fact that OpenCL was based on what hardware
> already did.

Yes, it seems that the {0, -1} choice is most reasonable for GENERIC.  So
let's document that.

>
>> even though the most natural middle-end way would be to
>> use {0, 1} as for everything else
>
>
> I agree that it would be natural and convenient in a number of places.
>
>
>> (caveat: there may be both signed and unsigned bools, we don't allow
>> vector components with non-mode precision, thus you could argue that a
>> signed bool : 1 is just "sign-extended" for your solution).
>
>
> Not sure how that would translate in the code.
>
>
>> A different variant is to make it target dependent to leverage
>> optimization opportunities
>
>
> That's an interesting possibility...
>
>
>> that's why STORE_FLAG_VALUE exists.
>
>
> AFAICS it only appears when we go from gimple to rtl, not before (and there
> is already a VECTOR_STORE_FLAG_VALUE, although no target defines it). Which
> doesn't mean we couldn't make it appear earlier for vectors.
>
>
>> For example with vector comparisons a < v result, when
>> performing bitwise operations on it, you either have to make the target
>> expand code to produce {0, -1} even if the natural compare instruction
>> would, say, produce {0, 0x8} - or not constrain the possible values
>> of its result (like forwprop would do with your patch).  In general we
>> want constant folding to yield the same results as if the HW carried
>> out the operation to make -O0 code not diverge from -O1.  Thus,
>>
>> v4si g;
>> int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }
>>
>> should not assign different values to g dependent on constant propagation
>> performed or not.
>
>
> That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
> your target likes it or not. Depending on how things are handled,
> comparisons could be constrained internally to only appear (possibly
> indirectly) in the first argument of a vec_cond_expr.

Yes, I realized that later.

>
>> The easiest way out is something like STORE_FLAG_VALUE
>> if there does not exist a middle-end choice for vector true / false
>> components
>> that can be easily generated from what the target produces.
>>
>> Like if you perform a FP comparison
>>
>> int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>>
>> you ge

Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Jan Hubicka

> On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen  wrote:
> > Attached is the updated patch. Yes, if we add a VRP pass before
> > profile pass, this patch would be unnecessary. Should we add a VRP
> > pass?
> 
> No, we don't want VRP in early optimizations.

I am not quite sure about that.  VRP
 1) makes branch prediction work better by doing jump threading early
 2) is, after FRE, most effective tree pass on removing code by my profile
statistics.

But that would require more analysis.
The patch is OK.
Honza

Re: patch to fix constant math - second small patch

2012-10-08 Thread Richard Guenther

On Sat, Oct 6, 2012 at 12:48 AM, Kenneth Zadeck
 wrote:
> This patch adds machinery to genmodes.c so that largest possible sizes of
> various data structures can be determined at gcc build time.  These
> functions create 3 symbols that are available in insn-modes.h:
> MAX_BITSIZE_MODE_INT - the bitsize of the largest int.
> MAX_BITSIZE_MODE_PARTIAL_INT - the bitsize of the largest partial int.
> MAX_BITSIZE_MODE_ANY_INT - the largest bitsize of any kind of int.

Ok.  Please document these macros in rtl.texi.

Richard.
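
A sketch of how such build-time maxima are typically consumed: sizing a fixed buffer so that no constant of any integer mode can overflow it. The numeric values below are assumptions chosen for illustration (the real ones are generated into insn-modes.h for the configured target), and MAX_WIDE_WORDS and struct wide_value are made-up names.

```c
#include <assert.h>

/* Assumed values for illustration only; the real ones come from the build.  */
#define HOST_BITS_PER_WIDE_INT 64
#define MAX_BITSIZE_MODE_ANY_INT 128

/* Number of host-wide words needed to hold the widest integer mode,
   rounding up.  */
#define MAX_WIDE_WORDS \
  ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
   / HOST_BITS_PER_WIDE_INT)

/* Hypothetical fixed-size constant buffer sized at compile time.  */
struct wide_value
{
  unsigned short len;           /* words actually in use */
  long long val[MAX_WIDE_WORDS];
};
```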

Re: patch to fix constant math - third small patch

2012-10-08 Thread Richard Guenther

On Sat, Oct 6, 2012 at 5:55 PM, Kenneth Zadeck  wrote:
> This is the third patch in the series of patches to fix constant math.
> this one changes some predicates at the rtl level to use the new predicate
> CONST_SCALAR_INT_P.
> I did not include a few that were tightly intertwined with other changes.
>
> Not all of these changes are strictly mechanical.   Richard, when reviewing
> this had me make additional changes to remove what he thought were latent
> bugs at the rtl level.   However, it appears that the bugs were not latent.
> I do not know what is going on here but i am smart enough to not look a gift
> horse in the mouth.
>
> All of this was done on the same machine with no changes and identical
> configs.  It is an x86-64 with ubuntu 12-4.
>
> ok for commit?

Patch missing, but if it's just mechanical changes and introduction
of CONST_SCALAR_INT_P consider it pre-approved.

Richard.

> in the logs below, gbBaseline is a trunk from friday and the gbWide is the
> same revision but with my patches.  Some of this like gfortran.dg/pr32627 is
> obviously flutter, but the rest does not appear to be.
>
> =
> heracles:~/gcc(13) gccBaseline/contrib/compare_tests
> gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log
> New tests that PASS:
>
> gcc.dg/builtins-85.c scan-assembler mysnprintf
> gcc.dg/builtins-85.c scan-assembler-not __chk_fail
> gcc.dg/builtins-85.c (test for excess errors)
>
>
> heracles:~/gcc(14) gccBaseline/contrib/compare_tests
> gbBaseline/gcc/testsuite/gfortran/gfortran.log
> gbWide/gcc/testsuite/gfortran/gfortran.log
> New tests that PASS:
>
> gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test for
> excess errors)
> gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess errors)
> gfortran.dg/pr32627.f03  -Os  (test for excess errors)
> gfortran.dg/pr32635.f  -O0  execution test
> gfortran.dg/pr32635.f  -O0  (test for excess errors)
> gfortran.dg/substr_6.f90  -O2  (test for excess errors)
>
> Old tests that passed, that have disappeared: (Eeek!)
>
> gfortran.dg/pr32627.f03  -O1  (test for excess errors)
> gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops
> -finline-functions  (test for excess errors)
> gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
> gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
> Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as
> tool-and-target-specific interface file.
>
> === g++ Summary ===
>
> # of expected passes            49793
> # of expected failures          284
> # of unsupported tests          601
>
> runtest completed at Fri Oct  5 16:10:20 2012
> heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log Using
> /usr/share/dejagnu/config/unix.exp as generic interface file for target.
> Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as
> tool-and-target-specific interface file.
>
> === g++ Summary ===
>
> # of expected passes            50472
> # of expected failures          284
> # of unsupported tests          613
>
> runtest completed at Fri Oct  5 19:51:50 2012
>
>
>
>
>

Re: Fixup INTEGER_CST

2012-10-08 Thread Jan Hubicka

> On Sun, Oct 7, 2012 at 7:22 PM, Jan Hubicka  wrote:
> >> On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka  wrote:
> >> > Hi,
> >> > I added a sanity check that after fixup all types that lost in the
> >> > merging are really dead.  And it turns out we have some zombies around.
> >> >
> >> > INTEGER_CST needs special care because it is special-cased by the
> >> > streamer.  We also do not want to do in-place modifications on it
> >> > because that would corrupt the hashtable used by tree.c's sharing code.
> >> >
> >> > Bootstrapped/regtested x86_64-linux, OK?
> >>
> >> No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
> >> want to fixup
> >> them where they end up used unfixed.
> >
> > Erm, I think it is what the patch does?
> 
> Ah, indeed.
> 
> > It replaces pointers to integer_cst with type that did not survive by 
> > pointer
> > to new integer_cst. (with the optimization that INTEGER_CST with overflow
> > is changed in place because it is allowed to do so).
> 
> Btw ...
> 
> >> > @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
> >> >LTO_FIXUP_TREE (t->type_non_common.binfo);
> >> >
> >> >LTO_FIXUP_TREE (TYPE_CONTEXT (t));
> >> > +
> >> > +  if (TREE_CODE (t) == METHOD_TYPE)
> >> > +TYPE_METHOD_BASETYPE (t);
> >> > +  if (TREE_CODE (t) == OFFSET_TYPE)
> >> > +TYPE_OFFSET_BASETYPE (t);
> 
> that looks like a no-op to me ... (both are TYPE_MAXVAL which
> is already fixed up).

Ah, indeed.  They were the result of experimenting with the stale pointers to
the obsolete types and field decls.  I now understand where they come from.
The reason is twofold.

  1) After merging records we replace field decls in the cache
 by new ones.  This however does not mean that they die, because
 the existing pointers to them are not replaced.
 I have a WIP patch for that, which however requires one extra pass
 over the list of all trees.
  2) As we query the type_hash while we are rewriting the types,
 we run into instability of the hashtable. This manifests itself
 as an ICE when one adds a sanity check that while merging function
 types their arg types are equivalent, too.
 This ICEs compiling e.g. sqlite but I did not really manage to
 reduce this.  I tracked it down to the argument type being inserted
 into gimple_type_hash but at the time we query the new argument type,
 the original is no longer found even though their hashes are equivalent.
 The problem is hidden when things fit into the leader cache,
 so one needs a rather big testcase.

So I tried to register all gimple types first, using TREE_VISITED within
the merging code to mark that a type is not a leader and then TREE_CHAIN
to point to the leader.  This avoids the need to re-query the hashtable
from the later fixups.  We only look for types with TREE_VISITED
and replace them by TREE_CHAIN.
This has two passes.  First we compute the main variants and mark
field_decls and type_decls for merging, and in the last pass we finally do
the fixup on what remained in the table.

This allows me to poison pointers in the removed types in a way
so the GGC would ICE if they stayed reachable.
I however need the extra pass because
 1) I cannot update the type_decls/field_decls while registering
types or I run into the hash table problems
 2) I cannot merge the second two passes because at the time
I find type/field decls equivalent there may be earlier pointers
to them.

Honza
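
The marking scheme described in this message can be sketched with a toy node type. This is not the LTO code: struct type_node, mark_merged and fixup_slot are made-up names standing in for a tree, the TREE_VISITED bit and the TREE_CHAIN pointer. The point is that a later fixup needs only the node itself to find the leader, so the hash table is never queried while types are being rewritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node: VISITED marks a type that lost the merge and
   CHAIN then points at the surviving leader.  */
struct type_node
{
  unsigned int visited : 1;
  struct type_node *chain;
};

/* Record that LOSER was merged into LEADER.  */
static void
mark_merged (struct type_node *loser, struct type_node *leader)
{
  loser->visited = 1;
  loser->chain = leader;
}

/* Fix up one pointer slot: if it refers to a merged type, redirect it to
   the leader without consulting any hash table.  */
static void
fixup_slot (struct type_node **slot)
{
  if (*slot != NULL && (*slot)->visited)
    *slot = (*slot)->chain;
}
```

The sketch follows a single hop, on the assumption that a leader is never itself marked as merged.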

Re: Check that unlinked uses do not contain ssa-names when renaming.

2012-10-08 Thread Richard Guenther

On Sun, Oct 7, 2012 at 12:44 PM, Tom de Vries  wrote:
> Richard,
>
> attached patch checks that unlinked uses do not contain ssa-names when
> renaming.
>
> This assert triggers when compiling (without the fix) the PR54735 example.
>
> AFAIU, it was due to chance that we caught the PR54735 bug by hitting the
> verification failure, because the new vdef introduced by renaming happened to
> be the same name as the ssa name referenced in the invalid unlinked use (in
> terms of maybe_replace_use: rdef == use).
>
> The assert from this patch catches all cases that an unlinked use contains an
> ssa-name.
>
> Bootstrapped and reg-tested on x86_64 (Ada inclusive).
>
> OK for trunk?

I don't think that is exactly what we should assert here ... (I thought about
adding checking myself ...).  What we'd want to assert is that before
any new DEF is registered (which may re-allocate an SSA name)
no uses with SSA_NAME_IN_FREELIST appear.  Thus, a light verification
pass would be necessary at the beginning of update_ssa
(which I queued onto my TODO list ...).  We'd want that anyway, for
example to catch the case where a non-virtual operand is partially renamed.

Thanks,
Richard.

> Thanks,
> - Tom
>
> 2012-10-07  Tom de Vries  
>
> * tree-into-ssa.c (maybe_replace_use): Add assert.

Re: patch to fix constant math

2012-10-08 Thread Richard Guenther

On Sun, Oct 7, 2012 at 4:58 PM, Kenneth Zadeck  wrote:
>
> On 10/07/2012 09:19 AM, Richard Guenther wrote:
>>>
>>> >In fact, you could argue that the tree level did it wrong (not that i am
>>> >suggesting to change this).   But it makes me think what was going on when
>>> >the decision to make TYPE_PRECISION be an INT_CST rather than just a HWI was
>>> >made.   For that there is an implication that it could never take more than
>>> >a HWI since no place in the code even checks TREE_INT_CST_HIGH for these.
>>
>> Well - on the tree level we now always have two HWIs for all INTEGER_CSTs.
>> If we can, based on the size of the underlying mode, reduce that to one
>> HWI we already win something.  If we add an explicit length to allow a
>> smaller encoding for larger modes (tree_base conveniently has an available
>> 'int' for this ...) then we'd win in more cases.
>> Thus, is CONST_INT really necessarily better than optimized CONST_WIDE_INT
>> storage?
>
> i have to admit, that looking at these data structures gives me a headache.
> This all looks like something that Rube Goldberg would have invented had he
> done object oriented design  (Richard S did not know who Rube Goldberg was
> when i mentioned this name to him a few years ago since this is an American
> thing, but the british had their own equivalent and I assume the germans do
> too.).
>
> i did the first cut of changing the rtl level structure and Richard S threw
> up on it and suggested what is there now, which happily (for me) i was able
> to get mike to implement.
>
> mike also did the tree level version of the data structures for me.   i will
> make sure he used that left over length field.
>
> The bottom line is that you most likely just save the length, but that is a
> big percent of something this small.  Both of rtl ints have a mode, so if we
> can make that change later, it will be space neutral.

Yes.

Btw, Richard's idea of conditionally placing the length field in
rtx_def looks like overkill to me.  These days we'd merely want to
optimize for 64bit hosts, thus unconditionally adding a 32 bit
field to rtx_def looks ok to me (you can wrap that inside a union to
allow both descriptive names and eventual different use - see what
I've done to tree_base).

Richard.
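
The "wrap it inside a union" suggestion can be pictured like this. The layout below is a made-up header, not rtx_def or tree_base, and the member names are assumptions. The 32-bit slot gets one descriptive name per use while occupying the same storage, so adding a second use later does not grow the struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node header: a 32-bit slot wrapped in a union so the same
   storage can carry a descriptive name for each use.  */
struct node_header
{
  uint16_t code;
  uint16_t flags;
  union
  {
    uint32_t length;            /* number of words, for variable-size constants */
    uint32_t other_use;         /* placeholder for some later, different use */
  } u;
};
```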

Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 11:04 AM, Jan Hubicka  wrote:
>> On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen  wrote:
>> > Attached is the updated patch. Yes, if we add a VRP pass before
>> > profile pass, this patch would be unnecessary. Should we add a VRP
>> > pass?
>>
>> No, we don't want VRP in early optimizations.
>
> I am not quite sure about that.  VRP
>  1) makes branch prediction work better by doing jump threading early

Well ... but jump threading may need basic-block duplication which may
increase code size.  Also VRP and FRE have pass ordering issues.

>  2) is, after FRE, most effective tree pass on removing code by my profile
> statistics.

We also don't have DSE in early opts.  I don't want to end up with the
situation that we do everything in early opts ... we should do _less_ there
(but eventually iterate properly when processing cycles).

Richard.

> But that would require more analysis.
> The patch is OK.
> Honza

Re: vec_cond_expr adjustments

2012-10-08 Thread Marc Glisse


On Mon, 8 Oct 2012, Richard Guenther wrote:


>> VEC_COND_EXPR is more complicated. We could for instance require that it
>> takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
>> thing are equivalent). Which would leave to decide what the expansion of
>> vec_cond_expr passes to the targets when the first argument is not a
>> comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
>> opencl). One issue is that targets wouldn't know if it was a dummy
>> comparison that can safely be ignored because the other part is the result
>> of logical operations on comparisons (thus composed of -1 and 0) or a
>> genuine comparison with an arbitrary vector, so a new optimization would be
>> needed (in the back-end I guess or we would need an alternate instruction to
>> vcond) to detect if a vector is a "signed boolean" vector.
>> We could instead say that vec_cond_expr really follows OpenCL's semantics
>> and looks at the MSB of each element. I am not sure that would change much,
>> it would mostly delay the appearance of <0 to RTL expansion time (and thus
>> make gimple slightly lighter).
>
> I think we should delay the decision on how to optimize this.  It's indeed
> not trivial and the GIMPLE middle-end aggressively forwards feeding
> comparisons into the VEC_COND_EXPR expressions already (somewhat
> defeating any CSE that might be possible here) in forwprop.


Thanks for going through the long email :-)

What does that imply for the first argument of VEC_COND_EXPR? Currently, 
the expander asserts that it is a comparison, but that is not reflected in 
the gimple checkers.


If we document that VEC_COND_EXPR takes a vector of -1 and 0 (which is the
case for a comparison), I don't think it prevents us from later relaxing that
to <0 or !=0. But then I don't know how to handle expansion when the
argument is neither a comparison (vcond) nor a constant (vec_merge? I
haven't tried but that should be doable); I would have to pass <0 or !=0
to the target. So is the best choice to document that VEC_COND_EXPR takes
as first argument a comparison and make gimple checking reflect that?
(seems sad, but at least that would tell me what I can/can't do)


By the way, since we are documenting comparisons as returning 0 and -1, 
does that bring back the integer_truep predicate?


--
Marc Glisse

Re: Fixup INTEGER_CST

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 11:18 AM, Jan Hubicka  wrote:
>> On Sun, Oct 7, 2012 at 7:22 PM, Jan Hubicka  wrote:
>> >> On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka  wrote:
>> >> > Hi,
>> >> > I added a sanity check that after fixup all types that lost in the
>> >> > merging are
>> >> > really dead.  And it turns out we have some zombies around.
>> >> >
>> >> > INTEGER_CST needs special care because it is special cased by the
>> >> > streamer.  We also
>> >> > do not want to do inplace modifications on it because that would
>> >> > corrupt the hashtable
>> >> > used by tree.c's sharing code
>> >> >
>> >> > Bootstrapped/regtested x86_64-linux, OK?
>> >>
>> >> No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
>> >> want to fixup
>> >> them where they end up used unfixed.
>> >
>> > Erm, I think it is what the patch does?
>>
>> Ah, indeed.
>>
>> > It replaces pointers to integer_cst with type that did not survive by 
>> > pointer
>> > to new integer_cst. (with the optimization that INTEGER_CST with overflow
>> > is changed in place because it is allowed to do so).
>>
>> Btw ...
>>
>> >> > @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
>> >> >LTO_FIXUP_TREE (t->type_non_common.binfo);
>> >> >
>> >> >LTO_FIXUP_TREE (TYPE_CONTEXT (t));
>> >> > +
>> >> > +  if (TREE_CODE (t) == METHOD_TYPE)
>> >> > +TYPE_METHOD_BASETYPE (t);
>> >> > +  if (TREE_CODE (t) == OFFSET_TYPE)
>> >> > +TYPE_OFFSET_BASETYPE (t);
>>
>> that looks like a no-op to me ... (both are TYPE_MAXVAL which
>> is already fixed up).
>
> Ah, indeed.  They were the result of experimenting with the stale pointers to the
> obsolete types and field decls.  I now understand where they come from.  The
> reason is twofold.
>
>   1) after merging records we replace field decls in the cache
>  by new ones.  This however does not mean that they die, because
>  the existing pointers to them are not replaced.
>  I have a WIP patch for that, which however requires one extra pass
>  over the list of all trees.

Yes, I think this is also why we do

      /* ???  Not sure the above is all relevant in this
         path canonicalizing TYPE_FIELDS to that of the
         main variant.  */
      if (ix < i)
        lto_fixup_types (f2);
      streamer_tree_cache_insert_at (cache, f1, ix);

something I dislike as well and something we should try to address in a
more formal way.

>   2) As we query the type_hash while we are rewriting the types,
>  we run into instability of the hashtable. This manifests itself
>  as an ICE when one adds a sanity check that while merging function
>  types their arg types are equivalent, too.
>  This ICEs compiling e.g. sqlite but I did not really manage to
>  reduce this.  I tracked it down to the argument type being inserted
>  into gimple_type_hash but at the time we query the new argument type,
>  the original is no longer found even though their hashes are equivalent.
>  The problem is hidden when things fit into the leader cache,
>  so one needs a rather big testcase.

Ugh.  For reduction you can disable those caches though.  The above
means there is a disconnect between hashing and comparing.
Maybe it's something weird with the early out

  if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
    goto same_types;
?

> So I tried to register all gimple types first.  Use TREE_VISITED within
> the merging code to mark that a type is not a leader and then TREE_CHAIN
> to point to the leader.  This avoids the need to re-query the hashtable
> from the later fixups.  We only look for types with TREE_VISITED
> and replace them by TREE_CHAIN.

TREE_CHAIN is unused for types?  But we probably shouldn't add a new
use ...

> This has two passes.  First we compute the main variants and mark
> field_decls and type_decls for merging and in last pass we finally do
> fixup on what remained in the table.
>
> This allows me to poison pointers in the removed types in a way
> so the GGC would ICE if they stayed reachable.
> I however need the extra pass because
>  1) I can not update the type_decls/field_decls while registering
> types or I run into the hash table problems
>  2) I can not merge the second two passes because at the time
> I find type/field decls equivalent there may be earlier pointers
> to them.

You need to "merge" all trees reachable from the one you start at once
(what I'm working on from time to time - work per tree "SCC", in a DFS
walk).

Richard.

> Honza

Re: vec_cond_expr adjustments

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 11:34 AM, Marc Glisse  wrote:
> On Mon, 8 Oct 2012, Richard Guenther wrote:
>
>>> VEC_COND_EXPR is more complicated. We could for instance require that it
>>> takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
>>> thing are equivalent). Which would leave to decide what the expansion of
>>> vec_cond_expr passes to the targets when the first argument is not a
>>> comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
>>> opencl). One issue is that targets wouldn't know if it was a dummy
>>> comparison that can safely be ignored because the other part is the
>>> result
>>> of logical operations on comparisons (thus composed of -1 and 0) or a
>>> genuine comparison with an arbitrary vector, so a new optimization would
>>> be
>>> needed (in the back-end I guess or we would need an alternate instruction
>>> to
>>> vcond) to detect if a vector is a "signed boolean" vector.
>>> We could instead say that vec_cond_expr really follows OpenCL's semantics
>>> and looks at the MSB of each element. I am not sure that would change
>>> much,
>>> it would mostly delay the apparition of <0 to RTL expansion time (and
>>> thus
>>> make gimple slightly lighter).
>>
>>
>> I think we should delay the decision on how to optimize this.  It's indeed
>> not trivial and the GIMPLE middle-end aggressively forwards feeding
>> comparisons into the VEC_COND_EXPR expressions already (somewhat
>> defeating any CSE that might be possible here) in forwprop.
>
>
> Thanks for going through the long email :-)
>
> What does that imply for the first argument of VEC_COND_EXPR? Currently, the
> expander asserts that it is a comparison, but that is not reflected in the
> gimple checkers.

And I don't think we should reflect that in the gimple checkers, but rather fix up the
expander (transparently use p != 0 or p < 0).

> If we document that VEC_COND_EXPR takes a vector of -1 and 0 (which is the
> case for a comparison), I don't think it prevents from later relaxing that
> to <0 or !=0. But then I don't know how to handle expansion when the
> argument is neither a comparison (vcond) nor a constant (vec_merge? I
> haven't tried but that should be doable), I would have to pass <0 or !=0 to
> the target.

Yes.

> So is the best choice to document that VEC_COND_EXPR takes as
> first argument a comparison and make gimple checking reflect that? (seems
> sad, but at least that would tell me what I can/can't do)

No, that would just mean that in GIMPLE you'd add this p != 0 or p < 0.
And at some point in the future I really really want to push this embedded
expression to a separate statement so I have a SSA definition for it.

> By the way, since we are documenting comparisons as returning 0 and -1, does
> that bring back the integer_truep predicate?

Not sure, true would still be != 0 or all_onesp (all bits of the
precision are 1), no?
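
To make the candidate predicates concrete, here is a minimal sketch in
plain C.  It uses a scalar int32_t as a stand-in for one vector element,
and the helper names are made up for illustration; none of them are GCC
functions.  On masks that are exactly -1 or 0 the three tests agree; they
only diverge on other values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Element-wise a < b with vector-comparison semantics:
   -1 (all bits set) for true, 0 for false.  */
static int32_t
cmp_lt (int32_t a, int32_t b)
{
  return a < b ? -1 : 0;
}

/* "True" means any bit is set.  */
static bool
sel_nonzero (int32_t m)
{
  return m != 0;
}

/* "True" means the most significant bit is set (OpenCL-style).  */
static bool
sel_msb (int32_t m)
{
  return m < 0;
}

/* "True" means all bits of the precision are 1.  */
static bool
sel_all_ones (int32_t m)
{
  return m == -1;
}
```

For example, a mask element of 1 is true under != 0 only, and INT32_MIN
is true under != 0 and < 0 but not under the all-ones test.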

Richard.

> --
> Marc Glisse

Re: Fixup INTEGER_CST

2012-10-08 Thread Jan Hubicka

> >   2) As we query the type_hash while we are rewriting the types,
> >  we run into instability of the hashtable. This manifests itself
> >  as an ICE when one adds a sanity check that while merging function
> >  types their arg types are equivalent, too.
> >  This ICEs compiling e.g. sqlite but I did not really manage to
> >  reduce this.  I tracked it down to the argument type being inserted
> >  into gimple_type_hash but at the time we query the new argument type,
> >  the original is no longer found even though their hashes are equivalent.
> >  The problem is hidden when things fit into the leader cache,
> >  so one needs a rather big testcase.
> 
> Ugh.  For reduction you can disable those caches though.  The above
> means there is a disconnect between hashing and comparing.
> Maybe it's something weird with the early out
> 
> >   if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
> >     goto same_types;
> ?

Well, the problem goes away when you process all types before changing, so I
think it really is instability of hash table computation. But I am not sure how
to test for it.
Even disabling the caching and recomputing after gimple_register_type leads
to different results.
> 
> > So I tried to register all gimple types first.  Use TREE_VISITED within
> > the merging code to mark that a type is not a leader and then TREE_CHAIN
> > to point to the leader.  This avoids the need to re-query the hashtable
> > from the later fixups.  We only look for types with TREE_VISITED
> > and replace them by TREE_CHAIN.
> 
> TREE_CHAIN is unused for types?  But we probably shouldn't add a new
> use ...

It is used, but unused for type merging.  
 /* Nodes are chained together for many purposes.
   Types are chained together to record them for being output to the debugger
   (see the function `chain_type'). */

We know that types that lost merging will not be used later, so we can
overwrite pointers we don't need.

When one removes the type from variant list during registering, one can
also use TYPE_MAIN_VARIANT, for example.
> 
> > This has two passes.  First we compute the main variants and mark
> > field_decls and type_decls for merging and in last pass we finally do
> > fixup on what remained in the table.
> >
> > This allows me to poison pointers in the removed types in a way
> > so the GGC would ICE if they stayed reachable.
> > I however need the extra pass because
> >  1) I can not update the type_decls/field_decls while registering
> > types or I run into the hash table problems
> >  2) I can not merge the second two passes because at the time
> > I find type/field decls equivalent there may be earlier pointers
> > to them.
> 
> You need to "merge" all trees reachable from the one you start at once
> (what I'm working on from time to time - work per tree "SCC", in a DFS
> walk).

Yep, doing things per-SCC is definitely a good idea.

It will also give a chance to improve the hash itself.  If you process in SCC
order you know that all references outside SCC have already leaders set and you
can hash their addresses rather than using the weak hash.
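
As a rough sketch of the ordering property this relies on, here is
Tarjan's SCC algorithm on a toy graph.  It is not the LTO streamer code:
the graph representation, MAX_NODES and all function names are
hypothetical.  Tarjan completes an SCC only after every SCC reachable
from it, so any reference leaving an SCC points into one that was
numbered earlier.

```c
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical toy graph: adj[v][0..deg[v]-1] are the nodes v refers to.  */
struct graph
{
  int n;
  int deg[MAX_NODES];
  int adj[MAX_NODES][MAX_NODES];
};

struct scc_state
{
  const struct graph *g;
  int index[MAX_NODES];     /* DFS number, 0 means unvisited */
  int lowlink[MAX_NODES];   /* smallest DFS number reachable on the stack */
  int on_stack[MAX_NODES];
  int stack[MAX_NODES];
  int sp;
  int next_index;
  int scc_of[MAX_NODES];    /* SCC id per node, in completion order */
  int n_sccs;
};

static void
scc_visit (struct scc_state *s, int v)
{
  s->index[v] = s->lowlink[v] = ++s->next_index;
  s->stack[s->sp++] = v;
  s->on_stack[v] = 1;

  for (int i = 0; i < s->g->deg[v]; i++)
    {
      int w = s->g->adj[v][i];
      if (s->index[w] == 0)
        {
          /* Tree edge: recurse, then inherit the child's lowlink.  */
          scc_visit (s, w);
          if (s->lowlink[w] < s->lowlink[v])
            s->lowlink[v] = s->lowlink[w];
        }
      else if (s->on_stack[w] && s->index[w] < s->lowlink[v])
        /* Back or cross edge into the SCC currently being built.  */
        s->lowlink[v] = s->index[w];
    }

  /* V is the root of an SCC: pop its members and give them one id.  */
  if (s->lowlink[v] == s->index[v])
    {
      int w;
      do
        {
          w = s->stack[--s->sp];
          s->on_stack[w] = 0;
          s->scc_of[w] = s->n_sccs;
        }
      while (w != v);
      s->n_sccs++;
    }
}

/* Fill SCC_OF with an SCC id per node and return the number of SCCs.
   Ids are assigned so that referenced SCCs get smaller ids.  */
static int
compute_sccs (const struct graph *g, int scc_of[MAX_NODES])
{
  struct scc_state s = { 0 };
  s.g = g;
  for (int v = 0; v < g->n; v++)
    if (s.index[v] == 0)
      scc_visit (&s, v);
  for (int v = 0; v < g->n; v++)
    scc_of[v] = s.scc_of[v];
  return s.n_sccs;
}
```

Processing SCCs in increasing id order then means every pointer leaving
the current SCC already has its final target, which is what would make
hashing the addresses of outside references possible.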

I would really love to see this done.  After updating Mozilla we now need 10GB
of RAM and about 18 minutes for merging (they merged in a new JIT that apparently
plays badly with our types). This makes any development/testing difficult.

Honza

Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Jan Hubicka

> On Mon, Oct 8, 2012 at 11:04 AM, Jan Hubicka  wrote:
> >> On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen  wrote:
> >> > Attached is the updated patch. Yes, if we add a VRP pass before
> >> > profile pass, this patch would be unnecessary. Should we add a VRP
> >> > pass?
> >>
> >> No, we don't want VRP in early optimizations.
> >
> > I am not quite sure about that.  VRP
> >  1) makes branch prediction work better by doing jump threading early
> 
> Well ... but jump threading may need basic-block duplication which may
> increase code size.  Also VRP and FRE have pass ordering issues.
> 
> >  2) is, after FRE, most effective tree pass on removing code by my profile
> > statistics.
> 
> We also don't have DSE in early opts.  I don't want to end up with the
> situation that we do everything in early opts ... we should do _less_ there
> (but eventually iterate properly when processing cycles).

Yep, I am not quite sure about the most sane variant.  Missed simple jump threading
in early opts definitely confuses both profile estimates and inline size
estimates.  But I am also not thrilled by adding more passes to early opts at
all.  Also last time I looked into this, CCP missed a lot of CCP opportunities,
making VRP artificially look more useful.

I have a patch that somewhat improves profile updating after jump threading (i.e.
re-does the profile for simple cases), but jump threading is still the most
common reason for the profile to become inconsistent after expand.

On a related note, with -fprofile-report I can easily track how much code
each pass in the queue removed.  I was thinking about running this on Mozilla
at -O1 and removing those passes that did almost nothing.  Those are mostly
re-run passes, both at the Gimple and RTL level. Our pass manager is not terribly
friendly for controlling passes per repetition.

With the introduction of the -Og pass queue, do you think introducing an -O1 pass queue
for late tree passes (that will be quite short) is sane? What about the RTL
level?  I guess we can split the queues for RTL optimizations, too.
All optimization passes prior to register allocation are sort of optional
and I guess there are also -Og candidates.

I however find the three-times-duplicated queues a bit uncool, too, but I guess
it is most compatible with the PM organization.

At -O3 the most effective passes on combine.c
are:

cfg (because of cfg cleanup) -1.5474%
Early inlining -0.4991%
FRE -7.9369%
VRP -0.9321% (if run early), ccp does -0.2273%
tailr -0.5305%

After IPA
copyrename -2.2850% (it packs cleanups after inlining)
forwprop -0.5432%
VRP -0.9700% (if rerun after early passes, otherwise it is about 2%)
PRE -2.4123%
DOM -0.5182%

RTL passes
into_cfglayout -3.1400% (i.e. first cleanup_cfg)
fwprop1 -3.0467%
cprop -2.7786%
combine -3.3346%
IRA -3.4912% (i.e. the cost model prefers hard regs)
bbro -0.9765%

The numbers on tramp3d and the LTO cc1 binary are not that different.
Honza

RE: [Patch] Fix PR53397

2012-10-08 Thread Kumar, Venkataramanan

Hi Richard,

I have incorporated your comments. 

> Yes, call dump_mem_ref then, instead of repeating parts of its body.

The reference object is not yet created at the place where we check for invariance. It
is still a tree expression.  I created a common function and used it in all places
to dump the "step", "base" and "delta" values of the memory reference being
analyzed.

Please find the modified patch attached.

GCC regression "make check -k" passes with x86_64-unknown-linux-gnu.

Regards,
Venkat.

-Original Message-
From: Richard Guenther [mailto:richard.guent...@gmail.com] 
Sent: Thursday, October 04, 2012 6:26 PM
To: Kumar, Venkataramanan
Cc: Richard Guenther; gcc-patches@gcc.gnu.org
Subject: Re: [Patch] Fix PR53397

On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan 
 wrote:
> Hi Richi,
>
> (Snip)
>> + (!cst_and_fits_in_hwi (step))
>> +{
>> +  if( loop->inner != NULL)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +{
>> +  fprintf (dump_file, "Reference %p:\n", (void *) ref);
>> +  fprintf (dump_file, "(base " );
>> +  print_generic_expr (dump_file, base, TDF_SLIM);
>> +  fprintf (dump_file, ", step ");
>> +  print_generic_expr (dump_file, step, TDF_TREE);
>> +  fprintf (dump_file, ")\n");
>
> No need to repeat this - all references are dumped when we gather them.
> (Snip)
>
> The dumping happens at "record_ref" which is called after these statements to 
> record these references.
>
> When the step is invariant  we return from the function without recording the 
> references.
>
>  so I thought of dumping the references here.
>
> Is there a cleaner way to dump the references at one place?

Yes, call dump_mem_ref then, instead of repeating parts of its body.

Richard.

> Regards,
> Venkat.
>
>
>
> -Original Message-
> From: Richard Guenther [mailto:rguent...@suse.de]
> Sent: Tuesday, October 02, 2012 5:42 PM
> To: Kumar, Venkataramanan
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch] Fix PR53397
>
> On Mon, 1 Oct 2012, venkataramanan.ku...@amd.com wrote:
>
>> Hi,
>>
>> The below patch fixes the FFT/Scimark regression caused by useless 
>> prefetch generation.
>>
>> This fix tries to make prefetch less aggressive by prefetching arrays 
>> in the inner loop, when the step is invariant in the entire loop nest.
>>
>> GCC currently tries to prefetch invariant steps when they are in the 
>> inner loop. But does not check if the step is variant in outer loops.
>>
>> In the scimark FFT case, the trip count of the inner loop varies by a 
>> non constant step, which is invariant in the inner loop.
>> But the step variable is varying in outer loop. This makes inner loop 
>> trip count small (at run time varies sometimes as small as 1
>> iteration)
>>
>> Prefetching ahead x iteration when the inner loop trip count is 
>> smaller than x leads to useless prefetches.
>>
>> Flag used: -O3 -march=amdfam10
>>
>> Before
>> **  **
>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>> ** for details. (Results can be submitted to p...@nist.gov) **
>> **  **
>> Using   2.00 seconds min time per kenel.
>> Composite Score:  550.50
>> FFT Mflops:38.66(N=1024)
>> SOR Mflops:   617.61(100 x 100)
>> MonteCarlo: Mflops:   173.74
>> Sparse matmult  Mflops:   675.63(N=1000, nz=5000)
>> LU  Mflops:  1246.88(M=100, N=100)
>>
>>
>> After
>> **  **
>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>> ** for details. (Results can be submitted to p...@nist.gov) **
>> **  **
>> Using   2.00 seconds min time per kenel.
>> Composite Score:  639.20
>> FFT Mflops:   479.19(N=1024)
>> SOR Mflops:   617.61(100 x 100)
>> MonteCarlo: Mflops:   173.18
>> Sparse matmult  Mflops:   679.13(N=1000, nz=5000)
>> LU  Mflops:  1246.88(M=100, N=100)
>>
>> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu 
>> New tests that PASS:
>>
>> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c 
>> scan-tree-dump aprefetch "Issued prefetch"
>> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c 
>> scan-tree-dump aprefetch "loop variant step"
>> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
>> gcc.dg/pr53397-2.c (test for excess errors)
>>
>>
>> Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted.
>>
>> Ok to commit in trunk?
>>
>> regards,
>> Venkat
>>
>> gcc/ChangeLog
>> +2012-10-01  Venkataramanan Kumar  
>> +
>> +   * tree-ssa-loop-prefetch.c (gather_memory_references_ref):
>> +   Perform non constant step prefetching in inner loo

Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-08 Thread Richard Guenther

On Sat, Oct 6, 2012 at 11:34 AM, Jan Hubicka  wrote:
> Hi,
> I benchmarked the patch moving loop header copying and it is quite noticeable 
> win.
>
> Some testsuite updating is needed. In many cases it is just because the
> optimizations are now happening earlier.
> There are however a few testsuite failures I have trouble dealing with:
> ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c scan-tree-dump-times 
> vrp1 "Threaded jump" 3
> ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c 
> scan-tree-dump-times vrp1 "Jumps threaded: 1" 1
> ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c 
> scan-tree-dump-times vect "vectorized 1 loops" 2
> ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98  
> scan-tree-dump-times vrp1 "if " 1
> ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11  
> scan-tree-dump-times vrp1 "if " 1
>
> This is mostly about VRP losing its ability to thread some jumps from the
> duplicated loop header out of the loop across the loopback edge.  This seems 
> to
> be due to loop updating logic.  Do we care about these?

Yes, I think so.  At least we care that the optimized result is the same.

Can you elaborate on "due to loop updating logic"?

Can you elaborate on the def_split_header_continue_p change?  Which probably
should be tested and installed separately?
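
For reference, the difference between the old and new predicate in the
patch quoted below can be sketched on a simplified loop tree.  The struct
and function names here are hypothetical (GCC's struct loop, loop_depth
and loop_outer are not reproduced), and the bb != new_header check is left
out.  The depth-only test also accepts blocks from unrelated sibling
loops, while the ancestor walk accepts only loops nested in the new
header's loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified loop-tree node.  */
struct loop_node
{
  struct loop_node *outer;   /* enclosing loop, NULL for the root */
  int depth;                 /* nesting depth, 0 for the root */
};

/* Depth-only test, in the spirit of the old predicate: true for any
   loop at least as deep as HEADER_LOOP, related or not.  */
static bool
deep_enough_p (const struct loop_node *l, const struct loop_node *header_loop)
{
  return l->depth >= header_loop->depth;
}

/* Ancestor test, in the spirit of the revised predicate: true only if
   L is HEADER_LOOP itself or nested somewhere inside it.  */
static bool
nested_in_p (const struct loop_node *l, const struct loop_node *header_loop)
{
  if (l->depth < header_loop->depth)
    return false;
  for (; l != NULL; l = l->outer)
    if (l == header_loop)
      return true;
  return false;
}
```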

Thanks,
Richard.

> Honza
>
> Index: tree-ssa-threadupdate.c
> ===
> *** tree-ssa-threadupdate.c (revision 192123)
> --- tree-ssa-threadupdate.c (working copy)
> *** static bool
> *** 846,854 
>   def_split_header_continue_p (const_basic_block bb, const void *data)
>   {
> const_basic_block new_header = (const_basic_block) data;
> !   return (bb != new_header
> ! && (loop_depth (bb->loop_father)
> ! >= loop_depth (new_header->loop_father)));
>   }
>
>   /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
> --- 846,860 
>   def_split_header_continue_p (const_basic_block bb, const void *data)
>   {
> const_basic_block new_header = (const_basic_block) data;
> !   const struct loop *l;
> !
> !   if (bb == new_header
> !   || loop_depth (bb->loop_father) < loop_depth 
> (new_header->loop_father))
> ! return false;
> !   for (l = bb->loop_father; l; l = loop_outer (l))
> ! if (l == new_header->loop_father)
> !   return true;
> !   return false;
>   }
>
>   /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
> Index: testsuite/gcc.dg/unroll_2.c
> ===
> *** testsuite/gcc.dg/unroll_2.c (revision 192123)
> --- testsuite/gcc.dg/unroll_2.c (working copy)
> ***
> *** 1,5 
>   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
> ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
> -fenable-rtl-loop2_unroll" } */
>
>   unsigned a[100], b[100];
>   inline void bar()
> --- 1,5 
>   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
> ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
> -fenable-rtl-loop2_unroll -fno-tree-dominator-opts" } */
>
>   unsigned a[100], b[100];
>   inline void bar()
> Index: testsuite/gcc.dg/unroll_3.c
> ===
> *** testsuite/gcc.dg/unroll_3.c (revision 192123)
> --- testsuite/gcc.dg/unroll_3.c (working copy)
> ***
> *** 1,5 
>   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
> ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" 
> } */
>
>   unsigned a[100], b[100];
>   inline void bar()
> --- 1,5 
>   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
> ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
> -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo 
> -fno-tree-dominator-opts" } */
>
>   unsigned a[100], b[100];
>   inline void bar()
> Index: testsuite/gcc.dg/torture/pr23821.c
> ===
> *** testsuite/gcc.dg/torture/pr23821.c  (revision 192123)
> --- testsuite/gcc.dg/torture/pr23821.c  (working copy)
> ***
> *** 1,9 
>   /* { dg-do compile } */
>   /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */
> ! /* At -O1 DOM threads a jump in a non-optimal way which leads to
>  the bogus propagation.  */
> ! /* { dg-skip-if "" { *-*-* } { "-O1" } { "" } } */
> ! /* { dg-options "-fdump-tree-ivcanon-details" } */
>
>   int a[199];
>
> --- 1,8 
>   /* { dg-do compile } */
>   /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */
> ! /* DOM threads a jump in a non-o

[C++ Patch/RFC] PR 54194

2012-10-08 Thread Paolo Carlini


Hi,

in this PR submitter points out that in the -Wparentheses warning, for, eg,

char in[4]={0}, out[6];
out[1] = in[1] & 0x0F | ((in[3] & 0x3C) << 2);

warning: suggest parentheses around arithmetic in operand of ‘|’ 
[-Wparentheses]


the caret points to the end of the expression, i.e. the final closing
parenthesis, which is rather misleading, because the problem is actually
in the first operand of '|'. Ideally I guess one would like to somehow 
point to that first operand, but our infrastructure (shared with the C 
front-end, at the moment) isn't really ready to do that, and probably we 
would like to use a range (more than a caret) below the whole first 
operand (the problem isn't really with & per se). Considering also what 
we are already doing elsewhere, it seems to me that a straightforward 
and good improvement is obtained by passing to warn_about_parentheses 
the location of the outer operand (together with its code), as per the 
attached patchlet: then in the example the caret points to the actual 
'|' operator mentioned in the error message. Post 4.8.0 we can imagine 
further improvements...
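
For readers who want to see why the warning is about the first operand, a
small standalone sketch (the function names are made up, and the pragma
only silences the very warning under discussion so the snippet builds
cleanly): since & binds tighter than |, the expression from the PR
already means (in1 & 0x0F) | (...), and the warning merely asks for that
to be spelled out.

```c
#include <stdint.h>

/* Silence -Wparentheses here so the sketch also builds with -Werror.  */
#pragma GCC diagnostic ignored "-Wparentheses"

/* The expression as written in the PR.  */
static uint8_t
as_written (uint8_t in1, uint8_t in3)
{
  return in1 & 0x0F | ((in3 & 0x3C) << 2);
}

/* What the compiler actually parses: & is applied first.  */
static uint8_t
and_first (uint8_t in1, uint8_t in3)
{
  return (in1 & 0x0F) | ((in3 & 0x3C) << 2);
}

/* The reading the warning guards against: | applied first.  */
static uint8_t
or_first (uint8_t in1, uint8_t in3)
{
  return in1 & (0x0F | ((in3 & 0x3C) << 2));
}
```

With in1 = 0xAB and in3 = 0x3C the first two functions yield 0xFB, while
the third yields 0xAB.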


What do you think?

Thanks,
Paolo.

///

Index: cp/typeck.c
===
--- cp/typeck.c (revision 192130)
+++ cp/typeck.c (working copy)
@@ -3630,7 +3630,8 @@ build_x_binary_op (location_t loc, enum tree_code
   && !error_operand_p (arg2)
   && (code != LSHIFT_EXPR
  || !CLASS_TYPE_P (TREE_TYPE (arg1
-warn_about_parentheses (code, arg1_code, orig_arg1, arg2_code, orig_arg2);
+warn_about_parentheses (loc, code, arg1_code, orig_arg1,
+   arg2_code, orig_arg2);
 
   if (processing_template_decl && expr != error_mark_node)
 return build_min_non_dep (code, expr, orig_arg1, orig_arg2);
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 192130)
+++ c-family/c-common.c (working copy)
@@ -10428,7 +10428,7 @@ warn_array_subscript_with_type_char (tree index)
was enclosed in parentheses.  */
 
 void
-warn_about_parentheses (enum tree_code code,
+warn_about_parentheses (location_t loc, enum tree_code code,
enum tree_code code_left, tree arg_left,
enum tree_code code_right, tree arg_right)
 {
@@ -10449,26 +10449,26 @@ void
 {
 case LSHIFT_EXPR:
   if (code_left == PLUS_EXPR || code_right == PLUS_EXPR)
-   warning (OPT_Wparentheses,
-"suggest parentheses around %<+%> inside %<<<%>");
+   warning_at (loc, OPT_Wparentheses,
+   "suggest parentheses around %<+%> inside %<<<%>");
   else if (code_left == MINUS_EXPR || code_right == MINUS_EXPR)
-   warning (OPT_Wparentheses,
-"suggest parentheses around %<-%> inside %<<<%>");
+   warning_at (loc, OPT_Wparentheses,
+   "suggest parentheses around %<-%> inside %<<<%>");
   return;
 
 case RSHIFT_EXPR:
   if (code_left == PLUS_EXPR || code_right == PLUS_EXPR)
-   warning (OPT_Wparentheses,
-"suggest parentheses around %<+%> inside %<>>%>");
+   warning_at (loc, OPT_Wparentheses,
+   "suggest parentheses around %<+%> inside %<>>%>");
   else if (code_left == MINUS_EXPR || code_right == MINUS_EXPR)
-   warning (OPT_Wparentheses,
-"suggest parentheses around %<-%> inside %<>>%>");
+   warning_at (loc, OPT_Wparentheses,
+   "suggest parentheses around %<-%> inside %<>>%>");
   return;
 
 case TRUTH_ORIF_EXPR:
   if (code_left == TRUTH_ANDIF_EXPR || code_right == TRUTH_ANDIF_EXPR)
-   warning (OPT_Wparentheses,
-"suggest parentheses around %<&&%> within %<||%>");
+   warning_at (loc, OPT_Wparentheses,
+   "suggest parentheses around %<&&%> within %<||%>");
   return;
 
 case BIT_IOR_EXPR:
@@ -10476,18 +10476,19 @@ void
  || code_left == PLUS_EXPR || code_left == MINUS_EXPR
  || code_right == BIT_AND_EXPR || code_right == BIT_XOR_EXPR
  || code_right == PLUS_EXPR || code_right == MINUS_EXPR)
-   warning (OPT_Wparentheses,
+   warning_at (loc, OPT_Wparentheses,
 "suggest parentheses around arithmetic in operand of %<|%>");
   /* Check cases like x|y==z */
   else if (TREE_CODE_CLASS (code_left) == tcc_comparison
   || TREE_CODE_CLASS (code_right) == tcc_comparison)
-   warning (OPT_Wparentheses,
+   warning_at (loc, OPT_Wparentheses,
 "suggest parentheses around comparison in operand of %<|%>");
   /* Check cases like !x | y */
   else if (code_left == TRUTH_NOT_EXPR
   && !APPEARS_TO_BE_BOOLEAN_EXPR_P (code_right, arg_right))
-   warning (OPT_Wparentheses, "suggest parentheses around operand of "
-"%<!%> or change %<|%> to %<||%> or %<!%> to %<~%>");
+

[wwwdocs,avr]: Deprecate/remove -mshort-calls, --with-avrlibc is default

2012-10-08 Thread Georg-Johann Lay

Applied the following changes to 4.7/4.8 release notes caveats.



Index: htdocs/gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.127
retrieving revision 1.128
diff -u -p -r1.127 -r1.128
--- htdocs/gcc-4.7/changes.html	20 Sep 2012 06:34:08 -	1.127
+++ htdocs/gcc-4.7/changes.html	8 Oct 2012 08:54:49 -	1.128
@@ -107,6 +107,10 @@
   has been enhanced.  As a result, all objects contributing to an
   application must either be compiled with GCC versions up to 4.6.x or
   with GCC versions 4.7.0 or later.
+
+The AVR port's -mshort-calls command line option has
+  been deprecated.  It will be removed in the GCC 4.8 release.
+  See -mrelax for a replacement.
   
 The ARM port's -mwords-little-endian option has
 been deprecated.  It will be removed in a future release.
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.39
retrieving revision 1.40
diff -u -p -r1.39 -r1.40
--- htdocs/gcc-4.8/changes.html	6 Oct 2012 22:20:07 -	1.39
+++ htdocs/gcc-4.8/changes.html	8 Oct 2012 10:07:33 -	1.40
@@ -38,6 +38,18 @@ explicit use of vector types may be inco
 built with older versions of GCC.  Auto-vectorized code is not affected
 by this change.
 
+On AVR, support has been removed for the command line
+  option -mshort-calls deprecated in GCC 4.7.
+
+On AVR, the configure option --with-avrlibc supported since
+  GCC 4.7.2 is turned on per default for all non-RTEMS configurations.
+  This option arranges for a better integration of
+  <a href="http://www.nongnu.org/avr-libc/">AVR Libc</a> with avr-gcc.
+  For technical details, see <a href="http://gcc.gnu.org/PR54461">PR54461</a>.
+  To turn off the option in non-RTEMS configurations, use
+  --with-avrlibc=no.  If the compiler is configured for
+  RTEMS, the option is always turned off.
+
 General Optimizer Improvements (and Changes)

Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 12:01 PM, Jan Hubicka  wrote:
>> On Mon, Oct 8, 2012 at 11:04 AM, Jan Hubicka  wrote:
>> >> On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen  wrote:
>> >> > Attached is the updated patch. Yes, if we add a VRP pass before
>> >> > profile pass, this patch would be unnecessary. Should we add a VRP
>> >> > pass?
>> >>
>> >> No, we don't want VRP in early optimizations.
>> >
>> > I am not quite sure about that.  VRP
>> >  1) makes branch prediction work better by doing jump threading early
>>
>> Well ... but jump threading may need basic-block duplication which may
>> increase code size.  Also VRP and FRE have pass ordering issues.
>>
>> >  2) is, after FRE, most effective tree pass on removing code by my profile
>> > statistics.
>>
>> We also don't have DSE in early opts.  I don't want to end up with the
>> situation that we do everything in early opts ... we should do _less_ there
>> (but eventually iterate properly when processing cycles).
>
> Yep, I am not quite sure about the most sane variant.  Missing simple jump
> threading in early opts definitely confuses both the profile estimate and
> inline size estimates.  But I am also not thrilled by adding more passes to
> early opts at all.  Also, last time I looked into this, CCP missed a lot of
> CCP opportunities, making VRP artificially look more useful.

Eh .. that shouldn't happen.  Do you have testcases by any chance?
I used to duplicate each SSA propagator pass and check with -fdump-statistics-stats
that the 2nd pass does nothing (thus chaining CCP doesn't improve results).
But maybe that's not the issue you run into here?

> I have a patch that somewhat improves profile updating after jump threading
> (i.e. it re-does the profile for simple cases), but jump threading is still
> the most common cause of the profile becoming inconsistent after expand.
>
> On a related note, with -fprofile-report I can easily track how much code
> each pass in the queue removed.  I was thinking about running this on Mozilla
> and -O1 and removing those passes that did almost nothing.  Those are mostly
> re-run passes, both at GIMPLE and RTL level.  Our pass manager is not terribly
> friendly for controlling passes per repetition.

Sure.  You can also more thoroughly instrument passes and use
-fdump-statistics for that (I've done that), but we usually have testcases
that require that each pass that still is there is present ...

> With introduction of -Og pass queue, do you think introducing -O1 pass queue
> for late tree passes (that will be quite short) is sane?

Yes.  I don't like the dump-file naming mess that results though, but if
we want to support optimized attribute switching between -O1 and -O2
then I guess we have to live with that ...

Originally I wanted to base -Og on -O1 (thus have them mostly share the
pass queue) and retain the same pass queue for -O2 and -Os.  Maybe
that's what we eventually want to do.  Thus, add a (off for -Og) loop
optimizer sub-pass to the queue and schedule some scalar cleanups
after it but inside it.

> What about RTL
> level?  I guess we can split the queues for RTL optimizations, too.
> All optimizations passes prior register allocation are sort of optional
> and I guess there are also -Og candidates.

Yes.  Though I first wanted to see actual issues with the RTL optimizers
and -Og.

> I however find the three-times duplicated queues a bit uncool, too, but I guess
> it is most compatible with PM organization.

Indeed ;)  We should at least try to share the queues for -Og and -O1.

> At -O3 the most effective passes on combine.c
> are:
>
> cfg (because of cfg cleanup) -1.5474%
> Early inlining -0.4991%
> FRE -7.9369%
> VRP -0.9321% (if run early), ccp does -0.2273%

I think VRP has the advantage of taking loop iteration counts into account.
Maybe we can add sth similar to CCP.  It's sad that VRP is too expensive,
it really is a form of CCP so merging both passes would be best (we can
at a single point, add_equivalence, turn off equivalence processing - the most
expensive part of VRP, and call that CCP ...).

> tailr -0.5305%
>
> After IPA
> copyrename -2.2850% (it packs cleanups after inlining)
> forwprop -0.5432%
> VRP -0.9700% (if rerun after early passes, otherwise it is about 2%)
> PRE -2.4123%
> DOM -0.5182%
>
> RTL passes
> into_cfglayout -3.1400% (i.e. first cleanup_cfg)
> fwprop1 -3.0467%
> cprop -2.7786%
> combine -3.3346%
> IRA -3.4912% (i.e. the cost model prefers hard regs)
> bbro -0.9765%
>
> The numbers on tramp3d and the LTO cc1 binary are not that different.

Yes.

Richard.

> Honza

Re: [Patch] Fix PR53397

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 12:01 PM, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
> I have incorporated your comments.
>
>> Yes, call dump_mem_ref then, instead of repeating parts of its body.
>
> Reference object  is not yet created at the place we check for invariance. It 
> is still a tree expression.  I created a common function and used at all 
> places to dump the "step", "base" and "delta" values of  memory reference 
> being analyzed.
>
> Please find the modified patch attached.
>
> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu.

I presume also bootstrapped.

Ok.

Thanks,
Richard.

> Regards,
> Venkat.
>
> -Original Message-
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Thursday, October 04, 2012 6:26 PM
> To: Kumar, Venkataramanan
> Cc: Richard Guenther; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch] Fix PR53397
>
> On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan 
>  wrote:
>> Hi Richi,
>>
>> (Snip)
>>> + (!cst_and_fits_in_hwi (step))
>>> +{
>>> +  if( loop->inner != NULL)
>>> +{
>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>> +{
>>> +  fprintf (dump_file, "Reference %p:\n", (void *) ref);
>>> +  fprintf (dump_file, "(base " );
>>> +  print_generic_expr (dump_file, base, TDF_SLIM);
>>> +  fprintf (dump_file, ", step ");
>>> +  print_generic_expr (dump_file, step, TDF_TREE);
>>> +  fprintf (dump_file, ")\n");
>>
>> No need to repeat this - all references are dumped when we gather them.
>> (Snip)
>>
>> The dumping happens at "record_ref" which is called after these statements 
>> to record these references.
>>
>> When the step is invariant  we return from the function without recording 
>> the references.
>>
>>  so I thought of dumping the references here.
>>
>> Is there a cleaner way to dump the references at one place?
>
> Yes, call dump_mem_ref then, instead of repeating parts of its body.
>
> Richard.
>
>> Regards,
>> Venkat.
>>
>>
>>
>> -Original Message-
>> From: Richard Guenther [mailto:rguent...@suse.de]
>> Sent: Tuesday, October 02, 2012 5:42 PM
>> To: Kumar, Venkataramanan
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch] Fix PR53397
>>
>> On Mon, 1 Oct 2012, venkataramanan.ku...@amd.com wrote:
>>
>>> Hi,
>>>
>>> The below patch fixes the FFT/Scimark regression caused by useless
>>> prefetch generation.
>>>
>>> This fix tries to make prefetch less aggressive by prefetching arrays
>>> in the inner loop, when the step is invariant in the entire loop nest.
>>>
>>> GCC currently tries to prefetch invariant steps when they are in the
>>> inner loop. But does not check if the step is variant in outer loops.
>>>
>>> In the scimark FFT case, the trip count of the inner loop varies by a
>>> non constant step, which is invariant in the inner loop.
>>> But the step variable is varying in outer loop. This makes inner loop
>>> trip count small (at run time varies sometimes as small as 1
>>> iteration)
>>>
>>> Prefetching ahead x iteration when the inner loop trip count is
>>> smaller than x leads to useless prefetches.
>>>
>>> Flag used: -O3 -march=amdfam10
>>>
>>> Before
>>> **  **
>>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>>> ** for details. (Results can be submitted to p...@nist.gov) **
>>> **  **
>>> Using   2.00 seconds min time per kenel.
>>> Composite Score:  550.50
>>> FFT Mflops:38.66(N=1024)
>>> SOR Mflops:   617.61(100 x 100)
>>> MonteCarlo: Mflops:   173.74
>>> Sparse matmult  Mflops:   675.63(N=1000, nz=5000)
>>> LU  Mflops:  1246.88(M=100, N=100)
>>>
>>>
>>> After
>>> **  **
>>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>>> ** for details. (Results can be submitted to p...@nist.gov) **
>>> **  **
>>> Using   2.00 seconds min time per kenel.
>>> Composite Score:  639.20
>>> FFT Mflops:   479.19(N=1024)
>>> SOR Mflops:   617.61(100 x 100)
>>> MonteCarlo: Mflops:   173.18
>>> Sparse matmult  Mflops:   679.13(N=1000, nz=5000)
>>> LU  Mflops:  1246.88(M=100, N=100)
>>>
>>> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu
>>> New tests that PASS:
>>>
>>> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c
>>> scan-tree-dump aprefetch "Issued prefetch"
>>> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c
>>> scan-tree-dump aprefetch "loop variant step"
>>> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
>>> gcc.dg/pr53397-2.c (test for excess errors)
>>>
>>>
>>> Checked CPU2006 and polyhedron on latest AMD processor, no re

Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher

On Mon, Oct 8, 2012 at 10:18 AM, Jakub Jelinek  wrote:
>> > I'm playing with a patch to expand the insns_with_changed_offsets
>> > bitmap to an sbitmap, and will send a patch if this works better.
>>
>> Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).
>
> Or use temporarily some rtx flag on the insns, from what I can see,
> in_struct on *INSN is right now only used during scheduling and from reorg
> till eoc, so for LRA sitting in between both scheduling passes it might
> be possible to use that bit too.

AFAICT neither of these ideas will work because only insn UIDs are
used when computing insns_with_changed_offsets. You'd need the actual
insn for a VEC, pointer map or flag. Also, with a VEC or pointer map,
it's difficult to take the union of the insn_bitmap sets.

The patch I have for this uses an sbitmap, it's posted in a new thread
starting here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00698.html

Ciao!
Steven

Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher

On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
> * lra-lives.c (lra_start_point_ranges, lra_finish_point_ranges):
> Remove.
> (process_bb_lives): Change start regno in
> EXECUTE_IF_SET_IN_BITMAP.  Iterate on DF_LR_IN (bb) instead of
> pseudos_live_through_calls.

This can be done a bit better still by checking whether the
pseudos_live_through_calls set is empty:

* lra-lives.c (process_bb_lives): At the top of a basic block, break
from the loop over pseudos_live_through_calls if the set is empty.

--- lra-lives.c.orig   2012-10-08 12:24:10.0 +0200
+++ lra-lives.c2012-10-08 12:26:07.0 +0200
@@ -751,8 +751,12 @@ process_bb_lives (basic_block bb)
 mark_pseudo_dead (i);

   EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi)
-if (sparseset_bit_p (pseudos_live_through_calls, j))
-  check_pseudos_live_through_calls (j);
+{
+  if (sparseset_cardinality (pseudos_live_through_calls) == 0)
+   break;
+  if (sparseset_bit_p (pseudos_live_through_calls, j))
+   check_pseudos_live_through_calls (j);
+}

   incr_curr_point (freq);
 }

This test is extremely cheap (the load for the cardinality test is
re-used by sparseset_bit_p) and it cuts down the time spent in live
range chains even further (especially e.g. for blocks that don't
contain calls).

OK for the branch if it passes bootstrap+testing on x86_64-unknown-linux-gnu?

Ciao!
Steven
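
For readers who don't have sparseset.h in their head, here is a minimal sketch of why the emptiness test is so cheap.  It is a toy sparse set in the classic dense/sparse-array style, not GCC's implementation: the names sset_* and visit_until_empty are made up for illustration, and the stand-in for check_pseudos_live_through_calls simply removes the element, on the assumption that the real check drains the set as it goes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sparse set over the universe [0, universe).  */
struct sset
{
  unsigned *dense;     /* Members, packed at indices [0, count).  */
  unsigned *sparse;    /* sparse[x] is the claimed slot of x in dense.  */
  unsigned count;      /* Cardinality, updated on every insert/remove.  */
  unsigned universe;
};

static struct sset *
sset_new (unsigned universe)
{
  struct sset *s = malloc (sizeof *s);
  assert (s != NULL && universe > 0);
  /* calloc keeps the sketch free of uninitialized reads.  */
  s->dense = calloc (universe, sizeof *s->dense);
  s->sparse = calloc (universe, sizeof *s->sparse);
  assert (s->dense != NULL && s->sparse != NULL);
  s->count = 0;
  s->universe = universe;
  return s;
}

static void
sset_free (struct sset *s)
{
  free (s->dense);
  free (s->sparse);
  free (s);
}

/* Cardinality is a single field load.  */
static unsigned
sset_cardinality (const struct sset *s)
{
  return s->count;
}

/* Membership reads the same count field, plus two array slots.  */
static bool
sset_bit_p (const struct sset *s, unsigned x)
{
  assert (x < s->universe);
  unsigned idx = s->sparse[x];
  return idx < s->count && s->dense[idx] == x;
}

static void
sset_set_bit (struct sset *s, unsigned x)
{
  if (sset_bit_p (s, x))
    return;
  s->sparse[x] = s->count;
  s->dense[s->count++] = x;
}

static void
sset_clear_bit (struct sset *s, unsigned x)
{
  if (!sset_bit_p (s, x))
    return;
  /* Move the last member into the freed slot.  */
  unsigned idx = s->sparse[x];
  unsigned last = s->dense[--s->count];
  s->dense[idx] = last;
  s->sparse[last] = idx;
}

/* Same shape as the patched loop: walk the candidates, stop as soon
   as the set is empty.  Returns how many candidates were examined.  */
static unsigned
visit_until_empty (struct sset *live, const unsigned *cand, unsigned n)
{
  unsigned examined = 0;
  for (unsigned k = 0; k < n; k++)
    {
      if (sset_cardinality (live) == 0)
        break;
      examined++;
      if (sset_bit_p (live, cand[k]))
        sset_clear_bit (live, cand[k]);  /* Stand-in for the real check.  */
    }
  return examined;
}
```

With a set holding {3, 7} and candidates 1, 3, 7, 9, 11, the walk stops after the third candidate; for a block without calls the set starts out empty and nothing is examined at all.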

[RFC] Implement load sinking in loops

2012-10-08 Thread Eric Botcazou

Hi,

we recently noticed that, even at -O3, the compiler doesn't figure out that 
the following loop is dumb:

#define SIZE 64

int foo (int v[])
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i];

  return r;
}

which was a bit of a surprise.  On second thoughts, this isn't entirely 
unexpected, as it probably matters only for (slightly) pathological cases.
The attached patch nevertheless implements a form of load sinking in loops so 
as to optimize these cases.  It's combined with invariant motion to optimize:

int foo (int v[], int a)
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i] + a;

  return r;
}

and with store sinking to optimize:

int foo (int v1[], int v2[])
{
  int r[SIZE];
  int i, j;

  for (j = 0; j < SIZE; j++)
    for (i = 0; i < SIZE; i++)
      r[j] = v1[j] + v2[i];

  return r[SIZE - 1];
}

The optimization is enabled at -O2 in the patch for measurement purposes but, 
given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap, 
compiler-only, all languages except Go), it's probably best suited to -O3.
Or perhaps we don't care and it should simply be dropped...  Thoughts?
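
To make the intended effect concrete, here is a hand-written sketch of what the three loops above should boil down to once the load (and, in the last case, the store) has been sunk out of the loop.  The *_sunk functions are my own source-level rendering of the expected result, not output from the patch, and the function names are invented so that all variants can live in one file.

```c
#include <assert.h>

#define SIZE 64

/* First example: only the last iteration's load survives.  */
static int
last_load (int v[])
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i];

  return r;
}

static int
last_load_sunk (int v[])
{
  return v[SIZE - 1];
}

/* Second example: the load is sunk, the addition of the invariant
   A follows it out of the loop.  */
static int
last_sum (int v[], int a)
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i] + a;

  return r;
}

static int
last_sum_sunk (int v[], int a)
{
  return v[SIZE - 1] + a;
}

/* Third example: the inner loop keeps overwriting r[j], so only its
   last store is visible after the loop.  */
static int
nested (int v1[], int v2[])
{
  int r[SIZE];
  int i, j;

  for (j = 0; j < SIZE; j++)
    for (i = 0; i < SIZE; i++)
      r[j] = v1[j] + v2[i];

  return r[SIZE - 1];
}

static int
nested_sunk (int v1[], int v2[])
{
  int r[SIZE];
  int j;

  /* Inner loop gone: one load of v2 and one store to r[j] per j.  */
  for (j = 0; j < SIZE; j++)
    r[j] = v1[j] + v2[SIZE - 1];

  return r[SIZE - 1];
}
```

Each pair returns the same value for any input, because SIZE is a positive constant and the loop body has no other side effects.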

Tested on x86_64-suse-linux.


2012-10-08  Eric Botcazou  

* gimple.h (gsi_insert_seq_on_edge_before): Declare.
* gimple-iterator.c (gsi_insert_seq_on_edge_before): New function.
* tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field.
(mem_ref_in_stmt): Remove gcc_assert.
(copy_load_and_single_use_chain): New function.
(execute_lm): Likewise.
(hoist_memory_references): Hoist the loads after the stores.
(ref_always_accessed_p): Rename into...
(ref_always_stored_p): ...this.  Remove STORE_P and add ONCE_P.
(can_lsm_ref_p): New function extracted from...
(can_sm_ref_p): ...here.  Call it.
(follow_invariant_single_use_chain): New function.
(can_lm_ref_p): Likewise.
(find_refs_for_sm): Rename into...
(find_refs_for_lsm): ...this.  Find load hoisting opportunities.
(loop_suitable_for_sm): Rename into...
(loop_suitable_for_lsm): ...this.
(store_motion_loop): Rename into...
(load_store_motion_loop): ...this.  Adjust calls to above functions.
(tree_ssa_lim): Likewise.


2012-10-08  Eric Botcazou  

* gcc.dg/tree-ssa/loadmotion-1.c: New test.
* gcc.dg/tree-ssa/loadmotion-2.c: New test.
* gcc.dg/tree-ssa/loadmotion-3.c: New test.


-- 
Eric Botcazou
Index: gimple.h
===
--- gimple.h	(revision 192137)
+++ gimple.h	(working copy)
@@ -5196,6 +5196,7 @@ void gsi_move_before (gimple_stmt_iterat
 void gsi_move_to_bb_end (gimple_stmt_iterator *, basic_block);
 void gsi_insert_on_edge (edge, gimple);
 void gsi_insert_seq_on_edge (edge, gimple_seq);
+void gsi_insert_seq_on_edge_before (edge, gimple_seq);
 basic_block gsi_insert_on_edge_immediate (edge, gimple);
 basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
 void gsi_commit_one_edge_insert (edge, basic_block *);
Index: gimple-iterator.c
===
--- gimple-iterator.c	(revision 192137)
+++ gimple-iterator.c	(working copy)
@@ -677,6 +677,16 @@ gsi_insert_seq_on_edge (edge e, gimple_s
   gimple_seq_add_seq (&PENDING_STMT (e), seq);
 }
 
+/* Likewise, but prepend it instead of appending it.  */
+
+void
+gsi_insert_seq_on_edge_before (edge e, gimple_seq seq)
+{
+  gimple_seq pending = NULL;
+  gimple_seq_add_seq (&pending, seq);
+  gimple_seq_add_seq (&pending, PENDING_STMT (e));
+  PENDING_STMT (e) = pending;
+}
 
 /* Insert the statement pointed-to by GSI into edge E.  Every attempt
is made to place the statement in an existing basic block, but
Index: tree-ssa-loop-im.c
===
--- tree-ssa-loop-im.c	(revision 192137)
+++ tree-ssa-loop-im.c	(working copy)
@@ -103,6 +103,7 @@ typedef struct mem_ref_loc
 {
   tree *ref;			/* The reference itself.  */
   gimple stmt;			/* The statement in that it occurs.  */
+  tree lhs;			/* The (ultimate) LHS for a load.  */
 } *mem_ref_loc_p;
 
 DEF_VEC_P(mem_ref_loc_p);
@@ -674,7 +675,6 @@ mem_ref_in_stmt (gimple stmt)
 
   if (!mem)
 return NULL;
-  gcc_assert (!store);
 
   hash = iterative_hash_expr (*mem, 0);
   ref = (mem_ref_p) htab_find_with_hash (memory_accesses.refs, *mem, hash);
@@ -2192,6 +2192,140 @@ execute_sm (struct loop *loop, VEC (edge
   execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
+/* Copy the load and the chain of single uses described by LOC and return the
+   sequence of new statements.  Also set NEW_LHS to the copy of LOC->LHS.  */
+
+static gimple_seq
+copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
+{
+  tree mem = *loc->ref;
+  tree lhs, tmp_var, ssa_name;
+  gimple_seq seq = NULL;
+  gimple stmt;
+  unsigned n = 0;
+
+  /* First copy the load and create the new LHS for it.

Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-08 Thread Jan Hubicka

> On Sat, Oct 6, 2012 at 11:34 AM, Jan Hubicka  wrote:
> > Hi,
> > I benchmarked the patch moving loop header copying and it is quite 
> > noticeable win.
> >
> > Some testsuite updating is needed. In many cases it is just because the
> > optimizations are now happening earlier.
> > There are however a few testsuite failures I have trouble dealing with:
> > ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c 
> > scan-tree-dump-times vrp1 "Threaded jump" 3
> > ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c 
> > scan-tree-dump-times vrp1 "Jumps threaded: 1" 1
> > ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c 
> > scan-tree-dump-times vect "vectorized 1 loops" 2
> > ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98  
> > scan-tree-dump-times vrp1 "if " 1
> > ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11  
> > scan-tree-dump-times vrp1 "if " 1
> >
> > This is mostly about VRP losing its ability to thread some jumps from the
> > duplicated loop header out of the loop across the loopback edge.  This 
> > seems to
> > be due to loop updating logic.  Do we care about these?
> 
> Yes, I think so.  At least we care that the optimized result is the same.

It is not; we really lose optimization in those testcases.
The ones that are still optimized well I updated in the patch below.
> 
> Can you elaborate on "due to loop updating logic"?

The problem is:
  /* We do not allow VRP information to be used for jump threading
 across a back edge in the CFG.  Otherwise it becomes too
 difficult to avoid eliminating loop exit tests.  Of course
 EDGE_DFS_BACK is not accurate at this time so we have to
 recompute it.  */
  mark_dfs_back_edges ();

  /* Do not thread across edges we are about to remove.  Just marking
 them as EDGE_DFS_BACK will do.  */
  FOR_EACH_VEC_ELT (edge, to_remove_edges, i, e)
e->flags |= EDGE_DFS_BACK;

Loop header copying puts a conditional before the loop, and we want to thread
up to the exit out of the loop (which I think is a rather important optimization).
But this no longer happens because the back edge is in the way.  At least that was
the case in the tree-ssa failures I analyzed.
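
A toy sketch of the mechanism, in case it helps the discussion: a depth-first walk flags every edge whose destination is still on the DFS stack, and a threading path is rejected as soon as it contains a flagged edge.  This is not the code from cfganal.c or tree-vrp.c; struct cfg, mark_back_edges and path_threadable_p are hypothetical names, and the CFG is kept as a plain edge list to stay short.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16
#define MAX_EDGES 32

struct cfg_edge
{
  int src, dest;
  bool dfs_back;   /* Plays the role of the EDGE_DFS_BACK flag.  */
};

struct cfg
{
  int n_blocks;
  int n_edges;
  struct cfg_edge edges[MAX_EDGES];
};

enum color { WHITE, GRAY, BLACK };

static void
dfs_visit (struct cfg *g, int bb, enum color *state)
{
  state[bb] = GRAY;   /* BB is on the DFS stack.  */
  for (int k = 0; k < g->n_edges; k++)
    {
      struct cfg_edge *e = &g->edges[k];
      if (e->src != bb)
        continue;
      if (state[e->dest] == GRAY)
        e->dfs_back = true;   /* Edge to an ancestor: back edge.  */
      else if (state[e->dest] == WHITE)
        dfs_visit (g, e->dest, state);
    }
  state[bb] = BLACK;
}

/* Recompute the back-edge flags from scratch; return how many edges
   were flagged.  */
static int
mark_back_edges (struct cfg *g, int entry)
{
  enum color state[MAX_BLOCKS] = { WHITE };
  int n_back = 0;

  assert (g->n_blocks <= MAX_BLOCKS && g->n_edges <= MAX_EDGES);
  for (int k = 0; k < g->n_edges; k++)
    g->edges[k].dfs_back = false;
  dfs_visit (g, entry, state);
  for (int k = 0; k < g->n_edges; k++)
    n_back += g->edges[k].dfs_back;
  return n_back;
}

/* PATH is a list of edge indices.  Threading is refused if any edge
   on it is flagged as a back edge.  */
static bool
path_threadable_p (const struct cfg *g, const int *path, int len)
{
  for (int k = 0; k < len; k++)
    if (g->edges[path[k]].dfs_back)
      return false;
  return true;
}
```

On a header-copied loop with blocks 0 (copied test), 1 (body), 2 (latch test) and 3 (exit), only the edge 2->1 is flagged, so a path that goes through it is refused while 1->2->3 is still allowed.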
> 
> Can you elaborate on the def_split_header_continue_p change?  Which probably
> should be tested and installed separately?

Yes, that one is a latent bug.  The code expects that a loop exit is recognized
by the loop depth decreasing, which is not true.
It reproduces as an ICE during bootstrap with the patch.
I will regtest/bootstrap and commit it today.

Honza

Re: Scheduler: Save state at the end of a block

2012-10-08 Thread Bernd Schmidt

On 08/13/2012 05:42 PM, Vladimir Makarov wrote:
> On 08/13/2012 06:32 AM, Bernd Schmidt wrote:
>> This is a small patch for sched-rgn that attempts to save DFA state at
>> the end of a basic block and re-use it in successor blocks. This was a
>> customer-requested optimization; I've not seen it make much of a
>> difference in any macro benchmarks.
>> Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK?
>>
>>
>>
> Yes.  Thanks for the patch, Bernd.

It's been a while, so I thought I'd better mention I've checked this in
now after retesting.


Bernd

[RFC] Fix PR rtl-optimization/54315 (partially)

2012-10-08 Thread Eric Botcazou

Hi,

this PR is about the other path in the RTL expander where a temporary is 
created on the stack for a value returned in registers: BLKmode structures 
returned in registers without PARALLELs, i.e. for which the back-end is 
permitted (but not required) to create (REG:BLK).  The canonical example is 
the ARM (and not the PA anymore, as it returns in PARALLELs these days) but 
x86-64 also returns small unions by means of this mechanism.

The idea is the same as for PARALLELs in:
  http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00811.html
i.e. to keep the returned value in pseudo-registers as long as possible.
copy_blkmode_from_reg is changed to require an explicit target and isn't 
invoked from expand_call, but from expand_assignment/store_expr/store_field 
instead where a target is available.  The following assertion is added:

  /* BLKmode registers created in the back-end shouldn't have survived.  */
  gcc_assert (mode != BLKmode);

despite the code in expand_call.  The GET_MODE (valreg) != BLKmode test in 
expand_call was added recently by Jakub, but is very likely superfluous since

  /* Register in which non-BLKmode value will be returned,
 or 0 if no value or if value is BLKmode.  */
  rtx valreg;

and hard_function_value fixes up (REG:BLK) coming from the back-ends.

The patch was fully tested on x86_64-suse-linux, where it removes half of the 
useless stores in the original testcase for PR rtl-optimization/54315, and 
manually tested for arm-linux-gnueabi (for now), where it also removes stores 
for small structures.  Comments?


2012-10-08  Eric Botcazou  

* calls.c (expand_call): Don't deal specifically with BLKmode values
returned in naked registers.
* expr.h (copy_blkmode_from_reg): Adjust prototype.
* expr.c (copy_blkmode_from_reg): Rename first parameter into TARGET and
make it required.  Assert that SRCREG hasn't BLKmode.  Add a couple of 
short-circuits for common cases and be prepared for sub-word registers.
(expand_assignment): Call copy_blkmode_from_reg for BLKmode values
returned in naked registers.
(store_expr): Likewise.
(store_field): Likewise.


-- 
Eric Botcazou
Index: expr.h
===
--- expr.h	(revision 192137)
+++ expr.h	(working copy)
@@ -335,7 +335,7 @@ extern rtx emit_group_move_into_temps (r
 extern void emit_group_store (rtx, rtx, tree, int);
 
 /* Copy BLKmode object from a set of registers.  */
-extern rtx copy_blkmode_from_reg (rtx, rtx, tree);
+extern void copy_blkmode_from_reg (rtx, rtx, tree);
 
 /* Mark REG as holding a parameter for the next CALL_INSN.
Mode is TYPE_MODE of the non-promoted parameter, or VOIDmode.  */
Index: expr.c
===
--- expr.c	(revision 192137)
+++ expr.c	(working copy)
@@ -2086,39 +2086,23 @@ emit_group_store (rtx orig_dst, rtx src,
 emit_move_insn (orig_dst, dst);
 }
 
-/* Generate code to copy a BLKmode object of TYPE out of a
-   set of registers starting with SRCREG into TGTBLK.  If TGTBLK
-   is null, a stack temporary is created.  TGTBLK is returned.
-
-   The purpose of this routine is to handle functions that return
-   BLKmode structures in registers.  Some machines (the PA for example)
-   want to return all small structures in registers regardless of the
-   structure's alignment.  */
+/* Copy a BLKmode object of TYPE out of a register SRCREG into TARGET.
 
-rtx
-copy_blkmode_from_reg (rtx tgtblk, rtx srcreg, tree type)
+   This is used on targets that return BLKmode values in registers.  */
+
+void
+copy_blkmode_from_reg (rtx target, rtx srcreg, tree type)
 {
   unsigned HOST_WIDE_INT bytes = int_size_in_bytes (type);
   rtx src = NULL, dst = NULL;
   unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
   unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
+  enum machine_mode mode = GET_MODE (srcreg);
+  enum machine_mode tmode = GET_MODE (target);
   enum machine_mode copy_mode;
 
-  if (tgtblk == 0)
-{
-  tgtblk = assign_temp (build_qualified_type (type,
-		  (TYPE_QUALS (type)
-		   | TYPE_QUAL_CONST)),
-			1, 1);
-  preserve_temp_slots (tgtblk);
-}
-
-  /* This code assumes srcreg is at least a full word.  If it isn't, copy it
- into a new pseudo which is a full word.  */
-
-  if (GET_MODE (srcreg) != BLKmode
-  && GET_MODE_SIZE (GET_MODE (srcreg)) < UNITS_PER_WORD)
-srcreg = convert_to_mode (word_mode, srcreg, TYPE_UNSIGNED (type));
+  /* BLKmode registers created in the back-end shouldn't have survived.  */
+  gcc_assert (mode != BLKmode);
 
   /* If the structure doesn't take up a whole number of words, see whether
  SRCREG is padded on the left or on the right.  If it's on the left,
@@ -2136,22 +2120,54 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 padding_correction
   = (BITS_PER_WORD - ((bytes % UNITS_PER_WORD) * BITS_PER_UNIT))

Re: patch to fix constant math - third small patch

2012-10-08 Thread Kenneth Zadeck


yes, my bad.   here it is with the patches.
On 10/06/2012 11:55 AM, Kenneth Zadeck wrote:

This is the third patch in the series of patches to fix constant math.
this one changes some predicates at the rtl level to use the new 
predicate CONST_SCALAR_INT_P.
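
As a rough illustration of what the predicate folds together, here is a toy model.  It is not rtl.h: the toy_* types and functions are invented, and the rule that a CONST_DOUBLE with no mode carries an integer is stated as my assumption about the representation in use at the time of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for rtx codes and machine modes.  */
enum toy_code { TOY_CONST_INT, TOY_CONST_DOUBLE, TOY_CONST_FIXED, TOY_REG };
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DFmode };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
};

static bool
toy_const_int_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT;
}

/* Assumption: a mode-less CONST_DOUBLE holds a wide integer, while
   one with a float mode holds a floating-point value.  */
static bool
toy_const_double_as_int_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_DOUBLE && x->mode == TOY_VOIDmode;
}

/* The single test that replaces the two-way "||" at each call site.  */
static bool
toy_const_scalar_int_p (const struct toy_rtx *x)
{
  return toy_const_int_p (x) || toy_const_double_as_int_p (x);
}
```

The point of having one predicate is that call sites no longer spell out how integer constants happen to be represented.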

I did not include a few that were tightly intertwined with other changes.

Not all of these changes are strictly mechanical.  Richard, when
reviewing this, had me make additional changes to remove what he
thought were latent bugs at the rtl level.  However, it appears that
the bugs were not latent.  I do not know what is going on here but I
am smart enough to not look a gift horse in the mouth.


All of this was done on the same machine with no changes and identical 
configs.  It is an x86-64 with ubuntu 12-4.


ok for commit?

in the logs below, gbBaseline is a trunk from friday and the gbWide is 
the same revision but with my patches.  Some of this like 
gfortran.dg/pr32627 is obviously flutter, but the rest does not appear 
to be.


=
heracles:~/gcc(13) gccBaseline/contrib/compare_tests 
gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log

New tests that PASS:

gcc.dg/builtins-85.c scan-assembler mysnprintf
gcc.dg/builtins-85.c scan-assembler-not __chk_fail
gcc.dg/builtins-85.c (test for excess errors)


heracles:~/gcc(14) gccBaseline/contrib/compare_tests 
gbBaseline/gcc/testsuite/gfortran/gfortran.log 
gbWide/gcc/testsuite/gfortran/gfortran.log

New tests that PASS:

gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test 
for excess errors)
gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess 
errors)

gfortran.dg/pr32627.f03  -Os  (test for excess errors)
gfortran.dg/pr32635.f  -O0  execution test
gfortran.dg/pr32635.f  -O0  (test for excess errors)
gfortran.dg/substr_6.f90  -O2  (test for excess errors)

Old tests that passed, that have disappeared: (Eeek!)

gfortran.dg/pr32627.f03  -O1  (test for excess errors)
gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)

gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.


=== g++ Summary ===

# of expected passes		49793
# of expected failures		284
# of unsupported tests		601

runtest completed at Fri Oct  5 16:10:20 2012
heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.


=== g++ Summary ===

# of expected passes		50472
# of expected failures		284
# of unsupported tests		613

runtest completed at Fri Oct  5 19:51:50 2012







diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 299150e..0404605 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3633,9 +3633,8 @@ expand_debug_locations (void)
 
 	gcc_assert (mode == GET_MODE (val)
 			|| (GET_MODE (val) == VOIDmode
-			&& (CONST_INT_P (val)
+			&& (CONST_SCALAR_INT_P (val)
 || GET_CODE (val) == CONST_FIXED
-|| CONST_DOUBLE_AS_INT_P (val) 
 || GET_CODE (val) == LABEL_REF)));
 	  }
 
diff --git a/gcc/combine.c b/gcc/combine.c
index 4e0a579..b531305 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2617,16 +2617,19 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
  constant.  */
   if (i1 == 0
   && (temp = single_set (i2)) != 0
-  && (CONST_INT_P (SET_SRC (temp))
-	  || CONST_DOUBLE_AS_INT_P (SET_SRC (temp)))
+  && CONST_SCALAR_INT_P (SET_SRC (temp))
   && GET_CODE (PATTERN (i3)) == SET
-  && (CONST_INT_P (SET_SRC (PATTERN (i3)))
-	  || CONST_DOUBLE_AS_INT_P (SET_SRC (PATTERN (i3
+  && CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3)))
   && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp)))
 {
   rtx dest = SET_DEST (PATTERN (i3));
   int offset = -1;
   int width = 0;
+  
+  /* There are not explicit tests to make sure that this is not a
+	 float, but there is code here that would not be correct if it
+	 was.  */
+  gcc_assert (GET_MODE_CLASS (GET_MODE (SET_SRC (temp))) != MODE_FLOAT);
 
   if (GET_CODE (dest) == ZERO_EXTRACT)
 	{
@@ -5102,8 +5105,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	  if (GET_CODE (new_rtx) == CLOBBER && XEXP (new_rtx, 0) == const0_rtx)
 		return new_rtx;
 
-	  if (GET_CODE (x) == SUBREG
-		  && (CONST_INT_P (new_rtx) || CONST_DOUBLE_AS_INT_P (new_rtx)))
+	  if (GET_CODE (x) == SUBREG && CONST_SCALAR_INT_P (new_rtx))
 		{
 		  enum machine_mode mode = GET_MODE (x);
 
@@ -7133,7 +7135,7 @@ make_extraction (enum machine_mode mode, rtx inner, HOST_WIDE_INT pos,
   if (mode == tmode)
 	return new_rtx;
 
-

[PATCH] Remove my_rev_post_order_compute

2012-10-08 Thread Richard Guenther


This replaces my_rev_post_order_compute in PRE by the already
existing inverted_post_order_compute, with the necessary adjustments.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.
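
For reference, the ordering the ANTIC iteration relies on can be sketched as a post-order walk of the inverted CFG, i.e. starting at the exit block and following predecessor edges.  This is only a toy model with made-up names (struct cfg, inverted_post_order); unlike the real function it does nothing for blocks that cannot reach the exit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16
#define MAX_EDGES 32

struct cfg_edge
{
  int src, dest;
};

struct cfg
{
  int n_blocks;
  int n_edges;
  struct cfg_edge edges[MAX_EDGES];
};

/* Depth-first walk over predecessors; BB is recorded after all the
   predecessors reachable from it.  */
static void
walk_preds (const struct cfg *g, int bb, bool *visited, int *order, int *num)
{
  visited[bb] = true;
  for (int k = 0; k < g->n_edges; k++)
    if (g->edges[k].dest == bb && !visited[g->edges[k].src])
      walk_preds (g, g->edges[k].src, visited, order, num);
  order[(*num)++] = bb;
}

/* Fill ORDER with a post-order of the inverted CFG rooted at EXIT_BB
   and return the number of blocks recorded.  */
static int
inverted_post_order (const struct cfg *g, int exit_bb, int *order)
{
  bool visited[MAX_BLOCKS] = { false };
  int num = 0;

  assert (g->n_blocks <= MAX_BLOCKS && g->n_edges <= MAX_EDGES);
  walk_preds (g, exit_bb, visited, order, &num);
  return num;
}
```

Walking the array from the last index down to 0, as the loops in compute_antic do, starts at the exit block and, in an acyclic CFG, reaches every block after its successors, which is what a backward problem like ANTIC wants.  Since the entry and exit blocks are part of the result, the array needs room for every block.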

2012-10-08  Richard Guenther  

* tree-ssa-pre.c (postorder_num): New global.
(compute_antic): Initialize all blocks and adjust for
generic postorder.
(my_rev_post_order_compute): Remove.
(init_pre): Use inverted_post_order_compute.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 192119)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -430,6 +430,7 @@ typedef struct bb_bitmap_sets
 
 /* Basic block list in postorder.  */
 static int *postorder;
+static int postorder_num;
 
 /* This structure is used to keep track of statistics on what
optimization PRE was able to perform.  */
@@ -2456,7 +2457,7 @@ compute_antic (void)
   has_abnormal_preds = sbitmap_alloc (last_basic_block);
   sbitmap_zero (has_abnormal_preds);
 
-  FOR_EACH_BB (block)
+  FOR_ALL_BB (block)
 {
   edge_iterator ei;
   edge e;
@@ -2480,9 +2481,7 @@ compute_antic (void)
 }
 
   /* At the exit block we anticipate nothing.  */
-  ANTIC_IN (EXIT_BLOCK_PTR) = bitmap_set_new ();
   BB_VISITED (EXIT_BLOCK_PTR) = 1;
-  PA_IN (EXIT_BLOCK_PTR) = bitmap_set_new ();
 
   changed_blocks = sbitmap_alloc (last_basic_block + 1);
   sbitmap_ones (changed_blocks);
@@ -2496,7 +2495,7 @@ compute_antic (void)
 for PA ANTIC computation.  */
   num_iterations++;
   changed = false;
-  for (i = n_basic_blocks - NUM_FIXED_BLOCKS - 1; i >= 0; i--)
+  for (i = postorder_num - 1; i >= 0; i--)
{
  if (TEST_BIT (changed_blocks, postorder[i]))
{
@@ -2525,7 +2524,7 @@ compute_antic (void)
fprintf (dump_file, "Starting iteration %d\n", num_iterations);
  num_iterations++;
  changed = false;
- for (i = n_basic_blocks - NUM_FIXED_BLOCKS - 1 ; i >= 0; i--)
+ for (i = postorder_num - 1 ; i >= 0; i--)
{
  if (TEST_BIT (changed_blocks, postorder[i]))
{
@@ -4593,78 +4592,6 @@ remove_dead_inserted_code (void)
   BITMAP_FREE (worklist);
 }
 
-/* Compute a reverse post-order in *POST_ORDER.  If INCLUDE_ENTRY_EXIT is
-   true, then then ENTRY_BLOCK and EXIT_BLOCK are included.  Returns
-   the number of visited blocks.  */
-
-static int
-my_rev_post_order_compute (int *post_order, bool include_entry_exit)
-{
-  edge_iterator *stack;
-  int sp;
-  int post_order_num = 0;
-  sbitmap visited;
-
-  if (include_entry_exit)
-post_order[post_order_num++] = EXIT_BLOCK;
-
-  /* Allocate stack for back-tracking up CFG.  */
-  stack = XNEWVEC (edge_iterator, n_basic_blocks + 1);
-  sp = 0;
-
-  /* Allocate bitmap to track nodes that have been visited.  */
-  visited = sbitmap_alloc (last_basic_block);
-
-  /* None of the nodes in the CFG have been visited yet.  */
-  sbitmap_zero (visited);
-
-  /* Push the last edge on to the stack.  */
-  stack[sp++] = ei_start (EXIT_BLOCK_PTR->preds);
-
-  while (sp)
-{
-  edge_iterator ei;
-  basic_block src;
-  basic_block dest;
-
-  /* Look at the edge on the top of the stack.  */
-  ei = stack[sp - 1];
-  src = ei_edge (ei)->src;
-  dest = ei_edge (ei)->dest;
-
-  /* Check if the edge source has been visited yet.  */
-  if (src != ENTRY_BLOCK_PTR && ! TEST_BIT (visited, src->index))
-{
-  /* Mark that we have visited the destination.  */
-  SET_BIT (visited, src->index);
-
-  if (EDGE_COUNT (src->preds) > 0)
-/* Since the SRC node has been visited for the first
-   time, check its predecessors.  */
-stack[sp++] = ei_start (src->preds);
-  else
-post_order[post_order_num++] = src->index;
-}
-  else
-{
-  if (ei_one_before_end_p (ei) && dest != EXIT_BLOCK_PTR)
-post_order[post_order_num++] = dest->index;
-
-  if (!ei_one_before_end_p (ei))
-ei_next (&stack[sp - 1]);
-  else
-sp--;
-}
-}
-
-  if (include_entry_exit)
-post_order[post_order_num++] = ENTRY_BLOCK;
-
-  free (stack);
-  sbitmap_free (visited);
-  return post_order_num;
-}
-
 
 /* Initialize data structures used by PRE.  */
 
@@ -4686,9 +4613,8 @@ init_pre (void)
   connect_infinite_loops_to_exit ();
   memset (&pre_stats, 0, sizeof (pre_stats));
 
-
-  postorder = XNEWVEC (int, n_basic_blocks - NUM_FIXED_BLOCKS);
-  my_rev_post_order_compute (postorder, false);
+  postorder = XNEWVEC (int, n_basic_blocks);
+  postorder_num = inverted_post_order_compute (postorder);
 
   alloc_aux_for_blocks (sizeof (struct bb_bitmap_sets));
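
For readers following the hunks above: the loops in compute_antic now walk postorder[] from postorder_num - 1 down to 0, where postorder_num is the count returned by the traversal rather than a value derived from n_basic_blocks. The following is only a minimal sketch of a post-order walk over the inverted CFG (starting at the exit block and following predecessor edges). It is not GCC's inverted_post_order_compute; the names toy_cfg, toy_add_edge and toy_inverted_post_order are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

#define TOY_MAX_BLOCKS 16

/* Hypothetical toy CFG: for every block, the list of its predecessors.  */
struct toy_cfg
{
  int n_preds[TOY_MAX_BLOCKS];
  int preds[TOY_MAX_BLOCKS][TOY_MAX_BLOCKS];
};

/* Record the edge SRC -> DST as a predecessor entry of DST.  */
void
toy_add_edge (struct toy_cfg *g, int src, int dst)
{
  g->preds[dst][g->n_preds[dst]++] = src;
}

/* Depth-first walk along predecessor edges; a block is emitted only
   after everything reachable through its predecessors was emitted.  */
static void
toy_visit (const struct toy_cfg *g, int b, bool *seen, int *order, int *n)
{
  seen[b] = true;
  for (int i = 0; i < g->n_preds[b]; i++)
    {
      int p = g->preds[b][i];
      if (!seen[p])
        toy_visit (g, p, seen, order, n);
    }
  order[(*n)++] = b;
}

/* Fill ORDER with a post-order of the inverted CFG rooted at EXIT_BLOCK
   and return the number of blocks actually visited.  Blocks that cannot
   reach the exit are not visited, so the result may be smaller than the
   total number of blocks.  */
int
toy_inverted_post_order (const struct toy_cfg *g, int exit_block, int *order)
{
  bool seen[TOY_MAX_BLOCKS] = { false };
  int n = 0;
  toy_visit (g, exit_block, seen, order, &n);
  return n;
}
```

Walking the resulting array from the last entry to the first visits the exit side first, which is the direction a backward problem such as ANTIC wants. Using the returned count as the loop bound avoids reading entries that were never written.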

[PATCH] Fix PR54825

2012-10-08 Thread Richard Guenther


This fixes PR54825 by properly handling vector BIT_FIELD_REFs in FRE/PRE.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-08  Richard Guenther  

PR tree-optimization/54825
* tree-ssa-sccvn.c (vn_nary_length_from_stmt): Handle BIT_FIELD_REF.
(init_vn_nary_op_from_stmt): Likewise.
* tree-ssa-pre.c (compute_avail): Use vn_nary_op_lookup_stmt.
* tree-ssa-sccvn.h (sizeof_vn_nary_op): Avoid overflow.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 192120)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_nary_length_from_stmt (gimple stmt)
*** 2194,2199 
--- 2194,2202 
  case VIEW_CONVERT_EXPR:
return 1;
  
+ case BIT_FIELD_REF:
+   return 3;
+ 
  case CONSTRUCTOR:
return CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt));
  
*** init_vn_nary_op_from_stmt (vn_nary_op_t
*** 2220,2225 
--- 2223,2235 
vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
break;
  
+ case BIT_FIELD_REF:
+   vno->length = 3;
+   vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+   vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
+   vno->op[2] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 2);
+   break;
+ 
  case CONSTRUCTOR:
vno->length = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt));
for (i = 0; i < vno->length; ++i)
*** init_vn_nary_op_from_stmt (vn_nary_op_t
*** 2227,2232 
--- 2237,2243 
break;
  
  default:
+   gcc_checking_assert (!gimple_assign_single_p (stmt));
vno->length = gimple_num_ops (stmt) - 1;
for (i = 0; i < vno->length; ++i)
vno->op[i] = gimple_op (stmt, i + 1);
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 192120)
--- gcc/tree-ssa-pre.c  (working copy)
*** compute_avail (void)
*** 3850,3860 
  || code == VEC_COND_EXPR)
continue;
  
! vn_nary_op_lookup_pieces (gimple_num_ops (stmt) - 1,
!   code,
!   gimple_expr_type (stmt),
!   gimple_assign_rhs1_ptr (stmt),
!   &nary);
  if (!nary)
continue;
  
--- 3850,3856 
  || code == VEC_COND_EXPR)
continue;
  
! vn_nary_op_lookup_stmt (stmt, &nary);
  if (!nary)
continue;
  
Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 192120)
--- gcc/tree-ssa-sccvn.h(working copy)
*** typedef const struct vn_nary_op_s *const
*** 51,57 
  static inline size_t
  sizeof_vn_nary_op (unsigned int length)
  {
!   return sizeof (struct vn_nary_op_s) + sizeof (tree) * (length - 1);
  }
  
  /* Phi nodes in the hashtable consist of their non-VN_TOP phi
--- 51,57 
  static inline size_t
  sizeof_vn_nary_op (unsigned int length)
  {
!   return sizeof (struct vn_nary_op_s) + sizeof (tree) * length - sizeof (tree);
  }
  
  /* Phi nodes in the hashtable consist of their non-VN_TOP phi
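
One plausible reading of the "Avoid overflow" ChangeLog entry, sketched here under stated assumptions: with an unsigned int length of 0, the old expression evaluates length - 1 in unsigned int first, which wraps to UINT_MAX before it is widened to size_t. The rewritten form does all arithmetic in size_t, so the wrap cancels out. The struct toy_nary_op and both helpers below are hypothetical stand-ins, not the real vn_nary_op_s.

```c
#include <stddef.h>

/* Hypothetical stand-in for a struct with a trailing one-element array
   that is over-allocated to hold LENGTH operands.  */
struct toy_nary_op
{
  unsigned int length;
  void *op[1];
};

/* Old form: (length - 1) is computed in unsigned int, so length == 0
   yields UINT_MAX, which is then multiplied by the pointer size.  */
size_t
toy_sizeof_old (unsigned int length)
{
  return sizeof (struct toy_nary_op) + sizeof (void *) * (length - 1);
}

/* New form: every intermediate value has type size_t, so for
   length == 0 the result is the struct size minus one pointer.  */
size_t
toy_sizeof_new (unsigned int length)
{
  return sizeof (struct toy_nary_op) + sizeof (void *) * length
	 - sizeof (void *);
}
```

For length >= 1 the two forms agree. They differ for length == 0 only on hosts where size_t is wider than unsigned int, for example typical LP64 systems.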

Re: patch to fix constant math - third small patch

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 1:36 PM, Kenneth Zadeck  wrote:
> yes, my bad.   here it is with the patches.

Just for the record, ok!

Thanks,
Richard.

> On 10/06/2012 11:55 AM, Kenneth Zadeck wrote:
>>
>> This is the third patch in the series of patches to fix constant math.
>> this one changes some predicates at the rtl level to use the new predicate
>> CONST_SCALAR_INT_P.
>> I did not include a few that were tightly intertwined with other changes.
>>
>> Not all of these changes are strictly mechanical.   Richard, when
>> reviewing this had me make additional changes to remove what he thought were
>> latent bugs at the rtl level.   However, it appears that the bugs were not
>> latent.  I do not know what is going on here but i am smart enough to not
>> look a gift horse in the mouth.
>>
>> All of this was done on the same machine with no changes and identical
>> configs.  It is an x86-64 with ubuntu 12-4.
>>
>> ok for commit?
>>
>> in the logs below, gbBaseline is a trunk from friday and the gbWide is the
>> same revision but with my patches.  Some of this like gfortran.dg/pr32627 is
>> obviously flutter, but the rest does not appear to be.
>>
>> =
>> heracles:~/gcc(13) gccBaseline/contrib/compare_tests
>> gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log
>> New tests that PASS:
>>
>> gcc.dg/builtins-85.c scan-assembler mysnprintf
>> gcc.dg/builtins-85.c scan-assembler-not __chk_fail
>> gcc.dg/builtins-85.c (test for excess errors)
>>
>>
>> heracles:~/gcc(14) gccBaseline/contrib/compare_tests
>> gbBaseline/gcc/testsuite/gfortran/gfortran.log
>> gbWide/gcc/testsuite/gfortran/gfortran.log
>> New tests that PASS:
>>
>> gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test for
>> excess errors)
>> gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess
>> errors)
>> gfortran.dg/pr32627.f03  -Os  (test for excess errors)
>> gfortran.dg/pr32635.f  -O0  execution test
>> gfortran.dg/pr32635.f  -O0  (test for excess errors)
>> gfortran.dg/substr_6.f90  -O2  (test for excess errors)
>>
>> Old tests that passed, that have disappeared: (Eeek!)
>>
>> gfortran.dg/pr32627.f03  -O1  (test for excess errors)
>> gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops
>> -finline-functions  (test for excess errors)
>> gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
>> gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
>> Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as
>> tool-and-target-specific interface file.
>>
>> === g++ Summary ===
>>
>> # of expected passes            49793
>> # of expected failures          284
>> # of unsupported tests          601
>>
>> runtest completed at Fri Oct  5 16:10:20 2012
>> heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log
>> Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
>> Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as
>> tool-and-target-specific interface file.
>>
>> === g++ Summary ===
>>
>> # of expected passes            50472
>> # of expected failures          284
>> # of unsupported tests          613
>>
>> runtest completed at Fri Oct  5 19:51:50 2012
>>
>>
>>
>>
>>
>

Re: [PATCH] PR 53528 c++/ C++11 Generalized Attribute support

2012-10-08 Thread Dodji Seketeli

Jason Merrill  writes:

> OK.

Thanks.  Committed to trunk at revision r192199.

-- 
Dodji

Re: RFA: darwin PATCH to fix build, internal visibility

2012-10-08 Thread Dominique Dhumieres

> > It appears that the patch should also special case the "scan-assembler 
> > .internal.*Foo.methodEv"
> > tests in g++.dg/ext/visibility/pragma-override1.C and 
> > g++.dg/ext/visibility/pragma-override2.C
> > on darwin as well...
>
> Done, thanks.

Jason,

These tests are still failing on darwin. I think that
target { ! *-*-solaris2* } { ! *-*-darwin* }
should be replaced with
target { ! { *-*-solaris2* *-*-darwin* } }

TIA

Dominique

Re: [RFC] Implement load sinking in loops

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou  wrote:
> Hi,
>
> we recently noticed that, even at -O3, the compiler doesn't figure out that
> the following loop is dumb:
>
> #define SIZE 64
>
> int foo (int v[])
> {
>   int r, i;
>
>   for (i = 0; i < SIZE; i++)
> r = v[i];
>
>   return r;
> }
>
> which was a bit of a surprise.  On second thoughts, this isn't entirely
> unexpected, as it probably matters only for (slightly) pathological cases.
> The attached patch nevertheless implements a form of load sinking in loops so
> as to optimize these cases.  It's combined with invariant motion to optimize:
>
> int foo (int v[], int a)
> {
>   int r, i;
>
>   for (i = 0; i < SIZE; i++)
> r = v[i] + a;
>
>   return r;
> }
>
> and with store sinking to optimize:
>
> int foo (int v1[], int v2[])
> {
>   int r[SIZE];
>   int i, j;
>
>   for (j = 0; j < SIZE; j++)
> for (i = 0; i < SIZE; i++)
>   r[j] = v1[j] + v2[i];
>
>   return r[SIZE - 1];
> }
>
> The optimization is enabled at -O2 in the patch for measurement purposes but,
> given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap,
> compiler-only, all languages except Go), it's probably best suited to -O3.
> Or perhaps we don't care and it should simply be dropped...  Thoughts?

Incidentally we have scev-const-prop to deal with the similar case of
scalar computations.  But I realize this doesn't work for expressions that
are dependent on a loop variant load.
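
For reference, the effect being asked for on the first two quoted loops can be written out by hand. This is only a sketch of the intended result, assuming ordinary non-volatile memory and a fixed trip count of SIZE; the function names are hypothetical and say nothing about how the patch implements the transform.

```c
#define SIZE 64

/* The loop as quoted: only the value loaded in the last iteration
   survives.  */
int
foo_loop (int v[])
{
  int r = 0, i;

  for (i = 0; i < SIZE; i++)
    r = v[i];

  return r;
}

/* Hand-written equivalent: the load is sunk past the loop, which then
   has no remaining effect.  */
int
foo_sunk (int v[])
{
  return v[SIZE - 1];
}

/* Second quoted loop: the load feeds an addition with an invariant.  */
int
foo_add_loop (int v[], int a)
{
  int r = 0, i;

  for (i = 0; i < SIZE; i++)
    r = v[i] + a;

  return r;
}

/* Hand-written equivalent: both the load and its single use are sunk.  */
int
foo_add_sunk (int v[], int a)
{
  return v[SIZE - 1] + a;
}
```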

@@ -103,6 +103,7 @@ typedef struct mem_ref_loc
 {
   tree *ref;   /* The reference itself.  */
   gimple stmt; /* The statement in that it occurs.  */
+  tree lhs;/* The (ultimate) LHS for a load.  */
 } *mem_ref_loc_p;

isn't that the lhs of stmt?

+static gimple_seq
+copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
+{
+  tree mem = *loc->ref;
+  tree lhs, tmp_var, ssa_name;
+  gimple_seq seq = NULL;
+  gimple stmt;
+  unsigned n = 0;
+
+  /* First copy the load and create the new LHS for it.  */
+  lhs = gimple_assign_lhs (loc->stmt);
+  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));

use make_temp_ssa_name or simply copy_ssa_name (not sure you need
fancy names here).

+  if (gimple_assign_rhs1 (use_stmt) == lhs)
+   {
+ op1 = ssa_name;
+ op2 = gimple_assign_rhs2 (use_stmt);
+   }
+  else
+   {
+ op1 = gimple_assign_rhs1 (use_stmt);
+ op2 = ssa_name;
+   }

this may enlarge lifetime of the other operand?  And it looks like it would
break with unary stmts (accessing out-of-bounds op2).  Also for
is_gimple_min_invariant other operand which may be for example &a.b
you need to unshare_expr it.

+  lhs = gimple_assign_lhs (use_stmt);
+  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
+  stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2);
+  ssa_name = make_ssa_name (tmp_var, stmt);
+  gimple_assign_set_lhs (stmt, ssa_name);

see above.  This can now be simplified to

   lhs = gimple_assign_lhs (use_stmt);
   ssa_name = copy_ssa_name (lhs, NULL);
   stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2);

Btw - isn't this all a bit backward (I mean the analysis in execute_lm?)
What you want is to apply this transform to as many of the _DEF_s of
the loop-closed PHI nodes as possible - only values used outside of the loop
are interesting.  That's (sort-of) what SCEV const-prop does (well, it also
uses SCEV to compute the overall effect of the iterations).  So what
you want to know is whether when walking the DEF chain of the
loop closed PHI you end up at definitions before the loop or at
definitions that are not otherwise used inside the loop.

Which means it is really expression sinking.  Does tree-ssa-sink manage
to sink anything out of a loop?  Even scalar computation parts I mean?  For

 for (..)
   {
 a = x[i];
 y[i] = a;
 b = a * 2;
   }
  ... = b;

it should be able to sink b = a*2.

So I think the more natural place to implement this is either SCEV cprop
or tree-ssa-sink.c.  And process things from the loop-closed PHI use
walking the DEFs (first process all, marking interesting things to also
catch commonly used exprs for two PHI uses).

Again you might simply want to open a bugreport for this unless you
want to implement it yourself.

Thanks,
Richard.

> Tested on x86_64-suse-linux.
>
>
> 2012-10-08  Eric Botcazou  
>
> * gimple.h (gsi_insert_seq_on_edge_before): Declare.
> * gimple-iterator.c (gsi_insert_seq_on_edge_before): New function.
> * tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field.
> (mem_ref_in_stmt): Remove gcc_assert.
> (copy_load_and_single_use_chain): New function.
> (execute_lm): Likewise.
> (hoist_memory_references): Hoist the loads after the stores.
> (ref_always_accessed_p): Rename into...
> (ref_always_stored_p): ...this.  Remove STORE_P and add ONCE_P.
>

Re: [RFC] Implement load sinking in loops

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 2:32 PM, Richard Guenther
 wrote:
> On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou  wrote:
>> Hi,
>>
>> we recently noticed that, even at -O3, the compiler doesn't figure out that
>> the following loop is dumb:
>>
>> #define SIZE 64
>>
>> int foo (int v[])
>> {
>>   int r, i;
>>
>>   for (i = 0; i < SIZE; i++)
>> r = v[i];
>>
>>   return r;
>> }
>>
>> which was a bit of a surprise.  On second thoughts, this isn't entirely
>> unexpected, as it probably matters only for (slightly) pathological cases.
>> The attached patch nevertheless implements a form of load sinking in loops so
>> as to optimize these cases.  It's combined with invariant motion to optimize:
>>
>> int foo (int v[], int a)
>> {
>>   int r, i;
>>
>>   for (i = 0; i < SIZE; i++)
>> r = v[i] + a;
>>
>>   return r;
>> }
>>
>> and with store sinking to optimize:
>>
>> int foo (int v1[], int v2[])
>> {
>>   int r[SIZE];
>>   int i, j;
>>
>>   for (j = 0; j < SIZE; j++)
>> for (i = 0; i < SIZE; i++)
>>   r[j] = v1[j] + v2[i];
>>
>>   return r[SIZE - 1];
>> }
>>
>> The optimization is enabled at -O2 in the patch for measurement purposes but,
>> given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap,
>> compiler-only, all languages except Go), it's probably best suited to -O3.
>> Or perhaps we don't care and it should simply be dropped...  Thoughts?
>
> Incidentally we have scev-const-prop to deal with the similar case of
> scalar computations.  But I realize this doesn't work for expressions that
> are dependent on a loop variant load.
>
> @@ -103,6 +103,7 @@ typedef struct mem_ref_loc
>  {
>tree *ref;   /* The reference itself.  */
>gimple stmt; /* The statement in that it occurs.  */
> +  tree lhs;/* The (ultimate) LHS for a load.  */
>  } *mem_ref_loc_p;
>
> isn't that the lhs of stmt?
>
> +static gimple_seq
> +copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
> +{
> +  tree mem = *loc->ref;
> +  tree lhs, tmp_var, ssa_name;
> +  gimple_seq seq = NULL;
> +  gimple stmt;
> +  unsigned n = 0;
> +
> +  /* First copy the load and create the new LHS for it.  */
> +  lhs = gimple_assign_lhs (loc->stmt);
> +  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
>
> use make_temp_ssa_name or simply copy_ssa_name (not sure you need
> fancy names here).
>
> +  if (gimple_assign_rhs1 (use_stmt) == lhs)
> +   {
> + op1 = ssa_name;
> + op2 = gimple_assign_rhs2 (use_stmt);
> +   }
> +  else
> +   {
> + op1 = gimple_assign_rhs1 (use_stmt);
> + op2 = ssa_name;
> +   }
>
> this may enlarge lifetime of the other operand?  And it looks like it would
> break with unary stmts (accessing out-of-bounds op2).  Also for
> is_gimple_min_invariant other operand which may be for example &a.b
> you need to unshare_expr it.
>
> +  lhs = gimple_assign_lhs (use_stmt);
> +  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
> +  stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2);
> +  ssa_name = make_ssa_name (tmp_var, stmt);
> +  gimple_assign_set_lhs (stmt, ssa_name);
>
> see above.  This can now be simplified to
>
>lhs = gimple_assign_lhs (use_stmt);
>ssa_name = copy_ssa_name (lhs, NULL);
>stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2);
>
> Btw - isn't this all a bit backward (I mean the analysis in execute_lm?)
> What you want is to apply this transform to as many of the _DEF_s of
> the loop-closed PHI nodes as possible - only values used outside of the loop
> are interesting.  That's (sort-of) what SCEV const-prop does (well, it also
> uses SCEV to compute the overall effect of the iterations).  So what
> you want to know is whether when walking the DEF chain of the
> loop closed PHI you end up at definitions before the loop or at
> definitions that are not otherwise used inside the loop.
>
> Which means it is really expression sinking.  Does tree-ssa-sink manage
> to sink anything out of a loop?  Even scalar computation parts I mean?  For
>
>  for (..)
>{
>  a = x[i];
>  y[i] = a;
>  b = a * 2;
>}
>   ... = b;
>
> it should be able to sink b = a*2.
>
> So I think the more natural place to implement this is either SCEV cprop
> or tree-ssa-sink.c.  And process things from the loop-closed PHI use
> walking the DEFs (first process all, marking interesting things to also
> catch commonly used exprs for two PHI uses).
>
> Again you might simply want to open a bugreport for this unless you
> want to implement it yourself.

We indeed sink 2*tem but not a[i] here.  Because tree-ssa-sink.c doesn't
sink loads (IIRC) at all, but I've seen patches to fix that (IIRC).

int a[256];
int foo (int x)
{
  int i, k = 0;
  for (i = 0; i < x; ++i)
{
  int tem = a[i];
  k = 2*tem;
}
  return k;
}
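
Written out by hand, sinking only the multiplication in the function above would look roughly like the sketch below. The load of a[i] stays in the loop, matching the observation that loads are not sunk. The explicit x > 0 guard, the renamed array toy_a and the names foo_orig and foo_sunk are assumptions made for this illustration; this is not compiler output.

```c
int toy_a[256];

/* Same shape as the function above, reading from toy_a.  */
int
foo_orig (int x)
{
  int i, k = 0;
  for (i = 0; i < x; ++i)
    {
      int tem = toy_a[i];
      k = 2 * tem;
    }
  return k;
}

/* The multiplication is computed once after the loop; the load of
   toy_a[i] is still executed in every iteration.  */
int
foo_sunk (int x)
{
  int i, tem = 0, k = 0;
  for (i = 0; i < x; ++i)
    tem = toy_a[i];
  if (x > 0)
    k = 2 * tem;
  return k;
}
```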

Richard.

> Thanks,
> Richard.
>
>> Tested on x86_64-suse-linux.
>>
>>
>> 2012-10-

gcc/lto/lto.c: Free lto_file struct after closing the file

2012-10-08 Thread Tobias Burnus


lto_obj_file_open allocates:
  lo = XCNEW (struct lto_simple_object);
However, the data is never freed - neither explicitly nor in 
lto_obj_file_close.


In the attached patch, I free the memory now after the call to 
lto_obj_file_close.
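
The ownership pattern described above can be sketched in isolation. The names toy_file, toy_open, toy_close and toy_close_and_free are hypothetical and do not correspond to the LTO object-file API; the point is only that an open routine which allocates with a zeroing allocator needs a matching free somewhere when the close routine does not release the struct itself.

```c
#include <stdlib.h>

/* Hypothetical file handle allocated by the open routine.  */
struct toy_file
{
  int is_open;
};

/* Allocates a zero-initialized handle, as XCNEW would.  */
struct toy_file *
toy_open (void)
{
  struct toy_file *f = calloc (1, sizeof *f);
  if (f)
    f->is_open = 1;
  return f;
}

/* Releases the underlying resource but, by assumption, not the handle
   itself.  Calling only this function leaks the handle.  */
void
toy_close (struct toy_file *f)
{
  if (f)
    f->is_open = 0;
}

/* The shape of the fix: close first, then free the handle.  */
void
toy_close_and_free (struct toy_file *f)
{
  toy_close (f);
  free (f);
}
```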


Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias


patch.diff
Description: application/unknown

Re: gcc/lto/lto.c: Free lto_file struct after closing the file

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 2:39 PM, Tobias Burnus  wrote:
> lto_obj_file_open allocates:
>   lo = XCNEW (struct lto_simple_object);
> However, the data is never freed - neither explicitly nor in
> lto_obj_file_close.
>
> In the attached patch, I free the memory now after the call to
> lto_obj_file_close.
>
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?

Ok.

Thanks,
Richard.

> Tobias

Re: [lra] another patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher

On Mon, Oct 8, 2012 at 1:00 AM, Steven Bosscher  wrote:
> Hello,
>
> This patch changes the worklist-like bitmap in lra_eliminate() to an
> sbitmap.  Effect on compile time:

I have another patch to also make lra_constraint_insn_stack_bitmap an sbitmap.
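
As background on why a dense bit vector can be cheaper than a linked-list bitmap for a worklist over a known index range, here is a minimal fixed-size sketch. It is not GCC's sbitmap implementation; toy_sbitmap and its helpers are hypothetical, and TOY_MAX_BITS is an arbitrary bound chosen for the example.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define TOY_MAX_BITS 1024
#define TOY_N_WORDS (TOY_MAX_BITS / TOY_WORD_BITS)

/* Hypothetical dense bit vector: one bit per possible index, stored
   in a flat array, so set and test are a single indexed access.  */
struct toy_sbitmap
{
  unsigned long words[TOY_N_WORDS];
};

void
toy_set_bit (struct toy_sbitmap *s, size_t n)
{
  s->words[n / TOY_WORD_BITS] |= 1UL << (n % TOY_WORD_BITS);
}

bool
toy_test_bit (const struct toy_sbitmap *s, size_t n)
{
  return (s->words[n / TOY_WORD_BITS] >> (n % TOY_WORD_BITS)) & 1UL;
}

/* Remove and return the lowest set index, or -1 if the set is empty.
   This is the "pop" operation of a worklist kept as a bit set.  */
int
toy_pop_first (struct toy_sbitmap *s)
{
  for (size_t w = 0; w < TOY_N_WORDS; w++)
    if (s->words[w] != 0)
      for (size_t b = 0; b < TOY_WORD_BITS; b++)
	if (s->words[w] & (1UL << b))
	  {
	    s->words[w] &= ~(1UL << b);
	    return (int) (w * TOY_WORD_BITS + b);
	  }
  return -1;
}
```

Setting a bit that is already set is a no-op, so pushing the same element twice does not duplicate work.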

Without patch:
log.0: LRA non-specific:  46.94 ( 6%)
log.0: LRA virtuals elimination:  51.56 ( 6%)
log.0: LRA reload inheritance  :   0.03 ( 0%)
log.0: LRA create live ranges  :  46.67 ( 6%)
log.0: LRA hard reg assignment :   0.55 ( 0%)

With patch:
log.3: LRA non-specific:  18.14 ( 2%)
log.3: LRA virtuals elimination:   8.04 ( 1%)
log.3: LRA reload inheritance  :   0.03 ( 0%)
log.3: LRA create live ranges  :  45.01 ( 6%)
log.3: LRA hard reg assignment :   0.63 ( 0%)

I'll go through the usual testing cycle again with my patch set and
post the final patch here for review today or tomorrow.

At this point I think it's clear that we can speed up LRA even on
crazy-large test cases, so I would not object anymore to a merge into
the trunk at this point.

Ciao!
Steven

Re: [patch] Add option to compute "reaching and live definitions"

2012-10-08 Thread Paolo Bonzini

Il 07/10/2012 19:18, Steven Bosscher ha scritto:
> Hello,
> 
> The attached patch adds a DF changeable flag to compute a subset of
> reaching definitions that are also live at the program points they
> reach. This is an idea I discussed with Paolo many years ago already,
> but until today it hadn't really ever been close to the top of my todo
> list, but trying to compile the test case for PR54146 with -fweb
> finally changed that :-)
> 
> The idea is to prune the DF_RD_OUT set of each basic block by
> registers live in DF_LR_OUT. I've implemented this pruning with the
> same approach as the sparse formulation of RD dataflow, expanding the
> regs in DF_LR_OUT to the corresponding set of DEFs and using that set
> to mask out dead DEFs in DF_RD_OUT. This is a convenient formulation
> because DF_LR is already expressed in terms of regnos (like
> sparse_kill & friends), and the formulation also works fine for the
> dense formulation, of course.
> 
> The effect on compile time for a set of cc1-i files is negligible (not
> measurable, anyway), but for crazy large test cases like PR54146 this
> patch is the difference between triggering out-of-memory or completing
> the pass (at least -fweb, probably also the other affected passes).
> 
> Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ok.

I wonder if we actually need the non-pruned version anywhere...
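
The pruning step described in the quoted message can be sketched with plain bit masks: expand the live registers into the set of definitions that write them, then use that set to mask the reaching definitions, reporting whether anything changed. This is a toy model only; it assumes at most 32 definitions and register numbers below 32, and toy_prune_dead_defs is a hypothetical name, not part of the DF framework.

```c
#include <stdbool.h>
#include <stdint.h>

/* RD_OUT has one bit per definition that reaches the end of a block.
   LR_OUT has one bit per register live at the end of that block.
   DEF_REGNO[d] is the register written by definition d.

   Keep only the reaching definitions whose register is live and
   return true if RD_OUT changed.  */
bool
toy_prune_dead_defs (uint32_t *rd_out, uint32_t lr_out,
		     const unsigned def_regno[], unsigned n_defs)
{
  uint32_t live_defs = 0;

  /* Expand the live registers to the definitions that set them.  */
  for (unsigned d = 0; d < n_defs; d++)
    if (lr_out & (UINT32_C (1) << def_regno[d]))
      live_defs |= UINT32_C (1) << d;

  uint32_t pruned = *rd_out & live_defs;
  bool changed = pruned != *rd_out;
  *rd_out = pruned;
  return changed;
}
```

Returning the "changed" flag mirrors the bitmap_and_into prototype change in the patch below, where the in-place AND reports whether the target bitmap was modified.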

Paolo

> df_rd_pruned.diff
> 
>   * bitmap.h (bitmap_and_into): Update prototype.
>   * bitmap.c (bitmap_and_into): Return true if the target bitmap
>   changed, false otherwise.
> 
>   * df.h (df_dump_insn_problem_function): New function type.
>   (struct df_problem): Add two functions, to dump just before and
>   just after an insn.
>   (DF_RD_PRUNE_DEAD_DEFS): New changeable flag.
>   (df_dump_insn_top, df_dump_insn_bottom): New prototypes.
>   * df-core (df_dump_region): Use dump_bb.
>   (df_dump_bb_problem_data): New function.
>   (df_dump_top, df_dump_bottom): Rewrite using df_dump_bb_problem_data.
>   (df_dump_insn_problem_data): New function.
>   (df_dump_insn_top, df_dump_insn_bottom): New functions.
>   * df-scan.c (problem_SCAN): Add NULL fields for new members.
>   * df-problems.c (df_rd_local_compute): Ignore hard registers if
>   DF_NO_HARD_REGS is in effect.
>   (df_rd_transfer_function): If DF_RD_PRUNE_DEAD_DEFS is in effect,
>   prune reaching defs using the LR problem.
>   (df_rd_start_dump): Fix dumping of DEFs map.
>   (df_rd_dump_defs_set): New function.
>   (df_rd_top_dump, df_rd_bottom_dump): Use it.
>   (problem_RD): Add NULL fields for new members.
>   (problem_LR, problem_LIVE): Likewise.
>   (df_chain_bb_dump): New function.
>   (df_chain_top_dump): Dump only for artificial DEFs and USEs,
>   using df_chain_bb_dump.
>   (df_chain_bottom_dump): Likewise.
>   (df_chain_insn_top_dump, df_chain_insn_bottom_dump): New functions.
>   (problem_CHAIN): Add them as new members.
>   (problem_WORD_LR, problem_NOTE): Add NULL fields for new members.
>   (problem_MD): Likewise.
>   * cfgrtl.c (rtl_dump_bb): Use df_dump_insn_top and df_dump_insn_bottom.
>   (print_rtl_with_bb): Likewise.
> 
>   * dce.c (init_dce): Use DF_RD_PRUNE_DEAD_DEFS.
>   * loop-invariant.c (find_defs): Likewise.
>   * loop-iv.c (iv_analysis_loop_init): Likewise.
>   * ree.c (find_and_remove_re): Likewise.
>   * web.c (web_main): Likewise.
> 
> Index: bitmap.h
> ===
> --- bitmap.h  (revision 192106)
> +++ bitmap.h  (working copy)
> @@ -224,7 +224,7 @@ extern unsigned long bitmap_count_bits (const_bitm
> are three operand versions that to not destroy the source bitmaps.
> The operations supported are &, & ~, |, ^.  */
>  extern void bitmap_and (bitmap, const_bitmap, const_bitmap);
> -extern void bitmap_and_into (bitmap, const_bitmap);
> +extern bool bitmap_and_into (bitmap, const_bitmap);
>  extern bool bitmap_and_compl (bitmap, const_bitmap, const_bitmap);
>  extern bool bitmap_and_compl_into (bitmap, const_bitmap);
>  #define bitmap_compl_and(DST, A, B) bitmap_and_compl (DST, B, A)
> Index: bitmap.c
> ===
> --- bitmap.c  (revision 192106)
> +++ bitmap.c  (working copy)
> @@ -916,17 +916,18 @@ bitmap_and (bitmap dst, const_bitmap a, const_bitm
>  dst->indx = dst->current->indx;
>  }
>  
> -/* A &= B.  */
> +/* A &= B.  Return true if A changed.  */
>  
> -void
> +bool
>  bitmap_and_into (bitmap a, const_bitmap b)
>  {
>bitmap_element *a_elt = a->first;
>const bitmap_element *b_elt = b->first;
>bitmap_element *next;
> +  bool changed = false;
>  
>if (a == b)
> -return;
> +return false;
>  
>while (a_elt && b_elt)
>  {
> @@ -935,6 +936,7 @@ bitmap_and_into (bitmap a, const_bitmap b)
> next = a_elt->next;
> bitmap_element_free (a

[Patch] Fix PR52945

2012-10-08 Thread Dominique Dhumieres

The following patch fixes PR52945 on Darwin. It has been approved
by Jan Hubicka in PR52945#c5. Since I don't have write permission,
could someone commit it for me?

TIA

Dominique

2012-10-08  Dominique d'Humieres  

PR gcc/52945
* testsuite/gcc.dg/lto/pr52634_0.c: skip the test on Darwin.

--- /opt/gcc/_clean/gcc/testsuite/gcc.dg/lto/pr52634_0.c  2012-04-10 08:58:02.0 +0200
+++ /opt/gcc/work/gcc/testsuite/gcc.dg/lto/pr52634_0.c  2012-06-19 15:09:29.0 +0200
@@ -1,3 +1,5 @@
+/* { dg-require-weak "" } */
+/* { dg-require-alias "" } */
 /* { dg-lto-do link } */
 /* { dg-lto-options {{-flto -r -nostdlib -flto-partition=1to1}} */
 extern int cfliteValueCallBacks;

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Pavel Chupin

On Android NDK libstdc++ is configured, built and packaged separately.
The problem is not dependency on libgcc sources but rather dependency
on the symlink which is generated during libgcc build and cannot be
found if libstdc++ is configured and built separately.
It was working fine for 4.4 and 4.6. This issue has been introduced in 4.7.

Do you think libstdc++ should not be built separately?

2012/10/6 Andrew Pinski :
> On Fri, Oct 5, 2012 at 12:13 PM, Andrew Pinski  wrote:
>> On Fri, Oct 5, 2012 at 7:32 AM, Pavel Chupin  wrote:
>>> I can't configure libstdc++ separately. To reproduce:
>>>
>>> mkdir BUILD
>>> cd BUILD
>>> ../libstdc++-v3/configure
>>>
>>> Error:
>>> make: *** No rule to make target
>>> `/users/pvchupin/android/toolchain/gcc/gcc-4.8/BUILD/../libgcc/gthr-default.h',
>>> needed by `bits/gthr-default.h'.  Stop.
>>>
>>> See fix attached.
>>>
>>> Ok for trunk and 4.7?
>>
>> Why do you want to compile libstdc++ separately from GCC?  I think you
>> need to explain why you want to do that.  In fact libstdc++ depends on
>> libgcc internals is not a bug but rather a feature.
>
> One more thing is that for cases where target==host!=build, you can
> just use the libraries which are produced by the cross compiler and
> use "make all-host" and "make install-host" for the programs.
>
> This should simplify how Yocto builds the "native" GCC and not
> worrying about building libstdc++ separately.
>
> Thanks,
> Andrew Pinski
>
>>
>> Thanks,
>> Andrew Pinski
>>
>>
>>
>>>
>>> 2012-10-05  Pavel Chupin  
>>>
>>> Fix missing gthr-default.h issue on separate libstdc++ configure
>>> * libstdc++-v3/acinclude.m4: Define glibcxx_thread_h.
>>> * libstdc++-v3/include/Makefile.am: Use glibcxx_thread_h.
>>> * libstdc++-v3/Makefile.in: Regenerate.
>>> * libstdc++-v3/configure: Regenerate.
>>> * libstdc++-v3/doc/Makefile.in: Regenerate.
>>> * libstdc++-v3/include/Makefile.in: Regenerate.
>>> * libstdc++-v3/libsupc++/Makefile.in: Regenerate.
>>> * libstdc++-v3/po/Makefile.in: Regenerate.
>>> * libstdc++-v3/python/Makefile.in: Regenerate.
>>> * libstdc++-v3/src/Makefile.in: Regenerate.
>>> * libstdc++-v3/src/c++11/Makefile.in: Regenerate.
>>> * libstdc++-v3/src/c++98/Makefile.in: Regenerate.
>>> * libstdc++-v3/testsuite/Makefile.in: Regenerate.
>>>
>>> --
>>> Pavel Chupin
>>> Intel Corporation



-- 
Pavel Chupin
Software Engineer
Intel Corporation

Re: [C++ Patch/RFC] PR 54194

2012-10-08 Thread Jason Merrill

This is definitely an improvement, though for warnings about issues with 
the left or right argument, we could use the EXPR_LOCATION of the 
problematic argument rather than the location of the new operand.


Jason

Re: [PATCH] Improve debug info for partial inlining (PR debug/54519, take 2)

2012-10-08 Thread H.J. Lu

On Fri, Oct 5, 2012 at 7:19 AM, Jakub Jelinek  wrote:
> On Fri, Oct 05, 2012 at 03:59:55PM +0200, Richard Guenther wrote:
>> I don't think we want to rely on that ... so just keep the push/pop_cfun.
>
> Ok, so this is what I'm retesting (basically just comments added and the two
> lines (subcode and set) swapped:
>
> 2012-10-05  Jakub Jelinek  
>
> PR debug/54519
> * ipa-split.c (split_function): Add debug args and
> debug source and normal stmts for args_to_skip which are
> gimple regs.
> * tree-inline.c (copy_debug_stmt): When inlining, adjust
> source debug bind stmts to debug binds of corresponding
> DEBUG_EXPR_DECL.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54837

-- 
H.J.

Re: [C++ Patch/RFC] PR 54194

2012-10-08 Thread Paolo Carlini


On 10/08/2012 03:57 PM, Jason Merrill wrote:
> This is definitely an improvement, though for warnings about issues 
> with the left or right argument, we could use the EXPR_LOCATION of the 
> problematic argument rather than the location of the new operand.

I agree. Let me see if I can figure out something straightforward enough.

Thanks!
Paolo.

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Paolo Carlini


On 10/08/2012 03:43 PM, Pavel Chupin wrote:

> This issue has been introduced in 4.7.
Irrespective of what we are eventually going to do from a practical 
point of view, I think it would be important to understand when/what 
introduced the issue: did you analyze that in any detail?


Thanks,
Paolo.

Re: [patch] Add option to compute "reaching and live definitions"

2012-10-08 Thread Steven Bosscher

On Mon, Oct 8, 2012 at 3:27 PM, Paolo Bonzini wrote:
> I wonder if we actually need the non-pruned version anywhere...

I don't think so, but I'm not sure. Only ddg.c and loop-iv.c access
the DF_RD results directly (i.e. not via DU/UD chains). For loop-iv
the pruned version is fine. For ddg I didn't feel comfortable enough
with that code to perform the changes there as well.

Ciao!
Steven

Re: Fixup INTEGER_CST

2012-10-08 Thread Jan Hubicka

> >   2) As we query the type_hash while we are rewriting the types,
> >  we run into instability of the hashtable. This manifests itself
> >  as an ICE when one adds a sanity check that while merging function
> >  types their arg types are equivalent, too.
> >  This ICEs compiling e.g. sqlite but I did not really manage to
> >  reduce this.  I tracked it down to the argument type being inserted
> >  into gimple_type_hash but at the time we query the new argument type,
> >  the original is no longer found despite their hashes being equivalent.
> >  The problem is hidden when things fit into the leader cache,
> >  so one needs rather big testcase.
> 
> Ugh.  For reduction you can disable those caches though.  The above
> means there is a disconnect between hashing and comparing.
> Maybe it's something weird with the early out
> 
>   if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
> goto same_types;
> ?

I filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54856.  Sadly, the
testcase I reduced with yesterday's tree does not reproduce with today's tree
on a different machine.  Perhaps it is a hash table conflict with GGC or
something like that.

sqlite seems big enough to trigger the bug quite reproducibly.  On current
mainline, however, I need to disable the leader cache (that was not true at
the weekend on the other machine ;)

Honza

PING Re: [PATCH] PR c++/53540 - using fails to be equivalent to typedef

2012-10-08 Thread Dodji Seketeli

Friendly pinging this patch.

Dodji Seketeli  writes:

> Hello,
>
> In the example of this problem report, during the substituting of int
> into 'function', tsubst_aggr_type fails for the alias ctxt1.  This is
> because TYPE_TEMPLATE_INFO looks for the TEMPLATE_INFO of the ctxt1
> alias at the wrong place and was wrongly finding it to be NULL.
> Namely, it was looking for it in the DECL_TEMPLATE_INFO of the
> declaration of the type -- as if ctxt1 was an alias template
> specialization -- rather than looking for it in its
> CLASSTYPE_TEMPLATE_INFO.
>
> Fixed thus.  The second hunk of the patch is just to prevent the
> compiler from crashing when primary_template_instantiation_p is passed
> an alias of a class template instantiation.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk.
>
> gcc/cp
>
>   * cp-tree.h (TYPE_TEMPLATE_INFO): For an alias that is not an
>   instance of alias template, don't look for its TEMPLATE_INFO in
>   its declaration.
>   * pt.c (primary_template_instantiation_p): Don't crash on an alias
>   that is not an instance of a template.
>
> gcc/testsuite/
>
>   * g++.dg/cpp0x/alias-decl-21.C: New test.
> ---
>  gcc/cp/cp-tree.h   |4 ++--
>  gcc/cp/pt.c|1 +
>  gcc/testsuite/g++.dg/cpp0x/alias-decl-21.C |   24 
>  3 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-21.C
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 44f3ac1..64a8830 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -2634,8 +2634,8 @@ extern void decl_shadowed_for_var_insert (tree, tree);
> template info for the alias template, not the one (if any) for the
> template of the underlying type.  */
>  #define TYPE_TEMPLATE_INFO(NODE) \
> -  (TYPE_ALIAS_P (NODE)                                           \
> -   ? ((TYPE_NAME (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))\
> +  ((TYPE_ALIAS_P (NODE) && DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))\
> +   ? (DECL_LANG_SPECIFIC (TYPE_NAME (NODE))  \
>? DECL_TEMPLATE_INFO (TYPE_NAME (NODE))
> \
>: NULL_TREE)   \
> : ((TREE_CODE (NODE) == ENUMERAL_TYPE)\
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index ad81bab..3163bd4 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -2854,6 +2854,7 @@ primary_template_instantiation_p (const_tree t)
>else if (TYPE_P (t)
>  && TYPE_TEMPLATE_INFO (t)
>  && PRIMARY_TEMPLATE_P (TYPE_TI_TEMPLATE (t))
> +&& DECL_LANG_SPECIFIC (TYPE_NAME (t))
>  && DECL_TEMPLATE_INSTANTIATION (TYPE_NAME (t)))
>  return true;
>return false;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-21.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-21.C
> new file mode 100644
> index 000..b68fa93
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-21.C
> @@ -0,0 +1,24 @@
> +// Origin: PR c++/53540
> +// { dg-do compile { target c++11 } }
> +
> +template 
> +struct context
> +{
> +  typedef int type;
> +};
> +
> +template 
> +void function()
> +{
> +  using ctx1 = context;
> +  typename ctx1::type f1;
> +
> +  typedef context ctx2;
> +  typename ctx2::type f2;
> +}
> +
> +int main()
> +{
> +  function();
> +}
> +

-- 
Dodji
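As a reader's aid for the PR above: the property the patch restores is that, inside a template, an alias-declaration and a typedef naming the same class template instantiation behave identically. The following is a minimal sketch of that expectation, not the committed testcase; holder and alias_matches_typedef are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class template with a nested type, for illustration only.
template <typename T>
struct holder {
  typedef int type;
};

// Both spellings name holder<T>; nested-type lookup must work through either.
template <typename T>
bool alias_matches_typedef() {
  using via_using = holder<T>;    // C++11 alias-declaration
  typedef holder<T> via_typedef;  // classic typedef

  typename via_using::type a = 1;
  typename via_typedef::type b = 1;

  return std::is_same<via_using, via_typedef>::value && a == b;
}
```

Instantiating alias_matches_typedef<int>() exercises the same kind of substitution the report describes, assuming a C++11 compiler.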

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Pavel Chupin

It has been changed here:
http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=630d52ca0a88d173f89634a5d7dd8aee07d04d80

subj:"Move gthr to toplevel libgcc"

Here is the change (builddir is used as the directory for gthr-default.h):
-${host_builddir}/gthr-default.h: ${toplevel_srcdir}/gcc/${glibcxx_thread_h} \
+${host_builddir}/gthr-default.h: ${toplevel_builddir}/libgcc/gthr-default.h \

All other changes are fine since they continue to look in srcdir:
-${host_builddir}/gthr-posix.h: ${toplevel_srcdir}/gcc/gthr-posix.h \
+${host_builddir}/gthr-posix.h: ${toplevel_srcdir}/libgcc/config/gthr-posix.h \

2012/10/8 Paolo Carlini :
> On 10/08/2012 03:43 PM, Pavel Chupin wrote:
>>
>> This issue has been introduced in 4.7.
>
> Irrespective of what we are eventually going to do from a practical point of
> view, I think it would be important to understand when/what introduced the
> issue: did you analyze that in any detail?
>
> Thanks,
> Paolo.



-- 
Pavel Chupin
Software Engineer
Intel Corporation

Re: [i386] recognize haddpd

2012-10-08 Thread Marc Glisse


On Fri, 28 Sep 2012, Uros Bizjak wrote:

>> 2) {v[0]-v[1], v[0]-v[1]} is not recognized as a hsubpd because
>> vec_duplicate doesn't match vec_concat. Do we really need to duplicate (no
>> pun intended) the pattern?
>
> You can add this transformation to simplify-rtx.c. Probably vec_concat
> with two equal operands can be canonicalized as vec_duplicate.

Actually, it is replacing vec_duplicate with vec_concat that would help.
Well, I'll see about that later.

Here is what I came up with, trying to follow your other advice (thanks a
lot!).

Passes bootstrap+testsuite.
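For reference, here is a scalar model of what the two instructions compute, which is what the new patterns have to recognize: each result lane combines the two lanes of one source operand. This is only a sketch under the assumption of the usual SSE3 semantics; V2, hadd_model and hsub_model are hypothetical names, and no intrinsics are used.

```cpp
#include <array>
#include <cassert>

// Two doubles, standing in for a V2DF / __m128d value.
typedef std::array<double, 2> V2;

// Model of haddpd: lane 0 sums the lanes of p, lane 1 sums the lanes of q.
V2 hadd_model(const V2 &p, const V2 &q) {
  V2 r = {{p[0] + p[1], q[0] + q[1]}};
  return r;
}

// Model of hsubpd: lane 0 is p[0] - p[1], lane 1 is q[0] - q[1].
V2 hsub_model(const V2 &p, const V2 &q) {
  V2 r = {{p[0] - p[1], q[0] - q[1]}};
  return r;
}
```

Because addition of two doubles is commutative, p[1] + p[0] gives the same lane as p[0] + p[1], which is why the add pattern can accept permuted operands while the subtract pattern cannot.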

2012-10-08  Marc Glisse  

gcc/
PR target/54400
* config/i386/i386.md (type attribute): Add sseadd1.
(unit attribute): Add support for sseadd1.
* config/i386/sse.md (sse3_hv2df3): split into...
(sse3_haddv2df3): ... expander.
(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
(sse3_hsubv2df3): ... define_insn.
(*sse3_haddv2df3_low): New define_insn.
(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
PR target/54400
* gcc.target/i386/pr54400.c: New testcase.

--
Marc Glisse
Index: gcc/testsuite/gcc.target/i386/pr54400.c
===
--- gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
+
+#include 
+
+double f (__m128d p)
+{
+  return p[0] - p[1];
+}
+
+double g1 (__m128d p)
+{
+  return p[0] + p[1];
+}
+
+double g2 (__m128d p)
+{
+  return p[1] + p[0];
+}
+
+__m128d h (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] - p[1], q[0] - q[1] };
+  return r;
+}
+
+__m128d i1 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[0] + q[1] };
+  return r;
+}
+
+__m128d i2 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[1] + q[0] };
+  return r;
+}
+
+__m128d i3 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[0] + q[1] };
+  return r;
+}
+
+__m128d i4 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[1] + q[0] };
+  return r;
+}
+
+/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
+/* { dg-final { scan-assembler-times "haddpd" 6 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */

Property changes on: gcc/testsuite/gcc.target/i386/pr54400.c
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 192206)
+++ gcc/config/i386/i386.md (working copy)
@@ -320,36 +320,36 @@
 ;; provided in other attributes.
 (define_attr "type"
   "other,multi,
alu,alu1,negnot,imov,imovx,lea,
incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
icmp,test,ibr,setcc,icmov,
push,pop,call,callv,leave,
str,bitmanip,
fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,lwp,
+   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   ssediv,sseins,ssemuladd,sse4arg,lwp,
mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
 ;; Main data type used by the insn
 (define_attr "mode"
   "unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
 (define_attr "unit" "integer,i387,sse,mmx,unknown"
   (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
   (const_string "i387")
 (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
- sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+ sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
   (const_string "sse")
 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
   (const_string "mmx")
 (eq_attr "type" "other")
   (const_string "unknown")]
 (const_string "integer")))
 
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
Index: gcc/config/i386/sse.md
===
--- gcc/config/i386/sse.md  (revision 192206)
+++ gcc/config/i386/sse.md  (working copy)
@@ -1209,42 +1209,120 @@
  (vec_select:DF (match_dup 1) (parallel [(const_int 3)])))
(plusminus:DF
  (vec_select:DF (match_dup 2) (parallel [(const_int 2)]))
  (vec_select:DF (match_dup 2) (parallel [(const_int 3)]))]
   "TARGET_AVX"
   "vhpd

RFA: PATCH to acinclude.m4 to fix gas version detection

2012-10-08 Thread Jason Merrill


On 10/04/2012 11:40 AM, Jason Merrill wrote:
> Recent versions of binutils seem to have started putting ' around the
> version number in bfd/configure.in, which was confusing gcc configure.

When this change was made to binutils, the other directories changed to
using bfd/configure --version to get the version number, so this version
of my patch uses that instead of changing the regexp.  This patch also
fixes another issue I noticed with AIX configury.
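As a reader's note on the AIX part: the corrected test in the generated configure compares against a version packed as ((major * 1000) + minor) * 1000 + patch. The sketch below models that arithmetic and the parsing of a "VERSION=M.N[.P]" string; pack_version and pack_version_string are hypothetical helper names, and treating a missing patch level as 0 is an assumption of this sketch rather than something the configure script states.

```cpp
#include <cassert>
#include <cstdio>

// Pack major.minor.patch as ((major * 1000) + minor) * 1000 + patch.
long pack_version(int major, int minor, int patch) {
  return (static_cast<long>(major) * 1000 + minor) * 1000 + patch;
}

// Parse "VERSION=M.N" or "VERSION=M.N.P" and pack it.
// Returns -1 when major and minor cannot both be read.
long pack_version_string(const char *s) {
  int major = 0, minor = 0, patch = 0;
  int fields = std::sscanf(s, "VERSION=%d.%d.%d", &major, &minor, &patch);
  if (fields < 2)
    return -1;
  // Assumption of this sketch: a missing patch level counts as 0.
  return pack_version(major, minor, fields == 3 ? patch : 0);
}
```

With this packing, the 2.21.0 threshold becomes a single integer, so an ordinary numeric comparison is enough.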


OK for trunk?

Jason

commit 94d42e379702606ec09b241d54ed7ad72cfaff99
Author: Jason Merrill 
Date:   Fri Oct 5 18:59:08 2012 -0400

	* acinclude.m4 (gcc_cv_gas_version): Try bfd/configure --version first.
	* configure.ac (gcc_cv_gld_version): Likewise.
	(gcc_cv_as_aix_ref): Fix typo.
	* configure: Regenerate.

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index c24464b..f7699ea 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -389,6 +389,8 @@ dnl # gcc_cv_as_gas_srcdir must be defined before this.
 dnl # This gross requirement will go away eventually.
 AC_DEFUN([_gcc_COMPUTE_GAS_VERSION],
 [gcc_cv_as_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
+gcc_cv_gas_version=`$gcc_cv_as_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+if test x$gcc_cv_gas_version != x; then true; else
 for f in $gcc_cv_as_bfd_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure.in \
@@ -397,7 +399,7 @@ for f in $gcc_cv_as_bfd_srcdir/configure \
   if test x$gcc_cv_gas_version != x; then
 break
   fi
-done
+done; fi
 gcc_cv_gas_major_version=`expr "$gcc_cv_gas_version" : "VERSION=\([[0-9]]*\)"`
 gcc_cv_gas_minor_version=`expr "$gcc_cv_gas_version" : "VERSION=[[0-9]]*\.\([[0-9]]*\)"`
 gcc_cv_gas_patch_version=`expr "$gcc_cv_gas_version" : "VERSION=[[0-9]]*\.[[0-9]]*\.\([[0-9]]*\)"`
diff --git a/gcc/configure b/gcc/configure
index 45bba8e..fe4f3c7 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -21233,6 +21233,8 @@ if test "$gcc_cv_as" = ../gas/as-new$build_exeext; then
 $as_echo "newly built gas" >&6; }
   in_tree_gas=yes
   gcc_cv_as_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
+gcc_cv_gas_version=`$gcc_cv_as_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+if test x$gcc_cv_gas_version != x; then true; else
 for f in $gcc_cv_as_bfd_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure.in \
@@ -21241,7 +21243,7 @@ for f in $gcc_cv_as_bfd_srcdir/configure \
   if test x$gcc_cv_gas_version != x; then
 break
   fi
-done
+done; fi
 gcc_cv_gas_major_version=`expr "$gcc_cv_gas_version" : "VERSION=\([0-9]*\)"`
 gcc_cv_gas_minor_version=`expr "$gcc_cv_gas_version" : "VERSION=[0-9]*\.\([0-9]*\)"`
 gcc_cv_gas_patch_version=`expr "$gcc_cv_gas_version" : "VERSION=[0-9]*\.[0-9]*\.\([0-9]*\)"`
@@ -21393,13 +21395,15 @@ $as_echo "newly built ld" >&6; }
 	elif test "$ld_is_gold" = yes; then
 	  in_tree_ld_is_elf=yes
 	fi
+	gcc_cv_gld_version=`$gcc_cv_ld_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+	if test x$gcc_cv_gld_version != x; then true; else
 	for f in $gcc_cv_ld_bfd_srcdir/configure $gcc_cv_ld_gld_srcdir/configure $gcc_cv_ld_gld_srcdir/configure.in $gcc_cv_ld_gld_srcdir/Makefile.in
 	do
 		gcc_cv_gld_version=`sed -n -e 's/^[ 	]*\(VERSION=[0-9]*\.[0-9]*.*\)/\1/p' < $f`
 		if test x$gcc_cv_gld_version != x; then
 			break
 		fi
-	done
+	done; fi
 	gcc_cv_gld_major_version=`expr "$gcc_cv_gld_version" : "VERSION=\([0-9]*\)"`
 	gcc_cv_gld_minor_version=`expr "$gcc_cv_gld_version" : "VERSION=[0-9]*\.\([0-9]*\)"`
 else
@@ -25346,8 +25350,8 @@ if test "${gcc_cv_as_aix_ref+set}" = set; then :
 else
   gcc_cv_as_aix_ref=no
 if test $in_tree_gas = yes; then
-if test $gcc_cv_gas_vers -ge `expr \( \( 2.21.0 \* 1000 \) + gcc_cv_as_aix_ref=yes \) \* 1000 + `
-  then :
+if test $gcc_cv_gas_vers -ge `expr \( \( 2 \* 1000 \) + 21 \) \* 1000 + 0`
+  then gcc_cv_as_aix_ref=yes
 fi
   elif test x$gcc_cv_as != x; then
 $as_echo '	.csect stuff[rw]
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 6ad6d19..3013555 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2046,6 +2046,8 @@ if test "$gcc_cv_ld" = ../ld/ld-new$build_exeext \
 	elif test "$ld_is_gold" = yes; then
 	  in_tree_ld_is_elf=yes
 	fi
+	gcc_cv_gld_version=`$gcc_cv_ld_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+	if test x$gcc_cv_gld_version != x; then true; else
 	for f in $gcc_cv_ld_bfd_srcdir/configure $gcc_cv_ld_gld_srcdir/configure $gcc_cv_ld_gld_srcdir/configure.in $gcc_cv_ld_gld_srcdir/Makefile.in
 	do
 changequote(,)dnl
@@ -2053,7 +2055,7 @@ changequote(,)dnl
 		if test x$gcc_cv_gld_version != x; then
 			break
 		fi
-	done
+	done; fi
 	gcc_cv_gld_major_version=`expr "$gcc_cv_gld_version" : "VERSION=\([0-9]*\)"`
 	gcc_cv_gld_minor_version=`expr "$gcc_cv_gld_version" : "VERSION=[0-9]*\.\([0-9]*\)"`
 changequote([,])dnl
@@ -3878,7 +3880,7 @@ LCF0:
 case $target in
   *-*-aix*)
 	gcc_GAS_CHECK

Ping Re: Defining C99 predefined macros for whole translation unit

2012-10-08 Thread Joseph S. Myers

Ping.  This patch (non-C parts) is pending review.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Paolo Carlini

Hi,

Pavel Chupin wrote:

>It has been changed here:
>http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=630d52ca0a88d173f89634a5d7dd8aee07d04d80
>
>subj:"Move gthr to toplevel libgcc"

I see, thanks. Let's add Rainer in CC, see if he expected this to happen or not.

Paolo

Re: patch to fix constant math

2012-10-08 Thread Nathan Froyd

- Original Message -
> Btw, Richard's idea of conditionally placing the length field in
> rtx_def looks like overkill to me.  These days we'd merely want to
> optimize for 64bit hosts, thus unconditionally adding a 32 bit
> field to rtx_def looks ok to me (you can wrap that inside a union to
> allow both descriptive names and eventual different use - see what
> I've done to tree_base)

IMHO, unconditionally adding that field isn't "optimize for 64-bit
hosts", but "gratuitously make one of the major compiler data
structures bigger on 32-bit hosts".  Not everybody can cross-compile
from a 64-bit host.  And even those people who can don't necessarily
want to.  Please try to consider what's best for all the people who
use GCC, not just the cases you happen to be working with every day.
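To make the size argument concrete, here is a small sketch of the trade-off being discussed: a 32-bit field wrapped in a union, placed after a 32-bit header and before a pointer. The structs are hypothetical and much simpler than the real rtx_def; they only show that the extra field tends to land in existing padding when pointers are 8-byte aligned, and costs four bytes when they are not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical node: 32 bits of header followed by a pointer-sized payload.
struct NodeBase {
  std::uint32_t header;
  void *operand;
};

// Same node with an extra 32-bit field wrapped in a union, so the slot can
// carry a descriptive name now and serve a different use later.
struct NodeWithLength {
  std::uint32_t header;
  union {
    std::uint32_t length;
    std::uint32_t other_use;
  } u;
  void *operand;
};

// Number of bytes the extra field adds on the host compiling this code.
std::size_t extra_bytes_for_length_field() {
  return sizeof(NodeWithLength) - sizeof(NodeBase);
}
```

On a typical LP64 host the function returns 0, while on a typical ILP32 host it returns 4, which is the cost being objected to.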

-Nathan

[testsuite] Require tls_runtime in gcc.target/i386/pr54445-1.c

2012-10-08 Thread Rainer Orth

gcc.target/i386/pr54445-1.c FAILs to execute on Solaris 9 with native TLS:

ld.so.1: pr54445-1.exe: fatal: pr54445-1.exe: object requires TLS, but TLS failed to initialize

The following patch fixes this by both requiring TLS runtime support and
adding the necessary options.

Tested with the appropriate runtest invocation in i386-pc-solaris2.9 and
x86_64-unknown-linux-gnu, installed on mainline.

Rainer


2012-10-08  Rainer Orth  

* gcc.target/i386/pr54445-1.c: Require tls_runtime, add tls options.

# HG changeset patch
# Parent 67ccd7a114e0eaf13cdb8c6d8f109c8fdfb86a96
Require tls_runtime in gcc.target/i386/pr54445-1.c

diff --git a/gcc/testsuite/gcc.target/i386/pr54445-1.c b/gcc/testsuite/gcc.target/i386/pr54445-1.c
--- a/gcc/testsuite/gcc.target/i386/pr54445-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr54445-1.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
+/* { dg-do run { target tls_runtime } } */
 /* { dg-options "-O2" } */
+/* { dg-add-options tls } */
 
 __thread unsigned char tls_array[64];
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: patch to fix constant math

2012-10-08 Thread Robert Dewar


On 10/8/2012 11:01 AM, Nathan Froyd wrote:
> - Original Message -
>> Btw, Richard's idea of conditionally placing the length field in
>> rtx_def looks like overkill to me.  These days we'd merely want to
>> optimize for 64bit hosts, thus unconditionally adding a 32 bit
>> field to rtx_def looks ok to me (you can wrap that inside a union to
>> allow both descriptive names and eventual different use - see what
>> I've done to tree_base)
>
> IMHO, unconditionally adding that field isn't "optimize for 64-bit
> hosts", but "gratuitously make one of the major compiler data
> structures bigger on 32-bit hosts".  Not everybody can cross-compile
> from a 64-bit host.  And even those people who can don't necessarily
> want to.  Please try to consider what's best for all the people who
> use GCC, not just the cases you happen to be working with every day.


I think that's reasonable in general, but as time goes on, and every
$300 laptop is 64-bit capable, one should not go TOO far out of the
way trying to make sure we can compile everything on a 32-bit machine.
After all, we don't try to ensure we can compile on a 16-bit machine,
though when I helped write the Realia COBOL compiler, it was a major
consideration that we had to be able to compile arbitrarily large
programs on a 32-bit machine with one megabyte of memory. That was
achieved at the time, but is hardly relevant now!

Re: [PATCH] Fix inclusion of cxxabi_forced.h in dynamic_bitset

2012-10-08 Thread Joe Seymour

On 10/06/12 01:50, Paolo Carlini wrote:
> On 10/06/2012 02:33 AM, Joe Seymour wrote:
>> I'm seeing tr2/headers/all.cc fail in the libstdc++ testsuite:
>>
>> In file included from
>> src/gcc-mainline/libstdc++-v3/testsuite/tr2/headers/all.cc:22:0:
>> /scratch/jseymour/mainline/i686-pc-linux-gnu/install/opt/codesourcery/include/c++/4.8.0/tr2/dynamic_bitset:42:27:
>> fatal error: cxxabi_forced.h: No such file or directory
>>   #include <cxxabi_forced.h>
>> ^
>> compilation terminated.
>>
>>
>>  From libstdc++-v3/libsupc++/Makefile.am:
>>> bits_HEADERS = \
>>> atomic_lockfree_defines.h cxxabi_forced.h \
>>> exception_defines.h exception_ptr.h hash_bytes.h nested_exception.h
>> Looking at how other headers in that list are treated, I believe it is the
>> include of cxxabi_forced.h in dynamic_bitset at fault. This patch corrects 
>> it.
> I'm pretty sure you are right. Any idea why the test isn't failing for 
> anybody else?

I was surprised not to find any other references to this failure as well,
especially as I observed the failure with pristine FSF sources. I've had a
closer look:

* We (CodeSourcery/Mentor) test the installation directory, with something like:

g++ -D_GLIBCXX_ASSERT -fmessage-length=0  -DLOCALEDIR="."
-I/scratch/jseymour/mainline/i686-pc-linux-gnu/src/gcc-mainline/libstdc++-v3/testsuite/util
\
/scratch/jseymour/mainline/i686-pc-linux-gnu/src/gcc-mainline/libstdc++-v3/testsuite/tr2/headers/all.cc
  -std=gnu++0x -S  -o all.s

* The standard "make check" invocation tests the objdir/srcdir with a longer
command, passing various paths etc, in particular:

-I/scratch/jseymour/mainline/i686-pc-linux-gnu/src/gcc-mainline/libstdc++-v3/libsupc++

Because all the headers in libstdc++-v3 are in that directory cxxabi_forced.h is
found successfully. It is the Makefile that places it in a different directory
during installation.

I suppose to get this test working correctly, we need to move the files listed
in bits_HEADERS into a bits/ directory in the source tree, then make appropriate
changes to cater for the adjusted directory layout.

Joe

Third ping: Re: Add a configure option to disable system header canonicalizations (issue6495088)

2012-10-08 Thread Simon Baldwin

Ping, again.

On 1 October 2012 16:56, Simon Baldwin  wrote:
>
> Ping, again.
>
>
> On 21 September 2012 12:45, Simon Baldwin  wrote:
> >
> > Ping.
> >
> > http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00459.html
> >
> > Full text of previous message and context at URL above.  No comments
> > or code changes since.  Patch description left below for convenience.
> >
> > >
> > > Add flags to disable system header canonicalizations.
> > >
> > > Libcpp may canonicalize system header paths with lrealpath() for
> > > diagnostics, dependency output, and similar.  If gcc is held in a
> > > symlink farm the canonicalized paths may be meaningless to users, and
> > > will also conflict with build frameworks that (for example) disallow
> > > absolute paths to header files.
> > >
> > > This change adds -f[no-]canonical-system-headers to the gcc command
> > > line, and a configure option --[en/dis]able-canonical-system-headers
> > > to set default behaviour, allowing the user to select whether or not
> > > to implement r186991.  Default is enabled.  See also PR c++/52974.
> > >
> > > Tested for regressions with bootstrap builds of C and C++, both with
> > > and without configure --disable-canonical-system-headers.

--
Google UK Limited | Registered Office: Belgrave House, 76 Buckingham
Palace Road, London SW1W 9TQ | Registered in England Number: 3977902

[C++] Omit overflow check for new char[n]

2012-10-08 Thread Florian Weimer

If the size of the inner array elements is 1 and we do not need a 
cookie, we do not need to insert an overflow check.  This applies to the 
relatively frequent new char[n] case.
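The reasoning can be sketched as a plain function: the requested byte count is n * element_size + cookie, and an overflow check is only needed when that arithmetic actually happens. This is a simplified model, not the code in build_new_1; array_alloc_size is a hypothetical name, and the real front end uses a tighter limit than the full range of size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Returns the number of bytes to request for n elements, or size_t(-1)
// when the computation would overflow (so that the allocator fails).
std::size_t array_alloc_size(std::size_t n, std::size_t elt_size,
                             std::size_t cookie) {
  // Element size 1 and no cookie: the result is n itself, no multiplication
  // or addition is performed, hence nothing can overflow and no check is needed.
  if (elt_size == 1 && cookie == 0)
    return n;

  const std::size_t max = std::numeric_limits<std::size_t>::max();
  // Otherwise guard the multiply-and-add before performing it.
  if (elt_size != 0 && n > (max - cookie) / elt_size)
    return max;
  return n * elt_size + cookie;
}
```

In this model a huge count passes straight through for the new char[n] case, while the same count with two-byte elements is turned into size_t(-1).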


Built and regression-tested on x86_64-redhat-linux-gnu.  Okay for trunk?

--
Florian Weimer / Red Hat Product Security Team

gcc/:

2012-10-08  Florian Weimer  

	* init.c (build_new_1): Do not check for arithmetic overflow if
	inner array size is 1.

gcc/testsuite/:

2012-10-08  Florian Weimer  

	* g++.dg/init/new40.C: New.

Index: gcc/cp/ChangeLog
===
--- gcc/cp/ChangeLog	(revision 192206)
+++ gcc/cp/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2012-10-08  Florian Weimer  
+
+	* init.c (build_new_1): Do not check for arithmetic overflow if
+	inner array size is 1.
+
 2012-10-08  Dodji Seketeli  
 
 	PR c++/53528 C++11 attribute support
Index: gcc/cp/init.c
===
--- gcc/cp/init.c	(revision 192206)
+++ gcc/cp/init.c	(working copy)
@@ -2184,6 +2184,8 @@
   bool outer_nelts_from_type = false;
   double_int inner_nelts_count = double_int_one;
   tree alloc_call, alloc_expr;
+  /* Size of the inner array elements. */
+  double_int inner_size;
   /* The address returned by the call to "operator new".  This node is
  a VAR_DECL and is therefore reusable.  */
   tree alloc_node;
@@ -2345,8 +2347,6 @@
   double_int max_size
 	= double_int_one.llshift (TYPE_PRECISION (sizetype) - 1,
   HOST_BITS_PER_DOUBLE_INT);
-  /* Size of the inner array elements. */
-  double_int inner_size;
   /* Maximum number of outer elements which can be allocated. */
   double_int max_outer_nelts;
   tree max_outer_nelts_tree;
@@ -2450,7 +2450,13 @@
 	  if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
 	size = size_binop (PLUS_EXPR, size, cookie_size);
 	  else
-	cookie_size = NULL_TREE;
+	{
+	  cookie_size = NULL_TREE;
+	  /* No size arithmetic necessary, so the size check is
+		 not needed. */
+	  if (outer_nelts_check != NULL && inner_size == double_int_one)
+		outer_nelts_check = NULL_TREE;
+	}
 	  /* Perform the overflow check.  */
 	  if (outer_nelts_check != NULL_TREE)
 size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
@@ -2486,7 +2492,13 @@
 	  /* Use a global operator new.  */
 	  /* See if a cookie might be required.  */
 	  if (!(array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type)))
-	cookie_size = NULL_TREE;
+	{
+	  cookie_size = NULL_TREE;
+	  /* No size arithmetic necessary, so the size check is
+		 not needed. */
+	  if (outer_nelts_check != NULL && inner_size == double_int_one)
+		outer_nelts_check = NULL_TREE;
+	}
 
 	  alloc_call = build_operator_new_call (fnname, placement,
 		&size, &cookie_size,
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(revision 192206)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2012-10-08  Florian Weimer  
+
+	* g++.dg/init/new40.C: New.
+
 2012-10-08  Oleg Endo  
 
 	PR target/54685
Index: gcc/testsuite/g++.dg/init/new40.C
===
--- gcc/testsuite/g++.dg/init/new40.C	(revision 0)
+++ gcc/testsuite/g++.dg/init/new40.C	(working copy)
@@ -0,0 +1,112 @@
+// Testcase for overflow handling in operator new[].
+// Optimization of unnecessary overflow checks.
+// { dg-do run }
+
+#include 
+#include 
+#include 
+
+static size_t magic_allocation_size
+  = 1 + (size_t (1) << (sizeof (size_t) * 8 - 1));
+
+struct exc : std::bad_alloc {
+};
+
+static size_t expected_size;
+
+struct pod_with_new {
+  char ch;
+  void *operator new[] (size_t sz)
+  {
+if (sz != expected_size)
+  abort ();
+throw exc ();
+  }
+};
+
+struct with_new {
+  char ch;
+  with_new () { }
+  ~with_new () { }
+  void *operator new[] (size_t sz)
+  {
+if (sz != size_t (-1))
+  abort ();
+throw exc ();
+  }
+};
+
+struct non_pod {
+  char ch;
+  non_pod () { }
+  ~non_pod () { }
+};
+
+void *
+operator new (size_t sz) _GLIBCXX_THROW (std::bad_alloc)
+{
+  if (sz != expected_size)
+abort ();
+  throw exc ();
+}
+
+int
+main ()
+{
+  if (sizeof (pod_with_new) == 1)
+expected_size = magic_allocation_size;
+  else
+expected_size = -1;
+
+  try {
+new pod_with_new[magic_allocation_size];
+abort ();
+  } catch (exc &) {
+  }
+
+  if (sizeof (with_new) == 1)
+expected_size = magic_allocation_size;
+  else
+expected_size = -1;
+
+  try {
+new with_new[magic_allocation_size];
+abort ();
+  } catch (exc &) {
+  }
+
+  expected_size = magic_allocation_size;
+  try {
+new char[magic_allocation_size];
+abort ();
+  } catch (exc &) {
+  }
+
+  expected_size = -1;
+
+  try {
+new pod_with_new[magic_allocation_size][2];
+abort ();
+  } catch (exc &) {
+  }
+
+  try {
+new with_new[magic_allocation_size][2];
+

Re: [C++] Mixed scalar-vector operations

2012-10-08 Thread Marc Glisse


On Fri, 5 Oct 2012, Jason Merrill wrote:

>> +   error_at (loc, "conversion of scalar to vector "
>> +  "involves truncation");
>
> These errors should print the types involved.  They also need to be
> suppressed when !(complain & tf_error).

Hello,

here is a new version of the patch. Differences with the previous one
should only be comments, testsuite, printing types and inhibiting error
messages.


Passes bootstrap+testsuite. scal-to-vec1.c was failing but then Joseph 
showed me the \[^\\n\]* trick and I retested with:

make check-gcc 'RUNTESTFLAGS=dg.exp=scal-to-vec1.c'
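For context, this is the kind of user code the patch is meant to accept in C++: a binary operation between a scalar and a vector, where the scalar is broadcast to every lane. The sketch assumes a compiler that implements the GNU vector extension with mixed scalar-vector operands in C++ (GCC with this patch applied, or a compatible compiler); v2si, scalar_over_vector and shift_all are hypothetical names.

```cpp
#include <cassert>

// GNU vector extension: two ints packed into 8 bytes.
typedef int v2si __attribute__((vector_size(8)));

// The scalar operand is converted to a vector with the value in every lane,
// then the division is performed lane by lane.
v2si scalar_over_vector(int scalar, v2si v) { return scalar / v; }

// Shifts accept a scalar count applied to every lane.
v2si shift_all(v2si v, int count) { return v << count; }
```

A scalar that cannot be represented in the element type without truncation is the case the quoted error message is meant to diagnose; the int-with-int example above avoids it.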

2012-09-22  Marc Glisse  

PR c++/54427

c/
* c-typeck.c: Include c-common.h.
(enum stv_conv): Moved to c-common.h.
(scalar_to_vector): Moved to c-common.c.
(build_binary_op): Adapt to scalar_to_vector's new prototype.
* Make-lang.in: c-typeck.c depends on c-common.h.

c-family/
* c-common.c (scalar_to_vector): Moved from c-typeck.c. Support
more operations. Make error messages optional.
* c-common.h (enum stv_conv): Moved from c-typeck.c.
(scalar_to_vector): Declare.

cp/
* typeck.c (cp_build_binary_op): Handle mixed scalar-vector
operations.
[LSHIFT_EXPR, RSHIFT_EXPR]: Likewise.

gcc/
* fold-const.c (fold_binary_loc): Use build_zero_cst instead of
build_int_cst for a potential vector.

testsuite/
* c-c++-common/vector-scalar.c: New testcase.
* g++.dg/ext/vector18.C: New testcase.
* g++.dg/ext/vector5.C: This is not an error anymore.
* gcc.dg/init-vec-1.c: Move ...
* c-c++-common/init-vec-1.c: ... here. Adapt error message.
* gcc.c-torture/execute/vector-shift1.c: Move ...
* c-c++-common/torture/vector-shift1.c: ... here.
* gcc.dg/scal-to-vec1.c: Move ...
* c-c++-common/scal-to-vec1.c: ... here. Avoid narrowing for
C++11. Adapt error messages.
* gcc.dg/convert-vec-1.c: Move ...
* c-c++-common/convert-vec-1.c: ... here.
* gcc.dg/scal-to-vec2.c: Move ...
* c-c++-common/scal-to-vec2.c: ... here.



--
Marc Glisse
Index: testsuite/g++.dg/ext/vector18.C
===
--- testsuite/g++.dg/ext/vector18.C (revision 0)
+++ testsuite/g++.dg/ext/vector18.C (revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++11" } */
+
+typedef signed char __attribute__((vector_size(128) )) vec;
+
+template 
+auto f (A *a, B b) -> decltype (*a + b);
+
+void f (...) {}
+
+void g (vec *v, long long l)
+{
+  f (v, l);
+}

Property changes on: testsuite/g++.dg/ext/vector18.C
___
Added: svn:eol-style
   + native
Added: svn:keywords
   + Author Date Id Revision URL

Index: testsuite/g++.dg/ext/vector5.C
===
--- testsuite/g++.dg/ext/vector5.C  (revision 192153)
+++ testsuite/g++.dg/ext/vector5.C  (working copy)
@@ -1,8 +1,8 @@
 // PR c++/30022
 // { dg-do compile }
 
 void foo()
 {
   int __attribute__((vector_size(8))) v;
-  v = 1/v;  // { dg-error "invalid operands of types" }
+  v = 1/v;
 }
Index: testsuite/c-c++-common/init-vec-1.c
===
--- testsuite/c-c++-common/init-vec-1.c (revision 191610)
+++ testsuite/c-c++-common/init-vec-1.c (working copy)
@@ -1,4 +1,4 @@
 /* Don't ICE or emit spurious errors when init a vector with a scalar.  */
 /* { dg-do compile } */
 typedef float v2sf __attribute__ ((vector_size (8)));
-v2sf a = 0.0;  /* { dg-error "incompatible types" } */
+v2sf a = 0.0;  /* { dg-error "incompatible types|cannot convert" } */
Index: testsuite/c-c++-common/torture/vector-shift1.c
===
--- testsuite/c-c++-common/torture/vector-shift1.c  (revision 191610)
+++ testsuite/c-c++-common/torture/vector-shift1.c  (working copy)
@@ -1,10 +1,11 @@
+/* { dg-do run } */
 #define vector __attribute__((vector_size(8*sizeof(short))))
 
 int main (int argc, char *argv[]) {
   vector short v0 = {argc,2,3,4,5,6,7};
   vector short v1 = {2,2,2,2,2,2,2};
   vector short r1,r2,r3,r4;
   int i = 8;
 
   r1 = v0 << 1;
   r2 = v0 >> 1;
Index: testsuite/c-c++-common/scal-to-vec1.c
===
--- testsuite/c-c++-common/scal-to-vec1.c   (revision 191610)
+++ testsuite/c-c++-common/scal-to-vec1.c   (working copy)
@@ -6,38 +6,38 @@
 __attribute__((vector_size((elcount)*sizeof(type)))) type
 
 #define vidx(type, vec, idx) (*((type *) &(vec) + idx))
 
 
 extern float sfl;
 extern int   sint;
 extern long long sll;
 
 int main (int argc, char *argv[]) {
-vector(8, short) v0 = {argc, 1,2,3,4,5,6,7};
+vector(8, short) v0 = {(short)argc, 1,2,3,4,5,6,7};
 vector(8, short) v1;
 
 vector(4, floa

[PATCH] Fix up vt_add_function_parameter (PR debug/54831)

2012-10-08 Thread Marek Polacek

As the testcase shows, we ICEd when generating debug info for C++ with
-fno-split-wide-types, i.e. when not splitting types into multiple registers.
The issue is that vt_add_function_parameter assumed that the DECL_RTL
expression was a pseudo register that got spilled to memory.  When that
does not hold, it is better to just give up than to ICE.
Regtested/bootstrapped on x86_64, ok for trunk?

2012-10-08  Marek Polacek  

PR debug/54831
* var-tracking.c (vt_add_function_parameter): Use condition in place
of gcc_assert.

* testsuite/g++.dg/debug/pr54831.C: New test.

--- gcc/testsuite/g++.dg/debug/pr54831.C.mp 2012-10-08 12:14:55.790807737 +0200
+++ gcc/testsuite/g++.dg/debug/pr54831.C 2012-10-08 12:51:53.856042257 +0200
@@ -0,0 +1,20 @@
+// PR debug/54831
+// { dg-do compile }
+// { dg-options "-O -fno-split-wide-types -g" }
+
+struct S
+{
+  int m1();
+  int m2();
+};
+
+typedef void (S::*mptr) ();
+
+mptr gmp;
+void bar (mptr f);
+
+void foo (mptr f)
+{
+  f = gmp;
+  bar (f);
+}
--- gcc/var-tracking.c.mp   2012-10-08 10:56:32.354556352 +0200
+++ gcc/var-tracking.c  2012-10-08 12:50:09.627307344 +0200
@@ -9404,12 +9404,13 @@ vt_add_function_parameter (tree parm)
 
   if (parm != decl)
 {
-  /* Assume that DECL_RTL was a pseudo that got spilled to
-memory.  The spill slot sharing code will force the
+  /* If that DECL_RTL wasn't a pseudo that got spilled to
+memory, bail out.  The spill slot sharing code will force the
 memory to reference spill_slot_decl (%sfp), so we don't
 match above.  That's ok, the pseudo must have referenced
 the entire parameter, so just reset OFFSET.  */
-  gcc_assert (decl == get_spill_slot_decl (false));
+  if (decl != get_spill_slot_decl (false))
+return;
   offset = 0;
 }
 
Marek

Re: [i386] recognize haddpd

2012-10-08 Thread Uros Bizjak

On Mon, Oct 8, 2012 at 4:40 PM, Marc Glisse  wrote:
> On Fri, 28 Sep 2012, Uros Bizjak wrote:
>
> 2) {v[0]-v[1], v[0]-v[1]} is not recognized as a hsubpd because
> vec_duplicate doesn't match vec_concat. Do we really need to duplicate
> (no
> pun intended) the pattern?
>>
>>
>> You can add this transformation to simplify-rtx.c. Probably vec_concat
>> with two equal operands can be canonicalized as vec_duplicate.
>
>
> Actually, it is replacing vec_duplicate with vec_concat that would help.
> Well, I'll see about that later.
>
> Here is what I came up with, trying to follow your other advice (thanks a
> lot!).
>
> Passes bootstrap+testsuite.
>
> 2012-10-08  Marc Glisse  
>
> gcc/
> PR target/54400
> * config/i386/i386.md (type attribute): Add sseadd1.
> (unit attribute): Add support for sseadd1.
> * config/i386/sse.md (sse3_hv2df3): split into...
> (sse3_haddv2df3): ... expander.
> (*sse3_haddv2df3): ... define_insn. Accept permuted operands.
> (sse3_hsubv2df3): ... define_insn.
> (*sse3_haddv2df3_low): New define_insn.
> (*sse3_hsubv2df3_low): New define_insn.
>
> gcc/testsuite/
> PR target/54400
>
> * gcc.target/i386/pr54400.c: New testcase.
>
> --
> Marc Glisse
>
> Index: gcc/testsuite/gcc.target/i386/pr54400.c
> ===
> --- gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
> +++ gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
> +
> +#include 
> +
> +double f (__m128d p)
> +{
> +  return p[0] - p[1];
> +}
> +
> +double g1 (__m128d p)
> +{
> +  return p[0] + p[1];
> +}
> +
> +double g2 (__m128d p)
> +{
> +  return p[1] + p[0];
> +}
> +
> +__m128d h (__m128d p, __m128d q)
> +{
> +  __m128d r = { p[0] - p[1], q[0] - q[1] };
> +  return r;
> +}
> +
> +__m128d i1 (__m128d p, __m128d q)
> +{
> +  __m128d r = { p[0] + p[1], q[0] + q[1] };
> +  return r;
> +}
> +
> +__m128d i2 (__m128d p, __m128d q)
> +{
> +  __m128d r = { p[0] + p[1], q[1] + q[0] };
> +  return r;
> +}
> +
> +__m128d i3 (__m128d p, __m128d q)
> +{
> +  __m128d r = { p[1] + p[0], q[0] + q[1] };
> +  return r;
> +}
> +
> +__m128d i4 (__m128d p, __m128d q)
> +{
> +  __m128d r = { p[1] + p[0], q[1] + q[0] };
> +  return r;
> +}
> +
> +/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
> +/* { dg-final { scan-assembler-times "haddpd" 6 } } */
> +/* { dg-final { scan-assembler-not "unpck" } } */
>
> Property changes on: gcc/testsuite/gcc.target/i386/pr54400.c
> ___
> Added: svn:keywords
>+ Author Date Id Revision URL
> Added: svn:eol-style
>+ native
>
> Index: gcc/config/i386/i386.md
> ===
> --- gcc/config/i386/i386.md (revision 192206)
> +++ gcc/config/i386/i386.md (working copy)
> @@ -320,36 +320,36 @@
>  ;; provided in other attributes.
>  (define_attr "type"
>"other,multi,
> alu,alu1,negnot,imov,imovx,lea,
> incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
> icmp,test,ibr,setcc,icmov,
> push,pop,call,callv,leave,
> str,bitmanip,
> fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
> sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
> -
> sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
> -   ssemuladd,sse4arg,lwp,
> +   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
> +   ssediv,sseins,ssemuladd,sse4arg,lwp,
> mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
>(const_string "other"))
>
>  ;; Main data type used by the insn
>  (define_attr "mode"
>
> "unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF"
>(const_string "unknown"))
>
>  ;; The CPU unit operations uses.
>  (define_attr "unit" "integer,i387,sse,mmx,unknown"
>(cond [(eq_attr "type"
> "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
>(const_string "i387")
>  (eq_attr "type"
> "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
> - sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
> +
> sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
>   ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
>(const_string "sse")
>  (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
>(const_string "mmx")
>  (eq_attr "type" "other")
>(const_string "unknown")]
>  (const_string "integer")))

You missed the most important sseadd1 addition, the one that prevents
checking of operand2 when calculating "memory" attribute:

 (and (eq_attr "type"
 "!alu1,negnot,ishift1,
   imov,imovx,icmp,test,bitmanip,

Re: [PATCH] Fix up vt_add_function_parameter (PR debug/54831)

2012-10-08 Thread Jakub Jelinek

On Mon, Oct 08, 2012 at 05:58:15PM +0200, Marek Polacek wrote:
> 2012-10-08  Marek Polacek  
> 
>   PR debug/54831
>   * var-tracking.c (vt_add_function_parameter): Use condition in place
>   of gcc_assert.

Perhaps s/in place/instead/ ?

> --- gcc/var-tracking.c.mp 2012-10-08 10:56:32.354556352 +0200
> +++ gcc/var-tracking.c2012-10-08 12:50:09.627307344 +0200
> @@ -9404,12 +9404,13 @@ vt_add_function_parameter (tree parm)
>  
>if (parm != decl)
>  {
> -  /* Assume that DECL_RTL was a pseudo that got spilled to
> -  memory.  The spill slot sharing code will force the
> +  /* If that DECL_RTL wasn't a pseudo that got spilled to
> +  memory, bail out.  The spill slot sharing code will force the

I'd perhaps add s/The/Otherwise, the/ here.

>memory to reference spill_slot_decl (%sfp), so we don't
>match above.  That's ok, the pseudo must have referenced
>the entire parameter, so just reset OFFSET.  */
> -  gcc_assert (decl == get_spill_slot_decl (false));
> +  if (decl != get_spill_slot_decl (false))
> +return;
>offset = 0;
>  }

Ok with those changes.

Jakub

Re: [i386] recognize haddpd

2012-10-08 Thread Uros Bizjak

On Mon, Oct 8, 2012 at 6:08 PM, Uros Bizjak  wrote:

>> +(define_insn "*sse3_haddv2df3"
>>[(set (match_operand:V2DF 0 "register_operand" "=x,x")
>> (vec_concat:V2DF
>> - (plusminus:DF
>> + (plus:DF
>> +   (vec_select:DF
>> + (match_operand:V2DF 1 "register_operand" "0,x")
>> + (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))
>> +   (vec_select:DF
>> + (match_dup 1)
>> + (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
>> + (plus:DF
>> +   (vec_select:DF
>> + (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
>> + (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
>> +   (vec_select:DF
>> + (match_dup 2)
>> + (parallel [(match_operand:SI 6 "const_0_to_1_operand")])]
>> +  "TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
>> +   && INTVAL (operands[5]) != INTVAL (operands[6])"
>> +  "@
>> +   haddpd\t{%2, %0|%0, %2}
>> +   vhaddpd\t{%2, %1, %0|%0, %1, %2}"
>> +  [(set_attr "isa" "noavx,avx")
>> +   (set_attr "type" "sseadd")
>> +   (set_attr "prefix" "orig,vex")
>> +   (set_attr "mode" "V2DF")])
>
> Please use (match_dup 3) in place of (match_operand 5) and (match_dup
> 4) in place of (match_operand 6) predicates. These should be the same.

Oh, I was too quick with this part. The code above is OK, since we can
permute every part independently.
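
A minimal scalar sketch of why the two halves can be permuted
independently, assuming nothing about the real pattern beyond what is
quoted above.  The function hadd_model and its parameter names are
hypothetical; the four index arguments play the role of operands 3 to 6.

```c
#include <assert.h>

/* Scalar model of a horizontal add: r[0] is the sum of both elements
   of p, r[1] is the sum of both elements of q.  The index arguments
   choose which element is the first addend in each half; each pair
   must name two different elements, as the insn condition requires.  */
void
hadd_model (const double p[2], const double q[2],
            int p_first, int p_second, int q_first, int q_second,
            double r[2])
{
  assert (p_first != p_second && q_first != q_second);
  r[0] = p[p_first] + p[p_second];
  r[1] = q[q_first] + q[q_second];
}
```

Because addition of two doubles is commutative, swapping the indices in
one half does not change that half, whatever order the other half uses.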

Uros.

[patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread Steve Ellcey

The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS because
the compiler is smart enough now to see that 'c' is uninitialized and it can
insert the field 'a' into 'c' with a shift and a full store instead of an
insert because the store just overwrites uninitialized data.  I changed the
code to force the compiler to preserve the other fields of 'c' and that makes
it use the insert instruction in both big and little endian modes.
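
To illustrate the difference described above, here is a rough sketch in
portable C.  It is not the test case and not what the compiler emits;
the helper names insert_field and store_field_only are made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low WIDTH bits of VALUE into WORD at bit position POS and
   leave every other bit of WORD unchanged.  This is the work a
   bit-field insert has to do when the destination holds live data.  */
uint32_t
insert_field (uint32_t word, uint32_t value, unsigned pos, unsigned width)
{
  assert (width >= 1 && width < 32 && pos + width <= 32);
  uint32_t mask = ((UINT32_C (1) << width) - 1) << pos;
  return (word & ~mask) | ((value << pos) & mask);
}

/* When the destination holds no live data, the old bits need not be
   preserved, so a shift followed by a full store is enough.  */
uint32_t
store_field_only (uint32_t value, unsigned pos, unsigned width)
{
  assert (width >= 1 && width < 32 && pos + width <= 32);
  uint32_t mask = ((UINT32_C (1) << width) - 1) << pos;
  return (value << pos) & mask;
}
```

Passing the struct in as an argument, as the patch does, puts the
compiler in the first situation: the other fields are live and have to
survive the update.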

Tested on mips-mti-elf.

OK to checkin?

Steve Ellcey
sell...@mips.com



2012-10-08  Steve Ellcey  

* gcc.target/ext_ins.c: Modify f2 to avoid uninitialized data.


diff --git a/gcc/testsuite/gcc.target/mips/ext_ins.c 
b/gcc/testsuite/gcc.target/mips/ext_ins.c
index f0169bc..36f0f3f 100644
--- a/gcc/testsuite/gcc.target/mips/ext_ins.c
+++ b/gcc/testsuite/gcc.target/mips/ext_ins.c
@@ -18,9 +18,8 @@ NOMIPS16 unsigned int f1 (struct A a)
   return a.j;
 }
 
-NOMIPS16 void f2 (int i)
+NOMIPS16 struct A f2 (struct A a, int i)
 {
-  struct A c;
-  c.j = i;
-  func (c);
+  a.j = i;
+  return a;
 }

Re: patch to fix constant math

2012-10-08 Thread Richard Guenther

On Mon, Oct 8, 2012 at 5:01 PM, Nathan Froyd  wrote:
> - Original Message -
>> Btw, as for Richards idea of conditionally placing the length field
>> in
>> rtx_def looks like overkill to me.  These days we'd merely want to
>> optimize for 64bit hosts, thus unconditionally adding a 32 bit
>> field to rtx_def looks ok to me (you can wrap that inside a union to
>> allow both descriptive names and eventual different use - see what
>> I've done to tree_base)
>
> IMHO, unconditionally adding that field isn't "optimize for 64-bit
> hosts", but "gratuitously make one of the major compiler data
> structures bigger on 32-bit hosts".  Not everybody can cross-compile
> from a 64-bit host.  And even those people who can don't necessarily
> want to.  Please try to consider what's best for all the people who
> use GCC, not just the cases you happen to be working with every day.

The challenge would of course be to have the overhead only for a minority
of all RTX codes.  After all that 32bits are free to be used for every one.

And I would not consider RTX a 'major compiler data structure' - of course
that makes the whole issue somewhat moot ;)
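
For what it is worth, the size trade-off being argued about can be
sketched as below.  These structs are hypothetical and are NOT the real
rtx_def layout; they only assume a 32-bit header followed by
pointer-aligned data, with the extra field wrapped in a union as
suggested in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: 32 bits of header, then a pointer-aligned payload.  */
struct node_plain
{
  uint32_t header;
  void *payload;
};

/* Same node with an extra 32-bit field.  The union allows a descriptive
   name now and a different use of the same bits later.  */
struct node_with_length
{
  uint32_t header;
  union
  {
    uint32_t length;
    uint32_t other_use;
  } u;
  void *payload;
};

/* Number of bytes the extra field adds on the current host.  */
size_t
extra_bytes (void)
{
  return sizeof (struct node_with_length) - sizeof (struct node_plain);
}
```

On a typical host with 64-bit pointers the new field lands in what was
padding and extra_bytes returns 0; on a typical host with 32-bit
pointers it returns 4, which is the growth Nathan is objecting to.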

Richard.

> -Nathan

[patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Steve Ellcey

The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
level does not do whichever optimization it is that results in a bbit instead
of a bbit[01]l.  I would like to skip this test for -Os the way it already gets
skipped for -O0.

Tested on mips-mti-elf.  Ok for checkin?

Steve Ellcey
sell...@mips.com



2012-10-08  Steve Ellcey  

* gcc.target/octeon-bbit-2.c: Skip for -Os optimization level.


diff --git a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c 
b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
index 9bd8dce..7d88d68 100644
--- a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
+++ b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=octeon -mbranch-likely -fno-unroll-loops" } */
-/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os" } { "" } } */
 /* { dg-final { scan-assembler "\tbbit\[01\]\t" } } */
 /* { dg-final { scan-assembler-not "\tbbit\[01\]l\t" } } */
 /* { dg-final { scan-assembler "\tbnel\t" } } */

[lra] 3rd patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher

Hello,

This patch makes lra_constraint_insn_stack_bitmap an sbitmap. This
reduces compile time by another minute or so on gcc17 for the test
case of PR54146, and I think it's a general improvement also for less
extreme code. For cc1-i files the compile time change tends to be a
little less but that may just be noise.
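
Since the patch itself is only attached, here is a rough sketch of the
general idea: a stack of insns paired with a fixed-size dense bit
vector that records which insns are already queued.  This is not the
code from the patch; struct insn_stack, MAX_UID and the stack_*
functions are names invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MAX_UID 1024
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define N_WORDS ((MAX_UID + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical worklist: a stack of insn UIDs plus one bit per UID
   telling whether that UID is currently on the stack.  */
struct insn_stack
{
  int uids[MAX_UID];
  int length;
  unsigned long on_stack[N_WORDS];
};

void
stack_init (struct insn_stack *s)
{
  memset (s, 0, sizeof *s);
}

/* Push UID unless it is already queued.  Return true if it was pushed.
   The membership test is one index computation and one mask.  */
bool
stack_push (struct insn_stack *s, int uid)
{
  assert (uid >= 0 && uid < MAX_UID);
  unsigned long bit = 1UL << (uid % WORD_BITS);
  unsigned long *word = &s->on_stack[uid / WORD_BITS];
  if (*word & bit)
    return false;
  *word |= bit;
  s->uids[s->length++] = uid;
  return true;
}

/* Pop the most recently pushed UID and clear its membership bit.  */
int
stack_pop (struct insn_stack *s)
{
  assert (s->length > 0);
  int uid = s->uids[--s->length];
  s->on_stack[uid / WORD_BITS] &= ~(1UL << (uid % WORD_BITS));
  return uid;
}
```

The point of a dense bit vector here is that every test, set and clear
is constant time with no list walking, at the price of allocating one
bit per possible UID up front.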

Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for lra-branch?

(This is the combined patch of all changes in my check-out of the
lra-branch. The lra.c and lra-constraints.c bits are new, the rest was
posted previously and is awaiting review also.)

Ciao!
Steven

* lra-int.h (lra_constraint_insn_stack_bitmap,
lra_constraint_insn_stack): Remove.
(lra_pop_insn, lra_insn_stack_length): New prototypes.
* lra.c (lra_constraint_insn_stack_bitmap): Make static sbitmap.
(lra_constraint_insn_stack): Make static.
(lra_push_insn_1): New function.
(lra_push_insn): Rewrite using lra_push_insn_1.
(lra_push_insn_and_update_insn_regno_info): Likewise.
(lra_pop_insn, lra_insn_stack_length): New functions.
* lra_constraints.c (lra_constraints): Use new interface to
insns stack instead of manipulating in-place.
* lra-eliminations.c (add_insn_bitmap_to_set): New function.
(update_reg_eliminate): Make argument an sbitmap.  Return a bool
telling whether the input sbitmap has changed.
(lra_eliminate): Allocate and free the worklist set as an sbitmap.

* lra-lives.c (curr_point): Make non-static in lra_create_live_ranges.
(mark_pseudo_live): Take POINT argument.
(mark_pseudo_dead): Likewise.
(mark_regno_live): Likewise, and return a bool to indicate that
something changed in the dataflow sets.
(mark_regno_dead): Likewise.
(next_program_point): Renamed from incr_curr_point, and take the
current program point as a by-reference argument.
(process_bb_lives): Take the current program point as by-ref argument.
Try to only do a program point increment if this is necessary.
(remove_some_program_points_and_update_live_ranges): If no compression
can be done, don't update the live ranges.
(lra_create_live_ranges): Make curr_point local, and pass it around.
Visit blocks in topological order of the reverse CFG.

* lra-int.h (lra_assert): Define as duplicate of gcc_checking_assert.


lra-patch3.diff
Description: Binary data

Re: [PATCH] Fix up vt_add_function_parameter (PR debug/54831)

2012-10-08 Thread Marek Polacek

On Mon, Oct 08, 2012 at 06:09:41PM +0200, Jakub Jelinek wrote:
> Ok with those changes.

Thanks, this is what I've checked in:

2012-10-08  Marek Polacek  

PR debug/54831
* var-tracking.c (vt_add_function_parameter): Use condition instead
of gcc_assert.

* testsuite/g++.dg/debug/pr54831.C: New test.

--- gcc/testsuite/g++.dg/debug/pr54831.C.mp 2012-10-08 12:14:55.790807737 +0200
+++ gcc/testsuite/g++.dg/debug/pr54831.C 2012-10-08 19:20:45.771190631 +0200
@@ -0,0 +1,20 @@
+// PR debug/54831
+// { dg-do compile }
+// { dg-options "-O -fno-split-wide-types -g" }
+
+struct S
+{
+  int m1();
+  int m2();
+};
+
+typedef void (S::*mptr) ();
+
+mptr gmp;
+void bar (mptr f);
+
+void foo (mptr f)
+{
+  f = gmp;
+  bar (f);
+}
--- gcc/var-tracking.c.mp   2012-10-08 10:56:32.354556352 +0200
+++ gcc/var-tracking.c  2012-10-08 19:19:15.031950120 +0200
@@ -9404,12 +9404,13 @@ vt_add_function_parameter (tree parm)
 
   if (parm != decl)
 {
-  /* Assume that DECL_RTL was a pseudo that got spilled to
-memory.  The spill slot sharing code will force the
-memory to reference spill_slot_decl (%sfp), so we don't
-match above.  That's ok, the pseudo must have referenced
-the entire parameter, so just reset OFFSET.  */
-  gcc_assert (decl == get_spill_slot_decl (false));
+  /* If that DECL_RTL wasn't a pseudo that got spilled to
+memory, bail out.  Otherwise, the spill slot sharing code
+will force the memory to reference spill_slot_decl (%sfp),
+so we don't match above.  That's ok, the pseudo must have
+referenced the entire parameter, so just reset OFFSET.  */
+  if (decl != get_spill_slot_decl (false))
+return;
   offset = 0;
 }
 

Marek

Re: [C++] Mixed scalar-vector operations

2012-10-08 Thread Jason Merrill


OK.

Jason

Re: RFA: darwin PATCH to fix build, internal visibility

2012-10-08 Thread Jason Merrill


On 10/08/2012 08:28 AM, Dominique Dhumieres wrote:

These tests are still failing on darwin. I think that
target { ! *-*-solaris2* } { ! *-*-darwin* }
should be replaced with
target { ! { *-*-solaris2* *-*-darwin* } }


Could someone with a darwin box handy make the appropriate change?

Thanks.

Jason

Re: [patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Mike Stump

On Oct 8, 2012, at 9:21 AM, Steve Ellcey  wrote:
> The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
> level does not do whichever optimization it is that results in a bbit instead
> of a bbit[01]l.  I would like to skip this test for -Os the way it already 
> gets
> skipped for -O0.
> 
> Tested on mips-mti-elf.  Ok for checkin?

Ideally I'd like a mips expert to weigh in on this.  The issue is, is the code 
smaller with the other instruction?  If so, is there a reasonable way to obtain 
that type of win more often in the port with -Os?  Now, if you are that mips 
expert, that's fine, but, trivially you don't need my approval to check it in.  
If the code is larger, trivially, the patch is ok.  If the optimization 
generally hurt code size and can't be made to win, the patch is ok.  If always 
the same size, it would seem ok.   I just don't have the mips specific 
background to know which case this is.

Re: [C++] Mixed scalar-vector operations

2012-10-08 Thread Mike Stump

On Oct 8, 2012, at 8:53 AM, Marc Glisse  wrote:
> On Fri, 5 Oct 2012, Jason Merrill wrote:
> 
>>> +   error_at (loc, "conversion of scalar to vector "
>>> +  "involves truncation");
>> 
>> These errors should print the types involved.  They also need to be 
>> suppressed when !(complain & tf_error).
> 
> Hello,
> 
> here is a new version of the patch.

All I can say is thank you for pressing forward and not being discouraged.  In 
the end, it feels like we'll have better vector support in C++.  :-)

Re: [patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread Mike Stump

On Oct 8, 2012, at 9:16 AM, Steve Ellcey  wrote:
> The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS 
> because
> the compiler is smart enough now to see that 'c' is uninitialized and it can
> insert the field 'a' into 'c' with a shift and a full store instead of an
> insert because the store just overwrites uninitialized data.  I changed the
> code to force the compiler to preserve the other fields of 'c' and that makes
> it use the insert instruction in both big and little endian modes.
> 
> Tested on mips-mti-elf.
> 
> OK to checkin?

Ok.

Re: [patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Steve Ellcey

On Mon, 2012-10-08 at 11:09 -0700, Mike Stump wrote:
> On Oct 8, 2012, at 9:21 AM, Steve Ellcey  wrote:
> > The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
> > level does not do whichever optimization it is that results in a bbit 
> > instead
> > of a bbit[01]l.  I would like to skip this test for -Os the way it already 
> > gets
> > skipped for -O0.
> > 
> > Tested on mips-mti-elf.  Ok for checkin?
> 
> Ideally I'd like a mips expert to weigh in on this.  The issue is, is the 
> code smaller with the other instruction?
> If so, is there a reasonable way to obtain that type of win more often in the 
> port with -Os?  Now, if you are that
> mips expert, that's fine, but, trivially you don't need my approval to check 
> it in.  If the code is larger,
> trivially, the patch is ok.  If the optimization generally hurt code size and 
> can't be made to win, the patch is ok.
> If always the same size, it would seem ok.   I just don't have the mips 
> specific background to know which case this
> is.

Well, I checked -O1, -O2 and -Os.  The -Os code is smaller than -O1 but
larger than -O2.  I didn't dig deep enough to find out exactly which
optimization is causing the change in instruction usage.  Perhaps
Richard Sandiford will have an opinion on this change.

Steve Ellcey
sell...@mips.com

Re: [patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread David Daney


On 10/08/2012 11:15 AM, Mike Stump wrote:

On Oct 8, 2012, at 9:16 AM, Steve Ellcey  wrote:

The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS because
the compiler is smart enough now to see that 'c' is uninitialized and it can
insert the field 'a' into 'c' with a shift and a full store instead of an
insert because the store just overwrites uninitialized data.  I changed the
code to force the compiler to preserve the other fields of 'c' and that makes
it use the insert instruction in both big and little endian modes.

Tested on mips-mti-elf.

OK to checkin?


Ok.


I don't think this is the proper fix for this.

Use of BBIT{0,1} instructions will always be smaller than the 
alternative.  So disabling the test for -Os doesn't fix the problem the 
test is designed to find.


The real problem is that some optimizer is broken.  Instead of disabling 
the tests, can we fix the problem instead?


The goal of the testsuite should be to detect problems, not yield clean 
results.


If Richard disagrees with me, then I would defer to him.


David Daney

[google/4_7] Patch committed: backport the static prediction for short-circuit patch from trunk

2012-10-08 Thread Dehao Chen

I have backported r192215 from trunk to google-4_7:

2012-10-08  Dehao Chen  

* predict.c (predict_extra_loop_exits): Use
predict_paths_leading_to_edge to replace predict_edge_def.

Bootstrapped and passed crosstool test.

Dehao

Re: [PATCH] PR c++/53540 - using fails to be equivalent to typedef

2012-10-08 Thread Jason Merrill

Let's move the alias template case from primary_template_instantiation_p 
into alias_template_specialization_p and call the latter from the 
former.  And also call it from tsubst.


Jason

Re: [patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread David Daney

Really I meant this in reply to the  'Fix 
gcc.target/mips/octeon-bbit-2.c for -Os' thread.  Sorry for confusing 
the issue here.


I don't really have an objection to this one.

David Daney

On 10/08/2012 11:28 AM, David Daney wrote:

On 10/08/2012 11:15 AM, Mike Stump wrote:

On Oct 8, 2012, at 9:16 AM, Steve Ellcey  wrote:

The gcc.target/mips/ext_ins.c was failing in little endian mode on
MIPS because
the compiler is smart enough now to see that 'c' is uninitialized and
it can
insert the field 'a' into 'c' with a shift and a full store instead
of an
insert because the store just overwrites uninitialized data.  I
changed the
code to force the compiler to preserve the other fields of 'c' and
that makes
it use the insert instruction in both big and little endian modes.

Tested on mips-mti-elf.

OK to checkin?


Ok.


I don't think this is the proper fix for this.

Use of BBIT{0,1} instructions will always be smaller than the
alternative.  So disabling the test for -Os doesn't fix the problem the
test is designed to find.

The real problem is that some optimizer is broken.  Instead of disabling
the tests, can we fix the problem instead?

The goal of the testsuite should be to detect problems, not yield clean
results.

If Richard disagrees with me, then I would defer to him.


David Daney

New Spanish PO file for 'gcc' (version 4.7.2)

2012-10-08 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/es.po

(This file, 'gcc-4.7.2.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[C++ PATCH] Fix ICE in cp_tree_equal (PR c++/54858)

2012-10-08 Thread Jakub Jelinek

Hi!

The following testcase ICEs because cp_tree_equal doesn't handle
FIELD_DECLs (in 4.4 it was enough to have c0/d0 and c1/d1 in the testcase,
now 12 lines are needed due to introduction of a hash table).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk/4.7?

2012-10-08  Jakub Jelinek  

PR c++/54858
* tree.c (cp_tree_equal): Handle FIELD_DECL.

* g++.dg/template/crash113.C: New test.

--- gcc/cp/tree.c.jj2012-10-05 21:26:54.0 +0200
+++ gcc/cp/tree.c   2012-10-08 18:19:00.897543649 +0200
@@ -2559,6 +2559,7 @@ cp_tree_equal (tree t1, tree t2)
 
 case VAR_DECL:
 case CONST_DECL:
+case FIELD_DECL:
 case FUNCTION_DECL:
 case TEMPLATE_DECL:
 case IDENTIFIER_NODE:
--- gcc/testsuite/g++.dg/template/pr54858.C.jj  2012-10-08 18:15:55.470586784 +0200
+++ gcc/testsuite/g++.dg/template/pr54858.C 2012-10-08 18:14:28.0 +0200
@@ -0,0 +1,21 @@
+// PR c++/54858
+// { dg-do compile }
+
+template  struct A {};
+template  struct B {};
+template  struct C
+{
+  A<0> c0; B, &C::c0> d0; // { dg-error "could not convert template argument" }
+  A<0> c1; B, &C::c1> d1; // { dg-error "could not convert template argument" }
+  A<0> c2; B, &C::c2> d2; // { dg-error "could not convert template argument" }
+  A<0> c3; B, &C::c3> d3; // { dg-error "could not convert template argument" }
+  A<0> c4; B, &C::c4> d4; // { dg-error "could not convert template argument" }
+  A<0> c5; B, &C::c5> d5; // { dg-error "could not convert template argument" }
+  A<0> c6; B, &C::c6> d6; // { dg-error "could not convert template argument" }
+  A<0> c7; B, &C::c7> d7; // { dg-error "could not convert template argument" }
+  A<0> c8; B, &C::c8> d8; // { dg-error "could not convert template argument" }
+  A<0> c9; B, &C::c9> d9; // { dg-error "could not convert template argument" }
+  A<0> ca; B, &C::ca> da; // { dg-error "could not convert template argument" }
+  A<0> cb; B, &C::cb> db; // { dg-error "could not convert template argument" }
+};
+C e;

Jakub

Re: patch to fix constant math - first small patch

2012-10-08 Thread Kenneth Zadeck


is this ok to commit with this change?

kenny
On 10/05/2012 08:14 PM, Joseph S. Myers wrote:

On Fri, 5 Oct 2012, Kenneth Zadeck wrote:


+# define HOST_HALF_WIDE_INT_PRINT "h"

This may cause problems on hosts not supporting %hd (MinGW?), and there's
no real need for using "h" here given the promotion of short to int; you
can just use "" (rather than e.g. needing special handling in xm-mingw32.h
like is done for HOST_LONG_LONG_FORMAT).
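
A small sketch of the point about promotion, assuming only standard C
behaviour.  HALF_PRINT is a hypothetical stand-in for the macro under
discussion, and format_half is a name made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the macro: an empty length modifier, so
   "%" HALF_PRINT "d" is simply "%d".  */
#define HALF_PRINT ""

/* Format a short into BUF.  A short passed through a variable argument
   list is promoted to int, so plain "%d" prints it correctly and no
   "h" modifier is needed.  Returns what snprintf returns.  */
int
format_half (char *buf, size_t size, short value)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "%" HALF_PRINT "d", value);
}
```

With the empty modifier the format string stays within what every
printf implementation has to support, which avoids the host-specific
handling mentioned above.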

Re: [C++ PATCH] Fix ICE in cp_tree_equal (PR c++/54858)

2012-10-08 Thread Jason Merrill


OK.

Jason

Re: Convert more non-GTY htab_t to hash_table.

2012-10-08 Thread Mike Stump

On Oct 5, 2012, at 3:19 PM, Diego Novillo  wrote:
> On Fri, Oct 5, 2012 at 6:08 PM, Lawrence Crowl  wrote:
> 
>>> For many people the time to compile (almost) empty file is very
>>> important, we are already bad about that right now, initializing
>>> too much stuff dynamically is going to make it worse.
>> 
>> So far, we are looking at dynamic initializations that would
>> take about 10 cycles.  Even on a slow processor, a thousand
>> initializations would take a microsecond.  Our time reports don't
>> even report anything less than 5 milliseconds.
>> 
>> Is there any reason to believe that this anticipated static
>> initialization overhead is not pretty low relative to other overhead?
>> I'm thinking here of the fact that to even start, the driver launches
>> cc1[plus] which has to parse all the options created by the driver.
> 
> I agree.  I don't think this will be a real problem.

I hope you're right.  Experience tells me that the usual high-side cost of a single
dynamic initialization is 30,000,000 cycles; about 100 of them cost 1 second.
Costs on the low side are completely irrelevant.  I think the 10 cycle cost is
not the high side, but the irrelevant low side number.  If one wanted to
understand the actual cost one can take a snap of the cycle counter before the 
dynamic inits happen (or near the front of them) and take a snap of it after 
they run, and examine the difference…  A difference of 0, means, though one 
might conceive of them as dynamic inits, they are not.  And the other number is 
what it is.  A global cycle counter that free runs as a time of day counter can 
see the page faults, tlb misses and all the other hair, while per process cpu 
used counter is less useful.
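
The measurement described above could be sketched roughly like this.
It is only an illustration: it uses the C11 wall-clock call
timespec_get rather than a hardware cycle counter, and now_ns,
time_init, sample_init and sample_table are hypothetical names standing
in for whatever initialization one actually wants to measure.

```c
#include <assert.h>
#include <time.h>

/* Table filled by the sample initializer below.  */
int sample_table[1024];

/* Stand-in for some initialization work done at startup.  */
void
sample_init (void)
{
  for (int i = 0; i < 1024; i++)
    sample_table[i] = i * i;
}

/* Return the current wall-clock time in nanoseconds.  A free-running
   wall clock also sees page faults and similar stalls, which a
   per-process CPU time counter would miss.  */
long long
now_ns (void)
{
  struct timespec ts;
  int ok = timespec_get (&ts, TIME_UTC);
  assert (ok == TIME_UTC);
  return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Take a snapshot, run INIT once, take another snapshot, and return
   the difference in nanoseconds.  */
long long
time_init (void (*init) (void))
{
  long long before = now_ns ();
  init ();
  long long after = now_ns ();
  return after - before;
}
```

Calling time_init (sample_init) early in main gives one number per run;
the first call in a fresh process is the interesting one, since later
calls find the pages already mapped.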

Re: [i386] recognize haddpd

2012-10-08 Thread Marc Glisse


On Mon, 8 Oct 2012, Uros Bizjak wrote:


You missed the most important sseadd1 addition, the one that prevents
checking of operand2 when calculating "memory" attribute:

 (and (eq_attr "type"
 "!alu1,negnot,ishift1,
   imov,imovx,icmp,test,bitmanip,
   fmov,fcmp,fsgn,
   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
   sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
  (match_operand 2 "memory_operand"))

Please note "!" in the above expression.

[...]

Also note that you have to add handling of sseadd1 attribute in other
(scheduler) *.md files. Simply grep for sseadd and add ",sseadd1"
everywhere.


Thank you, it makes more sense now. The attached passed 
bootstrap+testsuite. I didn't know if I should be more precise in the 
ChangeLog, but it would make the ChangeLog as long as the patch with about 
23 entries like:

(define_insn_reservation bdver1_ssemuladd_256): Likewise

Next goal would be to further recognize some DPPD potential uses, but that 
seems harder.



2012-10-09  Marc Glisse  

gcc/
PR target/54400
* config/i386/i386.md (type attribute): Add sseadd1.
(unit attribute): Add support for sseadd1.
(memory attribute): Likewise.
* config/i386/athlon.md: Likewise.
* config/i386/core2.md: Likewise.
* config/i386/atom.md: Likewise.
* config/i386/ppro.md: Likewise.
* config/i386/bdver1.md: Likewise.
* config/i386/sse.md (sse3_hv2df3): split into...
(sse3_haddv2df3): ... expander.
(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
(sse3_hsubv2df3): ... define_insn.
(*sse3_haddv2df3_low): New define_insn.
(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
PR target/54400
* gcc.target/i386/pr54400.c: New testcase.


--
Marc Glisse
Index: testsuite/gcc.target/i386/pr54400.c
===
--- testsuite/gcc.target/i386/pr54400.c (revision 0)
+++ testsuite/gcc.target/i386/pr54400.c (revision 0)
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
+
+#include 
+
+double f (__m128d p)
+{
+  return p[0] - p[1];
+}
+
+double g1 (__m128d p)
+{
+  return p[0] + p[1];
+}
+
+double g2 (__m128d p)
+{
+  return p[1] + p[0];
+}
+
+__m128d h (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] - p[1], q[0] - q[1] };
+  return r;
+}
+
+__m128d i1 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[0] + q[1] };
+  return r;
+}
+
+__m128d i2 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[1] + q[0] };
+  return r;
+}
+
+__m128d i3 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[0] + q[1] };
+  return r;
+}
+
+__m128d i4 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[1] + q[0] };
+  return r;
+}
+
+/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
+/* { dg-final { scan-assembler-times "haddpd" 6 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */

Property changes on: testsuite/gcc.target/i386/pr54400.c
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 192214)
+++ config/i386/i386.md (working copy)
@@ -320,36 +320,36 @@
 ;; provided in other attributes.
 (define_attr "type"
   "other,multi,
alu,alu1,negnot,imov,imovx,lea,
incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
icmp,test,ibr,setcc,icmov,
push,pop,call,callv,leave,
str,bitmanip,
fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   
sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,lwp,
+   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   ssediv,sseins,ssemuladd,sse4arg,lwp,
mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
   (const_string "other"))
 
 ;; Main data type used by the insn
 (define_attr "mode"
   "unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF"
   (const_string "unknown"))
 
 ;; The CPU unit operations uses.
 (define_attr "unit" "integer,i387,sse,mmx,unknown"
   (cond [(eq_attr "type" 
"fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
   (const_string "i387")
 (eq_attr "type" 
"sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
- sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+ 
sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg")
   (const_string "sse")
 (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")

Re: patch to fix constant math - third small patch

2012-10-08 Thread Richard Sandiford

Kenneth Zadeck  writes:
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 4e0a579..b531305 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -2617,16 +2617,19 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
>   constant.  */
>if (i1 == 0
>&& (temp = single_set (i2)) != 0
> -  && (CONST_INT_P (SET_SRC (temp))
> -   || CONST_DOUBLE_AS_INT_P (SET_SRC (temp)))
> +  && CONST_SCALAR_INT_P (SET_SRC (temp))
>&& GET_CODE (PATTERN (i3)) == SET
> -  && (CONST_INT_P (SET_SRC (PATTERN (i3)))
> -   || CONST_DOUBLE_AS_INT_P (SET_SRC (PATTERN (i3))))
> +  && CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3)))
>&& reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp)))
>  {
>rtx dest = SET_DEST (PATTERN (i3));
>int offset = -1;
>int width = 0;
> +  
> +  /* There are not explicit tests to make sure that this is not a
> +  float, but there is code here that would not be correct if it
> +  was.  */
> +  gcc_assert (GET_MODE_CLASS (GET_MODE (SET_SRC (temp))) != MODE_FLOAT);

No need for this assert: CONST_SCALAR_INT_P (SET_SRC (temp)) should cover it.
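
A toy model of why the assert is redundant may help here.  It is only a sketch under two assumptions about the predicates: that a CONST_INT never carries a mode other than VOIDmode, and that CONST_DOUBLE_AS_INT_P accepts only a CONST_DOUBLE with VOIDmode.  The toy_* names are hypothetical stand-ins and are not taken from rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rtx codes and machine modes; not GCC's rtl.h.  */
enum toy_code { TOY_CONST_INT, TOY_CONST_DOUBLE, TOY_REG, TOY_NUM_CODES };
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DFmode, TOY_NUM_MODES };

struct toy_rtx { enum toy_code code; enum toy_mode mode; };

/* Assumption: a CONST_INT never carries a mode other than VOIDmode.  */
static bool toy_well_formed_p (struct toy_rtx x)
{
  return x.code != TOY_CONST_INT || x.mode == TOY_VOIDmode;
}

/* Assumption: an integer CONST_DOUBLE is the one with VOIDmode.  */
static bool toy_const_scalar_int_p (struct toy_rtx x)
{
  return x.code == TOY_CONST_INT
         || (x.code == TOY_CONST_DOUBLE && x.mode == TOY_VOIDmode);
}

/* In this model DFmode is the only floating-point mode.  */
static bool toy_float_mode_p (enum toy_mode m)
{
  return m == TOY_DFmode;
}

/* Count well-formed objects that pass the predicate yet have a float mode.  */
static int toy_count_float_scalar_ints (void)
{
  int bad = 0;
  for (int c = 0; c < TOY_NUM_CODES; c++)
    for (int m = 0; m < TOY_NUM_MODES; m++)
      {
        struct toy_rtx x = { (enum toy_code) c, (enum toy_mode) m };
        if (toy_well_formed_p (x) && toy_const_scalar_int_p (x)
            && toy_float_mode_p (x.mode))
          bad++;
      }
  return bad;
}
```

Under those assumptions no object that passes the predicate has a floating-point mode, so the extra gcc_assert can never fire.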

> @@ -1009,9 +1007,7 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, enum machine_mode memmode)
>  static rtx
>  wrap_constant (enum machine_mode mode, rtx x)
>  {
> -  if (!CONST_INT_P (x) 
> -  && GET_CODE (x) != CONST_FIXED
> -  && !CONST_DOUBLE_AS_INT_P (x))
> +  if ((!CONST_SCALAR_INT_P (x)) && GET_CODE (x) != CONST_FIXED)

Redundant brackets.

Looks good to me otherwise, thanks.

Richard

Re: [PATCH, libstdc++] Add proper OpenBSD support

2012-10-08 Thread Mark Kettenis

Jonathan,

Any further thoughts about this?  I've attached a diff that combines
my original diff with the change to use the "newlib" locale model on
OpenBSD since they probably should be committed together.

> > > On 10 September 2012 07:34, Mark Kettenis wrote:
> > >> Date: Sun, 9 Sep 2012 21:07:39 +0100
> > >> From: Jonathan Wakely 
> > >>
> > >> On 4 September 2012 20:26, Mark Kettenis wrote:
> > >> > Fixes a few testcases.  Mostly based on the existing
> > >> > NetBSD/FreeBSD/Darwin code.
> > >> >
> > >> > 2012-09-04  Mark Kettenis  
> > >> >
> > >> > * configure.host (*-*-openbsd*) Set cpu_include_dir.
> > >> > * config/os/bsd/openbsd/ctype_base.h: New file.
> > >> > * config/os/bsd/openbsd/ctype_configure_char.cc: New file.
> > >> > * config/os/bsd/openbsd/ctype_inline.h: New file.
> > >> > * config/os/bsd/openbsd/os_defines.h: New file.
> > >>
> > >> This patch is OK, thanks.  Do you want me to commit it for you?
> > >
> > > Yes please.
> > 
> > It occurs to me now that the patch changes the size of
> > ctype_base::mask, from the generic unsigned to char. I assume the
> > OpenBSD system compiler uses char? How long has that change been
> > present in the OpenBSD source tree?
> 
> Yes, the system compiler uses char and has been doing so since mid-2005.
> 
> > I'm not sure whether or not it's better to change the size of that
> > type in GCC 4.8, which would break compatibility with previous
> > versions of the FSF sources but provide compatibility with the OpenBSD
> > system compiler.  My guess would be that most people on OpenBSD are
> > using the system compiler not upstream FSF sources.
> 
> Indeed.  People either use the system compiler or install one from
> ports/packages.  Given the sorry state of OpenBSD support in the FSF
> source tree (barely buildable) I think binary compatibility with the
> system compiler is more important.

2012-10-08  Mark Kettenis  

* configure.host (*-*-openbsd*) Set cpu_include_dir.
* config/os/bsd/openbsd/ctype_base.h: New file.
* config/os/bsd/openbsd/ctype_configure_char.cc: New file.
* config/os/bsd/openbsd/ctype_inline.h: New file.
* config/os/bsd/openbsd/os_defines.h: New file.
* acinclude.m4 (GLIBCXX_ENABLE_CLOCALE): Use newlib locale model
for OpenBSD.
* configure: Regenerated.


Index: acinclude.m4
===
--- acinclude.m4(revision 192154)
+++ acinclude.m4(working copy)
@@ -1862,6 +1862,9 @@
   darwin* | freebsd*)
enable_clocale_flag=darwin
;;
+  openbsd*)
+   enable_clocale_flag=newlib
+   ;;
   *)
if test x"$with_newlib" = x"yes"; then
  enable_clocale_flag=newlib
Index: config/os/bsd/openbsd/ctype_base.h
===
--- config/os/bsd/openbsd/ctype_base.h  (revision 0)
+++ config/os/bsd/openbsd/ctype_base.h  (working copy)
@@ -0,0 +1,59 @@
+// Locale support -*- C++ -*-
+
+// Copyright (C) 2000, 2009, 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+//
+// ISO C++ 14882: 22.1  Locales
+//
+  
+// Information as gleaned from /usr/include/ctype.h on OpenBSD.
+  
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /// @brief  Base class for ctype.
+  struct ctype_base
+  {
+// Non-standard typedefs.
+typedef const short*   __to_type;
+
+// NB: Offsets into ctype::_M_table force a particular size
+// on the mask type. Because of this, we don't use an enum.
+typedef char   mask;
+
+static const mask upper= _U;
+static const mask lower= _L;
+static const mask alpha= _U | _L;
+static const mask digit= _N;
+static const mask xdigit   = _N | _X;
+static const mask space= _S;
+static const mask print= _P | _U | _L | _N | _B;
+static const mask graph= _P | _U | _L | _N;
+static co

[google/4_7] Patch committed: backport the location_block bugfix patches from trunk

2012-10-08 Thread Dehao Chen

I have backported the following patches from trunk to google-4_7:

191931, 192049, 192120, 192165

gcc:
2012-10-08  Dehao Chen  

Backport 191931, 192049, 192120, 192165 from trunk.

* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Use
LOCATION_LOCUS to compare with UNKNOWN_LOCATION.
(slpeel_tree_peel_loop_to_edge): Likewise.
* tree-vectorizer.c (vectorize_loops): Likewise.
* tree-cfg.c (move_block_to_fn): Update lexical block for phi_args.
* tree-ssa-live.c (clear_unused_block_pointer_1): Look at
DECL_DEBUG_EXPR again.
* gimple-low.c (lower_stmt): Set the block for call args.

testsuite:
2012-10-08  Dehao Chen  

Backport r192049 from trunk:

* gcc.dg/pr54782.c: New test.

Bootstrapped and passed crosstool tests.

Dehao

Re: [PATCH, libstdc++] Add proper OpenBSD support

2012-10-08 Thread Jonathan Wakely

On 8 October 2012 20:45, Mark Kettenis wrote:
> Jonathan,
>
> Any further thoughts about this?  I've attached a diff that combines
> my original diff with the change to use the "newlib" locale model on
> OpenBSD since they probably should be committed together.

Hi,

Sorry for the delay, I realised over the weekend this never went in.

I'm happy to apply the combined diff if you think using newlib is the
right option for OpenBSD.

Jonathan

Re: patch to fix constant math

2012-10-08 Thread Richard Sandiford

Robert Dewar  writes:
> On 10/8/2012 11:01 AM, Nathan Froyd wrote:
>> - Original Message -
>>> Btw, Richard's idea of conditionally placing the length field in
>>> rtx_def looks like overkill to me.  These days we'd merely want to
>>> optimize for 64bit hosts, thus unconditionally adding a 32 bit
>>> field to rtx_def looks ok to me (you can wrap that inside a union to
>>> allow both descriptive names and eventual different use - see what
>>> I've done to tree_base)
>>
>> IMHO, unconditionally adding that field isn't "optimize for 64-bit
>> hosts", but "gratuitously make one of the major compiler data
>> structures bigger on 32-bit hosts".  Not everybody can cross-compile
>> from a 64-bit host.  And even those people who can don't necessarily
>> want to.  Please try to consider what's best for all the people who
>> use GCC, not just the cases you happen to be working with every day.
>
> I think that's reasonable in general, but as time goes on, and every
> $300 laptop is 64-bit capable, one should not go TOO far out of the
> way trying to make sure we can compile everything on a 32-bit machine.

It's not 64-bit machine vs. 32-bit machine.  It's an LP64 ABI vs.
an ILP32 ABI.  HJ & co. have put considerable effort into developing
the x32 ABI for x86_64 precisely because ILP32 is still useful for
64-bit machines.  Just as it was for MIPS when SGI invented n32
(which is still useful now).  I believe 64-bit SPARC has a similar
thing, and no doubt other architectures do too.

After all, there shouldn't be much need for more than 2GB of virtual
address space in an AVR cross compiler.  So why pay the cache penalty
of 64-bit pointers and longs (GCC generally tries to avoid using "long"
directly) when a 32-bit pointer will do?

Many years ago, I moved the HOST_WIDE_INT fields out of rtunion
and into the main rtx_def union because it produced a significant
speed-up on n32 IRIX.  That was before tree-level optimisation,
but I don't think we've really pruned that much RTL optimisation
since then, so I'd be surprised if much has changed.

Richard
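
To make the size argument concrete, here is a minimal layout sketch.  It assumes a 32-bit header followed by a payload with pointer-sized alignment; the struct names are hypothetical and the real rtx_def layout is more involved.  It shows that an extra 32-bit field fits into existing padding when the payload needs 8-byte alignment, but grows the object when the payload needs only 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32 bits of code/mode/flags, a stand-in for the start of an rtx.  */
struct toy_header { uint16_t code; uint8_t mode; uint8_t flags; };

/* LP64-like layout: the payload is 8 bytes wide and 8-byte aligned.  */
struct lp64_plain { struct toy_header h; _Alignas(8) uint64_t payload; };
struct lp64_extra { struct toy_header h; uint32_t length;
                    _Alignas(8) uint64_t payload; };

/* ILP32-like layout: the payload is 4 bytes wide and 4-byte aligned.  */
struct ilp32_plain { struct toy_header h; uint32_t payload; };
struct ilp32_extra { struct toy_header h; uint32_t length; uint32_t payload; };

/* Bytes added by the extra 32-bit field in each model.  */
static size_t toy_growth_lp64 (void)
{
  return sizeof (struct lp64_extra) - sizeof (struct lp64_plain);
}

static size_t toy_growth_ilp32 (void)
{
  return sizeof (struct ilp32_extra) - sizeof (struct ilp32_plain);
}
```

In this model the field is free on the LP64-like layout and costs four bytes per object on the ILP32-like one, which is the trade-off being argued about above.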

Re: [i386] recognize haddpd

2012-10-08 Thread Uros Bizjak

On Mon, Oct 8, 2012 at 9:36 PM, Marc Glisse  wrote:
> On Mon, 8 Oct 2012, Uros Bizjak wrote:
>
>> You missed the most important sseadd1 addition, the one that prevents
>> checking of operand2 when calculating "memory" attribute:
>>
>>  (and (eq_attr "type"
>>  "!alu1,negnot,ishift1,
>>imov,imovx,icmp,test,bitmanip,
>>fmov,fcmp,fsgn,
>>sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
>>sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
>>   (match_operand 2 "memory_operand"))
>>
>> Please note "!" in the above expression.
>
> [...]
>
>> Also note that you have to add handling of sseadd1 attribute in other
>> (scheduler) *.md files. Simply grep for sseadd and add ",sseadd1"
>> everywhere.
>
>
> Thank you, it makes more sense now. The attached passed bootstrap+testsuite.
> I didn't know if I should be more precise in the ChangeLog, but it would
> make the ChangeLog as long as the patch with about 23 entries like:
> (define_insn_reservation bdver1_ssemuladd_256): Likewise
>
> Next goal would be to further recognize some DPPD potential uses, but that
> seems harder.
>
>
> 2012-10-09  Marc Glisse  
>
>
> gcc/
> PR target/54400
> * config/i386/i386.md (type attribute): Add sseadd1.
> (unit attribute): Add support for sseadd1.
> (memory attribute): Likewise.
> * config/i386/athlon.md: Likewise.
> * config/i386/core2.md: Likewise.
> * config/i386/atom.md: Likewise.
> * config/i386/ppro.md: Likewise.
> * config/i386/bdver1.md: Likewise.
>
> * config/i386/sse.md (sse3_hv2df3): split into...
> (sse3_haddv2df3): ... expander.
> (*sse3_haddv2df3): ... define_insn. Accept permuted operands.
> (sse3_hsubv2df3): ... define_insn.
> (*sse3_haddv2df3_low): New define_insn.
> (*sse3_hsubv2df3_low): New define_insn.
>
> gcc/testsuite/
> PR target/54400
> * gcc.target/i386/pr54400.c: New testcase.

OK for mainline SVN with a couple of small changes below ...

> +(define_insn "*sse3_haddv2df3"
>[(set (match_operand:V2DF 0 "register_operand" "=x,x")
> (vec_concat:V2DF
> - (plusminus:DF
> + (plus:DF
> +   (vec_select:DF
> + (match_operand:V2DF 1 "register_operand" "0,x")
> + (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))
> +   (vec_select:DF
> + (match_dup 1)
> + (parallel [(match_operand:SI 4 "const_0_to_1_operand")])))
> + (plus:DF
> +   (vec_select:DF
> + (match_operand:V2DF 2 "nonimmediate_operand" "xm,xm")
> + (parallel [(match_operand:SI 5 "const_0_to_1_operand")]))
> +   (vec_select:DF
> + (match_dup 2)
> + (parallel [(match_operand:SI 6 "const_0_to_1_operand")])))))]
> +  "TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
> +   && INTVAL (operands[5]) != INTVAL (operands[6])"

Please put every && expression on its own line:

"TARGET_SSE3
  && INTVAL (operands[3]) != INTVAL (operands[4])
  && INTVAL (operands[5]) != INTVAL (operands[6])"
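
As a sanity check on the "Accept permuted operands" condition, the following sketch models the pattern in plain C.  It assumes the usual haddpd semantics (low result is the sum of both elements of the first operand, high result is the sum of both elements of the second); the v2df struct and the toy_* helpers are hypothetical and only mirror the shape of the RTL.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { double e[2]; } v2df;

/* Reference semantics assumed for haddpd.  */
static v2df toy_haddpd (v2df a, v2df b)
{
  v2df r = { { a.e[0] + a.e[1], b.e[0] + b.e[1] } };
  return r;
}

/* Value computed by the RTL shape for selector constants i3..i6 (0 or 1).  */
static v2df toy_pattern_eval (v2df a, v2df b, int i3, int i4, int i5, int i6)
{
  v2df r = { { a.e[i3] + a.e[i4], b.e[i5] + b.e[i6] } };
  return r;
}

/* The insn condition from the patch, minus the TARGET_SSE3 test.  */
static bool toy_pattern_cond (int i3, int i4, int i5, int i6)
{
  return i3 != i4 && i5 != i6;
}

/* True if every selector choice accepted by the condition matches haddpd.  */
static bool toy_condition_sound_p (v2df a, v2df b)
{
  v2df ref = toy_haddpd (a, b);
  for (int i3 = 0; i3 < 2; i3++)
    for (int i4 = 0; i4 < 2; i4++)
      for (int i5 = 0; i5 < 2; i5++)
        for (int i6 = 0; i6 < 2; i6++)
          if (toy_pattern_cond (i3, i4, i5, i6))
            {
              v2df r = toy_pattern_eval (a, b, i3, i4, i5, i6);
              if (r.e[0] != ref.e[0] || r.e[1] != ref.e[1])
                return false;
            }
  return true;
}
```

Because addition is commutative, selecting elements (0,1) or (1,0) yields the same sum, so only the "indices differ" test is needed.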

> +(define_insn "*sse3_haddv2df3_low"
> +  [(set (match_operand:DF 0 "register_operand" "=x,x")
> +   (plus:DF
> + (vec_select:DF
> +   (match_operand:V2DF 1 "register_operand" "0,x")
> +   (parallel [(match_operand:SI 2 "const_0_to_1_operand")]))
> + (vec_select:DF
> +   (match_dup 1)
> +   (parallel [(match_operand:SI 3 "const_0_to_1_operand")]))))]
> +  "TARGET_SSE3 && INTVAL (operands[2]) != INTVAL (operands[3])"

Also here.

Thanks,
Uros.

Small cleanup/memory leak plugs for lto

2012-10-08 Thread Tobias Burnus


Some more issues found by Coverity scanner.

lto-cgraph.c: The code seems to be unused; besides, it's a zero-trip
loop, as parm_num is set to 0 and then checked for nonzeroness.
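
For readers unfamiliar with the term, a zero-trip loop looks roughly like the sketch below.  This is a hypothetical reduction, not the code from lto-cgraph.c; only the variable name parm_num comes from the description above.

```c
#include <assert.h>

/* Hypothetical reduction of the pattern; not the code from lto-cgraph.c.  */
static int toy_count_trips (void)
{
  int parm_num = 0;
  int trips = 0;

  /* parm_num is 0 on entry, so the condition fails before the first
     iteration and the body is dead code.  */
  for (; parm_num; parm_num--)
    trips++;

  return trips;
}
```

A static analyser can see that the loop condition is false on entry, which is presumably what the scanner reported.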


lto-opts: The check whether first_p is non-NULL is always false: all
calls pass a variable reference as the argument, and first_p is
unconditionally dereferenced.


lto_obj_file_open: One could additionally check whether "lo" is NULL, but
that check would have to come directly after the XCNEW, as lto_file_init
already dereferences "lo".


Built and regtested on x86-64-gnu-linux.

Tobias


patch.diff
Description: application/unknown

Re: [patch][lra] Improve initial program point density in lra-lives.c (RFA)

2012-10-08 Thread Vladimir Makarov


On 10/07/2012 02:52 PM, Steven Bosscher wrote:

On Sat, Oct 6, 2012 at 4:52 AM, Vladimir Makarov wrote:

Without this patch:
Compressing live ranges: from 700458 to 391665 - 55%, pre_count
40730653, post_count 34363983
max per-reg pre_count 12978 (228090, 2 defs, 2 uses) (reg/f:DI 228090
[ SR.25009 ])
max per-reg post_count 10967 (228090, 2 defs, 2 uses) (reg/f:DI 228090
[ SR.25009 ])

With this patch:
Compressing live ranges: from 700458 to 372585 - 53%, pre_count
283937, post_count 271120
max per-reg pre_count 545 (230653, 542 defs, 542 uses) (reg/f:DI
230653 [ SR.13303 ])
max per-reg post_count 544 (230649, 542 defs, 542 uses) (reg/f:DI
230649 [ SR.13305 ])

(the per-reg counts are the lengths of the live range chains for the
mentioned regno).

Yes, that is impressive.  But I think, #points in a live range is a real
parameter of the complexity.

Yes, that's probably true, except for the compression stuff.

Here's the final patch, bootstrapped&tested on
x86_64-unknown-linux-gnu. OK for the LRA-branch?



Yes.  Thanks, Steven.

Optimizing live ranges is real fun.  I guess there is some potential
to improve the functions for checking live range intersection and for
merging live ranges.
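
For illustration, an intersection check over two live range chains can be sketched as a merge-style walk.  This is a simplified model and not the code in lra-lives.c: it assumes each chain is an array sorted by start point with no overlaps inside a chain, and the toy_range type and function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A live range covering program points start..finish, inclusive.  */
struct toy_range { int start; int finish; };

/* Return true if any range in A shares a program point with any range
   in B.  Both arrays must be sorted by start and free of overlaps
   within themselves.  */
static bool toy_ranges_intersect_p (const struct toy_range *a, size_t na,
                                    const struct toy_range *b, size_t nb)
{
  size_t i = 0, j = 0;

  while (i < na && j < nb)
    {
      if (a[i].finish < b[j].start)
        i++;            /* a[i] ends before b[j] and all later b ranges.  */
      else if (b[j].finish < a[i].start)
        j++;            /* b[j] ends before a[i] and all later a ranges.  */
      else
        return true;    /* The two ranges share at least one point.  */
    }
  return false;
}
```

The walk is linear in the combined chain length, which is one reason the per-reg counts quoted above matter: shorter chains make every such check cheaper.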
