Re: [PATCH][0/n] tree LIM TLC - series part for backporting, limit LIM

2013-03-18 Thread Richard Biener
On Fri, 15 Mar 2013, Richard Biener wrote:

> On Thu, 14 Mar 2013, Richard Biener wrote:
> 
> > 
> > This extracts pieces from the already posted patch series that are
> > most worthwhile and applicable for backporting to both 4.8 and 4.7.
> > It also re-implements the limiting of the maximum number of memory
> > references to consider for LIM's dependence analysis.  This limiting
> > is now done per loop-nest and disables optimizing outer loops
> > only.  The limiting requires backporting introduction of the
> > shared unanalyzable mem-ref - it works by marking that as stored
> > in loops we do not want to compute dependences for - which makes
> > dependence computation for mems in those loops linear, as that
> > mem-ref, which conveniently has ID 0, is tested first.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > The current limit of 1000 datarefs is quite low (well, for LIM's
> > purposes, that is), and I only bothered to care about -O1 for
> > backports (no caching of the affine combination).  With the
> > limit in place and at -O1 LIM now takes
> > 
> >  tree loop invariant motion:   0.55 ( 1%) usr
> > 
> > for the testcase in PR39326.  Four patches in total; we might
> > consider not backporting the limiting.  Without it this
> > insane testcase has, at ~2GB memory usage (peak determined by IRA)
> > 
> >  tree loop invariant motion: 533.30 (77%) usr
> > 
> > but avoids running into the DSE / combine issue (and thus stays
> > manageable overall at -O1).  With limiting it requires -fno-dse
> > to not blow up (>5GB of memory use).
> 
> Note that the limiting patch (below) causes code-generation differences
> because it collects memory-references in a different order and
> store-motion applies its transform in order of mem-ref IDs
> (different order of loads / stores and different decl UIDs).  The
> different ordering results in quite a big speedup because bitmaps
> have a more regular form (maybe only for this testcase though).
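For illustration, a minimal sketch of the limiting scheme described above
(not the actual tree-ssa-loop-im.c code; the field and helper names are
only assumed to match):

  /* Sketch: REFS_STORED_IN_LOOP is the bitmap of mem-ref IDs stored in
     the loop, REF the reference being queried for independence.  */

  static bool
  ref_indep_loop_p_sketch (bitmap refs_stored_in_loop, mem_ref_p ref)
  {
    unsigned i;
    bitmap_iterator bi;

    EXECUTE_IF_SET_IN_BITMAP (refs_stored_in_loop, 0, i, bi)
      {
        /* The shared unanalyzable mem-ref has ID 0 and is therefore
           visited first; if the loop was marked as storing to it we give
           up before walking any other reference, keeping the check
           linear for loops we do not want to analyze.  */
        if (i == 0)
          return false;
        if (!refs_independent_p (ref, memory_accesses.refs_list[i]))
          return false;
      }
    return true;
  }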

I have now committed the first two patches to trunk as r196768.

Richard.

2013-03-18  Richard Biener  

PR tree-optimization/39326
* tree-ssa-loop-im.c (refs_independent_p): Exploit symmetry.
(struct mem_ref): Replace mem member with ao_ref typed member.
(MEM_ANALYZABLE): Adjust.
(memref_eq): Likewise.
(mem_ref_alloc): Likewise.
(gather_mem_refs_stmt): Likewise.
(mem_refs_may_alias_p): Use the ao_ref to query the alias oracle.
(execute_sm_if_changed_flag_set): Adjust.
(execute_sm): Likewise.
(ref_always_accessed_p): Likewise.
(refs_independent_p): Likewise.
(can_sm_ref_p): Likewise.



Re: [PATCH][RFC] Fix PR56113 more

2013-03-18 Thread Richard Biener
On Fri, 8 Feb 2013, Richard Biener wrote:

> On Fri, 1 Feb 2013, Richard Biener wrote:
> 
> > On Fri, 1 Feb 2013, Jakub Jelinek wrote:
> > 
> > > On Fri, Feb 01, 2013 at 10:00:00AM +0100, Richard Biener wrote:
> > > > 
> > > > This reduces compile-time of the testcase in PR56113 (with n = 4)
> > > from 575s to 353s.  It does so by replacing the quadratic algorithm
> > > used to impose an order on visiting dominator sons during a domwalk.
> > > > 
> > > > Steven raises the issue that there exist domwalk users that modify
> > > > the CFG during the walk and thus the new scheme does not work
> > > > (at least optimally, as the current quadratic scheme does).  As
> > > > we are using a fixed-size sbitmap to track visited blocks existing
> > > > domwalks cannot add new blocks to the CFG so the worst thing that
> > > > can happen is that the order of dominator sons is no longer
> > > > optimal (I suppose with the "right" CFG manipulations even the
> > > > domwalk itself does not work - so I'd be hesitant to try to support
> > > > such domwalk users) - back to the state before any ordering
> > > > was imposed on the dom children visits (see rev 159100).
> > > 
> > > I think it would be desirable to first analyze the failures Steven saw, if
> > > any.  As you said, asan doesn't use domwalk, so it is a mystery to me.
> > 
> > Yeah.  Now, fortunately domwalk.h is only directly included and thus
> > the set of optimizers using it are
> > 
> > compare-elim.c:#include "domwalk.h"
> > domwalk.c:#include "domwalk.h"
> > fwprop.c:#include "domwalk.h"
> > gimple-ssa-strength-reduction.c:#include "domwalk.h"
> > graphite-sese-to-poly.c:#include "domwalk.h"
> > tree-into-ssa.c:#include "domwalk.h"
> > tree-ssa-dom.c:#include "domwalk.h"
> > tree-ssa-dse.c:#include "domwalk.h"
> > tree-ssa-loop-im.c:#include "domwalk.h"
> > tree-ssa-math-opts.c:   If we did this using domwalk.c, an efficient 
> > implementation would have
> > tree-ssa-phiopt.c:#include "domwalk.h"
> > tree-ssa-pre.c:#include "domwalk.h"
> > tree-ssa-pre.c:/* Local state for the eliminate domwalk.  */
> > tree-ssa-pre.c:   eliminate domwalk.  */
> > tree-ssa-pre.c:/* At the current point of the eliminate domwalk make OP 
> > available.  */
> > tree-ssa-pre.c:/* Perform elimination for the basic-block B during the 
> > domwalk.  */
> > tree-ssa-strlen.c:#include "domwalk.h"
> > tree-ssa-uncprop.c:#include "domwalk.h"
> > 
> > I don't see any target specific ones that do not have coverage
> > with x86_64 multilib testing (maybe compare-elim.c?  though that
> > doesn't really require a domwalk as it is only using the
> > before_dom_children hook).  That said, arbitrary CFG manipulations
> > during domwalk certainly will not preserve "domwalk" properties
> > of a domwalk.
> > 
> > Steven - can you reproduce your failures (and on which target?)
> 
> Ping.
> 
> I'm not sure what to do about this old compile-time regression.
> Apart from this known issue in domwalk.c GCC 4.8 scalability (at -O1)
> looks quite good.  I can certainly push it back to 4.9 if you think
> it's too risky to fix now.

I have committed this to trunk now, r196769.

Richard.

2013-03-18  Richard Biener  

PR middle-end/56113
* domwalk.c (bb_postorder): New global static.
(cmp_bb_postorder): New function.
(walk_dominator_tree): Replace scheme imposing an order for
visiting dominator sons by one sorting them at the time they
are pushed on the stack.
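
A minimal sketch of the new scheme (the names follow the ChangeLog; the
bodies are illustrative, not the committed domwalk.c code) - the dominator
sons are sorted once by their postorder number at the time they are pushed,
instead of being re-scanned quadratically:

  static int *bb_postorder;   /* postorder numbers, indexed by bb->index  */

  static int
  cmp_bb_postorder (const void *a, const void *b)
  {
    const basic_block bb1 = *(const basic_block *) a;
    const basic_block bb2 = *(const basic_block *) b;
    /* Sort the sons so that the one to be visited first ends up on top
       of the worklist stack.  */
    if (bb_postorder[bb1->index] == bb_postorder[bb2->index])
      return 0;
    return bb_postorder[bb1->index] > bb_postorder[bb2->index] ? -1 : 1;
  }

  /* In walk_dominator_tree the dominator sons of a block are collected
     into a local array, sorted once with
       qsort (sons, n, sizeof (basic_block), cmp_bb_postorder);
     and pushed, instead of repeatedly searching for the "next" son.  */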



Re: [PATCH] Fix ???s in find_uses_to_rename and vect_transform_loop

2013-03-18 Thread Richard Biener
On Mon, Feb 11, 2013 at 4:45 PM, Richard Biener  wrote:
>
> This fixes the compile-time sink in find_uses_to_rename, namely that we
> scan the whole function when there is nothing to do (well, apparently).
>
> -O3 bootstrap and regtest on x86_64-unknown-linux-gnu in progress,
> scheduled for stage1.

Committed as r196770.

Richard.

> Richard.
>
> 2013-02-11  Richard Biener  
>
> * tree-ssa-loop-manip.c (find_uses_to_rename): Do not scan the
> whole function when there is nothing to do.
> * tree-ssa-loop.c (pass_vectorize): Remove TODO_update_ssa.
> * tree-vectorizer.c (vectorize_loops): Update virtual and
> loop-closed SSA once.
> * tree-vect-loop.c (vect_transform_loop): Do not update SSA here.
>
> Index: gcc/tree-ssa-loop-manip.c
> ===
> *** gcc/tree-ssa-loop-manip.c   (revision 195940)
> --- gcc/tree-ssa-loop-manip.c   (working copy)
> *** find_uses_to_rename (bitmap changed_bbs,
> *** 443,463 
> unsigned index;
> bitmap_iterator bi;
>
> !   /* ??? If CHANGED_BBS is empty we rewrite the whole function -- why?  */
> !   if (changed_bbs && !bitmap_empty_p (changed_bbs))
> ! {
> !   EXECUTE_IF_SET_IN_BITMAP (changed_bbs, 0, index, bi)
> !   {
> ! find_uses_to_rename_bb (BASIC_BLOCK (index), use_blocks, need_phis);
> !   }
> ! }
> else
> ! {
> !   FOR_EACH_BB (bb)
> !   {
> ! find_uses_to_rename_bb (bb, use_blocks, need_phis);
> !   }
> ! }
>   }
>
>   /* Rewrites the program into a loop closed ssa form -- i.e. inserts extra
> --- 443,454 
> unsigned index;
> bitmap_iterator bi;
>
> !   if (changed_bbs)
> ! EXECUTE_IF_SET_IN_BITMAP (changed_bbs, 0, index, bi)
> !   find_uses_to_rename_bb (BASIC_BLOCK (index), use_blocks, need_phis);
> else
> ! FOR_EACH_BB (bb)
> !   find_uses_to_rename_bb (bb, use_blocks, need_phis);
>   }
>
>   /* Rewrites the program into a loop closed ssa form -- i.e. inserts extra
> Index: gcc/tree-ssa-loop.c
> ===
> *** gcc/tree-ssa-loop.c (revision 195940)
> --- gcc/tree-ssa-loop.c (working copy)
> *** struct gimple_opt_pass pass_vectorize =
> *** 242,249 
> 0,/* properties_provided */
> 0,/* properties_destroyed */
> 0,  /* todo_flags_start */
> !   TODO_update_ssa
> ! | TODO_ggc_collect/* todo_flags_finish */
>}
>   };
>
> --- 242,248 
> 0,/* properties_provided */
> 0,/* properties_destroyed */
> 0,  /* todo_flags_start */
> !   TODO_ggc_collect/* todo_flags_finish */
>}
>   };
>
> Index: gcc/tree-vectorizer.c
> ===
> *** gcc/tree-vectorizer.c   (revision 195940)
> --- gcc/tree-vectorizer.c   (working copy)
> *** vectorize_loops (void)
> *** 149,155 
>
> free_stmt_vec_info_vec ();
>
> !   return num_vectorized_loops > 0 ? TODO_cleanup_cfg : 0;
>   }
>
>
> --- 149,164 
>
> free_stmt_vec_info_vec ();
>
> !   if (num_vectorized_loops > 0)
> ! {
> !   /* If we vectorized any loop only virtual SSA form needs to be 
> updated.
> !???  Also while we try hard to update loop-closed SSA form we fail
> !to properly do this in some corner-cases (see PR56286).  */
> !   rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
> !   return TODO_cleanup_cfg;
> ! }
> !
> !   return 0;
>   }
>
>
> Index: gcc/tree-vect-loop.c
> ===
> *** gcc/tree-vect-loop.c(revision 195940)
> --- gcc/tree-vect-loop.c(working copy)
> *** vect_transform_loop (loop_vec_info loop_
> *** 5763,5773 
>  loop->nb_iterations_estimate = loop->nb_iterations_estimate - 
> double_int_one;
>   }
>
> -   /* The memory tags and pointers in vectorized statements need to
> -  have their SSA forms updated.  FIXME, why can't this be delayed
> -  until all the loops have been transformed?  */
> -   update_ssa (TODO_update_ssa);
> -
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, "LOOP 
> VECTORIZED.");
> if (loop->inner && dump_enabled_p ())
> --- 5763,5768 


Re: [PATCH] Fix PR3713

2013-03-18 Thread Richard Biener
On Wed, Jan 16, 2013 at 4:57 PM, Richard Biener  wrote:
>
> This fixes PR3713 by properly propagating ->has_constants in SCCVN.
> With that we are able to simplify (unsigned) Bar & 1 properly.
> Only copyprop later turns the call into a direct one though,
> so I'm testing the important fact - that Bar is inlined and eliminated
> by IPA inlining.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Unless this is somehow a regression (which I doubt) this has to
> wait for stage1 (even though it's pretty safe and at most exposes
> existing bugs in SCCVN).

Committed as r196771.

Richard.

> Richard.
>
> 2013-01-16  Richard Biener  
>
> PR tree-optimization/3713
> * tree-ssa-sccvn.c (visit_copy): Simplify.  Always propagate
> has_constants and expr.
> (stmt_has_constants): Properly valueize SSA names when deciding
> whether the stmt has constants.
>
> * g++.dg/ipa/devirt-11.C: New testcase.
>
> Index: gcc/tree-ssa-sccvn.c
> ===
> *** gcc/tree-ssa-sccvn.c(revision 195240)
> --- gcc/tree-ssa-sccvn.c(working copy)
> *** static tree valueize_expr (tree expr);
> *** 2653,2670 
>   static bool
>   visit_copy (tree lhs, tree rhs)
>   {
> -   /* Follow chains of copies to their destination.  */
> -   while (TREE_CODE (rhs) == SSA_NAME
> -&& SSA_VAL (rhs) != rhs)
> - rhs = SSA_VAL (rhs);
> -
> /* The copy may have a more interesting constant filled expression
>(we don't, since we know our RHS is just an SSA name).  */
> !   if (TREE_CODE (rhs) == SSA_NAME)
> ! {
> !   VN_INFO (lhs)->has_constants = VN_INFO (rhs)->has_constants;
> !   VN_INFO (lhs)->expr = VN_INFO (rhs)->expr;
> ! }
>
> return set_ssa_val_to (lhs, rhs);
>   }
> --- 2653,2665 
>   static bool
>   visit_copy (tree lhs, tree rhs)
>   {
> /* The copy may have a more interesting constant filled expression
>(we don't, since we know our RHS is just an SSA name).  */
> !   VN_INFO (lhs)->has_constants = VN_INFO (rhs)->has_constants;
> !   VN_INFO (lhs)->expr = VN_INFO (rhs)->expr;
> !
> !   /* And finally valueize.  */
> !   rhs = SSA_VAL (rhs);
>
> return set_ssa_val_to (lhs, rhs);
>   }
> *** expr_has_constants (tree expr)
> *** 3063,3087 
>   static bool
>   stmt_has_constants (gimple stmt)
>   {
> if (gimple_code (stmt) != GIMPLE_ASSIGN)
>   return false;
>
> switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
>   {
> ! case GIMPLE_UNARY_RHS:
> !   return is_gimple_min_invariant (gimple_assign_rhs1 (stmt));
>
>   case GIMPLE_BINARY_RHS:
> !   return (is_gimple_min_invariant (gimple_assign_rhs1 (stmt))
> ! || is_gimple_min_invariant (gimple_assign_rhs2 (stmt)));
> ! case GIMPLE_TERNARY_RHS:
> !   return (is_gimple_min_invariant (gimple_assign_rhs1 (stmt))
> ! || is_gimple_min_invariant (gimple_assign_rhs2 (stmt))
> ! || is_gimple_min_invariant (gimple_assign_rhs3 (stmt)));
>   case GIMPLE_SINGLE_RHS:
> /* Constants inside reference ops are rarely interesting, but
>  it can take a lot of looking to find them.  */
> !   return is_gimple_min_invariant (gimple_assign_rhs1 (stmt));
>   default:
> gcc_unreachable ();
>   }
> --- 3058,3095 
>   static bool
>   stmt_has_constants (gimple stmt)
>   {
> +   tree tem;
> +
> if (gimple_code (stmt) != GIMPLE_ASSIGN)
>   return false;
>
> switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
>   {
> ! case GIMPLE_TERNARY_RHS:
> !   tem = gimple_assign_rhs3 (stmt);
> !   if (TREE_CODE (tem) == SSA_NAME)
> !   tem = SSA_VAL (tem);
> !   if (is_gimple_min_invariant (tem))
> !   return true;
> !   /* Fallthru.  */
>
>   case GIMPLE_BINARY_RHS:
> !   tem = gimple_assign_rhs2 (stmt);
> !   if (TREE_CODE (tem) == SSA_NAME)
> !   tem = SSA_VAL (tem);
> !   if (is_gimple_min_invariant (tem))
> !   return true;
> !   /* Fallthru.  */
> !
>   case GIMPLE_SINGLE_RHS:
> /* Constants inside reference ops are rarely interesting, but
>  it can take a lot of looking to find them.  */
> ! case GIMPLE_UNARY_RHS:
> !   tem = gimple_assign_rhs1 (stmt);
> !   if (TREE_CODE (tem) == SSA_NAME)
> !   tem = SSA_VAL (tem);
> !   return is_gimple_min_invariant (tem);
> !
>   default:
> gcc_unreachable ();
>   }
> Index: gcc/testsuite/g++.dg/ipa/devirt-11.C
> ===
> *** gcc/testsuite/g++.dg/ipa/devirt-11.C(revision 0)
> --- gcc/testsuite/g++.dg/ipa/devirt-11.C(working copy)
> ***
> *** 0 
> --- 1,22 
> + // { dg-do compile }
> + // { dg-options "-std=c++11 -O -fdump-ipa-inline" }
> +
> + class Foo
> + {
> + public:
> +   void Bar() const
> + {
> +   

Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread Richard Biener
On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam  wrote:
> Hi,
>
>This patch is meant for google/gcc-4_7 but I want this to be
> considered for trunk when it opens again. This patch makes it easy to
> test for code coverage of multiversioned functions. Here is a
> motivating example:
>
> __attribute__((target ("default"))) int foo () { ... return 0; }
> __attribute__((target ("sse"))) int foo () { ... return 1; }
> __attribute__((target ("popcnt"))) int foo () { ... return 2; }
>
> int main ()
> {
>   return foo();
> }
>
> Let's say your test CPU supports popcnt.  A run of this program will
> invoke the popcnt version of foo (). Then, how do we test the sse
> version of foo()? To do that for the above example, we need to run
> this code on a CPU that has sse support but no popcnt support.
> Otherwise, we need to comment out the popcnt version and run this
> example. This can get painful when there are many versions. The same
> argument applies to testing  the default version of foo.
>
> So, I am introducing the ability to mock a CPU. If the CPU you are
> testing on supports sse, you should be able to test the sse version.
>
> First, I have introduced a new flag called -fmv-debug.  With this flag,
> the function version dispatcher is invoked every time a call to foo ()
> is made. Without that flag, the version dispatch happens once at
> startup time via the IFUNC mechanism.
>
> Also, with -fmv-debug, the version dispatcher uses the two new
> builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to
> check the cpu type and cpu isa.
>
> Then, I plan to add the following hooks to libgcc (in a different patch) :
>
> int set_mock_cpu_is (const char *cpu);
> int set_mock_cpu_supports (const char *isa);
> int init_mock_cpu (); // Clear the values of the mock cpu.
>
> With this support, here is how you can test for code coverage of the
> "sse" version and "default version of foo in the above example:
>
> int main ()
> {
>   // Test SSE version.
>if (__builtin_cpu_supports ("sse"))
>{
>  init_mock_cpu();
>  set_mock_cpu_supports ("sse");
>  assert (foo () == 1);
>}
>   // Test default version.
>   init_mock_cpu();
>   assert (foo () == 0);
> }
>
> Invoking a multiversioned binary several times with appropriate mock
> cpu values for the various ISAs and CPUs will give the complete code
> coverage desired. Of course, the underlying platform should be able to
> support the various features.
>
> Note that the above test will work only with -fmv-debug as the
> dispatcher must be invoked on every multiversioned call to be able to
> dynamically change the version.
>
> Multiple ISA features can be set in the mock cpu by calling
> "set_mock_cpu_supports" several times with different ISA names.
> Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
> will set the CPU type.
>
> This patch only includes the gcc changes.  I will separately prepare a
> patch for the libgcc changes. Right now, since the libgcc changes are
> not available the two new mock cpu builtins check the real CPU like
> "__builtin_cpu_is" and "__builtin_cpu_supports".
>
> Patch attached.  Please look at mv14_debug_code_coverage.C for an
> exhaustive example of testing for code coverage in the presence of
> multiple versions.
>
> Comments please.

Err.  As we are using IFUNCs, isn't it simply possible to do this in
the dynamic loader - for example by pre-loading a library
with the IFUNC relocators implemented differently?  Thus, shouldn't
we simply provide such a library as a convenience?

Thanks,
Richard.

> Thanks
> Sri


Re: Fold VEC_COND_EXPR to abs, min, max

2013-03-18 Thread Richard Biener
On Sun, Mar 17, 2013 at 4:46 PM, Marc Glisse  wrote:
> Hello,
>
> this patch adds a bit of folding to VEC_COND_EXPR so it is possible to
> generate ABS_EXPR and MAX_EXPR for vectors without relying on the
> vectorizer. I would have preferred to merge the COND_EXPR and VEC_COND_EXPR
> cases, but there are too many things that need fixing first, so I just
> copied the most relevant block. Folding from the front-end is ugly, but
> that's how the scalar case works, and they can both move to gimple folding
> together later.
>
> Bootstrap + testsuite on x86_64-linux-gnu.
>
> 2013-03-17  Marc Glisse  
>
> gcc/
> * fold-const.c (fold_cond_expr_with_comparison): Use build_zero_cst.
> VEC_COND_EXPR cannot be lvalues.
> (fold_ternary_loc) : Call
> fold_cond_expr_with_comparison.
>
> gcc/cp/
> * call.c (build_conditional_expr_1): Fold VEC_COND_EXPR.
>
> gcc/testsuite/
> * g++.dg/ext/vector21.C: New testcase.
>
> --
> Marc Glisse
> Index: gcc/testsuite/g++.dg/ext/vector21.C
> ===
> --- gcc/testsuite/g++.dg/ext/vector21.C (revision 0)
> +++ gcc/testsuite/g++.dg/ext/vector21.C (revision 0)
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-gimple" } */
> +
> +typedef int vec __attribute__ ((vector_size (4 * sizeof (int))));
> +
> +void f1 (vec *x)
> +{
> +  *x = (*x >= 0) ? *x : -*x;
> +}
> +void f2 (vec *x)
> +{
> +  *x = (0 < *x) ? *x : -*x;
> +}
> +void g1 (vec *x)
> +{
> +  *x = (*x < 0) ? -*x : *x;
> +}
> +void g2 (vec *x)
> +{
> +  *x = (0 > *x) ? -*x : *x;
> +}
> +void h (vec *x, vec *y)
> +{
> +  *x = (*x < *y) ? *y : *x;
> +}
> +void i (vec *x, vec *y)
> +{
> +  *x = (*x < *y) ? *x : *y;
> +}
> +void j (vec *x, vec *y)
> +{
> +  *x = (*x < *y) ? *x : *x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "ABS_EXPR" 4 "gimple" } } */
> +/* { dg-final { scan-tree-dump "MIN_EXPR" "gimple" } } */
> +/* { dg-final { scan-tree-dump "MAX_EXPR" "gimple" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR" "gimple" } } */
> +/* { dg-final { cleanup-tree-dump "gimple" } } */
>
> Property changes on: gcc/testsuite/g++.dg/ext/vector21.C
> ___
> Added: svn:keywords
>+ Author Date Id Revision URL
> Added: svn:eol-style
>+ native
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 196748)
> +++ gcc/fold-const.c(working copy)
> @@ -4625,21 +4625,21 @@ fold_cond_expr_with_comparison (location
>   A == 0 ? A : 0 is always 0 unless A is -0.  Note that
>   both transformations are correct when A is NaN: A != 0
>   is then true, and A == 0 is false.  */
>
>if (!HONOR_SIGNED_ZEROS (TYPE_MODE (type))
>&& integer_zerop (arg01) && integer_zerop (arg2))
>  {
>if (comp_code == NE_EXPR)
> return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type,
> arg1));
>else if (comp_code == EQ_EXPR)
> -   return build_int_cst (type, 0);
> +   return build_zero_cst (type);
>  }
>
>/* Try some transformations of A op B ? A : B.
>
>   A == B? A : Bsame as B
>   A != B? A : Bsame as A
>   A >= B? A : Bsame as max (A, B)
>   A > B?  A : Bsame as max (B, A)
>   A <= B? A : Bsame as min (A, B)
>   A < B?  A : Bsame as min (B, A)
> @@ -4662,21 +4662,22 @@ fold_cond_expr_with_comparison (location
>   expressions will be false, so all four give B.  The min()
>   and max() versions would give a NaN instead.  */
>if (!HONOR_SIGNED_ZEROS (TYPE_MODE (type))
>&& operand_equal_for_comparison_p (arg01, arg2, arg00)
>/* Avoid these transformations if the COND_EXPR may be used
>  as an lvalue in the C++ front-end.  PR c++/19199.  */
>&& (in_gimple_form
>   || (strcmp (lang_hooks.name, "GNU C++") != 0
>   && strcmp (lang_hooks.name, "GNU Objective-C++") != 0)
>   || ! maybe_lvalue_p (arg1)
> - || ! maybe_lvalue_p (arg2)))
> + || ! maybe_lvalue_p (arg2)
> + || TREE_CODE (TREE_TYPE (arg1)) == VECTOR_TYPE))

You mean that the VEC_COND_EXPRs can never be used as an lvalue in
the C++ frontend?

>  {
>tree comp_op0 = arg00;
>tree comp_op1 = arg01;
>tree comp_type = TREE_TYPE (comp_op0);
>
>/* Avoid adding NOP_EXPRs in case this is an lvalue.  */
>if (TYPE_MAIN_VARIANT (comp_type) == TYPE_MAIN_VARIANT (type))
> {
>   comp_type = type;
>   comp_op0 = arg1;
> @@ -14138,20 +14139,51 @@ fold_ternary_loc (location_t loc, enum t
>return NULL_TREE;
>
>  case VEC_COND_EXPR:
>if (TREE_CODE (arg0) == VECTOR_CST)
> {
>   if (integer_all_onesp (arg0) && !TREE_SIDE_EFFECTS (op2))
> return pedantic_non_lvalue_loc (loc, op1);
>   if (integer_zerop (arg0) && !TREE_

Re: [v3] libstdc++/55979 (+ notes about 55977)

2013-03-18 Thread Paolo Carlini

Hi,

On 03/17/2013 06:45 PM, Jonathan Wakely wrote:

On 17 March 2013 17:14, Paolo Carlini wrote:

I guess we could at least work around the problem by going back to
_M_get_Tp_allocator().construct in _M_create_node (or, better, the
allocator_traits<>::construct equivalent, per the recent fix for 56613; we
would use it on _Tp actually, everywhere) but I don't know if Jon has
already something in his tree for this batch of issues regarding our base
container class / node constructors, or we want to decouple the issue from
55977, do std::vector and std::deque, which would be trivial even for 4.8.1,
or something else. Suggestions?

For std::list I'm waiting until we have two separate C++03 and C++11
implementations, then I'll implement allocator support in the C++11
code only, as it will be much easier. ...
Ok, great. Then I'm going to apply to mainline and 4.8.1 the
straightforward std::vector and std::deque bits. The PR remains open for
the rest.


Thanks!
Paolo.

/

2013-03-18  Paolo Carlini  

PR libstdc++/55977 (partial, std::vector and std::deque bits)
* include/bits/stl_vector.h (_M_range_initialize(_InputIterator,
_InputIterator, std::input_iterator_tag)): Use emplace_back.
* include/bits/deque.tcc (_M_range_initialize(_InputIterator,
_InputIterator, std::input_iterator_tag)): Likewise.
* testsuite/23_containers/vector/cons/55977.cc: New.
* testsuite/23_containers/deque/cons/55977.cc: Likewise.
* testsuite/23_containers/vector/requirements/dr438/assign_neg.cc:
Adjust dg-error line number.
* testsuite/23_containers/vector/requirements/dr438/insert_neg.cc:
Likewise.
Index: include/bits/deque.tcc
===
--- include/bits/deque.tcc  (revision 196754)
+++ include/bits/deque.tcc  (working copy)
@@ -381,7 +381,11 @@
 __try
   {
 for (; __first != __last; ++__first)
+#if __cplusplus >= 201103L
+ emplace_back(*__first);
+#else
   push_back(*__first);
+#endif
   }
 __catch(...)
   {
Index: include/bits/stl_vector.h
===
--- include/bits/stl_vector.h   (revision 196754)
+++ include/bits/stl_vector.h   (working copy)
@@ -1184,7 +1184,11 @@
_InputIterator __last, std::input_iterator_tag)
 {
  for (; __first != __last; ++__first)
+#if __cplusplus >= 201103L
+   emplace_back(*__first);
+#else
push_back(*__first);
+#endif
}
 
   // Called by the second initialize_dispatch above
Index: testsuite/23_containers/deque/cons/55977.cc
===
--- testsuite/23_containers/deque/cons/55977.cc (revision 0)
+++ testsuite/23_containers/deque/cons/55977.cc (working copy)
@@ -0,0 +1,70 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+#include 
+
+template <typename T>
+struct MyAllocator
+{
+  std::allocator<T> base;
+  typedef T value_type;
+
+  // FIXME: these types shouldn't be required.
+  typedef T*            pointer;
+  typedef const T*      const_pointer;
+  typedef T&            reference;
+  typedef const T&      const_reference;
+  template <typename U>
+    struct rebind
+    { typedef MyAllocator<U> other; };
+
+  MyAllocator() = default;
+  template <typename U>
+  MyAllocator(const MyAllocator<U>& other) : base(other.base) {}
+  T* allocate(std::size_t n) { return base.allocate(n); }
+  void deallocate(T* p, std::size_t n) { return base.deallocate(p, n); }
+  template <typename U, typename... Args>
+  void construct(U* p, Args&&... args)
+  {
+    ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
+  }
+};
+
+struct A
+{
+private:
+  friend class MyAllocator<A>;
+  A(int value) : value(value) {}
+  int value;
+public:
+  A() : value() {}
+  int get() const { return value; }
+};
+
+void foo()
+{
+  std::deque<A, MyAllocator<A>> v1;
+  const int i = 1;
+  v1.emplace_back(i); // OK
+  std::deque<A, MyAllocator<A>> v2(std::istream_iterator<int>(), {}); // ERROR
+}
Index: testsuite/23_containers/vector/cons/55977.cc
===
--- testsuit

Re: [PATCH][1/n] Vectorizer TLC: re-organize data dependence checking

2013-03-18 Thread Richard Biener
On Wed, Feb 27, 2013 at 4:49 PM, Richard Biener  wrote:
>
> This splits data reference group analysis away from data dependence
> checking and splits the latter into loop and a BB vectorization
> functions.  This allows us to perform the now no longer quadratic
> but O(n * log (n)) data reference group analysis right after
> data reference gathering, pushing the quadratic data dependence
> testing down the vectorization analysis pipeline.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, queued for 4.9.

Committed as r196775.

Richard.

> Richard.
>
> 2013-02-27  Richard Biener  
>
> * tree-vect-data-refs.c (vect_update_interleaving_chain): Remove.
> (vect_insert_into_interleaving_chain): Likewise.
> (vect_drs_dependent_in_basic_block): Inline ...
> (vect_slp_analyze_data_ref_dependence): ... here.  New function,
> split out from ...
> (vect_analyze_data_ref_dependence): ... here.  Simplify.
> (vect_check_interleaving): Simplify.
> (vect_analyze_data_ref_dependences): Likewise.  Split out ...
> (vect_slp_analyze_data_ref_dependences): ... this new function.
> (dr_group_sort_cmp): New function.
> (vect_analyze_data_ref_accesses): Compute data-reference groups
> here instead of in vect_analyze_data_ref_dependence.  Use
> a more efficient algorithm.
> * tree-vect-slp.c (vect_slp_analyze_bb_1): Use
> vect_slp_analyze_data_ref_dependences.  Call
> vect_analyze_data_ref_accesses earlier.
> * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> * tree-vectorizer.h (vect_analyze_data_ref_dependences): Adjust.
> (vect_slp_analyze_data_ref_dependences): New prototype.
>
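The heart of the new scheme, sketched (dr_group_sort_cmp is the real
comparator per the ChangeLog; the two helpers below are made-up names
standing in for the group-building logic): sort the datarefs once and then
compare only neighbouring entries when forming interleaving groups:

  /* Sketch only - same_base_and_step_p and link_into_group are
     hypothetical helpers, not functions in tree-vect-data-refs.c.  */
  datarefs.qsort (dr_group_sort_cmp);             /* O(n log n)        */
  for (i = 1; i < datarefs.length (); ++i)        /* one linear scan   */
    {
      data_reference_p prev = datarefs[i - 1];
      data_reference_p dr = datarefs[i];
      if (same_base_and_step_p (prev, dr))        /* only neighbours   */
        link_into_group (prev, dr);               /* can share a group */
    }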
> Index: trunk/gcc/tree-vect-data-refs.c
> ===
> *** trunk.orig/gcc/tree-vect-data-refs.c2013-02-27 14:53:36.0 
> +0100
> --- trunk/gcc/tree-vect-data-refs.c 2013-02-27 16:45:58.969861004 +0100
> *** vect_get_place_in_interleaving_chain (gi
> *** 154,394 
>   }
>
>
> - /* Function vect_insert_into_interleaving_chain.
> -
> -Insert DRA into the interleaving chain of DRB according to DRA's INIT.  
> */
> -
> - static void
> - vect_insert_into_interleaving_chain (struct data_reference *dra,
> -struct data_reference *drb)
> - {
> -   gimple prev, next;
> -   tree next_init;
> -   stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra));
> -   stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb));
> -
> -   prev = GROUP_FIRST_ELEMENT (stmtinfo_b);
> -   next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (prev));
> -   while (next)
> - {
> -   next_init = DR_INIT (STMT_VINFO_DATA_REF (vinfo_for_stmt (next)));
> -   if (tree_int_cst_compare (next_init, DR_INIT (dra)) > 0)
> -   {
> - /* Insert here.  */
> - GROUP_NEXT_ELEMENT (vinfo_for_stmt (prev)) = DR_STMT (dra);
> - GROUP_NEXT_ELEMENT (stmtinfo_a) = next;
> - return;
> -   }
> -   prev = next;
> -   next = GROUP_NEXT_ELEMENT (vinfo_for_stmt (prev));
> - }
> -
> -   /* We got to the end of the list. Insert here.  */
> -   GROUP_NEXT_ELEMENT (vinfo_for_stmt (prev)) = DR_STMT (dra);
> -   GROUP_NEXT_ELEMENT (stmtinfo_a) = NULL;
> - }
> -
> -
> - /* Function vect_update_interleaving_chain.
> -
> -For two data-refs DRA and DRB that are a part of a chain interleaved data
> -accesses, update the interleaving chain.  DRB's INIT is smaller than 
> DRA's.
> -
> -There are four possible cases:
> -1. New stmts - both DRA and DRB are not a part of any chain:
> -   FIRST_DR = DRB
> -   NEXT_DR (DRB) = DRA
> -2. DRB is a part of a chain and DRA is not:
> -   no need to update FIRST_DR
> -   no need to insert DRB
> -   insert DRA according to init
> -3. DRA is a part of a chain and DRB is not:
> -   if (init of FIRST_DR > init of DRB)
> -   FIRST_DR = DRB
> - NEXT(FIRST_DR) = previous FIRST_DR
> -   else
> -   insert DRB according to its init
> -4. both DRA and DRB are in some interleaving chains:
> -   choose the chain with the smallest init of FIRST_DR
> -   insert the nodes of the second chain into the first one.  */
> -
> - static void
> - vect_update_interleaving_chain (struct data_reference *drb,
> -   struct data_reference *dra)
> - {
> -   stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra));
> -   stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb));
> -   tree next_init, init_dra_chain, init_drb_chain;
> -   gimple first_a, first_b;
> -   tree node_init;
> -   gimple node, prev, next, first_stmt;
> -
> -   /* 1. New stmts - both DRA and DRB are not a part of any chain.   */
> -   if (!GROUP_FIRST_ELEMENT (stmtinfo_a) && !GROUP_FIRST_ELEMENT 
> (stmtinfo_b))
> - {
> -   GROUP_FIRST_ELEMENT (stmtinfo_a) = DR_STMT (drb);
> 

Re: Fold VEC_COND_EXPR to abs, min, max

2013-03-18 Thread Marc Glisse

On Mon, 18 Mar 2013, Richard Biener wrote:


2013-03-17  Marc Glisse  

gcc/
* fold-const.c (fold_cond_expr_with_comparison): Use build_zero_cst.
VEC_COND_EXPR cannot be lvalues.
(fold_ternary_loc) : Call
fold_cond_expr_with_comparison.

gcc/cp/
* call.c (build_conditional_expr_1): Fold VEC_COND_EXPR.

gcc/testsuite/
* g++.dg/ext/vector21.C: New testcase.



@@ -4662,21 +4662,22 @@ fold_cond_expr_with_comparison (location
  expressions will be false, so all four give B.  The min()
  and max() versions would give a NaN instead.  */
   if (!HONOR_SIGNED_ZEROS (TYPE_MODE (type))
   && operand_equal_for_comparison_p (arg01, arg2, arg00)
   /* Avoid these transformations if the COND_EXPR may be used
 as an lvalue in the C++ front-end.  PR c++/19199.  */
   && (in_gimple_form
  || (strcmp (lang_hooks.name, "GNU C++") != 0
  && strcmp (lang_hooks.name, "GNU Objective-C++") != 0)
  || ! maybe_lvalue_p (arg1)
- || ! maybe_lvalue_p (arg2)))
+ || ! maybe_lvalue_p (arg2)
+ || TREE_CODE (TREE_TYPE (arg1)) == VECTOR_TYPE))


You mean that the VEC_COND_EXPRs can never be used as an lvalue in
the C++ frontend?


Yes, as I mention in the ChangeLog. Not just the C++ front-end, it never 
makes sense to use a VEC_COND_EXPR as an lvalue, it really is just a 
ternary variant of BIT_AND_EXPR.



Btw, instead of copying this whole block I'd prefer

case COND_EXPR:
case VEC_COND_EXPR:
... common cases...

/* ???  Fixup the code below for VEC_COND_EXRP.  */
if (code == VEC_COND_EXPR)
  break;


Makes sense, I'll rework the patch.

Thanks,

--
Marc Glisse


[PATCH] Fix PR56483

2013-03-18 Thread Richard Biener

This fixes PR56483 by properly testing for boolean values
during expansion of conditionals.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2013-03-18  Richard Biener  

PR middle-end/56483
* cfgexpand.c (expand_gimple_cond): Inline gimple_cond_single_var_p
and implement properly.
* gimple.h (gimple_cond_single_var_p): Remove.

Index: gcc/cfgexpand.c
===
*** gcc/cfgexpand.c (revision 196309)
--- gcc/cfgexpand.c (working copy)
*** expand_gimple_cond (basic_block bb, gimp
*** 1886,1894 
   be cleaned up by combine.  But some pattern matchers like if-conversion
   work better when there's only one compare, so make up for this
   here as special exception if TER would have made the same change.  */
!   if (gimple_cond_single_var_p (stmt)
!   && SA.values
&& TREE_CODE (op0) == SSA_NAME
&& bitmap_bit_p (SA.values, SSA_NAME_VERSION (op0)))
  {
gimple second = SSA_NAME_DEF_STMT (op0);
--- 1886,1899 
   be cleaned up by combine.  But some pattern matchers like if-conversion
   work better when there's only one compare, so make up for this
   here as special exception if TER would have made the same change.  */
!   if (SA.values
&& TREE_CODE (op0) == SSA_NAME
+   && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
+   && TREE_CODE (op1) == INTEGER_CST
+   && ((gimple_cond_code (stmt) == NE_EXPR
+  && integer_zerop (op1))
+ || (gimple_cond_code (stmt) == EQ_EXPR
+ && integer_onep (op1)))
&& bitmap_bit_p (SA.values, SSA_NAME_VERSION (op0)))
  {
gimple second = SSA_NAME_DEF_STMT (op0);
Index: gcc/gimple.h
===
*** gcc/gimple.h(revision 196309)
--- gcc/gimple.h(working copy)
*** gimple_cond_false_p (const_gimple gs)
*** 2747,2769 
return false;
  }
  
- /* Check if conditional statement GS is of the form 'if (var != 0)' or
-'if (var == 1)' */
- 
- static inline bool
- gimple_cond_single_var_p (gimple gs)
- {
-   if (gimple_cond_code (gs) == NE_EXPR
-   && gimple_cond_rhs (gs) == boolean_false_node)
- return true;
- 
-   if (gimple_cond_code (gs) == EQ_EXPR
-   && gimple_cond_rhs (gs) == boolean_true_node)
- return true;
- 
-   return false;
- }
- 
  /* Set the code, LHS and RHS of GIMPLE_COND STMT from CODE, LHS and RHS.  */
  
  static inline void
--- 2747,2752 


[PATCH] Handle string/character search functions in PTA / oracle

2013-03-18 Thread Richard Biener

This fixes an issue that shows up in PR56210.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2013-03-18  Richard Biener  

PR tree-optimization/56210
* tree-ssa-structalias.c (find_func_aliases_for_builtin_call):
Handle string / character search functions.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Likewise.
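
For context, a tiny C example (not from the PR) of why both PTA and the
oracle must treat the result of these builtins as pointing into, and their
execution as reading, the string argument:

  #include <string.h>

  void
  drop_extension (char *name)
  {
    char *dot = strrchr (name, '.');  /* DOT points into NAME, or is NULL.  */
    if (dot)
      /* This store writes through NAME, so e.g. a later strlen (name)
         must not be moved or CSEd across it.  */
      *dot = '\0';
  }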

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 195751)
--- gcc/tree-ssa-structalias.c  (working copy)
*** find_func_aliases_for_builtin_call (gimp
*** 4196,4201 
--- 4257,4285 
return true;
  }
break;
+   /* String / character search functions return a pointer into the
+  source string or NULL.  */
+   case BUILT_IN_INDEX:
+   case BUILT_IN_STRCHR:
+   case BUILT_IN_STRRCHR:
+   case BUILT_IN_MEMCHR:
+   case BUILT_IN_STRSTR:
+   case BUILT_IN_STRPBRK:
+   if (gimple_call_lhs (t))
+ {
+   tree src = gimple_call_arg (t, 0);
+   get_constraint_for_ptr_offset (src, NULL_TREE, &rhsc);
+   constraint_expr nul;
+   nul.var = nothing_id;
+   nul.offset = 0;
+   nul.type = ADDRESSOF;
+   rhsc.safe_push (nul);
+   get_constraint_for (gimple_call_lhs (t), &lhsc);
+   process_all_all_constraints (lhsc, rhsc);
+   lhsc.release();
+   rhsc.release();
+ }
+   return true;
/* Trampolines are special - they set up passing the static
 frame.  */
case BUILT_IN_INIT_TRAMPOLINE:
Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c(revision 195751)
--- gcc/tree-ssa-alias.c(working copy)
*** ref_maybe_used_by_call_p_1 (gimple call,
*** 1314,1319 
--- 1314,1356 
   size);
return refs_may_alias_p_1 (&dref, ref, false);
  }
+   /* These read memory pointed to by the first argument.  */
+   case BUILT_IN_INDEX:
+   case BUILT_IN_STRCHR:
+   case BUILT_IN_STRRCHR:
+ {
+   ao_ref dref;
+   ao_ref_init_from_ptr_and_size (&dref,
+  gimple_call_arg (call, 0),
+  NULL_TREE);
+   return refs_may_alias_p_1 (&dref, ref, false);
+ }
+   /* These read memory pointed to by the first argument with size
+  in the third argument.  */
+   case BUILT_IN_MEMCHR:
+ {
+   ao_ref dref;
+   ao_ref_init_from_ptr_and_size (&dref,
+  gimple_call_arg (call, 0),
+  gimple_call_arg (call, 2));
+   return refs_may_alias_p_1 (&dref, ref, false);
+ }
+   /* These read memory pointed to by the first and second arguments.  */
+   case BUILT_IN_STRSTR:
+   case BUILT_IN_STRPBRK:
+ {
+   ao_ref dref;
+   ao_ref_init_from_ptr_and_size (&dref,
+  gimple_call_arg (call, 0),
+  NULL_TREE);
+   if (refs_may_alias_p_1 (&dref, ref, false))
+ return true;
+   ao_ref_init_from_ptr_and_size (&dref,
+  gimple_call_arg (call, 1),
+  NULL_TREE);
+   return refs_may_alias_p_1 (&dref, ref, false);
+ }
+ 
/* The following builtins do not read from memory.  */
case BUILT_IN_FREE:
case BUILT_IN_MALLOC:


Commit: MN10300: Add missing comment line

2013-03-18 Thread Nick Clifton
Hi Guys

  I am applying this patch as obvious.  It adds a line missing from the
  comment describing the mn10300_get_live_callee_saved_regs function.

Cheers
  Nick

gcc/ChangeLog
2013-03-18  Nick Clifton  

* config/mn10300/mn10300.c (mn10300_get_live_callee_saved_regs):
Add missing line to comment describing function.

Index: gcc/config/mn10300/mn10300.c
===
--- gcc/config/mn10300/mn10300.c(revision 196767)
+++ gcc/config/mn10300/mn10300.c(working copy)
@@ -622,6 +622,7 @@
 
 /* Returns the set of live, callee-saved registers as a bitmask.  The
callee-saved extended registers cannot be stored individually, so
+   all of them will be included in the mask if any one of them is used.
Also returns the number of bytes in the registers in the mask if
BYTES_SAVED is not NULL.  */
 


Commit: XStormy16: Remove spurious backslash

2013-03-18 Thread Nick Clifton
Hi Guys,

  I am applying this small patch to remove a spurious backslash escape
  at the end of a line in stormy16.c.

Cheers
  Nick

gcc/ChangeLog
2013-03-18  Nick Clifton  

* config/stormy16/stormy16.c (xstormy16_expand_prologue): Remove
spurious backslash.

Index: gcc/config/stormy16/stormy16.c
===
--- gcc/config/stormy16/stormy16.c  (revision 196767)
+++ gcc/config/stormy16/stormy16.c  (working copy)
@@ -1082,7 +1082,7 @@
 gen_rtx_MEM (Pmode, 
stack_pointer_rtx),
 reg);
XVECEXP (dwarf, 0, 1) = gen_rtx_SET (Pmode, stack_pointer_rtx,
-plus_constant (Pmode, \
+plus_constant (Pmode,
stack_pointer_rtx,
GET_MODE_SIZE 
(Pmode)));
add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);


Re: Fold VEC_COND_EXPR to abs, min, max

2013-03-18 Thread Richard Biener
On Mon, Mar 18, 2013 at 11:27 AM, Marc Glisse  wrote:
> On Mon, 18 Mar 2013, Richard Biener wrote:
>
>>> 2013-03-17  Marc Glisse  
>>>
>>> gcc/
>>> * fold-const.c (fold_cond_expr_with_comparison): Use
>>> build_zero_cst.
>>> VEC_COND_EXPR cannot be lvalues.
>>> (fold_ternary_loc) : Call
>>> fold_cond_expr_with_comparison.
>>>
>>> gcc/cp/
>>> * call.c (build_conditional_expr_1): Fold VEC_COND_EXPR.
>>>
>>> gcc/testsuite/
>>> * g++.dg/ext/vector21.C: New testcase.
>
>
>>> @@ -4662,21 +4662,22 @@ fold_cond_expr_with_comparison (location
>>>   expressions will be false, so all four give B.  The min()
>>>   and max() versions would give a NaN instead.  */
>>>if (!HONOR_SIGNED_ZEROS (TYPE_MODE (type))
>>>&& operand_equal_for_comparison_p (arg01, arg2, arg00)
>>>/* Avoid these transformations if the COND_EXPR may be used
>>>  as an lvalue in the C++ front-end.  PR c++/19199.  */
>>>&& (in_gimple_form
>>>   || (strcmp (lang_hooks.name, "GNU C++") != 0
>>>   && strcmp (lang_hooks.name, "GNU Objective-C++") != 0)
>>>   || ! maybe_lvalue_p (arg1)
>>> - || ! maybe_lvalue_p (arg2)))
>>> + || ! maybe_lvalue_p (arg2)
>>> + || TREE_CODE (TREE_TYPE (arg1)) == VECTOR_TYPE))
>>
>>
>> You mean that the VEC_COND_EXPRs can never be used as an lvalue in
>> the C++ frontend?
>
>
> Yes, as I mention in the ChangeLog. Not just the C++ front-end, it never
> makes sense to use a VEC_COND_EXPR as an lvalue, it really is just a ternary
> variant of BIT_AND_EXPR.

Then please add a && TREE_CODE == COND_EXPR around the
code handling only COND_EXPRs instead.

Richard.

>
>> Btw, instead of copying this whole block I'd prefer
>>
>> case COND_EXPR:
>> case VEC_COND_EXPR:
>> ... common cases...
>>
>> /* ???  Fixup the code below for VEC_COND_EXRP.  */
>> if (code == VEC_COND_EXPR)
>>   break;
>
>
> Makes sense, I'll rework the patch.
>
> Thanks,
>
> --
> Marc Glisse


Re: Fold VEC_COND_EXPR to abs, min, max

2013-03-18 Thread Marc Glisse

On Mon, 18 Mar 2013, Richard Biener wrote:


On Mon, Mar 18, 2013 at 11:27 AM, Marc Glisse  wrote:

On Mon, 18 Mar 2013, Richard Biener wrote:

You mean that the VEC_COND_EXPRs can never be used as an lvalue in
the C++ frontend?


Yes, as I mention in the ChangeLog. Not just the C++ front-end, it never
makes sense to use a VEC_COND_EXPR as an lvalue, it really is just a ternary
variant of BIT_AND_EXPR.


Then please add a && TREE_CODE == COND_EXPR around the
code handling only COND_EXPRs instead.


Hmm, there isn't one. There is just a block of code that is disabled when 
the compiler is not certain that the result is not an lvalue. And the 
arguments it can use to prove that are:

* we are in gimple form
* we are not doing C++
* one of the alternatives is not an lvalue
* (new) it is a vec_cond_expr

Apart from changing TREE_CODE == VEC_COND_EXPR to TREE_CODE != COND_EXPR, 
I am not sure what to change.


(Looking at the patch, I may have forgotten to check for side effects 
somewhere, the tests needed are not exactly the same as for COND_EXPR 
since VEC_COND_EXPR is not lazy, I'll check that before resubmitting)


--
Marc Glisse


Re: Fold VEC_COND_EXPR to abs, min, max

2013-03-18 Thread Richard Biener
On Mon, Mar 18, 2013 at 12:06 PM, Marc Glisse  wrote:
> On Mon, 18 Mar 2013, Richard Biener wrote:
>
>> On Mon, Mar 18, 2013 at 11:27 AM, Marc Glisse 
>> wrote:
>>>
>>> On Mon, 18 Mar 2013, Richard Biener wrote:

 You mean that the VEC_COND_EXPRs can never be used as an lvalue in
 the C++ frontend?
>>>
>>>
>>> Yes, as I mention in the ChangeLog. Not just the C++ front-end, it never
>>> makes sense to use a VEC_COND_EXPR as an lvalue, it really is just a
>>> ternary
>>> variant of BIT_AND_EXPR.
>>
>>
>> Then please add a && TREE_CODE == COND_EXPR around the
>> code handling only COND_EXPRs instead.
>
>
> Hmm, there isn't one. There is just a block of code that is disabled when
> the compiler is not certain that the result is not an lvalue. And the
> arguments it can use to prove that are:
> * we are in gimple form
> * we are not doing C++
> * one of the alternatives is not an lvalue
> * (new) it is a vec_cond_expr
>
> Apart from changing TREE_CODE == VEC_COND_EXPR to TREE_CODE != COND_EXPR, I
> am not sure what to change.
>
> (Looking at the patch, I may have forgotten to check for side effects
> somewhere, the tests needed are not exactly the same as for COND_EXPR since
> VEC_COND_EXPR is not lazy, I'll check that before resubmitting)

Hmm, I see we don't even have the code available in
fold_cond_expr_with_comparison.
So use instead

  if (!HONOR_SIGNED_ZEROS (TYPE_MODE (type))
  && operand_equal_for_comparison_p (arg01, arg2, arg00)
  /* Avoid these transformations if the COND_EXPR may be used
 as an lvalue in the C++ front-end.  PR c++/19199.  */
  && (in_gimple_form
  || VECTOR_TYPE_P (type)
  || (strcmp (lang_hooks.name, "GNU C++") != 0
  && strcmp (lang_hooks.name, "GNU Objective-C++") != 0)
  || ! maybe_lvalue_p (arg1)
  || ! maybe_lvalue_p (arg2)))

err - there isn't a VECTOR_TYPE_P predicate - time to add one ;)
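
For reference, such a predicate would be a one-liner modeled on the
existing *_TYPE_P macros in tree.h (a sketch of the suggestion, not a
committed definition):

  /* Nonzero if TYPE is a vector type.  */
  #define VECTOR_TYPE_P(TYPE) (TREE_CODE (TYPE) == VECTOR_TYPE)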

Thanks,
Richard.

> --
> Marc Glisse


[PATCH] Exchange late VRP and DOM passes

2013-03-18 Thread Richard Biener

This moves VRP after late DOM.  This is because VRP has a hard
time dealing with non-copyproped (and not CSEd) IL and conveniently
DOM provides both.  I noticed this when working on PR56273 where
we miss quite some VRP opportunities because of this.
I cannot think of a good reason to have the current order, so I am
going ahead with this after SVN is back.
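
A made-up GIMPLE-style fragment (not taken from PR56273) showing the kind
of IL this reordering helps with:

  /* Before DOM the late IL may still contain plain copies:

       b_2 = a_1;
       if (b_2 > 0)            <- a range is recorded for b_2 only
         c_3 = a_1 - 1;        <- the use is a_1, so VRP knows nothing here

     After DOM copy-propagates b_2 to a_1, the guard and the use refer to
     the same SSA name and the following VRP sees a_1 >= 1 at the
     subtraction.  */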

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

While I have had the patch in my tree for quite a while, I don't
remember testing it, so the patch may get some additional
testsuite fallout changes.

Richard.

2013-03-18  Richard Biener  

PR tree-optimization/56273
* passes.c (init_optimization_passes): Move second VRP after DOM.

Index: gcc/passes.c
===
--- gcc/passes.c(revision 195938)
+++ gcc/passes.c(working copy)
@@ -1488,7 +1488,6 @@ init_optimization_passes (void)
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_cse_reciprocals);
   NEXT_PASS (pass_reassoc);
-  NEXT_PASS (pass_vrp);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_dominator);
   /* The only const/copy propagation opportunities left after
@@ -1497,6 +1496,7 @@ init_optimization_passes (void)
 only examines PHIs to discover const/copy propagation
 opportunities.  */
   NEXT_PASS (pass_phi_only_cprop);
+  NEXT_PASS (pass_vrp);
   NEXT_PASS (pass_cd_dce);
   NEXT_PASS (pass_tracer);
 



[PATCH] Vectorizer TLC

2013-03-18 Thread Richard Biener

The following is a collection of TLC sitting in my local tree,
mostly resulting in less obscure IL after vectorization.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2013-03-18  Richard Biener  

* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks):
Remove cond_expr_stmt_list argument and do not gimplify the
built expression.
(vect_loop_versioning): Adjust.
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Cleanup to use less temporaries.
(vect_create_data_ref_ptr): Cleanup.

Index: gcc/tree-vect-loop-manip.c
===
*** gcc/tree-vect-loop-manip.c.orig 2013-02-20 11:05:04.0 +0100
--- gcc/tree-vect-loop-manip.c  2013-03-18 11:38:46.633554021 +0100
*** vect_vfa_segment_size (struct data_refer
*** 2271,2290 
  
 Output:
 COND_EXPR - conditional expression.
-COND_EXPR_STMT_LIST - statements needed to construct the conditional
-  expression.
- 
  
 The returned value is the conditional expression to be used in the if
 statement that controls which version of the loop gets executed at runtime.
  */
  
  static void
! vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo,
!  tree * cond_expr,
!  gimple_seq * cond_expr_stmt_list)
  {
-   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
vec  may_alias_ddrs =
  LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
--- 2271,2284 
  
 Output:
 COND_EXPR - conditional expression.
  
 The returned value is the conditional expression to be used in the if
 statement that controls which version of the loop gets executed at runtime.
  */
  
  static void
! vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
  {
vec  may_alias_ddrs =
  LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo);
int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
*** vect_create_cond_for_alias_checks (loop_
*** 2333,2344 
  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
}
  
!   addr_base_a =
! vect_create_addr_base_for_vector_ref (stmt_a, cond_expr_stmt_list,
! NULL_TREE, loop);
!   addr_base_b =
! vect_create_addr_base_for_vector_ref (stmt_b, cond_expr_stmt_list,
! NULL_TREE, loop);
  
if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
length_factor = scalar_loop_iters;
--- 2327,2340 
  dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
}
  
!   addr_base_a
!   = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
!  size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
!  DR_INIT (dr_a)));
!   addr_base_b
!   = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
!  size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
!  DR_INIT (dr_b)));
  
if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
length_factor = scalar_loop_iters;
*** vect_loop_versioning (loop_vec_info loop
*** 2435,2442 
   &cond_expr_stmt_list);
  
if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
! vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr,
!  &cond_expr_stmt_list);
  
cond_expr = force_gimple_operand_1 (cond_expr, &gimplify_stmt_list,
  is_gimple_condexpr, NULL_TREE);
--- 2431,2437 
   &cond_expr_stmt_list);
  
if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
! vect_create_cond_for_alias_checks (loop_vinfo, &cond_expr);
  
cond_expr = force_gimple_operand_1 (cond_expr, &gimplify_stmt_list,
  is_gimple_condexpr, NULL_TREE);
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c.orig  2013-03-05 16:50:25.0 +0100
--- gcc/tree-vect-data-refs.c   2013-03-18 11:37:47.391898625 +0100
*** vect_create_addr_base_for_vector_ref (gi
*** 3556,3574 
  {
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
!   tree data_ref_base = unshare_expr (DR_BASE_ADDRESS (dr));
const char *base_name;
!   tree data_ref_base_var;
!   tree vec_stmt;
!   tree addr_base, addr_expr;
tree dest;
gimple_seq seq = NULL;
!   tree base_offset = unshare_expr (DR_OFFSET (dr));
!   tree init = unshare_expr (DR_INIT (dr));
tree vect_ptr_type;
tree step = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (

[PATCH] PTA TLC

2013-03-18 Thread Richard Biener

This is a collection of changes that cleanup PTA.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2013-03-18  Richard Biener  

* tree-ssa-structalias.c (find): Use gcc_checking_assert.
(unite): Likewise.
(merge_node_constraints): Likewise.
(build_succ_graph): Likewise.
(valid_graph_edge): Inline into single caller.
(unify_nodes): Likewise.  Use bitmap_set_bit return value
and cache varinfo.
(scc_visit): Fix formatting and variable use.
(do_sd_constraint): Use gcc_checking_assert.
(do_ds_constraint): Likewise.
(do_complex_constraint): Likewise.
(condense_visit): Likewise.  Cleanup.
(dump_pred_graph): New function.
(perform_var_substitution): Dump the pred-graph before
variable substitution.
(find_equivalent_node): Use gcc_checking_assert.
(rewrite_constraints): Guard checking loop with ENABLE_CHECKING.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c.orig 2013-03-07 16:47:53.0 +0100
--- gcc/tree-ssa-structalias.c  2013-03-18 12:41:57.475843339 +0100
*** static constraint_graph_t graph;
*** 581,587 
  static unsigned int
  find (unsigned int node)
  {
!   gcc_assert (node < graph->size);
if (graph->rep[node] != node)
  return graph->rep[node] = find (graph->rep[node]);
return node;
--- 581,587 
  static unsigned int
  find (unsigned int node)
  {
!   gcc_checking_assert (node < graph->size);
if (graph->rep[node] != node)
  return graph->rep[node] = find (graph->rep[node]);
return node;
*** find (unsigned int node)
*** 595,601 
  static bool
  unite (unsigned int to, unsigned int from)
  {
!   gcc_assert (to < graph->size && from < graph->size);
if (to != from && graph->rep[from] != to)
  {
graph->rep[from] = to;
--- 595,601 
  static bool
  unite (unsigned int to, unsigned int from)
  {
!   gcc_checking_assert (to < graph->size && from < graph->size);
if (to != from && graph->rep[from] != to)
  {
graph->rep[from] = to;
*** merge_node_constraints (constraint_graph
*** 1023,1029 
unsigned int i;
constraint_t c;
  
!   gcc_assert (find (from) == to);
  
/* Move all complex constraints from src node into to node  */
FOR_EACH_VEC_ELT (graph->complex[from], i, c)
--- 1023,1029 
unsigned int i;
constraint_t c;
  
!   gcc_checking_assert (find (from) == to);
  
/* Move all complex constraints from src node into to node  */
FOR_EACH_VEC_ELT (graph->complex[from], i, c)
*** add_graph_edge (constraint_graph_t graph
*** 1143,1158 
  }
  
  
- /* Return true if {DEST.SRC} is an existing graph edge in GRAPH.  */
- 
- static bool
- valid_graph_edge (constraint_graph_t graph, unsigned int src,
- unsigned int dest)
- {
-   return (graph->succs[dest]
- && bitmap_bit_p (graph->succs[dest], src));
- }
- 
  /* Initialize the constraint graph structure to contain SIZE nodes.  */
  
  static void
--- 1143,1148 
*** build_succ_graph (void)
*** 1319,1325 
else if (rhs.type == ADDRESSOF)
{
  /* x = &y */
! gcc_assert (find (rhs.var) == rhs.var);
  bitmap_set_bit (get_varinfo (lhsvar)->solution, rhsvar);
}
else if (lhsvar > anything_id
--- 1309,1315 
else if (rhs.type == ADDRESSOF)
{
  /* x = &y */
! gcc_checking_assert (find (rhs.var) == rhs.var);
  bitmap_set_bit (get_varinfo (lhsvar)->solution, rhsvar);
}
else if (lhsvar > anything_id
*** scc_visit (constraint_graph_t graph, str
*** 1396,1409 
  
if (!bitmap_bit_p (si->visited, w))
scc_visit (graph, si, w);
-   {
-   unsigned int t = find (w);
-   unsigned int nnode = find (n);
-   gcc_assert (nnode == n);
  
!   if (si->dfs[t] < si->dfs[nnode])
! si->dfs[n] = si->dfs[t];
!   }
  }
  
/* See if any components have been identified.  */
--- 1386,1396 
  
if (!bitmap_bit_p (si->visited, w))
scc_visit (graph, si, w);
  
!   unsigned int t = find (w);
!   gcc_checking_assert (find (n) == n);
!   if (si->dfs[t] < si->dfs[n])
!   si->dfs[n] = si->dfs[t];
  }
  
/* See if any components have been identified.  */
*** static void
*** 1458,1465 
  unify_nodes (constraint_graph_t graph, unsigned int to, unsigned int from,
 bool update_changed)
  {
  
-   gcc_assert (to != from && find (to) == to);
if (dump_file && (dump_flags & TDF_DETAILS))
  fprintf (dump_file, "Unifying %s to %s\n",
 get_varinfo (from)->name,
--- 1445,1452 
  unify_nodes (constraint_graph_t graph, unsigned int to, unsigned int from,
 bool update_changed)
  {
+   gcc_checking_a

[PATCH][ARM] Handle unordered comparison cases in NEON vcond

2013-03-18 Thread Kyrylo Tkachov
Hi all,

Given code:

#define MAX(a, b) (a > b ? a : b)
void foo (int ilast, float* w, float* w2)
{
  int i;
  for (i = 0; i < ilast; ++i)
  {
w[i] = MAX (0.0f, w2[i]);
  }
}

compiled with
-O1 -funsafe-math-optimizations -ftree-vectorize -mfpu=neon -mfloat-abi=hard
on arm-none-eabi will cause an ICE when trying to expand the vcond pattern.
Looking at the vcond pattern in neon.md, the predicate for the
comparison operator (arm_comparison_operator) uses
maybe_get_arm_condition_code, which is not needed for vcond since we
don't care about the ARM condition code
(we can handle all the comparison cases ourselves in the expander).

Changing the predicate to comparison_operator allows the expander to proceed
but it ICEs again because the pattern doesn't handle the floating point
unordered cases! (i.e. UNGT, UNORDERED, UNLE etc).

Adding support for the unordered cases is very similar to the aarch64
port change added here:
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00957.html
This patch adapts that code to the arm port.

Added the testcase that exposed the ICE initially and also the UNORDERED
and LTGT variations of it.

No regressions on arm-none-eabi.

Ok for trunk?

Thanks,
Kyrill


gcc/ChangeLog
2013-03-18  Kyrylo Tkachov  

* config/arm/iterators.md (v_cmp_result): New mode attribute.
* config/arm/neon.md (vcond): Handle unordered cases.


gcc/testsuite/ChangeLog
2013-03-18  Kyrylo Tkachov  

* gcc.target/arm/neon-vcond-gt.c: New test.
* gcc.target/arm/neon-vcond-ltgt.c: Likewise.
* gcc.target/arm/neon-vcond-unordered.c: Likewise.

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 252f18b..b3ad42b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -314,6 +314,12 @@
 (V2SF "V2SI") (V4SF  "V4SI")
 (DI   "DI")   (V2DI  "V2DI")])
 
+(define_mode_attr v_cmp_result [(V8QI "v8qi") (V16QI "v16qi")
+   (V4HI "v4hi") (V8HI  "v8hi")
+   (V2SI "v2si") (V4SI  "v4si")
+   (DI   "di")   (V2DI  "v2di")
+   (V2SF "v2si") (V4SF  "v4si")])
+
 ;; Get element type from double-width mode, for operations where we 
 ;; don't care about signedness.
 (define_mode_attr V_if_elem [(V8QI "i8")  (V16QI "i8")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 79b3f66..99fb5e8 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1721,80 +1721,144 @@
 (define_expand "vcond"
   [(set (match_operand:VDQW 0 "s_register_operand" "")
(if_then_else:VDQW
- (match_operator 3 "arm_comparison_operator"
+ (match_operator 3 "comparison_operator"
[(match_operand:VDQW 4 "s_register_operand" "")
 (match_operand:VDQW 5 "nonmemory_operand" "")])
  (match_operand:VDQW 1 "s_register_operand" "")
  (match_operand:VDQW 2 "s_register_operand" "")))]
   "TARGET_NEON && (! || flag_unsafe_math_optimizations)"
 {
-  rtx mask;
-  int inverse = 0, immediate_zero = 0;
-  /* See the description of "magic" bits in the 'T' case of
- arm_print_operand.  */
   HOST_WIDE_INT magic_word = (mode == V2SFmode || mode == V4SFmode)
 ? 3 : 1;
   rtx magic_rtx = GEN_INT (magic_word);
-  
-  mask = gen_reg_rtx (mode);
-  
-  if (operands[5] == CONST0_RTX (mode))
-immediate_zero = 1;
-  else if (!REG_P (operands[5]))
-operands[5] = force_reg (mode, operands[5]);
-  
+  int inverse = 0;
+  int swap_bsl_operands = 0;
+  rtx mask = gen_reg_rtx (mode);
+  rtx tmp = gen_reg_rtx (mode);
+
+  rtx (*base_comparison) (rtx, rtx, rtx, rtx);
+  rtx (*complimentary_comparison) (rtx, rtx, rtx, rtx);
+
   switch (GET_CODE (operands[3]))
 {
 case GE:
-  emit_insn (gen_neon_vcge (mask, operands[4], operands[5],
- magic_rtx));
+case LE:
+case EQ:
+  if (!REG_P (operands[5])
+ && (operands[5] != CONST0_RTX (mode)))
+   operands[5] = force_reg (mode, operands[5]);
   break;
-
+default:
+  if (!REG_P (operands[5]))
+   operands[5] = force_reg (mode, operands[5]);
+}
+
+  switch (GET_CODE (operands[3]))
+{
+case LT:
+case UNLT:
+  inverse = 1;
+  /* Fall through.  */
+case GE:
+case UNGE:
+case ORDERED:
+case UNORDERED:
+  base_comparison = gen_neon_vcge;
+  complimentary_comparison = gen_neon_vcgt;
+  break;
+case LE:
+case UNLE:
+  inverse = 1;
+  /* Fall through.  */
 case GT:
-  emit_insn (gen_neon_vcgt (mask, operands[4], operands[5],
- magic_rtx));
+case UNGT:
+  base_comparison = gen_neon_vcgt;
+  complimentary_comparison = gen_neon_vcge;
   break;
-
 case EQ:
-  emit_insn (gen_neon_vceq (mask, operands[4], operands[5],
- magic_rtx));
+c

[PATCH] Fix cselim ICE (PR tree-optimization/56635)

2013-03-18 Thread Jakub Jelinek
Hi!

On the attached testcase we ICE, because cselim uses
   if (!is_gimple_reg_type (TREE_TYPE (lhs))
   || !operand_equal_p (lhs, gimple_assign_lhs (else_assign), 0))
to guard an optimization, and we have two MEM_REFs where operand_equal_p
is true, but they don't have compatible types (the first one is _Complex
double, i.e. is_gimple_reg_type), the second has aggregate type of the
same size/same mode (structure containing the _Complex double).

The first patch is what I've bootstrapped/regtested on i686-linux so far
(x86_64-linux regtest still pending), it seems to trigger different return
value from operand_equal_p in a couple of TUs:
c52008b.adb
/usr/src/gcc-4.8/libstdc++-v3/src/c++98/complex_io.cc
/usr/src/gcc-4.8/lto-plugin/lto-plugin.c
/usr/src/gcc-4.8/gcc/testsuite/g++.dg/opt/pr47366.C
/usr/src/gcc-4.8/gcc/testsuite/g++.dg/torture/pr56635.C
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/actual_array_constructor_1.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/alloc_comp_assign_2.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/alloc_comp_assign_3.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/argument_checking_1.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/array_memcpy_2.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/array_temporaries_3.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/auto_char_len_3.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/char_pointer_func.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/proc_ptr_23.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/pure_byref_1.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/realloc_on_assign_17.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/reshape.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/string_ctor_1.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/unlimited_polymorphic_1.f03
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/vector_subscript_2.f90
/usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/widechar_8.f90
/usr/src/gcc-4.8/libstdc++-v3/testsuite/23_containers/unordered_map/cons/56112.cc
/usr/src/gcc-4.8/libstdc++-v3/testsuite/25_algorithms/remove_if/moveable.cc
/usr/src/gcc-4.8/libstdc++-v3/testsuite/25_algorithms/remove/moveable.cc
/usr/src/gcc-4.8/libstdc++-v3/testsuite/25_algorithms/stable_sort/49559.cc
but from quick look at some of them it didn't look like something very
undesirable.  In any case, no new FAILs.  Ok for trunk if it passes regtest
on x86_64-linux too?

The second patch is a safer version intended for 4.8.0, to be
bootstrapped/regtested afterwards.

Jakub
2013-03-18  Jakub Jelinek  

PR tree-optimization/56635
* fold-const.c (operand_equal_p): For MEM_REF and TARGET_MEM_REF,
require types_compatible_p types.

* g++.dg/torture/pr56635.C: New test.

--- gcc/fold-const.c.jj 2013-02-26 10:59:53.0 +0100
+++ gcc/fold-const.c2013-03-18 10:20:41.368932746 +0100
@@ -2572,13 +2572,14 @@ operand_equal_p (const_tree arg0, const_
  flags &= ~OEP_CONSTANT_ADDRESS_OF;
  /* Require equal access sizes, and similar pointer types.
 We can have incomplete types for array references of
-variable-sized arrays from the Fortran frontent
-though.  */
+variable-sized arrays from the Fortran frontend
+though.  Also verify the types are compatible.  */
  return ((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1))
   || (TYPE_SIZE (TREE_TYPE (arg0))
   && TYPE_SIZE (TREE_TYPE (arg1))
   && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)),
   TYPE_SIZE (TREE_TYPE (arg1)), 
flags)))
+ && types_compatible_p (TREE_TYPE (arg0), TREE_TYPE (arg1))
  && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 1)))
  == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg1, 1
  && OP_SAME (0) && OP_SAME (1));
--- gcc/testsuite/g++.dg/torture/pr56635.C.jj   2013-03-18 09:56:28.089526657 
+0100
+++ gcc/testsuite/g++.dg/torture/pr56635.C  2013-03-18 09:56:36.111478322 
+0100
@@ -0,0 +1,17 @@
+// PR tree-optimization/56635
+// { dg-do compile }
+
+struct A { _Complex double a; };
+
+void
+foo (A **x, A **y)
+{
+  A r;
+  if (__real__ x[0]->a)
+{
+  r.a = y[0]->a / x[0]->a;
+  **x = r;
+}
+  else
+**x = **y;
+}
2013-03-18  Jakub Jelinek  

PR tree-optimization/56635
* tree-ssa-phiopt.c (cond_if_else_store_replacement_1): Give up
if lhs of then_assign and else_assign don't have compatible types.

* g++.dg/torture/pr56635.C: New test.

--- gcc/tree-ssa-phiopt.c.jj2013-02-13 21:47:17.0 +0100
+++ gcc/tree-ssa-phiopt.c   2013-03-18 09:54:03.047377144 +0100
@@ -1528,7 +1528,7 @@ cond_if_else_store_replacement_1 (basic_
  basic_block join_bb, gimple then_assign,
  gimple else_assign)
 {
-  tree lhs_base, lhs, then_rhs, else_rhs, name;
+  tree lhs_base, 

Re: [PATCH] Fix cselim ICE (PR tree-optimization/56635)

2013-03-18 Thread Richard Biener
On Mon, Mar 18, 2013 at 1:00 PM, Jakub Jelinek  wrote:
> Hi!
>
> On the attached testcase we ICE, because cselim uses
>if (!is_gimple_reg_type (TREE_TYPE (lhs))
>|| !operand_equal_p (lhs, gimple_assign_lhs (else_assign), 0))
> to guard an optimization, and we have two MEM_REFs where operand_equal_p
> is true, but they don't have compatible types (the first one is _Complex
> double, i.e. is_gimple_reg_type), the second has aggregate type of the
> same size/same mode (structure containing the _Complex double).
>
> The first patch is what I've bootstrapped/regtested on i686-linux so far
> (x86_64-linux regtest still pending), it seems to trigger different return
> value from operand_equal_p in a couple of TUs:
> c52008b.adb
> /usr/src/gcc-4.8/libstdc++-v3/src/c++98/complex_io.cc
> /usr/src/gcc-4.8/lto-plugin/lto-plugin.c
> /usr/src/gcc-4.8/gcc/testsuite/g++.dg/opt/pr47366.C
> /usr/src/gcc-4.8/gcc/testsuite/g++.dg/torture/pr56635.C
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/actual_array_constructor_1.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/alloc_comp_assign_2.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/alloc_comp_assign_3.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/argument_checking_1.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/array_memcpy_2.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/array_temporaries_3.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/auto_char_len_3.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/char_pointer_func.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/proc_ptr_23.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/pure_byref_1.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/realloc_on_assign_17.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/reshape.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/string_ctor_1.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/unlimited_polymorphic_1.f03
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/vector_subscript_2.f90
> /usr/src/gcc-4.8/gcc/testsuite/gfortran.dg/widechar_8.f90
> /usr/src/gcc-4.8/libstdc++-v3/testsuite/23_containers/unordered_map/cons/56112.cc
> /usr/src/gcc-4.8/libstdc++-v3/testsuite/25_algorithms/remove_if/moveable.cc
> /usr/src/gcc-4.8/libstdc++-v3/testsuite/25_algorithms/remove/moveable.cc
> /usr/src/gcc-4.8/libstdc++-v3/testsuite/25_algorithms/stable_sort/49559.cc
> but from quick look at some of them it didn't look like something very
> undesirable.  In any case, no new FAILs.  Ok for trunk if it passes regtest
> on x86_64-linux too?

Ok.

Thanks,
Richard.

> The second patch is a safer version intended for 4.8.0, to be
> bootstrapped/regtested afterwards.
>
> Jakub


[Patch, microblaze]: Add -fstack-usage support

2013-03-18 Thread David Holsgrove
Changelog

2013-03-18  David Holsgrove 

 * gcc/config/microblaze/microblaze.c (microblaze_expand_prologue):
   Add check for flag_stack_usage to handle -fstack-usage support

Signed-off-by: David Holsgrove 



0004-Patch-microblaze-Add-fstack-usage-support.patch
Description: 0004-Patch-microblaze-Add-fstack-usage-support.patch
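
For illustration (not part of the patch): with this change a MicroBlaze
cross compiler produces the usual per-function .su report; the target
triplet, line numbers and byte counts below are made up and the exact
format depends on the GCC version:

  $ microblaze-xilinx-elf-gcc -O2 -fstack-usage -c foo.c
  $ cat foo.su
  foo.c:3:5:foo	24	static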


[Patch, microblaze]: Enable DWARF exception handling support

2013-03-18 Thread David Holsgrove
Add DWARF exception handling support for MicroBlaze.

Changelog

2013-03-18  Edgar E. Iglesias 
David Holsgrove 

 * common/config/microblaze/microblaze-common.c: Remove
   TARGET_EXCEPT_UNWIND_INFO definition.
 * config/microblaze/microblaze-protos.h: Add microblaze_eh_return prototype.
 * gcc/config/microblaze/microblaze.c: (microblaze_must_save_register,
   microblaze_expand_epilogue, microblaze_return_addr): Handle
   calls_eh_return
   (microblaze_eh_return): New function.
 * gcc/config/microblaze/microblaze.h: Define RETURN_ADDR_OFFSET,
   EH_RETURN_DATA_REGNO, MB_EH_STACKADJ_REGNUM, EH_RETURN_STACKADJ_RTX,
   ASM_PREFERRED_EH_DATA_FORMAT
 * gcc/config/microblaze/microblaze.md: Define eh_return pattern.

Signed-off-by: David Holsgrove 
Signed-off-by: Edgar E. Iglesias 



0001-Patch-microblaze-Enable-DWARF-exception-handling-sup.patch
Description: 0001-Patch-microblaze-Enable-DWARF-exception-handling-sup.patch


[Patch, microblaze]: Add atomic builtin implementation

2013-03-18 Thread David Holsgrove
Add sync_compare_and_swapsi and sync_test_and_setsi
implementations for MicroBlaze.
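
As a user-level sketch (not from the patch; names are made up), code like
the following ends up going through the two new patterns for int-sized
objects, sync_test_and_setsi for __sync_lock_test_and_set and
sync_compare_and_swapsi for __sync_val_compare_and_swap:

int lock;

int try_acquire (void)
{
  /* Returns the previous value of lock; 0 means the lock was free.  */
  return __sync_lock_test_and_set (&lock, 1);
}

int swap_if_equal (int *p, int expected, int desired)
{
  /* Returns the value found at *p before the operation.  */
  return __sync_val_compare_and_swap (p, expected, desired);
}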

Changelog

2013-03-18  David Holsgrove 

 * gcc/config/microblaze/sync.md: New file.
 * gcc/config/microblaze/microblaze.md: Add UNSPEC_SYNC_CAS,
   UNSPEC_SYNC_XCHG and include sync.md.
 * gcc/config/microblaze/microblaze.c: Add print_operand 'y'.
 * gcc/config/microblaze/constraints.md: Add memory_constraint
   'Q' which is a single register.

Signed-off-by: David Holsgrove 



0002-Patch-microblaze-Add-atomic-builtin.patch
Description: 0002-Patch-microblaze-Add-atomic-builtin.patch


[Patch, microblaze]: Add TARGET_ASM_OUTPUT_MI_THUNK to support varargs thunk

2013-03-18 Thread David Holsgrove
Changelog

2013-03-18  David Holsgrove 

 * gcc/config/microblaze/microblaze.c: Add microblaze_asm_output_mi_thunk
   and define TARGET_ASM_OUTPUT_MI_THUNK and TARGET_ASM_CAN_OUTPUT_MI_THUNK

Signed-off-by: David Holsgrove 



0003-Patch-microblaze-Add-TARGET_ASM_OUTPUT_MI_THUNK-to-s.patch
Description: 0003-Patch-microblaze-Add-TARGET_ASM_OUTPUT_MI_THUNK-to-s.patch


[Patch, microblaze]: Remove SECONDARY_MEMORY_NEEDED

2013-03-18 Thread David Holsgrove
MicroBlaze doesn't have restrictions that would force us to
reload regs via memory. Don't define SECONDARY_MEMORY_NEEDED.
Fixes an ICE when compiling OpenSSL for linux.

Changelog

2013-03-18  Edgar E. Iglesias 

 * gcc/config/microblaze/microblaze.h: Remove SECONDARY_MEMORY_NEEDED
   definition.

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Peter A. G. Crosthwaite 



0005-Patch-microblaze-Remove-SECONDARY_MEMORY_NEEDED.patch
Description: 0005-Patch-microblaze-Remove-SECONDARY_MEMORY_NEEDED.patch


[Patch, microblaze]: Add SIZE_TYPE and PTRDIFF_TYPE to microblaze.h

2013-03-18 Thread David Holsgrove
Changelog

2013-03-18  David Holsgrove 

 * gcc/config/microblaze/microblaze.h: Define SIZE_TYPE
   and PTRDIFF_TYPE.

Signed-off-by: David Holsgrove 



0006-Patch-microblaze-Add-SIZE_TYPE-and-PTRDIFF_TYPE-to-m.patch
Description: 0006-Patch-microblaze-Add-SIZE_TYPE-and-PTRDIFF_TYPE-to-m.patch


[PATCH] Speedup PTA

2013-03-18 Thread Richard Biener

This patch, long on my TODO list, speeds up PTA by removing the
costly hashtable lookup when looking for related fields of
a sub-variable.  Instead we now keep a pointer to the first
field.  For space savings I changed head/next to be the variable
info ID instead.  This speeds up a Fortran testcase by 10%.
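
Condensed, the new layout and lookup (lifted from the patch below, not
compilable on its own) is:

  /* In struct variable_info:  */
  unsigned next;   /* ID of the next field, zero for the last one.  */
  unsigned head;   /* ID of the first field of this variable.  */

  /* Visit all fields of the variable VI belongs to.  */
  for (varinfo_t f = get_varinfo (vi->head); f; f = vi_next (f))
    ...

where vi_next (f) just returns get_varinfo (f->next) and varmap slot zero
is NULL, so ID zero terminates the chain without any hash-table lookups.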

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2013-03-18  Richard Biener  

* tree-ssa-structalias.c (struct variable_info): Add pointer
to the first field of an aggregate with sub-vars.  Make
this and the pointer to the next subfield its ID.
(vi_next): New function.
(nothing_id, anything_id, readonly_id, escaped_id, nonlocal_id,
storedanything_id, integer_id): Increment by one.
(new_var_info, get_call_vi, lookup_call_clobber_vi,
get_call_clobber_vi): Adjust.
(solution_set_expand): Simplify and speedup.
(solution_set_add): Inline into ...
(set_union_with_increment): ... this.  Adjust accordingly.
(do_sd_constraint): Likewise.
(do_ds_constraint): Likewise.
(do_complex_constraint): Simplify.
(build_pred_graph): Adjust.
(solve_graph): Likewise.  Simplify and speedup.
(get_constraint_for_ssa_var, get_constraint_for_ptr_offset,
get_constraint_for_component_ref, get_constraint_for_1,
first_vi_for_offset, first_or_preceding_vi_for_offset,
create_function_info_for, create_variable_info_for_1,
create_variable_info_for, intra_create_variable_infos): Adjust.
(init_base_vars): Push NULL for ID zero.
(compute_points_to_sets): Adjust.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c.orig 2013-03-18 12:41:57.0 +0100
--- gcc/tree-ssa-structalias.c  2013-03-18 13:38:23.141158010 +0100
*** struct variable_info
*** 268,275 
/* True if this represents a IPA function info.  */
unsigned int is_fn_info : 1;
  
!   /* A link to the variable for the next field in this structure.  */
!   struct variable_info *next;
  
/* Offset of this variable, in bits, from the base variable  */
unsigned HOST_WIDE_INT offset;
--- 268,279 
/* True if this represents a IPA function info.  */
unsigned int is_fn_info : 1;
  
!   /* The ID of the variable for the next field in this structure
!  or zero for the last field in this structure.  */
!   unsigned next;
! 
!   /* The ID of the variable for the first field in this structure.  */
!   unsigned head;
  
/* Offset of this variable, in bits, from the base variable  */
unsigned HOST_WIDE_INT offset;
*** get_varinfo (unsigned int n)
*** 319,328 
return varmap[n];
  }
  
! /* Static IDs for the special variables.  */
! enum { nothing_id = 0, anything_id = 1, readonly_id = 2,
!escaped_id = 3, nonlocal_id = 4,
!storedanything_id = 5, integer_id = 6 };
  
  /* Return a new variable info structure consisting for a variable
 named NAME, and using constraint graph node NODE.  Append it
--- 323,342 
return varmap[n];
  }
  
! /* Return the next variable in the list of sub-variables of VI
!or NULL if VI is the last sub-variable.  */
! 
! static inline varinfo_t
! vi_next (varinfo_t vi)
! {
!   return get_varinfo (vi->next);
! }
! 
! /* Static IDs for the special variables.  Variable ID zero is unused
!and used as terminator for the sub-variable chain.  */
! enum { nothing_id = 1, anything_id = 2, readonly_id = 3,
!escaped_id = 4, nonlocal_id = 5,
!storedanything_id = 6, integer_id = 7 };
  
  /* Return a new variable info structure consisting for a variable
 named NAME, and using constraint graph node NODE.  Append it
*** new_var_info (tree t, const char *name)
*** 355,361 
  && DECL_HARD_REGISTER (t)));
ret->solution = BITMAP_ALLOC (&pta_obstack);
ret->oldsolution = NULL;
!   ret->next = NULL;
  
stats.total_vars++;
  
--- 369,376 
  && DECL_HARD_REGISTER (t)));
ret->solution = BITMAP_ALLOC (&pta_obstack);
ret->oldsolution = NULL;
!   ret->next = 0;
!   ret->head = ret->id;
  
stats.total_vars++;
  
*** get_call_vi (gimple call)
*** 387,398 
vi->fullsize = 2;
vi->is_full_var = true;
  
!   vi->next = vi2 = new_var_info (NULL_TREE, "CALLCLOBBERED");
vi2->offset = 1;
vi2->size = 1;
vi2->fullsize = 2;
vi2->is_full_var = true;
  
*slot_p = (void *) vi;
return vi;
  }
--- 402,415 
vi->fullsize = 2;
vi->is_full_var = true;
  
!   vi2 = new_var_info (NULL_TREE, "CALLCLOBBERED");
vi2->offset = 1;
vi2->size = 1;
vi2->fullsize = 2;
vi2->is_full_var = true;
  
+   vi->next = vi2->id;
+ 
*slot_p = (void *) vi;
return vi;
  }
*** lookup_call_clobber_vi (gimple call)
*** 422,428 
if (!uses)

[Patch, microblaze]: Extend jump insn to accept bri to SYMBOL_REFS

2013-03-18 Thread David Holsgrove
Changelog

2013-03-18  David Holsgrove 

 * gcc/config/microblaze/microblaze.md (jump):
   Account for jumps to SYMBOL_REFs.

Signed-off-by: David Holsgrove 



0007-Patch-microblaze-Extend-jump-insn-to-accept-bri-to-S.patch
Description: 0007-Patch-microblaze-Extend-jump-insn-to-accept-bri-to-S.patch


Re: [PATCH] Fix PR56605

2013-03-18 Thread Eric Botcazou
> 2013-03-13  Bill Schmidt  
>   Steven Bosscher 
> 
>   PR rtl-optimization/56605
>   * loop-iv.c (implies_p): Handle equal RTXs and subregs.
> 
> gcc/testsuite:
> 
> 2013-03-13  Bill Schmidt  wschm...@linux.vnet.ibm.com>
> 
>   PR rtl-optimization/56605
>   * gcc.target/powerpc/pr56605.c: New.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH] libgcc: Add DWARF info to aeabi_ldivmod and aeabi_uldivmod

2013-03-18 Thread Meador Inge
Ping.

On 03/05/2013 12:15 PM, Meador Inge wrote:
> Hi All,
> 
> This patch fixes a minor annoyance that causes backtraces to disappear
> inside of aeabi_ldivmod and aeabi_uldivmod due to the lack of appropriate
> DWARF information.  I fixed the problem by adding the necessary cfi_*
> macros in these functions.
> 
> OK?
> 
> 2013-03-05  Meador Inge  
> 
>   * config/arm/bpabi.S (aeabi_ldivmod): Add DWARF information for
>   computing the location of the link register.
>   (aeabi_uldivmod): Ditto.
> 
> Index: libgcc/config/arm/bpabi.S
> ===
> --- libgcc/config/arm/bpabi.S (revision 196470)
> +++ libgcc/config/arm/bpabi.S (working copy)
> @@ -123,6 +123,7 @@ ARM_FUNC_START aeabi_ulcmp
>  #ifdef L_aeabi_ldivmod
>  
>  ARM_FUNC_START aeabi_ldivmod
> + cfi_start   __aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod)
>   test_div_by_zero signed
>  
>   sub sp, sp, #8
> @@ -132,17 +133,20 @@ ARM_FUNC_START aeabi_ldivmod
>  #else
>   do_push {sp, lr}
>  #endif
> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>   bl SYM(__gnu_ldivmod_helper) __PLT__
>   ldr lr, [sp, #4]
>   add sp, sp, #8
>   do_pop {r2, r3}
>   RET
> + cfi_end LSYM(Lend_aeabi_ldivmod)
>   
>  #endif /* L_aeabi_ldivmod */
>  
>  #ifdef L_aeabi_uldivmod
>  
>  ARM_FUNC_START aeabi_uldivmod
> + cfi_start   __aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod)
>   test_div_by_zero unsigned
>  
>   sub sp, sp, #8
> @@ -152,11 +156,13 @@ ARM_FUNC_START aeabi_uldivmod
>  #else
>   do_push {sp, lr}
>  #endif
> +98:  cfi_push 98b - __aeabi_uldivmod, 0xe, -0xc, 0x10
>   bl SYM(__gnu_uldivmod_helper) __PLT__
>   ldr lr, [sp, #4]
>   add sp, sp, #8
>   do_pop {r2, r3}
>   RET
> - 
> + cfi_end LSYM(Lend_aeabi_uldivmod)
> +
>  #endif /* L_aeabi_divmod */
>   
> 


-- 
Meador Inge
CodeSourcery / Mentor Embedded


PATCH: PR target/56560: [4.6/4.7 regression] vzeroupper clobbers argument with AVX

2013-03-18 Thread H.J. Lu
Hi,

ix86_function_arg sets cfun->machine->callee_pass_avx256_p from the
current argument.  It clears callee_pass_avx256_p when ix86_function_arg
is called to generate a library call to pass an argument.  This patch
adds callee_pass_avx256_p and callee_return_avx256_p to ix86_args to store
the AVX info in CUM and copy it to cfun->machine->callee_pass_avx256_p
when ix86_function_arg is called immediately before the call instruction
is emitted.  OK for 4.7 branch?

Thanks.


H.J.
--
gcc/

2013-03-18  H.J. Lu  

PR target/56560
* config/i386/i386.c (init_cumulative_args): Also set
cum->callee_return_avx256_p.
(ix86_function_arg): Set cum->callee_pass_avx256_p.  Set
cfun->machine->callee_pass_avx256_p only when MODE == VOIDmode.

* config/i386/i386.h (ix86_args): Add callee_pass_avx256_p and
callee_return_avx256_p.

gcc/

2013-03-18  H.J. Lu  

PR target/56560
* gcc.target/i386/pr56560.c: New file.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c1f6c88..7a441c7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5592,7 +5592,10 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument 
info to initialize */
{
  /* The return value of this function uses 256bit AVX modes.  */
  if (caller)
-   cfun->machine->callee_return_avx256_p = true;
+   {
+ cfun->machine->callee_return_avx256_p = true;
+ cum->callee_return_avx256_p = true;
+   }
  else
cfun->machine->caller_return_avx256_p = true;
}
@@ -6863,11 +6866,20 @@ ix86_function_arg (cumulative_args_t cum_v, enum 
machine_mode omode,
 {
   /* This argument uses 256bit AVX modes.  */
   if (cum->caller)
-   cfun->machine->callee_pass_avx256_p = true;
+   cum->callee_pass_avx256_p = true;
   else
cfun->machine->caller_pass_avx256_p = true;
 }
 
+  if (cum->caller && mode == VOIDmode)
+{
+  /* This function is called with MODE == VOIDmode immediately
+before the call instruction is emitted.  We copy callee 256bit
+AVX info from the current CUM here.  */
+  cfun->machine->callee_return_avx256_p = cum->callee_return_avx256_p;
+  cfun->machine->callee_pass_avx256_p = cum->callee_pass_avx256_p;
+}
+
   return arg;
 }
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 80d19f1..899678d 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1502,6 +1502,10 @@ typedef struct ix86_args {
   in SSE registers.  Otherwise 0.  */
   enum calling_abi call_abi;   /* Set to SYSV_ABI for sysv abi. Otherwise
   MS_ABI for ms abi.  */
+  /* Nonzero if it passes 256bit AVX modes.  */
+  BOOL_BITFIELD callee_pass_avx256_p : 1;
+  /* Nonzero if it returns 256bit AVX modes.  */
+  BOOL_BITFIELD callee_return_avx256_p : 1;
 } CUMULATIVE_ARGS;
 
 /* Initialize a variable CUM of type CUMULATIVE_ARGS
diff --git a/gcc/testsuite/gcc.target/i386/pr56560.c 
b/gcc/testsuite/gcc.target/i386/pr56560.c
new file mode 100644
index 000..5417cbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr56560.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mvzeroupper -dp" } */
+
+extern void abort (void);
+
+typedef double vec_t __attribute__((vector_size(32)));
+
+struct S { int i1; int i2; int i3; };
+
+extern int bar (vec_t, int, int, int, int, int, struct S);
+
+void foo (vec_t v, struct S s)
+{
+  int i = bar (v, 1, 2, 3, 4, 5, s);
+  if (i == 0)
+abort ();
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */


Re: [PATCH 1/4] Mark all member functions with memory models always inline

2013-03-18 Thread Andi Kleen
On Mon, Mar 18, 2013 at 04:28:13PM +, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
> >
> > With inline __attribute__((always_inline)) these functions
> > get inlined even with -O0.
> >
> > I hardcoded the attribute in the header for now, assuming
> > that all compilers that support libstdc++ have attribute
> > always_inline too. If not it would need to be moved
> as a macro to c++config.h with appropriate ifdefs.
> 
> That should be fine.  I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Actually with the Linux kernel, but it's pretty arbitrary to be honest.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH 1/4] Mark all member functions with memory models always inline

2013-03-18 Thread Jonathan Wakely
On 18 March 2013 16:28, Jonathan Wakely wrote:
> On 16 March 2013 13:29, Andi Kleen wrote:
>>
>> With inline __attribute__((always_inline)) these functions
>> get inlined even with -O0.
>>
>> I hardcoded the attribute in the header for now, assuming
>> that all compilers that support libstdc++ have attribute
>> always_inline too. If not it would need to be moved
>> as a macro to c++config.h with appropriate ifdefs.
>
> That should be fine.  I assume __always_inline was chosen rather than
> _GLIBCXX_ALWAYS_INLINE for consistency with glibc?

Ah, I see it's also being used in libitm and libatomic, not just libstdc++.


Re: [PATCH 1/4] Mark all member functions with memory models always inline

2013-03-18 Thread Jonathan Wakely
On 16 March 2013 13:29, Andi Kleen wrote:
>
> With inline __attribute__((always_inline)) these functions
> get inlined even with -O0.
>
> I hardcoded the attribute in the header for now, assuming
> that all compilers that support libstdc++ have attribute
> always_inline too. If not it would need to be moved
> as a macro to c++config.h with appropriate ifdefs.

That should be fine.  I assume __always_inline was chosen rather than
_GLIBCXX_ALWAYS_INLINE for consistency with glibc?
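
For reference, the c++config.h fallback mentioned above would look roughly
like this; the macro name is the one suggested in this thread and the guard
is only illustrative:

#ifndef _GLIBCXX_ALWAYS_INLINE
# if defined __GNUC__
#  define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__))
# else
#  define _GLIBCXX_ALWAYS_INLINE inline
# endif
#endif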


Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread Richard Biener
"H.J. Lu"  wrote:

>On Mon, Mar 18, 2013 at 10:02 AM, Paul Pluzhnikov
> wrote:
>> +cc libc-alpha
>>
>> On Mon, Mar 18, 2013 at 9:05 AM, Xinliang David Li
> wrote:
>>> Interesting idea about lazy IFUNC relocation.
>>
>>> On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
>>>  wrote:
>>
 On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam
> wrote:
>>
>This patch is meant for google/gcc-4_7 but I want this to be
> considered for trunk when it opens again. This patch makes it easy
>to
> test for code coverage of multiversioned functions. Here is a
> motivating example:
>>
 Err.  As we are using IFUNCs isn't it simply possible to do this in
 the dynamic loader - for example by simply pre-loading a library
 with the IFUNC relocators implemented differently?  Thus, shouldn't
 we simply provide such library as a convenience?
>>
>> A similar need exists in glibc itself: it too has multiversioned
>functions,
>> and lack of testing has led to recent bugs in some of them.
>>
>> HJ has added a framework to test IFUNCs to glibc late last year, but
>it
>> would be nice to have a more general IFUNC control, so I could e.g.
>run
>> a binary on SSE4-capable machine A as that binary would run on
>SSE2-only
>> capable machine B.
>>
>> (We've had a few bugs recently, where the crash would only show on
>machine
>> B and not A. These are a pain to debug, as I may not have access to
>B.)
>>
>> If such a controller is implemented, I'd think it would have to be
>part
>> of GLIBC (or part of the ld-linux itself), and not of libgcc.
>>
>>   LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are
>available
>>
>
>We can pass environment variables to IFUNC selector.   Maybe we can
>enable it for debug build.

I was asking for the ifunc selector to be
overridable by LD_PRELOAD or a similar mechanism at dynamic load time.

Richard.



Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread H.J. Lu
On Mon, Mar 18, 2013 at 10:02 AM, Paul Pluzhnikov
 wrote:
> +cc libc-alpha
>
> On Mon, Mar 18, 2013 at 9:05 AM, Xinliang David Li  wrote:
>> Interesting idea about lazy IFUNC relocation.
>
>> On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
>>  wrote:
>
>>> On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam  
>>> wrote:
>
This patch is meant for google/gcc-4_7 but I want this to be
 considered for trunk when it opens again. This patch makes it easy to
 test for code coverage of multiversioned functions. Here is a
 motivating example:
>
>>> Err.  As we are using IFUNCs isn't it simply possible to do this in
>>> the dynamic loader - for example by simply pre-loading a library
>>> with the IFUNC relocators implemented differently?  Thus, shouldn't
>>> we simply provide such library as a convenience?
>
> A similar need exists in glibc itself: it too has multiversioned functions,
> and lack of testing has led to recent bugs in some of them.
>
> HJ has added a framework to test IFUNCs to glibc late last year, but it
> would be nice to have a more general IFUNC control, so I could e.g. run
> a binary on SSE4-capable machine A as that binary would run on SSE2-only
> capable machine B.
>
> (We've had a few bugs recently, where the crash would only show on machine
> B and not A. These are a pain to debug, as I may not have access to B.)
>
> If such a controller is implemented, I'd think it would have to be part
> of GLIBC (or part of the ld-linux itself), and not of libgcc.
>
>   LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are 
> available
>

We can pass environment variables to IFUNC selector.   Maybe we can
enable it for debug build.
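
A rough sketch of what such an override could look like from the resolver
side (this is not a patch; GCC_MOCK_CPU, foo_sse and foo_default are made-up
names, and whether getenv is already usable at the point the loader runs the
resolver is exactly the kind of detail that needs checking):

#include <stdlib.h>
#include <string.h>

static int foo_default (void) { return 0; }
static int foo_sse (void) { return 1; }

static void *
resolve_foo (void)
{
  const char *mock = getenv ("GCC_MOCK_CPU");
  __builtin_cpu_init ();
  if (mock != NULL)
    return strstr (mock, "sse") ? (void *) foo_sse : (void *) foo_default;
  return __builtin_cpu_supports ("sse")
	 ? (void *) foo_sse : (void *) foo_default;
}

int foo (void) __attribute__ ((ifunc ("resolve_foo")));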

-- 
H.J.


Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread Paul Pluzhnikov
+cc libc-alpha

On Mon, Mar 18, 2013 at 9:05 AM, Xinliang David Li  wrote:
> Interesting idea about lazy IFUNC relocation.

> On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
>  wrote:

>> On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam  
>> wrote:

>>>This patch is meant for google/gcc-4_7 but I want this to be
>>> considered for trunk when it opens again. This patch makes it easy to
>>> test for code coverage of multiversioned functions. Here is a
>>> motivating example:

>> Err.  As we are using IFUNCs isn't it simply possible to do this in
>> the dynamic loader - for example by simply pre-loading a library
>> with the IFUNC relocators implemented differently?  Thus, shouldn't
>> we simply provide such library as a convenience?

A similar need exists in glibc itself: it too has multiversioned functions,
and lack of testing has led to recent bugs in some of them.

HJ has added a framework to test IFUNCs to glibc late last year, but it
would be nice to have a more general IFUNC control, so I could e.g. run
a binary on SSE4-capable machine A as that binary would run on SSE2-only
capable machine B.

(We've had a few bugs recently, where the crash would only show on machine
B and not A. These are a pain to debug, as I may not have access to B.)

If such a controller is implemented, I'd think it would have to be part
of GLIBC (or part of the ld-linux itself), and not of libgcc.

  LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available

Thanks,
-- 
Paul Pluzhnikov


Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread Paul Pluzhnikov
On Mon, Mar 18, 2013 at 10:18 AM, Richard Biener
 wrote:
> "H.J. Lu"  wrote:

>>We can pass environment variables to IFUNC selector.   Maybe we can
>>enable it for debug build.

Enabling this for just debug builds would not cover my use case.

If the environment variable is used at loader initialization time to
override CPUID output, then the runtime cost of that code would be minuscule,
and it can be available in production glibc builds.

> I was asking for the ifunc selector to be
> overridable by LD_PRELOAD or a similar mechanism at dynamic load time.

Yes, that's how I understood you.

I don't believe it would be easy to implement such interposer (if
possible at all), and it would be very much tied to glibc internals.

Overriding CPUID at loader initialization time sounds simpler (but I
haven't looked at the code yet :-).

-- 
Paul Pluzhnikov


Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread Xinliang David Li
Interesting idea about lazy IFUNC relocation.

David

On Mon, Mar 18, 2013 at 2:02 AM, Richard Biener
 wrote:
> On Fri, Mar 15, 2013 at 10:55 PM, Sriraman Tallam  wrote:
>> Hi,
>>
>>This patch is meant for google/gcc-4_7 but I want this to be
>> considered for trunk when it opens again. This patch makes it easy to
>> test for code coverage of multiversioned functions. Here is a
>> motivating example:
>>
>> __attribute__((target ("default"))) int foo () { ... return 0; }
>> __attribute__((target ("sse"))) int foo () { ... return 1; }
>> __attribute__((target ("popcnt"))) int foo () { ... return 2; }
>>
>> int main ()
>> {
>>   return foo();
>> }
>>
>> Lets say your test CPU supports popcnt.  A run of this program will
>> invoke the popcnt version of foo (). Then, how do we test the sse
>> version of foo()? To do that for the above example, we need to run
>> this code on a CPU that has sse support but no popcnt support.
>> Otherwise, we need to comment out the popcnt version and run this
>> example. This can get painful when there are many versions. The same
>> argument applies to testing  the default version of foo.
>>
>> So, I am introducing the ability to mock a CPU. If the CPU you are
>> testing on supports sse, you should be able to test the sse version.
>>
>> First, I have introduced a new flag called -fmv-debug.  This patch
>> invokes the function version dispatcher every time a call to a foo ()
>> is made. Without that flag, the version dispatch happens once at
>> startup time via the IFUNC mechanism.
>>
>> Also, with -fmv-debug, the version dispatcher uses the two new
>> builtins "__builtin_mock_cpu_is" and "__builtin_mock_cpu_supports" to
>> check the cpu type and cpu isa.
>>
>> Then, I plan to add the following hooks to libgcc (in a different patch) :
>>
>> int set_mock_cpu_is (const char *cpu);
>> int set_mock_cpu_supports (const char *isa);
>> int init_mock_cpu (); // Clear the values of the mock cpu.
>>
>> With this support, here is how you can test for code coverage of the
>> "sse" version and "default version of foo in the above example:
>>
>> int main ()
>> {
>>   // Test SSE version.
>>if (__builtin_cpu_supports ("sse"))
>>{
>>  init_mock_cpu();
>>  set_mock_cpu_supports ("sse");
>>  assert (foo () == 1);
>>}
>>   // Test default version.
>>   init_mock_cpu();
>>   assert (foo () == 0);
>> }
>>
>> Invoking a multiversioned binary several times with appropriate mock
>> cpu values for the various ISAs and CPUs will give the complete code
>> coverage desired. Ofcourse, the underlying platform should be able to
>> support the various features.
>>
>> Note that the above test will work only with -fmv-debug as the
>> dispatcher must be invoked on every multiversioned call to be able to
>> dynamically change the version.
>>
>> Multiple ISA features can be set in the mock cpu by calling
>> "set_mock_cpu_supports" several times with different ISA names.
>> Calling "init_mock_cpu" will clear all the values. "set_mock_cpu_is"
>> will set the CPU type.
>>
>> This patch only includes the gcc changes.  I will separately prepare a
>> patch for the libgcc changes. Right now, since the libgcc changes are
>> not available the two new mock cpu builtins check the real CPU like
>> "__builtin_cpu_is" and "__builtin_cpu_supports".
>>
>> Patch attached.  Please look at mv14_debug_code_coverage.C for an
>> exhaustive example of testing for code coverage in the presence of
>> multiple versions.
>>
>> Comments please.
>
> Err.  As we are using IFUNCs isn't it simply possible to do this in
> the dynamic loader - for example by simply pre-loading a library
> with the IFUNC relocators implemented differently?  Thus, shouldn't
> we simply provide such library as a convenience?
>
> Thanks,
> Richard.
>
>> Thanks
>> Sri


Re: [PING^5] PR 54805: __gthread_tsd* in vxlib-tls.c

2013-03-18 Thread rbmj

On 16-Feb-13 23:21, Maxim Kuvyrkov wrote:

On 14/02/2013, at 10:18 AM, rbmj wrote:

Here's the updated, (trivial) patch.


Thanks.  I'll apply this once 4.8 branches and trunk is back into development 
mode.



Since GCC 4.9 has branched now, are you still willing to commit (maybe 
after the outage is over; I don't know the state of the svn server)?


One of my friends has also commented that the warning that this fixes 
causes the launchpad PPA system to reject the package (based on the 
build log), so is it possible for this to apply in 4.8.1 also?  I don't 
know how that process works; I assume I'd have to wait until after 4.8.0 
officially releases.  I understand that it's way too late for 4.8.0 
(_trivial_ as the fix is) :(


Suggested ChangeLog:

[libgcc]
 Robert Mason 
* config/vxlib-tls.c: Add prototypes for __gthread_tsd*()

Robert Mason


Re: [google][4.7]Using CPU mocks to test code coverage of multiversioned functions

2013-03-18 Thread Alan Modra
On Mon, Mar 18, 2013 at 06:18:58PM +0100, Richard Biener wrote:
> I was asking for the ifunc selector to be
> overridable by LD_PRELOAD or a similar mechanism at dynamic load time.

Please don't.  Calling an ifunc resolver function in another library
is just asking for trouble with current glibc.  Why?  Well, the other
library containing the resolver function may not have had any dynamic
relocations applied.  So if the resolver makes use of the GOT (to read
some variable), it will use unrelocated addresses.  You'll segfault if
you're lucky.

For anyone playing with ifunc, please test out your great ideas on
i386, ppc32, mips, arm, etc. *NOT* x86_64 or powerpc64 which both
avoid the GOT in many cases.

-- 
Alan Modra
Australia Development Lab, IBM