date:20141119

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Jakub Jelinek

On Tue, Nov 18, 2014 at 04:34:12PM -0800, Mike Stump wrote:
> On Nov 18, 2014, at 3:42 PM, Jakub Jelinek  wrote:
> > No, I'm not touching tmp array at all in that case, only look at vp
> > individual bytes.  Either they are all 0, or all 0xff, or I return NULL.
> 
> Oh, sorry, I misread where the break; in your code was going.  I might have 
> been misled by:
> 
> > -   gcc_assert (GET_MODE_PRECISION (outer_submode)
> > -   <= MAX_BITSIZE_MODE_ANY_INT);
> 
> in your patch.  You don’t need that anymore, do you?  If not, can you remove 
> this part.

I thought the assert is unnecessary given the condition just a few lines
above it.  But can keep it, perhaps gcc_checking_assert would be enough, and
hopefully compiler optimizes it away completely.

> 
> The rest looks like normal rtl/vector code, I don’t see anything wrong with 
> it.

Jakub

Re: [PATCH, testsuite] Add bind_pic_locally to certain tests

2014-11-19 Thread Andreas Schwab

* gcc.dg/pure-2.c: Update line numbers.

diff --git a/gcc/testsuite/gcc.dg/pure-2.c b/gcc/testsuite/gcc.dg/pure-2.c
index 638bd7c..fe6e2bc 100644
--- a/gcc/testsuite/gcc.dg/pure-2.c
+++ b/gcc/testsuite/gcc.dg/pure-2.c
@@ -8,14 +8,14 @@ extern int v;
 /* Trivial.  */
 int
 foo1(int a)  /* { dg-bogus "normally" "detect pure candidate" } */
-{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "9" } */
+{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "10" } */
   return v;
 }
 
 /* Loops known to be normally and extern const calls should be safe.  */
 int __attribute__ ((noinline))
 foo2(int n)  /* { dg-bogus "normally" "detect pure candidate" } */
-{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "16" } */
+{ /* { dg-warning "pure" "detect pure candidate" { target *-*-* } "17" } */
   int ret = 0;
   int i;
   for (i=0; i

Re: [PATCH, PR62167] Fix tail-merge pass for dead type-unsafe code

2014-11-19 Thread Richard Biener

On Tue, 18 Nov 2014, Tom de Vries wrote:

> Richard,
> 
> this (trunk) patch fixes PR62167.
> 
> The patch fixes a problem that triggers with the test-case on the 4.8 branch,
> when tail-merge makes a dead type-unsafe load alive.
> 
> I'm not able to reproduce this bug on 4.9 and trunk with the same test-case.
> On those branches, the tail-merge already does not happen.
> 
> The reason for the difference is as follows: With 4.8 the two phi arguments of
> the phi in the tail block are value-numbered identically:
> ...
> SCC consists of: p_14
> Value numbering p_14 stmt = p_14 = MEM[(struct head *)_13].first;
> Setting value number of p_14 to p_14 (changed)
> 
> SCC consists of: p_15
> Value numbering p_15 stmt = p_15 = _13->next;
> Setting value number of p_15 to p_14 (changed)
> ...
> 
> With 4.9 (and trunk), that's not the case:
> ...
> SCC consists of: p_14
> Value numbering p_14 stmt = p_14 = MEM[(struct head *)&heads][k.1_9].first;
> Setting value number of p_14 to p_14 (changed)
> 
> SCC consists of: p_15
> Value numbering p_15 stmt = p_15 = _13->next;
> Setting value number of p_15 to p_15 (changed)
> ...
> 
> I'm not sure the bug triggers on trunk and 4.9, but I see no reason why it
> could not trigger, so I'd prefer to apply the patch to 4.9 and trunk as well.
> 
> The patch introduces an xfail for pr51879-12.c. I can follow up with a patch
> to improve upon that, but I think that's better limited to trunk only.

Yeah.

> Bootstrapped and reg-tested on x86_64/trunk.
> 
> OK for trunk/stage3, 4.8, 4.9?

Ok.  Note that this goes well with disabling the (re-)use of 
value-numbering in tail-merging (which causes me quite some headaches...).
On trunk it shouldn't be necessary as much as it was earlier as we now
fully propagate constants and copies during FRE/PRE elimination.

Thanks,
Richard.

Re: [PATCH] Fix -fsanitize=bool -fnon-call-exceptions (PR sanitizer/63913)

2014-11-19 Thread Richard Biener

On Tue, 18 Nov 2014, Jakub Jelinek wrote:

> Hi!
> 
> This patch fixes instrumentation of bool/enum loads if they could
> throw.  We want to keep the EH on the load, and push
> further statements on the fallthru edge.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2014-11-18  Jakub Jelinek  
> 
>   PR sanitizer/63913
>   * ubsan.c: Include tree-eh.h.
>   (instrument_bool_enum_load): Handle loads that can throw.
> 
>   * g++.dg/ubsan/pr63913.C: New test.
> 
> --- gcc/ubsan.c.jj2014-11-14 15:11:49.0 +0100
> +++ gcc/ubsan.c   2014-11-18 09:29:20.409209556 +0100
> @@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.
>  #include "dfp.h"
>  #include "builtins.h"
>  #include "tree-object-size.h"
> +#include "tree-eh.h"
>  
>  /* Map from a tree to a VAR_DECL tree.  */
>  
> @@ -1135,7 +1136,9 @@ instrument_bool_enum_load (gimple_stmt_i
>|| TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
>  return;
>  
> +  bool can_throw = stmt_could_throw_p (stmt);
>location_t loc = gimple_location (stmt);
> +  tree lhs = gimple_assign_lhs (stmt);
>tree ptype = build_pointer_type (TREE_TYPE (rhs));
>tree atype = reference_alias_ptr_type (rhs);
>gimple g = gimple_build_assign (make_ssa_name (ptype, NULL),
> @@ -1145,9 +1148,24 @@ instrument_bool_enum_load (gimple_stmt_i
>tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g),
>build_int_cst (atype, 0));
>tree urhs = make_ssa_name (utype, NULL);
> -  g = gimple_build_assign (urhs, mem);
> -  gimple_set_location (g, loc);
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  if (can_throw)
> +{
> +  gimple_assign_set_lhs (stmt, urhs);
> +  g = gimple_build_assign_with_ops (NOP_EXPR, lhs, urhs, NULL_TREE);
> +  gimple_set_location (g, loc);
> +  edge e = find_fallthru_edge (gimple_bb (stmt)->succs);
> +  gsi_insert_on_edge_immediate (e, g);
> +  gimple_assign_set_rhs_with_ops (gsi, MEM_REF, mem, NULL_TREE);

Please use gimple_assign_set_rhs_from_tree for single-rhs trees.

Ok with that change.

Thanks,
RIchard.

> +  update_stmt (stmt);
> +  *gsi = gsi_for_stmt (g);
> +  g = stmt;
> +}
> +  else
> +{
> +  g = gimple_build_assign (urhs, mem);
> +  gimple_set_location (g, loc);
> +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +}
>minv = fold_convert (utype, minv);
>maxv = fold_convert (utype, maxv);
>if (!integer_zerop (minv))
> @@ -1169,8 +1187,11 @@ instrument_bool_enum_load (gimple_stmt_i
>gimple_set_location (g, loc);
>gsi_insert_after (gsi, g, GSI_NEW_STMT);
>  
> -  gimple_assign_set_rhs_with_ops (&gsi2, NOP_EXPR, urhs, NULL_TREE);
> -  update_stmt (stmt);
> +  if (!can_throw)
> +{
> +  gimple_assign_set_rhs_with_ops (&gsi2, NOP_EXPR, urhs, NULL_TREE);
> +  update_stmt (stmt);
> +}
>  
>gsi2 = gsi_after_labels (then_bb);
>if (flag_sanitize_undefined_trap_on_error)
> --- gcc/testsuite/g++.dg/ubsan/pr63913.C.jj   2014-11-18 09:34:10.603981487 
> +0100
> +++ gcc/testsuite/g++.dg/ubsan/pr63913.C  2014-11-18 09:33:47.0 
> +0100
> @@ -0,0 +1,12 @@
> +// PR sanitizer/63913
> +// { dg-do compile }
> +// { dg-options "-fsanitize=bool -fnon-call-exceptions" }
> +
> +struct B { B (); ~B (); };
> +
> +double
> +foo (bool *x)
> +{
> +  B b;
> +  return *x;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Richard Biener

On Tue, 18 Nov 2014, Jakub Jelinek wrote:

> Hi!
> 
> OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> they are never used for integer arithmetics (and there is no way
> to represent all their values in RTL if not using CONST_WIDE_INT).
> As the following testcase shows, simplify_immed_subreg can be called
> with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
> in the IL directly) into a SUBREG_REG.

So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
end doing with that OI mode subreg?  (why can't we simply use the
V4DF one?)

> The following patch instead of ICE handles the most common cases (all 0
> and all 1 CONST_VECTORs) and returns NULL otherwise.
> 
> Before wide-int got merged, the testcase worked, though the code didn't
> bother checking anything, just created 0 or constm1_rtx for the two cases
> that could happen and if something else appeared, could just return what
> matched low TImode (or DImode for -m32).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Looks ok to me.  Not sure if the zero/all-ones case really happens that
much to be worth special-casing - I think you could use
fixed_wide_int and simply see if the result is representable
in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
the patch may be smaller for that.

Thanks,
Richard.

> 2014-11-18  Jakub Jelinek  
> 
>   PR target/63910
>   * simplify-rtx.c (simplify_immed_subreg): For integer modes
>   wider than MAX_BITSIZE_MODE_ANY_INT, handle all zeros and all ones
>   and for other values return NULL_RTX.
> 
>   * gcc.target/i386/pr63910.c: New test.
> 
> --- gcc/simplify-rtx.c.jj 2014-11-11 00:06:19.0 +0100
> +++ gcc/simplify-rtx.c2014-11-18 10:17:01.198668555 +0100
> @@ -5505,6 +5505,21 @@ simplify_immed_subreg (machine_mode oute
>   HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / 
> HOST_BITS_PER_WIDE_INT];
>   wide_int r;
>  
> + if (GET_MODE_PRECISION (outer_submode) > MAX_BITSIZE_MODE_ANY_INT)
> +   {
> + /* Handle just all zeros and all ones CONST_VECTORs in
> +this case.  */
> + if ((vp[0] & value_mask) == 0)
> +   elems[elem] = const0_rtx;
> + else if ((vp[0] & value_mask) == value_mask)
> +   elems[elem] = constm1_rtx;
> + else
> +   return NULL_RTX;
> + for (i = value_bit; i < elem_bitsize; i += value_bit)
> +   if ((vp[i / value_bit] & value_mask) != (vp[0] & value_mask))
> + return NULL_RTX;
> + break;
> +   }
>   for (u = 0; u < units; u++)
> {
>   unsigned HOST_WIDE_INT buf = 0;
> @@ -5516,8 +5531,6 @@ simplify_immed_subreg (machine_mode oute
>   tmp[u] = buf;
>   base += HOST_BITS_PER_WIDE_INT;
> }
> - gcc_assert (GET_MODE_PRECISION (outer_submode)
> - <= MAX_BITSIZE_MODE_ANY_INT);
>   r = wide_int::from_array (tmp, units,
> GET_MODE_PRECISION (outer_submode));
>   elems[elem] = immed_wide_int_const (r, outer_submode);
> --- gcc/testsuite/gcc.target/i386/pr63910.c.jj2014-11-18 
> 10:12:24.282659318 +0100
> +++ gcc/testsuite/gcc.target/i386/pr63910.c   2014-11-18 10:12:00.0 
> +0100
> @@ -0,0 +1,12 @@
> +/* PR target/63910 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -mstringop-strategy=vector_loop -mavx512f" } */
> +
> +extern void bar (float *c);
> +
> +void
> +foo (void)
> +{
> +  float c[1024] = { };
> +  bar (c);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)

2014-11-19 Thread Richard Biener

On Tue, 18 Nov 2014, Jakub Jelinek wrote:

> Hi!
> 
> Apparently, expand_expr with EXPR_WRITE can return
> a SUBREG with SUBREG_PROMOTED_VAR_P set on it.  For

Huh, that looks bogus to me.  But of course I know nothing
(read: not enough) to really tell.  Eric?

Richard.

> UBSAN_CHECK_{ADD,SUB,MUL} expansion, I've been doing just
> emit_move_insn into it, which apparently is wrong in that case,
> store_expr instead uses convert_move for it.  The
> {ADD,SUB,MUL}_OVERFLOW (i.e. __builtin_*_overflow) expansion
> shouldn't need it, as the result is complex and complex integers
> aren't promoted that way.  As store_expr* uses a tree expression
> to store, while I have rtx, I just wrote a short helper function
> for this.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2014-11-18  Jakub Jelinek  
> 
>   PR sanitizer/63520
>   * internal-fn.c (expand_ubsan_result_store): New function.
>   (expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
>   Use it instead of just emit_move_insn.
> 
>   * c-c++-common/ubsan/pr63520.c: New test.
> 
> --- gcc/internal-fn.c.jj  2014-11-12 13:28:47.0 +0100
> +++ gcc/internal-fn.c 2014-11-18 15:35:46.395916823 +0100
> @@ -395,6 +395,21 @@ expand_arith_overflow_result_store (tree
>write_complex_part (target, lres, false);
>  }
>  
> +/* Helper for expand_*_overflow.  Store RES into TARGET.  */
> +
> +static void
> +expand_ubsan_result_store (rtx target, rtx res)
> +{
> +  if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
> +/* If this is a scalar in a register that is stored in a wider mode   
> +   than the declared mode, compute the result into its declared mode
> +   and then convert to the wider mode.  Our value is the computed
> +   expression.  */
> +convert_move (SUBREG_REG (target), res, SUBREG_PROMOTED_SIGN (target));
> +  else
> +emit_move_insn (target, res);
> +}
> +
>  /* Add sub/add overflow checking to the statement STMT.
> CODE says whether the operation is +, or -.  */
>  
> @@ -809,7 +824,7 @@ expand_addsub_overflow (location_t loc,
>if (lhs)
>  {
>if (is_ubsan)
> - emit_move_insn (target, res);
> + expand_ubsan_result_store (target, res);
>else
>   {
> if (do_xor)
> @@ -904,7 +919,7 @@ expand_neg_overflow (location_t loc, tre
>if (lhs)
>  {
>if (is_ubsan)
> - emit_move_insn (target, res);
> + expand_ubsan_result_store (target, res);
>else
>   expand_arith_overflow_result_store (lhs, target, mode, res);
>  }
> @@ -1590,7 +1605,7 @@ expand_mul_overflow (location_t loc, tre
>if (lhs)
>  {
>if (is_ubsan)
> - emit_move_insn (target, res);
> + expand_ubsan_result_store (target, res);
>else
>   expand_arith_overflow_result_store (lhs, target, mode, res);
>  }
> --- gcc/testsuite/c-c++-common/ubsan/pr63520.c.jj 2014-11-18 
> 15:40:07.271273710 +0100
> +++ gcc/testsuite/c-c++-common/ubsan/pr63520.c2014-11-18 
> 15:40:40.971673904 +0100
> @@ -0,0 +1,16 @@
> +/* PR sanitizer/63520 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined" } */
> +
> +int a;
> +
> +void
> +foo (void)
> +{
> +  while (1)
> +{
> +  if (a == 1)
> + break;
> +  a -= 1;
> +}
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

[PATCH] Fix PR63844

2014-11-19 Thread Richard Biener


The following improves code-generation for PR63844 by using
a restrict qualified reference type for the OMP receiver decl.
This improves alias analysis and points-to analysis enough to
usually allow invariant and store motion where that was possible
in the non-split-out variant.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?

Sadly I don't have a good testcase (for example one that wasn't
vectorized before but is now).  We can add one later - ISTR there
were a few other bugs about not vectorizing with -fopenmp, I'll
try to find them again and see whether they are fixed.

Thanks,
Richard. 

2014-11-19  Richard Biener  

PR tree-optimization/63844
* omp-low.c (fixup_child_record_type): Use a restrict qualified
referece type for the receiver parameter.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c   (revision 217692)
+++ gcc/omp-low.c   (working copy)
@@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
   layout_type (type);
 }
 
-  TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
+  TREE_TYPE (ctx->receiver_decl)
+= build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }
 
 /* Instantiate decls as necessary in CTX to satisfy the data sharing

Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 10:02:18AM +0100, Richard Biener wrote:
> On Tue, 18 Nov 2014, Jakub Jelinek wrote:
> > Apparently, expand_expr with EXPR_WRITE can return
> > a SUBREG with SUBREG_PROMOTED_VAR_P set on it.  For
> 
> Huh, that looks bogus to me.  But of course I know nothing
> (read: not enough) to really tell.  Eric?

I've tried to look where it comes from, and it dates back to r2xxx
or so, so forever.
And store_expr has a large:
  else if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
/* If this is a scalar in a register that is stored in a wider mode
   than the declared mode, compute the result into its declared mode
   and then convert to the wider mode.  Our value is the computed
   expression.  */
handling block.  For targets with only word sized operations something
like that actually makes a lot of sense, I have just not been aware of that.

Jakub

Re: [PATCH] Fix PR63844

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 10:07:07AM +0100, Richard Biener wrote:
> 
> The following improves code-generation for PR63844 by using
> a restrict qualified reference type for the OMP receiver decl.
> This improves alias analysis and points-to analysis enough to
> usually allow invariant and store motion where that was possible
> in the non-split-out variant.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
> 
> Sadly I don't have a good testcase (for example one that wasn't
> vectorized before but is now).  We can add one later - ISTR there
> were a few other bugs about not vectorizing with -fopenmp, I'll
> try to find them again and see whether they are fixed.

Most of those PRs should have openmp keyword.

> 2014-11-19  Richard Biener  
> 
>   PR tree-optimization/63844
>   * omp-low.c (fixup_child_record_type): Use a restrict qualified
>   referece type for the receiver parameter.

Ok, thanks.

> --- gcc/omp-low.c (revision 217692)
> +++ gcc/omp-low.c (working copy)
> @@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
>layout_type (type);
>  }
>  
> -  TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
> +  TREE_TYPE (ctx->receiver_decl)
> += build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
>  }
>  
>  /* Instantiate decls as necessary in CTX to satisfy the data sharing

Jakub

Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler

2014-11-19 Thread Ramana Radhakrishnan




On 14/11/14 15:12, Maxim Kuvyrkov wrote:

On Nov 14, 2014, at 8:38 AM, Jeff Law  wrote:


On 10/20/14 22:06, Maxim Kuvyrkov wrote:

Hi,
Ramana, this change requires benchmarking, which I can't easily do
at

the moment. I would appreciate any benchmarking results that you can
share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
needs to be tuned/confirmed for Cortex-A15.
What were the results of that benchmarking?  IIRC I tabled reviewing this work 
waiting for those results (and I probably should have let you know that.  
Sorry,  my bad there).


I don't have the benchmarking results yet, and I was hoping for ARM to help 
with getting the numbers.  The arm maintainers still need to OK the 
arm-specific portion of the patch, which, I imagine, will happen only of 
benchmark scores improve.



I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the 
trees you gave me links to against the equivalent version on trunk.


The results with SPEC2k on A15 were in the noise with the default value 
for PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm 
still waiting on results for values 0, 1 and 3 and hopefully something 
will come back soon for SPEC2k.





@@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
   return issue_rate > 1 ? issue_rate : 0;
 }

+/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
+static int
+arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
+{
+  switch (arm_tune)
+{
+case cortexa15:
+  return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+
+default:
+  return 0;
+}
+}
+



It would be better to have this as a flag in the tuning tables rather 
than hardcoding for a core here. The backend has been moving in that 
direction for all core centric information and it is preferable that be 
continued.


So this logic here should just be

if (current_tune->multipass_lookahead)
  return autopref_multipass_lookahead_guard (insn, ready_index);
else
  return 0;




regards
Ramana






...

Can this be built on top of Bin's work for insn fusion?  There's a lot of 
commonality in the structure of the insns you care about.  He's already got a 
nice little priority function that I think you could utilize to to ensure the 
insns with smaller offsets fire first.


I would argue that macro-fusion should have been implemented the way 
autopref_model is -- via 
targetm.sched.first_cycle_multipass_dfa_lookahead_guard hook.  To implement the 
autopref model I cleaned up and generalized existing infrastructure (max_issue 
and dfa_lookahead_guard hook) instead of adding yet another decision-making 
primitive to the scheduler.




My biggest concern would be sched2 coming along and undoing that work since 
you're not going to fuse those into move-multiple types of instructions.


The autoprefetcher will be active only during sched2.  It is disabled during 
sched1 by the fact that max_issue is not used when scheduling for register 
pressure.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-19 Thread Yangfei (Felix)

> > Sorry for missing the point.  It seems to me that 't2' here will conflict 
> > with
> condition of the pattern *movhi_insn_arch4:
> >"TARGET_ARM
> > && arm_arch4
> > && (register_operand (operands[0], HImode)
> > || register_operand (operands[1], HImode))"
> >
> > #define TARGET_ARM  (! TARGET_THUMB)
> > /* 32-bit Thumb-2 code.  */
> > #define TARGET_THUMB2   (TARGET_THUMB &&
> arm_arch_thumb2)
> >
> 
> Bah, Indeed ! - I misremembered the t2 there, my mistake.
> 
> Yes you are right there, but what I'd like you to do is to use that mechanism
> rather than putting all this logic in the predicate.
> 
> So, I'd prefer you to add a v6t2 to the values for the "arch" attribute, 
> don't forget
> to update the comments above.
> 
> and in arch_enabled you need to enforce this with
> 
>   (and (eq_attr "arch" "v6t2")
>(match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2"))
>(const_string "yes")
> 
> And in the pattern use v6t2 ...
> 
> arm_arch_thumb2 implies that this is at the architecture level of v6t2.
> Therefore TARGET_ARM && arm_arch_thumb2 implies ARM state.


Hi Ramana, 
Thank you for your suggestions.  I rebased the patch on the latest trunk 
and updated it accordingly. 
As this patch will not work for architectures older than armv6t2,  I also 
prefer Thomas's patch to fix for them. 
I am currently performing test for this patch.  Assuming no issues pops up, 
OK for the trunk?  
And is it necessary to backport this patch to the 4.8 & 4.9 branches? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217717)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,11 @@
+2014-11-19  Felix Yang  
+   Shanyao Chen  
+
+   PR target/59593
+   * config/arm/arm.md (define_attr "arch"): Add v6t2.
+   (define_attr "arch_enabled"): Add test for the above.
+   (*movhi_insn_arch4): Add new alternative.
+
 2014-11-18  Felix Yang  
 
* config/aarch64/aarch64.c (doloop_end): New pattern.
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 217717)
+++ gcc/config/arm/arm.md   (working copy)
@@ -125,9 +125,10 @@
 ; This can be "a" for ARM, "t" for either of the Thumbs, "32" for
 ; TARGET_32BIT, "t1" or "t2" to specify a specific Thumb mode.  "v6"
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
-; arm_arch6.  This attribute is used to compute attribute "enabled",
-; use type "any" to enable an alternative in all cases.
-(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3"
+; arm_arch6.  "v6t2" for Thumb-2 with arm_arch6.  This attribute is
+; used to compute attribute "enabled", use type "any" to enable an
+; alternative in all cases.
+(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -162,6 +163,10 @@
  (match_test "TARGET_32BIT && !arm_arch6"))
 (const_string "yes")
 
+(and (eq_attr "arch" "v6t2")
+ (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2"))
+(const_string "yes")
+
 (and (eq_attr "arch" "avoid_neon_for_64bits")
  (match_test "TARGET_NEON")
  (not (match_test "TARGET_PREFER_NEON_64BITS")))
@@ -6288,8 +6293,8 @@
 
 ;; Pattern to recognize insn generated default case above
 (define_insn "*movhi_insn_arch4"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,r")
-   (match_operand:HI 1 "general_operand"  "rIk,K,r,mi"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
+   (match_operand:HI 1 "general_operand"  "rIk,K,n,r,mi"))]
   "TARGET_ARM
&& arm_arch4
&& (register_operand (operands[0], HImode)
@@ -6297,16 +6302,19 @@
   "@
mov%?\\t%0, %1\\t%@ movhi
mvn%?\\t%0, #%B1\\t%@ movhi
+   movw%?\\t%0, %1\\t%@ movhi
str%(h%)\\t%1, %0\\t%@ movhi
ldr%(h%)\\t%0, %1\\t%@ movhi"
   [(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,*,*,256")
-   (set_attr "neg_pool_range" "*,*,*,244")
+   (set_attr "pool_range" "*,*,*,*,256")
+   (set_attr "neg_pool_range" "*,*,*,*,244")
+   (set_attr "arch" "*,*,v6t2,*,*")
(set_attr_alternative "type"
  [(if_then_else (match_operand 1 "const_int_operand" 
"")
 (const_string "mov_imm" )
 (const_string "mov_reg"))
   (const_string "mvn_imm")
+  (const_string "mov_imm")
   (const_string "store1")
   (const_string "load1")])]
 )


arm-patch-v3.diff
Description: arm-patch-v3.diff

Re: [PATCH] Fix simd clone vectorization with EH (PR tree-optimization/63915)

2014-11-19 Thread Richard Biener

On Tue, 18 Nov 2014, Jakub Jelinek wrote:

> Hi!
> 
> Simd clone vectorization uses vect_finish_stmt_generation which
> handles adding the new calls properly to EH tables, but afterwards
> we replace the original call with a normal assignment, and if the original
> call can throw, we need to remove it from the EH tables.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

Ok.

Thanks,
Richard.

> 2014-11-18  Jakub Jelinek  
> 
>   PR tree-optimization/63915
>   * tree-vect-stmts.c (vectorizable_simd_clone_call): Pass
>   true instead of false as last argument to gsi_replace.
> 
>   * c-c++-common/gomp/pr60823-4.c: New test.
> 
> --- gcc/tree-vect-stmts.c.jj  2014-11-14 00:10:39.0 +0100
> +++ gcc/tree-vect-stmts.c 2014-11-18 17:02:15.635257023 +0100
> @@ -3195,7 +3195,7 @@ vectorizable_simd_clone_call (gimple stm
>set_vinfo_for_stmt (new_stmt, stmt_info);
>set_vinfo_for_stmt (stmt, NULL);
>STMT_VINFO_STMT (stmt_info) = new_stmt;
> -  gsi_replace (gsi, new_stmt, false);
> +  gsi_replace (gsi, new_stmt, true);
>unlink_stmt_vdef (stmt);
>  
>return true;
> --- gcc/testsuite/c-c++-common/gomp/pr60823-4.c.jj2014-11-18 
> 17:06:38.859319961 +0100
> +++ gcc/testsuite/c-c++-common/gomp/pr60823-4.c   2014-11-18 
> 17:06:34.602383632 +0100
> @@ -0,0 +1,7 @@
> +/* PR tree-optimization/63915 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_simd_clones } */
> +/* { dg-options "-O2 -fopenmp-simd" } */
> +/* { dg-additional-options "-fpic" { target fpic } } */
> +
> +#include "pr60823-2.c"
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

Re: [PATCH, PR62167] Fix tail-merge pass for dead type-unsafe code

2014-11-19 Thread Richard Biener

On Tue, 18 Nov 2014, Jeff Law wrote:

> On 11/18/14 09:57, Tom de Vries wrote:
> > Richard,
> > 
> > this (trunk) patch fixes PR62167.
> > 
> > The patch fixes a problem that triggers with the test-case on the 4.8
> > branch, when tail-merge makes a dead type-unsafe load alive.
> > 
> > I'm not able to reproduce this bug on 4.9 and trunk with the same
> > test-case. On those branches, the tail-merge already does not happen.
> > 
> > The reason for the difference is as follows: With 4.8 the two phi
> > arguments of the phi in the tail block are value-numbered identically:
> > ...
> > SCC consists of: p_14
> > Value numbering p_14 stmt = p_14 = MEM[(struct head *)_13].first;
> > Setting value number of p_14 to p_14 (changed)
> > 
> > SCC consists of: p_15
> > Value numbering p_15 stmt = p_15 = _13->next;
> > Setting value number of p_15 to p_14 (changed)
> > ...
> > 
> > With 4.9 (and trunk), that's not the case:
> > ...
> > SCC consists of: p_14
> > Value numbering p_14 stmt = p_14 = MEM[(struct head *)&heads][k.1_9].first;
> > Setting value number of p_14 to p_14 (changed)
> > 
> > SCC consists of: p_15
> > Value numbering p_15 stmt = p_15 = _13->next;
> > Setting value number of p_15 to p_15 (changed)
> > ...
> > 
> > I'm not sure the bug triggers on trunk and 4.9, but I see no reason why
> > it could not trigger, so I'd prefer to apply the patch to 4.9 and trunk
> > as well.
> > 
> > The patch introduces an xfail for pr51879-12.c. I can follow up with a
> > patch to improve upon that, but I think that's better limited to trunk
> > only.
> > 
> > Bootstrapped and reg-tested on x86_64/trunk.
> So instead of simply disabling for anything with virtual operands, shouldn't
> instead we be comparing the virtual operands and alias information?  Seems to
> me that if we did that properly, this would "just work".  Right?

But that would simply use operand_equal_p () 

Richard.

Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler

2014-11-19 Thread Maxim Kuvyrkov

On Nov 19, 2014, at 12:27 PM, Ramana Radhakrishnan 
 wrote:

> 
> 
> On 14/11/14 15:12, Maxim Kuvyrkov wrote:
>> On Nov 14, 2014, at 8:38 AM, Jeff Law  wrote:
>> 
>>> On 10/20/14 22:06, Maxim Kuvyrkov wrote:
 Hi,
 Ramana, this change requires benchmarking, which I can't easily do
 at
>>> the moment. I would appreciate any benchmarking results that you can
>>> share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
>>> needs to be tuned/confirmed for Cortex-A15.
>>> What were the results of that benchmarking?  IIRC I tabled reviewing this 
>>> work waiting for those results (and I probably should have let you know 
>>> that.  Sorry,  my bad there).
>> 
>> I don't have the benchmarking results yet, and I was hoping for ARM to help 
>> with getting the numbers.  The arm maintainers still need to OK the 
>> arm-specific portion of the patch, which, I imagine, will happen only of 
>> benchmark scores improve.
> 
> 
> I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees 
> you gave me links to against the equivalent version on trunk.
> 
> The results with SPEC2k on A15 were in the noise with the default value for 
> PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still waiting 
> on results for values 0, 1 and 3 and hopefully something will come back soon 
> for SPEC2k.

I think there is a good chance that this optimization will not improve SPEC 
benchmarks, my main concern is to make sure SPECs don't regress.  If SPECs are 
neutral and the patch gives good improvement (2.5 times faster) on another 
benchmark (STREAM), is this a good enough argument for commit?

> 
> 
>> 
>> @@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
>>   return issue_rate > 1 ? issue_rate : 0;
>> }
>> 
>> +/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
>> +static int
>> +arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
>> +{
>> +  switch (arm_tune)
>> +{
>> +case cortexa15:
>> +  return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
>> +
>> +default:
>> +  return 0;
>> +}
>> +}
>> +
> 
> 
> It would be better to have this as a flag in the tuning tables rather than 
> hardcoding for a core here. The backend has been moving in that direction for 
> all core centric information and it is preferable that be continued.
> 
> So this logic here should just be
> 
> if (current_tune->multipass_lookahead)
>  return autopref_multipass_lookahead_guard (insn, ready_index);
> else
>  return 0;
> 

OK, I'll convert this to use tuning tables.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)

2014-11-19 Thread Richard Biener

On Wed, 19 Nov 2014, Jakub Jelinek wrote:

> On Wed, Nov 19, 2014 at 10:02:18AM +0100, Richard Biener wrote:
> > On Tue, 18 Nov 2014, Jakub Jelinek wrote:
> > > Apparently, expand_expr with EXPR_WRITE can return
> > > a SUBREG with SUBREG_PROMOTED_VAR_P set on it.  For
> > 
> > Huh, that looks bogus to me.  But of course I know nothing
> > (read: not enough) to really tell.  Eric?
> 
> I've tried to look where it comes from, and it dates back to r2xxx
> or so, so forever.
> And store_expr has a large:
>   else if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
> /* If this is a scalar in a register that is stored in a wider mode
>than the declared mode, compute the result into its declared mode
>and then convert to the wider mode.  Our value is the computed
>expression.  */
> handling block.  For targets with only word sized operations something
> like that actually makes a lot of sense, I have just not been aware of that.

Ok.  Then the patch is ok.

Thanks,
Richard.

Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-19 Thread Ramana Radhakrishnan




On 19/11/14 09:29, Yangfei (Felix) wrote:

Sorry for missing the point.  It seems to me that 't2' here will conflict with

condition of the pattern *movhi_insn_arch4:

"TARGET_ARM
 && arm_arch4
 && (register_operand (operands[0], HImode)
 || register_operand (operands[1], HImode))"

#define TARGET_ARM  (! TARGET_THUMB)
/* 32-bit Thumb-2 code.  */
#define TARGET_THUMB2   (TARGET_THUMB &&

arm_arch_thumb2)




Bah, Indeed ! - I misremembered the t2 there, my mistake.

Yes you are right there, but what I'd like you to do is to use that mechanism
rather than putting all this logic in the predicate.

So, I'd prefer you to add a v6t2 to the values for the "arch" attribute, don't 
forget
to update the comments above.

and in arch_enabled you need to enforce this with

   (and (eq_attr "arch" "v6t2")
(match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2"))
 (const_string "yes")

And in the pattern use v6t2 ...

arm_arch_thumb2 implies that this is at the architecture level of v6t2.
Therefore TARGET_ARM && arm_arch_thumb2 implies ARM state.



Hi Ramana,
 Thank you for your suggestions.  I rebased the patch on the latest trunk 
and updated it accordingly.
 As this patch will not work for architectures older than armv6t2,  I also 
prefer Thomas's patch to fix for them.
 I am currently performing test for this patch.  Assuming no issues pops 
up, OK for the trunk?
 And is it necessary to backport this patch to the 4.8 & 4.9 branches?




This is OK for trunk if no regressions - please let it bake for a few 
days on trunk to let auto-testers catch up. This is OK for 4.8 / 4.9 
after that and please test your backport.


regards
Ramana





Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217717)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,11 @@
+2014-11-19  Felix Yang  
+   Shanyao Chen  
+
+   PR target/59593
+   * config/arm/arm.md (define_attr "arch"): Add v6t2.
+   (define_attr "arch_enabled"): Add test for the above.
+   (*movhi_insn_arch4): Add new alternative.
+
  2014-11-18  Felix Yang  

* config/aarch64/aarch64.c (doloop_end): New pattern.
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 217717)
+++ gcc/config/arm/arm.md   (working copy)
@@ -125,9 +125,10 @@
  ; This can be "a" for ARM, "t" for either of the Thumbs, "32" for
  ; TARGET_32BIT, "t1" or "t2" to specify a specific Thumb mode.  "v6"
  ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
-; arm_arch6.  This attribute is used to compute attribute "enabled",
-; use type "any" to enable an alternative in all cases.
-(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3"
+; arm_arch6.  "v6t2" for Thumb-2 with arm_arch6.  This attribute is
+; used to compute attribute "enabled", use type "any" to enable an
+; alternative in all cases.
+(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3"
(const_string "any"))

  (define_attr "arch_enabled" "no,yes"
@@ -162,6 +163,10 @@
  (match_test "TARGET_32BIT && !arm_arch6"))
 (const_string "yes")

+(and (eq_attr "arch" "v6t2")
+ (match_test "TARGET_32BIT && arm_arch6 && arm_arch_thumb2"))
+(const_string "yes")
+
 (and (eq_attr "arch" "avoid_neon_for_64bits")
  (match_test "TARGET_NEON")
  (not (match_test "TARGET_PREFER_NEON_64BITS")))
@@ -6288,8 +6293,8 @@

  ;; Pattern to recognize insn generated default case above
  (define_insn "*movhi_insn_arch4"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,m,r")
-   (match_operand:HI 1 "general_operand"  "rIk,K,r,mi"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r,r,m,r")
+   (match_operand:HI 1 "general_operand"  "rIk,K,n,r,mi"))]
"TARGET_ARM
 && arm_arch4
 && (register_operand (operands[0], HImode)
@@ -6297,16 +6302,19 @@
"@
 mov%?\\t%0, %1\\t%@ movhi
 mvn%?\\t%0, #%B1\\t%@ movhi
+   movw%?\\t%0, %1\\t%@ movhi
 str%(h%)\\t%1, %0\\t%@ movhi
 ldr%(h%)\\t%0, %1\\t%@ movhi"
[(set_attr "predicable" "yes")
-   (set_attr "pool_range" "*,*,*,256")
-   (set_attr "neg_pool_range" "*,*,*,244")
+   (set_attr "pool_range" "*,*,*,*,256")
+   (set_attr "neg_pool_range" "*,*,*,*,244")
+   (set_attr "arch" "*,*,v6t2,*,*")
 (set_attr_alternative "type"
   [(if_then_else (match_operand 1 "const_int_operand" 
"")
  (const_string "mov_imm" )
  (const_string "mov_reg"))
(const_string "mvn_imm")
+  (const_string "mov_imm")

Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-19 Thread Richard Biener

On Tue, Nov 18, 2014 at 10:04 PM, David Malcolm  wrote:
> On Tue, 2014-11-18 at 10:53 +0100, Richard Biener wrote:
>> On Tue, Nov 18, 2014 at 2:59 AM, David Malcolm  wrote:
>> > On Mon, 2014-11-17 at 11:06 +0100, Richard Biener wrote:
>> >> On Sat, Nov 15, 2014 at 12:00 PM, David Malcolm  
>> >> wrote:
>> >> > On Thu, 2014-11-13 at 11:45 +0100, Richard Biener wrote:
>> >> >> On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm  
>> >> >> wrote:
>> >> >> > On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:
>> >> >> >> On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek  
>> >> >> >> wrote:
>> >> >> >> > On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
>> >> >> >> >> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
>> >> >> >> >> > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
>> >> >> >> >> > > To be constructive here - the above case is from within a
>> >> >> >> >> > > GIMPLE_ASSIGN case label
>> >> >> >> >> > > and thus I'd have expected
>> >> >> >> >> > >
>> >> >> >> >> > > case GIMPLE_ASSIGN:
>> >> >> >> >> > >   {
>> >> >> >> >> > > gassign *a1 = as_a  (s1);
>> >> >> >> >> > > gassign *a2 = as_a  (s2);
>> >> >> >> >> > >   lhs1 = gimple_assign_lhs (a1);
>> >> >> >> >> > >   lhs2 = gimple_assign_lhs (a2);
>> >> >> >> >> > >   if (TREE_CODE (lhs1) != SSA_NAME
>> >> >> >> >> > >   && TREE_CODE (lhs2) != SSA_NAME)
>> >> >> >> >> > > return (operand_equal_p (lhs1, lhs2, 0)
>> >> >> >> >> > > && gimple_operand_equal_value_p 
>> >> >> >> >> > > (gimple_assign_rhs1 (a1),
>> >> >> >> >> > >  
>> >> >> >> >> > > gimple_assign_rhs1 (a2)));
>> >> >> >> >> > >   else if (TREE_CODE (lhs1) == SSA_NAME
>> >> >> >> >> > >&& TREE_CODE (lhs2) == SSA_NAME)
>> >> >> >> >> > > return vn_valueize (lhs1) == vn_valueize (lhs2);
>> >> >> >> >> > >   return false;
>> >> >> >> >> > >   }
>> >> >> >> >> > >
>> >> >> >> >> > > instead.  That's the kind of changes I have expected and 
>> >> >> >> >> > > have approved of.
>> >> >> >> >> >
>> >> >> >> >> > But even that looks like just adding extra work for all 
>> >> >> >> >> > developers, with no
>> >> >> >> >> > gain.  You only have to add extra code and extra temporaries, 
>> >> >> >> >> > in switches
>> >> >> >> >> > typically also have to add {} because of the temporaries and 
>> >> >> >> >> > thus extra
>> >> >> >> >> > indentation level, and it doesn't simplify anything in the 
>> >> >> >> >> > code.
>> >> >> >> >>
>> >> >> >> >> The branch attempts to use the C++ typesystem to capture 
>> >> >> >> >> information
>> >> >> >> >> about the kinds of gimple statement we expect, both:
>> >> >> >> >>   (A) so that the compiler can detect type errors, and
>> >> >> >> >>   (B) as a comprehension aid to the human reader of the code
>> >> >> >> >>
>> >> >> >> >> The ideal here is when function params and struct field can be
>> >> >> >> >> strengthened from "gimple" to a subclass ptr.  This captures the
>> >> >> >> >> knowledge that every use of a function or within a struct has a 
>> >> >> >> >> given
>> >> >> >> >> gimple code.
>> >> >> >> >
>> >> >> >> > I just don't like all the as_a/is_a stuff enforced everywhere,
>> >> >> >> > it means more typing, more temporaries, more indentation.
>> >> >> >> > So, as I view it, instead of the checks being done cheaply (yes, 
>> >> >> >> > I think
>> >> >> >> > the gimple checking as we have right now is very cheap) under the
>> >> >> >> > hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those 
>> >> >> >> > changes
>> >> >> >> > put the burden on the developers, who has to check that manually 
>> >> >> >> > through
>> >> >> >> > the as_a/is_a stuff everywhere, more typing and uglier syntax.
>> >> >> >> > I just don't see that as a step forward, instead a huge step 
>> >> >> >> > backwards.
>> >> >> >> > But perhaps I'm alone with this.
>> >> >> >> > Can you e.g. compare the size of - lines in your patchset 
>> >> >> >> > combined, and
>> >> >> >> > size of + lines in your patchset?  As in, if your changes lead to 
>> >> >> >> > less
>> >> >> >> > typing or more.
>> >> >> >>
>> >> >> >> I see two ways out here.  One is to add overloads to all the 
>> >> >> >> functions
>> >> >> >> taking the special types like
>> >> >> >>
>> >> >> >> tree
>> >> >> >> gimple_assign_rhs1 (gimple *);
>> >> >> >>
>> >> >> >> or simply add
>> >> >> >>
>> >> >> >> gassign *operator ()(gimple *g) { return as_a  (g); }
>> >> >> >>
>> >> >> >> into a gimple-compat.h header which you include in places that
>> >> >> >> are not converted "nicely".
>> >> >> >
>> >> >> > Thanks for the suggestions.
>> >> >> >
>> >> >> > Am I missing something, or is the gimple-compat.h idea above not 
>> >> >> > valid C
>> >> >> > ++?
>> >> >> >
>> >> >> > Note that "gimple" is still a typedef to
>> >> >> >   gimple_statement_base *
>> >> >> > (as noted before, the gimple -> gimple * change would break e

Re: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-11-19 Thread Richard Biener

On Tue, Nov 18, 2014 at 5:33 PM, Thomas Preud'homme
 wrote:
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Monday, November 17, 2014 12:47 PM
>>
>> Hmm.  I am a little bit concerned about the malloc traffic generated here.
>> So why not use a vec, get rid of the ->next pointer and
>> use a hash_map  to associate the stmt with
>> an index into the vector?
>
> Sure, it even makes things easier. However I don't understand why a vector is
> better than malloc. Is it a matter of fragmentation?

That and with a vector you get log(N) allocations instead of N.

>>
>> At this point I'd rather leave DCE to DCE ...
>
> I thought since all information is there why not do it. It makes it easier to
> read the dump of the pass.

Fair enough ...

>>
>> Ick - do we really have to use gotos here?  Can you refactor this
>> a bit to avoid it?
>
> Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and
> found a way to avoid it.

Thanks.

>>
>> The whole scheme wouldn't exactly fly with the idea of eventually
>> using a lattice to propagate the symbolic numbers, but well...
>
>>
>> I think the overall idea is sound and if you have time to adjust according
>> to my comments that would be nice.
>
> To be honest I think it should wait for next stage1. After several ping I took
> another look at the testcase with the idea of measuring the size reduction
> the patch could give and was surprised that in all cases I could construct the
> size was actually bigger. Performance might be improved nonetheless but
> I think this needs more careful consideration.
>
> And as you said the approach would need to be changed if bswap was
> rewritten to do a forward analysis. At last, nobody reported a code size or
> performance regression so far due to the changes so that might be a non
> issue. If such a report happens, then it will be a good time to revisit that
> decision.
>
> Do you agree?

Yes.  If you can't come up with a testcase that it improves but it only
pessimizes cases then IMHO this would be a very premature optimization.
I suspect that we'd need to have a more complex cost analysis factoring
in that we _do_ remove a lot of code even if parts need to be kept alive.
So maybe only discard the transform if _all_ stmts need to be kept
live (as when we even remove one of the partial loads then this should
alread offset the single load we add ... for the bswap this may be
slightly different).

>
>>
>> Sorry for the very late review.
>
> That's alright, I know how it is. Thank you for keeping track of it. I 
> actually
> feel sorry I didn't warn about my findings. I thought the patch fell through
> the cracks and didn't want to spam gcc-patches uselessly.

Sure.

Thanks,
Richard.

> Best regards,
>
> Thomas
>
>
>

[PATCH, libgomp]: Require vect_simd_clones effective target for examples-4/e.53.5.{c,f90}

2014-11-19 Thread Uros Bizjak

Hello!

2014-11-19  Uros Bizjak  

* testsuite/libgomp.c/examples-4/e.53.5.c: Require
vect_simd_clones effective target.
* testsuite/libgomp.fortran/examples-4/e.53.5.f90: Ditto.

Tested on x86_64-linux-gnu {,-m32} CentOS 5.11.

OK for mainline?

Uros.
Index: testsuite/libgomp.fortran/examples-4/e.53.5.f90
===
--- testsuite/libgomp.fortran/examples-4/e.53.5.f90 (revision 217718)
+++ testsuite/libgomp.fortran/examples-4/e.53.5.f90 (working copy)
@@ -1,4 +1,4 @@
-! { dg-do run }
+! { dg-do run { target vect_simd_clones } }
 ! { dg-options "-O2" }
 ! { dg-additional-options "-msse2" { target sse2_runtime } }
 ! { dg-additional-options "-mavx" { target avx_runtime } }
Index: testsuite/libgomp.c/examples-4/e.53.5.c
===
--- testsuite/libgomp.c/examples-4/e.53.5.c (revision 217718)
+++ testsuite/libgomp.c/examples-4/e.53.5.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target vect_simd_clones } } */
 /* { dg-options "-O2" } */
 /* { dg-additional-options "-msse2" { target sse2_runtime } } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */

Re: [PATCH, libgomp]: Require vect_simd_clones effective target for examples-4/e.53.5.{c,f90}

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 11:26:25AM +0100, Uros Bizjak wrote:
> 2014-11-19  Uros Bizjak  
> 
> * testsuite/libgomp.c/examples-4/e.53.5.c: Require
> vect_simd_clones effective target.
> * testsuite/libgomp.fortran/examples-4/e.53.5.f90: Ditto.
> 
> Tested on x86_64-linux-gnu {,-m32} CentOS 5.11.
> 
> OK for mainline?

Yes, thanks.

> Index: testsuite/libgomp.fortran/examples-4/e.53.5.f90
> ===
> --- testsuite/libgomp.fortran/examples-4/e.53.5.f90   (revision 217718)
> +++ testsuite/libgomp.fortran/examples-4/e.53.5.f90   (working copy)
> @@ -1,4 +1,4 @@
> -! { dg-do run }
> +! { dg-do run { target vect_simd_clones } }
>  ! { dg-options "-O2" }
>  ! { dg-additional-options "-msse2" { target sse2_runtime } }
>  ! { dg-additional-options "-mavx" { target avx_runtime } }
> Index: testsuite/libgomp.c/examples-4/e.53.5.c
> ===
> --- testsuite/libgomp.c/examples-4/e.53.5.c   (revision 217718)
> +++ testsuite/libgomp.c/examples-4/e.53.5.c   (working copy)
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-do run { target vect_simd_clones } } */
>  /* { dg-options "-O2" } */
>  /* { dg-additional-options "-msse2" { target sse2_runtime } } */
>  /* { dg-additional-options "-mavx" { target avx_runtime } } */


Jakub

[PATCH 01/21] PR jit/63854: Fix memory leak within gcc_options

2014-11-19 Thread David Malcolm

Valgrind shows this fixes ~1 KB of leak per iteration (on x86_64) by
plugging these leaks allocated at opts.c lines 286 and 289:

==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 875 of 917
==57820==at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
==57820==by 0x595542A: init_options_struct(gcc_options*, gcc_options*) 
(opts.c:286)
==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)
==57820==
==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 876 of 917
==57820==at 0x4A081D4: calloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A6780: xcalloc (xmalloc.c:162)
==57820==by 0x595543E: init_options_struct(gcc_options*, gcc_options*) 
(opts.c:289)
==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* opts.c (finalize_options_struct): New.
* opts.h (finalize_options_struct): New.
* toplev.c (toplev::finalize): Call finalize_options_struct
on global_options and global_options_set.
---
 gcc/opts.c   | 8 
 gcc/opts.h   | 1 +
 gcc/toplev.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/gcc/opts.c b/gcc/opts.c
index d22882b..dabd3c6 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -307,6 +307,14 @@ init_options_struct (struct gcc_options *opts, struct 
gcc_options *opts_set)
   targetm_common.option_init_struct (opts);
 }
 
+/* Release any allocations owned by OPTS.  */
+
+void
+finalize_options_struct (struct gcc_options *opts)
+{
+  XDELETEVEC (opts->x_param_values);
+}
+
 /* If indicated by the optimization level LEVEL (-Os if SIZE is set,
-Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT
to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language
diff --git a/gcc/opts.h b/gcc/opts.h
index f694082..c3ec942 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -325,6 +325,7 @@ extern void decode_cmdline_options_to_array (unsigned int 
argc,
 extern void init_options_once (void);
 extern void init_options_struct (struct gcc_options *opts,
 struct gcc_options *opts_set);
+extern void finalize_options_struct (struct gcc_options *opts);
 extern void decode_cmdline_options_to_array_default_mask (unsigned int argc,
  const char **argv, 
  struct 
cl_decoded_option **decoded_options,
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 2e48047..4b4e568 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2169,4 +2169,7 @@ toplev::finalize (void)
   ipa_cp_c_finalize ();
   ipa_reference_c_finalize ();
   params_c_finalize ();
+
+  finalize_options_struct (&global_options);
+  finalize_options_struct (&global_options_set);
 }
-- 
1.8.5.3

[PATCH 02/21] PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack

2014-11-19 Thread David Malcolm

Valgrind shows this fixes ~4 KB of leak per iteration (on x86_64) by
plugging this leak allocated at reginfo.c:1327:
  gcc_obstack_init (&valid_mode_changes_obstack);

==57820== 16,256 bytes in 4 blocks are definitely lost in loss record 906 of 917
==57820==at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
==57820==by 0x30958842DB: _obstack_begin (obstack.c:184)
==57820==by 0x51C1EC1: init_subregs_of_mode() (reginfo.c:1327)
==57820==by 0x50D2A38: init_costs() (ira-costs.c:2181)
==57820==by 0x50D74A8: ira_costs() (ira-costs.c:2211)
==57820==by 0x50D1326: ira_build() (ira-build.c:3459)
==57820==by 0x50C909C: (anonymous namespace)::pass_ira::execute(function*) 
(ira.c:5227)
==57820==by 0x51884F7: execute_one_pass(opt_pass*) (passes.c:2269)
==57820==by 0x5188B75: execute_pass_list_1(opt_pass*) (passes.c:2321)
==57820==by 0x5188B87: execute_pass_list_1(opt_pass*) (passes.c:2322)
==57820==by 0x5188BC8: execute_pass_list(function*, opt_pass*) 
(passes.c:2332)

gcc/ChangeLog:
PR jit/63854
* reginfo.c (finish_subregs_of_mode): Replace obstack_finish with
obstack_free when cleaning up valid_mode_changes_obstack.
---
 gcc/reginfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index efe23cd..c2daf22 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1343,7 +1343,7 @@ void
 finish_subregs_of_mode (void)
 {
   XDELETEVEC (valid_mode_changes);
-  obstack_finish (&valid_mode_changes_obstack);
+  obstack_free (&valid_mode_changes_obstack, NULL);
 }
 
 /* Free all data attached to the structure.  This isn't a destructor because
-- 
1.8.5.3

[PATCH 06/21] PR jit/63854: Fix leak of opts_obstack

2014-11-19 Thread David Malcolm

This was leaking about 4KB per iteration:

16,256 bytes in 4 blocks are definitely lost in loss record 233 of 239
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75C17: xmalloc (xmalloc.c:147)
   by 0x30958842DB: _obstack_begin (obstack.c:184)
   by 0x5D1D317: init_options_struct(gcc_options*, gcc_options*) (opts.c:279)
   by 0x532DB0C: toplev::main(int, char**) (toplev.c:2081)
   by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* toplev.c (toplev::finalize): Free opts_obstack.
---
 gcc/toplev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 291b84d..0d617c2 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2178,4 +2178,6 @@ toplev::finalize (void)
   /* Clean up the context (and pass_manager etc). */
   delete g;
   g = NULL;
+
+  obstack_free (&opts_obstack, NULL);
 }
-- 
1.8.5.3

[PATCH 03/21] PR jit/63854: Fix memory leaks within context/pass_manager/dump_manager

2014-11-19 Thread David Malcolm

We weren't cleaning up the context, pass_manager or dump_manager.

An earlier version of the context and pass_manager classes had them
allocated in the GC-heap, but they were moved to the system heap
before that patch was committed; they were never cleaned up, so e.g.
all passes leak on each in-process iteration.

Cleaning all of this up fixes the largest of the leaks in the PR,
about 60KB per iteration:

==57820== 80,560 (88 direct, 80,472 indirect) bytes in 1 blocks are definitely 
lost in loss record 913 of 917
==57820==at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x536A462: make_pass_dce(gcc::context*) (tree-ssa-dce.c:1703)
==57820==by 0x518B8DA: gcc::pass_manager::pass_manager(gcc::context*) 
(pass-instances.def:173)
==57820==by 0x4E8D6D9: gcc::context::context() (context.c:37)
==57820==by 0x4E2ED19: toplev::main(int, char**) (toplev.c:1211)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)
==57820==
==57820== 161,488 (352 direct, 161,136 indirect) bytes in 4 blocks are 
definitely lost in loss record 917 of 917
==57820==at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x5553102: make_pass_insert_vzeroupper(gcc::context*) 
(i386.c:2581)
==57820==by 0x55571DF: ix86_option_override() (i386.c:4325)
==57820==by 0x4E2EEF4: toplev::main(int, char**) (toplev.c:1295)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)

All of this additional cleanup happens within toplev::finalize, and
hence we don't call it from cc1, cc1plus, etc, only from libgccjit.so.

Note that some calls to register_pass_info were using a static:
  static struct register_pass_info
on the stack e.g.:

 opt_pass *pass_handle_trap_shadows = make_pass_handle_trap_shadows (g);
 static struct register_pass_info handle_trap_shadows_info
   = { pass_handle_trap_shadows, "eh_ranges",
   1, PASS_POS_INSERT_AFTER
 };

This led to crashes on the 2nd iteration: the on-stack struct was only
being built on the first iteration, using the result of the first call
to the "make_pass_" function.  Subsequent calls would thus be
registering a freed pass, giving a use-after-free error (which was
previously hidden because we weren't freeing them).

The fix is to remove the "static" from such structs.

This also fixes a leak of strings within dump_file_info for passes.

gcc/ChangeLog:
PR jit/63854
* config/alpha/alpha.c (alpha_option_override): Remove static from
"handle_trap_shadows_info" and "align_insns_info".
* config/i386/i386.c (ix86_option_override): Likewise for
"insert_vzeroupper_info".
* config/rl78/rl78.c (rl78_asm_file_start): Likewise for
"rl78_devirt_info" and "rl78_move_elim_info".
* config/rs6000/rs6000.c (rs6000_option_override): Likewise for
"analyze_swaps_info".
* context.c (gcc::context::~context): New.
* context.h (gcc::context::~context): New.
* dumpfile.c (dump_files): Add "false" initializers for new field
"owns_strings".
(gcc::dump_manager::~dump_manager): New.
(gcc::dump_manager::dump_register): Add param "take_ownership".
* dumpfile.h (struct dump_file_info): Add field "owns_strings".
(gcc::dump_manager::~dump_manager): New.
(gcc::dump_manager::dump_register): Add param "take_ownership".
* pass_manager.h (gcc::pass_manager::operator delete): New.
(gcc::pass_manager::~pass_manager): New.
* passes.c (pass_manager::register_one_dump_file): Pass "true" to
new "owns_strings" argument to dump_register.
(pass_manager::operator delete): New.
(delete_pass_tree): New function.
(pass_manager::~pass_manager): New.
* statistics.c (statistics_early_init): Pass "false" to
new "owns_strings" argument to dump_register.
* toplev.c (toplev::finalize): Clean up the context and thus the
things it owns.
---
 gcc/config/alpha/alpha.c   |  4 ++--
 gcc/config/i386/i386.c |  2 +-
 gcc/config/rl78/rl78.c |  4 ++--
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/context.c  |  6 ++
 gcc/context.h  |  1 +
 gcc/dumpfile.c | 47 ++
 gcc/dumpfile.h | 11 ++-
 gcc/pass_manager.h |  2 ++
 gcc/passes.c   | 39 +-
 gcc/statistics.c   |  3 ++-
 gcc/toplev.c

[PATCH 00/21] PR 63854: Fix various memory leaks

2014-11-19 Thread David Malcolm

Running the jit testsuite under valgrind showed various memory leaks.

Some are per-invocation of the compiler, and hence only affect
libgccjit.so (although presumably it's useful to cc1 etc to get
valgrind clean).

Others appear to be per-function and hence affect the non-JIT use
cases.

The following patch kit fixes most of the leaks.

It also adds a contrib/valgrind.supp suppressions file, for sparseset
(though I see now that gcc/configure.ac can detect if
valgrind/memcheck.h is available and use it if
  --enable-valgrind-annotations
Should I instead require that option when doing such testing?).

Successfully bootstrapped®rtested the cumulative effect of the
patches on x86_64-unknown-linux-gnu (Fedora 20).

Are the non-JIT parts OK for trunk?
(Presumably I can give myself approval for the JIT parts)

David Malcolm (21):
  PR jit/63854: Fix memory leak within gcc_options
  PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack
  PR jit/63854: Fix memory leaks within
context/pass_manager/dump_manager
  PR jit/63854: Fix memory leak within bb-reorder.c
  PR jit/63854: Fix memory leak of save_decoded_options
  PR jit/63854: Fix leak of opts_obstack
  PR jit/63854: Fix leak of optimization_summary_obstack
  PR jit/63854: Add ira_costs_c_finalize
  PR jit/63854: Don't leak producer_string in dwarf2out.c
  PR jit/63854: Fix leak of worklist within jit-recording.c
  PR jit/63854: Fix leak of "avail" within tree-ssa-pre.c
  PR jit/63854: Add a valgrind suppresion file
  PR jit/63854: Add support for running "make check-jit" under valgrind
  PR jit/63854: Fix leak of paths within jump threading
  PR jit/63854: lra.c: Fix leak of point_freq_vec's buffer when calling
lra_inheritance
  PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS
  PR jit/63854: Fix leaking vec in jit
  PR jit/63854: Add "long-term" allocator to gcc::context
  PR jit/63854: Fix leak of ipa hooks
  PR jit/63854: Fix leak in ipa-icf.c
  PR jit/63854: Fix leaks in test-fuzzer.c

 contrib/valgrind.supp  | 11 ++
 gcc/bb-reorder.c   |  2 +-
 gcc/config/alpha/alpha.c   |  4 +-
 gcc/config/i386/i386.c |  2 +-
 gcc/config/rl78/rl78.c |  4 +-
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/context.c  | 24 
 gcc/context.h  | 47 +++
 gcc/dumpfile.c | 47 +--
 gcc/dumpfile.h | 11 +-
 gcc/dwarf2out.c|  2 +
 gcc/ipa-icf.c  |  1 +
 gcc/ipa-prop.c |  3 +-
 gcc/ipa-reference.c| 17 +++-
 gcc/ira-costs.c|  6 +++
 gcc/ira.h  |  3 ++
 gcc/jit/jit-playback.c | 73 +--
 gcc/jit/jit-playback.h | 27 -
 gcc/jit/jit-recording.c| 19 +
 gcc/jit/jit-recording.h| 26 ++---
 gcc/lra.c  |  1 +
 gcc/opts.c |  8 
 gcc/opts.h |  1 +
 gcc/pass_manager.h |  3 ++
 gcc/passes.c   | 39 ++-
 gcc/reginfo.c  |  2 +-
 gcc/statistics.c   |  3 +-
 gcc/testsuite/jit.dg/jit.exp   | 79 +-
 gcc/testsuite/jit.dg/test-fuzzer.c |  6 +++
 gcc/toplev.c   | 20 --
 gcc/tree-ssa-pre.c |  2 +-
 gcc/tree-ssa-threadedge.c  |  1 +
 gcc/tree-ssa-threadupdate.c|  1 +
 gcc/tree.c | 17 +---
 34 files changed, 434 insertions(+), 80 deletions(-)
 create mode 100644 contrib/valgrind.supp

-- 
1.8.5.3

[PATCH 10/21] PR jit/63854: Fix leak of worklist within jit-recording.c

2014-11-19 Thread David Malcolm

Fix this leak:

160 bytes in 5 blocks are definitely lost in loss record 154 of 228
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75D4F: xrealloc (xmalloc.c:177)
   by 0x4DE1710: void 
va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310)
   by 0x4DDFAB5: vec::reserve(unsigned int, bool) (vec.h:1428)
   by 0x4DDFBFC: vec::reserve_exact(unsigned int) (vec.h:1448)
   by 0x4DDE588: vec::create(unsigned int) (vec.h:1463)
   by 0x4DD9B9F: gcc::jit::recording::function::validate() 
(jit-recording.c:2191)
   by 0x4DD7AD3: gcc::jit::recording::context::validate() (jit-recording.c:1005)
   by 0x4DD7660: gcc::jit::recording::context::compile() (jit-recording.c:848)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/jit/ChangeLog:
PR jit/63854
* jit-recording.c (recording::function::validate): Convert
"worklist" from vec<> to autovec<> to fix a leak.
---
 gcc/jit/jit-recording.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index 8daa8f2..8cce277 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -2187,8 +2187,7 @@ recording::function::validate ()
 {
   /* Iteratively walk the graph of blocks, marking their "m_is_reachable"
 flag, starting at the initial block.  */
-  vec worklist;
-  worklist.create (m_blocks.length ());
+  auto_vec worklist (m_blocks.length ());
   worklist.safe_push (m_blocks[0]);
   while (worklist.length () > 0)
{
-- 
1.8.5.3

[PATCH 08/21] PR jit/63854: Add ira_costs_c_finalize

2014-11-19 Thread David Malcolm

this_target_ira_int->free_ira_costs () is called by ira_init_costs,
but this isn't called after the compile, causing noise when running
under valgrind.

This patch adds a ira_costs_c_finalize function to clean this up,
and calls it from toplev::finalize (and hence this isn't called by
cc1/cc1plus/etc, just by libgccjit.so).

Doing so eliminates about 27KB of "leak" from the valgrind report.

gcc/ChangeLog:
PR jit/63854
* ira-costs.c (ira_costs_c_finalize): New function.
* ira.h (ira_costs_c_finalize): New prototype.
* toplev.c (toplev::finalize): Call ira_costs_c_finalize.
---
 gcc/ira-costs.c | 6 ++
 gcc/ira.h   | 3 +++
 gcc/toplev.c| 1 +
 3 files changed, 10 insertions(+)

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 122815b..2dabead 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2356,3 +2356,9 @@ ira_adjust_equiv_reg_cost (unsigned regno, int cost)
   else
 regno_equiv_gains[regno] += cost;
 }
+
+void
+ira_costs_c_finalize (void)
+{
+  this_target_ira_int->free_ira_costs ();
+}
diff --git a/gcc/ira.h b/gcc/ira.h
index b294ea1..d62656c 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -199,4 +199,7 @@ extern bool ira_bad_reload_regno (int, rtx, rtx);
 
 extern void ira_adjust_equiv_reg_cost (unsigned, int);
 
+/* ira-costs.c */
+extern void ira_costs_c_finalize (void);
+
 #endif /* GCC_IRA_H */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 0d617c2..0181a3f 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2168,6 +2168,7 @@ toplev::finalize (void)
   gcse_c_finalize ();
   ipa_cp_c_finalize ();
   ipa_reference_c_finalize ();
+  ira_costs_c_finalize ();
   params_c_finalize ();
 
   finalize_options_struct (&global_options);
-- 
1.8.5.3

[PATCH 07/21] PR jit/63854: Fix leak of optimization_summary_obstack

2014-11-19 Thread David Malcolm

This was leaking ~4KB per iteration:

16,256 bytes in 4 blocks are definitely lost in loss record 234 of 239
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75C17: xmalloc (xmalloc.c:147)
   by 0x30958842DB: _obstack_begin (obstack.c:184)
   by 0x4DFECDE: bitmap_obstack_initialize(bitmap_obstack*) (bitmap.c:338)
   by 0x5CA3C1D: ipa_init() (ipa-reference.c:431)
   by 0x5CA3FB1: generate_summary() (ipa-reference.c:551)
   by 0x5CA4650: propagate() (ipa-reference.c:684)
   by 0x5CA5BEE: (anonymous namespace)::pass_ipa_reference::execute(function*) 
(ipa-reference.c:1178)
   by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224427: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E495C1: ipa_passes() (cgraphunit.c:2088)
   by 0x4E498E0: symbol_table::compile() (cgraphunit.c:2172)

gcc/ChangeLog:
PR jit/63854
* ipa-reference.c (ipa_reference_c_finalize): Release
optimization_summary_obstack.
---
 gcc/ipa-reference.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index b421f63..1ce06d1 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -1193,5 +1193,9 @@ make_pass_ipa_reference (gcc::context *ctxt)
 void
 ipa_reference_c_finalize (void)
 {
-  ipa_init_p = false;
+  if (ipa_init_p)
+{
+  bitmap_obstack_release (&optimization_summary_obstack);
+  ipa_init_p = false;
+}
 }
-- 
1.8.5.3

[PATCH 11/21] PR jit/63854: Fix leak of "avail" within tree-ssa-pre.c

2014-11-19 Thread David Malcolm

Fix this leak:

120 bytes in 5 blocks are definitely lost in loss record 141 of 227
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75D8F: xrealloc (xmalloc.c:177)
   by 0x550DEBA: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310)
   by 0x550D509: vec::reserve(unsigned int, bool) 
(vec.h:1428)
   by 0x550DA3E: vec::reserve_exact(unsigned int) 
(vec.h:1448)
   by 0x550D167: vec::safe_grow(unsigned int) 
(vec.h:1576)
   by 0x55076FE: do_regular_insertion(basic_block_def*, basic_block_def*) 
(tree-ssa-pre.c:3209)
   by 0x5508379: insert_aux(basic_block_def*) (tree-ssa-pre.c:3526)
   by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
   by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
   by 0x55084C0: insert() (tree-ssa-pre.c:3559)
   by 0x550C5BC: (anonymous namespace)::pass_pre::execute(function*) 
(tree-ssa-pre.c:4805)

gcc/ChangeLog:
PR jit/63854
* tree-ssa-pre.c (do_regular_insertion): Convert "avail" from
vec<> to auto_vec<> to fix a leak.
---
 gcc/tree-ssa-pre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index ea99198..027dc1c 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3202,7 +3202,7 @@ do_regular_insertion (basic_block block, basic_block dom)
   bool new_stuff = false;
   vec exprs;
   pre_expr expr;
-  vec avail = vNULL;
+  auto_vec avail;
   int i;
 
   exprs = sorted_array_from_bitmap_set (ANTIC_IN (block));
-- 
1.8.5.3

[PATCH 12/21] PR jit/63854: Add a valgrind suppresion file

2014-11-19 Thread David Malcolm

Valgrind complains about uninitialized data within sparseset_bit_p.
Provide a suppression file to silence these warnings.

Valgrind requires suppression files for C++ code to use the mangled
names, so we do that here.

contrib/ChangeLog:
PR jit/63854
* valgrind.supp: New.
---
 contrib/valgrind.supp | 11 +++
 1 file changed, 11 insertions(+)
 create mode 100644 contrib/valgrind.supp

diff --git a/contrib/valgrind.supp b/contrib/valgrind.supp
new file mode 100644
index 000..ec9ff02
--- /dev/null
+++ b/contrib/valgrind.supp
@@ -0,0 +1,11 @@
+{
+   suppress-uninit-cond-with-sparseset_bit_p
+   Memcheck:Cond
+   fun:_ZL15sparseset_bit_pP13sparseset_defm
+}
+
+{
+   suppress-uninit-use-with-sparseset_bit_p
+   Memcheck:Value8
+   fun:_ZL15sparseset_bit_pP13sparseset_defm
+}
-- 
1.8.5.3

[PATCH 13/21] PR jit/63854: Add support for running "make check-jit" under valgrind

2014-11-19 Thread David Malcolm

This commit updates jit.exp so that if RUN_UNDER_VALGRIND is present
in the environment, all of the built client code using libgccjit.so is
run under valgrind, with --leak-check=full.

Hence:
  RUN_UNDER_VALGRIND= make check-jit
will run all jit testcases under valgrind (taking 27 mins on my
machine).

Results are written to testsuite/jit/test-FOO.exe.valgrind.txt

jit.exp automatically parses these result file, looking for lines of
the form
  definitely lost: 11,316 bytes in 235 blocks
  indirectly lost: 352 bytes in 4 blocks
in the valgrind log's summary footer, adding PASSes if they are zero
bytes, and, for now generating XFAILs for non-zero bytes.

Sadly this diverges jit.exp's fixed_host_execute further from DejaGnu's
host_execute, but I don't see a clean way to fix that.

This currently adds 63 PASSes and 49 XFAILs to jit.sum, giving:
  # of expected passes   2481
  # of expected failures 49

gcc/testsuite/ChangeLog:
PR jit/63854
* jit.dg/jit.exp (report_leak): New.
(parse_valgrind_logfile): New.
(fixed_host_execute): Detect if RUN_UNDER_VALGRIND is present
in the environment, and if so, run the executable under
valgrind, capturing valgrind's output to a logfile.  Parse the
log file, generating PASSes and XFAILs for the summary of leaks.
---
 gcc/testsuite/jit.dg/jit.exp | 79 +++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
index 531e929..26ab127 100644
--- a/gcc/testsuite/jit.dg/jit.exp
+++ b/gcc/testsuite/jit.dg/jit.exp
@@ -23,6 +23,48 @@ load_lib target-libpath.exp
 load_lib gcc.exp
 load_lib dejagnu.exp
 
+# Look for lines of the form:
+#   definitely lost: 11,316 bytes in 235 blocks
+#   indirectly lost: 352 bytes in 4 blocks
+# Ideally these would report zero bytes lost (which is a PASS);
+# for now, report non-zero leaks as XFAILs.
+proc report_leak {kind name logfile line} {
+set match [regexp "$kind lost: .*" $line result]
+if $match {
+   verbose "Saw \"$result\" within \"$line\"" 3
+   # Extract bytes and blocks.
+   # These can contain commas as well as numerals,
+   # but we only care about whether we have zero.
+   regexp "$kind lost: (.+) bytes in (.+) blocks" \
+   $result -> bytes blocks
+   verbose "bytes: '$bytes'" 2
+   verbose "blocks: '$blocks'" 2
+   if { $bytes == 0 } {
+   pass "$name: $logfile: $result"
+   } else {
+   xfail "$name: $logfile: $result"
+   }
+}
+}
+
+proc parse_valgrind_logfile {name logfile} {
+verbose "parse_valgrind_logfile: $logfile" 2
+if [catch {set f [open $logfile]}] {
+   fail "$name: unable to read $logfile"
+   return
+}
+
+while { [gets $f line] >= 0 } {
+   # Strip off the PID prefix e.g. ==7675==
+   set line [regsub "==\[0-9\]*== " $line ""]
+   verbose $line 2
+
+   report_leak "definitely" $name $logfile $line
+   report_leak "indirectly" $name $logfile $line
+}
+close $f
+}
+
 # This is host_execute from dejagnu.exp commit
 #   126a089777158a7891ff975473939f08c0e31a1c
 # with the following patch applied, and renaming to "fixed_host_execute".
@@ -49,8 +91,11 @@ load_lib dejagnu.exp
 #  if there was a problem.
 #
 proc fixed_host_execute {args} {
+global env
 global text
 global spawn_id
+global srcdir
+verbose "srcdir: $srcdir" 2
 
 set timeoutmsg "Timed out: Never got started, "
 set timeout 100
@@ -75,7 +120,28 @@ proc fixed_host_execute {args} {
 # spawn the executable and look for the DejaGnu output messages from the
 # test case.
 # spawn -noecho -open [open "|./${executable}" "r"]
-spawn -noecho "./${executable}" ${params}
+
+set run_under_valgrind [info exists env(RUN_UNDER_VALGRIND)]
+
+if $run_under_valgrind {
+   set valgrind_logfile "${executable}.valgrind.txt"
+   set valgrind_params {"valgrind"}
+   lappend valgrind_params "--leak-check=full"
+   # srcdir is within gcc/testsuite; locate "contrib" relative to it:
+   lappend valgrind_params 
"--suppressions=${srcdir}/../../contrib/valgrind.supp"
+   lappend valgrind_params "--log-file=${valgrind_logfile}"
+} else {
+   set valgrind_params {}
+}
+verbose "valgrind_params: $valgrind_params" 2
+
+set args ${valgrind_params}
+lappend args "./${executable}"
+set args [concat $args ${params}]
+verbose "args: $args" 2
+
+eval spawn -noecho $args
+
 expect_after full_buffer { error "got full_buffer" }
 
 set prefix "\[^\r\n\]*"
@@ -147,6 +213,17 @@ proc fixed_host_execute {args} {
 # force a close of the executable to be safe.
 catch close
 
+# valgrind might not have finished writing the log out
+# before we parse it; need to wait for spawnee to
+# finish.
+catch wait reason
+verbose "reason: $reason" 2
+
+if $run_under_valgrind {
+

[PATCH 15/21] PR jit/63854: lra.c: Fix leak of point_freq_vec's buffer when calling lra_inheritance

2014-11-19 Thread David Malcolm

Valgrind showed a leak of 1640 bytes per iteration of one of the jit
testcases (I think this is per-function in a compile):

8,200 bytes in 5 blocks are definitely lost in loss record 214 of 223
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75D1F: xrealloc (xmalloc.c:177)
   by 0x4E0C94E: void va_heap::reserve(vec*&, 
unsigned int, bool) (vec.h:310)
   by 0x4E0C7A3: vec::reserve(unsigned int, bool) 
(vec.h:1428)
   by 0x4E0C55A: vec::reserve_exact(unsigned int) 
(vec.h:1448)
   by 0x4E0C2C6: vec::create(unsigned int) (vec.h:1463)
   by 0x519B316: lra_create_live_ranges(bool) (lra-lives.c:1248)
   by 0x5179BD7: lra(_IO_FILE*) (lra.c:2297)
   by 0x5126C4E: do_reload() (ira.c:5380)
   by 0x5127098: (anonymous namespace)::pass_reload::execute(function*) 
(ira.c:5550)
   by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358)

which is this line:
  point_freq_vec.create (get_max_uid () * 2);

point_freq_vec's buffer is released in lra_clear_live_ranges.

Am unfamiliar with LRA, but my reading of the lra code is that the
"live_p" boolean tracks whether or not we need to call
lra_clear_live_ranges.

Debugging calls to the above line and calls to lra_clear_live_ranges
showed a mismatch; the call to lra_create_live_ranges before
lra_inheritance wasn't setting live_p.

Setting it after that callsite fixes the leak (though perhaps the code
could be cleaned up by having live_p be set/cleared by the
allocation/clear functions themselves, rather than by their callers?
Vladimir?)

gcc/ChangeLog:
PR jit/63854
* lra.c (lra): After creating live ranges in preparation for call
to lra_inheritance, set live_p to true.
---
 gcc/lra.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lra.c b/gcc/lra.c
index 9309d5e..ec122c7 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -2297,6 +2297,7 @@ lra (FILE *f)
 actual_call_used_reg_set,  which is needed during
 lra_inheritance.  */
  lra_create_live_ranges (true, true);
+ live_p = true;
}
  lra_inheritance ();
}
-- 
1.8.5.3

[PATCH 04/21] PR jit/63854: Fix memory leak within bb-reorder.c

2014-11-19 Thread David Malcolm

Valgrind showed this leaking 200 bytes per iteration in one of
my testcases:

1,000 bytes in 5 blocks are definitely lost in loss record 200 of 241
   at 0x4A083AA: realloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75C5C: xrealloc (xmalloc.c:179)
   by 0x4E10734: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:310
)
   by 0x4E105BB: vec::reserve(unsigned int, 
bool) (vec.h:1428)
   by 0x4E15B37: vec::safe_push(basic_block_def* const&) (vec.h:1537)
   by 0x5B61F44: find_rarely_executed_basic_blocks_and_crossing_edges() 
(bb-reorder.c:1614)
   by 0x5B63E90: (anonymous 
namespace)::pass_partition_blocks::execute(function*) (bb-reorder.c:2711)
   by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x52237C4: execute_pass_list_1(opt_pass*) (passes.c:2358)
   by 0x52237F5: execute_pass_list_1(opt_pass*) (passes.c:2359)
   by 0x5223832: execute_pass_list(function*, opt_pass*) (passes.c:2369)
   by 0x4E4884F: cgraph_node::expand() (cgraphunit.c:1773)

Fix is trivial.

gcc/ChangeLog:
PR jit/63854
* bb-reorder.c
(find_rarely_executed_basic_blocks_and_crossing_edges): Convert
local bbs_in_hot_partition from vec<> to auto_vec<>.
---
 gcc/bb-reorder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 1f7c3ee..4ca3bb2 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -1582,7 +1582,7 @@ find_rarely_executed_basic_blocks_and_crossing_edges 
(void)
   edge e;
   edge_iterator ei;
   unsigned int cold_bb_count = 0;
-  vec bbs_in_hot_partition = vNULL;
+  auto_vec bbs_in_hot_partition;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB_FN (bb, cfun)
-- 
1.8.5.3

[PATCH 17/21] PR jit/63854: Fix leaking vec in jit

2014-11-19 Thread David Malcolm

This fixes various leaks of vec buffers seen via valgrind within jit
(both recording and playback).

Various vec<> within jit::recording are converted to auto_vec<>.

Various playback::wrapper subclasses containing vec<> gain a finalizer
so they can release the vec when they are collected.

gcc/jit/ChangeLog:
PR jit/63854
* jit-playback.c (gcc::jit::playback::compound_type::set_fields):
Convert param from const vec & to
const auto_vec *.
(gcc::jit::playback::context::new_function_type): Convert param
"param_types" from vec * to const auto_vec *.
(gcc::jit::playback::context::new_function): Convert param
"params" from vec * to const auto_vec *.
(gcc::jit::playback::context::build_call): Convert param "args"
from vec to const auto_vec *.
(gcc::jit::playback::context::new_call): Likewise.
(gcc::jit::playback::context::new_call_through_ptr): Likewise.
(wrapper_finalizer): New function.
(gcc::jit::playback::wrapper::operator new): Call the finalizer
variant of ggc_internal_cleared_alloc, supplying
wrapper_finalizer.
(gcc::jit::playback::function::finalizer): New.
(gcc::jit::playback::block::finalizer): New.
(gcc::jit::playback::source_file::finalizer): New.
(gcc::jit::playback::source_line::finalizer): New.

* jit-playback.h
(gcc::jit::playback::context::new_function_type): Convert param
"param_types" from vec * to const auto_vec *.
(gcc::jit::playback::context::new_function): Convert param
"params" from vec * to const auto_vec *.
(gcc::jit::playback::context::new_call): Convert param
"args" from vec to const auto_vec *.
(gcc::jit::playback::context::new_call_through_ptr): Likewise.
(gcc::jit::playback::context::build_call): Likewise.
(gcc::jit::playback::context): Convert fields "m_functions",
"m_source_files", "m_cached_locations" from vec to auto_vec.
(gcc::jit::playback::wrapper::finalizer): New virtual function.
(gcc::jit::playback::compound_type::set_fields): Convert param fro
const vec & to
const auto_vec *.
(gcc::jit::playback::function::finalizer): New.
(gcc::jit::playback::block::finalizer): New.
(gcc::jit::playback::source_file::finalizer): New.
(gcc::jit::playback::source_line::finalizer): New.

* jit-recording.c
(gcc::jit::recording::function_type::replay_into): Convert local
from a vec into an auto_vec.
(gcc::jit::recording::fields::replay_into): Likewise.
(gcc::jit::recording::function::replay_into): Likewise.
(gcc::jit::recording::call::replay_into): Likewise.
(gcc::jit::recording::call_through_ptr::replay_into): Likewise.

* jit-recording.h (gcc::jit::recording::context): Convert fields
"m_mementos", "m_compound_types", "m_functions" from vec<> to
auto_vec <>.
(gcc::jit::recording::function_type::get_param_types): Convert
return type from vec to const vec &.
(gcc::jit::recording::function_type): Convert field
"m_param_types" from a vec<> to an auto_vec<>.
(gcc::jit::recording::fields): Likewise for field "m_fields".
(gcc::jit::recording::function::get_params): Convert return type
from vec  to const vec &.
(gcc::jit::recording::function): Convert fields "m_params",
"m_locals", "m_blocks" from vec<> to auto_vec<>.
(gcc::jit::recording::block): Likewise for field "m_statements".
vec<> to auto_vec<>.
(gcc::jit::recording::call): Likewise for field "m_args".
(gcc::jit::recording::call_through_ptr): Likewise.
---
 gcc/jit/jit-playback.c  | 73 +
 gcc/jit/jit-playback.h  | 27 --
 gcc/jit/jit-recording.c | 16 +--
 gcc/jit/jit-recording.h | 26 +-
 4 files changed, 100 insertions(+), 42 deletions(-)

diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 285a3ef..8fdfa29 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -285,15 +285,15 @@ new_compound_type (location *loc,
 }
 
 void
-playback::compound_type::set_fields (const vec &fields)
+playback::compound_type::set_fields (const auto_vec *fields)
 {
   /* Compare with c/c-decl.c: finish_struct. */
   tree t = as_tree ();
 
   tree fieldlist = NULL;
-  for (unsigned i = 0; i < fields.length (); i++)
+  for (unsigned i = 0; i < fields->length (); i++)
 {
-  field *f = fields[i];
+  field *f = (*fields)[i];
   DECL_CONTEXT (f->as_tree ()) = t;
   fieldlist = chainon (f->as_tree (), fieldlist);
 }
@@ -309,7 +309,7 @@ playback::compound_type::set_fields (const 
vec &fields)
 playback::type *
 playback::context::
 new_function_type (type *return_type,
-  vec *param_types,
+  const auto_vec *param_type

[PATCH 16/21] PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS

2014-11-19 Thread David Malcolm

Valgrind showed a per-iteration leak of pass_ipa_pta (and presumably
pass_omp_simd_clone):

704 (352 direct, 352 indirect) bytes in 4 blocks are definitely lost in loss 
record 198 of 241
   at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x555A55D: make_pass_ipa_pta(gcc::context*) (tree-ssa-structalias.c:7425)
   by 0x5219FDA: gcc::pass_manager::pass_manager(gcc::context*) 
(pass-instances.def:141)
   by 0x4E4F5D1: gcc::context::context() (context.c:37)
   by 0x532C07B: general_init(char const*) (toplev.c:1212)
   by 0x532DB5E: toplev::main(int, char**) (toplev.c:2074)
   by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401DC4: test_jit (harness.h:190)
   by 0x401EA8: main (harness.h:232)

Investigation showed that ~pass_manager wasn't deleting these
passes since for some reason GCC_PASS_LISTS didn't contain
all_late_ipa_passes and so wasn't calling delete_pass_tree on it.

Add it to GCC_PASS_LISTS, fixing the leak.

gcc/ChangeLog:
PR jit/63854
* pass_manager.h (GCC_PASS_LISTS): Add all_late_ipa_passes.
---
 gcc/pass_manager.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 82857a9..e6658ad 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -29,6 +29,7 @@ struct register_pass_info;
   DEF_PASS_LIST (all_lowering_passes) \
   DEF_PASS_LIST (all_small_ipa_passes) \
   DEF_PASS_LIST (all_regular_ipa_passes) \
+  DEF_PASS_LIST (all_late_ipa_passes) \
   DEF_PASS_LIST (all_passes)
 
 #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST,
-- 
1.8.5.3

[PATCH 20/21] PR jit/63854: Fix leak in ipa-icf.c

2014-11-19 Thread David Malcolm

ipa-icf.c was leaking 16 bytes per iteration in the jit testcases here:

80 bytes in 5 blocks are definitely lost in loss record 94 of 199
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D764B7: xmalloc (xmalloc.c:147)
   by 0x5C962FF: ipa_icf::sem_item_optimizer::get_group_by_hash(unsigned int, 
ipa_icf::sem_item_type) (ipa-icf.c:1503)
   by 0x5C96C8C: ipa_icf::sem_item_optimizer::build_hash_based_classes() 
(ipa-icf.c:1723)
   by 0x5C96765: ipa_icf::sem_item_optimizer::execute() (ipa-icf.c:1616)
   by 0x5C98E81: ipa_icf::ipa_icf_driver() (ipa-icf.c:2373)
   by 0x5C993C8: ipa_icf::pass_ipa_icf::execute(function*) (ipa-icf.c:2421)
   by 0x5223CCD: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224BA7: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

by not freeing the allocation here:

  congruence_class_group *item = XNEW (congruence_class_group);

in sem_item_optimizer::get_group_by_hash.

Fix it.

gcc/ChangeLog:
PR jit/63854
* ipa-icf.c (sem_item_optimizer::~sem_item_optimizer): Free each
congruence_class_group *.
---
 gcc/ipa-icf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 2d5fcf5..b0ba82e 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1322,6 +1322,7 @@ sem_item_optimizer::~sem_item_optimizer ()
delete (*it)->classes[i];
 
   (*it)->classes.release ();
+  free (*it);
 }
 
   m_items.release ();
-- 
1.8.5.3

[PATCH 19/21] PR jit/63854: Fix leak of ipa hooks

2014-11-19 Thread David Malcolm

This fixes three leaks in IPA seen in jit testcases with valgrind:

This one:
96 bytes in 4 blocks are definitely lost in loss record 102 of 205
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76447: xmalloc (xmalloc.c:147)
   by 0x4E35C23: symbol_table::add_cgraph_insertion_hook(void (*)(cgraph_node*, 
void*), void*) (cgraph.c:383)
   by 0x51070C6: ipa_register_cgraph_hooks() (ipa-prop.c:3864)
   by 0x5C753D8: ipcp_generate_summary() (ipa-cp.c:3786)
   by 0x5223540: execute_ipa_summary_passes(ipa_opt_pass_d*) (passes.c:2102)
   by 0x4E49A60: ipa_passes() (cgraphunit.c:2074)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)
   by 0x4DC999C: jit_langhook_write_globals() (dummy-frontend.c:216)
   by 0x532B3A6: compile_file() (toplev.c:583)
   by 0x532E15F: do_compile() (toplev.c:2020)

appears to be caused by
  ipa-prop.c (ipa_register_cgraph_hooks)
unconditionally inserting ipa_add_new_function as
"function_insertion_hook_holder", rather than if the latter is
non-NULL, like the other hooks.

These two in ipa-reference.c:
96 bytes in 4 blocks are definitely lost in loss record 103 of 205
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76447: xmalloc (xmalloc.c:147)
   by 0x4E35AA9: symbol_table::add_cgraph_removal_hook(void (*)(cgraph_node*, 
void*), void*) (cgraph.c:329)
   by 0x5CA446E: ipa_init() (ipa-reference.c:435)
   by 0x5CA47D1: generate_summary() (ipa-reference.c:551)
   by 0x5CA4E70: propagate() (ipa-reference.c:684)
   by 0x5CA640E: (anonymous namespace)::pass_ipa_reference::execute(function*) 
(ipa-reference.c:1178)
   by 0x5223CC1: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224B9B: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

96 bytes in 4 blocks are definitely lost in loss record 104 of 205
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76447: xmalloc (xmalloc.c:147)
   by 0x4E35E29: symbol_table::add_cgraph_duplication_hook(void 
(*)(cgraph_node*, cgraph_node*, void*), void*) (cgraph.c:453)
   by 0x5CA4493: ipa_init() (ipa-reference.c:437)
   by 0x5CA47D1: generate_summary() (ipa-reference.c:551)
   by 0x5CA4E70: propagate() (ipa-reference.c:684)
   by 0x5CA640E: (anonymous namespace)::pass_ipa_reference::execute(function*) 
(ipa-reference.c:1178)
   by 0x5223CC1: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224B9B: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

appear to be due to th hooks never being removed.

My patch hacks in a removal of them into ipa_reference_c_finalize, but
I suspect there's a cleaner place to put this.

gcc/ChangeLog:
PR jit/63854
* ipa-prop.c (ipa_register_cgraph_hooks): Guard insertion of
ipa_add_new_function on function_insertion_hook_holder being
non-NULL.
* ipa-reference.c (ipa_reference_c_finalize): Remove
node_removal_hook_holder and node_duplication_hook_holder if
they've been added to symtab.
* toplev.c (toplev::finalize): Call ipa_reference_c_finalize
before cgraph_c_finalize so that the former can access "symtab".
---
 gcc/ipa-prop.c  |  3 ++-
 gcc/ipa-reference.c | 11 +++
 gcc/toplev.c|  4 +++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index f87243c..a8238ac 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3552,7 +3552,8 @@ ipa_register_cgraph_hooks (void)
   if (!node_duplication_hook_holder)
 node_duplication_hook_holder =
   symtab->add_cgraph_duplication_hook (&ipa_node_duplication_hook, NULL);
-  function_insertion_hook_holder =
+  if (!function_insertion_hook_holder)
+function_insertion_hook_holder =
   symtab->add_cgraph_insertion_hook (&ipa_add_new_function, NULL);
 }
 
diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index 1ce06d1..b046f9e 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -1198,4 +1198,15 @@ ipa_reference_c_finalize (void)
   bitmap_obstack_release (&optimization_summary_obstack);
   ipa_init_p = false;
 }
+
+  if (node_removal_hook_holder)
+{
+  symtab->remove_cgraph_removal_hook (node_removal_hook_holder);
+  node_removal_hook_holder = NULL;
+}
+  if (node_duplication_hook_holder)
+{
+  symtab->remove_cgraph_duplication_hook (node_duplication_hook_holder);
+  node_duplication_hook_holder = NULL;
+}
 }
diff --git a/gcc/toplev.c b/gcc/toplev.c
index c0c418c..96e7af9 100644
--- a/gcc

[PATCH 09/21] PR jit/63854: Don't leak producer_string in dwarf2out.c

2014-11-19 Thread David Malcolm

Fix this small per-iteration leak with debuginfo:

424 bytes in 4 blocks are definitely lost in loss record 185 of 230
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75CA7: xmalloc (xmalloc.c:147)
   by 0x4ECE9E4: gen_producer_string() (dwarf2out.c:19489)
   by 0x4EDB2C8: dwarf2out_finish(char const*) (dwarf2out.c:24257)
   by 0x532AD3B: compile_file() (toplev.c:623)
   by 0x532D9E1: do_compile() (toplev.c:2020)
   by 0x532DC54: toplev::main(int, char**) (toplev.c:2117)
   by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* dwarf2out.c (dwarf2out_c_finalize): Free producer_string.
---
 gcc/dwarf2out.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index b16883f..9069f9a 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -24741,6 +24741,8 @@ dwarf2out_c_finalize (void)
   frame_pointer_fb_offset = 0;
   frame_pointer_fb_offset_valid = false;
   base_types.release ();
+  XDELETEVEC (producer_string);
+  producer_string = NULL;
 }
 
 #include "gt-dwarf2out.h"
-- 
1.8.5.3

[PATCH 05/21] PR jit/63854: Fix memory leak of save_decoded_options

2014-11-19 Thread David Malcolm

This commit fixes this leak from opts-common.c, about 4KB per iteration.

==57820== 18,816 (2,560 direct, 16,256 indirect) bytes in 4 blocks are 
definitely lost in loss record 907 of 917
==57820==at 0x4A083AA: realloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A67CC: xrealloc (xmalloc.c:179)
==57820==by 0x59587C9: decode_cmdline_options_to_array(unsigned int, char 
const**, unsigned int, cl_decoded_option**, unsigned int*) (opts-common.c:885)
==57820==by 0x4E2ED90: toplev::main(int, char**) (toplev.c:2089)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* toplev.c (toplev::finalize): Clean up save_decoded_options.
---
 gcc/toplev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 876279f..291b84d 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2173,6 +2173,8 @@ toplev::finalize (void)
   finalize_options_struct (&global_options);
   finalize_options_struct (&global_options_set);
 
+  XDELETEVEC (save_decoded_options);
+
   /* Clean up the context (and pass_manager etc). */
   delete g;
   g = NULL;
-- 
1.8.5.3

[PATCH 18/21] PR jit/63854: Add "long-term" allocator to gcc::context

2014-11-19 Thread David Malcolm

Some places in the startup code use char * values that can sometimes be
string literals, and can sometimes be dynamically built using xstrdup or
concat.

This isn't a problem for cc1 etc since this is only called once, but
for libgccjit they are small per-invocation leaks.

There's no clean way to release them: retrofitting logic to decide if
we're dealing with a string literal vs a dynamically-allocated buffer
(and if something else is pointing to said buffer) is messy and
error-prone; they are also unconnected to the GC.

Hence the cleanest way to fix the leak is to add a new "long-term"
allocator, to gcc::context.  This gives back buffers that will persist
until the gcc::context is cleaned away.

Using this fixes these leaks seen via valgrind:

28 bytes in 4 blocks are definitely lost in loss record 13 of 209
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76417: xmalloc (xmalloc.c:147)
   by 0x5D76509: xstrdup (xstrdup.c:34)
   by 0x532CC38: process_options() (toplev.c:1316)
   by 0x532DFF4: do_compile() (toplev.c:1976)
   by 0x532E3B0: toplev::main(int, char**) (toplev.c:2117)
   by 0x4DE7983: gcc::jit::playback::context::compile() (jit-playback.c:1646)
   by 0x4DD791A: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5E12: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

108 bytes in 12 blocks are definitely lost in loss record 135 of 241
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75E07: xmalloc (xmalloc.c:147)
   by 0x5D7098B: concat (concat.c:147)
   by 0x563A342: build_common_builtin_nodes() (tree.c:10145)
   by 0x4DC91FC: jit_langhook_init() (dummy-frontend.c:128)
   by 0x532D3F4: lang_dependent_init(char const*) (toplev.c:1790)
   by 0x532D9C8: do_compile() (toplev.c:2007)
   by 0x532DCAC: toplev::main(int, char**) (toplev.c:2118)
   by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401DC4: test_jit (harness.h:190)

108 bytes in 12 blocks are definitely lost in loss record 136 of 241
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75E07: xmalloc (xmalloc.c:147)
   by 0x5D7098B: concat (concat.c:147)
   by 0x563A3B6: build_common_builtin_nodes() (tree.c:10151)
   by 0x4DC91FC: jit_langhook_init() (dummy-frontend.c:128)
   by 0x532D3F4: lang_dependent_init(char const*) (toplev.c:1790)
   by 0x532D9C8: do_compile() (toplev.c:2007)
   by 0x532DCAC: toplev::main(int, char**) (toplev.c:2118)
   by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401DC4: test_jit (harness.h:190)

136 bytes in 4 blocks are definitely lost in loss record 129 of 209
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76417: xmalloc (xmalloc.c:147)
   by 0x532BE28: init_asm_output(char const*) (toplev.c:936)
   by 0x532DB34: lang_dependent_init(char const*) (toplev.c:1795)
   by 0x532E0CC: do_compile() (toplev.c:2006)
   by 0x532E3B0: toplev::main(int, char**) (toplev.c:2117)
   by 0x4DE7983: gcc::jit::playback::context::compile() (jit-playback.c:1646)
   by 0x4DD791A: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5E12: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* context.c (gcc::context::context): Initialize m_longterm_obstack.
(gcc::context::~context): Free m_longterm_obstack.
(gcc::context::longterm_xalloc): New method.
(gcc::context::longterm_xstrdup): New method.
* context.h: Include obstack.h
(gcc::context::longterm_xalloc): New method.
(gcc::context::longterm_xallocvec): New method
(gcc::context::longterm_xstrdup): New method.
(gcc::context::m_longterm_obstack): New field.
* toplev.c (init_asm_output): Use g->longterm_xallocvec 
rather than XNEWVEC when allocating "dumpname" to avoid a leak.
(process_options): Likewise, use g->longterm_xstrdup rather than
xstrdup when allocating "name" (and hence aux_base_name).
* tree.c: Include context.h.
(build_common_builtin_nodes): When writing dynamically-allocated
entries into built_in_names, use g->longterm_xstrdup to take a
copy of the buffer acquired by concat to avoid a leak.
---
 gcc/context.c | 18 ++
 gcc/context.h | 46 ++
 gcc/toplev.c  |  4 ++--
 gcc/tree.c| 17 +++--
 4 files changed, 77 insertions(+), 8 dele

[PATCH 14/21] PR jit/63854: Fix leak of paths within jump threading

2014-11-19 Thread David Malcolm

Paths are allocated as:
   vec *path = new vec ();
i.e. the vec itself is on the heap, so a mere:
  path->release ();
is merely releasing the buffer the vec holds, not the vec itself.

This patch updates the two sites that release paths to also delete them,
fixing leaks like this seen by valgrind:
160 bytes in 20 blocks are definitely lost in loss record 165 of 241
   at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x556E5AF: thread_across_edge(gimple_statement_base*, edge_def*, bool, 
vec*, tree_node
* (*)(gimple_statement_base*, gimple_statement_base*)) 
(tree-ssa-threadedge.c:1124)
   by 0x547B73B: dom_opt_dom_walker::thread_across_edge(edge_def*) 
(tree-ssa-dom.c:1184)
   by 0x5480183: dom_opt_dom_walker::after_dom_children(basic_block_def*) 
(tree-ssa-dom.c:2015)
   by 0x5C1C5F7: dom_walker::walk(basic_block_def*) (domwalk.c:218)
   by 0x547AE83: (anonymous namespace)::pass_dominator::execute(function*) 
(tree-ssa-dom.c:919)
   by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358)
   by 0x5223865: execute_pass_list_1(opt_pass*) (passes.c:2359)
   by 0x52238A2: execute_pass_list(function*, opt_pass*) (passes.c:2369)
   by 0x4E4888F: cgraph_node::expand() (cgraphunit.c:1773)
   by 0x4E48F29: expand_all_functions() (cgraphunit.c:1909)

[Space-wise, a vec  is just a pointer, hence the 8 bytes
per block seen in the valgrind report]

gcc/ChangeLog:
PR jit/63854
* tree-ssa-threadedge.c (thread_across_edge): Don't just release
"path", delete it.
* tree-ssa-threadupdate.c (delete_jump_thread_path): Likewise.
---
 gcc/tree-ssa-threadedge.c   | 1 +
 gcc/tree-ssa-threadupdate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index d5b9696..9ee1cae 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1149,6 +1149,7 @@ thread_across_edge (gimple dummy_cond,
 through the vector entries.  */
   gcc_assert (path->length () == 0);
   path->release ();
+  delete path;
 
   /* A negative status indicates the target block was deemed too big to
 duplicate.  Just quit now rather than trying to use the block as
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 151ed83..5cb9f86 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2481,6 +2481,7 @@ delete_jump_thread_path (vec *path)
   for (unsigned int i = 0; i < path->length (); i++)
 delete (*path)[i];
   path->release();
+  delete path;
 }
 
 /* Register a jump threading opportunity.  We queue up all the jump
-- 
1.8.5.3

[PATCH 21/21] PR jit/63854: Fix leaks in test-fuzzer.c

2014-11-19 Thread David Malcolm

gcc/testsuite/ChangeLog:
PR jit/63854
* jit.dg/test-fuzzer.c (fuzzer_init): Free malloced buffers.
(make_random_function): Free ff->locals.
---
 gcc/testsuite/jit.dg/test-fuzzer.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/testsuite/jit.dg/test-fuzzer.c 
b/gcc/testsuite/jit.dg/test-fuzzer.c
index f363f8f..b501792 100644
--- a/gcc/testsuite/jit.dg/test-fuzzer.c
+++ b/gcc/testsuite/jit.dg/test-fuzzer.c
@@ -105,6 +105,11 @@ fuzzer_init (fuzzer *f, gcc_jit_context *ctxt, unsigned 
int seed)
 
   for (i = 0; i < num_funcs; i++)
 f->funcs[f->num_funcs++] = make_random_function (f);
+
+  /* Now clean out f.  */
+  free (f->types);
+  free (f->funcs);
+  free (f->globals);
 }
 
 /* Get random int in inclusive range [min, max].  */
@@ -309,6 +314,7 @@ make_random_function (fuzzer *f)
 
   gcc_jit_function *result = ff->fn;
 
+  free (ff->locals);
   free (ff->params);
   free (ff);
 
-- 
1.8.5.3

Re: [PING][PATCH][AARCH64]Fix PR63424 by adding v2di3 pattern

2014-11-19 Thread Renlin Li


On 10/11/14 16:55, Renlin Li wrote:

On 06/11/14 15:00, Renlin Li wrote:

Hi all,

Dose anybody have time to review this?

Kind regards,
Renlin Li

On 31/10/14 14:51, Renlin Li wrote:

Hi all,

This is a patch which will fix PR63424.

It implements signed/unsigned max/min pattern for V2DI mode in terms 
of vcondv2div2di pattern.


In this particular case, VEC_COND_EXPR (V2DImode) is generated as 
aarch64 target supports it (vcond for VALL). The 
VEC_COND_EXPR will further folded into MIN_EXPR/MAX_EXPR in dom pass 
unconditionally. Later in expand pass, the compiler tries to expand 
min_expr using standard RTL operation. It fails, because aarch64 
target don't have minv2di3 pattern implemented. It then tries to 
generate conditional move and compare&branch sequence, all fails. At 
last it falls into libfunc call, no luck either. An ICE to complain 
about this.


aarch64-none-elf toolchain has been tested on the model, no 
regressions.


Is it Okay for trunk?

gcc/ChangeLog:

2014-10-31  Renlin Li  
PR target/63424
* config/aarch64/aarch64-simd.md (v2di3): New.

gcc/testsuite/ChangeLog:

2014-10-31  Renlin Li  
PR target/63424
* gcc.target/aarch64/pr63424.c: New.



Hi,

Dose anybody have time to review this?

Thank you so much!

Regards,
Renlin Li


Ping again.

Regards,
Renlin Li

[PATCH] Fix for fold_negate_expr (PR sanitizer/63879)

2014-11-19 Thread Marek Polacek

The problem here is that negate_expr_p returns true (as it should)
for UINT_MAX(OVF), but fold_negate_expr didn't actually fold it,
and that is a no-no; if negate_expr_p is true, fold_negate_expr must
not return NULL_TREE.  I added the following hunk for bootstrap and
regtest for better coverage:

@@ -752,6 +755,12 @@ fold_negate_expr (location_t loc, tree t)
   break;
 }
 
+#ifdef ENABLE_CHECKING
+  if (negate_expr_p (t))
+internal_error ("fold_negate_expr should never return NULL_TREE "
+   "if negate_expr_p is true");
+#endif
+
   return NULL_TREE;
 }

Fixed by adjusting fold_negate_expr so that it folds non-trapping
wrapping constants.  The SANITIZE_SI_OVERFLOW check is important
to not regress -Woverflow warnings (ug).

negate_expr_p hunk is to match the NEGATE_EXPR case in fold_negate_expr.
 
Bootstrapped/regtested on power8-linux, ok for trunk?

2014-11-19  Marek Polacek  

PR sanitizer/63879
* fold-const.c (negate_expr_p) : Return
!TYPE_OVERFLOW_SANITIZED.
(fold_negate_expr) : Fold when overflow
does not trap and when overflow wraps, or when SANITIZE_SI_OVERFLOW
is 0.

* c-c++-common/ubsan/pr63879-1.c: New test.
* c-c++-common/ubsan/pr63879-2.c: New test.

diff --git gcc/fold-const.c gcc/fold-const.c
index f6fb8af..719e06e 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -408,9 +408,11 @@ negate_expr_p (tree t)
  && TYPE_OVERFLOW_WRAPS (type));
 
 case FIXED_CST:
-case NEGATE_EXPR:
   return true;
 
+case NEGATE_EXPR:
+  return !TYPE_OVERFLOW_SANITIZED (type);
+
 case REAL_CST:
   /* We want to canonicalize to positive real constants.  Pretend
  that only negative ones can be easily negated.  */
@@ -555,7 +557,8 @@ fold_negate_expr (location_t loc, tree t)
   tem = fold_negate_const (t, type);
   if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
  || (!TYPE_OVERFLOW_TRAPS (type)
- && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0))
+ && TYPE_OVERFLOW_WRAPS (type))
+ || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
return tem;
   break;
 
diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-1.c 
gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
index e69de29..2035849 100644
--- gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
+++ gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
@@ -0,0 +1,23 @@
+/* PR sanitizer/63879 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined" } */
+
+struct A
+{
+  int inode;
+} * a;
+int b, c;
+void
+fn1 ()
+{
+  int d = 0;
+  while (b)
+{
+  if (a->inode)
+d++;
+  a = 0;
+}
+  c = d - 1;
+  for (; c >= 0; c--)
+;
+}
diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-2.c 
gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
index e69de29..34eb8e7 100644
--- gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
+++ gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
@@ -0,0 +1,13 @@
+/* PR sanitizer/63879 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined" } */
+
+int a;
+void
+fn1 ()
+{
+  int b = 2;
+  for (; a;)
+while (b >= 0)
+  b--;
+}

Marek

Re: [PATCH 01/21] PR jit/63854: Fix memory leak within gcc_options

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Valgrind shows this fixes ~1 KB of leak per iteration (on x86_64) by
> plugging these leaks allocated at opts.c lines 286 and 289:
>
> ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 875 of 
> 917
> ==57820==at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
> ==57820==by 0x595542A: init_options_struct(gcc_options*, gcc_options*) 
> (opts.c:286)
> ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
> ==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
> (jit-playback.c:1615)
> ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
> (jit-recording.c:861)
> ==57820==by 0x401CA4: test_jit (harness.h:190)
> ==57820==by 0x401D88: main (harness.h:232)
> ==57820==
> ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 876 of 
> 917
> ==57820==at 0x4A081D4: calloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==57820==by 0x59A6780: xcalloc (xmalloc.c:162)
> ==57820==by 0x595543E: init_options_struct(gcc_options*, gcc_options*) 
> (opts.c:289)
> ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
> ==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
> (jit-playback.c:1615)
> ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
> (jit-recording.c:861)
> ==57820==by 0x401CA4: test_jit (harness.h:190)
> ==57820==by 0x401D88: main (harness.h:232)

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
> PR jit/63854
> * opts.c (finalize_options_struct): New.
> * opts.h (finalize_options_struct): New.
> * toplev.c (toplev::finalize): Call finalize_options_struct
> on global_options and global_options_set.
> ---
>  gcc/opts.c   | 8 
>  gcc/opts.h   | 1 +
>  gcc/toplev.c | 3 +++
>  3 files changed, 12 insertions(+)
>
> diff --git a/gcc/opts.c b/gcc/opts.c
> index d22882b..dabd3c6 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -307,6 +307,14 @@ init_options_struct (struct gcc_options *opts, struct 
> gcc_options *opts_set)
>targetm_common.option_init_struct (opts);
>  }
>
> +/* Release any allocations owned by OPTS.  */
> +
> +void
> +finalize_options_struct (struct gcc_options *opts)
> +{
> +  XDELETEVEC (opts->x_param_values);
> +}
> +
>  /* If indicated by the optimization level LEVEL (-Os if SIZE is set,
> -Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT
> to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language
> diff --git a/gcc/opts.h b/gcc/opts.h
> index f694082..c3ec942 100644
> --- a/gcc/opts.h
> +++ b/gcc/opts.h
> @@ -325,6 +325,7 @@ extern void decode_cmdline_options_to_array (unsigned int 
> argc,
>  extern void init_options_once (void);
>  extern void init_options_struct (struct gcc_options *opts,
>  struct gcc_options *opts_set);
> +extern void finalize_options_struct (struct gcc_options *opts);
>  extern void decode_cmdline_options_to_array_default_mask (unsigned int argc,
>   const char **argv,
>   struct 
> cl_decoded_option **decoded_options,
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 2e48047..4b4e568 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -2169,4 +2169,7 @@ toplev::finalize (void)
>ipa_cp_c_finalize ();
>ipa_reference_c_finalize ();
>params_c_finalize ();
> +
> +  finalize_options_struct (&global_options);
> +  finalize_options_struct (&global_options_set);
>  }
> --
> 1.8.5.3
>

Re: [PATCH 02/21] PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Valgrind shows this fixes ~4 KB of leak per iteration (on x86_64) by
> plugging this leak allocated at reginfo.c:1327:
>   gcc_obstack_init (&valid_mode_changes_obstack);

Ok.

Thanks,
Richard.

> ==57820== 16,256 bytes in 4 blocks are definitely lost in loss record 906 of 
> 917
> ==57820==at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
> ==57820==by 0x30958842DB: _obstack_begin (obstack.c:184)
> ==57820==by 0x51C1EC1: init_subregs_of_mode() (reginfo.c:1327)
> ==57820==by 0x50D2A38: init_costs() (ira-costs.c:2181)
> ==57820==by 0x50D74A8: ira_costs() (ira-costs.c:2211)
> ==57820==by 0x50D1326: ira_build() (ira-build.c:3459)
> ==57820==by 0x50C909C: (anonymous 
> namespace)::pass_ira::execute(function*) (ira.c:5227)
> ==57820==by 0x51884F7: execute_one_pass(opt_pass*) (passes.c:2269)
> ==57820==by 0x5188B75: execute_pass_list_1(opt_pass*) (passes.c:2321)
> ==57820==by 0x5188B87: execute_pass_list_1(opt_pass*) (passes.c:2322)
> ==57820==by 0x5188BC8: execute_pass_list(function*, opt_pass*) 
> (passes.c:2332)
>
> gcc/ChangeLog:
> PR jit/63854
> * reginfo.c (finish_subregs_of_mode): Replace obstack_finish with
> obstack_free when cleaning up valid_mode_changes_obstack.
> ---
>  gcc/reginfo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/reginfo.c b/gcc/reginfo.c
> index efe23cd..c2daf22 100644
> --- a/gcc/reginfo.c
> +++ b/gcc/reginfo.c
> @@ -1343,7 +1343,7 @@ void
>  finish_subregs_of_mode (void)
>  {
>XDELETEVEC (valid_mode_changes);
> -  obstack_finish (&valid_mode_changes_obstack);
> +  obstack_free (&valid_mode_changes_obstack, NULL);
>  }
>
>  /* Free all data attached to the structure.  This isn't a destructor because
> --
> 1.8.5.3
>

Re: [PATCH][AARCH64]Fix PR63424 by adding v2di3 pattern

2014-11-19 Thread Richard Henderson

On 10/31/2014 03:51 PM, Renlin Li wrote:
> +(define_expand "v2di3"
> +  [(parallel [
> +(set (match_operand:V2DI 0 "register_operand" "")
> +  (MAXMIN:V2DI (match_operand:V2DI 1 "register_operand" "")
> +   (match_operand:V2DI 2 "register_operand" "")))
> +(clobber (reg:CC CC_REGNUM))])]
> +  "TARGET_SIMD"

There's no clobber of CC_REGNUM, so you can take that out.

Otherwise it looks good.


r~

Re: [PATCH 06/21] PR jit/63854: Fix leak of opts_obstack

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> This was leaking about 4KB per iteration:

Ok.

Thanks,
Richard.

> 16,256 bytes in 4 blocks are definitely lost in loss record 233 of 239
>at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x5D75C17: xmalloc (xmalloc.c:147)
>by 0x30958842DB: _obstack_begin (obstack.c:184)
>by 0x5D1D317: init_options_struct(gcc_options*, gcc_options*) (opts.c:279)
>by 0x532DB0C: toplev::main(int, char**) (toplev.c:2081)
>by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
>by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
>by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
>by 0x401CA4: test_jit (harness.h:190)
>by 0x401D88: main (harness.h:232)
>
> gcc/ChangeLog:
> PR jit/63854
> * toplev.c (toplev::finalize): Free opts_obstack.
> ---
>  gcc/toplev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 291b84d..0d617c2 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -2178,4 +2178,6 @@ toplev::finalize (void)
>/* Clean up the context (and pass_manager etc). */
>delete g;
>g = NULL;
> +
> +  obstack_free (&opts_obstack, NULL);
>  }
> --
> 1.8.5.3
>

Re: [PATCH 07/21] PR jit/63854: Fix leak of optimization_summary_obstack

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> This was leaking ~4KB per iteration:

Ok.

Thanks,
Richard.

> 16,256 bytes in 4 blocks are definitely lost in loss record 234 of 239
>at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x5D75C17: xmalloc (xmalloc.c:147)
>by 0x30958842DB: _obstack_begin (obstack.c:184)
>by 0x4DFECDE: bitmap_obstack_initialize(bitmap_obstack*) (bitmap.c:338)
>by 0x5CA3C1D: ipa_init() (ipa-reference.c:431)
>by 0x5CA3FB1: generate_summary() (ipa-reference.c:551)
>by 0x5CA4650: propagate() (ipa-reference.c:684)
>by 0x5CA5BEE: (anonymous 
> namespace)::pass_ipa_reference::execute(function*) (ipa-reference.c:1178)
>by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
>by 0x5224427: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
>by 0x4E495C1: ipa_passes() (cgraphunit.c:2088)
>by 0x4E498E0: symbol_table::compile() (cgraphunit.c:2172)
>
> gcc/ChangeLog:
> PR jit/63854
> * ipa-reference.c (ipa_reference_c_finalize): Release
> optimization_summary_obstack.
> ---
>  gcc/ipa-reference.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
> index b421f63..1ce06d1 100644
> --- a/gcc/ipa-reference.c
> +++ b/gcc/ipa-reference.c
> @@ -1193,5 +1193,9 @@ make_pass_ipa_reference (gcc::context *ctxt)
>  void
>  ipa_reference_c_finalize (void)
>  {
> -  ipa_init_p = false;
> +  if (ipa_init_p)
> +{
> +  bitmap_obstack_release (&optimization_summary_obstack);
> +  ipa_init_p = false;
> +}
>  }
> --
> 1.8.5.3
>

Re: [PATCH 08/21] PR jit/63854: Add ira_costs_c_finalize

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> this_target_ira_int->free_ira_costs () is called by ira_init_costs,
> but this isn't called after the compile, causing noise when running
> under valgrind.
>
> This patch adds a ira_costs_c_finalize function to clean this up,
> and calls it from toplev::finalize (and hence this isn't called by
> cc1/cc1plus/etc, just by libgccjit.so).
>
> Doing so eliminates about 27KB of "leak" from the valgrind report.

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
> PR jit/63854
> * ira-costs.c (ira_costs_c_finalize): New function.
> * ira.h (ira_costs_c_finalize): New prototype.
> * toplev.c (toplev::finalize): Call ira_costs_c_finalize.
> ---
>  gcc/ira-costs.c | 6 ++
>  gcc/ira.h   | 3 +++
>  gcc/toplev.c| 1 +
>  3 files changed, 10 insertions(+)
>
> diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
> index 122815b..2dabead 100644
> --- a/gcc/ira-costs.c
> +++ b/gcc/ira-costs.c
> @@ -2356,3 +2356,9 @@ ira_adjust_equiv_reg_cost (unsigned regno, int cost)
>else
>  regno_equiv_gains[regno] += cost;
>  }
> +
> +void
> +ira_costs_c_finalize (void)
> +{
> +  this_target_ira_int->free_ira_costs ();
> +}
> diff --git a/gcc/ira.h b/gcc/ira.h
> index b294ea1..d62656c 100644
> --- a/gcc/ira.h
> +++ b/gcc/ira.h
> @@ -199,4 +199,7 @@ extern bool ira_bad_reload_regno (int, rtx, rtx);
>
>  extern void ira_adjust_equiv_reg_cost (unsigned, int);
>
> +/* ira-costs.c */
> +extern void ira_costs_c_finalize (void);
> +
>  #endif /* GCC_IRA_H */
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 0d617c2..0181a3f 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -2168,6 +2168,7 @@ toplev::finalize (void)
>gcse_c_finalize ();
>ipa_cp_c_finalize ();
>ipa_reference_c_finalize ();
> +  ira_costs_c_finalize ();
>params_c_finalize ();
>
>finalize_options_struct (&global_options);
> --
> 1.8.5.3
>

Re: [PATCH 11/21] PR jit/63854: Fix leak of "avail" within tree-ssa-pre.c

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Fix this leak:

Ok.

Thanks,
Richard.

> 120 bytes in 5 blocks are definitely lost in loss record 141 of 227
>at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x5D75D8F: xrealloc (xmalloc.c:177)
>by 0x550DEBA: void va_heap::reserve(vec vl_embed>*&, unsigned int, bool) (vec.h:310)
>by 0x550D509: vec::reserve(unsigned int, 
> bool) (vec.h:1428)
>by 0x550DA3E: vec::reserve_exact(unsigned 
> int) (vec.h:1448)
>by 0x550D167: vec::safe_grow(unsigned int) 
> (vec.h:1576)
>by 0x55076FE: do_regular_insertion(basic_block_def*, basic_block_def*) 
> (tree-ssa-pre.c:3209)
>by 0x5508379: insert_aux(basic_block_def*) (tree-ssa-pre.c:3526)
>by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
>by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
>by 0x55084C0: insert() (tree-ssa-pre.c:3559)
>by 0x550C5BC: (anonymous namespace)::pass_pre::execute(function*) 
> (tree-ssa-pre.c:4805)
>
> gcc/ChangeLog:
> PR jit/63854
> * tree-ssa-pre.c (do_regular_insertion): Convert "avail" from
> vec<> to auto_vec<> to fix a leak.
> ---
>  gcc/tree-ssa-pre.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
> index ea99198..027dc1c 100644
> --- a/gcc/tree-ssa-pre.c
> +++ b/gcc/tree-ssa-pre.c
> @@ -3202,7 +3202,7 @@ do_regular_insertion (basic_block block, basic_block 
> dom)
>bool new_stuff = false;
>vec exprs;
>pre_expr expr;
> -  vec avail = vNULL;
> +  auto_vec avail;
>int i;
>
>exprs = sorted_array_from_bitmap_set (ANTIC_IN (block));
> --
> 1.8.5.3
>

Re: [PATCH 12/21] PR jit/63854: Add a valgrind suppresion file

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Valgrind complains about uninitialized data within sparseset_bit_p.
> Provide a suppression file to silence these warnings.
>
> Valgrind requires suppression files for C++ code to use the mangled
> names, so we do that here.

There is --enable-valgrind-annotations to get the same effect by GCC
telling valgrind about this (and more).

Richard.

> contrib/ChangeLog
> PR jit/63854
> * valgrind.supp: New.
> ---
>  contrib/valgrind.supp | 11 +++
>  1 file changed, 11 insertions(+)
>  create mode 100644 contrib/valgrind.supp
>
> diff --git a/contrib/valgrind.supp b/contrib/valgrind.supp
> new file mode 100644
> index 000..ec9ff02
> --- /dev/null
> +++ b/contrib/valgrind.supp
> @@ -0,0 +1,11 @@
> +{
> +   suppress-uninit-cond-with-sparseset_bit_p
> +   Memcheck:Cond
> +   fun:_ZL15sparseset_bit_pP13sparseset_defm
> +}
> +
> +{
> +   suppress-uninit-use-with-sparseset_bit_p
> +   Memcheck:Value8
> +   fun:_ZL15sparseset_bit_pP13sparseset_defm
> +}
> --
> 1.8.5.3
>

Re: [PATCH 16/21] PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Valgrind showed a per-iteration leak of pass_ipa_pta (and presumably
> pass_omp_simd_clone):

Ok.

Thanks,
Richard.

> 704 (352 direct, 352 indirect) bytes in 4 blocks are definitely lost in loss 
> record 198 of 241
>at 0x4A06965: operator new(unsigned long) (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x555A55D: make_pass_ipa_pta(gcc::context*) 
> (tree-ssa-structalias.c:7425)
>by 0x5219FDA: gcc::pass_manager::pass_manager(gcc::context*) 
> (pass-instances.def:141)
>by 0x4E4F5D1: gcc::context::context() (context.c:37)
>by 0x532C07B: general_init(char const*) (toplev.c:1212)
>by 0x532DB5E: toplev::main(int, char**) (toplev.c:2074)
>by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
>by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
>by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
>by 0x401DC4: test_jit (harness.h:190)
>by 0x401EA8: main (harness.h:232)
>
> Investigation showed that ~pass_manager wasn't deleting these
> passes since for some reason GCC_PASS_LISTS didn't contain
> all_late_ipa_passes and so wasn't calling delete_pass_tree on it.
>
> Add it to GCC_PASS_LISTS, fixing the leak.
>
> gcc/ChangeLog:
> PR jit/63854
> * pass_manager.h (GCC_PASS_LISTS): Add all_late_ipa_passes.
> ---
>  gcc/pass_manager.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
> index 82857a9..e6658ad 100644
> --- a/gcc/pass_manager.h
> +++ b/gcc/pass_manager.h
> @@ -29,6 +29,7 @@ struct register_pass_info;
>DEF_PASS_LIST (all_lowering_passes) \
>DEF_PASS_LIST (all_small_ipa_passes) \
>DEF_PASS_LIST (all_regular_ipa_passes) \
> +  DEF_PASS_LIST (all_late_ipa_passes) \
>DEF_PASS_LIST (all_passes)
>
>  #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST,
> --
> 1.8.5.3
>

Re: [PATCH 04/21] PR jit/63854: Fix memory leak within bb-reorder.c

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Valgrind showed this leaking 200 bytes per iteration in one of
> my testcases:

Ok.

Thanks,
Richard.

> 1,000 bytes in 5 blocks are definitely lost in loss record 200 of 241
>at 0x4A083AA: realloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x5D75C5C: xrealloc (xmalloc.c:179)
>by 0x4E10734: void 
> va_heap::reserve(vec vl_embed>*&, unsigned int, bool) (vec.h:310
> )
>by 0x4E105BB: vec::reserve(unsigned 
> int, bool) (vec.h:1428)
>by 0x4E15B37: vec vl_ptr>::safe_push(basic_block_def* const&) (vec.h:1537)
>by 0x5B61F44: find_rarely_executed_basic_blocks_and_crossing_edges() 
> (bb-reorder.c:1614)
>by 0x5B63E90: (anonymous 
> namespace)::pass_partition_blocks::execute(function*) (bb-reorder.c:2711)
>by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
>by 0x52237C4: execute_pass_list_1(opt_pass*) (passes.c:2358)
>by 0x52237F5: execute_pass_list_1(opt_pass*) (passes.c:2359)
>by 0x5223832: execute_pass_list(function*, opt_pass*) (passes.c:2369)
>by 0x4E4884F: cgraph_node::expand() (cgraphunit.c:1773)
>
> Fix is trivial.
>
> gcc/ChangeLog:
> PR jit/63854
> * bb-reorder.c
> (find_rarely_executed_basic_blocks_and_crossing_edges): Convert
> local bbs_in_hot_partition from vec<> to auto_vec<>.
> ---
>  gcc/bb-reorder.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 1f7c3ee..4ca3bb2 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -1582,7 +1582,7 @@ find_rarely_executed_basic_blocks_and_crossing_edges 
> (void)
>edge e;
>edge_iterator ei;
>unsigned int cold_bb_count = 0;
> -  vec bbs_in_hot_partition = vNULL;
> +  auto_vec bbs_in_hot_partition;
>
>/* Mark which partition (hot/cold) each basic block belongs in.  */
>FOR_EACH_BB_FN (bb, cfun)
> --
> 1.8.5.3
>

Re: [PATCH 05/21] PR jit/63854: Fix memory leak of save_decoded_options

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> This commit fixes this leak from opts-common.c, about 4KB per iteration.

Ok.

Thanks,
Richard.

> ==57820== 18,816 (2,560 direct, 16,256 indirect) bytes in 4 blocks are 
> definitely lost in loss record 907 of 917
> ==57820==at 0x4A083AA: realloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==57820==by 0x59A67CC: xrealloc (xmalloc.c:179)
> ==57820==by 0x59587C9: decode_cmdline_options_to_array(unsigned int, char 
> const**, unsigned int, cl_decoded_option**, unsigned int*) (opts-common.c:885)
> ==57820==by 0x4E2ED90: toplev::main(int, char**) (toplev.c:2089)
> ==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
> (jit-playback.c:1615)
> ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
> (jit-recording.c:861)
> ==57820==by 0x401CA4: test_jit (harness.h:190)
> ==57820==by 0x401D88: main (harness.h:232)
>
> gcc/ChangeLog:
> PR jit/63854
> * toplev.c (toplev::finalize): Clean up save_decoded_options.
> ---
>  gcc/toplev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 876279f..291b84d 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -2173,6 +2173,8 @@ toplev::finalize (void)
>finalize_options_struct (&global_options);
>finalize_options_struct (&global_options_set);
>
> +  XDELETEVEC (save_decoded_options);
> +
>/* Clean up the context (and pass_manager etc). */
>delete g;
>g = NULL;
> --
> 1.8.5.3
>

Re: [PATCH 20/21] PR jit/63854: Fix leak in ipa-icf.c

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> ipa-icf.c was leaking 16 bytes per iteration in the jit testcases here:

Ok.

Tahnks,
Richard.

> 80 bytes in 5 blocks are definitely lost in loss record 94 of 199
>at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x5D764B7: xmalloc (xmalloc.c:147)
>by 0x5C962FF: ipa_icf::sem_item_optimizer::get_group_by_hash(unsigned int, 
> ipa_icf::sem_item_type) (ipa-icf.c:1503)
>by 0x5C96C8C: ipa_icf::sem_item_optimizer::build_hash_based_classes() 
> (ipa-icf.c:1723)
>by 0x5C96765: ipa_icf::sem_item_optimizer::execute() (ipa-icf.c:1616)
>by 0x5C98E81: ipa_icf::ipa_icf_driver() (ipa-icf.c:2373)
>by 0x5C993C8: ipa_icf::pass_ipa_icf::execute(function*) (ipa-icf.c:2421)
>by 0x5223CCD: execute_one_pass(opt_pass*) (passes.c:2306)
>by 0x5224BA7: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
>by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
>by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
>by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)
>
> by not freeing the allocation here:
>
>   congruence_class_group *item = XNEW (congruence_class_group);
>
> in sem_item_optimizer::get_group_by_hash.
>
> Fix it.
>
> gcc/ChangeLog:
> PR jit/63854
> * ipa-icf.c (sem_item_optimizer::~sem_item_optimizer): Free each
> congruence_class_group *.
> ---
>  gcc/ipa-icf.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
> index 2d5fcf5..b0ba82e 100644
> --- a/gcc/ipa-icf.c
> +++ b/gcc/ipa-icf.c
> @@ -1322,6 +1322,7 @@ sem_item_optimizer::~sem_item_optimizer ()
> delete (*it)->classes[i];
>
>(*it)->classes.release ();
> +  free (*it);
>  }
>
>m_items.release ();
> --
> 1.8.5.3
>

Re: [PATCH 09/21] PR jit/63854: Don't leak producer_string in dwarf2out.c

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Fix this small per-iteration leak with debuginfo:

Ok.

Thanks,
Richard.

> 424 bytes in 4 blocks are definitely lost in loss record 185 of 230
>at 0x4A0645D: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x5D75CA7: xmalloc (xmalloc.c:147)
>by 0x4ECE9E4: gen_producer_string() (dwarf2out.c:19489)
>by 0x4EDB2C8: dwarf2out_finish(char const*) (dwarf2out.c:24257)
>by 0x532AD3B: compile_file() (toplev.c:623)
>by 0x532D9E1: do_compile() (toplev.c:2020)
>by 0x532DC54: toplev::main(int, char**) (toplev.c:2117)
>by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
>by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
>by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
>by 0x401CA4: test_jit (harness.h:190)
>by 0x401D88: main (harness.h:232)
>
> gcc/ChangeLog:
> PR jit/63854
> * dwarf2out.c (dwarf2out_c_finalize): Free producer_string.
> ---
>  gcc/dwarf2out.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index b16883f..9069f9a 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -24741,6 +24741,8 @@ dwarf2out_c_finalize (void)
>frame_pointer_fb_offset = 0;
>frame_pointer_fb_offset_valid = false;
>base_types.release ();
> +  XDELETEVEC (producer_string);
> +  producer_string = NULL;
>  }
>
>  #include "gt-dwarf2out.h"
> --
> 1.8.5.3
>

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
> > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> > they are never used for integer arithmetics (and there is no way
> > to represent all their values in RTL if not using CONST_WIDE_INT).
> > As the following testcase shows, simplify_immed_subreg can be called
> > with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
> > in the IL directly) into a SUBREG_REG.
> 
> So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
> end doing with that OI mode subreg?  (why can't we simply use the
> V4DF one?)

propagate_rtx_1 is called on
(subreg:OI (reg:V8DI 89) 0)
with old_rtx
(reg:V8DI 89)
and new_rtx
(const_vector:V8DI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])

Seems we throw away the result in that case though, because
gen_lowpart_common doesn't like to return low parts of VOIDmode constants
larger if the mode is larger than double int:
1353  innermode = GET_MODE (x);
1354  if (CONST_INT_P (x)
1355  && msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT)
1356innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
1357  else if (innermode == VOIDmode)
1358innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
we hit the last if and as mode is wider than innermode, we return NULL later
on.

> > The following patch instead of ICE handles the most common cases (all 0
> > and all 1 CONST_VECTORs) and returns NULL otherwise.
> > 
> > Before wide-int got merged, the testcase worked, though the code didn't
> > bother checking anything, just created 0 or constm1_rtx for the two cases
> > that could happen and if something else appeared, could just return what
> > matched low TImode (or DImode for -m32).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Looks ok to me.  Not sure if the zero/all-ones case really happens that
> much to be worth special-casing - I think you could use
> fixed_wide_int and simply see if the result is representable
> in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
> the patch may be smaller for that.

So perhaps something like this?  Don't know how much more inefficient it is
compared to what it used to do before.

Or just keep the existing code and just remove the assert and instead return
NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
least during propagation that will make zero change.
Though, in that case I have still doubts about the current code handling right
modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
== 0, we still silently throw away the upper bits, don't we?

BTW, to Mike, the assert has been misplaced, there was first buffer overflow
and only after that the assert.

2014-11-19  Jakub Jelinek  

PR target/63910
* simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider
than MAX_BITSIZE_MODE_ANY_INT.

* gcc.target/i386/pr63910.c: New test.

--- gcc/simplify-rtx.c.jj   2014-11-19 09:17:15.491327992 +0100
+++ gcc/simplify-rtx.c  2014-11-19 12:28:16.223808178 +0100
@@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute
int units
  = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
  / HOST_BITS_PER_WIDE_INT;
-   HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / 
HOST_BITS_PER_WIDE_INT];
-   wide_int r;
+   const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE 
++ HOST_BITS_PER_WIDE_INT - 1)
+   / HOST_BITS_PER_WIDE_INT;
+   HOST_WIDE_INT tmp[tmpsize];
+   typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int;
+   imm_int r;

for (u = 0; u < units; u++)
  {
@@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute
tmp[u] = buf;
base += HOST_BITS_PER_WIDE_INT;
  }
-   gcc_assert (GET_MODE_PRECISION (outer_submode)
-   <= MAX_BITSIZE_MODE_ANY_INT);
-   r = wide_int::from_array (tmp, units,
- GET_MODE_PRECISION (outer_submode));
+   r = imm_int::from_array (tmp, units,
+GET_MODE_PRECISION (outer_submode));
+#if TARGET_SUPPORTS_WIDE_INT == 0
+   if (wi::min_precision (r, SIGNED) > HOST_BITS_PER_DOUBLE_INT)
+ return NULL_RTX;
+#endif
elems[elem] = immed_wide_int_const (r, outer_submode);
  }
  break;
--- gcc/testsuite/gcc.target/i386/pr63910.c.jj  2014-1

Re: [PATCH 14/21] PR jit/63854: Fix leak of paths within jump threading

2014-11-19 Thread Richard Biener

On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm  wrote:
> Paths are allocated as:
>vec *path = new vec ();
> i.e. the vec itself is on the heap, so a mere:
>   path->release ();
> is merely releasing the buffer the vec holds, not the vec itself.

Ok.

Thanks,
RIchard.

> This patch updates the two sites that release paths to also delete them,
> fixing leaks like this seen by valgrind:
> 160 bytes in 20 blocks are definitely lost in loss record 165 of 241
>at 0x4A06965: operator new(unsigned long) (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>by 0x556E5AF: thread_across_edge(gimple_statement_base*, edge_def*, bool, 
> vec*, tree_node
> * (*)(gimple_statement_base*, gimple_statement_base*)) 
> (tree-ssa-threadedge.c:1124)
>by 0x547B73B: dom_opt_dom_walker::thread_across_edge(edge_def*) 
> (tree-ssa-dom.c:1184)
>by 0x5480183: dom_opt_dom_walker::after_dom_children(basic_block_def*) 
> (tree-ssa-dom.c:2015)
>by 0x5C1C5F7: dom_walker::walk(basic_block_def*) (domwalk.c:218)
>by 0x547AE83: (anonymous namespace)::pass_dominator::execute(function*) 
> (tree-ssa-dom.c:919)
>by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306)
>by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358)
>by 0x5223865: execute_pass_list_1(opt_pass*) (passes.c:2359)
>by 0x52238A2: execute_pass_list(function*, opt_pass*) (passes.c:2369)
>by 0x4E4888F: cgraph_node::expand() (cgraphunit.c:1773)
>by 0x4E48F29: expand_all_functions() (cgraphunit.c:1909)
>
> [Space-wise, a vec  is just a pointer, hence the 8 bytes
> per block seen in the valgrind report]
>
> gcc/ChangeLog:
> PR jit/63854
> * tree-ssa-threadedge.c (thread_across_edge): Don't just release
> "path", delete it.
> * tree-ssa-threadupdate.c (delete_jump_thread_path): Likewise.
> ---
>  gcc/tree-ssa-threadedge.c   | 1 +
>  gcc/tree-ssa-threadupdate.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index d5b9696..9ee1cae 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -1149,6 +1149,7 @@ thread_across_edge (gimple dummy_cond,
>  through the vector entries.  */
>gcc_assert (path->length () == 0);
>path->release ();
> +  delete path;
>
>/* A negative status indicates the target block was deemed too big to
>  duplicate.  Just quit now rather than trying to use the block as
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 151ed83..5cb9f86 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -2481,6 +2481,7 @@ delete_jump_thread_path (vec *path)
>for (unsigned int i = 0; i < path->length (); i++)
>  delete (*path)[i];
>path->release();
> +  delete path;
>  }
>
>  /* Register a jump threading opportunity.  We queue up all the jump
> --
> 1.8.5.3
>

Re: [PATCH] Fix for fold_negate_expr (PR sanitizer/63879)

2014-11-19 Thread Richard Biener

On Wed, 19 Nov 2014, Marek Polacek wrote:

> The problem here is that negate_expr_p returns true (as it should)
> for UINT_MAX(OVF), but fold_negate_expr didn't actually fold it,
> and that is a no-no; if negate_expr_p is true, fold_negate_expr must
> not return NULL_TREE.  I added the following hunk for bootstrap and
> regtest for better coverage:
> 
> @@ -752,6 +755,12 @@ fold_negate_expr (location_t loc, tree t)
>break;
>  }
>  
> +#ifdef ENABLE_CHECKING
> +  if (negate_expr_p (t))
> +internal_error ("fold_negate_expr should never return NULL_TREE "
> + "if negate_expr_p is true");
> +#endif
> +
>return NULL_TREE;
>  }
> 
> Fixed by adjusting fold_negate_expr so that it folds non-trapping
> wrapping constants.  The SANITIZE_SI_OVERFLOW check is important
> to not regress -Woverflow warnings (ug).
> 
> negate_expr_p hunk is to match the NEGATE_EXPR case in fold_negate_expr.
>  
> Bootstrapped/regtested on power8-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-11-19  Marek Polacek  
> 
>   PR sanitizer/63879
>   * fold-const.c (negate_expr_p) : Return
>   !TYPE_OVERFLOW_SANITIZED.
>   (fold_negate_expr) : Fold when overflow
>   does not trap and when overflow wraps, or when SANITIZE_SI_OVERFLOW
>   is 0.
> 
>   * c-c++-common/ubsan/pr63879-1.c: New test.
>   * c-c++-common/ubsan/pr63879-2.c: New test.
> 
> diff --git gcc/fold-const.c gcc/fold-const.c
> index f6fb8af..719e06e 100644
> --- gcc/fold-const.c
> +++ gcc/fold-const.c
> @@ -408,9 +408,11 @@ negate_expr_p (tree t)
> && TYPE_OVERFLOW_WRAPS (type));
>  
>  case FIXED_CST:
> -case NEGATE_EXPR:
>return true;
>  
> +case NEGATE_EXPR:
> +  return !TYPE_OVERFLOW_SANITIZED (type);
> +
>  case REAL_CST:
>/* We want to canonicalize to positive real constants.  Pretend
>   that only negative ones can be easily negated.  */
> @@ -555,7 +557,8 @@ fold_negate_expr (location_t loc, tree t)
>tem = fold_negate_const (t, type);
>if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
> || (!TYPE_OVERFLOW_TRAPS (type)
> -   && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0))
> +   && TYPE_OVERFLOW_WRAPS (type))
> +   || (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
>   return tem;
>break;
>  
> diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-1.c 
> gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
> index e69de29..2035849 100644
> --- gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
> +++ gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
> @@ -0,0 +1,23 @@
> +/* PR sanitizer/63879 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined" } */
> +
> +struct A
> +{
> +  int inode;
> +} * a;
> +int b, c;
> +void
> +fn1 ()
> +{
> +  int d = 0;
> +  while (b)
> +{
> +  if (a->inode)
> +d++;
> +  a = 0;
> +}
> +  c = d - 1;
> +  for (; c >= 0; c--)
> +;
> +}
> diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-2.c 
> gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
> index e69de29..34eb8e7 100644
> --- gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
> +++ gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
> @@ -0,0 +1,13 @@
> +/* PR sanitizer/63879 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined" } */
> +
> +int a;
> +void
> +fn1 ()
> +{
> +  int b = 2;
> +  for (; a;)
> +while (b >= 0)
> +  b--;
> +}
> 
>   Marek
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Richard Biener

On Wed, 19 Nov 2014, Jakub Jelinek wrote:

> On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
> > > OImode/XImode on i?86/x86_64 are not <= MAX_BITSIZE_MODE_ANY_INT, because
> > > they are never used for integer arithmetics (and there is no way
> > > to represent all their values in RTL if not using CONST_WIDE_INT).
> > > As the following testcase shows, simplify_immed_subreg can be called
> > > with such modes though, e.g. trying to forward propagate a CONST_VECTOR
> > > (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can 
> > > appear
> > > in the IL directly) into a SUBREG_REG.
> > 
> > So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
> > end doing with that OI mode subreg?  (why can't we simply use the
> > V4DF one?)
> 
> propagate_rtx_1 is called on
> (subreg:OI (reg:V8DI 89) 0)
> with old_rtx
> (reg:V8DI 89)
> and new_rtx
> (const_vector:V8DI [
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> (const_int 0 [0])
> ])
> 
> Seems we throw away the result in that case though, because
> gen_lowpart_common doesn't like to return low parts of VOIDmode constants
> larger if the mode is larger than double int:
> 1353innermode = GET_MODE (x);
> 1354if (CONST_INT_P (x)
> 1355&& msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT)
> 1356  innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
> 1357else if (innermode == VOIDmode)
> 1358  innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
> we hit the last if and as mode is wider than innermode, we return NULL later
> on.
> 
> > > The following patch instead of ICE handles the most common cases (all 0
> > > and all 1 CONST_VECTORs) and returns NULL otherwise.
> > > 
> > > Before wide-int got merged, the testcase worked, though the code didn't
> > > bother checking anything, just created 0 or constm1_rtx for the two cases
> > > that could happen and if something else appeared, could just return what
> > > matched low TImode (or DImode for -m32).
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Looks ok to me.  Not sure if the zero/all-ones case really happens that
> > much to be worth special-casing - I think you could use
> > fixed_wide_int and simply see if the result is representable
> > in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
> > the patch may be smaller for that.
> 
> So perhaps something like this?  Don't know how much more inefficient it is
> compared to what it used to do before.

Yes, that looks good.

> Or just keep the existing code and just remove the assert and instead return
> NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> least during propagation that will make zero change.
> Though, in that case I have still doubts about the current code handling right
> modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
> MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
> == 0, we still silently throw away the upper bits, don't we?

Well - not with your added check, no?

I'd say the patch is ok.

Thanks,
Richard.

> BTW, to Mike, the assert has been misplaced, there was first buffer overflow
> and only after that the assert.
> 
> 2014-11-19  Jakub Jelinek  
> 
>   PR target/63910
>   * simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider
>   than MAX_BITSIZE_MODE_ANY_INT.
> 
>   * gcc.target/i386/pr63910.c: New test.
> 
> --- gcc/simplify-rtx.c.jj 2014-11-19 09:17:15.491327992 +0100
> +++ gcc/simplify-rtx.c2014-11-19 12:28:16.223808178 +0100
> @@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute
>   int units
> = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
> / HOST_BITS_PER_WIDE_INT;
> - HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / 
> HOST_BITS_PER_WIDE_INT];
> - wide_int r;
> + const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE 
> +  + HOST_BITS_PER_WIDE_INT - 1)
> + / HOST_BITS_PER_WIDE_INT;
> + HOST_WIDE_INT tmp[tmpsize];
> + typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int;
> + imm_int r;
>  
>   for (u = 0; u < units; u++)
> {
> @@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute
>   tmp[u] = buf;
>   base += HOST_BITS_PER_WIDE_INT;
> }
> - gcc_assert (GET_MODE_PRECISION (outer_submode)
> - <= MAX_BITSIZE_MODE_ANY_INT);
> - r = wide_int::from_array (tmp, units,
> -   GET_MODE_PRECISION (outer_submode));
> + r = imm_int::from_array (tmp, units,
> +  GET_MODE_PRECISION (outer_sub

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote:
> > So perhaps something like this?  Don't know how much more inefficient it is
> > compared to what it used to do before.
> 
> Yes, that looks good.
> 
> > Or just keep the existing code and just remove the assert and instead return
> > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> > least during propagation that will make zero change.
> > Though, in that case I have still doubts about the current code handling 
> > right
> > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
> > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
> > == 0, we still silently throw away the upper bits, don't we?
> 
> Well - not with your added check, no?

For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
precision?  Mike?

Jakub

[C++ Patch] PR 55425

2014-11-19 Thread Paolo Carlini


Hi,

today I wanted to simply close this Bug as fixed, but then I noticed 
that Richard Smith in the audit trail argued, correctly in my opinion, 
that we should accept things like:


constexpr const char* x() { return __func__; }

in C++11 mode too, because the "as if" local variable specification of 
__func__ in the Standard doesn't matter for the purpose of constexpr, 
because otherwise in C++11 no constexpr function would be valid: the 
Standard does *not* say that __func__ is only predefined if it is used.


That said, I think a quick way to accept __func__, __FUNCTION__, and 
__PRETTY_FUNCTION__ is checking whether DECL_ARTIFICIAL (decl) is true 
on the decl. Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2014-11-19  Paolo Carlini  

PR c++/55425
* constexpr.c (constexpr_fn_retval): Accept __func__, __FUNCTION__,
and __PRETTY_FUNCTION__.

/testsuite
2014-11-19  Paolo Carlini  

PR c++/55425
* g++.dg/cpp0x/constexpr-__func__.C
Index: cp/constexpr.c
===
--- cp/constexpr.c  (revision 217755)
+++ cp/constexpr.c  (working copy)
@@ -616,9 +616,14 @@ constexpr_fn_retval (tree body)
   return break_out_target_exprs (TREE_OPERAND (body, 0));
 
 case DECL_EXPR:
-  if (TREE_CODE (DECL_EXPR_DECL (body)) == USING_DECL)
-   return NULL_TREE;
-  return error_mark_node;
+  {
+   tree decl = DECL_EXPR_DECL (body);
+   if (TREE_CODE (decl) == USING_DECL
+   /* Accept __func__, __FUNCTION__, and __PRETTY_FUNCTION__.  */
+   || DECL_ARTIFICIAL (decl))
+ return NULL_TREE;
+   return error_mark_node;
+  }
 
 case CLEANUP_POINT_EXPR:
   return constexpr_fn_retval (TREE_OPERAND (body, 0));
Index: testsuite/g++.dg/cpp0x/constexpr-__func__.C
===
--- testsuite/g++.dg/cpp0x/constexpr-__func__.C (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-__func__.C (working copy)
@@ -0,0 +1,6 @@
+// PR c++/55425
+// { dg-do compile { target c++11 } }
+
+constexpr const char* x() { return __func__; }
+constexpr const char* y() { return __FUNCTION__; }
+constexpr const char* z() { return __PRETTY_FUNCTION__; }

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Richard Biener

On Wed, 19 Nov 2014, Jakub Jelinek wrote:

> On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote:
> > > So perhaps something like this?  Don't know how much more inefficient it 
> > > is
> > > compared to what it used to do before.
> > 
> > Yes, that looks good.
> > 
> > > Or just keep the existing code and just remove the assert and instead 
> > > return
> > > NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
> > > least during propagation that will make zero change.
> > > Though, in that case I have still doubts about the current code handling 
> > > right
> > > modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
> > > MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If 
> > > TARGET_SUPPORTS_WIDE_INT
> > > == 0, we still silently throw away the upper bits, don't we?
> > 
> > Well - not with your added check, no?
> 
> For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
> about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
> is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
> precision?  Mike?

It can represent any - well, the RTX at least.  Code then using
"simple" wide_int may wreck then though.

Richard.

Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-11-19 Thread Martin Liška


On 11/18/2014 11:25 PM, Martin Jambor wrote:

On Tue, Nov 18, 2014 at 07:59:26PM +0100, Jan Hubicka wrote:

Hi,

On Tue, Nov 18, 2014 at 04:39:00PM +0100, Jan Hubicka wrote:

On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote:



b) with GTY, we cannot call destructor


Everything in symbol table is expecitely memory managed (i.e. enver left
to be freed by garbage collector). It resists in GTY only to allow linking
garbage collected object from them and to get PCH working.



Well, if I understand the intent correctly, summaries are for stuff
that is not in the symbol table.  For example jump functions are a

Correct.

vector of structures possibly containing trees, so everything has to
be in garbage collected memory.

When an edge is removed, it is necessary to be notified about it
immediately, for example to decrement rdesc_refcount (you might argue
that that should be done in a separate hook and not from within a
summary class but then you start to rely on hook invocation ordering
so I think it is better to eventually use the summaries for it too).


I do not see why ctors/dtors can not do the reference counting. In fact
this is how refcounting is done usually anyway?



Well, when there is no garbage collection involved then yes, that is
how you normally do it but in the GC case, there is the question of
what is the appropriate time to call destructor on garbage collected
data (like jump functions)?


I still fail to see problem here.  Summaries are explicitly managed- they are
constructed at summary construction time or when new callgarph node is
introduced/duplicated.  They are destroyed when callgarph node is destroyed or
whole summary is ddestroyed.  It is job of the summary datastructure to call
proper ctors/dtors, not job of garbage collector that provides the underlying
memory management.


I do not think that all summaries (in the meaning of a description of
one particular symbol table node or call graph edge) are explicitely
managed.  For example ipa_edge_args or ipa_agg_replacement_value
(which my alignment patch changes to ipcp_transformation_summary) are
allocated in GC memory because they contain trees.



If you have datastructure that points to something that is not
explicitly managed (i.e. tree expression), you just can not have
non-trivial constructor on that datastructure, because that is freed
transparently by gty that don't do destruction...


I admit to not being particularly bright today but that seems to be
exactly my point.


Well, in your case you have datastructure jump_function that contain a pointer
to tree (EXPR).  What I am trying to explain is that I see no reson why
jump_function needs to be POD.


I never said that the summary object needs to be a POD, I only said I
liked the possibility of storing very simple objects (without wrapping
them in classes with constructors and destructors).  That is of course
nothing more than my personal preference.


The tree pointed to by EXPR pointer can not
have a dtor by itself because GGC will not call it upon freeing.

It is true that jump_function lives in GGC memory (to make pointer to expr
work) but it never gets removed by ggc_collect because it is always pointed to
by the summary datastructure.  There are two ways to free the jump_function
datastructure.
   1) removing the symbol node it is attached to.
  Here the symtab code will call removal hook that was registered by 
container
  template. The container will call destructor of jump_function and the 
ggc_free
  its memory
   2) removing the summary.  In this case I would again expect the container
  template to walk all summaries and free them.

So even if your structure lives in GGC memory it is not really garbage
collected and thus the lack of machinery to call dtors at a time ggc decides to
free something is not a problem?

In fact looking at struct default_hashmap_traits, I see:

   /* Called to dispose of the key and value before marking the entry as
  deleted.  */

   template static void remove (T &v) { v.~T (); }


Now I see, I should have read your previous email more carefully, by
explicitely managed you mean that destructors will be called
explicitely by the summary infrastructure.  I was wondering how you
wanted to rip the summaries out of GGC memory.

Well, I suppose that would work, and since explicit calls to
destructors are basically the counterpart of placement new that we
already plan to use, it might be actually be the proper C++ thing to
do.

(I am not sure I like it though, for all other purposes the summary
objects will look like managed by the garbage collector and only we
who read this thread will know that the lifetime of the object would
be decoupled from the allocation-span of its memory).

Thanks for the clarification,

Martin


Hello.

I tried to come up with ctor/dtor solution for types passes to symbol_summary
template class.

Example:
struct inline_summary
{
  inline_summary (cgraph_node *node);
  inline_summary (

[PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs

2014-11-19 Thread Maciej W. Rozycki

Hi,

 It has come to my attention that we create suboptimal code for the 
`__call_stub_fp_' variant of the MIPS16 call stubs.  These stubs are 
used for outgoing calls made from MIPS16 code to standard MIPS code that 
return floating-point results and may also pass floating-point 
arguments.

 This can be illustrated with this small program:

$ cat foo.c
double bar (double d);

int main (int argc, char **argv)
{
  return bar (argc);
}
$ mips-linux-gnu-gcc -O2 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

 <__call_stub_fp_bar>:
   0:   44856000mtc1a1,$f12
   4:   44846800mtc1a0,$f13
   8:   03e09021moves2,ra
   c:   0c00jal 0 <__call_stub_fp_bar>
c: R_MIPS_26bar
  10:   nop
  14:   4403mfc1v1,$f0
  18:   0248jr  s2
  1c:   44020800mfc1v0,$f1

Disassembly of section .text.startup:

 :
   0:   f100 64c4   save32,ra,s2
   4:   1800    jal 0 
4: R_MIPS16_26  __mips16_floatsidf
   8:   6500nop
   a:   67a3movea1,v1
   c:   1800    jal 0 
c: R_MIPS16_26  bar
  10:   6782movea0,v0
  12:   67a3movea1,v1
  14:   1800    jal 0 
14: R_MIPS16_26 __mips16_fix_truncdfsi
  18:   6782movea0,v0
  1a:   f100 6444   restore 32,ra,s2
  1e:   e8a0jrc ra

-- as you can see in `__call_stub_fp_bar' above the jump delay slot 
remains empty because the move to $s2 instruction cannot be scheduled 
there due to a data dependency on $ra.

 However there is no need to save $ra last and since the MIPS IV ISA 
there are no coprocessor move delay slots so the last MTC1 instruction 
could be scheduled there instead. 

 So here is a change to avoid this empty delay slot.  With this change 
in place we get this code instead:

$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

 <__call_stub_fp_bar>:
   0:   03e09021moves2,ra
   4:   44856000mtc1a1,$f12
   8:   0c00jal 0 <__call_stub_fp_bar>
8: R_MIPS_26bar
   c:   44846800mtc1a0,$f13
  10:   4403mfc1v1,$f0
  14:   0248jr  s2
  18:   44020800mfc1v0,$f1

Disassembly of section .text.startup:

 :
   0:   f100 64c4   save32,ra,s2
   4:   1800    jal 0 
4: R_MIPS16_26  __mips16_floatsidf
   8:   6500nop
   a:   67a3movea1,v1
   c:   1800    jal 0 
c: R_MIPS16_26  bar
  10:   6782movea0,v0
  12:   67a3movea1,v1
  14:   1800    jal 0 
14: R_MIPS16_26 __mips16_fix_truncdfsi
  18:   6782movea0,v0
  1a:   f100 6444   restore 32,ra,s2
  1e:   e8a0jrc ra

-- as you can see the last MTC1 instruction has now been placed in the 
delay slot.

 For ISAs that do have a coprocessor move delay slot there is no gain, 
but no loss either, both delay slots remain empty due to data 
dependencies (coprocessor move delay slots):

$ mips-linux-gnu-gcc -O2 -mips1 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

 <__call_stub_fp_bar>:
   0:   03e09021moves2,ra
   4:   44856000mtc1a1,$f12
   8:   44846800mtc1a0,$f13
   c:   0c00jal 0 <__call_stub_fp_bar>
c: R_MIPS_26bar
  10:   nop
  14:   4403mfc1v1,$f0
  18:   44020800mfc1v0,$f1
  1c:   0248jr  s2
  20:   nop

Disassembly of section .text.startup:

 :
   0:   63fcaddiu   sp,-32
   2:   6772movev1,s2
   4:   6207sw  ra,28(sp)
   6:   1800    jal 0 
6: R_MIPS16_26  __mips16_floatsidf
   a:   d306sw  v1,24(sp)
   c:   67a3movea1,v1
   e:   1800    jal 0 
e: R_MIPS16_26  bar
  12:   6782movea0,v0
  14:   67a3movea1,v1
  16:   1800    jal 0 
16: R_MIPS16_26 __mips16_fix_truncdfsi
  1a:   6782movea0,v0
  1c:   9606lw  a2,24(sp)
  1e:   9707lw  a3,28(sp)
  20:   6556moves2,a2
  22:   ef00jr  a3
  24:   6304addiu   sp,32
  26:   6500nop

-- though I think this consideration is actually academic as no legacy 
MIPS16 processors have ever be

[PATCH] Fix PR63677 and a regression from alias-improvements

2014-11-19 Thread Richard Biener


PR63677 shows that late complete unrolling can expose quite some memory
CSE opportunities (and followup simplifications) which we fail to
exploit because currently DOMs ability to do memory CSE is quite
limited (it requires the redundant load to optimize to directly follow the
load or store).  This is actually a regression from before
alias-improvements, thus to the time where we had multiple virtual
operand symbols which made us handle some cases better than now
with a single memory operand symbol.

The following patch makes DOM use the alias walker to fix that.

The easiest way to do that is to no longer hash virtual operands
and use the expression hash lookup result as "candidate" if
the expression involves memory.  We can then verify if the candidate
can be used by using the alias walker to walk to the candidate.
If there is an intermediate possibly aliasing stmt we simply
replace the hashtable entry and set up enough info to be able
to rewind to the previous state during unwinding.

This means the DOM implementation of memory CSE is a tad bit more
efficient than that of SCCVN.  But it is also very simplistic as
it requires 1:1 requivalence for the memory access structure
(as compared to SCCVN which can look through various kinds of
aggregate copies or initializations as well as handling accessing
the same memory location with different memory access structures).
This is also the reason that the patch doesn't fix PR63864 which
requires us to look through aggregate copies.

To fix PR63677 I also had to teach DOM the trick of propagating
&a + CST as &MEM[&a, CST] which is used by all other propagators.

Given how easy this was to restore pre-alias-improvements state
I wonder why I didn't think of this before...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for trunk?

Thanks,
Richard.

2014-11-19   Richard Biener  

PR tree-optimization/63677
* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
(avail_exprs_stack): Make a vector of pairs.
(struct hash_expr_elt): Replace stmt member with vop member.
(expr_elt_hasher::equal): Simplify.
(initialize_hash_element): Adjust.
(initialize_hash_element_from_expr): Likewise.
(dom_opt_dom_walker::thread_across_edge): Likewise.
(record_cond): Likewise.
(dom_opt_dom_walker::before_dom_children): Likewise.
(print_expr_hash_elt): Likewise.
(remove_local_expressions_from_table): Restore previous state
if requested.
(record_equivalences_from_stmt): Record &x + CST as constant
&MEM[&x, CST] for further propagation.
(vuse_eq): New function.
(lookup_avail_expr): For loads use the alias oracle to see
whether a candidate from the expr hash is usable.
(avail_expr_hash): Do not hash VUSEs.

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
* gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.

Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c.orig 2014-11-19 13:28:46.123015743 +0100
--- gcc/tree-ssa-dom.c  2014-11-19 13:58:33.122937543 +0100
*** along with GCC; see the file COPYING3.
*** 66,71 
--- 66,72 
  #include "tree-ssa-threadedge.h"
  #include "tree-ssa-dom.h"
  #include "inchash.h"
+ #include "gimplify.h"
  
  /* This file implements optimizations on the dominator tree.  */
  
*** struct edge_info
*** 139,145 
 marker.  */
  typedef struct expr_hash_elt * expr_hash_elt_t;
  
! static vec avail_exprs_stack;
  
  /* Structure for entries in the expression hash table.  */
  
--- 140,146 
 marker.  */
  typedef struct expr_hash_elt * expr_hash_elt_t;
  
! static vec > avail_exprs_stack;
  
  /* Structure for entries in the expression hash table.  */
  
*** struct expr_hash_elt
*** 151,158 
/* The expression (rhs) we want to record.  */
struct hashable_expr expr;
  
!   /* The stmt pointer if this element corresponds to a statement.  */
!   gimple stmt;
  
/* The hash value for RHS.  */
hashval_t hash;
--- 152,160 
/* The expression (rhs) we want to record.  */
struct hashable_expr expr;
  
!   /* The virtual operand associated with the nearest dominating stmt
!  loading from or storing to expr.  */
!   tree vop;
  
/* The hash value for RHS.  */
hashval_t hash;
*** expr_elt_hasher::hash (const value_type
*** 187,196 
  inline bool
  expr_elt_hasher::equal (const value_type &p1, const compare_type &p2)
  {
-   gimple stmt1 = p1->stmt;
const struct hashable_expr *expr1 = &p1->expr;
const struct expr_hash_elt *stamp1 = p1->stamp;
-   gimple stmt2 = p2->stmt;
const struct hashable_expr *expr2 = &p2->expr;
const struct expr_hash_elt *stamp2 = p2->stamp;
  
--- 189,196 
*** expr_elt_hasher::equal (const value_type
*** 198,222 
if (stamp1 == stamp2)
  return true;
  
!

[PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Christian Bruel


Hello Ramana,

Here is the attribute revisited after your comments, so

 - thumb1 is now supported
 - -mflip-thump option added for testing.
 - inlining is allowed between modes.

This set of patches was tested on rev#217709 as:

no regressions with:
arm-sim/
arm-sim/-march=armv7-a
arm-sim/-mthumb
arm-sim/-mthumb/-march=armv7-a

a few artifacts, all of them analyzed, with
arm-sim/-mflip-thumb/
arm-sim/-mflip-thumb//-march=armv7-a
arm-sim/-mflip-thumb//-mthumb
arm-sim/-mflip-thumb//-mthumb/-march=armv7-a


Artifacts are analyzed, they are mostly the fault of the testsuite being 
unable to mix modes in the expected results (e.g thumb[1,2] or arm tests 
predicates, mix setjmp/longjump between modes...)


The support is split as followed:

[1/6] Reorganized arm_option_override to dynamically redefine the flags 
depending on the attribute mode.


[2/6] Reorganized macro settings to be set/unset for each #pragma targets

[3/6] Change ARM_DECLARE_FUNCTION_NAME into a function

[4/6] Implement hooks to support attribute target

[5/6] Implement #pragma target

[6/6] Add -mflip-thumb option for testing

I think I missed the stage3, Anyway would it be OK for stage1 when it 
reopens ?


Christian

RE: [PATCH] MIPS/GCC: Unconditional jump generation bug fix

2014-11-19 Thread Maciej W. Rozycki

On Tue, 18 Nov 2014, Matthew Fortune wrote:

> > > I admit to being a bit more nervous about 4.9 but the test coverage
> > > seems thorough enough. I guess I would have been less concerned if the
> > > optimisation was still just tied to TARGET_MICROMIPS for the 4.9
> > branch.
> > >
> > > Catherine, what do you think?
> > >
> > This is okay for 4.9 IMO.
> 
> OK

 FWIW we've been using this change since Oct 2012 with no issues (as I 
noted it was meant to be included with the original microMIPS support 
submission, but was lost in transit) and also GAS has code to relax 
out-of-range branches to jumps in non-PIC standard MIPS code under the 
same condition this RTL insn uses, so even if a wrong branch slipped 
through here (which it doesn't), then GAS would fix it up.

 See gas/config/tc-mips.c (md_apply_fix)  for the 
relaxation piece if interested.

  Maciej

RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Moore, Catherine



> -Original Message-
> From: Matthew Fortune [mailto:matthew.fort...@imgtec.com]
> Sent: Tuesday, November 18, 2014 9:42 AM
> To: Andrew Bennett; gcc-patches@gcc.gnu.org
> Cc: Moore, Catherine; Rozycki, Maciej
> Subject: RE: [PATCH] If using branch likelies in MIPS sync code fill the delay
> slot with a nop
> 
> > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing
> > when using the -mfix-r1 option.  This is due to the fact that the
> > delay slot of the branch instruction that checks if the atomic
> > operation was not successful can be filled with an operation that
> > returns the output result.  For the branch likely case this operation
> > can not go in the delay slot because it wont execute when the atomic
> > operation was successful and therefore the wrong result will be
> > returned.  This patch forces a nop to be placed in the delay slot if a 
> > branch
> likely is used.
> > The fix is as simple as possible; it does cause a second nop to be
> > introduced after the branch instruction in the branch likely case.  As
> > this option is rarely used, it makes more sense to keep the code
> > readable than trying to optimise it.
> >
> > The atomic tests now pass successfully.  The ChangeLog and patch are
> > below.
> >
> > Ok to commit?
> 
> I'm OK with just making the fix-r1 case safe rather than also complicating
> the normal code path to remove the normal delay slot NOP in this special
> case.  I'd like to see what Catherine thinks though.  To remove the second
> NOP I believe we would have to add a !TARGET_FIX_R1 to the condition
> of the normal delay slot NOP. This seems a bit convoluted when the real
> reason is because branch likely is in use.  To make use of the
> mips_branch_likely flag it would have to be set earlier in:
> mips_output_sync_loop and also get set in mips_sync_loop_insns.
> 
> If Catherine thinks getting rid of the second NOP is important enough then
> please base it on mips_branch_likely and fix the callers.
> 
  Yes, removing the second NOP is worth the additional effort.

[PATCH, ARM] attribute target (thumb,arm) [1/6]

2014-11-19 Thread Christian Bruel


In preparation of the target attribute,

reorganize ´arm_option_override´ into 3 entities:
arm_option_override_internal_p
arm_option_check_internal
arm_option_param_internal

Also define and use TREE_TARGET macros instead of file-scope variables 
in the machine description.


Thanks,

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm.h (arm_option_override): Reoganized and split.
	(arm_option_params_internal); New function.
	(arm_option_check_internal): New function.
	(arm_option_override_internal): New function.
	(restrict_default): New boolean.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
	(TREE_TARGET_THUM2, TREE_TARGET_ARM): Likewise.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2014-11-18 08:35:38.0 +0100
+++ b/gcc/gcc/config/arm/arm.c	2014-11-18 09:11:19.0 +0100
@@ -744,6 +744,9 @@ const struct arm_fpu_desc *arm_fpu_desc;
 rtx thumb_call_via_label[14];
 static int thumb_call_reg_needed;
 
+/* Remember default option is used.  */
+static bool restrict_default;
+
 /* Bit values used to identify processor capabilities.  */
 #define FL_CO_PROC(1 << 0)/* Has external co-processor bus */
 #define FL_ARCH3M (1 << 1)/* Extended multiply */
@@ -886,12 +889,6 @@ int arm_tune_wbuf = 0;
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2601,6 +2598,136 @@ arm_gimplify_va_arg_expr (tree valist, t
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+ other command line choices.  */
+  if (TREE_TARGET_ARM (opts) && !(insn_flags & FL_NOTM))
+error ("target CPU does not support ARM mode");
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+ from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) && TREE_TARGET_ARM (opts))
+warning (0, "enabling backtrace support is only meaningful when compiling for the Thumb");
+
+  if (TREE_TARGET_ARM (opts) && TARGET_CALLEE_INTERWORKING)
+warning (0, "enabling callee interworking support is only meaningful when compiling for the Thumb");
+
+  /* If this target is normally configured to use APCS frames, warn if they
+ are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+  && write_symbols != NO_DEBUG
+  && !TARGET_APCS_FRAME
+  && (TARGET_DEFAULT & MASK_APCS_FRAME))
+warning (0, "-g with -mno-apcs-frame may not give sensible debugging");
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts) && TARGET_IWMMXT)
+error ("iWMMXt unsupported under Thumb mode");
+
+  if (TARGET_HARD_TP && TREE_TARGET_THUMB1 (opts))
+error ("can not use -mtp=cp15 with 16-bit Thumb");
+
+  if (TREE_TARGET_THUMB (opts) && TARGET_VXWORKS_RTP && flag_pic)
+{
+  error ("RTP PIC is incompatible with Thumb");
+  flag_pic = 0;
+}
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+  && ((!(arm_arch7 && !arm_arch_notm) && !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+error ("-mslow-flash-data only supports non-pic code on armv7-m targets");
+}
+
+/* Check any params depending on attributes that the user has specified.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+ /* If we are not using the default (ARM mode) section anchor offset
+ ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+{
+  /* Thumb-1 LDR instructions cannot have negative offsets.
+ Permissible positive offset ranges are 5-bit (for byte loads),
+ 6-bit (for halfword loads), or 7-bit (for word loads).
+ Empirical results suggest a 7-bit anchor range gives the best
+ overall code size.  */
+  targetm.min_anchor_offset = 0;
+  targetm.max_anchor_offset = 127;
+}
+  else if (TREE_TARGET_THUMB2 (opts))
+{
+  /* The minimum is set such that the total size of the block
+ for a particular anchor is 248 + 1 + 4095 bytes, which is
+ divisible by eight, ensuring natural spacing of anchors.  */
+  targetm.min_anchor_offset = -248;
+  targetm.max_anchor_offset = 4095;

Re: [PING^4][PATCH] Warn about unclosed pragma omp declare target.

2014-11-19 Thread Ilya Tocar

As omp target and offloading support is committed to trunk,
I think it's reasonable to add some new warnings.

On 06 Nov 15:27, Ilya Tocar wrote:
> Ping.
> On 30 Oct 18:31, Ilya Tocar wrote:
> > Ping.
> > On 20 Oct 19:26, Ilya Tocar wrote:
> > > Ping.
> > > 
> > > On 02 Oct 17:38, Ilya Tocar wrote:
> > > > Ping.
> > > > On 15 Aug 16:26, Ilya Tocar wrote:
> > > > > Ping.
> > > > > 
> > > > > On 29 Jul 18:45, Ilya Tocar wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > As discussed here in 
> > > > > > https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
> > > > > > Gcc should complain about pragma omp declare target without
> > > > > > corresponding pragma omp end declare target. This patch adds a 
> > > > > > warning
> > > > > > for those cases.
> > > > > > Bootstraps/passes make-check.
> > > > > > Ok for trunk?
> > > > > > 
> > > > > > ChangeLog:
> > > > > > 
> > > > > > 2014-07-29  Ilya Tocar  
> > > > > > 
> > > > > > * c-decl.c (omp_declare_target_location_stack): New.
> > > > > > * c-lang.h (omp_declare_target_location_stack): Declare.
> > > > > > * c-parser.c (warn_unclosed_pragma_omp_target): New.
> > > > > > (c_parser_translation_unit): Call it.
> > > > > > (c_parser_omp_declare_target): Remeber location.
> > > > > > (c_parser_omp_end_declare_target): Forget location.
> > > > > > 
> > > > > > And ChangeLog for testsuite:
> > > > > > 
> > > > > > 2014-07-29  Ilya Tocar  
> > > > > > 
> > > > > > * gcc.dg/gomp//target-3.c: New testcase.
> > > > > > 
> > > > > > ---
> > > > > >  gcc/c/c-decl.c   |  3 +++
> > > > > >  gcc/c/c-lang.h   |  3 +++
> > > > > >  gcc/c/c-parser.c | 22 +-
> > > > > >  gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
> > > > > > +
> > > > > >  4 files changed, 60 insertions(+), 1 deletion(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
> > > > > > 
> > > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > > > index 2a4b439..2dd5b2c 100644
> > > > > > --- a/gcc/c/c-decl.c
> > > > > > +++ b/gcc/c/c-decl.c
> > > > > > @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = 
> > > > > > VOIDmode;
> > > > > >  /* If non-zero, implicit "omp declare target" attribute is added 
> > > > > > into the
> > > > > > attribute lists.  */
> > > > > >  int current_omp_declare_target_attribute;
> > > > > > +
> > > > > > +/* Holds locations of currently open "omp declare target" pragmas. 
> > > > > >  */
> > > > > > +vec omp_declare_target_location_stack;
> > > > > >  
> > > > > >  /* Each c_binding structure describes one binding of an identifier 
> > > > > > to
> > > > > > a decl.  All the decls in a scope - irrespective of namespace - 
> > > > > > are
> > > > > > diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
> > > > > > index e974906..cef995c 100644
> > > > > > --- a/gcc/c/c-lang.h
> > > > > > +++ b/gcc/c/c-lang.h
> > > > > > @@ -59,4 +59,7 @@ struct GTY(()) language_function {
> > > > > > attribute lists.  */
> > > > > >  extern GTY(()) int current_omp_declare_target_attribute;
> > > > > >  
> > > > > > +/* Holds locations of currently open "omp declare target" pragmas. 
> > > > > >  */
> > > > > > +extern vec omp_declare_target_location_stack;
> > > > > > +
> > > > > >  #endif /* ! GCC_C_LANG_H */
> > > > > > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> > > > > > index e32bf04..0b96fe9 100644
> > > > > > --- a/gcc/c/c-parser.c
> > > > > > +++ b/gcc/c/c-parser.c
> > > > > > @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd 
> > > > > > (c_parser *, enum pragma_context);
> > > > > >  static tree c_parser_array_notation (location_t, c_parser *, tree, 
> > > > > > tree);
> > > > > >  static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, 
> > > > > > bool);
> > > > > >  
> > > > > > +static void warn_unclosed_pragma_omp_target ();
> > > > > > +
> > > > > >  /* Parse a translation unit (C90 6.7, C99 6.9).
> > > > > >  
> > > > > > translation-unit:
> > > > > > @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
> > > > > > }
> > > > > >while (c_parser_next_token_is_not (parser, CPP_EOF));
> > > > > >  }
> > > > > > +
> > > > > > +  warn_unclosed_pragma_omp_target ();
> > > > > >  }
> > > > > >  
> > > > > >  /* Parse an external declaration (C90 6.7, C99 6.9).
> > > > > > @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser 
> > > > > > *parser, tree fndecl, tree parms,
> > > > > >  static void
> > > > > >  c_parser_omp_declare_target (c_parser *parser)
> > > > > >  {
> > > > > > +  location_t loc = c_parser_peek_token (parser)->location;
> > > > > >c_parser_skip_to_pragma_eol (parser);
> > > > > >current_omp_declare_target_attribute++;
> > > > > > +  omp_declare_target_location_stack.safe_push (loc);
> > > > > >  }
> > > > > >  
> > > > > >  static void
> > > > > > @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser 
> > > > > > *

[PATCH, ARM] attribute target (thumb,arm) [2/6]

2014-11-19 Thread Christian Bruel


In preparation of the pragma target

reorganize ´TARGET_CPU_CPP_BUILTINS´ to redefine mode dependent macros 
based on current thumb_p.


Thanks,

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm-c.c (cpp_def_or_undef): New functions.
	(arm_cpp_builtins): Likewise.
	* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Move mode dependant 
	macros to arm_cpp_builtins.
	* config/arm/arm-protos.h (arm_cpp_builtins): Declare.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm-c.c c/gcc/gcc/config/arm/arm-c.c
--- b/gcc/gcc/config/arm/arm-c.c	2014-11-05 14:34:49.0 +0100
+++ c/gcc/gcc/config/arm/arm-c.c	2014-11-13 11:16:59.0 +0100
@@ -42,3 +42,73 @@ arm_lang_object_attributes_init (void)
 {
   arm_lang_output_object_attributes_hook = arm_output_c_attributes;
 }
+
+/* Define or undefine macro.  */
+
+static void
+cpp_def_or_undef (struct cpp_reader *in, const char *str, bool def_p)
+{
+  if (def_p)
+cpp_define (in, str);
+  else
+cpp_undef (in, str);
+}
+
+/* Define or undefine macros based on the current target.  If the user does
+   #pragma GCC target, we need to adjust the macros dynamically.  */
+
+void
+arm_cpp_builtins (struct cpp_reader *in, bool thumb_p)
+{
+  bool target_32bit_p = !thumb_p || arm_arch_thumb2;
+  bool thumb2_p = thumb_p && arm_arch_thumb2;
+  bool have_ldrex_p = (arm_arch6 && !thumb_p) || arm_arch7;
+  bool have_ldrexbh_p = (arm_arch6k && !thumb_p) || arm_arch7;
+  bool have_ldrexd_p = ((arm_arch6k && !thumb_p) || arm_arch7)
+&& arm_arch_notm;
+
+  int arm_feature_ldrex = (have_ldrex_p ? 4 : 0)
+| (have_ldrexbh_p ? 3 : 0) | (have_ldrexd_p ? 8 : 0);
+
+  cpp_def_or_undef (in, "__thumb__", thumb_p);
+  if (arm_arch_thumb2)
+cpp_def_or_undef (in, "__thumb2__", thumb_p);
+  if (TARGET_BIG_END)
+cpp_def_or_undef (in, "__THUMBEB__", thumb_p);
+  else
+cpp_def_or_undef (in, "__THUMBEL__", thumb_p);
+
+  cpp_def_or_undef (in, "__ARM_32BIT_STATE", target_32bit_p); /* TARGET_32BIT  */
+
+  if (arm_arch5e && (arm_arch_notm || arm_arch7))   /* TARGET_ARM_QBIT  */
+cpp_def_or_undef (in, "__ARM_FEATURE_QBIT", target_32bit_p);
+
+  if (arm_arch6 && (arm_arch_notm || arm_arch7))/* TARGET_ARM_SAT  */
+cpp_def_or_undef (in, "__ARM_FEATURE_SAT", target_32bit_p);
+
+  if (arm_arch5e && (arm_arch_notm || arm_arch7em)) /* TARGET_DSP_MULTIPLY  */
+cpp_def_or_undef (in, "__ARM_FEATURE_DSP", target_32bit_p);
+
+  if (arm_arch6 && (arm_arch_notm || arm_arch7em))  /* TARGET_INT_SIMD  */
+cpp_def_or_undef (in, "__ARM_FEATURE_SIMD32", target_32bit_p);
+
+ /* TARGET_IDIV  */
+  cpp_def_or_undef (in, "__ARM_ARCH_EXT_IDIV__",
+		((!thumb_p && arm_arch_arm_hwdiv)
+		 || (thumb2_p && arm_arch_thumb_hwdiv)));
+
+  cpp_def_or_undef (in, "__ARM_FEATURE_IDIV",
+		((!thumb_p && arm_arch_arm_hwdiv)
+		 || (thumb2_p && arm_arch_thumb_hwdiv)));
+
+ if (arm_feature_ldrex)
+   cpp_define_formatted (in, "__ARM_FEATURE_LDREX=%d", arm_feature_ldrex);
+ else
+   cpp_undef (in, "__ARM_FEATURE_LDREX");
+
+ cpp_def_or_undef (in, "__ARM_FEATURE_CLZ",
+		   ((TARGET_ARM_ARCH >= 5 && !thumb_p) || TARGET_ARM_ARCH_ISA_THUMB >=2));
+
+ cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified);
+}
+
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm.h c/gcc/gcc/config/arm/arm.h
--- b/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:50.0 +0100
+++ c/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:37.0 +0100
@@ -48,29 +48,12 @@ extern char arm_arch_name[];
 #define TARGET_CPU_CPP_BUILTINS()			\
   do			\
 {			\
-	if (TARGET_DSP_MULTIPLY)			\
-	   builtin_define ("__ARM_FEATURE_DSP");	\
-if (TARGET_ARM_QBIT)\
-   builtin_define ("__ARM_FEATURE_QBIT");	\
-if (TARGET_ARM_SAT)\
-   builtin_define ("__ARM_FEATURE_SAT");	\
 if (TARGET_CRYPTO)\
 	   builtin_define ("__ARM_FEATURE_CRYPTO");	\
 	if (unaligned_access)\
 	  builtin_define ("__ARM_FEATURE_UNALIGNED");	\
 	if (TARGET_CRC32)\
 	  builtin_define ("__ARM_FEATURE_CRC32");	\
-	if (TARGET_32BIT)\
-	  builtin_define ("__ARM_32BIT_STATE");		\
-	if (TARGET_ARM_FEATURE_LDREX)\
-	  builtin_define_with_int_value (			\
-	"__ARM_FEATURE_LDREX", TARGET_ARM_FEATURE_LDREX);	\
-	if ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB)		\
-	 || TARGET_ARM_ARCH_ISA_THUMB >=2)			\
-	  builtin_define ("__ARM_FEATURE_CLZ");			\
-	if (TARGET_INT_SIMD)	\
-	  builtin_define ("__ARM_FEATURE_SIMD32");		\
-\
 	builtin_define_with_int_value (\
 	  "__ARM_SIZEOF_MINIMAL_ENUM",\
 	  flag_short_enums ? 1 : 4);\
@@ -89,10 +72,6 @@ extern char arm_arch_name[];
 	if (arm_arch_notm)\
 	  builtin_define ("__ARM_ARCH_ISA_ARM");	\
 	builtin_define ("__APCS_32__");			\
-	if (TARGET_THUMB)\
-	  builtin_define ("__thumb__");			\
-	if (TARGET_THUMB2)\
-	  builtin_define ("__thumb2__");		\
 	if (TARGET_ARM_ARCH_ISA_THUMB

[PATCH, ARM] attribute target (thumb,arm) [3/6]

2014-11-19 Thread Christian Bruel

Re-implement ARM_DECLARE_FUNCTION_NAME as a function. That will make 
changed related to unified/divided and mode directives easier to insert.


Thanks

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm-protos.h (arm_declare_function_name): Declare.
	(is_called_in_ARM_mode): Remove.
	* config/arm/arm.c (is_called_in_ARM_mode): Declare static bool.
	(arm_declare_function_name): Moved from from ARM_DECLARE_FUNCTION_NAME.
	* config/arm/arm.h (ARM_DECLARE_FUNCTION_NAME): Call
	 arm_declare_function_name.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.c d/gcc/gcc/config/arm/arm.c
--- c/gcc/gcc/config/arm/arm.c	2014-11-18 08:52:48.0 +0100
+++ d/gcc/gcc/config/arm/arm.c	2014-11-18 08:51:50.0 +0100
@@ -26422,6 +26422,23 @@ thumb_pop (FILE *f, unsigned long mask)
   fprintf (f, "}\n");
 }
 
+/* Return nonzero if FUNC must be entered in ARM mode.  */
+static bool
+is_called_in_ARM_mode (tree func)
+{
+  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
+
+  /* Ignore the problem about functions whose address is taken.  */
+  if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
+return true;
+
+#ifdef ARM_PE
+  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE;
+#else
+  return false;
+#endif
+}
+
 /* Generate code to return from a thumb function.
If 'reg_containing_return_addr' is -1, then the return address is
actually on the stack, at the stack pointer.  */
@@ -26857,22 +26874,6 @@ thumb_far_jump_used_p (void)
   return 0;
 }
 
-/* Return nonzero if FUNC must be entered in ARM mode.  */
-int
-is_called_in_ARM_mode (tree func)
-{
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-
-  /* Ignore the problem about functions whose address is taken.  */
-  if (TARGET_CALLEE_INTERWORKING && TREE_PUBLIC (func))
-return TRUE;
-
-#ifdef ARM_PE
-  return lookup_attribute ("interfacearm", DECL_ATTRIBUTES (func)) != NULL_TREE;
-#else
-  return FALSE;
-#endif
-}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -32386,6 +32387,25 @@ arm_is_constant_pool_ref (rtx x)
 	  && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
 }
 
+void
+arm_declare_function_name (FILE *stream, const char *name, tree decl)
+{
+  if (TARGET_THUMB)
+{
+  if (is_called_in_ARM_mode (decl)
+	  || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY
+	  && cfun->is_thunk))
+	fprintf (stream, "\t.code 32\n");
+  else if (TARGET_THUMB1)
+	fprintf (stream, "\t.code\t16\n\t.thumb_func\n");
+  else
+	fprintf (stream, "\t.thumb\n\t.thumb_func\n");
+}
+
+  if (TARGET_POKE_FUNCTION_NAME)
+arm_poke_function_name (stream, (const char *) name);
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
of address and set to BASE and OFFSET, otherwise return false
after clearing BASE and OFFSET.  */
@@ -32506,4 +32526,5 @@ arm_sched_fusion_priority (rtx_insn *ins
   *pri = tmp;
   return;
 }
+
 #include "gt-arm.h"
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.h d/gcc/gcc/config/arm/arm.h
--- c/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:37.0 +0100
+++ d/gcc/gcc/config/arm/arm.h	2014-11-13 12:19:45.0 +0100
@@ -2184,23 +2184,7 @@ extern int making_const_table;
? 1 : 0)
 
 #define ARM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL) 	\
-  do			\
-{			\
-  if (TARGET_THUMB) \
-{		\
-  if (is_called_in_ARM_mode (DECL)		\
-	  || (TARGET_THUMB1 && !TARGET_THUMB1_ONLY	\
-		  && cfun->is_thunk))	\
-fprintf (STREAM, "\t.code 32\n") ;		\
-  else if (TARGET_THUMB1)			\
-   fprintf (STREAM, "\t.code\t16\n\t.thumb_func\n") ;	\
-  else		\
-   fprintf (STREAM, "\t.thumb\n\t.thumb_func\n") ;	\
-}		\
-  if (TARGET_POKE_FUNCTION_NAME)			\
-arm_poke_function_name (STREAM, (const char *) NAME);	\
-}			\
-  while (0)
+  arm_declare_function_name ((STREAM), (NAME), (DECL));
 
 /* For aliases of functions we use .thumb_set instead.  */
 #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL1, DECL2)		\
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm-protos.h d/gcc/gcc/config/arm/arm-protos.h
--- c/gcc/gcc/config/arm/arm-protos.h	2014-11-13 11:16:17.0 +0100
+++ d/gcc/gcc/config/arm/arm-protos.h	2014-11-13 12:21:36.0 +0100
@@ -30,6 +30,7 @@ extern void arm_load_pic_register (unsig
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void arm_declare_function_name (FILE *, const char *, tree);
 extern void thumb2_expand_return (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
@@ -178,9 +179,6 @@ extern const char *thumb1_unexpanded_epi
 exter

Re: Stream out default optimization nodes

2014-11-19 Thread H.J. Lu

On Tue, Nov 18, 2014 at 9:29 AM, Jan Hubicka  wrote:
>> On Tue, Nov 18, 2014 at 9:27 AM, Jan Hubicka  wrote:
>> >> https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00473.html
>> >>
>> >> /export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/xg++
>> >> -B/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/
>> >> -B/usr/5.0.0/x86_64-unknown-linux-gnu/bin/ -nostdinc++
>> >> -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
>> >> -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
>> >>  
>> >> -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
>> >>  
>> >> -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include
>> >>  
>> >> -I/export/gnu/import/git/gcc-test-profiled/src-trunk/libstdc++-v3/libsupc++
>> >> -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
>> >> -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
>> >>   -g -O2 -flto=jobserver -frandom-seed=1 -fprofile-use -DIN_GCC
>> >> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
>> >> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
>> >> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
>> >> -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE
>> >> -static-libstdc++ -static-libgcc  -o build/genmatch \
>> >> build/genmatch.o ../libcpp/libcpp.a ../libiberty/libiberty.a
>> >> build/errors.o build/vec.o build/hash-table.o
>> >> .././libiberty/libiberty.a
>> >> ../../src-trunk/libcpp/lex.c: In function âend_directiveâ:
>> >> ../../src-trunk/libcpp/lex.c:442:43: error:
>> >> â__builtin_ia32_pcmpestri128â needs isa option -m32 -msse4.2
>> >>index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>> >>^
>> >> make[7]: *** [/tmp/ccTC6Hk9.ltrans9.ltrans.o] Error 1
>> >
>> > Indeed, it looks like the bug is that search_line_sse42 gets inlined int
>> > end_directive that is compiled w/o SSE support.  Probably something that
>> > happened previously, too, just led to compiling the function with
>> > SSE4.2
>> >
>> > I will need to setup -m32 LTO bootstrap enviornment...
>> >
>>
>> This is -m64 LTO, not -m32.
> OK then the message seems bogus, too.  I will try to reproduce it.
>

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63963

-- 
H.J.

[PATCH, ARM] attribute target (thumb,arm) [4/6]

2014-11-19 Thread Christian Bruel


Implements and document the hooks to support target_attributes.

The emission of blx is handled directly for armv5 to overcome a bug with 
the current binutils that fails with calls to a static symbol in a 
different section. (e.g .text -> .text.startup) in different modes.

(ref https://sourceware.org/bugzilla/show_bug.cgi?id=17505)

Regtests included

Thanks

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm.opt (mthumb): Save.
	* config/arm/arm.h (arm_valid_target_attribute_tree): Declare.
	(arm_reset_previous_fndecl, arm_change_mode_p): Likewise.
	(SWITCHABLE_TARGET): Define.
	* config/arm/arm.c (arm_reset_previous_fndecl): New functions.
	(arm_valid_target_attribute_tree, arm_change_mode_p): Likewise.
	(arm_valid_target_attribute_p): Likewise.
	(arm_set_current_function, arm_can_inline_p): Likewise.
	(arm_valid_target_attribute_rec): Likewise.
	(arm_previous_fndecl): New variable.
	(TARGET_SET_CURRENT_FUNCTION, TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	(TARGET_CAN_INLINE_P): Define.
	(arm_asm_trampoline_template): Emit mode.
	(arm_file_start): Don't set unified syntax.
	(arm_declare_function_name): Set unified syntax and mode.
	(arm_option_override): Init target_option_default_node.
	and target_option_current_node.
	* config/arm/arm.md (*call_value_symbol): Set mode when possible.
	(*call_symbol): Likewise.
	* doc/extend.texi: Document ARM target and pragma attribute.
	* doc/invoke.texi: Likewise.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN d/gcc/gcc/config/arm/arm.c e/gcc/gcc/config/arm/arm.c
--- d/gcc/gcc/config/arm/arm.c	2014-11-18 08:51:50.0 +0100
+++ e/gcc/gcc/config/arm/arm.c	2014-11-18 09:05:57.0 +0100
@@ -80,6 +80,7 @@
 #include "opts.h"
 #include "dumpfile.h"
 #include "gimple-expr.h"
+#include "target-globals.h"
 #include "builtins.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
@@ -258,6 +259,9 @@ static tree arm_build_builtin_va_list (v
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_set_current_function (tree);
+static bool arm_can_inline_p (tree, tree);
+static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -400,6 +404,9 @@ static const struct attribute_spec arm_a
 #undef  TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P arm_can_inline_p
+
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE arm_option_override
 
@@ -412,6 +419,12 @@ static const struct attribute_spec arm_a
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function
+
+#undef TARGET_OPTION_VALID_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p
+
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER arm_sched_reorder
 
@@ -3205,6 +3218,11 @@ arm_option_override (void)
 
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
+
+  /* Save the initial options in case the user does function specific
+ options.  */
+  target_option_default_node = target_option_current_node
+= build_target_option_node (&global_options);
 }
 
 static void
@@ -3358,13 +3376,20 @@ arm_warn_func_return (tree decl)
 static void
 arm_asm_trampoline_template (FILE *f)
 {
+  if (TARGET_UNIFIED_ASM)
+fprintf (f, "\t.syntax unified\n");
+  else
+fprintf (f, "\t.syntax divided\n");
+
   if (TARGET_ARM)
 {
+  fprintf (f, "\t.arm\n");
   asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM, PC_REGNUM);
   asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", PC_REGNUM, PC_REGNUM);
 }
   else if (TARGET_THUMB2)
 {
+  fprintf (f, "\t.thumb\n");
   /* The Thumb-2 trampoline is similar to the arm implementation.
 	 Unlike 16-bit Thumb, we enter the stub in thumb mode.  */
   asm_fprintf (f, "\tldr.w\t%r, [%r, #4]\n",
@@ -26874,6 +26899,23 @@ thumb_far_jump_used_p (void)
   return 0;
 }
 
+/* Check that FUNC is called with a different mode.  */
+
+bool
+arm_change_mode_p (tree func)
+{
+  if (TREE_CODE (func) != FUNCTION_DECL)
+return false;
+
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (func);
+
+  if (!callee_tree)
+callee_tree = target_option_default_node;
+
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+
+  return (TREE_TARGET_THUMB (callee_opts) != TARGET_THUMB);
+}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -28448,9 +28490,6 @@ arm_file_start (void)
 {
   int val;
 
-  if (TARGET

[PATCH] Fix sanitizer/63690

2014-11-19 Thread Marek Polacek

As the testcase shows, we should really check first that we
have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
On an invalid testcase we might get e.g. STRING_CST.

Bootstrapped/regtested on power8-linux, ok for trunk?

2014-11-19  Marek Polacek  

PR sanitizer/63690
* ubsan.c (instrument_object_size): Check for MEM_REF.

* gcc.dg/ubsan/pr63690.c: New test.

diff --git gcc/testsuite/gcc.dg/ubsan/pr63690.c 
gcc/testsuite/gcc.dg/ubsan/pr63690.c
index e69de29..bcf520a 100644
--- gcc/testsuite/gcc.dg/ubsan/pr63690.c
+++ gcc/testsuite/gcc.dg/ubsan/pr63690.c
@@ -0,0 +1,9 @@
+/* PR sanitizer/63690 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined" } */
+
+void
+foo (void)
+{
+  (*"c")++;
+}
diff --git gcc/ubsan.c gcc/ubsan.c
index 7d1e341..ad5665f 100644
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -1539,7 +1539,13 @@ instrument_object_size (gimple_stmt_iterator *gsi, bool 
is_lhs)
 return;
 
   bool decl_p = DECL_P (inner);
-  tree base = decl_p ? inner : TREE_OPERAND (inner, 0);
+  tree base;
+  if (decl_p)
+base = inner;
+  else if (TREE_CODE (inner) == MEM_REF)
+base = TREE_OPERAND (inner, 0);
+  else
+return;
   tree ptr = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (t)), t);
 
   while (TREE_CODE (base) == SSA_NAME)

Marek

[patch, asan, testsuite] Adjust asan expected backtrace

2014-11-19 Thread FX


The attached patch fixes 23 asan testsuite failures on x86_64-apple-darwin14, 
where the backtrace features e.g. wrap_malloc instead of interceptor_malloc.
So I adjusted the dg-output patterns to match.

OK to commit to trunk?



2014-11-19  Francois-Xavier Coudert  

* c-c++-common/asan/heap-overflow-1.c: Ajust dg-output.
* c-c++-common/asan/memcmp-1.c: Likewise.
* c-c++-common/asan/strncpy-overflow-1.c: Likewise.
* c-c++-common/asan/use-after-free-1.c: Likewise.



asan.diff
Description: Binary data

Re: [PATCH 17/21] PR jit/63854: Fix leaking vec in jit

2014-11-19 Thread Trevor Saunders

On Wed, Nov 19, 2014 at 05:46:17AM -0500, David Malcolm wrote:
> This fixes various leaks of vec buffers seen via valgrind within jit
> (both recording and playback).
> 
> Various vec<> within jit::recording are converted to auto_vec<>.
> 
> Various playback::wrapper subclasses containing vec<> gain a finalizer
> so they can release the vec when they are collected.

It seems kind of tempting to just use a virtual dtor as the finalizer,
but I'm not sure how to make that work nicely with your overriding of
operator new.

Trev

> 
> gcc/jit/ChangeLog:
>   PR jit/63854
>   * jit-playback.c (gcc::jit::playback::compound_type::set_fields):
>   Convert param from const vec & to
>   const auto_vec *.
>   (gcc::jit::playback::context::new_function_type): Convert param
>   "param_types" from vec * to const auto_vec *.
>   (gcc::jit::playback::context::new_function): Convert param
>   "params" from vec * to const auto_vec *.
>   (gcc::jit::playback::context::build_call): Convert param "args"
>   from vec to const auto_vec *.
>   (gcc::jit::playback::context::new_call): Likewise.
>   (gcc::jit::playback::context::new_call_through_ptr): Likewise.
>   (wrapper_finalizer): New function.
>   (gcc::jit::playback::wrapper::operator new): Call the finalizer
>   variant of ggc_internal_cleared_alloc, supplying
>   wrapper_finalizer.
>   (gcc::jit::playback::function::finalizer): New.
>   (gcc::jit::playback::block::finalizer): New.
>   (gcc::jit::playback::source_file::finalizer): New.
>   (gcc::jit::playback::source_line::finalizer): New.
> 
>   * jit-playback.h
>   (gcc::jit::playback::context::new_function_type): Convert param
>   "param_types" from vec * to const auto_vec *.
>   (gcc::jit::playback::context::new_function): Convert param
>   "params" from vec * to const auto_vec *.
>   (gcc::jit::playback::context::new_call): Convert param
>   "args" from vec to const auto_vec *.
>   (gcc::jit::playback::context::new_call_through_ptr): Likewise.
>   (gcc::jit::playback::context::build_call): Likewise.
>   (gcc::jit::playback::context): Convert fields "m_functions",
>   "m_source_files", "m_cached_locations" from vec to auto_vec.
>   (gcc::jit::playback::wrapper::finalizer): New virtual function.
>   (gcc::jit::playback::compound_type::set_fields): Convert param fro
>   const vec & to
>   const auto_vec *.
>   (gcc::jit::playback::function::finalizer): New.
>   (gcc::jit::playback::block::finalizer): New.
>   (gcc::jit::playback::source_file::finalizer): New.
>   (gcc::jit::playback::source_line::finalizer): New.
> 
>   * jit-recording.c
>   (gcc::jit::recording::function_type::replay_into): Convert local
>   from a vec into an auto_vec.
>   (gcc::jit::recording::fields::replay_into): Likewise.
>   (gcc::jit::recording::function::replay_into): Likewise.
>   (gcc::jit::recording::call::replay_into): Likewise.
>   (gcc::jit::recording::call_through_ptr::replay_into): Likewise.
> 
>   * jit-recording.h (gcc::jit::recording::context): Convert fields
>   "m_mementos", "m_compound_types", "m_functions" from vec<> to
>   auto_vec <>.
>   (gcc::jit::recording::function_type::get_param_types): Convert
>   return type from vec to const vec &.
>   (gcc::jit::recording::function_type): Convert field
>   "m_param_types" from a vec<> to an auto_vec<>.
>   (gcc::jit::recording::fields): Likewise for field "m_fields".
>   (gcc::jit::recording::function::get_params): Convert return type
>   from vec  to const vec &.
>   (gcc::jit::recording::function): Convert fields "m_params",
>   "m_locals", "m_blocks" from vec<> to auto_vec<>.
>   (gcc::jit::recording::block): Likewise for field "m_statements".
>   vec<> to auto_vec<>.
>   (gcc::jit::recording::call): Likewise for field "m_args".
>   (gcc::jit::recording::call_through_ptr): Likewise.
> ---
>  gcc/jit/jit-playback.c  | 73 
> +
>  gcc/jit/jit-playback.h  | 27 --
>  gcc/jit/jit-recording.c | 16 +--
>  gcc/jit/jit-recording.h | 26 +-
>  4 files changed, 100 insertions(+), 42 deletions(-)
> 
> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
> index 285a3ef..8fdfa29 100644
> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c
> @@ -285,15 +285,15 @@ new_compound_type (location *loc,
>  }
>  
>  void
> -playback::compound_type::set_fields (const vec &fields)
> +playback::compound_type::set_fields (const auto_vec 
> *fields)
>  {
>/* Compare with c/c-decl.c: finish_struct. */
>tree t = as_tree ();
>  
>tree fieldlist = NULL;
> -  for (unsigned i = 0; i < fields.length (); i++)
> +  for (unsigned i = 0; i < fields->length (); i++)
>  {
> -  field *f = fields[i];
> +  field *f = (*fields)[i];
>DECL_CONTEXT (f->as_t

[PATCH, ARM] attribute target (thumb,arm) [5/6]

2014-11-19 Thread Christian Bruel


Implements the hooks for #pragma GCC target

A test included to check that macros were correctly defined/undefined on 
pragma regions.


Thanks

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm.h (REGISTER_TARGET_PRAGMAS):
	 Call arm_register_target_pragmas.
	* config/arm/arm-protos.h (arm_register_target_pragmas): Declare.
	* config/arm/arm-c.c (arm_register_target_pragmas): New function.
	(arm_pragma_target_parse): Likewise.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-c.c f/gcc/gcc/config/arm/arm-c.c
--- e/gcc/gcc/config/arm/arm-c.c	2014-11-13 13:03:27.0 +0100
+++ f/gcc/gcc/config/arm/arm-c.c	2014-11-13 13:53:45.0 +0100
@@ -20,9 +20,12 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
-#include "tm_p.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "c-family/c-common.h"
+#include "target.h"
+#include "target-def.h"
+#include "c-family/c-pragma.h"
 
 /* Output C specific EABI object attributes.  These can not be done in
arm.c because they require information from the C frontend.  */
@@ -112,3 +115,78 @@ arm_cpp_builtins (struct cpp_reader *in,
  cpp_def_or_undef (in, "__ARM_ASM_SYNTAX_UNIFIED__", inline_asm_unified);
 }
 
+/* Hook to validate the current #pragma GCC target and set the arch custom
+   mode state.  If ARGS is NULL, then POP_TARGET is used to reset
+   the options.  */
+static bool
+arm_pragma_target_parse (tree args, tree pop_target)
+{
+  tree prev_tree = build_target_option_node (&global_options);
+  tree cur_tree;
+  struct cl_target_option *prev_opt;
+  struct cl_target_option *cur_opt;
+  bool cur_mode, prev_mode;
+
+  if (! args)
+{
+  cur_tree = ((pop_target) ? pop_target : target_option_default_node);
+  cl_target_option_restore (&global_options,
+TREE_TARGET_OPTION (cur_tree));
+}
+  else
+{
+  cur_tree = arm_valid_target_attribute_tree (args,  &global_options);
+  if (cur_tree == NULL_TREE)
+	{
+	  cl_target_option_restore (&global_options,
+TREE_TARGET_OPTION (prev_tree));
+	  return false;
+	}
+}
+
+  target_option_current_node = cur_tree;
+  arm_reset_previous_fndecl ();
+
+  /* Figure out the previous mode.  */
+  prev_opt  = TREE_TARGET_OPTION (prev_tree);
+  cur_opt   = TREE_TARGET_OPTION (cur_tree);
+
+  gcc_assert (prev_opt);
+  gcc_assert (cur_opt);
+
+  cur_mode = TARGET_THUMB_P (cur_opt->x_target_flags);
+  prev_mode = TARGET_THUMB_P (prev_opt->x_target_flags);
+
+  if (prev_mode != cur_mode)
+{
+  /* For the definitions, ensure all newly defined macros are considered
+	 as used for -Wunused-macros.  There is no point warning about the
+	 compiler predefined macros.  */
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+  cpp_opts->warn_unused_macros = 0;
+
+  /* Update macros.  */
+  arm_cpp_builtins (parse_in, cur_mode);
+
+  cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+}
+
+  return true;
+}
+
+/* Register target pragmas.  We need to add the hook for parsing #pragma GCC
+   option here rather than in arm.c since it will pull in various preprocessor
+   functions, and those are not present in languages like fortran without a
+   preprocessor.  */
+
+void
+arm_register_target_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = arm_pragma_target_parse;
+
+#ifdef REGISTER_SUBTARGET_PRAGMAS
+  REGISTER_SUBTARGET_PRAGMAS ();
+#endif
+}
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm.h f/gcc/gcc/config/arm/arm.h
--- e/gcc/gcc/config/arm/arm.h	2014-11-13 13:48:59.0 +0100
+++ f/gcc/gcc/config/arm/arm.h	2014-11-13 13:48:01.0 +0100
@@ -2097,7 +2097,8 @@ extern int making_const_table;
   c_register_pragma (0, "long_calls", arm_pr_long_calls);		\
   c_register_pragma (0, "no_long_calls", arm_pr_no_long_calls);		\
   c_register_pragma (0, "long_calls_off", arm_pr_long_calls_off);	\
-  arm_lang_object_attributes_init(); \
+  arm_lang_object_attributes_init();	\
+  arm_register_target_pragmas();   \
 } while (0)
 
 /* Condition code information.  */
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-protos.h f/gcc/gcc/config/arm/arm-protos.h
--- e/gcc/gcc/config/arm/arm-protos.h	2014-11-13 13:07:47.0 +0100
+++ f/gcc/gcc/config/arm/arm-protos.h	2014-11-13 13:49:50.0 +0100
@@ -306,6 +306,7 @@ extern const char *arm_rewrite_selected_
 
 /* Defined in gcc/common/config/arm-c.c.  */
 extern void arm_lang_object_attributes_init(void);
+extern void arm_register_target_pragmas (void);
 extern void arm_cpp_builtins (struct cpp_reader *, bool);
 
 extern bool arm_is_constant_pool_ref (rtx);
2014-09-23  Christian Bruel  

	* gcc.target/arm/pragma_attribute.c: N

[PATCH, ARM] attribute target (thumb,arm) [6/6]

2014-11-19 Thread Christian Bruel

Implement the -mflip-thump option. Undocumented for internal testing 
only. This option artificially inserts alternative attribute thumb/modes 
on functions.


This close the patch set. Thanks for your review,

Christian
2014-09-23  Christian Bruel  

	* config/arm/arm.c (add_attribute, arm_insert_attributes): New functions
	(TARGET_INSERT_ATTRIBUTES): Define.
	(thumb_flipper): New var.
	* config/arm/arm.opt (-mflip-thumb): New switch.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.c g/gcc/gcc/config/arm/arm.c
--- f/gcc/gcc/config/arm/arm.c	2014-11-18 13:16:44.0 +0100
+++ g/gcc/gcc/config/arm/arm.c	2014-11-19 14:04:22.0 +0100
@@ -218,6 +218,7 @@ static void arm_encode_section_info (tre
 
 static void arm_file_end (void);
 static void arm_file_start (void);
+static void arm_insert_attributes (tree, tree *);
 
 static void arm_setup_incoming_varargs (cumulative_args_t, machine_mode,
 	tree, int *, int);
@@ -370,6 +371,9 @@ static const struct attribute_spec arm_a
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
+#undef  TARGET_INSERT_ATTRIBUTES
+#define TARGET_INSERT_ATTRIBUTES arm_insert_attributes
+
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START arm_file_start
 #undef TARGET_ASM_FILE_END
@@ -29403,6 +29407,56 @@ arm_valid_target_attribute_tree (tree ar
   return t;
 }
 
+/* True if -mflip-thumb should next add an attribute for the default
+   mode, false if it should next add an attribute for the opposite mode.  */
+static GTY(()) bool thumb_flipper = TARGET_THUMB;
+
+static void 
+add_attribute  (const char * mode, tree *attributes)
+{
+  size_t len = strlen (mode);
+  tree value = build_string (len, mode);
+
+  TREE_TYPE (value) = build_array_type (char_type_node,
+	build_index_type (size_int (len)));
+
+  *attributes = tree_cons (get_identifier ("target"),
+			   build_tree_list (NULL_TREE, value),
+			   *attributes);
+}
+
+/* For testing. Insert thumb or arm modes alternatively on functions.  */
+
+static void
+arm_insert_attributes (tree fndecl, tree * attributes)
+{
+  const char *mode;
+
+  if (TREE_CODE (fndecl) != FUNCTION_DECL || DECL_EXTERNAL(fndecl)
+  || DECL_BUILT_IN (fndecl) || DECL_ARTIFICIAL (fndecl))
+   return;
+
+  /* Nested definitions must inherit mode.  */
+  if (current_function_decl)
+   {
+ mode = TARGET_THUMB ? "thumb" : "arm";  
+ add_attribute (mode, attributes);
+ return;
+   }
+
+  if (! TARGET_FLIP_THUMB)
+return;
+
+  /* If there is already a setting don't change it.  */
+  if (lookup_attribute ("target", *attributes) != NULL)
+return;
+
+  mode = thumb_flipper ? "thumb" : "arm";
+  add_attribute (mode, attributes);
+
+  thumb_flipper = !thumb_flipper;
+}
+
 /* Hook to validate attribute((target("string"))).  */
 
 static bool
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.opt g/gcc/gcc/config/arm/arm.opt
--- f/gcc/gcc/config/arm/arm.opt	2014-11-13 14:05:34.0 +0100
+++ g/gcc/gcc/config/arm/arm.opt	2014-11-19 13:59:46.0 +0100
@@ -122,6 +122,10 @@ Enum(float_abi_type) String(softfp) Valu
 EnumValue
 Enum(float_abi_type) String(hard) Value(ARM_FLOAT_ABI_HARD)
 
+mflip-thumb
+Target Report Var(TARGET_FLIP_THUMB)
+Switch ARM/Thumb modes on alternating functions for compiler testing
+
 mfp16-format=
 Target RejectNegative Joined Enum(arm_fp16_format_type) Var(arm_fp16_format) Init(ARM_FP16_FORMAT_NONE)
 Specify the __fp16 floating-point format

Re: [PATCH] Fix sanitizer/63690

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote:
> As the testcase shows, we should really check first that we
> have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
> On an invalid testcase we might get e.g. STRING_CST.

Well, addr_object_size should handle STRING_CSTs just fine, so
why not handle STRING_CSTs in there too (similarly to decl_p = true
case).
Supposedly you want decl_p = true for them and need to tweak
  if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base))
but otherwise it should work.
On the other side,
int
foo (int x)
{
  return "foobar"[x];
}
is already handled by -fsanitize=bounds and for some reason
gimple_assign_load_p doesn't consider handled components with
STRING_CST as base as loads so for -fsanitize=object-size
we aren't called for that.

Jakub

Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-11-19 Thread Ilya Enkovich

On 13 Nov 20:56, Joseph Myers wrote:
> On Thu, 13 Nov 2014, Ilya Enkovich wrote:
> 
> > > You can leave it as a single library - it's just that imposes libgcc-like
> > > constraints on what the library does and how it does things, so as to be
> > > usable for arbitrary programs built with MPX (e.g. using
> > > reserved-namespace names such as __write when available - direct syscalls
> > > may also be a possibility in some cases).
> > 
> > I really don't want to make it libgcc-like and complicate its further
> > development.  What is the reason for that?  Is it because library is
> > linked by -mmpx?  Can it be avoided if library is linked only at
> > '-fcheck-pointer-bounds -mmpx' combination?  If single -mmpx is used
> 
> What's -fcheck-pointer-bounds?  I don't see any documentation of it in 
> invoke.texi.  Any options intended for users to use need documenting there 
> - did a documentation patch get missed out from the series of patches 
> posted / committed?
> 
> Documentation might help understand the intent of the feature.  I think 
> there are two natural possibilities:
> 
> * Use as a lightweight hardening feature enabled by default across a whole 
> distribution (or for a subset of the distribution considered 
> security-critical), by people who don't know the detailed requirements of 
> individual programs built with the option.  In that case there are 
> libgcc-like requirements (as another issue there, it might be problematic 
> to cause arbitrary previously unthreaded programs to link with libpthread; 
> at least, performance issues seem possible).  Outputting reports to files 
> determined by environment variables is probably not a sensible approach in 
> this case.  You might want reports to syslog, or might want MPX-detected 
> issues to be intercepted by crash reporters such as apport so they can 
> collect relevant information and offer to report it to the distributor.
> 
> * Use as a debugging tool, by people who understand the requirements of 
> their application / library and can tell whether the things the MPX 
> library is documented as doing might be a problem in their case.  For 
> this, libgcc-like requirements don't apply and output to files seems more 
> sensible.

I think if we compare runtime libraries for these two usage models then we 
don't find much in common.  Thus it should be two independent runtime 
libraries.  This posted library corresponds to the second scenario and is to be 
used mainly for debugging. 

> 
> If it's intended only as the latter - and this is documented - then you 
> don't have the libgcc-like requirements, and there's no point in having 
> multiple libraries.  If it's intended for both, that points the way to 
> separate libraries (where the debugging-tool library would be used in that 
> use case, but the crash-reporter-interception case would expect programs 
> to have been linked with only the minimal library).

Would it be enough to mention implicit link and resulting pthread dependency in 
-fcheck-pointer-bounds description in invoke.texi?

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

Here is an updated version.  I removed printf usage in a signal handler, added 
symbol versioning, added thread safety comments for static vars, added static 
link support and -static-libmpx option (will update corresponding documentation 
patch).  Does it look OK?

Thanks,
Ilya
--
2014-11-19  Ilya Enkovich  

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-11-19  Ilya Enkovich  

* gcc.c (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(MPX_SPEC): New.
(LINK_COMMAND_SPEC): Add MPX_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/

2014-11-19  Ilya Enkovich  

Initial commit.


diff --git a/Makefile.def b/Makefile.def
index 40bbca9..4a535d2 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -128,6 +128,9 @@ target_modules = { module= libsanitizer;
   bootstrap=true;
   lib_path=.libs;
   raw_cxx=true; };
+target_modules = { module= libmpx;
+  bootstrap=true;
+  lib_path=.libs; };
 target_modules = { module= libvtv;
   bootstrap=true;
   lib_path=.libs;
diff --git a/configure.ac b/configure.ac
index b27fb1d..ccb119b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -162,6 +162,7 @@ target_libraries="target-libgcc \
target-libstdc++-v3 \
target-libsanitizer \
target-libvtv \
+   target-libmpx \
target-libssp \
target-libquadmath \
target-libgfortran \
@@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then
 fi
 fi
 
+
+# Disable libmpx on unsupported systems.
+if test -d ${srcdir}/libmpx; then
+if test x$enable_libmpx = x; then
+   AC_MSG_CHECKING([for libmpx support])
+   if (srcdir=${

Re: [PATCH] Fix sanitizer/63690

2014-11-19 Thread Jakub Jelinek

On Wed, Nov 19, 2014 at 03:12:05PM +0100, Jakub Jelinek wrote:
> On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote:
> > As the testcase shows, we should really check first that we
> > have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
> > On an invalid testcase we might get e.g. STRING_CST.
> 
> Well, addr_object_size should handle STRING_CSTs just fine, so
> why not handle STRING_CSTs in there too (similarly to decl_p = true
> case).
> Supposedly you want decl_p = true for them and need to tweak
>   if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base))
> but otherwise it should work.
> On the other side,
> int
> foo (int x)
> {
>   return "foobar"[x];
> }
> is already handled by -fsanitize=bounds and for some reason
> gimple_assign_load_p doesn't consider handled components with
> STRING_CST as base as loads so for -fsanitize=object-size
> we aren't called for that.

The patch is ok as is.

Jakub

RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Andrew Bennett

>   Yes, removing the second NOP is worth the additional effort.

The updated patch is below.

Ok to commit?

Regards,


Andrew


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 02268f3..368c6f0 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
  This will sometimes be a delayed branch; see the write code below
  for details.  */
   mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL);
-  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
+
+  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot
+ of the branch if it is a branch likely because they will not execute when
+ the atomic operation is successful, so place a NOP in there instead.  */
+
+  mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL);
 
   /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
   if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)
@@ -13005,7 +13010,7 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
   mips_multi_copy_insn (tmp3_insn);
   mips_multi_set_operand (mips_multi_last_index (), 0, newval);
 }
-  else if (!(required_oldval && cmp))
+  else if (!(required_oldval && cmp) && !mips_branch_likely)
 mips_multi_add_insn ("nop", NULL);
 
   /* CMP = 1 -- either standalone or in a delay slot.  */
@@ -13029,12 +13034,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
 const char *
 mips_output_sync_loop (rtx_insn *insn, rtx *operands)
 {
-  mips_process_sync_loop (insn, operands);
-
   /* Use branch-likely instructions to work around the LL/SC R1
  errata.  */
   mips_branch_likely = TARGET_FIX_R1;
 
+  mips_process_sync_loop (insn, operands);
+
   mips_push_asm_switch (&mips_noreorder);
   mips_push_asm_switch (&mips_nomacro);
   mips_push_asm_switch (&mips_noat);
@@ -13056,6 +13061,9 @@ mips_output_sync_loop (rtx_insn *insn, rtx *operands)
 unsigned int
 mips_sync_loop_insns (rtx_insn *insn, rtx *operands)
 {
+  /* Use branch-likely instructions to work around the LL/SC R1
+ errata.  */
+  mips_branch_likely = TARGET_FIX_R1;
   mips_process_sync_loop (insn, operands);
   return mips_multi_num_insns;
 }

Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Ramana Radhakrishnan

On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel  wrote:

> I think I missed the stage3, Anyway would it be OK for stage1 when it
> reopens ?

Since you submitted this well during stage1 and given that these
patches address comments from earlier in the review process we should
aim to get these in for 5.0. If at some point during the review
process it looks risky and there needs to be significant rework we can
always stop. It's on the list of patches to be reviewed and I will
find some dedicated time later this week to set down and review / play
with the patches in an attempt to move this forward as it is a
reasonably large chunk of work.

Thanks for continuing to work on these patches and addressing the
earlier review comments.

regards
Ramana

>
> Christian
>

[PATCH] gimple_{build_assign,assign_set_rhs}_with_ops* cleanup

2014-11-19 Thread Jakub Jelinek

Hi!

This patch:
1) adds unary op overload to gimple_build_assign_with_ops
   so that many callers don't need to pass NULL_TREE as the last
   argument explicitly
2) adds unary op overload to gimple_assign_set_rhs_with_ops for similar
   reasons
3) renames gimple_assign_set_rhs_with_ops_1 to
   gimple_assign_set_rhs_with_ops so it becomes ternary op overload
   to the existing binary op one and 2) added unary op
and adjusts all users that I found.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-11-19  Jakub Jelinek  

* gimple.h (gimple_build_assign_with_ops): Add unary arg overload.
(gimple_assign_set_rhs_with_ops_1): Renamed to ...
(gimple_assign_set_rhs_with_ops): ... this.  Adjust binary arg
inline overload to use it.  Add unary arg overload.
* gimple.c (gimple_build_assign_with_ops): New unary arg overload.
(gimple_assign_set_rhs_from_tree): Use
gimple_assign_set_rhs_with_ops instead of
gimple_assign_set_rhs_with_ops_1.
(gimple_assign_set_rhs_with_ops_1): Renamed to ...
(gimple_assign_set_rhs_with_ops): ... this.
* ipa-split.c (split_function): Remove last NULL argument
from gimple_build_assign_with_ops call.
* tree-ssa-loop-im.c
(move_computations_dom_walker::before_dom_children): Likewise.
* tsan.c (instrument_builtin_call): Likewise.
* tree-vect-stmts.c (vect_init_vector, vectorizable_mask_load_store,
vectorizable_conversion, vectorizable_load): Likewise.
* tree-vect-loop.c (vect_is_simple_reduction_1,
get_initial_def_for_induction): Likewise.
* tree-loop-distribution.c (generate_memset_builtin): Likewise.
* tree-vect-patterns.c (vect_handle_widen_op_by_const,
vect_recog_widen_mult_pattern, vect_operation_fits_smaller_type,
vect_recog_over_widening_pattern, vect_recog_rotate_pattern,
vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern,
vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast,
adjust_bool_pattern, vect_recog_bool_pattern): Likewise.
* tree-ssa-phiopt.c (conditional_replacement, abs_replacement,
neg_replacement): Likewise.
* asan.c (build_shadow_mem_access, maybe_create_ssa_name,
maybe_cast_to_ptrmode, asan_expand_check_ifn): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* omp-low.c (lower_rec_input_clauses, expand_omp_for_generic,
expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
simd_clone_adjust): Likewise.
* tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise.
* gimple-ssa-strength-reduction.c (introduce_cast_before_cand,
replace_one_candidate): Likewise.
* gimple-builder.c (build_type_cast): Likewise.
* tree-ssa-forwprop.c (simplify_rotate): Likewise.
(forward_propagate_addr_expr_1): Remove last NULL argument
from gimple_assign_set_rhs_with_ops call.
(simplify_vector_constructor): Use gimple_assign_set_rhs_with_ops
instead of gimple_assign_set_rhs_with_ops_1.
* tree-ssa-reassoc.c (maybe_optimize_range_tests): Remove last NULL
argument from gimple_build_assign_with_ops call.
(repropagate_negates): Remove last NULL argument from
gimple_assign_set_rhs_with_ops call.
* ubsan.c (ubsan_expand_null_ifn, ubsan_expand_objsize_ifn): Remove
last NULL argument from gimple_build_assign_with_ops call.
(instrument_bool_enum_load): Likewise.  Remove last NULL argument
from gimple_assign_set_rhs_with_ops call.
* tree-ssa-math-opts.c (build_and_insert_cast, convert_mult_to_fma):
Remove last NULL argument from gimple_build_assign_with_ops call.
(bswap_replace): Likewise.  Use gimple_assign_set_rhs_with_ops instead
of gimple_assign_set_rhs_with_ops_1.
(convert_plusminus_to_widen): Use gimple_assign_set_rhs_with_ops
instead of gimple_assign_set_rhs_with_ops_1.
* gimple-fold.c (replace_stmt_with_simplification): Likewise.
(rewrite_to_defined_overflow, gimple_build): Remove last NULL argument
from gimple_build_assign_with_ops call.
* tree-ssa-strlen.c (handle_pointer_plus): Remove last NULL argument
from gimple_assign_set_rhs_with_ops call.
* tree-vrp.c (simplify_truth_ops_using_ranges,
simplify_bit_ops_using_ranges): Remove last NULL argument from
gimple_assign_set_rhs_with_ops call.
(simplify_float_conversion_using_ranges,
simplify_internal_call_using_ranges): Remove last NULL argument from
gimple_build_assign_with_ops call.

--- gcc/gimple.h.jj 2014-11-19 10:45:17.780769013 +0100
+++ gcc/gimple.h2014-11-19 10:52:52.118574835 +0100
@@ -1180,6 +1180,8 @@ gimple gimple_build_assign_with_ops (enu
 tree, tree, tree CXX_MEM_STAT_IN

Re: [PATCH] Fix sanitizer/63690

2014-11-19 Thread Marek Polacek

On Wed, Nov 19, 2014 at 03:12:05PM +0100, Jakub Jelinek wrote:
> On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote:
> > As the testcase shows, we should really check first that we
> > have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
> > On an invalid testcase we might get e.g. STRING_CST.
> 
> Well, addr_object_size should handle STRING_CSTs just fine, so
> why not handle STRING_CSTs in there too (similarly to decl_p = true
> case).
> Supposedly you want decl_p = true for them and need to tweak
>   if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base))
> but otherwise it should work.
> On the other side,
> int
> foo (int x)
> {
>   return "foobar"[x];
> }
> is already handled by -fsanitize=bounds and for some reason

Right - I've checked that this is the case.  I could mention
that in the original submission.

> gimple_assign_load_p doesn't consider handled components with
> STRING_CST as base as loads so for -fsanitize=object-size
> we aren't called for that.

Thanks for review,

Marek

RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Matthew Fortune

> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index
> 02268f3..368c6f0 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx
> *operands)
>   This will sometimes be a delayed branch; see the write code below
>   for details.  */
>mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at,
> mem, NULL);
> -  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
> +
> +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the
> delay slot
> + of the branch if it is a branch likely because they will not
> execute when
> + the atomic operation is successful, so place a NOP in there
> + instead.  */

Please rephrase the comment along the lines of my previous suggestion.
This wording is too complex IMO.

Matthew

RE: [PATCH][AArch64] Adjust generic move costs

2014-11-19 Thread Wilco Dijkstra

Hi Jiong,

Can you commit this please?

2014-11-19  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (generic_regmove_cost):
Increase FP move cost (PR61915).


---
 gcc/config/aarch64/aarch64.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index de53c94..cd30724 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -218,8 +218,10 @@ __extension__
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   NAMED_PARAM (GP2GP, 1),
-  NAMED_PARAM (GP2FP, 2),
-  NAMED_PARAM (FP2GP, 2),
+  /* Avoid the use of slow int<->fp moves for spilling by setting
+ their cost higher than memmov_cost.  */
+  NAMED_PARAM (GP2FP, 5),
+  NAMED_PARAM (FP2GP, 5),
   NAMED_PARAM (FP2FP, 2)
 };
 
-- 
1.9.1

[patch, committed]

2014-11-19 Thread FX

Committed as pre-approved by Jakub in PR62132.
It took me three commits (revisions ) to get it right. I apologize for that, 
and I’ll stop coding for the day :(



2014-11-19  Francois-Xavier Coudert  

PR sanitizer/62132
* c-c++-common/asan/misalign-1.c: Pass -fno-omit-frame-pointer on
darwin, adjust dg-output.
* c-c++-common/asan/misalign-2.c: Likewise.


Index: c-c++-common/asan/misalign-1.c
===
--- c-c++-common/asan/misalign-1.c  (revision 217694)
+++ c-c++-common/asan/misalign-1.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run { target { ilp32 || lp64 } } } */
 /* { dg-options "-O2" } */
+/* { dg-additional-options "-fno-omit-frame-pointer" { target *-*-darwin* } } 
*/
 /* { dg-shouldfail "asan" } */
 
 struct S { int i; } __attribute__ ((packed));
@@ -38,5 +39,5 @@ main ()
 /* { dg-output "ERROR: AddressSanitizer:\[^\n\r]*on address\[^\n\r]*" } */
 /* { dg-output "0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 
0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread 
T0\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#0 0x\[0-9a-f\]+ (in 
_*foo(\[^\n\r]*misalign-1.c:10|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-1.c:34|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
+/* { dg-output "#0 0x\[0-9a-f\]+ (in 
_*foo(\[^\n\r]*misalign-1.c:1\[01]|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } 
*/
+/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-1.c:3\[45]|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
Index: c-c++-common/asan/misalign-2.c
===
--- c-c++-common/asan/misalign-2.c  (revision 217694)
+++ c-c++-common/asan/misalign-2.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run { target { ilp32 || lp64 } } } */
 /* { dg-options "-O2" } */
+/* { dg-additional-options "-fno-omit-frame-pointer" { target *-*-darwin* } } 
*/
 /* { dg-shouldfail "asan" } */
 
 struct S { int i; } __attribute__ ((packed));
@@ -38,5 +39,5 @@ main ()
 /* { dg-output "ERROR: AddressSanitizer:\[^\n\r]*on address\[^\n\r]*" } */
 /* { dg-output "0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 
0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread 
T0\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#0 0x\[0-9a-f\]+ (in 
_*baz(\[^\n\r]*misalign-2.c:22|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-2.c:34|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
+/* { dg-output "#0 0x\[0-9a-f\]+ (in 
_*baz(\[^\n\r]*misalign-2.c:2\[23]|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } 
*/
+/* { dg-output "#1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-2.c:3\[45]|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */

Re: [PATCH] gimple_{build_assign,assign_set_rhs}_with_ops* cleanup

2014-11-19 Thread Richard Biener

On Wed, 19 Nov 2014, Jakub Jelinek wrote:

> Hi!
> 
> This patch:
> 1) adds unary op overload to gimple_build_assign_with_ops
>so that many callers don't need to pass NULL_TREE as the last
>argument explicitly
> 2) adds unary op overload to gimple_assign_set_rhs_with_ops for similar
>reasons
> 3) renames gimple_assign_set_rhs_with_ops_1 to
>gimple_assign_set_rhs_with_ops so it becomes ternary op overload
>to the existing binary op one and 2) added unary op
> and adjusts all users that I found.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-11-19  Jakub Jelinek  
> 
>   * gimple.h (gimple_build_assign_with_ops): Add unary arg overload.
>   (gimple_assign_set_rhs_with_ops_1): Renamed to ...
>   (gimple_assign_set_rhs_with_ops): ... this.  Adjust binary arg
>   inline overload to use it.  Add unary arg overload.
>   * gimple.c (gimple_build_assign_with_ops): New unary arg overload.
>   (gimple_assign_set_rhs_from_tree): Use
>   gimple_assign_set_rhs_with_ops instead of
>   gimple_assign_set_rhs_with_ops_1.
>   (gimple_assign_set_rhs_with_ops_1): Renamed to ...
>   (gimple_assign_set_rhs_with_ops): ... this.
>   * ipa-split.c (split_function): Remove last NULL argument
>   from gimple_build_assign_with_ops call.
>   * tree-ssa-loop-im.c
>   (move_computations_dom_walker::before_dom_children): Likewise.
>   * tsan.c (instrument_builtin_call): Likewise.
>   * tree-vect-stmts.c (vect_init_vector, vectorizable_mask_load_store,
>   vectorizable_conversion, vectorizable_load): Likewise.
>   * tree-vect-loop.c (vect_is_simple_reduction_1,
>   get_initial_def_for_induction): Likewise.
>   * tree-loop-distribution.c (generate_memset_builtin): Likewise.
>   * tree-vect-patterns.c (vect_handle_widen_op_by_const,
>   vect_recog_widen_mult_pattern, vect_operation_fits_smaller_type,
>   vect_recog_over_widening_pattern, vect_recog_rotate_pattern,
>   vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern,
>   vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast,
>   adjust_bool_pattern, vect_recog_bool_pattern): Likewise.
>   * tree-ssa-phiopt.c (conditional_replacement, abs_replacement,
>   neg_replacement): Likewise.
>   * asan.c (build_shadow_mem_access, maybe_create_ssa_name,
>   maybe_cast_to_ptrmode, asan_expand_check_ifn): Likewise.
>   * tree-vect-slp.c (vect_get_constant_vectors): Likewise.
>   * omp-low.c (lower_rec_input_clauses, expand_omp_for_generic,
>   expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
>   simd_clone_adjust): Likewise.
>   * tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise.
>   * gimple-ssa-strength-reduction.c (introduce_cast_before_cand,
>   replace_one_candidate): Likewise.
>   * gimple-builder.c (build_type_cast): Likewise.
>   * tree-ssa-forwprop.c (simplify_rotate): Likewise.
>   (forward_propagate_addr_expr_1): Remove last NULL argument
>   from gimple_assign_set_rhs_with_ops call.
>   (simplify_vector_constructor): Use gimple_assign_set_rhs_with_ops
>   instead of gimple_assign_set_rhs_with_ops_1.
>   * tree-ssa-reassoc.c (maybe_optimize_range_tests): Remove last NULL
>   argument from gimple_build_assign_with_ops call.
>   (repropagate_negates): Remove last NULL argument from
>   gimple_assign_set_rhs_with_ops call.
>   * ubsan.c (ubsan_expand_null_ifn, ubsan_expand_objsize_ifn): Remove
>   last NULL argument from gimple_build_assign_with_ops call.
>   (instrument_bool_enum_load): Likewise.  Remove last NULL argument
>   from gimple_assign_set_rhs_with_ops call.
>   * tree-ssa-math-opts.c (build_and_insert_cast, convert_mult_to_fma):
>   Remove last NULL argument from gimple_build_assign_with_ops call.
>   (bswap_replace): Likewise.  Use gimple_assign_set_rhs_with_ops instead
>   of gimple_assign_set_rhs_with_ops_1.
>   (convert_plusminus_to_widen): Use gimple_assign_set_rhs_with_ops
>   instead of gimple_assign_set_rhs_with_ops_1.
>   * gimple-fold.c (replace_stmt_with_simplification): Likewise.
>   (rewrite_to_defined_overflow, gimple_build): Remove last NULL argument
>   from gimple_build_assign_with_ops call.
>   * tree-ssa-strlen.c (handle_pointer_plus): Remove last NULL argument
>   from gimple_assign_set_rhs_with_ops call.
>   * tree-vrp.c (simplify_truth_ops_using_ranges,
>   simplify_bit_ops_using_ranges): Remove last NULL argument from
>   gimple_assign_set_rhs_with_ops call.
>   (simplify_float_conversion_using_ranges,
>   simplify_internal_call_using_ranges): Remove last NULL argument from
>   gimple_build_assign_with_ops call.
> 
> --- gcc/gimple.h.jj   2014-11-19 10:45:17.780769013 +0100
> +++ gcc/gimple.h  2014-11-19 10:52:52.118574835 +0100
> @@ -1180,6 +1180,8 @@ gimple g

RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Maciej W. Rozycki

On Wed, 19 Nov 2014, Matthew Fortune wrote:

> > @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx
> > *operands)
> >   This will sometimes be a delayed branch; see the write code below
> >   for details.  */
> >mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at,
> > mem, NULL);
> > -  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
> > +
> > +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the
> > delay slot
> > + of the branch if it is a branch likely because they will not
> > execute when
> > + the atomic operation is successful, so place a NOP in there
> > + instead.  */
> 
> Please rephrase the comment along the lines of my previous suggestion.
> This wording is too complex IMO.

 Also I'm not sure if it's an enforced rule for GCC sources (it is for 
GDB for example; someone please chime in if I'm missing something here), 
but can you take the opportunity and limit the span of these comment 
lines a bit, like to 74 or maybe even 72 columns, similarly to the rule 
for ChangeLog entries, to make them more readable.  Thanks.

  Maciej

Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)

2014-11-19 Thread Richard Biener

On Thu, Nov 13, 2014 at 7:57 PM, Ulrich Weigand  wrote:
> Richard Henderson wrote:
>> On 11/12/2014 09:41 PM, Ulrich Weigand wrote:
>> > * optabs.c (prepare_operand): Gracefully fail if the mode of X
>> > does not match the operand mode expected by the insn pattern.
>>
>> This is ok.
>
> I've checked this in now, thanks.
>
>> I wondered whether s390 benefit from following i386 in allowing
>> mov<>cc to accept immediate operands pre-reload.  This could then
>> be re-used by cstorecc4 to allow any CC mode to expand to LOCR
>> when available.
>
> Well, we already compile
>
> int test (long x, long y)
> {
>   return x < y;
> }
>
> to
>
> cgr %r2,%r3 # 44*cmpdi_ccs/1[length = 4]
> lhi %r1,0   # 43*movsi_zarch/1  [length = 4]
> lhi %r2,1   # 7 *movsi_zarch/1  [length = 4]
> locrnl  %r2,%r1 # 45*movsicc/2  [length = 4]
> lgfr%r2,%r2 # 17*extendsidi2/1  [length = 4]
> br  %r14# 50return  [length = 2]
>
> with -march=z196 or higher.  The LOCR is not introduced via
> cstorecc4, but instead directly by noce_emit_cmove via
> emit_conditional_move.  It doesn't seem to matter that the
> patterns don't accept immediate operands; those are loaded
> into registers by maybe_legitimize_operand, which already
> tries a copy_to_mode_reg.

This probably caused bootstrap on s390x-linux to fail as in PR63952
(last checked with rev. 217714).

Richard.

> Bye,
> Ulrich
>
> --
>   Dr. Ulrich Weigand
>   GNU/Linux compilers and toolchain
>   ulrich.weig...@de.ibm.com
>

Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Christian Bruel




On 11/19/2014 03:18 PM, Ramana Radhakrishnan wrote:

On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel  wrote:


I think I missed the stage3, Anyway would it be OK for stage1 when it
reopens ?


Since you submitted this well during stage1 and given that these
patches address comments from earlier in the review process we should
aim to get these in for 5.0. If at some point during the review
process it looks risky and there needs to be significant rework we can
always stop. It's on the list of patches to be reviewed and I will
find some dedicated time later this week to set down and review / play
with the patches in an attempt to move this forward as it is a
reasonably large chunk of work.


Thanks, also I forgot to mention that you need
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01231.html
to play with the attribute. Will be part of the same shot.

Christian



Thanks for continuing to work on these patches and addressing the
earlier review comments.

regards
Ramana



Christian

[COMMITTED 1/3] Make TARGET_STATIC_CHAIN allow a function type

2014-11-19 Thread Richard Henderson

As opposed to always being a decl.  This is a prerequisite
to allowing the static chain to be loaded for indirect calls.

* targhooks.c (default_static_chain): Remove check for
DECL_STATIC_CHAIN.
* config/moxie/moxie.c (moxie_static_chain): Likewise.
* config/i386/i386.c (ix86_static_chain): Allow decl or type
as the first argument.
* config/xtensa/xtensa.c (xtensa_static_chain): Change the name
of the unused first parameter.
* doc/tm.texi (TARGET_STATIC_CHAIN): Document the first parameter
may be a type.
* target.def (static_chain): Likewise.
---
 gcc/ChangeLog  | 13 +
 gcc/config/i386/i386.c | 19 +--
 gcc/config/moxie/moxie.c   |  5 +
 gcc/config/xtensa/xtensa.c |  2 +-
 gcc/doc/tm.texi|  2 +-
 gcc/target.def |  6 +++---
 gcc/targhooks.c|  5 +
 7 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3166e03..3b41de2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -27356,13 +27356,10 @@ ix86_minimum_alignment (tree exp, machine_mode mode,
This is a register, unless all free registers are used by arguments.  */
 
 static rtx
-ix86_static_chain (const_tree fndecl, bool incoming_p)
+ix86_static_chain (const_tree fndecl_or_type, bool incoming_p)
 {
   unsigned regno;
 
-  if (!DECL_STATIC_CHAIN (fndecl))
-return NULL;
-
   if (TARGET_64BIT)
 {
   /* We always use R10 in 64-bit mode.  */
@@ -27370,13 +27367,23 @@ ix86_static_chain (const_tree fndecl, bool incoming_p)
 }
   else
 {
-  tree fntype;
+  const_tree fntype, fndecl;
   unsigned int ccvt;
 
   /* By default in 32-bit mode we use ECX to pass the static chain.  */
   regno = CX_REG;
 
-  fntype = TREE_TYPE (fndecl);
+  if (TREE_CODE (fndecl_or_type) == FUNCTION_DECL)
+   {
+  fntype = TREE_TYPE (fndecl_or_type);
+ fndecl = fndecl_or_type;
+   }
+  else
+   {
+ fntype = fndecl_or_type;
+ fndecl = NULL;
+   }
+
   ccvt = ix86_get_callcvt (fntype);
   if ((ccvt & IX86_CALLCVT_FASTCALL) != 0)
{
diff --git a/gcc/config/moxie/moxie.c b/gcc/config/moxie/moxie.c
index d4688d9..148d26b 100644
--- a/gcc/config/moxie/moxie.c
+++ b/gcc/config/moxie/moxie.c
@@ -528,13 +528,10 @@ moxie_arg_partial_bytes (cumulative_args_t cum_v,
 /* Worker function for TARGET_STATIC_CHAIN.  */
 
 static rtx
-moxie_static_chain (const_tree fndecl, bool incoming_p)
+moxie_static_chain (const_tree ARG_UNUSED (fndecl_or_type), bool incoming_p)
 {
   rtx addr, mem;
 
-  if (!DECL_STATIC_CHAIN (fndecl))
-return NULL;
-
   if (incoming_p)
 addr = plus_constant (Pmode, arg_pointer_rtx, 2 * UNITS_PER_WORD);
   else
diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index 159a1a7..a0025a5 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -3626,7 +3626,7 @@ xtensa_function_value_regno_p (const unsigned int regno)
expressions that denote where they are stored.  */
 
 static rtx
-xtensa_static_chain (const_tree ARG_UNUSED (fndecl), bool incoming_p)
+xtensa_static_chain (const_tree ARG_UNUSED (fndecl_or_type), bool incoming_p)
 {
   rtx base = incoming_p ? arg_pointer_rtx : stack_pointer_rtx;
   return gen_frame_mem (Pmode, plus_constant (Pmode, base,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0d3a9fd..5b9da47 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3462,7 +3462,7 @@ If the static chain is passed in memory, these macros 
should not be
 defined; instead, the @code{TARGET_STATIC_CHAIN} hook should be used.
 @end defmac
 
-@deftypefn {Target Hook} rtx TARGET_STATIC_CHAIN (const_tree @var{fndecl}, 
bool @var{incoming_p})
+@deftypefn {Target Hook} rtx TARGET_STATIC_CHAIN (const_tree 
@var{fndecl_or_type}, bool @var{incoming_p})
 This hook replaces the use of @code{STATIC_CHAIN_REGNUM} et al for
 targets that may use different static chain locations for different
 nested functions.  This may be required if the target has function
diff --git a/gcc/target.def b/gcc/target.def
index bc5160d..dc48ae6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4516,8 +4516,8 @@ false for naked functions.  The default implementation 
always returns true.",
  bool, (void),
  hook_bool_void_true)
 
-/* Return an rtx for the static chain for FNDECL.  If INCOMING_P is true,
-   then it should be for the callee; otherwise for the caller.  */
+/* Return an rtx for the static chain for FNDECL_OR_TYPE.  If INCOMING_P
+   is true, then it should be for the callee; otherwise for the caller.  */
 DEFHOOK
 (static_chain,
  "This hook replaces the use of @code{STATIC_CHAIN_REGNUM} et al for\n\
@@ -4539,7 +4539,7 @@ will be at an offset from the frame pointer.\n\
 The variables @code{stack_pointer_rtx}, @code{frame_pointer_rtx}, and\n\
 @code{arg_pointer_rtx} will have been initiali

[COMMITTED 0/3] Allow front end use of the static chain

2014-11-19 Thread Richard Henderson

This is the middle-end and C front end parts of the gccgo related
patch set first posted here.

  https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01009.html

I've re-tested these patches in isolation and have committed them
to mainline.  I'm currently submitting the Go portions for the
upstream review, and hope to have to libffi merge complete soon.

I think it's important that the change to Go using the static chain
happen before gcc5, as it represents an ABI change.  Our next opportunity
to fix the bulk of the non-x86 targets would be gcc6 in a year's time.


r~


Richard Henderson (3):
  Make TARGET_STATIC_CHAIN allow a function type
  Allow the front-end to create calls with a static chain
  Allow the static chain to be set from C

 gcc/ChangeLog| 27 ++
 gcc/c-family/c-common.c  |  2 ++
 gcc/c-family/c-common.h  |  2 +-
 gcc/c/c-parser.c | 40 ++
 gcc/calls.c  | 14 --
 gcc/config/i386/i386.c   | 19 --
 gcc/config/moxie/moxie.c |  5 +
 gcc/config/xtensa/xtensa.c   |  2 +-
 gcc/doc/extend.texi  | 13 +
 gcc/doc/tm.texi  |  2 +-
 gcc/gimple-fold.c| 21 
 gcc/gimplify.c   | 17 +++-
 gcc/target.def   |  6 +++---
 gcc/targhooks.c  |  5 +
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gcc.dg/cwsc0.c | 16 +++
 gcc/testsuite/gcc.dg/cwsc1.c | 46 
 gcc/tree-cfg.c   | 22 +++--
 18 files changed, 222 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c
 create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c

-- 
1.9.3

[COMMITTED 3/3] Allow the static chain to be set from C

2014-11-19 Thread Richard Henderson

We need to be able to set the static chain on a few calls within the
Go runtime, so expose this with __builtin_call_with_static_chain.

* c-family/c-common.c (c_common_reswords): Add
__builtin_call_with_static_chain.
* c-family/c-common.h (RID_BUILTIN_CALL_WITH_STATIC_CHAIN): New.
* c/c-parser.c (c_parser_postfix_expression): Handle it.
* doc/extend.texi (__builtin_call_with_static_chain): Document it.

* gcc.dg/cwsc0.c: New test.
* gcc.dg/cwsc1.c: New test.
---
 gcc/ChangeLog|  6 ++
 gcc/c-family/c-common.c  |  2 ++
 gcc/c-family/c-common.h  |  2 +-
 gcc/c/c-parser.c | 40 ++
 gcc/doc/extend.texi  | 13 +
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gcc.dg/cwsc0.c | 16 +++
 gcc/testsuite/gcc.dg/cwsc1.c | 46 
 8 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c
 create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 839111a..95b6b1b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -453,6 +453,8 @@ const struct c_common_resword c_common_reswords[] =
   { "__attribute__",   RID_ATTRIBUTE,  0 },
   { "__auto_type", RID_AUTO_TYPE,  D_CONLY },
   { "__bases",  RID_BASES, D_CXXONLY },
+  { "__builtin_call_with_static_chain",
+RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ca6fc8b..7e53923 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -101,7 +101,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, 
RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
-  RID_FRACT, RID_ACCUM, RID_AUTO_TYPE,
+  RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, RID_BUILTIN_CALL_WITH_STATIC_CHAIN,
 
   /* C11 */
   RID_ALIGNAS, RID_GENERIC,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index f90f6af..8a4fd39 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7414,6 +7414,46 @@ c_parser_postfix_expression (c_parser *parser)
  = comptypes (e1, e2) ? integer_one_node : integer_zero_node;
  }
  break;
+   case RID_BUILTIN_CALL_WITH_STATIC_CHAIN:
+ {
+   vec *cexpr_list;
+   c_expr_t *e2_p;
+   tree chain_value;
+
+   c_parser_consume_token (parser);
+   if (!c_parser_get_builtin_args (parser,
+   "__builtin_call_with_static_chain",
+   &cexpr_list, false))
+ {
+   expr.value = error_mark_node;
+   break;
+ }
+   if (vec_safe_length (cexpr_list) != 2)
+ {
+   error_at (loc, "wrong number of arguments to "
+  "%<__builtin_call_with_static_chain%>");
+   expr.value = error_mark_node;
+   break;
+ }
+
+   expr = (*cexpr_list)[0];
+   e2_p = &(*cexpr_list)[1];
+   *e2_p = convert_lvalue_to_rvalue (loc, *e2_p, true, true);
+   chain_value = e2_p->value;
+   mark_exp_read (chain_value);
+
+   if (TREE_CODE (expr.value) != CALL_EXPR)
+ error_at (loc, "first argument to "
+   "%<__builtin_call_with_static_chain%> "
+   "must be a call expression");
+   else if (TREE_CODE (TREE_TYPE (chain_value)) != POINTER_TYPE)
+ error_at (loc, "second argument to "
+   "%<__builtin_call_with_static_chain%> "
+   "must be a pointer type");
+   else
+ CALL_EXPR_STATIC_CHAIN (expr.value) = chain_value;
+   break;
+ }
case RID_BUILTIN_COMPLEX:
  {
vec *cexpr_list;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d10a815..7178c9a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8913,6 +8913,7 @@ in the Cilk Plus language manual which can be found at
 @node Other Builtins
 @section Other Built-in Functions Provided by GCC
 @cindex built-in functions
+@findex __builtin_call_with_static_chain
 @findex __builtin_fpclassify
 @findex __builtin_isfinite
 @findex __builtin_isnormal
@@ -9501,6 +9502,18 @@ depending on the arguments' types.  For example:
 
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_call_with_static_chain 
(@var{call_exp}, @var{pointer_exp})
+
+The @var{call_exp} expression must be a function call, and the
+@var{pointer_exp} expression must be a pointer.  The @var{p

[COMMITTED 2/3] Allow the front-end to create calls with a static chain

2014-11-19 Thread Richard Henderson

And, at the same time, allow indirect calls to have a static chain.
We'll always eliminate the static chain if we can prove it's unused.

* calls.c (prepare_call_address): Allow decl or type for first arg.
(expand_call): Pass type to prepare_call_address if no decl.
* gimple-fold.c (gimple_fold_call): Eliminate the static chain if
the function doesn't use it; fold it otherwise.
* gimplify.c (gimplify_call_expr): Gimplify the static chain.
* tree-cfg.c (verify_gimple_call): Allow a static chain on indirect
function calls.
---
 gcc/ChangeLog |  8 
 gcc/calls.c   | 14 --
 gcc/gimple-fold.c | 21 +
 gcc/gimplify.c| 17 -
 gcc/tree-cfg.c| 22 +++---
 5 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 7f55aaf..c64c0eb 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -197,7 +197,7 @@ static void restore_fixed_argument_area (rtx, rtx, int, 
int);
CALL_INSN_FUNCTION_USAGE information.  */
 
 rtx
-prepare_call_address (tree fndecl, rtx funexp, rtx static_chain_value,
+prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
  rtx *call_fusage, int reg_parm_seen, int sibcallp)
 {
   /* Make a valid memory address and copy constants through pseudo-regs,
@@ -217,12 +217,13 @@ prepare_call_address (tree fndecl, rtx funexp, rtx 
static_chain_value,
 #endif
 }
 
-  if (static_chain_value != 0)
+  if (static_chain_value != 0
+  && (TREE_CODE (fndecl_or_type) != FUNCTION_DECL
+ || DECL_STATIC_CHAIN (fndecl_or_type)))
 {
   rtx chain;
 
-  gcc_assert (fndecl);
-  chain = targetm.calls.static_chain (fndecl, false);
+  chain = targetm.calls.static_chain (fndecl_or_type, false);
   static_chain_value = convert_memory_address (Pmode, static_chain_value);
 
   emit_move_insn (chain, static_chain_value);
@@ -3278,8 +3279,9 @@ expand_call (tree exp, rtx target, int ignore)
}
 
   after_args = get_last_insn ();
-  funexp = prepare_call_address (fndecl, funexp, static_chain_value,
-&call_fusage, reg_parm_seen, pass == 0);
+  funexp = prepare_call_address (fndecl ? fndecl : fntype, funexp,
+static_chain_value, &call_fusage,
+reg_parm_seen, pass == 0);
 
   load_register_parameters (args, num_actuals, &call_fusage, flags,
pass == 0, &sibcall_failure);
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index acdadcd..4f716b2 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2723,6 +2723,27 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool 
inplace)
}
 }
 
+  /* Check for indirect calls that became direct calls, and then
+ no longer require a static chain.  */
+  if (gimple_call_chain (stmt))
+{
+  tree fn = gimple_call_fndecl (stmt);
+  if (fn && !DECL_STATIC_CHAIN (fn))
+   {
+ gimple_call_set_chain (stmt, NULL);
+ changed = true;
+   }
+  else
+   {
+ tree tmp = maybe_fold_reference (gimple_call_chain (stmt), false);
+ if (tmp)
+   {
+ gimple_call_set_chain (stmt, tmp);
+ changed = true;
+   }
+   }
+}
+
   if (inplace)
 return changed;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 93c06de..c46fb66 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2432,7 +2432,7 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool 
want_value)
}
 }
 
-  /* Finally, gimplify the function arguments.  */
+  /* Gimplify the function arguments.  */
   if (nargs > 0)
 {
   for (i = (PUSH_ARGS_REVERSED ? nargs - 1 : 0);
@@ -2454,6 +2454,21 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, 
bool want_value)
 }
 }
 
+  /* Gimplify the static chain.  */
+  if (CALL_EXPR_STATIC_CHAIN (*expr_p))
+{
+  if (fndecl && !DECL_STATIC_CHAIN (fndecl))
+   CALL_EXPR_STATIC_CHAIN (*expr_p) = NULL;
+  else
+   {
+ enum gimplify_status t;
+ t = gimplify_arg (&CALL_EXPR_STATIC_CHAIN (*expr_p), pre_p,
+   EXPR_LOCATION (*expr_p));
+ if (t == GS_ERROR)
+   ret = GS_ERROR;
+   }
+}
+
   /* Verify the function result.  */
   if (want_value && fndecl
   && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fnptrtype
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 9dd8961..71a828c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -,22 +,14 @@ verify_gimple_call (gimple stmt)
   return true;
 }
 
-  /* If there is a static chain argument, this should not be an indirect
- call, and the decl should have DECL_STATIC_CHAIN set.  */
-  if (gimple_call_chain (stmt))
+  /* If there is a static chain argument, the call should either be
+ indirect, or the decl should

[PATCH] Factor and export a const_unop, export const_binop

2014-11-19 Thread Richard Biener


The following patch is to avoid needlessly recursing to generic_simplify
from fold_unary when we ask to perform constant folding.  Similar
fold_binary is currently entered to constant fold from 
gimple/generic_simplify and will do extra work in case it isn't able
to fold the constant operation.

To mitigate that (and still missing the fold_ternary case - to be done
as followup in case this gets accepted at this point) - I factor and
export a const_unop and provide an exported const_binop which also
knows to dispatch to fold_relational_const.  These API are intended
to provide pure constant folding (that is, all-constant operands,
not only a constant result).

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok at this point?
(it should fix a tiny compile-time regression)

I'll followup with a const_ternop (better name?) if ok.

Thanks,
Richard.

2014-11-19  Richard Biener  

* fold-const.h (const_unop): Declare.
(const_binop): Likewise.
* fold-const.c (const_binop): Export overload that expects
a type parameter and dispatches to fold_relational_const as well.
Check both operand kinds for guarding the transforms.
(const_unop): New function, with constant folding from fold_unary_loc.
(fold_unary_loc): Dispatch to const_unop for tcc_constant operand.
Remove constant folding done there from the simplifications.
(fold_binary_loc): Check for constants using CONSTANT_CLASS_P.
(fold_negate_expr): Remove dead code from the REAL_CST case.
Avoid building garbage in the COMPLEX_CST case.
* gimple-match-head.c (gimple_resimplify1): Dispatch to
const_unop.
(gimple_resimplify2): Dispatch to const_binop.
(gimple_simplify): Likewise.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-11-18 15:16:04.263514898 +0100
--- gcc/fold-const.c2014-11-19 13:22:24.866032427 +0100
*** static bool negate_expr_p (tree);
*** 115,121 
  static tree negate_expr (tree);
  static tree split_tree (tree, enum tree_code, tree *, tree *, tree *, int);
  static tree associate_trees (location_t, tree, tree, enum tree_code, tree);
- static tree const_binop (enum tree_code, tree, tree);
  static enum comparison_code comparison_to_compcode (enum tree_code);
  static enum tree_code compcode_to_comparison (enum comparison_code);
  static int operand_equal_for_comparison_p (tree, tree, tree);
--- 115,120 
*** static tree fold_negate_const (tree, tre
*** 156,161 
--- 155,163 
  static tree fold_not_const (const_tree, tree);
  static tree fold_relational_const (enum tree_code, tree, tree, tree);
  static tree fold_convert_const (enum tree_code, tree, tree);
+ static tree fold_view_convert_expr (tree, tree);
+ static bool vec_cst_ctor_to_array (tree, tree *);
+ 
  
  /* Return EXPR_LOCATION of T if it is not UNKNOWN_LOCATION.
 Otherwise, return LOC.  */
*** fold_negate_expr (location_t loc, tree t
*** 561,570 
  
  case REAL_CST:
tem = fold_negate_const (t, type);
!   /* Two's complement FP formats, such as c4x, may overflow.  */
!   if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
!   return tem;
!   break;
  
  case FIXED_CST:
tem = fold_negate_const (t, type);
--- 563,569 
  
  case REAL_CST:
tem = fold_negate_const (t, type);
!   return tem;
  
  case FIXED_CST:
tem = fold_negate_const (t, type);
*** fold_negate_expr (location_t loc, tree t
*** 572,584 
  
  case COMPLEX_CST:
{
!   tree rpart = negate_expr (TREE_REALPART (t));
!   tree ipart = negate_expr (TREE_IMAGPART (t));
! 
!   if ((TREE_CODE (rpart) == REAL_CST
!&& TREE_CODE (ipart) == REAL_CST)
!   || (TREE_CODE (rpart) == INTEGER_CST
!   && TREE_CODE (ipart) == INTEGER_CST))
  return build_complex (type, rpart, ipart);
}
break;
--- 571,579 
  
  case COMPLEX_CST:
{
!   tree rpart = fold_negate_expr (loc, TREE_REALPART (t));
!   tree ipart = fold_negate_expr (loc, TREE_IMAGPART (t));
!   if (rpart && ipart)
  return build_complex (type, rpart, ipart);
}
break;
*** const_binop (enum tree_code code, tree a
*** 1135,1144 
STRIP_NOPS (arg1);
STRIP_NOPS (arg2);
  
!   if (TREE_CODE (arg1) == INTEGER_CST)
  return int_const_binop (code, arg1, arg2);
  
!   if (TREE_CODE (arg1) == REAL_CST)
  {
machine_mode mode;
REAL_VALUE_TYPE d1;
--- 1130,1139 
STRIP_NOPS (arg1);
STRIP_NOPS (arg2);
  
!   if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
  return int_const_binop (code, arg1, arg2);
  
!   if (TREE_CODE (arg1) == REAL_CST && TREE_CODE (arg2) == REAL_CST)
  {
machine_mode mode;
REAL_VALUE_TYPE d1;
*** const

1 2 3 >

1 - 100 of 231 matches

Mail list logo