[PATCH][4.8] S/390: Transaction optimization fixes

2013-10-04 Thread Andreas Krebbel
Hi,

the attached patch fixes two issues with the TX backend optimization that
tries to get rid of FPR save/restore operations in some cases.

The first is that the stack pointer might get clobbered on 64 bit if
the backend was able to get rid of all the FPR saves/restores and
these were the only things requiring stack space. For more details
please see the testcase comment.

The second is that the optimization did not work as expected on 31 bit,
since the loop setting the FPR save bits handled only the 64-bit
call-saved FPRs; it now uses the call-clobbered register information
instead, so the 31-bit call-saved FPRs are covered as well.

Both issues are covered by the new testcase.

Bootstrapped and regtested on s390 and s390x with --with-arch=zEC12.
No regressions.

I'm going to apply the patch to 4.8 and mainline after waiting a few
days for comments.

Bye,

-Andreas-

2013-10-04  Andreas Krebbel  

* config/s390/s390.c (s390_register_info): Make the call-saved FPR
loop work also for the 31-bit ABI.
Save the stack pointer for frame_size > 0.

2013-10-04  Andreas Krebbel  

* gcc.target/s390/htm-nofloat-2.c: New testcase.

---
 gcc/config/s390/s390.c|   21 
 gcc/testsuite/gcc.target/s390/htm-nofloat-2.c |   55 ++
 2 files changed, 57 insertions(+), 8 deletions(-), 11 modifications(!)

Index: gcc/config/s390/s390.c
===
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*** s390_register_info (int clobbered_regs[]
*** 7509,7516 
  {
cfun_frame_layout.fpr_bitmap = 0;
cfun_frame_layout.high_fprs = 0;
!   if (TARGET_64BIT)
!   for (i = FPR8_REGNUM; i <= FPR15_REGNUM; i++)
  /* During reload we have to use the df_regs_ever_live infos
 since reload is marking FPRs used as spill slots there as
 live before actually making the code changes.  Without
--- 7509,7519 
  {
cfun_frame_layout.fpr_bitmap = 0;
cfun_frame_layout.high_fprs = 0;
! 
!   for (i = FPR0_REGNUM; i <= FPR15_REGNUM; i++)
!   {
! if (call_really_used_regs[i])
!   continue;
  /* During reload we have to use the df_regs_ever_live infos
 since reload is marking FPRs used as spill slots there as
 live before actually making the code changes.  Without
*** s390_register_info (int clobbered_regs[]
*** 7523,7530 
  && !global_regs[i])
{
  cfun_set_fpr_save (i);
! cfun_frame_layout.high_fprs++;
}
  }
  
for (i = 0; i < 16; i++)
--- 7526,7536 
  && !global_regs[i])
{
  cfun_set_fpr_save (i);
! 
! if (i >= FPR8_REGNUM)
!   cfun_frame_layout.high_fprs++;
}
+   }
  }
  
for (i = 0; i < 16; i++)
*** s390_register_info (int clobbered_regs[]
*** 7554,7559 
--- 7560,7566 
|| TARGET_TPF_PROFILING
|| cfun_save_high_fprs_p
|| get_frame_size () > 0
+   || (reload_completed && cfun_frame_layout.frame_size > 0)
|| cfun->calls_alloca
|| cfun->stdarg);
  
*** s390_register_info (int clobbered_regs[]
*** 7652,7665 
cfun_set_fpr_save (i + FPR0_REGNUM);
}
  }
- 
-   if (!TARGET_64BIT)
- {
-   if (df_regs_ever_live_p (FPR4_REGNUM) && !global_regs[FPR4_REGNUM])
-   cfun_set_fpr_save (FPR4_REGNUM);
-   if (df_regs_ever_live_p (FPR6_REGNUM) && !global_regs[FPR6_REGNUM])
-   cfun_set_fpr_save (FPR6_REGNUM);
- }
  }
  
  /* Fill cfun->machine with info about frame of current function.  */
--- 7659,7664 
Index: gcc/testsuite/gcc.target/s390/htm-nofloat-2.c
===
*** /dev/null
--- gcc/testsuite/gcc.target/s390/htm-nofloat-2.c
***
*** 0 
--- 1,55 
+ /* { dg-do run } */
+ /* { dg-options "-O3 -mhtm -Wa,-march=zEC12 --save-temps" } */
+ 
+ /* __builtin_tbegin has to emit clobbers for all FPRs since the tbegin
+instruction does not automatically preserve them.  If the
+transaction body is fully contained in a function the backend tries
+after reload to get rid of the FPR save/restore operations
+triggered by the clobbers.  This testcase failed since the backend
+was able to get rid of all FPR saves/restores and, since these were
+the only stack operations, of the entire stack space as well.  So even
+the save/restore of the stack pointer was omitted in the end.
+However, since the frame layout had already been fixed at that point,
+the prologue still generated the stack pointer decrement, making foo
+return with a modified stack pointer.  */
+ 
+ void abort(void);
+ 
+ void __attribute__((noinline))
+ foo (int a)
+ {
+   /* This is just to prevent the tbegin code from actually being
+  executed.  That way the test may even run on machines prior to
+  z

[SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Oleg Endo
Hello,

Some of the things I've done in 4.8 to improve SH T bit handling turned
out to produce wrong code.  The attached patch fixes that by introducing
an SH specific RTL pass.

Tested on rev 202876 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
Additional test cases will follow.
OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/ifcvt_sh.cc: New SH specific RTL pass.
* config.gcc (SH extra_objs): Add ifcvt_sh.o.
* config/sh/t-sh (ifcvt_sh.o): New entry.
* config/sh/sh.c (sh_fixed_condition_code_regs): New function 
that implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
(register_sh_passes): New function.  Register ifcvt_sh pass.
(sh_option_override): Invoke it.
(sh_canonicalize_comparison): Handle op0_preserve_value.
* sh.md (*cbranch_t): Do not try to optimize missed test and
branch opportunities.  Canonicalize branch condition.
(nott): Allow only if pseudos can be created for non-SH2A.
Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 202876)
+++ gcc/config.gcc	(working copy)
@@ -462,6 +462,7 @@
 	cpu_type=sh
 	need_64bit_hwint=yes
 	extra_options="${extra_options} fused-madd.opt"
+	extra_objs="${extra_objs} ifcvt_sh.o"
 	;;
 v850*-*-*)
 	cpu_type=v850
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 202876)
+++ gcc/config/sh/sh.c	(working copy)
@@ -53,6 +53,9 @@
 #include "alloc-pool.h"
 #include "tm-constrs.h"
 #include "opts.h"
+#include "tree-pass.h"
+#include "pass_manager.h"
+#include "context.h"
 
 #include 
 #include 
@@ -311,6 +314,7 @@
 static void sh_canonicalize_comparison (int *, rtx *, rtx *, bool);
 static void sh_canonicalize_comparison (enum rtx_code&, rtx&, rtx&,
 	enum machine_mode, bool);
+static bool sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2);
 
 static void sh_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
 
@@ -587,6 +591,9 @@
 #undef TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON	sh_canonicalize_comparison
 
+#undef TARGET_FIXED_CONDITION_CODE_REGS
+#define TARGET_FIXED_CONDITION_CODE_REGS sh_fixed_condition_code_regs
+
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION	(SYMBOL_FLAG_MACH_DEP << 0)
 
@@ -710,6 +717,34 @@
 #undef err_ret
 }
 
+/* Register SH specific RTL passes.  */
+extern opt_pass* make_pass_ifcvt_sh (gcc::context* ctx, bool split_insns,
+ const char* name);
+static void
+register_sh_passes (void)
+{
+  if (!TARGET_SH1)
+return;
+
+/* Running the ifcvt_sh pass after ce1 generates better code when
+   comparisons are combined and reg-reg moves are introduced, because
+   reg-reg moves will be eliminated afterwards.  However, there are quite
+   some cases where combine will be unable to fold comparison related insns,
+   thus for now don't do it.
+  register_pass (make_pass_ifcvt_sh (g, false, "ifcvt1_sh"),
+		 PASS_POS_INSERT_AFTER, "ce1", 1);
+*/
+
+  /*  Run ifcvt_sh pass after combine but before register allocation.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt2_sh"),
+		 PASS_POS_INSERT_AFTER, "split1", 1);
+
+  /* Run ifcvt_sh pass after register allocation and basic block reordering
+ as this sometimes creates new opportunities.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt3_sh"),
+		 PASS_POS_INSERT_AFTER, "split4", 1);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE macro.  Validate and override 
various options, and do some machine dependent initialization.  */
 static void
@@ -1022,6 +1057,8 @@
  target CPU.  */
   selected_atomic_model_
 = parse_validate_atomic_model_option (sh_atomic_model_str);
+
+  register_sh_passes ();
 }
 
 /* Print the operand address in x to the stream.  */
@@ -1908,7 +1945,7 @@
 static void
 sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
 			enum machine_mode mode,
-			bool op0_preserve_value ATTRIBUTE_UNUSED)
+			bool op0_preserve_value)
 {
   /* When invoked from within the combine pass the mode is not specified,
  so try to get it from one of the operands.  */
@@ -1928,6 +1965,9 @@
   // Make sure that the constant operand is the second operand.
   if (CONST_INT_P (op0) && !CONST_INT_P (op1))
 {
+  if (op0_preserve_value)
+	return;
+
   std::swap (op0, op1);
   cmp = swap_condition (cmp);
 }
@@ -2016,6 +2056,14 @@
   *code = (int)tmp_code;
 }
 
+bool
+sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2)
+{
+  *p1 = T_REG;
+  *p2 = INVALID_REGNUM;
+  return true;
+}
+
 enum rtx_code
 prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
 			  enum rtx_code comparison)
Index: gcc/config/sh/sh.md
===

[C++ Patch] PR 58560

2013-10-04 Thread Paolo Carlini

Hi,

this error recovery ICE (a low priority regression) can be easily 
avoided by checking the TREE_TYPE of exp too. Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2013-10-04  Paolo Carlini  

PR c++/58560
* typeck2.c (build_functional_cast): Use error_operand_p on exp.

/testsuite
2013-10-04  Paolo Carlini  

PR c++/58560
* g++.dg/cpp0x/auto39.C: New.
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 203200)
+++ cp/typeck2.c(working copy)
@@ -1757,7 +1757,7 @@ build_functional_cast (tree exp, tree parms, tsubs
   tree type;
   vec *parmvec;
 
-  if (exp == error_mark_node || parms == error_mark_node)
+  if (error_operand_p (exp) || parms == error_mark_node)
 return error_mark_node;
 
   if (TREE_CODE (exp) == TYPE_DECL)
Index: testsuite/g++.dg/cpp0x/auto39.C
===
--- testsuite/g++.dg/cpp0x/auto39.C (revision 0)
+++ testsuite/g++.dg/cpp0x/auto39.C (working copy)
@@ -0,0 +1,6 @@
+// PR c++/58560
+// { dg-do compile { target c++11 } }
+
+typedef auto T; // { dg-error "typedef declared 'auto'" }
+
+void foo() { T(); }


Enable SSE math on i386 with -Ofast

2013-10-04 Thread Jan Hubicka
Hi,
this patch makes -Ofast also imply -mfpmath=sse.  It is an important win on
SPECfp (2000 and 2006), even though, for example, the following
float a(float b)
{
   return b+10;
}

results in the somewhat ridiculous
a:
.LFB0:  
.cfi_startproc
subl$4, %esp
.cfi_def_cfa_offset 8
movss   .LC0, %xmm0
addss   8(%esp), %xmm0
movss   %xmm0, (%esp)
flds(%esp)
addl$4, %esp
.cfi_def_cfa_offset 4
ret

I wonder if we can get rid at least of the redundant stack alignment on ESP...

Bootstrapped/regtested x86_64-linux, will commit it over the weekend if there
are no complaints.  I wonder if -ffast-math should do the same - it is
documented as enabling an explicit set of options, but that can be changed
I guess.

* invoke.texi (Ofast): Update documentation.
* i386.h (TARGET_FPMATH_DEFAULT): Enable SSE math with -Ofast.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 203161)
+++ doc/invoke.texi (working copy)
@@ -6796,6 +6796,7 @@ Disregard strict standards compliance.
 valid for all standard-compliant programs.
 It turns on @option{-ffast-math} and the Fortran-specific
 @option{-fno-protect-parens} and @option{-fstack-arrays}.
+On i386 targets it also enables @option{-mfpmath=sse}.
 
 @item -Og
 @opindex Og
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 203161)
+++ config/i386/i386.h  (working copy)
@@ -209,7 +209,8 @@ extern const struct processor_costs ix86
 
 #ifndef TARGET_FPMATH_DEFAULT
 #define TARGET_FPMATH_DEFAULT \
-  (TARGET_64BIT && TARGET_SSE ? FPMATH_SSE : FPMATH_387)
+  ((TARGET_64BIT && TARGET_SSE) \
+   || (TARGET_SSE && optimize_fast) ? FPMATH_SSE : FPMATH_387)
 #endif
 
 #define TARGET_FLOAT_RETURNS_IN_80387 TARGET_FLOAT_RETURNS


Re: [C++ Patch] PR 58448

2013-10-04 Thread Paolo Carlini

... and this is a more straightforward approach. Also tested x86_64-linux.

Thanks!
Paolo.

/
/cp
2013-10-04  Paolo Carlini  

PR c++/58448
* pt.c (tsubst): Use error_operand_p on parameter t.

/testsuite
2013-10-04  Paolo Carlini  

PR c++/58448
* g++.dg/template/crash117.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 203200)
+++ cp/pt.c (working copy)
@@ -11272,7 +11272,7 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   enum tree_code code;
   tree type, r = NULL_TREE;
 
-  if (t == NULL_TREE || t == error_mark_node
+  if (t == NULL_TREE
   || t == integer_type_node
   || t == void_type_node
   || t == char_type_node
@@ -11281,6 +11281,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   || TREE_CODE (t) == TRANSLATION_UNIT_DECL)
 return t;
 
+  if (error_operand_p (t))
+return error_mark_node;
+
   if (DECL_P (t))
 return tsubst_decl (t, args, complain);
 
Index: testsuite/g++.dg/template/crash117.C
===
--- testsuite/g++.dg/template/crash117.C(revision 0)
+++ testsuite/g++.dg/template/crash117.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/58448
+
+class SmallVector; struct Types4;
+template  struct Types {
+  typedef Types4<>::Constructable // { dg-error "template|typedef|expected" }
+} Types:: > // { dg-error "expected" }


Re: patch to canonicalize tree-csts.

2013-10-04 Thread Kenneth Zadeck
So here is a patch with the change. As before, bootstrapped and tested on 
x86-64.


On 10/03/2013 12:16 PM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

Changing the representation of unsigned constants is only worthwhile
if we can avoid the force_to_size for some unsigned cases.  I think we can
avoid it for precision >= xprecision && !small_prec.  Either we should take
the hit of doing that comparison (but see below) or the change isn't
worthwhile.

I think that it is something closer to precision >= xprecision +
HOST_BITS_PER_WIDE_INT && ...
The problem is that the tree cst may have one extra block beyond the
precision.

Ah, OK.


I was thinking that we should always be able to use the constant as-is
for max_wide_int-based and addr_wide_int-based operations.  The small_prec

again, you can get edge-cased to death here.  I think it would work
for max because that really is bigger than anything else, but it is
possible (though unlikely) to have something big converted to an address
by truncation.

But I'd have expected that conversion to be represented by an explicit
CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
something that isn't bit- or byte-address-sized.  It'd be the C equivalent
of int + long -> int rather than the expected int + long -> long.

Same goes for wide_int.  If we're doing arithmetic at a specific
precision, it seems odd for one of the inputs to be wider and yet
not have an explicit truncation.

Thanks,
Richard


Index: gcc/tree.c
===
--- gcc/tree.c	(revision 203039)
+++ gcc/tree.c	(working copy)
@@ -1187,10 +1187,10 @@ wide_int_to_tree (tree type, const wide_
   tree t;
   int ix = -1;
   int limit = 0;
-  int i;
+  unsigned int i;
 
   gcc_assert (type);
-  int prec = TYPE_PRECISION (type);
+  unsigned int prec = TYPE_PRECISION (type);
   signop sgn = TYPE_SIGN (type);
 
   /* Verify that everything is canonical.  */
@@ -1204,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
 }
 
   wide_int cst = wide_int::from (pcst, prec, sgn);
-  int len = int (cst.get_len ());
-  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  unsigned int len = int (cst.get_len ());
+  unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
-&& (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len
-&& small_prec;
+&& small_prec
+&& (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;
 
   switch (TREE_CODE (type))
 {
@@ -1235,7 +1235,7 @@ wide_int_to_tree (tree type, const wide_
 
 case INTEGER_TYPE:
 case OFFSET_TYPE:
-  if (TYPE_UNSIGNED (type))
+  if (TYPE_SIGN (type) == UNSIGNED)
 	{
 	  /* Cache 0..N */
 	  limit = INTEGER_SHARE_LIMIT;
@@ -1294,7 +1294,7 @@ wide_int_to_tree (tree type, const wide_
 	 must be careful here because tree-csts and wide-ints are
 	 not canonicalized in the same way.  */
 	  gcc_assert (TREE_TYPE (t) == type);
-	  gcc_assert (TREE_INT_CST_NUNITS (t) == len);
+	  gcc_assert (TREE_INT_CST_NUNITS (t) == (int)len);
 	  if (recanonize)
 	{
 	  len--;
@@ -1321,7 +1321,10 @@ wide_int_to_tree (tree type, const wide_
 	  TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix) = t;
 	}
 }
-  else if (cst.get_len () == 1)
+  else if (cst.get_len () == 1
+	   && (TYPE_SIGN (type) == SIGNED
+	   || recanonize
+	   || cst.elt (0) >= 0))
 {
   /* 99.99% of all int csts will fit in a single HWI.  Do that one
 	 efficiently.  */
@@ -1351,14 +1354,29 @@ wide_int_to_tree (tree type, const wide_
 	 for the gc to take care of.  There will not be enough of them
 	 to worry about.  */
   void **slot;
-  tree nt = make_int_cst (len);
-  TREE_INT_CST_NUNITS (nt) = len;
+  tree nt;
+  if (!recanonize
+	  && TYPE_SIGN (type) == UNSIGNED 
+	  && cst.elt (len - 1) < 0)
+	{
+	  unsigned int blocks_needed 
+	= (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;
+
+	  nt = make_int_cst (blocks_needed + 1);
+	  for (i = len; i < blocks_needed; i++)
+	TREE_INT_CST_ELT (nt, i) = (HOST_WIDE_INT)-1;
+
+	  TREE_INT_CST_ELT (nt, blocks_needed) = 0;
+	}
+  else
+	nt = make_int_cst (len);
   if (recanonize)
 	{
 	  len--;
 	  TREE_INT_CST_ELT (nt, len) = zext_hwi (cst.elt (len), small_prec);
 	}
-  for (int i = 0; i < len; i++)
+	
+  for (i = 0; i < len; i++)
 	TREE_INT_CST_ELT (nt, i) = cst.elt (i);
   TREE_TYPE (nt) = type;
 
@@ -10556,7 +10574,8 @@ widest_int_cst_value (const_tree x)
 
 #if HOST_BITS_PER_WIDEST_INT > HOST_BITS_PER_WIDE_INT
   gcc_assert (HOST_BITS_PER_WIDEST_INT >= HOST_BITS_PER_DOUBLE_INT);
-  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2);
+  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2
+	  || (TREE_INT_CST_NUNITS (x) == 3 && TREE_INT_CST_ELT (x, 2) == 0));
   
   if (TREE_INT_CST_NUNITS (x) == 1)
 val = ((HOST_WIDEST_INT)val << HOST_BITS_PER_WIDE_INT) >> HOST_BITS_PER_WIDE_INT;
@@ -10565,7 

Re: [PATCH] alternative hitrate for builtin_expect

2013-10-04 Thread Ramana Radhakrishnan

On 10/02/13 23:49, Rong Xu wrote:

Here is the new patch. Honza: Could you take a look?

Thanks,

-Rong

On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka  wrote:

Thanks for the suggestion. This is much cleaner than using a binary parameter.

Just want to make sure I understand it correctly about the original hitrate:
you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
the one specified in the builtin-expect-probability parameter.


Yes.


Should I use 90% as the default? It's hard to fit the current value 0.9996
in percent form.


Yes, 90% seems fine.  The original value was set quite arbitrarily and no real
performance study was made as far as I know except yours. I think users that
are sure they use expect to guard completely cold edges may just use 100%
instead of 0.9996, so I would not worry much about the precision.

Honza


-Rong


OK with that change.

Honza



This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug
further yet but am still reducing the testcase.


See

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

for details.

Can you please deal with this appropriately ?

regards
Ramana




Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Andrew Sutton
>> +  // Do not permit the declaration of constrained friend
>> +  // function declarations. They cannot be instantiated since
>> +  // the resulting declaration would never match the definition,
>> +  // which must be a non-template and cannot be constrained.
>
>
> You're in the template-id code here, so "must be a non-template" is
> confusing:
>
> template  void f();
>
> struct A {
>   friend void f(); // matches a template
> };
>
> Perhaps you mean that it must match a fully-instantiated function, so any
> constraints on the templates were considered during
> determine_specialization.

This seems like a simple comment fix, but there's a longer explanation
of what I want (see below). Would this be more appropriate?

  // Do not allow constrained friend template specializations.

The intent is stronger than to say it must match something. I don't
want to allow any declarations of the form

template
struct X {
  friend void f<>(T x) requires C; // Error.
};

This should never even get to determine_specialization since the
original declaration is never actually pushed.

We could use those constraints to match the specialization to one of
several constrained overloads, as you mentioned earlier, but I'd
rather avoid that for now. Solving that problem in general would
require that we allow constrained (explicit) specializations and
define a method of matching instantiated constraints to dependent
constraints, and that we do so as an alternative to the usual
constraint checking during template argument deduction.

Maybe it's a useful feature, but it's really hard to gauge how much
use it would actually get. We can always revisit that in the future.
Somebody else can write that paper :)

Andrew


Re: [PATCH] libgccjit.so: an embeddable JIT-compiler based on GCC

2013-10-04 Thread Joseph S. Myers
On Thu, 3 Oct 2013, David Malcolm wrote:

> Right now all you get back from the result is a "void*" which you're
> meant to cast to machine code for the CPU.   I guess we could add an

And right now the library is calling dlopen.  Which means that although 
what the user gets is a void *, the dynamic linker processing has dealt 
with registering eh_frame information and the right hooks have been passed 
through for GDB to see that an object has been loaded and access its debug 
information.  Is this use of dlopen intended to be part of the interface, 
or just a temporary hack with users eventually needing to use other 
interfaces for eh_frame and debug info handling?  (GDB has some support 
for JITs, but I haven't looked into the details of how it works.)

> option on the gcc_jit_context for setting which ISA you want code for.

Even apart from completely separate ISAs, there's also the matter of other 
command-line options such as -march=, which I'd think users should be able 
to specify.  And the complication that default options to cc1 can be 
determined by specs, whether in .h files or from configure options such as 
--with=arch=, so cc1's defaults may not be the same as the defaults when 
you run the driver (which are likely to be better defaults for the JIT 
than cc1's).  And if the user wants to use -march=native (which seems 
sensible for the common use case of a JIT, generating code for the same 
system as runs the compiler), that's handled through specs.

Maybe the driver should be converted to library code so it's possible to 
run command-line options through it and generate the command line that 
would get passed to cc1 (but then use that to call a function in the same 
process - with a different copy of global options data, diagnostic context 
etc. to avoid any issues from those being initialized twice - rather than 
running a subprocess).  That would also allow generating the right options 
for the assembler to pass to the library version of the assembler.

> > >   * There are some grotesque kludges in internal-api.c, especially in
> > > how we go from .s assembler files to a DSO (grep for "gross hack" ;) )
> > 
> > Apart from making the assembler into a shared library itself, it would 
> > also be nice to avoid needing those files at all (via an API for writing 
> > assembler text that doesn't depend on a FILE * pointer, although on GNU 
> > hosts you can always use in-memory files via open_memstream, which would 
> > make the internal changes much smaller).  But in the absence of such a 
> > cleanup, running the assembler via the driver should work, although 
> > inefficient.
> 
> (nods)   Note that all of the kludgery (if that's a word) is hidden
> inside the library, so we can fix it all up without affecting client
> code: the client-facing API doesn't expose any of this.
> 
> FWIW I added timevars for measuring the invocation of the driver; that
> kludge makes up about 50% of the overall time taken.  I haven't yet
> broken that down into assembler vs linker vs fork/exec overhead, but
> clearly this is something that could use optimization - leading to
> (very) vague thoughts involving turning parts of binutils into libraries
> also.

First I guess it might simply be a library that receives blocks of text 
that currently go to asm_out_file - but eventually there might be 
efficiency to be gained by passing binary data to the assembler in some 
cases (e.g. for large blocks of debug info) rather than generating and 
then parsing text.  (So some asm_out methods would gain implementations 
passing such data instead of generating text.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[PING] [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-10-04 Thread Zoran Jovanovic
Just to ping this patch.

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01829.html

Regards,
Zoran Jovanovic

From: Zoran Jovanovic
Sent: Tuesday, September 24, 2013 11:59 PM
To: gcc-patches@gcc.gnu.org
Cc: Petar Jovanovic
Subject: RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

Hello,
This is the new patch version.
Comments from Bernhard Reutner-Fischer's review have been applied.
Also, test case bitfildmrg2.c was modified - it is now an execute test.


Example:

Original code:
   D.1351;
   D.1350;
   D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:
   D.1358;
  _16 = pr1_2(D)->_field0;
  pr2_4(D)->_field0 = _16;

The algorithm works at the basic-block level and consists of the following 3 major steps:
1. Go through the basic block's statement list. If there are statement pairs that 
implement a copy of bit-field content from one memory location to another, record 
the statement pointers and other necessary data in a corresponding data structure.
2. Identify records that represent adjacent bit-field accesses and mark them as 
merged.
3. Modify trees accordingly.

New command line option "-ftree-bitfield-merge" is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-09-24 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * Makefile.in: Added tree-sra.c to GTFILES.
  * common.opt (ftree-bitfield-merge): New option.
  * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
  * tree-sra.c (ssa_bitfield_merge): New function.
  Entry for (-ftree-bitfield-merge).
  (bitfield_stmt_access_pair_htab_hash): New function.
  (bitfield_stmt_access_pair_htab_eq): New function.
  (cmp_access): New function.
  (create_and_insert_access): New function.
  (get_bit_offset): New function.
  (get_merged_bit_field_size): New function.
  (add_stmt_access_pair): New function.
  * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
  (field_byte_offset): declaration moved to tree.h, static removed.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
  * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): moved from dwarf2out.c.
  * tree.h (expressions_equal_p): declaration added.
  (field_byte_offset): declaration added.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a2e3f7a..54aa8e7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
+  $(srcdir)/tree-sra.c \
   @all_gtfiles@

 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 202e169..afac514 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2164,6 +2164,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates

+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit-field merge on trees
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa0f4ed..e588cae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7679,6 +7679,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.

+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into one single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 95049e4..e74db17 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned int);
 static tree field_type (const_tree);
 static unsigned int simple_type_align_in_bits (const_tree);
 s

Re: [PATCH] alternative hitrate for builtin_expect

2013-10-04 Thread Xinliang David Li
Dehao, can you take a look?

David

On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan  wrote:
> On 10/02/13 23:49, Rong Xu wrote:
>>
>> Here is the new patch. Honza: Could you take a look?
>>
>> Thanks,
>>
>> -Rong
>>
>> On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka  wrote:

 Thanks for the suggestion. This is much cleaner than using a binary
 parameter.

 Just want to make sure I understand it correctly about the original
 hitrate:
 you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
 the one specified in the builtin-expect-probability parameter.
>>>
>>>
>>> Yes.


 Should I use 90% as the default? It's hard to fit the current value 0.9996
 in percent form.
>>>
>>>
>>> Yes, 90% seems fine.  The original value was set quite arbitrarily and no
>>> real
>>> performance study was made as far as I know except yours. I think users
>>> that
>>> are sure they use expect to guard completely cold edges may just use
>>> 100%
>>> instead of 0.9996, so I would not worry much about the precision.
>>>
>>> Honza


 -Rong
>
>
> OK with that change.
>
> Honza
>
>
>
> This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug further
> yet but am still reducing the testcase.
>
> See
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
>
> for details.
>
> Can you please deal with this appropriately ?
>
> regards
> Ramana
>
>


Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-04 Thread Xinliang David Li
On Thu, Oct 3, 2013 at 10:15 PM, Teresa Johnson  wrote:
> This patch handles the fact that small profile count values sometimes
> get truncated down to 0 during updates due to integer arithmetic. This
> causes sub-optimal function splitting under
> -freorder-blocks-and-partition.
>
> The first part fixes the logic in probably_never_executed that looks
> at the bb frequency when the bb count is zero. It now correctly
> compares the count computed from the frequency to the number of
> profile runs instead of to REG_BR_PROB_BASE. The minimum ratio of
> profile counts to training runs required for a block to not be
> considered cold is now encoded in a parameter, with the default value
> of 4 to match the existing code.
>
> The second part ensures that frequencies are correctly updated after
> inlining. The problem is that after inlining the frequencies were
> being recomputed directly from the corresponding bb counts in
> rebuild_frequencies. If the counts had been truncated to 0, then the
> recomputed frequencies would become 0 as well (often leading to
> inconsistencies in the frequencies between blocks). Then the above
> change to probably_never_executed would not help identify these blocks
> as non-cold from the frequencies. The fix was to use the existing
> logic used during static profile rebuilding to also
> recompute/propagate the frequencies from the branch probabilities in
> the profile feedback case. I also renamed a number of estimate_*
> routines to compute_* and updated the comments to reflect the fact
> that these routines are not doing estimation (from a static profile),
> but in fact recomputing/propagating frequencies from the existing
> (either guessed or profile-feedback-based) probabilities.

For the second part, it seems to assume the branch probabilities are
better maintained than bb counts?

David

>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
> (One more round of regression testing in progress after making some
> slight cleanup changes.)
>
> Thanks,
> Teresa
>
> 2013-10-03  Teresa Johnson  
>
> * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
> * predict.c (probably_never_executed): Compare frequency-based
> count to number of training runs.
> (tree_estimate_probability): Update function call to new name,
> compute_bb_frequencies, and pass new parameter.
> (compute_loops_at_level): Renamed.
> (compute_loops): Ditto.
> (compute_bb_frequencies): Renamed, and new parameter to
> force recomputing frequencies.
> (rebuild_frequencies): Recompute bb frequencies from the
> probabilities instead of from counts due to truncation issues.
> * predict.h (compute_bb_frequencies): Update declaration.
>
> Index: params.def
> ===
> --- params.def  (revision 203152)
> +++ params.def  (working copy)
> @@ -373,6 +373,12 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>  "Select fraction of the maximal frequency of executions of
> basic block in function given basic block needs to have to be
> considered hot",
>  1000, 0, 0)
>
> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
> +"min-hot-run-ratio",
> + "The minimum ratio of profile runs to basic block execution count "
> + "for the block to be considered hot",
> +4, 0, 1)
> +
>  DEFPARAM (PARAM_ALIGN_THRESHOLD,
>   "align-threshold",
>   "Select fraction of the maximal frequency of executions of
> basic block in function given basic block get alignment",
> Index: predict.c
> ===
> --- predict.c   (revision 203152)
> +++ predict.c   (working copy)
> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>gcc_checking_assert (fun);
>if (profile_status_for_function (fun) == PROFILE_READ)
>  {
> -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
> +  int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
> +  if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
> return false;
>if (!frequency)
> return true;
>if (!ENTRY_BLOCK_PTR->frequency)
> return false;
> -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
> REG_BR_PROB_BASE)
> +  if (ENTRY_BLOCK_PTR->count)
> {
> - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
> -   ENTRY_BLOCK_PTR->frequency)
> - < REG_BR_PROB_BASE / 4);
> +  gcov_type scaled_count
> +  = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
> +  gcov_type computed_count;
> +  /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
> + be large enough to do the division first without losing much
> + precision.  */
> +  if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)
> +   

Re: [Patch] Internal functions for testsuite in regex

2013-10-04 Thread Paolo Carlini

Hi,

On 10/04/2013 04:13 PM, Tim Shen wrote:

This is based on discussions
here (http://gcc.gnu.org/ml/libstdc++/2013-10/msg00034.html).

And it successfully finds a bug in regex_executor.tcc :)

Booted and tested under -m32, -m64 and debug before the change in
regex_executor.tcc;
-m32 and -m64 only for regex_executor.tcc, but I'll start a bootstrap now.
Can you please add short comments inline in the code explaining the 
semantics of the new functions? A short comment before each one. In 
particular, for the new *_testsuite functions it isn't immediately clear 
how they differ from the non-_testsuite variants. In any case we should 
figure out a better name, maybe even *_internal, if we can't find 
anything more accurate, but it should be self-explaining in terms of 
their C++ semantics.


Please also document in a comment before the definition of 
__regex_search_impl the __policy parameter (not just in the body of the 
function). And I don't see why we can't use an enumeration instead of 
integers, like


enum class _RegexExecutorPolicy : int { _AutoExecutor, _DFSExecutor, 
_BFSExecutor };


More trivial issues: inline functions (like all the new _testsuite ones) 
should be in .h, not in .tcc. Some unuglified names, like res and res2.


Paolo.


Re: [Patch] Internal functions for testsuite in regex

2013-10-04 Thread Paolo Carlini

On 10/04/2013 06:04 PM, Paolo Carlini wrote:
In particular, for the new *_testsuite functions it isn't immediately 
clear how they differ from the non-_testsuite variants. In any case we 
should figure out a better name, maybe even *_internal, if we can't 
find anything more accurate, but it should be self-explaining in terms 
of their C++ semantics.
Thus, the functions are comparing two different executors. So say that 
in the name, like: __regex_search_debug or __regex_search_compare. And 
please say what they are for in a comment before at least the "primary" 
overload, the one actually carrying out the comparison.


All the other comments still stand.

Paolo.



[patch] Fix unresolved symbol with -gsplit-dwarf

2013-10-04 Thread Cary Coutant
When building the location list for a variable that has been optimized
by SRA, dw_sra_loc_expr sometimes creates a DWARF expression or a
piece of an expression, but then abandons it for some reason.  When
abandoning it, we neglected to release any addr_table entries created
for DW_OP_addr_index opcodes, occasionally resulting in a link-time
unresolved symbol.  This patch releases the addr_table entries before
abandoning a location expression.

Bootstrapped and regression tested on x86-64.
Committed to trunk at r203206.

-cary


2013-10-03  Cary Coutant  

gcc/
* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
discarding a location list expression (or a piece of one).

Index: dwarf2out.c
===
--- dwarf2out.c (revision 203183)
+++ dwarf2out.c (working copy)
@@ -13492,6 +13492,9 @@ dw_sra_loc_expr (tree decl, rtx loc)
   if (last != NULL && opsize != bitsize)
{
  padsize += bitsize;
+ /* Discard the current piece of the descriptor and release any
+addr_table entries it uses.  */
+ remove_loc_list_addr_table_entries (cur_descr);
  continue;
}
 
@@ -13500,18 +13503,24 @@ dw_sra_loc_expr (tree decl, rtx loc)
   if (padsize)
{
  if (padsize > decl_size)
-   return NULL;
+   {
+ remove_loc_list_addr_table_entries (cur_descr);
+ goto discard_descr;
+   }
  decl_size -= padsize;
  *descr_tail = new_loc_descr_op_bit_piece (padsize, 0);
  if (*descr_tail == NULL)
-   return NULL;
+   {
+ remove_loc_list_addr_table_entries (cur_descr);
+ goto discard_descr;
+   }
  descr_tail = &(*descr_tail)->dw_loc_next;
  padsize = 0;
}
   *descr_tail = cur_descr;
   descr_tail = tail;
   if (bitsize > decl_size)
-   return NULL;
+   goto discard_descr;
   decl_size -= bitsize;
   if (last == NULL)
{
@@ -13547,9 +13556,9 @@ dw_sra_loc_expr (tree decl, rtx loc)
{
  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
  && (memsize > BITS_PER_WORD || bitsize > BITS_PER_WORD))
-   return NULL;
+   goto discard_descr;
  if (memsize < bitsize)
-   return NULL;
+   goto discard_descr;
  if (BITS_BIG_ENDIAN)
offset = memsize - bitsize;
}
@@ -13557,7 +13566,7 @@ dw_sra_loc_expr (tree decl, rtx loc)
 
  *descr_tail = new_loc_descr_op_bit_piece (bitsize, offset);
  if (*descr_tail == NULL)
-   return NULL;
+   goto discard_descr;
  descr_tail = &(*descr_tail)->dw_loc_next;
}
 }
@@ -13568,9 +13577,14 @@ dw_sra_loc_expr (tree decl, rtx loc)
 {
   *descr_tail = new_loc_descr_op_bit_piece (decl_size, 0);
   if (*descr_tail == NULL)
-   return NULL;
+   goto discard_descr;
 }
   return descr;
+
+discard_descr:
+  /* Discard the descriptor and release any addr_table entries it uses.  */
+  remove_loc_list_addr_table_entries (descr);
+  return NULL;
 }
 
 /* Return the dwarf representation of the location list LOC_LIST of


Re: [Patch] Internal functions for testsuite in regex

2013-10-04 Thread Paolo Carlini
.. a final one: if you don't like all those *_debug functions around, 
both in regex.h and regex.tcc, you could move all of them to a new 
header matching the new naming scheme, like regex_debug.h. For the time 
being I would recommend putting it in bits/, installing it, including it 
from , exactly like all the other regex bits.


A completely different option, which I also like a lot in fact, would be 
putting the new *_testsuite functions inside the already existing 
testsuite/util/testsuite/regex.h. There you would use namespace 
__gnu_test, would not require strict uglification, etc.


Paolo.


[patch] Relocate remaining tree-flow-inline.h functions

2013-10-04 Thread Andrew MacLeod

This patch mostly re-factors tree-flow-inline.h out of existence.

The gimple_df data structure has found a new home in gimple-ssa.h, and 
this actually seems like a very appropriate place for it as it holds a 
lot of the SSA-specific stuff.


The remaining inline functions in tree-flow-inline.h are spread to the 
wind a bit.

- 2 were no longer used, so they are deleted.
- The various stmt_uid functions went to gimple.h, along with some stats 
macros and misc stuff. There didn't seem to be a more appropriate place 
right now based on their usage patterns.
- tree-hasher.h was including tree-flow.h simply to get the definition 
of 'struct int_tree_map'. It's a hash-table-only structure, so it 
belongs there.
- contains_view_convert_expr_p was only called from tree-sra.c, so I 
moved it there and made it static.
- ranges_overlap_p .. this was used almost exclusively by SSA aliasing 
code, so this seemed appropriate for tree-ssa-alias.h.
- gimple_ssa_operands goes to tree-ssa-operands.c and becomes static as 
this was the only consumer
- is_global_var and may_be_aliased are tree-specific. may_be_aliased 
is used by dse.c, and by putting it into tree.c, that file is 
tantalizingly close to being able to NOT include tree-ssa.h or any 
gimple stuff.  local_variable_can_escape() is the only routine in there 
that is even aware of SSA or gimple, and something seems wrong there; 
that's on my list to take care of shortly.


Bootstraps on x86_64-unknown-linux-gnu and no new regressions.  OK?

Andrew


	* tree-flow.h (tm_restart_node, gimple_df): Move to gimple-ssa.h.
	(struct int_tree_map): Move to tree-hasher.h
	(SCALE, LABEL, PERCENT): Move to gimple.h
	* tree-flow-inline.h: Delete.  Move functions to other files.
	(unmodifiable_var_p, ref_contains_array_ref): Unused, so delete.
	* gimple-ssa.h (tm_restart_node, gimple_df): Relocate from tree-flow.h.
	(gimple_in_ssa_p, gimple_vop): Relocate from tree-flow-inline.h
	* gimple.h (gimple_stmt_max_uid, set_gimple_stmt_max_uid,
	inc_gimple_stmt_max_uid, get_lineno): Relocate from tree-flow-inline.h.
	(SCALE, LABEL, PERCENT): Relocate from tree-flow.h
	* tree-hasher.h: Don't include tree-flow.h.
	(struct int_tree_map): Relocate from tree-flow.h.
	* tree-sra.c (contains_view_convert_expr_p): Relocate from
	tree-flow-inline.h and make static.
	* tree-ssa-alias.h (ranges_overlap_p): Relocate from tree-flow-inline.h.
	* tree-ssa-operands.c (gimple_ssa_operands): Relocate from
	tree-flow-inline.h and make static.
	* tree.h (is_global_var, may_be_aliased): Relocate from
	tree-flow-inline.h.
	* Makefile.in (GTFILES): Remove tree-flow.h and add gimple-ssa.h.

Index: tree-flow.h
===
*** tree-flow.h	(revision 203148)
--- tree-flow.h	(working copy)
*** along with GCC; see the file COPYING3.
*** 36,122 
  #include "gimple-low.h"
  #include "tree-into-ssa.h"
  
- /* This structure is used to map a gimple statement to a label,
-or list of labels to represent transaction restart.  */
- 
- struct GTY(()) tm_restart_node {
-   gimple stmt;
-   tree label_or_list;
- };
- 
- /* Gimple dataflow datastructure. All publicly available fields shall have
-gimple_ accessor defined in tree-flow-inline.h, all publicly modifiable
-fields should have gimple_set accessor.  */
- struct GTY(()) gimple_df {
-   /* A vector of all the noreturn calls passed to modify_stmt.
-  cleanup_control_flow uses it to detect cases where a mid-block
-  indirect call has been turned into a noreturn call.  When this
-  happens, all the instructions after the call are no longer
-  reachable and must be deleted as dead.  */
-   vec *modified_noreturn_calls;
- 
-   /* Array of all SSA_NAMEs used in the function.  */
-   vec *ssa_names;
- 
-   /* Artificial variable used for the virtual operand FUD chain.  */
-   tree vop;
- 
-   /* The PTA solution for the ESCAPED artificial variable.  */
-   struct pt_solution escaped;
- 
-   /* A map of decls to artificial ssa-names that point to the partition
-  of the decl.  */
-   struct pointer_map_t * GTY((skip(""))) decls_to_pointers;
- 
-   /* Free list of SSA_NAMEs.  */
-   vec *free_ssanames;
- 
-   /* Hashtable holding definition for symbol.  If this field is not NULL, it
-  means that the first reference to this variable in the function is a
-  USE or a VUSE.  In those cases, the SSA renamer creates an SSA name
-  for this variable with an empty defining statement.  */
-   htab_t GTY((param_is (union tree_node))) default_defs;
- 
-   /* True if there are any symbols that need to be renamed.  */
-   unsigned int ssa_renaming_needed : 1;
- 
-   /* True if all virtual operands need to be renamed.  */
-   unsigned int rename_vops : 1;
- 
-   /* True if the code is in ssa form.  */
-   unsigned int in_ssa_p : 1;
- 
-   /* True if IPA points-to information was computed for this function.  */
-   unsigned int ipa_pta : 1;
- 
-   struct ssa_operands ssa_opera

[Patch, Fortran, committed] GCC 4.8 backporting of defined assignment patches

2013-10-04 Thread Tobias Burnus
I have committed the attached patch to the GCC 4.8 branch, backporting 
some defined assignment patches. Committed as Rev. 203207/203208.


GCC 4.8 added defined assignment for components during intrinsic 
assignment, which had some issues.


a) PR 57697/PR58469
Patch: http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html
Approval: http://gcc.gnu.org/ml/fortran/2013-09/msg00056.html

b) http://gcc.gnu.org/ml/fortran/2013-09/msg00016.html
c) http://gcc.gnu.org/ml/fortran/2013-09/msg00026.html
d) http://gcc.gnu.org/ml/fortran/2013-09/msg00038.html

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 203206)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,13 @@
+2013-10-04  Tobias Burnus  
+
+	Backport from mainline
+	2013-09-25  Tobias Burnus  
+
+	PR fortran/57697
+	PR fortran/58469
+	* resolve.c (generate_component_assignments): Avoid double free
+	at runtime and freeing a still-being used expr.
+
 2013-08-24  Mikael Morin  
 
 	PR fortran/57798
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c	(Revision 203206)
+++ gcc/fortran/resolve.c	(Arbeitskopie)
@@ -9997,6 +9997,26 @@
 		  temp_code = build_assignment (EXEC_ASSIGN,
 		t1, (*code)->expr1,
 NULL, NULL, (*code)->loc);
+
+		  /* For allocatable LHS, check whether it is allocated.  Note
+		 that allocatable components with defined assignment are
+		 not yet supported.  See PR 57696.  */
+		  if ((*code)->expr1->symtree->n.sym->attr.allocatable)
+		{
+		  gfc_code *block;
+		  gfc_expr *e =
+			gfc_lval_expr_from_sym ((*code)->expr1->symtree->n.sym);
+		  block = gfc_get_code ();
+		  block->op = EXEC_IF;
+		  block->block = gfc_get_code ();
+		  block->block->op = EXEC_IF;
+		  block->block->expr1
+			  = gfc_build_intrinsic_call (ns,
+GFC_ISYM_ALLOCATED, "allocated",
+(*code)->loc, 1, e);
+		  block->block->next = temp_code;
+		  temp_code = block;
+		}
 		  add_code_to_chain (&temp_code, &tmp_head, &tmp_tail);
 		}
 
@@ -10005,8 +10025,37 @@
 	  gfc_free_expr (this_code->ext.actual->expr);
 	  this_code->ext.actual->expr = gfc_copy_expr (t1);
 	  add_comp_ref (this_code->ext.actual->expr, comp1);
+
+	  /* If the LHS variable is allocatable and wasn't allocated and
+ the temporary is allocatable, pointer assign the address of
+ the freshly allocated LHS to the temporary.  */
+	  if ((*code)->expr1->symtree->n.sym->attr.allocatable
+		  && gfc_expr_attr ((*code)->expr1).allocatable)
+		{
+		  gfc_code *block;
+		  gfc_expr *cond;
+
+		  cond = gfc_get_expr ();
+		  cond->ts.type = BT_LOGICAL;
+		  cond->ts.kind = gfc_default_logical_kind;
+		  cond->expr_type = EXPR_OP;
+		  cond->where = (*code)->loc;
+		  cond->value.op.op = INTRINSIC_NOT;
+		  cond->value.op.op1 = gfc_build_intrinsic_call (ns,
+	  GFC_ISYM_ALLOCATED, "allocated",
+	  (*code)->loc, 1, gfc_copy_expr (t1));
+		  block = gfc_get_code ();
+		  block->op = EXEC_IF;
+		  block->block = gfc_get_code ();
+		  block->block->op = EXEC_IF;
+		  block->block->expr1 = cond;
+		  block->block->next = build_assignment (EXEC_POINTER_ASSIGN,
+	t1, (*code)->expr1,
+	NULL, NULL, (*code)->loc);
+		  add_code_to_chain (&block, &head, &tail);
+		}
 	}
-	  }
+	}
   else if (this_code->op == EXEC_ASSIGN && !this_code->next)
 	{
 	  /* Don't add intrinsic assignments since they are already
@@ -10028,13 +10077,6 @@
 	}
 }
 
-  /* This is probably not necessary.  */
-  if (this_code)
-{
-  gfc_free_statements (this_code);
-  this_code = NULL;
-}
-
   /* Put the temporary assignments at the top of the generated code.  */
   if (tmp_head && component_assignment_level == 1)
 {
@@ -10043,6 +10085,30 @@
   tmp_head = tmp_tail = NULL;
 }
 
+  // If we did a pointer assignment - thus, we need to ensure that the LHS is
+  // not accidentally deallocated. Hence, nullify t1.
+  if (t1 && (*code)->expr1->symtree->n.sym->attr.allocatable
+  && gfc_expr_attr ((*code)->expr1).allocatable)
+{
+  gfc_code *block;
+  gfc_expr *cond;
+  gfc_expr *e;
+
+  e = gfc_lval_expr_from_sym ((*code)->expr1->symtree->n.sym);
+  cond = gfc_build_intrinsic_call (ns, GFC_ISYM_ASSOCIATED, "associated",
+   (*code)->loc, 2, gfc_copy_expr (t1), e);
+  block = gfc_get_code ();
+  block->op = EXEC_IF;
+  block->block = gfc_get_code ();
+  block->block->op = EXEC_IF;
+  block->block->expr1 = cond;
+  block->block->next = build_assignment (EXEC_POINTER_ASSIGN,
+	t1, gfc_get_null_expr (&(*code)->loc),
+	NULL, NULL, (*code)->loc);
+  gfc_append_code (tail, block);
+  tail = block;
+}
+
   /* Now attach the remaining code chain to the input code.  Step on
  to the end of the new code since resolution is complete.  */
   gcc_assert (

Re: patch to canonize unsigned tree-csts

2013-10-04 Thread Richard Sandiford
I was hoping Richard would weigh in here.  In case not...

Kenneth Zadeck  writes:
 I was thinking that we should always be able to use the constant as-is
 for max_wide_int-based and addr_wide_int-based operations.  The small_prec
>>> again, you can get edge-cased to death here.  I think it would work
>>> for max because that really is bigger than anything else, but it is
>>> possible (though unlikely) to have something big converted to an address
>>> by truncation.
>> But I'd have expected that conversion to be represented by an explicit
>> CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
>> something that isn't bit- or byte-address-sized.  It'd be the C equivalent
>> of int + long -> int rather than the expected int + long -> long.
>>
>> Same goes for wide_int.  If we're doing arithmetic at a specific
>> precision, it seems odd for one of the inputs to be wider and yet
>> not have an explicit truncation.
> you miss the second reason why we needed addr-wide-int.   A large amount 
> of the places where the addressing arithmetic is done are not "type 
> safe".  Only the gimple and rtl that is translated from the source 
> code is really type safe. In passes like the strength reduction 
> where they just "grab things from all over", the addr-wide-int or the 
> max-wide-int are safe haven structures that are guaranteed to be large 
> enough to not matter.  So what I fear here is something like a very 
> wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type
to addr_wide_int.  It's not something we'd ever do in mainline, because
uses of addr_wide_int are replacing uses of double_int, and double_int
is our current maximum-width representation.

Using addr_wide_int rather than max_wide_int is an optimisation.
IMO part of implementating that optimisation should be to introduce
explicit truncations whenever we try to use addr_wide_int to operate
on inputs that might be wider than addr_wide_int.

So I still think the assert is the way to go.

Thanks,
Richard



[PATCH][ARM]Use cortex tune_params for armv8-a architecture

2013-10-04 Thread Renlin Li

Hi all,

This patch changes tune_params for the armv8-a architecture to the 
general cortex tuning.

The change has been tested for arm-none-eabi on the model.

Ok for trunk?

Kind regards,
Renlin Li

gcc/ChangeLog:

2013-10-03  Renlin Li 

* config/arm/arm-cores.def (cortex-a53): Use cortex tuning.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 3d59fa6..17c9bf3 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -129,7 +129,7 @@ ARM_CORE("cortex-a7",	  cortexa7,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV
 ARM_CORE("cortex-a8",	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
 ARM_CORE("cortex-a15",	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
-ARM_CORE("cortex-a53",	  cortexa53,	8A, FL_LDSCHED, cortex_a5)
+ARM_CORE("cortex-a53",	  cortexa53,	8A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4",	  cortexr4,	7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",	  cortexr4f,	7R, FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5",	  cortexr5,	7R, FL_LDSCHED | FL_ARM_DIV, cortex)

Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Jason Merrill

On 10/04/2013 09:20 AM, Andrew Sutton wrote:

>Perhaps you mean that it must match a fully-instantiated function, so any
>constraints on the templates were considered during
>determine_specialization.



This seems like a simple comment fix, but there's a longer explanation
of what I want (see below). Would this be more appropriate?

   // Do not allow constrained friend template specializations.

The intent is stronger than to say it must match something.


By "must" I meant that whatever it matches could only be a 
fully-instantiated function.


But I guess the main reason for disallowing constraints here is the same 
as for explicit specializations and non-template functions; 
non-dependent constraints don't really make any sense.


Jason



Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Andrew Sutton
>>> >Perhaps you mean that it must match a fully-instantiated function, so
>>> > any
>>> >constraints on the templates were considered during
>>> >determine_specialization.
>
>
>> This seems like a simple comment fix, but there's a longer explanation
>> of what I want (see below). Would this be more appropriate?
>>
>>// Do not allow constrained friend template specializations.
>>
>> The intent is stronger than to say it must match something.
>
>
> By "must" I meant that whatever it matches could only be a
> fully-instantiated function.


I see what you mean. I was caught up in the wrong part of the
sentence.  But yes, that's right.

> But I guess the main reason for disallowing constraints here is the same as
> for explicit specializations and non-template functions; non-dependent
> constraints don't really make any sense.

That's the intent.

Okay to commit?

Andrew


friends-3.patch
Description: Binary data


Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Rong Xu
My change on the probability of builtin_expect does have an impact on
partial inlining (more outlined functions will get inlined back into
the original function).
I think this triggers an existing issue.
Dehao will explain this in his coming email.

-Rong

On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan  wrote:
> On 10/02/13 23:49, Rong Xu wrote:
>>
>> Here is the new patch. Honza: Could you take a look?
>>
>> Thanks,
>>
>> -Rong
>>
>> On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka  wrote:

 Thanks for the suggestion. This is much cleaner than using a binary
 parameter.

 Just want to make sure I understand it correctly about the original
 hitrate:
 you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
 the one specified in the builtin-expect-probability parameter.
>>>
>>>
>>> Yes.


 Should I use 90% as the default? It's hard to fit the current value 0.9996
 in percent form.
>>>
>>>
>>> Yes, 90% seems fine.  The original value was set quite arbitrarily and no
>>> real
>>> performance study was made as far as I know except yours. I think users
>>> that
>>> are sure they use expect to guard completely cold edges may just use
>>> 100%
>>> instead of 0.9996, so I would not worry much about the precision.
>>>
>>> Honza


 -Rong
>
>
> OK with that change.
>
> Honza
>
>
>
> This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug further
> yet but am still reducing the testcase.
>
> See
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
>
> for details.
>
> Can you please deal with this appropriately?
>
> regards
> Ramana
>
>


[google/gcc-4_8] Backport fix for unresolved symbol with -gsplit-dwarf

2013-10-04 Thread Cary Coutant
When building the location list for a variable that has been optimized
by SRA, dw_sra_loc_expr sometimes creates a DWARF expression or a
piece of an expression, but then abandons it for some reason.  When
abandoning it, we neglected to release any addr_table entries created
for DW_OP_addr_index opcodes, occasionally resulting in a link-time
unresolved symbol.  This patch releases the addr_table entries before
abandoning a location expression.

Backported from trunk r203206.

Google ref b/10833306.

2013-10-03  Cary Coutant  

gcc/
* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
discarding a location list expression (or a piece of one).


Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Dehao Chen
I looked at this problem. Bug updated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

This is a bug when updating the block during tree inlining. Basically, it is
legal for *n to be NULL. E.g., when gimple_block(id->gimple_call) is
NULL, remap_blocks_to_null will be called to set *n to NULL.

The problem is that we should check this before calling
COMBINE_LOCATION_DATA, which assumes the block is not NULL.

The following patch can fix the problem:

Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c (revision 203208)
+++ gcc/tree-inline.c (working copy)
@@ -2090,7 +2090,10 @@ copy_phis_for_bb (basic_block bb, copy_body_data *
   n = (tree *) pointer_map_contains (id->decl_map,
  LOCATION_BLOCK (locus));
   gcc_assert (n);
-  locus = COMBINE_LOCATION_DATA (line_table, locus, *n);
+  if (*n)
+locus = COMBINE_LOCATION_DATA (line_table, locus, *n);
+  else
+locus = LOCATION_LOCUS (locus);
  }
   else
  locus = LOCATION_LOCUS (locus);

On Fri, Oct 4, 2013 at 11:05 AM, Rong Xu  wrote:
> My change on the probability of builtin_expect does have an impact on
> partial inlining (more outlined functions will get inlined back into
> the original function).
> I think this triggers an existing issue.
> Dehao will explain this in his coming email.
>
> -Rong
>
> On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan  wrote:
>> On 10/02/13 23:49, Rong Xu wrote:
>>>
>>> Here is the new patch. Honza: Could you take a look?
>>>
>>> Thanks,
>>>
>>> -Rong
>>>
>>> On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka  wrote:
>
> Thanks for the suggestion. This is much cleaner than using a binary
> parameter.
>
> Just want to make sure I understand it correctly about the original
> hitrate:
> you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use
> the one specified in the builtin-expect-probability parameter.


 Yes.
>
>
> Should I use 90% as the default? It's hard to fit the current value 0.9996
> in percent form.


 Yes, 90% seems fine.  The original value was set quite arbitrarily and no
 real
 performance study was made as far as I know except yours. I think users
 that
 are sure they use expect to guard completely cold edges may just use
 100%
 instead of 0.9996, so I would not worry much about the precision.

 Honza
>
>
> -Rong
>>
>>
>> OK with that change.
>>
>> Honza
>>
>>
>>
>> This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug further
>> yet but am still reducing the testcase.
>>
>> See
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
>>
>> for details.
>>
>> Can you please deal with this appropriately?
>>
>> regards
>> Ramana
>>
>>


Go patch committed: Use backend interface for references to temps

2013-10-04 Thread Ian Lance Taylor
This patch from Chris Manghane changes the Go frontend to use the
backend interface when building a reference to a temporary variable.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.8 branch.

Ian


2013-10-04  Chris Manghane  

* go-gcc.cc (Gcc_backend::error_expression)
(Gcc_backend::var_expression, Gcc_backend::indirect_expression): New
functions.


Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc	(revision 203037)
+++ gcc/go/go-gcc.cc	(revision 203038)
@@ -208,6 +208,16 @@ class Gcc_backend : public Backend
   Bexpression*
   zero_expression(Btype*);
 
+  Bexpression*
+  error_expression()
+  { return this->make_expression(error_mark_node); }
+
+  Bexpression*
+  var_expression(Bvariable* var, Location);
+
+  Bexpression*
+  indirect_expression(Bexpression* expr, bool known_valid, Location);
+
   // Statements.
 
   Bstatement*
@@ -848,6 +858,30 @@ Gcc_backend::zero_expression(Btype* btyp
   return tree_to_expr(ret);
 }
 
+// An expression that references a variable.
+
+Bexpression*
+Gcc_backend::var_expression(Bvariable* var, Location)
+{
+  tree ret = var->get_tree();
+  if (ret == error_mark_node)
+return this->error_expression();
+  return tree_to_expr(ret);
+}
+
+// An expression that indirectly references an expression.
+
+Bexpression*
+Gcc_backend::indirect_expression(Bexpression* expr, bool known_valid,
+ Location location)
+{
+  tree ret = build_fold_indirect_ref_loc(location.gcc_location(),
+ expr->get_tree());
+  if (known_valid)
+TREE_THIS_NOTRAP(ret) = 1;
+  return tree_to_expr(ret);
+}
+
 // An expression as a statement.
 
 Bstatement*
Index: gcc/go/gofrontend/backend.h
===
--- gcc/go/gofrontend/backend.h	(revision 203037)
+++ gcc/go/gofrontend/backend.h	(revision 203038)
@@ -231,6 +231,22 @@ class Backend
   virtual Bexpression*
   zero_expression(Btype*) = 0;
 
+  // Create an error expression. This is used for cases which should
+  // not occur in a correct program, in order to keep the compilation
+  // going without crashing.
+  virtual Bexpression*
+  error_expression() = 0;
+
+  // Create a reference to a variable.
+  virtual Bexpression*
+  var_expression(Bvariable* var, Location) = 0;
+
+  // Create an expression that indirects through the pointer expression EXPR
+  // (i.e., return the expression for *EXPR). KNOWN_VALID is true if the pointer
+  // is known to point to a valid memory location.
+  virtual Bexpression*
+  indirect_expression(Bexpression* expr, bool known_valid, Location) = 0;
+
   // Statements.
 
   // Create an error statement.  This is used for cases which should
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc	(revision 203037)
+++ gcc/go/gofrontend/expressions.cc	(revision 203038)
@@ -978,22 +978,19 @@ Var_expression::do_get_tree(Translate_co
 {
   Bvariable* bvar = this->variable_->get_backend_variable(context->gogo(),
 			  context->function());
-  tree ret = var_to_tree(bvar);
-  if (ret == error_mark_node)
-return error_mark_node;
   bool is_in_heap;
+  Location loc = this->location();
   if (this->variable_->is_variable())
 is_in_heap = this->variable_->var_value()->is_in_heap();
   else if (this->variable_->is_result_variable())
 is_in_heap = this->variable_->result_var_value()->is_in_heap();
   else
 go_unreachable();
+
+  Bexpression* ret = context->backend()->var_expression(bvar, loc);
   if (is_in_heap)
-{
-  ret = build_fold_indirect_ref_loc(this->location().gcc_location(), ret);
-  TREE_THIS_NOTRAP(ret) = 1;
-}
-  return ret;
+ret = context->backend()->indirect_expression(ret, true, loc);
+  return expr_to_tree(ret);
 }
 
 // Ast dump for variable expression.


libgo patch committed: Fix calling Interface on MakeFunc value

2013-10-04 Thread Ian Lance Taylor
This patch to libgo fixes calling the Interface method on a Value
created by calling MakeFunc.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.8 branch.

Ian

diff -r 52c01e7b81fe libgo/go/reflect/all_test.go
--- a/libgo/go/reflect/all_test.go	Fri Oct 04 11:00:40 2013 -0700
+++ b/libgo/go/reflect/all_test.go	Fri Oct 04 11:49:27 2013 -0700
@@ -1454,6 +1454,30 @@
 	}
 }
 
+func TestMakeFuncInterface(t *testing.T) {
+	switch runtime.GOARCH {
+	case "amd64", "386":
+	default:
+		t.Skip("MakeFunc not implemented for " + runtime.GOARCH)
+	}
+
+	fn := func(i int) int { return i }
+	incr := func(in []Value) []Value {
+		return []Value{ValueOf(int(in[0].Int() + 1))}
+	}
+	fv := MakeFunc(TypeOf(fn), incr)
+	ValueOf(&fn).Elem().Set(fv)
+	if r := fn(2); r != 3 {
+		t.Errorf("Call returned %d, want 3", r)
+	}
+	if r := fv.Call([]Value{ValueOf(14)})[0].Int(); r != 15 {
+		t.Errorf("Call returned %d, want 15", r)
+	}
+	if r := fv.Interface().(func(int) int)(26); r != 27 {
+		t.Errorf("Call returned %d, want 27", r)
+	}
+}
+
 type Point struct {
 	x, y int
 }
diff -r 52c01e7b81fe libgo/go/reflect/makefunc.go
--- a/libgo/go/reflect/makefunc.go	Fri Oct 04 11:00:40 2013 -0700
+++ b/libgo/go/reflect/makefunc.go	Fri Oct 04 11:49:27 2013 -0700
@@ -63,7 +63,7 @@
 
 	impl := &makeFuncImpl{code: code, typ: ftyp, fn: fn}
 
-	return Value{t, unsafe.Pointer(impl), flag(Func) << flagKindShift}
+	return Value{t, unsafe.Pointer(&impl), flag(Func) << flagKindShift | flagIndir}

Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Jan Hubicka
> I looked at this problem. Bug updated
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
> 
> This is a bug when updating block during tree-inline. Basically, it is
> legal for *n to be NULL. E.g. When gimple_block(id->gimple_call) is
> NULL, remap_blocks_to_null will be called to set *n to NULL.

The NULL in gimple_block (gimple_call) comes from the call introduced by
ipa-split?
I remember that ipa-split used to try to put the call into a block, since we
were ICEing in similar ways previously, too.  Perhaps this has changed with
the new BLOCK representation?

Honza


Re: patch to canonize unsigned tree-csts

2013-10-04 Thread Kenneth Zadeck

On 10/04/2013 01:00 PM, Richard Sandiford wrote:

I was hoping Richard would weigh in here.  In case not...

Kenneth Zadeck  writes:

I was thinking that we should always be able to use the constant as-is
for max_wide_int-based and addr_wide_int-based operations.  The small_prec

Again, you can get edge-cased to death here.  I think it would work for max
because that really is bigger than anything else, but it is possible (though
unlikely) to have something big converted to an address by truncation.

But I'd have expected that conversion to be represented by an explicit
CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on
something that isn't bit- or byte-address-sized.  It'd be the C equivalent
of int + long -> int rather than the expected int + long -> long.

Same goes for wide_int.  If we're doing arithmetic at a specific
precision, it seems odd for one of the inputs to be wider and yet
not have an explicit truncation.

You miss the second reason why we needed addr-wide-int.  A large number of
the places where addressing arithmetic is done are not "type safe".  Only the
gimple and rtl translated from the source code is really type safe.  In
passes like strength reduction, which just "grab things from all over",
addr-wide-int and max-wide-int are safe-haven structures that are guaranteed
to be large enough not to matter.  So what I fear here is something like a
very wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type
to addr_wide_int.  It's not something we'd ever do in mainline, because
uses of addr_wide_int are replacing uses of double_int, and double_int
is our current maximum-width representation.

Using addr_wide_int rather than max_wide_int is an optimisation.
IMO part of implementating that optimisation should be to introduce
explicit truncations whenever we try to use addr_wide_int to operate
on inputs that might be wider than addr_wide_int.

So I still think the assert is the way to go.

addr_wide_int is not as much of an optimization as it is documentation 
of what you are doing - i.e. this is addressing arithmetic.  My 
justification for putting it in was that we wanted a sort of an abstract 
type to say that this was not just user math, it was addressing 
arithmetic and that the ultimate result is going to be slammed into a 
target pointer.


I was only using that as an example to try to indicate that I did not 
think that it was wrong if someone did truncate.   In particular, would 
you want the assert to be that the value was truncated or that the type 
of the value would allow numbers that would be truncated?   I actually 
think neither.


If a programmer uses a long long on a 32-bit machine for some index variable
and slams that into a pointer, he either knows what he is doing or has made a
mistake.  Do you really think that the compiler should ICE?




Thanks,
Richard





Merge from GCC 4.8 branch to gccgo branch

2013-10-04 Thread Ian Lance Taylor
I merged revision 203214 of the GCC 4.8 branch to the gccgo branch.

Ian


Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Dehao Chen
On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka  wrote:
>> I looked at this problem. Bug updated
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
>>
>> This is a bug when updating block during tree-inline. Basically, it is
>> legal for *n to be NULL. E.g. When gimple_block(id->gimple_call) is
>> NULL, remap_blocks_to_null will be called to set *n to NULL.
>
> The NULL in gimple_block (gimple_call) comes from the call introduced by 
> ipa-split?

That is correct.

> I remember that ipa-split used to try to put the call into block since we was 
> ICEing
> in similar ways previously, too.  Perhaps this has changed with new BLOCK 
> representation?

The new BLOCK representation does not change this. I think it makes
sense to leave the block of newly introduced call_stmt as NULL because
when it's inlined back, we don't want to add additional block layers.

Dehao

>
> Honza


Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Jason Merrill

OK.

Jason


[gomp4] Target fallback ICV handling, ICV fixes

2013-10-04 Thread Jakub Jelinek
Hi!

I've committed the following patch to gomp-4.0-branch.

The omp-low.c changes are to fix some bugs with if clause on
#pragma omp target{, data, update}.
The c-cppbuiltin.c is to finally announce OpenMP 4.0 support for C/C++.

The libgomp changes are:
1) as required by OpenMP 4.0, thread_limit_var is now a per-data-environment
   ICV, rather than global var
2) gomp_remaining_threads_count has been removed, instead as required by the
   spec ThreadsBusy from the spec is tracked per contention group inside
   of thread_pool; if there is just one contention group, then the new
   thr->thread_pool->threads_busy should be the difference between
   icv->thread_limit_var and the old gomp_remaining_threads_count;
   so, in gomp_resolve_num_threads now we add nthreads - 1 to it
   rather than subtracting it
3) apparently the old OMP_THREAD_LIMIT code was buggy, because
   GOMP_parallel_end was also subtracting from gomp_remaining_threads_count
   rather than adding to it (with the new code it is correct to subtract;
   when I get spare time I'll write a small alternative patch for the
   release branches together with thread-limit-1.c testcase)
4) as the threads_busy count is now per-contention group, if a parallel
   isn't nested, we actually don't need to atomically update the counter,
   because there is just one thread in the contention group
5) gomp_managed_threads counter remains to be a global var, that is used
   to decide about spinning length, that is desirable to be global and
   is not user observable thing covered by the standard; I've just
   renamed the mutex guarding it
6) for GOMP_target host fallback, the function will create a new initial
   thread by making a copy of the old TLS *gomp_thread () and clearing it
   (except for affinity place and reinitializing it's place var to the
   whole place list), then restoring back
7) I've noticed that &thr->release semaphore is never used for the master
   threads, so there is no point initializing it; we were initializing
   it just for the first initial thread, e.g. not in subsequent user
   pthread_create created threads that encounter #pragma omp constructs;
   and the semaphore wasn't ever destroyed
8) GOMP_teams is now implemented for the host fallback just by adjusting
   icv->thread_limit_var
9) on the target-7.c testcase I found several issues in the var remapping
   code (some fields could be uninitialized in certain cases)

Tested on x86_64-linux, committed.

2013-10-04  Jakub Jelinek  

* omp-low.c (expand_omp_target): When handling IF clause on
#pragma omp target, split new_bb rather than entry_bb.  If
not GF_OMP_TARGET_KIND_REGION, split new_bb right before
the GOMP_TARGET stmt, rather than after labels.
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Predefine _OPENMP to
201307 instead of 201107.
libgomp/
* libgomp.h (struct gomp_task_icv): Add thread_limit_var.
(gomp_thread_limit_var, gomp_remaining_threads_count,
gomp_remaining_threads_lock): Remove.
(gomp_managed_threads_lock): New variable.
(struct gomp_thread_pool): Add threads_busy field.
(gomp_free_thread): New prototype.
* parallel.c (gomp_resolve_num_threads): Adjust for
thread_limit now being in icv->thread_limit_var.  Use
UINT_MAX instead of ULONG_MAX as infinity.  If not nested,
just return minimum of max_num_threads and icv->thread_limit_var
and if thr->thread_pool, set threads_busy to the returned value.
Otherwise, don't update atomically gomp_remaining_threads_count,
but instead thr->thread_pool->threads_busy.
(GOMP_parallel_end): Adjust for thread_limit now being in
icv->thread_limit_var.  Use UINT_MAX instead of ULONG_MAX as
infinity.  Adjust threads_busy in the pool rather than
gomp_remaining_threads_count.  Remember team->nthreads and call
gomp_team_end before adjusting threads_busy, if not nested
afterwards, just set it to 1 non-atomically.
* team.c (gomp_thread_start): Clear thr->thread_pool and
thr->task before returning.
(gomp_free_pool_helper): Clear thr->thread_pool and thr->task
before calling pthread_exit.
(gomp_free_thread): No longer static.  Use
gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
(gomp_team_start): Set thr->thread_pool->threads_busy to
nthreads immediately after creating new pool.  Use
gomp_managed_threads_lock instead of gomp_remaining_threads_lock.
(gomp_team_end): Use gomp_managed_threads_lock instead of
gomp_remaining_threads_lock.
(initialize_team): Don't call gomp_sem_init here.
* env.c (gomp_global_icv): Initialize thread_limit_var field.
(gomp_thread_limit_var, gomp_remaining_threads_count,
gomp_remaining_threads_lock): Remove.
(gomp_managed_threads_lock): New variable.
(handle_omp_displ

[PATCH] Refactor a bit of jump thread identification code

2013-10-04 Thread Jeff Law


This pulls out the code to search for a threading opportunity in a 
normal block into its own subroutine.   No functional changes.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 065ebf1..cf4a45c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-04  Jeff Law  
+
+   * tree-ssa-threadedge.c: Fix some trailing whitespace problems.
+
+   * tree-ssa-threadedge.c (thread_through_normal_block): Broken out of ...
+   (thread_across_edge): Here.  Call it.
+
 2013-10-04  Cary Coutant  
 
* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39e921b..c2dd015 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -771,7 +771,7 @@ thread_around_empty_blocks (edge taken_edge,
   gsi = gsi_start_nondebug_bb (bb);
 
   /* If the block has no statements, but does have a single successor, then
- it's just a forwarding block and we can thread through it trivially. 
+ it's just a forwarding block and we can thread through it trivially.
 
  However, note that just threading through empty blocks with single
  successors is not inherently profitable.  For the jump thread to
@@ -779,7 +779,7 @@ thread_around_empty_blocks (edge taken_edge,
 
  By taking the return value from the recursive call, we get the
  desired effect of returning TRUE when we found a profitable jump
- threading opportunity and FALSE otherwise. 
+ threading opportunity and FALSE otherwise.
 
  This is particularly important when this routine is called after
  processing a joiner block.  Returning TRUE too aggressively in
@@ -844,13 +844,16 @@ thread_around_empty_blocks (edge taken_edge,
  path);
   return true;
 }
- 
+
   return false;
 }
-  
+
 /* We are exiting E->src, see if E->dest ends with a conditional
jump which has a known value when reached via E.
 
+   E->dest can have arbitrary side effects which, if threading is
+   successful, will be maintained.
+
Special care is necessary if E is a back edge in the CFG as we
may have already recorded equivalences for E->dest into our
various tables, including the result of the conditional at
@@ -858,11 +861,6 @@ thread_around_empty_blocks (edge taken_edge,
limited in that case to avoid short-circuiting the loop
incorrectly.
 
-   Note it is quite common for the first block inside a loop to
-   end with a conditional which is either always true or always
-   false when reached via the loop backedge.  Thus we do not want
-   to blindly disable threading across a loop backedge.
-
DUMMY_COND is a shared cond_expr used by condition simplification as scratch,
to avoid allocating memory.
 
@@ -873,17 +871,19 @@ thread_around_empty_blocks (edge taken_edge,
STACK is used to undo temporary equivalences created during the walk of
E->dest.
 
-   SIMPLIFY is a pass-specific function used to simplify statements.  */
+   SIMPLIFY is a pass-specific function used to simplify statements.
 
-void
-thread_across_edge (gimple dummy_cond,
-   edge e,
-   bool handle_dominating_asserts,
-   vec *stack,
-   tree (*simplify) (gimple, gimple))
-{
-  gimple stmt;
+   Our caller is responsible for restoring the state of the expression
+   and const_and_copies stacks.  */
 
+static bool
+thread_through_normal_block (edge e,
+gimple dummy_cond,
+bool handle_dominating_asserts,
+vec *stack,
+tree (*simplify) (gimple, gimple),
+vec *path)
+{
   /* If E is a backedge, then we want to verify that the COND_EXPR,
  SWITCH_EXPR or GOTO_EXPR at the end of e->dest is not affected
  by any statements in e->dest.  If it is affected, then it is not
@@ -891,20 +891,19 @@ thread_across_edge (gimple dummy_cond,
   if (e->flags & EDGE_DFS_BACK)
 {
   if (cond_arg_set_in_bb (e, e->dest))
-   goto fail;
+   return false;
 }
 
-  stmt_count = 0;
-
   /* PHIs create temporary equivalences.  */
   if (!record_temporary_equivalences_from_phis (e, stack))
-goto fail;
+return false;
 
   /* Now walk each statement recording any context sensitive
  temporary equivalences we can detect.  */
-  stmt = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
+  gimple stmt
+= record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
   if (!stmt)
-goto fail;
+return false;
 
   /* If we stopped at a COND_EXPR or SWITCH_EXPR, see if we know which arm
  will be taken.  */
@@ -927,9 +926,8 @@ thread_across_edge (gimple dummy_cond,
  /* DEST could be NULL for a computed jump to an absolute
 address.  */

Re: [C++ Patch] PR 58448

2013-10-04 Thread Jason Merrill

OK.

Jason


Re: [PATCH] alternative hirate for builtin_expert

2013-10-04 Thread Jan Hubicka
> On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka  wrote:
> >> I looked at this problem. Bug updated
> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
> >>
> >> This is a bug when updating block during tree-inline. Basically, it is
> >> legal for *n to be NULL. E.g. When gimple_block(id->gimple_call) is
> >> NULL, remap_blocks_to_null will be called to set *n to NULL.
> >
> > The NULL in gimple_block (gimple_call) comes from the call introduced by 
> > ipa-split?
> 
> That is correct.
> 
> > I remember that ipa-split used to try to put the call into block since we 
> > was ICEing
> > in similar ways previously, too.  Perhaps this has changed with new BLOCK 
> > representation?
> 
> The new BLOCK representation does not change this. I think it makes
> sense to leave the block of newly introduced call_stmt as NULL because
> when it's inlined back, we don't want to add additional block layers.

You are right, it may be a result of Jakub's changes in the area (to improve
debug info after inlining back).  I guess the patch makes sense then.

Honza

> 
> Dehao
> 
> >
> > Honza


Re: Enable building of libatomic on AArch64

2013-10-04 Thread Andrew Pinski
On Thu, Oct 3, 2013 at 3:43 PM, Michael Hudson-Doyle
 wrote:
> Hi,
>
> As libatomic builds for and the tests pass on AArch64 (built on x86_64
> but tested on a foundation model, logs and summary:
>
> http://people.linaro.org/~mwhudson/libatomic.sum.txt
> http://people.linaro.org/~mwhudson/runtest-log-v-2.txt
>
> ) this patch enables the build.
>
> Cheers,
> mwh
> (first time posting to this list, let me know if I'm doing it wrong)
>
> 2013-10-04  Michael Hudson-Doyle  
>
>   * configure.tgt: Add AArch64 support.

Replying here also: this is the same patch that we have been using
internally, and I would like to see it approved.

Thanks,
Andrew Pinski

>


Re: [C++ Patch] PR 58560

2013-10-04 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] PR 58503

2013-10-04 Thread Jason Merrill

OK.

Jason


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-04 Thread Kaz Kojima
Oleg Endo  wrote:
> Some of the things I've done in 4.8 to improve SH T bit handling turned
> out to produce wrong code.  The attached patch fixes that by introducing
> an SH specific RTL pass.
> 
> Tested on rev 202876 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> Additional test cases will follow.
> OK for trunk?

I've got a new failure with the patch on sh4-unknown-linux-gnu:

New tests that FAIL:

gfortran.fortran-torture/execute/forall_7.f90 execution,  -O3 -g 

> ifcvt_sh

We usually use sh_/sh- prefix for target specific things, don't we?
I'm not sure that there is some rigid naming convention for them,
though.

Regards,
kaz


Re: [c++-concepts] constrained friends redux

2013-10-04 Thread Paolo Carlini

Hi Andrew,

On 10/04/2013 07:36 PM, Andrew Sutton wrote:

+  if (!check_template_constraints (tmpl, args))
+{
+  location_t loc = DECL_SOURCE_LOCATION (function);
+  error ("%qD is not a viable candidate", function);
+  diagnose_constraints (input_location, tmpl, args);
+  return error_mark_node;
+}

Nit: loc seems unused.

Thanks,
Paolo.