Re: [PATCH] Implement -fsanitize=float-cast-overflow

2014-05-16 Thread Jakub Jelinek
On Thu, May 15, 2014 at 09:29:44PM +, Joseph S. Myers wrote:
> On Thu, 15 May 2014, Jakub Jelinek wrote:
>
> > But I think we can't use decimal_real_from_string, we'd need a variant
> > of that function that would allow specification of the rounding mode
>
> My point is that you can use "%.*RUe" or "%.*RDe" formats (for max and min
> respectively), with an appropriate precision, and let MPFR do the rounding
> to an appropriate number of decimal digits in the right direction (to
> produce a value that's exactly representable in the relevant DFP type, as
> long as it's in range).

You are right, that seems to work.  So new incremental patch.

BTW, for IBM long double
__int128_t f3 (long double x) { return x; }
the
u>= { -170141183460469231731687303715884105728.0 + -4194304.0 }
check is actually imprecise.  All correct long double values will be
properly accepted, but if the high double is exactly
-170141183460469231731687303715884105728.0 and the low double is in
[-1.0, -4194304.0), then the unspecified conversion will not be
diagnosed.  I'm afraid there is nothing we can (easily) do about it,
because { -170141183460469231731687303715884105728.0 + -1.0 }, while
representable in IBM long double, is not representable in our
REAL_VALUE_TYPE, because we assume fixed precision, while IBM long double
has variable precision.  I guess the combination of IBM long double and
__int128_t (for long long it is still OK) is rare enough that we don't
need to care about it (an alternative would be e.g. to compare against
the sum of those two numbers, computed with some optimization
barrier/volatile that keeps it from being folded into a single long
double constant).

--- gcc/convert.c
+++ gcc/convert.c
@@ -851,6 +851,8 @@
  expr = save_expr (expr);
  tree check = ubsan_instrument_float_cast (loc, type, expr);
  expr = build1 (FIX_TRUNC_EXPR, type, expr);
+ if (check == NULL)
+   return expr;
  return fold_build2 (COMPOUND_EXPR, TREE_TYPE (expr), check, expr);
}
   else
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -47,6 +47,8 @@
 #include "asan.h"
 #include "gimplify-me.h"
 #include "intl.h"
+#include "realmpfr.h"
+#include "dfp.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
 
@@ -903,17 +905,95 @@
 ubsan_instrument_float_cast (location_t loc, tree type, tree expr)
 {
   tree expr_type = TREE_TYPE (expr);
-  tree t, tt, fn;
+  tree t, tt, fn, min, max;
+  enum machine_mode mode = TYPE_MODE (expr_type);
+  int prec = TYPE_PRECISION (type);
+  bool uns_p = TYPE_UNSIGNED (type);
 
-  tree min = TYPE_MIN_VALUE (type);
-  tree max = TYPE_MAX_VALUE (type);
-  /* Add/subtract 1.0 so we can avoid truncating the value of EXPR.  */
-  min = fold_build2 (MINUS_EXPR, expr_type,
-build_real_from_int_cst (expr_type, min),
-build_one_cst (expr_type));
-  max = fold_build2 (PLUS_EXPR, expr_type,
-build_real_from_int_cst (expr_type, max),
-build_one_cst (expr_type));
+  /* Float to integer conversion first truncates toward zero, so
+ even signed char c = 127.875f; is not problematic.
+ Therefore, we should complain only if EXPR is unordered or smaller
+ or equal than TYPE_MIN_VALUE - 1.0 or greater or equal than
+ TYPE_MAX_VALUE + 1.0.  */
+  if (REAL_MODE_FORMAT (mode)->b == 2)
+{
+  /* For maximum, TYPE_MAX_VALUE might not be representable
+in EXPR_TYPE, e.g. if TYPE is 64-bit long long and
+EXPR_TYPE is IEEE single float, but TYPE_MAX_VALUE + 1.0 is
+either representable or infinity.  */
+  REAL_VALUE_TYPE maxval = dconst1;
+  SET_REAL_EXP (&maxval, REAL_EXP (&maxval) + prec - !uns_p);
+  real_convert (&maxval, mode, &maxval);
+  max = build_real (expr_type, maxval);
+
+  /* For unsigned, assume -1.0 is always representable.  */
+  if (uns_p)
+   min = build_minus_one_cst (expr_type);
+  else
+   {
+ /* TYPE_MIN_VALUE is generally representable (or -inf),
+but TYPE_MIN_VALUE - 1.0 might not be.  */
+ REAL_VALUE_TYPE minval = dconstm1, minval2;
+ SET_REAL_EXP (&minval, REAL_EXP (&minval) + prec - 1);
+ real_convert (&minval, mode, &minval);
+ real_arithmetic (&minval2, MINUS_EXPR, &minval, &dconst1);
+ real_convert (&minval2, mode, &minval2);
+ if (real_compare (EQ_EXPR, &minval, &minval2)
+ && !real_isinf (&minval))
+   {
+ /* If TYPE_MIN_VALUE - 1.0 is not representable and
+rounds to TYPE_MIN_VALUE, we need to subtract
+more.  As REAL_MODE_FORMAT (mode)->p is the number
+of base digits, we want to subtract a number that
+will be 1 << (REAL_MODE_FORMAT (mode)->p - 1)
+times smaller than minval.  */
+ minval2 = dconst1;
+ gcc_assert (prec > REAL_MODE_FORMAT (mode)->p);
+ SET_REAL_EXP (&minval2,
+   REAL_EXP (&

Re: [PATCH] Make SCCVN constant-fold calls

2014-05-16 Thread Richard Biener
On Thu, 15 May 2014, Richard Biener wrote:

> 
> For some odd reason I didn't implement this earlier.  This is one
> major source of 2nd-stage opportunities that come up when running
> two adjacent FRE passes.
> 
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

The following is what I have applied (now also properly
value-numbering VDEFs where present).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2014-05-16  Richard Biener  

* tree-ssa-sccvn.c (visit_use): Also constant-fold calls.

* gcc.dg/tree-ssa/ssa-fre-41.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 210418)
--- gcc/tree-ssa-sccvn.c(working copy)
*** visit_use (tree use)
*** 3566,3593 
else if (is_gimple_call (stmt))
{
  tree lhs = gimple_call_lhs (stmt);
- 
- /* ???  We could try to simplify calls.  */
- 
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
! if (stmt_has_constants (stmt))
!   VN_INFO (lhs)->has_constants = true;
! else
{
! /* We reset expr and constantness here because we may
!have been value numbering optimistically, and
!iterating.  They may become non-constant in this case,
!even if they were optimistically constant.  */
! VN_INFO (lhs)->has_constants = false;
! VN_INFO (lhs)->expr = NULL_TREE;
}
! 
! if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
! changed = defs_to_varying (stmt);
  goto done;
}
}
  
  if (!gimple_call_internal_p (stmt)
--- 3566,3635 
else if (is_gimple_call (stmt))
{
  tree lhs = gimple_call_lhs (stmt);
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
{
! /* Try constant folding based on our current lattice.  */
! tree simplified = gimple_fold_stmt_to_constant_1 (stmt,
!   vn_valueize);
! if (simplified)
{
! if (dump_file && (dump_flags & TDF_DETAILS))
!   {
! fprintf (dump_file, "call ");
! print_gimple_expr (dump_file, stmt, 0, 0);
! fprintf (dump_file, " simplified to ");
! print_generic_expr (dump_file, simplified, 0);
! if (TREE_CODE (lhs) == SSA_NAME)
!   fprintf (dump_file, " has constants %d\n",
!expr_has_constants (simplified));
! else
!   fprintf (dump_file, "\n");
!   }
}
! /* Setting value numbers to constants will occasionally
!screw up phi congruence because constants are not
!uniquely associated with a single ssa name that can be
!looked up.  */
! if (simplified
! && is_gimple_min_invariant (simplified))
{
! VN_INFO (lhs)->expr = simplified;
! VN_INFO (lhs)->has_constants = true;
! changed = set_ssa_val_to (lhs, simplified);
! if (gimple_vdef (stmt))
!   changed |= set_ssa_val_to (gimple_vdef (stmt),
!  gimple_vuse (stmt));
  goto done;
}
+ else if (simplified
+  && TREE_CODE (simplified) == SSA_NAME)
+   {
+ changed = visit_copy (lhs, simplified);
+ if (gimple_vdef (stmt))
+   changed |= set_ssa_val_to (gimple_vdef (stmt),
+  gimple_vuse (stmt));
+ goto done;
+   }
+ else
+   {
+ if (stmt_has_constants (stmt))
+   VN_INFO (lhs)->has_constants = true;
+ else
+   {
+ /* We reset expr and constantness here because we may
+have been value numbering optimistically, and
+iterating.  They may become non-constant in this case,
+even if they were optimistically constant.  */
+ VN_INFO (lhs)->has_constants = false;
+ VN_INFO (lhs)->expr = NULL_TREE;
+   }
+ 
+ if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+   {
+ changed = defs_to_varying (stmt);
+ goto done;
+   }
+   }
}
  
  if (!gimple_call_internal_p (stmt)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre

Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-05-16 Thread Svante Signell
On Wed, 2014-05-07 at 10:18 +0200, Svante Signell wrote:
> On Tue, 2014-05-06 at 15:26 +0200, Samuel Thibault wrote:

Attached is an updated patch8.diff.  Arch-specific code has been added
to src/libgo/mksysinfo.sh, so other systems are no longer affected by
the patch, except for the SYS_FCNTL part.

As for that part: without it, the build on GNU/Hurd fails.  On the
other hand, SYS_FCNTL is not defined for e.g. GNU/Linux either.  It is
used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go:
func dupCloseOnExec(fd int) (newfd int, err error) {
if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC!=0 {
r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd),
syscall.F_DUPFD_CLOEXEC, 0)

It is not yet known how the build succeeds on Linux without SYS_FCNTL
being defined; maybe the conditions above are never all met.
--- a/src/libgo/mksysinfo.sh
+++ b/src/libgo/mksysinfo.sh
@@ -210,6 +210,13 @@
   egrep '#define E[A-Z0-9_]+ ' | \
   sed -e 's/^#define \(E[A-Z0-9_]*\) .*$/const \1 = Errno(_\1)/' >> ${OUT}
 
+# Special treatment of EWOULDBLOCK for GNU/Hurd
+# /usr/include/bits/errno.h: #define EWOULDBLOCK EAGAIN
+if egrep 'define EWOULDBLOCK EAGAIN' gen-sysinfo.go > /dev/null 2>&1; then
+  egrep '^const EWOULDBLOCK = Errno(_EWOULDBLOCK)' ${OUT} | \
+sed -i.bak -e 's/_EWOULDBLOCK/_EAGAIN/' ${OUT}
+fi
+
 # The O_xxx flags.
 egrep '^const _(O|F|FD)_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\([^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
@@ -225,6 +232,11 @@
   echo "const F_DUPFD_CLOEXEC = 0" >> ${OUT}
 fi
 
+# Special treatment of SYS_FCNTL for GNU/Hurd
+if ! grep '^const SYS_FCNTL' ${OUT} > /dev/null 2>&1; then
+  echo "const SYS_FCNTL = 0" >> ${OUT}
+fi
+
 # These flags can be lost on i386 GNU/Linux when using
 # -D_FILE_OFFSET_BITS=64, because we see "#define F_SETLK F_SETLK64"
 # before we see the definition of F_SETLK64.
@@ -528,6 +540,12 @@
 
 # The stat type.
 # Prefer largefile variant if available.
+# Special treatment of st_dev for GNU/Hurd
+# /usr/include/i386-gnu/bits/stat.h: #define st_dev st_fsid
+if grep 'define st_dev st_fsid' gen-sysinfo.go > /dev/null 2>&1; then
+  egrep '^type _stat ' gen-sysinfo.go > /dev/null 2>&1| \
+  sed -i.bak -e 's/; st_fsid/; st_dev/' gen-sysinfo.go
+fi
 stat=`grep '^type _stat64 ' gen-sysinfo.go || true`
 if test "$stat" != ""; then
   grep '^type _stat64 ' gen-sysinfo.go


Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware

2014-05-16 Thread Richard Biener
On Thu, 8 May 2014, Richard Biener wrote:

> 
> Ok, not really predicate aware, but this makes value-numbering
> pessimistically handle non-executable edges.  In the following
> patch groundwork is laid and PHI value-numbering is adjusted
> to take advantage of edges known to be not executable.
> 
> SCCVN is not well-suited to be control aware, but we still
> can see if value-numbering allows us to mark edges as
> not executable by looking at control statements.  Value-numbering
> of PHI nodes is one obvious consumer of such information
> and it also gives a natural order to do that (pessimistic)
> edge executability computation - dominator order.
> 
> Thus the following adds a pass over all control statements,
> trying to simplify them after value-numbering their operands
> (and all uses recursively, as SCCVN does).
> 
> With followup patches I will try to use this information to
> reduce the amount of work done (also improving optimization,
> of course).  One other obvious candidate is the alias walker
> which doesn't have to consider unreachable paths when
> walking into virtual PHIs.
> 
> The patch will likely get some more cleanups (due to the hack
> in set_ssa_val_to).
> 
> Comments still welcome.

Quiet as usual.  Well, the following is what I have committed
after bootstrapping and regtesting on x86_64-unknown-linux-gnu.
It fixes the inliner which is confused by random pass-local
flags on the edges to the exit block, adds one more testcase
and adjusts two.

I figured that followups for more optimizations are not
necessary as virtual operand value-numbering already gets
us most of the benefit.  Followups trying to do less work
may still be possible but they are low on priority.

Richard.

2014-05-16  Richard Biener  

* tree-ssa-sccvn.c: Include tree-cfg.h and domwalk.h.
(set_ssa_val_to): Handle unexpected sets to VN_TOP.
(visit_phi): Ignore edges marked as not executable.
(class cond_dom_walker): New.
(cond_dom_walker::before_dom_children): Value-number
control statements and mark successor edges as not
executable if possible.
(run_scc_vn): First walk all control statements in
dominator order, marking edges as not executable.
* tree-inline.c (copy_edges_for_bb): Be not confused
about random edge flags.

* gcc.dg/tree-ssa/ssa-fre-39.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-40.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-8.c: One more elimination.
* gcc.dg/tree-ssa/struct-aliasing-2.c: Scan cddce1 dump.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c.orig   2014-05-15 12:47:20.762286122 +0200
--- gcc/tree-ssa-sccvn.c2014-05-15 13:04:57.872213342 +0200
*** along with GCC; see the file COPYING3.
*** 51,56 
--- 51,58 
  #include "params.h"
  #include "tree-ssa-propagate.h"
  #include "tree-ssa-sccvn.h"
+ #include "tree-cfg.h"
+ #include "domwalk.h"
  
  /* This algorithm is based on the SCC algorithm presented by Keith
 Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
*** set_ssa_val_to (tree from, tree to)
*** 2661,2666 
--- 2663,2687 
tree currval = SSA_VAL (from);
HOST_WIDE_INT toff, coff;
  
+   /* The only thing we allow as value numbers are ssa_names
+  and invariants.  So assert that here.  We don't allow VN_TOP
+  as visiting a stmt should produce a value-number other than
+  that.
+  ???  Still VN_TOP can happen for unreachable code, so force
+  it to varying in that case.  Not all code is prepared to
+  get VN_TOP on valueization.  */
+   if (to == VN_TOP)
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Forcing value number to varying on "
+"receiving VN_TOP\n");
+   to = from;
+ }
+ 
+   gcc_assert (to != NULL_TREE
+ && (TREE_CODE (to) == SSA_NAME
+ || is_gimple_min_invariant (to)));
+ 
if (from != to)
  {
if (currval == from)
*** set_ssa_val_to (tree from, tree to)
*** 2680,2692 
to = from;
  }
  
-   /* The only thing we allow as value numbers are VN_TOP, ssa_names
-  and invariants.  So assert that here.  */
-   gcc_assert (to != NULL_TREE
- && (to == VN_TOP
- || TREE_CODE (to) == SSA_NAME
- || is_gimple_min_invariant (to)));
- 
if (dump_file && (dump_flags & TDF_DETAILS))
  {
fprintf (dump_file, "Setting value number of ");
--- 2701,2706 
*** visit_phi (gimple phi)
*** 3071,3077 
tree result;
tree sameval = VN_TOP;
bool allsame = true;
-   unsigned i;
  
/* TODO: We could check for this in init_sccvn, and replace this
   with a gcc_assert.  */
--- 3085,3090 
*** visit_phi (gimple phi)
*** 3080,3106 
  
/* See if all non-TOP arguments have the same v

Re: [AArch64 costs 0/18] Improve address- and rtx-cost models

2014-05-16 Thread Marcus Shawcroft
This series is OK.
/Marcus

On 27 March 2014 17:33, James Greenhalgh  wrote:
> Hi,
>
> This patch series improves the costing model in the AArch64 backend to
> match a number of new idioms.
>
> This patch is a combination of a series I had been working on, with the
> cost-model for XGene-1 proposed by Philipp Tomsich.
> ( http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01084.html )
>
> Where sensible I have integrated the idiom matching in Philipp's
> with my own work, though there were cases which were redundant, or
> could be folded to reduce code duplication. There were other cases where
> the code suggested XGene-1 would benefit from special-case handling.
> Without documentation for the XGene-1 I can't cater for these special
> cases, and I have not tried to do so here.
>
> The patch series has been bootstrapped natively on aarch64-none-elf, and
> has been through aarch64-none-elf runs with no issue.
>
> Is this OK for stage-1?
>
> Thanks,
> James


[AArch64 costs] Fixup to costing of FNMUL

2014-05-16 Thread James Greenhalgh

Hi,

While waiting for approval of the costs patches, I spotted this bug.
FNMUL should be costed like an FMUL, rather than being costed like
an FMA instruction.

Bootstrapped and tested in series with the costs patches with no issues.

I see Marcus has approved the patch series, so is this OK to go in as a fixup
to that?

Thanks,
James

---
gcc/

2014-05-15  James Greenhalgh  

* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Fix FNMUL case.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6eb2303..0a7f441 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4711,24 +4711,18 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
 {
   if (speed)
 	{
-	  /* Floating-point FMA can also support negations of the
+	  /* Floating-point FMA/FMUL can also support negations of the
 	 operands.  */
 	  if (GET_CODE (op0) == NEG)
-	{
-	  maybe_fma = true;
-	  op0 = XEXP (op0, 0);
-	}
+	op0 = XEXP (op0, 0);
 	  if (GET_CODE (op1) == NEG)
-	{
-	  maybe_fma = true;
-	  op1 = XEXP (op1, 0);
-	}
+	op1 = XEXP (op1, 0);
 
 	  if (maybe_fma)
 	/* FMADD/FNMADD/FNMSUB/FMSUB.  */
 	cost += extra_cost->fp[mode == DFmode].fma;
 	  else
-	/* FMUL.  */
+	/* FMUL/FNMUL.  */
 	cost += extra_cost->fp[mode == DFmode].mult;
 	}
 

Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-16 Thread Nicholas Clifton

Hi Eric,

  OK - here is your version of the patch, extended with a comment which 
I think is helpful for other people reading the code, and with the 
changes to builtins.c and md.texi removed, since the size of the buffer 
is not changing.


  Is this version OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2014-05-16  Nick Clifton  

* except.c (init_eh): Correct computation of the size of a builtin
setjmp buffer for when pointers are bigger than words.


Index: gcc/except.c
===
--- gcc/except.c(revision 210490)
+++ gcc/except.c(working copy)
@@ -286,9 +286,22 @@
   tmp = size_int (FIRST_PSEUDO_REGISTER + 2 - 1);
 #endif
 #else
-  /* builtin_setjmp takes a pointer to 5 words.  */
-  tmp = size_int (5 * BITS_PER_WORD / POINTER_SIZE - 1);
+  /* Compute a minimally sized jump buffer.  We need room to store at
+least 3 pointers - stack pointer, frame pointer and return address.
+Plus for some targets we need room for an extra pointer - in the
+case of MIPS this is the global pointer.  This makes a total of four
+pointers, but to be safe we actually allocate room for 5.
+
+If pointers are smaller than words then we allocate enough room for
+5 words, just in case the backend needs this much room.  For more
+discussion on this issue see:
+http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00313.html.  */
+  if (POINTER_SIZE > BITS_PER_WORD)
+   tmp = size_int (5 - 1);
+  else
+   tmp = size_int ((5 * BITS_PER_WORD / POINTER_SIZE) - 1);
 #endif
   tmp = build_index_type (tmp);
   tmp = build_array_type (ptr_type_node, tmp);




Re: [AArch64 costs] Fixup to costing of FNMUL

2014-05-16 Thread Marcus Shawcroft
On 16 May 2014 09:26, James Greenhalgh  wrote:

> 2014-05-15  James Greenhalgh  
>
> * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Fix FNMUL case.

OK /Marcus


Avoiding some garbage rtl from instantiate_virtual_regs

2014-05-16 Thread Richard Sandiford
It seems that in a typical -O0 compile the amount of rtl that
starts out as needed but becomes garbage is only slightly less than
half of the total amount created.  One of the big offenders is the
vregs pass, which creates new PLUSes when instantiating a virtual
register + a constant and which creates new MEMs when instantiating
an address involving a virtual register.  This happens a lot in -O0
code because all variables live on the stack.

The instantiation walk is fundamentally in-place: every other part
of the pattern is modified without copying.  And rtl sharing rules
guarantee that we can do the same for PLUSes of registers and MEMs.

The patch does this by adding "inplace" arguments to plus_constant and
replace_equiv_address.  In a -O0 compile of an oldish fold-const.ii
(where no GC takes place) it reduces the amount of used GC memory
from 169M to 166M.  The average max RSS goes down by just over 1%.
Compile time seems to decrease slightly, but probably in the noise range.

There might be other callers that can use the new interfaces too.

Tested on x86_64-linux-gnu.  Also tested by comparing the asm output
for various parts of the testsuite before and after the patch.
The only changes were that some "sym+0"s became plain "syms"
(i.e. (plus X (const_int 0)) became X) because of the plus_constant
change.

OK to install?

Thanks,
Richard


gcc/
* emit-rtl.h (replace_equiv_address, replace_equiv_address_nv): Add an
inplace argument.  Store the new address in the original MEM when true.
* emit-rtl.c (change_address_1): Likewise.
(adjust_address_1, adjust_automodify_address_1, offset_address):
Update accordingly.
* rtl.h (plus_constant): Add an inplace argument.
* explow.c (plus_constant): Likewise.  Try to reuse the original PLUS
when true.  Avoid generating (plus X (const_int 0)).
* function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS
in-place.  Pass true to plus_constant.
(instantiate_virtual_regs_in_insn): Pass true to replace_equiv_address.

Index: gcc/emit-rtl.h
===
--- gcc/emit-rtl.h  2014-05-15 11:27:06.000259353 +0100
+++ gcc/emit-rtl.h  2014-05-16 09:11:42.479556294 +0100
@@ -52,10 +52,10 @@ extern tree get_spill_slot_decl (bool);
ADDR.  The caller is asserting that the actual piece of memory pointed
to is the same, just the form of the address is being changed, such as
by putting something into a register.  */
-extern rtx replace_equiv_address (rtx, rtx);
+extern rtx replace_equiv_address (rtx, rtx, bool = false);
 
 /* Likewise, but the reference is not required to be valid.  */
-extern rtx replace_equiv_address_nv (rtx, rtx);
+extern rtx replace_equiv_address_nv (rtx, rtx, bool = false);
 
 extern rtx gen_blockage (void);
 extern rtvec gen_rtvec (int, ...);
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2014-05-16 09:09:18.446271662 +0100
+++ gcc/emit-rtl.c  2014-05-16 09:40:25.285714457 +0100
@@ -145,7 +145,6 @@ #define cur_insn_uid (crtl->emit.x_cur_i
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
 #define first_label_num (crtl->emit.x_first_label_num)
 
-static rtx change_address_1 (rtx, enum machine_mode, rtx, int);
 static void set_used_decls (tree);
 static void mark_label_nuses (rtx);
 static hashval_t const_int_htab_hash (const void *);
@@ -2010,11 +2009,15 @@ clear_mem_size (rtx mem)
 /* Return a memory reference like MEMREF, but with its mode changed to MODE
and its address changed to ADDR.  (VOIDmode means don't change the mode.
NULL for ADDR means don't change the address.)  VALIDATE is nonzero if the
-   returned memory location is required to be valid.  The memory
-   attributes are not changed.  */
+   returned memory location is required to be valid.  INPLACE is true if any
+   changes can be made directly to MEMREF or false if MEMREF must be treated
+   as immutable.
+
+   The memory attributes are not changed.  */
 
 static rtx
-change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate)
+change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate,
+ bool inplace)
 {
   addr_space_t as;
   rtx new_rtx;
@@ -2042,6 +2045,12 @@ change_address_1 (rtx memref, enum machi
   if (rtx_equal_p (addr, XEXP (memref, 0)) && mode == GET_MODE (memref))
 return memref;
 
+  if (inplace)
+{
+  XEXP (memref, 0) = addr;
+  return memref;
+}
+
   new_rtx = gen_rtx_MEM (mode, addr);
   MEM_COPY_ATTRIBUTES (new_rtx, memref);
   return new_rtx;
@@ -2053,7 +2062,7 @@ change_address_1 (rtx memref, enum machi
 rtx
 change_address (rtx memref, enum machine_mode mode, rtx addr)
 {
-  rtx new_rtx = change_address_1 (memref, mode, addr, 1);
+  rtx new_rtx = change_address_1 (memref, mode, addr, 1, false);
   enum machine_mode mmode = GET_MODE (new_rtx);
   struc

[PATCH SH] Don't switch mode around fmov (pr61195)

2014-05-16 Thread Christian Bruel
Hi,

This patch reduces the number of unnecessary PR mode switches for
single-precision moves when FPSCR.SZ is not forced.  A test
illustrating this is included in the patch.

Regtested for sh-none-elf with -m4 (-m2a still running), bootstrapped
on an sh4-linux-elf board.  OK for trunk?

Many thanks,

Christian





 


2014-05-16  Christian Bruel  

	PR target/61195
	* config/sh/sh.md (movsf_ie): Unset fp_mode for fmov.

2014-05-16  Christian Bruel  

	PR target/61195
	* gcc.target/sh/pr61195.c: New test.

Index: config/sh/sh.md
===
--- config/sh/sh.md	(revision 210475)
+++ config/sh/sh.md	(working copy)
@@ -8357,9 +8357,26 @@ label:
   (const_int 2)
   (const_int 2)
   (const_int 0)])
-   (set (attr "fp_mode") (if_then_else (eq_attr "fmovd" "yes")
-	   (const_string "single")
-	   (const_string "single")))])
+  (set_attr_alternative "fp_mode"
+ [(if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+  (const_string "none")
+  (const_string "single")
+  (const_string "single")
+  (const_string "none")
+  (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+  (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none"))
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")
+  (const_string "none")])])
 
 (define_split
   [(set (match_operand:SF 0 "register_operand" "")
Index: testsuite/gcc.target/sh/pr61195.c
===
--- testsuite/gcc.target/sh/pr61195.c	(revision 0)
+++ testsuite/gcc.target/sh/pr61195.c	(working copy)
@@ -0,0 +1,19 @@
+/* Verify that we don't switch mode for single moves.  */
+/* { dg-do compile }  */
+/* { dg-require-effective-target hard_float } */
+/* { dg-skip-if "" { *-*-* }  { "mfmovd" } { "" } } */
+/* { dg-final { scan-assembler-not "fpscr" } } */
+
+float *g;
+
+float
+foo(float f)
+{
+  return f;
+}
+
+float
+foo1(void)
+{
+  return *g;
+}


RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

2014-05-16 Thread Ian Bolton
Ping.  This should be relatively simple to review.

Many thanks.

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Ian Bolton
> Sent: 08 May 2014 18:36
> To: gcc-patches
> Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative
> immediates where sensible
> 
> Hi,
> 
> It currently takes 4 instructions to generate certain immediates on
> AArch64 (unless we put them in the constant pool).
> 
> For example ...
> 
>   long long
>   ffffbeefcafebabe ()
>   {
> return 0xFFFFBEEFCAFEBABEll;
>   }
> 
> leads to ...
> 
>   mov x0, 47806
>   movk x0, 0xcafe, lsl 16
>   movk x0, 0xbeef, lsl 32
>   orr x0, x0, -281474976710656
> 
> The above case is tackled in this patch by employing MOVN
> to generate the top 32-bits in a single instruction ...
> 
>   mov x0, -71536975282177
>   movk x0, 0xcafe, lsl 16
>   movk x0, 0xbabe, lsl 0
> 
> (Note that where at least two half-words are 0xffff, existing
> code that does the immediate in two instructions is still used.)
> 
> Tested on standard gcc regressions and the attached test case.
> 
> OK for commit?
> 
> Cheers,
> Ian
> 
> 
> 2014-05-08  Ian Bolton  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
>   Use MOVN when top-most half-word (and only that half-word)
>   is 0xffff.
> gcc/testsuite/
>   * gcc.target/aarch64/movn_1.c: New test.





RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

2014-05-16 Thread Ian Bolton
Ping.  This may well be classed as "obvious", but that's not
obvious to me, so I request a review.  Many thanks.

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Ian Bolton
> Sent: 08 May 2014 18:42
> To: gcc-patches
> Subject: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
> 
> This patch fixes a defective macro definition, based on correct
> definition in similar testcases.  The test currently passes
> through luck rather than correctness.
> 
> OK for commit?
> 
> Cheers,
> Ian
> 
> 
> 2014-05-08  Ian Bolton  
> 
> gcc/testsuite
>   * gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an
>   actual instruction to move into the allocated register.





Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler

2014-05-16 Thread Georg-Johann Lay

On 05/15/2014 09:55 AM, Senthil Kumar Selvaraj wrote:

On Wed, May 14, 2014 at 12:56:54PM +0200, Rainer Orth wrote:

Georg-Johann Lay  writes:


Or what about simply that, which works for me:


Index: config/avr/avr.h
===
--- config/avr/avr.h(revision 210276)
+++ config/avr/avr.h(working copy)
@@ -512,7 +512,11 @@ extern const char *avr_device_to_sp8 (in
  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
  %{!fexceptions:-fno-exceptions}"

+#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) %{mrelax:-mlink-relax} "
+#else
  #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
+#endif

  #define LINK_SPEC "\
  %{mrelax:--relax\


Better yet something like

#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
#define LINK_RELAX_SPEC "%{mrelax:-mlink-relax} "
#else
#define LINK_RELAX_SPEC ""
#endif

#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC



Does this look ok? I don't have commit access, so could someone commit
this please?


Hi, looks fine to me.  Thanks

Usually, changelogs are more descriptive w.r.t. what objects are
touched, like:

* config/avr/avr.h (LINK_RELAX_SPEC): Pass -mlink-relax to the
assembler, depending on HAVE_AS_AVR_LINK_RELAX_OPTION.
(ASM_SPEC): Use it.
* configure.ac (HAVE_AVR_AS_LINK_RELAX_OPTION) [avr]: New define if
assembler supports -mlink-relax.
* config.in: Regenerate.
* configure: Likewise.



Regards
Senthil

2014-05-15  Senthil Kumar Selvaraj  

* config/avr/avr.h: Pass on mlink-relax to assembler.
* configure.ac: Test for mlink-relax assembler support.
* config.in: Regenerate.
* configure: Likewise.

diff --git gcc/config.in gcc/config.in
index c0ba36e..1738301 100644
--- gcc/config.in
+++ gcc/config.in
@@ -575,6 +575,12 @@
  #endif


+/* Define if your assembler supports -mlink-relax option. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AVR_AS_LINK_RELAX_OPTION
+#endif
+
+
  /* Define to 1 if you have the `clearerr_unlocked' function. */
  #ifndef USED_FOR_TARGET
  #undef HAVE_CLEARERR_UNLOCKED
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 9d34983..c59c54d 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -512,8 +512,14 @@ extern const char *avr_device_to_sp8 (int argc, const char 
**argv);
  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
  %{!fexceptions:-fno-exceptions}"

-#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
-
+#ifdef HAVE_AVR_AS_LINK_RELAX_OPTION
+#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
+#else
+#define ASM_RELAX_SPEC ""
+#endif
+
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " ASM_RELAX_SPEC
+
  #define LINK_SPEC "\
  %{mrelax:--relax\
   %{mpmem-wrap-around:%{mmcu=at90usb8*:--pmem-wrap-around=8k}\
diff --git gcc/configure gcc/configure
index f4db0a0..2812cdb 100755
--- gcc/configure
+++ gcc/configure
@@ -24014,6 +24014,39 @@ $as_echo "#define HAVE_AS_JSRDIRECT_RELOCS 1" 
>>confdefs.h
  fi
  ;;

+  avr-*-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -mlink-relax 
option" >&5
+$as_echo_n "checking assembler for -mlink-relax option... " >&6; }
+if test "${gcc_cv_as_avr_relax+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_avr_relax=no
+  if test x$gcc_cv_as != x; then
+$as_echo '.text' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mlink-relax -o conftest.o conftest.s 
>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_avr_relax=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_avr_relax" >&5
+$as_echo "$gcc_cv_as_avr_relax" >&6; }
+if test $gcc_cv_as_avr_relax = yes; then
+
+$as_echo "#define HAVE_AVR_AS_LINK_RELAX_OPTION 1" >>confdefs.h
+
+fi
+  ;;
+
cris-*-*)
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -no-mul-bug-abort 
option" >&5
  $as_echo_n "checking assembler for -no-mul-bug-abort option... " >&6; }
diff --git gcc/configure.ac gcc/configure.ac
index 8f17dfb..49a1f3d 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -3510,6 +3510,13 @@ case "$target" in
[Define if your assembler supports the lituse_jsrdirect relocation.])])
  ;;

+  avr-*-*)
+gcc_GAS_CHECK_FEATURE([-mlink-relax option], gcc_cv_as_avr_relax,,
+  [-mlink-relax], [.text],,
+  [AC_DEFINE(HAVE_AVR_AS_LINK_RELAX_OPTION, 1,
+   [Define if your assembler supports -mlink-relax option.])])
+  ;;
+
cris-*-*)
  gcc_GAS_CHECK_FEATURE([-no-mul-bug-abort option],
gcc_cv_as_cris_no_mul_bug,[2,15,91],





RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Thomas Preud'homme
Ping?

Best regards,

Thomas Preud'homme

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Friday, May 09, 2014 6:26 PM
> To: GCC Patches
> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
> 
> Sorry, took longer than expected as I got distracted by some other patch.
> I merged the whole patchset in a single patch as I was told the current setup
> is actually more difficult to read.
> 
> Here are the updated ChangeLogs:
> 
> *** gcc/ChangeLog ***
> 
> 2014-05-09  Thomas Preud'homme  
> 
>   PR tree-optimization/54733
>   * expr.c (get_inner_reference): Add a parameter to control whether
> a
>   MEM_REF should be split into base + offset.
>   * tree.h (get_inner_reference): Default new parameter to false.
>   * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
>   (CMPNOP): Define.
>   (find_bswap_or_nop_load): New.
>   (find_bswap_1): Renamed to ...
>   (find_bswap_or_nop_1): This. Also add support for memory source.
>   (find_bswap): Renamed to ...
>   (find_bswap_or_nop): This. Also add support for memory source and
>   detection of bitwise operations equivalent to load in host endianness.
>   (execute_optimize_bswap): Likewise. Also move its leading
> comment back
>   in place and split statement transformation into ...
>   (bswap_replace): This. Add assert when updating bswap_stats.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2014-05-09  Thomas Preud'homme  
> 
>   PR tree-optimization/54733
>   * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
> bswap
>   optimization to support memory sources and bitwise operations
>   equivalent to load in host endianness.
>   * gcc.dg/optimize-bswaphi-1.c: Likewise.
>   * gcc.dg/optimize-bswapsi-2.c: Likewise.
>   * gcc.c-torture/execute/bswap-2.c: Likewise.
> 
> Ok for trunk?
> 
> Best regards,
> 
> Thomas
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Monday, May 05, 2014 7:30 PM
> > To: GCC Patches
> > Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
> > load/store
> >
> > I found a way to improve the function find_bswap/find_bswap_or_nop
> > and reduce its size. Please hold for the review, I will post an updated
> > version as soon as I finish testing.
> >
> > Best regards,
> >
> > Thomas Preud'homme
> >
> >
> >




Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 12:57 AM, Steven Bosscher  wrote:
> On Thu, May 15, 2014 at 9:26 AM, bin.cheng wrote:
>> Hi,
>> Targets like ARM and AARCH64 support double-word load store instructions,
>> and these instructions are generally faster than the corresponding two
>> load/stores.  GCC currently uses peephole2 to merge paired load/store into
>> one single instruction, which has a disadvantage: it can only handle simple
>> cases where the two instructions actually appear sequentially in the
>> instruction stream, and is too weak to handle cases in which the two
>> load/stores are separated by other irrelevant instructions.
>>
>> Here comes up with a new GCC pass looking through each basic block and
>> merging paired load store even they are not adjacent to each other.  The
>> algorithm is pretty simple:
>> 1) In initialization pass iterating over instruction stream it collects
>> relevant memory access information for each instruction.
>> 2) It iterates over each basic block, tries to find possible paired
>> instruction for each memory access instruction.  During this work, it checks
>> dependencies between the two possible instructions and also records the
>> information indicating how to pair the two instructions.  To avoid quadratic
>> behavior of the algorithm, it introduces a new parameter,
>> max-merge-paired-loadstore-distance, and sets the default value to 4, which is
>> large enough to catch major part of opportunities on ARM/cortex-a15.
>> 3) For each candidate pair, it calls back-end's hook to do target dependent
>> check and merge the two instructions if possible.
>>
>> Though the parameter is set to 4, for miscellaneous benchmarks, this pass
>> can merge numerous opportunities except ones already merged by peephole2
>> (same level numbers of opportunities comparing to peepholed ones).  GCC
>> bootstrap can also confirm this finding.
>>
>> Yet there is an open issue about when we should run this new pass.  Though
>> register renaming is disabled by default now, I put this pass after it,
>> because renaming can resolve some false dependencies thus benefit this pass.
>> Another finding is, it can capture a lot more opportunities if it's after
>> sched2, but I am not sure whether it will mess up with scheduling results in
>> this way.
>>
>> So, any comments about this?

Hi Steven,
Thanks for reviewing this.  Here are some answers to the general questions.

>
> First off: Why does this need a target hook? We're getting more and
> more of them -- too many IMHO. There should be really good reasons for
> adding even more new ones.

Yes, I think this one does have a good reason.  The target-independent
pass just makes sure that the two memory access instructions are free
of data dependencies on each other, then feeds them to the back-end
hook.  It is the back-end's responsibility to generate a correct
instruction.  It's not about modifying an existing insn and then
recognizing it; sometimes it's about creating a new instruction.  For
example, we can generate a simple move insn in ARM mode, while we have
to generate a parallel instruction in Thumb mode.  The
target-independent part has no idea how to generate the expected insn.
Moreover, the back-end may check some special conditions too.

>
> Does this pass have to run after scheduling? The way you describe it,
No, I just meant there are more opportunities after register renaming,
and even more after sched2.  I haven't investigated the reason for the
latter yet, but this pass doesn't depend on sched2 to
work.

> this sounds like a scheduling problem, where you don't need regrename
> to resolve false dependencies. Your sched2 pass should be able to
> prioritize mergeable loads/stores to schedule them adjacent. Of if you
> must do this before scheduling, then at least do it before sched2. Now
> you're really ruining the order of the scheduled instructions, which
> seems bad.
Yes, I agree it should NOT disturb scheduling results, that's why I
put it before sched2 and after register renaming right now.

>
> I don't see how regrename will help resolve [base+offset] false
> dependencies. Can you explain? I'd expect effects from
> hardreg-copyprop "commoning" a base register.
It's a false dependency on the register operand, rather than on the
base register.  Consider the simple case below:
mov r1,  #const1
store r1, [base+offset1]
mov r1, #const2
store r1, [base+offset2]
It should be renamed into:
mov r1,  #const1
store r1, [base+offset1]
mov r2, #const2
store r2, [base+offset2]
which is then caught by the patch.

I will leave other comments for a moment after more discussion here.

Thanks,
bin
>
> ChangeLog is missing the entry for arm.c.
>
> Your pass should make those peephole2's redundant, so you should
> remove the relevant define_peephole2's.
>
>
>>  +   generated by spilling during reigster allocation.  To catch all
>
> s/reigster/register/
>
>
>> +   whose address expression is in the form of "[base_offset]".  It
>
> s/[base_offset]/[base+offset]/
>
>
>> +   only 

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 1:13 AM, Jeff Law  wrote:
> On 05/15/14 10:51, Mike Stump wrote:
>>
>> On May 15, 2014, at 12:26 AM, bin.cheng  wrote:
>>>
>>> Here comes up with a new GCC pass looking through each basic block
>>> and merging paired load store even they are not adjacent to each
>>> other.
>>
>>
>> So I have a target that has load and store multiple support that
> supports a large number of registers (2-n registers), and I added a
>> sched0 pass that is a light copy of the regular scheduling pass that
>> uses a different cost function which arranges all loads first, then
>> all stores then everything else.  Within a group of loads or stores
>> the secondary key is the base register, the next key is the offset.
>> The net result, all loads off the same register are sorted in
>> increasing order.
>
> Glad to see someone else stumble on (ab)using the scheduler to do this.
Emm, if it's (ab)using, should we still do it then?

>
> I've poked at the scheduler several times to do similar stuff, but was never
> really satisfied with the results and never tried to polish those prototypes
> into something worth submitting.
>
> One example I've poked at was discovery of stores which then feed into a
> load from the same location.  Which obviously we'd prefer to turn into a
> store + copy (subject to mess of constraints).  There's a handful of these
> kind of transformations that seem to naturally drop out of this kind of
> work.
As Mike stated, merging of consecutive memory accesses is all about
the base register and the offset.  I am thinking of another method:
collect all memory accesses with the same base register, then do the
merge work.  In this way, we should be able to merge more than two
instructions, and it would also be possible to remove redundant load
instructions in the same pass.

My question is: how many such redundant loads could there be?  Is
there any RTL pass responsible for this now?

Thanks,
bin
>
> Similarly a post-reload pass could be used to promote single word
> loads/stores to double-word operations.
>
> If anyone cared about PA 1.1 code generation, it'd be a much cleaner way to
> support the non-fused fmpyadd fmpysub insns.
>
> Anyway, if you want to move forward with the idea, I'd certainly support
> doing so.
>
> jeff



-- 
Best Regards.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Fri, May 16, 2014 at 12:51 AM, Mike Stump  wrote:
> On May 15, 2014, at 12:26 AM, bin.cheng  wrote:
>> Here comes up with a new GCC pass looking through each basic block and
>> merging paired load store even they are not adjacent to each other.
>
> So I have a target that has load and store multiple support that supports 
> a large number of registers (2-n registers), and I added a sched0 pass that 
> is a light copy of the regular scheduling pass that uses a different cost 
> function which arranges all loads first, then all stores then everything 
> else.  Within a group of loads or stores the secondary key is the base 
> register, the next key is the offset.  The net result, all loads off the same 
> register are sorted in increasing order.  We then can use some define_insns 
> and some peephole2 patterns to take the seemingly unrelated instructions, 
> which are now adjacent, and knock them down into single instructions, instead 
> of the mass of instructions they were before.  And then a peephole pass that 
> runs early to allow the patterns to do the heavy lifting.  This scheme can 
> handle unlimited complexity on the addressing forms just by ensuring the cost 
> function for the new scheduling pass looks at all relevant bits (target 
> dependent, if the simple machine independent form reg + n is not enough).  
> The sched0 and the peephole pass run early:
>
> +  NEXT_PASS (pass_sched0);
> +  NEXT_PASS (pass_peephole1);
>NEXT_PASS (pass_web);
>NEXT_PASS (pass_rtl_cprop);
>NEXT_PASS (pass_cse2);
>
> (before register allocation) so, it can arrange to have things in adjacent 
> registers for the load and store multiple instructions.  The register 
> allocator can then arrange all the code to use those registers directly.
>
> So, having done all this work, I think it would be nicer if there were a pass 
> that managed it (so that one doesn't have to write any of the peephole or the 
> define_insns (you need something like 3*n patterns, and the patterns are of 
> size O(n), so you need around n*4/2 lines of code, which is annoying for 
> large n).  A pass could 
> use the existing load store multiple patterns directly, so, no additional 
> port work.  In my work, since I extend life times of values into registers, 
> pretty much without limit, this could be a bad thing.  The code is naturally 
> limited to only extending the lives of things where load and store multiple 
> are used, as if they aren't used, the regular scheduling pass would undo all 
> the sched0 motion.  I choose a light copy of sched, as I don't care about 
> compile time, and I wanted a very small patch that was easy to maintain.  If 
this pass went into trunk, we'd run the new passes _only_ if a port asked for 
> them.  99% of the ports likely don't want either, though, peephole before 
> register allocation might be interesting for others to solve other issues.
>
> I wanted this to run before register allocation as my load and store multiple 
> instructions only take consecutive register ranges (n-m), and I need the 
> register allocator to manage to make it true.  I do reg to reg motion to move 
> things around as needed, but almost all I expect the allocator to get rid of. 
>  Very complex cases might wind up with a few extra moves, but I have nice 
> bubbles that I can fit these extra moves into.
>
> In my scheme, no new options, no documentation for new options, no new param 
> options, no silly constants, no hard to write/maintain pass, no new weird 
> targets interface, no limit on just pairs, works on stores as well, runs 
> earlier, 430 lines instead of 1086 lines, conceptually much simpler, added 
> benefit of peephole before register allocation that can be used for other 
> things by the port...
>
> So, my question is, does my scheme on your target find more or fewer things?  
> Would your scheme pair pairs (so that 4 registers would go into 1 
> instruction)?
>
Hi Mike,
Thanks for bringing up this new method.  I have to admit that I was not
very much into it at first glance, especially since it requires another
pass in cooperation with the scheduler.

For the first question, unfortunately, I can't apply the patch to svn
trunk; could you please update it so that I can do some experiments
with it?  From my experience, lots of opportunities on ARM are
generated by RA, so putting the pass before RA would miss many cases.
Another possible issue is the interaction with other optimizations.
Putting it at an early stage of RTL, we may disturb optimizations like
gcse/-fgcse-lm/-fgcse-sm/dse.  Of course, it has advantages too; for
example, fwprop_addr sometimes corrupts load/store pair opportunities
on ARM, which wouldn't be a problem for your patch.

For the second question, the answer is no for the current implementation.
For ARM the most important opportunity is the paired one, so I just
started with pairs of two instructions, which is much simpler.  I do
have a draft idea of how to merge more than two instructions, but haven't
worke

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Bin.Cheng
On Thu, May 15, 2014 at 6:31 PM, Oleg Endo  wrote:
> Hi,
>
> On 15 May 2014, at 09:26, "bin.cheng"  wrote:
>
>> Hi,
>> Targets like ARM and AARCH64 support double-word load store instructions,
>> and these instructions are generally faster than the corresponding two
>> load/stores.  GCC currently uses peephole2 to merge paired load/store into
>> one single instruction which has a disadvantage.  It can only handle simple
>> cases like the two instructions actually appear sequentially in instruction
>> stream, and is too weak to handle cases in which the two load/store are
>> intervened by other irrelevant instructions.
>>
>> Here comes up with a new GCC pass looking through each basic block and
>> merging paired load store even they are not adjacent to each other.  The
>> algorithm is pretty simple:
>> 1) In initialization pass iterating over instruction stream it collects
>> relevant memory access information for each instruction.
>> 2) It iterates over each basic block, tries to find possible paired
>> instruction for each memory access instruction.  During this work, it checks
>> dependencies between the two possible instructions and also records the
>> information indicating how to pair the two instructions.  To avoid quadratic
>> behavior of the algorithm, it introduces a new parameter,
>> max-merge-paired-loadstore-distance, and sets the default value to 4, which is
>> large enough to catch major part of opportunities on ARM/cortex-a15.
>> 3) For each candidate pair, it calls back-end's hook to do target dependent
>> check and merge the two instructions if possible.
>>
>> Though the parameter is set to 4, for miscellaneous benchmarks, this pass
>> can merge numerous opportunities except ones already merged by peephole2
>> (same level numbers of opportunities comparing to peepholed ones).  GCC
>> bootstrap can also confirm this finding.
>
> This is interesting.  E.g. on SH there are insns to load/store SFmode pairs.  
> However, these insns require a mode switch and have some constraints on 
> register usage.  So in the SH case the load/store pairing would need to be 
> done before reg alloc and before mode switching.
>
>>
>> Yet there is an open issue about when we should run this new pass.  Though
>> register renaming is disabled by default now, I put this pass after it,
>> because renaming can resolve some false dependencies thus benefit this pass.
>> Another finding is, it can capture a lot more opportunities if it's after
>> sched2, but I am not sure whether it will mess up with scheduling results in
>> this way.
>
> How about the following.
> Instead of adding new hooks and inserting the pass to the general pass list, 
> make the new
> pass class take the necessary callback functions directly.  Then targets can 
> just instantiate
> the pass, passing their impl of the callbacks, and insert the pass object 
> into the pass list at
> a place that fits best for the target.
Oh, I didn't know we could do this in GCC.  But yes, a target may want
to run it at the place that fits best for the target.

Thanks,
bin
>
>
>>
>> So, any comments about this?
>>
>> Thanks,
>> bin
>>
>>
>> 2014-05-15  Bin Cheng  
>>* common.opt (flag_merge_paired_loadstore): New option.
>>* merge-paired-loadstore.c: New file.
>>* Makefile.in: Support new file.
>>* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
>>(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
>>* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
>>* doc/invoke.texi (-fmerge-paired-loadstore): New.
>>(max-merge-paired-loadstore-distance): New.
>>* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
>>* doc/tm.texi: Regenerated.
>>* target.def (merge_paired_loadstore): New.
>>* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
>>* passes.def (pass_merge_paired_loadstore): New pass.
>>* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
>>
>> gcc/testsuite/ChangeLog
>> 2014-05-15  Bin Cheng  
>>
>>* gcc.target/arm/merge-paired-loadstore.c: New test.
>>
>> 



-- 
Best Regards.


Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.

2014-05-16 Thread James Greenhalgh
On Fri, Mar 28, 2014 at 03:39:53PM +, James Greenhalgh wrote:
> On Fri, Mar 28, 2014 at 03:09:22PM +, pins...@gmail.com wrote:
> > > On Mar 28, 2014, at 7:48 AM, James Greenhalgh  
> > > wrote:
> > > On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote:
> > >>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh 
> > >>>  wrote:
> > >>> There is no way to perform scalar addition in the vector register file,
> > >>> but with the RTX costs in place we start rewriting (x << 1) to (x + x)
> > >>> on almost all cores. The code which makes this decision has no idea 
> > >>> that we
> > >>> will end up doing this (it happens well before reload) and so we end up 
> > >>> with
> > >>> very ugly code generation in the case where addition was selected, but
> > >>> we are operating in vector registers.
> > >>> 
> > >>> This patch relies on the same gimmick we are already using to allow
> > >>> shifts on 32-bit scalars in the vector register file - Use a vector 32x2
> > >>> operation instead, knowing that we can safely ignore the top bits.
> > >>> 
> > >>> This restores some normality to scalar_shift_1.c, however the test
> > >>> that we generate a left shift by one is clearly bogus, so remove that.
> > >>> 
> > >>> This patch is pretty ugly, but it does generate superficially better
> > >>> looking code for this testcase.
> > >>> 
> > >>> Tested on aarch64-none-elf with no issues.
> > >>> 
> > >>> OK for stage 1?
> > >> 
> > >> It seems we should also discourage the neon alternatives as there might 
> > >> be
> > >> extra movement between the two register sets which we don't want.
> > > 
> > > I see your point, but we've tried to avoid doing that elsewhere in the
> > > AArch64 backend. Our argument has been that strictly speaking, it isn't 
> > > that
> > > the alternative is expensive, it is the movement between the register 
> > > sets. We
> > > do model that elsewhere, and the register allocator should already be 
> > > trying to
> > > avoid unneccesary moves between register classes.
> > > 
> > 
> > What about on a specific core where that alternative is expensive; that is
> > the vector instructions are worse than the scalar ones. How are we going to
> > handle this case?
> 
> Certainly not by discouraging the alternative for all cores. We would need
> a more nuanced approach which could be tuned on a per-core basis. Otherwise
> we are bluntly and inaccurately pessimizing those cases where we can cheaply
> perform the operation in the vector register file (e.g. we are cleaning up
> loose ends after a vector loop, we have spilled to the vector register
> file, etc.). The register preference mechanism feels the wrong place to
> catch this as it does not allow for that degree of per-core flexibility,
> an alternative is simply "disparaged slightly" (?, * in LRA) or
> "disparaged severely" (!).
> 
> I would think that we don't want to start polluting the machine description
> trying to hack around this as was done with the ARM backend's
> neon_for_64_bits/avoid_neon_for_64_bits.
> 
> How have other targets solved this issue?

Did you have any further thoughts on this? I've pushed the costs patches, so
we will start to see gcc.target/aarch64/scalar_shift_1.c failing without
this or an equivalent patch.

Otherwise, *ping*

Thanks,
James

---
gcc/

2014-05-16  James Greenhalgh  

* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
vector registers.

gcc/testsuite/

2014-05-16  James Greenhalgh  

* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.

> > > If those mechanisms are broken, we should fix them - in that case fixing
> > > this by discouraging valid alternatives would seem to be gaffer-taping 
> > > over the
> > > real problem.
> > > 
> > > Thanks,
> > > James
> > > 
> > >> 
> > >> Thanks,
> > >> Andrew
> > >> 
> > >>> 
> > >>> Thanks,
> > >>> James
> > >>> 
> > >>> ---
> > >>> gcc/
> > >>> 
> > >>> 2014-03-27  James Greenhalgh  
> > >>> 
> > >>>   * config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> > >>>   vector registers.
> > >>> 
> > >>> gcc/testsuite/
> > >>> 2014-03-27  James Greenhalgh  
> > >>> 
> > >>>   * gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > >>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>
> > >> 
> > 
> 


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
 
 (define_insn "*addsi3_aarch64"
   [(set
-(match_operand:SI 0 "register_operand" "=rk,rk,rk")
+(match_operand:SI 0

Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 wrote:
> Ping?

Sorry ...

> Best regards,
>
> Thomas Preud'homme
>
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>> Sent: Friday, May 09, 2014 6:26 PM
>> To: GCC Patches
>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
>>
>> Sorry, took longer than expected as I got distracted by some other patch.
>> I merged the whole patchset in a single patch as I was told the current setup
>> is actually more difficult to read.
>>
>> Here are the updated ChangeLogs:
>>
>> *** gcc/ChangeLog ***
>>
>> 2014-05-09  Thomas Preud'homme  
>>
>>   PR tree-optimization/54733
>>   * expr.c (get_inner_reference): Add a parameter to control whether
>> a
>>   MEM_REF should be split into base + offset.
>>   * tree.h (get_inner_reference): Default new parameter to false.
>>   * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
>>   (CMPNOP): Define.
>>   (find_bswap_or_nop_load): New.
>>   (find_bswap_1): Renamed to ...
>>   (find_bswap_or_nop_1): This. Also add support for memory source.
>>   (find_bswap): Renamed to ...
>>   (find_bswap_or_nop): This. Also add support for memory source and
>>   detection of bitwise operations equivalent to load in host endianness.
>>   (execute_optimize_bswap): Likewise. Also move its leading
>> comment back
>>   in place and split statement transformation into ...
>>   (bswap_replace): This. Add assert when updating bswap_stats.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2014-05-09  Thomas Preud'homme  
>>
>>   PR tree-optimization/54733
>>   * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
>> bswap
>>   optimization to support memory sources and bitwise operations
>>   equivalent to load in host endianness.
>>   * gcc.dg/optimize-bswaphi-1.c: Likewise.
>>   * gcc.dg/optimize-bswapsi-2.c: Likewise.
>>   * gcc.c-torture/execute/bswap-2.c: Likewise.
>>
>> Ok for trunk?

Ok, I now decided otherwise and dislike the new parameter to
get_inner_reference.  Can you please revert that part and just
deal with a MEM_REF result in your only caller?

And (of course) I found another possible issue.  The way you
compute load_type and use it here:

+  /* Perform the load.  */
+  load_offset_ptr = build_int_cst (n->alias_set, 0);
+  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
+ load_offset_ptr);

makes the load always appear aligned according to the mode of
load_type.  On strict-alignment targets this may cause faults.

So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
   if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

or keep track of the "first" actual load and use

   unsigned int align = get_object_alignment (that_first_load);

"first" as in the one that corresponds to addr_tmp.  From that there
is a much better chance to derive good alignment values.

Of course, on STRICT_ALIGNMENT targets an unaligned load will be
decomposed again, so eventually doing the transformation may no longer
be profitable(?).

Thanks and sorry again for the delay.

Otherwise the patch looks good to me.

Richard.

>> Best regards,
>>
>> Thomas
>>
>> > -Original Message-
>> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>> > Sent: Monday, May 05, 2014 7:30 PM
>> > To: GCC Patches
>> > Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
>> > load/store
>> >
>> > I found a way to improve the function find_bswap/find_bswap_or_nop
>> > and reduce its size. Please hold for the review, I will post an updated
>> > version as soon as I finish testing.
>> >
>> > Best regards,
>> >
>> > Thomas Preud'homme
>> >
>> >
>> >
>
>


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:10 PM, Bin.Cheng  wrote:
> On Thu, May 15, 2014 at 6:31 PM, Oleg Endo  wrote:
>> Hi,
>>
>> On 15 May 2014, at 09:26, "bin.cheng"  wrote:
>>
>>> Hi,
>>> Targets like ARM and AARCH64 support double-word load store instructions,
>>> and these instructions are generally faster than the corresponding two
>>> load/stores.  GCC currently uses peephole2 to merge paired load/store into
>>> one single instruction, which has a disadvantage: it can only handle simple
>>> cases where the two instructions actually appear sequentially in the
>>> instruction stream, and is too weak to handle cases in which the two
>>> load/stores are separated by other irrelevant instructions.
>>>
>>> Here comes up with a new GCC pass looking through each basic block and
>>> merging paired load store even they are not adjacent to each other.  The
>>> algorithm is pretty simple:
>>> 1) In initialization pass iterating over instruction stream it collects
>>> relevant memory access information for each instruction.
>>> 2) It iterates over each basic block, tries to find possible paired
>>> instruction for each memory access instruction.  During this work, it checks
>>> dependencies between the two possible instructions and also records the
>>> information indicating how to pair the two instructions.  To avoid quadratic
>>> behavior of the algorithm, it introduces a new parameter,
>>> max-merge-paired-loadstore-distance, and sets the default value to 4, which is
>>> large enough to catch major part of opportunities on ARM/cortex-a15.
>>> 3) For each candidate pair, it calls back-end's hook to do target dependent
>>> check and merge the two instructions if possible.
>>>
>>> Though the parameter is set to 4, for miscellaneous benchmarks, this pass
>>> can merge numerous opportunities except ones already merged by peephole2
>>> (same level numbers of opportunities comparing to peepholed ones).  GCC
>>> bootstrap can also confirm this finding.
>>
>> This is interesting.  E.g. on SH there are insns to load/store SFmode pairs. 
>>  However, these insns require a mode switch and have some constraints on 
>> register usage.  So in the SH case the load/store pairing would need to be 
>> done before reg alloc and before mode switching.
>>
>>>
>>> Yet there is an open issue about when we should run this new pass.  Though
>>> register renaming is disabled by default now, I put this pass after it,
>>> because renaming can resolve some false dependencies and thus benefit this pass.
>>> Another finding is that it can capture a lot more opportunities if it runs
>>> after sched2, but I am not sure whether it will mess up the scheduling
>>> results that way.
>>
>> How about the following.
>> Instead of adding new hooks and inserting the pass into the general pass
>> list, make the new pass class take the necessary callback functions
>> directly.  Then targets can just instantiate the pass, passing their impl
>> of the callbacks, and insert the pass object into the pass list at a place
>> that fits best for the target.
> Oh, I didn't know we could do this in GCC.  But yes, a target may want to
> run it at some place that fits best for the target.

Btw, the bswap pass enhancements that are currently in review may
also be an opportunity to catch these.  They can merge adjacent
loads that are used "composed" (but not yet composed by storing
into adjacent memory).  The basic-block vectorizer should also
handle this (if the composition happens to be by storing into
adjacent memory) - of course it needs vector modes available and
it has to be enabled.
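The two shapes mentioned above can be sketched as follows (illustrative code, not from any patch): in `compose()` the adjacent byte loads are used "composed" into one value, the case the enhanced bswap pass can merge into a single wider load; in `store_pair()` the composition happens by storing into adjacent memory, which is the basic-block vectorizer's case.

```cpp
#include <cassert>
#include <cstdint>

// Adjacent loads used composed into one value: a candidate for the
// bswap pass enhancements (merged into a single wider load).
uint16_t compose (const unsigned char *p)
{
  // little-endian composition of two adjacent byte loads
  return (uint16_t) (p[0] | p[1] << 8);
}

// Adjacent loads composed by storing into adjacent memory: a
// candidate for the basic-block (SLP) vectorizer instead.
void store_pair (unsigned char *d, const unsigned char *p)
{
  d[0] = p[0];
  d[1] = p[1];
}
```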

Richard.

> Thanks,
> bin
>>
>>
>>>
>>> So, any comments about this?
>>>
>>> Thanks,
>>> bin
>>>
>>>
>>> 2014-05-15  Bin Cheng  
>>>* common.opt (flag_merge_paired_loadstore): New option.
>>>* merge-paired-loadstore.c: New file.
>>>* Makefile.in: Support new file.
>>>* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
>>>(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
>>>* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
>>>* doc/invoke.texi (-fmerge-paired-loadstore): New.
>>>(max-merge-paired-loadstore-distance): New.
>>>* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
>>>* doc/tm.texi: Regenerated.
>>>* target.def (merge_paired_loadstore): New.
>>>* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
>>>* passes.def (pass_merge_paired_loadstore): New pass.
>>>* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2014-05-15  Bin Cheng  
>>>
>>>* gcc.target/arm/merge-paired-loadstore.c: New test.
>>>
>>> 
>
>
>
> --
> Best Regards.


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread pinskia


> On May 16, 2014, at 3:48 AM, Richard Biener  
> wrote:
> 
> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
>  wrote:
>> Ping?
> 
> Sorry ...
> 
>> Best regards,
>> 
>> Thomas Preud'homme
>> 
>>> -Original Message-
>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>>> Sent: Friday, May 09, 2014 6:26 PM
>>> To: GCC Patches
>>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
>>> 
>>> Sorry, took longer than expected as I got distracted by some other patch.
>>> I merged the whole patchset in a single patch as I was told the current 
>>> setup
>>> is actually more difficult to read.
>>> 
>>> Here are the updated ChangeLogs:
>>> 
>>> *** gcc/ChangeLog ***
>>> 
>>> 2014-05-09  Thomas Preud'homme  
>>> 
>>>  PR tree-optimization/54733
>>>  * expr.c (get_inner_reference): Add a parameter to control whether
>>> a
>>>  MEM_REF should be split into base + offset.
>>>  * tree.h (get_inner_reference): Default new parameter to false.
>>>  * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
>>>  (CMPNOP): Define.
>>>  (find_bswap_or_nop_load): New.
>>>  (find_bswap_1): Renamed to ...
>>>  (find_bswap_or_nop_1): This. Also add support for memory source.
>>>  (find_bswap): Renamed to ...
>>>  (find_bswap_or_nop): This. Also add support for memory source and
>>>  detection of bitwise operations equivalent to load in host endianness.
>>>  (execute_optimize_bswap): Likewise. Also move its leading
>>> comment back
>>>  in place and split statement transformation into ...
>>>  (bswap_replace): This. Add assert when updating bswap_stats.
>>> 
>>> *** gcc/testsuite/ChangeLog ***
>>> 
>>> 2014-05-09  Thomas Preud'homme  
>>> 
>>>  PR tree-optimization/54733
>>>  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
>>> bswap
>>>  optimization to support memory sources and bitwise operations
>>>  equivalent to load in host endianness.
>>>  * gcc.dg/optimize-bswaphi-1.c: Likewise.
>>>  * gcc.dg/optimize-bswapsi-2.c: Likewise.
>>>  * gcc.c-torture/execute/bswap-2.c: Likewise.
>>> 
>>> Ok for trunk?
> 
> Ok, I now decided otherwise and dislike the new parameter to
> get_inner_reference.  Can you please revert that part and just
> deal with a MEM_REF result in your only caller?
> 
> And (of course) I found another possible issue.  The way you
> compute load_type and use it here:
> 
> +  /* Perform the load.  */
> +  load_offset_ptr = build_int_cst (n->alias_set, 0);
> +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
> + load_offset_ptr);
> 
> makes the load always appear aligned according to the mode of
> load_type.  On strict-alignment targets this may cause faults.
> 
> So what you have to do is either (simpler)
> 
>   unsigned int align = get_pointer_alignment (addr_tmp);
>   tree al_load_type = load_type;
>   if (align < TYPE_ALIGN (load_type))
> al_load_type = build_aligned_type (load_type, align);
> ...
>val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
> load_offset_ptr);
> 
> or keep track of the "first" actual load and use
> 
>   unsigned int align = get_object_alignment (that_first_load);
> 
> "first" in the one that corresponds to addr_tmp.  From that there
> is a much better chance to derive good alignment values.
> 
> Of course on STRICT_ALIGNMENT targets a not aligned load
> will be decomposed again, so eventually doing the transformation
> may no longer be profitable(?).

Not always decomposed.  On MIPS, it should use the load/store left/right
instructions for unaligned load/stores, which is normally better than
decomposed load/stores.  So having a cost model would be nice.

Thanks,
Andrew
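A source-level analogue of the alignment hazard discussed above (a sketch, not GCC internals): reading through a `uint32_t` lvalue asserts 4-byte alignment to the compiler, and on strict-alignment targets the resulting load can fault when the pointer is in fact unaligned. Going through `memcpy` lets the compiler pick a legal access sequence, one load where unaligned access is allowed, byte loads otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Safe for any alignment of p: the compiler expands the memcpy into
// whatever access pattern the target permits, instead of emitting an
// aligned 32-bit load that could trap.
uint32_t load_u32_any_alignment (const unsigned char *p)
{
  uint32_t v;
  std::memcpy (&v, p, sizeof v);
  return v;
}
```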

> 
> Thanks and sorry again for the delay.
> 
> Otherwise the patch looks good to me.
> 
> Richard.
> 
>>> Best regards,
>>> 
>>> Thomas
>>> 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Monday, May 05, 2014 7:30 PM
 To: GCC Patches
 Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
 load/store
 
 I found a way to improve the function find_bswap/find_bswap_or_nop
 and reduce its size. Please hold for the review, I will post an updated
 version as soon as I finish testing.
 
 Best regards,
 
 Thomas Preud'homme
>> 
>> 


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 1:54 AM, Xinliang David Li  wrote:
> Hi, debugging runtime bugs due to devirtualization can be hard for
> very large C++ programs with complicated class hierarchy. This patch
> adds the support to report this high level transformation via
> -fopt-info (not hidden inside dump file) and the ability the do binary
> search with cutoff.
>
> Ok for trunk after build and test?

+  else if (dump_enabled_p ())
+   {
+ location_t loc = gimple_location (ie->call_stmt);
+ dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+  "Discovered direct call to non-function in %s, "

diagnostics start with lower-case.  Why not merge this with the
dump_file case?  The point of all the infrastructure was to _not_
need to distinguish the cases ...

(similar for the other cases, and IIRC you miss one case in
tree-ssa-pre.c calling ipa_intraprocedural_devirtualization?)

Thanks,
Richard.


> thanks,
>
> David


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:56 PM,   wrote:
>
>
>> On May 16, 2014, at 3:48 AM, Richard Biener  
>> wrote:
>>
>> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
>>  wrote:
>>> Ping?
>>
>> Sorry ...
>>
>>> Best regards,
>>>
>>> Thomas Preud'homme
>>>
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current 
 setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  

  PR tree-optimization/54733
  * expr.c (get_inner_reference): Add a parameter to control whether
 a
  MEM_REF should be split into base + offset.
  * tree.h (get_inner_reference): Default new parameter to false.
  * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host endianness.
  (execute_optimize_bswap): Likewise. Also move its leading
 comment back
  in place and split statement transformation into ...
  (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  

  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?
>>
>> Ok, I now decided otherwise and dislike the new parameter to
>> get_inner_reference.  Can you please revert that part and just
>> deal with a MEM_REF result in your only caller?
>>
>> And (of course) I found another possible issue.  The way you
>> compute load_type and use it here:
>>
>> +  /* Perform the load.  */
>> +  load_offset_ptr = build_int_cst (n->alias_set, 0);
>> +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
>> + load_offset_ptr);
>>
>> makes the load always appear aligned according to the mode of
>> load_type.  On strict-alignment targets this may cause faults.
>>
>> So what you have to do is either (simpler)
>>
>>   unsigned int align = get_pointer_alignment (addr_tmp);
>>   tree al_load_type = load_type;
>>   if (align < TYPE_ALIGN (load_type))
>> al_load_type = build_aligned_type (load_type, align);
>> ...
>>val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
>> load_offset_ptr);
>>
>> or keep track of the "first" actual load and use
>>
>>   unsigned int align = get_object_alignment (that_first_load);
>>
>> "first" in the one that corresponds to addr_tmp.  From that there
>> is a much better chance to derive good alignment values.
>>
>> Of course on STRICT_ALIGNMENT targets a not aligned load
>> will be decomposed again, so eventually doing the transformation
>> may no longer be profitable(?).
>
> Not always decomposed.  On MIPS, it should use the load/store left/right
> instructions for unaligned load/stores, which is normally better than
> decomposed load/stores.  So having a cost model would be nice.

Agreed, but I am happy with doing that as a followup.  Btw,
a very simple one would be to reject unaligned
SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
[of course that may be true on MIPS even for the cases where
a "reasonable" fast unaligned variant exists - nearly no target
defines that macro in a too fancy way]

Richard.

> Thanks,
> Andrew
>
>>
>> Thanks and sorry again for the delay.
>>
>> Otherwise the patch looks good to me.
>>
>> Richard.
>>
 Best regards,

 Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Monday, May 05, 2014 7:30 PM
> To: GCC Patches
> Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
> load/store
>
> I found a way to improve the function find_bswap/find_bswap_or_nop
> and reduce its size. Please hold for the review, I will post an updated
> version as soon as I finish testing.
>
> Best regards,

Re: [PATCH, libstdc++/61166] overflow when parse number in std::duration operator""

2014-05-16 Thread Jonathan Wakely

On 15/05/14 22:54 -0400, Ed Smith-Rowland wrote:

On 05/15/2014 03:03 PM, Jonathan Wakely wrote:

Here's a finished patch to simplify <chrono>.

Tested x86_64-linux. Ed, any objection to this version?


This looks great, thanks!


I committed that to trunk, I'll put it on the 4.9 branch too.

Having done that, should we actually stop using it as suggested in the
bug trail? ;-)


I was going to do that, then realised that there's a defect in the
standard where it requires overflow in duration integer literals to be
diagnosed. That's only possible with literal operator templates, so I
think we should keep your _Parse_int code, but apply the attached
change to detect overflow.

As the TODO comment says, it should be sufficient to simply
instantiate integral_constant<_Rep, _Val::value> to give a diagnostic
when _Rep{_Val::value} is narrowing, but GCC only gives a warning
for it, and that's suppressed in a system header, so I do an explicit
static_assert.  That could be replaced with ...

#pragma GCC diagnostic push
#pragma GCC diagnostic error "-Woverflow"
#pragma GCC diagnostic error "-Wsystem-headers"
   template<typename _Dur, char... _Digits>
 constexpr _Dur __check_overflow()
 {
   using _Val = __parse_int::_Parse_int<_Digits...>;
   using _Rep = typename _Dur::rep;
   return _Dur{integral_constant<_Rep, _Val::value>::value};
 }
#pragma GCC diagnostic pop

... but I have plans to do that sort of thing more widely, which I'll
deal with another time as part of https://gcc.gnu.org/PR50871 and/or
https://gcc.gnu.org/PR58876 (what do other people think about using
diagnostic pragmas to locally re-enable diagnostics in our headers?)

Tested x86_64-linux, committed to trunk.
commit 361c9b79e0b1c7f2435f5b0b369a139b216dee90
Author: Jonathan Wakely 
Date:   Fri May 16 10:31:38 2014 +0100

	* include/bits/parse_numbers.h (__parse_int::_Number_help): Check for
	overflow.
	* include/std/chrono (chrono_literals::__select_type::_Select_type):
	Remove.
	(chrono_literals::_Checked_integral_constant): Define.
	Simplify UDL operator templates and check for overflow.
	* testsuite/20_util/duration/literals/range.cc: New.

diff --git a/libstdc++-v3/include/bits/parse_numbers.h b/libstdc++-v3/include/bits/parse_numbers.h
index 0a42381a..a29d127 100644
--- a/libstdc++-v3/include/bits/parse_numbers.h
+++ b/libstdc++-v3/include/bits/parse_numbers.h
@@ -193,6 +193,7 @@ namespace __parse_int
   _Pow / (_Base * __valid_digit::value),
   _Digs...>;
   using type = __ull_constant<_Pow * __digit::value + __next::type::value>;
+  static_assert((type::value / _Pow) == __digit::value, "overflow");
 };
 
   template
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index b114e02..39ad5e3 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -787,117 +787,79 @@ _GLIBCXX_END_NAMESPACE_VERSION
   inline namespace chrono_literals
   {
 
-namespace __select_type
-{
-
-  using namespace __parse_int;
-
-      template<unsigned long long _Val, typename _Dur>
-	struct _Select_type
-	: conditional<
-	_Val <= static_cast<unsigned long long>
-		  (numeric_limits<typename _Dur::rep>::max()),
-	_Dur, void>
-	{
-	  static constexpr typename _Select_type::type
-	value{static_cast<typename _Select_type::type>(_Val)};
-	};
-
-      template<unsigned long long _Val, typename _Dur>
-	constexpr typename _Select_type<_Val, _Dur>::type
-	_Select_type<_Val, _Dur>::value;
+    template<typename _Rep, unsigned long long _Val>
+  struct _Checked_integral_constant
+  : integral_constant<_Rep, static_cast<_Rep>(_Val)>
+  {
+	static_assert(_Checked_integral_constant::value > 0
+		  && _Checked_integral_constant::value == _Val,
+		  "literal value cannot be represented by duration type");
+  };
 
-} // __select_type
+    template<typename _Dur, char... _Digits>
+  constexpr _Dur __check_overflow()
+  {
+	using _Val = __parse_int::_Parse_int<_Digits...>;
+	using _Rep = typename _Dur::rep;
+	// TODO: should be simply integral_constant<_Rep, _Val::value>
+	// but GCC doesn't reject narrowing conversions to _Rep.
+	using _CheckedVal = _Checked_integral_constant<_Rep, _Val::value>;
+	return _Dur{_CheckedVal::value};
+  }
 
 constexpr chrono::duration<long double, ratio<3600,1>>
 operator""h(long double __hours)
 { return chrono::duration<long double, ratio<3600,1>>{__hours}; }
 
template <char... _Digits>
-  constexpr typename
-  __select_type::_Select_type<__select_int::_Select_int<_Digits...>::value,
-			 chrono::hours>::type
+  constexpr chrono::hours
   operator""h()
-  {
-	return __select_type::_Select_type<
-			  __select_int::_Select_int<_Digits...>::value,
-			  chrono::hours>::value;
-  }
+  { return __check_overflow(); }
 
 constexpr chrono::duration<long double, ratio<60,1>>
 operator""min(long double __mins)
 { return chrono::duration<long double, ratio<60,1>>{__mins}; }
 
template <char... _Digits>
-  constexpr typename
-  __select_type::_Select_type<__select_int::_Select_int<_Digits...>::value,
-			 chrono::minutes>::type
+  constexpr chrono::minutes
   operator""min()
-  {
-	return __select_type::_Select_type<
-			  __select_int::_Select_int<_Digits...>::value,
-		

Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 1:03 PM, Richard Biener
 wrote:
> On Fri, May 16, 2014 at 12:56 PM,   wrote:
>>
>>
>>> On May 16, 2014, at 3:48 AM, Richard Biener  
>>> wrote:
>>>
>>> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
>>>  wrote:
 Ping?
>>>
>>> Sorry ...
>>>
 Best regards,

 Thomas Preud'homme

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Friday, May 09, 2014 6:26 PM
> To: GCC Patches
> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
>
> Sorry, took longer than expected as I got distracted by some other patch.
> I merged the whole patchset in a single patch as I was told the current 
> setup
> is actually more difficult to read.
>
> Here are the updated ChangeLogs:
>
> *** gcc/ChangeLog ***
>
> 2014-05-09  Thomas Preud'homme  
>
>  PR tree-optimization/54733
>  * expr.c (get_inner_reference): Add a parameter to control whether
> a
>  MEM_REF should be split into base + offset.
>  * tree.h (get_inner_reference): Default new parameter to false.
>  * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
>  (CMPNOP): Define.
>  (find_bswap_or_nop_load): New.
>  (find_bswap_1): Renamed to ...
>  (find_bswap_or_nop_1): This. Also add support for memory source.
>  (find_bswap): Renamed to ...
>  (find_bswap_or_nop): This. Also add support for memory source and
>  detection of bitwise operations equivalent to load in host 
> endianness.
>  (execute_optimize_bswap): Likewise. Also move its leading
> comment back
>  in place and split statement transformation into ...
>  (bswap_replace): This. Add assert when updating bswap_stats.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2014-05-09  Thomas Preud'homme  
>
>  PR tree-optimization/54733
>  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
> bswap
>  optimization to support memory sources and bitwise operations
>  equivalent to load in host endianness.
>  * gcc.dg/optimize-bswaphi-1.c: Likewise.
>  * gcc.dg/optimize-bswapsi-2.c: Likewise.
>  * gcc.c-torture/execute/bswap-2.c: Likewise.
>
> Ok for trunk?
>>>
>>> Ok, I now decided otherwise and dislike the new parameter to
>>> get_inner_reference.  Can you please revert that part and just
>>> deal with a MEM_REF result in your only caller?
>>>
>>> And (of course) I found another possible issue.  The way you
>>> compute load_type and use it here:
>>>
>>> +  /* Perform the load.  */
>>> +  load_offset_ptr = build_int_cst (n->alias_set, 0);
>>> +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
>>> + load_offset_ptr);
>>>
>>> makes the load always appear aligned according to the mode of
>>> load_type.  On strict-alignment targets this may cause faults.
>>>
>>> So what you have to do is either (simpler)
>>>
>>>   unsigned int align = get_pointer_alignment (addr_tmp);
>>>   tree al_load_type = load_type;
>>>   if (align < TYPE_ALIGN (load_type))
>>> al_load_type = build_aligned_type (load_type, align);
>>> ...
>>>val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
>>> load_offset_ptr);
>>>
>>> or keep track of the "first" actual load and use
>>>
>>>   unsigned int align = get_object_alignment (that_first_load);
>>>
>>> "first" in the one that corresponds to addr_tmp.  From that there
>>> is a much better chance to derive good alignment values.
>>>
>>> Of course on STRICT_ALIGNMENT targets a not aligned load
>>> will be decomposed again, so eventually doing the transformation
>>> may no longer be profitable(?).
>>
>> Not always decomposed.  On MIPS, it should use the load/store left/right
>> instructions for unaligned load/stores, which is normally better than
>> decomposed load/stores.  So having a cost model would be nice.
>
> Agreed, but I am happy with doing that as a followup.  Btw,
> a very simple one would be to reject unaligned
> SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
> [of course that may be true on MIPS even for the cases where
> a "reasonable" fast unaligned variant exists - nearly no target
> defines that macro in a too fancy way]

Oh, and what happens for

unsigned foo (unsigned char *x)
{
  return x[0] << 24 | x[2] << 8 | x[3];
}

?  We could do an unsigned int load from x and zero byte 3
with an AND.  Enhancement for a followup, similar to also
considering vector types for the load (also I'm not sure
that uint64_type_node always has non-BLKmode for all
targets).
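The suggested transform can be sketched at the source level (an illustration, not compiler output): the scattered byte ORs equal one 32-bit load whose unused byte is zeroed with an AND, plus a byte swap on little-endian hosts because the expression composes the bytes MSB-first.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The pattern from the mail: bytes 0, 2 and 3 composed MSB-first,
// byte 1 left out.
static uint32_t foo_bytes (const unsigned char *x)
{
  return (uint32_t) x[0] << 24 | (uint32_t) x[2] << 8 | x[3];
}

static bool host_is_little_endian (void)
{
  const uint16_t one = 1;
  unsigned char c;
  std::memcpy (&c, &one, 1);
  return c == 1;
}

// Equivalent form: a single 32-bit load, then mask off the unused
// byte with an AND (after byte-swapping on little-endian hosts).
static uint32_t foo_word (const unsigned char *x)
{
  uint32_t w;
  std::memcpy (&w, x, sizeof w);   // the single 32-bit load
  if (host_is_little_endian ())
    w = __builtin_bswap32 (w);     // bytes now x[0] x[1] x[2] x[3], MSB first
  return w & 0xff00ffffu;          // the AND zeroes the unused x[1] byte
}
```

(`__builtin_bswap32` is the GCC built-in; the identity holds on either endianness.)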

Richard.

> Richard.
>
>> Thanks,
>> Andrew
>>
>>>
>>> Thanks and sorry again for the delay.
>>>
>>> Otherwise the patch looks good to me.
>>>
>>> Richard.
>>>
> Best regards,
>

[PATCH] Fix one testcase from PR61194

2014-05-16 Thread Richard Biener

This fixes the non-regression testcase from PR61194 by also
recognizing COND_EXPRs as sink for bool expressions in vectorizer
pattern detection.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-05-16  Richard Biener  

PR tree-optimization/61194
* tree-vect-patterns.c (adjust_bool_pattern): Also handle
bool patterns ending in a COND_EXPR.

* gcc.dg/vect/pr61194.c: New testcase.

Index: gcc/tree-vect-patterns.c
===
*** gcc/tree-vect-patterns.c(revision 210492)
--- gcc/tree-vect-patterns.c(working copy)
*** adjust_bool_pattern (tree var, tree out_
*** 2889,2895 
   S5  e_b = c_b | d_b;
   S6  f_T = (TYPE) e_b;
  
!where type 'TYPE' is an integral type.
  
 Input:
  
--- 2889,2900 
   S5  e_b = c_b | d_b;
   S6  f_T = (TYPE) e_b;
  
!where type 'TYPE' is an integral type.  Or a similar pattern
!ending in
! 
!  S6  f_Y = e_b ? r_Y : s_Y;
! 
!as results from if-conversion of a complex condition.
  
 Input:
  
*** vect_recog_bool_pattern (vec *st
*** 2966,2971 
--- 2971,3015 
*type_out = vectype;
*type_in = vectype;
stmts->safe_push (last_stmt);
+   if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+  "vect_recog_bool_pattern: detected:\n");
+ 
+   return pattern_stmt;
+ }
+   else if (rhs_code == COND_EXPR
+  && TREE_CODE (var) == SSA_NAME)
+ {
+   vectype = get_vectype_for_scalar_type (TREE_TYPE (lhs));
+   if (vectype == NULL_TREE)
+   return NULL;
+ 
+   /* Build a scalar type for the boolean result that when
+  vectorized matches the vector type of the result in
+size and number of elements.  */
+   unsigned prec
+   = wi::udiv_trunc (TYPE_SIZE (vectype),
+ TYPE_VECTOR_SUBPARTS (vectype)).to_uhwi ();
+   tree type
+   = build_nonstandard_integer_type (prec,
+ TYPE_UNSIGNED (TREE_TYPE (var)));
+   if (get_vectype_for_scalar_type (type) == NULL_TREE)
+   return NULL;
+ 
+   if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
+   return NULL;
+ 
+   rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
+   lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+   pattern_stmt 
+ = gimple_build_assign_with_ops (COND_EXPR, lhs,
+ build2 (NE_EXPR, boolean_type_node,
+ rhs, build_int_cst (type, 0)),
+ gimple_assign_rhs2 (last_stmt),
+ gimple_assign_rhs3 (last_stmt));
+   *type_out = vectype;
+   *type_in = vectype;
+   stmts->safe_push (last_stmt);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
   "vect_recog_bool_pattern: detected:\n");
Index: gcc/testsuite/gcc.dg/vect/pr61194.c
===
*** gcc/testsuite/gcc.dg/vect/pr61194.c (revision 0)
--- gcc/testsuite/gcc.dg/vect/pr61194.c (working copy)
***
*** 0 
--- 1,43 
+ /* { dg-require-effective-target vect_cond_mixed } */
+ /* { dg-additional-options "-ftree-loop-if-convert-stores" } */
+ 
+ #include "tree-vect.h"
+ 
+ static float x[1024];
+ static float y[1024];
+ static float z[1024];
+ static float w[1024];
+ 
+ void __attribute__((noinline,noclone)) barX()
+ {
+   int i;
+   for (i=0; i<1024; ++i)
+ z[i] = ((x[i]>0) & (w[i]<0)) ? z[i] : y[i];
+ }
+ 
+ int main()
+ {
+   int i;
+ 
+   check_vect ();
+ 
+   for (i = 0; i < 1024; ++i)
+ {
+   x[i] = -10 + i;
+   w[i] = 100 - i;
+   z[i] = 0.;
+   y[i] = 1.;
+   __asm__ volatile ("" : : : "memory");
+ }
+ 
+   barX();
+ 
+   for (i = 0; i < 1024; ++i)
+ if (z[i] != ((x[i]>0 && w[i]<0) ? 0. : 1.))
+   abort ();
+ 
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */


PR 61084: Use widest_int in sparc_fold_builtin

2014-05-16 Thread Richard Sandiford
This is the second part of PR 61084, which it seems I'd forgotten to post.
pdist calculates a wide result from narrower inputs, so I should have
used widest_int rather than wide_int.

Tested by Rainer on SPARC Solaris.  OK to install?

Thanks,
Richard


gcc/
PR target/61084
* config/sparc/sparc.c (sparc_fold_builtin): Use widest_int rather
than wide_int.

Index: gcc/config/sparc/sparc.c
===
--- gcc/config/sparc/sparc.c2014-05-15 13:49:25.425654068 +0100
+++ gcc/config/sparc/sparc.c2014-05-16 12:56:48.578349758 +0100
@@ -10915,8 +10915,8 @@ sparc_fold_builtin (tree fndecl, int n_a
  && TREE_CODE (arg2) == INTEGER_CST)
{
  bool overflow = false;
- wide_int result = arg2;
- wide_int tmp;
+ widest_int result = wi::to_widest (arg2);
+ widest_int tmp;
  unsigned i;
 
  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
@@ -10926,8 +10926,8 @@ sparc_fold_builtin (tree fndecl, int n_a
 
  bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;
 
- tmp = wi::neg (e1, &neg1_ovf);
- tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
+ tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
+ tmp = wi::add (wi::to_widest (e0), tmp, SIGNED, &add1_ovf);
  if (wi::neg_p (tmp))
tmp = wi::neg (tmp, &neg2_ovf);
  else
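For reference, the pdist operation being folded is a sum of absolute byte differences accumulated into a 64-bit value; a sketch of those semantics (illustrative, not the GCC folder itself):

```cpp
#include <cassert>
#include <cstdint>

// pdist semantics sketch: the absolute differences of eight unsigned
// byte lanes are added to a 64-bit accumulator, so the result is
// wider than the vector inputs -- which is why the fold wants
// widest_int rather than a wide_int fixed at the input precision.
static uint64_t pdist_ref (const uint8_t a[8], const uint8_t b[8],
			   uint64_t acc)
{
  for (int i = 0; i < 8; ++i)
    acc += a[i] > b[i] ? (unsigned) (a[i] - b[i]) : (unsigned) (b[i] - a[i]);
  return acc;
}
```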



Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

2014-05-16 Thread Richard Earnshaw
On 08/05/14 18:36, Ian Bolton wrote:
> Hi,
> 
> It currently takes 4 instructions to generate certain immediates on
> AArch64 (unless we put them in the constant pool).
> 
> For example ...
> 
>   long long
>   ffffbeefcafebabe ()
>   {
> return 0xFFFFBEEFCAFEBABEll;
>   }
> 
> leads to ...
> 
>   mov x0, 0x47806
>   mov x0, 0xcafe, lsl 16
>   mov x0, 0xbeef, lsl 32
>   orr x0, x0, -281474976710656
> 
> The above case is tackled in this patch by employing MOVN
> to generate the top 32-bits in a single instruction ...
> 
>   mov x0, -71536975282177
>   movk x0, 0xcafe, lsl 16
>   movk x0, 0xbabe, lsl 0
> 
> (Note that where at least two half-words are 0xffff, existing
> code that does the immediate in two instructions is still used.)
> 
> Tested on standard gcc regressions and the attached test case.
> 
> OK for commit?

What about:

long long a()
{
  return 0x1234ffff56789abcll;
}

long long b()
{
  return 0x12345678ffff9abcll;
}

long long c()
{
  return 0x123456789abcffffll;
}

?

Surely these can also benefit from this sort of optimization, but it
looks as though you only handle the top 16 bits being set.

R.

> 
> Cheers,
> Ian
> 
> 
> 2014-05-08  Ian Bolton  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
>   Use MOVN when top-most half-word (and only that half-word)
>   is 0xffff.
> gcc/testsuite/
>   * gcc.target/aarch64/movn_1.c: New test.
> 
> 
> aarch64-movn-exploitation-patch-v5.txt
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 43a83566..a8e504e 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
>   }
>  }
>  
> +  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
> +  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
> +    {
> +      emit_insn (gen_rtx_SET (VOIDmode, dest,
> +			      GEN_INT (~ (~val & (0xffffll << 32)))));
> +      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
> +				 GEN_INT ((val >> 16) & 0xffff)));
> +      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
> +				 GEN_INT (val & 0xffff)));
> +      return;
> +    }
> +
>   simple_sequence:
>   first = true;
>   mask = 0xffff;
> diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c 
> b/gcc/testsuite/gcc.target/aarch64/movn_1.c
> new file mode 100644
> index 000..cc11ade
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-inline --save-temps" } */
> +
> +extern void abort (void);
> +
> +long long
> +foo ()
> +{
> +  /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */
> +  return 0xffffbeefcafebabell;
> +}
> +
> +long long
> +merge4 (int a, int b, int c, int d)
> +{
> +  return ((long long) a << 48 | (long long) b << 32
> +   | (long long) c << 16 | (long long) d);
> +}
> +
> +int main ()
> +{
> +  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
> +abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-saved-temps } } */
> 
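As a sanity check of the three-instruction sequence quoted above, the MOVN/MOVK semantics can be emulated (an illustration, not GCC code): MOVN writes the bitwise NOT of a shifted 16-bit immediate, and MOVK inserts a 16-bit immediate while keeping the other bits.

```cpp
#include <cassert>
#include <cstdint>

// MOVN: value becomes ~(imm16 << shift).
static uint64_t movn (uint64_t imm16, unsigned shift)
{
  return ~(imm16 << shift);
}

// MOVK: the 16-bit field at 'shift' is replaced, all other bits kept.
static uint64_t movk (uint64_t x, uint64_t imm16, unsigned shift)
{
  return (x & ~(0xffffull << shift)) | (imm16 << shift);
}
```

With immediate 0x4110 at shift 32, MOVN yields 0xffffbeefffffffff, the -71536975282177 in the sequence; the two MOVKs then fill in 0xcafe and 0xbabe.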




Re: [PATCH (for next stage 1)] Add return type to gimple function dumps

2014-05-16 Thread Richard Biener
On Tue, Apr 29, 2014 at 5:01 PM, David Malcolm  wrote:
> On Tue, 2014-04-29 at 11:16 +0200, Richard Biener wrote:
>> On Tue, Apr 29, 2014 at 2:58 AM, David Malcolm  wrote:
>> > On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote:
>> >> On 03/10/14 13:22, David Malcolm wrote:
>> >> > Gimple function dumps contain the types of parameters, but not of the
>> >> > return type.
>> >> >
>> >> > The attached patch fixes this omission; here's an example of the
>> >> > before/after diff:
>> >> > $ diff -up /tmp/pr23401.c.004t.gimple.old /tmp/pr23401.c.004t.gimple.new
>> >> > --- /tmp/pr23401.c.004t.gimple.old  2014-03-10 13:40:08.972063541 
>> >> > -0400
>> >> > +++ /tmp/pr23401.c.004t.gimple.new  2014-03-10 13:39:49.346515464 
>> >> > -0400
>> >> > @@ -1,3 +1,4 @@
>> >> > +int
>> >> >    (int i)
>> >> >   {
>> >> > int D.1731;
>> >> >
>> >> >
>> >> > Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20).
>> >> >
>> >> > A couple of test cases needed tweaking, since they were counting the
>> >> > number of occurrences of "int" in the gimple dump, which thus changed
>> >> > for functions returning int (like the one above).
>> >> >
>> >> > OK for next stage 1?
>> >> Conceptually OK.  As Richi notes, the work here is in fixing up the
>> >> testsuite.  I didn't see a reply to Richi's question, particularly WRT
>> >> the Fortran testsuite.
>> >
>> > I'm attaching a revised version of the patch which adds the use of
>> > TDF_SLIM (though it didn't appear to be necessary in the test I did of a
>> > function returning a struct).
>> >
>> > Successfully bootstrapped & regrtested on x86_64 Linux (Fedora 20),
>> > using:
>> >   --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
>> >
>> > I didn't see any new failures from this in the testsuite, in particular
>> > gfortran.sum.  Here's a comparison of the before/after test results,
>> > generated using my "jamais-vu" tool [1], with comments added by me
>> > inline:
>> >
>> > Comparing 16 common .sum files
>> > --
>> >
>> >  gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320
>> >  gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 
>> > UNSUPPORTED: 3004
>> >  gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 
>> > 265 XPASS: 33 UNSUPPORTED: 1823
>> >  gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 52 
>> > UNSUPPORTED: 65
>> >  gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 
>> > UNSUPPORTED: 3
>> >  gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 
>> > UNSUPPORTED: 1
>> >  gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 
>> > UNSUPPORTED: 86
>> >  gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 
>> > UNSUPPORTED: 74
>> >  x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 
>> > PASS: 12 UNSUPPORTED: 1
>> >  x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 
>> > PASS: 54
>> >  x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 PASS: 
>> > 1801 UNSUPPORTED: 55
>> >  x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122
>> >  x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 
>> > PASS: 2420
>> >  x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 26 
>> > XFAIL: 3 UNSUPPORTED: 1
>> >  x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 
>> > PASS: 2582 XFAIL: 4
>> >  x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 
>> > 10265 PASS: 1 XFAIL: 41 UNSUPPORTED: 224
>> >
>> > (...i.e. the totals were unchanged between unpatched/patched for all of
>> > the .sum files; and yes, Fortran was tested.  Should there be a
>> > gcj.sum?)
>> >
>> > Tests that went away in gcc/testsuite/gcc/gcc.sum: 2
>> > 
>> >
>> >  PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple "int" 5
>> >  PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple "int" 3
>> >
>> > Tests appeared in gcc/testsuite/gcc/gcc.sum: 2
>> > --
>> >
>> >  PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple "int" 6
>> >  PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple "int" 4
>> >
>> >
>> > (...my comparison tool isn't smart enough yet to tie these "went
>> > away"/"appeared" results together; they reflect the fixups from the
>> > patch).
>> >
>> > Tests that went away in gcc/testsuite/go/go.sum: 2
>> > --
>> >
>> >  PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
>> > build) compilation,  -O2 -g
>> >  PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
>> > build) execution,  -O2 -g
>> >
>> > Tests appeared in gcc/testsuite/go/go.sum: 2
>> > 
>> >
>> >  PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
>

Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Evgeny Stupachenko
The test uses SSSE3 because of the following restriction in i386.c:

static bool
expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
{
  rtx rperm[2][16], vperm, l, h, op, m128;
  unsigned int i, nelt, eltsz;

  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
return false;

Is the following fix OK?

2014-05-16  Evgeny Stupachenko  

   * gcc.dg/vect/pr52252-ld.c: Fix target for the test.


diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
index 6e3cb52..301433b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
+++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -g -ftree-vectorize -mssse3
-fdump-tree-vect-details" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details" } */
+/* { dg-skip-if "why" { ! { x86_64-*-* i?86-*-* } } } */

 #define byte unsigned char

On Tue, May 13, 2014 at 12:21 PM, Richard Biener  wrote:
> On Mon, 12 May 2014, Evgeny Stupachenko wrote:
>
>> The test is on general changes. However I was able to test it on x86 only.
>> I see 2 possible solutions:
>> 1. Set the test for x86 only.
>> 2. Modify it so that it will pass on sparc-sun-solaris2.
>>
>> If 2. is not acceptable I'll create patch for 1.
>> Currently I don't see why "in0_9 = *in_27" is not supported. Does the
>> test fail because of unsupported permutation?
>
> The test uses
>
> /* { dg-options "-O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details"
> { target { i?86-*-* x86_64-*-* } } } */
>
> that's bogus.  You shouldn't add any dg-options.  Instead use proper
> dg-effective-target checks for the mssse3 feature you are using.
> Note that the dg-final checking is applied regardless of whether the
> options above are applied or not.
>
> Why does the test only succeed with -mssse3 btw?
>
> The proper way to restrict the test to a single target is to use
>
> /* { dg-skip-if "why" { ! { x86_64-*-* i?86-*-* } } } */
>
> Sorry for not catching this in the review.
>
> Richard.
>
>>
>> On Mon, May 12, 2014 at 7:14 PM, Rainer Orth
>>  wrote:
>> > Evgeny Stupachenko  writes:
>> >
>> >> Patch with fixes attached.
>> >> Currently if-structure is as following:
>> >> +  if (count == 3)
>> >> ...
>> >> +  else
>> >> +   {
>> >> + /* If length is not equal to 3 then only power of 2 is 
>> >> supported.  */
>> >> + gcc_assert (exact_log2 (count) != -1);
>> >>
>> >> For stores group I've created another mail thread.
>> > [...]
>>  2014-05-06  Evgeny Stupachenko  
>> 
>> PR tree-optimization/52252
>> * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.
>> >
>> > This test FAILs on sparc-sun-solaris2.11, both 32 and 64-bit:
>> >
>> > FAIL: gcc.dg/vect/pr52252-ld.c scan-tree-dump-times vect "vectorized 1 
>> > loops" 1
>> > FAIL: gcc.dg/vect/pr52252-ld.c -flto -ffat-lto-objects  
>> > scan-tree-dump-times vect "vectorized 1 loops" 1
>> >
>> > The dumps have
>> >
>> > /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/pr52252-ld.c:10:3: 
>> > note: not vectorized: relevant stmt not supported: in0_9 = *in_27;
>> > /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/pr52252-ld.c:7:1: 
>> > note: vectorized 0 loops in function.
>> >
>> > Rainer
>> >
>> > --
>> > -
>> > Rainer Orth, Center for Biotechnology, Bielefeld University
>>
>>
>
> --
> Richard Biener 
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Rainer Orth
Hi Evgeny,

> Is the following fix OK?
>
> 2014-05-16  Evgeny Stupachenko  
>
>* gcc.dg/vect/pr52252-ld.c: Fix target for the test.
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> index 6e3cb52..301433b 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -g -ftree-vectorize -mssse3
> -fdump-tree-vect-details" { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-options "-O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details" } 
> */
> +/* { dg-skip-if "why" { ! { x86_64-*-* i?86-*-* } } } */

If the test is really x86 specific, move it to gcc.target/i386 and
remove the dg-skip-if.  Otherwise, add an explanation for skipping the
test on other targets to the first arg of dg-skip-if.  This is supposed
to be a comment stating why the test is skipped, not "why" literally.

Approval or rejection of the testcase is up to the target maintainers.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-05-16 Thread Ian Lance Taylor
On Fri, May 16, 2014 at 1:03 AM, Svante Signell
 wrote:
>
> As for that part of the patch: without it, the build on GNU/Hurd fails. On
> the other hand SYS_FCNTL is not defined for e.g. GNU/Linux either. This
> is used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go:
> func dupCloseOnExec(fd int) (newfd int, err error) {
> if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC!=0 {
> r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd),
> syscall.F_DUPFD_CLOEXEC, 0)
>
> It is not yet clear how the build succeeds on Linux without SYS_FCNTL
> being defined. Maybe the conditions above are not met.

On GNU/Linux SYS_FCNTL is defined by the generated file sysinfo.go,
because SYS_fcntl is defined by .

Ian


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Jakub Jelinek
On Fri, May 16, 2014 at 03:11:05PM +0200, Rainer Orth wrote:
> Hi Evgeny,
> 
> > Is the following fix OK?
> >
> > 2014-05-16  Evgeny Stupachenko  
> >
> >* gcc.dg/vect/pr52252-ld.c: Fix target for the test.
> >
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> > b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> > index 6e3cb52..301433b 100644
> > --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> > +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -g -ftree-vectorize -mssse3
> > -fdump-tree-vect-details" { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-options "-O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details" 
> > } */
> > +/* { dg-skip-if "why" { ! { x86_64-*-* i?86-*-* } } } */
> 
> If the test is really x86 specific, move it to gcc.target/i386 and
> remove the dg-skip-if.  Otherwise, add an explanation for skipping the
> test on other targets to the first arg of dg-skip-if.  This is supposed
> to be a comment stating why the test is skipped, not "why" literally.

Well, I don't see anything i?86/x86_64 specific about the test.  What
is specific is the -mssse3, which supposedly should be added through
/* { dg-additional-options "-mssse3" { target { i?86-*-* x86_64-*-* } } } */
and then perhaps the test might not necessarily be vectorized (so the
dg-final line may need a target guard as well).
But, I see no reason not to try to compile this on other targets.
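
Spelled out, that suggestion might look like the sketch below.  The
directive set is only a proposal (untested beyond x86), and the kernel is
merely in the spirit of pr52252-ld.c's loads-in-groups-of-three loop, not
its actual body:

```c
/* Sketch of the revised header Jakub describes: compile everywhere, add
   -mssse3 only on x86, and guard the vectorisation scan the same way.
   The kernel is illustrative, not the actual pr52252-ld.c body.  */
/* { dg-do compile } */
/* { dg-options "-O2 -g -ftree-vectorize -fdump-tree-vect-details" } */
/* { dg-additional-options "-mssse3" { target { i?86-*-* x86_64-*-* } } } */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { i?86-*-* x86_64-*-* } } } } */

#define byte unsigned char

/* A loads-group-of-3 kernel: each output element consumes three
   adjacent input elements.  */
void
sum3 (const byte *in, byte *out, int n)
{
  int i;
  for (i = 0; i < n; i++)
    out[i] = in[3 * i] + in[3 * i + 1] + in[3 * i + 2];
}
```

On non-x86 targets the file would then still be compiled, just without
-mssse3 and without the dump scan.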

Jakub


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Evgeny Stupachenko
I'm not sure about other architectures. I can test it on x86. Most
likely it will pass on Arm, but again I'm not sure that { arm*-*-* }
is ok.

The test covers generic changes. So we can wait for others, and if there
are no more objections leave sparc-sun-solaris2 as the target to skip.

Or change to the following:
/* { dg-skip-if "The test should pass on x86, other architectures are
untested" { ! { x86_64-*-* i?86-*-* } } } */

So that others will add their targets if necessary.

Thanks,
Evgeny


Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.

2014-05-16 Thread Rainer Orth
Evgeny Stupachenko  writes:

> The test covers generic changes. So we can wait for others, and if there
> are no more objections leave sparc-sun-solaris2 as the target to skip.

If so, use sparc*-*-* instead.

> Or change to the following:
> /* { dg-skip-if "The test should pass on x86, other architectures are
> untested" { ! { x86_64-*-* i?86-*-* } } } */
>
> So that others will add their targets if necessary.

I wouldn't do it this way, because this will never happen.  Rather,
start with all targets and skip or xfail if necessary, adding an
explanation why this is necessary.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-05-16 Thread Svante Signell
On Fri, 2014-05-16 at 06:20 -0700, Ian Lance Taylor wrote:
> On Fri, May 16, 2014 at 1:03 AM, Svante Signell
>  wrote:
> >
> > As for that part of the patch: without it, the build on GNU/Hurd fails. On
> > the other hand SYS_FCNTL is not defined for e.g. GNU/Linux either. This
> > is used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go:
> > func dupCloseOnExec(fd int) (newfd int, err error) {
> > if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC!=0 {
> > r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd),
> > syscall.F_DUPFD_CLOEXEC, 0)
> >
> > It is not yet clear how the build succeeds on Linux without SYS_FCNTL
> > being defined. Maybe the conditions above are not met.
> 
> On GNU/Linux SYS_FCNTL is defined by the generated file sysinfo.go,
> because SYS_fcntl is defined by .

Thanks, then that part of the patch should read:

# Special treatment of _SYS_fcntl for GNU/Hurd
if ! grep '^const _SYS_fcntl' ${OUT} > /dev/null 2>&1; then
  echo "const SYS_FCNTL = 0" >> ${OUT}
fi

Shall I submit a new patch8.diff (or all patches again)?



Re: [PATCH] [PING^2] Fix for PR libstdc++/60758

2014-05-16 Thread Alexey Merzlyakov

On 07.05.2014 13:28, Ramana Radhakrishnan wrote:

On 05/07/14 09:19, Yury Gribov wrote:

 Original Message 
Subject: [PING] [PATCH] Fix for PR libstdc++/60758
Date: Thu, 17 Apr 2014 17:48:12 +0400
From: Alexey Merzlyakov 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , Viacheslav
Garbuzov , Yury Gribov 

Hi,

This fixes infinite backtrace in __cxa_end_cleanup().
Regtest was finished with no regressions on arm-linux-gnueabi(sf).

The patch posted at:
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html


This is OK to apply if no regressions.

Thanks,
Ramana



Thanks in advance.

Best regards,
Merzlyakov Alexey





I have re-tested it again on arm-linux-gnueabi(sf) - no regressions.
The change was committed to mainline as revision 210515.

Best regards,
Merzlyakov Alexey



Re: avx runtime check

2014-05-16 Thread Mike Stump
On May 15, 2014, at 11:52 PM, Richard Biener  wrote:
> On May 16, 2014 4:47:11 AM CEST, Mike Stump  wrote:
>> This reorders the avx checks and gates on a target triplet check before
>> compiling any code.
> 
> Can you explain why?

Sure, because check_avx_hw_available runs [istarget x86_64-*-*] || [istarget 
i?86-*-*] before doing anything, like compiling:

   typedef double __m512d __attribute__ ((__vector_size__ (64)));

__m512d _mm512_add (__m512d a)
{
  return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
}

with -mavx512f, which my target doesn’t have, but even running a compilation of 
that seems wrong.  The other possibility would be to add in a:

# If this is not the right target then we can skip the test.
if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
    expr 0
} else {

into the test for check_effective_target_avx512f.


proc check_effective_target_avx512f { } {
    return [check_no_compiler_messages avx512f object {
        typedef double __m512d __attribute__ ((__vector_size__ (64)));

        __m512d _mm512_add (__m512d a)
        {
          return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
        }
    } "-O2 -mavx512f" ]
}

proc check_avx_hw_available { } {
    return [check_cached_effective_target avx_hw_available {
        # If this is not the right target then we can skip the test.
        if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
            expr 0
        } else {
            check_runtime_nocache avx_hw_available {
                #include "cpuid.h"

                int main ()
                {
                  unsigned int eax, ebx, ecx, edx;
                  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
                    return ((ecx & (bit_AVX | bit_OSXSAVE))
                            != (bit_AVX | bit_OSXSAVE));
                  return 1;
                }
            } ""
        }
    }]
}


>> diff --git a/gcc/testsuite/lib/target-supports.exp
>> b/gcc/testsuite/lib/target-supports.exp
>> index 40b5414..103a28a 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } {
>> # Return 1 if the target supports running AVX executables, 0 otherwise.
>> 
>> proc check_effective_target_avx_runtime { } {
>> -if { [check_effective_target_avx]
>> -&& [check_avx_hw_available]
>> +if { [check_avx_hw_available]
>> +&& [check_effective_target_avx]
>>&& [check_avx_os_support_available] } {
>>   return 1
>>}


[PATCH][ARM] Vectorize bswap[16,32,64] operations

2014-05-16 Thread Kyrill Tkachov

Hi all,

This is the aarch32 version of 
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00850.html that allows us to 
(auto-)vectorise the __builtin_bswap[16,32,64] functions using the vrev 
instructions.
For that we create some new NEON builtins and get 
arm_builtin_vectorized_function to map to them when asked to vectorise the 
corresponding builtin.


The tests for this were added with the aarch64 patch mentioned above, but were
disabled for arm. This patch enables them now that we support the operations
(and of course they now pass on arm).

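As an illustration of what the new patterns target, a loop of the
following shape is the kind of candidate the vectoriser can now map to
vrev (a sketch; only the scalar semantics of __builtin_bswap32 are
exercised here, and the flags mentioned in the comment are an example):

```c
#include <stdint.h>

/* On arm with suitable NEON flags (e.g. -O3 -mfpu=neon) a loop like
   this becomes a candidate for the new vrev-based bswap vectorisation;
   the function itself is plain portable GNU C.  */
void
bswap_all (uint32_t *x, int n)
{
  int i;
  for (i = 0; i < n; i++)
    x[i] = __builtin_bswap32 (x[i]);
}
```
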

Tested arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf.

Ok for trunk?


Thanks,
Kyrill

2014-05-16  Kyrylo Tkachov  

* config/arm/neon.md (neon_bswap): New pattern.
* config/arm/arm.c (neon_itype): Add NEON_BSWAP.
(arm_init_neon_builtins): Handle NEON_BSWAP.
Define required type nodes.
(arm_expand_neon_builtin): Handle NEON_BSWAP.
(arm_builtin_vectorized_function): Handle BUILTIN_BSWAP builtins.
* config/arm/arm_neon_builtins.def (bswap): Define builtins.
* config/arm/iterators.md (VDQHSD): New mode iterator.

2014-05-16  Kyrylo Tkachov  

* lib/target-supports.exp (check_effective_target_vect_bswap):
Specify arm*-*-* support.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1e44080..6d430a2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23064,6 +23064,7 @@ typedef enum {
   NEON_BINOP,
   NEON_TERNOP,
   NEON_UNOP,
+  NEON_BSWAP,
   NEON_GETLANE,
   NEON_SETLANE,
   NEON_CREATE,
@@ -23531,14 +23532,19 @@ arm_init_neon_builtins (void)
 
   tree V8QI_type_node;
   tree V4HI_type_node;
+  tree V4UHI_type_node;
   tree V4HF_type_node;
   tree V2SI_type_node;
+  tree V2USI_type_node;
   tree V2SF_type_node;
   tree V16QI_type_node;
   tree V8HI_type_node;
+  tree V8UHI_type_node;
   tree V4SI_type_node;
+  tree V4USI_type_node;
   tree V4SF_type_node;
   tree V2DI_type_node;
+  tree V2UDI_type_node;
 
   tree intUQI_type_node;
   tree intUHI_type_node;
@@ -23634,16 +23640,26 @@ arm_init_neon_builtins (void)
   const_intDI_pointer_node = build_pointer_type (const_intDI_node);
   const_float_pointer_node = build_pointer_type (const_float_node);
 
+  /* Unsigned integer types for various mode sizes.  */
+  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
+  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
+  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
+  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
+  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
   /* Now create vector types based on our NEON element types.  */
   /* 64-bit vectors.  */
   V8QI_type_node =
 build_vector_type_for_mode (neon_intQI_type_node, V8QImode);
   V4HI_type_node =
 build_vector_type_for_mode (neon_intHI_type_node, V4HImode);
+  V4UHI_type_node =
+build_vector_type_for_mode (intUHI_type_node, V4HImode);
   V4HF_type_node =
 build_vector_type_for_mode (neon_floatHF_type_node, V4HFmode);
   V2SI_type_node =
 build_vector_type_for_mode (neon_intSI_type_node, V2SImode);
+  V2USI_type_node =
+build_vector_type_for_mode (intUSI_type_node, V2SImode);
   V2SF_type_node =
 build_vector_type_for_mode (neon_float_type_node, V2SFmode);
   /* 128-bit vectors.  */
@@ -23651,19 +23667,18 @@ arm_init_neon_builtins (void)
 build_vector_type_for_mode (neon_intQI_type_node, V16QImode);
   V8HI_type_node =
 build_vector_type_for_mode (neon_intHI_type_node, V8HImode);
+  V8UHI_type_node =
+build_vector_type_for_mode (intUHI_type_node, V8HImode);
   V4SI_type_node =
 build_vector_type_for_mode (neon_intSI_type_node, V4SImode);
+  V4USI_type_node =
+build_vector_type_for_mode (intUSI_type_node, V4SImode);
   V4SF_type_node =
 build_vector_type_for_mode (neon_float_type_node, V4SFmode);
   V2DI_type_node =
 build_vector_type_for_mode (neon_intDI_type_node, V2DImode);
-
-  /* Unsigned integer types for various mode sizes.  */
-  intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode));
-  intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
-  intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
-  intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
-  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
+  V2UDI_type_node =
+build_vector_type_for_mode (intUDI_type_node, V2DImode);
 
 
   (*lang_hooks.types.register_builtin_type) (intUQI_type_node,
@@ -23741,8 +23756,6 @@ arm_init_neon_builtins (void)
 
   if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
   {
-tree V4USI_type_node =
-  build_vector_type_for_mode (intUSI_type_node, V4SImode);
 
 tree V16UQI_type_node =
   build_vector_type_for_mode (intUQI_type_node, V16QImode);
@@ -24106,6 +24119,31 @@ arm_init_neon_builtins (void)
 	ftype = build_function_type_list (return_type, eltype, NULL);
 	break;
 	  }
+	case NEON_BSWAP

Re: [Committed] [PATCH,*/2] shrink wrap a function with a single loop: split live_edge

2014-05-16 Thread Dominique Dhumieres
> One of the two commits breaks several fortran tests in 32 bit mode, ...

If it is not the default, you need to compile the failing tests with
-mtune=core2 or -mtune=corei7.

Dominique


[patch] fix recent libstdc++ test failures

2014-05-16 Thread Jonathan Wakely

Tested x86_64-linux, -m32 and -m64, committed to trunk.

commit c5265e76b75cbfa0744c354f6d5aaf98c327ff8e
Author: Jonathan Wakely 
Date:   Fri May 16 13:26:18 2014 +0100

	* testsuite/20_util/make_unsigned/requirements/typedefs-1.cc: Fix
	test for 32-bit target.
	* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc:
	Likewise.
	* testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc:
	Fix test.

diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc
index a893ede..d9f13aa 100644
--- a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc
+++ b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc
@@ -49,7 +49,7 @@ void test01()
 
 #ifdef _GLIBCXX_USE_WCHAR_T
   typedef make_unsigned::type  	test23_type;
-  static_assert(is_same::value, "");
+  static_assert(is_unsigned::value, "");
 #endif
 
   // Chapter 48, chapter 20. Smallest rank such that new unsigned type
diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc
index 637b0c7..807e2bf 100644
--- a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc
+++ b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc
@@ -30,6 +30,7 @@ void test01()
 {
   using std::make_unsigned;
   using std::is_same;
+  using std::is_unsigned;
 
   // Positive tests.
   typedef make_unsigned::type  	test2_type;
@@ -50,7 +51,7 @@ void test01()
 
 #ifdef _GLIBCXX_USE_WCHAR_T
   typedef make_unsigned::type  	test23_type;
-  static_assert(is_same::value, "");
+  static_assert(is_unsigned::value, "");
 #endif
 
   typedef make_unsigned::type  	  test24_type;
diff --git a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc
index 26fc3d0..dd52a70 100644
--- a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc
+++ b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc
@@ -33,8 +33,7 @@ test01()
   bool test __attribute__((unused)) = true;
 
   const std::tr1::shared_ptr p1(new A);
-  p1.reset(); // { dg-error "discards qualifiers" }
-  // { dg-error "no match" "" { target *-*-* } 36 }
+  p1.reset(); // { dg-error "no match" }
 
   return 0;
 }


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Steven Bosscher
On Fri, May 16, 2014 at 12:51 PM, Richard Biener wrote:
> Btw, the bswap pass enhancements that are currently in review may
> also be an opportunity to catch these.  They can merge adjacent
> loads that are used "composed" (but not yet composed by storing
> into adjacent memory).  The basic-block vectorizer should also
> handle this (if the composition happens to be by storing into
> adjacent memory) - of course it needs vector modes available and
> it has to be enabled.

That won't work for cleaning up spill code load/reloads, which is one
of the motivations for the new pass.

Ciao!
Steven
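
For reference, the "adjacent loads that are used composed" shape Richard
describes can be sketched as follows; this illustrates the pattern such
passes look for, not any pass itself:

```c
#include <stdint.h>

/* Four adjacent byte loads combined ("composed") into one 32-bit
   little-endian value.  The enhanced bswap pass can recognise this and
   merge the loads into a single wider load (plus a bswap on big-endian
   targets).  */
uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}
```
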


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Ramana Radhakrishnan
>
> Btw, the bswap pass enhancements that are currently in review may
> also be an opportunity to catch these.  They can merge adjacent
> loads that are used "composed" (but not yet composed by storing
> into adjacent memory).  The basic-block vectorizer should also
> handle this (if the composition happens to be by storing into
> adjacent memory) - of course it needs vector modes available and
> it has to be enabled.

Should we really do it there?  If we start merging multiple loads and
stores into the vector register set, on some architectures or
microarchitectures the cost of moving to and from the vector register
set to the general purpose register set might be too expensive for
some operations and then we get into all sorts of issues.

I think there is merit in an RTL pass.

Ramana

>
> Richard.
>
>> Thanks,
>> bin
>>>
>>>

 So, any comments about this?

 Thanks,
 bin


 2014-05-15  Bin Cheng  
* common.opt (flag_merge_paired_loadstore): New option.
* merge-paired-loadstore.c: New file.
* Makefile.in: Support new file.
* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
* doc/invoke.texi (-fmerge-paired-loadstore): New.
(max-merge-paired-loadstore-distance): New.
* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
* doc/tm.texi: Regenerated.
* target.def (merge_paired_loadstore): New.
* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
* passes.def (pass_merge_paired_loadstore): New pass.
* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

 gcc/testsuite/ChangeLog
 2014-05-15  Bin Cheng  

* gcc.target/arm/merge-paired-loadstore.c: New test.

 
>>
>>
>>
>> --
>> Best Regards.


[PATCH 0/5] let gdb reuse gcc'c C compiler

2014-05-16 Thread Tom Tromey
Hi!

This patch series is half of a project to let gdb reuse gcc (which
half depends on which list you are seeing this on), so that users can
compile small snippets of code and evaluate them in the current
context of the inferior.

This first series implements this idea for C.  A user can compile a
code snippet and it will be inserted into the inferior and evaluated.
Declarations needed by the snippet are supplied by gdb, and there is a
bit of magic so that the snippets can refer to local variables in the
current frame.

The new command allows for arbitrary code to be inserted -- not just
expressions.  For example:

(gdb) compile code int i; for (i = 0; i < 3; ++i) printf ("#%d\n", i)
#0
#1
#2

This series supplies a gcc plugin to do most of the work, so that any
gcc crashes -- seen during development due to translation bugs -- do
not also crash gdb.

The interface between gdb and gcc is defined by a few files added to
include/.

There is a new shared library which gdb loads in order to communicate
with the gcc plugin.  This library communicates with the gcc plugin
using a simple, ad-hoc RPC mechanism.

This shared library exports a single public function which is used to
instantiate any needed objects.  This makes it simple to version the
API and avoid undue synchronization between gcc and gdb.
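
A hypothetical sketch of what such a versioned entry point could look
like; all names here (gcc_base_context, gcc_c_fe_context) are
illustrative stand-ins, not the actual gdb/gcc interface:

```c
#include <stdlib.h>

/* A context object whose function pointers form the per-version API;
   gdb and the plugin library only need to agree on the entry point's
   signature.  */
struct gcc_base_context
{
  int version;                  /* API version actually provided */
  int (*compile) (struct gcc_base_context *self, const char *code);
};

static int
do_compile (struct gcc_base_context *self, const char *code)
{
  (void) self;
  return code != NULL ? 0 : -1; /* 0 on success; toy semantics */
}

/* The single exported symbol: the caller asks for a version and gets
   NULL if the library cannot provide a compatible context, so the two
   sides never need to be in lock-step.  */
struct gcc_base_context *
gcc_c_fe_context (int requested_version)
{
  if (requested_version != 1)
    return NULL;
  struct gcc_base_context *ctx = malloc (sizeof *ctx);
  if (ctx == NULL)
    return NULL;
  ctx->version = 1;
  ctx->compile = do_compile;
  return ctx;
}
```
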

We think the plugin is best suited to be put into the gcc repository
because it is coupled more tightly to gcc than to gdb.

To try it out, just build gcc and gdb with the patches applied.  Then
set your PATH and LD_LIBRARY_PATH to point to the right subdirectories
of the new gcc install directory.

In later series we plan to extend this functionality; either on the
gcc side, say by writing a similar plugin for C++; or on the gdb side,
say by making it possible to compile breakpoint conditions.  However,
we haven't yet decided exactly which future projects we will tackle or
in what order.




[PATCH 1/5] export finish_bitfield_layout from stor-layout

2014-05-16 Thread Tom Tromey
The gdb plugin handles some aspects of type layout on its own.  It
does this because it knows the layout of types, but not the path by
which the layout was determined -- so normal gcc things like
TYPE_PACKED cannot be used.

This patch exposes one bit of stor-layout so it can be used by the
plugin.
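
For context, bitfield "representatives" are the underlying objects used
to access runs of adjacent bitfields.  A minimal illustration of the
kind of layout involved (the 8-byte size assumes a typical ABI with
32-bit int, as on x86_64 Linux):

```c
#include <stddef.h>

/* Adjacent bitfields share one underlying representative access unit
   (here a single 32-bit unit for a, b and c).  gdb knows the final
   offsets and sizes but not flags like TYPE_PACKED that produced them,
   which is why the plugin recomputes representatives on the finished
   type via finish_bitfield_layout.  */
struct flags
{
  unsigned a : 3;
  unsigned b : 5;
  unsigned c : 8;
  int after;                    /* starts at the next 4-byte boundary */
};

/* Size under the ambient ABI.  */
size_t
flags_size (void)
{
  return sizeof (struct flags);
}
```
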

2014-05-16  Phil Muldoon  
Tom Tromey  

* stor-layout.c (finish_bitfield_layout): Now public.  Change
argument type to 'tree'.
(finish_record_layout): Update.
* stor-layout.h (finish_bitfield_layout): Declare.
---
 gcc/ChangeLog |  8 
 gcc/stor-layout.c | 12 ++--
 gcc/stor-layout.h |  1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 8fa4dc8..ad057d9 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -1931,10 +1931,10 @@ finish_bitfield_representative (tree repr, tree field)
 }
 
 /* Compute and set FIELD_DECLs for the underlying objects we should
-   use for bitfield access for the structure laid out with RLI.  */
+   use for bitfield access for the structure T.  */
 
-static void
-finish_bitfield_layout (record_layout_info rli)
+void
+finish_bitfield_layout (tree t)
 {
   tree field, prev;
   tree repr = NULL_TREE;
@@ -1943,10 +1943,10 @@ finish_bitfield_layout (record_layout_info rli)
  we could use the underlying type as hint for the representative
  if the bitfield would fit and the representative would not exceed
  the union in size.  */
-  if (TREE_CODE (rli->t) != RECORD_TYPE)
+  if (TREE_CODE (t) != RECORD_TYPE)
 return;
 
-  for (prev = NULL_TREE, field = TYPE_FIELDS (rli->t);
+  for (prev = NULL_TREE, field = TYPE_FIELDS (t);
field; field = DECL_CHAIN (field))
 {
   if (TREE_CODE (field) != FIELD_DECL)
@@ -2033,7 +2033,7 @@ finish_record_layout (record_layout_info rli, int free_p)
   finalize_type_size (rli->t);
 
   /* Compute bitfield representatives.  */
-  finish_bitfield_layout (rli);
+  finish_bitfield_layout (rli->t);
 
   /* Propagate TYPE_PACKED to variants.  With C++ templates,
  handle_packed_attribute is too early to do this.  */
diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h
index 0ff98f8..cd58672 100644
--- a/gcc/stor-layout.h
+++ b/gcc/stor-layout.h
@@ -35,6 +35,7 @@ extern tree rli_size_so_far (record_layout_info);
 extern void normalize_rli (record_layout_info);
 extern void place_field (record_layout_info, tree);
 extern void compute_record_mode (tree);
+extern void finish_bitfield_layout (tree);
 extern void finish_record_layout (record_layout_info, int);
 extern unsigned int element_precision (const_tree);
 extern void finalize_size_functions (void);
-- 
1.9.0



[PATCH 3/5] introduce the binding oracle

2014-05-16 Thread Tom Tromey
gdb wants to supply any declarations that may be referred to by the
user's code.  Hooking into symbol lookup was an efficient way to
accomplish this.

This patch introduces a "binding oracle" that is consulted whenever a
symbol binding is looked up for the first time.  The oracle is just a
global function pointer.  If it is NULL, no special work is done.  It
is called with the identifier to supply and with an enum argument
indicating the kind of binding being requested.  The oracle can then
call back into the C front end (via the new functions c_pushtag and
c_bind) to supply a binding; or it can silently do nothing if the
request could not be fulfilled.

The code caches whether the oracle has been called, to avoid repeated
useless queries.

There is a little hack in c_print_identifier to avoid calling the
binding oracle here.  This makes debugging gcc in the presence of the
plugin remain relatively sane -- without this, calling debug_tree or
the like can confusingly call into the plugin.

2014-05-16  Phil Muldoon  
Tom Tromey  

* c-tree.h (enum c_oracle_request): New.
(c_binding_oracle_function): New typedef.
(c_binding_oracle, c_pushtag, c_bind): Declare.
* c-decl.c (c_binding_oracle): New global.
(I_SYMBOL_CHECKED): New macro.
(i_symbol_binding): New function.
(I_SYMBOL_BINDING, I_SYMBOL_DECL): Redefine.
(I_TAG_CHECKED): New macro.
(i_tag_binding): New function.
(I_TAG_BINDING, I_TAG_DECL): Redefine.
(I_LABEL_CHECKED): New macro.
(i_label_binding): New function.
(I_LABEL_BINDING, I_LABEL_DECL): Redefine.
(c_print_identifier): Save and restore c_binding_oracle.
(c_pushtag, c_bind): New functions.
---
 gcc/c/ChangeLog |  19 +++
 gcc/c/c-decl.c  | 161 ++--
 gcc/c/c-tree.h  |  24 +
 3 files changed, 189 insertions(+), 15 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index d52dcc9..b391add 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -215,21 +215,6 @@ struct GTY((chain_next ("%h.prev"))) c_binding {
 #define B_IN_FILE_SCOPE(b) ((b)->depth == 1 /*file_scope->depth*/)
 #define B_IN_EXTERNAL_SCOPE(b) ((b)->depth == 0 /*external_scope->depth*/)
 
-#define I_SYMBOL_BINDING(node) \
-  (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))->symbol_binding)
-#define I_SYMBOL_DECL(node) \
- (I_SYMBOL_BINDING(node) ? I_SYMBOL_BINDING(node)->decl : 0)
-
-#define I_TAG_BINDING(node) \
-  (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))->tag_binding)
-#define I_TAG_DECL(node) \
- (I_TAG_BINDING(node) ? I_TAG_BINDING(node)->decl : 0)
-
-#define I_LABEL_BINDING(node) \
-  (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))->label_binding)
-#define I_LABEL_DECL(node) \
- (I_LABEL_BINDING(node) ? I_LABEL_BINDING(node)->decl : 0)
-
 /* Each C symbol points to three linked lists of c_binding structures.
These describe the values of the identifier in the three different
namespaces defined by the language.  */
@@ -245,6 +230,96 @@ struct GTY(()) lang_identifier {
 extern char C_SIZEOF_STRUCT_LANG_IDENTIFIER_isnt_accurate
 [(sizeof(struct lang_identifier) == C_SIZEOF_STRUCT_LANG_IDENTIFIER) ? 1 : -1];
 
+/* The binding oracle; see c-tree.h.  */
+void (*c_binding_oracle) (enum c_oracle_request, tree identifier);
+
+/* This flag is set on an identifier if we have previously asked the
+   binding oracle for this identifier's symbol binding.  */
+#define I_SYMBOL_CHECKED(node) \
+  (TREE_LANG_FLAG_4 (IDENTIFIER_NODE_CHECK (node)))
+
+static inline struct c_binding **
+i_symbol_binding (tree node)
+{
+  struct lang_identifier *lid
+= (struct lang_identifier *) IDENTIFIER_NODE_CHECK (node);
+
+  if (lid->symbol_binding == NULL
+  && c_binding_oracle != NULL
+  && !I_SYMBOL_CHECKED (node))
+{
+  /* Set the "checked" flag first, to avoid infinite recursion
+when the binding oracle calls back into gcc.  */
+  I_SYMBOL_CHECKED (node) = 1;
+  c_binding_oracle (C_ORACLE_SYMBOL, node);
+}
+
+  return &lid->symbol_binding;
+}
+
+#define I_SYMBOL_BINDING(node) (*i_symbol_binding (node))
+
+#define I_SYMBOL_DECL(node) \
+ (I_SYMBOL_BINDING(node) ? I_SYMBOL_BINDING(node)->decl : 0)
+
+/* This flag is set on an identifier if we have previously asked the
+   binding oracle for this identifier's tag binding.  */
+#define I_TAG_CHECKED(node) \
+  (TREE_LANG_FLAG_5 (IDENTIFIER_NODE_CHECK (node)))
+
+static inline struct c_binding **
+i_tag_binding (tree node)
+{
+  struct lang_identifier *lid
+= (struct lang_identifier *) IDENTIFIER_NODE_CHECK (node);
+
+  if (lid->tag_binding == NULL
+  && c_binding_oracle != NULL
+  && !I_TAG_CHECKED (node))
+{
+  /* Set the "checked" flag first, to avoid infinite recursion
+when the binding oracle calls back into gcc.  */
+  I_TAG_CHECKED (node) = 1;
+  c_binding_oracle (C_ORACLE_TAG, node);
+}
+
+ 

[PATCH 4/5] add gcc/gdb interface files

2014-05-16 Thread Tom Tromey
The gcc plugin is split into two parts.  One part is an ordinary gcc
plugin.  The other part is a shared library that is loaded by gdb.

This patch adds some files that define the interface exported by this
shared library to gdb.  These files also define the internal API by
which the gdb- and gcc-sides communicate.

These files will be kept in sync between gcc and gdb like much of
include/.

The exported API has been intentionally kept very simple.  In
particular only a single function is exported from the gdb-side
library; symbol visibility is used to hide everything else.  This
exported symbol is a function which is called to return a structure
holding function pointers that gdb then uses.  This structure is
versioned so that changes can be made without necessarily requiring a
simultaneous gdb upgrade.

Note that the C compiler API is broken out separately.  This lets us
extend it to other GCC front ends as desired.  We plan to investigate
C++ in the future.

2014-05-16  Phil Muldoon  
Jan Kratochvil  
Tom Tromey  

* gcc-c-fe.def: New file.
* gcc-c-interface.h: New file.
* gcc-interface.h: New file.
---
 include/ChangeLog |   8 ++
 include/gcc-c-fe.def  | 195 
 include/gcc-c-interface.h | 220 ++
 include/gcc-interface.h   | 120 +
 4 files changed, 543 insertions(+)
 create mode 100644 include/gcc-c-fe.def
 create mode 100644 include/gcc-c-interface.h
 create mode 100644 include/gcc-interface.h

diff --git a/include/gcc-c-fe.def b/include/gcc-c-fe.def
new file mode 100644
index 000..dc5a41a
--- /dev/null
+++ b/include/gcc-c-fe.def
@@ -0,0 +1,195 @@
+/* Interface between GCC C FE and GDB  -*- c -*-
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+
+
+/* Create a new "decl" in GCC.  A decl is a declaration, basically a
+   kind of symbol.
+
+   NAME is the name of the new symbol.  SYM_KIND is the kind of
+   symbol being requested.  SYM_TYPE is the new symbol's C type;
+   except for labels, where this is not meaningful and should be
+   zero.  If SUBSTITUTION_NAME is not NULL, then a reference to this
+   decl in the source will later be substituted with a dereference
+   of a variable of the given name.  Otherwise, for symbols having
+   an address (e.g., functions), ADDRESS is the address.  FILENAME
+   and LINE_NUMBER refer to the symbol's source location.  If this
+   is not known, FILENAME can be NULL and LINE_NUMBER can be 0.
+   This function returns the new decl.  */
+
+GCC_METHOD7 (gcc_decl, build_decl,
+const char */* name */,
+enum gcc_c_symbol_kind /* sym_kind */,
+gcc_type /* sym_type */,
+const char */* substitution_name */,
+gcc_address /* address */,
+const char */* filename */,
+unsigned int /* line_number */)
+
+/* Insert a GCC decl into the symbol table.  DECL is the decl to
+   insert.  IS_GLOBAL is true if this is an outermost binding, and
+   false if it is a possibly-shadowing binding.  */
+
+GCC_METHOD2 (int /* bool */, bind, gcc_decl /* decl */,
+int /* bool */ /* is_global */)
+
+/* Insert a tagged type into the symbol table.  NAME is the tag name
+   of the type and TAGGED_TYPE is the type itself.  TAGGED_TYPE must
+   be either a struct, union, or enum type, as these are the only
+   types that have tags.  FILENAME and LINE_NUMBER refer to the type's
+   source location.  If this is not known, FILENAME can be NULL and
+   LINE_NUMBER can be 0.  */
+
+GCC_METHOD4 (int /* bool */, tagbind,
+const char */* name */,
+gcc_type /* tagged_type */,
+const char * /* filename */,
+unsigned int /* line_number */)
+
+/* Return the type of a pointer to a given base type.  */
+
+GCC_METHOD1 (gcc_type, build_pointer_type,
+gcc_type /* base_type */)
+
+/* Create a new 'struct' type.  Initially it has no fields.  */
+
+GCC_METHOD0 (gcc_type, build_record_type)
+
+/* Create a new 'union' type.  Initially it has no fields.  */
+
+GCC_METHOD0 (gcc_type, build_union_type)
+
+/* Add a field to a struct or union type.  FIELD_NAME is the field's
+   name.  FIELD_TYPE is the type of the field

Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Xinliang David Li
On Fri, May 16, 2014 at 4:00 AM, Richard Biener
 wrote:
> On Fri, May 16, 2014 at 1:54 AM, Xinliang David Li  wrote:
>> Hi, debugging runtime bugs due to devirtualization can be hard for
>> very large C++ programs with complicated class hierarchy. This patch
>> adds the support to report this high level transformation via
>> -fopt-info (not hidden inside dump file) and the ability to do binary
>> search with cutoff.
>>
>> Ok for trunk after build and test?
>
> +  else if (dump_enabled_p ())
> +   {
> + location_t loc = gimple_location (ie->call_stmt);
> + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
> +  "Discovered direct call to non-function in %s, 
> "
>
> diagnostics start with lower-case.  Why not merge this with the
> dump_file case?  The point of all the infrastructure was to _not_
> need to distinguish the cases ...
>

Right -- however in this case I don't want the node 'order' to leak
into the opt report. Should we drop it?


> (similar for the other cases, and IIRC you miss one case in
> tree-ssa-pre.c calling ipa_intraprocedural_devirtualization?)

Good catch. Will add it.

thanks,

David
>
> Thanks,
> Richard.
>
>
>> thanks,
>>
>> David


[PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Tom Tromey
In the typical case, when compiling a snippet of user code, gdb wraps
the user's text in a dummy function.

It's somewhat odd for users if an error in their code is mentioned as
coming from this dummy function.

This patch makes it possible to suppress the function-name display in
a straightforward way: it adds a new global which the plugin can set
to declare the name of the dummy function.

This patch seems like a bit of a hack, but there didn't seem to be a
notably cleaner approach.

2014-05-16  Phil Muldoon  
Tom Tromey  

* c-lang.c (c_diagnostic_ignored_function): New global.
(c_print_error_function): New function.
(LANG_HOOKS_PRINT_ERROR_FUNCTION): Define.
* c-lang.h (c_diagnostic_ignored_function): Declare.
---
 gcc/c/ChangeLog |  8 
 gcc/c/c-lang.c  | 22 ++
 gcc/c/c-lang.h  |  4 
 3 files changed, 34 insertions(+)

diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
index 97c0443..e563813 100644
--- a/gcc/c/c-lang.c
+++ b/gcc/c/c-lang.c
@@ -35,6 +35,26 @@ along with GCC; see the file COPYING3.  If not see
 
 enum c_language_kind c_language = clk_c;
 
+// If non-zero, this names a function which should not be reported in
+// a diagnostic.  This is used by the gdb plugin to avoid showing the
+// generated function name to the user.
+const char *c_diagnostic_ignored_function;
+
+// An implementation of the print_error_function langhook that
+// respects C_DIAGNOSTIC_IGNORED_FUNCTION.
+static void
+c_print_error_function (diagnostic_context *context, const char *file,
+   diagnostic_info *diagnostic)
+{
+  if (c_diagnostic_ignored_function != NULL
+  && current_function_decl != NULL_TREE
+  && DECL_NAME (current_function_decl) != NULL_TREE
+  && strcmp (IDENTIFIER_POINTER (DECL_NAME (current_function_decl)),
+c_diagnostic_ignored_function) == 0)
+return;
+  lhd_print_error_function (context, file, diagnostic);
+}
+
 /* Lang hooks common to C and ObjC are declared in c-objc-common.h;
consequently, there should be very few hooks below.  */
 
@@ -44,6 +64,8 @@ enum c_language_kind c_language = clk_c;
 #define LANG_HOOKS_INIT c_objc_common_init
 #undef LANG_HOOKS_INIT_TS
 #define LANG_HOOKS_INIT_TS c_common_init_ts
+#undef LANG_HOOKS_PRINT_ERROR_FUNCTION
+#define LANG_HOOKS_PRINT_ERROR_FUNCTION c_print_error_function
 
 /* Each front end provides its own lang hook initializer.  */
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
index 7fcf333..022206f 100644
--- a/gcc/c/c-lang.h
+++ b/gcc/c/c-lang.h
@@ -59,4 +59,8 @@ struct GTY(()) language_function {
attribute lists.  */
 extern GTY(()) int current_omp_declare_target_attribute;
 
+/* If non-zero, the name of a function whose name should not be
+   reported in a diagnostic.  */
+extern const char *c_diagnostic_ignored_function;
+
 #endif /* ! GCC_C_LANG_H */
-- 
1.9.0



[C++ Patch] PR 51640

2014-05-16 Thread Paolo Carlini

Hi,

the main issue in the bug report is luckily already fixed in mainline 
and 4.9 (had to do with a commit of mine ;( but I think we can do a 
little better in the diagnostic: avoid the final redundant "does not 
name a type" error after the error message about ambiguity. Tested 
x86_64-linux.


Thanks,
Paolo.

//
/cp
2014-05-16  Paolo Carlini  

PR c++/51640
* parser.c (cp_parser_diagnose_invalid_type_name): Early return
when cp_parser_lookup_name sets ambiguous_decls.

/testsuite
2014-05-16  Paolo Carlini  

PR c++/51640
* g++.dg/parse/error54.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 210512)
+++ cp/parser.c (working copy)
@@ -2880,13 +2880,21 @@ cp_parser_diagnose_invalid_type_name (cp_parser *p
  tree scope, tree id,
  location_t location)
 {
-  tree decl, old_scope;
+  tree decl, old_scope, ambiguous_decls;
   cp_parser_commit_to_tentative_parse (parser);
   /* Try to lookup the identifier.  */
   old_scope = parser->scope;
   parser->scope = scope;
-  decl = cp_parser_lookup_name_simple (parser, id, location);
+  decl = cp_parser_lookup_name (parser, id, none_type,
+   /*is_template=*/false,
+   /*is_namespace=*/false,
+   /*check_dependency=*/true,
+   &ambiguous_decls, location);
   parser->scope = old_scope;
+  if (ambiguous_decls)
+/* If the lookup was ambiguous, an error will already have
+   been issued.  */
+return;
   /* If the lookup found a template-name, it means that the user forgot
   to specify an argument list. Emit a useful error message.  */
   if (TREE_CODE (decl) == TEMPLATE_DECL)
Index: testsuite/g++.dg/parse/error54.C
===
--- testsuite/g++.dg/parse/error54.C(revision 0)
+++ testsuite/g++.dg/parse/error54.C(working copy)
@@ -0,0 +1,19 @@
+// PR c++/51640
+
+class ex {};
+
+namespace t
+{
+  class ex2 : public ex {};
+}
+
+class ex2 : public ex {};
+
+void bar()
+{
+  using namespace t;
+
+  try {
+  } catch (ex2&) { // { dg-error "reference to 'ex2' is ambiguous" }
+  }
+}


Re: [PATCH] Implement -fsanitize=float-cast-overflow

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Jakub Jelinek wrote:

> has variable.  Guess the combination of IBM long double and __int128_t
> (for long long it is still ok) is rare enough that we don't need to care
> about it (alternative would be e.g. to compare against addition of

There are existing problems with that combination anyway.  
fp-int-convert-timode.c has tests disabled for it because the compile-time 
and runtime conversions don't agree (whether for this issue or some other, 
I don't know).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Jan Hubicka
> Hi, debugging runtime bugs due to devirtualization can be hard for
> very large C++ programs with complicated class hierarchy. This patch
> adds the support to report this high level transformation via
> -fopt-info (not hidden inside dump file) and the ability to do binary
> search with cutoff.
> 
> Ok for trunk after build and test?

Seems reasonable to me.
> 
> thanks,
> 
> David

> Index: ChangeLog
> ===
> --- ChangeLog (revision 210479)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,18 @@
> +2014-05-15  Xinliang David Li  
> +
> + * cgraphunit.c (walk_polymorphic_call_targets): Add
> + dbgcnt and fopt-info support.
> + 2014-05-15  Xinliang David Li  
> +
> + * cgraphunit.c (walk_polymorphic_call_targets): Add
> + dbgcnt and fopt-info support.
> + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto.
> + * ipa-devirt.c (ipa_devirt): Ditto.
> + * ipa.c (walk_polymorphic_call_targets): Ditto.
> + * gimple-fold.c (fold_gimple_assign): Ditto.
> + (gimple_fold_call): Ditto.
> + * dbgcnt.def: New counter.
> +
>  2014-05-15  Martin Jambor  
>  
>   PR ipa/61085
> Index: ipa-prop.c
> ===
> --- ipa-prop.c(revision 210479)
> +++ ipa-prop.c(working copy)
> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.
>  #include "ipa-utils.h"
>  #include "stringpool.h"
>  #include "tree-ssanames.h"
> +#include "dbgcnt.h"
>  
>  /* Intermediate information about a parameter that is only useful during the
> run of ipa_analyze_node and is not kept afterwards.  */
> @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c
>   fprintf (dump_file, "ipa-prop: Discovered direct call to 
> non-function"
>   " in %s/%i, making it unreachable.\n",
>ie->caller->name (), ie->caller->order);
> +  else if (dump_enabled_p ())
> + {
> +   location_t loc = gimple_location (ie->call_stmt);
> +   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
> +"Discovered direct call to non-function in %s, "
> +"making it unreachable\n", ie->caller->name ());

Perhaps "turning it to __builtin_unreachable call" and similarly in the
other cases we introduce __builtin_unreachable?  I think that could be
easier for the user to work out.

What kind of problems in devirtualization are you seeing?


Honza


Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Tom Tromey wrote:

> In the typical case, when compiling a snippet of user code, gdb wraps
> the user's text in a dummy function.
> 
> It's somewhat odd for users if an error in their code is mentioned as
> coming from this dummy function.
> 
> This patch makes it possible to suppress the function-name display in
> a straightforward way: it adds a new global which the plugin can set
> to declare the name of the dummy function.
> 
> This patch seems like a bit of a hack, but there didn't seem to be a
> notably cleaner approach.

I'd say this global actually belongs somewhere in the diagnostic_context 
(i.e., instead of the diagnostic_context_auxiliary_data (DC) actually 
being a tree as it is at present, it should point to a structure with 
whatever extra information clients wish to use to control aspects of 
diagnostic reporting).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Xinliang David Li
On Fri, May 16, 2014 at 9:03 AM, Jan Hubicka  wrote:
>> Hi, debugging runtime bugs due to devirtualization can be hard for
>> very large C++ programs with complicated class hierarchy. This patch
>> adds the support to report this high level transformation via
>> -fopt-info (not hidden inside dump file) and the ability to do binary
>> search with cutoff.
>>
>> Ok for trunk after build and test?
>
> Seems reasonable to me.
>>
>> thanks,
>>
>> David
>
>> Index: ChangeLog
>> ===
>> --- ChangeLog (revision 210479)
>> +++ ChangeLog (working copy)
>> @@ -1,3 +1,18 @@
>> +2014-05-15  Xinliang David Li  
>> +
>> + * cgraphunit.c (walk_polymorphic_call_targets): Add
>> + dbgcnt and fopt-info support.
>> + 2014-05-15  Xinliang David Li  
>> +
>> + * cgraphunit.c (walk_polymorphic_call_targets): Add
>> + dbgcnt and fopt-info support.
>> + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto.
>> + * ipa-devirt.c (ipa_devirt): Ditto.
>> + * ipa.c (walk_polymorphic_call_targets): Ditto.
>> + * gimple-fold.c (fold_gimple_assign): Ditto.
>> + (gimple_fold_call): Ditto.
>> + * dbgcnt.def: New counter.
>> +
>>  2014-05-15  Martin Jambor  
>>
>>   PR ipa/61085
>> Index: ipa-prop.c
>> ===
>> --- ipa-prop.c(revision 210479)
>> +++ ipa-prop.c(working copy)
>> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.
>>  #include "ipa-utils.h"
>>  #include "stringpool.h"
>>  #include "tree-ssanames.h"
>> +#include "dbgcnt.h"
>>
>>  /* Intermediate information about a parameter that is only useful during the
>> run of ipa_analyze_node and is not kept afterwards.  */
>> @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c
>>   fprintf (dump_file, "ipa-prop: Discovered direct call to 
>> non-function"
>>   " in %s/%i, making it unreachable.\n",
>>ie->caller->name (), ie->caller->order);
>> +  else if (dump_enabled_p ())
>> + {
>> +   location_t loc = gimple_location (ie->call_stmt);
>> +   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
>> +"Discovered direct call to non-function in %s, "
>> +"making it unreachable\n", ie->caller->name ());
>
> Perhaps "turning it to __builtin_unreachable call" and similarly in the other 
> cases
> we introduce __builtin_unreachable? I think that could be easier for user to 
> work
> out.

Ok.

>
> What kind of problems in devirtualization are you seeing?

I have been chasing a runtime failure of a very large test built with
gcc-4_9. The bad code either calls a pure function or turns a virtual
call into __builtin_unreachable (incomplete target set). The indirect
info shows the otr type to be !maybe_derived_type, and the outer-type
gets cleared during inline update. I isolated a small test case -- but
the good news is that gcc-4_9 @head already fixed the problem.

I will check in the test case to trunk later.

thanks,

David

>
>
> Honza


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Jeff Law

On 05/16/14 04:07, Bin.Cheng wrote:

On Fri, May 16, 2014 at 1:13 AM, Jeff Law  wrote:

On 05/15/14 10:51, Mike Stump wrote:


On May 15, 2014, at 12:26 AM, bin.cheng  wrote:


Here comes up with a new GCC pass looking through each basic block
and merging paired load store even they are not adjacent to each
other.



So I have a target that has load and store multiple support that
supports large a number of registers (2-n registers), and I added a
sched0 pass that is a light copy of the regular scheduling pass that
uses a different cost function which arranges all loads first, then
all stores then everything else.  Within a group of loads or stores
the secondary key is the base register, the next key is the offset.
The net result, all loads off the same register are sorted in
increasing order.


Glad to see someone else stumble on (ab)using the scheduler to do this.

Emm, if it's (ab)using, should we still do it then?
I think it'd still be fine.  There's even been a comment about doing 
this kind of thing in the scheduler that's been around since the early 
90s...


The scheduler is a bit interesting in that it has a wealth of dependency 
information and the ability to reorganize the insn stream in relatively 
arbitrary ways.  That seems to make it a natural place to think about 
transformations of this nature.  We just haven't had a good 
infrastructure for doing that.


In theory we're a lot closer now to being able to plug in different 
costing/sorting models and let the scheduler do its thing.  Those models 
might rewrite for register pressure, or encourage certain independent 
insns to issue back-to-back to encourage combining, or to build 
candidate insns for delay slot scheduling, etc.



As Mike stated, merging of consecutive memory accesses is all about
the base register and the offset.  I am thinking of another method:
collecting all memory accesses with the same base register, then doing the
merge work.  In this way, we should be able to merge more than 2
instructions, also it would be possible to remove redundant load
instructions in one pass.

My question is how many of these redundant loads there could be.  Is there any
rtl pass responsible for this now?
I suspect it's a lot less important now than it used to be.  But there's 
probably some cases where it'd be useful.  Combining sub-word accesses 
into full-word accesses come immediately to mind.


I'm not aware of any pass which does these kind of changes in a general 
form.  Some passes (caller-save) do a fair amount of work to track when 
they can generate multi-object loads/stores (and it was a huge win back 
on the old sparc processors).



jeff


Re: [PATCH (for next stage 1)] Add return type to gimple function dumps

2014-05-16 Thread David Malcolm
On Fri, 2014-05-16 at 14:59 +0200, Richard Biener wrote:
> On Tue, Apr 29, 2014 at 5:01 PM, David Malcolm  wrote:
> > On Tue, 2014-04-29 at 11:16 +0200, Richard Biener wrote:
> >> On Tue, Apr 29, 2014 at 2:58 AM, David Malcolm  wrote:
> >> > On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote:
> >> >> On 03/10/14 13:22, David Malcolm wrote:
> >> >> > Gimple function dumps contain the types of parameters, but not of the
> >> >> > return type.
> >> >> >
> >> >> > The attached patch fixes this omission; here's an example of the
> >> >> > before/after diff:
> >> >> > $ diff -up /tmp/pr23401.c.004t.gimple.old 
> >> >> > /tmp/pr23401.c.004t.gimple.new
> >> >> > --- /tmp/pr23401.c.004t.gimple.old  2014-03-10 13:40:08.972063541 
> >> >> > -0400
> >> >> > +++ /tmp/pr23401.c.004t.gimple.new  2014-03-10 13:39:49.346515464 
> >> >> > -0400
> >> >> > @@ -1,3 +1,4 @@
> >> >> > +int
> >> >> >    (int i)
> >> >> >   {
> >> >> > int D.1731;
> >> >> >
> >> >> >
> >> >> > Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20).
> >> >> >
> >> >> > A couple of test cases needed tweaking, since they were counting the
> >> >> > number of occurrences of "int" in the gimple dump, which thus changed
> >> >> > for functions returning int (like the one above).
> >> >> >
> >> >> > OK for next stage 1?
> >> >> Conceptually OK.  As Richi notes, the work here is in fixing up the
> >> >> testsuite.  I didn't see a reply to Richi's question, particularly WRT
> >> >> the Fortran testsuite.
> >> >
> >> > I'm attaching a revised version of the patch which adds the use of
> >> > TDF_SLIM (though it didn't appear to be necessary in the test I did of a
> >> > function returning a struct).
> >> >
> >> > Successfully bootstrapped & regrtested on x86_64 Linux (Fedora 20),
> >> > using:
> >> >   --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
> >> >
> >> > I didn't see any new failures from this in the testsuite, in particular
> >> > gfortran.sum.  Here's a comparison of the before/after test results,
> >> > generated using my "jamais-vu" tool [1], with comments added by me
> >> > inline:
> >> >
> >> > Comparing 16 common .sum files
> >> > --
> >> >
> >> >  gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320
> >> >  gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 
> >> > UNSUPPORTED: 3004
> >> >  gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 
> >> > 265 XPASS: 33 UNSUPPORTED: 1823
> >> >  gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 
> >> > 52 UNSUPPORTED: 65
> >> >  gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 
> >> > UNSUPPORTED: 3
> >> >  gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 
> >> > UNSUPPORTED: 1
> >> >  gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 
> >> > UNSUPPORTED: 86
> >> >  gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 
> >> > UNSUPPORTED: 74
> >> >  x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 
> >> > PASS: 12 UNSUPPORTED: 1
> >> >  x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 
> >> > PASS: 54
> >> >  x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 
> >> > PASS: 1801 UNSUPPORTED: 55
> >> >  x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122
> >> >  x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 
> >> > PASS: 2420
> >> >  x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 
> >> > 26 XFAIL: 3 UNSUPPORTED: 1
> >> >  x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 
> >> > PASS: 2582 XFAIL: 4
> >> >  x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 
> >> > 10265 PASS: 1 XFAIL: 41 UNSUPPORTED: 224
> >> >
> >> > (...i.e. the totals were unchanged between unpatched/patched for all of
> >> > the .sum files; and yes, Fortran was tested.  Should there be a
> >> > gcj.sum?)
> >> >
> >> > Tests that went away in gcc/testsuite/gcc/gcc.sum: 2
> >> > 
> >> >
> >> >  PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple "int" 5
> >> >  PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple "int" 3
> >> >
> >> > Tests appeared in gcc/testsuite/gcc/gcc.sum: 2
> >> > --
> >> >
> >> >  PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple "int" 6
> >> >  PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple "int" 4
> >> >
> >> >
> >> > (...my comparison tool isn't smart enough yet to tie these "went
> >> > away"/"appeared" results together; they reflect the fixups from the
> >> > patch).
> >> >
> >> > Tests that went away in gcc/testsuite/go/go.sum: 2
> >> > --
> >> >
> >> >  PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of 
> >> > build) compilation,  -O2 -g
> >> >  PASS

Re: [patch,mips] avoid invalid register for JALR

2014-05-16 Thread Maciej W. Rozycki
On Wed, 14 May 2014, Sandra Loosemore wrote:

> > > When I was trying to benchmark another patch (which I'll be sending
> > > along shortly) with CSiBE for -mabi=64, I ran into an assembler error
> > > like this:
> > > 
> > > /tmp/ccJv2faG.s: Assembler messages:
> > > /tmp/ccJv2faG.s:1605: Error: a destination register must be supplied
> > > `jalr $31'
> > 
> > JALR patterns should have an explicit clobber of $31, which I thought
> > was also supposed to stop $31 from being used as the call address.
> 
> Hm.  Yes, that ought to work, in theory
> 
> > Do you have a testcase?
> 
> I can reproduce the error in a current mipsisa64-elfoabi build, with my patch
> to delete ADJUST_REG_ALLOC_ORDER applied.  It triggers on this file from
> CSiBE:
> 
> mipsisa64-elfoabi-gcc -c -mabi=64 -O2 -fno-common -w
> csibe/src/./ttt-0.10.1.preproc/src/connect4.i

 I wonder if there's something fishy going on here.  I checked output 
produced with -dP and the offending instruction is emitted like this:

 #(call_insn 172 124 161 (parallel [
 #(call (mem:SI (reg:DI 31 $31) [0 c4_setup S4 A32])
 #(const_int 0 [0]))
 #(clobber (reg:SI 31 $31))
 #]) c4_new.i:79 594 {call_internal}
 # (expr_list:REG_DEAD (reg:DI 31 $31)
 #(expr_list:REG_DEAD (reg:DI 7 $7)
 #(expr_list:REG_DEAD (reg:DI 6 $6)
 #(expr_list:REG_DEAD (reg:DI 5 $5)
 #(expr_list:REG_DEAD (reg:DI 4 $4)
 #(nil))
 #(expr_list:DI (use (reg:DI 4 $4))
 #(expr_list:SI (use (reg:DI 5 $5))
 #(expr_list:SI (use (reg:DI 6 $6))
 #(expr_list:SI (use (reg:DI 7 $7))
 #(nil))
jalr$31  # 172  call_internal/1 [length = 4]

so clearly the clobber is ignored, or perhaps rather considered a late 
clobber instead of an early clobber that's indeed required here.

 I have reduced your case a bit and attached the result to this e-mail.  
With this code I can reproduce the problem using the following command:

$ mips-sde-elf-gcc -S -dP -mabi=64 -O2 -fno-common -w -o c4_new.s c4_new.i

and current trunk with the patch you recently posted as archived at:

https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01016.html

applied.  With the patch reverted the issue goes away ($17 is used for the 
jump), so clearly the register allocation order override made in 
mips_order_regs_for_local_alloc is currently covering an underlying bug.

  Maciej

typedef unsigned int size_t;

extern void *malloc (size_t __size);

typedef char bool;
typedef struct _Board {
  struct _Board *(*copy) (const struct _Board *);  
  void (*setup) (struct _Board *, int, int, int);
  void (*display) (struct _Board *);  
  int (*eval) (struct _Board *);
  void (*score) (struct _Board *, int *, int *);  
  bool (*full) (struct _Board *);  
  int (*winner) (struct _Board *); 
 
  bool (*valid_move) (struct _Board *, int);  
  bool (*move) (struct _Board *, int, int);   
  void (*unmove) (struct _Board *);   
  int (*getmove) (struct _Board *, int, 
int *, int *);  
  void (*help) (void);  
  char (*symbol) (struct _Board *, int, int);   
  void (*coords) (struct _Board *, int, char *);  
  int rows, cols;
  int squares;   
  int *moves;  
  int nummoves;  
  int X_player;  
  int *board;   
  int **points;  
  int numpoints;
  int depth, depth2;   
  int *center;  
} Board;

Board *c4_copy (const Board *);
void c4_setup (Board *, int, int, int);
void c4_display (Board *);
int c4_eval (Board *);
inline bool c4_full (Board *);
int c4_winner (Board *);
bool c4_valid_move (Board *, int);
bool c4_move (Board *, int, int);
void c4_unmove (Board *);
int c4_getmove (Board *, int, int *, int *);
void c4_help (void);
char c4_symbol (Board *, int, int);
void c4_coords (Board *, int, char *);

Board *c4_new (int players, int size, int depth)
{
  Board *T = (Board *) malloc (sizeof (Board));
  Board *B = (Board *)T;
  int **points;
  int *board;
  int i, j, point = 0;
  int MyPE = 0;
  int rows, cols;

  B->copy = c4_copy;
  B->setup = c4_setup;
  B->display = c4_display;
  B->eval = c4_eval;
  B->full = c4_full;
  B->winner = c4_winner;
  B->valid_move = c4_valid_move;
  B->move = c4_move;
  B->unmove = c4_unmove;
  B->getmove = c4_getmove;
  B->help = c4_help;
  B->symbol = c4_symbol;
  B->coords = c4_coords;
  B->score = ((void *)0) ;

  rows = B->rows = 6;
  cols = B->cols = 7;
  B->squares = 7;  

  if (MyPE == 0)
c4_setup ((Board *)T, players, depth, 0);

  B->board = (int *) malloc (rows*cols * sizeof(int));

  B->moves = (int *) malloc (rows*cols * sizeof(int));
  B->nummoves = 0;

  for (i = 0; i < rows*cols; i++)
B->board[i] = B->moves[i] = 0;

  B->numpoints = rows*(cols-3) + cols*(rows-3) + 2*(rows-3)*(cols-3);
  points = (int **) malloc (B->numpoints * 4 * sizeof(int *));
  board = B->board;
   
  for (j = 0; j < rows - 3; j++) {
for (i = 

Re: [PATCH 0/5] let gdb reuse gcc'c C compiler

2014-05-16 Thread Mike Stump
On May 16, 2014, at 8:26 AM, Tom Tromey  wrote:
> This patch series is half of a project to let gdb reuse gcc (which
> half depends on which list you are seeing this on), so that users can
> compile small snippets of code and evaluate them in the current
> context of the inferior.

Nice…  I looked through it; it's nice and lightweight, and it seems like the API
can hold up over time.

Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Manuel López-Ibáñez
> This patch makes it possible to suppress the function-name display in
> a straightforward way: it adds a new global which the plugin can set
> to declare the name of the dummy function.
>
> This patch seems like a bit of a hack, but there didn't seem to be a
> notably cleaner approach.

Can't you override the diagnostic_starter() in your plugin? This way
you can even customize it to print "gdb" in the prefix (if you wish to
provide a custom prefix). If that is possible, that seems way cleaner.
If not, I wonder why not.

Otherwise, why not override the lang_hook itself? This way the plugin
can provide its own lang_hook so that your proposed
c_print_error_function lives in the plugin itself.

I'm probably missing something obvious here.

Cheers,

Manuel.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Jeff Law

On 05/16/14 04:07, Bin.Cheng wrote:


Yes, I think this one does have a good reason.  The target-independent
pass just makes sure that the two consecutive memory access instructions
are free of data dependency on each other, then feeds them to the back-end
hook.  It's the back-end's responsibility to generate the correct instruction.
But given these two memory access insns, there are only a couple of ways 
they're likely to combine into a single insn.  We could just as easily 
have the target-independent code construct a new insn and then try to 
recognize it.  If it's not recognized, then try the other way.


Or is it the case that we're doing something beyond upsizing the mode?



  It's not about modifying an existing insn and then recognizing it; it's
about creating a new instruction sometimes.  For example, we can
generate a simple move insn in ARM mode, while we have to generate a
parallel instruction in Thumb mode.  The target-independent part has no
idea how to generate the expected insn.  Moreover, the back-end may check
some special conditions too.
But can't you go through movXX to generate either the simple insn on the 
ARM or the PARALLEL on the thumb?


Jeff


Re: [C++ Patch] PR 51640

2014-05-16 Thread Jason Merrill

OK.

Jason


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Jan Hubicka
> 
> I have been chasing a runtime failure of a very large test built with
> gcc-4_9. The bad code either calls a pure function or turns a virtual
> call into __builtin_unreachable (incomplete target set). The indirect
> info shows the otr type to be !maybe_derived_type, and the outer-type
> gets cleared during inline update. I isolated a small test case -- but
> the good news is that gcc-4_9 @head already fixed the problem.
> 
> I will  check in the test case to trunk later.

Good, a testcase would be welcome.  I guess it was the fix for the placement_new
bug.  It disables some valid devirtualizations (and I may thus revisit the fix
for 4.10), so it would be good to know whether your testcase differs from the
one in the original PR.

Honza
> 
> thanks,
> 
> David
> 
> >
> >
> > Honza


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Mike Stump
On May 16, 2014, at 3:07 AM, Bin.Cheng  wrote:
> 
>> I don't see how regrename will help resolve [base+offset] false
>> dependencies. Can you explain? I'd expect effects from
>> hardreg-copyprop "commoning" a base register.
> It's the register operand's false dependency, rather than the base
> register's.  Consider the simple case below:
>mov r1,  #const1
>store r1, [base+offset1]
>mov r1, #const2
>store r1, [base_offset2]
> It should be renamed into:
>mov r1,  #const1
>store r1, [base+offset1]
>mov r2, #const2
>store r2, [base_offset2]

Ah, but, what did this look like right before pass_web?


Re: we are starting the wide int merge

2014-05-16 Thread Gerald Pfeifer
On Sat, 10 May 2014, Gerald Pfeifer wrote:
> Since (at least) 16:40 UTC that day my i386-unknown-freebsd10.0 builds
> fail as follows:
> 
>   Comparing stages 2 and 3
>   warning: gcc/cc1obj-checksum.o differs
>   warning: gcc/cc1-checksum.o differs
>   warning: gcc/cc1plus-checksum.o differs
>   Bootstrap comparison failure!
>   gcc/fold-const.o differs
>   gcc/simplify-rtx.o differs
>   gcc/tree-ssa-ccp.o differs
> 
> (FreeBSD/i386 really builds for i486, but retains the original name;
> I'm traveling with limited access, but would not be surprised for this
> to also show up for i386-*-linux-gnu or i486-*-linux-gnu.)

Is anybody able to reproduce this, for example on a GNU/Linux system?

This tester of mine hasn't been able to bootstrap for nearly a week,
and timing-wise it would be really a coincidence were this not due to
wide-int.

Gerald


Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-16 Thread Eric Botcazou
>OK - here is your version of the patch, extended with a comment which
> I think is helpful for other people reading the code, and with the
> changes to builtins.c and md.texi removed, since the size of the buffer
> is not changing.
> 
>    Is this version OK to apply?

Yes, IMO that's fine, thanks.

-- 
Eric Botcazou


Re: RFA: Fix calculation of size of builtin setjmp buffer

2014-05-16 Thread Eric Botcazou
> But, fails whenever the size of the mode of the save area is bigger than a
> certain amount…  On my port, the size taken up by the  save area is large
> enough to cause this to fail.  :-(

That's a bit unexpected, why do you need so big a save area exactly?  The only 
architecture for which this doesn't work is the IA-64, which is a very special 
beast...  In this case, the way out is to define DONT_USE_BUILTIN_SETJMP and 
JMP_BUF_SIZE to the needed size.

-- 
Eric Botcazou


Re: [PATCH] Fix ARM NAN fraction bits

2014-05-16 Thread Maciej W. Rozycki
On Thu, 27 Feb 2014, Joey Ye wrote:

> Current ARM soft-float implementation is violating the RTABI
> (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pd
> f) Section 4.1.1.1:
> 
> When not otherwise specified by IEEE 754, the result on an invalid operation
> should be the quiet NaN bit pattern with only the most significant bit of
> the significand set, and all other significand bits zero. 
> 
> This patch fixes it by setting _FP_NANFRAC_* to zero.
> 
> Ran make check test with -mfloat-abi=soft. No regression.
> 
> OK to checkin?
> 
> 2014-02-27  Joey Ye  
> * sysdeps/arm/soft-fp/sfp-machine.h 
>   (_FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q):
>   Set to zero.
> 
> 
> diff --git a/sysdeps/arm/soft-fp/sfp-machine.h
> b/sysdeps/arm/soft-fp/sfp-machine.h
> index 52a08b5..32697fe 100644
> --- a/sysdeps/arm/soft-fp/sfp-machine.h
> +++ b/sysdeps/arm/soft-fp/sfp-machine.h
> @@ -21,9 +21,9 @@
> #define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
> #define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
> 
> -#define _FP_NANFRAC_S  ((_FP_QNANBIT_S << 1) - 1)
> -#define _FP_NANFRAC_D ((_FP_QNANBIT_D << 1) - 1),
> -1
> -#define _FP_NANFRAC_Q ((_FP_QNANBIT_Q << 1) - 1),
> -1, -1, -1
> +#define _FP_NANFRAC_S 0
> +#define _FP_NANFRAC_D    0, 0
> +#define _FP_NANFRAC_Q    0, 0, 0, 0
> #define _FP_NANSIGN_S   0
> #define _FP_NANSIGN_D  0
> #define _FP_NANSIGN_Q  0

 Regrettably, when propagated to libgcc, this change regressed 
gcc.dg/torture/builtin-math-7.c on soft-fp arm-eabi targets, currently 
ARMv6-M (`-march=armv6-m -mthumb') only.  This is because these NANFRAC 
macros now have no bits set, and as a result, when used to construct a NaN 
in the semi-raw mode, they build an infinity instead.  Consequently 
operations such as (Inf - Inf) now produce Inf rather than NaN.  The 
change worked for the original test case because the division is done in the 
canonical mode, where the quiet bit is set separately, from the fp class.

 Here's a fix making the code match the commit description quoted above, that 
is, setting the most significant bit of the significand.  This is also what 
targets that are similar in this respect do.

 OK to apply?  OK for libgcc (against libgcc/config/arm/sfp-machine.h), in 
particular for GCC 4.8 and 4.9?

2014-05-16  Maciej W. Rozycki  

PR libgcc/60166
* sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D)
(_FP_NANFRAC_Q): Set the quiet bit.

  Maciej

glibc-soft-fp-arm-nanfrac.diff
Index: glibc-fsf-trunk-quilt/sysdeps/arm/soft-fp/sfp-machine.h
===
--- glibc-fsf-trunk-quilt.orig/sysdeps/arm/soft-fp/sfp-machine.h
2014-05-16 03:25:52.0 +0100
+++ glibc-fsf-trunk-quilt/sysdeps/arm/soft-fp/sfp-machine.h 2014-05-16 
03:31:34.451805339 +0100
@@ -21,9 +21,9 @@
 #define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
 #define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
-#define _FP_NANFRAC_S  0
-#define _FP_NANFRAC_D  0, 0
-#define _FP_NANFRAC_Q  0, 0, 0, 0
+#define _FP_NANFRAC_S  _FP_QNANBIT_S
+#define _FP_NANFRAC_D  _FP_QNANBIT_D, 0
+#define _FP_NANFRAC_Q  _FP_QNANBIT_Q, 0, 0, 0
 #define _FP_NANSIGN_S  0
 #define _FP_NANSIGN_D  0
 #define _FP_NANSIGN_Q  0


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread pinskia


> On May 16, 2014, at 4:13 AM, Richard Biener  
> wrote:
> 
> On Fri, May 16, 2014 at 1:03 PM, Richard Biener
>  wrote:
>> On Fri, May 16, 2014 at 12:56 PM,   wrote:
>>> 
>>> 
 On May 16, 2014, at 3:48 AM, Richard Biener  
 wrote:
 
 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
  wrote:
> Ping?
 
 Sorry ...
 
> Best regards,
> 
> Thomas Preud'homme
> 
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>> Sent: Friday, May 09, 2014 6:26 PM
>> To: GCC Patches
>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
>> 
>> Sorry, took longer than expected as I got distracted by some other patch.
>> I merged the whole patchset in a single patch as I was told the current 
>> setup
>> is actually more difficult to read.
>> 
>> Here are the updated ChangeLogs:
>> 
>> *** gcc/ChangeLog ***
>> 
>> 2014-05-09  Thomas Preud'homme  
>> 
>> PR tree-optimization/54733
>> * expr.c (get_inner_reference): Add a parameter to control whether
>> a
>> MEM_REF should be split into base + offset.
>> * tree.h (get_inner_reference): Default new parameter to false.
>> * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
>> (CMPNOP): Define.
>> (find_bswap_or_nop_load): New.
>> (find_bswap_1): Renamed to ...
>> (find_bswap_or_nop_1): This. Also add support for memory source.
>> (find_bswap): Renamed to ...
>> (find_bswap_or_nop): This. Also add support for memory source and
>> detection of bitwise operations equivalent to load in host 
>> endianness.
>> (execute_optimize_bswap): Likewise. Also move its leading
>> comment back
>> in place and split statement transformation into ...
>> (bswap_replace): This. Add assert when updating bswap_stats.
>> 
>> *** gcc/testsuite/ChangeLog ***
>> 
>> 2014-05-09  Thomas Preud'homme  
>> 
>> PR tree-optimization/54733
>> * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
>> bswap
>> optimization to support memory sources and bitwise operations
>> equivalent to load in host endianness.
>> * gcc.dg/optimize-bswaphi-1.c: Likewise.
>> * gcc.dg/optimize-bswapsi-2.c: Likewise.
>> * gcc.c-torture/execute/bswap-2.c: Likewise.
>> 
>> Ok for trunk?
 
 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?
 
 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:
 
 +  /* Perform the load.  */
 +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);
 
 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.
 
 So what you have to do is either (simpler)
 
  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
  if (align < TYPE_ALIGN (load_type))
al_load_type = build_aligned_type (load_type, align);
 ...
   val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
load_offset_ptr);
 
 or keep track of the "first" actual load and use
 
  unsigned int align = get_object_alignment (that_first_load);
 
 "first" in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.
 
 Of course on STRICT_ALIGNMENT targets a not aligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).
>>> 
>>> Not always decomposed. On MIPS, it should using the load/store left/right 
>>> instructions for unaligned load/stores which is normally better than 
>>> decomposed load/stores. So having a cost model would be nice.
>> 
>> Agreed, but I am happy with doing that as a followup.  Btw,
>> a very simple one would be to reject unaligned
>> SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
>> [of course that may be true on MIPS even for the cases where
>> a "reasonable" fast unaligned variant exists - nearly no target
>> defines that macro in a too fancy way]
> 
> Oh, and what happens for
> 
> unsigned foo (unsigned char *x)
> {
>  return x[0] << 24 | x[2] << 8 | x[3];
> }
> 
> ?  We could do an unsigned int load from x and zero byte 3
> with an AND.  Enhancement for a followup, similar to also
> considering vector types for the load (also I'm not sure
> that uint64_type_node always has non-BLKmod

Re: [PATCH] Fix ARM NAN fraction bits

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Maciej W. Rozycki wrote:

> 2014-05-16  Maciej W. Rozycki  
> 
>   PR libgcc/60166
>   * sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D)
>   (_FP_NANFRAC_Q): Set the quiet bit.

OK for glibc.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PR 61084: Use widest_int in sparc_fold_builtin

2014-05-16 Thread Eric Botcazou
> This is the second part of PR 61084, which it seems I'd forgotten to post.
> pdist calculates a wide result from narrower inputs, so I should have
> used widest_int rather than wide_int.

Is that documented?  Because, if even you wide-int guys got it wrong...

>   PR target/61084
>   * config/sparc/sparc.c (sparc_fold_builtin): Use widest_int rather
>   than wide_int.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware

2014-05-16 Thread Jeff Law

On 05/16/14 02:02, Richard Biener wrote:



Quiet as usual.
Which strongly suggests folks trust you to do the right thing here :-) 
 I think the FRE/PRE reference in $SUBJECT made me ignore the patch 
entirely -- my brain hurts when I look at our tree PRE implementation.


Jeff


Eliminate write-only variables

2014-05-16 Thread Jan Hubicka
Hi,
this patch adds code to remove write-only static variables.  While analyzing
the effectiveness of LTO on Firefox, I noticed that a surprisingly large part
of the binary's data segment is occupied by these.  Fixed thus.
(This is quite a trivial transformation; I just never considered it important
enough to work on.)

The patch works by marking write-only variables in ipa.c (at the same time we
discover the addressable flag) and also fixes handling of the flags for
aliases.  References to such variables are then removed by fixup_cfg.
As a first cut, I only remove stores without side effects, so copies from
volatile variables are preserved.  I also kill the LHS of function calls.
I do not attempt to remove asm statements.  This means that some references
may be left in the code, and therefore the IPA code does not eliminate the
references after discovering a write-only variable; instead it relies
on dead variable elimination to do the job later.  Consequently, not all
write-only variables are removed with WHOPR when the references end up
in different partitions.  Something I can address incrementally.

Also, I think dwarf2out should be updated to mark the values of write-only
variables as optimized out.  Jakub, can you help me with this?
(I do not think it is valid to output the optimized-out value of the
constructor.)

Bootstrapped/regtested x86_64-linux, will commit it later today.

Honza

* varpool.c (dump_varpool_node): Dump write-only flag.
* lto-cgraph.c (lto_output_varpool_node, input_varpool_node): Stream
write-only flag.
* tree-cfg.c (execute_fixup_cfg): Remove statements setting write-only 
variables.


* gcc.c-torture/execute/20101011-1.c: Update testcase.
* gcc.dg/ira-shrinkwrap-prep-1.c: Update testcase.
* gcc.dg/tree-ssa/writeonly.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-6.c: Update testcase.
* gcc.dg/tree-ssa/pr21559.c: Update testcase.
* gcc.dg/debug/pr35154.c: Update testcase.
* gcc.target/i386/vectorize1.c: Update testcase.
* ipa.c (process_references): New function.
(set_readonly_bit): New function.
(set_writeonly_bit): New function.
(clear_addressable_bit): New function.
(ipa_discover_readonly_nonaddressable_var): Mark write only variables; 
fix
handling of aliases.
* cgraph.h (struct varpool_node): Add writeonly flag.

Index: varpool.c
===
--- varpool.c   (revision 210514)
+++ varpool.c   (working copy)
@@ -211,6 +211,8 @@ dump_varpool_node (FILE *f, varpool_node
 fprintf (f, " read-only");
   if (ctor_for_folding (node->decl) != error_mark_node)
 fprintf (f, " const-value-known");
+  if (node->writeonly)
+fprintf (f, " write-only");
   fprintf (f, "\n");
 }
 
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 210514)
+++ lto-cgraph.c(working copy)
@@ -562,6 +562,7 @@ lto_output_varpool_node (struct lto_simp
   bp_pack_value (&bp, node->forced_by_abi, 1);
   bp_pack_value (&bp, node->unique_name, 1);
   bp_pack_value (&bp, node->body_removed, 1);
+  bp_pack_value (&bp, node->writeonly, 1);
   bp_pack_value (&bp, node->definition, 1);
   alias_p = node->alias && (!boundary_p || node->weakref);
   bp_pack_value (&bp, alias_p, 1);
@@ -1153,6 +1154,7 @@ input_varpool_node (struct lto_file_decl
   node->forced_by_abi = bp_unpack_value (&bp, 1);
   node->unique_name = bp_unpack_value (&bp, 1);
   node->body_removed = bp_unpack_value (&bp, 1);
+  node->writeonly = bp_unpack_value (&bp, 1);
   node->definition = bp_unpack_value (&bp, 1);
   node->alias = bp_unpack_value (&bp, 1);
   node->weakref = bp_unpack_value (&bp, 1);
Index: tree-cfg.c
===
--- tree-cfg.c  (revision 210514)
+++ tree-cfg.c  (working copy)
@@ -8431,7 +8431,7 @@ execute_fixup_cfg (void)
   FOR_EACH_BB_FN (bb, cfun)
 {
   bb->count = apply_scale (bb->count, count_scale);
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
{
  gimple stmt = gsi_stmt (gsi);
  tree decl = is_gimple_call (stmt)
@@ -8457,9 +8457,45 @@ execute_fixup_cfg (void)
todo |= TODO_cleanup_cfg;
 }
 
+ /* Remove stores to variables we marked readonly.
+Keep access when store has side effect, i.e. in case when source
+is volatile.  */
+ if (gimple_store_p (stmt)
+ && !gimple_has_side_effects (stmt))
+   {
+ tree lhs = get_base_address (gimple_get_lhs (stmt));
+
+ if (TREE_CODE (lhs) == VAR_DECL
+ && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs))
+ && varpool_get_node (lhs)->writeonly)
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+  

Re: [PATCH] Use direct computation to calculate fallthrough edge count.

2014-05-16 Thread Jan Hubicka
> This patch uses direct computation (instead of an indirect subtraction)
> to calculate the fall-through edge's count and frequency. This
> is less error-prone when the profile is inconsistent (e.g. the succ
> edge counts summing to more than the bb count).
> 
> Bootstrapped and regression test on-going.
> 
> OK for trunk if test pass?

OK,
thanks!

Honza
> 
> Thanks,
> Dehao
> 
> gcc/ChangeLog:
> 2014-05-16  Dehao Chen  
> 
> * cfghooks.c (make_forwarder_block): Use direct computation to
> get fall-through edge's count and frequency.
> 
> Index: gcc/cfghooks.c
> ===
> --- gcc/cfghooks.c (revision 210518)
> +++ gcc/cfghooks.c (working copy)
> @@ -833,6 +833,9 @@ make_forwarder_block (basic_block bb, bool (*redir
> 
>fallthru = split_block_after_labels (bb);
>dummy = fallthru->src;
> +  dummy->count = 0;
> +  dummy->frequency = 0;
> +  fallthru->count = 0;
>bb = fallthru->dest;
> 
>/* Redirect back edges we want to keep.  */
> @@ -842,20 +845,13 @@ make_forwarder_block (basic_block bb, bool (*redir
> 
>if (redirect_edge_p (e))
>   {
> +  dummy->frequency += EDGE_FREQUENCY (e);
> +  dummy->count += e->count;
> +  fallthru->count += e->count;
>ei_next (&ei);
>continue;
>   }
> 
> -  dummy->frequency -= EDGE_FREQUENCY (e);
> -  dummy->count -= e->count;
> -  if (dummy->frequency < 0)
> - dummy->frequency = 0;
> -  if (dummy->count < 0)
> - dummy->count = 0;
> -  fallthru->count -= e->count;
> -  if (fallthru->count < 0)
> - fallthru->count = 0;
> -
>e_src = e->src;
>jump = redirect_edge_and_branch_force (e, bb);
>if (jump != NULL)


[PATCH] Use direct computation to calculate fallthrough edge count.

2014-05-16 Thread Dehao Chen
This patch uses direct computation (instead of an indirect subtraction)
to calculate the fall-through edge's count and frequency. This
is less error-prone when the profile is inconsistent (e.g. the succ
edge counts summing to more than the bb count).

Bootstrapped and regression test on-going.

OK for trunk if test pass?

Thanks,
Dehao

gcc/ChangeLog:
2014-05-16  Dehao Chen  

* cfghooks.c (make_forwarder_block): Use direct computation to
get fall-through edge's count and frequency.

Index: gcc/cfghooks.c
===
--- gcc/cfghooks.c (revision 210518)
+++ gcc/cfghooks.c (working copy)
@@ -833,6 +833,9 @@ make_forwarder_block (basic_block bb, bool (*redir

   fallthru = split_block_after_labels (bb);
   dummy = fallthru->src;
+  dummy->count = 0;
+  dummy->frequency = 0;
+  fallthru->count = 0;
   bb = fallthru->dest;

   /* Redirect back edges we want to keep.  */
@@ -842,20 +845,13 @@ make_forwarder_block (basic_block bb, bool (*redir

   if (redirect_edge_p (e))
  {
+  dummy->frequency += EDGE_FREQUENCY (e);
+  dummy->count += e->count;
+  fallthru->count += e->count;
   ei_next (&ei);
   continue;
  }

-  dummy->frequency -= EDGE_FREQUENCY (e);
-  dummy->count -= e->count;
-  if (dummy->frequency < 0)
- dummy->frequency = 0;
-  if (dummy->count < 0)
- dummy->count = 0;
-  fallthru->count -= e->count;
-  if (fallthru->count < 0)
- fallthru->count = 0;
-
   e_src = e->src;
   jump = redirect_edge_and_branch_force (e, bb);
   if (jump != NULL)


Re: Eliminate write-only variables

2014-05-16 Thread Andreas Schwab
Jan Hubicka  writes:

> +   /* Remove stores to variables we marked readonly.

s/read/write/

> +   /* For calls we can simply remove LHS when it is known to be read 
> only.  */

s/read/write/

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Eliminate write-only variables

2014-05-16 Thread Jan Hubicka
> Jan Hubicka  writes:
> 
> > + /* Remove stores to variables we marked readonly.
> 
> s/read/write/
> 
> > + /* For calls we can simply remove LHS when it is known to be read 
> > only.  */
> 
> s/read/write/

Ah, thanks!

Honza
> 
> Andreas.
> 
> -- 
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."


Re: add dbgcnt and opt-info support for devirtualization

2014-05-16 Thread Xinliang David Li
On Fri, May 16, 2014 at 9:51 AM, Jan Hubicka  wrote:
>>
>> I have been chasing a runtime failure of a very large test built with
>> gcc-4_9. The bad code either calls a pure function or turns a virtual
>> call into __builtin_unreachable (incomplete target set). The indirect
>> info shows the otr type to be !maybe_derived_type, and the outer-type
>> gets cleared during inline update. I isolated a small test case -- but
>> the good news is that gcc-4_9 @head already fixed the problem.
>>
>> I will  check in the test case to trunk later.
>
> Good, a testcase would be welcome.  I guess it was the fix for the
> placement_new bug.
> It disables some valid devirtualizations (and I may thus revisit the fix for
> 4.10), so it would be good to know whether your testcase differs from the one
> in the original PR.

I thought so too -- but when I backed out this single change from
trunk, the test still passes, so something else is also going on.

David

>
> Honza
>>
>> thanks,
>>
>> David
>>
>> >
>> >
>> > Honza


patch to fix PR60969

2014-05-16 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60969

The patch was bootstrapped and tested on x86/x86-64.

Committed as rev. 210519 to gcc 4.9 branch and as rev. 210520 to trunk.

2014-05-16  Vladimir Makarov  

PR rtl-optimization/60969
* ira-costs.c (record_reg_classes): Allow only memory for pseudo.
Calculate costs for this case.

2014-05-16  Vladimir Makarov  

PR rtl-optimization/60969
* g++.dg/pr60969.C: New.

Index: ira-costs.c
===
--- ira-costs.c (revision 210069)
+++ ira-costs.c (working copy)
@@ -762,10 +762,11 @@ record_reg_classes (int n_alts, int n_op
 into that class.  */
  if (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER)
{
- if (classes[i] == NO_REGS)
+ if (classes[i] == NO_REGS && ! allows_mem[i])
{
  /* We must always fail if the operand is a REG, but
-we did not find a suitable class.
+we did not find a suitable class and memory is
+not allowed.
 
 Otherwise we may perform an uninitialized read
 from this_op_costs after the `continue' statement
@@ -783,50 +784,90 @@ record_reg_classes (int n_alts, int n_op
  bool out_p = recog_data.operand_type[i] != OP_IN;
  enum reg_class op_class = classes[i];
  move_table *move_in_cost, *move_out_cost;
+ short (*mem_cost)[2];
 
  ira_init_register_move_cost_if_necessary (mode);
  if (! in_p)
{
  ira_assert (out_p);
- move_out_cost = ira_may_move_out_cost[mode];
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+ if (op_class == NO_REGS)
{
- rclass = cost_classes[k];
- pp_costs[k]
-   = move_out_cost[op_class][rclass] * frequency;
+ mem_cost = ira_memory_move_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k] = mem_cost[rclass][0] * frequency;
+   }
+   }
+ else
+   {
+ move_out_cost = ira_may_move_out_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k]
+   = move_out_cost[op_class][rclass] * frequency;
+   }
}
}
  else if (! out_p)
{
  ira_assert (in_p);
- move_in_cost = ira_may_move_in_cost[mode];
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+ if (op_class == NO_REGS)
{
- rclass = cost_classes[k];
- pp_costs[k]
-   = move_in_cost[rclass][op_class] * frequency;
+ mem_cost = ira_memory_move_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k] = mem_cost[rclass][1] * frequency;
+   }
+   }
+ else
+   {
+ move_in_cost = ira_may_move_in_cost[mode];
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+   {
+ rclass = cost_classes[k];
+ pp_costs[k]
+   = move_in_cost[rclass][op_class] * frequency;
+   }
}
}
  else
{
- move_in_cost = ira_may_move_in_cost[mode];
- move_out_cost = ira_may_move_out_cost[mode];
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-   {
- rclass = cost_classes[k];
- pp_costs[k] = ((move_in_cost[rclass][op_class]
- + move_out_cost[op_class][rclass])
-* frequency);
+ if (op_class == NO_REGS)
+   {
+ mem_cost = ira_memory_move_cost[mode];
+ for (k = cost_class

[C++ patch] Reduce vtable alignment

2014-05-16 Thread Jan Hubicka
Hi,
compiling:

struct A
{
  virtual void foo(void) {};
  virtual void foo2(void) {};
  virtual void foo3(void) {};
  virtual void foo4(void) {};
  virtual void foo5(void) {};
} a;

gives 32-byte alignment to the virtual table on i386, because we bump up the
alignment of arrays to the size of vector operations.  This is wasteful, since
virtual tables are never really accessed this way.

I am testing the following patch, OK if it passes?
The patch also removes an apparently 20-year-old hack for SPARC.

Honza
* class.c (build_vtable): Force alignment of virtual tables
to be pointer size only to save space.
Index: class.c
===
--- class.c (revision 210521)
+++ class.c (working copy)
@@ -768,11 +768,8 @@ build_vtable (tree class_type, tree name
   TREE_READONLY (decl) = 1;
   DECL_VIRTUAL_P (decl) = 1;
   DECL_ALIGN (decl) = TARGET_VTABLE_ENTRY_ALIGN;
+  DECL_USER_ALIGN (decl) = true;
   DECL_VTABLE_OR_VTT_P (decl) = 1;
-  /* At one time the vtable info was grabbed 2 words at a time.  This
- fails on sparc unless you have 8-byte alignment.  (tiemann) */
-  DECL_ALIGN (decl) = MAX (TYPE_ALIGN (double_type_node),
-  DECL_ALIGN (decl));
   set_linkage_according_to_type (class_type, decl);
   /* The vtable has not been defined -- yet.  */
   DECL_EXTERNAL (decl) = 1;


Re: Avoiding some garbage rtl from instantiate_virtual_regs

2014-05-16 Thread Jeff Law

On 05/16/14 02:47, Richard Sandiford wrote:

It seems that in a typical -O0 compile the amount of rtl that
starts out as needed but becomes garbage is only slightly less than
half of the total amount created.  One of the big offenders is the
vregs pass, which creates new PLUSes when instantiating a virtual
register + a constant and which creates new MEMs when instantiating
an address involving a virtual register.  This happens a lot in -O0
code because all variables live on the stack.

The instantiation walk is fundamentally in-place: every other part
of the pattern is modified without copying.  And rtl sharing rules
guarantee that we can do the same for PLUSes of registers and MEMs.

The patch does this by adding "inplace" arguments to plus_constant and
replace_equiv_address.  In a -O0 compile of an oldish fold-const.ii
(where no GC takes place) it reduces the amount of used GC memory
from 169M to 166M.  The average max RSS goes down by just over 1%.
Compile time seems to decrease slightly, but probably in the noise range.

There might be other callers that can use the new interfaces too.

Tested on x86_64-linux-gnu.  Also tested by comparing the asm output
for various parts of the testsuite before and after the patch.
The only changes were that some "sym+0"s became plain "syms"
(i.e. (plus X (const_int 0)) became X) because of the plus_constant
change.

OK to install?

Thanks,
Richard


gcc/
* emit-rtl.h (replace_equiv_address, replace_equiv_address_nv): Add an
inplace argument.  Store the new address in the original MEM when true.
* emit-rtl.c (change_address_1): Likewise.
(adjust_address_1, adjust_automodify_address_1, offset_address):
Update accordingly.
* rtl.h (plus_constant): Add an inplace argument.
* explow.c (plus_constant): Likewise.  Try to reuse the original PLUS
when true.  Avoid generating (plus X (const_int 0)).
* function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS
in-place.  Pass true to plus_constant.
(instantiate_virtual_regs_in_insn): Pass true to replace_equiv_address.

Index: gcc/emit-rtl.h
===
--- gcc/emit-rtl.h  2014-05-15 11:27:06.000259353 +0100
+++ gcc/emit-rtl.h  2014-05-16 09:11:42.479556294 +0100
@@ -52,10 +52,10 @@ extern tree get_spill_slot_decl (bool);
 ADDR.  The caller is asserting that the actual piece of memory pointed
 to is the same, just the form of the address is being changed, such as
 by putting something into a register.  */
-extern rtx replace_equiv_address (rtx, rtx);
+extern rtx replace_equiv_address (rtx, rtx, bool = false);

  /* Likewise, but the reference is not required to be valid.  */
-extern rtx replace_equiv_address_nv (rtx, rtx);
+extern rtx replace_equiv_address_nv (rtx, rtx, bool = false);
Presumably the default value for the inplace argument is to avoid having 
to fix up all the call sites.


I guess that's OK.  Clearly it's a safe default and avoids a fair amount 
of unnecessary churn.



OK for the trunk.

Thanks,
Jeff


Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware

2014-05-16 Thread Richard Biener
On May 16, 2014 7:07:11 PM CEST, Jeff Law  wrote:
>On 05/16/14 02:02, Richard Biener wrote:
>>>
>> Quiet as usual.
>Which strongly suggests folks trust you to do the right thing here :-) 
>  I think the FRE/PRE reference in $SUBJECT made me ignore the patch 
>entirely -- my brain hurts when I look at our tree PRE implementation.

Heh, it's much easier to understand than it once was!

:)

Richard.

>Jeff




Re: [DOC Patch] Incorrect @xref in #pragma visibility

2014-05-16 Thread Jeff Law

On 05/14/14 20:25, David Wohlferd wrote:

2014-05-14  David Wohlferd 

  * doc/extend.texi: (Visibility Pragmas) Fix misplaced @xref

Approved and installed on your behalf.

Thanks,
jeff



Re: [PATCH 1/1][ira-costs] grammar fix of comments

2014-05-16 Thread Jeff Law

On 05/14/14 18:45, Zhouyi Zhou wrote:

Fix grammar error for comments above
process_bb_node_for_hard_reg_moves

Signed-off-by: Zhouyi Zhou 
---
  gcc/ira-costs.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 648806b..fbfb070 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -1855,7 +1855,7 @@ find_costs_and_classes (FILE *dump_file)

  /* Process moves involving hard regs to modify allocno hard register
 costs.  We can do this only after determining allocno class.  If a
-   hard register forms a register class, than moves with the hard
+   hard register forms a register class, then moves with the hard
 register are already taken into account in class costs for the
 allocno.  */
  static void

Thanks.  Installed.

jeff



Re: [PATCH 1/5] export finish_bitfield_layout from stor-layout

2014-05-16 Thread Jeff Law

On 05/16/14 09:26, Tom Tromey wrote:

The gdb plugin handles some aspects of type layout on its own.  It
does this because it knows the layout of types, but not the path by
which the layout was determined -- so normal gcc things like
TYPE_PACKED cannot be used.

This patch exposes one bit of stor-layout so it can be used by the
plugin.

2014-05-16  Phil Muldoon  
Tom Tromey  

* stor-layout.c (finish_bitfield_layout): Now public.  Change
argument type to 'tree'.
(finish_record_layout): Update.
* stor-layout.h (finish_bitfield_layout): Declare.

OK.  However, please hold off installing until the entire set is approved.



Jeff



Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Jeff Law

On 05/16/14 09:26, Tom Tromey wrote:

In the typical case, when compiling a snippet of user code, gdb wraps
the user's text in a dummy function.

It's somewhat odd for users if an error in their code is mentioned as
coming from this dummy function.

This patch makes it possible to suppress the function-name display in
a straightforward way: it adds a new global which the plugin can set
to declare the name of the dummy function.

This patch seems like a bit of a hack, but there didn't seem to be a
notably cleaner approach.

2014-05-16  Phil Muldoon  
Tom Tromey  

* c-lang.c (c_diagnostic_ignored_function): New global.
(c_print_error_function): New function.
(LANG_HOOKS_PRINT_ERROR_FUNCTION): Define.
* c-lang.h (c_diagnostic_ignored_function): Declare.

Just a few nits.

In c-lang.c, please use the old C-style comments, if for no other
reason than that it's consistent with all the other nearby code.
Consider using non-NULL when referring to pointers rather than non-zero.


Otherwise OK.  Please wait to install until the entire kit is approved.

BTW, didn't see patch #5 of the series.

jeff



Re: [PATCH 2/5] c_diagnostic_ignored_function hack

2014-05-16 Thread Tom Tromey
Jeff> BTW, didn't see patch #5 of the series.

Maybe it was too big.
I will try to resend it compressed.

Tom


[PATCH 5/5] add libcc1

2014-05-16 Thread Tom Tromey
> "Tom" == Tom Tromey  writes:

Tom> This patch series is half of a project to let gdb reuse gcc (which
Tom> half depends on which list you are seeing this on), so that users can
Tom> compile small snippets of code and evaluate them in the current
Tom> context of the inferior.

Jeff noted that patch #5 didn't make it through.

I've edited this one down by removing the auto-generated stuff, and
then compressed it.

If anybody wants to apply this to try it out, it's all on github.

https://github.com/tromey/gcc
https://github.com/tromey/gdb

In each repository the branch named "submit/compile" holds the rebased
series that were submitted.

The development branches are both named "gdbjit".  These are much
messier but never rebased, so perhaps safer to track.

Tom

This patch adds the plugin to the gcc tree and updates the top-level
configury.

It seems better to have this code in the gcc tree than in the gdb
tree, because it is bound more tightly to gcc.  The gcc plugin makes
direct calls into various parts of gcc to do its work; whereas on the
gdb side everything is done via a relatively simple API, without any
direct connection to gdb internals.  This breakdown made the most
sense because most calls are from gdb to gcc rather than vice versa.

The plugin itself consists of two parts.  These parts communicate via
a simple ad hoc RPC system implemented in the plugin code.

2014-05-16  Phil Muldoon  
Tom Tromey  

* Makefile.def: Add libcc1 to host_modules.
* configure.ac (host_tools): Add libcc1.
* Makefile.in, configure: Rebuild.

2014-05-16  Phil Muldoon  
Jan Kratochvil  
Tom Tromey  

* aclocal.m4: New file.
* callbacks.cc: New file.
* callbacks.hh: New file.
* cc1plugin-config.h.in: New file.
* configure: New file.
* configure.ac: New file.
* connection.cc: New file.
* connection.hh: New file.
* libcc1.cc: New file.
* libcc1plugin.sym: New file.
* libcc1.sym: New file.
* Makefile.am: New file.
* Makefile.in: New file.
* marshall.cc: New file.
* marshall.hh: New file.
* names.cc: New file.
* names.hh: New file.
* plugin.cc: New file.
* rpc.hh: New file.
* status.hh: New file.
---
 ChangeLog| 7 +
 Makefile.def | 5 +
 Makefile.in  |   995 ++-
 configure| 2 +-
 configure.ac | 4 +-
 libcc1/ChangeLog |24 +
 libcc1/Makefile.am   |42 +
 libcc1/Makefile.in   |   614 ++
 libcc1/aclocal.m4|   980 +++
 libcc1/callbacks.cc  |90 +
 libcc1/callbacks.hh  |64 +
 libcc1/cc1plugin-config.h.in |92 +
 libcc1/configure | 16956 +
 libcc1/configure.ac  |65 +
 libcc1/connection.cc |   153 +
 libcc1/connection.hh |   114 +
 libcc1/libcc1.cc |   454 ++
 libcc1/libcc1.sym| 1 +
 libcc1/libcc1plugin.sym  | 2 +
 libcc1/marshall.cc   |   166 +
 libcc1/marshall.hh   |93 +
 libcc1/names.cc  |46 +
 libcc1/names.hh  |55 +
 libcc1/plugin.cc |   895 +++
 libcc1/rpc.hh|   486 ++
 libcc1/status.hh |33 +
 26 files changed, 22430 insertions(+), 8 deletions(-)
 create mode 100644 libcc1/ChangeLog
 create mode 100644 libcc1/Makefile.am
 create mode 100644 libcc1/Makefile.in
 create mode 100644 libcc1/aclocal.m4
 create mode 100644 libcc1/callbacks.cc
 create mode 100644 libcc1/callbacks.hh
 create mode 100644 libcc1/cc1plugin-config.h.in
 create mode 100755 libcc1/configure
 create mode 100644 libcc1/configure.ac
 create mode 100644 libcc1/connection.cc
 create mode 100644 libcc1/connection.hh
 create mode 100644 libcc1/libcc1.cc
 create mode 100644 libcc1/libcc1.sym
 create mode 100644 libcc1/libcc1plugin.sym
 create mode 100644 libcc1/marshall.cc
 create mode 100644 libcc1/marshall.hh
 create mode 100644 libcc1/names.cc
 create mode 100644 libcc1/names.hh
 create mode 100644 libcc1/plugin.cc
 create mode 100644 libcc1/rpc.hh
 create mode 100644 libcc1/status.hh



0001-add-libcc1.patch.gz
Description: the patch


Re: [C++ patch] Reduce vtable alignment

2014-05-16 Thread Jan Hubicka
Hi,
this patch makes also the rtti type info for A in the testcase:

struct A
{
  virtual void foo(void) {};
  virtual void foo2(void) {};
  virtual void foo3(void) {};
  virtual void foo4(void) {};
  virtual void foo5(void) {};
} a;

aligned only to the ABI requirement (8) instead of being bumped up to 16 bytes
by the following code in i386.c:
  /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
 to 16byte boundary.  */
  if (TARGET_64BIT)
{
  if ((opt ? AGGREGATE_TYPE_P (type) : TREE_CODE (type) == ARRAY_TYPE)
  && TYPE_SIZE (type)
  && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
  && wi::geu_p (TYPE_SIZE (type), 128)
  && align < 128)
return 128;
}

Here the variable is first run through align_variable, which decides to add
the optional alignment.  We really want only the ABI-required alignment here.
Does the following patch look reasonable?

* rtti.c: Include tm_p.h
(emit_tinfo_decl): Align type infos only as required by the target ABI.

Index: rtti.c
===
--- rtti.c  (revision 210521)
+++ rtti.c  (working copy)
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "stringpool.h"
 #include "stor-layout.h"
 #include "cp-tree.h"
@@ -1596,6 +1597,12 @@ emit_tinfo_decl (tree decl)
   DECL_INITIAL (decl) = init;
   mark_used (decl);
   cp_finish_decl (decl, init, false, NULL_TREE, 0);
+  /* Avoid targets optionally bumping up the alignment to improve
+vector instruction accesses, tinfo are never accessed this way.  */
+#ifdef DATA_ABI_ALIGNMENT
+  DECL_ALIGN (decl) = DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl)));
+  DECL_USER_ALIGN (decl) = true;
+#endif
   return true;
 }
   else


Re: [PATCH 3/5] introduce the binding oracle

2014-05-16 Thread Jeff Law

On 05/16/14 09:26, Tom Tromey wrote:

gdb wants to supply any declarations that may be referred to by the
user's code.  Hooking into symbol lookup was an efficient way to
accomplish this.

This patch introduces a "binding oracle" that is consulted whenever a
symbol binding is looked up for the first time.  The oracle is just a
global function pointer.  If it is NULL, no special work is done.  It
is called with the identifier to supply and with an enum argument
indicating the kind of binding being requested.  The oracle can then
call back into the C front end (via the new functions c_pushtag and
c_bind) to supply a binding; or it can silently do nothing if the
request could not be fulfilled.

The code caches whether the oracle has been called, to avoid repeated
useless queries.

There is a little hack in c_print_identifier to avoid calling the
binding oracle here.  This makes debugging gcc in the presence of the
plugin remain relatively sane -- without this, calling debug_tree or
the like can confusingly call into the plugin.

2014-05-16  Phil Muldoon  
Tom Tromey  

* c-tree.h (enum c_oracle_request): New.
(c_binding_oracle_function): New typedef.
(c_binding_oracle, c_pushtag, c_bind): Declare.
* c-decl.c (c_binding_oracle): New global.
(I_SYMBOL_CHECKED): New macro.
(i_symbol_binding): New function.
(I_SYMBOL_BINDING, I_SYMBOL_DECL): Redefine.
(I_TAG_CHECKED): New macro.
(i_tag_binding): New function.
(I_TAG_BINDING, I_TAG_DECL): Redefine.
(I_LABEL_CHECKED): New macro.
(i_label_binding): New function.
(I_LABEL_BINDING, I_LABEL_DECL): Redefine.
(c_print_identifier): Save and restore c_binding_oracle.
(c_pushtag, c_bind): New functions.
---
  void
  c_print_identifier (FILE *file, tree node, int indent)
  {
+  void (*save) (enum c_oracle_request, tree identifier);
+
+  // This makes debugging much more sane.
+  save = c_binding_oracle;
+  c_binding_oracle = NULL;
+
Just a nit.  C-style comment would be appreciated.  It might also help 
to clarify what "much more sane" really means here.


Otherwise, it looks OK to me.

jeff



Re: [patch] fix ppc spe bootstrap error in dwf_regno

2014-05-16 Thread Jeff Law

On 04/30/14 09:15, Cesar Philippidis wrote:

I've been working on a patch to switch a few ppc targets to use softfp
in libgcc instead of fpbit and I noticed that ppc-none-eabispe fails to
bootstrap in both trunk and 4.9. The regression was introduced in
. Essentially,
the assert for a hard register in dwf_regno () is not valid on ppc spe
targets. In rs6000_dwarf_register_span (), there is a note stating:

   /* The duality of the SPE register size wreaks all kinds of havoc.
  This is a way of distinguishing r0 in 32-bits from r0 in
  64-bits.  */

and the function adds 1200 to regno, which makes that register appear to
be a pseudo. This causes problems in dwf_regno (), which asserts that
reg is a hard register. Since the dwarf2 pass is executed after register
allocation it should, in theory, be ok for the rs6000 backend to be
using a pseudo register for this application.

Is this patch ok for trunk and 4.9? If so, please commit since I don't
have an svn account.
It seems to me the real problem here is the SPE port and this silly game 
it plays with adding a large value to regno.


While it is currently safe to remove the assert, the assert is there to 
help catch cases where a pseudo has leaked through to a point where it 
shouldn't.


I'd rather look for another approach.

Jeff


Re: [PATCH, i386, Pointer Bounds Checker 2/x] Intel Memory Protection Extensions (MPX) instructions support

2014-05-16 Thread Jeff Law

On 04/16/14 05:35, Ilya Enkovich wrote:

Hi,

This patch introduces Intel MPX bound registers and instructions.  It was 
approved earlier for 4.9 and had no significant changes since then.  I'll 
assume the patch is OK if no objections arise.

Patch was bootstrapped and tested for linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-04-16  Ilya Enkovich  

* mode-classes.def (MODE_BOUND): New.
* tree.def (BOUND_TYPE): New.
* genmodes.c (complete_mode): Support MODE_BOUND.
(BOUND_MODE): New.
(make_bound_mode): New.
* machmode.h (BOUND_MODE_P): New.
* stor-layout.c (int_mode_for_mode): Support MODE_BOUND.
(layout_type): Support BOUND_TYPE.
* tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE.
* tree.c (build_int_cst_wide): Support BOUND_TYPE.
(type_contains_placeholder_1): Likewise.
* tree.h (BOUND_TYPE_P): New.
* varasm.c (output_constant): Support BOUND_TYPE.
* config/i386/constraints.md (B): New.
(Ti): New.
(Tb): New.
* config/i386/i386-modes.def (BND32): New.
(BND64): New.
* config/i386/i386-protos.h (ix86_bnd_prefixed_insn_p): New.
* config/i386/i386.c (isa_opts): Add mmpx.
(regclass_map): Add bound registers.
(dbx_register_map): Likewise.
(dbx64_register_map): Likewise.
(svr4_dbx_register_map): Likewise.
(PTA_MPX): New.
(ix86_option_override_internal): Support MPX ISA.
(ix86_code_end): Add MPX bnd prefix.
(output_set_got): Likewise.
(ix86_output_call_insn): Likewise.
(get_some_local_dynamic_name): Add '!' (MPX bnd) print prefix support.
(ix86_print_operand_punct_valid_p): Likewise.
(ix86_print_operand_address): Support UNSPEC_BNDMK_ADDR and
UNSPEC_BNDMK_ADDR.
(ix86_class_likely_spilled_p): Add bound regs support.
(ix86_hard_regno_mode_ok): Likewise.
(x86_order_regs_for_local_alloc): Likewise.
(ix86_bnd_prefixed_insn_p): New.
* config/i386/i386.h (FIRST_PSEUDO_REGISTER): Fix to new value.
(FIXED_REGISTERS): Add bound registers.
(CALL_USED_REGISTERS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(HARD_REGNO_NREGS): Likewise.
(TARGET_MPX): New.
(VALID_BND_REG_MODE): New.
(FIRST_BND_REG): New.
(LAST_BND_REG): New.
(reg_class): Add BND_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(BND_REGNO_P): New.
(ANY_BND_REG_P): New.
(BNDmode): New.
(HI_REGISTER_NAMES): Add bound registers.
* config/i386/i386.md (UNSPEC_BNDMK): New.
(UNSPEC_BNDMK_ADDR): New.
(UNSPEC_BNDSTX): New.
(UNSPEC_BNDLDX): New.
(UNSPEC_BNDLDX_ADDR): New.
(UNSPEC_BNDCL): New.
(UNSPEC_BNDCU): New.
(UNSPEC_BNDCN): New.
(UNSPEC_MPX_FENCE): New.
(BND0_REG): New.
(BND1_REG): New.
(type): Add mpxmov, mpxmk, mpxchk, mpxld, mpxst.
(length_immediate): Likewise.
(prefix_0f): Likewise.
(memory): Likewise.
(prefix_rep): Check for bnd prefix.
(BND): New.
(bnd_ptr): New.
(BNDCHECK): New.
(bndcheck): New.
(*jcc_1): Add MPX bnd prefix and fix length.
(*jcc_2): Likewise.
(jump): Likewise.
(simple_return_internal): Likewise.
(simple_return_pop_internal): Likewise.
(*indirect_jump): Add MPX bnd prefix.
(*tablejump_1): Likewise.
(simple_return_internal_long): Likewise.
(simple_return_indirect_internal): Likewise.
(_mk): New.
(*_mk): New.
(mov): New.
(*mov_internal_mpx): New.
(_): New.
(*_): New.
(_ldx): New.
(*_ldx): New.
(_stx): New.
(*_stx): New.
* config/i386/predicates.md (lea_address_operand): Rename to...
(address_no_seg_operand): ... this.
(address_mpx_no_base_operand): New.
(address_mpx_no_index_operand): New.
(bnd_mem_operator): New.
* config/i386/i386.opt (mmpx): New.

Are parts of this patch missing?

The ChangeLog references several changes in the machine independent 
parts of GCC, but I don't see them.  I can recall what they look like, 
but for sanity's sake, I think this patch needs to be reposted.  It's 
likely OK, but let's do the right thing.


jeff



Re: [PATCH] Fix more typos in error messages

2014-05-16 Thread Jeff Law

On 02/25/14 09:21, Benno Schulenberg wrote:

2014-02-07  Benno Schulenberg


* config/arc/arc.c (arc_init): Fix typo in error message.
* config/i386/i386.c (ix86_expand_builtin): Likewise.
(split_stack_prologue_scratch_regno): Likewise.
* fortran/check.c (gfc_check_fn_rc2008): Remove duplicate
word from error message.

Thanks.  Installed.  Sorry about the delay.

jeff


[patch, libstdc++] fix TCL error in abi.exp

2014-05-16 Thread Sandra Loosemore
As noted in PR 23867, the libstdc++ ABI testsuite doesn't work for 
installed compiler testing.  There is some logic in abi.exp to check for 
the presence of the required bits and skip these tests if they're not 
there, but before it gets that far it's hitting a TCL error:


ERROR: can't read "baseline_subdir_switch": no such variable
while executing
"eval exec $cxx $baseline_subdir_switch"
invoked from within
"set baseline_subdir "[eval exec $cxx $baseline_subdir_switch]""
...

This patch moves the offending reference to $baseline_subdir_switch 
after the check for missing bits instead of before.  OK to commit?


-Sandra


2014-05-16  Iain Sandoe  
Sandra Loosemore  

libstdc++-v3/
* testsuite/libstdc++-abi/abi.exp: Defer setting of baseline_subdir
until after checking that the test is eligible to be run.
Index: libstdc++-v3/testsuite/libstdc++-abi/abi.exp
===
--- libstdc++-v3/testsuite/libstdc++-abi/abi.exp	(revision 210372)
+++ libstdc++-v3/testsuite/libstdc++-abi/abi.exp	(working copy)
@@ -24,8 +24,6 @@ if { [string match "*-*-darwin*" $target
 set lib $blddir/src/.libs/libstdc++.so
 }
 
-set baseline_subdir "[eval exec $cxx $baseline_subdir_switch]"
-
 # Build the support objects.
 v3-build_support
 
@@ -35,6 +33,8 @@ if { (${v3-symver} == 0) || ![info exist
 return
 }
 
+set baseline_subdir "[eval exec $cxx $baseline_subdir_switch]"
+
 set baseline_file \
 [file join $baseline_dir $baseline_subdir "baseline_symbols.txt"]
 # If there is no ABI-specific reference file use that of the default ABI.


Re: Unreviewed Patch

2014-05-16 Thread Jeff Law

On 02/22/14 16:07, rbmj wrote:

Hi all,

Just a ping, I haven't gotten anything back on this patch:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00621.html
The patch needs to be tested by bootstrapping on another platform and 
performing a regression test.  Most folks use an x86_64 Linux system 
for that step.


While the odds that this patch breaks things is small, the regression 
test and bootstrap has proven quite valuable through the years in 
catching issues.


jeff


Re: [patch, libstdc++] fix TCL error in abi.exp

2014-05-16 Thread Jeff Law

On 05/16/14 13:53, Sandra Loosemore wrote:

As noted in PR 23867, the libstdc++ ABI testsuite doesn't work for
installed compiler testing.  There is some logic in abi.exp to check for
the presence of the required bits and skip these tests if they're not
there, but before it gets that far it's hitting a TCL error:

ERROR: can't read "baseline_subdir_switch": no such variable
 while executing
"eval exec $cxx $baseline_subdir_switch"
 invoked from within
"set baseline_subdir "[eval exec $cxx $baseline_subdir_switch]""
...

This patch moves the offending reference to $baseline_subdir_switch
after the check for missing bits instead of before.  OK to commit?

-Sandra


2014-05-16  Iain Sandoe  
 Sandra Loosemore  

 libstdc++-v3/
 * testsuite/libstdc++-abi/abi.exp: Defer setting of
baseline_subdir
 until after checking that the test is eligible to be run.

OK.
Jeff



Re: [PATCH][C-family] Fix PR61184

2014-05-16 Thread Jeff Law

On 05/14/14 03:06, Richard Biener wrote:


The following fixes pre/post-inc/dec gimplification of promoted
integer types.  There is the issue with the way TYPE_OVERFLOW_UNDEFINED
is related to TYPE_OVERFLOW_WRAPS and the (non-)semantics of
-fno-strict-overflow.

In this case, with -On -fno-strict-overflow for a variable of
type short we have !TYPE_OVERFLOW_WRAPS _and_ !TYPE_OVERFLOW_UNDEFINED
(so we're in an "undefined" area).  Which means that
!TYPE_OVERFLOW_UNDEFINED doesn't imply that overflow wraps.

Thus the gimplification has to play on the safe side and
always use an unsigned type unless the user specifies -fwrapv
(the flag with a proper semantic meaning).

That is, it seems to be the case that what predicate to use
(TYPE_OVERFLOW_WRAPS or TYPE_OVERFLOW_UNDEFINED, independent
on whether you invert it), depends on the use-case in a very
awkward (and error-prone) way.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu, ok
if that succeeds (I expect to have to adjust some testcases)?

Thanks,
Richard.

2014-05-14  Richard Biener  

c-family/
* c-gimplify.c (c_gimplify_expr): Gimplify self-modify expressions
using unsigned arithmetic if overflow does not wrap instead of
if overflow is undefined.

* c-c++-common/torture/pr61184.c: New testcase.

Seems reasonable to me.

Jeff



Re: [PATCH 5/5] add libcc1

2014-05-16 Thread Joseph S. Myers
On Fri, 16 May 2014, Tom Tromey wrote:

> This patch adds the plugin to the gcc tree and updates the top-level
> configury.

I don't see anything obvious that would disable the plugin if plugins are 
unsupported (e.g. on Windows host) or disabled (--disable-plugin).  
Probably the relevant support from gcc/configure.ac needs to go somewhere 
it can be used at toplevel.

> +  self->args.push_back ("gcc");

seems wrong - at least you should use the appropriate compiler name after 
transformation for cross compilers / --program-transform-name.  Though 
really the *versioned* driver $(target_noncanonical)-gcc-$(version) is the 
right one to use, in that the plugin should presumably be closely 
associated with a particular compiler installation when e.g. a 
distribution has multiple GCC versions packaged that can be installed 
simultaneously.  Having multiple copies of the plugin installed in the 
same prefix for different targets should be supported.  I'm not clear on 
which library does what - does libcc1 depend in any way on the target or 
GCC version?

How are the compiler multilib options (e.g. -m32 / -m64) specified?  Is 
that something GDB passes through, based on examining properties of the 
binary being debugged?  (Unfortunately there may not be a good way in 
general for GDB to tell what the required options for a particular binary 
are.  It can at least reasonably default to using a copy of the plugin for 
the same target triplet as GDB was configured for.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-16 Thread Oleg Endo
On Fri, 2014-05-16 at 18:10 +0800, Bin.Cheng wrote:
> On Thu, May 15, 2014 at 6:31 PM, Oleg Endo  wrote:
> >
> > How about the following.
> > Instead of adding new hooks and inserting the pass to the general pass 
> > list, make the new
> > pass class take the necessary callback functions directly.  Then targets 
> > can just instantiate
> > the pass, passing their impl of the callbacks, and insert the pass object 
> > into the pass list at
> > a place that fits best for the target.

> Oh, I don't know whether we can do this in GCC.  But yes, a target may want to
> run it at some place that fits best for the target.
> 

I think it's better than trying to come up with a scheme that so-so fits
all.  My idea would look like:

// merge_paired_loadstore.h
class merge_paired_loadstore : public rtl_opt_pass
{
public:
  struct delegate
  {
virtual bool merge_paired_loadstore (rtx x, rtx y, ...) = 0;
...
  };

  merge_paired_loadstore (gcc::context* ctx, const char* name,
  delegate* d);
  ...
};

// .cc

#include "merge_paired_loadstore.h"

static struct target_merge_loadstore_delegate :
merge_paired_loadstore::delegate
{
  virtual bool merge_paired_loadstore (...)
  {
 // code as if this was a freestanding target hook function
  };
} g_merge_loadstore_delegate;


static void _register_passes (void)
{
  register_pass (
new merge_paired_loadstore (g, "merge_ls",  
&g_merge_loadstore_delegate),
PASS_POS_INSERT_AFTER, "other pass", 1);
}


Then, later, maybe sometime in the future, if there's something like a
class target, it'd look like:

class my_target : public target,
  merge_paired_loadstore::delegate
{
   ... 

   virtual bool merge_paired_loadstore (...);
};

Maybe it's a bit far-fetched at the moment, but it would be a start.

Cheers,
Oleg



[PATCH] Use optimize_function_for_size_p to assign register frequency

2014-05-16 Thread Dehao Chen
This patch uses optimize_function_for_size_p to replace the old
optimize_size checks in regs.h and ira-int.h, making them consistent.

Bootstrapped and testing on-going.

OK for trunk if test passes?

Thanks,
Dehao

gcc/ChangeLog:
2014-05-16  Dehao Chen  

* ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Use optimize_function_for_size_p.
* regs.h (REG_FREQ_FROM_BB): Likewise.

Index: gcc/ira-int.h
===
--- gcc/ira-int.h   (revision 210527)
+++ gcc/ira-int.h   (working copy)
@@ -42,9 +42,8 @@ along with GCC; see the file COPYING3.  If not see
profile driven feedback is available and the function is never
executed, frequency is always equivalent.  Otherwise rescale the
edge frequency.  */
-#define REG_FREQ_FROM_EDGE_FREQ(freq) \
-  (optimize_size || (flag_branch_probabilities\
-&& !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count) \
+#define REG_FREQ_FROM_EDGE_FREQ(freq) \
+  (optimize_function_for_size_p (cfun)\
? REG_FREQ_MAX : (freq * REG_FREQ_MAX / BB_FREQ_MAX)   \
? (freq * REG_FREQ_MAX / BB_FREQ_MAX) : 1)

Index: gcc/regs.h
===
--- gcc/regs.h  (revision 210527)
+++ gcc/regs.h  (working copy)
@@ -135,9 +135,7 @@ extern size_t reg_info_p_size;
or profile driven feedback is available and the function is never executed,
frequency is always equivalent.  Otherwise rescale the basic block
frequency.  */
-#define REG_FREQ_FROM_BB(bb) (optimize_size  \
- || (flag_branch_probabilities   \
- && !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)   \
+#define REG_FREQ_FROM_BB(bb) (optimize_function_for_size_p (cfun)\
  ? REG_FREQ_MAX  \
  : ((bb)->frequency * REG_FREQ_MAX / BB_FREQ_MAX)\
  ? ((bb)->frequency * REG_FREQ_MAX / BB_FREQ_MAX)\

