[Committed] Add a few testcases

2014-11-16 Thread Andrew Pinski
Add a few testcases which I had floating around in a private tree.
Most of these testcases failed in our private tree at one point due to
local changes.  Since it is always good to have more testcases, I
decided to commit them.

I tested all of them on x86_64 with no failures.

Thanks,
Andrew

ChangeLog:
* gcc.c-torture/execute/memset-4.c: New test.
* gcc.c-torture/execute/20110418-1.c: New test.
* gcc.c-torture/execute/20141022-1.c: New test.
* gcc.c-torture/execute/strcpy-2.c: New test.
* gcc.c-torture/execute/20140212-2.c: New test.
* gcc.c-torture/compile/20120913-1.c: New test.
* gcc.c-torture/compile/20121010-1.c: New test.
* gcc.c-torture/compile/20120917-1.c: New test.
* gcc.c-torture/compile/20140110-1.c: New test.
* gcc.c-torture/compile/20121220-1.c: New test.
* gcc.c-torture/compile/20120822-1.c: New test.
* gcc.c-torture/compile/20121027-1.c: New test.
* gcc.c-torture/compile/20120830-2.c: New test.
Index: gcc.c-torture/execute/memset-4.c
===
--- gcc.c-torture/execute/memset-4.c(revision 0)
+++ gcc.c-torture/execute/memset-4.c(revision 0)
@@ -0,0 +1,27 @@
+/* Test to make sure memset of a small odd size works
+   correctly. */
+#define SIZE 15
+
+void f(char *a) __attribute__((noinline));
+void f(char *a)
+{
+  __builtin_memset (a, 0, SIZE);
+}
+
+
+int main(void)
+{
+  int i;
+  char b[SIZE];
+  for(i = 0; i < sizeof(b); i++)
+{
+  b[i] = i;
+}
+  f(b);
+  for(i = 0; i < sizeof(b); i++)
+{
+  if (0 != b[i])
+   __builtin_abort ();
+}
+  return 0;
+}
Index: gcc.c-torture/execute/20110418-1.c
===
--- gcc.c-torture/execute/20110418-1.c  (revision 0)
+++ gcc.c-torture/execute/20110418-1.c  (revision 0)
@@ -0,0 +1,29 @@
+typedef unsigned long long uint64_t;
+void f(uint64_t *a, uint64_t aa) __attribute__((noinline));
+void f(uint64_t *a, uint64_t aa)
+{
+  uint64_t new_value = aa;
+  uint64_t old_value = *a;
+  int bit_size = 32;
+  uint64_t mask = (uint64_t)(unsigned)(-1);
+  uint64_t tmp = old_value & mask;
+  new_value &= mask;
+  /* On overflow we need to add 1 in the upper bits */
+  if (tmp > new_value)
+    new_value += 1ull << 32;
+}
Index: gcc.c-torture/compile/20121010-1.c
===
--- gcc.c-torture/compile/20121010-1.c  (revision 0)
+++ gcc.c-torture/compile/20121010-1.c  (revision 0)
@@ -0,0 +1,10 @@
+int _IO_getc(int*);
+read_long(int *fp)
+{
+  unsigned char b0, b1, b2, b3;
+  b0 = _IO_getc (fp);
+  b1 = _IO_getc (fp);
+  b2 = _IO_getc (fp);
+  b3 = _IO_getc (fp);
+  return ((int)(((((b3 << 8) | b2) << 8) | b1) << 8) | b0);
+}
Index: gcc.c-torture/compile/20120917-1.c
===
--- gcc.c-torture/compile/20120917-1.c  (revision 0)
+++ gcc.c-torture/compile/20120917-1.c  (revision 0)
@@ -0,0 +1,13 @@
+typedef long long curl_off_t;
+int tool_seek_cb(void *userdata, curl_off_t offset, int whence)
+{
+  if(offset > 0x7FFFFFFFLL - 0x1LL)
+{
+curl_off_t left = offset;
+while(left) 
+{
+  long step = (left > 0x7FFFFFFFLL - 0x1LL) ? 2147483647L - 1L : (long)left;
+  left -= step;
+}
+  }
+}
Index: gcc.c-torture/compile/20140110-1.c
===
--- gcc.c-torture/compile/20140110-1.c  (revision 0)
+++ gcc.c-torture/compile/20140110-1.c  (revision 0)
@@ -0,0 +1,14 @@
+typedef long unsigned int size_t;
+struct RangeMapRange {
+  unsigned fromMin;
+  unsigned fromMax;
+  unsigned toMin;
+};
+void reserve1(void);
+void f(struct RangeMapRange *q1, size_t t)
+{
+  const struct RangeMapRange *q2 = q1 + t;
+  size_t n = q2 - q1;
+  if (n > 0)
+reserve1();
+}
Index: gcc.c-torture/compile/20121220-1.c
===
--- gcc.c-torture/compile/20121220-1.c  (revision 0)
+++ gcc.c-torture/compile/20121220-1.c  (revision 0)
@@ -0,0 +1,14 @@
+typedef unsigned char uint8_t;
+typedef unsigned int uint32_t;
+static __attribute__ (( always_inline )) __inline__
+void rop_8_notsrc_or_dst(uint8_t *dst, uint8_t src)
+{
+  *dst = (~(src)) | (*dst);
+}
+void cirrus_colorexpand_notsrc_or_dst_8 (uint8_t * dst, int bits)
+{
+  uint8_t src;
+  uint32_t colors[2];
+  src = colors[bits];
+  rop_8_notsrc_or_dst(dst, src);
+}
Index: gcc.c-torture/compile/20120822-1.c
===
--- gcc.c-torture/compile/20120822-1.c  (revision 0)
+++ gcc.c-torture/compile/20120822-1.c  (revision 0)
@@ -0,0 +1,11 @@
+int a;
+int c;
+int b;
+void shr_long(int d, unsigned char s)
+{
+ long long dvd, div, mod;
+ dvd = b;
+ mod = dvd % s;
+ if (((c >> ((mod & 0xff) % 32)) & 1) == 0)
+  a = 1;
+}
Index: gcc.c-torture/compile/20121027-1.c

Re: [PATCH 0/5] some combine patches

2014-11-16 Thread Oleg Endo
Hi Segher,

On 15 Nov 2014, at 04:19, Segher Boessenkool  wrote:

> Here are five patches that together allow combine to do more useful
> work with PARALLELs of two SETs, like on many machines a set of a GPR
> and one of the condition code, or a GPR and the carry bit on PowerPC,
> or two GPRs on some machines.
> 
> The first patch is just for debug.
> 
> The second is the real meat: it allows combining an I2 that has two SETs.
> 
> The third adds a regno field to LOG_LINKS, which the fourth then uses in
> distribute_log_links; without that, most parallels lose their log_links
> early.
> 
> The fifth removes a SET from the combination result if it is (now) dead,
> if what's left is a valid instruction.
> 
> Bootstrapped and tested on powerpc64-linux (tree of a week ago), all five
> together, -m64,-m32,-m32/-mpowerpc64,-m64/-mlra; no regressions.  Checks
> of the separate patches still running.  Is this okay for mainline if it
> passes?

When you commit those, could you please also add PR 59278 to the ChangeLog so 
that the commit appears in bugzilla?  After your patches are in, I'd like to 
add some SH specific test cases (assuming that your patches fix PR 59278).

Cheers,
Oleg

[PATCH, sh]: Use std::swap

2014-11-16 Thread Uros Bizjak
Hello!

2014-11-16  Uros Bizjak  

* config/sh/sh.c: Do not include <algorithm>.
(sh_emit_scc_to_t): Replace open-coded swap with std::swap
to swap values.
(sh_emit_compare_and_branch): Ditto.
(sh_emit_compare_and_set): Ditto.
* config/sh/sh.md (replacement peephole2): Ditto.
(cstore4_media): Ditto.
(*fmasf4): Ditto.

Tested by building a crosscompiler to sh-elf, otherwise untested.

Uros.
Index: sh.c
===
--- sh.c(revision 217624)
+++ sh.c(working copy)
@@ -21,7 +21,6 @@ along with GCC; see the file COPYING3.  If not see
 
 #include 
 #include 
-#include <algorithm>
 
 #include "config.h"
 #include "system.h"
@@ -2351,11 +2350,7 @@ sh_emit_scc_to_t (enum rtx_code code, rtx op0, rtx
   break;
 }
   if (code != oldcode)
-{
-  rtx tmp = op0;
-  op0 = op1;
-  op1 = tmp;
-}
+std::swap (op0, op1);
 
   mode = GET_MODE (op0);
   if (mode == VOIDmode)
@@ -2436,7 +2431,7 @@ sh_emit_compare_and_branch (rtx *operands, machine
   enum rtx_code branch_code;
   rtx op0 = operands[1];
   rtx op1 = operands[2];
-  rtx insn, tem;
+  rtx insn;
   bool need_ccmpeq = false;
 
   if (TARGET_SH2E && GET_MODE_CLASS (mode) == MODE_FLOAT)
@@ -2461,7 +2456,7 @@ sh_emit_compare_and_branch (rtx *operands, machine
  || (code == LE && TARGET_IEEE && TARGET_SH2E)
  || (code == GE && !(TARGET_IEEE && TARGET_SH2E)))
{
- tem = op0, op0 = op1, op1 = tem;
+ std::swap (op0, op1);
  code = swap_condition (code);
}
 
@@ -2520,7 +2515,6 @@ sh_emit_compare_and_set (rtx *operands, machine_mo
   rtx op1 = operands[3];
   rtx_code_label *lab = NULL;
   bool invert = false;
-  rtx tem;
 
   op0 = force_reg (mode, op0);
   if ((code != EQ && code != NE
@@ -2534,8 +2528,8 @@ sh_emit_compare_and_set (rtx *operands, machine_mo
 {
   if (code == LT || code == LE)
{
+ std::swap (op0, op1);
  code = swap_condition (code);
- tem = op0, op0 = op1, op1 = tem;
}
   if (code == GE)
{
Index: sh.md
===
--- sh.md   (revision 217624)
+++ sh.md   (working copy)
@@ -1618,14 +1618,9 @@
   extract_insn (insn2);
   if (! constrain_operands (1, get_preferred_alternatives (insn2, bb)))
 {
-  rtx tmp;
 failure:
-  tmp = replacements[0];
-  replacements[0] = replacements[1];
-  replacements[1] = tmp;
-  tmp = replacements[2];
-  replacements[2] = replacements[3];
-  replacements[3] = tmp;
+  std::swap (replacements[0], replacements[1]);
+  std::swap (replacements[2], replacements[3]);
   replace_n_hard_rtx (SET_DEST (set1), replacements, 2, 1);
   replace_n_hard_rtx (SET_DEST (set2), replacements, 2, 1);
   replace_n_hard_rtx (SET_SRC (set2), replacements, 2, 1);
@@ -11348,9 +11343,7 @@ label:
 
   if (swap)
 {
-  rtx tem = operands[2];
-  operands[2] = operands[3];
-  operands[3] = tem;
+  std::swap (operands[2], operands[3]);
   code = swap_condition (code);
 }
 
@@ -12538,11 +12531,7 @@ label:
   /* Change 'b * a + a' into 'a * b + a'.
  This is better for register allocation.  */
   if (REGNO (operands[2]) == REGNO (operands[3]))
-{
-  rtx tmp = operands[1];
-  operands[1] = operands[2];
-  operands[2] = tmp;
-}
+std::swap (operands[1], operands[2]);
 }
   [(set_attr "type" "fp")
(set_attr "fp_mode" "single")])


Re: [PATCH, sh]: Use std::swap

2014-11-16 Thread Oleg Endo


On Nov 16, 2014, at 6:36 PM, Uros Bizjak  wrote:

> Hello!
> 
> 2014-11-16  Uros Bizjak  
> 
>* config/sh/sh.c: Do not include <algorithm>.
>(sh_emit_scc_to_t): Replace open-coded swap with std::swap
>to swap values.
>(sh_emit_compare_and_branch): Ditto.
>(sh_emit_compare_and_set): Ditto.
>* config/sh/sh.md (replacement peephole2): Ditto.
>(cstore4_media): Ditto.
>(*fmasf4): Ditto.
> 
> Tested by building a crosscompiler to sh-elf, otherwise untested.

OK for trunk.  Thanks for taking care of this.

Cheers,
Oleg




match.pd tweaks for vectors and issues with HONOR_NANS

2014-11-16 Thread Marc Glisse

Hello,

this patch breaks gcc.dg/torture/pr50396.c, and I believe this is a 
symptom of a bigger issue: the HONOR_NANS interface is bad (or at least 
the way we are using it is bad). To know if a type honors NaN, we first 
get its TYPE_MODE and then call HONOR_NANS on that. But for vectors that 
do not directly map to hardware, the mode is BLKmode, for which HONOR_NANS 
returns false (bad luck, the default is unsafe).


We could introduce a function:

bool
honor_nans (machine_mode m)
{
  // check for BLKmode?
  return HONOR_NANS (m);
}
bool
honor_nans (const_tree t)
{
  if (!TYPE_P (t))
t = TREE_TYPE (t);
  if (VECTOR_TYPE_P (t) || COMPLEX_TYPE_P (t))
t = TREE_TYPE (t);
  return honor_nans (TYPE_MODE (t));
}

and use it in many places. Or call it honors_nan so we don't have to 
rename variables. Or maybe a function ignore_nans instead, that returns 
!honor_nans. I am hoping that the element type of a vector always has a 
mode, otherwise the function will need to be a bit more complicated.


But the same issue also affects HONOR_SIGNED_ZEROS, HONOR_SNANS, 
HONOR_INFINITIES, etc. It is going to be a pain to add a new function for 
each and replace uses. We could instead replace those calls to TYPE_MODE 
by a new:


machine_mode
element_mode (const_tree t)
{
  if (!TYPE_P (t))
t = TREE_TYPE (t);
  if (VECTOR_TYPE_P (t) || COMPLEX_TYPE_P (t))
t = TREE_TYPE (t);
  return TYPE_MODE (t);
}

so we still have to use HONOR_NANS on the result but at least we only
introduce one new function.

Any opinion on how best to handle that? I can't promise I'll have time to 
work on it any time soon (I might, but I don't know).


--
Marc Glisse

Index: gcc/match.pd
===
--- gcc/match.pd(revision 217614)
+++ gcc/match.pd(working copy)
@@ -19,21 +19,21 @@ FITNESS FOR A PARTICULAR PURPOSE.  See t
 for more details.
 
 You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
 
 /* Generic tree predicates we inherit.  */
 (define_predicates
integer_onep integer_zerop integer_all_onesp integer_minus_onep
-   integer_each_onep
+   integer_each_onep integer_truep
real_zerop real_onep real_minus_onep
CONSTANT_CLASS_P
tree_expr_nonnegative_p)
 
 /* Operator lists.  */
 (define_operator_list tcc_comparison
   lt   le   eq ne ge   gt   unordered ordered   unlt unle ungt unge uneq ltgt)
 (define_operator_list inverted_tcc_comparison
   ge   gt   ne eq lt   le   ordered   unordered ge   gt   le   lt   ltgt uneq)
 (define_operator_list inverted_tcc_comparison_with_nans
@@ -110,47 +110,49 @@ along with GCC; see the file COPYING3.
 /* Make sure to preserve divisions by zero.  This is the reason why
we don't simplify x / x to 1 or 0 / x to 0.  */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
   (simplify
 (op @0 integer_onep)
 (non_lvalue @0)))
 
 /* X / -1 is -X.  */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
  (simplify
-   (div @0 INTEGER_CST@1)
-   (if (!TYPE_UNSIGNED (type)
-&& wi::eq_p (@1, -1))
+   (div @0 integer_minus_onep@1)
+   (if (!TYPE_UNSIGNED (type))
 (negate @0
 
 /* For unsigned integral types, FLOOR_DIV_EXPR is the same as
TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  */
 (simplify
  (floor_div @0 @1)
- (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  && TYPE_UNSIGNED (type))
   (trunc_div @0 @1)))
 
 /* Optimize A / A to 1.0 if we don't care about
-   NaNs or Infinities.  Skip the transformation
-   for non-real operands.  */
+   NaNs or Infinities.  */
 (simplify
  (rdiv @0 @0)
- (if (SCALAR_FLOAT_TYPE_P (type)
+ (if (FLOAT_TYPE_P (type)
   && ! HONOR_NANS (TYPE_MODE (type))
   && ! HONOR_INFINITIES (TYPE_MODE (type)))
-  { build_real (type, dconst1); })
- /* The complex version of the above A / A optimization.  */
- (if (COMPLEX_FLOAT_TYPE_P (type)
-  && ! HONOR_NANS (TYPE_MODE (TREE_TYPE (type)))
-  && ! HONOR_INFINITIES (TYPE_MODE (TREE_TYPE (type))))
-  { build_complex (type, build_real (TREE_TYPE (type), dconst1),
-  build_real (TREE_TYPE (type), dconst0)); }))
+  { build_one_cst (type); }))
+
+/* Optimize -A / A to -1.0 if we don't care about
+   NaNs or Infinities.  */
+(simplify
+ (rdiv:c @0 (negate @0))
+ (if (FLOAT_TYPE_P (type)
+  && ! HONOR_NANS (TYPE_MODE (type))
+  && ! HONOR_INFINITIES (TYPE_MODE (type)))
+  { build_minus_one_cst (type); }))
 
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
  (if (!HONOR_SNANS (TYPE_MODE (type)))
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
@@ -184,23 +186,22 @@ along with GCC; see the file COPYING3.
   (mod integer_zerop@0 @1)
   /* But 

[patch] [WIP] Optimize synchronization in std::future if futexes are available.

2014-11-16 Thread Torvald Riegel
This WORK-IN-PROGRESS patch uses an atomic unsigned and futex operations
to optimize the synchronization code in std::future.  The current code
uses a mutex/condvar combination, which is both slower (e.g., due to
mutex contention, stronger ordering requirements for condvars, using an
additional condvar-internal mutex, ...) and makes std::future fairly
large.

It introduces an __atomic_futex_unsigned type, which provides basic
atomic operations (load and store) on an atomic unsigned and
additionally provides load_when_[not_]equal operations that do blocking
waits on the atomic -- pretty much what futexes do.  Such an
__atomic_futex_unsigned is then 

There are a few bits missing in this patch:
* A fallback implementation for platforms that don't provide futexes, in
the form of a different implementation of __atomic_futex_unsigned.  A
mutex+condvar combination is what I'm aiming at; for std::future, this
would lead to similar code and sizeof(std::future) as currently.
* More documentation of how the synchronization works.  Jonathan has a
patch in flight for that, so this should get merged.
* Integration with the on_thread_exit patch that Jonathan posted, which
uses the current, lock-based synchronization scheme and thus needs to
get adapted.
* Testing.

There are ways to optimize further I suppose, for example by letting the
__atomic_futex_unsigned take care of all current uses of call_once too.
Let me know if I should do that.  This would reduce the number of atomic
ops a little in some cases, as well as reduce space required for futures
a little.

Comments?
commit 1543313a3b590e6422f5d547cabd6662e0a6f538
Author: Torvald Riegel 
Date:   Sun Nov 16 12:07:22 2014 +0100

[WIP] Optimize synchronization in std::future if futexes are available.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index e6edc73..81e6a8b 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -83,6 +83,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 2ade448..2d72de3 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -350,6 +350,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
new file mode 100644
index 000..4b5664d
--- /dev/null
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -0,0 +1,175 @@
+// -*- C++ -*- header.
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/atomic_futex.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly.
+ */
+
+#ifndef _GLIBCXX_ATOMIC_FUTEX_H
+#define _GLIBCXX_ATOMIC_FUTEX_H 1
+
+#pragma GCC system_header
+
+#include 
+#include 
+#include 
+#include 
+
+#ifndef _GLIBCXX_ALWAYS_INLINE
+#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((always_inline))
+#endif
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+//#ifdef _GLIBCXX_USE_FUTEX
+  struct __atomic_futex_unsigned_base
+  {
+// Returns false iff a timeout occurred.
+static bool
+futex_wait_for(unsigned *addr, unsigned val, bool has_timeout,
+	chrono::seconds s, chrono::nanoseconds ns);
+
+// This is static because it can be executed after the object has been
+// destroyed.
+static void futex_notify_all(unsigned* addr);
+  };
+
+  template 
+  struct __atomic_futex_unsigned : public __atomic_futex_unsigned_base
+  {
+// XXX We expect this to be 

Re: [PATCH 0/5] some combine patches

2014-11-16 Thread Segher Boessenkool
On Sun, Nov 16, 2014 at 05:45:06PM +0900, Oleg Endo wrote:
> When you commit those, could you please also add PR 59278 to the ChangeLog so
> that the commit appears in bugzilla?  After your patches are in, I'd like to
> add some SH specific test cases (assuming that your patches fix PR 59278).

It doesn't fix this testcase.  Here, recog_for_combine needs to add a
clobber of T, but it thinks T is not dead.  It doesn't say anything
about that in the debug dump :-(

Maybe the movrt_negc pattern shouldn't set T at all, just clobber it?
Or reg_dead_at_p can be taught about "unused" notes.


Segher


Re: [PATCH 0/5] some combine patches

2014-11-16 Thread Oleg Endo

On 16 Nov 2014, at 22:18, Segher Boessenkool  wrote:

> On Sun, Nov 16, 2014 at 05:45:06PM +0900, Oleg Endo wrote:
>> When you commit those, could you please also add PR 59278 to the ChangeLog so
>> that the commit appears in bugzilla?  After your patches are in, I'd like to
>> add some SH specific test cases (assuming that your patches fix PR 59278).
> 
> It doesn't fix this testcase.  

Too bad.

> Here, recog_for_combine needs to add a
> clobber of T, but it thinks T is not dead.  It doesn't say anything
> about that in the debug dump :-(
> 
> Maybe the movrt_negc pattern shouldn't set T at all, just clobber it?

On SH, it's not just that particular pattern, but a couple of others, which 
would need to be changed from set-set to set-clobber before/during combine and 
then converted/split into the actual set-set patterns after combine.  E.g. some 
patterns set the T bit to a known zero/one value which can be good to know 
later on.

> Or reg_dead_at_p can be taught about "unused" notes.

Sounds like the easier way from my point of view.  I don't know about side effects
for other targets if "unused" reg notes are treated as "dead" reg notes.

Cheers,
Oleg



Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-11-16 Thread Richard Biener
On November 16, 2014 1:07:59 PM CET, Marc Glisse  wrote:
>Hello,
>
>this patch breaks gcc.dg/torture/pr50396.c, and I believe this is a 
>symptom of a bigger issue: the HONOR_NANS interface is bad (or at least
>
>the way we are using it is bad). To know if a type honors NaN, we first
>
>get its TYPE_MODE and then call HONOR_NANS on that. But for vectors
>that 
>do not directly map to hardware, the mode is BLKmode, for which
>HONOR_NANS 
>returns false (bad luck, the default is unsafe).
>
>We could introduce a function:
>
>bool
>honor_nans (machine_mode m)
>{
>   // check for BLKmode?
>   return HONOR_NANS (m);
>}
>bool
>honor_nans (const_tree t)
>{
>   if (!TYPE_P (t))
> t = TREE_TYPE (t);
>   if (VECTOR_TYPE_P (t) || COMPLEX_TYPE_P (t))
> t = TREE_TYPE (t);
>   return honor_nans (TYPE_MODE (t));
>}
>
>and use it in many places. Or call it honors_nan so we don't have to 
>rename variables. Or maybe a function ignore_nans instead, that returns
>
>!honor_nans. I am hoping that the element type of a vector always has a
>
>mode, otherwise the function will need to be a bit more complicated.
>
>But the same issue also affects HONOR_SIGNED_ZEROS, HONOR_SNANS, 
>HONOR_INFINITIES, etc. It is going to be a pain to add a new function
>for 
>each and replace uses. We could instead replace those calls to
>TYPE_MODE 
>by a new:
>
>machine_mode
>element_mode (const_tree t)
>{
>   if (!TYPE_P (t))
> t = TREE_TYPE (t);
>   if (VECTOR_TYPE_P (t) || COMPLEX_TYPE_P (t))
> t = TREE_TYPE (t);
>   return TYPE_MODE (t);
>}
>
>so we still have to use HONOR_NANS on the result but at least we only
>introduce one new function.
>
>Any opinion on how best to handle that? I can't promise I'll have time
>to 
>work on it any time soon (I might, but I don't know).

I think the element_mode is the way to go.
Eventually implicitly by introducing honor_nans (type), which would also reduce
typing.

Richard.



Re: [PATCH] Look through widening type conversions for possible edge assertions

2014-11-16 Thread Richard Biener
On November 16, 2014 5:22:26 AM CET, Patrick Palka  wrote:
>On Wed, Nov 12, 2014 at 3:38 AM, Richard Biener
> wrote:
>> On Wed, Nov 12, 2014 at 5:17 AM, Patrick Palka 
>wrote:
>>> On Tue, Nov 11, 2014 at 8:48 AM, Richard Biener
>>>  wrote:
 On Tue, Nov 11, 2014 at 1:10 PM, Patrick Palka
> wrote:
> This patch is a replacement for the 2nd VRP refactoring patch.  It
> simply teaches VRP to look through widening type conversions when
> finding suitable edge assertions, e.g.
>
> bool p = x != y;
> int q = (int) p;
> if (q == 0) // new edge assert: p == 0 and therefore x == y

 I think the proper fix is to forward x != y to q == 0 instead of
>this one.
 That said - the tree-ssa-forwprop.c restriction on only forwarding
 single-uses into conditions is clearly bogus here.  I suggest to
 relax it for conversions and compares.  Like with

 Index: tree-ssa-forwprop.c
 ===
 --- tree-ssa-forwprop.c (revision 217349)
 +++ tree-ssa-forwprop.c (working copy)
 @@ -476,7 +476,7 @@ forward_propagate_into_comparison_1 (gim
 {
   rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
   tmp = combine_cond_expr_cond (stmt, code, type,
 -   rhs0, op1, !single_use0_p);
 +   rhs0, op1, false);
   if (tmp)
 return tmp;
 }


 Thanks,
 Richard.
>>>
>>> That makes sense.  Attached is what I have so far.  I relaxed the
>>> forwprop restriction in the case of comparing an integer constant
>with
>>> a comparison or with a conversion from a boolean value.  (If I allow
>>> all conversions, not just those from a boolean value, then a couple
>of
>>> -Wstrict-overflow faillures trigger..)  Does the change look
>sensible?
>>>  Should the logic be duplicated for the case when TREE_CODE (op1) ==
>>> SSA_NAME? Thanks for your help so far!
>>
>> It looks good though I'd have allowed all kinds of conversions, not
>only
>> those from booleans.
>>
>> If the patch tests ok with that change it is ok.
>
>Sadly changing the patch to propagate all kinds of conversions, not
>only just those from booleans, introduces regressions that I don't
>know how to adequately fix.

OK.  The original patch propagating only bool conversions is ok then.  Can you 
list the failures you have seen when propagating more?

Thanks,
Richard.



Re: Concepts code review

2014-11-16 Thread Andrew Sutton
>>> +// Bring the parameters of a function declaration back into
>>> +// scope without entering the function body. The declarator
>>> +// must be a function declarator. The caller is responsible
>>> +// for calling finish_scope.
>>> +void
>>> +push_function_parms (cp_declarator *declarator)
>>
>> I think if the caller is calling finish_scope, I'd prefer for the
>> begin_scope call to be there as well.
>>
> Even though Andrew said that this will change later for other reasons, it's
> a function I wrote so: I actually debated this with Andrew before.  My
> rationale for calling begin_scope in the function was that it feels
> consistent with the semantics of the call. Specifically it can be seen as
> reopening the function parameter scope.  Thus the call is balanced by
> calling finish_scope.  Either way would work of course, but perhaps it just
> needed a better name and/or comment?

In the process of removing constraints from lang_decl nodes, I also
ended up addressing a lot of the other constraint processing comments
-- it made sense to do it at the same time.

One of those was moving a requires-clause into a function declarator.
I had thought that this would prevent me from having to re-open that
scope, but it doesn't. The parameter scope is closed at a lower level
in the parse :/

So this issue is still around.

>>> +  // Save the current template requirements.
>>> +  saved_template_reqs = release (current_template_reqs);
>>
>>
>> It seems like a lot of places with saved_template_reqs variables could be
>> converted to use cp_manage_requirements.
>>
> Probably.  The instance you quoted isn't very trivial though.  The
> requirements are saved in two different branches and need to be restored
> before a certain point in the function.  Might just need to spend more time
> looking over the code.

I got rid of current_template_reqs in my current work. Constraints are
associated directly with a template parameter list or a declarator.

>>> +  // FIXME: This could be improved. Perhaps the type of the requires
>>> +  // expression depends on the satisfaction of its constraints. That
>>> +  // is, its type is bool only if its substitution into its normalized
>>> +  // constraints succeeds.
>>
>>
>> The requires-expression is not type-dependent, but it can be
>> instantiation-dependent and value-dependent.
>>
> This is an interesting change.  The REQUIRES_EXPR is currently marked as
> value dependent.  The ChangeLog indicates that Andrew loosened the
> conditions for being value dependent for some cases, but then added it as
> type dependent when something else failed.  May require some time to pin
> down exactly what needs to be done here.

I think something may have changed since I made that change... If I'm
remembering correctly, it used to be the case that build_x_binary_op
would try to fold the resulting expression when the operands weren't
type dependent. That happens in conjoin_constraints.

Now it looks like it's calling build_non_dependent_expr, which does
something a little different.

I agree that requires-expressions are not type-dependent (they have type bool).

Andrew


[PATCH] gcc/c-family/c-cppbuiltin.c: Use 20 instead of 18 for the maximum 64-bit integer decimal string length

2014-11-16 Thread Chen Gang
The maximum decimal string length of a 64-bit integer, excluding the NUL, is 20
('-9223372036854775808'), so we need to use 20 instead of 18 for HOST_WIDE_INT.

2014-11-16  Chen Gang  

* c-family/c-cppbuiltin.c (builtin_define_with_int_value): Use
20 instead of 18 for the maximum 64-bit integer decimal
string length.
---
 gcc/c-family/c-cppbuiltin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 26fabc2..8e8cec4 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1345,7 +1345,7 @@ builtin_define_with_int_value (const char *macro, 
HOST_WIDE_INT value)
 {
   char *buf;
   size_t mlen = strlen (macro);
-  size_t vlen = 18;
+  size_t vlen = 20; /* Maximum value length: -9223372036854775808.  */
   size_t extra = 2; /* space for = and NUL.  */
 
   buf = (char *) alloca (mlen + vlen + extra);
-- 
1.9.3


[PATCH, committed] Update Automake files

2014-11-16 Thread Jan-Benedict Glaw
Hi!

This patch updates the files taken from Automake.  Committed.

MfG, JBG

2014-11-16  Jan-Benedict Glaw  
 
* compile: Sync with upstream Automake.
* depcomp: Ditto.
* install-sh: Ditto.
* missing: Ditto.
* mkinstalldirs: Ditto.
* ylwrap: Ditto.

diff --git a/compile b/compile
index ec64c62..a85b723 100755
--- a/compile
+++ b/compile
@@ -1,10 +1,9 @@
 #! /bin/sh
-# Wrapper for compilers which do not understand `-c -o'.
+# Wrapper for compilers which do not understand '-c -o'.
 
-scriptversion=2009-04-28.21; # UTC
+scriptversion=2012-10-14.11; # UTC
 
-# Copyright (C) 1999, 2000, 2003, 2004, 2005, 2009  Free Software
-# Foundation, Inc.
+# Copyright (C) 1999-2014 Free Software Foundation, Inc.
 # Written by Tom Tromey .
 #
 # This program is free software; you can redistribute it and/or modify
@@ -29,21 +28,224 @@ scriptversion=2009-04-28.21; # UTC
 # bugs to  or send patches to
 # .
 
+nl='
+'
+
+# We need space, tab and new line, in precisely that order.  Quoting is
+# there to prevent tools from complaining about whitespace usage.
+IFS=" ""   $nl"
+
+file_conv=
+
+# func_file_conv build_file lazy
+# Convert a $build file to $host form and store it in $file
+# Currently only supports Windows hosts. If the determined conversion
+# type is listed in (the comma separated) LAZY, no conversion will
+# take place.
+func_file_conv ()
+{
+  file=$1
+  case $file in
+/ | /[!/]*) # absolute file, and not a UNC file
+  if test -z "$file_conv"; then
+   # lazily determine how to convert abs files
+   case `uname -s` in
+ MINGW*)
+   file_conv=mingw
+   ;;
+ CYGWIN*)
+   file_conv=cygwin
+   ;;
+ *)
+   file_conv=wine
+   ;;
+   esac
+  fi
+  case $file_conv/,$2, in
+   *,$file_conv,*)
+ ;;
+   mingw/*)
+ file=`cmd //C echo "$file " | sed -e 's/"\(.*\) " *$/\1/'`
+ ;;
+   cygwin/*)
+ file=`cygpath -m "$file" || echo "$file"`
+ ;;
+   wine/*)
+ file=`winepath -w "$file" || echo "$file"`
+ ;;
+  esac
+  ;;
+  esac
+}
+
+# func_cl_dashL linkdir
+# Make cl look for libraries in LINKDIR
+func_cl_dashL ()
+{
+  func_file_conv "$1"
+  if test -z "$lib_path"; then
+lib_path=$file
+  else
+lib_path="$lib_path;$file"
+  fi
+  linker_opts="$linker_opts -LIBPATH:$file"
+}
+
+# func_cl_dashl library
+# Do a library search-path lookup for cl
+func_cl_dashl ()
+{
+  lib=$1
+  found=no
+  save_IFS=$IFS
+  IFS=';'
+  for dir in $lib_path $LIB
+  do
+IFS=$save_IFS
+if $shared && test -f "$dir/$lib.dll.lib"; then
+  found=yes
+  lib=$dir/$lib.dll.lib
+  break
+fi
+if test -f "$dir/$lib.lib"; then
+  found=yes
+  lib=$dir/$lib.lib
+  break
+fi
+if test -f "$dir/lib$lib.a"; then
+  found=yes
+  lib=$dir/lib$lib.a
+  break
+fi
+  done
+  IFS=$save_IFS
+
+  if test "$found" != yes; then
+lib=$lib.lib
+  fi
+}
+
+# func_cl_wrapper cl arg...
+# Adjust compile command to suit cl
+func_cl_wrapper ()
+{
+  # Assume a capable shell
+  lib_path=
+  shared=:
+  linker_opts=
+  for arg
+  do
+if test -n "$eat"; then
+  eat=
+else
+  case $1 in
+   -o)
+ # configure might choose to run compile as 'compile cc -o foo foo.c'.
+ eat=1
+ case $2 in
+   *.o | *.[oO][bB][jJ])
+ func_file_conv "$2"
+ set x "$@" -Fo"$file"
+ shift
+ ;;
+   *)
+ func_file_conv "$2"
+ set x "$@" -Fe"$file"
+ shift
+ ;;
+ esac
+ ;;
+   -I)
+ eat=1
+ func_file_conv "$2" mingw
+ set x "$@" -I"$file"
+ shift
+ ;;
+   -I*)
+ func_file_conv "${1#-I}" mingw
+ set x "$@" -I"$file"
+ shift
+ ;;
+   -l)
+ eat=1
+ func_cl_dashl "$2"
+ set x "$@" "$lib"
+ shift
+ ;;
+   -l*)
+ func_cl_dashl "${1#-l}"
+ set x "$@" "$lib"
+ shift
+ ;;
+   -L)
+ eat=1
+ func_cl_dashL "$2"
+ ;;
+   -L*)
+ func_cl_dashL "${1#-L}"
+ ;;
+   -static)
+ shared=false
+ ;;
+   -Wl,*)
+ arg=${1#-Wl,}
+ save_ifs="$IFS"; IFS=','
+ for flag in $arg; do
+   IFS="$save_ifs"
+   linker_opts="$linker_opts $flag"
+ done
+ IFS="$save_ifs"
+ ;;
+   -Xlinker)
+ eat=1
+ linker_opts="$linker_opts $2"
+ ;;
+   -*)
+ set x "$@" "$1"
+ shift
+ ;;
+   *.cc | *.CC | *.cxx | *.CXX | *.[cC]++)
+ func_file_conv "$1"
+ set x "$@" -Tp"$file"
+ shift
+ ;;
+   *.c | *.cpp | *.CPP | *.lib | *.LIB | *.Lib | *.OBJ | *.obj | *.[oO])
+ func_file_conv "$1" mingw
+ set x "$@

[PATCH] gcc/c-family/c-cppbuiltin.c: Add two bytes for avoiding memory overflow issue

2014-11-16 Thread Chen Gang
When 'is_str' is true, the extra space needs to account for the 2 '"'
characters, or a memory overflow will occur.

2014-11-16  Chen Gang  

* c-family/c-cppbuiltin.c (builtin_define_with_value): Add two
bytes for avoiding memory overflow issue.
---
 gcc/c-family/c-cppbuiltin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 8e8cec4..cc3d90b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1282,7 +1282,7 @@ builtin_define_with_value (const char *macro, const char 
*expansion, int is_str)
   char *buf;
   size_t mlen = strlen (macro);
   size_t elen = strlen (expansion);
-  size_t extra = 2;  /* space for an = and a NUL */
+  size_t extra = 4;  /* space for an =, a NUL, and 2 '"' when is_str is true */
 
   if (is_str)
 {
-- 
1.9.3
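To see why the proposed bump from 2 to 4 was unnecessary (as the follow-up below notes), here is a simplified sketch of the sizing logic — this is not the actual GCC code, just an illustration of the pattern: the quote characters for the is_str case are accounted for in a separate branch, so `extra = 2` for the '=' and the NUL is already correct.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of builtin_define_with_value's buffer sizing.
   extra = 2 covers only the '=' and the NUL; the two '"' characters
   are added separately in the is_str branch below.  */
static char *
define_with_value (const char *macro, const char *expansion, int is_str)
{
  size_t mlen = strlen (macro);
  size_t elen = strlen (expansion);
  size_t extra = 2;		/* space for an '=' and a NUL */

  if (is_str)
    extra += 2;			/* space for the two '"' characters */

  char *buf = malloc (mlen + elen + extra);
  if (!buf)
    return NULL;
  if (is_str)
    sprintf (buf, "%s=\"%s\"", macro, expansion);
  else
    sprintf (buf, "%s=%s", macro, expansion);
  return buf;
}
```

For `("FOO", "bar", 1)` the allocation is 3 + 3 + 4 = 10 bytes, which exactly fits `FOO="bar"` plus the terminating NUL.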


Re: [PATCH] gcc/c-family/c-cppbuiltin.c: Add two bytes for avoiding memory overflow issue

2014-11-16 Thread Chen Gang

Oh, sorry, this is incorrect; the original code already adds 2 bytes for it.

Thanks.

On 11/16/2014 10:32 PM, Chen Gang wrote:
> When 'is_str' is true, need consider about 2 '"' for the extra space, or
> will cause memory overflow.
> 
> 2014-11-16  Chen Gang  
> 
>   * c-family/c-cppbuiltin.c (builtin_define_with_value): Add two
>   bytes for avoiding memory overflow issue.
> ---
>  gcc/c-family/c-cppbuiltin.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 8e8cec4..cc3d90b 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -1282,7 +1282,7 @@ builtin_define_with_value (const char *macro, const 
> char *expansion, int is_str)
>char *buf;
>size_t mlen = strlen (macro);
>size_t elen = strlen (expansion);
> -  size_t extra = 2;  /* space for an = and a NUL */
> +  size_t extra = 4;  /* space for an =, a NUL, and 2 '"' when is_str is true 
> */
>  
>if (is_str)
>  {
> 


-- 
Chen Gang

Open share and attitude like air water and life which God blessed


Re: [PATCH] Look through widening type conversions for possible edge assertions

2014-11-16 Thread Patrick Palka
On Sun, Nov 16, 2014 at 8:52 AM, Richard Biener
 wrote:
> On November 16, 2014 5:22:26 AM CET, Patrick Palka  
> wrote:
>>On Wed, Nov 12, 2014 at 3:38 AM, Richard Biener
>> wrote:
>>> On Wed, Nov 12, 2014 at 5:17 AM, Patrick Palka 
>>wrote:
 On Tue, Nov 11, 2014 at 8:48 AM, Richard Biener
  wrote:
> On Tue, Nov 11, 2014 at 1:10 PM, Patrick Palka
>> wrote:
>> This patch is a replacement for the 2nd VRP refactoring patch.  It
>> simply teaches VRP to look through widening type conversions when
>> finding suitable edge assertions, e.g.
>>
>> bool p = x != y;
>> int q = (int) p;
>> if (q == 0) // new edge assert: p == 0 and therefore x == y
>
> I think the proper fix is to forward x != y to q == 0 instead of
>>this one.
> That said - the tree-ssa-forwprop.c restriction on only forwarding
> single-uses into conditions is clearly bogus here.  I suggest to
> relax it for conversions and compares.  Like with
>
> Index: tree-ssa-forwprop.c
> ===
> --- tree-ssa-forwprop.c (revision 217349)
> +++ tree-ssa-forwprop.c (working copy)
> @@ -476,7 +476,7 @@ forward_propagate_into_comparison_1 (gim
> {
>   rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
>   tmp = combine_cond_expr_cond (stmt, code, type,
> -   rhs0, op1, !single_use0_p);
> +   rhs0, op1, false);
>   if (tmp)
> return tmp;
> }
>
>
> Thanks,
> Richard.

 That makes sense.  Attached is what I have so far.  I relaxed the
 forwprop restriction in the case of comparing an integer constant
>>with
 a comparison or with a conversion from a boolean value.  (If I allow
 all conversions, not just those from a boolean value, then a couple
>>of
 -Wstrict-overflow faillures trigger..)  Does the change look
>>sensible?
  Should the logic be duplicated for the case when TREE_CODE (op1) ==
 SSA_NAME? Thanks for your help so far!
>>>
>>> It looks good though I'd have allowed all kinds of conversions, not
>>only
>>> those from booleans.
>>>
>>> If the patch tests ok with that change it is ok.
>>
>>Sadly changing the patch to propagate all kinds of conversions, not
>>only just those from booleans, introduces regressions that I don't
>>know how to adequately fix.
>
> OK.  The original patch propagating only bool conversions is ok then.  Can 
> you list the failures you have seen when propagating more?
>
> Thanks,
> Richard.
>

gcc.dg/Wstrict-overflow-26.c: the patch introduces a bogus overflow
warning here.  I was able to fix this one by not warning on equality
comparisons, but fixing it caused ...
gcc.dg/Wstrict-overflow-18.c: ... this to regress.  I was able to fix this
one too, by teaching VRP to emit an overflow warning when simplifying
non-equality comparisons to equality comparisons (in this case i > 0
--> i != 0) when the operand has the range [0, +INF(OVF)].
g++.dg/calloc.C: this regression I wasn't able to fix.  One problem is
that VRP is no longer able to simplify the "m * 4 > 0" comparison in
the following testcase:

void
f (int n)
{
  size_t m = n;
  if (m > (size_t)-1 / 4)
abort ();
  if (n != 0) // used to be m != 0 before the patch
{
  ...
  if (m * 4 > 0)
..
}
}

This happens because VRP has no way of knowing that if n != 0 then m
!= 0.  I hacked up a fix for this deficiency in VRP by looking at an
operand's def stmts when adding edge assertions, so that for the
conditional "n != 0" we will also insert the edge assertion "m != 0".
But still calloc.C regressed, most notably in the slsr pass where the
pass was unable to combine two ssa names which had equivalent
definitions. At that point I gave up.

I also played around with folding "m > (size_t)-1 / 4" to "n < 0" in
the hopes that a subsequent pass would move the definition for m
closer to its use (assuming such a pass exists) so that m will see n's
ASSERT_EXPRs in m's def chain.  But that didn't work too well because
apparently such a pass doesn't exist.
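To make the shape of the problem concrete, here is a self-contained variant of the pattern above (the original testcase's elided body is replaced with a simple store loop; the names are illustrative, not from calloc.C):

```c
#include <stddef.h>

/* After forwarding, the guard becomes "n != 0" while the loop bound
   still uses m; VRP would need to know that n != 0 implies m != 0 on
   that path in order to fold the "m * 4 > 0" test to true.  */
int
f (int n, unsigned char *out)
{
  size_t m = (size_t) n;
  if (m > (size_t) -1 / 4)
    return -1;			/* overflow guard, as in the testcase */
  int written = 0;
  if (n != 0)			/* used to be "m != 0" before forwarding */
    {
      if (m * 4 > 0)		/* VRP should see this as always true here */
	{
	  for (size_t i = 0; i < m; i++)
	    out[i] = (unsigned char) i;
	  written = (int) m;
	}
    }
  return written;
}
```

The function is still correct either way; what regresses is only the optimizer's ability to prove the inner condition redundant.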


Re: [patch] [WIP] Optimize synchronization in std::future if futexes are available.

2014-11-16 Thread Jonathan Wakely

CCing libstdc++@, as required for all libstdc++ patches.

Original patch at
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02004.html


This WORK-IN-PROGRESS patch uses an atomic unsigned and futex operations
to optimize the synchronization code in std::future.  The current code
uses a mutex/condvar combination, which is both slower (e.g., due to
mutex contention, stronger ordering requirements for condvars, using an
additional condvar-internal mutex, ...) and makes std::future fairly
large.

It introduces an __atomic_futex_unsigned type, which provides basic
atomic operations (load and store) on an atomic unsigned and
additionally provides load_when_[not_]equal operations that do blocking
waits on the atomic -- pretty much what futexes do.  Such an
__atomic_futex_unsigned is then


... ?

(At a guess, you were going to say it would be useful for implementing
the proposed std::synchronic? :-)
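For readers unfamiliar with the futex idiom being described: the following is a minimal, Linux-only sketch of what load_when_equal-style blocking on an atomic word can look like. This is my illustration, not the patch's actual __atomic_futex_unsigned code; the function names are invented.

```c
#include <limits.h>
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* A single futex word; in the patch this would live inside the shared
   state of the future.  */
static atomic_uint futex_word;

static void
store_notify_all (unsigned value)
{
  atomic_store_explicit (&futex_word, value, memory_order_release);
  /* Wake every thread blocked in FUTEX_WAIT on this word.  */
  syscall (SYS_futex, &futex_word, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

static unsigned
load_when_equal (unsigned expected)
{
  unsigned cur;
  while ((cur = atomic_load_explicit (&futex_word, memory_order_acquire))
	 != expected)
    /* The kernel sleeps only if the word still equals CUR, so a store
       between our load and the FUTEX_WAIT cannot lose the wakeup; we
       just return and re-check.  */
    syscall (SYS_futex, &futex_word, FUTEX_WAIT, cur, NULL, NULL, 0);
  return cur;
}
```

The key property, which a mutex/condvar pairing needs two atomic objects and internal locking to achieve, is that the wait compares-and-sleeps on the same word the notifier stores to.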


There are a few bits missing in this patch:
* A fallback implementation for platforms that don't provide futexes, in
the form of a different implementation of __atomic_futex_unsigned.  A
mutex+condvar combination is what I'm aiming at; for std::future, this
would lead to similar code and sizeof(std::future) as currently.
* More documentation of how the synchronization works.  Jonathan has a
patch in flight for that, so this should get merged.


Yup.


* Integration with the on_thread_exit patch that Jonathan posted, which
uses the current, lock-based synchronization scheme and thus needs to
get adapted.


Aside: I think I'm going to apply that _at_thread_exit() patch, so we
have a complete implementation committed, even if we actually end up
replacing it with an atomic one - just so the patch isn't lost.
I'll add some comments to the code at the same time.


* Testing.


The testsuite isn't that extensive, but covers some of the tricky
cases I was trying to handle.


There are ways to optimize further I suppose, for example by letting the
__atomic_futex_unsigned take care of all current uses of call_once too.
Let me know if I should do that.  This would reduce the number of atomic
ops a little in some cases, as well as reduce space required for futures
a little.

Comments?


I like it, assuming it passes testing :-)

Inline below ...


diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index 8989474..392c13f 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -291,14 +292,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  typedef _Ptr<_Result_base> _Ptr_type;

+  enum _Status {
+   not_ready,
+   ready


These names need to be uglified, _S_not_ready and _S_ready would be
the usual convention.


+  };
+
  _Ptr_type _M_result;
-  mutex _M_mutex;
-  condition_variable   _M_cond;
+  __atomic_futex_unsigned<>  _M_status;
  atomic_flag   _M_retrieved;
  once_flag _M_once;

public:
-  _State_baseV2() noexcept : _M_result(), _M_retrieved(ATOMIC_FLAG_INIT)
+  _State_baseV2() noexcept : _M_result(), _M_retrieved(ATOMIC_FLAG_INIT),
+ _M_status(_Status::not_ready)
{ }
  _State_baseV2(const _State_baseV2&) = delete;
  _State_baseV2& operator=(const _State_baseV2&) = delete;
@@ -308,8 +314,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  wait()
  {
_M_complete_async();
-   unique_lock __lock(_M_mutex);
-   _M_cond.wait(__lock, [&] { return _M_ready(); });
+   // Acquire MO makes sure this synchronizes with the thread that made
+   // the future ready.
+   _M_status.load_when_equal(_Status::ready, memory_order_acquire);


Do we need another _M_complete_async() here?

Consider Thread A calling wait_for() on a future associated with an
async thread C, and Thread B calling wait() on the same future. The
state is made ready, and both A and B wake up. B returns immediately,
but A calls _M_complete_async() which joins thread C.  This fails the
requirement that completion of the async thread (i.e. joining C)
synchronizes with the first thread to detect the ready state.

So I think we want another call to _M_complete_async() which will wait
until C has been joined.

The current implementation ensures this requirement is met by using
the mutex so that A and B cannot wake up at the same time, so one of
them will perform the join on C and return before the other can
proceed.


return *_M_result;
  }

@@ -317,15 +324,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
future_status
wait_for(const chrono::duration<_Rep, _Period>& __rel)
{
- unique_lock __lock(_M_mutex);
- if (_M_ready())
+ _Status _s = _M_status.load(memory_order_acquire);
+ if (_s == _Status::ready)
return future_status::ready;
- if (_M_has_deferred())
+ if (_M_is_deferred_future())
return future_status::deferred;
- if (_M_cond.wait_for(__lock, __rel, [&] { r

Move TARGET_FLAGS_REGNUM entry in doc

2014-11-16 Thread Eric Botcazou
This looks like an oversight: the entry for TARGET_FLAGS_REGNUM is located in 
the "Passing Arguments in Registers" section of the doc, but it really belongs 
in the "Representation of condition codes using registers" section.

Fixed thusly, applied on all active branches as obvious.


2014-11-16  Eric Botcazou  

* doc/tm.texi.in (TARGET_FLAGS_REGNUM): Move around.
* doc/tm.texi: Regenerate.


-- 
Eric Botcazou

Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 217602)
+++ doc/tm.texi.in	(working copy)
@@ -3508,8 +3508,6 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
-@hook TARGET_FLAGS_REGNUM
-
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
@@ -4496,6 +4494,8 @@ like:
 
 @hook TARGET_CC_MODES_COMPATIBLE
 
+@hook TARGET_FLAGS_REGNUM
+
 @node Costs
 @section Describing Relative Costs of Operations
 @cindex costs of instructions

[PATCH, committed] Update move-if-change from gnulib

2014-11-16 Thread Jan-Benedict Glaw
Hi!

This brings move-if-change in sync with upstream gnulib.

MfG, JBG


2014-11-16  Jan-Benedict Glaw  

* move-if-change: Sync from upstream gnulib.

diff --git a/move-if-change b/move-if-change
index e7ba25e..88d9574 100755
--- a/move-if-change
+++ b/move-if-change
@@ -2,13 +2,13 @@
 # Like mv $1 $2, but if the files are the same, just delete $1.
 # Status is zero if successful, nonzero otherwise.
 
-VERSION='2011-01-28 20:09'; # UTC
+VERSION='2012-01-06 07:23'; # UTC
 # The definition above must lie within the first 8 lines in order
 # for the Emacs time-stamp write hook (at end) to update it.
 # If you change this file with Emacs, please let the write hook
 # do its job.  Otherwise, update this string manually.
 
-# Copyright (C) 2002-2007, 2009-2011 Free Software Foundation, Inc.
+# Copyright (C) 2002-2014 Free Software Foundation, Inc.
 
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -32,7 +32,7 @@ If SOURCE is different than DEST, then move it to DEST; else 
remove SOURCE.
   --help display this help and exit
   --version  output version information and exit
 
-The variable CMPPROG can be used to specify an alternative to \`cmp'.
+The variable CMPPROG can be used to specify an alternative to 'cmp'.
 
 Report bugs to ."
 

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:  Fortschritt bedeutet, einen Schritt so zu machen,
the second  :   daß man den nächsten auch noch machen kann.




Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-11-16 Thread Marc Glisse

On Sun, 16 Nov 2014, Richard Biener wrote:


I think the element_mode is the way to go.


The following passed bootstrap+testsuite.

2014-11-16  Marc Glisse  

* tree.c (element_mode, integer_truep): New functions.
* tree.h (element_mode, integer_truep): Declare them.
* fold-const.c (negate_expr_p, fold_negate_expr, combine_comparisons,
fold_cond_expr_with_comparison, fold_real_zero_addition_p,
fold_comparison, fold_ternary_loc, tree_call_nonnegative_warnv_p,
fold_strip_sign_ops): Use element_mode.
(fold_binary_loc): Use element_mode and element_precision.
* match.pd: Use integer_truep, element_mode, element_precision,
VECTOR_TYPE_P and build_one_cst. Extend some transformations to
vectors. Simplify A/-A.

--
Marc Glisse

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 217614)
+++ gcc/fold-const.c(working copy)
@@ -435,46 +435,46 @@ negate_expr_p (tree t)
   }
 
 case COMPLEX_EXPR:
   return negate_expr_p (TREE_OPERAND (t, 0))
 && negate_expr_p (TREE_OPERAND (t, 1));
 
 case CONJ_EXPR:
   return negate_expr_p (TREE_OPERAND (t, 0));
 
 case PLUS_EXPR:
-  if (HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type))
- || HONOR_SIGNED_ZEROS (TYPE_MODE (type)))
+  if (HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
+ || HONOR_SIGNED_ZEROS (element_mode (type)))
return false;
   /* -(A + B) -> (-B) - A.  */
   if (negate_expr_p (TREE_OPERAND (t, 1))
  && reorder_operands_p (TREE_OPERAND (t, 0),
 TREE_OPERAND (t, 1)))
return true;
   /* -(A + B) -> (-A) - B.  */
   return negate_expr_p (TREE_OPERAND (t, 0));
 
 case MINUS_EXPR:
   /* We can't turn -(A-B) into B-A when we honor signed zeros.  */
-  return !HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type))
-&& !HONOR_SIGNED_ZEROS (TYPE_MODE (type))
+  return !HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
+&& !HONOR_SIGNED_ZEROS (element_mode (type))
 && reorder_operands_p (TREE_OPERAND (t, 0),
TREE_OPERAND (t, 1));
 
 case MULT_EXPR:
   if (TYPE_UNSIGNED (TREE_TYPE (t)))
 break;
 
   /* Fall through.  */
 
 case RDIV_EXPR:
-  if (! HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (TREE_TYPE (t
+  if (! HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (TREE_TYPE (t
return negate_expr_p (TREE_OPERAND (t, 1))
   || negate_expr_p (TREE_OPERAND (t, 0));
   break;
 
 case TRUNC_DIV_EXPR:
 case ROUND_DIV_EXPR:
 case EXACT_DIV_EXPR:
   /* In general we can't negate A / B, because if A is INT_MIN and
 B is 1, we may turn this into INT_MIN / -1 which is undefined
 and actually traps on some architectures.  But if overflow is
@@ -610,22 +610,22 @@ fold_negate_expr (location_t loc, tree t
return fold_build1_loc (loc, CONJ_EXPR, type,
fold_negate_expr (loc, TREE_OPERAND (t, 0)));
   break;
 
 case NEGATE_EXPR:
   if (!TYPE_OVERFLOW_SANITIZED (type))
return TREE_OPERAND (t, 0);
   break;
 
 case PLUS_EXPR:
-  if (!HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type))
- && !HONOR_SIGNED_ZEROS (TYPE_MODE (type)))
+  if (!HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
+ && !HONOR_SIGNED_ZEROS (element_mode (type)))
{
  /* -(A + B) -> (-B) - A.  */
  if (negate_expr_p (TREE_OPERAND (t, 1))
  && reorder_operands_p (TREE_OPERAND (t, 0),
 TREE_OPERAND (t, 1)))
{
  tem = negate_expr (TREE_OPERAND (t, 1));
  return fold_build2_loc (loc, MINUS_EXPR, type,
  tem, TREE_OPERAND (t, 0));
}
@@ -635,35 +635,35 @@ fold_negate_expr (location_t loc, tree t
{
  tem = negate_expr (TREE_OPERAND (t, 0));
  return fold_build2_loc (loc, MINUS_EXPR, type,
  tem, TREE_OPERAND (t, 1));
}
}
   break;
 
 case MINUS_EXPR:
   /* - (A - B) -> B - A  */
-  if (!HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type))
- && !HONOR_SIGNED_ZEROS (TYPE_MODE (type))
+  if (!HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type))
+ && !HONOR_SIGNED_ZEROS (element_mode (type))
  && reorder_operands_p (TREE_OPERAND (t, 0), TREE_OPERAND (t, 1)))
return fold_build2_loc (loc, MINUS_EXPR, type,
TREE_OPERAND (t, 1), TREE_OPERAND (t, 0));
   break;
 
 case MULT_EXPR:
   if (TYPE_UNSIGNED (type))
 break;
 
   /* Fall through.  */
 
 case RDIV_EXPR:
-  if (! HONOR_SIGN_DEPENDENT_ROUNDING (TYPE_MODE (type)))
+  if (! HONOR_SIGN_DEPENDENT_ROUNDING (el

Re: match.pd tweaks for vectors and issues with HONOR_NANS

2014-11-16 Thread Marc Glisse

On Sun, 16 Nov 2014, Marc Glisse wrote:


On Sun, 16 Nov 2014, Richard Biener wrote:


I think the element_mode is the way to go.


The following passed bootstrap+testsuite.

2014-11-16  Marc Glisse  

* tree.c (element_mode, integer_truep): New functions.
* tree.h (element_mode, integer_truep): Declare them.
* fold-const.c (negate_expr_p, fold_negate_expr, combine_comparisons,
fold_cond_expr_with_comparison, fold_real_zero_addition_p,
fold_comparison, fold_ternary_loc, tree_call_nonnegative_warnv_p,
fold_strip_sign_ops): Use element_mode.
(fold_binary_loc): Use element_mode and element_precision.
* match.pd: Use integer_truep, element_mode, element_precision,
VECTOR_TYPE_P and build_one_cst. Extend some transformations to
vectors. Simplify A/-A.


Hmm, that was sent before I wrote any explanation... There are occurrences of
HONOR_XXX (TYPE_MODE (t)) in other files (reassoc for instance), but we 
have to start somewhere, so I only touched match.pd and fold-const.c. I 
did not change the cases where it was clear that the type had to be a 
scalar.


--
Marc Glisse


Re: Avoid applying inline plan for all functions ahead of late compilation

2014-11-16 Thread Jan Hubicka
> >The patch also hits a bug in i386's ix86_set_current_function. It is
> >responsible
> >for initializing backend and it does so lazily remembering the previous
> >options
> >backend was initialized for. Pragma parsing however clears the cache
> >that leads
> >to wrong settings being used for subsequent functions.
> >
> >Bootstrapped/regtested x86_64-linux, will commit it tomorrow after bit
> >of more testing.
> 
> But for example for IPA pta this means we apply all IPA transforms without 
> any garbage collection run?

The original loop also did not contain ggc_collect calls.  Can we call 
ggc_collect from ipa-pta's
data collection loop?

(in general I think -fipa-pta is kind of -fplease-explode-on-large-programs :))

Honza
> 
> Richard.
> 
> >Index: gcc/cgraphclones.c
> >===
> >--- gcc/cgraphclones.c   (revision 217612)
> >+++ gcc/cgraphclones.c   (working copy)
> >@@ -307,7 +307,7 @@ duplicate_thunk_for_node (cgraph_node *t
> > node = duplicate_thunk_for_node (thunk_of, node);
> > 
> >   if (!DECL_ARGUMENTS (thunk->decl))
> >-thunk->get_body ();
> >+thunk->get_untransformed_body ();
> > 
> >   cgraph_edge *cs;
> >   for (cs = node->callers; cs; cs = cs->next_caller)
> >@@ -1067,7 +1067,7 @@ symbol_table::materialize_all_clones (vo
> >   && !gimple_has_body_p (node->decl))
> > {
> >   if (!node->clone_of->clone_of)
> >-node->clone_of->get_body ();
> >+node->clone_of->get_untransformed_body ();
> >   if (gimple_has_body_p (node->clone_of->decl))
> > {
> >   if (symtab->dump_file)
> >Index: gcc/ipa-icf.c
> >===
> >--- gcc/ipa-icf.c(revision 217612)
> >+++ gcc/ipa-icf.c(working copy)
> >@@ -706,7 +706,7 @@ void
> > sem_function::init (void)
> > {
> >   if (in_lto_p)
> >-get_node ()->get_body ();
> >+get_node ()->get_untransformed_body ();
> > 
> >   tree fndecl = node->decl;
> >   function *func = DECL_STRUCT_FUNCTION (fndecl);
> >Index: gcc/passes.c
> >===
> >--- gcc/passes.c (revision 217612)
> >+++ gcc/passes.c (working copy)
> >@@ -2214,36 +2214,6 @@ execute_one_pass (opt_pass *pass)
> >  executed.  */
> >   invoke_plugin_callbacks (PLUGIN_PASS_EXECUTION, pass);
> > 
> >-  /* SIPLE IPA passes do not handle callgraphs with IPA transforms in
> >it.
> >- Apply all trnasforms first.  */
> >-  if (pass->type == SIMPLE_IPA_PASS)
> >-{
> >-  struct cgraph_node *node;
> >-  bool applied = false;
> >-  FOR_EACH_DEFINED_FUNCTION (node)
> >-if (node->analyzed
> >-&& node->has_gimple_body_p ()
> >-&& (!node->clone_of || node->decl != node->clone_of->decl))
> >-  {
> >-if (!node->global.inlined_to
> >-&& node->ipa_transforms_to_apply.exists ())
> >-  {
> >-node->get_body ();
> >-push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> >-execute_all_ipa_transforms ();
> >-cgraph_edge::rebuild_edges ();
> >-free_dominance_info (CDI_DOMINATORS);
> >-free_dominance_info (CDI_POST_DOMINATORS);
> >-pop_cfun ();
> >-applied = true;
> >-  }
> >-  }
> >-  if (applied)
> >-symtab->remove_unreachable_nodes (false, dump_file);
> >-  /* Restore current_pass.  */
> >-  current_pass = pass;
> >-}
> >-
> >   if (!quiet_flag && !cfun)
> > fprintf (stderr, " <%s>", pass->name ? pass->name : "");
> > 
> >Index: gcc/cgraphunit.c
> >===
> >--- gcc/cgraphunit.c (revision 217612)
> >+++ gcc/cgraphunit.c (working copy)
> >@@ -197,7 +197,6 @@ along with GCC; see the file COPYING3.
> > #include "target.h"
> > #include "diagnostic.h"
> > #include "params.h"
> >-#include "fibheap.h"
> > #include "intl.h"
> > #include "hash-map.h"
> > #include "plugin-api.h"
> >@@ -1469,7 +1468,7 @@ cgraph_node::expand_thunk (bool output_a
> > }
> > 
> >   if (in_lto_p)
> >-get_body ();
> >+get_untransformed_body ();
> >   a = DECL_ARGUMENTS (thunk_fndecl);
> >   
> >   current_function_decl = thunk_fndecl;
> >@@ -1522,7 +1521,7 @@ cgraph_node::expand_thunk (bool output_a
> >   gimple ret;
> > 
> >   if (in_lto_p)
> >-get_body ();
> >+get_untransformed_body ();
> >   a = DECL_ARGUMENTS (thunk_fndecl);
> > 
> >   current_function_decl = thunk_fndecl;
> >@@ -1744,7 +1743,7 @@ cgraph_node::expand (void)
> >   announce_function (decl);
> >   process = 0;
> >   gcc_assert (lowered);
> >-  get_body ();
> >+  get_untransformed_body ();
> > 
> >   /* Generate RTL for the body of DECL.  */
> > 
> >Index: gcc/cgraph.c
> >===
> >--- gcc/cgraph.c (revision 217612)
> >+++ gcc/cgraph.c 

Re: [PATCH] Make IPA-CP propagate alignment information of pointers

2014-11-16 Thread Toon Moene

On 11/15/2014 02:04 AM, Martin Jambor wrote:


Hi,

this patch adds very simple propagation of alignment of pointers to
IPA-CP.  Because I have not attempted to estimate profitability of
such propagation in any way, it does not do any cloning, just
propagation when the alignment is known and the same in all contexts.


Thanks for this improvement !

From the Fortran side, arrays can be "created" in the following ways:

1. Statically in the main program.

2. As a subroutine-temporary "automatic array".

3. By allocating an allocatable array.

Arrays under 1. are aligned properly by the compiler.

Arrays under 2. are aligned properly because of the proper alignment of 
the stack nowadays.


Arrays under 3. are aligned properly because Fortran "ALLOCATE" 
ultimately calls malloc.
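The malloc claim above rests on a guarantee that can be stated (and checked) directly in C: malloc results are suitably aligned for any fundamental type, so an ALLOCATE that bottoms out in malloc inherits that alignment. A small illustration, with an invented helper name:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns 1 if a malloc'd block of N bytes is aligned for every
   fundamental type (the C guarantee the text relies on), 0 otherwise.  */
int
malloc_is_suitably_aligned (size_t n)
{
  void *p = malloc (n);
  if (!p)
    return 0;
  int ok = ((uintptr_t) p % _Alignof (max_align_t)) == 0;
  free (p);
  return ok;
}
```

This is why the vectorizer's alignment peeling can be dropped for such arrays once the alignment information is propagated.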


So Fortran arrays are always suitably aligned (the exception being an 
array actual argument passed as "CALL SUB(.., A(2:), ..)", which is 
extremely rare).


So this propagation of alignment information will result in basically 
removing all alignment peeling for Fortran code.


Thanks !

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


[WEB][PATCH] Describe -pg and LTO changes

2014-11-16 Thread Andi Kleen

This patch describes some user-visible changes that were
added to GCC 5.

Ok to commit? 

-Andi


Index: htdocs/gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.25
diff -u -r1.25 changes.html
--- htdocs/gcc-5/changes.html   14 Nov 2014 21:32:32 -  1.25
+++ htdocs/gcc-5/changes.html   16 Nov 2014 20:03:07 -
@@ -154,6 +154,11 @@
about qualifiers on pointers being discarded via a new warning option
-Wno-discarded-qualifiers.
 The C front end now generates more precise caret diagnostics.
+The -pg option now only affects the current file in an LTO build.
+A new no_reorder attribute has been added, which prevents reordering
+   of a specific symbol against other such symbols or inline assembler.
+   This is a more focused alternative to 
+   -fno-toplevel-reorder.
   
 
 C++
@@ -295,6 +300,17 @@
 

 
+IA-32/x86-64
+  
+   The new -mrecord-mcount option for -pg
+   generates a Linux kernel style table of pointers to mcount or
+   __fentry__ calls at the beginning of functions. The new
+   -mnop-mcount option in addition also generates nops in
+   place of the __fentry__ or mcount call, so that a call per function
+   can be later patched in. This can be used for low overhead tracing or
+   hot code patching.
+  
+
 Operating Systems
 
   DragonFly BSD
@@ -304,7 +320,11 @@
   
 
 
-
-
+Other significant improvements
+  
+  
+The gcc-ar, gcc-nm, gcc-ranlib wrappers now
+   understand a -B option to set the compiler to use.
+  
 
 


Audit predict.c for optimization attributes

2014-11-16 Thread Jan Hubicka
Hi,
many of the IPA passes ignore the fact that optimization attributes can
enable/disable flags at per-function granularity.  Since this is becoming
more of an issue for LTO, I plan to audit the individual passes.  This one
is for predict.c, which uses the global optimize_size; I also noticed that
probably_never_executed actually uses cfun even though it has a fun
pointer passed in.

Bootstrapped/regtested x86_64-linux, will commit it after further testing at 
Firefox.

Honza

* predict.c (maybe_hot_frequency_p): Use opt_for_fn.
(optimize_function_for_size_p): Likewise.
(probably_never_executed): Likewise; replace cfun by fun.
Index: predict.c
===
--- predict.c   (revision 217633)
+++ predict.c   (working copy)
@@ -125,7 +125,8 @@ static inline bool
 maybe_hot_frequency_p (struct function *fun, int freq)
 {
   struct cgraph_node *node = cgraph_node::get (fun->decl);
-  if (!profile_info || !flag_branch_probabilities)
+  if (!profile_info
+  || !opt_for_fn (fun->decl, flag_branch_probabilities))
 {
   if (node->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
 return false;
@@ -214,34 +215,34 @@ probably_never_executed (struct function
  gcov_type count, int frequency)
 {
   gcc_checking_assert (fun);
-  if (profile_status_for_fn (cfun) == PROFILE_READ)
+  if (profile_status_for_fn (fun) == PROFILE_READ)
 {
   int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
   if (count * unlikely_count_fraction >= profile_info->runs)
return false;
   if (!frequency)
return true;
-  if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency)
+  if (!ENTRY_BLOCK_PTR_FOR_FN (fun)->frequency)
return false;
-  if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
+  if (ENTRY_BLOCK_PTR_FOR_FN (fun)->count)
{
   gcov_type computed_count;
   /* Check for possibility of overflow, in which case entry bb count
  is large enough to do the division first without losing much
  precision.  */
- if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count < REG_BR_PROB_BASE *
+ if (ENTRY_BLOCK_PTR_FOR_FN (fun)->count < REG_BR_PROB_BASE *
  REG_BR_PROB_BASE)
 {
   gcov_type scaled_count
- = frequency * ENTRY_BLOCK_PTR_FOR_FN (cfun)->count *
+ = frequency * ENTRY_BLOCK_PTR_FOR_FN (fun)->count *
 unlikely_count_fraction;
  computed_count = RDIV (scaled_count,
-ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency);
+ENTRY_BLOCK_PTR_FOR_FN (fun)->frequency);
 }
   else
 {
- computed_count = RDIV (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count,
-ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency);
+ computed_count = RDIV (ENTRY_BLOCK_PTR_FOR_FN (fun)->count,
+ENTRY_BLOCK_PTR_FOR_FN (fun)->frequency);
   computed_count *= frequency * unlikely_count_fraction;
 }
   if (computed_count >= profile_info->runs)
@@ -249,7 +250,7 @@ probably_never_executed (struct function
}
   return true;
 }
-  if ((!profile_info || !flag_branch_probabilities)
+  if ((!profile_info || !(opt_for_fn (fun->decl, flag_branch_probabilities)))
   && (cgraph_node::get (fun->decl)->frequency
  == NODE_FREQUENCY_UNLIKELY_EXECUTED))
 return true;
@@ -279,7 +280,7 @@ probably_never_executed_edge_p (struct f
 bool
 optimize_function_for_size_p (struct function *fun)
 {
-  if (optimize_size)
+  if (opt_for_fn (fun->decl, optimize_size))
 return true;
   if (!fun || !fun->decl)
 return false;
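For readers wondering why the global optimize_size check was wrong: optimization attributes let individual functions override the command-line level, so a single global flag cannot describe all functions at once. A minimal illustration (function names are invented):

```c
/* With per-function attributes like these, one function is compiled as if
   -Os were given and the other as if -O2 were given, regardless of the
   command line.  A pass asking "is this function optimized for size?" must
   therefore query the per-function setting, as opt_for_fn does, rather
   than the global optimize_size variable.  */
__attribute__ ((optimize ("Os"))) int
small_helper (int x)
{
  return x * 3;
}

__attribute__ ((optimize ("O2"))) int
fast_helper (int x)
{
  return x * 3;
}
```

Both functions compute the same thing; only the optimization context the compiler records for each differs.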


Audit cgraph.c for optimization attributes

2014-11-16 Thread Jan Hubicka
Hi,
this patch updates cgraph.c.  To make flag_devirtualize fully per-function,
we will need some infrastructure to figure out whether it is used at all and
whether the type inheritance graph construction should be done (or do it
unconditionally).

Adding proper opt_for_fn tests at least makes it possible to disable
devirtualization at function granularity.

cgraph_edge::maybe_hot_p may be cleaned up if profile_status was moved from
cfun->cfg to callgraph node (where it belongs, but it was not placed there
for performance reasons that are gone since last release).

Bootstrapped/regtested x86_64-linux, will commit it after testing on Firefox 
LTO.

* cgraph.c (symbol_table::create_edge): Use opt_for_fn.
(cgraph_node::cannot_return_p): Likewise.
(cgraph_edge::cannot_lead_to_return_p): Likewise.
(cgraph_edge::maybe_hot_p): Likewise.
Index: cgraph.c
===
--- cgraph.c(revision 217633)
+++ cgraph.c(working copy)
@@ -859,7 +859,8 @@ symbol_table::create_edge (cgraph_node *
   edge->indirect_inlining_edge = 0;
   edge->speculative = false;
   edge->indirect_unknown_callee = indir_unknown_callee;
-  if (flag_devirtualize && call_stmt && DECL_STRUCT_FUNCTION (caller->decl))
+  if (opt_for_fn (edge->caller->decl, flag_devirtualize)
+  && call_stmt && DECL_STRUCT_FUNCTION (caller->decl))
 edge->in_polymorphic_cdtor
   = decl_maybe_in_construction_p (NULL, NULL, call_stmt,
  caller->decl);
@@ -2374,7 +2375,7 @@ bool
 cgraph_node::cannot_return_p (void)
 {
   int flags = flags_from_decl_or_type (decl);
-  if (!flag_exceptions)
+  if (!opt_for_fn (decl, flag_exceptions))
 return (flags & ECF_NORETURN) != 0;
   else
 return ((flags & (ECF_NORETURN | ECF_NOTHROW))
@@ -2394,7 +2395,7 @@ cgraph_edge::cannot_lead_to_return_p (vo
   if (indirect_unknown_callee)
 {
   int flags = indirect_info->ecf_flags;
-  if (!flag_exceptions)
+  if (!opt_for_fn (caller->decl, flag_exceptions))
return (flags & ECF_NORETURN) != 0;
   else
return ((flags & (ECF_NORETURN | ECF_NOTHROW))
@@ -2409,7 +2410,9 @@ cgraph_edge::cannot_lead_to_return_p (vo
 bool
 cgraph_edge::maybe_hot_p (void)
 {
-  if (profile_info && flag_branch_probabilities
+  /* TODO: Export profile_status from cfun->cfg to cgraph_node.  */
+  if (profile_info
+  && opt_for_fn (caller->decl, flag_branch_probabilities)
   && !maybe_hot_count_p (NULL, count))
 return false;
   if (caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
@@ -2420,17 +2423,18 @@ cgraph_edge::maybe_hot_p (void)
   && (callee
  && callee->frequency <= NODE_FREQUENCY_EXECUTED_ONCE))
 return false;
-  if (optimize_size) return false;
+  if (opt_for_fn (caller->decl, optimize_size))
+return false;
   if (caller->frequency == NODE_FREQUENCY_HOT)
 return true;
   if (caller->frequency == NODE_FREQUENCY_EXECUTED_ONCE
   && frequency < CGRAPH_FREQ_BASE * 3 / 2)
 return false;
-  if (flag_guess_branch_prob)
+  if (opt_for_fn (caller->decl, flag_guess_branch_prob))
 {
   if (PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) == 0
  || frequency <= (CGRAPH_FREQ_BASE
-/ PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION)))
+  / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION)))
 return false;
 }
   return true;


Audit cgraphunit for optimization attributes

2014-11-16 Thread Jan Hubicka
Hi,
this patch updates cgraphunit.  One non-trivial case is expand_thunk.  Jason, I
think expand_thunk should always inherit optimization/target attributes from
the function it is associated with, right?

Bootstrapped/regtested x86_64-linux.

Honza

* cgraphunit.c (analyze_functions): Use opt_for_fn.

Index: cgraphunit.c
===
--- cgraphunit.c(revision 217633)
+++ cgraphunit.c(working copy)
@@ -1001,7 +1001,7 @@ analyze_functions (void)
  for (edge = cnode->callees; edge; edge = edge->next_callee)
if (edge->callee->definition)
   enqueue_node (edge->callee);
- if (optimize && flag_devirtualize)
+ if (optimize && opt_for_fn (cnode->decl, flag_devirtualize))
{
  cgraph_edge *next;
 


[PATCH] gcc/ubsan.c: Extend 'pretty_name' space to avoid memory overflow

2014-11-16 Thread Chen Gang
According to the code that follows, 'pretty_name' may need more than 16
additional bytes.  For simplicity and future extensibility, extend it to
256 bytes directly.

It passes testsuite under fedora 20 x86_64-unknown-linux-gnu.


2014-11-17  Chen Gang  

* ubsan.c (ubsan_type_descriptor): Extend 'pretty_name' space to
avoid memory overflow.
---
 gcc/ubsan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ubsan.c b/gcc/ubsan.c
index 41cf546..12b05cd 100644
--- a/gcc/ubsan.c
+++ b/gcc/ubsan.c
@@ -376,7 +376,7 @@ ubsan_type_descriptor (tree type, enum ubsan_print_style 
pstyle)
 tname = "";
 
   /* Decorate the type name with '', '*', "struct", or "union".  */
-  pretty_name = (char *) alloca (strlen (tname) + 16 + deref_depth);
+  pretty_name = (char *) alloca (strlen (tname) + 256 + deref_depth);
   if (pstyle == UBSAN_PRINT_POINTER)
 {
   int pos = sprintf (pretty_name, "'%s%s%s%s%s%s%s",
-- 
1.9.3


Re: [PATCH 0/3][AArch64]More intrinsics/builtins improvements

2014-11-16 Thread Yangfei (Felix)
> These three are logically independent, but all on a common theme, and I've
> tested them all together by
> 
> bootstrapped + check-gcc on aarch64-none-elf cross-tested check-gcc on
> aarch64_be-none-elf
> 
> Ok for trunk?


Hi Alan,

It seems that we are duplicating the work for the vld1_dup part. (Refer to 
my message: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01462.html) 
I have a plan to improve these intrinsics/builtins:  vrsubhnX, vrsqrtX, 
vqrdmulX, vqmovX, vqdmulhqX, vqdmulhX, vpminX, vpmaxX, vpaddX, vpadaX
vmvnX, vmulxX, vmovnX, vmlsX, 
vhsubX, vcvtX, vcopyX, vaddlvX, vabX, vfmX, vrecpeX, vcntX, vclsX
And work for these intrinsics is in progress:  vfmX, vrecpeX, vhsubX, 
vcntX, vclsX
Please let me know if you guys want to work on any of them.  Thanks.



Re: PATCH [4 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-16 Thread David Edelsohn
On Tue, Nov 11, 2014 at 8:07 PM, Michael Meissner
 wrote:
> This patch sets up some of the support that will be needed in the next patch,
> and updates the debug functions.  It also adds checks to make sure the upper
> regs support has the other options enabled.  Is this patch acceptable to be
> checked in once the PowerPC bootstraps?
>
> 2014-11-11  Michael Meissner  
>
> * config/rs6000/rs6000.c (RELOAD_REG_AND_M16): Add support for
> Altivec style vector loads that ignore the bottom 3 bits of the
> address.
> (rs6000_debug_addr_mask): New function to print the addr_mask
> values if debugging.
> (rs6000_debug_print_mode): Call rs6000_debug_addr_mask to print
> out addr_mask.
> (rs6000_setup_reg_addr_masks): Add support for Altivec style
> vector loads that ignore the bottom 3 bits of the address.
> (rs6000_init_hard_regno_mode_ok): Rework DFmode support if
> -mupper-regs-df.  Add support for -mupper-regs-sf.  Rearrange code
> placement for direct move support.
> (rs6000_option_override_internal): Add checks for -mupper-regs-df
> requiring -mvsx, and -mupper-regs-sf requiring -mpower8-vector.
> (rs6000_secondary_reload_fail): Add ATTRIBUTE_NORETURN.

Okay.

Thanks, David


Re: PATCH [5 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-16 Thread David Edelsohn
On Tue, Nov 11, 2014 at 8:16 PM, Michael Meissner
 wrote:
> This is the big patch that enables the upper regs support.  It reorganizes the
> secondary reload handler to try and make it easier to understand, by having a
> variable that says it is done, rather than using cascading if statements.  The
> secondary reload inner function (which is called from the reload helper
> functions with a base scratch register) has been reworked quite a bit.
>
> I also discovered that we have two peephole2's that try to reduce SF->SF and
> DF->DF moves.  Unfortunately, this breaks the use of a traditional floating
> point register to reload data in/out of an Altivec register.  At some future
> point, I would like to revisit this, but it is needed to enable the upper regs
> support.
>
> I don't believe this will affect the non-server PowerPC ports, since the 
> reload
> handlers are only enabled under VSX.  However, it would be nice if other
> PowerPC folk can apply these patches and make sure there are no regressions.
>
> Is this patch ok to check in?
>
> 2014-11-11  Michael Meissner  
> Ulrich Weigand  
>
> * config/rs6000/rs6000.c (rs6000_secondary_reload_toc_costs):
> Helper function to identify costs of a TOC load for secondary
> reload support.
> (rs6000_secondary_reload_memory): Helper function for secondary
> reload, to determine if a particular memory operation is directly
> handled by the hardware, or if it needs support from secondary
> reload to create a valid address.
> (rs6000_secondary_reload): Rework code, to be clearer.  If the
> appropriate -mupper-regs-{sf,df} is used, use FPR registers to
> reload scalar values, since the FPR registers have D-form
> addressing. Move most of the code handling memory to the function
> rs6000_secondary_reload_memory, and use the reg_addr structure to
> determine what type of address modes are supported.  Print more
> debug information if -mdebug=addr.
> (rs6000_secondary_reload_inner): Rework entire function to be more
> general.  Use the reg_addr bits to determine what type of
> addressing is supported.
> (rs6000_preferred_reload_class): Rework.  Move constant handling
> into a single place.  Prefer using FLOAT_REGS for scalar floating
> point.
> (rs6000_secondary_reload_class): Use a FPR register to move a
> value from an Altivec register to a GPR, and vice versa.  Move VSX
> handling above traditional floating point.
>
> * config/rs6000/rs6000.md (mov_hardfloat, FMOVE32 case):
> Delete some spaces in the constraints.
> (DF->DF move peephole2): Disable if -mupper-regs-{sf,df} to
> allow using FPR registers to load/store an Altivec register for
> scalar floating point types.
> (SF->SF move peephole2): Likewise.

Okay,

Thanks, David


Re: PATCH [6 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-16 Thread David Edelsohn
On Tue, Nov 11, 2014 at 8:19 PM, Michael Meissner
 wrote:
> This patch documents the previously undocumented -mupper-regs-df and
> -mupper-regs-sf switches.  It also provides feature test macros that users can
> use to determine if the upper register support is enabled.
>
> Once the prevous patches have gone in, is this patch ok to install?
>
> 2014-11-11  Michael Meissner  
>
> * config/rs6000/rs6000.opt (-mupper-regs-df): Make option public.
> (-mupper-regs-sf): Likewise.
>
> * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
> __UPPER_REGS_DF__ if -mupper-regs-df.  Define __UPPER_REGS_SF__ if
> -mupper-regs-sf.
>
> * doc/invoke.texi (RS/6000 and PowerPC Options): Document
> -mupper-regs-{sf,df}.

Okay.

Thanks, David


Re: PATCH [7 of 7], rs6000, add support for scalar floating point in Altivec registers

2014-11-16 Thread David Edelsohn
On Tue, Nov 11, 2014 at 8:22 PM, Michael Meissner
 wrote:
> This patch adds 2 tests to the testsuite to make sure the -mupper-regs-df and
> -mupper-regs-sf options work, and you can generate add, subtract, multiply,
> divide, and compare instructions on scalars living in the Altivec registers.  
> I
> also fixed the p8vector-ldst.c test, which has been broken for some time (this
> test was a preliminary test for the upper regs support).
>
> Assuming the previous patches are checked in, is this patch ok to install?
>
> 2014-11-11  Michael Meissner  
>
> * gcc.target/powerpc/p8vector-ldst.c: Rewrite to use 40 live
> floating point variables instead of using asm to test allocating
> values to the Altivec registers.
>
> * gcc.target/powerpc/upper-regs-sf.c: New -mupper-regs-sf and
> -mupper-regs-df tests.
> * gcc.target/powerpc/upper-regs-df.c: Likewise.

Okay.

Thanks, David


Re: PATCH [8 of 8], rs6000, add support for scalar floating point in Altivec registers

2014-11-16 Thread David Edelsohn
On Fri, Nov 14, 2014 at 3:16 PM, Michael Meissner
 wrote:
> I tracked down the regression in the spec benchmarks, and it was due to 
> turning
> off pre-increment/pre-decrement for floating point values, and these two
> benchmarks use pre-increment/pre-decrement quite a bit.  My secondary reload
> handlers are capable of adding in the pre-increment/pre-decrement if such an
> operation is attempted on an Altivec register.
>
> I am also including a patch to make the compiler work with -ffast-math.  If 
> you
> use -ffast-math, the easy_fp_constant predicate says that all constants are
> easy in order to enable using the reciprocal approximation instructions for
> division.  I put in a define_split to move the constants to the constant pool
> after the reciprocal approximation work has been done but before reload
> starts.  I had had this patch in when I was doing the development, but I
> thought I did not need it when making up the patches, but perhaps recent
> changes to the register allocator need it again.
>
> I added an option (-mupper-regs) to simplify setting both -mupper-regs-sf and
> -mupper-regs-df.  It will only set the options that the particular machine
> supports.
>
> Finally, I made the default to turn on -mupper-regs-df on power7/power8
> systems, and -mupper-regs-sf on power8 systems.  I have run the regression 
> test
> suite with these options on, and there were no regressions.  Once all of the
> other patches go in, can I check in these patches?
>
> If you would prefer the default for GCC 5.0 not to enable the upper register
> support, let me know, and I can remove the lines in rs6000-cpu.def that sets
> the default.
>
> 2014-11-14  Michael Meissner  
>
> * config/rs6000/predicates.md (memory_fp_constant): New predicate
> to return true if the operand is a floating point constant that
> must be put into the constant pool, before register allocation
> occurs.
>
> * config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Enable
> -mupper-regs-df by default.
> (ISA_2_7_MASKS_SERVER): Enable -mupper-regs-sf by default.
> (POWERPC_MASKS): Add -mupper-regs-{sf,df} as options set by the
> various -mcpu=... options.
> (power7 cpu): Enable -mupper-regs-df by default.
>
> * config/rs6000/rs6000.opt (-mupper-regs): New combination option
> that sets -mupper-regs-sf and -mupper-regs-df by default if the
> cpu supports the instructions.
>
> * config/rs6000/rs6000.c (rs6000_setup_reg_addr_masks): Allow
> pre-increment and pre-decrement on floating point, even if the
> -mupper-regs-{sf,df} options were used.
> (rs6000_option_override_internal): If -mupper-regs, set both
> -mupper-regs-sf and -mupper-regs-df, depending on the underlying
> cpu.
>
> * config/rs6000/rs6000.md (DFmode splitter): Add a define_split to
> move floating point constants to the constant pool before register
> allocation.  Normally constants are put into the pool immediately,
> but -ffast-math delays putting them into the constant pool for the
> reciprocal approximation support.
> (SFmode splitter): Likewise.
>
> * doc/invoke.texi (RS/6000 and PowerPC Options): Document
> -mupper-regs.

Okay.

Thanks, David


[PATCH, ARM] Constrain the small multiply test cases to be more restrictive.

2014-11-16 Thread Hale Wang
Hi,

Refer to the previous small multiply patch (r217175).

The conditions in the small multiply test cases are not restrictive enough.
If -march=armv4t/armv5t is forced, these cases will fail.
These cases can be used only if
"-mcpu=cortex-m0/m1/m0plus.small-multiply" is specified.

This patch is used to fix this issue.

These cases will be skipped if we don't define
"-mcpu=cortex-m0/m1/m0plus.small-multiply", so they have no influence on
other targets.

Build gcc passed. Is it OK for trunk?

Thanks and Best Regards,
Hale Wang

gcc/testsuite/ChangeLog:

2014-11-13  Hale Wang  

* gcc.target/arm/small-multiply-m0-1.c: Only apply when
" -mcpu=cortex-m0/m1/m0plus.small-multiply ".
* gcc.target/arm/small-multiply-m0-2.c: Likewise.
* gcc.target/arm/small-multiply-m0-3.c: Likewise.
* gcc.target/arm/small-multiply-m0plus-1.c: Likewise.
* gcc.target/arm/small-multiply-m0plus-2.c: Likewise.
* gcc.target/arm/small-multiply-m0plus-3.c: Likewise.
* gcc.target/arm/small-multiply-m1-1.c: Likewise.
* gcc.target/arm/small-multiply-m1-2.c: Likewise.
* gcc.target/arm/small-multiply-m1-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
b/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
index 77ec603..49132e3 100644
--- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
+++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb1_ok } */
-/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* }
{ "-mcpu=*" } { "-mcpu=cortex-m0.small-multiply" } } */
+/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* }
{ "*" } { "-mcpu=cortex-m0.small-multiply" } } */
 /* { dg-options "-mcpu=cortex-m0.small-multiply -mthumb -O2" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
b/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
index c89b3ba..7f1bf7b 100644
--- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
+++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb1_ok } */
-/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* }
{ "-mcpu=*" } { "-mcpu=cortex-m0.small-multiply" } } */
+/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* }
{ "*" } { "-mcpu=cortex-m0.small-multiply" } } */
 /* { dg-options "-mcpu=cortex-m0.small-multiply -mthumb -Os" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
b/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
index b2df109..aca39d7 100644
--- a/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
+++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb1_ok } */
-/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* }
{ "-mcpu=*" } { "-mcpu=cortex-m0.small-multiply" } } */
+/* { dg-skip-if "Test is specific to cortex-m0.small-multiply" { arm*-*-* }
{ "*" } { "-mcpu=cortex-m0.small-multiply" } } */
 /* { dg-options "-mcpu=cortex-m0.small-multiply -mthumb -Os" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
index 08a450b..12e8839 100644
--- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
+++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb1_ok } */
-/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" {
arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0plus.small-multiply" } } */
+/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" {
arm*-*-* } { "*" } { "-mcpu=cortex-m0plus.small-multiply" } } */
 /* { dg-options "-mcpu=cortex-m0plus.small-multiply -mthumb -O2" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
index 17b52d3..3e3c9b2 100644
--- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
+++ b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_thumb1_ok } */
-/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" {
arm*-*-* } { "-mcpu=*" } { "-mcpu=cortex-m0plus.small-multiply" } } */
+/* { dg-skip-if "Test is specific to cortex-m0plus.small-multiply" {
arm*-*-* } { "*" } { "-mcpu=cortex-m0plus.small-multiply" } } */
 /* { dg-options "-mcpu=cortex-m0plus.small-multiply -mthumb -Os" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c
b/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c
index af69c75..75e3432 100644
--- a/gcc/testsuite/gcc.target/arm/small-multiply-m0plus-3.c
+++ b/gcc/testsuite/gcc.target/arm/smal

Re: Updated LRA rematerialization patch has been committed

2014-11-16 Thread Andrew Pinski
On Wed, Nov 12, 2014 at 7:08 PM, Vladimir Makarov  wrote:
>   After submitting LRA rematerialization patch, I got a lot of
> feedback.  Some people reported performance degradation and pointed me
> out the most important problem which looks like
>
>   p0 <- p1 + p2             p0 <- p1 + p2
>   spilled_pseudo <- p0      spilled_pseudo <- p0
>
>   ... some code      =>     ... some code
>
>   p3 <- spilled_pseudo      p3 <- p1 + p2
>
>   The first 2 insns were not removed and the second one became a dead
> store.  It was hard to fix as LRA (and reload pass) does mostly local
> transformations (in BB or EBB).  It could be fixed in BB or EBB.  But
> some important cases (e.g. the code between is a loop) still will be
> missed and will result in the same problem.  To fix this in right way,
> we needed to update the global live info.
>
>  A recent Intel project on reuse of pic register came to problems
> which need a global live analysis too in LRA to fix them.  For
> rematerialization, it is a matter of better code generation but for
> reuse of pic register project it is a matter of correct code
> generation.
>
>   So the last two weeks I worked on global live analysis in LRA and
> submitted the patch 3 days ago.  The rematerialization patch here is
> mostly the same I sent month ago.  I added only small changes to adapt
> it to global live analysis and fix some tests failures I found on
> ppc64.
>
>   The patch with live analysis generates smaller and a better code
> than before.  Last time I reported only 1% SPECFP2000 improvement on
> ARM.  Now I see about 0.4% SPECFP2000 improvement on x86-64 too.


Hi Vlad,
  I find this miscompiles glibc on aarch64.
I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63906 to record this issue.
My analysis so far has found that it is rematerializing the new value of
sp where the old value of sp is needed after an alloca.

Thanks,
Andrew Pinski

>
>   So I've committed the rematerialization patch to the trunk as rev. 217458.
>
>   As I wrote its initial version of rematerialziation.  Other
> people and me proposed several ideas how to improve it in the future.
>
> 2014-11-12  Vladimir Makarov  
>
> * common.opt (flra-remat): New.
> * opts.c (default_options_table): Add entry for flra_remat.
> * timevar_def (TV_LRA_REMAT): New.
> * doc/invoke.texi (-flra-remat): Add description of the new
> option.
> * doc/passes.texi (-flra-remat): Remove lra-equivs.c and
> lra-saves.c.  Add lra-remat.c.
> * Makefile.in (OBJS): Add lra-remat.o.
> * lra-remat.c: New file.
> * lra.c: Add info about the rematerialization pass in the top
> comment.
> (collect_non_operand_hard_regs, add_regs_to_insn_regno_info):
> Process unallocatable regs too.
> (lra_constraint_new_insn_uid_start): Remove.
> (lra): Add code for calling rematerialization sub-pass.
> * lra-int.h (lra_constraint_new_insn_uid_start): Remove.
> (lra_constrain_insn, lra_remat): New prototypes.
> (lra_eliminate_regs_1): Add parameter.
> * lra-lives.c (make_hard_regno_born, make_hard_regno_dead):
> Process unallocatable hard regs too.
> (process_bb_lives): Ditto.
> * lra-spills.c (remove_pseudos): Add argument to
> lra_eliminate_regs_1 call.
> * lra-eliminations.c (lra_eliminate_regs_1): Add parameter.  Use it
> for sp offset calculation.
> (lra_eliminate_regs): Add argument for lra_eliminate_regs_1 call.
> (eliminate_regs_in_insn): Add parameter.  Use it for sp offset
> calculation.
> (process_insn_for_elimination): Add argument for
> eliminate_regs_in_insn call.
> * lra-constraints.c (get_equiv_with_elimination):  Add argument
> for lra_eliminate_regs_1 call.
> (process_addr_reg): Add parameter.  Use it.
> (process_address_1): Ditto.  Add argument for process_addr_reg
> call.
> (process_address): Ditto.
> (curr_insn_transform): Add parameter.  Use it.  Add argument for
> process_address calls.
> (lra_constrain_insn): New function.
> (lra_constraints): Add argument for curr_insn_transform call.
>
>
>


[PATCH] gcc/ira-conflicts.c: avoid conflict obj compare with itself

2014-11-16 Thread Zhouyi Zhou
From: Zhouyi Zhou 

In the function build_conflict_bit_table, id is set in objects_live before
that sparseset is traversed, so obj is needlessly compared with itself
during the traversal.
The comparison of obj with itself can be avoided by moving
sparseset_set_bit (objects_live, id) to after the traversal.

I have no write access to the gcc repository, and I can't provide a testcase
because the improvement has no effect on the compiler's output.

Bootstrapped and regtested on x86_64 Linux 
Signed-off-by: Zhouyi Zhou 
---
 gcc/ChangeLog   |4 
 gcc/ira-conflicts.c |2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d385e33..3f4b14e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2014-11-17  Zhouyi Zhou  
+
+   * ira-conflicts.c (build_conflict_bit_table): Avoid comparing obj
+   with itself.
+   
 2014-11-16  Jan Hubicka  
 
* ipa-polymorphic-call.c
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 7aaf0cb..cccdb6b 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -177,7 +177,6 @@ build_conflict_bit_table (void)
  gcc_assert (id < ira_objects_num);
 
  aclass = ALLOCNO_CLASS (allocno);
- sparseset_set_bit (objects_live, id);
  EXECUTE_IF_SET_IN_SPARSESET (objects_live, j)
{
  ira_object_t live_obj = ira_object_id_map[j];
@@ -191,6 +190,7 @@ build_conflict_bit_table (void)
  record_object_conflict (obj, live_obj);
}
}
+ sparseset_set_bit (objects_live, id);
}
 
   for (r = ira_finish_point_ranges[i]; r != NULL; r = r->finish_next)
-- 
1.7.10.4



Re: Follow-up to PR51471

2014-11-16 Thread Jeff Law

On 11/15/14 14:37, Matthew Fortune wrote:

Eric Botcazou  writes:

IIRC, fill_eager and its related friends are all speculative in some

way

and aren't those precisely the ones that are causing us problems.

Also

note we have backends working around this stuff in fairly blunt ways:


I'd say that the PA back-end went a bit too far here, especially if it
marks some insns of the epilogue as frame-related.  dwarf2cfi.c has
special code to handle delay slots (SEQUENCEs) so it's not an all-or-
nothing game.


Given architectural difficulties of delay slots on modern processors,
would it be that painful to just not allow filling slots with frame
insns and let dbr try to find something else or drop in a nop?  I
wouldn't be all that surprised if there wasn't a measurable
performance difference on something like a modern Sparc.


Yes, modern SPARCs have (short) branches without delay slots.  But the
other big contender is MIPS here and the story might be different for
it.


MIPSr6 introduces 'compact' branches which do not have delay slots.

So the issues of filling delay slots will be less important from R6
onwards. However, delay slots remain important for now.

I haven't thought about the problem much but instinctively I'd be surprised
if a blanket restriction on frame-related instructions would lead to lots
of NOPs in delay slots.
Possibly.  I'd be surprised if frame-related stuff is used that often
for filling slots...  Combine that with the decreased importance of
filling delay slots where they exist, and I wouldn't be terribly surprised
if nobody could actually measure the change if we were to make it.


The PA port may have gone too far, but it's certainly conservatively 
correct and on every PA processor that "matters" (for a very liberal 
definition of matters), I doubt the difference is measurable due to the 
depth of the reorder buffers and the fact that a nop can retire anytime 
that's convenient.


Jeff


[PING][PATCH] [AARCH64, NEON] Improve vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics

2014-11-16 Thread Yangfei (Felix)
PING?  
BTW: It seems that Alan's way of improving vld1(q?)_dup intrinsic is more 
elegant.  
So is the improvement of vcls(q?) vcnt(q?) OK for trunk?  Thanks.  


> 
> Hi,
> This patch converts vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics to use
> builtin functions instead of the previous inline assembly syntax.
> Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious
> testsuite of Christophe Lyon.
> OK for the trunk?
> 
> 
> Index: gcc/ChangeLog
> =
> ==
> --- gcc/ChangeLog (revision 217394)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,21 @@
> +2014-11-13  Felix Yang  
> + Jiji Jiang  
> + Shanyao Chen  
> +
> + * config/aarch64/aarch64-simd-builtins.def (clrsb, popcount, ld1r): New
> + builtins.
> + * config/aarch64/aarch64-simd.md (aarch64_ld1r): New expand.
> + (clrsb2, popcount2): New patterns.
> + (*aarch64_simd_ld1r): Renamed to aarch64_simd_ld1r.
> + * config/aarch64/arm_neon.h (vcls_s8, vcls_s16, vcls_s32, vclsq_s8,
> + vclsq_s16, vclsq_s32, vcnt_p8, vcnt_s8, vcnt_u8, vcntq_p8, vcntq_s8,
> + vcntq_u8, vld1_dup_f32, vld1_dup_f64, vld1_dup_p8, vld1_dup_p16,
> + vld1_dup_s8, vld1_dup_s16, vld1_dup_s32, vld1_dup_s64, vld1_dup_u8,
> + vld1_dup_u16, vld1_dup_u32, vld1_dup_u64, vld1q_dup_f32,
> vld1q_dup_f64,
> + vld1q_dup_p8, vld1q_dup_p16, vld1q_dup_s8, vld1q_dup_s16,
> vld1q_dup_s32,
> + vld1q_dup_s64, vld1q_dup_u8, vld1q_dup_u16, vld1q_dup_u32,
> + vld1q_dup_u64): Rewrite using builtin functions.
> +
>  2014-11-11  Andrew Pinski  
> 
>   Bug target/61997
> Index: gcc/config/aarch64/arm_neon.h
> =
> ==
> --- gcc/config/aarch64/arm_neon.h (revision 217394)
> +++ gcc/config/aarch64/arm_neon.h (working copy)
> @@ -5317,138 +5317,6 @@ vaddlvq_u32 (uint32x4_t a)
>return result;
>  }
> 
> -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
> -vcls_s8 (int8x8_t a)
> -{
> -  int8x8_t result;
> -  __asm__ ("cls %0.8b,%1.8b"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
> -vcls_s16 (int16x4_t a)
> -{
> -  int16x4_t result;
> -  __asm__ ("cls %0.4h,%1.4h"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
> -vcls_s32 (int32x2_t a)
> -{
> -  int32x2_t result;
> -  __asm__ ("cls %0.2s,%1.2s"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
> -vclsq_s8 (int8x16_t a)
> -{
> -  int8x16_t result;
> -  __asm__ ("cls %0.16b,%1.16b"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
> -vclsq_s16 (int16x8_t a)
> -{
> -  int16x8_t result;
> -  __asm__ ("cls %0.8h,%1.8h"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
> -vclsq_s32 (int32x4_t a)
> -{
> -  int32x4_t result;
> -  __asm__ ("cls %0.4s,%1.4s"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
> -vcnt_p8 (poly8x8_t a)
> -{
> -  poly8x8_t result;
> -  __asm__ ("cnt %0.8b,%1.8b"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
> -vcnt_s8 (int8x8_t a)
> -{
> -  int8x8_t result;
> -  __asm__ ("cnt %0.8b,%1.8b"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
> -vcnt_u8 (uint8x8_t a)
> -{
> -  uint8x8_t result;
> -  __asm__ ("cnt %0.8b,%1.8b"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
> -vcntq_p8 (poly8x16_t a)
> -{
> -  poly8x16_t result;
> -  __asm__ ("cnt %0.16b,%1.16b"
> -   : "=w"(result)
> -   : "w"(a)
> -   : /* No clobbers */);
> -  return result;
> -}
> -
> -__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
> -vcntq_s8 (int8x16_t a)
> -{
> -  int8x16_t result;
> -  __asm__ ("cnt %0.16b,%1.16b"
> -   : "=w"(result)
> -   : "w

Re: [PATCH] gcc/ubsan.c: Extend 'pretty_name' space to avoid memory overflow

2014-11-16 Thread Marek Polacek
On Mon, Nov 17, 2014 at 06:40:26AM +0800, Chen Gang wrote:
> According to the code that follows, 'pretty_name' may need more than 16
> additional bytes.  To keep the reasoning simple and leave room for future
> extension, extend it to 256 bytes directly.

I think + 128 bytes should be enough for everyone.

Marek


Re: PATCH: PR bootstrap/63888: [5 Regression] bootstrap failed when configured with -with-build-config=bootstrap-asan --disable-werror

2014-11-16 Thread Yury Gribov

On 11/15/2014 09:34 PM, H.J. Lu wrote:

GCC uses xstrndup/xstrdup throughout the source tree and that memory
may not be freed explicitly before exit.  LeakSanitizer isn't very
useful here.  This patch suppresses LeakSanitizer in bootstrap.  OK
for trunk?


Right, I think until now everyone just did the same manually.  I wonder 
if it makes sense to also enable more aggressive checking e.g. 
detect_stack_use_after_return and check_initialization_order.


-Y
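
For reference, both the leak suppression and the stricter checks mentioned
above can be expressed through ASan's runtime options.  A minimal sketch
(the option names are ASan's own; the configure line is illustrative, not
taken from the patch):

```shell
# Suppress leak reports for allocations (e.g. from xstrdup/xstrndup)
# that are intentionally never freed before exit.  LeakSanitizer is
# bundled with AddressSanitizer, so this disables only the leak pass.
export ASAN_OPTIONS=detect_leaks=0

# Alternatively, enable the more aggressive checking discussed above:
# export ASAN_OPTIONS=detect_stack_use_after_return=1:check_initialization_order=1

# Then run the ASan bootstrap as usual (illustrative invocation):
# ../gcc/configure --with-build-config=bootstrap-asan --disable-werror
# make
```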



Re: [PATCH] Fix minimal alignment calculation for user-aligned types (PR63802)

2014-11-16 Thread Jakub Jelinek
On Fri, Nov 14, 2014 at 06:15:16PM +, Joseph Myers wrote:
> On Fri, 14 Nov 2014, Jakub Jelinek wrote:
> 
> > On Fri, Nov 14, 2014 at 09:46:14AM +0300, Yury Gribov wrote:
> > > Hi all,
> > > 
> > > This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63802 by 
> > > only
> > > limiting minimal type alignment with BIGGEST_ALIGNMENT for types with no
> > > __attribute__((aligned)).
> > > 
> > > Bootstrapped and regtested on x64.  Ok for trunk?
> > 
> > The function is primarily used by the C FE for _Alignas, and I have no idea
> > if such a change is desirable for that very much user visible case.  Joseph?
> 
> If it is true that a type satisfying TYPE_USER_ALIGN will never be 
> allocated at an address less-aligned than its TYPE_ALIGN, even if that's 
> greater than BIGGEST_ALIGNMENT, then the change seems correct for C11 
> _Alignof.

I think it depends on which target and where the object lives.
In structs (unless packed) the user-aligned fields should be properly
aligned with respect to the start of the struct, and the struct should have
the user alignment in that case.  Automatic vars these days use alloca with
realignment if not handled better by the target, so they should be fine too.
For data-section vars and for common vars I think it really depends on the
target.  Perhaps for TYPE_USER_ALIGN use the minimum of TYPE_ALIGN and
MAX_OFILE_ALIGNMENT?
For heap objects it really depends on how the object has been allocated,
but if allocated through malloc, the extra alignment is never guaranteed.
So it really depends on whether malloc counts or not.

Jakub


Re: [PATCH] gcc/ubsan.c: Extend 'pretty_name' space to avoid memory overflow

2014-11-16 Thread Jakub Jelinek
On Mon, Nov 17, 2014 at 08:16:32AM +0100, Marek Polacek wrote:
> On Mon, Nov 17, 2014 at 06:40:26AM +0800, Chen Gang wrote:
> > According to the code that follows, 'pretty_name' may need more than 16
> > additional bytes.  To keep the reasoning simple and leave room for future
> > extension, extend it to 256 bytes directly.
> 
> I think + 128 bytes should be enough for everyone.

I disagree.
Consider:
typedef char 
A[1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1][1];
A a;

int foo (int j)
{
  
a[j][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0][0]
 = 1;
}
You need 1159 bytes in that case.  It is easy to construct a testcase that
needs an arbitrary amount.
I think the easiest fix would be to rewrite the code so that it uses a
pretty_printer to construct the string; grep asan.c for asan_pp.  Or an
obstack, but then you don't have a printer to print integers into it easily.
  if (dom && TREE_CODE (TYPE_MAX_VALUE (dom)) == INTEGER_CST)
pos += sprintf (&pretty_name[pos], HOST_WIDE_INT_PRINT_DEC,
tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1);
  else
/* ??? We can't determine the variable name; print VLA unspec.  */
pretty_name[pos++] = '*';
looks wrong anyway, as not all integers fit into a uhwi.
I guess you could use wide_int to add 1 there and pp_wide_int.

Jakub