date:20140507

Re: we are starting the wide int merge

2014-05-07 Thread Jan-Benedict Glaw

On Tue, 2014-05-06 12:20:54 -0700, Mike Stump  wrote:
> On May 6, 2014, at 8:19 AM, Kenneth Zadeck  wrote:
> > please hold off on committing patches for the next couple of hours as we 
> > have a very large merge to do.
> > thanks.
> 
> All done…  It is in.

Just found one more:

g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic 
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
-DHAVE_CONFIG_H -I. -I. -I/home/vaxbuild/repos/gcc/gcc 
-I/home/vaxbuild/repos/gcc/gcc/. -I/home/vaxbuild/repos/gcc/gcc/../include 
-I/home/vaxbuild/repos/gcc/gcc/../libcpp/include  
-I/home/vaxbuild/repos/gcc/gcc/../libdecnumber 
-I/home/vaxbuild/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/vaxbuild/repos/gcc/gcc/../libbacktrace -o loop-iv.o -MT loop-iv.o 
-MMD -MP -MF ./.deps/loop-iv.TPo /home/vaxbuild/repos/gcc/gcc/loop-iv.c
In file included from /home/vaxbuild/repos/gcc/gcc/real.h:25:0,
 from /home/vaxbuild/repos/gcc/gcc/rtl.h:27,
 from /home/vaxbuild/repos/gcc/gcc/loop-iv.c:54:
/home/vaxbuild/repos/gcc/gcc/wide-int.h: In instantiation of 
‘fixed_wide_int_storage::fixed_wide_int_storage(const T&) [with T = long 
long unsigned int; int N = 160]’:
/home/vaxbuild/repos/gcc/gcc/wide-int.h:724:15:   required from 
‘generic_wide_int::generic_wide_int(const T&) [with T = long long unsigned 
int; storage = fixed_wide_int_storage<160>]’
/home/vaxbuild/repos/gcc/gcc/loop-iv.c:2628:48:   required from here
/home/vaxbuild/repos/gcc/gcc/wide-int.h:1172:45: error: incomplete type 
‘wi::int_traits’ used in nested name specifier
   WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
 ^
/home/vaxbuild/repos/gcc/gcc/wide-int.h:1173:47: error: incomplete type 
‘wi::int_traits’ used in nested name specifier
   wi::copy (*this, WIDE_INT_REF_FOR (T) (x, N));
   ^
make[1]: *** [loop-iv.o] Error 1






This happened for bfin-elf, see 
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=220155

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
 Signature of:Don't believe in miracles: Rely on them!
 the second  :



Re: we are starting the wide int merge

2014-05-07 Thread Andreas Schwab

Christophe Lyon  writes:

> It also looks like the git-svn-id property is now wrong/incomplete.
> For instance, commit 9a5942c1d4d9116ab74b0741cfe3894a89fd17fb has:
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int@201706
> 138bc75d-0d04-0410-961f-82ee72b054a4
>
> How does it map to the SVN commit in trunk?

This is a commit on the wide-int branch (the one that created it).

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[PATCH, nds32] Committed: Enable HONOR_REG_ALLOC_ORDER when optimizing for size.

2014-05-07 Thread Chung-Ju Wu

Hi, all,

There was a patch to have HONOR_REG_ALLOC_ORDER using C expression:
  http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01546.html
  http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00048.html

This is very helpful for the nds32 port, since we can now decide when to
apply HONOR_REG_ALLOC_ORDER based on the code size versus performance
trade-off.  Currently, HONOR_REG_ALLOC_ORDER only benefits code size in
the nds32 port.
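
To illustrate the point, here is a minimal sketch (not nds32 code).  It
assumes a local stand-in for the compiler's optimize_size flag, and the
helpers keep_port_order and keep_port_order_for are hypothetical.  Once the
macro is a C expression, the choice is made per compilation instead of
being fixed when the port is built:

```c
/* Stand-in for GCC's global flag; in the compiler it is set by -Os.  */
static int optimize_size;

/* As in the patch: the macro is an expression, not the constant 1.  */
#define HONOR_REG_ALLOC_ORDER optimize_size

/* Hypothetical consumer: returns 1 when the port-defined register order
   is honored, 0 when the allocator may use its own cost calculations.  */
static int
keep_port_order (void)
{
  return HONOR_REG_ALLOC_ORDER != 0;
}

/* Hypothetical helper that mimics compiling with or without -Os.  */
static int
keep_port_order_for (int size_opt)
{
  optimize_size = size_opt;
  return keep_port_order ();
}
```

With this shape, keep_port_order_for (1) honors the port order and
keep_port_order_for (0) does not, which is the trade-off described above.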

ChangeLog and patch are as below, committed as Rev.210137:


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 210135)
+++ gcc/ChangeLog   (revision 210137)
@@ -1,3 +1,8 @@
+2014-05-07  Chung-Ju Wu  
+
+   * config/nds32/nds32.h (HONOR_REG_ALLOC_ORDER): Have it in favor
+   of using optimize_size.
+
 2014-05-06  Mike Stump  

* wide-int.h (wi::int_traits ): Always define.

Index: gcc/config/nds32/nds32.h
===
--- gcc/config/nds32/nds32.h   (revision 210135)
+++ gcc/config/nds32/nds32.h   (revision 210137)
@@ -553,7 +553,7 @@

 /* Tell IRA to use the order we define rather than messing it up with its
own cost calculations.  */
-#define HONOR_REG_ALLOC_ORDER 1
+#define HONOR_REG_ALLOC_ORDER optimize_size

 /* The number of consecutive hard regs needed starting at
reg "regno" for holding a value of mode "mode".  */


Best regards,
jasonwucj

[PATCH][4.7] Fix PR57864

2014-05-07 Thread Richard Biener


This backports a piece of

2012-09-24  Richard Guenther  

   * tree-ssa-pre.c (bitmap_find_leader, create_expression_by_pieces,
   find_or_generate_expression): Remove dominating stmt argument.
   (find_leader_in_sets, phi_translate_1, bitmap_find_leader,
   create_component_ref_by_pieces_1, create_component_ref_by_pieces,
   do_regular_insertion, do_partial_partial_insertion): Adjust.
   (compute_avail): Do not set uids.

to the 4.7 branch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to
the branch (and the testcase added to 4.8, 4.9 and trunk).

Richard.

2014-05-06  Richard Biener  

PR tree-optimization/57864
* tree-ssa-pre.c (phi_translate_1): Backport NAME case
simplification from mainline.  Do not lookup the VN
value-number here.

* gcc.dg/torture/pr57864.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 210104)
--- gcc/tree-ssa-pre.c  (working copy)
*** phi_translate_1 (pre_expr expr, bitmap_s
*** 1756,1794 
  
  case NAME:
{
-   gimple phi = NULL;
-   edge e;
-   gimple def_stmt;
tree name = PRE_EXPR_NAME (expr);
! 
!   def_stmt = SSA_NAME_DEF_STMT (name);
if (gimple_code (def_stmt) == GIMPLE_PHI
&& gimple_bb (def_stmt) == phiblock)
- phi = def_stmt;
-   else
- return expr;
- 
-   e = find_edge (pred, gimple_bb (phi));
-   if (e)
  {
!   tree def = PHI_ARG_DEF (phi, e->dest_idx);
!   pre_expr newexpr;
! 
!   if (TREE_CODE (def) == SSA_NAME)
! def = VN_INFO (def)->valnum;
  
/* Handle constant. */
if (is_gimple_min_invariant (def))
  return get_or_alloc_expr_for_constant (def);
  
!   if (TREE_CODE (def) == SSA_NAME && ssa_undefined_value_p (def))
! return NULL;
! 
!   newexpr = get_or_alloc_expr_for_name (def);
!   return newexpr;
  }
}
-   return expr;
  
  default:
gcc_unreachable ();
--- 1756,1781 
  
  case NAME:
{
tree name = PRE_EXPR_NAME (expr);
!   gimple def_stmt = SSA_NAME_DEF_STMT (name);
!   /* If the SSA name is defined by a PHI node in this block,
!  translate it.  */
if (gimple_code (def_stmt) == GIMPLE_PHI
&& gimple_bb (def_stmt) == phiblock)
  {
!   edge e = find_edge (pred, gimple_bb (def_stmt));
!   tree def = PHI_ARG_DEF (def_stmt, e->dest_idx);
  
/* Handle constant. */
if (is_gimple_min_invariant (def))
  return get_or_alloc_expr_for_constant (def);
  
!   return get_or_alloc_expr_for_name (def);
  }
+   /* Otherwise return it unchanged - it will get cleaned if its
+  value is not available in PREDs AVAIL_OUT set of expressions.  */
+   return expr;
}
  
  default:
gcc_unreachable ();
Index: gcc/testsuite/gcc.dg/torture/pr57864.c
===
*** gcc/testsuite/gcc.dg/torture/pr57864.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr57864.c  (working copy)
***
*** 0 
--- 1,37 
+ /* { dg-do compile } */
+ 
+ union U {
+ double val;
+ union U *ptr;
+ };
+ 
+ union U *d;
+ double a;
+ int b;
+ int c;
+ 
+ static void fn1(union U *p1, int p2, _Bool p3)
+ {
+ union U *e;
+ 
+ if (p2 == 0)
+   a = ((union U*)((unsigned long)p1 & ~1))->val;
+ 
+ if (b) {
+   e = p1;
+ } else if (c) {
+   e = ((union U*)((unsigned long)p1 & ~1))->ptr;
+   d = e;
+ } else {
+   e = 0;
+   d = ((union U*)0)->ptr;
+ }
+ 
+ fn1 (e, 0, 0);
+ fn1 (0, 0, p3);
+ }
+ 
+ void fn2 (void)
+ {
+   fn1 (0, 0, 0);
+ }

Re: [PATCH] Change HONOR_REG_ALLOC_ORDER to a marco for C expression

2014-05-07 Thread Chung-Ju Wu

2014-05-02 14:41 GMT+08:00 Kito Cheng :
> Hi Jeff:
>
>> I fixed up some minor whitespace issues and committed your patch.
>
> Thanks for your help :)

Hi,

I noticed the commit date in ChangeLog was incorrect for the patch.
Fixed it as obvious.  Committed into Rev.210138.

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 210137)
+++ gcc/ChangeLog   (revision 210138)
@@ -1092,7 +1092,7 @@

* doc/invoke.texi: Describe -fsanitize=float-divide-by-zero.

-2014-02-26  Kito Cheng  
+2014-05-02  Kito Cheng  

* defaults.h (HONOR_REG_ALLOC_ORDER): Change HONOR_REG_ALLOC_ORDER
to a C expression marco.


Best regards,
jasonwucj

patch1.diff updated + test results Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate

2014-05-07 Thread Svante Signell

On Tue, 2014-05-06 at 15:26 +0200, Samuel Thibault wrote:
> Svante Signell, le Tue 06 May 2014 15:25:38 +0200, a écrit :
> > On Tue, 2014-05-06 at 15:07 +0200, Samuel Thibault wrote:
> > > Svante Signell, le Tue 06 May 2014 15:05:20 +0200, a écrit :
> > > > On Tue, 2014-05-06 at 14:51 +0200, Samuel Thibault wrote:
> > > > > Just to explicitly ask for it:
> > > > > 
> > > > > Svante Signell, le Tue 06 May 2014 10:06:49 +0200, a écrit :
> > > > > > For some (yet) unknown reason all libgo tests fails with a segfault
> > > > > > when run in the build tree: make, sh or something else, the test
> > > > > > commands are rather hard to track.
> > > > > 
> > > > > Doesn't that dump a core?  Do you have /servers/crash properly
> > > > > pointing to /servers/crash-dump-core and ulimit -u set to unlimited?

More good news:
- Installing the modified libpthread.so.0.3 made the segfault go away. I
could now run the check from the build tree :-)

- After adding
#define TARGET_THREAD_SSP_OFFSET 0x14
to patch1.diff and building gcc-4.9.0-2, the test results are summarised
as follows :-)
=== libgo Summary ===

# of expected passes            101
# of unexpected failures        21

I think some of the remaining failures are rather easy to fix.

Attached is an updated patch1.diff.
What remains is to solve the problem with patch8.diff: adding arch-specific
code to src/libgo/mksysinfo.sh

--- a/src/gcc/config/i386/gnu.h
+++ b/src/gcc/config/i386/gnu.h
@@ -37,11 +37,14 @@
 
 #ifdef TARGET_LIBC_PROVIDES_SSP
 
-/* Not supported yet.  */
-# undef TARGET_THREAD_SSP_OFFSET
-
-/* Not supported yet.  */
-# undef TARGET_CAN_SPLIT_STACK
-# undef TARGET_THREAD_SPLIT_STACK_OFFSET
+/* i386 glibc provides __stack_chk_guard in %gs:0x14.  */
+#define TARGET_THREAD_SSP_OFFSET	0x14
 
+/* We only build the -fsplit-stack support in libgcc if the
+   assembler has full support for the CFI directives.  */
+#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
+#define TARGET_CAN_SPLIT_STACK
+#endif
+/* We steal the last transactional memory word.  */
+#define TARGET_THREAD_SPLIT_STACK_OFFSET 0x30
 #endif

[PATCH] [PING^2] Fix for PR libstdc++/60758

2014-05-07 Thread Yury Gribov


 Original Message 
Subject: [PING] [PATCH] Fix for PR libstdc++/60758
Date: Thu, 17 Apr 2014 17:48:12 +0400
From: Alexey Merzlyakov 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , Viacheslav 
Garbuzov , Yury Gribov 


Hi,

This fixes infinite backtrace in __cxa_end_cleanup().
Regtest was finished with no regressions on arm-linux-gnueabi(sf).

The patch posted at:
  http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html

Thanks in advance.

Best regards,
Merzlyakov Alexey



2014-05-07  Alexey Merzlyakov 

	PR libstdc++/60758
	* libsupc++/eh_arm.cc (__cxa_end_cleanup): Change r4 to lr in save/restore
	and add unwind directives.

diff --git a/libstdc++-v3/libsupc++/eh_arm.cc b/libstdc++-v3/libsupc++/eh_arm.cc
index aa453dd..6a45af5 100644
--- a/libstdc++-v3/libsupc++/eh_arm.cc
+++ b/libstdc++-v3/libsupc++/eh_arm.cc
@@ -199,27 +199,33 @@ asm (".global __cxa_end_cleanup\n"
 "	nop		5\n");
 #else
 // Assembly wrapper to call __gnu_end_cleanup without clobbering r1-r3.
-// Also push r4 to preserve stack alignment.
+// Also push lr to preserve stack alignment and to allow backtracing.
 #ifdef __thumb__
 asm ("  .pushsection .text.__cxa_end_cleanup\n"
 "	.global __cxa_end_cleanup\n"
 "	.type __cxa_end_cleanup, \"function\"\n"
 "	.thumb_func\n"
 "__cxa_end_cleanup:\n"
-"	push\t{r1, r2, r3, r4}\n"
+"	.fnstart\n"
+"	push\t{r1, r2, r3, lr}\n"
+"	.save\t{r1, r2, r3, lr}\n"
 "	bl\t__gnu_end_cleanup\n"
-"	pop\t{r1, r2, r3, r4}\n"
+"	pop\t{r1, r2, r3, lr}\n"
 "	bl\t_Unwind_Resume @ Never returns\n"
+"	.fnend\n"
 "	.popsection\n");
 #else
 asm ("  .pushsection .text.__cxa_end_cleanup\n"
 "	.global __cxa_end_cleanup\n"
 "	.type __cxa_end_cleanup, \"function\"\n"
 "__cxa_end_cleanup:\n"
-"	stmfd\tsp!, {r1, r2, r3, r4}\n"
+"	.fnstart\n"
+"	stmfd\tsp!, {r1, r2, r3, lr}\n"
+"	.save\t{r1, r2, r3, lr}\n"
 "	bl\t__gnu_end_cleanup\n"
-"	ldmfd\tsp!, {r1, r2, r3, r4}\n"
+"	ldmfd\tsp!, {r1, r2, r3, lr}\n"
 "	bl\t_Unwind_Resume @ Never returns\n"
+"	.fnend\n"
 "	.popsection\n");
 #endif
 #endif

Re: [PATCH] [PING^2] Fix for PR libstdc++/60758

2014-05-07 Thread Paolo Carlini

Hi,

On 05/07/2014 10:19 AM, Yury Gribov wrote:

 Original Message 
Subject: [PING] [PATCH] Fix for PR libstdc++/60758
Date: Thu, 17 Apr 2014 17:48:12 +0400
From: Alexey Merzlyakov 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , Viacheslav 
Garbuzov , Yury Gribov 

Hi,

This fixes infinite backtrace in __cxa_end_cleanup().
Regtest was finished with no regressions on arm-linux-gnueabi(sf).

The patch posted at:
  http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html

I think you want an ARM maintainer for this. I'm adding some in CC. 
Also, remember to send patches touching the C++ library to the mailing 
list too.

Paolo.

[C++ Patch] PR 61080

2014-05-07 Thread Paolo Carlini


Hi,

thus I prepared this simple patch. Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2014-05-07  Paolo Carlini  

PR c++/61080
* pt.c (instantiate_decl): Avoid generating the body of a
deleted function.

/testsuite
2014-05-07  Paolo Carlini  

PR c++/61080
* g++.dg/cpp0x/deleted7.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 210140)
+++ cp/pt.c (working copy)
@@ -19542,6 +19542,7 @@ instantiate_decl (tree d, int defer_ok,
   int saved_unevaluated_operand = cp_unevaluated_operand;
   int saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
   bool external_p;
+  bool deleted_p;
   tree fn_context;
   bool nested;
 
@@ -19623,11 +19624,17 @@ instantiate_decl (tree d, int defer_ok,
 args = gen_args;
 
   if (TREE_CODE (d) == FUNCTION_DECL)
-pattern_defined = (DECL_SAVED_TREE (code_pattern) != NULL_TREE
-  || DECL_DEFAULTED_OUTSIDE_CLASS_P (code_pattern)
-  || DECL_DELETED_FN (code_pattern));
+{
+  deleted_p = DECL_DELETED_FN (code_pattern);
+  pattern_defined = (DECL_SAVED_TREE (code_pattern) != NULL_TREE
+|| DECL_DEFAULTED_OUTSIDE_CLASS_P (code_pattern)
+|| deleted_p);
+}
   else
-pattern_defined = ! DECL_IN_AGGR_P (code_pattern);
+{
+  deleted_p = false;
+  pattern_defined = ! DECL_IN_AGGR_P (code_pattern);
+}
 
   /* We may be in the middle of deferred access check.  Disable it now.  */
   push_deferring_access_checks (dk_no_deferred);
@@ -19671,7 +19678,10 @@ instantiate_decl (tree d, int defer_ok,
 elsewhere, we don't want to instantiate the entire data
 member, but we do want to instantiate the initializer so that
 we can substitute that elsewhere.  */
-  || (external_p && VAR_P (d)))
+  || (external_p && VAR_P (d))
+  /* Handle here a deleted function too, avoid generating
+its body (c++/61080).  */
+  || deleted_p)
 {
   /* The definition of the static data member is now required so
 we must substitute the initializer.  */
@@ -19867,17 +19877,14 @@ instantiate_decl (tree d, int defer_ok,
   tf_warning_or_error, tmpl,
   /*integral_constant_expression_p=*/false);
 
- if (DECL_STRUCT_FUNCTION (code_pattern))
-   {
- /* Set the current input_location to the end of the function
-so that finish_function knows where we are.  */
- input_location
-   = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus;
+ /* Set the current input_location to the end of the function
+so that finish_function knows where we are.  */
+ input_location
+   = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus;
 
- /* Remember if we saw an infinite loop in the template.  */
- current_function_infinite_loop
-   = DECL_STRUCT_FUNCTION (code_pattern)->language->infinite_loop;
-   }
+ /* Remember if we saw an infinite loop in the template.  */
+ current_function_infinite_loop
+   = DECL_STRUCT_FUNCTION (code_pattern)->language->infinite_loop;
}
 
   /* We don't need the local specializations any more.  */
Index: testsuite/g++.dg/cpp0x/deleted7.C
===
--- testsuite/g++.dg/cpp0x/deleted7.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/deleted7.C   (working copy)
@@ -0,0 +1,36 @@
+// PR c++/61080
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wreturn-type" }
+
+struct AAA
+{
+  int a1, a2, a3;
+  void *p;
+};
+
+template 
+class WeakMapPtr
+{
+  public:
+WeakMapPtr() : ptr(nullptr) {};
+bool init(AAA *cx);
+  private:
+void *ptr;
+WeakMapPtr(const WeakMapPtr &wmp) = delete;
+WeakMapPtr &operator=(const WeakMapPtr &wmp) = delete;
+};
+
+template 
+bool WeakMapPtr::init(AAA *cx)
+{
+ptr = cx->p;
+return true;
+}
+
+struct JSObject
+{
+  int blah;
+  float meh;
+};
+
+template class WeakMapPtr;

PR 61084: SPARC fallout from wide-int merge

2014-05-07 Thread Richard Sandiford

The DImode constant splitter assigned the result of trunc_int_for_mode
to an unsigned int rather than a HOST_WIDE_INT.  This then produced const_ints
that were zero-extended rather than sign-extended and tripped the assert:

gcc_checking_assert (INTVAL (x.first)
 == sext_hwi (INTVAL (x.first), precision)
 || (x.second == BImode && INTVAL (x.first) == 1));
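
As a standalone illustration of the failure mode (a sketch only, not the
sparc.md code): trunc_to_simode below is a hypothetical stand-in for
trunc_int_for_mode (x, SImode), int64_t stands in for HOST_WIDE_INT, and
uint32_t makes explicit the assumption that unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for trunc_int_for_mode (x, SImode): keep the low
   32 bits and sign-extend them into a 64-bit host integer.  */
static int64_t
trunc_to_simode (int64_t x)
{
  uint32_t low = (uint32_t) x;
  return (low & 0x80000000u) ? (int64_t) low - ((int64_t) 1 << 32)
                             : (int64_t) low;
}

/* Pre-patch pattern: the result passes through a 32-bit unsigned
   variable, so widening it again zero-extends.  */
static int64_t
low_part_unsigned (int64_t x)
{
  uint32_t low = (uint32_t) trunc_to_simode (x);
  return (int64_t) low;
}

/* Patched pattern: the variable is as wide as the host integer, so the
   sign extension survives.  */
static int64_t
low_part_hwi (int64_t x)
{
  int64_t low = trunc_to_simode (x);
  return low;
}

/* Same shape as the first half of the assert: a value is canonical for
   SImode only if it equals its own sign extension from 32 bits.  */
static int
is_canonical_simode (int64_t v)
{
  return v == trunc_to_simode (v);
}
```

For an input whose low word is 0xffffffff, low_part_unsigned yields
4294967295, which fails the canonical check, while low_part_hwi yields -1,
which passes it.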

The other hunks are just by inspection, but I think gen_int_mode is
preferred over GEN_INT when the mode is obvious.

Tested by Rainer, who says that the bootstrap now completes.
OK to install?

Thanks,
Richard


gcc/
PR target/61084
* config/sparc/sparc.md: Fix types of low and high in DI constant
splitter.  Use gen_int_mode in some other splitters.

Index: gcc/config/sparc/sparc.md
===
--- gcc/config/sparc/sparc.md   2014-05-07 10:15:23.051156294 +0100
+++ gcc/config/sparc/sparc.md   2014-05-07 10:15:27.922201361 +0100
@@ -1886,7 +1886,7 @@ (define_split
   emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]),
operands[1]));
 #else
-  unsigned int low, high;
+  HOST_WIDE_INT low, high;
 
   low = trunc_int_for_mode (INTVAL (operands[1]), SImode);
   high = trunc_int_for_mode (INTVAL (operands[1]) >> 32, SImode);
@@ -4822,7 +4822,7 @@ (define_split
   [(set (match_dup 3) (match_dup 4))
(set (match_dup 0) (ior:SI (not:SI (match_dup 3)) (match_dup 1)))]
 {
-  operands[4] = GEN_INT (~INTVAL (operands[2]));
+  operands[4] = gen_int_mode (~INTVAL (operands[2]), SImode);
 })
 
 (define_insn_and_split "*or_not_di_sp32"
@@ -4899,7 +4899,7 @@ (define_split
   [(set (match_dup 3) (match_dup 4))
(set (match_dup 0) (not:SI (xor:SI (match_dup 3) (match_dup 1))))]
 {
-  operands[4] = GEN_INT (~INTVAL (operands[2]));
+  operands[4] = gen_int_mode (~INTVAL (operands[2]), SImode);
 })
 
 (define_split
@@ -4911,7 +4911,7 @@ (define_split
   [(set (match_dup 3) (match_dup 4))
(set (match_dup 0) (xor:SI (match_dup 3) (match_dup 1)))]
 {
-  operands[4] = GEN_INT (~INTVAL (operands[2]));
+  operands[4] = gen_int_mode (~INTVAL (operands[2]), SImode);
 })
 
 ;; Split DImode logical operations requiring two instructions.

Re: [PATCH] [PING^2] Fix for PR libstdc++/60758

2014-05-07 Thread Ramana Radhakrishnan

On 05/07/14 09:19, Yury Gribov wrote:

 Original Message 
Subject: [PING] [PATCH] Fix for PR libstdc++/60758
Date: Thu, 17 Apr 2014 17:48:12 +0400
From: Alexey Merzlyakov 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , Viacheslav
Garbuzov , Yury Gribov 

Hi,

This fixes infinite backtrace in __cxa_end_cleanup().
Regtest was finished with no regressions on arm-linux-gnueabi(sf).

The patch posted at:
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html

This is OK to apply if no regressions.

Thanks,
Ramana

Thanks in advance.

Best regards,
Merzlyakov Alexey

[PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Herman, Andrei


Hi,

Currently GCC only emits DWARF debug information (DW_TAG_lexical_block DIEs)
for compound statements containing significant local declarations.
However, code coverage tools that process the DWARF debug information to
implement block/path coverage need more complete lexical block information. 

This patch adds the necessary functionality under the control of a new 
command line argument: -fforce-dwarf-lexical-blocks.

When this flag is set, a DW_TAG_lexical_block DIE will be emitted for every
function body, loop body, switch body, case statement, if-then and if-else
statement, even if the body is a single statement. 
Likewise, a lexical block will be emitted for the first label of a labeled
statement. This block ends at the end of the current lexical scope, or when
a break, continue, goto or return statement is encountered at the same lexical
scope level. 
Consequently, any case in a switch statement that does not flow through to 
the next case, will have its own dwarf lexical block.
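
As a rough illustration of the cases listed above, here is a small
hypothetical input.  The comments mark where, going only by this
description and not by the patches themselves, a DW_TAG_lexical_block
would be expected with the flag set; the function name classify is made up
for the example.

```c
/* Hypothetical input; the annotations follow the prose description of
   -fforce-dwarf-lexical-blocks, not verified compiler output.  */
static int
classify (int n)
{                       /* function body */
  int score = 0;
  int i;

  if (n < 0)
    score = -1;         /* if-then, single statement */
  else
    score = 1;          /* if-else, single statement */

  for (i = 0; i < 3; i++)
    score += i;         /* loop body, single statement */

  switch (n)
    {                   /* switch body */
    case 0:             /* ends in break, so it gets its own block */
      score += 10;
      break;
    case 1:             /* flows through into the next case */
      score += 20;
      /* fall through */
    default:
      score += 30;
      break;
    }

  return score;
}
```

None of these statements declares a local variable of its own, so they are
the kind of scope that currently gets no lexical block at all.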

The complete change proposal contains 4 patches (the first 3 are attached):
1. Add command line option -fforce-dwarf-lexical-blocks
2. Use of flag_force_dwarf_blocks
3. Create label scopes

A fourth patch, extending the proposed functionality to C++, will be submitted
in a separate message.

Attached are the proposed ChangeLog additions, named according to the directory 
each one belongs to.

Best regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch 



gcc_c_ChangeLog
Description: gcc_c_ChangeLog


gcc_c-family_ChangeLog
Description: gcc_c-family_ChangeLog


gcc_ChangeLog
Description: gcc_ChangeLog


0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch
Description: 0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch


0002-Use-flag_force_dwarf_blocks.patch
Description: 0002-Use-flag_force_dwarf_blocks.patch


0003-Create-label-scopes.patch
Description: 0003-Create-label-scopes.patch

Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread pinskia



> On May 7, 2014, at 2:32 AM, "Herman, Andrei"  
> wrote:
> 
> 
> Hi,
> 
> Currently GCC only emits DWARF debug information (DW_TAG_lexical_block DIEs)
> for compound statements containing significant local declarations.
> However, code coverage tools that process the DWARF debug information to
> implement block/path coverage need more complete lexical block information. 
> 
> This patch adds the necessary functionality under the control of a new 
> command line argument: -fforce-dwarf-lexical-blocks.
> 
> When this flag is set, a DW_TAG_lexical_block DIE will be emitted for every
> function body, loop body, switch body, case statement, if-then and if-else
> statement, even if the body is a single statement. 
> Likewise, a lexical block will be emitted for the first label of a labeled
> statement. This block ends at the end of the current lexical scope, or when
> a break, continue, goto or return statement is encountered at the same lexical
> scope level. 
> Consequently, any case in a switch statement that does not flow through to 
> the next case, will have its own dwarf lexical block.
> 
> The complete change proposal contains 4 patches (attached first 3):
> 1. Add command line option -fforce-dwarf-lexical-blocks

Since this option is specific to the C front end, it should go into c.opt
instead of common.opt, unless you are going to extend this to Ada, Java and
Fortran.

Thanks,
Andrew


> 2. Use of flag_force_dwarf_blocks
> 3. Create label scopes
> 
> A forth patch, extending the proposed functionality to C++ will be submitted 
> in a separate message.
> 
> Attached are the proposed ChangeLog additions, named according to the 
> directory each one belongs to.
> 
> Best regards,
> Andrei Herman
> Mentor Graphics Corporation
> Israel branch 
> 
> 
> 
> 
> <0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch>
> <0002-Use-flag_force_dwarf_blocks.patch>
> <0003-Create-label-scopes.patch>

Re: we are starting the wide int merge

2014-05-07 Thread Christophe Lyon

On 7 May 2014 09:48, Andreas Schwab  wrote:
> Christophe Lyon  writes:
>
>> It also looks like the git-svn-id property is now wrong/incomplete.
>> For instance, commit 9a5942c1d4d9116ab74b0741cfe3894a89fd17fb has:
>> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int@201706
>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>
>> How does it map to the SVN commit in trunk?
>
> This is a commit on the wide-int branch (the one that created it).
>

I had a bug in my script while parsing the output of git log,
hopefully fixed now.

RE: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Herman, Andrei

Thanks for the note.
I will make the needed changes and resubmit.

Regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch 

> -Original Message-
> From: pins...@gmail.com [mailto:pins...@gmail.com]
> Sent: Wednesday, May 07, 2014 12:37 PM
> To: Herman, Andrei
> Cc: gcc-patches@gcc.gnu.org; herman_and...@mentor.com
> Subject: Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line
> option
> 
> 
> 
> > On May 7, 2014, at 2:32 AM, "Herman, Andrei"
>  wrote:
> >
> >
> > Hi,
> >
> > Currently GCC only emits DWARF debug information
> (DW_TAG_lexical_block
> > DIEs) for compound statements containing significant local declarations.
> > However, code coverage tools that process the DWARF debug information
> > to implement block/path coverage need more complete lexical block
> information.
> >
> > This patch adds the necessary functionality under the control of a new
> > command line argument: -fforce-dwarf-lexical-blocks.
> >
> > When this flag is set, a DW_TAG_lexical_block DIE will be emitted for
> > every function body, loop body, switch body, case statement, if-then
> > and if-else statement, even if the body is a single statement.
> > Likewise, a lexical block will be emitted for the first label of a
> > labeled statement. This block ends at the end of the current lexical
> > scope, or when a break, continue, goto or return statement is
> > encountered at the same lexical scope level.
> > Consequently, any case in a switch statement that does not flow
> > through to the next case, will have its own dwarf lexical block.
> >
> > The complete change proposal contains 4 patches (attached first 3):
> > 1. Add command line option -fforce-dwarf-lexical-blocks
> 
> This option since it is specific to the c frontend should go into c.opt 
> instead
> of common.opt. Unless you are going to extend this to Ada, Java and
> fortran.
> 
> Thanks,
> Andrew
> 
> 
> > 2. Use of flag_force_dwarf_blocks
> > 3. Create label scopes
> >
> > A forth patch, extending the proposed functionality to C++ will be
> submitted in a separate message.
> >
> > Attached are the proposed ChangeLog additions, named according to the
> directory each one belongs to.
> >
> > Best regards,
> > Andrei Herman
> > Mentor Graphics Corporation
> > Israel branch
> >
> > 
> > 
> > 
> > <0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch>
> > <0002-Use-flag_force_dwarf_blocks.patch>
> > <0003-Create-label-scopes.patch>

[PATCH][1/n] Always-64bit HWI cleanups

2014-05-07 Thread Richard Biener


This removes the need_64bit_hwint logic and nothing else (well, it also
brings libcpp in line with gcc).

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

As promised, I am sending this out before committing the "let's try this"
patch (which is now said to fix wide-int fallout).
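
For readers unfamiliar with the idea, a minimal sketch of what "assume a
64bit type" can look like once the configure-time switch is gone.  This is
an assumption-laden illustration, not the hwint.h code: the names
wide_host_int and wide_host_int_bits are hypothetical, and it relies only
on C99 guaranteeing that long long is at least 64 bits.

```c
#include <limits.h>

/* Hypothetical typedef: prefer long when it is already 64 bits wide,
   otherwise fall back to long long.  No per-target knob is consulted.  */
#if LONG_MAX == 0x7fffffffffffffff
typedef long wide_host_int;
#else
typedef long long wide_host_int;
#endif

/* Width of the chosen type in bits, for sanity checks.  */
static int
wide_host_int_bits (void)
{
  return (int) (sizeof (wide_host_int) * CHAR_BIT);
}
```

On the usual hosts wide_host_int_bits () is 64 regardless of the target
being configured, which is the property the cleanup relies on.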

Richard.

2014-05-07  Richard Biener  

gcc/
* config.gcc: Remove need_64bit_hwint.
* configure.ac: Do not define NEED_64BIT_HOST_WIDE_INT.
* hwint.h: Do not check NEED_64BIT_HOST_WIDE_INT but assume
it to be true.
* config.in: Regenerate.
* configure: Likewise.

libcpp/
* configure.ac: Copy gcc logic of detecting a 64bit type.
Remove HOST_WIDE_INT define.
* include/cpplib.h: typedef cpp_num_part to a 64bit type,
similar to how hwint.h does it.
* config.in: Regenerate.
* configure: Likewise.

Index: trunk/gcc/config.gcc
===
*** trunk.orig/gcc/config.gcc   2014-04-30 10:16:58.491135331 +0200
--- trunk/gcc/config.gcc        2014-04-30 10:24:43.902103288 +0200
***
*** 164,176 
  #  gas                Set to yes or no depending on whether the target
  # system normally uses GNU as.
  #
- #  need_64bit_hwint   Set to yes if HOST_WIDE_INT must be 64 bits wide
- # for this target.  This is true if this target
- # supports "long" or "wchar_t" wider than 32 bits,
- # or BITS_PER_WORD is wider than 32 bits.
- # The setting made here must match the one made in
- # other locations such as libcpp/configure.ac
- #
  #  configure_default_options
  # Set to an initializer for configure_default_options
  # in configargs.h, based on --with-cpu et cetera.
--- 164,169 
*** gnu_ld="$gnu_ld_flag"
*** 233,239 
  default_use_cxa_atexit=no
  default_gnu_indirect_function=no
  target_gtfiles=
- need_64bit_hwint=yes
  need_64bit_isa=
  native_system_header_dir=/usr/include
  target_type_format_char='@'
--- 226,231 
*** m32c*-*-*)
*** 310,323 
  ;;
  aarch64*-*-*)
cpu_type=aarch64
-   need_64bit_hwint=yes
extra_headers="arm_neon.h"
extra_objs="aarch64-builtins.o aarch-common.o"
target_has_targetm_common=yes
;;
  alpha*-*-*)
cpu_type=alpha
-   need_64bit_hwint=yes
extra_options="${extra_options} g.opt"
;;
  am33_2.0-*-linux*)
--- 302,313 
*** arm*-*-*)
*** 333,339 
target_type_format_char='%'
c_target_objs="arm-c.o"
cxx_target_objs="arm-c.o"
-   need_64bit_hwint=yes
extra_options="${extra_options} arm/arm-tables.opt"
;;
  avr-*-*)
--- 323,328 
*** i[34567]86-*-*)
*** 363,369 
cpu_type=i386
c_target_objs="i386-c.o"
cxx_target_objs="i386-c.o"
-   need_64bit_hwint=yes
extra_options="${extra_options} fused-madd.opt"
extra_headers="cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
   pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h
--- 352,357 
*** x86_64-*-*)
*** 393,403 
   adxintrin.h fxsrintrin.h xsaveintrin.h xsaveoptintrin.h
   avx512cdintrin.h avx512erintrin.h avx512pfintrin.h
   shaintrin.h"
-   need_64bit_hwint=yes
;;
  ia64-*-*)
extra_headers=ia64intrin.h
-   need_64bit_hwint=yes
extra_options="${extra_options} g.opt fused-madd.opt"
;;
  hppa*-*-*)
--- 381,389 
*** microblaze*-*-*)
*** 420,426 
  ;;
  mips*-*-*)
cpu_type=mips
-   need_64bit_hwint=yes
extra_headers="loongson.h"
extra_options="${extra_options} g.opt mips/mips-tables.opt"
;;
--- 406,411 
*** picochip-*-*)
*** 438,444 
  powerpc*-*-*)
cpu_type=rs6000
extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h 
spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
-   need_64bit_hwint=yes
case x$with_cpu in

xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[345678]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|Xe6500)
cpu_is_64bit=yes
--- 423,428 
*** powerpc*-*-*)
*** 447,453 
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
;;
  rs6000*-*-*)
-   need_64bit_hwint=yes
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
;;
  score*-*-*)
--- 431,436 
*** sparc*-*-*)
*** 459,480 
c_target_objs="sparc-c.o"
cxx_target_objs="sparc-c.o"
extra_headers="visintrin.h"
-   need_64bit_hwint=yes
;;
  spu*-*-*)
cpu_type=spu
-   need_64bit_hwint=yes

Re: [PATCH][RFC] Always require a 64bit HWI

2014-05-07 Thread Richard Biener

On Wed, 30 Apr 2014, Richard Biener wrote:

> On Tue, 29 Apr 2014, Jeff Law wrote:
> 
> > On 04/29/14 05:21, Richard Biener wrote:
> > > 
> > > The following patch forces the availability of a 64bit HWI
> > > (without applying the cleanups that result from this).  I propose
> > > this exact patch for a short time to get those that are affected
> > > and do not want to be affected scream.
> > > 
> > > But honestly I don't see any important host architecture that
> > > not already requires a 64bit HWI.
> > > 
> > > Another concern is that the host compiler may not provide a
> > > 64bit type.  I'm not sure that this is an issue nowadays
> > > (even though C++98 doesn't have 'long long', so it's maybe
> > > more an issue now with C++ than it was previously with
> > > requiring C89).  But given that it wasn't an issue for
> > > the existing 64bit HWI requiring host archs it shouldn't
> > > be an issue now.
> > > 
> > > The benefit of this change is obviously the cleanup that
> > > can result from it - especially getting rid of code
> > > generation dependences on the host (!need_64bit_hwi
> > > doesn't mean we force a 32bit hwi).  As followup
> > > we can replace HOST_WIDE_INT and its friends with
> > > int64_t variants and appear less confusing to
> > > newcomers (and it's also less characters to type! yay!).
> > > 
> > > We'd still retain HOST_WIDEST_FAST_INT, and as Kenny
> > > said elsewhere wide-int should internally operate on that,
> > > not on the eventually slow int64_t.  But that's a separate
> > > issue.
> > > 
> > > So - any objections?
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2014-04-29  Richard Biener  
> > > 
> > >   libcpp/
> > >   * configure.ac: Always set need_64bit_hwint to yes.
> > >   * configure: Regenerated.
> > > 
> > >   * config.gcc: Always set need_64bit_hwint to yes.
> > No objections.  The requirement for 64 bit HWINT traces its origins back to
> > the MIPS R5900 target IIRC.  It's probably well past the time when we should
> > just bite the bullet and make HWINT 64 bits across the board.
> > 
> > If the host compiler doesn't support 64-bit HWINT, then it seems to me the
> > host compiler can be used to bootstrap 4.9, which can then be used to
> > bootstrap more modern GCCs.
> > 
> > And like you I suspect it's really not going to be an issue in practice.
> 
> I realized I forgot to copy gcc-patches, so done now (patch copied
> below again for reference).
> 
> I propose to apply the patch after the wide-int merge for a short
> period of time and then followup with a patch to remove the
> need_64bit_hwint code (I'll make sure to send that out for review
> before applying this one).
> 
> Testing coverage for non-64bit hwi configs is really low these
> days (I know of only 32bit hppa-*-* that is still built and
> tested semi-regularly - Dave, I suppose the host compiler
> has a 64bit long long type there, right?).

I have now applied the patch (as it is said to fix wide-int merge
fallout).  The plan is to go forward with the cleanups that this makes
possible throughout stage1 (I sent the first cleanup patch already,
but further ones should wait until we have released 4.9.1 so as not to
make backports harder than necessary).

Richard.

> Thanks,
> Richard.
> 
> 2014-04-29  Richard Biener  
> 
>   libcpp/
>   * configure.ac: Always set need_64bit_hwint to yes.
>   * configure: Regenerated.
> 
>   * config.gcc: Always set need_64bit_hwint to yes.
> 
> Index: libcpp/configure.ac
> ===
> --- libcpp/configure.ac   (revision 209890)
> +++ libcpp/configure.ac   (working copy)
> @@ -200,7 +200,7 @@ case $target in
>   tilegx*-*-* | tilepro*-*-* )
>   need_64bit_hwint=yes ;;
>   *)
> - need_64bit_hwint=no ;;
> + need_64bit_hwint=yes ;;
>  esac
>  
>  case $need_64bit_hwint:$ac_cv_sizeof_long in
> Index: gcc/config.gcc
> ===
> --- gcc/config.gcc(revision 209890)
> +++ gcc/config.gcc(working copy)
> @@ -233,7 +233,7 @@ gnu_ld="$gnu_ld_flag"
>  default_use_cxa_atexit=no
>  default_gnu_indirect_function=no
>  target_gtfiles=
> -need_64bit_hwint=
> +need_64bit_hwint=yes
>  need_64bit_isa=
>  native_system_header_dir=/usr/include
>  target_type_format_char='@'
>
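As a rough illustration of the follow-up idea mentioned above (replacing HOST_WIDE_INT with an int64_t variant), here is a minimal sketch. The names `host_wide_int` and `host_wide_int_bits` are hypothetical and do not come from the GCC sources, and the sketch assumes a C++11 host compiler so that `static_assert` and `<cstdint>` are available.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical stand-in for HOST_WIDE_INT once it is always 64 bits wide.
typedef std::int64_t host_wide_int;

// Reject a host compiler that cannot supply a 64-bit integer type,
// instead of silently falling back to a narrower one.
static_assert(sizeof(host_wide_int) * CHAR_BIT == 64,
              "host compiler must provide a 64-bit integer type");

// Width of the type in bits, similar in spirit to HOST_BITS_PER_WIDE_INT.
constexpr int host_wide_int_bits() {
  return static_cast<int>(sizeof(host_wide_int) * CHAR_BIT);
}
```

With a fixed-width type the result no longer depends on what `long` happens to be on the host, which is the code generation dependence the proposal wants to remove.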

Re: we are starting the wide int merge

2014-05-07 Thread Richard Sandiford

Jan-Benedict Glaw  writes:
> On Tue, 2014-05-06 12:20:54 -0700, Mike Stump  wrote:
>> On May 6, 2014, at 8:19 AM, Kenneth Zadeck  wrote:
>> > please hold off on committing patches for the next couple of hours
>> > as we have a very large merge to do.
>> > thanks.
>> 
>> All done…  It is in.
>
> Just found one more:
>
> g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual 
> -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings 
> -fno-common  -DHAVE_CONFIG_H -I. -I. -I/home/vaxbuild/repos/gcc/gcc 
> -I/home/vaxbuild/repos/gcc/gcc/. -I/home/vaxbuild/repos/gcc/gcc/../include 
> -I/home/vaxbuild/repos/gcc/gcc/../libcpp/include  
> -I/home/vaxbuild/repos/gcc/gcc/../libdecnumber 
> -I/home/vaxbuild/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
> -I/home/vaxbuild/repos/gcc/gcc/../libbacktrace-o loop-iv.o -MT loop-iv.o 
> -MMD -MP -MF ./.deps/loop-iv.TPo /home/vaxbuild/repos/gcc/gcc/loop-iv.c
> In file included from /home/vaxbuild/repos/gcc/gcc/real.h:25:0,
>  from /home/vaxbuild/repos/gcc/gcc/rtl.h:27,
>  from /home/vaxbuild/repos/gcc/gcc/loop-iv.c:54:
> /home/vaxbuild/repos/gcc/gcc/wide-int.h: In instantiation of 
> ‘fixed_wide_int_storage::fixed_wide_int_storage(const T&) [with T = long 
> long unsigned int; int N = 160]’:
> /home/vaxbuild/repos/gcc/gcc/wide-int.h:724:15:   required from 
> ‘generic_wide_int::generic_wide_int(const T&) [with T = long long unsigned 
> int; storage = fixed_wide_int_storage<160>]’
> /home/vaxbuild/repos/gcc/gcc/loop-iv.c:2628:48:   required from here
> /home/vaxbuild/repos/gcc/gcc/wide-int.h:1172:45: error: incomplete type 
> ‘wi::int_traits’ used in nested name specifier
>WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
>  ^
> /home/vaxbuild/repos/gcc/gcc/wide-int.h:1173:47: error: incomplete type 
> ‘wi::int_traits’ used in nested name specifier
>wi::copy (*this, WIDE_INT_REF_FOR (T) (x, N));
>^
> make[1]: *** [loop-iv.o] Error 1

Looks like this is specific to 32-bit HOST_WIDE_INTs.  The problem was
that loop-iv.c was using HOST_WIDEST_INT and no template specialisations
were defined for that.

Richard B's patch to force HOST_WIDE_INT to 64 bits will fix this.

Thanks,
Richard
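For readers unfamiliar with this kind of error, the following is a simplified sketch of the mechanism and not the actual wide-int.h code. The namespace `sketch`, the member `precision` and the function `precision_of` are made up for illustration. A traits template is declared but only defined for selected types, so any type without a specialisation produces an "incomplete type" error at the point of use.

```cpp
#include <cassert>
#include <climits>

namespace sketch {

// Primary template: declared but deliberately left undefined.
template <typename T> struct int_traits;

// Specialisations for the types the library intends to support.
template <> struct int_traits<int> {
  static constexpr int precision = sizeof(int) * CHAR_BIT;
};
template <> struct int_traits<long> {
  static constexpr int precision = sizeof(long) * CHAR_BIT;
};
// Without this specialisation, precision_of(0ULL) would fail to compile
// with "incomplete type 'int_traits<unsigned long long>' used in nested
// name specifier".
template <> struct int_traits<unsigned long long> {
  static constexpr int precision = sizeof(unsigned long long) * CHAR_BIT;
};

// Generic code that relies on the traits being complete for T.
template <typename T> int precision_of(const T &) {
  return int_traits<T>::precision;
}

// sketch::precision_of(1.5) would not compile: no int_traits<double>.

} // namespace sketch
```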

Re: [AArch64] Fix integer vabs intrinsics

2014-05-07 Thread Richard Earnshaw

On 05/05/14 09:04, Richard Biener wrote:
> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw  wrote:
>> On 02/05/14 11:28, James Greenhalgh wrote:
>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote:


> On May 2, 2014, at 2:21 AM, James Greenhalgh  
> wrote:
>
>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>  wrote:
>>>
>>> Hi,
>>>
>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave 
>>> as
>>> the hardware. That is to say, the pseudo-code sequence:
>>
>>
>> Only for signed integer types.  You should be able to use an unsigned
>> integer type here instead.
>
> If anything, I think that puts us in a worse position.

 Not if you cast it back.


> The issue that
> inspires this patch is that GCC will happily fold:
>
>  t1 = ABS_EXPR (x)
>  t2 = GE_EXPR (t1, 0)
>
> to
>
>  t2 = TRUE
>
> Surely an unsigned integer type is going to suffer the same fate? 
> Certainly I
> can imagine somewhere in the compiler there being a fold path for:

 Yes but if you add a cast from the unsigned type to the signed type gcc does
 not optimize that. If it does it is a bug since the overflow is defined there.
>>>
>>> I'm not sure I understand, are you saying I want to fold to:
>>>
>>>   t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>>   t2 = ABS_EXPR (t1)
>>>   t3 = VIEW_CONVERT_EXPR (t2, signed)
>>>
>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>>> other out leading to an overall NOP? It might just be Friday morning and a
>>> lack of coffee talking, but I think I need you to spell this one out to
>>> me in big letters!
>>>
>>
>> I agree.  I think what you need is a type widening so that you get
>>
>> t1 = VEC_WIDEN (x)
>> t2 = ABS_EXPR (t1)
>> t3 = VEC_NARROW (t2)
>>
>> This then guarantees that the ABS expression cannot be undefined.  I'm
>> less sure, however about the narrow causing a change in 'sign'.  Has it
>> just punted the problem?  Maybe you need
> 
> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
> That is, have an ABS_EXPR that doesn't have the undefined issue
> (at expense of optimization in case the result is immediately casted
> back to signed)
> 

Yes, that would make more sense, and is, in effect, what the ARM VABS
instruction is doing (producing an unsigned result with no undefined
behaviour).

I'm not sure I understand your 'at expense of optimization' comment,
though.  Surely a cast back to signed is essentially a no-op, since
there's no representational change in the value (at least, not on 2's
complement machines)?
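To make the abs(int) -> unsigned idea concrete, here is a scalar sketch on 8-bit values. The helpers `absu8` and `abs8_wrapping` are hypothetical names modelled loosely on absu_hwi; they are not the intrinsics or any GCC internal. The conversion of an out-of-range value back to int8_t is assumed to wrap modulo 256 (guaranteed from C++20, implementation-defined but universal in practice before that).

```cpp
#include <cassert>
#include <cstdint>

// Absolute value with an unsigned result: no input can overflow, because
// the negation is carried out in unsigned arithmetic, which wraps.
inline std::uint8_t absu8(std::int8_t x) {
  std::uint8_t ux = static_cast<std::uint8_t>(x);
  return x < 0 ? static_cast<std::uint8_t>(0u - ux) : ux;
}

// Casting the unsigned result back to signed keeps the bit pattern, so the
// most negative input maps to itself, as described for the hardware.
inline std::int8_t abs8_wrapping(std::int8_t x) {
  return static_cast<std::int8_t>(absu8(x));
}
```

The cast back to signed costs nothing at run time. What changes is that the signed result is no longer known to be non-negative.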


> Richard.
> 
>>
>> t1 = VEC_WIDEN (x)
>> t2 = ABS_EXPR (t1)
>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>> t4 = VEC_NARROW (t3)
>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>
>> !!!
>>
>> How you capture this into RTL during expand, though, is another thing.
>>
>> R.
>>
>
>  (unsigned >= 0) == TRUE
>
>>>
>>>  a = vabs_s8 (vdup_n_s8 (-128));
>>>  assert (a >= 0);
>>>
>>> does not hold. As in hardware
>>>
>>>  abs (-128) == -128
>>>
>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should 
>>> avoid
>>> it. In fact, we have to be even more careful than that, and keep the 
>>> integer
>>> vabs intrinsics as an unspec in the back end.
>>
>> No it is not.  The mistake is to use signed integer types here.  Just
>> add a conversion to an unsigned integer vector and it will work
>> correctly.
>> In fact the ABS rtl code is not undefined for the overflow.
>
> Here we are covering ourselves against a separate issue. For auto-vectorized
> code we want the SABD combine patterns to kick in whenever sensible. For
> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>
>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>
> So in this case, the combine would be erroneous. Likewise SABA.

 This sounds like it would be problematic for unsigned types and not just for
 vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
 instead. Since in rtl overflow and underflow is defined to be wrapping.
>>>
>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>>> with unsigned types. Further, I have never thought of RTL having signed
>>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>>> intrinsic version of vabd_s8 - but we'll want to specify the
>>>
>>>   (abs (minus (reg) (reg)))
>>>
>>> behaviour so that auto-vectorized code can pick it up.
>>>
>>> So in the end we'll have these patterns:
>>>
>>>   (abs
>>> (abs (reg)))
>>>
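The vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) point quoted above can be checked with a scalar model. The functions below are assumptions made for illustration, not the arm_neon.h intrinsics: `sub8` and `abs8` wrap to 8 bits at each step, while `abd8` computes the difference exactly and only truncates the final absolute value, which is how the absolute-difference operation is assumed to behave here.

```cpp
#include <cassert>
#include <cstdint>

// Truncate an int to 8 bits and reinterpret as signed (two's complement;
// the final conversion is assumed to wrap, as it does from C++20 on).
inline std::int8_t wrap8(int v) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(v));
}

// Model of a wrapping 8-bit subtract.
inline std::int8_t sub8(std::int8_t x, std::int8_t y) { return wrap8(x - y); }

// Model of a wrapping 8-bit absolute value.
inline std::int8_t abs8(std::int8_t x) { return wrap8(x < 0 ? -x : x); }

// Model of absolute difference: exact difference first, then truncate.
inline std::int8_t abd8(std::int8_t x, std::int8_t y) {
  int d = x - y;
  return wrap8(d < 0 ? -d : d);
}
```

For x = 127 and y = -128 the exact difference is 255. The wrapping subtract yields -1 and its absolute value is 1, whereas the absolute difference truncates 255 to the bit pattern 0xff, that is -1. The two only agree when the subtraction does not wrap.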

Re: [AArch64] Fix integer vabs intrinsics

2014-05-07 Thread Richard Biener

On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw  wrote:
> On 05/05/14 09:04, Richard Biener wrote:
>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw  wrote:
>>> On 02/05/14 11:28, James Greenhalgh wrote:
 On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote:
>
>
>> On May 2, 2014, at 2:21 AM, James Greenhalgh  
>> wrote:
>>
>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>  wrote:

 Hi,

 Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
 undefined/impossible, the neon intrinsics vabs intrinsics should 
 behave as
 the hardware. That is to say, the pseudo-code sequence:
>>>
>>>
>>> Only for signed integer types.  You should be able to use an unsigned
>>> integer type here instead.
>>
>> If anything, I think that puts us in a worse position.
>
> Not if you cast it back.
>
>
>> The issue that
>> inspires this patch is that GCC will happily fold:
>>
>>  t1 = ABS_EXPR (x)
>>  t2 = GE_EXPR (t1, 0)
>>
>> to
>>
>>  t2 = TRUE
>>
>> Surely an unsigned integer type is going to suffer the same fate? 
>> Certainly I
>> can imagine somewhere in the compiler there being a fold path for:
>
> Yes but if you add a cast from the unsigned type to the signed type gcc does
> not optimize that. If it does it is a bug since the overflow is defined there.

 I'm not sure I understand, are you saying I want to fold to:

   t1 = VIEW_CONVERT_EXPR (x, unsigned)
   t2 = ABS_EXPR (t1)
   t3 = VIEW_CONVERT_EXPR (t2, signed)

 Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
 other out leading to an overall NOP? It might just be Friday morning and a
 lack of coffee talking, but I think I need you to spell this one out to
 me in big letters!

>>>
>>> I agree.  I think what you need is a type widening so that you get
>>>
>>> t1 = VEC_WIDEN (x)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VEC_NARROW (t2)
>>>
>>> This then guarantees that the ABS expression cannot be undefined.  I'm
>>> less sure, however about the narrow causing a change in 'sign'.  Has it
>>> just punted the problem?  Maybe you need
>>
>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
>> That is, have an ABS_EXPR that doesn't have the undefined issue
>> (at expense of optimization in case the result is immediately casted
>> back to signed)
>>
>
> Yes, that would make more sense, and is, in effect, what the ARM VABS
> instruction is doing (producing an unsigned result with no undefined
> behaviour).
>
> I'm not sure I understand your 'at expense of optimization' comment,
> though.  Surely a cast back to signed is essentially a no-op, since
> there's no representational change in the value (at least, not on 2's
> complement machines)?

We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR.

Richard.
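To spell the range remark out on a small type, the sketch below enumerates every 8-bit input and records the range of the value obtained by taking the unsigned absolute value and casting back to signed. `range8` and `range_of_signed_absu8` are hypothetical helpers, and the narrowing conversion is assumed to wrap modulo 256.

```cpp
#include <cassert>
#include <cstdint>

struct range8 {
  int min;
  int max;
};

// Compute the range of (int8_t) absu (x) over all int8_t inputs x.
inline range8 range_of_signed_absu8() {
  range8 r = {INT8_MAX, INT8_MIN};
  for (int i = INT8_MIN; i <= INT8_MAX; ++i) {
    // Unsigned absolute value: negation wraps instead of overflowing.
    unsigned u = i < 0 ? 0u - static_cast<unsigned>(i)
                       : static_cast<unsigned>(i);
    // Truncate to 8 bits and reinterpret as signed.
    std::int8_t v = static_cast<std::int8_t>(static_cast<std::uint8_t>(u));
    if (v < r.min) r.min = v;
    if (v > r.max) r.max = v;
  }
  return r;
}
```

The minimum comes out as INT8_MIN because of the single input -128, so a test such as `result >= 0` cannot be folded to true for this form.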

>
>> Richard.
>>
>>>
>>> t1 = VEC_WIDEN (x)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>>> t4 = VEC_NARROW (t3)
>>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>>
>>> !!!
>>>
>>> How you capture this into RTL during expand, though, is another thing.
>>>
>>> R.
>>>
>>
>>  (unsigned >= 0) == TRUE
>>

  a = vabs_s8 (vdup_n_s8 (-128));
  assert (a >= 0);

 does not hold. As in hardware

  abs (-128) == -128

 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we 
 should avoid
 it. In fact, we have to be even more careful than that, and keep the 
 integer
 vabs intrinsics as an unspec in the back end.
>>>
>>> No it is not.  The mistake is to use signed integer types here.  Just
>>> add a conversion to an unsigned integer vector and it will work
>>> correctly.
>>> In fact the ABS rtl code is not undefined for the overflow.
>>
>> Here we are covering ourselves against a separate issue. For auto-vectorized
>> code we want the SABD combine patterns to kick in whenever sensible. For
>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>
>>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>
>> So in this case, the combine would be erroneous. Likewise SABA.
>
> This sounds like it would be problematic for unsigned types and not just for
> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
> instead. Since in rtl overflow and underflow is defined to be wrapping.

 There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
 with unsigned types. Further, I have never thought of RTL having signed
 and unsigned types, just a bag of bits. We

Re: [PATCH][RFC] Remove RTL loop unswitching

2014-05-07 Thread Thomas Schwinge

Hi!

On Tue, 15 Apr 2014 11:26:29 +0200 (CEST), Richard Biener  
wrote:
> This removes RTL loop unswitching

> 2014-04-15  Richard Biener  
> 
>   * Makefile.in (OBJS): Remove loop-unswitch.o.
>   * loop-unswitch.c: Delete.
>   * tree-pass.h (make_pass_rtl_unswitch): Remove.
>   * passes.def (pass_rtl_unswitch): Likewise.
>   * loop-init.c (gate_rtl_unswitch): Likewise.
>   (rtl_unswitch): Likewise.
>   (pass_data_rtl_unswitch): Likewise.
>   (pass_rtl_unswitch): Likewise.
>   (make_pass_rtl_unswitch): Likewise.
>   * rtl.h (reversed_condition): Likewise.
>   (compare_and_jump_seq): Likewise.
>   * loop-iv.c (reversed_condition): Move here from loop-unswitch.c
>   and make static.
>   * loop-unroll.c (compare_and_jump_seq): Likewise.

After checking with Richard on IRC, I applied the following in r210150:

commit 81283dac62a91d2fbdf154fe51e9f84e0b1db816
Author: tschwinge 
Date:   Wed May 7 10:31:26 2014 +

Really delete gcc/loop-unswitch.c.

gcc/
* loop-unswitch.c: Delete.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@210150 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog gcc/ChangeLog
index d5e6a0a..e5033a0 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,7 @@
+2014-05-07  Thomas Schwinge  
+
+   * loop-unswitch.c: Delete.
+
 2014-05-07  Richard Biener  
 
* config.gcc: Always set need_64bit_hwint to yes.
@@ -2294,7 +2298,6 @@
 2014-04-23  Richard Biener  
 
* Makefile.in (OBJS): Remove loop-unswitch.o.
-   * loop-unswitch.c: Delete.
* tree-pass.h (make_pass_rtl_unswitch): Remove.
* passes.def (pass_rtl_unswitch): Likewise.
* loop-init.c (gate_rtl_unswitch): Likewise.
diff --git gcc/loop-unswitch.c gcc/loop-unswitch.c
deleted file mode 100644
index fff0fd1..000


Grüße,
 Thomas



Re: [AArch64] Fix integer vabs intrinsics

2014-05-07 Thread Richard Earnshaw

On 07/05/14 11:32, Richard Biener wrote:
> On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw  wrote:
>> On 05/05/14 09:04, Richard Biener wrote:
>>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw  wrote:
 On 02/05/14 11:28, James Greenhalgh wrote:
> On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote:
>>
>>
>>> On May 2, 2014, at 2:21 AM, James Greenhalgh  
>>> wrote:
>>>
 On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
 On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
  wrote:
>
> Hi,
>
> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
> undefined/impossible, the neon intrinsics vabs intrinsics should 
> behave as
> the hardware. That is to say, the pseudo-code sequence:


 Only for signed integer types.  You should be able to use an unsigned
 integer type here instead.
>>>
>>> If anything, I think that puts us in a worse position.
>>
>> Not if you cast it back.
>>
>>
>>> The issue that
>>> inspires this patch is that GCC will happily fold:
>>>
>>>  t1 = ABS_EXPR (x)
>>>  t2 = GE_EXPR (t1, 0)
>>>
>>> to
>>>
>>>  t2 = TRUE
>>>
>>> Surely an unsigned integer type is going to suffer the same fate? 
>>> Certainly I
>>> can imagine somewhere in the compiler there being a fold path for:
>>
>> Yes but if you add a cast from the unsigned type to the signed type gcc does
>> not optimize that. If it does it is a bug since the overflow is defined
>> there.
>
> I'm not sure I understand, are you saying I want to fold to:
>
>   t1 = VIEW_CONVERT_EXPR (x, unsigned)
>   t2 = ABS_EXPR (t1)
>   t3 = VIEW_CONVERT_EXPR (t2, signed)
>
> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
> other out leading to an overall NOP? It might just be Friday morning and a
> lack of coffee talking, but I think I need you to spell this one out to
> me in big letters!
>

 I agree.  I think what you need is a type widening so that you get

 t1 = VEC_WIDEN (x)
 t2 = ABS_EXPR (t1)
 t3 = VEC_NARROW (t2)

 This then guarantees that the ABS expression cannot be undefined.  I'm
 less sure, however about the narrow causing a change in 'sign'.  Has it
 just punted the problem?  Maybe you need
>>>
>>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
>>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
>>> That is, have an ABS_EXPR that doesn't have the undefined issue
>>> (at expense of optimization in case the result is immediately casted
>>> back to signed)
>>>
>>
>> Yes, that would make more sense, and is, in effect, what the ARM VABS
>> instruction is doing (producing an unsigned result with no undefined
>> behaviour).
>>
>> I'm not sure I understand your 'at expense of optimization' comment,
>> though.  Surely a cast back to signed is essentially a no-op, since
>> there's no representational change in the value (at least, not on 2's
>> complement machines)?
> 
> We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR.
> 

Unless you're assuming that ABS_EXPR(INT_MIN) will always trap, then if
you can derive it for ABS_EXPR (which really returns [0,
INT_MAX]+UNSPECIFIED), I don't really see why you can't derive it for
(int)ABSU_EXPR (which returns [0, INT_MAX]+INT_MIN), since the latter is
a subset of the former.

R.

> Richard.
> 
>>
>>> Richard.
>>>

 t1 = VEC_WIDEN (x)
 t2 = ABS_EXPR (t1)
 t3 = VIEW_CONVERT_EXPR (x, unsigned)
 t4 = VEC_NARROW (t3)
 t5 = VIEW_CONVERT_EXPR (t4, signed)

 !!!

 How you capture this into RTL during expand, though, is another thing.

 R.

>>>
>>>  (unsigned >= 0) == TRUE
>>>
>
>  a = vabs_s8 (vdup_n_s8 (-128));
>  assert (a >= 0);
>
> does not hold. As in hardware
>
>  abs (-128) == -128
>
> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we 
> should avoid
> it. In fact, we have to be even more careful than that, and keep the 
> integer
> vabs intrinsics as an unspec in the back end.

 No it is not.  The mistake is to use signed integer types here.  Just
 add a conversion to an unsigned integer vector and it will work
 correctly.
 In fact the ABS rtl code is not undefined for the overflow.
>>>
>>> Here we are covering ourselves against a separate issue. For auto-vectorized
>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>
>>>  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>
>>> So in this case, the

[patch] libstdc++/61086 - fix ubsan errors in std::vector

2014-05-07 Thread Jonathan Wakely


The testcase in the PR calls __position._M_const_cast() to get a
mutable iterator and that dereferences the pointer as suggested in
http://gcc.gnu.org/ml/libstdc++/2013-05/msg00031.html
That's invalid because the pointer is not dereferenceable (in this
case it's null but is past-the-end at all times).

I played around with changing the __normal_iterator so we would do
__position._M_const_cast(begin()) then decided we don't need it at
all and can just as easily obtain a mutable iterator using:

  auto __pos = begin() + (__position - cbegin());

I plan to commit the attached patch to trunk and 4.9 soon. I've tested
it on x86_64-linux but not added a testcase because we don't test with
-fsanitize (though we should do) and it only shows up with Clang
anyway.
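
The same arithmetic works in user code through the public std::vector interface. The sketch below is an illustration of the technique only, not part of the patch, and `to_mutable` is a hypothetical helper name. It recovers a mutable iterator from a const_iterator by offset, without dereferencing anything, so it is also valid for a past-the-end position and for an empty vector.

```cpp
#include <cassert>
#include <vector>

// Turn a const_iterator into the matching mutable iterator of the same
// vector by re-applying its offset from the beginning.  Nothing is
// dereferenced, so pos may be cend().
template <typename T>
typename std::vector<T>::iterator
to_mutable(std::vector<T> &v, typename std::vector<T>::const_iterator pos) {
  return v.begin() + (pos - v.cbegin());
}
```

For example, `to_mutable(v, v.cbegin() + 1)` yields an iterator through which the second element can be modified, and `to_mutable(v, v.cend())` compares equal to `v.end()`.
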
commit 566623def309c70387e41da2346ff89aa7619b13
Author: Jonathan Wakely 
Date:   Wed May 7 12:17:41 2014 +0100

	PR libstdc++/61086
	* include/bits/stl_iterator.h (__normal_iterator::_M_const_cast):
	Remove.
	* include/bits/stl_vector.h (vector::insert, vector::erase): Use
	arithmetic to obtain a mutable iterator from const_iterator.
	* include/bits/vector.tcc (vector::insert): Likewise.
	* include/debug/vector (vector::erase): Likewise.
	* testsuite/23_containers/vector/requirements/dr438/assign_neg.cc:
	Adjust dg-error line number.
	* testsuite/23_containers/vector/requirements/dr438/
	constructor_1_neg.cc: Likewise.
	* testsuite/23_containers/vector/requirements/dr438/
	constructor_2_neg.cc: Likewise.
	* testsuite/23_containers/vector/requirements/dr438/insert_neg.cc:
	Likewise.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 16f992c..f4522a4 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -736,21 +736,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _Container>::__type>& __i) _GLIBCXX_NOEXCEPT
 : _M_current(__i.base()) { }
 
-#if __cplusplus >= 201103L
-  __normal_iterator
-  _M_const_cast() const noexcept
-  {
-	using _PTraits = std::pointer_traits;
-	return __normal_iterator
-	  (_PTraits::pointer_to(const_cast
-(*_M_current)));
-  }
-#else
-  __normal_iterator
-  _M_const_cast() const
-  { return *this; }
-#endif
-
   // Forward iterator requirements
   reference
   operator*() const _GLIBCXX_NOEXCEPT
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 3d3a2cf..0a56c65 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1051,7 +1051,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   insert(const_iterator __position, size_type __n, const value_type& __x)
   {
 	difference_type __offset = __position - cbegin();
-	_M_fill_insert(__position._M_const_cast(), __n, __x);
+	_M_fill_insert(begin() + __offset, __n, __x);
 	return begin() + __offset;
   }
 #else
@@ -1096,7 +1096,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	   _InputIterator __last)
 {
 	  difference_type __offset = __position - cbegin();
-	  _M_insert_dispatch(__position._M_const_cast(),
+	  _M_insert_dispatch(begin() + __offset,
 			 __first, __last, __false_type());
 	  return begin() + __offset;
 	}
@@ -1144,10 +1144,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   iterator
 #if __cplusplus >= 201103L
   erase(const_iterator __position)
+  { return _M_erase(begin() + (__position - cbegin())); }
 #else
   erase(iterator __position)
+  { return _M_erase(__position); }
 #endif
-  { return _M_erase(__position._M_const_cast()); }
 
   /**
*  @brief  Remove a range of elements.
@@ -1170,10 +1171,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   iterator
 #if __cplusplus >= 201103L
   erase(const_iterator __first, const_iterator __last)
+  {
+	const auto __beg = begin();
+	const auto __cbeg = cbegin();
+	return _M_erase(__beg + (__first - __cbeg), __beg + (__last - __cbeg));
+  }
 #else
   erase(iterator __first, iterator __last)
+  { return _M_erase(__first, __last); }
 #endif
-  { return _M_erase(__first._M_const_cast(), __last._M_const_cast()); }
 
   /**
*  @brief  Swaps data with another %vector.
diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 299e614..5c3dfae 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -121,14 +121,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   else
 	{
 #if __cplusplus >= 201103L
+	  const auto __pos = begin() + (__position - cbegin());
 	  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
 	{
 	  _Tp __x_copy = __x;
-	  _M_insert_aux(__position._M_const_cast(), std::move(__x_copy));
+	  _M_insert_aux(__pos, std::move(__x_copy));
 	}
 	  else
+	_M_insert_aux(__pos, __x);
+#else
+	_M_insert_aux(__position, __x);
 #endif
-	_M_insert_aux(_

Re: [patch] libstdc++/61086 - fix ubsan errors in std::vector

2014-05-07 Thread Paolo Carlini



On 05/07/2014 02:07 PM, Jonathan Wakely wrote:

> The testcase in the PR calls __position._M_const_cast() to get a
> mutable iterator and that dereferences the pointer as suggested in
> http://gcc.gnu.org/ml/libstdc++/2013-05/msg00031.html
> That's invalid because the pointer is not dereferenceable (in this
> case it's null but is past-the-end at all times).

Uhmm, I see, at the time I scratched my head a bit. Nice that we can
avoid the whole thing. Are we sure we don't have something similar
elsewhere?


Paolo.

Re: [patch] libstdc++/61086 - fix ubsan errors in std::vector

2014-05-07 Thread Jonathan Wakely


On 07/05/14 14:21 +0200, Paolo Carlini wrote:


> On 05/07/2014 02:07 PM, Jonathan Wakely wrote:
>> The testcase in the PR calls __position._M_const_cast() to get a
>> mutable iterator and that dereferences the pointer as suggested in
>> http://gcc.gnu.org/ml/libstdc++/2013-05/msg00031.html
>> That's invalid because the pointer is not dereferenceable (in this
>> case it's null but is past-the-end at all times).
> Uhmm, I see, at the time I scratched my head a bit. Nice that we can
> avoid the whole thing. Are we sure we don't have something similar
> elsewhere?


Yes, I checked. deque::const_iterator, list::const_iterator,
vector::const_iterator and the _Rb_tree_const_iterator types all
have _M_const_cast but they do not dereference anything.

It only really affected std::vector because that's the only one of our
containers that correctly supports custom pointer types (when my fixes
for PR57272 are ready I'll need to deal with the issue again and will
be careful about dereferencing).

Re: [PATCH][RFC] Remove RTL loop unswitching

2014-05-07 Thread Thomas Schwinge

Hi!

On Tue, 15 Apr 2014 11:26:29 +0200 (CEST), Richard Biener  
wrote:
> This removes RTL loop unswitching

> 2014-04-15  Richard Biener  
> 
>   * Makefile.in (OBJS): Remove loop-unswitch.o.
>   * loop-unswitch.c: Delete.
>   * tree-pass.h (make_pass_rtl_unswitch): Remove.
>   * passes.def (pass_rtl_unswitch): Likewise.
>   * loop-init.c (gate_rtl_unswitch): Likewise.
>   (rtl_unswitch): Likewise.
>   (pass_data_rtl_unswitch): Likewise.
>   (pass_rtl_unswitch): Likewise.
>   (make_pass_rtl_unswitch): Likewise.
>   * rtl.h (reversed_condition): Likewise.
>   (compare_and_jump_seq): Likewise.
>   * loop-iv.c (reversed_condition): Move here from loop-unswitch.c
>   and make static.
>   * loop-unroll.c (compare_and_jump_seq): Likewise.

I found some more; OK to commit?  Is a non-bootstrap build enough for
this, or is a full bootstrap build and test needed?

commit 8a703b1e7adc6001f665a12f93601382e3eea806
Author: Thomas Schwinge 
Date:   Wed May 7 13:01:47 2014 +0200

More gcc/loop-unswitch.c cleanup.

gcc/
* cfgloop.h (unswitch_loops): Remove.
* doc/passes.texi: Remove references to loop-unswitch.c
* timevar.def (TV_LOOP_UNSWITCH): Remove.

diff --git gcc/cfgloop.h gcc/cfgloop.h
index ab8b809..62a656a 100644
--- gcc/cfgloop.h
+++ gcc/cfgloop.h
@@ -711,8 +711,6 @@ extern void loop_optimizer_init (unsigned);
 extern void loop_optimizer_finalize (void);
 
 /* Optimization passes.  */
-extern void unswitch_loops (void);
-
 enum
 {
   UAP_PEEL = 1,/* Enables loop peeling.  */
diff --git gcc/doc/passes.texi gcc/doc/passes.texi
index 2727b2c..fb064db 100644
--- gcc/doc/passes.texi
+++ gcc/doc/passes.texi
@@ -474,10 +474,7 @@ merging and induction variable elimination.  The pass is 
implemented in
 Loop unswitching.  This pass moves the conditional jumps that are invariant
 out of the loops.  To achieve this, a duplicate of the loop is created for
 each possible outcome of conditional jump(s).  The pass is implemented in
-@file{tree-ssa-loop-unswitch.c}.  This pass should eventually replace the
-RTL level loop unswitching in @file{loop-unswitch.c}, but currently
-the RTL level pass is not completely redundant yet due to deficiencies
-in tree level alias analysis.
+@file{tree-ssa-loop-unswitch.c}.
 
 The optimizations also use various utility functions contained in
 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
@@ -793,8 +790,8 @@ The source files @file{cfgloopanal.c} and 
@file{cfgloopmanip.c} contain
 generic loop analysis and manipulation code.  Initialization and finalization
 of loop structures is handled by @file{loop-init.c}.
 A loop invariant motion pass is implemented in @file{loop-invariant.c}.
-Basic block level optimizations---unrolling, peeling and unswitching loops---
-are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
+Basic block level optimizations---unrolling, and peeling loops---
+are implemented in @file{loop-unroll.c}.
 Replacing of the exit condition of loops by special machine-dependent
 instructions is handled by @file{loop-doloop.c}.
 
diff --git gcc/timevar.def gcc/timevar.def
index 9faf98b..2db1943 100644
--- gcc/timevar.def
+++ gcc/timevar.def
@@ -207,7 +207,6 @@ DEFTIMEVAR (TV_DSE2  , "dead store elim2")
 DEFTIMEVAR (TV_LOOP  , "loop analysis")
 DEFTIMEVAR (TV_LOOP_INIT, "loop init")
 DEFTIMEVAR (TV_LOOP_MOVE_INVARIANTS  , "loop invariant motion")
-DEFTIMEVAR (TV_LOOP_UNSWITCH , "loop unswitching")
 DEFTIMEVAR (TV_LOOP_UNROLL   , "loop unrolling")
 DEFTIMEVAR (TV_LOOP_DOLOOP   , "loop doloop")
 DEFTIMEVAR (TV_LOOP_FINI, "loop fini")
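
For reference, the transformation described in the passes.texi text above (moving a loop-invariant conditional out of the loop by duplicating the loop for each outcome) looks roughly like this when done by hand at source level. This is only an illustration of the idea with made-up function names; it says nothing about how tree-ssa-loop-unswitch.c implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: the test on `negate` is loop invariant but is evaluated on
// every iteration.
inline long sum_original(const std::vector<int> &a, bool negate) {
  long s = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (negate)
      s -= a[i];
    else
      s += a[i];
  }
  return s;
}

// After: the invariant test is hoisted and the loop is duplicated, one
// copy for each possible outcome of the condition.
inline long sum_unswitched(const std::vector<int> &a, bool negate) {
  long s = 0;
  if (negate) {
    for (std::size_t i = 0; i < a.size(); ++i)
      s -= a[i];
  } else {
    for (std::size_t i = 0; i < a.size(); ++i)
      s += a[i];
  }
  return s;
}
```

Both versions compute the same result. The second trades code size for a branch-free loop body.
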


Grüße,
 Thomas



Re: [PATCH][RFC] Remove RTL loop unswitching

2014-05-07 Thread Richard Biener

On Wed, 7 May 2014, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 15 Apr 2014 11:26:29 +0200 (CEST), Richard Biener  
> wrote:
> > This removes RTL loop unswitching
> 
> > 2014-04-15  Richard Biener  
> > 
> > * Makefile.in (OBJS): Remove loop-unswitch.o.
> > * loop-unswitch.c: Delete.
> > * tree-pass.h (make_pass_rtl_unswitch): Remove.
> > * passes.def (pass_rtl_unswitch): Likewise.
> > * loop-init.c (gate_rtl_unswitch): Likewise.
> > (rtl_unswitch): Likewise.
> > (pass_data_rtl_unswitch): Likewise.
> > (pass_rtl_unswitch): Likewise.
> > (make_pass_rtl_unswitch): Likewise.
> > * rtl.h (reversed_condition): Likewise.
> > (compare_and_jump_seq): Likewise.
> > * loop-iv.c (reversed_condition): Move here from loop-unswitch.c
> > and make static.
> > * loop-unroll.c (compare_and_jump_seq): Likewise.
> 
> I found some more; OK to commit?  Is a non-bootstrap build enough for
> this, or is a full bootstrap build and test needed?

That's enough.

Ok.

Thanks,
Richard.

> commit 8a703b1e7adc6001f665a12f93601382e3eea806
> Author: Thomas Schwinge 
> Date:   Wed May 7 13:01:47 2014 +0200
> 
> More gcc/loop-unswitch.c cleanup.
> 
>   gcc/
>   * cfgloop.h (unswitch_loops): Remove.
>   * doc/passes.texi: Remove references to loop-unswitch.c
>   * timevar.def (TV_LOOP_UNSWITCH): Remove.
> 
> diff --git gcc/cfgloop.h gcc/cfgloop.h
> index ab8b809..62a656a 100644
> --- gcc/cfgloop.h
> +++ gcc/cfgloop.h
> @@ -711,8 +711,6 @@ extern void loop_optimizer_init (unsigned);
>  extern void loop_optimizer_finalize (void);
>  
>  /* Optimization passes.  */
> -extern void unswitch_loops (void);
> -
>  enum
>  {
>UAP_PEEL = 1,  /* Enables loop peeling.  */
> diff --git gcc/doc/passes.texi gcc/doc/passes.texi
> index 2727b2c..fb064db 100644
> --- gcc/doc/passes.texi
> +++ gcc/doc/passes.texi
> @@ -474,10 +474,7 @@ merging and induction variable elimination.  The pass is implemented in
>  Loop unswitching.  This pass moves the conditional jumps that are invariant
>  out of the loops.  To achieve this, a duplicate of the loop is created for
>  each possible outcome of conditional jump(s).  The pass is implemented in
> -@file{tree-ssa-loop-unswitch.c}.  This pass should eventually replace the
> -RTL level loop unswitching in @file{loop-unswitch.c}, but currently
> -the RTL level pass is not completely redundant yet due to deficiencies
> -in tree level alias analysis.
> +@file{tree-ssa-loop-unswitch.c}.
>  
>  The optimizations also use various utility functions contained in
>  @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
> @@ -793,8 +790,8 @@ The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain
>  generic loop analysis and manipulation code.  Initialization and finalization
>  of loop structures is handled by @file{loop-init.c}.
>  A loop invariant motion pass is implemented in @file{loop-invariant.c}.
> -Basic block level optimizations---unrolling, peeling and unswitching loops---
> -are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}.
> +Basic block level optimizations---unrolling, and peeling loops---
> +are implemented in @file{loop-unroll.c}.
>  Replacing of the exit condition of loops by special machine-dependent
>  instructions is handled by @file{loop-doloop.c}.
>  
> diff --git gcc/timevar.def gcc/timevar.def
> index 9faf98b..2db1943 100644
> --- gcc/timevar.def
> +++ gcc/timevar.def
> @@ -207,7 +207,6 @@ DEFTIMEVAR (TV_DSE2  , "dead store elim2")
>  DEFTIMEVAR (TV_LOOP  , "loop analysis")
>  DEFTIMEVAR (TV_LOOP_INIT  , "loop init")
>  DEFTIMEVAR (TV_LOOP_MOVE_INVARIANTS  , "loop invariant motion")
> -DEFTIMEVAR (TV_LOOP_UNSWITCH , "loop unswitching")
>  DEFTIMEVAR (TV_LOOP_UNROLL   , "loop unrolling")
>  DEFTIMEVAR (TV_LOOP_DOLOOP   , "loop doloop")
>  DEFTIMEVAR (TV_LOOP_FINI  , "loop fini")
> 
> 
> Grüße,
>  Thomas
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer

Re: [patch] libstdc++/61086 - fix ubsan errors in std::vector

2014-05-07 Thread Paolo Carlini


Hi,

On 05/07/2014 02:33 PM, Jonathan Wakely wrote:

> Yes, I checked. deque::const_iterator, list::const_iterator,
> vector::const_iterator and the _Rb_tree_const_iterator types all
> have _M_const_cast but they do not dereference anything.
>
> It only really affected std::vector because that's the only one of our
> containers that correctly supports custom pointer types (when my fixes
> for PR57272 are ready I'll need to deal with the issue again and will
> be careful about dereferencing).

Excellent. Thanks again!

Paolo.

Re: debug container patch

2014-05-07 Thread Ramana Radhakrishnan

On Wed, May 7, 2014 at 2:13 AM, Paolo Carlini  wrote:
> -- Francois,
>
> remember to regenerate and commit the Makefile.in changes.

Can someone regenerate and commit the Makefile.in changes soon? I'm
seeing testsuite failures thanks to missing debug/safe_container.h on
arm-none-linux-gnueabihf.

I don't have access to a machine right now with the right versions of
autoconf and automake that can do this easily.

Ramana

>
> Thanks,
> Paolo.

[PATCH][1/n] Fix PR61034

2014-05-07 Thread Richard Biener


The following fixes part of PR61034 - we are hindered by false
clobbering during FRE/PRE on paths we try to look through by
means of the alias walker.  The following makes us also
consider lattice-based disambiguation there and in particular
also try harder to disambiguate against builtins.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-05-07  Richard Biener  

PR tree-optimization/61034
* tree-ssa-alias.c (call_may_clobber_ref_p_1): Export.
(maybe_skip_until): Use translate to take into account
lattices when trying to do disambiguations.
(get_continuation_for_phi_1): Likewise.
(get_continuation_for_phi): Adjust for added translate
arguments.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-alias.h (get_continuation_for_phi): Adjust
prototype.
(walk_non_aliased_vuses): Likewise.
(call_may_clobber_ref_p_1): Declare.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Also
disambiguate against calls.  Stop early if we are
only supposed to disambiguate.
* tree-ssa-pre.c (translate_vuse_through_block): Adjust.

* g++.dg/tree-ssa/pr61034.C: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c.orig   2014-05-07 13:53:47.015599960 +0200
--- gcc/tree-ssa-alias.c2014-05-07 14:07:09.087544738 +0200
*** ref_maybe_used_by_stmt_p (gimple stmt, t
*** 1835,1841 
  /* If the call in statement CALL may clobber the memory reference REF
 return true, otherwise return false.  */
  
! static bool
  call_may_clobber_ref_p_1 (gimple call, ao_ref *ref)
  {
tree base;
--- 1835,1841 
  /* If the call in statement CALL may clobber the memory reference REF
 return true, otherwise return false.  */
  
! bool
  call_may_clobber_ref_p_1 (gimple call, ao_ref *ref)
  {
tree base;
*** stmt_kills_ref_p (gimple stmt, tree ref)
*** 2318,2324 
  static bool
  maybe_skip_until (gimple phi, tree target, ao_ref *ref,
  tree vuse, unsigned int *cnt, bitmap *visited,
! bool abort_on_visited)
  {
basic_block bb = gimple_bb (phi);
  
--- 2318,2326 
  static bool
  maybe_skip_until (gimple phi, tree target, ao_ref *ref,
  tree vuse, unsigned int *cnt, bitmap *visited,
! bool abort_on_visited,
! void *(*translate)(ao_ref *, tree, void *, bool),
! void *data)
  {
basic_block bb = gimple_bb (phi);
  
*** maybe_skip_until (gimple phi, tree targe
*** 2338,2344 
  if (bitmap_bit_p (*visited, SSA_NAME_VERSION (PHI_RESULT (def_stmt
return !abort_on_visited;
  vuse = get_continuation_for_phi (def_stmt, ref, cnt,
!  visited, abort_on_visited);
  if (!vuse)
return false;
  continue;
--- 2340,2347 
  if (bitmap_bit_p (*visited, SSA_NAME_VERSION (PHI_RESULT (def_stmt
return !abort_on_visited;
  vuse = get_continuation_for_phi (def_stmt, ref, cnt,
!  visited, abort_on_visited,
!  translate, data);
  if (!vuse)
return false;
  continue;
*** maybe_skip_until (gimple phi, tree targe
*** 2350,2356 
  /* A clobbering statement or the end of the IL ends it failing.  */
  ++*cnt;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
!   return false;
}
/* If we reach a new basic-block see if we already skipped it
   in a previous walk that ended successfully.  */
--- 2353,2365 
  /* A clobbering statement or the end of the IL ends it failing.  */
  ++*cnt;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
!   {
! if (translate
! && (*translate) (ref, vuse, data, true) == NULL)
!   ;
! else
!   return false;
!   }
}
/* If we reach a new basic-block see if we already skipped it
   in a previous walk that ended successfully.  */
*** maybe_skip_until (gimple phi, tree targe
*** 2372,2378 
  static tree
  get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1,
ao_ref *ref, unsigned int *cnt,
!   bitmap *visited, bool abort_on_visited)
  {
gimple def0 = SSA_NAME_DEF_STMT (arg0);
gimple def1 = SSA_NAME_DEF_STMT (arg1);
--- 2381,2389 
  static tree
  get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1,
ao_ref *ref, unsigned int *cnt,
!   bitmap *visited, bool abort_on_visited,
!   void *(*translate)(ao_ref *, tree, void *, bool),
!   void
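
The control-flow change in maybe_skip_until can be illustrated with a toy
model.  This is only a sketch under stated assumptions: the Stmt type, the
can_skip_all function and the boolean callback are hypothetical stand-ins,
not GCC interfaces.  In the patch the callback is `translate', and a NULL
result with the last argument set to true means the statement could be
disambiguated.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of one statement on the virtual definition chain.
struct Stmt {
  bool may_clobber;         // what the conservative alias oracle reports
  bool proven_independent;  // what extra (lattice) information can prove
};

// Hypothetical stand-in for the `translate' hook: returns true when the
// extra information shows that the statement does not clobber the reference.
using Disambiguate = std::function<bool(const Stmt&)>;

// Returns true if the walk may skip every statement in CHAIN.
// A possibly clobbering statement no longer ends the walk immediately:
// the callback, if any, gets a chance to disambiguate it first.
bool can_skip_all(const std::vector<Stmt>& chain,
                  const Disambiguate& disambiguate) {
  for (const Stmt& s : chain) {
    if (!s.may_clobber)
      continue;
    if (disambiguate && disambiguate(s))
      continue;  // false clobber, keep walking
    return false;  // real (or unresolvable) clobber ends the walk
  }
  return true;
}
```

Without a callback the walk fails at the first statement the oracle flags;
with one, only statements that cannot be disambiguated stop it.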

Re: debug container patch

2014-05-07 Thread Jonathan Wakely


On 07/05/14 14:17 +0100, Ramana Radhakrishnan wrote:

> Can someone regenerate and commit the Makefile.in changes soon? I'm
> seeing testsuite failures thanks to missing debug/safe_container.h on
> arm-none-linux-gnueabihf.


It was done hours ago by
http://gcc.gnu.org/ml/gcc-cvs/2014-05/msg00170.html

Re: debug container patch

2014-05-07 Thread Ramana Radhakrishnan

On Wed, May 7, 2014 at 2:22 PM, Jonathan Wakely  wrote:
> On 07/05/14 14:17 +0100, Ramana Radhakrishnan wrote:
>>
>> Can someone regenerate and commit the Makefile.in changes soon? I'm
>> seeing testsuite failures thanks to missing debug/safe_container.h on
>> arm-none-linux-gnueabihf.
>
>
> It was done hours ago by
> http://gcc.gnu.org/ml/gcc-cvs/2014-05/msg00170.html

Sorry about the noise. I realized that just after I had hit send. Not
enough coffee today.

Ramana

Re: [C++ Patch] PR 61080

2014-05-07 Thread Jason Merrill


OK.

Jason

[patch] libstdc++/61023 - copy comparison functor in RB tree move assignment

2014-05-07 Thread Jonathan Wakely


As noted in the PR, the standard doesn't actually say what containers
should do with their functors on move construction/assignment.

Our unordered containers currently move the hash and predicate
functions.

Our RB trees copy the comparison function in the move constructor but
do nothing with it in the move assignment. I think moving in both
cases is probably correct, but rather than change the existing move
constructor this patch just makes the move assignment copy the
function, for consistency.

When the standard is clarified we can review whether we should be
moving instead of copying.

Tested x86_64-linux, committed to trunk and the 4.9 branch.
commit 42ea108aeb7528ff3b41f7c1b9d11f3a8ba1bae8
Author: Jonathan Wakely 
Date:   Wed May 7 14:25:48 2014 +0100

	PR libstdc++/61023
	* include/bits/stl_tree.h (_Rb_tree::_M_move_assign): Copy the
	comparison function.
	* testsuite/23_containers/set/cons/61023.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index 288c9fa..ce43ab8 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -1073,6 +1073,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
 _M_move_assign(_Rb_tree& __x)
 {
+  _M_impl._M_key_compare = __x._M_impl._M_key_compare;
   if (_Alloc_traits::_S_propagate_on_move_assign()
 	  || _Alloc_traits::_S_always_equal()
 	  || _M_get_Node_allocator() == __x._M_get_Node_allocator())
diff --git a/libstdc++-v3/testsuite/23_containers/set/cons/61023.cc b/libstdc++-v3/testsuite/23_containers/set/cons/61023.cc
new file mode 100644
index 0000000..087b9cc
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/set/cons/61023.cc
@@ -0,0 +1,56 @@
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <set>
+#include <stdexcept>
+
+struct Comparator
+{
+  Comparator() : valid(false) { }
+  explicit Comparator(bool) : valid(true) { }
+
+  bool operator()(int i, int j) const
+  {
+    if (!valid)
+      throw std::logic_error("Comparator is invalid");
+    return i < j;
+  }
+
+private:
+  bool valid;
+};
+
+int main()
+{
+  using test_type = std::set<int, Comparator>;
+
+  Comparator cmp{true};
+
+  test_type good{cmp};
+
+  test_type s1;
+  s1 = good; // copy-assign
+  s1.insert(1);
+  s1.insert(2);
+
+  test_type s2;
+  s2 = std::move(good);  // move-assign
+  s2.insert(1);
+  s2.insert(2);
+}
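
The effect of the change can also be observed through the ordering of the
target container.  The following is a minimal sketch, assuming a library
in which move assignment propagates the comparison object as described
above; the Order type and fill_after_move_assign are hypothetical names
used only for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stateful comparator: ascending by default, descending on request.
struct Order {
  Order() : descending(false) { }
  explicit Order(bool d) : descending(d) { }

  bool operator()(int a, int b) const
  { return descending ? b < a : a < b; }

  bool descending;
};

// Move-assigns a set with a descending comparator into a default-constructed
// (ascending) set, then inserts into the target and returns its iteration order.
std::vector<int> fill_after_move_assign() {
  Order desc(true);
  std::set<int, Order> source(desc);
  std::set<int, Order> target;  // starts with the ascending comparator

  target = std::move(source);   // comparator is expected to follow the data
  target.insert(1);
  target.insert(2);
  target.insert(3);
  return std::vector<int>(target.begin(), target.end());
}
```

If the comparator were left untouched by the move assignment, the target
would keep its ascending order instead of adopting the source's ordering.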

Re: [Patch ARM 1/3] Neon intrinsics TLC : Replace intrinsics with GNU C implementations where possible.

2014-05-07 Thread Richard Earnshaw

On 28/04/14 14:01, Ramana Radhakrishnan wrote:
> 
> On Mon, Apr 28, 2014 at 12:44 PM, Julian Brown  
> wrote:
>  > On Mon, 28 Apr 2014 11:44:01 +0100
>  > Ramana Radhakrishnan  wrote:
>  >
>  >> I've special cased the ffast-math case for the _f32 intrinsics to
>  >> prevent the auto-vectorizer from coming along and vectorizing addv2sf
>  >> and addv4sf type operations which we don't want to happen by default.
>  >> Patch 1/3 causes apparent "regressions" in the rather ineffective
>  >> neon intrinsics tests that we currently carry soon hopefully to be
>  >> replaced by Christophe Lyon's rewrite that is being reviewed. On the
>  >> whole I deem this patch stack to be safe to go in if necessary. These
>  >> "regressions" are for -O0 with the vbic and vorn intrinsics which
>  >> don't now get combined and well, so be it.
>  >
>  > I think reimplementing these intrinsics in C is a mistake if we ever
>  > hope to make big-endian mode work properly, and "fixing" the generated
>  > header file by bypassing the generator makes it harder to accurately
>  > perform the sweeping changes that will probably be necessary to do that.
> 
> 
>  > Recall e.g. the discussion around:
> 
>  >
>  > http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00161.html
> 
> Well, it would help if the generator were written in a better language 
> than ML :) . While I don't mind the different language in the backend 
> once in a while, the problem is that every time anyone needs to make a 
> change to this file, we spend far more time relearning ML than actually 
> doing the change :(.
> 

I agree: it's time the ML files went.  They're an impediment to
maintenance these days.

When the ML description was added it did three things: generated
arm_neon.h, generated the testsuite and generated a pipeline description
for Cortex-A8.  As we've progressed the second and third of these have
gone away (or at least, are about to in the case of the testsuite),
leaving only the arm_neon.h generation.  I don't see any real merit in
having that file generated from the ML file; we might as well just
maintain the existing code directly and that brings about the chance to
have more people actively work on fixing issues there without having to
learn ML first.

R.
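
For readers unfamiliar with the approach under discussion, writing an
intrinsic in GNU C means expressing it with the generic vector extensions
rather than a target builtin.  A minimal sketch follows, assuming a
compiler that supports the vector_size attribute and vector subscripting
(GCC or Clang); the v2si typedef and the sketch_* names are hypothetical
and are not the definitions used in arm_neon.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit vector of two 32-bit lanes (GNU vector extension).
typedef int32_t v2si __attribute__((vector_size(8)));

// Lane-wise addition written as a plain C expression.
static inline v2si sketch_vadd_s32(v2si a, v2si b) { return a + b; }

// Bit clear: a AND NOT b, lane by lane.
static inline v2si sketch_vbic_s32(v2si a, v2si b) { return a & ~b; }

// OR with complement: a OR NOT b, lane by lane.
static inline v2si sketch_vorn_s32(v2si a, v2si b) { return a | ~b; }
```

Written this way, the "and" and "not" are separate operations that the
compiler has to combine into a single instruction itself.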

[PATCH, PR 60897] Clear DECL_LANG_SPECIFIC when creating ISRA clones

2014-05-07 Thread Martin Jambor

Hi,

I nearly forgot about this patch to fix PR 60897, where we get a
mangled name in a warning for IPA-SRA functions.  IPA-SRA currently
does not clear DECL_LANG_SPECIFIC when it modifies formal parameters,
and the front end does not look at the abstract origin when
DECL_LANG_SPECIFIC is not NULL.

Bootstrapped and tested on x86_64-linux.  OK for trunk?  Also,
although I have not tested it there yet, I suppose this should also be
committed to the 4.9 branch.

Thanks,

Martin


2014-04-22  Martin Jambor  

PR ipa/60897
* ipa-prop.c (ipa_modify_formal_parameters): Reset DECL_LANG_SPECIFIC.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 9f144fa..0bc44d3 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3650,6 +3650,7 @@ ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments)
 
   TREE_TYPE (fndecl) = new_type;
   DECL_VIRTUAL_P (fndecl) = 0;
+  DECL_LANG_SPECIFIC (fndecl) = NULL;
   otypes.release ();
   oparms.release ();
 }

Re: [PATCH] Fix GDB PR15559 (inferior calls using "thiscall" calling convention)

2014-05-07 Thread Tom Tromey

Tom> The usual approach is some appropriate text somewhere on the GCC wiki
Tom> (though I suppose a note in the mail archives would do in a pinch)
Tom> along with a URL in a comment in the appropriate file (dwarf2.h or
Tom> dwarf2.def).

Tom> Could you please do that?

Julian> How's this, as a first attempt?
Julian> http://gcc.gnu.org/wiki/GNUDwarfExtensions

Sorry I didn't reply to this sooner.
That page looks great.  Thanks for doing this.

Tom

[C++ PATCH] demangler fix

2014-05-07 Thread Gary Benson

Hi all,

A patch I committed to libiberty last year [1, 2] introduced a regression
that makes the demangler segfault on certain symbols [3, 4, 5, 6].
The attached patch fixes this and adds regression tests for all symbols
referenced in those bugs.

Ok to commit?

Thanks,
Gary

--
http://gbenson.net/

[1] http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01299.html
[2] http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01755.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=14963
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=16593
[5] https://sourceware.org/bugzilla/show_bug.cgi?id=16752
[6] https://sourceware.org/bugzilla/show_bug.cgi?id=16845

2014-05-07  Gary Benson  

* cp-demangle.c (struct d_component_stack): New structure.
(struct d_print_info): New field component_stack.
(d_print_init): Initialize the above.
(d_print_comp_inner): Renamed from d_print_comp.
Do not restore template stack if it would cause a loop.
(d_print_comp): New function.
* testsuite/demangle-expected: New test cases.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index bf2ffa9..41c86c7 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -275,6 +275,16 @@ struct d_growable_string
   int allocation_failure;
 };
 
+/* Stack of components, innermost first, used to avoid loops.  */
+
+struct d_component_stack
+{
+  /* This component.  */
+  const struct demangle_component *dc;
+  /* This component's parent.  */
+  const struct d_component_stack *parent;
+};
+
 /* A demangle component and some scope captured when it was first
traversed.  */
 
@@ -327,6 +337,8 @@ struct d_print_info
   int pack_index;
   /* Number of d_print_flush calls so far.  */
   unsigned long int flush_count;
+  /* Stack of components, innermost first, used to avoid loops.  */
+  const struct d_component_stack *component_stack;
   /* Array of saved scopes for evaluating substitutions.  */
   struct d_saved_scope *saved_scopes;
   /* Index of the next unused saved scope in the above array.  */
@@ -3934,6 +3946,8 @@ d_print_init (struct d_print_info *dpi, demangle_callbackref callback,
 
   dpi->demangle_failure = 0;
 
+  dpi->component_stack = NULL;
+
   dpi->saved_scopes = NULL;
   dpi->next_saved_scope = 0;
   dpi->num_saved_scopes = 0;
@@ -4269,8 +4283,8 @@ d_get_saved_scope (struct d_print_info *dpi,
 /* Subroutine to handle components.  */
 
 static void
-d_print_comp (struct d_print_info *dpi, int options,
-  const struct demangle_component *dc)
+d_print_comp_inner (struct d_print_info *dpi, int options,
+ const struct demangle_component *dc)
 {
   /* Magic variable to let reference smashing skip over the next modifier
  without needing to modify *dc.  */
@@ -4673,11 +4687,30 @@ d_print_comp (struct d_print_info *dpi, int options,
  }
else
  {
+   const struct d_component_stack *dcse;
+   int found_self_or_parent = 0;
+
/* This traversal is reentering SUB as a substition.
-  Restore the original templates temporarily.  */
-   saved_templates = dpi->templates;
-   dpi->templates = scope->templates;
-   need_template_restore = 1;
+  If we are not beneath SUB or DC in the tree then we
+  need to restore SUB's template stack temporarily.  */
+   for (dcse = dpi->component_stack; dcse != NULL;
+dcse = dcse->parent)
+ {
+   if (dcse->dc == sub
+   || (dcse->dc == dc
+   && dcse != dpi->component_stack))
+ {
+   found_self_or_parent = 1;
+   break;
+ }
+ }
+
+   if (!found_self_or_parent)
+ {
+   saved_templates = dpi->templates;
+   dpi->templates = scope->templates;
+   need_template_restore = 1;
+ }
  }
 
a = d_lookup_template_argument (dpi, sub);
@@ -5316,6 +5349,21 @@ d_print_comp (struct d_print_info *dpi, int options,
 }
 }
 
+static void
+d_print_comp (struct d_print_info *dpi, int options,
+ const struct demangle_component *dc)
+{
+  struct d_component_stack self;
+
+  self.dc = dc;
+  self.parent = dpi->component_stack;
+  dpi->component_stack = &self;
+
+  d_print_comp_inner (dpi, options, dc);
+
+  dpi->component_stack = self.parent;
+}
+
 /* Print a Java dentifier.  For Java we try to handle encoded extended
Unicode characters.  The C++ ABI doesn't mention Unicode encoding,
so we don't it for C++.  Characters are encoded as
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index 3ff08e6..453f9a3 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -4
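
The d_component_stack idea, a linked list of stack-allocated frames that
records which components are currently being printed, can be sketched in
isolation.  This is a simplified illustration, not the demangler code:
Node, Frame, Printer and print are hypothetical names, and the sketch
stops the recursion when it meets an ancestor, whereas the patch uses the
same check to decide whether to restore the saved template stack.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical component: a name plus an optional child that may
// (incorrectly) point back at one of its own ancestors.
struct Node {
  explicit Node(std::string n) : name(std::move(n)), child(nullptr) { }
  std::string name;
  const Node* child;
};

// One entry of the "currently printing" stack, innermost first.
struct Frame {
  const Node* node;
  const Frame* parent;
};

struct Printer {
  std::string out;
  const Frame* stack = nullptr;
};

// True if N is already being printed somewhere up the call chain.
static bool on_stack(const Frame* top, const Node* n) {
  for (const Frame* f = top; f != nullptr; f = f->parent)
    if (f->node == n)
      return true;
  return false;
}

void print(Printer& p, const Node* n) {
  if (on_stack(p.stack, n)) {
    p.out += "<loop>";  // refuse to re-enter an ancestor
    return;
  }

  Frame self = {n, p.stack};  // push: the frame lives on the C++ call stack
  p.stack = &self;

  p.out += n->name;
  if (n->child) {
    p.out += " ";
    print(p, n->child);
  }

  p.stack = self.parent;  // pop before the frame goes out of scope
}
```

Because each frame is a local variable of the wrapper call, pushing and
popping need no allocation and the stack always mirrors the recursion.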

Re: PR 61084: SPARC fallout from wide-int merge

2014-05-07 Thread Mike Stump

On May 7, 2014, at 2:26 AM, Richard Sandiford  
wrote:
> The DImode constant splitter assigned the result of trunc_int_for_mode
> to an unsigned int rather than a HOST_WIDE_INT.  This then produced const_ints
> that were zero-extended rather than sign-extended and tripped the assert:
> 
>   gcc_checking_assert (INTVAL (x.first)
>== sext_hwi (INTVAL (x.first), precision)
>|| (x.second == BImode && INTVAL (x.first) == 1));
> 
> The other hunks are just by inspection, but I think gen_int_mode is
> preferred over GEN_INT when the mode is obvious.
> 
> Tested by Rainer, who says that the bootstrap now completes.
> OK to install?

Ok.
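
The failure mode described in the quoted text, a sign-extended value
passing through a narrower unsigned variable, can be reproduced outside
GCC.  This is a minimal sketch: sign_extend, is_canonical and
widen_through_unsigned are hypothetical helpers, int64_t stands in for
HOST_WIDE_INT and uint32_t for a 32-bit unsigned int.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low PRECISION bits of VALUE (1 <= PRECISION <= 64).
int64_t sign_extend(int64_t value, unsigned precision) {
  if (precision >= 64)
    return value;
  const uint64_t mask = (uint64_t{1} << precision) - 1;
  const uint64_t sign = uint64_t{1} << (precision - 1);
  const uint64_t low = static_cast<uint64_t>(value) & mask;
  return static_cast<int64_t>((low ^ sign) - sign);
}

// The property the quoted assert checks: the stored value must already be
// sign-extended from PRECISION bits.
bool is_canonical(int64_t value, unsigned precision) {
  return value == sign_extend(value, precision);
}

// Stores a value in a 32-bit unsigned temporary and widens it again,
// which zero-extends instead of sign-extending.
int64_t widen_through_unsigned(int64_t value) {
  const uint32_t narrow = static_cast<uint32_t>(value);
  return narrow;
}
```

For a 32-bit value of -1, keeping the result in the wide signed type stays
canonical, while the detour through the unsigned temporary yields
4294967295 and fails the check.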

[Committed] Add myself to MAINTAINERS

2014-05-07 Thread Charles Baylis

Committed as r210164.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 210161)
+++ MAINTAINERS (working copy)
@@ -315,6 +315,7 @@
 Simon Baldwin  sim...@google.com
 Scott Bambrough  sco...@netwinder.org
 Wolfgang Bangerth  bange...@dealii.org
+Charles Baylis charles.bay...@linaro.org
 Tejas Belagod  tejas.bela...@arm.com
 Andrey Belevantsev a...@ispras.ru
 Jon Beniston   j...@beniston.com

Re: [C++ PATCH] demangler fix

2014-05-07 Thread Jason Merrill


OK, thanks.

Jason

[PATCH] copyprop_hardreg_forward needs to check HARD_REGNO_CALL_PART_CLOBBERED

2014-05-07 Thread Matthew Fortune

The MIPS O32 FPXX ABI exposes a bug in regcprop where call part
clobbered information is not checked when calculating clobbered
registers. This is only one of many places that 
regs_invalidated_by_call is used without also checking 
HARD_REGNO_CALL_PART_CLOBBERED. This patch ensures that a part 
clobbered register is treated as if fully clobbered.

Other places where this same issue occurs are not so easily
fixed as they do not always have mode information available
when calculating clobbered registers. A solution to the larger
problem will be significantly more involved.

Exposed in a testcase as part of:
http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00401.html

Regards,
Matthew

2014-05-07  Matthew Fortune  

gcc/
* regcprop.c (copyprop_hardreg_forward_1): Account for
HARD_REGNO_CALL_PART_CLOBBERED.


0001-copyprop-part-clobbered.patch
Description: 0001-copyprop-part-clobbered.patch
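
Since the patch itself is only attached, here is a rough sketch of the
rule stated above, that a part-clobbered register is treated as fully
clobbered.  Everything in it is an assumption made for illustration: the
eight-register file, the rule that registers 4 to 7 preserve only their
low 4 bytes across a call, and the function names.  It is not the
regcprop.c change.

```cpp
#include <bitset>
#include <cassert>

constexpr int kNumRegs = 8;  // hypothetical register file size

// Hypothetical stand-in for HARD_REGNO_CALL_PART_CLOBBERED: registers 4..7
// are call-saved, but a call preserves only their low 4 bytes.
bool call_part_clobbered(int regno, int mode_size_bytes) {
  return regno >= 4 && mode_size_bytes > 4;
}

// A value held in REGNO with the given mode size survives a call only if
// the register is neither fully invalidated nor partially clobbered.
bool killed_by_call(const std::bitset<kNumRegs>& invalidated_by_call,
                    int regno, int mode_size_bytes) {
  return invalidated_by_call.test(regno)
         || call_part_clobbered(regno, mode_size_bytes);
}
```

Checking only the invalidated set would wrongly keep an 8-byte value in
register 5 alive across the call; the extra mode-aware test rejects it.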

Re: PR 61084: SPARC fallout from wide-int merge

2014-05-07 Thread Richard Sandiford

Mike Stump  writes:
> On May 7, 2014, at 2:26 AM, Richard Sandiford
>  wrote:
>> The DImode constant splitter assigned the result of trunc_int_for_mode
>> to an unsigned int rather than a HOST_WIDE_INT.  This then produced 
>> const_ints
>> that were zero-extended rather than sign-extended and tripped the assert:
>> 
>>  gcc_checking_assert (INTVAL (x.first)
>>   == sext_hwi (INTVAL (x.first), precision)
>>   || (x.second == BImode && INTVAL (x.first) == 1));
>> 
>> The other hunks are just by inspection, but I think gen_int_mode is
>> preferred over GEN_INT when the mode is obvious.
>> 
>> Tested by Rainer, who says that the bootstrap now completes.
>> OK to install?
>
> Ok.

I think this needs a backend maintainer.  Although it was exposed by
the wide-int assert, it isn't really wide-int-related as such.

Thanks,
Richard

Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Mike Stump

On May 7, 2014, at 2:32 AM, Herman, Andrei  
wrote:
> However, code coverage tools that process the DWARF debug information to
> implement block/path coverage need more complete lexical block information. 

So, it would be nice to give a hint in the actual documentation, why a user 
might use the flag, or for a maintainer to be able to predict exactly what was 
desired in some obscure corner of dwarf semantics given the documentation.  I 
think it can be as simple as “This option is useful for code coverage tools 
that utilize the dwarf debug information.”  A user, upon seeing that, would 
then ask, do I have such a tool, say no, and then know they don’t have to 
contemplate the goodness of the option further.  If one is writing a coverage 
tool, upon seeing the documentation, they might then ask themselves, how might 
I use that flag profitably for my users.

Re: [PATCH] rs6000: New attributes for load/store: "sign_extend", "update" and "indexed"

2014-05-07 Thread David Edelsohn

On Sun, May 4, 2014 at 10:13 PM, Segher Boessenkool
 wrote:
> The new attributes replace the instruction types *_ext*, *_u, *_ux.
>
> This simplifies all code that does not care about the addressing modes,
> putting the burden on the code that does care (mostly the scheduling
> descriptions for certain CPUs).
>
> It fixes a few minor bugs in the process.
>
> The "update" and "indexed" attributes are automatic for any insn that
> has a MEM as operand 0 or 1.  Other insns have to set it manually, if
> they do not like the default (which is "no").  Insns that are type
> load/store/fpload/fpstore but have fewer than two operands need to set
> it too, or the compiler will crash.  There are very few of those.
>
> This tries not to change semantics anywhere; in particular, the string
> and multiple instructions set both "update" and "indexed" (although
> they are neither).
>
> Bootstrapped on powerpc64-linux c,c++,fortran,ada,go; tested
> {-m64,-m64/-mcpu=power8,-m32,-m32/-mpowerpc64}, no regressions.
>
> OK for mainline?
>
>
> Segher
>
>
> gcc/
>
> 2014-05-04  Segher Boessenkool  
>
> * config/rs6000/predicates.md (indexed_address_mem): New.
> * config/rs6000/rs6000.md (type): Remove load_ext, load_ext_u,
> load_ext_ux, load_ux, load_u, store_ux, store_u, fpload_ux, fpload_u,
> fpstore_ux, fpstore_u.
> (sign_extend, indexed, update): New.
> (cell_micro): Adjust.
> (*zero_extenddi2_internal1, *zero_extendsidi2_lfiwzx,
> *extendsidi2_lfiwax, *extendsidi2_nocell, *extendsfdf2_fpr,
> *movsi_internal1, *movsi_internal1_single, *movhi_internal,
> *movqi_internal, *movcc_internal1, mov_hardfloat,
> *mov_softfloat, *mov_hardfloat32, *mov_hardfloat64,
> *mov_softfloat64, *movdi_internal32, *movdi_internal64,
> *mov_string, *ldmsi8, *ldmsi7, *ldmsi6, *ldmsi5, *ldmsi4,
> *ldmsi3, *stmsi8, *stmsi7, *stmsi6, *stmsi5, *stmsi4, *stmsi3,
> *movdi_update1, movdi__update, movdi__update_stack,
> *movsi_update1, *movsi_update2, movsi_update, movsi_update_stack,
> *movhi_update1, *movhi_update2, *movhi_update3, *movhi_update4,
> *movqi_update1, *movqi_update2, *movqi_update3, *movsf_update1,
> *movsf_update2, *movsf_update3, *movsf_update4, *movdf_update1,
> *movdf_update2, load_toc_aix_si, load_toc_aix_di, probe_stack_,
> *stmw, *lmw, as well as 10 anonymous patterns): Adjust.
>
> * config/rs6000/dfp.md (movsd_store, movsd_load): Adjust.
> * config/rs6000/vsx.md (*vsx_movti_32bit, *vsx_extract__load,
> *vsx_extract__store): Adjust.
> * config/rs6000/rs6000.c (rs6000_adjust_cost, is_microcoded_insn,
> is_cracked_insn, insn_must_be_first_in_group,
> insn_must_be_last_in_group): Adjust.
>
> * config/rs6000/40x.md (ppc403-load, ppc403-store, ppc405-float):
> Adjust.
> * config/rs6000/440.md (ppc440-load, ppc440-store, ppc440-fpload,
> ppc440-fpstore): Adjust.
> * config/rs6000/476.md (ppc476-load, ppc476-store, ppc476-fpload,
> ppc476-fpstore): Adjust.
> * config/rs6000/601.md (ppc601-load, ppc601-store, ppc601-fpload,
> ppc601-fpstore): Adjust.
> * config/rs6000/603.md (ppc603-load, ppc603-store, ppc603-fpload):
> Adjust.
> * config/rs6000/6xx.md (ppc604-load, ppc604-store, ppc604-fpload):
> Adjust.
> * config/rs6000/7450.md (ppc7450-load, ppc7450-store, ppc7450-fpload,
> ppc7450-fpstore): Adjust.
> * config/rs6000/7xx.md (ppc750-load, ppc750-store): Adjust.
> * config/rs6000/8540.md (ppc8540_load, ppc8540_store): Adjust.
> * config/rs6000/a2.md (ppca2-load, ppca2-fp-load, ppca2-fp-store):
> Adjust.
> * config/rs6000/cell.md (cell-load, cell-load-ux, cell-load-ext,
> cell-fpload, cell-fpload-update, cell-store, cell-store-update,
> cell-fpstore, cell-fpstore-update): Adjust.
> * config/rs6000/e300c2c3.md (ppce300c3_load, ppce300c3_fpload,
> ppce300c3_store, ppce300c3_fpstore): Adjust.
> * config/rs6000/e500mc.md (e500mc_load, e500mc_fpload, e500mc_store,
> e500mc_fpstore): Adjust.
> * config/rs6000/e500mc64.md (e500mc64_load, e500mc64_fpload,
> e500mc64_store, e500mc64_fpstore): Adjust.
> * config/rs6000/e5500.md (e5500_load, e5500_fpload, e5500_store,
> e5500_fpstore): Adjust.
> * config/rs6000/e6500.md (e6500_load, e6500_fpload, e6500_store,
> e6500_fpstore): Adjust.
> * config/rs6000/mpc.md (mpccore-load, mpccore-store, mpccore-fpload):
> Adjust.
> * config/rs6000/power4.md (power4-load, power4-load-ext,
> power4-load-ext-update, power4-load-ext-update-indexed,
> power4-load-update-indexed, power4-load-update, power4-fpload,
> power4-fpload-update, power4-store, power4-store-update,
> power4-store-update-indexed, power4-f

[4.7] Various backports

2014-05-07 Thread Jakub Jelinek

Hi!

I've backported some fixes I've committed (plus one support change from
Jason and one fix from Marek) to 4.8 branch in the last year or so to
4.7 branch, after bootstrapping/regtesting them on x86_64-linux and
i686-linux.
Sorry for the delay.

Jakub
2014-05-07  Jakub Jelinek  

Backported from mainline
2013-06-27  Jakub Jelinek  

PR target/57623
* config/i386/i386.md (bmi2_bzhi_3): Swap AND arguments
to match RTL canonicalization.  Swap predicates and
constraints of operand 1 and 2.

* gcc.target/i386/bmi2-bzhi-1.c: New test.

--- gcc/config/i386/i386.md (revision 200477)
+++ gcc/config/i386/i386.md (revision 200478)
@@ -12174,9 +12174,9 @@ (define_insn "*bmi_blsr_"
 ;; BMI2 instructions.
 (define_insn "bmi2_bzhi_3"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-   (and:SWI48 (match_operand:SWI48 1 "register_operand" "r")
-  (lshiftrt:SWI48 (const_int -1)
-  (match_operand:SWI48 2 "nonimmediate_operand" "rm"))))
+   (and:SWI48 (lshiftrt:SWI48 (const_int -1)
+  (match_operand:SWI48 2 "register_operand" "r"))
+  (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI2"
   "bzhi\t{%2, %1, %0|%0, %1, %2}"
--- gcc/testsuite/gcc.target/i386/bmi2-bzhi-1.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/bmi2-bzhi-1.c (revision 200478)
@@ -0,0 +1,31 @@
+/* PR target/57623 */
+/* { dg-do assemble { target bmi2 } } */
+/* { dg-options "-O2 -mbmi2" } */
+
+#include 
+
+unsigned int
+f1 (unsigned int x, unsigned int *y)
+{
+  return _bzhi_u32 (x, *y);
+}
+
+unsigned int
+f2 (unsigned int *x, unsigned int y)
+{
+  return _bzhi_u32 (*x, y);
+}
+
+#ifdef  __x86_64__
+unsigned long long
+f3 (unsigned long long x, unsigned long long *y)
+{
+  return _bzhi_u64 (x, *y);
+}
+
+unsigned long long
+f4 (unsigned long long *x, unsigned long long y)
+{
+  return _bzhi_u64 (*x, y);
+}
+#endif
2014-05-07  Jakub Jelinek  

Backported from mainline
2013-06-27  Jakub Jelinek  

PR target/57623
* config/i386/i386.md (bmi_bextr_): Swap predicates and
constraints of operand 1 and 2.

* gcc.target/i386/bmi-bextr-3.c: New test.

--- gcc/config/i386/i386.md (revision 200479)
+++ gcc/config/i386/i386.md (revision 200480)
@@ -12077,8 +12077,8 @@
 
 (define_insn "bmi_bextr_"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-(unspec:SWI48 [(match_operand:SWI48 1 "register_operand" "r")
-   (match_operand:SWI48 2 "nonimmediate_operand" "rm")]
+(unspec:SWI48 [(match_operand:SWI48 1 "nonimmediate_operand" "rm")
+   (match_operand:SWI48 2 "register_operand" "r")]
UNSPEC_BEXTR))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI"
--- gcc/testsuite/gcc.target/i386/bmi-bextr-3.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/bmi-bextr-3.c (revision 200480)
@@ -0,0 +1,31 @@
+/* PR target/57623 */
+/* { dg-do assemble { target bmi } } */
+/* { dg-options "-O2 -mbmi" } */
+
+#include 
+
+unsigned int
+f1 (unsigned int x, unsigned int *y)
+{
+  return __bextr_u32 (x, *y);
+}
+
+unsigned int
+f2 (unsigned int *x, unsigned int y)
+{
+  return __bextr_u32 (*x, y);
+}
+
+#ifdef  __x86_64__
+unsigned long long
+f3 (unsigned long long x, unsigned long long *y)
+{
+  return __bextr_u64 (x, *y);
+}
+
+unsigned long long
+f4 (unsigned long long *x, unsigned long long y)
+{
+  return __bextr_u64 (*x, y);
+}
+#endif
2014-05-07  Jakub Jelinek  

Backported from mainline
2013-07-03  Jakub Jelinek  

PR target/5
* config/i386/predicates.md (vsib_address_operand): Disallow
SYMBOL_REF or LABEL_REF in parts.disp if TARGET_64BIT && flag_pic.

* gcc.target/i386/pr5.c: New test.

--- gcc/config/i386/predicates.md   (revision 200649)
+++ gcc/config/i386/predicates.md   (revision 200650)
@@ -835,19 +835,28 @@ (define_predicate "vsib_address_operand"
 return false;
 
   /* VSIB addressing doesn't support (%rip).  */
-  if (parts.disp && GET_CODE (parts.disp) == CONST)
+  if (parts.disp)
 {
-  disp = XEXP (parts.disp, 0);
-  if (GET_CODE (disp) == PLUS)
-   disp = XEXP (disp, 0);
-  if (GET_CODE (disp) == UNSPEC)
-   switch (XINT (disp, 1))
- {
- case UNSPEC_GOTPCREL:
- case UNSPEC_PCREL:
- case UNSPEC_GOTNTPOFF:
-   return false;
- }
+  disp = parts.disp;
+  if (GET_CODE (disp) == CONST)
+   {
+ disp = XEXP (disp, 0);
+ if (GET_CODE (disp) == PLUS)
+   disp = XEXP (disp, 0);
+ if (GET_CODE (disp) == UNSPEC)
+   switch (XINT (disp, 1))
+ {
+ case UNSPEC_GOTPCREL:
+ case UNSPEC_PCREL:
+ case UNSPEC_GOTNTPOFF:
+   return false;
+ }
+   }
+

Re: [PATCH, PR 60897] Clear DECL_LANG_SPECIFIC when creating ISRA clones

2014-05-07 Thread Richard Biener

On May 7, 2014 5:30:53 PM CEST, Martin Jambor  wrote:
>Hi,
>
>I nearly forgot about this patch to fix PR 60897 where we get a
>mangled name in a warning for IPA-SRA functions because IPA-SRA
>currently does not clear DECL_LANG_SPECIFIC when it messes with formal
>parameters and the front-end then does not look at abstract origin
>when it is not NULL.
>
>Bootstrapped and tested on x86_64-linux.  OK for trunk?  Also,
>although I have not tested it there yet, I suppose this should also be
>committed to the 4.9 branch.

OK for both.
Thanks,
Richard.

>Thanks,
>
>Martin
>
>
>2014-04-22  Martin Jambor  
>
>   PR ipa/60897
>   * ipa-prop.c (ipa_modify_formal_parameters): Reset DECL_LANG_SPECIFIC.
>
>diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
>index 9f144fa..0bc44d3 100644
>--- a/gcc/ipa-prop.c
>+++ b/gcc/ipa-prop.c
>@@ -3650,6 +3650,7 @@ ipa_modify_formal_parameters (tree fndecl,
>ipa_parm_adjustment_vec adjustments)
> 
>   TREE_TYPE (fndecl) = new_type;
>   DECL_VIRTUAL_P (fndecl) = 0;
>+  DECL_LANG_SPECIFIC (fndecl) = NULL;
>   otypes.release ();
>   oparms.release ();
> }

Re: [SH, committed] Fix PR 61026 sh-- Fails to Compile on FreeBSD

2014-05-07 Thread Joseph S. Myers

On Sat, 3 May 2014, Oleg Endo wrote:

> +#include 
> +#include 
> +#include 
> +
>  #include "config.h"

It's never OK to include any system headers (C or C++) before "config.h".  
config.h may define feature test macros such as _FILE_OFFSET_BITS that 
affect system headers in various ways and are only effective if defined 
before any system headers are included, and if different files in GCC are 
built with different settings of such feature test macros then they may 
expect incompatible choices of ABI for C library types.

(This is a general principle for any software using autoconf, at least if 
it uses any of the autoconf macros that can define feature test macros - 
which GCC does - not just for GCC.)
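
As a rough illustration of the ordering rule above, the sketch below defines a feature test macro before the first system header, which is what including "config.h" first achieves in an autoconf project. The hand-written #define is only a stand-in for what a generated config.h might contain, and off_t_width_in_bytes is a hypothetical helper, not anything from GCC.

```c
/* Sketch: a feature test macro only takes effect if it is defined
   before the first system header is seen.  In an autoconf project the
   definition would come from "config.h"; the line below is a
   hand-written stand-in for that (an assumption for illustration).  */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>   /* off_t is sized according to the macro above */
#include <stddef.h>

/* Hypothetical helper: report the width of off_t as seen by this
   translation unit.  Files built with a different macro setting could
   see a different width, which is the ABI hazard described above.  */
size_t
off_t_width_in_bytes (void)
{
  return sizeof (off_t);
}
```

If a system header were included above the #define, the C library might already have fixed its choice of off_t, and the macro would silently have no effect.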

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [patch] change specific int128 -> generic intN

2014-05-07 Thread Joseph S. Myers

On Sun, 4 May 2014, DJ Delorie wrote:

> > I'm not aware of any reason those macros need to have decimal values.  I'd 
> > suggest removing the precomputed table and printing them in hex, which is 
> > easy for values of any precision.
> 
> Here's an independent change that removes the decimal table and
> replaces it with generated hex values.  I included the relevant output
> of gcc -E -dM also.

OK (presuming the usual bootstrap and regression test, which should 
provide a reasonably thorough test of this code through the  
tests).
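
The point that hex is easy for any precision can be sketched as follows: the maximum of a signed type with a given number of bits is one leading digit followed by a run of 'f's, so no precomputed table is needed. This is only an illustration under that assumption; signed_max_as_hex is a hypothetical helper and not the code from the patch.

```c
#include <stddef.h>

/* Write the maximum value of a signed integer type with PRECISION bits
   (sign bit included) into BUF as a hexadecimal literal, for example
   "0x7fff" for 16 bits.  Returns BUF, or NULL if the arguments are
   unusable.  Hypothetical helper, not GCC's implementation.  */
char *
signed_max_as_hex (unsigned precision, char *buf, size_t len)
{
  if (precision < 2)
    return NULL;

  unsigned value_bits = precision - 1;                /* drop the sign bit */
  unsigned digits = (value_bits + 3) / 4;             /* hex digits needed */
  unsigned top_bits = value_bits - (digits - 1) * 4;  /* 1..4 bits in the leading digit */

  if (len < (size_t) digits + 3)                      /* "0x" + digits + NUL */
    return NULL;

  char *p = buf;
  *p++ = '0';
  *p++ = 'x';
  *p++ = "0123456789abcdef"[(1u << top_bits) - 1];    /* leading digit: 1, 3, 7 or f */
  for (unsigned i = 1; i < digits; i++)
    *p++ = 'f';                                       /* every remaining digit is all ones */
  *p = '\0';
  return buf;
}
```

The same loop works unchanged for a 20-bit or a 128-bit type, which a decimal table cannot offer without big-number arithmetic.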

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [DOC PATCH] Rewrite docs for inline asm

2014-05-07 Thread Joseph S. Myers

On Mon, 5 May 2014, Gerald Pfeifer wrote:

> > I've changed this to @code{"="}.  Is that what you meant?
> 
> This is a question for Joseph.  I see how a single character
> under @code{} won't work, yet @code{"="} doesn't feel right,
> either.  Perhaps ``@code{=}''?

If you are referring to an actual string constant

  "="

in the user's source code, then @code{"="} is correct.  If you are 
referring just to the single character

  =

in the user's source code, whether as a token on its own or as part of a 
larger token, then @samp{=} is the way to get it quoted (with the 
character being in a fixed-width font, but the quotes around it not being 
in such a font).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C PATCH] Don't reject valid code with _Alignas (PR c/61053)

2014-05-07 Thread Joseph S. Myers

On Mon, 5 May 2014, Marek Polacek wrote:

> In this PR the issue is that we reject (valid) code such as
> _Alignas (long long) long long foo;
> with -m32, because we trip this condition:
> 
>alignas_align = 1U << declspecs->align_log;
>if (alignas_align < TYPE_ALIGN_UNIT (type))
>  {
>if (name)
>  error_at (loc, "%<_Alignas%> specifiers cannot reduce "
>"alignment of %qE", name);
> 
> and error later on, since alignas_align is 4 (correct, see PR52023 for
> why), but TYPE_ALIGN_UNIT of long long is 8.  I think TYPE_ALIGN_UNIT
> is wrong here as that won't give us minimal alignment required.
> In c_sizeof_or_alignof_type we already have the code to compute such
> minimal alignment so I just moved the code to a separate function
> and used that instead of TYPE_ALIGN_UNIT.
> 
> Note that the test is run only on i?86 and x86_64, because we can't (?)
> easily determine which target requires what alignment.
> 
> Regtested/bootstrapped on x86_64-unknown-linux-gnu and
> powerpc64-unknown-linux-gnu, ok for trunk?

OK, though I'm not sure if the "lp64" conditions are right in the testcase 
(i.e. if x32 has the same peculiarity as -m32 here, which is what's 
implied by the use of "lp64").
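
For reference, the kind of declaration under discussion can be written as a small C11 sketch. The declaration of foo is the one quoted from the PR; foo_is_suitably_aligned is a hypothetical helper added only so that the result can be checked.

```c
#include <stdint.h>

/* The declaration from the PR: asking for the type's own alignment
   does not reduce it, so it has to be accepted.  */
_Alignas (long long) long long foo;

/* Hypothetical helper: nonzero if foo's address satisfies the
   alignment that _Alignof reports for long long on this target.  */
int
foo_is_suitably_aligned (void)
{
  return (uintptr_t) &foo % _Alignof (long long) == 0;
}
```

The value of _Alignof (long long) is target dependent (the thread mentions 4 with -m32), so the sketch deliberately avoids hard-coding it.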

-- 
Joseph S. Myers
jos...@codesourcery.com

[C++ Patch] PR 61083

2014-05-07 Thread Paolo Carlini


Hi,

curiously, convert_nontype_argument still has most of its error calls 
not protected by complain & tf_error. The obvious fix works for this 
SFINAE issue. Not a regression, but could be safe for the branch too?
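
The guarding pattern can be sketched in plain C as below. Only the names tf_error and tf_none echo the compiler; their values here, the converter and the counter are assumptions made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Flag names mirror the ones used in the patch; the values are assumed.  */
enum { tf_none = 0, tf_error = 1 };

static int diagnostics_emitted;   /* hypothetical counter, for illustration */

/* Hypothetical converter: a negative value stands for an invalid
   template argument.  The failure is always reported to the caller by
   returning NULL, but a diagnostic is printed only when the caller
   asked for one.  */
const int *
convert_argument (const int *value, int complain)
{
  if (*value < 0)
    {
      if (complain & tf_error)
        {
          fprintf (stderr, "%d is not a valid argument\n", *value);
          diagnostics_emitted++;
        }
      return NULL;
    }
  return value;
}

/* Number of diagnostics printed so far.  */
int
diagnostics_count (void)
{
  return diagnostics_emitted;
}
```

During SFINAE the caller would pass tf_none, so a failed conversion quietly removes the candidate instead of producing a hard error.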


Tested x86_64-linux.

Thanks,
Paolo.

//
/cp
2014-05-07  Paolo Carlini  

PR c++/61083
* pt.c (convert_nontype_argument): Protect all the error calls
with complain & tf_error.

/testsuite
2014-05-07  Paolo Carlini  

PR c++/61083
* g++.dg/cpp0x/sfinae50.C: New.
Index: cp/pt.c
===
--- cp/pt.c (revision 210180)
+++ cp/pt.c (working copy)
@@ -5812,17 +5812,18 @@ convert_nontype_argument (tree type, tree expr, ts
{
  if (VAR_P (expr))
{
- error ("%qD is not a valid template argument "
-"because %qD is a variable, not the address of "
-"a variable",
-expr, expr);
+ if (complain & tf_error)
+   error ("%qD is not a valid template argument "
+  "because %qD is a variable, not the address of "
+  "a variable", expr, expr);
  return NULL_TREE;
}
  if (POINTER_TYPE_P (expr_type))
{
- error ("%qE is not a valid template argument for %qT "
-"because it is not the address of a variable",
-expr, type);
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument for %qT "
+  "because it is not the address of a variable",
+  expr, type);
  return NULL_TREE;
}
  /* Other values, like integer constants, might be valid
@@ -5837,23 +5838,24 @@ convert_nontype_argument (tree type, tree expr, ts
  ? TREE_OPERAND (expr, 0) : expr);
  if (!VAR_P (decl))
{
- error ("%qE is not a valid template argument of type %qT "
-"because %qE is not a variable",
-expr, type, decl);
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument of type %qT "
+  "because %qE is not a variable", expr, type, decl);
  return NULL_TREE;
}
  else if (cxx_dialect < cxx11 && !DECL_EXTERNAL_LINKAGE_P (decl))
{
- error ("%qE is not a valid template argument of type %qT "
-"because %qD does not have external linkage",
-expr, type, decl);
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument of type %qT "
+  "because %qD does not have external linkage",
+  expr, type, decl);
  return NULL_TREE;
}
  else if (cxx_dialect >= cxx11 && decl_linkage (decl) == lk_none)
{
- error ("%qE is not a valid template argument of type %qT "
-"because %qD has no linkage",
-expr, type, decl);
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument of type %qT "
+  "because %qD has no linkage", expr, type, decl);
  return NULL_TREE;
}
}
@@ -5881,15 +5883,17 @@ convert_nontype_argument (tree type, tree expr, ts
 
   if (!at_least_as_qualified_p (TREE_TYPE (type), expr_type))
{
- error ("%qE is not a valid template argument for type %qT "
-"because of conflicts in cv-qualification", expr, type);
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument for type %qT "
+  "because of conflicts in cv-qualification", expr, type);
  return NULL_TREE;
}
 
   if (!real_lvalue_p (expr))
{
- error ("%qE is not a valid template argument for type %qT "
-"because it is not an lvalue", expr, type);
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument for type %qT "
+  "because it is not an lvalue", expr, type);
  return NULL_TREE;
}
 
@@ -5905,9 +5909,10 @@ convert_nontype_argument (tree type, tree expr, ts
  expr = TREE_OPERAND (expr, 0);
  if (DECL_P (expr))
{
- error ("%q#D is not a valid template argument for type %qT "
-"because a reference variable does not have a constant "
-"address", expr, type);
+ if (complain & tf_error)
+   error ("%q#D is not a valid template argument for type %qT "
+  "because a reference variable does not have a constant "
+  "address", expr, type);
  return NULL_TREE;
}

RE: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Herman, Andrei

Thanks for the suggestion.
The current patch includes the following text added in gcc/doc/invoke.texi:

@item -fforce-dwarf-lexical-blocks
Produce debug information (a DW_TAG_lexical_block) for every function
body, loop body, switch body, case statement, if-then and if-else statement,
even if the body is a single statement.  Likewise, a lexical block will be
emitted for the first label of a statement.  This block ends at the end of the
current lexical scope, or when a break, continue, goto or return statement is
encountered at the same lexical scope level.
This option is available when using DWARF Version 4 or higher.
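
Going only by the text above, the constructs in question would look like the following C sketch. The comments mark where, on my reading of the description, a lexical block would be emitted; classify is a hypothetical function and nothing here has been checked against the patch itself.

```c
/* Every body below is a single statement, the case the option text
   calls out explicitly.  */
int
classify (int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)   /* loop body */
    sum += i;

  if (sum > 10)                 /* if-then statement */
    sum -= 10;
  else                          /* if-else statement */
    sum += 1;

  switch (sum & 1)              /* switch body */
    {
    case 0:                     /* case statement, ended by the return */
      return sum;
    default:
      return -sum;
    }
}
```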

I can add the suggested sentence at the beginning of the description, to save 
time for users not interested in the more detailed explanation.

Regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch 

> -Original Message-
> From: Mike Stump [mailto:mikest...@comcast.net]
> Sent: Wednesday, May 07, 2014 7:00 PM
> To: Herman, Andrei
> Cc: gcc-patches@gcc.gnu.org; herman_and...@mentor.com
> Subject: Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line
> option
> 
> On May 7, 2014, at 2:32 AM, Herman, Andrei
>  wrote:
> > However, code coverage tools that process the DWARF debug information
> > to implement block/path coverage need more complete lexical block
> information.
> 
> So, it would be nice to give a hint in the actual documentation, why a user
> might use the flag, or for a maintainer to be able to predict exactly what
> was desired in some obscure corner of dwarf semantics given the
> documentation.  I think it can be as simple as "This option is useful for code
> coverage tools that utilize the dwarf debug information."  A user, upon
> seeing that, would then ask, do I have such a tool, say no, and then know
> they don't have to contemplate the goodness of the option further.  If one
> is writing a coverage tool, upon seeing the documentation, they might then
> ask themselves, how might I use that flag profitably for my users.

Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Andrew Pinski

On Wed, May 7, 2014 at 10:19 AM, Herman, Andrei
 wrote:
> Thanks for the suggestion.
> The current patch includes the following text added in gcc/doc/invoke.texi:
>
> @item -fforce-dwarf-lexical-blocks
> Produce debug information (a DW_TAG_lexical_block) for every function
> body, loop body, switch body, case statement, if-then and if-else statement,
> even if the body is a single statement.  Likewise, a lexical block will be
> emitted for the first label of a statement.  This block ends at the end of the
> current lexical scope, or when a break, continue, goto or return statement is
> encountered at the same lexical scope level.
> This option is available when using DWARF Version 4 or higher.
>
> I can add the suggested sentence at the beginning of the description, to save 
> time for users not interested in the more detailed explanation.

Also be explicit that the option only applies to C/C++ code in the
documentation.

Thanks,
Andrew Pinski

>
> Regards,
> Andrei Herman
> Mentor Graphics Corporation
> Israel branch
>
>
>> -Original Message-
>> From: Mike Stump [mailto:mikest...@comcast.net]
>> Sent: Wednesday, May 07, 2014 7:00 PM
>> To: Herman, Andrei
>> Cc: gcc-patches@gcc.gnu.org; herman_and...@mentor.com
>> Subject: Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line
>> option
>>
>> On May 7, 2014, at 2:32 AM, Herman, Andrei
>>  wrote:
>> > However, code coverage tools that process the DWARF debug information
>> > to implement block/path coverage need more complete lexical block
>> information.
>>
>> So, it would be nice to give a hint in the actual documentation, why a user
>> might use the flag, or for a maintainer to be able to predict exactly what
>> was desired in some obscure corner of dwarf semantics given the
>> documentation.  I think it can be as simple as "This option is useful for code
>> coverage tools that utilize the dwarf debug information."  A user, upon
>> seeing that, would then ask, do I have such a tool, say no, and then know
>> they don't have to contemplate the goodness of the option further.  If one
>> is writing a coverage tool, upon seeing the documentation, they might then
>> ask themselves, how might I use that flag profitably for my users.

Re: [C PATCH] Warn about variadic main (PR c/60156)

2014-05-07 Thread Joseph S. Myers

On Tue, 6 May 2014, Marek Polacek wrote:

> On Thu, May 01, 2014 at 11:37:58PM +, Joseph S. Myers wrote:
> > As a matter of QoI we should also diagnose use of _Atomic in the return 
> > type or argument types of main (something I deferred doing in the initial 
> > _Atomic support).
> 
> Ok, I opened PR61077 and I'm taking it.  But I wonder if I should
> diagnose if the second parameter is e.g.:
> _Atomic char **argv;
> char *_Atomic *argv;

Yes, those should be diagnosed (remember that _Atomic char is allowed to 
be bigger than char, so those certainly aren't reasonable types for 
arguments to main).
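
A minimal C11 sketch of the two parameter types asked about, plus a way to compare the sizes involved. The typedef and function names are hypothetical, and the only property relied on is that the atomic type is never smaller than char.

```c
#include <stddef.h>

/* The two spellings from the question, as typedefs: a pointer to
   pointer to atomic char, and a pointer to atomic pointer to char.
   Neither is the same type as char **.  */
typedef _Atomic char **argv_of_atomic_char;
typedef char *_Atomic *argv_of_atomic_pointer;

/* Size of plain char (always 1).  */
size_t
plain_char_size (void)
{
  return sizeof (char);
}

/* Size of the atomic variant; an implementation may make it larger.  */
size_t
atomic_char_size (void)
{
  return sizeof (_Atomic char);
}
```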

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Joseph S. Myers

On Wed, 7 May 2014, Herman, Andrei wrote:

> When this flag is set, a DW_TAG_lexical_block DIE will be emitted for every
> function body, loop body, switch body, case statement, if-then and if-else
> statement, even if the body is a single statement. 
> Likewise, a lexical block will be emitted for the first label of a labeled
> statement. This block ends at the end of the current lexical scope, or when
> a break, continue, goto or return statement is encountered at the same lexical
> scope level. 
> Consequently, any case in a switch statement that does not flow through to 
> the next case, will have its own dwarf lexical block.

The documentation appears to suggest it's purely about debug info and has 
no effect on language semantics.  However, the implementation appears to 
force C99 scoping rules.  I don't think it's appropriate for a debug info 
option to have that effect; that is, gcc.dg/c90-scope-1.c should still 
pass even with the option enabled (more generally, the whole C testsuite 
should be verified to work with the option enabled).  I suspect the 
changes adding scopes for labels would also affect language semantics; 
it's valid in C to have a declaration (not having variably modified type) 
after one case in a switch statement that gets used in another case even 
when control does not flow through.

If you can't avoid affecting language semantics then you need to be very 
clear in the documentation that the option makes some invalid programs 
valid and vice versa and changes the semantics of some valid programs 
(even if you then assert the affected cases are uncommon in real C code).
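
A sketch of the switch case described above, written as C99 so the declaration can follow the first case label. The function pick is hypothetical; the point is only that tmp is declared after one case and used in another that is reached by jumping past the declaration.

```c
int
pick (int selector)
{
  switch (selector)
    {
    case 1:
      ;                 /* a label needs a statement here, not a declaration */
      int tmp;          /* declared after the first case...  */
      tmp = 10;
      return tmp;
    case 2:
      tmp = 20;         /* ...and still in scope here, although control
                           jumped past the declaration */
      return tmp;
    default:
      return 0;
    }
}
```

If a new scope were opened at case 1 and closed at its return, tmp would no longer be visible at case 2, which is how a scoping change made for debug info could alter whether the program is valid.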

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

2014-05-07 Thread Mike Stump

On May 7, 2014, at 10:19 AM, Herman, Andrei wrote:
> Thanks for the suggestion.

> I can add the suggested sentence at the beginning of the description, to save 
> time for users not interested in the more detailed explanation.

I’d put it at the end…  I think the description you have is more important.

[committed] PR 61095: tsan fallout from wide-int merge

2014-05-07 Thread Richard Sandiford

This PR was due to code in which -(int) foo was supposed to be sign-extended,
but was being ORed with an unsigned int and so ended up being zero-extended.
Fixed by using the proper-width type.

Tested on x86_64-linux-gnu and applied as obvious.  Sorry for the breakage.

Thanks,
Richard


gcc/
PR tree-optimization/61095
* tree-ssanames.c (get_nonzero_bits): Fix type extension in wi::shwi.

Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c 2014-05-07 16:50:15.136064484 +0100
+++ gcc/tree-ssanames.c 2014-05-07 16:50:15.422063737 +0100
@@ -271,7 +271,8 @@ get_nonzero_bits (const_tree name)
 {
   struct ptr_info_def *pi = SSA_NAME_PTR_INFO (name);
   if (pi && pi->align)
-   return wi::shwi (-(int) pi->align | pi->misalign, precision);
+   return wi::shwi (-(HOST_WIDE_INT) pi->align
+| (HOST_WIDE_INT) pi->misalign, precision);
   return wi::shwi (-1, precision);
 }
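
The extension problem can be reproduced outside the compiler with a short C sketch. It assumes a 32-bit int and a 64-bit long long standing in for HOST_WIDE_INT; the function names are hypothetical.

```c
/* Stand-in for HOST_WIDE_INT; assumed to be wider than int.  */
typedef long long hwi;

/* Old form: -(int) align has type int, but OR-ing it with an unsigned
   int converts it to unsigned int first, so the widening on return
   zero-extends instead of sign-extending.  */
hwi
mask_buggy (unsigned int align, unsigned int misalign)
{
  return -(int) align | misalign;
}

/* Fixed form: both operands are widened before the negation and the
   OR, so the high bits of the result are all ones.  */
hwi
mask_fixed (unsigned int align, unsigned int misalign)
{
  return -(hwi) align | (hwi) misalign;
}
```

Under those assumptions, an alignment of 8 with no misalignment gives a positive 32-bit pattern from the first function and -8 from the second.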

Re: [C PATCH] Don't reject valid code with _Alignas (PR c/61053)

2014-05-07 Thread H.J. Lu

On Wed, May 7, 2014 at 10:15 AM, Joseph S. Myers
 wrote:
> On Mon, 5 May 2014, Marek Polacek wrote:
>
>> In this PR the issue is that we reject (valid) code such as
>> _Alignas (long long) long long foo;
>> with -m32, because we trip this condition:
>>
>>alignas_align = 1U << declspecs->align_log;
>>if (alignas_align < TYPE_ALIGN_UNIT (type))
>>  {
>>if (name)
>>  error_at (loc, "%<_Alignas%> specifiers cannot reduce "
>>"alignment of %qE", name);
>>
>> and error later on, since alignas_align is 4 (correct, see PR52023 for
>> why), but TYPE_ALIGN_UNIT of long long is 8.  I think TYPE_ALIGN_UNIT
>> is wrong here as that won't give us minimal alignment required.
>> In c_sizeof_or_alignof_type we already have the code to compute such
>> minimal alignment so I just moved the code to a separate function
>> and used that instead of TYPE_ALIGN_UNIT.
>>
>> Note that the test is run only on i?86 and x86_64, because we can't (?)
>> easily determine which target requires what alignment.
>>
>> Regtested/bootstrapped on x86_64-unknown-linux-gnu and
>> powerpc64-unknown-linux-gnu, ok for trunk?
>
> OK, though I'm not sure if the "lp64" conditions are right in the testcase

It should be !ia32 instead of lp64.

> (i.e. if x32 has the same peculiarity as -m32 here, which is what's
> implied by the use of "lp64").
>

Alignments of long long and long double on x32 are the same as x86-64.

-- 
H.J.

Re: [C++ Patch] PR 61083

2014-05-07 Thread Jason Merrill


On 05/07/2014 01:15 PM, Paolo Carlini wrote:

curiously, convert_nontype_argument still has most of its error calls
not protected by complain & tf_error. The obvious fix works for this
SFINAE issue. Not a regression, but could be safe for the branch too?


Sure, OK for trunk and 4.9.

Jason

[patch libgcc]: Fix PR c++/57440

2014-05-07 Thread Kai Tietz

Hi,

this patch adds the define _GTHREAD_USE_MUTEX_INIT_FUNC for Windows
targets.  It is necessary because the pthread emulation for those
targets only handles pthread_mutex_init and pthread_mutex_destroy
properly.
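
As an illustration of what initializing through functions means at the pthread level, here is a small C sketch that avoids the static initializer entirely. It is not the libgcc code; bump_counter_with_dynamic_mutex and the variables are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock;   /* deliberately no PTHREAD_MUTEX_INITIALIZER */
static int counter;

/* Initialize the mutex with pthread_mutex_init, use it once and tear
   it down again with pthread_mutex_destroy.  Returns the new counter
   value, or -1 if initialization or destruction fails.  */
int
bump_counter_with_dynamic_mutex (void)
{
  if (pthread_mutex_init (&lock, NULL) != 0)
    return -1;

  pthread_mutex_lock (&lock);
  int value = ++counter;
  pthread_mutex_unlock (&lock);

  if (pthread_mutex_destroy (&lock) != 0)
    return -1;
  return value;
}
```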

ChangeLog libgcc

2014-05-07  Kai Tietz  

PR c++/57440
* gthr-posix.h (_GTHREAD_USE_MUTEX_INIT_FUNC): Define for native windows
targets.

The patch has already passed regression testing on x86_64-unknown-linux-gnu.
The test for i686-w64-mingw32 (with the posix threading model) is still
running.  OK to apply this patch after the last test passes?

Regards,
Kai



Index: gthr-posix.h
===
--- gthr-posix.h (revision 210070)
+++ gthr-posix.h (working copy)
@@ -34,6 +34,10 @@ see the files COPYING3 and COPYING.RUNTIME respect

 #include 

+#if defined (_WIN32) && !defined (__CYGWIN__)
+#define _GTHREAD_USE_MUTEX_INIT_FUNC 1
+#endif
+
 #if ((defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)) \
  || !defined(_GTHREAD_USE_MUTEX_TIMEDLOCK))
 # include

Re: [PATCH, MIPS] Alter default number of single-precision registers

2014-05-07 Thread Richard Sandiford

Matthew Fortune  writes:
> diff --git a/gcc/testsuite/gcc.target/mips/oddspreg-6.c 
> b/gcc/testsuite/gcc.target/mips/oddspreg-6.c
> new file mode 100644
> index 000..2d1b129
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/oddspreg-6.c
> @@ -0,0 +1,15 @@
> +/* Check that we disable odd-numbered single precision registers and can
> +   still generate code.  */
> +/* { dg-options "-mabi=64 -mno-odd-spreg -mhard-float" } */

"Check that we enable odd-numbered single precision registers." for this one?

OK otherwise once the copyright is sorted out, thanks.

Richard

[jit] Add a soname

2014-05-07 Thread David Malcolm

gcc/jit/
* Make-lang.in (LIBGCCJIT_LINKER_NAME): New.
(LIBGCCJIT_VERSION_NUM): New.
(LIBGCCJIT_MINOR_NUM): New.
(LIBGCCJIT_RELEASE_NUM): New.
(LIBGCCJIT_SONAME): New.
(LIBGCCJIT_FILENAME): New.
(LIBGCCJIT_LINKER_NAME_SYMLINK): New.
(LIBGCCJIT_SONAME_SYMLINK): New.
(jit): Add symlink targets.
(libgccjit.so): Convert to...
(LIBGCCJIT_FILENAME): ...and add a soname.
(jit.install-common): Install the library with a soname, and
symlinks.  Install libgccjit++.h.
---
 gcc/jit/ChangeLog.jit | 16 
 gcc/jit/Make-lang.in  | 38 +-
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index ccf8a10..f5c4742 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,19 @@
+2014-05-07  David Malcolm  
+
+   * Make-lang.in (LIBGCCJIT_LINKER_NAME): New.
+   (LIBGCCJIT_VERSION_NUM): New.
+   (LIBGCCJIT_MINOR_NUM): New.
+   (LIBGCCJIT_RELEASE_NUM): New.
+   (LIBGCCJIT_SONAME): New.
+   (LIBGCCJIT_FILENAME): New.
+   (LIBGCCJIT_LINKER_NAME_SYMLINK): New.
+   (LIBGCCJIT_SONAME_SYMLINK): New.
+   (jit): Add symlink targets.
+   (libgccjit.so): Convert to...
+   (LIBGCCJIT_FILENAME): ...and add a soname.
+   (jit.install-common): Install the library with a soname, and
+   symlinks.  Install libgccjit++.h.
+
 2014-04-25  David Malcolm  
 
* internal-api.c (gcc::jit::playback::context::compile): Put
diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
index 776ee81..ce0cdc5 100644
--- a/gcc/jit/Make-lang.in
+++ b/gcc/jit/Make-lang.in
@@ -40,7 +40,18 @@
 # into the jit rule, but that needs a little bit of work
 # to do the right thing within all.cross.
 
-jit: libgccjit.so
+LIBGCCJIT_LINKER_NAME = libgccjit.so
+LIBGCCJIT_VERSION_NUM = 0
+LIBGCCJIT_MINOR_NUM = 0
+LIBGCCJIT_RELEASE_NUM = 1
+LIBGCCJIT_SONAME = $(LIBGCCJIT_LINKER_NAME).$(LIBGCCJIT_VERSION_NUM)
+LIBGCCJIT_FILENAME = \
+  $(LIBGCCJIT_SONAME).$(LIBGCCJIT_MINOR_NUM).$(LIBGCCJIT_RELEASE_NUM)
+
+LIBGCCJIT_LINKER_NAME_SYMLINK = $(LIBGCCJIT_LINKER_NAME)
+LIBGCCJIT_SONAME_SYMLINK = $(LIBGCCJIT_SONAME)
+
+jit: $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SYMLINK) $(LIBGCCJIT_LINKER_NAME_SYMLINK)
 
 # Tell GNU make to ignore these if they exist.
 .PHONY: jit
@@ -53,14 +64,21 @@ jit-warn = $(STRICT_WARN)
 
 # We avoid using $(BACKEND) from Makefile.in in order to avoid pulling
 # in main.o
-libgccjit.so: $(jit_OBJS) \
+$(LIBGCCJIT_FILENAME): $(jit_OBJS) \
libbackend.a libcommon-target.a libcommon.a \
$(CPPLIB) $(LIBDECNUMBER) \
$(LIBDEPS) $(srcdir)/jit/libgccjit.map
+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ -shared \
 $(jit_OBJS) libbackend.a libcommon-target.a libcommon.a \
 $(CPPLIB) $(LIBDECNUMBER) $(LIBS) $(BACKENDLIBS) \
--Wl,--version-script=$(srcdir)/jit/libgccjit.map
+-Wl,--version-script=$(srcdir)/jit/libgccjit.map \
+-Wl,-soname,$(LIBGCCJIT_SONAME)
+
+$(LIBGCCJIT_SONAME_SYMLINK): $(LIBGCCJIT_FILENAME)
+   ln -sf $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SONAME_SYMLINK)
+
+$(LIBGCCJIT_LINKER_NAME_SYMLINK): $(LIBGCCJIT_SONAME_SYMLINK)
+   ln -sf $(LIBGCCJIT_SONAME_SYMLINK) $(LIBGCCJIT_LINKER_NAME_SYMLINK)
 
 #
 # Build hooks:
@@ -87,8 +105,18 @@ jit.srcman:
 #
 # Install hooks:
 jit.install-common: installdirs
-   $(INSTALL_PROGRAM) libgccjit.so $(DESTDIR)/$(libdir)/libgccjit.so
-   $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit.h $(DESTDIR)/$(includedir)/libgccjit.h
+   $(INSTALL_PROGRAM) $(LIBGCCJIT_FILENAME) \
+ $(DESTDIR)/$(libdir)/$(LIBGCCJIT_FILENAME)
+   ln -sf \
+ $(LIBGCCJIT_FILENAME) \
+ $(DESTDIR)/$(libdir)/$(LIBGCCJIT_SONAME_SYMLINK)
+   ln -sf \
+ $(LIBGCCJIT_SONAME_SYMLINK)\
+ $(DESTDIR)/$(libdir)/$(LIBGCCJIT_LINKER_NAME_SYMLINK)
+   $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit.h \
+ $(DESTDIR)/$(includedir)/libgccjit.h
+   $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit++.h \
+ $(DESTDIR)/$(includedir)/libgccjit++.h
 
 jit.install-man:
 
-- 
1.8.5.3

RE: [PATCH, MIPS] Alter default number of single-precision registers

2014-05-07 Thread Matthew Fortune

Richard Sandiford  writes:
> Matthew Fortune  writes:
> > diff --git a/gcc/testsuite/gcc.target/mips/oddspreg-6.c
> b/gcc/testsuite/gcc.target/mips/oddspreg-6.c
> > new file mode 100644
> > index 000..2d1b129
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/mips/oddspreg-6.c
> > @@ -0,0 +1,15 @@
> > +/* Check that we disable odd-numbered single precision registers and can
> > +   still generate code.  */
> > +/* { dg-options "-mabi=64 -mno-odd-spreg -mhard-float" } */
> 
> "Check that we enable odd-numbered single precision registers." for this one?

Yes.

> OK otherwise once the copyright is sorted out, thanks.
> 
> Richard

Committed: [PATCH 19/89] Const-correctness of gimple_call_builtin_p

2014-05-07 Thread David Malcolm

On Mon, 2014-04-21 at 12:56 -0400, David Malcolm wrote:
> gcc/
>   * gimple.h (gimple_builtin_call_types_compatible_p): Accept a
>   const_gimple, rather than a gimple.
>   (gimple_call_builtin_p): Likewise, for the three variants.
> 
>   * gimple.c (gimple_builtin_call_types_compatible_p): Likewise.
>   (gimple_call_builtin_p): Likewise, for the three variants.
> ---
>  gcc/gimple.c | 8 
>  gcc/gimple.h | 8 
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 13c5a08..943fa7c 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2383,7 +2383,7 @@ validate_type (tree type1, tree type2)
> a decl of a builtin function.  */
>  
>  bool
> -gimple_builtin_call_types_compatible_p (gimple stmt, tree fndecl)
> +gimple_builtin_call_types_compatible_p (const_gimple stmt, tree fndecl)
>  {
>gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN);
>  
> @@ -2412,7 +2412,7 @@ gimple_builtin_call_types_compatible_p (gimple stmt, tree fndecl)
>  /* Return true when STMT is builtins call.  */
>  
>  bool
> -gimple_call_builtin_p (gimple stmt)
> +gimple_call_builtin_p (const_gimple stmt)
>  {
>tree fndecl;
>if (is_gimple_call (stmt)
> @@ -2425,7 +2425,7 @@ gimple_call_builtin_p (gimple stmt)
>  /* Return true when STMT is builtins call to CLASS.  */
>  
>  bool
> -gimple_call_builtin_p (gimple stmt, enum built_in_class klass)
> +gimple_call_builtin_p (const_gimple stmt, enum built_in_class klass)
>  {
>tree fndecl;
>if (is_gimple_call (stmt)
> @@ -2438,7 +2438,7 @@ gimple_call_builtin_p (gimple stmt, enum built_in_class klass)
>  /* Return true when STMT is builtins call to CODE of CLASS.  */
>  
>  bool
> -gimple_call_builtin_p (gimple stmt, enum built_in_function code)
> +gimple_call_builtin_p (const_gimple stmt, enum built_in_function code)
>  {
>tree fndecl;
>if (is_gimple_call (stmt)
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index a8a8d72..62f9756 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -1458,10 +1458,10 @@ extern tree gimple_unsigned_type (tree);
>  extern tree gimple_signed_type (tree);
>  extern alias_set_type gimple_get_alias_set (tree);
>  extern bool gimple_ior_addresses_taken (bitmap, gimple);
> -extern bool gimple_builtin_call_types_compatible_p (gimple, tree);
> -extern bool gimple_call_builtin_p (gimple);
> -extern bool gimple_call_builtin_p (gimple, enum built_in_class);
> -extern bool gimple_call_builtin_p (gimple, enum built_in_function);
> +extern bool gimple_builtin_call_types_compatible_p (const_gimple, tree);
> +extern bool gimple_call_builtin_p (const_gimple);
> +extern bool gimple_call_builtin_p (const_gimple, enum built_in_class);
> +extern bool gimple_call_builtin_p (const_gimple, enum built_in_function);
>  extern bool gimple_asm_clobbers_memory_p (const_gimple);
>  extern void dump_decl_set (FILE *, bitmap);
>  extern bool nonfreeing_call_p (gimple);

Successfully bootstrapped&regtested on its own on
x86_64-unknown-linux-gnu (Fedora 20).

Committed to trunk as r210185 (this is just fixing const-correctness,
and so it falls under Jeff's preapproval for such fixes here:
  http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01240.html )
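
What this kind of const-correctness fix buys a caller can be sketched in plain C. The struct, typedefs and functions below are hypothetical stand-ins that only play the roles of gimple and const_gimple; they are not GCC's definitions.

```c
#include <stdbool.h>

struct stmt { int code; };                   /* hypothetical statement type */
typedef struct stmt *stmt_ptr;               /* plays the role of "gimple" */
typedef const struct stmt *const_stmt_ptr;   /* plays the role of "const_gimple" */

enum { STMT_OTHER = 0, STMT_CALL = 1 };

/* A predicate that only reads its argument, so it takes the const
   pointer type.  Callers holding either pointer type can use it.  */
bool
stmt_is_call (const_stmt_ptr s)
{
  return s->code == STMT_CALL;
}

/* A caller that has only read access no longer needs a cast to use
   the predicate.  */
int
count_calls (const struct stmt *stmts, int n)
{
  int calls = 0;
  for (int i = 0; i < n; i++)
    if (stmt_is_call (&stmts[i]))
      calls++;
  return calls;
}
```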

Re: [patch libgcc]: Fix PR c++/57440

2014-05-07 Thread Jonathan Wakely

On 7 May 2014 20:06, Kai Tietz wrote:
>
> PR c++/57440

N.B. that should be libstdc++/57440 in the ChangeLog

[SH, committed] PR 60884 - reduce code size of inlined strlen

2014-05-07 Thread Oleg Endo

Hi,

The attached patch reduces the code size of inlined builtin strlen
functions on SH a little bit.
Tested on r210083 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures, except for gcc.target/sh/pr53976-1.c on SH2 and
SH2A.  Using builtin strlen for checking the sett/clrt optimization pass
was a bit inappropriate in this case.

Committed as r210187.

Cheers,
Oleg

gcc/ChangeLog:
PR target/60884
* config/sh/sh-mem.cc (sh_expand_strlen): Use loop when emitting
unrolled byte insns.  Emit address increments after move insns.

gcc/testsuite/ChangeLog:
PR target/60884
* gcc.target/sh/pr53976-1.c (test_02): Remove inappropriate test case.
(test_03): Rename to test_02.
Index: gcc/testsuite/gcc.target/sh/pr53976-1.c
===
--- gcc/testsuite/gcc.target/sh/pr53976-1.c	(revision 210185)
+++ gcc/testsuite/gcc.target/sh/pr53976-1.c	(working copy)
@@ -24,15 +24,8 @@
 }
 
 int
-test_02 (const char* a)
+test_02 (int a, int b, int c, int d)
 {
-  /* Must not see a sett after the inlined strlen.  */
-  return __builtin_strlen (a);
-}
-
-int
-test_03 (int a, int b, int c, int d)
-{
   /* One of the blocks should have a sett and the other one should not.  */
   if (d > 4)
 return a + b + 1;
Index: gcc/config/sh/sh-mem.cc
===
--- gcc/config/sh/sh-mem.cc	(revision 210185)
+++ gcc/config/sh/sh-mem.cc	(working copy)
@@ -568,7 +568,7 @@
 
   addr1 = adjust_automodify_address (addr1, SImode, current_addr, 0);
 
-  /*start long loop.  */
+  /* start long loop.  */
   emit_label (L_loop_long);
 
   /* tmp1 is aligned, OK to load.  */
@@ -589,29 +589,15 @@
   addr1 = adjust_address (addr1, QImode, 0);
 
   /* unroll remaining bytes.  */
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
-  jump = emit_jump_insn (gen_branch_true (L_return));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  for (int i = 0; i < 4; ++i)
+{
+  emit_insn (gen_extendqisi2 (tmp1, addr1));
+  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+  jump = emit_jump_insn (gen_branch_true (L_return));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+}
 
-  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
-
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
-  jump = emit_jump_insn (gen_branch_true (L_return));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
-
-  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
-
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
-  jump = emit_jump_insn (gen_branch_true (L_return));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
-
-  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
-
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  jump = emit_jump_insn (gen_jump_compact (L_return));
   emit_barrier_after (jump);
 
   /* start byte loop.  */
@@ -626,10 +612,9 @@
 
   /* end loop.  */
 
-  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
-
   emit_label (L_return);
 
+  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
   emit_insn (gen_subsi3 (operands[0], current_addr, start_addr));
 
   return true;

Re: [patch libgcc]: Fix PR c++/57440

2014-05-07 Thread Kai Tietz

2014-05-07 21:41 GMT+02:00 Jonathan Wakely :
> On 7 May 2014 20:06, Kai Tietz wrote:
>>
>> PR c++/57440
>
> N.B. that should be libstdc++/57440 in the ChangeLog

Oh, yes of course.

Thanks.
Kai

RFC: Faster for_each_rtx-like iterators

2014-05-07 Thread Richard Sandiford

I noticed for_each_rtx showing up in profiles and thought I'd have a go
at using worklist-based iterators instead.  So far I have three:

  FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
  FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
  FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *

with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.

I made FOR_EACH_SUBRTX the "default" (unsuffixed) version because
most walks really don't modify the structure.  I think we should
encourage const_rtxes to be used wherever possible.  E.g. it might
make it easier to have non-GC storage for temporary rtxes in future.

I've locally replaced all for_each_rtx calls in the generic code with
these iterators and they make things reproducibly faster.  The speed-up
on full --enable-checking=release ./cc1 and ./cc1plus times is only about 1%,
but maybe that's enough to justify the churn.

Implementation-wise, the main observation is that most subrtxes are part
of a single contiguous sequence of "e" fields.  E.g. when compiling an
oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
subrtxes of 7,636,542 rtxes.  Of those:

(A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
(B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
  no "E" fields, and
(C)    43,532 ( 0.6%) are more complicated.

(A) is really a special case of (B) in which the block has zero length.
Those are the only two cases that really need to be handled inline.
The implementation does this by having a mapping from an rtx code to the
bounds of its "e" sequence, in the form of a start index and count.
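
As a rough illustration of such a code->bounds mapping (a standalone
sketch, not the code from the patch): the helper below derives a start
index and count from an rtx format string.  The names e_bounds and
compute_e_bounds are hypothetical, and the format strings used with it
are only illustrative.  Anything with an "E" field or with more than one
block of "e" fields is given a count of UCHAR_MAX so that it can be
handled out of line.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical record describing the single contiguous block of "e"
// operands in an rtx format string.
struct e_bounds
{
  unsigned char start;  // index of the first "e" operand
  unsigned char count;  // number of "e" operands, UCHAR_MAX if complicated
};

// Compute the bounds for one format string.  Formats that contain an "E"
// (vector) field, or more than one separate run of "e" fields, are marked
// with a count of UCHAR_MAX.
inline e_bounds
compute_e_bounds (const char *format)
{
  e_bounds b = { 0, 0 };
  for (std::size_t i = 0; format[i] != '\0'; ++i)
    {
      if (format[i] == 'E')
        return e_bounds { 0, UCHAR_MAX };
      if (format[i] != 'e')
        continue;
      if (b.count == 0)
        b.start = static_cast<unsigned char> (i);
      else if (format[i - 1] != 'e')
        return e_bounds { 0, UCHAR_MAX };  // a second, separate block
      ++b.count;
    }
  return b;
}
```

With this sketch a leaf format such as "w" gets a count of zero, which is
how case (A) falls out as a special case of (B).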

Out of (C), the vast majority (43,509) are PARALLELs.  However, as you'd
probably expect, bloating the inline code with that case made things
slower rather than faster.

The vast majority (in fact all in the combine.ii run above) of iterations
can be done with a 16-element stack worklist.  We obviously still need a
heap fallback for the pathological cases though.

I spent a bit of time trying different iterator implementations and
seeing which produced the best code.  Specific results from that were:

- The storage used for the worklist is separate from the iterator,
  in order to avoid capturing iterator fields.

- Although the natural type of the storage would be auto_vec <..., 16>,
  that produced some overhead compared with a separate stack array and heap
  vector pointer.  With the heap vector pointer, the only overhead is an
  assignment in the constructor and an "if (x) release (x)"-style sequence
  in the destructor.  I think the extra complication over auto_vec is worth
  it because in this case the heap version is so very rarely needed.

- Several existing for_each_rtx callbacks have something like:

if (GET_CODE (x) == CONST)
  return -1;

  or:

if (CONSTANT_P (x))
  return -1;

  to avoid walking subrtxes of constants.  That can be done without
  extra code checks and branches by having a separate code->bound
  mapping in which all constants are treated as leaf rtxes.  This usage
  should be common enough to outweigh the cache penalty of two arrays.

  The choice between iterating over constants or not is given in the
  final parameter of the FOR_EACH_* iterator.

- The maximum number of fields in (B)-type rtxes is 3.  We get better
  code by making that explicit rather than having a general loop.

- (C) codes map to an "e" count of UCHAR_MAX, so we can use a single
  check to test for that and for cases where the stack worklist is
  too small.
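
The stack-array-plus-heap-pointer arrangement described above can be
sketched roughly as follows.  This is a toy, self-contained walk over a
made-up node type, not the iterator from the patch; node,
worklist_storage and preorder_values are hypothetical names, and only
the 16-element stack size is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an rtx: a value plus up to three operands.
struct node
{
  int value;
  const node *ops[3];
  unsigned char num_ops;
};

// Size of the on-stack part of the worklist.
const std::size_t LOCAL_ELEMS = 16;

// Worklist storage, kept separate from the walk itself.  The heap vector
// pointer stays null unless the stack array overflows.
struct worklist_storage
{
  const node *stack[LOCAL_ELEMS];
  std::vector<const node *> *heap;

  worklist_storage () : heap (nullptr) {}
  ~worklist_storage () { if (heap) delete heap; }
  worklist_storage (const worklist_storage &) = delete;
  worklist_storage &operator= (const worklist_storage &) = delete;
};

// Return the values of ROOT and all of its sub-nodes in preorder.
inline std::vector<int>
preorder_values (const node *root, worklist_storage &storage)
{
  std::vector<int> values;
  std::size_t local_end = 0;
  storage.stack[local_end++] = root;
  for (;;)
    {
      const node *x;
      if (storage.heap && !storage.heap->empty ())
        {
          // Overflow entries are always newer than the stack entries,
          // because the stack array stays full while the heap is in use.
          x = storage.heap->back ();
          storage.heap->pop_back ();
        }
      else if (local_end > 0)
        x = storage.stack[--local_end];
      else
        break;

      values.push_back (x->value);

      // Push operands in reverse so that operand 0 is visited first.
      for (unsigned int i = x->num_ops; i-- > 0;)
        if (local_end < LOCAL_ELEMS)
          storage.stack[local_end++] = x->ops[i];
        else
          {
            if (!storage.heap)
              storage.heap = new std::vector<const node *> ();
            storage.heap->push_back (x->ops[i]);
          }
    }
  return values;
}
```

In the common case the heap pointer is never touched, so the only cost
beyond the stack array is the null assignment in the constructor and the
check in the destructor.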

To give an example:

/* Callback for for_each_rtx, that returns 1 upon encountering a VALUE
   whose UID is greater than the int uid that D points to.  */

static int
refs_newer_value_cb (rtx *x, void *d)
{
  if (GET_CODE (*x) == VALUE && CSELIB_VAL_PTR (*x)->uid > *(int *)d)
    return 1;

  return 0;
}

/* Return TRUE if EXPR refers to a VALUE whose uid is greater than
   that of V.  */

static bool
refs_newer_value_p (rtx expr, rtx v)
{
  int minuid = CSELIB_VAL_PTR (v)->uid;

  return for_each_rtx (&expr, refs_newer_value_cb, &minuid);
}

becomes:

/* Return TRUE if EXPR refers to a VALUE whose uid is greater than
   that of V.  */

static bool
refs_newer_value_p (const_rtx expr, rtx v)
{
  int minuid = CSELIB_VAL_PTR (v)->uid;
  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, expr, NONCONST)
    if (GET_CODE (*iter) == VALUE && CSELIB_VAL_PTR (*iter)->uid > minuid)
      return true;
  return false;
}

The iterator also allows subrtxes of a specific rtx to be skipped;
this is the equivalent of returning -1 from a for_each_rtx callback.
It also allows the current rtx to be replaced in the worklist by
another.  E.g.:

static void
mark_constants_in_pattern (rtx insn)
{
  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
{
  const_rtx x = *iter;
  if (GET_CODE (x) == SYMBOL_REF)
{
  if (CONSTAN

genattrtab error reporting

2014-05-07 Thread Mike Stump

genattrtab loses track of which file the given rtl came from during error 
reporting.  A port that uses multiple .md files for the port will tend to list 
the last .md file processed instead of the correct md file.  We preserve the 
filename upon read, and during post processing, we reset the filename to the 
right context, as we process that context.
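
The save-on-read, restore-on-processing idea can be sketched in
isolation like this.  It is a minimal standalone illustration, not the
genattrtab code: current_md_filename stands in for read_md_filename, and
definition, record_definition and format_error are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the reader's global "file currently being read" state.
static const char *current_md_filename = "";

// Hypothetical per-definition record that remembers where it came from.
struct definition
{
  const char *filename;
  int lineno;
};

static std::vector<definition> definitions;

// Called while reading: capture the file that is current right now.
inline void
record_definition (int lineno)
{
  definitions.push_back (definition { current_md_filename, lineno });
}

// Called during post-processing: restore the saved file before building
// the diagnostic, so the message names the file the definition came from.
inline std::string
format_error (const definition &def, const std::string &message)
{
  current_md_filename = def.filename;
  return std::string (current_md_filename) + ":"
         + std::to_string (def.lineno) + ": " + message;
}
```

Without the restore step, every message built after reading finished
would name whichever file happened to be read last.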

Ok?

2014-05-07  Mike Stump  

* genattrtab.c (struct insn_def): Add filename.
(convert_set_attr_alternative): Improve error message.
(check_defs): Ensure read_md_filename is set appropriately.
(gen_insn): Save read_md_filename.

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 99b1b83..0f14b4d 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -139,6 +139,7 @@ struct insn_def
   rtx def; /* The DEFINE_...  */
   int insn_code;   /* Instruction number.  */
   int insn_index;  /* Expression number in file, for errors.  */
+  const char *filename;/* Filename.  */
   int lineno;  /* Line number.  */
   int num_alternatives;/* Number of alternatives.  */
   int vec_idx; /* Index of attribute vector in `def'.  */
@@ -1066,7 +1067,8 @@ convert_set_attr_alternative (rtx exp, struct insn_def *id)
   if (XVECLEN (exp, 1) != num_alt)
 {
   error_with_line (id->lineno,
-  "bad number of entries in SET_ATTR_ALTERNATIVE");
+  "bad number of entries in SET_ATTR_ALTERNATIVE, was %d expected %d",
+  XVECLEN (exp, 1), num_alt);
   return NULL_RTX;
 }
 
@@ -1137,6 +1139,7 @@ check_defs (void)
   if (XVEC (id->def, id->vec_idx) == NULL)
continue;
 
+  read_md_filename = id->filename;
   for (i = 0; i < XVECLEN (id->def, id->vec_idx); i++)
{
  value = XVECEXP (id->def, id->vec_idx, i);
@@ -3280,6 +3283,7 @@ gen_insn (rtx exp, int lineno)
   id->next = defs;
   defs = id;
   id->def = exp;
+  id->filename = read_md_filename;
   id->lineno = lineno;
 
   switch (GET_CODE (exp))

Re: RFC: Faster for_each_rtx-like iterators

2014-05-07 Thread Mike Stump

On May 7, 2014, at 1:52 PM, Richard Sandiford wrote:
> 
> I've locally replaced all for_each_rtx calls in the generic code with
> these iterators and they make things reproducibly faster.  The speed-up
> on full --enable-checking=release ./cc1 and ./cc1plus times is only about 1%,
> but maybe that's enough to justify the churn.

100 1% fixes would make the compiler 100% faster.  :-)  I think 1% is actually 
a really good improvement.  If you have times for -O0, that would be 
interesting to see what they are.

Re: [PATCH] AutoFDO patch for trunk

2014-05-07 Thread Xinliang David Li

Have you announced the autofdo profile tool to the gcc list?

David

On Wed, May 7, 2014 at 2:24 PM, Dehao Chen  wrote:
> Hi,
>
> I'm planning to port the AutoFDO patch upstream. Attached is the
> prepared patch. You can also find the patch in
> http://codereview.appspot.com/99010043
>
> I've tested the patch with SPECCPU2006. For the CINT2006 benchmarks,
> the speedup comparison between O2, FDO and AutoFDO is as follows:
>
> Reference: o2
> (1): auto_fdo
> (2): fdo
>
> Benchmark                         Base:Reference      (1)      (2)
> ------------------------------------------------------------------
> spec/2006/int/C++/471.omnetpp              23.18   +3.11%   +5.09%
> spec/2006/int/C++/473.astar                21.15   +6.79%   +9.80%
> spec/2006/int/C++/483.xalancbmk            36.68  +11.56%  +14.47%
> spec/2006/int/C/400.perlbench              34.57   +6.59%  +18.56%
> spec/2006/int/C/401.bzip2                  23.17   +0.95%   +2.49%
> spec/2006/int/C/403.gcc                    32.33   +8.27%   +9.76%
> spec/2006/int/C/429.mcf                    42.13   +4.72%   +5.23%
> spec/2006/int/C/445.gobmk                  26.53   -1.39%   +0.05%
> spec/2006/int/C/456.hmmer                  23.72   +7.12%   +7.87%
> spec/2006/int/C/458.sjeng                  26.17   +4.65%   +6.04%
> spec/2006/int/C/462.libquantum             57.23   +4.04%   +1.42%
> spec/2006/int/C/464.h264ref                 46.3   +1.07%   +8.97%
>
> geometric mean                                     +4.73%   +7.36%
>
> The majority of the performance difference between AutoFDO and FDO
> comes from the lack of instruction level discriminator support. Cary
> Coutant is planning to port that patch upstream too.
>
> Please let me know if you have any question about this patch, and
> thanks in advance for reviewing such a huge patch.
>
> Dehao
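
For anyone wanting to re-check the summary row of the quoted table, here
is a small sketch that recomputes the geometric means from the
per-benchmark percentages.  It assumes the mean is taken over the
speedup ratios (1 + p/100), which is the usual convention but is not
spelled out in the mail; geomean_speedup_percent is a hypothetical
helper name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of percentage speedups: average the logarithms of the
// speedup ratios, then convert back to a percentage.
inline double
geomean_speedup_percent (const std::vector<double> &percent)
{
  double log_sum = 0.0;
  for (double p : percent)
    log_sum += std::log (1.0 + p / 100.0);
  return (std::exp (log_sum / percent.size ()) - 1.0) * 100.0;
}

// Per-benchmark figures copied from the quoted table, in table order.
const std::vector<double> auto_fdo_percent
  = { 3.11, 6.79, 11.56, 6.59, 0.95, 8.27, 4.72, -1.39, 7.12, 4.65,
      4.04, 1.07 };
const std::vector<double> fdo_percent
  = { 5.09, 9.80, 14.47, 18.56, 2.49, 9.76, 5.23, 0.05, 7.87, 6.04,
      1.42, 8.97 };
```

Under that assumption both columns should come out close to the reported
geometric means.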

libgo patch committed: Define CLONE flags in syscall package

2014-05-07 Thread Ian Lance Taylor

Dominik Vogt pointed out that the gccgo syscall package does not define
the CLONE flags.  This patch defines them.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.9
branch.

Ian

diff -r c8ae29f0c4c6 libgo/configure.ac
--- a/libgo/configure.ac	Tue May 06 12:23:00 2014 -0700
+++ b/libgo/configure.ac	Wed May 07 14:40:49 2014 -0700
@@ -475,7 +475,7 @@
   ;;
 esac
 
-AC_CHECK_HEADERS(sys/file.h sys/mman.h syscall.h sys/epoll.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h)
+AC_CHECK_HEADERS(sched.h sys/file.h sys/mman.h syscall.h sys/epoll.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h)
 
 AC_CHECK_HEADERS([linux/filter.h linux/if_addr.h linux/if_ether.h linux/if_tun.h linux/netlink.h linux/rtnetlink.h], [], [],
 [#ifdef HAVE_SYS_SOCKET_H
diff -r c8ae29f0c4c6 libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh	Tue May 06 12:23:00 2014 -0700
+++ b/libgo/mksysinfo.sh	Wed May 07 14:40:49 2014 -0700
@@ -163,6 +163,9 @@
 #if defined(HAVE_NETINET_ICMP6_H)
 #include <netinet/icmp6.h>
 #endif
+#if defined(HAVE_SCHED_H)
+#include <sched.h>
+#endif
 
 /* Constants that may only be defined as expressions on some systems,
expressions too complex for -fdump-go-spec to handle.  These are
@@ -1130,6 +1133,10 @@
   -e 's/\[0\]byte/[0]int8/' \
 >> ${OUT}
 
+# The GNU/Linux CLONE flags.
+grep '^const _CLONE_' gen-sysinfo.go | \
+  sed -e 's/^\(const \)_\(CLONE_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
+
 # The Solaris 11 Update 1 _zone_net_addr_t struct.
 grep '^type _zone_net_addr_t ' gen-sysinfo.go | \
 sed -e 's/_in6_addr/[16]byte/' \
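
To see what the new sed expression for the CLONE flags does to a single
line, here is an equivalent rewrite using std::regex.  This is only an
illustration of the substitution, not part of the patch;
export_clone_constant is a hypothetical name and the sample values in
the usage note are illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rewrite "const _CLONE_X = <value>" as "const CLONE_X = _CLONE_X",
// mirroring the sed substitution.  Lines that do not match are returned
// unchanged (in the script, grep filters those out beforehand).
inline std::string
export_clone_constant (const std::string &line)
{
  static const std::regex pattern ("^(const )_(CLONE_[^= ]*)(.*)$");
  return std::regex_replace (line, pattern, "$1$2 = _$2");
}
```

For example, "const _CLONE_VM = 0x100" becomes
"const CLONE_VM = _CLONE_VM", so the exported name aliases the
underscore-prefixed one rather than repeating its value.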

libgo patch committed: Define more TIOC constants

2014-05-07 Thread Ian Lance Taylor

This patch to libgo defines more TIOC constants, constants that are
non-trivial constants on GNU/Linux systems.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.9
branch.

Ian

diff -r bbf6c7c22954 libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh	Wed May 07 14:42:39 2014 -0700
+++ b/libgo/mksysinfo.sh	Wed May 07 14:58:48 2014 -0700
@@ -180,6 +180,18 @@
 #ifdef TIOCSCTTY
   TIOCSCTTY_val = TIOCSCTTY,
 #endif
+#ifdef TIOCGPTN
+  TIOCGPTN_val = TIOCGPTN,
+#endif
+#ifdef TIOCSPTLCK
+  TIOCSPTLCK_val = TIOCSPTLCK,
+#endif
+#ifdef TIOCGDEV
+  TIOCGDEV_val = TIOCGDEV,
+#endif
+#ifdef TIOCSIG
+  TIOCSIG_val = TIOCSIG,
+#endif
 };
 EOF
 
@@ -778,6 +790,26 @@
 echo 'const TIOCSCTTY = _TIOCSCTTY_val' >> ${OUT}
   fi
 fi
+if ! grep '^const TIOCGPTN' ${OUT} >/dev/null 2>&1; then
+  if grep '^const _TIOCGPTN_val' ${OUT} >/dev/null 2>&1; then
+echo 'const TIOCGPTN = _TIOCGPTN_val' >> ${OUT}
+  fi
+fi
+if ! grep '^const TIOCSPTLCK' ${OUT} >/dev/null 2>&1; then
+  if grep '^const _TIOCSPTLCK_val' ${OUT} >/dev/null 2>&1; then
+echo 'const TIOCSPTLCK = _TIOCSPTLCK_val' >> ${OUT}
+  fi
+fi
+if ! grep '^const TIOCGDEV' ${OUT} >/dev/null 2>&1; then
+  if grep '^const _TIOCGDEV_val' ${OUT} >/dev/null 2>&1; then
+echo 'const TIOCGDEV = _TIOCGDEV_val' >> ${OUT}
+  fi
+fi
+if ! grep '^const TIOCSIG' ${OUT} >/dev/null 2>&1; then
+  if grep '^const _TIOCSIG_val' ${OUT} >/dev/null 2>&1; then
+echo 'const TIOCSIG = _TIOCSIG_val' >> ${OUT}
+  fi
+fi
 
 # The ioctl flags for terminal control
 grep '^const _TC[GS]ET' gen-sysinfo.go | \

AutoFDO profile toolchain is open-sourced

2014-05-07 Thread Dehao Chen

We have open-sourced the AutoFDO profile toolchain at:

https://github.com/google/autofdo

For GCC developers, the most important tool is create_gcov, which
converts a sampling-based profile to a GCC-readable profile. Please refer
to the readme file
(https://raw.githubusercontent.com/google/autofdo/master/README) for
more details.

To use the profile, one needs to check out
https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
porting AutoFDO to trunk
(http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).

We have limited documentation inside the open-sourced package, and we are
planning to add more content to the wiki page
(https://github.com/google/autofdo/wiki). Feel free to send me emails
or discuss on github if you have any questions.

Cheers,
Dehao

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-07 Thread Wei Mi

This is the updated patch of pr58066-3.patch.

The calls added in the templates of tls_local_dynamic_base_32 and
tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2
from moving sp setting across implicit tls calls, but those calls make
the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that
the optimization in tls_local_dynamic_32_once to convert local_dynamic
to global_dynamic mode for single tls reference cannot take effect. In
the updated patch, I remove those calls from insn templates and add
"reg:SI SP_REG" explicitly in the templates of UNSPEC_TLS_GD and
UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above,
and now the optimization in tls_local_dynamic_32_once works.

Bootstrapped OK on x86_64-linux-gnu.  Regression testing is in progress.
Is it OK if regression testing passes?

Thanks.
Wei.

ChangeLog:

gcc/
2014-05-07  Wei Mi  

* config/i386/i386.c (ix86_compute_frame_layout):
preferred_stack_boundary updated for tls expanded call.
* config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun.

gcc/testsuite/
2014-05-07  Wei Mi  

* gcc.target/i386/pr58066.c: New test.

Index: testsuite/gcc.target/i386/pr58066.c
===
--- testsuite/gcc.target/i386/pr58066.c (revision 0)
+++ testsuite/gcc.target/i386/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-fPIC -O2" } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return &ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return &ccc2;
+}
+
+/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 209979)
+++ config/i386/i386.c  (working copy)
@@ -9485,20 +9485,30 @@ ix86_compute_frame_layout (struct ix86_f
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();

-  stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
-  preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-
   /* 64-bit MS ABI seem to require stack alignment to be always 16 except for
  function prologues and leaf.  */
-  if ((TARGET_64BIT_MS_ABI && preferred_alignment < 16)
+  if ((TARGET_64BIT_MS_ABI && crtl->preferred_stack_boundary < 128)
   && (!crtl->is_leaf || cfun->calls_alloca != 0
   || ix86_current_function_calls_tls_descriptor))
 {
-  preferred_alignment = 16;
-  stack_alignment_needed = 16;
   crtl->preferred_stack_boundary = 128;
   crtl->stack_alignment_needed = 128;
 }
+  /* preferred_stack_boundary is never updated for call
+ expanded from tls descriptor. Update it here. We don't update it in
+ expand stage because according to the comments before
+ ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+ away.  */
+  else if (ix86_current_function_calls_tls_descriptor
+  && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+{
+  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
+   crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}
+
+  stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
+  preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;

   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 209979)
+++ config/i386/i386.md (working copy)
@@ -12530,7 +12530,8 @@
(unspec:SI
 [(match_operand:SI 1 "register_operand" "b")
  (match_operand 2 "tls_symbolic_operand")
- (match_operand 3 "constant_call_address_operand" "z")]
+ (match_operand 3 "constant_call_address_operand" "z")
+ (reg:SI SP_REG)]
 UNSPEC_TLS_GD))
(clobber (match_scratch:SI 4 "=d"))
(clobber (match_scratch:SI 5 "=c"))
@@ -12555,11 +12556,14 @@
 [(set (match_operand:SI 0 "register_operand")
  (unspec:SI [(match_operand:SI 2 "register_operand")
  (match_operand 1 "tls_symbolic_operand")
- (match_operand 3 "constant_call_address_operand")]
+ (match_operand 3 "constant_call_address_operand")
+ (reg:SI SP_REG)]
 UNSPEC_TLS_GD))
  (clobber (match_scratch:SI 4))
  (clobber (match_scratch:SI 5))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  ""
+  "ix86_tls_descriptor_calls_expanded_in_cfun = true;")

 (define_insn "*tls_global_dynamic_64_"
   [(set (match_operand:P 0 "register_operand" "=a")
@@ -12614,13 +12

Re: genattrtab error reporting

2014-05-07 Thread H.J. Lu

On Wed, May 7, 2014 at 2:21 PM, Mike Stump  wrote:
> genattrtab loses track of which file the given rtl came from during error 
> reporting.  A port that uses multiple .md files for the port will tend to 
> list the last .md file processed instead of the correct md file.  We preserve 
> the filename upon read, and during post processing, we reset the filename to 
> the right context, as we process that context.
>

Does this fix

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778

-- 
H.J.

Re: genattrtab error reporting

2014-05-07 Thread Mike Stump

On May 7, 2014, at 5:22 PM, H.J. Lu  wrote:
> On Wed, May 7, 2014 at 2:21 PM, Mike Stump  wrote:
>> genattrtab loses track of which file the given rtl came from during error 
>> reporting.  A port that uses multiple .md files for the port will tend to 
>> list the last .md file processed instead of the correct md file.  We 
>> preserve the filename upon read, and during post processing, we reset the 
>> filename to the right context, as we process that context.
>> 
> 
> Does this fix
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778

Only if it is applied to the tree!  :-)  Yes.

[v3] Mini-tweak to acinclude.m4

2014-05-07 Thread Paolo Carlini


Hi,

I don't think we have any reason to trigger a -Wwrite-strings warning, 
thus, barring objections, I'm going to commit the below.


Thanks,
Paolo.

///
2014-05-08  Paolo Carlini  

* acinclude.m4 ([GLIBCXX_ENABLE_C99]): Avoid -Wwrite-strings warning.
* configure: Regenerate.
Index: acinclude.m4
===
--- acinclude.m4	(revision 210183)
+++ acinclude.m4	(working copy)
@@ -1052,8 +1052,8 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
vscanf("%i", args);
vsnprintf(fmt, 0, "%i", args);
vsscanf(fmt, "%i", args);
-  }],
- [snprintf("12", 0, "%i");],
+   snprintf(fmt, 0, "%i");
+  }], [],
  [glibcxx_cv_c99_stdio=yes], [glibcxx_cv_c99_stdio=no])
   ])
   AC_MSG_RESULT($glibcxx_cv_c99_stdio)

Fix some tests for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL

2014-05-07 Thread Joseph S. Myers

Having fixed TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to apply only to
128-bit vectors, some --with-arch=bdver3 --with-cpu=bdver3
scan-assembler failures relating to that tuning remain, because of
different choices of instructions for 128-bit vectors from the choices
expected by the tests.

This patch fixes affected tests to allow the different instruction
choices seen in this case.  Tested for x86_64-linux-gnu
(--with-arch=bdver3 --with-cpu=bdver3).  OK to commit?

2014-05-07  Joseph Myers  

* gcc.target/i386/avx256-unaligned-load-2.c,
gcc.target/i386/pr49002-1.c, gcc.target/i386/pr53712.c,
gcc.target/i386/pr53907.c, gcc.target/i386/pr59539-1.c: Allow
packed-single instructions.

Index: gcc/testsuite/gcc.target/i386/pr59539-1.c
===
--- gcc/testsuite/gcc.target/i386/pr59539-1.c   (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr59539-1.c   (working copy)
@@ -13,4 +13,4 @@
   return _mm_movemask_epi8 (result);
 }
 
-/* { dg-final { scan-assembler-times "vmovdqu" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu|vmovups" 1 } } */
Index: gcc/testsuite/gcc.target/i386/pr53712.c
===
--- gcc/testsuite/gcc.target/i386/pr53712.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr53712.c (working copy)
@@ -10,4 +10,4 @@
   return __builtin_ia32_pcmpistri128 (s1chars, s2chars, 0);
 }
 
-/* { dg-final { scan-assembler-times "movdqu" 1 } } */
+/* { dg-final { scan-assembler-times "movdqu|movups" 1 } } */
Index: gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c
===
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c (working copy)
@@ -11,5 +11,5 @@
 }
 
 /* { dg-final { scan-assembler-not "(avx_loaddqu256|vmovdqu\[^\n\r]*movv32qi_internal)" } } */
-/* { dg-final { scan-assembler "(sse2_loaddqu|vmovdqu\[^\n\r]*movv16qi_internal)" } } */
+/* { dg-final { scan-assembler "(sse2_loaddqu|(vmovdqu|vmovups)\[^\n\r]*movv16qi_internal)" } } */
 /* { dg-final { scan-assembler "vinsert.128" } } */
Index: gcc/testsuite/gcc.target/i386/pr49002-1.c
===
--- gcc/testsuite/gcc.target/i386/pr49002-1.c   (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr49002-1.c   (working copy)
@@ -13,4 +13,4 @@
 
 /* Ensure we load into xmm, not ymm.  */
 /* { dg-final { scan-assembler-not "vmovapd\[\t \]*\[^,\]*,\[\t \]*%ymm" } } */
-/* { dg-final { scan-assembler "vmovapd\[\t \]*\[^,\]*,\[\t \]*%xmm" } } */
+/* { dg-final { scan-assembler "vmovap\[ds\]\[\t \]*\[^,\]*,\[\t \]*%xmm" } } */
Index: gcc/testsuite/gcc.target/i386/pr53907.c
===
--- gcc/testsuite/gcc.target/i386/pr53907.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr53907.c (working copy)
@@ -13,4 +13,4 @@
   return sz;
 }
 
-/* { dg-final { scan-assembler "movdqa" } } */
+/* { dg-final { scan-assembler "movdqa|movaps" } } */

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: genattrtab error reporting

2014-05-07 Thread Segher Boessenkool

> > Does this fix
> > 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778
> 
> Only if it is applied to the tree!  :-)  Yes.

It also is PR57062.  Thanks for fixing it!


Segher

Re: RFC: Faster for_each_rtx-like iterators

2014-05-07 Thread Trevor Saunders

On Wed, May 07, 2014 at 09:52:49PM +0100, Richard Sandiford wrote:
> I noticed for_each_rtx showing up in profiles and thought I'd have a go
> at using worklist-based iterators instead.  So far I have three:
> 
>   FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
>   FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
>   FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *
> 
> with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.
> 
> I made FOR_EACH_SUBRTX the "default" (unsuffixed) version because
> most walks really don't modify the structure.  I think we should
> encourage const_rtxes to be used wherever possible.  E.g. it might
> make it easier to have non-GC storage for temporary rtxes in future.
> 
> I've locally replaced all for_each_rtx calls in the generic code with
> these iterators and they make things reproducibly faster.  The speed-up
> on full --enable-checking=release ./cc1 and ./cc1plus times is only about 1%,
> but maybe that's enough to justify the churn.

seems pretty nice, and it seems like it'll make code a little more
readable too :)

> Implementation-wise, the main observation is that most subrtxes are part
> of a single contiguous sequence of "e" fields.  E.g. when compiling an
> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
> subrtxes of 7,636,542 rtxes.  Of those:
> 
> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
>   no "E" fields, and
> (C)43,532 (00.6%) are more complicated.
> 
> (A) is really a special case of (B) in which the block has zero length.
> Those are the only two cases that really need to be handled inline.
> The implementation does this by having a mapping from an rtx code to the
> bounds of its "e" sequence, in the form of a start index and count.
> 
> Out of (C), the vast majority (43,509) are PARALLELs.  However, as you'd
> probably expect, bloating the inline code with that case made things
> slower rather than faster.
> 
> The vast majority (in fact all in the combine.ii run above) of iterations
> can be done with a 16-element stack worklist.  We obviously still need a
> heap fallback for the pathological cases though.
> 
> I spent a bit of time trying different iterator implementations and
> seeing which produced the best code.  Specific results from that were:
> 
> - The storage used for the worklist is separate from the iterator,
>   in order to avoid capturing iterator fields.
> 
> - Although the natural type of the storage would be auto_vec <..., 16>,
>   that produced some overhead compared with a separate stack array and heap
>   vector pointer.  With the heap vector pointer, the only overhead is an
>   assignment in the constructor and an "if (x) release (x)"-style sequence
>   in the destructor.  I think the extra complication over auto_vec is worth
>   it because in this case the heap version is so very rarely needed.

hm, where does the overhead come from exactly? it seems like if it's
 faster to use vec *foo; we should fix something
 about vectors since this isn't the only place it could matter.  does it
 matter if you use vec * or vec ? the second
 is basically just a wrapper around the former I'd expect has no effect.
 I'm not saying you're doing the wrong thing here, but if we can make
 generic vectors faster we probably should ;) or is the issue the
 __builtin_expect()s you can add?

> - Several existing for_each_rtx callbacks have something like:
> 
> if (GET_CODE (x) == CONST)
>   return -1;
> 
>   or:
> 
> if (CONSTANT_P (x))
>   return -1;
> 
>   to avoid walking subrtxes of constants.  That can be done without
>   extra code checks and branches by having a separate code->bound
>   mapping in which all constants are treated as leaf rtxes.  This usage
>   should be common enough to outweigh the cache penalty of two arrays.
> 
>   The choice between iterating over constants or not is given in the
>   final parameter of the FOR_EACH_* iterator.

less repetition \O/

> - The maximum number of fields in (B)-type rtxes is 3.  We get better
>   code by making that explicit rather than having a general loop.
> 
> - (C) codes map to an "e" count of UCHAR_MAX, so we can use a single
>   check to test for that and for cases where the stack worklist is
>   too small.

 can we use uint8_t?

> To give an example:
> 
> /* Callback for for_each_rtx, that returns 1 upon encountering a VALUE
>whose UID is greater than the int uid that D points to.  */
> 
> static int
> refs_newer_value_cb (rtx *x, void *d)
> {
>   if (GET_CODE (*x) == VALUE && CSELIB_VAL_PTR (*x)->uid > *(int *)d)
> return 1;
> 
>   return 0;
> }
> 
> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
>that of V.  */
> 
> static bool
> refs_newer_value_p (rtx expr, rtx v)
> {
>   int minuid = CSELIB_VAL_PTR (v)->uid;
> 
>   return for_each_rtx (&expr, refs_newer_val

[RS6000] Fix PR61098, Poor code setting count register

2014-05-07 Thread Alan Modra

On powerpc64, to set a large loop count we have code like the
following after split1:

(insn 67 14 68 4 (set (reg:DI 160)
(const_int 99942400 [0x5f50000])) /home/amodra/unaligned_load.c:14 -1
 (nil))
(insn 68 67 42 4 (set (reg:DI 160)
(ior:DI (reg:DI 160)
(const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1
 (expr_list:REG_EQUAL (const_int 1 [0x5f5e100])
(nil)))

and then test for loop exit with:

(jump_insn 65 31 45 5 (parallel [
(set (pc)
(if_then_else (ne (reg:DI 160)
(const_int 1 [0x1]))
(label_ref:DI 42)
(pc)))
(set (reg:DI 160)
(plus:DI (reg:DI 160)
(const_int -1 [0xffffffffffffffff])))
(clobber (scratch:CC))
(clobber (scratch:DI))
]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1}
 (int_list:REG_BR_PROB 9899 (nil))
 -> 42)

The jump_insn of course is meant for use with bdnz, which implies a
strong preference for reg 160 to live in the count register.  Trouble
is, the count register doesn't do arithmetic.

So, use a new pseudo for intermediate results.  On looking at this,
I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was
broken, apparently expecting c1 and c2 to be the high and low 32 bits
of the constant.  That's no longer true, so I've fixed that as well.
Bootstrapped and regression tested powerpc64-linux.  OK for mainline
and branches?
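
To make the two-insn sequence in the dump above concrete, here is a
minimal standalone sketch (not the rs6000 code) of splitting a constant
into the part that is loaded first and the low 16 bits that are ORed in
afterwards.  The struct and function names are hypothetical, and the
0xffff masks are an assumption that matches the 16-bit immediate seen
in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two immediates used by a
   "load high part, then IOR low 16 bits" sequence.  */
struct const_halves
{
  int64_t high;  /* C with the low 16 bits cleared.  */
  int64_t low;   /* The low 16 bits of C.  */
};

/* Split C into the value loaded first and the value ORed in second.  */
static struct const_halves
split_const (int64_t c)
{
  struct const_halves h;

  h.high = c & ~(int64_t) 0xffff;
  h.low = c & 0xffff;

  /* ORing the two halves back together must reproduce C.  */
  assert ((h.high | h.low) == c);
  return h;
}
```

For the loop count in the dump, split_const (100000000) yields
99942400 and 57600, the two const_int operands of insns 67 and 68.
With a fresh pseudo, the first value would go into the temporary and
only the final IOR result would be written to the count-register
candidate.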

PR target/61098
* config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded
params and return value.  Simplify.  Update comment.
(rs6000_emit_set_long_const): Remove unneeded param and return
value.  Correct !TARGET_POWERPC64 handling of constants > 2G.
If we can, use a new pseudo for intermediate calculations.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 209926)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1068,7 +1069,7 @@ static tree rs6000_handle_longcall_attribute (tree
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
-static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t,
@@ -7826,53 +7811,36 @@ rs6000_conditional_register_usage (void)
 }
 
 
-/* Try to output insns to set TARGET equal to the constant C if it can
-   be done in less than N insns.  Do all computations in MODE.
-   Returns the place where the output has been placed if it can be
-   done and the insns have been emitted.  If it would take more than N
-   insns, zero is returned and no insns and emitted.  */
+/* Output insns to set DEST equal to the constant SOURCE.  */
 
-rtx
-rs6000_emit_set_const (rtx dest, enum machine_mode mode,
-  rtx source, int n ATTRIBUTE_UNUSED)
+void
+rs6000_emit_set_const (rtx dest, rtx source)
 {
-  rtx result, insn, set;
-  HOST_WIDE_INT c0, c1;
+  enum machine_mode mode = GET_MODE (dest);
+  rtx temp, insn, set;
+  HOST_WIDE_INT c;
 
+  gcc_checking_assert (CONST_INT_P (source));
+  c = INTVAL (source);
   switch (mode)
 {
-case  QImode:
+case QImode:
 case HImode:
-  if (dest == NULL)
-   dest = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (VOIDmode, dest, source));
-  return dest;
+  return;
 
 case SImode:
-  result = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
 
-  emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (result),
- GEN_INT (INTVAL (source)
-  & (~ (HOST_WIDE_INT) 0xffff))));
+  emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (temp),
+ GEN_INT (c & (~ (HOST_WIDE_INT) 0xffff))));
   emit_insn (gen_rtx_SET (VOIDmode, dest,
- gen_rtx_IOR (SImode, copy_rtx (result),
-  GEN_INT (INTVAL (source) & 0xffff))));
-  result = dest;
+ gen_rtx_IOR (SImode, copy_rtx (temp),
+  GEN_INT (c & 0xffff))));
   break;
 
 case DImode:
-  switch (GET_CODE (source))
-   {
-   case CONST_INT:
- c0 = INTVAL (source);
- c1 = -(c0 < 0);
- break;
-
-   default:
- gcc_unreachable ();
-   }
-
-  result = rs6000_emit_set_long_const (dest, c0, c1);
+

Re: [RS6000] PR60737, expand_block_clear uses word stores

2014-05-07 Thread Alan Modra

On Wed, May 07, 2014 at 01:39:50PM -0400, David Edelsohn wrote:
> On Tue, May 6, 2014 at 4:32 AM, Alan Modra  wrote:
> > BTW, the latest patch in my tree has a slight refinement, the
> > reload-by-hand addition.
> >
> > PR target/60737
> > * config/rs6000/rs6000.c (expand_block_move): Allow 64-bit
> > loads and stores when -mno-strict-align at any alignment.
> > (expand_block_clear): Similarly.  Also correct calculation of
> > instruction count.
> 
> Based on results of your experiment, the revised patch is okay.
> 
> You did not include gcc-patches in the distribution list for the revised 
> patch.

Thanks, David.  Patch copied here for gcc-patches and committed
revision 210201.

PR target/60737
* config/rs6000/rs6000.c (expand_block_move): Allow 64-bit
loads and stores when -mno-strict-align at any alignment.
(expand_block_clear): Similarly.  Also correct calculation of
instruction count.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 210200)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -15443,7 +15443,7 @@ expand_block_clear (rtx operands[])
  load zero and three to do clearing.  */
   if (TARGET_ALTIVEC && align >= 128)
 clear_step = 16;
-  else if (TARGET_POWERPC64 && align >= 32)
+  else if (TARGET_POWERPC64 && (align >= 64 || !STRICT_ALIGNMENT))
 clear_step = 8;
   else if (TARGET_SPE && align >= 64)
 clear_step = 8;
@@ -15471,12 +15471,27 @@ expand_block_clear (rtx operands[])
   mode = V2SImode;
 }
   else if (bytes >= 8 && TARGET_POWERPC64
-  /* 64-bit loads and stores require word-aligned
- displacements.  */
-  && (align >= 64 || (!STRICT_ALIGNMENT && align >= 32)))
+  && (align >= 64 || !STRICT_ALIGNMENT))
{
  clear_bytes = 8;
  mode = DImode;
+ if (offset == 0 && align < 64)
+   {
+ rtx addr;
+
+ /* If the address form is reg+offset with offset not a
+multiple of four, reload into reg indirect form here
+rather than waiting for reload.  This way we get one
+reload, not one per store.  */
+ addr = XEXP (orig_dest, 0);
+ if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+ && GET_CODE (XEXP (addr, 1)) == CONST_INT
+ && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+   {
+ addr = copy_addr_to_reg (addr);
+ orig_dest = replace_equiv_address (orig_dest, addr);
+   }
+   }
}
   else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
{   /* move 4 bytes */
@@ -15604,13 +15619,36 @@ expand_block_move (rtx operands[])
  gen_func.movmemsi = gen_movmemsi_4reg;
}
   else if (bytes >= 8 && TARGET_POWERPC64
-  /* 64-bit loads and stores require word-aligned
- displacements.  */
-  && (align >= 64 || (!STRICT_ALIGNMENT && align >= 32)))
+  && (align >= 64 || !STRICT_ALIGNMENT))
{
  move_bytes = 8;
  mode = DImode;
  gen_func.mov = gen_movdi;
+ if (offset == 0 && align < 64)
+   {
+ rtx addr;
+
+ /* If the address form is reg+offset with offset not a
+multiple of four, reload into reg indirect form here
+rather than waiting for reload.  This way we get one
+reload, not one per load and/or store.  */
+ addr = XEXP (orig_dest, 0);
+ if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+ && GET_CODE (XEXP (addr, 1)) == CONST_INT
+ && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+   {
+ addr = copy_addr_to_reg (addr);
+ orig_dest = replace_equiv_address (orig_dest, addr);
+   }
+ addr = XEXP (orig_src, 0);
+ if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+ && GET_CODE (XEXP (addr, 1)) == CONST_INT
+ && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+   {
+ addr = copy_addr_to_reg (addr);
+ orig_src = replace_equiv_address (orig_src, addr);
+   }
+   }
}
   else if (TARGET_STRING && bytes > 4 && !TARGET_POWERPC64)
{   /* move up to 8 bytes at a time */

-- 
Alan Modra
Australia Development Lab, IBM

Re: genattrtab error reporting

2014-05-07 Thread Mike Stump

On May 7, 2014, at 6:12 PM, Segher Boessenkool wrote:
>>> Does this fix
>>> 
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778
>> 
>> Only if it is applied to the tree!  :-)  Yes.
> 
> It also is PR57062.  Thanks for fixing it!

Thanks, marked as dup.

Re: [patch] change specific int128 -> generic intN

2014-05-07 Thread DJ Delorie


> OK (presuming the usual bootstrap and regression test, which should 
> provide a reasonably thorough test of this code through the  
> tests).

Bootstrapped with and without the patch on x86-64, no regressions.
Committed.  Thanks!

Re: RFC: Faster for_each_rtx-like iterators

2014-05-07 Thread Richard Sandiford

Trevor Saunders  writes:
> On Wed, May 07, 2014 at 09:52:49PM +0100, Richard Sandiford wrote:
>> I noticed for_each_rtx showing up in profiles and thought I'd have a go
>> at using worklist-based iterators instead.  So far I have three:
>> 
>>   FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
>>   FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
>>   FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *
>> 
>> with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.
>> 
>> I made FOR_EACH_SUBRTX the "default" (unsuffixed) version because
>> most walks really don't modify the structure.  I think we should
>> encourage const_rtxes to be used wherever possible.  E.g. it might
>> make it easier to have non-GC storage for temporary rtxes in future.
>> 
>> I've locally replaced all for_each_rtx calls in the generic code with
>> these iterators and they make things reproducibly faster.  The speed-up
>> on full --enable-checking=release ./cc1 and ./cc1plus times is only about 1%,
>> but maybe that's enough to justify the churn.
>
> seems pretty nice, and it seems like it'll make code a little more
> readable too :)
>
>> Implementation-wise, the main observation is that most subrtxes are part
>> of a single contiguous sequence of "e" fields.  E.g. when compiling an
>> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
>> subrtxes of 7,636,542 rtxes.  Of those:
>> 
>> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
>> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
>>   no "E" fields, and
>> (C)    43,532 (00.6%) are more complicated.
>> 
>> (A) is really a special case of (B) in which the block has zero length.
>> Those are the only two cases that really need to be handled inline.
>> The implementation does this by having a mapping from an rtx code to the
>> bounds of its "e" sequence, in the form of a start index and count.
>> 
>> Out of (C), the vast majority (43,509) are PARALLELs.  However, as you'd
>> probably expect, bloating the inline code with that case made things
>> slower rather than faster.
>> 
>> The vast majority (in fact all in the combine.ii run above) of iterations
>> can be done with a 16-element stack worklist.  We obviously still need a
>> heap fallback for the pathological cases though.
>> 
>> I spent a bit of time trying different iterator implementations and
>> seeing which produced the best code.  Specific results from that were:
>> 
>> - The storage used for the worklist is separate from the iterator,
>>   in order to avoid capturing iterator fields.
>> 
>> - Although the natural type of the storage would be auto_vec <..., 16>,
>>   that produced some overhead compared with a separate stack array and heap
>>   vector pointer.  With the heap vector pointer, the only overhead is an
>>   assignment in the constructor and an "if (x) release (x)"-style sequence
>>   in the destructor.  I think the extra complication over auto_vec is worth
>>   it because in this case the heap version is so very rarely needed.
>
> hm, where does the overhead come from exactly? it seems like if  its
>  faster to use vec *foo; we should fix something
>  about vectors since this isn't the only place it could matter.  does it
>  matter if you use vec * or vec ? the second
>  is basically just a wrapper around the former I'd expect has no effect.
>  I'm not saying you're doing the wrong thing here, but if we can make
>  generic vectors faster we probably should ;) or is the issue the
>  __builtin_expect()s you can add?

Part of the problem is that by having an array in the vec itself,
the other fields effectively have their address taken too.
So m_alloc, m_num and m_using_auto_storage need to be set up
and maintained on the stack, even though we're almost sure that they
will never be used.
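
As a rough illustration of the layout I mean (a sketch only, using a
toy node type instead of rtx, and with every name below made up for the
example): the storage object holds nothing but the stack array and a
heap pointer, while the length and capacity live in ordinary locals
that can stay in registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_SIZE 16

/* Toy stand-in for an rtx: a value and up to three children.  */
struct node
{
  int value;
  int nkids;
  struct node *kids[3];
};

/* Worklist storage: just the stack array and a heap pointer that
   stays NULL unless the array overflows.  */
struct storage
{
  struct node *local[LOCAL_SIZE];
  struct node **heap;
};

/* Sum the values of ROOT and everything below it, using a worklist
   rather than recursion or a callback.  */
static long
sum_tree (struct node *root)
{
  struct storage s;
  struct node **base = s.local;  /* Current backing store.  */
  size_t cap = LOCAL_SIZE, len = 0;
  long total = 0;

  assert (root != NULL);
  s.heap = NULL;                 /* The only set-up cost.  */
  base[len++] = root;
  while (len > 0)
    {
      struct node *n = base[--len];
      total += n->value;
      if (len + (size_t) n->nkids > cap)
        {
          /* Rare path: switch to (or grow) a heap buffer.  */
          size_t new_cap = cap * 2;
          struct node **p = malloc (new_cap * sizeof *p);
          if (p == NULL)
            abort ();
          memcpy (p, base, len * sizeof *p);
          free (s.heap);
          s.heap = p;
          base = p;
          cap = new_cap;
        }
      /* Push children in reverse so that kids[0] is visited first.  */
      for (int i = n->nkids - 1; i >= 0; i--)
        base[len++] = n->kids[i];
    }
  if (s.heap)                    /* The "if (x) release (x)" step.  */
    free (s.heap);
  return total;
}
```

Only s.local and s.heap ever have their address taken here; len and
cap do not, which is the property that gets lost when everything is
folded into a single vector object.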

>> - The maximum number of fields in (B)-type rtxes is 3.  We get better
>>   code by making that explicit rather than having a general loop.
>> 
>> - (C) codes map to an "e" count of UCHAR_MAX, so we can use a single
>>   check to test for that and for cases where the stack worklist is
>>   too small.
>
>  can we use uint8_t?

We don't really use that in GCC yet.  I don't mind setting a precedent
though :-)
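
For what it's worth, the single-check idea can be sketched like this.
It is a toy model with made-up codes and names, not the patch itself;
the only thing it relies on is that the sentinel count is larger than
the stack worklist.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define LOCAL_SIZE 16

/* Hypothetical per-code bounds of the contiguous operand block.  */
struct bounds
{
  unsigned char start;  /* Index of the first operand.  */
  unsigned char count;  /* Operand count, or UCHAR_MAX if complicated.  */
};

enum toy_code
{
  TOY_LEAF, TOY_UNARY, TOY_BINARY, TOY_TERNARY, TOY_COMPLEX, TOY_NUM_CODES
};

static const struct bounds toy_bounds[TOY_NUM_CODES] =
{
  { 0, 0 }, { 0, 1 }, { 0, 2 }, { 0, 3 }, { 0, UCHAR_MAX }
};

/* Return true if the operands of CODE can be pushed inline when USED
   slots of the LOCAL_SIZE-entry worklist are already occupied.  Since
   UCHAR_MAX exceeds LOCAL_SIZE, one comparison rejects both the
   "complicated" codes and a worklist that is too full.  */
static bool
fast_path_p (enum toy_code code, size_t used)
{
  assert (used <= LOCAL_SIZE);
  return (size_t) toy_bounds[code].count <= LOCAL_SIZE - used;
}
```

So fast_path_p (TOY_BINARY, 14) holds, while both
fast_path_p (TOY_TERNARY, 14) and fast_path_p (TOY_COMPLEX, 0) fail and
fall through to the out-of-line code.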

>> To give an example:
>> 
>> /* Callback for for_each_rtx, that returns 1 upon encountering a VALUE
>>whose UID is greater than the int uid that D points to.  */
>> 
>> static int
>> refs_newer_value_cb (rtx *x, void *d)
>> {
>>   if (GET_CODE (*x) == VALUE && CSELIB_VAL_PTR (*x)->uid > *(int *)d)
>> return 1;
>> 
>>   return 0;
>> }
>> 
>> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
>>that of V.  */
>> 
>> static bool
>> refs_newer_value_p (rtx expr, rtx v)
>> {
>>   int minuid = CSELIB_VAL_PTR (v)->uid;
>> 
>>   return for_each_rtx (&expr, refs_newer_value_cb, &minuid);
>> }
>> 
>> becomes:
>> 
>> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
>>

Re: RFC: Faster for_each_rtx-like iterators

2014-05-07 Thread Richard Sandiford

Richard Sandiford  writes:
> Trevor Saunders  writes:
>> On Wed, May 07, 2014 at 09:52:49PM +0100, Richard Sandiford wrote:
>>> I noticed for_each_rtx showing up in profiles and thought I'd have a go
>>> at using worklist-based iterators instead.  So far I have three:
>>> 
>>>   FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
>>>   FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
>>>   FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *
>>> 
>>> with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.
>>> 
>>> I made FOR_EACH_SUBRTX the "default" (unsuffixed) version because
>>> most walks really don't modify the structure.  I think we should
>>> encourage const_rtxes to be used wherever possible.  E.g. it might
>>> make it easier to have non-GC storage for temporary rtxes in future.
>>> 
>>> I've locally replaced all for_each_rtx calls in the generic code with
>>> these iterators and they make things reproducibly faster.  The speed-up
>>> on full --enable-checking=release ./cc1 and ./cc1plus times is only about 
>>> 1%,
>>> but maybe that's enough to justify the churn.
>>
>> seems pretty nice, and it seems like it'll make code a little more
>> readable too :)
>>
>>> Implementation-wise, the main observation is that most subrtxes are part
>>> of a single contiguous sequence of "e" fields.  E.g. when compiling an
>>> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
>>> subrtxes of 7,636,542 rtxes.  Of those:
>>> 
>>> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
>>> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
>>>   no "E" fields, and
>>> (C)    43,532 (00.6%) are more complicated.
>>> 
>>> (A) is really a special case of (B) in which the block has zero length.
>>> Those are the only two cases that really need to be handled inline.
>>> The implementation does this by having a mapping from an rtx code to the
>>> bounds of its "e" sequence, in the form of a start index and count.
>>> 
>>> Out of (C), the vast majority (43,509) are PARALLELs.  However, as you'd
>>> probably expect, bloating the inline code with that case made things
>>> slower rather than faster.
>>> 
>>> The vast majority (in fact all in the combine.ii run above) of iterations
>>> can be done with a 16-element stack worklist.  We obviously still need a
>>> heap fallback for the pathological cases though.
>>> 
>>> I spent a bit of time trying different iterator implementations and
>>> seeing which produced the best code.  Specific results from that were:
>>> 
>>> - The storage used for the worklist is separate from the iterator,
>>>   in order to avoid capturing iterator fields.
>>> 
>>> - Although the natural type of the storage would be auto_vec <..., 16>,
>>>   that produced some overhead compared with a separate stack array and heap
>>>   vector pointer.  With the heap vector pointer, the only overhead is an
>>>   assignment in the constructor and an "if (x) release (x)"-style sequence
>>>   in the destructor.  I think the extra complication over auto_vec is worth
>>>   it because in this case the heap version is so very rarely needed.
>>
>> hm, where does the overhead come from exactly? it seems like if  its
>>  faster to use vec *foo; we should fix something
>>  about vectors since this isn't the only place it could matter.  does it
>>  matter if you use vec * or vec ? the second
>>  is basically just a wrapper around the former I'd expect has no effect.
>>  I'm not saying you're doing the wrong thing here, but if we can make
>>  generic vectors faster we probably should ;) or is the issue the
>>  __builtin_expect()s you can add?
>
> Part of the problem is that by having an array in the vec itself,
> the other fields effectively have their address taken too.
> So m_alloc, m_num and m_using_auto_storage need to be set up
> and maintained on the stack, even though we're almost sure that they
> will never be used.
>
>>> - The maximum number of fields in (B)-type rtxes is 3.  We get better
>>>   code by making that explicit rather than having a general loop.
>>> 
>>> - (C) codes map to an "e" count of UCHAR_MAX, so we can use a single
>>>   check to test for that and for cases where the stack worklist is
>>>   too small.
>>
>>  can we use uint8_t?
>
> We don't really use that in GCC yet.  I don't mind setting a precedent
> though :-)
>
>>> To give an example:
>>> 
>>> /* Callback for for_each_rtx, that returns 1 upon encountering a VALUE
>>>whose UID is greater than the int uid that D points to.  */
>>> 
>>> static int
>>> refs_newer_value_cb (rtx *x, void *d)
>>> {
>>>   if (GET_CODE (*x) == VALUE && CSELIB_VAL_PTR (*x)->uid > *(int *)d)
>>> return 1;
>>> 
>>>   return 0;
>>> }
>>> 
>>> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
>>>that of V.  */
>>> 
>>> static bool
>>> refs_newer_value_p (rtx expr, rtx v)
>>> {
>>>   int minuid = CSELIB_VAL_PTR (v)->uid;
>>> 
>>>   return fo
