date:20121015

C++ PATCH to implement C++11 inheriting constructors

2012-10-15 Jason Merrill


This patch implements the C++11 inheriting constructors feature: given

struct A { A(int); };
struct B: A { using A::A; };

the compiler defines B::B(int) that just passes its argument along to 
the A constructor.
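
For illustration, a minimal fragment using the feature. The `value` member is added only so that the forwarding can be observed, and `B` has to derive from `A` for `using A::A;` to name A's constructors:

```cpp
// Base class with a user-declared constructor taking an int.
struct A {
  int value;
  explicit A(int v) : value(v) {}
};

// B inherits A's constructors, so B b(42) forwards 42 to A::A(int).
struct B : A {
  using A::A;
};
```

With this, `B b(42);` initializes the `A` subobject through `A::A(42)`, so `b.value` is 42.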


Ville started working on this feature a while back, but got tied up with 
other things and was unable to finish it in time for 4.8, so I finished 
it up.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit b1ec39978ebc71b2ebde78b4100986b444d96380
Author: Ville Voutilainen 
Date:   Wed May 23 13:49:02 2012 +0300

	Implement C++11 inheriting constructors.
	* cp-tree.h (cpp0x_warn_str): Add CPP0X_INHERITING_CTORS.
	(DECL_INHERITED_CTOR_BASE, SET_DECL_INHERITED_CTOR_BASE): New.
	(special_function_kind): Add sfk_inheriting_constructor.
	* class.c (add_method): An inheriting ctor is hidden by a
	user-declared one.
	(one_inheriting_sig, one_inherited_ctor): New.
	(add_implicitly_declared_members): Handle inheriting ctors.
	* error.c (maybe_warn_cpp0x): Handle CPP0X_INHERITING_CTORS.
	* init.c (emit_mem_initializers): Don't set LOOKUP_DEFAULTED
	for an inheriting constructor.
	* method.c (type_has_trivial_fn): Handle sfk_inheriting_constructor.
	(type_set_nontrivial_flag): Likewise.
	(add_one_base_init): Split out from...
	(do_build_copy_constructor): ...here.  Handle inheriting constructors.
	(locate_fn_flags): Handle a list of arg types.
	(synthesized_method_walk): Handle inheriting constructors.
	(maybe_explain_implicit_delete): Likewise.
	(deduce_inheriting_ctor): New.
	(implicitly_declare_fn): Handle inheriting constructors.
	* name-lookup.c (push_class_level_binding_1): An inheriting constructor
	does not declare the base's name.
	(do_class_using_decl): Allow inheriting constructors.
	* pt.c (template_parms_to_args): Split from current_template_args.
	(add_inherited_template_parms): New.
	(tsubst_decl): Handle inheriting constructors.
	* tree.c (special_function_p): Handle inheriting constructors.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 0e77b81..a478de8 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -132,7 +132,7 @@ static void finish_struct_methods (tree);
 static void maybe_warn_about_overly_private_class (tree);
 static int method_name_cmp (const void *, const void *);
 static int resort_method_name_cmp (const void *, const void *);
-static void add_implicitly_declared_members (tree, int, int);
+static void add_implicitly_declared_members (tree, tree*, int, int);
 static tree fixed_type_or_null (tree, int *, int *);
 static tree build_simple_base_path (tree expr, tree binfo);
 static tree build_vtbl_ref_1 (tree, tree);
@@ -1087,6 +1087,20 @@ add_method (tree type, tree method, tree using_decl)
 	  || same_type_p (TREE_TYPE (fn_type),
 			  TREE_TYPE (method_type
 	{
+	  if (DECL_INHERITED_CTOR_BASE (method))
+	    {
+	      if (DECL_INHERITED_CTOR_BASE (fn))
+		{
+		  error_at (DECL_SOURCE_LOCATION (method),
+			    "%q#D inherited from %qT", method,
+			    DECL_INHERITED_CTOR_BASE (method));
+		  error_at (DECL_SOURCE_LOCATION (fn),
+			    "conflicts with version inherited from %qT",
+			    DECL_INHERITED_CTOR_BASE (fn));
+		}
+	      /* Otherwise defer to the other function.  */
+	      return false;
+	    }
 	  if (using_decl)
 	    {
 	      if (DECL_CONTEXT (fn) == type)
@@ -2750,6 +2764,51 @@ declare_virt_assop_and_dtor (tree t)
 		NULL, t);
 }
 
+/* Declare the inheriting constructor for class T inherited from base
+   constructor CTOR with the parameter array PARMS of size NPARMS.  */
+
+static void
+one_inheriting_sig (tree t, tree ctor, tree *parms, int nparms)
+{
+  /* We don't declare an inheriting ctor that would be a default,
+     copy or move ctor.  */
+  if (nparms == 0
+      || (nparms == 1
+	  && TREE_CODE (parms[0]) == REFERENCE_TYPE
+	  && TYPE_MAIN_VARIANT (TREE_TYPE (parms[0])) == t))
+    return;
+  int i;
+  tree parmlist = void_list_node;
+  for (i = nparms - 1; i >= 0; i--)
+    parmlist = tree_cons (NULL_TREE, parms[i], parmlist);
+  tree fn = implicitly_declare_fn (sfk_inheriting_constructor,
+				   t, false, ctor, parmlist);
+  if (add_method (t, fn, NULL_TREE))
+    {
+      DECL_CHAIN (fn) = TYPE_METHODS (t);
+      TYPE_METHODS (t) = fn;
+    }
+}
+
+/* Declare all the inheriting constructors for class T inherited from base
+   constructor CTOR.  */
+
+static void
+one_inherited_ctor (tree ctor, tree t)
+{
+  tree parms = FUNCTION_FIRST_USER_PARMTYPE (ctor);
+
+  tree *new_parms = XALLOCAVEC (tree, list_length (parms));
+  int i = 0;
+  for (; parms && parms != void_list_node; parms = TREE_CHAIN (parms))
+    {
+      if (TREE_PURPOSE (parms))
+	one_inheriting_sig (t, ctor, new_parms, i);
+      new_parms[i++] = TREE_VALUE (parms);
+    }
+  one_inheriting_sig (t, ctor, new_parms, i);
+}
+
 /* Create default constructors, assignment operators, and so forth for
the type indicated by T, if they are needed.  CANT_HAVE_
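
The way one_inherited_ctor walks a base constructor's parameter list can be modeled outside the compiler. The sketch below is a toy that assumes each parameter is described by a spelled type string and a has-default flag; `Param`, `Signature`, `is_ref_to_class`, `add_sig` and `inherited_signatures` are hypothetical names, not GCC's tree API. Every parameter that has a default argument closes off one candidate signature made of the parameters before it, the full list is always a candidate, and candidates that would be a default, copy or move constructor of the derived class are skipped, as in one_inheriting_sig.

```cpp
#include <string>
#include <vector>

// Toy description of one base-constructor parameter (hypothetical).
struct Param {
  std::string type;  // spelled type, e.g. "int" or "const B&"
  bool has_default;  // true if the parameter has a default argument
};

using Signature = std::vector<std::string>;

// True if TYPE is a reference to class CLS, ignoring a leading "const ".
static bool is_ref_to_class(std::string type, const std::string &cls) {
  if (type.empty() || type.back() != '&')
    return false;
  while (!type.empty() && type.back() == '&')
    type.pop_back();
  const std::string cq = "const ";
  if (type.compare(0, cq.size(), cq) == 0)
    type.erase(0, cq.size());
  return type == cls;
}

// Toy analogue of one_inheriting_sig: drop signatures that would be a
// default, copy or move constructor of the derived class DERIVED.
static void add_sig(std::vector<Signature> &out, const Signature &sig,
                    const std::string &derived) {
  if (sig.empty() || (sig.size() == 1 && is_ref_to_class(sig[0], derived)))
    return;
  out.push_back(sig);
}

// Toy analogue of one_inherited_ctor: a parameter with a default argument
// ends one candidate (the prefix before it); the full list is the last one.
std::vector<Signature> inherited_signatures(const std::vector<Param> &parms,
                                            const std::string &derived) {
  std::vector<Signature> out;
  Signature prefix;
  for (const Param &p : parms) {
    if (p.has_default)
      add_sig(out, prefix, derived);
    prefix.push_back(p.type);
  }
  add_sig(out, prefix, derived);
  return out;
}
```

For a base constructor `A(int, double = 1.0, char = 'x')` inherited into `B`, this yields `B(int)`, `B(int, double)` and `B(int, double, char)`. The per-signature model follows the C++11 wording implemented here; C++17 later reworded inheriting constructors (P0136R1) so that base constructors are found by lookup rather than redeclared in the derived class.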

C++ PATCH for target/54908 (thread_local vs emutls)

2012-10-15 Jason Merrill

This patch completely rewrites atexit_thread.cc to use 
__gthread_getspecific/setspecific rather than a thread_local variable to 
store the cleanup list, so that the list won't vanish if emutls_destroy 
runs first.  With this patch all the TLS tests pass on i686-pc-linux-gnu 
configured with --disable-tls to force use of emutls.
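
The new scheme can be sketched directly on POSIX threads, assuming `<pthread.h>` is available. The patch itself goes through the `__gthread_*` wrappers and also handles the non-threaded case and dlclose, which this sketch omits. Each thread's cleanups form a singly-linked stack stored under one key, and the key's destructor runs that stack newest-first when the thread exits. `thread_atexit_sketch` and `run_stack` are hypothetical names, and the sketch frees each element after running it.

```cpp
#include <new>
#include <pthread.h>

namespace {
// One element in a singly-linked stack of cleanups (same shape as the patch).
struct elt {
  void (*destructor)(void *);
  void *object;
  elt *next;
};

pthread_key_t key;
pthread_once_t once = PTHREAD_ONCE_INIT;

// Key destructor: at thread exit pthreads calls this with the thread's
// (non-NULL) stack head.  Cleanups run newest-first; nodes are freed.
void run_stack(void *p) {
  elt *e = static_cast<elt *>(p);
  while (e) {
    elt *next = e->next;
    e->destructor(e->object);
    delete e;
    e = next;
  }
}

void key_init() { pthread_key_create(&key, run_stack); }
}  // namespace

// Hypothetical stand-in for __cxa_thread_atexit: push a cleanup onto the
// calling thread's stack.  Returns 0 on success, -1 if allocation fails.
int thread_atexit_sketch(void (*dtor)(void *), void *obj) {
  pthread_once(&once, key_init);
  elt *head = static_cast<elt *>(pthread_getspecific(key));
  elt *e = new (std::nothrow) elt{dtor, obj, head};
  if (!e)
    return -1;
  pthread_setspecific(key, e);
  return 0;
}
```

Because the stack head lives in key storage rather than in a `thread_local` object, running it does not depend on whether emutls has already torn down its own per-thread storage.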


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8735583f399914e92f126d98c120aba317f6626a
Author: Jason Merrill 
Date:   Mon Oct 8 10:50:20 2012 -0400

	PR target/54908
	* libsupc++/atexit_thread.cc: Rewrite to keep the cleanup list
	with get/setspecific.  Destroy the key on dlclose.

diff --git a/gcc/testsuite/g++.dg/tls/thread_local7g.C b/gcc/testsuite/g++.dg/tls/thread_local7g.C
index 6960598..3479aeb 100644
--- a/gcc/testsuite/g++.dg/tls/thread_local7g.C
+++ b/gcc/testsuite/g++.dg/tls/thread_local7g.C
@@ -3,7 +3,7 @@
 // { dg-require-alias }
 
 // The reference temp should be TLS, not normal data.
-// { dg-final { scan-assembler-not "\\.data" } }
+// { dg-final { scan-assembler-not "\\.data" { target tls_native } } }
 
 thread_local int&& ir = 42;
 
diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc b/libstdc++-v3/libsupc++/atexit_thread.cc
index 5e47708..95bdcf0 100644
--- a/libstdc++-v3/libsupc++/atexit_thread.cc
+++ b/libstdc++-v3/libsupc++/atexit_thread.cc
@@ -27,109 +27,92 @@
 #include "bits/gthr.h"
 
 namespace {
-  // Data structure for the list of destructors: Singly-linked list
-  // of arrays.
-  class list
+  // One element in a singly-linked stack of cleanups.
+  struct elt
   {
-    struct elt
-    {
-      void *object;
-      void (*destructor)(void *);
-    };
-
-    static const int max_nelts = 32;
-
-    list *next;
-    int nelts;
-    elt array[max_nelts];
-
-    elt *allocate_elt();
-  public:
-    void run();
-    static void run(void *p);
-    int add_elt(void (*)(void *), void *);
+    void (*destructor)(void *);
+    void *object;
+    elt *next;
   };
 
-  // Return the address of an open slot.
-  list::elt *
-  list::allocate_elt()
-  {
-    if (nelts < max_nelts)
-      return &array[nelts++];
-    if (!next)
-      next = new (std::nothrow) list();
-    if (!next)
-      return 0;
-    return next->allocate_elt();
-  }
-
-  // Run all the cleanups in the list.
-  void
-  list::run()
-  {
-    for (int i = nelts - 1; i >= 0; --i)
-      array[i].destructor (array[i].object);
-    if (next)
-      next->run();
-  }
-
-  // Static version to use as a callback to __gthread_key_create.
-  void
-  list::run(void *p)
-  {
-    static_cast<list *>(p)->run();
-  }
-
-  // The list of cleanups is per-thread.
-  thread_local list first;
-
-  // The pthread data structures for actually running the destructors at
-  // thread exit are shared.  The constructor of the thread-local sentinel
-  // object in add_elt performs the initialization.
+  // Keep a per-thread list of cleanups in gthread_key storage.
   __gthread_key_t key;
-  __gthread_once_t once = __GTHREAD_ONCE_INIT;
-  void run_current () { first.run(); }
+  // But also support non-threaded mode.
+  elt *single_thread;
+
+  // Run the specified stack of cleanups.
+  void run (void *p)
+  {
+    elt *e = static_cast<elt *>(p);
+    for (; e; e = e->next)
+      e->destructor (e->object);
+  }
+
+  // Run the stack of cleanups for the current thread.
+  void run ()
+  {
+    void *e;
+    if (__gthread_active_p ())
+      e = __gthread_getspecific (key);
+    else
+      e = single_thread;
+    run (e);
+  }
+
+  // Initialize the key for the cleanup stack.  We use a static local for
+  // key init/delete rather than atexit so that delete is run on dlclose.
   void key_init() {
-    __gthread_key_create (&key, list::run);
+    struct key_s {
+      key_s() { __gthread_key_create (&key, run); }
+      ~key_s() { __gthread_key_delete (key); }
+    };
+    static key_s ks;
     // Also make sure the destructors are run by std::exit.
     // FIXME TLS cleanups should run before static cleanups and atexit
     // cleanups.
-    std::atexit (run_current);
+    std::atexit (run);
   }
-  struct sentinel
-  {
-    sentinel()
+}
+
+extern "C" int
+__cxxabiv1::__cxa_thread_atexit (void (*dtor)(void *), void *obj, void */*dso_handle*/)
+  _GLIBCXX_NOTHROW
+{
+  // Do this initialization once.
+  if (__gthread_active_p ())
     {
-      if (__gthread_active_p ())
+      // When threads are active use __gthread_once.
+      static __gthread_once_t once = __GTHREAD_ONCE_INIT;
+      __gthread_once (&once, key_init);
+    }
+  else
+    {
+      // And when threads aren't active use a static local guard.
+      static bool queued;
+      if (!queued)
 	{
-	  __gthread_once (&once, key_init);
-	  __gthread_setspecific (key, &first);
+	  queued = true;
+	  std::atexit (run);
 	}
-      else
-	std::atexit (run_current);
     }
-  };
 
-  // Actually insert an element.
-  int
-  list::add_elt(void (*dtor)(void *), void *obj)
-  {
-    thread_local sentinel s;
-    elt *e = allocate_elt ();
-    if (!e)
-      return -1

Re: Ping: RFA: Fix OP_INOUT handling of web.c:union_match_dups

2012-10-15 Paolo Bonzini

On 15/10/2012 07:10, Joern Rennecke wrote:
> 2012-10-02  Joern Rennecke  
> 
> * web.c (union_match_dups): Properly handle OP_INOUT match_dups.
> 
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00189.html
> 

Ok.

Paolo

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Paolo Bonzini

On 14/10/2012 22:59, Steven Bosscher wrote:
> On Sun, Oct 14, 2012 at 9:02 AM, Paolo Bonzini wrote:
>> Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV
>> notes that refer to a dead pseudo?
> 
> I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a
> pseudo that isn't live and still be valid. Consider a simple example
> like this:
> 
> a = b + 3
> // b dies here
> c = a {REG_EQUAL b+3}
> 
> The REG_EQUAL note is valid and may help optimization. Removing it
> just because b is dead at that point would be unnecessarily
> pessimistic.

I disagree that it is valid.  At least it is risky to consider it valid,
because a pass that simulates liveness might end up doing something
wrong because of that note.  If simulation is done backwards, it doesn't
even require any interaction with REG_DEAD notes.

> I also don't want to compute DF_LR taking EQ_USES into account as real
> uses for liveness, because that involves recomputing and enlarging the
> DF_LR sets (all of them, both globally and locally) before LR&RD and
> after LR&RD. That's why I implemented the quick-and-dirty liveness
> computation for the notes: It's non-intrusive on DF_LR and it's cheap.

Yes, I agree on that part of the implementation. :)

Paolo
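
Paolo's suggestion can be illustrated with a toy model; `Insn` and `drop_dead_equal_notes` are hypothetical and are not GCC's RTL or DF interfaces. It walks a block backwards from its live-out set, computes live-in for each insn as (live-out minus defs) plus uses, deliberately does not count note references as uses, and drops a REG_EQUAL-style note that mentions a register not live into its insn. On Steven's example, the note `b+3` on `c = a` is dropped because `b` is dead after `a = b + 3`.

```cpp
#include <set>
#include <vector>

// Toy insn: registers it defines, registers it really uses, and the
// registers mentioned by an attached equivalence note (if any).
struct Insn {
  std::set<int> defs;
  std::set<int> uses;
  std::set<int> note_regs;
  bool has_note;
};

// Backward liveness over one block, starting from its live-out set LIVE.
// A note survives only if every register it mentions is live into its insn.
void drop_dead_equal_notes(std::vector<Insn> &block, std::set<int> live) {
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    // live-in = (live-out - defs) + uses; note references are not uses.
    for (int r : it->defs)
      live.erase(r);
    for (int r : it->uses)
      live.insert(r);
    if (!it->has_note)
      continue;
    bool all_live = true;
    for (int r : it->note_regs)
      if (live.count(r) == 0)
        all_live = false;
    if (!all_live) {
      it->has_note = false;
      it->note_regs.clear();
    }
  }
}
```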

Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Eric Botcazou

> > The only versions of the Solaris assembler I have access to only support
> > v8plusX according to the man page.  Has that changed recently?
> 
> For the older stuff I mean doing something like "-m32 -xarch=v9X"

OK, this is for fbe, not for as.  I think that the latter is always available 
on the machines, but I'm not sure for the former.  Rainer very likely knows.

> The current assembler in Solaris Studio (called 'fbe') calls this
> stuff "sparc4" which I guess means "SPARC-T4 and later".

Ah, thanks.  I agree that using the same monikers is the right thing to do...

> I'm just calling it VIS4 in GCC so that we can export intrinsics of,
> for example, the cryptographic instructions at some point using the
> __VIS__ version CPP tests.

...that's why I'm not sure we should invent VIS4 at this point.  How is this 
done on the Solaris Studio side?  Couldn't we add a new architecture to the 
compiler (-mcpu=sparc4, with -mcpu=niagara4 as first variant), and define 
__sparc4__ for the preprocessor?

-- 
Eric Botcazou

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Steven Bosscher

On Mon, Oct 15, 2012 at 9:38 AM, Paolo Bonzini wrote:
>> I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a
>> pseudo that isn't live and still be valid. Consider a simple example
>> like this:
>>
>> a = b + 3
>> // b dies here
>> c = a {REG_EQUAL b+3}
>>
>> The REG_EQUAL note is valid and may help optimization. Removing it
>> just because b is dead at that point would be unnecessarily
>> pessimistic.
>
> I disagree that it is valid.  At least it is risky to consider it valid,
> because a pass that simulates liveness might end up doing something
> wrong because of that note.  If simulation is done backwards, it doesn't
> even require any interaction with REG_DEAD notes.

In any case, if web doesn't properly rename the register in the
REG_EQUAL note (which it doesn't do without my patch) and we declare
such a note invalid, then we should remove the note. You're right that
GCC ends up doing something wrong, that's why Honza's test case fails.

With my patch, the registers in the notes are properly renamed and
there are no REG_EQUAL notes that refer to dead registers, so the
point whether such a note would be valid is moot. After renaming, the
note is valid, and the behavior is restored that GCC had before I
added DF_RD_PRUNE_DEAD_DEFS.

I think we should come to a conclusion of this discussion: Either we
drop the notes (e.g. by re-computing the DF_NOTE problem after web) or
we update them, like my patch does. I prefer the patch I proposed
because it re-instates the behavior GCC had before.

Ciao!
Steven

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Paolo Bonzini

On 15/10/2012 10:13, Steven Bosscher wrote:
> > I disagree that it is valid.  At least it is risky to consider it valid,
> > because a pass that simulates liveness might end up doing something
> > wrong because of that note.  If simulation is done backwards, it doesn't
> > even require any interaction with REG_DEAD notes.
> 
> In any case, if web doesn't properly rename the register in the
> REG_EQUAL note (which it doesn't do without my patch) and we declare
> such a note invalid, then we should remove the note. You're right that
> GCC ends up doing something wrong, that's why Honza's test case fails.
> 
> I think we should come to a conclusion of this discussion: Either we
> drop the notes (e.g. by re-computing the DF_NOTE problem after web) or
> we update them, like my patch does. I prefer the patch I proposed
> because it re-instates the behavior GCC had before.

I prefer to declare the notes invalid and drop the notes.

Paolo

[SH] PR 54760 - Add DImode GBR loads/stores, fix optimization

2012-10-15 Oleg Endo

Hello,

I somehow initially forgot to implement DImode GBR based loads/stores.
Attached patch does that and also fixes a problem with the GBR address
mode optimization.
Tested on rev 192417 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/54760
* config/sh/sh.c (sh_find_base_reg_disp): Stop searching insns 
when hitting a call insn if GBR is marked as call used.
* config/sh/iterators.md (QIHISIDI): New mode iterator.
* config/sh/predicates.md (gbr_address_mem): New predicate.
* config/sh/sh.md (*movdi_gbr_load, *movdi_gbr_store): New 
insn_and_split.
Use QIHISIDI instead of QIHISI in unnamed GBR addressing splits.


testsuite/ChangeLog:

PR target/54760
* gcc.target/sh/pr54760-2.c: Add long long and unsigned long 
long test functions.
* gcc.target/sh/pr54760-4.c: New.   
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192417)
+++ gcc/config/sh/sh.c	(working copy)
@@ -13383,6 +13383,10 @@
   for (rtx i = prev_nonnote_insn (insn); i != NULL;
 	   i = prev_nonnote_insn (i))
 	{
+	  if (REGNO_REG_SET_P (regs_invalidated_by_call_regset, GBR_REG)
+	      && CALL_P (i))
+	    break;
+
 	  if (!NONJUMP_INSN_P (i))
 	    continue;
 
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 192417)
+++ gcc/config/sh/sh.md	(working copy)
@@ -10277,6 +10277,47 @@
   "mov.	%0,@(0,gbr)"
   [(set_attr "type" "store")])
 
+;; DImode memory accesses have to be split in two SImode accesses.
+;; Split them before reload, so that it gets a better chance to figure out
+;; how to deal with the R0 restriction for the individual SImode accesses.
+;; Do not match this insn during or after reload because it can't be split
+;; afterwards.
+(define_insn_and_split "*movdi_gbr_load"
+  [(set (match_operand:DI 0 "register_operand")
+	(match_operand:DI 1 "gbr_address_mem"))]
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 6))]
+{
+  /* Swap low/high part load order on little endian, so that the result reg
+     of the second load can be used better.  */
+  int off = TARGET_LITTLE_ENDIAN ? 1 : 0;
+  operands[3 + off] = gen_lowpart (SImode, operands[0]);
+  operands[5 + off] = gen_lowpart (SImode, operands[1]);
+  operands[4 - off] = gen_highpart (SImode, operands[0]);
+  operands[6 - off] = gen_highpart (SImode, operands[1]);
+})
+
+(define_insn_and_split "*movdi_gbr_store"
+  [(set (match_operand:DI 0 "gbr_address_mem")
+	(match_operand:DI 1 "register_operand"))]
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 4) (match_dup 6))]
+{
+  /* Swap low/high part store order on big endian, so that stores of function
+     call results can save a reg copy.  */
+  int off = TARGET_LITTLE_ENDIAN ? 0 : 1;
+  operands[3 + off] = gen_lowpart (SImode, operands[0]);
+  operands[5 + off] = gen_lowpart (SImode, operands[1]);
+  operands[4 - off] = gen_highpart (SImode, operands[0]);
+  operands[6 - off] = gen_highpart (SImode, operands[1]);
+})
+
 ;; Sometimes memory accesses do not get combined with the store_gbr insn,
 ;; in particular when the displacements are in the range of the regular move
 ;; insns.  Thus, in the first split pass after the combine pass we search
@@ -10287,15 +10328,15 @@
 ;; other operand) and there's no point of doing it if the GBR is not
 ;; referenced in a function at all.
 (define_split
-  [(set (match_operand:QIHISI 0 "register_operand")
-	(match_operand:QIHISI 1 "memory_operand"))]
+  [(set (match_operand:QIHISIDI 0 "register_operand")
+	(match_operand:QIHISIDI 1 "memory_operand"))]
   "TARGET_SH1 && !reload_in_progress && !reload_completed
    && df_regs_ever_live_p (GBR_REG)"
   [(set (match_dup 0) (match_dup 1))]
 {
   rtx gbr_mem = sh_find_equiv_gbr_addr (curr_insn, operands[1]);
   if (gbr_mem != NULL_RTX)
-    operands[1] = change_address (operands[1], GET_MODE (operands[1]), gbr_mem);
+    operands[1] = replace_equiv_address (operands[1], gbr_mem);
   else
     FAIL;
 })
@@ -10309,7 +10350,7 @@
 {
   rtx gbr_mem = sh_find_equiv_gbr_addr (curr_insn, operands[1]);
   if (gbr_mem != NULL_RTX)
-    operands[1] = change_address (operands[1], GET_MODE (operands[1]), gbr_mem);
+    operands[1] = replace_equiv_address (operands[1], gbr_mem);
   else
     FAIL;
 })
@@ -10328,23 +10369,22 @@
   if (gbr_mem != NULL_RTX)
     {
       operands[2] = gen_reg_rtx (GET_MODE (operands[1]));
-      operands[1] = change_address (operands[1], GET_MODE (operands[1]),
-				    gbr_mem);
+      operands[1] = replace_equiv_address (operands[1], gbr_mem);
     }
   else
     FAIL;
 })
 
 (define_split
-  [(set
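
The operand shuffling in *movdi_gbr_load and *movdi_gbr_store only decides which SImode half is moved first. The toy model below (hypothetical `Half`, `low_half`, `high_half` and `split_order`) assumes that the low part of a DImode memory word sits at byte offset 0 on little endian and at offset 4 on big endian, and reproduces the order in which the two splitters emit their moves.

```cpp
#include <array>

// One SImode half of a split DImode GBR access (toy model).
struct Half {
  bool low_part;    // true for the least significant 32 bits
  int byte_offset;  // offset of this half from the DImode address
};

// Low word at +0 on little endian, at +4 on big endian.
static Half low_half(bool le) { return {true, le ? 0 : 4}; }
static Half high_half(bool le) { return {false, le ? 4 : 0}; }

// Emission order of the two SImode moves.  Loads use off = 1 on little
// endian, which puts the low part second; stores use off = 1 on big
// endian, which again puts the low part second.
std::array<Half, 2> split_order(bool little_endian, bool is_load) {
  bool low_second = is_load ? little_endian : !little_endian;
  if (low_second)
    return {{high_half(little_endian), low_half(little_endian)}};
  return {{low_half(little_endian), high_half(little_endian)}};
}
```

So on little endian a load moves the high word first and a store moves the low word first; on big endian both orders are reversed.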

[SH] PR 51244 - Catch more unnecessary sign/zero extensions

2012-10-15 Oleg Endo

Hello,

This one refactors some copy pasta that my previous patch regarding this
matter introduced and catches more unnecessary sign/zero extensions of T
bit stores.  It also fixes the bug reported in PR 54925 which popped up
after the last patch for PR 51244.
Tested on rev 192417 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

gcc/ChangeLog:

PR target/51244
* config/sh/sh-protos.h (set_of_reg): New struct.
(sh_find_set_of_reg, sh_is_logical_t_store_expr, 
sh_try_omit_signzero_extend):  Declare...
* config/sh/sh.c (sh_find_set_of_reg, 
sh_is_logical_t_store_expr, 
sh_try_omit_signzero_extend): ...these new functions.
* config/sh/sh.md (*logical_op_t): New insn_and_split.
(*zero_extendsi2_compact): Use sh_try_omit_signzero_extend
in splitter.
(*extendsi2_compact_reg): Convert to insn_and_split.  Use 
sh_try_omit_signzero_extend in splitter.
(*mov_reg_reg): Disallow t_reg_operand as operand 1.
(*cbranch_t): Rewrite combine part in splitter using new 
sh_find_set_of_reg function.

testsuite/ChangeLog:

PR target/51244
* gcc.target/sh/pr51244-17.c: New.
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 192417)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -163,6 +163,25 @@
 	enum machine_mode mode = VOIDmode);
 extern rtx sh_find_equiv_gbr_addr (rtx cur_insn, rtx mem);
 extern int sh_eval_treg_value (rtx op);
+
+/* Result value of sh_find_set_of_reg.  */
+struct set_of_reg
+{
+  /* The insn where sh_find_set_of_reg stopped looking.
+     Can be NULL_RTX if the end of the insn list was reached.  */
+  rtx insn;
+
+  /* The set rtx of the specified reg if found, NULL_RTX otherwise.  */
+  const_rtx set_rtx;
+
+  /* The set source rtx of the specified reg if found, NULL_RTX otherwise.
+     Usually, this is the most interesting return value.  */
+  rtx set_src;
+};
+
+extern set_of_reg sh_find_set_of_reg (rtx reg, rtx insn, rtx(*stepfunc)(rtx));
+extern bool sh_is_logical_t_store_expr (rtx op, rtx insn);
+extern rtx sh_try_omit_signzero_extend (rtx extended_op, rtx insn);
 #endif /* RTX_CODE */
 
 extern void sh_cpu_cpp_builtins (cpp_reader* pfile);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192417)
+++ gcc/config/sh/sh.c	(working copy)
@@ -13450,4 +13450,114 @@
   return NULL_RTX;
 }
 
+/*--
+  Manual insn combine support code.
+*/
+
+/* Given a reg rtx and a start insn, try to find the insn that sets the
+   specified reg by using the specified insn stepping function, such as 
+   'prev_nonnote_insn_bb'.  When the insn is found, try to extract the rtx
+   of the reg set.  */
+set_of_reg
+sh_find_set_of_reg (rtx reg, rtx insn, rtx(*stepfunc)(rtx))
+{
+  set_of_reg result;
+  result.insn = insn;
+  result.set_rtx = NULL_RTX;
+  result.set_src = NULL_RTX;
+
+  if (!REG_P (reg) || insn == NULL_RTX)
+    return result;
+
+  for (result.insn = stepfunc (insn); result.insn != NULL_RTX;
+       result.insn = stepfunc (result.insn))
+    {
+      if (LABEL_P (result.insn) || BARRIER_P (result.insn))
+	return result;
+      if (!NONJUMP_INSN_P (result.insn))
+	continue;
+      if (reg_set_p (reg, result.insn))
+	{
+	  result.set_rtx = set_of (reg, result.insn);
+
+	  if (result.set_rtx == NULL_RTX || GET_CODE (result.set_rtx) != SET)
+	    return result;
+
+	  result.set_src = XEXP (result.set_rtx, 1);
+	  return result;
+	}
+    }
+
+  return result;
+}
+
+/* Given an op rtx and an insn, try to find out whether the result of the
+   specified op consists only of logical operations on T bit stores.  */
+bool
+sh_is_logical_t_store_expr (rtx op, rtx insn)
+{
+  if (!logical_operator (op, SImode))
+    return false;
+
+  rtx ops[2] = { XEXP (op, 0), XEXP (op, 1) };
+  int op_is_t_count = 0;
+
+  for (int i = 0; i < 2; ++i)
+    {
+      if (t_reg_operand (ops[i], VOIDmode)
+	  || negt_reg_operand (ops[i], VOIDmode))
+	op_is_t_count++;
+
+      else
+	{
+	  set_of_reg op_set = sh_find_set_of_reg (ops[i], insn,
+						  prev_nonnote_insn_bb);
+	  if (op_set.set_src == NULL_RTX)
+	    continue;
+
+	  if (t_reg_operand (op_set.set_src, VOIDmode)
+	      || negt_reg_operand (op_set.set_src, VOIDmode)
+	      || sh_is_logical_t_store_expr (op_set.set_src, op_set.insn))
+	    op_is_t_count++;
+	}
+    }
+  
+  return op_is_t_count == 2;
+}
+
+/* Given the operand that is extended in a sign/zero extend insn, and the
+   insn, try to figure out whether the sign/zero extension can be replaced
+   by a simple reg-reg copy.  If so, the replacement reg rtx is returned,
+   NULL_RTX otherwise.  */
+rtx
+sh_try_omit_signzero_extend (rtx ext
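
The backward search in sh_find_set_of_reg and the recursion in sh_is_logical_t_store_expr can be modeled on a toy instruction list. Everything here is hypothetical (`Kind`, `Op`, `Src`, `Insn`, `find_set_of_reg`, `is_logical_t_store`), "sets REG" is simplified to a single destination register, and an operand of a logical op is either a register number or one of the markers `T_BIT` and `NEG_T_BIT`.

```cpp
#include <initializer_list>
#include <vector>

// Toy insn kinds; only Kind::Insn can set a register in this model.
enum class Kind { Label, Barrier, Call, Insn };
// Toy set sources: the T bit, its negation, a logical op, or anything else.
enum class Op { TReg, NegTReg, Logical, Other };

// Operand markers for Op::Logical (register numbers are >= 0).
constexpr int T_BIT = -1;
constexpr int NEG_T_BIT = -2;

struct Src {
  Op op;
  int a, b;  // operands of an Op::Logical source
};

struct Insn {
  Kind kind;
  int dest;  // register set by a Kind::Insn, -1 if none
  Src src;
};

// Like sh_find_set_of_reg with prev_nonnote_insn_bb as the step function:
// scan backwards from START (exclusive), give up at a label or barrier,
// skip anything that is not a plain insn, and return the index of the
// insn that sets REG, or -1.
int find_set_of_reg(const std::vector<Insn> &insns, int start, int reg) {
  for (int i = start - 1; i >= 0; --i) {
    const Insn &in = insns[i];
    if (in.kind == Kind::Label || in.kind == Kind::Barrier)
      return -1;
    if (in.kind != Kind::Insn)
      continue;
    if (in.dest == reg)
      return i;
  }
  return -1;
}

// Like sh_is_logical_t_store_expr: SRC (found at index AT) qualifies if it
// is a logical op and both operands are the T bit, its negation, or a
// register whose reaching set is itself such a value.
bool is_logical_t_store(const std::vector<Insn> &insns, int at,
                        const Src &src) {
  if (src.op != Op::Logical)
    return false;
  int t_count = 0;
  for (int operand : {src.a, src.b}) {
    if (operand == T_BIT || operand == NEG_T_BIT) {
      ++t_count;
      continue;
    }
    int def = find_set_of_reg(insns, at, operand);
    if (def < 0)
      continue;
    const Src &s = insns[def].src;
    if (s.op == Op::TReg || s.op == Op::NegTReg
        || is_logical_t_store(insns, def, s))
      ++t_count;
  }
  return t_count == 2;
}
```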

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Steven Bosscher

On Mon, Oct 15, 2012 at 10:19 AM, Paolo Bonzini wrote:
> I prefer to declare the notes invalid and drop the notes.

Then, afaic, our only option is to drop them all in web, as per attached patch.

I strongly disagree with this approach though. It destroys information
that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can
update, and that helps with optimization.

This whole discussion about notes being dead has gone in completely
the wrong direction. With renaming these notes are valid, and do not
refer to dead regs. Perhaps you could be convinced if you look at
Honza's test case with the patch of r192413 reverted.
The test case still fails with --param max-unroll-times=3, that makes
visualizing the problem easier.

Ciao!
Steven


web_destroy_eq_notes.diff

Re: [PATCH] Fix up vector CONSTRUCTOR verification ICE (PR tree-optimization/54889)

2012-10-15 Richard Biener

On Fri, Oct 12, 2012 at 8:16 PM, Jakub Jelinek  wrote:
> Hi!
>
> Apparently vectorizable_load is another spot that could create vector
> CONSTRUCTORs that wouldn't pass the new CONSTRUCTOR verification.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

You should only need this on the ARRAY_REF path (I wonder what are
the types that have a mismatch?), for MEM_REF simply use
build2 (MEM_REF, TREE_TYPE (vectype), ...).

Ok with that change.

Thanks,
Richard.

> 2012-10-11  Jakub Jelinek  
>
> PR tree-optimization/54889
> * tree-vect-stmts.c (vectorizable_load): Add VIEW_CONVERT_EXPR if
> newref doesn't have compatible type with vectype element type.
>
> * gfortran.dg/pr54889.f90: New test.
>
> --- gcc/tree-vect-stmts.c.jj	2012-10-03 09:01:36.0 +0200
> +++ gcc/tree-vect-stmts.c   2012-10-11 10:38:38.920249396 +0200
> @@ -4752,6 +4752,11 @@ vectorizable_load (gimple stmt, gimple_s
>  running_off,
>  TREE_OPERAND (ref, 1));
>
> + if (!useless_type_conversion_p (TREE_TYPE (vectype),
> + TREE_TYPE (newref)))
> +   newref = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype),
> +newref);
> +
>   newref = force_gimple_operand_gsi (gsi, newref, true,
>  NULL_TREE, true,
>  GSI_SAME_STMT);
> --- gcc/testsuite/gfortran.dg/pr54889.f90.jj	2012-10-11 10:58:11.982284176 +0200
> +++ gcc/testsuite/gfortran.dg/pr54889.f90	2012-10-11 10:59:14.283920937 +0200
> @@ -0,0 +1,10 @@
> +! PR tree-optimization/54889
> +! { dg-do compile }
> +! { dg-options "-O3" }
> +! { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } }
> +
> +subroutine foo(x,y,z)
> +  logical, pointer :: x(:,:)
> +  integer :: y, z
> +  x=x(1:y,1:z)
> +end subroutine
>
> Jakub

[SH] PR 54925 - Add test case

2012-10-15 Oleg Endo

Hello,

This adds the test case from the PR.
Tested together with the patch posted here
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01380.html

OK?

Cheers,
Oleg

testsuite/ChangeLog:

PR target/54925
* gcc.c-torture/compile/pr54925.c: New.
Index: gcc/testsuite/gcc.c-torture/compile/pr54925.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr54925.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr54925.c	(revision 0)
@@ -0,0 +1,24 @@
+/* PR target/54925  */
+extern int bar;
+static unsigned char *
+nr_memcpy (unsigned char *, unsigned char *, unsigned short);
+
+void 
+baz (char *buf, unsigned short len)
+{
+  unsigned char data[10];
+  if (len == 0)
+    return;
+  nr_memcpy (data, (unsigned char *) buf, len);
+  foo (&bar);
+}
+
+static unsigned char *
+nr_memcpy (unsigned char * to, unsigned char * from, unsigned short len)
+{
+  while (len > 0)
+    {
+      len--;
+      *to++ = *from++;
+    }
+}

Re: Move statements upwards after reassociation

2012-10-15 Richard Biener

On Fri, Oct 12, 2012 at 8:18 PM, Easwaran Raman  wrote:
> On Fri, Oct 12, 2012 at 1:45 AM, Richard Biener
>  wrote:
>> On Fri, Oct 12, 2012 at 3:09 AM, Easwaran Raman  wrote:
>>> Thanks for the comments. As David wrote, the intent of the patch is
>>> not to do a general purpose scheduling, but to compensate for the
>>> possible live range lengthening introduced by reassociation.
>>>
>>>
>>> On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener
>>>  wrote:
 On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman  wrote:
>
> +/* Move STMT up within its BB until it can not be moved any further.  */
> +
> +static void move_stmt_upwards (gimple stmt)
> +{
> +  gimple_stmt_iterator gsi, gsistmt;
> +  tree rhs1, rhs2;
> +  gimple rhs1def = NULL, rhs2def = NULL;
> +  rhs1 = gimple_assign_rhs1 (stmt);
> +  rhs2 = gimple_assign_rhs2 (stmt);
> +  gcc_assert (rhs1);

 Please no such senseless asserts.  The following line will segfault anyway
 if rhs1 is NULL and with a debugger an assert doesn't add any useful
 information.
>>> Ok.

> +  if (TREE_CODE (rhs1) == SSA_NAME)
> +rhs1def = SSA_NAME_DEF_STMT (rhs1);
> +  else if (TREE_CODE (rhs1) != REAL_CST
> +   && TREE_CODE (rhs1) != INTEGER_CST)
> +return;
> +  if (rhs2)

 You may not access gimple_assign_rhs2 if it is not there.  So you have
 to check whether you have an unary, binary or ternary (yes) operation.
>>>
>>> gimple_assign_rhs2 returns NULL_TREE if the RHS of an assignment
>>> has less than two operands.  Regarding the check for ternary
>>> operation, I believe it is not necessary. A statement is considered
>>> for reassociation only if the RHS class is GIMPLE_BINARY_RHS.
>>> Subsequently, for rhs1 and rhs2, it checks if the def statements also
>>> have the same code and so it seems to me that a statement with a
>>> ternary operator in the RHS will never be considered in
>>> rewrite_expr_tree.
>>>
>>>

> +{
> +  if (TREE_CODE (rhs2) == SSA_NAME)
> +rhs2def = SSA_NAME_DEF_STMT (rhs2);
> +  else if (TREE_CODE (rhs1) != REAL_CST
> +   && TREE_CODE (rhs1) != INTEGER_CST)
> +return;
> +}
> +  gsi = gsi_for_stmt (stmt);
> +  gsistmt = gsi;
> +  gsi_prev (&gsi);
> +  for (; !gsi_end_p (gsi); gsi_prev (&gsi))

 

 This doesn't make much sense.  You can move a stmt to the nearest
 common post-dominator.  Assuming you only want to handle placing
 it after rhs1def or after rhs2def(?) you don't need any loop, just
 two dominator queries and an insertion after one of the definition
 stmts.
>>>
>>> Within a BB isn't that still O(size of BB)?
>>
>> Please document the fact that both stmts are in the same BB.
> Ok.
>> And no, it isn't, it is O (size of BB ^ 2).  You don't need a loop.
>> operand rank should reflect the dominance relation inside the BB.
>
> The rank of phi definitions would mess this up.
>
>> If that doesn't work simply assign UIDs to the stmts first.
> Ok.
>
>>
 But this code should consider BBs.
>>> For reassociation to look across BBs, the code should look something like 
>>> this:
>>>
>>> L1 :
>>>a_2 = a_1 + 10
>>>jc L3
>>> L2:
>>>   a_3 = a_2 + 20
>>>
>>> - L1 should dominate L2 (otherwise there will be a phi node at L2 and
>>> the reassociation of a_3 will not consider the definition of a_2)
>>> - There are no other uses of a_2 other than the one in L3.
>>>
>>> After reassociation, the stmt defining a_2 would be moved to L2.  In
>>> that case, the downward code motion of a_2 = a_1 + 10 to L2 is
>>> beneficial (one less instruction if the branch is taken). It is not
>>> obvious to me that moving it to L1 (or wherever a_1 is defined) is
>>> beneficial.
>>
>> In this case it doesn't matter whether a1 lives through a3 or if a2 does.
>> But moving the stmt is not necessary, so why not simply avoid it.
> I used that example to show that the current downward motion has a
> useful side effect and this patch preserves it. Yes, in this example
> the downward motion can be avoided but in general it may not be
> possible. I agree with you that there is unnecessary code motion in
> many cases.
>
>> You cannot undo it with your patch anyway.
>>
  And I don't see why more optimal
 placement cannot be done during rewrite_expr_tree itself.
>>>
>>> I started with that idea, but my current approach looks simpler.
>>
>> Simpler, but it's a hack.
>>
>> So, the only place we "move" stmts in rewrite_expr_tree is here:
>>
>>   if (!moved)
>> {
>>   gimple_stmt_iterator gsinow, gsirhs1;
>>   gimple stmt1 = stmt, stmt2;
>>   unsigned int count;
>>
>>   gsinow = gsi_for_stmt (stmt);
>>   count = VEC_length (operand_entry_t, ops) - opindex - 2;
>>   while (count-- != 0)
>> {
>>   stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt1));
>>
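A minimal sketch of the "assign UIDs to the stmts first" suggestion, assuming both definitions live in the same basic block. The statements are numbered once in a single walk, after which picking the later of two definitions is a constant-time comparison rather than a backwards scan. The struct and function names are hypothetical. Real GIMPLE statements do carry a uid (gimple_uid / gimple_set_uid), but this is a simplified stand-in, not the reassoc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified statement record (not GCC's gimple).  */
struct stmt
{
  unsigned uid;          /* position within its basic block */
  struct stmt *next;     /* next statement in the same block, or NULL */
};

/* Number the statements of one basic block in order: a single linear
   walk, done once before any placement decisions.  */
static void
assign_uids (struct stmt *first)
{
  unsigned uid = 0;
  for (struct stmt *s = first; s != NULL; s = s->next)
    s->uid = uid++;
}

/* Of two definitions in the same block, return the later one, after
   which a statement using both can be inserted.  Constant time once
   UIDs are assigned; no walk over the block.  */
static struct stmt *
later_def (struct stmt *a, struct stmt *b)
{
  assert (a != NULL && b != NULL);
  return a->uid > b->uid ? a : b;
}
```

The numbering costs one pass per block, so the total work stays linear in the block size instead of growing with the number of statements being placed.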

Re: PR54915 (ssa-forwprop, vec_perm_expr)

2012-10-15 Thread Richard Biener

On Sat, Oct 13, 2012 at 11:25 AM, Marc Glisse  wrote:
> On Fri, 12 Oct 2012, Marc Glisse wrote:
>
>> Hello,
>>
>> apparently, in the optimization that recognizes that {v[1],v[0]} is a
>> VEC_PERM_EXPR, I forgot to check that v is a 2-element vector... (not that
>> there aren't things that could be done if v has a different size, just not
>> directly a VEC_PERM_EXPR, and not right now, priority is to fix the bug)
>>
>> Checking that v has the same type as the result seemed like the easiest
>> way, but there are many variations that could be slightly better or worse.
>>
>> bootstrap+testsuite ok.
>>
>> 2012-10-02  Marc Glisse  
>>
>> PR tree-optimization/54915
>>
>> gcc/
>> * tree-ssa-forwprop.c (simplify_vector_constructor): Check
>> argument's type.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/pr54915.c: New testcase.
>
>
> This new version, with a slightly relaxed test, seems preferable and also
> passes testing.

Ok.

Thanks,
Richard.

> --
> Marc Glisse
> Index: testsuite/gcc.dg/tree-ssa/pr54915.c
> ===
> --- testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/pr54915.c (revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef double v2df __attribute__ ((__vector_size__ (16)));
> +typedef double v4df __attribute__ ((__vector_size__ (32)));
> +
> +void f (v2df *ret, v4df* xp)
> +{
> +  v4df x = *xp;
> +  v2df xx = { x[2], x[3] };
> +  *ret = xx;
> +}
>
> Property changes on: testsuite/gcc.dg/tree-ssa/pr54915.c
> ___
> Added: svn:eol-style
>+ native
> Added: svn:keywords
>+ Author Date Id Revision URL
>
> Index: tree-ssa-forwprop.c
> ===
> --- tree-ssa-forwprop.c (revision 192420)
> +++ tree-ssa-forwprop.c (working copy)
> @@ -2833,20 +2833,22 @@ simplify_vector_constructor (gimple_stmt
>ref = TREE_OPERAND (op1, 0);
>if (orig)
> {
>   if (ref != orig)
> return false;
> }
>else
> {
>   if (TREE_CODE (ref) != SSA_NAME)
> return false;
> + if (!useless_type_conversion_p (type, TREE_TYPE (ref)))
> +   return false;
>   orig = ref;
> }
>if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size)
> return false;
>sel[i] = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size;
>if (sel[i] != i) maybe_ident = false;
>  }
>if (i < nelts)
>  return false;
>
>
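As a sketch of the pattern under discussion (GCC vector extensions, so GCC-specific): the constructor { v[1], v[0] } computes the same value as an explicit two-element shuffle only because v is itself a two-element vector of the result type. In the new testcase, { x[2], x[3] } reads from a four-element v4df, so it is not a same-type permutation of x, which is the situation the added useless_type_conversion_p check rejects. The function names below are illustrative only.

```c
#include <assert.h>

/* GCC vector-extension types, as in the testcase.  */
typedef double v2df __attribute__ ((__vector_size__ (16)));
typedef long long v2di __attribute__ ((__vector_size__ (16)));

/* Element-wise constructor: the form forwprop may turn into a
   VEC_PERM_EXPR when V has the same type as the result.  */
static v2df
swap_by_constructor (v2df v)
{
  v2df r = { v[1], v[0] };
  return r;
}

/* The same permutation written explicitly.  The mask must have as many
   elements as V, each an integer of the same size as V's elements.  */
static v2df
swap_by_shuffle (v2df v)
{
  v2di mask = { 1, 0 };
  return __builtin_shuffle (v, mask);
}

/* Check that both spellings agree for a given input.  */
static void
check_swap (v2df v)
{
  v2df a = swap_by_constructor (v);
  v2df b = swap_by_shuffle (v);
  assert (a[0] == b[0] && a[1] == b[1]);
}
```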

Re: [patch][wwwdocs] gcc 4.8 changes - mention scalability improvements

2012-10-15 Thread Richard Biener

On Sun, Oct 14, 2012 at 12:47 AM, Steven Bosscher  wrote:
> Hello,
>
> This patch adds a short notice about some speed-ups in GCC 4.8 for
> extremely large functions (coming from the work done on PR54146 by
> several people).
> OK for the wwwdocs?

Ok.

Thanks,
Richard.

> Ciao!
> Steven
>
>
>
> Index: htdocs/gcc-4.8/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
> retrieving revision 1.44
> diff -u -r1.44 changes.html
> --- htdocs/gcc-4.8/changes.html 9 Oct 2012 18:44:55 -   1.44
> +++ htdocs/gcc-4.8/changes.html 13 Oct 2012 22:45:59 -
> @@ -65,10 +65,17 @@
>level, and it makes PRE more aggressive.
>  
>  The struct reorg and matrix reorg optimizations (command-line
> -options -fipa-struct-reorg and
> --fipa-matrix-reorg) have been removed.  They did not
> -work correctly nor with link-time optimization (LTO), hence were only
> -applicable to programs consisting of a single translation unit.
> + options -fipa-struct-reorg and
> + -fipa-matrix-reorg) have been removed.  They did not
> + work correctly nor with link-time optimization (LTO), hence were only
> + applicable to programs consisting of a single translation unit.
> +
> +Several scalability bottle-necks have been removed from GCC's
> + optimization passes.  Compilation of extremely large functions,
> + e.g. due to the use of the flatten attribute in the
> + "Eigen" C++ linear algebra templates library, is significantly
> + faster than previous releases of GCC.
> +
>

Re: [patch] Back-port ifcvt.c changes from PR54146

2012-10-15 Thread Richard Biener

On Sun, Oct 14, 2012 at 10:05 PM, Steven Bosscher  wrote:
> Hello,
>
> This patch is a back-port of one of the scalability improvements I
> made to perform, well, maybe not well but at least not so poorly on
> the test case of PR54146, which has an extremely large function.
>
> The problem in ifcvt.c has two parts. The first is that clearing
> several arrays of size(max_reg_num) for every basic block slowed down
> things. The second part is that this memory was being allocated with
> alloca, so that a sufficiently large function could blow out the
> stack.
>
> The latter problem was now also found by a user trying to compile a
> sensible and well-known piece of software (see
> http://gcc.gnu.org/ml/gcc/2012-10/msg00202.html). This code compiles
> with older GCC releases, so this problem is a regression. To fix the
> problem in GCC 4.7, I'd like to propose this back-port.
>
> Bootstrapped&tested with release and default development checking on
> x86_64-unknown-linux-gnu and on powerpc64-unknown-linux-gnu. The patch
> has also already spent more than two months on the trunk now without
> problems. OK for the GCC 4.7 release branch? Maybe also for the GCC
> 4.6 branch after testing?

Ok for 4.7, I prefer to not backport this to 4.6 at this point.

Thanks,
Richard.

> Ciao!
> Steven
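The back-ported change itself is not quoted in this message. As an illustration of the two problems described above (clearing max_reg_num-sized arrays for every basic block, and allocating them with alloca), here is a sketch of one common remedy, assuming a table indexed by register number that is reused for every block. It is allocated once on the heap, and only the entries written since the last reset are cleared. All names are hypothetical; this is not the ifcvt.c code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-register table reused across basic blocks.  */
struct reg_table
{
  int *value;          /* per-register data, 0 means "unset" */
  unsigned *touched;   /* registers written since the last reset */
  unsigned n_touched;
  unsigned n_regs;
};

/* Allocate once, on the heap, so a huge register count cannot blow
   the stack.  Returns nonzero on success.  */
static int
reg_table_init (struct reg_table *t, unsigned n_regs)
{
  t->value = calloc (n_regs, sizeof *t->value);
  t->touched = malloc (n_regs * sizeof *t->touched);
  t->n_touched = 0;
  t->n_regs = n_regs;
  return t->value != NULL && t->touched != NULL;
}

/* Record a nonzero value; remember the register on its first write so
   each register appears at most once in TOUCHED.  */
static void
reg_table_set (struct reg_table *t, unsigned regno, int v)
{
  assert (regno < t->n_regs && v != 0);
  if (t->value[regno] == 0)
    t->touched[t->n_touched++] = regno;
  t->value[regno] = v;
}

/* Reset between blocks: cost is O(n_touched), not O(n_regs).  */
static void
reg_table_reset (struct reg_table *t)
{
  for (unsigned i = 0; i < t->n_touched; i++)
    t->value[t->touched[i]] = 0;
  t->n_touched = 0;
}

static void
reg_table_free (struct reg_table *t)
{
  free (t->value);
  free (t->touched);
}
```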

Re: [patch] Fix PR rtl-optimization/54870

2012-10-15 Thread Richard Biener

On Sun, Oct 14, 2012 at 10:47 PM, Eric Botcazou  wrote:
> Hi,
>
> This is the execution failure of gfortran.dg/array_constructor_4.f90 in 64-bit
> mode on SPARC/Solaris at -O3.  The dse2 dump for the reduced testcase reads:
>
> dse: local deletions = 0, global deletions = 1, spill deletions = 0
> starting the processing of deferred insns
> deleting insn with uid = 25.
> ending the processing of deferred insns
>
> but the memory location stored to:
>
> (insn 25 27 154 2 (set (mem/c:SI (plus:DI (reg/f:DI 30 %fp)
> (const_int 2039 [0x7f7])) [6 A.1+16 S4 A64])
> (reg:SI 1 %g1 [136])) array_constructor_4.f90:4 61 {*movsi_insn}
>  (nil))
>
> is read by a subsequent call to memcpy.
>
> It turns out that this memcpy call is generated for an aggregate assignment:
>
>   MEM[(c_char * {ref-all})&i] = MEM[(c_char * {ref-all})&A.17];
>
> Note the A.1 in the store and the A.17 in the load. A.1 and A.17 are aggregate
> variables sharing the same stack slot.  A.17 is correctly marked as addressable
> because of the call to memcpy, but A.1 isn't since its address isn't taken,
> and DSE can optimize away (since 4.7) stores if their MEM_EXPR doesn't escape.
>
> The store is reaching the load because an intermediate store into A.17:
>
> (insn 78 76 82 6 (set (mem/c:SI (plus:DI (reg/f:DI 30 %fp)
> (const_int 2039 [0x7f7])) [6 A.17+16 S4 A64])
> (reg:SI 1 %g1 [136])) array_constructor_4.f90:14 61 {*movsi_insn}
>  (nil))
>
> has been deleted by postreload as no-op (because redundant), thus making A.1
> partially escape without marking it as addressable.
>
> The attached patch uses cfun->gimple_df->escaped.vars to plug the hole: when
> mark_addressable is called during RTL expansion and the decl is partitioned,
> all the variables in the partition are added to the bitmap.  Then can_escape
> is changed to additionally test cfun->gimple_df->escaped.vars.
>
> Tested on x86-64/Linux and SPARC64/Solaris, OK for mainline and 4.7 branch?

Hmm.  I think this points to an issue with update_alias_info_with_stack_vars
instead.  That is, this function should have already cared for handling this
case where two decls have their stack slot shared.  What it seems to get
confused about is addressability, or rather can_escape is not using the
RTL alias export properly.  Instead of

static bool
can_escape (tree expr)
{
  tree base;
  if (!expr)
    return true;
  base = get_base_address (expr);
  if (DECL_P (base)
      && !may_be_aliased (base))
    return false;
  return true;
}

it needs to check decls_to_pointers[base] and then check
if any of the pointed-to decls may be aliased.

Now, that's not that easy because we don't have a
mapping from DECL UID to DECL (and the decl
isn't in the escaped solution if it is just used by
memcpy), but we could compute a bitmap of
all address-taken decls in update_alias_info_with_stack_vars
or simply treat all bases with decls_to_pointers[base] != NULL
as possibly having their address taken.

Richard.

>
> 2012-10-14  Eric Botcazou  
>
> PR rtl-optimization/54870
> * dse.c (can_escape): Test cfun->gimple_df->escaped.vars as well.
> * gimplify.c (mark_addressable): If this is a partition decl, add
> all the variables in the partition to cfun->gimple_df->escaped.vars.
>
>
> --
> Eric Botcazou
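A sketch of the simpler alternative Richard mentions (treat every base that shares a stack partition as possibly address-taken), using a hypothetical, much simplified model. Each variable records its addressability and, if its slot is shared, a pointer to the partition representative. That pointer stands in for the decls_to_pointers lookup; this is not dse.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a local variable after stack-slot sharing.  */
struct decl
{
  bool addressable;               /* analogous to TREE_ADDRESSABLE */
  const struct decl *partition;   /* representative if the slot is shared */
};

/* Per-decl test, mirroring the quoted can_escape: it goes wrong once a
   slot is shared by an addressable and a non-addressable variable.  */
static bool
can_escape_per_decl (const struct decl *base)
{
  assert (base != NULL);
  return base->addressable;
}

/* Conservative variant: any variable whose slot is shared is assumed
   to have its address taken.  */
static bool
can_escape_shared_slot (const struct decl *base)
{
  assert (base != NULL);
  return base->addressable || base->partition != NULL;
}
```

With A.1 and A.17 in one slot, the per-decl test lets DSE delete the store to A.1 even though memcpy later reads the slot through A.17; the conservative test keeps it, at the price of treating every shared slot as escaping.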

Re: [PATCH] Fix gcov handling directories with periods

2012-10-15 Thread Pedro Alves

On 10/15/2012 05:00 AM, Ian Lance Taylor wrote:
> On Sat, Oct 13, 2012 at 1:11 PM, Andreas Schwab  wrote:
>> Ian Lance Taylor  writes:
>>
>>> Suppose you drop this into include/libiberty.h:
>>>
>>> #ifdef __cplusplus
>>> inline char *lbasename(char *s) { return const_cast<char *>(lbasename (s)); }
>>> #endif
>>
>> That doesn't work:
>>
>> ../../gcc/libcpp/../include/libiberty.h: In function ‘char* 
>> lbasename(char*)’:
>> ../../gcc/libcpp/../include/libiberty.h:123:31: error: declaration of C 
>> function ‘char* lbasename(char*)’ conflicts with
>> ../../gcc/libcpp/../include/libiberty.h:121:20: error: previous declaration 
>> ‘const char* lbasename(const char*)’ here
> 
> Hmmm, of course.

Wrapping with extern "C++" makes it work:

#ifdef __cplusplus
extern "C++"
{
inline char *lbasename(char *s) { return const_cast<char *>(lbasename (s)); }
}
#endif

> 
> OK, your patch with CONST_CAST is OK.
> 
> Thanks.
-- 
Pedro Alves
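For context on why a non-const lbasename result is wanted: callers typically locate the base name inside a writable buffer and then modify it. The sketch below shows the pitfall named in the subject line under an assumed simplification ('/' as the only directory separator, whereas libiberty's lbasename also handles DOS separators). Searching for the extension with strrchr over the whole path can match a period in a directory name, so the search has to start at the final path component. find_extension is a hypothetical helper, not the gcov patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the extension of the file
   name in PATH (the last '.' of the final path component), or NULL if
   the file name has none.  The result points into PATH, so a caller
   with a writable buffer may modify it in place.  */
static char *
find_extension (char *path)
{
  char *base;

  assert (path != NULL);
  base = strrchr (path, '/');            /* last directory separator */
  base = base != NULL ? base + 1 : path; /* start of the file name */
  return strrchr (base, '.');
}
```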

Re: [patch] Fix PR rtl-optimization/54870

2012-10-15 Thread Eric Botcazou

> Hmm.  I think this points to an issue with update_alias_info_with_stack_vars
> instead.  That is, this function should have already cared for handling
> this case where two decls have their stack slot shared.

The problem here is that mark_addressable is called _after_ the function is
run.  IOW, by the time update_alias_info_with_stack_vars is run, there are no 
aliased variables in the function.

> static bool
> can_escape (tree expr)
> {
>   tree base;
>   if (!expr)
> return true;
>   base = get_base_address (expr);
>   if (DECL_P (base)
>   && !may_be_aliased (base))
> return false;
>   return true;
> 
> it needs to check decls_to_pointers[base] and then check
> if any of the pointed-to decls may be aliased.

That's essentially what the patch does though (except that it does it more 
efficiently), since update_alias_info_with_stack_vars correctly computes
cfun->gimple_df->escaped.vars for partitioned decls.

> Now, that's not that easy because we don't have a
> mapping from DECL UID to DECL (and the decl
> isn't in the escaped solution if it is just used by
> memcpy), but we could compute a bitmap of
> all address-taken decls in update_alias_info_with_stack_vars
> or simply treat all check decls_to_pointers[base] != NULL
> bases as possibly having their address taken.

OK, we can populate another bitmap in update_alias_info_with_stack_vars and 
update it in mark_addressable by means of decls_to_pointers and pi->pt.vars.
That seems a bit redundant with cfun->gimple_df->escaped.vars, but why not.

-- 
Eric Botcazou

Re: [PATCH, AARCH64] Added predefines for AArch64 code models

2012-10-15 Thread Marcus Shawcroft


On 11/09/12 15:02, Chris Schlumberger-Socha wrote:

This patch adds predefines for AArch64 code models. These code models are
added as an effective target for the AArch64 platform.



I've committed this patch to aarch64-trunk.

/Marcus

Re: Constant-fold vector comparisons

2012-10-15 Thread Richard Biener

On Fri, Oct 12, 2012 at 4:07 PM, Marc Glisse  wrote:
> On Sat, 29 Sep 2012, Marc Glisse wrote:
>
>> 1) it handles constant folding of vector comparisons,
>>
>> 2) it fixes another place where vectors are not expected
>
>
> Here is a new version of this patch.
>
> In a first try, I got bitten by the operator priorities in "a && b?c:d",
> which g++ doesn't warn about.
>
>
> 2012-10-12  Marc Glisse  
>
> gcc/
> * tree-ssa-forwprop.c (forward_propagate_into_cond): Handle vectors.
>
> * fold-const.c (fold_relational_const): Handle VECTOR_CST.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/foldconst-6.c: New testcase.
>
> --
> Marc Glisse
>
> Index: gcc/tree-ssa-forwprop.c
> ===
> --- gcc/tree-ssa-forwprop.c (revision 192400)
> +++ gcc/tree-ssa-forwprop.c (working copy)
> @@ -570,40 +570,43 @@ forward_propagate_into_cond (gimple_stmt
>code = gimple_assign_rhs_code (def_stmt);
>if (TREE_CODE_CLASS (code) == tcc_comparison)
> tmp = fold_build2_loc (gimple_location (def_stmt),
>code,
>TREE_TYPE (cond),
>gimple_assign_rhs1 (def_stmt),
>gimple_assign_rhs2 (def_stmt));
>else if ((code == BIT_NOT_EXPR
> && TYPE_PRECISION (TREE_TYPE (cond)) == 1)
>|| (code == BIT_XOR_EXPR
> -  && integer_onep (gimple_assign_rhs2 (def_stmt))))
> +  && ((gimple_assign_rhs_code (stmt) == VEC_COND_EXPR)
> +  ? integer_all_onesp (gimple_assign_rhs2 (def_stmt))
> +  : integer_onep (gimple_assign_rhs2 (def_stmt)))))

I don't think that we can do anything for vectors here.  The non-vector
path assumes that the type is a boolean type (thus two-valued), but
for vectors we can have arbitrary integer value input.  Thus, as we
defined true to -1 and false to 0 we cannot, unless relaxing what
VEC_COND_EXPR treats as true or false, optimize any of ~ or ^ -1
away.

Which means I'd prefer if you simply condition the existing ~ and ^
handling on COND_EXPR.
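To make the point about true being -1 concrete, here is a small GCC-specific sketch using the same vector type as the new testcase. The comparison yields -1 in lanes where it holds and 0 elsewhere, and ~ acts as a logical negation only on lanes that are already 0 or -1. lane_is_canonical is a hypothetical helper for the illustration.

```c
#include <assert.h>

/* Same vector type as in foldconst-6.c (GCC vector extension).  */
typedef long vec __attribute__ ((vector_size (2 * sizeof (long))));

/* Lane-wise comparison: -1 (all bits set) where a < b, else 0.  */
static vec
less_than (vec a, vec b)
{
  return a < b;
}

/* A lane is a canonical truth value only if it is 0 or -1.  For such
   lanes ~x is the logical negation; for other values (~5 == -6) it is
   not, so ~ cannot be folded into swapping the VEC_COND_EXPR arms
   unless the input is known to be canonical.  */
static int
lane_is_canonical (long lane)
{
  return lane == 0 || lane == -1;
}
```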

> {
>   tmp = gimple_assign_rhs1 (def_stmt);
>   swap = true;
> }
>  }
>
>if (tmp
>&& is_gimple_condexpr (tmp))
>  {
>if (dump_file && tmp)
> {
>   fprintf (dump_file, "  Replaced '");
>   print_generic_expr (dump_file, cond, 0);
>   fprintf (dump_file, "' with '");
>   print_generic_expr (dump_file, tmp, 0);
>   fprintf (dump_file, "'\n");
> }
>
> -  if (integer_onep (tmp))
> +  if ((gimple_assign_rhs_code (stmt) == VEC_COND_EXPR)
> + ? integer_all_onesp (tmp) : integer_onep (tmp))

and cache gimple_assign_rhs_code as a 'code' variable at the beginning
of the function.

> gimple_assign_set_rhs_from_tree (gsi_p, gimple_assign_rhs2 (stmt));
>else if (integer_zerop (tmp))
> gimple_assign_set_rhs_from_tree (gsi_p, gimple_assign_rhs3 (stmt));
>else
> {
>   gimple_assign_set_rhs1 (stmt, unshare_expr (tmp));
>   if (swap)
> {
>   tree t = gimple_assign_rhs2 (stmt);
>   gimple_assign_set_rhs2 (stmt, gimple_assign_rhs3 (stmt));
> Index: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-ccp1" } */
> +
> +typedef long vec __attribute__ ((vector_size (2 * sizeof(long))));
> +
> +vec f ()
> +{
> +  vec a = { -2, 666 };
> +  vec b = { 3, 2 };
> +  return a < b;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "666" "ccp1"} } */
> +/* { dg-final { cleanup-tree-dump "ccp1" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c
> ___
> Added: svn:keywords
>+ Author Date Id Revision URL
> Added: svn:eol-style
>+ native
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 192400)
> +++ gcc/fold-const.c(working copy)
> @@ -16121,20 +16121,44 @@ fold_relational_const (enum tree_code co
>   TREE_IMAGPART (op0),
>   TREE_IMAGPART (op1));
>if (code == EQ_EXPR)
> return fold_build2 (TRUTH_ANDIF_EXPR, type, rcond, icond);
>else if (code == NE_EXPR)
> return fold_build2 (TRUTH_ORIF_EXPR, type, rcond, icond);
>else
> return NULL_TREE;
>  }
>
> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
> +{
> +  int count = VECTOR_CST_NELTS (op0);
> +  tree *elts =  XALLOC

Re: [patch] Fix PR rtl-optimization/54870

2012-10-15 Thread Richard Biener

On Mon, Oct 15, 2012 at 12:00 PM, Eric Botcazou  wrote:
>> Hmm.  I think this points to an issue with update_alias_info_with_stack_vars
>> instead.  That is, this function should have already cared for handling
>> this case where two decls have their stack slot shared.
>
> The problem here is that mark_addressable is called _after_ the function is
> run.  IOW, by the time update_alias_info_with_stack_vars is run, there are no
> aliased variables in the function.

Where is mark_addressable called?  It's wrong (and generally impossible) to
do that late.

>> static bool
>> can_escape (tree expr)
>> {
>>   tree base;
>>   if (!expr)
>> return true;
>>   base = get_base_address (expr);
>>   if (DECL_P (base)
>>   && !may_be_aliased (base))
>> return false;
>>   return true;
>>
>> it needs to check decls_to_pointers[base] and then check
>> if any of the pointed-to decls may be aliased.
>
> That's essentially what the patch does though (except that it does it more
> efficiently), since update_alias_info_with_stack_vars correctly computes
> cfun->gimple_df->escaped.vars for partitioned decls.

No, what it does is if a decl is in ESCAPED make sure to add decls that
share the same partition also to ESCAPED.  The issue is that can_escape
queries TREE_ADDRESSABLE (which is correct on the gimple level, only
things that have their address taken can escape) - that's no longer possible
as soon as we have partitions with both addressable and non-addressable
decls.

>> Now, that's not that easy because we don't have a
>> mapping from DECL UID to DECL (and the decl
>> isn't in the escaped solution if it is just used by
>> memcpy), but we could compute a bitmap of
>> all address-taken decls in update_alias_info_with_stack_vars
>> or simply treat all check decls_to_pointers[base] != NULL
>> bases as possibly having their address taken.
>
> OK, we can populate another bitmap in update_alias_info_with_stack_vars and
> update it in mark_addressable by means of decls_to_pointers and pi->pt.vars.
> That seems a bit redundant with cfun->gimple_df->escaped.vars, but why not.

If you only have memcpy then escaped will be empty.  fixing escaped is
not the right solution (it may work for some reason in this case though).
The rtl code has to approximate ref_maybe_used_by_call_p in a conservative
way which it doesn't seem to do correctly (I don't remember a RTL alias.c
interface that would match this, or ref_maybe_used_by_stmt_p - maybe
we should add one?)

Thanks,
Richard.

> --
> Eric Botcazou

[PATCH][LTO] Move more non-tree pieces to bitfields

2012-10-15 Thread Richard Biener


This moves more non-tree fields out of the tree streaming routines
into the bitfield parts.  One to go: strings.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-15  Richard Guenther  

* tree-streamer-out.c (streamer_pack_tree_bitfields): Pack
BINFO_BASE_ACCESSES and CONSTRUCTOR lengths here.
(streamer_write_chain): Write TREE_CHAIN as null-terminated list.
(write_ts_exp_tree_pointers): Adjust.
(write_ts_binfo_tree_pointers): Likewise.
(write_ts_constructor_tree_pointers): Likewise.
* tree-streamer-in.c (streamer_read_chain): Read TREE_CHAIN as
null-terminated list.
(unpack_value_fields): Unpack BINFO_BASE_ACCESSES and
CONSTRUCTOR lengths and materialize the arrays.
(lto_input_ts_exp_tree_pointers): Adjust.
(lto_input_ts_binfo_tree_pointers): Likewise.
(lto_input_ts_constructor_tree_pointers): Likewise.
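A toy model of the new chain encoding described in this ChangeLog, with list nodes standing in for trees and an array of pointers standing in for the output stream. The writer emits the elements followed by a NULL sentinel instead of a leading count. The reader, shaped like the patched streamer_read_chain, links elements until it reads the sentinel, and that final NULL read also terminates the last element's chain. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree with a TREE_CHAIN.  */
struct node
{
  int id;
  struct node *chain;
};

/* Writer: emit each element, then a NULL sentinel.  Returns the number
   of stream slots used (elements + 1).  */
static size_t
write_chain (struct node *t, struct node **stream, size_t cap)
{
  size_t n = 0;
  for (; t != NULL; t = t->chain)
    {
      assert (n < cap);
      stream[n++] = t;
    }
  assert (n < cap);
  stream[n++] = NULL;   /* sentinel terminates the chain */
  return n;
}

/* Reader: link elements until the sentinel.  Storing the final NULL
   into prev->chain terminates the last element.  */
static struct node *
read_chain (struct node *const *stream, size_t *pos)
{
  struct node *first = NULL, *prev = NULL, *curr;
  do
    {
      curr = stream[(*pos)++];
      if (prev)
        prev->chain = curr;
      else
        first = curr;
      prev = curr;
    }
  while (curr);
  return first;
}
```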

Index: gcc/tree-streamer-out.c
===
*** gcc/tree-streamer-out.c (revision 192398)
--- gcc/tree-streamer-out.c (working copy)
*** streamer_pack_tree_bitfields (struct out
*** 409,414 
--- 409,420 
  
if (CODE_CONTAINS_STRUCT (code, TS_OPTIMIZATION))
  pack_ts_optimization (bp, expr);
+ 
+   if (CODE_CONTAINS_STRUCT (code, TS_BINFO))
+ bp_pack_var_len_unsigned (bp, VEC_length (tree, BINFO_BASE_ACCESSES (expr)));
+ 
+   if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
+ bp_pack_var_len_unsigned (bp, CONSTRUCTOR_NELTS (expr));
  }
  
  
*** streamer_write_builtin (struct output_bl
*** 454,464 
  void
  streamer_write_chain (struct output_block *ob, tree t, bool ref_p)
  {
!   int i, count;
! 
!   count = list_length (t);
!   streamer_write_hwi (ob, count);
!   for (i = 0; i < count; i++)
  {
tree saved_chain;
  
--- 460,466 
  void
  streamer_write_chain (struct output_block *ob, tree t, bool ref_p)
  {
!   while (t)
  {
tree saved_chain;
  
*** streamer_write_chain (struct output_bloc
*** 480,485 
--- 482,490 
TREE_CHAIN (t) = saved_chain;
t = TREE_CHAIN (t);
  }
+ 
+   /* Write a sentinel to terminate the chain.  */
+   stream_write_tree (ob, NULL_TREE, ref_p);
  }
  
  
*** write_ts_exp_tree_pointers (struct outpu
*** 725,731 
  {
int i;
  
-   streamer_write_hwi (ob, TREE_OPERAND_LENGTH (expr));
for (i = 0; i < TREE_OPERAND_LENGTH (expr); i++)
  stream_write_tree (ob, TREE_OPERAND (expr, i), ref_p);
stream_write_tree (ob, TREE_BLOCK (expr), ref_p);
--- 730,735 
*** write_ts_binfo_tree_pointers (struct out
*** 786,792 
stream_write_tree (ob, BINFO_VTABLE (expr), ref_p);
stream_write_tree (ob, BINFO_VPTR_FIELD (expr), ref_p);
  
!   streamer_write_uhwi (ob, VEC_length (tree, BINFO_BASE_ACCESSES (expr)));
FOR_EACH_VEC_ELT (tree, BINFO_BASE_ACCESSES (expr), i, t)
  stream_write_tree (ob, t, ref_p);
  
--- 790,797 
stream_write_tree (ob, BINFO_VTABLE (expr), ref_p);
stream_write_tree (ob, BINFO_VPTR_FIELD (expr), ref_p);
  
!   /* The number of BINFO_BASE_ACCESSES has already been emitted in
!  EXPR's bitfield section.  */
FOR_EACH_VEC_ELT (tree, BINFO_BASE_ACCESSES (expr), i, t)
  stream_write_tree (ob, t, ref_p);
  
*** write_ts_constructor_tree_pointers (stru
*** 807,813 
unsigned i;
tree index, value;
  
-   streamer_write_uhwi (ob, CONSTRUCTOR_NELTS (expr));
FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (expr), i, index, value)
  {
stream_write_tree (ob, index, ref_p);
--- 812,817 
Index: gcc/tree-streamer-in.c
===
*** gcc/tree-streamer-in.c  (revision 192398)
--- gcc/tree-streamer-in.c  (working copy)
*** input_identifier (struct data_in *data_i
*** 68,79 
  tree
  streamer_read_chain (struct lto_input_block *ib, struct data_in *data_in)
  {
-   int i, count;
tree first, prev, curr;
  
first = prev = NULL_TREE;
!   count = streamer_read_hwi (ib);
!   for (i = 0; i < count; i++)
  {
curr = stream_read_tree (ib, data_in);
if (prev)
--- 68,78 
  tree
  streamer_read_chain (struct lto_input_block *ib, struct data_in *data_in)
  {
tree first, prev, curr;
  
+   /* The chain is written as NULL terminated list of trees.  */
first = prev = NULL_TREE;
!   do
  {
curr = stream_read_tree (ib, data_in);
if (prev)
*** streamer_read_chain (struct lto_input_bl
*** 81,89 
else
first = curr;
  
-   TREE_CHAIN (curr) = NULL_TREE;
prev = curr;
  }
  
return first;
  }
--- 80,88 
else
first = curr;
  
prev = curr;
  }
+   while (curr);
  
return first;
  }
*** unpack_value_fields (struct data_in *dat
*** 452,457 
--- 4

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Thread Steven Bosscher

On Mon, Oct 15, 2012 at 10:37 AM, Steven Bosscher  wrote:
> On Mon, Oct 15, 2012 at 10:19 AM, Paolo Bonzini wrote:
>> I prefer to declare the notes invalid and drop the notes.
>
> I strongly disagree with this approach though. It destroys information
> that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can
> update, and that helps with optimization.

PR54916 is a case where dropping the notes would cause a missed
optimization in cse-after-loop. With my patch to update the notes, the
optimization (CSE of a load-immediate) is retained.

Ciao!
Steven

Re: [patch] Fix PR rtl-optimization/54870

2012-10-15 Thread Eric Botcazou

> Where is mark_addressable called?  It's wrong (and generally impossible) to
> do that late.

In expr.c:emit_block_move_hints.  It's one of the calls added to support the 
enhanced DSE last year, there are others in calls.c for example.

> If you only have memcpy then escaped will be empty.  fixing escaped is
> not the right solution (it may work for some reason in this case though).
> The rtl code has to approximate ref_maybe_used_by_call_p in a conservative
> way which it doesn't seem to do correctly (I don't remember a RTL alias.c
> interface that would match this, or ref_maybe_used_by_stmt_p - maybe
> we should add one?)

I'm OK with the new bitmap + decls_to_pointers idea.  Keep in mind that the 
info needs to be updated after update_alias_info_with_stack_vars, because for

MEM[(c_char * {ref-all})&i] = MEM[(c_char * {ref-all})&A.17];

you don't know until expand whether this will be a memcpy or a move by pieces and 
the info is needed for the enhanced DSE to work properly.

-- 
Eric Botcazou

Fix double write of unchecked conversion to volatile variable

2012-10-15 Thread Eric Botcazou

For the attached Ada testcase, the compiler generates a double write to the 
volatile variable at -O1.  The problem is a VIEW_CONVERT_EXPR on the rhs.

We have in store_expr:

 If TEMP and TARGET compare equal according to rtx_equal_p, but
 one or both of them are volatile memory refs, we have to distinguish
 two cases:
 - expand_expr has used TARGET.  In this case, we must not generate
   another copy.  This can be detected by TARGET being equal according
   to == .
 - expand_expr has not used TARGET - that means that the source just
   happens to have the same RTX form.  Since temp will have been created
   by expand_expr, it will compare unequal according to == .
   We must generate a copy in this case, to reach the correct number
   of volatile memory references.  */

So store_expr expects that, if expand_expr returns TEMP != TARGET, then TARGET 
hasn't been used and a copy is needed.

Now in the VIEW_CONVERT_EXPR case of expand_expr, we have:

  /* At this point, OP0 is in the correct mode.  If the output type is
 such that the operand is known to be aligned, indicate that it is.
 Otherwise, we need only be concerned about alignment for non-BLKmode
 results.  */
  if (MEM_P (op0))
{
  enum insn_code icode;

  op0 = copy_rtx (op0);

i.e. op0 is blindly copied, which breaks the assumption of store_expr.

The attached patch removes the copy_rtx from the main path and puts it back 
only on the sub-path where it is presumably still needed.  Of course it's 
probably still problematic wrt store_expr, but it's the TYPE_ALIGN_OK case and 
I'm not sure it matters in practice; the patch contains a ??? note though.
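A sketch of the store_expr rule quoted above, with a hypothetical struct standing in for a MEM rtx. Structural equality plays the role of rtx_equal_p and pointer identity the role of ==. Handing back TARGET itself means no further store. A structurally equal copy (what the unconditional copy_rtx produced) looks like an unrelated source and triggers a second volatile store. This models the logic only; it is not expr.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a MEM rtx.  */
struct mem
{
  int base;
  int offset;
  bool is_volatile;
};

/* Structural equality, in the spirit of rtx_equal_p.  */
static bool
mem_equal_p (const struct mem *a, const struct mem *b)
{
  return a->base == b->base && a->offset == b->offset
         && a->is_volatile == b->is_volatile;
}

/* Number of stores store_expr would still emit for TEMP -> TARGET.  */
static int
stores_needed (const struct mem *target, const struct mem *temp)
{
  assert (target != NULL && temp != NULL);
  if (!mem_equal_p (temp, target))
    return 1;   /* value lives elsewhere: copy it into TARGET */
  if ((target->is_volatile || temp->is_volatile) && temp != target)
    return 1;   /* same form, different object: must copy */
  return 0;     /* TARGET was used directly: nothing left to do */
}

/* copy_rtx-like duplication: equal contents, new identity.  */
static struct mem *
copy_mem (const struct mem *m)
{
  struct mem *c = malloc (sizeof *c);
  if (c != NULL)
    *c = *m;
  return c;
}
```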

Tested on x86_64-suse-linux, applied on the mainline.


2012-10-15  Eric Botcazou  

* expr.c (expand_expr_real_1) <VIEW_CONVERT_EXPR>: Do not unnecessarily
copy the object in the MEM_P case.


2012-10-15  Eric Botcazou  

* gnat.dg/unchecked_convert9.ad[sb]: New test.


-- 
Eric Botcazou

Index: expr.c
===
--- expr.c	(revision 192447)
+++ expr.c	(working copy)
@@ -10270,10 +10270,15 @@ expand_expr_real_1 (tree exp, rtx target
 	{
 	  enum insn_code icode;
 
-	  op0 = copy_rtx (op0);
-
 	  if (TYPE_ALIGN_OK (type))
-	set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type)));
+	{
+	  /* ??? Copying the MEM without substantially changing it might
+		 run afoul of the code handling volatile memory references in
+		 store_expr, which assumes that TARGET is returned unmodified
+		 if it has been used.  */
+	  op0 = copy_rtx (op0);
+	  set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type)));
+	}
 	  else if (mode != BLKmode
 		   && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)
 		   /* If the target does have special handling for unaligned

-- { dg-do compile }
-- { dg-options "-O -fdump-rtl-final" }

package body Unchecked_Convert9 is

   procedure Proc is
 L : Unsigned_32 := 16##;
   begin
 Var := Conv (L);
   end;

end Unchecked_Convert9;

-- { dg-final { scan-rtl-dump-times "set \\(mem/v" 1 "final" } }
-- { dg-final { cleanup-rtl-dump "final" } }

with System;
with Ada.Unchecked_Conversion;
with Interfaces; use Interfaces;

package Unchecked_Convert9 is

   type R is record
 H : Unsigned_16;
 L : Unsigned_16;
   end record;

   Var : R;
   pragma Volatile (Var);

   function Conv is new
 Ada.Unchecked_Conversion (Source => Unsigned_32, Target => R);

   procedure Proc;

end Unchecked_Convert9;

Re: [lra] patch from Richard Sandiford's review of lra-constraints.c

2012-10-15 Thread Richard Sandiford

Thanks for the updates, they look good to me.

Vladimir Makarov  writes:
> @@ -100,8 +102,9 @@
>o for pseudos needing save/restore code around calls.
>  
>  If the split pseudo still has the same hard register as the
> -original pseudo after the subsequent assignment pass, the opposite
> -transformation is done on the same pass for undoing inheritance.  */
> +original pseudo after the subsequent assignment pass or the
> +original pseudo was split, the opposite transformation is done on
> +the same pass for undoing inheritance.  */

Looks like this should be "original pseudo was spilled" rather than "split".

> @@ -2276,11 +2157,7 @@ process_alt_operands (int only_alternati
>then.  */
> if (! (REG_P (op)
>&& REGNO (op) >= FIRST_PSEUDO_REGISTER)
> -   && ! (const_to_mem && constmemok)
> -   /* We can reload the address instead of memory (so
> -  do not punish it).  It is preferable to do to
> -  avoid cycling in some cases.  */
> -   && ! (MEM_P (op) && offmemok))
> +   && ! (const_to_mem && constmemok))
>   reject += 2;

Sorry, I wasn't suggesting you change this.  I think the old version
was correct.  I'll follow up on the other thread.

Richard

Re: [patch] Fix PR rtl-optimization/54870

2012-10-15 Thread Richard Biener

On Mon, Oct 15, 2012 at 12:43 PM, Eric Botcazou  wrote:
>> Where is mark_addressable called?  It's wrong (and generally impossible) to
>> do that late.
>
> In expr.c:emit_block_move_hints.  It's one of the calls added to support the
> enhanced DSE last year, there are others in calls.c for example.

Ugh ... that looks like a hack to make can_escape "work".  It looks to me
that we should somehow preserve knowledge on what vars a call may
use or clobber (thus the GIMPLE call-use and call-clobber sets).

As I'm not sure how to best do that I suggest we do a more proper RTL
DSE hack by adding a 'libcall-call-escape'-set which we can add to
instead of calling mark_addressable this late.  We need to add all
partitions of a decl here, of course, and we need to query it from can_escape.

But that sounds way cleaner than abusing TREE_ADDRESSABLE for this ...

>> If you only have memcpy then escaped will be empty.  fixing escaped is
>> not the right solution (it may work for some reason in this case though).
>> The rtl code has to approximate ref_maybe_used_by_call_p in a conservative
>> way which it doesn't seem to do correctly (I don't remember a RTL alias.c
>> interface that would match this, or ref_maybe_used_by_stmt_p - maybe
>> we should add one?)
>
> I'm OK with the new bitmap + decls_to_pointers idea.  Keep in mind that the
> info needs to be updated after update_alias_info_with_stack_vars, because for
>
> MEM[(c_char * {ref-all})&i] = MEM[(c_char * {ref-all})&A.17];
>
> you don't know until expand whether this will a memcpy or a move by pieces and
> the info is needed for the enhanced DSE to work properly.

Well, it just means that the enhanced DSE is fragile :/

Richard.

> --
> Eric Botcazou

[PATCH] Fix PR54920

2012-10-15 Thread Richard Biener


This fixes PR54920.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on
trunk and the 4.7 branch.

Richard.

2012-10-15  Richard Guenther  

PR tree-optimization/54920
* tree-ssa-pre.c (create_expression_by_pieces): Properly
allocate temporary storage for all NARY elements.

* gcc.dg/torture/pr54920.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 192398)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -2853,7 +2853,7 @@ create_expression_by_pieces (basic_block
 case NARY:
   {
vn_nary_op_t nary = PRE_EXPR_NARY (expr);
-   tree genop[4];
+   tree *genop = XALLOCAVEC (tree, nary->length);
unsigned i;
for (i = 0; i < nary->length; ++i)
  {
Index: gcc/testsuite/gcc.dg/torture/pr54920.c
===
--- gcc/testsuite/gcc.dg/torture/pr54920.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr54920.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+typedef short __v8hi __attribute__ ((__vector_size__ (16)));
+typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+int a;
+__m128i b;
+
+void
+fn1 ()
+{
+  while (1)
+b = (__m128i) (__v8hi) { a, 0, 0, 0, 0, 0 };
+}
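The bug class fixed here can be reproduced outside GCC: a scratch array sized for the common case (`genop[4]`) is overrun as soon as an N-ary expression has more operands, such as the six-element vector constructor in the testcase. The sketch below is a standalone analogue, not GCC code. `struct nary_op` and `sum_operands` are hypothetical stand-ins for `vn_nary_op_t` and `create_expression_by_pieces`, and the local `XALLOCAVEC` macro is assumed to match libiberty's `alloca`-based definition (it also assumes a platform that provides `<alloca.h>`).

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>

/* Local copy of the libiberty-style helper: stack-allocate N objects of type T.  */
#define XALLOCAVEC(T, N) ((T *) alloca (sizeof (T) * (N)))

/* Hypothetical stand-in for an N-ary value-numbering expression.  */
struct nary_op
{
  unsigned length;   /* number of operands; may exceed 4 */
  const int *op;     /* operand values */
};

/* Gather every operand into scratch storage sized by LENGTH, then
   combine them.  A fixed "int genop[4]" would be written past its end
   whenever LENGTH > 4, which is the PR 54920 failure mode.  */
static int
sum_operands (const struct nary_op *nary)
{
  int *genop = XALLOCAVEC (int, nary->length);
  int sum = 0;
  unsigned i;

  assert (nary->length == 0 || nary->op != NULL);
  for (i = 0; i < nary->length; ++i)
    genop[i] = nary->op[i];
  for (i = 0; i < nary->length; ++i)
    sum += genop[i];
  return sum;
}
```

With six operands 1..6 the function returns 21 and never touches memory past the buffer it allocated.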

Re: [SH] Document function attributes

2012-10-15 Thread Kaz Kojima

Oleg Endo  wrote:
> The attached patch adds documentation for SH specific function
> attributes which haven't been documented yet.
> Tested with 'make info dvi pdf'.
> OK?

OK.

Regards,
kaz

Re: [SH] PR 34777 - Add test case

2012-10-15 Thread Kaz Kojima

Oleg Endo  wrote:
> The attached patch adds gcc.target/sh/torture and puts the test there.
> The torture subdir might be also useful in the future.
> Tested on rev 192417 with
> make -k check-gcc RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml}"
> 
> OK?

OK.

Regards,
kaz

Re: [SH] PR 54760 - Add DImode GBR loads/stores, fix optimization

2012-10-15 Thread Kaz Kojima

Oleg Endo  wrote:
> I somehow initially forgot to implement DImode GBR based loads/stores.
> Attached patch does that and also fixes a problem with the GBR address
> mode optimization.
> Tested on rev 192417 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

OK.

Regards,
kaz

Re: [PATCH] [AArch64] Refactor Advanced SIMD builtin initialisation.

2012-10-15 Thread Marcus Shawcroft


On 05/10/12 16:52, James Greenhalgh wrote:


Hi,

This patch refactors the initialisation code for the Advanced
SIMD builtins under the AArch64 target. The patch has been
regression tested on aarch64-none-elf.

OK for aarch64-branch?

(If yes, someone will have to commit this for me as I do not
have commit rights)

Thanks,
James Greenhalgh

---
2012-09-07  James Greenhalgh
Tejas Belagod

* config/aarch64/aarch64-builtins.c
(aarch64_simd_builtin_type_bits): Rename to...
(aarch64_simd_builtin_type_mode): ...this, make sequential.
(aarch64_simd_builtin_datum): Refactor members where possible.
(VAR1, VAR2, ..., VAR12): Update accordingly.
(aarch64_simd_builtin_data): Update accordingly.
(init_aarch64_simd_builtins): Refactor.
(aarch64_simd_builtin_compare): Remove.
(locate_simd_builtin_icode): Likewise.


OK and backport to aarch64-4.7-branch please.

/Marcus

Re: [SH] PR 51244 - Catch more unnecessary sign/zero extensions

2012-10-15 Thread Kaz Kojima

Oleg Endo  wrote:
> This one refactors some copy pasta that my previous patch regarding this
> matter introduced and catches more unnecessary sign/zero extensions of T
> bit stores.  It also fixes the bug reported in PR 54925 which popped up
> after the last patch for PR 51244.
> Tested on rev 192417 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

OK.

Regards,
kaz

Re: Ping^2: RFA: Process '*' in '@'-output-template alternatives

2012-10-15 Thread Bernd Schmidt

On 10/15/2012 06:41 AM, Joern Rennecke wrote:
> The following patch is still awaiting review:
> 
> 2011-09-19  J"orn Rennecke  
> 
> * genoutput.c (process_template): Process '*' in '@' alternatives.
> * doc/md.texi (node Output Statement): Provide example for the
> above.
> 
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01422.html

I'm pretty sure I've wanted something like this in the past. Ok.


Bernd

Re: [PATCH] [AArch64] Add vcond, vcondu support.

2012-10-15 Thread Marcus Shawcroft


On 09/10/12 12:08, James Greenhalgh wrote:


Hi,

This patch adds support for vcond and vcondu to the AArch64
backend.

Tested with no regressions on aarch64-none-elf.

OK for aarch64-branch?

(If so, someone will have to commit for me, as I do not
have commit rights.)

Thanks
James Greenhalgh

---
2012-09-11  James Greenhalgh
Tejas Belagod

* config/aarch64/aarch64-simd.md
(aarch64_simd_bsl_internal): New pattern.
(aarch64_simd_bsl): Likewise.
(aarch64_vcond_internal): Likewise.
(vcondu): Likewise.
(vcond): Likewise.
* config/aarch64/iterators.md (UNSPEC_BSL): Add to define_constants.


OK
/Marcus

Re: [SH] PR 54925 - Add test case

2012-10-15 Thread Kaz Kojima

Oleg Endo  wrote:
> This adds the test case from the PR.
> Tested together with the patch posted here
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01380.html
> 
> OK?

It would be better to make it a valid C program.  I've checked
that the test case with the change below also ICEs on revision
192446 for sh-linux and your other patch fixes it.  OK with
that change.

Regards,
kaz
--
--- gcc.c-torture/compile/pr54925.c~2012-10-15 20:00:50.0 +0900
+++ gcc.c-torture/compile/pr54925.c 2012-10-15 20:01:03.0 +0900
@@ -1,5 +1,6 @@
 /* PR target/54925  */
 extern int bar;
+extern void foo (int *);
 static unsigned char *
 nr_memcpy (unsigned char *, unsigned char *, unsigned short);
 
@@ -16,9 +17,11 @@ baz (char *buf, unsigned short len)
 static unsigned char *
 nr_memcpy (unsigned char * to, unsigned char * from, unsigned short len)
 {
+  unsigned char *p = to;
   while (len > 0)
 {
   len--;
   *to++ = *from++;
 }
+  return p;
 }
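For reference, this is how `nr_memcpy` reads once the change above is applied. The prototype promises an `unsigned char *`, and falling off the end of a non-void function leaves the result undefined if a caller uses it, so the fix saves the original destination before the loop advances `to`. The `nr_memcpy_selftest` helper is a hypothetical addition for illustration only; it is not part of the testcase.

```c
#include <assert.h>
#include <string.h>

/* Byte-wise copy of LEN bytes from FROM to TO, returning the original
   destination pointer as the prototype promises.  */
static unsigned char *
nr_memcpy (unsigned char *to, unsigned char *from, unsigned short len)
{
  unsigned char *p = to;   /* remember TO before it is advanced */
  while (len > 0)
    {
      len--;
      *to++ = *from++;
    }
  return p;
}

/* Hypothetical usage check: the return value is the destination and
   the bytes were copied.  */
static void
nr_memcpy_selftest (void)
{
  unsigned char src[5] = { 'h', 'e', 'l', 'l', 'o' };
  unsigned char dst[5] = { 0 };
  unsigned char *r = nr_memcpy (dst, src, sizeof src);

  assert (r == dst);
  assert (memcmp (dst, src, sizeof src) == 0);
}
```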

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Thread Paolo Bonzini

Il 15/10/2012 10:37, Steven Bosscher ha scritto:
>> I prefer to declare the notes invalid and drop the notes.
> Then, afaic, our only option is to drop them all in web, as per attached 
> patch.
>
> I strongly disagree with this approach though. It destroys information
> that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can
> update, and that helps with optimization. With renaming these notes
> are valid, and do not refer to dead regs

I agree it is bad.  But I do not understand the last sentence: I
suppose you mean that _without_ renaming these notes are valid, on the
other hand it is normal that some of the notes will be dropped if you
shorten live ranges.

Without removing all of the notes you can do something like this:

- drop the deferred rescanning from web.c.  Instead, make replace_ref
return a bool and call df_insn_rescan manually from web_main.

- attribute new registers to webs in a separate pass that happens
before rewriting, and compute a special version of LR_IN/LR_OUT that
uses the rewritten registers.

- process instructions in reverse order; before starting the visit of
a basic block, initialize the local LR bitmap with the rewritten
LR_OUT of the previous step

- after rewriting and scanning each statement, simulate liveness using
the new defs and uses.

- after rewriting each statement, look for EQ_USES referring to
registers that are dead just before the statement, and delete
REG_EQUAL notes if this is the case
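The last three steps (reverse walk, liveness simulation, note deletion) can be sketched on a toy model. This is a minimal illustration assuming registers numbered below 64 and register sets kept as 64-bit masks; `struct toy_insn` and `prune_dead_equal_notes` are hypothetical names, not GCC's df API, and a real implementation would walk `df_ref` chains and call `remove_note`.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: registers 0..63, register sets as 64-bit masks.  */
typedef uint64_t regset;
#define REG_BIT(r) ((regset) 1 << (r))

/* Hypothetical, heavily simplified insn.  */
struct toy_insn
{
  regset defs;          /* registers written by the pattern */
  regset uses;          /* registers read by the pattern */
  regset eq_uses;       /* registers mentioned in the REG_EQUAL note */
  int has_equal_note;
};

/* Walk the N insns of one basic block backwards, starting from the
   block's (rewritten) live-out set, and drop every REG_EQUAL note that
   mentions a register which is dead just before its insn.
   Returns the number of notes removed.  */
static int
prune_dead_equal_notes (struct toy_insn *insns, int n, regset live_out)
{
  regset live = live_out;
  int removed = 0;
  int i;

  assert (n >= 0);
  for (i = n - 1; i >= 0; i--)
    {
      struct toy_insn *insn = &insns[i];
      /* Backward liveness step: live before = (live after - defs) | uses.  */
      regset live_before = (live & ~insn->defs) | insn->uses;

      if (insn->has_equal_note && (insn->eq_uses & ~live_before) != 0)
        {
          insn->has_equal_note = 0;   /* note would mention a dead reg */
          insn->eq_uses = 0;
          removed++;
        }
      live = live_before;
    }
  return removed;
}
```

For an insn shaped like insn 79 in the dumps later in this thread (r72 = r103 with REG_EQUAL r72+0x18), r72 is defined but not used by the insn, so it is dead just before it and the note is dropped.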

Paolo

> This whole discussion about notes being dead has gone in completely
> the wrong direction. With renaming these notes are valid, and do not
> refer to dead regs. Perhaps you could be convinced if you look at
> Honza's test case with the patch of r192413 reverted.
> The test case still fails with --param max-unroll-times=3, that makes
> visualizing the problem easier.

Re: RFC: LRA for x86/x86-64 [7/9] -- continuation

2012-10-15 Thread Richard Sandiford

Vladimir Makarov  writes:
>> if that's accurate.  I dropped the term "reload pseudo" because of
>> the general comment in my earlier reply about the use of "reload pseudo"
>> when the code seems to include inheritance and split pseudos too.
> There is no inheritance and splitting yet.  It is done after the 
> constraint pass.
> So at this stage >= new_regno_start means reload pseudo.

Ah, OK.

>> That's a change in the meaning of NEW_CLASS, but seems easier for
>> callers to handle.  I think all it requires is changing:
>>
>>> +  common_class = ira_reg_class_subset[rclass][cl];
>>> +  if (new_class != NULL)
>>> +   *new_class = common_class;
>> to:
>>
>>common_class = ira_reg_class_subset[rclass][cl];
>>if (new_class != NULL && rclass != common_class)
>>  *new_class = common_class;
> This change results in infinite LRA looping when compiling the first libgcc
> file.  Unfortunately I have no time to investigate it.
> I'd like to say that most of this code is very sensitive to 
> changes.  I see it a lot.  You change something looking obvious and a 
> target is broken.
> I am going to investigate it when I have more time.

Thanks.

>>> +default:
>>> +  {
>>> +   const char *fmt = GET_RTX_FORMAT (code);
>>> +   int i;
>>> +
>>> +   if (GET_RTX_LENGTH (code) != 1
>>> +   || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
>>> + {
>>> +   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
>>> + if (fmt[i] == 'e')
>>> +   extract_loc_address_regs (false, mode, as, &XEXP (x, i),
>>> + context_p, code, SCRATCH,
>>> + modify_p, ad);
>>> +   break;
>>> + }
>>> +   /* fall through for case UNARY_OP (UNSPEC ...)  */
>>> +  }
>>> +
>>> +case UNSPEC:
>>> +  if (ad->disp_loc == NULL)
>>> +   ad->disp_loc = loc;
>>> +  else if (ad->base_reg_loc == NULL)
>>> +   {
>>> + ad->base_reg_loc = loc;
>>> + ad->base_outer_code = outer_code;
>>> + ad->index_code = index_code;
>>> + ad->base_modify_p = modify_p;
>>> +   }
>>> +  else
>>> +   {
>>> + lra_assert (ad->index_reg_loc == NULL);
>>> + ad->index_reg_loc = loc;
>>> +   }
>>> +  break;
>>> +
>>> +}
>> Which targets use a bare UNSPEC as a displacement?  I thought a
>> displacement had to be a link-time constant, in which case it should
>> satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.
> I saw it somewhere.  I guess IA64.
>> I'm just a bit worried that the UNSPEC handling is sensitive to the
>> order that subrtxes are processed (unlike PLUS, which goes to some
>> trouble to work out what's what).  It could be especially confusing
>> because the default case processes operands in reverse order while
>> PLUS processes them in forward order.
>>
>> Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
>> Probably deserves a comment.
> I don't remember.  To figure out, I should switch it off and try all 
> targets supported by LRA.
>> AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
>> recording where reloads of a particular class need to go (obviously
>> in the case of disp_loc, which isn't reloaded at all).  The fields
>> have semantic value too.  I.e. we use them to work out the value
>> of at least part of the address.
>>
>> In that case it seems dangerous to look through general rtxes
>> in the way that the default case above does.  Maybe just making
>> sure that DISP_LOC is involved in a sum with the base would be
>> enough, but another idea was:
>>
>> 
>> I know of three ways of "mutating" (for want of a better word)
>> an address:
>>
>>1. (and X (const_int X)), to align
>>2. a subreg
>>3. a unary operator (such as truncation or extension)
>>
>> So maybe we could:
>>
>>a. remove outer mutations (using a helper function)
>>b. handle LO_SUM, PRE_*, POST_*: as now
>>c. otherwise treat the address as the sum of one, two or three pieces.
>>   c1. Peel mutations of all pieces.
>>   c2. Classify the pieces into base, index and displacement.
>>   This would be similar to the jousting code above, but hopefully
>>   easier because all three rtxes are to hand.  E.g. we could
>>   do the base vs. index thing in a similar way to
>>   commutative_operand_precedence.
>>   c3. Record which pieces were mutated (e.g. using something like the
>>   index_loc vs. index_reg_loc distinction in the current code)
>>
>> That should be general enough for current targets, but if it isn't,
>> we could generalise it further when we know what generalisation is needed.
>>
>> That's still going to be a fair amount of code, but hopefully not more,
>> and we might have more confidence at each stage what each value is.
>> And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
>> ---
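The decomposition proposed above can be sketched on a toy expression type. Everything here is hypothetical: `struct toy_rtx`, the `T_*` codes and `decompose_address` are illustrative stand-ins for RTL and for LRA's `extract_loc_address_regs`, and base-versus-index ranking is reduced to "first plain register is the base, a MULT is the index" rather than anything like `commutative_operand_precedence`. It covers steps a, c, c1, c2 and c3; LO_SUM and auto-increment addresses (step b) are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy address expressions, loosely modelled on RTL.  */
enum toy_code { T_REG, T_CONST, T_PLUS, T_MULT, T_AND, T_SUBREG, T_EXTEND };

struct toy_rtx
{
  enum toy_code code;
  long value;                  /* register number or constant */
  struct toy_rtx *op0, *op1;   /* operands; unused ones are NULL */
};

struct toy_addr
{
  struct toy_rtx *base, *index, *disp;
  int outer_mutated, base_mutated, index_mutated;
};

/* Strip alignment ANDs, subregs and unary extensions from X,
   setting *MUTATED if anything was removed.  */
static struct toy_rtx *
peel_mutations (struct toy_rtx *x, int *mutated)
{
  for (;;)
    {
      if (x->code == T_AND && x->op1->code == T_CONST)
        x = x->op0;
      else if (x->code == T_SUBREG || x->code == T_EXTEND)
        x = x->op0;
      else
        return x;
      *mutated = 1;
    }
}

/* Append the summands of X to PIECES starting at index N; return the
   new count, or -1 if there are more than MAX summands.  */
static int
collect_pieces (struct toy_rtx *x, struct toy_rtx **pieces, int n, int max)
{
  if (x->code == T_PLUS)
    {
      n = collect_pieces (x->op0, pieces, n, max);
      return n < 0 ? -1 : collect_pieces (x->op1, pieces, n, max);
    }
  if (n >= max)
    return -1;
  pieces[n] = x;
  return n + 1;
}

/* Split X into base, index and displacement.  Returns 1 on success,
   0 if the address does not fit the scheme.  */
static int
decompose_address (struct toy_rtx *x, struct toy_addr *ad)
{
  struct toy_rtx *pieces[3];
  int n, i;

  assert (x != NULL && ad != NULL);
  ad->base = ad->index = ad->disp = NULL;
  ad->outer_mutated = ad->base_mutated = ad->index_mutated = 0;

  x = peel_mutations (x, &ad->outer_mutated);   /* a: outer mutations */
  n = collect_pieces (x, pieces, 0, 3);          /* c: up to three summands */
  if (n < 0)
    return 0;

  for (i = 0; i < n; i++)
    {
      int mutated = 0;
      struct toy_rtx *p = peel_mutations (pieces[i], &mutated);   /* c1 */

      /* c2: classify the piece; c3: record whether it was mutated.  */
      if (p->code == T_CONST && ad->disp == NULL)
        ad->disp = p;
      else if (p->code == T_MULT && ad->index == NULL)
        {
          ad->index = p;
          ad->index_mutated = mutated;
        }
      else if (p->code == T_REG && ad->base == NULL)
        {
          ad->base = p;
          ad->base_mutated = mutated;
        }
      else if (p->code == T_REG && ad->index == NULL)
        {
          ad->index = p;
          ad->index_mutated = mutated;
        }
      else
        return 0;
    }
  return 1;
}
```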

[patch] PR 54900: store data race in if-conversion pass

2012-10-15 Thread Aldy Hernandez

[Ian, I am copying you because this is originally your code.  Richard, I 
am copying you because you are all things aliased :-).  And Andrew is 
here because I am unable to come up with a suitable test for the 
simulate-thread harness.].


Here we have a store data race because noce_can_store_speculate_p() 
incorrectly returns true.  The problem is that 
memory_modified_in_insn_p() returns true if an instruction *MAY* modify 
a memory location, but the store can only be speculated if we are 
absolutely sure the store will happen on all subsequent paths.


My approach is to implement a memory_SURELY_modified_in_insn_p(), which 
will trigger this optimization less often, but at least it will be correct.


I thought of restricting the speculation for "--param 
allow-store-data-races=0" or transactional code, but IMO the speculation 
is incorrect as is.


I am having a bit of a problem coming up with a generic testcase. 
Perhaps Andrew or others have an idea.


The attached testcase fails to trigger without the patch, because AFAICT 
we have no way of testing an addition of zero to a memory location:


cmpl    $1, flag(%rip)
setb    %al
addl    %eax, dont_write(%rip)

In the simulate-thread harness I can test the environment before an 
instruction, and after an instruction, but adding 0 to *dont_write 
produces no measurable effects, particularly in a back-end independent 
manner.  Ideas?
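The following sketch shows why adding zero is invisible to a value check but is still a store. It counts stores through a hypothetical `store_int` helper; `guarded_increment` and `speculated_increment` are illustrative names, and the second function is written by hand to mirror the `cmpl`/`setb`/`addl` sequence above rather than being actual compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Instrumented memory: count every store, so that writing back an
   unchanged value is still observable.  */
static int store_count;

static void
store_int (int *p, int v)
{
  assert (p != NULL);
  store_count++;
  *p = v;
}

/* Source semantics: DONT_WRITE is only touched when FLAG is clear.  */
static void
guarded_increment (int flag, int *dont_write)
{
  if (!flag)
    store_int (dont_write, *dont_write + 1);
}

/* If-converted form: the condition becomes an addend (0 or 1) and the
   read-modify-write happens unconditionally.  If another thread stored
   to *DONT_WRITE between the load and the store, its value would be
   overwritten even though the source never writes here when FLAG is set.  */
static void
speculated_increment (int flag, int *dont_write)
{
  store_int (dont_write, *dont_write + (flag == 0));
}
```

With `flag` set, both versions leave the value unchanged, but only the speculated one performs a store.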


Bootstrapped and regtested on x86-64 Linux.

Patch OK? (Except the test itself.)

PR tree-optimization/54900
* ifcvt.c (noce_can_store_speculate_p): Call
memory_surely_modified_in_insn_p.
* alias.c (memory_surely_modified_in_insn_p): New.
(memory_modified_in_insn_p): Change comment.

diff --git a/gcc/alias.c b/gcc/alias.c
index 0c6a744..26d3797 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -2748,8 +2748,8 @@ memory_modified_1 (rtx x, const_rtx pat ATTRIBUTE_UNUSED, void *data)
 }
 
 
-/* Return true when INSN possibly modify memory contents of MEM
-   (i.e. address can be modified).  */
+/* Return true if INSN *MAY* possibly modify the memory contents of
+   MEM (i.e. address can be modified).  */
 bool
 memory_modified_in_insn_p (const_rtx mem, const_rtx insn)
 {
@@ -2760,6 +2760,22 @@ memory_modified_in_insn_p (const_rtx mem, const_rtx insn)
   return memory_modified;
 }
 
+/* Like memory_modified_in_insn_p, but return TRUE if INSN will
+   *SURELY* modify the memory contents of MEM.  */
+bool
+memory_surely_modified_in_insn_p (const_rtx mem, const_rtx insn)
+{
+  if (!INSN_P (insn))
+return false;
+  rtx set = single_set (insn);
+  if (set)
+{
+  rtx dest = SET_DEST (set);
+  return rtx_equal_p (dest, mem);
+}
+  return false;
+}
+
 /* Initialize the aliasing machinery.  Initialize the REG_KNOWN_VALUE
array.  */
 
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2f486a2..659e464 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2415,7 +2415,7 @@ noce_can_store_speculate_p (basic_block top_bb, const_rtx mem)
  || (CALL_P (insn) && (!RTL_CONST_CALL_P (insn)
return false;
 
- if (memory_modified_in_insn_p (mem, insn))
+ if (memory_surely_modified_in_insn_p (mem, insn))
return true;
  if (modified_in_p (XEXP (mem, 0), insn))
return false;
diff --git a/gcc/rtl.h b/gcc/rtl.h
index cd5d435..d449ee1 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2614,6 +2614,7 @@ extern void init_alias_analysis (void);
 extern void end_alias_analysis (void);
 extern void vt_equate_reg_base_value (const_rtx, const_rtx);
 extern bool memory_modified_in_insn_p (const_rtx, const_rtx);
+extern bool memory_surely_modified_in_insn_p (const_rtx, const_rtx);
 extern bool may_be_sp_based_p (rtx);
 extern rtx gen_hard_reg_clobber (enum machine_mode, unsigned int);
 extern rtx get_reg_known_value (unsigned int);
diff --git a/gcc/testsuite/gcc.dg/simulate-thread/speculative-store-5.c b/gcc/testsuite/gcc.dg/simulate-thread/speculative-store-5.c
new file mode 100644
index 000..52daa27
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/simulate-thread/speculative-store-5.c
@@ -0,0 +1,63 @@
+/* { dg-do link } */
+/* { dg-options "--param allow-store-data-races=0" } */
+/* { dg-final { simulate-thread } } */
+
+#include 
+#include "simulate-thread.h"
+
+/* PR tree-optimization/54900 */
+
+/* This file tests that a speculative store does not happen unless we
+   are sure a store to the location would happen anyway.  */
+
+int flag = 1;
+int stuff;
+int *stuffp = &stuff;
+int dont_write = 555;
+
+void simulate_thread_other_threads() 
+{
+}
+
+int simulate_thread_step_verify()
+{
+  if (dont_write != 555)
+{
+  printf("FAIL: dont_write variable was assigned to.  \n");
+  return 1;
+}
+  return 0;
+}
+
+int simulate_thread_final_verify()
+{
+  return 0;
+}
+
+void outerfunc (int p1)
+{
+  *stuffp = 0;
+}
+
+int innerfunc ()
+{
+  if (flag)
+return 0;
+  /* This store should never happen because flag

Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-15 Thread Tobias Schlüter


On 2012-10-14 23:44, Jakub Jelinek wrote:

On Mon, Oct 15, 2012 at 12:35:27AM +0300, Janne Blomqvist wrote:

On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter

I'm putting forward two patches.  One uses a C++ map to very concisely build
up and handle the ordered list of symbols.  This has three problems:
1) gfortran maintainers may not want C++isms (even though in this case
it's very localized, and in my opinion very transparent), and


Even if you prefer C++isms, why don't you go for "hash-table.h"?
std::map at least with the default allocator will just crash the compiler
if malloc returns NULL (remember that we build with -fno-exceptions),
while when you use hash-table.h (or hashtab.h) you get proper OOM diagnostics.


I wasn't aware of the OOM problem.  Couldn't gcc install a default 
memory handler that gives the correct diagnostics?  That certainly 
sounds like the most sensible solution, but I don't know if it's possible.


From looking over hash-table.h, I dislike two features about it, one 
aesthetical, one practical: 1) I need to use a callback function during 
the iteration, which is less transparent than the for() loop and 2) I 
can't read from the comments whether traversal is ordered (I don't think 
it is, but ordered traversal is the whole point of my patch).
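Whatever container is chosen, the writer must visit symbols in a deterministic order. As a container-neutral alternative to `std::map` (a swapped-in technique, not either of the posted patches), a C sketch can collect the symbols into an array and `qsort` them by name before writing. `struct sym` and `write_symbols_sorted` are hypothetical, and the sketch assumes names are unique, since `qsort` is not stable and ties would otherwise need a deterministic tie-breaker.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol record standing in for gfortran's module data.  */
struct sym
{
  const char *name;
  int id;
};

static int
compare_sym_names (const void *a, const void *b)
{
  const struct sym *sa = a, *sb = b;
  return strcmp (sa->name, sb->name);
}

/* Write the names of SYMS (N entries, in whatever order they were
   discovered) into OUT as a space-separated list, sorted by name so the
   output does not depend on discovery or hashing order.  Entries that
   would not fit in OUTSZ bytes are dropped rather than overflowing.  */
static void
write_symbols_sorted (struct sym *syms, size_t n, char *out, size_t outsz)
{
  size_t i, len = 0;

  assert (outsz > 0);
  qsort (syms, n, sizeof *syms, compare_sym_names);
  out[0] = '\0';
  for (i = 0; i < n; i++)
    {
      size_t nl = strlen (syms[i].name);
      if (len + nl + 2 > outsz)
        break;
      if (len > 0)
        out[len++] = ' ';
      memcpy (out + len, syms[i].name, nl);
      len += nl;
      out[len] = '\0';
    }
}
```

Two arrays holding the same symbols in different orders produce byte-identical output, which is the property reproducible .mod files need.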


Cheers,
- Tobi

Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-15 Thread Tobias Schlüter



Hi Janne,

thanks for the review.

On 2012-10-14 23:35, Janne Blomqvist wrote:

- Personally, I'd prefer the C++ version; The C++ standard library is
widely used and documented and using it in favour of rolling our own
is IMHO a good idea.

- I'd be wary wrt backporting; in my experience the module.c code is
somewhat fragile and easily causes regressions. In any case, AFAICS PR
51727 is not a regression.


Ok to both.  The bug is a real problem, and I don't think the patch is 
at all dangerous, but it's good to have a second opinion.



- I think one could go a step further and get rid of the BBT stuff in
pointer_info, replacing it with two file-level maps

std::map pmap; // Or could be std::unordered_map if available
std::map imap;

So when writing a module, use pmap similar to how pointer_info BBT is
used now, and then use imap to get a consistent order per your patch.
When reading, lookup/create mostly via imap, creating a pmap entry
also when creating a new imap entry; this avoids having to do a
brute-force search when looking up via pointer when reading (see
find_pointer2()).

(This 3rd point is mostly an idea for further work, and is not meant
as a requirement for accepting the patch)


Of course the BBT is equivalent to a map (or hash-table) with different 
syntax, but I agree that it would be nice to enhance the code to do away 
with brute-force searching.



Ok for trunk, although wait for a few days in case there is a storm of
protest on the C vs. C++ issue from other gfortran maintainers.


Yes, of course, we don't want to end up in a situation where gfortran 
maintainers suddenly need to know C, Fortran and all of C++, so I want 
to be careful about this patch.  Besides that concern, I'll also wait 
until I get more input concerning the issue that Jakub raised.


Cheers,
- Tobi

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Thread Steven Bosscher

On Mon, Oct 15, 2012 at 1:49 PM, Paolo Bonzini wrote:
>> I strongly disagree with this approach though. It destroys information
>> that is correct, that we had before DF_RD_PRUNE_DEAD_DEFS, that we can
>> update, and that helps with optimization. With renaming these notes
>> are valid, and do not refer to dead regs
>
> I agree it is bad.  But I do not understand the last sentence: I
> suppose you mean that _without_ renaming these notes are valid, on the
> other hand it is normal that some of the notes will be dropped if you
> shorten live ranges.

OK, I now got so confused that I looked into this a bit deeper.

The situation is something like the following just after unrolling,
but before web:

   33 r72:DI=[`rowArray']
   34 {r103:DI=r72:DI+0x18;clobber flags:CC;}
   ...
   45 flags:CCNO=cmp([r72:DI+0x20],0)
  REG_DEAD: r72:DI
;; diamond shape region follows, joining up again in bb 9:
   79 r72:DI=r103:DI
  REG_EQUAL: r72:DI+0x18

On entry to bb9, r72 is not in LR_IN, so after loop unrolling this
note is already invalid if we say that a note should not refer to a
dead register.


But the register dies much earlier. The first place where insn 79
appears is in the .169r.pre dump:

;; basic block 8, loop depth 1, count 0, freq 9100, maybe hot
;;  prev block 7, next block 9, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   7
;;  6
;; bb 8 artificial_defs: { }
;; bb 8 artificial_uses: { u43(6){ }u44(7){ }u45(16){ }u46(20){ }}
;; lr  in6 [bp] 7 [sp] 16 [argp] 20 [frame] 72 82 85 87
;; lr  use   6 [bp] 7 [sp] 16 [argp] 20 [frame] 72 87
;; lr  def   17 [flags] 72
;; live  in  6 [bp] 7 [sp] 16 [argp] 20 [frame] 72 82 85 87
;; live  gen 17 [flags] 72
;; live  kill17 [flags]
L49:
   50 NOTE_INSN_BASIC_BLOCK
   79 r72:DI=r103:DI
  REG_EQUAL: r72:DI+0x18

Here, r72 is still in LR_IN so the note is valid.


Then in .171r.cprop2:

;; basic block 8, loop depth 1, count 0, freq 9100, maybe hot
;;  prev block 7, next block 9, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   7
;;  6
;; bb 8 artificial_defs: { }
;; bb 8 artificial_uses: { u43(6){ }u44(7){ }u45(16){ }u46(20){ }}
;; lr  in6 [bp] 7 [sp] 16 [argp] 20 [frame] 82 85 87 103
;; lr  use   6 [bp] 7 [sp] 16 [argp] 20 [frame] 87 103
;; lr  def   17 [flags] 72
;; live  in  6 [bp] 7 [sp] 16 [argp] 20 [frame] 82 85 87 103
;; live  gen 17 [flags] 72
;; live  kill
L49:
   50 NOTE_INSN_BASIC_BLOCK
   79 r72:DI=r103:DI
  REG_EQUAL: r72:DI+0x18

So already after CPROP2, the REG_EQUAL note is invalid if we require
that they only refer to live registers. This all happens well before
any pass uses the DF_RD problem, so this is a pre-existing problem if
we consider this kind of REG_EQUAL note to be invalid.


> Without removing all of the notes you can do something like this:
>
> - drop the deferred rescanning from web.c.  Instead, make replace_ref
> return a bool and call df_insn_rescan manually from web_main.
>
> - attribute new registers to webs in a separate pass that happens
> before rewriting, and compute a special version of LR_IN/LR_OUT that
> uses the rewritten registers.
>
> - process instructions in reverse order; before starting the visit of
> a basic block, initialize the local LR bitmap with the rewritten
> LR_OUT of the previous step
>
> - after rewriting and scanning each statement, simulate liveness using
> the new defs and uses.
>
> - after rewriting each statement, look for EQ_USES referring to
> registers that are dead just before the statement, and delete
> REG_EQUAL notes if this is the case

I think I've shown above that we're all looking at the wrong pass...

Ciao!
Steven

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-15 Thread Kirill Yukhin

>> Looks Ok. If David can test it successfully on AIX I can approve it.
>
> I was able to bootstrap successfully with the patch.

Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00581.html

Thanks, K

[PATCH] Cleanup comments in alias.c

2012-10-15 Thread Dodji Seketeli

Hello,

While reading alias.c, it seemed to me that some comments could use
some cleanups.

OK for trunk?

Thanks.

gcc/

* alias.c: Cleanup comments.
---
 gcc/alias.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/gcc/alias.c b/gcc/alias.c
index 0c6a744..09aef11 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -60,14 +60,13 @@ along with GCC; see the file COPYING3.  If not see
struct Z z2, *pz;
 
 
-   py = &px1.y1;
+   py = &x1.y1;
px2 = &x1;
 
Consider the four questions:
 
Can a store to x1 interfere with px2->y1?
Can a store to x1 interfere with px2->z2?
-   (*px2).z2
Can a store to x1 change the value pointed to by with py?
Can a store to x1 change the value pointed to by with pz?
 
@@ -78,24 +77,24 @@ along with GCC; see the file COPYING3.  If not see
a store through a pointer to an X can overwrite any field that is
contained (recursively) in an X (unless we know that px1 != px2).
 
-   The last two of the questions can be solved in the same way as the
-   first two questions but this is too conservative.  The observation
-   is that in some cases analysis we can know if which (if any) fields
-   are addressed and if those addresses are used in bad ways.  This
-   analysis may be language specific.  In C, arbitrary operations may
-   be applied to pointers.  However, there is some indication that
-   this may be too conservative for some C++ types.
+   The last two questions can be solved in the same way as the first
+   two questions but this is too conservative.  The observation is
+   that in some cases we can know which (if any) fields are addressed
+   and if those addresses are used in bad ways.  This analysis may be
+   language specific.  In C, arbitrary operations may be applied to
+   pointers.  However, there is some indication that this may be too
+   conservative for some C++ types.
 
The pass ipa-type-escape does this analysis for the types whose
instances do not escape across the compilation boundary.
 
Historically in GCC, these two problems were combined and a single
-   data structure was used to represent the solution to these
+   data structure that was used to represent the solution to these
problems.  We now have two similar but different data structures,
-   The data structure to solve the last two question is similar to the
-   first, but does not contain have the fields in it whose address are
-   never taken.  For types that do escape the compilation unit, the
-   data structures will have identical information.
+   The data structure to solve the last two questions is similar to
+   the first, but does not contain the fields whose address are never
+   taken.  For types that do escape the compilation unit, the data
+   structures will have identical information.
 */
 
 /* The alias sets assigned to MEMs assist the back-end in determining
-- 
Dodji

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Thread Paolo Bonzini

Il 15/10/2012 14:53, Steven Bosscher ha scritto:
> I think I've shown above that we're all looking at the wrong pass...

I think you have... so we want a patch like this?

Index: df-problems.c
===
--- df-problems.c   (revision 183719)
+++ df-problems.c   (working copy)
@@ -3480,6 +3485,18 @@ df_note_bb_compute (unsigned int bb_inde
}
}
 
+  for (use_rec = DF_INSN_UID_EQ_USES (uid); *use_rec; use_rec++)
+   {
+ df_ref use = *use_rec;
+ unsigned int uregno = DF_REF_REGNO (use);
+
+ if (!bitmap_bit_p (live, uregno))
+{
+  remove_note (insn, find_reg_equal_equiv_note (insn));
+  break;
+}
+   }
+
   if (debug_insn == -1)
{
  /* ??? We could probably do better here, replacing dead

Paolo

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-15 Thread Steven Bosscher

On Mon, Oct 15, 2012 at 3:21 PM, Paolo Bonzini  wrote:
> Il 15/10/2012 14:53, Steven Bosscher ha scritto:
>> I think I've shown above that we're all looking at the wrong pass...
>
> I think you have... so we want a patch like this?

I don't think so. df_kill_notes is already supposed to take care of this.

Ciao!
Steven

Re: [PATCH] Cleanup comments in alias.c

2012-10-15 Thread Richard Biener

On Mon, 15 Oct 2012, Dodji Seketeli wrote:

> Hello,
> 
> While reading alias.c, it seemed to me that some comments could use
> some cleanups.
> 
> OK for trunk?

Ok.

Thanks,
Richard.

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-15 Thread Olivier Hainque


On Oct 15, 2012, at 15:12 , Kirill Yukhin  wrote:
>> I was able to bootstrap successfully with the patch.
> 
> Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00581.html

 Thanks :)

Re: Constant-fold vector comparisons

2012-10-15 Thread Marc Glisse


On Mon, 15 Oct 2012, Richard Biener wrote:


   else if ((code == BIT_NOT_EXPR
&& TYPE_PRECISION (TREE_TYPE (cond)) == 1)
   || (code == BIT_XOR_EXPR
-  && integer_onep (gimple_assign_rhs2 (def_stmt
+  && ((gimple_assign_rhs_code (stmt) == VEC_COND_EXPR)
+  ? integer_all_onesp (gimple_assign_rhs2 (def_stmt))
+  : integer_onep (gimple_assign_rhs2 (def_stmt)


I don't think that we can do anything for vectors here.  The non-vector
path assumes that the type is a boolean type (thus two-valued), but
for vectors we can have arbitrary integer value input.


Actually, we just defined VEC_COND_EXPR as taking only vectors of -1 and 0 
as its first argument. So if it takes X^-1 or ~X as first argument (looks 
like I forgot the ~ case), then X is also a vector of -1 and 0.
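To make the 0/-1 argument concrete at the source level, here is a minimal sketch using the GCC vector extension. It assumes a compiler that supports vector_size and element-wise comparisons on such vectors (a recent GCC or Clang). It is not GCC's folding code, and is_canonical_mask and select_lanes are hypothetical helpers. A vector comparison yields lanes of -1 or 0. Since ~ simply swaps -1 and 0 (as does ^ -1) and is its own inverse, a mask stays in that form under either operation, and so must the operand it was applied to.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC/Clang vector extension: four 32-bit ints in a 16-byte vector.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* True if every lane is -1 or 0, the only values assumed for the
   first operand of VEC_COND_EXPR in this discussion.  */
static bool is_canonical_mask (v4si m)
{
  for (int i = 0; i < 4; i++)
    if (m[i] != 0 && m[i] != -1)
      return false;
  return true;
}

/* Lane-wise select: take X where MASK is -1 and Y where it is 0.  */
static v4si select_lanes (v4si mask, v4si x, v4si y)
{
  assert (is_canonical_mask (mask));
  return (mask & x) | (~mask & y);
}
```

Selecting with ~mask instead of mask just swaps which input each lane comes from, so folding ~ away is only safe when the mask is known to be canonical.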


I liked your idea of the signed boolean vector, as a way to express that 
we know some vector can only have values 0 and -1, but I am not sure how 
to use it.


Thus, as we defined true as -1 and false as 0, we cannot, unless relaxing
what VEC_COND_EXPR treats as true or false, optimize any of ~ or ^ -1
away.


It seems to me that what prevents us from optimizing is that we want to keep
the door open for a future relaxation of what VEC_COND_EXPR accepts as its
first argument. That means: produce only -1 and 0, but don't assume we
are only reading -1 and 0 (unless we have a reason to know it, for
instance because it is the result of a comparison), and don't assume any
specific interpretation of those other values. I'm not sure how much that
limits possible optimizations.



Which means I'd prefer if you simply condition the existing ~ and ^
handling on COND_EXPR.


Ok, that situation probably won't happen soon anyway, I just wanted to do 
something while I was looking at this region of code.


Thanks,

--
Marc Glisse

Re: [path] PR 54900: store data race in if-conversion pass

2012-10-15 Thread Andrew MacLeod


On 10/15/2012 08:28 AM, Aldy Hernandez wrote:


I am having a bit of a problem coming up with a generic testcase. 
Perhaps Andrew or others have an idea.


The attached testcase fails to trigger without the patch, because 
AFAICT we have no way of testing an addition of zero to a memory 
location:


cmpl    $1, flag(%rip)
setb    %al
addl    %eax, dont_write(%rip)

In the simulate-thread harness I can test the environment before an 
instruction, and after an instruction, but adding 0 to *dont_write 
produces no measurable effects, particularly in a back-end independent 
manner.  Ideas?
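The hazard can be sketched in C. This assumes a hypothetical source in which dont_write is incremented only when flag is 0. That is one reading of the cmpl/setb/addl sequence above, where setb materializes "flag < 1 unsigned", i.e. flag == 0. It is not the attached testcase. Both functions give identical results when run single-threaded, which is exactly why a harness that compares memory before and after the instruction cannot see the extra store.

```c
#include <assert.h>

int flag;
int dont_write;

/* Hypothetical source: dont_write is stored only when flag == 0.  */
void guarded_increment (void)
{
  if (flag == 0)
    dont_write++;
}

/* If-converted form matching the cmpl/setb/addl sequence: the condition
   is materialized as 0 or 1 and added unconditionally, so dont_write is
   read and written back even when flag != 0.  Under the C11/C++11 memory
   model that store is a data race if another thread writes dont_write.  */
void if_converted_increment (void)
{
  dont_write += (flag == 0);
}

/* Single-threaded, both versions leave the same value behind.  */
void check_same_result (int f, int start)
{
  flag = f;
  dont_write = start;
  guarded_increment ();
  int expected = dont_write;

  flag = f;
  dont_write = start;
  if_converted_increment ();
  assert (dont_write == expected);
}
```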


Hmm, isn't that clever.  Well, the instruction is executed pretty much 
atomically... so a write of the same value becomes very difficult to 
detect, and impossible within the existing harness.  And I don't think a 
hardware watchpoint can catch that...


The only way I can think of is if you put 'dont_write' into a section 
which will trap if it is written to...  I don't know the details of 
doing such a thing or how you monitor the trap within the harness...


Other than that, I'm not sure we can detect this with our current set of 
tools; for the longer term we'd need a write detector.  I don't suppose 
something like systemtap can detect writes like this?


Andrew

[PATCH] Last LTO streaming adjustment

2012-10-15 Thread Richard Biener


This moves TRANSLATION_UNIT_LANGUAGE to the bitpack section.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2012-10-15  Richard Biener  

* data-streamer.h (bp_pack_string_with_length): New function.
(bp_pack_string): Likewise.
(bp_unpack_indexed_string): Likewise.
(bp_unpack_string): Likewise.
* data-streamer-out.c (bp_pack_string_with_length): Likewise.
(bp_pack_string): Likewise.
* data-streamer-in.c (bp_unpack_indexed_string): Likewise.
(bp_unpack_string): Likewise.
* tree-streamer-out.c (pack_ts_translation_unit_decl_value_fields):
Pack TRANSLATION_UNIT_LANGUAGE here, not ...
(write_ts_translation_unit_decl_tree_pointers): ... here.  Remove.
(streamer_pack_tree_bitfields): Adjust.
(streamer_write_tree_body): Likewise.
* tree-streamer-in.c (unpack_ts_translation_unit_decl_value_fields):
Unpack TRANSLATION_UNIT_LANGUAGE here, not ...
(lto_input_ts_translation_unit_decl_tree_pointers): ... here.  Remove.
(unpack_value_fields): Adjust.
(streamer_read_tree_body): Likewise.
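The new bp_pack_string* and bp_unpack_*string routines in the hunks that follow share one convention: a NULL string is packed as index 0, and any other string is entered into the output block's string table, with only its index going into the bitpack. The self-contained model below illustrates that convention under stated simplifications. str_table, table_index, table_lookup, pack_var_len and unpack_var_len are hypothetical names, the table is a linear array rather than GCC's hash table, and the 7-bit varint only stands in for bp_pack_var_len_unsigned, whose actual bit layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_CAP 256

/* Hypothetical string table.  Slot 0 is never used, so index 0 can
   stand for a NULL string.  */
struct str_table
{
  const char *slots[TABLE_CAP];
  unsigned count;
};

/* Intern S, keeping the caller's pointer (as in the PERSISTENT case),
   and return its index; NULL maps to 0.  */
static unsigned table_index (struct str_table *t, const char *s)
{
  if (s == NULL)
    return 0;
  for (unsigned i = 1; i <= t->count; i++)
    if (strcmp (t->slots[i], s) == 0)
      return i;                 /* reuse an existing entry */
  assert (t->count + 1 < TABLE_CAP);
  t->slots[++t->count] = s;
  return t->count;
}

static const char *table_lookup (const struct str_table *t, unsigned idx)
{
  assert (idx <= t->count);
  return idx == 0 ? NULL : t->slots[idx];
}

/* Simple varint: 7 payload bits per byte, high bit set while more
   bytes follow.  Returns the number of bytes written to OUT.  */
static size_t pack_var_len (unsigned char *out, unsigned v)
{
  size_t n = 0;
  do
    {
      unsigned char b = v & 0x7f;
      v >>= 7;
      if (v != 0)
        b |= 0x80;
      out[n++] = b;
    }
  while (v != 0);
  return n;
}

/* Inverse of pack_var_len; stores the number of bytes consumed in USED.  */
static unsigned unpack_var_len (const unsigned char *in, size_t *used)
{
  unsigned v = 0, shift = 0;
  size_t n = 0;
  unsigned char b;
  do
    {
      b = in[n++];
      v |= (unsigned) (b & 0x7f) << shift;
      shift += 7;
    }
  while (b & 0x80);
  *used = n;
  return v;
}
```

Packing an index instead of the string bytes keeps the bitpack compact, and repeated strings such as a translation unit's language name are stored once in the table.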

Index: gcc/data-streamer.h
===
*** gcc/data-streamer.h (revision 192451)
--- gcc/data-streamer.h (working copy)
*** unsigned streamer_string_index (struct o
*** 72,77 
--- 72,81 
  void streamer_write_string_with_length (struct output_block *,
struct lto_output_stream *,
const char *, unsigned int, bool);
+ void bp_pack_string_with_length (struct output_block *, struct bitpack_d *,
+const char *, unsigned int, bool);
+ void bp_pack_string (struct output_block *, struct bitpack_d *,
+const char *, bool);
  void streamer_write_uhwi_stream (struct lto_output_stream *,
 unsigned HOST_WIDE_INT);
  void streamer_write_hwi_stream (struct lto_output_stream *, HOST_WIDE_INT);
*** const char *streamer_read_string (struct
*** 82,87 
--- 86,94 
  const char *streamer_read_indexed_string (struct data_in *,
  struct lto_input_block *,
  unsigned int *);
+ const char *bp_unpack_indexed_string (struct data_in *, struct bitpack_d *,
+ unsigned int *);
+ const char *bp_unpack_string (struct data_in *, struct bitpack_d *);
  unsigned HOST_WIDE_INT streamer_read_uhwi (struct lto_input_block *);
  HOST_WIDE_INT streamer_read_hwi (struct lto_input_block *);
  
Index: gcc/data-streamer-out.c
===
*** gcc/data-streamer-out.c (revision 192451)
--- gcc/data-streamer-out.c (working copy)
*** streamer_write_string (struct output_blo
*** 115,120 
--- 115,153 
  }
  
  
+ /* Output STRING of LEN characters to the string table in OB.  Then
+put the index into BP.
+When PERSISTENT is set, the string S is supposed to not change during
+duration of the OB and thus OB can keep pointer into it.  */
+ 
+ void
+ bp_pack_string_with_length (struct output_block *ob, struct bitpack_d *bp,
+   const char *s, unsigned int len, bool persistent)
+ {
+   unsigned index = 0;
+   if (s)
+ index = streamer_string_index (ob, s, len, persistent);
+   bp_pack_var_len_unsigned (bp, index);
+ }
+ 
+ 
+ /* Output the '\0' terminated STRING to the string
+table in OB.  Then put the index onto the bitpack BP.
+When PERSISTENT is set, the string S is supposed to not change during
+duration of the OB and thus OB can keep pointer into it.  */
+ 
+ void
+ bp_pack_string (struct output_block *ob, struct bitpack_d *bp,
+   const char *s, bool persistent)
+ {
+   unsigned index = 0;
+   if (s)
+ index = streamer_string_index (ob, s, strlen (s) + 1, persistent);
+   bp_pack_var_len_unsigned (bp, index);
+ }
+ 
+ 
+ 
  /* Write a zero to the output stream.  */
  
  void
Index: gcc/data-streamer-in.c
===
*** gcc/data-streamer-in.c  (revision 192451)
--- gcc/data-streamer-in.c  (working copy)
*** streamer_read_string (struct data_in *da
*** 86,91 
--- 86,120 
  }
  
  
+ /* Read a string from the string table in DATA_IN using the bitpack BP.
+Write the length to RLEN.  */
+ 
+ const char *
+ bp_unpack_indexed_string (struct data_in *data_in,
+ struct bitpack_d *bp, unsigned int *rlen)
+ {
+   return string_for_index (data_in, bp_unpack_var_len_unsigned (bp), rlen);
+ }
+ 
+ 
+ /* Read a NULL terminated string from the string table in DATA_IN.  */
+ 
+ const char *
+ bp_unpack_string (struct data_in *data_in, struct bitpack_d *bp)
+ {
+   unsigned int len;
+   const char *ptr;
+ 
+   ptr =

Re: [patch] [gcc/libgcc/ada/libstdc++] Match arm--linux-eabi for ARM Linux/GNU EABI

2012-10-15 Thread Matthias Klose

On 26.06.2012 11:10, Richard Earnshaw wrote:
> On 25/06/12 22:30, Matthias Klose wrote:
>> On 25.06.2012 18:21, Matthias Klose wrote:
>>> On 25.06.2012 15:22, Richard Earnshaw wrote:
 On 25/06/12 13:08, Matthias Klose wrote:
> gcc/config.gcc now allows matching arm*-*-linux-*eabi* instead of
> arm*-*-linux-*eabi for ARM Linux/GNU EABI.  This changes the matching in 
> various
> other places as well. arm-linux-gnueabihf is used as a triplet by some
> distributions.
>
> Ok for the trunk?
>

 now that all arm-linux ports are EABI conforming, why can't this just 
 become

arm*-*-linux*
 ?
>>>
>>> I assume it could. But I didn't check for other places where this would be 
>>> needed.
>>
>> $ grep -r 'arm[^_]*eabi' . |egrep -v 'ChangeLog|\.svn/'|wc -l
>> 87
>>
>> this seems to be a larger change, which should have been committed when the 
>> old
>> abi targets were deprecated.  I'd like to get the eabi* change in first.
>>
>>   Matthias
>>
>>
> 
> Removal of the FPA support is still ongoing, but beware that it doesn't
> mean that all supported ARM configurations will be EABI conforming (some
> configurations did not use the FPA and are thus not affected by this
> change); but all ARM Linux configurations will be.

Updated patch to just match arm*-*-linux*, searched for additional files with
grep -r 'arm[^_]*linux[^_]*eabi' . |egrep -v 'ChangeLog|\.svn/'

  Matthias




2012-10-15  Matthias Klose  

gcc/

* config.gcc: Match arm*-*-linux-* for ARM Linux/GNU.
* doc/install.texi:

gcc/testsuite/
2012-10-15  Matthias Klose  

* lib/target-supports.exp (check_profiling_available): Match
arm*-*-linux-* for ARM Linux/GNU.
* g++.dg/torture/predcom-1.C: Match arm*-*-linux-* for ARM Linux/GNU.
* gfortran.dg/enum_10.f90: Likewise.
* gfortran.dg/enum_9.f90: Likewise.
* gcc.target/arm/synchronize.c: Likewise.
* g++.old-deja/g++.jason/enum6.C: Likewise.
* g++.old-deja/g++.other/enum4.C: Likewise.
* g++.old-deja/g++.law/enum9.C: Likewise.

gcc/ada/
2012-10-15  Matthias Klose  

* gcc-interface/Makefile.in: Match arm*-*-linux-*eabi* for
ARM Linux/GNU.

libgcc/
2012-10-15  Matthias Klose  

* config.host: Match arm*-*-linux-* for ARM Linux/GNU.

libstdc++-v3/
2012-10-15  Matthias Klose  

* configure.host: Match arm*-*-linux-* for ARM Linux/GNU.
* testsuite/20_util/make_signed/requirements/typedefs-2.cc: Likewise.
* testsuite/20_util/make_unsigned/requirements/typedefs-2.cc: Likewise.

libjava/
2012-10-15  Matthias Klose  

* configure.ac: Match arm*-*-linux-* for ARM Linux/GNU.
* configure: Regenerate.

Index: libgcc/config.host
===
--- libgcc/config.host  (revision 192459)
+++ libgcc/config.host  (working copy)
@@ -316,7 +316,7 @@
 arm*-*-linux*) # ARM GNU/Linux with ELF
tmake_file="${tmake_file} arm/t-arm t-fixedpoint-gnu-prefix"
case ${host} in
-   arm*-*-linux-*eabi)
+   arm*-*-linux-*)
  tmake_file="${tmake_file} arm/t-elf arm/t-bpabi arm/t-linux-eabi 
t-slibgcc-libgcc"
  tm_file="$tm_file arm/bpabi-lib.h"
  unwind_header=config/arm/unwind-arm.h
Index: libjava/configure.ac
===
--- libjava/configure.ac(revision 192459)
+++ libjava/configure.ac(working copy)
@@ -931,7 +931,7 @@
 # on Darwin -single_module speeds up loading of the dynamic libraries.
 extra_ldflags_libjava=-Wl,-single_module
 ;;
-arm*linux*eabi)
+arm*-*-linux-*)
 # Some of the ARM unwinder code is actually in libstdc++.  We
 # could in principle replicate it in libgcj, but it's better to
 # have a dependency on libstdc++.
Index: gcc/testsuite/gfortran.dg/enum_10.f90
===
--- gcc/testsuite/gfortran.dg/enum_10.f90   (revision 192459)
+++ gcc/testsuite/gfortran.dg/enum_10.f90   (working copy)
@@ -1,7 +1,7 @@
 ! { dg-do run }
 ! { dg-additional-sources enum_10.c }
 ! { dg-options "-fshort-enums -w" }
-! { dg-options "-fshort-enums -w -Wl,--no-enum-size-warning" { target 
arm*-*-linux*eabi } }
+! { dg-options "-fshort-enums -w -Wl,--no-enum-size-warning" { target 
arm*-*-linux* } }
 ! Make sure short enums are indeed interoperable with the
 ! corresponding C type.
 
Index: gcc/testsuite/gfortran.dg/enum_9.f90
===
--- gcc/testsuite/gfortran.dg/enum_9.f90(revision 192459)
+++ gcc/testsuite/gfortran.dg/enum_9.f90(working copy)
@@ -1,6 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fshort-enums" }
-! { dg-options "-fshort-enums -Wl,--no-enum-size-warning" { target 
arm*-*-linux*eabi } }
+! { dg-options "-fshort-enums -Wl,--no-enum-size-warning" { target 
arm*-*-linux* } }
 ! Program to

Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Thread Rainer Orth

Eric Botcazou  writes:

>> > The only versions of the Solaris assembler I have access to only support
>> > v8plusX according to the man page.  Has that changed recently?
>> 
>> For the older stuff I mean doing something like "-m32 -xarch=v9X"
>
> OK, this is for fbe, not for as.  I think that the latter is always available 
> on the machines, but I'm not sure for the former.  Rainer very likely knows.

The assembler lives with the Studio compilers as fbe, but once in a
while that version is backported (or provided as a patch) as
/usr/ccs/bin/as.

For Solaris 10, the latest SPARC assembler patch, 118683-08, includes
-xarch=sparc4 support; rev. -07 didn't.

At least in Solaris 11.1, the bundled as supports it as well; I'm not
completely sure about Solaris 11, where it may only have been introduced
in an SRU (support repository update).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Thread Rainer Orth

David Miller  writes:

> Could one of you help me get the solaris side correct?  I made sure
> that binutils accepts the same options for this stuff, that's why
> I can unconditionally use '-xarch=sparc4' in the configure test.

I assume this works because gas 2.22 neither had SPARC-T4 support nor
accepted -xarch=sparc4?

> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index b6c049b..9d2eb29 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -3501,6 +3501,24 @@ foo:
> fnaddd %f10, %f12, %f14],,
>[AC_DEFINE(HAVE_AS_FMAF_HPC_VIS3, 1,
>  [Define if your assembler supports FMAF, HPC, and VIS 3.0 
> instructions.])])
> +
> +gcc_GAS_CHECK_FEATURE([SPARC4 instructions],
> +  gcc_cv_as_sparc_fmaf,,

Shouldn't reuse a cache variable here, but use
e.g. gcc_cv_as_sparc_sparc4 instead.

> +  [-xarch=sparc4],
> +  [.text
> +   .register %g2, #scratch
> +   .register %g3, #scratch
> +   .align 4
> +   cxbe %g2, %g3, 1f
> +1: cwbneg %g2, %g3, 1f
> +1: sha1
> +   md5
> +   aes_kexpand0 %f4, %f6, %f8
> +   des_round %f38, %f40, %f42, %f44
> +   camellia_f %f54, %f56, %f58, %f60
> +   kasumi_fi_xor %f46, %f48, %f50, %f52],,
> +  [AC_DEFINE(HAVE_AS_SPARC4, 1,
> +[Define if your assembler supports SPARC4 instructions.])])
>  ;;
>  
>  changequote(,)dnl

Rainer


Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-15 Thread Rainer Orth

Ian Lance Taylor  writes:

> On Fri, Oct 12, 2012 at 9:40 AM, Jakub Jelinek  wrote:
>>
>> I don't see how can their testcase be used if not converted to Dejagnu
>> though, most of their testcases are full of LLVM testcase markup
>> (// RUN, // CHECK*, etc.).  So, if we import the library unmodified,
>> we'll need to setup some directory under gcc/testsuite/ for asan tests
>> and arrange for *.exp to find the libraries.
>
> Yes, that's more or less what the Go testsuite is like.  I just have a
> complicated gcc/testsuite/go.test/go-test.exp that does the right
> thing to handle the Go test cases as DejaGNU test cases.

I'd prefer if LLVM would accept the (sometimes more expressive and, to
GCC maintainers, well-known) Dejagnu annotations in addition to their
own ones, so the .exp files could live in the gcc tree only, while the
testcases themselves would be imported from upstream as is.

The same holds for gcc/testsuite/go.*, btw.

Rainer


Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Thread David Miller

From: Eric Botcazou 
Date: Mon, 15 Oct 2012 10:00:02 +0200

>> > The only versions of the Solaris assembler I have access to only support
>> > v8plusX according to the man page.  Has that changed recently?
>> 
>> For the older stuff I mean doing something like "-m32 -xarch=v9X"
> 
> OK, this is for fbe, not for as.  I think that the latter is always available 
> on the machines, but I'm not sure for the former.  Rainer very likely knows.
> 
>> The current assembler in Solaris Studio (called 'fbe') calls this
>> stuff "sparc4" which I guess means "SPARC-T4 and later".
> 
> Ah, thanks.  I agree that using the same monikers is the right thing to do...
> 
>> I'm just calling it VIS4 in GCC so that we can export intrinsics of,
>> for example, the cryptographic instructions at some point using the
>> __VIS__ version CPP tests.
> 
> ...that's why I'm not sure we should invent VIS4 at this point.  How is this 
> done on the Solaris Studio side?  Couldn't we add a new architecture to the 
> compiler (-mcpu=sparc4, with -mcpu=niagara4 as first variant), and define 
> __sparc4__ for the preprocessor?

I've scanned the documentation and there is no indication of any preprocessor
predefines or anything like that.

And keep in mind that __VIS__ is our very own invention.

Sun's compilers never predefined this.

Their makefiles do for various targets in the MediaLib sources, but that's
a source tree and header file localized convention.

Sun also never provided intrinsics other than via assembler inlines in
their VIS header.  They were never compiler builtins like ours.  The
user had to define __VIS__ on the command line to get visibility of
the routines they wanted from Sun's VIS inline assembler header file.

Sun also does not provide, and is almost certainly never going to
provide, crypto intrinsics.

Therefore there is no convention to follow and we can do whatever we want
here.

Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Thread Rainer Orth

David Miller  writes:

>>> I'm just calling it VIS4 in GCC so that we can export intrinsics of,
>>> for example, the cryptographic instructions at some point using the
>>> __VIS__ version CPP tests.
>> 
>> ...that's why I'm not sure we should invent VIS4 at this point.  How is this 
>> done on the Solaris Studio side?  Couldn't we add a new architecture to the 
>> compiler (-mcpu=sparc4, with -mcpu=niagara4 as first variant), and define 
>> __sparc4__ for the preprocessor?
>
> I've scanned the documentation and there is no indication of any preprocessor
> predefines or anything like that.
>
> And keep in mind that __VIS__ is our very own invention.
>
> Sun's compilers never predefined this.

I've found uses of e.g. __VIS >= 0x200 in their Studio 12.3 headers
(prod/include/cc/sys/vis_*.h), and strings on the cc binary reveals

-D__VIS=0x100
-D__VIS=0x200
-D__VIS=0x300
-D__VIS=0x400

I haven't yet found out whether those are properly documented, though.

Rainer


Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Thread David Miller

From: Rainer Orth 
Date: Mon, 15 Oct 2012 16:28:15 +0200

> David Miller  writes:
> 
>> Could one of you help me get the solaris side correct?  I made sure
>> that binutils accepts the same options for this stuff, that's why
>> I can unconditionally use '-xarch=sparc4' in the configure test.
> 
> I assume this works because gas 2.22 had neither SPARC-T4 support nor
> did it accept -xarch=sparc4?

I have a mainline binutils build installed on my system which does
have support for all of these instructions, otherwise how would I have
been able to run the testsuite and see the cbcond instructions
working? :-)

>> diff --git a/gcc/configure.ac b/gcc/configure.ac
>> index b6c049b..9d2eb29 100644
>> --- a/gcc/configure.ac
>> +++ b/gcc/configure.ac
>> @@ -3501,6 +3501,24 @@ foo:
>> fnaddd %f10, %f12, %f14],,
>>[AC_DEFINE(HAVE_AS_FMAF_HPC_VIS3, 1,
>>  [Define if your assembler supports FMAF, HPC, and VIS 3.0 
>> instructions.])])
>> +
>> +gcc_GAS_CHECK_FEATURE([SPARC4 instructions],
>> +  gcc_cv_as_sparc_fmaf,,
> 
> Shouldn't reuse a cache variable here, but use
> e.g. gcc_cv_as_sparc_sparc4 instead.

Thanks for catching that.

Re: [RFC PATCH] Add support for sparc compare-and-branch.

2012-10-15 Thread David Miller

From: Rainer Orth 
Date: Mon, 15 Oct 2012 16:44:44 +0200

> David Miller  writes:
> 
 I'm just calling it VIS4 in GCC so that we can export intrinsics of,
 for example, the cryptographic instructions at some point using the
 __VIS__ version CPP tests.
>>> 
>>> ...that's why I'm not sure we should invent VIS4 at this point.  How is 
>>> this 
>>> done on the Solaris Studio side?  Couldn't we add a new architecture to the 
>>> compiler (-mcpu=sparc4, with -mcpu=niagara4 as first variant), and define 
>>> __sparc4__ for the preprocessor?
>>
>> I've scanned the documentation and there is no indication of any preprocessor
>> predefines or anything like that.
>>
>> And keep in mind that __VIS__ is our very own invention.
>>
>> Sun's compilers never predefined this.
> 
> I've found uses of e.g. __VIS >= 0x200 in their Studio 12.3 headers
> (prod/include/cc/sys/vis_*.h), and strings on the cc binary reveals
> 
> -D__VIS=0x100
> -D__VIS=0x200
> -D__VIS=0x300
> -D__VIS=0x400
> 
> I haven't yet found if those are properly documented, though.

Strange, ok.

[C++ Patch] PR 50080

2012-10-15 Thread Paolo Carlini


Hi,

Thus, if I understand the resolution of Core/468 [CD1] correctly, we can 
simplify the parser a bit and just accept this use of 'template' outside 
templates. Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2012-10-15  Paolo Carlini  

PR c++/50080
* parser.c (cp_parser_optional_template_keyword): Implement
Core/468, allow outside template.

/testsuite
2012-10-15  Paolo Carlini  

PR c++/50080
* g++.dg/parse/tmpl-outside2.C: New.
* g++.dg/parse/tmpl-outside1.C: Adjust.
* g++.dg/template/qualttp18.C: Likewise.
* g++.old-deja/g++.pt/memtemp87.C: Likewise.
* g++.old-deja/g++.pt/overload13.C: Likewise.
Index: testsuite/g++.old-deja/g++.pt/memtemp87.C
===
--- testsuite/g++.old-deja/g++.pt/memtemp87.C   (revision 192455)
+++ testsuite/g++.old-deja/g++.pt/memtemp87.C   (working copy)
@@ -12,5 +12,4 @@ class Q {
 template class>
 class Y {
 };
-Q::template X x; // { dg-error "" } template syntax
-
+Q::template X x;
Index: testsuite/g++.old-deja/g++.pt/overload13.C
===
--- testsuite/g++.old-deja/g++.pt/overload13.C  (revision 192455)
+++ testsuite/g++.old-deja/g++.pt/overload13.C  (working copy)
@@ -7,5 +7,5 @@ struct A {
 int main ()
 {
   A a;
-  return a.template f (0); // { dg-error "" } 
+  return a.template f (0);
 }
Index: testsuite/g++.dg/parse/tmpl-outside2.C
===
--- testsuite/g++.dg/parse/tmpl-outside2.C  (revision 0)
+++ testsuite/g++.dg/parse/tmpl-outside2.C  (working copy)
@@ -0,0 +1,19 @@
+// PR c++/50080
+
+template 
+struct A
+{
+  template 
+  struct B {};
+};
+
+template 
+void test()
+{
+  typename A::template B b;
+}
+
+int main()
+{
+  typename A::template B b;
+}
Index: testsuite/g++.dg/parse/tmpl-outside1.C
===
--- testsuite/g++.dg/parse/tmpl-outside1.C  (revision 192455)
+++ testsuite/g++.dg/parse/tmpl-outside1.C  (working copy)
@@ -7,4 +7,4 @@ struct X
template  struct Y {};
 };
 
-typedef X::template Y<0> y; // { dg-error "template|invalid" }
+typedef X::template Y<0> y;
Index: testsuite/g++.dg/template/qualttp18.C
===
--- testsuite/g++.dg/template/qualttp18.C   (revision 192455)
+++ testsuite/g++.dg/template/qualttp18.C   (working copy)
@@ -14,7 +14,7 @@ template  class TT> struct X
 
 struct C
 {
-   X x; // { dg-error "" }
+   X x;
 };
 
 int main()
Index: cp/parser.c
===
--- cp/parser.c (revision 192455)
+++ cp/parser.c (working copy)
@@ -23252,29 +23252,10 @@ cp_parser_optional_template_keyword (cp_parser *pa
 {
   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TEMPLATE))
 {
-  /* The `template' keyword can only be used within templates;
-outside templates the parser can always figure out what is a
-template and what is not.  */
-  if (!processing_template_decl)
-   {
- cp_token *token = cp_lexer_peek_token (parser->lexer);
- error_at (token->location,
-   "% (as a disambiguator) is only allowed "
-   "within templates");
- /* If this part of the token stream is rescanned, the same
-error message would be generated.  So, we purge the token
-from the stream.  */
- cp_lexer_purge_token (parser->lexer);
- return false;
-   }
-  else
-   {
- /* Consume the `template' keyword.  */
- cp_lexer_consume_token (parser->lexer);
- return true;
-   }
+  /* Consume the `template' keyword.  */
+  cp_lexer_consume_token (parser->lexer);
+  return true;
 }
-
   return false;
 }

Re: [patch] [gcc/libgcc/ada/libstdc++] Match arm--linux-eabi for ARM Linux/GNU EABI

2012-10-15 Thread Richard Earnshaw


On 15/10/12 15:20, Matthias Klose wrote:

On 26.06.2012 11:10, Richard Earnshaw wrote:

On 25/06/12 22:30, Matthias Klose wrote:

On 25.06.2012 18:21, Matthias Klose wrote:

On 25.06.2012 15:22, Richard Earnshaw wrote:

On 25/06/12 13:08, Matthias Klose wrote:

gcc/config.gcc now allows matching arm*-*-linux-*eabi* instead of
arm*-*-linux-*eabi for ARM Linux/GNU EABI.  This changes the matching in various
other places as well. arm-linux-gnueabihf is used as a triplet by some
distributions.

Ok for the trunk?



now that all arm-linux ports are EABI conforming, why can't this just become

arm*-*-linux*
?


I assume it could. But I didn't check for other places where this would be 
needed.


$ grep -r 'arm[^_]*eabi' . |egrep -v 'ChangeLog|\.svn/'|wc -l
87

this seems to be a larger change, which should have been committed when the old
abi targets were deprecated.  I'd like to get the eabi* change in first.

   Matthias




Removal of the FPA support is still ongoing, but beware that it doesn't
mean that all supported ARM configurations will be EABI conforming (some
configurations did not use the FPA and are thus not affected by this
change); but all ARM Linux configurations will be.


Updated patch to just match arm*-*-linux*, searched for additional files with
grep -r 'arm[^_]*linux[^_]*eabi' . |egrep -v 'ChangeLog|\.svn/'

   Matthias




arm-linux-triplet.diff


2012-10-15  Matthias Klose  

gcc/

* config.gcc: Match arm*-*-linux-* for ARM Linux/GNU.
* doc/install.texi:



Your ChangeLog entry is incomplete.

Otherwise, OK.

R.

RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Gary Funck

We have maintained the gupc (GNU Unified Parallel C) branch for
a couple of years now, and would like to merge these changes into
the GCC trunk.

It is our goal to integrate the GUPC changes into the GCC 4.8
trunk, in order to provide a UPC (Unified Parallel C) capability
in the subsequent GCC 4.8 release.

The purpose of this note is to introduce the GUPC project,
provide an overview of the UPC-related changes and to introduce
the subsequent sets of patches which merge the GUPC branch into
GCC 4.8.

For reference,

The GUPC project page is here:
http://gcc.gnu.org/projects/gupc.html

The current GUPC release is distributed here:
http://gccupc.org

Roughly a year ago, we described the front-end related
changes at the time:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html

We merge the GCC trunk into the gupc branch on approximately
a weekly basis.  The current GUPC branch is based upon a recent
version of the GCC trunk (192449 dated 2012-10-15), and has
been bootstrapped on x86_64/i686 Linux, PPC/POWER7/Linux and
IA64/Altix Linux. In earlier versions, GUPC was successfully
ported to SGI/MIPS (big endian) and SciCortex/MIPS (little endian).

The UPC-related source code differences
can be viewed here in various formats:
  http://gccupc.org/gupc-changes

In the discussion below, the changes are
excerpted in order to highlight important
aspects of the UPC-related changes.  The version used in
this presentation is 190707.

UPC's Shared Qualifier and Layout Qualifier
---

The UPC language specification describes
the language syntax and semantics:
  http://upc.gwu.edu/docs/upc_specs_1.2.pdf

UPC introduces a new qualifier, "shared",
that indicates that the qualified object
is located in a global shared address space
that is accessible by all UPC threads.
Additional qualifiers ("strict" and "relaxed")
further specify the semantics of accesses to
UPC shared objects.

In UPC, a shared qualified array can further
specify a "layout qualifier" that indicates
how the shared data is blocked and distributed.

There are two language pre-defined identifiers
that indicate the number of threads that
will be created when the program starts (THREADS)
and the current (zero-based) thread number
(MYTHREAD).  Typically, a UPC thread is implemented
as an operating system process.  Access to UPC
shared memory may be implemented locally via
OS provided facilities (for example, mmap),
or across nodes via a high-speed network
interconnect (for example, InfiniBand).

GUPC provides a runtime (libgupc) that targets
an SMP-based system and uses mmap() to implement
global shared memory.  
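The mmap-based approach can be illustrated with a small POSIX sketch. It is not libgupc's code. Processes stand in for UPC threads, the loop variable mythread plays the role of MYTHREAD, and run_threads and MAX_THREADS are hypothetical. Because the anonymous mapping is created with MAP_SHARED before fork(), every child's store is visible to the parent.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_THREADS 64

/* Start NTHREADS processes that share one anonymous MAP_SHARED region.
   Each "thread" writes only its own slot (mythread + 1); the parent
   then sums the slots.  Returns -1 on failure.  */
static long run_threads (int nthreads)
{
  assert (nthreads > 0 && nthreads <= MAX_THREADS);
  size_t size = MAX_THREADS * sizeof (long);
  long *slots = mmap (NULL, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (slots == MAP_FAILED)
    return -1;

  int started = 0;
  for (int mythread = 0; mythread < nthreads; mythread++)
    {
      pid_t pid = fork ();
      if (pid < 0)
        break;
      if (pid == 0)
        {
          slots[mythread] = mythread + 1; /* lands in the shared pages */
          _exit (0);
        }
      started++;
    }

  for (int i = 0; i < started; i++)
    wait (NULL);                          /* reap every child we started */

  long sum = -1;
  if (started == nthreads)
    {
      sum = 0;
      for (int i = 0; i < nthreads; i++)
        sum += slots[i];
    }
  munmap (slots, size);
  return sum;
}
```

With four processes the parent sees 1 + 2 + 3 + 4, even though none of those stores was made in the parent's own address space.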

Optionally, GUPC can use the more general and
more capable Berkeley UPCR runtime:
  http://upc.lbl.gov/download/source.shtml#runtime
The UPCR runtime supports a number of network
topologies, and has been ported to most of the
current High Performance Computing (HPC) systems.

The following example illustrates
the use of the UPC "shared" qualifier
combined with a layout qualifier.

#define BLKSIZE 5
#define N_PER_THREAD (4 * BLKSIZE)
shared [BLKSIZE] double A[N_PER_THREAD*THREADS];

Above, the "[BLKSIZE]" construct is the UPC
layout qualifier; it specifies that the shared
array, A, distributes its elements across
the threads in blocks of 5 elements.  If the
program is run with two threads, then A is
distributed as shown below:

Thread 0     Thread 1
---------    ---------
A[ 0.. 4]    A[ 5.. 9]
A[10..14]    A[15..19]
A[20..24]    A[25..29]
A[30..34]    A[35..39]

Above, the elements shown for thread 0
are defined as having "affinity" to thread 0.
Similarly, those elements shown for thread 1
have affinity to thread 1.  In UPC, a pointer
to a shared object can be cast to a thread
local pointer (a "C" pointer), when the
designated shared object has affinity
to the referencing thread.
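The layout in the table follows simple block-cyclic arithmetic. The plain-C model below is not UPC syntax and not GUPC runtime code, and affinity and local_index are hypothetical names. It computes the thread with affinity to A[i] and the element's index within that thread's local portion. With BLKSIZE 5 and two threads it reproduces the table; for example, A[12] has affinity to thread 0 at local index 7.

```c
#include <assert.h>
#include <stddef.h>

/* Thread with affinity to element I of a shared [BLKSIZE] array:
   blocks of BLKSIZE elements are dealt round-robin across THREADS.  */
static size_t affinity (size_t i, size_t blksize, size_t threads)
{
  assert (blksize > 0 && threads > 0);
  return (i / blksize) % threads;
}

/* Index of element I within its owning thread's local portion.  */
static size_t local_index (size_t i, size_t blksize, size_t threads)
{
  assert (blksize > 0 && threads > 0);
  size_t block = i / blksize;           /* global block number */
  size_t local_block = block / threads; /* which of that thread's blocks */
  return local_block * blksize + i % blksize;
}
```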

A UPC "pointer-to-shared" (PTS) is a pointer
that references a UPC shared object.
A UPC pointer-to-shared is a "fat" pointer
with the following logical fields:
   (virt_addr, thread, offset)

The virtual address (virt_addr) field is combined with
the thread number (thread) and offset within the
block (offset), to derive the location of the
referenced object within the UPC shared address space.

GUPC implements pointer-to-shared objects using
either a "packed" representation or a "struct"
representation.  The user can select the
pointer-to-shared representation with a "configure"
parameter.  The packed representation is the default.

The "packed" pointer-to-shared representation
limits the range of the various fields within
the pointer-to-shared in order to gain efficiency.
Packed pointer-to-shared values encode the
three-part shared address (described above) as a
64-bit value (on both 64-bit and 32-bit platforms).

The "struct" representation provides a wider
addressing range at the expense of requiring
twice the number of bits (128) needed to encode
the pointer-to-shared value.
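The trade-off between the two representations can be sketched as bit packing. The field widths (PTS_OFFSET_BITS, PTS_THREAD_BITS, PTS_VADDR_BITS) and the field order are hypothetical, chosen only to sum to 64; the message does not state GUPC's actual layout. The unpacked struct occupies 128 bits on common ABIs, which corresponds to the "struct" representation described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths; they sum to 64.  */
#define PTS_OFFSET_BITS 20
#define PTS_THREAD_BITS 10
#define PTS_VADDR_BITS  34

/* Unpacked (virt_addr, thread, offset) triple, 128 bits in total.  */
struct pts
{
  uint64_t virt_addr;
  uint32_t thread;
  uint32_t offset;
};

static uint64_t pts_pack (struct pts p)
{
  /* Packing only works if every field fits its reduced width.  */
  assert (p.offset < ((uint64_t) 1 << PTS_OFFSET_BITS));
  assert (p.thread < ((uint64_t) 1 << PTS_THREAD_BITS));
  assert (p.virt_addr < ((uint64_t) 1 << PTS_VADDR_BITS));
  return (p.virt_addr << (PTS_THREAD_BITS + PTS_OFFSET_BITS))
         | ((uint64_t) p.thread << PTS_OFFSET_BITS)
         | (uint64_t) p.offset;
}

static struct pts pts_unpack (uint64_t v)
{
  struct pts p;
  p.offset = (uint32_t) (v & (((uint64_t) 1 << PTS_OFFSET_BITS) - 1));
  p.thread = (uint32_t) ((v >> PTS_OFFSET_BITS)
                         & (((uint64_t) 1 << PTS_THREAD_BITS) - 1));
  p.virt_addr = v >> (PTS_THREAD_BITS + PTS_OFFSET_BITS);
  return p;
}
```

The asserts in pts_pack mark exactly the range limits the packed form trades for its smaller size.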

UPC-Related Front-End Chan

Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Richard Biener

On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck  wrote:
> We have maintained the gupc (GNU Unified Parallel C) branch for
> a couple of years now, and would like to merge these changes into
> the GCC trunk.
>
> It is our goal to integrate the GUPC changes into the GCC 4.8
> trunk, in order to provide a UPC (Unified Parallel C) capability
> in the subsequent GCC 4.8 release.
>
> The purpose of this note is to introduce the GUPC project,
> provide an overview of the UPC-related changes and to introduce
> the subsequent sets of patches which merge the GUPC branch into
> GCC 4.8.
>
> For reference,
>
> The GUPC project page is here:
> http://gcc.gnu.org/projects/gupc.html
>
> The current GUPC release is distributed here:
> http://gccupc.org
>
> Roughly a year ago, we described the front-end related
> changes at the time:
> http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html
>
> We merge the GCC trunk into the gupc branch on approximately
> a weekly basis.  The current GUPC branch is based upon a recent
> version of the GCC trunk (192449 dated 2012-10-15), and has
> been bootstrapped on x86_64/i686 Linux, PPC/POWER7/Linux and
> IA64/Altix Linux. In earlier versions, GUPC was successfully
> ported to SGI/MIPS (big endian) and SciCortex/MIPS (little endian).
>
> The UPC-related source code differences
> can be viewed here in various formats:
>   http://gccupc.org/gupc-changes
>
> In the discussion below, the changes are
> excerpted in order to highlight important
> aspects of the UPC-related changes.  The version used in
> this presentation is 190707.
>
> UPC's Shared Qualifier and Layout Qualifier
> ---
>
> The UPC language specification describes
> the language syntax and semantics:
>   http://upc.gwu.edu/docs/upc_specs_1.2.pdf
>
> UPC introduces a new qualifier, "shared"
> that indicates that the qualified object
> is located in a global shared address space
> that is accessible by all UPC threads.
> Additional qualifiers ("strict" and "relaxed")
> further specify the semantics of accesses to
> UPC shared objects.
>
> In UPC, a shared qualified array can further
> specify a "layout qualifier" that indicates
> how the shared data is blocked and distributed.
>
> There are two language pre-defined identifiers
> that indicate the number of threads that
> will be created when the program starts (THREADS)
> and the current (zero-based) thread number
> (MYTHREAD).  Typically, a UPC thread is implemented
> as an operating system process.  Access to UPC
> shared memory may be implemented locally via
> OS provided facilities (for example, mmap),
> or across nodes via a high speed network
> inter-connect (for example, Infiniband).
>
> GUPC provides a runtime (libgupc) that targets
> an SMP-based system and uses mmap() to implement
> global shared memory.
>
> Optionally, GUPC can use the more general and
> more capable Berkeley UPCR runtime:
>   http://upc.lbl.gov/download/source.shtml#runtime
> The UPCR runtime supports a number of network
> topologies, and has been ported to most of the
> current High Performance Computing (HPC) systems.
>
> The following example illustrates
> the use of the UPC "shared" qualifier
> combined with a layout qualifier.
>
> #define BLKSIZE 5
> #define N_PER_THREAD (4 * BLKSIZE)
> shared [BLKSIZE] double A[N_PER_THREAD*THREADS];
>
> Above the "[BLKSIZE]" construct is the UPC
> layout factor; this specifies that the shared
> array, A, distributes its elements across
> each thread in blocks of 5 elements.  If the
> program is run with two threads, then A is
> distributed as shown below:
>
> Thread 0    Thread 1
> --------    --------
> A[ 0.. 4]   A[ 5.. 9]
> A[10..14]   A[15..19]
> A[20..24]   A[25..29]
> A[30..34]   A[35..39]
>
> Above, the elements shown for thread 0
> are defined as having "affinity" to thread 0.
> Similarly, those elements shown for thread 1
> have affinity to thread 1.  In UPC, a pointer
> to a shared object can be cast to a thread
> local pointer (a "C" pointer), when the
> designated shared object has affinity
> to the referencing thread.
>
> A UPC "pointer-to-shared" (PTS) is a pointer
> that references a UPC shared object.
> A UPC pointer-to-shared is a "fat" pointer
> with the following logical fields:
>    (virt_addr, thread, offset)
>
> The virtual address (virt_addr) field is combined with
> the thread number (thread) and offset within the
> block (offset), to derive the location of the
> referenced object within the UPC shared address space.
>
> GUPC implements pointer-to-shared objects using
> either a "packed" representation or a "struct"
> representation.  The user can select the
> pointer-to-shared representation with a "configure"
> parameter.  The packed representation is the default.
>
> The "packed" pointer-to-shared representation
> limits the range of the various fields within
> the pointer-to-shared in order to gain efficiency.
> Packed pointer-to-shared values encode the thr

Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #4

2012-10-15 Thread Joseph S. Myers

On Fri, 12 Oct 2012, Michael Meissner wrote:

> I decided to see if it was possible to simplify the change over by adding
> another flag word in the .opt handling to give the old names (TARGET_ and
> MASK_).  For Joseph Myers and Neil Booth, the issue is when changing all
> of the switches that use Mask(xxx) and InverseMask(xxx) to also use Var(xxx),
> the option machinery changes the names of the macros to OPTION_ and
> OPTION_MASK_, which in turn causes lots and lots of changes for patch
> review.  Some can't be omitted, where we referred to the 'target_flags' and
> 'target_flags_explicit' fields, but at least it reduces the number of other
> changes.

I think changing the names is safer - it's immediately obvious as a build 
failure if you missed anything.  If you have MASK_* names for bits in more 
than one flags variable, there's a risk of accidentally testing a bit in 
the wrong variable, or ORing together bits that belong in different 
variables in a way that can't possibly work, without this causing 
immediately visible problems.  Maybe you're actually only using the names 
for a single variable, but it still seems error-prone for future changes.

I guess TARGET_* names should be safe in a way that MASK_* ones aren't for 
multiple variables - but then I wouldn't have options to do things two 
different ways, but instead use TARGET_* instead of OPTION_* and fix 
existing uses of OPTION_* for such bits.

I don't know if with C++ it's possible to keep the names the same *and* 
ensure that compile time errors occur if bits from different variables are 
used together or a bit is used with the wrong variable *and* avoid any 
other issues occurring as a consequence of such changes.
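One way to get the compile-time checking described above, even in plain C, is to give each flag word and its masks distinct struct types. Testing a bit against the wrong word then fails to compile. The sketch below is hypothetical, and none of these names exist in GCC. It also does not keep the existing TARGET_*/MASK_* spellings as plain integers, which is exactly the trade-off under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* One wrapper type per flag word and one per mask kind.  */
struct flags1 { uint32_t bits; };
struct mask1  { uint32_t bit; };
struct flags2 { uint32_t bits; };
struct mask2  { uint32_t bit; };

#define MASK1_FOO ((struct mask1) { 1u << 0 })
#define MASK1_BAR ((struct mask1) { 1u << 1 })
#define MASK2_BAZ ((struct mask2) { 1u << 0 })

/* Each helper accepts only the matching word/mask pair, so a call such
   as test1 (some_flags2, MASK1_FOO) or test2 (f, MASK1_FOO) is a
   compile-time type error rather than a silent wrong-bit test.  */
static int
test1 (struct flags1 f, struct mask1 m)
{
  return (f.bits & m.bit) != 0;
}

static int
test2 (struct flags2 f, struct mask2 m)
{
  return (f.bits & m.bit) != 0;
}

static struct flags1
set1 (struct flags1 f, struct mask1 m)
{
  f.bits |= m.bit;
  return f;
}

static struct flags2
set2 (struct flags2 f, struct mask2 m)
{
  f.bits |= m.bit;
  return f;
}
```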

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-15 Thread Ian Lance Taylor

On Mon, Oct 15, 2012 at 7:31 AM, Rainer Orth
 wrote:
> Ian Lance Taylor  writes:
>
>> On Fri, Oct 12, 2012 at 9:40 AM, Jakub Jelinek  wrote:
>>>
>>> I don't see how can their testcase be used if not converted to Dejagnu
>>> though, most of their testcases are full of LLVM testcase markup
>>> (// RUN, // CHECK*, etc.).  So, if we import the library unmodified,
>>> we'll need to setup some directory under gcc/testsuite/ for asan tests
>>> and arrange for *.exp to find the libraries.
>>
>> Yes, that's more or less what the Go testsuite is like.  I just have a
>> complicated gcc/testsuite/go.test/go-test.exp that does the right
>> thing to handle the Go test cases as DejaGNU test cases.
>
> I'd prefer if LLVM would accept the (sometimes more expressive and, to
> GCC maintainers, well-known) Dejagnu annotations in addition to their
> own ones, so the .exp files could live in the gcc tree only, while the
> testcases themselves would be imported from upstream as is.
>
> The same holds for gcc/testsuite/go.*, btw.

In my opinion, supporting the full range of GCC testsuite annotations
means imposing a lot of mechanism that the Go testsuite does not
require.  It would complicate the Go testsuite for no benefit.
Anybody who can understand the GCC testsuite annotations can
understand the much simpler Go testsuite annotations.

Ian

Ping^2 Re: Defining C99 predefined macros for whole translation unit

2012-10-15 Thread Joseph S. Myers

Ping^2.  This patch 
 (non-C parts) is 
still pending review.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Gary Funck

On 10/15/12 17:51:14, Richard Guenther wrote:
> On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck  wrote:
[...]
> > UPC-Related Front-End Changes
> > -
> >
> > GCC's internal tree representation is
> > extended to record the UPC "shared",
> > "strict", "relaxed" qualifiers,
> > and the layout qualifier.
[...]
> 
> What immediately comes to my mind is that apart from parsing
> the core machinery should be shareable with Cilk+, no?

I haven't looked at Cilk in detail, but my understanding is
that Cilk and UPC have different runtime models, and clearly
different language syntax and semantics.  Perhaps those
knowledgeable in the Cilk implementation can comment further.

- Gary

Re: [asan] Emit GIMPLE directly, small cleanups

2012-10-15 Thread Diego Novillo

On Mon, Oct 15, 2012 at 11:55 AM, Ian Lance Taylor  wrote:

> In my opinion, supporting the full range of GCC testsuite annotations
> means imposing a lot of mechanism that the Go testsuite does not
> require.  It would complicate the Go testsuite for no benefit.
> Anybody who can understand the GCC testsuite annotations can
> understand the much simpler Go testsuite annotations.

Agreed.  The fact that we have to suffer DejaGNU does not gives the
right to make other projects miserable.

Diego.

Re: [C++ Patch] PR 50080

2012-10-15 Thread Jason Merrill


OK.

Jason

Re: [PATCH] Rs6000 infrastructure cleanup (switches), revised patch #4

2012-10-15 Thread Michael Meissner

On Mon, Oct 15, 2012 at 03:52:01PM +, Joseph S. Myers wrote:
> On Fri, 12 Oct 2012, Michael Meissner wrote:
> 
> > I decided to see if it was possible to simplify the change over by adding
> > another flag word in the .opt handling to give the old names (TARGET_ and
> > MASK_).  For Joseph Myers and Neil Booth, the issue is when changing all
> > of the switches that use Mask(xxx) and InverseMask(xxx) to also use Var(xxx),
> > the option machinery changes the names of the macros to OPTION_ and
> > OPTION_MASK_, which in turn causes lots and lots of changes for patch
> > review.  Some can't be omitted, where we referred to the 'target_flags' and
> > 'target_flags_explicit' fields, but at least it reduces the number of other
> > changes.
> 
> I think changing the names is safer - it's immediately obvious as a build 
> failure if you missed anything.  If you have MASK_* names for bits in more 
> than one flags variable, there's a risk of accidentally testing a bit in 
> the wrong variable, or ORing together bits that belong in different 
> variables in a way that can't possibly work, without this causing 
> immediately visible problems.  Maybe you're actually only using the names 
> for a single variable, but it still seems error-prone for future changes.

Well to be safest, we should have a prefix for each word if you define more
than one flag word.  Preferably a name that the user can specify in the .opt
file.

> I guess TARGET_* names should be safe in a way that MASK_* ones aren't for 
> multiple variables - but then I wouldn't have options to do things two 
> different ways, but instead use TARGET_* instead of OPTION_* and fix 
> existing uses of OPTION_* for such bits.

I can see the MASK/OPTION_MASK thing, but not having TARGET_* means there are
lots and lots of code changes.

Unfortunately in order to bring the number of changes down to a point where the
patches can be reviewed, my previous patches did:

#define TARGET_FOO OPTION_FOO
#define MASK_FOO OPTION_MASK_FOO

> I don't know if with C++ it's possible to keep the names the same *and* 
> ensure that compile time errors occur if bits from different variables are 
> used together or a bit is used with the wrong variable *and* avoid any 
> other issues occurring as a consequence of such changes.

I would like a way to delete the target_flags field if we don't define any
flags using it (it would affect the pch stuff that preserves and checks the
target_flags).

David and I have talked about moving to accessor macros.  I'm thinking of
something like:

mfoo
Target Report Mask(FOO) SetFunction ExplicitFunction TargetName

If TargetName were defined, it would use TARGET_ instead of OPTION_,
but the OPTION_MASK_ would not be changed.

If SetFunction was defined, the opt*.awk files would generate:

#define SET_FOO(VALUE)                  \
do {                                    \
  if (VALUE)                            \
    target_flags |= MASK_FOO;           \
  else                                  \
    target_flags &= ~MASK_FOO;          \
} while (0)

If ExplicitFunction was defined, the opt*.awk files would generate:

#define EXPLICIT_FOO(VALUE) \
  ((global_options_set.x_target_flags & MASK_FOO) != 0)

And then I would change options a few at a time.  When I've converted all of
the options, I would then go back to adding the Var(yyy) options, but the
SET_ and EXPLICIT_ would not change (or it could key off of
TargetName).
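Here is a self-contained sketch of what the proposed SetFunction and ExplicitFunction output might look like for a single Mask(FOO) option. target_flags and global_options_set are local stand-ins, not GCC's declarations. The macros are hand-written, not opt*.awk output. EXPLICIT_FOO takes no argument here because the VALUE parameter in the proposal is unused.

```c
#include <assert.h>

/* Stand-ins for GCC's option state; only the field used by the
   proposed EXPLICIT_ macro is modelled.  */
struct gcc_options_sketch { int x_target_flags; };
static int target_flags;
static struct gcc_options_sketch global_options_set;

#define MASK_FOO   (1 << 0)
#define TARGET_FOO ((target_flags & MASK_FOO) != 0)

/* Possible SetFunction output: a true VALUE turns the option on.  */
#define SET_FOO(VALUE)                  \
  do {                                  \
    if (VALUE)                          \
      target_flags |= MASK_FOO;         \
    else                                \
      target_flags &= ~MASK_FOO;        \
  } while (0)

/* Possible ExplicitFunction output: was -mfoo or -mno-foo given?  */
#define EXPLICIT_FOO \
  ((global_options_set.x_target_flags & MASK_FOO) != 0)
```

With these definitions, SET_FOO (1) makes TARGET_FOO true. EXPLICIT_FOO becomes true only once the bit is recorded in global_options_set, independently of the option's current value.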

How would you feel about SetFunction, ExplicitFunction, and the reduced
TargetName?

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899

Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue

2012-10-15 Thread Ulrich Weigand

Uros Bizjak wrote:
> On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand  wrote:
> > I was wondering if the i386 port maintainers could have a look at this
> > pattern.  Shouldn't we really have two patterns, one to *load* an unaligned
> > value and one to *store* an unaligned value, and not permit that memory
> > access to get reloaded?
> 
> Please find attached a fairly mechanical patch that splits
> move_unaligned pattern into load_unaligned and store_unaligned
> patterns. We've had some problems with this pattern, and finally we
> found the reason to make unaligned moves more robust.
> 
> I will wait for the confirmation that attached patch avoids the
> failure you are seeing with your reload patch.

Yes, this patch does in fact fix the failure I was seeing with the
reload patch.  (A full regression test shows a couple of extra fails:
FAIL: gcc.target/i386/avx256-unaligned-load-1.c scan-assembler sse_movups/1
FAIL: gcc.target/i386/avx256-unaligned-load-3.c scan-assembler sse2_movupd/1
FAIL: gcc.target/i386/avx256-unaligned-load-4.c scan-assembler avx_movups256/1
FAIL: gcc.target/i386/avx256-unaligned-store-4.c scan-assembler avx_movups256/2
But I guess these tests simply need to be updated for the new pattern names.)

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com

[patch] Use hard_reg_set_iterator in a few places

2012-10-15 Thread Steven Bosscher

Hello,

GCC has hard_reg_set_iterator to iterate quickly over a HARD_REG_SET,
but it's not used a lot. Attached patch makes a few files use it to
iterate over regs_invalidated_by_call. If this is OK, I'd like to
convert loops over HARD_REG_SETs to iterators where possible.
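For readers unfamiliar with the idea, the sketch below shows why an iterator beats testing every register number: it jumps straight from one set bit to the next. This is a stand-alone C analogue using the GCC builtin __builtin_ctzll. regset_sketch, regset_next and regset_count are hypothetical names, and this is not the hard_reg_set_iterator interface itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NWORDS 2  /* 128 "registers" */

typedef struct { uint64_t w[SKETCH_NWORDS]; } regset_sketch;

/* Return the smallest register number >= START in S, or -1.  */
static int
regset_next (const regset_sketch *s, int start)
{
  assert (start >= 0);
  for (int word = start / 64; word < SKETCH_NWORDS; word++)
    {
      uint64_t bits = s->w[word];
      if (word == start / 64)
        bits &= ~UINT64_C (0) << (start % 64);  /* drop bits below START */
      if (bits != 0)
        return word * 64 + __builtin_ctzll (bits);  /* lowest set bit */
    }
  return -1;
}

/* Iteration idiom: visit each member once, in increasing order.  */
static int
regset_count (const regset_sketch *s)
{
  int n = 0;
  for (int r = regset_next (s, 0); r >= 0; r = regset_next (s, r + 1))
    n++;
  return n;
}
```

Whole zero words are skipped after one test, so the cost grows with the number of members rather than with the number of registers.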

Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven

Re: RFC: LRA for x86/x86-64 [7/9] -- continuation

2012-10-15 Thread Richard Sandiford

Hi Vlad,

Some comments about the rest of LRA.  Nothing major here...

Vladimir Makarov  writes:
> +/* Info about register in an insn.  */
> +struct lra_insn_reg
> +{
> +  /* The biggest mode through which the insn refers to the register
> + (remember the register can be accessed through a subreg in the
> + insn).  */
> +  ENUM_BITFIELD(machine_mode) biggest_mode : 16;

AFAICT, this is actually always the mode of a specific reference,
and if there are references to the same register in different modes,
those references get their own lra_insn_regs.  "mode" might be better
than "biggest_mode" if so.

> +/* Static part (common info for insns with the same ICODE) of LRA
> +   internal insn info.   It exists in at most one exemplar for each
> +   non-negative ICODE.   Warning: if the structure definition is
> +   changed, the initializer for debug_insn_static_data in lra.c should
> +   be changed too.  */

Probably worth saying (before the warning) that there is also
one structure for each asm.

> +/* LRA internal info about an insn (LRA internal insn
> +   representation).  */
> +struct lra_insn_recog_data
> +{
> +  int icode; /* The insn code.   */
> +  rtx insn; /* The insn itself.   */
> +  /* Common data for insns with the same ICODE.   */
> +  struct lra_static_insn_data *insn_static_data;

Maybe worth mentioning asms here too.

> +  /* Two arrays of size correspondingly equal to the operand and the
> + duplication numbers: */
> +  rtx **operand_loc; /* The operand locations, NULL if no operands.  */
> +  rtx **dup_loc; /* The dup locations, NULL if no dups.   */
> +  /* Number of hard registers implicitly used in given call insn.  The
> + value can be NULL or points to array of the hard register numbers
> + ending with a negative value.  */
> +  int *arg_hard_regs;
> +#ifdef HAVE_ATTR_enabled
> +  /* Alternative enabled for the insn.   NULL for debug insns.  */
> +  bool *alternative_enabled_p;
> +#endif
> +  /* The alternative should be used for the insn, -1 if invalid, or we
> + should try to use any alternative, or the insn is a debug
> + insn.  */
> +  int used_insn_alternative;
> +  struct lra_insn_reg *regs;  /* Always NULL for a debug insn.   */

Comments consistently above the field.

> +extern void lra_expand_reg_info (void);

This doesn't exist any more.

> +extern int lra_constraint_new_insn_uid_start;

Just saying in case: this seems to be write-only, with lra-constraints.c
instead using a static variable to track the uid start.

I realise you might want to keep it anyway for consistency with
lra_constraint_new_regno_start, or for debugging.

> +extern rtx lra_secondary_memory[NUM_MACHINE_MODES];

This doesn't exist any more.

> +/* lra-saves.c: */
> +
> +extern bool lra_save_restore (void);

Same for this file & function.

> +/* The function returns TRUE if at least one hard register from ones
> +   starting with HARD_REGNO and containing value of MODE are in set
> +   HARD_REGSET.   */
> +static inline bool
> +lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
> +  HARD_REG_SET hard_regset)
> +{
> +  int i;
> +
> +  lra_assert (hard_regno >= 0);
> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
> +    if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
> +      return true;
> +  return false;
> +}

This is the same as overlaps_hard_reg_set_p.

> +/* Return hard regno and offset of (sub-)register X through arguments
> +   HARD_REGNO and OFFSET.  If it is not (sub-)register or the hard
> +   register is unknown, then return -1 and 0 correspondingly.  */

The function seems to return -1 for both.

> +/* Add hard registers starting with HARD_REGNO and holding value of
> +   MODE to the set S.  */
> +static inline void
> +lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
> +{
> +  int i;
> +
> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
> +    SET_HARD_REG_BIT (*s, hard_regno + i);
> +}

This is add_to_hard_reg_set.

> +   Here is block diagram of LRA passes:
> +
> +   [Block diagram of LRA passes (layout not recoverable).  Boxes include:
> +    Start; Remove scratches; Update virtual register displacements;
> +    New (and old) pseudos assignment; Memory-memory move coalesce;
> +    Undo inheritance for spilled pseudos and splits (for pseudos got
> +    the same hard regs); ...]

[Cilkplus] Merged with trunk at revision 192446.

2012-10-15 Thread Iyer, Balaji V

Cilk Plus branch was merged with trunk at revision 192446. Committed as 
revision 192464.

Thanks,

Balaji V. Iyer.

Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Joseph S. Myers

On Mon, 15 Oct 2012, Gary Funck wrote:

> Various UPC language related checks and operations
> are called in the "C" front-end and middle-end.
> To ensure that these operations are defined
> when linked with the other language front-ends
> and compilers, these functions are stubbed,
> in a fashion similar to Objective C:

Is there a reason you chose this approach rather than the -fcilkplus 
approach of enabling an extension in the C front end given a command-line 
option?  (If you don't want to support e.g. the ObjC / UPC combination, 
you can always give an error in such cases.)  In general I think such 
conditionals are preferable to linking in stub variants of functions - and 
I'm sure people doing all-languages LTO bootstraps will appreciate not 
having to do link-time optimization of the language-independent parts of 
the compiler yet more times because of yet another binary like cc1, 
cc1plus, ... that links in much the same code.  The functions you stub out 
would then all start with assertions that they are only ever called in UPC 
mode - or if they are meant to be called in C mode but do nothing in that 
case, with appropriate checks that return early for C (if needed).
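The two shapes of function suggested above can be sketched in C. One hook asserts that it is only reached in UPC mode, and another returns early for plain C. flag_upc, SKETCH_QUAL_SHARED and both hook names are hypothetical illustrations, not GUPC's or GCC's identifiers.

```c
#include <assert.h>
#include <stdbool.h>

static bool flag_upc;          /* would be set by a -fupc style option */
#define SKETCH_QUAL_SHARED 0x1 /* hypothetical "shared" qualifier bit */

/* Only meaningful in UPC mode: reaching it from plain C is a bug.  */
static int
upc_block_factor (int qualifiers, int layout)
{
  assert (flag_upc);
  return (qualifiers & SKETCH_QUAL_SHARED) ? layout : 0;
}

/* Called from common C code too: a no-op unless UPC is enabled.  */
static bool
upc_shared_qualified_p (int qualifiers)
{
  if (!flag_upc)
    return false;
  return (qualifiers & SKETCH_QUAL_SHARED) != 0;
}
```

With this shape, a single cc1 binary serves both C and UPC, and no stub copies need to be linked into the other front ends.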

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH][AArch64] Restrict usage of SBFIZ to valid range only

2012-10-15 Thread Ian Bolton

This fixes an issue where we were generating an SBFIZ with
operand 3 outside of the valid range (as determined by the
size of the destination register and the amount of shift).

My patch checks that the range is valid before allowing
the pattern to be used.
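The range check can be stated as a small predicate. This is a C sketch of the bitfield-insert encoding constraint, assuming the usual AArch64 rule that lsb < register size and width <= register size - lsb. Here width is the size of the extended source mode and lsb is the shift amount. bfiz_field_fits is a hypothetical name; the pattern's own condition is the one in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* True if a WIDTH-bit field inserted at bit LSB of a REGBITS-bit
   register fits, i.e. the SBFIZ/UBFIZ operands are encodable.  The
   explicit LSB < REGBITS test keeps the subtraction from wrapping for
   out-of-range shift counts.  */
static bool
bfiz_field_fits (unsigned regbits, unsigned width, unsigned long long lsb)
{
  assert (width > 0);
  return lsb < regbits && width <= regbits - lsb;
}
```

For example, sign-extending a 16-bit value and shifting it left by 17 in a 32-bit register would need bits 17..32, one more than the register has, so the pattern must not match.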

This has now had full regression testing and all is OK.

OK for aarch64-trunk and aarch64-4_7-branch?

Cheers,
Ian


2012-10-15  Ian Bolton  

* gcc/config/aarch64/aarch64.md
(_shft_): Restrict based on op2.


-


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e6086a9..3bfe6e6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2311,7 +2311,7 @@
(ashift:GPI (ANY_EXTEND:GPI
 (match_operand:ALLX 1 "register_operand" "r"))
(match_operand 2 "const_int_operand" "n")))]
-  ""
+  " <= ( - UINTVAL (operands[2]))"
   "bfiz\\t%0, %1, %2, #"
   [(set_attr "v8type" "bfm")
(set_attr "mode" "")]

Re: [libcpp] Free some variables

2012-10-15 Thread Tom Tromey

> "Tobias" == Tobias Burnus  writes:

Tobias> Build on x86-64-linux with C/C++/Fortran. I will now do an
Tobias> all-language build/regtest.
Tobias> OK when it passes?

Tobias> 2012-10-03  Tobias Burnus  

Tobias> * files.c (read_file_guts, _cpp_save_file_entries): Free memory
Tobias> before returning.
Tobias> * lex.c (warn_about_normalization): Ditto.
Tobias> * mkdeps.c (deps_save): Ditto.
Tobias> * pch.c (cpp_valid_state): Ditto.

This is ok.  Thanks.

Tom

Re: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue

2012-10-15 Thread Uros Bizjak

On Mon, Oct 15, 2012 at 6:39 PM, Ulrich Weigand  wrote:

>> On Fri, Oct 12, 2012 at 7:57 PM, Ulrich Weigand  wrote:
>> > I was wondering if the i386 port maintainers could have a look at this
>> > pattern.  Shouldn't we really have two patterns, one to *load* an unaligned
>> > value and one to *store* an unaligned value, and not permit that memory
>> > access to get reloaded?
>>
>> Please find attached a fairly mechanical patch that splits
>> move_unaligned pattern into load_unaligned and store_unaligned
>> patterns. We've had some problems with this pattern, and finally we
>> found the reason to make unaligned moves more robust.
>>
>> I will wait for the confirmation that attached patch avoids the
>> failure you are seeing with your reload patch.
>
> Yes, this patch does in fact fix the failure I was seeing with the
> reload patch.  (A full regression test shows a couple of extra fails:
> FAIL: gcc.target/i386/avx256-unaligned-load-1.c scan-assembler sse_movups/1
> FAIL: gcc.target/i386/avx256-unaligned-load-3.c scan-assembler sse2_movupd/1
> FAIL: gcc.target/i386/avx256-unaligned-load-4.c scan-assembler avx_movups256/1
> FAIL: gcc.target/i386/avx256-unaligned-store-4.c scan-assembler avx_movups256/2
> But I guess these tests simply need to be updated for the new pattern names.)

Yes, the complete patch I am about to send to gcc-patches@ ML will
include all testsuite adjustments.

Uros.

Re: [C++ Patch] PR 50080

2012-10-15 Thread Jason Merrill

Actually, let's keep the diagnostic when compiling with -pedantic in 98 
mode.


Jason

[lra] patch to revert a code from previous patch.

2012-10-15 Thread Vladimir Makarov

  After committing a patch yesterday to implement proposals from a
review, I found that GCC crashes on SPEC2000 gap.  LRA tries to find
the mode of operand (const_int 1) in the *lea_general_1 insn and cannot,
because both the operand and the insn template operand have VOIDmode.


There are still cases when context lookup is necessary to find the mode
of an operand.  So I am reverting the change I did yesterday.


The patch is committed as rev. 192462.
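The context lookup that get_op_mode falls back to can be illustrated outside GCC. The sketch below mirrors the recursion in find_mode, but uses a hypothetical two-operand expression type and mode enum instead of rtx and machine_mode. It is for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mode { TOY_VOID, TOY_SI, TOY_DI };

struct toy_expr
{
  enum toy_mode mode;          /* TOY_VOID for constants */
  int nops;                    /* number of operands used in OPS */
  struct toy_expr *ops[2];
};

/* Return the mode of the expression directly containing the operand
   slot WHAT, searching from slot WHERE whose context mode is
   OUTER_MODE.  As in find_mode, TOY_VOID means "not found", and it is
   also returned when the containing expression itself is VOID.  */
static enum toy_mode
toy_find_mode (struct toy_expr **where, enum toy_mode outer_mode,
               struct toy_expr **what)
{
  if (where == what)
    return outer_mode;
  if (*where == NULL)
    return TOY_VOID;
  struct toy_expr *x = *where;
  for (int i = x->nops - 1; i >= 0; i--)
    {
      /* Children are searched with X's own mode as their context.  */
      enum toy_mode mode = toy_find_mode (&x->ops[i], x->mode, what);
      if (mode != TOY_VOID)
        return mode;
    }
  return TOY_VOID;
}
```

For (plus:SI (reg:SI) (const_int 1)), the constant has no mode of its own, so the lookup returns SI from the enclosing plus.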

2012-10-15  Vladimir Makarov  

* lra-int.h (lra_get_mode): Remove.
* lra-constraints.c (find_mode, get_op_mode): New functions.
(match_reload): Use get_op_mode instead of lra_get_mode.
(process_alt_operands, curr_insn_transform): Ditto.


Index: lra-constraints.c
===
--- lra-constraints.c   (revision 192455)
+++ lra-constraints.c   (working copy)
@@ -876,6 +876,65 @@ bitmap_head lra_bound_pseudos;
   (reg_class_size [(C)] == 1   \
|| (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))
 
+/* Return mode of WHAT inside of WHERE whose mode of the context is
+   OUTER_MODE. If WHERE does not contain WHAT, return VOIDmode.  */
+static enum machine_mode
+find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
+{
+  int i, j;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (where == what)
+return outer_mode;
+  if (*where == NULL_RTX)
+return VOIDmode;
+  x = *where;
+  code = GET_CODE (x);
+  outer_mode = GET_MODE (x);
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+{
+  if (fmt[i] == 'e')
+   {
+ if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
+   return mode;
+   }
+  else if (fmt[i] == 'E')
+   {
+ for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+ if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
+ != VOIDmode)
+   return mode;
+   }
+}
+  return VOIDmode;
+}
+
+/* Return mode for operand NOP of the current insn.  */
+static inline enum machine_mode
+get_op_mode (int nop)
+{
+  rtx *loc;
+  enum machine_mode mode;
+  bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
+
+  /* Take mode from the machine description first.  */
+  if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+return mode;
+  loc = curr_id->operand_loc[nop];
+  /* Take mode from the operand second. */
+  mode = GET_MODE (*loc);
+  if (mode != VOIDmode)
+return mode;
+  if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+return mode;
+  /* Here is a very rare case. Take mode from the context.  */
+  return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
+}
+
 /* If REG is a reload pseudo, try to make its class satisfying CL.  */
 static void
 narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
@@ -910,8 +969,8 @@ match_reload (signed char out, signed ch
   rtx in_rtx = *curr_id->operand_loc[ins[0]];
   rtx out_rtx = *curr_id->operand_loc[out];
 
-  outmode = lra_get_mode (curr_static_id->operand[out].mode, out_rtx);
-  inmode = lra_get_mode (curr_static_id->operand[ins[0]].mode, in_rtx);
+  outmode = get_op_mode (out);
+  inmode = get_op_mode (ins[0]);
   if (inmode != outmode)
 {
   if (GET_MODE_SIZE (inmode) > GET_MODE_SIZE (outmode))
@@ -1639,8 +1698,7 @@ process_alt_operands (int only_alternati
}
   
  op = no_subreg_reg_operand[nop];
- mode = lra_get_mode (curr_static_id->operand[nop].mode,
-  *curr_id->operand_loc[nop]);
+ mode = get_op_mode (nop);
 
  win = did_match = winreg = offmemok = constmemok = false;
  badop = true;
@@ -2113,8 +2171,7 @@ process_alt_operands (int only_alternati
  && ((targetm.preferred_reload_class
   (op, this_alternative) == NO_REGS)
  || no_input_reloads_p)
- && lra_get_mode (curr_static_id->operand[nop].mode,
-  op) != VOIDmode)
+ && get_op_mode (nop) != VOIDmode)
{
  const_to_mem = 1;
  if (! no_regs_p)
@@ -3099,8 +3156,7 @@ curr_insn_transform (void)
rtx op = *curr_id->operand_loc[i];
rtx subreg = NULL_RTX;
rtx plus = NULL_RTX;
-   enum machine_mode mode
- = lra_get_mode (curr_static_id->operand[i].mode, op);
+   enum machine_mode mode = get_op_mode (i);

if (GET_CODE (op) == SUBREG)
  {
@@ -3214,7 +3270,7 @@ curr_insn_transform (void)
  enum op_type type = curr_static_id->operand[i].type;
 
  loc = curr_id->operand_loc[i];
- mode = lra_get_mode (curr_static_id->operand[i].mode, *loc);
+ mode = get_op_mode (i);
  if (GET_CODE (*loc) == SUBREG)
{
  re

Re: RFC: C++ PATCH to support dynamic initialization and destruction of C++11 and OpenMP TLS variables

2012-10-15 Thread Jason Merrill


On 10/05/2012 10:38 AM, Jason Merrill wrote:
> On 10/05/2012 04:29 AM, Richard Guenther wrote:
>> Or if we have the extra indirection via a reference anyway, we could
>> have a pointer TLS variable (NULL initialized) that on the first access
>> will trap where in a trap handler we could then perform initialization
>> and setup of that pointer.
>
> Interesting idea.  But I don't think there's any way to determine from a
> SEGV handler which null pointer needs to be initialized, and in any case
> users might want to have their own SEGV handlers.


But then, with support from the kernel and dynamic loader we aren't 
limited to normal signal handlers; it ought to be possible to mark a 
page of thread_local variables as trapping so that the first reference 
invokes a designated initialization function.  I think that such a 
scheme ought to be link-compatible with this one so long as the 
initialization function is the same; references would just be direct 
rather than through the wrapper function.
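For comparison, the wrapper-function access path can be sketched in C. Every access calls a wrapper that runs the initializer on the first use in each thread. This is a hypothetical model with invented names, and it assumes a per-thread guard flag. It uses GCC's __thread and is not the code generated by the patch under discussion.

```c
#include <assert.h>

static __thread int tls_value;        /* the thread-local variable */
static __thread int tls_value_ready;  /* per-thread "initialized" guard */
static int tls_init_calls;            /* counter for a single-threaded
                                         test only; not thread-safe */

/* Stands in for a dynamic initializer.  */
static int
compute_initial_value (void)
{
  tls_init_calls++;
  return 42;
}

/* All references go through this wrapper, so the first use in each
   thread runs the initializer.  A trapping-page scheme would instead
   let references touch tls_value directly.  */
static int *
tls_value_wrapper (void)
{
  if (!tls_value_ready)
    {
      tls_value = compute_initial_value ();
      tls_value_ready = 1;
    }
  return &tls_value;
}
```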


It sounds like recent versions of MacOS X support something like this, 
though clang doesn't take advantage of it yet.


Jason

Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-15 Thread Tobias Schlüter


On 2012-10-14 23:44, Jakub Jelinek wrote:
> On Mon, Oct 15, 2012 at 12:35:27AM +0300, Janne Blomqvist wrote:
>> On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter
>>> I'm putting forward two patches.  One uses a C++ map to very concisely build
>>> up and handle the ordered list of symbols.  This has three problems:
>>> 1) gfortran maintainers may not want C++isms (even though in this case
>>> it's very localized, and in my opinion very transparent), and
>
> Even if you prefer C++isms, why don't you go for "hash-table.h"?
> std::map at least with the default allocator will just crash the compiler
> if malloc returns NULL (remember that we build with -fno-exceptions),
> while when you use hash-table.h (or hashtab.h) you get proper OOM diagnostics.


I don't know these parts of C++ very well, but maybe an easy fix, 
addressing this once and for all, would be doing the equivalent of 
"set_new_handler (gcc_unreachable)" (or maybe a wrapper around fatal 
("out of memory")?) at some point during gcc's initialization?  This 
should have the desired effect, shouldn't it?
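The ordering requirement behind this thread does not itself need C++. One option is to collect the entries, sort them by name with qsort, and then write them out. The output order then depends only on the names, not on pointer values or hash-table layout. struct sym_entry is a hypothetical stand-in, not gfortran's symbol type, and this is not either of the patches mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sym_entry
{
  const char *name;
  int id;
};

static int
sym_entry_cmp (const void *a, const void *b)
{
  const struct sym_entry *x = a;
  const struct sym_entry *y = b;
  return strcmp (x->name, y->name);
}

/* Sort N entries in place by name.  Names are assumed unique; qsort is
   not stable, so duplicates would need a tie-breaker to keep the
   result reproducible.  */
static void
sort_symbols (struct sym_entry *syms, size_t n)
{
  qsort (syms, n, sizeof *syms, sym_entry_cmp);
}
```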


Cheers,
- Tobi

[RFC] Fix spill failure at -O on 64-bit Windows

2012-10-15 Thread Eric Botcazou

For the attached Ada testcase, the compiler aborts with a spill failure at -O:

p.adb: In function 'P.F':
p.adb:16:7: error: unable to find a register to spill in class 'DREG'
p.adb:16:7: error: this is the insn:
(insn 141 140 142 17 (parallel [
(set (reg:SI 0 ax [174])
(div:SI (subreg:SI (reg:DI 43 r14 [orig:81 iftmp.6 ] [81]) 0)
(reg:SI 41 r12 [orig:140 q__l.7 ] [140])))
(set (reg:SI 43 r14 [175])
(mod:SI (subreg:SI (reg:DI 43 r14 [orig:81 iftmp.6 ] [81]) 0)
(reg:SI 41 r12 [orig:140 q__l.7 ] [140])))
(clobber (reg:CC 17 flags))
]) p.adb:6 349 {*divmodsi4}
 (expr_list:REG_DEAD (reg:SI 41 r12 [orig:140 q__l.7 ] [140])
(expr_list:REG_DEAD (reg:DI 43 r14 [orig:81 iftmp.6 ] [81])
(expr_list:REG_UNUSED (reg:SI 43 r14 [175])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))
+===GNAT BUG DETECTED==+
| 4.8.0 20121015 (experimental) [trunk revision 192447] (x86_64-pc-mingw32) GCC error:|
| in spill_failure, at reload1.c:2124   

The problem is that LIM hoists instructions preparing an argument register for 
a call out of a loop, without moving the associated clobber:

(insn 249 248 250 33 (clobber (reg:SC 2 cx)) p.adb:12 -1
 (nil))
[...]
(insn 253 251 254 33 (parallel [
(set (reg:DI 204)
(and:DI (reg:DI 2 cx)
(reg:DI 195)))
(clobber (reg:CC 17 flags))
]) p.adb:12 375 {*anddi_1}
 (expr_list:REG_EQUAL (and:DI (reg:DI 2 cx)
(const_int -4294967296 [0x]))
(expr_list:REG_DEAD (reg:DI 2 cx)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)

Set in insn 253 is invariant (8), cost 8, depends on 6
Decided to move invariant 8 -- gain 8

This extends the lifetime of the hard register up to the beginning of the 
function, causing reload to die on the complex division instruction.

The attached patch prevents the invariant from being hoisted in this very 
particular case.  Any better idea?

Tested on x86_64-suse-linux.


2012-10-15  Eric Botcazou  

* loop-invariant.c: Include target.h.
(check_dependency): Return false for an uninitialized argument register
that is likely to be spilled.
* Makefile.in (loop-invariant.o): Add $(TARGET_H).


2012-10-15  Eric Botcazou  

* gnat.dg/loop_optimization13.ad[sb]: New test.
* gnat.dg/loop_optimization13_pkg.ads: New helper.


-- 
Eric Botcazou

-- { dg-do compile }
-- { dg-options "-O" }

with Loop_Optimization13_Pkg; use Loop_Optimization13_Pkg;

package body Loop_Optimization13 is

   function F (A : Rec) return Rec is
      N : constant Integer := A.V'Length / L;
      Res : Rec
        := (True, new Complex_Vector' (0 .. A.V'Length / L - 1 => (0.0, 0.0)));
   begin
      for I in 0 .. L - 1 loop
         for J in 0 .. N - 1 loop
            Res.V (J) := Res.V (J) + A.V (I * N + J);
         end loop;
      end loop;
      return Res;
   end;

end Loop_Optimization13;

with Ada.Numerics.Complex_Types; use Ada.Numerics.Complex_Types;

package Loop_Optimization13 is

   type Complex_Vector is array (Integer range <>) of Complex;
   type Complex_Vector_Ptr is access Complex_Vector;

   type Rec (Kind : Boolean := False) is record
      case Kind is
         when True => V : Complex_Vector_Ptr;
         when False => null;
      end case;
   end record;

   function F (A : Rec) return Rec;

end Loop_Optimization13;

package Loop_Optimization13_Pkg is

   L : Integer;

end Loop_Optimization13_Pkg;

Index: Makefile.in
===
--- Makefile.in	(revision 192447)
+++ Makefile.in	(working copy)
@@ -3101,7 +3101,7 @@ loop-iv.o : loop-iv.c $(CONFIG_H) $(SYST
intl.h $(DIAGNOSTIC_CORE_H) $(DF_H) $(HASHTAB_H)
 loop-invariant.o : loop-invariant.c $(CONFIG_H) $(SYSTEM_H) coretypes.h dumpfile.h \
$(RTL_H) $(BASIC_BLOCK_H) hard-reg-set.h $(CFGLOOP_H) $(EXPR_H) $(RECOG_H) \
-   $(TM_H) $(TM_P_H) $(FUNCTION_H) $(FLAGS_H) $(DF_H) \
+   $(TM_H) $(TM_P_H) $(FUNCTION_H) $(FLAGS_H) $(DF_H) $(TARGET_H) \
$(OBSTACK_H) $(HASHTAB_H) $(EXCEPT_H) $(PARAMS_H) $(REGS_H) ira.h
 cfgloopmanip.o : cfgloopmanip.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) \
$(BASIC_BLOCK_H) hard-reg-set.h $(CFGLOOP_H) \
Index: loop-invariant.c
===
--- loop-invariant.c	(revision 192447)
+++ loop-invariant.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.
 #include "cfgloop.h"
 #include "expr.h"
 #include "recog.h"
+#include "target.h"
 #include "function.h"
 #include "flags.h"
 #include "df.h"
@@ -784,7 +785,22 @@ check_dependency (basic_blo

Re: [C++ Patch] PR 17805

2012-10-15 Thread Jason Merrill


OK.

Jason

Re: [RFC] Fix spill failure at -O on 64-bit Windows

2012-10-15 Thread Steven Bosscher

On Mon, Oct 15, 2012 at 7:45 PM, Eric Botcazou wrote:
> This extends the lifetime of the hard register up to the beginning of the
> function, causing reload to die on the complex division instruction.

Does this still happen after my patch from yesterday to use DF_LIVE in IRA?


> The attached patch prevents the invariant from being hoisted in this very
> particular case.  Any better idea?

Maybe add a cost-free dependency on the clobber, so that it's moved
with the insn?

And/or maybe punt on all likely_spilled hard registers if
-fira-loop-pressure is not in effect, it's unlikely to be a win in any
case.

Ciao!
Steven

[c-family] Fix -fdump-ada-spec buglet in C++

2012-10-15 Thread Eric Botcazou

Hi,

since the sizetype change, we generate invalid Ada for flexible array members 
with -fdump-ada-spec in C++.  The attached patch fixes this issue and also 
partially revamps the code to polish some rough edges.
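The patch treats the upper half of sizetype as negative, and that rule can be reproduced with ordinary integer types. The patch comment indicates that a flexible array member's upper bound ends up as the all-ones value of sizetype (0 - 1 computed unsigned). Read back as the signed ssizetype, that bound is -1, which is what produces the empty Ada range (0 .. -1). The sketch below uses std::size_t and std::ptrdiff_t as stand-ins for sizetype and ssizetype; this is an assumption, since the compiler's internal types need not coincide with these. `ada_range` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-ins (an assumption): std::size_t plays sizetype and
// std::ptrdiff_t plays ssizetype.  Converting an unsigned bound to the
// signed type maps the upper half of the range to negative values
// (two's complement wrap, guaranteed since C++20 and universal in practice).
std::ptrdiff_t as_signed_bound (std::size_t bound)
{
  return static_cast<std::ptrdiff_t> (bound);
}

// Hypothetical helper: the Ada index range for an array whose
// domain is 0 .. max_index in sizetype.
std::string ada_range (std::size_t max_index)
{
  return "(0 .. " + std::to_string (as_signed_bound (max_index)) + ")";
}
```

A bound of 0 - 1 comes out as (0 .. -1), the range the new dump-ada-spec test scans for, while ordinary bounds print unchanged.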

Tested on x86_64-suse-linux, OK for mainline?


2012-10-15  Eric Botcazou  

c-family/
* c-ada-spec.c (ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX): Define.
(dump_generic_ada_node) : Deal with sizetype specially.
Remove POINTER_TYPE handling, add large unsigned handling and use
ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX for big numbers.


2012-10-15  Eric Botcazou  

* g++.dg/other/dump-ada-spec-2.C: New test.


-- 
Eric Botcazou

Index: c-ada-spec.c
===
--- c-ada-spec.c	(revision 192447)
+++ c-ada-spec.c	(working copy)
@@ -30,6 +30,21 @@ along with GCC; see the file COPYING3.
 #include "c-pragma.h"
 #include "cpp-id-data.h"
 
+/* Adapted from hwint.h to use the Ada prefix.  */
+#if HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG
+# if HOST_BITS_PER_WIDE_INT == 64
+#  define ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX \
+ "16#%" HOST_LONG_FORMAT "x%016" HOST_LONG_FORMAT "x#"
+# else
+#  define ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX \
+ "16#%" HOST_LONG_FORMAT "x%08" HOST_LONG_FORMAT "x#"
+# endif
+#else
+  /* We can assume that 'long long' is at least 64 bits.  */
+# define ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX \
+"16#%" HOST_LONG_LONG_FORMAT "x%016" HOST_LONG_LONG_FORMAT "x#"
+#endif /* HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG */
+
 /* Local functions, macros and variables.  */
 static int dump_generic_ada_node (pretty_printer *, tree, tree,
   int (*)(tree, cpp_operation), int, int, bool);
@@ -2175,12 +2190,16 @@ dump_generic_ada_node (pretty_printer *b
   break;
 
 case INTEGER_CST:
-  if (TREE_CODE (TREE_TYPE (node)) == POINTER_TYPE)
-	{
-	  pp_wide_integer (buffer, TREE_INT_CST_LOW (node));
-	  pp_string (buffer, "B"); /* pseudo-unit */
-	}
-  else if (!host_integerp (node, 0))
+  /* We treat the upper half of the sizetype range as negative.  This
+	 is consistent with the internal treatment and makes it possible
+	 to generate the (0 .. -1) range for flexible array members.  */
+  if (TREE_TYPE (node) == sizetype)
+	node = fold_convert (ssizetype, node);
+  if (host_integerp (node, 0))
+	pp_wide_integer (buffer, TREE_INT_CST_LOW (node));
+  else if (host_integerp (node, 1))
+	pp_unsigned_wide_integer (buffer, TREE_INT_CST_LOW (node));
+  else
 	{
 	  tree val = node;
 	  unsigned HOST_WIDE_INT low = TREE_INT_CST_LOW (val);
@@ -2193,12 +2212,10 @@ dump_generic_ada_node (pretty_printer *b
 	  low = -low;
 	}
 	  sprintf (pp_buffer (buffer)->digit_buffer,
-	  HOST_WIDE_INT_PRINT_DOUBLE_HEX,
-	(unsigned HOST_WIDE_INT) high, low);
+		   ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+		   (unsigned HOST_WIDE_INT) high, low);
 	  pp_string (buffer, pp_buffer (buffer)->digit_buffer);
 	}
-  else
-	pp_wide_integer (buffer, TREE_INT_CST_LOW (node));
   break;
 
 case REAL_CST:

/* { dg-do compile } */
/* { dg-options "-fdump-ada-spec" } */

struct S
{
  int it;
  __extension__ unsigned char data[];
};

/* { dg-final { scan-ada-spec "array \\(0 .. -1\\)" } } */
/* { dg-final { cleanup-ada-spec } } */
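The new ADA_HOST_WIDE_INT_PRINT_DOUBLE_HEX format writes a double-word constant as an Ada based literal `16#...#`. The low word is zero-padded to its full width so that the two halves concatenate into one number. Below is a portable sketch of the 64-bit-word variant. It assumes `<cinttypes>`'s PRIx64 in place of GCC's HOST_LONG_FORMAT, and `ada_double_hex` is a hypothetical name.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a double-word value given as (high, low) 64-bit halves as an
// Ada based literal.  The low half is padded to 16 hex digits, mirroring
// the "%016" part of the 64-bit variant of the macro; the high half is not.
std::string ada_double_hex (std::uint64_t high, std::uint64_t low)
{
  char buf[64];  // "16#" + up to 32 digits + "#" + NUL fits easily
  std::snprintf (buf, sizeof buf, "16#%" PRIx64 "%016" PRIx64 "#",
                 high, low);
  return buf;
}
```

Without the padding, (1, 0) would print as `16#10#`, so the fixed width on the low word is what keeps the value correct.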

Re: [patch] Use hard_reg_set_iterator in a few places

2012-10-15 Thread Steven Bosscher

On Mon, Oct 15, 2012 at 6:42 PM, Steven Bosscher wrote:
> Hello,
>
> GCC has hard_reg_set_iterator to iterate quickly over a HARD_REG_SET,
> but it's not used a lot. Attached patch makes a few files use it to
> iterate over regs_invalidated_by_call. If this is OK, I'd like to
> convert loops over HARD_REG_SETs to iterators where possible.
>
> Bootstrapped&tested on x86_64-unknown-linux-gnu. OK for trunk?

It's been pointed out to me that attaching the patch would make
reviewing somewhat easier.

Ciao!
Steven


hrsi-1.diff
Description: Binary data

Re: [C++ Patch] PR 50080

2012-10-15 Thread Paolo Carlini


On 10/15/2012 07:30 PM, Jason Merrill wrote:
Actually, let's keep the diagnostic when compiling with -pedantic in 
98 mode.

... too late! ;) So I prepared the patch below; I'm finishing testing it.

Thanks,
Paolo.

//
/cp
2012-10-15  Paolo Carlini  

PR c++/50080 (again)
* parser.c (cp_parser_optional_template_keyword): When -pedantic
and C++98 mode restore pre-Core/468 behavior.

/testsuite
2012-10-15  Paolo Carlini  

PR c++/50080 (again)
* g++.dg/parse/tmpl-outside2.C: Tweak, error in C++98.
* g++.dg/parse/tmpl-outside1.C: Likewise.
* g++.dg/template/qualttp18.C: Likewise.
* g++.old-deja/g++.pt/memtemp87.C: Likewise.
* g++.old-deja/g++.pt/overload13.C: Likewise.
Index: cp/parser.c
===
--- cp/parser.c (revision 192465)
+++ cp/parser.c (working copy)
@@ -23252,9 +23252,29 @@ cp_parser_optional_template_keyword (cp_parser *pa
 {
   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TEMPLATE))
 {
-  /* Consume the `template' keyword.  */
-  cp_lexer_consume_token (parser->lexer);
-  return true;
+  /* In C++98 the `template' keyword can only be used within templates;
+outside templates the parser can always figure out what is a
+template and what is not.  In C++11,  per the resolution of DR 468,
+'template' is allowed in cases where it is not strictly necessary.  */
+  if (!processing_template_decl
+ && pedantic && cxx_dialect == cxx98)
+   {
+ cp_token *token = cp_lexer_peek_token (parser->lexer);
+ error_at (token->location,
+   "in C++98 % (as a disambiguator) is only "
+   "allowed within templates");
+ /* If this part of the token stream is rescanned, the same
+error message would be generated.  So, we purge the token
+from the stream.  */
+ cp_lexer_purge_token (parser->lexer);
+ return false;
+   }
+  else
+   {
+ /* Consume the `template' keyword.  */
+ cp_lexer_consume_token (parser->lexer);
+ return true;
+   }
 }
   return false;
 }
Index: testsuite/g++.old-deja/g++.pt/overload13.C
===
--- testsuite/g++.old-deja/g++.pt/overload13.C  (revision 192465)
+++ testsuite/g++.old-deja/g++.pt/overload13.C  (working copy)
@@ -7,5 +7,5 @@ struct A {
 int main ()
 {
   A a;
-  return a.template f (0);
+  return a.template f (0); // { dg-error "template" "" { target c++98 } }
 }
Index: testsuite/g++.old-deja/g++.pt/memtemp87.C
===
--- testsuite/g++.old-deja/g++.pt/memtemp87.C   (revision 192465)
+++ testsuite/g++.old-deja/g++.pt/memtemp87.C   (working copy)
@@ -12,4 +12,4 @@ class Q {
 template class>
 class Y {
 };
-Q::template X x;
+Q::template X x; // { dg-error "template" "" { target c++98 } }
Index: testsuite/g++.dg/parse/tmpl-outside1.C
===
--- testsuite/g++.dg/parse/tmpl-outside1.C  (revision 192465)
+++ testsuite/g++.dg/parse/tmpl-outside1.C  (working copy)
@@ -7,4 +7,4 @@ struct X
template  struct Y {};
 };
 
-typedef X::template Y<0> y;
+typedef X::template Y<0> y; // { dg-error "template|invalid" "" { target c++98 } }
Index: testsuite/g++.dg/parse/tmpl-outside2.C
===
--- testsuite/g++.dg/parse/tmpl-outside2.C  (revision 192465)
+++ testsuite/g++.dg/parse/tmpl-outside2.C  (working copy)
@@ -15,5 +15,5 @@ void test()
 
 int main()
 {
-  typename A::template B b;
+  typename A::template B b; // { dg-error "template|expected" "" { target c++98 } }
 }
Index: testsuite/g++.dg/template/qualttp18.C
===
--- testsuite/g++.dg/template/qualttp18.C   (revision 192465)
+++ testsuite/g++.dg/template/qualttp18.C   (working copy)
@@ -14,7 +14,7 @@ template  class TT> struct X
 
 struct C
 {
-   X x;
+   X x;  // { dg-error "template" "" { target c++98 } }
 };
 
 int main()

Re: [C++ Patch] PR 50080

2012-10-15 Thread Jason Merrill


On 10/15/2012 11:33 AM, Paolo Carlini wrote:

+ && pedantic && cxx_dialect == cxx98)
+   {
+ cp_token *token = cp_lexer_peek_token (parser->lexer);
+ error_at (token->location,
+   "in C++98 % (as a disambiguator) is only "
+   "allowed within templates");


The diagnostic should be a pedwarn rather than an error.  OK with that 
change.
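For reference, here is a minimal case of the construct involved, using hypothetical names: the `template` disambiguator used outside any template. C++11 accepts it per DR 468. With the patch above, `-std=c++98 -pedantic` diagnoses it, while plain `-std=c++98` keeps accepting it. Inside a template the keyword remains required when the object expression has a dependent type.

```cpp
#include <cassert>

struct Widget
{
  template <typename T>
  T f (T x) { return x; }
};

// Not a template: the `template' keyword is unnecessary here.  C++98
// only permits it inside templates; C++11 (DR 468) allows it anywhere.
int call_outside_template ()
{
  Widget w;
  return w.template f<int> (0);
}

// Inside a template, `u' has a dependent type, so the keyword is needed
// for `f<int>' to be parsed as a template-id.
template <typename U>
int call_inside_template (U u)
{
  return u.template f<int> (1);
}
```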


Jason

Re: Propagate profile counts during switch expansion

2012-10-15 Thread Easwaran Raman

On Sun, Oct 14, 2012 at 8:09 AM, Jan Hubicka  wrote:
> Hi,
>
> Index: optabs.c
> ===
> --- optabs.c(revision 191879)
> +++ optabs.c(working copy)
> @@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int
> we can do the branch.  */
>
>  static void
> -emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label)
> +emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int prob)
>  {
>enum machine_mode optab_mode;
>enum mode_class mclass;
> @@ -4261,7 +4261,16 @@ static void
>
>gcc_assert (icode != CODE_FOR_nothing);
>gcc_assert (insn_operand_matches (icode, 0, test));
> -  emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label));
> +  rtx insn = emit_insn (
> +  GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label));
>
> I think we did not change to style of mixing declaration and code yet.  So
> please put declaration ahead.
Ok.

>
> I think you want to keep emit_jump_insn.  Also do nothing when profile_status
> == PROFILE_ABSENT.

Why should this be dependent on profile_status? The PROB passed could
also come from static prediction, right?

> Index: cfgbuild.c
> ===
> --- cfgbuild.c  (revision 191879)
> +++ cfgbuild.c  (working copy)
> @@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b)
>   f->count = b->count - e->count;
>   return;
> }
> +  else
> +{
> +  guess_outgoing_edge_probabilities (b);
> +}
>
> Add comment here that we rely on multiway BBs having sane probabilities 
> already.
> You still want to do guessing when the edges out are EH. Those also can be 
> many.
I think this should work:

-  if (single_succ_p (b))
+  else if (single_succ_p (b))
 {
   e = single_succ_edge (b);
   e->probability = REG_BR_PROB_BASE;
   e->count = b->count;
   return;
 }
-  guess_outgoing_edge_probabilities (b);
+  else
+{
+  /* We rely on BBs with more than two successors to have sane probabilities
+ and do not guess them here. For BBs terminated by switch statements
+ expanded to jump-table jump, we have done the right thing during
+ expansion. For EH edges, we still guess the probabilities here.  */
+  bool complex_edge = false;
+  FOR_EACH_EDGE (e, ei, b->succs)
+if (e->flags & EDGE_COMPLEX)
+  {
+complex_edge = true;
+break;
+  }
+  if (complex_edge)
+guess_outgoing_edge_probabilities (b);
+}
+


> Index: expr.h
> ===
> --- expr.h  (revision 191879)
> +++ expr.h  (working copy)
> @@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx);
>  /* Emit a pair of rtl insns to compare two rtx's and to jump
> to a label if the comparison is true.  */
>  extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx,
> -enum machine_mode, int, rtx);
> +enum machine_mode, int, rtx, int prob=-1);
>
> Hmm, probably first appearance of this C++ construct. I suppose it is OK.
http://gcc.gnu.org/codingconventions.html#Default says it is ok for POD values.
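The default argument means existing callers of emit_cmp_and_jump_insns need no change, while new callers can pass a probability. Below is a reduced sketch with a hypothetical function (not GCC's real signature). It follows the patch's conventions: the default is given only on the declaration, and -1 means "no probability supplied".

```cpp
#include <cassert>

// Hypothetical reduction of a jump emitter that gained an optional
// branch probability.  The default appears once, on the declaration.
int emit_jump_sketch (int label, int prob = -1);

// Returns the probability that would be attached to the jump, or -1 when
// the caller supplied none (in which case nothing would be recorded).
int emit_jump_sketch (int label, int prob)
{
  (void) label;  // a real emitter would use the label
  return prob;
}
```

Old two-argument call sites keep compiling unchanged, which is why the coding conventions permit this for POD defaults.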

>
> +static inline void
> +reset_out_edges_aux (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE(e, ei, bb->succs)
> +e->aux = (void *)0;
> +}
> +static inline void
> +compute_cases_per_edge (gimple stmt)
> +{
> +  basic_block bb = gimple_bb (stmt);
> +  reset_out_edges_aux (bb);
> +  int ncases = gimple_switch_num_labels (stmt);
> +  for (int i = ncases - 1; i >= 1; --i)
> +{
> +  tree elt = gimple_switch_label (stmt, i);
> +  tree lab = CASE_LABEL (elt);
> +  basic_block case_bb = label_to_block_fn (cfun, lab);
> +  edge case_edge = find_edge (bb, case_bb);
> +  case_edge->aux = (void *)((long)(case_edge->aux) + 1);
> +}
> +}
>
> Comments and newlines per coding standard.
Ok.
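What compute_cases_per_edge computes can be modelled outside GCC. Label 0 of a GIMPLE switch is the default label, and every other label adds one to a counter on the edge to its target block; the patch keeps that counter in edge->aux. Below is a toy version with hypothetical types. It uses a std::map keyed by destination block in place of edge->aux, so no separate reset pass is needed.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model: case_targets[i] is the destination block of switch label i,
// with index 0 being the default label, as in gimple_switch_label.
std::map<int, int> cases_per_target (const std::vector<int> &case_targets)
{
  std::map<int, int> count;  // destination block -> number of case labels
  // Walk labels from last to first, skipping the default label at index 0.
  for (int i = static_cast<int> (case_targets.size ()) - 1; i >= 1; --i)
    ++count[case_targets[i]];
  return count;
}
```

A block reached only through the default label gets no entry, matching the original loop's `i >= 1` bound.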

> With the these changes, the patch is OK

Thanks,
Easwaran
>
> Thanks,
> Honza

[PATCH, i386]: Split unaligned SSE move to unaligned load/store (Was: [RFC] find_reloads_subreg_address rework triggers i386 back-end issue)

2012-10-15 Thread Uros Bizjak

On Mon, Oct 15, 2012 at 6:39 PM, Ulrich Weigand  wrote:

>> > I was wondering if the i386 port maintainers could have a look at this
>> > pattern.  Shouldn't we really have two patterns, one to *load* an unaligned
>> > value and one to *store* an unaligned value, and not permit that memory
>> > access to get reloaded?
>>
>> Please find attached a fairly mechanical patch that splits
>> move_unaligned pattern into load_unaligned and store_unaligned
>> patterns. We've had some problems with this pattern, and finally we
>> found the reason to make unaligned moves more robust.
>>
>> I will wait for the confirmation that attached patch avoids the
>> failure you are seeing with your reload patch.
>
> Yes, this patch does in fact fix the failure I was seeing with the
> reload patch.  (A full regression test shows a couple of extra fails:
> FAIL: gcc.target/i386/avx256-unaligned-load-1.c scan-assembler sse_movups/1
> FAIL: gcc.target/i386/avx256-unaligned-load-3.c scan-assembler sse2_movupd/1
> FAIL: gcc.target/i386/avx256-unaligned-load-4.c scan-assembler avx_movups256/1
> FAIL: gcc.target/i386/avx256-unaligned-store-4.c scan-assembler avx_movups256/2
> But I guess these tests simply need to be updated for the new pattern names.)

2012-10-15  Uros Bizjak  

* config/i386/sse.md (UNSPEC_MOVU): Remove.
(UNSPEC_LOADU): New.
(UNSPEC_STOREU): Ditto.
(_movu): Split to ...
(_loadu): ... this and ...
(_storeu): ... this.
(_movdqu): Split to ...
(_loaddqu): ... this and ...
(_storedqu): ... this.
(*sse4_2_pcmpestr_unaligned): Update.
(*sse4_2_pcmpistr_unaligned): Ditto.

* config/i386/i386.c (ix86_avx256_split_vector_move_misalign): Use
gen_avx_load{dqu,ups,upd}256 to load from unaligned memory and
gen_avx_store{dqu,ups,upd}256 to store to unaligned memory.
(ix86_expand_vector_move_misalign): Use gen_sse_loadups or
gen_sse2_load{dqu,upd} to load from unaligned memory and
gen_sse_storeups or gen_sse2_store{dqu,upd} to store to
unaligned memory.
(struct builtin_description bdesc_spec) :
Use CODE_FOR_sse_loadups.
: Use CODE_FOR_sse2_loadupd.
: Use CODE_FOR_sse2_loaddqu.
: Use CODE_FOR_sse_storeups.
: Use CODE_FOR_sse2_storeupd.
: Use CODE_FOR_sse2_storedqu.
: Use CODE_FOR_avx_loadups256.
: Use CODE_FOR_avx_loadupd256.
: Use CODE_FOR_avx_loaddqu256.
: Use CODE_FOR_avx_storeups256.
: Use CODE_FOR_avx_storeupd256.
: Use CODE_FOR_avx_storedqu256.

testsuite/ChangeLog:

2012-10-15  Uros Bizjak  

* gcc.target/i386/avx256-unaligned-load-1.c: Update asm scan patterns.
* gcc.target/i386/avx256-unaligned-load-2.c: Ditto.
* gcc.target/i386/avx256-unaligned-load-3.c: Ditto.
* gcc.target/i386/avx256-unaligned-load-4.c: Ditto.
* gcc.target/i386/avx256-unaligned-store-1.c: Ditto.
* gcc.target/i386/avx256-unaligned-store-2.c: Ditto.
* gcc.target/i386/avx256-unaligned-store-3.c: Ditto.
* gcc.target/i386/avx256-unaligned-store-4.c: Ditto.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Committed to mainline, similar patch will be committed to 4.7 branch.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 192420)
+++ config/i386/i386.c  (working copy)
@@ -16059,7 +16059,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r
 {
   rtx m;
   rtx (*extract) (rtx, rtx, rtx);
-  rtx (*move_unaligned) (rtx, rtx);
+  rtx (*load_unaligned) (rtx, rtx);
+  rtx (*store_unaligned) (rtx, rtx);
   enum machine_mode mode;

   switch (GET_MODE (op0))
@@ -16068,39 +16069,52 @@ ix86_avx256_split_vector_move_misalign (rtx op0, r
   gcc_unreachable ();
 case V32QImode:
   extract = gen_avx_vextractf128v32qi;
-  move_unaligned = gen_avx_movdqu256;
+  load_unaligned = gen_avx_loaddqu256;
+  store_unaligned = gen_avx_storedqu256;
   mode = V16QImode;
   break;
 case V8SFmode:
   extract = gen_avx_vextractf128v8sf;
-  move_unaligned = gen_avx_movups256;
+  load_unaligned = gen_avx_loadups256;
+  store_unaligned = gen_avx_storeups256;
   mode = V4SFmode;
   break;
 case V4DFmode:
   extract = gen_avx_vextractf128v4df;
-  move_unaligned = gen_avx_movupd256;
+  load_unaligned = gen_avx_loadupd256;
+  store_unaligned = gen_avx_storeupd256;
   mode = V2DFmode;
   break;
 }

-  if (MEM_P (op1) && TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
+  if (MEM_P (op1))
 {
-  rtx r = gen_reg_rtx (mode);
-  m = adjust_address (op1, mode, 0);
-  emit_move_insn (r, m);
-  m = adjust_address (op1, mode, 16);
-  r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
-  emit_move_insn (op0, r);
+  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
+   {
+

Re: [patch] Use hard_reg_set_iterator in a few places

2012-10-15 Thread Richard Sandiford

Steven Bosscher  writes:
>   * combine.c (record_dead_and_set_regs): Iterate over hard register set
>   with a hard_reg_set_iterator.
>   * cse.c (invalidate_for_call): Likewise.
>   * gcse.c (compute_hash_table_work): Likewise.
>   * loop-iv.c (simplify_using_initial_values): Likewise.
>   * postreload-gcse.c (record_opr_changes): Likewise.
>   * regcprop.c (copyprop_hardreg_forward_1): Likewise.
>   * var-tracking.c (dataflow_set_clear_at_call): Likewise.

OK, thanks.

Richard

Re: [lra] patch to revert a code from previous patch.

2012-10-15 Thread Richard Sandiford

Vladimir Makarov  writes:
>After committing a patch yesterday to implement proposals from a 
> review, I found that GCC crashes on SPEC2000 gap.  LRA is trying to find 
> a mode of operand (const_int 1) in *lea_general_1 insn and cannot find 
> it, as both the operand and the insn template operand have VOIDmode.
>
> There are still cases when context lookup is necessary to find a mode of 
> the operand.  So I am reversing the change I did yesterday.
>
> The patch is committed as rev. 192462.
>
> 2012-10-15  Vladimir Makarov  
>
>  * lra-int.h (lra_get_mode): Remove.
>  * lra-constraints.c (find_mode, get_op_mode): New functions.
>  (match_reload): Use get_op_mode instead of lra_get_mode.
>  (process_alt_operands, curr_insn_transform): Ditto.

But my objection to this code still stands.  It's wrong to assume
that an operand to an rtx has the same mode as the containing rtx.

Please add a testcase that shows the problem.

Richard

Re: RFC: C++ PATCH to support dynamic initialization and destruction of C++11 and OpenMP TLS variables

2012-10-15 Thread Richard Sandiford

domi...@lps.ens.fr (Dominique Dhumieres) writes:
> On x86_64-apple-darwin10 The following tests:
>
> g++.dg/gomp/tls-5.C
> g++.dg/tls/thread_local-cse.C
> g++.dg/tls/thread_local-order*.C
> g++.dg/tls/thread_local*g.C
>
> fail with
>
> sorry, unimplemented: dynamic initialization of non-function-local 
> thread_local variables not supported on this target

May not be related, but I was seeing g++.dg/tls/thread_local-cse.C
fail on mipsisa64-elf too.  It had the right conditions, but the dg-do
line needs to come first.

g++.dg/tls/thread_local-wrap4.C was also failing because it requires -fPIC.

I committed the following as (hopefully) obvious after testing
on mipsisa64-elf.

Richard


gcc/testsuite/
* g++.dg/tls/thread_local-cse.C: Move dg-do line.
* g++.dg/tls/thread_local-wrap4.C: Require fpic.

Index: gcc/testsuite/g++.dg/tls/thread_local-cse.C
===
--- gcc/testsuite/g++.dg/tls/thread_local-cse.C 2012-10-10 20:53:22.0 +0100
+++ gcc/testsuite/g++.dg/tls/thread_local-cse.C 2012-10-15 20:28:38.147650178 +0100
@@ -1,11 +1,11 @@
 // Test for CSE of the wrapper function: we should only call it once
 // for the two references to ir.
+// { dg-do run }
 // { dg-options "-std=c++11 -O -fno-inline -save-temps" }
 // { dg-require-effective-target tls_runtime }
 // { dg-require-alias }
 // { dg-final { scan-assembler-times "call *_ZTW2ir" 1 { xfail *-*-* } } }
 // { dg-final cleanup-saved-temps }
-// { dg-do run }
 
 // XFAILed until the back end supports a way to mark a function as cseable
 // though not pure.
Index: gcc/testsuite/g++.dg/tls/thread_local-wrap4.C
===
--- gcc/testsuite/g++.dg/tls/thread_local-wrap4.C   2012-10-14 14:02:01.0 +0100
+++ gcc/testsuite/g++.dg/tls/thread_local-wrap4.C   2012-10-15 20:28:38.147650178 +0100
@@ -2,6 +2,7 @@
 // copy per shared object.
 
 // { dg-require-effective-target tls }
+// { dg-require-effective-target fpic }
 // { dg-options "-std=c++11 -fPIC" }
 // { dg-final { scan-assembler-not "_ZTW1i@PLT" { target i?86-*-* x86_64-*-* } } }

Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-15 Thread Richard Sandiford

Joern Rennecke  writes:
> 2012-10-04  Joern Rennecke  
>
>  * final.c (get_attr_length_1): Use direct recursion rather than
>  calling get_attr_length.
>  (get_attr_lock_length): New function.
>  (INSN_VARIABLE_LENGTH_P): Define.
>  (shorten_branches): Take HAVE_ATTR_lock_length into account.
>  Don't overwrite non-delay slot insn lengths with the lengths of
>  delay slot insns with same uid.
>  * genattrtab.c (lock_length_str): New variable.
>  (make_length_attrs): New parameter base.
>  (main): Initialize lock_length_str.
>  Generate lock_lengths attributes.
>  * genattr.c (gen_attr): Emit declarations for lock_length attribute
>   related functions.
>   * doc/md.texi (node Insn Lengths): Document lock_length attribute.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00383.html

Sorry, this is really just repeating Richard B's comments, but I still
don't understand why we need two attributes.  Why can't shorten_branches
work with the existing length and simply make sure that the length doesn't
decrease from one iteration to the next?  That seems to be how you implement
CASE_VECTOR_SHORTEN_MODE.  It also means that we can continue to use the
pessimistic algorithm for -O0.
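The idea Richard suggests is to iterate to a fixed point while never letting an insn's length shrink. Once lengths are monotone and bounded, the loop must terminate. In the real compiler, alignment padding and ADJUST_INSN_LENGTH can make a needed length drop, and that is what can cause oscillation; taking the maximum rules it out. Below is a toy model with made-up encodings (2-byte short branch, 5-byte long branch, a single reach limit). It is not GCC's shorten_branches, and the types and names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Toy insn: target < 0 marks a non-branch with a fixed length; branches
// use a 2-byte short form when the target is within `range' bytes,
// otherwise a 5-byte long form.
struct ToyInsn
{
  int length;
  int target;
};

// Iterates until no length changes and returns the number of passes.
// A length is only ever raised (needed > current), so with lengths
// bounded by the long form the loop must terminate.
int shorten_branches_toy (std::vector<ToyInsn> &insns, int range)
{
  int iterations = 0;
  bool changed = true;
  while (changed)
    {
      changed = false;
      ++iterations;
      // Recompute insn addresses from the lengths at the start of the pass.
      std::vector<int> addr (insns.size (), 0);
      for (std::size_t i = 1; i < insns.size (); ++i)
        addr[i] = addr[i - 1] + insns[i - 1].length;
      for (std::size_t i = 0; i < insns.size (); ++i)
        {
          if (insns[i].target < 0)
            continue;
          int dist = std::abs (addr[insns[i].target] - addr[i]);
          int needed = dist <= range ? 2 : 5;
          if (needed > insns[i].length)
            {
              insns[i].length = needed;
              changed = true;
            }
        }
    }
  return iterations;
}
```

If every branch starts at its long form, the loop finishes after one pass with no changes, which corresponds to the pessimistic approach kept for -O0.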

You said in your reply that one of the reasons was to avoid
"interesting" interactions with ADJUST_INSN_LENGTH.  But the
previous minimum length would be applied after ADJUST_INSN_LENGTH,
so I'm not sure why it's a factor.

If lock_length is just an optimisation on top of that, then maybe
it would help to split it out.

Richard

Re: [lra] patch to revert a code from previous patch.

2012-10-15 Thread Richard Sandiford

Richard Sandiford  writes:
> Vladimir Makarov  writes:
>>After committing a patch yesterday to implement proposals from a 
>> review, I found that GCC crashes on SPEC2000 gap.  LRA is trying to find 
>> a mode of operand (const_int 1) in *lea_general_1 insn and cannot find 
>> it, as both the operand and the insn template operand have VOIDmode.
>>
>> There are still cases when context lookup is necessary to find a mode of 
>> the operand.  So I am reversing the change I did yesterday.
>>
>> The patch is committed as rev. 192462.
>>
>> 2012-10-15  Vladimir Makarov  
>>
>>  * lra-int.h (lra_get_mode): Remove.
>>  * lra-constraints.c (find_mode, get_op_mode): New functions.
>>  (match_reload): Use get_op_mode instead of lra_get_mode.
>>  (process_alt_operands, curr_insn_transform): Ditto.
>
> But my objection to this code still stands.  It's wrong to assume
> that an operand to an rtx has the same mode as the containing rtx.
>
> Please add a testcase that shows the problem.

(...because I was hoping to have a look myself).  But if that's too
difficult to reduce, then which operand to *lea_general_1 was the problem?
The pattern looks like:

(define_insn_and_split "*lea_general_1"
  [(set (match_operand 0 "register_operand" "=r")
(plus (plus (match_operand 1 "index_register_operand" "l")
(match_operand 2 "register_operand" "r"))
  (match_operand 3 "immediate_operand" "i")))]

So operands 0, 1 and 2 should have been registers.  Operand 3 never
needs reloading, so its mode shouldn't matter.

Richard

Re: RFC: Merge the GUPC branch into the GCC 4.8 trunk (patch 0 of 16)

2012-10-15 Thread Gary Funck

On 10/15/12 17:06:28, Joseph S. Myers wrote:
> On Mon, 15 Oct 2012, Gary Funck wrote:
> > Various UPC language related checks and operations
> > are called in the "C" front-end and middle-end.
> > To ensure that these operations are defined,
> > when linked with the other language front-ends
> > and compilers, these functions are stubbed,
> > in a fashion similar to Objective C:
> 
> Is there a reason you chose this approach rather than the -fcilkplus 
> approach of enabling an extension in the C front end given a command-line 
> option?  (If you don't want to support e.g. the ObjC / UPC combination, 
> you can always give an error in such cases.)

Back when we began to develop GUPC, it was recommended that we
introduce the UPC capability as a language dialect, similar to
Objective C.  That is the approach that we have taken.

> In general I think such conditionals are preferable to linking
> in stub variants of functions - and 
> I'm sure people doing all-languages LTO bootstraps will appreciate not 
> having to do link-time optimization of the language-independent parts of 
> the compiler yet more times because of yet another binary like cc1, 
> cc1plus, ... that links in much the same code.

I agree that there is no inherent reason that cc1upc is built,
other than the fact that we use a similar approach to Objective C.
However, I think that re-working this aspect of how GUPC is
implemented will require a fair amount of time/effort.  If we
can find a way to make that happen in the GCC 4.8 time frame,
or if other GCC contributors are willing to help on this,
then perhaps such a change is feasible.

- Gary

[PATCH] Install error handler for out-of-memory when using STL containers Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-15 Thread Tobias Schlüter



Hi,

On 2012-10-14 23:44, Jakub Jelinek wrote:

On Mon, Oct 15, 2012 at 12:35:27AM +0300, Janne Blomqvist wrote:

On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter

I'm putting forward two patches.  One uses a C++ map to very concisely build
up and handle the ordered list of symbols.  This has three problems:
1) gfortran maintainers may not want C++isms (even though in this case
it's very localized, and in my opinion very transparent), and


Even if you prefer C++isms, why don't you go for "hash-table.h"?
std::map at least with the default allocator will just crash the compiler
if malloc returns NULL (remember that we build with -fno-exceptions),
while when you use hash-table.h (or hashtab.h) you get proper OOM diagnostics.


The attached patch adds out-of-memory diagnostics for code using STL 
containers by using set_new_handler.  Since the intended allocation size 
is not available to a new_handler, I had to forego a more detailed error 
message such as the one from xmalloc_failed().  fatal_error() and 
abort() don't give a meaningful location when the new_handler is called, 
so I chose to put together the error message manually as is done in 
xmalloc_failed().  I would have found it more appealing to have operator 
new call xmalloc() unless a custom allocator is given, but I don't think 
there's a standard way of doing this.
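Tobias mentions a custom allocator as the way to get xmalloc-style behaviour from a particular container. The sketch below is a minimal C++11 allocator along those lines. It only illustrates that alternative and is not what the patch does. `xmalloc_like` is a hypothetical stand-in for libiberty's `xmalloc`, which reports the failure and exits instead of returning NULL. `xmap` is a hypothetical std::map typedef that uses the allocator. Unlike the new_handler approach, this has to be applied container by container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for libiberty's xmalloc: never returns NULL.
static void *xmalloc_like (std::size_t size)
{
  void *p = std::malloc (size ? size : 1);
  if (!p)
    {
      std::fprintf (stderr, "out of memory allocating %lu bytes\n",
                    (unsigned long) size);
      std::exit (1);
    }
  return p;
}

// Minimal C++11 allocator that routes container storage through
// xmalloc_like.  n * sizeof (T) is assumed not to overflow (std::map
// allocates one node at a time).
template <typename T>
struct xmalloc_allocator
{
  typedef T value_type;

  xmalloc_allocator () {}
  template <typename U>
  xmalloc_allocator (const xmalloc_allocator<U> &) {}

  T *allocate (std::size_t n)
  {
    return static_cast<T *> (xmalloc_like (n * sizeof (T)));
  }
  void deallocate (T *p, std::size_t) { std::free (p); }
};

// All instances are interchangeable: memory from one can be freed by another.
template <typename T, typename U>
bool operator== (const xmalloc_allocator<T> &, const xmalloc_allocator<U> &)
{ return true; }
template <typename T, typename U>
bool operator!= (const xmalloc_allocator<T> &, const xmalloc_allocator<U> &)
{ return false; }

// A std::map whose nodes are allocated with xmalloc_like.
typedef std::map<int, int, std::less<int>,
                 xmalloc_allocator<std::pair<const int, int> > > xmap;
```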


Built and tested on the C and Fortran testsuites.  Ok for trunk?

Cheers,
- Tobi

2012-10-15  Tobias Schlüter  

* toplev.c: Add '#include '.
(cxx_out_of_memory): New function.
(general_init): Install cxx_out_of_memory as handler for
out-of-memory condition.

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 2c9329f..2e6248a 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -89,6 +89,8 @@ along with GCC; see the file COPYING3.  If not see
   declarations for e.g. AIX 4.x.  */
 #endif
 
+#include 
+
 static void general_init (const char *);
 static void do_compile (void);
 static void process_options (void);
@@ -1061,6 +1063,21 @@ open_auxiliary_file (const char *ext)
   return file;
 }
 
+
+/* Error handler for use with C++ memory allocation.  Will be
+   installed via std::set_new_handler().  */
+
+static void
+cxx_out_of_memory()
+{
+  fprintf (stderr,
+  "\n%s%sout of memory\n",
+  progname, *progname ? ": " : "");
+
+  xexit (1);
+}
+
+
 /* Initialization of the front end environment, before command line
    options are parsed.  Signal handlers, internationalization etc.
    ARGV0 is main's argv[0].  */
@@ -1074,6 +1091,8 @@ general_init (const char *argv0)
     --p;
   progname = p;
 
+  std::set_new_handler (cxx_out_of_memory);
+
   xmalloc_set_program_name (progname);
 
   hex_init ();

Re: [PATCH] Reduce conservativeness in REE using machine model (issue6631066)

2012-10-15 Thread Teresa Johnson

On Fri, Oct 12, 2012 at 1:23 AM, Jakub Jelinek  wrote:
> On Thu, Oct 11, 2012 at 02:44:12PM -0700, Teresa Johnson wrote:
>> Revised patch to address conservative behavior in redundant extend
>> elimination that was resulting in redundant extends not being
>> removed. Now uses a new target hook machine_mode_from_attr_mode
>> which is currently enabled only for i386.
>
> I still don't like it, the hook still is about how it is implemented
> instead of what target property it wants to ask (the important thing
> there is that a {QI,HI} -> SImode zero extension instruction on x86_64
> performs {QI,HI} -> DImode extension actually).  That isn't the case for any
> other modes, isn't the case for sign extension etc.

That's true, although for sign extends the attr modes set in
i386.md ensure that this won't do the wrong thing, because the attr
modes in the machine description match the machine_mode. However, this
means the conservative behavior remains for sign
extends (see testcase below).
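The x86-64 property Jakub describes can be illustrated with a small model of how three byte-extension instructions update RAX. This is a hypothetical sketch of the architectural register-write rules, not GCC code. Writing a 32-bit register clears bits 63..32, while writing a 16-bit register leaves bits 63..16 unchanged. As a result, a zero extension into a 32-bit register is also a 64-bit zero extension, but the same does not hold for sign extension.

```cpp
#include <cassert>
#include <cstdint>

// movzbl %src, %eax: zero-extend to 32 bits; the 32-bit write also
// clears bits 63..32, so the old RAX contents are irrelevant.
static std::uint64_t
movzbl (std::uint64_t old_rax, std::uint8_t src)
{
  (void) old_rax;
  return static_cast<std::uint32_t> (src);
}

// movsbl %src, %eax: sign-extend to 32 bits; bits 63..32 are still
// cleared, so this is NOT a 64-bit sign extension of a negative byte.
static std::uint64_t
movsbl (std::uint64_t old_rax, std::uint8_t src)
{
  (void) old_rax;
  std::int32_t v = static_cast<std::int8_t> (src);
  return static_cast<std::uint32_t> (v);
}

// movsbw %src, %ax: sign-extend to 16 bits; bits 63..16 are preserved.
static std::uint64_t
movsbw (std::uint64_t old_rax, std::uint8_t src)
{
  std::int16_t v = static_cast<std::int8_t> (src);
  std::uint64_t low = static_cast<std::uint16_t> (v);
  return (old_rax & ~static_cast<std::uint64_t> (0xffff)) | low;
}
```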

>
> Can you please post a testcase first?

This was exposed by the snappy decompression code. However, I was able
to reproduce it by modifying the testcase submitted with the fix that
introduced this checking (gcc.c-torture/execute/20111227-1.c for
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01744.html). This test
case was failing because a sign_extend was being combined with a
zero_extend, so the instruction code check below fixed it, and the
mode check was unnecessary:

  /* Second, make sure the reaching definitions don't feed another and
     different extension.  FIXME: this obviously can be improved.  */
  for (def = defs; def; def = def->next)
    if ((idx = def_map[INSN_UID(DF_REF_INSN (def->ref))])
        && (cand = &VEC_index (ext_cand, *insn_list, idx - 1))
        && (cand->code != code || cand->mode != mode))
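The quoted condition can be modeled in a few lines. This is a hypothetical simplification, not GCC's data structures. Each reaching definition that is itself an extension candidate carries an extension code and a destination width. The check gives up if any of them differs from the current candidate in code or in mode, and definitions that are not candidates (idx == 0 in ree.c) are simply left out of the list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model types, not ree.c's ext_cand.
enum ext_code_model { ZERO_EXT, SIGN_EXT };

struct ext_cand_model
{
  ext_code_model code;
  int mode_bits;  // destination width: 16 (HImode), 32 (SImode), 64 (DImode)
};

// Mirrors the current check: any reaching extension with a different
// code OR a different mode makes REE give up on this candidate.
static bool
defs_ok_current (const std::vector<ext_cand_model> &defs,
                 ext_code_model code, int mode_bits)
{
  for (const ext_cand_model &cand : defs)
    if (cand.code != code || cand.mode_bits != mode_bits)
      return false;
  return true;
}
```

Under this model, a 16-bit sign extension feeding a 64-bit one is rejected solely because the modes differ. That is the conservative case shown by the signed test case.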

Here is my modified test case that exposes the conservative behavior I
saw in snappy:

extern void abort (void);

unsigned short s;
unsigned int i;
unsigned long l;
unsigned char v = -1;

void __attribute__((noinline,noclone))
bar (int t)
{
  if (t == 2 && s != 0xff)
    abort ();
  if (t == 1 && i != 0xff)
    abort ();
  if (t == 0 && l != 0xff)
    abort ();
}

void __attribute__((noinline,noclone))
foo (unsigned char *a, int t)
{
  unsigned char r = v;

  if (t == 2)
    s = (unsigned short) r;
  else if (t == 1)
    i = (unsigned int) r;
  else if (t == 0)
    l = (unsigned long) r;
  bar (t);
}

int main(void)
{
  foo (&v, 0);
  foo (&v, 1);
  foo (&v, 2);
  return 0;
}
---

With trunk, there are currently 3 movzbl generated for foo():

movzbl  v(%rip), %eax
movzbl  %al, %eax
movzbl  %al, %eax

With my fix this goes down to 1 movzbl. However, if the test case is
modified to use signed instead of unsigned types, we still end up with
3 sign-extending moves, of which 2 are redundant:

movsbw  v(%rip), %ax
movsbq  %al, %rax
movsbl  %al, %eax

A single movsbq will suffice. But because of the attr mode settings
for sign extends I mentioned above, my patch does not help here.
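A single movsbq suffices because the low 16 and 32 bits of a 64-bit sign extension are exactly the 16- and 32-bit sign extensions of the same byte. The short, int and long stores in foo() can therefore all read from one widened register. The sketch below checks this exhaustively over all 256 byte values. It models only the arithmetic, not the REE pass.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a byte to 64 bits, as movsbq does.
static std::int64_t
sext64 (std::uint8_t b)
{
  return static_cast<std::int8_t> (b);
}

// True if truncating the 64-bit sign extension to 16 and 32 bits gives
// the same values as sign-extending the byte directly to those widths.
static bool
movsbq_subsumes_narrower (std::uint8_t b)
{
  std::int64_t wide = sext64 (b);
  std::int16_t low16 = static_cast<std::int16_t> (wide);
  std::int32_t low32 = static_cast<std::int32_t> (wide);
  return low16 == static_cast<std::int16_t> (static_cast<std::int8_t> (b))
         && low32 == static_cast<std::int32_t> (static_cast<std::int8_t> (b));
}
```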

>
> Given the recent ree.c changes to remember the performed operations and
> their original modes (struct ext_modified), perhaps the
> "Second, make sure the reaching definitions don't feed another and"...
> check could be made less strict or even removed, but for that a testcase is
> really needed.

I believe that we can remove the mode check from the above code
altogether, because the ree.c code always selects the widest mode when
combining extends (in merge_def_and_ext). So with a simple change to
ree.c that drops the mode check, both the unsigned and signed cases
are addressed. In the signed case we are left with a single movsbq:

movsbq  v(%rip), %rax
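The proposed relaxation can be sketched with the same kind of hypothetical model (again, not GCC's data structures). A reaching extension is rejected only when its extension code differs. The merged extension then takes the widest destination mode, which is how the message describes merge_def_and_ext.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model types, not ree.c's ext_cand.
enum ext_code_model { ZERO_EXT, SIGN_EXT };

struct ext_cand_model
{
  ext_code_model code;
  int mode_bits;  // destination width in bits
};

// Proposed check: only a different extension code blocks the merge.
static bool
defs_ok_proposed (const std::vector<ext_cand_model> &defs,
                  ext_code_model code)
{
  for (const ext_cand_model &cand : defs)
    if (cand.code != code)
      return false;
  return true;
}

// Width of the surviving extension: the widest of the candidate and
// the reaching extensions.
static int
merged_mode_bits (const std::vector<ext_cand_model> &defs, int mode_bits)
{
  int widest = mode_bits;
  for (const ext_cand_model &cand : defs)
    widest = std::max (widest, cand.mode_bits);
  return widest;
}
```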

I didn't see any test failures in an x86_64-unknown-linux-gnu bootstrap
and regression test run. If this seems reasonable, I can follow up
with the patch (which is trivial), and I can also submit the 2 new
test cases (the signed test case is included below).

Thanks,
Teresa

extern void abort (void);

signed short s;
signed int i;
signed long l;
signed char v = -1;

void __attribute__((noinline,noclone))
bar (int t)
{
  if (t == 2 && s != -1)
    abort ();
  if (t == 1 && i != -1)
    abort ();
  if (t == 0 && l != -1)
    abort ();
}

void __attribute__((noinline,noclone))
foo (signed char *a, int t)
{
  signed char r = v;

  if (t == 2)
    s = (signed short) r;
  else if (t == 1)
    i = (signed int) r;
  else if (t == 0)
    l = (signed long) r;
  bar (t);
}

int main(void)
{
  foo (&v, 0);
  foo (&v, 1);
  foo (&v, 2);
  return 0;
}

>
> Jakub

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
