Re: [PATCH, doc]: Fix a bunch of warnings in *.texi files

2014-05-18 Thread Uros Bizjak
On Sun, May 18, 2014 at 7:17 AM, David Wohlferd  wrote:
> My bad.  My version of makeinfo wasn't reporting these errors.
>
> However, this isn't right either.  There are two subsections that are now
> under "Size of an asm" that should be under "Variables in Specified
> Registers."  How about this (attached)?

Oh, I was not aware that this is a nested @menu with its own sections.

Sure, your patch is OK. I went ahead and installed it on mainline,
after I have bootstrapped it on x86_64-linux-gnu.

Thanks,
Uros.


[PATCH, doc]: Fix "POD document had syntax errors at /usr/bin/pod2man line 69." error

2014-05-18 Thread Uros Bizjak
Hello!

The attached patch fixes the following errors in .pod document sources:

gfdl.pod around line 53: Expected text after =item, not a number
gfdl.pod around line 147: Expected text after =item, not a number
gfdl.pod around line 165: Expected text after =item, not a number
gfdl.pod around line 205: Expected text after =item, not a number
gfdl.pod around line 357: Expected text after =item, not a number
gfdl.pod around line 384: Expected text after =item, not a number
gfdl.pod around line 400: Expected text after =item, not a number
gfdl.pod around line 422: Expected text after =item, not a number
gfdl.pod around line 445: Expected text after =item, not a number
gfdl.pod around line 475: Expected text after =item, not a number
gfdl.pod around line 499: Expected text after =item, not a number
POD document had syntax errors at /usr/bin/pod2man line 69.
gmake[3]: [doc/gfdl.7] Error 1 (ignored)

As suggested in the fix for a similar problem [1], the solution is to
put "Z<>" in the "=item" argument string.
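
As a rough illustration of the idea (sketched in C++ rather than the Perl
used by texi2pod.pl; pod_item_line is a hypothetical helper, not part of the
script), prefixing the counter with POD's null formatting code Z<> means the
=item argument no longer starts with a digit:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the "=item" line for an enumerated entry.
// The leading "Z<>" is POD's null (zero-effect) formatting code; it renders
// as nothing but keeps the argument from being a bare number.
std::string pod_item_line(const std::string &counter)
{
    return "\n=item Z<>" + counter + "\n";
}
```

Calling pod_item_line("1") yields an =item line whose argument is "Z<>1"
instead of a bare "1".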

2014-05-18  Uros Bizjak  

* texi2pod.pl: Force .pod file to not be a numbered list.

The fix was tested by bootstrapping on Fedora20 x86_64-pc-linux-gnu,
and by comparing the previous .man and .html files with the new ones.
They were bit-identical.

OK for mainline and 4.9?

[1] http://comments.gmane.org/gmane.network.inn/9841

Uros.
Index: texi2pod.pl
===
--- texi2pod.pl (revision 210579)
+++ texi2pod.pl (working copy)
@@ -1,6 +1,6 @@
 #! /usr/bin/perl -w
 
-#   Copyright (C) 1999, 2000, 2001, 2003, 2010 Free Software Foundation, Inc.
+#   Copyright (C) 1999-2014 Free Software Foundation, Inc.
 
 # This file is part of GCC.
 
@@ -337,7 +337,7 @@
 $_ = "\n=item $1\n";
 }
} else {
-   $_ = "\n=item $ic\n";
+   $_ = "\n=item Z\<\>$ic\n";
$ic =~ y/A-Ya-y/B-Zb-z/;
$ic =~ s/(\d+)/$1 + 1/eg;
}


[PING^2] [PATCH, wwwdocs, AArch64] Document issues with singleton vector types

2014-05-18 Thread Yufeng Zhang

Ping^2

Thanks,
Yufeng

On 05/08/14 17:38, Yufeng Zhang wrote:

Ping~

Originally posted here:

http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00019.html

Thanks,
Yufeng

On 05/01/14 17:57, Yufeng Zhang wrote:

Hi,

This patch documents issues with singleton vector types in the 4.9
AArch64 backend.

"On AArch64, the singleton vector types int64x1_t, uint64x1_t and
float64x1_t exported by arm_neon.h are defined to be the same as their
base types.  This results in incorrect application of parameter passing
rules to arguments of types int64x1_t and uint64x1_t, with respect to
the AAPCS64 ABI specification.  In addition, names of C++ functions with
parameters of these types (including float64x1_t) are not mangled
correctly.  The current typedef declarations also unintentionally
allow implicit casting between singleton vector types and their base
types.  These issues will be resolved in a near future release.  See
PR60825 for more information."
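
A minimal sketch of why a plain typedef causes the issues described above.
It uses a stand-in typedef (fake_int64x1_t is a hypothetical name, not the
real arm_neon.h declaration), so it only illustrates the language-level
effect and not the AAPCS64 parameter passing rules:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for the 4.9 situation: the "vector" type is only a typedef of
// its base type (hypothetical name, not the real header).
typedef std::int64_t fake_int64x1_t;

// Because the typedef names the very same type, this function cannot be
// overloaded separately for the base type, and both spellings would get
// the same mangled name.
inline std::int64_t twice(fake_int64x1_t v) { return 2 * v; }

// True when the two types are indistinguishable to the compiler.
constexpr bool same_as_base =
    std::is_same<fake_int64x1_t, std::int64_t>::value;
```

Passing a plain std::int64_t to twice() compiles without any cast, which is
the unintended implicit conversion mentioned in the text.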

OK for the wwwdocs repos?

Thanks,
Yufeng


Re: [PATCH, libgomp doc]: Fix all libgomp.texi warnings

2014-05-18 Thread Jakub Jelinek
On Sat, May 17, 2014 at 03:43:53PM +0200, Uros Bizjak wrote:
> 2014-05-17  Uros Bizjak  
> 
> * libgomp.texi (Runtime Library Routines): Remove multiple @menu.
> (Environment Variables): Move OMP_PROC_BIND and OMP_STACKSIZE node
> texts according to their @menu entry positions.

> Tested with x86_64-pc-linux-gnu bootstrap.
> 
> OK for mainline and 4.9?

Ok, thanks.

Jakub


[wwwdocs] Buildstat update for 4.9

2014-05-18 Thread Tom G. Christensen
Latest results for 4.9.x

-tgc

Testresults for 4.9.0:
  arm-unknown-linux-gnueabi
  hppa-unknown-linux-gnu
  i386-pc-solaris2.9 (2)
  i386-pc-solaris2.10
  i386-pc-solaris2.11
  i686-unknown-linux-gnu
  mips-unknown-linux-gnu
  mipsel-unknown-linux-gnu
  powerpc-apple-darwin8.11.0
  powerpc-unknown-linux-gnu
  powerpc64-unknown-linux-gnu
  sparc-sun-solaris2.9 (2)
  sparc-sun-solaris2.11
  sparc64-sun-solaris2.9
  sparc-unknown-linux-gnu
  x86_64-unknown-linux-gnu (3)

Index: buildstat.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/buildstat.html,v
retrieving revision 1.1
diff -u -r1.1 buildstat.html
--- buildstat.html  11 Apr 2014 13:36:53 -  1.1
+++ buildstat.html  18 May 2014 13:24:33 -
@@ -20,5 +20,141 @@
 http://gcc.gnu.org/install/finalinstall.html";>
 Installing GCC: Final Installation.
 
+
+
+
+arm-unknown-linux-gnueabi
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-05/msg00051.html";>4.9.0
+
+
+
+
+hppa-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-05/msg00210.html";>4.9.0
+
+
+
+
+i386-pc-solaris2.9
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02112.html";>4.9.0,
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01785.html";>4.9.0
+
+
+
+
+i386-pc-solaris2.10
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01761.html";>4.9.0
+
+
+
+
+i386-pc-solaris2.11
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01758.html";>4.9.0
+
+
+
+
+i686-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02200.html";>4.9.0
+
+
+
+
+mips-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-05/msg00205.html";>4.9.0
+
+
+
+
+mipsel-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-05/msg00096.html";>4.9.0
+
+
+
+
+powerpc-apple-darwin8.11.0
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02408.html";>4.9.0
+
+
+
+
+powerpc-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-05/msg00841.html";>4.9.0
+
+
+
+
+powerpc64-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02340.html";>4.9.0
+
+
+
+
+sparc-sun-solaris2.9
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02113.html";>4.9.0,
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01832.html";>4.9.0
+
+
+
+
+sparc-sun-solaris2.11
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01826.html";>4.9.0
+
+
+
+
+sparc64-sun-solaris2.9
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02114.html";>4.9.0
+
+
+
+
+sparc-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-05/msg00150.html";>4.9.0
+
+
+
+
+x86_64-unknown-linux-gnu
+ 
+Test results:
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg02165.html";>4.9.0,
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01757.html";>4.9.0,
+http://gcc.gnu.org/ml/gcc-testresults/2014-04/msg01741.html";>4.9.0
+
+
+
+
+
 
 


[PATCH] Fix PR middle-end/61141

2014-05-18 Thread John David Anglin
The attached change appears to fix PR middle-end/61141.  On PA, we can get
deleted insn notes in call sequences.  The attached change checks to make
sure we have a valid insn before calling reset_insn_used_flags and
verify_insn_sharing.
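
The shape of the guard can be sketched as follows.  The types here are
simplified, hypothetical stand-ins for GCC's RTL objects, and is_insn only
plays the role that INSN_P plays in the patch:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for elements of an RTL sequence.
enum class Kind { Insn, Note };
struct Element { Kind kind; bool used; };

// Plays the role of INSN_P: true only for real instructions.
inline bool is_insn(const Element &e) { return e.kind == Kind::Insn; }

// Clear the "used" flag on real insns only; notes are skipped.
// Returns how many elements were actually processed.
inline int reset_used_flags(std::vector<Element> &seq)
{
    int visited = 0;
    for (Element &e : seq)
        if (is_insn(e)) {
            e.used = false;
            ++visited;
        }
    return visited;
}
```

Elements that are not insns are left untouched, which is the behaviour the
patch adds for deleted insn notes inside a sequence.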


Tested on hppa-unknown-linux-gnu, hppa2.0w-hp-hpux11.11 and
hppa64-hp-hpux11.11.


OK for trunk?

Dave
--
John David Anglin   dave.ang...@bell.net


2014-05-18  John David Anglin  

PR middle-end/61141
* emit-rtl.c (reset_all_used_flags): In a sequence, check that
XVECEXP (pat, 0, i) is an INSN before calling reset_insn_used_flags.
(verify_rtl_sharing): Likewise.

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 210323)
+++ emit-rtl.c  (working copy)
@@ -2698,7 +2698,11 @@
  {
gcc_assert (REG_NOTES (p) == NULL);
for (int i = 0; i < XVECLEN (pat, 0); i++)
- reset_insn_used_flags (XVECEXP (pat, 0, i));
+ {
+   rtx insn = XVECEXP (pat, 0, i);
+   if (INSN_P (insn))
+ reset_insn_used_flags (insn);
+ }
  }
   }
 }
@@ -2735,7 +2739,11 @@
  verify_insn_sharing (p);
else
  for (int i = 0; i < XVECLEN (pat, 0); i++)
-   verify_insn_sharing (XVECEXP (pat, 0, i));
+ {
+   rtx insn = XVECEXP (pat, 0, i);
+   if (INSN_P (insn))
+ verify_insn_sharing (insn);
+ }
   }
 
   reset_all_used_flags ();


[C++ Patch] Use inform in 2 places

2014-05-18 Thread Paolo Carlini

Hi,

while working on c++/58664 I noticed a couple of places where, IMHO, we 
should use inform. Tested x86_64-linux.


Thanks!
Paolo.

///
/cp
2014-05-18  Paolo Carlini  

* typeck2.c (cxx_incomplete_type_diagnostic): Use inform. 
* parser.c (cp_parser_enum_specifier): Likewise.

/testsuite
2014-05-18  Paolo Carlini  

* c-c++-common/gomp/simd4.c: Adjust for inform.
* g++.dg/cpp0x/decltype-call1.C: Likewise.
* g++.dg/cpp0x/forw_enum6.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-ice7.C: Likewise.
* g++.dg/cpp0x/noexcept15.C: Likewise.
* g++.dg/cpp0x/variadic-ex2.C: Likewise.
* g++.dg/eh/spec6.C: Likewise.
* g++.dg/expr/cast1.C: Likewise.
* g++.dg/expr/dtor1.C: Likewise.
* g++.dg/ext/is_base_of_diagnostic.C: Likewise.
* g++.dg/ext/unary_trait_incomplete.C: Likewise.
* g++.dg/gomp/pr49223-2.C: Likewise.
* g++.dg/gomp/udr-4.C: Likewise.
* g++.dg/init/delete1.C: Likewise.
* g++.dg/other/crash-2.C: Likewise.
* g++.dg/parse/crash24.C: Likewise.
* g++.dg/parse/crash25.C: Likewise.
* g++.dg/parse/crash31.C: Likewise.
* g++.dg/parse/crash49.C: Likewise.
* g++.dg/parse/crash50.C: Likewise.
* g++.dg/parse/crash54.C: Likewise.
* g++.dg/parse/dtor7.C: Likewise.
* g++.dg/parse/error40.C: Likewise.
* g++.dg/parse/fused-params1.C: Likewise.
* g++.dg/parse/new1.C: Likewise.
* g++.dg/template/crash35.C: Likewise.
* g++.dg/template/crash59.C: Likewise.
* g++.dg/template/crash77.C: Likewise.
* g++.dg/template/error51.C: Likewise.
* g++.dg/template/incomplete1.C: Likewise.
* g++.dg/template/incomplete3.C: Likewise.
* g++.dg/template/incomplete4.C: Likewise.
* g++.dg/template/incomplete5.C: Likewise.
* g++.dg/template/inherit8.C: Likewise.
* g++.dg/template/instantiate1.C: Likewise.
* g++.dg/template/instantiate3.C: Likewise.
* g++.dg/template/offsetof2.C: Likewise.
* g++.dg/tm/pr51928.C: Likewise.
* g++.dg/warn/Wdelete-incomplete-1.C: Likewise.
* g++.dg/warn/incomplete1.C: Likewise.
* g++.dg/warn/incomplete2.C: Likewise.
* g++.old-deja/g++.brendan/friend4.C: Likewise.
* g++.old-deja/g++.bugs/900121_01.C: Likewise.
* g++.old-deja/g++.bugs/900214_01.C: Likewise.
* g++.old-deja/g++.eh/catch1.C: Likewise.
* g++.old-deja/g++.eh/spec6.C: Likewise.
* g++.old-deja/g++.mike/p7868.C: Likewise.
* g++.old-deja/g++.other/crash38.C: Likewise.
* g++.old-deja/g++.other/enum2.C: Likewise.
* g++.old-deja/g++.other/incomplete.C: Likewise.
* g++.old-deja/g++.other/vaarg3.C: Likewise.
* g++.old-deja/g++.pt/crash9.C: Likewise.
* g++.old-deja/g++.pt/niklas01a.C: Likewise.
* g++.old-deja/g++.pt/typename8.C: Likewise.
* g++.old-deja/g++.robertl/ice990323-1.C: Likewise.

Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 210579)
+++ cp/typeck2.c(working copy)
@@ -438,7 +438,7 @@ void
 cxx_incomplete_type_diagnostic (const_tree value, const_tree type, 
diagnostic_t diag_kind)
 {
-  int decl = 0;
+  bool is_decl = false, complained = false;
 
   gcc_assert (diag_kind == DK_WARNING 
  || diag_kind == DK_PEDWARN 
@@ -452,10 +452,10 @@ cxx_incomplete_type_diagnostic (const_tree value,
 || TREE_CODE (value) == PARM_DECL
 || TREE_CODE (value) == FIELD_DECL))
 {
-  emit_diagnostic (diag_kind, input_location, 0,
-  "%q+D has incomplete type", value);
-  decl = 1;
-}
+  complained = emit_diagnostic (diag_kind, input_location, 0,
+   "%q+D has incomplete type", value);
+  is_decl = true;
+} 
  retry:
   /* We must print an error message.  Be clever about what it says.  */
 
@@ -464,15 +464,19 @@ cxx_incomplete_type_diagnostic (const_tree value,
 case RECORD_TYPE:
 case UNION_TYPE:
 case ENUMERAL_TYPE:
-  if (!decl)
-   emit_diagnostic (diag_kind, input_location, 0,
-"invalid use of incomplete type %q#T", type);
-  if (!TYPE_TEMPLATE_INFO (type))
-   emit_diagnostic (diag_kind, input_location, 0,
-"forward declaration of %q+#T", type);
-  else
-   emit_diagnostic (diag_kind, input_location, 0,
-"declaration of %q+#T", type);
+  if (!is_decl)
+   complained = emit_diagnostic (diag_kind, input_location, 0,
+ "invalid use of incomplete type %q#T",
+ type);
+  if (complained)
+   {
+ if (!TYPE_TEMPLATE_INFO (type))
+   inform (DECL_SOURCE_LOCAT

[Ada] Fix wrong code generated for superflat array

2014-05-18 Thread Eric Botcazou
This is a regression present on the mainline and 4.9 branch for a corner case: 
a superflat array indexed by an enumeration type with representation clause.
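
For readers unfamiliar with the term: a superflat array is usually taken to
be one whose high bound is lower than its low bound minus one.  The sketch
below (hypothetical helper names, not code from decl.c) shows why such
bounds need care when a length is derived from them; the example bounds 4
and 1 are only chosen to resemble a representation clause like (1,2,4):

```cpp
#include <algorithm>
#include <cassert>

// An index range is "superflat" when high < low - 1, i.e. it is empty by
// more than one step.
inline bool is_superflat(long low, long high) { return high < low - 1; }

// Naive length; becomes negative for superflat ranges.
inline long naive_length(long low, long high) { return high - low + 1; }

// Length clamped at zero, which is what an empty array needs.
inline long safe_length(long low, long high)
{
    return std::max(naive_length(low, high), 0L);
}
```

With low = 4 and high = 1 the naive length is -2, while the clamped length
is 0.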

Tested on x86_64-suse-linux, applied on the mainline and 4.9 branch.


2014-05-18  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Do not
consider that regular packed arrays can never be superflat.


2014-05-18  Eric Botcazou  

* gnat.dg/enum3.adb: New test.


-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 210579)
+++ gcc-interface/decl.c	(working copy)
@@ -2420,8 +2420,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		 we can just use the high bound of the index type.  */
 	  else if ((Nkind (gnat_index) == N_Range
 		&& cannot_be_superflat_p (gnat_index))
-		   /* Packed Array Types are never superflat.  */
-		   || Is_Packed_Array_Type (gnat_entity))
+		   /* Bit-Packed Array Types are never superflat.  */
+		   || (Is_Packed_Array_Type (gnat_entity)
+			   && Is_Bit_Packed_Array
+			  (Original_Array_Type (gnat_entity))))
 		gnu_high = gnu_max;
 
 	  /* Otherwise, if the high bound is constant but the low bound is
-- { dg-do run }

procedure Enum3 is
   type Enum is (Aaa, Bbb, Ccc);
   for Enum use (1,2,4);
begin
   for Lo in Enum loop
  for Hi in Enum loop
 declare
subtype S is Enum range Lo .. Hi;
type Vector is array (S) of Integer;
Vec : Vector;
 begin
for I in S loop
   Vec (I) := 0;
end loop;
if Vec /= (S => 0) then
   raise Program_Error;
end if;
 end;
  end loop;
   end loop;
end;


[Ada] Minor cleanup

2014-05-18 Thread Eric Botcazou
This replaces an explicit test for private types by Underlying_Type, which 
does the test automatically.
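
The equivalence behind the cleanup can be sketched like this.  Entity and
underlying_type are hypothetical and much simplified; the real
Underlying_Type in the front end handles more cases than this model:

```cpp
#include <cassert>

// Hypothetical, much simplified model of a front-end type entity.
struct Entity {
    bool is_private;
    bool is_array;
    const Entity *full_view;   // null when there is no full view
};

// Simplified analogue of Underlying_Type: look through a private type to
// its full view when one is present, otherwise return the type itself.
inline const Entity *underlying_type(const Entity *e)
{
    return (e->is_private && e->full_view) ? e->full_view : e;
}

// The verbose form of the test, with an explicit private-type check.
inline bool is_array_verbose(const Entity *e)
{
    return e->is_array
           || (e->is_private && e->full_view && e->full_view->is_array);
}

// The compact form, going through underlying_type.
inline bool is_array_compact(const Entity *e)
{
    return underlying_type(e)->is_array;
}
```

In this model both forms agree for array types, private types whose full
view is an array, and non-array types.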

Tested on x86_64-suse-linux, applied on the mainline.


2014-05-18  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity): Use Underlying_Type in
lieu of more verbose construct.
* gcc-interface/trans.c (Call_to_gnu): Likewise.
(gnat_to_gnu): Likewise.  Remove obsolete code.


-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 210583)
+++ gcc-interface/decl.c	(working copy)
@@ -543,10 +543,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	   This is a workaround for major problems in protected type
 	   handling.  */
 	Entity_Id Scop = Scope (Scope (gnat_entity));
-	if ((Is_Protected_Type (Scop)
-		 || (Is_Private_Type (Scop)
-		 && Present (Full_View (Scop))
-		 && Is_Protected_Type (Full_View (Scop))))
+	if (Is_Protected_Type (Underlying_Type (Scop))
 		&& Present (Original_Record_Component (gnat_entity)))
 	  {
 		gnu_decl
@@ -870,9 +867,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	/* If this is an aliased object with an unconstrained nominal subtype,
 	   make a type that includes the template.  */
 	if (Is_Constr_Subt_For_UN_Aliased (Etype (gnat_entity))
-	&& (Is_Array_Type (Etype (gnat_entity))
-		|| (Is_Private_Type (Etype (gnat_entity))
-		&& Is_Array_Type (Full_View (Etype (gnat_entity)))))
+	&& Is_Array_Type (Underlying_Type (Etype (gnat_entity)))
 	&& !type_annotate_only)
 	  {
 	tree gnu_array
@@ -1383,9 +1378,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	   Note that we have to do that this late because of the couple of
 	   allocation adjustments that might be made just above.  */
 	if (Is_Constr_Subt_For_UN_Aliased (Etype (gnat_entity))
-	&& (Is_Array_Type (Etype (gnat_entity))
-		|| (Is_Private_Type (Etype (gnat_entity))
-		&& Is_Array_Type (Full_View (Etype (gnat_entity)))))
+	&& Is_Array_Type (Underlying_Type (Etype (gnat_entity)))
 	&& !type_annotate_only)
 	  {
 	tree gnu_array
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 210579)
+++ gcc-interface/trans.c	(working copy)
@@ -4269,9 +4269,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
 	  if (TREE_CODE (TREE_TYPE (gnu_actual)) == RECORD_TYPE
 		  && TYPE_CONTAINS_TEMPLATE_P (TREE_TYPE (gnu_actual))
 		  && Is_Constr_Subt_For_UN_Aliased (Etype (gnat_actual))
-		  && (Is_Array_Type (Etype (gnat_actual))
-		  || (Is_Private_Type (Etype (gnat_actual))
-			  && Is_Array_Type (Full_View (Etype (gnat_actual))))))
+		  && Is_Array_Type (Underlying_Type (Etype (gnat_actual))))
 		gnu_actual = convert (gnat_to_gnu_type (Etype (gnat_actual)),
   gnu_actual);
 	}
@@ -6192,8 +6190,7 @@ gnat_to_gnu (Node_Id gnat_node)
   /* These can either be operations on booleans or on modular types.
 	 Fall through for boolean types since that's the way GNU_CODES is
 	 set up.  */
-  if (IN (Ekind (Underlying_Type (Etype (gnat_node))),
-	  Modular_Integer_Kind))
+  if (Is_Modular_Integer_Type (Underlying_Type (Etype (gnat_node))))
 	{
 	  enum tree_code code
 	= (kind == N_Op_Or ? BIT_IOR_EXPR
@@ -6236,22 +6233,14 @@ gnat_to_gnu (Node_Id gnat_node)
 	gnu_lhs = maybe_vector_array (gnu_lhs);
 	gnu_rhs = maybe_vector_array (gnu_rhs);
 
-	/* If this is a comparison operator, convert any references to
-	   an unconstrained array value into a reference to the
-	   actual array.  */
+	/* If this is a comparison operator, convert any references to an
+	   unconstrained array value into a reference to the actual array.  */
 	if (TREE_CODE_CLASS (code) == tcc_comparison)
 	  {
 	gnu_lhs = maybe_unconstrained_array (gnu_lhs);
 	gnu_rhs = maybe_unconstrained_array (gnu_rhs);
 	  }
 
-	/* If the result type is a private type, its full view may be a
-	   numeric subtype. The representation we need is that of its base
-	   type, given that it is the result of an arithmetic operation.  */
-	else if (Is_Private_Type (Etype (gnat_node)))
-	  gnu_type = gnu_result_type
-	= get_unpadded_type (Base_Type (Full_View (Etype (gnat_node))));
-
 	/* If this is a shift whose count is not guaranteed to be correct,
 	   we need to adjust the shift count.  */
 	if (IN (kind, N_Op_Shift) && !Shift_Count_OK (gnat_node))
@@ -6361,9 +6350,7 @@ gnat_to_gnu (Node_Id gnat_node)
   /* This case can apply to a boolean or a modular type.
 	 Fall through for a boolean operand since GNU_CODES is set
 	 up to handle this.  */
-  if (Is_Modular_Integer_Type (Etype (gnat_node))
-	  || (Is_Private_Type (Etype (gnat_node))
-	  && Is_Modular_Integer_Type (Full_View (Etype (gnat_node)))))
+  if (Is_Modular_Integer_Type (Underlying_Type (Etype (gnat_node))))
 	{
 	  gnu_expr = gnat_to_gnu (Right_Opnd (gnat_node));
 	  gnu_result_type = get_unpadded_type

[C++ patch] Enable constructor decloning by default

2014-05-18 Thread Jan Hubicka
Hi,
this patch enables -fdeclone-ctor-dtor by default: I believe it is up to the
optimizers to decide when the actual worker body should be inlined into the
thunks.
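
As a source-level analogy only (the real transformation happens on
compiler-generated constructor variants, and all names below are
hypothetical), decloning amounts to several entry points forwarding to one
shared body instead of each carrying its own copy:

```cpp
#include <cassert>

struct Widget { int a; int b; };

// Single shared body (the "common implementation").
inline void widget_ctor_common(Widget *w) { w->a = 1; w->b = 2; }

// Two entry points standing in for the complete-object and base-object
// constructor variants; each one is just a thunk to the shared body.
inline void widget_ctor_complete(Widget *w) { widget_ctor_common(w); }
inline void widget_ctor_base(Widget *w) { widget_ctor_common(w); }
```

Whether the shared body is then inlined back into the thunks is left to the
optimizers, which is the point made above.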

Bootstrapped/regtested x86_64-linux, OK?

Honza

* c-family/c.opt: Enable decloning by default.
* c-family/c-opts.c: Do not enable decloning for -Os.
* doc/invoke.texi (-fdeclone-ctor-dtor): Update documentation.
Index: c-family/c.opt
===
--- c-family/c.opt  (revision 210521)
+++ c-family/c.opt  (working copy)
@@ -904,7 +904,7 @@ C++ ObjC++ Var(flag_deduce_init_list) In
 -fdeduce-init-list enable deduction of std::initializer_list for a template type parameter from a brace-enclosed initializer-list
 
 fdeclone-ctor-dtor
-C++ ObjC++ Var(flag_declone_ctor_dtor) Init(-1)
+C++ ObjC++ Var(flag_declone_ctor_dtor) Init(1)
 Factor complex constructors and destructors to favor space over speed
 
 fdefault-inline
Index: c-family/c-opts.c
===
--- c-family/c-opts.c   (revision 210521)
+++ c-family/c-opts.c   (working copy)
@@ -906,10 +906,6 @@ c_common_post_options (const char **pfil
   if (warn_implicit_function_declaration == -1)
 warn_implicit_function_declaration = flag_isoc99;
 
-  /* Declone C++ 'structors if -Os.  */
-  if (flag_declone_ctor_dtor == -1)
-flag_declone_ctor_dtor = optimize_size;
-
   if (cxx_dialect >= cxx11)
 {
   /* If we're allowing C++0x constructs, don't warn about C++98
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 210521)
+++ doc/invoke.texi (working copy)
@@ -7413,7 +7414,7 @@ clones, which means two copies of the fu
 base and complete variants are changed to be thunks that call a common
 implementation.
 
-Enabled by @option{-Os}.
+Enabled by default.
 
 @item -fdelete-null-pointer-checks
 @opindex fdelete-null-pointer-checks


Re: [RFC][PATCH][MIPS] Patch to enable LRA for MIPS backend

2014-05-18 Thread Richard Sandiford
Richard Sandiford  writes:
> I think a cleaner way of doing it would be to have helper functions
> that switch in and out of the eliminated form, storing the old form
> in fields of a new structure (either separate from address_info,
> or a local inheritance of it).  We probably also want to have arrays
> of address_infos, one for each operand, so that we don't analyse the
> same address too many times during the same insn.

In the end maintaining the array of address_infos seemed like too much
work.  It was hard to keep it up-to-date with various other changes
that can be made, including swapping commutative operands, to the point
where it wasn't obvious whether it was really an optimisation or not.

Here's a patch that does the first.  Tested on x86_64-linux-gnu.
This time I also compared the assembly output for gcc.dg, g++.dg
and gcc.c-torture at -O2 on:

  aarch64-linux-gnu arm-eabi mipsisa64-sde-elf s390x-linux-gnu
  powerpc64-linux-gnu x86_64-linux-gnu

s390x in particular is very good at exposing problems with this code.
(It caught bugs in the aborted attempt to keep an array of address_infos.)
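
The "switch in and out" idea is essentially a scope guard.  A minimal,
generic sketch follows; scoped_replace is a hypothetical class working on a
plain int, not the address_eliminator from the patch:

```cpp
#include <cassert>

// Hypothetical guard: overwrites *loc for the guard's lifetime and puts the
// saved value back on destruction, like switching into and out of the
// eliminated form of an address.
class scoped_replace {
public:
    scoped_replace(int *loc, int temporary)
        : m_loc(loc), m_saved(loc ? *loc : 0)
    {
        if (m_loc)
            *m_loc = temporary;
    }
    ~scoped_replace()
    {
        if (m_loc)
            *m_loc = m_saved;
    }
    scoped_replace(const scoped_replace &) = delete;
    scoped_replace &operator=(const scoped_replace &) = delete;

private:
    int *m_loc;
    int m_saved;
};

// Example check performed while the temporary value is in place.
inline bool is_even_when_replaced(int *loc, int temporary)
{
    scoped_replace guard(loc, temporary);
    return *loc % 2 == 0;
}
```

The caller sees the original value again as soon as the check returns, so
no explicit restore call can be forgotten on any return path.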

OK to install?

Thanks,
Richard


gcc/
* lra-constraints.c (valid_address_p): Move earlier in file.
(address_eliminator): New structure.
(satisfies_memory_constraint_p): New function.
(satisfies_address_constraint_p): Likewise.
(process_alt_operands, process_address, curr_insn_transform): Use them.

Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2014-05-17 17:49:19.071639652 +0100
+++ gcc/lra-constraints.c   2014-05-18 20:36:17.499181467 +0100
@@ -317,6 +317,118 @@ in_mem_p (int regno)
   return get_reg_class (regno) == NO_REGS;
 }
 
+/* Return 1 if ADDR is a valid memory address for mode MODE in address
+   space AS, and check that each pseudo has the proper kind of hard
+   reg. */
+static int
+valid_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
+rtx addr, addr_space_t as)
+{
+#ifdef GO_IF_LEGITIMATE_ADDRESS
+  lra_assert (ADDR_SPACE_GENERIC_P (as));
+  GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
+  return 0;
+
+ win:
+  return 1;
+#else
+  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
+#endif
+}
+
+namespace {
+  /* Temporarily eliminates registers in an address (for the lifetime of
+ the object).  */
+  class address_eliminator {
+  public:
+address_eliminator (struct address_info *ad);
+~address_eliminator ();
+
+  private:
+struct address_info *m_ad;
+rtx *m_base_loc;
+rtx m_base_reg;
+rtx *m_index_loc;
+rtx m_index_reg;
+  };
+}
+
+address_eliminator::address_eliminator (struct address_info *ad)
+  : m_ad (ad),
+m_base_loc (strip_subreg (ad->base_term)),
+m_base_reg (NULL_RTX),
+m_index_loc (strip_subreg (ad->index_term)),
+m_index_reg (NULL_RTX)
+{
+  if (m_base_loc != NULL)
+{
+  m_base_reg = *m_base_loc;
+  lra_eliminate_reg_if_possible (m_base_loc);
+  if (m_ad->base_term2 != NULL)
+   *m_ad->base_term2 = *m_ad->base_term;
+}
+  if (m_index_loc != NULL)
+{
+  m_index_reg = *m_index_loc;
+  lra_eliminate_reg_if_possible (m_index_loc);
+}
+}
+
+address_eliminator::~address_eliminator ()
+{
+  if (m_base_loc && *m_base_loc != m_base_reg)
+{
+  *m_base_loc = m_base_reg;
+  if (m_ad->base_term2 != NULL)
+   *m_ad->base_term2 = *m_ad->base_term;
+}
+  if (m_index_loc && *m_index_loc != m_index_reg)
+*m_index_loc = m_index_reg;
+}
+
+/* Return true if the eliminated form of AD is a legitimate target address.  */
+static bool
+valid_address_p (struct address_info *ad)
+{
+  address_eliminator eliminator (ad);
+  return valid_address_p (ad->mode, *ad->outer, ad->as);
+}
+
+#ifdef EXTRA_CONSTRAINT_STR
+/* Return true if the eliminated form of memory reference OP satisfies
+   extra address constraint CONSTRAINT.  */
+static bool
+satisfies_memory_constraint_p (rtx op, const char *constraint)
+{
+  struct address_info ad;
+
+  decompose_mem_address (&ad, op);
+  address_eliminator eliminator (&ad);
+  return EXTRA_CONSTRAINT_STR (op, *constraint, constraint);
+}
+
+/* Return true if the eliminated form of address AD satisfies extra
+   address constraint CONSTRAINT.  */
+static bool
+satisfies_address_constraint_p (struct address_info *ad,
+   const char *constraint)
+{
+  address_eliminator eliminator (ad);
+  return EXTRA_CONSTRAINT_STR (*ad->outer, *constraint, constraint);
+}
+
+/* Return true if the eliminated form of address OP satisfies extra
+   address constraint CONSTRAINT.  */
+static bool
+satisfies_address_constraint_p (rtx op, const char *constraint)
+{
+  struct address_info ad;
+
+  decompose_lea_address (&ad, &op);
+  return satisfies_address_constraint_p (&ad, constraint);
+}
+#endif
+
 /* Initiate equivalences for LRA.  As we keep original equivalences
before any elimination

Use resolution info to get rid of weak symbols

2014-05-18 Thread Jan Hubicka
Hi,
this patch makes GCC use resolution info to turn COMDAT and WEAK
symbols into regular symbols based on feedback given by the linker plugin.
If the resolution says that a given symbol is prevailing, it can be
turned into a normal symbol; when the resolution says that another
definition prevails over it, it can be turned into an external symbol.

Doing so lets the rest of the backend work more smoothly on them.
We previously did this transformation partly for functions; this patch
makes it happen for variables too and implements the second
part (turning the symbol into an external definition).
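
The decision described above can be summarised in a small sketch.  The
enumerators are named after the LDPR_* values used in the patch, but the
enums and the classify function are local, hypothetical stand-ins and not
GCC code:

```cpp
#include <cassert>

// Subset of linker-plugin resolutions (local stand-in enum).
enum class Resolution {
    Unknown, PrevailingDef, PrevailingDefIronly, PrevailingDefIronlyExp,
    PreemptedReg, PreemptedIr
};

enum class Action { KeepAsIs, MakeRegularDefinition, MakeExternal };

// Symbols that are not externally visible, not weak/COMDAT, or that have
// no resolution are left alone.  Otherwise a prevailing definition becomes
// a regular symbol and anything else becomes an external symbol.
inline Action classify(bool externally_visible, bool weak_or_comdat,
                       Resolution r)
{
    if (!externally_visible || !weak_or_comdat || r == Resolution::Unknown)
        return Action::KeepAsIs;
    bool define = (r == Resolution::PrevailingDef
                   || r == Resolution::PrevailingDefIronly
                   || r == Resolution::PrevailingDefIronlyExp);
    return define ? Action::MakeRegularDefinition : Action::MakeExternal;
}
```

For example, a visible weak symbol with a preempted resolution is
classified as external, while the same symbol with a prevailing resolution
becomes a regular definition.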

Bootstrapped/regtested x86_64-linux and tested with libreoffice
build.  Will commit it shortly.

* ipa.c (update_visibility_by_resolution_info): New function.
(function_and_variable_visibility): Use it.
Index: ipa.c
===
--- ipa.c   (revision 210522)
+++ ipa.c   (working copy)
@@ -978,6 +978,50 @@ can_replace_by_local_alias (symtab_node
  && !symtab_can_be_discarded (node));
 }
 
+/* In LTO we can remove COMDAT groups and weak symbols.
+   Either turn them into normal symbols or external symbol depending on 
+   resolution info.  */
+
+static void
+update_visibility_by_resolution_info (symtab_node * node)
+{
+  bool define;
+
+  if (!node->externally_visible
+  || (!DECL_WEAK (node->decl) && !DECL_ONE_ONLY (node->decl))
+  || node->resolution == LDPR_UNKNOWN)
+return;
+
+  define = (node->resolution == LDPR_PREVAILING_DEF_IRONLY
+   || node->resolution == LDPR_PREVAILING_DEF
+   || node->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP);
+
+  /* The linker decisions ought to agree in the whole group.  */
+  if (node->same_comdat_group)
+for (symtab_node *next = node->same_comdat_group;
+next != node; next = next->same_comdat_group)
+  gcc_assert (!node->externally_visible
+ || define == (next->resolution == LDPR_PREVAILING_DEF_IRONLY
+   || next->resolution == LDPR_PREVAILING_DEF
+   || next->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP));
+
+  if (node->same_comdat_group)
+for (symtab_node *next = node->same_comdat_group;
+next != node; next = next->same_comdat_group)
+  {
+   DECL_COMDAT_GROUP (next->decl) = NULL;
+   DECL_WEAK (next->decl) = false;
+   if (next->externally_visible
+   && !define)
+ DECL_EXTERNAL (next->decl) = true;
+  }
+  DECL_COMDAT_GROUP (node->decl) = NULL;
+  DECL_WEAK (node->decl) = false;
+  if (!define)
+DECL_EXTERNAL (node->decl) = true;
+  symtab_dissolve_same_comdat_group_list (node);
+}
+
 /* Mark visibility of all functions.
 
A local function is one whose calls can occur only in the current
@@ -1116,38 +1160,7 @@ function_and_variable_visibility (bool w
DECL_EXTERNAL (node->decl) = 1;
}
 
-  /* If whole comdat group is used only within LTO code, we can dissolve it,
-we handle the unification ourselves.
-We keep COMDAT and weak so visibility out of DSO does not change.
-Later we may bring the symbols static if they are not exported.  */
-  if (DECL_ONE_ONLY (node->decl)
- && (node->resolution == LDPR_PREVAILING_DEF_IRONLY
- || node->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP))
-   {
- symtab_node *next = node;
-
- if (node->same_comdat_group)
-   for (next = node->same_comdat_group;
-next != node;
-next = next->same_comdat_group)
- if (next->externally_visible
- && (next->resolution != LDPR_PREVAILING_DEF_IRONLY
- && next->resolution != LDPR_PREVAILING_DEF_IRONLY_EXP))
-   break;
- if (node == next)
-   {
- if (node->same_comdat_group)
-   for (next = node->same_comdat_group;
-next != node;
-next = next->same_comdat_group)
-   {
- DECL_COMDAT_GROUP (next->decl) = NULL;
- DECL_WEAK (next->decl) = false;
-   }
- DECL_COMDAT_GROUP (node->decl) = NULL;
- symtab_dissolve_same_comdat_group_list (node);
-   }
-   }
+  update_visibility_by_resolution_info (node);
 }
   FOR_EACH_DEFINED_FUNCTION (node)
 {
@@ -1234,6 +1247,7 @@ function_and_variable_visibility (bool w
symtab_dissolve_same_comdat_group_list (vnode);
  vnode->resolution = LDPR_PREVAILING_DEF_IRONLY;
}
+  update_visibility_by_resolution_info (vnode);
 }
 
   if (dump_file)


Re: Eliminate write-only variables

2014-05-18 Thread Sandra Loosemore

On 05/16/2014 11:25 AM, Jan Hubicka wrote:

> Hi,
> this patch adds code to remove write-only static variables.  While analyzing
> the effectiveness of LTO on Firefox, I noticed that a surprisingly large part
> of the binary's data segment is occupied by these.  Fixed thus.
> (This is quite a trivial transformation; I just never considered it important
> enough to work on.)
>
> The patch works by marking write-only variables in ipa.c (at the same time we
> discover the addressable flag) and also fixes handling of the flags for
> aliases.  References to variables are then removed by fixup_cfg.
> As a first cut, I only remove stores without side effects, so copies from
> volatile variables are preserved.  I also kill the LHS of function calls.
> I do not attempt to remove asm statements.  This means that some references
> may be left in the code, and therefore the IPA code does not eliminate the
> references after discovering a write-only variable; instead it relies
> on dead variable elimination to do the job later.  Consequently, not all
> write-only variables are removed with WHOPR when the references end up
> in different partitions.  That is something I can address incrementally.
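
For illustration, this is the kind of variable the optimization targets.
The names are hypothetical, and whether a given compiler version actually
removes the store is not something this sketch demonstrates:

```cpp
#include <cassert>

// Written on every call but never read anywhere: a candidate for removal
// as a write-only static variable.
static int last_seen;

static int record_and_double(int x)
{
    last_seen = x;   // plain store without side effects
    return 2 * x;    // observable behaviour does not depend on last_seen
}
```

Since nothing reads last_seen, dropping both the store and the variable
leaves the result of record_and_double unchanged.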



This patch seems quite similar in purpose to the remove_local_statics 
optimization that Mentor has proposed, although the implementation is 
quite different.  Here is the last version of our patch, prepared by 
Bernd Schmidt last year:


https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00317.html

I think we can drop our patch from our local tree now, but it includes a 
large number of test cases which I think are worth keeping on mainline. 
 A few of them fail with your implementation, though -- which might be 
genuine bugs, or just different limitations of the two approaches.  Can 
you take a look?


The failing tests are remove-local-statics-{4,5,7,12,14b}.c.

-Sandra



[Ada] Minor cleanup #2

2014-05-18 Thread Eric Botcazou
This exports End_Location from sinfo and uses it in gigi, instead of redoing 
the computation locally.

Tested on x86_64-suse-linux, applied on the mainline.


2014-05-18  Eric Botcazou  

* fe.h (Set_Present_Expr): Move around.
(End_Location): New macro.
* gcc-interface/trans.c (Case_Statement_to_gnu): Use End_Location.


-- 
Eric Botcazou
Index: fe.h
===
--- fe.h	(revision 210579)
+++ fe.h	(working copy)
@@ -56,8 +56,7 @@ extern char Fold_Lower[], Fold_Upper[];
 extern Boolean Debug_Flag_NN;
 
 /* einfo: We will be setting Esize for types, Component_Bit_Offset for fields,
-   Alignment for types and objects, Component_Size for array types, and
-   Present_Expr for N_Variant nodes.  */
+   Alignment for types and objects, Component_Size for array types.  */
 
 #define Set_Alignment			einfo__set_alignment
 #define Set_Component_Bit_Offset	einfo__set_component_bit_offset
@@ -65,7 +64,6 @@ extern Boolean Debug_Flag_NN;
 #define Set_Esize			einfo__set_esize
 #define Set_Mechanism			einfo__set_mechanism
 #define Set_RM_Size			einfo__set_rm_size
-#define Set_Present_Expr		sinfo__set_present_expr
 
 extern void Set_Alignment		(Entity_Id, Uint);
 extern void Set_Component_Bit_Offset	(Entity_Id, Uint);
@@ -73,7 +71,6 @@ extern void Set_Component_Size		(Entity_
 extern void Set_Esize			(Entity_Id, Uint);
 extern void Set_Mechanism		(Entity_Id, Mechanism_Type);
 extern void Set_RM_Size			(Entity_Id, Uint);
-extern void Set_Present_Expr		(Node_Id, Uint);
 
 #define Is_Entity_Name einfo__is_entity_name
 extern Boolean Is_Entity_Name		(Node_Id);
@@ -253,11 +250,15 @@ extern Node_Id First_Actual		(Node_Id);
 extern Node_Id Next_Actual		(Node_Id);
 extern Boolean Requires_Transient_Scope (Entity_Id);
 
-/* sinfo: These functions aren't in sinfo.h since we don't make the
-   setting functions, just the retrieval functions.  */
+/* sinfo: */
 
-#define Set_Has_No_Elaboration_Code sinfo__set_has_no_elaboration_code
+#define End_Location			sinfo__end_location
+#define Set_Has_No_Elaboration_Code 	sinfo__set_has_no_elaboration_code
+#define Set_Present_Expr		sinfo__set_present_expr
+
+extern Source_Ptr End_Location 		(Node_Id);
 extern void Set_Has_No_Elaboration_Code	(Node_Id, Boolean);
+extern void Set_Present_Expr		(Node_Id, Uint);
 
 /* targparm: */
 
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 210585)
+++ gcc-interface/trans.c	(working copy)
@@ -2384,8 +2384,7 @@ Case_Statement_to_gnu (Node_Id gnat_node
 
   /* We build a SWITCH_EXPR that contains the code with interspersed
  CASE_LABEL_EXPRs for each label.  */
-  if (!Sloc_to_locus (Sloc (gnat_node) + UI_To_Int (End_Span (gnat_node)),
-  &end_locus))
+  if (!Sloc_to_locus (End_Location (gnat_node), &end_locus))
 end_locus = input_location;
   gnu_label = create_artificial_label (end_locus);
   start_stmt_group ();


Re: [PATCH][MIPS] Implement O32 FPXX ABI (GCC)

2014-05-18 Thread Richard Sandiford
Matthew Fortune  writes:
> *) Dwarf debug for 64-bit values in floating point values for FPXX can't
>be strictly correct for both 32-bit and 64-bit registers but opts to
>describe one 64-bit register as that is what the FPXX ABI is emulating.
>I have not yet checked what exactly happens in GDB when confronted with
>this and 32-bit registers. This also impacts frame information described
>via mips_save_reg and mips_restore_reg. Advice on this would be
>appreciated.

I'm not sure what's best either.  Clearly it's something that needs
to be spelled out in the ABI, but I can imagine it would be dictated
by what consumers like the unwinder and gdb find easiest to handle.

> *) ISA_HAS_MXHC1 could be defined as true for all three O32 FP ABIs but
>I left out FP32 to maintain historic behaviour. It should be safe to
>Include it though. Thoughts?

Sounds like the right call to me FWIW.  Enabling it for FP32 is a separate
change really.

> *) Because GCC can be built to have mfpxx or mfp64 as the default option
>the ASM_SPEC has to handle these specially such that they are not
>passed in conjunction with -msingle-float. Depending on how all this
>option handling pans out then this may also need to account for
>msoft-float as well. It is an error to have -msoft-float and -mfp64 in
>the assembler.

The assembler and GCC shouldn't treat the options differently though.
Either it should be OK for both or neither.

> @@ -5141,7 +5141,7 @@ mips_get_arg_info (struct mips_arg_info *info, const 
> CUMULATIVE_ARGS *cum,
>|| SCALAR_FLOAT_TYPE_P (type)
>|| VECTOR_FLOAT_TYPE_P (type))
>&& (GET_MODE_CLASS (mode) == MODE_FLOAT
> -  || mode == V2SFmode)
> +  || (TARGET_PAIRED_SINGLE_FLOAT && mode == V2SFmode))
>&& GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE);
>break;

This looks odd.  We shouldn't have V2SF values if there's no ISA support
for them.

> @@ -5636,7 +5636,7 @@ mips_return_fpr_pair (enum machine_mode mode,
>  {
>int inc;
>  
> -  inc = (TARGET_NEWABI ? 2 : MAX_FPRS_PER_FMT);
> +  inc = ((TARGET_NEWABI || mips_abi == ABI_32) ? 2 : MAX_FPRS_PER_FMT);

Formatting nit: no extra brackets here.

> @@ -6508,13 +6508,27 @@ mips_output_64bit_xfer (char direction, unsigned int 
> gpreg, unsigned int fpreg)
>if (TARGET_64BIT)
>  fprintf (asm_out_file, "\tdm%cc1\t%s,%s\n", direction,
>reg_names[gpreg], reg_names[fpreg]);
> -  else if (TARGET_FLOAT64)
> +  else if (ISA_HAS_MXHC1)
>  {
>fprintf (asm_out_file, "\tm%cc1\t%s,%s\n", direction,
>  reg_names[gpreg + TARGET_BIG_ENDIAN], reg_names[fpreg]);
>fprintf (asm_out_file, "\tm%chc1\t%s,%s\n", direction,
>  reg_names[gpreg + TARGET_LITTLE_ENDIAN], reg_names[fpreg]);
>  }
> +  else if (TARGET_FLOATXX && direction == 't')
> +{
> +  /* Use the argument save area to move via memory.  */
> +  fprintf (asm_out_file, "\tsw\t%s,0($sp)\n", reg_names[gpreg]);
> +  fprintf (asm_out_file, "\tsw\t%s,4($sp)\n", reg_names[gpreg + 1]);
> +  fprintf (asm_out_file, "\tldc1\t%s,0($sp)\n", reg_names[fpreg]);
> +}
> +  else if (TARGET_FLOATXX && direction == 'f')
> +{
> +  /* Use the argument save area to move via memory.  */
> +  fprintf (asm_out_file, "\tsdc1\t%s,0($sp)\n", reg_names[fpreg]);
> +  fprintf (asm_out_file, "\tlw\t%s,0($sp)\n", reg_names[gpreg]);
> +  fprintf (asm_out_file, "\tlw\t%s,4($sp)\n", reg_names[gpreg + 1]);
> +}

The argument save area might be in use.  E.g. if an argument register
gets spilled, we'll generally try to spill it to the save area rather
than create a new stack slot for it.

This case should always be handled via SECONDARY_MEMORY_NEEDED.

> @@ -10499,7 +10544,7 @@ mips_for_each_saved_acc (HOST_WIDE_INT sp_offset, 
> mips_save_restore_fn fn)
>  static void
>  mips_save_reg (rtx reg, rtx mem)
>  {
> -  if (GET_MODE (reg) == DFmode && !TARGET_FLOAT64)
> +  if (GET_MODE (reg) == DFmode && !TARGET_FLOAT64 && !TARGET_FLOATXX)

TARGET_FLOAT32, and elsewhere.

> @@ -12202,7 +12247,8 @@ mips_secondary_reload_class (enum reg_class rclass,
>   return NO_REGS;
>  
>/* Otherwise, we need to reload through an integer register.  */
> -  return GR_REGS;
> +  if (regno >= 0)
> +return GR_REGS;
>  }
>if (FP_REG_P (regno))
>  return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS;

Why's this change needed?  Although I assume it's dead code if you tested
against LRA.

> @@ -12210,6 +12256,22 @@ mips_secondary_reload_class (enum reg_class rclass,
>return NO_REGS;
>  }
>  
> +/* Implement HARD_REGNO_CALLER_SAVE_MODE.
> +   Always save floating-point registers using their current mode to avoid
> +   using a 64-bit load/store when a 64-bit FP register only contains a 32-bit
> +   mode.  */
> +
> +enum machine_mode
> +mips_hard_regno_calle

Re: Eliminate write-only variables

2014-05-18 Thread Jan Hubicka
Sandra,
> This patch seems quite similar in purpose to the
> remove_local_statics optimization that Mentor has proposed, although
> the implementation is quite different.  Here is the last version of
> our patch, prepared by Bernd Schmidt last year:
> 
> https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00317.html

Thanks for the pointer, I did not notice this patch!
The approach is indeed very different.  So the patch works on a per-function
basis and cares only about local statics of functions that were not inlined?
> 
> I think we can drop our patch from our local tree now, but it
> includes a large number of test cases which I think are worth
> keeping on mainline.  A few of them fail with your implementation,
> though -- which might be genuine bugs, or just different limitations
> of the two approaches.  Can you take a look?
> 
> The failing tests are remove-local-statics-{4,5,7,12,14b}.c.

+/* Verify that we don't eliminate a global static variable.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "global_static" } } */
+
+static int global_static;
+
+int
+test1 (int x)
+{
+  global_static = x;
+
+  return global_static + x;
+}

here test1 optimizes into

  global_static=x;
  return x+x;

this makes global_static write only and thus it is correctly eliminated.
So I think this testcase is bogus.

+++ b/gcc/testsuite/gcc.dg/remove-local-statics-5.c
@@ -0,0 +1,24 @@
+/* Verify that we do not eliminate a static local variable whose uses
+   are dominated by a def when the function calls setjmp.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "thestatic" } } */
+
+#include <setjmp.h>
+
+int
+foo (int x)
+{
+  static int thestatic;
+  int retval;
+  jmp_buf env;
+
+  thestatic = x;
+
+  retval = thestatic + x;
+
+  setjmp (env);
+
+  return retval;
+}

I believe this is a similar case.  I do not see setjmp changing anything here,
since the local optimizers turn this into retval = x+x;
What was it intended to test?

--- /dev/null
+++ b/gcc/testsuite/gcc.dg/remove-local-statics-7.c
@@ -0,0 +1,19 @@
+/* Verify that we eliminate a static local variable where it is defined
+   along all paths leading to a use.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "thestatic" } } */
+
+int
+test1 (int x)
+{
+  static int thestatic;
+
+  if (x < 0)
+thestatic = x;
+  else
+thestatic = -x;
+
+  return thestatic + x;
+}

Here we get after early optimizations:

int
test1 (int x)
{
  static int thestatic;
  int thestatic.0_5;
  int thestatic.1_7;
  int _8;

  :
  if (x_2(D) < 0)
goto ;
  else
goto ;

  :
  thestatic = x_2(D);
  goto ;

  :
  thestatic.0_5 = -x_2(D);
  thestatic = thestatic.0_5;

  :
  thestatic.1_7 = thestatic;
  _8 = thestatic.1_7 + x_2(D);
  return _8;

}

and thus we still have a bogus read from thestatic.  Because my analysis works
at the IPA level, we won't benefit from the fact that dom2 eventually cleans it
up as:
int
test1 (int x)
{
  static int thestatic;
  int thestatic.0_5;
  int thestatic.1_7;
  int _8;
  int prephitmp_10;

  :
  if (x_2(D) < 0)
goto ;
  else
goto ;

  :
  thestatic = x_2(D);
  goto ;

  :
  thestatic.0_5 = -x_2(D);
  thestatic = thestatic.0_5;

  :
  # prephitmp_10 = PHI 
  thestatic.1_7 = prephitmp_10;
  _8 = thestatic.1_7 + x_2(D);
  return _8;

}

Richi, is there a way to teach early FRE to get this transformation?
I see it is a partial redundancy problem...

+/* Verify that we do not eliminate a static variable when it is declared
+   in a function that has nested functions.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "thestatic" } } */
+
+int test1 (int x)
+{
+  static int thestatic;
+
+  int nested_test1 (int x)
+  {
+return x + thestatic;
+  }
+
+  thestatic = x;
+
+  return thestatic + x + nested_test1 (x);
+}

Here we work hard enough to optimize test1 as:
int
test1 (int x)
{
  static int thestatic;
  int _4;
  int _5;

  :
  thestatic = x_2(D);
  _4 = x_2(D) + x_2(D);
  _5 = _4 + _4;
  return _5;

}

thus inlining nested_test1 during early optimization. This makes the removal 
valid.

+/* Verify that we do not eliminate a static local variable if the function
+   containing it is inlined.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "thestatic" } } */
+
+int
+test2 (int x)
+{
+  if (x < 0)
+return 0;
+  else
+return test1 (x - 1);
+}
+
+inline int
+test1 (int x)
+{
+  static int thestatic;
+  int y;
+
+  thestatic = x;
+
+  y = test2 (thestatic - 1);
+
+  return y + x;
+}

Here thestatic becomes write only during early optimization, so again we can 
correctly eliminate it.

Sandra,
do you think you can drop the testcases that are not valid and commit the valid
ones, minus remove-local-statics-7.c, for which we can file an enhancement
request?

For cases like local-statics-7 your approach can be "saved" by adding a simple
IPA analysis to look for static vars that

[Ada] Fix ICE on volatile unconstrained array parameter

2014-05-18 Thread Eric Botcazou
The compiler aborts on a subprogram which takes a parameter with a volatile 
unconstrained array type.  This has apparently never worked.

Tested on x86_64-suse-linux, applied on the mainline.


2014-05-18  Eric Botcazou  

* gcc-interface/decl.c (change_qualified_type): New static function.
(gnat_to_gnu_entity): Use it throughout to add qualifiers on types.
: Set TYPE_VOLATILE on the array type directly.
: Likewise.
Do not set flags on an UNCONSTRAINED_ARRAY_TYPE directly.
(gnat_to_gnu_component_type): Likewise.
(gnat_to_gnu_param): Likewise.


2014-05-18  Eric Botcazou  

* gnat.dg/volatile12.ad[sb]: New test.


-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 210585)
+++ gcc-interface/decl.c	(working copy)
@@ -145,6 +145,7 @@ static tree gnat_to_gnu_component_type (
 static tree gnat_to_gnu_param (Entity_Id, Mechanism_Type, Entity_Id, bool,
 			   bool *);
 static tree gnat_to_gnu_field (Entity_Id, tree, int, bool, bool);
+static tree change_qualified_type (tree, int);
 static bool same_discriminant_p (Entity_Id, Entity_Id);
 static bool array_type_has_nonaliased_component (tree, Entity_Id);
 static bool compile_time_known_address_p (Node_Id);
@@ -1047,9 +1048,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		   Note that we need to preserve the volatility of the renamed
 		   object through the indirection.  */
 		if (TREE_THIS_VOLATILE (gnu_expr) && !TYPE_VOLATILE (gnu_type))
-		  gnu_type = build_qualified_type (gnu_type,
-		   (TYPE_QUALS (gnu_type)
-		| TYPE_QUAL_VOLATILE));
+		  gnu_type
+		= change_qualified_type (gnu_type, TYPE_QUAL_VOLATILE);
 		gnu_type = build_reference_type (gnu_type);
 		inner_const_flag = TREE_READONLY (gnu_expr);
 		const_flag = true;
@@ -1107,9 +1107,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		 || imported_p
 		 || Present (Address_Clause (gnat_entity)
 	&& !TYPE_VOLATILE (gnu_type))
-	  gnu_type = build_qualified_type (gnu_type,
-	   (TYPE_QUALS (gnu_type)
-	| TYPE_QUAL_VOLATILE));
+	  gnu_type = change_qualified_type (gnu_type, TYPE_QUAL_VOLATILE);
 
 	/* If we are defining an aliased object whose nominal subtype is
 	   unconstrained, the object is a record that contains both the
@@ -1408,8 +1406,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  }
 
 	if (const_flag)
-	  gnu_type = build_qualified_type (gnu_type, (TYPE_QUALS (gnu_type)
-		  | TYPE_QUAL_CONST));
+	  gnu_type = change_qualified_type (gnu_type, TYPE_QUAL_CONST);
 
 	/* Convert the expression to the type of the object except in the
 	   case where the object's type is unconstrained or the object's type
@@ -2243,6 +2240,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  SET_TYPE_MODE (tem, BLKmode);
 	  }
 
+	TYPE_VOLATILE (tem) = Treat_As_Volatile (gnat_entity);
+
 	/* If an alignment is specified, use it if valid.  But ignore it
 	   for the original type of packed array types.  If the alignment
 	   was requested with an explicit alignment clause, state so.  */
@@ -2595,6 +2594,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		SET_TYPE_MODE (gnu_type, BLKmode);
 	}
 
+	  TYPE_VOLATILE (gnu_type) = Treat_As_Volatile (gnat_entity);
+
 	  /* Attach the TYPE_STUB_DECL in case we have a parallel type.  */
 	  TYPE_STUB_DECL (gnu_type)
 	= create_type_stub_decl (gnu_entity_name, gnu_type);
@@ -2725,9 +2726,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  process_attributes (&gnu_type, &attr_list, false, gnat_entity);
 	  if (Treat_As_Volatile (gnat_entity))
 		gnu_type
-		  = build_qualified_type (gnu_type,
-	  TYPE_QUALS (gnu_type)
-	  | TYPE_QUAL_VOLATILE);
+		  = change_qualified_type (gnu_type, TYPE_QUAL_VOLATILE);
 	  /* Make it artificial only if the base type was artificial too.
 		 That's sort of "morally" true and will make it possible for
 		 the debugger to look it up by name in DWARF, which is needed
@@ -3218,9 +3217,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	&& Is_By_Reference_Type (gnat_entity))
 	  SET_TYPE_MODE (gnu_type, BLKmode);
 
-	/* We used to remove the associations of the discriminants and _Parent
-	   for validity checking but we may need them if there's a Freeze_Node
-	   for a subtype used in this record.  */
 	TYPE_VOLATILE (gnu_type) = Treat_As_Volatile (gnat_entity);
 
 	/* Fill in locations of fields.  */
@@ -3917,9 +3913,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		&& TREE_CODE (gnu_desig_type) != UNCONSTRAINED_ARRAY_TYPE)
 	  {
 		gnu_desig_type
-		  = build_qualified_type
-		(gnu_desig_type,
-		 TYPE_QUALS (gnu_desig_type) | TYPE_QUAL_CONST);
+		  = change_qualified_type (gnu_desig_type, TYPE_QUAL_CONST);
 
 		/* Some extra processing is required if we are building a
 		   pointer to an incomplete type (in the GCC sense).  We might
@@ -4623,18 +4617,17 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	if (

[Ada] Fix -feliminate-unused-debug-types

2014-05-18 Thread Eric Botcazou
This was broken in Ada by recent callgraph/varpool changes.

Tested on x86_64-suse-linux, applied on the mainline and 4.9 branch.


2014-05-18  Eric Botcazou  

* utils.c (gnat_write_global_declarations): Adjust the flags put on
dummy_global.


-- 
Eric Botcazou

Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 210579)
+++ gcc-interface/utils.c	(working copy)
@@ -5756,9 +5756,10 @@ gnat_write_global_declarations (void)
   dummy_global
 	= build_decl (BUILTINS_LOCATION, VAR_DECL, get_identifier (label),
 		  void_type_node);
+  DECL_HARD_REGISTER (dummy_global) = 1;
   TREE_STATIC (dummy_global) = 1;
-  TREE_ASM_WRITTEN (dummy_global) = 1;
   node = varpool_node_for_decl (dummy_global);
+  node->definition = 1;
   node->force_output = 1;
 
   while (!types_used_by_cur_var_decl->is_empty ())


[Ada] Set function_start_locus in gigi

2014-05-18 Thread Eric Botcazou
gimple_expand_cfg contains these lines:

  /* Eventually, all FEs should explicitly set function_start_locus.  */
  if (cfun->function_start_locus == UNKNOWN_LOCATION)
   set_curr_insn_source_location
 (DECL_SOURCE_LOCATION (current_function_decl));
  else
   set_curr_insn_source_location (cfun->function_start_locus);

so it's time to do exactly that.

Tested on x86_64-suse-linux, applied on the mainline.


2014-05-18  Eric Botcazou  

* gcc-interface/trans.c (Subprogram_Body_to_gnu): Rework comment and
set function_start_locus.


-- 
Eric Botcazou

Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 210587)
+++ gcc-interface/trans.c	(working copy)
@@ -3574,6 +3574,7 @@ Subprogram_Body_to_gnu (Node_Id gnat_nod
   /* The entry in the CI_CO_LIST that represents a function return, if any.  */
   tree gnu_return_var_elmt = NULL_TREE;
   tree gnu_result;
+  location_t locus;
   struct language_function *gnu_subprog_language;
   vec *cache;
 
@@ -3610,14 +3611,15 @@ Subprogram_Body_to_gnu (Node_Id gnat_nod
   relayout_decl (gnu_result_decl);
 }
 
-  /* Set the line number in the decl to correspond to that of the body so that
- the line number notes are written correctly.  */
-  Sloc_to_locus (Sloc (gnat_node), &DECL_SOURCE_LOCATION (gnu_subprog_decl));
+  /* Set the line number in the decl to correspond to that of the body.  */
+  Sloc_to_locus (Sloc (gnat_node), &locus);
+  DECL_SOURCE_LOCATION (gnu_subprog_decl) = locus;
 
   /* Initialize the information structure for the function.  */
   allocate_struct_function (gnu_subprog_decl, false);
   gnu_subprog_language = ggc_cleared_alloc ();
   DECL_STRUCT_FUNCTION (gnu_subprog_decl)->language = gnu_subprog_language;
+  DECL_STRUCT_FUNCTION (gnu_subprog_decl)->function_start_locus = locus;
   set_cfun (NULL);
 
   begin_subprog_body (gnu_subprog_decl);


Replace REG_CROSSING_JUMP with an rtx flag

2014-05-18 Thread Richard Sandiford
find_reg_note showed up in the profile of a -O2 compile of an oldish
fold-const.ii.  The main hot call was:

  /* If we are partitioning hot/cold basic_blocks, we don't want to mess
 up jumps that cross between hot/cold sections.

 Basic block partitioning may result in some jumps that appear
 to be optimizable (or blocks that appear to be mergeable), but which
 really must be left untouched (they are required to make it safely
 across partition boundaries).  See the comments at the top of
 bb-reorder.c:partition_hot_cold_basic_blocks for complete
 details.  */

  if (first != EXIT_BLOCK_PTR_FOR_FN (cfun)
  && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
return changed;

from try_forward_edges.  I suppose the immediate problem was that it
doesn't check specifically for blocks that end in jumps.  Applying it
to things like calls could involve a lot of pointer chasing.

A 3-pointer reg note seems a bit heavyweight for a boolean anyway.
JUMP_INSNs have quite a few unused rtx header flags (including "jump",
ironically) so this patch records the information there instead.

This reduces the compile time by about 0.5%.  Not a huge amount,
but maybe it counts as a cleanup.
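
To show why a header flag is cheaper than a note for a boolean, here is a
simplified sketch.  The types and names below (struct note, struct insn,
has_note, crossing_jump_p) are hypothetical stand-ins and are not GCC's real
rtx definitions; the sketch only contrasts walking a linked list of notes with
testing a single bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for insns and their notes.  */
enum note_kind { NOTE_NORETURN, NOTE_SETJMP, NOTE_CROSSING_JUMP };

struct note
{
  enum note_kind kind;
  struct note *next;
};

struct insn
{
  unsigned int is_jump : 1;
  unsigned int crossing_jump : 1;  /* Flag-based representation.  */
  struct note *notes;              /* Note-based representation.  */
};

/* Note-based query: walks the whole list, one pointer dereference per
   note, even for insns that are not jumps at all.  */
bool
has_note (const struct insn *insn, enum note_kind kind)
{
  for (const struct note *n = insn->notes; n != NULL; n = n->next)
    if (n->kind == kind)
      return true;
  return false;
}

/* Flag-based query: a bit test on the insn header, no list walk.  */
bool
crossing_jump_p (const struct insn *insn)
{
  return insn->is_jump && insn->crossing_jump;
}
```

The flag version touches only the insn itself, which is the property the
paragraph above relies on.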

A neater fix would be to record this in the basic_block flags,
so that we don't even need to bring the jump into cache.
That would be tricky to keep up-to-date though.

Tested on x86_64-linux-gnu.  Also tested by building arc-elf and sh-elf
compilers and checking that there was no change in testsuite output for
gcc.dg, g++.dg and gcc-c-torture.  OK to install?

I wondered about converting REG_SETJMP too, but it probably only makes
sense to use up the flags for things that are known to benefit.

Thanks,
Richard


gcc/
* reg-notes.def (CROSSING_JUMP): Likewise.
* rtl.h (rtx_def): Update comment for jump flag.
(CROSSING_JUMP_P): Define.
* cfgcleanup.c (try_forward_edges, try_optimize_cfg): Use it instead
of a REG_CROSSING_JUMP note.
* cfghooks.c (tidy_fallthru_edges): Likewise.
* cfgrtl.c (fixup_partition_crossing, rtl_verify_edges): Likewise.
* emit-rtl.c (try_split): Likewise.
* haifa-sched.c (sched_create_recovery_edges): Likewise.
* ifcvt.c (find_if_case_1, find_if_case_2): Likewise.
* jump.c (redirect_jump_2): Likewise.
* reorg.c (follow_jumps, fill_slots_from_thread): Likewise.
(relax_delay_slots): Likewise.
* config/arc/arc.md (jump_i, cbranchsi4_scratch, *bbit): Likewise.
(bbit_di): Likewise.
* config/arc/arc.c (arc_reorg, arc_can_follow_jump): Likewise.
* config/sh/sh.md (jump_compact): Likewise.
* bb-reorder.c (rotate_loop): Likewise.
(pass_duplicate_computed_gotos::execute): Likewise.
(add_reg_crossing_jump_notes): Rename to...
(update_crossing_jump_flags): ...this.
(pass_partition_blocks::execute): Update accordingly.

Index: gcc/reg-notes.def
===
--- gcc/reg-notes.def   2014-05-17 13:44:06.056606500 +0100
+++ gcc/reg-notes.def   2014-05-17 17:13:37.685638401 +0100
@@ -188,11 +188,6 @@ REG_NOTE (NORETURN)
computed goto.  */
 REG_NOTE (NON_LOCAL_GOTO)
 
-/* Indicates that a jump crosses between hot and cold sections in a
-   (partitioned) assembly or .o file, and therefore should not be
-   reduced to a simpler jump by optimizations.  */
-REG_NOTE (CROSSING_JUMP)
-
 /* This kind of note is generated at each to `setjmp', and similar
functions that can return twice.  */
 REG_NOTE (SETJMP)
Index: gcc/rtl.h
===
--- gcc/rtl.h   2014-05-17 13:44:06.056606500 +0100
+++ gcc/rtl.h   2014-05-17 17:13:37.686638410 +0100
@@ -276,6 +276,7 @@ struct GTY((chain_next ("RTX_NEXT (&%h)"
 
   /* 1 in a MEM if we should keep the alias set for this mem unchanged
  when we access a component.
+ 1 in a JUMP_INSN if it is a crossing jump.
  1 in a CALL_INSN if it is a sibling call.
  1 in a SET that is for a return.
  In a CODE_LABEL, part of the two-bit alternate entry field.
@@ -942,6 +943,10 @@ #define RTX_FRAME_RELATED_P(RTX)   
\
 #define INSN_DELETED_P(RTX)\
   (RTL_INSN_CHAIN_FLAG_CHECK ("INSN_DELETED_P", (RTX))->volatil)
 
+/* 1 if JUMP RTX is a crossing jump.  */
+#define CROSSING_JUMP_P(RTX) \
+  (RTL_FLAG_CHECK1 ("CROSSING_JUMP_P", (RTX), JUMP_INSN)->jump)
+
 /* 1 if RTX is a call to a const function.  Built from ECF_CONST and
TREE_READONLY.  */
 #define RTL_CONST_CALL_P(RTX)  \
Index: gcc/cfgcleanup.c
===
--- gcc/cfgcleanup.c2014-05-17 13:44:06.056606500 +0100
+++ gcc/cfgcleanup.c2014-05-17 17:13:37.677638330 +0100
@@ -419,7 +419,7 @@ try_forward_e

Re: Replace REG_CROSSING_JUMP with an rtx flag

2014-05-18 Thread Eric Botcazou
> A 3-pointer reg note seems a bit heavyweight for a boolean anyway.
> JUMP_INSNs have a quite a few unused rtx header flags (including "jump",
> ironically) so this patch records the information there instead.
> 
> This reduces the compile time by about ~0.5%.  Not a huge amount,
> but maybe it counts as a cleanup.

Good catch!

>   * reg-notes.def (CROSSING_JUMP): Likewise.
>   * rtl.h (rtx_def): Update comment for jump flag.
>   (CROSSING_JUMP_P): Define.
>   * cfgcleanup.c (try_forward_edges, try_optimize_cfg): Use it instead
>   of a REG_CROSSING_JUMP note.
>   * cfghooks.c (tidy_fallthru_edges): Likewise.
>   * cfgrtl.c (fixup_partition_crossing, rtl_verify_edges): Likewise.
>   * emit-rtl.c (try_split): Likewise.
>   * haifa-sched.c (sched_create_recovery_edges): Likewise.
>   * ifcvt.c (find_if_case_1, find_if_case_2): Likewise.
>   * jump.c (redirect_jump_2): Likewise.
>   * reorg.c (follow_jumps, fill_slots_from_thread): Likewise.
>   (relax_delay_slots): Likewise.
>   * config/arc/arc.md (jump_i, cbranchsi4_scratch, *bbit): Likewise.
>   (bbit_di): Likewise.
>   * config/arc/arc.c (arc_reorg, arc_can_follow_jump): Likewise.
>   * config/sh/sh.md (jump_compact): Likewise.
>   * bb-reorder.c (rotate_loop): Likewise.
>   (pass_duplicate_computed_gotos::execute): Likewise.
>   (add_reg_crossing_jump_notes): Rename to...
>   (update_crossing_jump_flags): ...this.
>   (pass_partition_blocks::execute): Update accordingly.

OK, thanks.

-- 
Eric Botcazou


[PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion

2014-05-18 Thread Aurelien Jarno
On ARM soft-float, the float to double conversion doesn't convert an sNaN
to a qNaN as IEEE Std 754 mandates:

"Under default exception handling, any operation signaling an invalid
operation exception and for which a floating-point result is to be
delivered shall deliver a quiet NaN."

Given that the soft-float ARM code ignores exceptions and always provides a
result, a float to double conversion of a signaling NaN should return a
quiet NaN. Fix this in extendsfdf2.
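
For readers who do not read ARM assembly, the intended behaviour can be
sketched in portable C on raw bit patterns.  This is an illustration under my
own assumptions, not the libgcc routine: the function name extend_f32_bits is
hypothetical, and the code only mirrors the rule that a NaN input must come
out with the quiet bit (the top mantissa bit) set.

```c
#include <stdint.h>

/* Widen an IEEE 754 binary32 bit pattern to a binary64 bit pattern.
   Hypothetical helper, for illustration only.  */
uint64_t
extend_f32_bits (uint32_t f)
{
  uint64_t sign = (uint64_t) (f >> 31) << 63;
  uint32_t exp = (f >> 23) & 0xff;
  uint64_t frac = f & 0x7fffff;

  if (exp == 0xff)
    {
      /* Inf or NaN: all-ones exponent, mantissa moved to the top bits.  */
      uint64_t r = sign | ((uint64_t) 0x7ff << 52) | (frac << 29);
      if (frac != 0)
        r |= (uint64_t) 1 << 51;   /* NaN: force the quiet bit.  */
      return r;
    }

  if (exp == 0)
    {
      if (frac == 0)
        return sign;               /* Signed zero.  */

      /* Denormal input: normalize it, since binary64 can represent
         every binary32 denormal as a normal number.  */
      int e = 1 - 127 + 1023;
      while ((frac & 0x800000) == 0)
        {
          frac <<= 1;
          e--;
        }
      frac &= 0x7fffff;            /* Drop the now-implicit leading 1.  */
      return sign | ((uint64_t) e << 52) | (frac << 29);
    }

  /* Normal number: rebias the exponent and widen the mantissa.  */
  return sign | ((uint64_t) (exp - 127 + 1023) << 52) | (frac << 29);
}
```

In this sketch a signaling NaN such as 0x7fa00000 keeps its payload bits and
gains the quiet bit, while infinities and ordinary numbers pass through
unchanged in value.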


2014-05-18  Aurelien Jarno  
   
PR target/61219
* config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.


Index: libgcc/config/arm/ieee754-df.S
===
--- libgcc/config/arm/ieee754-df.S  (revision 210588)
+++ libgcc/config/arm/ieee754-df.S  (working copy)
@@ -473,11 +473,15 @@
eorne   xh, xh, #0x3800 @ fixup exponent otherwise.
RETc(ne)@ and return it.
 
-   teq r2, #0  @ if actually 0
-   do_it   ne, e
-   teqne   r3, #0xff00 @ or INF or NAN
+   bicsr2, r2, #0xff00 @ isolate mantissa
+   do_it   eq  @ if 0, that is ZERO or INF,
RETc(eq)@ we are done already.
 
+   teq r3, #0xff00 @ check for NAN
+   do_it   eq, t
+   orreq   xh, xh, #0x0008 @ change to quiet NAN
+   RETc(eq)@ and return it.
+
@ value was denormalized.  We can normalize it now.
do_push {r4, r5, lr}
mov r4, #0x380  @ setup corresponding exponent

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net


Re: Eliminate write-only variables

2014-05-18 Thread Martin Jambor
Hi,



On Fri, May 16, 2014 at 07:25:59PM +0200, Jan Hubicka wrote:
>

...

> 
>   * varpool.c (dump_varpool_node): Dump write-only flag.
>   * lto-cgraph.c (lto_output_varpool_node, input_varpool_node): Stream
>   write-only flag.
>   * tree-cfg.c (execute_fixup_cfg): Remove statements setting write-only 
> variables.
> 
> 
>   * gcc.c-torture/execute/20101011-1.c: Update testcase.
>   * gcc.dg/ira-shrinkwrap-prep-1.c: Update testcase.
>   * gcc.dg/tree-ssa/writeonly.c: New testcase.
>   * gcc.dg/tree-ssa/ssa-dse-6.c: Update testcase.
>   * gcc.dg/tree-ssa/pr21559.c: Update testcase.
>   * gcc.dg/debug/pr35154.c: Update testcase.
>   * gcc.target/i386/vectorize1.c: Update testcase.
>   * ipa.c (process_references): New function.
>   (set_readonly_bit): New function.
>   (set_writeonly_bit): New function.
>   (clear_addressable_bit): New function.
>   (ipa_discover_readonly_nonaddressable_var): Mark write only variables; 
> fix
>   handling of aliases.
>   * cgraph.h (struct varpool_node): Add writeonly flag.
> 

...

> Index: ipa.c
> ===
> --- ipa.c (revision 210514)
> +++ ipa.c (working copy)
> @@ -640,43 +711,40 @@ ipa_discover_readonly_nonaddressable_var
>if (dump_file)
>  fprintf (dump_file, "Clearing variable flags:");
>FOR_EACH_VARIABLE (vnode)
> -if (vnode->definition && varpool_all_refs_explicit_p (vnode)
> +if (!vnode->alias
>   && (TREE_ADDRESSABLE (vnode->decl)
> + || !vnode->writeonly
>   || !TREE_READONLY (vnode->decl)))
>{
>   bool written = false;
>   bool address_taken = false;
> - int i;
> -struct ipa_ref *ref;
> -for (i = 0; ipa_ref_list_referring_iterate (&vnode->ref_list,
> -i, ref)
> - && (!written || !address_taken); i++)
> -   switch (ref->use)
> - {
> - case IPA_REF_ADDR:
> -   address_taken = true;
> -   break;
> - case IPA_REF_LOAD:
> -   break;
> - case IPA_REF_STORE:
> -   written = true;
> -   break;
> - }
> - if (TREE_ADDRESSABLE (vnode->decl) && !address_taken)
> + bool read = false;
> + bool explicit_refs = true;
> +
> + process_references (vnode, &written, &address_taken, &read, 
> &explicit_refs);
> + if (!explicit_refs)
> +   continue;
> + if (!address_taken)
> {
> - if (dump_file)
> + if (TREE_ADDRESSABLE (vnode->decl) && dump_file)
> fprintf (dump_file, " %s (addressable)", vnode->name ());

I know it is technically not a part of the patch... but surely this is
supposed to dump "not addressable"; as written it might be quite confusing,
so if you are already changing this, correcting the dump would be
great.

Martin

> - TREE_ADDRESSABLE (vnode->decl) = 0;
> + varpool_for_node_and_aliases (vnode, clear_addressable_bit, NULL, 
> true);
> }
> - if (!TREE_READONLY (vnode->decl) && !address_taken && !written
> + if (!address_taken && !written
>   /* Making variable in explicit section readonly can cause section
>  type conflict. 
>  See e.g. gcc.c-torture/compile/pr23237.c */
>   && DECL_SECTION_NAME (vnode->decl) == NULL)
> {
> - if (dump_file)
> + if (!TREE_READONLY (vnode->decl) && dump_file)
> fprintf (dump_file, " %s (read-only)", vnode->name ());
> - TREE_READONLY (vnode->decl) = 1;
> + varpool_for_node_and_aliases (vnode, set_readonly_bit, NULL, true);
> +   }
> + if (!vnode->writeonly && !read && !address_taken)
> +   {
> + if (dump_file)
> +   fprintf (dump_file, " %s (write-only)", vnode->name ());
> + varpool_for_node_and_aliases (vnode, set_writeonly_bit, NULL, true);
> }
>}
>if (dump_file)


Fix remove_unreachable_nodes wrt comdat locals

2014-05-18 Thread Jan Hubicka
Hi,
this patch fixes an ICE seen when compiling libreoffice with LTO
on the 4.9 release tree.  The problem is that we now use comdat locals for
decloned constructors, and symtab_remove_unreachable_nodes sometimes
removes their bodies but keeps their nodes around.

In this case the nodes need to be brought out of their comdat groups, or
LTO streaming will mess up the linked list holding them.
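
As a rough sketch of what taking a node out of its group involves, assuming
the group is kept as a circular singly linked list: the type and function
names below (struct sym, same_group, remove_from_group) are hypothetical
simplifications, not the real symtab_node API.

```c
#include <stddef.h>

/* Hypothetical, simplified symbol node.  Members of one group form a
   circular list through same_group; a symbol outside any group has
   same_group == NULL.  */
struct sym
{
  const char *name;
  struct sym *same_group;
};

/* Unlink NODE from its group, keeping the remaining list circular.  */
void
remove_from_group (struct sym *node)
{
  if (node->same_group == NULL)
    return;

  /* Walk around the circle to find NODE's predecessor.  */
  struct sym *prev = node->same_group;
  while (prev->same_group != node)
    prev = prev->same_group;

  if (node->same_group == prev)
    prev->same_group = NULL;      /* Two members: the other is now alone.  */
  else
    prev->same_group = node->same_group;
  node->same_group = NULL;
}
```

If a node whose body was removed stayed linked like this, anything walking
the circle would still visit it, which is the kind of inconsistency the
paragraph above describes.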

Bootstrapped/regtested x86_64-linux and tested with a libreoffice LTO build;
I will commit it to mainline and the release branch shortly.

Honza

* ipa.c (symtab_remove_unreachable_nodes): Remove
symbol from comdat group if its body was eliminated.
(comdat_can_be_unshared_p_1): Static symbols can always
be privatized.
* symtab.c (symtab_remove_from_same_comdat_group): Break out
from ...
(symtab_unregister_node): ... this one.
(verify_symtab_base): More strict checking of comdats.
* cgraph.h (symtab_remove_from_same_comdat_group): Declare.
Index: ipa.c
===
--- ipa.c   (revision 210563)
+++ ipa.c   (working copy)
@@ -517,6 +517,7 @@ symtab_remove_unreachable_nodes (bool be
  if (!node->in_other_partition)
node->local.local = false;
  cgraph_node_remove_callees (node);
+ symtab_remove_from_same_comdat_group (node);
  ipa_remove_all_references (&node->ref_list);
  changed = true;
}
@@ -572,6 +573,8 @@ symtab_remove_unreachable_nodes (bool be
  vnode->analyzed = false;
  vnode->aux = NULL;
 
+ symtab_remove_from_same_comdat_group (vnode);
+
  /* Keep body if it may be useful for constant folding.  */
  if ((init = ctor_for_folding (vnode->decl)) == error_mark_node)
varpool_remove_initializer (vnode);
@@ -708,6 +711,8 @@ address_taken_from_non_vtable_p (symtab_
 static bool
 comdat_can_be_unshared_p_1 (symtab_node *node)
 {
+  if (!node->externally_visible)
+return true;
   /* When address is taken, we don't know if equality comparison won't
  break eventually. Exception are virutal functions, C++
  constructors/destructors and vtables, where this is not possible by
Index: symtab.c
===
--- symtab.c(revision 210563)
+++ symtab.c(working copy)
@@ -323,16 +323,11 @@ symtab_insert_node_to_hashtable (symtab_
   *slot = node;
 }
 
-/* Remove node from symbol table.  This function is not used directly, but via
-   cgraph/varpool node removal routines.  */
+/* Remove NODE from same comdat group.   */
 
 void
-symtab_unregister_node (symtab_node *node)
+symtab_remove_from_same_comdat_group (symtab_node *node)
 {
-  void **slot;
-  ipa_remove_all_references (&node->ref_list);
-  ipa_remove_all_referring (&node->ref_list);
-
   if (node->same_comdat_group)
 {
   symtab_node *prev;
@@ -346,6 +341,19 @@ symtab_unregister_node (symtab_node *nod
prev->same_comdat_group = node->same_comdat_group;
   node->same_comdat_group = NULL;
 }
+}
+
+/* Remove node from symbol table.  This function is not used directly, but via
+   cgraph/varpool node removal routines.  */
+
+void
+symtab_unregister_node (symtab_node *node)
+{
+  void **slot;
+  ipa_remove_all_references (&node->ref_list);
+  ipa_remove_all_referring (&node->ref_list);
+
+  symtab_remove_from_same_comdat_group (node);
 
   if (node->previous)
 node->previous->next = node->next;
@@ -829,6 +837,16 @@ verify_symtab_base (symtab_node *node)
  error ("non-DECL_ONE_ONLY node in a same_comdat_group list");
  error_found = true;
}
+  if (DECL_COMDAT_GROUP (n->decl) != DECL_COMDAT_GROUP (node->same_comdat_group->decl))
+   {
+ error ("same_comdat_group list across different groups");
+ error_found = true;
+   }
+  if (!n->definition)
+   {
+ error ("Node has same_comdat_group but it is not a definition");
+ error_found = true;
+   }
   if (n->type != node->type)
{
  error ("mixing different types of symbol in same comdat groups is not supported");
Index: cgraph.h
===
--- cgraph.h(revision 210563)
+++ cgraph.h(working copy)
@@ -723,6 +723,7 @@ enum symbol_partitioning_class
 /* In symtab.c  */
 void symtab_register_node (symtab_node *);
 void symtab_unregister_node (symtab_node *);
+void symtab_remove_from_same_comdat_group (symtab_node *);
 void symtab_remove_node (symtab_node *);
 symtab_node *symtab_get_node (const_tree);
 symtab_node *symtab_node_for_asm (const_tree asmname);
Index: gimple-fold.c
===
--- gimple-fold.c   (revision 210563)
+++ gimple-fold.c   (working copy)
@@ -105,7 +105,9 @@ can_refer_decl_in_current_unit_p (tree d
  external var.  */
   if (!from_decl
   || 

Re: add dbgcnt and opt-info support for devirtualization

2014-05-18 Thread Xinliang David Li
There is no test regression. Ok with this patch?

David

On Fri, May 16, 2014 at 2:19 PM, Xinliang David Li  wrote:
> Modified the patch according to yours and Richard's feedback. PTAL.
>
> thanks,
>
> David
>
> On Fri, May 16, 2014 at 9:03 AM, Jan Hubicka  wrote:
>>> Hi, debugging runtime bugs due to devirtualization can be hard for
>>> very large C++ programs with complicated class hierarchies. This patch
>>> adds support for reporting this high-level transformation via
>>> -fopt-info (not hidden inside a dump file) and the ability to do a binary
>>> search with a cutoff.
>>>
>>> Ok for trunk after build and test?
>>
>> Seems reasonable to me.
>>>
>>> thanks,
>>>
>>> David
>>
>>> Index: ChangeLog
>>> ===
>>> --- ChangeLog (revision 210479)
>>> +++ ChangeLog (working copy)
>>> @@ -1,3 +1,18 @@
>>> +2014-05-15  Xinliang David Li  
>>> +
>>> + * cgraphunit.c (walk_polymorphic_call_targets): Add
>>> + dbgcnt and fopt-info support.
>>> + 2014-05-15  Xinliang David Li  
>>> +
>>> + * cgraphunit.c (walk_polymorphic_call_targets): Add
>>> + dbgcnt and fopt-info support.
>>> + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto.
>>> + * ipa-devirt.c (ipa_devirt): Ditto.
>>> + * ipa.c (walk_polymorphic_call_targets): Ditto.
>>> + * gimple-fold.c (fold_gimple_assign): Ditto.
>>> + (gimple_fold_call): Ditto.
>>> + * dbgcnt.def: New counter.
>>> +
>>>  2014-05-15  Martin Jambor  
>>>
>>>   PR ipa/61085
>>> Index: ipa-prop.c
>>> ===
>>> --- ipa-prop.c(revision 210479)
>>> +++ ipa-prop.c(working copy)
>>> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.
>>>  #include "ipa-utils.h"
>>>  #include "stringpool.h"
>>>  #include "tree-ssanames.h"
>>> +#include "dbgcnt.h"
>>>
>>>  /* Intermediate information about a parameter that is only useful during 
>>> the
>>> run of ipa_analyze_node and is not kept afterwards.  */
>>> @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c
>>>   fprintf (dump_file, "ipa-prop: Discovered direct call to 
>>> non-function"
>>>   " in %s/%i, making it unreachable.\n",
>>>ie->caller->name (), ie->caller->order);
>>> +  else if (dump_enabled_p ())
>>> + {
>>> +   location_t loc = gimple_location (ie->call_stmt);
>>> +   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
>>> +"Discovered direct call to non-function in %s, 
>>> "
>>> +"making it unreachable\n", ie->caller->name 
>>> ());
>>
>> Perhaps "turning it to __builtin_unreachable call" and similarly in the other cases
>> where we introduce __builtin_unreachable? I think that could be easier for the user to
>> work out.
>>
>> What kind of problems in devirtualization are you seeing?
>>
>>
>> Honza
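
The "binary search with cutoff" idea mentioned above can be sketched as follows.
This is a toy model under stated assumptions, not GCC's dbgcnt implementation:
toy_dbg_cnt, toy_build_passes and toy_bisect are hypothetical names, and the
"build" is simulated by a loop in which one transformation is known to be bad.
In a real session the limit would be passed to the compiler with -fdbg-cnt and
each probe would be a full rebuild and test run.

```c
#include <stdbool.h>

/* Hypothetical counter state for the sketch.  */
static unsigned toy_count;
static unsigned toy_limit;

/* Return true while fewer than toy_limit transformations were allowed.  */
static bool
toy_dbg_cnt (void)
{
  return toy_count++ < toy_limit;
}

/* Simulated build: allow at most LIMIT of N_SITES transformations and
   report whether the result still works.  Transformation number BAD
   (0-based) is the one that miscompiles.  */
static bool
toy_build_passes (unsigned limit, unsigned n_sites, unsigned bad)
{
  bool ok = true;

  toy_count = 0;
  toy_limit = limit;
  for (unsigned site = 0; site < n_sites; site++)
    if (toy_dbg_cnt () && site == bad)
      ok = false;
  return ok;
}

/* Binary search for the smallest limit that breaks the build; the
   culprit is transformation number (result - 1).  Returns 0 when even
   the full limit passes.  */
static unsigned
toy_bisect (unsigned n_sites, unsigned bad)
{
  unsigned lo = 0;        /* A limit of 0 always passes.  */
  unsigned hi = n_sites;  /* Checked below to fail.  */

  if (toy_build_passes (hi, n_sites, bad))
    return 0;
  while (hi - lo > 1)
    {
      unsigned mid = lo + (hi - lo) / 2;
      if (toy_build_passes (mid, n_sites, bad))
        lo = mid;
      else
        hi = mid;
    }
  return hi;
}
```

With ten candidate sites the search needs at most four probes after the initial
one, which is what makes the cutoff practical for very large programs.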


Re: Add a new test

2014-05-18 Thread Xinliang David Li
Ok to check in the test?

David

On Fri, May 16, 2014 at 4:58 PM, Xinliang David Li  wrote:
> This test makes sure the compiler does not wrongly devirtualize virtual
> calls into __cxa_pure_virtual or __builtin_unreachable.
>
> Ok to checkin?
>
> David


Re: Add a new test

2014-05-18 Thread Jan Hubicka
> Ok to check in the test?
OK,
Honza
> 
> David
> 
> On Fri, May 16, 2014 at 4:58 PM, Xinliang David Li  wrote:
> > This test makes sure the compiler does not wrongly devirtualize virtual
> > calls into __cxa_pure_virtual or __builtin_unreachable.
> >
> > Ok to checkin?
> >
> > David


Re: Eliminate write-only variables

2014-05-18 Thread Jan Hubicka
> > +   if (!address_taken)
> >   {
> > -   if (dump_file)
> > +   if (TREE_ADDRESSABLE (vnode->decl) && dump_file)
> >   fprintf (dump_file, " %s (addressable)", vnode->name ());
> 
> I know it is technically not a part of the patch... but surely this is
> supposed to dump not addressable and might be quite a bit confusing,
> so if you are already changing this, correcting the dump would be
> great.

Yep, the original logic was that the variables appear in a list of flags removed,
so we are removing the addressable flag.  The other two flags do not follow this
practice.  I plan to clean up this code in general (it has gathered quite some
clutter), so I will look into it next and make the dumps more readable.

Honza
> 
> Martin
> 
> > -   TREE_ADDRESSABLE (vnode->decl) = 0;
> > +   varpool_for_node_and_aliases (vnode, clear_addressable_bit, NULL, 
> > true);
> >   }
> > -   if (!TREE_READONLY (vnode->decl) && !address_taken && !written
> > +   if (!address_taken && !written
> > /* Making variable in explicit section readonly can cause section
> >type conflict. 
> >See e.g. gcc.c-torture/compile/pr23237.c */
> > && DECL_SECTION_NAME (vnode->decl) == NULL)
> >   {
> > -   if (dump_file)
> > +   if (!TREE_READONLY (vnode->decl) && dump_file)
> >   fprintf (dump_file, " %s (read-only)", vnode->name ());
> > -   TREE_READONLY (vnode->decl) = 1;
> > +   varpool_for_node_and_aliases (vnode, set_readonly_bit, NULL, true);
> > + }
> > +   if (!vnode->writeonly && !read && !address_taken)
> > + {
> > +   if (dump_file)
> > + fprintf (dump_file, " %s (write-only)", vnode->name ());
> > +   varpool_for_node_and_aliases (vnode, set_writeonly_bit, NULL, true);
> >   }
> >}
> >if (dump_file)


Localize symbols used only from comdat groups

2014-05-18 Thread Jan Hubicka
Hi,
this patch adds a simple IPA pass that brings symbols used only from
comdat groups into the groups.  This prevents dead code in cases
where the comdat group is replaced by a copy from a different unit.

The patch saves about 0.5% of the libreoffice binary and about 1%
of the firefox binary with section GC disabled.

One limitation of the pass is that it won't privatize data used by a function
or vice versa, as doing so would probably require inventing a new comdat group
for the data and turning the symbols into hidden symbols.  That may make sense
to implement as a followup (in a way we already do so for string literals).

Bootstrapped/regtested x86_64-linux, will commit it after some further
testing.

Honza

* tree-pass.h (make_pass_ipa_comdats): New pass.
* timevar.def (TV_IPA_COMDATS): New timevar.
* passes.def (pass_ipa_comdats): Add.
* Makefile.in (OBJS): Add ipa-comdats.o
* ipa-comdats.c: New file.

* g++.dg/ipa/comdat.C: New file.
Index: tree-pass.h
===
--- tree-pass.h (revision 210521)
+++ tree-pass.h (working copy)
@@ -472,6 +472,7 @@ extern simple_ipa_opt_pass *make_pass_ip
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_comdats (gcc::context *ctxt);
 
 extern gimple_opt_pass *make_pass_cleanup_cfg_post_optimizing (gcc::context
   *ctxt);
Index: timevar.def
===
--- timevar.def (revision 210521)
+++ timevar.def (working copy)
@@ -71,6 +71,7 @@ DEFTIMEVAR (TV_IPA_DEVIRT  , "ipa de
 DEFTIMEVAR (TV_IPA_CONSTANT_PROP , "ipa cp")
 DEFTIMEVAR (TV_IPA_INLINING  , "ipa inlining heuristics")
 DEFTIMEVAR (TV_IPA_FNSPLIT   , "ipa function splitting")
+DEFTIMEVAR (TV_IPA_COMDATS  , "ipa comdats")
 DEFTIMEVAR (TV_IPA_OPT  , "ipa various optimizations")
 DEFTIMEVAR (TV_IPA_LTO_GIMPLE_IN , "ipa lto gimple in")
 DEFTIMEVAR (TV_IPA_LTO_GIMPLE_OUT, "ipa lto gimple out")
Index: passes.def
===
--- passes.def  (revision 210521)
+++ passes.def  (working copy)
@@ -110,6 +110,10 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
+  /* Comdat privatization come last, as direct references to comdat local
+ symbols are not allowed outside of the comdat group.  Privatizing early
+ would result in missed optimizations due to this restriction.  */
+  NEXT_PASS (pass_ipa_comdats);
   TERMINATE_PASS_LIST ()
 
   /* Simple IPA passes executed after the regular passes.  In WHOPR mode the
Index: Makefile.in
===
--- Makefile.in (revision 210521)
+++ Makefile.in (working copy)
@@ -1269,6 +1269,7 @@ OBJS = \
ipa-devirt.o \
ipa-split.o \
ipa-inline.o \
+   ipa-comdats.o \
ipa-inline-analysis.o \
ipa-inline-transform.o \
ipa-profile.o \
Index: ipa-comdats.c
===
--- ipa-comdats.c   (revision 0)
+++ ipa-comdats.c   (revision 0)
@@ -0,0 +1,387 @@
+/* Localize comdats.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This is very simple pass that looks for static symbols that are used
+   exlusively by symbol within one comdat group.  In this case it makes
+   sense to bring the symbol itself into the group to avoid dead code
+   that would arrise when the comdat group from current unit is replaced
+   by a different copy.  Consider for example:
+
+static int q(void)
+{
+  
+}
+inline int t(void)
+{
+  return q();
+}
+
+   if Q is used only by T, it makes sense to put Q into T's comdat group.
+
+   The pass solve simple dataflow across the callgraph trying to prove what
+   symbols are used exclusively from a given comdat group.
+
+   The implementation maintains a queue linked by AUX pointer terminated by
+   pointer value 1. Lattice values are NUL

Re: Eliminate write-only variables

2014-05-18 Thread Sandra Loosemore

On 05/18/2014 02:59 PM, Jan Hubicka wrote:

Sandra,

This patch seems quite similar in purpose to the
remove_local_statics optimization that Mentor has proposed, although
the implementation is quite different.  Here is the last version of
our patch, prepared by Bernd Schmidt last year:

https://gcc.gnu.org/ml/gcc-patches/2013-06/msg00317.html


Thanks for the pointer, I did not notice this patch!
The approach is indeed very different.  So the patch works on a per-function basis
and cares only about local statics of functions that were not inlined?


Yes.  I should probably mention here that we did the analysis and 
initial implementation of this optimization 7+ years ago against GCC 
4.2, and in some cases we were being conservative in deciding the 
optimization was not valid because the information required for more 
detailed analysis wasn't being collected in the right place back then, 
etc.



The failing tests are remove-local-statics-{4,5,7,12,14b}.c.


+/* Verify that we don't eliminate a global static variable.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "global_static" } } */
+
+static int global_static;
+
+int
+test1 (int x)
+{
+  global_static = x;
+
+  return global_static + x;
+}

here test1 optimizes into

   global_static=x;
   return x+x;

this makes global_static write only and thus it is correctly eliminated.
So I think this testcase is bogus.
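
A minimal sketch of that reasoning, written out as three source-level stages.
The function names are hypothetical and the stages only mirror what the
optimizers do internally; the point is that all three return the same value, so
dropping the store is not observable when nothing else reads global_static.

```c
static int global_static;

/* As written in the test case.  */
int
test1_as_written (int x)
{
  global_static = x;
  return global_static + x;
}

/* After the stored value is forwarded to the load: the variable is now
   only written, never read.  */
int
test1_after_forwarding (int x)
{
  global_static = x;
  return x + x;
}

/* Once the variable is known to be write-only, the store can go too.  */
int
test1_after_removal (int x)
{
  return x + x;
}
```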


Yes, I agree that this one was for a restriction of our implementation 
approach.



+++ b/gcc/testsuite/gcc.dg/remove-local-statics-5.c
@@ -0,0 +1,24 @@
+/* Verify that we do not eliminate a static local variable whose uses
+   are dominated by a def when the function calls setjmp.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "thestatic" } } */
+
+#include <setjmp.h>
+
+int
+foo (int x)
+{
+  static int thestatic;
+  int retval;
+  jmp_buf env;
+
+  thestatic = x;
+
+  retval = thestatic + x;
+
+  setjmp (env);
+
+  return retval;
+}

I believe this is a similar case.  I do not see setjmp changing anything here, since
the local optimizers turn this into retval = x+x;
What was it intended to test?


Hmm, I'm guessing this was some concern about invalid code motion 
around a setjmp.  Our original analysis document lists "F does not call 
setjmp" as a requirement for the optimization, so this was probably a 
case where we were being excessively conservative.



--- /dev/null
+++ b/gcc/testsuite/gcc.dg/remove-local-statics-7.c
@@ -0,0 +1,19 @@
+/* Verify that we eliminate a static local variable where it is defined
+   along all paths leading to a use.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "thestatic" } } */
+
+int
+test1 (int x)
+{
+  static int thestatic;
+
+  if (x < 0)
+thestatic = x;
+  else
+thestatic = -x;
+
+  return thestatic + x;
+}

Here we get after early optimizations:

int
test1 (int x)
{
   static int thestatic;
   int thestatic.0_5;
   int thestatic.1_7;
   int _8;

   :
   if (x_2(D) < 0)
 goto ;
   else
 goto ;

   :
   thestatic = x_2(D);
   goto ;

   :
   thestatic.0_5 = -x_2(D);
   thestatic = thestatic.0_5;

   :
   thestatic.1_7 = thestatic;
   _8 = thestatic.1_7 + x_2(D);
   return _8;

}

and thus we still have a bogus read from thestatic.  Because my analysis works at the IPA level,
we won't benefit from the fact that dom2 eventually cleans it up as:
int
test1 (int x)
{
   static int thestatic;
   int thestatic.0_5;
   int thestatic.1_7;
   int _8;
   int prephitmp_10;

   :
   if (x_2(D) < 0)
 goto ;
   else
 goto ;

   :
   thestatic = x_2(D);
   goto ;

   :
   thestatic.0_5 = -x_2(D);
   thestatic = thestatic.0_5;

   :
   # prephitmp_10 = PHI 
   thestatic.1_7 = prephitmp_10;
   _8 = thestatic.1_7 + x_2(D);
   return _8;

}

Richi, is there a way to teach early FRE to get this transformation?
I see it is a partial redundancy problem...


Hmm, bummer that we don't get this one for free.  :-(


+/* Verify that we do not eliminate a static variable when it is declared
+   in a function that has nested functions.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "thestatic" } } */
+
+int test1 (int x)
+{
+  static int thestatic;
+
+  int nested_test1 (int x)
+  {
+return x + thestatic;
+  }
+
+  thestatic = x;
+
+  return thestatic + x + nested_test1 (x);
+}

Here we work hard enough to optimize test1 as:
int
test1 (int x)
{
   static int thestatic;
   int _4;
   int _5;

   :
   thestatic = x_2(D);
   _4 = x_2(D) + x_2(D);
   _5 = _4 + _4;
   return _5;

}

thus inlining nested_test1 during early optimization. This makes the removal valid.


Yes.  This is one we had to punt on due to the one-function-at-a-time 
approach.



+/* Verify that we do not eliminate a static local variable if the function
+   containing it is inlined.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "thestatic" } } */
+
+int
+te

RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-18 Thread Thomas Preud'homme
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
>  wrote:
> > Ping?
> 
> Sorry ...

> 
> Thanks and sorry again for the delay.
> 

No need to be sorry, it was really not meant as a complaint. I understand very
well that patches sometimes go under the radar and I just wanted to make
sure someone saw it.


> From: pins...@gmail.com [mailto:pins...@gmail.com]
> 
> Not always decomposed. On MIPS, it should be using the load/store left/right
> instructions for unaligned load/stores, which is normally better than
> decomposed load/stores. So having a cost model would be nice.
> 

This makes me think about the following situation:

uint32_t
read_le (unsigned char data[])
{
  uint16_t lo, hi;

  hi = data[0] | (data[1] << 8);
  lo = (data[2] << 16) | (data[3] << 24);
  printf ("lo + hi: %d\n", lo + hi);
  return hi | lo;
}

Currently this will do a 4-byte load and a bswap on a big-endian target, but
it would be better to handle hi and lo separately, doing two 2-byte loads
and a bitwise OR of the two.  So we should check whether the two SSA_NAMEs are
used by other statements and, if so, stop there.
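
For reference, here is a minimal sketch of the "two halves with extra uses"
shape in portable C.  It is not the code from the patch: read_le16 and
read_le32_split are hypothetical names, the low half is called lo here, and the
halves are widened to 32 bits before shifting so that nothing is truncated.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value byte by byte.  */
static uint16_t
read_le16 (const unsigned char *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}

/* Assemble a little-endian 32-bit value from two independently read
   16-bit halves.  The halves are also stored through LO_OUT and HI_OUT,
   standing in for the "other uses" that make one wide load plus bswap
   less attractive than two narrow loads.  */
uint32_t
read_le32_split (const unsigned char data[4],
                 uint16_t *lo_out, uint16_t *hi_out)
{
  uint16_t lo = read_le16 (data);
  uint16_t hi = read_le16 (data + 2);

  *lo_out = lo;
  *hi_out = hi;
  return (uint32_t) lo | ((uint32_t) hi << 16);
}
```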


> From: Richard Biener [mailto:richard.guent...@gmail.com]
> 
> Oh, and what happens for
> 
> unsigned foo (unsigned char *x)
> {
>   return x[0] << 24 | x[2] << 8 | x[3];
> }
> 
> ?  We could do an unsigned int load from x and zero byte 3
> with an AND.  Enhancement for a followup, similar to also
> considering vector types for the load (also I'm not sure
> that uint64_type_node always has non-BLKmode for all
> targets).
> 

Looks like a nice improvement to the patch indeed.


> From: pins...@gmail.com [mailto:pins...@gmail.com]
> 
> No we cannot if x[4] is on a different page which is not mapped in, we get a
> fault. Not something we want.
> 

Why would x[4] be loaded? I guess Richard was only suggesting doing a single
load + zeroing only when the untouched array entry is neither the first nor the
last, that is when there is a load.
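
A minimal sketch of the load-and-mask idea, assuming the full load is expressed
as a byte-wise big-endian read (load_be32 is a hypothetical helper, and the
casts to uint32_t are added so the shifts are well defined).  Both forms touch
only x[0] through x[3], so no access goes past the bytes that the original
expression already covers on either side of the gap.

```c
#include <stdint.h>

/* Byte-wise form from the thread: x[1] is deliberately skipped.  */
uint32_t
foo_bytewise (const unsigned char *x)
{
  return ((uint32_t) x[0] << 24) | ((uint32_t) x[2] << 8) | (uint32_t) x[3];
}

/* Full big-endian 32-bit read of x[0..3].  */
static uint32_t
load_be32 (const unsigned char *x)
{
  return ((uint32_t) x[0] << 24) | ((uint32_t) x[1] << 16)
         | ((uint32_t) x[2] << 8) | (uint32_t) x[3];
}

/* Same result as foo_bytewise: one wide load, then clear the byte that
   came from x[1].  */
uint32_t
foo_load_and_mask (const unsigned char *x)
{
  return load_be32 (x) & UINT32_C (0xFF00FFFF);
}
```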

Best regards,

Thomas Preud'homme





Re: we are starting the wide int merge

2014-05-18 Thread Gerald Pfeifer
On Sat, 17 May 2014, Richard Sandiford wrote:
> To rule out one possibility: which GCC are you using for stage1?

I think that may be the smoking gun.  When I use GCC 4.7 to bootstrap,
FreeBSD 8, 9 and 10 all build fine on i386 (= i486) and amd64.

When I use the system compiler, which is GCC 4.2 on FreeBSD 8 and 9
and clang on FreeBSD 10, things fail on FreeBSD 10...

...with a bootstrap comparison failure of stages 2 and 3 on i386:
https://redports.org/~gerald/20140518230801-31619-208277/gcc410-4.10.0.s20140518.log

...and an interesting failure on amd64:
https://redports.org/~gerald/20140518230801-31619-208275/gcc410-4.10.0.s20140518.log


In file included from .././../gcc-4.10-20140518/gcc/xcoffout.c:29:
.././../gcc-4.10-20140518/gcc/tree.h:4576:3: warning: extraneous template 
parameter list in template specialization
  template <>
  ^~~
.././../gcc-4.10-20140518/gcc/wide-int.cc:1274:23: error: invalid use of a 
cast in a inline asm context requiring an l-value: remove the cast or 
build with -fheinous-gnu-extensions
  umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
  ~~~^


This means this clang-based system is not able to bootstrap GCC trunk
on amd64.

Perhaps looking into this first may affect the failure on i486?

Gerald


Re: Eliminate write-only variables

2014-05-18 Thread Jan Hubicka
> 
> Hmm, I'm guessing this was some concern about invalid code motion
> around a setjmp.  Our original analysis document lists "F does not
> call setjmp" as a requirement for the optimization, so this was
> probably a case where we were being excessively conservative.

I suppose it was because you needed to prove that the value stored in the static
variable is dead at the function return, and setjmp may let you jump
from somewhere else.  This should work transparently with the mainline approach.

> >
> >Richi, is there a way to teach early FRE to get this transformation?
> >I see it is a partial redundancy problem...
> 
> Hmm, bummer that we don't get this one for free.  :-(

Yeah, let's not forget about this one. On the plus side we got quite a few cases
your code didn't ;)

> >Sandra,
> >do you think you can drop the testcases that are not valid and commit the valid ones minus
> >remove-local-statics-7.c, for which we can file an enhancement request?
> 
> OK.  Keep the original numbering or re-number them to fill up the
> holes left by the deletions?

Since the original testcases never hit mainline, I would prefer them to be renumbered ;)

> 
> >For cases like local-statics-7 your approach can be "saved" by adding a simple IPA analysis
> >to look for static vars that are used only by one function and keeping your DSE code active
> >for them, so we can still get rid of these special-case assignments during late compilation.
> >I am however not quite convinced it is worth the effort - do you have some real-world
> >cases where it helps?
> 
> Um, the well-known benchmark.  ;-)

Very informative, does my implementation handle it well? ;)

I suppose for benchmarks using static where they should not, the analysis that a static
is used only in one function and a pass to turn it into an automatic variable would still
make sense. The approach of removing write-only variables and relying on FRE to completely
clean up is a bit fragile, since it requires very complex machinery to work perfectly...

Honza
> 
> >I am rather thinking about cutting the passmanager queue once again after the main
> >tree optimization and re-running IPA unreachable code removal after them.  This
> >should help with rather common cases where we optimize out code as an effect
> >of inlining.
> >
> >This would basically mean running pass_all_optimizations from a late IPA pass
> >and scheduling one extra fixup_cfg and perhaps DCE pass at the beginning of
> >pass_all_optimizations.
> >
> >Honza
> >
> 
> -Sandra
> 


[DOC Patch] Label attributes

2014-05-18 Thread David Wohlferd

I have a release on file with the FSF, but don't have SVN write access.

Problem description:
The docs in (Attribute Syntax) say "The only attribute it makes sense to 
use after a label is 'unused'."  However, there are two others: hot and 
cold.  The reason it looks like there is only one is that the docs for 
the label attribute versions of hot and cold are misfiled under the 
(Function Attributes) section.  When there was only one label attribute, 
it (sort of) made sense not to have a separate section for label 
attributes.  Now that there are three, not so much.
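
As an illustration of what a label attribute looks like in use, here is a small
sketch.  It relies on the GNU C extension syntax (an attribute after the label,
followed by a semicolon) and on GCC 4.8 or later for cold on labels;
checked_div is a hypothetical function made up for this example.

```c
/* Returns 0 on success, -1 on the rarely taken failure path.  */
int
checked_div (int num, int den, int *out)
{
  if (den == 0)
    goto fail;
  *out = num / den;
  return 0;

  /* GNU extension: mark the path after this label as unlikely.  The
     semicolon after the attribute is an empty statement.  */
fail: __attribute__ ((cold));
  *out = 0;
  return -1;
}
```

In a plain goto like this one __builtin_expect on the condition would do as
well; the label form matters for computed goto and asm goto, where there is no
condition to annotate.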


ChangeLog:
2014-05-18  David Wohlferd 

 * doc/extend.texi: Create Label Attributes section,
 move all label attributes into it and reference it.

dw

Index: extend.texi
===
--- extend.texi	(revision 210577)
+++ extend.texi	(working copy)
@@ -55,6 +55,7 @@
 * Mixed Declarations::  Mixing declarations and code.
 * Function Attributes:: Declaring that functions have no side effects,
 or that they can never return.
+* Label Attributes::Specifying attributes on labels.
 * Attribute Syntax::Formal syntax for attributes.
 * Function Prototypes:: Prototype declarations and old-style definitions.
 * C++ Comments::C++ comments are recognized.
@@ -2181,7 +2182,8 @@
 @code{error} and @code{warning}.
 Several other attributes are defined for functions on particular
 target systems.  Other attributes, including @code{section} are
-supported for variables declarations (@pxref{Variable Attributes})
+supported for variables declarations (@pxref{Variable Attributes}),
+labels (@pxref{Label Attributes})
 and for types (@pxref{Type Attributes}).
 
 GCC plugins may provide their own attributes.
@@ -3617,8 +3619,8 @@
 @cindex @code{hot} function attribute
 The @code{hot} attribute on a function is used to inform the compiler that
 the function is a hot spot of the compiled program.  The function is
-optimized more aggressively and on many target it is placed into special
-subsection of the text section so all hot functions appears close together
+optimized more aggressively and on many targets it is placed into a special
+subsection of the text section so all hot functions appear close together,
 improving locality.
 
 When profile feedback is available, via @option{-fprofile-use}, hot functions
@@ -3627,23 +3629,14 @@
 The @code{hot} attribute on functions is not implemented in GCC versions
 earlier than 4.3.
 
-@cindex @code{hot} label attribute
-The @code{hot} attribute on a label is used to inform the compiler that
-path following the label are more likely than paths that are not so
-annotated.  This attribute is used in cases where @code{__builtin_expect}
-cannot be used, for instance with computed goto or @code{asm goto}.
-
-The @code{hot} attribute on labels is not implemented in GCC versions
-earlier than 4.8.
-
 @item cold
 @cindex @code{cold} function attribute
 The @code{cold} attribute on functions is used to inform the compiler that
 the function is unlikely to be executed.  The function is optimized for
-size rather than speed and on many targets it is placed into special
-subsection of the text section so all cold functions appears close together
+size rather than speed and on many targets it is placed into a special
+subsection of the text section so all cold functions appear close together,
 improving code locality of non-cold parts of program.  The paths leading
-to call of cold functions within code are marked as unlikely by the branch
+to calls of cold functions within code are marked as unlikely by the branch
 prediction mechanism.  It is thus useful to mark functions used to handle
 unlikely conditions, such as @code{perror}, as cold to improve optimization
 of hot functions that do call marked functions in rare occasions.
@@ -3654,15 +3647,6 @@
 The @code{cold} attribute on functions is not implemented in GCC versions
 earlier than 4.3.
 
-@cindex @code{cold} label attribute
-The @code{cold} attribute on labels is used to inform the compiler that
-the path following the label is unlikely to be executed.  This attribute
-is used in cases where @code{__builtin_expect} cannot be used, for instance
-with computed goto or @code{asm goto}.
-
-The @code{cold} attribute on labels is not implemented in GCC versions
-earlier than 4.8.
-
 @item no_sanitize_address
 @itemx no_address_safety_analysis
 @cindex @code{no_sanitize_address} function attribute
@@ -4527,6 +4511,65 @@
 @code{#pragma GCC} is of use for constructs that do not naturally form
 part of the grammar.  @xref{Pragmas,,Pragmas Accepted by GCC}.
 
+@node Label Attributes
+@section Label Attributes
+@cindex Label Attributes
+
+GCC allows attributes to be set on C labels.  @xref{Attribute Syntax}, for 
+details of the exact syntax for using attributes.  Other attributes are 
+available for functions (@pxref{Function Attributes}), variables 
+(@pxref{Variable Attributes})

Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler

2014-05-18 Thread Denis Chertykov
2014-05-16 14:02 GMT+04:00 Georg-Johann Lay :
> Am 05/15/2014 09:55 AM, schrieb Senthil Kumar Selvaraj:
>
>> On Wed, May 14, 2014 at 12:56:54PM +0200, Rainer Orth wrote:
>>>
>>> Georg-Johann Lay  writes:
>>>
>>>> Or what about simply that, which works for me:
>>>>
>>>>
>>>> Index: config/avr/avr.h
>>>> ===
>>>> --- config/avr/avr.h(revision 210276)
>>>> +++ config/avr/avr.h(working copy)
>>>> @@ -512,7 +512,11 @@ extern const char *avr_device_to_sp8 (in
>>>>   %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
>>>>   %{!fexceptions:-fno-exceptions}"
>>>>
>>>> +#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
>>>> +#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) %{mrelax:-mlink-relax} "
>>>> +#else
>>>>   #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
>>>> +#endif
>>>>
>>>>   #define LINK_SPEC "\
>>>>   %{mrelax:--relax\
>>>
>>>
>>> Better yet something like
>>>
>>> #ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
>>> #define LINK_RELAX_SPEC "%{mrelax:-mlink-relax} "
>>> #else
>>> #define LINK_RELAX_SPEC ""
>>> #endif
>>>
>>> #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC
>>>
>>
>> Does this look ok? I don't have commit access, so could someone commit
>> this please?
>
>
> Hi, looks fine to me.  Thanks


I'm on vacation until the 24-may.

Denis.
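
For readers skimming the thread, the LINK_RELAX_SPEC variant quoted above works
because adjacent string literals are concatenated at compile time.  A minimal
standalone sketch follows; defining HAVE_AS_AVR_LINK_RELAX_OPTION by hand and
the toy_asm_spec accessor are assumptions for the example (in GCC the macro
would come from configure).

```c
/* Assumption for the sketch: pretend configure detected the option.  */
#define HAVE_AS_AVR_LINK_RELAX_OPTION 1

#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
#define LINK_RELAX_SPEC "%{mrelax:-mlink-relax} "
#else
#define LINK_RELAX_SPEC ""
#endif

/* Adjacent string literals are joined into one string, so the optional
   piece is appended without duplicating the common prefix.  */
#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC

/* Hypothetical accessor so the resulting spec string can be inspected.  */
const char *
toy_asm_spec (void)
{
  return ASM_SPEC;
}
```

Compared with wrapping the whole ASM_SPEC definition in #ifdef/#else, only the
part that varies is conditional.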


Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.

2014-05-18 Thread Jan Hubicka
> > Thanks for the pointer, there is indeed the recommendation in
> > optimization manual [1], section 3.6.4, where it is said:
> >
> > --quote--
> > Misaligned data access can incur significant performance penalties.
> > This is particularly true for cache line
> > splits. The size of a cache line is 64 bytes in the Pentium 4 and
> > other recent Intel processors, including
> > processors based on Intel Core microarchitecture.
> > An access to data unaligned on 64-byte boundary leads to two memory
> > accesses and requires several
> > µops to be executed (instead of one). Accesses that span 64-byte
> > boundaries are likely to incur a large
> > performance penalty, the cost of each stall generally are greater on
> > machines with longer pipelines.
> >
> > ...
> >
> > A 64-byte or greater data structure or array should be aligned so that
> > its base address is a multiple of 64.
> > Sorting data in decreasing size order is one heuristic for assisting
> > with natural alignment. As long as 16-
> > byte boundaries (and cache lines) are never crossed, natural alignment
> > is not strictly necessary (though
> > it is an easy way to enforce this).
> > --/quote--
> >
> > So, this part has nothing to do with AVX512, but with cache line
> > width. And we do have a --param "l1-cache-line-size=64", detected with
> > -march=native that could come handy here.
> >
> > This part should be rewritten (and commented) with the information
> > above in mind.
> 
> Like in the patch below. Please note, that the block_tune setting for
> the nocona is wrong, -march=native on my trusted old P4 returns:
> 
> --param "l1-cache-size=16" --param "l1-cache-line-size=64" --param
> "l2-cache-size=2048" "-mtune=nocona"
> 
> which is consistent with the above quote from manual.
> 
> 2014-01-02  Uros Bizjak  
> 
> * config/i386/i386.c (ix86_data_alignment): Calculate max_align
> from prefetch_block tune setting.
> (nocona_cost): Correct size of prefetch block to 64.
> 
Uros,
I am looking into LibreOffice size and the data alignment seems to make a huge
difference. The data section has grown from 5.8MB to 6.3MB between GCC 4.8 and
4.9, while clang produces 5.2MB.

The two patches I posted to not align vtables and RTTI reduce it to 5.7MB, but
perhaps we want to revisit the alignment rules.  The optimization manuals
usually care only about performance-critical loops.  Perhaps we can make the
rules align only bigger data structures, at least for -O2.

Honza
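
The trade-off discussed above (cache-line splits versus padding in the data
section) can be illustrated with a small C sketch.  The helper names
crosses_cache_line and align_up are hypothetical and are not part of GCC;
the 64-byte line size is the value quoted from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if the access [addr, addr + size) touches more than
   one cache line.  line_size must be a power of two.  */
int crosses_cache_line(uintptr_t addr, size_t size, size_t line_size)
{
    assert(size > 0);
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t first = addr & ~(uintptr_t)(line_size - 1);
    uintptr_t last = (addr + size - 1) & ~(uintptr_t)(line_size - 1);
    return first != last;
}

/* Round addr up to the next multiple of align (a power of two).
   The difference between the result and addr is the padding that
   the alignment costs.  */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(uintptr_t)(align - 1);
}
```

For example, an 8-byte access at offset 60 crosses a 64-byte line, while
placing an object that would start at offset 65 on a 64-byte boundary costs
63 bytes of padding.  That padding is the kind of growth seen in the data
section.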



Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler

2014-05-18 Thread Senthil Kumar Selvaraj
On Fri, May 16, 2014 at 12:02:12PM +0200, Georg-Johann Lay wrote:
> Am 05/15/2014 09:55 AM, schrieb Senthil Kumar Selvaraj:
> >On Wed, May 14, 2014 at 12:56:54PM +0200, Rainer Orth wrote:
> >>Georg-Johann Lay  writes:
> >>
> >>>Or what about simply that, which works for me:
> >>>
> >>>
> >>>Index: config/avr/avr.h
> >>>===
> >>>--- config/avr/avr.h(revision 210276)
> >>>+++ config/avr/avr.h(working copy)
> >>>@@ -512,7 +512,11 @@ extern const char *avr_device_to_sp8 (in
> >>>  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
> >>>  %{!fexceptions:-fno-exceptions}"
> >>>
> >>>+#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
> >>>+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) %{mrelax:-mlink-relax} "
> >>>+#else
> >>>  #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
> >>>+#endif
> >>>
> >>>  #define LINK_SPEC "\
> >>>  %{mrelax:--relax\
> >>
> >>Better yet something like
> >>
> >>#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
> >>#define LINK_RELAX_SPEC "%{mrelax:-mlink-relax} "
> >>#else
> >>#define LINK_RELAX_SPEC ""
> >>#endif
> >>
> >>#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC
> >>
> >
> >Does this look ok? I don't have commit access, so could someone commit
> >this please?
> 
> Hi, looks fine to me.  Thanks
> 
> Usually, changelogs are more descriptive w.r.t. what objects are touched,
> like:

Ah ok. Will keep that in mind, thanks.

Regards
Senthil
> 
>   * config/avr/avr.h (LINK_RELAX_SPEC): Pass -mlink-relax to the
>   assembler, depending on HAVE_AS_AVR_LINK_RELAX_OPTION.
>   (ASM_SPEC): Use it.
>   * configure.ac (HAVE_AVR_AS_LINK_RELAX_OPTION) [avr]: New define if
>   assembler supports -mlink-relax.
>   * config.in: Regenerate.
>   * configure: Likewise.
> 
> >
> >Regards
> >Senthil
> >
> >2014-05-15  Senthil Kumar Selvaraj  
> >
> > * config/avr/avr.h: Pass on mlink-relax to assembler.
> > * configure.ac: Test for mlink-relax assembler support.
> > * config.in: Regenerate.
> > * configure: Likewise.
> >
> >diff --git gcc/config.in gcc/config.in
> >index c0ba36e..1738301 100644
> >--- gcc/config.in
> >+++ gcc/config.in
> >@@ -575,6 +575,12 @@
> >  #endif
> >
> >
> >+/* Define if your assembler supports -mlink-relax option. */
> >+#ifndef USED_FOR_TARGET
> >+#undef HAVE_AVR_AS_LINK_RELAX_OPTION
> >+#endif
> >+
> >+
> >  /* Define to 1 if you have the `clearerr_unlocked' function. */
> >  #ifndef USED_FOR_TARGET
> >  #undef HAVE_CLEARERR_UNLOCKED
> >diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
> >index 9d34983..c59c54d 100644
> >--- gcc/config/avr/avr.h
> >+++ gcc/config/avr/avr.h
> >@@ -512,8 +512,14 @@ extern const char *avr_device_to_sp8 (int argc, const 
> >char **argv);
> >  %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
> >  %{!fexceptions:-fno-exceptions}"
> >
> >-#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) "
> >-
> >+#ifdef HAVE_AVR_AS_LINK_RELAX_OPTION
> >+#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
> >+#else
> >+#define ASM_RELAX_SPEC ""
> >+#endif
> >+
> >+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " ASM_RELAX_SPEC
> >+
> >  #define LINK_SPEC "\
> >  %{mrelax:--relax\
> >   %{mpmem-wrap-around:%{mmcu=at90usb8*:--pmem-wrap-around=8k}\
> >diff --git gcc/configure gcc/configure
> >index f4db0a0..2812cdb 100755
> >--- gcc/configure
> >+++ gcc/configure
> >@@ -24014,6 +24014,39 @@ $as_echo "#define HAVE_AS_JSRDIRECT_RELOCS 1" 
> >>>confdefs.h
> >  fi
> >  ;;
> >
> >+  avr-*-*)
> >+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for 
> >-mlink-relax option" >&5
> >+$as_echo_n "checking assembler for -mlink-relax option... " >&6; }
> >+if test "${gcc_cv_as_avr_relax+set}" = set; then :
> >+  $as_echo_n "(cached) " >&6
> >+else
> >+  gcc_cv_as_avr_relax=no
> >+  if test x$gcc_cv_as != x; then
> >+$as_echo '.text' > conftest.s
> >+if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mlink-relax -o conftest.o 
> >conftest.s >&5'
> >+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> >+  (eval $ac_try) 2>&5
> >+  ac_status=$?
> >+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
> >+  test $ac_status = 0; }; }
> >+then
> >+gcc_cv_as_avr_relax=yes
> >+else
> >+  echo "configure: failed program was" >&5
> >+  cat conftest.s >&5
> >+fi
> >+rm -f conftest.o conftest.s
> >+  fi
> >+fi
> >+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_avr_relax" >&5
> >+$as_echo "$gcc_cv_as_avr_relax" >&6; }
> >+if test $gcc_cv_as_avr_relax = yes; then
> >+
> >+$as_echo "#define HAVE_AVR_AS_LINK_RELAX_OPTION 1" >>confdefs.h
> >+
> >+fi
> >+  ;;
> >+
> >cris-*-*)
> >  { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for 
> > -no-mul-bug-abort option" >&5
> >  $as_echo_n "checking assembler for -no-mul-bug-abort option... " >&6; }
> >diff --git gcc/configure.ac gcc/configure.ac
> >index 8f17dfb..49a1f3d 100644
> >--- gcc/configure.ac
> >+++ gcc/configure.
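
The avr.h change above relies on adjacent C string literals being
concatenated by the compiler, so ASM_SPEC picks up the extra option only
when configure has defined the feature macro.  A minimal standalone sketch
of that mechanism follows; the accessor functions avr_asm_spec and
avr_asm_spec_has_relax are hypothetical and exist only for illustration.

```c
#include <string.h>

/* The macro is normally defined by configure; left undefined here,
   so the relax spec collapses to the empty string.  */
#ifdef HAVE_AVR_AS_LINK_RELAX_OPTION
#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
#else
#define ASM_RELAX_SPEC ""
#endif

/* Adjacent string literals are joined at compile time.  */
#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " ASM_RELAX_SPEC

/* Hypothetical accessor: return the composed spec string.  */
const char *avr_asm_spec(void)
{
    return ASM_SPEC;
}

/* Hypothetical helper: nonzero if the spec forwards -mlink-relax.  */
int avr_asm_spec_has_relax(void)
{
    return strstr(ASM_SPEC, "-mlink-relax") != NULL;
}
```

Compiling the sketch with -DHAVE_AVR_AS_LINK_RELAX_OPTION makes
avr_asm_spec_has_relax return nonzero; without it the spec is unchanged from
the old definition.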

Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion

2014-05-18 Thread Joey Ye
If f2d needs a fix, then please fix d2f too, as the current implementations
of both behave similarly.

- Joey

On Mon, May 19, 2014 at 5:23 AM, Aurelien Jarno  wrote:
> On ARM soft-float, the float to double conversion doesn't convert a sNaN
> to qNaN as the IEEE Std 754 standard mandates:
>
> "Under default exception handling, any operation signaling an invalid
> operation exception and for which a floating-point result is to be
> delivered shall deliver a quiet NaN."
>
> Given the soft float ARM code ignores exceptions and always provides a
> result, a float to double conversion of a signaling NaN should return a
> quiet NaN. Fix this in extendsfdf2.
>
>
> 2014-05-18  Aurelien Jarno  
>
> PR target/61219
> * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
>
>
> Index: libgcc/config/arm/ieee754-df.S
> ===
> --- libgcc/config/arm/ieee754-df.S  (revision 210588)
> +++ libgcc/config/arm/ieee754-df.S  (working copy)
> @@ -473,11 +473,15 @@
> eorne   xh, xh, #0x3800 @ fixup exponent otherwise.
> RETc(ne) @ and return it.
>
> -   teq r2, #0  @ if actually 0
> -   do_it   ne, e
> -   teqne   r3, #0xff00 @ or INF or NAN
> +   bics    r2, r2, #0xff00 @ isolate mantissa
> +   do_it   eq  @ if 0, that is ZERO or INF,
> RETc(eq) @ we are done already.
>
> +   teq r3, #0xff00 @ check for NAN
> +   do_it   eq, t
> +   orreq   xh, xh, #0x0008 @ change to quiet NAN
> +   RETc(eq) @ and return it.
> +
> @ value was denormalized.  We can normalize it now.
> do_push {r4, r5, lr}
> mov r4, #0x380  @ setup corresponding exponent
>
> --
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://www.aurel32.net


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-18 Thread Bin.Cheng
On Sat, May 17, 2014 at 12:52 AM, Mike Stump  wrote:
> On May 16, 2014, at 3:07 AM, Bin.Cheng  wrote:
>>
>>> I don't see how regrename will help resolve [base+offset] false
>>> dependencies. Can you explain? I'd expect effects from
>>> hardreg-copyprop "commoning" a base register.
>> It's the register operand's false dependency, rather than the base's
>> one.  Considering below simple case:
>>mov r1,  #const1
>>store r1, [base+offset1]
>>mov r1, #const2
>>store r1, [base+offset2]
>> It should be renamed into:
>>mov r1,  #const1
>>store r1, [base+offset1]
>>mov r2, #const2
>>store r2, [base+offset2]
>
> Ah, but, what did this look like right before pass_web?
I don't think this would be a problem before RA; generally GCC won't
try to reuse a pseudo register in this way, right?

Thanks,
bin


-- 
Best Regards.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-18 Thread Bin.Cheng
On Sat, May 17, 2014 at 12:32 AM, Jeff Law  wrote:
> On 05/16/14 04:07, Bin.Cheng wrote:
>
>> Yes, I think this one does have a good reason.  The target independent
>> pass just makes sure that two consecutive memory access instructions
>> are free of data-dependency with each other, then feeds it to back-end
>> hook.  It's back-end's responsibility to generate correct instruction.
>
> But given these two memory access insns, there's only a couple ways they're
> likely to combine into a single insn.  We could just as easily have the
> target independent code construct a new insn then try to recognize it.  If
> it's not recognized, then try the other way.
>
> Or is it the case that we're doing something beyond upsizing the mode?
>
>
>
>>   It's not about modifying an existing insn then recognize it, it's
>> about creating new instruction sometimes.  For example, we can
>> generate a simple move insn in Arm mode, while have to generate a
>> parallel instruction in Thumb mode.  Target independent part has no
>> idea how to generate an expected insn.  Moreover, back-end may check
>> some special conditions too.
>
> But can't you go through movXX to generate either the simple insn on the ARM
> or the PARALLEL on the thumb?
>
Yes, I think it's more than upsizing the mode.  There is another
example in one of the candidate x86 peephole patches at
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00467.html

The patch wants to do the transformation below, which I think is very
target dependent.

+(define_peephole2
+  [(set (match_operand:DF 0 "register_operand")
+   (match_operand:DF 1 "memory_operand"))
+   (set (match_operand:V2DF 2 "register_operand")
+   (vec_concat:V2DF (match_dup 0)
+(match_operand:DF 3 "memory_operand")))]
+  "TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
+   && REGNO (operands[0]) == REGNO (operands[2])
+   && adjacent_mem_locations (operands[1], operands[3])"
+  [(set (match_dup 2)
+   (unspec:V2DF [(match_dup 4)] UNSPEC_LOADU))]
+
+;; merge movsd/movhpd to movupd when TARGET_SSE_UNALIGNED_STORE_OPTIMAL
+;; is true.
+(define_peephole2
+  [(set (match_operand:DF 0 "memory_operand")
+(vec_select:DF (match_operand:V2DF 1 "register_operand")
+  (parallel [(const_int 0)])))
+   (set (match_operand:DF 2 "memory_operand")
+(vec_select:DF (match_dup 1)
+   (parallel [(const_int 1)])))]
+  "TARGET_SSE_UNALIGNED_STORE_OPTIMAL
+   && adjacent_mem_locations (operands[0], operands[2])"
+  [(set (match_dup 3)
+(unspec:V2DF [(match_dup 1)] UNSPEC_STOREU))]

Thanks,
bin


-- 
Best Regards.


Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-18 Thread Bin.Cheng
On Sat, May 17, 2014 at 12:18 AM, Jeff Law  wrote:
> On 05/16/14 04:07, Bin.Cheng wrote:
>>
>> On Fri, May 16, 2014 at 1:13 AM, Jeff Law  wrote:
>>>
>>> On 05/15/14 10:51, Mike Stump wrote:


>>>> On May 15, 2014, at 12:26 AM, bin.cheng  wrote:
>>>>>
>>>>>
>>>>> Here comes up with a new GCC pass looking through each basic block
>>>>> and merging paired load store even they are not adjacent to each
>>>>> other.
>>>>
>>>>
>>>>
>>>> So I have a target that has load and store multiple support that
>>>> supports large a number of registers (2-n registers), and I added a
>>>> sched0 pass that is a light copy of the regular scheduling pass that
>>>> uses a different cost function which arranges all loads first, then
>>>> all stores then everything else.  Within a group of loads or stores
>>>> the secondary key is the base register, the next key is the offset.
>>>> The net result, all loads off the same register are sorted in
>>>> increasing order.
>>>
>>>
>>> Glad to see someone else stumble on (ab)using the scheduler to do this.
>>
>> Emm, If it's (ab)using, should we still do it then?
>
> I think it'd still be fine.  There's even been a comment about doing this
> kind of thing in the scheduler that's been around since the early 90s...
>
> The scheduler is a bit interesting in that it has a wealth of dependency
> information and the ability to reorganize the insn stream in relatively
> arbitrary ways.  That seems to make it a natural place to think about
> transformations of this nature.  We just haven't had a good infrastructure
> for doing that.
>
> In theory we're a lot closer now to being able to plug in different
> costing/sorting models and let the scheduler do its thing.  Those models
> might rewrite for register pressure, or encourage certain independent insns
> to issue back-to-back to encourage combining, or to build candidate insns
> for delay slot scheduling, etc.
>
>
>> As Mike stated, merging of consecutive memory accesses is all about
>> the base register and the offset. I am thinking another method
>> collecting all memory accesses with same base register then doing the
>> merge work.  In this way, we should be able to merge more than 2
>> instructions, also it would be possible to remove redundant load
>> instructions in one pass.
>>
>> My question is how many these redundant loads could be?  Is there any
>> rtl pass responsible for this now?
>
> I suspect it's a lot less important now than it used to be.  But there's
> probably some cases where it'd be useful.  Combining sub-word accesses into
> full-word accesses come immediately to mind.
>
> I'm not aware of any pass which does these kind of changes in a general
> form.  Some passes (caller-save) do a fair amount of work to track when they
> can generate multi-object loads/stores (and it was a huge win back on the
> old sparc processors).
>
Glad this RFC has attracted some attention, and thanks for all the
comments.  I can see four major concerns:
1) Should we do it in a separate pass, or as part of the scheduler?

2) When should we run the new pass, before or after RA?  Both have
advantages and disadvantages, and the choice depends heavily on the
target we are compiling for.
I have no simple answer to this.  Maybe we can run the pass twice or
follow Oleg's suggestion.  I think letting the backend decide when to
run a pass would be a new strategy for GCC.

3) Do we need a new target hook interface?
I answered this in other messages and I still think it's target dependent.

4) The optimization should be able to handle cases with more than 2
consecutive load/store instructions.
The current implementation can't handle such cases and needs further extension.

3) and 4) are just implementation questions, but I am not sure about
1) and 2).  Are there any more comments that would help us make some
decisions and carry on with this optimization?

Thanks,
bin

-- 
Best Regards.
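
The ordering Mike describes in the quoted text (loads first, then stores,
then everything else, keyed by base register and then offset), combined
with the idea of collecting accesses off the same base register, can be
sketched in C as below.  All type and function names are hypothetical, and
the sketch deliberately ignores data dependencies, which a real pass has to
check before moving or merging anything.

```c
#include <assert.h>
#include <stdlib.h>

enum access_kind { ACCESS_LOAD = 0, ACCESS_STORE = 1, ACCESS_OTHER = 2 };

struct mem_access {
    enum access_kind kind;  /* primary key */
    int base_reg;           /* secondary key */
    long offset;            /* third key */
    int size;               /* access width in bytes */
};

/* Order by kind, then base register, then offset.  */
static int compare_access(const void *pa, const void *pb)
{
    const struct mem_access *a = pa;
    const struct mem_access *b = pb;
    if (a->kind != b->kind)
        return a->kind < b->kind ? -1 : 1;
    if (a->base_reg != b->base_reg)
        return a->base_reg < b->base_reg ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

void sort_accesses(struct mem_access *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof *v, compare_access);
}

/* Count neighbours in the sorted array that are the same kind, use the
   same base register and touch consecutive memory.  Each access joins
   at most one pair.  */
size_t count_adjacent_pairs(const struct mem_access *v, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (v[i].kind != ACCESS_OTHER
            && v[i].kind == v[i + 1].kind
            && v[i].base_reg == v[i + 1].base_reg
            && v[i].offset + v[i].size == v[i + 1].offset) {
            pairs++;
            i++;
        }
    }
    return pairs;
}
```

Extending count_adjacent_pairs to runs longer than two would cover concern
4) above; whether the resulting group is a valid instruction remains a
target decision.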


Re: [PATCH ARM] Improve ARM memset inlining

2014-05-18 Thread Bin.Cheng
Ping^2

Thanks,
bin

On Mon, May 12, 2014 at 11:17 AM, Bin.Cheng  wrote:
> Ping.
>
> Thanks,
> bin
>
> On Tue, May 6, 2014 at 12:59 PM, bin.cheng  wrote:
>>
>>

>> Precisely, I configured gcc with options "--with-arch=armv7-a
>> --with-cpu|--with-tune=cortex-a9".
>> I read gcc documents and realized that "-mcpu" is ignored when "-march" is
>> specified.  I don't know why gcc acts in this manner, but it leads to
>> inconsistent configuration/command line behavior.
>> If we configure GCC with "--with-arch=armv7-a --with-cpu=cortex-a9", then
>> only "-march=armv7-a" is passed to cc1.
>> If we compile with "-march=armv7-a -mcpu=cortex-a9", then gcc works fine and
>> passes "-march=armv7-a -mcpu=cortex-a9" to cc1.
>>
>> Even more weird cc1 warns that "switch -mcpu=cortex-m4 conflicts with
>> -march=armv7-m switch".
>>
>> Thanks,
>> bin
>>
>>
>>
>>
>
>
>
> --
> Best Regards.



-- 
Best Regards.


Re: we are starting the wide int merge

2014-05-18 Thread Richard Sandiford
Gerald Pfeifer  writes:
> On Sat, 17 May 2014, Richard Sandiford wrote:
>> To rule out one possibility: which GCC are you using for stage1?
>
> I think that may be the smoking gun.  When I use GCC 4.7 to bootstrap,
> FreeBSD 8, 9 and 10 all build fine on i386 (= i486) and amd64.
>
> When I use the system compiler, which is GCC 4.2 on FreeBSD 8 and 9
> and clang on FreeBSD 10, things fail on FreeBSD 10...
>
> ...with a bootstrap comparison failure of stages 2 and 3 on i386:
> https://redports.org/~gerald/20140518230801-31619-208277/gcc410-4.10.0.s20140518.log

Do you get exactly the same comparison failures using clang and GCC 4.2
as the stage1 compiler?  That would rule out the system compiler
miscompiling stage1.

> In file included from .././../gcc-4.10-20140518/gcc/xcoffout.c:29:
> .././../gcc-4.10-20140518/gcc/tree.h:4576:3: warning: extraneous template 
> parameter list in template specialization
>   template <>
>   ^~~

Oops, fixed below, applied as obvious.

> .././../gcc-4.10-20140518/gcc/wide-int.cc:1274:23: error: invalid use of a 
> cast in a inline asm context requiring an l-value: remove the cast or 
> build with -fheinous-gnu-extensions
>   umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
>   ~~~^

This is PR 61146.  You can get around it by adding -fheinous-gnu-extensions
to BOOT_CFLAGS.

> This means this clang-based system is not able to bootstrap GCC trunk
> on amd64.
>
> Perhaps looking into this first may affect the failure on i486?

'Fraid it won't help.  We don't use umul_ppmm (or even include
longlong.h) for 486.

Thanks,
Richard


gcc/
* tree.h: Remove extraneous template <>.

Index: gcc/tree.h
===
--- gcc/tree.h  2014-05-19 07:45:30.378667987 +0100
+++ gcc/tree.h  2014-05-19 07:46:07.364991104 +0100
@@ -4573,7 +4573,6 @@ #define ANON_AGGRNAME_FORMAT "__anon_%d"
 unsigned int get_len () const;
   };
 
-  template <>
   template <int N>
   struct int_traits <extended_tree <N> >
   {


Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion

2014-05-18 Thread Aurelien Jarno
On Mon, May 19, 2014 at 02:08:06PM +0800, Joey Ye wrote:
> If f2d needs a fix, then please fix d2f too, as the current implementations
> of both behave similarly.

I have done some tests with double to float conversion, and the NaN
behaviour is correct. This is due to specific code handling that in
d2f:

3:  @ chech for NAN
mvns    r3, r2, asr #21
bne 5f  @ simple overflow
orrs    r3, xl, xh, lsl #12
do_it   ne, tt
movne   r0, #0x7f00
orrne   r0, r0, #0x00c0
RETc(ne) @ return NAN

Aurelien

> On Mon, May 19, 2014 at 5:23 AM, Aurelien Jarno  wrote:
> > On ARM soft-float, the float to double conversion doesn't convert a sNaN
> > to qNaN as the IEEE Std 754 standard mandates:
> >
> > "Under default exception handling, any operation signaling an invalid
> > operation exception and for which a floating-point result is to be
> > delivered shall deliver a quiet NaN."
> >
> > Given the soft float ARM code ignores exceptions and always provides a
> > result, a float to double conversion of a signaling NaN should return a
> > quiet NaN. Fix this in extendsfdf2.
> >
> >
> > 2014-05-18  Aurelien Jarno  
> >
> > PR target/61219
> > * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN.
> >
> >
> > Index: libgcc/config/arm/ieee754-df.S
> > ===
> > --- libgcc/config/arm/ieee754-df.S  (revision 210588)
> > +++ libgcc/config/arm/ieee754-df.S  (working copy)
> > @@ -473,11 +473,15 @@
> > eorne   xh, xh, #0x3800 @ fixup exponent otherwise.
> > RETc(ne) @ and return it.
> >
> > -   teq r2, #0  @ if actually 0
> > -   do_it   ne, e
> > -   teqne   r3, #0xff00 @ or INF or NAN
> > +   bics    r2, r2, #0xff00 @ isolate mantissa
> > +   do_it   eq  @ if 0, that is ZERO or INF,
> > RETc(eq) @ we are done already.
> >
> > +   teq r3, #0xff00 @ check for NAN
> > +   do_it   eq, t
> > +   orreq   xh, xh, #0x0008 @ change to quiet NAN
> > +   RETc(eq) @ and return it.
> > +
> > @ value was denormalized.  We can normalize it now.
> > do_push {r4, r5, lr}
> > mov r4, #0x380  @ setup corresponding exponent
> >
> > --
> > Aurelien Jarno  GPG: 4096R/1DDD8C9B
> > aurel...@aurel32.net http://www.aurel32.net
> 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net
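
The NaN test in the d2f snippet quoted earlier in this message checks for
an all-ones exponent and a non-zero mantissa, then returns a fixed NaN
pattern.  A rough C equivalent on raw bit patterns is sketched below.  The
function names are hypothetical, and the value 0x7fc00000 is an assumption:
it is the usual default quiet NaN for binary32, which the movne/orrne pair
appears to build.

```c
#include <stdint.h>

/* Assumed default quiet NaN for binary32.  */
#define F32_DEFAULT_QNAN 0x7fc00000u

/* Nonzero if the binary64 bit pattern d encodes a NaN:
   exponent all ones and mantissa non-zero.  */
int f64_bits_is_nan(uint64_t d)
{
    uint64_t exponent = (d >> 52) & 0x7ff;
    uint64_t mantissa = d & 0x000fffffffffffffULL;
    return exponent == 0x7ff && mantissa != 0;
}

/* Nonzero if the binary32 bit pattern f encodes a quiet NaN:
   exponent all ones and the top mantissa bit set.  */
int f32_bits_is_quiet_nan(uint32_t f)
{
    return (f & 0x7fc00000u) == 0x7fc00000u;
}
```

Because the narrowing path returns one fixed quiet pattern for every NaN
input, a signaling NaN can never leak through it, which matches the test
results reported above.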