Re: [PATCH 3/4] libstdc++: avoid character accumulation in istreambuf_iterator

2017-11-16 Thread Paolo Carlini

Hi,

On 16/11/2017 06:31, Petr Ovtchenkov wrote:
> Do we really worry about a frozen sizeof of an instantiated template?
Yes we do. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html 
under "Prohibited Changes", point 8.


Of course removing the buffering has performance implications too -
that's why it's there in the first place! - which I remember we
investigated a bit again in the past, when somebody reported that a few
implementations had it and others did not. But I can't say I have followed
all the (recently uncovered) conformance implications; it could well be
that we cannot be 100% conforming to the letter of the current standard
while taking advantage of a buffering mechanism. Jonathan will provide
feedback.
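
To make the ABI point concrete, here is a minimal sketch. The two class templates are hypothetical stand-ins, not the libstdc++ definitions: the only thing they are meant to show is that a one-character cache is a data member, so dropping it changes the sizeof of every instantiation.

```cpp
#include <cassert>
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical layouts for illustration only; these are NOT the
// libstdc++ definitions of std::istreambuf_iterator.
template <typename CharT, typename Traits = std::char_traits<CharT>>
struct buffered_iter
{
  std::basic_streambuf<CharT, Traits>* sbuf;  // stream being read
  typename Traits::int_type cached;           // buffered character
};

template <typename CharT, typename Traits = std::char_traits<CharT>>
struct unbuffered_iter
{
  std::basic_streambuf<CharT, Traits>* sbuf;  // stream being read
};

// Dropping the cached member shrinks the object, i.e. the size of an
// already-instantiated template changes.
static_assert (sizeof (buffered_iter<char>) > sizeof (unbuffered_iter<char>),
               "removing a data member changes sizeof");

// Number of bytes the hypothetical cache adds for CharT = char.
inline std::size_t cache_overhead ()
{
  return sizeof (buffered_iter<char>) - sizeof (unbuffered_iter<char>);
}
```

Code compiled against one layout and linked with code compiled against the other would disagree about the object size, which is why the size has to stay frozen.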


Paolo.


[Ada] Spurious ineffective use_clause warning on use in boolean condition

2017-11-16 Thread Pierre-Marie de Rodat
This patch prevents spurious ineffective use_clause warnings in certain cases
due to the possible rewriting of nodes within boolean expressions.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Justin Squirek  

* sem.adb (Analyze): Remove requirement that the original node of N be
an operator in the case that analysis on the node yields the relevant
operator - so prefer it instead.
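
The selection logic can be sketched outside of GNAT as follows. This is a C++ illustration under stated assumptions: the Node struct and checkable_operator are hypothetical and only model the predicates used by the Ada code (Nkind in N_Op, Present (Entity), Is_Overloaded, Original_Node).

```cpp
#include <cassert>

// Hypothetical, much-simplified stand-in for a GNAT tree node.
struct Node
{
  bool is_op;            // models "Nkind (N) in N_Op"
  bool has_entity;       // models "Present (Entity (N))"
  bool is_overloaded;    // models "Is_Overloaded (N)"
  const Node* original;  // models "Original_Node (N)"; may be null here
};

// Pick the node whose use clauses should be marked: prefer the original
// node when it is an operator, otherwise fall back to the analyzed node.
// Returns nullptr when there is no checkable operator.
inline const Node* checkable_operator (const Node& n)
{
  const Node* operat = nullptr;
  if (n.original && n.original->is_op)
    operat = n.original;
  else if (n.is_op)
    operat = &n;

  if (operat && operat->has_entity && !operat->is_overloaded)
    return operat;
  return nullptr;
}
```

The second branch is the new case: the original node is not an operator, but analysis rewrote it into one, so the analyzed node is the one to mark.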

Index: sem.adb
===
--- sem.adb (revision 254797)
+++ sem.adb (working copy)
@@ -740,18 +740,33 @@
 
   Debug_A_Exit ("analyzing  ", N, "  (done)");
 
-  --  Mark relevant use-type and use-package clauses as effective using the
-  --  original node, because constant folding may have occurred and removed
-  --  references that need to be examined. If the node in question is
-  --  overloaded then this is deferred until resolution.
+  --  Mark relevant use-type and use-package clauses as effective
+  --  preferring the original node over the analyzed one in the case that
+  --  constant folding has occurred and removed references that need to be
+  --  examined. Also, if the node in question is overloaded then this is
+  --  deferred until resolution.
 
-  if Nkind (Original_Node (N)) in N_Op
-and then Present (Entity (Original_Node (N)))
-and then not Is_Overloaded (Original_Node (N))
-  then
- Mark_Use_Clauses (Original_Node (N));
-  end if;
+  declare
+ Operat : Node_Id := Empty;
+  begin
+ --  Attempt to obtain a checkable operator node
 
+ if Nkind (Original_Node (N)) in N_Op then
+Operat := Original_Node (N);
+ elsif Nkind (N) in N_Op then
+Operat := N;
+ end if;
+
+ --  Mark the operator
+
+ if Present (Operat)
+   and then Present (Entity (Operat))
+   and then not Is_Overloaded (Operat)
+ then
+Mark_Use_Clauses (Operat);
+ end if;
+  end;
+
   --  Now that we have analyzed the node, we call the expander to perform
   --  possible expansion. We skip this for subexpressions, because we don't
   --  have the type yet, and the expander will need to know the type before


[Ada] Compiler abort or infinite loop in malformed declaration

2017-11-16 Thread Pierre-Marie de Rodat
This patch fixes a compiler abort (or an infinite loop in a compiler whose
internal assertions are not enabled) when the subtype indication in an object
declaration does not denote a type. Syntactically this may happen when the
subtype indication is missing altogether and the expression in the declaration
is analyzed as if it were one.

Compiling aida-long_float_random.adb must yield:

   aida-long_float_random.adb:51:23: subtype mark required in this context
   aida-long_float_random.adb:53:07: assignment to "in" mode parameter
  not allowed
   aida-long_float_random.adb:59:07: "Temp" is undefined
  (more references follow)
   aida-long_float_random.adb:66:07: "Ni" is undefined (more references follow)
   aida-long_float_random.adb:71:07: "Nj" is undefined (more references follow)
   aida-long_float_random.adb:76:30: no selector "Cd" for type "T" defined
  at aida-long_float_random.ads:57
   aida-long_float_random.adb:77:10: "C" is undefined (more references follow)
   aida-long_float_random.ads:27:06: warning: postcondition does not mention
  function result

---
package body Aida.Long_Float_Random with SPARK_Mode is

   function Make (New_I : Seed_Range_1 := Default_I;
  New_J : Seed_Range_1 := Default_J;
  New_K : Seed_Range_1 := Default_K;
  New_L : Seed_Range_2 := Default_L
 ) return T
   is
  S : Real;
  R : Real;
  M : Range_1;
   begin -- Set_Seed
  return This : T do
 This.I := New_I;
 This.J := New_J;
 This.K := New_K;
 This.L := New_L;
 This.Ni := Range_3'Last;
 This.Nj := Range_3'Last / 3 + 1;
 This.C := Init_C;

 Fill_U : for Ii in Range_3'Range loop
S := 0.0;
R := 0.5;

pragma Loop_Invariant (S = 0.0 and R = 0.5);

Calc_S : for Jj in 1 .. 24 loop
   M := (This.J * This.I) mod M1;
   M := (M * This.K) mod M1;
   This.I := This.J;
   This.J := This.K;
   This.K := M;
   This.L := (53 * This.L + 1) mod M2;

   pragma Loop_Invariant (R <= 0.5 and R >= 0.0);

   if (This.L * M) mod 64 >= 32 then
  S  := S + R;
   end if;

   R := 0.5 * R;
end loop Calc_S;

This.U (Ii) := S;
 end loop Fill_U;
  end return;
   end Make;

   function Random (This : T) return Real is
  Temp : This.Temp.Value;
   begin
  This.Temp := (Exists => False);
  return Temp;
   end Random;

   procedure Calculate (This : in out T) is
   begin
  Temp := This.U (This.Ni) - This.U (This.Nj);
  if Temp < 0.0 then
 Temp := Temp + 1.0;
  end if;

  This.U (This.Ni) := Temp;

  Ni := Ni - 1;
  if Ni = 0 then
 Ni := Range_3'Last;
  end if;

  Nj := Nj - 1;
  if Nj = 0 then
 Nj := Range_3'Last;
  end if;

  This.C := This.C - This.Cd;
  if C < 0.0 then
 C := C + Cm;
  end if;

  Temp := Temp - C;
  if Temp < 0.0 then
 Temp := Temp + 1.0;
  end if;
   end Calculate;

end Aida.Long_Float_Random;
---
package Aida.Long_Float_Random with SPARK_Mode is

   type T (<>) is tagged limited private;

   subtype Real is Long_Float;

   M1 : constant := 179;
   M2 : constant := M1 - 10;

   subtype Seed_Range_1 is Integer range 2 .. M1 - 1;
   subtype Seed_Range_2 is Integer range 0 .. M2 - 1;

   Default_I : constant Seed_Range_1 := 12;
   Default_J : constant Seed_Range_1 := 34;
   Default_K : constant Seed_Range_1 := 56;
   Default_L : constant Seed_Range_2 := 78;

   function Make (New_I : Seed_Range_1 := Default_I;
  New_J : Seed_Range_1 := Default_J;
  New_K : Seed_Range_1 := Default_K;
  New_L : Seed_Range_2 := Default_L
 ) return T;

   function Random (This : T) return Real with
 Global => null,
 Pre=> This.Is_Calculated,
 Post   => not This.Is_Calculated;

   function Is_Calculated (This : T) return Boolean with
 Global => null;

   procedure Calculate (This : in out T) with
 Global => null,
 Post   => This.Is_Calculated;

private

   M3  : constant :=   97;
   Divisor : constant := 16777216.0;
   Init_C  : constant :=   362436.0 / Divisor;
   Cd  : constant :=  7654321.0 / Divisor;
   Cm  : constant := 16777213.0 / Divisor;

   subtype Range_1 is Integer range 0 .. M1 - 1;
   subtype Range_2 is Integer range 0 .. M2 - 1;
   subtype Range_3 is Integer range 1 .. M3;

   type U_Array_T is array (Range_3) of Real;

   type Optional_Temp_T (Exists : Boolean := False) is record
  case Exists is
 when True  => Value : Real;
 when False => null;
  end case;
   end record;

   type T is tagged limited record
  I: Range_1;
  J: Range_1;
  K: Range_1;
  Ni   : Integer;
  Nj   : Integer;
  L:

Re: [PATCH 02/14] Support for adding and stripping location_t wrapper nodes

2017-11-16 Thread Richard Biener
On Wed, Nov 15, 2017 at 4:33 PM, David Malcolm  wrote:
> On Wed, 2017-11-15 at 12:11 +0100, Richard Biener wrote:
>> On Wed, Nov 15, 2017 at 7:17 AM, Trevor Saunders wrote:
>> > On Fri, Nov 10, 2017 at 04:45:17PM -0500, David Malcolm wrote:
>> > > This patch provides a mechanism in tree.c for adding a wrapper node
>> > > for expressing a location_t, for those nodes for which
>> > > !CAN_HAVE_LOCATION_P, along with a new method of cp_expr.
>> > >
>> > > It's called in later patches in the kit via that new method.
>> > >
>> > > In this version of the patch, I use NON_LVALUE_EXPR for wrapping
>> > > constants, and VIEW_CONVERT_EXPR for other nodes.
>> > >
>> > > I also turned off wrapper nodes for EXCEPTIONAL_CLASS_P, for the
>> > > sake of keeping the patch kit more minimal.
>> > >
>> > > The patch also adds a STRIP_ANY_LOCATION_WRAPPER macro for
>> > > stripping such nodes, used later on in the patch kit.
>> >
>> > I happened to start reading this series near the end and was rather
>> > confused by this macro since it changes variables in a rather
>> > unhygienic way.  Did you consider just defining an inline function to
>> > return the actual decl?  It seems like it's not used that often, so the
>> > slight extra syntax shouldn't be that big a deal compared to the
>> > explicitness.
>>
>> Existing practice  (STRIP_NOPS & friends).  I'm fine either way,
>> the patch looks good.
>>
>> Eventually you can simplify things by doing less checking in
>> location_wrapper_p, like only checking
>>
>> +inline bool location_wrapper_p (const_tree exp)
>> +{
>> +  if ((TREE_CODE (exp) == NON_LVALUE_EXPR
>> +       || TREE_CODE (exp) == VIEW_CONVERT_EXPR)
>> +      && (TREE_TYPE (exp)
>> +	  == TREE_TYPE (TREE_OPERAND (exp, 0))))
>> +    return true;
>> +  return false;
>> +}
>>
>> and renaming to maybe_location_wrapper_p.  After all you can't really
>> distinguish location wrappers from non-location wrappers?  (and why
>> would you want to?)
>
> That's the implementation I originally tried.
>
> As noted in an earlier thread about this, the problem I ran into was
> (in g++.dg/conversion/reinterpret1.C):
>
>   // PR c++/15076
>
>   struct Y { Y(int &); };
>
>   int v;
>   Y y1(reinterpret_cast<int>(v));  // { dg-error "" }
>
> where the "reinterpret_cast<int>" has the same type as the VAR_DECL v,
> and hence the argument to y1 is a NON_LVALUE_EXPR around a VAR_DECL,
> where both have the same type, and hence location_wrapper_p () on the
> cast would return true.
>
> Compare with:
>
>   Y y1(v);
>
> where the argument "v" with a location wrapper is a VIEW_CONVERT_EXPR
> around a VAR_DECL.
>
> With the simpler conditions you suggested above, both are treated as
> location wrappers (leading to the dg-error in the test failing),
> whereas with the condition in the patch, only the latter is treated as
> a location wrapper, and an error is correctly emitted for the dg-error.
>
> Hope this sounds sane.  Maybe the function needs a more detailed
> comment explaining this?

Yes.  I guess the above would argue for a new tree code but I can
see that it is better to avoid that.
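
For readers following along, the difference between the two checks can be sketched with a toy model. Everything below is hypothetical: Tree is a miniature stand-in for GCC's tree nodes, and strict_location_wrapper_p is only an assumption built from the patch description (NON_LVALUE_EXPR wraps constants, VIEW_CONVERT_EXPR wraps other nodes), not the condition actually used in the patch.

```cpp
#include <cassert>

// Hypothetical miniature of GCC's tree nodes, for illustration only.
enum Code { VAR_DECL, INTEGER_CST, NON_LVALUE_EXPR, VIEW_CONVERT_EXPR };

struct Tree
{
  Code code;
  int type;         // stand-in for TREE_TYPE; equal ints mean same type
  const Tree* op0;  // stand-in for TREE_OPERAND (exp, 0)
};

inline bool same_type_as_operand (const Tree& exp)
{
  return exp.op0 && exp.type == exp.op0->type;
}

// The simplified check: any same-typed NON_LVALUE_EXPR or
// VIEW_CONVERT_EXPR counts as a location wrapper.
inline bool maybe_location_wrapper_p (const Tree& exp)
{
  return (exp.code == NON_LVALUE_EXPR || exp.code == VIEW_CONVERT_EXPR)
         && same_type_as_operand (exp);
}

// Assumed stricter check: NON_LVALUE_EXPR only wraps constants and
// VIEW_CONVERT_EXPR only wraps everything else.
inline bool strict_location_wrapper_p (const Tree& exp)
{
  if (!same_type_as_operand (exp))
    return false;
  const bool wraps_constant = exp.op0->code == INTEGER_CST;
  return (exp.code == NON_LVALUE_EXPR && wraps_constant)
         || (exp.code == VIEW_CONVERT_EXPR && !wraps_constant);
}
```

With a VAR_DECL operand of the same type, the simplified check accepts both the NON_LVALUE_EXPR produced by the cast and the VIEW_CONVERT_EXPR wrapper, while the stricter one accepts only the latter, which is the behaviour the testcase relies on.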

Thanks,
Richard.

> Thanks
> Dave
>
>
>> Thanks,
>> Richard.
>>
>> > Other than that the series seems reasonable, and I look forward to
>> > having wrappers in more places.  I seem to remember something I
>> > wanted to warn about that they would make much easier.
>> >
>> > Thanks
>> >
>> > Trev
>> >


Re: [PATCH] Replace has_single_use guards in store-merging

2017-11-16 Thread Christophe Lyon
Hi Jakub,

On 9 November 2017 at 13:58, Richard Biener  wrote:
> On Wed, 8 Nov 2017, Jakub Jelinek wrote:
>
>> On Wed, Nov 08, 2017 at 04:20:15PM +0100, Richard Biener wrote:
>> > Can't you simply use
>> >
>> >unsigned ret = 0;
>> >FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
>> >  if (!has_single_use (op))
>> >++ret;
>> >return ret;
>> >
>> > ?  Not sure if the bit_not_p handling is required.
>>
>> Consider the case with most possible statements:
>>   _1 = load1;
>>   _2 = load2;
>>   _3 = ~_1;
>>   _4 = ~_2;
>>   _5 = _3 ^ _4;
>>   store0 = _5;
>> All these statements construct the value for the store, and if we remove
>> the old stores and add new stores we'll need similar statements for each
>> of the new stores.  What the function is attempting to compute is how many
>> of these original statements will not be DCEd.
>> If _5 has multiple uses, then we'll need all of them, so 5 stmts not
>> being DCEd.  If _5 has single use, but _3 (and/or _4) has multiple uses,
>> we'll need the corresponding loads in addition to the BIT_NOT_EXPR
>> statement(s).  If only _1 (and/or _2) has multiple uses, we'll need
>> the load(s) but nothing else.
>> So, there isn't a single stmt I could FOR_EACH_SSA_TREE_OPERAND on.
>> For BIT_{AND,IOR,XOR}_EXPR doing it just on that stmt would be too rough
>> approximation and would miss the case when the bitwise binary op result
>> is used.
>
> Hmm, I see.
>
>> > It doesn't seem you handle multi-uses of the BIT_*_EXPR results
>> > itself?  Or does it only handle multi-uses of the BIT_*_EXPR
>> > but not the feeding loads?
>>
>> I believe I handle all those precisely above (the only reason I've talked
>> about approximation is that bit field loads/stores are counted as one stmt
>> and the masking added for handling multiple semi-adjacent bitfield
>> loads/stores aren't counted either).
>>
>> Given the above example:
>>   if (!has_single_use (gimple_assign_rhs1 (stmt)))
>>   {
>> ret += 1 + info->ops[0].bit_not_p;
>> if (info->ops[1].base_addr)
>>   ret += 1 + info->ops[1].bit_not_p;
>> return ret + 1;
>>   }
>> Above should handle the _5 multiple uses case (the first operand is
>> guaranteed by the discovery code to be a possibly negated load).
>>   stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
>>   /* stmt is now the BIT_*_EXPR.  */
>>   if (!has_single_use (gimple_assign_rhs1 (stmt)))
>>   ret += 1 + info->ops[0].bit_not_p;
>> Above should handle the _3 multiple uses.
>>   else if (info->ops[0].bit_not_p)
>>   {
>> gimple *stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
>> if (!has_single_use (gimple_assign_rhs1 (stmt2)))
>>   ++ret;
>>   }
>> Above should handle multiple uses of _1.
>>   if (info->ops[1].base_addr == NULL_TREE)
>>   return ret;
>> is an early return for the case when second argument to BIT_*_EXPR is
>> constant.
>>   if (!has_single_use (gimple_assign_rhs2 (stmt)))
>>   ret += 1 + info->ops[1].bit_not_p;
>> Above should handle the _4 multiple uses.
>>   else if (info->ops[1].bit_not_p)
>>   {
>> gimple *stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs2 (stmt));
>> if (!has_single_use (gimple_assign_rhs1 (stmt2)))
>>   ++ret;
>>   }
>> Above should handle the _2 multiple uses.
>>
>> And for another example like:
>>   _6 = load1;
>>   _7 = ~_6;
>>   store0 = _7;
>>   if (!has_single_use (gimple_assign_rhs1 (stmt)))
>>   return 1 + info->ops[0].bit_not_p;
>> Above should handle _7 multiple uses
>>   else if (info->ops[0].bit_not_p)
>>   {
>> stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
>> if (!has_single_use (gimple_assign_rhs1 (stmt)))
>>   return 1;
>>   }
>> Above should handle _6 multiple uses.
>
> Thanks for the explanation, it's now more clear what the code does.
>
> The patch is ok.  (and hopefully makes the PGO bootstrap issue latent
> again - heh)
>
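
The counting Jakub describes for the BIT_*_EXPR case can be modelled in isolation. This is only a sketch under stated assumptions: the Operand struct, its field names and count_live_stmts are hypothetical and work on plain use counts rather than on GIMPLE statements.

```cpp
#include <cassert>

// Hypothetical summary of one operand of the stored value; the field
// names are illustrative and not the ones used in store-merging.
struct Operand
{
  bool is_load;        // false when the operand is a constant
  bool bit_not_p;      // true when the load is negated before use
  unsigned load_uses;  // uses of the loaded SSA name (_1 or _2)
  unsigned not_uses;   // uses of the negated SSA name (_3 or _4)
};

// Count the statements that stay live for one operand of the BIT_*_EXPR.
inline unsigned live_stmts_for (const Operand& op)
{
  // The name feeding the binary op is the negation when bit_not_p is
  // set, otherwise the load itself.
  const unsigned feeding_uses = op.bit_not_p ? op.not_uses : op.load_uses;
  if (feeding_uses > 1)
    return 1u + (op.bit_not_p ? 1u : 0u);  // load, plus BIT_NOT_EXPR if any
  if (op.bit_not_p && op.load_uses > 1)
    return 1u;                             // only the load survives
  return 0u;
}

// Count the original statements that would not be removed by DCE.
inline unsigned count_live_stmts (unsigned result_uses,
                                  const Operand& op0, const Operand& op1)
{
  if (result_uses > 1)
    {
      // Everything feeding the result stays.
      unsigned ret = 1u + (op0.bit_not_p ? 1u : 0u);
      if (op1.is_load)
        ret += 1u + (op1.bit_not_p ? 1u : 0u);
      return ret + 1u;                     // plus the BIT_*_EXPR itself
    }
  unsigned ret = live_stmts_for (op0);
  if (op1.is_load)
    ret += live_stmts_for (op1);
  return ret;
}
```

For the six-statement example, this gives 5 when _5 has several uses, 2 when only _3 does, and 1 when only _1 does, matching the cases walked through above.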


I've noticed that this patch (r254579) introduces an ICE on aarch64:
gcc.target/aarch64/vect-compile.c (internal compiler error)
gcc.target/aarch64/vect.c (internal compiler error)


during GIMPLE pass: store-merging
In file included from /gcc/testsuite/gcc.target/aarch64/vect.c:5:0:
/gcc/testsuite/gcc.target/aarch64/vect.x: In function 'test_orn':
/gcc/testsuite/gcc.target/aarch64/vect.x:7:6: internal compiler error:
tree check: expected ssa_name, have mem_ref in has_single_use, at
ssa-iterators.h:400
0x547c93 tree_check_failed(tree_node const*, char const*, int, char const*, ...)
/gcc/tree.c:9098
0x120a79d tree_check
/gcc/tree.h:3344
0x120a79d has_single_use
/gcc/ssa-iterators.h:400
0x120bb67 count_multiple_uses
/gcc/gimple-ssa-store-merging.c:1413
0x120bd81 split_group
/gcc/gimple-ssa-store-merging.c:1490
0x120c890 output_merged_store
/gcc/gimple-ssa-store-merging.c:1699
0x120f07f output_merged_stores

Re: [PATCH] Replace has_single_use guards in store-merging

2017-11-16 Thread Jakub Jelinek
On Thu, Nov 16, 2017 at 11:00:01AM +0100, Christophe Lyon wrote:
> I've noticed that this patch (r254579) introduces an ICE on aarch64:
> gcc.target/aarch64/vect-compile.c (internal compiler error)
> gcc.target/aarch64/vect.c (internal compiler error)

That should have been fixed in r254628.

Jakub


Re: [PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-11-16 Thread Richard Biener
On Thu, Nov 16, 2017 at 4:08 AM, Martin Sebor  wrote:
> On 11/15/2017 03:51 AM, Richard Biener wrote:
>>
>> On Tue, Nov 14, 2017 at 6:45 PM, Martin Sebor  wrote:
>>>
>>> On 11/14/2017 05:28 AM, Richard Biener wrote:


 On Mon, Nov 13, 2017 at 6:37 PM, Martin Sebor  wrote:
>
>
> Richard, this thread may have been conflated with the one Re:
> [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets
> (PR 82455) They are about different things.
>
> I'm still looking for approval of:
>
>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01208.html
>>>
>>>
>>>
>>> Sorry, I pointed to an outdated version.  This is the latest
>>> version:
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01304.html
>>>
>>> My bad...
>>>
>>>

 +  tree maxbound
 +    = build_int_cst (sizetype, ~(1LLU << (TYPE_PRECISION (sizetype) - 1)));

 this looks possibly bogus.  Can you instead use

   up_bound_p1
     = wide_int_to_tree (sizetype, wi::div_trunc (wi::max_value
         (TYPE_PRECISION (sizetype), SIGNED), wi::to_wide (eltsize)));

 please?  Note you are _not_ computing the proper upper bound here because
 that is what you compute plus low_bound.

 +  up_bound_p1 = int_const_binop (TRUNC_DIV_EXPR, maxbound, eltsize);

 +
 +  tree arg = TREE_OPERAND (ref, 0);
 +  tree_code code = TREE_CODE (arg);
 +  if (code == COMPONENT_REF)
 + {
 +  HOST_WIDE_INT off;
 +  if (tree base = get_addr_base_and_unit_offset (ref, &off))
 +{
 +  tree size = TYPE_SIZE_UNIT (TREE_TYPE (base));
 +  if (TREE_CODE (size) == INTEGER_CST)
 + up_bound_p1 = int_const_binop (MINUS_EXPR, up_bound_p1, size);

 I think I asked this multiple times now but given 'ref' is the variable
 array-ref a.b.c[i], when you call get_addr_base_and_unit_offset (ref, &off)
 you always get a NULL_TREE return value.

 So I asked you to pass it 'arg' instead ... which gets you the offset of
 a.b.c, which looks like what you intended to get anyway.

 I also wonder what you compute here - you are looking at the size of 'base'
 but that is the size of 'a'.  You don't even use the computed offset!  Which
 means you could have used get_base_address instead!?  Also the type
 of 'base' may be completely off given MEM[&blk + 8].b.c[i] would return blk
 as base which might be an array of chars and not in any way related to
 the type of the innermost structure we access with COMPONENT_REFs.

 Why are you only looking at COMPONENT_REF args anyways?  You
 don't want to handle a.b[3][i]?

 That is, I'd have expected you do

    if (get_addr_base_and_unit_offset (ref, &off))
      up_bound_p1 = wide_int_to_tree (sizetype,
                                      wi::sub (wi::to_wide (up_bound_p1), off));
>>
>>
>> ^
>
>
> Please see the attached update.

Ok.
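
As a side note on the arithmetic discussed in the quoted review: the suggested bound is the largest signed value of sizetype's precision, divided (truncating) by the element size. A minimal sketch on plain 64-bit host integers follows; signed_max and max_elements are hypothetical helper names, not GCC functions, and the array's low bound is not folded in.

```cpp
#include <cassert>
#include <cstdint>

// Largest signed value representable in `precision` bits
// (1 <= precision <= 64), held in an unsigned 64-bit host integer.
inline std::uint64_t signed_max (unsigned precision)
{
  return (std::uint64_t{1} << (precision - 1)) - 1;
}

// Truncating division of that maximum by the element size, mirroring
// the shape of the suggested up_bound_p1 computation.  The array's low
// bound is deliberately left out.
inline std::uint64_t max_elements (unsigned precision, std::uint64_t eltsize)
{
  return signed_max (precision) / eltsize;
}
```

For a 32-bit sizetype and 4-byte elements this yields 0x1fffffff.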

Thanks,
Richard.

> Martin


[Ada] Disallow renamings declaring tagged primitives

2017-11-16 Thread Pierre-Marie de Rodat
This patch implements the following SPARK rules from SPARK RM 6.1.1(3):

   A subprogram_renaming_declaration shall not declare a primitive operation of
   a tagged type.


-- Source --


--  renamings.ads

package Renamings with SPARK_Mode is
   type T is tagged null record;

   procedure Null_Proc (Obj : in out T) is null;

   procedure Proc_1 (Obj : in out T);
   procedure Proc_2 (Obj : in out T);

   function Func_1 (Obj : T) return Integer;
   function Func_2 (Obj : T) return Integer;

   function Func_3 return T;
   function Func_4 return T;

   procedure Error_1 (Obj : in out T) renames Null_Proc;       --  Error
   procedure Error_2 (Obj : in out T) renames Proc_1;          --  Error
   function  Error_3 (Obj : T) return Integer renames Func_1;  --  Error
   function  Error_4 return T renames Func_3;                  --  Error

   package Nested is
      procedure OK_1 (Obj : in out T) renames Null_Proc;       --  OK
      procedure OK_2 (Obj : in out T) renames Proc_1;          --  OK
      function  OK_3 (Obj : T) return Integer renames Func_1;  --  OK
      function  OK_4 return T renames Func_3;                  --  OK
   end Nested;
end Renamings;

--  renamings.adb

package body Renamings with SPARK_Mode is
   procedure Proc_1 (Obj : in out T) is begin null; end Proc_1;

   procedure Proc_2 (Obj : in out T) renames Proc_1; --  OK

   function Func_1 (Obj : T) return Integer is
   begin
  return 0;
   end Func_1;

   function Func_2 (Obj : T) return Integer renames Func_1;  --  OK

   function Func_3 return T is
  Result : T;
   begin
  return Result;
   end Func_3;

   function Func_4 return T renames Func_3;  --  OK
end Renamings;


-- Compilation and output --


$ gcc -c renamings.adb
renamings.ads:15:39: subprogram renaming "Error_1" cannot declare primitive of
  type "T" (SPARK RM 6.1.1(3))
renamings.ads:16:39: subprogram renaming "Error_2" cannot declare primitive of
  type "T" (SPARK RM 6.1.1(3))
renamings.ads:17:47: subprogram renaming "Error_3" cannot declare primitive of
  type "T" (SPARK RM 6.1.1(3))
renamings.ads:18:31: subprogram renaming "Error_4" cannot declare primitive of
  type "T" (SPARK RM 6.1.1(3))

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Hristian Kirtchev  

* sem_ch8.adb (Analyze_Subprogram_Renaming): Ensure that a renaming
declaration does not define a primitive operation of a tagged type for
SPARK.
(Check_SPARK_Primitive_Operation): New routine.

Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 254797)
+++ sem_ch8.adb (working copy)
@@ -59,6 +59,7 @@
 with Sem_Dist; use Sem_Dist;
 with Sem_Elab; use Sem_Elab;
 with Sem_Eval; use Sem_Eval;
+with Sem_Prag; use Sem_Prag;
 with Sem_Res;  use Sem_Res;
 with Sem_Util; use Sem_Util;
 with Sem_Type; use Sem_Type;
@@ -1924,6 +1925,10 @@
   --have one. Otherwise the subtype of Sub's return profile must
   --exclude null.
 
+  procedure Check_SPARK_Primitive_Operation (Subp_Id : Entity_Id);
+  --  Ensure that a SPARK renaming denoted by its entity Subp_Id does not
+  --  declare a primitive operation of a tagged type (SPARK RM 6.1.1(3)).
+
   procedure Freeze_Actual_Profile;
   --  In Ada 2012, enforce the freezing rule concerning formal incomplete
   --  types: a callable entity freezes its profile, unless it has an
@@ -2519,6 +2524,52 @@
  end if;
   end Check_Null_Exclusion;
 
+  -------------------------------------
+  -- Check_SPARK_Primitive_Operation --
+  -------------------------------------
+
+  procedure Check_SPARK_Primitive_Operation (Subp_Id : Entity_Id) is
+ Prag : constant Node_Id := SPARK_Pragma (Subp_Id);
+ Typ  : Entity_Id;
+
+  begin
+ --  Nothing to do when the subprogram appears within an instance
+
+ if In_Instance then
+return;
+
+ --  Nothing to do when the subprogram is not subject to SPARK_Mode On
+ --  because this check applies to SPARK code only.
+
+ elsif not (Present (Prag)
+ and then Get_SPARK_Mode_From_Annotation (Prag) = On)
+ then
+return;
+
+ --  Nothing to do when the subprogram is not a primitive operation
+
+ elsif not Is_Primitive (Subp_Id) then
+return;
+ end if;
+
+ Typ := Find_Dispatching_Type (Subp_Id);
+
+ --  Nothing to do when the subprogram is a primitive operation of an
+ --  untagged type.
+
+ if No (Typ) then
+return;
+ end if;
+
+ --  At this point a renaming declaration introduces a new primitive
+ --  operation for a tagged type.
+
+ Error_Msg_Node_2 := Typ;
+ Error_Msg_NE
+  

[Ada] Crash on early call region of SPARK subprogram body

2017-11-16 Thread Pierre-Marie de Rodat
This patch accounts for the case where the early call region of a subprogram
body declared in a package body spans into the empty corresponding spec due to
pragma Elaborate_Body.


-- Source --


--  gnat.adc

pragma SPARK_Mode (On);

--  pack.ads

package Pack with Elaborate_Body is
end Pack;

--  pack.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Pack is
   procedure Proc;

   procedure Elaborator is
   begin
  Proc;
   end Elaborator;

   procedure Proc is
   begin
  Put_Line ("Proc");
   end Proc;

begin
   Elaborator;
end Pack;

-----------------
-- Compilation --
-----------------

$ gcc -c pack.adb

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Hristian Kirtchev  

* sem_elab.adb (Include): Including a node which is also a compilation
unit terminates the search because there are no more lists to examine.

Index: sem_elab.adb
===
--- sem_elab.adb (revision 254803)
+++ sem_elab.adb (working copy)
@@ -4245,7 +4245,7 @@
   procedure Include (N : Node_Id; Curr : in out Node_Id);
   pragma Inline (Include);
   --  Update the Curr and Start pointers to include arbitrary construct N
-  --  in the early call region.
+  --  in the early call region. This routine raises ECR_Found.
 
   function Is_OK_Preelaborable_Construct (N : Node_Id) return Boolean;
   pragma Inline (Is_OK_Preelaborable_Construct);
@@ -4559,7 +4559,24 @@
   procedure Include (N : Node_Id; Curr : in out Node_Id) is
   begin
  Start := N;
- Curr  := Prev (Start);
+
+ --  The input node is a compilation unit. This terminates the search
+ --  because there are no more lists to inspect and there are no more
+ --  enclosing constructs to climb up to. The transitions are:
+ --
+ --private declarations -> terminate
+ --visible declarations -> terminate
+ --statements   -> terminate
+ --declarations -> terminate
+
+ if Nkind (Parent (Start)) = N_Compilation_Unit then
+raise ECR_Found;
+
+ --  Otherwise the input node is still within some list
+
+ else
+Curr := Prev (Start);
+ end if;
   end Include;
 
   ---


Re: [PATCH] Canonicalize constant multiplies in division

2017-11-16 Thread Richard Biener
On Wed, Nov 15, 2017 at 3:39 PM, Wilco Dijkstra  wrote:
> Richard Biener wrote:
>> On Tue, Oct 17, 2017 at 6:32 PM, Wilco Dijkstra wrote:
>
>>>  (if (flag_reciprocal_math)
>>> - /* Convert (A/B)/C to A/(B*C)  */
>>> + /* Convert (A/B)/C to A/(B*C). */
>>>   (simplify
>>>(rdiv (rdiv:s @0 @1) @2)
>>> -   (rdiv @0 (mult @1 @2)))
>>> +  (rdiv @0 (mult @1 @2)))
>>> +
>>> + /* Canonicalize x / (C1 * y) to (x * C2) / y.  */
>>> + (if (optimize)
>>
>> why if (optimize) here?  The pattern you removed has no
>> such check.  As discussed this may undo CSE of C1 * y
>> so please check for a single-use on the mult with :s
>
> I think that came from an earlier version of this patch. I've removed it
> and added a single use check.
>
>>> +  (simplify
>>> +   (rdiv @0 (mult @1 REAL_CST@2))
>>> +   (if (!real_zerop (@1))
>>
>> why this check?  The pattern below didn't have it.
>
> Presumably to avoid the change when dividing by zero. I've removed it, here is
> the updated version. This passes bootstrap and regress:

Ok.

Richard.

>
> ChangeLog
> 2017-11-15  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Canonicalize constant multiplies in division.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/cse_recip.c: New test.
> --
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index b5042b783c0830a2da08c44bed39842a17911844..ea7d90ed977cfff991d74bee54e91ecb209b6030 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -344,10 +344,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(negate @0)))
>
>  (if (flag_reciprocal_math)
> - /* Convert (A/B)/C to A/(B*C)  */
> + /* Convert (A/B)/C to A/(B*C). */
>   (simplify
>(rdiv (rdiv:s @0 @1) @2)
> -   (rdiv @0 (mult @1 @2)))
> +  (rdiv @0 (mult @1 @2)))
> +
> + /* Canonicalize x / (C1 * y) to (x * C2) / y.  */
> + (simplify
> +  (rdiv @0 (mult:s @1 REAL_CST@2))
> +  (with
> +   { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @2); }
> +   (if (tem)
> +(rdiv (mult @0 { tem; } ) @1))))
>
>   /* Convert A/(B/C) to (A/B)*C  */
>   (simplify
> @@ -646,15 +654,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (tem)
>   (rdiv { tem; } @1)
>
> -/* Convert C1/(X*C2) into (C1/C2)/X  */
> -(simplify
> - (rdiv REAL_CST@0 (mult @1 REAL_CST@2))
> -  (if (flag_reciprocal_math)
> -   (with
> -{ tree tem = const_binop (RDIV_EXPR, type, @0, @2); }
> -(if (tem)
> - (rdiv { tem; } @1)))))
> -
>  /* Simplify ~X & X as zero.  */
>  (simplify
>   (bit_and:c (convert? @0) (convert? (bit_not @0)))
> diff --git a/gcc/testsuite/gcc.dg/cse_recip.c b/gcc/testsuite/gcc.dg/cse_recip.c
> new file mode 100644
> index ..88cba9930c0eb1fdee22a797eff110cd9a14fcda
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/cse_recip.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -fdump-tree-optimized-raw" } */
> +
> +void
> +cse_recip (float x, float y, float *a)
> +{
> +  a[0] = y / (5 * x);
> +  a[1] = y / (3 * x);
> +  a[2] = y / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "rdiv_expr" 1 "optimized" } } */
>
>
>
>
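
To see why the canonical form helps CSE, here is a hand-written picture of what the new test is after. It is a sketch, not compiler output: cse_recip_plain and cse_recip_canonical are hypothetical names, and the second function simply spells out by hand the x / (C1 * y) -> (x * C2) / y rewrite followed by sharing the reciprocal.

```cpp
#include <cassert>
#include <cmath>

// Straightforward form: three divisions, each by a different value.
inline void cse_recip_plain (float x, float y, float* a)
{
  a[0] = y / (5 * x);
  a[1] = y / (3 * x);
  a[2] = y / x;
}

// Canonical form written out by hand: each constant moves into a
// multiply on the numerator (C2 = 1 / C1), so every quotient divides
// by the same x and a single reciprocal can be shared.
inline void cse_recip_canonical (float x, float y, float* a)
{
  const float recip = 1.0f / x;      // the single remaining division
  a[0] = (y * (1.0f / 5)) * recip;
  a[1] = (y * (1.0f / 3)) * recip;
  a[2] = y * recip;
}
```

The two versions can differ in the last bits of the result, which is why the pattern sits under flag_reciprocal_math.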


Re: [PATCH, rs6000] Correct some Power9 scheduling info

2017-11-16 Thread Segher Boessenkool
Hi Pat,

That looks good, thanks!  Okay for trunk.


Segher


Re: [PATCH] Fix PowerPC testsuite not to look for *.c*~ files

2017-11-16 Thread Segher Boessenkool
Hi!

On Wed, Nov 15, 2017 at 07:51:24PM -0500, Michael Meissner wrote:
> I was back-porting some changes to the IBM Advance Toolchain branch, and I was
> doing this via creating a patch file, and applying the patch output.  I tend 
> to
> always use the -b option to patch to create a backup file.  I had new 
> failures,
> since the new files the bfp, dfp, and vfu sub-directories created an empty
> *.c.~1~ file, and the .exp tried to run it as a test.  Since we don't have any
> non C files in those directories, I changed the test to just *.exp.  I 
> verified
> that we get the same number of failures, successes, etc. with the patch 
> applied
> and without it being applied.  Can I check this into the trunk?
> 
> [gcc/testsuite]
> 2017-11-15  Michael Meissner  
> 
>   * gcc.target/powerpc/bfp/bfp.exp: Look for *.c files, not *.c*
>   files to prevent ~ files from getting recognized.
>   * gcc.target/powerpc/dfp/dfp.exp: Likewise.
>   * gcc.target/powerpc/vsu/vsu.exp: Likewise.

And these are the only three testsuites with this problem (in powerpc/), and
we do not have any *.c* files other than *.c .  Okay for trunk, thanks!
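
For the record, the pattern difference is easy to reproduce. The sketch below uses POSIX fnmatch as a stand-in for the Tcl glob that the .exp files actually use (an assumption on my part that the `*` semantics agree for this case); the file names are made up.

```cpp
#include <cassert>
#include <fnmatch.h>

// True when `name` matches the shell-style `pattern` (POSIX fnmatch).
inline bool glob_matches (const char* pattern, const char* name)
{
  return fnmatch (pattern, name, 0) == 0;
}
```

With this helper, "*.c*" matches a backup file such as "test.c.~1~", while "*.c" matches only "test.c".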


Segher


Re: [PATCH] Replace has_single_use guards in store-merging

2017-11-16 Thread Christophe Lyon
On 16 November 2017 at 11:04, Jakub Jelinek  wrote:
> On Thu, Nov 16, 2017 at 11:00:01AM +0100, Christophe Lyon wrote:
>> I've noticed that this patch (r254579) introduces an ICE on aarch64:
>> gcc.target/aarch64/vect-compile.c (internal compiler error)
>> gcc.target/aarch64/vect.c (internal compiler error)
>
> That should have been fixed in r254628.
>

Great, thanks. Validations are still catching up on my side; I still
have a few days of delay.

> Jakub


[Ada] Fix more precise mode for parameter

2017-11-16 Thread Pierre-Marie de Rodat
CodePeer analysis of GNAT showed that a parameter was not read and
always set on all paths, making it an out rather than an in-out.
This was not detected by the compiler, because one path ends up
raising an exception, which is not taken into account in the simpler
analysis done in GNAT.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Yannick Moy  

* sem_elab.adb (Include): Fix mode of parameter Curr to out.

Index: sem_elab.adb
===
--- sem_elab.adb (revision 254804)
+++ sem_elab.adb (working copy)
@@ -4242,7 +4242,7 @@
   --  Determine whether list List contains at least one suitable construct
   --  for inclusion into an early call region.
 
-  procedure Include (N : Node_Id; Curr : in out Node_Id);
+  procedure Include (N : Node_Id; Curr : out Node_Id);
   pragma Inline (Include);
   --  Update the Curr and Start pointers to include arbitrary construct N
   --  in the early call region. This routine raises ECR_Found.
@@ -4556,7 +4556,7 @@
   -- Include --
   -------------
 
-  procedure Include (N : Node_Id; Curr : in out Node_Id) is
+  procedure Include (N : Node_Id; Curr : out Node_Id) is
   begin
  Start := N;
 


[Ada] Disallow renamings declaring tagged primitives

2017-11-16 Thread Pierre-Marie de Rodat
This patch enables the check which ensures that a subprogram renaming does not
declare a primitive operation of a tagged type in instantiations.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Hristian Kirtchev  

* sem_ch8.adb (Check_SPARK_Primitive_Operation): Enable the check in
instantiations.

Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 254804)
+++ sem_ch8.adb (working copy)
@@ -2533,16 +2533,11 @@
  Typ  : Entity_Id;
 
   begin
- --  Nothing to do when the subprogram appears within an instance
-
- if In_Instance then
-return;
-
  --  Nothing to do when the subprogram is not subject to SPARK_Mode On
  --  because this check applies to SPARK code only.
 
- elsif not (Present (Prag)
- and then Get_SPARK_Mode_From_Annotation (Prag) = On)
+ if not (Present (Prag)
+  and then Get_SPARK_Mode_From_Annotation (Prag) = On)
  then
 return;
 


[Ada] Allow calls to Is_CCT_Instance for records

2017-11-16 Thread Pierre-Marie de Rodat
Routine Is_CCT_Instance (where CCT stands for Current Concurrent Type)
is now used in the SPARK backend for checking references to the current
type instance within default expressions of discriminants and components of
(single) concurrent units.

The same backend code was already used for similar references in records
and, to reuse it, Is_CCT_Instance can now be called for record types.

No frontend test provided, because only the SPARK backend is affected.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Piotr Trojanek  

* sem_util.ads, sem_util.adb (Is_CCT_Instance): Allow calls where
Context_Id denotes a record type.

Index: sem_util.adb
===
--- sem_util.adb(revision 254802)
+++ sem_util.adb(working copy)
@@ -12890,8 +12890,9 @@
   E_Package,
   E_Procedure,
   E_Protected_Type,
-  E_Task_Type));
-
+  E_Task_Type)
+  or else
+Is_Record_Type (Context_Id));
  return Scope_Within_Or_Same (Context_Id, Ref_Id);
   end if;
end Is_CCT_Instance;
Index: sem_util.ads
===
--- sem_util.ads(revision 254802)
+++ sem_util.ads(working copy)
@@ -1536,9 +1536,10 @@
  (Ref_Id : Entity_Id;
   Context_Id : Entity_Id) return Boolean;
--  Subsidiary to the analysis of pragmas [Refined_]Depends and [Refined_]
-   --  Global. Determine whether entity Ref_Id (which must represent either
-   --  a protected type or a task type) denotes the current instance of a
-   --  concurrent type. Context_Id denotes the associated context where the
+   --  Global; also used when analyzing default expressions of protected and
+   --  record components. Determine whether entity Ref_Id (which must represent
+   --  either a protected type or a task type) denotes the current instance of
+   --  a concurrent type. Context_Id denotes the associated context where the
--  pragma appears.
 
function Is_Child_Or_Sibling


Re: [PATCH, rs6000] correct implementation of _mm_add_pi32

2017-11-16 Thread Segher Boessenkool
Hi!

On Wed, Nov 15, 2017 at 08:58:21PM -0600, Steven Munroe wrote:
> A small thinko in the implementation of _mm_add_pi32 that only shows
> when compiling for power9.

This is okay, it is trivial and obvious.  Please commit.  Thanks,


Segher


> 2017-11-15  Steven Munroe  
> 
>   * config/rs6000/mmintrin.h (_mm_add_pi32[_ARCH_PWR]): Correct
>   parameter list for vec_splats.
> 
> Index: gcc/config/rs6000/mmintrin.h
> ===
> --- gcc/config/rs6000/mmintrin.h  (revision 254714)
> +++ gcc/config/rs6000/mmintrin.h  (working copy)
> @@ -463,8 +463,8 @@ _mm_add_pi32 (__m64 __m1, __m64 __m2)
>  #if _ARCH_PWR9
>__vector signed int a, b, c;
>  
> -  a = (__vector signed int)vec_splats (__m1, __m1);
> -  b = (__vector signed int)vec_splats (__m2, __m2);
> +  a = (__vector signed int)vec_splats (__m1);
> +  b = (__vector signed int)vec_splats (__m2);
>c = vec_add (a, b);
>return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
>  #else
> 


Cleanup profile update after loop duplication

2017-11-16 Thread Jan Hubicka
Hi,
this patch removes the use of frequencies in duplicate_loop_to_header_edge.  I have
checked that things go well with all three transformations it supports
(complete peeling, peeling and unrolling).

In simple testcases there are small mismatches afterwards, but that is because the
profile cannot be maintained precisely without adjusting probabilities of conditionals
inside the loop, so we always produce some mismatches here.

Bootstrapped/regtested x86_64-linux.

Honza

* cfgloopmanip.c (duplicate_loop_to_header_edge): Cleanup profile
manipulation.
Index: cfgloopmanip.c
===
--- cfgloopmanip.c  (revision 254767)
+++ cfgloopmanip.c  (working copy)
@@ -1096,14 +1096,16 @@ duplicate_loop_to_header_edge (struct lo
   basic_block new_bb, bb, first_active_latch = NULL;
   edge ae, latch_edge;
   edge spec_edges[2], new_spec_edges[2];
-#define SE_LATCH 0
-#define SE_ORIG 1
+  const int SE_LATCH = 0;
+  const int SE_ORIG = 1;
   unsigned i, j, n;
   int is_latch = (latch == e->src);
-  int scale_act = 0, *scale_step = NULL, scale_main = 0;
-  int scale_after_exit = 0;
-  int p, freq_in, freq_le, freq_out_orig;
-  int prob_pass_thru, prob_pass_wont_exit, prob_pass_main;
+  profile_probability *scale_step = NULL;
+  profile_probability scale_main = profile_probability::always ();
+  profile_probability scale_act = profile_probability::always ();
+  profile_count after_exit_num = profile_count::zero (),
+   after_exit_den = profile_count::zero ();
+  bool scale_after_exit = false;
   int add_irreducible_flag;
   basic_block place_after;
   bitmap bbs_to_scale = NULL;
@@ -1142,33 +1144,26 @@ duplicate_loop_to_header_edge (struct lo
 
   if (flags & DLTHE_FLAG_UPDATE_FREQ)
 {
-  /* Calculate coefficients by that we have to scale frequencies
+  /* Calculate coefficients by that we have to scale counts
 of duplicated loop bodies.  */
-  freq_in = header->count.to_frequency (cfun);
-  freq_le = EDGE_FREQUENCY (latch_edge);
-  if (freq_in == 0)
-   freq_in = 1;
-  if (freq_in < freq_le)
-   freq_in = freq_le;
-  freq_out_orig = orig ? EDGE_FREQUENCY (orig) : freq_in - freq_le;
-  if (freq_out_orig > freq_in - freq_le)
-   freq_out_orig = freq_in - freq_le;
-  prob_pass_thru = RDIV (REG_BR_PROB_BASE * freq_le, freq_in);
-  prob_pass_wont_exit =
- RDIV (REG_BR_PROB_BASE * (freq_le + freq_out_orig), freq_in);
+  profile_count count_in = header->count;
+  profile_count count_le = latch_edge->count ();
+  profile_count count_out_orig = orig ? orig->count () : count_in - count_le;
+  profile_probability prob_pass_thru = count_le.probability_in (count_in);
+  profile_probability prob_pass_wont_exit =
+ (count_le + count_out_orig).probability_in (count_in);
 
   if (orig && orig->probability.initialized_p ()
  && !(orig->probability == profile_probability::always ()))
{
  /* The blocks that are dominated by a removed exit edge ORIG have
 frequencies scaled by this.  */
- if (orig->probability.initialized_p ())
-   scale_after_exit
-= GCOV_COMPUTE_SCALE (REG_BR_PROB_BASE,
-  REG_BR_PROB_BASE
- - orig->probability.to_reg_br_prob_base ());
- else
-   scale_after_exit = REG_BR_PROB_BASE;
+ if (orig->count ().initialized_p ())
+   {
+ after_exit_num = orig->src->count;
+ after_exit_den = after_exit_num - orig->count ();
+ scale_after_exit = true;
+   }
  bbs_to_scale = BITMAP_ALLOC (NULL);
  for (i = 0; i < n; i++)
{
@@ -1178,7 +1173,7 @@ duplicate_loop_to_header_edge (struct lo
}
}
 
-  scale_step = XNEWVEC (int, ndupl);
+  scale_step = XNEWVEC (profile_probability, ndupl);
 
   for (i = 1; i <= ndupl; i++)
scale_step[i - 1] = bitmap_bit_p (wont_exit, i)
@@ -1189,52 +1184,48 @@ duplicate_loop_to_header_edge (struct lo
 copy becomes 1.  */
   if (flags & DLTHE_FLAG_COMPLETTE_PEEL)
{
- int wanted_freq = EDGE_FREQUENCY (e);
-
- if (wanted_freq > freq_in)
-   wanted_freq = freq_in;
+ profile_count wanted_count = e->count ();
 
  gcc_assert (!is_latch);
- /* First copy has frequency of incoming edge.  Each subsequent
-frequency should be reduced by prob_pass_wont_exit.  Caller
+ /* First copy has count of incoming edge.  Each subsequent
+count should be reduced by prob_pass_wont_exit.  Caller
 should've managed the flags so all except for original loop
 has won't exist set.  */
- scale_act = GCOV_COMPUTE_SCALE (wanted_freq, freq_in);
+ scale_act = wanted_count.probability_in (count_in);
  /* Now simulate the dup

cleanup RTL loop insns accounting

2017-11-16 Thread Jan Hubicka
Hi,
this patch removes frequencies from RTL loop accounting.

Honza

* cfgloopanal.c: Include sreal.h
(average_num_loop_insns): Use counts and sreal for accounting.
Index: cfgloopanal.c
===
--- cfgloopanal.c   (revision 254767)
+++ cfgloopanal.c   (working copy)
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.
 #include "expr.h"
 #include "graphds.h"
 #include "params.h"
+#include "sreal.h"
 
 struct target_cfgloop default_target_cfgloop;
 #if SWITCHABLE_TARGET
@@ -199,7 +200,8 @@ int
 average_num_loop_insns (const struct loop *loop)
 {
   basic_block *bbs, bb;
-  unsigned i, binsns, ninsns, ratio;
+  unsigned i, binsns;
+  sreal ninsns;
   rtx_insn *insn;
 
   ninsns = 0;
@@ -213,19 +215,18 @@ average_num_loop_insns (const struct loo
if (NONDEBUG_INSN_P (insn))
  binsns++;
 
-  ratio = loop->header->count.to_frequency (cfun) == 0
- ? BB_FREQ_MAX
- : (bb->count.to_frequency (cfun) * BB_FREQ_MAX)
-/ loop->header->count.to_frequency (cfun);
-  ninsns += binsns * ratio;
+  ninsns += (sreal)binsns * bb->count.to_sreal_scale (loop->header->count);
+  /* Avoid overflows.   */
+  if (ninsns > 100)
+   return 10;
 }
   free (bbs);
 
-  ninsns /= BB_FREQ_MAX;
-  if (!ninsns)
-ninsns = 1; /* To avoid division by zero.  */
+  int64_t ret = ninsns.to_int ();
+  if (!ret)
+ret = 1; /* To avoid division by zero.  */
 
-  return ninsns;
+  return ret;
 }
 
 /* Returns expected number of iterations of LOOP, according to
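
As a rough illustration of the accounting the patch moves to, here is a
minimal standalone sketch. It is not GCC code: block_info and average_insns
are invented names, and plain double stands in for sreal and profile_count.
Each block's insn count is weighted by how often the block runs relative to
the loop header, and the result is never zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a basic block: its non-debug insn count and
// its execution count taken from the profile.
struct block_info { unsigned insns; double count; };

// Average number of insns executed per iteration: each block's insns are
// weighted by its count relative to the loop header's count.
std::int64_t average_insns (const std::vector<block_info> &blocks,
                            double header_count)
{
  assert (header_count > 0);
  double total = 0;
  for (const block_info &b : blocks)
    total += b.insns * (b.count / header_count);
  std::int64_t ret = static_cast<std::int64_t> (total);
  return ret ? ret : 1;   // never return zero, mirroring the patch
}
```

A block that runs on every iteration contributes all of its insns; one that
runs on half of the iterations contributes half.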


cleanup profile use in final.c

2017-11-16 Thread Jan Hubicka
Hi
this patch turns final to use counts rather than frequencies when deciding
on alignments.

Honza

* final.c (compute_alignments): Use counts rather than frequencies.
Index: final.c
===
--- final.c (revision 254767)
+++ final.c (working copy)
@@ -661,16 +661,13 @@ insn_current_reference_address (rtx_insn
 }
 }
 
-/* Compute branch alignments based on frequency information in the
-   CFG.  */
+/* Compute branch alignments based on CFG profile.  */
 
 unsigned int
 compute_alignments (void)
 {
   int log, max_skip, max_log;
   basic_block bb;
-  int freq_max = 0;
-  int freq_threshold = 0;
 
   if (label_align)
 {
@@ -693,17 +690,19 @@ compute_alignments (void)
   flow_loops_dump (dump_file, NULL, 1);
 }
   loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
-  FOR_EACH_BB_FN (bb, cfun)
-if (bb->count.to_frequency (cfun) > freq_max)
-  freq_max = bb->count.to_frequency (cfun);
-  freq_threshold = freq_max / PARAM_VALUE (PARAM_ALIGN_THRESHOLD);
+  profile_count count_threshold = cfun->cfg->count_max.apply_scale
+(1, PARAM_VALUE (PARAM_ALIGN_THRESHOLD));
 
   if (dump_file)
-fprintf (dump_file, "freq_max: %i\n",freq_max);
+{
+  fprintf (dump_file, "count_max: ");
+  cfun->cfg->count_max.dump (dump_file);
+  fprintf (dump_file, "\n");
+}
   FOR_EACH_BB_FN (bb, cfun)
 {
   rtx_insn *label = BB_HEAD (bb);
-  int fallthru_frequency = 0, branch_frequency = 0, has_fallthru = 0;
+  bool has_fallthru = 0;
   edge e;
   edge_iterator ei;
 
@@ -712,35 +711,41 @@ compute_alignments (void)
{
  if (dump_file)
fprintf (dump_file,
-"BB %4i freq %4i loop %2i loop_depth %2i skipped.\n",
-bb->index, bb->count.to_frequency (cfun),
+"BB %4i loop %2i loop_depth %2i skipped.\n",
+bb->index,
 bb->loop_father->num,
 bb_loop_depth (bb));
  continue;
}
   max_log = LABEL_ALIGN (label);
   max_skip = targetm.asm_out.label_align_max_skip (label);
+  profile_count fallthru_count = profile_count::zero ();
+  profile_count branch_count = profile_count::zero ();
 
   FOR_EACH_EDGE (e, ei, bb->preds)
{
  if (e->flags & EDGE_FALLTHRU)
-   has_fallthru = 1, fallthru_frequency += EDGE_FREQUENCY (e);
+   has_fallthru = 1, fallthru_count += e->count ();
  else
-   branch_frequency += EDGE_FREQUENCY (e);
+   branch_count += e->count ();
}
   if (dump_file)
{
- fprintf (dump_file, "BB %4i freq %4i loop %2i loop_depth"
-  " %2i fall %4i branch %4i",
-  bb->index, bb->count.to_frequency (cfun), bb->loop_father->num,
-  bb_loop_depth (bb),
-  fallthru_frequency, branch_frequency);
+ fprintf (dump_file, "BB %4i loop %2i loop_depth"
+  " %2i fall ",
+  bb->index, bb->loop_father->num,
+  bb_loop_depth (bb));
+ fallthru_count.dump (dump_file);
+ fprintf (dump_file, " branch ");
+ branch_count.dump (dump_file);
  if (!bb->loop_father->inner && bb->loop_father->num)
fprintf (dump_file, " inner_loop");
  if (bb->loop_father->header == bb)
fprintf (dump_file, " loop_header");
  fprintf (dump_file, "\n");
}
+  if (!fallthru_count.initialized_p () || !branch_count.initialized_p ())
+   continue;
 
   /* There are two purposes to align block with no fallthru incoming edge:
 1) to avoid fetch stalls when branch destination is near cache boundary
@@ -753,11 +758,11 @@ compute_alignments (void)
 when function is called.  */
 
   if (!has_fallthru
- && (branch_frequency > freq_threshold
- || (bb->count.to_frequency (cfun) 
-   > bb->prev_bb->count.to_frequency (cfun) * 10
- && (bb->prev_bb->count.to_frequency (cfun)
- <= ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.to_frequency (cfun) / 2
+ && (branch_count > count_threshold
+ || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
+ && (bb->prev_bb->count 
+ <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
+  ->count.apply_scale (1, 2)
{
  log = JUMP_ALIGN (label);
  if (dump_file)
@@ -774,9 +779,10 @@ compute_alignments (void)
  && !(single_succ_p (bb)
   && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
  && optimize_bb_for_speed_p (bb)
- && branch_frequency + fallthru_frequency > freq_threshold
- && (branch_frequency
- > fallthru_frequency * PARAM_VALUE (PARAM_ALIGN_LOOP_ITERATIONS)))
+ && branch_count + fallthru_count > count_thre

Re: [PATCH #2], make Float128 built-in functions work with -mabi=ieeelongdouble

2017-11-16 Thread Segher Boessenkool
On Wed, Nov 15, 2017 at 04:56:10PM -0500, Michael Meissner wrote:
> David tells me that the patch to enable float128 built-in functions to work
> with the -mabi=ieeelongdouble option broke AIX because on AIX, the float128
> insns are disabled, and they all become CODE_FOR_nothing.  The switch statement
> that was added in rs6000.c to map KFmode built-in functions to TFmode breaks
> under AIX.

It also breaks on Linux with older binutils (no HAVE_AS_POWER9 defined).

> I changed the code to have a separate table, and the first call, I build the
> table.  If the insn was not generated, it will just be CODE_FOR_nothing, and
> the KF->TF mode conversion will not be done.
> 
> I have tested this on a little endian power8 system and there were no
> regressions.  Once David verifies that it builds on AIX, can I check this into
> the trunk?

I don't like this scheme much (huge table, initialisation at runtime, etc.),
but okay for trunk, to unbreak things there.

Some comments on the patch:

> +  if (first_time)
> + {
> +   first_time = false;
> +   gcc_assert ((int)CODE_FOR_nothing == 0);

No useless cast please.  The whole assert is pretty useless fwiw; just
take it out?

> +   for (i = 0; i < ARRAY_SIZE (map); i++)
> + map_insn_code[(int)map[i].from] = map[i].to;
> + }

Space after cast.

Only do this for codes that are *not* CODE_FOR_nothing?
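
A minimal sketch of what that last suggestion could look like, using an
invented miniature of the scheme. The enumerators, kf_tf_map and
map_kf_to_tf are hypothetical and are not the real rs6000 names, and
returning the input code when no mapping exists is an assumption about the
intended behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Invented insn codes for illustration; not the real rs6000 enumerators.
enum insn_code { CODE_FOR_nothing = 0, CODE_FOR_addkf3, CODE_FOR_addtf3,
                 CODE_FOR_subkf3, NUM_INSN_CODES };

struct code_pair { insn_code from, to; };

// On a target without the insn, an entry collapses to CODE_FOR_nothing.
static const code_pair kf_tf_map[] = {
  { CODE_FOR_addkf3,  CODE_FOR_addtf3 },
  { CODE_FOR_nothing, CODE_FOR_nothing },   // disabled pattern
};

insn_code map_kf_to_tf (insn_code icode)
{
  static insn_code table[NUM_INSN_CODES];   // zero-initialized
  static bool first_time = true;
  if (first_time)
    {
      first_time = false;
      for (std::size_t i = 0; i < sizeof kf_tf_map / sizeof kf_tf_map[0]; i++)
        if (kf_tf_map[i].from != CODE_FOR_nothing)   // skip disabled entries
          table[kf_tf_map[i].from] = kf_tf_map[i].to;
    }
  assert (icode < NUM_INSN_CODES);
  insn_code mapped = table[icode];
  // Assumption: codes without a mapping are returned unchanged.
  return mapped != CODE_FOR_nothing ? mapped : icode;
}
```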


Segher


Re: [PATCH] Improve -Wmaybe-uninitialized documentation

2017-11-16 Thread Jonathan Wakely

On 15/11/17 20:28 -0700, Martin Sebor wrote:

On 11/15/2017 07:31 AM, Jonathan Wakely wrote:

The docs for -Wmaybe-uninitialized have some issues:

- That first sentence is looong.
- Apparently some C++ programmers think "automatic variable" means one
declared with C++11 `auto`, rather than simply a local variable.
- The sentence about only warning when optimizing is stuck in between
two chunks talking about longjmp, which could be inferred to mean
only the setjmp/longjmp part of the warning depends on optimization.

This attempts to make it easier to parse and understand.


I've always found the description remarkably precise.  Particularly
the bit where it talks about the two paths, one initialized and the
other not.  Your rewording loses that distinction so I don't think
it's as accurate, or even correct.

To use an example, this would satisfy the new description:

 int f (void)
 {
   int i;
   return i;
 }

but it doesn't match GCC behavior (it triggers -Wuninitialized,
not -Wmaybe-uninitialized).  Unless the distinction is more
subtle than I ascribe to it I think it needs to be preserved
in the rewording.
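
For contrast, a minimal sketch of the two-path case that the existing
wording describes: the variable is initialized on one path from function
entry to the use and not on the other. The function g and its parameter
are made up for illustration, and whether a diagnostic is actually issued
depends on the optimization level and the GCC version.

```cpp
// Hypothetical example, not taken from the GCC manual: `i` is set on only
// one of the two paths that reach `return i`.
int g (int flag)
{
  int i;
  if (flag)
    i = 42;        // initialized on this path only
  return i;        // reached with or without the assignment
}
```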


Ah, I tested a similar case and missed that the warning I got was from
-Wuninitialized not -Wmaybe-uninitialized, which made me think that
"a use of the variable that is initialized" was wrong.

OK, so then here's an alternative patch which doesn't touch that first
sentence except to add "(i.e. local)". That makes the first sentence
even longer, but if it's accurate maybe that's OK. This still adds
"These warnings are only possible in optimizing compilation, because
otherwise GCC does not keep track of the state of variables." And
removes the similar text from the middle of the setjmp/longjmp
discussion.


commit 3ebe2a74817b63e27f961e91e6c044d00245
Author: Jonathan Wakely 
Date:   Thu Nov 16 10:43:51 2017 +

Improve -Wmaybe-uninitialized documentation

* doc/invoke.texi (-Wmaybe-uninitialized): Rephrase for clarity.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 85c980bdfc9..bb68c308166 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -4970,11 +4970,16 @@ void store (int *i)
 @item -Wmaybe-uninitialized
 @opindex Wmaybe-uninitialized
 @opindex Wno-maybe-uninitialized
-For an automatic variable, if there exists a path from the function
-entry to a use of the variable that is initialized, but there exist
+For an automatic (i.e.@ local) variable, if there exists a path from the
+function entry to a use of the variable that is initialized, but there exist
 some other paths for which the variable is not initialized, the compiler
 emits a warning if it cannot prove the uninitialized paths are not
-executed at run time. These warnings are made optional because GCC is
+executed at run time.
+
+These warnings are only possible in optimizing compilation, because otherwise
+GCC does not keep track of the state of variables.
+
+These warnings are made optional because GCC is
 not smart enough to see all the reasons why the code might be correct
 in spite of appearing to have an error.  Here is one example of how
 this can happen:
@@ -5004,9 +5009,7 @@ similar code.
 
 @cindex @code{longjmp} warnings
 This option also warns when a non-volatile automatic variable might be
-changed by a call to @code{longjmp}.  These warnings as well are possible
-only in optimizing compilation.
-
+changed by a call to @code{longjmp}.
 The compiler sees only the calls to @code{setjmp}.  It cannot know
 where @code{longjmp} will be called; in fact, a signal handler could
 call it at any point in the code.  As a result, you may get a warning


Re: [PATCH 1/4] Revert "2017-10-04 Petr Ovtchenkov "

2017-11-16 Thread Jonathan Wakely

On 10/10/17 22:55 +0300, Petr Ovtchenkov wrote:

This reverts commit 0dfbafdf338cc6899d146add5161e52efb02c067
(svn r253417).


I'm not even going to bother to review patches sent without any
explanation or rationale for the change.

I will repeat what Paolo said: changing the ABI is not acceptable.



Re: Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-11-16 Thread Jonathan Wakely

On 16/11/17 08:51 +0300, Petr Ovtchenkov wrote:

On Mon, 6 Nov 2017 22:19:22 +0100
François Dumont  wrote:


Hi

     Any final decision regarding this patch ?

François


https://gcc.gnu.org/ml/libstdc++/2017-11/msg00036.html
https://gcc.gnu.org/ml/libstdc++/2017-11/msg00035.html
https://gcc.gnu.org/ml/libstdc++/2017-11/msg00037.html
https://gcc.gnu.org/ml/libstdc++/2017-11/msg00034.html


It would be helpful if you two could collaborate and come up with a
good solution, or at least discuss the pros and cons, instead of just
sending competing patches.



Re: [PATCH 3/4] libstdc++: avoid character accumulation in istreambuf_iterator

2017-11-16 Thread Petr Ovtchenkov
On Thu, 16 Nov 2017 10:39:02 +0100
Paolo Carlini  wrote:

> Hi,
> 
> On 16/11/2017 06:31, Petr Ovtchenkov wrote:
> > Is we really worry about frozen sizeof of instantiated template? 
> Yes we do. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html 
> under "Prohibited Changes", point 8.
> 
> Of course removing the buffering has performance implications too - 
> that's why it's there in the first place!

"buffering" here is a secondary buffering (after streambuf).
No relation to performance, but place for incoherence with
state of attached streambuf.

Of course, I can spend time to measure the difference.

The main point of this patch series is to avoid losing the link between
the streambuf and the istreambuf_iterator when the iterator sees eof.
Implementations that forget about the attached streambuf after the
iterator sees eof (or lose synchronization with the attached
streambuf) violate the principles of the C++ object life cycle.
From a practical point of view, such implementations block the use of
istreambuf_iterator for sockets, ttys, etc.; only unmodified
files remain in scope.
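
For readers following the thread, a minimal sketch of ordinary
istreambuf_iterator use on a fixed in-memory buffer, i.e. the unmodified
source case that is not in dispute. The helper name drain is invented, and
the sketch deliberately says nothing about what happens when the source
receives more data after end-of-stream was seen, which is the contested
point.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Illustrative helper (not part of the patch): read every remaining
// character from the stream's buffer through istreambuf_iterator.
std::string drain (std::istream &in)
{
  assert (in.rdbuf () != nullptr);
  // A default-constructed iterator is the end-of-stream iterator.
  std::istreambuf_iterator<char> first (in), last;
  return std::string (first, last);
}
```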


> - which I remember we 
> investigated a bit again in the past when somebody reported that a few 
> implementations had it other did not. But I can't say to have followed 
> all the (recently uncovered) conformance implications, it could well be 
> that we cannot be 100% conforming to the letter of the current standard 
> while taking advantage of a buffering mechanism. Jonathan will provide 
> feedback.
> 
> Paolo.

--

  - ptr


Re: [PATCH 3/4] libstdc++: avoid character accumulation in istreambuf_iterator

2017-11-16 Thread Paolo Carlini

Hi,

On 16/11/2017 12:03, Petr Ovtchenkov wrote:

On Thu, 16 Nov 2017 10:39:02 +0100
Paolo Carlini  wrote:


Hi,

On 16/11/2017 06:31, Petr Ovtchenkov wrote:

Is we really worry about frozen sizeof of instantiated template?

Yes we do. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
under "Prohibited Changes", point 8.

Of course removing the buffering has performance implications too -
that's why it's there in the first place!

"buffering" here is a secondary buffering (after streambuf).
No relation to performance, but place for incoherence with
state of attached streambuf.

It depends; we may be dealing with an unbuffered stream. At the time we
certainly measured a performance impact in some cases, as presumably did
whoever implemented it in the first place (not me); otherwise, again, why bother?


Paolo.



[COMMITTED][AArch64] Fix frame tests

2017-11-16 Thread Wilco Dijkstra
Improve the AArch64 frame tests - add -f(no-)omit-frame-pointer,
update checks and add missing tests.  As a result all tests now
pass.

Committed as obvious.

ChangeLog:
2017-11-16  Wilco Dijkstra  

* gcc.target/aarch64/lr_free_2.c: Fix test.
* gcc.target/aarch64/spill_1.c: Likewise.
* gcc.target/aarch64/test_frame_11.c: Likewise.
* gcc.target/aarch64/test_frame_12.c: Likewise.
* gcc.target/aarch64/test_frame_13.c: Likewise.
* gcc.target/aarch64/test_frame_14.c: Likewise.
* gcc.target/aarch64/test_frame_15.c: Likewise.
* gcc.target/aarch64/test_frame_3.c: Likewise.
* gcc.target/aarch64/test_frame_5.c: Likewise.
* gcc.target/aarch64/test_frame_9.c: Likewise.
--

diff --git a/gcc/testsuite/gcc.target/aarch64/lr_free_2.c b/gcc/testsuite/gcc.target/aarch64/lr_free_2.c
index e2b9490fab1a27755d239ad6802325a619f73db3..5d9500f4fb144bdae5d0199f0b0a218deb504176 100644
--- a/gcc/testsuite/gcc.target/aarch64/lr_free_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/lr_free_2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fno-inline -O2 -ffixed-x2 -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28 --save-temps -mgeneral-regs-only -fno-ipa-cp -fdump-rtl-ira" } */
+/* { dg-options "-fno-omit-frame-pointer -fno-inline -O2 -ffixed-x2 -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-x28 --save-temps -mgeneral-regs-only -fno-ipa-cp -fdump-rtl-ira" } */
 
 extern void abort ();
 
diff --git a/gcc/testsuite/gcc.target/aarch64/spill_1.c b/gcc/testsuite/gcc.target/aarch64/spill_1.c
index 847425895d456e4433b0d15556d60a66a8f8f70c..c9528cb21daaefcdd5f1218ee13edf40ee44bd99 100644
--- a/gcc/testsuite/gcc.target/aarch64/spill_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/spill_1.c
@@ -14,5 +14,3 @@ foo (void)
 }
 
 /* { dg-final { scan-assembler-times {\tmovi\tv[0-9]+\.4s,} 2 } } */
-/* { dg-final { scan-assembler-not {\tldr\t} } } */
-/* { dg-final { scan-assembler-not {\tstr\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_11.c b/gcc/testsuite/gcc.target/aarch64/test_frame_11.c
index f162cc091e0090ece943715ae573d0c11821b19b..67f858260d9156ac00951a21aa57353182188133 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_11.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_11.c
@@ -5,7 +5,7 @@
  * optimized code should use "stp !" for stack adjustment.  */
 
 /* { dg-do run } */
-/* { dg-options "-O2 --save-temps" } */
+/* { dg-options "-fno-omit-frame-pointer -O2 --save-temps" } */
 
 #include "test_frame_common.h"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
index 62761e7ff9b3fd0afc064f9a9b737583261b0610..02e48b4acac0a27a4beca0a3011d8d2c1a408117 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_12.c
@@ -4,7 +4,7 @@
  * number of callee-save reg >= 2.  */
 
 /* { dg-do run } */
-/* { dg-options "-O2 --save-temps" } */
+/* { dg-options "-O2 -fomit-frame-pointer --save-temps" } */
 
 #include "test_frame_common.h"
 
@@ -14,5 +14,5 @@ t_frame_run (test12)
 /* { dg-final { scan-assembler-times "sub\tsp, sp, #\[0-9\]+" 1 } } */
 
 /* Check epilogue using no write-back.  */
-/* { dg-final { scan-assembler "ldp\tx29, x30, \\\[sp, \[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "ldr\tx30, \\\[sp, \[0-9\]+\\\]" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_13.c b/gcc/testsuite/gcc.target/aarch64/test_frame_13.c
index 74b3370fa463b652265e00fff80cc8856524d509..33139363785d6befcb5eb009bc4324df785d32c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_13.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_13.c
@@ -5,7 +5,7 @@
  * Use a single stack adjustment, no writeback.  */
 
 /* { dg-do run } */
-/* { dg-options "-O2 --save-temps" } */
+/* { dg-options "-fno-omit-frame-pointer -O2 --save-temps" } */
 
 #include "test_frame_common.h"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_14.c b/gcc/testsuite/gcc.target/aarch64/test_frame_14.c
index 78818dec32af95c43b610cab1832ea29041c3b36..acef2bbffc82885b3c05d2c902921461bb389c23 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_14.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_14.c
@@ -4,9 +4,12 @@
  * number of callee-save reg >= 2.  */
 
 /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fno-omit-frame-pointer --save-temps" } */
 
 #include "test_fram

Re: [PATCH 1/4] Revert "2017-10-04 Petr Ovtchenkov "

2017-11-16 Thread Petr Ovtchenkov
On Thu, 16 Nov 2017 10:56:29 +
Jonathan Wakely  wrote:

> On 10/10/17 22:55 +0300, Petr Ovtchenkov wrote:
> >This reverts commit 0dfbafdf338cc6899d146add5161e52efb02c067
> >(svn r253417).
> 
> I'm not even going to bother to review patches sent without any
> explanation or rationale for the change.

https://gcc.gnu.org/ml/libstdc++/2017-11/msg00044.html

Along with "violate principles of C++ objects life cycle",
the side-effect is

  - Make istreambuf_iterator::_M_sbuf immutable
  - streambuf_iterator: avoid debug-dependent behaviour

I should underline, that "_M_sbuf = 0" when istreambuf_iterator
see eof, lead to cripple lifecycle of istreambuf_iterator
object and [almost] block usage of istreambuf_iterator
for entities other then immutable files.

All tests from 24_iterators and 25_algorithms passed,
so I expect it conform to Standard.

This is series of patches, not single patch because
I keep in mind technology aspect---easy transfer
to branches other then trunk.

> 
> I will repeat what Paolo said: changing the ABI is not acceptable.

I will repeat special for you:


Is we really worry about frozen sizeof of instantiated template?
(Removed private template member).

If yes, than

   int_type __dummy;

is our all.


I.e. problem can be easy resolved---i.e. ABI will not suffer, if we will
reach some consensus on the main issue. 

> 


Re: [PATCH, GCC/testsuite/ARM] Rework expectation for call to Armv8-M nonsecure function

2017-11-16 Thread Kyrill Tkachov


On 15/11/17 17:04, Thomas Preudhomme wrote:

Hi,

Testcase gcc.target/arm/cmse/cmse-14.c checks whether bar is called via
__gnu_cmse_nonsecure_call libcall and not via a direct call. However the
pattern is a bit surprising in that it needs to explicitly allow "by"
due to allowing anything before the 'b'.

This patch rewrites the logic to look for b as a first non-whitespace
letter followed by anything (to match bl and conditional branches)
followed by some spaces and then bar.
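
The matching logic described above can be sketched outside DejaGnu with
std::regex. This is illustrative only: the actual scan-assembler pattern
is not shown in this message, and is_direct_branch_to_bar and the sample
lines are invented.

```cpp
#include <regex>
#include <string>

// Illustrative only: approximates "b as the first non-whitespace letter,
// then anything, then some spaces, then bar" for one line of assembly.
bool is_direct_branch_to_bar (const std::string &asm_line)
{
  // optional leading whitespace, mnemonic starting with 'b',
  // at least one whitespace character, then the symbol "bar"
  static const std::regex pattern ("^\\s*b\\S*\\s+bar\\b");
  return std::regex_search (asm_line, pattern);
}
```

Anchoring at the first non-whitespace character is what removes the need
to special-case a stray "by" earlier on the line.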

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-11-01  Thomas Preud'homme 

* gcc.target/arm/cmse/cmse-14.c: Change logic to match branch
instruction to bar.

Testing: Test still passes for both Armv8-M Baseline and Mainline.

Is this ok for trunk?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH 1/4] Revert "2017-10-04 Petr Ovtchenkov "

2017-11-16 Thread Jonathan Wakely

On 16/11/17 14:35 +0300, Petr Ovtchenkov wrote:

On Thu, 16 Nov 2017 10:56:29 +
Jonathan Wakely  wrote:


On 10/10/17 22:55 +0300, Petr Ovtchenkov wrote:
>This reverts commit 0dfbafdf338cc6899d146add5161e52efb02c067
>(svn r253417).

I'm not even going to bother to review patches sent without any
explanation or rationale for the change.


https://gcc.gnu.org/ml/libstdc++/2017-11/msg00044.html

Along with "violate principles of C++ objects life cycle",
the side-effect is

 - Make istreambuf_iterator::_M_sbuf immutable
 - streambuf_iterator: avoid debug-dependent behaviour

I should underline, that "_M_sbuf = 0" when istreambuf_iterator
see eof, lead to cripple lifecycle of istreambuf_iterator
object and [almost] block usage of istreambuf_iterator
for entities other then immutable files.

All tests from 24_iterators and 25_algorithms passed,
so I expect it conform to Standard.

This is series of patches, not single patch because
I keep in mind technology aspect---easy transfer
to branches other then trunk.



I will repeat what Paolo said: changing the ABI is not acceptable.


I will repeat special for you:


Is we really worry about frozen sizeof of instantiated template?
(Removed private template member).

If yes, than

  int_type __dummy;

is our all.


I.e. problem can be easy resolved---i.e. ABI will not suffer, if we will


What about other translation units which have inlined the old
definition of the template, and expect to find a buffered character in
that member?
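
To make that objection concrete, a small sketch with invented stand-in
structs. These are not the real libstdc++ definitions; they only mimic one
pointer plus one cached value. A same-typed placeholder does preserve
sizeof and the member offset, but code compiled against the old definition
still reads a cached character at that offset, which new code never
maintains.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the iterator layout, for illustration only.
struct old_layout   { void *sbuf; int cached; };  // buffered character kept
struct slim_layout  { void *sbuf; };              // member removed
struct dummy_layout { void *sbuf; int dummy; };   // placeholder keeps the size

static_assert (sizeof (slim_layout) < sizeof (old_layout),
               "removing the member shrinks the object");
static_assert (sizeof (dummy_layout) == sizeof (old_layout),
               "a same-typed placeholder preserves sizeof");

// Old inlined code expects the buffered character at this offset; with the
// placeholder the same bytes exist but carry no meaning.
constexpr std::size_t cached_offset ()
{
  return offsetof (old_layout, cached);
}
```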


reach some consensus on the main issue.


We don't have any consensus, in fact I don't see anybody agreeing with
you, and I've previously stated I don't want to support your use case:
https://gcc.gnu.org/ml/libstdc++/2017-09/msg00100.html



Re: [PATCH, GCC/testsuite/ARM] Fix selection of effective target for cmse tests

2017-11-16 Thread Kyrill Tkachov

Hi Thomas,

On 15/11/17 16:59, Thomas Preudhomme wrote:

Hi,

Some of the tests in the gcc.target/arm/cmse directory (eg.
gcc.target/arm/cmse/mainline/bitfield-4.c) are failing when run without
an architecture specified in RUNTESTFLAGS due to them not adding the
option to select an Armv8-M architecture.

This patch fixes the issue by adding the right option from the exp file
so that no architecture fiddling is necessary in the individual tests.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-11-03  Thomas Preud'homme 

* gcc.target/arm/cmse/cmse.exp: Add option to select Armv8-M Baseline
or Armv8-M Mainline when running the respective tests.
* gcc.target/arm/cmse/baseline/cmse-11.c: Remove architecture check and
selection.
* gcc.target/arm/cmse/baseline/cmse-13.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-2.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-6.c: Likewise.
* gcc.target/arm/cmse/baseline/softfp.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: Likewise.

Testing: Running cmse.exp for both Armv8-M Baseline and Mainline shows
no regression. Running it for a toolchain defaulting to Armv8-M Baseline
but with RUNTESTFLAGS unset sees some FAIL->PASS.

Is this ok for trunk?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH 1/4] Revert "2017-10-04 Petr Ovtchenkov "

2017-11-16 Thread Jonathan Wakely

On 16/11/17 11:39 +0000, Jonathan Wakely wrote:

On 16/11/17 14:35 +0300, Petr Ovtchenkov wrote:

On Thu, 16 Nov 2017 10:56:29 +0000
Jonathan Wakely  wrote:


On 10/10/17 22:55 +0300, Petr Ovtchenkov wrote:

This reverts commit 0dfbafdf338cc6899d146add5161e52efb02c067
(svn r253417).


I'm not even going to bother to review patches sent without any
explanation or rationale for the change.


https://gcc.gnu.org/ml/libstdc++/2017-11/msg00044.html

Along with "violate principles of C++ objects life cycle",
the side-effect is

- Make istreambuf_iterator::_M_sbuf immutable
- streambuf_iterator: avoid debug-dependent behaviour

I should underline, that "_M_sbuf = 0" when istreambuf_iterator
see eof, lead to cripple lifecycle of istreambuf_iterator
object and [almost] block usage of istreambuf_iterator
for entities other then immutable files.

All tests from 24_iterators and 25_algorithms passed,
so I expect it conform to Standard.

This is series of patches, not single patch because
I keep in mind technology aspect---easy transfer
to branches other then trunk.



I will repeat what Paolo said: changing the ABI is not acceptable.


I will repeat special for you:


Is we really worry about frozen sizeof of instantiated template?
(Removed private template member).

If yes, than

 int_type __dummy;

is our all.


I.e. problem can be easy resolved---i.e. ABI will not suffer, if we will


What about other translation units which have inlined the old
definition of the template, and expect to find a buffered character in
that member?


In other words, the ABI is not just the "frozen sizeof".


reach some consensus on the main issue.


We don't have any consensus, in fact I don't see anybody agreeing with
you, and I've previously stated I don't want to support your use case:
https://gcc.gnu.org/ml/libstdc++/2017-09/msg00100.html
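
The point about inlined code can be illustrated with a small sketch. This is
only an illustration under stated assumptions: OldIter, NewIter and the helper
functions are hypothetical stand-ins, not the real libstdc++ types, and the
memcpy merely simulates an object crossing between two translation units that
were compiled against different versions of a header. The sizes match, yet the
old inlined code still misreads the member.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical "old" layout: the second member caches a character,
// and -1 is taken to mean "nothing cached".
struct OldIter {
  void* sbuf;
  int   cached;
};

// Hypothetical "new" layout: same size, but the slot is only padding.
struct NewIter {
  void* sbuf;
  int   dummy;  // new code never stores anything meaningful here
};

static_assert(sizeof(OldIter) == sizeof(NewIter), "sizeof is preserved");

// Inlined into a caller built against the old header: it trusts `cached`.
inline int old_inlined_peek(const OldIter& it) { return it.cached; }

// Built against the new header: fills the slot with an arbitrary value.
inline NewIter new_make(void* sb) { return NewIter{sb, 0}; }

// Simulates handing an object from new code to old inlined code.
inline int peek_across_boundary(void* sb) {
  NewIter n = new_make(sb);
  OldIter o;
  std::memcpy(&o, &n, sizeof o);  // same bytes, interpreted by old code
  return old_inlined_peek(o);
}
```

Here peek_across_boundary(nullptr) yields 0, which the old code would treat as
a cached NUL character rather than "nothing cached", even though sizeof did
not change.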



Re: [PATCH 3/4] libstdc++: avoid character accumulation in istreambuf_iterator

2017-11-16 Thread Petr Ovtchenkov
On Thu, 16 Nov 2017 12:29:37 +0100
Paolo Carlini  wrote:

> Hi,
> 
> On 16/11/2017 12:03, Petr Ovtchenkov wrote:
> > On Thu, 16 Nov 2017 10:39:02 +0100
> > Paolo Carlini  wrote:
> >
> >> Hi,
> >>
> >> On 16/11/2017 06:31, Petr Ovtchenkov wrote:
> >>> Is we really worry about frozen sizeof of instantiated template?
> >> Yes we do. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
> >> under "Prohibited Changes", point 8.
> >>
> >> Of course removing the buffering has performance implications too -
> >> that's why it's there in the first place!
> > "buffering" here is a secondary buffering (after streambuf).
> > No relation to performance, but place for incoherence with
> > state of attached streambuf.
> It depends, we may be dealing with an unbuffered stream. For sure at the 
> time we measured a performance impact in some cases, likewise whoever 
> implemented it in the first place (not me) otherwise, again, why bothering?

This part of the code is from SGI, so I suspect that nobody here really
measured the performance difference between the "buffered" and "non-buffered"
implementations, simply because the only implementation we have is the one
with _M_c in istreambuf_iterator.

> 
> Paolo.
> 

--

   - ptr


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-16 Thread Tamar Christina
> > > 
> > > I see.  But then the slow_unaligned_access implementation should use
> > > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > > defaulted to STRICT_ALIGN.
> > > 
> > > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > > modes it would also make sense to be more specific for the testcase in
> > > question, like word_mode_slow_unaligned_access to tell this only applies to
> > > word_mode?
> > 
> > Ah, that's fair enough. I've updated the patch and the new changelog is:
> 
> Did you attach the old patch? I don't see strict_align being tested in
> the word_mode_no_slow_unalign test.
> 

Sorry! I misunderstood your previous email. I've added the check accordingly.

But this also raises a question: some targets have defined SLOW_UNALIGNED_ACCESS
in a way that uses only internal state to determine the value, where STRICT_ALIGNMENT
is essentially ignored, e.g. PowerPC and riscv.

The code generation *might* change for them but the tests won't run. I see no way to
make the test accurate (as in, runs in all cases where the codegen changed)
unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.

Would this be the way to go?

Thanks,
Tamar

> Richard.
> 
> > 
> > gcc/
> > 2017-11-15  Tamar Christina  
> > 
> > * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > with fast unaligned access.
> > * doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> > 
> > gcc/testsuite/
> > 2017-11-15  Tamar Christina  
> > 
> > * gcc.dg/struct-simple.c: New.
> > * lib/target-supports.exp
> > (check_effective_target_word_mode_no_slow_unalign): New.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Tamar
> > 
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > Thanks,
> > > > Tamar
> > > > >
> > > > > Otherwise the expr.c change looks ok.
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > >
> > > > > > gcc/
> > > > > > 2017-11-14  Tamar Christina  
> > > > > >
> > > > > > * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > > with fast unaligned access.
> > > > > > * doc/sourcebuild.texi (no_slow_unalign): New.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > 2017-11-14  Tamar Christina  
> > > > > >
> > > > > > * gcc.dg/struct-simple.c: New.
> > > > > > * lib/target-supports.exp
> > > > > > (check_effective_target_no_slow_unalign): New.
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener 
> > > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > > Norton, HRB 21284 (AG Nuernberg)
> > > >
> > > >
> > > 
> > > --
> > > Richard Biener 
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > > HRB 21284 (AG Nuernberg)
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a81c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index ..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do-run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be us

Re: [PING][patch] PR81794: have "would be stringified in traditional C" warning in libcpp/macro.c be controlled by -Wtraditional

2017-11-16 Thread Eric Gallager
Ping yet again: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00123.html

On 11/9/17, Eric Gallager  wrote:
> Ping again: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00123.html
>
> On 11/2/17, Eric Gallager  wrote:
>> Ping: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01834.html
>>
>> On 10/25/17, Eric Gallager  wrote:
>>> On Sat, Sep 30, 2017 at 8:05 PM, Eric Gallager 
>>> wrote:
 On Fri, Sep 29, 2017 at 11:15 AM, David Malcolm 
 wrote:
> On Sun, 2017-09-17 at 20:00 -0400, Eric Gallager wrote:
>> Attached is a version of
>> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00481.html that
>> contains
>> a combination of both the fix and the testcase update, as requested
>> in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81794#c2
>>
>> I had to use a different computer than I usually use to send this
>> email, as the hard drive that originally had this patch is currently
>> unresponsive. Since it's also the one with my ssh keys on it, I can't
>> commit with it. Sorry if the ChangeLogs get mangled.
>
> Thanks for putting this together; sorry about the delay in reviewing
> it.
>
> The patch mostly looks good.
>
> Did you perform a full bootstrap and run of the testsuite with this
> patch?  If so, it's best to state this in the email, so that we know
> that the patch has survived this level of testing.

 Yes, I bootstrapped with it, but I haven't done a full run of the
 testsuite with it yet; just the one testcase I updated.
>>>
>>> Update: I've now run the testsuite with it; test results are here:
>>> https://gcc.gnu.org/ml/gcc-testresults/2017-10/msg01751.html
>>> I'm pretty sure all the FAILs are unrelated to this patch.
>>>

>
> Some nits below:
>
>> libcpp/ChangeLog:
>>
>> 2017-03-24  Eric Gallager  
>>
>>  * macro.c (check_trad_stringification): Have warning be
>> controlled by
>>  -Wtraditional.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-09-17  Eric Gallager  
>>
>> PR preprocessor/81794
>> * gcc.dg/pragma-diag-7.c: Update to include check for
>> stringification.
>>
>> On Sat, May 6, 2017 at 11:33 AM, Eric Gallager 
>> wrote:
>> > Pinging this: https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01325.h
>> > tml
>> >
>> > On 3/24/17, Eric Gallager  wrote:
>> > > It seemed odd to me that gcc was issuing a warning about
>> > > compatibility
>> > > with traditional C that I couldn't turn off by pushing/popping
>> > > -Wtraditional over the problem area, so I made the attached
>> > > (minor)
>> > > patch to fix it. Survives bootstrap, but the only testing I've
>> > > done
>> > > with it has been compiling the one file that was giving me issues
>> > > previously, which I'd need to reduce further to turn it into a
>> > > proper
>> > > test case.
>> > >
>> > > Thanks,
>> > > Eric Gallager
>> > >
>> > > libcpp/ChangeLog:
>> > >
>> > > 2017-03-24  Eric Gallager  
>> > >
>> > >   * macro.c (check_trad_stringification): Have warning be
>> > > controlled by
>> > >   -Wtraditional.
>> > >
>> >
>> > So I did the reducing I mentioned above and now have a testcase for
>> > it; it was pretty similar to the one from here:
>> > https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01319.html
>> > so I combined them into a single testcase and have attached the
>> > combined version. I can confirm that the testcase passes with my
>> > patch
>> > applied.
>
> [...]
>
>> diff --git a/gcc/testsuite/gcc.dg/pragma-diag-7.c
>> b/gcc/testsuite/gcc.dg/pragma-diag-7.c
>> index 402ee56..e06c410 100644
>> --- a/gcc/testsuite/gcc.dg/pragma-diag-7.c
>> +++ b/gcc/testsuite/gcc.dg/pragma-diag-7.c
>> @@ -7,3 +7,16 @@ unsigned long bad = 1UL; /* { dg-warning "suffix" }
>> */
>>  /* Note the extra space before the pragma on this next line: */
>>   #pragma GCC diagnostic pop
>>  unsigned long ok_again = 2UL; /* { dg-bogus "suffix" } */
>> +
>> +/* Redundant with the previous pop, but just shows that it fails to
>> stop the
>> + * following warning with an unpatched GCC: */
>> +#pragma GCC diagnostic ignored "-Wtraditional"
>> +
>> +/* { dg-bogus "would be stringified" .+1 } */
>
> As far as I can tell, this dg-bogus line doesn't actually get matched;
> when I run the testsuite without the libcpp fix, I get:
>
>   FAIL: gcc.dg/pragma-diag-7.c (test for excess errors)
>
> If I update the dg-bogus line to read:
>
>   /* { dg-bogus "would be stringified" "" { target *-*-* } .+1 } */
>
> then it's matched, and I get:
>
>   FAIL: gcc.dg/pragma-diag-7.c  (test for bogus messages, line 16)
>
> I believe that as written the ".+1" 2nd argument is interpreted as a
> huma

Re: Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-11-16 Thread Jonathan Wakely

On 16/11/17 10:57 +0000, Jonathan Wakely wrote:

On 16/11/17 08:51 +0300, Petr Ovtchenkov wrote:

On Mon, 6 Nov 2017 22:19:22 +0100
François Dumont  wrote:


Hi

    Any final decision regarding this patch ?

François


https://gcc.gnu.org/ml/libstdc++/2017-11/msg00036.html
https://gcc.gnu.org/ml/libstdc++/2017-11/msg00035.html
https://gcc.gnu.org/ml/libstdc++/2017-11/msg00037.html
https://gcc.gnu.org/ml/libstdc++/2017-11/msg00034.html


It would be helpful if you two could collaborate and come up with a
good solution, or at least discuss the pros and cons, instead of just
sending competing patches.



Let me be more clear: I'm not going to review further patches in this
area while you two are proposing different alternatives, without
commenting on each other's approach.

If you think your solution is better than François's solution, you
should explain why, not just send a different patch. If François
thinks his solution is better than yours, he should state why, not
just send a different patch.

I don't have time to infer all that from just your patches, so I'm not
going to bother.



Re: [PATCH 3/4] libstdc++: avoid character accumulation in istreambuf_iterator

2017-11-16 Thread Paolo Carlini

Hi,

On 16/11/2017 12:41, Petr Ovtchenkov wrote:

> This part of code is from SGI, so I suspect that nobody here really
> measure performance difference between "bufferred" and "non-buffered"
> implementations.
Here where? The GNU libstdc++-v3 implementation? Certainly we did, as I 
tried to tell you the issue - the tradeoff between a more "correct", 
simpler, and certainly easier to synchronize implementation and better 
performance in some cases - isn't new. Please carry out a thorough search 
of Bugzilla and mailing list, there must be something recorded. I'll see 
if I can help about that, it's been a while.


Paolo.
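
To make the trade-off concrete, here is a minimal sketch of what a
"non-buffered" iterator over a streambuf could look like. It is an
illustration only: unbuffered_iter and drain_unbuffered are hypothetical
names, this is not the libstdc++ implementation of istreambuf_iterator, and it
makes no claim about performance. Every dereference calls sgetc() on the
streambuf, so the iterator never holds a private copy of a character that
could go stale.

```cpp
#include <cassert>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical iterator-like helper with no cached character.
class unbuffered_iter {
  std::streambuf* sb_;
  typedef std::char_traits<char> traits;
public:
  explicit unbuffered_iter(std::streambuf* sb) : sb_(sb) {}

  // End is detected by asking the streambuf each time.
  bool at_end() const {
    return traits::eq_int_type(sb_->sgetc(), traits::eof());
  }

  // Reads the current character without consuming it.
  char operator*() const { return traits::to_char_type(sb_->sgetc()); }

  // Consumes one character from the streambuf.
  unbuffered_iter& operator++() { sb_->sbumpc(); return *this; }
};

// Collects everything currently readable from the streambuf.
inline std::string drain_unbuffered(std::streambuf* sb) {
  std::string out;
  for (unbuffered_iter it(sb); !it.at_end(); ++it)
    out.push_back(*it);
  return out;
}

inline std::string drain_example() {
  std::stringbuf buf("abc");
  return drain_unbuffered(&buf);
}
```

The cost of this design is two streambuf calls per character (sgetc plus
sbumpc), which is where any measurable difference on an unbuffered stream
would be expected to come from.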


Re: [PATCH 1/4] Revert "2017-10-04 Petr Ovtchenkov "

2017-11-16 Thread Petr Ovtchenkov
On Thu, 16 Nov 2017 11:39:30 +0000
Jonathan Wakely  wrote:

> On 16/11/17 14:35 +0300, Petr Ovtchenkov wrote:
> >On Thu, 16 Nov 2017 10:56:29 +0000
> >Jonathan Wakely  wrote:
> >
> >> On 10/10/17 22:55 +0300, Petr Ovtchenkov wrote:
> >> >This reverts commit 0dfbafdf338cc6899d146add5161e52efb02c067
> >> >(svn r253417).
> >>
> >> I'm not even going to bother to review patches sent without any
> >> explanation or rationale for the change.
> >
> >https://gcc.gnu.org/ml/libstdc++/2017-11/msg00044.html
> >
> >Along with "violate principles of C++ objects life cycle",
> >the side-effect is
> >
> >  - Make istreambuf_iterator::_M_sbuf immutable
> >  - streambuf_iterator: avoid debug-dependent behaviour
> >
> >I should underline, that "_M_sbuf = 0" when istreambuf_iterator
> >see eof, lead to cripple lifecycle of istreambuf_iterator
> >object and [almost] block usage of istreambuf_iterator
> >for entities other then immutable files.
> >
> >All tests from 24_iterators and 25_algorithms passed,
> >so I expect it conform to Standard.
> >
> >This is series of patches, not single patch because
> >I keep in mind technology aspect---easy transfer
> >to branches other then trunk.
> >
> >>
> >> I will repeat what Paolo said: changing the ABI is not acceptable.
> >
> >I will repeat special for you:
> >
> >
> >Is we really worry about frozen sizeof of instantiated template?
> >(Removed private template member).
> >
> >If yes, than
> >
> >   int_type __dummy;
> >
> >is our all.
> >
> >
> >I.e. problem can be easy resolved---i.e. ABI will not suffer, if we will
> 
> What about other translation units which have inlined the old
> definition of the template, and expect to find a buffered character in
> that member?

I can say that I can write

  int_type _M_c;

but you see, this is a _private_ member of the template, so we should (may?)
worry only about the size of the object.

Just for clarification: is your emphasis on "buffered" or on "character"
("symbol" in ELF)?

> 
> >reach some consensus on the main issue.
> 
> We don't have any consensus, in fact I don't see anybody agreeing with
> you, and I've previously stated I don't want to support your use case:
> https://gcc.gnu.org/ml/libstdc++/2017-09/msg00100.html
> 


Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-16 Thread Richard Biener
On Thu, 16 Nov 2017, Tamar Christina wrote:

> > > > 
> > > > I see.  But then the slow_unaligned_access implementation should use
> > > > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > > > defaulted to STRICT_ALIGN.
> > > > 
> > > > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > > > modes it would also make sense to be more specific for the testcase in
> > > > question, like word_mode_slow_unaligned_access to tell this only applies to
> > > > word_mode?
> > > 
> > > Ah, that's fair enough. I've updated the patch and the new changelog is:
> > 
> > Did you attach the old patch? I don't see strict_align being tested in
> > the word_mode_no_slow_unalign test.
> > 
> 
> Sorry! I misunderstood your previous email. I've added the check accordingly.

+if { ([istarget x86_64-*-*]
+  || [istarget aarch64*-*-*])
+&& [is-effective-target non_strict_align]
+   } {
+set et_word_mode_no_slow_unalign_saved($et_index) 1
+}

I'd have made it

  if { ([is-effective-target non_strict_align]
&& ! ( [istarget ...] || ))

thus default it to 1 for non-strict-align targets.

> But this also raises a question: some targets have defined SLOW_UNALIGNED_ACCESS
> in a way that uses only internal state to determine the value, where STRICT_ALIGNMENT
> is essentially ignored, e.g. PowerPC and riscv.
> 
> The code generation *might* change for them but the tests won't run. I see no way to
> make the test accurate (as in, runs in all cases where the codegen changed)
> unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.
> 
> Would this be the way to go?

I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
a certain alignment.

Richard.
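
For readers following the expr.c part of this thread, the chunk-size choice
made by the patch can be sketched in isolation. This is a simplified model,
not GCC code: choose_copy_bitsize and the SlowUnalignedFn callback are
hypothetical stand-ins for the logic around targetm.slow_unaligned_access, and
the callback here takes only an alignment, whereas the real hook also takes a
machine mode.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the target hook: reports whether an access
// at the given alignment (in bits) is slow when unaligned.
using SlowUnalignedFn = bool (*)(unsigned align_bits);

// Picks how many bits to copy per step. A full word is used unless the
// target reports slow unaligned access, in which case the step is
// limited by the alignment of the source type.
inline unsigned choose_copy_bitsize(unsigned type_align_bits,
                                    unsigned bits_per_word,
                                    SlowUnalignedFn slow_unaligned) {
  unsigned bitsize = bits_per_word;
  if (slow_unaligned(type_align_bits))
    bitsize = std::min(type_align_bits, bits_per_word);
  return bitsize;
}

// Two toy targets for demonstration.
inline bool always_slow(unsigned) { return true; }
inline bool never_slow(unsigned) { return false; }
```

With a byte-aligned struct on a 64-bit-word target, never_slow gives 64-bit
steps while always_slow falls back to 8-bit steps, which is the codegen
difference the new test is meant to observe.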

> Thanks,
> Tamar
> 
> > Richard.
> > 
> > > 
> > > gcc/
> > > 2017-11-15  Tamar Christina  
> > > 
> > >   * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > >   with fast unaligned access.
> > >   * doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> > >   
> > > gcc/testsuite/
> > > 2017-11-15  Tamar Christina  
> > > 
> > >   * gcc.dg/struct-simple.c: New.
> > >   * lib/target-supports.exp
> > >   (check_effective_target_word_mode_no_slow_unalign): New.
> > > 
> > > Ok for trunk?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > > 
> > > > Thanks,
> > > > Richard.
> > > > 
> > > > > Thanks,
> > > > > Tamar
> > > > > >
> > > > > > Otherwise the expr.c change looks ok.
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > >
> > > > > > > gcc/
> > > > > > > 2017-11-14  Tamar Christina  
> > > > > > >
> > > > > > >   * expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > > >   with fast unaligned access.
> > > > > > >   * doc/sourcebuild.texi (no_slow_unalign): New.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > 2017-11-14  Tamar Christina  
> > > > > > >
> > > > > > >   * gcc.dg/struct-simple.c: New.
> > > > > > >   * lib/target-supports.exp
> > > > > > >   (check_effective_target_no_slow_unalign): New.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener 
> > > > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > > > Norton, HRB 21284 (AG Nuernberg)
> > > > >
> > > > >
> > > > 
> > > > --
> > > > Richard Biener 
> > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > > > HRB 21284 (AG Nuernberg)
> > > 
> > 
> > -- 
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] New lang hook

2017-11-16 Thread Richard Biener
On Tue, Nov 14, 2017 at 1:31 PM, Nathan Sidwell  wrote:
> This patch addresses c++/82836 & c++/82737.  The root cause was a bad
> assumption I made when moving the mangling alias machinery to its own hash
> table.
>
> I had thought that once we SET_DECL_ASSEMBLER_NAME, it never becomes unset
> (or changed).  That is false.  There are paths in the compiler that set it
> back to zero, and at least one path where we remangled because of a bad
> assumption about the templatedness of a friend.
>
> Previously, we placed mangling aliases in the global namespace mapping an
> identifier (the mangled name) to the decl.  Resetting the assembler name
> didn't break this map -- but may have led to unneeded aliases, I guess.
>
> Moving the alias machinery to its own hash-map allowed me to make namespace
> hashes a simple hash, keyed by DECL_NAME (rather than a hash-map from
> identifier->decl).
>
> Finally converting the alias hash-map to a hash-table keyed by
> DECL_ASSEMBLER_NAME shrunk that table too.  But exposed this problem.
>
> I did contemplate reverting that change, but tried this first, and it seems
> good.
>
> 1) rename the args of COPY_DECL_ASSEMBLER_NAME from DECL1 & DECL2 to the
> more semantic SRC_DECL and DST_DECL.  Same for COPY_DECL_RTL.  These macros
> smell rather memcpy-like, but the args are the wrong way round, so the more
> clues the better. (My comment in 82737 was misled by this.)
>
> 2) change SET_DECL_ASSEMBLER_NAME to a function call, similar to
> DECL_ASSEMBLER_NAME.
>
> 3) have that call forward to a new lang hook, override_decl_assembler_name,
> if it is changing the name. (set_decl_assembler_name already having been
> taken.)
>
> 4) the new lang hook default simply stores the new name via
> DECL_ASSEMBLER_NAME_RAW.
>
> 5) the C++ FE overrides this.  If the current name is in the alias map and
> maps to this decl, we take it out of the map.  Then set the new name.
>
> 6) excitingly, mangle_decl can be called with a non-null
> DECL_ASSEMBLER_NAME, so that function's use of SET_DECL_ASSEMBLER_NAME works
> just fine.
>
> booted on all languages.
>
> ok?

Looks reasonable apart from

+  /* Overwrite the DECL_ASSEMBLER_NAME for a node.  The name is being
+ changed (including to or from NULL_TREE).  */

which suggests the default implementation of set_decl_assembler_name would
call this hook (which it doesn't).  Any particular reason?  Maybe just document
(including to NULL_TREE), thus exclude from NULL_TREE?

Thanks,
Richard.

>
> nathan
> --
> Nathan Sidwell


Re: Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-11-16 Thread Petr Ovtchenkov
On Thu, 16 Nov 2017 11:46:48 +0000
Jonathan Wakely  wrote:

> On 16/11/17 10:57 +0000, Jonathan Wakely wrote:
> >On 16/11/17 08:51 +0300, Petr Ovtchenkov wrote:
> >>On Mon, 6 Nov 2017 22:19:22 +0100
> >>François Dumont  wrote:
> >>
> >>>Hi
> >>>
> >>>     Any final decision regarding this patch ?
> >>>
> >>>François
> >>
> >>https://gcc.gnu.org/ml/libstdc++/2017-11/msg00036.html
> >>https://gcc.gnu.org/ml/libstdc++/2017-11/msg00035.html
> >>https://gcc.gnu.org/ml/libstdc++/2017-11/msg00037.html
> >>https://gcc.gnu.org/ml/libstdc++/2017-11/msg00034.html
> >
> >It would be helpful if you two could collaborate and come up with a
> >good solution, or at least discuss the pros and cons, instead of just
> >sending competing patches.
> 
> 
> Let me be more clear: I'm not going to review further patches in this
> area while you two are proposing different alternatives, without
> commenting on each other's approach.
> 
> If you think your solution is better than François's solution, you
> should explain why, not just send a different patch. If François
> thinks his solution is better than yours, he should state why, not
> just send a different patch.
> 
> I don't have time to infer all that from just your patches, so I'm not
> going to bother.
> 

The references here are only a notification that

   - there is another opinion;
   - discussion is in another thread.

Nothing more.


[PR c++/81060] ICE with invalid initializer via lambda

2017-11-16 Thread Nathan Sidwell
This patch fixes a regression my name lookup changes caused earlier this 
year.  We'd already emitted an error about a bogus template definition, 
but the fallout from that now killed us :(


I restored some of the handling r247909 removed.  Namely restoring the 
pushing of a lambda type into the scope enclosing an already-completed 
class.


Interestingly, this caused lambda-template13.C to start complaining 
about a not-defined template needing instantiating on a local (lambda) 
class.  That seems correct to me.  function's ctor is instantiated for 
the default arg in:


  void C::foo (T, function = [] {});

Applying to trunk.

nathan
--
Nathan Sidwell
2017-11-16  Nathan Sidwell  

	PR c++/81060
	* decl.c (xref_tag_1): Push lambda into current scope.
	* name-lookup.c (do_pushtag): Don't deal with ts_lambda here.

	PR c++/81060
	* g++.dg/cpp0x/lambda/lambda-template13.C: Avoid undefined
	template using local type error.
	* g++.dg/cpp0x/pr81060.C: New.

Index: cp/decl.c
===
--- cp/decl.c	(revision 254786)
+++ cp/decl.c	(working copy)
@@ -13546,8 +13546,12 @@ xref_tag_1 (enum tag_types tag_code, tre
 	  t = make_class_type (code);
 	  TYPE_CONTEXT (t) = context;
 	  if (scope == ts_lambda)
-	/* Mark it as a lambda type.  */
-	CLASSTYPE_LAMBDA_EXPR (t) = error_mark_node;
+	{
+	  /* Mark it as a lambda type.  */
+	  CLASSTYPE_LAMBDA_EXPR (t) = error_mark_node;
+	  /* And push it into current scope.  */
+	  scope = ts_current;
+	}
 	  t = pushtag (name, t, scope);
 	}
 }
Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 254786)
+++ cp/name-lookup.c	(working copy)
@@ -6242,9 +6242,7 @@ do_pushtag (tree name, tree type, tag_sc
 	view of the language.  */
 	 || (b->kind == sk_template_parms
 	 && (b->explicit_spec_p || scope == ts_global))
-	 /* Pushing into a class is ok for lambdas or when we want current  */
 	 || (b->kind == sk_class
-	 && scope != ts_lambda
 	 && (scope != ts_current
 		 /* We may be defining a new type in the initializer
 		of a static member variable. We allow this when
@@ -6267,7 +6265,6 @@ do_pushtag (tree name, tree type, tag_sc
 	  tree cs = current_scope ();
 
 	  if (scope == ts_current
-	  || scope == ts_lambda
 	  || (cs && TREE_CODE (cs) == FUNCTION_DECL))
 	context = cs;
 	  else if (cs && TYPE_P (cs))
@@ -6304,8 +6301,7 @@ do_pushtag (tree name, tree type, tag_sc
 
   if (b->kind == sk_class)
 	{
-	  if (!TYPE_BEING_DEFINED (current_class_type)
-	  && scope != ts_lambda)
+	  if (!TYPE_BEING_DEFINED (current_class_type))
 	return error_mark_node;
 
 	  if (!PROCESSING_REAL_TEMPLATE_DECL_P ())
Index: testsuite/g++.dg/cpp0x/lambda/lambda-template13.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-template13.C	(revision 254786)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-template13.C	(working copy)
@@ -4,7 +4,7 @@
 struct function
 {
   template < typename _Functor>
-  function (_Functor);
+  function (_Functor) {}
 };
 
 template 
Index: testsuite/g++.dg/cpp0x/pr81060.C
===
--- testsuite/g++.dg/cpp0x/pr81060.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/pr81060.C	(working copy)
@@ -0,0 +1,11 @@
+// { dg-do compile  { target c++11 } }
+// PR 81050 ICE in invalid after error
+
+template struct A
+{
+  static const int i;
+};
+
+template
+const int A::i // { dg-error "template definition of non-template" }
+= []{ return 0; }(); // BOOM!
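
For contrast with the ill-formed test case, a well-formed counterpart might
look like the sketch below. It assumes a single type parameter T, since the
archived copy of the test has lost its template parameter lists, and read_i is
a hypothetical helper added only to instantiate the member. The static data
member is defined for A<T> and initialized by immediately invoking a lambda.

```cpp
#include <cassert>

// Assumed shape of the class template: one type parameter.
template <typename T>
struct A {
  static const int i;
};

// Well-formed out-of-class definition: note the A<T>:: qualification.
// The initializer is a lambda that is called on the spot.
template <typename T>
const int A<T>::i = [] { return 0; }();

// Forces instantiation of A<int>::i and returns its value.
inline int read_i() { return A<int>::i; }
```

Dropping the <T> from the qualified name turns this into the kind of
"template definition of non-template" error the new test expects.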


Re: [RFC PATCH] Merge libsanitizer from upstream

2017-11-16 Thread Maxim Ostapenko

Hi Christophe,

On 13/11/17 15:47, Christophe Lyon wrote:

On 30 October 2017 at 16:21, Maxim Ostapenko  wrote:

On 30/10/17 17:08, Christophe Lyon wrote:

On 30/10/2017 11:12, Maxim Ostapenko wrote:

Hi,

sorry for the late response.

On 20/10/17 13:45, Christophe Lyon wrote:

Hi,

On 19 October 2017 at 13:17, Jakub Jelinek  wrote:

On Thu, Oct 19, 2017 at 02:07:24PM +0300, Maxim Ostapenko wrote:

Is the patch (the merge + this incremental) ok for trunk?

I think the patch is OK, just wondering about two things:

Richi just approved the patch on IRC, so I'll commit, then we can deal
with
follow-ups.


Does anyone else run these tests on arm?
Since you applied this patch, I'm seeing lots of new errors and
timeouts.
I have been ignoring regression reports for *san because of randomness
in the results, but the timeouts are a major inconvenience in testing
because it increases latency a lot in getting results, or worse I get no
result at all because the validation job is killed before completion.

Looking at some intermediate logs, I have noticed:
==24797==AddressSanitizer CHECK failed:
/libsanitizer/asan/asan_poisoning.cc:34
"((AddrIsAlignedByGranularity(addr))) != (0)" (0x0, 0x0)
  #0 0x408d7d65 in AsanCheckFailed /libsanitizer/asan/asan_rtl.cc:67
  #1 0x408ecd5d in __sanitizer::CheckFailed(char const*, int, char
const*, unsigned long long, unsigned long long)
/libsanitizer/sanitizer_common/sanitizer_termination.cc:77
  #2 0x408d22d5 in __asan::PoisonShadow(unsigned long, unsigned
long, unsigned char) /libsanitizer/asan/asan_poisoning.cc:34
  #3 0x4085409b in __asan_register_globals
/libsanitizer/asan/asan_globals.cc:368
  #4 0x109eb in _GLOBAL__sub_I_00099_1_ten

(/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-linux-gnueabi/gcc3/gcc/testsuite/gcc/alloca_big_alignment.exe+0x109eb)

in MANY (193 in gcc) tests.

and many others (152 in gcc) just time out individually (eg
c-c++-common/asan/alloca_instruments_all_paddings.c) with no error in
the logs besides Dejagnu's
WARNING: program timed out.


Since I'm using an apparently unusual setup, maybe I have to update it
to cope with the new version,
so I'd like to know if others are seeing the same problems on arm?

I'm using qemu -R 0 to execute the test programs, encapsulated by
proot (similar to chroot, but does not require root privileges).

Am I missing something obvious?


I've caught the same error on my Arndale board. The issue seems to be
quite obvious: after the merge, ASan requires the globals array to be
aligned to the shadow granularity.
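
As a rough illustration of the alignment requirement behind the failed
CHECK above, here is a minimal sketch. It assumes the usual 8-byte shadow
granularity (shadow scale 3); the helper names are hypothetical and are
not the libsanitizer functions themselves.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-byte shadow granularity (shadow scale 3).
constexpr unsigned kShadowScale = 3;
constexpr std::uintptr_t kShadowGranularity = std::uintptr_t{1} << kShadowScale;

// Hypothetical stand-in for the runtime's alignment check: an address is
// aligned when its low kShadowScale bits are all zero.
constexpr bool is_aligned_by_granularity(std::uintptr_t addr) {
  return (addr & (kShadowGranularity - 1)) == 0;
}

// Round an address up to the next granule boundary (no-op if already aligned).
constexpr std::uintptr_t align_up_to_granularity(std::uintptr_t addr) {
  return (addr + kShadowGranularity - 1) & ~(kShadowGranularity - 1);
}
```

Under these assumptions, a globals array starting at 0x1004 would fail the
check, while one starting at 0x1008 would pass.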


Thanks for confirming. I've spent a lot of time investigating the timeout
issues, which led to zombie processes and servers needing a reboot. I've
finally identified that going back to qemu-2.7 avoids the timeout issues
(I've reported a qemu bug).


This trivial patch seems to fix the issue. Could you check it on your
setup?


I was just about to finally start looking at this sanity check problem, so
thank you very much for sharing your patch.
I manually tested it on the subset of my configs and it solves the
assertion failure, thanks!
However, I notice many regressions compared to before the merge:
c-c++-common/asan/alloca_instruments_all_paddings.c
c-c++-common/asan/alloca_loop_unpoisoning.c
c-c++-common/asan/alloca_safe_access.c
c-c++-common/asan/asan-interface-1.c
c-c++-common/asan/halt_on_error-1.c
c-c++-common/asan/pr59063-1.c
c-c++-common/asan/pr59063-2.c
c-c++-common/asan/pr63316.c
c-c++-common/asan/pr63888.c
c-c++-common/asan/pr70712.c
c-c++-common/asan/pr71480.c
c-c++-common/asan/pr79944.c
c-c++-common/asan/pr80308.c
c-c++-common/asan/swapcontext-test-1.c
gcc.dg/asan/nosanitize-and-inline.c
gcc.dg/asan/pr79196.c
gcc.dg/asan/pr80166.c
gcc.dg/asan/pr81186.c
gcc.dg/asan/use-after-scope-11.c
gcc.dg/asan/use-after-scope-4.c
gcc.dg/asan/use-after-scope-6.c
gcc.dg/asan/use-after-scope-7.c
gcc.dg/asan/use-after-scope-goto-1.c
gcc.dg/asan/use-after-scope-goto-2.c
gcc.dg/asan/use-after-scope-switch-1.c
gcc.dg/asan/use-after-scope-switch-2.c
gcc.dg/asan/use-after-scope-switch-3.c
gcc.dg/asan/use-after-scope-switch-4.c

out of which only
c-c++-common/asan/swapcontext-test-1.c
c-c++-common/asan/halt_on_error-1.c
print something in gcc.log

Do they pass for you?


Ah, I see. The problem is that after this merge LSan was enabled for ARM.
LSan sets an atexit handler that calls the internal_clone function, which
is not supported in QEMU.
That's why these tests pass on the board, but fail under QEMU.
Could you try setting ASAN_OPTIONS=detect_leaks=0 in your environment?


Hi,

I have a followup on this issue, after investigating a bit more.

I filed a bug report against QEMU, and indeed it seems that it rejects
clone() as called by the sanitizers on purpose, because it cannot support
CLONE_UNTRACED.

That being said, I was wondering why the same tests worked "better"
with qemu-aarch64 (as opposed to qemu-arm). And I noticed that on aarch64,
we have sanitizer_common/sanitizer_syscall_linux_aarch64.inc where
internal_iserror

[PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Alan Hayward
This is a set of patches aimed at supporting aarch64 SVE register
preservation around TLS calls.

Across a TLS call, Aarch64 SVE does not explicitly preserve the
SVE vector registers. However, the Neon vector registers are preserved.
Because the registers overlap, this means the lower 128 bits of all
SVE vector registers will be preserved.

The existing GCC code will currently incorrectly assume preservation
of all of the SVE registers.

This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
The mode of the expression indicates the size of the lower bits which
will be preserved. If the register contains a value bigger than this
mode then the code will treat the register as clobbered.
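
The rule in the paragraph above can be sketched as follows. This is a
simplified, hypothetical model: it uses plain byte counts instead of GCC
machine modes, and the type and function names are illustrative only.

```cpp
#include <cassert>

// Hypothetical model of a CLOBBER_HIGH: one register plus the number of
// low bytes (the size of the expression's mode) that are preserved.
struct ClobberHigh {
  unsigned regno;
  unsigned preserved_bytes;
};

// A value held in REGNO is treated as clobbered only if it lives in the
// named register and is wider than the preserved low part.
constexpr bool value_is_clobbered(const ClobberHigh &ch, unsigned regno,
                                  unsigned value_bytes) {
  return regno == ch.regno && value_bytes > ch.preserved_bytes;
}
```

With a 16-byte (128-bit) preserved part, a 16-byte Neon value in the
register survives, whereas a wider SVE value in the same register is
treated as clobbered.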

This means that, in order to evaluate whether a clobber high is relevant,
we need to ensure the mode of the existing value in a register is tracked.

The following patches in this series add support for the CLOBBER_HIGH,
with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
aarch64. The testing performed on these patches is also detailed in the
final patch.

These patches are based on top of the linaro-dev/sve branch.

A simpler alternative to this patch would be to assume all Neon and SVE
registers are clobbered across TLS calls; however, this would be a
performance regression on all Aarch64 targets.

Alan.


2017-11-16  Alan Hayward  

* doc/rtl.texi (clobber_high): Add.
(parallel): Add in clobber high
* rtl.c (rtl_check_failed_code3): Add function.
* rtl.def (CLOBBER_HIGH): Add expression.
* rtl.h (RTL_CHECKC3): Add macro.
(rtl_check_failed_code3): Add declaration.
(XC3EXP): Add macro.


diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 
f583940b9441b2111c8d65a00a064e89bdd2ffaf..951322258ddbb57900225bd501bd23a8a9970ead
 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -3209,6 +3209,18 @@ There is one other known use for clobbering a pseudo 
register in a
 clobbered by the insn.  In this case, using the same pseudo register in
 the clobber and elsewhere in the insn produces the expected results.

+@findex clobber_high
+@item (clobber_high @var{x})
+Represents the storing or possible storing of an unpredictable,
+undescribed value into the upper parts of @var{x}. The mode of the expression
+represents the lower parts of the register which will not be overwritten.
+@code{reg} must be a reg expression.
+
+One place this is used is when calling into functions where the registers are
+preserved, but only up to a given number of bits.  For example when using
+Aarch64 SVE, calling a TLS descriptor will cause only the lower 128 bits of
+each of the vector registers to be preserved.
+
 @findex use
 @item (use @var{x})
 Represents the use of the value of @var{x}.  It indicates that the
@@ -3262,7 +3274,8 @@ Represents several side effects performed in parallel.  
The square
 brackets stand for a vector; the operand of @code{parallel} is a
 vector of expressions.  @var{x0}, @var{x1} and so on are individual
 side effect expressions---expressions of code @code{set}, @code{call},
-@code{return}, @code{simple_return}, @code{clobber} or @code{use}.
+@code{return}, @code{simple_return}, @code{clobber} @code{use} or
+@code{clobber_high}.

 ``In parallel'' means that first all the values used in the individual
 side-effects are computed, and second all the actual side-effects are
diff --git a/gcc/rtl.c b/gcc/rtl.c
index 
3b2728be8b506fb3c14a20297cf92368caa5ca3b..6db84f99627bb8617c6e227892ca44076f4e729b
 100644
--- a/gcc/rtl.c
+++ b/gcc/rtl.c
@@ -860,6 +860,17 @@ rtl_check_failed_code2 (const_rtx r, enum rtx_code code1, 
enum rtx_code code2,
 }

 void
+rtl_check_failed_code3 (const_rtx r, enum rtx_code code1, enum rtx_code code2,
+   enum rtx_code code3, const char *file, int line,
+   const char *func)
+{
+  internal_error
+("RTL check: expected code '%s', '%s' or '%s', have '%s' in %s, at %s:%d",
+ GET_RTX_NAME (code1), GET_RTX_NAME (code2), GET_RTX_NAME (code3),
+ GET_RTX_NAME (GET_CODE (r)), func, trim_filename (file), line);
+}
+
+void
 rtl_check_failed_code_mode (const_rtx r, enum rtx_code code, machine_mode mode,
bool not_mode, const char *file, int line,
const char *func)
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 
83bcfcaadcacc45cce352bf7fba33fbbc87ccd58..a6c4d4a46c4eb4f6cb0eca66a3f6a558f94acc8a
 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -312,6 +312,16 @@ DEF_RTL_EXPR(USE, "use", "e", RTX_EXTRA)
is considered undeletable before reload.  */
 DEF_RTL_EXPR(CLOBBER, "clobber", "e", RTX_EXTRA)

+/* Indicate that the upper parts of something are clobbered in a way that we
+   don't want to explain.  The MODE references the lower bits that will be
+   preserved.  Anything above that size will be clobbered.
+
+   CLOBBER_HIGH only occurs as the operand of a PARALL

[PATCH 3/7] Add func to check if register is clobbered by clobber_high

2017-11-16 Thread Alan Hayward
This patch adds the function reg_is_clobbered_by_clobber_high.
Given a CLOBBER_HIGH expression and a register, it checks if
the register will be clobbered.

A second version exists for the cases where the expressions are
not available.

The function will be used throughout the following patches.
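
As a rough sketch of the check described above, the following simplified
model handles a value that may span several hard registers. It is an
assumption-laden illustration, not the patch's code: sizes are plain byte
counts rather than machine modes and poly-int sizes, the VOIDmode case is
ignored, and all names are hypothetical.

```cpp
#include <cassert>

// Simplified model: a value of VALUE_BYTES occupies NREGS consecutive hard
// registers starting at REGNO (NREGS must be at least 1). The clobber high
// names the single register CLOBBER_REGNO and preserves its low
// PRESERVED_BYTES bytes.
bool is_clobbered_by_clobber_high(unsigned regno, unsigned nregs,
                                  unsigned value_bytes,
                                  unsigned clobber_regno,
                                  unsigned preserved_bytes) {
  // The clobber must name one of the registers the value occupies.
  if (clobber_regno < regno || clobber_regno >= regno + nregs)
    return false;

  // The value is clobbered if its per-register share is wider than the
  // part of the register that is preserved.
  return value_bytes / nregs > preserved_bytes;
}
```

For example, a 32-byte value spread over two registers uses only 16 bytes
of each, so a clobber high that preserves 16 bytes leaves it intact.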

Alan.

2017-11-16  Alan Hayward  

* rtl.h (reg_is_clobbered_by_clobber_high): Add declarations.
* rtlanal.c (reg_is_clobbered_by_clobber_high): Add function.

diff --git a/gcc/rtl.h b/gcc/rtl.h
index 
bdb05d00120e7fadeb7f2d29bd67afc7a77262c1..5a85eb42ea4455cf3a975b3adbdb9d0415441d3b
 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3417,6 +3417,16 @@ extern bool tablejump_p (const rtx_insn *, rtx_insn **, 
rtx_jump_table_data **);
 extern int computed_jump_p (const rtx_insn *);
 extern bool tls_referenced_p (const_rtx);
 extern bool contains_mem_rtx_p (rtx x);
+extern bool reg_is_clobbered_by_clobber_high (unsigned int, machine_mode,
+ const_rtx);
+
+/* Convenient wrapper for reg_is_clobbered_by_clobber_high.  */
+inline bool
+reg_is_clobbered_by_clobber_high (const_rtx x, const_rtx clobber_high_op)
+{
+  return reg_is_clobbered_by_clobber_high (REGNO (x), GET_MODE (x),
+  clobber_high_op);
+}

 /* Overload for refers_to_regno_p for checking a single register.  */
 inline bool
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 
79a5ae197c14ba338240123f2fc912f2ea60e178..923e3314d25c05f9055907c61b4a24186701cc23
 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -6519,3 +6519,32 @@ tls_referenced_p (const_rtx x)
   return true;
   return false;
 }
+
+/* Return true if reg REGNO with mode REG_MODE would be clobbered by the
+   clobber_high operand in CLOBBER_HIGH_OP.  */
+
+bool
+reg_is_clobbered_by_clobber_high (unsigned int regno, machine_mode reg_mode,
+ const_rtx clobber_high_op)
+{
+  unsigned int clobber_regno = REGNO (clobber_high_op);
+  machine_mode clobber_mode = GET_MODE (clobber_high_op);
+  unsigned char regno_nregs = hard_regno_nregs (regno, reg_mode);
+
+  /* Clobber high should always span exactly one register.  */
+  gcc_assert (REG_NREGS (clobber_high_op) == 1);
+
+  /* Clobber high needs to match with one of the registers in X.  */
+  if (clobber_regno < regno || clobber_regno >= regno + regno_nregs)
+return false;
+
+  gcc_assert (reg_mode != BLKmode && clobber_mode != BLKmode);
+
+  if (reg_mode == VOIDmode)
+return clobber_mode != VOIDmode;
+
+  /* Clobber high will clobber if its size might be greater than the size of
+ register regno.  */
+  return may_gt (exact_div (GET_MODE_SIZE (reg_mode), regno_nregs),
+GET_MODE_SIZE (clobber_mode));
+}



[PATCH 2/7] Support >26 operands in generation code.

2017-11-16 Thread Alan Hayward
This patch adds support for CLOBBER_HIGH in the generation code.

Aarch64 will require 31 clobber high expressions, plus two
clobbers.

The existing gen code is restricted to 26 vector operands by virtue
of using the operators [a-z]. This patch extends this to 52 by
supporting [a-zA-Z].
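
The extension from 26 to 52 operands can be pictured as a mapping between
operand indices and path characters. The sketch below is hypothetical: the
helper names are illustrative, and the ordering (lowercase first, then
uppercase) is an assumption rather than something taken from the patch.

```cpp
#include <cassert>

// Map a vector operand index in [0, 52) to a path character:
// 0..25 -> 'a'..'z', 26..51 -> 'A'..'Z' (assumed ordering).
constexpr char operand_index_to_char(int i) {
  return i < 26 ? static_cast<char>('a' + i)
                : static_cast<char>('A' + (i - 26));
}

// Inverse mapping; returns -1 for characters outside [a-zA-Z].
constexpr int operand_char_to_index(char c) {
  return (c >= 'a' && c <= 'z')   ? c - 'a'
         : (c >= 'A' && c <= 'Z') ? c - 'A' + 26
                                  : -1;
}
```

Under this assumed encoding, the 27th operand (index 26) is the first one
that needs an uppercase letter.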

Alan.

2017-11-16  Alan Hayward  

* emit-rtl.c (verify_rtx_sharing): Check CLOBBER_HIGH.
(copy_insn_1): Likewise.
(gen_hard_reg_clobber_high): Add function.
* genconfig.c (walk_insn_part): Check CLOBBER_HIGH.
* genemit.c (gen_exp): Likewise.
(gen_emit_seq): Pass thru info.
(gen_insn): Check CLOBBER_HIGH.
(gen_expand): Pass thru info.
(gen_split): Likewise.
(output_add_clobbers): Likewise.
* genextract.c (push_pathstr_operand): New function to
support [a-zA-Z].
(walk_rtx): Call push_pathstr_operand.
(print_path): Support [a-zA-Z].
* genrecog.c (validate_pattern): Check CLOBBER_HIGH.
(remove_clobbers): Likewise.
* rtl.h (gen_hard_reg_clobber_high): Add declaration.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 
af4a038d75acf17c7b04ad58ab7467f7bd7cd129..64159a82e3f79b792d58166d4307db8751c88980
 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -2895,6 +2895,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
   /* SCRATCH must be shared because they represent distinct values.  */
   return;
 case CLOBBER:
+case CLOBBER_HIGH:
   /* Share clobbers of hard registers (like cc0), but do not share pseudo 
reg
  clobbers or clobbers of hard registers that originated as pseudos.
  This is needed to allow safe register renaming.  */
@@ -3148,6 +3149,7 @@ repeat:
   /* SCRATCH must be shared because they represent distinct values.  */
   return;
 case CLOBBER:
+case CLOBBER_HIGH:
   /* Share clobbers of hard registers (like cc0), but do not share pseudo 
reg
  clobbers or clobbers of hard registers that originated as pseudos.
  This is needed to allow safe register renaming.  */
@@ -5707,6 +5709,7 @@ copy_insn_1 (rtx orig)
 case SIMPLE_RETURN:
   return orig;
 case CLOBBER:
+case CLOBBER_HIGH:
   /* Share clobbers of hard registers (like cc0), but do not share pseudo 
reg
  clobbers or clobbers of hard registers that originated as pseudos.
  This is needed to allow safe register renaming.  */
@@ -6529,6 +6532,19 @@ gen_hard_reg_clobber (machine_mode mode, unsigned int 
regno)
gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (mode, regno)));
 }

+static GTY((deletable)) rtx
+hard_reg_clobbers_high[NUM_MACHINE_MODES][FIRST_PSEUDO_REGISTER];
+
+rtx
+gen_hard_reg_clobber_high (machine_mode mode, unsigned int regno)
+{
+  if (hard_reg_clobbers_high[mode][regno])
+return hard_reg_clobbers_high[mode][regno];
+  else
+return (hard_reg_clobbers_high[mode][regno]
+   = gen_rtx_CLOBBER_HIGH (VOIDmode, gen_rtx_REG (mode, regno)));
+}
+
 location_t prologue_location;
 location_t epilogue_location;

diff --git a/gcc/genconfig.c b/gcc/genconfig.c
index 
4ff36cb019d427f410d9f251777b9b05217fac36..4108e9c457fce5529ec9a3284d37f933736776ad
 100644
--- a/gcc/genconfig.c
+++ b/gcc/genconfig.c
@@ -72,6 +72,7 @@ walk_insn_part (rtx part, int recog_p, int non_pc_set_src)
   switch (code)
 {
 case CLOBBER:
+case CLOBBER_HIGH:
   clobbers_seen_this_insn++;
   break;

diff --git a/gcc/genemit.c b/gcc/genemit.c
index 
708da27221546c406030e88a4b07a51fb9df4a14..4e93b6b9831c65fc829fed6367881233b8eddcac
 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -79,7 +79,7 @@ gen_rtx_scratch (rtx x, enum rtx_code subroutine_type)
substituting any operand references appearing within.  */

 static void
-gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
+gen_exp (rtx x, enum rtx_code subroutine_type, char *used, md_rtx_info *info)
 {
   RTX_CODE code;
   int i;
@@ -123,7 +123,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
   for (i = 0; i < XVECLEN (x, 1); i++)
{
  printf (",\n\t\t");
- gen_exp (XVECEXP (x, 1, i), subroutine_type, used);
+ gen_exp (XVECEXP (x, 1, i), subroutine_type, used, info);
}
   printf (")");
   return;
@@ -137,7 +137,7 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
   for (i = 0; i < XVECLEN (x, 2); i++)
{
  printf (",\n\t\t");
- gen_exp (XVECEXP (x, 2, i), subroutine_type, used);
+ gen_exp (XVECEXP (x, 2, i), subroutine_type, used, info);
}
   printf (")");
   return;
@@ -163,12 +163,21 @@ gen_exp (rtx x, enum rtx_code subroutine_type, char *used)
 case CLOBBER:
   if (REG_P (XEXP (x, 0)))
{
- printf ("gen_hard_reg_clobber (%smode, %i)", GET_MODE_NAME (GET_MODE 
(XEXP (x, 0))),
-REGNO (XEXP (x, 0)));
+ printf ("gen_hard_reg_clobber (%smode, %i)",

[PATCH 4/7]: lra support for clobber_high

2017-11-16 Thread Alan Hayward
This patch simply adds the lra specific changes for clobber_high.

Alan.


2017-11-16  Alan Hayward  

* lra-eliminations.c (lra_eliminate_regs_1): Check for clobber high.
(mark_not_eliminable): Likewise.
* lra-int.h (struct lra_insn_reg): Add clobber high marker.
* lra-lives.c (process_bb_lives): Check for clobber high. 
* lra.c (new_insn_reg): Remember clobber highs.
(collect_non_operand_hard_regs): Check for clobber high.
(lra_set_insn_recog_data): Likewise.
(add_regs_to_insn_regno_info): Likewise.
(lra_update_insn_regno_info): Likewise.


diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c
index 
bea8b023b7cb7a512f7482a2f014647c30462870..251e539530456722e3f4231b928c2124f9d602b6
 100644
--- a/gcc/lra-eliminations.c
+++ b/gcc/lra-eliminations.c
@@ -654,6 +654,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode 
mem_mode,
   return x;

 case CLOBBER:
+case CLOBBER_HIGH:
 case SET:
   gcc_unreachable ();

@@ -806,6 +807,16 @@ mark_not_eliminable (rtx x, machine_mode mem_mode)
setup_can_eliminate (ep, false);
   return;

+case CLOBBER_HIGH:
+  gcc_assert (REG_P (XEXP (x, 0)));
+  gcc_assert (REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER);
+  for (ep = reg_eliminate;
+  ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+  ep++)
+   if (reg_is_clobbered_by_clobber_high (ep->to_rtx, XEXP (x, 0)))
+ setup_can_eliminate (ep, false);
+  return;
+
 case SET:
   if (SET_DEST (x) == stack_pointer_rtx
  && GET_CODE (SET_SRC (x)) == PLUS
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 
6c219eacee3940054ed480a44cda1ed07993ad16..be439b95e9cedb358e9ba6c63f8a9490af5c816d
 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -166,6 +166,8 @@ struct lra_insn_reg
   /* True if there is an early clobber alternative for this
  operand.  */
   unsigned int early_clobber : 1;
+  /* True if the reg is clobber highed by the operand.  */
+  unsigned int clobber_high : 1;
   /* The corresponding regno of the register.  */
   int regno;
   /* Next reg info of the same insn.  */
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 
df7e2537dd09a4abd5ce517f4bb556cc32000fa0..82a1811f837b425a326753c0d956149382d752cb
 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -655,7 +655,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool 
dead_insn_p)
   bool call_p;
   int n_alt, dst_regno, src_regno;
   rtx set;
-  struct lra_insn_reg *reg;
+  struct lra_insn_reg *reg, *hr;

   if (!NONDEBUG_INSN_P (curr_insn))
continue;
@@ -687,11 +687,12 @@ process_bb_lives (basic_block bb, int &curr_point, bool 
dead_insn_p)
break;
  }
  for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-   if (reg->type != OP_IN)
+   if (reg->type != OP_IN && !reg->clobber_high)
  {
remove_p = false;
break;
  }
+
  if (remove_p && ! volatile_refs_p (PATTERN (curr_insn)))
{
  dst_regno = REGNO (SET_DEST (set));
@@ -809,14 +810,24 @@ process_bb_lives (basic_block bb, int &curr_point, bool 
dead_insn_p)
 unused values because they still conflict with quantities
 that are live at the time of the definition.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-   if (reg->type != OP_IN)
- {
-   need_curr_point_incr
- |= mark_regno_live (reg->regno, reg->biggest_mode,
- curr_point);
-   check_pseudos_live_through_calls (reg->regno,
- last_call_used_reg_set);
- }
+   {
+ if (reg->type != OP_IN)
+   {
+ need_curr_point_incr
+   |= mark_regno_live (reg->regno, reg->biggest_mode,
+   curr_point);
+ check_pseudos_live_through_calls (reg->regno,
+   last_call_used_reg_set);
+   }
+
+ if (reg->regno >= FIRST_PSEUDO_REGISTER)
+   for (hr = curr_static_id->hard_regs; hr != NULL; hr = hr->next)
+ if (hr->clobber_high
+ && may_gt (GET_MODE_SIZE (PSEUDO_REGNO_MODE (reg->regno)),
+GET_MODE_SIZE (hr->biggest_mode)))
+   SET_HARD_REG_BIT (lra_reg_info[reg->regno].conflict_hard_regs,
+ hr->regno);
+   }

   for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
if (reg->type != OP_IN)
diff --git a/gcc/lra.c b/gcc/lra.c
index 
8d44c75b0b4f89ff9fe94d0b8dfb2e77d43fee26..d6775d629655700484760ab85f78b9fd16189ca0
 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -535,13 +535,14 @@ object_allocator lra_insn_reg_pool ("insn 
regs");
clobbered in the insn (EARLY_CLOBBER), and reference to the next
 

[PATCH 5/7]: cse support for clobber_high

2017-11-16 Thread Alan Hayward
This patch simply adds the cse specific changes for clobber_high.

Alan.

2017-11-16  Alan Hayward  

* cse.c (invalidate_reg): New function extracted from...
(invalidate): ...here.
(canonicalize_insn): Check for clobber high.
(invalidate_from_clobbers): Invalidate clobber highs.
(invalidate_from_sets_and_clobbers): Likewise.
(count_reg_usage): Check for clobber high.
(insn_live_p): Likewise.
* cselib.c (cselib_expand_value_rtx_1): Likewise.
(cselib_invalidate_regno): Check for clobber in setter.
(cselib_invalidate_rtx): Pass through setter.
(cselib_invalidate_rtx_note_stores):
(cselib_process_insn): Check for clobber high.
* cselib.h (cselib_invalidate_rtx): Add operand.

diff --git a/gcc/cse.c b/gcc/cse.c
index 
e3c0710215df0acca924ce74ffa54582125d0136..f15ae8693fbb243323dd049e21648a49546a4608
 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -559,6 +559,7 @@ static struct table_elt *insert_with_costs (rtx, struct 
table_elt *, unsigned,
 static struct table_elt *insert (rtx, struct table_elt *, unsigned,
 machine_mode);
 static void merge_equiv_classes (struct table_elt *, struct table_elt *);
+static void invalidate_reg (rtx, bool);
 static void invalidate (rtx, machine_mode);
 static void remove_invalid_refs (unsigned int);
 static void remove_invalid_subreg_refs (unsigned int, poly_uint64,
@@ -1818,7 +1819,85 @@ check_dependence (const_rtx x, rtx exp, machine_mode 
mode, rtx addr)
 }
   return false;
 }
-

+
+/* Remove from the hash table, or mark as invalid, all expressions whose
+   values could be altered by storing in register X.
+
+   CLOBBER_HIGH is set if X was part of a CLOBBER_HIGH expression.  */
+
+static void
+invalidate_reg (rtx x, bool clobber_high)
+{
+  gcc_assert (GET_CODE (x) == REG);
+
+  /* If X is a register, dependencies on its contents are recorded
+ through the qty number mechanism.  Just change the qty number of
+ the register, mark it as invalid for expressions that refer to it,
+ and remove it itself.  */
+  unsigned int regno = REGNO (x);
+  unsigned int hash = HASH (x, GET_MODE (x));
+
+  /* Remove REGNO from any quantity list it might be on and indicate
+ that its value might have changed.  If it is a pseudo, remove its
+ entry from the hash table.
+
+ For a hard register, we do the first two actions above for any
+ additional hard registers corresponding to X.  Then, if any of these
+ registers are in the table, we must remove any REG entries that
+ overlap these registers.  */
+
+  delete_reg_equiv (regno);
+  REG_TICK (regno)++;
+  SUBREG_TICKED (regno) = -1;
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+{
+  gcc_assert (!clobber_high);
+  remove_pseudo_from_table (x, hash);
+}
+  else
+{
+  HOST_WIDE_INT in_table = TEST_HARD_REG_BIT (hard_regs_in_table, regno);
+  unsigned int endregno = END_REGNO (x);
+  unsigned int rn;
+  struct table_elt *p, *next;
+
+  CLEAR_HARD_REG_BIT (hard_regs_in_table, regno);
+
+  for (rn = regno + 1; rn < endregno; rn++)
+   {
+ in_table |= TEST_HARD_REG_BIT (hard_regs_in_table, rn);
+ CLEAR_HARD_REG_BIT (hard_regs_in_table, rn);
+ delete_reg_equiv (rn);
+ REG_TICK (rn)++;
+ SUBREG_TICKED (rn) = -1;
+   }
+
+  if (in_table)
+   for (hash = 0; hash < HASH_SIZE; hash++)
+ for (p = table[hash]; p; p = next)
+   {
+ next = p->next_same_hash;
+
+ if (!REG_P (p->exp) || REGNO (p->exp) >= FIRST_PSEUDO_REGISTER)
+   continue;
+
+ if (clobber_high)
+   {
+ if (reg_is_clobbered_by_clobber_high (p->exp, x))
+   remove_from_table (p, hash);
+   }
+ else
+   {
+ unsigned int tregno = REGNO (p->exp);
+ unsigned int tendregno = END_REGNO (p->exp);
+ if (tendregno > regno && tregno < endregno)
+   remove_from_table (p, hash);
+   }
+   }
+}
+}
+
 /* Remove from the hash table, or mark as invalid, all expressions whose
values could be altered by storing in X.  X is a register, a subreg, or
a memory reference with nonvarying address (because, when a memory
@@ -1841,65 +1920,7 @@ invalidate (rtx x, machine_mode full_mode)
   switch (GET_CODE (x))
 {
 case REG:
-  {
-   /* If X is a register, dependencies on its contents are recorded
-  through the qty number mechanism.  Just change the qty number of
-  the register, mark it as invalid for expressions that refer to it,
-  and remove it itself.  */
-   unsigned int regno = REGNO (x);
-   unsigned int hash = HASH (x, GET_MODE (x));
-
-   /* Remove REGNO from any quantity list it might be on and indicate
-  that its value might have changed.  If it is a pse

[PATCH 6/7]: Remaining support for clobber high

2017-11-16 Thread Alan Hayward
This patch simply adds the remainder of the clobber high checks.
Happy to split this into smaller patches if required (there didn't
seem to be any obvious way to split it).


Alan.

2017-11-16  Alan Hayward  

* alias.c (record_set): Check for clobber high.
* cfgexpand.c (expand_gimple_stmt): Likewise.
* combine-stack-adj.c (single_set_for_csa): Likewise.
* combine.c (find_single_use_1): Likewise.
(set_nonzero_bits_and_sign_copies): Likewise.
(get_combine_src_dest): Likewise.
(is_parallel_of_n_reg_sets): Likewise.
(try_combine): Likewise.
(record_dead_and_set_regs_1): Likewise.
(reg_dead_at_p_1): Likewise.
(reg_dead_at_p): Likewise.
* dce.c (deletable_insn_p): Likewise.
(mark_nonreg_stores_1): Likewise.
(mark_nonreg_stores_2): Likewise.
* df-scan.c (df_find_hard_reg_defs): Likewise.
(df_uses_record): Likewise.
(df_get_call_refs): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* haifa-sched.c (haifa_classify_rtx): Likewise.
* ira-build.c (create_insn_allocnos): Likewise.
* ira-costs.c (scan_one_insn): Likewise.
* ira.c (equiv_init_movable_p): Likewise.
(rtx_moveable_p): Likewise.
(interesting_dest_for_shprep): Likewise.
* jump.c (mark_jump_label_1): Likewise.
* postreload-gcse.c (record_opr_changes): Likewise.
* postreload.c (reload_cse_simplify): Likewise.
(struct reg_use): Add source expr.
(reload_combine): Check for clobber high.
(reload_combine_note_use): Likewise.
(reload_cse_move2add): Likewise.
(move2add_note_store): Likewise.
* print-rtl.c (print_pattern): Likewise.
* recog.c (decode_asm_operands): Likewise.
(store_data_bypass_p): Likewise.
(if_test_bypass_p): Likewise.
* regcprop.c (kill_clobbered_value): Likewise.
(kill_set_value): Likewise.
* reginfo.c (reg_scan_mark_refs): Likewise.
* reload1.c (maybe_fix_stack_asms): Likewise.
(eliminate_regs_1): Likewise.
(elimination_effects): Likewise.
(mark_not_eliminable): Likewise.
(scan_paradoxical_subregs): Likewise.
(forget_old_reloads_1): Likewise.
* reorg.c (find_end_label): Likewise.
(try_merge_delay_insns): Likewise.
(redundant_insn): Likewise.
(own_thread_p): Likewise.
(fill_simple_delay_slots): Likewise.
(fill_slots_from_thread): Likewise.
(dbr_schedule): Likewise.
* resource.c (update_live_status): Likewise.
(mark_referenced_resources): Likewise.
(mark_set_resources): Likewise.
* rtl.c (copy_rtx): Likewise.
* rtlanal.c (reg_referenced_p): Likewise.
(single_set_2): Likewise.
(noop_move_p): Likewise.
(note_stores): Likewise.
* sched-deps.c (sched_analyze_reg): Likewise.
(sched_analyze_insn): Likewise.


diff --git a/gcc/alias.c b/gcc/alias.c
index 
c69ef410edac2ab0ab93e8ec9fe4c89a7078c001..6a6734bd7d5732c255c009be47e68aa073a9ebb1
 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -1554,6 +1554,17 @@ record_set (rtx dest, const_rtx set, void *data 
ATTRIBUTE_UNUSED)
  new_reg_base_value[regno] = 0;
  return;
}
+  /* A CLOBBER_HIGH only wipes out the old value if the mode of the old
+value is greater than that of the clobber.  */
+  else if (GET_CODE (set) == CLOBBER_HIGH)
+   {
+ if (new_reg_base_value[regno] != 0
+ && reg_is_clobbered_by_clobber_high (
+  regno, GET_MODE (new_reg_base_value[regno]), XEXP (set, 0)))
+   new_reg_base_value[regno] = 0;
+ return;
+   }
+
   src = SET_SRC (set);
 }
   else
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 
06a8af8a1663c9e518a8169650a0c9969990df1f..ea6fc265f757543cff635a805fd4045a10add23e
 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3803,6 +3803,7 @@ expand_gimple_stmt (gimple *stmt)
  /* If we want exceptions for non-call insns, any
 may_trap_p instruction may throw.  */
  && GET_CODE (PATTERN (insn)) != CLOBBER
+ && GET_CODE (PATTERN (insn)) != CLOBBER_HIGH
  && GET_CODE (PATTERN (insn)) != USE
  && insn_could_throw_p (insn))
make_reg_eh_region_note (insn, 0, lp_nr);
diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index 
09f0be814f98922b6926a929401894809a890f61..595e83c73760a97e0f8ebd99e12b1853d1d52b92
 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -133,6 +133,7 @@ single_set_for_csa (rtx_insn *insn)
  && SET_SRC (this_rtx) == SET_DEST (this_rtx))
;
   else if (GET_CODE (this_rtx) != CLOBBER
+  && GET_CODE (this_rtx) != CLOBBER_HIGH
   && GET_CODE (this_rtx) != USE)
return NULL_RTX;
 }
diff --git a/gcc/combine.c b/gcc

[PATCH 7/7]: Enable clobber high for tls descs on Aarch64

2017-11-16 Thread Alan Hayward
This final patch adds the clobber high expressions to tls_desc for aarch64.
It also adds three tests.

In addition, I tested by taking the gcc torture test suite and making
all global variables __thread. I then amended the suite to compile with
-fpic, save the .s file, and run at only one given optimization level.
I ran this before and after the patch and compared the resulting .s files,
ensuring that there were no ASM changes.
I discarded the 10% of tests that failed to compile (due to the code in
the test now being invalid C).
I did this for -O0, -O2 and -O3 on both x86 and aarch64 and observed no
difference between ASM files before and after the patch.

Alan.

2017-11-16  Alan Hayward  

gcc/
* config/aarch64/aarch64.md: Add clobber highs to tls_desc.

gcc/testsuite/  
* gcc.target/aarch64/sve_tls_preserve_1.c: New test.
* gcc.target/aarch64/sve_tls_preserve_2.c: New test.
* gcc.target/aarch64/sve_tls_preserve_3.c: New test.



diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
6a15ff0b61d775cf30189b8503cfa45987701228..1f332b254fe0e37954efbe92982f214100d7046f
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -57,7 +57,36 @@
 (LR_REGNUM 30)
 (SP_REGNUM 31)
 (V0_REGNUM 32)
+(V1_REGNUM 33)
+(V2_REGNUM 34)
+(V3_REGNUM 35)
+(V4_REGNUM 36)
+(V5_REGNUM 37)
+(V6_REGNUM 38)
+(V7_REGNUM 39)
+(V8_REGNUM 40)
+(V9_REGNUM 41)
+(V10_REGNUM 42)
+(V11_REGNUM 43)
+(V12_REGNUM 44)
+(V13_REGNUM 45)
+(V14_REGNUM 46)
 (V15_REGNUM 47)
+(V16_REGNUM 48)
+(V17_REGNUM 49)
+(V18_REGNUM 50)
+(V19_REGNUM 51)
+(V20_REGNUM 52)
+(V21_REGNUM 53)
+(V22_REGNUM 54)
+(V23_REGNUM 55)
+(V24_REGNUM 56)
+(V25_REGNUM 57)
+(V26_REGNUM 58)
+(V27_REGNUM 59)
+(V28_REGNUM 60)
+(V29_REGNUM 61)
+(V30_REGNUM 62)
 (V31_REGNUM 63)
 (LAST_SAVED_REGNUM 63)
 (SFP_REGNUM 64)
@@ -5745,6 +5774,38 @@
   UNSPEC_TLSDESC))
(clobber (reg:DI LR_REGNUM))
(clobber (reg:CC CC_REGNUM))
+   (clobber_high (reg:TI V0_REGNUM))
+   (clobber_high (reg:TI V1_REGNUM))
+   (clobber_high (reg:TI V2_REGNUM))
+   (clobber_high (reg:TI V3_REGNUM))
+   (clobber_high (reg:TI V4_REGNUM))
+   (clobber_high (reg:TI V5_REGNUM))
+   (clobber_high (reg:TI V6_REGNUM))
+   (clobber_high (reg:TI V7_REGNUM))
+   (clobber_high (reg:TI V8_REGNUM))
+   (clobber_high (reg:TI V9_REGNUM))
+   (clobber_high (reg:TI V10_REGNUM))
+   (clobber_high (reg:TI V11_REGNUM))
+   (clobber_high (reg:TI V12_REGNUM))
+   (clobber_high (reg:TI V13_REGNUM))
+   (clobber_high (reg:TI V14_REGNUM))
+   (clobber_high (reg:TI V15_REGNUM))
+   (clobber_high (reg:TI V16_REGNUM))
+   (clobber_high (reg:TI V17_REGNUM))
+   (clobber_high (reg:TI V18_REGNUM))
+   (clobber_high (reg:TI V19_REGNUM))
+   (clobber_high (reg:TI V20_REGNUM))
+   (clobber_high (reg:TI V21_REGNUM))
+   (clobber_high (reg:TI V22_REGNUM))
+   (clobber_high (reg:TI V23_REGNUM))
+   (clobber_high (reg:TI V24_REGNUM))
+   (clobber_high (reg:TI V25_REGNUM))
+   (clobber_high (reg:TI V26_REGNUM))
+   (clobber_high (reg:TI V27_REGNUM))
+   (clobber_high (reg:TI V28_REGNUM))
+   (clobber_high (reg:TI V29_REGNUM))
+   (clobber_high (reg:TI V30_REGNUM))
+   (clobber_high (reg:TI V31_REGNUM))
(clobber (match_scratch:DI 1 "=r"))]
   "TARGET_TLS_DESC"
   "adrp\\tx0, %A0\;ldr\\t%1, [x0, #%L0]\;add\\t0, 0, 
%L0\;.tlsdesccall\\t%0\;blr\\t%1"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_1.c b/gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_1.c
new file mode 100644
index 
..5bad829568130181ef1ab386545bd3ee164c322e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fpic -march=armv8-a+sve" } */
+
+/* Clobber highs do not need to be spilled around tls usage.  */
+
+typedef float v4si __attribute__ ((vector_size (16)));
+
+__thread v4si tx;
+
+v4si foo (v4si a, v4si b, v4si c)
+{
+  v4si y;
+
+  y = a + tx + b + c;
+
+  return y + 7;
+}
+
+/* { dg-final { scan-assembler-not {\tstr\t} } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_2.c b/gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_2.c
new file mode 100644
index 
..69e8829287b8418c28f8c227391c4f8d2186ea63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve_tls_preserve_2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options

[Ada] Spurious error on System'To_Address in -gnatc mode

2017-11-16 Thread Pierre-Marie de Rodat
This patch fixes a bug where if an address clause specifies a call to
System'To_Address as the address, and the code is compiled with the
-gnatc switch, the compiler gives a spurious error message.

The following test should compile quietly with -gnatc:

gcc -c -gnatc counter.ads

with System;

package Counter is
   type Bar is
  record
 X : Integer;
 Y : Integer;
  end record;

   Null_Bar : constant Bar := (0, 0);

   Address : constant := 16#D000_#;

   Foo : Bar := Null_Bar;
   for Foo'Address use System'To_Address (Address);
end Counter;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Bob Duff  

* sem_ch13.adb (Check_Expr_Constants): Avoid error message in case of
System'To_Address.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 254797)
+++ sem_ch13.adb(working copy)
@@ -9783,6 +9783,15 @@
then
   Check_At_Constant_Address (Prefix (Nod));
 
+   --  Normally, System'To_Address will have been transformed into
+   --  an Unchecked_Conversion, but in -gnatc mode, it will not,
+   --  and we don't want to give an error, because the whole point
+   --  of 'To_Address is that it is static.
+
+   elsif Attribute_Name (Nod) = Name_To_Address then
+  pragma Assert (Operating_Mode = Check_Semantics);
+  null;
+
else
   Check_Expr_Constants (Prefix (Nod));
   Check_List_Constants (Expressions (Nod));


Re: Hurd port for gcc-7 go PATCH 1-3(15)

2017-11-16 Thread Svante Signell
On Wed, 2017-11-15 at 21:54 +0100, Svante Signell wrote:
> On Wed, 2017-11-15 at 21:40 +0100, Matthias Klose wrote:
> > On 06.11.2017 16:36, Svante Signell wrote:
> > > Hi,
> > > 
> > > Attached are patches to enable gccgo to build properly on Debian
> > > GNU/Hurd on gcc-7 (7-7.2.0-12).
> > 
> > sysinfo.go:6744:7: error: redefinition of 'SYS_IOCTL'
> >  const SYS_IOCTL = _SYS_ioctl
> >    ^
> > sysinfo.go:6403:7: note: previous definition of 'SYS_IOCTL' was here
> >  const SYS_IOCTL = 0
> >    ^
> > the patches break the build on any Linux architecture.  Please could you
> > test your patches against a linux target as well?
> 
> I'm really sorry. I regularly do that, but missed this one for gcc-7. Do you
> mean the patches against gcc-8 you asked me for? You wrote that gcc-7 is not
> of interest and I should concentrate on gcc-8.
> 
> Again, I'm really sorry. Will fix this tomorrow hopefully.
> 
> Thanks!

Attached is an updated patch for gcc-7. An updated patch for gcc-8 will follow
shortly, once I have build-tested gcc-8 go on both Linux and Hurd.

The patch for src/libgo/mksysinfo.sh worked fine in gcc-5 and gcc-6. The problem
is that in gcc-7 and gcc-8 the generation of build//libgo/sysinfo.go
is done differently.

The Hurd-specific entry about SYS_IOCTL had to be moved after:

# The syscall numbers.  We force the names to upper case.
grep '^const _SYS_' gen-sysinfo.go | \
  sed -e 's/const _\(SYS_[^= ]*\).*$/\1/' | \
  while read sys; do
sup=`echo $sys | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ`
echo "const $sup = _$sys" >> ${OUT}
  done

Thanks!

Index: gcc-7-7.2.0/src/libgo/configure.ac
===
--- gcc-7-7.2.0.orig/src/libgo/configure.ac
+++ gcc-7-7.2.0/src/libgo/configure.ac
@@ -146,7 +146,7 @@ AC_SUBST(go_include)
 # All known GOOS values.  This is the union of all operating systems
 # supported by the gofrontend and all operating systems supported by
 # the gc toolchain.
-ALLGOOS="android darwin dragonfly freebsd irix linux netbsd openbsd plan9 rtems solaris windows"
+ALLGOOS="android darwin dragonfly freebsd irix gnu linux netbsd openbsd plan9 rtems solaris windows"
 
 is_darwin=no
 is_freebsd=no
@@ -157,6 +157,7 @@ is_openbsd=no
 is_dragonfly=no
 is_rtems=no
 is_solaris=no
+is_gnu=no
 GOOS=unknown
 case ${host} in
   *-*-darwin*)   is_darwin=yes;  GOOS=darwin ;;
@@ -168,6 +169,7 @@ case ${host} in
   *-*-dragonfly*) is_dragonfly=yes; GOOS=dragonfly ;;
   *-*-rtems*)is_rtems=yes;   GOOS=rtems ;;
   *-*-solaris2*) is_solaris=yes; GOOS=solaris ;;
+  *-*-gnu*)  is_gnu=yes; GOOS=gnu ;;
 esac
 AM_CONDITIONAL(LIBGO_IS_DARWIN, test $is_darwin = yes)
 AM_CONDITIONAL(LIBGO_IS_FREEBSD, test $is_freebsd = yes)
@@ -178,6 +180,7 @@ AM_CONDITIONAL(LIBGO_IS_OPENBSD, test $i
 AM_CONDITIONAL(LIBGO_IS_DRAGONFLY, test $is_dragonfly = yes)
 AM_CONDITIONAL(LIBGO_IS_RTEMS, test $is_rtems = yes)
 AM_CONDITIONAL(LIBGO_IS_SOLARIS, test $is_solaris = yes)
+AM_CONDITIONAL(LIBGO_IS_GNU, test $is_gnu = yes)
 AM_CONDITIONAL(LIBGO_IS_BSD, test $is_darwin = yes -o $is_dragonfly = yes -o $is_freebsd = yes -o $is_netbsd = yes -o $is_openbsd = yes)
 AC_SUBST(GOOS)
 AC_SUBST(ALLGOOS)
@@ -838,6 +841,14 @@ main ()
 CFLAGS="$CFLAGS_hold"
 LIBS="$LIBS_hold"
 ])
+
+case ${host} in
+  *-*-gnu*)
+  LIBS="$LIBS -lpthread"
+  AC_SUBST(LIBS)
+  ;;
+esac
+
 dnl overwrite for the mips* 64bit multilibs, fails on some buildds
 if test "$libgo_cv_lib_setcontext_clobbers_tls" = "yes"; then
   case "$target" in
Index: gcc-7-7.2.0/src/libgo/Makefile.am
===
--- gcc-7-7.2.0.orig/src/libgo/Makefile.am
+++ gcc-7-7.2.0/src/libgo/Makefile.am
@@ -420,10 +420,14 @@ else
 if LIBGO_IS_NETBSD
 runtime_getncpu_file = runtime/getncpu-bsd.c
 else
+if LIBGO_IS_GNU
+runtime_getncpu_file = runtime/getncpu-gnu.c
+else
 runtime_getncpu_file = runtime/getncpu-none.c
 endif
 endif
 endif
+endif
 endif
 endif
 endif
Index: gcc-7-7.2.0/src/libgo/Makefile.in
===
--- gcc-7-7.2.0.orig/src/libgo/Makefile.in
+++ gcc-7-7.2.0/src/libgo/Makefile.in
@@ -183,7 +183,8 @@ libgo_llgo_la_DEPENDENCIES = $(am__DEPEN
 @LIBGO_IS_LINUX_FALSE@am__objects_2 = thread-sema.lo
 @LIBGO_IS_LINUX_TRUE@am__objects_2 = thread-linux.lo
 @LIBGO_IS_RTEMS_TRUE@am__objects_3 = rtems-task-variable-add.lo
-@LIBGO_IS_DARWIN_FALSE@@LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_NETBSD_FALSE@@LIBGO_IS_SOLARIS_FALSE@am__objects_4 = getncpu-none.lo
+@LIBGO_IS_DARWIN_FALSE@@LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_GNU_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_NETBSD_FALSE@@LIBGO_IS_SOLARIS_FALSE@am__objects_4 = getncpu-none.lo
+@LIBGO_IS_DARWIN_FALSE@@LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_GNU_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_NETBSD_FALSE@@LIBGO_IS_SOLARIS_FALSE@am__objects_4 = getncpu-gnu.lo
 @LIBGO_IS_DARWIN_FALSE@@LIBGO_IS_FR

[Ada] Handling of elaboration warnings

2017-11-16 Thread Pierre-Marie de Rodat
This patch modifies the elaboration warnings produced by the ABE mechanism to
depend on the status of flag Elab_Warnings. The flag is enabled by compilation
switch -gnatwl. This change allows for selective suppression of warnings, as
well as total suppression.

In order to preserve the behaviour of the ABE mechanism with respect to the
legacy ABE mechanism, elaboration warnings are now on by default.

-------------
-- Sources --
-------------

--  selective_2.ads

package Selective_2 is
   Var : Integer;

   generic
   procedure Gen;

   procedure Proc;

   task type Tsk is
  entry E;
   end Tsk;

   package Direct is
  procedure Force_Body;
   end Direct;
end Selective_2;

--  selective_2.adb

package body Selective_2 is
   function Elaborator return Boolean is
  pragma Warnings (Off);
  procedure Inst is new Gen; --  OK
  T : Tsk;   --  OK
  pragma Warnings (On);
   begin
  Proc;  --  Warn
  return True;
   end Elaborator;

   package body Direct is
  procedure Force_Body is begin null; end Force_Body;
  pragma Warnings (Off);
  procedure Inst is new Gen; --  OK
  T : Tsk;   --  OK
  pragma Warnings (On);
   begin
  Proc;  --  Warn
   end Direct;

   Indirect : constant Boolean := Elaborator;

   procedure Gen is begin null; end Gen;

   procedure Proc is begin null; end Proc;

   task body Tsk is
   begin
  accept E;
   end Tsk;

   pragma Warnings (Off);
begin
   Var := 1; --  OK
end Selective_2;


----------------------------
-- Compilation and output --
----------------------------


$ gcc -c selective_2.adb
selective_2.adb:8:07: warning: cannot call "Proc" before body seen
selective_2.adb:8:07: warning: Program_Error may be raised at run time
selective_2.adb:8:07: warning:   body of unit "Selective_2" elaborated
selective_2.adb:8:07: warning:   function "Elaborator" called at line 22
selective_2.adb:8:07: warning:   procedure "Proc" called at line 8
selective_2.adb:19:07: warning: cannot call "Proc" before body seen
selective_2.adb:19:07: warning: Program_Error will be raised at run time

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-11-16  Hristian Kirtchev  

* opt.ads: Elaboration warnings are now on by default. Add a comment
explaining why this is needed.
* sem_ch9.adb (Analyze_Requeue): Preserve the status of elaboration
warnings.
* sem_ch12.adb (Analyze_Package_Instantiation): Preserve the status of
elaboration warnings.
(Analyze_Subprogram_Instantiation): Preserve the status of elaboration
warnings.
* sem_elab.adb: Update the structure of Call_Attributes and
Instantiation_Attributes.
(Build_Call_Marker): Propagate the status of elaboration warnings from
the call to the marker.
(Extract_Call_Attributes): Extract the status of elaboration warnings.
(Extract_Instantiation_Attributes): Extract the status of elaboration
warnings.
(Process_Conditional_ABE_Activation_Impl): Elaboration diagnostics are
now dependent on the status of elaboration warnings.
(Process_Conditional_ABE_Call_Ada): Elaboration diagnostics are now
dependent on the status of elaboration warnings.
(Process_Conditional_ABE_Instantiation_Ada): Elaboration diagnostics
are now dependent on the status of elaboration warnings.
(Process_Guaranteed_ABE_Activation_Impl): Remove pragma Unreferenced
for formal Call_Attrs. Elaboration diagnostics are now dependent on the
status of elaboration warnings.
(Process_Guaranteed_ABE_Call): Elaboration diagnostics are now
dependent on the status of elaboration warnings.
(Process_Guaranteed_ABE_Instantiation): Elaboration diagnostics are now
dependent on the status of elaboration warnings.
* sem_prag.adb (Analyze_Pragma): Remove the unjustified warning
concerning pragma Elaborate.
* sem_res.adb (Resolve_Call): Preserve the status of elaboration
warnings.
(Resolve_Entry_Call): Propagate flag Is_Elaboration_Warnings_OK_Node
from the procedure call to the entry call.
* sem_util.adb (Mark_Elaboration_Attributes): Add formal parameter
Warnings.
(Mark_Elaboration_Attributes_Node): Preserve the status of elaboration
warnings
* sem_util.ads (Mark_Elaboration_Attributes): Add formal parameter
Warnings. Update the comment on usage.
* sinfo.adb (Is_Dispatching_Call): Update to use Flag6.
(Is_Elaboration_Warnings_OK_Node): New routine.
(Set_Is_Dispatching_Call): Update

Re: [PATCH] Factor out division by squares and remove division around comparisons (2/2)

2017-11-16 Thread Wilco Dijkstra

ping


From: Jackson Woodruff 
Sent: 06 September 2017 10:55
To: Richard Biener
Cc: Wilco Dijkstra; kyrylo.tkac...@foss.arm.com; Joseph S. Myers; GCC Patches
Subject: Re: [PATCH] Factor out division by squares and remove division around 
comparisons (2/2)
  

Hi all,

A minor improvement came to mind while updating other parts of this patch.

I've updated a testcase to make it clearer, and a condition now uses a 
call to is_division_by rather than checking those conditions manually.
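
For anyone joining the thread late, the rewrite discussed in this patch,
x / (y * y) -> x * (1 / y) * (1 / y), can be written out by hand. This is
only a sketch of the idea, not the code in tree-ssa-math-opts; the function
names by_square_naive and by_square_recip are made up for illustration.

```cpp
// Baseline: three separate divisions by y * y.
void by_square_naive(float *a, float *b, float *c, float x, float y)
{
  *a = 3 / (y * y);
  *b = 5 / (y * y);
  *c = x / (y * y);
}

// Hand-applied form of the rewrite: one division to form the reciprocal,
// then only multiplications.
void by_square_recip(float *a, float *b, float *c, float x, float y)
{
  float r = 1.0f / y;   // single division
  float r2 = r * r;     // square of the reciprocal, shared by all uses
  *a = 3 * r2;
  *b = 5 * r2;
  *c = x * r2;
}
```

For values such as y = 2 the two forms agree exactly; in general they can
differ in the last bits, which is why the transformation is only done under
-funsafe-math-optimizations.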

Jackson

On 08/30/2017 05:32 PM, Jackson Woodruff wrote:
> Hi all,
> 
> I've attached a new version of the patch in response to a few of Wilco's 
> comments in person.
> 
> The end product of the pass is still the same, but I have fixed several 
> bugs.
> 
> Now tested independently of the other patches.
> 
> On 08/15/2017 03:07 PM, Richard Biener wrote:
>> On Thu, Aug 10, 2017 at 4:10 PM, Jackson Woodruff
>>  wrote:
>>> Hi all,
>>>
>>> The patch implements some of the division optimizations discussed in
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 .
>>>
>>> We now reassociate (as discussed in the bug report):
>>>
>>>  x / (y * y) -> x  * (1 / y) * (1 / y)
>>>
>>> If it is reasonable to do so. This is done with
>>> -funsafe-math-optimizations.
>>>
>>> Bootstrapped and regtested with part (1/2). OK for trunk?
>>
>> I believe your enhancement shows the inherent weakness of
>> CSE of reciprocals in that it works from the defs.  It will
>> handle x / (y * y) but not x / (y * y * y).
>>
>> I think a rewrite of this mini-pass is warranted.
> 
> I suspect that there might be more to gain by handling the case of
> x / (y * z) rather than the case of x / (y**n), but I agree that this 
> pass could do more.
> 
>>
>> Richard.
>>
>>> Jackson
>>>
>>> gcc/
>>>
>>> 2017-08-03  Jackson Woodruff  
>>>
>>>  PR 71026/tree-optimization
>>>  * tree-ssa-math-opts (is_division_by_square,
>>>  is_square_of, insert_sqaure_reciprocals): New.
>>>  (insert_reciprocals): Change to insert reciprocals
>>>  before a division by a square.
>>>  (execute_cse_reciprocals_1): Change to consider
>>>  division by a square.
>>>
>>>
>>> gcc/testsuite
>>>
>>> 2017-08-03  Jackson Woodruff  
>>>
>>>  PR 71026/tree-optimization
>>>  * gcc.dg/associate_division_1.c: New.
>>>
> 
> Thanks,
> 
> Jackson.
> 
> Updated ChangeLog:
> 
> gcc/
> 
> 2017-08-30  Jackson Woodruff  
> 
>  PR 71026/tree-optimization
>  * tree-ssa-math-opts (is_division_by_square, is_square_of): New.
>  (insert_reciprocals): Change to insert reciprocals
>  before a division by a square and to insert the square
>  of a reciprocal.
>  (execute_cse_reciprocals_1): Change to consider
>  division by a square.
>  (register_division_in): Add importance parameter.
> 
> gcc/testsuite
> 
> 2017-08-30  Jackson Woodruff  
> 
>  PR 71026/tree-optimization
>  * gcc.dg/extract_recip_3.c: New.
>  * gcc.dg/extract_recip_4.c: New.
>  * gfortran.dg/extract_recip_1.f: New.
diff --git a/gcc/testsuite/gcc.dg/extract_recip_3.c b/gcc/testsuite/gcc.dg/extract_recip_3.c
new file mode 100644
index 
..ad9f2dc36f1e695ceca1f50bc78f4ac4fbb2e787
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/extract_recip_3.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+float
+extract_square (float *a, float *b, float x, float y)
+{
+  *a = 3 / (y * y);
+  *b = 5 / (y * y);
+
+  return x / (y * y);
+}
+
+/* Don't expect the 'powmult' (calculation of y * y)
+   to be deleted until a later pass, so look for one
+   more multiplication than strictly necessary.  */
+float
+extract_recip (float *a, float *b, float x, float y, float z)
+{
+  *a = 7 / y;
+  *b = x / (y * y);
+
+  return z / y;
+}
+
+/* 4 For the pointers to a, b, 4 multiplications in 'extract_square',
+   4 multiplications in 'extract_recip' expected.  */
+/* { dg-final { scan-tree-dump-times " \\* " 12 "optimized" } } */
+
+/* 1 division in 'extract_square', 1 division in 'extract_recip'. */
+/* { dg-final { scan-tree-dump-times " / " 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/extract_recip_4.c b/gcc/testsuite/gcc.dg/extract_recip_4.c
new file mode 100644
index 
..83105c60ced5c2671f3793d76482c35502712a2c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/extract_recip_4.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+/* Don't expect any of these divisions to be extracted.  */
+double f (double x, int p)
+{
+  if (p > 0)
+{
+  return 1.0/(x * x);
+}
+
+  if (p > -1)
+{
+  return x * x * x;
+}
+  return  1.0 /(x);
+}
+
+/* Expect a reciprocal to be extracted here.  */
+double g (double *a, double x, double y)
+{
+  *a = 3 / y;
+  double k = x / (y * y);
+
+  if (y * y == 2.0)
+return k + 1 / y;
+  else
+return k - 1 / y;
+}
+
+/* Expect

Re: [PATCH] BRIG frontend: request for a global review

2017-11-16 Thread Pekka Jääskeläinen
Hi,

I added some content to gccbrig.texi in r254820 as below. If you have something
that I could describe further there, please just let me know.

Index: gcc/brig/gccbrig.texi
===
--- gcc/brig/gccbrig.texi (revision 254819)
+++ gcc/brig/gccbrig.texi (revision 254820)
@@ -1,5 +1,153 @@
 \input texinfo @c -*-texinfo-*-
 @setfilename gccbrig.info
-@settitle The GNU BRIG Compiler
+@settitle The GNU BRIG (HSAIL) Compiler
+@set copyrights-brig 2017

+@c Merge the standard indexes into a single one.
+@syncodeindex fn cp
+@syncodeindex vr cp
+@syncodeindex ky cp
+@syncodeindex pg cp
+@syncodeindex tp cp
+
+@include gcc-common.texi
+
+@copying
+@c man begin COPYRIGHT
+Copyright @copyright{} @value{copyrights-brig} Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, the Front-Cover Texts being (a) (see below), and
+with the Back-Cover Texts being (b) (see below).
+A copy of the license is included in the
+@c man end
+section entitled ``GNU Free Documentation License''.
+@ignore
+@c man begin COPYRIGHT
+man page gfdl(7).
+@c man end
+@end ignore
+
+@c man begin COPYRIGHT
+
+(a) The FSF's Front-Cover Text is:
+
+ A GNU Manual
+
+(b) The FSF's Back-Cover Text is:
+
+ You have freedom to copy and modify this GNU Manual, like GNU
+ software.  Copies published by the Free Software Foundation raise
+ funds for GNU development.
+@c man end
+@end copying
+
+@ifinfo
+@format
+@dircategory Software development
+@direntry
+* Gccbrig: (gccbrig).   A GCC-based compiler for BRIG/HSAIL finalization
+@end direntry
+@end format
+
+@insertcopying
+@end ifinfo
+
+@titlepage
+@title The GNU BRIG (HSAIL) Compiler
+@versionsubtitle
+@author Pekka Jääskeläinen
+
+@page
+@vskip 0pt plus 1filll
+Published by the Free Software Foundation @*
+51 Franklin Street, Fifth Floor@*
+Boston, MA 02110-1301, USA@*
+@sp 1
+@insertcopying
+@end titlepage
+@contents
+@page
+
+@node Top
+@top Introduction
+
+This manual describes how to use @command{gccbrig}, the GNU compiler for
+the binary representation (BRIG) of the HSA Intermediate Language (HSAIL).
+For more information about the Heterogeneous System Architecture (HSA)
+Foundation's standards in general, see @uref{http://www.hsafoundation.com/}.
+
+@menu
+* Copying:: The GNU General Public License.
+* GNU Free Documentation License::
+How you can share and copy this manual.
+* Using Gccbrig::   How to use Gccbrig.
+* Index::   Index.
+@end menu
+
+@include gpl_v3.texi
+
+@include fdl.texi
+
+
+@node Using Gccbrig
+@chapter Using Gccbrig
+
+@c man title gccbrig A GCC-based compiler for HSAIL
+
+@ignore
+@c man begin SYNOPSIS gccbrig
+gccbrig [@option{-c}|@option{-S}]
+[@option{-O}@var{level}] [@option{-L}@var{dir}@dots{}]
+[@option{-o} @var{outfile}] @var{infile}@dots{}
+
+Gccbrig is typically not invoked from the command line, but
+through an HSA finalizer implementation.
+@c man end
+@c man begin SEEALSO
+The Info entry for @file{gccbrig} and
+@uref{https://github.com/HSAFoundation/phsa}
+@c man end
+@end ignore
+
+@c man begin DESCRIPTION gccbrig
+
+The BRIG frontend (@command{gccbrig}) differs from the
+other frontends in GCC on how it's typically used.  It's a translator
+for an intermediate language that is not meant to be written directly
+by programmers.  Its input format BRIG is a binary representation of
+HSAIL, which is a textual assembly format for an imaginary machine
+of which instruction set is defined in HSA Programmer Reference Manual
+(PRM) Specification.  Gccbrig currently implements the Base profile
+of the PRM version 1.0.
+
+HSA Runtime Specification defines an API which includes means
+to build and launch ``kernels'' from a host program running on a CPU
+to one or more heterogeneous ``kernel agents''. A kernel Agent
+is typically a GPU or a DSP device controlled by the CPU.
+The build phase is called ``finalization'', which means translation of
+one or more target-independent BRIG files describing the program that
+one wants to run in the Agent to the Agent's instruction set.  Gccbrig
+implements the translation process by generating GENERIC, which is
+translated to the ISA of any supported GCC target by the GCC's backend
+framework, thus enabling potentially any GCC target to act as an HSA agent.
+
+As the kernel finalization process can be only launched from the host API,
+@command{gccbrig} is not typically used directly from the command line by
+the end user, but through an HSA runtime implementation that implements
+the finalizer API running on the host CPU.  Gccbrig is
+designed to work with an open source HSA runtime implementation
+called ``phsa-runtime'', which can be install

[PATCH] Disable -ftrapping-math by default

2017-11-16 Thread Wilco Dijkstra
GCC currently defaults to -ftrapping-math.  This is supposed to generate
code for correct user-visible traps and FP status flags.

However it doesn't work as expected since it doesn't block any floating
point optimizations.  For example it continues to perform CSE, moves FP
operations across calls, moves FP operations out of loops, constant folds
and removes dead floating point operations that cause exceptions.

Given that the majority of code neither contains user trap handlers nor
inspects FP status flags, there is no point in enabling it even if it worked
as expected.

Simple case that should cause a FP exception:

void f(void)
{
  0.0 / 0.0;
}

Compiles to:

f:
ret
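
For reference, the FP status flags in question are the ones a program can
read through <cfenv>. A minimal sketch, assuming an IEEE 754 target that
raises FE_INVALID for 0.0 / 0.0; the helper name raises_invalid is made up
for illustration, and the volatile qualifiers are there only to stop the
compiler from folding or deleting the division as it does in f above.

```cpp
#include <cfenv>

// Hypothetical helper: returns true if num / den raised FE_INVALID.
// The volatile objects force the division to be carried out at run time,
// between clearing and testing the exception flags.
bool raises_invalid(double num, double den)
{
  volatile double n = num;
  volatile double d = den;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double r = n / d;
  (void) r;
  return std::fetestexcept(FE_INVALID) != 0;
}
```

On such a target raises_invalid (0.0, 0.0) is expected to be true and
raises_invalid (1.0, 2.0) false.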

OK for commit?

2017-11-16  Wilco Dijkstra  

* common.opt (ftrapping-math): Change default to 0.
* doc/invoke.texi (-ftrapping-math): Update documentation.
--

diff --git a/gcc/common.opt b/gcc/common.opt
index 1bb87353f760d7c60c39de8b9de4311c1ec3d892..59940c64356964f8f9b9d842ad3f1a1c02548bab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2470,7 +2470,7 @@ generate them instead of using descriptors.
 ; (user-visible) trap.  This is the case, for example, in nonstop
 ; IEEE 754 arithmetic.
 ftrapping-math
-Common Report Var(flag_trapping_math) Init(1) Optimization SetByCombined
+Common Report Var(flag_trapping_math) Init(0) Optimization SetByCombined
 Assume floating-point operations can trap.
 
 ftrapv
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 74c33ea35b9f320b419a3417e6007d2391536f1b..3673b34b3b7f7b57cfa6375b5316f9f282a9e9bb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9479,20 +9479,21 @@ This option implies that the sign of a zero result isn't significant.
 
 The default is @option{-fsigned-zeros}.
 
-@item -fno-trapping-math
-@opindex fno-trapping-math
-Compile code assuming that floating-point operations cannot generate
+@item -ftrapping-math
+@opindex ftrapping-math
+Compile code assuming that floating-point operations can generate
 user-visible traps.  These traps include division by zero, overflow,
-underflow, inexact result and invalid operation.  This option requires
-that @option{-fno-signaling-nans} be in effect.  Setting this option may
-allow faster code if one relies on ``non-stop'' IEEE arithmetic, for example.
+underflow, inexact result and invalid operation.
 
-This option should never be turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions.
+Note this option has only been partially implemented and does not work
+as expected.  For example @option{-ftrapping-math} performs floating
+point optimizations such as loop invariant motion, constant folding
+and scheduling across function calls which have user-visible effects
+on FP exception flags.
+
+This option is turned on when using @option{-fsignaling-nans}.
 
-The default is @option{-ftrapping-math}.
+The default is @option{-fno-trapping-math}.
 
 @item -frounding-math
 @opindex frounding-math

[PATCH] Add noexcept to std::shared_future copy operations (LWG DR 2799)

2017-11-16 Thread Jonathan Wakely

These functions just increment a refcount (and the base class
functions that do that are already noexcept anyway).

* include/std/future (shared_future): Add noexcept to copy constructor
and copy-assignment operator (LWG 2799).

Tested powerpc64le-linux, committed to trunk.

commit 9b66ab2fb36324e2d9cb8e99207d08d531c5cec3
Author: Jonathan Wakely 
Date:   Thu Nov 16 14:24:02 2017 +

Add noexcept to std::shared_future copy operations (LWG DR 2799)

* include/std/future (shared_future): Add noexcept to copy 
constructor
and copy-assignment operator (LWG 2799).

diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index 73d5a60a918..d9d446bc2f6 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -896,7 +896,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr shared_future() noexcept : _Base_type() { }
 
   /// Copy constructor
-  shared_future(const shared_future& __sf) : _Base_type(__sf) { }
+  shared_future(const shared_future& __sf) noexcept : _Base_type(__sf) { }
 
   /// Construct from a future rvalue
   shared_future(future<_Res>&& __uf) noexcept
@@ -908,7 +908,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   : _Base_type(std::move(__sf))
   { }
 
-  shared_future& operator=(const shared_future& __sf)
+  shared_future& operator=(const shared_future& __sf) noexcept
   {
 shared_future(__sf)._M_swap(*this);
 return *this;
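
One way to observe the effect of this change from user code is through the
standard type traits. This is a sketch only, assuming a standard library
that already includes the change; the helper name
shared_future_copy_is_noexcept is made up for illustration.

```cpp
#include <future>
#include <type_traits>

// Hypothetical helper: true when the library in use declares both copy
// operations of std::shared_future<T> noexcept.
template <typename T>
constexpr bool shared_future_copy_is_noexcept()
{
  return std::is_nothrow_copy_constructible<std::shared_future<T>>::value
         && std::is_nothrow_copy_assignable<std::shared_future<T>>::value;
}
```

With a library that predates the change the same call would be expected to
return false, since neither copy operation was declared noexcept.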


Re: [PATCH] Disable -ftrapping-math by default

2017-11-16 Thread Richard Biener
On Thu, Nov 16, 2017 at 3:33 PM, Wilco Dijkstra  wrote:
> GCC currently defaults to -ftrapping-math.  This is supposed to generate
> code for correct user-visible traps and FP status flags.
>
> However it doesn't work as expected since it doesn't block any floating
> point optimizations.  For example it continues to perform CSE, moves FP
> operations across calls, moves FP operations out of loops, constant folds
> and removes dead floating point operations that cause exceptions.
>
> Given the majority of code doesn't contain user trap handlers or inspects
> FP status flags, there is no point in enabling it even if it worked as 
> expected.
>
> Simple case that should cause a FP exception:
>
> void f(void)
> {
>   0.0 / 0.0;
> }
>
> Compiles to:
>
> f:
> ret

We are generally not preserving traps but we guard any transform that
might introduce traps with -ftrapping-math.  That's similar to how we treat
-ftrapv and pointer dereferences.

We're mitigating the "bad" effect of the -ftrapping-math default
by defaulting to -fno-signaling-nans.

If it doesn't block any optimizations what's the point of the patch?

Richard.

> OK for commit?
>
> 2017-11-16  Wilco Dijkstra  
>
> * common.opt (ftrapping-math): Change default to 0.
> * doc/invoke.texi (-ftrapping-math): Update documentation.
> --
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 
> 1bb87353f760d7c60c39de8b9de4311c1ec3d892..59940c64356964f8f9b9d842ad3f1a1c02548bab
>  100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2470,7 +2470,7 @@ generate them instead of using descriptors.
>  ; (user-visible) trap.  This is the case, for example, in nonstop
>  ; IEEE 754 arithmetic.
>  ftrapping-math
> -Common Report Var(flag_trapping_math) Init(1) Optimization SetByCombined
> +Common Report Var(flag_trapping_math) Init(0) Optimization SetByCombined
>  Assume floating-point operations can trap.
>
>  ftrapv
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 74c33ea35b9f320b419a3417e6007d2391536f1b..3673b34b3b7f7b57cfa6375b5316f9f282a9e9bb
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9479,20 +9479,21 @@ This option implies that the sign of a zero result 
> isn't significant.
>
>  The default is @option{-fsigned-zeros}.
>
> -@item -fno-trapping-math
> -@opindex fno-trapping-math
> -Compile code assuming that floating-point operations cannot generate
> +@item -ftrapping-math
> +@opindex ftrapping-math
> +Compile code assuming that floating-point operations can generate
>  user-visible traps.  These traps include division by zero, overflow,
> -underflow, inexact result and invalid operation.  This option requires
> -that @option{-fno-signaling-nans} be in effect.  Setting this option may
> -allow faster code if one relies on ``non-stop'' IEEE arithmetic, for example.
> +underflow, inexact result and invalid operation.
>
> -This option should never be turned on by any @option{-O} option since
> -it can result in incorrect output for programs that depend on
> -an exact implementation of IEEE or ISO rules/specifications for
> -math functions.
> +Note this option has only been partially implemented and does not work
> +as expected.  For example @option{-ftrapping-math} performs floating
> +point optimizations such as loop invariant motion, constant folding
> +and scheduling across function calls which have user-visible effects
> +on FP exception flags.
> +
> +This option is turned on when using @option{-fsignaling-nans}.
>
> -The default is @option{-ftrapping-math}.
> +The default is @option{-fno-trapping-math}.
>
>  @item -frounding-math
>  @opindex frounding-math


Re: [PATCH] New lang hook

2017-11-16 Thread Nathan Sidwell

On 11/16/2017 07:03 AM, Richard Biener wrote:


Looks reasonable apart from

+  /* Overwrite the DECL_ASSEMBLER_NAME for a node.  The name is being
+ changed (including to or from NULL_TREE).  */

which suggests the default implementation of set_decl_assembler_name would
call this hook (which it doesn't).  Any particular reason?  Maybe just document
(including to NULL_TREE), thus exclude from NULL_TREE?


As discussed on IRC, this variant calls the new hook from the default 
set_decl_assembler_name hook.  Applying.
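
As a rough illustration of what the C++ override has to do, here is a toy
model: a table keyed by assembler name must forget a decl before that decl's
name is forcibly changed. None of this is GCC's API; Decl, NameTable,
record_name and overwrite_name are hypothetical stand-ins for tree,
mangled_decls, record_mangling and overwrite_mangling.

```cpp
#include <string>
#include <unordered_map>

// Toy stand-in for a declaration; not GCC's tree type.
struct Decl
{
  std::string asm_name;
};

// Toy stand-in for a table of decls keyed by assembler name.
using NameTable = std::unordered_map<std::string, Decl *>;

// Remember that DECL currently owns its assembler name.
void record_name(NameTable &table, Decl &decl)
{
  table[decl.asm_name] = &decl;
}

// Forcibly change DECL's name to NAME, first dropping any table entry
// that still maps the old name to DECL.
void overwrite_name(NameTable &table, Decl &decl, const std::string &name)
{
  auto it = table.find(decl.asm_name);
  if (it != table.end() && it->second == &decl)
    table.erase(it);
  decl.asm_name = name;
}
```

As in the patch, the sketch does not re-register the decl under its new
name; it only removes the stale entry and stores the new name.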


nathan

--
Nathan Sidwell
2017-11-16  Nathan Sidwell  

	PR c++/82836
	PR c++/82737
	* tree.h (COPY_DECL_RTL): Rename parms for clarity.
	(SET_DECL_ASSEMBLER_NAME): Forward to
	overwrite_decl_assembler_name.
	(COPY_DECL_ASSEMBLER_NAME): Rename parms for clarity.
	(overwrite_decl_assembler_name): Declare.
	* tree.c (overwrite_decl_assembler_name): New.
	* langhooks-def.h (lhd_overwrite_decl_assembler_name): Declare.
	(LANG_HOOKS_OVERWRITE_DECL_ASSEMBLER_NAME): Provide default.
	(LANG_HOOKS_INITIALIZER): Add it.
	* langhooks.h (struct lang_hooks): Add overwrite_decl_assembler_name.
	* langhooks.c (lhd_set_decl_assembler_name): Use
	SET_DECL_ASSEMBLER_NAME.
	(lhd_overwrite_decl_assembler_name): Default implementation.

	PR c++/82836
	PR c++/82737
	* cp-objcp-common.h (LANG_HOOKS_OVERWRITE_DECL_ASSEMBLER_NAME):
	Override.
	* cp-tree.h (overwrite_mangling): Declare.
	* decl2.c (struct mangled_decl_hash): Entries are deletable.
	(overwrite_mangling): New.

	PR c++/82836
	PR c++/82737
	* g++.dg/pr82836.C: New.

Index: cp/cp-objcp-common.h
===
--- cp/cp-objcp-common.h	(revision 254817)
+++ cp/cp-objcp-common.h	(working copy)
@@ -73,6 +73,8 @@ extern void cp_register_dumps (gcc::dump
 #define LANG_HOOKS_DUP_LANG_SPECIFIC_DECL cxx_dup_lang_specific_decl
 #undef LANG_HOOKS_SET_DECL_ASSEMBLER_NAME
 #define LANG_HOOKS_SET_DECL_ASSEMBLER_NAME mangle_decl
+#undef LANG_HOOKS_OVERWRITE_DECL_ASSEMBLER_NAME
+#define LANG_HOOKS_OVERWRITE_DECL_ASSEMBLER_NAME overwrite_mangling
 #undef LANG_HOOKS_PRINT_STATISTICS
 #define LANG_HOOKS_PRINT_STATISTICS cxx_print_statistics
 #undef LANG_HOOKS_PRINT_XNODE
Index: cp/cp-tree.h
===
--- cp/cp-tree.h	(revision 254817)
+++ cp/cp-tree.h	(working copy)
@@ -6187,6 +6187,7 @@ extern tree cxx_maybe_build_cleanup		(tr
 
 /* in decl2.c */
 extern void record_mangling			(tree, bool);
+extern void overwrite_mangling			(tree, tree);
 extern void note_mangling_alias			(tree, tree);
 extern void generate_mangling_aliases		(void);
 extern tree build_memfn_type			(tree, tree, cp_cv_quals, cp_ref_qualifier);
Index: cp/decl2.c
===
--- cp/decl2.c	(revision 254817)
+++ cp/decl2.c	(working copy)
@@ -123,9 +123,14 @@ struct mangled_decl_hash : ggc_remove  (1);
+  }
+  static void mark_deleted (value_type &e)
+  {
+e = reinterpret_cast <value_type> (1);
+  }
 };
 
 /* A hash table of decls keyed by mangled name.  Used to figure out if
@@ -4439,6 +,33 @@ record_mangling (tree decl, bool need_wa
 }
 }
 
+/* The mangled name of DECL is being forcibly changed to NAME.  Remove
+   any existing knowledge of DECL's mangled name meaning DECL.  */
+
+void
+overwrite_mangling (tree decl, tree name)
+{
+  if (tree id = DECL_ASSEMBLER_NAME_RAW (decl))
+if ((TREE_CODE (decl) == VAR_DECL
+	 || TREE_CODE (decl) == FUNCTION_DECL)
+	&& mangled_decls)
+  if (tree *slot
+	  = mangled_decls->find_slot_with_hash (id, IDENTIFIER_HASH_VALUE (id),
+		NO_INSERT))
+	if (*slot == decl)
+	  {
+	mangled_decls->clear_slot (slot);
+
+	/* If this is an alias, remove it from the symbol table.  */
+	if (DECL_ARTIFICIAL (decl) && DECL_IGNORED_P (decl))
+	  if (symtab_node *n = symtab_node::get (decl))
+		if (n->cpp_implicit_alias)
+		  n->remove ();
+	  }
+
+  DECL_ASSEMBLER_NAME_RAW (decl) = name;
+}
+
 /* The entire file is now complete.  If requested, dump everything
to a file.  */
 
Index: langhooks-def.h
===
--- langhooks-def.h	(revision 254817)
+++ langhooks-def.h	(working copy)
@@ -51,7 +51,8 @@ extern const char *lhd_dwarf_name (tree,
 extern int lhd_types_compatible_p (tree, tree);
 extern void lhd_print_error_function (diagnostic_context *,
   const char *, struct diagnostic_info *);
-extern void lhd_set_decl_assembler_name (tree);
+extern void lhd_set_decl_assembler_name (tree decl);
+extern void lhd_overwrite_decl_assembler_name (tree decl, tree name);
 extern bool lhd_warn_unused_global_decl (const_tree);
 extern tree lhd_type_for_size (unsigned precision, int unsignedp);
 extern void lhd_incomplete_type_error (location_t, const_tree, const_tree);
@@ -107,6 +108,7 @@ extern int lhd_type_dwarf_attribute (con
 #define LANG_HOOKS_FINISH_INCOMPLETE_DECL lhd_do_nothing_t
 #define LANG_HOOKS_DUP_LANG_SPECIFIC_DECL lhd_do_nothing

Re: [PATCH, GCC/ARM] Fix ICE in Armv8-M Security Extensions code

2017-11-16 Thread Kyrill Tkachov

Hi Thomas,

On 15/11/17 16:57, Thomas Preudhomme wrote:

Hi,

Commit r253825, which introduced some sanity checks for sbitmap, revealed
a bug in the conversion of cmse_nonsecure_entry_clear_before_return ()
to use the bitmap structure. bitmap_and expects the two bitmaps to have
the same length, yet the code in
cmse_nonsecure_entry_clear_before_return () uses different sizes for
to_clear_bitmap and to_clear_arg_regs_bitmap, on the assumption that
bitmap_and would behave as if the unallocated bits were in fact zero.
This commit makes sure both bitmaps are equally sized.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-11-13  Thomas Preud'homme 

* config/arm/arm.c (cmse_nonsecure_entry_clear_before_return): Allocate
to_clear_arg_regs_bitmap to the same size as to_clear_bitmap.

Testing: Bootstrapped GCC on arm-none-linux-gnueabihf target and
testsuite shows no regression. Running cmse.exp tests for Armv8-M
Baseline and Mainline shows FAIL->PASS for bitfield-1, bitfield-2,
bitfield-3 and struct-1 testcases.

Is this ok for trunk?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH GCC]A simple implementation of loop interchange

2017-11-16 Thread Bin.Cheng
On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz  wrote:
> Hello,
>
> On Fri, 22 Sep 2017, Bin.Cheng wrote:
>
>> This is updated patch for loop interchange with review suggestions
>> resolved.  Changes are:
>>   1) It does more light weight checks like rectangle loop nest check
>> earlier than before.
>>   2) It checks profitability of interchange before data dependence 
>> computation.
>>   3) It calls find_data_references_in_loop only once for a loop nest now.
>>   4) Data dependence is open-computed so that we can skip instantly at
>> unknown dependence.
>>   5) It improves code generation in mapping induction variables for
>> loop nest, as well as
>>  adding a simple dead code elimination pass.
>>   6) It changes magic constants into parameters.
>
> So I have a couple comments/questions.  Something stylistic:
Hi Michael,
Thanks for reviewing.

>
>> +class loop_cand
>> +{
>> +public:
>> ...
>> +  friend class tree_loop_interchange;
>> +private:
>
> Just make this all public (and hence a struct, not class).
> No need for friends in file local classes.
Done.

>
>> +single_use_in_loop (tree var, struct loop *loop)
>> ...
>> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
>> +{
>> +  stmt = USE_STMT (use_p);
>> ...
>> +  basic_block bb = gimple_bb (stmt);
>> +  gcc_assert (bb != NULL);
>
> This pattern reoccurs often in your patch: you check for a bb associated
> with a USE_STMT.  Uses of SSA names always occur in basic blocks, no need
> for checking.
Done.

>
> Then, something about your handling of simple reductions:
>
>> +void
>> +loop_cand::classify_simple_reduction (reduction_p re)
>> +{
>> ...
>> +  /* Require memory references in producer and consumer are the same so
>> + that we can undo reduction during interchange.  */
>> +  if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
>> +return;
>
> Where is it checked that the undoing transformation is legal also
> from a data dep point of view?  Think code like this:
>
>sum = X[i];
>for (j ...)
>  sum += X[j];
>X[i] = sum;
>
> Moving the store into the inner loop isn't always correct and I don't seem
> to find where the above situation is rejected.
Yeah.  With the old patch, it was possible to have such a loop wrongly
interchanged; in practice, it's hard to create an example.  The pass will give
up when computing data dependences between references in the inner/outer loops.
In this updated patch, it's fixed by giving up if there is any dependence
between references of the inner/outer loops.

>
> Maybe I'm confused because I also don't see where you even can get into
> the above situation (though I do see testcases about this).  The thing is,
> for a 2d loop nest to contain something like the above reduction it can't
> be perfect:
>
>for (j) {
>  int sum = X[j];  // 1
>  for (i)
>sum += Y[j][i];
>  X[j] = sum;  // 2
>}
>
> But you do check for perfectness in proper_loop_form_for_interchange and
> prepare_perfect_loop_nest, so either you can't get into the situation or
> the checking can't be complete, or you define the above to be perfect
> nevertheless (probably because the load and store are in outer loop
> header/exit blocks?).  The latter would mean that you accept also other
> code in header/footer of loops from a pure CFG perspective, so where is it
> checked that that other code (which aren't simple reductions) isn't
> harmful to the transformation?
Yes, I used the name perfect loop nest, but the pass can handle a special form
of imperfect loop nest for the simple reduction.  I added comments describing
this before function prepare_perfect_loop_nest.

>
> Then, the data dependence part of the new pass:
>
>> +bool
>> tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned outer)
>> +{
>> +  struct data_dependence_relation *ddr;
>> +
>> +  for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
>> +{
>> +  /* Skip no-dependence case.  */
>> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> + continue;
>> +
>> +  for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
>> + {
>> +   lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
>> +   unsigned level = dependence_level (dist_vect, loop_nest.length ());
>> +
>> +   /* If there is no carried dependence.  */
>> +   if (level == 0)
>> + continue;
>> +
>> +   level --;
>> +   /* Skip case which has '>' as the leftmost direction.  */
>> +   if (!lambda_vector_lexico_pos (dist_vect, level))
>> + return false;
>
> Shouldn't happen as dist vectors are forced positive via DDR_REVERSED.
Done.

>
>> +   /* If dependence is carried by outer loop of the two loops for
>> +  interchange.  */
>> +   if (level < outer)
>> + continue;
>> +
>> +   lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
>> +   /* If directions at both inner/outer levels are the same.  */
>> +   if (dir_vect[inner] == dir_vect[outer])
>> + continue

Re: [PATCH][GCC][mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

2017-11-16 Thread Tamar Christina
Hi Richard,

> 
> I'd have made it
> 
>   if { ([is-effective-target non_strict_align]
> && ! ( [istarget ...] || ))
> 
> thus default it to 1 for non-strict-align targets.
> 

Fair, I've switched it to a blacklist and have excluded the only one I know
should not work. Most of the rest will get blocked by non_strict_align, and for
the few others I'll adjust the testcase accordingly if there are any issues.

> > But this also raises a question: some targets have defined SLOW_UNALIGNED_ACCESS
> > in a way that uses only internal state to determine the value, where STRICT_ALIGNMENT
> > is essentially ignored, e.g. PowerPC and riscv.
> > 
> > The code generation *might* change for them but the tests won't run. I see no way to
> > make the test accurate (as in, runs in all cases where the codegen changed)
> > unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.
> > 
> > Would this be the way to go?
> 
> I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
> a certain alignment.
> 

Ah, right! that slipped my mind for a bit.

Ok for trunk?

Thanks for the review,
Tamar
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a81c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index ..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-...@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+ struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..03413c323d00e88872879a741ab3c015e052311d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,31 @@ proc check_effective_target_unaligned_stack { } {
 return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access when using word size accesses.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_word_mode_no_slow_unalign { } {
+global et_word_mode_no_slow_unalign_saved
+global et_index
+
+if [info exists et_word_mode_no_slow_unalign_saved($et_index)] {
+verbose "check_effective_target_word_mode_no_slow_unalign: \
+ using cached result" 

RE: [PATCH, committed] Add myself to MAINTAINERS

2017-11-16 Thread Peryt, Sebastian
Message didn't get thru for some reason. Resending.

Sebastian

From: Peryt, Sebastian 
Sent: Wednesday, November 15, 2017 1:44 PM
To: gcc-patches@gcc.gnu.org
Cc: Peryt, Sebastian 
Subject: [PATCH, committed] Add myself to MAINTAINERS

ChangeLog:

2017-11-15  Sebastian Peryt  

    * MAINTAINERS (write after approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 254760)
+++ MAINTAINERS (working copy)
@@ -532,6 +532,7 @@
Devang Patel   
Andris Pavenis 
Fernando Pereira   
+Sebastian Peryt    

Kaushik Phatak 
Nicolas Pitre  
Paul Pluzhnikov    

Sebastian



[patch] remove cilk-plus

2017-11-16 Thread Koval, Julia
// I failed to send patch itself, it is too big even in gzipped form.  What is 
the right way to send such big patches?

Hi, this patch removes cilkplus. Ok for trunk?

2017-11-16  Julia Koval  
Sebastian Peryt  
gcc/
* Makefile.def (target_modules): Remove libcilkrts.
* Makefile.in: Ditto.
* configure: Ditto.
* configure.ac: Ditto.
* contrib/gcc_update: Ditto.
* Makefile.in (cilkplus.def, cilk-builtins.def, c-family/cilk.o, 
c-family/c-cilkplus.o, c-family/array-notation-common.o,
cilk-common.o, cilk.h, cilk-common.c): Remove.
* builtin-types.def (BT_FN_INT_PTR_PTR_PTR_FTYPE_BT_INT_BT_PTR_BT_PTR
_BT_PTR): Remove.
* builtins.c (is_builtin_name): Remove cilkplus condition.
(BUILT_IN_CILK_DETACH, BUILT_IN_CILK_POP_FRAME): Remove.
* builtins.def (DEF_CILK_BUILTIN_STUB, DEF_CILKPLUS_BUILTIN,
cilk-builtins.def, cilkplus.def): Remove.
* c-family/array-notation-common.c: Delete.
* c-family/c-cilkplus.c: Ditto.
* c-family/c-common.c (_Cilk_spawn, _Cilk_sync, _Cilk_for): Remove.
* c-family/c-common.def (ARRAY_NOTATION_REF): Remove.
* c-family/c-common.h (RID_CILK_SPAWN, build_array_notation_expr,
build_array_notation_ref, C_ORT_CILK, c_check_cilk_loop,
c_validate_cilk_plus_loop, cilkplus_an_parts, 
cilk_ignorable_spawn_rhs_op,
cilk_recognize_spawn): Remove.
* c-family/c-gimplify.c (CILK_SPAWN_STMT): Remove.
* c-family/c-omp.c: Remove CILK_SIMD check.
* c-family/c-pragma.c: Ditto.
* c-family/c-pragma.h: Remove CILK related pragmas.
* c-family/c-pretty-print.c (c_pretty_printer::postfix_expression): 
Remove
ARRAY_NOTATION_REF condition.
(c_pretty_printer::expression): Ditto.
* c-family/c.opt (fcilkplus): Remove.
* c-family/cilk.c: Delete.
* c/Make-lang.in (c/c-array-notation.o): Remove.
* c/c-array-notation.c: Delete.
* c/c-decl.c: Remove cilkplus condition.
* c/c-parser.c (c_parser_cilk_simd, c_parser_cilk_for,
c_parser_cilk_verify_simd, c_parser_array_notation,
c_parser_cilk_clause_vectorlength, c_parser_cilk_grainsize,
c_parser_cilk_simd_fn_vector_attrs, c_finish_cilk_simd_fn_tokens): 
Delete.
(c_parser_declaration_or_fndef): Remove cilkplus condition.
(c_parser_direct_declarator_inner): Ditto.
(CILK_SIMD_FN_CLAUSE_MASK): Delete.
(c_parser_attributes): Remove cilk-plus condition.
(c_parser_compound_statement): Ditto.
(c_parser_statement_after_labels): Ditto.
(c_parser_if_statement): Ditto.
(c_parser_switch_statement): Ditto.
(c_parser_while_statement): Ditto.
(c_parser_do_statement): Ditto.
(c_parser_for_statement): Ditto.
(c_parser_unary_expression): Ditto.
(c_parser_postfix_expression): Ditto.
(c_parser_postfix_expression_after_primary): Ditto.
(c_parser_pragma): Ditto.
(c_parser_omp_clause_name): Ditto.
(c_parser_omp_all_clauses): Ditto.
(c_parser_omp_for_loop): Ditto.
(c_finish_omp_declare_simd): Ditto.
* c/c-typeck.c (build_array_ref, build_function_call_vec, 
convert_arguments,
lvalue_p, build_compound_expr, c_finish_return, c_finish_if_stmt,
c_finish_loop, build_binary_op): Remove cilkplus condition.
* cif-code.def (CILK_SPAWN): Remove.
* cilk-builtins.def: Delete.
* cilk-common.c: Ditto.
* cilk.h: Ditto.
* cilkplus.def: Ditto.
* config/darwin.h (fcilkplus): Delete.
* cp/Make-lang.in (cp/cp-array-notation.o, cp/cp-cilkplus.o): Delete.
* cp/call.c (convert_for_arg_passing, build_cxx_call): Remove cilkplus.
* cp/constexpr.c (potential_constant_expression_1): Ditto.
* cp/cp-array-notation.c: Delete.
* cp/cp-cilkplus.c: Ditto.
* cp/cp-cilkplus.h: Ditto.
* cp/cp-gimplify.c (cp_gimplify_expr, cp_fold_r, cp_genericize): Remove
cilkplus condition.
* cp/cp-objcp-common.c (ARRAY_NOTATION_REF): Delete.
* cp/cp-tree.h (cilkplus_an_triplet_types_ok_p): Delete.
* cp/decl.c (grokfndecl, finish_function): Remove cilkplus condition.
* cp/error.c (dump_decl, dump_expr): Remove ARRAY_NOTATION_REF 
condition.
* cp/lambda.c (cp-cilkplus.h): Remove.
* cp/parser.c (cp_parser_cilk_simd, cp_parser_cilk_for,
cp_parser_cilk_simd_vectorlength): Delete.
(cp_debug_parser, cp_parser_ctor_initializer_opt_and_function_body,
cp_parser_postfix_expression, cp_parser_postfix_open_square_expression,
cp_parser_statement, cp_parser_jump_statement, 
cp_parser_direct_declarator,
cp_parser_late_return_type_opt, cp_parser_gnu_attribute_list,
cp_parser_omp_clause_name, cp_parser_omp_clause_aligned,
cp_parser_omp_clause_linear, cp_parser_omp_all_clauses,

Re: [patch] remove cilk-plus

2017-11-16 Thread Marek Polacek
On Thu, Nov 16, 2017 at 03:33:40PM +, Koval, Julia wrote:
> // I failed to send patch itself, it is too big even in gzipped form.  What 
> is the right way to send such big patches?
 
You can split the patch and then post each part in a separate e-mail.
Easier to review, too.

> Hi, this patch removes cilkplus. Ok for trunk?

Happy to see this, but the CL will have to be adjusted, e.g. no "c-family/"
prefix and similar.

Marek


Re: [PATCH] Disable -ftrapping-math by default

2017-11-16 Thread Wilco Dijkstra
Richard Biener wrote:

> We are generally not preserving traps but we guard any transform that
> might introduce traps with -ftrapping-math.  That's similar to how we treat
> -ftrapv and pointer dereferences.

Right. It appears it's mostly concerned about division - if it is about division
by zero aborting like a null pointer reference then maybe it should be renamed
to -ftrap-fp-division-by-zero? Are there targets that abort like this?

> We're mitigating the "bad" effect of the -ftrapping-math default
> by defaulting to -fno-signaling-nans.
>
> If it doesn't block any optimizations what's the point of the patch?

It certainly blocks some optimizations; I noticed it affects the division
reciprocal optimization.

Wilco

Re: [patch] remove cilk-plus

2017-11-16 Thread Jakub Jelinek
On Thu, Nov 16, 2017 at 03:33:40PM +, Koval, Julia wrote:
> // I failed to send patch itself, it is too big even in gzipped form.  What 
> is the right way to send such big patches?

Don't include the libcilkrts subtree in the patch nor /cilk-plus/
testcases that are going to be removed?

> Hi, this patch removes cilkplus. Ok for trunk?
> 
> 2017-11-16  Julia Koval  
>   Sebastian Peryt  
> gcc/
>   * Makefile.def (target_modules): Remove libcilkrts.
>   * Makefile.in: Ditto.
>   * configure: Ditto.
>   * configure.ac: Ditto.

The ChangeLog needs work, e.g. we have many different ChangeLog files and
changes should be relative to that.  The above entries are for toplevel.

>   * contrib/gcc_update: Ditto.

This one is for contrib/ChangeLog, so should be without contrib/
in the entry.

>   * Makefile.in (cilkplus.def, cilk-builtins.def, c-family/cilk.o, 
>   c-family/c-cilkplus.o, c-family/array-notation-common.o,
>   cilk-common.o, cilk.h, cilk-common.c): Remove.
>   * builtin-types.def (BT_FN_INT_PTR_PTR_PTR_FTYPE_BT_INT_BT_PTR_BT_PTR
>   _BT_PTR): Remove.

There should be no linebreaks within one identifier.  So
* builtin-types.def
(BT_FN_INT_PTR_PTR_PTR_FTYPE_BT_INT_BT_PTR_BT_PTR_BT_PTR): Remove.

>   * c-family/array-notation-common.c: Delete.
>   * c-family/c-cilkplus.c: Ditto.
>   * c-family/c-common.c (_Cilk_spawn, _Cilk_sync, _Cilk_for): Remove.
>   * c-family/c-common.def (ARRAY_NOTATION_REF): Remove.
>   * c-family/c-common.h (RID_CILK_SPAWN, build_array_notation_expr,
>   build_array_notation_ref, C_ORT_CILK, c_check_cilk_loop,
>   c_validate_cilk_plus_loop, cilkplus_an_parts, 
> cilk_ignorable_spawn_rhs_op,
>   cilk_recognize_spawn): Remove.
>   * c-family/c-gimplify.c (CILK_SPAWN_STMT): Remove.
>   * c-family/c-omp.c: Remove CILK_SIMD check.
>   * c-family/c-pragma.c: Ditto.
>   * c-family/c-pragma.h: Remove CILK related pragmas.
>   * c-family/c-pretty-print.c (c_pretty_printer::postfix_expression): 
> Remove
>   ARRAY_NOTATION_REF condition.
>   (c_pretty_printer::expression): Ditto.
>   * c-family/c.opt (fcilkplus): Remove.
>   * c-family/cilk.c: Delete.

c-family has its own ChangeLog, no c-family/ prefix (similarly for c/, cp/,
etc.).

>   * c/Make-lang.in (c/c-array-notation.o): Remove.
>   * c/c-array-notation.c: Delete.
>   * c/c-decl.c: Remove cilkplus condition.
>   * c/c-parser.c (c_parser_cilk_simd, c_parser_cilk_for,
>   c_parser_cilk_verify_simd, c_parser_array_notation,
>   c_parser_cilk_clause_vectorlength, c_parser_cilk_grainsize,
c_parser_cilk_simd_fn_vector_attrs, c_finish_cilk_simd_fn_tokens): 
Delete.

Too long line.

>   (c_parser_declaration_or_fndef): Remove cilkplus condition.
>   (c_parser_direct_declarator_inner): Ditto.
>   (CILK_SIMD_FN_CLAUSE_MASK): Delete.
>   (c_parser_attributes): Remove cilk-plus condition.
>   (c_parser_compound_statement): Ditto.
>   (c_parser_statement_after_labels): Ditto.
>   (c_parser_if_statement): Ditto.
>   (c_parser_switch_statement): Ditto.
>   (c_parser_while_statement): Ditto.
>   (c_parser_do_statement): Ditto.
>   (c_parser_for_statement): Ditto.
>   (c_parser_unary_expression): Ditto.
>   (c_parser_postfix_expression): Ditto.
>   (c_parser_postfix_expression_after_primary): Ditto.
>   (c_parser_pragma): Ditto.
>   (c_parser_omp_clause_name): Ditto.
>   (c_parser_omp_all_clauses): Ditto.
>   (c_parser_omp_for_loop): Ditto.
>   (c_finish_omp_declare_simd): Ditto.

Perhaps you could shorten by writing:
(c_parser_attributes, c_parser_compound_statement,
c_parser_statement_after_labels, c_parser_if_statement,
c_parser_switch_statement, c_parser_while_statement,
c_parser_do_statement, c_parser_for_statement,
c_parser_unary_expression, c_parser_postfix_expression,
c_parser_postfix_expression_after_primary, c_parser_pragma,
c_parser_omp_clause_name, c_parser_omp_all_clauses,
c_parser_omp_for_loop, c_finish_omp_declare_simd):
Remove cilkplus support.
etc.
* c/c-typeck.c (build_array_ref, build_function_call_vec, 
convert_arguments,

Too long line (various others).

>   * tree-core.h
>   * tree-nested.c
>   * tree-pretty-print.c
>   * tree.c
>   * tree.def
>   * tree.h

Description on what changed is missing.

>   * g++.dg/cilk-plus/AN/array_function.c: Delete.c

Delete.c ?  Should be Delete.

Jakub


Re: [PATCH] Improve -Wmaybe-uninitialized documentation

2017-11-16 Thread Martin Sebor

On 11/16/2017 03:49 AM, Jonathan Wakely wrote:

On 15/11/17 20:28 -0700, Martin Sebor wrote:

On 11/15/2017 07:31 AM, Jonathan Wakely wrote:

The docs for -Wmaybe-uninitialized have some issues:

- That first sentence is looong.
- Apparently some C++ programmers think "automatic variable" means one
declared with C++11 `auto`, rather than simply a local variable.
- The sentence about only warning when optimizing is stuck in between
two chunks talking about longjmp, which could be inferred to mean
only the setjmp/longjmp part of the warning depends on optimization.

This attempts to make it easier to parse and understand.


I've always found the description remarkably precise.  Particularly
the bit where it talks about the two paths, one initialized and the
other not.  Your rewording loses that distinction so I don't think
it's as accurate, or even correct.

To use an example, this would satisfy the new description:

 int f (void)
 {
   int i;
   return i;
 }

but it doesn't match GCC behavior (it triggers -Wuninitialized,
not -Wmaybe-uninitialized).  Unless the distinction is more
subtle than I ascribe to it I think it needs to be preserved
in the rewording.


Ah, I tested a similar case and missed that the warning I got was from
-Wuninitialized not -Wmaybe-uninitialized, which made me think that
"a use of the variable that is initialized" was wrong.

OK, so then here's an alternative patch which doesn't touch that first
sentence except to add "(i.e. local)". That makes the first sentence
even longer, but if it's accurate maybe that's OK. This still adds
"These warnings are only possible in optimizing compilation, because
otherwise GCC does not keep track of the state of variables." And
removes the similar text from the middle of the setjmp/longjmp
discussion.


Thanks, this looks fine to me.

As an aside, I wonder if you think that rewording the part about
GCC not being smart enough might be worthwhile:

 These warnings are made optional because GCC is not smart enough
 to see all the reasons why the code might be correct in spite of
 appearing to have an error.

It sounds just a little pejorative (or maybe just colloquial) to
me for the manual.  Perhaps:

 These warnings are made optional because GCC may not be able to
 determine when the code is correct in spite of appearing to have
 an error.

Martin


[PATCH][PR c++/82888] smarter code for default initialization of scalar arrays

2017-11-16 Thread Nathan Froyd
Default-initialization of scalar arrays in C++ member initialization
lists produced rather slow code, laboriously setting each element of the
array to zero.  It would be much faster to block-initialize the array,
and that's what this patch does.

The patch works for me, but I'm not sure if it's the best way to
accomplish this.  At least two other possibilities come to mind:

1) Detect this case in build_vec_init_expr and act as though the user
   wrote 'member{0}', which the front-end already produces efficient
   code for.

2) Detect this case in build_vec_init, but again, act as though the user
   wrote 'member{0}' and let everything proceed as normal.
   (Alternatively, handle this case prior to calling build_vec_init and
   pass different arguments to build_vec_init.)

Opinions as to the best way forward here?  I'm unsure of whether the
code below is front-end friendly; I see in the gimple dumps that the
solution below adds an extra CLOBBER on 'this' for 'member()', whereas
'member{0}' does not.  It's possible that I'm missing something.

Bootstrapped on x86_64-unknown-linux-gnu, no regressions.

OK for trunk?

-Nathan

gcc/cp/
PR c++/82888
* init.c (build_vec_init): Handle default-initialization of array
types.

gcc/testsuite/
PR c++/82888
* g++.dg/init/pr82888.C: New.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index c76460d..53d6133 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -4038,6 +4038,15 @@ build_vec_init (tree base, tree maxindex, tree init,
}
 }
 
+  /* Default-initialize scalar arrays directly.  */
+  if (TREE_CODE (atype) == ARRAY_TYPE
+  && SCALAR_TYPE_P (TREE_TYPE (atype))
+  && !init)
+{
+  gcc_assert (!from_array);
+  return build2 (MODIFY_EXPR, atype, base, build_constructor (atype, NULL));
+}
+
   /* If we have a braced-init-list or string constant, make sure that the array
  is big enough for all the initializers.  */
   bool length_check = (init
diff --git a/gcc/testsuite/g++.dg/init/pr82888.C b/gcc/testsuite/g++.dg/init/pr82888.C
new file mode 100644
index 000..9225e23
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/pr82888.C
@@ -0,0 +1,18 @@
+// { dg-do compile }
+// { dg-options "-fdump-tree-gimple" }
+
+class A
+{
+public:
+  A();
+
+private:
+  unsigned char mStorage[4096];
+};
+
+A::A()
+  : mStorage()
+{}
+
+// { dg-final { scan-tree-dump "this->mStorage = {}" "gimple" } }
+// { dg-final { scan-tree-dump-not "&this->mStorage" "gimple" } }


Re: [patch] remove cilk-plus

2017-11-16 Thread Eric Gallager
On 11/16/17, Koval, Julia  wrote:
> // I failed to send patch itself, it is too big even in gzipped form.  What
> is the right way to send such big patches?
>
> Hi, this patch removes cilkplus. Ok for trunk?

I'm not a reviewer, but just as an onlooker, I'd want to see notes
about the removal in the caveats section of
https://gcc.gnu.org/gcc-8/changes.html

>
> 2017-11-16  Julia Koval  
>   Sebastian Peryt  
> gcc/
>   * Makefile.def (target_modules): Remove libcilkrts.
>   * Makefile.in: Ditto.
>   * configure: Ditto.
>   * configure.ac: Ditto.
>   * contrib/gcc_update: Ditto.
>   * Makefile.in (cilkplus.def, cilk-builtins.def, c-family/cilk.o,
>   c-family/c-cilkplus.o, c-family/array-notation-common.o,
>   cilk-common.o, cilk.h, cilk-common.c): Remove.
>   * builtin-types.def (BT_FN_INT_PTR_PTR_PTR_FTYPE_BT_INT_BT_PTR_BT_PTR
>   _BT_PTR): Remove.
>   * builtins.c (is_builtin_name): Remove cilkplus condition.
>   (BUILT_IN_CILK_DETACH, BUILT_IN_CILK_POP_FRAME): Remove.
>   * builtins.def (DEF_CILK_BUILTIN_STUB, DEF_CILKPLUS_BUILTIN,
>   cilk-builtins.def, cilkplus.def): Remove.
>   * c-family/array-notation-common.c: Delete.
>   * c-family/c-cilkplus.c: Ditto.
>   * c-family/c-common.c (_Cilk_spawn, _Cilk_sync, _Cilk_for): Remove.
>   * c-family/c-common.def (ARRAY_NOTATION_REF): Remove.
>   * c-family/c-common.h (RID_CILK_SPAWN, build_array_notation_expr,
>   build_array_notation_ref, C_ORT_CILK, c_check_cilk_loop,
>   c_validate_cilk_plus_loop, cilkplus_an_parts, 
> cilk_ignorable_spawn_rhs_op,
>   cilk_recognize_spawn): Remove.
>   * c-family/c-gimplify.c (CILK_SPAWN_STMT): Remove.
>   * c-family/c-omp.c: Remove CILK_SIMD check.
>   * c-family/c-pragma.c: Ditto.
>   * c-family/c-pragma.h: Remove CILK related pragmas.
>   * c-family/c-pretty-print.c (c_pretty_printer::postfix_expression): 
> Remove
>   ARRAY_NOTATION_REF condition.
>   (c_pretty_printer::expression): Ditto.
>   * c-family/c.opt (fcilkplus): Remove.
>   * c-family/cilk.c: Delete.
>   * c/Make-lang.in (c/c-array-notation.o): Remove.
>   * c/c-array-notation.c: Delete.
>   * c/c-decl.c: Remove cilkplus condition.
>   * c/c-parser.c (c_parser_cilk_simd, c_parser_cilk_for,
>   c_parser_cilk_verify_simd, c_parser_array_notation,
>   c_parser_cilk_clause_vectorlength, c_parser_cilk_grainsize,
>   c_parser_cilk_simd_fn_vector_attrs, c_finish_cilk_simd_fn_tokens): 
> Delete.
>   (c_parser_declaration_or_fndef): Remove cilkplus condition.
>   (c_parser_direct_declarator_inner): Ditto.
>   (CILK_SIMD_FN_CLAUSE_MASK): Delete.
>   (c_parser_attributes): Remove cilk-plus condition.
>   (c_parser_compound_statement): Ditto.
>   (c_parser_statement_after_labels): Ditto.
>   (c_parser_if_statement): Ditto.
>   (c_parser_switch_statement): Ditto.
>   (c_parser_while_statement): Ditto.
>   (c_parser_do_statement): Ditto.
>   (c_parser_for_statement): Ditto.
>   (c_parser_unary_expression): Ditto.
>   (c_parser_postfix_expression): Ditto.
>   (c_parser_postfix_expression_after_primary): Ditto.
>   (c_parser_pragma): Ditto.
>   (c_parser_omp_clause_name): Ditto.
>   (c_parser_omp_all_clauses): Ditto.
>   (c_parser_omp_for_loop): Ditto.
>   (c_finish_omp_declare_simd): Ditto.
>   * c/c-typeck.c (build_array_ref, build_function_call_vec,
>   convert_arguments,
>   lvalue_p, build_compound_expr, c_finish_return, c_finish_if_stmt,
>   c_finish_loop, build_binary_op): Remove cilkplus condition.
>   * cif-code.def (CILK_SPAWN): Remove.
>   * cilk-builtins.def: Delete.
>   * cilk-common.c: Ditto.
>   * cilk.h: Ditto.
>   * cilkplus.def: Ditto.
>   * config/darwin.h (fcilkplus): Delete.
>   * cp/Make-lang.in (cp/cp-array-notation.o, cp/cp-cilkplus.o): Delete.
>   * cp/call.c (convert_for_arg_passing, build_cxx_call): Remove cilkplus.
>   * cp/constexpr.c (potential_constant_expression_1): Ditto.
>   * cp/cp-array-notation.c: Delete.
>   * cp/cp-cilkplus.c: Ditto.
>   * cp/cp-cilkplus.h: Ditto.
>   * cp/cp-gimplify.c (cp_gimplify_expr, cp_fold_r, cp_genericize): Remove
>   cilkplus condition.
>   * cp/cp-objcp-common.c (ARRAY_NOTATION_REF): Delete.
>   * cp/cp-tree.h (cilkplus_an_triplet_types_ok_p): Delete.
>   * cp/decl.c (grokfndecl, finish_function): Remove cilkplus condition.
>   * cp/error.c (dump_decl, dump_expr): Remove ARRAY_NOTATION_REF
>   condition.
>   * cp/lambda.c (cp-cilkplus.h): Remove.
>   * cp/parser.c (cp_parser_cilk_simd, cp_parser_cilk_for,
>   cp_parser_cilk_simd_vectorlength): Delete.
>   (cp_debug_parser, cp_parser_ctor_initializer_opt_and_function_body,
>   cp_parser_postfix_expression, cp_parser_postfix_open_square_expression,
>   cp_parser_statement, cp_parser_jump_statement,
> cp_parser_direct_decl

Use profile count scaling in vect_do_peeling

2017-11-16 Thread Jan Hubicka
Hi,
this is one of the two remaining places where we scale by integer ratios rather
than counts, which loses quality info.  This is because we scale up here, which
is technically a bad idea: we lose precision, and all code duplication should
perform the scaling at once, as the last step, to avoid accumulating mistakes.

But since the updating logic is quite tricky, I decided to simply rewrite the
scaling to counts for now.
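
The precision concern can be illustrated with a small standalone sketch.  It
is not GCC code: scale_count and scale_down_then_up are hypothetical helpers
that only mimic rounding integer scaling, to show how a count that is scaled
down and then back up may not return to its original value, and may even drop
to zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a GCC function): scale COUNT by NUM/DEN,
// rounding to nearest, in 64-bit integer arithmetic.
static int64_t scale_count(int64_t count, int64_t num, int64_t den)
{
    return (count * num + den / 2) / den;
}

// Scale down by 1/DEN and then back up by DEN/1, rounding at each step.
// The intermediate rounding is where the information is lost.
static int64_t scale_down_then_up(int64_t count, int64_t den)
{
    int64_t down = scale_count(count, 1, den);
    return scale_count(down, den, 1);
}
```

With these helpers, a count of 7 scaled down and up by 10 comes back as 10,
and a count of 3 comes back as 0, while a single combined scaling step with
the ratio 10/10 leaves both values unchanged.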

Bootstrapped/regtested x86_64-linux, committed.

Honza

* tree-vect-loop-manip.c (vect_do_peeling): Do not use
scale_bbs_frequencies_int.
Index: tree-vect-loop-manip.c
===
--- tree-vect-loop-manip.c  (revision 254767)
+++ tree-vect-loop-manip.c  (working copy)
@@ -1844,14 +1844,16 @@ vect_do_peeling (loop_vec_info loop_vinf
  /* Simply propagate profile info from guard_bb to guard_to which is
 a merge point of control flow.  */
  guard_to->count = guard_bb->count;
+
  /* Scale probability of epilog loop back.
 FIXME: We should avoid scaling down and back up.  Profile may
 get lost if we scale down to 0.  */
- int scale_up = REG_BR_PROB_BASE * REG_BR_PROB_BASE
-/ prob_vector.to_reg_br_prob_base ();
  basic_block *bbs = get_loop_body (epilog);
- scale_bbs_frequencies_int (bbs, epilog->num_nodes, scale_up,
-REG_BR_PROB_BASE);
+ for (unsigned int i = 0; i < epilog->num_nodes; i++)
+   bbs[i]->count = bbs[i]->count.apply_scale
+(bbs[i]->count,
+ bbs[i]->count.apply_probability
+   (prob_vector));
  free (bbs);
}
 


Avoid integer profile scaling in tree_transform_and_unroll_loop

2017-11-16 Thread Jan Hubicka
Hi,
this is the last remaining case of integer scaling.  The issue is again the
same: we scale up, which is not the best idea, and unrolling done via
cfgloopmanip gets around without doing it.
Again I decided to keep the logic for now and just update it to profile counts.

Bootstrapped/regtested x86_64-linux, committed.

* tree-ssa-loop-manip.c
(scale_dominated_blocks_in_loop): Update to profile counts.
(tree_transform_and_unroll_loop): Likewise.
Index: tree-ssa-loop-manip.c
===
--- tree-ssa-loop-manip.c   (revision 254767)
+++ tree-ssa-loop-manip.c   (working copy)
@@ -1091,11 +1091,11 @@ determine_exit_conditions (struct loop *
 
 static void
 scale_dominated_blocks_in_loop (struct loop *loop, basic_block bb,
-   int num, int den)
+   profile_count num, profile_count den)
 {
   basic_block son;
 
-  if (den == 0)
+  if (!den.nonzero_p () && !(num == profile_count::zero ()))
 return;
 
   for (son = first_dom_son (CDI_DOMINATORS, bb);
@@ -1104,7 +1104,7 @@ scale_dominated_blocks_in_loop (struct l
 {
   if (!flow_bb_inside_loop_p (loop, son))
continue;
-  scale_bbs_frequencies_int (&son, 1, num, den);
+  scale_bbs_frequencies_profile_count (&son, 1, num, den);
   scale_dominated_blocks_in_loop (loop, son, num, den);
 }
 }
@@ -1281,9 +1281,10 @@ tree_transform_and_unroll_loop (struct l
 scale_dominated_blocks_in_loop (loop, exit->src,
/* We are scaling up here so probability
   does not fit.  */
-   REG_BR_PROB_BASE,
-   REG_BR_PROB_BASE
-   - exit->probability.to_reg_br_prob_base ());
+   loop->header->count,
+   loop->header->count
+   - loop->header->count.apply_probability
+(exit->probability));
 
   bsi = gsi_last_bb (exit_bb);
   exit_if = gimple_build_cond (EQ_EXPR, integer_zero_node,
@@ -1377,8 +1378,7 @@ tree_transform_and_unroll_loop (struct l
 {
   /* Avoid dropping loop body profile counter to 0 because of zero count
 in loop's preheader.  */
-  if (freq_e == profile_count::zero ())
-freq_e = profile_count::from_gcov_type (1);
+  freq_e = freq_e.force_nonzero ();
   scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
 }
 


Remove scale_bbs_frequencies_int, scale_bbs_frequencies_gcov_type

2017-11-16 Thread Jan Hubicka
Hi,
since all uses of those functions are now updated to profile counts and
probabilities, we can remove them.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* cfg.c (scale_bbs_frequencies_int,
scale_bbs_frequencies_gcov_type): Remove.
* cfg.h (scale_bbs_frequencies_int,
scale_bbs_frequencies_gcov_type): Remove.
Index: cfg.c
===
--- cfg.c   (revision 254767)
+++ cfg.c   (working copy)
@@ -917,48 +917,6 @@ update_bb_profile_for_threading (basic_b
 }
 
 /* Multiply all frequencies of basic blocks in array BBS of length NBBS
-   by NUM/DEN, in int arithmetic.  May lose some accuracy.  */
-void
-scale_bbs_frequencies_int (basic_block *bbs, int nbbs, int num, int den)
-{
-  int i;
-  if (num < 0)
-num = 0;
-
-  /* Scale NUM and DEN to avoid overflows.  Frequencies are in order of
- 10^4, if we make DEN <= 10^3, we can afford to upscale by 100
- and still safely fit in int during calculations.  */
-  if (den > 1000)
-{
-  if (num > 100)
-   return;
-
-  num = RDIV (1000 * num, den);
-  den = 1000;
-}
-  if (num > 100 * den)
-return;
-
-  for (i = 0; i < nbbs; i++)
-{
-  bbs[i]->count = bbs[i]->count.apply_scale (num, den);
-}
-}
-
-/* Multiply all frequencies of basic blocks in array BBS of length NBBS
-   by NUM/DEN, in gcov_type arithmetic.  More accurate than previous
-   function but considerably slower.  */
-void
-scale_bbs_frequencies_gcov_type (basic_block *bbs, int nbbs, gcov_type num,
-gcov_type den)
-{
-  int i;
-
-  for (i = 0; i < nbbs; i++)
-bbs[i]->count = bbs[i]->count.apply_scale (num, den);
-}
-
-/* Multiply all frequencies of basic blocks in array BBS of length NBBS
by NUM/DEN, in profile_count arithmetic.  More accurate than previous
function but considerably slower.  */
 void
Index: cfg.h
===
--- cfg.h   (revision 254767)
+++ cfg.h   (working copy)
@@ -107,9 +107,6 @@ extern basic_block debug_bb_n (int);
 extern void dump_bb_info (FILE *, basic_block, int, dump_flags_t, bool, bool);
 extern void brief_dump_cfg (FILE *, dump_flags_t);
 extern void update_bb_profile_for_threading (basic_block, profile_count, edge);
-extern void scale_bbs_frequencies_int (basic_block *, int, int, int);
-extern void scale_bbs_frequencies_gcov_type (basic_block *, int, gcov_type,
-gcov_type);
 extern void scale_bbs_frequencies_profile_count (basic_block *, int,
 profile_count, profile_count);
 extern void scale_bbs_frequencies (basic_block *, int, profile_probability);


[PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-16 Thread Sudi Das
Hi

This patch fixes the test case armv8_2-fp16-move-1.c for
arm-none-linux-gnueabihf, where 2 of the scan-assembler directives were failing.
We now generate fewer vmovs between core and VFP registers, so the patch changes
those directives to reflect that.

Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi


*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler
directives.

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
 /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
 
-/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } }  */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } }  */
 
 int
 test_compare_1 (__fp16 a, __fp16 b)


Set edges to region known to be executed 0 times to have probability 0

2017-11-16 Thread Jan Hubicka
Hi,
this patch fixes one profile mismatch issue in the testsuite where we end up with
a non-zero probability on an edge to a BB calling abort.

When detecting regions known to be executed 0 times, we should also set
edges leading to them to be executed 0 times.

The other change makes combine_predictions_for_bb preserve this information
even if static branch prediction thinks otherwise.

Finally I noticed one unnecessary assert and the fact that I forgot to add a
call to propagate_unlikely_bbs_forward while breaking it out of
propagate_unlikely_bbs.
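
The rule that edges entering a region executed 0 times should themselves be
executed 0 times can be sketched outside of GCC as follows.  This is only an
illustration under simplifying assumptions: toy_edge, toy_cfg and
zero_edges_into_cold_blocks are hypothetical names, probabilities are plain
doubles rather than profile_probability, and the remaining edges are not
renormalized.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for GCC's edge and CFG; all names here are hypothetical.
struct toy_edge
{
    int src;            // index of the source block
    int dest;           // index of the destination block
    double probability; // branch probability in [0, 1]
};

struct toy_cfg
{
    std::vector<long> count;     // execution count per block
    std::vector<toy_edge> edges; // all edges of the graph
};

// Give probability 0 to every edge that leaves a block with a non-zero
// count and enters a block whose count is zero.  Returns the number of
// edges that were changed.
static int zero_edges_into_cold_blocks(toy_cfg &cfg)
{
    int changed = 0;
    for (toy_edge &e : cfg.edges)
        if (cfg.count[e.src] != 0 && cfg.count[e.dest] == 0
            && e.probability != 0.0)
        {
            e.probability = 0.0;
            ++changed;
        }
    return changed;
}
```

For a block executed 100 times with one successor that is also executed and
one that has count 0 (for example a block calling abort), only the edge to the
count-0 block is touched.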

Bootstrapped/regtested x86_64-linux, committed.

* predict.c (combine_predictions_for_bb): Preserve zero predicted
edges.
(expensive_function_p): Remove useless assert.
(determine_unlikely_bbs): Propagate also forward; determine cold blocks.
Index: predict.c
===
--- predict.c   (revision 254812)
+++ predict.c   (working copy)
@@ -1118,18 +1118,26 @@ combine_predictions_for_bb (basic_block
   int nedges = 0;
   edge e, first = NULL, second = NULL;
   edge_iterator ei;
+  int nzero = 0;
+  int nunknown = 0;
 
   FOR_EACH_EDGE (e, ei, bb->succs)
-if (!unlikely_executed_edge_p (e))
-  {
-   nedges ++;
-   if (first && !second)
- second = e;
-   if (!first)
- first = e;
-  }
-else if (!e->probability.initialized_p ())
-  e->probability = profile_probability::never ();
+{
+  if (!unlikely_executed_edge_p (e))
+{
+ nedges ++;
+ if (first && !second)
+   second = e;
+ if (!first)
+   first = e;
+}
+  else if (!e->probability.initialized_p ())
+e->probability = profile_probability::never ();
+ if (!e->probability.initialized_p ())
+nunknown++;
+ else if (e->probability == profile_probability::never ())
+   nzero++;
+}
 
   /* When there is no successor or only one choice, prediction is easy.
 
@@ -1283,8 +1291,27 @@ combine_predictions_for_bb (basic_block
 }
   clear_bb_predictions (bb);
 
-  if ((!bb->count.nonzero_p () || !first->probability.initialized_p ())
-  && !dry_run)
+
+  /* If we have only one successor which is unknown, we can compute missing
+ probablity.  */
+  if (nunknown == 1)
+{
+  profile_probability prob = profile_probability::always ();
+  edge missing = NULL;
+
+  FOR_EACH_EDGE (e, ei, bb->succs)
+   if (e->probability.initialized_p ())
+ prob -= e->probability;
+   else if (missing == NULL)
+ missing = e;
+   else
+ gcc_unreachable ();
+   missing->probability = prob;
+}
+  /* If nothing is unknown, we have nothing to update.  */
+  else if (!nunknown && nzero != (int)EDGE_COUNT (bb->succs))
+;
+  else if (!dry_run)
 {
   first->probability
 = profile_probability::from_reg_br_prob_base (combined_probability);
@@ -3334,16 +3361,11 @@ expensive_function_p (int threshold)
 {
   basic_block bb;
 
-  /* We can not compute accurately for large thresholds due to scaled
- frequencies.  */
-  gcc_assert (threshold <= BB_FREQ_MAX);
-
   /* If profile was scaled in a way entry block has count 0, then the function
  is deifnitly taking a lot of time.  */
   if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.nonzero_p ())
 return true;
 
-  /* Maximally BB_FREQ_MAX^2 so overflow won't happen.  */
   profile_count limit = ENTRY_BLOCK_PTR_FOR_FN
   (cfun)->count.apply_scale (threshold, 1);
   profile_count sum = profile_count::zero ();
@@ -3453,6 +3475,7 @@ determine_unlikely_bbs ()
 
   gcc_checking_assert (!bb->aux);
 }
+  propagate_unlikely_bbs_forward ();
 
   auto_vec nsuccs;
   nsuccs.safe_grow_cleared (last_basic_block_for_fn (cfun));
@@ -3498,7 +3521,6 @@ determine_unlikely_bbs ()
   FOR_EACH_EDGE (e, ei, bb->preds)
if (!(e->probability == profile_probability::never ()))
  {
-   e->probability = profile_probability::never ();
if (!(e->src->count == profile_count::zero ()))
  {
nsuccs[e->src->index]--;
@@ -3507,6 +3529,19 @@ determine_unlikely_bbs ()
  }
  }
 }
+  /* Finally all edges from non-0 regions to 0 are unlikely.  */
+  FOR_ALL_BB_FN (bb, cfun)
+if (!(bb->count == profile_count::zero ()))
+  FOR_EACH_EDGE (e, ei, bb->succs)
+   if (!(e->probability == profile_probability::never ())
+   && e->dest->count == profile_count::zero ())
+  {
+if (dump_file && (dump_flags & TDF_DETAILS))
+  fprintf (dump_file, "Edge %i->%i is unlikely because "
+   "it enters unlikely block\n",
+   bb->index, e->dest->index);
+e->probability = profile_probability::never ();
+  }
 }
 
 /* Estimate and propagate basic block frequencies using the given branch


Accumulate time in sreals consistently in ipa-fnsummary

2017-11-16 Thread Jan Hubicka
Hi,
this patch drops the use of integer bb frequencies in ipa-fnsummary.  This avoids
capping the frequency to 100 and makes it consistent with edge accounting.

ipcp-2.c needs updating because the accumulated time is now more realistic.
There is a loop iterating 32*32 times and we accounted it as a loop iterating
100 times.
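
The effect of the cap can be shown with a tiny model.  This is a sketch under
assumptions, not the GCC implementation: capped_time and uncapped_time are
hypothetical helpers, and plain doubles stand in for sreal.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the old accounting: the frequency of a block
// relative to the function entry is capped at 100.
static double capped_time(double stmt_time, double iterations)
{
    return stmt_time * std::min(iterations, 100.0);
}

// Hypothetical model of the new accounting: the frequency is used as is.
static double uncapped_time(double stmt_time, double iterations)
{
    return stmt_time * iterations;
}
```

For a statement of unit cost inside a loop nest iterating 32*32 times, the
capped model accounts a time of 100 while the uncapped one accounts 1024.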

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-fnsummary.c (analyze_function_body): Accumulate time consistently
in sreal.
* gcc.dg/ipa/ipcp-2.c: Lower threshold.
Index: ipa-fnsummary.c
===
--- ipa-fnsummary.c (revision 254812)
+++ ipa-fnsummary.c (working copy)
@@ -1986,7 +1986,7 @@ analyze_function_body (struct cgraph_nod
  <0,2>.  */
   basic_block bb;
   struct function *my_function = DECL_STRUCT_FUNCTION (node->decl);
-  int freq;
+  sreal freq;
   struct ipa_fn_summary *info = ipa_fn_summaries->get (node);
   predicate bb_predicate;
   struct ipa_func_body_info fbi;
@@ -2052,7 +2052,7 @@ analyze_function_body (struct cgraph_nod
   for (n = 0; n < nblocks; n++)
 {
   bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
-  freq = compute_call_stmt_bb_frequency (node->decl, bb);
+  freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count);
   if (clobber_only_eh_bb_p (bb))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -2127,7 +2127,7 @@ analyze_function_body (struct cgraph_nod
  fprintf (dump_file, "  ");
  print_gimple_stmt (dump_file, stmt, 0);
  fprintf (dump_file, "\t\tfreq:%3.2f size:%3i time:%3i\n",
-  ((double) freq) / CGRAPH_FREQ_BASE, this_size,
+  freq.to_double (), this_size,
   this_time);
}
 
@@ -2201,7 +2201,7 @@ analyze_function_body (struct cgraph_nod
will_be_nonconstant = true;
  if (this_time || this_size)
{
- this_time *= freq;
+ sreal final_time = (sreal)this_time * freq;
 
  prob = eliminated_by_inlining_prob (stmt);
  if (prob == 1 && dump_file && (dump_flags & TDF_DETAILS))
@@ -2218,7 +2218,7 @@ analyze_function_body (struct cgraph_nod
 
  if (*(is_gimple_call (stmt) ? &bb_predicate : &p) != false)
{
- time += this_time;
+ time += final_time;
  size += this_size;
}
 
@@ -2231,14 +2231,12 @@ analyze_function_body (struct cgraph_nod
{
  predicate ip = bb_predicate & predicate::not_inlined ();
  info->account_size_time (this_size * prob,
-  (sreal)(this_time * prob)
-  / (CGRAPH_FREQ_BASE * 2), ip,
+  (this_time * prob) / 2, ip,
   p);
}
  if (prob != 2)
info->account_size_time (this_size * (2 - prob),
-(sreal)(this_time * (2 - prob))
- / (CGRAPH_FREQ_BASE * 2),
+(this_time * (2 - prob) / 2),
 bb_predicate,
 p);
}
@@ -2256,7 +2254,6 @@ analyze_function_body (struct cgraph_nod
}
 }
   set_hint_predicate (&ipa_fn_summaries->get (node)->array_index, array_index);
-  time = time / CGRAPH_FREQ_BASE;
   free (order);
 
   if (nonconstant_names.exists () && !early)
Index: testsuite/gcc.dg/ipa/ipcp-2.c
===
--- testsuite/gcc.dg/ipa/ipcp-2.c   (revision 254812)
+++ testsuite/gcc.dg/ipa/ipcp-2.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp -fno-early-inlining --param ipa-cp-eval-threshold=100"  } */
+/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp -fno-early-inlining --param ipa-cp-eval-threshold=80"  } */
 /* { dg-add-options bind_pic_locally } */
 
 extern int get_stuff (int);


Accumulate time in sreals in ipa-fnsplit

2017-11-16 Thread Jan Hubicka
Hi,
this patch makes the same change to ipa-split as the previous patch did to
ipa-fnsummary.

Bootstrapped/regtested x86_64-linux.

Honza

* ipa-split.c (split_bb_info): Turn time to sreal.
(split_point): Likewise.
(dump_split_point): Likewise.
(find_split_points): Likewise.
(execute_split_functions): Only zero split_bbs; turn time to sreals.
Index: ipa-split.c
===
--- ipa-split.c (revision 254812)
+++ ipa-split.c (working copy)
@@ -111,7 +111,7 @@ along with GCC; see the file COPYING3.
 struct split_bb_info
 {
   unsigned int size;
-  unsigned int time;
+  sreal time;
 };
 
 static vec bb_info_vec;
@@ -121,7 +121,8 @@ static vec bb_info_vec;
 struct split_point
 {
   /* Size of the partitions.  */
-  unsigned int header_time, header_size, split_time, split_size;
+  sreal header_time, split_time;
+  unsigned int header_size, split_size;
 
   /* SSA names that need to be passed into spit function.  */
   bitmap ssa_names_to_pass;
@@ -195,10 +196,11 @@ dump_split_point (FILE * file, struct sp
 {
   fprintf (file,
   "Split point at BB %i\n"
-  "  header time: %i header size: %i\n"
-  "  split time: %i split size: %i\n  bbs: ",
-  current->entry_bb->index, current->header_time,
-  current->header_size, current->split_time, current->split_size);
+  "  header time: %f header size: %i\n"
+  "  split time: %f split size: %i\n  bbs: ",
+  current->entry_bb->index, current->header_time.to_double (),
+  current->header_size, current->split_time.to_double (),
+  current->split_size);
   dump_bitmap (file, current->split_bbs);
   fprintf (file, "  SSA names to pass: ");
   dump_bitmap (file, current->ssa_names_to_pass);
@@ -1034,7 +1036,8 @@ struct stack_entry
   int earliest;
 
   /* Overall time and size of all BBs reached from this BB in DFS walk.  */
-  int overall_time, overall_size;
+  sreal overall_time;
+  int overall_size;
 
   /* When false we can not split on this BB.  */
   bool can_split;
@@ -1059,7 +1062,7 @@ struct stack_entry
the component used by consider_split.  */
 
 static void
-find_split_points (basic_block return_bb, int overall_time, int overall_size)
+find_split_points (basic_block return_bb, sreal overall_time, int overall_size)
 {
   stack_entry first;
   vec stack = vNULL;
@@ -1731,7 +1734,8 @@ execute_split_functions (void)
 {
   gimple_stmt_iterator bsi;
   basic_block bb;
-  int overall_time = 0, overall_size = 0;
+  sreal overall_time = 0;
+  int overall_size = 0;
   int todo = 0;
   struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
@@ -1822,33 +1826,36 @@ execute_split_functions (void)
 
   /* Compute local info about basic blocks and determine function size/time.  */
   bb_info_vec.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
-  memset (&best_split_point, 0, sizeof (best_split_point));
+  best_split_point.split_bbs = NULL;
   basic_block return_bb = find_return_bb ();
   int tsan_exit_found = -1;
   FOR_EACH_BB_FN (bb, cfun)
 {
-  int time = 0;
+  sreal time = 0;
   int size = 0;
-  int freq = compute_call_stmt_bb_frequency (current_function_decl, bb);
+  sreal freq = bb->count.to_sreal_scale
+(ENTRY_BLOCK_PTR_FOR_FN (cfun)->count);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Basic block %i\n", bb->index);
 
   for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
{
- int this_time, this_size;
+ sreal this_time;
+ int this_size;
  gimple *stmt = gsi_stmt (bsi);
 
  this_size = estimate_num_insns (stmt, &eni_size_weights);
- this_time = estimate_num_insns (stmt, &eni_time_weights) * freq;
+ this_time = (sreal)estimate_num_insns (stmt, &eni_time_weights)
+* freq;
  size += this_size;
  time += this_time;
  check_forbidden_calls (stmt);
 
  if (dump_file && (dump_flags & TDF_DETAILS))
{
- fprintf (dump_file, "  freq:%6i size:%3i time:%3i ",
-  freq, this_size, this_time);
+ fprintf (dump_file, "  freq:%4.2f size:%3i time:%4.2f ",
+  freq.to_double (), this_size, this_time.to_double ());
  print_gimple_stmt (dump_file, stmt, 0);
}
 


Remove write only bb_freq in tree-emutls.c

2017-11-16 Thread Jan Hubicka
Hi,
this var is actually write only, so I have removed it.

Honza

* tree-emutls.c (lower_emutls_data): Remove unused bb_freq.
(lower_emutls_function_body): Do not compute it.
Index: tree-emutls.c
===
--- tree-emutls.c   (revision 254812)
+++ tree-emutls.c   (working copy)
@@ -383,7 +383,6 @@ struct lower_emutls_data
   struct cgraph_node *builtin_node;
   tree builtin_decl;
   basic_block bb;
-  int bb_freq;
   location_t loc;
   gimple_seq seq;
 };
@@ -622,10 +621,6 @@ lower_emutls_function_body (struct cgrap
 PHI argument for that edge.  */
   if (!gimple_seq_empty_p (phi_nodes (d.bb)))
{
- /* The calls will be inserted on the edges, and the frequencies
-will be computed during the commit process.  */
- d.bb_freq = 0;
-
  nedge = EDGE_COUNT (d.bb->preds);
  for (i = 0; i < nedge; ++i)
{
@@ -650,8 +645,6 @@ lower_emutls_function_body (struct cgrap
}
}
 
-  d.bb_freq = compute_call_stmt_bb_frequency (current_function_decl, d.bb);
-
   /* We can re-use any SSA_NAME created during this basic block.  */
   clear_access_vars ();
 


RE: [patch] remove cilk-plus

2017-11-16 Thread Koval, Julia
Thanks for your comments, fixed it.

2017-11-16  Julia Koval  
Sebastian Peryt  

* Makefile.def (target_modules): Remove libcilkrts.
* Makefile.in: Ditto.
* configure: Ditto.
* configure.ac: Ditto.

contrib/
* contrib/gcc_update: Ditto.

gcc/
* Makefile.in (cilkplus.def, cilk-builtins.def, c-family/cilk.o, 
c-family/c-cilkplus.o, c-family/array-notation-common.o,
cilk-common.o, cilk.h, cilk-common.c): Remove.
* builtin-types.def
(BT_FN_INT_PTR_PTR_PTR_FTYPE_BT_INT_BT_PTR_BT_PTR_BT_PTR): Remove.
* builtins.c (is_builtin_name): Remove cilkplus condition.
(BUILT_IN_CILK_DETACH, BUILT_IN_CILK_POP_FRAME): Remove.
* builtins.def (DEF_CILK_BUILTIN_STUB, DEF_CILKPLUS_BUILTIN,
cilk-builtins.def, cilkplus.def): Remove.
* cif-code.def (CILK_SPAWN): Remove.
* cilk-builtins.def: Delete.
* cilk-common.c: Ditto.
* cilk.h: Ditto.
* cilkplus.def: Ditto.
* config/darwin.h (fcilkplus): Delete.
* cppbuiltin.c: Ditto.
* doc/extend.texi: Remove cilkplus doc.
* doc/generic.texi: Ditto.
* doc/invoke.texi: Ditto.
* doc/passes.texi: Ditto.
* gcc.c (fcilkplus): Remove.
* gengtype.c (cilk.h): Remove.
* gimple-pretty-print.c (dump_gimple_omp_for): Remove cilkplus support.
* gimple.h (GF_OMP_FOR_KIND_CILKFOR, GF_OMP_FOR_KIND_CILKSIMD): Remove.
* gimplify.c (gimplify_return_expr, maybe_fold_stmt, gimplify_call_expr,
is_gimple_stmt, gimplify_modify_expr, gimplify_scan_omp_clauses,
gimplify_adjust_omp_clauses, gimplify_omp_for, gimplify_expr): Remove
cilkplus conditions.
* ipa-fnsummary.c (ipa_dump_fn_summary, compute_fn_summary,
inline_read_section): Ditto.
* ipa-inline-analysis.c (cilk.h): Remove.
* ira.c (ira_setup_eliminable_regset): Remove cilkplus support.
* lto-wrapper.c (merge_and_complain, append_compiler_options,
append_linker_options): Remove condition for fcilkplus.
* lto/lto-lang.c (cilk.h): Remove.
(lto_init): Remove condition for fcilkplus.
* omp-expand.c (expand_cilk_for_call): Delete.
(expand_omp_taskreg, expand_omp_for_static_chunk,
expand_omp_for): Remove cilkplus
conditions.
(expand_cilk_for): Delete.
* omp-general.c (omp_extract_for_data): Remove cilkplus support.
* omp-low.c (scan_sharing_clauses, create_omp_child_function,
execute_lower_omp, diagnose_sb_0): Ditto.
* omp-simd-clone.c (simd_clone_clauses_extract): Ditto.
* tree-core.h (OMP_CLAUSE__CILK_FOR_COUNT_): Delete.
* tree-nested.c: Ditto.
* tree-pretty-print.c (dump_omp_clause): Remove cilkplus support.
(dump_generic_node): Ditto.
* tree.c (OMP_CLAUSE__CILK_FOR_COUNT_): Delete.
* tree.def (cilk_simd, cilk_for, cilk_spawn_stmt, cilk_sync_stmt): 
Delete.
* tree.h (CILK_SPAWN_FN, EXPR_CILK_SPAWN): Delete.

gcc/c-family/
* array-notation-common.c: Delete.
* c-cilkplus.c: Ditto.
* c-common.c (_Cilk_spawn, _Cilk_sync, _Cilk_for): Remove.
* c-common.def (ARRAY_NOTATION_REF): Remove.
* c-common.h (RID_CILK_SPAWN, build_array_notation_expr,
build_array_notation_ref, C_ORT_CILK, c_check_cilk_loop,
c_validate_cilk_plus_loop, cilkplus_an_parts, 
cilk_ignorable_spawn_rhs_op,
cilk_recognize_spawn): Remove.
* c-gimplify.c (CILK_SPAWN_STMT): Remove.
* c-omp.c: Remove CILK_SIMD check.
* c-pragma.c: Ditto.
* c-pragma.h: Remove CILK related pragmas.
* c-pretty-print.c (c_pretty_printer::postfix_expression): Remove
ARRAY_NOTATION_REF condition.
(c_pretty_printer::expression): Ditto.
* c.opt (fcilkplus): Remove.
* cilk.c: Delete.

gcc/c/
* Make-lang.in (c/c-array-notation.o): Remove.
* c-array-notation.c: Delete.
* c-decl.c: Remove cilkplus condition.
* c-parser.c (c_parser_cilk_simd, c_parser_cilk_for,
c_parser_cilk_verify_simd, c_parser_array_notation,
c_parser_cilk_clause_vectorlength, c_parser_cilk_grainsize,
c_parser_cilk_simd_fn_vector_attrs,
c_finish_cilk_simd_fn_tokens): Delete.
(c_parser_declaration_or_fndef): Remove cilkplus condition.
(c_parser_direct_declarator_inner): Ditto.
(CILK_SIMD_FN_CLAUSE_MASK): Delete.
(c_parser_attributes, c_parser_compound_statement,
c_parser_statement_after_labels, c_parser_if_statement,
c_parser_switch_statement, c_parser_while_statement,
c_parser_do_statement, c_parser_for_statement,
c_parser_unary_expression, c_parser_postfix_expression,
c_parser_postfix_expression_after_primary,
c_parser_pragma, c_parser_omp_clause_name, c_parser_omp_all_clauses,
c_parser_omp_for_loop, c_finish_omp_declare_

Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-16 Thread Kyrill Tkachov

Hi Sudi,

On 16/11/17 16:37, Sudi Das wrote:

Hi

This patch fixes the test case armv8_2-fp16-move-1.c for 
arm-none-linux-gnueabihf where 2 of the scan-assembler directives were 
failing. We now generate less vmov between core and VFP registers. 
Thus changing those directives to reflect that.


Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi


*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler
directives.



diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
 /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
 
-/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } }  */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } }  */
 
Some of the moves between core and fp registers were the result of
inefficient codegen, and in hindsight scanning for them was not very useful.
Now that we emit only the required ones, I think scanning for the plain
vmovs between two S-registers doesn't test anything useful.
So can you please just remove the second scan-assembler directive here?

Thanks,
Kyrill



[PATCH] Use bswap framework in store-merging (PR tree-optimization/78821)

2017-11-16 Thread Jakub Jelinek
Hi!

This patch uses the bswap pass framework inside of the store merging
pass to handle adjacent stores which produce together a 16/32/64 bit
store of bswapped value (loaded or from SSA_NAME) or identity (usually
only from SSA_NAME, the code prefers to use the existing store merging
code if coming from identity load, because it e.g. can handle arbitrary
sizes, not just 16/32/64 bits).

There are small tweaks to the bswap code to make it usable inside of
the store merging pass.  Then when processing the stores, we record
what find_bswap_or_nop_1 returns and do a small sanity check on it,
and when doing coalesce_immediate_stores (i.e. the splitting into
groups), we try for 64-bit, 32-bit and 16-bit sizes if we can extend/shift
(according to endianity) and perform_symbolic_merge them together.
If it is possible, we turn those 2+ adjacent stores that make together
{64,32,16} bits into a separate group and process it specially later
(we need to treat it as a single store rather than multiple, so
split_group is only very lightweight for that case).
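
As an illustration of the kind of source pattern meant here, consider four
adjacent byte stores that together write a 32-bit value in big-endian order.
The functions below are a sketch with hypothetical names (store_be32, bswap32,
store_be32_merged_le); whether a given compiler version actually merges
store_be32 into a single store depends on the target and options and is not
claimed here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Four adjacent byte stores that together write X in big-endian order.
static void store_be32(unsigned char *p, uint32_t x)
{
    p[0] = static_cast<unsigned char>(x >> 24);
    p[1] = static_cast<unsigned char>(x >> 16);
    p[2] = static_cast<unsigned char>(x >> 8);
    p[3] = static_cast<unsigned char>(x);
}

// Portable 32-bit byte swap.
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u)
           | ((x << 8) & 0xff0000u) | (x << 24);
}

// What the merged form looks like on a little-endian target: a single
// 32-bit store of the byte-swapped value.  On a big-endian target the
// equivalent single store would be the identity (no swap).
static void store_be32_merged_le(unsigned char *p, uint32_t x)
{
    uint32_t v = bswap32(x);
    std::memcpy(p, &v, sizeof v);
}
```

On a little-endian host both functions leave the same four bytes in memory,
which is the equivalence the merged store relies on.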

Bootstrapped/regtested on {x86_64,i686,powerpc64le,powerpc64}-linux, ok for 
trunk?

The cases this patch can handle are less common than rhs_code INTEGER_CST
(stores of constants to adjacent memory) or MEM_REF (adjacent memory
copying), but are more common than the bitwise ops.  During combined
x86_64+i686 bootstraps/regtests it triggered:
lrotate_expr  974   2528
nop_expr  720   1711
(lrotate_expr stands for bswap, nop_expr for identity, the first column is
the actual count of such new stores, the second is the original number of
stores that have been optimized this way).

2017-11-16  Jakub Jelinek  

PR tree-optimization/78821
* gimple-ssa-store-merging.c (find_bswap_or_nop_load): Give up
if base is TARGET_MEM_REF.  If base is not MEM_REF, set base_addr
to the address of the base rather than the base itself.
(find_bswap_or_nop_1): Just use pointer comparison for vuse check.
(find_bswap_or_nop_finalize): New function.
(find_bswap_or_nop): Use it.
(bswap_replace): Return a tree rather than bool, change first
argument from gimple * to gimple_stmt_iterator, allow inserting
into an empty sequence, allow ins_stmt to be NULL - then emit
all stmts into gsi.  Fix up MEM_REF address gimplification.
(pass_optimize_bswap::execute): Adjust bswap_replace caller.
Formatting fix.
(struct store_immediate_info): Add N and INS_STMT non-static
data members.
(store_immediate_info::store_immediate_info): Initialize them
from newly added ctor args.
(merged_store_group::apply_stores): Formatting fixes.  Sort by
bitpos at the end.
(stmts_may_clobber_ref_p): For stores call also
refs_anti_dependent_p.
(gather_bswap_load_refs): New function.
(imm_store_chain_info::try_coalesce_bswap): New method.
(imm_store_chain_info::coalesce_immediate_stores): Use it.
(split_group): Handle LROTATE_EXPR and NOP_EXPR rhs_code specially.
(imm_store_chain_info::output_merged_store): Fail if number of
new estimated stmts is bigger or equal than old.  Handle LROTATE_EXPR
and NOP_EXPR rhs_code.
(pass_store_merging::process_store): Compute n and ins_stmt, if
ins_stmt is non-NULL and the store rhs is otherwise invalid, use
LROTATE_EXPR rhs_code.  Pass n and ins_stmt to store_immediate_info
ctor.
(pass_store_merging::execute): Calculate dominators.

* gcc.dg/store_merging_16.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2017-11-16 10:45:09.239185205 +0100
+++ gcc/gimple-ssa-store-merging.c  2017-11-16 15:34:08.560080214 +0100
@@ -369,7 +369,10 @@ find_bswap_or_nop_load (gimple *stmt, tr
   base_addr = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
   &unsignedp, &reversep, &volatilep);
 
-  if (TREE_CODE (base_addr) == MEM_REF)
+  if (TREE_CODE (base_addr) == TARGET_MEM_REF)
+/* Do not rewrite TARGET_MEM_REF.  */
+return false;
+  else if (TREE_CODE (base_addr) == MEM_REF)
 {
   offset_int bit_offset = 0;
   tree off = TREE_OPERAND (base_addr, 1);
@@ -401,6 +404,8 @@ find_bswap_or_nop_load (gimple *stmt, tr
 
   bitpos += bit_offset.to_shwi ();
 }
+  else
+base_addr = build_fold_addr_expr (base_addr);
 
   if (bitpos % BITS_PER_UNIT)
 return false;
@@ -743,8 +748,7 @@ find_bswap_or_nop_1 (gimple *stmt, struc
  if (TYPE_PRECISION (n1.type) != TYPE_PRECISION (n2.type))
return NULL;
 
- if (!n1.vuse != !n2.vuse
- || (n1.vuse && !operand_equal_p (n1.vuse, n2.vuse, 0)))
+ if (n1.vuse != n2.vuse)
return NULL;
 
  source_stmt
@@ -765,39 +769,21 @@ find_bswap_or_nop_1 (gimple *stmt, struc
   return NULL;
 }
 
-/* Check if STMT completes a bswap implementation or a read in a given
-   endianness c

[PATCH] [BRIGFE] Reduce the number of type conversions due to the untyped HSAIL regs

2017-11-16 Thread Pekka Jääskeläinen
Instead of always representing HSAIL's untyped registers as
unsigned int, gccbrig now pre-analyzes the BRIG code and gives
each register variable the type that is used most often when storing
data to or reading data from that register. This reduces the total
number of conversions, which cannot always be optimized away.
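
The selection step described above could be sketched roughly as follows.
This is only an illustration under simplifying assumptions: types are
modelled as strings rather than trees, and type_use_counts, record and
most_used are names invented for this sketch, not names from the patch.
Counts are kept in insertion order so that a tie is resolved
deterministically in favour of the type seen first:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-register bookkeeping: how often each type was used.
struct type_use_counts
{
  // (type, count) pairs in insertion order, for deterministic tie-breaking.
  std::vector<std::pair<std::string, std::size_t> > refs;
  // Type name -> index into REFS.
  std::map<std::string, std::size_t> lookup;

  // Note one more use of TYPE for this register.
  void record(const std::string &type)
  {
    std::map<std::string, std::size_t>::iterator it = lookup.find(type);
    if (it == lookup.end())
      {
        lookup[type] = refs.size();
        refs.push_back(std::make_pair(type, std::size_t(1)));
      }
    else
      ++refs[it->second].second;
  }

  // Return the most used type; the earliest recorded type wins a tie.
  // FALLBACK is returned when no use was recorded at all.
  std::string most_used(const std::string &fallback) const
  {
    std::string best = fallback;
    std::size_t best_count = 0;
    for (std::size_t i = 0; i < refs.size(); i++)
      if (refs[i].second > best_count)
        {
          best = refs[i].first;
          best_count = refs[i].second;
        }
    return best;
  }
};
```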

Committed as r254837.

BR,
Pekka
Index: gcc/brig/brigfrontend/brig-util.cc
===
--- gcc/brig/brigfrontend/brig-util.cc	(revision 254836)
+++ gcc/brig/brigfrontend/brig-util.cc	(revision 254837)
@@ -26,6 +26,7 @@
 #include "brig-util.h"
 #include "errors.h"
 #include "diagnostic-core.h"
+#include "print-tree.h"
 
 bool
 group_variable_offset_index::has_variable (const std::string &name) const
@@ -473,3 +474,91 @@
   /* Drop const qualifiers.  */
   return tree_type;
 }
+
+/* Calculates numeric identifier for the HSA register REG.
+
+   Returned value is bound to [0, BRIG_2_TREE_HSAIL_TOTAL_REG_COUNT].  */
+
+size_t
+gccbrig_hsa_reg_id (const BrigOperandRegister &reg)
+{
+  size_t offset = reg.regNum;
+  switch (reg.regKind)
+{
+case BRIG_REGISTER_KIND_QUAD:
+  offset
+	+= BRIG_2_TREE_HSAIL_D_REG_COUNT + BRIG_2_TREE_HSAIL_S_REG_COUNT
+	+ BRIG_2_TREE_HSAIL_C_REG_COUNT;
+  break;
+case BRIG_REGISTER_KIND_DOUBLE:
+  offset += BRIG_2_TREE_HSAIL_S_REG_COUNT + BRIG_2_TREE_HSAIL_C_REG_COUNT;
+  break;
+case BRIG_REGISTER_KIND_SINGLE:
+  offset += BRIG_2_TREE_HSAIL_C_REG_COUNT;
+case BRIG_REGISTER_KIND_CONTROL:
+  break;
+default:
+  gcc_unreachable ();
+  break;
+}
+  return offset;
+}
+
+std::string
+gccbrig_hsa_reg_name_from_id (size_t reg_hash)
+{
+  char reg_name[32];
+  if (reg_hash < BRIG_2_TREE_HSAIL_C_REG_COUNT)
+{
+  sprintf (reg_name, "$c%lu", reg_hash);
+  return reg_name;
+}
+
+  reg_hash -= BRIG_2_TREE_HSAIL_C_REG_COUNT;
+  if (reg_hash < BRIG_2_TREE_HSAIL_S_REG_COUNT)
+{
+  sprintf (reg_name, "$s%lu", reg_hash);
+  return reg_name;
+}
+
+  reg_hash -= BRIG_2_TREE_HSAIL_S_REG_COUNT;
+  if (reg_hash < BRIG_2_TREE_HSAIL_D_REG_COUNT)
+{
+  sprintf (reg_name, "$d%lu", reg_hash);
+  return reg_name;
+}
+
+   reg_hash -= BRIG_2_TREE_HSAIL_D_REG_COUNT;
+   if (reg_hash < BRIG_2_TREE_HSAIL_Q_REG_COUNT)
+{
+  sprintf (reg_name, "$q%lu", reg_hash);
+  return reg_name;
+}
+
+  gcc_unreachable ();
+  return "$??";
+}
+
+/* Prints statistics of register usage to stdout.  */
+
+void
+gccbrig_print_reg_use_info (FILE *dump, const regs_use_index &info)
+{
+  regs_use_index::const_iterator begin_it = info.begin ();
+  regs_use_index::const_iterator end_it = info.end ();
+  for (regs_use_index::const_iterator it = begin_it; it != end_it; it++)
+{
+  std::string hsa_reg = gccbrig_hsa_reg_name_from_id (it->first);
+  printf ("%s:\n", hsa_reg.c_str ());
+  const reg_use_info &info = it->second;
+  typedef std::vector<std::pair<tree, size_t> >::const_iterator reg_use_it;
+  reg_use_it begin_it2 = info.m_type_refs.begin ();
+  reg_use_it end_it2 = info.m_type_refs.end ();
+  for (reg_use_it it2 = begin_it2; it2 != end_it2; it2++)
+	{
+	  fprintf (dump, "(%lu) ", it2->second);
+	  print_node_brief (dump, "", it2->first, 0);
+	  fprintf (dump, "\n");
+	}
+}
+}
Index: gcc/brig/brigfrontend/brig-util.h
===
--- gcc/brig/brigfrontend/brig-util.h	(revision 254836)
+++ gcc/brig/brigfrontend/brig-util.h	(revision 254837)
@@ -23,6 +23,7 @@
 #define GCC_BRIG_UTIL_H
 
 #include <map>
+#include <vector>
 
 #include "config.h"
 #include "system.h"
@@ -31,6 +32,15 @@
 #include "opts.h"
 #include "tree.h"
 
+/* There are 128 c regs and 2048 s/d/q regs each in the HSAIL.  */
+#define BRIG_2_TREE_HSAIL_C_REG_COUNT (128)
+#define BRIG_2_TREE_HSAIL_S_REG_COUNT (2048)
+#define BRIG_2_TREE_HSAIL_D_REG_COUNT (2048)
+#define BRIG_2_TREE_HSAIL_Q_REG_COUNT (2048)
+#define BRIG_2_TREE_HSAIL_TOTAL_REG_COUNT\
+  (BRIG_2_TREE_HSAIL_C_REG_COUNT + BRIG_2_TREE_HSAIL_S_REG_COUNT	\
+   + BRIG_2_TREE_HSAIL_D_REG_COUNT + BRIG_2_TREE_HSAIL_Q_REG_COUNT)
+
 /* Helper class for keeping book of group variable offsets.  */
 
 class group_variable_offset_index
@@ -76,4 +86,25 @@
 /* From hsa.h.  */
 bool hsa_type_packed_p (BrigType16_t type);
 
+struct reg_use_info
+{
+  /* This vector keeps count of the times an HSAIL register is used as
+ a tree type in generic expressions.  The count is used to select
+ type for 'register' variables to reduce emission of
+ VIEW_CONVERT_EXPR nodes.  The data is kept in vector (insertion
+ order) for determinism, in a case there is a tie with the
+ counts.  */
+  std::vector<std::pair<tree, size_t> > m_type_refs;
+  /* Tree to index.  Lookup for the above vector.  */
+  std::map<tree, size_t> m_type_refs_lookup;
+};
+
+/* key = hsa register entry generated by gccbrig_hsa_reg_id ().  */
+typedef std::map<size_t, reg_use_info> regs_use_index;
+
+size_t gccbrig_hsa_reg_id (const BrigOperandRegister &reg);
+std::string gccbrig_hsa_re

Re: Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-11-16 Thread François Dumont

On 16/11/2017 12:46, Jonathan Wakely wrote:
> On 16/11/17 10:57 +, Jonathan Wakely wrote:
>> On 16/11/17 08:51 +0300, Petr Ovtchenkov wrote:
>>> On Mon, 6 Nov 2017 22:19:22 +0100
>>> François Dumont  wrote:
>>>
>>>> Hi
>>>>
>>>>     Any final decision regarding this patch ?
>>>>
>>>> François
>>>
>>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00036.html
>>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00035.html
>>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00037.html
>>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00034.html
>>
>> It would be helpful if you two could collaborate and come up with a
>> good solution, or at least discuss the pros and cons, instead of just
>> sending competing patches.
>
>
> Let me be more clear: I'm not going to review further patches in this
> area while you two are proposing different alternatives, without
> commenting on each other's approach.
>
> If you think your solution is better than François's solution, you
> should explain why, not just send a different patch. If François
> thinks his solution is better than yours, he should state why, not
> just send a different patch.
>
> I don't have time to infer all that from just your patches, so I'm not
> going to bother.


Proposing to revert my patch doesn't sound to me like a friendly action 
to start a collaboration.


My only concern has always been the Debug mode impact which is now fixed.

I already said that I disagree with Petr's main goal to keep the eof
iterator linked to the underlying stream. So the current implementation is
just fine to me and I'll let Petr argue for any change. @Jonathan,
you can ignore my last request to remove the mutable keyword on _M_sbuf.


François




Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-11-16 Thread Marek Polacek
On Tue, Nov 14, 2017 at 07:34:54AM +0100, Richard Biener wrote:
> On November 14, 2017 6:21:41 AM GMT+01:00, Jason Merrill  
> wrote:
> >On Mon, Nov 13, 2017 at 1:02 PM, Marek Polacek 
> >> In the end I did two bootstraps with the patch, but modifed one of
> >them
> >> to always return false for ix86_is_empty_record.  Then I compared all
> >the
> >> *.o in both dirs.  The result is attached.  Then I looked at
> >DW_AT_producer
> >> for all these .o that differ; all of them are C++.  Is this enough to
> >> clear our concerns?
> >
> >Hmm, a bunch of these are right at the beginning, bytes 41 and 65, in
> >the header.
> >
> >Did you build them in the different trunk/trunk2 directories?  I think
> >Jakub was suggesting building them in the same directory.
> >> And I also ran a bootstrap with --enable-cxx-flags=-Wabi=11, and
> >didn't
> >> see any warnings.
> >
> >If there's a codegen change, there ought to be a warning to go along
> >with it.
> 
> The question was of course also for unintended changes but yes (I was mainly 
> concerned by libstdc++ ABI changes). 

Ok, I did two bootstraps in the same dir, one with ix86_is_empty_record always
returning false.  There were a few object files that differ in their assembly
between those two bootstraps.  Previously I didn't see any warnings because
I hadn't thought of -Wsystem-headers.  Also, we intentionally don't warn if
the empty parameter is the last one:

+  bool seen_empty_type = false;
+  FOREACH_FUNCTION_ARGS (fntype, argtype, iter)
+   {
+ if (VOID_TYPE_P (argtype))
+   break;
+ if (TYPE_EMPTY_P (argtype))
+   seen_empty_type = true;
+ else if (seen_empty_type)
+   {
+ cum->warn_empty = true;
+ break;
+   }
+   }

After enabling -Wsystem-headers and tweaking the code above so that we warn
even if the empty parameter is trailing I can see the warnings that correspond
to the assembly changes.  Below is a summary of what I found.  TL;DR: I don't
see any unintended changes.
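
For reference, the decision made by the quoted loop can be modelled in
isolation like this.  It is a sketch only: the argument list is reduced to
a vector of flags, and should_warn_empty is a name invented here.  A
warning is wanted exactly when some non-empty parameter follows an empty
one, so a trailing empty parameter alone does not trigger it:

```cpp
#include <cstddef>
#include <vector>

// IS_EMPTY[i] is true when the i-th parameter has an empty class type.
// Returns true when a non-empty parameter follows an empty one.
bool should_warn_empty(const std::vector<bool> &is_empty)
{
  bool seen_empty_type = false;
  for (std::size_t i = 0; i < is_empty.size(); i++)
    {
      if (is_empty[i])
        seen_empty_type = true;
      else if (seen_empty_type)
        return true;  // an empty parameter precedes this non-empty one
    }
  return false;
}
```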

gcc/gcov.o
  includes #define INCLUDE_ALGORITHM so we'll get bits/stl_algo.h with stuff
  like
  bits/stl_algo.h:1840:5: warning: empty class 
‘__gnu_cxx::__ops::_Iter_comp_iter’ 
 __insertion_sort(_RandomAccessIterator __first,

gcc/go/gogo.o
  includes bits/stl_vector.h:
  1535   _M_range_insert(__pos, __first, __last,
  1536   std::__iterator_category(__first));
  warning: empty class ‘std::forward_iterator_tag’

gcc/go/expressions.o
gcc/go/export.o
  includes bits/stl_algo.h:
  std::__insertion_sort(__first, __last, __comp);
  warning: empty class ‘__gnu_cxx::__ops::_Iter_less_iter’

gcc/go/types.o
  includes bits/stl_algo.h:
  std::__final_insertion_sort(__first, __last, __comp);
  warning: empty class 
‘__gnu_cxx::__ops::_Iter_comp_iter’

gcc/go/statements.o
  bits/hashtable.h:2068:4: warning: empty class ‘std::integral_constant’

gcc/build/genrecog.o
  includes #define INCLUDE_ALGORITHM
  warning: empty class ‘__gnu_cxx::__ops::_Iter_less_iter’
  342   std::__adjust_heap(__first, __parent, __len, 
_GLIBCXX_MOVE(__value),
  343  __comp);

gcc/i386.o
gcc/bb-reorder.o
  expected changes

gcc/tree-loop-distribution.o
  uses std::stable_sort

x86_64-pc-linux-gnu/libstdc++-v3/src/c++98/bitmap_allocator.o
  ./src/c++98/bitmap_allocator.cc
  57 iterator __tmp = __lower_bound(__free_list.begin(), __free_list.end(),
  58__sz, _LT_pointer_compare());
  warning: empty class ‘__gnu_cxx::free_list::_LT_pointer_compare’ 

x86_64-pc-linux-gnu/libstdc++-v3/src/c++98/messages_members.o
x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/sstream-inst.o
x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/cxx11-wlocale-inst.o
  includes bits/basic_string.h:
  236   _M_construct(__beg, __end, _Tag());
  warning: empty class ‘std::forward_iterator_tag’

x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/cow-shim_facets.o
  includes c++11/cxx11-shim_facets.cc:
  271   return __collate_compare(other_abi{}, _M_get(),
  272lo1, hi1, lo2, hi2);
  warning: empty class ‘std::integral_constant’

x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/cow-string-inst.o
x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/string-inst.o
x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/cow-wstring-inst.o
  includes basic_string.tcc:
  563   basic_string<_CharT, _Traits, _Alloc>::
  564   _S_construct(_InIterator __beg, _InIterator __end, const _Alloc& 
__a,
  565forward_iterator_tag)
  warning: empty class ‘std::forward_iterator_tag’

x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/cxx11-shim_facets.o
  see above

x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/locale-inst.o
x86_64-pc-linux-gnu/libstdc++-v3/src/c++11/wlocale-inst.o
  includes bits/basic_string.h:
  5033   return _S_construct(__beg, __end, __a, _Tag());
  warning: empty class ‘std::forward_iterator_tag’

x86_64-pc-linux-gnu/l

Re: [PATCH #2], make Float128 built-in functions work with -mabi=ieeelongdouble

2017-11-16 Thread Michael Meissner
On Thu, Nov 16, 2017 at 04:48:18AM -0600, Segher Boessenkool wrote:
> On Wed, Nov 15, 2017 at 04:56:10PM -0500, Michael Meissner wrote:
> > David tells me that the patch to enable float128 built-in functions to work
> > with the -mabi=ieeelongdouble option broke AIX because on AIX, the float128
> > insns are disabled, and they all become CODE_FOR_nothing.  The switch 
> > statement
> > that was added in rs6000.c to map KFmode built-in functions to TFmode breaks
> > under AIX.
> 
> It also breaks on Linux with older binutils (no HAVE_AS_POWER9 defined).
> 
> > I changed the code to have a separate table, and the first call, I build the
> > table.  If the insn was not generated, it will just be CODE_FOR_nothing, and
> > the KF->TF mode conversion will not be done.
> > 
> > I have tested this on a little endian power8 system and there were no
> > regressions.  Once David verifies that it builds on AIX, can I check this 
> > into
> > the trunk?
> 
> I don't like this scheme much (huge table, initialisation at runtime, etc.),
> but okay for trunk, to unbreak things there.
> 
> Some comments on the patch:
> 
> > +  if (first_time)
> > +   {
> > + first_time = false;
> > + gcc_assert ((int)CODE_FOR_nothing == 0);
> 
> No useless cast please.  The whole assert is pretty useless fwiw; just
> take it out?
> 
> > + for (i = 0; i < ARRAY_SIZE (map); i++)
> > +   map_insn_code[(int)map[i].from] = map[i].to;
> > +   }
> 
> Space after cast.
> 
> Only do this for codes that are *not* CODE_FOR_nothing?

I must admit to not liking the code, and it is overly complicated.

It occurred to me this morning that a much simpler patch is to just #ifdef out
the switch statement if we don't have the proper assembler.  I tried this on an
old power7 system using the system assembler (which does not support the ISA
3.0 instructions) and it built fine.  I think this will work on AIX.  David can
you check this?

I will fire off a build, and if it is successful, can I check in this patch
instead of the other patch?
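
The shape of the #ifdef approach, reduced to a toy, might look like the
following.  Everything here is a hypothetical stand-in: demo_code,
DEMO_HAVE_NEW_ASSEMBLER and remap_code are invented for the sketch and are
not the names used in rs6000.c.  The point is only that the remapping
switch disappears entirely when the feature macro is not defined, so the
code is passed through unchanged:

```cpp
// Hypothetical stand-ins for insn codes.
enum demo_code { DEMO_NOTHING = 0, DEMO_OP_KF, DEMO_OP_TF };

// Map the KF variant to the TF variant when long double is IEEE 128-bit,
// but only if the build was configured with the newer assembler
// (modelled here by the made-up macro DEMO_HAVE_NEW_ASSEMBLER).
demo_code remap_code(demo_code code, bool ieee_long_double)
{
#ifdef DEMO_HAVE_NEW_ASSEMBLER
  if (ieee_long_double)
    switch (code)
      {
      case DEMO_OP_KF: code = DEMO_OP_TF; break;
      default: break;
      }
#else
  (void) ieee_long_double;  // nothing to remap without assembler support
#endif
  return code;
}
```

As written, without the macro defined, remap_code returns its argument
unchanged for every input.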

2017-11-15  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_expand_builtin): Do not do the
switch statement mapping KF built-ins to TF built-ins if we don't
have the proper ISA 3.0 assembler support.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 254837)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -16690,7 +16690,10 @@ rs6000_expand_builtin (tree exp, rtx tar
  double (KFmode) or long double is IEEE 128-bit (TFmode).  It is simpler if
  we only define one variant of the built-in function, and switch the code
  when defining it, rather than defining two built-ins and using the
- overload table in rs6000-c.c to switch between the two.  */
+ overload table in rs6000-c.c to switch between the two.  If we don't have
+ the proper assembler, don't do this switch because CODE_FOR_*kf* and
+ CODE_FOR_*tf* will be CODE_FOR_nothing.  */
+#ifdef HAVE_AS_POWER9
   if (FLOAT128_IEEE_P (TFmode))
 switch (icode)
   {
@@ -16711,6 +16714,7 @@ rs6000_expand_builtin (tree exp, rtx tar
   case CODE_FOR_xsiexpqpf_kf:  icode = CODE_FOR_xsiexpqpf_tf;  break;
   case CODE_FOR_xststdcqp_kf:  icode = CODE_FOR_xststdcqp_tf;  break;
   }
+#endif
 
   if (TARGET_DEBUG_BUILTIN)
 {


Re: [committed][PATCH] Change order of processing blocks/threads in tree-ssa-threadupdate.c

2017-11-16 Thread Jeff Law
On 11/15/2017 12:57 AM, Aldy Hernandez wrote:
> 
> 
> On 11/14/2017 10:46 PM, Jeff Law wrote:
>> With my local patches to remove jump threading from VRP I was seeing a
>> fairly obvious jump threading path left in the CFG after DOM.  This
>> missed jump thread ultimately caused a false positive uninitialized
>> warning.
> 
> This wouldn't be uninit-pred-[68]* or some such, which I also trigger
> when messing around with the backwards threader ??.
Nope.  It was expand_expr_real_1 from expr.c.  That's also why there
wasn't a testcase included.  Culling that down to something reasonable
was going to be, umm, painful.

There's a slight chance it'd help the case you're referring to, but I
doubt it.

> 
>> ping-ponging, but not due to this patch AFAICT.  Also verified by visual
>> inspection that the first DOM pass fully threaded the code in question
>> when using a local branch that has my removal of threading from tree-vrp
>> patches installed and bootstrapping that branch.
> 
> If DOM dumps the threads to the dump file you may want to bake that test
> with some GIMPLE FE test.
> 
>> Installing on the trunk.
> 
> You forgot to attach the patch :).
Seems like I botched both submissions from that night.  I blame nyquil.


Jeff
commit b0915eb6736b70306ccc4f8498aeb25c77c29c7f
Author: law 
Date:   Wed Nov 15 03:45:03 2017 +

* tree-ssa-threadupdate.c (thread_through_all_blocks): Thread
blocks in post order.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@254752 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9cba109ec59..c404eb8e5a7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-11-14  Jeff Law  
+
+   * tree-ssa-threadupdate.c (thread_through_all_blocks): Thread
+   blocks in post order.
+
 2017-11-15  Alexandre Oliva 
 
* dumpfile.h (TDF_COMPARE_DEBUG): New.
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 3d3aeab2a66..045905eceb7 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2174,7 +2174,6 @@ thread_through_all_blocks (bool may_peel_loop_headers)
 {
   bool retval = false;
   unsigned int i;
-  bitmap_iterator bi;
   struct loop *loop;
   auto_bitmap threaded_blocks;
 
@@ -2278,14 +2277,33 @@ thread_through_all_blocks (bool may_peel_loop_headers)
 
   initialize_original_copy_tables ();
 
-  /* First perform the threading requests that do not affect
- loop structure.  */
-  EXECUTE_IF_SET_IN_BITMAP (threaded_blocks, 0, i, bi)
-{
-  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
+  /* The order in which we process jump threads can be important.
+
+ Consider if we have two jump threading paths A and B.  If the
+ target edge of A is the starting edge of B and we thread path A
+ first, then we create an additional incoming edge into B->dest that
+ we can not discover as a jump threading path on this iteration.
+
+ If we instead thread B first, then the edge into B->dest will have
+ already been redirected before we process path A and path A will
+ naturally, with no further work, target the redirected path for B.
 
-  if (EDGE_COUNT (bb->preds) > 0)
-   retval |= thread_block (bb, true);
+ A post-order is sufficient here.  Compute the ordering first, then
+ process the blocks.  */
+  if (!bitmap_empty_p (threaded_blocks))
+{
+  int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+  unsigned int postorder_num = post_order_compute (postorder, false, false);
+  for (unsigned int i = 0; i < postorder_num; i++)
+   {
+ unsigned int indx = postorder[i];
+ if (bitmap_bit_p (threaded_blocks, indx))
+   {
+ basic_block bb = BASIC_BLOCK_FOR_FN (cfun, indx);
+ retval |= thread_block (bb, true);
+   }
+   }
+  free (postorder);
 }
 
   /* Then perform the threading through loop headers.  We start with the
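
The ordering idea from the comment in the patch above can be tried out on
a toy CFG.  The sketch below is an assumption-laden model, not GCC code:
the graph is a plain adjacency list, and post_order and threading_order
are names invented here (this is a simple depth-first walk and makes no
claim about matching post_order_compute in detail).  Blocks are emitted
only after the successors reachable from them, and then filtered down to
the blocks that have threading requests:

```cpp
#include <cstddef>
#include <set>
#include <vector>

// succs[b] lists the successor blocks of block b.
typedef std::vector<std::vector<int> > cfg;

// Depth-first walk that appends a block after its unvisited successors.
static void visit(const cfg &succs, int b, std::vector<bool> &seen,
                  std::vector<int> &order)
{
  seen[b] = true;
  for (std::size_t i = 0; i < succs[b].size(); i++)
    if (!seen[succs[b][i]])
      visit(succs, succs[b][i], seen, order);
  order.push_back(b);
}

// Post-order of the blocks reachable from ENTRY.
std::vector<int> post_order(const cfg &succs, int entry)
{
  std::vector<bool> seen(succs.size(), false);
  std::vector<int> order;
  visit(succs, entry, seen, order);
  return order;
}

// The blocks in THREADED, listed in the order they should be processed.
std::vector<int> threading_order(const cfg &succs, int entry,
                                 const std::set<int> &threaded)
{
  std::vector<int> all = post_order(succs, entry);
  std::vector<int> result;
  for (std::size_t i = 0; i < all.size(); i++)
    if (threaded.count(all[i]))
      result.push_back(all[i]);
  return result;
}
```

With edges 0->1, 0->2 and 1->2, block 1 is processed before block 0, so
a path starting later in the graph is handled first.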


Re: [PATCH #2], make Float128 built-in functions work with -mabi=ieeelongdouble

2017-11-16 Thread David Edelsohn
On Thu, Nov 16, 2017 at 12:48 PM, Michael Meissner
 wrote:
> On Thu, Nov 16, 2017 at 04:48:18AM -0600, Segher Boessenkool wrote:
>> On Wed, Nov 15, 2017 at 04:56:10PM -0500, Michael Meissner wrote:
>> > David tells me that the patch to enable float128 built-in functions to work
>> > with the -mabi=ieeelongdouble option broke AIX because on AIX, the float128
>> > insns are disabled, and they all become CODE_FOR_nothing.  The switch 
>> > statement
>> > that was added in rs6000.c to map KFmode built-in functions to TFmode 
>> > breaks
>> > under AIX.
>>
>> It also breaks on Linux with older binutils (no HAVE_AS_POWER9 defined).
>>
>> > I changed the code to have a separate table, and the first call, I build 
>> > the
>> > table.  If the insn was not generated, it will just be CODE_FOR_nothing, 
>> > and
>> > the KF->TF mode conversion will not be done.
>> >
>> > I have tested this on a little endian power8 system and there were no
>> > regressions.  Once David verifies that it builds on AIX, can I check this 
>> > into
>> > the trunk?
>>
>> I don't like this scheme much (huge table, initialisation at runtime, etc.),
>> but okay for trunk, to unbreak things there.
>>
>> Some comments on the patch:
>>
>> > +  if (first_time)
>> > +   {
>> > + first_time = false;
>> > + gcc_assert ((int)CODE_FOR_nothing == 0);
>>
>> No useless cast please.  The whole assert is pretty useless fwiw; just
>> take it out?
>>
>> > + for (i = 0; i < ARRAY_SIZE (map); i++)
>> > +   map_insn_code[(int)map[i].from] = map[i].to;
>> > +   }
>>
>> Space after cast.
>>
>> Only do this for codes that are *not* CODE_FOR_nothing?
>
> I must admit to not liking the code, and it is overly complicated.
>
> It occurred to me this morning that a much simpler patch is to just #ifdef out
> the switch statement if we don't have the proper assembler.  I tried this on 
> an
> old power7 system using the system assembler (which does not support the ISA
> 3.0 instructions) and it built fine.  I think this will work on AIX.  David 
> can
> you check this?
>
> I will fire off a build, and if it is successful, can I check this patch
> instead of the other patch?

This patch will solve the problem.

GCC policy prefers runtime tests over #ifdef, but I agree that the
runtime approach is overly messy.  This seems like a reasonable
approach to me.

Thanks, David


Re: [patch] remove cilk-plus

2017-11-16 Thread Jeff Law
On 11/16/2017 09:22 AM, Eric Gallager wrote:
> On 11/16/17, Koval, Julia  wrote:
>> // I failed to send patch itself, it is too big even in gzipped form.  What
>> is the right way to send such big patches?
>>
>> Hi, this patch removes cilkplus. Ok for trunk?
> 
> I'm not a reviewer, but just as an onlooker, I'd want to see notes
> about the removal in the caveats section of
> https://gcc.gnu.org/gcc-8/changes.html
Cilk+ was deprecated in gcc-7 and announced as-such.

But I do think a one-liner to the gcc-8 page would be appropriate to
note its removal.

Jeff



Re: [PING] [PATCH] Remove CANADIAN, that break compilation for foreign target

2017-11-16 Thread Petr Ovtchenkov
On Wed, 20 Sep 2017 13:44:59 +0300
Petr Ovtchenkov  wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71212
> 
> On Fri, 20 May 2016 16:10:50 +0300
> Petr Ovtchenkov  wrote:
> 
> > Some old ad-hoc (adding -I/usr/include to compiler
> > flags) break compilation of libstdc++ for foreign
> > target architecture (due to compiler see includes
> > of native).

Reference for terms:

https://gcc.gnu.org/onlinedocs/gccint/Configure-Terms.html

The presence of "CANADIAN=yes" leads to the inclusion of
headers from the build system (-I/usr/include). "CANADIAN=yes" is used
_only_ to set "-I/usr/include".

Including build headers is correct only for a native build
(i.e. it is a mistake for cross, for canadian, for crossed native and
for crossback), although it sometimes happens to "succeed".

Note that build/host/target may differ not only because of
different architectures, but also because of different sysroots
(libc, kernel, binutils, etc.).
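
For readers who do not have the terms from the page referenced above at
hand, the five configurations can be told apart as in this small sketch.
It is a simplification: classify_build is a name invented here, and it
compares plain triplet strings only, so it does not capture the sysroot
differences mentioned in the previous paragraph:

```cpp
#include <string>

// Classify a configuration by comparing the three system names.
std::string classify_build(const std::string &build, const std::string &host,
                           const std::string &target)
{
  if (build == host && host == target)
    return "native";          // everything is the same system
  if (build == host)
    return "cross";           // only the target differs
  if (host == target)
    return "crossed native";  // a native compiler built on another system
  if (build == target)
    return "crossback";       // a cross compiler targeting the build system
  return "canadian";          // all three differ
}
```

Only the first case should ever see headers from the build system's
/usr/include.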

CANADIAN is set to "yes" by this code:

-  # If Canadian cross, then don't pick up tools from the build directory.
-  # Used only in GLIBCXX_EXPORT_INCLUDES.
-  if test -n "$with_cross_host" &&
- test x"$build_alias" != x"$with_cross_host" &&
- test x"$build" != x"$target";
-  then
-CANADIAN=yes
-  else
-CANADIAN=no
-  fi

and this adds "-I/usr/include" to the compiler flags for building libstdc++.
This is wrong.

Reference to patch:
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01332.html


Re: [PATCH] Improve -Wmaybe-uninitialized documentation

2017-11-16 Thread Jeff Law
On 11/16/2017 03:49 AM, Jonathan Wakely wrote:
> On 15/11/17 20:28 -0700, Martin Sebor wrote:
>> On 11/15/2017 07:31 AM, Jonathan Wakely wrote:
>>> The docs for -Wmaybe-uninitialized have some issues:
>>>
>>> - That first sentence is looong.
>>> - Apparently some C++ programmers think "automatic variable" means one
>>> declared with C++11 `auto`, rather than simply a local variable.
>>> - The sentence about only warning when optimizing is stuck in between
>>> two chunks talking about longjmp, which could be inferred to mean
>>> only the setjmp/longjmp part of the warning depends on optimization.
>>>
>>> This attempts to make it easier to parse and understand.
>>
>> I've always found the description remarkably precise.  Particularly
>> the bit where it talks about the two paths, one initialized and the
>> other not.  Your rewording loses that distinction so I don't think
>> it's as accurate, or even correct.
>>
>> To use an example, this would satisfy the new description:
>>
>>  int f (void)
>>  {
>>    int i;
>>    return i;
>>  }
>>
>> but it doesn't match GCC behavior (it triggers -Wuninitialized,
>> not -Wmaybe-uninitialized).  Unless the distinction is more
>> subtle than I ascribe to it I think it needs to be preserved
>> in the rewording.
> 
> Ah, I tested a similar case and missed that the warning I got was from
> -Wuninitialized not -Wmaybe-uninitialized, which made me think that
> "a use of the variable that is initialized" was wrong.
> 
> OK, so then here's an alternative patch which doesn't touch that first
> sentence except to add "(i.e. local)". That makes the first sentence
> even longer, but if it's accurate maybe that's OK. This still adds
> "These warnings are only possible in optimizing compilation, because
> otherwise GCC does not keep track of the state of variables." And
> removes the similar text from the middle of the setjmp/longjmp
> discussion.
> 
> 
> 
> patch.txt
> 
> 
> commit 3ebe2a74817b63e27f961e91e6c044d00245
> Author: Jonathan Wakely 
> Date:   Thu Nov 16 10:43:51 2017 +
> 
> Improve -Wmaybe-uninitialized documentation
> 
> * doc/invoke.texi (-Wmaybe-uninitialized): Rephrase for clarity.
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 85c980bdfc9..bb68c308166 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -4970,11 +4970,16 @@ void store (int *i)
>  @item -Wmaybe-uninitialized
>  @opindex Wmaybe-uninitialized
>  @opindex Wno-maybe-uninitialized
> -For an automatic variable, if there exists a path from the function
> -entry to a use of the variable that is initialized, but there exist
> +For an automatic (i.e.@ local) variable, if there exists a path from the
> +function entry to a use of the variable that is initialized, but there exist
s/exist/exists/

?

I think with that nit it's ok.

jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Jeff Law
On 11/16/2017 05:34 AM, Alan Hayward wrote:
> This is a set of patches aimed at supporting aarch64 SVE register
> preservation around TLS calls.
> 
> Across a TLS call, Aarch64 SVE does not explicitly preserve the
> SVE vector registers. However, the Neon vector registers are preserved.
> Due to overlapping of registers, this means the lower 128bits of all
> SVE vector registers will be preserved.
> 
> The existing GCC code will currently incorrectly assume preservation
> of all of the SVE registers.
> 
> This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
> a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
> The mode of the expression indicates the size of the lower bits which
> will be preserved. If the register contains a value bigger than this
> mode then the code will treat the register as clobbered.
> 
> The means in order to evaluate if a clobber high is relevant, we need to 
> ensure
> the mode of the existing value in a register is tracked.
> 
> The following patches in this series add support for the CLOBBER_HIGH,
> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
> aarch64. The testing performed on these patches is also detailed in the
> final patch.
> 
> These patches are based on top of the linaro-dev/sve branch.
> 
> A simpler alternative to this patch would be to assume all Neon and SVE
> registers are clobbered across TLS calls, however this would be a
> performance regression against all Aarch64 targets.
So just a couple design questions.

Presumably there's no reasonable way to set up GCC's view of the
register file to avoid this problem?  ISTM that if the SVE register was
split into two, one for the part that overlapped with the neon register
and one that did not, then this could be handled via standard mechanisms?

Alternately would it be easier to clobber a subreg representing the high
part of the register?  Hmm, probably not.

Jeff
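
The rule quoted above for deciding whether a clobber high affects a
register can be written down as a one-line predicate.  This is a toy model
with sizes in bits, and clobbered_by_clobber_high is a name made up for
the sketch; it says nothing about how the patch series implements the
check:

```cpp
// VALUE_BITS is the size of the value currently held in the register.
// PRESERVED_BITS is the size of the mode on the clobber high, that is,
// the low part of the register that survives the call.
bool clobbered_by_clobber_high(unsigned value_bits, unsigned preserved_bits)
{
  // Only a value wider than the preserved part loses information.
  return value_bits > preserved_bits;
}
```

So with 128 preserved bits, a 128-bit Neon value survives, while a wider
SVE value has to be treated as clobbered.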


Re: Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-11-16 Thread Petr Ovtchenkov
On Thu, 16 Nov 2017 18:40:08 +0100
François Dumont  wrote:

> On 16/11/2017 12:46, Jonathan Wakely wrote:
> > On 16/11/17 10:57 +, Jonathan Wakely wrote:
> >> On 16/11/17 08:51 +0300, Petr Ovtchenkov wrote:
> >>> On Mon, 6 Nov 2017 22:19:22 +0100
> >>> François Dumont  wrote:
> >>>
> >>>> Hi
> >>>>
> >>>>     Any final decision regarding this patch ?
> >>>>
> >>>> François
> >>>
> >>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00036.html
> >>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00035.html
> >>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00037.html
> >>> https://gcc.gnu.org/ml/libstdc++/2017-11/msg00034.html
> >>
> >> It would be helpful if you two could collaborate and come up with a
> >> good solution, or at least discuss the pros and cons, instead of just
> >> sending competing patches.
> >
> >
> > Let me be more clear: I'm not going to review further patches in this
> > area while you two are proposing different alternatives, without
> > commenting on each other's approach.
> >
> > If you think your solution is better than François's solution, you
> > should explain why, not just send a different patch. If François
> > thinks his solution is better than yours, he should state why, not
> > just send a different patch.
> >
> > I don't have time to infer all that from just your patches, so I'm not
> > going to bother.
> >
> >
> Proposing to revert my patch doesn't sound to me like a friendly action 
> to start a collaboration.

I already said that this is a technical issue: this patch is so far present
only in trunk. A series is more useful for applying to different branches.
BTW, https://gcc.gnu.org/ml/libstdc++/2017-11/msg00037.html
was inspired by your https://gcc.gnu.org/ml/libstdc++/2017-10/msg00029.html

> 
> My only concern has always been the Debug mode impact which is now fixed.

I hope I'm suggesting identical behaviour for Debug and non-Debug mode
(no difference in interaction with the associated streambuf).

> 
> I already said that I disagree with Petr's main goal to keep eof 
> iterator linked to the underlying stream.

Ok.

> So current implementation is 
> just fine to me and I'll let Petr argue for any change.

Please clarify for me: what is the "current implementation"?
Is it what we see now in trunk?

> @Jonathan, 
> You can ignore my last request to remove the mutable keyword on _M_sbuf.
> 
> François
> 
> 

--

   - ptr
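
For readers following the thread, here is a minimal sketch of the standard
usage that any of the proposed changes has to keep working: an
istreambuf_iterator reads characters directly from the stream's streambuf and
compares equal to a default-constructed iterator once input is exhausted. The
helper name read_all is made up for illustration; the sketch only shows
documented standard behaviour and does not argue for either patch series.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: read every remaining character of a stream
// through its associated streambuf.
std::string read_all(std::istream& in)
{
  std::istreambuf_iterator<char> first(in); // bound to in.rdbuf()
  std::istreambuf_iterator<char> last;      // default-constructed: end-of-stream
  return std::string(first, last);
}
```

The point under discussion in this thread is what happens around that
end-of-stream state, which this sketch deliberately does not exercise.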


Re: [PATCH v2] [libcc1] Rename C{,P}_COMPILER_NAME and remove triplet from them

2017-11-16 Thread Jeff Law
On 11/15/2017 09:12 PM, Sergio Durigan Junior wrote:
> On Wednesday, November 15 2017, Jim Wilson wrote:
> 
>> On 11/13/2017 01:10 PM, Sergio Durigan Junior wrote:
>>> On Tuesday, September 26 2017, I wrote:
>>>
>>>> Ping^2.
>>>
>>> Ping^3.
>>>
>>> I'm sending the updated ChangeLog/patch.  I'm also removing gdb-patches
>>> from the Cc list.
>>>
>>> libcc1/ChangeLog:
>>> 2017-09-01  Sergio Durigan Junior  
>>> Pedro Alves  
>>>
>>> * Makefile.am: Remove references to c-compiler-name.h and
>>> cp-compiler-name.h
>>> * Makefile.in: Regenerate.
>>> * compiler-name.hh: New file.
>>> * libcc1.cc: Don't include c-compiler-name.h.  Include
>>> compiler-name.hh.
>>> * libcp1.cc: Don't include cp-compiler-name.h.  Include
>>> compiler-name.hh.
>>
>> OK.
>>
>> This is a gcc plugin for gdb, so it makes sense that gdb developers
>> should be allowed to decide how it should work.
> 
> Thanks Jim and Alex for the review.
> 
> I don't have permission to push to the GCC repository, so if one of you
> guys could do it for me I'd appreciate it.
Done.
jeff


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Richard Biener
On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law  wrote:
>On 11/16/2017 05:34 AM, Alan Hayward wrote:
>> This is a set of patches aimed at supporting aarch64 SVE register
>> preservation around TLS calls.
>> 
>> Across a TLS call, Aarch64 SVE does not explicitly preserve the
>> SVE vector registers. However, the Neon vector registers are preserved.
>> Due to overlapping of registers, this means the lower 128 bits of all
>> SVE vector registers will be preserved.
>> 
>> The existing GCC code will currently incorrectly assume preservation
>> of all of the SVE registers.
>> 
>> This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
>> a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
>> The mode of the expression indicates the size of the lower bits which
>> will be preserved. If the register contains a value bigger than this
>> mode then the code will treat the register as clobbered.
>> 
>> This means that, in order to evaluate if a clobber high is relevant, we
>> need to ensure the mode of the existing value in a register is tracked.
>> 
>> The following patches in this series add support for the CLOBBER_HIGH,
>> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
>> aarch64. The testing performed on these patches is also detailed in the
>> final patch.
>> 
>> These patches are based on top of the linaro-dev/sve branch.
>> 
>> A simpler alternative to this patch would be to assume all Neon and SVE
>> registers are clobbered across TLS calls, however this would be a
>> performance regression against all Aarch64 targets.
>So just a couple design questions.
>
>Presumably there's no reasonable way to set up GCC's view of the
>register file to avoid this problem?  ISTM that if the SVE register was
>split into two, one for the part that overlapped with the neon register
>and one that did not, then this could be handled via standard
>mechanisms?
>
>Alternately would it be easier to clobber a subreg representing the high
>part of the register?  Hmm, probably not.

I thought of a set of the preserved part to itself that leaves the upper part 
undefined. Not sure if we have such thing or if it would work in all places 
that a clobber does.

Richard. 

>Jeff



Re: [PATCH] c-family: add name_hint/deferred_diagnostic (v3)

2017-11-16 Thread Jeff Law
On 11/02/2017 06:21 PM, David Malcolm wrote:
> Jeff: You previously had concerns about the refcounting used in v1
> of this patch; this avoids that in favor of using gnu::unique_ptr.
> Joseph already approved the C frontend parts of v2 of this
> patch.  
I had to go back and find my original message to remember what I was
concerned about.  It was the "delete this" that caught my eye in
conjunction with reference counting which brought up issues of ensuring
the object was always heap allocated and such.

Otherwise the patch was reasonable.  Given you've changed the concerning
code to use the blessed gnu::unique_ptr those concerns should be fully
addressed now.
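
To make the ownership point concrete, here is a small sketch of the shape
being discussed: a hint object that is the sole owner of an optional deferred
note, with the note emitted from its destructor unless it was suppressed. It
uses std::unique_ptr as a stand-in for gnu::unique_ptr, and the names hint,
deferred_note and the log vector are invented for illustration; this is not
the code from the patch.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deferred diagnostic: the note is "emitted"
// (appended to a log here) when the object is destroyed, unless suppressed.
class deferred_note
{
public:
  deferred_note(std::vector<std::string>& log, std::string text)
    : m_log(log), m_text(std::move(text)), m_suppressed(false) {}

  ~deferred_note()
  {
    if (!m_suppressed)
      m_log.push_back(m_text);
  }

  // Suppression lives in the note itself, mirroring the wart mentioned
  // in the v2 notes (each subclass would have to honour it).
  void suppress() { m_suppressed = true; }

private:
  std::vector<std::string>& m_log;
  std::string m_text;
  bool m_suppressed;
};

// Hypothetical stand-in for the hint: a suggestion plus an owned note.
class hint
{
public:
  hint(std::string suggestion, std::unique_ptr<deferred_note> note)
    : m_suggestion(std::move(suggestion)), m_note(std::move(note)) {}

  const std::string& suggestion() const { return m_suggestion; }

  // Call this when the primary warning was not printed.
  void suppress()
  {
    if (m_note)
      m_note->suppress();
  }

private:
  std::string m_suggestion;
  std::unique_ptr<deferred_note> m_note; // sole owner, no refcount
};
```

With a single owner there is no "delete this" and no question of whether the
object was heap allocated: the note dies exactly when the hint does.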


> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for trunk?
> 
> Changed in v3:
> - We can't directly include "unique-ptr.h" due to the fix for
>   PR bootstrap/82610; see:
> https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01289.html
>   The fix is to define INCLUDE_UNIQUE_PTR before including system.h.
>   This version of the patch moves the usage of gnu::unique_ptr from
>   c-common.h to a new name-hint.h header, to avoid having to define
>   INCLUDE_UNIQUE_PTR everywhere that uses c-common.h.
> - Updated for *_at_rich_loc renaming
> 
> Changed in v2:
> - dropped refcounting in favor of using gnu::unique_ptr.  One
>   wart with this is that the handling of suppressed diagnostics
>   has to happen in every deferred_diagnostic subclass, rather
>   than in the name_hint class.  It would be possible to fix this
>   by introducing another dynamically-allocated object to manage
>   this concern, but adding another dynamic allocation seemed like
>   overkill.
> 
> Blurb from v1:
> 
> In various places we use lookup_name_fuzzy to provide a hint,
> and can report messages of the form:
>   error: unknown foo named 'bar'
> or:
>   error: unknown foo named 'bar'; did you mean 'SUGGESTION'?
> 
> This patch provides a way for lookup_name_fuzzy to provide
> both the suggestion above, and (optionally) additional hints
> that can be printed e.g.
> 
>   note: did you forget to include ?
> 
> This patch provides the mechanism and ports existing users
> of lookup_name_fuzzy to the new return type.
> There are no uses of such hints in this patch, but followup
> patches provide various front-end specific uses of this.
> 
> gcc/c-family/ChangeLog:
>   * c-common.h (enum lookup_name_fuzzy_kind): Move to name-hint.h.
>   (lookup_name_fuzzy): Likewise.  Convert return type from
>   const char * to name_hint.  Add location_t param.
>   * name-hint.h: New header.
> 
> gcc/c/ChangeLog:
>   * c-decl.c: Define INCLUDE_UNIQUE_PTR before including system.h.
>   Include "c-family/name-hint.h"
>   (implicit_decl_warning): Convert "hint" from
>   const char * to name_hint.  Pass location to
>   lookup_name_fuzzy.  Suppress any deferred diagnostic if the
>   warning was not printed.
>   (undeclared_variable): Likewise for "guessed_id".
>   (lookup_name_fuzzy): Convert return type from const char *
>   to name_hint.  Add location_t param.
>   * c-parser.c: Define INCLUDE_UNIQUE_PTR before including system.h.
>   Include "c-family/name-hint.h"
>   (c_parser_declaration_or_fndef): Convert "hint" from
>   const char * to name_hint.  Pass location to lookup_name_fuzzy.
>   (c_parser_parameter_declaration): Likewise.
> 
> gcc/cp/ChangeLog:
>   * name-lookup.c: Define INCLUDE_UNIQUE_PTR before including system.h.
>   Include "c-family/name-hint.h"
>   (suggest_alternatives_for): Convert "fuzzy_name" from const char *
>   to name_hint, and rename to "hint".  Pass location to
>   lookup_name_fuzzy.
>   (lookup_name_fuzzy): Convert return type from const char *
>   to name_hint.  Add location_t param.
>   * parser.c: Define INCLUDE_UNIQUE_PTR before including system.h.
>   Include "c-family/name-hint.h"
>   (cp_parser_diagnose_invalid_type_name): Convert
>   "suggestion" from const char * to name_hint, and rename to "hint".
>   Pass location to lookup_name_fuzzy.
OK.

jeff


Re: [PATCH #2], make Float128 built-in functions work with -mabi=ieeelongdouble

2017-11-16 Thread Segher Boessenkool
On Thu, Nov 16, 2017 at 12:54:54PM -0500, David Edelsohn wrote:
> On Thu, Nov 16, 2017 at 12:48 PM, Michael Meissner
>  wrote:
> > On Thu, Nov 16, 2017 at 04:48:18AM -0600, Segher Boessenkool wrote:
> >> On Wed, Nov 15, 2017 at 04:56:10PM -0500, Michael Meissner wrote:
> >> > David tells me that the patch to enable float128 built-in functions to
> >> > work with the -mabi=ieeelongdouble option broke AIX because on AIX, the
> >> > float128 insns are disabled, and they all become CODE_FOR_nothing.  The
> >> > switch statement that was added in rs6000.c to map KFmode built-in
> >> > functions to TFmode breaks under AIX.
> >>
> >> It also breaks on Linux with older binutils (no HAVE_AS_POWER9 defined).
> >>
> >> > I changed the code to have a separate table, and the first call, I build
> >> > the table.  If the insn was not generated, it will just be
> >> > CODE_FOR_nothing, and the KF->TF mode conversion will not be done.
> >> >
> >> > I have tested this on a little endian power8 system and there were no
> >> > regressions.  Once David verifies that it builds on AIX, can I check
> >> > this into the trunk?
> >>
> >> I don't like this scheme much (huge table, initialisation at runtime,
> >> etc.), but okay for trunk, to unbreak things there.
> >>
> >> Some comments on the patch:
> >>
> >> > +  if (first_time)
> >> > +   {
> >> > + first_time = false;
> >> > + gcc_assert ((int)CODE_FOR_nothing == 0);
> >>
> >> No useless cast please.  The whole assert is pretty useless fwiw; just
> >> take it out?
> >>
> >> > + for (i = 0; i < ARRAY_SIZE (map); i++)
> >> > +   map_insn_code[(int)map[i].from] = map[i].to;
> >> > +   }
> >>
> >> Space after cast.
> >>
> >> Only do this for codes that are *not* CODE_FOR_nothing?
> >
> > I must admit to not liking the code, and it is overly complicated.
> >
> > It occurred to me this morning that a much simpler patch is to just #ifdef
> > out the switch statement if we don't have the proper assembler.  I tried
> > this on an old power7 system using the system assembler (which does not
> > support the ISA 3.0 instructions) and it built fine.  I think this will
> > work on AIX.  David can you check this?
> >
> > I will fire off a build, and if it is successful, can I check this patch
> > instead of the other patch?
> 
> This patch will solve the problem.
> 
> GCC policy prefers runtime tests over #ifdef, but I agree that the
> runtime approach is overly messy.  This seems like a reasonable
> approach to me.

Same here.  It's a nice simple patch, and with a comment even :-)

We also have 117 #if.* in rs6000.c already, one more won't hurt.
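
For the record, the review comment on the earlier table-based version ("only
do this for codes that are *not* CODE_FOR_nothing") amounts to something like
the sketch below. The enum values, code_pair, build_map and remap are all
made-up names for illustration; the real insn codes come from generated
headers, and the simpler #ifdef patch is the one preferred above.

```cpp
#include <cstddef>

// Hypothetical insn codes; CODE_FOR_nothing marks an insn that was not
// generated for this configuration.
enum insn_code
{
  CODE_FOR_nothing = 0,
  CODE_FOR_addkf3,
  CODE_FOR_addtf3,
  CODE_FOR_subkf3,
  CODE_FOR_subtf3,
  NUM_INSN_CODES
};

struct code_pair
{
  insn_code from; // KFmode insn
  insn_code to;   // TFmode equivalent
};

// Fill the remapping table, skipping any pair where either side is
// unavailable so that slot 0 is never overwritten.
void build_map(const code_pair* pairs, std::size_t n,
               insn_code (&table)[NUM_INSN_CODES])
{
  for (std::size_t i = 0; i < NUM_INSN_CODES; i++)
    table[i] = CODE_FOR_nothing;
  for (std::size_t i = 0; i < n; i++)
    if (pairs[i].from != CODE_FOR_nothing && pairs[i].to != CODE_FOR_nothing)
      table[pairs[i].from] = pairs[i].to;
}

// Return the TFmode insn if one was recorded, else the original code.
insn_code remap(const insn_code (&table)[NUM_INSN_CODES], insn_code icode)
{
  insn_code to = table[icode];
  return to != CODE_FOR_nothing ? to : icode;
}
```

On a configuration where the float128 insns are disabled every pair is
skipped and remap is the identity, which is the behaviour the #ifdef gets for
free.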


Segher


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-16 Thread Alan Hayward

> On 16 Nov 2017, at 18:24, Richard Biener  wrote:
> 
> On November 16, 2017 7:05:30 PM GMT+01:00, Jeff Law  wrote:
>> On 11/16/2017 05:34 AM, Alan Hayward wrote:
>>> This is a set of patches aimed at supporting aarch64 SVE register
>>> preservation around TLS calls.
>>> 
>>> Across a TLS call, Aarch64 SVE does not explicitly preserve the
>>> SVE vector registers. However, the Neon vector registers are preserved.
>>> Due to overlapping of registers, this means the lower 128 bits of all
>>> SVE vector registers will be preserved.
>>> 
>>> The existing GCC code will currently incorrectly assume preservation
>>> of all of the SVE registers.
>>> 
>>> This patch introduces a CLOBBER_HIGH expression. This behaves a bit like
>>> a CLOBBER expression. CLOBBER_HIGH can only refer to a single register.
>>> The mode of the expression indicates the size of the lower bits which
>>> will be preserved. If the register contains a value bigger than this
>>> mode then the code will treat the register as clobbered.
>>> 
>>> This means that, in order to evaluate if a clobber high is relevant, we
>>> need to ensure the mode of the existing value in a register is tracked.
>>> 
>>> The following patches in this series add support for the CLOBBER_HIGH,
>>> with the final patch adding CLOBBER_HIGHs around TLS_DESC calls for
>>> aarch64. The testing performed on these patches is also detailed in the
>>> final patch.
>>> 
>>> These patches are based on top of the linaro-dev/sve branch.
>>> 
>>> A simpler alternative to this patch would be to assume all Neon and SVE
>>> registers are clobbered across TLS calls, however this would be a
>>> performance regression against all Aarch64 targets.
>> So just a couple design questions.
>> 
>> Presumably there's no reasonable way to set up GCC's view of the
>> register file to avoid this problem?  ISTM that if the SVE register was
>> split into two, one for the part that overlapped with the neon register
>> and one that did not, then this could be handled via standard
>> mechanisms?
>> 

Yes, that was an early alternative option for the patch.

With that approach it would affect every operation that uses SVE registers. A
simple add of two registers now has 4 inputs and two outputs. It would get in
the way when debugging any SVE dumps and be generally annoying.
It is possible that the code for that would all be in the aarch64 target
(making everyone else happy!), but I suspect that there would still be
strange dependency issues that’d need sorting in the common code.

Whereas with this patch, there are no new oddities in non-tls compiles/dumps.
Although the patch touches a lot of files, the changes are mostly restricted
to places where standard clobbers were already being checked.
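
As a rough model of the rule from the cover letter (illustration only, not
code from the series): a clobber-high whose mode covers the preserved low
bits only destroys a register if the value currently tracked in it is wider
than that. The struct reg_state and the function name are invented here, and
sizes are plain bit counts rather than machine modes.

```cpp
// Hypothetical model of what a pass would track for one hard register.
struct reg_state
{
  bool live;           // does the register hold a tracked value?
  unsigned value_bits; // width of the value currently in the register
};

// The low preserved_bits of the register survive the call, so the value is
// lost only when it is wider than the preserved part.
bool clobbered_by_clobber_high(const reg_state& reg, unsigned preserved_bits)
{
  return reg.live && reg.value_bits > preserved_bits;
}
```

This is why the mode of the value in each register has to be tracked: a
128-bit Neon value in an SVE register survives a TLS call with a 128-bit
clobber-high, while a wider SVE value does not.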


>> Alternately would it be easier to clobber a subreg representing the high
>> part of the register?  Hmm, probably not.
> 
> I thought of a set of the preserved part to itself that leaves the upper part 
> undefined. Not sure if we have such thing or if it would work in all places 
> that a clobber does.

I’ve not seen such a thing in the code. But it would need specific handling in
all the existing clobber code.


Alan.




Re: Improve spilling for variable-size slots

2017-11-16 Thread Jeff Law
On 11/03/2017 10:35 AM, Richard Sandiford wrote:
> Once SVE is enabled, a general AArch64 spill slot offset will be
> 
>   A + B * VL
> 
> where A is a constant and B is a multiple of the SVE vector length.
> The offsets in SVE load and store instructions are a multiple of VL
> (and so can encode some values of B), while offsets for base AArch64
> load and store instructions aren't (and encode some values of A).
> 
> We therefore get better spill code if variable-sized slots are grouped
> together separately from constant-sized slots, and if variable-sized
> slots are not reused for constant-sized data.  Then, spills to the
> constant-sized slots can add B * VL to the offset first, creating a
> common anchor point for spills with the same B component but different
> A components.  Spills to variable-sized slots can likewise add A to
> the offset first, creating a common anchor point for spills with the
> same A component but different B components.
> 
> This patch implements the sorting and grouping side of the optimisation.
> A later patch creates the anchor points.
> 
> The patch is a no-op on other targets.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-11-03  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * lra-spills.c (pseudo_reg_slot_compare): Sort slots by whether
>   they are variable or constant sized.
>   (assign_stack_slot_num_and_sort_pseudos): Don't reuse variable-sized
>   slots for constant-sized data.
OK.
jeff
> 
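
The sorting and grouping side described above can be pictured with the
following sketch. It is only an illustration under simplifying assumptions:
spill_slot, group_slots and can_reuse are invented names, the choice to put
constant-sized slots first is arbitrary, and the real comparison in
lra-spills.c considers more than this one property.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of a spill slot.
struct spill_slot
{
  int id;
  bool variable_sized; // true if the slot's size depends on the vector length
};

// Group slots so that constant-sized and variable-sized slots are kept
// apart; the relative order within each group is preserved.
void group_slots(std::vector<spill_slot>& slots)
{
  std::stable_sort(slots.begin(), slots.end(),
                   [](const spill_slot& x, const spill_slot& y)
                   { return x.variable_sized < y.variable_sized; });
}

// Simplified reuse rule: only reuse a slot for data of the same kind, which
// in particular never puts constant-sized data in a variable-sized slot.
// Size and lifetime checks are omitted.
bool can_reuse(const spill_slot& slot, bool data_is_variable_sized)
{
  return slot.variable_sized == data_is_variable_sized;
}
```

Keeping the two groups apart is what lets a later patch pick one anchor per
group: offsets within the constant-sized group then differ only in A, and
offsets within the variable-sized group only in B.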


Re: [patch, doc] Document that new Perl version breaks required automake

2017-11-16 Thread Jeff Law
On 11/09/2017 12:28 AM, Thomas Koenig wrote:
> Hello world,
> 
> while PR 82856 remains unsolved so far, this documentation patch at
> least points people into the right direction if --enable-maintainer-mode
> fails due to the incompatibility of the latest Perl version with
> the required automake version.
> 
> Tested with "make dvi" and "make pdf". OK for trunk?
> 
> 2017-11-09  Thomas Koenig  
> 
>     PR bootstrap/82856
>     doc/install.texi: Document incompatibility of Perl >=5.6.26
>     with the required version of automake 1.11.6.
> 
OK.
jeff


Re: [PATCH 14/22] Enable building libsanitizer with Intel CET

2017-11-16 Thread Jeff Law
On 11/08/2017 04:22 PM, Tsimbalist, Igor V wrote:
> The revised patch is attached. The differences are in what options are 
> defined and propagated to Makefiles for CET enabling.
>  
> Ok for trunk?
OK once the set as a whole is ack'd.

Jeff

