Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Wed, Apr 20, 2016 at 9:53 PM, H.J. Lu  wrote:
> Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
> standard AVX2 constants and all 1s in XImode is standard AVX512F constants,
> pass mode to standard_sse_constant_p and standard_sse_constant_opcode
> to check if all 1s is available for target.
>
> Tested on Linux/x86-64.  OK for master?

No.

This patch should use "isa" attribute instead of adding even more
similar patterns. Also, please leave MEM_P checks, the rare C->m move
can be easily resolved by IRA.

Also, the mode checks should be in the predicate,
standard_sse_constant_p just validates if the constant is allowed by
ISA.

Uros.

> BTW, it will be used to fix
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70155
>
>
> H.J.
> ---
> * config/i386/i386-protos.h (standard_sse_constant_p): Take
> machine_mode with VOIDmode as default.
> * config/i386/i386.c (standard_sse_constant_p): Get mode if
> it is VOIDmode.  Return 2 for all 1s of integer in supported
> modes.
> (ix86_expand_vector_move): Pass mode to standard_sse_constant_p.
> * config/i386/i386.md (*movxi_internal_avx512f): Replace
> vector_move_operand with nonimmediate_or_sse_const_operand and
> use BC instead of C in constraint.  Check register_operand
> instead of MEM_P.  Pass mode to standard_sse_constant_opcode.
> (*movoi_internal_avx): Disabled for TARGET_AVX2.  Check
> register_operand instead of MEM_P.
> (*movoi_internal_avx2): New pattern.
> (*movti_internal_sse): Likewise.
> (*movti_internal): Renamed to ...
> (*movti_internal_sse2): This.  Require SSE2.  Use BC instead of
> C in constraint. Check register_operand instead of MEM_P in
> 32-bit mode.
> ---
>  gcc/config/i386/i386-protos.h |   2 +-
>  gcc/config/i386/i386.c|  27 ---
>  gcc/config/i386/i386.md   | 104 
> --
>  3 files changed, 121 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index ff47bc1..cf54189 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -50,7 +50,7 @@ extern bool ix86_using_red_zone (void);
>  extern int standard_80387_constant_p (rtx);
>  extern const char *standard_80387_constant_opcode (rtx);
>  extern rtx standard_80387_constant_rtx (int);
> -extern int standard_sse_constant_p (rtx);
> +extern int standard_sse_constant_p (rtx, machine_mode = VOIDmode);
>  extern const char *standard_sse_constant_opcode (rtx_insn *, rtx);
>  extern bool symbolic_reference_mentioned_p (rtx);
>  extern bool extended_reg_mentioned_p (rtx);
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6379313..dd951c2 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10766,18 +10766,31 @@ standard_80387_constant_rtx (int idx)
> in supported SSE/AVX vector mode.  */
>
>  int
> -standard_sse_constant_p (rtx x)
> +standard_sse_constant_p (rtx x, machine_mode mode)
>  {
> -  machine_mode mode;
> -
>if (!TARGET_SSE)
>  return 0;
>
> -  mode = GET_MODE (x);
> -
> +  if (mode == VOIDmode)
> +mode = GET_MODE (x);
> +
>if (x == const0_rtx || x == CONST0_RTX (mode))
>  return 1;
> -  if (vector_all_ones_operand (x, mode))
> +  if (CONST_INT_P (x))
> +{
> +  /* If mode != VOIDmode, standard_sse_constant_p must be called:
> +1. On TImode with SSE2.
> +2. On OImode with AVX2.
> +3. On XImode with AVX512F.
> +   */
> +  if ((HOST_WIDE_INT) INTVAL (x) == HOST_WIDE_INT_M1
> + && (mode == VOIDmode
> + || (mode == TImode && TARGET_SSE2)
> + || (mode == OImode && TARGET_AVX2)
> + || (mode == XImode && TARGET_AVX512F)))
> +   return 2;
> +}
> +  else if (vector_all_ones_operand (x, mode))
>  switch (mode)
>{
>case V16QImode:
> @@ -18758,7 +18771,7 @@ ix86_expand_vector_move (machine_mode mode, rtx 
> operands[])
>&& (CONSTANT_P (op1)
>   || (SUBREG_P (op1)
>   && CONSTANT_P (SUBREG_REG (op1))))
> -  && !standard_sse_constant_p (op1))
> +  && !standard_sse_constant_p (op1, mode))
>  op1 = validize_mem (force_const_mem (mode, op1));
>
>/* We need to check memory alignment for SSE mode since attribute
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index babd0a4..75227aa 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1971,8 +1971,10 @@
>
>  (define_insn "*movxi_internal_avx512f"
>[(set (match_operand:XI 0 "nonimmediate_operand" "=v,v ,m")
> -   (match_operand:XI 1 "vector_move_operand"  "C ,vm,v"))]
> -  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> +   (match_operand:XI 1 "nonimmediate_or_sse_const_operand" "BC,vm,v"))]
> +  "TARGET_AVX512F
> +   && (register_operand (operands[0], XIm

Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 9:37 AM, Uros Bizjak  wrote:
> On Wed, Apr 20, 2016 at 9:53 PM, H.J. Lu  wrote:
>> Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
>> standard AVX2 constants and all 1s in XImode is standard AVX512F constants,
>> pass mode to standard_sse_constant_p and standard_sse_constant_opcode
>> to check if all 1s is available for target.
>>
>> Tested on Linux/x86-64.  OK for master?
>
> No.
>
> This patch should use "isa" attribute instead of adding even more
> similar patterns. Also, please leave MEM_P checks, the rare C->m move
> can be easily resolved by IRA.

Actually, register_operand checks are indeed better, please disregard
MEM_P recommendation.

Uros.
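
For readers following the thread, the pairing in the patch description (TImode with SSE2, OImode with AVX2, XImode with AVX512F) can be sketched as a small stand-alone C model. This is an illustration only: toy_mode, toy_isa and toy_all_ones_constant_p are hypothetical names, not GCC internals, and the return value 2 simply mirrors what the quoted patch returns for all 1s.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's integer machine modes.  */
enum toy_mode { TOY_TI, TOY_OI, TOY_XI };

/* Hypothetical stand-ins for the TARGET_SSE2/AVX2/AVX512F flags.  */
struct toy_isa {
  bool sse2;
  bool avx2;
  bool avx512f;
};

/* Return 2 when an all-ones integer constant of MODE is available on
   ISA, and 0 otherwise.  The mode/ISA pairs are taken from the patch
   description; everything else is an assumption of this sketch.  */
int
toy_all_ones_constant_p (enum toy_mode mode, const struct toy_isa *isa)
{
  switch (mode)
    {
    case TOY_TI:
      return isa->sse2 ? 2 : 0;
    case TOY_OI:
      return isa->avx2 ? 2 : 0;
    case TOY_XI:
      return isa->avx512f ? 2 : 0;
    }
  return 0;
}
```

In this sketch the ISA test lives in one function, which is roughly the split the review asks for: the predicate would decide which modes to accept, and the helper would only answer whether the ISA allows the constant.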


[PATCH][AArch64][wwwdocs] Summarise some more AArch64 changes for GCC6

2016-04-21 Thread Kyrill Tkachov

Hi all,

Here's a proposed summary of the changes in the AArch64 backend for GCC 6.
If there's anything I've missed it's purely my oversight, feel free to add
entries or suggest improvements.

Jim, you added support for the qdf24xx identifier to -mcpu and -mtune.
Could you please suggest an appropriate entry to describe it?
I think the same format as the Cortex-A35 entry in this patch would be 
appropriate.

Ok to commit?

Thanks,
Kyrill
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.62
diff -U 3 -r1.62 changes.html
--- htdocs/gcc-6/changes.html	24 Feb 2016 09:36:06 -	1.62
+++ htdocs/gcc-6/changes.html	12 Apr 2016 12:47:30 -
@@ -312,29 +312,91 @@
 AArch64

  
+   A number of AArch64-specific options were added.  The most important
+   ones are summarised in this section but for usage instructions please
+   refer to the documentation.
+ 
+ 
The new command line options -march=native,
-mcpu=native and -mtune=native are now
available on native AArch64 GNU/Linux systems.  Specifying
these options will cause GCC to auto-detect the host CPU and
rewrite these options to the optimal setting for that system.
-   If GCC is unable to detect the host CPU these options have no effect.
  
  
-   -fpic is now supported by the AArch64 target when generating
+   -fpic is now supported when generating
code for the small code model (-mcmodel=small).  The size of
the global offset table (GOT) is limited to 28KiB under the LP64 SysV ABI
, and 15KiB under the ILP32 SysV ABI.
  
  
-   The AArch64 port now supports target attributes and pragmas.  Please
-   refer to the https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes
-   documentation for details of available attributes and
+   Target attributes and pragmas are now supported.  Please
+   refer to the documentation for details of available attributes and
pragmas as well as usage instructions.
  
  
Link-time optimization across translation units with different
target-specific options is now supported.
  
+ 
+   The option -mtls-size= is now supported.  It can be used to
+   specify the bit size of TLS offsets, allowing GCC to generate
+   better TLS instruction sequences.
+ 
+ 
+   The option -fno-plt is now fixed and is fully
+   functional.
+ 
+ 
+   The ARMv8.1-A architecture and the Large System Extensions are now
+   supported.  They can be used by specifying the
+   -march=armv8.1-a option.  Additionally, the
+   +lse option extension can be used in a similar fashion
+   to other option extensions.
+   The Large System Extensions introduce new instructions that are used
+   in the implementation of common atomic operations.
+ 
+ 
+   The ACLE half-precision floating-point type __fp16 is now
+   supported in the C and C++ languages.
+ 
+ 
+   The ARM Cortex-A35 processor is now supported via the
+   -mcpu=cortex-a35 and -mtune=cortex-a35
+   options as well as the equivalent target attributes and pragmas.
+ 
+ 
+   Code generation for the ARM Cortex-A57 processor is improved.
+   Among general code generation improvements, a better algorithm is
+   added for allocating registers to floating-point multiply-accumulate
+   instructions offering increased performance when compiling with
+   -mcpu=cortex-a57 or -mtune=cortex-a57.
+ 
+ Code generation for the ARM Cortex-A53 processor is improved.
+   A more accurate instruction scheduling model for the processor is
+   now used, and a number of compiler tuning parameters have been set
+   to offer increased performance when compiling with
+   -mcpu=cortex-a53 or -mtune=cortex-a53.
+ 
+ Code generation for the Samsung Exynos M1 processor is improved.
+   A more accurate instruction scheduling model for the processor is
+   now used, and a number of compiler tuning parameters have been set
+   to offer increased performance when compiling with
+   -mcpu=exynos-m1 or -mtune=exynos-m1.
+ 
+ 
+   Improvements in the generation of conditional branches and literal
+   pools were made to allow the compiler to compile functions of a large
+   size.  Constant pools are now placed into separate rodata sections.
+   The new option -mpc-relative-literal-loads is
+   introduced to generate per-function literal pools, limiting the maximum
+   size of functions to 1MiB.
+ 
+ 
+   Several correctness issues with generation of Advanced SIMD instructions
+   for big-endian targets have been fixed resulting in improved code
+   generation for ACLE intrinsics wit

[Ada] Spurious errors with generalized iterators

2016-04-21 Thread Arnaud Charlet
This patch fixes some spurious errors in a generalized iterator over a user-
defined container, when the first parameter of the Iterate function
is an access parameter, and the iterator type is locally derived.

Executing:

   gnatmake -q ausprobieren.adb
   ausprobieren

must yield:

5
999
5
999

---
with Ada.Text_IO;
use  Ada.Text_IO;
with Circularly_Linked_Lists;
procedure Ausprobieren is

  package Lists is new Circularly_Linked_Lists (Integer);
  use Lists;

  Elem1 : aliased Integer := 5;
  List: aliased Circularly_Linked_List := Init (Elem1);
  Elem2 : aliased Integer := 999;
begin
  List.Insert (Elem2);
  for Cursor in List.Iterate loop
Put_Line (Integer'Image (List (Cursor)));
  end loop;

  for Elm of List loop
 Put_Line (Integer'Image (Elm));
  end loop;
end Ausprobieren; 
---
with Ada.Iterator_Interfaces;

generic

  type Element_Type is private;

package Circularly_Linked_Lists is

  type Circularly_Linked_List is tagged private
with Default_Iterator  => Iterate,
 Iterator_Element  => Element_Type,
 Variable_Indexing => Acc;

  type Accessor (Elem: not null access Element_Type) is limited null record
with Implicit_Dereference => Elem;

  type Cursor is private;
  function Init (X : aliased ELement_Type) return Circularly_Linked_List;
  function Has_Element (Position: Cursor) return Boolean;

  function Acc (CLL : in out Circularly_Linked_List;
Position: in Cursor) return Accessor;
  package Iterator_Interfaces is new
  Ada.Iterator_Interfaces (Cursor, Has_Element);

  type Forward_Iterator (CLL: not null access Circularly_Linked_List) is new
  Iterator_Interfaces.Forward_Iterator with null record;

  overriding function First (Object  : Forward_Iterator) return Cursor;
  overriding function Next  (Object  : Forward_Iterator;
 Position: Cursor  ) return Cursor;

   function Iterate1 (CLL: not null access Circularly_Linked_List'Class)
   return Forward_Iterator;

  function Iterate (CLL: not null access Circularly_Linked_List )
return Forward_Iterator'Class;

  procedure Insert
  (Object : in out Circularly_Linked_List; Thing : aliased Element_Type);
private

  type CLL_Ptr is access all Circularly_Linked_List;

  type Ptr is access all Element_Type;

  type Cursor is record
Current: CLL_Ptr;
  end record;

  type Circularly_Linked_List is tagged record
Next, Prev: CLL_Ptr;
It  : Ptr;
  end record;
end Circularly_Linked_Lists;
---
package body Circularly_Linked_Lists is

  function Init (X : aliased ELement_Type) return Circularly_Linked_List is
  begin
 return (null, null, X'Unrestricted_Access);
  end;
  function Has_Element (Position: Cursor) return Boolean is
  begin
  return  Position.Current /= null and then Position.Current.It /= null;
  end Has_Element;

  function Acc (CLL : in out Circularly_Linked_List;
Position: in Cursor) return Accessor is
  begin
 return (Elem => Position.Current.It);
  end;

   function Iterate1 (CLL: not null access Circularly_Linked_List'Class)
   return Forward_Iterator
 is
  begin
 return forward_iterator'(Iterator_Interfaces.Forward_Iterator with
CLL => CLL.all'Unrestricted_Access);
  end;


 function Iterate (CLL: not null access Circularly_Linked_List )
return Forward_Iterator'Class
 is
  begin
 return forward_iterator'(Iterator_Interfaces.Forward_Iterator with
CLL => CLL);
  end;

  overriding function First (Object  : Forward_Iterator) return Cursor is
  begin
 return (Current => Object.CLL.all'Unchecked_Access);
  end;
  overriding function Next  (Object  : Forward_Iterator;
 Position: Cursor  ) return Cursor is
  begin
 return (Current => Position.Current.Next);
  end;
 
  procedure Insert
 (Object : in out Circularly_Linked_List; Thing : aliased Element_Type) is
  begin
 Object.Next := new Circularly_Linked_List'
   (Prev => Object'Unchecked_access,
Next => Object.Next,
It => Thing'Unrestricted_Access);
  end;
end Circularly_Linked_Lists;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Ed Schonberg  

* sem_util.adb (Denotes_Iterator): Use root type to determine
whether the ultimate ancestor is the predefined iterator
interface package.
* exp_ch5.adb (Expand_Iterator_Over_Container): simplify code
and avoid reuse of Pack local variable.

Index: exp_ch5.adb
===
--- exp_ch5.adb (revision 235268)
+++ exp_ch5.adb (working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+-

[Ada] Improved handling of rep item chains

2016-04-21 Thread Arnaud Charlet
This patch is an internal improvement to the handling of rep item chains of
types. The inheritance of rep item chains can now avoid potential cycles. No
need for a test, no change in behavior.
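
The idea of checking chain membership before linking, so that inheritance cannot create a cycle, can be sketched in C as follows. This is a simplified model under stated assumptions, not the GNAT algorithm: rep_item, chain_has_item and inherit_chain are hypothetical names, and the sketch refuses to link whenever either chain already reaches the other.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a rep item node on a singly linked chain.  */
struct rep_item {
  struct rep_item *next;
};

/* Return true if ITEM is reachable from FIRST.  */
bool
chain_has_item (const struct rep_item *first, const struct rep_item *item)
{
  for (const struct rep_item *p = first; p != NULL; p = p->next)
    if (p == item)
      return true;
  return false;
}

/* Make the chain at *DEST_FIRST continue into FROM_FIRST.  Return true
   if a link was made, false if there was nothing to inherit or if
   linking would duplicate items or close a loop.  */
bool
inherit_chain (struct rep_item **dest_first, struct rep_item *from_first)
{
  if (from_first == NULL)
    return false;               /* The source has no chain.  */

  if (*dest_first == NULL)
    {
      *dest_first = from_first; /* The destination shares the source chain.  */
      return true;
    }

  if (chain_has_item (*dest_first, from_first)
      || chain_has_item (from_first, *dest_first))
    return false;               /* Already linked; appending would loop.  */

  struct rep_item *last = *dest_first;
  while (last->next != NULL)
    last = last->next;
  last->next = from_first;
  return true;
}
```

Calling inherit_chain twice with the same arguments links the chains only once; the second call finds the source already reachable from the destination and leaves both chains alone.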

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Hristian Kirtchev  

* sem_aux.ads, sem_aux.adb (Has_Rep_Item): New variant.
* sem_util.adb (Inherit_Rep_Item_Chain): Reimplemented.

Index: sem_aux.adb
===
--- sem_aux.adb (revision 235193)
+++ sem_aux.adb (working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -708,6 +708,29 @@
   return Present (Get_Rep_Item (E, Nam1, Nam2, Check_Parents));
end Has_Rep_Item;
 
+   function Has_Rep_Item (E : Entity_Id; N : Node_Id) return Boolean is
+  Item : Node_Id;
+
+   begin
+  pragma Assert
+(Nkind_In (N, N_Aspect_Specification,
+  N_Attribute_Definition_Clause,
+  N_Enumeration_Representation_Clause,
+  N_Pragma,
+  N_Record_Representation_Clause));
+
+  Item := First_Rep_Item (E);
+  while Present (Item) loop
+ if Item = N then
+return True;
+ end if;
+
+ Item := Next_Rep_Item (Item);
+  end loop;
+
+  return False;
+   end Has_Rep_Item;
+

-- Has_Rep_Pragma --

Index: sem_aux.ads
===
--- sem_aux.ads (revision 235192)
+++ sem_aux.ads (working copy)
@@ -6,7 +6,7 @@
 --  --
 -- S p e c  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -246,6 +246,10 @@
--  not inherited from its parents, if any). If found then True is returned,
--  otherwise False indicates that no matching entry was found.
 
+   function Has_Rep_Item (E : Entity_Id; N : Node_Id) return Boolean;
+   --  Determine whether the Rep_Item chain of arbitrary entity E contains item
+   --  N. N must denote a valid rep item.
+
function Has_Rep_Pragma
  (E : Entity_Id;
   Nam   : Name_Id;
Index: sem_util.adb
===
--- sem_util.adb(revision 235304)
+++ sem_util.adb(working copy)
@@ -10733,57 +10733,143 @@

 
procedure Inherit_Rep_Item_Chain (Typ : Entity_Id; From_Typ : Entity_Id) is
-  From_Item : constant Node_Id := First_Rep_Item (From_Typ);
-  Item  : Node_Id := Empty;
-  Last_Item : Node_Id := Empty;
+  Item  : Node_Id;
+  Next_Item : Node_Id;
 
begin
-  --  Reach the end of the destination type's chain (if any) and capture
-  --  the last item.
+  --  There are several inheritance scenarios to consider depending on
+  --  whether both types have rep item chains and whether the destination
+  --  type already inherits part of the source type's rep item chain.
 
-  Item := First_Rep_Item (Typ);
-  while Present (Item) loop
+  --  1) The source type lacks a rep item chain
+  -- From_Typ ---> Empty
+  --
+  -- Typ > Item (or Empty)
 
- --  Do not inherit a chain that has been inherited already
+  --  In this case inheritance cannot take place because there are no items
+  --  to inherit.
 
- if Item = From_Item then
-return;
- end if;
+  --  2) The destination type lacks a rep item chain
+  -- From_Typ ---> Item ---> ...
+  --
+  -- Typ > Empty
 
- Last_Item := Item;
- Item := Next_Rep_Item (Item);
-  end loop;
+  --  Inheritance takes place by setting the First_Rep_Item of the
+  --  destination type to the First_Rep_Item of the source type.
+ 

[Ada] Freezing a subprogram does not always freeze its profile

2016-04-21 Thread Arnaud Charlet
AI05-019 specifies the conditions under which freezing a subprogram also
freezes the profile of the subprogram. Prior to this patch the profile was
frozen unconditionally, leading to spurious errors.
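
The effect of the new flag can be pictured with a toy model in C. This is a sketch under stated assumptions: toy_entity, toy_freeze_entity and toy_resolve_access_attribute are hypothetical names that only mimic the shape of the change, and C has no default parameter values, so the flag is always passed explicitly here.

```c
#include <stdbool.h>

/* Hypothetical model of a subprogram entity and its profile.  */
struct toy_entity {
  bool frozen;
  bool profile_frozen;
};

/* Freeze E.  The profile is frozen as well only when FREEZE_PROFILE
   is true, mirroring the role of the new boolean parameter.  */
void
toy_freeze_entity (struct toy_entity *e, bool freeze_profile)
{
  e->frozen = true;
  if (freeze_profile)
    e->profile_frozen = true;
}

/* Model of an attribute reference such as 'Access: it freezes the
   prefix but leaves the profile of a subprogram prefix unfrozen.  */
void
toy_resolve_access_attribute (struct toy_entity *prefix)
{
  toy_freeze_entity (prefix, false);
}
```

After toy_resolve_access_attribute the entity is frozen while its profile is not; a later toy_freeze_entity call with the flag set to true freezes the profile too.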

Examples in ACATS test BDD2004.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Ed Schonberg  

* freeze.ads, freeze.adb (Freeze_Entity, Freeze_Before): Add
boolean parameter to determine whether freezing an overloadable
entity freezes its profile as well. This is required by
AI05-019. The call to Freeze_Profile within Freeze_Entity is
conditioned by the value of this flag, whose default is True.
* sem_attr.adb (Resolve_Attribute, case 'Access): The attribute
reference freezes the prefix, but if the prefix is a subprogram
it does not freeze its profile.

Index: freeze.adb
===
--- freeze.adb  (revision 235267)
+++ freeze.adb  (working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -1908,9 +1908,17 @@
-- Freeze_Before --
---
 
-   procedure Freeze_Before (N : Node_Id; T : Entity_Id) is
-  Freeze_Nodes : constant List_Id := Freeze_Entity (T, N);
+   procedure Freeze_Before
+ (N   : Node_Id;
+  T   : Entity_Id;
+  F_P : Boolean := True)
+   is
+   --  Freeze T, then insert the generated Freeze nodes before the node N.
+   --  The flag F_P is used when T is an overloadable entity, and indicates
+   --  whether its profile should be frozen at the same time.
 
+  Freeze_Nodes : constant List_Id := Freeze_Entity (T, N, F_P);
+
begin
   if Ekind (T) = E_Function then
  Check_Expression_Function (N, T);
@@ -1925,7 +1933,11 @@
-- Freeze_Entity --
---
 
-   function Freeze_Entity (E : Entity_Id; N : Node_Id) return List_Id is
+   function Freeze_Entity
+ (E : Entity_Id;
+  N : Node_Id;
+  F_P : Boolean := True) return List_Id
+   is
   Loc: constant Source_Ptr := Sloc (N);
   Atype  : Entity_Id;
   Comp   : Entity_Id;
@@ -4990,12 +5002,13 @@
 
 --  In Ada 2012, freezing a subprogram does not always freeze
 --  the corresponding profile (see AI05-019). An attribute
---  reference is not a freezing point of the profile.
+--  reference is not a freezing point of the profile. The boolean
+--  Flag F_P indicates whether the profile should be frozen now.
 --  Other constructs that should not freeze ???
 
 --  This processing doesn't apply to internal entities (see below)
 
-if not Is_Internal (E) then
+if not Is_Internal (E) and then F_P then
if not Freeze_Profile (E) then
   Ghost_Mode := Save_Ghost_Mode;
   return Result;
Index: freeze.ads
===
--- freeze.ads  (revision 235192)
+++ freeze.ads  (working copy)
@@ -6,7 +6,7 @@
 --  --
 -- S p e c  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -187,13 +187,19 @@
--  If Initialization_Statements (E) is an N_Compound_Statement, insert its
--  actions in the enclosing list and reset the attribute.
 
-   function Freeze_Entity (E : Entity_Id; N : Node_Id) return List_Id;
+   function Freeze_Entity
+ (E : Entity_Id;
+  N : Node_Id;
+  F_P : Boolean := True) return List_Id;
--  Freeze an entity, and return Freeze nodes, to be inserted at the point
--  of call. N is a node whose source location corresponds to the freeze
--  point. This is used in placing warning messages in the situation where
--  it appears that a type has been frozen too early, e.g. when a primitive
--  operation is declared after the freezi

Re: patch to fix PR70689

2016-04-21 Thread Jiong Wang



On 19/04/16 03:54, Vladimir N Makarov wrote:
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70689
>
> The patch was successfully tested and bootstrapped on x86/x86-64.
>
> Committed to the trunk as rev. 235184.

This caused the following regression on trunk:

  FAIL: gcc: gcc.target/arm/eliminate.c scan-assembler-times r0,[\\t ]*sp 3

configuration: --target=arm-none-eabi --enable-languages=c
compile option: -O2 -march=armv7-a

before:
===
foo:
str lr, [sp, #-4]!
sub sp, sp, #12
add r0, sp, #4
bl  bar
add r0, sp, #4
bl  bar
add r0, sp, #4
bl  bar
add sp, sp, #12
ldr pc, [sp], #4

after:
===
foo:
str lr, [sp, #-4]!
sub sp, sp, #20
add r3, sp, #12
str r3, [sp, #4]
mov r0, r3
bl  bar
add r3, sp, #12
str r3, [sp, #4]
mov r0, r3
bl  bar
add r3, sp, #12
str r3, [sp, #4]
mov r0, r3
bl  bar
add sp, sp, #20
ldr pc, [sp], #4


[Ada] Minor changes to warning message for exe filenames on Windows

2016-04-21 Thread Arnaud Charlet
Change wording of the warning message on problematic filenames to be
more neutral. Add a new substring "patch" introduced on Windows 10.
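
The check described above amounts to a substring scan of the output file name. A minimal C sketch of that scan follows; it is not a translation of gnatlink.adb. The function name worrisome_substring, the case-insensitive comparison and the 255-character limit are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Substrings named in the patch description.  */
static const char *const worrisome[] =
  { "install", "setup", "update", "patch" };

/* Return the first entry of the list above that occurs in NAME,
   ignoring case, or NULL if none does.  Only the first 255 characters
   of NAME are examined (a limit chosen for this sketch).  */
const char *
worrisome_substring (const char *name)
{
  char lower[256];
  size_t i;

  /* Work on a lower-cased copy so the match is case-insensitive.  */
  for (i = 0; name[i] != '\0' && i < sizeof lower - 1; i++)
    lower[i] = (char) tolower ((unsigned char) name[i]);
  lower[i] = '\0';

  for (size_t k = 0; k < sizeof worrisome / sizeof worrisome[0]; k++)
    if (strstr (lower, worrisome[k]) != NULL)
      return worrisome[k];

  return NULL;
}
```

A linker driver could call this on the output name and, on a non-NULL result, print a warning that names the offending substring, which is the form the reworded message takes.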

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Vasiliy Fofanov  

* gnatlink.adb: Change wording of the warning message on
problematic filenames to be more neutral. Add a new substring
"patch" introduced on Windows 10.

Index: gnatlink.adb
===
--- gnatlink.adb(revision 235192)
+++ gnatlink.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1996-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1996-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -1680,10 +1680,10 @@
 
--  Special warnings for worrisome file names on windows
 
-   --  Windows-7 will not allow an executable file whose name contains any
-   --  of the substrings "install", "setup", or "update" to load without
-   --  special administration privileges. This rather incredible behavior
-   --  is Microsoft's idea of a useful security precaution.
+   --  Recent versions of Windows by default cause privilege escalation if an
+   --  executable file name contains substrings "install", "setup", "update"
+   --  or "patch". A console application will typically fail to load as a
+   --  result, so we should warn the user.
 
Bad_File_Names_On_Windows : declare
   FN : String := Output_File_Name.all;
@@ -1696,13 +1696,10 @@
  for J in 1 .. FN'Length - (S'Length - 1) loop
 if FN (J .. J + (S'Length - 1)) = S then
Error_Msg
- ("warning: possible problem with executable name """
-  & Output_File_Name.all & '"');
+ ("warning: executable file name """ & Output_File_Name.all
+  & """ contains substring """ & S & '"');
Error_Msg
- ("file name contains substring """ & S & '"');
-   Error_Msg
- ("admin privileges may be required on Windows 7 "
-  & "to load this file");
+ ("admin privileges may be required to run this file");
 end if;
  end loop;
   end Check_File_Name;
@@ -1723,6 +1720,7 @@
  Check_File_Name ("install");
  Check_File_Name ("setup");
  Check_File_Name ("update");
+ Check_File_Name ("patch");
   end if;
end Bad_File_Names_On_Windows;
 


Re: [PATCH PR70715]Expand simple operations in IV.base and check if it's the control_IV

2016-04-21 Thread Richard Biener
On Wed, Apr 20, 2016 at 5:08 PM, Bin Cheng  wrote:
> Hi,
> As reported in PR70715, GCC failed to prove no-overflows of IV(&p[n]) for 
> simple example like:
> int
> foo (char *p, unsigned n)
> {
>   while(n--)
> {
>   p[n]='A';
> }
>   return 0;
> }
> Actually, code has already been added to handle this form loops when fixing 
> PR68529.  Problem with this case is loop niter analyzer records control_IV 
> with its base expanded by calling expand_simple_operations.  This patch 
> simply adds code expanding BASE before we check its equality against 
> control_IV.base.  In the long run, we might want to remove the use of 
> expand_simple_operations.
>
> Bootstrap and test on x86_64.  Is it OK?

Ok.

Richard.

> Thanks,
> bin
>
>
> 2016-04-20  Bin Cheng  
>
> PR tree-optimization/70715
> * tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality
> after expanding BASE using expand_simple_operations.


Re: match.pd patch: max(int_min, x)->x

2016-04-21 Thread Richard Biener
On Wed, Apr 20, 2016 at 8:44 PM, Marc Glisse  wrote:
> Hello,
>
> this simple transformation is currently done in RTL, sometimes also in VRP
> if we have any kind of range information (even on the wrong side, but not
> with VR_VARYING). It seems more natural to complete the match.pd pattern
> than make VRP understand this case.
>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu (some noise in libgomp
> testcases).

Ok.

Thanks,
Richard.

> 2016-04-21  Marc Glisse  
>
> gcc/
> * match.pd (min(int_max, x), max(int_min, x)): New transformations.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/minmax-1.c: New testcase.
>
> --
> Marc Glisse
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 235292)
> +++ gcc/match.pd(working copy)
> @@ -1185,30 +1185,40 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* min(max(x,y),y) -> y.  */
>  (simplify
>   (min:c (max:c @0 @1) @1)
>   @1)
>  /* max(min(x,y),y) -> y.  */
>  (simplify
>   (max:c (min:c @0 @1) @1)
>   @1)
>  (simplify
>   (min @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_MIN_VALUE (type)
> -  && operand_equal_p (@1, TYPE_MIN_VALUE (type), OEP_ONLY_CONST))
> -  @1))
> + (switch
> +  (if (INTEGRAL_TYPE_P (type)
> +   && TYPE_MIN_VALUE (type)
> +   && operand_equal_p (@1, TYPE_MIN_VALUE (type), OEP_ONLY_CONST))
> +   @1)
> +  (if (INTEGRAL_TYPE_P (type)
> +   && TYPE_MAX_VALUE (type)
> +   && operand_equal_p (@1, TYPE_MAX_VALUE (type), OEP_ONLY_CONST))
> +   @0)))
>  (simplify
>   (max @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_MAX_VALUE (type)
> -  && operand_equal_p (@1, TYPE_MAX_VALUE (type), OEP_ONLY_CONST))
> -  @1))
> + (switch
> +  (if (INTEGRAL_TYPE_P (type)
> +   && TYPE_MAX_VALUE (type)
> +   && operand_equal_p (@1, TYPE_MAX_VALUE (type), OEP_ONLY_CONST))
> +   @1)
> +  (if (INTEGRAL_TYPE_P (type)
> +   && TYPE_MIN_VALUE (type)
> +   && operand_equal_p (@1, TYPE_MIN_VALUE (type), OEP_ONLY_CONST))
> +   @0)))
>  (for minmax (FMIN FMAX)
>   /* If either argument is NaN, return the other one.  Avoid the
>  transformation if we get (and honor) a signalling NaN.  */
>   (simplify
>(minmax:c @0 REAL_CST@1)
>(if (real_isnan (TREE_REAL_CST_PTR (@1))
> && (!HONOR_SNANS (@1) || !TREE_REAL_CST (@1).signalling))
> @0)))
>  /* Convert fmin/fmax to MIN_EXPR/MAX_EXPR.  C99 requires these
> functions to return the numeric arg if the other one is NaN.
> Index: gcc/testsuite/gcc.dg/tree-ssa/minmax-1.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/minmax-1.c(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/minmax-1.c(working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +static int min(int a,int b){return (a<b)?a:b;}
> +static int max(int a,int b){return (a<b)?b:a;}
> +int f(int x){return max(x,-__INT_MAX__-1);}
> +int g(int x){return min(x,__INT_MAX__);}
> +
> +/* { dg-final { scan-tree-dump-times "return x_\[0-9\]+.D.;" 2 "optimized"
> } } */
>
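
The identities behind the new patterns are max(x, INT_MIN) = x and min(x, INT_MAX) = x for every int x. A small stand-alone C sketch, separate from the testcase in the patch, makes them easy to check by hand; the helper names min_int, max_int, clamp_low and clamp_high are chosen for this illustration only.

```c
#include <limits.h>

/* Plain integer min and max helpers.  */
static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a < b ? b : a; }

/* max (x, INT_MIN): no int is smaller than INT_MIN, so this is x.  */
int
clamp_low (int x)
{
  return max_int (x, INT_MIN);
}

/* min (x, INT_MAX): no int is larger than INT_MAX, so this is x.  */
int
clamp_high (int x)
{
  return min_int (x, INT_MAX);
}
```

Both functions return their argument unchanged for any input, which is why a compiler may fold each body to a plain return of x.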


[Ada] Improved exception message for Host_Error

2016-04-21 Thread Arnaud Charlet
GNAT.Sockets.Host_Error is the exception raised when a host name
or address lookup fails. This change improves the associated exception
message by including the offending name or address.

The following test case must display:

raised GNAT.SOCKETS.HOST_ERROR : [1] Host not found: nonexistent-host

with GNAT.Sockets; use GNAT.Sockets;
procedure Get_Nonexistent_Host is
   Dummy : Host_Entry_Type := Get_Host_By_Name ("nonexistent-host");
begin
   null;
end Get_Nonexistent_Host;
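
The shape of the improved message (error code in brackets, the textual description, then the name that was looked up) can be mimicked in C as below. This is a hedged sketch: format_host_error is a hypothetical helper, not part of GNAT.Sockets or of any C library, and the layout is inferred from the expected output shown above.

```c
#include <stdio.h>

/* Write "[H_ERROR] MESSAGE: NAME" into BUF, which holds SIZE bytes.
   Return the number of characters the full message needs, as
   snprintf does; a result >= SIZE means the text was truncated.  */
int
format_host_error (char *buf, size_t size, int h_error,
                   const char *message, const char *name)
{
  return snprintf (buf, size, "[%d] %s: %s", h_error, message, name);
}
```

With h_error 1, the message "Host not found" and the name "nonexistent-host", the buffer holds the same text as the exception message in the test case.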

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Thomas Quinot  

* g-socket.adb (Raise_Host_Error): Include additional Name parameter.


Index: g-socket.adb
===
--- g-socket.adb(revision 235192)
+++ g-socket.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
--- Copyright (C) 2001-2014, AdaCore --
+-- Copyright (C) 2001-2016, AdaCore --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -185,9 +185,10 @@
--  Raise Socket_Error with an exception message describing the error code
--  from errno.
 
-   procedure Raise_Host_Error (H_Error : Integer);
+   procedure Raise_Host_Error (H_Error : Integer; Name : String);
--  Raise Host_Error exception with message describing error code (note
-   --  hstrerror seems to be obsolete) from h_errno.
+   --  hstrerror seems to be obsolete) from h_errno. Name is the name
+   --  or address that was being looked up.
 
procedure Narrow (Item : in out Socket_Set_Type);
--  Update Last as it may be greater than the real last socket
@@ -973,7 +974,7 @@
  Res'Access, Buf'Address, Buflen, Err'Access) /= 0
   then
  Netdb_Unlock;
- Raise_Host_Error (Integer (Err));
+ Raise_Host_Error (Integer (Err), Image (Address));
   end if;
 
   begin
@@ -1015,7 +1016,7 @@
(HN, Res'Access, Buf'Address, Buflen, Err'Access) /= 0
  then
 Netdb_Unlock;
-Raise_Host_Error (Integer (Err));
+Raise_Host_Error (Integer (Err), Name);
  end if;
 
  return H : constant Host_Entry_Type :=
@@ -1700,11 +1701,12 @@
-- Raise_Host_Error --
--
 
-   procedure Raise_Host_Error (H_Error : Integer) is
+   procedure Raise_Host_Error (H_Error : Integer; Name : String) is
begin
   raise Host_Error with
 Err_Code_Image (H_Error)
-  & Host_Error_Messages.Host_Error_Message (H_Error);
+  & Host_Error_Messages.Host_Error_Message (H_Error)
+  & ": " & Name;
end Raise_Host_Error;
 



[Ada] Minimizing recompilation with multiple limited_with clauses

2016-04-21 Thread Arnaud Charlet
This patch simplifies the contents of ALI files by removing from them the
dependency lines that denote units that are not analyzed because they appear
only in the context of units named in limited_with clauses.


The following must execute quietly:

   gcc -c -gnatc a_things.ads
   grep c_things a_things.ali
   grep f_things a_things.ali

---
limited with B_Things;

package A_Things is

   type Instance is tagged null record;
   type Class_Access is access all Instance'Class;

   procedure Do_Something (Self : Instance);
   function Get_B_Instance (Self : Instance)
 return access B_Things.Instance'Class;
   function Get_B_Class_Acccess (Self : Instance) return B_Things.Class_Access;

end A_Things;
---
limited with A_Things;
limited with C_Things;

package B_Things is

   type Instance is tagged null record;
   type Class_Access is access all Instance'Class;
   Some_Junk : Integer := 1235;
   procedure Do_Something (Self : Instance);

   function Get_A_Instance (Self : Instance)
  return access A_Things.Instance'Class;
   function Get_A_Class_Acccess (Self : Instance)
  return A_Things.Class_Access;

   function Get_C_Instance (Self : Instance)
  return access C_Things.Instance'Class;
   function Get_C_Class_Acccess (Self : Instance)
  return C_Things.Class_Access;

end B_Things;
---
limited with B_Things;
limited with D_Things;

package C_Things is

   type Instance is tagged null record;
   type Class_Access is access all Instance'Class;

   procedure Do_Something (Self : Instance);

   function Get_B_Instance (Self : Instance)
  return access B_Things.Instance'Class;
   function Get_B_Class_Acccess (Self : Instance)
  return B_Things.Class_Access;

   function Get_D_Instance (Self : Instance)
  return access D_Things.Instance'Class;
   function Get_D_Class_Acccess (Self : Instance)
  return D_Things.Class_Access;

end C_Things;
---
limited with C_Things;
limited with E_Things;

package D_Things is

   type Instance is tagged null record;
   type Class_Access is access all Instance'Class;

   procedure Do_Something (Self : Instance);

   function Get_C_Instance (Self : Instance)
  return access C_Things.Instance'Class;
   function Get_C_Class_Acccess (Self : Instance)
  return C_Things.Class_Access;

   function Get_E_Instance (Self : Instance)
  return access E_Things.Instance'Class;
   function Get_E_Class_Acccess (Self : Instance)
  return E_Things.Class_Access;

end D_Things;
---
limited with E_Things;
limited with F_Things;

package E_Things is

   type Instance is tagged null record;
   type Class_Access is access all Instance'Class;

   procedure Do_Something (Self : Instance);

   function Get_D_Instance (Self : Instance)
  return access D_Things.Instance'Class;
   function Get_D_Class_Acccess (Self : Instance)
  return D_Things.Class_Access;

   function Get_F_Instance (Self : Instance)
  return access F_Things.Instance'Class;
   function Get_F_Class_Acccess (Self : Instance)
  return F_Things.Class_Access;

end E_Things;
---
limited with E_Things;

package F_Things is

   type Instance is tagged private;
   type Class_Access is access all Instance'Class;

   procedure Do_Something (Self : Instance);

   function Get_E_Instance (Self : Instance)
  return access E_Things.Instance'Class is (null);
   function Get_E_Class_Acccess (Self : Instance)
  return E_Things.Class_Access is (null);

private

   type Instance is tagged record
  X : Integer;
  Y : Integer;
  Z : Integer;
   end record;

end F_Things;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Ed Schonberg  

* lib-writ.adb (Write_ALI): Do not record in ali file units
that are present in the files table but not analyzed. These
units are present because they appear in the context of units
named in limited_with clauses, and the unit being compiled does
not depend semantically on them.

Index: lib-writ.adb
===
--- lib-writ.adb(revision 235192)
+++ lib-writ.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -155,8 +155,9 @@
 OA_Setting=> 'O',
 SPARK_Mode_Pragma => Empty);
 
-  --  Parse system.ads so that the checksum is set right
-  --  Style checks are not applied.
+  --  Pars

[PATCH] Fix PR70747

2016-04-21 Thread Richard Biener

I am testing the following (obvious) patch to fix PR70747.

Richard.

2016-04-21  Richard Biener  

PR middle-end/70747
* fold-const.c (fold_comparison): Return properly typed
constant boolean.

* gcc.dg/pr70747.c: New testcase.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 235305)
--- gcc/fold-const.c(working copy)
*** fold_comparison (location_t loc, enum tr
*** 8676,8686 
case EQ_EXPR:
case LE_EXPR:
case LT_EXPR:
! return boolean_false_node;
case GE_EXPR:
case GT_EXPR:
case NE_EXPR:
! return boolean_true_node;
default:
  gcc_unreachable ();
}
--- 8686,8696 
case EQ_EXPR:
case LE_EXPR:
case LT_EXPR:
! return constant_boolean_node (false, type);
case GE_EXPR:
case GT_EXPR:
case NE_EXPR:
! return constant_boolean_node (true, type);
default:
  gcc_unreachable ();
}
Index: gcc/testsuite/gcc.dg/pr70747.c
===
*** gcc/testsuite/gcc.dg/pr70747.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr70747.c  (working copy)
***
*** 0 
--- 1,10 
+ /* { dg-do compile } */
+ /* { dg-options "-pedantic" } */
+ 
+ int *a, b;
+ 
+ void fn1 ()
+ {
+   a = __builtin_malloc (sizeof(int)*2); 
+   b = &a[1] == (0, 0); /* { dg-warning "comparison between pointer and integer" } */
+ }


[Ada] Incorrect result of equality on multidimensional packed arrays

2016-04-21 Thread Arnaud Charlet
This patch corrects an oversight in the computation of the size of
multidimensional packed arrays.  Prior to this patch, only the first dimension
was used to determine the number of storage units to compare.
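
The arithmetic behind the fix can be sketched in C. This is only an illustration, not GNAT code: the helper names (component_count, storage_units, linear_index) are hypothetical, and it assumes 8-bit storage units and row-major layout.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GNAT: total number of components of
   an array, i.e. the product of the lengths of all its dimensions.  */
size_t
component_count (const size_t *dim_len, size_t n_dims)
{
  size_t count = 1;
  for (size_t i = 0; i < n_dims; i++)
    count *= dim_len[i];
  return count;
}

/* Storage units needed for COUNT components of COMPONENT_BITS bits each,
   rounded up.  Assumes 8-bit storage units.  */
size_t
storage_units (size_t count, size_t component_bits)
{
  return (count * component_bits + 7) / 8;
}

/* Zero-based linear position of element (I, J) in a two-dimensional
   array with COLS columns.  I and J are 1-based, as in the Ada test
   below; row-major layout is an assumption.  */
size_t
linear_index (size_t i, size_t j, size_t cols)
{
  return (i - 1) * cols + (j - 1);
}
```

For a 16 x 16 array of 1-bit components, the full product gives 256 components (32 storage units), while the first dimension alone gives 16 components (2 storage units). Under the layout assumed here, element (2,1) sits at linear position 16, so a comparison limited to the first 2 storage units would never look at it.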

Executing

   gnatmake -q equality.adb
   equality

must yield

   Success - comparison claims these are different

---
with ADA.TEXT_IO;

procedure EQUALITY is

   type FLAG_TYPE is (RED, GREEN);
   for FLAG_TYPE'size use 1;
   
   type TWO_DIM_ARRAY_TYPE is array 
   (INTEGER range 1 .. 16, INTEGER range 1 .. 16) of FLAG_TYPE;
   pragma PACK(TWO_DIM_ARRAY_TYPE);
   
   ARR_1 : TWO_DIM_ARRAY_TYPE := (others => (others => RED));
   ARR_2 : TWO_DIM_ARRAY_TYPE := (others => (others => RED));

begin

   ARR_2(2,1) := GREEN;-- Make the two arrays different.

   if ARR_1 /= ARR_2
   then
  ADA.TEXT_IO.PUT_LINE("Success - comparison claims these are different");
   else
  ADA.TEXT_IO.PUT_LINE("Failure - comparison claims these are identical"); 
   end if;

end EQUALITY;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Ed Schonberg  

* exp_pakd.adb (Compute_Number_Components): New function to
build an expression that computes the number of components of
an array that may be multidimensional.
(Expand_Packed_Eq): Use it.

Index: exp_pakd.adb
===
--- exp_pakd.adb(revision 235192)
+++ exp_pakd.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -81,6 +81,12 @@
-- Local Subprograms --
---
 
+   function Compute_Number_Components
+  (N   : Node_Id;
+   Typ : Entity_Id) return Node_Id;
+   --  Build an expression that multiplies the length of the dimensions of the
+   --  array, used to control array equality checks.
+
procedure Compute_Linear_Subscript
  (Atyp   : Entity_Id;
   N  : Node_Id;
@@ -260,6 +266,38 @@
   return Adjusted;
end Revert_Storage_Order;
 
+   ---
+   -- Compute_Number_Components --
+   ---
+
+   function Compute_Number_Components
+  (N   : Node_Id;
+   Typ : Entity_Id) return Node_Id
+   is
+  Loc  : constant Source_Ptr := Sloc (N);
+  Len_Expr : Node_Id;
+
+   begin
+  Len_Expr :=
+Make_Attribute_Reference (Loc,
+  Attribute_Name => Name_Length,
+  Prefix => New_Occurrence_Of (Typ, Loc),
+  Expressions=> New_List (Make_Integer_Literal (Loc, 1)));
+
+  for J in 2 .. Number_Dimensions (Typ) loop
+ Len_Expr :=
+   Make_Op_Multiply (Loc,
+ Left_Opnd  => Len_Expr,
+ Right_Opnd =>
+   Make_Attribute_Reference (Loc,
+Attribute_Name => Name_Length,
+Prefix => New_Occurrence_Of (Typ, Loc),
+Expressions=> New_List (Make_Integer_Literal (Loc, J))));
+  end loop;
+
+  return Len_Expr;
+   end Compute_Number_Components;
+
--
-- Compute_Linear_Subscript --
--
@@ -451,7 +489,6 @@
   PASize   : Uint;
   Decl : Node_Id;
   PAT  : Entity_Id;
-  Len_Dim  : Node_Id;
   Len_Expr : Node_Id;
   Len_Bits : Uint;
   Bits_U1  : Node_Id;
@@ -811,35 +848,8 @@
  --  Build an expression for the length of the array in bits.
  --  This is the product of the length of each of the dimensions
 
- declare
-J : Nat := 1;
+ Len_Expr := Compute_Number_Components (Typ, Typ);
 
- begin
-Len_Expr := Empty; -- suppress junk warning
-
-loop
-   Len_Dim :=
- Make_Attribute_Reference (Loc,
-   Attribute_Name => Name_Length,
-   Prefix => New_Occurrence_Of (Typ, Loc),
-   Expressions=> New_List (
- Make_Integer_Literal (Loc, J)));
-
-   if J = 1 then
-  Len_Expr := Len_Dim;
-
-   else
-  Len_Expr :=
-Make_Op_Multiply (Loc,
-  Left_Opnd  => Len_Expr,
-  Right_Opnd => Len_Dim);
-   end if;
-
-   J := J + 1;
-   exit when J > Number_Dimensions (Typ)

[Ada] Detection of missing abstract state refinement

2016-04-21 Thread Arnaud Charlet
This patch implements a mechanism which detects missing refinement of abstract
states depending on whether a package requires a completing body or not. The
patch also cleans up the two entity lists used to store refinement and Part_Of
constituents of abstract states.


-- Source --


--  lib_pack_1.ads

package Lib_Pack_1
  with SPARK_Mode,
   Abstract_State => Error_State_1   --  Error
is
   package Nested_1
 with Abstract_State => Error_State_2--  Error
   is
   end Nested_1;
end Lib_Pack_1;

--  lib_pack_2.ads

package Lib_Pack_2
  with SPARK_Mode,
   Abstract_State => OK_1
is
   package Nested_1
 with Abstract_State => Error_1  --  Error
   is
   end Nested_1;

   package Nested_2
 with Abstract_State => OK_2
   is
   end Nested_2;

   package Nested_3
 with Abstract_State => Error_2  --  Error
   is
   end Nested_3;

   procedure Force_Body;
end Lib_Pack_2;

--  lib_pack_2.adb

package body Lib_Pack_2
   with SPARK_Mode,
Refined_State => (OK_1 => null)
is
   package body Nested_1 is
   end Nested_1;

   package body Nested_2
 with Refined_State => (OK_2 => null)
   is
   end Nested_2;

   --  package body Nested_3 is missing

   procedure Force_Body is begin null; end Force_Body;
end Lib_Pack_2;

--  non_lib_pack.ads

package Non_Lib_Pack with SPARK_Mode is
   procedure Force_Body;
end Non_Lib_Pack;

--  non_lib_pack.adb

package body Non_Lib_Pack with SPARK_Mode is
   procedure Force_Body is
  package Nested_1
with Abstract_State => Error_1   --  Error
  is
  end Nested_1;

  package body Nested_1 is
  end Nested_1;

  package Nested_2
with Abstract_State => OK_1
  is
  end Nested_2;

  package body Nested_2
with Refined_State => (OK_1 => null) --  OK
  is
  end Nested_2;

  package Nested_3
with Abstract_State => Error_2   --  Error
  is
  end Nested_3;

  --  package body Nested_3 is missing
   begin
  null;
   end Force_Body;
end Non_Lib_Pack;


-- Compilation and output --


$ gcc -c lib_pack_1.ads
$ gcc -c lib_pack_2.adb
$ gcc -c non_lib_pack.adb
lib_pack_1.ads:3:26: state "Error_State_1" requires refinement
lib_pack_1.ads:6:29: state "Error_State_2" requires refinement
lib_pack_2.ads:6:29: state "Error_1" requires refinement
lib_pack_2.ads:16:29: state "Error_2" requires refinement
non_lib_pack.adb:4:32: state "Error_1" requires refinement
non_lib_pack.adb:22:32: state "Error_2" requires refinement

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-21  Hristian Kirtchev  

* contracts.adb (Analyze_Package_Body_Contract): Do not check
for a missing package refinement because 1) packages do not have
"refinement" and 2) the check for proper state refinement is
performed in a different place.
* einfo.adb (Has_Non_Null_Visible_Refinement): Reimplemented.
(Has_Null_Visible_Refinement): Reimplemented.
* sem_ch3.adb (Analyze_Declarations): Determine whether all
abstract states have received a refinement and if not, emit
errors.
* sem_ch7.adb (Analyze_Package_Declaration): Code
cleanup. Determine whether all abstract states of the
package and any nested packages have received a refinement
and if not, emit errors.
(Requires_Completion_In_Body): Add new formal parameter
Do_Abstract_States. Update the comment on usage. Propagate the
Do_Abstract_States flag to all Unit_Requires_Body calls.
(Unit_Requires_Body): Remove formal
parameter Ignore_Abstract_States. Add new formal parameter
Do_Abstract_States. Propagate the Do_Abstract_States flag to
all Requires_Completion_In calls.
* sem_ch7.ads (Unit_Requires_Body): Remove formal
parameter Ignore_Abstract_States. Add new formal parameter
Do_Abstract_States. Update the comment on usage.
* sem_ch9.adb (Analyze_Single_Protected_Declaration): Do
not initialize the constituent list as this is now done on a
need-to-add-element basis.
(Analyze_Single_Task_Declaration):
Do not initialize the constituent list as this is now done on
a need-to-add-element basis.
* sem_ch10.adb (Decorate_State): Do not initialize the constituent
lists as this is now done on a need-to-add-element basis.
* sem_prag.adb (Analyze_Constituent): Set the
refinement constituents when adding a new element.
(Analyze_Part_Of_In_Decl_Part): Set the Part_Of constituents when
adding a new element.
(Analyze_Part_Of_Option): Set the Part_Of
constituents when adding a new element.
(Analyze_Pragma):

Re: [PATCH] vrp: remove rendundant has_single_use tests

2016-04-21 Thread Richard Biener
On Thu, Apr 21, 2016 at 12:45 AM, Patrick Palka  wrote:
> During assert-location discovery, if an SSA name is live according to
> live_on_edge() on some outgoing edge E, then the SSA name definitely has
> at least two uses: the use on the outgoing edge, and the use in some BB
> dominating E->src from which the SSA_NAME and the potential assertion
> was discovered.  These two uses can't be the same because the liveness
> array is populated on-the-fly in reverse postorder so the latter use
> which dominates BB couldn't have yet contributed to the liveness bitmap.
>
> So AFAICT it's not necessary to check live_on_edge() as well as
> !has_single_use() since the former check will imply the latter.  So this
> patch removes these redundant calls to has_single_use() (and also
> replaces the use of has_single_use() in find_assert_locations_1 with a
> liveness bitmap test which should be cheaper and more accurate).
>
> I bootstrapped and regtested this change on x86_64-pc-linux-gnu.  I also
> confirmed that the number of calls made to register_new_assert_for
> before and after the patch remains the same during compilation of
> libstdc++ and during compilation of gimple-match.c and when running the
> tree-ssa.exp testsuite.  Does this look OK to commit?

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vrp.c (register_edge_assert_for_2): Remove redundant
> has_single_use() tests.
> (register_edge_assert_for_1): Likewise.
> (find_assert_locations_1): Check the liveness bitmap instead of
> calling has_single_use().
> ---
>  gcc/tree-vrp.c | 29 ++---
>  1 file changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index bbdf9ce..3cb470b 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -5145,8 +5145,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>
>/* Only register an ASSERT_EXPR if NAME was found in the sub-graph
>   reachable from E.  */
> -  if (live_on_edge (e, name)
> -  && !has_single_use (name))
> +  if (live_on_edge (e, name))
>  register_new_assert_for (name, name, comp_code, val, NULL, e, bsi);
>
>/* In the case of NAME <= CST and NAME being defined as
> @@ -5188,8 +5187,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>   && (cst2 == NULL_TREE
>   || TREE_CODE (cst2) == INTEGER_CST)
>   && INTEGRAL_TYPE_P (TREE_TYPE (name3))
> - && live_on_edge (e, name3)
> - && !has_single_use (name3))
> + && live_on_edge (e, name3))
> {
>   tree tmp;
>
> @@ -5215,8 +5213,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>   && TREE_CODE (name2) == SSA_NAME
>   && TREE_CODE (cst2) == INTEGER_CST
>   && INTEGRAL_TYPE_P (TREE_TYPE (name2))
> - && live_on_edge (e, name2)
> - && !has_single_use (name2))
> + && live_on_edge (e, name2))
> {
>   tree tmp;
>
> @@ -5319,8 +5316,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>   tree op1 = gimple_assign_rhs2 (def_stmt);
>   if (TREE_CODE (op0) == SSA_NAME
>   && TREE_CODE (op1) == INTEGER_CST
> - && live_on_edge (e, op0)
> - && !has_single_use (op0))
> + && live_on_edge (e, op0))
> {
>   enum tree_code reverse_op = (rhs_code == PLUS_EXPR
>? MINUS_EXPR : PLUS_EXPR);
> @@ -5346,8 +5342,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>   && (comp_code == LE_EXPR || comp_code == GT_EXPR
>   || !tree_int_cst_equal (val,
>   TYPE_MIN_VALUE (TREE_TYPE (val))))
> - && live_on_edge (e, name2)
> - && !has_single_use (name2))
> + && live_on_edge (e, name2))
> {
>   tree tmp, cst;
>   enum tree_code new_comp_code = comp_code;
> @@ -5392,8 +5387,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>   && INTEGRAL_TYPE_P (TREE_TYPE (name2))
>   && IN_RANGE (tree_to_uhwi (cst2), 1, prec - 1)
>   && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val)))
> - && live_on_edge (e, name2)
> - && !has_single_use (name2))
> + && live_on_edge (e, name2))
> {
>   mask = wi::mask (tree_to_uhwi (cst2), false, prec);
>   val2 = fold_binary (LSHIFT_EXPR, TREE_TYPE (val), val, cst2);
> @@ -5498,12 +5492,10 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi,
>   || !INTEGRAL_TYPE_P (TREE_TYPE (names[1]))
>   || (TYPE_PRECISION (TREE_TYPE (name2))
>   != TYPE_PRECISION (TREE_TYPE (names[1])))
> -

Re: [PATCH] opts-global.c: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.

2016-04-21 Thread Richard Biener
On Thu, Apr 21, 2016 at 4:23 AM, Khem Raj  wrote:
> gcc/:
> 2016-04-16  Khem Raj  
>
> * opts-global.c: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.
>
> Fixes build errors e.g.
>
> | ../../../../../../../work-shared/gcc-6.0.0-r0/git/gcc/lto-streamer.h:159:34: error: 'LAST_AND_UNUSED_GIMPLE_CODE' was not declared in this scope
> |LTO_bb0 = 1 + MAX_TREE_CODES + LAST_AND_UNUSED_GIMPLE_CODE,
> ---
>  gcc/opts-global.c | 1 +
>  1 file changed, 1 insertion(+)

I don't see build errors here.  How do you get them?

Richard.

> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
> index 989ef3d..92fb9ac 100644
> --- a/gcc/opts-global.c
> +++ b/gcc/opts-global.c
> @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "plugin-api.h"
>  #include "ipa-ref.h"
>  #include "cgraph.h"
> +#include "gimple.h"
>  #include "lto-streamer.h"
>  #include "output.h"
>  #include "plugin.h"
> --
> 2.8.0
>


Re: [PATCH] vrp: remove rendundant has_single_use tests

2016-04-21 Thread Richard Biener
On Thu, Apr 21, 2016 at 2:47 AM, Patrick Palka  wrote:
> On Wed, Apr 20, 2016 at 6:45 PM, Patrick Palka  wrote:
>> During assert-location discovery, if an SSA name is live according to
>> live_on_edge() on some outgoing edge E, then the SSA name definitely has
>> at least two uses: the use on the outgoing edge, and the use in some BB
>> dominating E->src from which the SSA_NAME and the potential assertion
>> was discovered.  These two uses can't be the same because the liveness
>> array is populated on-the-fly in reverse postorder so the latter use
>> which dominates BB couldn't have yet contributed to the liveness bitmap.
>>
>> So AFAICT it's not necessary to check live_on_edge() as well as
>> !has_single_use() since the former check will imply the latter.  So this
>> patch removes these redundant calls to has_single_use() (and also
>> replaces the use of has_single_use() in find_assert_locations_1 with a
>> liveness bitmap test which should be cheaper and more accurate).
>>
>> I bootstrapped and regtested this change on x86_64-pc-linux-gnu.  I also
>> confirmed that the number of calls made to register_new_assert_for
>> before and after the patch remains the same during compilation of
>> libstdc++ and during compilation of gimple-match.c and when running the
>> tree-ssa.exp testsuite.  Does this look OK to commit?
>>
>> gcc/ChangeLog:
>>
>> * tree-vrp.c (register_edge_assert_for_2): Remove redundant
>> has_single_use() tests.
>> (register_edge_assert_for_1): Likewise.
>> (find_assert_locations_1): Check the liveness bitmap instead of
>> calling has_single_use().
>
> By the way, would it be reasonable to cache/precompute the number of
> non-debug uses each ssa name has so that has_single_use, has_zero_uses
> etc are much cheaper?

Not sure whether that's a good idea (think of the need to keep the counts
up to date, plus the storage required for them).  Maybe keep the immediate
use list ordered as { real uses, debug uses } instead?
(That is, insert at the head or at the tail depending on the kind of use stmt.)

Richard.
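
The ordering idea suggested above can be sketched outside of GCC as follows. This is a minimal illustration under stated assumptions, not GCC's immediate-use implementation: the types use_node and use_list and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a use and for the list of uses of one name.  */
struct use_node
{
  struct use_node *prev, *next;
  bool is_debug;		/* true for a use in a debug stmt */
};

struct use_list
{
  struct use_node *head, *tail;
};

/* Insert U so that all real uses precede all debug uses: real uses go
   to the head, debug uses to the tail.  */
void
use_list_insert (struct use_list *l, struct use_node *u)
{
  u->prev = u->next = NULL;
  if (l->head == NULL)
    l->head = l->tail = u;
  else if (!u->is_debug)
    {
      u->next = l->head;
      l->head->prev = u;
      l->head = u;
    }
  else
    {
      u->prev = l->tail;
      l->tail->next = u;
      l->tail = u;
    }
}

/* With the ordering invariant, only the first node needs inspecting.  */
bool
has_zero_real_uses (const struct use_list *l)
{
  return l->head == NULL || l->head->is_debug;
}

/* Likewise, only the first two nodes matter here.  */
bool
has_single_real_use (const struct use_list *l)
{
  const struct use_node *first = l->head;
  if (first == NULL || first->is_debug)
    return false;
  return first->next == NULL || first->next->is_debug;
}
```

In this sketch both queries run in constant time without a cached counter, at the price of having to choose head or tail on every insertion.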


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 9:42 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 9:37 AM, Uros Bizjak  wrote:
>> On Wed, Apr 20, 2016 at 9:53 PM, H.J. Lu  wrote:
>>> Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
>>> standard AVX2 constants and all 1s in XImode is standard AVX512F constants,
>>> pass mode to standard_sse_constant_p and standard_sse_constant_opcode
>>> to check if all 1s is available for target.
>>>
>>> Tested on Linux/x86-64.  OK for master?
>>
>> No.
>>
>> This patch should use "isa" attribute instead of adding even more
>> similar patterns. Also, please leave MEM_P checks, the rare C->m move
>> can be easily resolved by IRA.
>
> Actually, register_operand checks are indeed better, please disregard
> MEM_P recommendation.

So, something like attached untested RFC proto-patch, that lacks
wide-int handling.
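
As a rough standalone illustration (not part of the patch), the all-ones selection in the proto-patch's "case 2" can be read as a lookup keyed on operand size and available ISA. The ISA_* flags and the function all_ones_mnemonic are hypothetical names; the mnemonics are the ones that appear in the proto-patch, and where the proto-patch asserts, this sketch returns NULL.

```c
#include <stddef.h>

/* Hypothetical feature flags standing in for TARGET_SSE2 and friends.  */
enum
{
  ISA_SSE2 = 1,
  ISA_AVX2 = 2,
  ISA_AVX512F = 4,
  ISA_AVX512VL = 8
};

/* Return the mnemonic used to materialize an all-ones constant of
   SIZE_BYTES bytes, or NULL if the given ISA set cannot do it.  */
const char *
all_ones_mnemonic (int size_bytes, unsigned isa)
{
  switch (size_bytes)
    {
    case 64:
      /* 512-bit operands need AVX512F.  */
      return (isa & ISA_AVX512F) ? "vpternlogd" : NULL;
    case 32:
      /* 256-bit operands need AVX2.  */
      if (!(isa & ISA_AVX2))
	return NULL;
      return (isa & ISA_AVX512VL) ? "vpternlogd" : "vpcmpeqd";
    case 16:
      /* 128-bit operands need SSE2.  */
      if (!(isa & ISA_SSE2))
	return NULL;
      return (isa & ISA_AVX512VL) ? "vpternlogd" : "pcmpeqd";
    default:
      return NULL;
    }
}
```

This only mirrors the shape of the switch; operand modifiers and the AT&T/Intel syntax alternatives of the real templates are left out.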

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0687701..572f5bf 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10777,7 +10777,23 @@ standard_sse_constant_p (rtx x)
   
   if (x == const0_rtx || x == CONST0_RTX (mode))
 return 1;
-  if (vector_all_ones_operand (x, mode))
+
+  else if (CONST_INT_P (x))
+{
+  if (INTVAL (x) == HOST_WIDE_INT_M1
+ && TARGET_SSE2)
+   return 2;
+}
+  else if (CONST_WIDE_INT_P (x))
+{
+  if ( something involving wi::minus-one 
+ && TARGET_AVX2)
+   return 2;
+  if (
+ && TARGET_AVX512F)
+   return 2;
+}
+  else if (vector_all_ones_operand (x, mode))
 switch (mode)
   {
   case V16QImode:
@@ -10811,53 +10827,70 @@ standard_sse_constant_p (rtx x)
 const char *
 standard_sse_constant_opcode (rtx_insn *insn, rtx x)
 {
+  machine_mode insn_mode = get_attr_mode (insn);
+
   switch (standard_sse_constant_p (x))
 {
 case 1:
-  switch (get_attr_mode (insn))
+  switch (insn_mode)
{
case MODE_XI:
  return "vpxord\t%g0, %g0, %g0";
-   case MODE_V16SF:
- return TARGET_AVX512DQ ? "vxorps\t%g0, %g0, %g0"
-: "vpxord\t%g0, %g0, %g0";
-   case MODE_V8DF:
- return TARGET_AVX512DQ ? "vxorpd\t%g0, %g0, %g0"
-: "vpxorq\t%g0, %g0, %g0";
+   case MODE_OI:
+ return (TARGET_AVX512VL
+ ? "vpxord\t%x0, %x0, %x0"
+ : "vpxor\t%x0, %x0, %x0");
case MODE_TI:
- return TARGET_AVX512VL ? "vpxord\t%t0, %t0, %t0"
-: "%vpxor\t%0, %d0";
-   case MODE_V2DF:
- return "%vxorpd\t%0, %d0";
-   case MODE_V4SF:
- return "%vxorps\t%0, %d0";
+ return (TARGET_AVX512VL
+ ? "vpxord\t%t0, %t0, %t0"
+ : "%vpxor\t%0, %d0");
 
-   case MODE_OI:
- return TARGET_AVX512VL ? "vpxord\t%x0, %x0, %x0"
-: "vpxor\t%x0, %x0, %x0";
+   case MODE_V8DF:
+ return (TARGET_AVX512DQ
+ ? "vxorpd\t%g0, %g0, %g0"
+ : "vpxorq\t%g0, %g0, %g0");
case MODE_V4DF:
  return "vxorpd\t%x0, %x0, %x0";
+   case MODE_V2DF:
+ return "%vxorpd\t%0, %d0";
+
+   case MODE_V16SF:
+ return (TARGET_AVX512DQ
+ ? "vxorps\t%g0, %g0, %g0"
+ : "vpxord\t%g0, %g0, %g0");
case MODE_V8SF:
  return "vxorps\t%x0, %x0, %x0";
+   case MODE_V4SF:
+ return "%vxorps\t%0, %d0";
 
default:
  break;
}
 
 case 2:
-  if (TARGET_AVX512VL
- || get_attr_mode (insn) == MODE_XI
- || get_attr_mode (insn) == MODE_V8DF
- || get_attr_mode (insn) == MODE_V16SF)
-   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
-  if (TARGET_AVX)
-   return "vpcmpeqd\t%0, %0, %0";
-  else
-   return "pcmpeqd\t%0, %0";
+  switch (GET_MODE_SIZE (insn_mode))
+   {
+   case 64:
+ gcc_assert (TARGET_AVX512F);
+ return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
+   case 32:
+ gcc_assert (TARGET_AVX2);
+ return (TARGET_AVX512VL
+ ? "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}"
+ : "vpcmpeqd\t%0, %0, %0");
+   case 16:
+ gcc_assert (TARGET_SSE2);
+ return (TARGET_AVX512VL
+ ? "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}"
+ : "pcmpeqd\t%0, %0");
+   default:
+ break;
+   }
 
 default:
   break;
 }
+
   gcc_unreachable ();
 }
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 38eb98c..3337968 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1970,9 +1970,11 @@
(set_attr "length_immediate" "1")])
 
 (define_insn "*movxi_internal_avx512f"
-  [(set (match_operand:XI 0 "nonimmediate_operand" "=v,v ,m")
-   (match_operand:XI 1 "vector_move_operand"  "C ,vm,v

match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-21 Thread Marc Glisse

Hello,

another simple transformation.

Instead of the ":s", I had single_use (@2) || single_use (@3), but changed 
it for simplicity. There may be some patterns in match.pd where we want 
something like that though, as requiring single_use on many expressions 
may be stricter than we need.


We could generalize to cases where overflow is not undefined if we know 
(VRP) that the variables are not TYPE_MIN_VALUE, but that didn't look like 
a priority.
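
A quick way to convince oneself of the identities is a brute-force check in plain C. This is only a sketch with hypothetical helper names, not the testcase from the patch. It avoids INT_MIN for the negation case, since -INT_MIN overflows, which is why the pattern is restricted to types with undefined overflow.

```c
#include <limits.h>

static int min_int (int a, int b) { return a < b ? a : b; }
static int max_int (int a, int b) { return a < b ? b : a; }

/* Left and right sides of min (-x, -y) -> -max (x, y).  */
int min_of_negations (int x, int y) { return min_int (-x, -y); }
int negated_max (int x, int y) { return -max_int (x, y); }

/* Left and right sides of min (~x, ~y) -> ~max (x, y).  */
int min_of_complements (int x, int y) { return min_int (~x, ~y); }
int complemented_max (int x, int y) { return ~max_int (x, y); }

/* Return 1 if both identities hold for every pair in [LO, HI].
   INT_MIN is rejected because negating it overflows; INT_MAX is
   rejected so that the loop counters cannot overflow.  */
int
identities_hold (int lo, int hi)
{
  if (lo == INT_MIN || hi == INT_MAX)
    return 0;
  for (int x = lo; x <= hi; x++)
    for (int y = lo; y <= hi; y++)
      if (min_of_negations (x, y) != negated_max (x, y)
	  || min_of_complements (x, y) != complemented_max (x, y))
	return 0;
  return 1;
}
```

The bit_not form needs no overflow condition because ~x is defined for every value of x, which matches the unconditional second pattern in the patch.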


Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2016-04-21  Marc Glisse  

gcc/
* match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-2.c: New testcase.


--
Marc Glisse
Index: gcc/match.pd
===
--- gcc/match.pd(revision 235292)
+++ gcc/match.pd(working copy)
@@ -1215,20 +1215,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
MIN and MAX don't honor that, so only transform if -ffinite-math-only
is set.  C99 doesn't require -0.0 to be handled, so we don't have to
worry about it either.  */
 (if (flag_finite_math_only)
  (simplify
   (FMIN @0 @1)
   (min @0 @1))
  (simplify
   (FMAX @0 @1)
   (max @0 @1)))
+/* min (-A, -B) -> -max (A, B)  */
+(for minmax (min max FMIN FMAX)
+ maxmin (max min FMAX FMIN)
+ (simplify
+  (minmax (negate:s@2 @0) (negate:s@3 @1))
+  (if (FLOAT_TYPE_P (TREE_TYPE (@0))
+   || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))))
+   (negate (maxmin @0 @1)))))
+/* MIN (~X, ~Y) -> ~MAX (X, Y)
+   MAX (~X, ~Y) -> ~MIN (X, Y)  */
+(for minmax (min max)
+ maxmin (max min)
+ (simplify
+  (minmax (bit_not:s@2 @0) (bit_not:s@3 @1))
+  (bit_not (maxmin @0 @1))))
 
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)
  (simplify
   (rotate integer_all_onesp@0 @1)
   @0))
 
 /* Optimize -1 >> x for arithmetic right shifts.  */
 (simplify
Index: gcc/testsuite/gcc.dg/tree-ssa/minmax-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/minmax-2.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/minmax-2.c(working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fstrict-overflow -fdump-tree-optimized" } */
+
+static int max(int a,int b){return (a

Re: [PATCH] opts-global.c: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.

2016-04-21 Thread Alexander Monakov
On Wed, 20 Apr 2016, Khem Raj wrote:

> gcc/:
> 2016-04-16  Khem Raj  
> 
>   * opts-global.c: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.
> 
> Fixes build errors e.g.
> 
> | ../../../../../../../work-shared/gcc-6.0.0-r0/git/gcc/lto-streamer.h:159:34: error: 'LAST_AND_UNUSED_GIMPLE_CODE' was not declared in this scope
> |LTO_bb0 = 1 + MAX_TREE_CODES + LAST_AND_UNUSED_GIMPLE_CODE,
> ---
>  gcc/opts-global.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
> index 989ef3d..92fb9ac 100644
> --- a/gcc/opts-global.c
> +++ b/gcc/opts-global.c
> @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "plugin-api.h"
>  #include "ipa-ref.h"
>  #include "cgraph.h"
> +#include "gimple.h"
>  #include "lto-streamer.h"
>  #include "output.h"
>  #include "plugin.h"

The context in this patch looks like old contents of opts-global.c, prior to
Andrew MacLeod's cleanups in December 2015.  Here's how the includes look in
today's gcc-6 branch:

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "tree.h"
#include "tree-pass.h"
#include "diagnostic.h"
#include "opts.h"
#include "flags.h"
#include "langhooks.h"
#include "dbgcnt.h"
#include "debug.h"
#include "output.h"
#include "plugin.h"
#include "toplev.h"
#include "context.h"
#include "asan.h"

Alexander


Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-21 Thread Richard Biener
On Thu, Apr 21, 2016 at 12:32 PM, Marc Glisse  wrote:
> Hello,
>
> another simple transformation.
>
> Instead of the ":s", I had single_use (@2) || single_use (@3), but changed
> it for simplicity. There may be some patterns in match.pd where we want
> something like that though, as requiring single_use on many expressions may
> be stricter than we need.
>
> We could generalize to cases where overflow is not undefined if we know
> (VRP) that the variables are not TYPE_MIN_VALUE, but that didn't look like a
> priority.
>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

Ok.  I thought about using negate_expr_p but min(-x,5) -> -max(x, -5) doesn't
look like an obvious win.

Thanks,
Richard.

> 2016-04-21  Marc Glisse  
>
> gcc/
> * match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
> New transformations.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/minmax-2.c: New testcase.
>
>
> --
> Marc Glisse
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 235292)
> +++ gcc/match.pd(working copy)
> @@ -1215,20 +1215,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> MIN and MAX don't honor that, so only transform if -ffinite-math-only
> is set.  C99 doesn't require -0.0 to be handled, so we don't have to
> worry about it either.  */
>  (if (flag_finite_math_only)
>   (simplify
>(FMIN @0 @1)
>(min @0 @1))
>   (simplify
>(FMAX @0 @1)
>(max @0 @1)))
> +/* min (-A, -B) -> -max (A, B)  */
> +(for minmax (min max FMIN FMAX)
> + maxmin (max min FMAX FMIN)
> + (simplify
> +  (minmax (negate:s@2 @0) (negate:s@3 @1))
> +  (if (FLOAT_TYPE_P (TREE_TYPE (@0))
> +   || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))))
> +   (negate (maxmin @0 @1)))))
> +/* MIN (~X, ~Y) -> ~MAX (X, Y)
> +   MAX (~X, ~Y) -> ~MIN (X, Y)  */
> +(for minmax (min max)
> + maxmin (max min)
> + (simplify
> +  (minmax (bit_not:s@2 @0) (bit_not:s@3 @1))
> +  (bit_not (maxmin @0 @1))))
>
>  /* Simplifications of shift and rotates.  */
>
>  (for rotate (lrotate rrotate)
>   (simplify
>(rotate integer_all_onesp@0 @1)
>@0))
>
>  /* Optimize -1 >> x for arithmetic right shifts.  */
>  (simplify
> Index: gcc/testsuite/gcc.dg/tree-ssa/minmax-2.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/minmax-2.c(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/minmax-2.c(working copy)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fstrict-overflow -fdump-tree-optimized" } */
> +
> +static int max(int a,int b){return (a<b)?b:a;}
> +int f(int x,int y){return max(-x,-y);}
> +int g(int x,int y){return max(~x,~y);}
> +double h(double x,double y){return __builtin_fmax(-x,-y);}
> +
> +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump "__builtin_fmin" "optimized" } } */
>
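
For readers following along, here is a minimal sketch of the two identities the new match.pd patterns rely on.  The helper names (min_int, max_int, negate_identity_holds, not_identity_holds) are made up for illustration and are not part of the patch.  The negate case assumes neither input is INT_MIN, which is why the pattern is restricted to types with undefined overflow.

```c
#include <limits.h>

/* Plain integer min/max helpers (hypothetical names, not from the patch). */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a < b ? b : a; }

/* min (-a, -b) == -max (a, b) and max (-a, -b) == -min (a, b).
   Callers must not pass INT_MIN: negating it overflows.  */
static int negate_identity_holds(int a, int b)
{
  return min_int(-a, -b) == -max_int(a, b)
         && max_int(-a, -b) == -min_int(a, b);
}

/* min (~a, ~b) == ~max (a, b) and max (~a, ~b) == ~min (a, b).
   Bitwise not never overflows, so every int value is allowed.  */
static int not_identity_holds(int a, int b)
{
  return min_int(~a, ~b) == ~max_int(a, b)
         && max_int(~a, ~b) == ~min_int(a, b);
}
```

Both transformations work because negation and bitwise not reverse the ordering of their inputs, so the smaller of the transformed values comes from the larger of the originals.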


[PATCH] Fix PR70740

2016-04-21 Thread Richard Biener

I am testing the following patch to fix PR70740.

Bootstrap/regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-04-21  Richard Biener  

PR tree-optimization/70740
* tree-ssa-phiprop.c (propagate_with_phi): Handle inserted
VDEF.

* gcc.dg/torture/pr70740.c: New testcase.

Index: gcc/tree-ssa-phiprop.c
===
*** gcc/tree-ssa-phiprop.c  (revision 235305)
--- gcc/tree-ssa-phiprop.c  (working copy)
*** propagate_with_phi (basic_block bb, gphi
*** 327,339 
continue;
  
/* Check if we can move the loads.  The def stmt of the virtual use
!needs to be in a different basic block dominating bb.  */
vuse = gimple_vuse (use_stmt);
def_stmt = SSA_NAME_DEF_STMT (vuse);
if (!SSA_NAME_IS_DEFAULT_DEF (vuse)
  && (gimple_bb (def_stmt) == bb
! || !dominated_by_p (CDI_DOMINATORS,
! bb, gimple_bb (def_stmt))))
goto next;
  
/* Found a proper dereference with an aggregate copy.  Just
--- 327,341 
continue;
  
/* Check if we can move the loads.  The def stmt of the virtual use
!needs to be in a different basic block dominating bb.  When the
!def is an edge-inserted one we know it dominates us.  */
vuse = gimple_vuse (use_stmt);
def_stmt = SSA_NAME_DEF_STMT (vuse);
if (!SSA_NAME_IS_DEFAULT_DEF (vuse)
  && (gimple_bb (def_stmt) == bb
! || (gimple_bb (def_stmt)
! && !dominated_by_p (CDI_DOMINATORS,
! bb, gimple_bb (def_stmt)))))
goto next;
  
/* Found a proper dereference with an aggregate copy.  Just
Index: gcc/testsuite/gcc.dg/torture/pr70740.c
===
*** gcc/testsuite/gcc.dg/torture/pr70740.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70740.c  (working copy)
***
*** 0 
--- 1,38 
+ /* { dg-do compile } */
+ 
+ extern int foo (void);
+ extern void *memcpy (void *, const void *, __SIZE_TYPE__);
+ 
+ struct
+ {
+   char a[6];
+ } d;
+ struct
+ {
+   int a1[0];
+   int a2[0];
+   int a3[0];
+   int a4[];
+ } a, c;
+ int b;
+ 
+ int *
+ bar ()
+ {
+   if (b)
+ return a.a4;
+   return a.a2;
+ }
+ 
+ void
+ baz ()
+ {
+   int *e, *f;
+   if (foo ())
+ e = c.a3;
+   else
+ e = c.a1;
+   memcpy (d.a, e, 6);
+   f = bar ();
+   memcpy (d.a, f, 1);
+ }


Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-04-21 Thread kugan

Hi Richard,

On 19/04/16 22:11, Richard Biener wrote:

On Tue, Apr 19, 2016 at 1:36 PM, Richard Biener
 wrote:

On Tue, Apr 19, 2016 at 1:35 PM, Richard Biener
 wrote:

On Mon, Feb 29, 2016 at 11:53 AM, kugan
 wrote:


Err.  I think the way you implement that in reassoc is ad-hoc and not
related to reassoc at all.

In fact what reassoc is missing is to handle

   -y * z * (-w) * x -> y * z * w * x

thus optimize negates as if they were additional * -1 entries in a
multiplication chain.  And
then optimize a single remaining * -1 in the result chain to a negate.

Then match.pd handles x + (-y) -> x - y (independent of -frounding-math
btw).

So no, this isn't ok as-is, IMHO you want to expand the multiplication ops
chain
pulling in the * -1 ops (if single-use, of course).
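
A minimal sketch of that idea, outside of GCC: treat every negate in a multiplication chain as an extra * -1 entry, cancel them pairwise, and keep at most one.  The struct and function names below are hypothetical and do not correspond to reassoc's data structures.

```c
#include <stddef.h>

/* Hypothetical chain entry: a value and whether it was wrapped in a negate. */
struct factor
{
  double value;
  int negated;
};

/* Strip all negates from the chain.  Returns 1 if an odd number was
   removed, i.e. a single * -1 (or one negate) must remain on the result.  */
static int fold_negates(struct factor *ops, size_t n)
{
  int odd = 0;
  for (size_t i = 0; i < n; i++)
    if (ops[i].negated)
      {
        ops[i].negated = 0;
        odd ^= 1;
      }
  return odd;
}

/* Multiply the chain out, honouring any negate flags still present.  */
static double chain_product(const struct factor *ops, size_t n)
{
  double result = 1.0;
  for (size_t i = 0; i < n; i++)
    result *= ops[i].negated ? -ops[i].value : ops[i].value;
  return result;
}
```

With the chain -y * z * (-w) * x the two negates cancel and fold_negates returns 0; with -y * z it returns 1, and the caller would emit one negate on the final product.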



I agree. Here is the updated patch along the lines of what you suggested.
Does this look better?


It looks better but I think you want to do factor_out_negate_expr before the
first qsort/optimize_ops_list call to catch -1. * z * (-w) which also means you
want to simply append a -1. to the ops list rather than adjusting the result
with a negate stmt.

You also need to guard all this with ! HONOR_SNANS (type) && (!
HONOR_SIGNED_ZEROS (type)
|| ! COMPLEX_FLOAT_TYPE_P (type)) (see match.pd pattern transforming x
* -1. to -x).


And please add at least one testcase.


And it appears to me that you could handle this in linearize_expr_tree
as well, similar
to how we handle MULT_EXPR with acceptable_pow_call there by adding -1. and
op into the ops vec.




I am not sure I understand this. I tried doing this. If I add -1 and
rhs1 for the NEGATE_EXPR to the ops list, then when it comes to
rewrite_expr_tree the constant will be sorted early, which makes it hard
to generate:

 x + (-y * z * z) => x - y * z * z

Do you want to swap the constant in MULT_EXPR chain (if present) like in 
swap_ops_for_binary_stmt and then create a NEGATE_EXPR ?



Thanks,
Kugan


Similar for the x + x + x -> 3 * x case we'd want to add a repeat op when seeing
x + 3 * x + x and use ->count in that patch as well.

Best split out the

   if (rhscode == MULT_EXPR
   && TREE_CODE (binrhs) == SSA_NAME
   && acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), &base, &exponent))
 {
   add_repeat_to_ops_vec (ops, base, exponent);
   gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true);
 }
   else
 add_to_ops_vec (ops, binrhs);

pattern into a helper that handles the other cases.

Richard.


Richard.


Richard.


Thanks,
Kugan


Re: patch to fix PR70689

2016-04-21 Thread Jiong Wang

On 21/04/16 09:45, Jiong Wang wrote:



On 19/04/16 03:54, Vladimir N Makarov wrote:

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70689

The patch was successfully tested and bootstrapped on x86/x86-64.

Committed to the trunk as rev. 235184.


This caused the following regression on trunk

  FAIL: gcc: gcc.target/arm/eliminate.c scan-assembler-times r0,[\\t ]*sp 3


Tracked by PR70751.


Re: C++ PATCH to fix a part of c++/70513 (ICE-on-invalid with enums)

2016-04-21 Thread Marek Polacek
On Wed, Apr 20, 2016 at 11:33:55AM -0400, Jason Merrill wrote:
> On 04/08/2016 07:51 AM, Marek Polacek wrote:
> >By the template part of this PR I mean that we ICE on
> >
> >template 
> >class D
> >{
> >   enum D::A { foo } c;
> >};
> >
> >where clang++ says
> >error: template specialization or definition requires a template parameter 
> >list
> >corresponding to the nested type 'D'
> >which I guess means that a valid code would have "" after "D".
> 
> No, this is misleading; adding the template args wouldn't make the extra
> qualification valid.  We should just give the extra qualification error in
> this case, too.

Oh, I see.  In that case...

> It might help to move your added check to before we push_scope.

This wouldn't help: nested_name_specifier and prev_scope are both "struct D",
but prev_scope contains TYPE_FIELDS, so comparison with == wouldn't work.  But
I wonder if we can't simply use same_type_p then, as in the below.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-20  Marek Polacek  

PR c++/70513
* parser.c (cp_parser_enum_specifier): Check and possibly error for
extra qualification.

* g++.dg/cpp0x/forw_enum12.C: New test.
* g++.dg/cpp0x/forw_enum13.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index 0a1ed1a..e9d1995 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -17233,6 +17233,17 @@ cp_parser_enum_specifier (cp_parser* parser)
  type, prev_scope, nested_name_specifier);
  type = error_mark_node;
}
+ /* If that scope is the scope where the declaration is being placed
+the program is invalid.  */
+ else if (CLASS_TYPE_P (nested_name_specifier)
+  && CLASS_TYPE_P (prev_scope)
+  && same_type_p (nested_name_specifier, prev_scope))
+   {
+ permerror (type_start_token->location,
+"extra qualification not allowed");
+ type = error_mark_node;
+ nested_name_specifier = NULL_TREE;
+   }
}
 
   if (scoped_enum_p)
diff --git gcc/testsuite/g++.dg/cpp0x/forw_enum12.C gcc/testsuite/g++.dg/cpp0x/forw_enum12.C
index e69de29..906ba68 100644
--- gcc/testsuite/g++.dg/cpp0x/forw_enum12.C
+++ gcc/testsuite/g++.dg/cpp0x/forw_enum12.C
@@ -0,0 +1,29 @@
+// PR c++/70513
+// { dg-do compile { target c++11 } }
+
+struct S1
+{
+  enum E : int;
+  enum S1::E : int { X } e; // { dg-error "extra qualification not allowed" }
+};
+
+struct S2
+{
+  enum class E : int;
+  enum class S2::E : int { X } e; // { dg-error "extra qualification not allowed" }
+};
+
+struct S3
+{
+  enum struct E : int;
+  enum struct S3::E : int { X } e; // { dg-error "extra qualification not allowed" }
+};
+
+struct S4
+{
+  struct S5
+  {
+enum E : char;
+enum S4::S5::E : char { X } e; // { dg-error "extra qualification not allowed" }
+  };
+};
diff --git gcc/testsuite/g++.dg/cpp0x/forw_enum13.C gcc/testsuite/g++.dg/cpp0x/forw_enum13.C
index e69de29..b8027f0 100644
--- gcc/testsuite/g++.dg/cpp0x/forw_enum13.C
+++ gcc/testsuite/g++.dg/cpp0x/forw_enum13.C
@@ -0,0 +1,47 @@
+// PR c++/70513
+// { dg-do compile { target c++11 } }
+
+template 
+class D1
+{
+  enum A : int;
+  enum D1::A : int { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D2
+{
+  enum A : int;
+  enum D2::A : int { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D3
+{
+  enum D3::A { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D4
+{
+  enum D4::A { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D5
+{
+  class D6
+  {
+enum D6::A { foo } c; // { dg-error "extra qualification not allowed" }
+  };
+};
+
+template 
+class D7
+{
+  class D8
+  {
+enum A : int;
+enum D8::A : int { foo } c; // { dg-error "extra qualification not allowed" }
+  };
+};

Marek


Re: [PATCH][combine] Check WORD_REGISTER_OPERATIONS normally rather than through preprocessor

2016-04-21 Thread Jeff Law

On 04/18/2016 03:33 AM, Kyrill Tkachov wrote:

Hi Jeff,

On 17/04/16 21:16, Jeff Law wrote:

On 12/15/2015 10:07 AM, Kyrill Tkachov wrote:

Hi all,

As part of the war on conditional compilation here's an #if check on
WORD_REGISTER_OPERATIONS that
seems to have been missed out.
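
As an illustration of the kind of conversion meant here (a sketch only, not the combine.c change itself), the two functions below behave identically.  It assumes the macro is always defined to 0 or 1; the fallback definition and the function names are made up for the example.

```c
/* Assumption for this sketch: the target macro is always defined as 0 or 1.  */
#ifndef WORD_REGISTER_OPERATIONS
#define WORD_REGISTER_OPERATIONS 0
#endif

/* Old style: the guarded code is not even parsed when the macro is 0.  */
static int widen_preprocessor(int bits)
{
#if WORD_REGISTER_OPERATIONS
  return bits < 32 ? 32 : bits;
#else
  return bits;
#endif
}

/* New style: both arms are always compiled and type-checked; the
   compiler folds the constant condition and drops the dead arm.  */
static int widen_runtime(int bits)
{
  if (WORD_REGISTER_OPERATIONS)
    return bits < 32 ? 32 : bits;
  return bits;
}
```

The main benefit of the second form is that the guarded code keeps building on every target, so it cannot silently bit-rot on targets where the macro is 0.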

Bootstrapped and tested on arm, aarch64, x86_64.

Is it still ok to commit these kinds of conditional compilation
conversions?

Thanks,
Kyrill

2015-12-15  Kyrylo Tkachov  

 * combine.c (simplify_comparison): Convert preprocessor check of
 WORD_REGISTER_OPERATIONS into runtime check.

This patch, and others like it are fine for the trunk (gcc-7) again.



Thanks, but I've committed this already in December after approval from
Segher
(https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01771.html)
Ah, I just checked the 2016 stuff when I started flushing out the queue 
of easy stuff ;-)


Sorry for the noise.

jeff



[PATCHv2 0/7] ARC: Add support for nps400 variant

2016-04-21 Thread Andrew Burgess
This iteration is largely the same as the previous version, except
that I no longer use configure-time options to build in support for
nps400.  Instead, support is controlled with a -mcpu=nps400 command
line switch.  This change was made to mirror a similar change that was
requested when I pushed nps400 support upstream into binutils.

Most of the instructions added in this series are now in mainline
binutils.  A few (<10) are still outstanding, and I will return to
them after this patch, but if anyone is super keen then there's a
version of binutils with full nps400 support on github:
 https://github.com/EZchip/binutils

However, all of the nps400 specific tests are compile only, so a
binutils with full nps400 support should not be required in order to
test these changes.

Thanks,
Andrew

---

Andrew Burgess (7):
  gcc/arc: Add support for nps400 cpu type.
  gcc/arc: Replace rI constraint with r & Cm2 for ld and update insns
  gcc/arc: convert some constraints to define_constraint
  gcc/arc: Add support for nps400 cmem xld/xst instructions
  gcc/arc: Add nps400 bitops support
  gcc/arc: Mask integer 'L' operands to 32-bit
  gcc/arc: Add an nps400 specific testcase

 gcc/ChangeLog.NPS400  | 104 ++
 gcc/common/config/arc/arc-common.c|   4 +
 gcc/config/arc/arc-opts.h |   1 +
 gcc/config/arc/arc.c  |  68 +++-
 gcc/config/arc/arc.h  |  23 +-
 gcc/config/arc/arc.md | 567 +++---
 gcc/config/arc/arc.opt|  18 +
 gcc/config/arc/constraints.md |  86 -
 gcc/config/arc/predicates.md  |  19 +
 gcc/testsuite/ChangeLog.NPS400|  43 +++
 gcc/testsuite/gcc.target/arc/cmem-1.c |  10 +
 gcc/testsuite/gcc.target/arc/cmem-2.c |  10 +
 gcc/testsuite/gcc.target/arc/cmem-3.c |  10 +
 gcc/testsuite/gcc.target/arc/cmem-4.c |  10 +
 gcc/testsuite/gcc.target/arc/cmem-5.c |  10 +
 gcc/testsuite/gcc.target/arc/cmem-6.c |  10 +
 gcc/testsuite/gcc.target/arc/cmem-7.c |  26 ++
 gcc/testsuite/gcc.target/arc/cmem-ld.inc  |  16 +
 gcc/testsuite/gcc.target/arc/cmem-st.inc  |  18 +
 gcc/testsuite/gcc.target/arc/extzv-1.c|  11 +
 gcc/testsuite/gcc.target/arc/insv-1.c |  21 ++
 gcc/testsuite/gcc.target/arc/insv-2.c |  18 +
 gcc/testsuite/gcc.target/arc/movb-1.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-2.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-3.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-4.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-5.c |  13 +
 gcc/testsuite/gcc.target/arc/movb_cl-1.c  |   9 +
 gcc/testsuite/gcc.target/arc/movb_cl-2.c  |  11 +
 gcc/testsuite/gcc.target/arc/movbi_cl-1.c |   9 +
 gcc/testsuite/gcc.target/arc/movh_cl-1.c  |  27 ++
 gcc/testsuite/gcc.target/arc/movl-1.c |  17 +
 gcc/testsuite/gcc.target/arc/nps400-1.c   |  23 ++
 33 files changed, 1109 insertions(+), 155 deletions(-)
 create mode 100644 gcc/ChangeLog.NPS400
 create mode 100644 gcc/testsuite/ChangeLog.NPS400
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-4.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-5.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-6.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-7.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-ld.inc
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-st.inc
 create mode 100644 gcc/testsuite/gcc.target/arc/extzv-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/insv-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/insv-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-4.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-5.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb_cl-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb_cl-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movbi_cl-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movh_cl-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movl-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/nps400-1.c

-- 
2.6.4



[PATCHv2 2/7] gcc/arc: Replace rI constraint with r & Cm2 for ld and update insns

2016-04-21 Thread Andrew Burgess
In the load*_update instructions the constraint 'rI' was being used,
which would accept either a register or a signed 12-bit constant.  The
problem is that the 32-bit form of ld with update only takes a signed
9-bit immediate.  As a result, some ld instructions could be generated
that would be 64 bits long when assembled, while GCC believed them to
be 32 bits long.  This error in the length would cause problems during
branch shortening.

The store*_update patterns have the same restriction on immediate size.
However, they already only accept 9-bit immediates, and so should be safe.
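
To make the size mismatch concrete, here is a small sketch of the range check involved.  The helper names fits_signed and ld_update_length are hypothetical, and the length model (4 bytes when the offset fits a signed 9-bit field, 8 bytes otherwise) is an assumption based on the description above, not code from the backend.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE is representable as a signed two's-complement
   integer of BITS bits (1 <= BITS <= 63).  */
static bool fits_signed(int64_t value, unsigned bits)
{
  int64_t lo = -((int64_t) 1 << (bits - 1));
  int64_t hi = ((int64_t) 1 << (bits - 1)) - 1;
  return value >= lo && value <= hi;
}

/* Assumed length model for ld with update: the 32-bit encoding holds
   a signed 9-bit offset, anything larger needs a long immediate.  */
static int ld_update_length(int64_t offset)
{
  return fits_signed(offset, 9) ? 4 : 8;
}
```

An offset such as 300 passes a signed 12-bit check but fails the signed 9-bit one, which is exactly the case where the old 'rI' alternative reported a length of 4 for an instruction that assembles to 8 bytes.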

gcc/ChangeLog:

* config/arc/arc.md (*loadqi_update): Replace use of 'rI'
constraint with separate 'r' and 'Cm2' constraints.
(*load_zeroextendqisi_update): Likewise.
(*load_signextendqisi_update): Likewise.
(*loadhi_update): Likewise.
(*load_zeroextendhisi_update): Likewise.
(*load_signextendhisi_update): Likewise.
(*loadsi_update): Likewise.
(*loadsf_update): Likewise.
---
 gcc/ChangeLog.NPS400  | 12 +++
 gcc/config/arc/arc.md | 96 +--
 2 files changed, 60 insertions(+), 48 deletions(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 4193d26..99e8e30 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1151,40 +1151,40 @@
 
 ;; Note: loadqi_update has no 16-bit variant
 (define_insn "*loadqi_update"
-  [(set (match_operand:QI 3 "dest_reg_operand" "=r,r")
+  [(set (match_operand:QI 3 "dest_reg_operand" "=r,r,r")
 (match_operator:QI 4 "any_mem_operand"
- [(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-   (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))]))
-   (set (match_operand:SI 0 "dest_reg_operand" "=r,r")
+ [(plus:SI (match_operand:SI 1 "register_operand" "0,0,0")
+   (match_operand:SI 2 "nonmemory_operand" "r,Cm2,Cal"))]))
+   (set (match_operand:SI 0 "dest_reg_operand" "=r,r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
   "ldb.a%V4 %3,[%0,%S2]"
-  [(set_attr "type" "load,load")
-   (set_attr "length" "4,8")])
+  [(set_attr "type" "load,load,load")
+   (set_attr "length" "4,4,8")])
 
 (define_insn "*load_zeroextendqisi_update"
-  [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
+  [(set (match_operand:SI 3 "dest_reg_operand" "=r,r,r")
(zero_extend:SI (match_operator:QI 4 "any_mem_operand"
-[(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))])))
-   (set (match_operand:SI 0 "dest_reg_operand" "=r,r")
+[(plus:SI (match_operand:SI 1 "register_operand" "0,0,0")
+  (match_operand:SI 2 "nonmemory_operand" "r,Cm2,Cal"))])))
+   (set (match_operand:SI 0 "dest_reg_operand" "=r,r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
   "ldb.a%V4 %3,[%0,%S2]"
-  [(set_attr "type" "load,load")
-   (set_attr "length" "4,8")])
+  [(set_attr "type" "load,load,load")
+   (set_attr "length" "4,4,8")])
 
 (define_insn "*load_signextendqisi_update"
-  [(set (match_operand:SI 3 "dest_reg_operand" "=r,r")
+  [(set (match_operand:SI 3 "dest_reg_operand" "=r,r,r")
(sign_extend:SI (match_operator:QI 4 "any_mem_operand"
-[(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))])))
-   (set (match_operand:SI 0 "dest_reg_operand" "=r,r")
+[(plus:SI (match_operand:SI 1 "register_operand" "0,0,0")
+  (match_operand:SI 2 "nonmemory_operand" "r,Cm2,Cal"))])))
+   (set (match_operand:SI 0 "dest_reg_operand" "=r,r,r")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
   "ldb.x.a%V4 %3,[%0,%S2]"
-  [(set_attr "type" "load,load")
-   (set_attr "length" "4,8")])
+  [(set_attr "type" "load,load,load")
+   (set_attr "length" "4,4,8")])
 
 (define_insn "*storeqi_update"
   [(set (match_operator:QI 4 "any_mem_operand"
@@ -1201,41 +1201,41 @@
 ;; ??? pattern may have to be re-written
 ;; Note: no 16-bit variant for this pattern
 (define_insn "*loadhi_update"
-  [(set (match_operand:HI 3 "dest_reg_operand" "=r,r")
+  [(set (match_operand:HI 3 "dest_reg_operand" "=r,r,r")
(match_operator:HI 4 "any_mem_operand"
-[(plus:SI (match_operand:SI 1 "register_operand" "0,0")
-  (match_operand:SI 2 "nonmemory_operand" "rI,Cal"))]))
-   (set (match_operand:SI 0 "dest_reg_operand" "=w,w")
+[(plus:SI (match_operand:SI 1 "register_operand" "0,0,0")
+  (match_operand:SI 2 "nonmemory_operand" "r,Cm2,Cal"))]))
+   (set (match_operand:SI 0 "dest_reg_operand" "=w,w,w")
(plus:SI (match_dup 1) (match_dup 2)))]
   ""
   "ld%_.a%V4 %3,[%0,%S2]"
-  [(set_attr "type" "load,load")
-   (set_attr "length" "4,8")])

[PATCHv2 6/7] gcc/arc: Mask integer 'L' operands to 32-bit

2016-04-21 Thread Andrew Burgess
When formatting 'L' operands (least significant word), only print
32 bits; don't sign-extend to 64 bits.
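
The effect can be reproduced outside the compiler.  The sketch below uses int64_t as a stand-in for a sign-extended host integer and two hypothetical helpers to show the old and new formatting side by side; it is not the arc_print_operand code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Old behaviour: print the full 64-bit value, so a word with its top
   bit set shows up with 32 extra leading 'f' digits.  */
static void format_sign_extended(char *buf, size_t len, int64_t word)
{
  snprintf(buf, len, "0x%08" PRIx64, (uint64_t) word);
}

/* New behaviour: truncate to the low 32 bits before printing.  */
static void format_masked(char *buf, size_t len, int64_t word)
{
  snprintf(buf, len, "0x%08" PRIx32, (uint32_t) word);
}
```

For a word whose top two bits are set, stored sign-extended as -0x40000000, the first helper yields 0xffffffffc0000000 while the second yields 0xc0000000.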

This commit could really be applied directly to the current GCC trunk,
however, the only test I have for this issue right now relies on the
nps400 bitops support.

gcc/ChangeLog:

* config/arc/arc.c (arc_print_operand): Print integer 'L' operands
as 32-bits.

gcc/testsuite/ChangeLog:

* gcc.target/arc/movh_cl-1.c: New file.
---
 gcc/ChangeLog.NPS400 |  6 ++
 gcc/config/arc/arc.c | 10 --
 gcc/testsuite/ChangeLog.NPS400   |  4 
 gcc/testsuite/gcc.target/arc/movh_cl-1.c | 27 +++
 4 files changed, 41 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/movh_cl-1.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 72a0825..b7b8516 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -3181,18 +3181,16 @@ arc_print_operand (FILE *file, rtx x, int code)
   else if (GET_CODE (x) == CONST_INT
   || GET_CODE (x) == CONST_DOUBLE)
{
- rtx first, second;
+ rtx first, second, word;
 
  split_double (x, &first, &second);
 
  if((WORDS_BIG_ENDIAN) == 0)
- fprintf (file, "0x%08" PRIx64,
-  code == 'L' ? INTVAL (first) : INTVAL (second));
+   word = (code == 'L' ? first : second);
  else
- fprintf (file, "0x%08" PRIx64,
-  code == 'L' ? INTVAL (second) : INTVAL (first));
-
+   word = (code == 'L' ? second : first);
 
+ fprintf (file, "0x%08" PRIx32, ((uint32_t) INTVAL (word)));
  }
   else
output_operand_lossage ("invalid operand to %%H/%%L code");
diff --git a/gcc/testsuite/gcc.target/arc/movh_cl-1.c b/gcc/testsuite/gcc.target/arc/movh_cl-1.c
new file mode 100644
index 000..220cd9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/movh_cl-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=nps400 -O2 -mbitops" } */
+
+struct thing
+{
+  union
+  {
+int raw;
+struct
+{
+  unsigned a : 1;
+  unsigned b : 1;
+};
+  };
+};
+
+extern void func (int);
+
+void
+blah ()
+{
+  struct thing xx;
+  xx.a = xx.b = 1;
+  func (xx.raw);
+}
+
+/* { dg-final { scan-assembler "movh\.cl r\[0-9\]+,0xc000>>16" } } */
-- 
2.6.4



[PATCHv2 1/7] gcc/arc: Add support for nps400 cpu type.

2016-04-21 Thread Andrew Burgess
The nps400 is an arc700 with a set of extension instructions produced by
Mellanox (formerly EZChip).  This commit adds support for the nps400
architecture to the arc backend.

After this commit it is possible to compile using -mcpu=nps400 in order
to specialise for the nps400.  Later commits add support for the
specific extension instructions.
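
The CPP_SPEC change in this patch defines __NPS400__ when -mcpu=nps400 (or -mcpu=NPS400) is given, so user code can test for the variant at compile time.  A minimal sketch, with a made-up function name and made-up return strings:

```c
/* Returns a label for the variant this translation unit was built for.
   __NPS400__ is only defined when compiling with -mcpu=nps400.  */
static const char *target_variant(void)
{
#ifdef __NPS400__
  return "nps400";
#else
  return "other";
#endif
}
```

Built with any other -mcpu value, or on a non-ARC host, the function takes the second branch.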

gcc/ChangeLog:

* common/config/arc/arc-common.c (arc_handle_option): Add NPS400
support, setup defaults.
* config/arc/arc-opts.h (enum processor_type): Add NPS400.
* config/arc/arc.c (arc_init): Add NPS400 support.
* config/arc/arc.h (CPP_SPEC): Add NPS400 defines.
(TARGET_ARC700): NPS400 is also an ARC700.
* config/arc/arc.opt: Add NPS400 options to -mcpu=.
---
 gcc/ChangeLog.NPS400   | 9 +
 gcc/common/config/arc/arc-common.c | 4 
 gcc/config/arc/arc-opts.h  | 1 +
 gcc/config/arc/arc.c   | 5 +
 gcc/config/arc/arc.h   | 5 -
 gcc/config/arc/arc.opt | 6 ++
 6 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/ChangeLog.NPS400

diff --git a/gcc/common/config/arc/arc-common.c b/gcc/common/config/arc/arc-common.c
index 64fb053..f5b9c6d 100644
--- a/gcc/common/config/arc/arc-common.c
+++ b/gcc/common/config/arc/arc-common.c
@@ -83,6 +83,10 @@ arc_handle_option (struct gcc_options *opts, struct gcc_options *opts_set,
 
   switch (value)
{
+   case PROCESSOR_NPS400:
+ if (! (opts_set->x_TARGET_CASE_VECTOR_PC_RELATIVE) )
+   opts->x_TARGET_CASE_VECTOR_PC_RELATIVE = 1;
+ /* Fall through */
case PROCESSOR_ARC600:
case PROCESSOR_ARC700:
  if (! (opts_set->x_target_flags & MASK_BARREL_SHIFTER) )
diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
index 1e11ebc4..cbd7898 100644
--- a/gcc/config/arc/arc-opts.h
+++ b/gcc/config/arc/arc-opts.h
@@ -24,6 +24,7 @@ enum processor_type
   PROCESSOR_ARC600,
   PROCESSOR_ARC601,
   PROCESSOR_ARC700,
+  PROCESSOR_NPS400,
   PROCESSOR_ARCEM,
   PROCESSOR_ARCHS
 };
diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index d60db50..ae8772e 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -649,6 +649,11 @@ arc_init (void)
   tune_dflt = TUNE_ARC700_4_2_STD;
   break;
 
+case PROCESSOR_NPS400:
+  arc_cpu_string = "NPS400";
+  tune_dflt = TUNE_ARC700_4_2_STD;
+  break;
+
 case PROCESSOR_ARCEM:
   arc_cpu_string = "EM";
   break;
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 1c2a38d..f96bf0f 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -136,6 +136,8 @@ along with GCC; see the file COPYING3.  If not see
 %{mdsp-packa:-D__Xdsp_packa} %{mcrc:-D__Xcrc} %{mdvbf:-D__Xdvbf} \
 %{mtelephony:-D__Xtelephony} %{mxy:-D__Xxy} %{mmul64: -D__Xmult32} \
 %{mlock:-D__Xlock} %{mswape:-D__Xswape} %{mrtsc:-D__Xrtsc} \
+%{mcpu=NPS400:-D__NPS400__} \
+%{mcpu=nps400:-D__NPS400__} \
 "
 
 #define CC1_SPEC "\
@@ -297,7 +299,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #define TARGET_ARC600 (arc_cpu == PROCESSOR_ARC600)
 #define TARGET_ARC601 (arc_cpu == PROCESSOR_ARC601)
-#define TARGET_ARC700 (arc_cpu == PROCESSOR_ARC700)
+#define TARGET_ARC700 (arc_cpu == PROCESSOR_ARC700 \
+  || arc_cpu == PROCESSOR_NPS400)
 #define TARGET_EM (arc_cpu == PROCESSOR_ARCEM)
 #define TARGET_HS (arc_cpu == PROCESSOR_ARCHS)
 #define TARGET_V2  \
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 2227b75..14fd2a4 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -189,6 +189,12 @@ EnumValue
 Enum(processor_type) String(arc700) Value(PROCESSOR_ARC700)
 
 EnumValue
+Enum(processor_type) String(nps400) Value(PROCESSOR_NPS400)
+
+EnumValue
+Enum(processor_type) String(NPS400) Value(PROCESSOR_NPS400)
+
+EnumValue
 Enum(processor_type) String(ARCEM) Value(PROCESSOR_ARCEM)
 
 EnumValue
-- 
2.6.4



[PATCHv2 7/7] gcc/arc: Add an nps400 specific testcase

2016-04-21 Thread Andrew Burgess
This test case triggered a bug caused by VOIDmode not being handled in
proper_comparison_operator.  The problem was fixed with a commit on
2016-01-27 by Claudiu Zissulescu; this patch adds the test case for
coverage.

gcc/testsuite/ChangeLog:

* gcc.target/arc/nps400-1.c: New file.
---
 gcc/testsuite/ChangeLog.NPS400  |  4 
 gcc/testsuite/gcc.target/arc/nps400-1.c | 23 +++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arc/nps400-1.c

diff --git a/gcc/testsuite/gcc.target/arc/nps400-1.c b/gcc/testsuite/gcc.target/arc/nps400-1.c
new file mode 100644
index 000..f3d6271
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/nps400-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=nps400 -mq-class -mbitops -munaligned-access -mcmem -O2 -fno-strict-aliasing" } */
+
+enum npsdp_mem_space_type {
+  NPSDP_EXTERNAL_MS = 1
+};
+struct npsdp_ext_addr {
+  struct {
+struct {
+  enum npsdp_mem_space_type mem_type : 1;
+  unsigned msid : 5;
+};
+  };
+  char user_space[];
+} a;
+char b;
+void fn1() {
+  ((struct npsdp_ext_addr *)a.user_space)->mem_type = NPSDP_EXTERNAL_MS;
+  ((struct npsdp_ext_addr *)a.user_space)->msid =
+  ((struct npsdp_ext_addr *)a.user_space)->mem_type ? 1 : 10;
+  while (b)
+;
+}
-- 
2.6.4



[PATCHv2 3/7] gcc/arc: convert some constraints to define_constraint

2016-04-21 Thread Andrew Burgess
The define_memory_constraint allows for the address operand to be
reloaded into a base register.  However, for the constraints 'Us<' and
'Us>', which are used for matching 'push' and 'pop' instructions,
moving the address into a base register is not helpful.  These
constraints should therefore be define_constraint, not
define_memory_constraint.

Similarly the Usd constraint, used for generating small data area memory
accesses, can't have its operand loaded into a register as the
relocation for small data area symbols only works within ld/st
instructions.

gcc/ChangeLog:

* config/arc/constraints.md (Usd): Convert to define_constraint.
(Us<): Likewise.
(Us>): Likewise.
---
 gcc/ChangeLog.NPS400  |  7 +++
 gcc/config/arc/constraints.md | 18 +++---
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arc/constraints.md b/gcc/config/arc/constraints.md
index 668b60a..b6954ad 100644
--- a/gcc/config/arc/constraints.md
+++ b/gcc/config/arc/constraints.md
@@ -269,11 +269,15 @@
   (and (match_code "mem")
(match_test "compact_store_memory_operand (op, VOIDmode)")))
 
-(define_memory_constraint "Usd"
-  "@internal
-   A valid _small-data_ memory operand for ARCompact instructions"
-  (and (match_code "mem")
-   (match_test "compact_sda_memory_operand (op, VOIDmode)")))
+; Don't use define_memory_constraint here as the relocation patching
+; for small data symbols only works within a ld/st instruction and
+; define_memory_constraint may result in the address being calculated
+; into a register first.
+(define_constraint "Usd"
+   "@internal
+A valid _small-data_ memory operand for ARCompact instructions"
+   (and (match_code "mem")
+(match_test "compact_sda_memory_operand (op, VOIDmode)")))
 
 (define_memory_constraint "Usc"
   "@internal
@@ -283,7 +287,7 @@
 ;; ??? the assembler rejects stores of immediates to small data.
(match_test "!compact_sda_memory_operand (op, VOIDmode)")))
 
-(define_memory_constraint "Us<"
+(define_constraint "Us<"
   "@internal
Stack pre-decrement"
   (and (match_code "mem")
@@ -291,7 +295,7 @@
(match_test "REG_P (XEXP (XEXP (op, 0), 0))")
(match_test "REGNO (XEXP (XEXP (op, 0), 0)) == SP_REG")))
 
-(define_memory_constraint "Us>"
+(define_constraint "Us>"
   "@internal
Stack post-increment"
   (and (match_code "mem")
-- 
2.6.4



[PATCHv2 5/7] gcc/arc: Add nps400 bitops support

2016-04-21 Thread Andrew Burgess
Add support for nps400 bit operation instructions.  There's a new flag
-mbitops that turns this feature on.  There are new instructions, some
changes to existing instructions, a new register class to support the
new instructions, and some new expand and peephole optimisations.

gcc/ChangeLog:

* config/arc/arc.c (arc_conditional_register_usage): Take
TARGET_RRQ_CLASS into account.
(arc_print_operand): Support printing 'p' and 's' operands.
* config/arc/arc.h (TARGET_NPS_BITOPS_DEFAULT): Provide default
as 0.
(TARGET_RRQ_CLASS): Define.
(IS_POWEROF2_OR_0_P): Define.
* config/arc/arc.md (*movsi_insn): Add w/Clo, w/Chi, and w/Cbi
alternatives.
(*tst_movb): New define_insn.
(*tst): Avoid recognition if it could prevent '*tst_movb'
combination; replace c/CnL with c/Chs alternative.
(*tst_bitfield_tst): New define_insn.
(*tst_bitfield_asr): New define_insn.
(*tst_bitfield): New define_insn.
(andsi3_i): Add Rrq variant.
(extzv): New define_expand.
(insv): New define_expand.
(*insv_i): New define_insn.
(*movb): New define_insn.
(*movb_signed): New define_insn.
(*movb_high): New define_insn.
(*movb_high_signed): New define_insn.
(*movb_high_signed + 1): New define_split pattern.
(*mrgb): New define_insn.
(*mrgb + 1): New define_peephole2 pattern.
(*mrgb + 2): New define_peephole2 pattern.
* config/arc/arc.opt (mbitops): New option for nps400, uses
TARGET_NPS_BITOPS_DEFAULT.
* config/arc/constraints.md (q): Make register class conditional.
(Rrq): New register constraint.
(Chs): New constraint.
(Clo): New constraint.
(Chi): New constraint.
(Cbf): New constraint.
(Cbn): New constraint.
(C18): New constraint.
(Cbi): New constraint.

gcc/testsuite/ChangeLog:

* gcc.target/arc/extzv-1.c: New file.
* gcc.target/arc/insv-1.c: New file.
* gcc.target/arc/insv-2.c: New file.
* gcc.target/arc/movb-1.c: New file.
* gcc.target/arc/movb-2.c: New file.
* gcc.target/arc/movb-3.c: New file.
* gcc.target/arc/movb-4.c: New file.
* gcc.target/arc/movb-5.c: New file.
* gcc.target/arc/movb_cl-1.c: New file.
* gcc.target/arc/movb_cl-2.c: New file.
* gcc.target/arc/movbi_cl-1.c: New file.
* gcc.target/arc/movl-1.c: New file.
---
 gcc/ChangeLog.NPS400  |  42 
 gcc/config/arc/arc.c  |  33 ++-
 gcc/config/arc/arc.h  |   9 +
 gcc/config/arc/arc.md | 382 ++
 gcc/config/arc/arc.opt|   4 +
 gcc/config/arc/constraints.md |  58 -
 gcc/testsuite/ChangeLog.NPS400|  16 ++
 gcc/testsuite/gcc.target/arc/extzv-1.c|  11 +
 gcc/testsuite/gcc.target/arc/insv-1.c |  21 ++
 gcc/testsuite/gcc.target/arc/insv-2.c |  18 ++
 gcc/testsuite/gcc.target/arc/movb-1.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-2.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-3.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-4.c |  13 +
 gcc/testsuite/gcc.target/arc/movb-5.c |  13 +
 gcc/testsuite/gcc.target/arc/movb_cl-1.c  |   9 +
 gcc/testsuite/gcc.target/arc/movb_cl-2.c  |  11 +
 gcc/testsuite/gcc.target/arc/movbi_cl-1.c |   9 +
 gcc/testsuite/gcc.target/arc/movl-1.c |  17 ++
 19 files changed, 648 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/extzv-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/insv-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/insv-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-4.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb-5.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb_cl-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movb_cl-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movbi_cl-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/movl-1.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 890a1a5..72a0825 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -1376,7 +1376,8 @@ arc_conditional_register_usage (void)
 {
   if (i < 29)
{
- if (TARGET_Q_CLASS && ((i <= 3) || ((i >= 12) && (i <= 15))))
+ if ((TARGET_Q_CLASS || TARGET_RRQ_CLASS)
+ && ((i <= 3) || ((i >= 12) && (i <= 15))))
arc_regno_reg_class[i] = ARCOMPACT16_REGS;
  else
arc_regno_reg_class[i] = GENERAL_REGS;
@@ -1393,12 +1394,12 @@ arc_conditional_register_usage (void)
arc_regno_reg_class[i] = NO_REGS;
 }
 
-  /* ARCOMPACT16_REGS is empty, if TARGET

[PATCHv2 4/7] gcc/arc: Add support for nps400 cmem xld/xst instructions

2016-04-21 Thread Andrew Burgess
This commit adds support for NPS400 cmem memory sections.  Data to be
placed into cmem memory is placed into a section ".cmem",
".cmem_shared", or ".cmem_private".

There are restrictions on how instructions can be used to operate on
data held in cmem memory, this is reflected by the introduction of new
operand constraints (Uex/Ucm), and modifications to some instructions to
make use of these constraints.
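
As a rough illustration of the user-facing side, the sketch below places a
variable in one of the sections named above using GCC's `section` attribute.
The variable and function names are made up for this example; on a target
without cmem support the attribute only selects the output section and implies
nothing about which instructions are used.

```c
/* Hypothetical example data; only the section name comes from the patch
   description.  On targets without cmem support this is simply a named
   section with no special access rules.  */
static int packet_count __attribute__ ((section (".cmem"))) = 0;

/* Ordinary C access.  On NPS400 with cmem enabled, the compiler would be
   expected to choose a cmem-capable load/store form for it.  */
int
bump_packet_count (void)
{
  packet_count += 1;
  return packet_count;
}
```

Nothing in the source code changes beyond the attribute; the section name is
what arc_encode_section_info keys on when it sets SYMBOL_FLAG_CMEM.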

gcc/ChangeLog:

* config/arc/arc.h (SYMBOL_FLAG_CMEM): Define.
(TARGET_NPS_CMEM_DEFAULT): Provide default definition.
* config/arc/arc.c (arc_address_cost): Return 0 for cmem_address.
(arc_encode_section_info): Set SYMBOL_FLAG_CMEM where indicated.
* config/arc/arc.opt (mcmem): New option.
* config/arc/arc.md (*extendqihi2_i): Add r/Uex alternative,
supply length for r/m alternative.
(*extendqisi2_ac): Likewise.
(*extendhisi2_i): Add r/Uex alternative, supply length for r/m and
r/Uex alternative.
(movqi_insn): Add r/Ucm and Ucm/?Rac alternatives.
(movhi_insn): Likewise.
(movsi_insn): Add r/Ucm,Ucm/w alternatives.
(*zero_extendqihi2_i): Add r/Ucm alternative.
(*zero_extendqisi2_ac): Likewise.
(*zero_extendhisi2_i): Likewise.
* config/arc/constraints.md (Uex): New memory constraint.
(Ucm): New define_constraint.
* config/arc/predicates.md (long_immediate_loadstore_operand):
Return 0 for MEM with cmem_address address.
(cmem_address_0): New predicates.
(cmem_address_1): Likewise.
(cmem_address_2): Likewise.
(cmem_address): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/arc/cmem-1.c: New file.
* gcc.target/arc/cmem-2.c: New file.
* gcc.target/arc/cmem-3.c: New file.
* gcc.target/arc/cmem-4.c: New file.
* gcc.target/arc/cmem-5.c: New file.
* gcc.target/arc/cmem-6.c: New file.
* gcc.target/arc/cmem-7.c: New file.
* gcc.target/arc/cmem-ld.inc: New file.
* gcc.target/arc/cmem-st.inc: New file.
---
 gcc/ChangeLog.NPS400 |  28 
 gcc/config/arc/arc.c |  20 ++
 gcc/config/arc/arc.h |   9 +++
 gcc/config/arc/arc.md| 115 +--
 gcc/config/arc/arc.opt   |   8 +++
 gcc/config/arc/constraints.md|  14 +++-
 gcc/config/arc/predicates.md |  19 +
 gcc/testsuite/ChangeLog.NPS400   |  19 +
 gcc/testsuite/gcc.target/arc/cmem-1.c|  10 +++
 gcc/testsuite/gcc.target/arc/cmem-2.c|  10 +++
 gcc/testsuite/gcc.target/arc/cmem-3.c|  10 +++
 gcc/testsuite/gcc.target/arc/cmem-4.c|  10 +++
 gcc/testsuite/gcc.target/arc/cmem-5.c|  10 +++
 gcc/testsuite/gcc.target/arc/cmem-6.c|  10 +++
 gcc/testsuite/gcc.target/arc/cmem-7.c|  26 +++
 gcc/testsuite/gcc.target/arc/cmem-ld.inc |  16 +
 gcc/testsuite/gcc.target/arc/cmem-st.inc |  18 +
 17 files changed, 301 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/ChangeLog.NPS400
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-2.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-3.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-4.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-5.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-6.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-7.c
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-ld.inc
 create mode 100644 gcc/testsuite/gcc.target/arc/cmem-st.inc

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index ae8772e..890a1a5 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -1789,6 +1789,8 @@ arc_address_cost (rtx addr, machine_mode, addr_space_t, bool speed)
 case LABEL_REF :
 case SYMBOL_REF :
 case CONST :
+  if (TARGET_NPS_CMEM && cmem_address (addr, SImode))
+   return 0;
   /* Most likely needs a LIMM.  */
   return COSTS_N_INSNS (1);
 
@@ -4263,6 +4265,24 @@ arc_encode_section_info (tree decl, rtx rtl, int first)
 
   SYMBOL_REF_FLAGS (symbol) = flags;
 }
+  else if (TREE_CODE (decl) == VAR_DECL)
+{
+  rtx symbol = XEXP (rtl, 0);
+
+  tree attr = (TREE_TYPE (decl) != error_mark_node
+  ? DECL_ATTRIBUTES (decl) : NULL_TREE);
+
+  tree sec_attr = lookup_attribute ("section", attr);
+  if (sec_attr)
+   {
+ const char *sec_name
+   = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (sec_attr)));
+ if (strcmp (sec_name, ".cmem") == 0
+ || strcmp (sec_name, ".cmem_shared") == 0
+ || strcmp (sec_name, ".cmem_private") == 0)
+  SYMBOL_REF_FLAGS (symbol) |= SYMBOL_FLAG_CMEM;
+   }
+}
 }
 
 /* This is how to output a definition of an internal numbered label where
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h

Inline across -ffast-math boundary

2016-04-21 Thread Jan Hubicka
Hi,
this patch implements the long promised logic to inline across -ffast-math
boundary when either caller or callee has no fp operations in it.  This is
needed to resolve code quality regression on Firefox with LTO where
-O3/-O2/-Ofast flags are combined and we fail to inline a lot of comdats
otherwise.

Bootstrapped/regtested x86_64-linux. Richard, I would like to know your opinion
on the fp_expression_p predicate - it is a bit ugly but I do not know how to
implement it better.

We still won't inline -O1 code into -O2+ because flag_strict_overflow differs.
I will implement similar logic for overflows incrementally. Similarly,
flag_errno_math can be handled better, but I am not sure it matters - I think
the vast majority of the time users set errno_math in sync with other
-ffast-math implied flags.
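
The rule described above can be sketched roughly as follows. This is not the
code from the patch: the struct and function names are hypothetical, and the
set of FP-related flags is collapsed into a single boolean for brevity.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-function inline summary.  */
struct fn_summary
{
  bool fast_math;       /* built with -ffast-math style flags */
  bool fp_expressions;  /* body contains FP operations */
};

/* Inlining across differing FP flags is allowed when at least one side
   contains no FP expressions that the flags could affect.  */
static bool
can_inline_across_fp_boundary (const struct fn_summary *caller,
                               const struct fn_summary *callee)
{
  if (caller->fast_math == callee->fast_math)
    return true;
  return !caller->fp_expressions || !callee->fp_expressions;
}

/* After inlining, the caller contains whatever the callee contained.  */
static void
merge_summary (struct fn_summary *caller, const struct fn_summary *callee)
{
  caller->fp_expressions |= callee->fp_expressions;
}
```

The merge step matters because a caller that was free of FP code before
inlining may no longer be free of it afterwards, which affects later decisions.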

Honza


* ipa-inline-analysis.c (reset_inline_summary): Clear fp_expressions
(dump_inline_summary): Dump it.
(fp_expression_p): New predicate.
(estimate_function_body_sizes): Use it.
(inline_merge_summary): Merge fp_expressions.
(inline_read_section): Read fp_expressions.
(inline_write_summary): Write fp_expressions.
* ipa-inline.c (can_inline_edge_p): Permit inlining across fp math
codegen boundary if either caller or callee is !fp_expressions.
* ipa-inline.h (inline_summary): Add fp_expressions.
* ipa-inline-transform.c (inline_call): When inlining !fp_expressions
to fp_expressions be sure the fp generation flags are updated.

* gcc.dg/ipa/inline-8.c: New testcase.
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 235312)
+++ ipa-inline-analysis.c   (working copy)
@@ -1069,6 +1069,7 @@ reset_inline_summary (struct cgraph_node
 reset_inline_edge_summary (e);
   for (e = node->indirect_calls; e; e = e->next_callee)
 reset_inline_edge_summary (e);
+  info->fp_expressions = false;
 }
 
 /* Hook that is called by cgraph.c when a node is removed.  */
@@ -1423,6 +1424,8 @@ dump_inline_summary (FILE *f, struct cgr
fprintf (f, " inlinable");
   if (s->contains_cilk_spawn)
fprintf (f, " contains_cilk_spawn");
+  if (s->fp_expressions)
+   fprintf (f, " fp_expression");
   fprintf (f, "\n  self time:   %i\n", s->self_time);
   fprintf (f, "  global time: %i\n", s->time);
   fprintf (f, "  self size:   %i\n", s->self_size);
@@ -2459,6 +2462,42 @@ clobber_only_eh_bb_p (basic_block bb, bo
   return true;
 }
 
+/* Return true if STMT compute a floating point expression that may be affected
+   by -ffast-math and similar flags.  */
+
+static bool
+fp_expression_p (gimple *stmt)
+{
+  tree fndecl;
+
+  if (gimple_code (stmt) == GIMPLE_ASSIGN
+  /* Even conversion to and from float are FP expressions.  */
+  && (FLOAT_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+ || FLOAT_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
+  /* Plain moves are safe.  */
+  && (IS_EXPR_CODE_CLASS (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)))
+ || TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison))
+return true;
+
+  /* Comparsions may be optimized with assumption that value is not NaN.  */
+  if (gimple_code (stmt) == GIMPLE_COND
+  && (FLOAT_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
+ || FLOAT_TYPE_P (TREE_TYPE (gimple_cond_rhs (stmt)))))
+return true;
+
+  /* Builtins may be optimized depending on math mode.  We don't really have
+ list of these, so just check that there are no FP arguments.  */
+  if (gimple_code (stmt) == GIMPLE_CALL
+  && (fndecl = gimple_call_fndecl (stmt)) != NULL_TREE
+  && DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN)
+{
+  for (unsigned int i=0; i < gimple_call_num_args (stmt); i++)
+   if (FLOAT_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, i))))
+ return true;
+}
+  return false;
+}
+
 /* Compute function body size parameters for NODE.
When EARLY is true, we compute only simple summaries without
non-trivial predicates to drive the early inliner.  */
@@ -2733,6 +2772,13 @@ estimate_function_body_sizes (struct cgr
   this_time * (2 - prob), &p);
}
 
+ if (!info->fp_expressions && fp_expression_p (stmt))
+   {
+ info->fp_expressions = true;
+ if (dump_file)
+   fprintf (dump_file, "   fp_expression set\n");
+   }
+
  gcc_assert (time >= 0);
  gcc_assert (size >= 0);
}
@@ -3577,6 +3623,8 @@ inline_merge_summary (struct cgraph_edge
   else
 toplev_predicate = true_predicate ();
 
+  info->fp_expressions |= callee_info->fp_expressions;
+
   if (callee_info->conds)
 evaluate_properties_for_edge (edge, true, &clause, NULL, NULL, NULL);
   if (ipa_node_params_sum && callee_info->conds)
@@ -4229,6 

[PATCH, i386, AVX-512] Fix PR target/70728.

2016-04-21 Thread Kirill Yukhin
Hello,
The patch at the bottom fixes the mentioned PR by separating
the AVX and AVX-512BW constraints.

gcc/
* gcc/config/i386/sse.md (define_insn "3"):
Extract AVX-512BW constraint from AVX.
gcc/testsuite/
* gcc.target/i386/pr70728.c: New test.

Bootstrap and regtest are in progress for i?86|x86_64.

I'll check it into main trunk if it passes.

--
Thanks, K

commit c98976fc83f62c1faf1d7e3302632fa084e4cc60
Author: Kirill Yukhin 
Date:   Thu Apr 21 12:59:38 2016 +0300

    AVX-512. Fix PR target/70728 by adding explicit constraint for EVEX.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9a84468..48a7abb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10138,22 +10138,23 @@
(set_attr "mode" "")])
 
 (define_insn "3"
-  [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,v")
+  [(set (match_operand:VI48_AVX2 0 "register_operand" "=x,x,v")
(any_lshift:VI48_AVX2
- (match_operand:VI48_AVX2 1 "register_operand" "0,v")
- (match_operand:SI 2 "nonmemory_operand" "xN,vN")))]
+ (match_operand:VI48_AVX2 1 "register_operand" "0,x,v")
+ (match_operand:SI 2 "nonmemory_operand" "xN,xN,vN")))]
   "TARGET_SSE2 && "
   "@
p\t{%2, %0|%0, %2}
-   vp\t{%2, %1, %0|%0, 
%1, %2}"
-  [(set_attr "isa" "noavx,avx")
+   vp\t{%2, %1, %0|%0, 
%1, %2}
+   vp\t{%2, %1, %0|%0, 
%1, %2}"  
+  [(set_attr "isa" "noavx,avx,avx512bw")
(set_attr "type" "sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "prefix_data16" "1,*")
-   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_data16" "1,*,*")
+   (set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "")])
 
 (define_insn "3"
diff --git a/gcc/testsuite/gcc.target/i386/pr70728.c 
b/gcc/testsuite/gcc.target/i386/pr70728.c
new file mode 100644
index 000..89c140d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr70728.c
@@ -0,0 +1,30 @@
+/* PR target/70728 */
+/* { dg-do compile } */
+/* { dg-options "-S -Ofast -march=knl" } */
+
+short a = -15726;
+int b = (int)-7003557328690506537LL;
+short c[5][5][3][6];
+char d[2][5][3][2][4];
+void fn1() {
+  for (int e = 0; e < 3; e = e + 1)
+for (int f = 0; f < 2; f = f + 1)
+  for (int g = 0; g < 4; g = g + 1)
+for (int h = 0; h < 3; h = h + 1)
+  for (int i = 0; i < 2; i = i + 1)
+for (int j = 0; j < 4; j = j + 1)
+  d[f][g][h][i][j] =
+  7 << (1236110361944357083 >> a + 15728) - 309027590486089270 >>
+  (c[e][f][h][j] + 2147483647 << ~b - 7003557328690506536) -
+  2147480981;
+}
+int main() {
+  for (int k = 0; k < 5; ++k)
+for (int l = 0; l < 5; ++l)
+  for (int m = 0; m < 3; ++m)
+for (int n = 0; n < 4; ++n)
+  c[k][l][m][n] = -2639;
+  fn1();
+}
+
+/* { dg-final { scan-assembler-not "sll\[ \\t\]+\[^\n\]*%\.mm(?:1\[6-9\]|\[2-3\]\[0-9\])" } } */


[PATCH][PR sanitizer/70624] Fix 'dyld: Symbol not found: _dyldVersionNumber' link error on old Darwin systems.

2016-04-21 Thread Maxim Ostapenko

Hi,

On older Darwin systems (in particular, Darwin 10), dyld doesn't export 
'_dyldVersionNumber' symbol so we would have 'undefined reference' error 
in sanitizer library. We can mitigate this by introducing weak reference 
to '_dyldVersionNumber' and bailing out if &dyldVersionNumber == 0.
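
The weak-reference idiom can be sketched in plain C as below. The symbol name
is hypothetical and deliberately left undefined so that the example mimics an
old dyld that does not export the variable; the comparison after the null
check is an assumption about how the version threshold is used. This relies on
the GCC/Clang `weak` attribute and on a linker that resolves an undefined weak
reference to address zero (the usual ELF behaviour).

```c
/* Hypothetical symbol: nothing in this program defines it.  */
extern double example_dyld_version __attribute__ ((weak));

/* Same shape as the check in the patch: bail out early when the weak
   symbol is unresolved, otherwise compare against a threshold.  */
int
needs_env_variable (void)
{
  if (!&example_dyld_version)
    return 1;
  return example_dyld_version < 360.0;
}
```

Without the weak attribute the reference would have to be resolved at link or
load time, which is exactly the failure reported in the PR.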


Tested by Dominique on x86_64-apple-darwin10 and x86_64-apple-darwin15 
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70624#c8).
Ok for mainline? If yes, should I back port this patch to gcc-{4.9, 5, 
6}-branch?


-Maxim
libsanitizer/ChangeLog:

2016-04-21  Maxim Ostapenko  

	PR sanitizer/70624
	* asan/asan_mac.cc: Cherry pick upstream r266868.

diff --git a/libsanitizer/asan/asan_mac.cc b/libsanitizer/asan/asan_mac.cc
index 20e37ff..ab3c656 100644
--- a/libsanitizer/asan/asan_mac.cc
+++ b/libsanitizer/asan/asan_mac.cc
@@ -97,10 +97,14 @@ void DisableReexec() {
   reexec_disabled = true;
 }
 
-extern "C" double dyldVersionNumber;
+extern "C" SANITIZER_WEAK_ATTRIBUTE double dyldVersionNumber;
 static const double kMinDyldVersionWithAutoInterposition = 360.0;
 
 bool DyldNeedsEnvVariable() {
+  // Although sanitizer support was added to LLVM on OS X 10.7+, GCC users
+  // still may want use them on older systems. On older Darwin platforms, dyld
+  // doesn't export dyldVersionNumber symbol and we simply return true.
+  if (!&dyldVersionNumber) return true;
   // If running on OS X 10.11+ or iOS 9.0+, dyld will interpose even if
   // DYLD_INSERT_LIBRARIES is not set. However, checking OS version via
   // GetMacosVersion() doesn't work for the simulator. Let's instead check


Re: [PATCH][PR sanitizer/70624] Fix 'dyld: Symbol not found: _dyldVersionNumber' link error on old Darwin systems.

2016-04-21 Thread Jakub Jelinek
On Thu, Apr 21, 2016 at 02:52:12PM +0300, Maxim Ostapenko wrote:
> Hi,
> 
> On older Darwin systems (in particular, Darwin 10), dyld doesn't export
> '_dyldVersionNumber' symbol so we would have 'undefined reference' error in
> sanitizer library. We can mitigate this by introducing weak reference to
> '_dyldVersionNumber' and bailing out if &dyldVersionNumber == 0.
> 
> Tested by Dominique on x86_64-apple-darwin10 and x86_64-apple-darwin15 (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70624#c8).
> Ok for mainline? If yes, should I back port this patch to gcc-{4.9, 5,
> 6}-branch?
> 
> -Maxim

> libsanitizer/ChangeLog:
> 
> 2016-04-21  Maxim Ostapenko  
> 
>   PR sanitizer/70624
>   * asan/asan_mac.cc: Cherry pick upstream r266868.

Ok, for all branches, including 6.1 (if you commit it today).
Thanks.

> diff --git a/libsanitizer/asan/asan_mac.cc b/libsanitizer/asan/asan_mac.cc
> index 20e37ff..ab3c656 100644
> --- a/libsanitizer/asan/asan_mac.cc
> +++ b/libsanitizer/asan/asan_mac.cc
> @@ -97,10 +97,14 @@ void DisableReexec() {
>reexec_disabled = true;
>  }
>  
> -extern "C" double dyldVersionNumber;
> +extern "C" SANITIZER_WEAK_ATTRIBUTE double dyldVersionNumber;
>  static const double kMinDyldVersionWithAutoInterposition = 360.0;
>  
>  bool DyldNeedsEnvVariable() {
> +  // Although sanitizer support was added to LLVM on OS X 10.7+, GCC users
> +  // still may want use them on older systems. On older Darwin platforms, dyld
> +  // doesn't export dyldVersionNumber symbol and we simply return true.
> +  if (!&dyldVersionNumber) return true;
>// If running on OS X 10.11+ or iOS 9.0+, dyld will interpose even if
>// DYLD_INSERT_LIBRARIES is not set. However, checking OS version via
>// GetMacosVersion() doesn't work for the simulator. Let's instead check


Jakub


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 3:18 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 9:42 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 9:37 AM, Uros Bizjak  wrote:
>>> On Wed, Apr 20, 2016 at 9:53 PM, H.J. Lu  wrote:
>>>> Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
>>>> standard AVX2 constants and all 1s in XImode is standard AVX512F constants,
>>>> pass mode to standard_sse_constant_p and standard_sse_constant_opcode
>>>> to check if all 1s is available for target.
>>>>
>>>> Tested on Linux/x86-64.  OK for master?
>>>
>>> No.
>>>
>>> This patch should use "isa" attribute instead of adding even more
>>> similar patterns. Also, please leave MEM_P checks, the rare C->m move
>>> can be easily resolved by IRA.
>>
>> Actually, register_operand checks are indeed better, please disregard
>> MEM_P recommendation.
>
> So, something like attached untested RFC proto-patch, that lacks
> wide-int handling.
>
> Uros.

+
+  else if (CONST_INT_P (x))
+{
+  if (INTVAL (X) == HOST_WIDE_INT_M1
+  && TARGET_SSE2)
+ return 2;
+}
+  else if (CONST_WIDE_INT_P (x))
+{
+  if ( something involving wi::minus-one 
+  && TARGET_AVX2)
+ return 2;
+  if (
+  && TARGET_AVX512F)
+ return 2;
+}
+  else if (vector_all_ones_operand (x, mode))

All 1s may not use wide_int.  It has VOIDmode.
The mode is passed by

@@ -18758,7 +18771,7 @@ ix86_expand_vector_move (machine_mode mode,
rtx operands[])
   && (CONSTANT_P (op1)
   || (SUBREG_P (op1)
   && CONSTANT_P (SUBREG_REG (op1))))
-  && !standard_sse_constant_p (op1))
+  && !standard_sse_constant_p (op1, mode))
 op1 = validize_mem (force_const_mem (mode, op1));

This is why I have

-standard_sse_constant_p (rtx x)
+standard_sse_constant_p (rtx x, machine_mode mode)
 {
-  machine_mode mode;
-
   if (!TARGET_SSE)
 return 0;

-  mode = GET_MODE (x);
-
+  if (mode == VOIDmode)
+mode = GET_MODE (x);
+

since all 1s `x' may have VOIDmode when called from
ix86_expand_vector_move if mode isn't passed.
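
To make the point concrete, here is a toy model of the fallback. The enum and
struct are hypothetical miniatures, not GCC's real types; the only fact taken
from GCC is that integer constants carry VOIDmode, so the caller's mode is the
sole source of width information for them.

```c
/* Hypothetical miniature of machine modes and constants.  */
enum mode { VOIDmode, TImode, OImode, XImode };

struct constant
{
  enum mode mode;   /* integer constants carry VOIDmode */
  long long value;
};

/* Prefer the mode supplied by the caller; fall back to the constant's own
   mode only when the caller passed VOIDmode.  */
static enum mode
effective_mode (const struct constant *x, enum mode mode)
{
  if (mode == VOIDmode)
    mode = x->mode;
  return mode;
}
```

With an all-ones integer constant and no mode passed in, the result stays
VOIDmode, which is the case the extra parameter is meant to avoid.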

-- 
H.J.


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 1:54 PM, H.J. Lu  wrote:
> On Thu, Apr 21, 2016 at 3:18 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 9:42 AM, Uros Bizjak  wrote:
>>> On Thu, Apr 21, 2016 at 9:37 AM, Uros Bizjak  wrote:
>>>> On Wed, Apr 20, 2016 at 9:53 PM, H.J. Lu  wrote:
>>>>> Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
>>>>> standard AVX2 constants and all 1s in XImode is standard AVX512F
>>>>> constants,
>>>>> pass mode to standard_sse_constant_p and standard_sse_constant_opcode
>>>>> to check if all 1s is available for target.
>>>>>
>>>>> Tested on Linux/x86-64.  OK for master?
>>>>
>>>> No.
>>>>
>>>> This patch should use "isa" attribute instead of adding even more
>>>> similar patterns. Also, please leave MEM_P checks, the rare C->m move
>>>> can be easily resolved by IRA.
>>>
>>> Actually, register_operand checks are indeed better, please disregard
>>> MEM_P recommendation.
>>
>> So, something like attached untested RFC proto-patch, that lacks
>> wide-int handling.
>>
>> Uros.
>
> +
> +  else if (CONST_INT_P (x))
> +{
> +  if (INTVAL (X) == HOST_WIDE_INT_M1
> +  && TARGET_SSE2)
> + return 2;
> +}
> +  else if (CONST_WIDE_INT_P (x))
> +{
> +  if ( something involving wi::minus-one 
> +  && TARGET_AVX2)
> + return 2;
> +  if (
> +  && TARGET_AVX512F)
> + return 2;
> +}
> +  else if (vector_all_ones_operand (x, mode))
>
> All 1s may not use wide_int.  It has VOIDmode.
> The mode is passed by
>
> @@ -18758,7 +18771,7 @@ ix86_expand_vector_move (machine_mode mode,
> rtx operands[])
>&& (CONSTANT_P (op1)
>|| (SUBREG_P (op1)
>&& CONSTANT_P (SUBREG_REG (op1))))
> -  && !standard_sse_constant_p (op1))
> +  && !standard_sse_constant_p (op1, mode))
>  op1 = validize_mem (force_const_mem (mode, op1));
>
> This is why I have
>
> -standard_sse_constant_p (rtx x)
> +standard_sse_constant_p (rtx x, machine_mode mode)
>  {
> -  machine_mode mode;
> -
>if (!TARGET_SSE)
>  return 0;
>
> -  mode = GET_MODE (x);
> -
> +  if (mode == VOIDmode)
> +mode = GET_MODE (x);
> +
>
> since all 1s `x' may have VOIDmode when called from
> ix86_expand_vector_move if mode isn't passed.

We know that const_int (-1) is allowed with TARGET_SSE2 and that
const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
have to check AVX512F in standard_sse_constant_p, as it implies
TARGET_AVX2.

As said, it is the job of insn mode attributes to emit correct instruction.

Based on the above observations, mode checks for -1 are not needed in
standard_sse_constant_p.
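
A rough sketch of the check argued for above, with no mode consulted at all.
The enum, struct and function names are hypothetical stand-ins for the RTL
constant kinds and target flags; the return value 2 follows the convention
used in the proto-patch quoted earlier.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RTL constant kinds and target flags.  */
enum const_kind { KIND_CONST_INT, KIND_CONST_WIDE_INT, KIND_OTHER };

struct target
{
  bool sse2;
  bool avx2;   /* AVX512F implies this, so it needs no separate flag */
};

/* Return 2 for an all-ones integer constant the ISA can materialise,
   0 otherwise.  Only the constant kind and the ISA are inspected.  */
static int
all_ones_constant_p (enum const_kind kind, bool is_minus_one,
                     const struct target *t)
{
  if (!is_minus_one)
    return 0;
  if (kind == KIND_CONST_INT && t->sse2)
    return 2;
  if (kind == KIND_CONST_WIDE_INT && t->avx2)
    return 2;
  return 0;
}
```

Choosing the actual instruction for a given width is then left to the insn
patterns and their mode attributes, as stated above.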

Uros.


[PATCH, www] Fix typo in htdocs/develop.html

2016-04-21 Thread Kirill Yukhin
Hello,
This looks like a typo to me.

  GCC 6 Stage 4 (starts 2016-01-20)GCC 5.3 release (2015-12-04)
   |
   +-- GCC 5 branch created +
   | \
   v  v

Patch at the bottom. Is it ok to install?

--
Thanks, K

Index: develop.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v
retrieving revision 1.162
diff -u -r1.162 develop.html
--- develop.html15 Apr 2016 15:30:19 -  1.162
+++ develop.html21 Apr 2016 12:16:24 -
@@ -571,7 +571,7 @@
|   v
   GCC 6 Stage 4 (starts 2016-01-20)GCC 5.3 release (2015-12-04)
|
-   +-- GCC 5 branch created +
+   +-- GCC 6 branch created +
| \
v  v
   GCC 7 Stage 1 (starts 2016-04-15)



[PATCH] Fix PR70725 (followup to Mareks patch)

2016-04-21 Thread Richard Biener

The following fixes the follow-up ICEs in the testcase for PR70725,
where Marek's patch only fixed the first one.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Jakub - the patch should be safe in that any testcase running into
the changed paths now would have caused SSA verification failures without
the patch.  There's still the question what to do for GCC 6.1 - we
can revert Marek's patch or apply this one on top.  I don't have a
strong opinion here.

Richard.

2016-04-21  Richard Biener  

PR tree-optimization/70725
* tree-if-conv.c (if_convertible_phi_p): Adjust guard
for phi_convertible_by_degenerating_args.
(predicate_all_scalar_phis): Handle single-argument PHIs.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 235305)
--- gcc/tree-if-conv.c  (working copy)
*** if_convertible_phi_p (struct loop *loop,
*** 659,665 
  
if (bb != loop->header)
  {
!   if (gimple_phi_num_args (phi) != 2
  && !aggressive_if_conv
  && !phi_convertible_by_degenerating_args (phi))
{
--- 659,665 
  
if (bb != loop->header)
  {
!   if (gimple_phi_num_args (phi) > 2
  && !aggressive_if_conv
  && !phi_convertible_by_degenerating_args (phi))
{
*** predicate_all_scalar_phis (struct loop *
*** 1911,1930 
if (bb == loop->header)
continue;
  
-   if (EDGE_COUNT (bb->preds) == 1)
-   continue;
- 
phi_gsi = gsi_start_phis (bb);
if (gsi_end_p (phi_gsi))
continue;
  
!   gsi = gsi_after_labels (bb);
!   while (!gsi_end_p (phi_gsi))
{
! phi = phi_gsi.phi ();
! predicate_scalar_phi (phi, &gsi);
! release_phi_node (phi);
! gsi_next (&phi_gsi);
}
  
set_phi_nodes (bb, NULL);
--- 1911,1941 
if (bb == loop->header)
continue;
  
phi_gsi = gsi_start_phis (bb);
if (gsi_end_p (phi_gsi))
continue;
  
!   if (EDGE_COUNT (bb->preds) == 1)
{
! /* Propagate degenerate PHIs.  */
! for (phi_gsi = gsi_start_phis (bb); !gsi_end_p (phi_gsi);
!  gsi_next (&phi_gsi))
!   {
! gphi *phi = phi_gsi.phi ();
! replace_uses_by (gimple_phi_result (phi),
!  gimple_phi_arg_def (phi, 0));
!   }
!   }
!   else
!   {
! gsi = gsi_after_labels (bb);
! while (!gsi_end_p (phi_gsi))
!   {
! phi = phi_gsi.phi ();
! predicate_scalar_phi (phi, &gsi);
! release_phi_node (phi);
! gsi_next (&phi_gsi);
!   }
}
  
set_phi_nodes (bb, NULL);


Re: [PATCH] libffi testsuite: Use split to ensure valid tcl list

2016-04-21 Thread Thomas Schwinge
Hi!

Jakub, ping for gcc-6-branch (or, are you seeing useful libffi testing
results there?); I understand Mike's emails to mean that he's approved
the patch for trunk, so I'll commit it there, soon.

I also filed this as ; no
reaction so far.

On Thu, 25 Feb 2016 20:10:18 +0100, I wrote:
> Already had noticed something odd here months ago; now finally looked
> into it...
> 
> On Sat, 28 Mar 2015 13:59:30 -0400, John David Anglin  
> wrote:
> > The attached change fixes tcl errors that occur running the complex.exp and 
> > go.exp test sets.
> > See: .
> > 
> > Tested on hppa2.0w-hp-hpux11.11.  Okay for trunk?
> 
> (Got approved, and installed as r221765.)
> 
> > 2015-03-28  John David Anglin  
> > 
> > PR libffi/65567
> > * testsuite/lib/libffi.exp (libffi_feature_test): Use split to ensure
> > lindex is applied to a list.
> > 
> > Index: testsuite/lib/libffi.exp
> > ===
> > --- testsuite/lib/libffi.exp(revision 221591)
> > +++ testsuite/lib/libffi.exp(working copy)
> > @@ -238,7 +239,7 @@
> >  set lines [libffi_target_compile $src "" "preprocess" ""]
> >  file delete $src
> >  
> > -set last [lindex $lines end]
> > +set last [lindex [split $lines] end]
> >  return [regexp -- "xyzzy" $last]
> >  }
> 
> On my several systems, this has the effect that any user of
> libffi_feature_test has their test results regress from PASS to
> UNSUPPORTED.  Apparently the regexp xyzzy matching doesn't work as
> intended.  If I revert your patch, it's OK for me -- but still not for
> you, I suppose.  ;-)
> 
> How about the following instead?  It's conceptually simpler (and similar
> to what other such tests are doing), works for me -- but can you also
> please test this?
> 
> --- libffi/testsuite/lib/libffi.exp
> +++ libffi/testsuite/lib/libffi.exp
> @@ -227,20 +227,21 @@ proc libffi_target_compile { source dest type options } {
>  
>  # TEST should be a preprocessor condition.  Returns true if it holds.
>  proc libffi_feature_test { test } {
> -set src "ffitest.c"
> +set src "ffitest[pid].c"
>  
>  set f [open $src "w"]
>  puts $f "#include "
>  puts $f $test
> -puts $f "xyzzy"
> +puts $f "/* OK */"
> +puts $f "#else"
> +puts $f "# error Failed $test"
>  puts $f "#endif"
>  close $f
>  
> -set lines [libffi_target_compile $src "" "preprocess" ""]
> +set lines [libffi_target_compile $src /dev/null assembly ""]
>  file delete $src
>  
> -set last [lindex [split $lines] end]
> -return [regexp -- "xyzzy" $last]
> +return [string match "" $lines]
>  }
>  
>  # Utility routines.


Grüße
 Thomas




[PATCH][RFC] Gimplify "into SSA"

2016-04-21 Thread Richard Biener

The following patch makes us not allocate decls but SSA names for
temporaries required during gimplification.  This is basically the
same thing as we do when calling the gimplifier on GENERIC expressions
from optimization passes (when we are already in SSA).

There are two benefits of doing this.

1) SSA names are smaller (72 bytes) than VAR_DECLs (144 bytes) and we
rewrite them into anonymous SSA names later anyway, leaving the
VAR_DECLs for GC to reclaim (but not their UID)

2) We keep expressions "connected" by having the use->def link via
SSA_NAME_DEF_STMT for example allowing match-and-simplify of
larger expressions on early GIMPLE

Complications arise from the fact that there is no CFG built and thus
we have to make sure to not use SSA names where we'd need PHIs.  Or
when CFG build may end up separating SSA def and use in a way current
into-SSA doesn't fix up (adding of abnormal edges, save-expr placement,
gimplification of type sizes, etc.).

As-is the patch has the downside of effectively disabling the
lookup_tmp_var () CSE for register typed temporaries and not
preserving the "fancy" names we derive from val in
create_tmp_from_val (that can be recovered easily though if
deemed worthwhile).

On dwarf2out.c (with -O0) the .gimple dumps show a 10% reduction
(~12000) of allocated DECLs (when looking at the highest D.12345).
The first two hunks of difference in the dump look like this (to give you
an idea):

--- t.ii.004t.gimple2016-04-21 14:21:00.382647093 +0200
+++ t.ii.004t.gimple.patched2016-04-21 14:16:00.395261019 +0200
@@ -2,9 +2,7 @@
 int VALGRIND_PRINTF(const char*, ...) (const char * format)
 {
   long long unsigned int retval.0;
-  long long unsigned int format.1;
-  long long unsigned int vargs.2;
-  int D.102545;
+  int D.102543;
   long unsigned int _qzz_res;
   struct  vargs[1];
 
@@ -16,10 +14,10 @@
 volatile long long unsigned int _zzq_result;
 
 _zzq_args[0] = 5123;
-format.1 = (long long unsigned int) format;
-_zzq_args[1] = format.1;
-vargs.2 = (long long unsigned int) &vargs;
-_zzq_args[2] = vargs.2;
+_1 = (long long unsigned int) format;
+_zzq_args[1] = _1;
+_2 = (long long unsigned int) &vargs;
+_zzq_args[2] = _2;
 _zzq_args[3] = 0;
 _zzq_args[4] = 0;
 _zzq_args[5] = 0;

you can see how we derived the fancy name vargs.2 from &vargs for
example.  As said, if people prefer that's easy to preserve and
would then look like

+format_1 = (long long unsigned int) format;
+_zzq_args[1] = format_1;
+vargs_2 = (long long unsigned int) &vargs;
+_zzq_args[2] = vargs_2;

not sure if that isn't more confusing (given that 'format' may
be later written into SSA form and thus format_3 may appear).
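
For readers who want to reproduce this kind of dump, a source fragment of
roughly the following shape should need gimplifier temporaries for the
converted values before they are stored. The function and parameter names are
hypothetical (this is not the Valgrind macro itself), and the exact temporaries
that appear will depend on the compiler version and on the casts used; the
dump can be inspected with -fdump-tree-gimple.

```c
#include <stdint.h>

/* Hypothetical reduction of the function shown in the dump above.  */
void
fill_request (unsigned long long args[3], const char *format, void *vargs)
{
  args[0] = 5123;
  /* Each conversion below is expected to go through a temporary before
     the store: a decl such as format.1 without the patch, an anonymous
     SSA name such as _1 with it.  */
  args[1] = (unsigned long long) (uintptr_t) format;
  args[2] = (unsigned long long) (uintptr_t) vargs;
}
```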

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

ISTR some complications in pre-SSA OMP stuff; possibly some
adjustments are still needed there, but my last full testing
pre-dates a lot of development there.

Any comments or objections?
I'd like to go forward with this once GCC 6.1 is released.

Thanks,
Richard.

2016-04-21  Richard Biener  

* gimplify.h (get_initialized_tmp_var): Add allow_ssa parameter
default true.
* gimplify.c (internal_get_tmp_var): Add allow_ssa parameter
and override into_ssa with it.
(get_formal_tmp_var): Adjust.
(get_initialized_tmp_var): Add allow_ssa parameter.
(gimplify_call_expr): If the call may return twice do not
gimplify parameters into SSA.
(prepare_gimple_addressable): Do not allow an SSA name as
temporary.
(gimplify_modify_expr): Adjust assert.
(gimplify_save_expr): Do not allow an SSA name as save-expr
result.
(gimplify_one_sizepos): Set into_ssa to false around
gimplifying a type/decl size.
(gimplify_body): Init GIMPLE SSA data structures and gimplify
into-SSA.
* passes.def (pass_init_datastructures): Remove.
* tree-into-ssa.c (mark_def_sites): Ignore existing SSA names.
(rewrite_stmt): Likewise.
* tree-inline.c (initialize_cfun): Properly transfer SSA state.
(replace_locals_op): Replace SSA names.
(copy_gimple_seq_and_replace_locals): Init src_cfun.
* gimple-low.c (lower_builtin_setjmp): Deal with SSA.
* cgraph.c (release_function_body): Free CFG annotations only
when we have a CFG.  Simplify.
* gimple-fold.c (gimplify_and_update_call_from_tree): Use
force_gimple_operand instead of get_initialized_tmp_var.
* tree-pass.h (make_pass_init_datastructures): Remove.
* tree-ssa.c (execute_init_datastructures): Remove.
(pass_data_init_datastructures): Likewise.
(class pass_init_datastructures): Likewise.
(make_pass_init_datastructures): Likewise.

Index: gcc/gimplify.c
===
*** gcc/gimplify.c.orig 2016-04-

Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 5:15 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 1:54 PM, H.J. Lu  wrote:
>> On Thu, Apr 21, 2016 at 3:18 AM, Uros Bizjak  wrote:
>>> On Thu, Apr 21, 2016 at 9:42 AM, Uros Bizjak  wrote:
>>>> On Thu, Apr 21, 2016 at 9:37 AM, Uros Bizjak  wrote:
>>>>> On Wed, Apr 20, 2016 at 9:53 PM, H.J. Lu  wrote:
>>>>>> Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
>>>>>> standard AVX2 constants and all 1s in XImode is standard AVX512F
>>>>>> constants,
>>>>>> pass mode to standard_sse_constant_p and standard_sse_constant_opcode
>>>>>> to check if all 1s is available for target.
>>>>>>
>>>>>> Tested on Linux/x86-64.  OK for master?
>>>>>
>>>>> No.
>>>>>
>>>>> This patch should use "isa" attribute instead of adding even more
>>>>> similar patterns. Also, please leave MEM_P checks, the rare C->m move
>>>>> can be easily resolved by IRA.
>>>>
>>>> Actually, register_operand checks are indeed better, please disregard
>>>> MEM_P recommendation.
>>>
>>> So, something like attached untested RFC proto-patch, that lacks
>>> wide-int handling.
>>>
>>> Uros.
>>
>> +
>> +  else if (CONST_INT_P (x))
>> +{
>> +  if (INTVAL (x) == HOST_WIDE_INT_M1
>> +  && TARGET_SSE2)
>> + return 2;
>> +}
>> +  else if (CONST_WIDE_INT_P (x))
>> +{
>> +  if ( something involving wi::minus-one 
>> +  && TARGET_AVX2)
>> + return 2;
>> +  if (
>> +  && TARGET_AVX512F)
>> + return 2;
>> +}
>> +  else if (vector_all_ones_operand (x, mode))
>>
>> All 1s may not use wide_int.  It has VOIDmode.
>> The mode is passed by
>>
>> @@ -18758,7 +18771,7 @@ ix86_expand_vector_move (machine_mode mode,
>> rtx operands[])
>>&& (CONSTANT_P (op1)
>>|| (SUBREG_P (op1)
>>&& CONSTANT_P (SUBREG_REG (op1))))
>> -  && !standard_sse_constant_p (op1))
>> +  && !standard_sse_constant_p (op1, mode))
>>  op1 = validize_mem (force_const_mem (mode, op1));
>>
>> This is why I have
>>
>> -standard_sse_constant_p (rtx x)
>> +standard_sse_constant_p (rtx x, machine_mode mode)
>>  {
>> -  machine_mode mode;
>> -
>>if (!TARGET_SSE)
>>  return 0;
>>
>> -  mode = GET_MODE (x);
>> -
>> +  if (mode == VOIDmode)
>> +mode = GET_MODE (x);
>> +
>>
>> since all 1s `x' may have VOIDmode when called from
>> ix86_expand_vector_move if mode isn't passed.
>
> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
> have to check AVX512F in standard_sse_constant_p, as it implies
> TARGET_AVX2.
>
> As said, it is the job of insn mode attributes to emit correct instruction.
>
> Based on the above observations, mode checks for -1 are not needed in
> standard_sse_constant_p.

void
ix86_expand_vector_move (machine_mode mode, rtx operands[])
{
  rtx op0 = operands[0], op1 = operands[1];
  /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
 psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
  unsigned int align = (TARGET_IAMCU
? GET_MODE_BITSIZE (mode)
: GET_MODE_ALIGNMENT (mode));

  if (push_operand (op0, VOIDmode))
op0 = emit_move_resolve_push (mode, op0);

  /* Force constants other than zero into memory.  We do not know how
 the instructions used to build constants modify the upper 64 bits
 of the register, once we have that information we may be able
 to handle some of them more efficiently.  */
  if (can_create_pseudo_p ()
  && register_operand (op0, mode)
  && (CONSTANT_P (op1)
  || (SUBREG_P (op1)
  && CONSTANT_P (SUBREG_REG (op1))))
  && !standard_sse_constant_p (op1))
^

What should it return for  op1 == (VOIDmode) -1 when
TARGET_AVX is true and TARGET_AVX2 is false for
mode == TImode and mode == OImode?

op1 = validize_mem (force_const_mem (mode, op1));


-- 
H.J.


Re: [PATCH] Fix PR70725 (followup to Mareks patch)

2016-04-21 Thread Jakub Jelinek
On Thu, Apr 21, 2016 at 02:20:34PM +0200, Richard Biener wrote:
> 
> The following fixes the followup ICEs in the testcase for PR70725
> where Marek's patch only fixed the first one.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Jakub - the patch should be safe in that any testcase running into
> the changed paths now would have caused SSA verification failures without
> the patch.  There's still the question what to do for GCC 6.1 - we
> can revert Marek's patch or apply this one on top.  I don't have a
> strong opinion here.

I think reverting Marek's patch for 6.1 and reapplying it for 6.2 together
with this would be safest, but am not strongly against just applying this to
6.1 either.

> 2016-04-21  Richard Biener  
> 
>   PR tree-optimization/70725
>   * tree-if-conv.c (if_convertible_phi_p): Adjust guard
>   for phi_convertible_by_degenerating_args.
>   (predicate_all_scalar_phis): Handle single-argument PHIs.

Jakub


Re: [PATCH] libffi testsuite: Use split to ensure valid tcl list

2016-04-21 Thread Jakub Jelinek
On Thu, Apr 21, 2016 at 02:21:07PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> Jakub, ping for gcc-6-branch (or, are you seeing useful libffi testing
> results there?); I understand Mike's emails to mean that he's approved
> the patch for trunk, so I'll commit it there, soon.
> 
> I also filed this as ; no
> reaction so far.

This IMNSHO isn't release critical, I'm not against applying it for 6.2, but
for 6.1 we should only get release critical stuff in now.

Jakub


[PATCH][wwwdocs] Add deprecation of pre-ARMv4T architectures to the release notes

2016-04-21 Thread Kyrill Tkachov

Hi all,

This patch lists the -mcpu and -march values that are deprecated for GCC 6.
Joel indicated that it would be useful to enumerate them all.

Ok to commit?

Thanks,
Kyrill
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.63
diff -U 3 -r1.63 changes.html
--- htdocs/gcc-6/changes.html	26 Feb 2016 14:02:21 -0000	1.63
+++ htdocs/gcc-6/changes.html	15 Apr 2016 13:43:56 -0000
@@ -342,9 +342,16 @@
  
Support for revisions of the ARM architecture prior to ARMv4t has
been deprecated and will be removed in a future GCC release.
-   This affects ARM6, ARM7 (but not ARM7TDMI), ARM8, StrongARM, and
-   Faraday fa526 and fa626 devices, which do not have support for
-   the Thumb execution state.
+   The -mcpu and -mtune values that are
+   deprecated are:
+   arm2, arm250, arm3, arm6, arm60, arm600, arm610, arm620, arm7,
+   arm7d, arm7di, arm70, arm700, arm700i, arm710, arm720, arm710c,
+   arm7100, arm7500, arm7500fe, arm7m, arm7dm, arm7dmi, arm8, arm810,
+   strongarm, strongarm110, strongarm1100, strongarm1110, fa526,
+   fa626.  The value
+   arm7tdmi is still supported.
+   The values of -march that are deprecated are:
+   armv2,armv2a,armv3,armv3m,armv4.
  
  
The ARM port now supports target attributes and pragmas.  Please


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 2:59 PM, H.J. Lu  wrote:

>> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
>> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
>> have to check AVX512F in standard_sse_constant_p, as it implies
>> TARGET_AVX2.
>>
>> As said, it is the job of insn mode attributes to emit correct instruction.
>>
>> Based on the above observations, mode checks for -1 are not needed in
>> standard_sse_constant_p.
>
> void
> ix86_expand_vector_move (machine_mode mode, rtx operands[])
> {
>   rtx op0 = operands[0], op1 = operands[1];
>   /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
>  psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
>   unsigned int align = (TARGET_IAMCU
> ? GET_MODE_BITSIZE (mode)
> : GET_MODE_ALIGNMENT (mode));
>
>   if (push_operand (op0, VOIDmode))
> op0 = emit_move_resolve_push (mode, op0);
>
>   /* Force constants other than zero into memory.  We do not know how
>  the instructions used to build constants modify the upper 64 bits
>  of the register, once we have that information we may be able
>  to handle some of them more efficiently.  */
>   if (can_create_pseudo_p ()
>   && register_operand (op0, mode)
>   && (CONSTANT_P (op1)
>   || (SUBREG_P (op1)
>   && CONSTANT_P (SUBREG_REG (op1))))
>   && !standard_sse_constant_p (op1))
> ^
>
> What should it return for  op1 == (VOIDmode) -1 when
> TARGET_AVX is true and TARGET_AVX2 is false for
> mode == TImode and mode == OImode?
>
> op1 = validize_mem (force_const_mem (mode, op1));

Let me rethink and redesign this whole mess, so we will have
consistent predicates.

Uros.


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 6:33 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 2:59 PM, H.J. Lu  wrote:
>
>>> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
>>> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
>>> have to check AVX512F in standard_sse_constant_p, as it implies
>>> TARGET_AVX2.
>>>
>>> As said, it is the job of insn mode attributes to emit correct instruction.
>>>
>>> Based on the above observations, mode checks for -1 are not needed in
>>> standard_sse_constant_p.
>>
>> void
>> ix86_expand_vector_move (machine_mode mode, rtx operands[])
>> {
>>   rtx op0 = operands[0], op1 = operands[1];
>>   /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
>>  psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
>>   unsigned int align = (TARGET_IAMCU
>> ? GET_MODE_BITSIZE (mode)
>> : GET_MODE_ALIGNMENT (mode));
>>
>>   if (push_operand (op0, VOIDmode))
>> op0 = emit_move_resolve_push (mode, op0);
>>
>>   /* Force constants other than zero into memory.  We do not know how
>>  the instructions used to build constants modify the upper 64 bits
>>  of the register, once we have that information we may be able
>>  to handle some of them more efficiently.  */
>>   if (can_create_pseudo_p ()
>>   && register_operand (op0, mode)
>>   && (CONSTANT_P (op1)
>>   || (SUBREG_P (op1)
>>   && CONSTANT_P (SUBREG_REG (op1))))
>>   && !standard_sse_constant_p (op1))
>> ^
>>
>> What should it return for  op1 == (VOIDmode) -1 when
>> TARGET_AVX is true and TARGET_AVX2 is false for
>> mode == TImode and mode == OImode?
>>
>> op1 = validize_mem (force_const_mem (mode, op1));
>
> Let me rethink and redesign this whole mess, so we will have
> consistent predicates.

The problem is that -1 has no mode.  We can't tell
if -1 is a valid SSE constant without a mode.  That is what my
change to standard_sse_constant_p and
ix86_expand_vector_move is for.  It is sufficient for
all my tests, including benchmark runs.
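
To make the point concrete, here is a minimal standalone sketch, not the
actual i386.c code.  Every toy_* name is a hypothetical stand-in: the
enum models machine modes, the struct models a constant rtx, and the
flags model TARGET_SSE2, TARGET_AVX2 and TARGET_AVX512F.  It only
illustrates why the caller has to supply the mode for a mode-less -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's machine modes; not the real enum.  */
enum toy_mode { TOY_VOIDmode, TOY_TImode, TOY_OImode, TOY_XImode };

/* Hypothetical stand-in for a constant rtx.  An integer all-ones
   constant carries TOY_VOIDmode, like (const_int -1).  */
struct toy_const { long long value; enum toy_mode mode; };

/* Hypothetical stand-ins for TARGET_SSE2, TARGET_AVX2, TARGET_AVX512F.  */
struct toy_target { bool sse2, avx2, avx512f; };

/* Return 2 if X is all 1s and loadable in MODE on TARGET, else 0.
   If MODE is TOY_VOIDmode, fall back to the mode X itself carries.  */
static int
toy_standard_sse_constant_p (const struct toy_target *target,
                             struct toy_const x, enum toy_mode mode)
{
  assert (target != NULL);

  if (mode == TOY_VOIDmode)
    mode = x.mode;

  if (x.value != -1)
    return 0;

  switch (mode)
    {
    case TOY_TImode:
      return target->sse2 ? 2 : 0;
    case TOY_OImode:
      return target->avx2 ? 2 : 0;
    case TOY_XImode:
      return target->avx512f ? 2 : 0;
    default:
      /* Mode unknown: cannot claim the constant is standard.  */
      return 0;
    }
}
```

In this sketch, a target with SSE2 but without AVX2 accepts the same
all-ones constant for TImode and rejects it for OImode; when no mode is
passed at all, the only safe answer is 0.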


-- 
H.J.


Re: gomp_target_fini

2016-04-21 Thread Alexander Monakov
On Tue, 19 Apr 2016, Jakub Jelinek wrote:
> On Tue, Apr 19, 2016 at 04:01:06PM +0200, Thomas Schwinge wrote:
> > Two other solutions have been proposed in the past months: Chung-Lin's
> > patches with subject: "Adjust offload plugin interface for avoiding
> > deadlock on exit", later: "Resolve libgomp plugin deadlock on exit",
> > later: "Resolve deadlock on plugin exit" (still pending review/approval),
> > and Alexander's much smaller patch with subject: "libgomp plugin: make
> > cuMemFreeHost error non-fatal",
> > .
> > (Both of which I have not reviewed in detail.)  Assuming that Chung-Lin's
> > patches are considered too invasive for gcc-6-branch, can we at least get
> > Alexander's patch committed to gcc-6-branch as well as on trunk, please?
> 
> Yeah, Alex' patch is IMHO fine, even for gcc-6-branch.

Applied to both.

Thanks.
Alexander


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 3:43 PM, H.J. Lu  wrote:
> On Thu, Apr 21, 2016 at 6:33 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 2:59 PM, H.J. Lu  wrote:
>>
>>>> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
>>>> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
>>>> have to check AVX512F in standard_sse_constant_p, as it implies
>>>> TARGET_AVX2.
>>>>
>>>> As said, it is the job of insn mode attributes to emit correct instruction.
>>>>
>>>> Based on the above observations, mode checks for -1 are not needed in
>>>> standard_sse_constant_p.
>>>
>>> void
>>> ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>> {
>>>   rtx op0 = operands[0], op1 = operands[1];
>>>   /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
>>>  psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
>>>   unsigned int align = (TARGET_IAMCU
>>> ? GET_MODE_BITSIZE (mode)
>>> : GET_MODE_ALIGNMENT (mode));
>>>
>>>   if (push_operand (op0, VOIDmode))
>>> op0 = emit_move_resolve_push (mode, op0);
>>>
>>>   /* Force constants other than zero into memory.  We do not know how
>>>  the instructions used to build constants modify the upper 64 bits
>>>  of the register, once we have that information we may be able
>>>  to handle some of them more efficiently.  */
>>>   if (can_create_pseudo_p ()
>>>   && register_operand (op0, mode)
>>>   && (CONSTANT_P (op1)
>>>   || (SUBREG_P (op1)
>>>   && CONSTANT_P (SUBREG_REG (op1))))
>>>   && !standard_sse_constant_p (op1))
>>> ^
>>>
>>> What should it return for  op1 == (VOIDmode) -1 when
>>> TARGET_AVX is true and TARGET_AVX2 is false for
>>> mode == TImode and mode == OImode?
>>>
>>> op1 = validize_mem (force_const_mem (mode, op1));
>>
>> Let me rethink and redesign this whole mess, so we will have
>> consistent predicates.
>
> The problem is because -1 has no mode.  We can't tell
> if -1 is a valid SSE constant without mode.  That is my
> change to standard_sse_constant_p and
> ix86_expand_vector_move is for.   It is sufficient for
> all my tests, including benchmark runs.

I'm not against mode checks, but IMO, we have to do these checks in
predicates, where we know operand mode.

Uros.


Re: [PATCH][wwwdocs] Add deprecation of pre-ARMv4T architectures to the release notes

2016-04-21 Thread Gerald Pfeifer

On Thu, 21 Apr 2016, Kyrill Tkachov wrote:
> This patch lists the -mcpu and -march values that are deprecated for
> GCC 6. Joel indicated that it would be useful to enumerate them all.

Yes, this is very useful.  Thank you!

> Ok to commit?


Yep from my side.  (Perhaps a native speaker can help us whether
"value" is the best word here...  Joel?)

Gerald


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 6:48 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 3:43 PM, H.J. Lu  wrote:
>> On Thu, Apr 21, 2016 at 6:33 AM, Uros Bizjak  wrote:
>>> On Thu, Apr 21, 2016 at 2:59 PM, H.J. Lu  wrote:
>>>
>>>>> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
>>>>> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
>>>>> have to check AVX512F in standard_sse_constant_p, as it implies
>>>>> TARGET_AVX2.
>>>>>
>>>>> As said, it is the job of insn mode attributes to emit correct
>>>>> instruction.
>>>>>
>>>>> Based on the above observations, mode checks for -1 are not needed in
>>>>> standard_sse_constant_p.
>>>>
>>>> void
>>>> ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>>> {
>>>>   rtx op0 = operands[0], op1 = operands[1];
>>>>   /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
>>>>  psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
>>>>   unsigned int align = (TARGET_IAMCU
>>>> ? GET_MODE_BITSIZE (mode)
>>>> : GET_MODE_ALIGNMENT (mode));
>>>>
>>>>   if (push_operand (op0, VOIDmode))
>>>> op0 = emit_move_resolve_push (mode, op0);
>>>>
>>>>   /* Force constants other than zero into memory.  We do not know how
>>>>  the instructions used to build constants modify the upper 64 bits
>>>>  of the register, once we have that information we may be able
>>>>  to handle some of them more efficiently.  */
>>>>   if (can_create_pseudo_p ()
>>>>   && register_operand (op0, mode)
>>>>   && (CONSTANT_P (op1)
>>>>   || (SUBREG_P (op1)
>>>>   && CONSTANT_P (SUBREG_REG (op1))))
>>>>   && !standard_sse_constant_p (op1))
>>>> ^
>>>>
>>>> What should it return for  op1 == (VOIDmode) -1 when
>>>> TARGET_AVX is true and TARGET_AVX2 is false for
>>>> mode == TImode and mode == OImode?
>>>>
>>>> op1 = validize_mem (force_const_mem (mode, op1));
>>>
>>> Let me rethink and redesign this whole mess, so we will have
>>> consistent predicates.
>>
>> The problem is because -1 has no mode.  We can't tell
>> if -1 is a valid SSE constant without mode.  That is my
>> change to standard_sse_constant_p and
>> ix86_expand_vector_move is for.   It is sufficient for
>> all my tests, including benchmark runs.
>
> I'm not against mode checks, but IMO, we have to do these checks in
> predicates, where we know operand mode.

I tried that, and it doesn't work, since the correct mode may not always be
available in predicates.  Yes, they pass mode.  But they just do

mode = GET_MODE (op);

which returns VOIDmode for -1.

-- 
H.J.


Re: C++ PATCH to fix a part of c++/70513 (ICE-on-invalid with enums)

2016-04-21 Thread Jason Merrill

On 04/21/2016 07:35 AM, Marek Polacek wrote:
> + permerror (type_start_token->location,
> +"extra qualification not allowed");
> + type = error_mark_node;

If we're using permerror, we shouldn't set type to error_mark_node; if
we do that, -fpermissive won't make it work.
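
A minimal standalone sketch of that rule, assuming nothing about the
real parser code: toy_permerror, toy_flag_permissive, toy_error_count
and toy_check_enum_type are hypothetical names, and a NULL return would
model error_mark_node.  The diagnostic is issued, but the type stays
usable, so a downgraded diagnostic really does let the code through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the -fpermissive flag and the error count.  */
static bool toy_flag_permissive;
static int toy_error_count;

/* Toy permerror: an error by default, a warning when permissive.  */
static void
toy_permerror (const char *msg)
{
  if (toy_flag_permissive)
    fprintf (stderr, "warning: %s\n", msg);
  else
    {
      fprintf (stderr, "error: %s\n", msg);
      toy_error_count++;
    }
}

/* Diagnose an extra qualification but keep TYPE usable.  Returning NULL
   here (modelling error_mark_node) would make the declaration fail even
   when the diagnostic was only a warning.  */
static const char *
toy_check_enum_type (const char *type, bool extra_qualification)
{
  assert (type != NULL);

  if (extra_qualification)
    toy_permerror ("extra qualification not allowed");

  return type;
}
```

With the permissive flag set, the call only warns and still hands back
the type; without it, the error count goes up, which is what makes the
compilation fail.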


Jason



Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 3:54 PM, H.J. Lu  wrote:
> On Thu, Apr 21, 2016 at 6:48 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 3:43 PM, H.J. Lu  wrote:
>>> On Thu, Apr 21, 2016 at 6:33 AM, Uros Bizjak  wrote:
>>>> On Thu, Apr 21, 2016 at 2:59 PM, H.J. Lu  wrote:
>>>>
>>>>>> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
>>>>>> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
>>>>>> have to check AVX512F in standard_sse_constant_p, as it implies
>>>>>> TARGET_AVX2.
>>>>>>
>>>>>> As said, it is the job of insn mode attributes to emit correct
>>>>>> instruction.
>>>>>>
>>>>>> Based on the above observations, mode checks for -1 are not needed in
>>>>>> standard_sse_constant_p.
>>>>>
>>>>> void
>>>>> ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>>>> {
>>>>>   rtx op0 = operands[0], op1 = operands[1];
>>>>>   /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
>>>>>  psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
>>>>>   unsigned int align = (TARGET_IAMCU
>>>>> ? GET_MODE_BITSIZE (mode)
>>>>> : GET_MODE_ALIGNMENT (mode));
>>>>>
>>>>>   if (push_operand (op0, VOIDmode))
>>>>> op0 = emit_move_resolve_push (mode, op0);
>>>>>
>>>>>   /* Force constants other than zero into memory.  We do not know how
>>>>>  the instructions used to build constants modify the upper 64 bits
>>>>>  of the register, once we have that information we may be able
>>>>>  to handle some of them more efficiently.  */
>>>>>   if (can_create_pseudo_p ()
>>>>>   && register_operand (op0, mode)
>>>>>   && (CONSTANT_P (op1)
>>>>>   || (SUBREG_P (op1)
>>>>>   && CONSTANT_P (SUBREG_REG (op1))))
>>>>>   && !standard_sse_constant_p (op1))
>>>>> ^
>>>>>
>>>>> What should it return for  op1 == (VOIDmode) -1 when
>>>>> TARGET_AVX is true and TARGET_AVX2 is false for
>>>>> mode == TImode and mode == OImode?
>>>>>
>>>>> op1 = validize_mem (force_const_mem (mode, op1));
>>>>
>>>> Let me rethink and redesign this whole mess, so we will have
>>>> consistent predicates.
>>>
>>> The problem is because -1 has no mode.  We can't tell
>>> if -1 is a valid SSE constant without mode.  That is my
>>> change to standard_sse_constant_p and
>>> ix86_expand_vector_move is for.   It is sufficient for
>>> all my tests, including benchmark runs.
>>
>> I'm not against mode checks, but IMO, we have to do these checks in
>> predicates, where we know operand mode.
>
> I tried and it doesn't work since the correct mode may not be always
> available in predicates.  Yes, they pass mode.  But they just do
>
> mode = GET_MODE (op);
>
> which returns VOIDmode for -1.

Well, looking at generated gcc/insns-preds.c, the predicates do:

(mode == VOIDmode || GET_MODE (op) == mode).

They *check* and don't *assign* "mode" variable.

So, I see no problem checking "mode" variable (that gets the value
from the pattern) in the predicates.

Uros.


Re: [PATCH] add support for placing variables in shared memory

2016-04-21 Thread Nathan Sidwell

On 04/20/16 12:58, Alexander Monakov wrote:
> Allow using __attribute__((shared)) to place static variables in '.shared'
> memory space.

What is the rationale for a new attribute, rather than leveraging the existing
section(".shared") machinery?

> +  else if (current_function_decl && !TREE_STATIC (decl))
> +{
> +  error ("%qE attribute only applies to non-stack variables", name);

'non-stack'?  don't you mean 'static' or 'static storage'?

Needs a test case.

nathan


Re: [PATCH] opts-global.c: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.

2016-04-21 Thread Khem Raj
On Thu, Apr 21, 2016 at 3:33 AM, Alexander Monakov  wrote:
> On Wed, 20 Apr 2016, Khem Raj wrote:
>
>> gcc/:
>> 2016-04-16  Khem Raj  
>>
>>   * opts-global.c: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.
>>
>> Fixes build errors e.g.
>>
>> | 
>> ../../../../../../../work-shared/gcc-6.0.0-r0/git/gcc/lto-streamer.h:159:34: 
>> error: 'LAST_AND_UNUSED_GIMPLE_CODE' was not declared in this scope
>> |LTO_bb0 = 1 + MAX_TREE_CODES + LAST_AND_UNUSED_GIMPLE_CODE,
>> ---
>>  gcc/opts-global.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/gcc/opts-global.c b/gcc/opts-global.c
>> index 989ef3d..92fb9ac 100644
>> --- a/gcc/opts-global.c
>> +++ b/gcc/opts-global.c
>> @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "plugin-api.h"
>>  #include "ipa-ref.h"
>>  #include "cgraph.h"
>> +#include "gimple.h"
>>  #include "lto-streamer.h"
>>  #include "output.h"
>>  #include "plugin.h"
>
> The context in this patch looks like old contents of opts-global.c, prior to
> Andrew MacLeod's cleanups in December 2015.

Right. Ignore this patch, I forgot that I had local patches in OpenEmbedded
which should have been forward ported correctly.

> Here's how the includes look in
> today's gcc-6 branch:
>
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> #include "backend.h"
> #include "rtl.h"
> #include "tree.h"
> #include "tree-pass.h"
> #include "diagnostic.h"
> #include "opts.h"
> #include "flags.h"
> #include "langhooks.h"
> #include "dbgcnt.h"
> #include "debug.h"
> #include "output.h"
> #include "plugin.h"
> #include "toplev.h"
> #include "context.h"
> #include "asan.h"
>
> Alexander


Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-04-21 Thread Nathan Sidwell

On 04/20/16 12:59, Alexander Monakov wrote:
> This patch implements per-warp compiler-defined stacks under -msoft-stack
> option, and implements alloca on top of that.  In a few obvious places,
> changes from -muniform-simt patch are present in the hunks.

It'd be better not to mix fragments of patches, and to have a description of
how soft stacks work.

> +  /* fstmp2 = &__nvptx_stacks[tid.y];  */

?

> +  /* crtl->is_leaf is not initialized because RA is not run.  */

Cryptic comment is cryptic.

> +  fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[32];\n",

Magic constant '32'?

> +  if (need_unisimt_decl)
> +{
> +  write_var_marker (asm_out_file, false, true, "__nvptx_uni");
> +  fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n");
> +}

Looks like some other patch?

>   /* Expander for the shuffle builtins.  */
> diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
> index 381269e..6da4d06 100644
> --- a/gcc/config/nvptx/nvptx.h
> +++ b/gcc/config/nvptx/nvptx.h

> +  if (TARGET_UNIFORM_SIMT) \
> +builtin_define ("__nvptx_unisimt__");\

Likewise.

> +  rtx unisimt_master; /* Master lane index for "uniform simt" mode.  */
> +  rtx unisimt_predicate; /* Predicate register for "uniform simt".  */

Likewise.

Needs testcases.

nathan


Re: [PATCH] Fix PR70725 (followup to Mareks patch)

2016-04-21 Thread Richard Biener
On Thu, 21 Apr 2016, Jakub Jelinek wrote:

> On Thu, Apr 21, 2016 at 02:20:34PM +0200, Richard Biener wrote:
> > 
> > The following fixes the followup ICEs in the testcase for PR70725
> > where Marek's patch only fixed the first one.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Jakub - the patch should be safe in that any testcase running into
> > the changed paths now would have caused SSA verification failures without
> > the patch.  There's still the question what to do for GCC 6.1 - we
> > can revert Marek's patch or apply this one on top.  I don't have a
> > strong opinion here.
> 
> I think reverting Marek's patch for 6.1 and reapplying it for 6.2 together
> with this would be safest, but am not strongly against just applying this to
> 6.1 either.

Ok, it doesn't apply cleanly and I won't get to it today, so please
revert Marek's change on the branch if you want to create RC2 today.

Applied to trunk so far.

Richard.

> > 2016-04-21  Richard Biener  
> > 
> > PR tree-optimization/70725
> > * tree-if-conv.c (if_convertible_phi_p): Adjust guard
> > for phi_convertible_by_degenerating_args.
> > (predicate_all_scalar_phis): Handle single-argument PHIs.


[Ping] [C++ Patch] PR 70540 ("[4.9/5/6 Regression] ICE on invalid code in cxx_incomplete_type_diagnostic...")

2016-04-21 Thread Paolo Carlini

Hi,

On 14/04/2016 11:50, Paolo Carlini wrote:
> Hi,
>
> in this regression we ICE during error recovery after an additional
> redundant error message. I think it's one of those cases (we have got
> quite a few elsewhere, in semantics.c too) when it's better to
> immediately return error_mark_node when mark_used returns false, even
> if we aren't in a SFINAE context. To be sure, I double checked that in
> those cases mark_used certainly issues an error, thus we aren't
> risking creating accepts-invalid bugs, it's only a matter of fine tuning
> error recovery. Tested x86_64-linux.

Pinging this... for trunk and maybe for the branch too when it reopens
if everything goes well?


https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00644.html

Thanks,
Paolo.


Re: [PATCH] add support for placing variables in shared memory

2016-04-21 Thread Alexander Monakov
On Thu, 21 Apr 2016, Nathan Sidwell wrote:
> On 04/20/16 12:58, Alexander Monakov wrote:
> > Allow using __attribute__((shared)) to place static variables in '.shared'
> > memory space.
> 
> What is the rationale for a new attribute, rather than leveraging the existing
> section(".shared") machinery?

Section switching does not work at all on NVPTX in GCC at present.  PTX
assembly has no notion of different data sections, so the backend does not
advertise section switching capability to the middle end.

CUDA C does it via attributes too, and there's no point in diverging
gratuitously I think.

> > +  else if (current_function_decl && !TREE_STATIC (decl))
> > +{
> > +  error ("%qE attribute only applies to non-stack variables", name);
> 
> 'non-stack'?  don't you mean 'static' or 'static storage'?

I avoided using 'static' because it applies to external declarations as well.
Other backends use "%qE attribute not allowed with auto storage class"; I'll
be happy to switch to that for consistency.

> Needs a test case.

OK, will follow up with a testcase.

Thanks.
Alexander


[PATCH] Avoid allocating garbage GIMPLE_NOPs at LTO stream in time

2016-04-21 Thread Richard Biener

The following avoids allocating a GIMPLE_NOP as a def for all SSA names
(we stream stmts and adjust SSA_NAME_DEF_STMT later).  This requires
us to make sure we do not have unused SSA names around which we can
easily achieve at stream-out time - otherwise passes walking over
all SSA names rightfully expect a def stmt for all of them.

LTO bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-04-21  Richard Biener  

* lto-streamer-in.c (input_ssa_names): Do not allocate
GIMPLE_NOP for all SSA names.
* lto-streamer-out.c (output_ssa_names): Do not output
SSA names that should have been released.

Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c   (revision 235305)
+++ gcc/lto-streamer-in.c   (working copy)
@@ -881,10 +881,13 @@ input_ssa_names (struct lto_input_block
 
   is_default_def = (streamer_read_uchar (ib) != 0);
   name = stream_read_tree (ib, data_in);
-  ssa_name = make_ssa_name_fn (fn, name, gimple_build_nop ());
+  ssa_name = make_ssa_name_fn (fn, name, NULL);
 
   if (is_default_def)
-   set_ssa_default_def (cfun, SSA_NAME_VAR (ssa_name), ssa_name);
+   {
+ set_ssa_default_def (cfun, SSA_NAME_VAR (ssa_name), ssa_name);
+ SSA_NAME_DEF_STMT (ssa_name) = gimple_build_nop ();
+   }
 
   i = streamer_read_uhwi (ib);
 }
Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 235305)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -1816,7 +1816,11 @@ output_ssa_names (struct output_block *o
 
   if (ptr == NULL_TREE
  || SSA_NAME_IN_FREE_LIST (ptr)
- || virtual_operand_p (ptr))
+ || virtual_operand_p (ptr)
+ /* Simply skip unreleased SSA names.  */
+ || (! SSA_NAME_IS_DEFAULT_DEF (ptr)
+ && (! SSA_NAME_DEF_STMT (ptr)
+ || ! gimple_bb (SSA_NAME_DEF_STMT (ptr)))))
continue;
 
   streamer_write_uhwi (ob, i);



Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 6:59 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 3:54 PM, H.J. Lu  wrote:
>> On Thu, Apr 21, 2016 at 6:48 AM, Uros Bizjak  wrote:
>>> On Thu, Apr 21, 2016 at 3:43 PM, H.J. Lu  wrote:
>>>> On Thu, Apr 21, 2016 at 6:33 AM, Uros Bizjak  wrote:
>>>>> On Thu, Apr 21, 2016 at 2:59 PM, H.J. Lu  wrote:
>>>>>
>>>>>>> We know, that const_int (-1) is allowed with TARGET_SSE2 and that
>>>>>>> const_wide_int (-1) is allowed with TARGET_AVX2. Probably we don't
>>>>>>> have to check AVX512F in standard_sse_constant_p, as it implies
>>>>>>> TARGET_AVX2.
>>>>>>>
>>>>>>> As said, it is the job of insn mode attributes to emit correct
>>>>>>> instruction.
>>>>>>>
>>>>>>> Based on the above observations, mode checks for -1 are not needed in
>>>>>>> standard_sse_constant_p.
>>>>>>
>>>>>> void
>>>>>> ix86_expand_vector_move (machine_mode mode, rtx operands[])
>>>>>> {
>>>>>>   rtx op0 = operands[0], op1 = operands[1];
>>>>>>   /* Use GET_MODE_BITSIZE instead of GET_MODE_ALIGNMENT for IA MCU
>>>>>>  psABI since the biggest alignment is 4 byte for IA MCU psABI.  */
>>>>>>   unsigned int align = (TARGET_IAMCU
>>>>>> ? GET_MODE_BITSIZE (mode)
>>>>>> : GET_MODE_ALIGNMENT (mode));
>>>>>>
>>>>>>   if (push_operand (op0, VOIDmode))
>>>>>> op0 = emit_move_resolve_push (mode, op0);
>>>>>>
>>>>>>   /* Force constants other than zero into memory.  We do not know how
>>>>>>  the instructions used to build constants modify the upper 64 bits
>>>>>>  of the register, once we have that information we may be able
>>>>>>  to handle some of them more efficiently.  */
>>>>>>   if (can_create_pseudo_p ()
>>>>>>   && register_operand (op0, mode)
>>>>>>   && (CONSTANT_P (op1)
>>>>>>   || (SUBREG_P (op1)
>>>>>>   && CONSTANT_P (SUBREG_REG (op1))))
>>>>>>   && !standard_sse_constant_p (op1))
>>>>>> ^
>>>>>>
>>>>>> What should it return for  op1 == (VOIDmode) -1 when
>>>>>> TARGET_AVX is true and TARGET_AVX2 is false for
>>>>>> mode == TImode and mode == OImode?
>>>>>>
>>>>>> op1 = validize_mem (force_const_mem (mode, op1));
>>>>>
>>>>> Let me rethink and redesign this whole mess, so we will have
>>>>> consistent predicates.
>>>>
>>>> The problem is because -1 has no mode.  We can't tell
>>>> if -1 is a valid SSE constant without mode.  That is my
>>>> change to standard_sse_constant_p and
>>>> ix86_expand_vector_move is for.   It is sufficient for
>>>> all my tests, including benchmark runs.
>>>
>>> I'm not against mode checks, but IMO, we have to do these checks in
>>> predicates, where we know operand mode.
>>
>> I tried and it doesn't work since the correct mode may not be always
>> available in predicates.  Yes, they pass mode.  But they just do
>>
>> mode = GET_MODE (op);
>>
>> which returns VOIDmode for -1.
>
> Well, looking at generated gcc/insns-preds.c, the predicates do:
>
> (mode == VOIDmode || GET_MODE (op) == mode).
>
> They *check* and don't *assign* "mode" variable.
>
> So, I see no problem checking "mode" variable (that gets the value
> from the pattern) in the predicates.

This is an incomplete list:

combine.c:   && ! push_operand (dest, GET_MODE (dest)))
expr.c:  if (push_operand (x, GET_MODE (x)))
expr.c:  && ! push_operand (x, GET_MODE (x
gcse.c:   && ! push_operand (dest, GET_MODE (dest)))
gcse.c:  if (general_operand (exp, GET_MODE (reg)))
ifcvt.c:  if (! general_operand (cmp_a, GET_MODE (cmp_a))
ifcvt.c:  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
ifcvt.c:  else if (general_operand (b, GET_MODE (b)))
ifcvt.c:  if (! general_operand (a, GET_MODE (a)) || tmp_a)
ifcvt.c:  if (! general_operand (b, GET_MODE (b)) || tmp_b)
ira-costs.c:  if (address_operand (op, GET_MODE (op))
ira-costs.c:  && general_operand (SET_SRC (set), GET_MODE (SET_SRC (set
lower-subreg.c:  if (GET_MODE (op_operand) != word_mode
lower-subreg.c:  && GET_MODE_SIZE (GET_MODE (op_operand)) > UNITS_PER_WORD)
lower-subreg.c: GET_MODE (op_operand),
lra-constraints.c: if (simplify_operand_subreg (i, GET_MODE (old)) || op_change_p)
optabs.c:  create_output_operand (&ops[0], target, GET_MODE (target));
optabs.c:  create_input_operand (&ops[1], op0, GET_MODE (op0));
postreload-gcse.c:  if (! push_operand (dest, GET_MODE (dest)))
postreload-gcse.c:  && general_operand (src, GET_MODE (src))
postreload-gcse.c:  && general_operand (dest, GET_MODE (dest))
postreload-gcse.c:  && general_operand (src, GET_MODE (src))

IRA and LRA use GET_MODE and pass it to predicates.

-- 
H.J.


[4.9/5/6: PATCH] Replace -skip-rax-setup with -mskip-rax-setup

2016-04-21 Thread H.J. Lu
On Wed, Apr 20, 2016 at 5:56 AM, H.J. Lu  wrote:
> This fixed a typo.  Checked into trunk.
>
> H.J.
> ---
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog   (revision 235274)
> +++ gcc/ChangeLog   (working copy)
> @@ -1,3 +1,7 @@
> +2016-04-20  H.J. Lu  
> +
> +   * doc/invoke.texi: Replace -skip-rax-setup with -mskip-rax-setup.
> +
>  2016-04-20  Richard Biener  
>
> * gimple-match.h (maybe_build_generic_op): Adjust prototype.
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi (revision 235274)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -24157,7 +24157,7 @@ useful together with @option{-mrecord-mc
>  @itemx -mno-skip-rax-setup
>  @opindex mskip-rax-setup
>  When generating code for the x86-64 architecture with SSE extensions
> -disabled, @option{-skip-rax-setup} can be used to skip setting up RAX
> +disabled, @option{-mskip-rax-setup} can be used to skip setting up RAX
>  register when there are no variable arguments passed in vector registers.
>
>  @strong{Warning:} Since RAX register is used to avoid unnecessarily

OK for 4.9, 5 and 6 branches?


-- 
H.J.


Re: [PATCH] Fix PR70725 (followup to Mareks patch)

2016-04-21 Thread Bin.Cheng
On Thu, Apr 21, 2016 at 3:12 PM, Richard Biener  wrote:
> On Thu, 21 Apr 2016, Jakub Jelinek wrote:
>
>> On Thu, Apr 21, 2016 at 02:20:34PM +0200, Richard Biener wrote:
>> >
>> > The following fixes the followup ICEs in the testcase for PR70725
>> > where Marek's patch only fixed the first one.
>> >
>> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >
>> > Jakub - the patch should be safe in that any testcase running into
>> > the changed paths now would have caused SSA verifications without
>> > the patch.  There's still the question what to do for GCC 6.1 - we
>> > can revert Marek's patch or apply this one on top.  I don't have a
>> > strong opinion here.
>>
>> I think reverting Marek's patch for 6.1 and reapplying it for 6.2 together
>> with this would be safest, but am not strongly against just applying this to
>> 6.1 either.
>
> Ok, it doesn't apply cleanly and I won't get to it today so please
> revert Marek's change on the branch if you want to create RC2 today.
Hi,
Hmm, I think this fix is not needed on GCC 6 because the ICE was
introduced by my patch for GCC 7.  The default behavior of GCC 6 is simply
to return false, so the loop is not if-converted.
Sorry, I should have kept the old behavior until my next patch, which fixes
PHIs that can be degenerated.

Thanks,
bin
>
> Applied to trunk so far.
>
> Richard.
>
>> > 2016-04-21  Richard Biener  
>> >
>> > PR tree-optimization/70725
>> > * tree-if-conv.c (if_convertible_phi_p): Adjust guard
>> > for phi_convertible_by_degenerating_args.
>> > (predicate_all_scalar_phis): Handle single-argument PHIs.


Re: [PATCH] Fix ICE in predicate_mem_writes (PR tree-optimization/70725)

2016-04-21 Thread Marek Polacek
On Wed, Apr 20, 2016 at 11:57:23AM -0700, H.J. Lu wrote:
> On Wed, Apr 20, 2016 at 4:19 AM, Marek Polacek  wrote:
> It leads to ICE on 32-bit x86 host:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70725#c8

I can't reproduce with gcc-6:
$ xgcc-6 -O3 -march=skylake-avx512 -c pr70725.c
nor with trunk with richi's fix:
$ xgcc -O3 -march=skylake-avx512 -c pr70725.c

Marek


Re: [PATCH] Fix ICE in predicate_mem_writes (PR tree-optimization/70725)

2016-04-21 Thread Bin.Cheng
On Thu, Apr 21, 2016 at 4:13 PM, Marek Polacek  wrote:
> On Wed, Apr 20, 2016 at 11:57:23AM -0700, H.J. Lu wrote:
>> On Wed, Apr 20, 2016 at 4:19 AM, Marek Polacek  wrote:
>> It leads to ICE on 32-bit x86 host:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70725#c8
>
> I can't reproduce with gcc-6:
> $ xgcc-6 -O3 -march=skylake-avx512 -c pr70725.c
> nor with trunk with richi's fix:
> $ xgcc -O3 -march=skylake-avx512 -c pr70725.c
Hi Marek,
It's my patches' fault, which are only applied on GCC7.  Also Richard
has quickly fixed the ICE on GCC7.
Sorry for the trouble.

Thanks,
bin
>
> Marek


Re: [PATCH] Fix ICE in predicate_mem_writes (PR tree-optimization/70725)

2016-04-21 Thread Marek Polacek
On Thu, Apr 21, 2016 at 04:19:12PM +0100, Bin.Cheng wrote:
> On Thu, Apr 21, 2016 at 4:13 PM, Marek Polacek  wrote:
> > On Wed, Apr 20, 2016 at 11:57:23AM -0700, H.J. Lu wrote:
> >> On Wed, Apr 20, 2016 at 4:19 AM, Marek Polacek  wrote:
> >> It leads to ICE on 32-bit x86 host:
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70725#c8
> >
> > I can't reproduce with gcc-6:
> > $ xgcc-6 -O3 -march=skylake-avx512 -c pr70725.c
> > nor with trunk with richi's fix:
> > $ xgcc -O3 -march=skylake-avx512 -c pr70725.c
> Hi Marek,
> It's my patches' fault, which are only applied on GCC7.  Also Richard
> has quickly fixed the ICE on GCC7.

I see, great.  Jakub, I think we're fine wrt RC2 then and don't need to
revert my patches for this PR on gcc-6-branch.

> Sorry for the trouble.

No problem.

Marek


Re: [PATCH, i386, AVX-512] Fix PR target/70728.

2016-04-21 Thread Kirill Yukhin
Hello,
On 21 Apr 14:50, Kirill Yukhin wrote:
> Hello,
> The patch at the bottom fixes the mentioned PR by separating
> AVX and AVX-512BW constraints.
> 
> gcc/
>   * gcc/config/i386/sse.md (define_insn "3"):
>   Extract AVX-512BW constraint from AVX.
> gcc/testsuite/
>   * gcc.target/i386/pr70728.c: New test.
> 
> Bootstrap and regtest are in progress for i?86|x86_64.
> 
> I'll check it into main trunk if it passes.
Checked into main trunk.

Is it OK to check into gcc-6?

--
Thanks, K


Re: gomp_target_fini

2016-04-21 Thread Thomas Schwinge
Hi!

On Thu, 21 Apr 2016 16:43:22 +0300, Alexander Monakov wrote:
> On Tue, 19 Apr 2016, Jakub Jelinek wrote:
> > On Tue, Apr 19, 2016 at 04:01:06PM +0200, Thomas Schwinge wrote:
> > > [...] Alexander's much smaller patch with subject: "libgomp plugin: make
> > > cuMemFreeHost error non-fatal",
> > > .

> > Yeah, Alex' patch is IMHO fine, even for gcc-6-branch.
> 
> Applied to both.

Thanks!

Backported to gomp-4_0-branch in r235345:

commit 7e774a1bb94e2c5f17765342a59c6cb25e76c943
Author: tschwinge 
Date:   Thu Apr 21 15:35:57 2016 +

libgomp nvptx plugin: make cuMemFreeHost error non-fatal

Backport trunk r235339:

libgomp/
2016-04-21  Alexander Monakov  

* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error
non-fatal.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@235345 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp| 8 
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 1c99026..23a8eef 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2016-04-21  Thomas Schwinge  
+
+   Backport trunk r235339:
+   2016-04-21  Alexander Monakov  
+
+   * plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error
+   non-fatal.
+
 2016-04-08  Thomas Schwinge  
 
PR testsuite/70579
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index eea74d4..6b674c0 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -128,7 +128,7 @@ map_fini (struct ptx_stream *s)
 
   r = cuMemFreeHost (s->h);
   if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+GOMP_PLUGIN_error ("cuMemFreeHost error: %s", cuda_error (r));
 }
 
 static void


Grüße
 Thomas


Re: [PATCH][wwwdocs] Add deprecation of pre-ARMv4T architectures to the release notes

2016-04-21 Thread Kyrill Tkachov

Hi Gerald,

On 21/04/16 14:51, Gerald Pfeifer wrote:

On Thu, 21 Apr 2016, Kyrill Tkachov wrote:

This patch lists the -mcpu and -march values that are deprecated for GCC 6. 
Joel indicated that it would be useful to enumerate them all.


Yes, this is very useful.  Thank you!


Ok to commit?


Yep from my side.  (Perhaps a native speaker can help us whether
"value" is the best word here...  Joel?)



Thanks, I've committed it.
We can always adjust the wording separately if needed.

Kyrill


Gerald




[PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-21 Thread Ilya Enkovich
Hi,

Currently, when a loop is vectorized, we adjust its nb_iterations_upper_bound
by dividing it by VF.  This is incorrect, since nb_iterations_upper_bound
is an upper bound for (number of loop iterations - 1), and therefore simply
dividing it by VF in many cases gives a bound greater than the real one.
The correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).

Also, the decrement due to peeling for gaps should happen before we scale the
bound by VF, because peeling applies to the scalar loop, not the vectorized one.

This patch modifies nb_iterations_upper_bound computation to resolve
these issues.

Running regression testing, I got one failure due to an optimized loop.  Here
is the loop:

foo (signed char s)
{
  signed char i;
  for (i = 0; i < s; i++)
yy[i] = (signed int) i;
}

Here we vectorize for AVX512 using VF=64.  The original loop has at most 127
iterations, and therefore the vectorized loop may be executed only once.
With the patch applied, the compiler detects this and transforms the loop into
a BB with just stores of constant vectors into yy.  The test was adjusted
to increase the number of possible iterations.  A copy of the test was added
to check that we can optimize out the original loop.
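
As a rough worked example of the arithmetic described above, the old and new
computations can be compared side by side.  This is only a sketch: the helper
names are made up for illustration and are not GCC functions, and plain 64-bit
integers stand in for the widest_int arithmetic used in the patch.  It also
assumes the scalar loop runs at least VF iterations, so the result does not go
negative.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of GCC.  A "bound" here is an upper bound
// on (number of loop iterations - 1), as described in the message.

// Old computation: divide the scalar bound directly by VF.
inline std::int64_t
old_vector_bound (std::int64_t scalar_bound, std::int64_t vf)
{
  return scalar_bound / vf;
}

// New computation: apply the peeling-for-gaps decrement to the scalar
// bound first, then convert to an iteration count (+1), divide by VF,
// and convert back to a bound (-1).
inline std::int64_t
new_vector_bound (std::int64_t scalar_bound, std::int64_t vf,
                  bool peeling_for_gaps)
{
  if (peeling_for_gaps && scalar_bound != 0)
    scalar_bound -= 1;
  return (scalar_bound + 1) / vf - 1;
}
```

For the loop above, at most 127 scalar iterations give a scalar bound of 126.
With VF=64 the old formula yields a bound of 1 (up to two vector iterations),
while the new one yields 0 (at most one vector iteration), which is what lets
the compiler remove the loop.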

Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2016-04-21  Ilya Enkovich  

* tree-vect-loop.c (vect_transform_loop): Fix
nb_iterations_upper_bound computation for vectorized loop.

gcc/testsuite/

2016-04-21  Ilya Enkovich  

* gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
optimization of vector loop.
* gcc.target/i386/vect-unpack-3.c: New test.


diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
index 4825248..51c518e 100644
--- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
+++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
@@ -6,19 +6,22 @@
 
 #define N 120
 signed int yy[1];
+signed char zz[1];
 
 void
-__attribute__ ((noinline)) foo (signed char s)
+__attribute__ ((noinline,noclone)) foo (int s)
 {
-   signed char i;
+   int i;
for (i = 0; i < s; i++)
- yy[i] = (signed int) i;
+ yy[i] = zz[i];
 }
 
 void
 avx512bw_test ()
 {
   signed char i;
+  for (i = 0; i < N; i++)
+zz[i] = i;
   foo (N);
   for (i = 0; i < N; i++)
 if ( (signed int)i != yy [i] )
diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
new file mode 100644
index 000..eb8a93e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math -mavx512bw -save-temps" } */
+/* { dg-require-effective-target avx512bw } */
+
+#include "avx512bw-check.h"
+
+#define N 120
+signed int yy[1];
+
+void
+__attribute__ ((noinline)) foo (signed char s)
+{
+   signed char i;
+   for (i = 0; i < s; i++)
+ yy[i] = (signed int) i;
+}
+
+void
+avx512bw_test ()
+{
+  signed char i;
+  foo (N);
+  for (i = 0; i < N; i++)
+if ( (signed int)i != yy [i] )
+  abort ();
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index d813b86..da98211 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6921,11 +6921,13 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   /* Reduce loop iterations by the vectorization factor.  */
   scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
  expected_iterations / vectorization_factor);
-  loop->nb_iterations_upper_bound
-= wi::udiv_floor (loop->nb_iterations_upper_bound, vectorization_factor);
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
   && loop->nb_iterations_upper_bound != 0)
 loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
+  loop->nb_iterations_upper_bound
+= wi::udiv_floor (loop->nb_iterations_upper_bound + 1,
+ vectorization_factor) - 1;
+
   if (loop->any_estimate)
 {
   loop->nb_iterations_estimate


Re: C++ PATCH to fix a part of c++/70513 (ICE-on-invalid with enums)

2016-04-21 Thread Marek Polacek
On Thu, Apr 21, 2016 at 09:57:30AM -0400, Jason Merrill wrote:
> On 04/21/2016 07:35 AM, Marek Polacek wrote:
> >+  permerror (type_start_token->location,
> >+ "extra qualification not allowed");
> >+  type = error_mark_node;
> 
> If we're using permerror, we shouldn't set type to error_mark_node; if we do
> that, -fpermissive won't make it work.

Yikes, that makes sense.  I removed the assignment.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-21  Marek Polacek  

PR c++/70513
* parser.c (cp_parser_enum_specifier): Check and possibly error for
extra qualification.

* g++.dg/cpp0x/forw_enum12.C: New test.
* g++.dg/cpp0x/forw_enum13.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index 0a1ed1a..feb8de7 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -17233,6 +17233,16 @@ cp_parser_enum_specifier (cp_parser* parser)
  type, prev_scope, nested_name_specifier);
  type = error_mark_node;
}
+ /* If that scope is the scope where the declaration is being placed
+the program is invalid.  */
+ else if (CLASS_TYPE_P (nested_name_specifier)
+  && CLASS_TYPE_P (prev_scope)
+  && same_type_p (nested_name_specifier, prev_scope))
+   {
+ permerror (type_start_token->location,
+"extra qualification not allowed");
+ nested_name_specifier = NULL_TREE;
+   }
}
 
   if (scoped_enum_p)
diff --git gcc/testsuite/g++.dg/cpp0x/forw_enum12.C gcc/testsuite/g++.dg/cpp0x/forw_enum12.C
index e69de29..906ba68 100644
--- gcc/testsuite/g++.dg/cpp0x/forw_enum12.C
+++ gcc/testsuite/g++.dg/cpp0x/forw_enum12.C
@@ -0,0 +1,29 @@
+// PR c++/70513
+// { dg-do compile { target c++11 } }
+
+struct S1
+{
+  enum E : int;
+  enum S1::E : int { X } e; // { dg-error "extra qualification not allowed" }
+};
+
+struct S2
+{
+  enum class E : int;
+  enum class S2::E : int { X } e; // { dg-error "extra qualification not allowed" }
+};
+
+struct S3
+{
+  enum struct E : int;
+  enum struct S3::E : int { X } e; // { dg-error "extra qualification not allowed" }
+};
+
+struct S4
+{
+  struct S5
+  {
+enum E : char;
+enum S4::S5::E : char { X } e; // { dg-error "extra qualification not allowed" }
+  };
+};
diff --git gcc/testsuite/g++.dg/cpp0x/forw_enum13.C gcc/testsuite/g++.dg/cpp0x/forw_enum13.C
index e69de29..b8027f0 100644
--- gcc/testsuite/g++.dg/cpp0x/forw_enum13.C
+++ gcc/testsuite/g++.dg/cpp0x/forw_enum13.C
@@ -0,0 +1,47 @@
+// PR c++/70513
+// { dg-do compile { target c++11 } }
+
+template 
+class D1
+{
+  enum A : int;
+  enum D1::A : int { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D2
+{
+  enum A : int;
+  enum D2::A : int { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D3
+{
+  enum D3::A { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D4
+{
+  enum D4::A { foo } c; // { dg-error "extra qualification not allowed" }
+};
+
+template 
+class D5
+{
+  class D6
+  {
+enum D6::A { foo } c; // { dg-error "extra qualification not allowed" }
+  };
+};
+
+template 
+class D7
+{
+  class D8
+  {
+enum A : int;
+enum D8::A : int { foo } c; // { dg-error "extra qualification not allowed" }
+  };
+};

Marek


Document OpenACC status for GCC 6

2016-04-21 Thread Thomas Schwinge
Hi!

OK to commit (something like) the following?  Should something be added
to the "News" section on  itself?  (I don't know
the policy for that.  We didn't suggest that for GCC 5, because at that
time we described the support as a "preliminary implementation of the
OpenACC 2.0a specification"; now it's much more complete and usable.)

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.74
diff -u -p -r1.74 changes.html
--- htdocs/gcc-6/changes.html   19 Apr 2016 11:13:02 -  1.74
+++ htdocs/gcc-6/changes.html   21 Apr 2016 16:10:49 -
@@ -124,6 +124,52 @@ For more information, see the
 
 New Languages and Language specific improvements
 
+Compared to GCC 5, the GCC 6 release series includes a much improved
+implementation of the <a href="http://www.openacc.org/">OpenACC 2.0a
+  specification</a>.  Highlights are:
+
+  In addition to single-threaded host-fallback execution, offloading is
+   supported for nvptx (Nvidia GPUs) on x86_64 and PowerPC 64-bit
+   little-endian GNU/Linux host systems.  For nvptx offloading, with the
+   OpenACC parallel construct, the execution model allows for an arbitrary
+   number of gangs, up to 32 workers, and 32 vectors.
+  Initial support for parallelized execution of OpenACC kernels
+   constructs:
+   
+ Parallelization of a kernels region is switched on
+   by -fopenacc combined with -O2 or
+   higher.
+ Code will be offloaded onto multiple gangs, but executes with
+   just one worker, and a vector length of 1.
+ Directives inside a kernels region are not supported.
+ Loops with reductions can be parallelized.
+ Only kernels regions with one loop nest are parallelized.
+ Only the outer-most loop of a loop nest can be parallelized.
+ Loop nests containing sibling loops are not parallelized.
+   
+   Typically, using the OpenACC parallel construct will give much better
+   performance, compared to the initial support of the OpenACC kernels
+   construct.
+  The device_type clause is not supported.
+   The bind and nohost clauses are not
+   supported.  The host_data directive is not supported in
+   Fortran.
+  Nested parallelism (cf. CUDA dynamic parallelism) is not
+   supported.
+  Usage of OpenACC constructs inside multithreaded contexts (such as
+   created by OpenMP, or pthread programming) is not supported.
+  If a call to the acc_on_device function has a
+   compile-time constant argument, the function call evaluates to a
+   compile-time constant value only for C and C++ but not for
+   Fortran.
+
+See the <a href="https://gcc.gnu.org/wiki/OpenACC">OpenACC</a>
+and <a href="https://gcc.gnu.org/wiki/Offloading">Offloading</a> wiki pages
+for further information.
+  
+
 
 
 C family


Grüße
 Thomas




[PATCH][doc] Update documentation of AArch64 options

2016-04-21 Thread Wilco Dijkstra
Update documentation of AArch64 options for GCC6 to be more accurate, 
fix a few minor mistakes and remove some duplication.

Tested with "make info dvi pdf html" and checked resulting PDF is as expected.

OK for trunk and backport to GCC6.1 branch?

ChangeLog:
2016-04-21  Wilco Dijkstra  

gcc/
* gcc/doc/invoke.texi (AArch64 Options): Update.


--

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e9763d44d8d7aa6a64821a4b1811e550254e..ddd4eeaec1502f871d0febd6045e37153c48a7e1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12827,9 +12827,9 @@ These options are defined for AArch64 implementations:
 @item -mabi=@var{name}
 @opindex mabi
 Generate code for the specified data model.  Permissible values
-are @samp{ilp32} for SysV-like data model where int, long int and pointer
+are @samp{ilp32} for SysV-like data model where int, long int and pointers
 are 32-bit, and @samp{lp64} for SysV-like data model where int is 32-bit,
-but long int and pointer are 64-bit.
+but long int and pointers are 64-bit.
 
 The default depends on the specific target configuration.  Note that
 the LP64 and ILP32 ABIs are not link-compatible; you must compile your
@@ -12854,9 +12854,8 @@ Generate little-endian code.  This is the default when GCC is configured for an
 @item -mcmodel=tiny
 @opindex mcmodel=tiny
 Generate code for the tiny code model.  The program and its statically defined
-symbols must be within 1GB of each other.  Pointers are 64 bits.  Programs can
-be statically or dynamically linked.  This model is not fully implemented and
-mostly treated as @samp{small}.
+symbols must be within 1MB of each other.  Pointers are 64 bits.  Programs can
+be statically or dynamically linked.
 
 @item -mcmodel=small
 @opindex mcmodel=small
@@ -12872,7 +12871,8 @@ statically linked only.
 
 @item -mstrict-align
 @opindex mstrict-align
-Do not assume that unaligned memory references are handled by the system.
+Avoid generating unaligned accesses when accessing objects at non-naturally
+aligned boundaries as described in the architecture.
 
 @item -momit-leaf-frame-pointer
 @itemx -mno-omit-leaf-frame-pointer
@@ -12894,7 +12894,7 @@ of TLS variables.
 @item -mtls-size=@var{size}
 @opindex mtls-size
 Specify bit size of immediate TLS offsets.  Valid values are 12, 24, 32, 48.
-This option depends on binutils higher than 2.25.
+This option requires binutils 2.26 or newer.
 
 @item -mfix-cortex-a53-835769
 @itemx -mno-fix-cortex-a53-835769
@@ -12916,10 +12916,11 @@ corresponding flag to the linker.
 @item -mno-low-precision-recip-sqrt
 @opindex -mlow-precision-recip-sqrt
 @opindex -mno-low-precision-recip-sqrt
-When calculating the reciprocal square root approximation,
-uses one less step than otherwise, thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} enables the reciprocal square root
-approximation, which in turn depends on the target processor.
+Enable or disable reciprocal square root approximation.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.  Enabling this reduces
+precision of reciprocal square root results to about 16 bits for
+single-precision and to 32 bits for double-precision.
 
 @item -march=@var{name}
 @opindex march
@@ -12956,17 +12957,15 @@ Specify the name of the target processor for which GCC should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
 @samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx},
-@samp{xgene1}.
+@samp{xgene1}, @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
+@samp{native}.
 
-Additionally, this option can specify that GCC should tune the performance
-of the code for a big.LITTLE system.  Permissible values for this
-option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}.
+The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}
+specify that GCC should tune for a big.LITTLE system.
 
 Additionally on native AArch64 GNU/Linux systems the value
-@samp{native} is available.  This option causes the compiler to pick
-the architecture of and tune the performance of the code for the
-processor of the host system.  This option has no effect if the
-compiler is unable to recognize the architecture of the host system.
+@samp{native} tunes performance to the host system.  This option has no effect
+if the compiler is unable to recognize the processor of the host system.
 
 Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=}
 are specified, the code is tuned to perform well across a range
@@ -12986,12 +12985,6 @@ documented in the sub-section on
 Feature Modifiers}.  Where conflicting feature modifiers are
 specified, the right-most feature is used.
 
-Additionally on native AArch64 GNU/Linux systems the value
-@samp{native} is available.  This option causes the compiler to tune
-the performance 

Re: C++ PATCH to fix a part of c++/70513 (ICE-on-invalid with enums)

2016-04-21 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 4:50 PM, H.J. Lu  wrote:

>>> I tried and it doesn't work since the correct mode may not be always
>>> available in predicates.  Yes, they pass mode.  But they just do
>>>
>>> mode = GET_MODE (op);
>>>
>>> which returns VOIDmode for -1.
>>
>> Well, looking at generated gcc/insns-preds.c, the predicates do:
>>
>> (mode == VOIDmode || GET_MODE (op) == mode).
>>
>> They *check* and don't *assign* "mode" variable.
>>
>> So, I see no problem checking "mode" variable (that gets the value
>> from the pattern) in the predicates.
>
> This is an incomplete list:
>
> combine.c:   && ! push_operand (dest, GET_MODE (dest)))
> expr.c:  if (push_operand (x, GET_MODE (x)))
> expr.c:  && ! push_operand (x, GET_MODE (x
> gcse.c:   && ! push_operand (dest, GET_MODE (dest)))
> gcse.c:  if (general_operand (exp, GET_MODE (reg)))
> ifcvt.c:  if (! general_operand (cmp_a, GET_MODE (cmp_a))
> ifcvt.c:  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
> ifcvt.c:  else if (general_operand (b, GET_MODE (b)))
> ifcvt.c:  if (! general_operand (a, GET_MODE (a)) || tmp_a)
> ifcvt.c:  if (! general_operand (b, GET_MODE (b)) || tmp_b)
> ira-costs.c:  if (address_operand (op, GET_MODE (op))
> ira-costs.c:  && general_operand (SET_SRC (set), GET_MODE (SET_SRC 
> (set
> lower-subreg.c:  if (GET_MODE (op_operand) != word_mode
> lower-subreg.c:  && GET_MODE_SIZE (GET_MODE (op_operand)) > 
> UNITS_PER_WORD)
> lower-subreg.c: GET_MODE (op_operand),
> lra-constraints.c: if (simplify_operand_subreg (i, GET_MODE (old)) ||
> op_change_p)
> optabs.c:  create_output_operand (&ops[0], target, GET_MODE (target));
> optabs.c:  create_input_operand (&ops[1], op0, GET_MODE (op0));
> postreload-gcse.c:  if (! push_operand (dest, GET_MODE (dest)))
> postreload-gcse.c:  && general_operand (src, GET_MODE (src))
> postreload-gcse.c:  && general_operand (dest, GET_MODE (dest))
> postreload-gcse.c:  && general_operand (src, GET_MODE (src))
>
> IRA and LRA use GET_MODE and pass it to predicates.

I don't know what you are trying to prove here ...

Uros.


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 9:31 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 4:50 PM, H.J. Lu  wrote:
>
>>>> I tried and it doesn't work since the correct mode may not be always
>>>> available in predicates.  Yes, they pass mode.  But they just do
>>>>
>>>> mode = GET_MODE (op);
>>>>
>>>> which returns VOIDmode for -1.
>>>
>>> Well, looking at generated gcc/insns-preds.c, the predicates do:
>>>
>>> (mode == VOIDmode || GET_MODE (op) == mode).
>>>
>>> They *check* and don't *assign* "mode" variable.
>>>
>>> So, I see no problem checking "mode" variable (that gets the value
>>> from the pattern) in the predicates.
>>
>> This is an incomplete list:
>>
>> combine.c:   && ! push_operand (dest, GET_MODE (dest)))
>> expr.c:  if (push_operand (x, GET_MODE (x)))
>> expr.c:  && ! push_operand (x, GET_MODE (x
>> gcse.c:   && ! push_operand (dest, GET_MODE (dest)))
>> gcse.c:  if (general_operand (exp, GET_MODE (reg)))
>> ifcvt.c:  if (! general_operand (cmp_a, GET_MODE (cmp_a))
>> ifcvt.c:  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
>> ifcvt.c:  else if (general_operand (b, GET_MODE (b)))
>> ifcvt.c:  if (! general_operand (a, GET_MODE (a)) || tmp_a)
>> ifcvt.c:  if (! general_operand (b, GET_MODE (b)) || tmp_b)
>> ira-costs.c:  if (address_operand (op, GET_MODE (op))
>> ira-costs.c:  && general_operand (SET_SRC (set), GET_MODE (SET_SRC 
>> (set
>> lower-subreg.c:  if (GET_MODE (op_operand) != word_mode
>> lower-subreg.c:  && GET_MODE_SIZE (GET_MODE (op_operand)) > 
>> UNITS_PER_WORD)
>> lower-subreg.c: GET_MODE 
>> (op_operand),
>> lra-constraints.c: if (simplify_operand_subreg (i, GET_MODE (old)) ||
>> op_change_p)
>> optabs.c:  create_output_operand (&ops[0], target, GET_MODE (target));
>> optabs.c:  create_input_operand (&ops[1], op0, GET_MODE (op0));
>> postreload-gcse.c:  if (! push_operand (dest, GET_MODE (dest)))
>> postreload-gcse.c:  && general_operand (src, GET_MODE (src))
>> postreload-gcse.c:  && general_operand (dest, GET_MODE (dest))
>> postreload-gcse.c:  && general_operand (src, GET_MODE (src))
>>
>> IRA and LRA use GET_MODE and pass it to predicates.
>
> I don't know what you are trying to prove here ...

The "mode" argument passed to predicates can't be used to determine if
-1 is a valid SSE constant.


-- 
H.J.
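
The disagreement in this thread can be illustrated with a small toy model.
This is a sketch only: every type and function below is a hypothetical
stand-in invented for illustration, not the real GCC rtx, machine_mode or
predicate interface.  It shows a predicate that trusts the mode handed in by
its caller, and what happens when that mode is taken from the constant itself.

```cpp
// Toy model only: hypothetical stand-ins, not GCC interfaces.
enum ToyMode { TOY_VOIDmode, TOY_TImode, TOY_OImode, TOY_XImode };

struct ToyTarget { bool sse2; bool avx2; bool avx512f; };

// An integer constant such as -1 carries no mode of its own.
struct ToyConst { long long value; ToyMode mode; };

inline ToyMode
toy_get_mode (const ToyConst &op)
{
  return op.mode;
}

// Is an all-ones constant of the given mode available on this target?
inline bool
toy_all_ones_ok (ToyMode mode, const ToyTarget &target)
{
  switch (mode)
    {
    case TOY_TImode: return target.sse2;
    case TOY_OImode: return target.avx2;
    case TOY_XImode: return target.avx512f;
    default: return false;  // without a mode we cannot tell
    }
}

// Predicate that relies on the mode supplied by its caller.
inline bool
toy_all_ones_operand (const ToyConst &op, ToyMode mode,
                      const ToyTarget &target)
{
  return op.value == -1 && toy_all_ones_ok (mode, target);
}
```

In this model, a caller that passes the mode from the instruction pattern
(say TOY_OImode on a target with avx2 set) gets the constant accepted.  A
caller that passes toy_get_mode (op) instead hands in TOY_VOIDmode, and the
same constant is rejected regardless of the target.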


Re: [PATCH] Reuse the saved_scope structures allocated by push_to_top_level

2016-04-21 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] PR 70540 ("[4.9/5/6 Regression] ICE on invalid code in cxx_incomplete_type_diagnostic...")

2016-04-21 Thread Jason Merrill

OK for trunk and 6.2.

Jason


Re: tuple move constructor

2016-04-21 Thread Marc Glisse

On Thu, 21 Apr 2016, Jonathan Wakely wrote:


On 20 April 2016 at 21:42, Marc Glisse wrote:

Hello,

does anyone remember why the move constructor of _Tuple_impl is not
defaulted? The attached patch does not cause any test to fail (whitespace
kept to avoid line number changes). Maybe something about tuples of
references?


I don't know/remember why. It's possible it was to workaround a
front-end bug that required it, or maybe just a mistake and it should
always have been defaulted.


Ok, then how about something like this? In order to suppress the move
constructor in tuple (when there is a non-movable element), we need to
either declare it with suitable constraints, or keep it defaulted and
ensure that we don't bypass a missing move constructor anywhere along
the way (_Tuple_impl, _Head_base). There is a strange mix of 2
strategies in the patch, I prefer the tag class, but I started using
enable_if before I realized how many places needed those horrors.
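
To illustrate why a hand-written move constructor can "bypass" a missing one,
here is a minimal sketch.  The types are hypothetical and much simpler than
_Tuple_impl or _Head_base; they only show the language rule involved, not the
actual libstdc++ code.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical types for illustration; none of these are libstdc++ internals.
struct NoMove
{
  NoMove () = default;
  NoMove (const NoMove &) = delete;
  NoMove (NoMove &&) = delete;
};

// Hand-written move constructor: it is always declared, so the trait
// reports "move constructible" even though actually calling it for
// T = NoMove would fail to compile.
template <typename T>
struct HandWritten
{
  T elem;
  HandWritten () = default;
  HandWritten (HandWritten &&other) : elem (std::move (other.elem)) { }
};

// Defaulted move constructor: it is defined as deleted when T cannot
// be moved, so the trait reports the truth.
template <typename T>
struct Defaulted
{
  T elem;
  Defaulted () = default;
  Defaulted (Defaulted &&) = default;
};

static_assert (std::is_move_constructible<HandWritten<NoMove>>::value,
               "declared move constructor hides the non-movable member");
static_assert (!std::is_move_constructible<Defaulted<NoMove>>::value,
               "defaulted move constructor is deleted for NoMove");
static_assert (std::is_move_constructible<Defaulted<int>>::value,
               "defaulted move constructor works for movable members");
```

Every layer between the user-visible class and the stored element has to
follow the Defaulted pattern (or be explicitly constrained) for the property
to propagate outward.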

Bootstrap+regtest on powerpc64le-unknown-linux-gnu.


2016-04-22  Marc Glisse  

* include/std/tuple (__element_arg_t): New class.
(_Head_base(const _Head&), _Tuple_impl(const _Head&, const _Tail&...):
Remove.
(_Head_base(_UHead&&)): Add __element_arg_t argument...
(_Tuple_impl): ... and adjust callers.
(_Tuple_impl(_Tuple_impl&&)): Default.
(_Tuple_impl(const _Tuple_impl&),
_Tuple_impl(_Tuple_impl&&), _Tuple_impl(_UHead&&): Constrain.
* testsuite/20_util/tuple/nomove.cc: New.

--
Marc Glisse

Index: libstdc++-v3/include/std/tuple
===
--- libstdc++-v3/include/std/tuple  (revision 235346)
+++ libstdc++-v3/include/std/tuple  (working copy)
@@ -41,38 +41,37 @@
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /**
*  @addtogroup utilities
*  @{
*/
 
+  struct __element_arg_t { };
+
   template
 struct _Head_base;
 
   template
 struct _Head_base<_Idx, _Head, true>
 : public _Head
 {
   constexpr _Head_base()
   : _Head() { }
 
-  constexpr _Head_base(const _Head& __h)
-  : _Head(__h) { }
-
   constexpr _Head_base(const _Head_base&) = default;
   constexpr _Head_base(_Head_base&&) = default;
 
   template
-constexpr _Head_base(_UHead&& __h)
+constexpr _Head_base(__element_arg_t, _UHead&& __h)
: _Head(std::forward<_UHead>(__h)) { }
 
   _Head_base(allocator_arg_t, __uses_alloc0)
   : _Head() { }
 
   template
_Head_base(allocator_arg_t, __uses_alloc1<_Alloc> __a)
: _Head(allocator_arg, *__a._M_a) { }
 
   template
@@ -97,28 +96,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static constexpr const _Head&
   _M_head(const _Head_base& __b) noexcept { return __b; }
 };
 
   template
 struct _Head_base<_Idx, _Head, false>
 {
   constexpr _Head_base()
   : _M_head_impl() { }
 
-  constexpr _Head_base(const _Head& __h)
-  : _M_head_impl(__h) { }
-
   constexpr _Head_base(const _Head_base&) = default;
   constexpr _Head_base(_Head_base&&) = default;
 
   template
-constexpr _Head_base(_UHead&& __h)
+constexpr _Head_base(__element_arg_t, _UHead&& __h)
: _M_head_impl(std::forward<_UHead>(__h)) { }
 
   _Head_base(allocator_arg_t, __uses_alloc0)
   : _M_head_impl() { }
 
   template
_Head_base(allocator_arg_t, __uses_alloc1<_Alloc> __a)
: _M_head_impl(allocator_arg, *__a._M_a) { }
 
   template
@@ -194,50 +190,49 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   static constexpr _Inherited&
   _M_tail(_Tuple_impl& __t) noexcept { return __t; }
 
   static constexpr const _Inherited&
   _M_tail(const _Tuple_impl& __t) noexcept { return __t; }
 
   constexpr _Tuple_impl()
   : _Inherited(), _Base() { }
 
-  explicit 
-  constexpr _Tuple_impl(const _Head& __head, const _Tail&... __tail)
-  : _Inherited(__tail...), _Base(__head) { }
-
   template::type> 
 explicit
 constexpr _Tuple_impl(_UHead&& __head, _UTail&&... __tail)
: _Inherited(std::forward<_UTail>(__tail)...),
- _Base(std::forward<_UHead>(__head)) { }
+ _Base(__element_arg_t(), std::forward<_UHead>(__head)) { }
 
   constexpr _Tuple_impl(const _Tuple_impl&) = default;
+  constexpr _Tuple_impl(_Tuple_impl&&) = default;
 
-  constexpr
-  _Tuple_impl(_Tuple_impl&& __in)
-  noexcept(__and_,
- is_nothrow_move_constructible<_Inherited>>::value)
-  : _Inherited(std::move(_M_tail(__in))), 
-   _Base(std::forward<_Head>(_M_head(__in))) { }
-
-  template
+  template>::value,
+ bool>::type = false>
 constexpr _Tuple_impl(const _Tuple_impl<_Idx, _UElements...>& __in)
: _Inherited(_Tuple_impl<_Idx, _UElements...>::_M_tail(__in)),
- _Base(_Tuple_impl<_Idx, _UElements...>::_M_head(__in)) { }

[PATCH GCC]Refactor IVOPT.

2016-04-21 Thread Bin Cheng
Hi,
This patch refactors IVOPT in three major aspects:
Firstly, it rewrites iv_use groups.  The use group was originally introduced only for
address type uses; this patch makes it general to all (generic, compare,
address) types.  Currently generic/compare groups contain only one iv_use, and
compare groups can be extended to contain multiple uses.  A generic group
won't contain multiple uses, because IVOPT already reuses one iv_use
structure for generic uses at different places.  This change also
cleans up algorithms as well as data structures.
Secondly, it implements the group data structure as a vector rather than as a list.
A list was used originally because it's easy to split, but a list is hard
to sort (for example, we use quadratic insertion sort now).  This problem will
become more critical since I plan to introduce fine-grained control over splitting
small address groups by checking whether the target supports load/store pair
instructions.  In that case an address group needs to be sorted more than
once and against complex conditions; for example, memory loads in one basic
block should be sorted together in ascending offset order.  With a vector group,
sorting can be done very efficiently with quick sort.
Thirdly, this patch cleans up and reformats IVOPT's dump information.  I think the
information is easier to read and test now.  Since the change of dump information is
entangled with the group data-structure change, it's hard to make it a standalone
patch.  Given that this part of the patch is quite straightforward, I hope it won't be
confusing.

Bootstrapped and tested on x86_64 and AArch64, no regressions.  I also checked the
generated assembly for spec2k and spec2k6 on both platforms; the output
assembly is unchanged except for several files.  After further
investigation, I can confirm the difference is caused by a small change in how
groups are sorted.  Given the original sorting condition below:
-  /* Sub use list is maintained in offset ascending order.  */
-  if (addr_offset <= group->addr_offset)
-{
-  use->related_cands = group->related_cands;
-  group->related_cands = NULL;
-  use->next = group;
-  data->iv_uses[id_group] = use;
-}
iv_uses with the same addr_offset are sorted in reverse control flow order.  This
might be a typo since I don't remember any specific reason for it.  If this
patch sorted groups in the same way, there would be no difference in generated
assembly at all.  So this patch is pure refactoring work which doesn't have
any functional change.
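
To make the tie-breaking point concrete, here is a minimal sketch, not IVOPT code. `use_sketch` and `sorted_ids` are hypothetical names, and `std::stable_sort` is swapped in for the sort used by the patch, purely to show what "offset ascending, ties kept in original order" means for uses that share an addr_offset.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an address-type iv_use.
struct use_sketch
{
  int id;            // position in original (control flow) order
  long addr_offset;  // constant offset of the address
};

// Sort uses by ascending offset.  std::stable_sort keeps uses with equal
// offsets in their original relative order, whereas the old list insertion
// with "addr_offset <= group->addr_offset" put a later use in front.
inline std::vector<int> sorted_ids(std::vector<use_sketch> uses)
{
  std::stable_sort(uses.begin(), uses.end(),
                   [](const use_sketch& a, const use_sketch& b)
                   { return a.addr_offset < b.addr_offset; });
  std::vector<int> ids;
  for (const use_sketch& u : uses)
    ids.push_back(u.id);
  return ids;
}
```

With offsets 8, 0, 8, 4 for uses 0 to 3, the two uses at offset 8 come out as 0 then 2 here; the old condition would have produced 2 then 0.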

Any comments?

Thanks,
bin

2016-04-19  Bin Cheng  

* tree-ssa-loop-ivopts.c (struct iv): Use pointer to struct iv_use
instead of redundant use_id and boolean have_use_for.
(struct iv_use): Change sub_id into group_id.  Remove field next.
Move fields: related_cands, n_map_members, cost_map and selected
to ...
(struct iv_group): ... here.  New structure.
(struct iv_common_cand): Use structure declaration directly.
(struct ivopts_data, iv_ca, iv_ca_delta): Rename fields.
(MAX_CONSIDERED_USES): Rename macro to ...
(MAX_CONSIDERED_GROUPS): ... here.
(n_iv_uses, iv_use, n_iv_cands, iv_cand): Delete.
(dump_iv, dump_use, dump_cand): Refactor format of dump information.
(dump_uses): Rename to ...
(dump_groups): ... here.  Update all uses.
(tree_ssa_iv_optimize_init, alloc_iv): Update all uses.
(find_induction_variables): Refactor format of dump information.
(record_sub_use): Delete.
(record_use): Update all uses.
(record_group): New function.
(record_group_use, find_interesting_uses_op): Call above functions.
Update all uses.
(find_interesting_uses_cond): Ditto.
(group_compare_offset): New function.
(split_all_small_groups): Rename to ...
(split_small_address_groups_p): ... here.  Update all uses.
(split_address_groups):  Update all uses.
(find_interesting_uses): Refactor format of dump information.
(add_candidate_1): Update all uses.  Remove redundant check on iv,
base and step.
(add_candidate, record_common_cand): Remove redundant assert.
(add_iv_candidate_for_biv): Update use.
(add_iv_candidate_derived_from_uses): Update all uses.
(add_iv_candidate_for_groups, record_important_candidates): Ditto.
(alloc_use_cost_map): Ditto.
(set_use_iv_cost, get_use_iv_cost): Rename to ...
(set_group_iv_cost, get_group_iv_cost): ... here.  Update all uses.
(determine_use_iv_cost_generic): Ditto.
(determine_group_iv_cost_generic): Ditto.
(determine_use_iv_cost_address): Ditto.
(determine_group_iv_cost_address): Ditto.
(determine_use_iv_cost_condition): Ditto.
(determine_group_iv_cost_cond): Ditto.
(determine_use_iv_cost): Ditto.
(determine_group_iv_cost): Ditto.
(set_autoinc_for_orig

[PATCH] PR target/70750: [6/7 Regression] Load and call no longer combined for indirect calls on x86

2016-04-21 Thread H.J. Lu
r231923 has

 ;; Test for a valid operand for a call instruction.
 ;; Allow constant call address operands in Pmode only.
 (define_special_predicate "call_insn_operand"
   (ior (match_test "constant_call_address_operand
 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_no_elim_operand")
-   (and (not (match_test "TARGET_X32"))
-   (match_operand 0 "memory_operand"
+   (ior (and (not (match_test "TARGET_X32"))
+(match_operand 0 "sibcall_memory_operand"))
   ^^^ A typo.
+   (and (match_test "TARGET_X32 && Pmode == DImode")
+(match_operand 0 "GOT_memory_operand")

"sibcall_memory_operand" should be "memory_operand".

OK for trunk and 6 branch if there is no regression on x86-64?

H.J.
---
gcc/

PR target/70750
* config/i386/predicates.md (call_insn_operand): Replace
sibcall_memory_operand with memory_operand.

gcc/testsuite/

PR target/70750
* gcc.target/i386/pr70750-1.c: New test.
* gcc.target/i386/pr70750-2.c: Likewise.
---
 gcc/config/i386/predicates.md |  2 +-
 gcc/testsuite/gcc.target/i386/pr70750-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr70750-2.c | 11 +++
 3 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70750-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr70750-2.c

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 14e80d9..93dda7b 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -637,7 +637,7 @@
 (op, mode == VOIDmode ? mode : Pmode)")
(match_operand 0 "call_register_no_elim_operand")
(ior (and (not (match_test "TARGET_X32"))
-(match_operand 0 "sibcall_memory_operand"))
+(match_operand 0 "memory_operand"))
(and (match_test "TARGET_X32 && Pmode == DImode")
 (match_operand 0 "GOT_memory_operand")
 
diff --git a/gcc/testsuite/gcc.target/i386/pr70750-1.c b/gcc/testsuite/gcc.target/i386/pr70750-1.c
new file mode 100644
index 000..9fcab17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr70750-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2" } */
+
+int
+f (int (**p) (void))
+{
+  return p[1]();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\].*\\(%rdi\\)" { target { lp64 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*%rax" { target { x32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr70750-2.c b/gcc/testsuite/gcc.target/i386/pr70750-2.c
new file mode 100644
index 000..afbef37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr70750-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2" } */
+
+int
+f (int (**p) (void))
+{
+  return -p[1]();
+}
+
+/* { dg-final { scan-assembler "call\[ \t\].*\\(%rdi\\)" { target { lp64 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]\\*%rax" { target { x32 } } } } */
-- 
2.5.5



Re: Document OpenACC status for GCC 6

2016-04-21 Thread Sandra Loosemore

On 04/21/2016 10:21 AM, Thomas Schwinge wrote:


+ Code will be offloaded onto multiple gangs, but executes with
+   just one worker, and a vector length of 1.


"will be" (future) vs "executes" (present).  Assuming this is all 
supposed to describe current behavior, please write consistently in the 
present tense.



+   Typically, using the OpenACC parallel construct will give much better
+   performance, compared to the initial support of the OpenACC kernels
+   construct.


Here too.

My only comment on the rest of the patch is that "a kernels region" 
sounds like a mistake but I think that is the official terminology?


-Sandra the nit-picky



Re: [PATCH] PR target/70750: [6/7 Regression] Load and call no longer combined for indirect calls on x86

2016-04-21 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 7:46 PM, H.J. Lu  wrote:
> r231923 has
>
>  ;; Test for a valid operand for a call instruction.
>  ;; Allow constant call address operands in Pmode only.
>  (define_special_predicate "call_insn_operand"
>(ior (match_test "constant_call_address_operand
>  (op, mode == VOIDmode ? mode : Pmode)")
> (match_operand 0 "call_register_no_elim_operand")
> -   (and (not (match_test "TARGET_X32"))
> -   (match_operand 0 "memory_operand"
> +   (ior (and (not (match_test "TARGET_X32"))
> +(match_operand 0 "sibcall_memory_operand"))
>^^^ A typo.
> +   (and (match_test "TARGET_X32 && Pmode == DImode")
> +(match_operand 0 "GOT_memory_operand")
>
> "sibcall_memory_operand" should be "memory_operand".
>
> OK for trunk and 6 branch if there is no regression on x86-64?

OK everywhere, but needs RM's approval for branch.

Thanks,
Uros.

> H.J.
> ---
> gcc/
>
> PR target/70750
> * config/i386/predicates.md (call_insn_operand): Replace
> sibcall_memory_operand with memory_operand.
>
> gcc/testsuite/
>
> PR target/70750
> * gcc.target/i386/pr70750-1.c: New test.
> * gcc.target/i386/pr70750-2.c: Likewise.
> ---
>  gcc/config/i386/predicates.md |  2 +-
>  gcc/testsuite/gcc.target/i386/pr70750-1.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr70750-2.c | 11 +++
>  3 files changed, 23 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr70750-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr70750-2.c
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 14e80d9..93dda7b 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -637,7 +637,7 @@
>  (op, mode == VOIDmode ? mode : Pmode)")
> (match_operand 0 "call_register_no_elim_operand")
> (ior (and (not (match_test "TARGET_X32"))
> -(match_operand 0 "sibcall_memory_operand"))
> +(match_operand 0 "memory_operand"))
> (and (match_test "TARGET_X32 && Pmode == DImode")
>  (match_operand 0 "GOT_memory_operand")
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr70750-1.c 
> b/gcc/testsuite/gcc.target/i386/pr70750-1.c
> new file mode 100644
> index 000..9fcab17
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr70750-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2" } */
> +
> +int
> +f (int (**p) (void))
> +{
> +  return p[1]();
> +}
> +
> +/* { dg-final { scan-assembler "jmp\[ \t\].*\\(%rdi\\)" { target { lp64 } } 
> } } */
> +/* { dg-final { scan-assembler "jmp\[ \t\]\\*%rax" { target { x32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr70750-2.c 
> b/gcc/testsuite/gcc.target/i386/pr70750-2.c
> new file mode 100644
> index 000..afbef37
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr70750-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2" } */
> +
> +int
> +f (int (**p) (void))
> +{
> +  return -p[1]();
> +}
> +
> +/* { dg-final { scan-assembler "call\[ \t\].*\\(%rdi\\)" { target { lp64 } } 
> } } */
> +/* { dg-final { scan-assembler "call\[ \t\]\\*%rax" { target { x32 } } } } */
> --
> 2.5.5
>


Re: [PATCH][doc] Update documentation of AArch64 options

2016-04-21 Thread Sandra Loosemore

On 04/21/2016 10:26 AM, Wilco Dijkstra wrote:


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e9763d44d8d7aa6a64821a4b1811e550254e..ddd4eeaec1502f871d0febd6045e37153c48a7e1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12827,9 +12827,9 @@ These options are defined for AArch64 implementations:
  @item -mabi=@var{name}
  @opindex mabi
  Generate code for the specified data model.  Permissible values
-are @samp{ilp32} for SysV-like data model where int, long int and pointer
+are @samp{ilp32} for SysV-like data model where int, long int and pointers
  are 32-bit, and @samp{lp64} for SysV-like data model where int is 32-bit,
-but long int and pointer are 64-bit.
+but long int and pointers are 64-bit.


Can you please change all the incorrectly hyphenated "32-bit" and 
"64-bit" uses in this section to "32 bits" and "64 bits" respectively? 
("n-bit" should only be hyphenated when it is used as an adjective 
phrase immediately before the noun it modifies.)



@@ -12872,7 +12871,8 @@ statically linked only.

  @item -mstrict-align
  @opindex mstrict-align
-Do not assume that unaligned memory references are handled by the system.
+Avoid generating unaligned accesses when accessing objects at non-naturally
+aligned boundaries as described in the architecture.


The new text seems repetitive and awkward to me.  How about something like:

Avoid generating memory accesses that may not be aligned on a natural 
object boundary as described in the architecture specification.


??


@@ -12916,10 +12916,11 @@ corresponding flag to the linker.
  @item -mno-low-precision-recip-sqrt
  @opindex -mlow-precision-recip-sqrt
  @opindex -mno-low-precision-recip-sqrt
-When calculating the reciprocal square root approximation,
-uses one less step than otherwise, thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} enables the reciprocal square root
-approximation, which in turn depends on the target processor.
+Enable or disable reciprocal square root approximation.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.  Enabling this reduces
+precision of reciprocal square root results to about 16 bits for
+single-precision and to 32 bits for double-precision.


"single precision" and "double precision" should not be hyphenated when 
used as nouns, as they are here (only when used as an adjective phrase 
immediately before the noun they modify).



@@ -13010,10 +13003,10 @@ This option is only intended to be useful when developing GCC.

  @item -mpc-relative-literal-loads
  @opindex mpcrelativeliteralloads


What happened to that @opindex entry?  :-(


-Enable PC relative literal loads. If this option is used, literal
-pools are assumed to have a range of up to 1MiB and an appropriate
-instruction sequence is used. This option has no impact when used
-with @option{-mcmodel=tiny}.
+Enable PC relative literal loads.  With this option literal pools are


"PC-relative" should be hyphenated since this *is* an adjective phrase 
immediately before the noun it modifies.


-Sandra the nit-picky



RE: [AArch64] Emit division using the Newton series

2016-04-21 Thread Evandro Menezes
> On 04/04/16 14:06, Evandro Menezes wrote:
> > On 04/01/16 17:52, Evandro Menezes wrote:
> >> On 04/01/16 17:45, Wilco Dijkstra wrote:
> >>> Evandro Menezes wrote:
> >>>
>  However, I don't think that there's the need to handle any special
>  case for division.  The only case when the approximation differs
>  from division is when the numerator is infinity and the
>  denominator, zero, when the approximation returns infinity and the
>  division, NAN.  So I don't think that it's a special case that
>  deserves being handled.
>  IOW,
>  the result of the approximate reciprocal is always needed.
> >>>   No, the result of the approximate reciprocal is not needed.
> >>>
> >>> Basically a NR approximation produces a correction factor that is
> >>> very close to 1.0, and then multiplies that with the previous
> >>> estimate to get a more accurate estimate. The final calculation for
> >>> x * recip(y) is:
> >>>
> >>> result = (reciprocal_correction * reciprocal_estimate) * x
> >>>
> >>> while what I am suggesting is a trivial reassociation:
> >>>
> >>> result = reciprocal_correction * (reciprocal_estimate * x)
> >>>
> >>> The computation of the final reciprocal_correction is on the
> >>> critical latency path, while reciprocal_estimate is computed
> >>> earlier, so we can compute (reciprocal_estimate * x) without
> >>> increasing the overall latency.
> >>> Ie. we saved
> >>> a multiply.
> >>>
> >>> In principle this could be done as a separate optimization pass that
> >>> tries to reassociate to reduce latency. However I'm not too
> >>> convinced this would be easy to implement in GCC's scheduler, so
> >>> it's best to do it explicitly.
> >>
> >> I think that I see what you mean.  I'll hack something tomorrow.
> >
> >[AArch64] Emit division using the Newton series
> >
> >2016-04-04  Evandro Menezes  
> > Wilco Dijkstra 
> >
> >gcc/
> > * config/aarch64/aarch64-tuning-flags.def
> > * config/aarch64/aarch64-protos.h
> > (AARCH64_APPROX_MODE): New macro.
> > (AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}:
> > New tuning macros.
> > (tune_params): Add new member "approx_div_modes".
> > (aarch64_emit_approx_div): Declare new function.
> > * config/aarch64/aarch64.c
> > (generic_tunings): New member "approx_div_modes".
> > (cortexa35_tunings): Likewise.
> > (cortexa53_tunings): Likewise.
> > (cortexa57_tunings): Likewise.
> > (cortexa72_tunings): Likewise.
> > (exynosm1_tunings): Likewise.
> > (thunderx_tunings): Likewise.
> > (xgene1_tunings): Likewise.
> > (aarch64_emit_approx_div): Define new function.
> > * config/aarch64/aarch64.md ("div3"): New expansion.
> > * config/aarch64/aarch64-simd.md ("div3"): Likewise.
> > * config/aarch64/aarch64.opt (-mlow-precision-div): Add new
> >option.
> > * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >
> >
> > This version of the patch has a shorter dependency chain at the last
> > iteration of the series.
> 
> Ping^1

Ping^2

-- 
Evandro Menezes  Austin, TX



RE: [AArch64] Emit square root using the Newton series

2016-04-21 Thread Evandro Menezes
> On 04/05/16 17:30, Evandro Menezes wrote:
> > On 04/05/16 13:37, Wilco Dijkstra wrote:
> >> I can't get any of these to work... Not only do I get a large number
> >> of collisions and duplicated code between these patches, when I try
> >> to resolve them, all I get is crashes whenever I try to use sqrt
> >> (even rsqrt stopped working). Do you have a patchset that applies
> >> cleanly so I can try all approximation routines?
> >
> > The original patches should be independent of each other, so indeed
> > they duplicate code.
> >
> > This patch suite should be suitable for testing.
> 
> Ping^1

Ping^2
 
--
Evandro Menezes




RE: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-21 Thread Evandro Menezes
> On 04/04/16 11:13, Evandro Menezes wrote:
> > On 04/01/16 18:08, Wilco Dijkstra wrote:
> >> Evandro Menezes wrote:
> >>> I hope that this gets in the ballpark of what's been discussed
> >>> previously.
> >> Yes that's very close to what I had in mind. A minor issue is that
> >> the vector modes cannot work as they start at MAX_MODE_FLOAT (which
> >> is > 32):
> >>
> >> +/* Control approximate alternatives to certain FP operators. */
> >> +#define AARCH64_APPROX_MODE(MODE) \
> >> +  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
> >> +   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
> >> +   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <=
> >> MAX_MODE_VECTOR_FLOAT) \
> >> + ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
> >> + : (0))
> >>
> >> That should be:
> >>
> >> + ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT -
> >> MIN_MODE_FLOAT + 1)) \
> >>
> >> It would be worth testing all the obvious cases to be sure they work.
> >>
> >> Also I don't think it is a good idea to enable all modes on Exynos-M1
> >> and XGene-1 - I haven't seen any evidence that shows it gives a
> >> speedup on real code for all modes (or at least on a good micro
> >> benchmark like the unit vector test I suggested - a simple throughput
> >> test does not count!).
> >
> > This approximation does benefit M1 in general across several
> > benchmarks.  As for my choice for Xgene1, it preserves the original
> > setting.  I believe that, with this more granular option, developers
> > can fine tune their targets.
> >
> >> The issue is it hides performance gains from an improved divider/sqrt
> >> on new revisions or microarchitectures. That means you should only
> >> enable cases where there is evidence of a major speedup that cannot
> >> be matched by a future improved divider/sqrt.
> >
> > I did notice that some benchmarks with heavy use of multiplication or
> > multiply-accumulation, the series may be detrimental, since it
> > increases the competition for the unit(s) that do(es) such operations.
> >
> > But those micro-architectures that get a better unit for division or
> > sqrt() are free to add their own tuning parameters.  Granted, I assume
> > that running legacy code is not much of an issue only in a few markets.
> 
> Ping^1

Ping^2

-- 
Evandro Menezes  Austin, TX
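
The bit-index correction quoted above can be checked with a small sketch. The mode numbers below are made up for illustration (the real ones come from GCC's generated machine-mode enumeration), and `sketch_mode` and `approx_bit` are hypothetical names; only the arithmetic mirrors the corrected macro.

```cpp
#include <cassert>

// Hypothetical mode numbering, chosen only for illustration.
enum sketch_mode
{
  MIN_FLOAT = 30, MAX_FLOAT = 33,     // four scalar FP modes
  MIN_VFLOAT = 70, MAX_VFLOAT = 75    // six vector FP modes
};

// Map a mode to one bit of a mask: scalar FP modes take the low bits and
// vector FP modes follow immediately after them.  The offset for vector
// modes is the number of scalar modes (MAX_FLOAT - MIN_FLOAT + 1), not
// MAX_FLOAT + 1, which would shift far past the width of the mask.
constexpr unsigned approx_bit(int mode)
{
  return (MIN_FLOAT <= mode && mode <= MAX_FLOAT)
         ? 1u << (mode - MIN_FLOAT)
         : (MIN_VFLOAT <= mode && mode <= MAX_VFLOAT)
           ? 1u << (mode - MIN_VFLOAT + MAX_FLOAT - MIN_FLOAT + 1)
           : 0u;
}
```

With these made-up numbers the first vector mode lands on bit 4, right after the four scalar bits; the uncorrected offset would have asked for bit 34.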



Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-21 Thread H.J. Lu
On Thu, Apr 21, 2016 at 9:37 AM, H.J. Lu  wrote:
> On Thu, Apr 21, 2016 at 9:31 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 4:50 PM, H.J. Lu  wrote:
>>
> I tried and it doesn't work since the correct mode may not be always
> available in predicates.  Yes, they pass mode.  But they just do
>
> mode = GET_MODE (op);
>
> which returns VOIDmode for -1.

 Well, looking at generated gcc/insns-preds.c, the predicates do:

 (mode == VOIDmode || GET_MODE (op) == mode).

 They *check* and don't *assign* "mode" variable.

 So, I see no problem checking "mode" variable (that gets the value
 from the pattern) in the predicates.
>>>
>>> This is an incomplete list:
>>>
>>> combine.c:   && ! push_operand (dest, GET_MODE (dest)))
>>> expr.c:  if (push_operand (x, GET_MODE (x)))
>>> expr.c:  && ! push_operand (x, GET_MODE (x
>>> gcse.c:   && ! push_operand (dest, GET_MODE (dest)))
>>> gcse.c:  if (general_operand (exp, GET_MODE (reg)))
>>> ifcvt.c:  if (! general_operand (cmp_a, GET_MODE (cmp_a))
>>> ifcvt.c:  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
>>> ifcvt.c:  else if (general_operand (b, GET_MODE (b)))
>>> ifcvt.c:  if (! general_operand (a, GET_MODE (a)) || tmp_a)
>>> ifcvt.c:  if (! general_operand (b, GET_MODE (b)) || tmp_b)
>>> ira-costs.c:  if (address_operand (op, GET_MODE (op))
>>> ira-costs.c:  && general_operand (SET_SRC (set), GET_MODE (SET_SRC 
>>> (set
>>> lower-subreg.c:  if (GET_MODE (op_operand) != word_mode
>>> lower-subreg.c:  && GET_MODE_SIZE (GET_MODE (op_operand)) > 
>>> UNITS_PER_WORD)
>>> lower-subreg.c: GET_MODE 
>>> (op_operand),
>>> lra-constraints.c: if (simplify_operand_subreg (i, GET_MODE (old)) ||
>>> op_change_p)
>>> optabs.c:  create_output_operand (&ops[0], target, GET_MODE (target));
>>> optabs.c:  create_input_operand (&ops[1], op0, GET_MODE (op0));
>>> postreload-gcse.c:  if (! push_operand (dest, GET_MODE (dest)))
>>> postreload-gcse.c:  && general_operand (src, GET_MODE (src))
>>> postreload-gcse.c:  && general_operand (dest, GET_MODE (dest))
>>> postreload-gcse.c:  && general_operand (src, GET_MODE (src))
>>>
>>> IRA and LRA use GET_MODE and pass it to predicates.
>>
>> I don't know what are you trying to prove here ...
>
> The "mode" argument passed to predates can't be used to determine if
> -1 is a valid SSE constant.
>

Hi Uros,

Here is the updated patch with my standard_sse_constant_p change and
your SSE/AVX pattern change.  I didn't include your
standard_sse_constant_opcode change since it didn't compile and isn't needed
for this purpose.

-- 
H.J.
From 57c811f25e0735e744d255b3d66a86dbb131749c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 6 Mar 2016 06:40:31 -0800
Subject: [PATCH] Allow all 1s of integer as standard SSE constants

Since all 1s in TImode is standard SSE2 constants, all 1s in OImode is
standard AVX2 constants and all 1s in XImode is standard AVX512F constants,
pass mode to standard_sse_constant_p and standard_sse_constant_opcode
to check if all 1s is available for target.

	* config/i386/i386-protos.h (standard_sse_constant_p): Take
	machine_mode with VOIDmode as default.
	* config/i386/i386.c (standard_sse_constant_p): Get mode if
	it is VOIDmode.  Return 2 for all 1s of integer in supported
	modes.
	(ix86_expand_vector_move): Pass mode to standard_sse_constant_p.
	* config/i386/i386.md (*movxi_internal_avx512f): Replace
	vector_move_operand with nonimmediate_or_sse_const_operand and
	use BC instead of C in constraint.  Check register_operand
	instead of MEM_P.
	(*movoi_internal_avx): Add a "BC" alternative for AVX2.  Check
	register_operand instead of MEM_P.
	(*movti_internal): Add a "BC" alternative for SSE2.  Check
	register_operand instead of MEM_P for SSE.
---
 gcc/config/i386/i386-protos.h |  2 +-
 gcc/config/i386/i386.c| 27 ---
 gcc/config/i386/i386.md   | 39 ---
 3 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ff47bc1..cf54189 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,7 +50,7 @@ extern bool ix86_using_red_zone (void);
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
 extern rtx standard_80387_constant_rtx (int);
-extern int standard_sse_constant_p (rtx);
+extern int standard_sse_constant_p (rtx, machine_mode = VOIDmode);
 extern const char *standard_sse_constant_opcode (rtx_insn *, rtx);
 extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6379313..dd951c2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10766,18 +10766,31 @@ standard_80387_constant_rtx (int idx)
in supported SSE/AVX vector mode.  */
 
 int
-standard_sse_cons

Re: [PATCH][cilkplus] fix c++ implicit conversions with cilk_spawn (PR/69024, PR/68997)

2016-04-21 Thread Jason Merrill

On 01/20/2016 12:57 PM, Ryan Burn wrote:

  case AGGR_INIT_EXPR:
+  {
+   int len = 0;
+   int ii = 0;
+   extract_free_variables (TREE_OPERAND (t, 1), wd, ADD_READ);
+   if (TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST)
+ {
+   len = TREE_INT_CST_LOW (TREE_OPERAND (t, 0));
+
+   for (ii = 3; ii < len; ii++)
+ extract_free_variables (TREE_OPERAND (t, ii), wd, ADD_READ);
+   extract_free_variables (TREE_TYPE (t), wd, ADD_READ);
+ }
+   break;
+  }


Please add a comment about skipping operand 2 (the slot).  Would it make 
sense to skip operand 2 (the static chain) for CALL_EXPR, too?



+is_conversion_operator_function_decl_p (tree t) {


Open brace gets its own line.


+   tree fn = AGGR_INIT_EXPR_FN (exp);
+   if (TREE_CODE (fn) == ADDR_EXPR
+   && is_conversion_operator_function_decl_p (TREE_OPERAND (fn, 0))


It would be good to have a cp_get_callee_fndecl like the normal 
get_callee_fndecl, but supporting AGGR_INIT_EXPR as well.  That would 
replace the less capable get_function_named_in_call function in 
constexpr.c and various other places that access AGGR_INIT_EXPR_FN 
directly.  Mind doing that, either in this patch or as a follow-up?


Jason


Re: [PATCH] Fix ICE in predicate_mem_writes (PR tree-optimization/70725)

2016-04-21 Thread Jakub Jelinek
On Thu, Apr 21, 2016 at 05:23:34PM +0200, Marek Polacek wrote:
> On Thu, Apr 21, 2016 at 04:19:12PM +0100, Bin.Cheng wrote:
> > On Thu, Apr 21, 2016 at 4:13 PM, Marek Polacek  wrote:
> > > On Wed, Apr 20, 2016 at 11:57:23AM -0700, H.J. Lu wrote:
> > >> On Wed, Apr 20, 2016 at 4:19 AM, Marek Polacek  
> > >> wrote:
> > >> It leads to ICE on 32-bit x86 host:
> > >>
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70725#c8
> > >
> > > I can't reproduce with gcc-6:
> > > $ xgcc-6 -O3 -march=skylake-avx512 -c pr70725.c
> > > nor with trunk with richi's fix:
> > > $ xgcc -O3 -march=skylake-avx512 -c pr70725.c
> > Hi Marek,
> > It's my patches' fault, which are only applied on GCC7.  Also Richard
> > has quickly fixed the ICE on GCC7.
> 
> I see, great.  Jakub, I think we're fine wrt RC2 then and don't need to
> revert my patches for this PR on gcc-6-branch.

Yeah, don't revert anything if it works.

Jakub


match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-21 Thread Marc Glisse

Hello,

this optimizes a common pattern for unsigned overflow detection, when one
of the arguments turns out to be a constant. There are other forms this
pattern can take, (a + 42 <= 41) in particular, but that'll be for another
patch.
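
As a sanity check of the identity in the subject line, here is a small sketch; it is not part of the patch, and the function names are hypothetical. It compares the source-level overflow test with the folded form, and checks the equivalence exhaustively on an 8-bit type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Overflow test as it is usually written in source code.
constexpr bool wraps_naive(std::uint32_t a, std::uint32_t c)
{
  return static_cast<std::uint32_t>(a + c) < a;
}

// Form after the transformation: compare against MAX - c instead.
constexpr bool wraps_folded(std::uint32_t a, std::uint32_t c)
{
  return a > std::numeric_limits<std::uint32_t>::max() - c;
}

// Exhaustive check of the same identity for an 8-bit unsigned type and
// every nonzero constant c.
inline bool equivalent_for_all_u8()
{
  for (unsigned c = 1; c < 256; ++c)
    for (unsigned a = 0; a < 256; ++a)
      {
        bool naive = static_cast<std::uint8_t>(a + c) < a;
        bool folded = a > 255u - c;
        if (naive != folded)
          return false;
      }
  return true;
}
```

The constant has to be nonzero for the <= and >= variants: with c == 0, a + c <= a is always true while a > MAX is never true.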


Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2016-04-22  Marc Glisse  

gcc/
* match.pd (X + CST CMP X): New transformation.

gcc/testsuite/
* gcc.dg/tree-ssa/overflow-1.c: New testcase.

--
Marc Glisse
Index: gcc/match.pd
===
--- gcc/match.pd	(revision 235350)
+++ gcc/match.pd	(working copy)
@@ -3071,10 +3071,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  /* signbit(x) -> 0 if x is nonnegative.  */
  (SIGNBIT tree_expr_nonnegative_p@0)
  { integer_zero_node; })
 
 (simplify
  /* signbit(x) -> x<0 if x doesn't have signed zeros.  */
  (SIGNBIT @0)
  (if (!HONOR_SIGNED_ZEROS (@0))
   (convert (lt @0 { build_real (TREE_TYPE (@0), dconst0); }
+
+/* When one argument is a constant, overflow detection can be simplified.
+   Currently restricted to single use so as not to interfere too much with
+   ADD_OVERFLOW detection in tree-ssa-math-opts.c.  */
+(for cmp (lt le ge gt)
+ out (gt gt le le)
+ (simplify
+  (cmp (plus@2 @0 integer_nonzerop@1) @0)
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+   && TYPE_MAX_VALUE (TREE_TYPE (@0))
+   && single_use (@2))
+   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)
+(for cmp (gt ge le lt)
+ out (gt gt le le)
+ (simplify
+  (cmp @0 (plus@2 @0 integer_nonzerop@1))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+   && TYPE_MAX_VALUE (TREE_TYPE (@0))
+   && single_use (@2))
+   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)
Index: gcc/testsuite/gcc.dg/tree-ssa/overflow-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/overflow-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/overflow-1.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+int f(unsigned a){
+unsigned b=5;
+unsigned c=a-b;
+return c>a;
+}
+int g(unsigned a){
+unsigned b=32;
+unsigned c=a+b;
+return c 4294967263;" "optimized" } } */


[Patch, regex, libstdc++/70745] Fix match_not_bow and match_not_eow

2016-04-21 Thread Tim Shen
Bootstrapped and tested on x86-pc-linux-gnu debug.

It is a conformance fix, but I don't think it's very important. I'm
happy to backport it to gcc 5/4.9, but if that's not considered
necessary, I'm OK with that as well.
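
For readers unfamiliar with the two flags, a minimal sketch of the behavior the fix is aiming for, using only the standard <regex> interface. The helper names are invented for the example, and the assertions assume a standard library that already has this fix applied.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` is found anywhere in `text` under the given match flags.
inline bool found(const std::string& text, const char* pattern,
                  std::regex_constants::match_flag_type flags =
                      std::regex_constants::match_default) {
  return std::regex_search(text, std::regex(pattern), flags);
}

// Hypothetical driver for the example.
inline void demo_word_boundary_flags() {
  using namespace std::regex_constants;
  // Without flags, \b matches at both ends of the word "a".
  assert(found("a", R"(\ba)"));
  assert(found("a", R"(a\b)"));
  // match_not_bow: \b must not match at the start of the sequence.
  assert(!found("a", R"(\ba)", match_not_bow));
  // match_not_eow: \b must not match at the end of the sequence.
  assert(!found("a", R"(a\b)", match_not_eow));
  // The flags only affect the ends; a boundary in the middle still matches.
  assert(std::regex_match(std::string("a."), std::regex(R"(a\b.)"),
                          match_not_eow));
}
```

The last assertion is the kind of case the old code got wrong: a boundary in the middle of the input was rejected just because match_not_eow was set.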

Thanks!


-- 
Regards,
Tim Shen
commit 7f4f729d5dd80050ff966398e28909a40fb57932
Author: Tim Shen 
Date:   Thu Apr 21 21:02:11 2016 -0700

PR libstdc++/70745
* include/bits/regex_executor.tcc (_Executor<>::_M_word_boundary):
Fix the match_not_bow and match_not_eow behavior.
* testsuite/28_regex/regression.cc: Add testcase.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 2abd020..6bbcb1b 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -413,6 +413,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
 _M_word_boundary() const
 {
+  if (_M_current == _M_begin && (_M_flags & regex_constants::match_not_bow))
+   return false;
+  if (_M_current == _M_end && (_M_flags & regex_constants::match_not_eow))
+   return false;
+
   bool __left_is_word = false;
   if (_M_current != _M_begin
  || (_M_flags & regex_constants::match_prev_avail))
@@ -424,13 +429,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool __right_is_word =
 _M_current != _M_end && _M_is_word(*_M_current);
 
-  if (__left_is_word == __right_is_word)
-   return false;
-  if (__left_is_word && !(_M_flags & regex_constants::match_not_eow))
-   return true;
-  if (__right_is_word && !(_M_flags & regex_constants::match_not_bow))
-   return true;
-  return false;
+  return __left_is_word != __right_is_word;
 }
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc b/libstdc++-v3/testsuite/28_regex/regression.cc
index c9a3402..d367c8b 100644
--- a/libstdc++-v3/testsuite/28_regex/regression.cc
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -45,7 +45,20 @@ test02()
   "/ghci"
 };
   auto rx = std::regex(re_str, std::regex_constants::grep | std::regex_constants::icase);
-  VERIFY(std::regex_search("/abcd", rx));
+  VERIFY(regex_search_debug("/abcd", rx));
+}
+
+void
+test03()
+{
+  bool test __attribute__((unused)) = true;
+
+  VERIFY(regex_match_debug("a.", regex(R"(a\b.)"), regex_constants::match_not_eow));
+  VERIFY(regex_match_debug(".a", regex(R"(.\ba)"), regex_constants::match_not_bow));
+  VERIFY(regex_search_debug("a", regex(R"(^\b)")));
+  VERIFY(regex_search_debug("a", regex(R"(\b$)")));
+  VERIFY(!regex_search_debug("a", regex(R"(^\b)"), regex_constants::match_not_bow));
+  VERIFY(!regex_search_debug("a", regex(R"(\b$)"), regex_constants::match_not_eow));
 }
 
 int
@@ -53,6 +66,7 @@ main()
 {
   test01();
   test02();
+  test03();
   return 0;
 }
 


Re: [PATCH 01/18] stop using rtx_insn_list in reorg.c

2016-04-21 Thread Jeff Law

On 04/20/2016 12:22 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-04-19  Trevor Saunders  

* reorg.c (try_merge_delay_insns): Make merged_insns a vector.

OK.
jeff



Re: [PATCH 04/18] remove unused loads rtx_insn_list

2016-04-21 Thread Jeff Law

On 04/20/2016 12:22 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-04-19  Trevor Saunders  

* gcse.c (struct ls_expr): Remove loads field.
(ldst_entry): Adjust.
(free_ldst_entry): Likewise.
(print_ldst_list): Likewise.
(compute_ld_motion_mems): Likewise.

OK.
jeff



Re: [PATCH 05/18] make stores rtx_insn_list a vec

2016-04-21 Thread Jeff Law

On 04/20/2016 05:45 AM, Segher Boessenkool wrote:

On Wed, Apr 20, 2016 at 02:22:09AM -0400, tbsaunde+...@tbsaunde.org wrote:

2016-04-19  Trevor Saunders  

* gcse.c (struct ls_expr): make stores field a vector.


Capital M.


@@ -3604,7 +3604,7 @@ ldst_entry (rtx x)
ptr->expr = NULL;
ptr->pattern  = x;
ptr->pattern_regs = NULL_RTX;
-  ptr->stores   = NULL;
+  ptr->stores  .create (0);


Spaces.


@@ -3620,7 +3620,7 @@ ldst_entry (rtx x)
  static void
  free_ldst_entry (struct ls_expr * ptr)
  {
-  free_INSN_LIST_list (& ptr->stores);
+   ptr->stores.release ();


Wrong indent.

Patch is OK with those nits fixed.

jeff



Re: [PATCH 08/18] make side_effects a vec

2016-04-21 Thread Jeff Law

On 04/20/2016 12:22 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-04-19  Trevor Saunders  

* var-tracking.c (struct adjust_mem_data): Make side_effects a vector.
(adjust_mems): Adjust.
(adjust_insn): Likewise.
(prepare_call_arguments): Likewise.
---
  gcc/var-tracking.c | 30 +++---
  1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 9f09d30..7fc6ed3 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -926,7 +926,7 @@ struct adjust_mem_data
bool store;
machine_mode mem_mode;
HOST_WIDE_INT stack_adjust;
-  rtx_expr_list *side_effects;
+  auto_vec side_effects;
  };

Is auto_vec what you really want here?  AFAICT this object is never
destructed, so we're not releasing the memory.  Am I missing something here?
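
To make the concern concrete, a small C++ sketch of the general issue: a member that frees its storage in its destructor only does so if the enclosing object is actually destroyed. The types below are hypothetical stand-ins for illustration, not GCC's auto_vec or adjust_mem_data, and the sketch says nothing about how adjust_mem_data is really allocated.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Counts how many counted_buf objects are currently alive.
static int live_buffers = 0;

// Hypothetical stand-in for a self-releasing container.
struct counted_buf {
  counted_buf() { ++live_buffers; }
  ~counted_buf() { --live_buffers; }
};

// Hypothetical stand-in for a struct holding such a member.
struct holder {
  counted_buf side_effects;
};

// Automatic storage: the destructor runs at scope exit, nothing is left over.
inline int leaked_after_scoped_use() {
  const int before = live_buffers;
  {
    holder h;
    (void)h;
  }
  return live_buffers - before;
}

// Raw storage that is freed without running the destructor: the member's
// cleanup never happens, so one buffer is still counted as alive.
inline int leaked_after_skipped_destructor() {
  const int before = live_buffers;
  void* raw = std::malloc(sizeof(holder));
  assert(raw != nullptr);
  holder* h = new (raw) holder;
  (void)h;
  std::free(raw);  // storage returned, but ~holder() was never called
  return live_buffers - before;
}
```

If the struct only ever lives on the stack, the first case applies and the member is released automatically; the second case is the situation the question above is worried about.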



Jeff

