[committed] Another testcase for already fixed PR (PR c++/53574)

2017-10-16 Thread Jakub Jelinek
Hi!

Paolo has pinged me about this PR, which is indeed fixed for years.
I've checked in a slightly further reduced testcase as obvious.

2017-10-16  Jakub Jelinek  

PR c++/53574
* g++.dg/other/pr53574.C: New test.

--- gcc/testsuite/g++.dg/other/pr53574.C.jj 2017-10-16 10:12:51.387091133 
+0200
+++ gcc/testsuite/g++.dg/other/pr53574.C2017-10-16 10:29:49.0 
+0200
@@ -0,0 +1,48 @@
+// PR c++/53574
+// { dg-do compile { target c++11 } }
+// { dg-options "-fstack-usage" }
+
+template  struct A { typedef int type; };
+struct B {
+  typedef __SIZE_TYPE__ H;
+};
+template  class allocator : B {};
+template  struct C {
+  template 
+  static typename T::H foo(T *);
+  typedef decltype(foo((_Alloc *)0)) H;
+  template 
+  static typename A::type bar(U) { return typename A::type (); }
+  static int baz(_Alloc p1) { bar(p1); return 0; }
+};
+template  struct I : C<_Alloc> {};
+template  struct J {
+  typedef I> K;
+  K k;
+};
+struct D : J> {
+  void fn(int, int) {
+K m;
+I::baz(m);
+  }
+};
+template  struct F {
+  F();
+  F(const Ch *);
+  F test();
+  D d;
+};
+int l;
+struct G {
+  G(F);
+};
+char n;
+template  F::F(const Ch *) {
+  test();
+}
+template 
+F F::test() {
+  d.fn(l, 0);
+  return F ();
+}
+G fn1() { return G(&n); }

Jakub


[patch] Enhance support for -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than

2017-10-16 Thread Eric Botcazou
Hi,

a big limitation of -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than is 
that you need -O2 (or more precisely -ftree-vrp) in order to be able to say 
something sensible for dynamically-sized objects/VLAs/calls to alloca.  That 
can be problematic, for example if the coding guidelines prevents you from 
using anything beyond -O1 for production builds.

Now in Ada it is very easy and common to use integer types with custom bounds 
(the compiler automatically generates the appropriate run-time bound checks) 
so it is very easy to be able to say something sensible about dynamically-
sized objects and VLAs (Ada doesn't have alloca) even at -O0 or -O1.

That's why the attached patch introduces a way for front-ends to communicate 
an upper bound for the size of dynamically-sized objects/VLAs/calls to alloca 
to the -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than machineries, based 
on a 3rd form of the BUILT_IN_ALLOCA builtin which takes a 3rd parameter in 
addition to the 2 parameters of BUILT_IN_ALLOCA_WITH_ALIGN.  This 3rd form is 
only used when the front-end can put an upper bound via max_int_size_in_bytes, 
which invokes lang_hooks.types.max_size, for the time being, but its usage 
could easily be extended.

Macros and helper function are provided to manipulate the variants as a single 
builtin, so that code not directly tied to their support is little modified.
The -Wstack-usage and -Wvla-larger-than/-Walloca-larger-than machineries are 
enhanced to take into account the upper bound, if it exists.

Bootstrapped/regtested on x86_64-suse-linux, OK for the mainline?


2017-10-16  Eric Botcazou  

* asan.c (handle_builtin_alloca): Deal with all alloca variants.
(get_mem_refs_of_builtin_call): Likewise.
* builtins.c (expand_builtin_apply): Adjust call to
allocate_dynamic_stack_space.
(expand_builtin_alloca): For __builtin_alloca_with_align_and_max, pass
the third argument to allocate_dynamic_stack_space, otherwise -1.
(expand_builtin): Deal with all alloca variants.
(is_inexpensive_builtin): Likewise.
* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX): New.
* calls.c (special_function_p): Deal with all alloca variants.
(initialize_argument_information): Adjust call to
allocate_dynamic_stack_space.
(expand_call): Likewise.
* cfgexpand.c (expand_stack_vars): Likewise.
(expand_call_stmt): Deal with all alloca variants.
* doc/extend.texi (Built-ins): Add __builtin_alloca_with_align_and_max
* explow.c (allocate_dynamic_stack_space): Add MAX_SIZE parameter and
use it for the stack usage computation.
* explow.h (allocate_dynamic_stack_space): Adjust prototype.
* function.c (gimplify_parameters): Turn BUILT_IN_ALLOCA_WITH_ALIGN
into BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX and pass maximum size.
* gimple-ssa-warn-alloca.c (alloca_call_type): Simplify control flow.
Take into account 3rd argument of __builtin_alloca_with_align_and_max.
(in_loop_p): Remove first argument and useless check.
(pass_walloca::execute): Remove useless test and adjust call to above.
* gimple.c (gimple_build_call_from_tree): Deal with all alloc variants
* gimplify.c (gimplify_vla_decl): Turn BUILT_IN_ALLOCA_WITH_ALIGN into
BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX and pass maximum size.
(gimplify_call_expr): Deal with all alloca variants.
* hsa-gen.c (gen_hsa_alloca): Likewise.
(gen_hsa_insns_for_call): Likewise.
* ipa-pure-const.c (special_builtin_state): Likewise.
* tree-chkp.c (chkp_build_returned_bound): Likewise.
* tree-object-size.c (alloc_object_size): Likewise.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Likewise.
(call_may_clobber_ref_p_1): Likewise.
* tree-ssa-ccp.c (evaluate_stmt): Likewise.
(ccp_fold_stmt): Likewise.
(optimize_stack_restore): Likewise.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Likewise.
(mark_all_reaching_defs_necessary_1): Likewise.
(propagate_necessity): Likewise.
(eliminate_unnecessary_stmts): Likewise.
* tree.c (build_common_builtin_nodes): Build
BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX.
* tree.h (ALLOCA_FUNCTION_CODE_P): New macro.
(CASE_BUILT_IN_ALLOCA): Likewise.
* varasm.c (incorporeal_function_p): Deal with all alloca variants.
ada/
* gcc-interface/utils.c (max_size): Deal with SSA names.
c-family/
* c-common.c (check_builtin_function_arguments): Also check arguments
of __builtin_alloca_with_align_and_max.


2017-10-16  Eric Botcazou  

* gcc.dg/Walloca-15.c: New test.
* gnat.dg/stack_usage4.adb: Likewise.
* gnat.dg/stack_usage4_pkg.ads: New helper.

-- 
Eric BotcazouIndex: ada/gcc-interface/utils.c
===
--- ada

RE: [PATCH][GCC][ARM] Dot Product NEON patterns [Patch (2/8)]

2017-10-16 Thread Tamar Christina


> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com]
> Sent: 12 October 2017 13:57
> To: Tamar Christina; Kyrill Tkachov; gcc-patches@gcc.gnu.org
> Cc: nd; Ramana Radhakrishnan; ni...@redhat.com
> Subject: Re: [PATCH][GCC][ARM] Dot Product NEON patterns [Patch (2/8)]
> 
> On 06/10/17 13:44, Tamar Christina wrote:
> > Hi All,
> >
> > this is a minor respin with changes echo'd from feedback from aarch64.
> > I assume still OK for trunk.
> >
> > Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> > aarch64_be-none-elf with no issues found.
> >
> > Ok for trunk?
> >
> > gcc/
> > 2017-10-06  Tamar Christina  
> >
> > * config/arm/arm-builtins.c (arm_unsigned_uternop_qualifiers): New.
> > (UTERNOP_QUALIFIERS, arm_umac_lane_qualifiers,
> UMAC_LANE_QUALIFIERS): New.
> > * config/arm/arm_neon_builtins.def (sdot, udot, sdot_lane,
> udot_lane): new.
> > * config/arm/iterators.md (DOTPROD, VSI2QI, vsi2qi): New.
> > (UNSPEC_DOT_S, UNSPEC_DOT_U, opsuffix): New.
> > * config/arm/neon.md (neon_dot): New.
> > (neon_dot_lane, dot_prod): New.
> > * config/arm/types.md (neon_dot, neon_dot_q): New.
> > * config/arm/unspecs.md (sup): Add UNSPEC_DOT_S,
> UNSPEC_DOT_U.
> 
> OK if this passes a native bootstrap.

Bootstrapped on arm-none-linux-gnueabihf and no issues.

Thanks,
Tamar
> 
> R.
> 
> > 
> > From: Kyrill Tkachov 
> > Sent: Wednesday, September 13, 2017 10:36:38 AM
> > To: Tamar Christina; gcc-patches@gcc.gnu.org
> > Cc: nd; Ramana Radhakrishnan; Richard Earnshaw; ni...@redhat.com
> > Subject: Re: [PATCH][GCC][ARM] Dot Product NEON patterns [Patch (2/8)]
> >
> > Hi Tamar,
> >
> > On 01/09/17 14:33, Tamar Christina wrote:
> >> Hi All,
> >>
> >> This patch adds the instructions for Dot Product to ARM along with
> >> the intrinsics and vectorizer pattern.
> >>
> >> Armv8.2-a dot product supports 8-bit element values both signed and
> >> unsigned.
> >>
> >> Dot product is available from Armv8.2-a and onwards.
> >>
> >> Regtested and bootstrapped on arm-none-eabi and no issues.
> >>
> >> Ok for trunk?
> >
> > This is ok once the prerequisites are approved with one ChangeLog nit.
> >
> > Kyrill
> >
> >> gcc/
> >> 2017-09-01  Tamar Christina  
> >>
> >>   * config/arm/arm-builtins.c (arm_unsigned_uternop_qualifiers): New.
> >>   (UTERNOP_QUALIFIERS, arm_umac_lane_qualifiers,
> UMAC_LANE_QUALIFIERS): New.
> >>   * config/arm/arm_neon_builtins.def (sdot, udot, sdot_lane,
> udot_lane): new.
> >>   * config/arm/iterators.md (DOTPROD, DOT_MODE, dot_mode): New.
> >>   (UNSPEC_DOT_S, UNSPEC_DOT_U, opsuffix): New.
> >>   * config/arm/neon.md (neon_dot): New.
> >>   (neon_dot_lane, dot_prod):
> New.
> >>   * config/arm/types.md (neon_dot, neon_dot_q): New.
> >>   * config/arm/unspecs.md (UNSPEC_DOT_S, UNSPEC_DOT_U): New.
> >>
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index
> >
> 7acbaf1bb40a4f270e75968804546508f7839e49..139e09fd929e17216ad9383505
> f1
> > 453a73d071fb 100644
> > --- a/gcc/config/arm/iterators.md
> >
> > --snip---
> >
> >   
> > ;;
> >   ;; Code attributes
> >
> > ;;
> > 
> > @@ -816,6 +822,7 @@
> > (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
> > (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
> > (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
> > +  (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
> >   ])
> >
> > In your ChangeLog you list this as "New" whereas your patch just adds
> them to the "sup" int_attr.
> >



[openacc, testsuite, committed] Enable libgomp.oacc-*/declare-*.{c,f90} for non-nvidia devices

2017-10-16 Thread Tom de Vries

Hi,

this patch enables some openacc test-cases for non-nvidia devices.

Committed.

Thanks,
- Tom
Enable libgomp.oacc-*/declare-*.{c,f90} for non-nvidia devices

2017-10-16  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/declare-1.c: Don't require
	openacc_nvidia_accel_selected.
	* testsuite/libgomp.oacc-c-c++-common/declare-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/declare-4.c: Same.
	* testsuite/libgomp.oacc-fortran/declare-2.f90: Same.
	* testsuite/libgomp.oacc-fortran/declare-4.f90: Same
	* testsuite/libgomp.oacc-fortran/declare-5.f90: Same.
	* testsuite/libgomp.oacc-c-c++-common/declare-5.c: Don't require
	openacc_nvidia_accel_selected. Skip for shared memory device.
	* testsuite/libgomp.oacc-fortran/declare-1.f90: Same.
	* testsuite/libgomp.oacc-fortran/declare-3.f90: Same.

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c | 2 --
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-2.c | 2 --
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c | 2 --
 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-5.c | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90| 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90| 2 --
 libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90| 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90| 2 --
 libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90| 2 --
 9 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
index c63a68d..bc7261742 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
@@ -1,5 +1,3 @@
-/* { dg-do run { target openacc_nvidia_accel_selected } } */
-
 #include 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-2.c
index 2078a33..d212458 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-2.c
@@ -1,5 +1,3 @@
-/* { dg-do run { target openacc_nvidia_accel_selected } } */
-
 #include 
 
 #define N 16
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c
index 36bf0eb..ca48e80 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c
@@ -1,5 +1,3 @@
-/* { dg-do run  { target openacc_nvidia_accel_selected } } */
-
 #include 
 #include 
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-5.c
index 38c5de0..229e96c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-5.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } } */
 
 #include 
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
index 2d4b707..ca8831e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -1,4 +1,4 @@
-! { dg-do run  { target openacc_nvidia_accel_selected } }
+! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
 
 ! Tests to exercise the declare directive along with
 ! the clauses: copy
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90
index 2aa7907..aeea10a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90
@@ -1,5 +1,3 @@
-! { dg-do run  { target openacc_nvidia_accel_selected } }
-
 module globalvars
   implicit none
   integer a
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90
index 3a6b420..88b9aff 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90
@@ -1,4 +1,4 @@
-! { dg-do run  { target openacc_nvidia_accel_selected } }
+! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
 
 module globalvars
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90
index 226264e..252c4ff 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90
@@ -1,5 +1,3 @@
-! { dg-do run  { target openacc_nvidia_accel_selected } }
-
 module vars
   implicit none
   real b
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 b/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index bcd9c9c..e91f26b 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/de

Re: [PATCH][GCC][AArch64] Restrict lrint inlining on ILP32.

2017-10-16 Thread Tamar Christina
Ping?

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Tamar Christina 
Sent: Wednesday, September 13, 2017 4:00:24 PM
To: gcc-patches@gcc.gnu.org
Cc: nd; James Greenhalgh; Richard Earnshaw; Marcus Shawcroft; pins...@gmail.com
Subject: [PATCH][GCC][AArch64] Restrict lrint inlining on ILP32.

Hi All,

The inlining of lrint isn't valid in all cases on ILP32 when
-fno-math-errno is used because an inexact exception is raised in
certain circumstances.

Instead the restriction is placed such that the integer mode has to
be larger or equal to the float mode in addition to either inexacts being
allowed or not caring about trapping math.

This prevents the overflow, and the inexact errors that may arise.

Unfortunately I can't create a test for this as there is a bug where
the pattern is always passed DI as the smallest mode,
and later takes a sub-reg of it to SI. This would prevent an overflow
where one was expected.

This fixed PR/81800.

Regtested on aarch64-none-linux-gnu and no regressions.

Ok for trunk?

Thanks,
Tamar

gcc/
2017-09-13  Tamar Christina  

PR target/81800
* config/aarch64/aarch64.md (lrint2): Add 
flag_trapping_math
and flag_fp_int_builtin_inexact.

gcc/testsuite/
2017-09-13  Tamar Christina  

* gcc.target/aarch64/inline-lrint_2.c (dg-options): Add 
-fno-trapping-math.

--


RE: [PATCH, i386] Avoid 512-bit mode MOV for prefer-avx256 option in Intel AVX512 configuration

2017-10-16 Thread Shalnov, Sergey
Uros,
Is this patch (second one which fixed in the way as Jakub proposed) ok for the 
trunk?
Could you please merge it?
Sergey

-Original Message-
From: Shalnov, Sergey 
Sent: Friday, October 6, 2017 4:20 PM
To: Jakub Jelinek 
Cc: 'gcc-patches@gcc.gnu.org' ; 'ubiz...@gmail.com' 
; 'kirill.yuk...@gmail.com' ; 
Senkevich, Andrew ; Ivchenko, Alexander 
; Peryt, Sebastian 
Subject: RE: [PATCH, i386] Avoid 512-bit mode MOV for prefer-avx256 option in 
Intel AVX512 configuration

Jakub,
I completely agree with you. I fixed the patch.
Currently, TARGET_PREFER256 will work on architectures with 512VL. It will not 
work otherwise.

I will try to find better solution for this. I think I need to look into 
register classes to configure available registers for 512F and 512VL in case of 
TARGET_PREFER_AVX256.

I would propose to merge this patch as temporal solution.

Sergey


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Jakub Jelinek
Sent: Friday, October 6, 2017 11:58 AM
To: Shalnov, Sergey 
Cc: 'gcc-patches@gcc.gnu.org' ; 'ubiz...@gmail.com' 
; 'kirill.yuk...@gmail.com' ; 
Senkevich, Andrew ; Ivchenko, Alexander 
; Peryt, Sebastian 
Subject: Re: [PATCH, i386] Avoid 512-bit mode MOV for prefer-avx256 option in 
Intel AVX512 configuration

On Fri, Oct 06, 2017 at 09:33:21AM +, Shalnov, Sergey wrote:
> Hi,
> GCC uses full 512-bit register in case of moving SF/DF value between two 
> registers.
> The patch avoid 512-bit register usage if "-mprefer-avx256" option used.
> 
> 2017-10-06  Sergey Shalnov  
> 
> gcc/
>   * config/i386/i386.md(*movsf_internal, *movdf_internal):
>   Avoid 512-bit AVX modes for TARGET_PREFER_AVX256.
> 
From b96e657153b9aea24ff002e7e156ba12b2d443b5 Mon Sep 17 00:00:00 2001
From: Sergey Shalnov 
Date: Fri, 6 Oct 2017 10:45:40 +0300
Subject: [PATCH 1/1] Avoid 512-bit mode MOV for prefer-avx256 option

---
 gcc/config/i386/i386.md | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 
99497a9..a6d7cca 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3564,8 +3564,9 @@
 
   /* movaps is one byte shorter for non-AVX targets.  */
   (eq_attr "alternative" "13,17")
-(cond [(ior (match_operand 0 "ext_sse_reg_operand")
-(match_operand 1 "ext_sse_reg_operand"))
+(cond [(and (not (match_test "TARGET_PREFER_AVX256"))
+ (ior (match_operand 0 "ext_sse_reg_operand")
+  (match_operand 1 "ext_sse_reg_operand")))
  (const_string "V8DF")
(ior (not (match_test "TARGET_SSE2"))
 (match_test 
"TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))

How does that work with -mprefer-avx256 -mavx512f -mno-avx512vl?
The constraints in these alternatives use v, and [SD]Fmode are included in 
VALID_AVX512F_SCALAR_MODE and thus you can get e.g. %?mm23 among the operands.
EVEX encoded VMOVAPD or VMOVAPS is AVX512F cpuid only with 512-bit operands, 
with 128-bit/256-bit it is AVX512VL + AVX512F.

So, in both of the spots you've changed in this patch, but also in the spots 
you've changed earlier, you need to use
TARGET_PREFER_AVX256 && TARGET_AVX512VL rather than just TARGET_PREFER_AVX256, 
because without TARGET_AVX512VL it is not a matter of preferring it, but a 
must.  Unless we disable %xmm16+ registers for TARGET_PREFER_AVX256 && 
!TARGET_AVX512VL code, but that would be weird:
an optimization preference would e.g. break people using %xmm16+ register 
variables etc.

@@ -3739,8 +3740,9 @@
  better to maintain the whole registers in single format
  to avoid problems on using packed logical operations.  */
   (eq_attr "alternative" "6")
-(cond [(ior  (match_operand 0 "ext_sse_reg_operand")
- (match_operand 1 "ext_sse_reg_operand"))
+(cond [(and (not (match_test "TARGET_PREFER_AVX256"))
+ (ior (match_operand 0 "ext_sse_reg_operand")
+  (match_operand 1 "ext_sse_reg_operand")))
  (const_string "V16SF")
(ior (match_test "TARGET_SSE_PARTIAL_REG_DEPENDENCY")
 (match_test "TARGET_SSE_SPLIT_REGS"))
--
1.8.3.1




Jakub


Re: [PATCH 1/2] S/390: Handle long-running instructions

2017-10-16 Thread Andreas Krebbel
On 10/11/2017 01:53 PM, Robin Dapp wrote:
...
> @@ -14623,8 +14659,13 @@ s390_sched_variable_issue (FILE *file, int verbose, 
> rtx_insn *insn, int more)
>   case 1:
>   case 2:
>   case S390_SCHED_STATE_NORMAL:
> +   if (s390_sched_state == 0)
> + starts_group = true;
> if (s390_sched_state == S390_SCHED_STATE_NORMAL)
> - s390_sched_state = 1;
> + {
> +   starts_group = true;
> +   s390_sched_state = 1;
> + }

Should be the same as:

case 0:
  starts_group = true;
  /* fallthru */
case 1:
case 2:
  s390_sched_state++;
  break;
case S390_SCHED_STATE_NORMAL:
  starts_group = true;
  s390_sched_state = 1;
  break;
case S390_SCHED_STATE_CRACKED:
  s390_sched_state = S390_SCHED_STATE_NORMAL;
  break;
}

Ok with that change. Thanks!

-Andreas-



[PATCH, GCC/ARM] Allow +nodsp for -mcpu=cortex-m33

2017-10-16 Thread Thomas Preudhomme

Hi,

DSP instructions are optional for Arm Cortex-M33, yet its -mcpu option
does not allow +nodsp. Users are thus left with using
-march=armv8-m.main -mtune=cortex-m33. This patch allows +nodsp to
-mcpu=cortex-m33.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-10-11  Thomas Preud'homme  

* config/arm/arm-cpus.in (cortex-m33): Add nodsp option.
* doc/invoke.texi: Document +nodsp as a valid extension for
-mcpu=cortex-m33.

Tested by building an arm-none-eabi GCC cross-compiler and checking that
__ARM_FEATURE_DSP is *not* defined when invoked with
-mcpu=cortex-m33+nodsp.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 07de4c9375ba7a0df0d8bd00385e54a4042e5264..25fc429a8338e433b9fcd0ee385ff127423494c2 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1516,6 +1516,7 @@ begin cpu cortex-m33
  architecture armv8-m.main+dsp
  fpu fpv5-sp-d16
  option nofp remove ALL_FP
+ option nodsp remove armv7em
  costs v7m
 end cpu cortex-m33
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9ad1fb339babe2ce8f45ecac2fa93d7b9ae5fd30..722d5cc2c0a020906e6df3260822cdd268245082 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15803,6 +15803,9 @@ Permissible names for this option are the same as those for
 The following extension options are common to the listed CPUs:
 
 @table @samp
+@item +nodsp
+Disable the DSP instructions on @samp{cortex-m33}.
+
 @item  +nofp
 Disables the floating-point instructions on @samp{arm9e},
 @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},


Patch ping

2017-10-16 Thread Jakub Jelinek
Hi!

I'd like to ping a few patches:

http://gcc.gnu.org/ml/gcc-patches/2017-09/msg01939.html
  PR libstdc++/81706 - omp declare simd attribute vs. builtins
  C++ part of the patch

http://gcc.gnu.org/ml/gcc-patches/2017-10/msg00162.html
  PR target/82370 - prefer shorter VEX encoding of VP{AND,OR,XOR,ANDN} over
  EVEX when possible

http://gcc.gnu.org/ml/gcc-patches/2017-10/msg00205.html
  PR target/82370 - improve AVX512 vector shift patterns

http://gcc.gnu.org/ml/gcc-patches/2017-10/msg00206.html
  PR target/82370 - improve V?TImode shifts

http://gcc.gnu.org/ml/gcc-patches/2017-10/msg00207.html
  Remove useless isa attributes from various sse.md patterns

http://gcc.gnu.org/ml/gcc-patches/2017-10/msg00525.html
  PR target/82460 - improve AVX512* vperm[ti]2*

Thanks

Jakub


[PATCH][GCC][Testsuite][SPARC][ARM] Fix vect-multitypes-1.c test on SPARC64 and ARMEB.

2017-10-16 Thread Tamar Christina
Hi All,

This patch fixes a regression introduced by r253451.
The target needs all three conditions to be true before it can
vectorize unaligned accesses. This patch turns the erroneous ||
into an &&.

regtested on aarch64-none-elf, arm-none-linux-gnueabihf,
x86_64-pc-linux-gnu, armeb-none-linux-gnueabihf and
sparc64-unknown-linux-gnu.

OK for trunk?

And for the GCC-7 branch?

Thanks,
Tamar

gcc/testsuite/
2017-10-16  Tamar Christina  

* gcc.dg/vect/vect-multitypes-1.c: Correct target selector.

-- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
index fd7cacb483d9cfbea6d909ba12e67544fa32a190..49e304180e7b01d41a0922b5358d30ae3af88d24 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
@@ -83,5 +83,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" { xfail {{ vect_no_align && { ! vect_hw_misalign } } || {vect_sizes_32B_16B }}} } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { xfail {{ vect_no_align && { ! vect_hw_misalign } } || {vect_sizes_32B_16B }}} } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 4 "vect" { target { vect_no_align && { {! vect_hw_misalign } && vect_sizes_32B_16B } } }} } */
 



[PATCH, testsuite] Add dg-require-stack-size

2017-10-16 Thread Tom de Vries

Hi,

I noticed gcc.dg/tree-ssa/ldist-27.c failing for nvptx due to a too 
large stack size.


I started updating the testcase using "dg-add-options stack_size", but 
came across dg-require-support and realized I could make a 
dg-require-stack-size directive with an argument, and use that instead.


With the patch applied, the test still passes on x86_64, and by mocking 
up limited stack space like this:

...
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp

index 4f9bf46..cafea26 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -511,6 +511,7 @@ proc check_effective_target_trampolines { } {
 # Return 1 if target has limited stack size.

 proc check_effective_target_stack_size { } {
+return 1
 if [target_info exists gcc,stack_size] {
return 1
 }
@@ -522,6 +523,7 @@ proc check_effective_target_stack_size { } {
 proc dg-effective-target-value { effective_target } {
 if { "$effective_target" == "stack_size" } {
if [check_effective_target_stack_size] {
+   return 8192
return [target_info gcc,stack_size]
}
 }
...
it's listed as unsupported instead.

The info entry looks like:
...
'dg-require-stack-size SIZE'
 Skip the test if the target does not support a stack size of SIZE.
...

OK for trunk?

Thanks,
- Tom
Add dg-require-stack-size

2017-10-16  Tom de Vries  

	* gcc.dg/tree-ssa/ldist-27.c: Use dg-require-stack-size.
	* lib/target-supports-dg.exp (dg-require-stack-size): New proc.

	* doc/sourcebuild.texi (Test Directives, Variants of
	dg-require-support): Add dg-require-stack-size.

---
 gcc/doc/sourcebuild.texi |  3 +++
 gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c |  1 +
 gcc/testsuite/lib/target-supports-dg.exp | 15 +++
 3 files changed, 19 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index a2f0429..7d6d4a3 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2358,6 +2358,9 @@ Skip the test if the target does not support the @code{-fstack-check}
 option.  If @var{check} is @code{""}, support for @code{-fstack-check}
 is checked, for @code{-fstack-check=("@var{check}")} otherwise.
 
+@item dg-require-stack-size @var{size}
+Skip the test if the target does not support a stack size of @var{size}.
+
 @item dg-require-visibility @var{vis}
 Skip the test if the target does not support the @code{visibility} attribute.
 If @var{vis} is @code{""}, support for @code{visibility("hidden")} is
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c
index 3580c65..dd0e705 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-27.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O3 -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+/* { dg-require-stack-size "484000" } */
 
 #define M (300)
 #define N (200)
diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
index d50d8b0..999034c 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -180,6 +180,21 @@ proc dg-require-iconv { args } {
 }
 }
 
+# If this target does not have sufficient stack size, skip this test.
+
+proc dg-require-stack-size { args } {
+if { ![is-effective-target stack_size] } {
+	return
+}
+
+set stack_size [dg-effective-target-value stack_size]
+set required [lindex $args 1]
+if { $stack_size < $required } {
+	upvar dg-do-what dg-do-what
+set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+}
+
 # If this target does not support named sections skip this test.
 
 proc dg-require-named-sections { args } {


Re: [PATCH][GCC][testsuite][mid-end][ARM][AARCH64] Fix failing vec align tests.

2017-10-16 Thread Tamar Christina
Hi All,

I've submitted a patch to fix this 
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00971.html

Permission (just as the new patch) to backport these test changes to
GCC 7 to fix the regressions there?

Thanks,
Tamar.

From: Christophe Lyon 
Sent: Friday, October 6, 2017 5:07:44 PM
To: Tamar Christina
Cc: Rainer Orth; gcc-patches@gcc.gnu.org; nd; James Greenhalgh; Richard 
Earnshaw; Marcus Shawcroft
Subject: Re: [PATCH][GCC][testsuite][mid-end][ARM][AARCH64] Fix failing vec 
align tests.

On 6 October 2017 at 09:45, Tamar Christina  wrote:
>
>
>> -Original Message-
>> From: Rainer Orth [mailto:r...@cebitec.uni-bielefeld.de]
>> Sent: 05 October 2017 20:16
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; James Greenhalgh; Richard Earnshaw;
>> Marcus Shawcroft
>> Subject: Re: [PATCH][GCC][testsuite][mid-end][ARM][AARCH64] Fix failing
>> vec align tests.
>>
>> Hi Tamar,
>>
>> > Previously I had corrected the vect_hw_misalign check which prompted
>> > these three test to start failing because the condition needs to be
>> > inverted in the testcases.
>> >
>> > Regtested on aarch64-none-elf, arm-none-linux-gnueabihf and x86_64-pc-
>> linux-gnu.
>> >
>> > Ok for trunk?
>> >
>> > Thanks,
>> > Tamar.
>> >
>> > gcc/testsuite/
>> > 2017-10-02  Tamar Christina  
>> >
>> > * gcc.dg/vect/vect-align-1.c: Fix vect_hw_misalign condition.
>> > * gcc.dg/vect/vect-align-2.c: Likewise.
>> > * gcc.dg/vect/vect-multitypes-1.c: Likewise.
>>
>> unfortunately, your patch caused gcc.dg/vect/vect-multitypes-1.c to FAIL on
>> sparc-sun-solaris2.11 (32 and 64-bit):
>>
>> FAIL: gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects  
>> scan-tree-dump-
>> times vect "Vectorizing an unaligned access" 4
>> FAIL: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect
>> "Vectorizing an unaligned access" 4
>
> Thanks! I'll take a look.
>

If that's easier for you, I've noticed the same thing on
armeb-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu neon-fp16

Christophe


> Tamar
>
>>
>> It had XFAILed before.
>>
>>   Rainer
>>
>> --
>> -
>> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH PR/82546] tree node size

2017-10-16 Thread Nathan Sidwell

On 10/16/2017 02:49 AM, Richard Biener wrote:

On October 13, 2017 8:29:40 PM GMT+02:00, Nathan Sidwell  wrote:



I intend to continue cleaning this up of course.  It's not clear to me
whether we should cache these node sizes in an array, and the way it
goes about checking nodes with nested switches is understandable, but
possible not the fastest solution. However let's at least get the
sizing
right first.


We were conservative exactly to avoid the langhook here. I think there's 
similar 'bug' on the decl side.


The other code types (decls, exprs, etc) call the langhook.  tcc_type 
seems the exception (now?).


nathan

--
Nathan Sidwell


Re: [PATCH] Implement unique_xmalloc_ptr and add more selftests

2017-10-16 Thread Pedro Alves
On 10/14/2017 12:35 AM, David Malcolm wrote:

> As far as I can tell from your mail, the one issue that blocks that
> is the lack of gdb::unique_xmalloc_ptr.
> 
> So here's an patch on top of the previous one which adds the
> xmalloc_deleter (taken from gdb, but changing "xfree" to
> "free", and adding support for pre-C++11 dialects), along with
> selftests for unique_ptr, unique_xmalloc_ptr and
> unique_xmalloc_ptr.

Thanks much!

> As before, successfully bootstrapped & regrtested on x86_64-pc-linux-gnu,
> using gcc 4.8 for the initial bootstrap (hence testing both gnu++03
> then gnu++14 in the selftests, for stage 1 and stages 2 and 3
> respectively).
> 

Excellent.

> Hand-tested with "make selftest-valgrind" with both gnu++03 and
> gnu++14.
> 
> Also tested stage1 on powerpc-ibm-aix7.1.3.0 ("gcc111" in the
> compile farm; gcc 4.8 i.e. gnu++03)
> 
> Is this OK?

Looks great to me.

> +/* Verify that gnu::unique_malloc_ptr works.  */

Typo: malloc -> xmalloc.  Appears in other comments too.

Thanks,
Pedro Alves



Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-16 Thread Wilco Dijkstra
Qing Zhao wrote:

> Is  my patch Okay?

Given it's a mid-end patch this shouldn't be marked as AArch64 specific.
Similarly the PR needs to be updated to say middle-end. So resending
it making it clear it's not a target bug should help getting a review.

Wilco
 

Re: [AArch64] Patches for review

2017-10-16 Thread Wilco Dijkstra
Hi,


Here is the list of my AArch64 patches for review:

* https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01216.html (Fix symbol offset 
limit)
* https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00396.html (PR60580: Fix frame 
pointer option magic)
* https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00359.html (Improve 
aarch64_legitimate_constant_p)
* https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01228.html (Improve addressing 
of TI/TFmode)
* https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00377.html (Introduce 
emit_frame_chain)
* https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00381.html (Remove 
aarch64_frame_pointer_required)
* https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00515.html (Simplify 
aarch64_can_eliminate)
* https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01569.html (Simplify frame 
layout for stack probing)

Wilco


[PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-16 Thread Wilco Dijkstra
This patch cleans up autopref scheduling.

The code is greatly simplified.  Sort accesses on the offset first, and
only if the offsets are the same fall back to other comparisons in
rank_for_schedule.  This doesn't at all restore the original behaviour
since we no longer compare the base address, but it now defines a total
sorting order.  More work will be required to improve the sorting so
that only loads/stores with the same base are affected.

AArch64 bootstrap completes.

OK for commit?

ChangeLog:
2017-10-03  Wilco Dijkstra  

    PR rtl-optimization/82396
    * gcc/haifa-sched.c (autopref_multipass_init): Simplify
    initialization.
    (autopref_rank_data): Simplify sort order.
    * gcc/sched-int.h (autopref_multipass_data_): Remove
    multi_mem_insn_p, min_offset and max_offset.
--

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 
549e8961411ecd0a04ac3b24ba78b5d53e63258a..77ba8c5292a0801bbbcb35ed61ab3040015f6677
 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5568,9 +5568,7 @@ autopref_multipass_init (const rtx_insn *insn, int write)
 
   gcc_assert (data->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED);
   data->base = NULL_RTX;
-  data->min_offset = 0;
-  data->max_offset = 0;
-  data->multi_mem_insn_p = false;
+  data->offset = 0;
   /* Set insn entry initialized, but not relevant for auto-prefetcher.  */
   data->status = AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
 
@@ -5585,10 +5583,9 @@ autopref_multipass_init (const rtx_insn *insn, int write)
 {
   int n_elems = XVECLEN (pat, 0);
 
-  int i = 0;
-  rtx prev_base = NULL_RTX;
-  int min_offset = 0;
-  int max_offset = 0;
+  int i, offset;
+  rtx base, prev_base = NULL_RTX;
+  int min_offset = INT_MAX;
 
   for (i = 0; i < n_elems; i++)
 {
@@ -5596,38 +5593,23 @@ autopref_multipass_init (const rtx_insn *insn, int 
write)
   if (GET_CODE (set) != SET)
 return;
 
- rtx base = NULL_RTX;
- int offset = 0;
   if (!analyze_set_insn_for_autopref (set, write, &base, &offset))
 return;
 
- if (i == 0)
-   {
- prev_base = base;
- min_offset = offset;
- max_offset = offset;
-   }
   /* Ensure that all memory operations in the PARALLEL use the same
  base register.  */
- else if (REGNO (base) != REGNO (prev_base))
+ if (i > 0 && REGNO (base) != REGNO (prev_base))
 return;
- else
-   {
- min_offset = MIN (min_offset, offset);
- max_offset = MAX (max_offset, offset);
-   }
+ prev_base = base;
+ min_offset = MIN (min_offset, offset);
 }
 
-  /* If we reached here then we have a valid PARALLEL of multiple memory
-    ops with prev_base as the base and min_offset and max_offset
-    containing the offsets range.  */
+  /* If we reached here then we have a valid PARALLEL of multiple memory 
ops
+    with prev_base as the base and min_offset containing the offset.  */
   gcc_assert (prev_base);
   data->base = prev_base;
-  data->min_offset = min_offset;
-  data->max_offset = max_offset;
-  data->multi_mem_insn_p = true;
+  data->offset = min_offset;
   data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
-
   return;
 }
 
@@ -5637,7 +5619,7 @@ autopref_multipass_init (const rtx_insn *insn, int write)
 return;
 
   if (!analyze_set_insn_for_autopref (set, write, &data->base,
-  &data->min_offset))
+  &data->offset))
 return;
 
   /* This insn is relevant for the auto-prefetcher.
@@ -5646,63 +5628,6 @@ autopref_multipass_init (const rtx_insn *insn, int write)
   data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
 }
 
-
-/* Helper for autopref_rank_for_schedule.  Given the data of two
-   insns relevant to the auto-prefetcher modelling code DATA1 and DATA2
-   return their comparison result.  Return 0 if there is no sensible
-   ranking order for the two insns.  */
-
-static int
-autopref_rank_data (autopref_multipass_data_t data1,
-    autopref_multipass_data_t data2)
-{
-  /* Simple case when both insns are simple single memory ops.  */
-  if (!data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
-    return data1->min_offset - data2->min_offset;
-
-  /* Two load/store multiple insns.  Return 0 if the offset ranges
- overlap and the difference between the minimum offsets otherwise.  */
-  else if (data1->multi_mem_insn_p && data2->multi_mem_insn_p)
-    {
-  int min1 = data1->min_offset;
-  int max1 = data1->max_offset;
-  int min2 = data2->min_offset;
-  int max2 = data2->max_offset;
-
-  if (max1 < min2 || min1 > max2)
-   return min1 - min2;
-  else
-   return 0;
-    }
-
-  /* The other two cases is a pair of a load/store multiple and
- a simple memory op.  Ret

Re: [PATCH 2/2] S/390: Do not end groups after fallthru edge

2017-10-16 Thread Andreas Krebbel
On 10/11/2017 01:53 PM, Robin Dapp wrote:
> This patch fixes cases where we start a new group although the previous one 
> has
> not ended.
> 
> Regression tested on s390x.
> 
> gcc/ChangeLog:
> 
> 2017-10-11  Robin Dapp  
> 
> * config/s390/s390.c (s390_has_ok_fallthru): New function.
> (s390_sched_score): Temporarily change s390_sched_state.
> (s390_sched_variable_issue): Use s390_last_sched_state in case the
> group is not finished yet.
> (s390_sched_init): Init s390_last_sched_state.
> 
> 
> ---
>  gcc/config/s390/s390.c | 44 +---
>  1 file changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 2411c67..7f08ba2 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -14346,6 +14346,31 @@ s390_z10_prevent_earlyload_conflicts (rtx_insn 
> **ready, int *nready_p)
>ready[0] = tmp;
>  }
> 
> +static bool
> +s390_has_ok_fallthru (rtx_insn *insn)

Please add a comment describing what the function does.
Wouldn't bb be a better parameter?
Perhaps we also could find a better name for it?! Something shorter for
s390_bb_likely_entered_via_fallthru

> +{
> +  edge e, fallthru_edge;
> +  edge_iterator ei;
> +  bool other_likely_edge = false;
> +
> +  fallthru_edge =
> +find_fallthru_edge (BLOCK_FOR_INSN (insn)->preds);
> +  FOR_EACH_EDGE (e, ei, BLOCK_FOR_INSN (insn)->preds)
> +{
> +  if (e != fallthru_edge
> +   && e->probability >= profile_probability::unlikely ())
> + {
> +   other_likely_edge = true;
> +   break;
> + }
> +}
> +
> +  if (fallthru_edge && !other_likely_edge)
> +return true;
> +
> +  return false;
> +}

edge e, fallthru_edge;
edge_iterator ei;

fallthru_edge =
   find_fallthru_edge (BLOCK_FOR_INSN (insn)->preds);

if (!fallthru_edge)
  return false;

FOR_EACH_EDGE (e, ei, BLOCK_FOR_INSN (insn)->preds)
  if (e != fallthru_edge
  && e->probability >= profile_probability::unlikely ())
return false;

return true;


>  /* The s390_sched_state variable tracks the state of the current or
> the last instruction group.
> @@ -14355,6 +14380,7 @@ s390_z10_prevent_earlyload_conflicts (rtx_insn 
> **ready, int *nready_p)
> 4 the last group was a cracked/expanded insn */
> 
>  static int s390_sched_state;
> +static int s390_last_sched_state;
> 
>  #define S390_SCHED_STATE_NORMAL  3
>  #define S390_SCHED_STATE_CRACKED 4
> @@ -14430,7 +14456,14 @@ s390_sched_score (rtx_insn *insn)
>unsigned int mask = s390_get_sched_attrmask (insn);
>int score = 0;
> 
> -  switch (s390_sched_state)
> +  int temp_sched_state = s390_sched_state;
> +
> +  if (s390_sched_state < s390_last_sched_state
> +  && s390_last_sched_state < S390_SCHED_STATE_NORMAL
> +  && s390_has_ok_fallthru (insn))
> +temp_sched_state = s390_last_sched_state;

Can't we just set s390_sched_state to s390_last_sched_state in s390_sched_init.

> +
> +  switch (temp_sched_state)
>  {
>  case 0:
>/* Try to put insns into the first slot which would otherwise
> @@ -14656,11 +14689,15 @@ s390_sched_variable_issue (FILE *file, int verbose, 
> rtx_insn *insn, int more)
> switch (s390_sched_state)
>   {
>   case 0:
> +   if (s390_last_sched_state > 0 && s390_has_ok_fallthru (insn))
> + s390_sched_state = s390_last_sched_state;

Could be removed after setting s390_sched_state in s390_sched_init.

> +
> +   if (s390_sched_state == 0)
> + starts_group = true;
> +   /* fallthrough */
>   case 1:
>   case 2:
>   case S390_SCHED_STATE_NORMAL:
> -   if (s390_sched_state == 0)
> - starts_group = true;
> if (s390_sched_state == S390_SCHED_STATE_NORMAL)
>   {
> starts_group = true;
> @@ -14768,6 +14805,7 @@ s390_sched_init (FILE *file ATTRIBUTE_UNUSED,
>  {
>last_scheduled_insn = NULL;
>memset (last_scheduled_unit_distance, 0, MAX_SCHED_UNITS * sizeof (int));
> +  s390_last_sched_state = s390_sched_state;
>s390_sched_state = 0;

At the very first execution of s390_sched_init s390_sched_state would be used 
without being
initialized first. Well it is a static int and hence set to zero. But perhaps 
it makes sense to make
this explicit.

Preserving the sched state across basic blocks for your case works only if the 
BBs are traversed
with the fall through edges coming first. Is that the case? We probably should 
have a description
for s390_last_sched_state stating this.

-Andreas-



[patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-16 Thread Sam van Kampen via gcc-patches
This patch adds a warning flag for the warning described in bug report
#61414. My proposed warning flag is -Wenum-bitfield-conversion, which
corresponds with the warning flag that clang has for a similar warning.

2017-10-16  Sam van Kampen  

* c-family/c.opt: Add a warning flag for struct bit-fields
being too small to hold enumerated types.
* cp/class.c: Likewise.
* doc/invoke.texi: Add documentation for said warning flag.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 253769)
+++ gcc/c-family/c.opt  (working copy)
@@ -500,6 +500,10 @@ Wenum-compare
 C ObjC C++ ObjC++ Var(warn_enum_compare) Init(-1) Warning LangEnabledBy(C 
ObjC,Wall || Wc++-compat)
 Warn about comparison of different enum types.
 
+Wenum-bitfield-conversion
+C++ Var(warn_enum_bitfield_conversion) Init(1) Warning
+Warn about struct bit-fields being too small to hold enumerated types.
+
 Werror
 C ObjC C++ ObjC++
 ; Documented in common.opt
Index: gcc/cp/class.c
===
--- gcc/cp/class.c  (revision 253769)
+++ gcc/cp/class.c  (working copy)
@@ -3278,10 +3278,11 @@ check_bitfield_decl (tree field)
   && tree_int_cst_lt (TYPE_SIZE (type), w)))
warning_at (DECL_SOURCE_LOCATION (field), 0,
"width of %qD exceeds its type", field);
-  else if (TREE_CODE (type) == ENUMERAL_TYPE
+  else if (warn_enum_bitfield_conversion
+  && TREE_CODE (type) == ENUMERAL_TYPE
   && (0 > (compare_tree_int
(w, TYPE_PRECISION (ENUM_UNDERLYING_TYPE (type))
-   warning_at (DECL_SOURCE_LOCATION (field), 0,
+   warning_at (DECL_SOURCE_LOCATION (field), OPT_Wenum_bitfield_conversion,
"%qD is too small to hold all values of %q#T",
field, type);
 }
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 253769)
+++ gcc/doc/invoke.texi (working copy)
@@ -277,8 +277,8 @@ Objective-C and Objective-C++ Dialects}.
 -Wno-discarded-qualifiers  -Wno-discarded-array-qualifiers @gol
 -Wno-div-by-zero  -Wdouble-promotion @gol
 -Wduplicated-branches  -Wduplicated-cond @gol
--Wempty-body  -Wenum-compare  -Wno-endif-labels  -Wexpansion-to-defined @gol
--Werror  -Werror=*  -Wextra-semi  -Wfatal-errors @gol
+-Wempty-body  -Wenum-bitfield-conversion  -Wenum-compare  -Wno-endif-labels 
@gol
+-Wexpansion-to-defined  -Werror  -Werror=*  -Wextra-semi  -Wfatal-errors @gol
 -Wfloat-equal  -Wformat  -Wformat=2 @gol
 -Wno-format-contains-nul  -Wno-format-extra-args  @gol
 -Wformat-nonliteral -Wformat-overflow=@var{n} @gol
@@ -6126,6 +6126,12 @@ Warn when an expression is casted to its own type.
 Warn if an empty body occurs in an @code{if}, @code{else} or @code{do
 while} statement.  This warning is also enabled by @option{-Wextra}.
 
+@item -Wenum-bitfield-conversion
+@opindex Wenum-bitfield-conversion
+@opindex Wno-enum-bitfield-conversion
+Warn about a bit-field potentially being too small to hold all values
+of an enumerated type. This warning is enabled by default.
+
 @item -Wenum-compare
 @opindex Wenum-compare
 @opindex Wno-enum-compare


Missing REDUCE[SD,SS] intrinsics

2017-10-16 Thread Peryt, Sebastian
Hi,

This patch written by Olga Makhotina adds missing intrinsics for REDUCE[SD,SS].

16.10.2017 Olga Makhotina 

gcc/
* config/i386/avx512dqintrin.h (_mm_mask_reduce_sd,
_mm_maskz_reduce_sd, _mm_mask_reduce_ss, 
_mm_maskz_reduce_ss): New intrinsics.
* config/i386/i386-builtin.def (__builtin_ia32_reducesd_mask,
__builtin_ia32_reducess_mask): New builtin.
(__builtin_ia32_reducesd, __builtin_ia32_reducess): Remove.
* config/i386/sse.md (reduces): Renamed to ...
(reduces): ... this.
(vreduce\t{%3, %2, %1, %0|%0, %1, %2, %3}): 
Changed to ...
(vreduce\t{%3, %2, %1, %0|
%0, %1, %2, %3}): ... this.

gcc/testsuite/
* gcc.target/i386/avx512dq-vreducesd-1.c (_mm_mask_reduce_sd,
_mm_maskz_reduce_sd): Test new intrinsics.
* gcc.target/i386/avx512dq-vreducesd-2.c: New.
* gcc.target/i386/avx512dq-vreducess-1.c (_mm_mask_reduce_ss,
_mm_maskz_reduce_ss): Test new intrinsics.
* gcc.target/i386/avx512dq-vreducess-2.c: New.
* gcc.target/i386/avx-1.c (__builtin_ia32_reducesd,
__builtin_ia32_reducess): Remove builtin.
(__builtin_ia32_reducesd_mask,
__builtin_ia32_reducess_mask): Test new builtin.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

Is it ok for trunk?
 
Thanks,
Sebastian



0001-reduce_ss-reduce_sd.patch
Description: 0001-reduce_ss-reduce_sd.patch


Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-10-16 Thread Wilco Dijkstra

ping
    
Kyrill Tkachov wrote:
> On 14/12/16 16:37, Wilco Dijkstra wrote:
>
> > Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
> > unnecessary duplication and repeating bugs like PR78439 due to changes being
> > applied only to one of the duplicates.
>
> When this was brought up Ramana requested some more investigations on the 
> codegen impact [1].
> Have you done the archaeology on why we had two patterns in the first place 
> and what the codegen
> effect is of remove the Cortex-A8-specific one?

Yes, the reason to split the pattern was to introduce the '!' to discourage 
Neon->int moves on Cortex-A8 (https://patches.linaro.org/patch/541/). I am not 
removing the optimization for Cortex-A8, however I   haven't been able to find 
an example where it makes a difference, even on high register pressure code.

Wilco


> Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?
>
> ChangeLog:
> 2016-11-29  Wilco Dijkstra  
>
>  * config/arm/vfp.md (movdi_vfp): Merge changes from 
>movdi_vfp_cortexa8.
>  * (movdi_vfp_cortexa8): Remove pattern.
> --
>
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 
> 2051f1018f1cbff9c5bf044e71304d78e615458e..a917aa625a7b15f6c9e2b549ab22e5219bb9b99c
>  100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -304,9 +304,9 @@
>   ;; DImode moves
>   
>   (define_insn "*movdi_vfp"
> -  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,q,q,m,w,r,w,w, Uv")
> +  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,q,q,m,w,!r,w,w, Uv")
>  (match_operand:DI 1 "di_operand"  
>"r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
> -  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune != TARGET_CPU_cortexa8
> +  "TARGET_32BIT && TARGET_HARD_FLOAT
>  && (   register_operand (operands[0], DImode)
>  || register_operand (operands[1], DImode))
>  && !(TARGET_NEON && CONST_INT_P (operands[1])
> @@ -339,71 +339,25 @@
>   }
> "
> [(set_attr "type" 
>"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
> -   (set (attr "length") (cond [(eq_attr "alternative" "1,4,5,6") (const_int 
> 8)
> +   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
> (eq_attr "alternative" "2") (const_int 12)
> (eq_attr "alternative" "3") (const_int 16)
> + (eq_attr "alternative" "4,5,6")
> +  (symbol_ref 
> "arm_count_output_move_double_insns (operands) * 4")
> (eq_attr "alternative" "9")
>  (if_then_else
>    (match_test "TARGET_VFP_SINGLE")
>    (const_int 8)
>    (const_int 4))]
> (const_int 4)))
> +   (set_attr "predicable"    "yes")
>  (set_attr "arm_pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
>  (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
>  (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
> +   (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4"))
>  (set_attr "arch"   
>"t2,any,any,any,a,t2,any,any,any,any,any,any")]
>   )
>   
> -(define_insn "*movdi_vfp_cortexa8"
> -  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
> "=r,r,r,r,r,r,m,w,!r,w,w, Uv")
> -   (match_operand:DI 1 "di_operand"  
> "r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
> -  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
> -    && (   register_operand (operands[0], DImode)
> -    || register_operand (operands[1], DImode))
> -    && !(TARGET_NEON && CONST_INT_P (operands[1])
> -    && neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
> -  "*
> -  switch (which_alternative)
> -    {
> -    case 0:
> -    case 1:
> -    case 2:
> -    case 3:
> -  return \"#\";
> -    case 4:
> -    case 5:
> -    case 6:
> -  return output_move_double (operands, true, NULL);
> -    case 7:
> -  return \"vmov%?\\t%P0, %Q1, %R1\\t%@ int\";
> -    case 8:
> -  return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
> -    case 9:
> -  return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
> -    case 10: case 11:
> -  return output_move_vfp (operands);
> -    default:
> -  gcc_unreachable ();
> -    }
> -  "
> -  [(set_attr "type" 
> "multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
> -   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
> -   (eq_attr "alternative" "2") (const_int 12)
> -   (eq_attr "alternative" "3") (const_int 16)
> -   (eq_attr "alternative" "4,5,6")
> -  (symbol_ref
> -   "arm_count_out

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-10-16 Thread Wilco Dijkstra
    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,&r")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,&r")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,&r")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (de

Re: [PATCH] Fix PR82396: qsort comparator non-negative on sorted output

2017-10-16 Thread Maxim Kuvyrkov
> On Oct 16, 2017, at 1:56 PM, Wilco Dijkstra  wrote:
> 
> This patch cleans up autopref scheduling.
> 
> The code is greatly simplified.  Sort accesses on the offset first, and
> only if the offsets are the same fall back to other comparisons in
> rank_for_schedule.  This doesn't at all restore the original behaviour
> since we no longer compare the base address, but it now defines a total
> sorting order.  More work will be required to improve the sorting so
> that only loads/stores with the same base are affected.
> 
> AArch64 bootstrap completes.
> 
> OK for commit?

Looks good to me.  I'm not an official scheduler maintainer, so wait for an ack 
from a maintainer.

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-16 Thread Martin Liška
On 10/13/2017 03:13 PM, Jakub Jelinek wrote:
> On Fri, Oct 13, 2017 at 02:53:50PM +0200, Martin Liška wrote:
>> @@ -3826,6 +3827,19 @@ pointer_diff (location_t loc, tree op0, tree op1)
>>  pedwarn (loc, OPT_Wpointer_arith,
>>   "pointer to a function used in subtraction");
>>  
>> +  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
>> +{
>> +  gcc_assert (current_function_decl != NULL_TREE);
>> +
>> +  op0 = save_expr (op0);
>> +  op1 = save_expr (op1);
>> +
>> +  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
>> +  *instrument_expr
>> += build_call_expr_loc (loc, tt, 2, c_fully_fold (op0, false, NULL),
>> +   c_fully_fold (op1, false, NULL));
>> +}
> 

Hello Jakub.

> Why the c_fully_fold?  Can't that be deferred until it actually is
> folded all together later?

Yes, now it's not needed as I use save_expr. I hit some ICE before I switched
to use save_expr. That's why I put it there.

> 
>> +  ret = pointer_diff (location, op0, op1, &instrument_expr);
>>goto return_build_binary_op;
>>  }
>>/* Handle pointer minus int.  Just like pointer plus int.  */
>> @@ -11707,6 +11721,18 @@ build_binary_op (location_t location, enum 
>> tree_code code,
>>pedwarn (location, 0,
>> "comparison of distinct pointer types lacks a cast");
>>  }
>> +
>> +  if (sanitize_flags_p (SANITIZE_POINTER_COMPARE))
>> +{
>> +  gcc_assert (current_function_decl != NULL_TREE);
>> +
>> +  op0 = save_expr (op0);
>> +  op1 = save_expr (op1);
>> +
>> +  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_COMPARE);
>> +  instrument_expr
>> += build_call_expr_loc (location, tt, 2, op0, op1);
>> +}
> 
> Is this the right spot for this?  I mean then you don't handle
> ptr >= 0 or ptr > 0 and similar or ptr >= 0x12345678.
> I know we have warnings or pedwarns for those, still I think it would be
> better to handle the above e.g. before
>   if ((TREE_CODE (TREE_TYPE (orig_op0)) == BOOLEAN_TYPE
>|| truth_value_p (TREE_CODE (orig_op0)))
>   ^ (TREE_CODE (TREE_TYPE (orig_op1)) == BOOLEAN_TYPE
>  || truth_value_p (TREE_CODE (orig_op1
> maybe_warn_bool_compare (location, code, orig_op0, orig_op1);
> by testing if ((code0 == POINTER_TYPE || code1 == POINTER_TYPE)
>  && sanitize_flags_p (SANITIZE_POINTER_COMPARE))

Agree.

> 
> What about the C++ FE?  Or is pointer comparison well defined there?
> What about pointer subtraction? My memory is fuzzy.

No, you're right, I basically forgot:

```
>From § 5.9 of the C++11 standard:

If two pointers p and q of the same type point to different objects that
are not members of the same object or elements of the same array or to
different functions, or if only one of them is null, the results
of pq, p<=q, and  p>=q are unspecified.

```

I also moved the tests to c-c++-common/asan/ subfolder.

> 
>> +// { dg-options "-fsanitize=pointer-compare -O0" }
> 
> Please use -fsanitize=address,pointer-compare
> etc. in the testcases, so that it is an example to users who don't know
> we have implicit -fsanitize=address for these tests.
>> +  if (offset <= 2048) {
>> +if (__asan_region_is_poisoned (left, offset) == 0)
> 
> I think the LLVM coding conventions don't want a space before ( above.
> 
>> +// check whether left is a stack memory pointer
>> +if (GetStackVariableBeginning(left, &shadow_offset1)) {
>> +  if (GetStackVariableBeginning(right - 1, &shadow_offset2)
>> +  && shadow_offset1 == shadow_offset2)
> 
> Nor && at the beginning of the line (they want it at the end of previous
> one).
> 
>> +return;
>> +  else
>> +goto do_error;
>> +}
> 
> If you have goto do_error; for all cases, then you don't need to indent
> further stuff into else ... further and further.
> 
>> +// check whether left is a heap memory address
>> +else {
>> +  HeapAddressDescription hdesc1, hdesc2;
>> +  if (GetHeapAddressInformation(left, 0, &hdesc1)
>> +  && hdesc1.chunk_access.access_type == kAccessTypeInside) {
>> +if (GetHeapAddressInformation(right, 0, &hdesc2)
>> +&& hdesc2.chunk_access.access_type == kAccessTypeInside
>> +&& (hdesc1.chunk_access.chunk_begin
>> +  == hdesc2.chunk_access.chunk_begin))
>> +  return;
> 
> So, here one is a heap object and the other is not.  That should be an
> do_error, right?

Yes, coding style should be fixed.

> 
>> +  } else {
>> +// check whether left is an address of a global variable
>> +GlobalAddressDescription gdesc1, gdesc2;
>> +if (GetGlobalAddressInformation(left, 0, &gdesc1)) {
>> +  if (GetGlobalAddressInformation(right - 1, 0, &gdesc2)
>> +  && gdesc1.PointsInsideASameVariable (gdesc2))
>> +return;
>> +} else {
>> +  // TODO
> 
> Either goto do_error; here too, or do the if (offset <

Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-16 Thread Martin Liška
On 10/13/2017 04:59 PM, Jeff Law wrote:
> On 10/13/2017 07:02 AM, Martin Liška wrote:
>> On 10/12/2017 11:54 PM, Jeff Law wrote:
>>> On 10/11/2017 12:13 AM, Martin Liška wrote:
 2017-10-10  Martin Liska  

PR tree-optimization/82493
* sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
(test_range_functions): New function.
(sbitmap_c_tests): Likewise.
* selftest-run-tests.c (selftest::run_tests): Run new tests.
* selftest.h (sbitmap_c_tests): New function.
>>> I went ahead and committed this along with a patch to fix the off-by-one
>>> error in live_bytes_read.  Bootstrapped and regression tested on x86.
>>>
>>> Actual patch attached for archival purposes.
>>>
>>> Jeff
>>>
>>
>> Hello.
>>
>> I wrote a patch that adds various gcc_checking_asserts and I hit following:
>>
>> ./xgcc -B. 
>> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90 -c 
>> -O2
>> during GIMPLE pass: dse
>> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90:7:0:
>>
>>   program testat
>>  
>> internal compiler error: in bitmap_check_index, at sbitmap.h:105
>> 0x1c014c1 bitmap_check_index
>>  ../../gcc/sbitmap.h:105
>> 0x1c01fa7 bitmap_bit_in_range_p(simple_bitmap_def const*, unsigned int, 
>> unsigned int)
>>  ../../gcc/sbitmap.c:335
>> 0x1179002 live_bytes_read
>>  ../../gcc/tree-ssa-dse.c:497
>> 0x117935a dse_classify_store
>>  ../../gcc/tree-ssa-dse.c:595
>> 0x1179947 dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)
>>  ../../gcc/tree-ssa-dse.c:786
>> 0x1179b6e dse_dom_walker::before_dom_children(basic_block_def*)
>>  ../../gcc/tree-ssa-dse.c:853
>> 0x1a6f659 dom_walker::walk(basic_block_def*)
>>  ../../gcc/domwalk.c:308
>> 0x1179cb9 execute
>>  ../../gcc/tree-ssa-dse.c:907
>>
>> Where we call:
>> Breakpoint 1, bitmap_bit_in_range_p (bmap=0x29d6cd0, start=0, end=515) at 
>> ../../gcc/sbitmap.c:335
>> 335bitmap_check_index (bmap, end);
>> (gdb) p *bmap
>> $1 = {n_bits = 256, size = 4, elms = {255}}
>>
>> Is it a valid call or should caller check indices?
> It doesn't look valid to me.  I'll dig into it.
> 
> In general the sbitmap interface requires callers to DTRT -- failure can
> easily lead to an out of bounds read or write.  It's one of the things I
> really dislike about the sbitmap implementation.
> 
> So it's safe to assume that I'm fully supportive of adding more testing
> to catch this kind thing.
> 
> Jeff
> 

Good.

Should I prepare fix for the ICE I mentioned or have you been working on that?

Thanks,
Martin


Re: [PATCH] scheduler bug fix for AArch64 insn fusing SCHED_GROUP usage

2017-10-16 Thread Maxim Kuvyrkov
> On Oct 11, 2017, at 6:52 AM, Jim Wilson  wrote:
> 
> On Sun, 2017-09-10 at 19:45 +0200, Jim Wilson wrote:
>> -- Forwarded message --
>> From: Jim Wilson 
>> Date: Tue, Sep 5, 2017 at 8:04 PM
>> Subject: Re: [PATCH] scheduler bug fix for AArch64 insn fusing
>> SCHED_GROUP usage
>> To: "gcc-patches@gcc.gnu.org" 
>> Cc: Jim Wilson 
>> 
>> 
>> ping^2
>> 
>> On Fri, Jul 21, 2017 at 3:09 PM, Jim Wilson 
>> wrote:
>>> 
>>> Ping.
>>> 
>>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00779.html
>>> 
> 
> Silly me.  I noticed that I am still listed as a sched maintainer,
> which means I can self approve this.
> 
> I reverified it with x86_64 and aarch64 bootstraps and make checks.  I
> also did a C only testsuite run for avr, which appears to be the best
> maintained cc0 target.  There were no regressions for any of the 3
> targets.
> 
> The patch was applied.

FWIW, the patch looks good.

--
Maxim Kuvyrkov
www.linaro.org





Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-16 Thread Jakub Jelinek
On Mon, Oct 16, 2017 at 01:57:59PM +0200, Martin Liška wrote:
> Agree. Do you feel that it's right moment to trigger review process of 
> libsanitizer
> changes?

Yes.  We don't have that much time left to get it through...

> --- a/libsanitizer/asan/asan_report.cc
> +++ b/libsanitizer/asan/asan_report.cc
> +  {
> +uptr shadow_offset1, shadow_offset2;
> +ThreadRegistryLock l(&asanThreadRegistry());
> +
> +// check whether left is a stack memory pointer
> +if (GetStackVariableBeginning(left, &shadow_offset1)) {
> +  if (GetStackVariableBeginning(right - 1, &shadow_offset2) &&
> +   shadow_offset1 == shadow_offset2)
> + return;
> +  else
> + goto do_error;
> +}

Do you need the ThreadRegistryLock for the following calls?
If not, the } should be closed and it should be reindented.

> +// check whether left is a heap memory address
> +HeapAddressDescription hdesc1, hdesc2;
> +if (GetHeapAddressInformation(left, 0, &hdesc1) &&
> + hdesc1.chunk_access.access_type == kAccessTypeInside) {
> +  if (GetHeapAddressInformation(right, 0, &hdesc2) &&
> +   hdesc2.chunk_access.access_type == kAccessTypeInside &&
> +   (hdesc1.chunk_access.chunk_begin
> +== hdesc2.chunk_access.chunk_begin))
> + return;
> +  else
> + goto do_error;
> +}
> +// check whether left is an address of a global variable
> +GlobalAddressDescription gdesc1, gdesc2;
> +if (GetGlobalAddressInformation(left, 0, &gdesc1)) {
> +  if (GetGlobalAddressInformation(right - 1, 0, &gdesc2) &&
> +   gdesc1.PointsInsideASameVariable (gdesc2))
> + return;
> +} else
> +  goto do_error;

??  If we don't know anything about the left object, doing a goto do_error;
sounds dangerous, it could be say mmapped region or whatever else.

Though, we could at that spot at least check if right isn't one
of the 3 kinds of regions we track and if yes, error out.
So perhaps move the else goto do_error; inside of the {} and
do
  if (GetStackVariableBeginning(right - 1, &shadow_offset2) ||
  GetHeapAddressInformation(right, 0, &hdesc2) ||
  GetGlobalAddressInformation(right - 1, 0, &gdesc2))
goto do_error;
  return;
(if the lock above is released, you'd of course need to retake it for
if (GetStackVariableBeginning(right - 1, &shadow_offset2)).

Jakub


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-16 Thread Sam van Kampen via gcc-patches
..I just realised that the clang flag is -Wbitfield-enum-conversion, not
-Wenum-bitfield-conversion. Please apply the patch below instead, which
has replaced the two words to remove the inconsistency.

2017-10-16  Sam van Kampen  

* c-family/c.opt: Add a warning flag for struct bit-fields
being too small to hold enumerated types.
* cp/class.c: Likewise.
* doc/invoke.texi: Add documentation for said warning flag.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 253784)
+++ gcc/c-family/c.opt  (working copy)
@@ -346,6 +346,10 @@ Wframe-address
 C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn when __builtin_frame_address or __builtin_return_address is used unsafely.
 
+Wbitfield-enum-conversion
+C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
+Warn about struct bit-fields being too small to hold enumerated types.
+
 Wbuiltin-declaration-mismatch
 C ObjC C++ ObjC++ Var(warn_builtin_declaraion_mismatch) Init(1) Warning
 Warn when a built-in function is declared with the wrong signature.
Index: gcc/cp/class.c
===
--- gcc/cp/class.c  (revision 253784)
+++ gcc/cp/class.c  (working copy)
@@ -3278,10 +3278,11 @@ check_bitfield_decl (tree field)
   && tree_int_cst_lt (TYPE_SIZE (type), w)))
warning_at (DECL_SOURCE_LOCATION (field), 0,
"width of %qD exceeds its type", field);
-  else if (TREE_CODE (type) == ENUMERAL_TYPE
+  else if (warn_bitfield_enum_conversion
+  && TREE_CODE (type) == ENUMERAL_TYPE
   && (0 > (compare_tree_int
(w, TYPE_PRECISION (ENUM_UNDERLYING_TYPE (type))
-   warning_at (DECL_SOURCE_LOCATION (field), 0,
+   warning_at (DECL_SOURCE_LOCATION (field), OPT_Wbitfield_enum_conversion,
"%qD is too small to hold all values of %q#T",
field, type);
 }
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 253784)
+++ gcc/doc/invoke.texi (working copy)
@@ -264,7 +264,7 @@ Objective-C and Objective-C++ Dialects}.
 -Walloca  -Walloca-larger-than=@var{n} @gol
 -Wno-aggressive-loop-optimizations  -Warray-bounds  -Warray-bounds=@var{n} @gol
 -Wno-attributes  -Wbool-compare  -Wbool-operation @gol
--Wno-builtin-declaration-mismatch @gol
+-Wbitfield-enum-conversion -Wno-builtin-declaration-mismatch @gol
 -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
 -Wc++-compat  -Wc++11-compat  -Wc++14-compat  @gol
 -Wcast-align  -Wcast-align=strict  -Wcast-qual  @gol
@@ -5431,6 +5431,12 @@ Incrementing a boolean is invalid in C++17, and de
 
 This warning is enabled by @option{-Wall}.
 
+@item -Wbitfield-enum-conversion
+@opindex Wbitfield-enum-conversion
+@opindex Wno-bitfield-enum-conversion
+Warn about a bit-field potentially being too small to hold all values
+of an enumerated type. This warning is enabled by default.
+
 @item -Wduplicated-branches
 @opindex Wno-duplicated-branches
 @opindex Wduplicated-branches



[PATCH][AArch64] Wrong type-attribute for stp and str

2017-10-16 Thread Dominik Inführ
Hi,

it seems the type attributes for neon_stp and neon_store1_1reg should be the 
other way around.

Thanks
Dominik

ChangeLog:
2017-10-16  Dominik Infuehr  

* config/aarch64/aarch64-simd.md
(*aarch64_simd_mov): Fix type-attribute.
--
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 49f615cfdbf..409ad3502ff 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -160,8 +160,8 @@
gcc_unreachable ();
 }
 }
-  [(set_attr "type" "neon_load1_1reg, neon_store1_1reg,\
-neon_stp, neon_logic, multiple, multiple,\
+  [(set_attr "type" "neon_load1_1reg, neon_stp, neon_store1_1reg,\
+neon_logic, multiple, multiple,\
 multiple, neon_move")
(set_attr "length" "4,4,4,4,8,8,8,4")]
 )


signature.asc
Description: Message signed with OpenPGP using GPGMail


[Committed][testsuite] Fix PR82442

2017-10-16 Thread Wilco Dijkstra
Recently the gcc.dg/vect/pr31699.c was modified to check for
vect_float effective target instead for vect_double.  As a 
result it now fails on armhf.  Fix by avoiding double.
Tested on armhf, aarch64 and x64.

Committed as trivial.

ChangeLog:
2017-10-16  Wilco Dijkstra  

gcc/testsuite/
PR target/82442
* gcc.dg/vect/pr31699.c: Fix testcase.
--

diff --git a/gcc/testsuite/gcc.dg/vect/pr31699.c 
b/gcc/testsuite/gcc.dg/vect/pr31699.c
index 
7ec4dfe770a8ce0f98c6ff46e6f301e88b63fc0e..b0b9971fcfc9f6f90ccef7bf1de10fd109c3d07b
 100644
--- a/gcc/testsuite/gcc.dg/vect/pr31699.c
+++ b/gcc/testsuite/gcc.dg/vect/pr31699.c
@@ -7,9 +7,9 @@
 float x[256];
 
 __attribute__ ((noinline))
-double *foo(void)
+float *foo(void)
 {
- double *z = malloc (sizeof(double) * 256);
+ float *z = malloc (sizeof(float) * 256);
 
  int i;
  for (i=0; i<256; ++i)



Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-16 Thread Martin Liška
On 10/16/2017 02:21 PM, Jakub Jelinek wrote:
> On Mon, Oct 16, 2017 at 01:57:59PM +0200, Martin Liška wrote:
>> Agree. Do you feel that it's right moment to trigger review process of 
>> libsanitizer
>> changes?
> 
> Yes.  We don't have that much time left to get it through...

Good, I've triggered regression tests. Will send it to phsabricator later this 
evening.

> 
>> --- a/libsanitizer/asan/asan_report.cc
>> +++ b/libsanitizer/asan/asan_report.cc
>> +  {
>> +uptr shadow_offset1, shadow_offset2;
>> +ThreadRegistryLock l(&asanThreadRegistry());
>> +
>> +// check whether left is a stack memory pointer
>> +if (GetStackVariableBeginning(left, &shadow_offset1)) {
>> +  if (GetStackVariableBeginning(right - 1, &shadow_offset2) &&
>> +  shadow_offset1 == shadow_offset2)
>> +return;
>> +  else
>> +goto do_error;
>> +}
> 
> Do you need the ThreadRegistryLock for the following calls?
> If not, the } should be closed and it should be reindented.

Yes, we do.

> 
>> +// check whether left is a heap memory address
>> +HeapAddressDescription hdesc1, hdesc2;
>> +if (GetHeapAddressInformation(left, 0, &hdesc1) &&
>> +hdesc1.chunk_access.access_type == kAccessTypeInside) {
>> +  if (GetHeapAddressInformation(right, 0, &hdesc2) &&
>> +  hdesc2.chunk_access.access_type == kAccessTypeInside &&
>> +  (hdesc1.chunk_access.chunk_begin
>> +   == hdesc2.chunk_access.chunk_begin))
>> +return;
>> +  else
>> +goto do_error;
>> +}
>> +// check whether left is an address of a global variable
>> +GlobalAddressDescription gdesc1, gdesc2;
>> +if (GetGlobalAddressInformation(left, 0, &gdesc1)) {
>> +  if (GetGlobalAddressInformation(right - 1, 0, &gdesc2) &&
>> +  gdesc1.PointsInsideASameVariable (gdesc2))
>> +return;
>> +} else
>> +  goto do_error;
> 
> ??  If we don't know anything about the left object, doing a goto do_error;
> sounds dangerous, it could be say mmapped region or whatever else.

Good point!

> 
> Though, we could at that spot at least check if right isn't one
> of the 3 kinds of regions we track and if yes, error out.
> So perhaps move the else goto do_error; inside of the {} and
> do
>   if (GetStackVariableBeginning(right - 1, &shadow_offset2) ||
>   GetHeapAddressInformation(right, 0, &hdesc2) ||
>   GetGlobalAddressInformation(right - 1, 0, &gdesc2))
> goto do_error;
>   return;
> (if the lock above is released, you'd of course need to retake it for
> if (GetStackVariableBeginning(right - 1, &shadow_offset2)).

Yes, should be fixed now.

Martin

> 
>   Jakub
> 

>From 55e226413b9b3533b4167a4895b40f66cd665f11 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Oct 2017 12:14:25 +0200
Subject: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

gcc/ChangeLog:

2017-10-13  Martin Liska  

	* doc/invoke.texi: Document the options.
	* flag-types.h (enum sanitize_code): Add
	SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* ipa-inline.c (sanitize_attrs_match_for_inline_p): Add handling
	of SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* opts.c: Define new sanitizer options.
	* sanitizer.def (BUILT_IN_ASAN_POINTER_COMPARE): Likewise.
	(BUILT_IN_ASAN_POINTER_SUBTRACT): Likewise.

gcc/c/ChangeLog:

2017-10-13  Martin Liska  

	* c-typeck.c (pointer_diff): Add new argument and instrument
	pointer subtraction.
	(build_binary_op): Similar for pointer comparison.

gcc/cp/ChangeLog:

2017-10-16  Martin Liska  

	* typeck.c (pointer_diff): Add new argument and instrument
	pointer subtraction.
	(cp_build_binary_op): Create compound expression if doing an
	instrumentation.

gcc/testsuite/ChangeLog:

2017-10-16  Martin Liska  

	* c-c++-common/asan/pointer-compare-1.c: New test.
	* c-c++-common/asan/pointer-compare-2.c: New test.
	* c-c++-common/asan/pointer-subtract-1.c: New test.
	* c-c++-common/asan/pointer-subtract-2.c: New test.
---
 gcc/c/c-typeck.c   | 33 +++--
 gcc/cp/typeck.c| 43 +--
 gcc/doc/invoke.texi| 22 ++
 gcc/flag-types.h   |  2 +
 gcc/ipa-inline.c   |  8 ++-
 gcc/opts.c | 15 
 gcc/sanitizer.def  |  4 ++
 .../c-c++-common/asan/pointer-compare-1.c  | 83 ++
 .../c-c++-common/asan/pointer-compare-2.c  | 76 
 .../c-c++-common/asan/pointer-subtract-1.c | 41 +++
 .../c-c++-common/asan/pointer-subtract-2.c | 32 +
 libsanitizer/asan/asan_descriptions.cc | 29 
 libsanitizer/asan/asan_descriptions.h  |  5 ++
 libsanitizer/asan/asan_report.cc   | 70 --
 libsanitizer/asan/asan_thread.cc   | 23 ++
 libsanitizer/asan/asan_thread.h|  3 +

Re: [PATCH, GCC/ARM] Allow +nodsp for -mcpu=cortex-m33

2017-10-16 Thread Richard Earnshaw (lists)
On 16/10/17 10:27, Thomas Preudhomme wrote:
> Hi,
> 
> DSP instructions are optional for Arm Cortex-M33, yet its -mcpu option
> does not allow +nodsp. Users are thus left with using
> -march=armv8-m.main -mtune=cortex-m33. This patch allows +nodsp to
> -mcpu=cortex-m33.
> 
> ChangeLog entry is as follows:
> 
> *** gcc/ChangeLog ***
> 
> 2017-10-11  Thomas Preud'homme  
> 
> * config/arm/arm-cpus.in (cortex-m33): Add nodsp option.
> * doc/invoke.texi: Document +nodsp as a valid extension for
> -mcpu=cortex-m33.
> 
> Tested by building an arm-none-eabi GCC cross-compiler and checking that
> __ARM_FEATURE_DSP is *not* defined when invoked with
> -mcpu=cortex-m33+nodsp.
> 
> Is this ok for trunk?
> 

OK.

R.

> Best regards,
> 
> Thomas
> 
> allow_cortex_m33_nodsp.patch
> 
> 
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
> 07de4c9375ba7a0df0d8bd00385e54a4042e5264..25fc429a8338e433b9fcd0ee385ff127423494c2
>  100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -1516,6 +1516,7 @@ begin cpu cortex-m33
>   architecture armv8-m.main+dsp
>   fpu fpv5-sp-d16
>   option nofp remove ALL_FP
> + option nodsp remove armv7em
>   costs v7m
>  end cpu cortex-m33
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 9ad1fb339babe2ce8f45ecac2fa93d7b9ae5fd30..722d5cc2c0a020906e6df3260822cdd268245082
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15803,6 +15803,9 @@ Permissible names for this option are the same as 
> those for
>  The following extension options are common to the listed CPUs:
>  
>  @table @samp
> +@item +nodsp
> +Disable the DSP instructions on @samp{cortex-m33}.
> +
>  @item  +nofp
>  Disables the floating-point instructions on @samp{arm9e},
>  @samp{arm946e-s}, @samp{arm966e-s}, @samp{arm968e-s}, @samp{arm10e},
> 



Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-16 Thread Jakub Jelinek
On Mon, Oct 16, 2017 at 03:38:28PM +0200, Martin Liška wrote:
> --- a/libsanitizer/asan/asan_report.cc
> +++ b/libsanitizer/asan/asan_report.cc
> @@ -344,14 +344,70 @@ static INLINE void CheckForInvalidPointerPair(void *p1, 
> void *p2) {
>if (!flags()->detect_invalid_pointer_pairs) return;
>uptr a1 = reinterpret_cast(p1);
>uptr a2 = reinterpret_cast(p2);
> -  AsanChunkView chunk1 = FindHeapChunkByAddress(a1);
> -  AsanChunkView chunk2 = FindHeapChunkByAddress(a2);
> -  bool valid1 = chunk1.IsAllocated();
> -  bool valid2 = chunk2.IsAllocated();
> -  if (!valid1 || !valid2 || !chunk1.Eq(chunk2)) {
> -GET_CALLER_PC_BP_SP;
> -return ReportInvalidPointerPair(pc, bp, sp, a1, a2);
> +
> +  if (a1 == a2)
> +return;
> +
> +  uptr offset = a1 < a2 ? a2 - a1 : a1 - a2;
> +  uptr left = a1 < a2 ? a1 : a2;
> +  uptr right = a1 < a2 ? a2 : a1;
> +  if (offset <= 2048) {
> +if (__asan_region_is_poisoned(left, offset) == 0)
> +  return;
> +else
> +  goto do_error;
> +  }
> +
> +  {
> +uptr shadow_offset1, shadow_offset2;

Some more nits.  This is C++ (but C99 would do as well), you don't need a new
scope for the above variables.  Just declare them without {} around and
extra indentation.

> +
> +{
> +  ThreadRegistryLock l(&asanThreadRegistry());
> +
> +  // check whether left is a stack memory pointer
> +  if (GetStackVariableBeginning(left, &shadow_offset1)) {
> + if (GetStackVariableBeginning(right - 1, &shadow_offset2) &&
> + shadow_offset1 == shadow_offset2)
> +   return;
> + else
> +   goto do_error;
> +  }
> +}
> +
> +// check whether left is a heap memory address
> +HeapAddressDescription hdesc1, hdesc2;
> +if (GetHeapAddressInformation(left, 0, &hdesc1) &&
> + hdesc1.chunk_access.access_type == kAccessTypeInside) {
> +  if (GetHeapAddressInformation(right, 0, &hdesc2) &&
> +   hdesc2.chunk_access.access_type == kAccessTypeInside &&
> +   (hdesc1.chunk_access.chunk_begin
> +== hdesc2.chunk_access.chunk_begin))
> + return;
> +  else
> + goto do_error;
> +}
> +// check whether left is an address of a global variable
> +GlobalAddressDescription gdesc1, gdesc2;
> +if (GetGlobalAddressInformation(left, 0, &gdesc1)) {
> +  if (GetGlobalAddressInformation(right - 1, 0, &gdesc2) &&
> +   gdesc1.PointsInsideASameVariable (gdesc2))
> + return;

I think it is better to use consistent coding, i.e. use else goto do_error;
here too.

> +}
> +else {

Without the else { here.

And do
  {
ThreadRegistryLock l(&asanThreadRegistry());
if (GetStackVariableBeginning(right - 1, &shadow_offset2))
  goto do_error;
  }
  if (GetHeapAddressInformation(right, 0, &hdesc2) ||
  GetGlobalAddressInformation(right - 1, 0, &gdesc2))
goto do_error;

  /* At this point we know nothing about both a1 and a2 addresses.  */
  return;

so that the lock is held over GetStackVariableBeginning only.

Jakub


Re: [PATCH, Makefile] improve default cpu selection for e500v2

2017-10-16 Thread Andrew Jenner

On 13/10/2017 17:07, Jeff Law wrote:

On 08/31/2017 12:54 PM, Olivier Hainque wrote:

Hello,

gcc can be configured with an "e500v2" cpu target
name, conveying SPE with double precision floats.

config.gcc already has a provision for a good default
cpu selection for SPE with double precision floats
when the latter is explicitly requested with an explicit
--enable command line option. This is:

   # If there is no $with_cpu option, try to infer one from ${target}.
   # This block sets nothing except for with_cpu.
   ...
   case ${target} in
   ...
 powerpc*-*-*spe*)
   if test x$enable_e500_double = xyes; then
  with_cpu=8548
   else
  with_cpu=8540
   fi
   ;;

The attached patch is a proposal to refine this to select 8548 also
when we were configured for an e500v2 target cpu (canonicalized to
match powerpc-*spe), regardless of enable_e500_double.

We have been using something like this in production for a few
years in-house, lately with gcc-6 based toolchains for VxWorks or
bareboard configurations.

I also checked that
- e500v2-wrs-vxworks builds work with gcc-7, that the default cpu is
   indeed properly set to 8548, and that the built toolchains pass Ada
   ACATS tests for a couple of runtime library variants (kernel and rtp).

- a mainline build for powerpc-eabispe without --enable-e500-double
   defaults to 8540

- a mainline build for powerpc-eabispe with --enable-e500-double
   defaults to 8548

- a mainline build for e500v2-wrs-vxworks without --enable-e500-double
   defaults to 8548

OK to commit ?

Thanks in advance

2017-08-31  Olivier Hainque  

 * gcc/config.gcc (powerpc*-*-*spe*): Pick 8548 as the
default with_cpu for an e500v2 target cpu name, in addition
to --enable-e500-double.

GIven this hits the powerpcspe port, I'd like Andrew Jenner to chime in
as the powerpcspe maintainer.  I've added him on CC.


This change is fine with me too. Thanks!

Andrew


[patch] implement generic debug() for vectors and hash sets

2017-10-16 Thread Aldy Hernandez
We have a generic mechanism for dumping types from the debugger with:

(gdb) call debug(some_type)

However, even though most types are implemented, we have no canonical
way of dumping vectors or hash sets.

The attached patch fixes this oversight.  With it you can call
debug(vec<>) and debug(hash_set<>) with the following types: rtx,
tree, basic_block, edge, rtx_insn.  More can be added simply by adding
a debug_slim(your_type) overload and calling:

  DEFINE_DEBUG_VEC (your_type)
  DEFINE_DEBUG_HASH_SET (your_type)

Here is an example of how things look with this patch:

vec of edges:
[0] =  10)>

vec of bbs:
[0] = 
[1] = 

vec of trees:
[0] =  
[1] =  
[2] =  

vec of rtx:
[0] = (reg:SI 87)
[1] = (reg:SI 87)

hash of bbs:



OK for mainline?
gcc/

	* vec.h (debug_helper): New function.
	(DEFINE_DEBUG_VEC): New macro.
	* hash-set.h (debug_helper): New function.
	(DEFINE_DEBUG_HASH_SET): New macro.
	* cfg.c (debug_slim (edge)): New function.
	Call DEFINE_DEBUG_VEC for edges.
	Call DEFINE_DEBUG_HASH_SET for edges.
	* cfghooks.c (debug_slim (basic_block)): New function.
	Call DEFINE_DEBUG_VEC for basic blocks.
	Call DEFINE_DEBUG_HASH_SET for basic blocks.
	* print-tree.c (debug_slim): New function to handle trees.
	Call DEFINE_DEBUG_VEC for trees.
	Call DEFINE_DEBUG_HASH_SET for trees.
	(debug (vec) &): Remove.
	(debug () *): Remove.
	* print-rtl.c (debug_slim): New function to handle const_rtx.
	Call DEFINE_DEBUG_VEC for rtx_def.
	Call DEFINE_DEBUG_VEC for rtx_insn.
	Call DEFINE_DEBUG_HASH_SET for rtx_def.
	Call DEFINE_DEBUG_HASH_SET for rtx_insn.
	* sel-sched-dump.c (debug (vec &): Remove.
	(debug (vec *ptr): Remove.
	(debug_insn_vector): Remove.
	* stor-layout.c (debug_rli): Call debug() instead of debug_vec_tree.

diff --git a/gcc/cfg.c b/gcc/cfg.c
index 01e68aeda51..4d02fb56cbf 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -573,6 +573,16 @@ debug (edge_def *ptr)
   else
 fprintf (stderr, "\n");
 }
+
+static void
+debug_slim (edge e)
+{
+  fprintf (stderr, " %d)>", (void *) e,
+	   e->src->index, e->dest->index);
+}
+
+DEFINE_DEBUG_VEC (edge)
+DEFINE_DEBUG_HASH_SET (edge)
 
 /* Simple routines to easily allocate AUX fields of basic blocks.  */
 
diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index 258a5eabf8d..73b196feec7 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -304,6 +304,14 @@ debug (basic_block_def *ptr)
 fprintf (stderr, "\n");
 }
 
+static void
+debug_slim (basic_block ptr)
+{
+  fprintf (stderr, "", (void *) ptr, ptr->index);
+}
+
+DEFINE_DEBUG_VEC (basic_block_def *)
+DEFINE_DEBUG_HASH_SET (basic_block_def *)
 
 /* Dumps basic block BB to pretty-printer PP, for use as a label of
a DOT graph record-node.  The implementation of this hook is
diff --git a/gcc/hash-set.h b/gcc/hash-set.h
index d2247d39571..58f7750243a 100644
--- a/gcc/hash-set.h
+++ b/gcc/hash-set.h
@@ -123,6 +123,44 @@ private:
   hash_table m_table;
 };
 
+/* Generic hash_set debug helper.
+
+   This needs to be instantiated for each hash_set used throughout
+   the compiler like this:
+
+DEFINE_DEBUG_HASH_SET (TYPE)
+
+   The reason we have a debug_helper() is because GDB can't
+   disambiguate a plain call to debug(some_hash), and it must be called
+   like debug(some_hash).  */
+template
+void
+debug_helper (hash_set &ref)
+{
+  for (typename hash_set::iterator it = ref.begin ();
+   it != ref.end (); ++it)
+{
+  debug_slim (*it);
+  fputc ('\n', stderr);
+}
+}
+
+#define DEFINE_DEBUG_HASH_SET(T) \
+  template static void debug_helper (hash_set &);	\
+  DEBUG_FUNCTION void	\
+  debug (hash_set &ref)\
+  {			\
+debug_helper  (ref);\
+  }			\
+  DEBUG_FUNCTION void	\
+  debug (hash_set *ptr)\
+  {			\
+if (ptr)		\
+  debug (*ptr);	\
+else		\
+  fprintf (stderr, "\n");			\
+  }
+
 /* ggc marking routines.  */
 
 template
diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 28d99862cad..5fe23801ab2 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -967,6 +967,23 @@ debug (const rtx_def *ptr)
 fprintf (stderr, "\n");
 }
 
+/* Like debug_rtx but with no newline, as debug_helper will add one.
+
+   Note: No debug_slim(rtx_insn *) variant implemented, as this
+   function can serve for both rtx and rtx_insn.  */
+
+static void
+debug_slim (const_rtx x)
+{
+  rtx_writer w (stderr, 0, false, false, NULL);
+  w.print_rtx (x);
+}
+
+DEFINE_DEBUG_VEC (rtx_def *)
+DEFINE_DEBUG_VEC (rtx_insn *)
+DEFINE_DEBUG_HASH_SET (rtx_def *)
+DEFINE_DEBUG_HASH_SET (rtx_insn *)
+
 /* Count of rtx's to print with debug_rtx_list.
This global exists because gdb user defined commands have no arguments.  */
 
diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index d534c76ee49..3a0f85d4038 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -1095,32 +1095,6 @@ debug_raw (vec &ref)
 }
 
 DEBUG_FUNCTION void
-debug (vec &ref)
-{
-  tree elt;
-  unsigned ix;
-
-  /* Print the slot this node is in, and its code, and address.  */
-  fprintf (stderr, 

Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions

2017-10-16 Thread Bin.Cheng
On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener
 wrote:
> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
>> Hi,
>> This patch merges adjacent memset builtin partitions if possible.  It is
>> a useful special case optimization transforming below code:
>>
>> #define M (256)
>> #define N (512)
>>
>> struct st
>> {
>>   int a[M][N];
>>   int c[M];
>>   int b[M][N];
>> };
>>
>> void
>> foo (struct st *p)
>> {
>>   for (unsigned i = 0; i < M; ++i)
>> {
>>   p->c[i] = 0;
>>   for (unsigned j = N; j > 0; --j)
>> {
>>   p->a[i][j - 1] = 0;
>>   p->b[i][j - 1] = 0;
>> }
>> }
>>
>> into a single memset function call, rather than three calls initializing
>> the structure field by field.
>>
>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>
> +  /* Insertion sort is good enough for the small sub-array.  */
> +  for (k = i + 1; k < j; ++k)
> +   {
> + part1 = (*partitions)[k];
> + for (l = k; l > i; --l)
> +   {
> + part2 = (*partitions)[l - 1];
> + if (part1->builtin->dst_base_offset
> +   >= part2->builtin->dst_base_offset)
> +   break;
> +
> + (*partitions)[l] = part2;
> +   }
> + (*partitions)[l] = part1;
> +   }
>
> so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
> to write a qsort helper for this but I can't wrap my head around the above
> in 5 minutes so ... please!
Hmm, I thought twice about this and now I believe stable sorting (thus
insertion sort)
is required here.  Please see below for explanation.

>
> You don't seem to check the sizes of the memsets given that they
> obviously cannot overlap(!?)
Yes, given it's quite special case transformation, I did add code
checking overlap cases.
>
> Also why handle memset and not memcpy/memmove or combinations of
> them (for sorting)?
>
>   for ()
>{
>   p->a[i] = 0;
>   p->c[i] = q->c[i];
>   p->b[i] = 0;
>}
>
> with a and b adjacent.  I suppose p->c could be computed by a
> non-builtin partition as well.
Yes, the two memset builtin partitions can be merged in this case, but...
> So don't we want to see if dependences allow sorting all builtin
> partitions next to each other
> as much as possible?  (maybe we do that already?)
The answer for this, above partition merging and use of qsort is no.
I think all the three are the same question here.  For now we only do
topological sort for partitions.  To maximize parallelism (either by merging
normal parallel partitions or merging builtin partitions) requires fine-tuned
sorting between partitions that doesn't dependence on each other.
In order to sort all memset/memcpy/memmove, we need check dependence
between all data references between different partitions.  For example, I
created new test ldist-36.c illustrating sorting memcpy along with memset
would generate wrong code because dependence is broken.  It's the same
for qsort.  In extreme case, if the same array is set twice with different rhs
value, the order between the two sets needs to be preserved.  Unfortunately,
qsort is unstable and could reorder different sets.  This would break output
dependence.
At the point of this function, dependence graph is destroyed, we can't do
much in addition to special case handling for memset.  Full solution would
require a customized topological sorting process.

So, this updated patch keeps insertion sort with additional comment explaining
why.  Also two test cases added showing when memset partitions should be
merged (we can't for now) and when memset partitions should not be merged.

Bootstrap and test.  Is it OK?

Thanks,
bin

2017-10-14  Bin Cheng  

* tree-loop-distribution.c (tree-ssa-loop-ivopts.h): New header file.
(struct builtin_info): New fields.
(classify_builtin_1): Compute and record base and offset parts for
memset builtin partition by calling strip_offset.
(fuse_memset_builtins): New function.
(finalize_partitions): Fuse adjacent memset partitions by calling
above function.
* tree-ssa-loop-ivopts.c (strip_offset): Delete static declaration.
Expose the interface.
* tree-ssa-loop-ivopts.h (strip_offset): New declaration.

gcc/testsuite/ChangeLog
2017-10-14  Bin Cheng  

* gcc.dg/tree-ssa/ldist-17.c: Adjust test string.
* gcc.dg/tree-ssa/ldist-32.c: New test.
* gcc.dg/tree-ssa/ldist-35.c: New test.
* gcc.dg/tree-ssa/ldist-36.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
index 4efc0a4..b3617f6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
@@ -45,5 +45,5 @@ mad_synth_mute (struct mad_synth *synth)
   return;
 }
 
-/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 4 library 
calls" "ldist" } } */
-/* { dg-final { scan-tree-dump-times "generated memset zero" 4 "ldist" } } */
+/* { dg-final { scan-tree-dump "Loop nest . distributed: split to 0 loops and 
1 l

Re: [patch] implement generic debug() for vectors and hash sets

2017-10-16 Thread Aldy Hernandez
One more thing.

I see we have a "pvt" macro in gdbinit.in:

define pvt
set debug_vec_tree ($)
end

I propose we either remove this altogether, since we have a generic
debug() way of dumping things, or implement it with "set debug($)" if
somebody's finger memory will be adversely affected.  (I'm looking at
you Jakub ;-)).


On Mon, Oct 16, 2017 at 9:52 AM, Aldy Hernandez  wrote:
> We have a generic mechanism for dumping types from the debugger with:
>
> (gdb) call debug(some_type)
>
> However, even though most types are implemented, we have no canonical
> way of dumping vectors or hash sets.
>
> The attached patch fixes this oversight.  With it you can call
> debug(vec<>) and debug(hash_set<>) with the following types: rtx,
> tree, basic_block, edge, rtx_insn.  More can be added simply by adding
> a debug_slim(your_type) overload and calling:
>
>   DEFINE_DEBUG_VEC (your_type)
>   DEFINE_DEBUG_HASH_SET (your_type)
>
> Here is an example of how things look with this patch:
>
> vec of edges:
> [0] =  10)>
>
> vec of bbs:
> [0] = 
> [1] = 
>
> vec of trees:
> [0] =  
> [1] =  
> [2] =  
>
> vec of rtx:
> [0] = (reg:SI 87)
> [1] = (reg:SI 87)
>
> hash of bbs:
> 
> 
>
> OK for mainline?


Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-16 Thread Jeff Law
On 10/16/2017 06:03 AM, Martin Liška wrote:
> On 10/13/2017 04:59 PM, Jeff Law wrote:
>> On 10/13/2017 07:02 AM, Martin Liška wrote:
>>> On 10/12/2017 11:54 PM, Jeff Law wrote:
 On 10/11/2017 12:13 AM, Martin Liška wrote:
> 2017-10-10  Martin Liska  
>
>   PR tree-optimization/82493
>   * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>   (test_range_functions): New function.
>   (sbitmap_c_tests): Likewise.
>   * selftest-run-tests.c (selftest::run_tests): Run new tests.
>   * selftest.h (sbitmap_c_tests): New function.
 I went ahead and committed this along with a patch to fix the off-by-one
 error in live_bytes_read.  Bootstrapped and regression tested on x86.

 Actual patch attached for archival purposes.

 Jeff

>>>
>>> Hello.
>>>
>>> I wrote a patch that adds various gcc_checking_asserts and I hit following:
>>>
>>> ./xgcc -B. 
>>> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90 
>>> -c -O2
>>> during GIMPLE pass: dse
>>> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90:7:0:
>>>
>>>   program testat
>>>  
>>> internal compiler error: in bitmap_check_index, at sbitmap.h:105
>>> 0x1c014c1 bitmap_check_index
>>> ../../gcc/sbitmap.h:105
>>> 0x1c01fa7 bitmap_bit_in_range_p(simple_bitmap_def const*, unsigned int, 
>>> unsigned int)
>>> ../../gcc/sbitmap.c:335
>>> 0x1179002 live_bytes_read
>>> ../../gcc/tree-ssa-dse.c:497
>>> 0x117935a dse_classify_store
>>> ../../gcc/tree-ssa-dse.c:595
>>> 0x1179947 dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)
>>> ../../gcc/tree-ssa-dse.c:786
>>> 0x1179b6e dse_dom_walker::before_dom_children(basic_block_def*)
>>> ../../gcc/tree-ssa-dse.c:853
>>> 0x1a6f659 dom_walker::walk(basic_block_def*)
>>> ../../gcc/domwalk.c:308
>>> 0x1179cb9 execute
>>> ../../gcc/tree-ssa-dse.c:907
>>>
>>> Where we call:
>>> Breakpoint 1, bitmap_bit_in_range_p (bmap=0x29d6cd0, start=0, end=515) at 
>>> ../../gcc/sbitmap.c:335
>>> 335   bitmap_check_index (bmap, end);
>>> (gdb) p *bmap
>>> $1 = {n_bits = 256, size = 4, elms = {255}}
>>>
>>> Is it a valid call or should caller check indices?
>> It doesn't look valid to me.  I'll dig into it.
>>
>> In general the sbitmap interface requires callers to DTRT -- failure can
>> easily lead to an out of bounds read or write.  It's one of the things I
>> really dislike about the sbitmap implementation.
>>
>> So it's safe to assume that I'm fully supportive of adding more testing
>> to catch this kind thing.
>>
>> Jeff
>>
> 
> Good.
> 
> Should I prepare fix for the ICE I mentioned or have you been working on that?
I've already got a patch for it.  I'll likely commit it this morning.

jeff


[PATCH GCC]Introduce qsort_range interface for GCC vector

2017-10-16 Thread Bin Cheng
Hi,
I was asked by Richi to replace insertion sort with qsort_range in loop
nest distribution patch.  Although I believe stable sort (thus insertion)
sort is needed in that case, I also added qsort_range interface in vec.h.
The new interface might be useful in other places.
Bootstrap and test on x86_64 and AArch64 with other patches.  Is it OK?

Thanks,
bin
2017-10-13  Bin Cheng  

* vec.h (struct GTY((user)) vec::qsort_range): New
member function.
(struct vec): New member function.From a6aa2866fb067628f63f508e9314c3a092b6055c Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Fri, 13 Oct 2017 13:55:03 +0100
Subject: [PATCH 7/8] vec-qsort_range-20171013.txt

---
 gcc/vec.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/vec.h b/gcc/vec.h
index cbdd439..f49177d 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -497,6 +497,7 @@ public:
   void unordered_remove (unsigned);
   void block_remove (unsigned, unsigned);
   void qsort (int (*) (const void *, const void *));
+  void qsort_range (unsigned, unsigned, int (*) (const void *, const void *));
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   unsigned lower_bound (T, bool (*)(const T &, const T &)) const;
   bool contains (const T &search) const;
@@ -974,6 +975,20 @@ vec::qsort (int (*cmp) (const void *, const void *))
 }
 
 
+/* Sort the contents within range [S, E] of this vector with qsort.  Both
+   S and E should be within [0, length).  CMP is the comparison function
+   to pass to qsort.  */
+
+template
+inline void
+vec::qsort_range (unsigned s, unsigned e,
+  int (*cmp) (const void *, const void *))
+{
+  if (e > s && length () > e)
+::qsort (&(*this)[s], e - s + 1, sizeof (T), cmp);
+}
+
+
 /* Search the contents of the sorted vector with a binary search.
CMP is the comparison function to pass to bsearch.  */
 
@@ -1260,6 +1275,7 @@ public:
   void unordered_remove (unsigned);
   void block_remove (unsigned, unsigned);
   void qsort (int (*) (const void *, const void *));
+  void qsort_range (unsigned, unsigned, int (*) (const void *, const void *));
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   unsigned lower_bound (T, bool (*)(const T &, const T &)) const;
   bool contains (const T &search) const;
@@ -1736,6 +1752,20 @@ vec::qsort (int (*cmp) (const void *, const void *))
 }
 
 
+/* Sort the contents within range [S, E] of this vector with qsort.  Both
+   S and E should be within [0, length).  CMP is the comparison function
+   to pass to qsort.  */
+
+template
+inline void
+vec::qsort_range (unsigned s, unsigned e,
+  int (*cmp) (const void *, const void *))
+{
+  if (m_vec)
+m_vec->qsort_range (s, e, cmp);
+}
+
+
 /* Search the contents of the sorted vector with a binary search.
CMP is the comparison function to pass to bsearch.  */
 
-- 
1.9.1



Re: [PATCH, RFC] Add a pass counter for "are we there yet" purposes

2017-10-16 Thread Sandra Loosemore

On 10/16/2017 12:53 AM, Richard Biener wrote:

On October 16, 2017 7:38:50 AM GMT+02:00, Sandra Loosemore 
 wrote:

This patch is a first cut at solving the problem discussed in this
thread

https://gcc.gnu.org/ml/gcc/2017-10/msg00016.html

where I have some nios2 backend patches in my queue that need a way of
knowing whether the split1 pass has run yet.  There seemed to be
agreement that a general way to query the pass manager for this
information would be useful.

[snip]


I missed the post of why you need to know this. But as you noticed we're using 
reload_completed for similar purpose. There's also the possibility of 
setting/adding a pass property that split could provide and which you could 
query. We're using this to signal the various different lowering stages in 
GIMPLE for example.


See the thread linked above.  There was interest in a general mechanism 
to query the pass manager state instead of adding the bookkeeping to the 
nios2 backend or adding something to track the split1 pass to the 
target-independent parts of the compiler.  After fiddling with it, 
though, I'm not sure this is an improvement.


Maybe it would help the discussion if I got my nios2 patch set posted so 
that everybody could see the actual use case?  It'll take me a few days 
to finish cleaning it up.


-Sandra



Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions

2017-10-16 Thread Bin.Cheng
On Mon, Oct 16, 2017 at 2:56 PM, Bin.Cheng  wrote:
> On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener
>  wrote:
>> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch merges adjacent memset builtin partitions if possible.  It is
>>> a useful special case optimization transforming below code:
>>>
>>> #define M (256)
>>> #define N (512)
>>>
>>> struct st
>>> {
>>>   int a[M][N];
>>>   int c[M];
>>>   int b[M][N];
>>> };
>>>
>>> void
>>> foo (struct st *p)
>>> {
>>>   for (unsigned i = 0; i < M; ++i)
>>> {
>>>   p->c[i] = 0;
>>>   for (unsigned j = N; j > 0; --j)
>>> {
>>>   p->a[i][j - 1] = 0;
>>>   p->b[i][j - 1] = 0;
>>> }
>>> }
>>>
>>> into a single memset function call, rather than three calls initializing
>>> the structure field by field.
>>>
>>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>>
>> +  /* Insertion sort is good enough for the small sub-array.  */
>> +  for (k = i + 1; k < j; ++k)
>> +   {
>> + part1 = (*partitions)[k];
>> + for (l = k; l > i; --l)
>> +   {
>> + part2 = (*partitions)[l - 1];
>> + if (part1->builtin->dst_base_offset
>> +   >= part2->builtin->dst_base_offset)
>> +   break;
>> +
>> + (*partitions)[l] = part2;
>> +   }
>> + (*partitions)[l] = part1;
>> +   }
>>
>> so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
>> to write a qsort helper for this but I can't wrap my head around the above
>> in 5 minutes so ... please!
> Hmm, I thought twice about this and now I believe stable sorting (thus
> insertion sort)
> is required here.  Please see below for explanation.
>
>>
>> You don't seem to check the sizes of the memsets given that they
>> obviously cannot overlap(!?)
> Yes, given it's quite special case transformation, I did add code
> checking overlap cases.
>>
>> Also why handle memset and not memcpy/memmove or combinations of
>> them (for sorting)?
>>
>>   for ()
>>{
>>   p->a[i] = 0;
>>   p->c[i] = q->c[i];
>>   p->b[i] = 0;
>>}
>>
>> with a and b adjacent.  I suppose p->c could be computed by a
>> non-builtin partition as well.
> Yes, the two memset builtin partitions can be merged in this case, but...
>> So don't we want to see if dependences allow sorting all builtin
>> partitions next to each other
>> as much as possible?  (maybe we do that already?)
> The answer for this, above partition merging and use of qsort is no.
> I think all the three are the same question here.  For now we only do
> topological sort for partitions.  To maximize parallelism (either by merging
> normal parallel partitions or merging builtin partitions) requires fine-tuned
> sorting between partitions that doesn't dependence on each other.
> In order to sort all memset/memcpy/memmove, we need check dependence
> between all data references between different partitions.  For example, I
> created new test ldist-36.c illustrating sorting memcpy along with memset
> would generate wrong code because dependence is broken.  It's the same
> for qsort.  In extreme case, if the same array is set twice with different rhs
> value, the order between the two sets needs to be preserved.  Unfortunately,
> qsort is unstable and could reorder different sets.  This would break output
> dependence.
> At the point of this function, dependence graph is destroyed, we can't do
> much in addition to special case handling for memset.  Full solution would
> require a customized topological sorting process.
>
> So, this updated patch keeps insertion sort with additional comment explaining
> why.  Also two test cases added showing when memset partitions should be
> merged (we can't for now) and when memset partitions should not be merged.
Hmm, I can factor out the sorting loop into a function, that might
make it easier
to read.

Thanks,
bin
>
> Bootstrap and test.  Is it OK?
>
> Thanks,
> bin
>
> 2017-10-14  Bin Cheng  
>
> * tree-loop-distribution.c (tree-ssa-loop-ivopts.h): New header file.
> (struct builtin_info): New fields.
> (classify_builtin_1): Compute and record base and offset parts for
> memset builtin partition by calling strip_offset.
> (fuse_memset_builtins): New function.
> (finalize_partitions): Fuse adjacent memset partitions by calling
> above function.
> * tree-ssa-loop-ivopts.c (strip_offset): Delete static declaration.
> Expose the interface.
> * tree-ssa-loop-ivopts.h (strip_offset): New declaration.
>
> gcc/testsuite/ChangeLog
> 2017-10-14  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-17.c: Adjust test string.
> * gcc.dg/tree-ssa/ldist-32.c: New test.
> * gcc.dg/tree-ssa/ldist-35.c: New test.
> * gcc.dg/tree-ssa/ldist-36.c: New test.


Re: [PATCH, RFC] Add a pass counter for "are we there yet" purposes

2017-10-16 Thread Richard Biener
On October 16, 2017 5:46:35 PM GMT+02:00, Sandra Loosemore 
 wrote:
>On 10/16/2017 12:53 AM, Richard Biener wrote:
>> On October 16, 2017 7:38:50 AM GMT+02:00, Sandra Loosemore
> wrote:
>>> This patch is a first cut at solving the problem discussed in this
>>> thread
>>>
>>> https://gcc.gnu.org/ml/gcc/2017-10/msg00016.html
>>>
>>> where I have some nios2 backend patches in my queue that need a way
>of
>>> knowing whether the split1 pass has run yet.  There seemed to be
>>> agreement that a general way to query the pass manager for this
>>> information would be useful.
>>>
>>> [snip]
>>
>> I missed the post of why you need to know this. But as you noticed
>we're using reload_completed for similar purpose. There's also the
>possibility of setting/adding a pass property that split could provide
>and which you could query. We're using this to signal the various
>different lowering stages in GIMPLE for example.
>
>See the thread linked above.  There was interest in a general mechanism
>
>to query the pass manager state instead of adding the bookkeeping to
>the 
>nios2 backend or adding something to track the split1 pass to the 
>target-independent parts of the compiler.  After fiddling with it, 
>though, I'm not sure this is an improvement.
>
>Maybe it would help the discussion if I got my nios2 patch set posted
>so 
>that everybody could see the actual use case?  It'll take me a few days
>
>to finish cleaning it up.

I guess that might help. I have the feeling that querying for 'did pass X run' 
is wrong conceptually. 

Richard. 

>-Sandra



Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions

2017-10-16 Thread Bin.Cheng
On Mon, Oct 16, 2017 at 5:00 PM, Bin.Cheng  wrote:
> On Mon, Oct 16, 2017 at 2:56 PM, Bin.Cheng  wrote:
>> On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener
>>  wrote:
>>> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
 Hi,
 This patch merges adjacent memset builtin partitions if possible.  It is
 a useful special case optimization transforming below code:

 #define M (256)
 #define N (512)

 struct st
 {
   int a[M][N];
   int c[M];
   int b[M][N];
 };

 void
 foo (struct st *p)
 {
   for (unsigned i = 0; i < M; ++i)
 {
   p->c[i] = 0;
   for (unsigned j = N; j > 0; --j)
 {
   p->a[i][j - 1] = 0;
   p->b[i][j - 1] = 0;
 }
 }

 into a single memset function call, rather than three calls initializing
 the structure field by field.

 Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>>>
>>> +  /* Insertion sort is good enough for the small sub-array.  */
>>> +  for (k = i + 1; k < j; ++k)
>>> +   {
>>> + part1 = (*partitions)[k];
>>> + for (l = k; l > i; --l)
>>> +   {
>>> + part2 = (*partitions)[l - 1];
>>> + if (part1->builtin->dst_base_offset
>>> +   >= part2->builtin->dst_base_offset)
>>> +   break;
>>> +
>>> + (*partitions)[l] = part2;
>>> +   }
>>> + (*partitions)[l] = part1;
>>> +   }
>>>
>>> so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
>>> to write a qsort helper for this but I can't wrap my head around the above
>>> in 5 minutes so ... please!
>> Hmm, I thought twice about this and now I believe stable sorting (thus
>> insertion sort)
>> is required here.  Please see below for explanation.
>>
>>>
>>> You don't seem to check the sizes of the memsets given that they
>>> obviously cannot overlap(!?)
>> Yes, given it's quite special case transformation, I did add code
>> checking overlap cases.
>>>
>>> Also why handle memset and not memcpy/memmove or combinations of
>>> them (for sorting)?
>>>
>>>   for ()
>>>{
>>>   p->a[i] = 0;
>>>   p->c[i] = q->c[i];
>>>   p->b[i] = 0;
>>>}
>>>
>>> with a and b adjacent.  I suppose p->c could be computed by a
>>> non-builtin partition as well.
>> Yes, the two memset builtin partitions can be merged in this case, but...
>>> So don't we want to see if dependences allow sorting all builtin
>>> partitions next to each other
>>> as much as possible?  (maybe we do that already?)
>> The answer for this, above partition merging and use of qsort is no.
>> I think all the three are the same question here.  For now we only do
>> topological sort for partitions.  To maximize parallelism (either by merging
>> normal parallel partitions or merging builtin partitions) requires fine-tuned
>> sorting between partitions that doesn't dependence on each other.
>> In order to sort all memset/memcpy/memmove, we need check dependence
>> between all data references between different partitions.  For example, I
>> created new test ldist-36.c illustrating sorting memcpy along with memset
>> would generate wrong code because dependence is broken.  It's the same
>> for qsort.  In extreme case, if the same array is set twice with different 
>> rhs
>> value, the order between the two sets needs to be preserved.  Unfortunately,
>> qsort is unstable and could reorder different sets.  This would break output
>> dependence.
>> At the point of this function, dependence graph is destroyed, we can't do
>> much in addition to special case handling for memset.  Full solution would
>> require a customized topological sorting process.
>>
>> So, this updated patch keeps insertion sort with additional comment 
>> explaining
>> why.  Also two test cases added showing when memset partitions should be
>> merged (we can't for now) and when memset partitions should not be merged.
> Hmm, I can factor out the sorting loop into a function, that might
> make it easier
> to read.
Okay, I will use std::stable_sort in this case, that's exactly what we
want for this case.

Thanks,
bin


Re: [PATCH, Makefile] improve default cpu selection for e500v2

2017-10-16 Thread Olivier Hainque
Hello Andrew,

> On Oct 16, 2017, at 15:45 , Andrew Jenner  wrote:
> 
> This change is fine with me too. Thanks!

Great, just checked-in. Thanks for your prompt feedback!

With Kind Regards,

Olivier



[PATCH, Fortran, committed] PR82511: ICE Bad IO basetype (12) on attempted read or write of entire DEC structure

2017-10-16 Thread Fritz Reese
All,

The simple attached patch which fixes PR 82511 has been committed to
trunk as r253791. It regtests on x86_64-redhat-linux-gnu.

The issue was an ICE when a variable containing a BT_UNION (basetype
12) component was given in a I/O list. The patch treats the BT_UNION
component as a derived type variable, thus recursing into its final
components (across all of its MAPs).

---
Fritz Reese
From 89ed92f0127b61bc802e43c8f3125f48540d7c27 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 16 Oct 2017 13:27:28 -0400
Subject: [PATCH] Treat UNION components as DT comp. in I/O lists.

	PR fortran/82511
	gcc/fortran/
	* trans-io.c (transfer_expr): Treat BT_UNION as BT_DERIVED.

	PR fortran/82511
	gcc/testsuite/gfortran.dg/
	* dec_structure_22.f90: New testcase.
---
 gcc/fortran/trans-io.c |  4 +--
 gcc/testsuite/gfortran.dg/dec_structure_22.f90 | 38 ++
 2 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_22.f90

diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c
index 026f9a993d2..f3e1f3e4d09 100644
--- a/gcc/fortran/trans-io.c
+++ b/gcc/fortran/trans-io.c
@@ -2404,7 +2404,7 @@ transfer_expr (gfc_se * se, gfc_typespec * ts, tree addr_expr,
 case BT_CLASS:
   if (ts->u.derived->components == NULL)
 	return;
-  if (ts->type == BT_DERIVED || ts->type == BT_CLASS)
+  if (gfc_bt_struct (ts->type) || ts->type == BT_CLASS)
 	{
 	  gfc_symbol *derived;
 	  gfc_symbol *dtio_sub = NULL;
@@ -2438,7 +2438,7 @@ transfer_expr (gfc_se * se, gfc_typespec * ts, tree addr_expr,
 	  function = iocall[IOCALL_X_DERIVED];
 	  break;
 	}
-	  else if (ts->type == BT_DERIVED)
+	  else if (gfc_bt_struct (ts->type))
 	{
 	  /* Recurse into the elements of the derived type.  */
 	  expr = gfc_evaluate_now (addr_expr, &se->pre);
diff --git a/gcc/testsuite/gfortran.dg/dec_structure_22.f90 b/gcc/testsuite/gfortran.dg/dec_structure_22.f90
new file mode 100644
index 000..ddbee02602a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_structure_22.f90
@@ -0,0 +1,38 @@
+  ! { dg-do run }
+  ! { dg-options "-fdec-structure" }
+  !
+  ! PR fortran/82511
+  !
+  ! Verify that structure variables with UNION components
+  ! are accepted in an I/O-list READ.
+  !
+  implicit none
+
+  structure /s/
+union
+  map
+character(16) :: c16_1
+  end map
+  map
+character(16) :: c16_2
+  end map
+end union
+  end structure
+
+  record /s/ r
+  character(32) :: instr = "ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^"
+
+  r.c16_1 = ''
+  r.c16_2 = ''
+  ! The record r shall be treated as if its components are listed:
+  ! read(...) r.c16_1, r.c16_2
+  ! This shall correspond to the formatted read of A16,A16
+  read(instr, '(A16,A16)') r
+
+  ! r.c16_1 and r.c16_2 are in a union, thus share the same memory
+  ! and the first 16 bytes of instr are overwritten
+  if ( r.c16_1 .ne. instr(17:32) .or. r.c16_2 .ne. instr(17:32) ) then
+call abort()
+  endif
+
+  end
-- 
2.12.2



[committed] Fix another tree-ssa-dse.c thinko

2017-10-16 Thread Jeff Law

Martin's checking turned up another bug in the same line of code that we
use to check if we're doing a real from a live location.

The computation of the end of the range to check did not account for the
offset of the write reference.  Ugh.

With this patch we get a clean bootstrap & regression test with Martin's
latest sbitmap checking patches on x86.

Installing on the trunk.

Jeff


commit c41999b60343c5db040eafb46fc1cf86b5584c14
Author: Jeff Law 
Date:   Mon Oct 16 13:40:29 2017 -0400

* tree-ssa-dse.c (live_bytes_read): Fix thinko.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ab0d1f6ac63..df26aeb2ada 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-10-16  Jeff Law  
+
+   * tree-ssa-dse.c (live_bytes_read): Fix thinko.
+
 2017-10-16  Jan Hubicka  
 
* x86-tune-costs.h (znver1_cost): Fix move cost tables.
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9d6cb146436..c1a6475afe1 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -493,7 +493,7 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
 
   /* Now check if any of the remaining bits in use_ref are set in LIVE.  */
   unsigned int start = (use_ref.offset - ref->offset) / BITS_PER_UNIT;
-  unsigned int end  = ((use_ref.offset + use_ref.size) / BITS_PER_UNIT) - 
1;
+  unsigned int end = start + (use_ref.size / BITS_PER_UNIT) - 1;
   return bitmap_bit_in_range_p (live, start, end);
 }
   return true;


Re: [PATCH] Fix PR80295[aarch64] [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-16 Thread Qing Zhao

> On Oct 16, 2017, at 5:52 AM, Wilco Dijkstra  wrote:
> 
> Qing Zhao wrote:
> 
>> Is  my patch Okay?
> 
> Given it's a mid-end patch this shouldn't be marked as AArch64 specific.
> Similarly the PR needs to be updated to say middle-end. So resending
> it making it clear it's not a target bug should help getting a review.

thanks, I will do that.

I do have a question on the new added testing case:

currently, I added it into gcc.target/aarch64/pr80295.c,  but due to it’s a 
middleend bug, shall I added it to gcc.dg?

However, the option that triggered the issue is -mabi=ilp32, which is a target 
depend option, so, I am really confused on where to put the
testing case.

thanks for the help.

Qing



[PATCH] Avoid creation if invalid BIT_FIELD_REFs (PR tree-optimization/82549)

2017-10-16 Thread Jakub Jelinek
Hi!

The tree-cfg.c verification requires that BIT_FIELD_REF bitpos fits into
uhwi, but fold-const.c happily creates BIT_FIELD_REFs with negative bitpos
(on invalid source).

The following patch just punts optimization instead of creating
BIT_FIELD_REF with negative bitpos and also adds some formatting fixes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-16  Jakub Jelinek  

PR tree-optimization/82549
* fold-const.c (optimize_bit_field_compare, fold_truth_andor_1):
Formatting fixes.  Instead of calling make_bit_field_ref with negative
bitpos return 0.

* gcc.c-torture/compile/pr82549.c: New test.

--- gcc/fold-const.c.jj 2017-10-13 09:17:44.0 +0200
+++ gcc/fold-const.c2017-10-14 13:46:35.939329846 +0200
@@ -4013,21 +4013,20 @@ optimize_bit_field_compare (location_t l
  size_int (nbitsize - lbitsize - lbitpos));
 
   if (! const_p)
-/* If not comparing with constant, just rework the comparison
-   and return.  */
-return fold_build2_loc (loc, code, compare_type,
-   fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
-make_bit_field_ref (loc, linner, lhs,
-unsigned_type,
-nbitsize, nbitpos,
-1, lreversep),
-mask),
-   fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
-make_bit_field_ref (loc, rinner, rhs,
-unsigned_type,
-nbitsize, nbitpos,
-1, rreversep),
-mask));
+{
+  if (nbitpos < 0)
+   return 0;
+
+  /* If not comparing with constant, just rework the comparison
+and return.  */
+  tree t1 = make_bit_field_ref (loc, linner, lhs, unsigned_type,
+   nbitsize, nbitpos, 1, lreversep);
+  t1 = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type, t1, mask);
+  tree t2 = make_bit_field_ref (loc, rinner, rhs, unsigned_type,
+   nbitsize, nbitpos, 1, rreversep);
+  t2 = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type, t2, mask);
+  return fold_build2_loc (loc, code, compare_type, t1, t2);
+}
 
   /* Otherwise, we are handling the constant case.  See if the constant is too
  big for the field.  Warn and return a tree for 0 (false) if so.  We do
@@ -4058,6 +4057,9 @@ optimize_bit_field_compare (location_t l
}
 }
 
+  if (nbitpos < 0)
+return 0;
+
   /* Single-bit compares should always be against zero.  */
   if (lbitsize == 1 && ! integer_zerop (rhs))
 {
@@ -5874,7 +5876,10 @@ fold_truth_andor_1 (location_t loc, enum
 results.  */
   ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
   lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
-  if (lnbitsize == rnbitsize && xll_bitpos == xlr_bitpos)
+  if (lnbitsize == rnbitsize
+ && xll_bitpos == xlr_bitpos
+ && lnbitpos >= 0
+ && rnbitpos >= 0)
{
  lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
lntype, lnbitsize, lnbitpos,
@@ -5898,10 +5903,14 @@ fold_truth_andor_1 (location_t loc, enum
 Note that we still must mask the lhs/rhs expressions.  Furthermore,
 the mask must be shifted to account for the shift done by
 make_bit_field_ref.  */
-  if ((ll_bitsize + ll_bitpos == rl_bitpos
-  && lr_bitsize + lr_bitpos == rr_bitpos)
- || (ll_bitpos == rl_bitpos + rl_bitsize
- && lr_bitpos == rr_bitpos + rr_bitsize))
+  if (((ll_bitsize + ll_bitpos == rl_bitpos
+   && lr_bitsize + lr_bitpos == rr_bitpos)
+  || (ll_bitpos == rl_bitpos + rl_bitsize
+  && lr_bitpos == rr_bitpos + rr_bitsize))
+ && ll_bitpos >= 0
+ && rl_bitpos >= 0
+ && lr_bitpos >= 0
+ && rr_bitpos >= 0)
{
  tree type;
 
@@ -5970,6 +5979,9 @@ fold_truth_andor_1 (location_t loc, enum
}
 }
 
+  if (lnbitpos < 0)
+return 0;
+
   /* Construct the expression we will return.  First get the component
  reference we will make.  Unless the mask is all ones the width of
  that field, perform the mask operation.  Then compare with the
--- gcc/testsuite/gcc.c-torture/compile/pr82549.c.jj2017-10-14 
13:49:27.831214544 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr82549.c   2017-10-14 
13:49:23.561267090 +0200
@@ -0,0 +1,9 @@
+/* PR tree-optimization/82549 */
+
+int a, b[1];
+
+int
+main ()
+{
+  return !a || b[-2] || b[-2];
+}

Jakub


Re: [PATCH] Avoid creation if invalid BIT_FIELD_REFs (PR tree-optimization/82549)

2017-10-16 Thread Richard Biener
On October 16, 2017 8:46:12 PM GMT+02:00, Jakub Jelinek  
wrote:
>Hi!
>
>The tree-cfg.c verification requires that BIT_FIELD_REF bitpos fits
>into
>uhwi, but fold-const.c happily creates BIT_FIELD_REFs with negative
>bitpos
>(on invalid source).
>
>The following patch just punts optimization instead of creating
>BIT_FIELD_REF with negative bitpos and also adds some formatting fixes.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 
Richard. 

>2017-10-16  Jakub Jelinek  
>
>   PR tree-optimization/82549
>   * fold-const.c (optimize_bit_field_compare, fold_truth_andor_1):
>   Formatting fixes.  Instead of calling make_bit_field_ref with negative
>   bitpos return 0.
>
>   * gcc.c-torture/compile/pr82549.c: New test.
>
>--- gcc/fold-const.c.jj2017-10-13 09:17:44.0 +0200
>+++ gcc/fold-const.c   2017-10-14 13:46:35.939329846 +0200
>@@ -4013,21 +4013,20 @@ optimize_bit_field_compare (location_t l
> size_int (nbitsize - lbitsize - lbitpos));
> 
>   if (! const_p)
>-/* If not comparing with constant, just rework the comparison
>-   and return.  */
>-return fold_build2_loc (loc, code, compare_type,
>-  fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
>-   make_bit_field_ref (loc, linner, lhs,
>-   unsigned_type,
>-   nbitsize, nbitpos,
>-   1, lreversep),
>-   mask),
>-  fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
>-   make_bit_field_ref (loc, rinner, rhs,
>-   unsigned_type,
>-   nbitsize, nbitpos,
>-   1, rreversep),
>-   mask));
>+{
>+  if (nbitpos < 0)
>+  return 0;
>+
>+  /* If not comparing with constant, just rework the comparison
>+   and return.  */
>+  tree t1 = make_bit_field_ref (loc, linner, lhs, unsigned_type,
>+  nbitsize, nbitpos, 1, lreversep);
>+  t1 = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type, t1,
>mask);
>+  tree t2 = make_bit_field_ref (loc, rinner, rhs, unsigned_type,
>+  nbitsize, nbitpos, 1, rreversep);
>+  t2 = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type, t2,
>mask);
>+  return fold_build2_loc (loc, code, compare_type, t1, t2);
>+}
> 
>/* Otherwise, we are handling the constant case.  See if the constant
>is too
> big for the field.  Warn and return a tree for 0 (false) if so.  We do
>@@ -4058,6 +4057,9 @@ optimize_bit_field_compare (location_t l
>   }
> }
> 
>+  if (nbitpos < 0)
>+return 0;
>+
>   /* Single-bit compares should always be against zero.  */
>   if (lbitsize == 1 && ! integer_zerop (rhs))
> {
>@@ -5874,7 +5876,10 @@ fold_truth_andor_1 (location_t loc, enum
>results.  */
>   ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
>   lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
>-  if (lnbitsize == rnbitsize && xll_bitpos == xlr_bitpos)
>+  if (lnbitsize == rnbitsize
>+&& xll_bitpos == xlr_bitpos
>+&& lnbitpos >= 0
>+&& rnbitpos >= 0)
>   {
> lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
>   lntype, lnbitsize, lnbitpos,
>@@ -5898,10 +5903,14 @@ fold_truth_andor_1 (location_t loc, enum
>Note that we still must mask the lhs/rhs expressions.  Furthermore,
>the mask must be shifted to account for the shift done by
>make_bit_field_ref.  */
>-  if ((ll_bitsize + ll_bitpos == rl_bitpos
>- && lr_bitsize + lr_bitpos == rr_bitpos)
>-|| (ll_bitpos == rl_bitpos + rl_bitsize
>-&& lr_bitpos == rr_bitpos + rr_bitsize))
>+  if (((ll_bitsize + ll_bitpos == rl_bitpos
>+  && lr_bitsize + lr_bitpos == rr_bitpos)
>+ || (ll_bitpos == rl_bitpos + rl_bitsize
>+ && lr_bitpos == rr_bitpos + rr_bitsize))
>+&& ll_bitpos >= 0
>+&& rl_bitpos >= 0
>+&& lr_bitpos >= 0
>+&& rr_bitpos >= 0)
>   {
> tree type;
> 
>@@ -5970,6 +5979,9 @@ fold_truth_andor_1 (location_t loc, enum
>   }
> }
> 
>+  if (lnbitpos < 0)
>+return 0;
>+
>   /* Construct the expression we will return.  First get the component
>  reference we will make.  Unless the mask is all ones the width of
>  that field, perform the mask operation.  Then compare with the
>--- gcc/testsuite/gcc.c-torture/compile/pr82549.c.jj   2017-10-14
>13:49:27.831214544 +0200
>+++ gcc/testsuite/gcc.c-torture/compile/pr82549.c  2017-10-14
>13:49:23.561267090 +0200
>@@ -0,0 +1,9 @@
>+/* PR tree-optimiza

[patch][arm] gcc-7-branch: Fix bootstrap on FreeBSD

2017-10-16 Thread Andreas Tobler

Hi all,

I struggled over a bootstrap issue while building gcc-7 for armv7-*-freebsd*

I got a 'permission denied' while creating the arm-tables.opt file.

The source tree is located on a nfs server.

The below patch fixed it for me.

Ok to apply?

TIA,
Andreas

2017-10-16  Andreas Tobler  

* config/arm/t-arm (MD_INCLUDES): Create arm-tables.opt via
intermediate arm-tables.new like the other awk generated files.

Index: config/arm/t-arm
===
--- config/arm/t-arm(revision 253792)
+++ config/arm/t-arm(working copy)
@@ -75,8 +75,8 @@
 $(srcdir)/config/arm/arm-tables.opt: $(srcdir)/config/arm/parsecpu.awk \
   $(srcdir)/config/arm/arm-cpus.in
$(AWK) -f $(srcdir)/config/arm/parsecpu.awk -v cmd=opt \
-   $(srcdir)/config/arm/arm-cpus.in > \
-   $(srcdir)/config/arm/arm-tables.opt
+   $(srcdir)/config/arm/arm-cpus.in > arm-tables.new
+   mv arm-tables.new $(srcdir)/config/arm/arm-tables.opt

 $(srcdir)/config/arm/arm-cpu.h: $(srcdir)/config/arm/parsecpu.awk \
   $(srcdir)/config/arm/arm-cpus.in


[OBVIOUS][PATCH] Fix fallout of attribute directive ignored warning

2017-10-16 Thread Martin Liška

Hi.

This fixes follow-up which is caused by differences in between C and C++ FE:

./objdir/gcc/xgcc -Bobjdir/gcc 
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/ubsan/attrib-5.c -c
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/ubsan/attrib-5.c:6:1: 
warning: ‘foobar’ attribute directive ignored [-Wattributes]
 float_cast2 (void) { /* { dg-warning "attribute directive ignored" } */
 ^~~

./objdir/gcc/xg++ -Bobjdir/gcc 
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/ubsan/attrib-5.c -c
/home/marxin/Programming/gcc/gcc/testsuite/c-c++-common/ubsan/attrib-5.c:6:18: 
warning: ‘foobar’ attribute directive ignored [-Wattributes]
 float_cast2 (void) { /* { dg-warning "attribute directive ignored" } */
  ^
I'll install it as obvious.

Thanks,
Martin
>From fc7fd410d2836776c8e49fa6755a5d58beb24901 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 16 Oct 2017 21:27:52 +0200
Subject: [PATCH] Fix attrib-5.c test-case.

gcc/testsuite/ChangeLog:

2017-10-16  Martin Liska  

	* c-c++-common/ubsan/attrib-5.c (float_cast2): Fix warning scan
	so that it will work for both C and C++ FEs.
---
 gcc/testsuite/c-c++-common/ubsan/attrib-5.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/ubsan/attrib-5.c b/gcc/testsuite/c-c++-common/ubsan/attrib-5.c
index fee1df1c433..209b5dd7d2b 100644
--- a/gcc/testsuite/c-c++-common/ubsan/attrib-5.c
+++ b/gcc/testsuite/c-c++-common/ubsan/attrib-5.c
@@ -3,8 +3,7 @@
 
 __attribute__((no_sanitize("foobar")))
 static void
-float_cast2 (void)
-{ /* { dg-warning "attribute directive ignored" } */
+float_cast2 (void) { /* { dg-warning "attribute directive ignored" } */
   volatile double d = 300;
   volatile signed char c;
   c = d;
-- 
2.14.2



[PATCH][Middle-end]Fix PR80295 [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-16 Thread Qing Zhao
resend this patch for middle-end to review. 

this patch was originally sent to aarch64 for review in the beginning:

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00404.html 

The implementation of __builtin_update_setjmp_buf is not correct. It takes a 
pointer
as an operand and treats the Mode of the pointer as Pmode, which is not correct.
a conversion from ptr_mode to Pmode is needed for this pointer.

bootstrapped and tested on both aarch64-unknown-linux-gnu and 
x86_64-pc-linux-gnu, 
no regressions.

Wilco helped me a lot during fixing this bug.

Okay for trunk?

the patch is:

gcc/ChangeLog

2017-10-16  Qing Zhao 
Wilco Dijkstra 

* builtins.c (expand_builtin_update_setjmp_buf): Add a
converstion to Pmode from the buf_addr.

gcc/testsuite/ChangeLog

2017-10-16  Qing Zhao 
Wilco Dijkstra 

PR middle-end/80295
* gcc.target/aarch64/pr80295.c: New test.

---
 gcc/builtins.c | 1 +
 gcc/testsuite/gcc.target/aarch64/pr80295.c | 8 
 2 files changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr80295.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c8a5ea6..01fb08b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1199,6 +1199,7 @@ void
 expand_builtin_update_setjmp_buf (rtx buf_addr)
 {
   machine_mode sa_mode = STACK_SAVEAREA_MODE (SAVE_NONLOCAL);
+  buf_addr = convert_memory_address (Pmode, buf_addr);
   rtx stack_save
 = gen_rtx_MEM (sa_mode,
   memory_address
diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c 
b/gcc/testsuite/gcc.target/aarch64/pr80295.c
new file mode 100644
index 000..b3866d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mabi=ilp32" } */
+
+void f (void *b) 
+{ 
+  __builtin_update_setjmp_buf (b); 
+}
+
-- 
1.9.1



Re: [committed] Fix another tree-ssa-dse.c thinko

2017-10-16 Thread Martin Liška

On 10/16/2017 07:56 PM, Jeff Law wrote:

With this patch we get a clean bootstrap & regression test with Martin's
latest sbitmap checking patches on x86.


Thanks Jeff for testing. May I consider this as green for installation of my 
patch?

Martin


[PATCH] Do not put gimple stmt on an abnormal edge (PR sanitizer/82545).

2017-10-16 Thread Martin Liška

Hi.

As discussed with Jakub on IRC, we should not put ASAN reporting function
on critical edges. Can that potentially lead to a missed use-after-scope,
but I guess it's very rare.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

2017-10-16  Martin Liska  

PR sanitizer/82545
* asan.c (asan_expand_poison_ifn): Do not put gimple stmt
on an abnormal edge.

gcc/testsuite/ChangeLog:

2017-10-16  Martin Liska  

PR sanitizer/82545
* gcc.dg/asan/pr82545.c: New test.
---
 gcc/asan.c  |  4 
 gcc/testsuite/gcc.dg/asan/pr82545.c | 15 +++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr82545.c


diff --git a/gcc/asan.c b/gcc/asan.c
index 2aa0a795af2..99958ecc330 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -3400,6 +3400,10 @@ asan_expand_poison_ifn (gimple_stmt_iterator *iter,
 	  {
 		edge e = gimple_phi_arg_edge (phi, i);
 
+		/* Do not insert on an edge we can't split.  */
+		if (e->flags & EDGE_ABNORMAL)
+		  continue;
+
 		if (call_to_insert == NULL)
 		  call_to_insert = gimple_copy (call);
 
diff --git a/gcc/testsuite/gcc.dg/asan/pr82545.c b/gcc/testsuite/gcc.dg/asan/pr82545.c
new file mode 100644
index 000..a0e1edc53d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr82545.c
@@ -0,0 +1,15 @@
+/* PR sanitizer/82545.  */
+/* { dg-do compile } */
+
+extern void c(int);
+extern void d(void);
+
+void a(void) {
+  {
+int b;
+&b;
+__builtin_setjmp(0);
+c(b);
+  }
+  d();
+}



patch to fix PR82353

2017-10-16 Thread Vladimir Makarov

This is another version of the patch to fix

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82353

The patch was successfully bootstrapped on x86-64 with Go and Ada.

Committed as rev. 253796.


Index: ChangeLog
===
--- ChangeLog	(revision 253795)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2017-10-16  Vladimir Makarov  
+
+	PR sanitizer/82353
+	* lra.c (collect_non_operand_hard_regs): Don't ignore operator
+	locations.
+	* lra-lives.c (bb_killed_pseudos, bb_gen_pseudos): Move up.
+	(make_hard_regno_born, make_hard_regno_dead): Update
+	bb_killed_pseudos and bb_gen_pseudos for fixed regs.
+
 2017-10-16  Jeff Law  
 
 	* tree-ssa-dse.c (live_bytes_read): Fix thinko.
Index: lra.c
===
--- lra.c	(revision 253685)
+++ lra.c	(working copy)
@@ -820,7 +820,8 @@ collect_non_operand_hard_regs (rtx *x, l
   const char *fmt = GET_RTX_FORMAT (code);
 
   for (i = 0; i < data->insn_static_data->n_operands; i++)
-if (x == data->operand_loc[i])
+if (! data->insn_static_data->operand[i].is_operator
+	&& x == data->operand_loc[i])
   /* It is an operand loc. Stop here.  */
   return list;
   for (i = 0; i < data->insn_static_data->n_dups; i++)
Index: lra-lives.c
===
--- lra-lives.c	(revision 253685)
+++ lra-lives.c	(working copy)
@@ -220,6 +220,9 @@ lra_intersected_live_ranges_p (lra_live_
   return false;
 }
 
+/* The corresponding bitmaps of BB currently being processed.  */
+static bitmap bb_killed_pseudos, bb_gen_pseudos;
+
 /* The function processing birth of hard register REGNO.  It updates
living hard regs, START_LIVING, and conflict hard regs for living
pseudos.  Conflict hard regs for the pic pseudo is not updated if
@@ -243,6 +246,8 @@ make_hard_regno_born (int regno, bool ch
 	|| i != REGNO (pic_offset_table_rtx))
 #endif
   SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, regno);
+  if (fixed_regs[regno])
+bitmap_set_bit (bb_gen_pseudos, regno);
 }
 
 /* Process the death of hard register REGNO.  This updates
@@ -255,6 +260,11 @@ make_hard_regno_dead (int regno)
 return;
   sparseset_set_bit (start_dying, regno);
   CLEAR_HARD_REG_BIT (hard_regs_live, regno);
+  if (fixed_regs[regno])
+{
+  bitmap_clear_bit (bb_gen_pseudos, regno);
+  bitmap_set_bit (bb_killed_pseudos, regno);
+}
 }
 
 /* Mark pseudo REGNO as living at program point POINT, update conflicting
@@ -299,9 +309,6 @@ mark_pseudo_dead (int regno, int point)
 }
 }
 
-/* The corresponding bitmaps of BB currently being processed.  */
-static bitmap bb_killed_pseudos, bb_gen_pseudos;
-
 /* Mark register REGNO (pseudo or hard register) in MODE as live at
program point POINT.  Update BB_GEN_PSEUDOS.
Return TRUE if the liveness tracking sets were modified, or FALSE


Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-16 Thread Martin Liška

Hi.

All nits included in mainline review request I've just done:
https://reviews.llvm.org/D38971

Martin


[testsuite] Fix directives order

2017-10-16 Thread Christophe Lyon
Hi,

I have noticed a few testcases where dg-do should be moved as the
first directive, and others where dg-options should be moved before
dg-add-options. The attached patch does that. I noticed no difference
in testing, at least because the arm configs I test do not include
v8m.
So, no regression from my point of view, but this should avoid some headaches.

OK?

Thanks,

Christophe
2017-10-16  Christophe Lyon  

* gcc.c-torture/execute/pr23135.c: Move dg-add-options after
dg-options.
* gcc.dg/torture/pr78305.c: Move dg-do as first directive.
* gcc.misc-tests/gcov-3.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-11.c: Move dg-options before 
dg-add-options.
* gcc.target/arm/cmse/baseline/cmse-13.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-2.c: Likewise.
* gcc.target/arm/cmse/baseline/cmse-6.c: Likewise.
* gcc.target/arm/cmse/baseline/softfp.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: Likewise.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: Likewise.
* gcc.target/arm/lp1189445.c: Likewise.

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr23135.c 
b/gcc/testsuite/gcc.c-torture/execute/pr23135.c
index 8dd6358..e740ff5 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr23135.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr23135.c
@@ -1,9 +1,8 @@
-/* { dg-add-options stack_size } */
-
 /* Based on execute/simd-1.c, modified by joern.renne...@st.com to
trigger a reload bug.  Verified for gcc mainline from 20050722 13:00 UTC
for sh-elf -m4 -O2.  */
 /* { dg-options "-Wno-psabi" } */
+/* { dg-add-options stack_size } */
 
 #ifndef STACK_SIZE
 #define STACK_SIZE (256*1024)
diff --git a/gcc/testsuite/gcc.dg/torture/pr78305.c 
b/gcc/testsuite/gcc.dg/torture/pr78305.c
index ccb8c6f..36d3620 100644
--- a/gcc/testsuite/gcc.dg/torture/pr78305.c
+++ b/gcc/testsuite/gcc.dg/torture/pr78305.c
@@ -1,5 +1,5 @@
-/* { dg-require-effective-target int32plus } */
 /* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
 
 int main ()
 {
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-3.c 
b/gcc/testsuite/gcc.misc-tests/gcov-3.c
index eb6e4cc..5b07dd7 100644
--- a/gcc/testsuite/gcc.misc-tests/gcov-3.c
+++ b/gcc/testsuite/gcc.misc-tests/gcov-3.c
@@ -1,10 +1,10 @@
+/* { dg-do run { target native } } */
 /* { dg-require-effective-target label_values } */
 
 /* Test Gcov with computed gotos.
This is the same as test gcc.c-torture/execute/980526-1.c */
 
 /* { dg-options "-fprofile-arcs -ftest-coverage" } */
-/* { dg-do run { target native } } */
 
 extern void abort (void);
 extern void exit (int);
diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c 
b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c
index 3007409..795544f 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
+/* { dg-options "-mcmse" }  */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-add-options arm_arch_v8m_base } */
-/* { dg-options "-mcmse" }  */
 
 int __attribute__ ((cmse_nonsecure_call)) (*bar) (int);
 
diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c 
b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c
index f2b931b..8ced14b 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
+/* { dg-options "-mcmse" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-add-options arm_arch_v8m_base } */
-/* { dg-options "-mcmse" } */
 
 int __attribute__ ((cmse_nonsecure_call)) (*bar) (float, double);
 
diff --git a/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-2.c 
b/gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-2.c
index 814

[committed] Add gnu::unique_ptr

2017-10-16 Thread David Malcolm
From: Trevor Saunders 

On Mon, 2017-10-16 at 11:47 +0100, Pedro Alves wrote:
> On 10/14/2017 12:35 AM, David Malcolm wrote:
> 
> > As far as I can tell from your mail, the one issue that blocks that
> > is the lack of gdb::unique_xmalloc_ptr.
> > 
> > So here's an patch on top of the previous one which adds the
> > xmalloc_deleter (taken from gdb, but changing "xfree" to
> > "free", and adding support for pre-C++11 dialects), along with
> > selftests for unique_ptr, unique_xmalloc_ptr and
> > unique_xmalloc_ptr.
> 
> Thanks much!
> 
> > As before, successfully bootstrapped & regrtested on x86_64-pc-
> > linux-gnu,
> > using gcc 4.8 for the initial bootstrap (hence testing both gnu++03
> > then gnu++14 in the selftests, for stage 1 and stages 2 and 3
> > respectively).
> > 
> 
> Excellent.
> 
> > Hand-tested with "make selftest-valgrind" with both gnu++03 and
> > gnu++14.
> > 
> > Also tested stage1 on powerpc-ibm-aix7.1.3.0 ("gcc111" in the
> > compile farm; gcc 4.8 i.e. gnu++03)
> > 
> > Is this OK?
> 
> Looks great to me.
> 
> > +/* Verify that gnu::unique_malloc_ptr works.  */
> 
> Typo: malloc -> xmalloc.  Appears in other comments too.
> 
> Thanks,
> Pedro Alves

Thanks.  I've fixed those typos, re-tested, and committed it to
gcc trunk as r253797.

For reference, here's what I committed:


This is a version of the patch posted by Trevor Saunders on 2017-07-31,
for which he wrote:
> For most of the history of this see
>   https://sourceware.org/ml/gdb-patches/2016-10/msg00223.html
> The changes are mostly s/gdb/gtl/g

This version was updated by me (dmalcolm) adding these changes:
- renaming of "gtl" to "gnu" (3 letters, and one of the ones Richi
  proposed, and not a match for "*tl")
- renaming of DEFINE_GDB_UNIQUE_PTR to DEFINE_GNU_UNIQUE_PTR
- renaming of xfree_deleter to xmalloc_deleter, and making it
  use "free" rather than "xfree" (which doesn't exist)
- added a gcc/unique-ptr-tests.cc
- implement unique_xmalloc_ptr (taken from gdb, but changing
  "xfree" to "free", and adding support for pre-C++-11)

gcc/ChangeLog:

David Malcolm 

* Makefile.in (OBJS): Add unique-ptr-tests.o.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::unique_ptr_tests_cc_tests.
* selftest.h (selftest::unique_ptr_tests_cc_tests): New decl.
* unique-ptr-tests.cc: New file.

include/ChangeLog:

Trevor Saunders  
David Malcolm 

* unique-ptr.h: New file.
---
 gcc/Makefile.in  |   1 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 gcc/unique-ptr-tests.cc  | 234 +++
 include/unique-ptr.h | 403 +++
 5 files changed, 640 insertions(+)
 create mode 100644 gcc/unique-ptr-tests.cc
 create mode 100644 include/unique-ptr.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 878ce7b..2809619 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1568,6 +1568,7 @@ OBJS = \
tree-vrp.o \
tree.o \
typed-splay-tree.o \
+   unique-ptr-tests.o \
valtrack.o \
value-prof.o \
var-tracking.o \
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 30e476d..b05e0fc 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -66,6 +66,7 @@ selftest::run_tests ()
   sreal_c_tests ();
   fibonacci_heap_c_tests ();
   typed_splay_tree_c_tests ();
+  unique_ptr_tests_cc_tests ();
 
   /* Mid-level data structures.  */
   input_c_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 96eccac..adc0b68 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -194,6 +194,7 @@ extern void store_merging_c_tests ();
 extern void typed_splay_tree_c_tests ();
 extern void tree_c_tests ();
 extern void tree_cfg_c_tests ();
+extern void unique_ptr_tests_cc_tests ();
 extern void vec_c_tests ();
 extern void wide_int_cc_tests ();
 extern void predict_c_tests ();
diff --git a/gcc/unique-ptr-tests.cc b/gcc/unique-ptr-tests.cc
new file mode 100644
index 000..f5b72a8
--- /dev/null
+++ b/gcc/unique-ptr-tests.cc
@@ -0,0 +1,234 @@
+/* Unit tests for unique-ptr.h.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "unique-ptr.h"
+#include "selftest.h"
+
+#if CHECKING_P
+
+namespace selftest {
+

[PATCH] c-family: add name_hint/deferred_diagnostic (v2)

2017-10-16 Thread David Malcolm
Here's an updated version of the patch which drops refcounting in favor
of using gnu::unique_ptr.

One wart with this approach is that the handling of suppressed diagnostics
has to happen in both deferred_diagnostic subclasses, rather
than in the name_hint class.  It would be possible to fix this
by introducing another dynamically-allocated object to manage
this concern, but adding another dynamic allocation seemed like
overkill, so I went with the simpler approach (with the slight wart).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, with
the followup patches you already approved:

[PATCH 2/3] C++: provide macro used-before-defined hint (PR c++/72786).
  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00124.html

[PATCH 3/3] C: hints for missing stdlib includes for macros and types
  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00125.html

(plus I have a further followup under construction that extends this
to address PR c/81404)

OK for trunk?


Blurb from v1:

In various places we use lookup_name_fuzzy to provide a hint,
and can report messages of the form:
  error: unknown foo named 'bar'
or:
  error: unknown foo named 'bar'; did you mean 'SUGGESTION?

This patch provides a way for lookup_name_fuzzy to provide
both the suggestion above, and (optionally) additional hints
that can be printed e.g.

  note: did you forget to include ?

This patch provides the mechanism and ports existing users
of lookup_name_fuzzy to the new return type.
There are no uses of such hints in this patch, but followup
patches provide various front-end specific uses of this.

gcc/c-family/ChangeLog:
* c-common.h (class deferred_diagnostic): New class.
(class name_hint): New class.
(lookup_name_fuzzy): Convert return type from const char *
to name_hint.  Add location_t param.

gcc/c/ChangeLog:
* c-decl.c (implicit_decl_warning): Convert "hint" from
const char * to name_hint.  Pass location to
lookup_name_fuzzy.  Suppress any deferred diagnostic if the
warning was not printed.
(undeclared_variable): Likewise for "guessed_id".
(lookup_name_fuzzy): Convert return type from const char *
to name_hint.  Add location_t param.
* c-parser.c (c_parser_declaration_or_fndef): Convert "hint" from
const char * to name_hint.  Pass location to lookup_name_fuzzy.
(c_parser_parameter_declaration): Pass location to
lookup_name_fuzzy.

gcc/cp/ChangeLog:
* name-lookup.c (suggest_alternatives_for): Convert "fuzzy_name" from
const char * to name_hint, and rename to "hint".  Pass location to
lookup_name_fuzzy.
(lookup_name_fuzzy): Convert return type from const char *
to name_hint.  Add location_t param.
* parser.c (cp_parser_diagnose_invalid_type_name): Convert
"suggestion" from const char * to name_hint, and rename to "hint".
Pass location to lookup_name_fuzzy.
---
 gcc/c-family/c-common.h | 81 -
 gcc/c/c-decl.c  | 35 +++--
 gcc/c/c-parser.c| 16 +-
 gcc/cp/name-lookup.c| 14 +
 gcc/cp/parser.c | 12 
 5 files changed, 122 insertions(+), 36 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 7e1877e..50e60bf 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alias.h"
 #include "tree.h"
 #include "fold-const.h"
+#include "unique-ptr.h"
 
 /* In order for the format checking to accept the C frontend
diagnostic framework extensions, you must include this file before
@@ -1000,7 +1001,85 @@ enum lookup_name_fuzzy_kind {
   /* Any name.  */
   FUZZY_LOOKUP_NAME
 };
-extern const char *lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
+
+/* A deferred_diagnostic is a wrapper around optional extra diagnostics
+   that we may want to bundle into a name_hint.
+
+   The diagnostic is emitted by the subclass destructor, which should
+   check that is_suppressed_p () is not true.  */
+
+class deferred_diagnostic
+{
+ public:
+  virtual ~deferred_diagnostic () {}
+
+  location_t get_location () const { return m_loc; }
+
+  /* Call this if the corresponding warning was not emitted,
+ in which case we should also not emit the deferred_diagnostic.  */
+  void suppress ()
+  {
+m_suppress = true;
+  }
+
+  bool is_suppressed_p () const { return m_suppress; }
+
+ protected:
+  deferred_diagnostic (location_t loc)
+  : m_loc (loc), m_suppress (false) {}
+
+ private:
+  location_t m_loc;
+  bool m_suppress;
+};
+
+/* A name_hint is an optional string suggestion, along with an
+   optional deferred_diagnostic.
+   For example:
+
+   error: unknown foo named 'bar'
+
+   if the SUGGESTION is "baz", then one might print:
+
+   error: unknown foo named 'bar'; did you mean 'baz'?
+
+   and the deferred_diagnostic allows f

[PATCH, committed] Add myself to MAINTAINERS

2017-10-16 Thread Tsimbalist, Igor V
ChangeLog:

2017-10-16  Igor Tsimbalist  

* MAINTAINERS (write after approval): Add myself.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 253797)
+++ MAINTAINERS (working copy)
@@ -603,6 +603,7 @@
 Philipp Tomsich

 Konrad Trifunovic  
 Markus Trippelsdorf
+Igor Tsimbalist

 Martin Uecker  
 David Ung  
 Neil Vachharajani  


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-16 Thread Joseph Myers
On Mon, 16 Oct 2017, Sam van Kampen via gcc-patches wrote:

> +Wbitfield-enum-conversion
> +C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
> +Warn about struct bit-fields being too small to hold enumerated types.

Any option supported for C++ should also be supported for ObjC++ unless 
there is a clear reason it cannot work for ObjC++.

> +@item -Wbitfield-enum-conversion
> +@opindex Wbitfield-enum-conversion
> +@opindex Wno-bitfield-enum-conversion
> +Warn about a bit-field potentially being too small to hold all values
> +of an enumerated type. This warning is enabled by default.

The documentation of the option also needs to indicate that it's for 
C++/ObjC++ only, similar to other such options.

The patch also needs to add a testcase to the testsuite that verifies the 
warnings issues in appropriate cases (and that warnings are not issued 
when the bit-field is large enough).

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [PING][PATCH][Aarch64] Improve int<->FP conversions

2017-10-16 Thread Michael Collison
Patch updated with all comments from James.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-10-15  Michael Collison  

* config/aarch64/aarch64.md(_trunc>2):
New pattern.
(_trunchf2: New pattern.
(_trunc2: New pattern.
* config/aarch64/iterators.md (fpw): New mode attribute.
(fcvt_change_mode, FCVT_CHANGE_MODE): New mode attributes.
(s): Update attribute with SImode and DImode prefixes.
* testsuite/gcc.target/aarch64/fix_trunc1.c: New testcase.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Wednesday, October 4, 2017 2:39 AM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PING][PATCH][Aarch64] Improve int<->FP conversions

On Sun, Oct 01, 2017 at 02:07:57AM +0100, Michael Collison wrote:
> Sorry. Here is the patch.

I think this needs a small amount fo rework in iterators.md - the names you've 
used don't follow conventions in that file (e.g. "V" normally has something to 
do with vectors) so could do with patching up.


> -(define_insn "_trunc2"
> +(define_insn "_trunc2"
> +  [(set (match_operand:GPI 0 "register_operand" "=?r,w")
> + (FIXUORS:GPI (match_operand: 1 "register_operand" "w,w")))]
> +  "TARGET_FLOAT"
> +  "@
> +   fcvtz\t%0, %1
> +   fcvtz\t%0, %1"
> +  [(set_attr "type" "f_cvtf2i,neon_fp_to_int_s")]
> +)

So the point here is that we need to fork the pattern for two reasons.
Before we were iterating over both floating-point modes as the input to any 
integer-modes as output. Because only the same-sized instructions have 
vector-register to vector-register forms we need two patterns. One for 
same-size, one for cross-size. And one more special pattern for HFmode.

This makes sense to me. A comment explaining why we need the two patterns would 
be even easier to read.

This pattern gives us SFmode to SImode and DFmode to DImode.

> +(define_insn "_trunchf2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> + (FIXUORS:GPI (match_operand:HF 1 "register_operand" "w")))]
> +  "TARGET_FP_F16INST"
> +  "fcvtz\t%0, %h1"
> +  [(set_attr "type" "f_cvtf2i")]

This pattern we need for HFmode to SImode.

> +(define_insn "_trunc2"
>[(set (match_operand:GPI 0 "register_operand" "=r")
> - (FIXUORS:GPI (match_operand:GPF_F16 1 "register_operand" "w")))]
> + (FIXUORS:GPI (match_operand: 1 "register_operand" "w")))]
>"TARGET_FLOAT"
> -  "fcvtz\t%0, %1"
> +  "fcvtz\t%0, %1"
>[(set_attr "type" "f_cvtf2i")]
>  )

And this pattern gives SFmode to DImode and DFmode to SImode.

Comments would definitely help here.

> diff --git a/gcc/config/aarch64/iterators.md 
> b/gcc/config/aarch64/iterators.md index cceb575..166a044 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -391,6 +391,9 @@
>  (define_mode_attr w1 [(HF "w") (SF "w") (DF "x")])  (define_mode_attr 
> w2 [(HF "x") (SF "x") (DF "w")])
>  
> +;; For inequal width float to int conversion (define_mode_attr wv 
> +[(DI "s") (SI "d")])

Could you invent a more descriptive name for this?

> +
>  (define_mode_attr short_mask [(HI "65535") (QI "255")])
>  
>  ;; For constraints used in scalar immediate vector moves @@ -399,6 
> +402,14 @@  ;; For doubling width of an integer mode  
> (define_mode_attr DWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI")])
>  
> +(define_mode_attr vf [(SI "sf") (DI "df")])
> +
> +(define_mode_attr VF [(SI "SF") (DI "DF")])

These two are fcvt_target and FCVT_TARGET ?

> +(define_mode_attr vgp [(SI "df") (DI "sf")])
> +
> +(define_mode_attr VGP [(SI "DF") (DI "SF")])

These names don't make sense to me - V is usually vector and GP sounds like 
general purpose. Maybe something like fcvt_change_mode ? Try to be more 
descriptive.

> +
>  ;; For scalar usage of vector/FP registers  (define_mode_attr v [(QI 
> "b") (HI "h") (SI "s") (DI "d")
>   (HF  "h") (SF "s") (DF "d")
> @@ -432,7 +443,7 @@
>  (define_mode_attr vas [(DI "") (SI ".2s")])
>  
>  ;; Map a floating point mode to the appropriate register name prefix

This comment is out of date after your changes, please update it.

> -(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
> +(define_mode_attr s [(HF "h") (SF "s") (DF "d") (SI "s") (DI "d")])
>  
>  ;; Give the length suffix letter for a sign- or zero-extension.
>  (define_mode_attr size [(QI "b") (HI "h") (SI "w")])


Thanks,
James



pr6527v4.patch
Description: pr6527v4.patch


Re: [patch, c++] Add a warning flag for the enum bit-field declaration warning in bug #61414.

2017-10-16 Thread Martin Sebor

On 10/16/2017 06:37 AM, Sam van Kampen via gcc-patches wrote:

..I just realised that the clang flag is -Wbitfield-enum-conversion, not
-Wenum-bitfield-conversion. Please apply the patch below instead, which
has replaced the two words to remove the inconsistency.

2017-10-16  Sam van Kampen  

* c-family/c.opt: Add a warning flag for struct bit-fields
being too small to hold enumerated types.
* cp/class.c: Likewise.
* doc/invoke.texi: Add documentation for said warning flag.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 253784)
+++ gcc/c-family/c.opt  (working copy)
@@ -346,6 +346,10 @@ Wframe-address
 C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn when __builtin_frame_address or __builtin_return_address is used unsafely.

+Wbitfield-enum-conversion
+C++ Var(warn_bitfield_enum_conversion) Init(1) Warning
+Warn about struct bit-fields being too small to hold enumerated types.


Warning by default is usually reserved for problems that are
most likely indicative of bugs.  Does this rise to that level?
(It seems that it should be in the same class as -Wconversion).


+
 Wbuiltin-declaration-mismatch
 C ObjC C++ ObjC++ Var(warn_builtin_declaraion_mismatch) Init(1) Warning
 Warn when a built-in function is declared with the wrong signature.
Index: gcc/cp/class.c
===
--- gcc/cp/class.c  (revision 253784)
+++ gcc/cp/class.c  (working copy)
@@ -3278,10 +3278,11 @@ check_bitfield_decl (tree field)
   && tree_int_cst_lt (TYPE_SIZE (type), w)))
warning_at (DECL_SOURCE_LOCATION (field), 0,
"width of %qD exceeds its type", field);
-  else if (TREE_CODE (type) == ENUMERAL_TYPE
+  else if (warn_bitfield_enum_conversion
+  && TREE_CODE (type) == ENUMERAL_TYPE
   && (0 > (compare_tree_int
(w, TYPE_PRECISION (ENUM_UNDERLYING_TYPE (type))
-   warning_at (DECL_SOURCE_LOCATION (field), 0,
+   warning_at (DECL_SOURCE_LOCATION (field), OPT_Wbitfield_enum_conversion,
"%qD is too small to hold all values of %q#T",
field, type);


I would suggest to include a bit more detail in the message, such
as the smallest number of bits needed to represent every value of
the enumeration.  It's also often helpful to include a note
pointing to the relevant declaration (in this case it could be
the bit-field).


 }
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 253784)
+++ gcc/doc/invoke.texi (working copy)
@@ -264,7 +264,7 @@ Objective-C and Objective-C++ Dialects}.
 -Walloca  -Walloca-larger-than=@var{n} @gol
 -Wno-aggressive-loop-optimizations  -Warray-bounds  -Warray-bounds=@var{n} @gol
 -Wno-attributes  -Wbool-compare  -Wbool-operation @gol
--Wno-builtin-declaration-mismatch @gol
+-Wbitfield-enum-conversion -Wno-builtin-declaration-mismatch @gol
 -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
 -Wc++-compat  -Wc++11-compat  -Wc++14-compat  @gol
 -Wcast-align  -Wcast-align=strict  -Wcast-qual  @gol
@@ -5431,6 +5431,12 @@ Incrementing a boolean is invalid in C++17, and de

 This warning is enabled by @option{-Wall}.

+@item -Wbitfield-enum-conversion
+@opindex Wbitfield-enum-conversion
+@opindex Wno-bitfield-enum-conversion
+Warn about a bit-field potentially being too small to hold all values
+of an enumerated type. This warning is enabled by default.
+


The brief description makes me wonder what this really means.  What
is the meaning of "potentially?"  What (what expressions) triggers
the warning?  Presumably it's initialization and assignment.  Does
explicit or implicit conversion also trigger is?  I would suggest
to expand the documentation to obviate these questions, and perhaps
also include an example.

Martin