RE: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-09-19 Thread duanbo (C)


> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Saturday, August 29, 2020 2:31 AM
> To: duanbo (C) 
> Cc: GCC Patches ; rguent...@suse.de
> Subject: Re: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect
> 
> "duanbo (C)"  writes:
> > @@ -4395,6 +4395,40 @@ vect_recog_mask_conversion_pattern
> (vec_info *vinfo,
> > {
> >   tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
> >   pattern_stmt = gimple_build_assign (tmp, rhs1);
> > + tree rhs1_op0 = TREE_OPERAND (rhs1, 0);
> > + tree rhs1_op1 = TREE_OPERAND (rhs1, 1);
> 
> I think we also need to handle this case when picking the vector types and
> deciding whether a pattern is needed.  Otherwise it would be possible for
> rhs1_op1 to be the only input that requires a different mask, and we'd
> wrongly avoid creating a conversion for it.
> 
> > + if (rhs1_op0 && rhs1_op1
> > + && (TREE_CODE (TREE_TYPE (rhs1_op0)) == BOOLEAN_TYPE)
> > + && (TREE_CODE (TREE_TYPE (rhs1_op1)) == BOOLEAN_TYPE))
> 
> Minor formatting nit, but the brackets around the == are redundant (but are
> still kept for multiline comparisons like the == below).
> 
> > +   {
> > + tree rhs1_op0_type = integer_type_for_mask (rhs1_op0, vinfo);
> > + tree rhs1_op1_type = integer_type_for_mask (rhs1_op1, vinfo);
> > + enum tree_code rhs1_code = gimple_assign_rhs_code
> (pattern_stmt);
> > + if (rhs1_op0_type && rhs1_op1_type
> > + && (!(TYPE_PRECISION (rhs1_op0_type)
> > +   == TYPE_PRECISION (rhs1_op1_type
> 
> More obvious as != rather than !(… == …).
> 
> > +   {
> > + if (TYPE_PRECISION (rhs1_op0_type)
> > + < TYPE_PRECISION (rhs1_op1_type))
> > +   {
> > + vectype2
> > +   = get_mask_type_for_scalar_type (vinfo,
> rhs1_op0_type);
> > + if (vectype2)
> > +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> > + vectype2,
> stmt_vinfo);
> 
> It isn't safe to ignore null returns like this.  We should bail out instead.
> 
> > +   }
> > + else
> > +   {
> > + vectype2
> > +   = get_mask_type_for_scalar_type (vinfo,
> rhs1_op1_type);
> > + if (vectype2)
> > +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> > + vectype2,
> stmt_vinfo);
> > +   }
> 
> This seems to go for the narrower of the two precisions.  Are you sure that's
> always the optimal choice?  Gut instinct says that we should move in the
> direction of vectype1, since that's what the result will be converted to.  
> E.g.
> maybe:
> 
> - if both precisions are less than vectype1's precision or both are
>   more than vectype1's precision, pick the closest to vectype1.
> 
> - otherwise, convert both inputs to vectype1 individually, rather than
>   converting the result to vectype1.
> 
> I admit I haven't thought about it much though.
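[Editorial aside: the heuristic Richard sketches above can be written out as a small standalone function. This is an illustrative sketch in plain C, not GCC code; pick_common_precision and its parameters are hypothetical names.]

```c
/* Illustrative sketch of the suggested heuristic.  prec_res is the
   precision of the result type (vectype1); p0 and p1 are the
   precisions of the two mask inputs.  Returns the precision both
   inputs should be converted to.  */
unsigned
pick_common_precision (unsigned prec_res, unsigned p0, unsigned p1)
{
  unsigned d0 = p0 > prec_res ? p0 - prec_res : prec_res - p0;
  unsigned d1 = p1 > prec_res ? p1 - prec_res : prec_res - p1;

  /* Both inputs on the same side of the result precision: pick the
     one closest to it.  */
  if ((p0 < prec_res && p1 < prec_res) || (p0 > prec_res && p1 > prec_res))
    return d0 < d1 ? p0 : p1;

  /* The inputs straddle the result precision: convert both inputs
     to the result precision itself.  */
  return prec_res;
}
```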
> 
> > + pattern_stmt = gimple_build_assign (tmp, rhs1_code,
> > + rhs1_op0, rhs1_op1);
> > +   }
> 
> This statement is redundant with the one created above.  I think instead we
> should create the statement this way in all cases, even if rhs1_op0 and
> rhs1_op1 don't change.  That's effectively what the:
> 
>   gimple_build_assign (tmp, rhs1)
> 
> does anyway.
> 
> I'm going to be away next week so my next reply might be even slower than
> usual, sorry.  Would be happy for someone else to review in the meantime
> though. :-)
> 
> Thanks,
> Richard

Sorry for the late reply.
Thanks for your suggestions; I have modified the patch accordingly.
Please find the v1 patch attached.
Bootstrapped and tested on both aarch64 and x86 Linux platforms; no new
regressions witnessed.
OK for trunk?

Thanks,
Duan bo






libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486])

2020-09-19 Thread Mike Crowe via Gcc-patches
On Friday 11 September 2020 at 19:59:36 +0100, Jonathan Wakely wrote:
> commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147
> Author: Jonathan Wakely 
> Date:   Fri Sep 11 19:59:11 2020
> 
> libstdc++: Fix chrono::__detail::ceil to work with C++11
> 
> In C++11 constexpr functions can only have a return statement, so we
> need to fix __detail::ceil to make it valid in C++11. This can be done
> by moving the comparison and increment into a new function, __ceil_impl,
> and calling that with the result of the duration_cast.
> 
> This would mean the standard C++17 std::chrono::ceil function would make
> two further calls, which would add too much overhead when not inlined.
> For C++17 and later use a using-declaration to add chrono::ceil to
> namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
> as a C++11-compatible constexpr function template.
>
>
> libstdc++-v3/ChangeLog:
> 
> * include/std/chrono [C++17] (chrono::__detail::ceil): Add
> using declaration to make chrono::ceil available for internal
> use with a consistent name.
> (chrono::__detail::__ceil_impl): New function template.
> (chrono::__detail::ceil): Use __ceil_impl to compare and
> increment the value. Remove SFINAE constraint.

This change introduces a new implementation of ceil that, as far as I can
tell, has no tests. A patch is attached to add the equivalent of the
existing chrono::ceil tests for chrono::__detail::ceil. The tests fail to
compile if I run them without 53ad6b1979f4bd7121e977c4a44151b14d8a0147 as
expected due to the previous non-C++11-compliant implementation.
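For readers following along, the single-return shape the commit message describes can be sketched roughly like this (a simplified model of the approach, not the exact libstdc++ code; the names detail::ceil_impl and detail::ceil mirror the commit's description but are written here from scratch):

```cpp
#include <chrono>

namespace detail
{
  // The compare-and-increment step lives in its own function so that
  // every constexpr function body is a single return statement, as
  // C++11 requires.
  template<typename To, typename Rep, typename Period>
  constexpr To
  ceil_impl(const To& t, const std::chrono::duration<Rep, Period>& d)
  {
    return t < d ? t + To{1} : t;
  }

  // Round a duration up to type To: truncate with duration_cast,
  // then bump by one tick if anything was lost.
  template<typename To, typename Rep, typename Period>
  constexpr To
  ceil(const std::chrono::duration<Rep, Period>& d)
  {
    return ceil_impl(std::chrono::duration_cast<To>(d), d);
  }
}
```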

Mike.
From b9dffbf4f1bc05a887269ea95a3b86d5a611e720 Mon Sep 17 00:00:00 2001
From: Mike Crowe 
Date: Wed, 16 Sep 2020 15:31:28 +0100
Subject: [PATCH 1/2] libstdc++: Test C++11 implementation of
 std::chrono::__detail::ceil

Commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147 split the implementation
of std::chrono::__detail::ceil so that when compiling for C++17 and
later std::chrono::ceil is used but when compiling for earlier versions
a separate implementation is used to comply with C++11's limited
constexpr rules. Let's run the equivalent of the existing
std::chrono::ceil test cases on std::chrono::__detail::ceil too to make
sure that it doesn't get broken.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/duration_cast/rounding_c++11.cc: Copy
rounding.cc and alter to support compilation for C++11 and to
test std::chrono::__detail::ceil.
---
 .../20_util/duration_cast/rounding_c++11.cc   | 43 +++
 1 file changed, 43 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc

diff --git a/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
new file mode 100644
index 000..f10d27fd082
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
@@ -0,0 +1,43 @@
+// Copyright (C) 2016-2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile { target c++11 } }
+
+#include <chrono>
+
+using std::chrono::seconds;
+using std::chrono::milliseconds;
+
+using fp_seconds = std::chrono::duration<float>;
+
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1000))
+	   == seconds(1) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1001))
+	   == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1500))
+	   == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1999))
+	   == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2000))
+	   == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2001))
+	   == seconds(3) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2500))
+	   == seconds(3) );
+static_assert( std::chrono::__detail::ceil<fp_seconds>(milliseconds(500))
+	   == fp_seconds{0.5f} );
-- 
2.28.0



Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]

2020-09-19 Thread Mike Crowe via Gcc-patches
On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:
> > > diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
> > > b/libstdc++-v3/include/bits/atomic_futex.h
> > > index 5f95ade..aa137a7 100644
> > > --- a/libstdc++-v3/include/bits/atomic_futex.h
> > > +++ b/libstdc++-v3/include/bits/atomic_futex.h
> > > @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >  _M_load_when_equal_for(unsigned __val, memory_order __mo,
> > > const chrono::duration<_Rep, _Period>& __rtime)
> > >  {
> > > + using __dur = typename __clock_t::duration;
> > >   return _M_load_when_equal_until(__val, __mo,
> > > - __clock_t::now() + __rtime);
> > > + __clock_t::now() + chrono::__detail::ceil<__dur>(__rtime));
> > >  }
> > > 
> > >// Returns false iff a timeout occurred.
> > > @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >   do {
> > > const __clock_t::time_point __s_entry = __clock_t::now();
> > > const auto __delta = __atime - __c_entry;
> > > -   const auto __s_atime = __s_entry + __delta;
> > > +   const auto __s_atime = __s_entry +
> > > +   chrono::__detail::ceil<_Duration>(__delta);

On Friday 11 September 2020 at 18:22:04 +0100, Jonathan Wakely wrote:
> I'm testing the attached patch to fix the C++11 constexpr error, but
> while re-looking at the uses of __detail::ceil I noticed this is using
> _Duration as the target type. Shouldn't that be __clock_t::duration
> instead? Why do we care about the duration of the user's time_point
> here, rather than the preferred duration of the clock we're about to
> wait against?

I think you're right. I've attached a patch to fix it and also add a test
that would have failed at least some of the time if run on a machine with
an uptime greater than 208.5 days with:

 void test_pr91486_wait_until(): Assertion 'float_steady_clock::call_count <= 
3' failed.

If we implement the optimisation of not re-checking against the custom clock
when the wait is complete and is_steady == true, then the test would start
failing due to the wait not being long enough.

(I used a couple of the GCC farm machines that have high uptimes to test
this.)
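The failure mode here is float's limited mantissa: once the clock's epoch is far enough in the past, adding one second to a float-based time value changes nothing. An illustrative sketch (the 208.5-day figure above comes from the test clock's actual epoch and period; 2^24 seconds is used below just to show the cliff):

```cpp
#include <chrono>

// Returns true when adding one second to a float-valued time offset
// is lost entirely to rounding -- the situation the test provokes by
// requiring a machine with a large uptime.
bool one_second_is_lost(float seconds_since_epoch)
{
  using fsec = std::chrono::duration<float>;
  fsec now{seconds_since_epoch};
  return now + fsec{1.0f} == now;
}
```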

Thanks.

Mike.
From fa4decc00698785fb9e07aa36c0d862414ca5ff9 Mon Sep 17 00:00:00 2001
From: Mike Crowe 
Date: Wed, 16 Sep 2020 16:55:11 +0100
Subject: [PATCH 2/2] libstdc++: Use correct duration for atomic_futex wait on
 custom clock [PR 91486]

As Jonathan Wakely pointed out[1], my change in commit
f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
the target clock duration type rather than the input clock duration type
in __atomic_futex_unsigned::_M_load_when_equal_until just as (e.g.)
condition_variable does.

As well as fixing this, let's create a rather contrived test that fails
with the previous code, but unfortunately only when run on a machine
with an uptime of over 208.5 days, and even then not always.

[1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html

libstdc++-v3/ChangeLog:

	* include/bits/atomic_futex.h
	(__atomic_futex_unsigned::_M_load_when_equal_until): Use the
	target clock duration type when rounding.

	* testsuite/30_threads/async/async.cc (test_pr91486_wait_for):
	Rename from test_pr91486.
	(float_steady_clock): New class for test.
	(test_pr91486_wait_until): New test.
---
 libstdc++-v3/include/bits/atomic_futex.h  |  2 +-
 .../testsuite/30_threads/async/async.cc   | 62 ++-
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
index aa137a7b64e..6093be0fbc7 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -235,7 +235,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  const __clock_t::time_point __s_entry = __clock_t::now();
 	  const auto __delta = __atime - __c_entry;
 	  const auto __s_atime = __s_entry +
-	  chrono::__detail::ceil<_Duration>(__delta);
+	  chrono::__detail::ceil<__clock_t::duration>(__delta);
 	  if (_M_load_when_equal_until(__val, __mo, __s_atime))
 	return true;
 	  __c_entry = _Clock::now();
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 46f8d2f327d..1c779bfbcad 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -157,7 +157,7 @@ void test04()
   }
 }
 
-void test_pr91486()
+void test_pr91486_wait_for()
 {
   future<void> f1 = async(launch::async, []() {
   std::this_thread::sleep_for(std::chrono::seconds(1));
@@ -171,6 +171,63 @@ void test_pr91486()
   VERIFY( elapsed_steady >= std::chrono::seconds(1) );
 }
 
+// This is a clock with a very recent epoch which ensures that the difference
+// between now() and one second in the future is representable in a float so
+// that when the generic clock version of
+// __atomic_futex_unsigned::_M_lo

Re: [PATCH] Increase rtx_cost of sse_to_integer in skylake_cost.

2020-09-19 Thread H.J. Lu via Gcc-patches
On Thu, Sep 17, 2020 at 11:11 PM Hongtao Liu  wrote:
>
> On Thu, Sep 17, 2020 at 12:10 PM Jeff Law  wrote:
> >
> >
> > On 9/15/20 9:20 PM, Hongtao Liu via Gcc-patches wrote:
> > > Hi:
> > >   Rtx cost of sse_to_integer would be used by pass_stv as a
> > > measurement for the scalar-to-vector transformation. As
> > > https://gcc.gnu.org/pipermail/gcc-patches/2019-August/528839.html
> > > indicates, movement between sse regs and gprs should be much more
> > > expensive than movement inside gprs (which is 2 as default). This patch would
> > > also fix "pr96861".
> > >
> > >   Bootstrap is ok, regression test is ok for both "i386.exp=*
> > > --target_board='unix{-m32,}'" and "i386.exp=*
> > > --target_board='unix{-m32\ -march=cascadelake,-m64\
> > > -march=cascadelake}"".
> > >   No big impact on SPEC2017.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog
> > >
> > > PR target/96861
> > > * config/i386/x86-tune-costs.h (skylake_cost): Increase rtx
> > > cost of sse_to_integer from 2 to 6.
> > >
> > > gcc/testsuite
> > >
> > > * gcc.target/i386/pr95021-3.c: Add -mtune=generic.
> >
> > I'll defer to HJ's judgement here.  If he's OK with it, then it's fine
> > by me.
> >
>
> I've talked to H.J. and he agrees with this patch.
> So I'm going to check it in.

I am checking in this patch to add a testcase.


-- 
H.J.
From 474d34ec403e5b8808e16b9a6460a896322fa3d0 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 19 Sep 2020 09:57:16 -0700
Subject: [PATCH] x86: Add a testcase for PR target/96861

Add a testcase to verify that -march=skylake-avx512 -mtune=skylake-avx512
generates desired code sequence.

	PR target/96861
	* gcc.target/i386/pr96861.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr96861.c | 38 +
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96861.c

diff --git a/gcc/testsuite/gcc.target/i386/pr96861.c b/gcc/testsuite/gcc.target/i386/pr96861.c
new file mode 100644
index 000..7b7aeccb83c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96861.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mstv -march=skylake-avx512 -mtune=skylake-avx512" } */
+
+#define max(a,b) (((a) > (b))? (a) : (b))
+#define min(a,b) (((a) < (b))? (a) : (b))
+
+int smax1(int x)
+{
+  return max(x,1);
+}
+
+int smin1(int x)
+{
+  return min(x,1);
+}
+
+int smaxm1(int x)
+{
+  return max(x,-1);
+}
+
+int sminm1(int x)
+{
+  return min(x,-1);
+}
+
+unsigned int umax1(unsigned int x)
+{
+  return max(x,1);
+}
+
+unsigned int umin1(unsigned int x)
+{
+  return min(x,1);
+}
+
+/* { dg-final { scan-assembler-times "test" 6 } } */
+/* { dg-final { scan-assembler-not "cmp" } } */
-- 
2.26.2



Re: [PATCH] amdgcn, nvptx: Disable OMP barriers in nested teams

2020-09-19 Thread Andrew Stubbs

On 18/09/2020 12:25, Andrew Stubbs wrote:
This patch fixes a problem in which nested OpenMP parallel regions cause 
errors if the number of inner teams is not balanced (i.e. the number of 
loop iterations is not divisible by the number of physical threads). A 
testcase is included.


This updated version removes an editing mistake that should have been 
spotted sooner.


Sorry for the inconvenience.

Andrew
libgomp: disable barriers in nested teams

Both GCN and NVPTX allow nested parallel regions, but the barrier
implementation did not allow the nested teams to run independently of each
other (due to hardware limitations).  This patch fixes that, under the
assumption that each thread will create a new subteam of one thread, by
simply not using barriers when there's no other thread to synchronise.

libgomp/ChangeLog:

	* config/gcn/bar.c (gomp_barrier_wait_end): Skip the barrier if the
	total number of threads is one.
	(gomp_team_barrier_wake): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* config/nvptx/bar.c (gomp_barrier_wait_end): Likewise.
	(gomp_team_barrier_wake): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	* testsuite/libgomp.c-c++-common/nested-parallel-unbalanced.c: New test.

diff --git a/libgomp/config/gcn/bar.c b/libgomp/config/gcn/bar.c
index 02fd19710d4..a21529a624b 100644
--- a/libgomp/config/gcn/bar.c
+++ b/libgomp/config/gcn/bar.c
@@ -43,7 +43,8 @@ gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
   __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
 			MEMMODEL_RELAXED);
 }
-  asm ("s_barrier" ::: "memory");
+  if (bar->total > 1)
+asm ("s_barrier" ::: "memory");
 }
 
 void
@@ -71,7 +72,8 @@ gomp_barrier_wait_last (gomp_barrier_t *bar)
 void
 gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
 {
-  asm ("s_barrier" ::: "memory");
+  if (bar->total > 1)
+asm ("s_barrier" ::: "memory");
 }
 
 void
@@ -97,7 +99,8 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 	  state &= ~BAR_CANCELLED;
 	  state += BAR_INCR - BAR_WAS_LAST;
 	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELAXED);
-	  asm ("s_barrier" ::: "memory");
+	  if (bar->total > 1)
+	asm ("s_barrier" ::: "memory");
 	  return;
 	}
 }
@@ -172,7 +175,8 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 	{
 	  state += BAR_INCR - BAR_WAS_LAST;
 	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELAXED);
-	  asm ("s_barrier" ::: "memory");
+	  if (bar->total > 1)
+	asm ("s_barrier" ::: "memory");
 	  return false;
 	}
 }
@@ -195,7 +199,8 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 	  abort();
 	}
 
-  asm ("s_barrier" ::: "memory");
+  if (bar->total > 1)
+	asm ("s_barrier" ::: "memory");
   gen = __atomic_load_n (&bar->generation, MEMMODEL_RELAXED);
   if (__builtin_expect (gen & BAR_CANCELLED, 0))
 	return true;
diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index 125ca3e49ec..1116561d931 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -41,7 +41,8 @@ gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
   __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
 			MEMMODEL_RELEASE);
 }
-  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+  if (bar->total > 1)
+asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
 }
 
 void
@@ -69,7 +70,8 @@ gomp_barrier_wait_last (gomp_barrier_t *bar)
 void
 gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
 {
-  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+  if (bar->total > 1)
+asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
 }
 
 void
@@ -95,7 +97,8 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
 	  state &= ~BAR_CANCELLED;
 	  state += BAR_INCR - BAR_WAS_LAST;
 	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
-	  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+	  if (bar->total > 1)
+	asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
 	  return;
 	}
 }
@@ -104,7 +107,8 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
   state &= ~BAR_CANCELLED;
   do
 {
-  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+  if (bar->total > 1)
+	asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
   gen = __atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE);
   if (__builtin_expect (gen & BAR_TASK_PENDING, 0))
 	{
@@ -158,7 +162,8 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
 	{
 	  state += BAR_INCR - BAR_WAS_LAST;
 	  __atomic_store_n (&bar->generation, state, MEMMODEL_RELEASE);
-	  asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+	  if (bar->total > 1)
+	asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
 	  return false;
 	}
 }
@@ -169,7 +174,8 @@ gomp_team_barrier_wait_cancel_end (gomp_barrier_t *ba

[PATCH] c++: Return only in-scope tparms in keep_template_parm [PR95310]

2020-09-19 Thread Patrick Palka via Gcc-patches
In the testcase below, the dependent specializations iter_reference_t<F>
and iter_reference_t<Out> share the same tree due to specialization
caching.  So when find_template_parameters walks through the
requires-expression (as part of normalization), it sees and includes the
out-of-scope template parameter F in the list of template parameters
it found within the requires-expression (along with Out and N).

From a correctness perspective this is harmless since the parameter mapping
routines only care about the level and index of each parameter, so F is
no different from Out in this sense.  (And it's also harmless that two
parameters in the parameter mapping have the same level and index.)

But having both Out and F in the parameter mapping is extra work for
hash_atomic_constraint, tsubst_parameter_mapping and get_mapped_args; and
it also means we print this irrelevant template parameter in the
testcase's diagnostics (via pp_cxx_parameter_mapping):

  in requirements with ‘Out o’ [with N = (const int&)&a; F = const int*; Out = 
const int*]

This patch makes keep_template_parm return only in-scope template
parameters by looking into ctx_parms for the corresponding in-scope one.

(That we sometimes print irrelevant template parameters in diagnostics is
also the subject of PR99 and PR66968, so the above diagnostic issue
could likely be fixed in a more general way, but this targeted fix to
keep_template_parm is perhaps worthwhile on its own.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
cmcstl2 and range-v3.  Does this look OK for trunk?

gcc/cp/ChangeLog:

PR c++/95310
* pt.c (keep_template_parm): Adjust the given template parameter
to the corresponding in-scope one from ctx_parms.

gcc/testsuite/ChangeLog:

PR c++/95310
* g++.dg/concepts/diagnostic15.C: New test.
* g++.dg/cpp2a/concepts-ttp2.C: New test.
---
 gcc/cp/pt.c  | 19 +++
 gcc/testsuite/g++.dg/concepts/diagnostic15.C | 16 
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/concepts/diagnostic15.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fe45de8d796..c2c70ff02b9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10550,6 +10550,25 @@ keep_template_parm (tree t, void* data)
BOUND_TEMPLATE_TEMPLATE_PARM itself.  */
 t = TREE_TYPE (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL (t));
 
+  /* This template parameter might be an argument to a cached dependent
+ specialization that was formed earlier inside some other template, in which
+ case the parameter is not among the ones that are in-scope.  Look in
+ CTX_PARMS to find the corresponding in-scope template parameter and
+ always return that instead.  */
+  tree cparms = ftpi->ctx_parms;
+  while (TMPL_PARMS_DEPTH (cparms) > level)
+cparms = TREE_CHAIN (cparms);
+  gcc_assert (TMPL_PARMS_DEPTH (cparms) == level);
+  if (TREE_VEC_LENGTH (TREE_VALUE (cparms)))
+{
+  t = TREE_VALUE (TREE_VEC_ELT (TREE_VALUE (cparms), index));
+  /* As in template_parm_to_arg.  */
+  if (TREE_CODE (t) == TYPE_DECL || TREE_CODE (t) == TEMPLATE_DECL)
+   t = TREE_TYPE (t);
+  else
+   t = DECL_INITIAL (t);
+}
+
   /* Arguments like const T yield parameters like const T. This means that
   a template-id like X<T, const T> would yield two distinct parameters:
  T and const T. Adjust types to their unqualified versions.  */
diff --git a/gcc/testsuite/g++.dg/concepts/diagnostic15.C 
b/gcc/testsuite/g++.dg/concepts/diagnostic15.C
new file mode 100644
index 000..3acd9f67968
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/diagnostic15.C
@@ -0,0 +1,16 @@
+// PR c++/95310
+// { dg-do compile { target concepts } }
+
+template <typename T>
+using iter_reference_t = decltype(*T{});
+
+template <typename F>
+struct result { using type = iter_reference_t<F>; };
+
+template <typename Out, const int& N>
+concept indirectly_writable = requires(Out o) { // { dg-bogus "F =" }
+  iter_reference_t<Out>(*o) = N;
+};
+
+const int a = 0;
+static_assert(indirectly_writable<const int*, a>); // { dg-error "assert" }
-- 
2.28.0.497.g54e85e7af1



Re: [PATCH] irange_pool class

2020-09-19 Thread Martin Sebor via Gcc-patches

On 9/18/20 3:09 PM, Andrew MacLeod wrote:

On 9/18/20 4:35 PM, Martin Sebor wrote:

On 9/18/20 11:36 AM, Andrew MacLeod wrote:

On

it works exactly like one would expect a simple allocator to work: as
long as the allocator is "live", its allocations are live; once it is
destructed, all the memory it manages is freed.  Its purpose is to
avoid having to walk/find everything that was allocated so it can be
freed.


I'll give you the use case from the ranger. in fact, it *is* the 
ranger's allocator, exposed for other passes to use.


Ranger uses the allocator to store the live-on-entry ranges for 
ssa-names.  Each basic block has a vec allocated as needed 
which is indexed by ssa-name.


int_range_max is passed to range_on_entry() to go off and find the
range.  When it comes back, it could have 0-255 subranges; it doesn't
matter.
We call allocate(range) to get a pointer to an efficient copy and
store it in the vector for the ssa-name in that block.
If the updater later discovers that the range can be made more
accurate, it checks whether the new range fits in the memory allocated
and, if it does, simply copies.  If it doesn't fit, then it frees the
old hunk, allocates a new one, and stores that.


The ranger creates an allocator when it starts up, and then frees it
when it's being destructed.  That's the life of the on-entry cache, so
that matches the life of the allocator.


I don't understand the proxy comment, or why one would ever want to
copy the allocator itself, or why you would derive from irange.  Why
can't you just create an allocator when the pass starts, use it when
you need to store a range, and then let it go at the end of the pass
with the other memory?


The int_range template is derived from irange and provides the array
of trees that the irange works with.  The pool also provides memory
for iranges and effectively returns objects "derived" from irange
(they're bigger than it is).

What I meant by proxy is a substitute class each of whose objects
stands in for a single instance of int_range<N> where N is
a runtime value.   There's no class like that.



[Just to be clear, I don't meant for this discussion to hold up
the patch if you need the pool internally.  Anothert class like
the one I propose can be added later if necessary.]



no, but that doesn't serve a lot of purpose either.  You can call
allocator.allocate(N), which is effectively that... ?


Yes, but the allocator object isn't always conveniently accessible.
Imagine needing to allocate a dynamically sized irange is in a copy
ctor of some class that's stored in a vec, as the vec is being
resized.  The allocator would need to be a global pointer.  That's
of course possible but not the most elegant design.

it's meant to be a convenient way to get a range allocated to store via
a pointer, nothing more.  If you have more complex needs, then you
need to manage those needs.  The ranger manages the live-on-entry
vectors separately from the ranges.


What I'm thinking of is actually more basic than that: an irange
class with a runtime number of subranges, one that can be either
directly stored in a container like vec:

  vec<dynamic_range>

where dynamic_range is such a class, or that can be a member of
a class that's stored in it.  I expect that will be the default
use case for the passes that walk the IL looking for the sizes
and offsets into the objects, accesses to which they validate.
This can be done with int_range<N> for constant N but not with
irange (because it doesn't own the memory it works with).

To illustrate what I mean here's a completely untested outline
of a plain-Jane dynamic_irange class (it won't compile because
it accesses private and protected members of irange, but it
should give you the idea):

  class dynamic_irange: public irange
  {
  public:
    dynamic_irange (unsigned num_pairs)
  : irange (new tree[num_pairs], num_pairs) { }

    dynamic_irange (const dynamic_irange &rhs)
  : irange (new tree[rhs.m_max_ranges], rhs.m_num_ranges)
    { irange::operator= (rhs); }

    dynamic_irange (dynamic_irange &&rhs)
  : irange (rhs.m_base, rhs.m_max_ranges)
    { rhs.m_base = 0; }

    dynamic_irange& operator= (const dynamic_irange &rhs)
    {
  if (this != &rhs)
    {
  delete[] m_base;
  m_base = new tree[rhs.m_max_ranges];
  m_num_ranges = rhs.m_num_ranges;
  irange::operator= (rhs);
    }
  return *this;
    }
    ~dynamic_irange () { delete[] m_base; }
  };

A more fancy class would be parameterized on an Allocator policy
that it would allocate memory with, much like C++ standard classes
do.  That would let you define an obstack-based allocator class to
use the way you need, as well and any others.  (Providing
the allocator as a template argument to just the ctor as opposed
to the whole class itself would make different "instances"
interchangeable for one another.)
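The allocator-policy idea in the last paragraph is the usual C++ pattern; a generic sketch under assumed names (dyn_buffer is hypothetical and deliberately not tied to irange):

```cpp
#include <cstddef>
#include <memory>

// A buffer whose storage strategy is supplied by an Allocator type
// parameter, so e.g. an obstack-backed allocator could be swapped in
// without touching the container itself.
template<typename T, typename Allocator = std::allocator<T>>
class dyn_buffer
{
public:
  explicit dyn_buffer(std::size_t n, Allocator alloc = Allocator())
    : m_alloc(alloc), m_size(n), m_data(m_alloc.allocate(n)) {}
  ~dyn_buffer() { m_alloc.deallocate(m_data, m_size); }

  // Non-copyable for brevity; a real class would deep-copy as in the
  // dynamic_irange outline above.
  dyn_buffer(const dyn_buffer&) = delete;
  dyn_buffer& operator=(const dyn_buffer&) = delete;

  T* data() { return m_data; }
  std::size_t size() const { return m_size; }

private:
  Allocator m_alloc;
  std::size_t m_size;
  T* m_data;
};
```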

Martin


  We had a dynamic sized irange, I told aldy to kill it off and we 
replaced it with

Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

2020-09-19 Thread Martin Sebor via Gcc-patches

On 9/18/20 12:29 AM, Aldy Hernandez wrote:



On 9/17/20 10:18 PM, Martin Sebor wrote:

On 9/17/20 12:39 PM, Andrew MacLeod wrote:

On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:

On 9/16/20 9:23 PM, Jeff Law wrote:


On 9/15/20 1:47 PM, Martin Sebor wrote:

Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
can lead to a subsequent buffer overflow corrupting the heap or
stack.  The attached patch diagnoses a subset of these cases where
the overflow/wraparound is still detectable.

Besides regtesting GCC on x86_64-linux I also verified the warning
doesn't introduce any false positives into Glibc or Binutils/GDB
builds on the same target.

Martin

gcc-96838.diff

PR middle-end/96838 - missing warning on integer overflow in calls 
to allocation functions


gcc/ChangeLog:

PR middle-end/96838
* calls.c (eval_size_vflow): New function.
(get_size_range): Call it.  Add argument.
(maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
* calls.h (get_size_range): Add argument.

gcc/testsuite/ChangeLog:

PR middle-end/96838
* gcc.dg/Walloc-size-larger-than-19.c: New test.
* gcc.dg/Walloc-size-larger-than-20.c: New test.


If an attacker can control an integer overflow that feeds an 
allocation, then they can do all kinds of bad things.  In fact, 
when my son was asking me attack vectors, this is one I said I'd 
look at if I were a bad guy.



I'm a bit surprised you can't just query the range of the argument 
and get the overflow status out of that range, but I don't see that 
in the APIs.  How painful would it be to make that part of the API? 
The conceptual model would be to just ask for the range of the 
argument to malloc which would include the range and a status bit 
indicating the computation might have overflowed.


  Do we know if it did/would have wrapped?  Sure, since we have to do 
the math.  So you are correct in that the information is there, but 
is it useful?


We are in the very annoying habit of subtracting one by adding 
0xFFFFFFFF, which means you get an overflow for unsigned when you 
subtract one.  From what I have seen of unsigned math, we would be 
flagging very many operations as overflows, so you would still have 
the difficulty of figuring out whether it's a "real" overflow or a 
fake one because of the way we do unsigned math.


You and me both :)



At the very start, I did have an overflow flag in the range class... 
but it was turning out to be fairly useless so it was removed.



I agree that being able to evaluate an expression in an as-if
infinite precision (in addition to its type) would be helpful.


So again, we get back to adding 0xFFFFFFFF when we are trying to 
subtract one...  now, with infinite precision you are going to see


  [2,10] - 1: we end up with [2,10] + 0xFFFFFFFF, which will now 
give you [0x100000001, 0x100000009], so it's going to look like it 
overflowed?





But just to make sure I understood correctly, let me ask again
using an example:

  void* f (size_t n)
  {
    if (n < PTRDIFF_MAX / 2)
  n = PTRDIFF_MAX / 2;

    return malloc (n * sizeof (int));
  }

Can the unsigned wraparound in the argument be readily detected?

On trunk, this ends up with the following:

  # RANGE [4611686018427387903, 18446744073709551615]
  _6 = MAX_EXPR ;
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
  _1 = _6 * 4;
  ...
  p_5 = mallocD.1206 (_1); [tail call]
  ...
  return p_5;

so _1's range reflects the wraparound in size_t, but _6's range
has enough information to uncover it.  So detecting it is possible
and is done in the patch so we get a warning:

warning: argument 1 range [18446744073709551612, 
0x3fffffffffffffffc] is too large to represent in ‘long unsigned 
int’ [-Walloc-size-larger-than=]

    6 |   return malloc (n * sizeof (int));
  |  ^

The code is very simplistic and only handles a small subset of cases.
It could be generalized and exposed by a more generic API but it does
seem like the ranger must already have all the logic built into it so
if it isn't exposed now it should be a matter of opening it up.


everything is exposed in range-ops.  well, mostly.
if we have _1 = _6 * 4

if one wanted to do that in infinite precision, you query the range for 
_6, and the range for 4 (which would be [4,4] :-)

range_of_expr (op1r, _6, stmt)
range_of_expr (op2r, 4, stmt)

you could take their current types, and cast those ranges to whatever 
the next higher precision is,

range_cast  (op1r, highertype)
range_cast (op2r, highertype)
then invoke the operation on those parameters

gimple_range_fold (r, stmt,  op1r, op2r)

and that will do your operation in the higher precision.  You could 
compare that to the value in regular precision too, I suppose.


The patch does pretty much exactly what you described, except in
offset_int, and only for a limited set of arithmetic operations.
It figures out if an unsigned expression wrapped simply by checking
to

[PATCH] c++: Detect deduction guide redeclaration [PR97099]

2020-09-19 Thread Marek Polacek via Gcc-patches
[temp.deduct.guide]p3: Two deduction guide declarations in the same
translation unit for the same class template shall not have equivalent
parameter-declaration-clauses.

So let's detect that.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97099
* decl.c (redeclaration_error_message): Detect a redeclaration of
deduction guides.

gcc/testsuite/ChangeLog:

PR c++/97099
* g++.dg/cpp1z/class-deduction74.C: New test.
---
 gcc/cp/decl.c | 20 
 .../g++.dg/cpp1z/class-deduction74.C  | 31 +++
 2 files changed, 45 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction74.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 13f065d5058..af796499df7 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -3003,6 +3003,10 @@ redeclaration_error_message (tree newdecl, tree olddecl)
}
}
 
+  if (deduction_guide_p (olddecl)
+ && deduction_guide_p (newdecl))
+   return G_("deduction guide %q+D redeclared");
+
   /* [class.compare.default]: A definition of a comparison operator as
 defaulted that appears in a class shall be the first declaration of
 that function.  */
@@ -3053,24 +3057,28 @@ redeclaration_error_message (tree newdecl, tree olddecl)
  "% attribute");
  else
return G_("%q+D redeclared inline without "
- "% attribute");
+ "% attribute");
}
}
 
-  /* Core issue #226 (C++0x): 
-   
+  if (deduction_guide_p (olddecl)
+ && deduction_guide_p (newdecl))
+   return G_("deduction guide %q+D redeclared");
+
+  /* Core issue #226 (C++11):
+
If a friend function template declaration specifies a
default template-argument, that declaration shall be a
definition and shall be the only declaration of the
function template in the translation unit.  */
-  if ((cxx_dialect != cxx98) 
+  if ((cxx_dialect != cxx98)
   && TREE_CODE (ot) == FUNCTION_DECL && DECL_FRIEND_P (ot)
-  && !check_default_tmpl_args (nt, DECL_TEMPLATE_PARMS (newdecl), 
+ && !check_default_tmpl_args (nt, DECL_TEMPLATE_PARMS (newdecl),
/*is_primary=*/true,
   /*is_partial=*/false,
/*is_friend_decl=*/2))
 return G_("redeclaration of friend %q#D "
- "may not have default template arguments");
+ "may not have default template arguments");
 
   return NULL;
 }
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C
new file mode 100644
index 000..fe113819a95
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C
@@ -0,0 +1,31 @@
+// PR c++/97099
+// { dg-do compile { target c++17 } }
+// [temp.deduct.guide]p3: Two deduction guide declarations in the same
+// translation unit for the same class template shall not have equivalent
+// parameter-declaration-clauses.
+
+template struct S { };
+template struct X { };
+
+S() -> S; // { dg-message "previously declared here|old declaration" }
+S() -> S; // { dg-error "redeclared" }
+X() -> X;
+S() -> S; // { dg-error "ambiguating new declaration of" }
+
+S(bool) -> S; // { dg-message "previously declared here" }
+explicit S(bool) -> S; // { dg-error "redeclared" }
+
+explicit S(char) -> S; // { dg-message "previously declared here" }
+S(char) -> S; // { dg-error "redeclared" }
+
+template S(T, T) -> S; // { dg-message "previously declared 
here" }
+template X(T, T) -> X;
+template S(T, T) -> S; // { dg-error "redeclared" }
+
+// OK: Use SFINAE.
+template S(T) -> S;
+template S(T) -> S;
+
+// OK: Non-template wins.
+S(int) -> S;
+template S(int) -> S;

base-commit: 4a5ff2b56bfea0b3e154a15e809c5c42dc3b9e9f
-- 
2.26.2



[PATCH] c++: CTAD and explicit deduction guides for copy-list-init [PR90210]

2020-09-19 Thread Marek Polacek via Gcc-patches
This PR points out that we accept

  template<typename T> struct tuple { tuple(T); }; // #1
  template<typename T> explicit tuple(T t) -> tuple<T>; // #2
  tuple t = { 1 };

despite the 'explicit' deduction guide in a copy-list-initialization
context.  That's because in deduction_guides_for we first find the
user-defined deduction guide (#2), and then ctor_deduction_guides_for
creates artificial deduction guides: one from the tuple(T) constructor and
a copy guide.  So we end up with these three guides:

  (1) template<typename T> tuple(T) -> tuple<T> [DECL_NONCONVERTING_P]
  (2) template<typename T> tuple(tuple<T>) -> tuple<T>
  (3) template<typename T> tuple(T) -> tuple<T>

Then, in do_class_deduction, we prune this set, and get rid of (1).
Then overload resolution selects (3) and we succeed.

But [over.match.list]p1 says "In copy-list-initialization, if an explicit
constructor is chosen, the initialization is ill-formed."  It also goes
on to say that this differs from other situations where only converting
constructors are considered for copy-initialization.  Therefore for
list-initialization we consider explicit constructors and complain if one
is chosen.  E.g. convert_like_internal/ck_user can give an error.

So my logic runs that we should not prune the deduction_guides_for guides
in a copy-list-initialization context, and only complain if we actually
choose an explicit deduction guide.  This matches clang++/EDG/msvc++.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/90210
* pt.c (do_class_deduction): Don't prune explicit deduction guides
in copy-list-initialization.  In copy-list-initialization, if an
explicit deduction guide was selected, give an error.

gcc/testsuite/ChangeLog:

PR c++/90210
* g++.dg/cpp1z/class-deduction73.C: New test.
---
 gcc/cp/pt.c   | 49 ++-
 .../g++.dg/cpp1z/class-deduction73.C  | 41 
 2 files changed, 79 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction73.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index cfe5ff4a94f..9bcb743dc1d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28929,6 +28929,7 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   tree type = TREE_TYPE (tmpl);
 
   bool try_list_ctor = false;
+  bool list_init_p = false;
 
   releasing_vec rv_args = NULL;
   vec *&args = *&rv_args;
@@ -28936,6 +28937,7 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
 args = make_tree_vector ();
   else if (BRACE_ENCLOSED_INITIALIZER_P (init))
 {
+  list_init_p = true;
   try_list_ctor = TYPE_HAS_LIST_CTOR (type);
   if (try_list_ctor && CONSTRUCTOR_NELTS (init) == 1)
{
@@ -28967,9 +28969,10 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   if (cands == error_mark_node)
 return error_mark_node;
 
-  /* Prune explicit deduction guides in copy-initialization context.  */
+  /* Prune explicit deduction guides in copy-initialization context (but
+ not copy-list-initialization).  */
   bool elided = false;
-  if (flags & LOOKUP_ONLYCONVERTING)
+  if (!list_init_p && (flags & LOOKUP_ONLYCONVERTING))
 {
   for (lkp_iterator iter (cands); !elided && iter; ++iter)
if (DECL_NONCONVERTING_P (STRIP_TEMPLATE (*iter)))
@@ -29038,18 +29041,42 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   --cp_unevaluated_operand;
 }
 
-  if (call == error_mark_node
-  && (complain & tf_warning_or_error))
+  if (call == error_mark_node)
 {
-  error ("class template argument deduction failed:");
+  if (complain & tf_warning_or_error)
+   {
+ error ("class template argument deduction failed:");
 
-  ++cp_unevaluated_operand;
-  call = build_new_function_call (cands, &args, complain | tf_decltype);
-  --cp_unevaluated_operand;
+ ++cp_unevaluated_operand;
+ call = build_new_function_call (cands, &args,
+ complain | tf_decltype);
+ --cp_unevaluated_operand;
 
-  if (elided)
-   inform (input_location, "explicit deduction guides not considered "
-   "for copy-initialization");
+ if (elided)
+   inform (input_location, "explicit deduction guides not considered "
+   "for copy-initialization");
+   }
+  return error_mark_node;
+}
+  /* [over.match.list]/1: In copy-list-initialization, if an explicit
+ constructor is chosen, the initialization is ill-formed.  */
+  else if (flags & LOOKUP_ONLYCONVERTING)
+{
+  tree fndecl = cp_get_callee_fndecl_nofold (call);
+  if (fndecl && DECL_NONCONVERTING_P (fndecl))
+   {
+ if (complain & tf_warning_or_error)
+   {
+ // TODO: Pass down location from cp_finish_decl.
+ error ("class template argument deduction for %qT failed: "
+"explicit deduction guide selected in "
+"copy-list-initialization"

[PATCH] c++: Implement -Wctad-maybe-unsupported.

2020-09-19 Thread Marek Polacek via Gcc-patches
I noticed that clang++ has this CTAD warning and thought that it might
be useful to have it.  From clang++: "Some style guides want to allow
using CTAD only on types that "opt-in"; i.e. on types that are designed
to support it and not just types that *happen* to work with it."

So this warning warns when CTAD deduced a type, but the type does not
define any deduction guides.  In that case CTAD worked only because the
compiler synthesized the implicit deduction guides.  That might not be
intended.

It can be suppressed by adding a deduction guide that will never be
considered:

  struct allow_ctad_t;
  template <typename T> struct S { S(T) {} };
  S(allow_ctad_t) -> S<void>;

This warning is off by default.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

* c.opt (Wctad-maybe-unsupported): New option.

gcc/cp/ChangeLog:

* pt.c (deduction_guides_for): Add a bool parameter.  Set it.
(do_class_deduction): Warn when CTAD succeeds but the type doesn't
have any explicit deduction guides.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wctad-maybe-unsupported.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wctad-maybe-unsupported.C: New test.
---
 gcc/c-family/c.opt|  5 ++
 gcc/cp/pt.c   | 22 -
 gcc/doc/invoke.texi   | 20 -
 .../g++.dg/warn/Wctad-maybe-unsupported.C | 88 +++
 4 files changed, 130 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wctad-maybe-unsupported.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c1d8fd336e8..fef4f09f72e 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -467,6 +467,11 @@ Wcpp
 C ObjC C++ ObjC++ CppReason(CPP_W_WARNING_DIRECTIVE)
 ; Documented in common.opt
 
+Wctad-maybe-unsupported
+C++ ObjC++ Var(warn_ctad_maybe_unsupported) Warning
+Warn when performing class template argument deduction on a type with no
+deduction guides.
+
 Wctor-dtor-privacy
 C++ ObjC++ Var(warn_ctor_dtor_privacy) Warning
 Warn when all constructors and destructors are private.
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fe45de8d796..177b762883d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28830,17 +28830,19 @@ static GTY((deletable)) hash_map 
*dguide_cache;
 
 /* Return the non-aggregate deduction guides for deducible template TMPL.  The
aggregate candidate is added separately because it depends on the
-   initializer.  */
+   initializer.  Set ANY_DGUIDES_P if we find a non-implicit deduction
+   guide.  */
 
 static tree
-deduction_guides_for (tree tmpl, tsubst_flags_t complain)
+deduction_guides_for (tree tmpl, bool &any_dguides_p, tsubst_flags_t complain)
 {
   tree guides = NULL_TREE;
   if (DECL_ALIAS_TEMPLATE_P (tmpl))
 {
   tree under = DECL_ORIGINAL_TYPE (DECL_TEMPLATE_RESULT (tmpl));
   tree tinfo = get_template_info (under);
-  guides = deduction_guides_for (TI_TEMPLATE (tinfo), complain);
+  guides = deduction_guides_for (TI_TEMPLATE (tinfo), any_dguides_p,
+complain);
 }
   else
 {
@@ -28849,6 +28851,8 @@ deduction_guides_for (tree tmpl, tsubst_flags_t 
complain)
  LOOK_want::NORMAL, /*complain*/false);
   if (guides == error_mark_node)
guides = NULL_TREE;
+  else
+   any_dguides_p = true;
 }
 
   /* Cache the deduction guides for a template.  We also remember the result of
@@ -28974,7 +28978,8 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   if (args == NULL)
 return error_mark_node;
 
-  tree cands = deduction_guides_for (tmpl, complain);
+  bool any_dguides_p = false;
+  tree cands = deduction_guides_for (tmpl, any_dguides_p, complain);
   if (cands == error_mark_node)
 return error_mark_node;
 
@@ -29063,6 +29068,15 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
"for copy-initialization");
 }
 
+  /* If CTAD succeeded but the type doesn't have any explicit deduction
+ guides, this deduction might not be what the user intended.  */
+  if (call != error_mark_node
+  && !any_dguides_p
+  && warning (OPT_Wctad_maybe_unsupported,
+ "%qT may not intend to support class template argument "
+ "deduction", type))
+inform (input_location, "add a deduction guide to suppress this warning");
+
   return cp_build_qualified_type (TREE_TYPE (call), cp_type_quals (ptype));
 }
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8086e27aefb..74864535f53 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -236,7 +236,8 @@ in the following sections.
 -Wabi-tag  -Wcatch-value  -Wcatch-value=@var{n} @gol
 -Wno-class-conversion  -Wclass-memaccess @gol
 -Wcomma-subscript  -Wconditionally-supported @gol
--Wno-conversion-null  -Wctor-dtor-privacy  -Wno-delete-incomplete @gol
+-Wno-conversion-null  -Wctad-maybe-unsupported @gol
+-Wctor-dtor-privacy  

Re: New modref/ipa_modref optimization passes

2020-09-19 Thread Jan Hubicka
Hi,
this is cleaned up version of the patch.  I removed unfinished bits, fixed
propagation, cleaned it up and fixed fallout.

At present modref does the bare minimum of what it can do and only collects
base/ref alias sets of accesses to disambiguate using them.  I have followup
patches adding more features.

Like the pure-const pass, modref is both a local and an IPA pass (so early
optimizations benefit from it).  It works with both LTO and non-LTO.  LTO
tracking is a bit harder since instead of alias sets one needs to track the
types the alias sets are derived from (because alias sets change from compile
to link time).  For that reason I had to export reference_alias_ptr_type_1 so
I can look up the same type as get_alias_set does.  There is an assert
checking that both functions are in sync.

Building cc1plus with LTO I get:

Alias oracle query stats:
  refs_may_alias_p: 59342803 disambiguations, 69597406 queries
  ref_maybe_used_by_call_p: 139290 disambiguations, 60227579 queries
  call_may_clobber_ref_p: 17604 disambiguations, 23290 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37088 queries
  nonoverlapping_refs_since_match_p: 19439 disambiguations, 55079 must 
overlaps, 76966 queries
  aliasing_component_refs_p: 54891 disambiguations, 756375 queries
  TBAA oracle: 22475946 disambiguations 51373119 queries
   15188806 are in alias set 0
   8345975 queries asked about the same object
   124 queries asked about the same alias set
   0 access volatile
   3811042 are dependent in the DAG
   1551226 are aritificially in conflict with void *

Modref stats:
  modref use: 6315 disambiguations, 60861 queries
  modref clobber: 1788334 disambiguations, 2706342 queries
  1223123 tbaa querries (0.451947 per modref querry)

PTA query stats:
  pt_solution_includes: 922059 disambiguations, 13392569 queries
  pt_solutions_intersect: 1018189 disambiguations, 11428599 queries

This ignores the disambiguations enabled by the local modref done during early
optimization.

On cc1plus it seems that modref queries are cheap (usually converging after
checking 0.45 alias sets :) and quite effective for stores, with a similar
number of disambiguations as the PTA oracle (which is surprisingly high).
Load disambiguations are quite low.  Partly this is because we only track
loads that bind locally, since otherwise we risk that there is an optimized
out load that will still be present in the prevailing version of the function
and may lead to a memory trap.

Incrementally it may be worth starting to track info about loads optimized
out early, but that needs more work.

Another problem I noticed while looking at the dumps is that the gimple, tree
and rtl data structures are unions and we drop all accesses to them to alias
set 0, which makes the analysis not very effective on GCC itself.  This makes
the analysis of all those uninlined tree/rtl/gimple predicates quite
ineffective, which is a bit sad.

For tramp3d I get:
Alias oracle query stats:
  refs_may_alias_p: 2261286 disambiguations, 2525487 queries
  ref_maybe_used_by_call_p: 7402 disambiguations, 2298244 queries
  call_may_clobber_ref_p: 232 disambiguations, 232 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 4145 queries
  nonoverlapping_refs_since_match_p: 329 disambiguations, 10232 must overlaps, 
10648 queries
  aliasing_component_refs_p: 858 disambiguations, 34731 queries
  TBAA oracle: 972943 disambiguations 1808294 queries
   130565 are in alias set 0
   497083 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   207388 are dependent in the DAG
   315 are aritificially in conflict with void *

Modref stats:
  modref use: 950 disambiguations, 5474 queries
  modref clobber: 30035 disambiguations, 81948 queries
  41671 tbaa querries (0.508505 per modref querry)

PTA query stats:
  pt_solution_includes: 383095 disambiguations, 593201 queries
  pt_solutions_intersect: 147774 disambiguations, 434255 queries

So again the loads are disambiguated less.

There is a new testsuite failure:
./testsuite/gfortran/gfortran.sum:FAIL: gfortran.dg/assumed_type_9.f90   -O2  
execution test  
./testsuite/gfortran/gfortran.sum:FAIL: 
gfortran.dg/assumed_type_9.f90   -Os  execution test

This is a pre-existing issue where the Fortran frontend builds array
descriptors in a type that contains a pointer to the actual data, while
mixing them up with a data type describing an array of unknown type.  The two
structures are not TBAA compatible without LTO and thus lead to
misoptimization.

The same wrong code can be triggered by forcing inlining to happen and
disabling the ccp pass (which otherwise hides the problem), so I think we
could handle this incrementally as an independent bug - I would like to get
the basic code in so we can test additional patches in the foreseeable future.

While there are 

Re: [PING 2][PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-09-19 Thread Martin Sebor via Gcc-patches

On 9/17/20 4:38 PM, Joseph Myers wrote:

On Wed, 16 Sep 2020, Martin Sebor via Gcc-patches wrote:


Attached is an updated revision of the patch.  Besides the tweaks
above it also contains a cosmetic change to the warning issued
for mismatches in unspecified VLA bounds: it points at the decl
with more of them to guide the user to specify them rather than
make them all unspecified.


The previous version of the patch had a while loop as previously discussed
to handle skipping multiple consecutive cdk_attrs.

+  next = pd->declarator;
+  while (next && next->kind == cdk_attrs)
+   next = next->declarator;

This version is back to an "if", but I don't see anything else in the new
version of that function that actually means the "if" would skip multiple
consecutive cdk_attrs as desired.

The patch is OK with the "while" restored there.  If for some reason the
"while" breaks something, we'll need to look in more detail at exactly
what case isn't being handled correctly by "while".


I guess it was the result of an experiment, trying to see if I could
break it with the 'if'.  I (hope I) put it back and pushed the whole
series.  I had to squash patches 1 and 2 because of a dependency that
I had missed.

Thanks for the review, by the way.  I think the signature validation
we've ended up with is quite a bit more comprehensive than the first
attempt.

Martin


Re: [PATCH] irange_pool class

2020-09-19 Thread Andrew MacLeod via Gcc-patches

On 9/19/20 4:32 PM, Martin Sebor wrote:

On 9/18/20 3:09 PM, Andrew MacLeod wrote:


What I meant by proxy is a substitute class each of whose objects
stands in for a single instance of int_range<N> where N is
a runtime value.   There's no class like that.


[Just to be clear, I don't mean for this discussion to hold up
the patch if you need the pool internally.  Another class like
the one I propose can be added later if necessary.]


It won't  :-)




no, but that doesn't serve a lot of purpose either.  You can call 
allocator.allocate(N), which is effectively that... ?


Yes, but the allocator object isn't always conveniently accessible.
Imagine needing to allocate a dynamically sized irange in a copy
ctor of some class that's stored in a vec, as the vec is being
resized.  The allocator would need to be a global pointer.  That's
of course possible but not the most elegant design.

I am now imagining an overly complex C++ system that doesn't really fit 
with GCC :-)  I don't think GCC's vec could deal with that model.


When that's a reality, we'll deal with it.  For now, I'm not overly
concerned.  If we reintroduce a dynamic object, we'll worry about it then,
and make sure a dynamic object can be associated with an allocation object.





A more fancy class would be parameterized on an Allocator policy
that it would allocate memory with, much like C++ standard classes
do.  That would let you define an obstack-based allocator class to
use the way you need, as well as any others.  (Providing
the allocator as a template argument to just the ctor as opposed
to the whole class itself would make different "instances"
interchangeable for one another.)

Martin


  We had a dynamic sized irange; I told Aldy to kill it off and we 
replaced it with int_range_max with 255 ranges, because the combo of 
int_range_max for calculating and the allocator for storing seemed to 
solve all the problems with far less allocation overhead, and wasn't 
particularly onerous.


int_range_max used to have a small vector of something like 5 pairs. 
If a range was being created and we needed more, by accessing elements 
higher than that, it would allocate a hunk of memory to be able to 
handle it, plus a little extra buffer space, and point to that 
instead.  So it was a dynamically sized int_range_max that managed its 
own memory.  We found that in practice the pairing of the current 
int_range_max and the allocation pool was far more efficient 99% of 
the time, so we just eliminated it.


Something like that could certainly be revived... but most of the 
time it doesn't seem necessary.  Generally, you need to ask for a 
range and then store it.  Ask for it with int_range_max, and store it 
with the allocator if you are going to need a lot of them.  So instead of


range_of_expr (vec[x], name..)

you do

int_range_max m;
range_of_expr (m, name)
vec[x] = allocator.allocate (m);

Do you really need 6 or 10 subranges to find out the answer to the 
questions you are looking for?  Most of the time, 2 or 3 pairs carry 
all the information anyone needs, and it's efficient.  Switches are 
the biggest beneficiary of the multiple ranges, allowing us to be 
quite precise about what reaches the interior of a case or the default.


The next question is, how many of these do you need?  The ranger is 
doing it with the allocator because it could in theory have #BB * 
#SSA_NAMES ranges, which could be a lot.  If you have just one or two 
vectors over ssa-names, and they are sparsely filled, just use 
int_range_max.


The use case I'm thinking of would have an irange of some size for
every decl or result of an allocation call accessed in a function,
plus another irange for every SSA_NAME that's an object pointer.
The size of the first irange would be that of the allocation
argument in the first case.  In the second case it would be
the combined range of the offsets of the pointer from whatever it
points to (e.g., in p0 = &x; p1 = p0 + i; p2 = p1 + j; p2's
offset range would be Ri + Rj, where Ri and Rj are the value
ranges of i and j).

It probably doesn't make sense to keep track of 255 subranges
(or even many more than 5), but in compliance with the guidance
in the irange best practices document to write code for [as if]
infinite subranges, the data structures should be free of any
hardwired limit.  So I envision I might have something like
I think you are interpreting that guidance too literally.
It would perhaps be better to word it more in line with what is 
practical.  If what you are interested in is the upper and lower limit, 
then 1 or 2 sub-ranges is plenty; you really don't care about the middle 
ranges.

If you are a switch operation, then infinite might be appropriate.
Pointers tracking null and non-null?  int_range<1> is likely enough.

You are dealing with offsets; you really don't care about those middle 
sub-ranges.  I see little benefit for you from that.  Your benefit is 
from the more accurate/easier-to-use ranges.


For now, I would simply use an int_range

[r11-3306 Regression] FAIL: gcc.dg/Warray-bounds-66.c (test for warnings, line 122) on Linux/x86_64 (-m32 -march=cascadelake)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

3f9a497d1b0dd9da87908a11b59bf364ad40ddca is the first bad commit
commit 3f9a497d1b0dd9da87908a11b59bf364ad40ddca
Author: Martin Sebor 
Date:   Sat Sep 19 17:47:29 2020 -0600

Extend -Warray-bounds to detect out-of-bounds accesses to array parameters.

caused

FAIL: gcc.dg/Warray-bounds-66.c (test for excess errors)
FAIL: gcc.dg/Warray-bounds-66.c  (test for warnings, line 122)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3306/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-66.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-15.c scan-ipa-dump-times sra "Will split parameter" 2 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-15.c scan-ipa-dump-times sra "component at byte 
offset" 4
FAIL: gcc.dg/ipa/ipa-sra-15.c scan-ipa-dump-times sra "Will split parameter" 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-15.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-12.c scan-ipa-dump-times sra "Will split parameter" 2 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-12.c scan-ipa-dump-times sra "component at byte 
offset" 4
FAIL: gcc.dg/ipa/ipa-sra-12.c scan-ipa-dump-times sra "Will split parameter" 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-12.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-13.c scan-ipa-dump-times sra "Will split parameter" 2 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-13.c scan-ipa-dump-times sra "component at byte 
offset" 4
FAIL: gcc.dg/ipa/ipa-sra-13.c scan-ipa-dump-times sra "Will split parameter" 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-13.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-1.c scan-ipa-dump sra "Will split parameter" on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-1.c scan-ipa-dump sra "Will split parameter"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/atomic/pr65345-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="atomic.exp=gcc.dg/atomic/pr65345-4.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3306 Regression] FAIL: gcc.dg/Warray-bounds-66.c (test for warnings, line 122) on Linux/x86_64 (-m32)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

3f9a497d1b0dd9da87908a11b59bf364ad40ddca is the first bad commit
commit 3f9a497d1b0dd9da87908a11b59bf364ad40ddca
Author: Martin Sebor 
Date:   Sat Sep 19 17:47:29 2020 -0600

Extend -Warray-bounds to detect out-of-bounds accesses to array parameters.

caused

FAIL: gcc.dg/Warray-bounds-66.c (test for excess errors)
FAIL: gcc.dg/Warray-bounds-66.c  (test for warnings, line 122)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3306/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-66.c --target_board='unix{-m32}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-15.c scan-ipa-dump-times sra "Will split parameter" 2 on Linux/x86_64 (-m64)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-15.c scan-ipa-dump-times sra "component at byte 
offset" 4
FAIL: gcc.dg/ipa/ipa-sra-15.c scan-ipa-dump-times sra "Will split parameter" 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-15.c --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-1.c scan-ipa-dump sra "Will split parameter" on Linux/x86_64 (-m64)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-1.c scan-ipa-dump sra "Will split parameter"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-1.c 
--target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-12.c scan-ipa-dump-times sra "Will split parameter" 2 on Linux/x86_64 (-m64)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-12.c scan-ipa-dump-times sra "component at byte 
offset" 4
FAIL: gcc.dg/ipa/ipa-sra-12.c scan-ipa-dump-times sra "Will split parameter" 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-12.c --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/ipa/ipa-sra-13.c scan-ipa-dump-times sra "Will split parameter" 2 on Linux/x86_64 (-m64)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/ipa/ipa-sra-13.c scan-ipa-dump-times sra "component at byte 
offset" 4
FAIL: gcc.dg/ipa/ipa-sra-13.c scan-ipa-dump-times sra "Will split parameter" 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-13.c --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3303 Regression] FAIL: gcc.dg/atomic/pr65345-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) on Linux/x86_64 (-m64)

2020-09-19 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6450f07388f9fe575a489c9309c36012b17b88b0 is the first bad commit
commit 6450f07388f9fe575a489c9309c36012b17b88b0
Author: Martin Sebor 
Date:   Sat Sep 19 17:21:52 2020 -0600

Infrastructure & C front end changes for array parameter checking (PR 
c/50584).

caused

FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.dg/atomic/pr65345-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3303/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="atomic.exp=gcc.dg/atomic/pr65345-4.c --target_board='unix{-m64}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: New modref/ipa_modref optimization passes

2020-09-19 Thread Jan Hubicka
Hi,
this is the patch I am using to fix the assumed_alias_type.f90 failure by
simply arranging alias set 0 for the problematic array descriptor.

I am not sure this is the best option, but I suppose it is better than
setting all array descriptors to have the same canonical type (as done by
LTO)?

Honza

* trans-types.c (gfc_get_array_type_bounds): Set alias set to 0 for
arrays of unknown element type.
diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 26fdb2803a7..bef3d270c06 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -1903,6 +1903,12 @@ gfc_get_array_type_bounds (tree etype, int dimen, int 
codimen, tree * lbound,
   base_type = gfc_get_array_descriptor_base (dimen, codimen, false);
   TYPE_CANONICAL (fat_type) = base_type;
   TYPE_STUB_DECL (fat_type) = TYPE_STUB_DECL (base_type);
+  /* Arrays of unknown type must alias with all array descriptors.  */
+  if (etype == ptr_type_node)
+    {
+      TYPE_ALIAS_SET (base_type) = 0;
+      TYPE_ALIAS_SET (fat_type) = 0;
+    }
 
   tmp = TYPE_NAME (etype);
   if (tmp && TREE_CODE (tmp) == TYPE_DECL)


Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

2020-09-19 Thread Aldy Hernandez via Gcc-patches




On 9/19/20 11:22 PM, Martin Sebor wrote:

On 9/18/20 12:29 AM, Aldy Hernandez wrote:



On 9/17/20 10:18 PM, Martin Sebor wrote:

On 9/17/20 12:39 PM, Andrew MacLeod wrote:

On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:

On 9/16/20 9:23 PM, Jeff Law wrote:


On 9/15/20 1:47 PM, Martin Sebor wrote:

Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
can lead to a subsequent buffer overflow corrupting the heap or
stack.  The attached patch diagnoses a subset of these cases where
the overflow/wraparound is still detectable.

Besides regtesting GCC on x86_64-linux I also verified the warning
doesn't introduce any false positives into Glibc or Binutils/GDB
builds on the same target.

Martin

gcc-96838.diff

PR middle-end/96838 - missing warning on integer overflow in 
calls to allocation functions


gcc/ChangeLog:

PR middle-end/96838
* calls.c (eval_size_vflow): New function.
(get_size_range): Call it.  Add argument.
(maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
* calls.h (get_size_range): Add argument.

gcc/testsuite/ChangeLog:

PR middle-end/96838
* gcc.dg/Walloc-size-larger-than-19.c: New test.
* gcc.dg/Walloc-size-larger-than-20.c: New test.


If an attacker can control an integer overflow that feeds an 
allocation, then they can do all kinds of bad things.  In fact, 
when my son was asking me attack vectors, this is one I said I'd 
look at if I were a bad guy.



I'm a bit surprised you can't just query the range of the argument 
and get the overflow status out of that range, but I don't see 
that in the APIs.  How painful would it be to make that part of 
the API? The conceptual model would be to just ask for the range 
of the argument to malloc which would include the range and a 
status bit indicating the computation might have overflowed.


  Do we know if it did/would have wrapped?  Sure, since we have to 
do the math, so you are correct in that the information is there,
but is it useful?


We are in the very annoying habit of subtracting one by adding 
0xFFF..., which means you get an overflow for unsigned when you 
subtract one.  From what I have seen of unsigned math, we would be 
flagging very many operations as overflows, so you would still have 
the difficulty of figuring out whether it's a "real" overflow or a 
fake one because of the way we do unsigned math.


You and me both :)



At the very start, I did have an overflow flag in the range class... 
but it was turning out to be fairly useless so it was removed.



I agree that being able to evaluate an expression in an as-if
infinite precision (in addition to its type) would be helpful.


So again, we get back to adding 0x0f when we are trying to 
subtract one...  now, with infinite precision you are going to see:

  [2,10] - 1  becomes  [2,10] + 0xFF..., which will now 
give you [0x10001, 0x10009], so it's going to look like it 
overflowed?





But just to make sure I understood correctly, let me ask again
using an example:

  void* f (size_t n)
  {
    if (n < PTRDIFF_MAX / 2)
  n = PTRDIFF_MAX / 2;

    return malloc (n * sizeof (int));
  }

Can the unsigned wraparound in the argument be readily detected?

On trunk, this ends up with the following:

  # RANGE [4611686018427387903, 18446744073709551615]
  _6 = MAX_EXPR ;
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
  _1 = _6 * 4;
  ...
  p_5 = mallocD.1206 (_1); [tail call]
  ...
  return p_5;

so _1's range reflects the wraparound in size_t, but _6's range
has enough information to uncover it.  So detecting it is possible
and is done in the patch so we get a warning:

warning: argument 1 range [18446744073709551612, 
0x3fffc] is too large to represent in ‘long unsigned 
int’ [-Walloc-size-larger-than=]

    6 |   return malloc (n * sizeof (int));
  |  ^

The code is very simplistic and only handles a small subset of cases.
It could be generalized and exposed by a more generic API but it does
seem like the ranger must already have all the logic built into it so
if it isn't exposed now it should be a matter of opening it up.


everything is exposed in range-ops.  well, mostly.
if we have _1 = _6 * 4

if one wanted to do that infinite precision, you query the range for 
_6, and the range for 4 (which would be [4,4] :-)

range_of_expr (op1r, _6, stmt)
range_of_expr (op2r, 4, stmt)

you could take their current types, and cast those ranges to 
whatever the next higher precision is,

range_cast  (op1r, highertype)
range_cast (op2r, highertype)
then invoke the operation on those parameters

gimple_range_fold (r, stmt,  op1r, op2r)

and that will do your operation in the higher precision.  you could 
compare that to the value in regular precision too i suppose.


The patch does pretty much exactly what you described, except in
offset_int, and only for a limited set of arithmetic operations.
It figures out if an unsi