[committed][testsuite] Add missing require-effective-target alloca

2020-09-25 Thread Tom de Vries
Hi,

Add missing require-effective-target alloca directives.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Add missing require-effective-target alloca

gcc/testsuite/ChangeLog:

2020-09-25  Tom de Vries  

* gcc.dg/analyzer/pr93355-localealias.c: Require effective target
alloca.

---
 gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c 
b/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c
index a5cb0d56e70..043e45f828e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr93355-localealias.c
@@ -5,6 +5,7 @@
 /* { dg-do "compile" } */
 /* { dg-additional-options "-Wno-analyzer-too-complex 
-fno-analyzer-feasibility" } */
 /* TODO: remove the need for these options.  */
+/* { dg-require-effective-target alloca } */
 
 /* Handle aliases for locale names.
Copyright (C) 1995-1999, 2000-2001, 2003 Free Software Foundation, Inc.


[PATCH][OBVIOUS] Fix spacing in cgraph_node::dump.

2020-09-25 Thread Martin Liška

It's a small refactoring which I consider obvious.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
Martin

gcc/ChangeLog:

* cgraph.c (cgraph_node::dump): Always print space at the end
of a message.  Remove one extra space.
---
 gcc/cgraph.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index b43adaac7c0..eb5f1a56c26 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2272,18 +2272,17 @@ cgraph_node::dump (FILE *f)
   edge->dump_edge_flags (f);
   if (edge->indirect_info->param_index != -1)
{
- fprintf (f, " of param:%i", edge->indirect_info->param_index);
+ fprintf (f, "of param:%i ", edge->indirect_info->param_index);
  if (edge->indirect_info->agg_contents)
-  fprintf (f, " loaded from %s %s at offset %i",
+  fprintf (f, "loaded from %s %s at offset %i ",
edge->indirect_info->member_ptr ? "member ptr" : 
"aggregate",
edge->indirect_info->by_ref ? "passed by reference":"",
(int)edge->indirect_info->offset);
  if (edge->indirect_info->vptr_changed)
-   fprintf (f, " (vptr maybe changed)");
+   fprintf (f, "(vptr maybe changed) ");
}
-  fprintf (f, " Num speculative call targets: %i",
+  fprintf (f, "num speculative call targets: %i\n",
   edge->indirect_info->num_speculative_call_targets);
-  fprintf (f, "\n");
   if (edge->indirect_info->polymorphic)
edge->indirect_info->context.dump (f);
 }
--
2.28.0



[committed] openmp: Add support for non-rect simd and improve collapsed simd support

2020-09-25 Thread Jakub Jelinek via Gcc-patches
The following change adds support for non-rectangular simd loops.
While working on that, I've noticed we actually don't vectorize collapsed
simd loops at all, because the code that I thought would be vectorizable
actually is not vectorized.  While in theory, for constant lower/upper
bounds and constant steps of all but the outermost loop, we could
vectorize by computing the separate iterators using vectorized division
and modulo for each of them from a single iterator that increments
by 1 from 0 to the total iteration count of the loop nest, I think that would
be fairly expensive and the chances of the loop body being vectorizable
would be low, e.g. because the array indices are unlikely to be linear and
would need scatters/gathers.
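
To make that rejected approach concrete, here is a minimal sketch (not from the
patch; the bounds NI/NJ/NK and the array are hypothetical constants) of
recovering the collapsed iterators from one linear induction variable:

#define NI 4
#define NJ 5
#define NK 6
int a[NI][NJ][NK];

void
collapsed (void)
{
  /* One linear IV running from 0 to the total iteration count...  */
  for (long iter = 0; iter < (long) NI * NJ * NK; iter++)
    {
      /* ...from which each collapsed iterator is recovered by division
         and modulo.  Vectorizing this needs vector div/mod, and the
         recovered indices are no longer linear in iter, so the accesses
         would need gathers/scatters.  */
      int i = iter / (NJ * NK);
      int j = (iter / NK) % NJ;
      int k = iter % NK;
      a[i][j][k] += i + j + k;
    }
}
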
This patch changes the generated code to vectorize only the innermost
loop which has higher chance of being vectorized.  Below is the list of
tests and function names in which the patch resulted in vectorizing something
that hasn't been vectorized before (ok, the first line is a new test).
I've also found that the vectorizer will not vectorize loops with non-constant
steps, I plan to do something about those incrementally on the omp-expand.c
side (basically, compute number of iterations before the loop and use a 0 to
number_of_iterations step 1 IV as the main one).

I have a problem with the composite simd vectorization though.
The point is that each thread (or task etc.) is given only a range of
consecutive iterations, so somewhere earlier it computes the total number
of iterations and splits the work between the workers, and then the intent
is to try to vectorize it.
So, each thread is then given a begin ... end-1 range that it would handle.
This means that from the single begin value I need to compute the individual
iteration vars I should start at and then goto into the loop nest to begin
iterating there (and actually compute how many iterations the innermost loop
should do each time so that it stops before end).
Very roughly the IL I emit is something like:
int t[100][100][100];

void
foo (int a, int b, int c, int d, int e, int f, int g, int h, int u, int v, int 
w, int x)
{
  int i, j, k;
  int cnt;
  if (x)
{
  i = u; j = v; k = w; goto doit;
}
  for (i = a; i < b; i += c)
for (j = d; j < e; j += f)
  {
k = g;
doit:
for (; k < h; k++)
  t[i][j][k] += i + j + k;
  }
}
Unfortunately, some pass then turns the innermost loop into one with more
than 2 basic blocks and it isn't vectorized because of that.

Also, I have disabled (for now) SIMTization of collapsed simd loops, because
for SIMT it would be using a single thread anyway and I didn't want to bother
with checking SIMT on all places I've been changing.  If SIMT support is added
for some or all collapsed loops, that omp-low.c change needs to be reverted.

Here is that list of what hasn't been vectorized before and is now:

gcc/testsuite/gcc.dg/vect/vect-simd-17.c doit
gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 bar
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-10.c 
f28_taskloop_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-10.c 
_Z24f28_taskloop_simd_normalv._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c 
f25_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c 
f26_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c 
f27_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c 
f28_tpf_simd_guided32._omp_fn.1
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-11.c 
f28_tpf_simd_runtime._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c 
_Z17f25_t_simd_normaliii._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c 
_Z17f26_t_simd_normalxxi._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c 
_Z17f27_t_simd_normalv._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c 
_Z20f28_tpf_simd_runtimev._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-11.c 
_Z21f28_tpf_simd_guided32v._omp_fn.1
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c f7_simd_normal
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f7_simd_normal
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c f8_f_simd_guided32
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f8_f_simd_guided32
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c f8_f_simd_runtime
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c f8_f_simd_runtime
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c 
f8_pf_simd_guided32._omp_fn.0
libgomp/testsuite/libgomp.c/../libgomp.c-c++-common/for-2.c 
f8_pf_simd_runtime._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c 
_Z18f8_pf_simd_runtimev._omp_fn.0
libgomp/testsuite/libgomp.c++/../libgomp.c-c++-common/for-2.c 
_Z19f8_pf_simd_

[PATCH] GCOV: do not mangle .gcno files.

2020-09-25 Thread Martin Liška

Hi.

As mentioned in the PR, we should not mangle .gcno files.
I'm going to install the fix if there are no concerns.

Martin

gcc/ChangeLog:

PR gcov-profile/97193
* coverage.c (coverage_init): GCOV note files should not be
mangled and should end up in the output directory.
---
 gcc/coverage.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index f353c9c5022..7711412c3be 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1206,6 +1206,8 @@ coverage_obj_finish (vec *ctor)
 void
 coverage_init (const char *filename)
 {
+  const char *original_filename = filename;
+  int original_len = strlen (original_filename);
 #if HAVE_DOS_BASED_FILE_SYSTEM
   const char *separator = "\\";
 #else
@@ -1277,9 +1279,9 @@ coverage_init (const char *filename)
bbg_file_name = xstrdup (profile_note_location);
   else
{
- bbg_file_name = XNEWVEC (char, len + strlen (GCOV_NOTE_SUFFIX) + 1);
- memcpy (bbg_file_name, filename, len);
- strcpy (bbg_file_name + len, GCOV_NOTE_SUFFIX);
+ bbg_file_name = XNEWVEC (char, original_len + strlen 
(GCOV_NOTE_SUFFIX) + 1);
+ memcpy (bbg_file_name, original_filename, original_len);
+ strcpy (bbg_file_name + original_len, GCOV_NOTE_SUFFIX);
}
 
   if (!gcov_open (bbg_file_name, -1))

--
2.28.0



[PATCH] stor-layout: Reject forming arrays with elt sizes not divisible by elt alignment [PR97164]

2020-09-25 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, since 2005 we reject if array elements are smaller
than their alignment (i.e. overaligned elements), because such arrays don't
make much sense, only their first element is guaranteed to be aligned as
user requested, but the next element can't be.
The following testcases show something we've been silent about but that is
equally bad: the 2005 case is just the most common special case of the array
element size not being divisible by the alignment.  In those arrays
too only the first element is guaranteed to be properly aligned and the
second one can't be.

This patch rejects those cases too, but keeps the existing wording for the
old common case.

Unfortunately, the patch breaks bootstrap, because libbid uses this mess
(forms arrays with 24 byte long elements with 16 byte element alignment).
I don't really see justification for that, so I've decreased the alignment
to 8 bytes instead.
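
For illustration, a minimal example of the now-rejected pattern, modelled on
the libbid UINT192 type (the diagnostic wording is that of the new error
added below):

/* The element size is 24 bytes but the requested alignment is 16, so only
   arr[0] is guaranteed to be 16-byte aligned; arr[1] starts at offset 24.  */
typedef struct {
  unsigned long long w[3];
} UINT192 __attribute__ ((aligned (16)));

UINT192 arr[4]; /* error: size of array element is not a multiple of its alignment */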

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-09-25  Jakub Jelinek  

PR tree-optimization/97164
gcc/
* stor-layout.c (layout_type): Also reject arrays where element size
is constant, but not a multiple of element alignment.
gcc/testsuite/
* c-c++-common/pr97164.c: New test.
* gcc.c-torture/execute/pr36093.c: Move ...
* gcc.dg/pr36093.c: ... here.  Add dg-do compile and dg-error
directives.
* gcc.c-torture/execute/pr43783.c: Move ...
* gcc.dg/pr43783.c: ... here.  Add dg-do compile, dg-options and
dg-error directives.
libgcc/
* config/libbid/bid_functions.h (UINT192): Decrease alignment to 8
bytes.

--- gcc/stor-layout.c.jj2020-04-21 17:06:17.267661631 +0200
+++ gcc/stor-layout.c   2020-09-22 20:51:39.311285386 +0200
@@ -2579,10 +2579,19 @@ layout_type (tree type)
/* If TYPE_SIZE_UNIT overflowed, then it is certainly larger than
   TYPE_ALIGN_UNIT.  */
&& !TREE_OVERFLOW (TYPE_SIZE_UNIT (element))
-   && !integer_zerop (TYPE_SIZE_UNIT (element))
-   && compare_tree_int (TYPE_SIZE_UNIT (element),
-TYPE_ALIGN_UNIT (element)) < 0)
- error ("alignment of array elements is greater than element size");
+   && !integer_zerop (TYPE_SIZE_UNIT (element)))
+ {
+   if (compare_tree_int (TYPE_SIZE_UNIT (element),
+ TYPE_ALIGN_UNIT (element)) < 0)
+ error ("alignment of array elements is greater than "
+"element size");
+   else if (TYPE_ALIGN_UNIT (element) > 1
+&& (wi::zext (wi::to_wide (TYPE_SIZE_UNIT (element)),
+ ffs_hwi (TYPE_ALIGN_UNIT (element)) - 1)
+!= 0))
+ error ("size of array element is not a multiple of its "
+"alignment");
+ }
break;
   }
 
--- libgcc/config/libbid/bid_functions.h.jj 2020-01-14 20:02:48.619582332 
+0100
+++ libgcc/config/libbid/bid_functions.h2020-09-23 01:12:02.672546190 
+0200
@@ -81,7 +81,7 @@ ALIGN (16)
 #define SQRT80 sqrtw
 #endif
 
- typedef ALIGN (16)
+ typedef ALIGN (8)
  struct {
UINT64 w[3];
  } UINT192;
--- gcc/testsuite/c-c++-common/pr97164.c.jj 2020-09-22 20:54:41.846670263 
+0200
+++ gcc/testsuite/c-c++-common/pr97164.c2020-09-22 20:54:21.815957235 
+0200
@@ -0,0 +1,15 @@
+/* PR tree-optimization/97164 */
+/* { dg-do compile } */
+
+typedef struct { int *a; char b[64]; } A __attribute__((aligned (64)));
+struct B { A d[4]; } b;/* { dg-error "size of array element is not a 
multiple of its alignment" } */
+void foo (void);
+
+int *
+bar (void)
+{
+  struct B *h = &b;
+  if (h->d[1].a)
+foo ();
+  return h->d[1].a;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr36093.c.jj2020-01-12 
11:54:37.353399227 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr36093.c   2020-09-24 
17:40:20.804678958 +0200
@@ -1,30 +0,0 @@
-/* { dg-skip-if "small alignment" { pdp11-*-* } } */
-
-extern void abort (void);
-
-typedef struct Bar {
-  char c[129];
-} Bar __attribute__((__aligned__(128)));
-
-typedef struct Foo {
-  Bar bar[4];
-} Foo;
-
-Foo foo[4];
-
-int main()
-{
-   int i, j;
-   Foo *foop = &foo[0];
-
-   for (i=0; i < 4; i++) {
-  Bar *bar = &foop->bar[i];
-  for (j=0; j < 129; j++) {
- bar->c[j] = 'a' + i;
-  }
-   }
-
-   if (foo[0].bar[3].c[128] != 'd')
- abort ();
-   return 0;
-}
--- gcc/testsuite/gcc.c-torture/execute/pr43783.c.jj2020-01-12 
11:54:37.354399212 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr43783.c   2020-09-24 
17:45:11.039563874 +0200
@@ -1,23 +0,0 @@
-/* { dg-skip-if "small alignment" { pdp11-*-* } } */
-
-typedef __attribute__((aligned(16)))
-struct {
-  unsigned long long w[3];
-} UINT192;
-
-UINT192 bid_Kx192[32];
-
-extern void abort (void);
-
-int main()
-{
-  int i = 0;
-  unsigned long x = 0;
-  for (i =

[PATCH] powerpc, libcpp: Fix gcc build with clang on power8 [PR97163]

2020-09-25 Thread Jakub Jelinek via Gcc-patches
Hi!

libcpp has two specialized altivec implementations of search_line_fast,
one for power8+ and the other one otherwise.
Both use __attribute__((altivec(vector))) and the GCC builtins rather than
altivec.h and the APIs from there, which is fine, but should be restricted
to when libcpp is built with GCC, so that it can be relied on.
The second elif is
#elif (GCC_VERSION >= 4005) && defined(__ALTIVEC__) && defined (__BIG_ENDIAN__)
and thus e.g. when built with clang it isn't picked, but the first one was
just guarded with
#elif defined(_ARCH_PWR8) && defined(__ALTIVEC__)
and so according to the bug reporter clang fails miserably on that.

The following patch fixes that by adding the same GCC_VERSION requirement
as the second version.  I don't know where the 4.5 in there comes from and
the exact version doesn't matter that much, as long as it is above the 4.2 that
clang pretends to be and smaller than or equal to the 4.8 that is the oldest
gcc we support as a bootstrap compiler ATM.
Furthermore, the patch fixes the comment, the version it is talking about is
not pre-GCC 5, but actually the GCC 5+ one.
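
For reference, a short sketch of how GCC_VERSION is conventionally derived
(this is how GCC's include/ansidecl.h defines it), which is why 4005
corresponds to GCC 4.5 and why clang, masquerading as GCC 4.2, is excluded:

/* GCC_VERSION encodes the host compiler as major * 1000 + minor.  */
#if defined (__GNUC__) && defined (__GNUC_MINOR__)
# define GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#endif

/* clang by default reports __GNUC__ == 4 and __GNUC_MINOR__ == 2, i.e.
   GCC_VERSION 4002, so (GCC_VERSION >= 4005) keeps it away from the
   GCC-only vector code.  */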

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2020-09-25  Jakub Jelinek  

PR bootstrap/97163
* lex.c (search_line_fast): Only use _ARCH_PWR8 Altivec version
for GCC >= 4.5.

--- libcpp/lex.c.jj 2020-07-28 15:39:56.434118065 +0200
+++ libcpp/lex.c2020-09-24 18:09:06.358207369 +0200
@@ -531,11 +531,11 @@ init_vectorized_lexer (void)
   search_line_fast = impl;
 }
 
-#elif defined(_ARCH_PWR8) && defined(__ALTIVEC__)
+#elif (GCC_VERSION >= 4005) && defined(_ARCH_PWR8) && defined(__ALTIVEC__)
 
 /* A vection of the fast scanner using AltiVec vectorized byte compares
and VSX unaligned loads (when VSX is available).  This is otherwise
-   the same as the pre-GCC 5 version.  */
+   the same as the AltiVec version.  */
 
 ATTRIBUTE_NO_SANITIZE_UNDEFINED
 static const uchar *

Jakub



Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

Hello.

All right, I came up with a rapid speed up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
- JT: we spend quite some time in the density calculation; we can guess it
  first, which leads to a fast bail out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
From dc4c1d129a50c7f51d28235506479f29d51dae07 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 24 Sep 2020 13:34:13 +0200
Subject: [PATCH 2/2] switch conversion: make a rapid speed up

gcc/ChangeLog:

	PR tree-optimization/96979
	* tree-switch-conversion.c (jump_table_cluster::can_be_handled):
	Make a fast bail out.
	(bit_test_cluster::can_be_handled): Likewise here.
	* tree-switch-conversion.h (get_range): Use wi::to_wide instead
	of a folding.

gcc/testsuite/ChangeLog:

	PR tree-optimization/96979
	* g++.dg/tree-ssa/pr96979.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr96979.C | 48 +
 gcc/tree-switch-conversion.c| 32 -
 gcc/tree-switch-conversion.h|  7 ++--
 3 files changed, 74 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr96979.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr96979.C b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
new file mode 100644
index 000..ec0f57a8548
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr96979.C
@@ -0,0 +1,48 @@
+/* PR tree-optimization/96979 */
+/* { dg-do compile } */
+/* { dg-options "-std=c++17 -O2" } */
+
+using u64 = unsigned long long;
+
+constexpr inline u64
+foo (const char *str) noexcept
+{
+  u64 value = 0xcbf29ce484222325ULL;
+  for (u64 i = 0; str[i]; i++)
+value = (value ^ u64(str[i])) * 0x10001b3ULL;
+  return value;
+}
+
+struct V
+{
+  enum W
+  {
+#define A(n) n,
+#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
+#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
+#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
+#define E D(foo1) D(foo2) D(foo3)
+E
+last
+  };
+
+  constexpr static W
+  bar (const u64 h) noexcept
+  {
+switch (h)
+  {
+#undef A
+#define F(n) #n
+#define A(n) case foo (F(n)): return n;
+E
+  }
+return last;
+  }
+};
+
+int
+baz (const char *s)
+{
+  const u64 h = foo (s);
+  return V::bar (h);
+}
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 186411ff3c4..3212e964b84 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec &clusters,
   if (range == 0)
 return false;
 
+  unsigned HOST_WIDE_INT lhs = 100 * range;
+  if (lhs < range)
+return false;
+
+  /* First make quick guess as each cluster
+ can add at maximum 2 to the comparison_count.  */
+  if (lhs > 2 * max_ratio * (end - start + 1))
+return false;
+
   unsigned HOST_WIDE_INT comparison_count = 0;
   for (unsigned i = start; i <= end; i++)
 {
@@ -1275,10 +1284,6 @@ jump_table_cluster::can_be_handled (const vec &clusters,
   comparison_count += sc->m_range_p ? 2 : 1;
 }
 
-  unsigned HOST_WIDE_INT lhs = 100 * range;
-  if (lhs < range)
-return false;
-
   return lhs <= max_ratio * comparison_count;
 }
 
@@ -1364,12 +1369,12 @@ bit_test_cluster::can_be_handled (unsigned HOST_WIDE_INT range,
 {
   /* Check overflow.  */
   if (range == 0)
-return 0;
+return false;
 
   if (range >= GET_MODE_BITSIZE (word_mode))
 return false;
 
-  return uniq <= 3;
+  return uniq <= m_max_case_bit_tests;
 }
 
 /* Return true when cluster starting at START and ending at END (inclusive)
@@ -1379,6 +1384,7 @@ bool
 bit_test_cluster::can_be_handled (const vec &clusters,
   unsigned start, unsigned end)
 {
+  auto_vec dest_bbs;
   /* For algorithm correctness, bit test for a single case must return
  true.  We bail out in is_beneficial if it's called just for
  a single case.  */
@@ -1387,15 +1393,23 @@ bit_test_cluster::can_be_handled (const vec &clusters,
 
   unsigned HOST_WIDE_INT range = get_range (clusters[start]->get_low (),
 	clusters[end]->get_high ());
-  auto_bitmap dest_bbs;
+
+  /* Make a guess first.  */
+  if (!can_be_handled (range, m_max_case_bit_tests))
+return false;
 
   for (unsigned i = start; i <= end; i++)
 {
   simple_cluster *sc = static_cast (clusters[i]);
-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+	{
+	  dest_bbs.safe_push (sc->m_case_bb->index);
+	  if (dest_bbs.length () > m_max_case_bit_tests)
+	return false;
+	}
 }
 
-  return can_be_handled (range, bitmap_count_bits (dest_bbs));
+  return true;
 }
 
 /* Return true when COUNT of cases of UNIQ labels is beneficial for bit test
diff --git a/gcc/tree-switch-

[PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Biener
The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
with bit-precision elements correctly as the testcase shows before
the PR97085 fix.  The following makes it do the correct thing
(not 100% sure for CTOR of sub-vectors due to the lack of a testcase).

The alternative would be to assert such CTORs do not happen (and also
add IL verification for this).

The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
(thus the C FE needs that), thus test coverage is quite limited (zero)
now and I didn't manage to convince GCC to create such CTOR for SVE
VnBImode vectors.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Does this look sensible?

Thanks,
Richard.

2020-09-25  Richard Biener  

PR middle-end/96814
* expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
CTORs correctly.

* gcc.target/i386/pr96814.c: New testcase.
---
 gcc/expr.c  | 28 ++---
 gcc/testsuite/gcc.target/i386/pr96814.c | 19 +
 2 files changed, 40 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96814.c

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b397..fb42e485089 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -6922,7 +6922,9 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
insn_code icode = CODE_FOR_nothing;
tree elt;
tree elttype = TREE_TYPE (type);
-   int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
+   int elt_size
+ = (VECTOR_BOOLEAN_TYPE_P (type) ? TYPE_PRECISION (elttype)
+: tree_to_uhwi (TYPE_SIZE (elttype)));
machine_mode eltmode = TYPE_MODE (elttype);
HOST_WIDE_INT bitsize;
HOST_WIDE_INT bitpos;
@@ -6987,6 +6989,23 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
  }
  }
 
+   /* Compute the size of the elements in the CTOR.  */
+   tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
+   if (VECTOR_BOOLEAN_TYPE_P (type))
+ {
+   if (VECTOR_TYPE_P (val_type))
+ {
+   /* ???  Never seen such beast, but it's not disallowed.  */
+   gcc_assert (VECTOR_BOOLEAN_TYPE_P (val_type));
+   bitsize = (TYPE_PRECISION (TREE_TYPE (val_type))
+  * TYPE_VECTOR_SUBPARTS (val_type).to_constant ());
+ }
+   else
+ bitsize = TYPE_PRECISION (val_type);
+ }
+   else
+ bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
+
/* If the constructor has fewer elements than the vector,
   clear the whole array first.  Similarly if this is static
   constructor of a non-BLKmode object.  */
@@ -7001,11 +7020,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
 
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
  {
-   tree sz = TYPE_SIZE (TREE_TYPE (value));
-   int n_elts_here
- = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
-  TYPE_SIZE (elttype)));
-
+   int n_elts_here = bitsize / elt_size;
count += n_elts_here;
if (mostly_zeros_p (value))
  zero_count += n_elts_here;
@@ -7045,7 +7060,6 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
HOST_WIDE_INT eltpos;
tree value = ce->value;
 
-   bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (value)));
if (cleared && initializer_zerop (value))
  continue;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr96814.c 
b/gcc/testsuite/gcc.target/i386/pr96814.c
new file mode 100644
index 000..b280c737130
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96814.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-options "-mavx512vl -mavx512bw" } */
+/* { dg-require-effective-target avx512bw } */
+/* { dg-require-effective-target avx512vl } */
+
+typedef unsigned char __attribute__ ((__vector_size__ (32))) V;
+
+void
+test (void)
+{
+  V x = ((V){8} > 0) == 0;
+  for (unsigned i = 0; i < sizeof (x); i++)
+if (x[i] != (i ? 0xff : 0)) __builtin_abort();
+}
+
+#define DO_TEST test
+#define AVX512VL
+#define AVX512BW
+#include "avx512-check.h"
-- 
2.26.2


[PATCH] arm: Fix fp16 move patterns for base MVE

2020-09-25 Thread Richard Sandiford
This patch fixes ICEs in gcc.dg/torture/float16-basic.c for
-march=armv8.1-m.main+mve -mfloat-abi=hard.  The problem was
that an fp16 argument was (rightly) being passed in FPRs,
but the fp16 move patterns only handled GPRs.  LRA then cycled
trying to look for a way of handling the FPR.

It looks like there are three related problems here:

(1) We're using the wrong fp16 move pattern for base MVE.
*mov_vfp_16 (the pattern we use for +mve.fp)
works for base MVE too.

(2) The fp16 MVE load and store patterns are separate from the
main move patterns.  The loads and stores should instead be
alternatives of the main move patterns, so that LRA knows
what to do with pseudo registers that become stack slots.

(3) The range restrictions for the loads and stores were wrong
for fp16: we were enforcing a multiple of 4 in [-255*4, 255*4]
instead of a multiple of 2 in [-255*2, 255*2].

(2) came from a patch to prevent writeback being used for MVE.
That patch also added a Uj constraint to enforce the correct
memory types for MVE.  I think the simplest fix is therefore to merge
the loads and stores back into the main pattern and extend the Uj
constraint so that it acts like Um for non-MVE.

The testcase for that patch was mve-vldstr16-no-writeback.c, whose
main function is:

void
fn1 (__fp16 *pSrc)
{
  __fp16 high;
  __fp16 *pDst = 0;
  unsigned i;
  for (i = 0;; i++)
if (pSrc[i])
  pDst[i] = high;
}

Fixing (2) causes the store part to fail, not because we're using
writeback, but because we decide to use GPRs to store high (which is
uninitialised, and so gets replaced with zero).  This patch therefore
adds some scan-assembler-nots instead.  (I wondered about changing the
testcase to initialise high, but that seemed like a bad idea for
a regression test.)

For (3): MVE seems to be the only thing to use arm_coproc_mem_operand_wb
(and its various interfaces) for 16-bit scalars: the Neon patterns only
use it for 32-bit scalars.

I've added new tests to try the various FPR alternatives of the
move patterns.  The range of offsets that GCC uses for FPR loads
and stores is the intersection of the range allowed for GPRs and
FPRs, so the tests include GPR<->memory tests as well.

The fp32 and fp64 tests already pass, they're just there for
completeness.

Tested on arm-eabi (MVE configuration), armeb-eabi (generic
configuration) and arm-linux-gnueabihf.  OK to install?

Richard


gcc/
* config/arm/arm-protos.h (arm_mve_mode_and_operands_type_check):
Delete.
* config/arm/arm.c (arm_coproc_mem_operand_wb): Use a scale factor
of 2 rather than 4 for 16-bit modes.
(arm_mve_mode_and_operands_type_check): Delete.
* config/arm/constraints.md (Uj): Allow writeback for Neon,
but continue to disallow it for MVE.
* config/arm/arm.md (*arm32_mov): Add !TARGET_HAVE_MVE.
* config/arm/vfp.md (*mov_load_vfp_hf16, *mov_store_vfp_hf16): Fold
back into...
(*mov_vfp_16): ...here but use Uj for the FPR memory
constraints.  Use for base MVE too.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: Allow
the store to use GPRs instead of FPRs.  Add scan-assembler-nots
for writeback.
* gcc.target/arm/armv8_1m-fp16-move-1.c: New test.
* gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
* gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.
---
 gcc/config/arm/arm-protos.h   |   1 -
 gcc/config/arm/arm.c  |  25 +-
 gcc/config/arm/arm.md |   4 +-
 gcc/config/arm/constraints.md |   9 +-
 gcc/config/arm/vfp.md |  32 +-
 .../gcc.target/arm/armv8_1m-fp16-move-1.c | 418 +
 .../gcc.target/arm/armv8_1m-fp32-move-1.c | 420 +
 .../gcc.target/arm/armv8_1m-fp64-move-1.c | 426 ++
 .../intrinsics/mve-vldstr16-no-writeback.c|   5 +-
 9 files changed, 1295 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-1.c

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0cc0ae78400..9bb9c61967b 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -120,7 +120,6 @@ extern int arm_coproc_mem_operand_no_writeback (rtx);
 extern int arm_coproc_mem_operand_wb (rtx, int);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int mve_vector_mem_operand (machine_mode, rtx, bool);
-bool arm_mve_mode_and_operands_type_check (machine_mode, rtx, rtx);
 extern int neon_struct_mem_operand (rtx);
 
 extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 022ef6c3f1d..8105b39e7a4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/

[PATCH] testsuite/97204 - fix gcc.target/i386/sse2-mmx-pinsrw.c

2020-09-25 Thread Richard Biener
This fixes the testcase writing to adjacent stack vars (the helpers store a
full __m64, i.e. two ints, through the passed pointer), exposed by IPA modref.

Tested on x86_64-unknown-linux-gnu, pushed.

2020-09-25  Richard Biener  

PR testsuite/97204
* gcc.target/i386/sse2-mmx-pinsrw.c: Fix.
---
 gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c 
b/gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
index c25ddd96daa..fd933555913 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
@@ -42,7 +42,7 @@ compute_correct_result (__m64 *src_p, int val, unsigned int 
imm,
 static void
 sse2_test (void)
 {
-  int r, ck;
+  int r[2], ck[2];
   int i;
   int failed = 0;
   __v4hi y = { 3320, -3339, 48, 4392 };
@@ -50,9 +50,9 @@ sse2_test (void)
   /* Run the MMX tests */
   for (i = 0; i < 4; i++)
 {
-  test_pinsrw  ((__m64 *) &y, 0x1234, i, &r);
-  compute_correct_result ((__m64 *) &y, 0x1234, i, &ck);
-  if (r != ck)
+  test_pinsrw  ((__m64 *) &y, 0x1234, i, r);
+  compute_correct_result ((__m64 *) &y, 0x1234, i, ck);
+  if (r[0] != ck[0] || r[1] != ck[1])
failed++;
 }
 
-- 
2.26.2


RE: [PATCH] arm: Fix fp16 move patterns for base MVE

2020-09-25 Thread Kyrylo Tkachov
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: 25 September 2020 10:35
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Fix fp16 move patterns for base MVE
> 
> This patch fixes ICEs in gcc.dg/torture/float16-basic.c for
> -march=armv8.1-m.main+mve -mfloat-abi=hard.  The problem was
> that an fp16 argument was (rightly) being passed in FPRs,
> but the fp16 move patterns only handled GPRs.  LRA then cycled
> trying to look for a way of handling the FPR.
> 
> It looks like there are three related problems here:
> 
> (1) We're using the wrong fp16 move pattern for base MVE.
> *mov_vfp_16 (the pattern we use for +mve.fp)
> works for base MVE too.
> 
> (2) The fp16 MVE load and store patterns are separate from the
> main move patterns.  The loads and stores should instead be
> alternatives of the main move patterns, so that LRA knows
> what to do with pseudo registers that become stack slots.
> 
> (3) The range restrictions for the loads and stores were wrong
> for fp16: we were enforcing a multiple of 4 in [-255*4, 255*4]
> instead of a multiple of 2 in [-255*2, 255*2].
> 
> (2) came from a patch to prevent writeback being used for MVE.
> That patch also added a Uj constraint to enforce the correct
> memory types for MVE.  I think the simplest fix is therefore to merge
> the loads and stores back into the main pattern and extend the Uj
> constraint so that it acts like Um for non-MVE.
> 
> The testcase for that patch was mve-vldstr16-no-writeback.c, whose
> main function is:
> 
> void
> fn1 (__fp16 *pSrc)
> {
>   __fp16 high;
>   __fp16 *pDst = 0;
>   unsigned i;
>   for (i = 0;; i++)
> if (pSrc[i])
>   pDst[i] = high;
> }
> 
> Fixing (2) causes the store part to fail, not because we're using
> writeback, but because we decide to use GPRs to store high (which is
> uninitialised, and so gets replaced with zero).  This patch therefore
> adds some scan-assembler-nots instead.  (I wondered about changing the
> testcase to initialise high, but that seemed like a bad idea for
> a regression test.)
> 
> For (3): MVE seems to be the only thing to use
> arm_coproc_mem_operand_wb
> (and its various interfaces) for 16-bit scalars: the Neon patterns only
> use it for 32-bit scalars.
> 
> I've added new tests to try the various FPR alternatives of the
> move patterns.  The range of offsets that GCC uses for FPR loads
> and stores is the intersection of the range allowed for GPRs and
> FPRs, so the tests include GPR<->memory tests as well.
> 
> The fp32 and fp64 tests already pass, they're just there for
> completeness.
> 
> Tested on arm-eabi (MVE configuration), armeb-eabi (generic
> configuration) and arm-linux-gnueabihf.  OK to install?

Ok.
Thanks for analysing these and fixing them.
Kyrill

> 
> Richard
> 
> 
> gcc/
>   * config/arm/arm-protos.h
> (arm_mve_mode_and_operands_type_check):
>   Delete.
>   * config/arm/arm.c (arm_coproc_mem_operand_wb): Use a scale
> factor
>   of 2 rather than 4 for 16-bit modes.
>   (arm_mve_mode_and_operands_type_check): Delete.
>   * config/arm/constraints.md (Uj): Allow writeback for Neon,
>   but continue to disallow it for MVE.
>   * config/arm/arm.md (*arm32_mov):
> Add !TARGET_HAVE_MVE.
>   * config/arm/vfp.md (*mov_load_vfp_hf16, *mov_store_vfp_hf16):
> Fold
>   back into...
>   (*mov_vfp_16): ...here but use Uj for the FPR
> memory
>   constraints.  Use for base MVE too.
> 
> gcc/testsuite/
>   * gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: Allow
>   the store to use GPRs instead of FPRs.  Add scan-assembler-nots
>   for writeback.
>   * gcc.target/arm/armv8_1m-fp16-move-1.c: New test.
>   * gcc.target/arm/armv8_1m-fp32-move-1.c: Likewise.
>   * gcc.target/arm/armv8_1m-fp64-move-1.c: Likewise.
> ---
>  gcc/config/arm/arm-protos.h   |   1 -
>  gcc/config/arm/arm.c  |  25 +-
>  gcc/config/arm/arm.md |   4 +-
>  gcc/config/arm/constraints.md |   9 +-
>  gcc/config/arm/vfp.md |  32 +-
>  .../gcc.target/arm/armv8_1m-fp16-move-1.c | 418 +
>  .../gcc.target/arm/armv8_1m-fp32-move-1.c | 420 +
>  .../gcc.target/arm/armv8_1m-fp64-move-1.c | 426 ++
>  .../intrinsics/mve-vldstr16-no-writeback.c|   5 +-
>  9 files changed, 1295 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp16-move-
> 1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp32-move-
> 1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/armv8_1m-fp64-move-
> 1.c
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 0cc0ae78400..9bb9c61967b 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -120,7 +120,6

[PATCH][AArch64][GCC 9] Implement __rndr, __rndrrs intrinsics

2020-09-25 Thread Kyrylo Tkachov
Hi all,

I'd like to backport this patch to the GCC 9 branch implementing the RNG 
intrinsics from Armv8.5-a.
It should have been supported there from the start.
It doesn't apply cleanly since the SVE ACLE work in GCC 10 reworked some of the 
builtin handling,
but the resolution isn't complex.

Bootstrapped, tested and pushed on the branch.
Thanks,
Kyrill

This patch implements the recently published[1] __rndr and __rndrrs
intrinsics used to access the RNG in Armv8.5-A.
The __rndrrs intrinsics can be used to reseed the generator too.
They are guarded by the __ARM_FEATURE_RNG feature macro.
A quirk with these intrinsics is that they store the random number in
their pointer argument and return a status code indicating whether the
generation succeeded.

The instructions themselves write the CC flags indicating the success of
the operation that we can then read with a CSET.
Therefore this implementation makes use of the IGNORE indicator to the
builtin expand machinery to avoid generating
the CSET if its result is unused (the CC reg clobbering effect is still
reflected in the pattern).
I've checked that using unspec_volatile prevents undesirable CSEing of
the instructions.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
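
As a usage illustration, a minimal sketch assuming the ACLE prototypes
int __rndr (uint64_t *) and int __rndrrs (uint64_t *), both returning 0 on
success as described above (compile with e.g. -march=armv8.5-a+rng):

#include <arm_acle.h>
#include <stdint.h>

#ifdef __ARM_FEATURE_RNG
uint64_t
get_random (void)
{
  uint64_t val;
  /* The random number is stored through the pointer argument;
     the return value is a status code, 0 meaning success.  */
  while (__rndr (&val) != 0)
    ; /* Entropy not available yet; retry.  */
  return val;
}

uint64_t
get_random_reseeded (void)
{
  uint64_t val;
  /* __rndrrs additionally reseeds the generator before reading.  */
  while (__rndrrs (&val) != 0)
    ;
  return val;
}
#endif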

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_RNDR, UNSPEC_RNDRRS): Define.
(aarch64_rndr): New define_insn.
(aarch64_rndrrs): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_RNG): Define.
(TARGET_RNG): Likewise.
* config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Add AARCH64_BUILTIN_RNG_RNDR, AARCH64_BUILTIN_RNG_RNDRRS.
(aarch64_init_rng_builtins): Define.
(aarch64_init_builtins): Call aarch64_init_rng_builtins.
(aarch64_expand_rng_builtin): Define.
(aarch64_expand_builtin): Use IGNORE argument, handle
RNG builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RNG when TARGET_RNG.
* config/aarch64/arm_acle.h (__rndr, __rndrrs): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/rng_1.c: New test.


rndr-9.patch
Description: rndr-9.patch


[PATCH][AArch64][GCC 8] Implement __rndr, __rndrrs intrinsics

2020-09-25 Thread Kyrylo Tkachov
Hi all,

We got a request to support the RNG intrinsics on GCC 8 as some cores used with 
this compiler will support the instructions
and the intrinsics are not very invasive to implement.
This does require adding the +rng arch extension to GCC 8, which is simple to 
do.
Otherwise the patch looks very much like the GCC 9 version

Bootstrapped and tested on aarch64-none-linux gnu on the branch. Pushed.
Thanks,
Kyrill

Hi all,

This patch implements the recently published[1] __rndr and __rndrrs
intrinsics used to access the RNG in Armv8.5-A.
The __rndrrs intrinsics can be used to reseed the generator too.
They are guarded by the __ARM_FEATURE_RNG feature macro.
A quirk with these intrinsics is that they store the random number in
their pointer argument and return a status code indicating whether the
generation succeeded.

The instructions themselves write the CC flags indicating the success of
the operation that we can then read with a CSET.
Therefore this implementation makes use of the IGNORE indicator to the
builtin expand machinery to avoid generating
the CSET if its result is unused (the CC reg clobbering effect is still
reflected in the pattern).
I've checked that using unspec_volatile prevents undesirable CSEing of
the instructions.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

gcc/
PR target/71233
* config/aarch64/aarch64.md (UNSPEC_RNDR, UNSPEC_RNDRRS): Define.
(aarch64_rndr): New define_insn.
(aarch64_rndrrs): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_RNG): Define.
(TARGET_RNG): Likewise.
(AARCH64_FL_RNG): Likewise.
* config/aarch64/aarch64-option-extensions.def (rng): Define.
* config/aarch64/aarch64-builtins.c (enum aarch64_builtins):
Add AARCH64_BUILTIN_RNG_RNDR, AARCH64_BUILTIN_RNG_RNDRRS.
(aarch64_init_rng_builtins): Define.
(aarch64_init_builtins): Call aarch64_init_rng_builtins.
(aarch64_expand_rng_builtin): Define.
(aarch64_expand_builtin): Use IGNORE argument, handle
RNG builtins.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_RNG when TARGET_RNG.
* config/aarch64/arm_acle.h (__rndr, __rndrrs): Define.

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/acle/rng_1.c: New test.


rndr-8.patch
Description: rndr-8.patch


testsuite: [aarch64] Fix aarch64/advsimd-intrinsics/v{trn, uzp, zip}_half.c

2020-09-25 Thread Christophe Lyon via Gcc-patches
Since r11-3402 (g:65c9878641cbe0ed898aa7047b7b994e9d4a5bb1), the
vtrn_half, vuzp_half and vzip_half started failing with

vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with
no linkage

This is because r11-3402 now always declares float64x2 variables for
aarch64, leading to a duplicate declaration in these testcases.

The fix is simply to remove these now useless declarations.

These tests are skipped on arm*, so there is no impact on that target.

2020-09-25  Christophe Lyon  

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c: Remove
declarations of vector, vector2, vector_res for float64x2 type.
* gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vzip_half.c: Likewise.

OK?
testsuite: [aarch64] Fix aarch64/advsimd-intrinsics/v{trn,uzp,zip}_half.c

Since r11-3402 (g:65c9878641cbe0ed898aa7047b7b994e9d4a5bb1), the
vtrn_half, vuzp_half and vzip_half started failing with

vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with no 
linkage

This is because r11-3402 now always declares float64x2 variables for
aarch64, leading to a duplicate declaration in these testcases.

The fix is simply to remove these now useless declarations.

These tests are skipped on arm*, so there is no impact on that target.

2020-09-25  Christophe Lyon  

gcc/testsuite/
PR target/71233
* gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c: Remove
declarations of vector, vector2, vector_res for float64x2 type.
* gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vzip_half.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
index 63f820f..25a0f19 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c
@@ -73,11 +73,8 @@ void exec_vtrn_half (void)
   /* Input vector can only have 64 bits.  */
   DECL_VARIABLE_ALL_VARIANTS(vector);
   DECL_VARIABLE_ALL_VARIANTS(vector2);
-  DECL_VARIABLE(vector, float, 64, 2);
-  DECL_VARIABLE(vector2, float, 64, 2);
 
   DECL_VARIABLE_ALL_VARIANTS(vector_res);
-  DECL_VARIABLE(vector_res, float, 64, 2);
 
   clean_results ();
   /* We don't have vtrn1_T64x1, so set expected to the clean value.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
index 8706f24..2e6b666 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c
@@ -70,11 +70,8 @@ void exec_vuzp_half (void)
   /* Input vector can only have 64 bits.  */
   DECL_VARIABLE_ALL_VARIANTS(vector);
   DECL_VARIABLE_ALL_VARIANTS(vector2);
-  DECL_VARIABLE(vector, float, 64, 2);
-  DECL_VARIABLE(vector2, float, 64, 2);
 
   DECL_VARIABLE_ALL_VARIANTS(vector_res);
-  DECL_VARIABLE(vector_res, float, 64, 2);
 
   clean_results ();
   /* We don't have vuzp1_T64x1, so set expected to the clean value.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c
index 619d6b2..ef42451 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vzip_half.c
@@ -73,11 +73,8 @@ void exec_vzip_half (void)
   /* Input vector can only have 64 bits.  */
   DECL_VARIABLE_ALL_VARIANTS(vector);
   DECL_VARIABLE_ALL_VARIANTS(vector2);
-  DECL_VARIABLE(vector, float, 64, 2);
-  DECL_VARIABLE(vector2, float, 64, 2);
 
   DECL_VARIABLE_ALL_VARIANTS(vector_res);
-  DECL_VARIABLE(vector_res, float, 64, 2);
 
   clean_results ();
   /* We don't have vzip1_T64x1, so set expected to the clean value.  */


RE: testsuite: [aarch64] Fix aarch64/advsimd-intrinsics/v{trn, uzp, zip}_half.c

2020-09-25 Thread Kyrylo Tkachov


> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 25 September 2020 11:35
> To: gcc Patches ; Kyrill Tkachov
> 
> Subject: testsuite: [aarch64] Fix aarch64/advsimd-intrinsics/v{trn, uzp,
> zip}_half.c
> 
> Since r11-3402 (g:65c9878641cbe0ed898aa7047b7b994e9d4a5bb1), the
> vtrn_half, vuzp_half and vzip_half started failing with
> 
> vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
> vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
> vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with
> no linkage
> 
> This is because r11-3402 now always declares float64x2 variables for
> aarch64, leading to a duplicate declaration in these testcases.
> 
> The fix is simply to remove these now useless declarations.
> 
> These tests are skipped on arm*, so there is no impact on that target.
> 
> 2020-09-25  Christophe Lyon  
> 
> gcc/testsuite/
> PR target/71233
> * gcc.target/aarch64/advsimd-intrinsics/vtrn_half.c: Remove
> declarations of vector, vector2, vector_res for float64x2 type.
> * gcc.target/aarch64/advsimd-intrinsics/vuzp_half.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vzip_half.c: Likewise.
> 
> OK?
Oops, yes ok.
Thank you for catching these,
Kyrill


Disable modref for ipa-pta-13.c testcase

2020-09-25 Thread Jan Hubicka
Hi,
parameter tracking in ipa-modref causes a failure of the ipa-pta-13 testcase.
In particular the check for "= x;" in fre3 is failing since we optimize
it out in fre1.  As far as I can tell this is a correct transform because
ipa-modref propagates the fact that the call is passed a pointer to y.
The comment speaks of a missed optimization, so I guess it is OK to disable
modref here so we still test whatever this was testing before?

Honza

gcc/testsuite/ChangeLog:

2020-09-25  Jan Hubicka  

* gcc.dg/ipa/ipa-pta-13.c: Disable ipa-modref.

diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c 
b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
index 93dd87107cc..e7bf6d485a4 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
@@ -1,5 +1,5 @@
 /* { dg-do link } */
-/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta2-details -fdump-tree-fre3 
-fno-ipa-icf" } */
+/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta2-details -fdump-tree-fre3 
-fno-ipa-icf -fno-ipa-modref" } */
 
 static int x, y;
 


[PATCH] tree-optimization/97199 - fix virtual operand update in if-conversion

2020-09-25 Thread Richard Biener
This fixes a corner case with virtual operand update in if-conversion
by re-organizing the code to remove edges only after the last point
we need virtual PHI operands to be available.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-09-25  Richard Biener  

PR tree-optimization/97199
* tree-if-conv.c (combine_blocks): Remove edges only
after looking at virtual PHI args.
---
 gcc/tree-if-conv.c | 107 ++---
 1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 4b8d457867e..2062758f40f 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2544,8 +2544,7 @@ combine_blocks (class loop *loop)
   if (need_to_predicate)
 predicate_statements (loop);
 
-  /* Merge basic blocks: first remove all the edges in the loop,
- except for those from the exit block.  */
+  /* Merge basic blocks.  */
   exit_bb = NULL;
   bool *predicated = XNEWVEC (bool, orig_loop_num_nodes);
   for (i = 0; i < orig_loop_num_nodes; i++)
@@ -2561,43 +2560,6 @@ combine_blocks (class loop *loop)
 }
   gcc_assert (exit_bb != loop->latch);
 
-  for (i = 1; i < orig_loop_num_nodes; i++)
-{
-  bb = ifc_bbs[i];
-
-  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei));)
-   {
- if (e->src == exit_bb)
-   ei_next (&ei);
- else
-   remove_edge (e);
-   }
-}
-
-  if (exit_bb != NULL)
-{
-  if (exit_bb != loop->header)
-   {
- /* Connect this node to loop header.  */
- make_single_succ_edge (loop->header, exit_bb, EDGE_FALLTHRU);
- set_immediate_dominator (CDI_DOMINATORS, exit_bb, loop->header);
-   }
-
-  /* Redirect non-exit edges to loop->latch.  */
-  FOR_EACH_EDGE (e, ei, exit_bb->succs)
-   {
- if (!loop_exit_edge_p (loop, e))
-   redirect_edge_and_branch (e, loop->latch);
-   }
-  set_immediate_dominator (CDI_DOMINATORS, loop->latch, exit_bb);
-}
-  else
-{
-  /* If the loop does not have an exit, reconnect header and latch.  */
-  make_edge (loop->header, loop->latch, EDGE_FALLTHRU);
-  set_immediate_dominator (CDI_DOMINATORS, loop->latch, loop->header);
-}
-
   merge_target_bb = loop->header;
 
   /* Get at the virtual def valid for uses starting at the first block
@@ -2682,13 +2644,9 @@ combine_blocks (class loop *loop)
   last = gsi_last_bb (merge_target_bb);
   gsi_insert_seq_after_without_update (&last, bb_seq (bb), GSI_NEW_STMT);
   set_bb_seq (bb, NULL);
-
-  delete_basic_block (bb);
 }
 
-  /* If possible, merge loop header to the block with the exit edge.
- This reduces the number of basic blocks to two, to please the
- vectorizer that handles only loops with two nodes.  */
+  /* Fixup virtual operands in the exit block.  */
   if (exit_bb
   && exit_bb != loop->header)
 {
@@ -2698,6 +2656,11 @@ combine_blocks (class loop *loop)
   vphi = get_virtual_phi (exit_bb);
   if (vphi)
{
+ /* When there's just loads inside the loop a stray virtual
+PHI merging the uses can appear, update last_vdef from
+it.  */
+ if (!last_vdef)
+   last_vdef = gimple_phi_arg_def (vphi, 0);
  imm_use_iterator iter;
  use_operand_p use_p;
  gimple *use_stmt;
@@ -2711,7 +2674,63 @@ combine_blocks (class loop *loop)
  gimple_stmt_iterator gsi = gsi_for_stmt (vphi); 
  remove_phi_node (&gsi, true);
}
+}
+
+  /* Now remove all the edges in the loop, except for those from the exit
+ block and delete the blocks we elided.  */
+  for (i = 1; i < orig_loop_num_nodes; i++)
+{
+  bb = ifc_bbs[i];
+
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei));)
+   {
+ if (e->src == exit_bb)
+   ei_next (&ei);
+ else
+   remove_edge (e);
+   }
+}
+  for (i = 1; i < orig_loop_num_nodes; i++)
+{
+  bb = ifc_bbs[i];
+
+  if (bb == exit_bb || bb == loop->latch)
+   continue;
+
+  delete_basic_block (bb);
+}
+
+  /* Re-connect the exit block.  */
+  if (exit_bb != NULL)
+{
+  if (exit_bb != loop->header)
+   {
+ /* Connect this node to loop header.  */
+ make_single_succ_edge (loop->header, exit_bb, EDGE_FALLTHRU);
+ set_immediate_dominator (CDI_DOMINATORS, exit_bb, loop->header);
+   }
 
+  /* Redirect non-exit edges to loop->latch.  */
+  FOR_EACH_EDGE (e, ei, exit_bb->succs)
+   {
+ if (!loop_exit_edge_p (loop, e))
+   redirect_edge_and_branch (e, loop->latch);
+   }
+  set_immediate_dominator (CDI_DOMINATORS, loop->latch, exit_bb);
+}
+  else
+{
+  /* If the loop does not have an exit, reconnect header and latch.  */
+  make_edge (loop->header, loop->latch, EDGE_FALLTHRU);
+  set_immediate_dominator (CDI_DOMINATORS, loop->latch, loop->hea

[PATCH] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Richard Biener
This adds rough support to avoid "'target_mem_ref' not supported by"
in diagnostics.  There were recent patches by Martin to sanitize
dumping of MEM_REF so I'm not trying to interfere with this here.

Bootstrap & regtest pending.

OK?

2020-09-25  Richard Biener  

PR c++/97197
cp/
* error.c (dump_expr): Handle TARGET_MEM_REF as if it
were MEM_REF.

c-family/
* c-pretty-print.c (c_pretty_printer::postfix_expression):
Handle TARGET_MEM_REF as expression.
(c_pretty_printer::expression): Handle TARGET_MEM_REF as
unary_expression.
(c_pretty_printer::unary_expression): Handle TARGET_MEM_REF
as if it were MEM_REF.
---
 gcc/c-family/c-pretty-print.c | 3 +++
 gcc/cp/error.c| 1 +
 2 files changed, 4 insertions(+)

diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index acffd7b872c..1a0edb82312 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -1693,6 +1693,7 @@ c_pretty_printer::postfix_expression (tree e)
   break;
 
 case MEM_REF:
+case TARGET_MEM_REF:
   expression (e);
   break;
 
@@ -1833,6 +1834,7 @@ c_pretty_printer::unary_expression (tree e)
   break;
 
 case MEM_REF:
+case TARGET_MEM_REF:
   if (TREE_CODE (TREE_OPERAND (e, 0)) == ADDR_EXPR
  && integer_zerop (TREE_OPERAND (e, 1)))
expression (TREE_OPERAND (TREE_OPERAND (e, 0), 0));
@@ -2295,6 +2297,7 @@ c_pretty_printer::expression (tree e)
 case ADDR_EXPR:
 case INDIRECT_REF:
 case MEM_REF:
+case TARGET_MEM_REF:
 case NEGATE_EXPR:
 case BIT_NOT_EXPR:
 case TRUTH_NOT_EXPR:
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index ecb41e82d8c..c9a0c1e0288 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -2372,6 +2372,7 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
   break;
 
 case MEM_REF:
+case TARGET_MEM_REF:
   if (TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR
  && integer_zerop (TREE_OPERAND (t, 1)))
dump_expr (pp, TREE_OPERAND (TREE_OPERAND (t, 0), 0), flags);
-- 
2.26.2


Add support for iterative dataflow to ipa-modref-tree.h

2020-09-25 Thread Jan Hubicka
Hi,
this patch prepares support for iterative dataflow in ipa-modref-tree.h by
making the insert operations track whether anything has changed at all.
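
A minimal, self-contained sketch (generic C, not the modref code itself) of
why insert/merge reporting a change enables an iterative dataflow solver:
keep merging until a fixed point, i.e. until no merge reports a change.

#include <stdbool.h>

struct summary { unsigned bits; }; /* stand-in for a modref summary */

/* Merge SRC into DST; return true if DST changed.  */
static bool
merge (struct summary *dst, const struct summary *src)
{
  unsigned old = dst->bits;
  dst->bits |= src->bits;
  return dst->bits != old;
}

/* Propagate along a chain of nodes until nothing changes anymore.  */
static void
solve (struct summary *nodes, int n)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int i = 1; i < n; i++)
        changed |= merge (&nodes[i], &nodes[i - 1]);
    }
}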

Bootstrapped/regtested x86_64-linux and also tested with the actual iterative
dataflow in modref. I plan to commit it later today unless there are comments.

Honza

gcc/ChangeLog:

2020-09-24  Jan Hubicka  

* ipa-modref-tree.h (modref_ref_node::insert_access): Track if something
changed.
(modref_base_node::insert_ref): Likewise (and add a new optional
argument)
(modref_tree::insert): Likewise.
(modref_tree::merge): Likewise.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index caf5d348dd8..02ce7036b3f 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -88,17 +88,18 @@ struct GTY((user)) modref_ref_node
   }
 
   /* Insert access with OFFSET and SIZE.
- Collapse tree if it has more than MAX_ACCESSES entries.  */
-  void insert_access (modref_access_node a, size_t max_accesses)
+ Collapse tree if it has more than MAX_ACCESSES entries.
+ Return true if record was changed.  */
+  bool insert_access (modref_access_node a, size_t max_accesses)
   {
 /* If this base->ref pair has no access information, bail out.  */
 if (every_access)
-  return;
+  return false;
 
 /* Otherwise, insert a node for the ref of the access under the base.  */
 modref_access_node *access_node = search (a);
 if (access_node)
-  return;
+  return false;
 
 /* If this base->ref pair has too many accesses stored, we will clear
all accesses and bail out.  */
@@ -109,9 +110,10 @@ struct GTY((user)) modref_ref_node
  fprintf (dump_file,
   "--param param=modref-max-accesses limit reached\n");
collapse ();
-   return;
+   return true;
   }
 vec_safe_push (accesses, a);
+return true;
   }
 };
 
@@ -139,8 +141,11 @@ struct GTY((user)) modref_base_node
 return NULL;
   }
 
-  /* Insert REF; collapse tree if there are more than MAX_REFS.  */
-  modref_ref_node  *insert_ref (T ref, size_t max_refs)
+  /* Insert REF; collapse tree if there are more than MAX_REFS.
+ Return inserted ref and if CHANGED is non-null set it to true if
+ something changed.  */
+  modref_ref_node  *insert_ref (T ref, size_t max_refs,
+  bool *changed = NULL)
   {
 modref_ref_node  *ref_node;
 
@@ -153,6 +158,9 @@ struct GTY((user)) modref_base_node
 if (ref_node)
   return ref_node;
 
+if (changed)
+  *changed = true;
+
 /* Collapse the node if too full already.  */
 if (refs && refs->length () >= max_refs)
   {
@@ -204,7 +212,11 @@ struct GTY((user)) modref_tree
 max_accesses (max_accesses),
 every_base (false) {}
 
-  modref_base_node  *insert_base (T base)
+  /* Insert BASE; collapse tree if there are more than MAX_REFS.
+ Return inserted base and if CHANGED is non-null set it to true if
+ something changed.  */
+
+  modref_base_node  *insert_base (T base, bool *changed = NULL)
   {
 modref_base_node  *base_node;
 
@@ -217,6 +229,9 @@ struct GTY((user)) modref_tree
 if (base_node)
   return base_node;
 
+if (changed)
+  *changed = true;
+
 /* Collapse the node if too full already.  */
 if (bases && bases->length () >= max_bases)
   {
@@ -232,43 +247,60 @@ struct GTY((user)) modref_tree
 return base_node;
   }
 
-  /* Insert memory access to the tree. */
-  void insert (T base, T ref, modref_access_node a)
+  /* Insert memory access to the tree.
+ Return true if something changed.  */
+  bool insert (T base, T ref, modref_access_node a)
   {
+if (every_base)
+  return false;
+
+bool changed = false;
+
 /* No useful information tracked; collapse everything.  */
 if (!base && !ref && !a.useful_p ())
   {
collapse ();
-   return;
+   return true;
   }
 
-modref_base_node  *base_node = insert_base (base);
+modref_base_node  *base_node = insert_base (base, &changed);
 if (!base_node)
-  return;
+  return changed;
 gcc_assert (search (base) != NULL);
 
-modref_ref_node  *ref_node = base_node->insert_ref (ref, max_refs);
+modref_ref_node  *ref_node = base_node->insert_ref (ref, max_refs,
+  &changed);
 
 /* No useful ref information and no useful base; collapse everyting.  */
 if (!base && base_node->every_ref)
   {
collapse ();
-   return;
+   return true;
   }
 if (ref_node)
   {
/* No useful ref and access; collapse ref.  */
if (!ref && !a.useful_p ())
- ref_node->collapse ();
+ {
+   if (!ref_node->every_access)
+ {
+   ref_node->collapse ();
+   changed = true;
+ }
+ }
else
  {
-   ref_node->insert_access (a, max_accesses);
+   chan

Track arguments pointing to local or readonly memory in ipa-fnsummary

2020-09-25 Thread Jan Hubicka
Hi,
this patch implements tracking whether an argument points to readonly memory.
This is useful for ipa-modref as well as for inline heuristics.  It is
desirable to inline functions that dereference pointers to local variables in
order to support SRA.  We always did the opposite heuristic (guessing that the
dereferences will be optimized out with 50% probability), but here we could
increase the probability for cases where we can track that the argument indeed
points to local memory (or readonly memory, which is also good).
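
For illustration, a hedged example of the kind of call sites the new flag is
meant to capture (the function and variable names below are made up):

  /* The first argument points to caller-local memory and the second to
     readonly memory, so points_to_local_or_readonly_memory would be set
     for both call arguments.  */
  int use_p (const int *p);
  static const int ro = 7;

  int
  caller (void)
  {
    int local = 42;
    return use_p (&local) + use_p (&ro);
  }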

Bootstrapped/regtested x86_64-linux.  I plan to commit it later today unless
there are comments.

Honza

* ipa-fnsummary.c (dump_ipa_call_summary): Dump
points_to_local_or_readonly_memory flag.
(analyze_function_body): Compute points_to_local_or_readonly_memory
flag.
(remap_edge_change_prob): Rename to ...
(remap_edge_params): ... this one; update
points_to_local_or_readonly_memory.
(remap_edge_summaries): Update.
(read_ipa_call_summary): Stream the new flag.
(write_ipa_call_summary): Likewise.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_local_or_readonly_memory.
(inline_param_summary::equal_to): Update.
(inline_param_summary::useless_p): Update.
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bb703f62206..7f12b116dec 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -980,6 +980,9 @@ dump_ipa_call_summary (FILE *f, int indent, struct 
cgraph_node *node,
else if (prob != REG_BR_PROB_BASE)
  fprintf (f, "%*s op%i change %f%% of time\n", indent + 2, "", i,
   prob * 100.0 / REG_BR_PROB_BASE);
+   if (es->param[i].points_to_local_or_readonly_memory)
+ fprintf (f, "%*s op%i points to local or readonly memory\n",
+  indent + 2, "", i);
  }
   if (!edge->inline_failed)
{
@@ -2671,6 +2674,9 @@ analyze_function_body (struct cgraph_node *node, bool 
early)
  int prob = param_change_prob (&fbi, stmt, i);
  gcc_assert (prob >= 0 && prob <= REG_BR_PROB_BASE);
  es->param[i].change_prob = prob;
+ es->param[i].points_to_local_or_readonly_memory
+= points_to_local_or_readonly_memory_p
+(gimple_call_arg (stmt, i));
}
}
 
@@ -3783,15 +3789,17 @@ inline_update_callee_summaries (struct cgraph_node 
*node, int depth)
 ipa_call_summaries->get (e)->loop_depth += depth;
 }
 
-/* Update change_prob of EDGE after INLINED_EDGE has been inlined.
+/* Update change_prob and points_to_local_or_readonly_memory of EDGE after
+   INLINED_EDGE has been inlined.
+
When function A is inlined in B and A calls C with parameter that
changes with probability PROB1 and C is known to be passthrough
of argument if B that change with probability PROB2, the probability
of change is now PROB1*PROB2.  */
 
 static void
-remap_edge_change_prob (struct cgraph_edge *inlined_edge,
-   struct cgraph_edge *edge)
+remap_edge_params (struct cgraph_edge *inlined_edge,
+  struct cgraph_edge *edge)
 {
   if (ipa_node_params_sum)
 {
@@ -3825,7 +3833,16 @@ remap_edge_change_prob (struct cgraph_edge *inlined_edge,
prob = 1;
 
  es->param[i].change_prob = prob;
+
+ if (inlined_es
+   ->param[id].points_to_local_or_readonly_memory)
+   es->param[i].points_to_local_or_readonly_memory = true;
}
+ if (!es->param[i].points_to_local_or_readonly_memory
+ && jfunc->type == IPA_JF_CONST
+ && points_to_local_or_readonly_memory_p
+(ipa_get_jf_constant (jfunc)))
+   es->param[i].points_to_local_or_readonly_memory = true;
}
}
 }
@@ -3858,7 +3875,7 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
   if (e->inline_failed)
{
   class ipa_call_summary *es = ipa_call_summaries->get (e);
- remap_edge_change_prob (inlined_edge, e);
+ remap_edge_params (inlined_edge, e);
 
  if (es->predicate)
{
@@ -3884,7 +3901,7 @@ remap_edge_summaries (struct cgraph_edge *inlined_edge,
   predicate p;
   next = e->next_callee;
 
-  remap_edge_change_prob (inlined_edge, e);
+  remap_edge_params (inlined_edge, e);
   if (es->predicate)
{
  p = es->predicate->remap_after_inlining
@@ -4210,12 +4227,19 @@ read_ipa_call_summary (class lto_input_block *ib, 
struct cgraph_edge *e,
 {
   es->param.safe_grow_cleared (length, true);
   for (i = 0; i < length; i++)
-   es->param[i].change_prob = streamer_read_uhwi (ib);
+   {
+ es->param[i].change_prob = streamer_read_uhwi (ib);
+ es->param[i].p

[PATCH] arm: Add missing Neoverse V1 feature

2020-09-25 Thread Alex Coplan
Hello,

This simple follow-on patch adds a missing feature (FP16) to the
Neoverse V1 description in AArch32 GCC.

OK for master?

Thanks,
Alex

---

gcc/ChangeLog:

* config/arm/arm-cpus.in (neoverse-v1): Add FP16.

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index be563b7f807..bf460ddbcaf 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1494,7 +1494,7 @@ begin cpu neoverse-v1
   cname neoversev1
   tune for cortex-a57
   tune flags LDSCHED
-  architecture armv8.4-a+bf16+i8mm
+  architecture armv8.4-a+fp16+bf16+i8mm
   option crypto add FP_ARMv8 CRYPTO
   costs cortex_a57
 end cpu neoverse-v1


RE: [PATCH] arm: Add missing Neoverse V1 feature

2020-09-25 Thread Kyrylo Tkachov


> -Original Message-
> From: Alex Coplan 
> Sent: 25 September 2020 12:18
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Add missing Neoverse V1 feature
> 
> Hello,
> 
> This simple follow-on patch adds a missing feature (FP16) to the
> Neoverse V1 description in AArch32 GCC.
> 
> OK for master?

Ok, sorry for not catching it in the original review.
Kyrill

> 
> Thanks,
> Alex
> 
> ---
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in (neoverse-v1): Add FP16.



Re: Disable modref for ipa-pta-13.c testcase

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 1:04 PM Jan Hubicka  wrote:
>
> Hi,
> parameter tracking in ipa-modref causes a failure of the ipa-pta-13 testcase.
> In particular the check for "= x;" in fre3 is failing since we optimize
> it out in fre1.  As far as I can tell this is a correct transform because
> ipa-modref propagates the fact that the call is passed a pointer to y.
> The comment speaks of a missed optimization, so I guess it is OK to disable
> modref here so we still test whatever this was testing before?
>
Hmm, I guess so.  Ideally both local and local_address_taken would
be noipa but then IPA PTA wouldn't apply either ;)  So yes, OK to
disable modref.

Richard.

> Honza
>
> gcc/testsuite/ChangeLog:
>
> 2020-09-25  Jan Hubicka  
>
> * gcc.dg/ipa/ipa-pta-13.c: Disable ipa-modref.
>
> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c 
> b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
> index 93dd87107cc..e7bf6d485a4 100644
> --- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
> @@ -1,5 +1,5 @@
>  /* { dg-do link } */
> -/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta2-details -fdump-tree-fre3 
> -fno-ipa-icf" } */
> +/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta2-details -fdump-tree-fre3 
> -fno-ipa-icf -fno-ipa-modref" } */
>
>  static int x, y;
>


Re: [PATCH] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 25, 2020 at 01:11:37PM +0200, Richard Biener wrote:
> This adds rough support to avoid "'target_mem_ref' not supported by"
> in diagnostics.  There were recent patches by Martin to sanitize
> dumping of MEM_REF so I'm not trying to interfere with this here.

Is that correct?
I mean, TARGET_MEM_REF encodes more than what MEM_REF encodes,
so printing it like MEM_REF will ignore many things from there.
I'd say we should print it like:
*(type *)(BASE + STEP * INDEX + INDEX2 + OFFSET)
rather than how we print MEM_REFs as
*(type *)(BASE + OFFSET)
(with skipping whatever is NULL in there).
So instead of adding case MEM_REF: in the second and last hunk
copy and edit it (perhaps kill the probably unnecessary
part that checks for *&foo and prints it as foo, because who would
create TARGET_MEM_REF when MEM_REF could have been used in that case).
> 
> Bootstrap & regtest pending.
> 
> OK?
> 
> 2020-09-25  Richard Biener  
> 
>   PR c++/97197
> cp/
>   * error.c (dump_expr): Handle TARGET_MEM_REF as if it
>   were MEM_REF.
> 
> c-family/
>   * c-pretty-print.c (c_pretty_printer::postfix_expression):
>   Handle TARGET_MEM_REF as expression.
>   (c_pretty_printer::expression): Handle TARGET_MEM_REF as
>   unary_expression.
>   (c_pretty_printer::unary_expression): Handle TARGET_MEM_REF
>   as if it were MEM_REF.

Jakub



Re: [Patch] OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Wed, Sep 23, 2020 at 05:45:12PM +0200, Tobias Burnus wrote:
> On 9/23/20 4:06 PM, Jakub Jelinek wrote:
> 
> > What I really meant was:
> I did now something based on this.
> > > +  gcc_assert (node->alias && node->analyzed);
> 
> I believe from previous testing that node->analyzed is 0
> for the testcase at hand — and, hence, ultimate_alias_target()

That would be surprising, because if it is not node->analyzed, then
ultimate_alias_target_1 will not change node at all.

Anyway, the patch LGTM, thanks.

Jakub



Re: [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 8:51 AM xionghu luo  wrote:
>
> Hi,
>
> On 2020/9/24 20:39, Richard Sandiford wrote:
> > xionghu luo  writes:
> >> @@ -2658,6 +2659,43 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
> >> *stmt, convert_optab optab)
> >>
> >>   #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
> >>
> >> +/* Expand VEC_SET internal functions.  */
> >> +
> >> +static void
> >> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> >> +{
> >> +  tree lhs = gimple_call_lhs (stmt);
> >> +  tree op0 = gimple_call_arg (stmt, 0);
> >> +  tree op1 = gimple_call_arg (stmt, 1);
> >> +  tree op2 = gimple_call_arg (stmt, 2);
> >> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >> +  rtx src = expand_expr (op0, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >
> > I'm not sure about the expand_expr here.  ISTM that op0 is a normal
> > input and so should be expanded by expand_normal rather than
> > EXPAND_WRITE.  Also:
> >
> >> +
> >> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
> >> +  scalar_mode innermode = GET_MODE_INNER (outermode);
> >> +
> >> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> >> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> >> +
> >> +  class expand_operand ops[3];
> >> +  enum insn_code icode = optab_handler (optab, outermode);
> >> +
> >> +  if (icode != CODE_FOR_nothing)
> >> +{
> >> +  pos = convert_to_mode (E_SImode, pos, 0);
> >> +
> >> +  create_fixed_operand (&ops[0], src);
> >
> > ...this would mean that if SRC happens to be a MEM, the pattern
> > must also accept a MEM.
> >
> > ISTM that we're making more work for ourselves by not “fixing” the optab
> > to have a natural pure-input + pure-output interface. :-)  But if we
> > stick with the current optab interface, I think we need to:
> >
> > - create a temporary register
> > - move SRC into the temporary register before the insn
> > - use create_fixed_operand with the temporary register for operand 0
> > - move the temporary register into TARGET after the insn
> >
> >> +  create_input_operand (&ops[1], value, innermode);
> >> +  create_input_operand (&ops[2], pos, GET_MODE (pos));
> >
> > For this I think we should use convert_operand_from on the original “pos”,
> > so that the target gets to choose what the mode of the operand is.
> >
>
> Thanks a lot for the nice suggestions, fixed them all and updated the patch 
> as below.
>
>
> [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR
>
> This patch enables transformation from ARRAY_REF(VIEW_CONVERT_EXPR) to
> VEC_SET internal function in gimple-isel pass if target supports
> vec_set with variable index by checking can_vec_set_var_idx_p.
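
For illustration, a variable-index vector element store of the kind this
transformation targets can be written in plain C with the vector extensions
(the names below are made up; this is not the PR's testcase):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  set_elem (v4si v, int idx, int val)
  {
    v[idx] = val;   /* becomes an ARRAY_REF of a VIEW_CONVERT_EXPR in gimple */
    return v;
  }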

OK with me if Richard is happy with the updated patch.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2020-09-25  Xionghu Luo  
>
> * gimple-isel.cc (gimple_expand_vec_set_expr): New function.
> (gimple_expand_vec_cond_exprs): Rename to ...
> (gimple_expand_vec_exprs): ... this and call
> gimple_expand_vec_set_expr.
> * internal-fn.c (vec_set_direct): New define.
> (expand_vec_set_optab_fn): New function.
> (direct_vec_set_optab_supported_p): New define.
> * internal-fn.def (VEC_SET): New DEF_INTERNAL_OPTAB_FN.
> * optabs.c (can_vec_set_var_idx_p): New function.
> * optabs.h (can_vec_set_var_idx_p): New declaration.
> ---
>  gcc/gimple-isel.cc  | 75 +++--
>  gcc/internal-fn.c   | 41 +
>  gcc/internal-fn.def |  2 ++
>  gcc/optabs.c| 21 +
>  gcc/optabs.h|  4 +++
>  5 files changed, 141 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
> index b330cf4c20e..02513e04900 100644
> --- a/gcc/gimple-isel.cc
> +++ b/gcc/gimple-isel.cc
> @@ -35,6 +35,74 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-cfg.h"
>  #include "bitmap.h"
>  #include "tree-ssa-dce.h"
> +#include "memmodel.h"
> +#include "optabs.h"
> +
> +/* Expand all ARRAY_REF(VIEW_CONVERT_EXPR) gimple assignments into calls to
> +   internal function based on vector type of selected expansion.
> +   i.e.:
> + VIEW_CONVERT_EXPR(u)[_1] = i_4(D);
> +   =>
> + _7 = u;
> + _8 = .VEC_SET (_7, i_4(D), _1);
> + u = _8;  */
> +
> +static gimple *
> +gimple_expand_vec_set_expr (gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code;
> +  gcall *new_stmt = NULL;
> +  gassign *ass_stmt = NULL;
> +
> +  /* Only consider code == GIMPLE_ASSIGN.  */
> +  gassign *stmt = dyn_cast <gassign *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +return NULL;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  code = TREE_CODE (lhs);
> +  if (code != ARRAY_REF)
> +return NULL;
> +
> +  tree val = gimple_assign_rhs1 (stmt);
> +  tree op0 = TREE_OPERAND (lhs, 0);
> +  if (TREE_CODE (op0) == VIEW_CONVERT_EXPR && DECL_P (TREE_OPERAND (op0, 0

Re: [PATCH] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Richard Biener
On Fri, 25 Sep 2020, Jakub Jelinek wrote:

> On Fri, Sep 25, 2020 at 01:11:37PM +0200, Richard Biener wrote:
> > This adds rough support to avoid "'target_mem_ref' not supported by"
> > in diagnostics.  There were recent patches by Martin to sanitize
> > dumping of MEM_REF so I'm not trying to interfere with this here.
> 
> Is that correct?
> I mean, TARGET_MEM_REF encodes more than what MEM_REF encodes,
> so printing it like MEM_REF will ignore many things from there.
> I'd say we should print it like:
> *(type *)(BASE + STEP * INDEX + INDEX2 + OFFSET)
> rather than how we print MEM_REFs as
> *(type *)(BASE + OFFSET)
> (with skipping whatever is NULL in there).
> So instead of adding case MEM_REF: in the second and last hunk
> copy and edit it (perhaps kill the probably unnecessary
> part that checks for *&foo and prints it as foo, because who would
> create TARGET_MEM_REF when MEM_REF could have been used in that case).

See my comment above for Martin's attempts to improve things.  I don't
really want to try to decide what to do with that late diagnostic IL
printing, but my commit was blamed for showing target_mem_ref as unsupported.

I don't have much time to spend thinking about what to best print and what not,
but yes, printing only the MEM_REF part is certainly imprecise.

I'll leave the PR to FE folks.

Thanks,
Richard.

> > 
> > Bootstrap & regtest pending.
> > 
> > OK?
> > 
> > 2020-09-25  Richard Biener  
> > 
> > PR c++/97197
> > cp/
> > * error.c (dump_expr): Handle TARGET_MEM_REF as if it
> > were MEM_REF.
> > 
> > c-family/
> > * c-pretty-print.c (c_pretty_printer::postfix_expression):
> > Handle TARGET_MEM_REF as expression.
> > (c_pretty_printer::expression): Handle TARGET_MEM_REF as
> > unary_expression.
> > (c_pretty_printer::unary_expression): Handle TARGET_MEM_REF
> > as if it were MEM_REF.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-09-25 Thread Tom de Vries
On 9/24/20 5:05 PM, Richard Biener wrote:
> On Thu, 24 Sep 2020, Jonathan Wakely wrote:
> 
>> On 24/09/20 11:11 +0200, Richard Biener wrote:
>>> On Wed, 26 Aug 2020, Richard Biener wrote:
>>>
 On Thu, 6 Aug 2020, Richard Biener wrote:

> On Thu, 6 Aug 2020, Richard Biener wrote:
>
>> This adds a move CTOR to auto_vec and makes use of a
>> auto_vec return value for get_loop_exit_edges denoting
>> that lifetime management of the vector is handed to the caller.
>>
>> The move CTOR prompted the hash_table change because it apparently
>> makes the copy CTOR implicitly deleted (good) and hash-table
>> expansion of the odr_enum_map which is
>> hash_map  where odr_enum has an
>> auto_vec member triggers this.  Not sure if
>> there's a latent bug there before this (I think we're not
>> invoking DTORs, but we're invoking copy-CTORs).
>>
>> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>
>> Does this all look sensible and is it a good change
>> (the get_loop_exit_edges one)?
>
> Regtest went OK, here's an update with a complete ChangeLog
> (how useful..) plus the move assign operator deleted, copy
> assign wouldn't work as auto-generated and at the moment
> there's no use of assigning.  I guess if we'd have functions
> that take an auto_vec<> argument meaning they will destroy
> the vector that will become useful and we can implement it.
>
> OK for trunk?

 Ping.
>>>
>>> Ping^2.
>>
>> Looks good to me as far as the use of C++ features goes.
> 
> Thanks, now pushed after re-testing.

Ran into a build breaker after this commit, reported here (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97207 ).

Thanks,
- Tom


Patch ping

2020-09-25 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping a few patches:

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552451.html
  - allow plugins to deal with global_options layout changes

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553420.html
  - --enable-link-serialization{,=N} support

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553948.html
  - PR96994 - fix up C++ handling of default initialization with
  consteval default ctor

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553992.html
  - pass -gdwarf-5 to assembler for -gdwarf-5 if possible

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554246.html
  - PR97073 - fix wrong-code on double-word op expansion

Jakub



[committed] libstdc++: Remove redundant -std=gnu++1z flags from makefile

2020-09-25 Thread Jonathan Wakely via Gcc-patches
Now that G++ defaults to gnu++17 we don't need special rules for
compiling the C++17 allocation and deallocation functions.

libstdc++-v3/ChangeLog:

* libsupc++/Makefile.am: Remove redundant -std=gnu++1z flags.
* libsupc++/Makefile.in: Regenerate.

Tested powerpc64le-linux. Committed to trunk.

commit 473da7e22c809fda9e3b37557d6ee8c07b226ca4
Author: Jonathan Wakely 
Date:   Fri Sep 25 12:50:17 2020

libstdc++: Remove redundant -std=gnu++1z flags from makefile

Now that G++ defaults to gnu++17 we don't need special rules for
compiling the C++17 allocation and deallocation functions.

libstdc++-v3/ChangeLog:

* libsupc++/Makefile.am: Remove redundant -std=gnu++1z flags.
* libsupc++/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/libsupc++/Makefile.am 
b/libstdc++-v3/libsupc++/Makefile.am
index 35ad3ae7799..091fe159d5a 100644
--- a/libstdc++-v3/libsupc++/Makefile.am
+++ b/libstdc++-v3/libsupc++/Makefile.am
@@ -128,28 +128,6 @@ cp-demangle.o: cp-demangle.c
$(C_COMPILE) -DIN_GLIBCPP_V3 -Wno-error -c $<
 
 
-# Use special rules for the C++17 sources so that the proper flags are passed.
-new_opa.lo: new_opa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-new_opant.lo: new_opant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-new_opva.lo: new_opva.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-new_opvant.lo: new_opvant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opa.lo: del_opa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opant.lo: del_opant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opsa.lo: del_opsa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opva.lo: del_opva.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opvant.lo: del_opvant.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-del_opvsa.lo: del_opvsa.cc
-   $(LTCXXCOMPILE) -std=gnu++1z -c $<
-
 # AM_CXXFLAGS needs to be in each subdirectory so that it can be
 # modified in a per-library or per-sub-library way.  Need to manually
 # set this option because CONFIG_CXXFLAGS has to be after


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
> The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
> with bit-precision elements correctly as the testcase shows before
> the PR97085 fix.  The following makes it do the correct thing
> (not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
>
> The alternative would be to assert such CTORs do not happen (and also
> add IL verification for this).
>
> The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
> (thus the C FE needs that), thus test coverage is quite limited (zero)
> now and I didn't manage to convince GCC to create such CTOR for SVE
> VnBImode vectors.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Does this look sensible?
>
> Thanks,
> Richard.
>
> 2020-09-25  Richard Biener  
>
>   PR middle-end/96814
>   * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
>   CTORs correctly.
>
>   * gcc.target/i386/pr96814.c: New testcase.
> ---
>  gcc/expr.c  | 28 ++---
>  gcc/testsuite/gcc.target/i386/pr96814.c | 19 +
>  2 files changed, 40 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr96814.c
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 1a15f24b397..fb42e485089 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -6922,7 +6922,9 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>   insn_code icode = CODE_FOR_nothing;
>   tree elt;
>   tree elttype = TREE_TYPE (type);
> - int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> + int elt_size
> +   = (VECTOR_BOOLEAN_TYPE_P (type) ? TYPE_PRECISION (elttype)
> +  : tree_to_uhwi (TYPE_SIZE (elttype)));

FWIW, we now have vector_element_bits for this.

>   machine_mode eltmode = TYPE_MODE (elttype);
>   HOST_WIDE_INT bitsize;
>   HOST_WIDE_INT bitpos;
> @@ -6987,6 +6989,23 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
> }
> }
>  
> + /* Compute the size of the elements in the CTOR.  */
> + tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> + if (VECTOR_BOOLEAN_TYPE_P (type))
> +   {
> + if (VECTOR_TYPE_P (val_type))
> +   {
> + /* ???  Never seen such beast, but it's not disallowed.  */
> + gcc_assert (VECTOR_BOOLEAN_TYPE_P (val_type));
> + bitsize = (TYPE_PRECISION (TREE_TYPE (val_type))
> +* TYPE_VECTOR_SUBPARTS (val_type).to_constant ());
> +   }
> + else
> +   bitsize = TYPE_PRECISION (val_type);
> +   }
> + else
> +   bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
> +

What do we allow for non-boolean constructors?  E.g. for:

  v2hi = 0xf001;

do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
initialiser value allowed to be arbitrarily different from the type
of the elements being initialised?

Or is there requirement that (say) each constructor element is either:

- a scalar that initialises one element of the constructed vector
- a vector of N elements that initialises N elements of the constructed vector

?

Like you say, it mostly seems like guesswork how booleans would be
handled here, but personally I don't know the answer for non-booleans
either :-)

Thanks,
Richard


Re: [PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 24, 2020 at 9:38 PM Segher Boessenkool
>  wrote:
>>
>> Hi!
>>
>> On Thu, Sep 24, 2020 at 04:55:21PM +0200, Richard Biener wrote:
>> > Btw, on x86_64 the following produces sth reasonable:
>> >
>> > #define N 32
>> > typedef int T;
>> > typedef T V __attribute__((vector_size(N)));
>> > V setg (V v, int idx, T val)
>> > {
>> >   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>> >   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>> >   v = (v & ~mask) | (valv & mask);
>> >   return v;
>> > }
>> >
>> > vmovd   %edi, %xmm1
>> > vpbroadcastd%xmm1, %ymm1
>> > vpcmpeqd.LC0(%rip), %ymm1, %ymm2
>> > vpblendvb   %ymm2, %ymm1, %ymm0, %ymm0
>> > ret
>> >
>> > I'm quite sure you could do sth similar on power?
>>
>> This only allows inserting aligned elements.  Which is probably fine
>> of course (we don't allow elements that straddle vector boundaries
>> either, anyway).
>>
>> And yes, we can do that :-)
>>
>> That should be
>>   #define N 32
>>   typedef int T;
>>   typedef T V __attribute__((vector_size(N)));
>>   V setg (V v, int idx, T val)
>>   {
>> V valv = (V){val, val, val, val, val, val, val, val};
>> V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>> V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
>> v = (v & ~mask) | (valv & mask);
>> return v;
>>   }
>
> Whoops yeah, simplified it a bit too much ;)
>
>> after which I get (-march=znver2)
>>
>> setg:
>> vmovd   %edi, %xmm1
>> vmovd   %esi, %xmm2
>> vpbroadcastd%xmm1, %ymm1
>> vpbroadcastd%xmm2, %ymm2
>> vpcmpeqd.LC0(%rip), %ymm1, %ymm1
>> vpandn  %ymm0, %ymm1, %ymm0
>> vpand   %ymm2, %ymm1, %ymm1
>> vpor%ymm0, %ymm1, %ymm0
>> ret
>
> I get with -march=znver2 -O2
>
> vmovd   %edi, %xmm1
> vmovd   %esi, %xmm2
> vpbroadcastd%xmm1, %ymm1
> vpbroadcastd%xmm2, %ymm2
> vpcmpeqd.LC0(%rip), %ymm1, %ymm1
> vpblendvb   %ymm1, %ymm2, %ymm0, %ymm0
>
> and with -mavx512vl
>
> vpbroadcastd%edi, %ymm1
> vpcmpd  $0, .LC0(%rip), %ymm1, %k1
> vpbroadcastd%esi, %ymm0{%k1}
>
> broadcast-with-mask - heh, would be interesting if we manage
> to combine v[idx1] = val; v[idx2] = val; ;)
>
> Now, with SSE4.2 the 16byte case compiles to
>
> setg:
> .LFB0:
> .cfi_startproc
> movd%edi, %xmm3
> movdqa  %xmm0, %xmm1
> movd%esi, %xmm4
> pshufd  $0, %xmm3, %xmm0
> pcmpeqd .LC0(%rip), %xmm0
> movdqa  %xmm0, %xmm2
> pandn   %xmm1, %xmm2
> pshufd  $0, %xmm4, %xmm1
> pand%xmm1, %xmm0
> por %xmm2, %xmm0
> ret
>
> since there's no blend with a variable mask IIRC.
>
> with aarch64 and SVE it doesn't handle the 32byte case at all,

FWIW, the SVE version with -msve-vector-bits=256 is:

ptrue   p0.b, vl32
mov z1.s, w1
index   z2.s, #0, #1
ld1wz0.s, p0/z, [x0]
cmpeq   p1.s, p0/z, z1.s, z2.s
mov z0.s, p1/m, w2
st1wz0.s, p0, [x8]

where the ptrue, ld1w and st1w are just because generic 256-bit
vectors are passed in memory; the real operation is:

mov z1.s, w1
index   z2.s, #0, #1
cmpeq   p1.s, p0/z, z1.s, z2.s
mov z0.s, p1/m, w2

Thanks,
Richard


Re: [PATCH] aarch64: Do not alter force_reg returned rtx expanding pauth builtins

2020-09-25 Thread Richard Sandiford
Andrea Corallo  writes:
> Hi Richard,
>
> thanks for reviewing
>
> Richard Sandiford  writes:
>
>> Andrea Corallo  writes:
>>> Hi all,
>>>
>>> having a look for cases where the rtx returned by force_reg is later
>>> modified, I've found this other case in `aarch64_general_expand_builtin`
>>> while expanding pointer authentication builtins.
>>>
>>> Regtested and bootstrapped on aarch64-linux-gnu.
>>>
>>> Okay for trunk?
>>>
>>>   Andrea
>>>
>>> From 8869ee04e3788fdec86aa7e5a13e2eb477091d0e Mon Sep 17 00:00:00 2001
>>> From: Andrea Corallo 
>>> Date: Mon, 21 Sep 2020 13:52:45 +0100
>>> Subject: [PATCH] aarch64: Do not alter force_reg returned rtx expanding 
>>> pauth
>>>  builtins
>>>
>>> 2020-09-21  Andrea Corallo  
>>>
>>> * config/aarch64/aarch64-builtins.c
>>> (aarch64_general_expand_builtin): Do not alter value on a
>>> force_reg returned rtx.
>>> ---
>>>  gcc/config/aarch64/aarch64-builtins.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
>>> b/gcc/config/aarch64/aarch64-builtins.c
>>> index b787719cf5e..a77718ccfac 100644
>>> --- a/gcc/config/aarch64/aarch64-builtins.c
>>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>>> @@ -2079,10 +2079,10 @@ aarch64_general_expand_builtin (unsigned int fcode, 
>>> tree exp, rtx target,
>>>arg0 = CALL_EXPR_ARG (exp, 0);
>>>op0 = force_reg (Pmode, expand_normal (arg0));
>>>  
>>> -  if (!target)
>>> +  if (!(target
>>> +   && REG_P (target)
>>> +   && GET_MODE (target) == Pmode))
>>> target = gen_reg_rtx (Pmode);
>>> -  else
>>> -   target = force_reg (Pmode, target);
>>>  
>>>emit_move_insn (target, op0);
>>
>> Do we actually use the result of this move?  It looked like we always
>> use op0 rather than target (good) and overwrite target with a later move.
>>
>> If so, I think we should delete the move
>
> Good point, agree.
>
>> and convert the later code to use expand_insn.
>
> I'm not sure I understand the suggestion right: the xpaclri & friends patterns
> are written with hardcoded in/out regs, so is the suggestion to just use
> something like 'expand_insn (CODE_FOR_xpaclri, 0, NULL)' in place of
> GEN_FCN+emit_insn?

Oops, sorry for the bogus comment, didn't look closely enough.

So yeah, no need to use expand_insn.  Rather than generate a new target,
it should be OK to return lr and x17_reg directly.  (Hope I'm right
this time. ;-))

Thanks,
Richard


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Biener
On Fri, 25 Sep 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> > The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
> > with bit-precision elements correctly as the testcase shows before
> > the PR97085 fix.  The following makes it do the correct thing
> > (not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
> >
> > The alternative would be to assert such CTORs do not happen (and also
> > add IL verification for this).
> >
> > The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
> > (thus the C FE needs that), thus test coverage is quite limited (zero)
> > now and I didn't manage to convince GCC to create such CTOR for SVE
> > VnBImode vectors.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Does this look sensible?
> >
> > Thanks,
> > Richard.
> >
> > 2020-09-25  Richard Biener  
> >
> > PR middle-end/96814
> > * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
> > CTORs correctly.
> >
> > * gcc.target/i386/pr96814.c: New testcase.
> > ---
> >  gcc/expr.c  | 28 ++---
> >  gcc/testsuite/gcc.target/i386/pr96814.c | 19 +
> >  2 files changed, 40 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr96814.c
> >
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index 1a15f24b397..fb42e485089 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -6922,7 +6922,9 @@ store_constructor (tree exp, rtx target, int cleared, 
> > poly_int64 size,
> > insn_code icode = CODE_FOR_nothing;
> > tree elt;
> > tree elttype = TREE_TYPE (type);
> > -   int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> > +   int elt_size
> > + = (VECTOR_BOOLEAN_TYPE_P (type) ? TYPE_PRECISION (elttype)
> > +: tree_to_uhwi (TYPE_SIZE (elttype)));
> 
> FWIW, we now have vector_element_bits for this.

ah, didn't know this

> > machine_mode eltmode = TYPE_MODE (elttype);
> > HOST_WIDE_INT bitsize;
> > HOST_WIDE_INT bitpos;
> > @@ -6987,6 +6989,23 @@ store_constructor (tree exp, rtx target, int 
> > cleared, poly_int64 size,
> >   }
> >   }
> >  
> > +   /* Compute the size of the elements in the CTOR.  */
> > +   tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> > +   if (VECTOR_BOOLEAN_TYPE_P (type))
> > + {
> > +   if (VECTOR_TYPE_P (val_type))
> > + {
> > +   /* ???  Never seen such beast, but it's not disallowed.  */
> > +   gcc_assert (VECTOR_BOOLEAN_TYPE_P (val_type));
> > +   bitsize = (TYPE_PRECISION (TREE_TYPE (val_type))
> > +  * TYPE_VECTOR_SUBPARTS (val_type).to_constant ());

but I wonder whether it is correct?  Say, for AVX512, which uses
at least 'char' as the element type, TYPE_SIZE of that will likely be 8, so for
a hypothetical 4-element mask it would need two-bit elements
to work out, but IIRC AVX512 mask registers always use 1 bit per lane.

The target hook currently does

static opt_machine_mode
ix86_get_mask_mode (machine_mode data_mode)
{
  unsigned vector_size = GET_MODE_SIZE (data_mode);
  unsigned nunits = GET_MODE_NUNITS (data_mode);
  unsigned elem_size = vector_size / nunits;

  /* Scalar mask case.  */
  if ((TARGET_AVX512F && vector_size == 64)
  || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
{ 
  if (elem_size == 4
  || elem_size == 8
  || (TARGET_AVX512BW && (elem_size == 1 || elem_size == 2)))
return smallest_int_mode_for_size (nunits);
}

and then build_truth_vector_type_for_mode will end up building
a vector with QImode I think.  So it works but I now
wonder whether it works correctly ;)

I guess I will rework the above hunk to use the computed element
size for the non-vector element case and rely on TYPE_SIZE for
the vector element case since that's what vector_element_size does.
Simplifies the beast a bit.

> > + }
> > +   else
> > + bitsize = TYPE_PRECISION (val_type);
> > + }
> > +   else
> > + bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
> > +
> 
> What do we allow for non-boolean constructors.  E.g. for:
> 
>   v2hi = 0xf001;
> 
> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
> initialiser value allowed to be arbitrarily different from the type
> of the elements being initialised?
> 
> Or is there requirement that (say) each constructor element is either:
> 
> - a scalar that initialises one element of the constructed vector
> - a vector of N elements that initialises N elements of the constructed vector
> 
> ?
> 
> Like you say, it mostly seems like guesswork how booleans would be
> handled here, but personally I don't know the answer for non-booleans
> either :-)

There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
We only support uniform element CTORs with only trailing zeros elided.
And the elements need to either have types of the vector component
or be vectors with such component.

Re: [PATCH] c++: Fix up default initialization with consteval default ctor [PR96994]

2020-09-25 Thread Stephan Bergmann via Gcc-patches

On 15/09/2020 09:57, Jakub Jelinek via Gcc-patches wrote:

The following testcase is miscompiled (in particular the a and i
initialization).  The problem is that build_special_member_call due to
the immediate constructors (but not evaluated in constant expression mode)
doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
as the initializer for it, and then expand_default_init just emits
the returned statement, but this one doesn't have any side-effects and does
nothing.  There is an if to handle constexpr ctors which emits an INIT_EXPR
but constexpr ctors still show up as CALL_EXPR and need to be manually
evaluated to constant expressions (if possible).

The following patch fixes that, though I'm not sure about several things.
One is that the earlier if also has expr == true_exp && in the condition,
not sure if we want it in this case or not.
Another is that for delegating constructors, we emit two separate calls
and build_if_in_charge them together.  Not sure if consteval could come into
play in that case.
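
For reference, a reduced illustration of the kind of code involved (a
hypothetical example, not the PR's actual testcase):

  // A consteval default constructor must be evaluated at compile time, so
  // the front end folds the construction to a constant instead of emitting
  // a CALL_EXPR; the default-initializations below must still store the
  // computed value.
  struct S { int v; consteval S () : v (42) {} };

  S a;             // default initialization with a consteval default ctor
  int i = S ().v;  // likewise relies on the folded constructor result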


(Just reporting that with this patch applied, my build of LibreOffice 
using consteval, cf. 
 
"Turn OStringLiteral into a consteval'ed, static-refcound rtl_String", 
works fine.)




[PATCH] middle-end/97207 - implement move assign for auto_vec<>

2020-09-25 Thread Richard Biener
This implements the missing move assignment operator to make std::swap work
on auto_vec<>.
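
A small usage sketch of what this enables (illustrative only):

  /* With the move constructor and this move assignment available,
     std::swap's generic move-based implementation now works.  */
  auto_vec<int> a, b;
  a.safe_push (1);
  a.safe_push (2);
  std::swap (a, b);   /* b now owns the two elements, a is empty */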

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-09-25  Richard Biener  

PR middle-end/97207
* vec.h (auto_vec::operator=(auto_vec&&)): Implement.
---
 gcc/vec.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/vec.h b/gcc/vec.h
index d73d865cff2..d8c7cdac073 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1546,7 +1546,13 @@ public:
   this->m_vec = r.m_vec;
   r.m_vec = NULL;
 }
-  void operator= (auto_vec&&) = delete;
+  auto_vec& operator= (auto_vec&& r)
+{
+  this->release ();
+  this->m_vec = r.m_vec;
+  r.m_vec = NULL;
+  return *this;
+}
 };
 
 
-- 
2.26.2


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
> Hi, Richard,
>
> As you suggested, I added a default implementation of the target hook
> “zero_call_used_regs (HARD_REG_SET)” as follows in my latest patch:
>
>
> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>
> void
> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)

FWIW, I was suggesting to return the set of registers that are actually
cleared too.  Here you have the hook emit the asm statement, but IMO the
way we generate the asm for a given set of registers should be entirely
target-independent, and happen outside the hook.

So the hook returning the set of cleared registers does two things:

(1) It indicates which registers should be clobbered by the asm
(which would be generated after calling the hook, but emitted
before the sequence of instructions generated by the hook).

(2) It indicates which registers should be treated as live on return.

FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
Then the wrapper around EPILOGUE_USES that we talked about would
check two things:

- EPILOGUE_USES itself
- the crtl HARD_REG_SET

The crtl set would start out empty and remain empty unless the
new option is used.
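
A rough sketch of that wrapper, assuming the cleared set is stored in a new
crtl field (both the function and the field name below are made up):

  /* A register counts as used by the epilogue if either the target says
     so or it was zeroed on this return path.  */
  static bool
  epilogue_uses_p (unsigned int regno)
  {
    return (EPILOGUE_USES (regno)
            || TEST_HARD_REG_BIT (crtl->zeroed_call_used_regs, regno));
  }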

> if (zero_rtx[(int)mode] == NULL_RTX)
>   {
> zero_rtx[(int)mode] = reg;
> tmp = gen_rtx_SET (reg, const0_rtx);
> emit_insn (tmp);
>   }
> else
>   emit_move_insn (reg, zero_rtx[(int)mode]);

Hmm, OK, so you're assuming that it's better to zero one register
and reuse that register for later moves.  I guess this is my RISC
background/bias showing, but I think it might be simpler to assume
that zeroing is as cheap as a register move.  The danger with reusing
earlier registers is that you might introduce a cross-bank move,
and some targets can only do those via memory.

Or perhaps we could use insn_cost to choose between them.  But I think
the first implementation can just zero each register individually,
unless we already know of a specific case in which reusing registers
is necessary.
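
A minimal sketch of the per-register variant, assuming the same locals as the
quoted hook (illustrative only):

  /* Zero every selected register individually and let the target's move
     patterns pick the cheapest way to materialize zero in each mode.  */
  hard_reg_set_iterator hrsi;
  unsigned int regno;
  EXECUTE_IF_SET_IN_HARD_REG_SET (need_zeroed_hardregs, 0, regno, hrsi)
    {
      machine_mode mode = reg_raw_mode[regno];
      emit_move_insn (gen_rtx_REG (mode, regno), CONST0_RTX (mode));
    }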

> I tested this default implementation on aarch64 with a small test case;
> -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well,
> however
> -fzero-call-used-regs=all-arg and -fzero-call-used-regs=all hit an internal
> compiler error as follows:
>
> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>15 | }
>   | ^
> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>   ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>   ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>   ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>
> As I studied this today, I found that the major issue behind this bug is the
> following statement:
>
> machine_mode mode = reg_raw_mode[regno];
>
> “reg_raw_mode” returns E_TImode for the aarch64 register V0 (which is a vector
> register on aarch64); as a result, the zeroing insn for this register is:
>
> (insn 112 111 113 7 (set (reg:TI 32 v0)
> (const_int 0 [0])) "t1.c":15:1 -1
>  (nil))
>
>
> However, it looks like the above RTL has to be split into two sub-register
> moves on aarch64, and the splitting has some issue.
>
> So, I guess that on aarch64, zeroing vector registers might need other modes 
> than the one returned by “reg_raw_mode”.  
>
> My questions are:
>
> 1. Is there another available utility routine that returns the proper MODE
> for a hard register that can be readily used to zero that hard register?
> 2. If not, should I add one more target hook for this purpose? I.e.
>
> /* Return the proper machine mode that can be used to zero this hard register 
> specified by REGNO.  */
> machine_mode zero-call-used-regs-mode (unsigned int REGNO)

Thanks for testing aarch64.  I think there are two issues here,
one in the patch and one in the aarch64 backend:

- the patch should use emit_move_insn rather than use gen_rtx_SET directly.

- the aarch64 backend doesn't handle zeroing TImode vector registers,
  but should.  E.g. for:

void
foo ()
{
  register __int128_t q0 asm ("q0");
  q0 = 0;
  asm volatile ("" :: "w" (q0));
}

  we generate:

mov x0, 0
mov x1, 0
fmovd0, x0
fmovv0.d[1], x1

  which is, er, somewhat suboptimal.

I'll try to fix the aarch64 bug for Monday next week.

Thanks,
Richard


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
>> What do we allow for non-boolean constructors.  E.g. for:
>> 
>>   v2hi = 0xf001;
>> 
>> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
>> initialiser value allowed to be arbitrarily different from the type
>> of the elements being initialised?
>> 
>> Or is there requirement that (say) each constructor element is either:
>> 
>> - a scalar that initialises one element of the constructed vector
>> - a vector of N elements that initialises N elements of the constructed 
>> vector
>> 
>> ?
>> 
>> Like you say, it mostly seems like guesswork how booleans would be
>> handled here, but personally I don't know the answer for non-booleans
>> either :-)
>
> There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
> We only support uniform element CTORs with only trailing zeros elided.
> And the elements need to either have types of the vector component
> or be vectors with such component.

Ah, great.  So in that case, could we ditch bitsize altogether and
just use:

  unsigned int nelts = (VECTOR_TYPE_P (val_type)
? TYPE_VECTOR_SUBPARTS (val_type).to_constant () : 1);

or equivalent to work out the number of elements being initialised
by each constructor element?

Thanks,
Richard


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 25 Sep 2020, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> >> What do we allow for non-boolean constructors.  E.g. for:
>> >> 
>> >>   v2hi = 0xf001;
>> >> 
>> >> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
>> >> initialiser value allowed to be arbitrarily different from the type
>> >> of the elements being initialised?
>> >> 
>> >> Or is there requirement that (say) each constructor element is either:
>> >> 
>> >> - a scalar that initialises one element of the constructed vector
>> >> - a vector of N elements that initialises N elements of the constructed 
>> >> vector
>> >> 
>> >> ?
>> >> 
>> >> Like you say, it mostly seems like guesswork how booleans would be
>> >> handled here, but personally I don't know the answer for non-booleans
>> >> either :-)
>> >
>> > There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
>> > We only support uniform element CTORs with only trailing zeros elided.
>> > And the elements need to either have types of the vector component
>> > or be vectors with such component.
>> 
>> Ah, great.  So in that case, could we ditch bitsize altogether and
>> just use:
>> 
>>   unsigned int nelts = (VECTOR_TYPE_P (val_type)
>>  ? TYPE_VECTOR_SUBPARTS (val_type).to_constant () : 1);
>> 
>> or equivalent to work out the number of elements being initialised
>> by each constructor element?
>
> But
>
>store_constructor_field (target, bitsize, bitpos, 0,
>  bitregion_end, value_mode,
>  value, cleared, alias, reverse);
>
> still wants the bits to initialize (for the original testcase
> the vector had only the first 4 elements initialized,
> at wrong bit positions and sizes - QImode).
>
> But yes, I'm sure we can eventually simplify this further.
> FYI, the following passed bootstrap, regtest is still running
> (but as said, test coverage dropped to zero).

LGTM FWIW.

Thanks,
Richard

>
> Richard.
>
> commit d16b5975ca985cbe97698479fc38b6a636886978
> Author: Richard Biener 
> Date:   Fri Sep 25 11:13:13 2020 +0200
>
> middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion
> 
> The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
> with bit-precision elements correctly as the testcase shows before
> the PR97085 fix.  The following makes it do the correct thing
> (not 100% sure for CTOR of sub-vectors due to the lack of a testcase).
> 
> The alternative would be to assert such CTORs do not happen (and also
> add IL verification for this).
> 
> The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
> (thus the C FE needs that).
> 
> 2020-09-25  Richard Biener  
> 
> PR middle-end/96814
> * expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
> CTORs correctly.
> 
> * gcc.target/i386/pr96814.c: New testcase.
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 1a15f24b397..1c79518ee4d 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -6922,7 +6922,7 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>   insn_code icode = CODE_FOR_nothing;
>   tree elt;
>   tree elttype = TREE_TYPE (type);
> - int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> + int elt_size = vector_element_bits (type);
>   machine_mode eltmode = TYPE_MODE (elttype);
>   HOST_WIDE_INT bitsize;
>   HOST_WIDE_INT bitpos;
> @@ -6987,6 +6987,15 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
> }
> }
>  
> + /* Compute the size of the elements in the CTOR.  It differs
> +from the size of the vector type elements only when the
> +CTOR elements are vectors themselves.  */
> + tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> + if (VECTOR_TYPE_P (val_type))
> +   bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
> + else
> +   bitsize = elt_size;
> +
>   /* If the constructor has fewer elements than the vector,
>  clear the whole array first.  Similarly if this is static
>  constructor of a non-BLKmode object.  */
> @@ -7001,11 +7010,7 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>  
>   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
> {
> - tree sz = TYPE_SIZE (TREE_TYPE (value));
> - int n_elts_here
> -   = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
> -TYPE_SIZE (elttype)));
> -
> + int n_elts_here = bitsize / elt_size;
>   count += n_elts_here;
>   if (mostly_zeros_p (value))
> zero_count += n_elts_here;
> @@ -7045,7 +7050,6 @@ store_constructor (tree exp, rtx target, int cleared, 
> poly_int64 size,
>  

Re: [PATCH] generalized range_query class for multiple contexts

2020-09-25 Thread Andrew MacLeod via Gcc-patches

On 9/24/20 5:51 PM, Martin Sebor via Gcc-patches wrote:

On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:



3. Conversion of sprintf/strlen pass to class.

This is a nonfunctional change to the sprintf/strlen passes. That is, 
no effort was made to change the passes to multi-ranges.  However, 
with this patch, we are able to plug in a ranger or evrp with just a 
few lines, since the range query mechanisms share a common API.


Thanks for doing all this!  There isn't anything I don't understand
in the sprintf changes so no questions from me (well, almost none).
Just some comments:

The current call statement is available in all functions that take
a directive argument, as dir->info.callstmt.  There should be no need
to also add it as a new argument to the functions that now need it.

The change adds code along these lines in a bunch of places:

+  value_range vr;
+  if (!query->range_of_expr (vr, arg, stmt))
+    vr.set_varying (TREE_TYPE (arg));

I thought under the new Ranger APIs when a range couldn't be
determined it would be automatically set to the maximum for
the type.  I like that and have been moving in that direction
with my code myself (rather than having an API fail, have it
set the max range and succeed).

Aldy will have to comment on why that is there; probably an oversight.  The
API should return VARYING if it can't calculate a better range.  The only
time the API returns FALSE for a query is when the range is
unsupported, i.e., you ask for the range of a float statement or argument.
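
In other words, the defensive set_varying in the quoted hunk only matters for
operand types the range machinery does not support; for known integral or
pointer operands the call could be reduced to the following sketch:

  value_range vr;
  query->range_of_expr (vr, arg, stmt);  /* VARYING when nothing better is known.  */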


Andrew



Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
>
> Hello.
>
> All right, I came up with a rapid speed-up that can allow us to remove
> the introduced parameter. It contains 2 parts:
> - BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
> - JT: we spend quite some time in density calculation; we can guess it first
>and it leads to a fast bail-out.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Err

+  auto_vec dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }

vec::contains is linear search so no.  Was this for the length check?
Just do

 if (bitmap_set_bit (...))
  {
length++;
if (length > ...)

> Thanks,
> Martin
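
Spelled out, the suggestion amounts to something like the following sketch
(the names roughly follow the quoted patch and tree-switch-conversion; treat
this as an illustration, not the actual change):

  /* bitmap_set_bit returns true only when the bit was newly set, so it can
     double as the "distinct destination block" test, while a plain counter
     enforces the limit.  */
  auto_bitmap dest_bbs;
  unsigned int count = 0;

  for (unsigned i = start; i <= end; i++)
    {
      simple_cluster *sc = static_cast<simple_cluster *> (clusters[i]);
      if (bitmap_set_bit (dest_bbs, sc->m_case_bb->index))
        {
          count++;
          if (count > m_max_case_bit_tests)
            return false;
        }
    }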


Re: [PATCH] middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

2020-09-25 Thread Richard Biener
On Fri, 25 Sep 2020, Richard Sandiford wrote:

> Richard Biener  writes:
> >> What do we allow for non-boolean constructors.  E.g. for:
> >> 
> >>   v2hi = 0xf001;
> >> 
> >> do we allow the CONSTRUCTOR to be { 0xf001 }?  Is the type of an
> >> initialiser value allowed to be arbitrarily different from the type
> >> of the elements being initialised?
> >> 
> >> Or is there requirement that (say) each constructor element is either:
> >> 
> >> - a scalar that initialises one element of the constructed vector
> >> - a vector of N elements that initialises N elements of the constructed 
> >> vector
> >> 
> >> ?
> >> 
> >> Like you say, it mostly seems like guesswork how booleans would be
> >> handled here, but personally I don't know the answer for non-booleans
> >> either :-)
> >
> > There's extensive checking in tree-cfg.c for vector CTORs meanwhile.
> > We only support uniform element CTORs with only trailing zeros elided.
> > And the elements need to either have types of the vector component
> > or be vectors with such component.
> 
> Ah, great.  So in that case, could we ditch bitsize altogether and
> just use:
> 
>   unsigned int nelts = (VECTOR_TYPE_P (val_type)
>   ? TYPE_VECTOR_SUBPARTS (val_type).to_constant () : 1);
> 
> or equivalent to work out the number of elements being initialised
> by each constructor element?

But

   store_constructor_field (target, bitsize, bitpos, 0,
 bitregion_end, value_mode,
 value, cleared, alias, reverse);

still wants the bits to initialize (for the original testcase
the vector had only the first 4 elements initialized,
at wrong bit positions and sizes - QImode).

But yes, I'm sure we can eventually simplify this further.
FYI, the following passed bootstrap, regtest is still running
(but as said, test coverage dropped to zero).

Richard.

commit d16b5975ca985cbe97698479fc38b6a636886978
Author: Richard Biener 
Date:   Fri Sep 25 11:13:13 2020 +0200

middle-end/96814 - fix VECTOR_BOOLEAN_TYPE_P CTOR RTL expansion

The RTL expansion code for CTORs doesn't handle VECTOR_BOOLEAN_TYPE_P
with bit-precision elements correctly as the testcase shows before
the PR97085 fix.  The following makes it do the correct thing
(not 100% sure for CTOR of sub-vectors due to the lack of a testcase).

The alternative would be to assert such CTORs do not happen (and also
add IL verification for this).

The GIMPLE FE needs a way to declare the VECTOR_BOOLEAN_TYPE_P vectors
(thus the C FE needs that).

2020-09-25  Richard Biener  

PR middle-end/96814
* expr.c (store_constructor): Handle VECTOR_BOOLEAN_TYPE_P
CTORs correctly.

* gcc.target/i386/pr96814.c: New testcase.

diff --git a/gcc/expr.c b/gcc/expr.c
index 1a15f24b397..1c79518ee4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -6922,7 +6922,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
insn_code icode = CODE_FOR_nothing;
tree elt;
tree elttype = TREE_TYPE (type);
-   int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
+   int elt_size = vector_element_bits (type);
machine_mode eltmode = TYPE_MODE (elttype);
HOST_WIDE_INT bitsize;
HOST_WIDE_INT bitpos;
@@ -6987,6 +6987,15 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
  }
  }
 
+   /* Compute the size of the elements in the CTOR.  It differs
+  from the size of the vector type elements only when the
+  CTOR elements are vectors themselves.  */
+   tree val_type = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
+   if (VECTOR_TYPE_P (val_type))
+ bitsize = tree_to_uhwi (TYPE_SIZE (val_type));
+   else
+ bitsize = elt_size;
+
/* If the constructor has fewer elements than the vector,
   clear the whole array first.  Similarly if this is static
   constructor of a non-BLKmode object.  */
@@ -7001,11 +7010,7 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
 
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
  {
-   tree sz = TYPE_SIZE (TREE_TYPE (value));
-   int n_elts_here
- = tree_to_uhwi (int_const_binop (TRUNC_DIV_EXPR, sz,
-  TYPE_SIZE (elttype)));
-
+   int n_elts_here = bitsize / elt_size;
count += n_elts_here;
if (mostly_zeros_p (value))
  zero_count += n_elts_here;
@@ -7045,7 +7050,6 @@ store_constructor (tree exp, rtx target, int cleared, 
poly_int64 size,
HOST_WIDE_INT eltpos;
tree value = ce->value;
 
-   bitsize = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (value)));
if (cleared && initializer_zerop (value))
   

Re: [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-25 Thread Richard Sandiford
xionghu luo  writes:
> @@ -2658,6 +2659,45 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>  
>  #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
>  
> +/* Expand VEC_SET internal functions.  */
> +
> +static void
> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree op0 = gimple_call_arg (stmt, 0);
> +  tree op1 = gimple_call_arg (stmt, 1);
> +  tree op2 = gimple_call_arg (stmt, 2);
> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rtx src = expand_normal (op0);
> +
> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
> +  scalar_mode innermode = GET_MODE_INNER (outermode);
> +
> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);

These two can just use expand_normal.  Might be easier to read if
they come immediately after the expand_normal (op0).

LGTM with that change for the internal-fn.c stuff, thanks.

Richard


Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

On 9/25/20 3:18 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:


Hello.

All right, I came up with a rapid speed-up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
- JT: we spend quite some time in density calculation; we can guess it first
and it leads to a fast bail-out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err

+  auto_vec dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }


That's intentional as m_max_case_bit_tests is a very small number (3) and
I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
is a constant operation.



vec::contains is linear search so no.  Was this for the length check?
Just do

  if (bitmap_set_bit (...))
   {
 length++;
 if (length > ...)


I would need here bitmap_count_bits. Do you prefer it?

Martin




Thanks,
Martin




[PATCH] assorted improvements for fold_truth_andor_1

2020-09-25 Thread Alexandre Oliva


This patch introduces various improvements to the logic that merges
field compares.

Before the patch, we could merge:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

into something like:

  (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)

if both of A's fields live within the same alignment boundaries, and
so do B's, at the same relative positions.  Constants may be used
instead of the object B.

The initial goal of this patch was to enable such combinations when a
field crossed alignment boundaries, e.g. for packed types.  We can't
generally access such fields with a single memory access, so when we
come across such a compare, we will attempt to combine each access
separately.
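
As a concrete illustration (not taken from the new testcases), a layout like

  struct __attribute__ ((packed)) S { unsigned char x1; unsigned int y1; };

puts y1 in bytes 1..4, straddling the 4-byte alignment boundary, so no single
naturally aligned word load can fetch it.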

Some merging opportunities were missed because of right-shifts,
compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
narrowing conversions, especially after earlier merges.  This patch
introduces handlers for several cases involving these.

Other merging opportunities were missed because of association.  The
existing logic would only succeed in merging a pair of consecutive
compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying
e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D.  I've generalized the
handling of the rightmost compare in the left-hand operand, going for
the leftmost compare in the right-hand operand, and then onto trying
to merge compares pairwise, one from each operand, even if they are
not consecutive, taking care to avoid merging operations with
intervening side effects, including volatile accesses.

When it is the second of a non-consecutive pair of compares that first
accesses a word, we may merge the first compare with part of the
second compare that refers to the same word, keeping the compare of
the remaining bits at the spot where the second compare used to be.

Handling compares with non-constant fields was somewhat generalized,
now handling non-adjacent fields.  When a field of one object crosses
an alignment boundary but the other doesn't, we issue the same load in
both compares; gimple optimizers will later turn it into a single
load, without our having to handle SAVE_EXPRs at this point.

The logic for issuing split loads and compares, and ordering them, is
now shared between all cases of compares with constants and with
another object.

The -Wno-error for toplev.o on rs6000 is because of toplev.c's:

  if ((flag_sanitize & SANITIZE_ADDRESS)
  && !FRAME_GROWS_DOWNWARD)

and rs6000.h's:

#define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0   \
  || (flag_sanitize & SANITIZE_ADDRESS) != 0)

The mutually exclusive conditions involving flag_sanitize are now
noticed and reported by fold-const.c's:

  warning (0,
   "% of mutually exclusive equal-tests"
   " is always 0");

This patch enables over 12k compare-merging opportunities that we used
to miss in a GCC bootstrap.

Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* fold-const.c (prepare_xor): New.
(decode_field_reference): Handle xor, shift, and narrowing
conversions.
(all_ones_mask_p): Remove.
(compute_split_boundary_from_align): New.
(build_split_load, reuse_split_load): New.
(fold_truth_andor_1): Add recursion to combine pairs of
non-neighboring compares.  Handle xor compared with zero.
Handle fields straddling across alignment boundaries.
Generalize handling of non-constant rhs.
(fold_truth_andor): Leave sub-expression handling to the
recursion above.
* config/rs6000/t-rs6000 (toplev.o-warn): Disable errors.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-1.c: New.
* gcc.dg/field-merge-2.c: New.
* gcc.dg/field-merge-3.c: New.
* gcc.dg/field-merge-4.c: New.
* gcc.dg/field-merge-5.c: New.
---
 gcc/config/rs6000/t-rs6000   |4 
 gcc/fold-const.c |  818 --
 gcc/testsuite/gcc.dg/field-merge-1.c |   64 +++
 gcc/testsuite/gcc.dg/field-merge-2.c |   31 +
 gcc/testsuite/gcc.dg/field-merge-3.c |   36 +
 gcc/testsuite/gcc.dg/field-merge-4.c |   40 ++
 gcc/testsuite/gcc.dg/field-merge-5.c |   40 ++
 7 files changed, 882 insertions(+), 151 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c
 create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c

diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb572..516486d 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -52,6 +52,10 @@ $(srcdir)/config/rs6000/rs6000-tables.opt: 
$(srcdir)/config/rs6000/genopt.sh \
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
$(

[PATCH v2] c++/97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

2020-09-25 Thread Jakub Jelinek via Gcc-patches
Hi!

On Fri, Sep 25, 2020 at 01:37:16PM +0200, Richard Biener wrote:
> See my comment above for Martin's attempts to improve things.  I don't
> really want to try decide what to do with those late diagnostic IL
> printing but my commit was blamed for showing target-mem-ref unsupported.
> 
> I don't have much time to spend to think what to best print and what not,
> but yes, printing only the MEM_REF part is certainly imprecise.

Here is an updated version of the patch that prints TARGET_MEM_REF the way
it should be printed - as C representation of what it actually means.
Of course it would be better to have the original expressions, but with the
late diagnostics we no longer have them.
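
As a rough illustration (a made-up example, not output produced by this patch),
a TARGET_MEM_REF with base p, index i, step 4 and offset 8 read as an int would
now be printed as something like

  *(int *)((char *) p + i * 4 + 8)

rather than being reported as unsupported.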

Ok for trunk if it passes bootstrap/regtest?

2020-09-25  Richard Biener  
Jakub Jelinek  

PR c++/97197
cp/
* error.c (dump_expr): Handle TARGET_MEM_REF.
c-family/
* c-pretty-print.c: Include langhooks.h.
(c_pretty_printer::postfix_expression): Handle TARGET_MEM_REF as
expression.
(c_pretty_printer::expression): Handle TARGET_MEM_REF as
unary_expression.
(c_pretty_printer::unary_expression): Handle TARGET_MEM_REF.

--- gcc/c-family/c-pretty-print.c.jj2020-09-21 11:15:53.600520132 +0200
+++ gcc/c-family/c-pretty-print.c   2020-09-25 15:21:26.034477251 +0200
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.
 #include "intl.h"
 #include "tree-pretty-print.h"
 #include "selftest.h"
+#include "langhooks.h"
 
 /* The pretty-printer code is primarily designed to closely follow
(GNU) C and C++ grammars.  That is to be contrasted with spaghetti
@@ -1693,6 +1694,7 @@ c_pretty_printer::postfix_expression (tr
   break;
 
 case MEM_REF:
+case TARGET_MEM_REF:
   expression (e);
   break;
 
@@ -1859,6 +1861,55 @@ c_pretty_printer::unary_expression (tree
}
   break;
 
+case TARGET_MEM_REF:
+  pp_c_star (this);
+  if (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (TMR_BASE (e == NULL_TREE
+ || !integer_onep (TYPE_SIZE_UNIT
+   (TREE_TYPE (TREE_TYPE (TMR_BASE (e))
+   {
+ if (TYPE_SIZE_UNIT (TREE_TYPE (e))
+ && integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (e
+   {
+ pp_c_left_paren (this);
+ pp_c_type_cast (this, build_pointer_type (TREE_TYPE (e)));
+   }
+ else
+   {
+ pp_c_type_cast (this, build_pointer_type (TREE_TYPE (e)));
+ pp_c_left_paren (this);
+ pp_c_type_cast (this, build_pointer_type (char_type_node));
+   }
+   }
+  else if (!lang_hooks.types_compatible_p
+ (TREE_TYPE (e), TREE_TYPE (TREE_TYPE (TMR_BASE (e)
+   {
+ pp_c_type_cast (this, build_pointer_type (TREE_TYPE (e)));
+ pp_c_left_paren (this);
+   }
+  else
+   pp_c_left_paren (this);
+  pp_c_cast_expression (this, TMR_BASE (e));
+  if (TMR_STEP (e) && TMR_INDEX (e))
+   {
+ pp_plus (this);
+ pp_c_cast_expression (this, TMR_INDEX (e));
+ pp_c_star (this);
+ pp_c_cast_expression (this, TMR_STEP (e));
+   }
+  if (TMR_INDEX2 (e))
+   {
+ pp_plus (this);
+ pp_c_cast_expression (this, TMR_INDEX2 (e));
+   }
+  if (!integer_zerop (TMR_OFFSET (e)))
+   {
+ pp_plus (this);
+ pp_c_integer_constant (this,
+fold_convert (ssizetype, TMR_OFFSET (e)));
+   }
+  pp_c_right_paren (this);
+  break;
+
 case REALPART_EXPR:
 case IMAGPART_EXPR:
   pp_c_ws_string (this, code == REALPART_EXPR ? "__real__" : "__imag__");
@@ -2295,6 +2346,7 @@ c_pretty_printer::expression (tree e)
 case ADDR_EXPR:
 case INDIRECT_REF:
 case MEM_REF:
+case TARGET_MEM_REF:
 case NEGATE_EXPR:
 case BIT_NOT_EXPR:
 case TRUTH_NOT_EXPR:
--- gcc/cp/error.c.jj   2020-07-28 15:39:09.780759362 +0200
+++ gcc/cp/error.c  2020-09-25 15:30:17.452823375 +0200
@@ -2400,6 +2400,57 @@ dump_expr (cxx_pretty_printer *pp, tree
}
   break;
 
+case TARGET_MEM_REF:
+  pp_cxx_star (pp);
+  pp_cxx_left_paren (pp);
+  if (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (TMR_BASE (t == NULL_TREE
+ || !integer_onep (TYPE_SIZE_UNIT
+   (TREE_TYPE (TREE_TYPE (TMR_BASE (t))
+   {
+ if (TYPE_SIZE_UNIT (TREE_TYPE (t))
+ && integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (t
+   {
+ pp_cxx_left_paren (pp);
+ dump_type (pp, build_pointer_type (TREE_TYPE (t)), flags);
+   }
+ else
+   {
+ dump_type (pp, build_pointer_type (TREE_TYPE (t)), flags);
+ pp_cxx_right_paren (pp);
+ pp_cxx_left_paren (pp);
+ pp_cxx_left_paren (pp);
+ dump_type (pp, build_pointer_type (char_type_node), flags);
+   }
+ pp_cxx_right_pa

[committed][nvptx] Fix Wimplicit-fallthrough in nvptx.c with -save-temps

2020-09-25 Thread Tom de Vries
Hi,

When compiling nvptx.c using -save-temps, I ran into Wimplicit-fallthrough
warnings.

The fallthrough locations have been marked with a fallthrough comment, but
that doesn't work with -save-temps, something that has been filed as
PR78497.

Work around this by using gcc_fallthrough () in addition to the comment.

Tested by building target nvptx, copying nvptx.c compile line and adding
-save-temps.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix Wimplicit-fallthrough in nvptx.c with -save-temps

gcc/ChangeLog:

2020-09-25  Tom de Vries  

* config/nvptx/nvptx.c (nvptx_assemble_integer, nvptx_print_operand):
Use gcc_fallthrough ().

---
 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 54b1fdf669b..de82f9ab875 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2101,7 +2101,7 @@ nvptx_assemble_integer (rtx x, unsigned int size, int 
ARG_UNUSED (aligned_p))
   val = INTVAL (XEXP (x, 1));
   x = XEXP (x, 0);
   gcc_assert (GET_CODE (x) == SYMBOL_REF);
-  /* FALLTHROUGH */
+  gcc_fallthrough (); /* FALLTHROUGH */
 
 case SYMBOL_REF:
   gcc_assert (size == init_frag.size);
@@ -2603,7 +2603,7 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 {
 case 'A':
   x = XEXP (x, 0);
-  /* FALLTHROUGH.  */
+  gcc_fallthrough (); /* FALLTHROUGH. */
 
 case 'D':
   if (GET_CODE (x) == CONST)


Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:
>
> On 9/25/20 3:18 PM, Richard Biener wrote:
> > On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> All right, I came up with a rapid speed up that can allow us to remove
> >> the introduced parameter. It contains 2 parts:
> >> - BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
> >> - JT: we spent quite some time in density calculation, we can guess it 
> >> first
> >> and it leads to a fast bail out.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > Err
> >
> > +  auto_vec dest_bbs;
> > -  auto_bitmap dest_bbs;
> >
> > -  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
> > +  if (!dest_bbs.contains (sc->m_case_bb->index))
> > +   {
> > + dest_bbs.safe_push (sc->m_case_bb->index);
> > + if (dest_bbs.length () > m_max_case_bit_tests)
> > +   return false;
> > +   }
>
> That's intentional as m_max_case_bit_tests is a very small number (3) and
> I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
> is a constant operation.

You're storing bb->index and formerly set bb->index bit, what's the difference?

For max 3 elements a vector is OK, of course but there should be a comment
that says this ;)  The static const is 'int' so it can in principle
hold up to two billion ;)

> >
> > vec::contains is linear search so no.  Was this for the length check?
> > Just do
> >
> >   if (bitmap_set_bit (...))
> >{
> >  length++;
> >  if (length > ...)
>
> I would need here bitmap_count_bits. Do you prefer it?

bitmap_set_bit returns false if the bit was already set so you can
count as you add bits, see the length++ above.
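
I.e. something like this (only a sketch; length would be a new local counter):

  if (bitmap_set_bit (dest_bbs, sc->m_case_bb->index))
    {
      length++;
      if (length > m_max_case_bit_tests)
        return false;
    }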

For three elements the vec will be faster though.  May I suggest
to use

 auto_vec dest_bbs;

then and quick_push rather than safe_push (need to guard the
push with the max_case_bit_test).

Richard.



> Martin
>
> >
> >> Thanks,
> >> Martin
>


PING^2 [GCC 10] [PATCH] IRA: Don't make a global register eliminable

2020-09-25 Thread H.J. Lu via Gcc-patches
On Tue, Sep 22, 2020 at 10:48 AM H.J. Lu  wrote:
>
> On Fri, Sep 18, 2020 at 10:21 AM H.J. Lu  wrote:
> >
> > On Thu, Sep 17, 2020 at 3:52 PM Jeff Law  wrote:
> > >
> > >
> > > On 9/16/20 8:46 AM, Richard Sandiford wrote:
> > >
> > > "H.J. Lu"  writes:
> > >
> > > On Tue, Sep 15, 2020 at 7:44 AM Richard Sandiford
> > >  wrote:
> > >
> > > Thanks for looking at this.
> > >
> > > "H.J. Lu"  writes:
> > >
> > > commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
> > > Author: Richard Sandiford 
> > > Date:   Wed Oct 2 07:37:10 2019 +
> > >
> > > [LRA] Don't make eliminable registers live (PR91957)
> > >
> > > didn't make eliminable registers live which breaks
> > >
> > > register void *cur_pro asm("reg");
> > >
> > > where "reg" is an eliminable register.  Make fixed eliminable registers
> > > live to fix it.
> > >
> > > I don't think fixedness itself is the issue here: it's usual for at
> > > least some registers involved in eliminations to be fixed registers.
> > >
> > > I think what makes this case different is instead that cur_pro/ebp
> > > is a global register.  But IMO things have already gone wrong if we
> > > think that a global register is eliminable.
> > >
> > > So I wonder if instead we should check global_regs at the beginning of:
> > >
> > >   for (i = 0; i < fp_reg_count; i++)
> > > if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
> > > HARD_FRAME_POINTER_REGNUM + i))
> > >   {
> > > SET_HARD_REG_BIT (eliminable_regset,
> > >   HARD_FRAME_POINTER_REGNUM + i);
> > > if (frame_pointer_needed)
> > >   SET_HARD_REG_BIT (ira_no_alloc_regs,
> > > HARD_FRAME_POINTER_REGNUM + i);
> > >   }
> > > else if (frame_pointer_needed)
> > >   error ("%s cannot be used in % here",
> > >  reg_names[HARD_FRAME_POINTER_REGNUM + i]);
> > > else
> > >   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
> > >
> > > (ira_setup_eliminable_regset), and handle the global_regs[] case in
> > > the same way as the else case, i.e. short-circuiting both of the ifs.
> > >
> > > Like this?
> > >
> > > Sorry for the delay.  I was testing this in parallel.
> > >
> > > Bootstrapped & regression-tested on x86_64-linux-gnu.
> > >
> > > Thanks,
> > > Richard
> > >
> > >
> > > 0001-ira-Fix-elimination-for-global-hard-FPs-PR91957.patch
> > >
> > > From af4499845d26fe65573b21197a79fd22fd38694e Mon Sep 17 00:00:00 2001
> > > From: "H.J. Lu" 
> > > Date: Tue, 15 Sep 2020 06:23:26 -0700
> > > Subject: [PATCH] ira: Fix elimination for global hard FPs [PR91957]
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=UTF-8
> > > Content-Transfer-Encoding: 8bit
> > >
> > > If the hard frame pointer is being used as a global register,
> > > we should skip the usual handling for eliminations.  As the
> > > comment says, the register cannot in that case be eliminated
> > > (or eliminated to) and is already marked live where appropriate.
> > >
> > > Doing this removes the duplicate error for gcc.target/i386/pr82673.c.
> > > The “cannot be used in 'asm' here” message is meant to be for asm
> > > statements rather than register asms, and the function that the
> > > error is reported against doesn't use asm.
> > >
> > > gcc/
> > > 2020-09-16  Richard Sandiford  
> > >
> > > PR middle-end/91957
> > > * ira.c (ira_setup_eliminable_regset): Skip the special elimination
> > > handling of the hard frame pointer if the hard frame pointer is fixed.
> > >
> > > gcc/testsuite/
> > > 2020-09-16  H.J. Lu  
> > >Richard Sandiford  
> > >
> > > PR middle-end/91957
> > > * g++.target/i386/pr97054.C: New test.
> > > * gcc.target/i386/pr82673.c: Remove redundant extra message.
> > >
> > > OK
> >
> > OK for GCC 10 branch?
> >
> > Thanks.
>
> PING:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554268.html
>

PING.


-- 
H.J.


Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 25, 2020 at 11:13:06AM +0200, Martin Liška wrote:
> --- a/gcc/tree-switch-conversion.c
> +++ b/gcc/tree-switch-conversion.c
> @@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec<cluster *> &clusters,
>if (range == 0)
>  return false;
>  
> +  unsigned HOST_WIDE_INT lhs = 100 * range;
> +  if (lhs < range)
> +return false;

If this test is meant to detect when 100 * range has overflowed,
then I think it is insufficient.
Perhaps do
  if (range > HOST_WIDE_INT_M1U / 100)
return false;

  unsigned HOST_WIDE_INT lhs = 100 * range;
instead?
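
For example (assuming a 64-bit HOST_WIDE_INT), with range equal to 1 << 58 the
product 100 * range wraps around to 36 << 58, which is still larger than range,
so the lhs < range test would not catch the overflow.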

Jakub



c++: DECL_BUILTIN_P for builtins

2020-09-25 Thread Nathan Sidwell


We currently detect builtin decls via DECL_ARTIFICIAL &&
!DECL_HIDDEN_FUNCTION_P, which, besides being clunky, is a problem as
hiddenness is a property of the symbol table -- not the decl being
hidden.  This adds DECL_BUILTIN_P, which just looks at the
SOURCE_LOCATION -- we have a magic one for builtins.

One of the consequential changes is to make function-scope omp udrs
have function context (needed because otherwise duplicate-decls thinks
the types don't match at the point we check).  This is also morally
better, because that's what they are -- nested functions, stop lying.

(That's actually my plan for all DECL_LOCAL_DECL_P decls, as they are
distinct decls to the namespace-scope decl they alias.)

gcc/cp/
* cp-tree.h (DECL_BUILTIN_P): New.
* decl.c (duplicate_decls): Use it.  Do not treat omp-udr as a
builtin.
* name-lookup.c (anticipated_builtin): Use it.
(set_decl_context_in_fn): Function-scope OMP UDRs have function 
context.

(do_nonmember_using_decl): Use DECL_BUILTIN_P.
* parser.c (cp_parser_omp_declare_reduction): Function-scope OMP
UDRs have function context.  Assert we never find a valid 
duplicate.
* pt.c (tsubst_expr): Function-scope OMP UDRs have function 
context.

libcc1/
* libcp1plugin.cc (supplement_binding): Use DECL_BULTIN_P.

pushing to trunk

nathan

--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 3ae48749b3d..bd78f00ba97 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -4040,6 +4040,10 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define FNDECL_USED_AUTO(NODE) \
   TREE_LANG_FLAG_2 (FUNCTION_DECL_CHECK (NODE))
 
+/* True if NODE is a builtin decl.  */
+#define DECL_BUILTIN_P(NODE) \
+  (DECL_SOURCE_LOCATION(NODE) == BUILTINS_LOCATION)
+
 /* Nonzero if NODE is a DECL which we know about but which has not
been explicitly declared, such as a built-in function or a friend
declared inside a class.  In the latter case DECL_HIDDEN_FRIEND_P
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index 6019051ed12..1709dd9a370 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -1464,9 +1464,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 
   /* Check for redeclaration and other discrepancies.  */
   if (TREE_CODE (olddecl) == FUNCTION_DECL
-  && DECL_ARTIFICIAL (olddecl)
-  /* A C++20 implicit friend operator== uses the normal path (94462).  */
-  && !DECL_HIDDEN_FRIEND_P (olddecl))
+  && DECL_BUILTIN_P (olddecl))
 {
   if (TREE_CODE (newdecl) != FUNCTION_DECL)
 	{
@@ -1508,15 +1506,6 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 		  "declaration %q#D", newdecl, olddecl);
 	  return NULL_TREE;
 	}
-  else if (DECL_OMP_DECLARE_REDUCTION_P (olddecl))
-	{
-	  gcc_assert (DECL_OMP_DECLARE_REDUCTION_P (newdecl));
-	  error_at (newdecl_loc,
-		"redeclaration of %");
-	  inform (olddecl_loc,
-		  "previous % declaration");
-	  return error_mark_node;
-	}
   else if (!types_match)
 	{
 	  /* Avoid warnings redeclaring built-ins which have not been
@@ -1815,6 +1804,17 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 	  return error_mark_node;
 	}
 }
+  else if (TREE_CODE (newdecl) == FUNCTION_DECL
+	   && DECL_OMP_DECLARE_REDUCTION_P (newdecl))
+{
+  /* OMP UDRs are never duplicates. */
+  gcc_assert (DECL_OMP_DECLARE_REDUCTION_P (olddecl));
+  error_at (newdecl_loc,
+		"redeclaration of %");
+  inform (olddecl_loc,
+	  "previous % declaration");
+  return error_mark_node;
+}
   else if (TREE_CODE (newdecl) == FUNCTION_DECL
 	&& ((DECL_TEMPLATE_SPECIALIZATION (olddecl)
 		 && (!DECL_TEMPLATE_INFO (newdecl)
diff --git i/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index e7764abff67..dbc6cc32dd8 100644
--- i/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -2119,10 +2119,10 @@ anticipated_builtin_p (tree ovl)
   tree fn = OVL_FUNCTION (ovl);
   gcc_checking_assert (DECL_ANTICIPATED (fn));
 
-  if (DECL_HIDDEN_FRIEND_P (fn))
-return false;
+  if (DECL_BUILTIN_P (fn))
+return true;
 
-  return true;
+  return false;
 }
 
 /* BINDING records an existing declaration for a name in the current scope.
@@ -2857,9 +2857,12 @@ set_decl_context_in_fn (tree ctx, tree decl)
 {
   if (TREE_CODE (decl) == FUNCTION_DECL
   || (VAR_P (decl) && DECL_EXTERNAL (decl)))
-/* Make sure local externs are marked as such.  */
+/* Make sure local externs are marked as such.  OMP UDRs really
+   are nested functions.  */
 gcc_checking_assert (DECL_LOCAL_DECL_P (decl)
-			 && DECL_NAMESPACE_SCOPE_P (decl));
+			 && (DECL_NAMESPACE_SCOPE_P (decl)
+			 || (TREE_CODE (decl) == FUNCTION_DECL
+ && DECL_OMP_DECLARE_REDUCTION_P (decl;
 
   if (!DECL_CONTEXT (decl)
   /* When parsing the parameter list of a function declarator,
@@ -3934,7 +3937,7 @@ do_nonmember_using_decl (name_lookup &l

Re: [PATCH] Add if-chain to switch conversion pass.

2020-09-25 Thread Martin Liška

On 9/24/20 2:41 PM, Richard Biener wrote:

On Wed, Sep 2, 2020 at 1:53 PM Martin Liška  wrote:


On 9/1/20 4:50 PM, David Malcolm wrote:

Hope this is constructive
Dave


Thank you David. All of them very very useful!

There's an updated version of the patch.


Hey.

What a juicy patch review!



I noticed several functions without a function-level comment.


Yep, but several of them are documented in a class declaration. Anyway, I will
improve for the next time.



-  cluster (tree case_label_expr, basic_block case_bb, profile_probability prob,
-  profile_probability subtree_prob);
+  inline cluster (tree case_label_expr, basic_block case_bb,
+ profile_probability prob, profile_probability subtree_prob);

I thought we generally leave this to the compiler ...

+@item -fconvert-if-to-switch
+@opindex fconvert-if-to-switch
+Perform conversion of an if cascade into a switch statement.
+Do so if the switch can be later transformed using a jump table
+or a bit test.  The transformation can help to produce faster code for
+the switch statement.  This flag is enabled by default
+at @option{-O2} and higher.

this mentions we do this only when we later can convert the
switch again but both passes (we still have two :/) have
independent guards.


Yes, we have the option for jump tables (-fjump-tables), but we miss one for a
bit-test.  Moreover, as mentioned in the cover email, one can see it beneficial
to convert an if-chain to a switch, as the expansion (without any BT and JT) can
benefit from a balanced tree.



+  /* For now, just wipe the dominator information.  */
+  free_dominance_info (CDI_DOMINATORS);

could at least be conditional on the vop renaming condition...

+  if (!all_candidates.is_empty ())
+mark_virtual_operands_for_renaming (fun);


Yep.



+  if (bitmap_bit_p (*visited_bbs, bb->index))
+   break;
+  bitmap_set_bit (*visited_bbs, bb->index);

since you are using a bitmap and not a sbitmap (why?)
you can combine those into


New to me, thanks.



if (!bitmap_set_bit (*visited_bbs, bb->index))
 break;

+  /* Current we support following patterns (situations):
+
+1) if condition with equal operation:
+
...

did you see whether using

register_edge_assert_for (lhs, true_edge, code, lhs, rhs, asserts);

works equally well?  It fills the 'asserts' vector with relations
derived from 'lhs'.  There's also
vr_values::extract_range_for_var_from_comparison_expr
to compute the case_range


Good point! I must admit that my patch doesn't properly handle negative 
conditions:

  if (argc != 1)
  {
if (argc == 1)
  global = 222;
...
  }

which VRP can correctly identify as an anti-range:
int ~[1, 1]  EQUIVALENCES: { argc_8(D) } (1 elements)$1 = void

I have question about OR and AND conditions:

   :
  _1 = aChar_8(D) == 1;
  _2 = aChar_8(D) == 10;
  _3 = _1 | _2;
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = aChar_8(D) != 1;
  _2 = aChar_8(D) != 10;
  _3 = _1 & _2;
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

Can I somehow get that from VRP (as I ask register_edge_assert_for only for LHS
of a condition)?
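
(For reference, the two blocks above correspond to source conditions of the
shape if (aChar == 1 || aChar == 10) and if (aChar != 1 && aChar != 10)
respectively -- reconstructed from the GIMPLE, not taken from the actual
testcase.)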



+  /* If it's not the first condition, then we need a BB without
+any statements.  */
+  if (!first)
+   {
+ unsigned stmt_count = 0;
+ for (gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
+  !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+   ++stmt_count;
+
+ if (stmt_count - visited_stmt_count != 0)
+   break;

hmm, OK, this might be a bit iffy to get correct then, still it's a lot
of pattern matching code that is there elsewhere already.
ifcombine simply hoists any stmts without side-effects up the
dominator tree and thus only requires BBs without side-effects
(IIRC there's a predicate fn for that).


Yes, I completely miss support for code hoisting (except for the first BB where
we put the gswitch).  If I'm correct, hoisting should be possible: the case
destination would be a new BB that contains the original statements and then
jumps to the case destination block.



+  /* Prevent loosing information for a PHI node where 2 edges will
+be folded into one.  Note that we must do the same also for false_edge
+(for last BB in a if-elseif chain).  */
+  if (!chain->record_phi_arguments (true_edge)
+ || !chain->record_phi_arguments (false_edge))

I don't really get this - looking at record_phi_arguments it seems
we're requiring that all edges into the same PHI from inside the case
(irrespective of from which case label) have the same value for the
PHI arg?


I guess so, I'll refresh the functionality.



+ if (arg != *v)
+   return false;

should use operand_equal_p at least, REAL_CSTs are for example
not shared tree nodes.  I'll also notice that if record_phi_arguments
fails we still may have altered its hash-map even though the particular
edge will not participate in the cu

Re: [PATCH 1/2] rs6000: Support _mm_insert_epi{8,32,64}

2020-09-25 Thread Paul A. Clarke via Gcc-patches
On Thu, Sep 24, 2020 at 06:22:10PM -0500, Segher Boessenkool wrote:
> On Wed, Sep 23, 2020 at 05:12:44PM -0500, Paul A. Clarke wrote:
> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> > +{
> > +  __v16qi result = (__v16qi)__A;
> > +
> > +  result [(__N & 0b1111)] = __D;
> 
> Hrm, GCC supports binary constants like this since 2007, so okay.  But I
> have to wonder if this improves anything over hex (or decimal even!)
> The parens are superfluous (and only hinder legibility), fwiw.
> 
> > +_mm_insert_epi64 (__m128i const __A, long long const __D, int const __N)
> > +{
> > +  __v2di result = (__v2di)__A;
> > +
> > +  result [(__N & 0b1)] = __D;
> 
> Especially single-digit numbers look really goofy (like 0x0, but even
> worse for binary somehow).
> 
> Anyway, okay for trunk, with or without those things improved.  Thanks!

I was trying to obviously and consistently convey the sizes of the masks,
but I really want to convey _why_ there are masks, so let me try a
different approach, below.
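
(The modulo form spells out why the index is reduced: for _mm_insert_epi8,
sizeof result / sizeof result[0] is 16, so __N % 16 wraps the index into the
valid element range, and for _mm_insert_epi64 it is 2, matching the earlier
single-bit mask.)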

--

Add compatibility implementations for SSE4.1 intrinsics
_mm_insert_epi8, _mm_insert_epi32, _mm_insert_epi64.

2020-09-25  Paul A. Clarke  

gcc/
* config/rs6000/smmintrin.h (_mm_insert_epi8): New.
(_mm_insert_epi32): New.
(_mm_insert_epi64): New.
---
 gcc/config/rs6000/smmintrin.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index d78ddba99d9..8128c417978 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,36 @@
 #include 
 #include 
 
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
+{
+  __v16qi result = (__v16qi)__A;
+
+  result [__N % (sizeof result / sizeof result[0])] = __D;
+
+  return (__m128i) result;
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_insert_epi32 (__m128i const __A, int const __D, int const __N)
+{
+  __v4si result = (__v4si)__A;
+
+  result [__N % (sizeof result / sizeof result[0])] = __D;
+
+  return (__m128i) result;
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_insert_epi64 (__m128i const __A, long long const __D, int const __N)
+{
+  __v2di result = (__v2di)__A;
+
+  result [__N % (sizeof result / sizeof result[0])] = __D;
+
+  return (__m128i) result;
+}
+
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_extract_epi8 (__m128i __X, const int __N)
 {
-- 
2.18.4



Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

On 9/25/20 3:52 PM, Jakub Jelinek wrote:

On Fri, Sep 25, 2020 at 11:13:06AM +0200, Martin Liška wrote:

--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1268,6 +1268,15 @@ jump_table_cluster::can_be_handled (const vec<cluster *> &clusters,
if (range == 0)
  return false;
  
+  unsigned HOST_WIDE_INT lhs = 100 * range;

+  if (lhs < range)
+return false;


If this test is meant to detect when 100 * range has overflowed,
then I think it is insufficient.
Perhaps do
   if (range > HOST_WIDE_INT_M1U / 100)
 return false;

   unsigned HOST_WIDE_INT lhs = 100 * range;
instead?


Yes, I'll add the check.

Thanks,
Martin



Jakub





Re: [PATCH] switch lowering: limit number of cluster attempts

2020-09-25 Thread Martin Liška

On 9/25/20 3:45 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 3:32 PM Martin Liška  wrote:


On 9/25/20 3:18 PM, Richard Biener wrote:

On Fri, Sep 25, 2020 at 11:13 AM Martin Liška  wrote:


Hello.

All right, I came up with a rapid speed up that can allow us to remove
the introduced parameter. It contains 2 parts:
- BIT TEST: we allow at maximum a range that is smaller than GET_MODE_BITSIZE
- JT: we spent quite some time in density calculation, we can guess it first
 and it leads to a fast bail out.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Err

+  auto_vec dest_bbs;
-  auto_bitmap dest_bbs;

-  bitmap_set_bit (dest_bbs, sc->m_case_bb->index);
+  if (!dest_bbs.contains (sc->m_case_bb->index))
+   {
+ dest_bbs.safe_push (sc->m_case_bb->index);
+ if (dest_bbs.length () > m_max_case_bit_tests)
+   return false;
+   }


That's intentional as m_max_case_bit_tests is a very small number (3) and
I want to track *distinct* indices in dest_bbs. So dest_bbs.contains
is a constant operation.


You're storing bb->index and formerly set bb->index bit, what's the difference?

For max 3 elements a vector is OK, of course but there should be a comment
that says this ;)  The static const is 'int' so it can in principle
hold up to two billion ;)


Sure, comment is needed.





vec::contains is linear search so no.  Was this for the length check?
Just do

   if (bitmap_set_bit (...))
{
  length++;
  if (length > ...)


I would need here bitmap_count_bits. Do you prefer it?


bitmap_set_bit returns false if the bit was already set so you can
count as you add bits, see the length++ above.


Ah, got it!



For three elements the vec will be faster though.  May I suggest
to use

  auto_vec dest_bbs;

then and quick_push rather than safe_push (need to guard the
push with the max_case_bit_test).


Yes.

Is the patch fine with that (and Jakub's comment)?

Martin



Richard.




Martin




Thanks,
Martin






[PATCH] gcov: fix streaming of HIST_TYPE_IOR histogram type.

2020-09-25 Thread Martin Liška

Hello.

I'm going to install a quite obvious patch which allows negative values
for HIST_TYPE_IOR as it tracks pointers.
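For example, a pointer value such as 0xffff800000000000 has the sign bit set,
so converted to the signed gcov_type it becomes negative.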

Martin

gcc/ChangeLog:

PR gcov-profile/64636
* value-prof.c (stream_out_histogram_value): Allow negative
values for HIST_TYPE_IOR.
---
 gcc/value-prof.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index ea1b1a8f98f..95d33c63a0c 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -331,7 +331,10 @@ stream_out_histogram_value (struct output_block *ob, 
histogram_value hist)
   /* When user uses an unsigned type with a big value, constant converted
 to gcov_type (a signed type) can be negative.  */
   gcov_type value = hist->hvalue.counters[i];
-  if (hist->type == HIST_TYPE_TOPN_VALUES)
+  if (hist->type == HIST_TYPE_TOPN_VALUES
+ || hist->type == HIST_TYPE_IOR)
+   /* Note that the IOR counter tracks pointer values and these can have
+  sign bit set.  */
;
   else
gcc_assert (value >= 0);
--
2.28.0



[PATCH v2 0/16][RFC][AArch64/Arm/SVE/SVE2/MVE]middle-end Add support for SLP vectorization of complex number instructions.

2020-09-25 Thread Tamar Christina
Hi All,

This patch series adds support for SLP vectorization of complex instructions 
[1].

These instructions exist only in their vector forms and require you to recognize
two statements in parallel.  Complex operations usually require a permute due to
the fact that the real and imaginary numbers are stored intermixed but these 
vector
instructions expect this and no longer need the compiler to generate a permute.

For this reason the pass also re-orders the loads in the SLP tree such that they
become contiguous and no longer need the permutes.  The Basic Blocks are left
untouched such that the scalar loop will still correctly issue permutes.
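
(As a reminder of where the permutes come from: with e.g. _Complex double a[N]
the data is stored interleaved as { re a[0], im a[0], re a[1], im a[1], ... },
so the two statements of one complex operation read adjacent even/odd lanes and
a plain vectorization has to shuffle them apart and back together.)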

The instructions also support rotations along the Argand plane, as such the 
operands
have to be re-ordered to coincide with their load group.

For now, this patch only adds support for:

  * Complex Addition with rotation of 90 and 270.
  * Complex Multiplication and Multiplication where one operand is conjugated.
  * Complex FMA and FMA where one operand is conjugated.
  * Complex FMS and FMS where one operand is conjugated.
  
Complex dot-product is not currently supported in this patch set as build_slp 
fails
for it.  This will be provided as a future patch.
  
These are supported for both integer and floating point and as such these don't 
look
for real or imaginary pairs but instead rely on the early lowering of complex
numbers by GCC and canonicalization of the operations such that it just 
recognizes any
instruction sequence matching the operations requested.

To be safe, when it is not sure it can support the operation, or if it finds
something it does not understand, it backs off.

This patch is an RFC and I am looking for feedback on the approach.  In
particular, this series has one problem, which shows up when it is decided that
SLP is not viable and that the normal loop vectorizer is to be used.

In this case I dissolve the changes, but the compiler crashes because the use
of the pattern matcher essentially undoes two_operands.  This means that the
number of copies needed when using the patterns and when not using them is
different.  When using the patterns the two operands become the same and so are
treated as manually unrolled loops.  The problem is that nunits has already
been decided along with the unroll factor, so when the dissolved statements are
then analyzed they fail.  This is also the reason why I cannot analyze both the
pattern and original statements initially.

The relevant places in the source code have comments describing the problem.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Thanks,
Tamar

-- 


[PATCH v2 2/16]middle-end: Refactor and expose some vectorizer helper functions.

2020-09-25 Thread Tamar Christina
Hi All,

This is a small refactoring which exposes some helper functions in the
vectorizer so they can be used in other places.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.c (vect_mark_pattern_stmts): Remove static.
* tree-vect-slp.c (vect_free_slp_tree,
vect_build_slp_tree): Remove static.
(struct bst_traits, bst_traits::hash, bst_traits::equal): Move...
* tree-vectorizer.h (struct bst_traits, bst_traits::hash,
bst_traits::equal): ... to here.
(vect_mark_pattern_stmts, vect_free_slp_tree,
vect_build_slp_tree): Declare.

-- 
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index db45740da3cba14a3552f9446651e8f289187fbb..3bacd5c827e1a6436c5916022c04e0d6594c316a 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -5169,7 +5169,7 @@ const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
 
 /* Mark statements that are involved in a pattern.  */
 
-static inline void
+void
 vect_mark_pattern_stmts (vec_info *vinfo,
 			 stmt_vec_info orig_stmt_info, gimple *pattern_stmt,
  tree pattern_vectype)
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index bf8ea4326597f4211d2772e9db60aa69285b5998..01189d44d892fc42b132bbb7de1c471df45518ae 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -86,7 +86,7 @@ _slp_tree::~_slp_tree ()
 
 /* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
 
-static void
+void
 vect_free_slp_tree (slp_tree node)
 {
   int i;
@@ -1120,45 +1120,6 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   return true;
 }
 
-/* Traits for the hash_set to record failed SLP builds for a stmt set.
-   Note we never remove apart from at destruction time so we do not
-   need a special value for deleted that differs from empty.  */
-struct bst_traits
-{
-  typedef vec  value_type;
-  typedef vec  compare_type;
-  static inline hashval_t hash (value_type);
-  static inline bool equal (value_type existing, value_type candidate);
-  static inline bool is_empty (value_type x) { return !x.exists (); }
-  static inline bool is_deleted (value_type x) { return !x.exists (); }
-  static const bool empty_zero_p = true;
-  static inline void mark_empty (value_type &x) { x.release (); }
-  static inline void mark_deleted (value_type &x) { x.release (); }
-  static inline void remove (value_type &x) { x.release (); }
-};
-inline hashval_t
-bst_traits::hash (value_type x)
-{
-  inchash::hash h;
-  for (unsigned i = 0; i < x.length (); ++i)
-h.add_int (gimple_uid (x[i]->stmt));
-  return h.end ();
-}
-inline bool
-bst_traits::equal (value_type existing, value_type candidate)
-{
-  if (existing.length () != candidate.length ())
-return false;
-  for (unsigned i = 0; i < existing.length (); ++i)
-if (existing[i] != candidate[i])
-  return false;
-  return true;
-}
-
-typedef hash_map , slp_tree,
-		  simple_hashmap_traits  >
-  scalar_stmts_to_slp_tree_map_t;
-
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
 		   vec stmts, unsigned int group_size,
@@ -1166,7 +1127,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 		   bool *matches, unsigned *npermutes, unsigned *tree_size,
 		   scalar_stmts_to_slp_tree_map_t *bst_map);
 
-static slp_tree
+slp_tree
 vect_build_slp_tree (vec_info *vinfo,
 		 vec stmts, unsigned int group_size,
 		 poly_uint64 *max_nunits,
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2ebcf9f9926ec7175f28391f172800499bbc59db..79926f1a43534635ddca85556a928e364022c40a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2047,6 +2047,9 @@ extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern bool vect_update_shared_vectype (stmt_vec_info, tree);
 
 /* In tree-vect-patterns.c.  */
+extern void
+vect_mark_pattern_stmts (vec_info *, stmt_vec_info, gimple *, tree);
+
 /* Pattern recognition functions.
Additional pattern recognition functions can (and will) be added
in the future.  */
@@ -2058,4 +2061,51 @@ void vect_free_loop_info_assumptions (class loop *);
 gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL);
 bool vect_stmt_dominates_stmt_p (gimple *, gimple *);
 
+/* Traits for the hash_set to record failed SLP builds for a stmt set.
+   Note we never remove apart from at destruction time so we do not
+   need a special value for deleted that differs from empty.  */
+struct bst_traits
+{
+  typedef vec  value_type;
+  typedef vec  compare_type;
+  static inline hashval_t hash (value_type);
+  static inline bool equal (value_type existing, value_type candidate);
+  static inline bool is_empty (value_type x) { return !x.exists (); }
+  static inline bool is_deleted (value_type x) { return !x.exists (); }
+  static const bool empty_zero_p = true;
+  static inline void mark_empty (value_type &x) { 

[PATCH v2 1/16]middle-end: Refactor refcnt to use SLP_TREE_REF_COUNT for consistency

2020-09-25 Thread Tamar Christina
Hi All,

This is a small refactoring which introduces SLP_TREE_REF_COUNT and replaces
the uses of refcnt with it.  This is for consistency with the other properties.

A similar patch was pre-approved last year but since there are more uses now I am
sending it for review anyway.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
* tree-vect-slp.c (_slp_tree::_slp_tree, _slp_tree::~_slp_tree,
vect_free_slp_tree, vect_build_slp_tree, vect_print_slp_tree,
slp_copy_subtree, vect_attempt_slp_rearrange_stmts): Use it.

-- 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c44fd396bf0b69a4153e46026c545bebb3797551..bf8ea4326597f4211d2772e9db60aa69285b5998 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -66,7 +66,7 @@ _slp_tree::_slp_tree ()
   SLP_TREE_CODE (this) = ERROR_MARK;
   SLP_TREE_VECTYPE (this) = NULL_TREE;
   SLP_TREE_REPRESENTATIVE (this) = NULL;
-  this->refcnt = 1;
+  SLP_TREE_REF_COUNT (this) = 1;
   this->max_nunits = 1;
   this->lanes = 0;
 }
@@ -92,7 +92,7 @@ vect_free_slp_tree (slp_tree node)
   int i;
   slp_tree child;
 
-  if (--node->refcnt != 0)
+  if (--SLP_TREE_REF_COUNT (node) != 0)
 return;
 
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
@@ -1180,7 +1180,7 @@ vect_build_slp_tree (vec_info *vinfo,
 			 *leader ? "" : "failed ", *leader);
   if (*leader)
 	{
-	  (*leader)->refcnt++;
+	  SLP_TREE_REF_COUNT (*leader)++;
 	  vect_update_max_nunits (max_nunits, (*leader)->max_nunits);
 	}
   return *leader;
@@ -1194,7 +1194,7 @@ vect_build_slp_tree (vec_info *vinfo,
   res->max_nunits = this_max_nunits;
   vect_update_max_nunits (max_nunits, this_max_nunits);
   /* Keep a reference for the bst_map use.  */
-  res->refcnt++;
+  SLP_TREE_REF_COUNT (res)++;
 }
   bst_map->put (stmts.copy (), res);
   return res;
@@ -1590,7 +1590,7 @@ fail:
   SLP_TREE_CHILDREN (two).safe_splice (children);
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (two), i, child)
-	child->refcnt++;
+	SLP_TREE_REF_COUNT (child)++;
 
   /* Here we record the original defs since this
 	 node represents the final lane configuration.  */
@@ -1650,7 +1650,8 @@ vect_print_slp_tree (dump_flags_t dump_kind, dump_location_t loc,
 		   : (SLP_TREE_DEF_TYPE (node) == vect_constant_def
 		  ? " (constant)"
 		  : ""), node,
-		   estimated_poly_value (node->max_nunits), node->refcnt);
+		   estimated_poly_value (node->max_nunits),
+	 SLP_TREE_REF_COUNT (node));
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
   dump_printf_loc (metadata, user_loc, "\tstmt %u %G", i, stmt_info->stmt);
@@ -1802,7 +1803,7 @@ slp_copy_subtree (slp_tree node, hash_map &map)
   SLP_TREE_REPRESENTATIVE (copy) = SLP_TREE_REPRESENTATIVE (node);
   SLP_TREE_LANES (copy) = SLP_TREE_LANES (node);
   copy->max_nunits = node->max_nunits;
-  copy->refcnt = 0;
+  SLP_TREE_REF_COUNT (copy) = 0;
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 SLP_TREE_SCALAR_STMTS (copy) = SLP_TREE_SCALAR_STMTS (node).copy ();
   if (SLP_TREE_SCALAR_OPS (node).exists ())
@@ -1819,7 +1820,7 @@ slp_copy_subtree (slp_tree node, hash_map &map)
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (copy), i, child)
 {
   SLP_TREE_CHILDREN (copy)[i] = slp_copy_subtree (child, map);
-  SLP_TREE_CHILDREN (copy)[i]->refcnt++;
+  SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (copy)[i])++;
 }
   return copy;
 }
@@ -1935,7 +1936,7 @@ vect_attempt_slp_rearrange_stmts (slp_instance slp_instn)
   hash_map map;
   slp_tree unshared = slp_copy_subtree (SLP_INSTANCE_TREE (slp_instn), map);
   vect_free_slp_tree (SLP_INSTANCE_TREE (slp_instn));
-  unshared->refcnt++;
+  SLP_TREE_REF_COUNT (unshared)++;
   SLP_INSTANCE_TREE (slp_instn) = unshared;
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
 SLP_INSTANCE_LOADS (slp_instn)[i] = *map.get (node);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9dffc5570e51b21c2f5c02b80a9f49d25a183284..2ebcf9f9926ec7175f28391f172800499bbc59db 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -204,6 +204,7 @@ public:
 #define SLP_TREE_CHILDREN(S) (S)->children
 #define SLP_TREE_SCALAR_STMTS(S) (S)->stmts
 #define SLP_TREE_SCALAR_OPS(S)   (S)->ops
+#define SLP_TREE_REF_COUNT(S)(S)->refcnt
 #define SLP_TREE_VEC_STMTS(S)(S)->vec_stmts
 #define SLP_TREE_VEC_DEFS(S) (S)->vec_defs
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)  (S)->vec_stmts_size



[PATCH v2 3/16]middle-end Add basic SLP pattern matching scaffolding.

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds the basic infrastructure for doing pattern matching on SLP 
trees.
This is done immediately after the SLP tree creation because it can change the
shape of the tree in radical ways and so we would like to do it before any
analysis is performed on the tree.

A new file tree-vect-slp-patterns.c is added which contains all the code for
pattern matching on SLP trees.

This cover letter is short because the changes are heavily commented.

All pattern matchers need to implement the abstract type VectPatternMatch.
The VectSimplePatternMatch abstract class provides some default functionality
for pattern matchers that need to rebuild nodes.

The pattern matcher requires that, if a statement in a node is replaced, ALL
statements in that node are replaced.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* Makefile.in (tree-vect-slp-patterns.o): New.
* doc/passes.texi: Update documentation.
* tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns):
New.
(vect_analyze_slp_instance): Call pattern matcher.
* tree-vectorizer.h (class VectPatternMatch, class VectPattern): New.
* tree-vect-slp-patterns.c: New file.

-- 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9c6c1c93b976aaf350cc1f9b3bdc538308fdf08b..936202b73696c8529b32c05b2356c7316fabc542 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1638,6 +1638,7 @@ OBJS = \
 	tree-vect-loop.o \
 	tree-vect-loop-manip.o \
 	tree-vect-slp.o \
+	tree-vect-slp-patterns.o \
 	tree-vectorizer.o \
 	tree-vector-builder.o \
 	tree-vrp.o \
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -709,7 +709,8 @@ loop.
 The pass is implemented in @file{tree-vectorizer.c} (the main driver),
 @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
 and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
-functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
+functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
+@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
 Analysis of data references is in @file{tree-data-ref.c}.
 
 SLP Vectorization.  This pass performs vectorization of straight-line code. The
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
new file mode 100644
index ..f605f68d2a14c4bf4941f97b7c1d57f6acb5ffb1
--- /dev/null
+++ b/gcc/tree-vect-slp-patterns.c
@@ -0,0 +1,310 @@
+/* SLP - Pattern matcher on SLP trees
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"		/* FIXME: for insn_data */
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "gimple-iterator.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
+#include "langhooks.h"
+#include "gimple-walk.h"
+#include "dbgcnt.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
+#include "gimple-fold.h"
+#include "internal-fn.h"
+
+/* SLP Pattern matching mechanism.
+
+  This extension to the SLP vectorizer allows one to transform the generated SLP
+  tree based on any pattern.  The difference between this and the normal vect
+  pattern matcher is that unlike the former, this matcher allows you to match
+  with instructions that do not belong to the same SSA dominator graph.
+
+  The only requirement that this pattern matcher has is that you are only
+  only allowed to either match an entire group or none.
+
+  As an example, the following simple loop:
+
+double a[restrict N]; double b[restrict N]; double c[restrict N];
+
+for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] - b[i+1];
+  c[i+1] = a[i+1] + b[i];
+}
+
+  which represents a complex addition on with a rotation of 90* around the
+  argand plane. i.e. if `a` and `b` were complex numbers then this would be the
+  same as `a + (b * I)`.
+
+  Here the expressions for `c[i

[PATCH v2 6/16]middle-end Add Complex Addition with rotation detection

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds pattern detection for the following operation:

  Addition with rotation of the second argument around the Argand plane.
Supported rotations are 90 and 270.

c = a + (b * I) and c = a + (b * I * I * I)

  where a, b and c are complex numbers.
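
In the lowered scalar form the 90 degree rotation corresponds to loops of the
shape used as the running example in the matcher below:

for (int i = 0; i < N; i += 2)
  {
    c[i] = a[i] - b[i+1];
    c[i+1] = a[i+1] + b[i];
  }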

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* tree-vect-slp-patterns.c (class ComplexAddPattern): New.
(slp_patterns): Add ComplexAddPattern.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@ floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the argand plane and storing the result in
+operand 0.  The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -663,12 +663,94 @@ graceful_exit:
 }
 };
 
+class ComplexAddPattern : public ComplexPattern
+{
+  protected:
+ComplexAddPattern (slp_tree node, vec_info *vinfo)
+  : ComplexPattern (node, vinfo)
+{
+  this->m_arity = 2;
+  this->m_num_args = 2;
+  this->m_vects.create (0);
+  this->m_defs.create (0);
+}
+
+  public:
+~ComplexAddPattern ()
+{
+  this->m_vects.release ();
+  this->m_defs.release ();
+}
+
+static VectPattern* create (slp_tree node, vec_info *vinfo)
+{
+   return new ComplexAddPattern (node, vinfo);
+}
+
+const char* get_name ()
+{
+  return "Complex Addition";
+}
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree
+   using the N statements statements found in node starting at position IDX.
+   If the operation matches then IFN is set to the operation it matched and
+   the arguments to the two replacement statements are put in VECTS.
+
+   If no match is found then IFN is set to IFN_LAST.
+
+   This function matches the patterns shaped as:
+
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+
+   If a match occurred then TRUE is returned, else FALSE.  */
+
+bool matches (stmt_vec_info *stmts, int idx)
+{
+  this->m_last_ifn = IFN_LAST;
+  int base = idx - (this->m_arity - 1);
+  this->m_last_idx = idx;
+  this->m_stmt_info = stmts[0];
+
+  complex_operation_t op
+	= vect_detect_pair_op (base, this->m_node, &this->m_vects);
+
+  /* Find the two components.  Rotation in the complex plane will modify
+	 the operations:
+
+	 * Rotation  0: + +
+	 * Rotation 90: - +
+	 * Rotation 180: - -
+	 * Rotation 270: + -
+
+	Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+	to care about them here.  */
+  if (op == MINUS_PLUS)
+	this->m_last_ifn = IFN_COMPLEX_ADD_ROT90;
+  else if (op == PLUS_MINUS)
+	this

[PATCH v2 5/16]middle-end: Add shared machinery for matching patterns involving complex numbers.

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds shared machinery for detecting patterns having to do with
complex number operations.  The class ComplexPattern provides helpers for
matching and ultimately undoing the permutation in the tree by rebuilding the
graph.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-slp-patterns.c (complex_operation_t,class ComplexPattern):
New.

-- 
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index f605f68d2a14c4bf4941f97b7c1d57f6acb5ffb1..6453a5b1b6464dba833adc2c2a194db5e712bb79 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -134,6 +134,19 @@ along with GCC; see the file COPYING3.  If not see
   To add a new pattern, implement the VectPattern class and add the type to
   slp_patterns.  */
 
+/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
+   be matched when looking for expressions that we are interested in matching
+   for complex number addition and mla.  */
+
+typedef enum _complex_operation {
+  PLUS_PLUS,
+  MINUS_PLUS,
+  PLUS_MINUS,
+  MULT_MULT,
+  NEG_NEG,
+  CMPLX_NONE
+} complex_operation_t;
+
 /* VectSimplePatternMatch holds contextual information about a single match
found in the SLP tree.  The use of the class is to allow you to defer
performing any modifications to the SLP tree until they are to be done.  By
@@ -298,6 +311,358 @@ class VectSimplePatternMatch : public VectPatternMatch
 }
 };
 
+/* The ComplexPattern class contains common code for pattern matchers that work
+   on complex numbers.  These provide functionality to allow de-construction and
+   validation of sequences depicting/transforming REAL and IMAG pairs.  */
+
+class ComplexPattern : public VectPattern
+{
+  protected:
+/* Current list of arguments that were found during the current invocation
+   of the pattern matcher.  */
+vec m_vects;
+
+/* Representative statement for the current match being performed.  */
+stmt_vec_info m_stmt_info;
+
+/* A list of all arguments found between all invocations of the current
+   pattern matcher.  */
+vec> m_defs;
+
+/* Checks to see if the expression EXPR is a gimple assign with code CODE
+   and if this is the case the two operands of EXPR are returned in OP1 and
+   OP2.
+
+   If the matching and extraction is successful TRUE is returned otherwise
+   FALSE in which case the value of OP1 and OP2 will not have been touched.
+*/
+
+bool
+vect_match_expression_p (slp_tree node, tree_code code, int base, int idx,
+			 stmt_vec_info *op1, stmt_vec_info *op2)
+{
+
+  vec scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
+
+  /* Calculate the index of the statement in the node to inspect.  */
+  int n = base + idx;
+  if (scalar_stmts.length () < (unsigned)n) // can use group_size
+	return false;
+
+  gimple* expr = STMT_VINFO_STMT (scalar_stmts[n]);
+  if (!is_gimple_assign (expr)
+	  || gimple_expr_code (expr) != code)
+	return false;
+
+  vec children = SLP_TREE_CHILDREN (node);
+
+  /* If it's a VEC_PERM_EXPR we need to look one deeper.  VEC_PERM_EXPR
+	 only has one entry.  So pick one.  */
+  if (node->code == VEC_PERM_EXPR)
+	children = SLP_TREE_CHILDREN (children.last ());
+
+  if (children.length () != (op2 ? 2 : 1))
+	return false;
+
+  if (op1)
+	{
+	  if (SLP_TREE_DEF_TYPE (children[0]) != vect_internal_def)
+	return false;
+	  *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];
+	}
+
+  if (op2)
+	{
+	  if (SLP_TREE_DEF_TYPE (children[1]) != vect_internal_def)
+	return false;
+	  *op2 = SLP_TREE_SCALAR_STMTS (children[1])[n];
+	}
+
+  return true;
+}
+
+/* This function will match two gimple expressions STMT_0 and STMT_1 in
+   parallel and returns the pair operation that represents the two
+   expressions in the two statements.  The statements are located in NODE1
+   and NODE2 at offset base + offset1 and base + offset2 respectively.
+
+   If match is successful then the corresponding complex_operation is
+   returned and the arguments to the two matched operations are returned in
+   OPS.
+
+   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
+
+   e.g. the following gimple statements
+
+   stmt 0 _39 = _37 + _12;
+   stmt 1 _6 = _38 - _36;
+
+   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
+*/
+
+complex_operation_t
+vect_detect_pair_op (int base, slp_tree node1, int offset1, slp_tree node2,
+			 int offset2, vec *ops)
+{
+  stmt_vec_info op1 = NULL, op2 = NULL, op3 = NULL, op4 = NULL;
+  complex_operation_t result = CMPLX_NONE;
+  #define CHECK_FOR(x, y, z)\
+	(vect_match_expression_p (node1, x, base, offset1, &op1,\
+  z ? &op2 : NULL)  \
+	 && vect_

[PATCH v2 4/16]middle-end: Add dissolve code for when SLP fails and non-SLP loop vectorization is to be tried.

2020-09-25 Thread Tamar Christina
Hi All,

This adds the dissolve code to undo the patterns created by the pattern matcher
in case SLP is to be aborted.

As mentioned in the cover letter this has one issue in that the number of copies
needed can change depending on whether TWO_OPERATORS is needed or not.

Because of this I don't analyze the original statement when it's replaced by a
pattern and attempt to correct it here by analyzing it after dissolve.

This however seems too late and I would need to change the unroll factor, which
seems a bit odd.  Any advice would be appreciated.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.c (vect_dissolve_slp_only_patterns): New
(vect_dissolve_slp_only_groups): Call vect_dissolve_slp_only_patterns.

-- 
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b1a6e1508c7f00f5f369ec873f927f30d673059e..8231ad6452af6ff111911a7bfb6aab2257df9fc0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1956,6 +1956,92 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
   return opt_result::success ();
 }
 
+/* For every SLP only pattern created by the pattern matcher rooted in ROOT
+   restore the relevancy of the original statements over those of the pattern
+   and destroy the pattern relationship.  This restores the SLP tree to a state
+   where it can be used when SLP build is cancelled or re-tried.  */
+
+static opt_result
+vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo,
+ hash_set *visited, slp_tree root)
+{
+  if (!root || visited->contains (root))
+return opt_result::success ();
+
+  unsigned int i;
+  slp_tree node;
+  opt_result res = opt_result::success ();
+  stmt_vec_info stmt_info;
+  stmt_vec_info related_stmt_info;
+  bool need_to_vectorize = false;
+  auto_vec cost_vec;
+
+  visited->add (root);
+
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (root), i, stmt_info)
+if (STMT_VINFO_SLP_VECT_ONLY (stmt_info)
+&& (related_stmt_info = STMT_VINFO_RELATED_STMT (stmt_info)) != NULL)
+  {
+	if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "dissolving relevancy of %G over %G",
+			   STMT_VINFO_STMT (stmt_info),
+			   STMT_VINFO_STMT (related_stmt_info));
+	STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
+	STMT_VINFO_RELEVANT (related_stmt_info) = vect_used_in_scope;
+	STMT_VINFO_IN_PATTERN_P (related_stmt_info) = false;
+	STMT_SLP_TYPE (related_stmt_info) = hybrid;
+	/* Now we have to re-analyze the statement since we skipped it in the
+	   initial analysis due to the differences in copies.  */
+	res = vect_analyze_stmt (loop_vinfo, related_stmt_info,
+ &need_to_vectorize, NULL, NULL, &cost_vec);
+
+	if (!res)
+	  return res;
+  }
+
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, node)
+{
+  res = vect_dissolve_slp_only_patterns (loop_vinfo, visited, node);
+  if (!res)
+	return res;
+}
+
+  return res;
+}
+
+/* Lookup any SLP Only Pattern statements created by the SLP pattern matcher in
+   all slp_instances in LOOP_VINFO and undo the relevancy of statements such
+   that the original SLP tree before the pattern matching is used.  */
+
+static opt_result
+vect_dissolve_slp_only_patterns (loop_vec_info loop_vinfo)
+{
+
+  unsigned int i;
+  opt_result res = opt_result::success ();
+  hash_set *visited = new hash_set ();
+
+  DUMP_VECT_SCOPE ("vect_dissolve_slp_only_patterns");
+
+  /* Unmark any SLP only patterns as relevant and restore the STMT_INFO of the
+ related instruction.  */
+  slp_instance instance;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (loop_vinfo), i, instance)
+{
+  res = vect_dissolve_slp_only_patterns (loop_vinfo, visited,
+	 SLP_INSTANCE_TREE (instance));
+  if (!res)
+	{
+	  delete visited;
+	  return res;
+	}
+}
+
+  delete visited;
+  return res;
+}
+
 /* Look for SLP-only access groups and turn each individual access into its own
group.  */
 static void
@@ -2427,6 +2513,11 @@ again:
   /* Ensure that "ok" is false (with an opt_problem if dumping is enabled).  */
   gcc_assert (!ok);
 
+  /* Dissolve any SLP patterns created by the SLP pattern matcher.  */
+  opt_result dissolved = vect_dissolve_slp_only_patterns (loop_vinfo);
+  if (!dissolved)
+return dissolved;
+
   /* Try again with SLP forced off but if we didn't do any SLP there is
  no point in re-trying.  */
   if (!slp)



[PATCH v2 7/16]middle-end: Add Complex Multiplication and Multiplication with Conjucate detection

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds pattern detections for the following operation:

  Complex multiplication and Conjugate Complex multiplication of the second
  parameter.

c = a * b and c = a * conj (b)

  For the conjugate case, under fast-math, the operand being conjugated may be
  swapped by flipping the arguments to the optab.  This allows it to support
  c = conj (a) * b and c += conj (a) * b.

  where a, b and c are complex numbers.

and provides a shared class for anything needing to recognize complex MLA
patterns.
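
As a rough, hedged sketch (not taken from the patch), with the usual
interleaved real/imaginary layout the multiplication c = a * b that the
matcher looks for has the scalar form:

  /* Illustrative only: complex multiply on interleaved pairs.  */
  for (int i = 0; i < N; i += 2)
    {
      c[i]   = a[i] * b[i]   - a[i+1] * b[i+1];   /* real part */
      c[i+1] = a[i] * b[i+1] + a[i+1] * b[i];     /* imaginary part */
    }

For c = a * conj (b) the sign of the b[i+1] terms flips, which is why the
conjugate variant maps to a separate cmul_conj optab.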

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_MUL, COMPLEX_MUL_CONJ): New.
* optabs.def (cmul_optab, cmul_conj_optab): New.
* tree-vect-slp-patterns.c (class ComplexMLAPattern,
class ComplexMulPattern): New.
(slp_patterns): Add ComplexMulPattern.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 71e226505b2619d10982b59a4ebbed73a70f29be..ddaf1abaccbd44dae11ea902ec38b474aacfb8e1 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6143,6 +6143,28 @@ rotations @var{m} of 90 or 270.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cmul@var{m}4} instruction pattern
+@item @samp{cmul@var{m}4}
+Perform a vector floating point multiplication of complex numbers in operand 0
+and operand 1.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmul_conj@var{m}4} instruction pattern
+@item @samp{cmul_conj@var{m}4}
+Perform a vector floating point multiplication of complex numbers in operand 0
+and the conjugate of operand 1.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 956a65a338c157b51de7e78a3fb005b5af78ef31..51bebf8701af262b22d66d19a29a8dafb74db1f0 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
+
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2bb0bf857977035bf562a77f5f6848e80edf936d..9c267d422478d0011f288b1f5f62daabe3989ba7 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -292,6 +292,8 @@ OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
 OPTAB_D (cadd90_optab, "cadd90$a3")
 OPTAB_D (cadd270_optab, "cadd270$a3")
+OPTAB_D (cmul_optab, "cmul$a3")
+OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index b2b0ac62e9a69145470f41d2bac736dd970be735..bef7cc73b21c020e4c0128df5d186a034809b103 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -743,6 +743,179 @@ class ComplexAddPattern : public ComplexPattern
 }
 };
 
+class ComplexMLAPattern : public ComplexPattern
+{
+  protected:
+ComplexMLAPattern (slp_tree node, vec_info *vinfo)
+  : ComplexPattern (node, vinfo)
+{ }
+
+  protected:
+/* Helper function of vect_match_call_complex_mla that looks up the
+   definition of LHS_0 and LHS_1 by finding the statements starting in
+   position BASE + IDX in child ROOT of NODE and tries to match the
+   definition against pair ops.
+
+   If the match is successful then ARGS will contain the operands matched
+   and the complex_operation_t type is returned.  If match is not successful
+   then CMPLX_NONE is returned and ARGS is left unmodified.  */
+
+complex_operation_t
+vect_match_call_complex_mla_1 (slp_tree node, slp_tree *res, int root,
+   int base, int idx, vec *args)
+{
+  gcc_assert (base >= 0 && idx >= 0 && node != NULL);
+
+  if ((unsigned)root >= SLP_TREE_CHILDREN (node).length ())
+	return CMPLX_NONE;
+
+  slp_tree data = SLP_TREE_CHILDREN (node)[root];
+
+  /* If it's a VEC_PERM_EXPR we need to look one deeper.  */
+  if (node->code == VEC_PERM_EXPR)
+	data = SLP_TREE_CHILDREN (data)[root];
+
+  int lhs_0 = base + idx;
+  int lhs_1 = base + idx + 1;
+
+  vec s

[PATCH v2 9/16][docs] Add some missing test directive documentaion.

2020-09-25 Thread Tamar Christina
Hi All,

This adds missing documentation for some test directives.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/sourcebuild.texi (vect_complex_rot_,
arm_v8_3a_complex_neon_ok, arm_v8_3a_complex_neon_hw): New.

-- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 65b2e552b74becdbc5474ba5ac387a4a0296e341..3abd8f631cb0234076641e399f6f00768b38ebee 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1671,6 +1671,10 @@ Target supports a vector dot-product of @code{signed short}.
 @item vect_udot_hi
 Target supports a vector dot-product of @code{unsigned short}.
 
+@item vect_complex_rot_@var{n}
+Target supports a vector complex addition and complex fma of mode @var{n}.
+Possible values of @var{n} are @code{hf}, @code{sf}, @code{df}.
+
 @item vect_pack_trunc
 Target supports a vector demotion (packing) of @code{short} to @code{char}
 and from @code{int} to @code{short} using modulo arithmetic.
@@ -1941,6 +1945,16 @@ ARM target supports executing instructions from ARMv8.2-A with the Dot
 Product extension. Some multilibs may be incompatible with these options.
 Implies arm_v8_2a_dotprod_neon_ok.
 
+@item arm_v8_3a_complex_neon_ok
+@anchor{arm_v8_3a_complex_neon_ok}
+ARM target supports options to generate complex number arithmetic instructions
+from ARMv8.3-A.  Some multilibs may be incompatible with these options.
+
+@item arm_v8_3a_complex_neon_hw
+ARM target supports executing complex arithmetic instructions from ARMv8.3-A.
+Some multilibs may be incompatible with these options.
+Implies arm_v8_3a_complex_neon_ok.
+
 @item arm_fp16fml_neon_ok
 @anchor{arm_fp16fml_neon_ok}
 ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}



[PATCH v2 8/16]middle-end: add Complex Multiply and Accumulate/Subtract and Multiply and Accumulate/Subtract with Conjucate detection

2020-09-25 Thread Tamar Christina
Hi All,

This patch adds pattern detections for the following operation:

  Complex FMLA, Conjugate FMLA of the second parameter, and FMLS.

c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)

  For the conjugate case, under fast-math, the operand being conjugated may be
  swapped by flipping the arguments to the optab.  This allows it to support
  c = conj (a) * b and c += conj (a) * b.

  where a, b and c are complex numbers.
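
A hedged scalar sketch (not taken from the patch) of the FMLA case c += a * b
on interleaved data, which the matcher is intended to recognize:

  /* Illustrative only: complex multiply-accumulate on interleaved pairs.  */
  for (int i = 0; i < N; i += 2)
    {
      c[i]   += a[i] * b[i]   - a[i+1] * b[i+1];   /* real part */
      c[i+1] += a[i] * b[i+1] + a[i+1] * b[i];     /* imaginary part */
    }

The FMLS variants subtract the product instead, and the conjugate variants
again flip the sign of the b[i+1] terms.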

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ, COMPLEX_FMS,
COMPLEX_FMS_CONJ): New.
* optabs.def (cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab):
New.
* tree-vect-slp-patterns.c (class ComplexFMAPattern): New.
(slp_patterns): Add ComplexFMAPattern.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ddaf1abaccbd44dae11ea902ec38b474aacfb8e1..d8142f745050d963e8d15c7793fae06d9ad02020 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6143,6 +6143,50 @@ rotations @var{m} of 90 or 270.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cmla@var{m}4} instruction pattern
+@item @samp{cmla@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmla_conj@var{m}4} instruction pattern
+@item @samp{cmla_conj@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and the conjugate of operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls@var{m}4} instruction pattern
+@item @samp{cmls@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls_conj@var{m}4} instruction pattern
+@item @samp{cmls_conj@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and the conjugate of operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{cmul@var{m}4} instruction pattern
 @item @samp{cmul@var{m}4}
 Perform a vector floating point multiplication of complex numbers in operand 0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 51bebf8701af262b22d66d19a29a8dafb74db1f0..cc0135cb2c1c14b593181edeaa5f896fa6c4c659 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -286,6 +286,10 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
 /* Ternary math functions.  */
 DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary)
 
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 9c267d422478d0011f288b1f5f62daabe3989ba7..19db9c00896cd08adfd20a01669990bbbebd79f1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -294,6 +294,10 @@ OPTAB_D (cadd90_optab, "cadd90$a3")
 OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cmul_optab, "cmul$a3")
 OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
+OPTAB_D (cmla_optab, "cmla$a4")
+OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
+OPTAB_D (cmls_optab, "cmls$a4")
+OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index bef7cc73b21c020e4c0128df5d186a034809b103..d9554aaaf2cce14bb5b9c68e6141ea7f555a35de 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -916,6 +916,199 @@ class ComplexMulPattern : public ComplexMLAPattern
 }
 };
 
+class ComplexFMAPattern : public ComplexMLAPattern
+{
+  protected:
+ComplexFMAPattern (slp_tree node, vec_info *vinfo)
+  : ComplexMLAPattern (node, vinfo)
+{
+  this->m_arity = 2;
+  this->m_num_args = 3;
+  this->m_vects.create (0);
+  this->m_defs.create (0);
+}
+
+  public:
+~Co

[PATCH v2 10/16]AArch64: Add NEON RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this the
following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  mov x3, 0
  .p2align 3,,7
  .L2:
  ldr q0, [x0, x3]
  ldr q1, [x1, x3]
  fcadd   v0.4s, v0.4s, v1.4s, #90
  str q0, [x2, x3]
  add x3, x3, 16
  cmp x3, 1600
  bne .L2
  ret

instead of

  f90:
  add x3, x1, 1600
  .p2align 3,,7
  .L2:
  ld2 {v4.4s - v5.4s}, [x0], 32
  ld2 {v2.4s - v3.4s}, [x1], 32
  fsub    v0.4s, v4.4s, v3.4s
  fadd    v1.4s, v5.4s, v2.4s
  st2 {v0.4s - v1.4s}, [x2], 32
  cmp x3, x1
  bne .L2
  ret

It defines a new iterator VALL_ARITH which contains the types for which we can
do general arithmetic (it excludes bfloat16).

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cadd3,
cml4, cmul3): New.
* config/aarch64/iterators.md (VALL_ARITH, UNSPEC_FCMUL,
UNSPEC_FCMUL180, UNSPEC_FCMLS, UNSPEC_FCMLS180, UNSPEC_CMLS,
UNSPEC_CMLS180, UNSPEC_CMUL, UNSPEC_CMUL180, FCMLA_OP, FCMUL_OP, rot_op,
rotsplit1, rotsplit2, fcmac1): New.
(rot): Add UNSPEC_FCMLS, UNSPEC_FCMUL, UNSPEC_FCMUL180.

-- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 381a702eba003520d2e83e91065d2a808b9c6493..c2ddef19e4e433f7ca055e42d1222d9dad6bd6c2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -449,6 +449,14 @@ (define_insn "aarch64_fcadd"
   [(set_attr "type" "neon_fcadd")]
 )
 
+(define_expand "cadd3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		   (match_operand:VHSDF 2 "register_operand")]
+		   FCADD))]
+  "TARGET_COMPLEX"
+)
+
 (define_insn "aarch64_fcmla"
   [(set (match_operand:VHSDF 0 "register_operand" "=w")
 	(plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
@@ -508,6 +516,45 @@ (define_insn "aarch64_fcmlaq_lane"
   [(set_attr "type" "neon_fcmla")]
 )
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
+		(unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
+   (match_operand:VHSDF 3 "register_operand")]
+   FCMLA_OP)))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[1],
+		 operands[2], operands[3]));
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[0],
+		 operands[2], operands[3]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		   (match_operand:VHSDF 2 "register_operand")]
+		   FCMUL_OP))]
+  "TARGET_COMPLEX"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  emit_move_insn (tmp, CONST0_RTX (mode));
+  emit_insn (gen_aarch64_fcmla (operands[0], tmp,
+		 operands[1], operands[2]));
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[0],
+		 operands[1], operands[2]));
+  DONE;
+})
+
+
+
 ;; These instructions map to the __builtins for the Dot Product operations.
 (define_insn "aarch64_dot"
   [(set (match_operand:VS 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 054fd8515c6ebf136da699e2993f6ebb348c3b1a..98217c9fd3ee2b6063f7564193e400e9ef71c6ac 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -182,6 +182,11 @@ (define_mode_iterator V2F [V2SF V2DF])
 ;; All Advanced SIMD modes on which we support any arithmetic operations.
 (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
 
+;; All Advanced SIMD modes suitable for performing arithmetics.
+(define_mode_iterator VALL_ARITH [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
+  (V4HF "TARGET_SIMD_F16INST") (V8HF "TARGET_SIMD_F16INST")
+  V2SF V4SF V2DF])
+
 ;; All Advanced SIMD modes suitable for moving, loading, and storing.
 (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
 V4HF V8HF V4BF V8BF V2SF V4SF V2DF])
@@ -705,6 +710,10 @@ (define_c_enum "unspec"
 UNSPEC_FCMLA90	; Used in aarch64-simd.md.
 UNSPEC_FCMLA180	; Used in aarch64-simd.md.
 U

[PATCH v2 13/16]Arm: Add support for auto-vectorization using HF mode.

2020-09-25 Thread Tamar Christina
Hi All,

This adds HFmode vectorization support to the AArch32 auto-vectorizer.  This is
supported when +fp16 is used.  I wonder if I should avoid returning the type
when the option isn't enabled.

At the moment it will be returned but the vectorizer will try and fail to use
it.  It wastes a few compile cycles but doesn't result in bad code.

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm.c (arm_preferred_simd_mode): Add E_HFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/vect-half-floats.c: New test.

-- 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 022ef6c3f1d723bdf421268c81cd0c759c414d9a..8ca6b913fddb74cd6f4867efc0a7264184c59db0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28917,6 +28917,8 @@ arm_preferred_simd_mode (scalar_mode mode)
   if (TARGET_NEON)
 switch (mode)
   {
+  case E_HFmode:
+	return TARGET_NEON_VECTORIZE_DOUBLE ? V4HFmode : V8HFmode;
   case E_SFmode:
 	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SFmode : V4SFmode;
   case E_SImode:
diff --git a/gcc/testsuite/gcc.target/arm/vect-half-floats.c b/gcc/testsuite/gcc.target/arm/vect-half-floats.c
new file mode 100644
index ..ebfe7f964442a09053b0cbe04bed425e36b0af96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-half-floats.c
@@ -0,0 +1,14 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target target_float16 } */ 
+/* { dg-require-effective-target arm_fp16_ok } */
+/* { dg-add-options for_float16 } */
+/* { dg-additional-options "-Ofast -ftree-vectorize -fdump-tree-vect-all -std=c11" } */
+
+void foo (_Float16 n1[], _Float16 n2[], _Float16 r[], int n)
+{
+  for (int i = 0; i < n; i++)
+   r[i] = n1[i] + n2[i];
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+



[PATCH v2 12/16]AArch64: Add SVE2 Integer RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this the
following C code:

  void f90 (int _Complex a[restrict N], int _Complex b[restrict N],
int _Complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  mov x3, 0
  mov x4, 200
  whilelo p0.s, xzr, x4
  .p2align 3,,7
  .L2:
  ld1w    z0.s, p0/z, [x0, x3, lsl 2]
  ld1w    z1.s, p0/z, [x1, x3, lsl 2]
  cadd    z0.s, z0.s, z1.s, #90
  st1w    z0.s, p0, [x2, x3, lsl 2]
  incw    x3
  whilelo p0.s, x3, x4
  b.any   .L2
  ret

instead of

  f90:
  mov x3, 0
  mov x4, 0
  mov w5, 100
  whilelo p0.s, wzr, w5
  .p2align 3,,7
  .L2:
  ld2w    {z4.s - z5.s}, p0/z, [x0, x3, lsl 2]
  ld2w    {z2.s - z3.s}, p0/z, [x1, x3, lsl 2]
  sub     z0.s, z4.s, z3.s
  add     z1.s, z5.s, z2.s
  st2w    {z0.s - z1.s}, p0, [x2, x3, lsl 2]
  incw    x4
  inch    x3
  whilelo p0.s, w4, w5
  b.any   .L2
  ret

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (cadd3,
cml4, cmul3): New.
* config/aarch64/iterators.md (SVE2_INT_CMLA_OP, SVE2_INT_CMUL_OP,
SVE2_INT_CADD_OP): New.

-- 
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index e18b9fef16e72496588fb5850e362da4ae42898a..e601c6a4586e3ed1e11aedf047f56d556a99a302 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1774,6 +1774,16 @@ (define_insn "@aarch64_sve_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+(define_expand "cadd3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(unspec:SVE_FULL_I
+	  [(match_operand:SVE_FULL_I 1 "register_operand")
+	   (match_operand:SVE_FULL_I 2 "register_operand")]
+	  SVE2_INT_CADD_OP))]
+  "TARGET_SVE2"
+)
+
 ;; -
 ;;  [INT] Complex ternary operations
 ;; -
@@ -1813,6 +1823,47 @@ (define_insn "@aarch64__lane_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(plus:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand")
+	  (unspec:SVE_FULL_I
+	[(match_operand:SVE_FULL_I 2 "register_operand")
+	 (match_operand:SVE_FULL_I 3 "register_operand")]
+	SVE2_INT_CMLA_OP)))]
+  "TARGET_SVE2"
+{
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[1],
+		   operands[2], operands[3]));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[0],
+		   operands[2], operands[3]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:SVE_FULL_I 0 "register_operand")
+	(unspec:SVE_FULL_I
+	  [(match_operand:SVE_FULL_I 1 "register_operand")
+	   (match_operand:SVE_FULL_I 2 "register_operand")
+	   (match_dup 3)]
+	  SVE2_INT_CMUL_OP))]
+  "TARGET_SVE2"
+{
+  operands[3] = force_reg (mode, CONST0_RTX (mode));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[3],
+		   operands[1], operands[2]));
+  emit_insn (gen_aarch64_sve_cmla (operands[0], operands[0],
+		   operands[1], operands[2]));
+  DONE;
+})
+
 ;; -
 ;;  [INT] Complex dot product
 ;; -
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 7662b929e2c4f6c103cc06e051eb574247320809..c11e976237d30771a7bd7c7fb56922f9c5c785de 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2583,6 +2583,23 @@ (define_int_iterator SVE2_INT_CMLA [UNSPEC_CMLA
 UNSPEC_SQRDCMLAH180
 UNSPEC_SQRDCMLAH270])
 
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed.  They will always need to be expanded into multiple
+;; sequences consisting of CMLA.
+(define_int_iterator SVE2_INT_CMLA_OP [UNSPEC_CMLA
+   UNSPEC_CMLA180
+   UNSPEC_CMLS])
+
+;; Unlike the normal CMLA instructions these represent the actual operation
+;; to be performed.  They will alwa

[PATCH v2 14/16]Arm: Add NEON RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex additions.  With this the
following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  add r3, r2, #1600
  .L2:
  vld1.32 {q8}, [r0]!
  vld1.32 {q9}, [r1]!
  vcadd.f32   q8, q8, q9, #90
  vst1.32 {q8}, [r2]!
  cmp r3, r2
  bne .L2
  bx  lr


instead of

  f90:
  add r3, r2, #1600
  .L2:
  vld2.32 {d24-d27}, [r0]!
  vld2.32 {d20-d23}, [r1]!
  vsub.f32  q8, q12, q11
  vadd.f32  q9, q13, q10
  vst2.32 {d16-d19}, [r2]!
  cmp r3, r2
  bne .L2
  bx  lr


Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/iterators.md (rot): Add UNSPEC_VCMLS, UNSPEC_VCMUL and
UNSPEC_VCMUL180.
(rot_op, rotsplit1, rotsplit2, fcmac1, VCMLA_OP, VCMUL_OP): New.
* config/arm/neon.md (cadd3, cml4,
cmul3): New.
* config/arm/unspecs.md (UNSPEC_VCMUL, UNSPEC_VCMUL180, UNSPEC_VCMLS,
UNSPEC_VCMLS180): New.

-- 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722689aff4c1a143e952f6eb91c0cd86..f5693c0524274da1eb1c767713574c01ec6d544c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1146,10 +1146,38 @@ (define_int_attr crypto_mode [(UNSPEC_SHA1H "V4SI") (UNSPEC_AESMC "V16QI")
 
 (define_int_attr rot [(UNSPEC_VCADD90 "90")
 		  (UNSPEC_VCADD270 "270")
+		  (UNSPEC_VCMLS "0")
 		  (UNSPEC_VCMLA "0")
 		  (UNSPEC_VCMLA90 "90")
 		  (UNSPEC_VCMLA180 "180")
-		  (UNSPEC_VCMLA270 "270")])
+		  (UNSPEC_VCMLA270 "270")
+		  (UNSPEC_VCMUL "0")
+		  (UNSPEC_VCMUL180 "180")])
+
+;; A conjucate is a rotation of 180* around the argand plane, or * I.
+(define_int_attr rot_op [(UNSPEC_VCMLS "")
+			 (UNSPEC_VCMLS180 "_conj")
+			 (UNSPEC_VCMLA "")
+			 (UNSPEC_VCMLA180 "_conj")
+			 (UNSPEC_VCMUL "")
+			 (UNSPEC_VCMUL180 "_conj")])
+
+(define_int_attr rotsplit1 [(UNSPEC_VCMLA "0")
+			(UNSPEC_VCMLA180 "0")
+			(UNSPEC_VCMUL "0")
+			(UNSPEC_VCMUL180 "0")
+			(UNSPEC_VCMLS "270")
+			(UNSPEC_VCMLS180 "90")])
+
+(define_int_attr rotsplit2 [(UNSPEC_VCMLA "90")
+			(UNSPEC_VCMLA180 "270")
+			(UNSPEC_VCMUL "90")
+			(UNSPEC_VCMUL180 "270")
+			(UNSPEC_VCMLS "180")
+			(UNSPEC_VCMLS180 "180")])
+
+(define_int_attr fcmac1 [(UNSPEC_VCMLA "a") (UNSPEC_VCMLA180 "a")
+			 (UNSPEC_VCMLS "s") (UNSPEC_VCMLS180 "s")])
 
 (define_int_attr simd32_op [(UNSPEC_QADD8 "qadd8") (UNSPEC_QSUB8 "qsub8")
 			(UNSPEC_SHADD8 "shadd8") (UNSPEC_SHSUB8 "shsub8")
@@ -1256,3 +1284,12 @@ (define_int_attr bt [(UNSPEC_BFMAB "b") (UNSPEC_BFMAT "t")])
 
 ;; An iterator for CDE MVE accumulator/non-accumulator versions.
 (define_int_attr a [(UNSPEC_VCDE "") (UNSPEC_VCDEA "a")])
+
+;; Define iterators for VCMLA operations
+(define_int_iterator VCMLA_OP [UNSPEC_VCMLA
+			   UNSPEC_VCMLA180
+			   UNSPEC_VCMLS])
+
+;; Define iterators for VCMLA operations as MUL
+(define_int_iterator VCMUL_OP [UNSPEC_VCMUL
+			   UNSPEC_VCMUL180])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab60007901392df0ca1cb09fead4d0e9..1611bcea1ba8cb416d27368e4dc39ce15b3a4cd8 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3217,6 +3217,14 @@ (define_insn "neon_vcadd"
   [(set_attr "type" "neon_fcadd")]
 )
 
+(define_expand "cadd3"
+  [(set (match_operand:VF 0 "register_operand")
+	(unspec:VF [(match_operand:VF 1 "register_operand")
+		(match_operand:VF 2 "register_operand")]
+		VCADD))]
+  "TARGET_COMPLEX"
+)
+
 (define_insn "neon_vcmla"
   [(set (match_operand:VF 0 "register_operand" "=w")
 	(plus:VF (match_operand:VF 1 "register_operand" "0")
@@ -3274,6 +3282,43 @@ (define_insn "neon_vcmlaq_lane"
 )
 
 
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:VF 0 "register_operand")
+	(plus:VF (match_operand:VF 1 "register_operand")
+		 (unspec:VF [(match_operand:VF 2 "register_operand")
+			 (match_operand:VF 3 "register_operand")]
+			 VCMLA_OP)))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_neon_vcmla (operands[0], operands[1],
+	  operands[2], operands[3]));
+  emit_insn (gen_neon_vcmla (operands[0], operands[0],
+	  operands[2], operands[3]));
+  DONE;
+})
+
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  

[PATCH v2 11/16]AArch64: Add SVE RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this the
following C code:

  void f90 (float complex a[restrict N], float complex b[restrict N],
float complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  f90:
  mov x3, 0
  mov x4, 400
  ptrue   p1.b, all
  whilelo p0.s, xzr, x4
  .p2align 3,,7
  .L2:
  ld1w    z0.s, p0/z, [x0, x3, lsl 2]
  ld1w    z1.s, p0/z, [x1, x3, lsl 2]
  fcadd   z0.s, p1/m, z0.s, z1.s, #90
  st1w    z0.s, p0, [x2, x3, lsl 2]
  incw    x3
  whilelo p0.s, x3, x4
  b.any   .L2
  ret

instead of

  f90:
  mov x3, 0
  mov x4, 0
  mov w5, 200
  whilelo p0.s, wzr, w5
  .p2align 3,,7
  .L2:
  ld2w    {z4.s - z5.s}, p0/z, [x0, x3, lsl 2]
  ld2w    {z2.s - z3.s}, p0/z, [x1, x3, lsl 2]
  fsub    z0.s, z4.s, z3.s
  fadd    z1.s, z2.s, z5.s
  st2w    {z0.s - z1.s}, p0, [x2, x3, lsl 2]
  incw    x4
  inch    x3
  whilelo p0.s, w4, w5
  b.any   .L2
  ret

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (cadd3,
cml4, cmul3): New.
* config/aarch64/iterators.md (sve_rot1, sve_rot2): New.

-- 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index cd79aba90ec9cdb5da9e9758495015ef36b2d869..12bc8077994f5a130ff4af6e9bfa7ca1237d0868 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -5109,6 +5109,20 @@ (define_expand "@cond_"
   "TARGET_SVE"
 )
 
+;; Predicated FCADD using ptrue for unpredicated optab for auto-vectorizer
+(define_expand "@cadd3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 3)
+	   (const_int SVE_RELAXED_GP)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")]
+	  SVE_COND_FCADD))]
+  "TARGET_SVE"
+{
+  operands[3] = aarch64_ptrue_reg (mode);
+})
+
 ;; Predicated FCADD, merging with the first input.
 (define_insn_and_rewrite "*cond__2"
   [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w, ?&w")
@@ -6554,6 +6568,62 @@ (define_insn "@aarch64_pred_"
   [(set_attr "movprfx" "*,yes")]
 )
 
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mla/mls operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cml4"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 4)
+	   (match_dup 5)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")
+	   (match_operand:SVE_FULL_F 3 "register_operand")]
+	  FCMLA_OP))]
+  "TARGET_SVE"
+{
+  operands[4] = aarch64_ptrue_reg (mode);
+  operands[5] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[4],
+	operands[1], operands[2],
+	operands[3], operands[5]));
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[4],
+	operands[0], operands[2],
+	operands[3], operands[5]));
+  DONE;
+})
+
+;; unpredicated optab pattern for auto-vectorizer
+;; The complex mul operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "cmul3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_dup 3)
+	   (match_dup 4)
+	   (match_operand:SVE_FULL_F 1 "register_operand")
+	   (match_operand:SVE_FULL_F 2 "register_operand")
+	   (match_dup 5)]
+	  FCMUL_OP))]
+  "TARGET_SVE"
+{
+  operands[3] = aarch64_ptrue_reg (mode);
+  operands[4] = gen_int_mode (SVE_RELAXED_GP, SImode);
+  operands[5] = force_reg (mode, CONST0_RTX (mode));
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[3], operands[1],
+	operands[2], operands[5], operands[4]));
+  emit_insn (
+gen_aarch64_pred_fcmla (operands[0], operands[3], operands[1],
+	operands[2], operands[0],
+	operands[4]));
+  DONE;
+})
+
 ;; Predicated FCMLA with merging.
 (define_expand "@cond_"
   [(set (match_operand:SVE_FULL_F 0 "register_operand")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 98217c9fd3ee2b6063f7564193e400e9ef71c6ac..7662b929e2c4f6c103cc06e051eb574247320809 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3443,6 +3443,35 @@ (define_int_attr rotsplit2 [(UNSPEC_FCMLA "90")
 			(UNSPEC_FCMLS "180")
 			(UNSPEC_FCMLS180 "180")])
 
+;; SVE has slightly different namings 

[PATCH v2 16/16] Testsuite: Add initial tests for NEON (incomplete)

2020-09-25 Thread Tamar Christina
Hi All,

These are just initial testcases to show what the patch is testing for,
however it is incomplete and I am working on better test setup
to test all targets and add middle-end tests.

These were just included for completeness.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex-autovec.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex-autovec.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c: New test.

-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c
new file mode 100644
index ..8f660f392153c3a6a83b31486e275be316c6ad2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c
@@ -0,0 +1,13 @@
+/* { dg-skip-if "" { *-*-* } } */
+
+#define N 200
+
+__attribute__ ((noinline))
+void calc (TYPE a[N], TYPE b[N], TYPE *c)
+{
+  for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] + b[i+1];
+  c[i+1] = a[i+1] - b[i];
+}
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c
new file mode 100644
index ..14014b9d4f2c41e75be3e253d2e47e639e4224c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c
@@ -0,0 +1,12 @@
+/* { dg-skip-if "" { *-*-* } } */
+#define N 200
+
+__attribute__ ((noinline))
+void calc (TYPE a[N], TYPE b[N], TYPE *c)
+{
+  for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] - b[i+1];
+  c[i+1] = a[i+1] + b[i];
+}
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c
new file mode 100644
index ..997d9065504a9a16d3ea1316f7ea4208b3516c55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
+/* { dg-require-effective-target vect_complex_rot_df } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define TYPE double
+#include "vcadd-arrays-autovec-90.c"
+
+extern void abort(void);
+
+int main()
+{
+  TYPE a[N] = {1.0, 2.0, 3.0, 4.0};
+  TYPE b[N] = {4.0, 2.0, 1.5, 4.5};
+  TYPE c[N] = {0};
+  calc (a, b, c);
+
+  if (c[0] != -1.0 || c[1] != 6.0)
+abort ();
+
+  if (c[2] != -1.5 || c[3] != 5.5)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {fcadd\tv[0-9]+\.2d, v[0-9]+\.2d, v[0-9]+\.2d, #90} 1 { target { aarch64*-*-* } } } } */
+/* { dg-final { scan-assembler-not {vcadd\.} { target { arm*-*-* } } } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-

[PATCH v2 15/16]Arm: Add MVE RTL patterns for Complex Addition, Multiply and FMA.

2020-09-25 Thread Tamar Christina
Hi All,

This adds an implementation of the optabs for complex operations.  With this the
following C code:

  void f90 (int _Complex a[restrict N], int _Complex b[restrict N],
int _Complex c[restrict N])
  {
for (int i=0; i < N; i++)
  c[i] = a[i] + (b[i] * I);
  }

generates

  .L3:
  mov r3, r0
  vldrw.32  q2, [r3]
  mov r3, r1
  vldrw.32  q1, [r3]
  mov r3, r2
  vcadd.i32   q3, q2, q1, #90
  adds    r0, r0, #16
  vstrw.32  q3, [r3]
  adds    r1, r1, #16
  adds    r2, r2, #16
  le  lr, .L3
  pop {r4, r5, r6, r7, r8, pc}

which is not ideal due to register allocation and addressing mode issues with
MVE in general.  However -frename-registers cleans up the register allocation:

  .L3:
  mov r5, r0
  mov r6, r1
  vldrw.32  q2, [r5]
  vldrw.32  q1, [r6]
  mov r7, r2
  vcadd.i32   q3, q2, q1, #90
  adds    r0, r0, #16
  vstrw.32  q3, [r7]
  adds    r1, r1, #16
  adds    r2, r2, #16
  le  lr, .L3
  pop {r4, r5, r6, r7, r8, pc}

but leaves the addressing mode problems.

Before this patch it generated a scalar loop

  .L2:
  ldr r7, [r0, r3, lsl #2]
  ldr r5, [r6, r3, lsl #2]
  ldr r4, [r1, r3, lsl #2]
  subs    r5, r7, r5
  ldr r7, [lr, r3, lsl #2]
  add r4, r4, r7
  str r5, [r2, r3, lsl #2]
  str r4, [ip, r3, lsl #2]
  adds    r3, r3, #2
  cmp r3, #200
  bne .L2
  pop {r4, r5, r6, r7, pc}



Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
Cross compiled arm-none-eabi and ran with -march=armv8.1-m.main+mve.fp
-mfloat-abi=hard -mfpu=auto and regression is on-going.

Unfortunately MVE does not currently implement auto-vectorization of floating
point values.  As such I cannot test this directly.  But since they share 90%
of the code with NEON these should just work whenever support is added so I
would still like to commit these.

To support this I had to refactor the MVE bits a bit.  This now uses the same
unspecs for both NEON and MVE and removes the now-unneeded separate signed and
unsigned unspecs, since both point to the signed instruction.

I have tried multiple approaches to cleaning this up but I think this is the
nicest it can get given the slight ISA differences.

Ok for master if no issues?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
, __arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16,
__arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32,
__arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32,
__arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16,
__arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16,
__arm_vcmulq_f16, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32,
__arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16,
__arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32,
__arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32,
__arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f,
vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f,
vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180,
vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270):
New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
* config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U,
VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F,
VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F,
VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S,
VCADDQ_ROT270, VCADDQ_ROT90): Removed.
(mve_rot, VCMUL): New.
(mve_vcaddq_rot270_,
mve_vcaddq_rot270_f, mve_vcaddq_rot90_f, mve_vcmulq_f, mve_vcmulq_rot270_f,
mve_vcmulq_rot90_f, mve_vcmlaq_f, mve_vcmlaq_rot180_f,
mve_vcmlaq_rot270_f, mve_vcmlaq_rot90_f): Removed.
(mve_vcmlaq, mve_vcmulq,
mve_vcaddq, cadd3, mve_vcaddq):
New.
* config/arm/neon.md (cadd3, cml4):
Moved.
(cmul3): Exclude MVE types.
* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New.
* config/arm/vec-common.md (cadd3,

Re: Issue with ggc_delete and finalizers (was Re: New modref/ipa_modref optimization passes)

2020-09-25 Thread David Malcolm via Gcc-patches
On Thu, 2020-09-24 at 08:30 +0200, Jan Hubicka wrote:
> Hi,
> This patch makes ggc_delete to be paired with ggc_alloc_no_dtor.
> I copy same scheme as used by Martin in ipa-fnsummary, that is
> creating a
> static member function create_ggc hidding the ugly bits and using it
> in
> ipa-modref.c.
> 
> I also noticed that modref-tree leaks memory on destruction/collapse
> method and
> fixed that.
> 
> Bootstrapped/regtested x86_64-linux.

It looks like you committed this as
c9da53d6987af5f8ff68b58dd76a9fbc900a6a21.

This appears to fix the issues seen with the GC with jit
(PR jit/97169).

With the previous commit, jit.sum had:

# of expected passes5751
# of unexpected failures64
# of unresolved testcases   1

with a number of SIGSEGV showing up in the FAIL reports, whereas with
c9da53d6987af5f8ff68b58dd76a9fbc900a6a21, jit.sum is restored to:

# of expected passes10854

Thanks!
Dave




Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-09-25 Thread Martin Liška

PING^5

On 7/21/20 6:24 PM, Qing Zhao wrote:

PING^4.

Our company is waiting for this patch to be committed to upstream.

Thanks a lot.

Qing


On Jun 16, 2020, at 7:49 AM, Martin Liška  wrote:

PING^3

On 6/2/20 11:16 AM, Martin Liška wrote:

PING^2
On 5/15/20 11:58 AM, Martin Liška wrote:

We're in stage1: PING^1

On 4/3/20 8:15 PM, Egeyar Bagcioglu wrote:



On 3/18/20 10:05 AM, Martin Liška wrote:

On 3/17/20 7:43 PM, Egeyar Bagcioglu wrote:

Hi Martin,

I like the patch. It definitely serves our purposes at Oracle and provides 
another way to do what my previous patches did as well.

1) It keeps the backwards compatibility regarding -frecord-gcc-switches; 
therefore, removes my related doubts about your previous patch.

2) It still makes use of -frecord-gcc-switches. The new option is only to 
control the format. This addresses some previous objections to having a new 
option doing something similar. Now the new option controls the behaviour of 
the existing one and that behaviour can be further extended.

3) It uses an environment variable as Jakub suggested.

The patch looks good and I confirm that it works for our purposes.


Hello.

Thank you for the support.



Having said that, I have to ask for recognition in this patch for my and my 
company's contributions. Can you please keep my name and my work email in the 
changelog and in the commit message?


Sure, sorry I forgot.


Hi Martin,

I noticed that some comments in the patch were still referring to 
--record-gcc-command-line, the option I suggested earlier. I updated those 
comments to mention -frecord-gcc-switches-format instead and also added my name 
to the patch as you agreed above. I attached the updated patch. We are starting 
to use this patch in the specific domain where we need its functionality.

Regards
Egeyar




Martin



Thanks
Egeyar



On 3/17/20 2:53 PM, Martin Liška wrote:

Hi.

I'm sending an enhanced patch that makes the following changes:
- a new option -frecord-gcc-switches-format is added; the option
   selects format (processed, driver) for all options that record
   GCC command line
- Dwarf gen_produce_string is now used in -fverbose-asm
- The .s file is affected in the following way:

BEFORE:

# GNU C17 (SUSE Linux) version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d] (x86_64-suse-linux)
#compiled by GNU C version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d], GMP version 6.2.0, MPFR version 
4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -fpreprocessed test.i -march=znver1 -mmmx -mno-3dnow
# -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha
# -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx
# -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1
# -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw
# -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
# -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves
# -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi
# -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku
# -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
# -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b
# -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32
# --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1
# -grecord-gcc-switches -g -fverbose-asm -frecord-gcc-switches
# options enabled:  -faggressive-loop-optimizations -fassume-phsa
# -fasynchronous-unwind-tables -fauto-inc-dec -fcommon
# -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -ffp-int-builtin-inexact -ffunction-cse
# -fgcse-lm -fgnu-runtime -fgnu-unique -fident -finline-atomics
# -fipa-stack-alignment -fira-hoist-pressure -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -flifetime-dse -flto-odr-type-merging -fmath-errno
# -fmerge-debug-strings -fpeephole -fplt -fprefetch-loop-arrays
# -frecord-gcc-switches -freg-struct-return -fsched-critical-path-heuristic
# -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
# -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
# -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion
# -fsemantic-interposition -fshow-column -fshrink-wrap-separate
# -fsigned-zeros -fsplit-ivs-in-unroller -fssa-backprop -fstdarg-opt
# -fstrict-volatile-bitfields -fsync-libcalls -ftrapping-math -ftree-cselim
# -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
# -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop
# -ftree-reassoc -ftree-scev-cprop -funit-at-a-time -funwind-tables
# -fverbose-asm -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387
# -mabm -madx -maes -malign-

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> Hi, Richard,
>> 
>> As you suggested, I added a default implementation of the target hook 
>> “zero_call_used_regs (HARD_REG_SET)” as following in my latest patch
>> 
>> 
>> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>> 
>> void
>> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> 
> FWIW, I was suggesting to return the set of registers that are actually
> cleared too.  Here you have the hook emit the asm statement, but IMO the
> way we generate the asm for a given set of registers should be entirely
> target-independent, and happen outside the hook.
> 
> So the hook returning the set of cleared registers does two things:
> 
> (1) It indicates which registers should be clobbered by the asm
>(which would be generated after calling the hook, but emitted
>before the sequence of instructions generated by the hook).

For this purpose, this hook should return a Set of RTX that hold the cleared 
registers, a HARD_REG_SET is not enough.

Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
REGNO).

Which data structure in GCC should be used here to hold this returned value as 
Set of RTX ?
> 
> (2) It indicates which registers should be treated as live on return.
> 
> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.

Instead of storing this info in crtl, in my current patch, I added the 
following in “df-scan.c":
+static HARD_REG_SET zeroed_reg_set;

And routines that manipulate this HARD_REG_SET. 
I think that this should serve the same purpose as storing it to crtl? 

> Then the wrapper around EPILOGUE_USES that we talked about would
> check two things:
> 
> - EPILOGUE_USES itself
> - the crtl HARD_REG_SET
> 
> The crtl set would start out empty and remain empty unless the
> new option is used.

Yes, I did this for zeroed_reg_set in my current patch.
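
For illustration, a minimal sketch of such a wrapper, assuming the set is
named zeroed_reg_set as in the current draft (the function name below is
made up, not from either patch):

/* Treat registers zeroed before return as used in the epilogue,
   in addition to the target's own EPILOGUE_USES.  */
static bool
epilogue_uses_or_zeroed_p (unsigned int regno)
{
  return (EPILOGUE_USES (regno)
	  || TEST_HARD_REG_BIT (zeroed_reg_set, regno));
}
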
> 
>>if (zero_rtx[(int)mode] == NULL_RTX)
>>  {
>>zero_rtx[(int)mode] = reg;
>>tmp = gen_rtx_SET (reg, const0_rtx);
>>emit_insn (tmp);
>>  }
>>else
>>  emit_move_insn (reg, zero_rtx[(int)mode]);
> 
> Hmm, OK, so you're assuming that it's better to zero one register
> and reuse that register for later moves.  I guess this is my RISC
> background/bias showing, but I think it might be simpler to assume
> that zeroing is as cheap as a register move.  The danger with reusing
> earlier registers is that you might introduce a cross-bank move,
> and some targets can only do those via memory.
Okay, I will move zeroes to registers.
> 
> Or perhaps we could use insn_cost to choose between them.  But I think
> the first implementation can just zero each register individually,
> unless we already know of a specific case in which reusing registers
> is necessary.

The current X86 implementation uses register move instead of directly move zero 
to register, I guess it’s because the register move on X86 is cheaper.
> 
>> I tested this default implementation on aarch64 with a small testing case, 
>> -fzero-call-used-regs=all-gpr|used-gpr|used-gpr-arg|used-arg|used work well, 
>> however, 
>> -fzero-call-used-regs=all-arg or -fzero-call-used-regs=all have an internal 
>> compiler error as following:
>> 
>> t1.c:15:1: internal compiler error: in gen_highpart, at emit-rtl.c:1631
>>   15 | }
>>  | ^
>> 0xcff58b gen_highpart(machine_mode, rtx_def*)
>>  ../../hjl-caller-saved-gcc/gcc/emit-rtl.c:1631
>> 0x174b373 aarch64_split_128bit_move(rtx_def*, rtx_def*)
>>  ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.c:3390
>> 0x1d8b087 gen_split_11(rtx_insn*, rtx_def**)
>>  ../../hjl-caller-saved-gcc/gcc/config/aarch64/aarch64.md:1394
>> 
>> As I studied today, I found the major issue for this bug is because the 
>> following statement:
>> 
>>machine_mode mode = reg_raw_mode[regno];
>> 
>> “reg_raw_mode” returns E_TImode for aarch64 register V0 (which is a vector 
>> register on aarch64) , as a result, the zeroing insn for this register is:
>> 
>> (insn 112 111 113 7 (set (reg:TI 32 v0)
>>(const_int 0 [0])) "t1.c":15:1 -1
>> (nil))
>> 
>> 
>> However, looks like that the above RTL have to be splitted into two sub 
>> register moves on aarch64, and the splitting has some issue. 
>> 
>> So, I guess that on aarch64, zeroing vector registers might need other modes 
>> than the one returned by “reg_raw_mode”.  
>> 
>> My questions are:
>> 
>> 1. Is there another available utility routine that returns the proper MODE 
>> for the hard registers that can be readily used to zero the hard register?
>> 2. If not, should I add one more target hook for this purpose? i.e 
>> 
>> /* Return the proper machine mode that can be used to zero this hard 
>> register specified by REGNO.  */
>> machine_mode zero-call-used-regs-mode (unsigned int REGNO)
> 
> Thanks for testing aarch64.  I think there are two issues here,
> one in the p

[PATCH] New patch for the port of gccgo to GNU/Hurd

2020-09-25 Thread Svante Signell via Gcc-patches
Hello,

Latest Debian snapshot of gcc (20200917-1) FTBFS due to a missing hurd
entry in the // +build line of libgo/go/net/fd_posix.go. Attached is a
patch for that missing entry.

With it the latest Debian snapshot has been successfully built. Test
results for libgo and go are:

=== libgo Summary ===

# of expected passes		163
# of unexpected failures	12

=== go Summary ===

# of expected passes		7469
# of unexpected failures	10
# of expected failures		1
# of untested testcases		6
# of unsupported tests		2


Thanks!
--- a/src/libgo/go/net/fd_posix.go	2020-08-03 15:12:53.0 +0200
+++ b/src/libgo/go/net/fd_posix.go	2020-09-24 16:03:50.0 +0200
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris windows
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris windows
 
 package net
 


Re: [PATCH 1/2] rs6000: Support _mm_insert_epi{8,32,64}

2020-09-25 Thread Peter Bergner via Gcc-patches
On 9/24/20 6:22 PM, Segher Boessenkool wrote:
>> +  result [(__N & 0b)] = __D;
> 
> Hrm, GCC supports binary constants like this since 2007, so okay.  But I
> have to wonder if this improves anything over hex (or decimal even!)
> The parens are superfluous (and only hinder legibility), fwiw.

+1 for using hex constants when using them with logical ops like '&'.
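
For instance (the two-bit element mask below is invented for illustration,
not the one from the patch):

static inline void
insert_elem (int result[4], int __D, unsigned __N)
{
  result[__N & 0x3] = __D;	/* hex mask, width obvious next to '&' */
  result[__N & 0b11] = __D;	/* same mask as a binary constant, supported since 2007 */
}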

Peter




Re: [PATCH] OpenACC: Separate enter/exit data APIs

2020-09-25 Thread Andrew Stubbs

On 30/07/2020 12:10, Andrew Stubbs wrote:

On 29/07/2020 15:05, Andrew Stubbs wrote:
This patch does not implement anything new, but simply separates 
OpenACC 'enter data' and 'exit data' into two libgomp API functions.  
The original API name is kept for backward compatibility, but no 
longer referenced by the compiler.


The previous implementation assumed that it would always be possible 
to infer which kind of pragma it was dealing with from the context, 
but there are a few exceptions, and I want to add one more: 
zero-length arrays.


By cleaning this up I will be free to add the new feature without the 
reference counting getting broken.


This update fixes a new conflict and updates the patterns in a number of 
testcases that were affected.


OK to commit?

Andrew
OpenACC: Separate enter/exit data APIs

Move the OpenACC enter and exit data directives from using a single builtin
to having one each.  For most purposes it was easy to tell which was which,
from the directives given, but there are some exceptions.  In particular,
zero-length array copies are indistinguishable, but we still want reference
counting to work.

gcc/ChangeLog:

	* gimple-pretty-print.c (dump_gimple_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	* gimple.h (enum gf_mask): Likewise.
	(is_gimple_omp_oacc): Likewise.
	* gimplify.c (gimplify_omp_target_update): Likewise.
	* omp-builtins.def (BUILT_IN_GOACC_ENTER_EXIT_DATA): Delete.
	(BUILT_IN_GOACC_ENTER_DATA): Add new.
	(BUILT_IN_GOACC_EXIT_DATA): Add new.
	* omp-expand.c (expand_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	(build_omp_regions_1): Likewise.
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (check_omp_nesting_restrictions): Likewise.
	(lower_omp_target): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc-gomp/nesting-fail-1.c: Adjust patterns.
	* c-c++-common/goacc/finalize-1.c: Adjust patterns.
	* c-c++-common/goacc/mdc-1.c: Adjust patterns.
	* c-c++-common/goacc/nesting-fail-1.c: Adjust patterns.
	* c-c++-common/goacc/struct-enter-exit-data-1.c: Adjust patterns.

libgomp/ChangeLog:

	* libgomp.map: Add GOACC_enter_data and GOACC_exit_data.
	* libgomp_g.h (GOACC_enter_exit_data): Delete.
	(GOACC_enter_data): New prototype.
	(GOACC_exit_data): New prototype.
	* oacc-mem.c (GOACC_enter_exit_data): Move most of the content ...
	(GOACC_enter_exit_data_internal): ... here.
	(GOACC_enter_data): New function.
	(GOACC_exit_data): New function.
	* oacc-parallel.c (GOACC_declare): Replace GOACC_enter_exit_data with
	  GOACC_enter_data and GOACC_exit_data.
	* testsuite/libgomp.oacc-c-c++-common/lib-26.c: Delete file.
	* testsuite/libgomp.oacc-c-c++-common/lib-36.c: Delete file.
	* testsuite/libgomp.oacc-c-c++-common/lib-40.c: Delete file.

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index a01bf901657..26978ec1ab5 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1691,8 +1691,11 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
 case GF_OMP_TARGET_KIND_OACC_UPDATE:
   kind = " oacc_update";
   break;
-case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
-  kind = " oacc_enter_exit_data";
+case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+  kind = " oacc_enter_data";
+  break;
+case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
+  kind = " oacc_exit_data";
   break;
 case GF_OMP_TARGET_KIND_OACC_DECLARE:
   kind = " oacc_declare";
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 6cc7e66059d..3f17b1c0739 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -171,9 +171,10 @@ enum gf_mask {
 GF_OMP_TARGET_KIND_OACC_SERIAL = 7,
 GF_OMP_TARGET_KIND_OACC_DATA = 8,
 GF_OMP_TARGET_KIND_OACC_UPDATE = 9,
-GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
+GF_OMP_TARGET_KIND_OACC_ENTER_DATA = 10,
 GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
 GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
+GF_OMP_TARGET_KIND_OACC_EXIT_DATA = 13,
 GF_OMP_TEAMS_HOST		= 1 << 0,
 
 /* True on an GIMPLE_OMP_RETURN statement if the return does not require
@@ -6482,7 +6483,8 @@ is_gimple_omp_oacc (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_UPDATE:
-	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
+	case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+	case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
 	case GF_OMP_TARGET_KIND_OACC_DECLARE:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	  return true;
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2dea03cce3d..8fcba8b5b18 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12976,8 +12976,11 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   switch (TREE_CODE (expr))
 {
 case OACC_ENTER_DATA:
+  kind = GF_OMP_TARGET_KIND_OACC_ENTER_DATA;
+  ort = ORT_ACC;
+   

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao  writes:
>>> Hi, Richard,
>>> 
>>> As you suggested, I added a default implementation of the target hook 
>>> “zero_call_used_regs (HARD_REG_SET)” as following in my latest patch
>>> 
>>> 
>>> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>>> 
>>> void
>>> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> 
>> FWIW, I was suggesting to return the set of registers that are actually
>> cleared too.  Here you have the hook emit the asm statement, but IMO the
>> way we generate the asm for a given set of registers should be entirely
>> target-independent, and happen outside the hook.
>> 
>> So the hook returning the set of cleared registers does two things:
>> 
>> (1) It indicates which registers should be clobbered by the asm
>>(which would be generated after calling the hook, but emitted
>>before the sequence of instructions generated by the hook).
>
> For this purpose, this hook should return a Set of RTX that hold the cleared 
> registers, a HARD_REG_SET is not enough.
>
> Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
> REGNO).
>
> Which data structure in GCC should be used here to hold this returned value 
> as Set of RTX ?

A HARD_REG_SET is enough.  All the caller needs to know is: which registers
were clobbered?  It can then represent a clobber of R with a clobber of
reg_regno_rtx[R].

The mode isn't important for single-register clobbers: clobbering a single
register in one mode is equivalent to clobbering it in another mode.
None of the register contents survive the clobber.

>> (2) It indicates which registers should be treated as live on return.
>> 
>> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
>
> Instead of storing this info in crtl, in my current patch, I added the 
> following in “df-scan.c":
> +static HARD_REG_SET zeroed_reg_set;
>
> And routines that manipulate this HARD_REG_SET. 
> I think that this should serve the same purpose as storing it to crtl? 

Storing it in crtl is better for two reasons:

- Using global statics for this kind of thing makes it harder to
  compile functions in parallel.  (Work is underway to allow that.)

- Having the information in crtl reduces the risk that information
  from one function will get reused for another function, without the
  variable being reinitialised inbetween.

Thanks,
Richard


Re: [EXTERNAL] Re: [PATCH 2/2, rs6000] VSX load/store rightmost element operations

2020-09-25 Thread will schmidt via Gcc-patches
On Thu, 2020-09-24 at 19:40 -0500, Segher Boessenkool wrote:
> On Thu, Sep 24, 2020 at 11:04:38AM -0500, will schmidt wrote:
> > [PATCH 2/2, rs6000] VSX load/store rightmost element operations
> > 
> > Hi,
> >   This adds support for the VSX load/store rightmost element
> > operations.
> > This includes the instructions lxvrbx, lxvrhx, lxvrwx, lxvrdx,
> > stxvrbx, stxvrhx, stxvrwx, stxvrdx; And the builtins
> > vec_xl_sext() /* vector load sign extend */
> > vec_xl_zext() /* vector load zero extend */
> > vec_xst_trunc() /* vector store truncate */.
> > 
> > Testcase results show that the instructions added with this patch
> > show
> > up at low/no optimization (-O0), with a number of those being
> > replaced
> > with other load and store instructions at higher optimization
> > levels.
> > For consistency I've left the tests at -O0.
> > 
> > Regtested OK for Linux on power8,power9 targets.  Sniff-regtested
> > OK on
> > power10 simulator.
> > OK for trunk?
> > 
> > Thanks,
> > -Will
> > 
> > gcc/ChangeLog:
> > * config/rs6000/altivec.h (vec_xl_zest, vec_xl_sext,
> > vec_xst_trunc): New
> > defines.
> 
> vec_xl_zext (no humour there :-) ).

Lol.. one of them slipped through.. my muscle memory struggled on
typing these.. :-)


> 
> > +BU_P10V_OVERLOAD_X (SE_LXVRX,   "se_lxvrx")
> > +BU_P10V_OVERLOAD_X (ZE_LXVRX,   "ze_lxvrx")
> > +BU_P10V_OVERLOAD_X (TR_STXVRX,  "tr_stxvrx")
> 
> I'm not a fan of the cryptic names.  I guess I'll get used to them ;-
> )
> 
> > +  if (op0 == const0_rtx)
> > + addr = gen_rtx_MEM (blk ? BLKmode : tmode, op1);
> 
> That indent is broken.
> 
> > +  else
> > + {
> > +   op0 = copy_to_mode_reg (mode0, op0);
> 
> And so is this.  Should be two spaces, not three.
> 
> > +   addr = gen_rtx_MEM (blk ? BLKmode : smode,
> > + gen_rtx_PLUS (Pmode, op1, op0));
> 
> "gen_rtx_PLUS" should line up with "blk".
> 
> > +  if (sign_extend)
> > +{
> > +   rtx discratch = gen_reg_rtx (DImode);
> > +   rtx tiscratch = gen_reg_rtx (TImode);
> 
> More broken indentation.  (And more later.)
> 
> > +   // emit the lxvr*x insn.
> 
> Use only /* comments */ please, don't mix them.  Emit with a capital
> E.
> 
> > +   pat = GEN_FCN (icode) (tiscratch, addr);
> > +   if (! pat)
> 
> No space after "!" (or any other unary op other than casts and sizeof
> and the like).
> 
> > +   // Emit a sign extention from QI,HI,WI to double.
> 
> "extension"

willdo, thanks

> 
> > +;; Store rightmost element into store_data
> > +;; using stxvrbx, stxvrhx, strvxwx, strvxdx.
> > +(define_insn "vsx_stxvrx"
> > +   [(set
> > +  (match_operand:INT_ISA3 0 "memory_operand" "=Z")
> > +  (truncate:INT_ISA3 (match_operand:TI 1
> > "vsx_register_operand" "wa")))]
> > +  "TARGET_POWER10"
> > +  "stxvrx %1,%y0"
> 
> %x1 I think?

I'll doublecheck. 

> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-
> > char.c
> > @@ -0,0 +1,168 @@
> > +/*
> > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > + * vector element and zero/sign extend). */
> > +
> > +/* { dg-do compile {target power10_ok} } */
> > +/* { dg-do run {target power10_hw} } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> 
> If you dg_require it, why test it on the "dg-do compile" line?  It
> will
> *work* with it of course, but it is puzzling :-)

I've had both compile-time and run-time versions of the test.  In this
case I wanted to try to handle both, so compile when I can compile it,
and run when I can run it, etc.

If that combo doesn't work the way I expect it to, i'll need to split
them out into separate tests.   

> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-load-element-extend-
> > int.c
> > @@ -0,0 +1,165 @@
> > +/*
> > + * Test of vec_xl_sext and vec_xl_zext (load into rightmost
> > + * vector element and zero/sign extend). */
> > +
> > +/* { dg-do compile {target power10_ok} } */
> > +/* { dg-do run {target power10_hw} } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-options "-mdejagnu-cpu=power10 -O0" } */
> 
> Please comment here what that -O0 is for?  So that we still know when
> we
> read it decades from now ;-)

I've got it commented at least once, I'll make sure to get all the
instances covered.

> 
> > +/* { dg-final { scan-assembler-times {\mlxvrwx\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mlwax\M} 0 } } */
> 
> Maybe all of  {\mlwa}  here?

lwax was sufficient for what I sniff-tested.  I'll double-check.

Thanks,
-Will

> 
> 
> Segher



[Patch, fortran] PR/97045 A wrong column is selected when addressing individual elements of unlimited polymorphic dummy argument

2020-09-25 Thread Paul Richard Thomas via Gcc-patches
Hi All,

The original testcase turned out to be relatively easy to fix - the chunks
in trans-expr.c and trans-stmt.c do this. However, I tested character
actual arguments to 'write_array' in the testcase and found that the _len
component of the unlimited polymorphic dummy was not being used for the
selector and so the payloads were being treated as if they were
character(len = 1). The fix for this part of the problem further
complicates the building of array references. It looks to me as if
rationalizing this part of the trans-* part of gfortran is quite a
significant TODO, since it is now little more than bandaid on sticking
plaster! I will flag this up in a new PR.

Regtests on FC31/x86_64 - OK for master?

Paul

This patch fixes PR97045 - unlimited polymorphic array element selectors.

2020-09-25  Paul Thomas  

gcc/fortran
PR fortran/97045
* trans-array.c (gfc_conv_array_ref): Make sure that the class
decl is passed to build_array_ref in the case of unlimited
polymorphic entities.
* trans-expr.c (gfc_conv_derived_to_class): Ensure that array
refs do not precede the _len component. Free the _len expr.
* trans-stmt.c (trans_associate_var): Reset 'need_len_assign'
for polymorphic scalars.
* trans.c (gfc_build_array_ref): When the vptr size is used for
span, multiply by the _len field of unlimited polymorphic
entities, when non-zero.

gcc/testsuite/
PR fortran/97045
* gfortran.dg/select_type_50.f90 : New test.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 6566c47d4ae..998d4d4ed9b 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3787,7 +3787,20 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 	decl = sym->backend_decl;
 }
   else if (sym->ts.type == BT_CLASS)
-decl = NULL_TREE;
+{
+  if (UNLIMITED_POLY (sym))
+	{
+	  gfc_expr *class_expr = gfc_find_and_cut_at_last_class_ref (expr);
+	  gfc_init_se (&tmpse, NULL);
+	  gfc_conv_expr (&tmpse, class_expr);
+	  if (!se->class_vptr)
+	se->class_vptr = gfc_class_vptr_get (tmpse.expr);
+	  gfc_free_expr (class_expr);
+	  decl = tmpse.expr;
+	}
+  else
+	decl = NULL_TREE;
+}
 
   se->expr = build_array_ref (se->expr, offset, decl, se->class_vptr);
 }
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index a690839f591..2c31ec9bf01 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -728,7 +728,7 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
 	  gfc_expr *len;
 	  gfc_se se;
 
-	  len = gfc_copy_expr (e);
+	  len = gfc_find_and_cut_at_last_class_ref (e);
 	  gfc_add_len_component (len);
 	  gfc_init_se (&se, NULL);
 	  gfc_conv_expr (&se, len);
@@ -739,6 +739,7 @@ gfc_conv_derived_to_class (gfc_se *parmse, gfc_expr *e,
 	integer_zero_node));
 	  else
 	tmp = se.expr;
+	  gfc_free_expr (len);
 	}
   else
 	tmp = integer_zero_node;
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 389fec7227e..adc6b8fefb5 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -2091,6 +2091,7 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
 	  /* Obtain a temporary class container for the result.  */
 	  gfc_conv_derived_to_class (&se, e, sym->ts, tmp, false, false);
 	  se.expr = build_fold_indirect_ref_loc (input_location, se.expr);
+	  need_len_assign = false;
 	}
   else
 	{
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index ed054261452..8caa625ab0e 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -429,7 +429,28 @@ gfc_build_array_ref (tree base, tree offset, tree decl, tree vptr)
   /* If decl or vptr are non-null, pointer arithmetic for the array reference
  is likely. Generate the 'span' for the array reference.  */
   if (vptr)
-span = gfc_vptr_size_get (vptr);
+{
+  span = gfc_vptr_size_get (vptr);
+
+  /* Check if this is an unlimited polymorphic object carrying a character
+	 payload. In this case, the 'len' field is non-zero.  */
+  if (decl && GFC_CLASS_TYPE_P (TREE_TYPE (decl)))
+	{
+	  tmp = gfc_class_len_or_zero_get (decl);
+	  if (!integer_zerop (tmp))
+	{
+	  tree cond;
+	  tree stype = TREE_TYPE (span);
+	  tmp = fold_convert (stype, tmp);
+	  cond = fold_build2_loc (input_location, EQ_EXPR,
+  logical_type_node, tmp,
+  build_int_cst (stype, 0));
+	  tmp = fold_build2 (MULT_EXPR, stype, span, tmp);
+	  span = fold_build3_loc (input_location, COND_EXPR, stype,
+  cond, span, tmp);
+	}
+	}
+}
   else if (decl)
 span = get_array_span (type, decl);
 
! { dg-do run }
!
! Test the fix for PR97045. The report was for the INTEGER version. Testing
! revealed a further bug with the character versions.
!
! Contributed by Igor Gayday  
!
program test_prg
  implicit none
  integer :: i
  integer, allocatable :: arr(:, :)
  character(kind = 1, len = 2), allocatable :: chr(:, :)
  character(kind = 4, len = 2), allocatable :: chr4(:, :)

  arr = res

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 10:28 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao mailto:qing.z...@oracle.com>> writes:
>>> On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
>>> wrote:
>>> 
>>> Qing Zhao  writes:
 Hi, Richard,
 
 As you suggested, I added a default implementation of the target hook 
 “zero_call_used_regs (HARD_REG_SET)” as following in my latest patch
 
 
 /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
 
 void
 default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>> 
>>> FWIW, I was suggesting to return the set of registers that are actually
>>> cleared too.  Here you have the hook emit the asm statement, but IMO the
>>> way we generate the asm for a given set of registers should be entirely
>>> target-independent, and happen outside the hook.
>>> 
>>> So the hook returning the set of cleared registers does two things:
>>> 
>>> (1) It indicates which registers should be clobbered by the asm
>>>   (which would be generated after calling the hook, but emitted
>>>   before the sequence of instructions generated by the hook).
>> 
>> For this purpose, this hook should return a Set of RTX that hold the cleared 
>> registers, a HARD_REG_SET is not enough.
>> 
>> Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
>> REGNO).
>> 
>> Which data structure in GCC should be used here to hold this returned value 
>> as Set of RTX ?
> 
> A HARD_REG_SET is enough.  All the caller needs to know is: which registers
> were clobbered?  It can then represent a clobber of R with a clobber of
> reg_regno_rtx[R].

I did not find reg_regno_rtx in the current gcc source, not sure how to use it?
> 
> The mode isn't important for single-register clobbers: clobbering a single
> register in one mode is equivalent to clobbering it in another mode.
> None of the register contents survive the clobber.

Okay, I see.

Then is the following good enough:

/* Generate asm volatile("" : : : "memory") as a memory blockage, at the 
   same time clobbering the register set  specified by ZEROED_REGS.  */

void
expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
{
  rtx asm_op, clob_mem, clob_reg;

  /* first get the number of registers that have been zeroed from ZEROED_REGS 
set.  */
  unsigned int  num_of_regs = ….;

  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
 rtvec_alloc (0), rtvec_alloc (0),
 rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  rtvec v = rtvec_alloc (num_of_regs + 2);

  clob_mem = gen_rtx_SCRATCH (VOIDmode);
  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);

  RTVEC_ELT (v,0) = asm_op;
  RTVEC_ELT (v,1) = clob_mem;

  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    if (TEST_HARD_REG_BIT (zeroed_regs, i))
  {
clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);
RTVEC_ELT (v,i+2) = clob_reg;
  }
  
  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));  
}

How to come up with the above:

   clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);?

> 
>>> (2) It indicates which registers should be treated as live on return.
>>> 
>>> FWIW, for (2), I'd recommend storing the returned HARD_REG_SET in crtl.
>> 
>> Instead of storing this info in crtl, in my current patch, I added the 
>> following in “df-scan.c":
>> +static HARD_REG_SET zeroed_reg_set;
>> 
>> And routines that manipulate this HARD_REG_SET. 
>> I think that this should serve the same purpose as storing it to crtl? 
> 
> Storing it in crtl is better for two reasons:
> 
> - Using global statics for this kind of thing makes it harder to
>  compile functions in parallel.  (Work is underway to allow that.)
> 
> - Having the information in crtl reduces the risk that information
>  from one function will get reused for another function, without the
>  variable being reinitialised inbetween.

Okay, will add a new field zeroed_reg_set into crtl.

Thanks.

Qing
> 
> Thanks,
> Richard



[PATCH][GCC 8] AArch64: Implement Armv8.3-a complex arithmetic intrinsics

2020-09-25 Thread Kyrylo Tkachov
Hi all,

I'd like to backport some patches from Tamar in GCC 9 to GCC 8 that implement 
the complex arithmetic intrinsics for Advanced SIMD.
These should have been present in GCC 8 that gained support for Armv8.3-a.

There were 4 follow-up fixes that I've rolled into the one commit.
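
As a usage sketch (intrinsic signatures per the ACLE; this is not taken
from the patch and needs -march=armv8.3-a or later):

#include <arm_neon.h>

/* One complex multiply-accumulate step: the rot0 and rot90 forms
   together compute acc += a * b for interleaved {re, im} pairs.  */
float32x2_t
complex_mla (float32x2_t acc, float32x2_t a, float32x2_t b)
{
  acc = vcmla_f32 (acc, a, b);
  acc = vcmla_rot90_f32 (acc, a, b);
  return acc;
}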

Bootstrapped and tested on aarch64-none-linux-gnu and arm-none-linux-gnueabihf 
on the GCC 8 branch.
Pushing to the releases/gcc-8 branch.

Thanks,
Kyrill

gcc/
PR target/71233
* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Add 
qualifier_lane_pair_index.
(emit-rtl.h): Include.
(TYPES_QUADOP_LANE_PAIR): New.
(aarch64_simd_expand_args): Use it.
(aarch64_simd_expand_builtin): Likewise.
(AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_laneq_builtin_datum): 
New.
(FCMLA_LANEQ_BUILTIN, AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_lane_builtin_data,
aarch64_init_fcmla_laneq_builtins, aarch64_expand_fcmla_builtin): New.
(aarch64_init_builtins): Add aarch64_init_fcmla_laneq_builtins.
(aarch64_expand_builtin): Add AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V2SF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V4HF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V4HF.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add 
__ARM_FEATURE_COMPLEX.
* config/aarch64/aarch64-simd-builtins.def (fcadd90, fcadd270, fcmla0, 
fcmla90,
fcmla180, fcmla270, fcmla_lane0, fcmla_lane90, fcmla_lane180, 
fcmla_lane270,
fcmla_laneq0, fcmla_laneq90, fcmla_laneq180, fcmla_laneq270,
fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270): New.
* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane,
aarch64_fcmla_laneqv4hf, 
aarch64_fcmlaq_lane,aarch64_fcadd,
aarch64_fcmla): New.
* config/aarch64/arm_neon.h:
(vcadd_rot90_f16): New.
(vcaddq_rot90_f16): New.
(vcadd_rot270_f16): New.
(vcaddq_rot270_f16): New.
(vcmla_f16): New.
(vcmlaq_f16): New.
(vcmla_lane_f16): New.
(vcmla_laneq_f16): New.
(vcmlaq_lane_f16): New.
(vcmlaq_rot90_lane_f16): New.
(vcmla_rot90_laneq_f16): New.
(vcmla_rot90_lane_f16): New.
(vcmlaq_rot90_f16): New.
(vcmla_rot90_f16): New.
(vcmlaq_laneq_f16): New.
(vcmla_rot180_laneq_f16): New.
(vcmla_rot180_lane_f16): New.
(vcmlaq_rot180_f16): New.
(vcmla_rot180_f16): New.
(vcmlaq_rot90_laneq_f16): New.
(vcmlaq_rot270_laneq_f16): New.
(vcmlaq_rot270_lane_f16): New.
(vcmla_rot270_laneq_f16): New.
(vcmlaq_rot270_f16): New.
(vcmla_rot270_f16): New.
(vcmlaq_rot180_laneq_f16): New.
(vcmlaq_rot180_lane_f16): New.
(vcmla_rot270_lane_f16): New.
(vcadd_rot90_f32): New.
(vcaddq_rot90_f32): New.
(vcaddq_rot90_f64): New.
(vcadd_rot270_f32): New.
(vcaddq_rot270_f32): New.
(vcaddq_rot270_f64): New.
(vcmla_f32): New.
(vcmlaq_f32): New.
(vcmlaq_f64): New.
(vcmla_lane_f32): New.
(vcmla_laneq_f32): New.
(vcmlaq_lane_f32): New.
(vcmlaq_laneq_f32): New.
(vcmla_rot90_f32): New.
(vcmlaq_rot90_f32): New.
(vcmlaq_rot90_f64): New.
(vcmla_rot90_lane_f32): New.
(vcmla_rot90_laneq_f32): New.
(vcmlaq_rot90_lane_f32): New.
(vcmlaq_rot90_laneq_f32): New.
(vcmla_rot180_f32): New.
(vcmlaq_rot180_f32): New.
(vcmlaq_rot180_f64): New.
(vcmla_rot180_lane_f32): New.
(vcmla_rot180_laneq_f32): New.
(vcmlaq_rot180_lane_f32): New.
(vcmlaq_rot180_laneq_f32): New.
(vcmla_rot270_f32): New.
(vcmlaq_rot270_f32): New.
(vcmlaq_rot270_f64): New.
(vcmla_rot270_lane_f32): New.
(vcmla_rot270_laneq_f32): New.
(vcmlaq_rot270_lane_f32): New.
(vcmlaq_rot270_laneq_f32): New.
* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
(FCADD, FCMLA): New.
(rot): New.
(FCMLA_maybe_lane): New.
* config/arm/types.md (neon_fcadd, neon_fcmla): New.

gcc/testsuite/
PR target/71233
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache,
check_effective_target_arm_v8_3a_complex_neon_ok,
add_options_for_arm_v8_3a_complex_neon,
check_effective_target_arm_v8_3a_complex_neon_hw,
check_effective_target_vect_complex_rot_N): New.

[PATCH] AArch64: Add Linux cpuinfo string for rng feature

2020-09-25 Thread Kyrylo Tkachov
Hi all,

The Linux kernel has defined the cpuinfo string for the +rng feature, so this 
patch adds that to GCC so that -march=native can pick it up.
Bootstrapped and tested on aarch64-none-linux-gnu.
Committing to trunk and later to the branches.

Thanks,
Kyrill

gcc/
* config/aarch64/aarch64-option-extensions.def (rng): Add cpuinfo 
string.


rng-cpuinfo.patch
Description: rng-cpuinfo.patch


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
>> On Sep 25, 2020, at 10:28 AM, Richard Sandiford  
>> wrote:
>> 
>> Qing Zhao mailto:qing.z...@oracle.com>> writes:
 On Sep 25, 2020, at 7:53 AM, Richard Sandiford  
 wrote:
 
 Qing Zhao  writes:
> Hi, Richard,
> 
> As you suggested, I added a default implementation of the target hook 
> “zero_call_used_regs (HARD_REG_SET)” as following in my latest patch
> 
> 
> /* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> 
> void
> default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 
 FWIW, I was suggesting to return the set of registers that are actually
 cleared too.  Here you have the hook emit the asm statement, but IMO the
 way we generate the asm for a given set of registers should be entirely
 target-independent, and happen outside the hook.
 
 So the hook returning the set of cleared registers does two things:
 
 (1) It indicates which registers should be clobbered by the asm
   (which would be generated after calling the hook, but emitted
   before the sequence of instructions generated by the hook).
>>> 
>>> For this purpose, this hook should return a Set of RTX that hold the 
>>> cleared registers, a HARD_REG_SET is not enough.
>>> 
>>> Since in the ASM_OPERANDS, we will need the RTX for the register (not the 
>>> REGNO).
>>> 
>>> Which data structure in GCC should be used here to hold this returned value 
>>> as Set of RTX ?
>> 
>> A HARD_REG_SET is enough.  All the caller needs to know is: which registers
>> were clobbered?  It can then represent a clobber of R with a clobber of
>> reg_regno_rtx[R].
>
> I did not find reg_regno_rtx in the current gcc source, not sure how to use 
> it?

Sorry, I misremembered the name, it's regno_reg_rtx (which makes
more sense than what I wrote).

>> 
>> The mode isn't important for single-register clobbers: clobbering a single
>> register in one mode is equivalent to clobbering it in another mode.
>> None of the register contents survive the clobber.
>
> Okay, I see.
>
> Then is the following good enough:
>
> /* Generate asm volatile("" : : : "memory") as a memory blockage, at the 
>same time clobbering the register set  specified by ZEROED_REGS.  */
>
> void
> expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
> {
>   rtx asm_op, clob_mem, clob_reg;
>
>   /* first get the number of registers that have been zeroed from ZEROED_REGS 
> set.  */
>   unsigned int  num_of_regs = ….;
>
>   asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>  rtvec_alloc (0), rtvec_alloc (0),
>  rtvec_alloc (0), UNKNOWN_LOCATION);
>   MEM_VOLATILE_P (asm_op) = 1;
>
>   rtvec v = rtvec_alloc (num_of_regs + 2);
>
>   clob_mem = gen_rtx_SCRATCH (VOIDmode);
>   clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
>   clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
>
>   RTVEC_ELT (v,0) = asm_op;
>   RTVEC_ELT (v,1) = clob_mem;
>
>   for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>     if (TEST_HARD_REG_BIT (zeroed_regs, i))
>   {
> clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);
> RTVEC_ELT (v,i+2) = clob_reg;
>   }
>   
>   emit_insn (gen_rtx_PARALLEL (VOIDmode, v));  
> }

Yeah, looks like it should work.

Thanks,
Richard
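
For reference, a cleaned-up variant of that sketch, keeping a separate
vector index instead of reusing the hard-register number and using
regno_reg_rtx as discussed above (still only a sketch, not a committed
implementation):

void
expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
{
  /* Count the registers to be clobbered.  */
  unsigned int num_of_regs = 0;
  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    if (TEST_HARD_REG_BIT (zeroed_regs, i))
      num_of_regs++;

  rtx asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
				     rtvec_alloc (0), rtvec_alloc (0),
				     rtvec_alloc (0), UNKNOWN_LOCATION);
  MEM_VOLATILE_P (asm_op) = 1;

  rtx clob_mem = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);

  rtvec v = rtvec_alloc (num_of_regs + 2);
  RTVEC_ELT (v, 0) = asm_op;
  RTVEC_ELT (v, 1) = clob_mem;

  /* Clobber each zeroed register, using a running index into V.  */
  unsigned int n = 2;
  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    if (TEST_HARD_REG_BIT (zeroed_regs, i))
      RTVEC_ELT (v, n++) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);

  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
}
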


Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 11:58 AM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
 
 
 Which data structure in GCC should be used here to hold this returned 
 value as Set of RTX ?
>>> 
>>> A HARD_REG_SET is enough.  All the caller needs to know is: which registers
>>> were clobbered?  It can then represent a clobber of R with a clobber of
>>> reg_regno_rtx[R].
>> 
>> I did not find reg_regno_rtx in the current gcc source, not sure how to use 
>> it?
> 
> Sorry, I misremembered the name, it's regno_reg_rtx (which makes
> more sense than what I wrote).

Found it!

> 
>>> 
>>> The mode isn't important for single-register clobbers: clobbering a single
>>> register in one mode is equivalent to clobbering it in another mode.
>>> None of the register contents survive the clobber.
>> 
>> Okay, I see.
>> 
>> Then is the following good enough:
>> 
>> /* Generate asm volatile("" : : : "memory") as a memory blockage, at the 
>>   same time clobbering the register set  specified by ZEROED_REGS.  */
>> 
>> void
>> expand_asm_memory_blockage_clobber_regs (HARD_REG_SET zeroed_regs)
>> {
>>  rtx asm_op, clob_mem, clob_reg;
>> 
>>  /* first get the number of registers that have been zeroed from ZEROED_REGS 
>> set.  */
>>  unsigned int  num_of_regs = ….;
>> 
>>  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>> rtvec_alloc (0), rtvec_alloc (0),
>> rtvec_alloc (0), UNKNOWN_LOCATION);
>>  MEM_VOLATILE_P (asm_op) = 1;
>> 
>>  rtvec v = rtvec_alloc (num_of_regs + 2);
>> 
>>  clob_mem = gen_rtx_SCRATCH (VOIDmode);
>>  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
>>  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
>> 
>>  RTVEC_ELT (v,0) = asm_op;
>>  RTVEC_ELT (v,1) = clob_mem;
>> 
>>  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>>    if (TEST_HARD_REG_BIT (zeroed_regs, i))
>>  {
>>clob_reg  = gen_rtx_CLOBBER (VOIDmode, reg_regno_rtx[i]);
>>RTVEC_ELT (v,i+2) = clob_reg;
>>  }
>> 
>>  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));  
>> }
> 
> Yeah, looks like it should work.

thanks.

Last question, in the following code portion:

  /* Now we get a hard register set that need to be zeroed, pass it to
 target to generate zeroing sequence.  */
  HARD_REG_SET zeroed_hardregs;
  start_sequence ();
  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
  rtx_insn *seq = get_insns ();
  end_sequence ();
  if (seq)
{
  /* emit the memory blockage and register clobber asm volatile.  */
  rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);

 /* How to insert the barrier_rtx before "seq"???.  */
 ??
 emit_insn_before (barrier_rtx, seq);  ??

  emit_insn_before (seq, ret);
  
  /* update the data flow information.  */

  df_set_bb_dirty (BLOCK_FOR_INSN (ret));
}

In the above, how should I insert the barrier_rtx in the beginning of “seq” ? 
And then insert the seq before ret?
Is there special thing I need to take care?

Qing
> 
> Thanks,
> Richard



c++: Replace tag_scope with TAG_how

2020-09-25 Thread Nathan Sidwell


I always found tag_scope confusing, as it is not a scope, but a
direction of how to lookup or insert an elaborated type tag.  This
replaces it with a enum class TAG_how.  I also add a new value,
HIDDEN_FRIEND, to distinguish the two cases of innermost-non-class
insertion that we currently conflate.  Also renamed
'lookup_type_scope' to 'lookup_elaborated_type', because again, we're
not providing a scope to lookup in.
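
For reference, a sketch of the new enumeration (the real definition lives
in name-lookup.h; the comments below are inferred from its uses, not
copied from it):

enum class TAG_how
{
  CURRENT_ONLY,		// look up/insert in the current scope only
  GLOBAL,		// ordinary unqualified lookup, ignoring hidden friends
  INNERMOST_NON_CLASS,	// innermost enclosing non-class scope
  HIDDEN_FRIEND		// like INNERMOST_NON_CLASS, but insert as hidden
};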

gcc/cp/
* name-lookup.h (enum tag_scope): Replace with ...
(enum class TAG_how): ... this.  Add HIDDEN_FRIEND value.
(lookup_type_scope): Replace with ...
(lookup_elaborated_type): ... this.
(pushtag): Use TAG_how, not tag_scope.
* cp-tree.h (xref_tag): Parameter is TAG_how, not tag_scope.
* decl.c (lookup_and_check_tag): Likewise.  Adjust.
(xref_tag_1, xref_tag): Likewise. adjust.
(start_enum): Adjust lookup_and_check_tag call.
* name-lookup.c (lookup_type_scope_1): Rename to ...
(lookup_elaborated_type_1) ... here. Use TAG_how, not tag_scope.
(lookup_type_scope): Rename to ...
(lookup_elaborated_type): ... here.  Use TAG_how, not tag_scope.
(do_pushtag): Use TAG_how, not tag_scope.  Adjust.
(pushtag): Likewise.
* parser.c (cp_parser_elaborated_type_specifier): Adjust.
(cp_parser_class_head): Likewise.
gcc/objcp/
* objcp-decl.c (objcp_start_struct): Use TAG_how not tag_scope.
(objcp_xref_tag): Likewise.


pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index bd78f00ba97..321bb959120 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6507,7 +6507,7 @@ extern void grok_special_member_properties	(tree);
 extern bool grok_ctor_properties		(const_tree, const_tree);
 extern bool grok_op_properties			(tree, bool);
 extern tree xref_tag(tag_types, tree,
-		 tag_scope = ts_current,
+		 TAG_how = TAG_how::CURRENT_ONLY,
 		 bool tpl_header_p = false);
 extern void xref_basetypes			(tree, tree);
 extern tree start_enum(tree, tree, tree, tree, bool, bool *);
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index 1709dd9a370..b481bbd7b7d 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -75,7 +75,7 @@ static void record_unknown_type (tree, const char *);
 static int member_function_or_else (tree, tree, enum overload_flags);
 static tree local_variable_p_walkfn (tree *, int *, void *);
 static const char *tag_name (enum tag_types);
-static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool);
+static tree lookup_and_check_tag (enum tag_types, tree, TAG_how, bool);
 static void maybe_deduce_size_from_array_init (tree, tree);
 static void layout_var_decl (tree);
 static tree check_initializer (tree, tree, int, vec **);
@@ -14862,11 +14862,10 @@ check_elaborated_type_specifier (enum tag_types tag_code,
 
 static tree
 lookup_and_check_tag (enum tag_types tag_code, tree name,
-		  tag_scope scope, bool template_header_p)
+		  TAG_how how, bool template_header_p)
 {
-  tree t;
   tree decl;
-  if (scope == ts_global)
+  if (how == TAG_how::GLOBAL)
 {
   /* First try ordinary name lookup, ignoring hidden class name
 	 injected via friend declaration.  */
@@ -14879,16 +14878,16 @@ lookup_and_check_tag (enum tag_types tag_code, tree name,
 	 If we find one, that name will be made visible rather than
 	 creating a new tag.  */
   if (!decl)
-	decl = lookup_type_scope (name, ts_within_enclosing_non_class);
+	decl = lookup_elaborated_type (name, TAG_how::INNERMOST_NON_CLASS);
 }
   else
-decl = lookup_type_scope (name, scope);
+decl = lookup_elaborated_type (name, how);
 
   if (decl
   && (DECL_CLASS_TEMPLATE_P (decl)
-	  /* If scope is ts_current we're defining a class, so ignore a
-	 template template parameter.  */
-	  || (scope != ts_current
+	  /* If scope is TAG_how::CURRENT_ONLY we're defining a class,
+	 so ignore a template template parameter.  */
+	  || (how != TAG_how::CURRENT_ONLY
 	  && DECL_TEMPLATE_TEMPLATE_PARM_P (decl
 decl = DECL_TEMPLATE_RESULT (decl);
 
@@ -14898,11 +14897,10 @@ lookup_and_check_tag (enum tag_types tag_code, tree name,
 	   class C {
 	 class C {};
 	   };  */
-  if (scope == ts_current && DECL_SELF_REFERENCE_P (decl))
+  if (how == TAG_how::CURRENT_ONLY && DECL_SELF_REFERENCE_P (decl))
 	{
 	  error ("%qD has the same name as the class in which it is "
-		 "declared",
-		 decl);
+		 "declared", decl);
 	  return error_mark_node;
 	}
 
@@ -14922,10 +14920,10 @@ lookup_and_check_tag (enum tag_types tag_code, tree name,
 	 class C *c2;		// DECL_SELF_REFERENCE_P is true
 	   };  */
 
-  t = check_elaborated_type_specifier (tag_code,
-	   decl,
-	   template_header_p
-	   | DECL_SELF_REFERENCE_P (decl));
+  tree t = check_elaborated_type_specifier (tag_code,
+		decl,
+		template_header_p
+		| DECL_SELF_REFERENCE_P (decl));
   if (templa

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Richard Sandiford
Qing Zhao  writes:
> Last question, in the following code portion:
>
>   /* Now we get a hard register set that need to be zeroed, pass it to
>  target to generate zeroing sequence.  */
>   HARD_REG_SET zeroed_hardregs;
>   start_sequence ();
>   zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>   rtx_insn *seq = get_insns ();
>   end_sequence ();
>   if (seq)
> {
>   /* emit the memory blockage and register clobber asm volatile.  */
>   rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);
>
>  /* How to insert the barrier_rtx before "seq"???.  */
>  ??
>  emit_insn_before (barrier_rtx, seq);  ??
>
>   emit_insn_before (seq, ret);
>   
>   /* update the data flow information.  */
>
>   df_set_bb_dirty (BLOCK_FOR_INSN (ret));
> }
>
> In the above, how should I insert the barrier_rtx in the beginning of “seq” ? 
> And then insert the seq before ret?
> Is there special thing I need to take care?

Easiest way is just to insert both of them before ret:

  emit_insn_before (barrier_rtx, ret);
  emit_insn_before (seq, ret);

Note that you shouldn't need to mark the block containing the
return instruction as dirty: the emit machinery should do that
for you.  But it might be necessary to mark the exit block
(EXIT_BLOCK_PTR_FOR_FN (cfun)) as dirty because of the new
liveness information -- I'm not sure.

Thanks,
Richard
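
Putting the pieces together, the caller would then look roughly like this
(a sketch only; it assumes the blockage helper returns the barrier rtx
rather than emitting it, and the exit-block dirtying is speculative per
the note above):

  if (seq)
    {
      rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);
      /* Blockage/clobber asm first, then the zeroing sequence, both
	 immediately before the return instruction.  */
      emit_insn_before (barrier_rtx, ret);
      emit_insn_before (seq, ret);
      /* Possibly also needed for the new liveness information.  */
      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
    }
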


Re: [PATCH, rs6000] correct an erroneous BTM value in the BU_P10_MISC define

2020-09-25 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 03:35:24PM -0500, will schmidt wrote:
> We have extraneous BTM entry (RS6000_BTM_POWERPC64) in the define for
> our P10 MISC 2 builtin definition.  This does not exist for the '0',
> '1' or '3' definitions. It appears to me that this was erroneously
> copied from the P7 version of the define which contains a version of the
> BU macro both with and without that element.  Removing the
> RS6000_BTM_POWERPC64 portion of the define does not introduce any obvious
> failures, I believe this extra line can be safely removed.

No, it cannot.

This is used for pdepd/pextd/cntlzdm/cnttzdm/cfuged, all of which do
need 64-bit registers to do anything sane.

This should really have defined some new builtin class, and I thought we
could just be tricky and take a massive shortcut.  Bill has been hit by
this already as well, sigh :-(


Segher


Re: [PATCH] generalized range_query class for multiple contexts

2020-09-25 Thread Andrew MacLeod via Gcc-patches

On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:

On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
As part of the ranger work, we have been trying to clean up and 
generalize interfaces whenever possible. This not only helps in 
reducing the maintenance burden going forward, but provides 
mechanisms for backwards compatibility between ranger and other 
providers/users of ranges throughout the compiler like evrp and VRP.


One such interface is the range_query class in vr_values.h, which 
provides a range query mechanism for use in the simplify_using_ranges 
module.  With it, simplify_using_ranges can be used with the ranger, 
or the VRP twins by providing a get_value_range() method.  This has 
helped us in comparing apples to apples while doing our work, and has 
also future proofed the interface so that asking for a range can be 
done within the context in which it appeared.  For example, 
get_value_range now takes a gimple statement which provides context.  
We are no longer tied to asking for a global SSA range, but can ask 
for the range of an SSA within a statement. Granted, this 
functionality is currently only in the ranger, but evrp/vrp could be 
adapted to pass such context.


The range_query is a good first step, but what we really want is a 
generic query mechanism that can ask for SSA ranges within an 
expression, a statement, an edge, or anything else that may come up.  
We think that a generic mechanism can be used not only for range 
producers, but consumers such as the substitute_and_fold_engine (see 
get_value virtual) and possibly the gimple folder (see valueize).


The attached patchset provides such an interface.  It is meant to be 
a replacement for range_query that can be used for vr_values, 
substitute_and_fold, the subsitute_and_fold_engine, as well as the 
ranger.  The general API is:


class value_query
{
public:
   // Return the singleton expression for NAME at a gimple statement,
   // or NULL if none found.
   virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
   // Return the singleton expression for NAME at an edge, or NULL if
   // none found.
   virtual tree value_on_edge (edge, tree name);
   // Return the singleton expression for the LHS of a gimple
   // statement, assuming an (optional) initial value of NAME. Returns
   // NULL if none found.
   //
   // Note this method calculates the range the LHS would have *after*
   // the statement has executed.
   virtual tree value_of_stmt (gimple *, tree name = NULL);
};

class range_query : public value_query
{
public:
   range_query ();
   virtual ~range_query ();

   virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
   virtual tree value_on_edge (edge, tree name) OVERRIDE;
   virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;

   // These are the range equivalents of the value_* methods. Instead
   // of returning a singleton, they calculate a range and return it in
   // R.  TRUE is returned on success or FALSE if no range was found.
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
= 0;

   virtual bool range_on_edge (irange &r, edge, tree name);
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

   // DEPRECATED: This method is used from vr-values.  The plan is to
   // rewrite all uses of it to the above API.
   virtual const class value_range_equiv *get_value_range (const_tree,
   gimple * = NULL);
};

The duality of the API (value_of_* and range_on_*) is because some 
passes are interested in a singleton value 
(substitute_and_fold_enginge), while others are interested in ranges 
(vr_values).  Passes that are only interested in singletons can take 
a value_query, while passes that are interested in full ranges, can 
take a range_query.  Of course, for future proofing, we would 
recommend taking a range_query, since if you provide a default 
range_of_expr, sensible defaults will be provided for the others in 
terms of range_of_expr.


Note, that the absolute bare minimum that must be provided is a 
value_of_expr and a range_of_expr respectively.
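
As a hedged usage sketch (not part of the patch), a consumer that wants
full ranges would call range_of_expr, here assuming int_range_max as the
irange container:

static bool
operand_known_nonzero_p (range_query &q, gimple *stmt, tree name)
{
  int_range_max r;
  if (!q.range_of_expr (r, name, stmt))
    return false;
  return !r.contains_p (build_zero_cst (TREE_TYPE (name)));
}

A consumer that only needs singletons would instead stay with
value_of_expr and check for a non-null return.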


One piece of the API which is missing is a method  to return the 
range of an arbitrary SSA_NAME *after* a statement.  Currently 
range_of_expr calculates the range of an expression upon entry to the 
statement, whereas range_of_stmt calculates the range of *only* the 
LHS of a statement AFTER the statement has executed.


This would allow for complete representation of the ranges/values in 
something like:


 d_4 = *g_7;

Here the range of g_7 upon entry could be VARYING, but after the 
dereference we know it must be non-zero.  Well for sane targets anyhow.


Choices would be to:

   1) add a 4th method such as "range_after_stmt", or

   2) merge that functionality with the existing range_of_stmt method 
to provide "after" functionality for any ssa_name. Currently the 
SSA_NAME must be the same as the LHS if specified.  It also does not 
need to

Re: One issue with default implementation of zero_call_used_regs

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 12:31 PM, Richard Sandiford  
> wrote:
> 
> Qing Zhao  writes:
>> Last question, in the following code portion:
>> 
>>  /* Now we get a hard register set that need to be zeroed, pass it to
>> target to generate zeroing sequence.  */
>>  HARD_REG_SET zeroed_hardregs;
>>  start_sequence ();
>>  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>>  rtx_insn *seq = get_insns ();
>>  end_sequence ();
>>  if (seq)
>>{
>>  /* emit the memory blockage and register clobber asm volatile.  */
>>  rtx barrier_rtx = expand_asm_reg_clobber_blockage (zeroed_hardregs);
>> 
>> /* How to insert the barrier_rtx before "seq"???.  */
>> ??
>> emit_insn_before (barrier_rtx, seq);  ??
>> 
>>  emit_insn_before (seq, ret);
>> 
>>  /* update the data flow information.  */
>> 
>>  df_set_bb_dirty (BLOCK_FOR_INSN (ret));
>>}
>> 
>> In the above, how should I insert the barrier_rtx in the beginning of “seq” 
>> ? And then insert the seq before ret?
>> Is there special thing I need to take care?
> 
> Easiest way is just to insert both of them before ret:
> 
>  emit_insn_before (barrier_rtx, ret);
>  emit_insn_before (seq, ret);
> 
Thanks. Will do that.

> Note that you shouldn't need to mark the block containing the
> return instruction as dirty: the emit machinery should do that
> for you.

Okay, I see.

>  But it might be necessary to mark the exit block
> (EXIT_BLOCK_PTR_FOR_FN (cfun)) as dirty because of the new
> liveness information -- I'm not sure.

Will study a little more here.

Thanks a lot for your help.

Qing
> 
> Thanks,
> Richard



Re: [PATCH 0/2] Rework adding Power10 IEEE 128-bit min, max, and conditional move

2020-09-25 Thread Segher Boessenkool
Hi!

On Thu, Sep 24, 2020 at 04:56:27PM -0400, Michael Meissner wrote:
> On Thu, Sep 24, 2020 at 10:24:52AM +0200, Florian Weimer wrote:
> > * Michael Meissner via Gcc-patches:
> > 
> > > These patches are my latest versions of the patches to add IEEE 128-bit 
> > > min,
> > > max, and conditional move to GCC.  They correspond to the earlier patches 
> > > #3
> > > and #4 (patches #1 and #2 have been installed).
> > 
> > Is this about IEEE min or IEEE minimum?  My understanding is that they
> > are not the same (or that the behavior depends on the standard version,
> > but I think min was replaced with minimum in the 2019 standard or
> > something like that).

This is about the GCC internal RTX code "smin", which returns an
undefined result if either operand is a NAN, or both are zeros (of
different sign).

> The ISA 3.0 added 2 min/max variants to add to the original variant in power7
> (ISA 2.6).

2.06, fwiw.

>   xsmaxdp   Maximum value
>   xsmaxcdp  Maximum value with "C" semantics
>   xsmaxjdp  Maximum value with "Java" semantics

xsmaxdp implements IEEE behaviour fine.  xsmaxcdp is simply the C
expression  (x > y ? x : y) (or something like that), and xsmaxjdp is
something like that for Java.
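
In source form that is roughly (a sketch mirroring the characterisation
above, nothing more precise than that):

double
cmax (double x, double y)
{
  /* Plain C semantics: if either operand is a NaN the comparison is
     false and y is returned; no IEEE max/maximum special-casing.  */
  return x > y ? x : y;
}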

> Due to the NaN rules, unless you use -ffast-math, the compiler won't generate
> these by default.

Simply because the RTL would be undefined!

> In ISA 3.1 (power10) the decision was made to only provide the "C" form on
> maximum and minimum.

... for quad precision.


Segher


[PATCH] make handling of zero-length arrays in C++ pretty printer more robust (PR 97201)

2020-09-25 Thread Martin Sebor via Gcc-patches

The C and C++ representations of zero-length arrays are different:
C uses a null upper bound of the type's domain while C++ uses
SIZE_MAX.  This makes the middle end logic more complicated (and
prone to mistakes) because it has to be prepared for both.  A recent
change to -Warray-bounds has the middle end create a zero-length
array to print in a warning message.  I forgot about this gotcha
and, as a result, when the warning triggers under these conditions
in C++, it causes an ICE in the C++ pretty printer that in turn
isn't prepared for the C form of the domain.

In my mind, the "right fix" is to make the representation the same
between the front ends, but I'm certain that such a change would
cause more problems before it solved them.  Another solution might
be to provide APIs for creating (and querying) arrays and have them
call language hooks in cases where the representation might differ.
But that would likely be quite intrusive as well.  So with that in
mind, for the time being, the attached patch just continues to deal
with the difference by teaching the C++ pretty printer to also
recognize the C form of the zero-length domain.
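
For context, a minimal C++ reproducer of the kind the new tests exercise
(a sketch, not the exact testcase):

#include <new>

void sink (void*);

void f ()
{
  /* With -O2 -Wall this now gets "array subscript 0 is outside array
     bounds of 'int [0]'" instead of ICEing in the C++ pretty printer.  */
  int *p = static_cast<int *> (operator new (0));
  p[0] = 0;
  sink (p);
}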

While testing the one line fix I noticed that -Warray-bounds (and
therefore, I assume also all other warnings that detect out of bounds
accesses to allocated objects) triggers only for the ordinary form of
operator new and not for the nothrow overload, for instance.  That's
because the ordinary form is recognized as a built-in which has
the alloc_size attribute attached to it.  But because the other forms
are neither built-in nor declared in  with the same attribute,
the warning doesn't trigger.  So the patch also adds the attribute
to the declarations of these overloads in .  In addition, it
adds attribute malloc to a couple of overloads of the operator that
it's missing from.

Tested on x86_64-linux.

Martin
PR c++/97201 - ICE in -Warray-bounds writing to result of operator new(0)

gcc/cp/ChangeLog:

	PR c++/97201
	* error.c (dump_type_suffix): Handle both the C and C++ forms of
	zero-length arrays.

libstdc++-v3/ChangeLog:

	PR c++/97201
	* libsupc++/new (operator new): Add attribute alloc_size and malloc.

gcc/testsuite/ChangeLog:

	PR c++/97201
	* g++.dg/warn/Warray-bounds-10.C: New test.
	* g++.dg/warn/Warray-bounds-11.C: New test.
	* g++.dg/warn/Warray-bounds-12.C: New test.
	* g++.dg/warn/Warray-bounds-13.C: New test.


diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index ecb41e82d8c..11ed3aedc8d 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -951,8 +951,11 @@ dump_type_suffix (cxx_pretty_printer *pp, tree t, int flags)
   if (tree dtype = TYPE_DOMAIN (t))
 	{
 	  tree max = TYPE_MAX_VALUE (dtype);
-	  /* Zero-length arrays have an upper bound of SIZE_MAX.  */
-	  if (integer_all_onesp (max))
+	  /* Zero-length arrays have a null upper bound in C and SIZE_MAX
+	 in C++.  Handle both since the type might be constructed by
+	 the middle end and end up here as a result of a warning (see
+	 PR c++/97201).  */
+	  if (!max || integer_all_onesp (max))
 	pp_character (pp, '0');
 	  else if (tree_fits_shwi_p (max))
 	pp_wide_integer (pp, tree_to_shwi (max) + 1);
diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
new file mode 100644
index 000..22466977b68
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-10.C
@@ -0,0 +1,64 @@
+/* PR c++/97201 - ICE in -Warray-bounds writing to result of operator new(0)
+   Verify that out-of-bounds accesses to memory returned by default operator
+   new() are diagnosed.
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Warray-bounds -ftrack-macro-expansion=0" } */
+
+typedef __INT32_TYPE__ int32_t;
+
+void sink (void*);
+
+#define OP_NEW(n)  operator new (n)
+#define T(T, n, i) do {\
+T *p = (T*) OP_NEW (n);			\
+p[i] = 0;	\
+sink (p);	\
+  } while (0)
+
+void warn_op_new ()
+{
+  T (int32_t, 0, 0);  // { dg-warning "array subscript 0 is outside array bounds of 'int32_t \\\[0]'" }
+  // { dg-message "referencing an object of size \\d allocated by 'void\\\* operator new\\\(\(long \)?unsigned int\\\)'" "note" { target *-*-* } .-1 }
+  T (int32_t, 1, 0);  // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[1]'" }
+  T (int32_t, 2, 0); //  { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[2]'" }
+  T (int32_t, 3, 0); // { dg-warning "array subscript 'int32_t {aka int}\\\[0]' is partly outside array bounds of 'unsigned char \\\[3]'" }
+
+  T (int32_t, 4, 0);
+
+  T (int32_t, 0, 1);  // { dg-warning "array subscript 1 is outside array bounds of 'int32_t \\\[0]'" }
+  T (int32_t, 1, 1);  // { dg-warning "array subscript 1 is outside array bounds " }
+  T (int32_t, 2, 1);  // { dg-warning "array subscript 1 is outside array bound

c++: Adjust pushdecl/duplicate_decls API

2020-09-25 Thread Nathan Sidwell

The decl pushing APIs and duplicate_decls take an 'is_friend' parm,
when what they actually mean is 'hide this from name lookup'.  That
conflation has gotten more anachronistic as time moved on.  We now
have anticipated builtins, and I plan to have injected extern decls
soon.  So this patch is mainly a renaming exercise.  is_friend ->
hiding.  duplicate_decls gets an additional 'was_hidden' parm.  As
I've already said, hiddenness is a property of the symbol table, not
the decl.  Builtins are now pushed requesting hiding, and pushdecl
asserts that we don't attempt to push a thing that should be hidden
without asking for it to be hidden.

This is the final piece of groundwork to get rid of a bunch of 'this
is hidden' markers on decls and move the hiding management entirely
into name lookup.
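As a small aside, an example (mine, not from the patch) of the hiding being
talked about: a friend defined only inside a class is pushed into the
enclosing namespace but stays invisible to ordinary lookup and is found only
through argument-dependent lookup.

struct S {
  friend void hidden_fn (S) { }   // injected into the enclosing scope, but hidden
};

void use (S s)
{
  hidden_fn (s);   // OK: found by ADL through the argument of type S
}
// An unqualified call to hidden_fn with no argument of type S would not find it.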

gcc/cp/
* cp-tree.h (duplicate_decls): Replace 'is_friend' with 'hiding'
and add 'was_hidden'.
* name-lookup.h (pushdecl_namespace_level): Replace 'is_friend'
with 'hiding'.
(pushdecl): Likewise.
(pushdecl_top_level): Drop is_friend parm.
* decl.c (check_no_redeclaration_friend_default_args): Rename parm
olddecl_hidden_p.
(duplicate_decls): Replace 'is_friend' with 'hiding'
and 'was_hidden'.  Do minimal adjustments in body.
(cxx_builtin_function): Pass 'hiding' to pushdecl.
* friend.c (do_friend): Pass 'hiding' to pushdecl.
* name-lookup.c (supplement_binding_1): Drop defaulted arg to
duplicate_decls.
(update_binding): Replace 'is_friend' with 'hiding'.  Drop
defaulted arg to duplicate_decls.
(do_pushdecl): Replace 'is_friend' with 'hiding'.  Assert no
surprise hiding.  Adjust duplicate_decls calls to inform of old
decl's hiddenness.
(pushdecl): Replace 'is_friend' with 'hiding'.
(set_identifier_type_value_with_scope): Adjust update_binding
call.
(do_pushdecl_with_scope): Replace 'is_friend' with 'hiding'.
(pushdecl_outermost_localscope): Drop default arg to
do_pushdecl_with_scope.
(pushdecl_namespace_level): Replace 'is_friend' with 'hiding'.
(pushdecl_top_level): Drop is_friend parm.
* pt.c (register_specialization): Comment duplicate_decls call
args.
(push_template_decl): Comment pushdecl_namespace_level.
(tsubst_friend_function, tsubst_friend_class): Likewise.

pushing to trunk

nathan
--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 321bb959120..b7f5b6b399f 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6466,7 +6466,8 @@ extern void determine_local_discriminator	(tree);
 extern int decls_match(tree, tree, bool = true);
 extern bool maybe_version_functions		(tree, tree, bool);
 extern tree duplicate_decls			(tree, tree,
-		 bool is_friend = false);
+		 bool hiding = false,
+		 bool was_hidden = false);
 extern tree declare_local_label			(tree);
 extern tree define_label			(location_t, tree);
 extern void check_goto(tree);
diff --git i/gcc/cp/decl.c w/gcc/cp/decl.c
index b481bbd7b7d..c00b996294e 100644
--- i/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -1341,17 +1341,16 @@ check_redeclaration_no_default_args (tree decl)
 
 static void
 check_no_redeclaration_friend_default_args (tree olddecl, tree newdecl,
-	bool olddecl_hidden_friend_p)
+	bool olddecl_hidden_p)
 {
-  if (!olddecl_hidden_friend_p && !DECL_FRIEND_P (newdecl))
+  if (!olddecl_hidden_p && !DECL_FRIEND_P (newdecl))
 return;
 
-  tree t1 = FUNCTION_FIRST_USER_PARMTYPE (olddecl);
-  tree t2 = FUNCTION_FIRST_USER_PARMTYPE (newdecl);
-
-  for (; t1 && t1 != void_list_node;
+  for (tree t1 = FUNCTION_FIRST_USER_PARMTYPE (olddecl),
+	 t2 = FUNCTION_FIRST_USER_PARMTYPE (newdecl);
+   t1 && t1 != void_list_node;
t1 = TREE_CHAIN (t1), t2 = TREE_CHAIN (t2))
-if ((olddecl_hidden_friend_p && TREE_PURPOSE (t1))
+if ((olddecl_hidden_p && TREE_PURPOSE (t1))
 	|| (DECL_FRIEND_P (newdecl) && TREE_PURPOSE (t2)))
   {
 	auto_diagnostic_group d;
@@ -1435,10 +1434,14 @@ duplicate_function_template_decls (tree newdecl, tree olddecl)
If NEWDECL is not a redeclaration of OLDDECL, NULL_TREE is
returned.
 
-   NEWDECL_IS_FRIEND is true if NEWDECL was declared as a friend.  */
+   HIDING is true if the new decl is being hidden.  WAS_HIDDEN is true
+   if the old decl was hidden.
+
+   Hidden decls can be anticipated builtins, injected friends, or
+   (coming soon) injected from a local-extern decl.   */
 
 tree
-duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
+duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
 {
   unsigned olddecl_uid = DECL_UID (olddecl);
   int olddecl_friend = 0, types_match = 0, hidden_friend = 0;
@@ -1510,7 +1513,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
 	{
 	  /* Avoid warnings redeclaring built-ins which have not been
 	 expli

Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-09-25 Thread Qing Zhao via Gcc-patches



> On Sep 25, 2020, at 9:55 AM, Martin Liška  wrote:
> 
> PING^5
> 

Thanks a lot for pinging this patch again.

Hopefully it can be committed into GCC 11 very soon.

Qing
> On 7/21/20 6:24 PM, Qing Zhao wrote:
>> PING^4.
>> Our company is waiting for this patch to be committed to upstream.
>> Thanks a lot.
>> Qing
>>> On Jun 16, 2020, at 7:49 AM, Martin Liška  wrote:
>>> 
>>> PING^3
>>> 
>>> On 6/2/20 11:16 AM, Martin Liška wrote:
 PING^2
 On 5/15/20 11:58 AM, Martin Liška wrote:
> We're in stage1: PING^1
> 
> On 4/3/20 8:15 PM, Egeyar Bagcioglu wrote:
>> 
>> 
>> On 3/18/20 10:05 AM, Martin Liška wrote:
>>> On 3/17/20 7:43 PM, Egeyar Bagcioglu wrote:
 Hi Martin,
 
 I like the patch. It definitely serves our purposes at Oracle and 
 provides another way to do what my previous patches did as well.
 
 1) It keeps the backwards compatibility regarding 
 -frecord-gcc-switches; therefore, removes my related doubts about your 
 previous patch.
 
 2) It still makes use of -frecord-gcc-switches. The new option is only 
 to control the format. This addresses some previous objections to 
 having a new option doing something similar. Now the new option 
 controls the behaviour of the existing one and that behaviour can be 
 further extended.
 
 3) It uses an environment variable as Jakub suggested.
 
 The patch looks good and I confirm that it works for our purposes.
>>> 
>>> Hello.
>>> 
>>> Thank you for the support.
>>> 
 
 Having said that, I have to ask for recognition in this patch for my 
 and my company's contributions. Can you please keep my name and my 
 work email in the changelog and in the commit message?
>>> 
>>> Sure, sorry I forgot.
>> 
>> Hi Martin,
>> 
>> I noticed that some comments in the patch were still referring to 
>> --record-gcc-command-line, the option I suggested earlier. I updated 
>> those comments to mention -frecord-gcc-switches-format instead and also 
>> added my name to the patch as you agreed above. I attached the updated 
>> patch. We are starting to use this patch in the specific domain where we 
>> need its functionality.
>> 
>> Regards
>> Egeyar
>> 
>> 
>>> 
>>> Martin
>>> 
 
 Thanks
 Egeyar
 
 
 
 On 3/17/20 2:53 PM, Martin Liška wrote:
> Hi.
> 
> I'm sending enhanced patch that makes the following changes:
> - a new option -frecord-gcc-switches-format is added; the option
>   selects format (processed, driver) for all options that record
>   GCC command line
> - Dwarf gen_produce_string is now used in -fverbose-asm
> - The .s file is affected in the following way:
> 
> BEFORE:
> 
> # GNU C17 (SUSE Linux) version 9.2.1 20200128 [revision 
> 83f65674e78d97d27537361de1a9d74067ff228d] (x86_64-suse-linux)
> #compiled by GNU C version 9.2.1 20200128 [revision 
> 83f65674e78d97d27537361de1a9d74067ff228d], GMP version 6.2.0, MPFR 
> version 4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP
> 
> # GGC heuristics: --param ggc-min-expand=100 --param 
> ggc-min-heapsize=131072
> # options passed:  -fpreprocessed test.i -march=znver1 -mmmx 
> -mno-3dnow
> # -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes 
> -msha
> # -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi 
> -mno-sgx
> # -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 
> -msse4.1
> # -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed 
> -mprfchw
> # -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er 
> -mno-avx512cd
> # -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves
> # -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma 
> -mno-avx512vbmi
> # -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero 
> -mno-pku
> # -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
> # -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri 
> -mno-movdir64b
> # -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32
> # --param l1-cache-line-size=64 --param l2-cache-size=512 
> -mtune=znver1
> # -grecord-gcc-switches -g -fverbose-asm -frecord-gcc-switches
> # options enabled:  -faggressive-loop-optimizations -fassume-phsa
> # -fasynchronous-unwind-tables -fauto-inc-dec -fcommon
> # -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
> # -feliminate-unused-debug-types -ffp-int-builtin-ine

Re: [PATCH 1/2, rs6000] int128 sign extention instructions (partial prereq)

2020-09-25 Thread Pat Haugen via Gcc-patches
On 9/24/20 10:59 AM, will schmidt via Gcc-patches wrote:
> +;; Move DI value from GPR to TI mode in VSX register, word 1.
> +(define_insn "mtvsrdd_diti_w1"
> +  [(set (match_operand:TI 0 "register_operand" "=wa")
> + (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> +UNSPEC_MTVSRD_DITI_W1))]
> +  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrdd %x0,0,%1"
> +  [(set_attr "type" "vecsimple")])

"vecmove" (since I just updated the other uses).

> +
> +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
> +(define_insn "extendditi2_vector"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
> + UNSPEC_EXTENDDITI2))]
> +  "TARGET_POWER10"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "exts")])

"vecexts".

> +
> +(define_expand "extendditi2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand")
> +(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> +  "TARGET_POWER10"
> +  {
> +/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */
> +rtx temp = gen_reg_rtx (TImode);
> +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
> +emit_insn (gen_extendditi2_vector (operands[0], temp));
> +DONE;
> +  }
> +  [(set_attr "type" "exts")])

Don't need "type" attr on define_expand since the type will come from the 2 
individual insns emitted.

Thanks,
Pat


Re: [PATCH] tree-optimization/97151 - improve PTA for C++ operator delete

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/25/20 2:30 AM, Richard Biener wrote:

On Thu, 24 Sep 2020, Jason Merrill wrote:


On 9/24/20 3:43 AM, Richard Biener wrote:

On Wed, 23 Sep 2020, Jason Merrill wrote:


On 9/23/20 2:42 PM, Richard Biener wrote:

On September 23, 2020 7:53:18 PM GMT+02:00, Jason Merrill

wrote:

On 9/23/20 4:14 AM, Richard Biener wrote:

C++ operator delete, when DECL_IS_REPLACEABLE_OPERATOR_DELETE_P,
does not cause the deleted object to be escaped.  It also has no
other interesting side-effects for PTA so skip it like we do
for BUILT_IN_FREE.


Hmm, this is true of the default implementation, but since the function

is replaceable, we don't know what a user definition might do with the
pointer.


But can the object still be 'used' after delete? Can delete fail / throw?

What guarantee does the predicate give us?


The deallocation function is called as part of a delete expression in order
to
release the storage for an object, ending its lifetime (if it was not ended
by
a destructor), so no, the object can't be used afterward.


OK, but the delete operator can access the object contents if there
wasn't a destructor ...



A deallocation function that throws has undefined behavior.


OK, so it seems the 'replaceable' operators are the global ones
(for user-defined/class-specific placement variants I see arbitrary
extra arguments that we'd possibly need to handle).
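For reference, a short illustration (mine) of that distinction; only the
global, non-placement forms are the replaceable ones:

// Replaceable: a program may provide its own definition of this.
void operator delete (void *) noexcept;

// Not replaceable: placement form with an extra user-supplied argument.
struct Pool;
void operator delete (void *, Pool &) noexcept;

// Not replaceable: class-specific deallocation function.
struct T { static void operator delete (void *); };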

I'm happy to revert but I'd like to have a testcase that FAILs
with the patch ;)

Now, the following aborts:

struct X {
static struct X saved;
int *p;
X() { __builtin_memcpy (this, &saved, sizeof (X)); }
};
void operator delete (void *p)
{
__builtin_memcpy (&X::saved, p, sizeof (X));
}
int main()
{
int y = 1;
X *p = new X;
p->p = &y;
delete p;
X *q = new X;
*(q->p) = 2;
if (y != 2)
  __builtin_abort ();
}

and I could fix this by making not *p itself escape, but rather what *p points to.
The testcase is of course maximally awkward, but hey ... ;)

Now this would all be moot if operator delete may not access
the object (or if the object contents are undefined at that point).

Oh, and the testcase segfaults when compiled with GCC 10 because
there we elide the new X / delete p pair ... which is invalid then?
Hmm, we emit

MEM[(struct X *)_8] ={v} {CLOBBER};
operator delete (_8, 8);

so the object contents are undefined _before_ calling delete
even when I do not have a DTOR?  That is, the above,
w/o -fno-lifetime-dse, makes the PTA patch OK for the testcase.


Yes, all classes have a destructor, even if it's trivial, so the object's
lifetime definitely ends before the call to operator delete. This is less
clear for scalar objects, but treating them similarly would be consistent with
other recent changes, so I think it's fine for us to assume that scalar
objects are also invalidated before the call to operator delete.  But of
course this doesn't apply to explicit calls to operator delete outside of a
delete expression.


OK, so change the testcase main slightly to

int main()
{
   int y = 1;
   X *p = new X;
   p->p = &y;
   ::operator delete(p);
   X *q = new X;
   *(q->p) = 2;
   if (y != 2)
 __builtin_abort ();
}

in this case the lifetime of *p does not end before calling
::operator delete() and delete can stash the object contents
somewhere before ending its lifetime.  For the very same reason
we may not elide a new/delete pair like in

int main()
{
   int *p = new int;
   *p = 1;
   ::operator delete (p);
}


Correct; the permission to elide new/delete pairs is for the
expressions, not the functions.



which we before the change did not do only because calling
operator delete made p escape.  Unfortunately points-to analysis
cannot really reconstruct whether delete was called as part of
a delete expression or directly (and thus whether object lifetime
ended already), neither can DCE.  So I guess we need to mark
the operator delete call in some way to make those transforms
safe.  At least currently any operator delete call makes the
alias guarantee of a operator new call moot by forcing the object
to be aliased with all global and escaped memory ...

Looks like there are some unallocated flags for CALL_EXPR we could
pick but I wonder if we can recycle protected_flag which is

CALL_FROM_THUNK_P and
CALL_ALLOCA_FOR_VAR_P in
CALL_EXPR

for calls to DECL_IS_OPERATOR_{NEW,DELETE}_P, thus whether
we have CALL_FROM_THUNK_P for those operators.  Guess picking
a new flag is safer.


We won't ever call those operators from a thunk, so it should be OK to 
reuse it.
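A rough sketch of what that reuse might look like, modeled on the existing
protected_flag accessors in tree.h; the macro name below is hypothetical:

/* In the style of CALL_FROM_THUNK_P / CALL_ALLOCA_FOR_VAR_P: true for a
   CALL_EXPR to an operator new/delete that was generated by a new/delete
   expression rather than written as a direct call.  */
#define CALL_FROM_NEW_OR_DELETE_P(NODE) \
  (CALL_EXPR_CHECK (NODE)->base.protected_flag)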



But, does it seem correct that we need to distinguish
delete expressions from plain calls to operator delete?


A reason for that distinction came up in the context of omitting 
new/delete pairs: we want to consider the operator first called by the 
new or delete expression, not a call from that first operator to another 
operator new/delete and exposed by inlining.
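A contrived sketch (mine) of that situation: after inlining A::operator new,
the inner call to ::operator new must not be mistaken for the
expression-level allocation when deciding whether the pair can be elided.

struct A
{
  static void *operator new (__SIZE_TYPE__ n) { return ::operator new (n); }
  static void operator delete (void *p) { ::operator delete (p); }
};

void f ()
{
  A *p = new A;   // the elidable pair is this new-expression ...
  delete p;       // ... and this delete-expression, not the forwarded calls inside.
}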


https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543404.htm

Re: [PATCH] c++: Implement -Wrange-loop-construct [PR94695]

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/24/20 8:05 PM, Marek Polacek wrote:

This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

   struct S { char arr[128]; };
   void fn () {
 S arr[5];
 for (const auto x : arr) {  }
   }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto &x" would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' 
[-Wrange-loop-construct]
 4 |   for (const auto x : arr) {  }
   |   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
 4 |   for (const auto x : arr) {  }
   |   ^
   |   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  I've
used perform_implicit_conversion to perform the imaginary conversion.
Then if the conversion doesn't have any side-effects, I assume it does
not call any functions or create any TARGET_EXPRs, and is just a simple
assignment like this one:

   const T &x = (const T &) <__for_begin>;

But it can also be a CALL_EXPR:

   x = (const T &) Iterator::operator* (&__for_begin)

which is still fine -- we just use the return value and don't create
any copies.


Would conv_binds_ref_to_prvalue (implicit_conversion (...)) do what you 
want?



This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.
---
  gcc/c-family/c.opt|   4 +
  gcc/cp/parser.c   |  77 ++-
  gcc/doc/invoke.texi   |  21 +-
  .../g++.dg/warn/Wrange-loop-construct.C   | 207 ++
  4 files changed, 304 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wrange-loop-construct.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 7761eefd203..bbf7da89658 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -800,6 +800,10 @@ Wpacked-not-aligned
 C ObjC C++ ObjC++ Var(warn_packed_not_aligned) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn when fields in a struct with the packed attribute are misaligned.
 
+Wrange-loop-construct
+C++ ObjC++ Var(warn_range_loop_construct) Warning LangEnabledBy(C++ ObjC++,Wall)
+Warn when a range-based for-loop is creating unnecessary copies.
+
 Wredundant-tags
 C++ ObjC++ Var(warn_redundant_tags) Warning
 Warn when a class or enumerated type is referenced using a redundant class-key.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index fba3fcc0c4c..d233279ac62 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12646,6 +12646,73 @@ do_range_for_auto_deduction (tree decl, tree 
range_expr)
  }
  }
  
+/* Warns when the loop variable should be changed to a reference type to
+   avoid unnecessary copying.  I.e., from
+
+ for (const auto x : range)
+
+   where range returns a reference, to
+
+ for (const auto &x : range)
+
+   if this version doesn't make a copy.  DECL is the RANGE_DECL; EXPR is the
+   *__for_begin expression.
+   This function is never called when processing_template_decl is on.  */
+
+static void
+warn_for_range_copy (tree decl, tree expr)
+{
+  if (!warn_range_loop_construct
+  || decl == error_mark_node)
+return;
+
+  location_t loc = DECL_SOURCE_LOCATION (decl);
+  tree type = TREE_TYPE (decl);
+
+  if (from_macro_expansion_at (loc))
+return;
+
+  if (TYPE_REF_P (type))
+{
+  /* TODO: Implement reference warnings.  */
+  return;
+}
+  else if (!CP_TYPE_CONST_P (type))
+return;
+
+  /* Since small trivially copyable types are cheap to copy, we suppress the
+ warning for them.  64B is a common size of a cache line.  */
+  if (TREE_CODE (TYPE_SIZE_UNIT (type)) != INTEGER_CST
+  || (tree_to_uhwi (TYPE_SIZE_UNIT (type)) <= 64
+ && trivially_copyable_p (type)))
+return;
+
+  tree rtype = cp_build_reference_type (type, /*rval*/false);
+  /* See what it would take to convert the expr if we used a reference.  */
+  expr = perform_implicit_conversion (rtype, expr, tf_none);
+  if (!TREE_SIDE_EFFECTS (expr))
+/* No calls/TARGET_EXPRs.  */;
+  else
+{
+  /* If we could initialize the reference directly from the call, it
+wouldn't involve any copies.  */
+  STRIP_NOPS (expr);
+  if (TREE_CODE (expr) != CALL_

Re: [PATCH] c++: Fix up default initialization with consteval default ctor [PR96994]

2020-09-25 Thread Jason Merrill via Gcc-patches

On 9/15/20 3:57 AM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled (in particular the a and i
initialization).  The problem is that build_special_member_call due to
the immediate constructors (but not evaluated in constant expression mode)
doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
as the initializer for it,


That seems like the bug; at the end of build_over_call, after you


   call = cxx_constant_value (call, obj_arg);


You need to build an INIT_EXPR if obj_arg isn't a dummy.
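Something along these lines, as a rough, untested sketch (is_dummy_object and
build2 are existing helpers; the exact placement may need adjusting):

  call = cxx_constant_value (call, obj_arg);
  /* If there is a real object argument, turn the folded constant result
     into an initialization of that object instead of returning the bare
     CONSTRUCTOR/TARGET_EXPR.  */
  if (obj_arg && !is_dummy_object (obj_arg))
    call = build2 (INIT_EXPR, TREE_TYPE (obj_arg), obj_arg, call);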

Jason


