RE: [PATCH] arm: Auto-vectorization for MVE: vorn

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: 02 February 2021 07:19
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] arm: Auto-vectorization for MVE: vorn
> 
> On Mon, 1 Feb 2021 at 10:08, Kyrylo Tkachov 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Christophe Lyon 
> > > Sent: 29 January 2021 18:18
> > > To: Kyrylo Tkachov 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] arm: Auto-vectorization for MVE: vorn
> > >
> > > On Fri, 29 Jan 2021 at 16:03, Kyrylo Tkachov 
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Gcc-patches  On Behalf
> Of
> > > > > Christophe Lyon via Gcc-patches
> > > > > Sent: 29 January 2021 14:55
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Subject: [PATCH] arm: Auto-vectorization for MVE: vorn
> > > > >
> > > > > This patch enables MVE vornq instructions for auto-vectorization.
> MVE
> > > > > vornq insns in mve.md are modified to use ior instead of unspec
> > > > > expression to support ior3.  The ior3 expander is
> added
> > > to
> > > > > vec-common.md
> > > > >
> > > > > 2021-01-29  Christophe Lyon  
> > > > >
> > > > >   gcc/
> > > > >   * config/arm/iterators.md (supf): Remove VORNQ_S and
> VORNQ_U.
> > > > >   (VORNQ): Remove.
> > > > >   * config/arm/mve.md (mve_vornq_s): New entry for vorn
> > > > >   instruction using expression ior.
> > > > >   (mve_vornq_u): New expander.
> > > > >   (mve_vornq_f): Use ior code instead of unspec.
> > > > >   * config/arm/unspecs.md (VORNQ_S, VORNQ_U, VORNQ_F):
> > > > > Remove.
> > > > >   * config/arm/vec-common.md (orn3): New expander.
> > > > >
> > > > >   gcc/testsuite/
> > > > >   * gcc.target/arm/simd/mve-vorn.c: Add vorn tests.
> > > > > ---
> > > > >  gcc/config/arm/iterators.md  |  3 +--
> > > > >  gcc/config/arm/mve.md| 23 +++--
> > > > >  gcc/config/arm/unspecs.md|  3 ---
> > > > >  gcc/config/arm/vec-common.md |  8 ++
> > > > >  gcc/testsuite/gcc.target/arm/simd/mve-vorn.c | 38
> > > > > 
> > > > >  5 files changed, 62 insertions(+), 13 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vorn.c
> > > > >
> > > > > diff --git a/gcc/config/arm/iterators.md
> b/gcc/config/arm/iterators.md
> > > > > index b902790..43aab23 100644
> > > > > --- a/gcc/config/arm/iterators.md
> > > > > +++ b/gcc/config/arm/iterators.md
> > > > > @@ -1293,7 +1293,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s")
> > > > > (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
> > > > >  (VMULLBQ_INT_S "s") (VMULLBQ_INT_U "u")
> > > > > (VQADDQ_S "s")
> > > > >  (VMULLTQ_INT_S "s") (VMULLTQ_INT_U "u")
> > > > > (VQADDQ_U "u")
> > > > >  (VMULQ_N_S "s") (VMULQ_N_U "u") (VMULQ_S "s")
> > > > > -(VMULQ_U "u") (VORNQ_S "s") (VORNQ_U "u")
> > > > > +(VMULQ_U "u")
> > > > >  (VQADDQ_N_S "s") (VQADDQ_N_U "u")
> > > > >  (VQRSHLQ_N_S "s") (VQRSHLQ_N_U "u") (VQRSHLQ_S
> > > > > "s")
> > > > >  (VQRSHLQ_U "u") (VQSHLQ_N_S "s")
> > > > >   (VQSHLQ_N_U "u")
> > > > > @@ -1563,7 +1563,6 @@ (define_int_iterator VMULLBQ_INT
> > > > > [VMULLBQ_INT_U VMULLBQ_INT_S])
> > > > >  (define_int_iterator VMULLTQ_INT [VMULLTQ_INT_U
> VMULLTQ_INT_S])
> > > > >  (define_int_iterator VMULQ [VMULQ_U VMULQ_S])
> > > > >  (define_int_iterator VMULQ_N [VMULQ_N_U VMULQ_N_S])
> > > > > -(define_int_iterator VORNQ [VORNQ_U VORNQ_S])
> > > > >  (define_int_iterator VQADDQ [VQADDQ_U VQADDQ_S])
> > > > >  (define_int_iterator VQADDQ_N [VQADDQ_N_S VQADDQ_N_U])
> > > > >  (define_int_iterator VQRSHLQ [VQRSHLQ_S VQRSHLQ_U])
> > > > > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > > > > index 465f71c..ec0ef7b 100644
> > > > > --- a/gcc/config/arm/mve.md
> > > > > +++ b/gcc/config/arm/mve.md
> > > > > @@ -1634,18 +1634,26 @@ (define_insn "mve_vmulq"
> > > > >  ;;
> > > > >  ;; [vornq_u, vornq_s])
> > > > >  ;;
> > > > > -(define_insn "mve_vornq_"
> > > > > +(define_insn "mve_vornq_s"
> > > > >[
> > > > > (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> > > > > - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand"
> "w")
> > > > > -(match_operand:MVE_2 2 "s_register_operand" "w")]
> > > > > -  VORNQ))
> > > > > + (ior:MVE_2 (not:MVE_2 (match_operand:MVE_2 2
> > > > > "s_register_operand" "w"))
> > > > > +(match_operand:MVE_2 1 "s_register_operand" "w")))
> > > > >]
> > > > >"TARGET_HAVE_MVE"
> > > > > -  "vorn %q0, %q1, %q2"
> > > > > +   "vorn\t%q0, %q1, %q2"
> > > > >[(set_attr "type" "mve_move")
> > > > >  ])
> > > > >
> > > > > +(define_expand "mve_vornq_u"
> > > > > +  [
> > > > > +   (set (match_operand:MVE_2 0 "s_register_operand")
> > > > > 

Re: [PATCH v2] PR target/98743: Fix ICE in convert_move for RISC-V

2021-02-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 03:21:24PM +0800, Kito Cheng wrote:
>  - Check `from` mode is not BLMmode before call store_expr, calling store_expr
>with BLKmode will cause ICE.
> 
>  - Verified with riscv64, x86_64 and aarch64, no introduce new regression.
> 
> Note: Those logic was introduced by 3e60ddeb8220ed388819bb3f14e8caa9309fd3c2,
>   so I cc Jakub for reivew.
> 
> Changes for V2:
> 
>  - Checking mode of `from` rather than mode of `to`.
>  - Verified on riscv64, x86_64 and aarch64 again.
> 
> gcc/ChangeLog:
> 
>   PR target/98743
>   * expr.c: Check mode before calling store_expr.

Ok, with a small nit.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/98743
>   * g++.target/riscv/pr98743.C: New.

There is nothing riscv specific on the testcase, so it should go into
g++.dg/opt/ instead.

Jakub



Re: [PATCH] rtl-optimization/80960 - avoid creating garbage RTL in DSE

2021-02-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 08:51:07AM +0100, Richard Biener wrote:
> > Shouldn't we gather statistics from larger codebase first and perhaps
> > compare against tree-ssa-dse statistics?  I mean, in many functions there
> > are no DSE opportunities at all.
> 
> Of course.  Some DSE will definitely be required because we expose
> ABI details only on RTL and expand sometimes is quite stupid.  ISTR
> either DCE or CSE performs some limited amount of DSE as well?
> 
> The most needed and interesting work will be to disentangle RTL expansion
> into the "complex" bits to be done on (lowered) GIMPLE and the
> mechanical detail of GIMPLE to RTL one instruction at a time.  I guess
> only during this work we'll learn what we need in lowered GIMPLE.

BTW, more complete RTL DSE statistics (from x86_64-linux and i686-linux
bootstraps/regtests together) are below (columns are number of occurences of
specific local and global deletion pairs, local deletions, global
deletions), so it is true RTL DSE only removed something in 1.6% of
functions during those.  But RTL DSE is also doing the SCCVN-like
optimization of optimizing memory reads into constants if those constants
were stored before and that isn't tracked in this.

5908970 0 0
  20144 1 0
  14162 0 3
  11413 0 1
  10862 2 0
   5268 0 2
   5256 3 0
   4924 0 6
   3506 4 0
   2838 0 4
   1734 0 9
   1621 5 0
   1449 0 5
   1014 0 8
856 0 12
754 0 7
729 6 0
487 7 0
464 8 0
422 14 0
412 9 0
381 0 15
373 0 10
297 1 1
259 0 18
248 2 2
245 0 11
237 3 3
230 12 0
220 10 0
168 0 16
144 11 0
138 0 14
133 4 6
133 0 13
123 0 24
115 15 0
114 0 21
 94 0 20
 91 0 17
 88 1 4
 84 1 2
 75 16 0
 74 17 0
 71 0 27
 69 0 30
 55 2 1
 50 4 1
 50 1 9
 50 1 3
 46 0 36
 44 0 33
 44 0 25
 42 2 3
 41 18 0
 40 5 3
 39 34 0
 36 13 0
 35 2 18
 35 0 19
 34 3 1
 33 1 7
 31 1 5
 31 0 48
 31 0 23
 30 4 3
 30 1 6
 30 0 22
 28 0 29
 28 0 26
 26 0 28
 25 24 0
 24 1 18
 22 2 6
 21 0 32
 20 4 4
 20 3 2
 19 4 2
 18 0 40
 17 39 0
 17 22 0
 16 5 9
 16 5 5
 15 129 0
 15 0 66
 15 0 31
 14 20 0
 14 19 0
 13 6 3
 12 1 8
 12 0 42
 11 32 0
 11 2 4
 10 4 9
 10 42 0
 10 0 44
  9 3 4
  9 30 0
  9 259 0
  9 0 57
  9 0 39
  8 4 8
  8 46 0
  8 31 0
  8 25 35
  8 1 15
  8 0 56
  8 0 54
  8 0 45
  8 0 43
  8 0 38
  7 3 8
  7 35 0
  7 33 0
  7 130 0
  7 1 24
  7 12 1
  7 1 14
  7 0 60
  6 9 3
  6 5 6
  6 5 4
  6 5 1
  6 4 49
  6 3 13
  6 15 9
  6 12 6
  6 0 84
  6 0 41
  5 6 12
  5 44 0
  5 27 0
  5 2 7
  5 25 0
  5 23 0
  5 14 12
  5 12 2
  5 1 21
  5 10 2
  5 0 35
  5 0 150
  4 86 0
  4 7 6
  4 69 0
  4 6 8
  4 56 0
  4 45 0
  4 4 16
  4 29 0
  4 266 0
  4 263 0
  4 258 0
  4 2 22
  4 12 4
  4 1 12
  4 0 83
  4 0 61
  4 0 51
  4 0 34
  3 8 9
  3 8 6
  3 6 1
  3 55 0
  3 5 10
  3 49 0
  3 4 77
  3 4 5
  3 4 18
  3 4 13
  3 2 9
  3 28 0
  3 2 38
  3 2 12
  3 14 6
  3 1 13
  3 0 88
  3 0 86
  3 0 75
  3 0 69
  3 0 198
  3 0 100
  2 90 0
  2 6 39
  2 6 21
  2 53 0
  2 5 2
  2 512 0
  2 5 11
  2 50 0
  2 4 7
  2 4 33
  2 4 10
  2 40 0
  2 3 9
  2 3 6
  2 33 1
  2 2 8
  2 262 0
  2 26 0
  2 252 0
  2 24 12
  2 24 1
  2 23 29
  2 22 22
  2 21 5
  2 21 0
  2 16 5
  2 14 18
  2 128 0
  2 12 5
  2 1 20
  2 11 40
  2 1 11
  2 0 91
  2 0 37
  2 0 158
  2 0 136
  2 0 128
  2 0 104
  2 0 103
  1 9 6
  1 9 10
  1 8 7
  1 8 2
  1 8 16
  1 8 15
  1 81 0
  1 8 1
  1 7 5
  1 74 0
  1 73 6
  1 7 12
  1 7 1
  1 6 7
  1 54 0
  1 52 0
  1 48 0
  1 36 0
  1 3 5
  1 31 3
  1 2 5
  1 2 21
  1 2 13
  1 21 18
  1 2 10
  1 19 20
  1 19 2
  1 18 12
  1 17 10
  1 16 1
  1 14 9
  1 145 0
  1 14 15
  1 13 1
  1 12 8
  1 11 11
  1 1 10
  1 10 7
  1 10 11
  1 10 1
  1 0 89
  1 0 80
  1 0 70
  1 0 67
  1 0 59
  1 0 55
  1 0 49
  1 0 244
  1 0 148
  1 0 140

Jakub



Re: [PATCH v2] PR target/98743: Fix ICE in convert_move for RISC-V

2021-02-02 Thread Kito Cheng
Hi Jakub:

Thanks for your review, committed with testcase movement.

On Tue, Feb 2, 2021 at 4:41 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 02, 2021 at 03:21:24PM +0800, Kito Cheng wrote:
> >  - Check `from` mode is not BLMmode before call store_expr, calling 
> > store_expr
> >with BLKmode will cause ICE.
> >
> >  - Verified with riscv64, x86_64 and aarch64, no introduce new regression.
> >
> > Note: Those logic was introduced by 
> > 3e60ddeb8220ed388819bb3f14e8caa9309fd3c2,
> >   so I cc Jakub for reivew.
> >
> > Changes for V2:
> >
> >  - Checking mode of `from` rather than mode of `to`.
> >  - Verified on riscv64, x86_64 and aarch64 again.
> >
> > gcc/ChangeLog:
> >
> >   PR target/98743
> >   * expr.c: Check mode before calling store_expr.
>
> Ok, with a small nit.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/98743
> >   * g++.target/riscv/pr98743.C: New.
>
> There is nothing riscv specific on the testcase, so it should go into
> g++.dg/opt/ instead.
>
> Jakub
>


[committed] testsuite: Add testcase for already fixed PR [PR97960]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
Hi!

This testcase has been fixed by
r11-5904-g4cf70c20cb10acd6fb1016611d05540728176b60
so I'm checking it in so that we can close the PR.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2021-02-02  Jakub Jelinek  

PR tree-optimization/97960
* g++.dg/torture/pr97960.C: New test.

--- gcc/testsuite/g++.dg/torture/pr97960.C.jj   2021-02-01 19:00:07.912833437 
+0100
+++ gcc/testsuite/g++.dg/torture/pr97960.C  2021-02-01 18:57:04.616927413 
+0100
@@ -0,0 +1,30 @@
+// PR tree-optimization/97960
+// { dg-do run }
+
+#if __CHAR_BIT__ == 8 && __SIZEOF_INT__ == 4
+const int &
+foo (const int &d, const int &f)
+{
+  if (d < f)
+return f;
+  return d;
+}
+
+short a[575];
+unsigned b[25];
+unsigned char g;
+#endif
+
+int
+main ()
+{
+#if __CHAR_BIT__ == 8 && __SIZEOF_INT__ == 4
+  for (int e = 0; e < 23; ++e)
+a[e * 23] = 16137;
+  for (signed char h = (unsigned char) (foo (g, 253) + 3); h < 24; h++)
+b[h] = 1064739102;
+  for (int e = 0; e < 23; ++e)
+if (a[e * 23] != 16137)
+  __builtin_abort ();
+#endif
+}


Jakub



[PATCH] tree-vect-patterns: Don't create over widening patterns for stmts used in reductions [PR98848]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
Hi!

As discussed in the PR, the reduction code isn't able to cope with type
promotions/demotions in the reduction computation, so if we recognize an
over-widening pattern that has vect_reduction_def type, we most likely make
it non-vectorizable.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
turnk?

2021-02-02  Jakub Jelinek  

PR tree-optimization/98848
* tree-vect-patterns.c (vect_recog_over_widening_pattern): Punt if
STMT_VINFO_DEF_TYPE (last_stmt_info) is vect_reduction_def.

* gcc.dg/vect/pr98848.c: New test.
* gcc.dg/vect/pr92205.c: Remove xfail.

--- gcc/tree-vect-patterns.c.jj 2021-01-04 10:25:38.650235896 +0100
+++ gcc/tree-vect-patterns.c2021-02-01 11:04:48.136814236 +0100
@@ -1579,6 +1579,10 @@ vect_recog_over_widening_pattern (vec_in
   tree type = TREE_TYPE (lhs);
   tree_code code = gimple_assign_rhs_code (last_stmt);
 
+  /* Punt for reductions where we don't handle the type conversions.  */
+  if (STMT_VINFO_DEF_TYPE (last_stmt_info) == vect_reduction_def)
+return NULL;
+
   /* Keep the first operand of a COND_EXPR as-is: only the other two
  operands are interesting.  */
   unsigned int first_op = (code == COND_EXPR ? 2 : 1);
--- gcc/testsuite/gcc.dg/vect/pr98848.c.jj  2021-02-01 11:15:11.965633750 
+0100
+++ gcc/testsuite/gcc.dg/vect/pr98848.c 2021-02-01 11:15:07.348686891 +0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/98848 */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+short a[9000];
+
+int
+foo (void)
+{ 
+  int b = a[0];
+  int i;
+  for (i = 1; i < 9000; i ++)
+if (a[i] < b)
+  b = a[i];
+  return b;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { xfail { 
vect_no_int_add || vect_no_int_min_max } } } } */
--- gcc/testsuite/gcc.dg/vect/pr92205.c.jj  2020-01-12 11:54:37.645394821 
+0100
+++ gcc/testsuite/gcc.dg/vect/pr92205.c 2021-02-02 09:59:40.998104863 +0100
@@ -10,4 +10,4 @@ int b(int n, unsigned char *a)
   return d;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
vect_unpack && { ! vect_no_bitwise } } } } } */

Jakub



[PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs, because after the veclower pass which is the
last point which lowers unsupported vector operations to supported ones
(or scalars) match.pd simplifies a supported vector operation into
unsupported one (vec << 1 >> 1 into vec & { ~1, ... }).
Guarding each match.pd pattern that could do it manually would be IMHO a
nightmare and it could affect hundreds of them, so this patch instead
performs the verification and punting in the infrastructure (of course only
in PROP_gimple_lvec state which is set after the vector lowering).
The function attempts to match expand_vector_operations_1 (except I haven't
added VEC_PERM_EXPR, VEC_COND_EXPR and COND_EXPR cases there, so those
aren't checked and can be added later if we find it a problem).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-02  Jakub Jelinek  

PR tree-optimization/98287
* gimple-match.h (check_gimple_lvec): New declaration.
* gimple-match-head.c (check_gimple_lvec): New function.
(maybe_push_res_to_seq): Use it.
* gimple-fold.c: Include tree-pass.h.
(replace_stmt_with_simplification): Use it.

* gcc.dg/pr98287.c: New test.

--- gcc/gimple-match.h.jj   2021-01-04 10:25:38.456238093 +0100
+++ gcc/gimple-match.h  2021-02-01 13:19:30.393860766 +0100
@@ -335,6 +335,7 @@ extern tree (*mprts_hook) (gimple_match_
 
 bool gimple_simplify (gimple *, gimple_match_op *, gimple_seq *,
  tree (*)(tree), tree (*)(tree));
+bool check_gimple_lvec (gimple_match_op *);
 tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
tree res = NULL_TREE);
 void maybe_build_generic_op (gimple_match_op *);
--- gcc/gimple-match-head.c.jj  2021-01-16 19:46:53.511672837 +0100
+++ gcc/gimple-match-head.c 2021-02-01 13:18:58.676225427 +0100
@@ -549,6 +549,109 @@ build_call_internal (internal_fn fn, gim
 res_op->op_or_null (4));
 }
 
+/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return
+   true if RES_OP is not appropriate - would require vector operations that
+   would need to be lowered before expansion.  */
+
+bool
+check_gimple_lvec (gimple_match_op *res_op)
+{
+  enum tree_code code = res_op->code;
+  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (code);
+  if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
+return false;
+
+  tree rhs1 = res_op->op_or_null (0);
+  tree rhs2 = res_op->op_or_null (1);
+  tree type = res_op->type;
+
+  if (!VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
+return false;
+
+  /* A scalar operation pretending to be a vector one.  */
+  if (VECTOR_BOOLEAN_TYPE_P (type)
+  && !VECTOR_MODE_P (TYPE_MODE (type))
+  && TYPE_MODE (type) != BLKmode
+  && (TREE_CODE_CLASS (code) != tcc_comparison
+ || (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1))
+ && !VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (rhs1)))
+ && TYPE_MODE (TREE_TYPE (rhs1)) != BLKmode)))
+return false;
+
+  machine_mode mode = TYPE_MODE (type);
+  optab op;
+  switch (code)
+{
+CASE_CONVERT:
+case FLOAT_EXPR:
+case FIX_TRUNC_EXPR:
+case VIEW_CONVERT_EXPR:
+  return false;
+
+case VEC_UNPACK_FLOAT_HI_EXPR:
+case VEC_UNPACK_FLOAT_LO_EXPR:
+case VEC_PACK_FLOAT_EXPR:
+  return false;
+
+case WIDEN_SUM_EXPR:
+case VEC_WIDEN_PLUS_HI_EXPR:
+case VEC_WIDEN_PLUS_LO_EXPR:
+case VEC_WIDEN_MINUS_HI_EXPR:
+case VEC_WIDEN_MINUS_LO_EXPR:
+case VEC_WIDEN_MULT_HI_EXPR:
+case VEC_WIDEN_MULT_LO_EXPR:
+case VEC_WIDEN_MULT_EVEN_EXPR:
+case VEC_WIDEN_MULT_ODD_EXPR:
+case VEC_UNPACK_HI_EXPR:
+case VEC_UNPACK_LO_EXPR:
+case VEC_UNPACK_FIX_TRUNC_HI_EXPR:
+case VEC_UNPACK_FIX_TRUNC_LO_EXPR:
+case VEC_PACK_TRUNC_EXPR:
+case VEC_PACK_SAT_EXPR:
+case VEC_PACK_FIX_TRUNC_EXPR:
+case VEC_WIDEN_LSHIFT_HI_EXPR:
+case VEC_WIDEN_LSHIFT_LO_EXPR:
+  return false;
+
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
+case LROTATE_EXPR:
+case RROTATE_EXPR:
+  if (!VECTOR_MODE_P (mode))
+   return true;
+  if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
+   {
+ op = optab_for_tree_code (code, type, optab_scalar);
+ if (op && optab_handler (op, mode) != CODE_FOR_nothing)
+   return false;
+   }
+  op = optab_for_tree_code (code, type, optab_vector);
+  if (op && optab_handler (op, mode) != CODE_FOR_nothing)
+   return false;
+  return true;
+
+case MULT_HIGHPART_EXPR:
+  if (!VECTOR_MODE_P (mode)
+ || !can_mult_highpart_p (mode, TYPE_UNSIGNED (type)))
+   return true;
+  return false;
+
+default:
+  if (!VECTOR_MODE_P (mode))
+   return true;
+  op = optab_for_tree_code (code, type, optab_default);
+  if (op == unknown_optab
+ && code == NEGATE_EXPR
+ && INTEGRAL_TYPE_P (TREE_TYPE (type)))
+  

Re: [PATCH] tree-vect-patterns: Don't create over widening patterns for stmts used in reductions [PR98848]

2021-02-02 Thread Richard Biener
On Tue, 2 Feb 2021, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, the reduction code isn't able to cope with type
> promotions/demotions in the reduction computation, so if we recognize an
> over-widening pattern that has vect_reduction_def type, we most likely make
> it non-vectorizable.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> turnk?

OK.

Thanks,
Richard.

> 2021-02-02  Jakub Jelinek  
> 
>   PR tree-optimization/98848
>   * tree-vect-patterns.c (vect_recog_over_widening_pattern): Punt if
>   STMT_VINFO_DEF_TYPE (last_stmt_info) is vect_reduction_def.
> 
>   * gcc.dg/vect/pr98848.c: New test.
>   * gcc.dg/vect/pr92205.c: Remove xfail.
> 
> --- gcc/tree-vect-patterns.c.jj   2021-01-04 10:25:38.650235896 +0100
> +++ gcc/tree-vect-patterns.c  2021-02-01 11:04:48.136814236 +0100
> @@ -1579,6 +1579,10 @@ vect_recog_over_widening_pattern (vec_in
>tree type = TREE_TYPE (lhs);
>tree_code code = gimple_assign_rhs_code (last_stmt);
>  
> +  /* Punt for reductions where we don't handle the type conversions.  */
> +  if (STMT_VINFO_DEF_TYPE (last_stmt_info) == vect_reduction_def)
> +return NULL;
> +
>/* Keep the first operand of a COND_EXPR as-is: only the other two
>   operands are interesting.  */
>unsigned int first_op = (code == COND_EXPR ? 2 : 1);
> --- gcc/testsuite/gcc.dg/vect/pr98848.c.jj2021-02-01 11:15:11.965633750 
> +0100
> +++ gcc/testsuite/gcc.dg/vect/pr98848.c   2021-02-01 11:15:07.348686891 
> +0100
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/98848 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +short a[9000];
> +
> +int
> +foo (void)
> +{ 
> +  int b = a[0];
> +  int i;
> +  for (i = 1; i < 9000; i ++)
> +if (a[i] < b)
> +  b = a[i];
> +  return b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { xfail { 
> vect_no_int_add || vect_no_int_min_max } } } } */
> --- gcc/testsuite/gcc.dg/vect/pr92205.c.jj2020-01-12 11:54:37.645394821 
> +0100
> +++ gcc/testsuite/gcc.dg/vect/pr92205.c   2021-02-02 09:59:40.998104863 
> +0100
> @@ -10,4 +10,4 @@ int b(int n, unsigned char *a)
>return d;
>  }
>  
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } 
> } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { 
> vect_unpack && { ! vect_no_bitwise } } } } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH] lra-constraints: Fix error-recovery for bad inline-asms [PR97971]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase has ice-on-invalid, it can't be reloaded, but we
shouldn't ICE the compiler because the user typed non-sense.

In current_insn_transform we have:
  if (process_alt_operands (reused_alternative_num))
alt_p = true;
  
  if (check_only_p)
return ! alt_p || best_losers != 0;
  
  /* If insn is commutative (it's safe to exchange a certain pair of
 operands) then we need to try each alternative twice, the second
 time matching those two operands as if we had exchanged them.  To
 do this, really exchange them in operands.

 If we have just tried the alternatives the second time, return
 operands to normal and drop through.  */

  if (reused_alternative_num < 0 && commutative >= 0)
{
  curr_swapped = !curr_swapped;
  if (curr_swapped)
{
  swap_operands (commutative);
  goto try_swapped;
}
  else
swap_operands (commutative);
}

  if (! alt_p && ! sec_mem_p)
{
  /* No alternative works with reloads??  */
  if (INSN_CODE (curr_insn) >= 0)
fatal_insn ("unable to generate reloads for:", curr_insn);
  error_for_asm (curr_insn,
 "inconsistent operand constraints in an %");
  lra_asm_error_p = true;
...
and so handle inline asms there differently (and delete/nullify them after
this) - fatal_insn is only called for non-inline asm.
But in process_alt_operands we do:
/* Both the earlyclobber operand and conflicting operand
   cannot both be user defined hard registers.  */
if (HARD_REGISTER_P (operand_reg[i])
&& REG_USERVAR_P (operand_reg[i])
&& operand_reg[j] != NULL_RTX
&& HARD_REGISTER_P (operand_reg[j])
&& REG_USERVAR_P (operand_reg[j]))
  fatal_insn ("unable to generate reloads for "
  "impossible constraints:", curr_insn);
and thus ICE even for inline-asms.

I think it is inappropriate to delete/nullify the insn in
process_alt_operands, as it could be done e.g. in the check_only_p mode,
so this patch just returns false in that case, which results in the
caller have alt_p false, and as inline asm isn't simple move, sec_mem_p
will be also false (and it isn't commutative either), so for check_only_p
it will suggests to the callers it isn't ok and otherwise will emit
error and delete/nullify the inline asm insn.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-02  Jakub Jelinek  

PR middle-end/97971
* lra-constraints.c (process_alt_operands): For inline asm, don't call
fatal_insn, but instead return false.

* gcc.target/i386/pr97971.c: New test.

--- gcc/lra-constraints.c.jj2021-01-30 10:48:21.380667825 +0100
+++ gcc/lra-constraints.c   2021-02-01 14:20:53.053603804 +0100
@@ -3106,8 +3106,13 @@ process_alt_operands (int only_alternati
&& operand_reg[j] != NULL_RTX
&& HARD_REGISTER_P (operand_reg[j])
&& REG_USERVAR_P (operand_reg[j]))
- fatal_insn ("unable to generate reloads for "
- "impossible constraints:", curr_insn);
+ {
+   /* For asm, let curr_insn_transform diagnose it.  */
+   if (INSN_CODE (curr_insn) < 0)
+ return false;
+   fatal_insn ("unable to generate reloads for "
+   "impossible constraints:", curr_insn);
+ }
  }
  if (last_conflict_j < 0)
continue;
--- gcc/testsuite/gcc.target/i386/pr97971.c.jj  2021-02-01 14:24:26.159160855 
+0100
+++ gcc/testsuite/gcc.target/i386/pr97971.c 2021-02-01 14:25:10.586652128 
+0100
@@ -0,0 +1,12 @@
+/* PR middle-end/97971 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (void)
+{
+  register _Complex long a asm ("rax");
+  register int b asm ("rdx");
+  asm ("# %0 %1" : "=&r" (a), "=r" (b));   /* { dg-error "inconsistent 
operand constraints in an 'asm'" } */
+  return a;
+}

Jakub



[PATCH] ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
Hi!

As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in
some insns) at expand_binop or expand_unop and expects it to do something
(and then will check if it created valid insns and punts if not).
These functions in the end if the operands don't match try to
copy_to_mode_reg the operands, which does
if (!general_operand (x, VOIDmode))
  x = force_operand (x, temp);
but, force_operand is far from handling all possible RTLs, it will ICE for
all more unusual RTL codes.  Basically handles just simple arithmetic and
unary RTL operations if they have an optab and
expand_simple_binop/expand_simple_unop ICE on others.

The following patch fixes it by adding some operand verification (whether
there is a hope that copy_to_mode_reg will succeed on those).  It is added
both to noce_emit_move_insn (not needed for this exact testcase,
that function simply tries to recog the insn as is and if it fails,
handles some simple binop/unop cases; the patch performs the verification
of their operands) and noce_try_sign_mask.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-02  Jakub Jelinek  

PR middle-end/97487
* ifcvt.c (noce_can_force_operand): New function.
(noce_emit_move_insn): Use it.
(noce_try_sign_mask): Likewise.  Formatting fix.

* gcc.dg/pr97487-1.c: New test.
* gcc.dg/pr97487-2.c: New test.

--- gcc/ifcvt.c.jj  2021-01-04 10:25:39.663224426 +0100
+++ gcc/ifcvt.c 2021-02-01 16:57:35.483766472 +0100
@@ -885,6 +885,60 @@ noce_emit_store_flag (struct noce_if_inf
   || code == GEU || code == GTU), normalize);
 }
 
+/* Return true if X can be safely forced into a register by copy_to_mode_reg
+   / force_operand.  */
+
+static bool
+noce_can_force_operand (rtx x)
+{
+  if (general_operand (x, VOIDmode))
+return true;
+  if (SUBREG_P (x))
+{
+  if (!noce_can_force_operand (SUBREG_REG (x)))
+   return false;
+  return true;
+}
+  if (ARITHMETIC_P (x))
+{
+  if (!noce_can_force_operand (XEXP (x, 0))
+ || !noce_can_force_operand (XEXP (x, 1)))
+   return false;
+  switch (GET_CODE (x))
+   {
+   case MULT:
+   case DIV:
+   case MOD:
+   case UDIV:
+   case UMOD:
+ return true;
+   default:
+ return code_to_optab (GET_CODE (x));
+   }
+}
+  if (UNARY_P (x))
+{
+  if (!noce_can_force_operand (XEXP (x, 0)))
+   return false;
+  switch (GET_CODE (x))
+   {
+   case ZERO_EXTEND:
+   case SIGN_EXTEND:
+   case TRUNCATE:
+   case FLOAT_EXTEND:
+   case FLOAT_TRUNCATE:
+   case FIX:
+   case UNSIGNED_FIX:
+   case FLOAT:
+   case UNSIGNED_FLOAT:
+ return true;
+   default:
+ return code_to_optab (GET_CODE (x));
+   }
+}
+  return false;
+}
+
 /* Emit instruction to move an rtx, possibly into STRICT_LOW_PART.
X is the destination/target and Y is the value to copy.  */
 
@@ -943,7 +997,7 @@ noce_emit_move_insn (rtx x, rtx y)
{
case RTX_UNARY:
  ot = code_to_optab (GET_CODE (y));
- if (ot)
+ if (ot && noce_can_force_operand (XEXP (y, 0)))
{
  start_sequence ();
  target = expand_unop (GET_MODE (y), ot, XEXP (y, 0), x, 0);
@@ -960,7 +1014,9 @@ noce_emit_move_insn (rtx x, rtx y)
case RTX_BIN_ARITH:
case RTX_COMM_ARITH:
  ot = code_to_optab (GET_CODE (y));
- if (ot)
+ if (ot
+ && noce_can_force_operand (XEXP (y, 0))
+ && noce_can_force_operand (XEXP (y, 1)))
{
  start_sequence ();
  target = expand_binop (GET_MODE (y), ot,
@@ -2763,15 +2819,18 @@ noce_try_sign_mask (struct noce_if_info
  INSN_B which can happen for e.g. conditional stores to memory.  For the
  cost computation use the block TEST_BB where the evaluation will end up
  after the transformation.  */
-  t_unconditional =
-(t == if_info->b
- && (if_info->insn_b == NULL_RTX
-|| BLOCK_FOR_INSN (if_info->insn_b) == if_info->test_bb));
+  t_unconditional
+= (t == if_info->b
+   && (if_info->insn_b == NULL_RTX
+  || BLOCK_FOR_INSN (if_info->insn_b) == if_info->test_bb));
   if (!(t_unconditional
|| (set_src_cost (t, mode, if_info->speed_p)
< COSTS_N_INSNS (2
 return FALSE;
 
+  if (!noce_can_force_operand (t))
+return FALSE;
+
   start_sequence ();
   /* Use emit_store_flag to generate "m < 0 ? -1 : 0" instead of expanding
  "(signed) m >> 31" directly.  This benefits targets with specialized
--- gcc/testsuite/gcc.dg/pr97487-1.c.jj 2021-02-01 17:11:48.281063978 +0100
+++ gcc/testsuite/gcc.dg/pr97487-1.c2021-02-01 17:11:07.530527523 +0100
@@ -0,0 +1,9 @@
+/* PR middle-end/97487 */
+/* { dg-do compile } */
+/* { dg-options "-O2 --param max-rtl-if-

Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> Hi!
>
> The following testcase ICEs, because after the veclower pass which is the
> last point which lowers unsupported vector operations to supported ones
> (or scalars) match.pd simplifies a supported vector operation into
> unsupported one (vec << 1 >> 1 into vec & { ~1, ... }).
> Guarding each match.pd pattern that could do it manually would be IMHO a
> nightmare and it could affect hundreds of them, so this patch instead
> performs the verification and punting in the infrastructure (of course only
> in PROP_gimple_lvec state which is set after the vector lowering).
> The function attempts to match expand_vector_operations_1 (except I haven't
> added VEC_PERM_EXPR, VEC_COND_EXPR and COND_EXPR cases there, so those
> aren't checked and can be added later if we find it a problem).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-02-02  Jakub Jelinek  
>
>   PR tree-optimization/98287
>   * gimple-match.h (check_gimple_lvec): New declaration.
>   * gimple-match-head.c (check_gimple_lvec): New function.
>   (maybe_push_res_to_seq): Use it.
>   * gimple-fold.c: Include tree-pass.h.
>   (replace_stmt_with_simplification): Use it.
>
>   * gcc.dg/pr98287.c: New test.
>
> --- gcc/gimple-match.h.jj 2021-01-04 10:25:38.456238093 +0100
> +++ gcc/gimple-match.h2021-02-01 13:19:30.393860766 +0100
> @@ -335,6 +335,7 @@ extern tree (*mprts_hook) (gimple_match_
>  
>  bool gimple_simplify (gimple *, gimple_match_op *, gimple_seq *,
> tree (*)(tree), tree (*)(tree));
> +bool check_gimple_lvec (gimple_match_op *);
>  tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
>   tree res = NULL_TREE);
>  void maybe_build_generic_op (gimple_match_op *);
> --- gcc/gimple-match-head.c.jj2021-01-16 19:46:53.511672837 +0100
> +++ gcc/gimple-match-head.c   2021-02-01 13:18:58.676225427 +0100
> @@ -549,6 +549,109 @@ build_call_internal (internal_fn fn, gim
>res_op->op_or_null (4));
>  }
>  
> +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return
> +   true if RES_OP is not appropriate - would require vector operations that
> +   would need to be lowered before expansion.  */
> +
> +bool
> +check_gimple_lvec (gimple_match_op *res_op)

Guess this is personal preference, but the return value seems inverted.
I'd have expected true if something is OK and false if something isn't.

> +{
> +  enum tree_code code = res_op->code;
> +  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (code);
> +  if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
> +return false;
> +
> +  tree rhs1 = res_op->op_or_null (0);
> +  tree rhs2 = res_op->op_or_null (1);
> +  tree type = res_op->type;
> +
> +  if (!VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
> +return false;
> +
> +  /* A scalar operation pretending to be a vector one.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && !VECTOR_MODE_P (TYPE_MODE (type))
> +  && TYPE_MODE (type) != BLKmode
> +  && (TREE_CODE_CLASS (code) != tcc_comparison
> +   || (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1))
> +   && !VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (rhs1)))
> +   && TYPE_MODE (TREE_TYPE (rhs1)) != BLKmode)))
> +return false;
> +
> +  machine_mode mode = TYPE_MODE (type);
> +  optab op;
> +  switch (code)
> +{
> +CASE_CONVERT:
> +case FLOAT_EXPR:
> +case FIX_TRUNC_EXPR:
> +case VIEW_CONVERT_EXPR:
> +  return false;
> +
> +case VEC_UNPACK_FLOAT_HI_EXPR:
> +case VEC_UNPACK_FLOAT_LO_EXPR:
> +case VEC_PACK_FLOAT_EXPR:
> +  return false;
> +
> +case WIDEN_SUM_EXPR:
> +case VEC_WIDEN_PLUS_HI_EXPR:
> +case VEC_WIDEN_PLUS_LO_EXPR:
> +case VEC_WIDEN_MINUS_HI_EXPR:
> +case VEC_WIDEN_MINUS_LO_EXPR:
> +case VEC_WIDEN_MULT_HI_EXPR:
> +case VEC_WIDEN_MULT_LO_EXPR:
> +case VEC_WIDEN_MULT_EVEN_EXPR:
> +case VEC_WIDEN_MULT_ODD_EXPR:
> +case VEC_UNPACK_HI_EXPR:
> +case VEC_UNPACK_LO_EXPR:
> +case VEC_UNPACK_FIX_TRUNC_HI_EXPR:
> +case VEC_UNPACK_FIX_TRUNC_LO_EXPR:
> +case VEC_PACK_TRUNC_EXPR:
> +case VEC_PACK_SAT_EXPR:
> +case VEC_PACK_FIX_TRUNC_EXPR:
> +case VEC_WIDEN_LSHIFT_HI_EXPR:
> +case VEC_WIDEN_LSHIFT_LO_EXPR:
> +  return false;
> +
> +case LSHIFT_EXPR:
> +case RSHIFT_EXPR:
> +case LROTATE_EXPR:
> +case RROTATE_EXPR:
> +  if (!VECTOR_MODE_P (mode))
> + return true;
> +  if (!VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
> + {
> +   op = optab_for_tree_code (code, type, optab_scalar);
> +   if (op && optab_handler (op, mode) != CODE_FOR_nothing)
> + return false;
> + }
> +  op = optab_for_tree_code (code, type, optab_vector);
> +  if (op && optab_handler (op, mode) != CODE_FOR_nothing)
> + return false;
> +  return true;
> +
> +case MULT_HIGHP

Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 09:38:09AM +, Richard Sandiford wrote:
> > +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return
> > +   true if RES_OP is not appropriate - would require vector operations that
> > +   would need to be lowered before expansion.  */
> > +
> > +bool
> > +check_gimple_lvec (gimple_match_op *res_op)
> 
> Guess this is personal preference, but the return value seems inverted.
> I'd have expected true if something is OK and false if something isn't.

Sure, I can change that.

> > +default:
> > +  if (!VECTOR_MODE_P (mode))
> > +   return true;
> > +  op = optab_for_tree_code (code, type, optab_default);
> > +  if (op == unknown_optab
> > + && code == NEGATE_EXPR
> > + && INTEGRAL_TYPE_P (TREE_TYPE (type)))
> > +   op = optab_for_tree_code (MINUS_EXPR, type, optab_default);
> > +  if (op && optab_handler (op, mode) != CODE_FOR_nothing)
> > +   return false;
> > +  return true;
> 
> This doesn't look right for things like SVE comparisons, where the
> result is a VECTOR_BOOLEAN_TYPE_P with MODE_VECTOR_BOOL and where
> the inputs are something else.  I guess it might also affect FP
> comparisons on most targets.

So how does that get through expand_vector_operations_1 ?

I don't see there anything that would catch it until:
  if (compute_type == NULL_TREE)
compute_type = get_compute_type (code, op, type);
  if (compute_type == type)
return;
and the above is an attempt to do what get_compute_type does.

expand_vector_comparison indeed has one case:
  /* As seen in PR95830, we should not expand comparisons that are only
 feeding a VEC_COND_EXPR statement.  */
...
and if expand_vector_comparison returns NULL, nothing is lowered.
But we can't really look at immediate uses during the checking :(.

So do you suggest for now letting all comparisons get through?

Jakub



Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> On Tue, Feb 02, 2021 at 09:38:09AM +, Richard Sandiford wrote:
>> > +default:
>> > +  if (!VECTOR_MODE_P (mode))
>> > +  return true;
>> > +  op = optab_for_tree_code (code, type, optab_default);
>> > +  if (op == unknown_optab
>> > +&& code == NEGATE_EXPR
>> > +&& INTEGRAL_TYPE_P (TREE_TYPE (type)))
>> > +  op = optab_for_tree_code (MINUS_EXPR, type, optab_default);
>> > +  if (op && optab_handler (op, mode) != CODE_FOR_nothing)
>> > +  return false;
>> > +  return true;
>> 
>> This doesn't look right for things like SVE comparisons, where the
>> result is a VECTOR_BOOLEAN_TYPE_P with MODE_VECTOR_BOOL and where
>> the inputs are something else.  I guess it might also affect FP
>> comparisons on most targets.
>
> So how does that get through expand_vector_operations_1 ?
>
> I don't see there anything that would catch it until:
>   if (compute_type == NULL_TREE)
> compute_type = get_compute_type (code, op, type);
>   if (compute_type == type)
> return;
> and the above is an attempt to do what get_compute_type does.
>
> expand_vector_comparison indeed has one case:
>   /* As seen in PR95830, we should not expand comparisons that are only
>  feeding a VEC_COND_EXPR statement.  */
> ...
> and if expand_vector_comparison returns NULL, nothing is lowered.

Right.  But I think the main thing for SVE is the expand_vec_cmp_expr_p
test, which takes both the lhs and rhs types.  So…

> But we can't really look at immediate uses during the checking :(.
>
> So do you suggest for now letting all comparisons get through?

…I guess we could just use expand_vec_cmp_expr_p for the comparisons.

But in general, I think it would be good to avoid duplicating so
much of the tests.  At least the:

  if (CONVERT_EXPR_CODE_P (code)
  || code == FLOAT_EXPR
  || code == FIX_TRUNC_EXPR
  || code == VIEW_CONVERT_EXPR)
return;

  /* The signedness is determined from input argument.  */
  if (code == VEC_UNPACK_FLOAT_HI_EXPR
  || code == VEC_UNPACK_FLOAT_LO_EXPR
  || code == VEC_PACK_FLOAT_EXPR)
{
  /* We do not know how to scalarize those.  */
  return;
}

  /* For widening/narrowing vector operations, the relevant type is of the
 arguments, not the widened result.  VEC_UNPACK_FLOAT_*_EXPR is
 calculated in the same way above.  */
  if (code == WIDEN_SUM_EXPR
  || code == VEC_WIDEN_PLUS_HI_EXPR
  || code == VEC_WIDEN_PLUS_LO_EXPR
  || code == VEC_WIDEN_MINUS_HI_EXPR
  || code == VEC_WIDEN_MINUS_LO_EXPR
  || code == VEC_WIDEN_MULT_HI_EXPR
  || code == VEC_WIDEN_MULT_LO_EXPR
  || code == VEC_WIDEN_MULT_EVEN_EXPR
  || code == VEC_WIDEN_MULT_ODD_EXPR
  || code == VEC_UNPACK_HI_EXPR
  || code == VEC_UNPACK_LO_EXPR
  || code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
  || code == VEC_UNPACK_FIX_TRUNC_LO_EXPR
  || code == VEC_PACK_TRUNC_EXPR
  || code == VEC_PACK_SAT_EXPR
  || code == VEC_PACK_FIX_TRUNC_EXPR
  || code == VEC_WIDEN_LSHIFT_HI_EXPR
  || code == VEC_WIDEN_LSHIFT_LO_EXPR)
{
  /* We do not know how to scalarize those.  */
  return;
}

part seems like it could be split out into a predicate.

Maybe we could also split out the optab choice, e.g. returning the optab
if a split is possible and null otherwise.  Just a suggestion though.

Thanks,
Richard


RE: [backport gcc10, gcc9] Requet to backport PR97969

2021-02-02 Thread Przemyslaw Wirkus via Gcc-patches
> On 2021-01-18 7:50 a.m., Richard Biener wrote:
> > On Mon, 18 Jan 2021, Przemyslaw Wirkus wrote:
> >
> >> Hi all,
> >>
> >> Can we backport PR97969 patch to GCC 10 and (maybe) GCC 9 ?:
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969
> >>
> >> IMHO bug is severe and could land in GCC 10 and 9. Vladimir's original
> patch:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563322.html
> >> applies without changes to both gcc-10 and gcc-9.
> >>
> >> I've regression tested this patch on both gcc-10 and gcc-9 branched
> >> for
> >> x86_64 cross (arm-eabi target) and no issues.
> >>
> >> OK for gcc-10 and gcc-9 ?
> > I see two fallout PRs with a trivial search: PR98643 and PR98722.  LRA
> > patches quite easily trigger unexpected fallout unfortunately ...
> >
> Yes, I am agree.  We should wait until the new regressions are fixed.  I am
> going to work on this patch more to fix the new regressions. Although the
> basic idea of the original problem solution probably will stay the same.

I've retested series of three patches which are related to this PR:

19af25c0b3aa2a78b4d45d295359ec26cb9fc607 [PR98777]
79c57603602c4493b6baa1d47ed451e8f5e9c0f3 [PR98722]
34aa56af2547e1646c0f07b9b88b210ebdb2a9f5 [PR97969]

on top of gcc-10 branch.

Bootstrapped and regression tested on aarch64-linux-gnu machine and no issues.
Regression tested on x86_64 host (arm-eabi target) cross and no issues.

OK for gcc-10 ?

> >> PS: I can commit if approved.
> >>



Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Richard Biener
On Tue, 2 Feb 2021, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs, because after the veclower pass which is the
> last point which lowers unsupported vector operations to supported ones
> (or scalars) match.pd simplifies a supported vector operation into
> unsupported one (vec << 1 >> 1 into vec & { ~1, ... }).
> Guarding each match.pd pattern that could do it manually would be IMHO a
> nightmare and it could affect hundreds of them, so this patch instead
> performs the verification and punting in the infrastructure (of course only
> in PROP_gimple_lvec state which is set after the vector lowering).
> The function attempts to match expand_vector_operations_1 (except I haven't
> added VEC_PERM_EXPR, VEC_COND_EXPR and COND_EXPR cases there, so those
> aren't checked and can be added later if we find it a problem).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

So I fear this only covers parts of the paths simplifications can
end up used.  Now one question is whether we want to allow
"invalid" intermediate transforms if a final match makes it
"valid" again.  If not then doing the verification in
gimple-match.c itself might be preferable.  OTOH I think that
most patterns should verify themselves - it's also odd that
v << 1 >> 1 is supported but not v & vec-cst - we might want to
require some basic support from targets.  I realize the PR is
one of the odd V1mode cases which we might want to support better
during RTL expansion (or require targets to be more sane here).

So in the end I don't like the patch much, not at this stage
anyway.

Thanks,
Richard.

> 2021-02-02  Jakub Jelinek  
> 
>   PR tree-optimization/98287
>   * gimple-match.h (check_gimple_lvec): New declaration.
>   * gimple-match-head.c (check_gimple_lvec): New function.
>   (maybe_push_res_to_seq): Use it.
>   * gimple-fold.c: Include tree-pass.h.
>   (replace_stmt_with_simplification): Use it.
> 
>   * gcc.dg/pr98287.c: New test.
> 
> --- gcc/gimple-match.h.jj 2021-01-04 10:25:38.456238093 +0100
> +++ gcc/gimple-match.h2021-02-01 13:19:30.393860766 +0100
> @@ -335,6 +335,7 @@ extern tree (*mprts_hook) (gimple_match_
>  
>  bool gimple_simplify (gimple *, gimple_match_op *, gimple_seq *,
> tree (*)(tree), tree (*)(tree));
> +bool check_gimple_lvec (gimple_match_op *);
>  tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
>   tree res = NULL_TREE);
>  void maybe_build_generic_op (gimple_match_op *);
> --- gcc/gimple-match-head.c.jj2021-01-16 19:46:53.511672837 +0100
> +++ gcc/gimple-match-head.c   2021-02-01 13:18:58.676225427 +0100
> @@ -549,6 +549,109 @@ build_call_internal (internal_fn fn, gim
>res_op->op_or_null (4));
>  }
>  
> +/* In PROP_gimple_lvec mode, perform extra checking if RES_OP and return
> +   true if RES_OP is not appropriate - would require vector operations that
> +   would need to be lowered before expansion.  */
> +
> +bool
> +check_gimple_lvec (gimple_match_op *res_op)
> +{
> +  enum tree_code code = res_op->code;
> +  enum gimple_rhs_class rhs_class = get_gimple_rhs_class (code);
> +  if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
> +return false;
> +
> +  tree rhs1 = res_op->op_or_null (0);
> +  tree rhs2 = res_op->op_or_null (1);
> +  tree type = res_op->type;
> +
> +  if (!VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
> +return false;
> +
> +  /* A scalar operation pretending to be a vector one.  */
> +  if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && !VECTOR_MODE_P (TYPE_MODE (type))
> +  && TYPE_MODE (type) != BLKmode
> +  && (TREE_CODE_CLASS (code) != tcc_comparison
> +   || (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1))
> +   && !VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (rhs1)))
> +   && TYPE_MODE (TREE_TYPE (rhs1)) != BLKmode)))
> +return false;
> +
> +  machine_mode mode = TYPE_MODE (type);
> +  optab op;
> +  switch (code)
> +{
> +CASE_CONVERT:
> +case FLOAT_EXPR:
> +case FIX_TRUNC_EXPR:
> +case VIEW_CONVERT_EXPR:
> +  return false;
> +
> +case VEC_UNPACK_FLOAT_HI_EXPR:
> +case VEC_UNPACK_FLOAT_LO_EXPR:
> +case VEC_PACK_FLOAT_EXPR:
> +  return false;
> +
> +case WIDEN_SUM_EXPR:
> +case VEC_WIDEN_PLUS_HI_EXPR:
> +case VEC_WIDEN_PLUS_LO_EXPR:
> +case VEC_WIDEN_MINUS_HI_EXPR:
> +case VEC_WIDEN_MINUS_LO_EXPR:
> +case VEC_WIDEN_MULT_HI_EXPR:
> +case VEC_WIDEN_MULT_LO_EXPR:
> +case VEC_WIDEN_MULT_EVEN_EXPR:
> +case VEC_WIDEN_MULT_ODD_EXPR:
> +case VEC_UNPACK_HI_EXPR:
> +case VEC_UNPACK_LO_EXPR:
> +case VEC_UNPACK_FIX_TRUNC_HI_EXPR:
> +case VEC_UNPACK_FIX_TRUNC_LO_EXPR:
> +case VEC_PACK_TRUNC_EXPR:
> +case VEC_PACK_SAT_EXPR:
> +case VEC_PACK_FIX_TRUNC_EXPR:
> +case VEC_WIDEN_LSHIFT_HI_EXPR:
> +case VEC_WIDEN_LSHIFT_LO_EXPR:
> +  return false;
> +
> +case LSHIFT_E

RE: [backport gcc10, gcc9] Requet to backport PR97969

2021-02-02 Thread Richard Biener
On Tue, 2 Feb 2021, Przemyslaw Wirkus wrote:

> > On 2021-01-18 7:50 a.m., Richard Biener wrote:
> > > On Mon, 18 Jan 2021, Przemyslaw Wirkus wrote:
> > >
> > >> Hi all,
> > >>
> > >> Can we backport PR97969 patch to GCC 10 and (maybe) GCC 9 ?:
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969
> > >>
> > >> IMHO bug is severe and could land in GCC 10 and 9. Vladimir's original
> > patch:
> > >> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563322.html
> > >> applies without changes to both gcc-10 and gcc-9.
> > >>
> > >> I've regression tested this patch on both gcc-10 and gcc-9 branched
> > >> for
> > >> x86_64 cross (arm-eabi target) and no issues.
> > >>
> > >> OK for gcc-10 and gcc-9 ?
> > > I see two fallout PRs with a trivial search: PR98643 and PR98722.  LRA
> > > patches quite easily trigger unexpected fallout unfortunately ...
> > >
> > Yes, I am agree.  We should wait until the new regressions are fixed.  I 
> > am
> > going to work on this patch more to fix the new regressions. Although the
> > basic idea of the original problem solution probably will stay the same.
> 
> I've retested series of three patches which are related to this PR:
> 
> 19af25c0b3aa2a78b4d45d295359ec26cb9fc607 [PR98777]
> 79c57603602c4493b6baa1d47ed451e8f5e9c0f3 [PR98722]
> 34aa56af2547e1646c0f07b9b88b210ebdb2a9f5 [PR97969]
> 
> on top of gcc-10 branch.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu machine and no issues.
> Regression tested on x86_64 host (arm-eabi target) cross and no issues.
> 
> OK for gcc-10 ?

I think this warrants waiting until at least the GCC 11 release.

Richard.


Re: [committed] libstdc++: Update C++17 status table for

2021-02-02 Thread Jonathan Wakely via Gcc-patches

On 01/02/21 16:08 +, Jonathan Wakely wrote:

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2011.xml: Update std::call_once
status.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise. Update
std::from_chars and std::to_chars status. Fix formatting.
* doc/html/manual/status.html: Regenerate.


I noticed we have an unwanted extra column at the right of the
C++11/14/17 tables, due to a stray  element. Fixed by this
patch.

Committed to trunk.


commit 886f9f519c0c6297c95887839e318fa79cba2052
Author: Jonathan Wakely 
Date:   Tue Feb 2 09:55:52 2021

libstdc++: Fix markup for status tables in docs

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2011.xml: Remove stray table cell.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise.
* doc/html/manual/status.html: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index e13ca566ea3..be873962597 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -2174,7 +2174,6 @@ particular release.
   Class template regex_traits
   Partial
   transform_primary is not correctly implemented
-  
 
 
   28.8
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
index 7b2d4603b24..61bea5adad5 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
@@ -1142,7 +1142,6 @@ not in any particular release.
   Class template regex_traits
   Partial
   transform_primary is not correctly implemented
-  
 
 
   28.8
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 7b5df3d1385..aa34b8c3cf5 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -2224,7 +2224,6 @@ since C++14 and the implementation is complete.
   Class template regex_traits
   Partial
   transform_primary is not correctly implemented
-  
 
 
   31.8


[r11-7011 Regression] FAIL: g++.dg/cpp0x/alias-decl-dr1558.C -std=c++17 (test for excess errors) on Linux/x86_64

2021-02-02 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6e0a231a4aa2407bb7167daf98a37795a67364d8 is the first bad commit
commit 6e0a231a4aa2407bb7167daf98a37795a67364d8
Author: Jason Merrill 
Date:   Wed Jan 27 17:15:39 2021 -0500

c++: alias in qualified-id in template arg [PR98570]

caused

FAIL: g++.dg/modules/xtreme-header-1_a.H module-cmi  
(gcm.cache/$srcdir/g++.dg/modules/xtreme-header-1_a.H.gcm)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2a (internal compiler error)
FAIL: g++.dg/modules/xtreme-header-1_a.H -std=c++2a (test for excess errors)
FAIL: g++.dg/pr77550.C   (internal compiler error)
FAIL: g++.dg/pr77550.C   (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-7011/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="modules.exp=g++.dg/modules/xtreme-header-1_a.H 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr77550.C 
--target_board='unix{-m32}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)

2021-02-02 Thread Tobias Burnus

Hi all,

the attached patch now handles -fcoarray=single and access to the local
image like no coarray access, using the same check a before.

Actually, I think we could remove the if (..coarray..) check completely:
For 'single', it is like no coarrays; for 'lib' both AA[remote] =... and
"AA = ... AA[remove]" will create a function call and uses already
temporaries. I don't know what 'native'/'threads' does – but either it
the image index is different, then no temporary is needed at all – or it
is the same (in particular: this_image / not present), but then the
noncoarray analysis would work.

(Knowing that for i != this_image(), AA and AA[i] – or i /= j for AA[i]
and AA[j] – we could further relax the checks, but I don't think that
that's needed.)

Hence: Is the patch OK for the trunk? Or should we completely get rid of
if-conditional block?

Tobias

On 01.02.21 12:52, Tobias Burnus wrote:


On 01.02.21 08:42, Thomas Koenig via Fortran wrote:

I have a question with -Warray-temporaries in the test below.
With the AA coarray that warning appears in the first loop (on its
local part),
but not with the BB array with the same task, i.e. [...]

It seems that dependency checking does not detect that no overlap
can exist in that case, and so generates a temporary.  Probably,
dependency checking was not updated when coarrays were introduced.
This is a missed optimization, not a correctness issue.

I have now filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98913

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Fortran: Fix Array dependency with local coarrays [PR98913]

gcc/fortran/ChangeLog:

	PR fortran/98913
	* dependency.c (gfc_dep_resolver): Treat local access
	to coarrays like any array access in dependency analysis.

gcc/testsuite/ChangeLog:

	PR fortran/98913
	* gfortran.dg/coarray/array_temporary.f90: New test.

 gcc/fortran/dependency.c   | 11 +++-
 .../gfortran.dg/coarray/array_temporary.f90| 74 ++
 2 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c
index c9baca80cbc..0de5d093aab 100644
--- a/gcc/fortran/dependency.c
+++ b/gcc/fortran/dependency.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dependency.h"
 #include "constructor.h"
 #include "arith.h"
+#include "options.h"
 
 /* static declarations */
 /* Enums  */
@@ -2143,8 +2144,14 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse,
 
 	case REF_ARRAY:
 
-	  /* For now, treat all coarrays as dangerous.  */
-	  if (lref->u.ar.codimen || rref->u.ar.codimen)
+	  /* For now, treat all nonlocal coarrays as dangerous.  */
+	  if (flag_coarray != GFC_FCOARRAY_SINGLE
+	  && ((lref->u.ar.codimen
+		   && lref->u.ar.dimen_type[lref->u.ar.dimen]
+		  != DIMEN_THIS_IMAGE)
+		  || (rref->u.ar.codimen
+		  && lref->u.ar.dimen_type[lref->u.ar.dimen]
+			 != DIMEN_THIS_IMAGE)))
 	return 1;
 
 	  if (ref_same_as_full_array (lref, rref))
diff --git a/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90
new file mode 100644
index 000..86460a7c282
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-additional-options "-Warray-temporaries" }
+!
+! PR fortran/98913
+!
+! Contributed by Jorge D'Elia
+!
+! Did create an array temporary for local access to coarray
+! (but not for identical noncoarray use).
+!
+
+program test
+  implicit none
+  integer, parameter :: iin = kind (1) 
+  integer, parameter :: idp = kind (1.0d0) 
+  real(kind=idp), allocatable :: AA (:,:)[:]
+  real(kind=idp), allocatable :: BB (:,:)
+  real(kind=idp), allocatable :: UU (:)
+  integer (kind=iin) :: nn, n1, n2
+  integer (kind=iin) :: j, k, k1
+  !
+  nn =  5
+  n1 =  1
+  n2 = 10
+  !
+  allocate (AA (1:nn,n1:n2)[*])
+  allocate (BB (1:nn,n1:n2))
+  allocate (UU (1:nn))
+  !
+  k  = 1
+  k1 = k + 1
+  !
+  AA = 1.0_idp
+  BB = 1.0_idp
+  UU = 2.0_idp
+
+  ! AA - coarrays
+  ! No temporary needed:
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1-1:nn-1,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+
+  ! But:
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1-1:nn-1,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j)  ! { dg-warning "Creating array temporary" }
+  end do
+
+  ! BB - no coarrays
+  ! No temporary needed:
+  do  j = 1, nn
+BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j)  ! { dg-bogus "Creating a

[PATCH] aarch64: Relax builtin flags for integer builtins

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch relaxes the flags for most integer builtins to NONE as they don't 
read/write memory
and don't care about the FPCR/FPSR or exceptions so we should be more 
aggressive with them.

This leads to fallout in a testcase where the result of an intrinsic was unused 
and it is now
DCE'd. The testcase is adjusted.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (clrsb, clz, ctz, popcount,
vec_smult_lane_, vec_smlal_lane_, vec_smult_laneq_, vec_smlal_laneq_,
vec_umult_lane_, vec_umlal_lane_, vec_umult_laneq_, vec_umlal_laneq_,
ashl, sshl, ushl, srshl, urshl, sdot_lane, udot_lane, sdot_laneq,
udot_laneq, usdot_lane, usdot_laneq, sudot_lane, sudot_laneq, ashr,
ashr_simd, lshr, lshr_simd, srshr_n, urshr_n, ssra_n, usra_n, srsra_n,
ursra_n, sshll_n, ushll_n, sshll2_n, ushll2_n, ssri_n, usri_n, ssli_n,
ssli_n, usli_n, bswap, rbit, simd_bsl, eor3q, rax1q, xarq, bcaxq): Use
NONE builtin flags.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/arg-type-diagnostics-1.c: Return result from foo.


flag-int.patch
Description: flag-int.patch


[PATCH] aarch64: Relax flags for floating-point builtins to FP where appropriate

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch relaxes various floating-point builtins to use the FP flags to 
signify they
made use the FPCR or raise exceptions.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (fcmla_lane0, fcmla_lane90,
fcmla_lane180, fcmla_lane270, fcmlaq_lane0, fcmlaq_lane90, 
fcmlaq_lane180,
fcmlaq_lane270, scvtf, ucvtf, fcvtzs, fcvtzu, scvtfsi, scvtfdi, ucvtfsi,
ucvtfdi, fcvtzshf, fcvtzuhf, fmlal_lane_low, fmlsl_lane_low,
fmlal_laneq_low, fmlsl_laneq_low, fmlalq_lane_low, fmlslq_lane_low,
fmlalq_laneq_low, fmlslq_laneq_low, fmlal_lane_high, fmlsl_lane_high,
fmlal_laneq_high, fmlsl_laneq_high, fmlalq_lane_high, fmlslq_lane_high,
fmlalq_laneq_high, fmlslq_laneq_high): Use FP flags.


fp-flag.patch
Description: fp-flag.patch


[PATCH] aarch64: Relax some builtins to AUTO_FP

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch relaxes the flags for some builtins to AUTO_FP. These builtins do 
permutes
and similar, so they shouldn't get the FP flags when operating on 
floating-point modes
as they don't care about FPCR/FPSR and exceptions.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (combine, zip1, zip2, uzp1,
uzp2, trn1, trn2, simd_bsl): Use AUTO_FP flags.


autofp-flag.patch
Description: autofp-flag.patch


[PATCH] aarch64: Add and use FLAG_LOAD in builtins

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

We already have a STORE flag that we use for builtins. This patch introduces a 
LOAD set
that uses AUTO_FP and FLAG_READ_MEMORY. This allows for more aggressive 
optimisation of the load
intrinsics.

Turns out we have a great many testcases that do:
float16x4x2_t
f_vld2_lane_f16 (float16_t * p, float16x4x2_t v)
{
  float16x4x2_t res;
  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
  res = vld2_lane_f16 (p, v, 4);
  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
  res = vld2_lane_f16 (p, v, -1);
  return res;
}

but since the first res is unused it now gets eliminated early on before we get 
to give an error
message. Ideally we'd like to warn for both.
This patch takes the conservative approach and doesn't convert the load-lane 
builtins to LOAD ; that's something we can improve later.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (FLAG_LOAD): Define.
* config/aarch64/aarch64-simd-builtins.def (ld1x2, ld2, ld3, ld4, ld2r,
ld3r, ld4r, ld1, ld1x3, ld1x4): Use LOAD flags.


load-flag.patch
Description: load-flag.patch


[PATCH] aarch64: Update flags for bfloat16 builtins

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch updates the flags for the bfloat16 builtins.
The bfdot ones aren't affected by the FPCR/FPSR so can be AUTO_FP
whereas the bfmlal ones follow the normal floating-point instructions and get 
FP.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (bfdot_lane, bfdot_laneq): 
Use
AUTO_FP flags.
(bfmlalb_lane, bfmlalt_lane, bfmlalb_lane_q, bfmlalt_lane_q): Use FP 
flags.


bfloat-flag.patch
Description: bfloat-flag.patch


[Patch, fortran] PR98897 - Erroneous procedure attribute for associate name

2021-02-02 Thread Paul Richard Thomas via Gcc-patches
This is more or less 'obvious' and does not require any further explanation.

Regtests with FC33/x86_64 - OK for master (and )?

Paul

Fortran: Fix calls to associate name typebound subroutines [PR98897].

2021-02-02  Paul Thomas  

gcc/fortran
PR fortran/98897
* match.c (gfc_match_call): Include associate names as possible
entities with typebound subroutines. The target needs to be
resolved for the type.

gcc/testsuite/
PR fortran/98897
* gfortran.dg/typebound_call_32.f90: New test.
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index f0469e25da6..2df6191d7e6 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -4999,10 +4999,16 @@ gfc_match_call (void)
   sym = st->n.sym;
 
   /* If this is a variable of derived-type, it probably starts a type-bound
- procedure call.  */
-  if ((sym->attr.flavor != FL_PROCEDURE
-   || gfc_is_function_return_value (sym, gfc_current_ns))
-  && (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS))
+ procedure call. Associate variable targets have to be resolved for the
+ target type.  */
+  if (((sym->attr.flavor != FL_PROCEDURE
+	|| gfc_is_function_return_value (sym, gfc_current_ns))
+   && (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS))
+		||
+  (sym->assoc && sym->assoc->target
+   && gfc_resolve_expr (sym->assoc->target)
+   && (sym->assoc->target->ts.type == BT_DERIVED
+	   || sym->assoc->target->ts.type == BT_CLASS)))
 return match_typebound_call (st);
 
   /* If it does not seem to be callable (include functions so that the
! { dg-do run }
!
! Test the fix for PR98897 in which typebound subroutines of associate names
! were not recognised in a call. Functions were OK but this is tested below.
!
! Contributed by Damian Rouson  
!
module output_data_m
  implicit none

  type output_data_t
integer, private :: i = 0
  contains
procedure output, return_value
  end type


contains
  subroutine output(self)
  implicit none
  class(output_data_t) self
  self%i = 1234
  end subroutine

  integer function return_value(self)
  implicit none
  class(output_data_t) self
  return_value = self%i
  end function
end module

  use output_data_m
  implicit none
  associate(output_data => output_data_t())
call output_data%output
if (output_data%return_value() .ne. 1234) stop 1
  end associate
end



Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 11:06:33AM +0100, Richard Biener wrote:
> So I fear this only covers parts of the paths simplifications can
> end up used.  Now one question is whether we want to allow
> "invalid" intermediate transforms if a final match makes it
> "valid" again.  If not then doing the verification in
> gimple-match.c itself might be preferable.  OTOH I think that
> most patterns should verify themselves - it's also odd that
> v << 1 >> 1 is supported but not v & vec-cst - we might want to
> require some basic support from targets.  I realize the PR is
> one of the odd V1mode cases which we might want to support better
> during RTL expansion (or require targets to be more sane here).
> 
> So in the end I don't like the patch much, not at this stage
> anyway.

I'm afraid we can't just ignore it for GCC 11, there are many passes after
veclower and we have hundreds of folders on vector ops and many of them can
just turn something that is handled into something that isn't, even with
normal vector types and not the corner cases. Not all targets support all
optabs...
What about a modification of the patch which instead of preventing the
simplifications clears PROP_gimple_lvec property and schedule another
veclower pass right before isel which would be only run if the property
isn't set (I think the -O0 version of the pass already works that way).

Jakub



Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Richard Biener
On Tue, 2 Feb 2021, Jakub Jelinek wrote:

> On Tue, Feb 02, 2021 at 11:06:33AM +0100, Richard Biener wrote:
> > So I fear this only covers parts of the paths simplifications can
> > end up used.  Now one question is whether we want to allow
> > "invalid" intermediate transforms if a final match makes it
> > "valid" again.  If not then doing the verification in
> > gimple-match.c itself might be preferable.  OTOH I think that
> > most patterns should verify themselves - it's also odd that
> > v << 1 >> 1 is supported but not v & vec-cst - we might want to
> > require some basic support from targets.  I realize the PR is
> > one of the odd V1mode cases which we might want to support better
> > during RTL expansion (or require targets to be more sane here).
> > 
> > So in the end I don't like the patch much, not at this stage
> > anyway.
> 
> I'm afraid we can't just ignore it for GCC 11, there are many passes after
> veclower and we have hundreds of folders on vector ops and many of them can
> just turn something that is handled into something that isn't, even with
> normal vector types and not the corner cases. Not all targets support all
> optabs...
> What about a modification of the patch which instead of preventing the
> simplifications clears PROP_gimple_lvec property and schedule another
> veclower pass right before isel which would be only run if the property
> isn't set (I think the -O0 version of the pass already works that way).

I don't say ignore the problem.  All I say is that the x86 target
should either not advertise V1DF shifts or advertise the basic
ops that reasonable simplification would expect to exist.

The testcases sofar are artificially generating V1DF operations.

Now, if we want to have any fully automatic way of disabling
transforms like you propose then IMHO it should be based on
common a "is this op supported" machinery shared by vector lowering
and the new machinery (and eventually also used by RTL expansion
and the optab checks that exist for the vectorizers purpose).

Otherwise it's just a quite fragile (and invisble) machinery again.

Btw, I just can find V1DI mentioned in mmx.md but I can't find
rotate or shift patterns that would match?

That said, for match.pd result verification IMHO the correct
place is a gimple_match_op::target_supported_p routine or so that we'd 
call
in gimple-match.c like

  if (__builtin_expect (dump_file && (dump_flags & TDF_FOLDING), 0)) 
fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", "match.pd", 600, 
__FILE__, __LINE__);
  {
res_op->set_op (NOP_EXPR, type, 1);
{
  tree _o1[2], _r1;
  _o1[0] = captures[0];
  _o1[1] = captures[1];
  gimple_match_op tem_op (res_op->cond.any_else (), 
TRUNC_MOD_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
  tem_op.resimplify (lseq, valueize);

< here

  _r1 = maybe_push_res_to_seq (&tem_op, lseq);

vector lowering could then build a gimple_match_op and just
call the new routine as predicate.  We could return a tri-state
to explicitely signal "don't know" (aka not fully implement the
function but try to be spot-on for vector optabs).

That said, I'd prefer a x86 specific solution for the PR at
this point (just kill those shift V1DImode optab entries?)

Richard.


[PATCH 0/4] openacc: Mixing array elements and derived types

2021-02-02 Thread Julian Brown
This patch series fixes support for mixing arbitrary derived type
accesses and array element accesses in OpenACC directives, and fixes a
bug in derived-type lowering.

Somewhat relatedly, it also allows strided accesses in update directives.

Further commentary alongside each patch. Some of the test cases were
originally written by Tobias Burnus.

Tested with offloading to AMD GCN. OK for mainline (or for stage1)?

Thanks,

Julian

Julian Brown (4):
  openacc: Remove dereference for non-pointer derived-type members
  openacc: Use class_pointer instead of pointer attribute for class
types
  openacc: Fix lowering for derived-type mappings through array elements
  openacc: Allow strided arrays in update directives

 gcc/fortran/openmp.c  |   5 +-
 gcc/fortran/trans-openmp.c| 196 +++---
 gcc/gimplify.c|  12 ++
 .../gfortran.dg/goacc/array-with-dt-1.f90 |  11 +
 .../gfortran.dg/goacc/array-with-dt-2.f90 |  10 +
 .../gfortran.dg/goacc/array-with-dt-3.f90 |  14 ++
 .../gfortran.dg/goacc/array-with-dt-4.f90 |  18 ++
 .../gfortran.dg/goacc/array-with-dt-5.f90 |  12 ++
 .../goacc/derived-classtypes-1.f90| 129 
 .../derivedtypes-arrays-1.f90 | 109 ++
 .../libgomp.oacc-fortran/update-dt-array.f90  |  53 +
 11 files changed, 487 insertions(+), 82 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f90
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/update-dt-array.f90

-- 
2.29.2



[PATCH 1/4] openacc: Remove dereference for non-pointer derived-type members

2021-02-02 Thread Julian Brown
Derived-type members that are themselves derived types are not always
represented as pointers, so it is not always correct to dereference
them. The attached test case fails during gimplification without this
patch.

Tested with offloading to AMD GCN. OK for mainline?

Thanks,

Julian

2020-02-02  Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Handle non-pointer type.

gcc/testsuite/
* gfortran.dg/goacc/derived-classtypes-1.f95: New test.
---
 gcc/fortran/trans-openmp.c|   7 +-
 .../goacc/derived-classtypes-1.f95| 129 ++
 2 files changed, 134 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 00358ca4d39..8d8da4593c3 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3013,7 +3013,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
}
 
  OMP_CLAUSE_DECL (node)
-   = build_fold_indirect_ref (data);
+   = (POINTER_TYPE_P (TREE_TYPE (data))
+  ? build_fold_indirect_ref (data) : data);
  OMP_CLAUSE_SIZE (node) = size;
  node2 = build_omp_clause (input_location,
OMP_CLAUSE_MAP);
@@ -3021,7 +3022,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
   openacc
   ? GOMP_MAP_ATTACH_DETACH
   : GOMP_MAP_ALWAYS_POINTER);
- OMP_CLAUSE_DECL (node2) = data;
+ OMP_CLAUSE_DECL (node2)
+   = (POINTER_TYPE_P (TREE_TYPE (data))
+  ? data : build_fold_addr_expr (data));
  OMP_CLAUSE_SIZE (node2) = size_int (0);
}
  else
diff --git a/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 
b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95
new file mode 100644
index 000..e6cf09c6d3c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95
@@ -0,0 +1,129 @@
+type :: type1
+  integer :: a
+end type type1
+
+type :: type2
+  integer, pointer :: b
+end type type2
+
+type :: aux1
+  integer :: y
+end type aux1
+
+type, extends(aux1) :: aux
+  integer :: x
+end type aux
+
+type :: type3
+  class(aux), pointer :: c(:)
+end type type3
+
+type :: type4
+  integer, pointer :: d(:)
+end type type4
+
+type :: type5
+  type(aux) :: e
+end type type5
+
+type :: type6
+  type(aux), pointer :: f
+end type type6
+
+type :: type7
+  class(aux), pointer :: g
+end type type7
+
+type(type1) :: foo
+type(type2) :: bar
+type(type3) :: qux
+type(type4) :: quux
+type(type5) :: fred
+type(type6) :: jim
+type(type7) :: shiela
+
+type(type1), pointer :: pfoo
+type(type2), pointer :: pbar
+type(type3), pointer :: pqux
+type(type4), pointer :: pquux
+type(type5), pointer :: pfred
+type(type6), pointer :: pjim
+type(type7), pointer :: pshiela
+
+class(type1), pointer :: cfoo
+class(type2), pointer :: cbar
+class(type3), pointer :: cqux
+class(type4), pointer :: cquux
+class(type5), pointer :: cfred
+class(type6), pointer :: cjim
+class(type7), pointer :: cshiela
+
+class(type1), allocatable :: acfoo
+class(type2), allocatable :: acbar
+class(type3), allocatable :: acqux
+class(type4), allocatable :: acquux
+class(type5), allocatable :: acfred
+class(type6), allocatable :: acjim
+class(type7), allocatable :: acshiela
+
+!$acc enter data copyin(foo)
+!$acc enter data copyin(foo%a)
+!$acc enter data copyin(bar)
+!$acc enter data copyin(bar%b)
+!$acc enter data copyin(qux)
+!!$acc enter data copyin(qux%c)
+!$acc enter data copyin(quux)
+!$acc enter data copyin(quux%d)
+!$acc enter data copyin(fred)
+!$acc enter data copyin(fred%e)
+!$acc enter data copyin(jim)
+!$acc enter data copyin(jim%f)
+!$acc enter data copyin(shiela)
+!$acc enter data copyin(shiela%g)
+
+!$acc enter data copyin(pfoo)
+!$acc enter data copyin(pfoo%a)
+!$acc enter data copyin(pbar)
+!$acc enter data copyin(pbar%b)
+!$acc enter data copyin(pqux)
+!!$acc enter data copyin(pqux%c)
+!$acc enter data copyin(pquux)
+!$acc enter data copyin(pquux%d)
+!$acc enter data copyin(pfred)
+!$acc enter data copyin(pfred%e)
+!$acc enter data copyin(pjim)
+!$acc enter data copyin(pjim%f)
+!$acc enter data copyin(pshiela)
+!$acc enter data copyin(pshiela%g)
+
+!$acc enter data copyin(cfoo)
+!$acc enter data copyin(cfoo%a)
+!$acc enter data copyin(cbar)
+!$acc enter data copyin(cbar%b)
+!$acc enter data copyin(cqux)
+!!$acc enter data copyin(cqux%c)
+!$acc enter data copyin(cquux)
+!$acc enter data copyin(cquux%d)
+!$acc enter data copyin(cfred)
+!$acc enter 

[PATCH 2/4] openacc: Use class_pointer instead of pointer attribute for class types

2021-02-02 Thread Julian Brown
Elsewhere in the Fortran front-end, the class_pointer attribute is
used for BT_CLASS entities instead of the pointer attribute. This patch
follows suit for OpenACC. I couldn't actually come up with a test case
where this makes a difference (i.e., where "class_pointer" and "pointer"
have different values at this point in the code), but this may nonetheless
fix a latent bug.

Tested with offloading to AMD GCN. OK for mainline?

Thanks,

Julian

2020-02-02  Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use class_pointer attribute
for BT_CLASS.
---
 gcc/fortran/trans-openmp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 8d8da4593c3..7be34ef9a35 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2997,7 +2997,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  if (lastcomp->u.c.component->ts.type == BT_DERIVED
  || lastcomp->u.c.component->ts.type == BT_CLASS)
{
- if (sym_attr.pointer || (openacc && sym_attr.allocatable))
+ bool pointer
+   = (lastcomp->u.c.component->ts.type == BT_CLASS
+  ? sym_attr.class_pointer : sym_attr.pointer);
+ if (pointer || (openacc && sym_attr.allocatable))
{
  tree data, size;
 
-- 
2.29.2



[PATCH 4/4] openacc: Allow strided arrays in update directives

2021-02-02 Thread Julian Brown
OpenACC 3.0 ("2.14.4. Update Directive") states:

  Noncontiguous subarrays may appear. It is implementation-specific
  whether noncontiguous regions are updated by using one transfer for
  each contiguous subregion, or whether the non-contiguous data is
  packed, transferred once, and unpacked, or whether one or more larger
  subarrays (no larger than the smallest contiguous region that contains
  the specified subarray) are updated.

This patch relaxes some conditions in the Fortran front-end so that
strided accesses are permitted for update directives.

Tested with offloading to AMD GCN. OK for mainline?

Thanks,

Julian

2020-02-02  Julian Brown  

gcc/fortran/
* openmp.c (resolve_omp_clauses): Omit OpenACC update in
contiguity check and stride-specified error.

gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-2.f90: New test.
---
 gcc/fortran/openmp.c|  5 +++--
 gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 | 10 ++
 2 files changed, 13 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 9a3a8f63b5e..1c8c5315329 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5192,7 +5192,8 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
   array isn't contiguous.  An expression such as
   arr(-n:n,-n:n) could be contiguous even if it looks
   like it may not be.  */
-   if (list != OMP_LIST_CACHE
+   if (code->op != EXEC_OACC_UPDATE
+   && list != OMP_LIST_CACHE
&& list != OMP_LIST_DEPEND
&& !gfc_is_simply_contiguous (n->expr, false, true)
&& gfc_is_not_contiguous (n->expr))
@@ -5224,7 +5225,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
int i;
gfc_array_ref *ar = &array_ref->u.ar;
for (i = 0; i < ar->dimen; i++)
- if (ar->stride[i])
+ if (ar->stride[i] && code->op != EXEC_OACC_UPDATE)
{
  gfc_error ("Stride should not be specified for "
 "array section in %s clause at %L",
diff --git a/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90 
b/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90
new file mode 100644
index 000..807580d75a9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/array-with-dt-2.f90
@@ -0,0 +1,10 @@
+type t
+   integer, allocatable :: A(:,:)
+end type t
+
+type(t), allocatable :: b(:)
+
+!$acc update host(b(::2))
+!$acc update host(b(1)%A(::3,::4))
+end
+
-- 
2.29.2



[PATCH 3/4] openacc: Fix lowering for derived-type mappings through array elements

2021-02-02 Thread Julian Brown
This patch fixes lowering of derived-type mappings which select elements
of arrays of derived types, and similar. These would previously lead
to ICEs.

With this change, update directives and enter/exit data directives can
pass through constructs that are no longer recognized by the gimplifier,
hence alterations are needed there also.

Tested with offloading to AMD GCN. OK for mainline?

Thanks,

Julian

2020-02-02  Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Handle element selection for
arrays of derived types.

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle ATTACH_DETACH for
non-decls.

gcc/testsuite/
* gfortran.dg/goacc/array-with-dt-1.f90: New test.
* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
* gfortran.dg/goacc/derived-classtypes-1.f95: Uncomment
previously-broken directives.

libgomp/
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: New test.
* testsuite/libgomp.oacc-fortran/update-dt-array.f90: Likewise.
---
 gcc/fortran/trans-openmp.c| 192 ++
 gcc/gimplify.c|  12 ++
 .../gfortran.dg/goacc/array-with-dt-1.f90 |  11 +
 .../gfortran.dg/goacc/array-with-dt-3.f90 |  14 ++
 .../gfortran.dg/goacc/array-with-dt-4.f90 |  18 ++
 .../gfortran.dg/goacc/array-with-dt-5.f90 |  12 ++
 ...sstypes-1.f95 => derived-classtypes-1.f90} |   8 +-
 .../derivedtypes-arrays-1.f90 | 109 ++
 .../libgomp.oacc-fortran/update-dt-array.f90  |  53 +
 9 files changed, 344 insertions(+), 85 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-5.f90
 rename gcc/testsuite/gfortran.dg/goacc/{derived-classtypes-1.f95 => 
derived-classtypes-1.f90} (95%)
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/update-dt-array.f90

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 7be34ef9a35..0ab08dabe9a 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2675,6 +2675,32 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  tree decl = gfc_trans_omp_variable (n->sym, false);
  if (DECL_P (decl))
TREE_ADDRESSABLE (decl) = 1;
+
+ gfc_ref *lastref = NULL;
+
+ if (n->expr)
+   for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next)
+ if (ref->type == REF_COMPONENT || ref->type == REF_ARRAY)
+   lastref = ref;
+
+ bool allocatable = false, pointer = false;
+
+ if (lastref && lastref->type == REF_COMPONENT)
+   {
+ gfc_component *c = lastref->u.c.component;
+
+ if (c->ts.type == BT_CLASS)
+   {
+ pointer = CLASS_DATA (c)->attr.class_pointer;
+ allocatable = CLASS_DATA (c)->attr.allocatable;
+   }
+ else
+   {
+ pointer = c->attr.pointer;
+ allocatable = c->attr.allocatable;
+   }
+   }
+
  if (n->expr == NULL
  || (n->expr->ref->type == REF_ARRAY
  && n->expr->ref->u.ar.type == AR_FULL))
@@ -2911,74 +2937,79 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
}
  else if (n->expr
   && n->expr->expr_type == EXPR_VARIABLE
-  && n->expr->ref->type == REF_COMPONENT)
+  && n->expr->ref->type == REF_ARRAY
+  && !n->expr->ref->next)
{
- gfc_ref *lastcomp;
-
- for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next)
-   if (ref->type == REF_COMPONENT)
- lastcomp = ref;
-
- symbol_attribute sym_attr;
-
- if (lastcomp->u.c.component->ts.type == BT_CLASS)
-   sym_attr = CLASS_DATA (lastcomp->u.c.component)->attr;
- else
-   sym_attr = lastcomp->u.c.component->attr;
-
+ /* An array element or array section which is not part of a
+derived type, etc.  */
+ bool element = n->expr->ref->u.ar.type == AR_ELEMENT;
+ gfc_trans_omp_array_section (block, n, decl, element,
+  GOMP_MAP_POINTER, node, node2,
+  

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Biener via Gcc-patches
On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>
> Hi Richard(s),
>
> I'm just looking to see if I'm going about this the right way, based on the 
> discussion we had on IRC. I've managed to hack something together, I've 
> attached a (very) WIP patch which gives the correct codegen for the testcase 
> in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
> obviously need to support other widening patterns and differentiate between 
> big/little endian among other things.
>
> I added a backend pattern because I wasn't quite clear which changes to make 
> in order to allow the existing backend patterns to be used with a V8QI, or 
> how to represent V16QI where we don't care about the top/bottom 8. I made 
> some attempt in optabs.c, which is in the patch commented out, but I'm not 
> sure if I'm going about this the right way.

Hmm, as said, I'd try to arrange like illustrated in the attachment,
confined to vectorizable_conversion.  The
only complication might be sub-optimal code-gen for the vector-vector
CTOR compensating for the input
vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)

Richard.

> Joel


p
Description: Binary data


Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 02:23:55PM +0100, Richard Biener wrote:
> Btw, I just can find V1DI mentioned in mmx.md but I can't find
> rotate or shift patterns that would match?

The backend has several V1?Imode shifts, but optab only for those V1DImode
ones:

grep '[la]sh[lr]v1[qhsdtox]' tmp-mddump.md 
(define_insn ("mmx_ashlv1di3")
(define_insn ("mmx_lshrv1di3")
(define_insn ("avx512bw_ashlv1ti3")
(define_insn ("avx512bw_lshrv1ti3")
(define_insn ("sse2_ashlv1ti3")
(define_insn ("sse2_lshrv1ti3")
(define_expand ("ashlv1di3")
(define_expand ("lshrv1di3")
  emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]),

I think it has been introduced with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021#c13

Before we didn't have any V1DImode expanders (except mov/movmisalign, but
those are needed and are supplied for other V1??mode modes too).

So I'll test:

2021-02-02  Jakub Jelinek  

PR tree-optimization/98287
* config/i386/mmx.md (3): For shifts don't enable expander
for V1DImode.

* gcc.dg/pr98287.c: New test.

--- gcc/config/i386/mmx.md.jj   2021-01-07 15:29:52.601974578 +0100
+++ gcc/config/i386/mmx.md  2021-02-02 14:48:52.310935516 +0100
@@ -1528,9 +1528,9 @@ (define_insn "mmx_3"
(set_attr "mode" "DI,TI,TI")])
 
 (define_expand "3"
-  [(set (match_operand:MMXMODE248 0 "register_operand")
-(any_lshift:MMXMODE248
- (match_operand:MMXMODE248 1 "register_operand")
+  [(set (match_operand:MMXMODE24 0 "register_operand")
+(any_lshift:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand")
  (match_operand:DI 2 "nonmemory_operand")))]
   "TARGET_MMX_WITH_SSE")
 
--- gcc/testsuite/gcc.dg/pr98287.c.jj   2021-02-02 14:50:05.583107569 +0100
+++ gcc/testsuite/gcc.dg/pr98287.c  2021-02-02 14:50:05.583107569 +0100
@@ -0,0 +1,19 @@
+/* PR tree-optimization/98287 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-ccp -fno-tree-forwprop -Wno-psabi -w" } */
+
+typedef unsigned long __attribute__((__vector_size__ (8))) V;
+V v;
+
+static __attribute__((noinline, noclone)) V
+bar (unsigned short s)
+{
+  return v >> s << s | v >> s >> 63;
+}
+
+unsigned long
+foo (void)
+{
+  V x = bar (1);
+  return x[0];
+}


Jakub



Re: [Patch, fortran] PR98897 - Erroneous procedure attribute for associate name

2021-02-02 Thread Tobias Burnus

Hi Paul,

On 02.02.21 13:20, Paul Richard Thomas via Gcc-patches wrote:

This is more or less 'obvious' and does not require any further explanation.


Well, I am not sure whether calling resolve is premature or not. In any
case, it still fails for the attached testcase. (Related but separate
issue.)

The second testcase fails with "Selector at (1) has no type" / "Symbol
'var' at (1) has no IMPLICIT type".

Disclaimer: I am not 100% sure whether those two or your/the PR's
testcase is valid. (It fails to compile with ifort 19.1. I have not read
the spec and assume that the original testcase is valid.)

Thus, please confirm that all three are valid. If so, do you see a way
to make the two new ones pass as well? If we are sure that the current
patch is still the right approach, I am also fine to do it stepwise.

Thanks,

Tobias


Regtests with FC33/x86_64 - OK for master (and )?

Paul

Fortran: Fix calls to associate name typebound subroutines [PR98897].

2021-02-02  Paul Thomas  

gcc/fortran
PR fortran/98897
* match.c (gfc_match_call): Include associate names as possible
entities with typebound subroutines. The target needs to be
resolved for the type.

gcc/testsuite/
PR fortran/98897
* gfortran.dg/typebound_call_32.f90: New test.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
module m
  implicit none
contains
  subroutine double(i)
integer :: i
i = 2*i
  end subroutine double

  function bar() result(res)
procedure(double), pointer :: res
res => double
  end function bar
  subroutine foo(i)
integer :: i

! This works:
procedure(), pointer :: proc
call double(i)
proc => bar()
call proc(i)

! This fails:
associate (var => bar())
  call var(i)  ! { dg-bogus "VARIABLE attribute of 'var' conflicts with PROCEDURE attribute" }
end associate
  end subroutine foo

end module m

program test
  use m
  implicit none (type, external)
  integer :: i
  i = 50
  call foo(i)
  if (i /= 50*2*2) stop 1
end program test
module m
  implicit none
contains
  subroutine double(i)
integer :: i
i = 2*i
  end subroutine double

  function bar() result(res)
procedure(double), pointer :: res
res => double
  end function bar

  subroutine foo(i)
integer :: i
procedure(bar) :: var

procedure(double), pointer :: proc
associate (var => bar())
  proc => var
end associate
call proc(i)
  end subroutine foo

end module m

program test
  use m
  implicit none (type, external)
  integer :: i
  i = 50
  call foo(i)
  if (i /= 50*2) stop 1
end program test


Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Uros Bizjak via Gcc-patches
On Tue, Feb 2, 2021 at 2:56 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 02, 2021 at 02:23:55PM +0100, Richard Biener wrote:
> > Btw, I just can find V1DI mentioned in mmx.md but I can't find
> > rotate or shift patterns that would match?
>
> The backend has several V1?Imode shifts, but optab only for those V1DImode
> ones:
>
> grep '[la]sh[lr]v1[qhsdtox]' tmp-mddump.md
> (define_insn ("mmx_ashlv1di3")
> (define_insn ("mmx_lshrv1di3")
> (define_insn ("avx512bw_ashlv1ti3")
> (define_insn ("avx512bw_lshrv1ti3")
> (define_insn ("sse2_ashlv1ti3")
> (define_insn ("sse2_lshrv1ti3")
> (define_expand ("ashlv1di3")
> (define_expand ("lshrv1di3")
>   emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]),
>
> I think it has been introduced with
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021#c13
>
> Before we didn't have any V1DImode expanders (except mov/movmisalign, but
> those are needed and are supplied for other V1??mode modes too).
>
> So I'll test:
>
> 2021-02-02  Jakub Jelinek  
>
> PR tree-optimization/98287
> * config/i386/mmx.md (3): For shifts don't enable expander
> for V1DImode.
>
> * gcc.dg/pr98287.c: New test.

OK.

Please note that V1DI vectors are intended only for builtins. They
were introduced to prevent former DImode MMX builtins from interfering
with DImode integer operations. Note that MMX shifts don't clobber
flags, so their RTX pattern was preferred in comparison to integer
DImode shifts.

The same happened with TImode shifts of XMM builtins.

Uros.

> --- gcc/config/i386/mmx.md.jj   2021-01-07 15:29:52.601974578 +0100
> +++ gcc/config/i386/mmx.md  2021-02-02 14:48:52.310935516 +0100
> @@ -1528,9 +1528,9 @@ (define_insn "mmx_3"
> (set_attr "mode" "DI,TI,TI")])
>
>  (define_expand "3"
> -  [(set (match_operand:MMXMODE248 0 "register_operand")
> -(any_lshift:MMXMODE248
> - (match_operand:MMXMODE248 1 "register_operand")
> +  [(set (match_operand:MMXMODE24 0 "register_operand")
> +(any_lshift:MMXMODE24
> + (match_operand:MMXMODE24 1 "register_operand")
>   (match_operand:DI 2 "nonmemory_operand")))]
>"TARGET_MMX_WITH_SSE")
>
> --- gcc/testsuite/gcc.dg/pr98287.c.jj   2021-02-02 14:50:05.583107569 +0100
> +++ gcc/testsuite/gcc.dg/pr98287.c  2021-02-02 14:50:05.583107569 +0100
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/98287 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-ccp -fno-tree-forwprop -Wno-psabi -w" } */
> +
> +typedef unsigned long __attribute__((__vector_size__ (8))) V;
> +V v;
> +
> +static __attribute__((noinline, noclone)) V
> +bar (unsigned short s)
> +{
> +  return v >> s << s | v >> s >> 63;
> +}
> +
> +unsigned long
> +foo (void)
> +{
> +  V x = bar (1);
> +  return x[0];
> +}
>
>
> Jakub
>


Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)

2021-02-02 Thread Jorge D'Elia via Gcc-patches
Hi Tobias,,

- Mensaje original -
> De: "Tobias Burnus" 
> Para: "Thomas Koenig" , "jdelia" , "Gfortran List"
> gcc.gnu.org>, "GCC Patches" 
> Enviados: Martes, 2 de Febrero 2021 8:46:00
> Asunto: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] 
> (was: Re: A question about coarrays and
> -Warray-temporaries)
>
> Hi all,
> 
> the attached patch now handles -fcoarray=single and access to the local
> image like no coarray access, using the same check a before.

I would like to apply the patch but, sorry, how do I proceed? 
Where in the gcc tree to apply the patch file? 
Should I use git apply or diff? e.g. it does not work with

$ pwd
/home/bigpack/gcc-paq/sources/gcc-11.0-20210202
$ git apply dep-caf.diff
error: patch failed: gcc/fortran/dependency.c:30
error: gcc/fortran/dependency.c: patch does not apply

> Actually, I think we could remove the if (..coarray..) check completely:
> For 'single', it is like no coarrays; for 'lib' both AA[remote] =... and
> "AA = ... AA[remove]" will create a function call and uses already
> temporaries. I don't know what 'native'/'threads' does – but either it
> the image index is different, then no temporary is needed at all – or it
> is the same (in particular: this_image / not present), but then the
> noncoarray analysis would work.
> 
> (Knowing that for i != this_image(), AA and AA[i] – or i /= j for AA[i]
> and AA[j] – we could further relax the checks, but I don't think that
> that's needed.)
> 
> Hence: Is the patch OK for the trunk? Or should we completely get rid of
> if-conditional block?

Thanks for the fast work with this code optimization issue.

Regards,
Jorge.

> --
> On 01.02.21 12:52, Tobias Burnus wrote:
> 
>> On 01.02.21 08:42, Thomas Koenig via Fortran wrote:
>>>> I have a question with -Warray-temporaries in the test below.
>>>> With the AA coarray that warning appears in the first loop (on its
>>>> local part),
>>>> but not with the BB array with the same task, i.e. [...]
>>> It seems that dependency checking does not detect that no overlap
>>> can exist in that case, and so generates a temporary.  Probably,
>>> dependency checking was not updated when coarrays were introduced.
>>> This is a missed optimization, not a correctness issue.
>> I have now filled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98913
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank
> Thürauf


Re: [[C++ PATCH]] Implement C++2a P0330R2 - Literal Suffixes for ptrdiff_t and size_t

2021-02-02 Thread Jason Merrill via Gcc-patches

On 2/2/21 1:19 AM, Ed Smith-Rowland wrote:

On 2/2/21 12:12 AM, Jason Merrill wrote:

On 2/1/21 9:15 PM, Ed Smith-Rowland wrote:

On 2/1/21 2:23 PM, Jakub Jelinek wrote:

On Mon, Feb 01, 2021 at 01:46:13PM -0500, Ed Smith-Rowland wrote:

@@ -0,0 +1,8 @@
+// { dg-do compile { target c++23 } }
+
+#include 
+#include 
+
+static_assert(std::is_same_v);
+static_assert(std::is_same_v);
Shouldn't this be std::make_signed::type instead of 
std::ptrdiff_t
Yes it should. The paper goes on about ptrdiff_t but at the very end 
they punt on that in favor of what you have.


+std::ptrdiff_t pd1 = 1234z; // { dg-warning "use of C\+\+23 
ptrdiff_t integer constant" "" { target c++20_down } }
+std::ptrdiff_t PD1 = 5678Z; // { dg-warning "use of C\+\+23 
ptrdiff_t integer constant" "" { target c++20_down } }

Ditto here.

Agree.


+  const char *message = (result & CPP_N_UNSIGNED) == 
CPP_N_UNSIGNED

+    ? N_("use of C++23 size_t integer constant")
+    : N_("use of C++23 ptrdiff_t integer constant");

And here too (perhaps %::type%> )?
And maybe % too.

Agree.



--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -500,6 +500,9 @@ struct cpp_options
    /* Nonzero means tokenize C++20 module directives.  */
    unsigned char module_directives;
+  /* Nonzero for C++23 ptrdiff_t and size_t literals.  */

And drop "ptrdiff_t and " here?

+#define CPP_N_SIZE_T    0x200 /* C++23 size_t or ptrdiff_t 
literal  */

And " or ptrdiff_t" here?

While ptrdiff_t will usually be the same type, seems there is e.g.:
config/darwin.h:#define SIZE_TYPE "long unsigned int"
config/darwin.h:#define PTRDIFF_TYPE "int"
config/i386/djgpp.h:#define SIZE_TYPE "long unsigned int"
config/i386/djgpp.h:#define PTRDIFF_TYPE "int"
config/m32c/m32c.h:#define PTRDIFF_TYPE (TARGET_A16 ? "int" : "long 
int")

config/m32c/m32c.h:#define SIZE_TYPE "unsigned int"
config/rs6000/rs6000.h:#define PTRDIFF_TYPE "int"
config/rs6000/rs6000.h:#define SIZE_TYPE "long unsigned int"
config/s390/linux.h:#define SIZE_TYPE "long unsigned int"
config/s390/linux.h:#define PTRDIFF_TYPE (TARGET_64BIT ? "long int" 
: "int")

config/visium/visium.h:#define SIZE_TYPE "unsigned int"
config/visium/visium.h:#define PTRDIFF_TYPE "long int"
config/vms/vms.h:#define SIZE_TYPE  "unsigned int"
config/vms/vms.h:#define PTRDIFF_TYPE (flag_vms_pointer_size == 
VMS_POINTER_SIZE_NONE ? \

config/vms/vms.h-  "int" : "long long int")
so quite a few differences.

Jakub


Here is my last patch with all the concerns addressed.

I am not smart enough to get the dg-warning regex in 
Wsize_t-literals.C to work. If someone could carry this over the 
finish line that would be great. Or give me pointers. I can't any more.


Your regex will work fine if you wrap it in {} instead of "", e.g.

{ dg-warning {use of C\+\+23 .size_t. integer constant} }

Jason


Thank you Jason,

So here is the latest in testing.



+++ b/gcc/testsuite/g++.dg/warn/Wsize_t-literals.C
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++20_down } }
+
+#if __cplusplus >= 201103L
+
+#include 
+#include 
+
+std::size_t s1 = 1234zu; // { dg-warning {use of C\+\+23 .size_t. integer constant} 
"" { target { ! c++98_only } } }
+std::size_t S1 = 5678ZU; // { dg-warning {use of C\+\+23 .size_t. integer constant} 
"" { target { ! c++98_only } } }
+std::size_t s2 = 1234uz; // { dg-warning {use of C\+\+23 .size_t. integer constant} 
"" { target { ! c++98_only } } }
+std::size_t S2 = 5678UZ; // { dg-warning {use of C\+\+23 .size_t. integer constant} 
"" { target { ! c++98_only } } }
+
+std::make_signed::type pd1 = 1234z; // { dg-warning {use of C\+\+23 
.make_signed::type. integer constant} "" { target { ! c++98_only } } }
+std::make_signed::type PD1 = 5678Z; // { dg-warning {use of C\+\+23 
.make_signed::type. integer constant} "" { target { ! c++98_only } } }
+
+#endif


I'd suggest switching the target constraints around: use target c++11 in 
the dg-do line, drop the #if, and use target c++20_down in the 
diagnostics.  OK with that change.


Jason



Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)

2021-02-02 Thread Thomas Koenig via Gcc-patches

Am 02.02.21 um 12:46 schrieb Tobias Burnus:

Hi all,

the attached patch now handles -fcoarray=single and access to the local
image like no coarray access, using the same check a before.

Actually, I think we could remove the if (..coarray..) check completely:
For 'single', it is like no coarrays; for 'lib' both AA[remote] =... and
"AA = ... AA[remove]" will create a function call and uses already
temporaries. I don't know what 'native'/'threads' does – but either it
the image index is different, then no temporary is needed at all – or it
is the same (in particular: this_image / not present), but then the
noncoarray analysis would work.


That analysis is correct, also as far as the shared memory coarray goes
(where you do offsets as an extra dimension, either it points to the
same memory or to different memory).

So, while your patch is OK, I think a simple removal of the test
is also OK.

Take your pick :-)

Best regards

Thomas



Re: [PATCH 13/16] Improve test codegen for interpreting assembly

2021-02-02 Thread Jonathan Wakely via Gcc-patches

On 27/01/21 21:42 +0100, Matthias Kretz wrote:

From: Matthias Kretz 

In many failure cases it is helpful to inspect the instructions leading
up to the test failure. After this change the location is easier to find
and the branch after failure is easier to find.

libstdc++-v3/ChangeLog:
* testsuite/experimental/simd/tests/bits/verify.h (verify): Add
instruction pointer data member. Ensure that the `if (m_failed)`
branch is always inlined into the calling code. The body of the
conditional can still be a function call. Move the get_ip call
into the verify ctor to simplify the ctor calls.
(COMPARE): Don't mention the use of all_of for reduction of a
simd_mask. It only distracts from the real issue.
---
.../experimental/simd/tests/bits/verify.h | 44 +--
1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h b/
libstdc++-v3/testsuite/experimental/simd/tests/bits/verify.h


Several of these patches have been mangled by your mailer. The line
above has been wrapped in the middle of the filename, making this an
invalid patch header.


@@ -220,24 +223,21 @@ template 

#define COMPARE(_a, _b)
\


And all these lines ending with a backslash have been wrapped.

I can easily fix the filename in the header, but my simplistic
attempts to put this back together have failed. Could you please
resend patch 13/16 as an attachment, not inline?

I think the others are OK.




Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>>
>> Hi Richard(s),
>>
>> I'm just looking to see if I'm going about this the right way, based on the 
>> discussion we had on IRC. I've managed to hack something together, I've 
>> attached a (very) WIP patch which gives the correct codegen for the testcase 
>> in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
>> obviously need to support other widening patterns and differentiate between 
>> big/little endian among other things.
>>
>> I added a backend pattern because I wasn't quite clear which changes to make 
>> in order to allow the existing backend patterns to be used with a V8QI, or 
>> how to represent V16QI where we don't care about the top/bottom 8. I made 
>> some attempt in optabs.c, which is in the patch commented out, but I'm not 
>> sure if I'm going about this the right way.
>
> Hmm, as said, I'd try to arrange like illustrated in the attachment,
> confined to vectorizable_conversion.  The
> only complication might be sub-optimal code-gen for the vector-vector
> CTOR compensating for the input
> vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)

Yeah.  I don't really like this because it means that it'll be
impossible to remove the redundant work in gimple.  The extra elements
are just a crutch to satisfy the type system.

As far as Joel's patch goes, I was imagining that the new operation
would be an internal function rather than a tree code.  However,
if we don't want that, maybe we should just emit separate conversions
and a normal subtraction, like we would for (signed) x - (unsigned) y.

Thanks,
Richard


Re: [Patch, fortran] PR98897 - Erroneous procedure attribute for associate name

2021-02-02 Thread Paul Richard Thomas via Gcc-patches
Hi Tobias,

ifort (IFORT) 2021.1 Beta 20201112 is happy with the testcase in the patch.

In foo.f90, if I remove
 call var(i)  ! { dg-bogus "VARIABLE attribute of 'var' conflicts with
PROCEDURE attribute" }
gfortran correctly complains
   23 | associate (var => bar())
  |  1
Error: Selector at (1) has no type
ifort complains:
../pr98897/foo.f90(11): error #8179: The procedure pointer and the
procedure target must both be functions or subroutines.
res => double

The responses from both compilers to foo3.f90 are the same.

Cheers

Paul



On Tue, 2 Feb 2021 at 13:59, Tobias Burnus  wrote:

> Hi Paul,
>
> On 02.02.21 13:20, Paul Richard Thomas via Gcc-patches wrote:
> > This is more or less 'obvious' and does not require any further
> explanation.
>
> Well, I am not sure whether calling resolve is premature or not. In any
> case, it still fails for the attached testcase. (Related but separate
> issue.)
>
> The second testcase fails with "Selector at (1) has no type" / "Symbol
> 'var' at (1) has no IMPLICIT type".
>
> Disclaimer: I am not 100% sure whether those two or your/the PR's
> testcase is valid. (It fails to compile with ifort 19.1. I have not read
> the spec and assume that the original testcase is valid.)
>
> Thus, please confirm that all three are valid. If so, do you see a way
> to make the two new ones pass as well? If we are sure that the current
> patch is still the right approach, I am also fine to do it stepwise.
>
> Thanks,
>
> Tobias
>
> > Regtests with FC33/x86_64 - OK for master (and )?
> >
> > Paul
> >
> > Fortran: Fix calls to associate name typebound subroutines [PR98897].
> >
> > 2021-02-02  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/98897
> > * match.c (gfc_match_call): Include associate names as possible
> > entities with typebound subroutines. The target needs to be
> > resolved for the type.
> >
> > gcc/testsuite/
> > PR fortran/98897
> > * gfortran.dg/typebound_call_32.f90: New test.
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank
> Thürauf
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Biener via Gcc-patches
On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
> >>
> >> Hi Richard(s),
> >>
> >> I'm just looking to see if I'm going about this the right way, based on 
> >> the discussion we had on IRC. I've managed to hack something together, 
> >> I've attached a (very) WIP patch which gives the correct codegen for the 
> >> testcase in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). 
> >> It would obviously need to support other widening patterns and 
> >> differentiate between big/little endian among other things.
> >>
> >> I added a backend pattern because I wasn't quite clear which changes to 
> >> make in order to allow the existing backend patterns to be used with a 
> >> V8QI, or how to represent V16QI where we don't care about the top/bottom 
> >> 8. I made some attempt in optabs.c, which is in the patch commented out, 
> >> but I'm not sure if I'm going about this the right way.
> >
> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
> > confined to vectorizable_conversion.  The
> > only complication might be sub-optimal code-gen for the vector-vector
> > CTOR compensating for the input
> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>
> Yeah.  I don't really like this because it means that it'll be
> impossible to remove the redundant work in gimple.  The extra elements
> are just a crutch to satisfy the type system.

We can certainly devise a more clever way to represent a paradoxical subreg,
but at least the actual operation (WIDEN_MINUS_LOW) would match what
the hardware can do.  OTOH we could simply accept half of a vector for
the _LOW (little-endial) or _HIGH (big-endian) op and have the expander
deal with subreg frobbing?  Not that I'd like that very much though, even
a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
how to go about endianess here ... the _LOW/_HIGH paints us into some
corner here)

A new IFN (direct optab?) means targets with existing support for _LO/HI
do not automatically benefit which is a shame.

> As far as Joel's patch goes, I was imagining that the new operation
> would be an internal function rather than a tree code.  However,
> if we don't want that, maybe we should just emit separate conversions
> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>
> Thanks,
> Richard


Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-02-02 Thread Qing Zhao via Gcc-patches



> On Feb 2, 2021, at 1:43 AM, Richard Biener  wrote:
> 
> On Mon, 1 Feb 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you suggestion.
>> 
>> And now the routine “bump_map” in 511.povray is like following:
>> ...
>> 
>> # DEBUG BEGIN_STMT
>>  xcoor = 0.0;
>>  ycoor = 0.0;
>>  # DEBUG BEGIN_STMT
>>  index = .DEFERRED_INIT (index, 2);
>>  index2 = .DEFERRED_INIT (index2, 2);
>>  index3 = .DEFERRED_INIT (index3, 2);
>>  # DEBUG BEGIN_STMT
>>  colour1 = .DEFERRED_INIT (colour1, 2);
>>  colour2 = .DEFERRED_INIT (colour2, 2);
>>  colour3 = .DEFERRED_INIT (colour3, 2);
>>  # DEBUG BEGIN_STMT
>>  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>>  # DEBUG p1$0 => p1$0_181
>>  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>>  # DEBUG p1$1 => p1$1_184
>>  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>>  # DEBUG p1$2 => p1$2_172
>>  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>>  # DEBUG p2$0 => p2$0_177
>>  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>>  # DEBUG p2$1 => p2$1_135
>>  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>>  # DEBUG p2$2 => p2$2_137
>>  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>>  # DEBUG p3$0 => p3$0_377
>>  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>>  # DEBUG p3$1 => p3$1_379
>>  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>>  # DEBUG p3$2 => p3$2_381
>> 
>> 
>> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of 
>> the components of p1, p2 and p3. 
>> 
>> With this change, the stack usage numbers with -fstack-usage for approach A, 
>> old approach D and new D with the splitting in SRA are:
>> 
>>  Approach A  Approach D-old  Approach D-new
>> 
>>  272 624 368
>> 
>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA 
>> can reduce the stack usage increase dramatically. 
>> 
>> However, looks like that the stack size for D is still bigger than A. 
>> 
>> I checked the IR again, and found that the alias analysis might be 
>> responsible for this (by compare the image.cpp.026t.ealias for both A and D):
>> 
>> (Due to the call to:
>> 
>>  colour1 = .DEFERRED_INIT (colour1, 2);
>> )
>> 
>> **Approach A:
>> 
>> Points_to analysis:
>> 
>> Constraints:
>> …
>> colour1 = &NULL
>> …
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> ...
>> callarg(53) = &colour1
>> ...
>> _53 = colour1
>> 
>> Points_to sets:
>> …
>> colour1 = { NULL ESCAPED NONLOCAL } same as _53
>> ...
>> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
>> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as 
>> CALLUSED(48)
>> ...
>> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
>> 
>> **Apprach D:
>> 
>> Points_to analysis:
>> 
>> Constraints:
>> …
>> callarg(19) = colour1
>> callarg(19) = &NONLOCAL
>> colour1 = callarg(19) + UNKNOWN
>> colour1 = &NONLOCAL
>> …
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> colour1 = &NONLOCAL
>> …
>> callarg(74) = &colour1
>> callarg(74) = callarg(74) + UNKNOWN
>> callarg(74) = *callarg(74) + UNKNOWN
>> …
>> _53 = colour1
>> _54 = _53
>> _55 = _54 + UNKNOWN
>> _55 = &NONLOCAL
>> _56 = colour1
>> _57 = _56
>> _58 = _57 + UNKNOWN
>> _58 = &NONLOCAL
>> _59 = _55 + UNKNOWN
>> _59 = _58 + UNKNOWN
>> _60 = colour1
>> _61 = _60
>> _62 = _61 + UNKNOWN
>> _62 = &NONLOCAL
>> _63 = _59 + UNKNOWN
>> _63 = _62 + UNKNOWN
>> _64 = _63 + UNKNOWN
>> ..
>> Points_to set:
>> …
>> colour1 = { ESCAPED NONLOCAL } same as callarg(19)
>> …
>> CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
>> CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
>> callarg(71) = { ESCAPED NONLOCAL }
>> callarg(72) = { ESCAPED NONLOCAL }
>> callarg(73) = { ESCAPED NONLOCAL }
>> callarg(74) = { ESCAPED NONLOCAL colour1 }
>> 
>> My question:
>> 
>> Is it possible to adjust alias analysis to resolve this issue?
> 
> You probably want to handle .DEFERRED_INIT in tree-ssa-structalias.c
> find_func_aliases_for_call (it's not a builtin but you can look in
> the respective subroutine for examples).  Specifically you want to
> avoid making anything escaped or clobbered.

Okay, thanks.

Will check on that.

Qing
>> 
> 
> -- 
> Richard Biener mailto:rguent...@suse.de>>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



[PATCH] aarch64: Reimplement vqmovun_high* intrinsics using builtins

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Another transition from inline asm to builtin.
Only 3 intrinsics converted this time but they use the "+w" constraint in their 
inline asm
so are more likely to generate redundant moves so benefit more from 
reimplementation.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (sqxtun2): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_sqxtun2_le): Define.
(aarch64_sqxtun2_be): Likewise.
(aarch64_sqxtun2): Likewise.
* config/aarch64/arm_neon.h (vqmovun_high_s16): Reimplement using 
builtin.
(vqmovun_high_s32): Likewise.
(vqmovun_high_s64): Likewise.
* config/aarch64/iterators.md (UNSPEC_SQXTUN2): Define.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_high-intrinsics.c: Adjust sqxtun2 scan.


vqmovun2.patch
Description: vqmovun2.patch


[PATCH] aarch64: Reimplement vrsqrte* intrinsics with builtins

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Another very simple move from inline asm to builtins.
Only two intrinsics this time.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (ursqrte): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_ursqrte): New pattern.
* config/aarch64/arm_neon.h (vrsqrte_u32): Reimplement using builtin.
(vrsqrteq_u32): Likewise.


ursqrt.patch
Description: ursqrt.patch


Re: [Patch, fortran] PR98897 - Erroneous procedure attribute for associate name

2021-02-02 Thread Tobias Burnus

Hi,

first, I have attached a new example – it works if I move bar/hello up,
but if 'foo' comes first, it fails. I think it is valid.
(ifort 19 also compiles it.)

Sorry for trying hard to find examples where it does not
work – but I have simply the feeling that resolving things
during parsing cannot work in all cases.

On the other hand, I think your patch at least does not break
valid code as I had feared before. :-)
Thus, in that sense it would work for me.

 * * *

Regarding my previous examples, they are invalid because of:

C1105  (R1105) expr shall not be a designator of a procedure pointer
   or a function reference that returns a procedure pointer.

However:

On 02.02.21 16:05, Paul Richard Thomas via Fortran wrote:


In foo.f90, if I remove
  call var(i)  ! { dg-bogus "VARIABLE attribute of 'var' conflicts with
PROCEDURE attribute" }
gfortran correctly complains
23 | associate (var => bar())
   |  1
Error: Selector at (1) has no type


Which is not quite right. bar() has a type – it returns
a procedure pointer; even in cases where gfortran could
know at parse time, it does not diagnose C1105 but shows
an odd error instead.


ifort complains:
../pr98897/foo.f90(11): error #8179: The procedure pointer and the
procedure target must both be functions or subroutines.
 res => double

Okay, we found a bug in ifort. 'double' and 'res' have the same
interface by construction – and both are subroutines.
It seems to be a similar bug to the ifort bug I got before:
When 'double' is parsed, ifort expects that 'precision' follows
('double precision').


The responses from both compilers to foo3.f90 are the same.


(I forgot to comment/remove 'procedure(bar) :: var' when
playing around.) Again, this code violates C1105 – and the
error messages are still odd.


On Tue, 2 Feb 2021 at 13:59, Tobias Burnus  wrote:
On 02.02.21 13:20, Paul Richard Thomas via Gcc-patches wrote:

Regtests with FC33/x86_64 - OK for master (and )?
Fortran: Fix calls to associate name typebound subroutines [PR98897].

2021-02-02  Paul Thomas  

gcc/fortran
PR fortran/98897
* match.c (gfc_match_call): Include associate names as possible
entities with typebound subroutines. The target needs to be
resolved for the type.

gcc/testsuite/
PR fortran/98897
* gfortran.dg/typebound_call_32.f90: New test.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
module m
  implicit none
  type t
  contains
procedure, nopass :: hello
  end type t
contains
  subroutine foo()
associate (var => bar())
  call var%hello()
end associate
  end subroutine foo
  subroutine hello
print *, 'Hello'
  end
  type(t) function bar()
  end
end module m

program test
  use m
  call foo()
end program test


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>> >>
>> >> Hi Richard(s),
>> >>
>> >> I'm just looking to see if I'm going about this the right way, based on 
>> >> the discussion we had on IRC. I've managed to hack something together, 
>> >> I've attached a (very) WIP patch which gives the correct codegen for the 
>> >> testcase in question 
>> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would obviously 
>> >> need to support other widening patterns and differentiate between 
>> >> big/little endian among other things.
>> >>
>> >> I added a backend pattern because I wasn't quite clear which changes to 
>> >> make in order to allow the existing backend patterns to be used with a 
>> >> V8QI, or how to represent V16QI where we don't care about the top/bottom 
>> >> 8. I made some attempt in optabs.c, which is in the patch commented out, 
>> >> but I'm not sure if I'm going about this the right way.
>> >
>> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
>> > confined to vectorizable_conversion.  The
>> > only complication might be sub-optimal code-gen for the vector-vector
>> > CTOR compensating for the input
>> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>>
>> Yeah.  I don't really like this because it means that it'll be
>> impossible to remove the redundant work in gimple.  The extra elements
>> are just a crutch to satisfy the type system.
>
> We can certainly devise a more clever way to represent a paradoxical subreg,
> but at least the actual operation (WIDEN_MINUS_LOW) would match what
> the hardware can do.

At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
operations.  E.g. the low-part intrinsic for signed 8-bit integers is:

   int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);

whereas the high-part intrinsic is:

   int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);

So representing the low part as a 128-bit → 128-bit operation is already
a little artifical.

> OTOH we could simply accept half of a vector for
> the _LOW (little-endial) or _HIGH (big-endian) op and have the expander
> deal with subreg frobbing?  Not that I'd like that very much though, even
> a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
> how to go about endianess here ... the _LOW/_HIGH paints us into some
> corner here)

I think it only makes sense for the low part.  But yeah, I guess that
would work (although I agree it doesn't seem very appealing :-)).

> A new IFN (direct optab?) means targets with existing support for _LO/HI
> do not automatically benefit which is a shame.

In practice this will only affect targets that choose to use mixed
vector sizes, and I think it's reasonable to optimise only for the
case in which such targets support widening conversions.  So what
do you think about the idea of emitting separate conversions and
a normal subtract?  We'd be relying on RTL to fuse them together,
but at least there would be no redundancy to eliminate.

Thanks,
Richard
>
>> As far as Joel's patch goes, I was imagining that the new operation
>> would be an internal function rather than a tree code.  However,
>> if we don't want that, maybe we should just emit separate conversions
>> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>>
>> Thanks,
>> Richard


Re: [PATCH 1/4] openacc: Remove dereference for non-pointer derived-type members

2021-02-02 Thread Tobias Burnus

(added fortran@ at it is about Fortran)

I note that some lines in the testcase are commented (polymorphic
arrays), but I note further that Patch 3/4 enables them.

On 02.02.21 14:28, Julian Brown wrote:

Derived-type members that are themselves derived types are not always
represented as pointers, so it is not always correct to dereference
them. The attached test case fails during gimplification without this
patch.
Tested with offloading to AMD GCN. OK for mainline?

2020-02-02  Julian Brown  

gcc/fortran/
  * trans-openmp.c (gfc_trans_omp_clauses): Handle non-pointer type.

gcc/testsuite/
  * gfortran.dg/goacc/derived-classtypes-1.f95: New test.
---
  gcc/fortran/trans-openmp.c|   7 +-
  .../goacc/derived-classtypes-1.f95| 129 ++
  2 files changed, 134 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 00358ca4d39..8d8da4593c3 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3013,7 +3013,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  }

OMP_CLAUSE_DECL (node)
- = build_fold_indirect_ref (data);
+ = (POINTER_TYPE_P (TREE_TYPE (data))
+? build_fold_indirect_ref (data) : data);
OMP_CLAUSE_SIZE (node) = size;
node2 = build_omp_clause (input_location,
  OMP_CLAUSE_MAP);


I am a bit surprised given that:
  if (sym_attr.pointer || (openacc && sym_attr.allocatable))

I wonder whether we have a size problem:
  data = inner;
  size = TYPE_SIZE_UNIT (TREE_TYPE (inner));

Testing shows that it fails for:
   80 | !$acc enter data copyin(jim%f)
   95 | !$acc enter data copyin(pjim%f)
  110 | !$acc enter data copyin(cjim%f)
  125 | !$acc enter data copyin(acjim%f)

where the component is 'type(aux), pointer :: f'.

As sizeof(void*) and 2*sizeof(int) is the same, it does not
matter in this case, but I fear it might in other cases.
('aux' contains two 'integer', one direct and one by inheriting
it from 'aux1'.)

Otherwise, I think the patch is fine. However, it also might be
useful to add some character tests (kind=1 and kind=4) for
completeness.

NOTE: character(len=:), allocatable/pointer is the most interesting
case, but that fails due to https://gcc.gnu.org/PR95868.
Thus, we can only test for len=.

Tobias


@@ -3021,7 +3022,9 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
 openacc
 ? GOMP_MAP_ATTACH_DETACH
 : GOMP_MAP_ALWAYS_POINTER);
-   OMP_CLAUSE_DECL (node2) = data;
+   OMP_CLAUSE_DECL (node2)
+ = (POINTER_TYPE_P (TREE_TYPE (data))
+? data : build_fold_addr_expr (data));
OMP_CLAUSE_SIZE (node2) = size_int (0);
  }
else
diff --git a/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95 
b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95
new file mode 100644
index 000..e6cf09c6d3c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/derived-classtypes-1.f95
@@ -0,0 +1,129 @@
+type :: type1
+  integer :: a
+end type type1
+
+type :: type2
+  integer, pointer :: b
+end type type2
+
+type :: aux1
+  integer :: y
+end type aux1
+
+type, extends(aux1) :: aux
+  integer :: x
+end type aux
+
+type :: type3
+  class(aux), pointer :: c(:)
+end type type3
+
+type :: type4
+  integer, pointer :: d(:)
+end type type4
+
+type :: type5
+  type(aux) :: e
+end type type5
+
+type :: type6
+  type(aux), pointer :: f
+end type type6
+
+type :: type7
+  class(aux), pointer :: g
+end type type7
+
+type(type1) :: foo
+type(type2) :: bar
+type(type3) :: qux
+type(type4) :: quux
+type(type5) :: fred
+type(type6) :: jim
+type(type7) :: shiela
+
+type(type1), pointer :: pfoo
+type(type2), pointer :: pbar
+type(type3), pointer :: pqux
+type(type4), pointer :: pquux
+type(type5), pointer :: pfred
+type(type6), pointer :: pjim
+type(type7), pointer :: pshiela
+
+class(type1), pointer :: cfoo
+class(type2), pointer :: cbar
+class(type3), pointer :: cqux
+class(type4), pointer :: cquux
+class(type5), pointer :: cfred
+class(type6), pointer :: cjim
+class(type7), pointer :: cshiela
+
+class(type1), allocatable :: acfoo
+class(type2), allocatable :: acbar
+class(type3), allocatable :: acqux
+class(type4), allocatable :: acquux
+class(type5), allocatable :: acfred
+class(type6), allocatable :: acjim
+class(type7), allocatable :: acshiela
+
+!$acc enter data copyin(foo)
+!$ac

[pushed] c++: Member fns and deduction guide rewriting [PR98929]

2021-02-02 Thread Jason Merrill via Gcc-patches
My patch for 96199 had us re-substitute the parameter types of a constructor
in order to rewrite mentions of members into dependent references.  We need
to do that for member functions, too.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/98929
PR c++/96199
* error.c (dump_expr): Ignore dummy object.
* pt.c (tsubst_baselink): Handle dependent scope.

gcc/testsuite/ChangeLog:

PR c++/98929
* g++.dg/cpp1z/class-deduction-decltype1.C: New test.
---
 gcc/cp/error.c|  3 ++-
 gcc/cp/pt.c   | 10 ++
 .../g++.dg/cpp1z/class-deduction-decltype1.C  | 11 +++
 3 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 60f4fd691f3..5213a8030ca 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -2352,7 +2352,8 @@ dump_expr (cxx_pretty_printer *pp, tree t, int flags)
if (INDIRECT_REF_P (ob))
  {
ob = TREE_OPERAND (ob, 0);
-   if (!is_this_parameter (ob))
+   if (!is_this_parameter (ob)
+   && !is_dummy_object (ob))
  {
dump_expr (pp, ob, flags | TFF_EXPR_IN_PARENS);
if (TYPE_REF_P (TREE_TYPE (ob)))
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index db0ff73bdeb..aa1687a9f2a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16196,6 +16196,16 @@ tsubst_baselink (tree baselink, tree object_type,
   if (IDENTIFIER_CONV_OP_P (name))
name = make_conv_op_name (optype);
 
+  /* See maybe_dependent_member_ref.  */
+  if (dependent_scope_p (qualifying_scope))
+   {
+ if (template_id_p)
+   name = build2 (TEMPLATE_ID_EXPR, unknown_type_node, name,
+  template_args);
+ return build_qualified_name (NULL_TREE, qualifying_scope, name,
+  /* ::template */false);
+   }
+
   if (name == complete_dtor_identifier)
/* Treat as-if non-dependent below.  */
dependent_p = false;
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C
new file mode 100644
index 000..b83f7adecc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-decltype1.C
@@ -0,0 +1,11 @@
+// PR c++/98929
+// { dg-do compile { target c++17 } }
+
+template 
+struct A {
+  void foo ();
+  using c = decltype (foo ());
+  A (c);   // { dg-message {decltype \(A::foo} }
+};
+A d;   // { dg-error "deduction failed" }
+// { dg-error "no match" "" { target *-*-* } .-1 }

base-commit: d14cf89b94299d6d66c150fbbb9899a5dd91e7d4
-- 
2.27.0



Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek  
> wrote:
>>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote:
>>> OK, so I'd prefer we simply unset the flag after processing deferred
>>rescan. I clearly misread the function to do that. 
>>
>>This works too, will bootstrap/regtest it now.
>
> OK. 

FWIW, I'm still not convinced we need to defer the rescan here.
AIUI, the concern was that the pass introduces uses of something
and then only later introduces the definition.  But that's OK:
the two things don't have to be added in a set order.

In particular, the normal rescan doesn't propagate the effects
throughout the cfg; it just updates the list of references in the
instruction itself and marks the block as dirty for later processing.

The usual reason for deferring rescans is if the instruction can't
cope with the df_refs changing from under them like that, e.g.
because they're iterating through a list of df_refs, or because
they're using df_refs to represent value numbers.  It shouldn't
be a problem for this pass.

Thanks,
Richard


Re: [RFC] test builtin ratio for loop distribution

2021-02-02 Thread Alexandre Oliva
On Jan 28, 2021, Richard Biener  wrote:

> That would allow turning back the memset into the original loop (but
> with optimal IVs, etc.).

Is this sort of what you had in mind?

I haven't tested the inline expansion of memset much yet; and that of
memcpy, not at all; this really is mainly to check that I understood
what you had in mind before I spend further time polishing it.


test builtin ratio for loop distribution

From: Alexandre Oliva 

The ldist pass turns even very short loops into memset calls.  E.g.,
the TFmode emulation calls end with a loop of up to 3 iterations, to
zero out trailing words, and the loop distribution pass turns them
into calls of the memset builtin.

Though short constant-length memsets are usually dealt with
efficiently, for non-constant-length ones, the options are setmemM, or
a function calls.

RISC-V doesn't have any setmemM pattern, so the loops above end up
"optimized" into memset calls, incurring not only the overhead of an
explicit call, but also discarding the information the compiler has
about the alignment of the destination, and that the length is a
multiple of the word alignment.

This patch adds, to the loop distribution pass, the ability to perform
inline expansion of bounded variable-length memset and memcpy in ways
that take advantage of known alignments and size's factors, when
preexisting *_RATIO macros suggest the inline expansion is
advantageous.


for  gcc/ChangeLog

* tree-loop-distribution.c: Include builtins.h.
(generate_memset_builtin): Introduce optional inline expansion
of bounded variable-sized memset calls.
(generate_memcpy_builtin): Likewise for memcpy only.
(loop_distribution::execute): Fix loop structure afterwards.
---
 gcc/tree-loop-distribution.c |  280 ++
 1 file changed, 279 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723fb6..3be7a4c1ac281 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 #include "gimple-fold.h"
+#include "builtins.h"
 
 
 #define MAX_DATAREFS_NUM \
@@ -1148,6 +1149,23 @@ generate_memset_builtin (class loop *loop, partition 
*partition)
   /* The new statements will be placed before LOOP.  */
   gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
 
+  /* Compute builtin->size range and ctz before it's gimplified; 
sub-expressions
+ thereof are rewritten in place, so they end up referencing SSA_NAMEs for
+ which we don't have VR info.  */
+  unsigned align = get_pointer_alignment (builtin->dst_base) / BITS_PER_UNIT;
+  unsigned alctz, szctz, xbits;
+  wide_int szmin, szmax;
+  value_range_kind vrk;
+  if (align)
+{
+  alctz = wi::ctz (align);
+  szctz = MIN (tree_ctz (builtin->size), alctz);
+  xbits = alctz - szctz;
+  vrk = determine_value_range (builtin->size, &szmin, &szmax);
+  if (szmin == szmax)
+   align = 0;
+}
+
   nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
   nb_bytes = force_gimple_operand_gsi (&gsi, nb_bytes, true, NULL_TREE,
   false, GSI_CONTINUE_LINKING);
@@ -1172,6 +1190,127 @@ generate_memset_builtin (class loop *loop, partition 
*partition)
   val = tem;
 }
 
+  unsigned HOST_WIDE_INT ratio;
+  if (integer_zerop (val))
+ratio = CLEAR_RATIO (optimize_loop_for_speed_p (loop));
+  else
+ratio = SET_RATIO (optimize_loop_for_speed_p (loop));
+
+  /* Compare the ratio with the number of (most-aligned) required stores needed
+ for szmax bytes, with the ratio: a loop that iterates up to szmax >> 
alctz,
+ and then one conditional store for each extra bit of alignment that the
+ destination has over the size.  */
+  if (align && vrk == VR_RANGE
+  && wi::ltu_p (wi::rshift (szmax, alctz, UNSIGNED) + xbits, ratio))
+{
+  gimple *stmt;
+  tree bytes = create_tmp_var_raw (size_type_node, "ldistbytes");
+  tree ptr = create_tmp_var_raw (build_pointer_type (char_type_node),
+"ldistptr");
+  tree stval = force_gimple_operand_gsi (&gsi,
+fold_convert
+(unsigned_char_type_node, val),
+true, NULL_TREE, false,
+GSI_CONTINUE_LINKING);
+
+  for (unsigned i = 1; i != alctz; i *= 2)
+   {
+ unsigned bits = i * 2 * BITS_PER_UNIT;
+ tree type = build_nonstandard_integer_type (bits, true);
+ stval = fold_convert (type, stval);
+ tree op2 = fold_build2_loc (partition->loc, LSHIFT_EXPR,
+ TREE_TYPE (stval), stval,
+ build_int_cst (type, i * BITS_PER_UNIT));
+ stval 

Re: [Patch] Fortran: Fix Array dependency with local coarrays [PR98913] (was: Re: A question about coarrays and -Warray-temporaries)

2021-02-02 Thread Tobias Burnus

Hi Thomas,

On 02.02.21 15:54, Thomas Koenig wrote:

So, while your patch is OK, I think a simple removal of the test
is also OK.
Take your pick :-)


I think I will do a combination: If 'identical' is true, I think I
cannot remove it. If it is false, it can be identical or nonoverlapping
– which makes sense.

Unless there are further comments, I will commit it later.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Fortran: Fix Array dependency with local coarrays [PR98913]

gcc/fortran/ChangeLog:

	PR fortran/98913
	* dependency.c (gfc_dep_resolver): Treat local access
	to coarrays like any array access in dependency analysis.

gcc/testsuite/ChangeLog:

	PR fortran/98913
	* gfortran.dg/coarray/array_temporary.f90: New test.

 gcc/fortran/dependency.c   | 15 -
 .../gfortran.dg/coarray/array_temporary.f90| 74 ++
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/dependency.c b/gcc/fortran/dependency.c
index c9baca80cbc..58593ba535b 100644
--- a/gcc/fortran/dependency.c
+++ b/gcc/fortran/dependency.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dependency.h"
 #include "constructor.h"
 #include "arith.h"
+#include "options.h"
 
 /* static declarations */
 /* Enums  */
@@ -2142,9 +2143,17 @@ gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse,
 	  return (fin_dep == GFC_DEP_OVERLAP) ? 1 : 0;
 
 	case REF_ARRAY:
-
-	  /* For now, treat all coarrays as dangerous.  */
-	  if (lref->u.ar.codimen || rref->u.ar.codimen)
+	  /* Coarrays: If there is a coindex, either the image differs and there
+	 is no overlap or the image is the same - then the normal analysis
+	 applies.  Hence, return early only if 'identical' is required and
+	 either ref is coindexed and more than one image can exist.  */
+	  if (identical && flag_coarray != GFC_FCOARRAY_SINGLE
+	  && ((lref->u.ar.codimen
+		   && lref->u.ar.dimen_type[lref->u.ar.dimen]
+		  != DIMEN_THIS_IMAGE)
+		  || (rref->u.ar.codimen
+		  && lref->u.ar.dimen_type[lref->u.ar.dimen]
+			 != DIMEN_THIS_IMAGE)))
 	return 1;
 
 	  if (ref_same_as_full_array (lref, rref))
diff --git a/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90 b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90
new file mode 100644
index 000..86460a7c282
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/array_temporary.f90
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-additional-options "-Warray-temporaries" }
+!
+! PR fortran/98913
+!
+! Contributed by Jorge D'Elia
+!
+! Did create an array temporary for local access to coarray
+! (but not for identical noncoarray use).
+!
+
+program test
+  implicit none
+  integer, parameter :: iin = kind (1) 
+  integer, parameter :: idp = kind (1.0d0) 
+  real(kind=idp), allocatable :: AA (:,:)[:]
+  real(kind=idp), allocatable :: BB (:,:)
+  real(kind=idp), allocatable :: UU (:)
+  integer (kind=iin) :: nn, n1, n2
+  integer (kind=iin) :: j, k, k1
+  !
+  nn =  5
+  n1 =  1
+  n2 = 10
+  !
+  allocate (AA (1:nn,n1:n2)[*])
+  allocate (BB (1:nn,n1:n2))
+  allocate (UU (1:nn))
+  !
+  k  = 1
+  k1 = k + 1
+  !
+  AA = 1.0_idp
+  BB = 1.0_idp
+  UU = 2.0_idp
+
+  ! AA - coarrays
+  ! No temporary needed:
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1-1:nn-1,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1:nn,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+
+  ! But:
+  do  j = 1, nn
+AA (k1:nn,j) = AA (k1-1:nn-1,j) - UU (k1:nn) * AA (k,j) - UU(k) * AA (k1+1:nn+1,j)  ! { dg-warning "Creating array temporary" }
+  end do
+
+  ! BB - no coarrays
+  ! No temporary needed:
+  do  j = 1, nn
+BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+  do  j = 1, nn
+BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j) - UU(k) * BB (k1-1:nn-1,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+  do  j = 1, nn
+BB (k1:nn,j) = BB (k1:nn,j) - UU (k1:nn) * BB (k,j) - UU(k) * BB (k1+1:nn+1,j)  ! { dg-bogus "Creating array temporary" }
+  end do
+
+  ! But:
+  do  j = 1, nn
+BB (k1:nn,j) = BB (k1-1:nn-1,j) - UU (k1:nn) * BB (k,j) - UU(k) * BB (k1+1:nn+1,j)  ! { dg-warning "Creating array temporary" }
+  end do
+
+  deallocate (AA)
+  deallocate (BB)
+  deallocate (UU)
+end program test


Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2021-02-02 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This is a gcc-9 backport of the PR97528 fix that has been applied to 
trunk and gcc-10.

Bootstraped on arm-linux-gnueabihf and regression tested.

OK for gcc-9 branch?

2021-02-02  Andre Vieira  

    Backport from mainline
    2020-11-20  Jakub Jelinek  

    PR target/97528
    * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require
    first POST_MODIFY operand is a REG and is equal to the first operand
    of PLUS.

    * gcc.target/arm/pr97528.c: New test.

On 20/11/2020 11:25, Kyrylo Tkachov via Gcc-patches wrote:



-Original Message-
From: Jakub Jelinek 
Sent: 19 November 2020 18:57
To: Richard Earnshaw ; Ramana
Radhakrishnan ; Kyrylo Tkachov

Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

Hi!

The documentation for POST_MODIFY says:
Currently, the compiler can only handle second operands of the
form (plus (reg) (reg)) and (plus (reg) (const_int)), where
the first operand of the PLUS has to be the same register as
the first operand of the *_MODIFY.
The following testcase ICEs, because combine just attempts to simplify
things and ends up with
(post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
but the target predicates accept it, because they only verify
that POST_MODIFY's second operand is PLUS and the second operand
of the PLUS is a REG.

The following patch fixes this by performing further verification that
the POST_MODIFY is in the form it should be.

Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk
and release branches after a while?

Ok.
Thanks,
Kyrill


2020-11-19  Jakub Jelinek  

PR target/97528
* config/arm/arm.c (neon_vector_mem_operand): For
POST_MODIFY, require
first POST_MODIFY operand is a REG and is equal to the first operand
of PLUS.

* gcc.target/arm/pr97528.c: New test.

--- gcc/config/arm/arm.c.jj 2020-11-13 19:00:46.729620560 +0100
+++ gcc/config/arm/arm.c2020-11-18 17:05:44.656867343 +0100
@@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ
/* Allow post-increment by register for VLDn */
if (type == 2 && GET_CODE (ind) == POST_MODIFY
&& GET_CODE (XEXP (ind, 1)) == PLUS
-  && REG_P (XEXP (XEXP (ind, 1), 1)))
+  && REG_P (XEXP (XEXP (ind, 1), 1))
+  && REG_P (XEXP (ind, 0))
+  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
   return true;

/* Match:
--- gcc/testsuite/gcc.target/arm/pr97528.c.jj   2020-11-18
17:09:58.195053288 +0100
+++ gcc/testsuite/gcc.target/arm/pr97528.c  2020-11-18
17:09:47.839168237 +0100
@@ -0,0 +1,28 @@
+/* PR target/97528 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O1" }  */
+/* { dg-add-options arm_neon } */
+
+#include 
+
+typedef __simd64_int16_t T;
+typedef __simd64_uint16_t U;
+unsigned short c;
+int d;
+U e;
+
+void
+foo (void)
+{
+  unsigned short *dst = &c;
+  int g = d, b = 4;
+  U dc = e;
+  for (int h = 0; h < b; h++)
+{
+  unsigned short *i = dst;
+  U j = dc;
+  vst1_s16 ((int16_t *) i, (T) j);
+  dst += g;
+}
+}


Jakub
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
04edd637d43198ad801bb5ada8f1807faae7001e..4679da75dd823778d5a3e37c497ee10793e9c7d7
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12730,7 +12730,9 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
   /* Allow post-increment by register for VLDn */
   if (type == 2 && GET_CODE (ind) == POST_MODIFY
   && GET_CODE (XEXP (ind, 1)) == PLUS
-  && REG_P (XEXP (XEXP (ind, 1), 1)))
+  && REG_P (XEXP (XEXP (ind, 1), 1))
+  && REG_P (XEXP (ind, 0))
+  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
  return true;
 
   /* Match:
diff --git a/gcc/testsuite/gcc.target/arm/pr97528.c 
b/gcc/testsuite/gcc.target/arm/pr97528.c
new file mode 100644
index 
..6cc59f2158c5f8c8dd78e5083ca7ebc4e5f63a44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr97528.c
@@ -0,0 +1,28 @@
+/* PR target/97528 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O1" }  */
+/* { dg-add-options arm_neon } */
+
+#include 
+
+typedef __simd64_int16_t T;
+typedef __simd64_uint16_t U;
+unsigned short c;
+int d;
+U e;
+
+void
+foo (void)
+{
+  unsigned short *dst = &c;
+  int g = d, b = 4;
+  U dc = e;
+  for (int h = 0; h < b; h++)
+{
+  unsigned short *i = dst;
+  U j = dc;
+  vst1_s16 ((int16_t *) i, (T) j);
+  dst += g;
+}
+}


Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake

2021-02-02 Thread Richard Biener
On Tue, 2 Feb 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek  
> > wrote:
> >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote:
> >>> OK, so I'd prefer we simply unset the flag after processing deferred
> >>rescan. I clearly misread the function to do that. 
> >>
> >>This works too, will bootstrap/regtest it now.
> >
> > OK. 
> 
> FWIW, I'm still not convinced we need to defer the rescan here.
> AIUI, the concern was that the pass introduces uses of something
> and then only later introduces the definition.  But that's OK:
> the two things don't have to be added in a set order.
> 
> In particular, the normal rescan doesn't propagate the effects
> throughout the cfg; it just updates the list of references in the
> instruction itself and marks the block as dirty for later processing.
> 
> The usual reason for deferring rescans is if the instruction can't
> cope with the df_refs changing from under them like that, e.g.
> because they're iterating through a list of df_refs, or because
> they're using df_refs to represent value numbers.  It shouldn't
> be a problem for this pass.

I guess that's correct but doing the scan on the "final" insn
feels much cleaner - given all the confusion it also looks safer
(doing defered scans as before) since I intend to backport this.
But if you feel strongly about this we can of course change it.

Thanks,
Richard.


[PATCH v7] Practical improvement to libgcc complex divide

2021-02-02 Thread Patrick McGehearty via Gcc-patches
Changes in this version from Version 6:
Updated copyrights for following three files to -2021.
gcc/c-family/c-cppbuiltin.c
Moved code for setting LIBGCC modename to FUNC_EXT section.
Added code to set modename for any additional modes such as
FLT128 or FLT64X.

libgcc/libgcc2.c
Changed RMIN2 & RMINSCAL for XF and TF modes to use the matching
LIBGCC_?F_EPSILON instead of LIBGCC_DF_EPSILON

libgcc/config/rs6000/_divkc3.c
Changed RMIN2 & RMINSCAL for TF mode to use the matching
LIBGCC_TF_EPSILON instead of LIBGCC_DF_EPSILON

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over
a restricted range and the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equals zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug print out to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from check for zero to check for subnormal reduced the 16
bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).

Continued examination of remaining errors and their intermediate
computations led to the various tests of input value tests and scaling
to avoid under/overflow. The current patch does not handle some of the
rare and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Further investigation showed changing the half precision algorithm
to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no
loss of precision and modest improvement in performance.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.  This
use is made generic by defining appropriate __LIBGCC2_* macros in
c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents
small enough to allow shifting any subnormal values to normal values
all input values could be scaled up without risking overflow. That
gained a clear improvement in accuracy. Similarly, when either
numerator was subnormal and the other numerator and both denominator
values were not too large, scaling could be used to reduce r

RE: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2021-02-02 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: 02 February 2021 17:27
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; ja...@redhat.com
> Subject: Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
> 
> Hi,
> 
> This is a gcc-9 backport of the PR97528 fix that has been applied to
> trunk and gcc-10.
> Bootstraped on arm-linux-gnueabihf and regression tested.
> 
> OK for gcc-9 branch?

Ok.
Thanks,
Kyrill

> 
> 2021-02-02  Andre Vieira  
> 
>      Backport from mainline
>      2020-11-20  Jakub Jelinek  
> 
>      PR target/97528
>      * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY,
> require
>      first POST_MODIFY operand is a REG and is equal to the first operand
>      of PLUS.
> 
>      * gcc.target/arm/pr97528.c: New test.
> 
> On 20/11/2020 11:25, Kyrylo Tkachov via Gcc-patches wrote:
> >
> >> -Original Message-
> >> From: Jakub Jelinek 
> >> Sent: 19 November 2020 18:57
> >> To: Richard Earnshaw ; Ramana
> >> Radhakrishnan ; Kyrylo Tkachov
> >> 
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
> >>
> >> Hi!
> >>
> >> The documentation for POST_MODIFY says:
> >> Currently, the compiler can only handle second operands of the
> >> form (plus (reg) (reg)) and (plus (reg) (const_int)), where
> >> the first operand of the PLUS has to be the same register as
> >> the first operand of the *_MODIFY.
> >> The following testcase ICEs, because combine just attempts to simplify
> >> things and ends up with
> >> (post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
> >> but the target predicates accept it, because they only verify
> >> that POST_MODIFY's second operand is PLUS and the second operand
> >> of the PLUS is a REG.
> >>
> >> The following patch fixes this by performing further verification that
> >> the POST_MODIFY is in the form it should be.
> >>
> >> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk
> >> and release branches after a while?
> > Ok.
> > Thanks,
> > Kyrill
> >
> >> 2020-11-19  Jakub Jelinek  
> >>
> >>PR target/97528
> >>* config/arm/arm.c (neon_vector_mem_operand): For
> >> POST_MODIFY, require
> >>first POST_MODIFY operand is a REG and is equal to the first operand
> >>of PLUS.
> >>
> >>* gcc.target/arm/pr97528.c: New test.
> >>
> >> --- gcc/config/arm/arm.c.jj2020-11-13 19:00:46.729620560 +0100
> >> +++ gcc/config/arm/arm.c   2020-11-18 17:05:44.656867343 +0100
> >> @@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ
> >> /* Allow post-increment by register for VLDn */
> >> if (type == 2 && GET_CODE (ind) == POST_MODIFY
> >> && GET_CODE (XEXP (ind, 1)) == PLUS
> >> -  && REG_P (XEXP (XEXP (ind, 1), 1)))
> >> +  && REG_P (XEXP (XEXP (ind, 1), 1))
> >> +  && REG_P (XEXP (ind, 0))
> >> +  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
> >>return true;
> >>
> >> /* Match:
> >> --- gcc/testsuite/gcc.target/arm/pr97528.c.jj  2020-11-18
> >> 17:09:58.195053288 +0100
> >> +++ gcc/testsuite/gcc.target/arm/pr97528.c 2020-11-18
> >> 17:09:47.839168237 +0100
> >> @@ -0,0 +1,28 @@
> >> +/* PR target/97528 */
> >> +/* { dg-do compile } */
> >> +/* { dg-require-effective-target arm_neon_ok } */
> >> +/* { dg-options "-O1" }  */
> >> +/* { dg-add-options arm_neon } */
> >> +
> >> +#include 
> >> +
> >> +typedef __simd64_int16_t T;
> >> +typedef __simd64_uint16_t U;
> >> +unsigned short c;
> >> +int d;
> >> +U e;
> >> +
> >> +void
> >> +foo (void)
> >> +{
> >> +  unsigned short *dst = &c;
> >> +  int g = d, b = 4;
> >> +  U dc = e;
> >> +  for (int h = 0; h < b; h++)
> >> +{
> >> +  unsigned short *i = dst;
> >> +  U j = dc;
> >> +  vst1_s16 ((int16_t *) i, (T) j);
> >> +  dst += g;
> >> +}
> >> +}
> >>
> >>
> >>Jakub


[PATCH] fix memory leaks

2021-02-02 Thread Richard Biener
This fixes various vec<> memory leaks as discovered compiling 521.wrf_r.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-02-02  Richard Biener  

* gimple-loop-interchange.cc (prepare_data_references):
Release vectors.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Likewise.
* tree-ssa-loop-im.c (hoist_memory_references): Likewise.
* tree-vect-stmts.c (vectorizable_condition): Do not
allocate vectors.
(vectorizable_comparison): Likewise.
---
 gcc/gimple-loop-interchange.cc | 10 --
 gcc/gimple-loop-jam.c  |  6 ++
 gcc/tree-ssa-loop-im.c |  1 +
 gcc/tree-vect-stmts.c  | 13 -
 4 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index a1dadd8e5d4..f45b9364644 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -1940,7 +1940,10 @@ prepare_data_references (class loop *loop, 
vec *datarefs)
   delete bb_refs;
 }
   else if (bb_refs->is_empty ())
-   delete bb_refs;
+   {
+ bb_refs->release ();
+ delete bb_refs;
+   }
   else
bb->aux = bb_refs;
 }
@@ -1954,7 +1957,10 @@ prepare_data_references (class loop *loop, 
vec *datarefs)
 
   bb_refs = (vec *) bb->aux;
   if (loop_nest && flow_bb_inside_loop_p (loop_nest, bb))
-   datarefs->safe_splice (*bb_refs);
+   {
+ datarefs->safe_splice (*bb_refs);
+ bb_refs->release ();
+   }
   else
free_data_refs (*bb_refs);
 
diff --git a/gcc/gimple-loop-jam.c b/gcc/gimple-loop-jam.c
index 485f5a9bf61..69dbaeb6cb9 100644
--- a/gcc/gimple-loop-jam.c
+++ b/gcc/gimple-loop-jam.c
@@ -505,15 +505,13 @@ tree_loop_unroll_and_jam (void)
   if (!unroll_jam_possible_p (outer, loop))
continue;
 
-  vec datarefs;
-  vec dependences;
+  vec datarefs = vNULL;
+  vec dependences = vNULL;
   unsigned unroll_factor, profit_unroll, removed;
   class tree_niter_desc desc;
   bool unroll = false;
 
   auto_vec loop_nest;
-  dependences.create (10);
-  datarefs.create (10);
   if (!compute_data_dependences_for_loop (outer, true, &loop_nest,
  &datarefs, &dependences))
{
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 445b93f7979..8034cf68d27 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -2508,6 +2508,7 @@ hoist_memory_references (class loop *loop, bitmap 
mem_refs,
   if (res != 1)
{
  bitmap_copy (refs_not_supported, mem_refs);
+ seq.release ();
  break;
}
   sms.safe_push (std::make_pair (e, seq));
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f180ced3124..5eb7b2d1d6e 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10085,14 +10085,6 @@ vectorizable_condition (vec_info *vinfo,
 
   /* Transform.  */
 
-  if (!slp_node)
-{
-  vec_oprnds0.create (1);
-  vec_oprnds1.create (1);
-  vec_oprnds2.create (1);
-  vec_oprnds3.create (1);
-}
-
   /* Handle def.  */
   scalar_dest = gimple_assign_lhs (stmt);
   if (reduction_type != EXTRACT_LAST_REDUCTION)
@@ -10480,11 +10472,6 @@ vectorizable_comparison (vec_info *vinfo,
 }
 
   /* Transform.  */
-  if (!slp_node)
-{
-  vec_oprnds0.create (1);
-  vec_oprnds1.create (1);
-}
 
   /* Handle def.  */
   lhs = gimple_assign_lhs (stmt);
-- 
2.26.2


[PATCH]middle-end slp: Split out patterns away from using SLP_ONLY into their own flag

2021-02-02 Thread Tamar Christina via Gcc-patches
Hi All,

Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way
to dissolve the SLP only patterns during SLP cancellation.  However it seems
like the semantics for STMT_VINFO_SLP_VECT_ONLY are slightly different than what
I expected.

Namely that the non-SLP path can still use a statement marked
STMT_VINFO_SLP_VECT_ONLY.  One such example is masked loads which are used both
in the SLP and non-SLP path.

To fix this I now introduce a new flag STMT_VINFO_SLP_VECT_ONLY_PATTERN which is
used only by the pattern matcher.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/98928
* tree-vect-loop.c (vect_analyze_loop_2): Change
STMT_VINFO_SLP_VECT_ONLY to STMT_VINFO_SLP_VECT_ONLY_PATTERN.
* tree-vect-slp-patterns.c (complex_pattern::build): Likewise.
* tree-vectorizer.h (STMT_VINFO_SLP_VECT_ONLY_PATTERN): New.
(class _stmt_vec_info): Add slp_vect_pattern_only_p.

gcc/testsuite/ChangeLog:

PR tree-optimization/98928
* gcc.target/i386/pr98928.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/i386/pr98928.c 
b/gcc/testsuite/gcc.target/i386/pr98928.c
new file mode 100644
index 
..9503b579a88d95c427d3e3e5a71565b0c048c125
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98928.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast -march=skylake-avx512 -fwhole-program -w" } 
*/
+
+typedef float MagickRealType;
+typedef short Quantum;
+float InterpolateMagickPixelPacket_alpha[1];
+int InterpolateMagickPixelPacket_i;
+
+void InterpolateMagickPixelPacket();
+
+void main() { InterpolateMagickPixelPacket(); }
+
+typedef struct {
+  MagickRealType red, green, blue, opacity, index;
+} MagickPixelPacket;
+typedef struct {
+  Quantum blue, green, red, opacity;
+} PixelPacket;
+struct _Image {
+  int colorspace;
+  int matte;
+} GetMagickPixelPacket(MagickPixelPacket *pixel) {
+  pixel->red = pixel->green = pixel->blue = 0.0;
+}
+int AlphaBlendMagickPixelPacket(struct _Image *image, PixelPacket *color,
+Quantum *indexes, MagickPixelPacket *pixel,
+MagickRealType *alpha) {
+  if (image->matte) {
+*alpha = pixel->red = pixel->green = pixel->blue = pixel->opacity =
+color->opacity;
+pixel->index = 0.0;
+if (image->colorspace)
+  pixel->index = *indexes;
+return 0;
+  }
+  *alpha = 1.0 / 0.2;
+  pixel->red = *alpha * color->red;
+  pixel->green = *alpha * color->green;
+  pixel->blue = *alpha * color->blue;
+  pixel->opacity = pixel->index = 0.0;
+  if (image->colorspace && indexes)
+pixel->index = *indexes;
+}
+MagickPixelPacket InterpolateMagickPixelPacket_pixels[1];
+PixelPacket InterpolateMagickPixelPacket_p;
+
+void
+InterpolateMagickPixelPacket(struct _Image *image) {
+  Quantum *indexes;
+  for (; InterpolateMagickPixelPacket_i; InterpolateMagickPixelPacket_i++) {
+GetMagickPixelPacket(InterpolateMagickPixelPacket_pixels +
+ InterpolateMagickPixelPacket_i);
+AlphaBlendMagickPixelPacket(
+image, &InterpolateMagickPixelPacket_p + 
InterpolateMagickPixelPacket_i,
+indexes + InterpolateMagickPixelPacket_i,
+InterpolateMagickPixelPacket_pixels + InterpolateMagickPixelPacket_i,
+InterpolateMagickPixelPacket_alpha + InterpolateMagickPixelPacket_i);
+  }
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 
acfd1952e3b803ea79cf51433101466743c9793e..200ed27b32ef4aa54c6783afa1864924b6f55582
 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2700,7 +2700,7 @@ again:
{
  stmt_vec_info pattern_stmt_info
= STMT_VINFO_RELATED_STMT (stmt_info);
- if (STMT_VINFO_SLP_VECT_ONLY (pattern_stmt_info))
+ if (STMT_VINFO_SLP_VECT_ONLY_PATTERN (pattern_stmt_info))
STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
 
  gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 
d25560fab97bb852e949884850d51c6148b14a68..f0817da9f622d22e3df2e30410d1cf610b4ffa1d
 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -599,7 +599,7 @@ complex_pattern::build (vec_info *vinfo)
 the call there.  */
   vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
   SLP_TREE_VECTYPE (node));
-  STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
+  STMT_VINFO_SLP_VECT_ONLY_PATTERN (call_stmt_info) = true;
 
   /* Since we are replacing all the statements in the group with the same
 thing it doesn't really matter.  So just set it every time a new stmt
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 
f8bf4488d0ea32e7909f6be2bf4e7cdaee4f55fe..e

Re: [PATCH] expand: Expand x / y * y as x - x % y if the latter is cheaper [PR96696]

2021-02-02 Thread Jeff Law via Gcc-patches



On 1/16/21 11:13 AM, Jakub Jelinek wrote:
> Hi!
>
> The following patch tests both x / y * y and x - x % y expansion for the
> former GIMPLE code and chooses the cheaper of those sequences.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-01-16  Jakub Jelinek  
>
>   PR tree-optimization/96696
>   * expr.c (expand_expr_divmod): New function.
>   (expand_expr_real_2) : Use it for truncations and
>   divisions.  Formatting fixes.
>   : Optimize x / y * y as x - x % y if the latter is
>   cheaper.
>
>   * gcc.target/i386/pr96696.c: New test.
Given this is strictly a missed optimization, I'd lean towards deferring
to gcc-12 at this point.  Thoughts?

jeff



Re: [PATCH] match.pd: Add some __builtin_ctz (x) cmp cst simplifications [PR95527]

2021-02-02 Thread Jeff Law via Gcc-patches



On 1/16/21 11:19 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by
> testing if the low 3 bits are zero, etc.).
>
> In addition, I've noticed that in the CLZ case, the
> #ifdef CLZ_DEFINED_VALUE_AT_ZERO don't really work as intended, they
> are evaluated during genmatch and the macro is not defined then
> (but, because of the missing tm.h includes it isn't defined in
> gimple-match.c or generic-match.c either).  And when tm.h is included,
> defaults.h is included which defines a fallback version of that macro.
>
> For GCC 12, I wonder if it wouldn't be better to say in addition to 
> __builtin_c[lt]z*
> is always UB at zero that it would be undefined for .C[LT]Z ifn too if it
> has just one operand and use a second operand to be the constant we expect
> at zero.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-01-16  Jakub Jelinek  
>
>   PR tree-optimization/95527
>   * generic-match-head.c: Include tm.h.
>   * gimple-match-head.c: Include tm.h.
>   * match.pd (CLZ == INTEGER_CST): Don't use
>   #ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO
>   if clz == CFN_CLZ.  Add missing val declaration.
>   (CTZ cmp CST): New simplifications.
>
>   * gcc.dg/tree-ssa/pr95527-2.c: New test.
Similarly.  I'd lean towards deferring to gcc-12.
jeff



Re: [PATCH] match.pd: Add some __builtin_ctz (x) cmp cst simplifications [PR95527]

2021-02-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 02, 2021 at 11:39:30AM -0700, Jeff Law wrote:
> > This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by
> > testing if the low 3 bits are zero, etc.).
> >
> > In addition, I've noticed that in the CLZ case, the
> > #ifdef CLZ_DEFINED_VALUE_AT_ZERO don't really work as intended, they
> > are evaluated during genmatch and the macro is not defined then
> > (but, because of the missing tm.h includes it isn't defined in
> > gimple-match.c or generic-match.c either).  And when tm.h is included,
> > defaults.h is included which defines a fallback version of that macro.
> >
> > For GCC 12, I wonder if it wouldn't be better to say in addition to 
> > __builtin_c[lt]z*
> > is always UB at zero that it would be undefined for .C[LT]Z ifn too if it
> > has just one operand and use a second operand to be the constant we expect
> > at zero.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2021-01-16  Jakub Jelinek  
> >
> > PR tree-optimization/95527
> > * generic-match-head.c: Include tm.h.
> > * gimple-match-head.c: Include tm.h.
> > * match.pd (CLZ == INTEGER_CST): Don't use
> > #ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO
> > if clz == CFN_CLZ.  Add missing val declaration.
> > (CTZ cmp CST): New simplifications.
> >
> > * gcc.dg/tree-ssa/pr95527-2.c: New test.
> Similarly.  I'd lean towards deferring to gcc-12.

Ok, will repost at the start of stage1 then (for both).

Jakub



[PATCH] Add test-case.

2021-02-02 Thread Martin Liška

Pushed to master.

gcc/testsuite/ChangeLog:

PR target/97510
* gcc.target/i386/pr97510.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr97510.c | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr97510.c

diff --git a/gcc/testsuite/gcc.target/i386/pr97510.c 
b/gcc/testsuite/gcc.target/i386/pr97510.c
new file mode 100644
index 000..4f967b59c9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97510.c
@@ -0,0 +1,18 @@
+/* PR target/97510 */
+/* { dg-do compile } */
+/* { dg-options "-Os -fexcess-precision=standard -mfpmath=387 
-funsafe-math-optimizations" } */
+
+float compute_rsqrt_ref_r_0;
+
+__attribute__((optimize(1)))
+void compute_rsqrt_ref() {
+  compute_rsqrt_ref_r_0 = 1.0 / 0.0;
+}
+
+int icompute_rsqrt_ref(float *);
+
+void test_512() {
+  float in[0];
+  for (int i;;)
+in[i] = 8.6756 * icompute_rsqrt_ref(in);
+}
--
2.30.0



[PATCH] release pointer_query cache when done (PR 98937)

2021-02-02 Thread Martin Sebor via Gcc-patches

The strlen pass initializes its pointer_query member object with
a cache consisting of a couple of vec's.  Because vec doesn't
implement RAII its memory must be explicitly released to avoid
memory leaks.  The attached patch adds a dtor to
the strlen_dom_walker to do that.

Tested on x86_64-linux and by verifying that the cache leaks are
gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind.

I'll plan to commit this change as "obvious" tomorrow unless there
are suggestions for changes.

Martin

PS Valgrind shows a fair number of leaks even with the patch but
none of them due to the pointer_query cache.
PR tree-optimization/98937 - pointer_query cache leaks

gcc/ChangeLog:

	PR tree-optimization/98937
	* tree-ssa-strlen.c (strlen_dom_walker::~strlen_dom_walker): Define.
	Flush pointer_query cache.

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index c6f74051607..8912a113de9 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -5491,6 +5491,8 @@ public:
 m_cleanup_cfg (false)
   { }
 
+  ~strlen_dom_walker ();
+
   virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
 
@@ -5508,6 +5510,13 @@ public:
   bool m_cleanup_cfg;
 };
 
+/* Release pointer_query cache.  */
+
+strlen_dom_walker::~strlen_dom_walker ()
+{
+  ptr_qry.flush_cache ();
+}
+
 /* Callback for walk_dominator_tree.  Attempt to optimize various
string ops by remembering string lengths pointed by pointer SSA_NAMEs.  */
 


gotools patch committed: Test file embedding

2021-02-02 Thread Ian Lance Taylor via Gcc-patches
This patch to the gotools Makefile runs "go test
embed/internal/embedtest" using the newly built go tool, to test that
file embedding works.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

* Makefile.am (check-embed): New target.
(check): Depend on check-embed.  Examine embed-testlog.
(mostlyclean-local): Add check-embed-dir.
(.PHONY): Add check-embed.
* Makefile.in: Regenerate.
345484a6755260cd6a01c17ac5547da669422b20
diff --git a/gotools/Makefile.am b/gotools/Makefile.am
index 3bbccb96c28..6576fe77b85 100644
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -102,7 +102,7 @@ MOSTLYCLEANFILES = \
 mostlyclean-local:
if test -d check-go-dir; then chmod -R u+w check-go-dir; fi
rm -rf check-go-dir cgo-test-dir carchive-test-dir \
-   check-vet-dir gocache-test
+   check-vet-dir check-embed-dir gocache-test
 
 if NATIVE
 
@@ -307,11 +307,28 @@ check-vet: go$(EXEEXT) $(noinst_PROGRAMS) check-head 
check-gccgo check-gcc
(cd check-vet-dir/src/cmd/vet && $(abs_builddir)/go$(EXEEXT) test 
-test.short -test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v) >> cmd_vet-testlog 
2>&1 || echo "--- $${fl}: go test cmd/vet (0.00s)" >> cmd_vet-testlog
grep '^--- ' cmd_vet-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' | 
sort -k 2
 
+# check-embed runs `go test embed/internal/embedtest` in our environment.
+check-embed: go$(EXEEXT) $(noinst_PROGRAMS) check-head check-gccgo check-gcc
+   rm -rf check-embed-dir embed-testlog
+   $(MKDIR_P) check-embed-dir/src/embed/internal
+   cp $(libgosrcdir)/go.mod check-embed-dir/src/
+   $(MKDIR_P) check-embed-dir/src/vendor
+   cp $(libgosrcdir)/vendor/modules.txt check-embed-dir/src/vendor/
+   cp -r $(libgosrcdir)/embed/internal/embedtest 
check-embed-dir/src/embed/internal
+   @abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
+   abs_checkdir=`cd check-embed-dir && $(PWD_COMMAND)`; \
+   echo "cd check-embed-dir/src/embed/internal/embedtest && $(ECHO_ENV) 
GOPATH=$${abs_checkdir} $(abs_builddir)/go$(EXEEXT) test -test.short 
-test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v" > embed-testlog
+   $(CHECK_ENV) \
+   GOPATH=`cd check-embed-dir && $(PWD_COMMAND)`; \
+   export GOPATH; \
+   (cd check-embed-dir/src/embed/internal/embedtest && 
$(abs_builddir)/go$(EXEEXT) test -test.short 
-test.timeout=$(GOTOOLS_TEST_TIMEOUT)s -test.v) >> embed-testlog 2>&1 || echo 
"--- $${fl}: go test embed/internal/embedtest (0.00s)" >> embed-testlog
+   grep '^--- ' embed-testlog | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' | 
sort -k 2
+
 # The check targets runs the tests and assembles the output files.
-check: check-head check-go-tool check-runtime check-cgo-test 
check-carchive-test check-vet
+check: check-head check-go-tool check-runtime check-cgo-test 
check-carchive-test check-vet check-embed
@mv gotools.head gotools.sum
@cp gotools.sum gotools.log
-   @for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog; do \
+   @for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
  testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
  echo "Running $${testname}" >> gotools.sum; \
  echo "Running $${testname}" >> gotools.log; \
@@ -337,7 +354,8 @@ check: check-head check-go-tool check-runtime 
check-cgo-test check-carchive-test
@echo "runtest completed at `date`" >> gotools.log
@if grep '^FAIL' gotools.sum >/dev/null 2>&1; then exit 1; fi
 
-.PHONY: check check-head check-go-tool check-runtime check-cgo-test 
check-carchive-test check-vet
+.PHONY: check check-head check-go-tool check-runtime check-cgo-test
+.PHONY: check-carchive-test check-vet check-embed
 
 else
 


Re: [PATCH] release pointer_query cache when done (PR 98937)

2021-02-02 Thread David Malcolm via Gcc-patches
On Tue, 2021-02-02 at 12:57 -0700, Martin Sebor via Gcc-patches wrote:
> The strlen pass initializes its pointer_query member object with
> a cache consisting of a couple of vec's.  Because vec doesn't
> implement RAII its memory must be explicitly released to avoid
> memory leaks.  The attached patch adds a dtor to
> the strlen_dom_walker to do that.
> 
> Tested on x86_64-linux and by verifying that the cache leaks are
> gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind.
> 
> I'll plan to commit this change as "obvious" tomorrow unless there
> are suggestions for changes.

Why not make the vecs within struct pointer_query::cache_type's be
auto_vecs?  Then presumably the autogenerated dtor for
pointer_query::cache_type would clean things up, as called from the
autogenerated dtor for strlen_dom_walker, when cleaning up the
var_cache field?

Or am I missing something?  (sorry, I'm not familiar with this code)
Dave

> Martin
> 
> PS Valgrind shows a fair number of leaks even with the patch but
> none of them due to the pointer_query cache.




Re: [PATCH] Make asm not contain prefixed addresses.

2021-02-02 Thread Segher Boessenkool
Hi,

On Mon, Feb 01, 2021 at 11:24:42PM -0500, Michael Meissner wrote:
> In PR target/98519, the assembler does not like asm memory references that are
> prefixed.  We can't automatically change the instruction to prefixed form with
> a 'p' like we do for normal RTL insns, since this is assembly code.

It is incorrect in general, too, for at least three reasons.

> Instead,
> the patch prevents prefixed memory addresses from being passed by default.
> 
> This patch uses the TARGET_MD_ASM_ADJUST target hook to change the 'm' and 'o'
> constraints to be 'em' and 'eo'.

This does not work correctly, and is a gross misuse of this hook anyway.

Like I said before, just make 'm' (or even general_operand) not allow
prefixed memory in inline asm.

> +   while ((ch = *constraint++) != '\0')

Don't do assignments (or any other surprising side effects) in
conditionals please.  Code should be *readable*, not a puzzle.

0 is written as 0, not as 000 or '\0' or 0x0 or anything else.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr98519.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */

This is the default.

> +/* { dg-require-effective-target powerpc_prefixed_addr } */

You do not want this line, you want to compile this *always*.  You set
cpu=power10 anyway!  Also, the test should work *without it*!

> +/* { dg-final { scan-assembler-not {\m[@]pcrel\M} } } */

You can just write @ instead of [@].  Putting \m immediately in front of
a non-letter (or \M immediately after) does not do anything, either.


Segher


[PATCH] c++: ICE with late parsing of noexcept in nested class [PR98899]

2021-02-02 Thread Marek Polacek via Gcc-patches
Here we crash with a noexcept-specifier in a nested template class,
because my handling of such deferred-parse noexcept-specifiers was
gronked when we need to instantiate a DEFERRED_PARSE before it was
actually parsed at the end of the outermost class.

In

  struct S {
template struct B {
  B() noexcept(noexcept(x));
  int x;
};
struct A : B {
  A() : B() {}
};
  };

we call complete_type for B which triggers tsubsting S::B::B()
whose noexcept-specifier still contains a DEFERRED_PARSE.  The trick is
to stash such noexcept-specifiers into DEFPARSE_INSTANTIATIONS so that
we can replace it later when we've finally parsed all deferred
noexcept-specifiers.

In passing, fix missing usage of UNPARSED_NOEXCEPT_SPEC_P.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/cp/ChangeLog:

PR c++/98899
* parser.c (cp_parser_class_specifier_1): Use any possible
DEFPARSE_INSTANTIATIONS to update DEFERRED_NOEXCEPT_PATTERN.
(cp_parser_save_noexcept): Initialize DEFPARSE_INSTANTIATIONS.
* pt.c (tsubst_exception_specification): Stash new_specs into
DEFPARSE_INSTANTIATIONS.
* tree.c (fixup_deferred_exception_variants): Use
UNPARSED_NOEXCEPT_SPEC_P.

gcc/testsuite/ChangeLog:

PR c++/98899
* g++.dg/cpp0x/noexcept65.C: New test.
---
 gcc/cp/parser.c | 13 ++---
 gcc/cp/pt.c | 16 +++
 gcc/cp/tree.c   |  3 +--
 gcc/testsuite/g++.dg/cpp0x/noexcept65.C | 35 +
 4 files changed, 62 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept65.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index abadaf972d6..5da8670f0e2 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -25026,8 +25026,8 @@ cp_parser_class_specifier_1 (cp_parser* parser)
  pushed_scope = push_scope (class_type);
}
 
- tree spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl));
- spec = TREE_PURPOSE (spec);
+ tree def_parse = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl));
+ def_parse = TREE_PURPOSE (def_parse);
 
  /* Make sure that any template parameters are in scope.  */
  maybe_begin_member_template_processing (decl);
@@ -25044,7 +25044,7 @@ cp_parser_class_specifier_1 (cp_parser* parser)
parser->local_variables_forbidden_p |= THIS_FORBIDDEN;
 
  /* Now we can parse the noexcept-specifier.  */
- spec = cp_parser_late_noexcept_specifier (parser, spec);
+ tree spec = cp_parser_late_noexcept_specifier (parser, def_parse);
 
  if (spec == error_mark_node)
spec = NULL_TREE;
@@ -25052,6 +25052,12 @@ cp_parser_class_specifier_1 (cp_parser* parser)
  /* Update the fn's type directly -- it might have escaped
 beyond this decl :(  */
  fixup_deferred_exception_variants (TREE_TYPE (decl), spec);
+ /* Update any instantiations we've already created.  We must
+keep the new noexcept-specifier wrapped in a DEFERRED_NOEXCEPT
+so that maybe_instantiate_noexcept can tsubst the NOEXCEPT_EXPR
+in the pattern.  */
+ for (tree i : DEFPARSE_INSTANTIATIONS (def_parse))
+   DEFERRED_NOEXCEPT_PATTERN (TREE_PURPOSE (i)) = TREE_PURPOSE (spec);
 
  /* Restore the state of local_variables_forbidden_p.  */
  parser->local_variables_forbidden_p = local_variables_forbidden_p;
@@ -26695,6 +26701,7 @@ cp_parser_save_noexcept (cp_parser *parser)
   /* Save away the noexcept-specifier; we will process it when the
  class is complete.  */
   DEFPARSE_TOKENS (expr) = cp_token_cache_new (first, last);
+  DEFPARSE_INSTANTIATIONS (expr) = nullptr;
   expr = build_tree_list (expr, NULL_TREE);
   return expr;
 }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index aa1687a9f2a..4781519d00f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15189,6 +15189,22 @@ tsubst_exception_specification (tree fntype,
 /*integral_constant_expression_p=*/true);
}
   new_specs = build_noexcept_spec (new_specs, complain);
+  /* We've instantiated a template before a noexcept-specifier
+contained therein has been parsed.  This can happen for
+a nested template class:
+
+ struct S {
+   template struct B { B() noexcept(...); };
+   struct A : B { ... use B() ... };
+ };
+
+where completing B will trigger instantiating the
+noexcept, even though we only parse it at the end of S.  */
+  if (UNPARSED_NOEXCEPT_SPEC_P (specs))
+   {
+ gcc_checking_assert (defer_ok);
+ vec_safe_push (DEFPARSE_INSTANTIATIONS (expr), new_specs);
+   }
 }
   else if (specs)
 {
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 2e5a1f198e8..e6ced274959 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -2738,8 +2738,7 @@ fixup_deferred_exception_variants (tr

[PATCH v7] Practical improvement to libgcc complex divide

2021-02-02 Thread Patrick McGehearty via Gcc-patches
Changes in this version from Version 6:
Updated copyrights for following three files to -2021.
gcc/c-family/c-cppbuiltin.c
Moved code for setting LIBGCC modename to FUNC_EXT section.
Added code to set modename for any additional modes such as
FLT128 or FLT64X.

libgcc/libgcc2.c
Changed RMIN2 & RMINSCAL for XF and TF modes to use the matching
LIBGCC_?F_EPSILON instead of LIBGCC_DF_EPSILON

libgcc/config/rs6000/_divkc3.c
Changed RMIN2 & RMINSCAL for TF mode to use the matching
LIBGCC_TF_EPSILON instead of LIBGCC_DF_EPSILON

Resubmitted with correction to following link. No other changes
from previous PATCH v7.

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564239.html
with the attachment at
https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210125/b5a7df3b/attachment-0001.bin

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 provides an
opportunity to gain important improvements to the quality of answers
for the default complex divide routine (half, float, double, extended,
long double precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2]
further modified by c99's requirements for dealing with NaN (not a
number) results. When working with input values where the exponents
are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
substantially different from the answers provided by quad precision
more than 1% of the time. This error rate may be unacceptable for many
applications that cannot a priori restrict their computations to the
safe range. The proposed method reduces the frequency of
"substantially different" answers by more than 99% for double
precision at a modest cost of performance.

Differences between current gcc methods and the new method will be
described. Then accuracy and performance differences will be discussed.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

I developed two sets of test data by randomly distributing values over
a restricted range and the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test for "ratio" equals zero reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers. Adding
debug print out to cases with substantial errors allowed me to see the
intermediate calculations for test values that failed. I noted that
for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from check for zero to check for subnormal reduced the 16
bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage
of cases with greater than 16 bit errors (double precision data) is
still greater than 0.027% (2.7 in 10,000).

Continued examination of remaining errors and their intermediate
computations led to the various tests of input value tests and scaling
to avoid under/overflow. The current patch does not handle some of the
rare and most extreme combinations of input values, but the random
test data is only showing 1 case in 10 million that has an error of
greater than 12 bits. That case has 18 bits of error and is due to
subtraction cancellation. These results are significantly better
than the results reported by Baudin and Smith.

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Further investigation showed changing the half precision algorithm
to use the simple formula (real=a*c+b*d imag=b*c-a*d) caused no
loss of precision and modest improvement in performance.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.  This
use is made generic by defining appropriate __LIBGCC2_* macros in
c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents
small enough to allow shifting any subnormal values to normal values
all input values could be scaled up without risking over

Re: [PATCH v6] Practical improvement to libgcc complex divide

2021-02-02 Thread Patrick McGehearty via Gcc-patches

I saw the gcc.gnu.org at the end of the address and did not realize
the link was not part gnu.org.

The gcc.gnu.org link is:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564239.html
with the attachment at
https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210125/b5a7df3b/attachment-0001.bin

I'll resubmit the patch with the corrected link.

- patrick




On 2/1/2021 5:49 PM, Joseph Myers wrote:

On Mon, 1 Feb 2021, Patrick McGehearty via Gcc-patches wrote:


The message which contains the attached gzipped tarball of the
development test programs is:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html
I'll include that link in the new patch as well.

I think it's best to include the URL in the archives on gcc.gnu.org rather
than on a third-party site.





Re: [PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-02-02 Thread Vladimir Makarov via Gcc-patches



On 2021-01-28 5:40 a.m., Ilya Leoshkevich via Gcc-patches wrote:

Hello,

I would like to ping the following patch:

lra: clear lra_insn_recog_data after simplifying a mem subreg
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html

Sorry, I missed your original email.  The patch is ok to submit into the 
trunk.


Thank you for addressing the issue.




Merge from trunk to gccgo branch

2021-02-02 Thread Ian Lance Taylor via Gcc-patches
I merged trunk revision 8e4a738d2540ab6aff77506d368bf4e3fa6963bd to
the gccgo branch.

Ian


Re: [PATCH] ifcvt: Avoid ICEs trying to force_operand random RTL [PR97487]

2021-02-02 Thread Jeff Law via Gcc-patches



On 2/2/21 2:29 AM, Jakub Jelinek wrote:
> Hi!
>
> As the testcase shows, RTL ifcvt can throw random RTL (whatever it found in
> some insns) at expand_binop or expand_unop and expects it to do something
> (and then will check if it created valid insns and punts if not).
> These functions in the end if the operands don't match try to
> copy_to_mode_reg the operands, which does
> if (!general_operand (x, VOIDmode))
>   x = force_operand (x, temp);
> but, force_operand is far from handling all possible RTLs, it will ICE for
> all more unusual RTL codes.  Basically handles just simple arithmetic and
> unary RTL operations if they have an optab and
> expand_simple_binop/expand_simple_unop ICE on others.
>
> The following patch fixes it by adding some operand verification (whether
> there is a hope that copy_to_mode_reg will succeed on those).  It is added
> both to noce_emit_move_insn (not needed for this exact testcase,
> that function simply tries to recog the insn as is and if it fails,
> handles some simple binop/unop cases; the patch performs the verification
> of their operands) and noce_try_sign_mask.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-02-02  Jakub Jelinek  
>
>   PR middle-end/97487
>   * ifcvt.c (noce_can_force_operand): New function.
>   (noce_emit_move_insn): Use it.
>   (noce_try_sign_mask): Likewise.  Formatting fix.
>
>   * gcc.dg/pr97487-1.c: New test.
>   * gcc.dg/pr97487-2.c: New test.
OK.  One might consider having force_operand call can_force_operand but
that feels more like a future thing than something we'd want to do right
now.

Jeff



Re: [PATCH] lra-constraints: Fix error-recovery for bad inline-asms [PR97971]

2021-02-02 Thread Jeff Law via Gcc-patches



On 2/2/21 2:21 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> The following testcase has ice-on-invalid, it can't be reloaded, but we
> shouldn't ICE the compiler because the user typed non-sense.
>
> In current_insn_transform we have:
>   if (process_alt_operands (reused_alternative_num))
> alt_p = true;
>   
>   if (check_only_p)
> return ! alt_p || best_losers != 0;
>   
>   /* If insn is commutative (it's safe to exchange a certain pair of
>  operands) then we need to try each alternative twice, the second
>  time matching those two operands as if we had exchanged them.  To
>  do this, really exchange them in operands.
> 
>  If we have just tried the alternatives the second time, return
>  operands to normal and drop through.  */
> 
>   if (reused_alternative_num < 0 && commutative >= 0)
> {
>   curr_swapped = !curr_swapped;
>   if (curr_swapped)
> {
>   swap_operands (commutative);
>   goto try_swapped;
> }
>   else
> swap_operands (commutative);
> }
> 
>   if (! alt_p && ! sec_mem_p)
> {
>   /* No alternative works with reloads??  */
>   if (INSN_CODE (curr_insn) >= 0)
> fatal_insn ("unable to generate reloads for:", curr_insn);
>   error_for_asm (curr_insn,
>  "inconsistent operand constraints in an %");
>   lra_asm_error_p = true;
> ...
> and so handle inline asms there differently (and delete/nullify them after
> this) - fatal_insn is only called for non-inline asm.
> But in process_alt_operands we do:
> /* Both the earlyclobber operand and conflicting operand
>cannot both be user defined hard registers.  */
> if (HARD_REGISTER_P (operand_reg[i])
> && REG_USERVAR_P (operand_reg[i])
> && operand_reg[j] != NULL_RTX
> && HARD_REGISTER_P (operand_reg[j])
> && REG_USERVAR_P (operand_reg[j]))
>   fatal_insn ("unable to generate reloads for "
>   "impossible constraints:", curr_insn);
> and thus ICE even for inline-asms.
>
> I think it is inappropriate to delete/nullify the insn in
> process_alt_operands, as it could be done e.g. in the check_only_p mode,
> so this patch just returns false in that case, which results in the
> caller have alt_p false, and as inline asm isn't simple move, sec_mem_p
> will be also false (and it isn't commutative either), so for check_only_p
> it will suggests to the callers it isn't ok and otherwise will emit
> error and delete/nullify the inline asm insn.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-02-02  Jakub Jelinek  
>
>   PR middle-end/97971
>   * lra-constraints.c (process_alt_operands): For inline asm, don't call
>   fatal_insn, but instead return false.
>
>   * gcc.target/i386/pr97971.c: New test.
OK
jeff



Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-02-02 Thread Qing Zhao via Gcc-patches
Hi,

With the following patch:

[qinzhao@localhost gcc]$ git diff tree-ssa-structalias.c
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index cf653be..bd18841 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4851,6 +4851,30 @@ find_func_aliases_for_builtin_call (struct function *fn, 
gcall *t)
   return false;
 }
 
+static void
+find_func_aliases_for_deferred_init (gcall *t)
+{
+  
+  tree lhsop = gimple_call_lhs (t);
+  enum auto_init_type init_type
+= (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (t, 1));
+  auto_vec lhsc;
+  auto_vec rhsc;
+  struct constraint_expr temp;
+ 
+  get_constraint_for (lhsop, &lhsc);
+  if (init_type == AUTO_INIT_ZERO && flag_delete_null_pointer_checks)
+temp.var = nothing_id;
+  else
+temp.var = nonlocal_id;
+  temp.type = ADDRESSOF;
+  temp.offset = 0;
+  rhsc.safe_push (temp);
+
+  process_all_all_constraints (lhsc, rhsc);
+  return;
+}
+
 /* Create constraints for the call T.  */
 
 static void
@@ -4864,6 +4888,12 @@ find_func_aliases_for_call (struct function *fn, gcall 
*t)
   && find_func_aliases_for_builtin_call (fn, t))
 return;
 
+  if (gimple_call_internal_p (t, IFN_DEFERRED_INIT))
+{
+  find_func_aliases_for_deferred_init (t);
+  return;
+}
+

The *.ealias dump for the routine “bump_map” are exactly the same for approach 
A and D. 
However, the stack size for D still bigger than A. 

Any suggestions?

Qing


On Feb 2, 2021, at 9:17 AM, Qing Zhao via Gcc-patches  
wrote:
> 
> 
> 
>> On Feb 2, 2021, at 1:43 AM, Richard Biener  wrote:
>> 
>> On Mon, 1 Feb 2021, Qing Zhao wrote:
>> 
>>> Hi, Richard,
>>> 
>>> I have adjusted SRA phase to split calls to DEFERRED_INIT per you 
>>> suggestion.
>>> 
>>> And now the routine “bump_map” in 511.povray is like following:
>>> ...
>>> 
>>> # DEBUG BEGIN_STMT
>>> xcoor = 0.0;
>>> ycoor = 0.0;
>>> # DEBUG BEGIN_STMT
>>> index = .DEFERRED_INIT (index, 2);
>>> index2 = .DEFERRED_INIT (index2, 2);
>>> index3 = .DEFERRED_INIT (index3, 2);
>>> # DEBUG BEGIN_STMT
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>> # DEBUG BEGIN_STMT
>>> p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
>>> # DEBUG p1$0 => p1$0_181
>>> p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
>>> # DEBUG p1$1 => p1$1_184
>>> p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
>>> # DEBUG p1$2 => p1$2_172
>>> p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
>>> # DEBUG p2$0 => p2$0_177
>>> p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
>>> # DEBUG p2$1 => p2$1_135
>>> p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
>>> # DEBUG p2$2 => p2$2_137
>>> p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
>>> # DEBUG p3$0 => p3$0_377
>>> p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
>>> # DEBUG p3$1 => p3$1_379
>>> p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
>>> # DEBUG p3$2 => p3$2_381
>>> 
>>> 
>>> In the above, p1, p2, and p3 are all splitted to calls to DEFERRED_INIT of 
>>> the components of p1, p2 and p3. 
>>> 
>>> With this change, the stack usage numbers with -fstack-usage for approach 
>>> A, old approach D and new D with the splitting in SRA are:
>>> 
>>> Approach A  Approach D-old  Approach D-new
>>> 
>>> 272 624 368
>>> 
>>> From the above, we can see that splitting the call to DEFERRED_INIT in SRA 
>>> can reduce the stack usage increase dramatically. 
>>> 
>>> However, looks like that the stack size for D is still bigger than A. 
>>> 
>>> I checked the IR again, and found that the alias analysis might be 
>>> responsible for this (by compare the image.cpp.026t.ealias for both A and 
>>> D):
>>> 
>>> (Due to the call to:
>>> 
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> )
>>> 
>>> **Approach A:
>>> 
>>> Points_to analysis:
>>> 
>>> Constraints:
>>> …
>>> colour1 = &NULL
>>> …
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> ...
>>> callarg(53) = &colour1
>>> ...
>>> _53 = colour1
>>> 
>>> Points_to sets:
>>> …
>>> colour1 = { NULL ESCAPED NONLOCAL } same as _53
>>> ...
>>> CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
>>> CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as 
>>> CALLUSED(48)
>>> ...
>>> callarg(53) = { NULL ESCAPED NONLOCAL colour1 }
>>> 
>>> **Apprach D:
>>> 
>>> Points_to analysis:
>>> 
>>> Constraints:
>>> …
>>> callarg(19) = colour1
>>> callarg(19) = &NONLOCAL
>>> colour1 = callarg(19) + UNKNOWN
>>> colour1 = &NONLOCAL
>>> …
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> colour1 = &NONLOCAL
>>> …
>>> callarg(74) = &colour1
>>> callarg(74) = callarg(74) + UNKNOWN
>>> callarg(74) = *callarg(74) + UNKNOWN
>>> …
>>> _53 = colour1
>>> _54 = _53
>>> _55 = _54 + UNKNOWN
>>> _55 = &NONLOCAL
>>> _56 = colour1
>>> _57 = _56
>>> _58 = _57 + UNKNOWN
>>> _58 = &NONLOCAL
>>> _59 = _55 + UNKNOWN
>>> _59 = _58 + UNKNOW

Re: [PATCH] release pointer_query cache when done (PR 98937)

2021-02-02 Thread Martin Sebor via Gcc-patches

On 2/2/21 2:29 PM, David Malcolm wrote:

On Tue, 2021-02-02 at 12:57 -0700, Martin Sebor via Gcc-patches wrote:

The strlen pass initializes its pointer_query member object with
a cache consisting of a couple of vec's.  Because vec doesn't
implement RAII its memory must be explicitly released to avoid
memory leaks.  The attached patch adds a dtor to
the strlen_dom_walker to do that.

Tested on x86_64-linux and by verifying that the cache leaks are
gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind.

I'll plan to commit this change as "obvious" tomorrow unless there
are suggestions for changes.


Why not make the vecs within struct pointer_query::cache_type's be
auto_vecs?  Then presumably the autogenerated dtor for
pointer_query::cache_type would clean things up, as called from the
autogenerated dtor for strlen_dom_walker, when cleaning up the
var_cache field?

Or am I missing something?  (sorry, I'm not familiar with this code)
Dave


It would work as long as the cache isn't copied or assigned anywhere.
I don't think it is either, and GCC compiles and no C tests fail, so
it should be okay.

But I'm leery of using auto_vec as a class member because of pr90904.
Looks like auto_vec now has a move ctor and move assignment but still
no safe copy ctor or assignment.  The cache copy ctor and assignment
operator could be deleted to avoid accidental copies, but at that
point it starts to become more involved than just flushing the cache.

If you or someone else has a preference for using auto_vec despite
this I'll change it.  Otherwise I'd just as soon leave it as is.

Martin




Martin

PS Valgrind shows a fair number of leaks even with the patch but
none of them due to the pointer_query cache.







Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]

2021-02-02 Thread Alan Modra via Gcc-patches
On Fri, Jan 29, 2021 at 11:11:23AM +0800, HAO CHEN GUI via Gcc-patches wrote:
>    This patch tries to optimize PowerPC 64 bit constant generation when the
> constant can be transformed from a 32 bit or 16 bit constant by rotating,
> shifting and mask AND.

All and more of what you are doing here for rotated 16-bit constants
is covered by
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html

That patch is still waiting on review.  Hmm, I see my local copy of
that patch has one extra line in
gcc/testsuite/gcc.target/powerpc/rot_cst2.c
+/* { dg-additional-options "-mno-prefixed" { target { lp64 } } } */
in order to keep scan-assembler-times counts correct for power10.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 2/2] RISC-V: Add riscv{32, 64}be with big endian as default

2021-02-02 Thread Kito Cheng via Gcc-patches
> > Could you extract the endian related LINK_SPEC change to
> > ENDIAN_LINK_SPEC to riscv.h, so that we can prevent
> > duplicate this several times.
>
> You mean a define which expands to
>
>   "-melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv"

Yeah, but I'd like to include following 2 lines too:

%{mbig-endian:-EB} \
%{mlittle-endian:-EL} \

I saw it's just the same among 3 files.

>
> ?  Sure, but I don't think ENDIAN_LINK_SPEC would be a good name for
> it since it defines the word size as well as the endianness, and also
> ELF in general.
>
> Maybe ELF_LINK_SPEC?  The word size and endianness are also ELF
> properties (as encoded in EI_CLASS and EI_DATA).

Either ENDIAN_LINK_SPEC or ELF_LINK_SPEC is ok to me,
I don't have strong preference on nanming.

>
>
>   // Marcus
>
>


[PATCH] rs6000: Use rldimi for vec init instead of shift + ior

2021-02-02 Thread Kewen.Lin via Gcc-patches
Hi,

This patch merges the previously approved one[1] and its relied patch
made by Segher here[2], it's to make unsigned int vector init go with
rldimi to merge two integers instead of shift and ior.

Segher's patch in [2] is required to make the test case pass,
otherwise the costing for new pseudo-to-pseudo copies and the folding
with nonzero_bits in combine will make the rl*imi pattern become
compact and split into ior and shift unexpectedly.

The commit log of Segher's patch describes it in more details:

"An rl*imi is usually written as an IOR of an ASHIFT or similar, and an
AND of a register with a constant mask.  In some cases combine knows
that that AND doesn't do anything (because all zero bits in that mask
correspond to bits known to be already zero), and then no pattern
matches.  This patch adds a define_split for such cases.  It uses
nonzero_bits in the condition of the splitter, but does not need it
afterwards for the instruction to be recognised.  This is necessary
because later passes can see fewer nonzero_bits.

Because it is a splitter, combine will only use it when starting with
three insns (or more), even though the result is just one.  This isn't
a huge problem in practice, but some possible combinations still won't
happen."

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8, also SPEC2017 build/run passed on P9.

Is it ok for trunk?

BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563526.html


gcc/ChangeLog:

2020-02-03  Segher Boessenkool  
Kewen Lin  

* config/rs6000/rs6000.md (*rotl3_insert_3): Renamed to...
(rotl3_insert_3): ...this.
(plus_ior_xor): New code_iterator.
(define_split for GPR rl*imi): New splitter.
* config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
for integer merging.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-init-10.c: New test.

-

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bb9fb42f82a..dca311ebc80 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4067,7 +4067,7 @@
   [(set_attr "type" "insert")])
 
 ; There are also some forms without one of the ANDs.
-(define_insn "*rotl3_insert_3"
+(define_insn "rotl3_insert_3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
  (match_operand:GPR 4 "const_int_operand" "n"))
@@ -4082,6 +4082,24 @@
 }
   [(set_attr "type" "insert")])
 
+(define_code_iterator plus_ior_xor [plus ior xor])
+
+(define_split
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:SI 2 "const_int_operand"))
+ (match_operand:GPR 3 "gpc_reg_operand")))]
+  "nonzero_bits (operands[3], mode)
+   < HOST_WIDE_INT_1U << INTVAL (operands[2])"
+  [(set (match_dup 0)
+   (ior:GPR (and:GPR (match_dup 3)
+ (match_dup 4))
+(ashift:GPR (match_dup 1)
+(match_dup 2]
+{
+  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
+})
+
 (define_insn "*rotl3_insert_4"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0c1bda522a9..07c2f7ffa6e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3008,28 +3008,22 @@
(use (match_operand:SI 4 "gpc_reg_operand"))]
"VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
 {
-  rtx a = gen_reg_rtx (DImode);
-  rtx b = gen_reg_rtx (DImode);
-  rtx c = gen_reg_rtx (DImode);
-  rtx d = gen_reg_rtx (DImode);
-  emit_insn (gen_zero_extendsidi2 (a, operands[1]));
-  emit_insn (gen_zero_extendsidi2 (b, operands[2]));
-  emit_insn (gen_zero_extendsidi2 (c, operands[3]));
-  emit_insn (gen_zero_extendsidi2 (d, operands[4]));
+  rtx a = gen_lowpart_SUBREG (DImode, operands[1]);
+  rtx b = gen_lowpart_SUBREG (DImode, operands[2]);
+  rtx c = gen_lowpart_SUBREG (DImode, operands[3]);
+  rtx d = gen_lowpart_SUBREG (DImode, operands[4]);
   if (!BYTES_BIG_ENDIAN)
 {
   std::swap (a, b);
   std::swap (c, d);
 }
 
-  rtx aa = gen_reg_rtx (DImode);
   rtx ab = gen_reg_rtx (DImode);
-  rtx cc = gen_reg_rtx (DImode);
   rtx cd = gen_reg_rtx (DImode);
-  emit_insn (gen_ashldi3 (aa, a, GEN_INT (32)));
-  emit_insn (gen_ashldi3 (cc, c, GEN_INT (32)));
-  emit_insn (gen_iordi3 (ab, aa, b));
-  emit_insn (gen_iordi3 (cd, cc, d));
+  emit_insn (gen_rotldi3_insert_3 (ab, a, GEN_INT (32), b,
+  GEN_INT (0x)));
+  emit_insn (gen_rotldi3_insert_3 (cd, c, GEN_INT (32), d,
+  GEN_INT (0x

Re: [PATCH, rs6000] Optimization for PowerPC 64bit constant generation [PR94395]

2021-02-02 Thread HAO CHEN GUI via Gcc-patches

Alan,

   Thanks for your info. Just notice your patch. I will wait for your 
patch being reviewed.


On 3/2/2021 上午 10:32, Alan Modra wrote:

On Fri, Jan 29, 2021 at 11:11:23AM +0800, HAO CHEN GUI via Gcc-patches wrote:

    This patch tries to optimize PowerPC 64 bit constant generation when the
constant can be transformed from a 32 bit or 16 bit constant by rotating,
shifting and mask AND.

All and more of what you are doing here for rotated 16-bit constants
is covered by
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html

That patch is still waiting on review.  Hmm, I see my local copy of
that patch has one extra line in
gcc/testsuite/gcc.target/powerpc/rot_cst2.c
+/* { dg-additional-options "-mno-prefixed" { target { lp64 } } } */
in order to keep scan-assembler-times counts correct for power10.