VEC_COND_EXPR optimizations

2020-07-30 Thread Marc Glisse
When vector comparisons were forced to use vec_cond_expr, we lost a number 
of optimizations (my fault for not adding enough testcases to prevent 
that). This patch tries to unwrap vec_cond_expr a bit so some 
optimizations can still happen.


I wasn't planning to add all those transformations together, but adding 
one caused a regression, whose fix introduced a second regression, etc.


Using a simple fold_binary internally looks like an ok compromise to me. 
It remains cheap enough (not recursive, and vector instructions are not 
that frequent), while still allowing more than const_binop (X|0 or X&X for 
instance). The transformations are quite conservative with :s and folding 
only if everything simplifies, we may want to relax this later. And of 
course we are going to miss things like a?b:c + a?c:b -> b+c.


In terms of number of operations, some transformations turning 2 
VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not 
look like a gain... I expect the bit_not disappears in most cases, and 
VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.


I am a bit confused that with avx512 we get types like "vector(4) 
" with :2 and not :1 (is it a hack so true is 1 and not 
-1?), but that doesn't matter for this patch.


Regtest+bootstrap on x86_64-pc-linux-gnu

2020-07-30  Marc Glisse  

PR tree-optimization/95906
PR target/70314
* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.

* gcc.dg/tree-ssa/andnot-2.c: New file.
* gcc.dg/tree-ssa/pr95906.c: Likewise.
* gcc.target/i386/pr70314.c: Likewise.

--
Marc Glissediff --git a/gcc/match.pd b/gcc/match.pd
index c6ae7a7db7a..af52d56162b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3451,6 +3451,77 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (cst1 && cst2)
 (vec_cond @0 { cst1; } { cst2; })
 
+/* Sink binary operation to branches, but only if we can fold it.  */
+#if GIMPLE
+(for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
+	 rdiv trunc_div ceil_div floor_div round_div
+	 trunc_mod ceil_mod floor_mod round_mod min max)
+/* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
+ (simplify
+  (op (vec_cond:s @0 @1 @2) @3)
+  (with
+   {
+ tree rhs1, rhs2 = NULL;
+ rhs1 = fold_binary (op, type, @1, @3);
+ if (rhs1 && is_gimple_val (rhs1))
+   rhs2 = fold_binary (op, type, @2, @3);
+   }
+   (if (rhs2 && is_gimple_val (rhs2))
+(vec_cond @0 { rhs1; } { rhs2; }
+ (simplify
+  (op @3 (vec_cond:s @0 @1 @2))
+  (with
+   {
+ tree rhs1, rhs2 = NULL;
+ rhs1 = fold_binary (op, type, @3, @1);
+ if (rhs1 && is_gimple_val (rhs1))
+   rhs2 = fold_binary (op, type, @3, @2);
+   }
+   (if (rhs2 && is_gimple_val (rhs2))
+(vec_cond @0 { rhs1; } { rhs2; }
+
+/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
+ (simplify
+  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
+  (with
+   {
+ tree rhs1, rhs2 = NULL;
+ rhs1 = fold_binary (op, type, @1, @3);
+ if (rhs1 && is_gimple_val (rhs1))
+   rhs2 = fold_binary (op, type, @2, @4);
+   }
+   (if (rhs2 && is_gimple_val (rhs2))
+(vec_cond @0 { rhs1; } { rhs2; })
+#endif
+
+/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
+(simplify
+ (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
+ (vec_cond (bit_and @0 @3) @1 @2))
+(simplify
+ (vec_cond (vec_cond:s @0 integer_all_onesp @3) @1 @2)
+ (vec_cond (bit_ior @0 @3) @1 @2))
+(simplify
+ (vec_cond (vec_cond:s @0 integer_zerop @3) @1 @2)
+ (vec_cond (bit_ior @0 (bit_not @3)) @2 @1))
+(simplify
+ (vec_cond (vec_cond:s @0 @3 integer_all_onesp) @1 @2)
+ (vec_cond (bit_and @0 (bit_not @3)) @2 @1))
+
+/* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
+(simplify
+ (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
+ (vec_cond (bit_and @0 @1) @2 @3))
+(simplify
+ (vec_cond @0 @2 (vec_cond:s @1 @2 @3))
+ (vec_cond (bit_ior @0 @1) @2 @3))
+(simplify
+ (vec_cond @0 (vec_cond:s @1 @2 @3) @2)
+ (vec_cond (bit_ior (bit_not @0) @1) @2 @3))
+(simplify
+ (vec_cond @0 @3 (vec_cond:s @1 @2 @3))
+ (vec_cond (bit_and (bit_not @0) @1) @2 @3))
+
 /* Simplification moved from fold_cond_expr_with_comparison.  It may also
be extended.  */
 /* This pattern implements two kinds simplification:
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c b/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c
new file mode 100644
index 000..e0955ce3ffd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop3-raw -w -Wno-psabi" } */
+
+typedef long vec __attribute__((vector_size(16)));
+vec f(vec x){
+  vec y = x < 10;
+  return y & (y == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "_expr" "forwprop3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
new file mode 100644
index 000..3d820a58e93
--- /dev/null
+++ b/gcc/testsuite/

[PATCH] Do not allocate huge array in output_in_order.

2020-07-30 Thread Martin Liška

We noticed that when analyzing LTRANS memory peak that happens right
after the start:
https://gist.github.com/marxin/223890df4d8d8e490b6b2918b77dacad

In case of chrome, we have symtab->order == 200M, so we allocate
16B * 200M = 3.2GB.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* cgraph.h: Remove leading empty lines.
* cgraphunit.c (enum cgraph_order_sort_kind): Remove
ORDER_UNDEFINED.
(struct cgraph_order_sort): Add constructors.
(cgraph_order_sort::process): New.
(cgraph_order_cmp): New.
(output_in_order): Simplify and push nodes to vector.
---
 gcc/cgraph.h |   2 -
 gcc/cgraphunit.c | 158 +--
 2 files changed, 85 insertions(+), 75 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cfae6e91da9..0211f08964f 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -2171,8 +2171,6 @@ private:
 /* Every top level asm statement is put into a asm_node.  */
 
 struct GTY(()) asm_node {

-
-
   /* Next asm node.  */
   asm_node *next;
   /* String for this asm node.  */
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index ea9a34bda6f..0a95eb93ce2 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2492,7 +2492,6 @@ expand_all_functions (void)
 
 enum cgraph_order_sort_kind

 {
-  ORDER_UNDEFINED = 0,
   ORDER_FUNCTION,
   ORDER_VAR,
   ORDER_VAR_UNDEF,
@@ -2501,6 +2500,30 @@ enum cgraph_order_sort_kind
 
 struct cgraph_order_sort

 {
+  /* Construct from a cgraph_node.  */
+  cgraph_order_sort (cgraph_node *node)
+  : kind (ORDER_FUNCTION), order (node->order)
+  {
+u.f = node;
+  }
+
+  /* Construct from a varpool_node.  */
+  cgraph_order_sort (varpool_node *node)
+  : kind (node->definition ? ORDER_VAR : ORDER_VAR_UNDEF), order (node->order)
+  {
+u.v = node;
+  }
+
+  /* Construct from a asm_node.  */
+  cgraph_order_sort (asm_node *node)
+  : kind (ORDER_ASM), order (node->order)
+  {
+u.a = node;
+  }
+
+  /* Assembly cgraph_order_sort based on its type.  */
+  void process ();
+
   enum cgraph_order_sort_kind kind;
   union
   {
@@ -2508,8 +2531,45 @@ struct cgraph_order_sort
 varpool_node *v;
 asm_node *a;
   } u;
+  int order;
 };
 
+/* Assembly cgraph_order_sort based on its type.  */

+
+void
+cgraph_order_sort::process ()
+{
+  switch (kind)
+{
+case ORDER_FUNCTION:
+  u.f->process = 0;
+  u.f->expand ();
+  break;
+case ORDER_VAR:
+  u.v->assemble_decl ();
+  break;
+case ORDER_VAR_UNDEF:
+  assemble_undefined_decl (u.v->decl);
+  break;
+case ORDER_ASM:
+  assemble_asm (u.a->asm_str);
+  break;
+default:
+  gcc_unreachable ();
+}
+}
+
+/* Compare cgraph_order_sort by order.  */
+
+static int
+cgraph_order_cmp (const void *a_p, const void *b_p)
+{
+  const cgraph_order_sort *nodea = (const cgraph_order_sort *)a_p;
+  const cgraph_order_sort *nodeb = (const cgraph_order_sort *)b_p;
+
+  return nodea->order - nodeb->order;
+}
+
 /* Output all functions, variables, and asm statements in the order
according to their order fields, which is the order in which they
appeared in the file.  This implements -fno-toplevel-reorder.  In
@@ -2519,89 +2579,41 @@ struct cgraph_order_sort
 static void
 output_in_order (void)
 {
-  int max;
-  cgraph_order_sort *nodes;
   int i;
-  cgraph_node *pf;
-  varpool_node *pv;
-  asm_node *pa;
-  max = symtab->order;
-  nodes = XCNEWVEC (cgraph_order_sort, max);
+  cgraph_node *cnode;
+  varpool_node *vnode;
+  asm_node *anode;
+  auto_vec nodes;
+  cgraph_order_sort *node;
 
-  FOR_EACH_DEFINED_FUNCTION (pf)

-{
-  if (pf->process && !pf->thunk.thunk_p && !pf->alias)
-   {
- if (!pf->no_reorder)
-   continue;
- i = pf->order;
- gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
- nodes[i].kind = ORDER_FUNCTION;
- nodes[i].u.f = pf;
-   }
-}
+  FOR_EACH_DEFINED_FUNCTION (cnode)
+if (cnode->process && !cnode->thunk.thunk_p
+   && !cnode->alias && cnode->no_reorder)
+  nodes.safe_push (cgraph_order_sort (cnode));
 
   /* There is a similar loop in symbol_table::output_variables.

  Please keep them in sync.  */
-  FOR_EACH_VARIABLE (pv)
-{
-  if (!pv->no_reorder)
-   continue;
-  if (DECL_HARD_REGISTER (pv->decl)
- || DECL_HAS_VALUE_EXPR_P (pv->decl))
-   continue;
-  i = pv->order;
-  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
-  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
-  nodes[i].u.v = pv;
-}
-
-  for (pa = symtab->first_asm_symbol (); pa; pa = pa->next)
-{
-  i = pa->order;
-  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
-  nodes[i].kind = ORDER_ASM;
-  nodes[i].u.a = pa;
-}
-
-  /* In toplevel reorder mode we output all statics; mark them as needed.  */
-
-  for (i = 0; i < max; ++i)
-if (nodes[i].kind ==

[Patch, wwwdocs] gcc-11/changes.html: Update OpenMP status

2020-07-30 Thread Tobias Burnus

This does a first update of the GCC 11 changes for
OpenMP. Wording improvements highly welcome. The new features
are not fully supported yet – but I did not know how to
best state this. And especially on the Fortran side there
are still many loose ends – but Stage 1 is long and many more
patches are expected. :-)

Thoughts? OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
gcc-11/changes.html: Update OpenMP status

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index d3fb9882..e1df2c9b 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -67,6 +67,17 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+  GCC 11 adds the nonrectangular loops features and the allocator routines
+  of https://www.openmp.org/specifications/";>OpenMP 5.0.
+  For Fortran, OpenMP 4.5 is now finally fully supported and OpenMP 5.0
+  support has been extended, including the following features which were
+  before only available in C and C++: order(concurrent),
+  lastprivate with conditional modifier,
+  if clause with simd/cancel
+  modifiers, target data without map clause,
+  and limitted support for the requires construct.
+  
+
 
 
 


Re: [Patch, wwwdocs] gcc-11/changes.html: Update OpenMP status

2020-07-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 30, 2020 at 10:17:29AM +0200, Tobias Burnus wrote:
> --- a/htdocs/gcc-11/changes.html
> +++ b/htdocs/gcc-11/changes.html
> @@ -67,6 +67,17 @@ a work-in-progress.
>  
>  New Languages and Language specific improvements
>  
> +  GCC 11 adds the nonrectangular loops features and the allocator 
> routines

s/the nonrectangular loops features/support for non-rectangular loop nests in 
OpenMP constructs/

> +  of https://www.openmp.org/specifications/";>OpenMP 5.0.
> +  For Fortran, OpenMP 4.5 is now finally fully supported and OpenMP 5.0
> +  support has been extended, including the following features which were
> +  before only available in C and C++: order(concurrent),
> +  lastprivate with conditional modifier,
> +  if clause with simd/cancel

replace / with and?

> +  modifiers, target data without map clause,
> +  and limitted support for the requires construct.
> +  

Otherwise LGTM

Jakub



[PATCH] AArch64: Fix hwasan failure in readline.

2020-07-30 Thread Tamar Christina
Hi All,

My previous fix added an unchecked call to fgets in the new function readline.
fgets can fail when there's an error reading the file in which case it returns
NULL.  It also returns NULL when the next character is EOF.

The EOF case is already covered by the existing code but the error case isn't.
This fixes it by returning the empty string on error.

Also I now use strnlen instead of strlen to make sure we never read outside the
buffer.

This was flagged by Matthew Malcomson during his hwasan work.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? And for backport with the other patches? (haven't done backport 
yet.)

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/driver-aarch64.c (readline): Check return value fgets.

-- 
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
index 0c706292333bd7445356f2f5bfdfb27b3badc063..d68a725899a7256a5541d547acb120a98fbc7c0a 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -191,10 +191,16 @@ readline (FILE *f)
   size += buf_size;
   buf = (char*) xrealloc (buf, size);
   gcc_assert (buf);
-  fgets (buf + last, buf_size, f);
+  /* If fgets fails it returns NULL, but if it reaches EOF
+	 with 0 characters read it also returns EOF.  However
+	 the condition on the loop would have broken out of the
+	 loop in that case,  and if we are in the first iteration
+	 then the empty string is the correct thing to return.  */
+  if (!fgets (buf + last, buf_size, f))
+	return std::string ();
   /* If we're not at the end of the line then override the
 	 \0 added by fgets.  */
-  last = strlen (buf) - 1;
+  last = strnlen (buf, size) - 1;
 }
   while (!feof (f) && buf[last] != '\n');
 



Re: [PATCH, rs6000] Add non-relative jump table support for 64bit rs6000

2020-07-30 Thread David Edelsohn via Gcc-patches
On Thu, Jul 30, 2020 at 1:22 AM HAO CHEN GUI  wrote:
>
> David,
>
> Seems there is something wrong with my email box.  I lost your email. I
> reconfigured the box and it should be OK now.
>
> Could you inform me how to exclude AIX from the condition testing? By
> the ABI? Thanks a lot.

The purpose should not be to exclude AIX.  If there is no fundamental
limitation with use of the new tablejump design on AIX then the patch
is not acceptable without AIX support.

The patch should use DOUBLE_INT_ASM_OP, not explicit ".quad".  AIX
always is PIC.  It's not obvious to me why the patch should limit
support to PPC64 Linux.  The section selection seems Linux/ELF
specific, but the rest seems like a general design optimization for
all PowerPC-based operating systems.

If you have questions about AIX details, that's fine. But limiting the
implementation to Linux is not acceptable.  All other ISA and
optimization features added to GCC that are supported on all OSes are
implemented on all OSes.  This tablejump change should be no
different.

Thanks, David


Thanks, David


Thanks, David


Re: [PATCH] nvptx: Provide vec_set and vec_extract patterns.

2020-07-30 Thread Tom de Vries
On 7/15/20 9:13 AM, Roger Sayle wrote:
> 
> This patch provides standard vec_extract and vec_set patterns to the
> nvptx backend, to extract an element from a PTX vector and set an
> element of a PTX vector respectively.  PTX vectors (I hesitate to
> call them SIMD vectors) may contain up to four elements, so vector
> modes up to size four are supported by this patch even though the
> nvptx backend currently only allows V2SI and V2DI, i.e. two out
> of the ten possible vector modes.
> 
> As an example of the improvement, the following C function:
> 
> typedef int __v2si __attribute__((__vector_size__(8)));
> int foo (__v2si arg) { return arg[0]+arg[1]; }
> 
> previously generated this code using a shift:
> 
> mov.u64 %r25, %ar0;
> ld.v2.u32   %r26, [%r25];
> mov.b64 %r28, %r26;
> shr.s64 %r30, %r28, 32;
> cvt.u32.u32 %r31, %r26.x;
> cvt.u32.u64 %r32, %r30;
> add.u32 %value, %r31, %r32;
> 
> but with this patch now generates:
> 
> mov.u64 %r25, %ar0;
> ld.v2.u32   %r26, [%r25];
> mov.u32 %r28, %r26.x;
> mov.u32 %r29, %r26.y;
> add.u32 %value, %r28, %r29;
> 
> I've implemented these getters and setters as their own instructions
> instead of attempting the much more intrusive patch of changing the
> backend's definition of register_operand.  Given the limited utility
> of PTX vectors, I'm not convinced that attempting to support them as
> operands in every instruction would be worth the effort involved.
> 
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
> with "make" and "make check" with no new regressions.
> Ok for mainline?
> 
> 
> 2020-07-15  Roger Sayle  
> 
> gcc/ChangeLog:
>   * config/nvptx/nvptx.md (nvptx_vector_index_operand): New predicate.
>   (VECELEM): New mode attribute for a vector's uppercase element mode.
>   (Vecelem): New mode attribute for a vector's lowercase element mode.
>   (*vec_set_0, *vec_set_1, *vec_set_2,
>   *vec_set_3): New instructions.
>   (vec_set): New expander to generate one of the above insns.
>   (vec_extract): New instruction.

Added test-case, fixed some nits, pushed (not reposting).

Thanks,
- Tom



Re: RFC: Monitoring old PRs, new dg directives

2020-07-30 Thread Martin Liška

Hello.

I support the initiative!
What would be nice to add leading 'PR component/12345'
to a git commit so that these test additions are linked to bugzilla issues.

Martin


[PATCH] tree-optimization/96370 - make reassoc expr rewrite more robust

2020-07-30 Thread Richard Biener
In the face of the more complex tricks in reassoc with respect
to negate processing it can happen that the expression rewrite
is fooled to recurse on a leaf and pick up a bogus expression
code.  The following patch makes the expression rewrite more
robust in providing the expression code to it directly since
it is the same for all operations in a chain.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to master.

2020-07-30  Richard Biener  

PR tree-optimization/96370
* tree-ssa-reassoc.c (rewrite_expr_tree): Add operation
code parameter and use it instead of picking it up from
the stmt that is being rewritten.
(reassociate_bb): Pass down the operation code.

* gcc.dg/pr96370.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr96370.c |  8 
 gcc/tree-ssa-reassoc.c | 10 +-
 2 files changed, 13 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr96370.c

diff --git a/gcc/testsuite/gcc.dg/pr96370.c b/gcc/testsuite/gcc.dg/pr96370.c
new file mode 100644
index 000..b939b2141d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96370.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target dfp } } */
+/* { dg-options "-O2 -ffast-math" } */
+
+void c(_Decimal128);
+void a(_Decimal128 b)
+{
+  c(-b * b);
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index d06b693ec76..266cff376e5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -4913,7 +4913,7 @@ insert_stmt_before_use (gimple *stmt, gimple 
*stmt_to_insert)
recursive invocations.  */
 
 static tree
-rewrite_expr_tree (gimple *stmt, unsigned int opindex,
+rewrite_expr_tree (gimple *stmt, enum tree_code rhs_code, unsigned int opindex,
   vec ops, bool changed, bool next_changed)
 {
   tree rhs1 = gimple_assign_rhs1 (stmt);
@@ -4960,7 +4960,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
= find_insert_point (stmt, oe1->op, oe2->op);
  lhs = make_ssa_name (TREE_TYPE (lhs));
  stmt
-   = gimple_build_assign (lhs, gimple_assign_rhs_code (stmt),
+   = gimple_build_assign (lhs, rhs_code,
   oe1->op, oe2->op);
  gimple_set_uid (stmt, uid);
  gimple_set_visited (stmt, true);
@@ -5004,7 +5004,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
   /* Recurse on the LHS of the binary operator, which is guaranteed to
  be the non-leaf side.  */
   tree new_rhs1
-= rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops,
+= rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), rhs_code, opindex + 1, ops,
 changed || oe->op != rhs2 || next_changed,
 false);
 
@@ -5030,7 +5030,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
  gimple *insert_point = find_insert_point (stmt, new_rhs1, oe->op);
 
  lhs = make_ssa_name (TREE_TYPE (lhs));
- stmt = gimple_build_assign (lhs, gimple_assign_rhs_code (stmt),
+ stmt = gimple_build_assign (lhs, rhs_code,
  new_rhs1, oe->op);
  gimple_set_uid (stmt, uid);
  gimple_set_visited (stmt, true);
@@ -6477,7 +6477,7 @@ reassociate_bb (basic_block bb)
   if (len >= 3)
 swap_ops_for_binary_stmt (ops, len - 3, stmt);
 
- new_lhs = rewrite_expr_tree (stmt, 0, ops,
+ new_lhs = rewrite_expr_tree (stmt, rhs_code, 0, ops,
   powi_result != NULL
   || negate_result,
   len != orig_len);
-- 
2.26.2


Re: [PATCH 8/9] [OpenACC] Fix standalone attach for Fortran assumed-shape array pointers

2020-07-30 Thread Thomas Schwinge
Hi Julian, Tobias!

On 2020-07-27T15:33:41+0100, Julian Brown  wrote:
> On Fri, 17 Jul 2020 13:16:11 +0200
> Thomas Schwinge  wrote:
>> On 2020-07-15T12:28:42+0200, Thomas Schwinge
>>  wrote:
>> > On 2020-07-14T13:43:37+0200, I wrote:
>> >> On 2020-06-16T15:39:44-0700, Julian Brown
>> >>  wrote:
>> >>> As mentioned in the blurb for the previous patch, an "attach"
>> >>> operation for a Fortran pointer with an array descriptor must
>> >>> copy that array descriptor to the target.
>> >>
>> >> Heh, I see -- I don't think I had read the OpenACC standard in
>> >> that way, but I think I agree your interpretation is fine.
>> >>
>> >> This does not create some sort of memory leak -- everything
>> >> implicitly allocated there will eventually be deallocated again,
>> >> right?
>>
>> Unanswered -- but I may now have found this problem, and also found
>> "the reverse problem" ('finalize'); see below.
>
> Sorry, I didn't answer this explicitly -- the idea was to pair alloc
> (present) and release mappings for the pointed-to data. In that way,
> the idea was for the release mapping to perform that deallocation. That
> was partly done so that the existing handling in gfc_trans_omp_clauses
> could be used for this case without too much disruption to the code --
> but actually, after Tobias's reorganisation of that function, that's
> not really so much of an issue any more.
>
> You can still get a "leak" if you try to attach a synthesized/temporary
> array descriptor that goes out of scope before the pointed-to data it
> refers to does -- that's a problem I've mentioned earlier, and is
> kind-of unavoidable unless we do some more sophisticated analysis to
> diagnose it as user error.

ACK.  Do you remember if you already had a testcase (conceptual, or
actual) to demonstrate that problem?

>> >>> This patch arranges for that to be so.
>> >>
>> >> In response to the new OpenACC/Fortran testcase that I'd submtited
>> >> in
>> >> <87wo3co0tm.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87wo3co0tm.fsf@euler.schwinge.homeip.net>,
>> >> you (Julian) correctly supposed in
>> >> ,
>> >> that this patch indeed does resolve that testcase, too.  That
>> >> wasn't obvious to me.  So, similar to
>> >> 'libgomp/testsuite/libgomp.oacc-c-c++-common/pr95270-{1.2}.c',
>> >> please include my new OpenACC/Fortran testcase (if that makes
>> >> sense to you), and reference PR95270 in the commit log.
>> >
>> > My new OpenACC/Fortran testcase got again broken ('libgomp: pointer
>> > target not mapped for attach') by Tobias' commit
>> > 102502e32ea4e8a75d6b252ba319d09d735d9aa7 "[OpenMP, Fortran] Add
>> > structure/derived-type element mapping",
>> > http://mid.mail-archive.com/c5b43e02-d1d5-e7cf-c11c-6daf1e8f33c5@codesourcery.com>.
>> >
>> > Similar ('libgomp: attempt to attach null pointer') for your new
>> > 'libgomp.oacc-fortran/attach-descriptor-1.f90'.
>> >
>> > (Whether or not 'attach'ing 'NULL' should actually be allowed, is a
>> > separate topic for discussion.)
>> >
>> > So this patch here will (obviously) need to be adapted to what
>> > Tobias changed.
>>
>> I see what you pushed in commit
>> 39dda0020801045d9a604575b2a2593c05310015 "openacc: Fix standalone
>> attach for Fortran assumed-shape array pointers" indeed has become
>> much smaller/simpler.  :-)
>
> Yes, thank you.
>
>> (But, (parts of?) Tobias' commit mentioned above (plus commit
>> 524862db444b6544c6dc87c5f06f351100ecf50d "Fix goacc/finalize-1.f tree
>> dump-scanning for -m32", if applicable) will then also need to be
>> backported to releases/gcc-10 branch (once un-frozen).)
>>
>> > (Plus my more general questions quoted above and below.)
>>
>> >>> OK?
>> >>
>> >> Basically yes (for master and releases/gcc-10 branches), but please
>> >> consider the following:
>> >>
>> >>> --- a/gcc/fortran/trans-openmp.c
>> >>> +++ b/gcc/fortran/trans-openmp.c
>> >>> @@ -2573,8 +2573,44 @@ gfc_trans_omp_clauses (stmtblock_t *block,
>> >>> gfc_omp_clauses *clauses, }
>> >>>  }
>> >>>if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl))
>> >>> -  && n->u.map_op != OMP_MAP_ATTACH
>> >>> -  && n->u.map_op != OMP_MAP_DETACH)
>> >>> +  && (n->u.map_op == OMP_MAP_ATTACH
>> >>> +  || n->u.map_op == OMP_MAP_DETACH))
>> >>> +{
>> >>> +  tree type = TREE_TYPE (decl);
>> >>> +  tree data = gfc_conv_descriptor_data_get (decl);
>> >>> +  if (present)
>> >>> +data = gfc_build_cond_assign_expr (block, 
>> >>> present,
>> >>> +   data,
>> >>> + null_pointer_node);
>> >>> +  tree ptr
>> >>> += fold_convert (build_pointer_type 
>> >>> (char_type_node),
>> >>> + 

Re: RFC: Monitoring old PRs, new dg directives

2020-07-30 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 28, 2020 at 05:44:47PM -0400, Marek Polacek via Gcc-patches wrote:
> We will still have a surfeit of bugs that we've given short shrift to, but
> let's at least automate what we can.  The initial addition of the relevant
> old(-ish) tests won't of course happen automagically, but it's a price I'm
> willing to pay.  My goal here isn't merely to reduce the number of open PRs;
> it is to improve the testing of the compiler overall.
> 
> Thoughts?

Looks useful to me, but I'd think it might be desirable to use separate
directories for those tests, so that it is more obvious that it is a
different category of tests.  Now that we use git, just using git mv
to move them to another place once they are fixed for good (together with
some dg-* directive tweaks) wouldn't be that much work later.

So having gcc.dg/unfixed/ , g++.dg/unfixed/ , c-c++-common/unfixed/
and their torture/ suffixed variants (or better directory name for those)?

Jakub



[PATCH] debug/96383 - emit debug info for used external functions

2020-07-30 Thread Richard Biener
This makes sure to emit full declaration DIEs including
formal parameters for used external functions.  This helps
debugging when debug information of the external entity is
not available and also helps external tools cross-checking
ABI compatibility which was the bug reporters use case.

For cc1 this affects debug information size as follows:

 VM SIZE FILE SIZE
 ++ GROWING   ++
  [ = ]   0 .debug_info   +1.63Mi  +1.3%
  [ = ]   0 .debug_str +263Ki  +3.4%
  [ = ]   0 .debug_abbrev  +101Ki  +4.9%
  [ = ]   0 .debug_line   +5.71Ki  +0.0%
   +44% +16 [Unmapped]+48  +1.2%

 -- SHRINKING --
  [ = ]   0 .debug_loc   -213  -0.0%
  -0.0% -48 .text -48  -0.0%
  [ = ]   0 .debug_ranges -16  -0.0%

  -0.0% -32 TOTAL +1.99Mi  +0.6%

and DWARF compression via DWZ can only shave off minor bits
here.

Previously we emitted no DIEs for external functions at all
unless they were referenced via DW_TAG_GNU_call_site which
for some GCC revs caused a regular DIE to appear and since
GCC 4.9 only a stub without formal parameters.  This means
at -O0 we did not emit any DIE for external functions
but with optimization we emitted stubs.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK for trunk and backports?

Thanks,
Richard.

2020-07-30  Richard Biener  

PR debug/96383
* cgraphunit.c (symbol_table::finalize_compilation_unit):
Emit debug information for all used functions, not for
definitions.

* gcc.dg/debug/dwarf2/pr96383-1.c: New testcase.
* gcc.dg/debug/dwarf2/pr96383-2.c: Likewise.
---
 gcc/cgraphunit.c  |  2 +-
 gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c | 17 +
 gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-2.c | 17 +
 3 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-2.c

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index ea9a34bda6f..3873cf1195d 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2989,7 +2989,7 @@ symbol_table::finalize_compilation_unit (void)
   /* Emit early debug for reachable functions, and by consequence,
 locally scoped symbols.  */
   struct cgraph_node *cnode;
-  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (cnode)
+  FOR_EACH_FUNCTION (cnode)
(*debug_hooks->early_global_decl) (cnode->decl);
 
   /* Clean up anything that needs cleaning up after initial debug
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
new file mode 100644
index 000..ede30f9a95e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-g -gdwarf -dA" } */
+
+extern void foo (int);
+extern void bar (int);
+
+int main()
+{
+  foo (1);
+}
+
+/* We want subprogram DIEs for both foo and main and a DIE for
+   the formal parameter of foo.  We do not want a DIE for
+   unusedbar.  */
+/* { dg-final { scan-assembler-times "DW_TAG_subprogram" 4 } } */
+/* { dg-final { scan-assembler-times "DW_TAG_formal_parameter" 2 } } */
+/* { dg-final { scan-assembler-not "unusedbar" } } */
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-2.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-2.c
new file mode 100644
index 000..c3a710e2f89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O2 -gdwarf -dA" } */
+
+extern void foo (int);
+extern void unusedbar (int);
+
+int main()
+{
+  foo (1);
+}
+
+/* We want subprogram DIEs for both foo and main and a DIE for
+   the formal parameter of foo.  We do not want a DIE for
+   unusedbar.  */
+/* { dg-final { scan-assembler-times "DW_TAG_subprogram" 4 } } */
+/* { dg-final { scan-assembler-times "DW_TAG_formal_parameter" 2 } } */
+/* { dg-final { scan-assembler-not "unusedbar" } } */
-- 
2.26.2


Re: [PATCH] debug/96383 - emit debug info for used external functions

2020-07-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 30, 2020 at 11:55:19AM +0200, Richard Biener wrote:
> For cc1 this affects debug information size as follows:
> 
>  VM SIZE FILE SIZE
>  ++ GROWING   ++
>   [ = ]   0 .debug_info   +1.63Mi  +1.3%
>   [ = ]   0 .debug_str +263Ki  +3.4%
>   [ = ]   0 .debug_abbrev  +101Ki  +4.9%
>   [ = ]   0 .debug_line   +5.71Ki  +0.0%
>+44% +16 [Unmapped]+48  +1.2%
> 
>  -- SHRINKING --
>   [ = ]   0 .debug_loc   -213  -0.0%
>   -0.0% -48 .text -48  -0.0%
>   [ = ]   0 .debug_ranges -16  -0.0%
> 
>   -0.0% -32 TOTAL +1.99Mi  +0.6%
> 
> and DWARF compression via DWZ can only shave off minor bits
> here.

I'm surprised DWZ doesn't shave off more, the prototypes in
the different TUs should be the same and therefore mergeable.

What could help a little if DWZ is able to merge the prototypes with
definition DIE (verify that the type is the same and there is no information
the definition doesn't have appart from DW_AT_external).

> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> OK for trunk and backports?

I'd go with it for trunk and 10.2.1 now and consider further backports
later.  Maybe even defer the 10.2.1 backport for two weeks.

Jakub



Re: [PATCH] debug/96383 - emit debug info for used external functions

2020-07-30 Thread Andreas Schwab
On Jul 30 2020, Richard Biener wrote:

> diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c 
> b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
> new file mode 100644
> index 000..ede30f9a95e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -gdwarf -dA" } */
> +
> +extern void foo (int);
> +extern void bar (int);
> +
> +int main()
> +{
> +  foo (1);
> +}
> +
> +/* We want subprogram DIEs for both foo and main and a DIE for
> +   the formal parameter of foo.  We do not want a DIE for
> +   unusedbar.  */
> +/* { dg-final { scan-assembler-times "DW_TAG_subprogram" 4 } } */
> +/* { dg-final { scan-assembler-times "DW_TAG_formal_parameter" 2 } } */
> +/* { dg-final { scan-assembler-not "unusedbar" } } */

The test doesn't reference unusedbar at all, is that intended?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH] testsuite: Enable fenv effective-targets for C++

2020-07-30 Thread Jonathan Wakely via Gcc-patches
I recently wanted to use { dg-require-effective-target fenv } in a
libstdc++ test, but it uses -std=gnu99 which is only valid for C.

This allows C++ tests to use the fenv and fenv_exceptions
effective-target keywords.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_fenv): Use
-std=gnu++11 instead of -std=gnu99 for C++ tests.
(check_effective_target_fenv_exceptions): Likewise.


I don't actually need to use the 'fenv' target now, but does this seem
like a reasonable change anyway?


commit 727a3c3008038f97b0d43ebd88b2da6b4d4159f6
Author: Jonathan Wakely 
Date:   Thu Jul 30 10:32:03 2020

testsuite: Enable fenv effective-targets for C++

I recently wanted to use { dg-require-effective-target fenv } in a
libstdc++ test, but it uses -std=gnu99 which is only valid for C.

This allows C++ tests to use the fenv and fenv_exceptions
effective-target keywords.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_fenv): Use
-std=gnu++11 instead of -std=gnu99 for C++ tests.
(check_effective_target_fenv_exceptions): Likewise.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 57eed3012b9..042334e3a35 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9533,9 +9533,14 @@ proc check_effective_target_avr_tiny { } {
 # Return 1 if  is available.
 
 proc check_effective_target_fenv {} {
+if [check_effective_target_c++] {
+	set std "-std=gnu++11"
+} else {
+	set std "-std=gnu99"
+}
 return [check_no_compiler_messages fenv object {
 	#include 
-} [add_options_for_ieee "-std=gnu99"]]
+} [add_options_for_ieee "$std"]]
 }
 
 # Return 1 if  is available with all the standard IEEE
@@ -9544,6 +9549,11 @@ proc check_effective_target_fenv {} {
 # exceptions, those need to be specified in the testcases.)
 
 proc check_effective_target_fenv_exceptions {} {
+if [check_effective_target_c++] {
+	set std "-std=gnu++11"
+} else {
+	set std "-std=gnu99"
+}
 return [check_runtime fenv_exceptions {
 	#include 
 	#include 
@@ -9572,7 +9582,7 @@ proc check_effective_target_fenv_exceptions {} {
 	  else
 	abort ();
 	}
-} [add_options_for_ieee "-std=gnu99"]]
+} [add_options_for_ieee "$std"]]
 }
 
 # Return 1 if -fexceptions is supported.


Re: [PATCH] debug/96383 - emit debug info for used external functions

2020-07-30 Thread Richard Biener
On Thu, 30 Jul 2020, Andreas Schwab wrote:

> On Jul 30 2020, Richard Biener wrote:
> 
> > diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c 
> > b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
> > new file mode 100644
> > index 000..ede30f9a95e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr96383-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-g -gdwarf -dA" } */
> > +
> > +extern void foo (int);
> > +extern void bar (int);
> > +
> > +int main()
> > +{
> > +  foo (1);
> > +}
> > +
> > +/* We want subprogram DIEs for both foo and main and a DIE for
> > +   the formal parameter of foo.  We do not want a DIE for
> > +   unusedbar.  */
> > +/* { dg-final { scan-assembler-times "DW_TAG_subprogram" 4 } } */
> > +/* { dg-final { scan-assembler-times "DW_TAG_formal_parameter" 2 } } */
> > +/* { dg-final { scan-assembler-not "unusedbar" } } */
> 
> The test doesn't reference unusedbar at all, is that intended?

Whoops, fixed.

Richard.


[committed] d: Refactor matching and lowering of intrinsic functions.

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch cleans up D intrinsic handling in the front-end.

Intrinsics are now matched explicitly, rather than through a common
alias where there are multiple overrides for a common intrinsic.
Where there is a corresponding DECL_FUNCTION_CODE, that is now stored in
the D intrinsic array.  All run-time std.math intrinsics have been
removed, as the library implementation already forwards to core.math.

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain

---
gcc/d/ChangeLog:

* d-tree.h (DEF_D_INTRINSIC): Rename second argument from A to B.
* intrinsics.cc (intrinsic_decl): Add built_in field.
(DEF_D_INTRINSIC): Rename second argument from ALIAS to BUILTIN.
(maybe_set_intrinsic): Handle new intrinsic codes.
(expand_intrinsic_bt): Likewise.
(expand_intrinsic_checkedint): Likewise.
(expand_intrinsic_bswap): Remove.
(expand_intrinsic_sqrt): Remove.
(maybe_expand_intrinsic): Group together intrinsic cases that map
directly to gcc built-ins.
* intrinsics.def (DEF_D_BUILTIN): Rename second argument from A to B.
Update all callers to pass equivalent DECL_FUNCTION_CODE.
(DEF_CTFE_BUILTIN): Likewise.
(STD_COS): Remove intrinsic.
(STD_FABS): Remove intrinsic.
(STD_LDEXP): Remove intrinsic.
(STD_RINT): Remove intrinsic.
(STD_RNDTOL): Remove intrinsic.
(STD_SIN): Remove intrinsic.
(STD_SQRTF): Remove intrinsic.
(STD_SQRT): Remove intrinsic.
(STD_SQRTL): Remove intrinsic.

gcc/testsuite/ChangeLog:

* gdc.dg/intrinsics.d: New test.
---
 gcc/d/d-tree.h|   2 +-
 gcc/d/intrinsics.cc   | 342 ++
 gcc/d/intrinsics.def  | 170 +++
 gcc/testsuite/gdc.dg/intrinsics.d | 117 ++
 4 files changed, 358 insertions(+), 273 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/intrinsics.d

diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index b4b832f9ad8..d92418a55c9 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -80,7 +80,7 @@ enum level_kind
 
 enum intrinsic_code
 {
-#define DEF_D_INTRINSIC(CODE, A, N, M, D, C) INTRINSIC_ ## CODE,
+#define DEF_D_INTRINSIC(CODE, B, N, M, D, C) INTRINSIC_ ## CODE,
 
 #include "intrinsics.def"
 
diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc
index ba7e6aef6ed..7ef1ec5ea20 100644
--- a/gcc/d/intrinsics.cc
+++ b/gcc/d/intrinsics.cc
@@ -40,9 +40,12 @@ along with GCC; see the file COPYING3.  If not see
 
 struct intrinsic_decl
 {
-  /* The DECL_FUNCTION_CODE of this decl.  */
+  /* The DECL_INTRINSIC_CODE of this decl.  */
   intrinsic_code code;
 
+  /* The DECL_FUNCTION_CODE of this decl, if it directly maps to any.  */
+  built_in_function built_in;
+
   /* The name of the intrinsic.  */
   const char *name;
 
@@ -58,8 +61,8 @@ struct intrinsic_decl
 
 static const intrinsic_decl intrinsic_decls[] =
 {
-#define DEF_D_INTRINSIC(CODE, ALIAS, NAME, MODULE, DECO, CTFE) \
-{ INTRINSIC_ ## ALIAS, NAME, MODULE, DECO, CTFE },
+#define DEF_D_INTRINSIC(CODE, BUILTIN, NAME, MODULE, DECO, CTFE) \
+{ INTRINSIC_ ## CODE, BUILT_IN_ ## BUILTIN, NAME, MODULE, DECO, CTFE },
 
 #include "intrinsics.def"
 
@@ -144,11 +147,28 @@ maybe_set_intrinsic (FuncDeclaration *decl)
case INTRINSIC_C_VA_ARG:
case INTRINSIC_VASTART:
case INTRINSIC_ADDS:
+   case INTRINSIC_ADDSL:
+   case INTRINSIC_ADDU:
+   case INTRINSIC_ADDUL:
case INTRINSIC_SUBS:
+   case INTRINSIC_SUBSL:
+   case INTRINSIC_SUBU:
+   case INTRINSIC_SUBUL:
case INTRINSIC_MULS:
+   case INTRINSIC_MULSL:
+   case INTRINSIC_MULU:
+   case INTRINSIC_MULUI:
+   case INTRINSIC_MULUL:
case INTRINSIC_NEGS:
-   case INTRINSIC_VLOAD:
-   case INTRINSIC_VSTORE:
+   case INTRINSIC_NEGSL:
+   case INTRINSIC_VLOAD8:
+   case INTRINSIC_VLOAD16:
+   case INTRINSIC_VLOAD32:
+   case INTRINSIC_VLOAD64:
+   case INTRINSIC_VSTORE8:
+   case INTRINSIC_VSTORE16:
+   case INTRINSIC_VSTORE32:
+   case INTRINSIC_VSTORE64:
  break;
 
case INTRINSIC_POW:
@@ -302,14 +322,33 @@ expand_intrinsic_bt (intrinsic_code intrinsic, tree 
callexp)
 integer_minus_one_node, integer_zero_node);
 
   /* Update the bit as needed, only testing the bit for bt().  */
-  if (intrinsic == INTRINSIC_BT)
-return cond;
+  tree_code code;
+
+  switch (intrinsic)
+{
+case INTRINSIC_BT:
+case INTRINSIC_BT64:
+  return cond;
+
+case INTRINSIC_BTC:
+case INTRINSIC_BTC64:
+  code = BIT_XOR_EXPR;
+  break;
+
+case INTRINSIC_BTR:
+case INTRINSIC_BTR64:
+  bitnum = fold_build1 (BIT_NOT_EXPR, TREE_TYPE (bitnum), bitnum);
+  code = BIT_AND_EXPR;
+  break;

Re: [PATCH] OpenACC: Separate enter/exit data APIs

2020-07-30 Thread Andrew Stubbs

On 29/07/2020 15:05, Andrew Stubbs wrote:
This patch does not implement anything new, but simply separates OpenACC 
'enter data' and 'exit data' into two libgomp API functions.  The 
original API name is kept for backward compatibility, but no longer 
referenced by the compiler.


The previous implementation assumed that it would always be possible to 
infer which kind of pragma it was dealing with from the context, but 
there are a few exceptions, and I want to add one more: zero-length arrays.


By cleaning this up I will be free to add the new feature without the 
reference counting getting broken.


Here's an updated patch that handles the variadic functions properly.

Thanks to Julian for pointing out my misunderstanding.

OK for mainline (and backport to OG10)?

Andrew
OpenACC: Separate enter/exit data APIs

Move the OpenACC enter and exit data directives from using a single builtin
to having one each.  For most purposes it was easy to tell which was which,
from the directives given, but there are some exceptions.  In particular,
zero-length array copies are indistiguishable, but we still want reference
counting to work.

gcc/ChangeLog:

	* gimple-pretty-print.c (dump_gimple_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	* gimple.h (enum gf_mask): Likewise.
	(is_gimple_omp_oacc): Likewise.
	* gimplify.c (gimplify_omp_target_update): Likewise.
	* omp-builtins.def (BUILT_IN_GOACC_ENTER_EXIT_DATA): Delete.
	(BUILT_IN_GOACC_ENTER_DATA): Add new.
	(BUILT_IN_GOACC_EXIT_DATA): Add new.
	* omp-expand.c (expand_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	(build_omp_regions_1): Likewise.
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (check_omp_nesting_restrictions): Likewise.
	(lower_omp_target): Likewise.

libgomp/ChangeLog:

	* libgomp.map: Add GOACC_enter_data and GOACC_exit_data.
	* libgomp_g.h (GOACC_enter_exit_data): Delete.
	(GOACC_enter_data): New prototype.
	(GOACC_exit_data) New prototype.:
	* oacc-mem.c (GOACC_enter_exit_data): Move most of the content ...
	(GOACC_enter_exit_data_internal): ... here.
	(GOACC_enter_data): New function.
	(GOACC_exit_data) New function.:
	* oacc-parallel.c (GOACC_declare): Replace GOACC_enter_exit_data with
	  GOACC_enter_data and GOACC_exit_data.

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index e05b770138e..a4dfc493588 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1694,8 +1694,11 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
 case GF_OMP_TARGET_KIND_OACC_UPDATE:
   kind = " oacc_update";
   break;
-case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
-  kind = " oacc_enter_exit_data";
+case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+  kind = " oacc_enter_data";
+  break;
+case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
+  kind = " oacc_exit_data";
   break;
 case GF_OMP_TARGET_KIND_OACC_DECLARE:
   kind = " oacc_declare";
diff --git a/gcc/gimple.h b/gcc/gimple.h
index d64c47a7d0d..681000d2ae4 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -180,9 +180,10 @@ enum gf_mask {
 GF_OMP_TARGET_KIND_OACC_SERIAL = 7,
 GF_OMP_TARGET_KIND_OACC_DATA = 8,
 GF_OMP_TARGET_KIND_OACC_UPDATE = 9,
-GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
+GF_OMP_TARGET_KIND_OACC_ENTER_DATA = 10,
 GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
 GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
+GF_OMP_TARGET_KIND_OACC_EXIT_DATA = 13,
 GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
 GF_OMP_TEAMS_HOST		= 1 << 1,
 
@@ -6587,7 +6588,8 @@ is_gimple_omp_oacc (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_UPDATE:
-	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
+	case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+	case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
 	case GF_OMP_TARGET_KIND_OACC_DECLARE:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	  return true;
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 15dfee903ab..6948c1ccdbb 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12950,8 +12950,11 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   switch (TREE_CODE (expr))
 {
 case OACC_ENTER_DATA:
+  kind = GF_OMP_TARGET_KIND_OACC_ENTER_DATA;
+  ort = ORT_ACC;
+  break;
 case OACC_EXIT_DATA:
-  kind = GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA;
+  kind = GF_OMP_TARGET_KIND_OACC_EXIT_DATA;
   ort = ORT_ACC;
   break;
 case OACC_UPDATE:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index f461d60e52b..ab45eecb752 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -35,7 +35,10 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_START, "GOACC_data_start",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, 

[committed] d: Implement core.bitop.rol() and core.bitop.ror() as intrinsics.

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

Following on from the last patch, this implements rol() and ror() in the
core.bitop module as D intrinsics.

Bootstrapped and regression tested on x86_64-linux-gnu and committed to
mainline.

Regards
Iain

---
gcc/d/ChangeLog:

* intrinsics.cc (expand_intrinsic_rotate): Add function.
(maybe_expand_intrinsic): Handle rol and ror intrinsics.
* intrinsics.def (ROL): Add intrinsic.
(ROL_TIARG): Add intrinsic.
(ROR): Add intrinsic.
(ROR_TIARG): Add intrinsic.

gcc/testsuite/ChangeLog:

* gdc.dg/intrinsics.d: Add ror and rol tests.
---
 gcc/d/intrinsics.cc   | 56 +++
 gcc/d/intrinsics.def  |  5 +++
 gcc/testsuite/gdc.dg/intrinsics.d |  6 
 3 files changed, 67 insertions(+)

diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc
index 7ef1ec5ea20..28667c63e17 100644
--- a/gcc/d/intrinsics.cc
+++ b/gcc/d/intrinsics.cc
@@ -390,6 +390,56 @@ expand_intrinsic_popcnt (tree callexp)
   return call_builtin_fn (callexp, code, 1, arg);
 }
 
+/* Expand a front-end intrinsic call to INTRINSIC, which is either a call to
+   rol() or ror().  These intrinsics expect to take one or two arguments,
+   the signature to which can be either:
+
+   T rol(T) (const T value, const uint count);
+   T rol(uint count, T) (const T value);
+   T ror(T) (const T value, const uint count);
+   T ror(uint count, T) (const T value);
+
+   This bitwise rotates VALUE left or right by COUNT bit positions.  */
+
+static tree
+expand_intrinsic_rotate (intrinsic_code intrinsic, tree callexp)
+{
+  tree type = TREE_TYPE (callexp);
+  tree value = CALL_EXPR_ARG (callexp, 0);
+  tree count;
+  tree_code code;
+
+  /* Get the equivalent tree code for the intrinsic.  */
+  if (intrinsic == INTRINSIC_ROL || intrinsic == INTRINSIC_ROL_TIARG)
+code = LROTATE_EXPR;
+  else if (intrinsic == INTRINSIC_ROR || intrinsic == INTRINSIC_ROR_TIARG)
+code = RROTATE_EXPR;
+  else
+gcc_unreachable ();
+
+  /* Get the COUNT parameter.  Either from the call expression arguments or the
+ template instantiation arguments.  */
+  if (intrinsic == INTRINSIC_ROL || intrinsic == INTRINSIC_ROR)
+count = CALL_EXPR_ARG (callexp, 1);
+  else
+{
+  tree callee = CALL_EXPR_FN (callexp);
+
+  if (TREE_CODE (callee) == ADDR_EXPR)
+   callee = TREE_OPERAND (callee, 0);
+
+  /* Retrieve from the encoded template instantation.  */
+  TemplateInstance *ti = DECL_LANG_FRONTEND (callee)->isInstantiated ();
+  gcc_assert (ti && ti->tiargs && ti->tiargs->length == 2);
+
+  Expression *e = isExpression ((*ti->tiargs)[0]);
+  gcc_assert (e && e->op == TOKint64);
+  count = build_expr (e, true);
+}
+
+  return fold_build2 (code, type, value, count);
+}
+
 /* Expand a front-end intrinsic call to copysign().  This takes two arguments,
the signature to which can be either:
 
@@ -737,6 +787,12 @@ maybe_expand_intrinsic (tree callexp)
 case INTRINSIC_POPCNT64:
   return expand_intrinsic_popcnt (callexp);
 
+case INTRINSIC_ROL:
+case INTRINSIC_ROL_TIARG:
+case INTRINSIC_ROR:
+case INTRINSIC_ROR_TIARG:
+  return expand_intrinsic_rotate (intrinsic, callexp);
+
 case INTRINSIC_BSWAP32:
 case INTRINSIC_BSWAP64:
 case INTRINSIC_CEIL:
diff --git a/gcc/d/intrinsics.def b/gcc/d/intrinsics.def
index 0c32126b1fa..5b8cb712264 100644
--- a/gcc/d/intrinsics.def
+++ b/gcc/d/intrinsics.def
@@ -59,6 +59,11 @@ DEF_D_BUILTIN (BSWAP64, BSWAP64, "bswap", "core.bitop", 
"FNaNbNiNfmZm")
 DEF_D_BUILTIN (POPCNT32, NONE, "popcnt", "core.bitop", "FNaNbNiNfkZi")
 DEF_D_BUILTIN (POPCNT64, NONE, "popcnt", "core.bitop", "FNaNbNiNfmZi")
 
+DEF_D_BUILTIN (ROL, NONE, "rol", "core.bitop", "FNaI1TkZI1T")
+DEF_D_BUILTIN (ROL_TIARG, NONE, "rol", "core.bitop", "FNaI1TZI1T")
+DEF_D_BUILTIN (ROR, NONE, "ror", "core.bitop", "FNaI1TkZI1T")
+DEF_D_BUILTIN (ROR_TIARG, NONE, "ror", "core.bitop", "FNaI1TZI1T")
+
 DEF_D_BUILTIN (VLOAD8, NONE, "volatileLoad", "core.bitop", "FNbNiNfPhZh")
 DEF_D_BUILTIN (VLOAD16, NONE, "volatileLoad", "core.bitop", "FNbNiNfPtZt")
 DEF_D_BUILTIN (VLOAD32, NONE, "volatileLoad", "core.bitop", "FNbNiNfPkZk")
diff --git a/gcc/testsuite/gdc.dg/intrinsics.d 
b/gcc/testsuite/gdc.dg/intrinsics.d
index e10c06dd41a..5888361a438 100644
--- a/gcc/testsuite/gdc.dg/intrinsics.d
+++ b/gcc/testsuite/gdc.dg/intrinsics.d
@@ -38,6 +38,12 @@ void test_volatileStore(ubyte *a, ubyte b) { return 
volatileStore(a, b); }
 void test_volatileStore(ushort *a, ushort b) { return volatileStore(a, b); }
 void test_volatileStore(uint *a, uint b) { return volatileStore(a, b); }
 void test_volatileStore(ulong *a, ulong b) { return volatileStore(a, b); }
+// { dg-final { scan-tree-dump-not " rol " "original" } }
+ubyte test_rol(ubyte a, uint b) { return rol!ubyte(a, b); }
+uint test_rol(uint a) { return rol!(1, uint)(a); }
+// { dg-final { scan-tree-dump-not " ror " "original" } }
+ushort test_ror(ushort a, uint b) { 

Re: [PATCH] debug/96383 - emit debug info for used external functions

2020-07-30 Thread Richard Biener
On Thu, 30 Jul 2020, Jakub Jelinek wrote:

> On Thu, Jul 30, 2020 at 11:55:19AM +0200, Richard Biener wrote:
> > For cc1 this affects debug information size as follows:
> > 
> >  VM SIZE FILE SIZE
> >  ++ GROWING   ++
> >   [ = ]   0 .debug_info   +1.63Mi  +1.3%
> >   [ = ]   0 .debug_str +263Ki  +3.4%
> >   [ = ]   0 .debug_abbrev  +101Ki  +4.9%
> >   [ = ]   0 .debug_line   +5.71Ki  +0.0%
> >+44% +16 [Unmapped]+48  +1.2%
> > 
> >  -- SHRINKING --
> >   [ = ]   0 .debug_loc   -213  -0.0%
> >   -0.0% -48 .text -48  -0.0%
> >   [ = ]   0 .debug_ranges -16  -0.0%
> > 
> >   -0.0% -32 TOTAL +1.99Mi  +0.6%
> > 
> > and DWARF compression via DWZ can only shave off minor bits
> > here.
> 
> I'm surprised DWZ doesn't shave off more, the prototypes in
> the different TUs should be the same and therefore mergeable.

Yeah, DWZ shaves off 60% of the .debug_info overhead, but due
to the size of cc1 I have not tried figuring out what the difference
in the end will be.

> What could help a little if DWZ is able to merge the prototypes with
> definition DIE (verify that the type is the same and there is no information
> the definition doesn't have appart from DW_AT_external).

DWZ could split out a shareable specification DIE from definition
DIEs and use DW_AT_specification.  When we bring declarations into
scope elsewhere we have to avoid refering to definitions there.

> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > OK for trunk and backports?
> 
> I'd go with it for trunk and 10.2.1 now and consider further backports
> later.  Maybe even defer the 10.2.1 backport for two weeks.

So testing revealed I have to be more careful what nodes I create
DIEs for.  For example for g++.dg/debug/dwarf2/cdtor-1.C we'd now
end up with DIEs for all of the DTOR aliases (and IIRC aliases are
not subject to unused removal, nor unused members of a comdat group).

In particular guarding the nodes with

if (!cnode->alias && !cnode->thunk.thunk_p
&& !fndecl_built_in_p (cnode->decl))

seems to fix the few regressions observed.  I'm retesting fully with
that.

Richard.


[PATCH] x86_64: Integer min/max improvements.

2020-07-30 Thread Roger Sayle

This patch tweaks the way that min and max are expanded, so that the
semantics of these operations are visible to the early RTL optimization
passes, until split into explicit comparison and conditional move
instructions. The good news is that i386.md already contains all of
the required logic (many thanks to Richard Biener and Uros Bizjak),
but this is currently only to enable scalar-to-vector (STV) synthesis
of min/max instructions.  This change enables this functionality for
all TARGET_CMOVE architectures for SImode, HImode and DImode.

My first attempt to support "min(reg,reg)" as an instruction revealed
why this functionality isn't already enabled: In PR rtl-optimization 91154
we end up generating a cmp instead of a test in gcc.target/i386/minmax-3.c
which has poor performance on AMD Opteron.  My solution to this is to
actually support "min(reg,general_operand)" allowing us to inspect
any immediate constant at the point this operation is split.  This
allows us to use "test" instructions for min/max against 0, 1 and -1.
As an added benefit it allows us to CSE large immediate constants,
reducing code size.

Previously, "int foo(int x) { return max(x,12345); }" would generate:

foo:cmpl$12345, %edi
movl$12345, %eax
cmovge  %edi, %eax
ret

with this patch we instead generate:

foo:movl$12345, %eax
cmpl%eax, %edi
cmovge  %edi, %eax
ret


I've also included a peephole2 to avoid the "movl $0,%eax" instructions
being produced by the register allocator.  Materializing constants as
late as possible reduces register pressure, but for const0_rtx on x86,
it's preferable to use "xor" by moving this set from between a flags
setting operation and its use.  This should also help instruction macro
fusion on some microarchitectures, where test/cmp and the following
instruction can sometimes be combined.

Previously, "int foo(int x) { return max(x,0); }" would generate:

foo:testl   %edi, %edi
movl$0, %eax
cmovns  %edi, %eax
ret

with this patch we instead generate:
foo:xorl%eax, %eax
testl   %edi, %edi
cmovns  %edi, %eax
ret

The benefits of having min/max explicit at the RTL level can be seen
from compiling the following example with "-O2 -fno-tree-reassoc".


#define max(a,b) (((a) > (b))? (a) : (b))

int foo(int x)
{
  int y = max(x,5);
  return max(y,10);
}

where GCC currently produces:

foo:cmpl$5, %edi
movl$5, %eax
movl$10, %edx
cmovge  %edi, %eax
cmpl$10, %eax
cmovl   %edx, %eax
ret

and with this patch it instead now produces:

foo:movl$10, %eax
cmpl%eax, %edi
cmovge  %edi, %eax
ret


The original motivation was from looking at a performance critical
function in a quantum mechanics code, that performed MIN_EXPR and
MAX_EXPR of the same arguments (effectively a two-element sort),
where GCC was performing the comparison twice.  I'd hoped that it
might be possible to fuse these together, perhaps in combine, but
this patch is an intermediate step towards that goal.

This patch has been tested on x86_64-pc-linux-gnu with a make
bootstrap followed by make -k check with no new regressions.
Ok for mainline?


2020-07-30  Roger Sayle  

* config/i386/i386.md (MAXMIN_IMODE): No longer needed.
(3):  Support SWI248 and general_operand for
second operand, when TARGET_CMOVE.
(3_1 splitter): Optimize comparisons against
0, 1 and -1 to use "test" instead of "cmp".
(*di3_doubleword): Likewise, allow general_operand
and enable on TARGET_CMOVE.
(peephole2): Convert clearing a register after a flag setting
instruction into an xor followed by the original flag setter.


Many thanks in advance.  Almost all of the credit goes to Uros and 
Richard for already implementing this feature, I've just copied the
transformations from optab expansion, that allow it to be enabled
without penalty (this late in the compilation).

Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b24a455..717d55f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18809,45 +18809,68 @@
 
 ;; min/max patterns
 
-(define_mode_iterator MAXMIN_IMODE
-  [(SI "TARGET_SSE4_1") (DI "TARGET_AVX512VL")])
 (define_code_attr maxmin_rel
   [(smax "GE") (smin "LE") (umax "GEU") (umin "LEU")])
 
 (define_expand "3"
   [(parallel
-[(set (match_operand:MAXMIN_IMODE 0 "register_operand")
- (maxmin:MAXMIN_IMODE
-   (match_operand:MAXMIN_IMODE 1 "register_operand")
-   (match_operand:MAXMIN_IMODE 2 "nonimmediate_operand")))
+[(set (match_operand:SWI248 0 "register_operand")
+ (maxmin:SWI248
+   (match_operand:SWI248 1 "register_operand")
+   (match_operand:SWI248 2 "general_operand")))
  (clobber (reg:CC FLAGS_REG))])]
-  "TARG

[committed] libstdc++: Make testsuite usable with -fno-exceptions

2020-07-30 Thread Jonathan Wakely via Gcc-patches
Previously it was not possible to add -fno-exceptions to the testsuite
flags, because some files that are compiled by the v3-build_support
procedure failed with exceptions disabled.

This adjusts those files to still compile without exceptions (with
degraded functionality in some cases).

The sole testcase that explicitly checks for -fno-exceptions has also
been adjusted to use the more robust exceptions_enabled effective-target
keyword from gcc/testsuite/lib/target-supports.exp.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/bool/72847.cc: Use the
exceptions_enabled effective-target keyword instead of
checking for an explicit -fno-exceptions option.
* testsuite/util/testsuite_abi.cc (examine_symbol): Remove
redundant try-catch.
* testsuite/util/testsuite_allocator.h [!__cpp_exceptions]:
Do not define check_allocate_max_size and memory_resource.
* testsuite/util/testsuite_containers.h: Replace comment with
#error if wrong standard dialect used.
* testsuite/util/testsuite_shared.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.


commit 4c27c6584d0c15926f57ac40f931e238cf0b3110
Author: Jonathan Wakely 
Date:   Thu Jul 30 12:23:54 2020

libstdc++: Make testsuite usable with -fno-exceptions

Previously it was not possible to add -fno-exceptions to the testsuite
flags, because some files that are compiled by the v3-build_support
procedure failed with exceptions disabled.

This adjusts those files to still compile without exceptions (with
degraded functionality in some cases).

The sole testcase that explicitly checks for -fno-exceptions has also
been adjusted to use the more robust exceptions_enabled effective-target
keyword from gcc/testsuite/lib/target-supports.exp.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/vector/bool/72847.cc: Use the
exceptions_enabled effective-target keyword instead of
checking for an explicit -fno-exceptions option.
* testsuite/util/testsuite_abi.cc (examine_symbol): Remove
redundant try-catch.
* testsuite/util/testsuite_allocator.h [!__cpp_exceptions]:
Do not define check_allocate_max_size and memory_resource.
* testsuite/util/testsuite_containers.h: Replace comment with
#error if wrong standard dialect used.
* testsuite/util/testsuite_shared.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/72847.cc 
b/libstdc++-v3/testsuite/23_containers/vector/bool/72847.cc
index 26260e762af..c4fbc75abe8 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/bool/72847.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/72847.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-skip-if "" { *-*-* } { "-fno-exceptions" } }
+// { dg-require-effective-target exceptions_enabled }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc 
b/libstdc++-v3/testsuite/util/testsuite_abi.cc
index fd8224b6789..f4bd319855a 100644
--- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
@@ -362,14 +362,9 @@ get_symbol(const string& name, const symbols& s)
 void
 examine_symbol(const char* name, const char* file)
 {
-  try
-{
-  symbols s = create_symbols(file);
-  const symbol& sym = get_symbol(name, s);
-  sym.print();
-}
-  catch(...)
-{ __throw_exception_again; }
+symbols s = create_symbols(file);
+const symbol& sym = get_symbol(name, s);
+sym.print();
 }
 
 int
diff --git a/libstdc++-v3/testsuite/util/testsuite_allocator.h 
b/libstdc++-v3/testsuite/util/testsuite_allocator.h
index e52ef788467..e030ed5500c 100644
--- a/libstdc++-v3/testsuite/util/testsuite_allocator.h
+++ b/libstdc++-v3/testsuite/util/testsuite_allocator.h
@@ -257,6 +257,7 @@ namespace __gnu_test
   return true;
 }
 
+#if __cpp_exceptions
   template
 bool
 check_allocate_max_size()
@@ -276,6 +277,7 @@ namespace __gnu_test
}
   throw;
 }
+#endif
 
   // A simple allocator which can be constructed endowed of a given
   // "personality" (an integer), queried in operator== to simulate the
@@ -761,7 +763,7 @@ namespace __gnu_test
 #endif // C++11
 
 #if __cplusplus >= 201703L
-#if __cpp_aligned_new && __cpp_rtti
+#if __cpp_aligned_new && __cpp_rtti && __cpp_exceptions
   // A concrete memory_resource, with error checking.
   class memory_resource : public std::pmr::memory_resource
   {
diff --git a/libstdc++-v3/testsuite/util/testsuite_containers.h 
b/libstdc++-v3/testsuite/util/testsuite_containers.h
index 8566af17c4a..33259ae3601 100644
--- a/libstdc++-v3/testsuite/util/testsuite_containers.h
+++ b/libstdc++-v3/testsuite/util/testsuite_containers.h
@@ -408,8 +408,9 @@ namespace __gnu_test
   void
   erase_external_ite

[committed] libstdc++: Add options for ieee float to relevant tests

2020-07-30 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/20_util/from_chars/4.cc: Use dg-add-options ieee.
* testsuite/29_atomics/atomic_float/1.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.


commit eeb45f8a41f7214d8682151bff7c7f00b06a
Author: Jonathan Wakely 
Date:   Thu Jul 30 12:23:55 2020

libstdc++: Add options for ieee float to relevant tests

libstdc++-v3/ChangeLog:

* testsuite/20_util/from_chars/4.cc: Use dg-add-options ieee.
* testsuite/29_atomics/atomic_float/1.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc 
b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
index e7127ed0c48..23fc990bb38 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
@@ -17,6 +17,7 @@
 
 //  is supported in C++14 as a GNU extension
 // { dg-do run { target c++14 } }
+// { dg-add-options ieee }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc
index a38b040ee8f..1c2d273ce33 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc
@@ -15,6 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// { dg-add-options ieee }
 // { dg-options "-std=gnu++2a" }
 // { dg-do run { target c++2a } }
 


[committed] libstdc++: cv bool can't be an integer-like type (LWG 3467)

2020-07-30 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (__detail::__cv_bool): New
helper concept.
(__detail::__integral_nonbool): Likewise.
(__detail::__is_integer_like): Use __integral_nonbool.
* testsuite/std/ranges/access/lwg3467.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit 9e67b4356efe4291fdb170fb116092f4ed9d2f05
Author: Jonathan Wakely 
Date:   Thu Jul 30 12:23:55 2020

libstdc++: cv bool can't be an integer-like type (LWG 3467)

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (__detail::__cv_bool): New
helper concept.
(__detail::__integral_nonbool): Likewise.
(__detail::__is_integer_like): Use __integral_nonbool.
* testsuite/std/ranges/access/lwg3467.cc: New test.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index e56abe2be6c..aad6c2dcd91 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -520,7 +520,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
 template
-  concept __is_integer_like = integral<_Tp>
+  concept __cv_bool = same_as;
+
+template
+  concept __integral_nonbool = integral<_Tp> && !__cv_bool<_Tp>;
+
+template
+  concept __is_integer_like = __integral_nonbool<_Tp>
|| same_as<_Tp, __max_diff_type> || same_as<_Tp, __max_size_type>;
 
 template
diff --git a/libstdc++-v3/testsuite/std/ranges/access/lwg3467.cc 
b/libstdc++-v3/testsuite/std/ranges/access/lwg3467.cc
new file mode 100644
index 000..23418b476a0
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/access/lwg3467.cc
@@ -0,0 +1,33 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// LWG 3467. bool can't be an integer-like type
+
+#include 
+
+struct R
+{
+  bool size() const;
+};
+
+template
+concept sized = requires(T t) { std::ranges::size(t); };
+
+static_assert( !sized );


Re: [PATCH][Hashtable 3/6] Fix noexcept qualifications

2020-07-30 Thread Jonathan Wakely via Gcc-patches

On 29/07/20 11:33 +0200, François Dumont via Libstdc++ wrote:
I eventually committed the attached patch which consider all your 
remarks and moreover put back the move constructor implementation 
where it used to be (not inline). It limits the size of the patch.


I also added comments on true_type/false_type new usages like you 
advised on [Hashtable 0/6] thread.


Thanks!




Re: [PATCH] x86_64: Integer min/max improvements.

2020-07-30 Thread Richard Biener via Gcc-patches
On Thu, Jul 30, 2020 at 1:24 PM Roger Sayle  wrote:
>
>
> This patch tweaks the way that min and max are expanded, so that the
> semantics of these operations are visible to the early RTL optimization
> passes, until split into explicit comparison and conditional move
> instructions. The good news is that i386.md already contains all of
> the required logic (many thanks to Richard Biener and Uros Bizjak),
> but this is currently only to enable scalar-to-vector (STV) synthesis
> of min/max instructions.  This change enables this functionality for
> all TARGET_CMOVE architectures for SImode, HImode and DImode.
>
> My first attempt to support "min(reg,reg)" as an instruction revealed
> why this functionality isn't already enabled: In PR rtl-optimization 91154
> we end up generating a cmp instead of a test in gcc.target/i386/minmax-3.c
> which has poor performance on AMD Opteron.  My solution to this is to
> actually support "min(reg,general_operand)" allowing us to inspect
> any immediate constant at the point this operation is split.  This
> allows us to use "test" instructions for min/max against 0, 1 and -1.
> As an added benefit it allows us to CSE large immediate constants,
> reducing code size.
>
> Previously, "int foo(int x) { return max(x,12345); }" would generate:
>
> foo:cmpl$12345, %edi
> movl$12345, %eax
> cmovge  %edi, %eax
> ret
>
> with this patch we instead generate:
>
> foo:movl$12345, %eax
> cmpl%eax, %edi
> cmovge  %edi, %eax
> ret
>
>
> I've also included a peephole2 to avoid the "movl $0,%eax" instructions
> being produced by the register allocator.  Materializing constants as
> late as possible reduces register pressure, but for const0_rtx on x86,
> it's preferable to use "xor" by moving this set from between a flags
> setting operation and its use.  This should also help instruction macro
> fusion on some microarchitectures, where test/cmp and the following
> instruction can sometimes be combined.
>
> Previously, "int foo(int x) { return max(x,0); }" would generate:
>
> foo:testl   %edi, %edi
> movl$0, %eax
> cmovns  %edi, %eax
> ret
>
> with this patch we instead generate:
> foo:xorl%eax, %eax
> testl   %edi, %edi
> cmovns  %edi, %eax
> ret
>
> The benefits of having min/max explicit at the RTL level can be seen
> from compiling the following example with "-O2 -fno-tree-reassoc".
>
>
> #define max(a,b) (((a) > (b))? (a) : (b))
>
> int foo(int x)
> {
>   int y = max(x,5);
>   return max(y,10);
> }
>
> where GCC currently produces:
>
> foo:cmpl$5, %edi
> movl$5, %eax
> movl$10, %edx
> cmovge  %edi, %eax
> cmpl$10, %eax
> cmovl   %edx, %eax
> ret
>
> and with this patch it instead now produces:
>
> foo:movl$10, %eax
> cmpl%eax, %edi
> cmovge  %edi, %eax
> ret
>
>
> The original motivation was from looking at a performance critical
> function in a quantum mechanics code, that performed MIN_EXPR and
> MAX_EXPR of the same arguments (effectively a two-element sort),
> where GCC was performing the comparison twice.  I'd hoped that it
> might be possible to fuse these together, perhaps in combine, but
> this patch is an intermediate step towards that goal.
>
> This patch has been tested on x86_64-pc-linux-gnu with a make
> bootstrap followed by make -k check with no new regressions.
> Ok for mainline?

May I ask for a testcase or two?  You have good examples above.
Btw, I'm sure some variants of those are in bugzilla as well.

Richard.

>
> 2020-07-30  Roger Sayle  
>
> * config/i386/i386.md (MAXMIN_IMODE): No longer needed.
> (3):  Support SWI248 and general_operand for
> second operand, when TARGET_CMOVE.
> (3_1 splitter): Optimize comparisons against
> 0, 1 and -1 to use "test" instead of "cmp".
> (*di3_doubleword): Likewise, allow general_operand
> and enable on TARGET_CMOVE.
> (peephole2): Convert clearing a register after a flag setting
> instruction into an xor followed by the original flag setter.
>
>
> Many thanks in advance.  Almost all of the credit goes to Uros and
> Richard for already implementing this feature, I've just copied the
> transformations from optab expansion, that allow it to be enabled
> without penalty (this late in the compilation).
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


Re: [PATCH] Require CET support only for the final GCC build

2020-07-30 Thread Richard Biener
On Wed, 29 Jul 2020, Joseph Myers wrote:

> On Wed, 29 Jul 2020, Richard Biener wrote:
> 
> > Note that any workable solution is fine with me, I just
> > don't feel comfortable approving the solution involving
> > ../curr_stage and friends.  Joseph, would HJs latest
> > patch be OK technically?
> 
> Yes, I think that's OK.

Thus the patch is OK then.

Thanks HJ,
Richard.


[RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-07-30 Thread Joshua via Gcc-patches
From: cooper.joshua 

gcc/

* config/riscv/riscv.c (asan_shadow_offset): Implement the offset of 
asan shadow memory for risc-v.
(asan_shadow_offset): new macro definition.

libsanitizer/

* sanitizer_common/sanitizer_common.h (ModuleArch): New enumerator.
(ModuleArchToString): New architecture option.
* sanitizer_common/sanitizer_platform.h: New macro definition.
* sanitizer_common/sanitizer_symbolizer_libcdep.cpp (GetArgV): New 
architecture option.
---
 gcc/config/riscv/riscv.c  | 11 +++
 libsanitizer/sanitizer_common/sanitizer_common.h  |  5 -
 libsanitizer/sanitizer_common/sanitizer_platform.h|  6 ++
 .../sanitizer_common/sanitizer_symbolizer_libcdep.cpp |  2 ++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index bfb3885..05669c2 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5245,6 +5245,14 @@ riscv_gpr_save_operation_p (rtx op)linux.alibaba
   return true;
 }
 
+/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
+
+static unsigned HOST_WIDE_INT
+riscv_asan_shadow_offset (void)
+{
+  return HOST_WIDE_INT_UC (0x1000);
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5428,6 +5436,9 @@ riscv_gpr_save_operation_p (rtx op)
 #undef TARGET_NEW_ADDRESS_PROFITABLE_P
 #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
 
+#undef TARGET_ASAN_SHADOW_OFFSET
+#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/libsanitizer/sanitizer_common/sanitizer_common.h 
b/libsanitizer/sanitizer_common/sanitizer_common.h
index ac16e0e..ea7dff7 100644
--- a/libsanitizer/sanitizer_common/sanitizer_common.h
+++ b/libsanitizer/sanitizer_common/sanitizer_common.h
@@ -649,7 +649,8 @@ enum ModuleArch {
   kModuleArchARMV7,
   kModuleArchARMV7S,
   kModuleArchARMV7K,
-  kModuleArchARM64
+  kModuleArchARM64,
+  kModuleArchRISCV
 };
 
 // Opens the file 'file_name" and reads up to 'max_len' bytes.
@@ -693,6 +694,8 @@ inline const char *ModuleArchToString(ModuleArch arch) {
   return "armv7k";
 case kModuleArchARM64:
   return "arm64";
+case kModuleArchRISCV:
+  return "riscv";
   }
   CHECK(0 && "Invalid module arch");
   return "";
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform.h 
b/libsanitizer/sanitizer_common/sanitizer_platform.h
index c68bfa2..bf52490 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform.h
+++ b/libsanitizer/sanitizer_common/sanitizer_platform.h
@@ -126,6 +126,12 @@
 # define FIRST_32_SECOND_64(a, b) (a)
 #endif
 
+#if defined(__riscv__)
+# define SANITIZER_RISCV 1
+#else
+# define SANITIZER_RISCV 0
+#endif
+
 #if defined(__x86_64__) && !defined(_LP64)
 # define SANITIZER_X32 1
 #else
diff --git a/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp 
b/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
index 490c6fe..408f57d 100644
--- a/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
@@ -270,6 +270,8 @@ class LLVMSymbolizerProcess : public SymbolizerProcess {
 const char* const kSymbolizerArch = "--default-arch=s390x";
 #elif defined(__s390__)
 const char* const kSymbolizerArch = "--default-arch=s390";
+#elif defined(__riscv__)
+const char* const kSymbolizerArch = "--default-arch=riscv";
 #else
 const char* const kSymbolizerArch = "--default-arch=unknown";
 #endif
-- 
2.7.4



RE: [PATCH] x86_64: Integer min/max improvements.

2020-07-30 Thread Roger Sayle


Hi Richard,

>> This patch tweaks the way that min and max are expanded, so that the 
>> semantics of these operations are visible to the early RTL 
>> optimization passes, until split into explicit comparison and 
>> conditional move instructions.
>
> Btw, I'm sure some variants of those are in bugzilla as well.

Now that you mention it, I'm not sure whether PR rtl-optimization 94543
is a bug at all, but with you and Richard Henderson weighing in, I suspect
that I must be missing something subtle.

The initial text of the bug report complains about an AND of 0xff not being
optimized away, as a zero extension is still present.  I believe that the AND
with 0xff has been completely optimized away, and the remaining movzwl
(not movzbl) is the zero extension of the short result to the integer ABI return
type.  If the result is written as a short to memory, there is no extension 
(or AND).

But perhaps the problem isn't with min/and at all, but about whether function
arguments and return values are guaranteed to be extended on input/return,
so perhaps this is related to SUBREG_PROMOTED_P and friends?

Thoughts?

Cheers,
Roger

> 2020-07-30  Roger Sayle  
>
> * config/i386/i386.md (MAXMIN_IMODE): No longer needed.
> (3):  Support SWI248 and general_operand for
> second operand, when TARGET_CMOVE.
> (3_1 splitter): Optimize comparisons against
> 0, 1 and -1 to use "test" instead of "cmp".
> (*di3_doubleword): Likewise, allow general_operand
> and enable on TARGET_CMOVE.
> (peephole2): Convert clearing a register after a flag setting
> instruction into an xor followed by the original flag setter.
>




Re: [PATCH] Use exact=false for vec_safe_grow{,_cleared} functions.

2020-07-30 Thread Martin Liška

On 7/29/20 3:47 PM, Richard Biener wrote:

On Tue, Jul 28, 2020 at 12:23 PM Martin Liška  wrote:


On 7/27/20 1:31 PM, Richard Biener wrote:

No, add gro*_exact variants and replace existing ones with this, then switch
to exact = false for the non-_exact variants.  Or add a exact=false argument
to all of them and make all existing calls explicitly passing true.


-EBITLAZY


Indeed you are - you're missing to update a lot of vec_safe_grow[_cleared]
calls.  Esp. the lto-streamer ones warrant a , true arg since we explicitly
streamed the number of elements.


Yep, these should probably use exact allocation.



How did you select the ones you updated and those you did not?


Based on testing, I only kept changes in selective scheduling which caused
some ICEs.

(which

is why I asked for a boiler-plate replacement of , true first, then making


This step would require to change about 250 places in the source code :(


the parameter defaulted to false and adjusting those cases you found)


What about rather doing exact = true and changing the few places which don't 
need
it (as they make their of growth calculations)?

Martin


Anyway, I prepared a patch that adds exact=false for all the grow functions.
Apart from selective scheduling, all other consumers seems happy. And I removed
the artificial capacity growth calculations at places mentioned by Honza.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin




Re: [PATCH] x86_64: Integer min/max improvements.

2020-07-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 30, 2020 at 01:55:03PM +0100, Roger Sayle wrote:
> Now that you mention it, I'm not sure whether PR rtl-optimization 94543
> is a bug at all, but with you and Richard Henderson weighing in, I suspect
> that I must be missing something subtle.
> 
> The initial text of the bug report complains about an AND of 0xff not being
> optimized away, as a zero extension is still present.  I believe that the AND
> with 0xff has been completely optimized away, and the remaining movzwl
> (not movzbl) is the zero extension of the short result to the integer ABI 
> return
> type.  If the result is written as a short to memory, there is no extension 
> (or AND).
> 
> But perhaps the problem isn't with min/and at all, but about whether function
> arguments and return values are guaranteed to be extended on input/return,
> so perhaps this is related to SUBREG_PROMOTED_P and friends?
> 
> Thoughts?

Yeah, given that the RTL performs the computation in HImode rather than
SImode, an extension is needed at least from the RTL POV, because there is
no guarantee what behavior will be for the upper bits of the GPR.

(insn 10 6 9 (set (reg:HI 88)
(const_int 255 [0xff])) "x.c":1:56 -1
 (nil))
...
(insn 11 9 12 (set (reg:HI 87)
(if_then_else:HI (leu (reg:CC 17 flags)
(const_int 0 [0]))
(reg/v:HI 84 [ x ])
(reg:HI 88))) "x.c":1:56 -1
 (nil))

If these two (or just one of them) insns were emitted as movw $255, %ax
or cmova %ax, %di rather than movl $255, %eax or cmova %eax, %edi, then
the upper 16 bits would contain the previous content rather than being
cleared.  So, if we want to get rid of the useless zero extension, we'd
need some target specific pass or hook or whatever that would query
instruction attributes/whatever to figure out what can or can't be trusted
to be zero.  E.g. the conditions on when movhi_internal uses mode attribute
of SI vs. HI is quite complicated, but I guess hopefully it is reliable.
I'm afraid for many insns it is not.
And then there is also the on the side knowledge that pretty much all insns
for TARGET_64BIT with GPRs zero extend if they are 32-bit insns, but the
question is if one can trust mode attribute for those.

Another possibility would be to move the zero extensions earlier in this
case, so use set (reg:SI 88) (const_int 255) and 32-bit conditional move
rather than 16-bit, and in those cases perhaps with help of range
information during expansion we could create SUBREGs and use
SUBREG_PROMOTED_* on those.

Jakub



[committed] MAINTAINERS: Add myself for write after approval

2020-07-30 Thread Joe Ramsay
2020-07-30  Joe Ramsay  

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 300c10e..0b825c7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -563,6 +563,7 @@ Vladimir Prus   

 Yao Qi 
 Jerry Quinn
 Easwaran Raman 
+Joe Ramsay 
 Rolf Rasmussen 
 Fritz Reese
 Volker Reichelt

-- 
2.7.4



Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-07-30 Thread Martin Liška

Hello.

What's the reason for sending the same patch multiple times
from a different sender?

Thanks,
Martin


Re: [PATCH] c++: decl_constant_value and unsharing [PR96197]

2020-07-30 Thread Patrick Palka via Gcc-patches
On Thu, 30 Jul 2020, Jason Merrill wrote:

> On 7/21/20 3:07 PM, Patrick Palka wrote:
> > In the testcase from the PR we are seeing excessive memory use (> 5GB)
> > during constexpr evaluation, almost all of which is due to the call to
> > decl_constant_value in the VAR_DECL/CONST_DECL branch of
> > cxx_eval_constant_expression.  We reach here every time we evaluate an
> > ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
> > often, and from there decl_constant_value makes an unshared copy of the
> > VAR_DECL's initializer, even though the unsharing is not needed at this
> > call site (because it is up to callers of cxx_eval_constant_expression
> > to unshare).
> > 
> > To fix this, this patch moves the responsibility of unsharing the result
> > of decl_constant_value, decl_really_constant_value and
> > scalar_constant_value from the callee to the caller.
> 
> How about creating another entry point that doesn't unshare, and using that in
> constexpr evaluation?

Is something like this what you have in mind?  This adds a defaulted
bool parameter to the three routines that controls unsharing (except for
decl_constant_value, which instead needs a new overload if we don't want
to touch its common declaration in c-common.h.)  Bootstrap and regtest
in progress.

-- >8 --

Subject: [PATCH] c++: decl_constant_value and unsharing [PR96197]

In the testcase from the PR we are seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression.  We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
often, and from there decl_constant_value makes an unshared copy of the
VAR_DECL's initializer, even though the unsharing is not needed at this
call site (because callers of cxx_eval_constant_expression already
unshare its result when necessary).

To fix this excessive unsharing, this patch introduces a new defaulted
parameter unshare_p to scalar_constant_value, decl_really_constant_value
and decl_constant_value to allow callers to decide if these routines
should unshare their result before returning.  (Since decl_constant_value
is declared in c-common.h, it instead gets a new overload declared in
cp-tree.h.)

As a simplification, this patch also moves the call to unshare_expr in
constant_value_1 outside of the loop, since calling unshare_expr on a
DECL_P should be a no-op.

Additionally, in unify there is one call to scalar_constant_value that
seems to be dead code since we apparently don't see unlowered
enumerators anymore, so this patch replaces it with an appropriate
gcc_assert.

Now that the the calls to decl_constant_value and
decl_really_constant_value from cxx_eval_constant_expression no longer
unshare their result, memory use during constexpr evaluation for the
testcase from the PR falls from ~5GB to 15MB according to -ftime-report.

gcc/cp/ChangeLog:

PR c++/96197
* constexpr.c (cxx_eval_constant_expression) :
Pass false as the decl_constant_value and
decl_really_constant_value so that they don't unshare their
result.
* cp-tree.h (decl_constant_value): New declaration with an
additional bool parameter.
(scalar_constant_value): Add defaulted bool parameter.
(decl_really_constant_value): Likewise.
* cp-gimplify.c (cp_fold_maybe_rvalue): Call unshare_expr on the
result of decl_constant_value.
* init.c (constant_value_1): Add bool parameter.  Conditionally
unshare the initializer only before returning.
(scalar_constant_value): Add bool parameter and pass it to
constant_value_1
(decl_really_constant_value): Likewise.
(decl_constant_value): New overload with an additional bool
parameter.
* pt.c (tsubst_copy) : Assert
DECL_TEMPLATE_PARM_P and simplify accordingly.

gcc/testsuite/ChangeLog:

PR c++/96197
* g++.dg/cpp1y/constexpr-array8.C: New test.
---
 gcc/cp/constexpr.c|  4 +--
 gcc/cp/cp-tree.h  |  5 +--
 gcc/cp/init.c | 35 ---
 gcc/cp/pt.c   |  7 ++--
 gcc/testsuite/g++.dg/cpp1y/constexpr-array8.C | 18 ++
 5 files changed, 48 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-array8.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 97dcc1b1d10..b1c1d249c6e 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -5695,9 +5695,9 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
  TREE_CONSTANT (r) = true;
}
   else if (ctx->strict)
-   r = decl_really_constant_value (t);
+   r = decl_really_constant_value (t, /*unshare_p=*/false);
   else
-   r = decl_constant_value (t);
+   r = decl_constant_value (t, /*

Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-07-30 Thread Jason Merrill via Gcc-patches

On 7/18/20 2:50 PM, Jakub Jelinek wrote:

Hi!

The following patch adds __builtin_bit_cast builtin, similarly to
clang or MSVC which implement std::bit_cast using such an builtin too.


Great!


It checks the various std::bit_cast requirements, when not constexpr
evaluated acts pretty much like VIEW_CONVERT_EXPR of the source argument
to the destination type and the hardest part is obviously the constexpr
evaluation.  I couldn't use the middle-end native_encode_initializer,
because it needs to handle the C++ CONSTRUCTOR_NO_CLEARING vs. negation of
that


CONSTRUCTOR_NO_CLEARING isn't specific to C++.


value initialization of missing members if there are any etc.


Doesn't native_encode_initializer handle omitted initializers already? 
Any elements for which the usual zero-initialization is wrong should 
already have explicit initializers by the time we get here.



needs to handle bitfields even if they don't have an integral representative


Can we use TYPE_MODE for this?

These all sound like things that could be improved in 
native_encode_initializer rather than duplicate a complex function.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-07-18  Jakub Jelinek  

PR libstdc++/93121
* c-common.h (enum rid): Add RID_BUILTIN_BIT_CAST.
* c-common.c (c_common_reswords): Add __builtin_bit_cast.

* cp-tree.h (cp_build_bit_cast): Declare.
* cp-tree.def (BIT_CAST_EXPR): New tree code.
* cp-objcp-common.c (names_builtin_p): Handle RID_BUILTIN_BIT_CAST.
(cp_common_init_ts): Handle BIT_CAST_EXPR.
* cxx-pretty-print.c (cxx_pretty_printer::postfix_expression):
Likewise.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_BIT_CAST.
* semantics.c (cp_build_bit_cast): New function.
* tree.c (cp_tree_equal): Handle BIT_CAST_EXPR.
(cp_walk_subtrees): Likewise.
* pt.c (tsubst_copy): Likewise.
* constexpr.c (check_bit_cast_type, cxx_find_bitfield_repr_type,
cxx_native_interpret_aggregate, cxx_native_encode_aggregate,
cxx_eval_bit_cast): New functions.
(cxx_eval_constant_expression): Handle BIT_CAST_EXPR.
(potential_constant_expression_1): Likewise.
* cp-gimplify.c (cp_genericize_r): Likewise.

* g++.dg/cpp2a/bit-cast1.C: New test.
* g++.dg/cpp2a/bit-cast2.C: New test.
* g++.dg/cpp2a/bit-cast3.C: New test.
* g++.dg/cpp2a/bit-cast4.C: New test.
* g++.dg/cpp2a/bit-cast5.C: New test.

--- gcc/c-family/c-common.h.jj  2020-07-18 10:48:51.442493640 +0200
+++ gcc/c-family/c-common.h 2020-07-18 10:52:46.175021398 +0200
@@ -164,7 +164,7 @@ enum rid
RID_HAS_NOTHROW_COPY,RID_HAS_TRIVIAL_ASSIGN,
RID_HAS_TRIVIAL_CONSTRUCTOR, RID_HAS_TRIVIAL_COPY,
RID_HAS_TRIVIAL_DESTRUCTOR,  RID_HAS_UNIQUE_OBJ_REPRESENTATIONS,
-  RID_HAS_VIRTUAL_DESTRUCTOR,
+  RID_HAS_VIRTUAL_DESTRUCTOR,  RID_BUILTIN_BIT_CAST,
RID_IS_ABSTRACT, RID_IS_AGGREGATE,
RID_IS_BASE_OF,  RID_IS_CLASS,
RID_IS_EMPTY,RID_IS_ENUM,
--- gcc/c-family/c-common.c.jj  2020-07-18 10:48:51.381494543 +0200
+++ gcc/c-family/c-common.c 2020-07-18 10:52:46.178021354 +0200
@@ -373,6 +373,7 @@ const struct c_common_resword c_common_r
{ "__auto_type",  RID_AUTO_TYPE,  D_CONLY },
{ "__bases",  RID_BASES, D_CXXONLY },
{ "__builtin_addressof", RID_ADDRESSOF, D_CXXONLY },
+  { "__builtin_bit_cast", RID_BUILTIN_BIT_CAST, D_CXXONLY },
{ "__builtin_call_with_static_chain",
  RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
{ "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
--- gcc/cp/cp-tree.h.jj 2020-07-18 10:48:51.573491701 +0200
+++ gcc/cp/cp-tree.h2020-07-18 10:52:46.180021324 +0200
@@ -7312,6 +7312,8 @@ extern tree finish_builtin_launder(loc
 tsubst_flags_t);
  extern tree cp_build_vec_convert  (tree, location_t, tree,
 tsubst_flags_t);
+extern tree cp_build_bit_cast  (location_t, tree, tree,
+tsubst_flags_t);
  extern void start_lambda_scope(tree);
  extern void record_lambda_scope   (tree);
  extern void record_null_lambda_scope  (tree);
--- gcc/cp/cp-tree.def.jj   2020-07-18 10:48:51.569491760 +0200
+++ gcc/cp/cp-tree.def  2020-07-18 10:52:46.180021324 +0200
@@ -460,6 +460,9 @@ DEFTREECODE (UNARY_RIGHT_FOLD_EXPR, "una
  DEFTREECODE (BINARY_LEFT_FOLD_EXPR, "binary_left_fold_expr", tcc_expression, 
3)
  DEFTREECODE (BINARY_RIGHT_FOLD_EXPR, "binary_right_fold_expr", 
tcc_expression, 3)
  
+/* Represents the __builtin_bit_cast (type, expr) expression.

+   The type is in TREE_TYPE, expression in TREE_OPERAND (bitcast, 0).  */
+DEFTREECODE (BIT_CAST_EXPR, "bit_cast_expr", tcc_expression, 1)
  
  /** C++ 

Re: [PATCH] Implement P0966 std::string::reserve should not shrink

2020-07-30 Thread Jonathan Wakely via Gcc-patches

On 12/05/19 21:22 +, Andrew Luo wrote:

It's been a while, but thought I'd check in again now that GCC 9 has been 
branched, and master is now on GCC 10.

I've merged my changes with the latest code and attached a diff... Let me know 
if there's any thoughts on this.


Well I failed to get around to this for GCC 10, so let's try again
now.

I've done another review of the patch, and I'm now wondering why the
new _M_shrink function takes a requested capacity, when the caller
always passes zero. We can simplify both implementations of _M_shrink
if we remove the parameter and change the callers from _M_shrink(0) to
_M_shrink().

Was there a reason to make it take an argument? Did you anticipate
future uses of _M_shrink(n) for n >= 0?

If we simplify it then we need fewer branches in _M_shrink, because we
don't need to do:

  // Make sure we don't shrink below the current size.
  if (__requested_capacity < length())
__requested_capacity = length();

We only need to check whether length() < capacity() (and whether the
string is shared, for the COW implementation).

And if we do that, we can get rid of _M_shrink() because it's now
identical to the zero-argument form of reserve(). So we can just put
the body of _M_shrink() straight in reserve(). The reserve() function
is deprecated, but when we need to remove it we can just make it
private, so that shrink_to_fit() can still call it.

The only downside of this I see is that when the deprecated reserve()
eventually gets removed from the standard, our users will get a
"reserve() is private" error rather than a "wrong number of arguments"
error. But that might actually be better, since they can go to the
location of the private member and see the comments and attribute
describing its status in different standard versions.

I've attached a relative diff showing my suggested changes to your
most recent patch. This also fixes some regressions, because the
_M_shrink function was not swallowing exceptions that result from a
failure to reallocate, which shrink_to_fit() was doing previously.

What do you think?


commit 518cfa3c6e344e0346c51c7cab53a71309d0f730
Author: Jonathan Wakely 
Date:   Thu Jul 30 15:09:55 2020

Simplify

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Change _M_shrink exports to reserve().
* include/bits/basic_string.h: Simplify.
* include/bits/basic_string.tcc: Simplify.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 2a88951fa52..3ff7bb61002 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2310,10 +2310,10 @@ GLIBCXX_3.4.29 {
 # std::from_chars
 _ZSt10from_charsPKcS0_R[def]St12chars_format;
 
-# basic_string::_M_shrink
-_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE9_M_shrinkEm;
-_ZNSs9_M_shrinkEm;
-_ZNSbIwSt11char_traitsIwESaIwEE9_M_shrinkEm;
+# basic_string::reserve()
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE7reserveEv;
+_ZNSs7reserveEv;
+_ZNSbIwSt11char_traitsIwESaIwEE7reserveEv;
 
 } GLIBCXX_3.4.28;
 
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 9ee7a7d3ac4..1c07aa0aa07 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -420,9 +420,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   void
   _M_erase(size_type __pos, size_type __n);
 
-  void
-  _M_shrink(size_type __requested_capacity);
-
 public:
   // Construct/copy/destroy:
   // NB: We overload ctors in some cases instead of using default
@@ -943,10 +940,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { this->resize(__n, _CharT()); }
 
 #if __cplusplus >= 201103L
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
   ///  A non-binding request to reduce capacity() to size().
   void
   shrink_to_fit() noexcept
-  { _M_shrink(0); }
+  { reserve(); }
+#pragma GCC diagnostic pop
 #endif
 
   /**
@@ -983,13 +983,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   /**
*  Equivalent to shrink_to_fit().
*/
-  __attribute__((__always_inline__))
 #if __cplusplus > 201703L
   [[deprecated("use shrink_to_fit() instead")]]
 #endif
   void
-  reserve()
-  { _M_shrink(0); }
+  reserve();
 
   /**
*  Erases the string, making it empty.
@@ -3946,10 +3944,13 @@ _GLIBCXX_END_NAMESPACE_CXX11
   { this->resize(__n, _CharT()); }
 
 #if __cplusplus >= 201103L
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
   ///  A non-binding request to reduce capacity() to size().
   void
-  shrink_to_fit() _GLIBCXX_NOEXCEPT
-  { _M_shrink(0); }
+  shrink_to_fit() noexcept
+  { reserve(); }
+#pragma GCC diagnostic pop
 #endif
 
   /**
@@ -3980,16 +3981,12 @@ _GLIBCXX_END_NAMESPACE_CXX11
   

Re: [PATCH] c++: decl_constant_value and unsharing [PR96197]

2020-07-30 Thread Jason Merrill via Gcc-patches

On 7/30/20 9:49 AM, Patrick Palka wrote:

On Thu, 30 Jul 2020, Jason Merrill wrote:


On 7/21/20 3:07 PM, Patrick Palka wrote:

In the testcase from the PR we are seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression.  We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
often, and from there decl_constant_value makes an unshared copy of the
VAR_DECL's initializer, even though the unsharing is not needed at this
call site (because it is up to callers of cxx_eval_constant_expression
to unshare).

To fix this, this patch moves the responsibility of unsharing the result
of decl_constant_value, decl_really_constant_value and
scalar_constant_value from the callee to the caller.


How about creating another entry point that doesn't unshare, and using that in
constexpr evaluation?


Is something like this what you have in mind?  This adds a defaulted
bool parameter to the three routines that controls unsharing (except for
decl_constant_value, which instead needs a new overload if we don't want
to touch its common declaration in c-common.h.)  Bootstrap and regtest
in progress.


That looks good, though I don't think we need the added parameter for 
scalar_constant_value.



-- >8 --

Subject: [PATCH] c++: decl_constant_value and unsharing [PR96197]

In the testcase from the PR we are seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression.  We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
often, and from there decl_constant_value makes an unshared copy of the
VAR_DECL's initializer, even though the unsharing is not needed at this
call site (because callers of cxx_eval_constant_expression already
unshare its result when necessary).

To fix this excessive unsharing, this patch introduces a new defaulted
parameter unshare_p to scalar_constant_value, decl_really_constant_value
and decl_constant_value to allow callers to decide if these routines
should unshare their result before returning.  (Since decl_constant_value
is declared in c-common.h, it instead gets a new overload declared in
cp-tree.h.)

As a simplification, this patch also moves the call to unshare_expr in
constant_value_1 outside of the loop, since calling unshare_expr on a
DECL_P should be a no-op.

Additionally, in unify there is one call to scalar_constant_value that
seems to be dead code since we apparently don't see unlowered
enumerators anymore, so this patch replaces it with an appropriate
gcc_assert.

Now that the the calls to decl_constant_value and
decl_really_constant_value from cxx_eval_constant_expression no longer
unshare their result, memory use during constexpr evaluation for the
testcase from the PR falls from ~5GB to 15MB according to -ftime-report.

gcc/cp/ChangeLog:

PR c++/96197
* constexpr.c (cxx_eval_constant_expression) :
Pass false as the decl_constant_value and
decl_really_constant_value so that they don't unshare their
result.
* cp-tree.h (decl_constant_value): New declaration with an
additional bool parameter.
(scalar_constant_value): Add defaulted bool parameter.
(decl_really_constant_value): Likewise.
* cp-gimplify.c (cp_fold_maybe_rvalue): Call unshare_expr on the
result of decl_constant_value.
* init.c (constant_value_1): Add bool parameter.  Conditionally
unshare the initializer only before returning.
(scalar_constant_value): Add bool parameter and pass it to
constant_value_1
(decl_really_constant_value): Likewise.
(decl_constant_value): New overload with an additional bool
parameter.
* pt.c (tsubst_copy) : Assert
DECL_TEMPLATE_PARM_P and simplify accordingly.

gcc/testsuite/ChangeLog:

PR c++/96197
* g++.dg/cpp1y/constexpr-array8.C: New test.
---
  gcc/cp/constexpr.c|  4 +--
  gcc/cp/cp-tree.h  |  5 +--
  gcc/cp/init.c | 35 ---
  gcc/cp/pt.c   |  7 ++--
  gcc/testsuite/g++.dg/cpp1y/constexpr-array8.C | 18 ++
  5 files changed, 48 insertions(+), 21 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-array8.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 97dcc1b1d10..b1c1d249c6e 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -5695,9 +5695,9 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
  TREE_CONSTANT (r) = true;
}
else if (ctx->strict)
-   r = decl_really_constant_value (t);
+   r = decl_really_constant_value (t, /*unshare_p=*/fa

Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-07-30 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 30, 2020 at 10:16:46AM -0400, Jason Merrill wrote:
> > It checks the various std::bit_cast requirements, when not constexpr
> > evaluated acts pretty much like VIEW_CONVERT_EXPR of the source argument
> > to the destination type and the hardest part is obviously the constexpr
> > evaluation.  I couldn't use the middle-end native_encode_initializer,
> > because it needs to handle the C++ CONSTRUCTOR_NO_CLEARING vs. negation of
> > that
> 
> CONSTRUCTOR_NO_CLEARING isn't specific to C++.

That is true, but the middle-end doesn't really use that much (except one
spot in the gimplifier and when deciding if two constructors are equal).

> > value initialization of missing members if there are any etc.
> 
> Doesn't native_encode_initializer handle omitted initializers already? Any
> elements for which the usual zero-initialization is wrong should already
> have explicit initializers by the time we get here.

It assumes zero initialization for everything that is not explicitly 
initialized.
When it is used in the middle-end, CONSTRUCTORs are either used in
initializers of non-automatic vars and then they are zero initialized, or
for automatic variables only empty CONSTRUCTORs are allowed.
So, what it does is memset the memory to zero before trying to fill in the
individual initializers for RECORD_TYPE/ARRAY_TYPE.

Even if we are guaranteed that (what guarantees it?) when __builtin_bit_cast
is constexpr evaluated no initializer will be omitted if may be value 
initialized
to something other than all zeros, we still need to track what bits are well
defined and what are not (e.g. any padding in there).

Thinking about it now, maybe the mask handling for !CONSTRUCTOR_NO_CLEARING
is incorrect though if there are missing initializers, because those omitted
initializers still could have paddings with unspecified content, right?

> > needs to handle bitfields even if they don't have an integral representative
> 
> Can we use TYPE_MODE for this?

When the bitfield representative is an array, it will have BLKmode
TYPE_MODE, and we need to find some suitable other integral type.
The bit-field handling for BLKmode representatives can be copied to the 
middle-end
implementation.

> These all sound like things that could be improved in
> native_encode_initializer rather than duplicate a complex function.

I think in the end only small parts of it are actually duplicated.
It is true that both native_encode_initializer and the new C++ FE
counterpart are roughly 310 lines long, but the mask handling the latter
performs is unlikely useful for the middle-end (and if added there it would
need to allow it thus to be NULL and not filled), on the other side the very
complex handling the middle-end version needs to do where it can be asked
only for a small portion of the memory complicates things a lot.
Maybe we could say that non-NULL mask is only supported for the case where
one asks for everything and have some assertions for that, but I fear we'd
still end up with ~ 450-500 lines of code in the middle-end compared to the
current 310.

I guess I can try to merge the two back in one with optional mask and see
how does it look like.  But most likely the !CONSTRUCTOR_NO_CLEARING
handling etc. would need to be conditionalized on this special C++ mode
(mask non-NULL) etc.

> > +  if (REFERENCE_REF_P (arg))
> > +arg = TREE_OPERAND (arg, 0);
> 
> Why?

I've added it so that the decay_conversion later on didn't mishandle
references to arrays.

> > +  if (error_operand_p (arg))
> > +return error_mark_node;
> > +
> > +  if (!type_dependent_expression_p (arg))
> > +{
> > +  if (TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
> > +   {
> > + /* Don't perform array-to-pointer conversion.  */
> > + arg = mark_rvalue_use (arg, loc, true);
> > + if (!complete_type_or_maybe_complain (TREE_TYPE (arg), arg, complain))
> > +   return error_mark_node;
> > +   }

But then I've added the above to match more the __builtin_bit_cast behavior
in clang, so maybe the if (REFERENCE_REF_P (arg)) is not needed anymore.
For the std::bit_cast uses, the argument will be always a reference, never
an array directly and not sure if we want to encourage people to use the
builtin directly in their code.

> > +  else
> > +   arg = decay_conversion (arg, complain);
> > +
> > +  if (error_operand_p (arg))
> > +   return error_mark_node;
> > +
> > +  arg = convert_from_reference (arg);
> > +  if (!trivially_copyable_p (TREE_TYPE (arg)))
> > +   {
> > + error_at (cp_expr_loc_or_loc (arg, loc),
> > +   "%<__builtin_bit_cast%> source type %qT "
> > +   "is not trivially copyable", TREE_TYPE (arg));
> > + return error_mark_node;
> > +   }
> > +  if (!dependent_type_p (type)
> > + && !cp_tree_equal (TYPE_SIZE_UNIT (type),
> > +TYPE_SIZE_UNIT (TREE_TYPE (arg
> > +   {
> > + error_at (loc, "%<__builtin_bit_cast%> source size %qE "
> > +"no

[committed] libstdc++: Fix test for old string ABI

2020-07-30 Thread Jonathan Wakely via Gcc-patches
The COW string doesn't accept const_iterator arguments in insert and
related member functions. Pass a mutable iterator instead.

libstdc++-v3/ChangeLog:

* testsuite/20_util/from_chars/4.cc: Pass non-const iterator
to string::insert.

Tested x86_64-linux, committed to trunk.

commit 561a19c3011f7bde3a41f2a27ea979118e3a2dff
Author: Jonathan Wakely 
Date:   Thu Jul 30 16:04:59 2020

libstdc++: Fix test for old string ABI

The COW string doesn't accept const_iterator arguments in insert and
related member functions. Pass a mutable iterator instead.

libstdc++-v3/ChangeLog:

* testsuite/20_util/from_chars/4.cc: Pass non-const iterator
to string::insert.

diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc 
b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
index 23fc990bb38..8148560be48 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
@@ -338,7 +338,7 @@ test_max_mantissa()
VERIFY( flt == val );
 
std::string s2 = s.substr(0, len - 5);
-   s2.insert(s2.cbegin() + orig_len - 1, '.');
+   s2.insert(s2.begin() + orig_len - 1, '.');
s2 += "e0001";
res = std::from_chars(s.data(), s.data() + len, flt, fmt);
VERIFY( res.ec == std::errc{} );


Re: [PATCH] Implement P0966 std::string::reserve should not shrink

2020-07-30 Thread Jonathan Wakely via Gcc-patches

On 30/07/20 15:29 +0100, Jonathan Wakely wrote:

On 12/05/19 21:22 +, Andrew Luo wrote:

It's been a while, but thought I'd check in again now that GCC 9 has been 
branched, and master is now on GCC 10.

I've merged my changes with the latest code and attached a diff... Let me know 
if there's any thoughts on this.


Well I failed to get around to this for GCC 10, so let's try again
now.

I've done another review of the patch, and I'm now wondering why the
new _M_shrink function takes a requested capacity, when the caller
always passes zero. We can simplify both implementations of _M_shrink
if we remove the parameter and change the callers from _M_shrink(0) to
_M_shrink().

Was there a reason to make it take an argument? Did you anticipate
future uses of _M_shrink(n) for n >= 0?

If we simplify it then we need fewer branches in _M_shrink, because we
don't need to do:

 // Make sure we don't shrink below the current size.
 if (__requested_capacity < length())
__requested_capacity = length();

We only need to check whether length() < capacity() (and whether the
string is shared, for the COW implementation).

And if we do that, we can get rid of _M_shrink() because it's now
identical to the zero-argument form of reserve(). So we can just put
the body of _M_shrink() straight in reserve(). The reserve() function
is deprecated, but when we need to remove it we can just make it
private, so that shrink_to_fit() can still call it.

The only downside of this I see is that when the deprecated reserve()
eventually gets removed from the standard, our users will get a
"reserve() is private" error rather than a "wrong number of arguments"
error. But that might actually be better, since they can go to the
location of the private member and see the comments and attribute
describing its status in different standard versions.

I've attached a relative diff showing my suggested changes to your
most recent patch. This also fixes some regressions, because the
_M_shrink function was not swallowing exceptions that result from a
failure to reallocate, which shrink_to_fit() was doing previously.

What do you think?


Here's the combined patch, based on your original with my proposed
simplifications applied.


commit 1ba08b8a7f52e64ea6ee07f7b8fb85f2adc15e33
Author: Jonathan Wakely 
Date:   Thu Jul 30 15:29:39 2020

libstdc++: Implement P0966 std::string::reserve should not shrink

Remove ability for reserve(n) to reduce a string's capacity. Add a new
reserve() overload that makes a shrink-to-fit request, and make
shrink_to_fit() use that.

Co-authored-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

2020-07-30  Andrew Luo  
Jonathan Wakely  

* config/abi/pre/gnu.ver (GLIBCXX_3.4): Use less greedy
patterns for basic_string members.
(GLIBCXX_3.4.29): Export new basic_string::reserve symbols.
* doc/xml/manual/status_cxx2020.xml: Update P0966 status.
* include/bits/basic_string.h (shrink_to_fit()): Call reserve().
(reserve(size_type)): Remove default argument.
(reserve()): Declare new overload.
[!_GLIBCXX_USE_CXX11_ABI] (shrink_to_fit, reserve): Likewise.
* include/bits/basic_string.tcc (reserve(size_type)): Remove
support for shrinking capacity.
(reserve()): Perform shrink-to-fit operation.
[!_GLIBCXX_USE_CXX11_ABI] (reserve): Likewise.
* testsuite/21_strings/basic_string/capacity/1.cc: Adjust to
reflect new behavior.
* testsuite/21_strings/basic_string/capacity/char/1.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/char/18654.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/char/2.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/wchar_t/1.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/wchar_t/18654.cc:
Likewise.
* testsuite/21_strings/basic_string/capacity/wchar_t/2.cc:
Likewise.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 17aff5d907b..3ff7bb61002 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -223,7 +223,10 @@ GLIBCXX_3.4 {
 _ZNSs6assignE[PRcjmvy]*;
 _ZNSs6insertE[PRcjmvy]*;
 _ZNSs6insertEN9__gnu_cxx17__normal_iteratorIPcSsEE[PRcjmvy]*;
-_ZNSs[67][j-z]*E[PRcjmvy]*;
+_ZNSs6rbeginEv;
+_ZNSs6resizeE[jmy]*;
+_ZNSs7replaceE[jmy]*;
+_ZNSs7reserveE[jmy];
 _ZNSs7[a-z]*EES2_[NPRjmy]*;
 _ZNSs7[a-z]*EES2_S[12]*;
 _ZNSs12_Alloc_hiderC*;
@@ -290,7 +293,10 @@ GLIBCXX_3.4 {
 _ZNSbIwSt11char_traitsIwESaIwEE6assignE[PRwjmvy]*;
 _ZNSbIwSt11char_traitsIwESaIwEE6insertE[PRwjmvy]*;
 _ZNSbIwSt11char_traitsIwESaIwEE6insertEN9__gnu_cxx17__normal_iteratorIPwS2_EE[PRwjmvy]*;
-  

[committed] d: Inline bounds checking for simple array assignments.

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch implements a small optimization to the code generation of
simple array assignments, inlining the array bounds checking code so
there is no reliance on the library routine _d_arraycopy(), which also
deals with postblit and copy constructors for non-trivial arrays.

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain

---
gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (AssignExp *)): Inline bounds checking
for simple array assignments.

gcc/testsuite/ChangeLog:

* gdc.dg/array1.d: New test.
---
 gcc/d/expr.cc | 45 ++-
 gcc/testsuite/gdc.dg/array1.d | 14 +++
 2 files changed, 53 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/array1.d

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 58d4943ef2b..355561a481e 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -962,14 +962,47 @@ public:
/* Perform a memcpy operation.  */
gcc_assert (e->e2->type->ty != Tpointer);
 
-   if (!postblit && !destructor && !array_bounds_check ())
+   if (!postblit && !destructor)
  {
tree t1 = d_save_expr (d_array_convert (e->e1));
-   tree t2 = d_array_convert (e->e2);
-   tree size = size_mult_expr (d_array_length (t1),
-   size_int (etype->size ()));
-   tree result = build_memcpy_call (d_array_ptr (t1),
-d_array_ptr (t2), size);
+   tree t2 = d_save_expr (d_array_convert (e->e2));
+
+   /* References to array data.  */
+   tree t1ptr = d_array_ptr (t1);
+   tree t1len = d_array_length (t1);
+   tree t2ptr = d_array_ptr (t2);
+
+   /* Generate: memcpy(to, from, size)  */
+   tree size = size_mult_expr (t1len, size_int (etype->size ()));
+   tree result = build_memcpy_call (t1ptr, t2ptr, size);
+
+   /* Insert check that array lengths match and do not overlap.  */
+   if (array_bounds_check ())
+ {
+   /* tlencmp = (t1len == t2len)  */
+   tree t2len = d_array_length (t2);
+   tree tlencmp = build_boolop (EQ_EXPR, t1len, t2len);
+
+   /* toverlap = (t1ptr + size <= t2ptr
+  || t2ptr + size <= t1ptr)  */
+   tree t1ptrcmp = build_boolop (LE_EXPR,
+ build_offset (t1ptr, size),
+ t2ptr);
+   tree t2ptrcmp = build_boolop (LE_EXPR,
+ build_offset (t2ptr, size),
+ t1ptr);
+   tree toverlap = build_boolop (TRUTH_ORIF_EXPR, t1ptrcmp,
+ t2ptrcmp);
+
+   /* (tlencmp && toverlap) ? memcpy() : _d_arraybounds()  */
+   tree tassert = build_array_bounds_call (e->loc);
+   tree tboundscheck = build_boolop (TRUTH_ANDIF_EXPR,
+ tlencmp, toverlap);
+
+   result = build_condition (void_type_node, tboundscheck,
+ result, tassert);
+ }
+
this->result_ = compound_expr (result, t1);
  }
else if ((postblit || destructor) && e->op != TOKblit)
diff --git a/gcc/testsuite/gdc.dg/array1.d b/gcc/testsuite/gdc.dg/array1.d
new file mode 100644
index 000..af81813e736
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/array1.d
@@ -0,0 +1,14 @@
+// { dg-do compile }
+// { dg-final { scan-assembler-not "_d_arraycopy" } }
+
+void test1()
+{
+int[10] a1 = void;
+int[10] a2 = void;
+a1[] = a2[];
+}
+
+void test2(int[] a1, int[] a2)
+{
+a1[] = a2[];
+}
-- 
2.25.1



[committed] d: Move private functions out of ExprVisitor into local statics

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch breaks out all private functions from ExprVisitor into free
standing static functions.  None of them need access to the context
pointer of the visitor class.

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain

---
gcc/d/ChangeLog:

* expr.cc (needs_postblit): Move out of ExprVisitor as a static
function.  Update all callers.
(needs_dtor): Likewise.
(lvalue_p): Likewise.
(binary_op): Likewise.
(binop_assignment): Likewise.
---
 gcc/d/expr.cc | 313 +-
 1 file changed, 159 insertions(+), 154 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 7a209fbe733..ac9a2820112 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -43,181 +43,186 @@ along with GCC; see the file COPYING3.  If not see
 #include "d-tree.h"
 
 
-/* Implements the visitor interface to build the GCC trees of all Expression
-   AST classes emitted from the D Front-end.
-   All visit methods accept one parameter E, which holds the frontend AST
-   of the expression to compile.  They also don't return any value, instead
-   generated code is cached in RESULT_ and returned from the caller.  */
+/* Determine if type T is a struct that has a postblit.  */
 
-class ExprVisitor : public Visitor
+static bool
+needs_postblit (Type *t)
 {
-  using Visitor::visit;
-
-  tree result_;
-  bool constp_;
+  t = t->baseElemOf ();
 
-  /* Determine if type is a struct that has a postblit.  */
-
-  bool needs_postblit (Type *t)
-  {
-t = t->baseElemOf ();
-
-if (TypeStruct *ts = t->isTypeStruct ())
-  {
-   if (ts->sym->postblit)
- return true;
-  }
+  if (TypeStruct *ts = t->isTypeStruct ())
+{
+  if (ts->sym->postblit)
+   return true;
+}
 
-return false;
-  }
+  return false;
+}
 
-  /* Determine if type is a struct that has a destructor.  */
+/* Determine if type T is a struct that has a destructor.  */
 
-  bool needs_dtor (Type *t)
-  {
-t = t->baseElemOf ();
+static bool
+needs_dtor (Type *t)
+{
+  t = t->baseElemOf ();
 
-if (TypeStruct *ts = t->isTypeStruct ())
-  {
-   if (ts->sym->dtor)
- return true;
-  }
+  if (TypeStruct *ts = t->isTypeStruct ())
+{
+  if (ts->sym->dtor)
+   return true;
+}
 
-return false;
-  }
+  return false;
+}
 
-  /* Determine if expression is suitable lvalue.  */
+/* Determine if expression E is a suitable lvalue.  */
 
-  bool lvalue_p (Expression *e)
-  {
-SliceExp *se = e->isSliceExp ();
-if (se != NULL && se->e1->isLvalue ())
-  return true;
+static bool
+lvalue_p (Expression *e)
+{
+  SliceExp *se = e->isSliceExp ();
+  if (se != NULL && se->e1->isLvalue ())
+return true;
 
-CastExp *ce = e->isCastExp ();
-if (ce != NULL && ce->e1->isLvalue ())
-  return true;
+  CastExp *ce = e->isCastExp ();
+  if (ce != NULL && ce->e1->isLvalue ())
+return true;
 
-return (e->op != TOKslice && e->isLvalue ());
-  }
-
-  /* Build an expression of code CODE, data type TYPE, and operands ARG0 and
- ARG1.  Perform relevant conversions needed for correct code operations.  
*/
+  return (e->op != TOKslice && e->isLvalue ());
+}
 
-  tree binary_op (tree_code code, tree type, tree arg0, tree arg1)
-  {
-tree t0 = TREE_TYPE (arg0);
-tree t1 = TREE_TYPE (arg1);
-tree ret = NULL_TREE;
+/* Build an expression of code CODE, data type TYPE, and operands ARG0 and
+   ARG1.  Perform relevant conversions needed for correct code operations.  */
 
-bool unsignedp = TYPE_UNSIGNED (t0) || TYPE_UNSIGNED (t1);
+static tree
+binary_op (tree_code code, tree type, tree arg0, tree arg1)
+{
+  tree t0 = TREE_TYPE (arg0);
+  tree t1 = TREE_TYPE (arg1);
+  tree ret = NULL_TREE;
 
-/* Deal with float mod expressions immediately.  */
-if (code == FLOAT_MOD_EXPR)
-  return build_float_modulus (type, arg0, arg1);
+  bool unsignedp = TYPE_UNSIGNED (t0) || TYPE_UNSIGNED (t1);
 
-if (POINTER_TYPE_P (t0) && INTEGRAL_TYPE_P (t1))
-  return build_nop (type, build_offset_op (code, arg0, arg1));
+  /* Deal with float mod expressions immediately.  */
+  if (code == FLOAT_MOD_EXPR)
+return build_float_modulus (type, arg0, arg1);
 
-if (INTEGRAL_TYPE_P (t0) && POINTER_TYPE_P (t1))
-  return build_nop (type, build_offset_op (code, arg1, arg0));
+  if (POINTER_TYPE_P (t0) && INTEGRAL_TYPE_P (t1))
+return build_nop (type, build_offset_op (code, arg0, arg1));
 
-if (POINTER_TYPE_P (t0) && POINTER_TYPE_P (t1))
-  {
-   gcc_assert (code == MINUS_EXPR);
-   tree ptrtype = lang_hooks.types.type_for_mode (ptr_mode, 0);
+  if (INTEGRAL_TYPE_P (t0) && POINTER_TYPE_P (t1))
+return build_nop (type, build_offset_op (code, arg1, arg0));
 
-   /* POINTER_DIFF_EXPR requires a signed integer type of the same size as
-  pointers.  If some platform cannot provide that, or has a larger
-  ptrdiff_type to suppor

[committed] d: Refactor use of built-in memcmp/memcpy/memset into helper functions.

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adds three codegen helper functions for generating calls to
memcmp, memcpy and memset.  All parts of the front-end have been updated
to call them instead of doing it themselves.

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain

---
gcc/d/ChangeLog:

* d-codegen.cc (build_memcmp_call): New function.
(build_memcpy_call): New function.
(build_memset_call): New function.
(build_float_identity): Call build_memcmp_call.
(lower_struct_comparison): Likewise.
(build_struct_comparison): Likewise.
* d-tree.h (build_memcmp_call): Declare.
(build_memcpy_call): Declare.
(build_memset_call): Declare.
* expr.cc (ExprVisitor::visit (EqualExp *)): Call build_memcmp_call.
(ExprVisitor::visit (AssignExp *)): Call build_memset_call.
(ExprVisitor::visit (ArrayLiteralExp *)): Call build_memcpy_call.
(ExprVisitor::visit (StructLiteralExp *)): Call build_memset_call.
---
 gcc/d/d-codegen.cc | 62 --
 gcc/d/d-tree.h |  3 +++
 gcc/d/expr.cc  | 44 +++-
 3 files changed, 56 insertions(+), 53 deletions(-)

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index cea47315d0e..a38aa6c55e0 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -794,6 +794,41 @@ d_mark_read (tree exp)
   return exp;
 }
 
+/* Build a call to memcmp(), compares the first NUM bytes of PTR1 with PTR2.  
*/
+
+tree
+build_memcmp_call (tree ptr1, tree ptr2, tree num)
+{
+  return build_call_expr (builtin_decl_explicit (BUILT_IN_MEMCMP), 3,
+ ptr1, ptr2, num);
+}
+
+/* Build a call to memcpy(), copies the first NUM bytes of SRC into DST.  */
+
+tree
+build_memcpy_call (tree dst, tree src, tree num)
+{
+  return build_call_expr (builtin_decl_explicit (BUILT_IN_MEMCPY), 3,
+ dst, src, num);
+}
+
+/* Build a call to memset(), fills the first NUM bytes of PTR with zeros.
+   If NUM is NULL, then we expect PTR to be object that requires filling.  */
+
+tree
+build_memset_call (tree ptr, tree num)
+{
+  if (num == NULL_TREE)
+{
+  gcc_assert (TREE_CODE (ptr) != ADDR_EXPR);
+  num = TYPE_SIZE_UNIT (TREE_TYPE (ptr));
+  ptr = build_address (ptr);
+}
+
+  return build_call_expr (builtin_decl_explicit (BUILT_IN_MEMSET), 3,
+ ptr, integer_zero_node, num);
+}
+
 /* Return TRUE if the struct SD is suitable for comparison using memcmp.
This is because we don't guarantee that padding is zero-initialized for
a stack variable, so we can't use memcmp to compare struct values.  */
@@ -846,11 +881,9 @@ identity_compare_p (StructDeclaration *sd)
 tree
 build_float_identity (tree_code code, tree t1, tree t2)
 {
-  tree tmemcmp = builtin_decl_explicit (BUILT_IN_MEMCMP);
   tree size = size_int (TYPE_PRECISION (TREE_TYPE (t1)) / BITS_PER_UNIT);
-
-  tree result = build_call_expr (tmemcmp, 3, build_address (t1),
-build_address (t2), size);
+  tree result = build_memcmp_call (build_address (t1),
+  build_address (t2), size);
   return build_boolop (code, result, integer_zero_node);
 }
 
@@ -879,10 +912,8 @@ lower_struct_comparison (tree_code code, StructDeclaration 
*sd,
   /* Let back-end take care of union comparisons.  */
   if (sd->isUnionDeclaration ())
 {
-  tmemcmp = build_call_expr (builtin_decl_explicit (BUILT_IN_MEMCMP), 3,
-build_address (t1), build_address (t2),
-size_int (sd->structsize));
-
+  tmemcmp = build_memcmp_call (build_address (t1), build_address (t2),
+  size_int (sd->structsize));
   return build_boolop (code, tmemcmp, integer_zero_node);
 }
 
@@ -943,11 +974,9 @@ lower_struct_comparison (tree_code code, StructDeclaration 
*sd,
  else
{
  /* Simple memcmp between types.  */
- tcmp = build_call_expr (builtin_decl_explicit (BUILT_IN_MEMCMP),
- 3, build_address (t1ref),
- build_address (t2ref),
- TYPE_SIZE_UNIT (stype));
-
+ tcmp = build_memcmp_call (build_address (t1ref),
+   build_address (t2ref),
+   TYPE_SIZE_UNIT (stype));
  tcmp = build_boolop (code, tcmp, integer_zero_node);
}
}
@@ -995,11 +1024,8 @@ build_struct_comparison (tree_code code, 
StructDeclaration *sd,
   else
 {
   /* Do bit compare of structs.  */
-  tree size = size_int (sd->structsize);
-  tree tmemcmp = build_call_expr (builtin_decl_explicit (BUILT_IN_MEMCMP),
- 3, build_address (t1),
- build_address (t2), size);
-
+   

Re: [PATCH] avoid -Wnonnull on synthesized condition in static_cast (PR 96003)

2020-07-30 Thread Martin Sebor via Gcc-patches

On 7/29/20 2:27 PM, Jason Merrill wrote:

On 7/23/20 3:08 PM, Martin Sebor wrote:

Another test case added to the bug since I posted the patch shows
that the no-warning bit set by the C++ front end is easily lost
during early folding, triggering the -Wnonull again despite
the suppression.  The attached revision tweaks the folder in just
this one case to let the suppression take effect in this situation.

In light of how pervasive this potential for stripping looks (none
of the other folders preserves the bit) the tweak feels like a band-
aid rather than a general solution.  But implementing something
better (and mainly developing tests to verify that it works in
general) would probably be quite the project.  So I leave it for
some other time.


How about in check_function_arguments_recurse COND_EXPR handling, 
checking for TREE_NO_WARNING on the condition?  Then we wouldn't need to 
deal with setting and copying it on the COND_EXPR itself.


The condition is already checked at the beginning of the function.

But the latest trunk doesn't seem to need the fold-const.c change
anymore to suppress the warning and the original patch is good
enough.  The updated test should catch the regression if
the COND_EXPR should again lose the no warning bit.

Martin





On 7/17/20 1:00 PM, Martin Sebor wrote:

The recent enhancement to treat the implicit this pointer argument
as nonnull in member functions triggers spurious -Wnonnull for
the synthesized conditional expression the C++ front end replaces
the pointer with in some static_cast expressions.  The front end
already sets the no-warning bit for the test but not for the whole
conditional expression, so the attached fix extends the same solution
to it.

The consequence of this fix is that user-written code like this:

   static_cast(p ? p : 0)->f ();
or
   static_cast(p ? p : nullptr)->f ();

don't trigger the warning because they are both transformed into
the same expression as:

   static_cast(p)->f ();

What still does trigger it is this:

   static_cast(p ? p : (T*)0)->f ();

because here it's the inner COND_EXPR's no-warning bit that's set
(the outer one is clear), whereas in the former expressions it's
the other way around.  It would be nice if this worked consistently
but I didn't see an easy way to do that and more than a quick fix
seems outside the scope for this bug.

Another case reported by someone else in the same bug involves
a dynamic_cast.  A simplified test case goes something like this:

   if (dynamic_cast(p))
 dynamic_cast(p)->f ();

The root cause is the same: the front end emitting the COND_EXPR

   ((p != 0) ? ((T*)__dynamic_cast(p, (& _ZTI1B), (& _ZTI1C), 0)) : 0)

I decided not to suppress the warning in this case because doing
so would also suppress it in unconditional calls with the result
of the cast:

   dynamic_cast(p)->f ();

and that doesn't seem helpful.  Instead, I'd suggest to make
the second cast in the if statement to reference to T&:

   if (dynamic_cast(p))
 dynamic_cast(*p).f ();

Martin





PR c++/96003 spurious -Wnonnull calling a member on the result of static_cast

gcc/c-family/ChangeLog:

	PR c++/96003
	* c-common.c (check_function_arguments_recurse): Return early when
	no-warning bit is set.

gcc/cp/ChangeLog:

	PR c++/96003
	* class.c (build_base_path): Set no-warning bit on the synthesized
	conditional expression in static_cast.

gcc/testsuite/ChangeLog:

	PR c++/96003
	* g++.dg/warn/Wnonnull7.C: New test.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 116867a4513..72935b274ec 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5821,6 +5821,9 @@ check_function_arguments_recurse (void (*callback)
   void *ctx, tree param,
   unsigned HOST_WIDE_INT param_num)
 {
+  if (TREE_NO_WARNING (param))
+return;
+
   if (CONVERT_EXPR_P (param)
   && (TYPE_PRECISION (TREE_TYPE (param))
 	  == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (param, 0)
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 7a25d8fc76c..b39bdaaa3ab 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -516,8 +516,14 @@ build_base_path (enum tree_code code,
 
  out:
   if (null_test)
-expr = fold_build3_loc (input_location, COND_EXPR, target_type, null_test, expr,
-			build_zero_cst (target_type));
+{
+  expr = fold_build3_loc (input_location, COND_EXPR, target_type, null_test,
+			  expr, build_zero_cst (target_type));
+  /* Avoid warning for the whole conditional expression (in addition
+	 to NULL_TEST itself -- see above) in case the result is used in
+	 a nonnull context that the front end -Wnonnull checks.  */
+  TREE_NO_WARNING (expr) = 1;
+}
 
   return expr;
 }
diff --git a/gcc/testsuite/g++.dg/warn/Wnonnull7.C b/gcc/testsuite/g++.dg/warn/Wnonnull7.C
new file mode 100644
index 000..7611c18d948
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wnonnull7.C
@@ -0,0 +1,36 @@
+/* PR c++/96003 - spurious -Wnonnull calling a member on the result
+   of static_

Re: [PATCH v2] [RISC-V] Add support for TLS stack protector canary access

2020-07-30 Thread Kito Cheng via Gcc-patches
Hi Cooper:

Thanks for your patch! committed to trunk.

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c931e8d5a96463427040b0d11f9c4352ac22b2b0

On Wed, Jul 29, 2020 at 8:34 PM Cooper Qu via Gcc-patches
 wrote:
>
> Sorry for later replay, I will add testcases on a following patch if the
> patch is accepted.
>
>
> Regards,
>
> Cooper
>
> On 2020/7/28 上午9:23, Kito Cheng wrote:
> > Add testcase later is OK to me.
> >
> > On Tue, Jul 28, 2020 at 6:55 AM Jim Wilson  wrote:
> >> On Sun, Jul 19, 2020 at 7:04 PM cooper  wrote:
> >>> Ping
> >>>
> >>> On 2020/7/13 下午4:15, cooper wrote:
>  gcc/
> * config/riscv/riscv-opts.h (stack_protector_guard): New enum.
> * config/riscv/riscv.c (riscv_option_override): Handle
> the new options.
> * config/riscv/riscv.md (stack_protect_set): New pattern to handle
> flexible stack protector guard settings.
> (stack_protect_set_): Ditto.
> (stack_protect_test): Ditto.
> (stack_protect_test_): Ditto.
> * config/riscv/riscv.opt (mstack-protector-guard=,
> mstack-protector-guard-reg=, mstack-protector-guard-offset=): New
> options.
> * doc/invoke.texi (Option Summary) [RISC-V Options]:
> Add -mstack-protector-guard=, -mstack-protector-guard-reg=, and
> -mstack-protector-guard-offset=.
> (RISC-V Options): Ditto.
> >> The v2 patch looks fine to me.  Meanwhile, Kito asked for testcases
> >> which would be nice to have but I don't think is critical considering
> >> that this has already been tested with a kernel build.  Maybe the
> >> testcases can be a follow on patch?  I'd like to see forward movement
> >> on this, even if we accept a patch without the testcases.
> >>
> >> Jim


Re: [PATCH 26/29] rs6000: Add Power9 builtins

2020-07-30 Thread will schmidt via Gcc-patches
On Mon, 2020-07-27 at 09:14 -0500, Bill Schmidt wrote:
> From: Bill Schmidt 
> 
> 2020-07-26  Bill Schmidt  
> 
>   * config/rs6000/rs6000-builtin-new.def: Add power9,
>   power9-vector, and power9-64 builtins.
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 354
> +++
>  1 file changed, 354 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def
> b/gcc/config/rs6000/rs6000-builtin-new.def
> index 2f918c1d69e..1338f543a6a 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -2394,3 +2394,357 @@
>  XSCVSPDPN vsx_xscvspdpn {}
> 
> 
> +; Power9 vector builtins.
> +[power9-vector]
> +  const vus __builtin_altivec_convert_4f32_8i16 (vf, vf);
> +CONVERT_4F32_8I16 convert_4f32_8i16 {}
> +
> +  const unsigned int __builtin_altivec_first_match_index_v16qi (vsc,
> vsc);
> +VFIRSTMATCHINDEX_V16QI first_match_index_v16qi {}

Noting a "vus" on the previous entry, was/is a define for "ui ==
unsigned int" considered?
Similar question for subsequent types that are still fully spelled out.
(unsigned char, unsigned short, ... )




> +; Miscellaneour P9 functions


Miscellaneous   :-) 



> +
> +
> +; These things need some review to see whether they really require
> +; MASK_POWERPC64.  For xsxexpdp, this seems to be fine for 32-bit,
> +; because the result will always fit in 32 bits and the return
> +; value is SImode; but the pattern currently requires TARGET_64BIT.
> +; On the other hand, xsssigdp has a result that doesn't fit in

perhaps xsxsigdp ? 


> +; 32 bits, and the return value is DImode, so it seems that
> +; TARGET_64BIT (actually TARGET_POWERPC64) is justified.  TBD. 
> +[power9-64]
> +; The following two are inexplicably named __builtin_altivec_* while
> +; their load counterparts are __builtin_vsx_*.  Need to deprecate
> +; these interfaces in favor of the other naming scheme (or vice
> versa).

Thanks
-Will




Re: [PATCH PR96195] aarch64: ICE during GIMPLE pass:vect

2020-07-30 Thread Richard Sandiford
"yangyang (ET)"  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Tuesday, July 21, 2020 2:49 AM
>> To: yangyang (ET) 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR96195] aarch64: ICE during GIMPLE pass:vect
>> 
>> Sorry for the slow reply.
>> 
>> "yangyang (ET)"  writes:
>> > Hi,
>> >
>> >This is a simple fix for PR96195.
>> >
>> >For the test case, GCC generates the following gimple statement in
>> pass_vect:
>> >
>> >  vect__21.16_58 = zp.simdclone.2 (vect_vec_iv_.15_56);
>> >
>> >The mode of vect__21.16_58 is VNx2SI while the mode of zp.simdclone.2
>> (vect_vec_iv_.15_56) is V4SI, resulting in the crash.
>> >
>> >In vectorizable_simd_clone_call, type compatibility is handled based on
>> the number of elements and the type compatibility of elements, which is not
>> enough.
>> >This patch add VIEW_CONVERT_EXPRs if the arguments types and return
>> type of simd clone function are distinct with the vectype of stmt.
>> >
>> >Added one testcase for this. Bootstrap and tested on both aarch64 and
>> x86 Linux platform, no new regression witnessed.
>> 
>> I agree this looks correct as far as the target-independent interface goes.
>> However, the underlying problem is that we haven't yet added support for SVE
>> omp simd functions.  What should happen for the testcase is that we assume
>> both SVE and Advanced SIMD versions of zp exist and call the SVE version
>> instead of the Advanced SIMD version.
>> 
>> There again, for little-endian -msve-vector-bits=128 there should be no
>> overhead with using the Advanced SIMD version, and big-endian
>> -msve-vector-bits=128 is equivalent to -msve-vector-bits=scalable.
>> 
>> Things would get more interesting for:
>> 
>>   #pragma omp declare simd simdlen(8)
>>   int
>>   zp (int);
>> 
>> and -msve-vector-bits=256, but again, we don't yet support simdlen(8) for
>> Advanced SIMD.
>> 
>> So all in all, I agree this is the right fix.  Pushed to master with a minor
>> whitespace fixup for:
>> 
>> > +  gassign *new_stmt
>> > += gimple_build_assign (make_ssa_name (atype),
>> > + vec_oprnd0);
>> 
>> …the indentation on this line.
>> 
>> Thanks,
>> Richard
>
> Thanks for reviewing and installing the patch. By the way, are there any 
> plans for adding support for SVE omp simd functions in the future?

FWIW, I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
to track this.  I guess the easiest thing would be for anyone who's
planning on working on it to assign themselves.

Thanks,
Richard


Re: [PATCH][Hashtable 3/6] Fix noexcept qualifications

2020-07-30 Thread Jonathan Wakely via Gcc-patches

On 29/07/20 11:33 +0200, François Dumont via Libstdc++ wrote:

+using type4 = std::unordered_set, std::equal_to,
+ not_noexcept_dflt_cons_alloc>>;


A couple of these tests use the wrong allocator type, which means they
fail when compiled as C++20.

Fixed with this patch. Tested x86_64-linux, committed to trunk.


commit 357beca8bce179315bdf112c0f1df20ff5874f39
Author: Jonathan Wakely 
Date:   Thu Jul 30 18:41:47 2020

libstdc++: Fix tests using wrong allocator type

libstdc++-v3/ChangeLog:

* testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc:
Use allocator with the correct value type.
* testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc:
Likewise.

diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc b/libstdc++-v3/testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc
index b7c0d802125..8511cb95421 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multiset/cons/noexcept_default_construct.cc
@@ -62,7 +62,7 @@ template
   };
 
 using type4 = std::unordered_multiset, std::equal_to,
-  not_noexcept_dflt_cons_alloc>>;
+  not_noexcept_dflt_cons_alloc>;
 
 static_assert(!std::is_nothrow_default_constructible::value,
 	  "not noexcept default constructible");
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc
index d60a81ffb7c..44db4aec6cf 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_set/cons/noexcept_default_construct.cc
@@ -62,7 +62,7 @@ template
   };
 
 using type4 = std::unordered_set, std::equal_to,
-  not_noexcept_dflt_cons_alloc>>;
+ not_noexcept_dflt_cons_alloc>;
 
 static_assert(!std::is_nothrow_default_constructible::value,
 	  "not noexcept default constructible");


Re: [PATCH 26/29] rs6000: Add Power9 builtins

2020-07-30 Thread Bill Schmidt via Gcc-patches

On 7/30/20 12:15 PM, will schmidt wrote:

On Mon, 2020-07-27 at 09:14 -0500, Bill Schmidt wrote:

From: Bill Schmidt 

2020-07-26  Bill Schmidt  

* config/rs6000/rs6000-builtin-new.def: Add power9,
power9-vector, and power9-64 builtins.
---
  gcc/config/rs6000/rs6000-builtin-new.def | 354
+++
  1 file changed, 354 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def
b/gcc/config/rs6000/rs6000-builtin-new.def
index 2f918c1d69e..1338f543a6a 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -2394,3 +2394,357 @@
  XSCVSPDPN vsx_xscvspdpn {}


+; Power9 vector builtins.
+[power9-vector]
+  const vus __builtin_altivec_convert_4f32_8i16 (vf, vf);
+CONVERT_4F32_8I16 convert_4f32_8i16 {}
+
+  const unsigned int __builtin_altivec_first_match_index_v16qi (vsc,
vsc);
+VFIRSTMATCHINDEX_V16QI first_match_index_v16qi {}

Noting a "vus" on the previous entry, was/is a define for "ui ==
unsigned int" considered?
Similar question for subsequent types that are still fully spelled out.
(unsigned char, unsigned short, ... )


I did consider it, but I feel there's a balance between keeping the 
lines short and keeping them readable.  The vector types are egregious 
space hogs, so I felt I needed to abbreviate them to avoid line wrap, 
but in most cases the scalar variables are fine with normal types.  One 
of those YMMV situations.






+; Miscellaneour P9 functions


Miscellaneous   :-)


Thanks :)





+
+
+; These things need some review to see whether they really require
+; MASK_POWERPC64.  For xsxexpdp, this seems to be fine for 32-bit,
+; because the result will always fit in 32 bits and the return
+; value is SImode; but the pattern currently requires TARGET_64BIT.
+; On the other hand, xsssigdp has a result that doesn't fit in

perhaps xsxsigdp ?


Indeed!

Thanks very much for the review!

Bill





+; 32 bits, and the return value is DImode, so it seems that
+; TARGET_64BIT (actually TARGET_POWERPC64) is justified.  TBD. 
+[power9-64]
+; The following two are inexplicably named __builtin_altivec_* while
+; their load counterparts are __builtin_vsx_*.  Need to deprecate
+; these interfaces in favor of the other naming scheme (or vice
versa).

Thanks
-Will




Re: [PATCH] c++: decl_constant_value and unsharing [PR96197]

2020-07-30 Thread Patrick Palka via Gcc-patches
On Thu, 30 Jul 2020, Jason Merrill wrote:

> On 7/30/20 9:49 AM, Patrick Palka wrote:
> > On Thu, 30 Jul 2020, Jason Merrill wrote:
> > 
> > > On 7/21/20 3:07 PM, Patrick Palka wrote:
> > > > In the testcase from the PR we are seeing excessive memory use (> 5GB)
> > > > during constexpr evaluation, almost all of which is due to the call to
> > > > decl_constant_value in the VAR_DECL/CONST_DECL branch of
> > > > cxx_eval_constant_expression.  We reach here every time we evaluate an
> > > > ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
> > > > often, and from there decl_constant_value makes an unshared copy of the
> > > > VAR_DECL's initializer, even though the unsharing is not needed at this
> > > > call site (because it is up to callers of cxx_eval_constant_expression
> > > > to unshare).
> > > > 
> > > > To fix this, this patch moves the responsibility of unsharing the result
> > > > of decl_constant_value, decl_really_constant_value and
> > > > scalar_constant_value from the callee to the caller.
> > > 
> > > How about creating another entry point that doesn't unshare, and using
> > > that in
> > > constexpr evaluation?
> > 
> > Is something like this what you have in mind?  This adds a defaulted
> > bool parameter to the three routines that controls unsharing (except for
> > decl_constant_value, which instead needs a new overload if we don't want
> > to touch its common declaration in c-common.h.)  Bootstrap and regtest
> > in progress.
> 
> That looks good, though I don't think we need the added parameter for
> scalar_constant_value.

Hmm, I guess it should always be cheap to unshare an scalar initializer.
So consider the parameter removed for scalar_constant_value.

> 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: decl_constant_value and unsharing [PR96197]
> > 
> > In the testcase from the PR we are seeing excessive memory use (> 5GB)
> > during constexpr evaluation, almost all of which is due to the call to
> > decl_constant_value in the VAR_DECL/CONST_DECL branch of
> > cxx_eval_constant_expression.  We reach here every time we evaluate an
> > ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
> > often, and from there decl_constant_value makes an unshared copy of the
> > VAR_DECL's initializer, even though the unsharing is not needed at this
> > call site (because callers of cxx_eval_constant_expression already
> > unshare its result when necessary).
> > 
> > To fix this excessive unsharing, this patch introduces a new defaulted
> > parameter unshare_p to scalar_constant_value, decl_really_constant_value
> > and decl_constant_value to allow callers to decide if these routines
> > should unshare their result before returning.  (Since decl_constant_value
> > is declared in c-common.h, it instead gets a new overload declared in
> > cp-tree.h.)
> > 
> > As a simplification, this patch also moves the call to unshare_expr in
> > constant_value_1 outside of the loop, since calling unshare_expr on a
> > DECL_P should be a no-op.
> > 
> > Additionally, in unify there is one call to scalar_constant_value that
> > seems to be dead code since we apparently don't see unlowered
> > enumerators anymore, so this patch replaces it with an appropriate
> > gcc_assert.

I'll also push this change as a separate patch, now that
scalar_constant_value is unrelated to the rest of the patch.

Here is the main patch that I guess I'll commit after a full bootstrap
and regtest:

-- >8 --

Subject: [PATCH] c++: decl_constant_value and unsharing [PR96197]

In the testcase from the PR we are seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression.  We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
often, and from there decl_constant_value makes an unshared copy of the
VAR_DECL's initializer, even though the unsharing is not needed at this
call site (because callers of cxx_eval_constant_expression already
unshare its result when necessary).

To fix this excessive unsharing, this patch introduces a new defaulted
parameter unshare_p to decl_really_constant_value and
decl_constant_value to allow callers to choose whether these routines
should unshare the returned result.  (Since decl_constant_value is
declared in c-common.h, we introduce a new overload declared in
cp-tree.h instead of changing its existing declaration.)

As a simplification, this patch also moves the call to unshare_expr in
constant_value_1 outside of the loop, since calling unshare_expr on a
DECL_P should be a no-op.

Now that the the calls to decl_constant_value and
decl_really_constant_value from cxx_eval_constant_expression no longer
unshare their result, memory use during constexpr evaluation for the
testcase from the PR falls from ~5GB to 15MB according to -ftime-report.

gcc/cp/ChangeLog:

PR c++/96197
* constexpr.c (c

Re: [PATCH] avoid -Wnonnull on synthesized condition in static_cast (PR 96003)

2020-07-30 Thread Jason Merrill via Gcc-patches

On 7/30/20 12:25 PM, Martin Sebor wrote:

On 7/29/20 2:27 PM, Jason Merrill wrote:

On 7/23/20 3:08 PM, Martin Sebor wrote:

Another test case added to the bug since I posted the patch shows
that the no-warning bit set by the C++ front end is easily lost
during early folding, triggering the -Wnonull again despite
the suppression.  The attached revision tweaks the folder in just
this one case to let the suppression take effect in this situation.

In light of how pervasive this potential for stripping looks (none
of the other folders preserves the bit) the tweak feels like a band-
aid rather than a general solution.  But implementing something
better (and mainly developing tests to verify that it works in
general) would probably be quite the project.  So I leave it for
some other time.


How about in check_function_arguments_recurse COND_EXPR handling, 
checking for TREE_NO_WARNING on the condition?  Then we wouldn't need 
to deal with setting and copying it on the COND_EXPR itself.


The condition is already checked at the beginning of the function.


I mean:


   if (TREE_CODE (param) == COND_EXPR)
 {
+  if (TREE_NO_WARNING (TREE_OPERAND (param, 0)))
+   return;
+
   /* Simplify to avoid warning for an impossible case.  */


But your patch is OK as is.


But the latest trunk doesn't seem to need the fold-const.c change
anymore to suppress the warning and the original patch is good
enough.  The updated test should catch the regression if
the COND_EXPR should again lose the no warning bit.

Martin





On 7/17/20 1:00 PM, Martin Sebor wrote:

The recent enhancement to treat the implicit this pointer argument
as nonnull in member functions triggers spurious -Wnonnull for
the synthesized conditional expression the C++ front end replaces
the pointer with in some static_cast expressions.  The front end
already sets the no-warning bit for the test but not for the whole
conditional expression, so the attached fix extends the same solution
to it.

The consequence of this fix is that user-written code like this:

   static_cast(p ? p : 0)->f ();
or
   static_cast(p ? p : nullptr)->f ();

don't trigger the warning because they are both transformed into
the same expression as:

   static_cast(p)->f ();

What still does trigger it is this:

   static_cast(p ? p : (T*)0)->f ();

because here it's the inner COND_EXPR's no-warning bit that's set
(the outer one is clear), whereas in the former expressions it's
the other way around.  It would be nice if this worked consistently
but I didn't see an easy way to do that and more than a quick fix
seems outside the scope for this bug.

Another case reported by someone else in the same bug involves
a dynamic_cast.  A simplified test case goes something like this:

   if (dynamic_cast(p))
 dynamic_cast(p)->f ();

The root cause is the same: the front end emitting the COND_EXPR

   ((p != 0) ? ((T*)__dynamic_cast(p, (& _ZTI1B), (& _ZTI1C), 0)) : 0)

I decided not to suppress the warning in this case because doing
so would also suppress it in unconditional calls with the result
of the cast:

   dynamic_cast(p)->f ();

and that doesn't seem helpful.  Instead, I'd suggest to make
the second cast in the if statement to reference to T&:

   if (dynamic_cast(p))
 dynamic_cast(*p).f ();

Martin









Re: [PATCH 8/9] [OpenACC] Fix standalone attach for Fortran assumed-shape array pointers

2020-07-30 Thread Julian Brown
On Thu, 30 Jul 2020 11:53:10 +0200
Thomas Schwinge  wrote:

> Hi Julian, Tobias!
> 
> On 2020-07-27T15:33:41+0100, Julian Brown 
> wrote:
> > You can still get a "leak" if you try to attach a
> > synthesized/temporary array descriptor that goes out of scope
> > before the pointed-to data it refers to does -- that's a problem
> > I've mentioned earlier, and is kind-of unavoidable unless we do
> > some more sophisticated analysis to diagnose it as user error.  
> 
> ACK.  Do you remember if you already had a testcase (conceptual, or
> actual) to demonstrate that problem?

I have the attached, but it's not "clean", i.e. not really
testsuite-ready -- the breakage demonstrated depends on the stack
layout, and it only "works" at -O0.

HTH,

Julian
program myprog
  implicit none
  integer :: a(16)
  integer :: i

  call enterdata_wrapper(a, 16)
  call exitdata_wrapper(a, 16)

  contains

  subroutine enterdata_wrapper(a, n)
implicit none
integer :: n
integer, target :: a(n)
integer :: aa(16)
integer :: bb(16)
integer, pointer :: ap(:)
integer :: cc(16)
integer :: dd(16)

! An array descriptor appears somewhere around here...
ap => a

!$acc enter data copyin(aa,bb,cc,dd)
call enterdata(ap)
!$acc exit data copyout(aa,bb,cc,dd)

! ...and goes out of scope.
  end subroutine enterdata_wrapper

  subroutine enterdata(a)
implicit none
integer, pointer :: a(:)

! Map "to(a.data) to_pset(a) pointer(a.data)"
!$acc enter data copyin(a)
  end subroutine enterdata

  subroutine exitdata_wrapper(a, n)
implicit none
integer :: n
integer, target :: a(n)
integer :: aa(32)
integer :: bb(32)
integer, pointer :: ap(:)
integer :: cc(32)
integer :: dd(32)

! A different array descriptor appears...
ap => a

!$acc enter data copyin(aa,bb,cc,dd)
call exitdata(ap)
!$acc exit data copyout(aa,bb,cc,dd)

! ...and goes out of scope.
  end subroutine exitdata_wrapper

  subroutine exitdata(a)
implicit none
integer, pointer :: a(:)

! Try to unmap the fresh array descriptor: FAILS.
!$acc exit data copyout(a)
  end subroutine exitdata
end program myprog


[committed] libstdc++: Make COW string use allocator_traits for nested types

2020-07-30 Thread Jonathan Wakely via Gcc-patches
When compiled as C++20 the COW std::string fails due to assuming that
the allocator always defines size_type and difference_type. That has
been incorrect since C++11, but we got away with it for specializations
using std::allocator until those members were removed in C++20.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (size_type, difference_type):
Use allocator_traits to obtain the allocator's size_type and
difference_type.

Tested powerpc64le-linux. Committed to trunk.


commit 684d6ee140af6585c18c8790f8f5bddfcc6bd153
Author: Jonathan Wakely 
Date:   Thu Jul 30 20:58:09 2020

libstdc++: Make COW string use allocator_traits for nested types

When compiled as C++20 the COW std::string fails due to assuming that
the allocator always defines size_type and difference_type. That has
been incorrect since C++11, but we got away with it for specializations
using std::allocator until those members were removed in C++20.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (size_type, difference_type):
Use allocator_traits to obtain the allocator's size_type and
difference_type.

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index bc0c256b65e..52cecdd7ca1 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3147,8 +3147,8 @@ _GLIBCXX_END_NAMESPACE_CXX11
   typedef _Traits  traits_type;
   typedef typename _Traits::char_type  value_type;
   typedef _Alloc   allocator_type;
-  typedef typename _CharT_alloc_type::size_typesize_type;
-  typedef typename _CharT_alloc_type::difference_type   difference_type;
+  typedef typename _CharT_alloc_traits::size_type  size_type;
+  typedef typename _CharT_alloc_traits::difference_type difference_type;
 #if __cplusplus < 201103L
   typedef typename _CharT_alloc_type::referencereference;
   typedef typename _CharT_alloc_type::const_reference   const_reference;


[PATCH] RISC-V/libgcc: Reduce the size of RV64 millicode by 6 bytes

2020-07-30 Thread Maciej W. Rozycki via Gcc-patches
Rewrite code sequences throughout the 64-bit RISC-V `__riscv_save_*' 
routines replacing `li t1, -48', `li t1, -64', and `li t1, -80', 
instructions, which do not have a compressed encoding, respectively with 
`li t1, 3', `li t1, 4', and `li t1, 4', which do, and then adjusting the 
remaining code accordingly observing that `sub sp, sp, t1' takes the 
same amount of space as an `slli t1, t1, 4'/`add sp, sp, t1' instruction 
pair does, again due to the use of compressed encodings, saving 6 bytes 
total.

This change does increase code size by 4 bytes for RISC-V processors 
lacking the compressed instruction set, however their users couldn't 
care about the code size or they would have chosen an implementation 
that does have the compressed instructions, wouldn't they?

libgcc/
* config/riscv/save-restore.S [__riscv_xlen == 64] 
(__riscv_save_10, __riscv_save_8, __riscv_save_6, __riscv_save_4)
(__riscv_save_2): Replace negative immediates used for the final 
stack pointer adjustment with positive ones, right-shifted by 4.
---
Hi,

 This is hopefully functionally obviously correct.  There were also no 
regressions in `riscv64-linux-gnu' lp64d multilib testing across all our 
testsuites (with QEMU run in the Linux user emulation mode) where target 
libraries, including glibc, have been built with `-Os -msave-restore', 
that is with millicode enabled.

 OK to apply?

  Maciej
---
 libgcc/config/riscv/save-restore.S |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

gcc-riscv-libgcc-save-sll.diff
Index: gcc/libgcc/config/riscv/save-restore.S
===
--- gcc.orig/libgcc/config/riscv/save-restore.S
+++ gcc/libgcc/config/riscv/save-restore.S
@@ -45,7 +45,7 @@ FUNC_BEGIN (__riscv_save_10)
   .cfi_restore 27
   addi sp, sp, -112
   .cfi_def_cfa_offset 112
-  li t1, -16
+  li t1, 1
 .Ls10:
   sd s10, 16(sp)
   .cfi_offset 26, -96
@@ -60,7 +60,7 @@ FUNC_BEGIN (__riscv_save_8)
   .cfi_restore 27
   addi sp, sp, -112
   .cfi_def_cfa_offset 112
-  li t1, -32
+  li t1, 2
 .Ls8:
   sd s8, 32(sp)
   .cfi_offset 24, -80
@@ -77,7 +77,7 @@ FUNC_BEGIN (__riscv_save_6)
   .cfi_restore 27
   addi sp, sp, -112
   .cfi_def_cfa_offset 112
-  li t1, -48
+  li t1, 3
 .Ls6:
   sd s6, 48(sp)
   .cfi_offset 22, -64
@@ -99,7 +99,7 @@ FUNC_BEGIN (__riscv_save_4)
   .cfi_restore 27
   addi sp, sp, -112
   .cfi_def_cfa_offset 112
-  li t1, -64
+  li t1, 4
 .Ls4:
   sd s4, 64(sp)
   .cfi_offset 20, -48
@@ -123,7 +123,7 @@ FUNC_BEGIN (__riscv_save_2)
   .cfi_restore 27
   addi sp, sp, -112
   .cfi_def_cfa_offset 112
-  li t1, -80
+  li t1, 5
 .Ls2:
   sd s2, 80(sp)
   .cfi_offset 18, -32
@@ -133,9 +133,10 @@ FUNC_BEGIN (__riscv_save_2)
   .cfi_offset 8, -16
   sd ra, 104(sp)
   .cfi_offset 1, -8
+  slli t1, t1, 4
   # CFA info is not correct in next 2 instruction since t1's
   # value is depend on how may register really save.
-  sub sp, sp, t1
+  add sp, sp, t1
   jr t0
   .cfi_endproc
 FUNC_END (__riscv_save_12)


[committed] d: Fix associative array literals that don't have alignment holes filled (PR96152)

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an assert that is triggered at run-time due to the hash
of an associative array literal not matching a non-literal with the same
contents.  Associative array literals are now filled using memset()
prior to usage, with LTR evalution of side-effects enforced.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline.

Regards
Iain.

---
gcc/d/ChangeLog:

PR d/96152
* d-codegen.cc (build_array_from_exprs): New function.
* d-tree.h (build_array_from_exprs): Declare.
* expr.cc (ExprVisitor::visit (AssocArrayLiteralExp *)): Use
build_array_from_exprs to generate key and value arrays.

gcc/testsuite/ChangeLog:

PR d/96152
* gdc.dg/pr96152.d: New test.
---
 gcc/d/d-codegen.cc | 36 ++
 gcc/d/d-tree.h |  1 +
 gcc/d/expr.cc  | 33 +--
 gcc/testsuite/gdc.dg/pr96152.d | 32 ++
 4 files changed, 78 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr96152.d

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index a38aa6c55e0..2dce09d7187 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -1722,6 +1722,42 @@ build_array_from_val (Type *type, tree val)
   return build_constructor (build_ctype (type), elms);
 }
 
+/* Build a static array of type TYPE from an array of EXPS.
+   If CONST_P is true, then all elements in EXPS are constants.  */
+
+tree
+build_array_from_exprs (Type *type, Expressions *exps, bool const_p)
+{
+  /* Build a CONSTRUCTOR from all expressions.  */
+  vec  *elms = NULL;
+  vec_safe_reserve (elms, exps->length);
+
+  Type *etype = type->nextOf ();
+  tree satype = make_array_type (etype, exps->length);
+
+  for (size_t i = 0; i < exps->length; i++)
+{
+  Expression *expr = (*exps)[i];
+  tree t = build_expr (expr, const_p);
+  CONSTRUCTOR_APPEND_ELT (elms, size_int (i),
+ convert_expr (t, expr->type, etype));
+}
+
+  /* Create a new temporary to store the array.  */
+  tree var = build_local_temp (satype);
+
+  /* Fill any alignment holes with zeroes.  */
+  TypeStruct *ts = etype->baseElemOf ()->isTypeStruct ();
+  tree init = NULL;
+  if (ts && (!identity_compare_p (ts->sym) || ts->sym->isUnionDeclaration ()))
+init = build_memset_call (var);
+
+  /* Initialize the temporary.  */
+  tree assign = modify_expr (var, build_constructor (satype, elms));
+  return compound_expr (compound_expr (init, assign), var);
+}
+
+
 /* Implicitly converts void* T to byte* as D allows { void[] a; &a[3]; }  */
 
 tree
diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index 48c8ef09dd1..df317d557eb 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -562,6 +562,7 @@ extern tree build_offset (tree, tree);
 extern tree build_memref (tree, tree, tree);
 extern tree build_array_set (tree, tree, tree);
 extern tree build_array_from_val (Type *, tree);
+extern tree build_array_from_exprs (Type *, Expressions *, bool);
 extern tree void_okay_p (tree);
 extern tree build_array_bounds_call (const Loc &);
 extern tree build_bounds_condition (const Loc &, tree, tree, bool);
diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 355561a481e..20ab49d7b8c 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -2799,30 +2799,14 @@ public:
 
 /* Build an expression that assigns all expressions in KEYS
to a constructor.  */
-vec  *kelts = NULL;
-vec_safe_reserve (kelts, e->keys->length);
-for (size_t i = 0; i < e->keys->length; i++)
-  {
-   Expression *key = (*e->keys)[i];
-   tree t = build_expr (key);
-   CONSTRUCTOR_APPEND_ELT (kelts, size_int (i),
-   convert_expr (t, key->type, ta->index));
-  }
-tree tkeys = make_array_type (ta->index, e->keys->length);
-tree akeys = build_constructor (tkeys, kelts);
+tree akeys = build_array_from_exprs (ta->index->sarrayOf (e->keys->length),
+e->keys, this->constp_);
+tree init = stabilize_expr (&akeys);
 
 /* Do the same with all expressions in VALUES.  */
-vec  *velts = NULL;
-vec_safe_reserve (velts, e->values->length);
-for (size_t i = 0; i < e->values->length; i++)
-  {
-   Expression *value = (*e->values)[i];
-   tree t = build_expr (value);
-   CONSTRUCTOR_APPEND_ELT (velts, size_int (i),
-   convert_expr (t, value->type, ta->next));
-  }
-tree tvals = make_array_type (ta->next, e->values->length);
-tree avals = build_constructor (tvals, velts);
+tree avals = build_array_from_exprs (ta->next->sarrayOf 
(e->values->length),
+e->values, this->constp_);
+init = compound_expr (init, stabilize_expr (&avals));
 
 /* Generate: _d_assocarrayliteralTX (ti, keys, vals);  */
 tree keys = d_array_value (build_ctype (ta->index->arrayOf ()),
@@ -2840,8 +282

[committed] d: Fix ICE in expand_intrinsic_vaarg (PR96140)

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an ICE in the D front-end in the va_arg intrinsic
functions.  Both intrinsics did not handler the case where the va_list
object comes from a ref parameter.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline.

Regards
Iain.

---
gcc/d/ChangeLog:

PR d/96140
* intrinsics.cc (expand_intrinsic_vaarg): Handle ref parameters as
arguments to va_arg().
(expand_intrinsic_vastart): Handle ref parameters as arguments to
va_start().

gcc/testsuite/ChangeLog:

PR d/96140
* gdc.dg/pr96140.d: New test.
---
 gcc/d/intrinsics.cc| 23 +++
 gcc/testsuite/gdc.dg/pr96140.d | 15 +++
 2 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr96140.d

diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc
index 28667c63e17..8eec0af60ee 100644
--- a/gcc/d/intrinsics.cc
+++ b/gcc/d/intrinsics.cc
@@ -549,8 +549,17 @@ expand_intrinsic_vaarg (tree callexp)
 {
   parmn = CALL_EXPR_ARG (callexp, 1);
   STRIP_NOPS (parmn);
-  gcc_assert (TREE_CODE (parmn) == ADDR_EXPR);
-  parmn = TREE_OPERAND (parmn, 0);
+
+  /* The `ref' argument to va_arg is either an address or reference,
+get the value of it.  */
+  if (TREE_CODE (parmn) == PARM_DECL && POINTER_TYPE_P (TREE_TYPE (parmn)))
+   parmn = build_deref (parmn);
+  else
+   {
+ gcc_assert (TREE_CODE (parmn) == ADDR_EXPR);
+ parmn = TREE_OPERAND (parmn, 0);
+   }
+
   type = TREE_TYPE (parmn);
 }
 
@@ -584,10 +593,16 @@ expand_intrinsic_vastart (tree callexp)
   /* The va_list argument should already have its address taken.  The second
  argument, however, is inout and that needs to be fixed to prevent a
  warning.  Could be casting, so need to check type too?  */
-  gcc_assert (TREE_CODE (ap) == ADDR_EXPR && TREE_CODE (parmn) == ADDR_EXPR);
+  gcc_assert (TREE_CODE (ap) == ADDR_EXPR
+ || (TREE_CODE (ap) == PARM_DECL
+ && POINTER_TYPE_P (TREE_TYPE (ap;
 
   /* Assuming nobody tries to change the return type.  */
-  parmn = TREE_OPERAND (parmn, 0);
+  if (TREE_CODE (parmn) != PARM_DECL)
+{
+  gcc_assert (TREE_CODE (parmn) == ADDR_EXPR);
+  parmn = TREE_OPERAND (parmn, 0);
+}
 
   return call_builtin_fn (callexp, BUILT_IN_VA_START, 2, ap, parmn);
 }
diff --git a/gcc/testsuite/gdc.dg/pr96140.d b/gcc/testsuite/gdc.dg/pr96140.d
new file mode 100644
index 000..d25bb5d3360
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96140.d
@@ -0,0 +1,15 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96140
+// { dg-do compile }
+module pr94140;
+
+import core.stdc.stdarg;
+
+void test_va_arg(ref int a, ...)
+{
+return va_arg!int(_argptr, a);
+}
+
+void test_va_start(ref va_list a, ...)
+{
+return va_start(a, a);
+}
-- 
2.25.1



[committed] d: Add -Wvarargs warning flag to the D front-end (PR96154)

2020-07-30 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adds the -Wvarargs warning flag to the D front-end.  The
language has C-style variadic functions and va_start/va_arg, so it is
right to also have warnings for inproper use.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline.

Regards
Iain

---
gcc/d/ChangeLog:

PR d/96154
* gdc.texi (Warnings): Document -Wvarargs.
* lang.opt: Add -Wvarargs

gcc/testsuite/ChangeLog:

PR d/96154
* gdc.dg/pr96154a.d: New test.
* gdc.dg/pr96154b.d: New test.
---
 gcc/d/gdc.texi  |  6 ++
 gcc/d/lang.opt  |  4 
 gcc/testsuite/gdc.dg/pr96154a.d | 18 ++
 gcc/testsuite/gdc.dg/pr96154b.d | 19 +++
 4 files changed, 47 insertions(+)
 create mode 100644 gcc/testsuite/gdc.dg/pr96154a.d
 create mode 100644 gcc/testsuite/gdc.dg/pr96154b.d

diff --git a/gcc/d/gdc.texi b/gcc/d/gdc.texi
index 2ce560f3cae..9bc1297cba4 100644
--- a/gcc/d/gdc.texi
+++ b/gcc/d/gdc.texi
@@ -600,6 +600,12 @@ Warn when a @code{pragma()} is encountered that is not 
understood by
 where a pragma that is part of the D language, but not implemented by
 the compiler, won't get reported.
 
+@item -Wno-varargs
+@cindex Wvarargs
+@cindex Wno-varargs
+Do not warn upon questionable usage of the macros used to handle variable
+arguments like @code{va_start}.
+
 @item -fignore-unknown-pragmas
 @cindex @option{-fignore-unknown-pragmas}
 @cindex @option{-fno-ignore-unknown-pragmas}
diff --git a/gcc/d/lang.opt b/gcc/d/lang.opt
index e14c83d1adb..ade92d21cc6 100644
--- a/gcc/d/lang.opt
+++ b/gcc/d/lang.opt
@@ -146,6 +146,10 @@ Wunknown-pragmas
 D LangEnabledBy(D, Wall)
 ; Documented in C
 
+Wvarargs
+D
+; Documented in C
+
 X
 D
 Generate JSON file.
diff --git a/gcc/testsuite/gdc.dg/pr96154a.d b/gcc/testsuite/gdc.dg/pr96154a.d
new file mode 100644
index 000..8c0ca658c55
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96154a.d
@@ -0,0 +1,18 @@
+// { dg-do compile }
+
+import core.stdc.stdarg;
+
+void
+error (int a)
+{
+  va_list vp;
+  va_start (vp, a); // { dg-error "used in function with fixed arguments" }
+}
+
+void
+warn (int a, int b, ...)
+{
+va_list vp;
+va_start (vp, a); // { dg-warning "second parameter" }
+va_end (vp);
+}
diff --git a/gcc/testsuite/gdc.dg/pr96154b.d b/gcc/testsuite/gdc.dg/pr96154b.d
new file mode 100644
index 000..dec7f489308
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/pr96154b.d
@@ -0,0 +1,19 @@
+// { dg-options "-Wno-varargs" }
+// { dg-do compile }
+
+import core.stdc.stdarg;
+
+void
+error (int a)
+{
+  va_list vp;
+  va_start (vp, a); // { dg-error "used in function with fixed arguments" }
+}
+
+void
+warn (int a, int b, ...)
+{
+va_list vp;
+va_start (vp, a); // No warning because of -Wno-varargs in effect.
+va_end (vp);
+}
-- 
2.25.1



Re: [PATCH] c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

2020-07-30 Thread Jason Merrill via Gcc-patches

On 7/30/20 10:57 AM, Jakub Jelinek wrote:

On Thu, Jul 30, 2020 at 10:16:46AM -0400, Jason Merrill wrote:

It checks the various std::bit_cast requirements, when not constexpr
evaluated acts pretty much like VIEW_CONVERT_EXPR of the source argument
to the destination type and the hardest part is obviously the constexpr
evaluation.  I couldn't use the middle-end native_encode_initializer,
because it needs to handle the C++ CONSTRUCTOR_NO_CLEARING vs. negation of
that


CONSTRUCTOR_NO_CLEARING isn't specific to C++.


That is true, but the middle-end doesn't really use that much (except one
spot in the gimplifier and when deciding if two constructors are equal).

value initialization of missing members if there are any etc.


Doesn't native_encode_initializer handle omitted initializers already? Any
elements for which the usual zero-initialization is wrong should already
have explicit initializers by the time we get here.


It assumes zero initialization for everything that is not explicitly 
initialized.
When it is used in the middle-end, CONSTRUCTORs are either used in
initializers of non-automatic vars and then they are zero initialized, or
for automatic variables only empty CONSTRUCTORs are allowed.
So, what it does is memset the memory to zero before trying to fill in the
individual initializers for RECORD_TYPE/ARRAY_TYPE.

Even if we are guaranteed that (what guarantees it?) when __builtin_bit_cast
is constexpr evaluated no initializer will be omitted if may be value 
initialized
to something other than all zeros,


This is established by digest_init/process_init_constructor_record, 
which replace implicit value-initialization with explicit values when 
it's not simple zero-initialization.



we still need to track what bits are well
defined and what are not (e.g. any padding in there).



Thinking about it now, maybe the mask handling for !CONSTRUCTOR_NO_CLEARING
is incorrect though if there are missing initializers, because those omitted
initializers still could have paddings with unspecified content, right?


Zero-initialization (and thus trivial value-initialization) of a class 
also zeroes out padding bits, so when !CONSTRUCTOR_NO_CLEARING all bits 
should be well defined.



needs to handle bitfields even if they don't have an integral representative


Can we use TYPE_MODE for this?


When the bitfield representative is an array, it will have BLKmode
TYPE_MODE, and we need to find some suitable other integral type.
The bit-field handling for BLKmode representatives can be copied to the 
middle-end
implementation.


These all sound like things that could be improved in
native_encode_initializer rather than duplicate a complex function.


I think in the end only small parts of it are actually duplicated.
It is true that both native_encode_initializer and the new C++ FE
counterpart are roughly 310 lines long, but the mask handling the latter
performs is unlikely useful for the middle-end (and if added there it would
need to allow it thus to be NULL and not filled), on the other side the very
complex handling the middle-end version needs to do where it can be asked
only for a small portion of the memory complicates things a lot.
Maybe we could say that non-NULL mask is only supported for the case where
one asks for everything and have some assertions for that, but I fear we'd
still end up with ~ 450-500 lines of code in the middle-end compared to the
current 310.

I guess I can try to merge the two back in one with optional mask and see
how does it look like.  But most likely the !CONSTRUCTOR_NO_CLEARING
handling etc. would need to be conditionalized on this special C++ mode
(mask non-NULL) etc.


+  if (REFERENCE_REF_P (arg))
+arg = TREE_OPERAND (arg, 0);


Why?


I've added it so that the decay_conversion later on didn't mishandle
references to arrays.


+  if (error_operand_p (arg))
+return error_mark_node;
+
+  if (!type_dependent_expression_p (arg))
+{
+  if (TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+   {
+ /* Don't perform array-to-pointer conversion.  */
+ arg = mark_rvalue_use (arg, loc, true);
+ if (!complete_type_or_maybe_complain (TREE_TYPE (arg), arg, complain))
+   return error_mark_node;
+   }


But then I've added the above to match more the __builtin_bit_cast behavior
in clang, so maybe the if (REFERENCE_REF_P (arg)) is not needed anymore.
For the std::bit_cast uses, the argument will be always a reference, never
an array directly and not sure if we want to encourage people to use the
builtin directly in their code.


+  else
+   arg = decay_conversion (arg, complain);
+
+  if (error_operand_p (arg))
+   return error_mark_node;
+
+  arg = convert_from_reference (arg);
+  if (!trivially_copyable_p (TREE_TYPE (arg)))
+   {
+ error_at (cp_expr_loc_or_loc (arg, loc),
+   "%<__builtin_bit_cast%> source type %qT "
+   "is not trivially copyable", TREE_TYPE (arg));
+

[PATCH] [og10] openacc: Unshare reduction temporaries for GCN

2020-07-30 Thread Julian Brown
The GCN backend uses tree nodes like MEM((__lds TYPE *) )
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes cannot be shared during gimplification, but are so in some
circumstances. This is detected when appropriate --enable-checking
options are used. This patch unshares such nodes when they are reused
more than once.

Tested with offloading to AMD GCN. I will apply shortly.

Thanks,

Julian

2020-07-30  Julian Brown  

gcc/
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Do not
cache/share decls for reduction temporaries between invocations.
(gcn_goacc_reduction_teardown): Unshare VAR on second use.
* config/gcn/gcn.c (gcn_init_machine_status): Do not initialise
reduc_decls.
* config/gcn/gcn.h (machine_function): Remove reduc_decls cache.
---
 gcc/ChangeLog.omp |  9 +
 gcc/config/gcn/gcn-tree.c | 30 ++
 gcc/config/gcn/gcn.c  |  3 ---
 gcc/config/gcn/gcn.h  |  1 -
 4 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 12b8c044a1b1..34622a3bfa1b 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,12 @@
+2020-07-30  Julian Brown  
+
+   * config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Do not
+   cache/share decls for reduction temporaries between invocations.
+   (gcn_goacc_reduction_teardown): Unshare VAR on second use.
+   * config/gcn/gcn.c (gcn_init_machine_status): Do not initialise
+   reduc_decls.
+   * config/gcn/gcn.h (machine_function): Remove reduc_decls cache.
+
 2020-07-30  Julian Brown  
 
* config/gcn/gcn-tree.c (gcn_goacc_reduction_teardown): Remove useless
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index bfde71555d8a..96fdf75a8cd9 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -429,7 +429,6 @@ static tree
 gcn_goacc_get_worker_red_decl (tree type, unsigned offset)
 {
   machine_function *machfun = cfun->machine;
-  tree existing_decl;
 
   if (TREE_CODE (type) == REFERENCE_TYPE)
 type = TREE_TYPE (type);
@@ -439,29 +438,12 @@ gcn_goacc_get_worker_red_decl (tree type, unsigned offset)
(TYPE_QUALS (type)
 | ENCODE_QUAL_ADDR_SPACE (ADDR_SPACE_LDS)));
 
-  if (machfun->reduc_decls
-  && offset < machfun->reduc_decls->length ()
-  && (existing_decl = (*machfun->reduc_decls)[offset]))
-{
-  gcc_assert (TREE_TYPE (existing_decl) == var_type);
-  return existing_decl;
-}
-  else
-{
-  gcc_assert (offset
- < (machfun->reduction_limit - machfun->reduction_base));
-  tree ptr_type = build_pointer_type (var_type);
-  tree addr = build_int_cst (ptr_type, machfun->reduction_base + offset);
-
-  tree decl = build_simple_mem_ref (addr);
+  gcc_assert (offset
+ < (machfun->reduction_limit - machfun->reduction_base));
+  tree ptr_type = build_pointer_type (var_type);
+  tree addr = build_int_cst (ptr_type, machfun->reduction_base + offset);
 
-  vec_safe_grow_cleared (machfun->reduc_decls, offset + 1);
-  (*machfun->reduc_decls)[offset] = decl;
-
-  return decl;
-}
-
-  return NULL_TREE;
+  return build_simple_mem_ref (addr);
 }
 
 /* Expand IFN_GOACC_REDUCTION_SETUP.  */
@@ -616,7 +598,7 @@ gcn_goacc_reduction_teardown (gcall *call)
 }
 
   if (lhs)
-gimplify_assign (lhs, var, &seq);
+gimplify_assign (lhs, unshare_expr (var), &seq);
 
   pop_gimplify_context (NULL);
 
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 4b2df3c7df06..028e5c577c07 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -106,9 +106,6 @@ gcn_init_machine_status (void)
 
   f = ggc_cleared_alloc ();
 
-  /* And LDS temporary decls for worker reductions.  */
-  vec_alloc (f->reduc_decls, 0);
-
   if (TARGET_GCN3)
 f->use_flat_addressing = true;
 
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 91b3d88c062a..06475f59ad79 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -573,7 +573,6 @@ struct GTY(()) machine_function
 
   unsigned HOST_WIDE_INT reduction_base;
   unsigned HOST_WIDE_INT reduction_limit;
-  vec *reduc_decls;
 
   bool use_flat_addressing;
 };
-- 
2.23.0



[PATCH] [og10] openacc: Fix parallel-dims.c test

2020-07-30 Thread Julian Brown
This test appears to have been missed on the og10 branch when enabling
worker partitioning support for AMD GCN. It was still assuming num_workers
was fixed at 1.

Tested with offloading to AMD GCN. I will apply shortly.

Thanks,

Julian

2020-07-30  Julian Brown  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Fix for GCN
num_workers > 1.
---
 libgomp/ChangeLog.omp |  5 +
 .../libgomp.oacc-c-c++-common/parallel-dims.c | 15 ++-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index db88fa64661b..e741121b234e 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,8 @@
+2020-07-30  Julian Brown  
+
+   * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Fix for GCN
+   num_workers > 1.
+
 2020-07-17  Thomas Schwinge  
Kwok Cheung Yeung  
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index 71600e2386ed..511bedb9f81a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -288,9 +288,8 @@ int main ()
}
   else if (acc_on_device (acc_device_radeon))
{
- /* The GCC GCN back end is limited to num_workers (16).
-Temporarily set this to 1 until multiple workers are permitted. */
- workers_actual = 1; // 16;
+ /* The GCC GCN back end is limited to num_workers (16).  */
+ workers_actual = 16;
}
   else
__builtin_abort ();
@@ -491,8 +490,6 @@ int main ()
}
   else if (acc_on_device (acc_device_radeon))
{
- /* Temporary setting, until multiple workers are permitted.  */
- workers_actual = 1;
  /* See above comments about GCN vectors_actual.  */
  vectors_actual = 1;
}
@@ -618,10 +615,10 @@ int main ()
 gangs_max = workers_max = vectors_max = INT_MIN;
 #pragma acc serial copy (vectors_actual) \
   copy (gangs_min, gangs_max, workers_min, workers_max, vectors_min, 
vectors_max)
-  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target 
openacc_nvidia_accel_selected } 619 } */
-  /* { dg-warning "region contains gang partitioned code but is not gang 
partitioned" "" { target *-*-* } 619 } */
-  /* { dg-warning "region contains worker partitioned code but is not worker 
partitioned" "" { target *-*-* } 619 } */
-  /* { dg-warning "region contains vector partitioned code but is not vector 
partitioned" "" { target *-*-* } 619 } */
+  /* { dg-warning "using vector_length \\(32\\), ignoring 1" "" { target 
openacc_nvidia_accel_selected } 616 } */
+  /* { dg-warning "region contains gang partitioned code but is not gang 
partitioned" "" { target *-*-* } 616 } */
+  /* { dg-warning "region contains worker partitioned code but is not worker 
partitioned" "" { target *-*-* } 616 } */
+  /* { dg-warning "region contains vector partitioned code but is not vector 
partitioned" "" { target *-*-* } 616 } */
 {
   if (acc_on_device (acc_device_nvidia))
{
-- 
2.23.0



[PATCH] [og10] openacc: Delete useless temp in gcn-tree.c

2020-07-30 Thread Julian Brown
This is an obvious cleanup to get rid of an unnecessary temporary
variable.

I will apply shortly to the og10 branch.

Thanks,

Julian

2020-07-30  Julian Brown  

gcc/
* config/gcn/gcn-tree.c (gcn_goacc_reduction_teardown): Remove useless
temporary variable "decl".
---
 gcc/ChangeLog.omp | 5 +
 gcc/config/gcn/gcn-tree.c | 4 +---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 9e6e95f667b7..12b8c044a1b1 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,8 @@
+2020-07-30  Julian Brown  
+
+   * config/gcn/gcn-tree.c (gcn_goacc_reduction_teardown): Remove useless
+   temporary variable "decl".
+
 2020-07-17  Andrew Stubbs  
 
Backport from mainline (42b47dae498):
diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
index 1dc9b21823b0..bfde71555d8a 100644
--- a/gcc/config/gcn/gcn-tree.c
+++ b/gcc/config/gcn/gcn-tree.c
@@ -603,9 +603,7 @@ gcn_goacc_reduction_teardown (gcall *call)
 
   /* Read the worker reduction buffer.  */
   tree offset = gimple_call_arg (call, 5);
-  tree decl
-   = gcn_goacc_get_worker_red_decl (var_type, TREE_INT_CST_LOW (offset));
-  var = decl;
+  var = gcn_goacc_get_worker_red_decl (var_type, TREE_INT_CST_LOW 
(offset));
 }
 
   if (level != GOMP_DIM_GANG)
-- 
2.23.0



Re: [PATCH] RISC-V/libgcc: Reduce the size of RV64 millicode by 6 bytes

2020-07-30 Thread Andrew Waterman
IIRC, I didn't use this approach originally because I wanted to avoid
the additional dynamic instruction.  But I agree that code size is the
more important metric for users of this feature.  LGTM.


On Thu, Jul 30, 2020 at 1:56 PM Maciej W. Rozycki via Gcc-patches
 wrote:
>
> Rewrite code sequences throughout the 64-bit RISC-V `__riscv_save_*'
> routines replacing `li t1, -48', `li t1, -64', and `li t1, -80',
> instructions, which do not have a compressed encoding, respectively with
> `li t1, 3', `li t1, 4', and `li t1, 4', which do, and then adjusting the
> remaining code accordingly observing that `sub sp, sp, t1' takes the
> same amount of space as an `slli t1, t1, 4'/`add sp, sp, t1' instruction
> pair does, again due to the use of compressed encodings, saving 6 bytes
> total.
>
> This change does increase code size by 4 bytes for RISC-V processors
> lacking the compressed instruction set, however their users couldn't
> care about the code size or they would have chosen an implementation
> that does have the compressed instructions, wouldn't they?
>
> libgcc/
> * config/riscv/save-restore.S [__riscv_xlen == 64]
> (__riscv_save_10, __riscv_save_8, __riscv_save_6, __riscv_save_4)
> (__riscv_save_2): Replace negative immediates used for the final
> stack pointer adjustment with positive ones, right-shifted by 4.
> ---
> Hi,
>
>  This is hopefully functionally obviously correct.  There were also no
> regressions in `riscv64-linux-gnu' lp64d multilib testing across all our
> testsuites (with QEMU run in the Linux user emulation mode) where target
> libraries, including glibc, have been built with `-Os -msave-restore',
> that is with millicode enabled.
>
>  OK to apply?
>
>   Maciej
> ---
>  libgcc/config/riscv/save-restore.S |   13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> gcc-riscv-libgcc-save-sll.diff
> Index: gcc/libgcc/config/riscv/save-restore.S
> ===
> --- gcc.orig/libgcc/config/riscv/save-restore.S
> +++ gcc/libgcc/config/riscv/save-restore.S
> @@ -45,7 +45,7 @@ FUNC_BEGIN (__riscv_save_10)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -16
> +  li t1, 1
>  .Ls10:
>sd s10, 16(sp)
>.cfi_offset 26, -96
> @@ -60,7 +60,7 @@ FUNC_BEGIN (__riscv_save_8)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -32
> +  li t1, 2
>  .Ls8:
>sd s8, 32(sp)
>.cfi_offset 24, -80
> @@ -77,7 +77,7 @@ FUNC_BEGIN (__riscv_save_6)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -48
> +  li t1, 3
>  .Ls6:
>sd s6, 48(sp)
>.cfi_offset 22, -64
> @@ -99,7 +99,7 @@ FUNC_BEGIN (__riscv_save_4)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -64
> +  li t1, 4
>  .Ls4:
>sd s4, 64(sp)
>.cfi_offset 20, -48
> @@ -123,7 +123,7 @@ FUNC_BEGIN (__riscv_save_2)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -80
> +  li t1, 5
>  .Ls2:
>sd s2, 80(sp)
>.cfi_offset 18, -32
> @@ -133,9 +133,10 @@ FUNC_BEGIN (__riscv_save_2)
>.cfi_offset 8, -16
>sd ra, 104(sp)
>.cfi_offset 1, -8
> +  slli t1, t1, 4
># CFA info is not correct in next 2 instruction since t1's
># value is depend on how may register really save.
> -  sub sp, sp, t1
> +  add sp, sp, t1
>jr t0
>.cfi_endproc
>  FUNC_END (__riscv_save_12)


Re: [PATCH, rs6000] Add non-relative jump table support for 64bit rs6000

2020-07-30 Thread Segher Boessenkool
Hi!

On Thu, Jul 30, 2020 at 04:35:57AM -0400, David Edelsohn wrote:
> The purpose should not be to exclude AIX.  If there is no fundamental
> limitation with use of the new tablejump design on AIX then the patch
> is not acceptable without AIX support.

The patch should work on all subtargets.  It will be controlled by an
(undocumented?) command line option, for easy experimentation, and to
make it easy to (temporarily) disable it for some subtarget.

This change will probably cause some problems on different OSes and with
different binary file formats, so it is useful to be able to control the
default separately for each, to keep things working while we figure
things out.  (It is not just AIX and Darwin, but also 32-bit Linux, the
several BSDs (which are different), bare-metal ports, the embedded OSes,
etc.)

But the goal is for this to work everywhere, yes.

We could default it to non-relative, to see what breaks ;-)

> The patch should use DOUBLE_INT_ASM_OP, not explicit ".quad".  AIX
> always is PIC.  It's not obvious to me why the patch should limit
> support to PPC64 Linux.

It's the only thing that was tested so far (I'm not sure if BE was
tested even there?)

> The section selection seems Linux/ELF
> specific, but the rest seems like a general design optimization for
> all PowerPC-based operating systems.

Yup, needs some work.


This patch should be split into the core piece, which creates the flag
and adds infrastructure etc., and patches for the subtarget-specific
pieces.  It is likely that different subtargets will want to choose
different sections for the jump tables?  But the basic things can work
everywhere, in principle.


Segher


Re: [PATCH, rs6000] Add non-relative jump table support for 64bit rs6000

2020-07-30 Thread Segher Boessenkool
Hi!

As I explained in the other mail, please split this into two: one the
core thing, adding the flag and all code generation (maybe using some
new macros or such for section selection, for example); and the second
patch hooking it up and enabling this by default for powerpc64* (or only
for powerpc64le?)

The first patch should work on all OSes, all endians, all word sizes.

On Tue, Jul 28, 2020 at 01:25:36PM +0800, HAO CHEN GUI via Gcc-patches wrote:
> We want to put non-relative jump 
> table in data.rel.ro section for rs6000. So I add a new target hook - 
> TARGET_ASM_SELECT_JUMPTABLE_SECTION to choose which section jump table 
> should be put in.

That sounds like a good idea, but please put it in a separate patch as
well?  Before everything else.

> It puts the jump table in data.rel.ro for 64bit 
> rs6000. For other platforms, it calls 
> targetm.asm_out.function_rodata_section, just as before.

Maybe that hook should get another parameter, instead?

> We want to have 
> an option to use non-relative jump table even when PIC flag is set. So I 
> add a new option - "mforce-nonrelative-jumptables" for rs6000.

"force" isn't good in a name, and no negatives please (what would
-mno-force-norelative-jumptables mean?)  So just -mrelative-jumptables
then?

> static bool
> rs6000_gen_pic_addr_diff_vec (void)
> {
>   return TARGET_32BIT || !rs6000_force_nonrelative_jumptables;
> }

There is no reason why it cannot work on 32-bit?  It even is more
obviously a win there (for 64-bit the jump table elements are just 32
bits for relative, but 64 otherwise; for 32-bit, you get 32 bits either
way, so that isn't an advantage for relative there).

> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Redefine.
> * config/rs6000/rs6000.c 
> (TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC,TARGET_ASM_SELECT_JUMPTABLE_SECTION): 
> Implement two hooks.

(Space after comma.  Line too long.)

The target macros are not very directly related, so put them in separate
entries please?

>   * config/rs6000/rs6000.h 
> (CASE_VECTOR_PC_RELATIVE,CASE_VECTOR_MODE) Redefine.

Space after comma, colon after closing paren here.

"Redefine"?  The patch doesn't do that (which is good, we don't want
that).  Ah you mean the patch changes the value it is defined as; your
changelog sounds like it now #undefs it first, which would be nasty :-)

>   * config/rs6000/rs6000.md (nonrelative_tablejumpdi, 
> nonrelative_tablejumpdi_nospec) Add two expansions.
>   * config/rs6000/rs6000.opt (mforce-nonrelative-jumptables) Add 
> a new options for rs6000.

Colon, line too long.  Changelog lines are 80 chars max, including the
initial tab (which counts as 8 spaces).

> * doc/tm.texi.in (TARGET_ASM_SELECT_JUMPTABLE_SECTION): Likewise.

Likewise?  It doesn't do the same thing as the previous line.

> * doc/tm.texi: Regenerate.
> * final.c (targetm.asm_out.select_jumptable_section): Replace 
> targetm.asm_out.function_rodata_section with 
> targetm.asm_out.select_jumptable_section in jumptable section selection.
> * output.h (default_select_jumptable_section): Declare.
> * target.def (default_select_jumptable_section): Likewise
> * varasm.c (default_select_jumptable_section): Implementation.


> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -324,7 +324,7 @@ extern int dot_symbols;
>  
>  /* Indicate that jump tables go in the text section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION (TARGET_64BIT && flag_pic && 
> !rs6000_force_nonrelative_jumptables)

I'm not sure where flag_pic matters here?

> +/* Implement TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC. Return false if 
> rs6000_force_nonrelative_jumptables is set and target is 64bit. */

Line too long.  Lines of code are at most 80 chars.

> +/* Implement TARGET_ASM_FUNCTION_RODATA_SECTION. Put non-relative jump table 
> in data.rel.ro section */
> +
> +section *
> +rs6000_select_jumptable_section (tree decl)
> +{
> +  if (TARGET_32BIT)
> +return default_function_rodata_section (decl);

This shouldn't exclude 32 bit?

> +  if (decl != NULL_TREE && DECL_SECTION_NAME (decl))

Just "if (decl && DECL_SECTION_NAME (decl))" if you ask me...  but some
people disagree, I guess.

> +{
> +  const char *name = DECL_SECTION_NAME (decl);
> +
> +  if (DECL_COMDAT_GROUP (decl) && HAVE_COMDAT_GROUP)
> +{
> +  const char *dot;
> +  size_t len;
> +  char* rname;
> +
> +  dot = strchr (name + 1, '.');
> +  if (!dot)
> +dot = name;
> +  len = strlen (dot) + 13;
> +  rname = (char *) alloca (len);
> +
> +  strcpy (rname, ".data.rel.ro");
> +  strcat (rname, dot);
> +  return get_section (rname, SECTION_WRITE | SECTION_RELRO | 
> SECTION_LINKONCE, decl);
> +}
> +/* For .gnu.linkonce.t.foo w

[PATCH] libgomp: Add OMPD functions defined in Thread Handles.

2020-07-30 Thread y2s1982 via Gcc-patches
This patch adds functions defined in the section 5.5.5 Thread Handles in
OpenMP API Specification 5.0. It currently omits CUDA support, and
hard-codes the offset and sizeof values. I would appreciate any
suggestions.

2020-07-30  Tony Sim  

libgomp/ChangeLog:

* Makefile.am (libgompd_la_SOURCES): Add ompd-thread.c.
* Makefile.in: Regenerate.
* libgompd.h (ompd_thread_handle_t): Add definition.
(ompd_parallel_handle_t): Add definition.
* ompd-thread.c: New file.

---
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   5 +-
 libgomp/libgompd.h|  12 +++
 libgomp/ompd-thread.c | 203 ++
 4 files changed, 219 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/ompd-thread.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index fe0a92122ea..0e6f2d1c830 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -90,7 +90,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c oacc-target.c
 
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-thread.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index f74d5f3ac8e..137ce35dfc1 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -235,7 +235,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
$(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 libgompd_la_LIBADD =
-am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo
+am_libgompd_la_OBJECTS = ompd-lib.lo ompd-proc.lo ompd-thread.lo
 libgompd_la_OBJECTS = $(am_libgompd_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -593,7 +593,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
oacc-target.c $(am__append_4)
-libgompd_la_SOURCES = ompd-lib.c ompd-proc.c
+libgompd_la_SOURCES = ompd-lib.c ompd-proc.c ompd-thread.c
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -819,6 +819,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/oacc-target.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-lib.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-proc.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ompd-thread.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordered.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/parallel.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/priority_queue.Plo@am__quote@
diff --git a/libgomp/libgompd.h b/libgomp/libgompd.h
index 495995e00d3..98a2f74b983 100644
--- a/libgomp/libgompd.h
+++ b/libgomp/libgompd.h
@@ -47,4 +47,16 @@ typedef struct _ompd_aspace_handle {
   ompd_size_t ref_count;
 } ompd_address_space_handle_t;
 
+typedef struct _ompd_thread_handle {
+  ompd_parallel_handle_t *ph;
+  ompd_address_space_handle_t *ah;
+  ompd_thread_context_t *tcontext;
+  ompd_address_t *thread;
+} ompd_thread_handle_t;
+
+typedef struct _ompd_parallel_handle {
+  ompd_address_space_handle_t *ah;
+  ompd_address_t *tpool;
+} ompd_parallel_handle_t;
+
 #endif /* LIBGOMPD_H */
diff --git a/libgomp/ompd-thread.c b/libgomp/ompd-thread.c
new file mode 100644
index 000..281cd7e2520
--- /dev/null
+++ b/libgomp/ompd-thread.c
@@ -0,0 +1,203 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Yoosuk Sim .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This file contains OMPD function definitions related to threads, as defined
+   in the section 5.5.5 in the OpenMP API Specification 5.0.  */
+
+#include "omp-tools.

Re: [PATCH] c++: decl_constant_value and unsharing [PR96197]

2020-07-30 Thread Richard Biener via Gcc-patches
On Thu, Jul 30, 2020 at 8:55 PM Patrick Palka via Gcc-patches
 wrote:
>
> On Thu, 30 Jul 2020, Jason Merrill wrote:
>
> > On 7/30/20 9:49 AM, Patrick Palka wrote:
> > > On Thu, 30 Jul 2020, Jason Merrill wrote:
> > >
> > > > On 7/21/20 3:07 PM, Patrick Palka wrote:
> > > > > In the testcase from the PR we are seeing excessive memory use (> 5GB)
> > > > > during constexpr evaluation, almost all of which is due to the call to
> > > > > decl_constant_value in the VAR_DECL/CONST_DECL branch of
> > > > > cxx_eval_constant_expression.  We reach here every time we evaluate an
> > > > > ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
> > > > > often, and from there decl_constant_value makes an unshared copy of 
> > > > > the
> > > > > VAR_DECL's initializer, even though the unsharing is not needed at 
> > > > > this
> > > > > call site (because it is up to callers of cxx_eval_constant_expression
> > > > > to unshare).
> > > > >
> > > > > To fix this, this patch moves the responsibility of unsharing the 
> > > > > result
> > > > > of decl_constant_value, decl_really_constant_value and
> > > > > scalar_constant_value from the callee to the caller.
> > > >
> > > > How about creating another entry point that doesn't unshare, and using
> > > > that in
> > > > constexpr evaluation?
> > >
> > > Is something like this what you have in mind?  This adds a defaulted
> > > bool parameter to the three routines that controls unsharing (except for
> > > decl_constant_value, which instead needs a new overload if we don't want
> > > to touch its common declaration in c-common.h.)  Bootstrap and regtest
> > > in progress.
> >
> > That looks good, though I don't think we need the added parameter for
> > scalar_constant_value.
>
> Hmm, I guess it should always be cheap to unshare an scalar initializer.
> So consider the parameter removed for scalar_constant_value.
>
> >
> > > -- >8 --
> > >
> > > Subject: [PATCH] c++: decl_constant_value and unsharing [PR96197]
> > >
> > > In the testcase from the PR we are seeing excessive memory use (> 5GB)
> > > during constexpr evaluation, almost all of which is due to the call to
> > > decl_constant_value in the VAR_DECL/CONST_DECL branch of
> > > cxx_eval_constant_expression.  We reach here every time we evaluate an
> > > ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
> > > often, and from there decl_constant_value makes an unshared copy of the
> > > VAR_DECL's initializer, even though the unsharing is not needed at this
> > > call site (because callers of cxx_eval_constant_expression already
> > > unshare its result when necessary).
> > >
> > > To fix this excessive unsharing, this patch introduces a new defaulted
> > > parameter unshare_p to scalar_constant_value, decl_really_constant_value
> > > and decl_constant_value to allow callers to decide if these routines
> > > should unshare their result before returning.  (Since decl_constant_value
> > > is declared in c-common.h, it instead gets a new overload declared in
> > > cp-tree.h.)
> > >
> > > As a simplification, this patch also moves the call to unshare_expr in
> > > constant_value_1 outside of the loop, since calling unshare_expr on a
> > > DECL_P should be a no-op.
> > >
> > > Additionally, in unify there is one call to scalar_constant_value that
> > > seems to be dead code since we apparently don't see unlowered
> > > enumerators anymore, so this patch replaces it with an appropriate
> > > gcc_assert.
>
> I'll also push this change as a separate patch, now that
> scalar_constant_value is unrelated to the rest of the patch.
>
> Here is the main patch that I guess I'll commit after a full bootstrap
> and regtest:

Might be a good candidate for backporting to GCC 10 as well after
a while - it at least looks simple enough.

Richard.

> -- >8 --
>
> Subject: [PATCH] c++: decl_constant_value and unsharing [PR96197]
>
> In the testcase from the PR we are seeing excessive memory use (> 5GB)
> during constexpr evaluation, almost all of which is due to the call to
> decl_constant_value in the VAR_DECL/CONST_DECL branch of
> cxx_eval_constant_expression.  We reach here every time we evaluate an
> ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
> often, and from there decl_constant_value makes an unshared copy of the
> VAR_DECL's initializer, even though the unsharing is not needed at this
> call site (because callers of cxx_eval_constant_expression already
> unshare its result when necessary).
>
> To fix this excessive unsharing, this patch introduces a new defaulted
> parameter unshare_p to decl_really_constant_value and
> decl_constant_value to allow callers to choose whether these routines
> should unshare the returned result.  (Since decl_constant_value is
> declared in c-common.h, we introduce a new overload declared in
> cp-tree.h instead of changing its existing declaration.)
>
> As a simplification, this patch also moves the call to unshare_expr in
> constant_value_1 outside of th

RE: [PATCH PR95696] regrename creates overlapping register allocations for vliw

2020-07-30 Thread Zhongyunde

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Tuesday, July 28, 2020 1:33 AM
> To: Zhongyunde 
> Cc: gcc-patches@gcc.gnu.org; Yangfei (Felix) 
> Subject: Re: [PATCH PR95696] regrename creates overlapping register
> allocations for vliw
> 
> Zhongyunde  writes:
> > I reconsider the issue and update patch attached.
> > Yes, If the kernel loop bb's information doesn't use in regrename, it
> > also need not be collected to save compile time.
> 
> Thanks.  The point about other callers was a good one though.
> I think it would be OK to add a new parameter to regrename_analyze that
> says whether it should process all blocks, ignoring BB_DISABLE_SCHEDULE.
> The default value could be true, with only regrename itself passing false.

Thanks for your detail advice, and I add a new parameter to regrename_analyze.

> Why do you need the BB_MODIFIED test?  The point in time that that flag
> is measured from is somewhat arbitrary.  Also, the modification that
> caused the flag to be set might not have invalidated the schedule.

Yes, in fact the BB_MODIFIED test is not need.

> Richard
> 
> >
> >> -Original Message-
> >> From: Zhongyunde
> >> Sent: Sunday, July 26, 2020 3:29 PM
> >> To: 'Richard Sandiford' 
> >> Cc: gcc-patches@gcc.gnu.org; Yangfei (Felix) 
> >> Subject: RE: [PATCH PR95696] regrename creates overlapping register
> >> allocations for vliw
> >>
> >>
> >> > >> It's interesting that this is for a testcase using SMS.  One of
> >> > >> the traditional problems with the GCC implementation of SMS has
> >> > >> been ensuring that later passes don't mess up the scheduled
> >> > >> loop.  So in your testcase, does register allocation succeed for
> >> > >> the SMS loop without invalidating the bundling decisions?
> >> > >
> >> > > Yes.
> >> > >
> >> > >> If so, then it's probably better to avoid running regrename on it at
> all.
> >> > >> It mostly exists to help the second scheduling pass, but the
> >> > >> second scheduling pass shouldn't be messing with an SMS loop
> anyway.
> >> > >> Also, although the patch deals with one case in which regrename
> >> > >> could disrupt the bundling, there are others too.
> >> > >>
> >> > >> So maybe one option would be to make regrename ignore blocks
> >> > >> that have BB_DISABLE_SCHEDULE set.  (Sorry if that's been
> >> > >> discussed and discounted
> >> > >> already.)
> >> > >
> >> > > ok, according your advice, I make a new patch attached.
> >> >
> >> > Thanks.  I think we should treat the SMS and the REG_UNUSED stuff
> >> > as separate patches though.
> >> >
> >> > For the SMS part, I think a better place to enforce the rule is in
> >> > build_def_use.  If that function returns false early for
> >> > BB_DISABLE_SCHEDULE, we will avoid disrupting the schedule for the
> >> > block without wasting too much compile time on it, and we'll still
> >> > keep the pass structures internally correct.  (It would also be
> >> > good to have a dump_file message to say that that's what we're
> >> > doing.)
> >>
> >> > Do you still need the REG_UNUSED stuff with the SMS patch?  If so,
> >> > could you describe the (presumably non-SMS) cases that are
> affected?
> >>
> >> Yes, the non-SMS basic block should not be affected.
> >> An alternate method attached can avoid use REG_UNUSED stuff for BB
> >> with BB_DISABLE_SCHEDUL.
> >>
> >> I don't change build_def_use to return false early as I find some
> >> other optimization reuse the function regrename_analyze to creat
> >> def/use chain info of the kernel loop body in our target.
> >>
> >> > TBH, given that the bundling information is so uncertain at this
> >> > stage, I think it would be better to have a mode in which regrename
> >> > ignores REG_UNUSED notes altogether.  Perhaps we could put it
> under
> >> a
> >> > --param, which targets could then set to whichever default they
> prefer.
> >> > The default should be the current behaviour though.
> >>
> >>
> >> > Thanks,
> >> > Richard
> >
> > diff --git a/gcc/regrename.c b/gcc/regrename.c index
> > c38173a77..1df58e52d 100644
> > --- a/gcc/regrename.c
> > +++ b/gcc/regrename.c
> > @@ -737,6 +737,14 @@ regrename_analyze (bitmap bb_mask)
> >if (dump_file)
> > fprintf (dump_file, "\nprocessing block %d:\n", bb1->index);
> >
> > +  if ((bb1->flags & BB_DISABLE_SCHEDULE) != 0
> > + && (bb1->flags & BB_MODIFIED) == 0)
> > +   {
> > + if (dump_file)
> > +   fprintf (dump_file, "skip to avoid disrupting the sms schedule\n");
> > + continue;
> > +   }
> > +
> >init_rename_info (this_info, bb1);
> >
> >success = build_def_use (bb1);


PR95696_5.patch
Description: PR95696_5.patch