date:20211028

Hi!

On Thu, Oct 14, 2021 at 10:26:29AM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following patch implements the C++23 Multidimensional subscript operator
> P2128R6 paper.

I'd like to ping this patch.

Thanks.

> 2021-10-14  Jakub Jelinek  
> 
>   PR c++/102611
> gcc/
>   * doc/invoke.texi (-Wcomma-subscript): Document that for
>   -std=c++20 the option isn't enabled by default with -Wno-deprecated
>   but for -std=c++23 it is.
> gcc/c-family/
>   * c-opts.c (c_common_post_options): Enable -Wcomma-subscript by
>   default for C++23 regardless of warn_deprecated.
>   * c-cppbuiltin.c (c_cpp_builtins): Predefine
>   __cpp_multidimensional_subscript=202110L for C++23.
> gcc/cp/
>   * cp-tree.h (build_op_subscript): Implement P2128R6
>   - Multidimensional subscript operator.  Declare.
>   (grok_array_decl): Remove bool argument, add vec **
>   and tsubst_flags_t arguments.
>   (build_min_non_dep_op_overload): Declare another overload.
>   * parser.c (cp_parser_postfix_expression): Mention C++23 syntax
>   in function comment.  For C++23 parse zero or more than one
>   initializer clauses in expression list, adjust grok_array_decl
>   caller.
>   (cp_parser_builtin_offsetof): Adjust grok_array_decl caller.
>   * decl.c (grok_op_properties): For C++23 don't check number
>   of arguments of operator[].
>   * decl2.c (grok_array_decl): Remove decltype_p argument, add
>   index_exp_list and complain arguments.  If index_exp is NULL,
>   handle *index_exp_list as the subscript expression list.
>   * tree.c (build_min_non_dep_op_overload): New overload.
>   * call.c (build_op_subscript_1, build_op_subscript): New
>   functions.
>   * pt.c (tsubst_copy_and_build) : If second
>   operand is magic CALL_EXPR with ovl_op_identifier (ARRAY_REF)
>   as CALL_EXPR_FN, tsubst CALL_EXPR arguments including expanding
>   pack expressions in it and call grok_array_decl instead of
>   build_x_array_ref.
>   * semantics.c (handle_omp_array_sections_1): Adjust grok_array_decl
>   caller.
> gcc/testsuite/
>   * g++.dg/cpp2a/comma1.C: Expect different diagnostics for C++23.
>   * g++.dg/cpp2a/comma3.C: Likewise.
>   * g++.dg/cpp2a/comma4.C: Expect diagnostics for C++23.
>   * g++.dg/cpp2a/comma5.C: Expect different diagnostics for C++23.
>   * g++.dg/cpp23/feat-cxx2b.C: Test __cpp_multidimensional_subscript
>   predefined macro.
>   * g++.dg/cpp23/subscript1.C: New test.
>   * g++.dg/cpp23/subscript2.C: New test.
>   * g++.dg/cpp23/subscript3.C: New test.
>   * g++.dg/cpp23/subscript4.C: New test.
>   * g++.dg/cpp23/subscript5.C: New test.
>   * g++.dg/cpp23/subscript6.C: New test.

Jakub
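
For readers unfamiliar with P2128R6, the following is a minimal sketch of what the new syntax permits. It is not taken from the patch or its tests; the Grid type and grid_demo function are hypothetical names chosen for illustration. The feature-test macro mentioned in the ChangeLog above guards the C++23 part, so the code should also build in older language modes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size 2-D grid, used only to illustrate the syntax.
template <typename T, std::size_t Rows, std::size_t Cols>
struct Grid {
  std::array<T, Rows * Cols> cells{};

  // Row-major accessor that works in every language mode.
  T &at(std::size_t r, std::size_t c) { return cells[r * Cols + c]; }

#if defined(__cpp_multidimensional_subscript) \
    && __cpp_multidimensional_subscript >= 202110L
  // P2128R6: operator[] may take zero or more than one argument.
  T &operator[](std::size_t r, std::size_t c) { return at(r, c); }
#endif
};

// Stores VALUE at (1, 2) and reads it back through the flat storage.
inline int grid_demo(int value) {
  Grid<int, 2, 3> g;
#if defined(__cpp_multidimensional_subscript) \
    && __cpp_multidimensional_subscript >= 202110L
  g[1, 2] = value;     // one call with two subscripts, not a comma expression
#else
  g.at(1, 2) = value;  // fallback when the feature is unavailable
#endif
  assert(g.cells[1 * 3 + 2] == value);
  return g.cells[1 * 3 + 2];
}
```

Because `g[1, 2]` changes meaning from a comma expression to a two-argument subscript, it makes sense that the patch enables -Wcomma-subscript by default for C++23.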

Re: [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets

On 10/28/2021 1:04 AM, Richard Biener via Gcc-patches wrote:
> On Wed, Oct 27, 2021 at 10:10 PM Richard Purdie via Gcc-patches wrote:
> > During cross compiling, CPP is being set to the target compiler even for
> > build targets. As an example, when building a cross compiler targeting
> > mingw, the config.log for libiberty in
> > build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
> > shows:
> >
> > configure:3786: checking how to run the C preprocessor
> > configure:3856: result: x86_64-pokysdk-mingw32-gcc -E --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
> > configure:3876: x86_64-pokysdk-mingw32-gcc -E --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
> > configure:3876: $? = 0
> >
> > This is libiberty being built for the build environment, not the target one
> > (i.e. in build-x86_64-linux). As such it should be using the build environment's
> > gcc and not the target one. In the mingw case the system headers are quite
> > different, leading to build failures related to not being able to include a
> > process.h file for pem-unix.c.
> >
> > Further analysis shows the same issue occurring for CPPFLAGS too.
> >
> > Fix this by adding support for CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD, which,
> > for example, avoids mixing the mingw headers for host binaries on linux
> > systems.
>
> OK.

I don't think Richard P. has write access, so I went ahead and pushed
this to the trunk.


jeff

[PATCH] vect: Add bias parameter for partial vectorization

2021-10-28 Thread Robin Dapp via Gcc-patches

Hi,

as discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582627.html this
introduces a bias parameter for the len_load/len_store ifns as well as
optabs that is meant to distinguish between Power and s390 variants.
The default is a bias of 0, while in s390's case vll/vstl do not support
lengths of zero bytes and a bias of -1 should be used.
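
To make the two conventions concrete, here is a minimal sketch, assuming a 16-byte vector and two made-up helper functions that stand in for the hardware instructions. None of these names come from the patch; they only model the difference between an operand that is a byte count (bias 0) and an operand that is the highest byte index to load (bias -1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVecBytes = 16;  // assumed vector width for the sketch

using Vec = std::array<std::uint8_t, kVecBytes>;

// Models an instruction whose operand is a byte count (bias 0).
inline Vec load_by_count(const std::uint8_t *src, std::size_t count) {
  assert(count <= kVecBytes);
  Vec v{};
  std::memcpy(v.data(), src, count);
  return v;
}

// Models an instruction whose operand is the highest byte index to load
// (bias -1); such an instruction cannot express a zero-byte load.
inline Vec load_by_highest_index(const std::uint8_t *src, std::size_t highest) {
  assert(highest < kVecBytes);
  Vec v{};
  std::memcpy(v.data(), src, highest + 1);
  return v;
}

// Hypothetical wrapper: LEN is the number of bytes wanted, and the operand
// handed to the "instruction" is LEN + BIAS.
inline Vec len_load(const std::uint8_t *src, std::size_t len, int bias) {
  assert(bias == 0 || bias == -1);
  if (bias == 0)
    return load_by_count(src, len);
  assert(len >= 1);  // a length of zero is not representable with bias -1
  return load_by_highest_index(src, len - 1);
}

// Both conventions should produce the same vector for the same length.
inline bool bias_demo() {
  std::uint8_t buf[kVecBytes];
  for (std::size_t i = 0; i < kVecBytes; ++i)
    buf[i] = static_cast<std::uint8_t>(i + 1);
  const Vec a = len_load(buf, 5, 0);
  const Vec b = len_load(buf, 5, -1);
  return a == b && a[4] == 5 && a[5] == 0;
}
```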

Bootstrapped and regtested on Power9 (--with-cpu=power9) and s390
(--with-arch=z15).

The tiny changes in the Power backend I will post separately.

Regards
 Robin

commit 18a5fcd0f8835247e86d86fb018789fe755404be
Author: Robin Dapp 
Date:   Wed Oct 27 11:42:11 2021 +0200

vect: Add bias parameter for partial vectorization

This adds a bias parameter for LEN_LOAD and LEN_STORE as well as the
corresponding internal functions.  A bias of 0 represents the status
quo, while -1 is used for the s390 vll instruction that expects the
highest byte to load rather than the number of bytes to load.
Backends need to support one of these biases via an operand predicate
and the vectorizer will then emit the appropriate variant.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 8312d08aab2..993e32c1854 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
 static void
 expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[3];
-  tree type, lhs, rhs, maskt;
-  rtx mem, target, mask;
+  class expand_operand ops[4];
+  tree type, lhs, rhs, maskt, biast;
+  rtx mem, target, mask, bias;
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
@@ -2727,7 +2727,16 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  TYPE_UNSIGNED (TREE_TYPE (maskt)));
   else
 create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (icode, 3, ops);
+  if (optab == len_load_optab)
+{
+  biast = gimple_call_arg (stmt, 3);
+  bias = expand_normal (biast);
+  create_input_operand (&ops[3], bias, QImode);
+  expand_insn (icode, 4, ops);
+}
+  else
+expand_insn (icode, 3, ops);
+
   if (!rtx_equal_p (target, ops[0].value))
 emit_move_insn (target, ops[0].value);
 }
@@ -2741,9 +2750,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 static void
 expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[3];
-  tree type, lhs, rhs, maskt;
-  rtx mem, reg, mask;
+  class expand_operand ops[4];
+  tree type, lhs, rhs, maskt, biast;
+  rtx mem, reg, mask, bias;
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
@@ -2770,7 +2779,16 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  TYPE_UNSIGNED (TREE_TYPE (maskt)));
   else
 create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (icode, 3, ops);
+
+  if (optab == len_store_optab)
+{
+  biast = gimple_call_arg (stmt, 4);
+  bias = expand_normal (biast);
+  create_input_operand (&ops[3], bias, QImode);
+  expand_insn (icode, 4, ops);
+}
+  else
+expand_insn (icode, 3, ops);
 }
 
 #define expand_mask_store_optab_fn expand_partial_store_optab_fn
@@ -4172,6 +4190,30 @@ internal_check_ptrs_fn_supported_p (internal_fn ifn, tree type,
 	  && insn_operand_matches (icode, 4, GEN_INT (align)));
 }
 
+/* Return the supported bias for the len_load IFN.  For now we support a
+   default bias of 0 and -1 in case 0 is not an allowable length for len_load.
+   If none of these biases match what the backend provides, return
+   VECT_PARTIAL_BIAS_UNSUPPORTED.  */
+
+signed char
+internal_len_load_bias_supported (internal_fn ifn, machine_mode mode)
+{
+  optab optab = direct_internal_fn_optab (ifn);
+  insn_code icode = direct_optab_handler (optab, mode);
+
+  if (icode != CODE_FOR_nothing)
+{
+  /* We only support a bias of 0 (default) or -1.  Try both
+	 of them.  */
+  if (insn_operand_matches (icode, 3, GEN_INT (0)))
+	return 0;
+  else if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+	return -1;
+}
+
+  return VECT_PARTIAL_BIAS_UNSUPPORTED;
+}
+
 /* Expand STMT as though it were a call to internal function FN.  */
 
 void
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 19d0f849a5a..af28cf0d566 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -227,6 +227,10 @@ extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
 		tree, tree, int);
 extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
 		poly_uint64, unsigned int);
+#define VECT_PARTIAL_BIAS_UNSUPPORTED 127
+
+extern signed char internal_len_load_bias_supported (internal_fn ifn,
+		 machine_mode);
 
 extern void expand_addsub_overflow (location_t, tree_code, tree, tree, tree,
 bool, bool, bool, bool, tree *);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
in

Re: [PATCH 4/5] gcc/nios2: Define the musl linker

On 10/27/2021 2:05 PM, Richard Purdie via Gcc-patches wrote:
> Add a definition of the musl linker used on the nios2 platform.
>
> 2021-10-26 Richard Purdie
>
> gcc/ChangeLog:
>
>  * config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add musl linker

Thanks.  I've pushed this to the trunk.

jeff

[COMMITTED] tree-optimization/102940 - Reset scev before invoking array_checker.

As pointed out, we need to reset scev before invoking the array-checker 
in execute_ranger_vrp.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


From d46aeb5906b8ed7ee255cfbacc5cf9d2f56b850c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 26 Oct 2021 14:43:33 -0400
Subject: [PATCH 1/3] Reset scev before invoking array_checker.

Before invoking the array_checker, we need to reset scev so it will not try to
access any ssa_names that the substitute and fold engine has freed.

	PR tree-optimization/102940
	* tree-vrp.c (execute_ranger_vrp): Reset scev.
---
 gcc/tree-vrp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 38ea50303e0..dc3e250537a 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -4351,7 +4351,6 @@ execute_ranger_vrp (struct function *fun, bool warn_array_bounds_p)
   if (dump_file && (dump_flags & TDF_DETAILS))
 ranger->dump (dump_file);
 
-
   if (warn_array_bounds && warn_array_bounds_p)
 {
   // Set all edges as executable, except those ranger says aren't.
@@ -4367,6 +4366,7 @@ execute_ranger_vrp (struct function *fun, bool warn_array_bounds_p)
 	else
 	  e->flags |= EDGE_EXECUTABLE;
 	}
+  scev_reset ();
   array_bounds_checker array_checker (fun, ranger);
   array_checker.check ();
 }
-- 
2.17.2

[COMMITTED] Unify EVRP and VRP folding predicate message.


When EVRP folds a predicate it reports it only with TDF_DETAILS set as:

    Predicate evaluates to: 0

VRP on the other hand always reports it to a dump_file as:

    Folding predicate c_10 > 6 to 0

This patch changes fold_cond() in the simplifier to use the latter 
format, and converts a couple of EVRP tests to expect the new format.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From a6bbf1cc9f2847115543d720a99152d7dc2c4892 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 26 Oct 2021 13:19:05 -0400
Subject: [PATCH 2/3] Unify EVRP and VRP folding predicate message.

EVRP issues a message for folding predicates in a different format than
VRP does; this patch unifies the messaging.

	gcc/
	* vr-values.c (simplify_using_ranges::fold_cond): Change fold message.

	gcc/testsuite/
	* gcc.dg/tree-ssa/evrp9.c: Adjust message scanned for.
	* gcc.dg/tree-ssa/pr21458-2.c: Ditto.
---
 gcc/testsuite/gcc.dg/tree-ssa/evrp9.c |  6 --
 gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c |  2 +-
 gcc/vr-values.c   | 14 ++
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp9.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp9.c
index 6e7828e4340..fb7c319fc43 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/evrp9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp9.c
@@ -24,5 +24,7 @@ foo (unsigned int x, unsigned int y)
 bar ();
 }
 
-/* { dg-final { scan-tree-dump-not "== 5" "evrp" } } */
-/* { dg-final { scan-tree-dump-not "== 6" "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Folding predicate minv_.* == 5 to 0" 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Folding predicate minv_.* == 6 to 0" 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Folding predicate maxv_.* == 5 to 0" 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Folding predicate maxv_.* == 6 to 0" 1 "evrp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c
index f8d7353fc0e..9610570e272 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21458-2.c
@@ -16,4 +16,4 @@ foo (int a)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "Predicate evaluates to: 1" 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Folding predicate.* to 1" 1 "evrp" } } */
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 00246c9d3af..ea925f7559d 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -3495,12 +3495,18 @@ simplify_using_ranges::fold_cond (gcond *cond)
   if (TREE_CODE (gimple_cond_lhs (cond)) != SSA_NAME
 	  && TREE_CODE (gimple_cond_rhs (cond)) != SSA_NAME)
 	return false;
+  if (dump_file)
+	{
+	  fprintf (dump_file, "Folding predicate ");
+	  print_gimple_expr (dump_file, cond, 0);
+	  fprintf (dump_file, " to ");
+	}
   edge e0 = EDGE_SUCC (gimple_bb (cond), 0);
   edge e1 = EDGE_SUCC (gimple_bb (cond), 1);
   if (r.zero_p ())
 	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "\nPredicate evaluates to: 0\n");
+	  if (dump_file)
+	fprintf (dump_file, "0\n");
 	  gimple_cond_make_false (cond);
 	  if (e0->flags & EDGE_TRUE_VALUE)
 	set_and_propagate_unexecutable (e0);
@@ -3509,8 +3515,8 @@ simplify_using_ranges::fold_cond (gcond *cond)
 	}
   else
 	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "\nPredicate evaluates to: 1\n");
+	  if (dump_file)
+	fprintf (dump_file, "1\n");
 	  gimple_cond_make_true (cond);
 	  if (e0->flags & EDGE_FALSE_VALUE)
 	set_and_propagate_unexecutable (e0);
-- 
2.17.2

Re: [PATCH 3/5] gcc: Add --nostdlib++ option

On 10/27/2021 2:05 PM, Richard Purdie via Gcc-patches wrote:
> OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
> separately from the compiler and slightly differently to the standard gcc build.
>
> In general this works well but in trying to build them separately we run into
> an issue since we're using our gcc, not xgcc and there is no way to tell
> configure to use libgcc but not look for libstdc++.
>
> This adds such an option allowing such configurations to work.
>
> 2021-10-26 Richard Purdie
>
> gcc/c-family/ChangeLog:
>
>  * c.opt: Add --nostdlib++ option
>
> gcc/cp/ChangeLog:
>
>  * g++spec.c (lang_specific_driver): Add --nostdlib++ option
>
> gcc/ChangeLog:
>
>  * doc/invoke.texi: Document --nostdlib++ option
>  * gcc.c: Add --nostdlib++ option

Couldn't you use -nostdlib then explicitly add -lgcc?

If that works, that would seem better to me compared to adding an option
to specs processing that is really only useful to one build
system/procedure.


jeff

[COMMITTED] Fix ifcvt-4.c to not depend on VRP2 asserts.

As discussed elsewhere, gcc.dg/ifcvt-4.c seems to be a flawed testcase
as it is.  It does not use __builtin_expect properly, so edges are
predicted 50/50.  It also turns out that it only works due to an oddity
that causes the basic blocks to be restructured by the VRP2 pass when it
removes all the ASSERT_EXPRs that it uses.  If we run ranger for VRP2,
the testcase fails because the blocks do not get rearranged.

This patch tweaks the testcase so that it uses __builtin_expect properly
and passes with both classic vrp2 and ranger vrp2.  I have tested it on
both powerpc and x86, which seem to be the primaries.  Pushed.
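
The difference between the two usages of __builtin_expect can be sketched as follows. This is only an illustration, assuming a GCC-compatible compiler; the function names are hypothetical and the bodies are simplified compared to the real testcase.

```cpp
#include <cassert>

// The hint is evaluated and its result thrown away, so the 'if' below
// carries no prediction information.
inline long pick_discarded_hint(long x, long y, long a) {
  (void) __builtin_expect(x > y, 1);  // result unused: no effect on the branch
  if (x > y)
    return a;
  return x + y;
}

// The hint wraps the tested expression, so the prediction applies to
// this branch.
inline long pick_with_hint(long x, long y, long a) {
  if (__builtin_expect(x > y, 0))  // branch marked as unlikely
    return a;
  return x + y;
}

// The hint never changes the computed value, only the predicted direction.
inline bool expect_demo() {
  return pick_discarded_hint(3, 1, 9) == pick_with_hint(3, 1, 9)
         && pick_discarded_hint(1, 3, 9) == pick_with_hint(1, 3, 9);
}
```

__builtin_expect returns its first argument, which is why it only influences branch prediction when its result is what the condition actually tests.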


Andrew

From d123daec0c237533cf974334d98bc6d357d4273e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 25 Oct 2021 13:34:36 -0400
Subject: [PATCH 3/3] Fix ifcvt-4.c to not depend on VRP2 asserts.

The testcase fails if VRP2 is replaced with a non-assert based VRP because it
accidentally depends on specific IL changes when the asserts are removed.  This
removes that dependency.

	gcc/testsuite/
	* gcc.dg/ifcvt-4.c: Adjust.
---
 gcc/testsuite/gcc.dg/ifcvt-4.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index ec142cfd943..e74e449b402 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -13,8 +13,7 @@ foo (word x, word y, word a)
   word i = x;
   word j = y;
   /* Try to make taking the branch likely.  */
-  __builtin_expect (x > y, 1);
-  if (x > y)
+  if (__builtin_expect (x > y, 0))
 {
   i = a;
   j = i;
-- 
2.17.2

Re: [PATCH] gcc/Makefile.in: fix bug in gengtype link rule

On 10/26/2021 3:26 PM, David Malcolm via Gcc-patches wrote:
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for trunk?
>
> gcc/ChangeLog:
> * Makefile.in: Fix syntax for reference to LIBDEPS in
> gengtype link rule.
>
> Signed-off-by: David Malcolm

OK
jeff

Re: [PATCH] assert_streq: add newlines to failure message

On 10/26/2021 3:27 PM, David Malcolm via Gcc-patches wrote:
> Adding newlines so that the two strings line up makes string equality
> failures considerably easier to read.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for trunk?
>
> gcc/ChangeLog:
> * selftest.c (assert_streq): Add newlines when emitting non-equal
> non-NULL strings.
>
> Signed-off-by: David Malcolm

OK
jeff

Re: [PATCH] libcody: add mostlyclean Makefile target





On 10/26/2021 3:47 AM, Martin Liška wrote:

On 10/25/21 18:10, Eric Gallager wrote:

On Mon, Oct 25, 2021 at 7:35 AM Martin Liška  wrote:


Hello.

The patch adds the missing mostlyclean Makefile target.

Ready to be installed?
Thanks,
Martin



Generally the way the various "*clean" targets are arranged, in order
of cleanliness, from least clean to most clean, is:
mostlyclean
clean
distclean
maintainer-clean
...with each target depending on the previous one in the order. Thus,
instead of mostlyclean depending on clean, it'd be the other way
around, with clean depending on mostlyclean. See how the gcc/
subdirectory does it, for example. See the "Standard Targets for
Users" section of the GNU Coding Standards:
https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets 



Thank you for the explanation.

Here is an updated version of the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin




 PR other/102657

libcody/ChangeLog:

 * Makefile.in: Add mostlyclean Makefile target.
---
   libcody/Makefile.in | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libcody/Makefile.in b/libcody/Makefile.in
index b8b45a2e310..d8f1e8216d4 100644
--- a/libcody/Makefile.in
+++ b/libcody/Makefile.in
@@ -111,7 +111,7 @@ maintainer-clean:: distclean
   clean::
 rm -f $(shell find $(srcdir) -name '*~')

-.PHONY: all check clean distclean maintainer-clean
+.PHONY: all check clean distclean maintainer-clean mostlyclean

   CXXFLAGS/ := -I$(srcdir)
   LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \
@@ -127,6 +127,8 @@ clean::
 rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d)
 rm -f libcody.a

+mostlyclean: clean
+
   CXXFLAGS/fatal.cc = -DSRCDIR='"$(srcdir)"'

   fatal.o: Makefile revision
--
2.33.1



0001-libcody-add-mostlyclean-Makefile-target.patch

 From fcad6039f910b49dfc4022d3b1eeb993025dabca Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 25 Oct 2021 16:32:55 +0200
Subject: [PATCH] libcody: add mostlyclean Makefile target

PR other/102657

libcody/ChangeLog:

* Makefile.in: Add mostlyclean Makefile target.

OK

jeff

Re: [PATCH] c++, v2: Implement DR2351 - void{} [PR102820]

2021-10-28 Thread Jason Merrill via Gcc-patches


On 10/28/21 08:19, Jakub Jelinek wrote:
> On Thu, Oct 28, 2021 at 08:01:27AM -0400, Jason Merrill wrote:
> > > --- gcc/cp/semantics.c.jj   2021-10-27 09:16:41.161600606 +0200
> > > +++ gcc/cp/semantics.c  2021-10-28 13:06:59.325791588 +0200
> > > @@ -3079,6 +3079,24 @@ finish_unary_op_expr (location_t op_loc,
> > >  return result;
> > >    }
> > > +/* Return true if CONSTRUCTOR EXPR after pack expansion could have no
> > > +   elements.  */
> > > +
> > > +static bool
> > > +maybe_zero_constructor_nelts (tree expr)
> > > +{
> > > +  if (CONSTRUCTOR_NELTS (expr) == 0)
> > > +return true;
> > > +  if (!processing_template_decl)
> > > +return false;
> > > +  unsigned int i;
> > > +  tree val;
> > > +  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (expr), i, val)
> >
> > Let's use
> >
> >    for (constructor_elt &elt : CONSTRUCTOR_ELTS (t))
>
> Ok, will do.
>
> > > @@ -3104,9 +3122,20 @@ finish_compound_literal (tree type, tree
> > >  if (!TYPE_OBJ_P (type))
> > >    {
> > > -  if (complain & tf_error)
> > > -   error ("compound literal of non-object type %qT", type);
> > > -  return error_mark_node;
> > > +  /* DR2351 */
> > > +  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
> > > +   return void_node;
> >
> > This test now seems redundant with the one below (if you remove the &&
> > processing_template_decl).
>
> It is not redundant, for the maybe case it doesn't return void_node, but
> falls through into if (processing_template_decl), which, because
> compound_literal is necessarily instantiation_dependent_expression_p
> (it contains packs) will just create CONSTRUCTOR_IS_DEPENDENT CONSTRUCTOR
> and we'll get here back during instantiation.
> For the CONSTRUCTOR_NELTS == 0 case even in templates we know
> compound_literal isn't dependent (it doesn't contain anything) and type
> isn't either, so we can return void_node right away (and when
> !processing_template_decl we have to do that).

Ah, right.  Never mind that comment, then.

Jason

Re: [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

2021-10-28 Thread David Edelsohn via Gcc-patches

On Thu, Oct 28, 2021 at 1:39 AM Xionghu Luo  wrote:
>
> On 2021/10/27 21:24, David Edelsohn wrote:
> > On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo  wrote:
> >>
> >> If the second operand of __builtin_shuffle is const vector 0, and with
> >> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> >> patterns match and emit for VSX xxpermdi.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/powerpc/pr102868.c: New test.
> >> ---
> >>  gcc/config/rs6000/rs6000.c  | 47 --
> >>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
> >>  2 files changed, 97 insertions(+), 3 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index d0730253bcc..5d802c1fa96 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx 
> >> op0, rtx op1,
> >>  {OPTION_MASK_P8_VECTOR,
> >>   BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
> >>   : CODE_FOR_p8_vmrgew_v4sf_direct,
> >> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
> >> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
> >
> > If the insn_code is the same for big endian and little endian, why
> > does the new code test BYTES_BIG_ENDIAN to set the same value
> > (CODE_FOR_vsx_xxpermdi_v16qi)?
> >
>
> Thanks for the catch, updated the patch as below:
>
> [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the 
> upper bits [PR102868]
>
> If the second operand of __builtin_shuffle is const vector 0, and with
> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> patterns match and emit for VSX xxpermdi.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr102868.c: New test.

Okay.

Thanks, David

[PATCH] path relation oracle: Remove SSA's being killed from the equivalence list.

Same thing as the relational change.  Walk any equivalences that have
been registered on the path, and remove the name being killed.  The
only reason we had added the equivalence with itself earlier is so we
wouldn't search any further in the equivalency list.  So if we are
removing all references to it, then we no longer need to add a "kill"
record.
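
The idea can be sketched outside of GCC as follows. This is a simplified model, not the oracle's real data structures: std::set stands in for GCC bitmaps, std::forward_list for the equiv_chain list, and all names are hypothetical.

```cpp
#include <cassert>
#include <forward_list>
#include <set>

// Simplified stand-in for the path oracle's equivalence bookkeeping.
struct EquivSketch {
  std::set<unsigned> all_names;                 // summary of every name seen
  std::forward_list<std::set<unsigned>> chain;  // one set per equivalence

  void add_equiv(unsigned a, unsigned b) {
    chain.push_front(std::set<unsigned>{a, b});
    all_names.insert(a);
    all_names.insert(b);
  }

  // Remove version V from the summary and from every registered
  // equivalence; return early when the summary never contained it.
  void kill(unsigned v) {
    if (all_names.erase(v) == 0)
      return;
    for (auto &names : chain)
      names.erase(v);
  }

  bool mentions(unsigned v) const {
    if (all_names.count(v))
      return true;
    for (const auto &names : chain)
      if (names.count(v))
        return true;
    return false;
  }
};

// After killing 2, no set refers to it, while 1 and 3 are untouched.
inline bool equiv_demo() {
  EquivSketch e;
  e.add_equiv(1, 2);
  e.add_equiv(2, 3);
  e.kill(2);
  return !e.mentions(2) && e.mentions(1) && e.mentions(3);
}
```

Since every reference to the killed name is gone after the walk, no separate "kill" record is needed to stop later searches.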

Will push pending tests on x86-64 Linux.

Co-authored-by: Andrew MacLeod 

gcc/ChangeLog:

* value-relation.cc (path_oracle::killing_def): Walk the
equivalency list and remove SSA from any equivalencies.
---
 gcc/value-relation.cc | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 0ad4f7a9495..512b51ce022 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1298,17 +1298,17 @@ path_oracle::killing_def (tree ssa)
 }
 
   unsigned v = SSA_NAME_VERSION (ssa);
-  bitmap b = BITMAP_ALLOC (&m_bitmaps);
-  bitmap_set_bit (b, v);
-  equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
-   sizeof (equiv_chain));
-  ptr->m_names = b;
-  ptr->m_bb = NULL;
-  ptr->m_next = m_equiv.m_next;
-  m_equiv.m_next = ptr;
-  bitmap_ior_into (m_equiv.m_names, b);
 
-  // Walk the relation list an remove SSA from any relations.
+  // Walk the equivalency list and remove SSA from any equivalencies.
+  if (bitmap_bit_p (m_equiv.m_names, v))
+{
+  bitmap_clear_bit (m_equiv.m_names, v);
+  for (equiv_chain *ptr = m_equiv.m_next; ptr; ptr = ptr->m_next)
+   if (bitmap_bit_p (ptr->m_names, v))
+ bitmap_clear_bit (ptr->m_names, v);
+}
+
+  // Walk the relation list and remove SSA from any relations.
   if (!bitmap_bit_p (m_relations.m_names, v))
 return;
 
-- 
2.31.1

[PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.

This will leave DOM as the only forward threader client.  When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.

The final numbers are:

prev: # threads in backward + vrp-threaders = 92624
now:  # threads in backward threaders = 94275
Gain: +1.78%

prev: # total threads: 189495
now:  # total threads: 193714
Gain: +2.22%
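
The gains above follow from the relative change between the two counts. A minimal sketch of the arithmetic, with hypothetical function names:

```cpp
#include <cassert>
#include <cmath>

// Relative change from BEFORE to AFTER, in percent.
inline double percent_gain(double before, double after) {
  assert(before > 0);
  return (after - before) / before * 100.0;
}

// Checks the computed gains against the figures quoted in the mail,
// allowing for the last digit being rounded or truncated.
inline bool gains_match_reported() {
  const double backward = percent_gain(92624, 94275);  // quoted as +1.78%
  const double total = percent_gain(189495, 193714);   // quoted as +2.22%
  return std::fabs(backward - 1.78) < 0.01 && std::fabs(total - 2.22) < 0.01;
}
```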

The numbers are not as great as my initial proposal, but I've
recently pushed all the work that got us to this point ;-).

And... the total compilation improves by 1.32%!

There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
want to make sure it's not a missing thread.  If it is, I'll create a PR
and own it.

Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solvable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization.  I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests).  However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.

Even though we have been incrementally stressing all the pieces of this
intricate puzzle, I do expect fallout.  My plan from here until stage1
ends is to stop new development in the threader(s), and focus on bug
fixing and improving the developer's debugging experience.

OK pending another round of tests on x86-64 and ppc64le Linux?

gcc/ChangeLog:

* passes.def: Replace the pass_thread_jumps before VRP* with
pass_thread_jumps_full.  Remove all pass_vrp_threader instances.

libgomp/ChangeLog:

* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading 
changes.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
* gcc.dg/tree-ssa/pr20701.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21559.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr59597.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr66752-3.c: Same.
* gcc.dg/tree-ssa/pr68198.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/pr77445.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp08.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp106.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.
---
 gcc/passes.def|  6 ++
 gcc/testsuite/gcc.dg/loop-unswitch-2.c|  2 +-
 gcc/testsuite/gcc.dg/old-style-asm-1.c|  5 +
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-1.c  |  9 +++--
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-2.c  |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-3.c  |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/phi_on_compare-4.c  |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr20701.c   |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21001.c   |  2 +-
 gcc/testsuite/gcc

Re: [PATCH] configure: Avoid unnecessary constraints on executables for $build.

2021-10-28 Thread Iain Sandoe

Hi Richard,

> On 8 Sep 2021, at 07:35, Richard Biener  wrote:
> 
> On Tue, Sep 7, 2021 at 10:11 PM Iain Sandoe  wrote:
>> 

>> So, looking through the various email threads and the PR, I think that
>> what has happened is :
>> 
>> As the PR points out, our existing PCH model does not work if the compiler
>> executable is PIE - which manifests on platforms like Darwin (which is PIE
>> by default) or Linux when configured --enable-default-pie.
>> 
>> H.J’s original patch forces no-PIE onto the compiler executables, and
>> because of shared code on $host also to the driver etc.

>> OK for master, and eventually backports?
> 
> OK for trunk, I think it warrants quite some soaking time before considering
> backports.

It’s been on master for quite some time now (and presumably several cycles of
everyone’s CI) without any reports of problems, so it would be good to get this
onto at least 11 and 10 (since that is the last version we can bootstrap with
C++98).

OK for backports now?
thanks
Iain

[Patch] libcpp: Fix _Pragma expansion [PR102409]

2021-10-28 Thread Tobias Burnus


Before this patch, running

#define TEST(T) T
#define PARALLEL(X) TEST(X)
PARALLEL(
for (int i = 0; i < N; i++) { \
  _Pragma("omp ordered") \
  S[0] += C[i] + D[i]; \
})

through 'gcc -E' yielded

#pragma omp ordered
 for (int i = 0; i < N; i++) { S[0] += C[i] + D[i]; }

Note that the '#pragma omp ordered' is now above the loop, i.e. before
all macro arguments (and macro expansions).

With the patch, the result is the following, which matches Clang and I
assume GCC before 4.2 or 4.3, but I have no GCC 4.x available:

for (int i = 0; i < N; i++) {
#pragma omp ordered
 S[0] += C[i] + D[i]; }
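
For anyone who wants to reproduce this locally, here is a minimal sketch of the same macro pattern in compilable form. The function names sum_pairs and check_sum_pairs are hypothetical and not from the PR, and the sketch assumes a build without -fopenmp so that the pragma is simply ignored. Running the file through 'g++ -E' should show where the '#pragma omp ordered' line ends up relative to the loop.

```cpp
#include <cassert>
#include <cstddef>

#define TEST(T) T
#define PARALLEL(X) TEST(X)

// Hypothetical helper, not from the PR: accumulates C[i] + D[i] into S[0].
inline int sum_pairs(const int *C, const int *D, std::size_t N) {
  int S[1] = {0};
  PARALLEL(
    for (std::size_t i = 0; i < N; i++) {
      _Pragma("omp ordered")  // ignored when OpenMP is not enabled
      S[0] += C[i] + D[i];
    })
  return S[0];
}

// Small sanity check; the interesting part is the 'g++ -E' output.
inline void check_sum_pairs() {
  const int C[] = {1, 2, 3};
  const int D[] = {10, 20, 30};
  assert(sum_pairs(C, D, 3) == 66);
}
```

The runtime result is the same either way; only the position of the pragma in the preprocessed output differs, which matters once the pragma is actually honoured.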


The reason seems to be the addition done for PR34692 in r131819, which
added code to avoid an ICE with
FOO(
#pragma GCC diagnostic
)

There is a lengthy description in macro.c about what it does and the
pragma_buff which is passed around, including to the now modified
collect_args. Namely, the comment above enter_macro_context states:

   If there were additionally any unexpanded deferred #pragma
   directives among macro arguments, push another context containing
   the pragma tokens before the yet-to-be-rescanned replacement list
   and return two.

While that seems to work fine with #pragma, it obviously does not do
what it should for _Pragma. The solution in the patch was to add a
flag to distinguish the CPP_PRAGMA coming from the _Pragma operator
(alias BT_PRAGMA) from the CPP_PRAGMA coming from a user's #pragma.
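
To make the idea concrete, here is a simplified model of the decision, not the libcpp code itself. The types TokenType, Token and Collected and the function collect_tokens are hypothetical stand-ins; only the PRAGMA_OP bit value is taken from the patch. Pragma tokens without the flag are deferred (as for a user's #pragma), while flagged ones stay in place inside the macro argument.

```cpp
#include <cassert>
#include <vector>

enum TokenType { TOK_OTHER, TOK_PRAGMA };   // hypothetical, much reduced
const unsigned PRAGMA_OP = 1u << 11;        // same bit as in the patch

struct Token {
  TokenType type;
  unsigned flags;
};

struct Collected {
  std::vector<Token> deferred;  // emitted ahead of the macro expansion
  std::vector<Token> argument;  // kept in place inside the argument
};

// Only pragma tokens that did NOT come from _Pragma are pulled out.
inline Collected collect_tokens(const std::vector<Token> &in) {
  Collected out;
  for (const Token &t : in) {
    if (t.type == TOK_PRAGMA && !(t.flags & PRAGMA_OP))
      out.deferred.push_back(t);
    else
      out.argument.push_back(t);
  }
  return out;
}

inline void check_collect_tokens() {
  std::vector<Token> in = {{TOK_OTHER, 0},
                           {TOK_PRAGMA, 0},          // from #pragma
                           {TOK_PRAGMA, PRAGMA_OP},  // from _Pragma
                           {TOK_OTHER, 0}};
  Collected c = collect_tokens(in);
  assert(c.deferred.size() == 1);
  assert(c.argument.size() == 3);
}
```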

OK for mainline? – It is a long-standing regression, but it hasn't
been reported for a while. Thus: how do you feel about backporting?

I did test it with a full bootstrap + regtesting. I also tested
omptests (cf. PR).

Tobias

PS: I had the hope that it would fix some of the other _Pragma related
PRs (see e.g. refs in this PR102409 or search Bugzilla), but it does
not seem to help for those. I do note that most of them are related to
diagnostic. In particular, for PR91669, the output of gcc -E is the
same for GCC 10, for a patched GCC and for clang-11, which makes the
result (issue unaffected by this patch) not that surprising.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libcpp: Fix _Pragma expansion [PR102409]

Both #pragma and _Pragma ended up as CPP_PRAGMA. Presumably since
r131819 (2008, GCC 4.3) for PR34692, pragmas are not expanded in
macro arguments but are output as is before. From the old bug report,
that was to fix usage like
  FOO (
#pragma GCC diagnostic
  )
However, that change also affected _Pragma such that
  BAR (
"1";
_Pragma("omp ..."); )
yielded
  #pragma omp ...
followed by what BAR expanded to, possibly including '"1";'.

This commit adds a flag, PRAGMA_OP, to tokens to make the two
distinguishable - and include again _Pragma in the expanded arguments.

libcpp/ChangeLog:

	PR c++/102409
	* directives.c (destringize_and_run): Add PRAGMA_OP to the
	CPP_PRAGMA token's flags to mark it as coming from _Pragma.
	* include/cpplib.h (PRAGMA_OP): #define, to be used with token flags.
	* macro.c (collect_args): Only handle CPP_PRAGMA specially if PRAGMA_OP
	is not set.

 libcpp/directives.c | 2 ++
 libcpp/include/cpplib.h | 1 +
 libcpp/macro.c  | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/libcpp/directives.c b/libcpp/directives.c
index b4bc8b4df30..34f7677f718 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -1907,6 +1907,8 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
   save_directive = pfile->directive;
   pfile->directive = &dtable[T_PRAGMA];
   do_pragma (pfile);
+  if (pfile->directive_result.type == CPP_PRAGMA)
+    pfile->directive_result.flags |= PRAGMA_OP;
   end_directive (pfile, 1);
   pfile->directive = save_directive;
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 6e2fcb6b1f2..56b07acc1d7 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -198,6 +198,7 @@ struct GTY(()) cpp_string {
 operator, or before this token
 after a # operator.  */
 #define NO_EXPAND	(1 << 10) /* Do not macro-expand this token.  */
+#define PRAGMA_OP	(1 << 11) /* _Pragma token.  */
 
 /* Specify which field, if any, of the cpp_token union is used.  */
 
diff --git a/libcpp/macro.c b/libcpp/macro.c
index f214548de1e..b2f797cae35 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -1259,7 +1259,7 @@ collect_args (cpp_reader *pfile, const cpp_hashnode *node,
 	  else if (token->type == CPP_EOF
 		   || (token->type == CPP_HASH && token->flags & BOL))
 	    break;
-	  else if (token->type == CPP_PRAGMA)
+	  else if (token->type == CPP_PRAGMA && !(token->flags & PRAGMA_OP))
 	    {
 	      cpp_token *newtok = _cpp_temp_token (pfile);

Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr


On 10/28/21 2:59 AM, Prathamesh Kulkarni via Gcc-patches wrote:

On Fri, 22 Oct 2021 at 14:41, Prathamesh Kulkarni
 wrote:


On Wed, 20 Oct 2021 at 15:05, Richard Sandiford
 wrote:


Prathamesh Kulkarni  writes:

On Tue, 19 Oct 2021 at 19:58, Richard Sandiford
 wrote:


Prathamesh Kulkarni  writes:

Hi,
The attached patch emits a more verbose diagnostic for target attribute that
is an architecture extension needing a leading '+'.

For the following test,
void calculate(void) __attribute__ ((__target__ ("sve")));

With patch, the compiler now emits:
102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
   | ^~~~

instead of:
102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
   | ^~~~


Nice :-)


(This isn't specific to sve though).
OK to commit after bootstrap+test ?

Thanks,
Prathamesh

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a9a1800af53..975f7faf968 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
num_attrs++;
if (!aarch64_process_one_target_attr (token))
   {
-   error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
+   /* Check if token is possibly an arch extension without
+  leading '+'.  */
+   char *str = (char *) xmalloc (strlen (token) + 2);
+   str[0] = '+';
+   strcpy(str + 1, token);


I think std::string would be better here, e.g.:

   auto with_plus = std::string ("+") + token;


+   if (aarch64_handle_attr_isa_flags (str))
+ error("arch extension %<%s%> should be prepended with %<+%>", token);


Nit: should be a space before the “(”.

In principle, a fixit hint would have been nice here, but I don't think
we have enough information to provide one.  (Just saying for the record.)
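
A rough standalone sketch of the std::string variant follows, for illustration only. is_valid_isa_modifier is a hypothetical stand-in for aarch64_handle_attr_isa_flags with a made-up list of extensions, and diagnose_target_attr returns the message text instead of calling error (). The point is that the temporary "+token" string needs no manual xmalloc/strcpy/free.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for aarch64_handle_attr_isa_flags; the list of
// extensions here is illustrative only.
inline bool is_valid_isa_modifier(const std::string &s) {
  static const std::set<std::string> known = {"+sve", "+crc", "+simd"};
  return known.count(s) != 0;
}

// Returns the diagnostic text instead of emitting it (assumption made so
// the sketch is testable outside the compiler).
inline std::string diagnose_target_attr(const char *token) {
  auto with_plus = std::string("+") + token;  // freed automatically
  if (is_valid_isa_modifier(with_plus))
    return "arch extension '" + std::string(token) +
           "' should be prepended with '+'";
  return "pragma or attribute 'target(\"" + std::string(token) +
         "\")' is not valid";
}

inline void check_diagnose_target_attr() {
  assert(diagnose_target_attr("sve") ==
         "arch extension 'sve' should be prepended with '+'");
  assert(diagnose_target_attr("bogus") ==
         "pragma or attribute 'target(\"bogus\")' is not valid");
}
```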

Thanks for the suggestions.
Does the attached patch look OK ?


Looks good apart from a couple of formatting nits.


Thanks,
Prathamesh


Thanks,
Richard


+   else
+ error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
+   free (str);
 return false;
   }



[aarch64] PR102376 - Emit better diagnostics for arch extension in target attribute.

gcc/ChangeLog:
   PR target/102376
   * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): Change str's
   type to const char *.
   (aarch64_process_target_attr): Check if token is possibly an arch extension
   without leading '+' and emit diagnostic accordingly.

gcc/testsuite/ChangeLog:
   PR target/102376
   * gcc.target/aarch64/pr102376.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a9a1800af53..b72079bc466 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str)
 modified.  */

  static bool
-aarch64_handle_attr_isa_flags (char *str)
+aarch64_handle_attr_isa_flags (const char *str)
  {
enum aarch64_parse_opt_result parse_res;
uint64_t isa_flags = aarch64_isa_flags;
@@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args)
num_attrs++;
if (!aarch64_process_one_target_attr (token))
   {
-   error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
+   /* Check if token is possibly an arch extension without
+  leading '+'.  */
+   auto with_plus = std::string("+") + token;


Should be a space before “(”.


+   if (aarch64_handle_attr_isa_flags (with_plus.c_str ()))
+ error ("arch extension %<%s%> should be prepended with %<+%>", token);


Long line, should be:

 error ("arch extension %<%s%> should be prepended with %<+%>",
token);

OK with those changes, thanks.

Thanks, the patch regressed some target attr tests because it emitted
diagnostics twice from aarch64_handle_attr_isa_flags.
So, e.g., for spellcheck_1.c:
__attribute__((target ("arch=armv8-a-typo"))) void foo () {}

results in:
spellcheck_1.c:5:1: error: invalid name ("armv8-a-typo") in
‘target("arch=")’ pragma or attribute
 5 | {
   | ^
spellcheck_1.c:5:1: note: valid arguments are: armv8-a armv8.1-a
armv8.2-a armv8.3-a armv8.4-a armv8.5-a armv8.6-a armv8.7-a armv8-r
armv9-a
spellcheck_1.c:5:1: error: invalid feature modifier arch=armv8-a-typo
of value ("+arch=armv8-a-typo") in ‘target()’ pragma or attribute
spellcheck_1.c:5:1: error: pragma or attribute
‘target("arch=armv8-a-typo")’ is not valid

The updated patch adds an argument to aarch64_handle_attr_isa_flags that
optionally suppresses the error, which fixes the issue.
Does it look OK ?
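
As a rough illustration of that approach (not the actual aarch64.c code): parse_isa_flags is a hypothetical stand-in for aarch64_handle_attr_isa_flags, the report_error parameter models the new argument, and diagnostics are collected in a vector rather than emitted. Probing "+token" quietly means a failed probe no longer adds its own error on top of the final one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the ISA-flags parser; the accepted strings are
// illustrative only.
inline bool parse_isa_flags(const std::string &str, bool report_error,
                            std::vector<std::string> &diagnostics) {
  const bool ok = (str == "+sve" || str == "+crc");
  if (!ok && report_error)
    diagnostics.push_back("invalid feature modifier " + str);
  return ok;
}

// Models only the failure path of aarch64_process_target_attr: the token
// has already been rejected, so exactly one error should be reported.
inline void report_invalid_attr(const std::string &token,
                                std::vector<std::string> &diagnostics) {
  if (parse_isa_flags("+" + token, /*report_error=*/false, diagnostics))
    diagnostics.push_back("arch extension '" + token +
                          "' should be prepended with '+'");
  else
    diagnostics.push_back("pragma or attribute 'target(\"" + token +
                          "\")' is not valid");
}

inline void check_single_diagnostic() {
  std::vector<std::string> d;
  report_invalid_attr("arch=armv8-a-typo", d);
  assert(d.size() == 1);  // no extra error from the quiet probe
}
```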

ping https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582345.html


Just a couple of minor points:

+ if (aarch64_handle_attr_isa_flags (with_plus.c_str (), false))
+   error ("arch ex

Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

>
>
> And... the total compilation improves by 1.32%!
>

This last number is compilation speed, not number of threads.

Aldy

Re: [PATCH] Adjust testcase for O2 vect.


On 10/28/21 1:23 AM, liuhongt via Gcc-patches wrote:

Adjust code in check_vect_slp_aligned_store_usage to make it an exact
pattern match of the corresponding testcases.
These new target/xfail selectors are added as a temporary solution,
and should be removed after the real issue is fixed for Wstringop-overflow.


Thanks for all the work you're putting into this!  I can't say
I understand the conditions under which to use which selector
in what case but hopefully we will be able to remove them all
from the tests once the warnings are moved to a better pass.
If that's a safe assumption I'm okay with the changes to
the tests.  I do have a question/comment on the .exp changes.



gcc/ChangeLog:

* doc/sourcebuild.texi (vect_slp_v4qi_store_2): Document
effective target.
(vect_slp_v4qi_store_3): Ditto.
(vect_slp_v2hi_store_2): Ditto.

gcc/testsuite/ChangeLog:

PR testsuite/102944
* gcc.dg/Warray-bounds-48.c: Adjust target/xfail selector.
* gcc.dg/Warray-parameter-3.c: Ditto.
* gcc.dg/Wstringop-overflow-68.c: Ditto
* gcc.dg/Wstringop-overflow-76.c: Ditto
* lib/target-supports.exp (vect_slp_v4qi_store_2): New
effective target.
(vect_slp_v4qi_store_3): Ditto.
(vect_slp_v2hi_store_2): Ditto.
(check_vect_slp_aligned_store_usage): Adjust code to make it
an exact pattern match of corresponding testcase.
---
  gcc/doc/sourcebuild.texi |  12 ++
  gcc/testsuite/gcc.dg/Warray-bounds-48.c  |   4 +-
  gcc/testsuite/gcc.dg/Warray-parameter-3.c|   2 +-
  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c |   4 +-
  gcc/testsuite/gcc.dg/Wstringop-overflow-76.c |  16 +-
  gcc/testsuite/lib/target-supports.exp| 201 ++-
  6 files changed, 179 insertions(+), 60 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6a165767630..2bb3cb3a9be 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1854,6 +1854,14 @@ address at plain @option{-O2}.
  Target supports vectorization of 4-byte char stores with 4-byte aligned
  address at plain @option{-O2}.
  
+@item vect_slp_v4qi_store_2

+Target supports vectorization of 4-byte char stores with 4-byte aligned
+address at plain @option{-O2}.
+
+@item vect_slp_v4qi_store_3
+Target supports vectorization of 4-byte char stores with 4-byte aligned
+address at plain @option{-O2}.


The description is the same for both of these targets as well
as for vect_slp_v2qi_store.

I think if anyone other than a vectorization expert is to have
a chance of using these in the future without reverse engineering
the code the descriptions need to capture the differences between
them.  I.e., make it clear when vect_slp_v4qi_store is appropriate
and when either vect_slp_v4qi_store_2 or vect_slp_v4qi_store_3
should be used instead.

Martin


+
  @item vect_slp_v8qi_store
  Target supports vectorization of 8-byte char stores with 8-byte aligned
  address at plain @option{-O2}.
@@ -1874,6 +1882,10 @@ address at plain @option{-O2}.
  Target supports vectorization of 8-byte int stores with 8-byte aligned
  address at plain @option{-O2}.
  
+@item vect_slp_v2si_store_2

+Target supports vectorization of 8-byte int stores with 8-byte aligned
+address at plain @option{-O2}.
+
  @item vect_slp_v4si_store
  Target supports vectorization of 16-byte int stores with 16-byte aligned
  address at plain @option{-O2}.
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-48.c b/gcc/testsuite/gcc.dg/Warray-bounds-48.c
index 19b7634c063..32c0df843d2 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-48.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-48.c
@@ -30,7 +30,7 @@ static void nowarn_ax_extern (struct AX *p)
  
  static void warn_ax_local_buf (struct AX *p)

  {
-  p->ax[0] = 4; p->ax[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store &&  { ! vect_slp_v4hi_store } } } }
+  p->ax[0] = 4; p->ax[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store_2 &&  { ! vect_slp_v4hi_store } } } }
  
p->ax[2] = 6; // { dg-warning "\\\[-Warray-bounds" }

p->ax[3] = 7; // { dg-warning "\\\[-Warray-bounds" }
@@ -130,7 +130,7 @@ static void warn_a0_extern (struct A0 *p)
  
  static void warn_a0_local_buf (struct A0 *p)

  {
-  p->a0[0] = 4; p->a0[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store && { ! vect_slp_v4hi_store } } } }
+  p->a0[0] = 4; p->a0[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store_2 && { ! vect_slp_v4hi_store } } } }
  
p->a0[2] = 6; // { dg-warning "\\\[-Warray-bounds" }

p->a0[3] = 7; // { dg-warning "\\\[-Warray-bounds" }
diff --git a/gcc/testsuite/gcc.dg/Warray-parameter-3.c b/gcc/testsuite/gcc.dg/Warray-parameter-3.c
index b6ed8daf51c..bbf55a40a3c 100644
--- a/gcc/testsuite/gcc.dg/Warray-parameter-3.c
+++ b/gcc/testsuite/g

Re: [Patch] libcpp: Fix _Pragma expansion [PR102409]


On 10/28/21 9:51 AM, Tobias Burnus wrote:

Before this patch, running

#define TEST(T) T
#define PARALLEL(X) TEST(X)
PARALLEL(
     for (int i = 0; i < N; i++) { \
   _Pragma("omp ordered") \
   S[0] += C[i] + D[i]; \
     })

through 'gcc -E' yielded

#pragma omp ordered
  for (int i = 0; i < N; i++) { S[0] += C[i] + D[i]; }

Note that the '#pragma omp ordered' is now above the loop, i.e. before
all macro arguments (and macro expansions).

With the patch, the result is the following, which matches Clang and I
assume GCC before 4.2 or 4.3, but I have no GCC 4.x available:

for (int i = 0; i < N; i++) {
#pragma omp ordered
  S[0] += C[i] + D[i]; }


There are a number of bug reports of _Pragma not working right
in macros, including (and especially) to control diagnostics:
https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=_Pragma%20macro&list_id=328003

Just by the description this change seems like it could also
fix some of them.  It would be helpful to check to see if it
does and if so, add tests and resolve the bugs it fixes (I'm
willing to help with that in stage 3).

Martin




The reason seems to be the addition done for PR34692 in r131819, which
added code to avoid an ICE with
FOO(
#pragma GCC diagnostic
)

There is a lengthy description in macro.c about what it does and the
pragma_buff which is passed around, including to the now modified
collect_args. Namely, the comment above enter_macro_context states:

    If there were additionally any unexpanded deferred #pragma
    directives among macro arguments, push another context containing
    the pragma tokens before the yet-to-be-rescanned replacement list
    and return two.

While that seems to work fine with #pragma, it obviously does not do
what it should for _Pragma. The solution in the patch was to add a
flag to distinguish the CPP_PRAGMA coming from the _Pragma operator
(alias BT_PRAGMA) from the CPP_PRAGMA coming from a user's #pragma.

OK for mainline? – It is a long-standing regression, but it hasn't
been reported for a while. Thus: how do you feel about backporting?

I did test it with a full bootstrap + regtesting. I also tested
omptests (cf. PR).

Tobias

PS: I had the hope that it would fix some of the other _Pragma related
PRs (see e.g. refs in this PR102409 or search Bugzilla), but it does
not seem to help for those. I do note that most of them are related to
diagnostic. In particular, for PR91669, the output of gcc -E is the
same for GCC 10, for a patched GCC and for clang-11, which makes the
result (issue unaffected by this patch) not that surprising.

Re: [PATCH] c++: CTAD within template argument [PR102933]

2021-10-28 Thread Patrick Palka via Gcc-patches

On Wed, 27 Oct 2021, Jason Merrill wrote:

> On 10/26/21 13:44, Patrick Palka wrote:
> > Here when checking for erroneous occurrences of 'auto' inside a template
> > argument (which is allowed by the concepts TS for class templates),
> > extract_autos_r picks up the CTAD placeholder for X{T{0}} which causes
> > check_auto_in_tmpl_args to reject this valid template argument.  This
> > patch fixes this by making extract_autos_r ignore CTAD placeholders.
> 
> It also seems questionable that check_auto_in_tmpl_args is looking into
> non-type arguments, which won't have the bad autos this is looking for.

Ah yeah, interesting.  Somehow this doesn't cause problems when passing
an auto NTTP as a template argument.

> 
> > However, it seems we don't need to call check_auto_in_tmpl_args at all
> > outside of the concepts TS since using 'auto' as a type-id is otherwise
> > rejected more generally at parse time.  So this patch guards calls to
> > check_auto_in_tmpl_args with flag_concepts_ts instead of flag_concepts.
> > 
> > Relatedly, I think the concepts code paths in do_auto_deduction and
> > type_uses_auto are also necessary only for the concepts TS, so this
> > patch also restricts these code paths accordingly.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk and perhaps 11?
> 
> For 11 (and possibly trunk) maybe return false from check_auto... if
> !flag_concepts_ts rather than asserting and changing the call sites. That one
> change is OK for 11, the whole patch is OK for trunk.

Done, thanks a lot.

> 
> The comment on the test or assert could be elaborated to explain as you do
> above that any bad autos will have been rejected already by the parser.

Done, here's what I ended up committing to trunk so far:

-- >8 --

Subject: [PATCH] c++: CTAD within template argument [PR102933]

Here when checking for erroneous occurrences of 'auto' inside a template
argument (which is allowed by the concepts TS for class templates),
extract_autos_r picks up the CTAD placeholder for X{T{0}} which causes
check_auto_in_tmpl_args to reject this valid template argument.  This
patch fixes this by making extract_autos_r ignore CTAD placeholders.

However, it seems we don't need to call check_auto_in_tmpl_args at all
outside of the concepts TS since using 'auto' as a type-id is otherwise
rejected more generally at parse time.  So this patch makes the function
just exit early if !flag_concepts_ts.

Similarly, I think the concepts code paths in do_auto_deduction and
type_uses_auto are only necessary for the concepts TS, so this patch
also restricts these code paths accordingly.

PR c++/102933

gcc/cp/ChangeLog:

* parser.c (cp_parser_simple_type_specifier): Adjust diagnostic
for using auto in parameter declaration.
* pt.c (extract_autos_r): Ignore CTAD placeholders.
(extract_autos): Use range-based for.
(do_auto_deduction): Use extract_autos only for the concepts TS
and not also for standard concepts.
(type_uses_auto): Likewise with for_each_template_parm.
(check_auto_in_tmpl_args): Just return false outside of the
concepts TS.  Simplify.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class50.C: New test.
* g++.dg/cpp2a/nontype-class50a.C: New test.
---
 gcc/cp/parser.c   |  2 +-
 gcc/cp/pt.c   | 24 ++-
 gcc/testsuite/g++.dg/cpp2a/nontype-class50.C  | 13 ++
 gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C |  5 
 4 files changed, 32 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 93335c817d7..4c2075742d6 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19513,7 +19513,7 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  else if (!flag_concepts)
pedwarn (token->location, 0,
 "use of %<auto%> in parameter declaration "
-"only available with %<-fconcepts-ts%>");
+"only available with %<-std=c++20%> or %<-fconcepts%>");
}
   else
type = make_auto ();
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 287cf4ce9d0..66040035b2f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28560,7 +28560,7 @@ static int
 extract_autos_r (tree t, void *data)
 {
   hash_table<auto_hash> &hash = *(hash_table<auto_hash>*)data;
-  if (is_auto (t))
+  if (is_auto (t) && !template_placeholder_p (t))
 {
   /* All the autos were built with index 0; fix that up now.  */
   tree *p = hash.find_slot (t, INSERT);
@@ -28594,10 +28594,8 @@ extract_autos (tree type)
   for_each_template_parm (type, extract_autos_r, &hash, &visited, true);

   tree tree_vec = make_tree_vec (hash.elements());
-  for (hash_table<auto_hash>::iterator iter = hash.begin();
-   iter != hash.end(); ++iter)
+  for (tree elt : h

Re: [PATCH 3/5] gcc: Add --nostdlib++ option

2021-10-28 Thread Richard Purdie via Gcc-patches

On Thu, 2021-10-28 at 08:51 -0600, Jeff Law wrote:
> 
> On 10/27/2021 2:05 PM, Richard Purdie via Gcc-patches wrote:
> > OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
> > separately from the compiler and slightly differently to the standard gcc build.
> > 
> > In general this works well but in trying to build them separately we run into
> > an issue since we're using our gcc, not xgcc and there is no way to tell
> > configure to use libgcc but not look for libstdc++.
> > 
> > This adds such an option allowing such configurations to work.
> > 
> > 2021-10-26 Richard Purdie 
> > 
> > gcc/c-family/ChangeLog:
> > 
> >  * c.opt: Add --nostdlib++ option
> > 
> > gcc/cp/ChangeLog:
> > 
> >  * g++spec.c (lang_specific_driver): Add --nostdlib++ option
> > 
> > gcc/ChangeLog:
> > 
> >  * doc/invoke.texi: Document --nostdlib++ option
> >  * gcc.c: Add --nostdlib++ option
> Couldn't you use -nostdlib then explicitly add -lgcc?
> 
> If that works, that would seem better to me compared to adding an option 
> to specs processing that is really only useful to one build 
> system/procedure.

It sounds great in principle but I've never been able to get it to work. With
"-nostdinc++ -nostdlib" I miss the startup files so I also tried
"-nostdinc++ -nodefaultlibs -lgcc". The latter gets further and I can build
libstdc++ but the resulting library doesn't link into applications correctly.

Cheers,

Richard

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-28 Thread Patrick Palka via Gcc-patches

> On 10/27/21 17:10, Patrick Palka wrote:
> > On Wed, 27 Oct 2021, Jason Merrill wrote:
> > 
> > > On 10/27/21 14:54, Patrick Palka wrote:
> > > > On Tue, 26 Oct 2021, Jakub Jelinek wrote:
> > > > 
> > > > > On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:
> > > > > > The performance impact of the other calls to
> > > > > > cxx_eval_outermost_const_expr
> > > > > > from p_c_e_1 is probably already mostly mitigated by the constexpr
> > > > > > call
> > > > > > cache and the fact that we try to evaluate all calls to constexpr
> > > > > > functions during cp_fold_function anyway (at least with -O).  So
> > > > > > trial
> > > > > 
> > > > > constexpr function bodies don't go through cp_fold_function
> > > > > (intentionally,
> > > > > so that we don't optimize away UB), the bodies are copied before the
> > > > > trees
> > > > > of the
> > > > > normal copy are folded.
> > > > 
> > > > Ah right, I had forgotten about that..
> > > > 
> > > > Here's another approach that doesn't need to remove trial evaluation for
> > > > &&/||.  The idea is to first quietly check if the second operand is
> > > > potentially constant _before_ performing trial evaluation of the first
> > > > operand.  This speeds up the case we care about (both operands are
> > > > potentially constant) without regressing any diagnostics.  We have to be
> > > > careful about emitting bogus diagnostics when tf_error is set, hence the
> > > > first hunk below which makes p_c_e_1 always proceed quietly first, and
> > > > replay noisily in case of error (similar to how satisfaction works).
> > > > 
> > > > Would something like this be preferable?
> > > 
> > > Seems plausible, though doubling the number of stack frames is a downside.
> > 
> > Whoops, good point..  The noisy -> quiet adjustment only needs to
> > be performed during the outermost call to p_c_e_1, and not also during
> > each recursive call.  The revised diff below fixes this thinko, and so
> > only a single extra stack frame is needed AFAICT.
> > 
> > > What did you think of Jakub's suggestion of linearizing the terms?
> > 
> > IIUC that would fix the quadraticness, but it wouldn't address that
> > we end up evaluating the entire expression twice, once during the trial
> > evaluation of each term from p_c_e_1 and again during the proper
> > evaluation of the entire expression.  It'd be nice if we could somehow
> > avoid the double evaluation, as in the approach below (or in the first
> > patch).
> 
> OK with more comments to explain the tf_error hijinks.

Thanks a lot, here's the complete committed patch for the record:

-- >8 --

Subject: [PATCH] c++: quadratic constexpr behavior for left-assoc logical
 exprs [PR102780]

In the testcase below the two left fold expressions each expand into a
constant logical expression with 1024 terms, for which potential_const_expr
takes more than a minute to return true.  This happens because p_c_e_1
performs trial evaluation of the first operand of a &&/|| in order to
determine whether to consider the potentiality of the second operand.
And because the expanded expression is left-associated, this trial
evaluation causes p_c_e_1 to be quadratic in the number of terms of the
expression.

This patch fixes this quadratic behavior by making p_c_e_1 preemptively
compute potentiality of the second operand of a &&/||, and perform trial
evaluation of the first operand only if the second operand isn't
potentially constant.  We must be careful to avoid emitting bogus
diagnostics during the preemptive computation; to that end, we perform
this shortcut only when tf_error is cleared, and when tf_error is set we
now first check potentiality of the whole expression quietly and replay
the check noisily for diagnostics.
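
The growth can be illustrated with a toy cost model. This is not the GCC code: Node, build_left_chain, trial_eval, check_old and check_new are hypothetical names, every leaf is assumed to be potentially constant, and "work" simply counts node visits. The model builds ((t1 && t2) && t3) && ... and compares the old order (check op0, trial-evaluate op0, check op1) with the new quiet order (check op0, check op1, trial-evaluate op0 only if op1 fails).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Node {
  const Node *lhs;  // null for a leaf term
  const Node *rhs;
};

// Build a left-associated chain with n leaf terms; nodes live in pool
// (std::deque keeps references stable across push_back).
inline const Node *build_left_chain(std::deque<Node> &pool, std::size_t n) {
  pool.push_back(Node{nullptr, nullptr});
  const Node *root = &pool.back();
  for (std::size_t i = 1; i < n; ++i) {
    pool.push_back(Node{nullptr, nullptr});
    const Node *leaf = &pool.back();
    pool.push_back(Node{root, leaf});
    root = &pool.back();
  }
  return root;
}

// Models trial evaluation: visits every node of the subtree.
inline void trial_eval(const Node *n, std::size_t &work) {
  ++work;
  if (n->lhs) {
    trial_eval(n->lhs, work);
    trial_eval(n->rhs, work);
  }
}

// Old order: the first operand is always trial-evaluated.
inline bool check_old(const Node *n, std::size_t &work) {
  ++work;
  if (!n->lhs) return true;
  if (!check_old(n->lhs, work)) return false;
  trial_eval(n->lhs, work);
  return check_old(n->rhs, work);
}

// New quiet order: skip trial evaluation when the second operand is fine.
inline bool check_new(const Node *n, std::size_t &work) {
  ++work;
  if (!n->lhs) return true;
  if (!check_new(n->lhs, work)) return false;
  if (check_new(n->rhs, work)) return true;
  trial_eval(n->lhs, work);  // never reached in this model
  return false;
}

inline void check_growth() {
  std::deque<Node> pool;
  const Node *root = build_left_chain(pool, 1024);
  std::size_t old_work = 0, new_work = 0;
  assert(check_old(root, old_work) && check_new(root, new_work));
  assert(old_work > 100 * new_work);
}
```

In this model a chain of N terms costs N*N visits with the old order and 2*N - 1 with the new one, so 1024 terms give 1048576 versus 2047 visits.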

Apart from fixing the quadraticness for left-associated logical exprs,
this change also reduces compile time for the libstdc++ testcase
20_util/variant/87619.cc by about 15% even though our <variant> uses
right folds instead of left folds.  Likewise for the testcase in the PR,
for which compile time is reduced by 30%.  The reason for these speedups
is that p_c_e_1 no longer performs expensive trial evaluation of each term
of large constant logical expressions when determining their potentiality.

PR c++/102780

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1) <case TRUTH_*_EXPR>:
When tf_error isn't set, preemptively check potentiality of the
second operand before performing trial evaluation of the first
operand.
(potential_constant_expression_1): When tf_error is set, first check
potentiality quietly and return true if successful, otherwise
proceed noisily to give errors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/fold13.C: New test.
---
 gcc/cp/constexpr.c  | 26 +-
 gcc/testsuite/g++.dg/cpp1z/fold13.C | 29 +
 2 files changed, 50 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold

Re: [PATCH 3/5] gcc: Add --nostdlib++ option

2021-10-28 Thread Richard Purdie via Gcc-patches

On Wed, 2021-10-27 at 22:56 +0200, Bernhard Reutner-Fischer wrote:
> On Wed, 27 Oct 2021 21:05:03 +0100
> Richard Purdie via Gcc-patches  wrote:
> 
> > OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
> > separately from the compiler and slightly differently to the standard gcc build.
> > 
> > In general this works well but in trying to build them separately we run into
> > an issue since we're using our gcc, not xgcc and there is no way to tell
> > configure to use libgcc but not look for libstdc++.
> > 
> > This adds such an option allowing such configurations to work.
> 
> But shouldn't it be called --nostdlibc++ then?

Maybe :). There are already --nostdinc++ and nostdlib options so --nostdlib++
matches those but I'm happy to use --nostdlibc++ if that is preferred.

Cheers,

Richard

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

On Thu, Oct 28, 2021 at 12:40:02PM -0400, Patrick Palka wrote:
>   PR c++/102780
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.c (potential_constant_expression_1) <case TRUTH_*_EXPR>:
>   When tf_error isn't set, preemptively check potentiality of the
>   second operand before performing trial evaluation of the first
>   operand.
>   (potential_constant_expression_1): When tf_error is set, first check
>   potentiality quietly and return true if successful, otherwise
>   proceed noisily to give errors.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1z/fold13.C: New test.
> ---
>  gcc/cp/constexpr.c  | 26 +-
>  gcc/testsuite/g++.dg/cpp1z/fold13.C | 29 +
>  2 files changed, 50 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold13.C

Is there a reason to always turn on this mode of evaluating everything twice
when an error should be diagnosed, rather than only if we actually see
a TRUTH_*_EXPR we want to handle this way?
If we don't see any TRUTH_*_EXPR, or if processing_template_decl, or if
the first operand is already a constant, that seems like a waste of time.

So, can't we instead drop the second hunk, if processing_template_decl or
TREE_CODE (op0) == INTEGER_CST do what the code did before, and just
otherwise if flag & tf_error temporarily clear it, RECUR on op1 first, then
cxx_eval_outermost_constant_expr and if tree_int_cst_equal RECUR on op1
the second time with tf_error and some extra flag (1 << 30?) that will mean not
to RECUR on op1 twice in recursive invocations (to avoid bad compile time
complexity on
(x && (x && (x && (x && (x && (x && (x && (x && (x && (x && (x && (x && y))))))))))))
etc. if x constant evaluates to true and y is not potential constant
expression).

Though, I'm not sure that doing the RECUR (op1, rval) instead of
cxx_eval_outermost_constant_expr must be generally a win for compile time,
it can be if op0 is very large expression, but it can be a pessimization if
op1 is huge and op0 is simple and doesn't constexpr evaluate to tmp.

As I said, another possibility is something like:
/* Try to quietly evaluate T to constant, but don't try too hard.  */

static tree
potential_constant_expression_eval (tree t)
{
  auto o = make_temp_override (constexpr_ops_limit,
   MIN (constexpr_ops_limit, 100));
  return cxx_eval_outermost_constant_expr (t, true);
}
and using this new function instead of
cxx_eval_outermost_constant_expr (op, true);
everywhere in potential_constant_expression_1 should fix the quadraticness
too.
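
For illustration only, the "temporarily cap the limit" idea can be sketched
outside of GCC as a small RAII guard.  The names TempOverride, ops_limit and
limit_during_trial_eval below are hypothetical stand-ins, not the helpers in
the GCC tree; the sketch only shows that the cap applies inside the scope and
that the old value comes back afterwards.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the constexpr_ops_limit parameter.
long ops_limit = 33554432;

// RAII guard: install a temporary value, restore the old one on scope exit.
template <typename T>
class TempOverride {
public:
  TempOverride(T &ref, T tmp) : ref_(ref), saved_(ref) { ref_ = tmp; }
  ~TempOverride() { ref_ = saved_; }
  TempOverride(const TempOverride &) = delete;
  TempOverride &operator=(const TempOverride &) = delete;

private:
  T &ref_;
  T saved_;
};

// Returns the limit that a trial evaluation would observe inside the guard.
long limit_during_trial_eval() {
  TempOverride<long> guard(ops_limit, std::min(ops_limit, 100L));
  return ops_limit;  // capped while the guard is alive
}
```

Calling limit_during_trial_eval () yields the capped value, and ops_limit is
back to its original value once the call returns.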

> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -8892,13 +8892,18 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
>tmp = boolean_false_node;
>  truth:
>{
> - tree op = TREE_OPERAND (t, 0);
> - if (!RECUR (op, rval))
> + tree op0 = TREE_OPERAND (t, 0);
> + tree op1 = TREE_OPERAND (t, 1);
> + if (!RECUR (op0, rval))
> return false;
> + if (!(flags & tf_error) && RECUR (op1, rval))
> +   /* When quiet, try to avoid expensive trial evaluation by first
> +  checking potentiality of the second operand.  */
> +   return true;
>   if (!processing_template_decl)
> -   op = cxx_eval_outermost_constant_expr (op, true);
> - if (tree_int_cst_equal (op, tmp))
> -   return RECUR (TREE_OPERAND (t, 1), rval);
> +   op0 = cxx_eval_outermost_constant_expr (op0, true);
> + if (tree_int_cst_equal (op0, tmp))
> +   return (flags & tf_error) ? RECUR (op1, rval) : false;
>   else
> return true;
>}
> @@ -9107,6 +9112,17 @@ bool
>  potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
>tsubst_flags_t flags)
>  {
> +  if (flags & tf_error)
> +{
> +  /* Check potentiality quietly first, as that could be performed more
> +  efficiently in some cases (currently only for TRUTH_*_EXPR).  If
> +  that fails, replay the check noisily to give errors.  */
> +  flags &= ~tf_error;
> +  if (potential_constant_expression_1 (t, want_rval, strict, now, flags))
> + return true;
> +  flags |= tf_error;
> +}
> +
>tree target = NULL_TREE;
>return potential_constant_expression_1 (t, want_rval, strict, now,
> flags, &target);

Jakub

Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.





On 10/28/2021 9:24 AM, Aldy Hernandez wrote:

This patch upgrades the pre-VRP threading passes to fully resolving
backward threaders, and removes the post-VRP threading passes altogether.
With it, we reduce the number of threaders in our pipeline from 9 to 7.

This will leave DOM as the only forward threader client.  When the ranger
can handle floats, we should be able to upgrade the pre-DOM threaders to
fully resolving threaders and kill the embedded DOM threader.

The final numbers are:

prev: # threads in backward + vrp-threaders = 92624
now:  # threads in backward threaders = 94275
Gain: +1.78%

prev: # total threads: 189495
now:  # total threads: 193714
Gain: +2.22%

The numbers are not as great as my initial proposal, but I've
recently pushed all the work that got us to this point ;-).

And... the total compilation improves by 1.32%!

There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
want to make sure it's not a missing thread.  If it is, I'll create a PR
and own it.

Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
seems to be some special case the forward threader handles that the
backward threader does not (edge_forwards_cmp_to_conditional_jump*).
I haven't dug deep to see if this is solveable within our
infrastructure, but a cursory look shows that even though the VRP
threader threads this, the *.optimized dump ends with more conditional
jumps than without the optimization.  I'd like to punt on this for
now, because DOM actually catches this through its lone use of the
forward threader (I've adjusted the tests).  However, we will need to
address this sooner or later, if indeed it's still improving the final
assembly.

Even though we have been incrementally stressing all the pieces of this
intricate puzzle, I do expect fall out.  My plan from here until stage1
ends is to stop new development in the threader(s), and focus on bug
fixing and improving the developer's debugging experience.

OK pending another round of tests on x86-64 and ppc64le Linux?

gcc/ChangeLog:

* passes.def: Replace the pass_thread_jumps before VRP* with
pass_thread_jumps_full.  Remove all pass_vrp_threader instances.

libgomp/ChangeLog:

* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
* testsuite/libgomp.graphite/force-parallel-8.c: Same.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
* gcc.dg/old-style-asm-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
* gcc.dg/tree-ssa/pr20701.c: Same.
* gcc.dg/tree-ssa/pr21001.c: Same.
* gcc.dg/tree-ssa/pr21294.c: Same.
* gcc.dg/tree-ssa/pr21417.c: Same.
* gcc.dg/tree-ssa/pr21559.c: Same.
* gcc.dg/tree-ssa/pr21563.c: Same.
* gcc.dg/tree-ssa/pr49039.c: Same.
* gcc.dg/tree-ssa/pr59597.c: Same.
* gcc.dg/tree-ssa/pr61839_1.c: Same.
* gcc.dg/tree-ssa/pr61839_3.c: Same.
* gcc.dg/tree-ssa/pr66752-3.c: Same.
* gcc.dg/tree-ssa/pr68198.c: Same.
* gcc.dg/tree-ssa/pr77445-2.c: Same.
* gcc.dg/tree-ssa/pr77445.c: Same.
* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
* gcc.dg/tree-ssa/vrp02.c: Same.
* gcc.dg/tree-ssa/vrp03.c: Same.
* gcc.dg/tree-ssa/vrp05.c: Same.
* gcc.dg/tree-ssa/vrp06.c: Same.
* gcc.dg/tree-ssa/vrp07.c: Same.
* gcc.dg/tree-ssa/vrp08.c: Same.
* gcc.dg/tree-ssa/vrp09.c: Same.
* gcc.dg/tree-ssa/vrp106.c: Same.
* gcc.dg/tree-ssa/vrp33.c: Same.

OK.  And yes, there will probably be fallout.  Fully expected and we'll
deal with it.


jeff

Re: [PATCH] configure, d: Add support for bootstrapping the D front-end





On 10/9/2021 7:32 AM, Iain Buclaw via Gcc-patches wrote:

Hi,

The implementation of the D front-end in GCC is based on the original
C++ version of the D programming language compiler, which was ported to
D itself in version 2.069.0 (released in 2015).  To keep it somewhat
up-to-date, I have been backporting fixes from upstream back into C++,
but this stopped at version 2.076.1 (released in 2017), and since then
I've been keeping the front-end updated only enough to still be
able to build the latest version of the D language (now 2.098.0).

Reasons for putting off switching to the D implementation immediately
after GCC 9 have been a mixture of the front-end not being ready to use,
and the current portability status of the D core runtime library.

It has come to the point now that I'm happy enough with the process to
switch out the C++ sources in gcc/d/dmd with D sources.

Before that, there's only this patch that makes the required changes to
GCC itself in order to have a D front-end written in D itself.

The rest of the series only changes code in the D language front-end or
libphobos standard library, so I've left that out for the time being
until I'm ready to commit it.

The complete set of changes is in the ibuclaw/gdc branch under
users/ibuclaw.  It has been well-tested on x86_64-linux-gnu for about 3
years now, and I've also been testing the self-hosted compiler on
powerpc64le-linux-gnu as well with no regressions from the D language
testsuite run.

Does anything stand out as being problematic in this patch, or may need
splitting out first?  Or would it be OK for trunk?

Thanks,
Iain.

---
ChangeLog:

* Makefile.def: Add bootstrap to libbacktrace, libphobos, zlib, and
libatomic.
* Makefile.in: Regenerate.
* Makefile.tpl (POSTSTAGE1_HOST_EXPORTS): Fix command for GDC.
(STAGE1_CONFIGURE_FLAGS): Add --with-libphobos-druntime-only if
target-libphobos-bootstrap.
(STAGE2_CONFIGURE_FLAGS): Likewise.
* configure: Regenerate.
* configure.ac: Add support for bootstrapping D front-end.

config/ChangeLog:

* acx.m4 (ACX_PROG_GDC): New m4 function.

gcc/ChangeLog:

* Makefile.in (GDC): New variable.
(GDCFLAGS): New variable.
* configure: Regenerate.
* configure.ac: Add call to ACX_PROG_GDC.  Substitute GDCFLAGS.

gcc/po/ChangeLog:

* EXCLUDES: Remove d/dmd sources from list.

Presumably this means that the only way to build D for the first time on
a new target is to cross from an existing target that supports D, right?


I think that's not unreasonable and I don't think we want to increase 
the burden of maintaining an old codebase just for the sake of a 
marginally easier bootstrap process for a new target.


So I think you should go with this whenever you're ready.

jeff

Re: dejagnu version update?

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

On 10/27/2021 5:00 PM, Bernhard Reutner-Fischer wrote:

On Sat, 4 Aug 2018 18:32:24 +0200
Bernhard Reutner-Fischer wrote:

On Tue, 16 May 2017 at 21:08, Mike Stump wrote:

On May 16, 2017, at 5:16 AM, Jonathan Wakely wrote:

The change I care about in 1.5.3

So, we haven't talked much about the version people want most. If we update,
might as well get something that more people care about. 1.5.3 is in ubuntu
LTS 16.04 and Fedora 24, so it's been around awhile. SUSE is said to be using
1.6, in the post 1.4.4 systems. People stated they want 1.5.2 and 1.5.3, so,
I'm inclined to say, let's shoot for 1.5.3 when we do update.

As for the machines in the FSF compile farm, nah, tail wagging the dog. I'd
rather just update the requirement, and the owners or users of those machines
can install a new dejagnu, if they are using one that is too old and they want
to support testing gcc.

So.. let me ping that, again, now that another year has passed :)

or another 3 or 4 :)

PS: Recap: https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html was
later applied as
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
and was part of the dejagnu-1.5.2 release from 2015. Jonathan requires
1.5.3 for libstdc++ testing.

(i.e.
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347
)

The libdirs fix would allow us to remove the 150 occurrences of the
load_gcc_lib hack, refer to the patch to the fortran list back then.
AFAIR this is still not fixed: +# BUG: gcc-dg calls
gcc-set-multilib-library-path but does not load gcc-defs!

debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
to contain both fixes. Commercial distros seem to ship fixed versions,
too.

It seems in May 2020 there was a thread on gcc with about the same
subject: https://gcc.gnu.org/pipermail/gcc/2020-May/232427.html
where Mike indicates that he approved bumping the required minimum
version to 1.5.3.
So who's in the position to update the
https://gcc.gnu.org/install/prerequisites.html
to s/1.4.4/1.5.3/g && git commit -m 'bump dejagnu required version' ?

All kinds of people. Submit a patch and I bet it'll get approved. More
than anything I suspect it's out-of-sight-out-of-mind at this point
holding us back.

jeff

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-28 Thread Patrick Palka via Gcc-patches

On Thu, 28 Oct 2021, Jakub Jelinek wrote:
> On Thu, Oct 28, 2021 at 12:40:02PM -0400, Patrick Palka wrote:
> > PR c++/102780
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.c (potential_constant_expression_1) :
> > When tf_error isn't set, preemptively check potentiality of the
> > second operand before performing trial evaluation of the first
> > operand.
> > (potential_constant_expression_1): When tf_error is set, first check
> > potentiality quietly and return true if successful, otherwise
> > proceed noisily to give errors.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1z/fold13.C: New test.
> > ---
> >  gcc/cp/constexpr.c  | 26 +-
> >  gcc/testsuite/g++.dg/cpp1z/fold13.C | 29 +
> >  2 files changed, 50 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold13.C
> 
> Is there a reason to turn this mode of evaluating everything twice if an
> error should be diagnosed all the time, rather than only if we actually see
> a TRUTH_*_EXPR we want to handle this way?
> If we don't see any TRUTH_*_EXPR, or if processing_template_decl, or if
> the first operand is already a constant, that seems like a waste of time.

Hmm yeah, at the very least it wouldn't hurt to check
processing_template_decl before doing the tf_error shenanigans.  I'm not
sure if we would gain anything by first looking for TRUTH_*_EXPR since
that'd involve walking the entire expression anyway IIUC.

> 
> So, can't we instead drop the second hunk, if processing_template_decl or
> TREE_CODE (op0) == INTEGER_CST do what the code did before, and just
> otherwise if flag & tf_error temporarily clear it, RECUR on op1 first, then
> cxx_eval_outermost_constant_expr and if tree_int_cst_equal RECUR on op1
> the second time with tf_error and some extra flag (1 << 30?) that will mean
> not to RECUR on op1 twice in recursive invocations (to avoid bad compile time
> complexity on
> (x && (x && (x && (x && (x && (x && (x && (x && (x && (x && (x && (x && y
> etc. if x constant evaluates to true and y is not potential constant
> expression.

I considered this approach but I wasn't sure it was worth the added
complexity just to speed up the error case.  And the current approach is
already how constraint satisfaction works (it always proceeds quietly
first, and replays the whole thing noisily upon error if tf_error was
set) so there's also a precedent I suppose..
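
To make the "quiet first, replay noisily on failure" pattern concrete, here
is a toy sketch outside of GCC.  The functions check_impl and check and the
"negative value" rule are invented for the example; only the control flow is
meant to mirror the approach described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical checker: all values must be non-negative.  In quiet mode it
// stops at the first failure; in noisy mode it records one diagnostic per
// failure and keeps going.
bool check_impl(const std::vector<int> &xs, bool complain,
                std::vector<std::string> &diags) {
  bool ok = true;
  for (int x : xs) {
    if (x >= 0)
      continue;
    if (!complain)
      return false;
    diags.push_back("negative value: " + std::to_string(x));
    ok = false;
  }
  return ok;
}

// Wrapper: always try the cheap quiet pass first, and only replay the check
// with diagnostics if it failed and the caller asked for errors.
bool check(const std::vector<int> &xs, bool complain,
           std::vector<std::string> &diags) {
  if (check_impl(xs, /*complain=*/false, diags))
    return true;
  if (complain)
    check_impl(xs, /*complain=*/true, diags);
  return false;
}
```

The successful case pays for one quiet pass only; the second pass is taken
only on the error path.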

> 
> Though, I'm not sure that doing the RECUR (op1, rval) instead of
> cxx_eval_outermost_constant_expr must be generally a win for compile time,
> it can be if op0 is very large expression, but it can be a pessimization if
> op1 is huge and op0 is simple and doesn't constexpr evaluate to tmp.

True, it's not always a win but it seems to me that checking potentiality
of both operands first before doing the trial evaluation gives the most
consistent performance, at least for constant logical expressions.
Recursing on both operands takes time proportional to the size of the
operands, whereas trial evaluation can take arbitrarily long.
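
A rough cost model of the difference, as a sketch under simplifying
assumptions: every leaf is a constant that evaluates to true, a trial
evaluation costs one step per node of the operand, and a potentiality check
costs one step per node visited.  Node, left_chain and the *_cost functions
are made up for the illustration and do not correspond to GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct Node {
  std::unique_ptr<Node> lhs, rhs;  // both null for a leaf
};

// Builds ((x && x) && x) && ... with n leaves.
std::unique_ptr<Node> left_chain(std::size_t n) {
  auto t = std::make_unique<Node>();
  for (std::size_t i = 1; i < n; ++i) {
    auto parent = std::make_unique<Node>();
    parent->lhs = std::move(t);
    parent->rhs = std::make_unique<Node>();
    t = std::move(parent);
  }
  return t;
}

// Toy cost of a trial evaluation: one step per node in the subtree.
std::size_t eval_cost(const Node &t) {
  if (!t.lhs)
    return 1;
  return 1 + eval_cost(*t.lhs) + eval_cost(*t.rhs);
}

// Old scheme: check op0, trial-evaluate op0, then check op1.
std::size_t old_check_cost(const Node &t) {
  if (!t.lhs)
    return 1;
  return 1 + old_check_cost(*t.lhs) + eval_cost(*t.lhs)
         + old_check_cost(*t.rhs);
}

// Quiet scheme: check op0 and op1, no trial evaluation when both pass.
std::size_t new_check_cost(const Node &t) {
  if (!t.lhs)
    return 1;
  return 1 + new_check_cost(*t.lhs) + new_check_cost(*t.rhs);
}
```

In this model a left-associated chain with n leaves costs n*n steps under the
old scheme and 2n-1 steps under the quiet one.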

> 
> As I said, another possibility is something like:
> /* Try to quietly evaluate T to constant, but don't try too hard.  */
> 
> static tree
> potential_constant_expression_eval (tree t)
> {
>   auto o = make_temp_override (constexpr_ops_limit,
>  MIN (constexpr_ops_limit, 100));
>   return cxx_eval_outermost_constant_expr (t, true);
> }
> and using this new function instead of
> cxx_eval_outermost_constant_expr (op, true);
> everywhere in potential_constant_expression_1 should fix the quadraticness
> too.

This would technically fix the quadraticness but wouldn't it still mean
that a huge left-associated constant logical expression is quite a bit
slower to check than an equivalent right-associated one (depending on
what we set constexpr_ops_limit to)?  We should probably do this anyway
but it doesn't seem sufficient on its own to make equivalent
left/right-associated logical expressions have the same performance
behavior IMHO.

> 
> > --- a/gcc/cp/constexpr.c
> > +++ b/gcc/cp/constexpr.c
> > @@ -8892,13 +8892,18 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> >tmp = boolean_false_node;
> >  truth:
> >{
> > -   tree op = TREE_OPERAND (t, 0);
> > -   if (!RECUR (op, rval))
> > +   tree op0 = TREE_OPERAND (t, 0);
> > +   tree op1 = TREE_OPERAND (t, 1);
> > +   if (!RECUR (op0, rval))
> >   return false;
> > +   if (!(flags & tf_error) && RECUR (op1, rval))
> > + /* When quiet, try to avoid expensive trial evaluation by first
> > +checking potentiality of the second operand.  */
> > + return true;
> > if (!processing_template_decl)
> > - op = cxx_eval_outermost_constant_expr (op, true);
> > -   if (tree_int_cst_equal (op, tmp))
> > - return RECUR (TREE_OPERAND (t, 1), rval);
> > +

[PATCH] PR fortran/99853 - ICE: Cannot convert 'LOGICAL(4)' to 'INTEGER(8)' (etc.)

2021-10-28 Thread Harald Anlauf via Gcc-patches

Dear Fortranners,

the original fix by Steve was lingering in the PR.

We did ICE in situations where in a SELECT CASE a kind conversion
was deemed necessary, but it did involve different types.
The check gfc_convert_type_warn () was invoked with arguments
requesting to generate an internal error.  A regular gfc_error
is good enough here.

Regtested on x86_64-pc-linux-gnu.  OK?

Thanks, also to Steve,

Harald


Fortran: generate regular error on invalid conversions of CASE expressions

gcc/fortran/ChangeLog:

PR fortran/99853
* resolve.c (resolve_select): Generate regular gfc_error on
invalid conversions instead of a gfc_internal_error.

gcc/testsuite/ChangeLog:

PR fortran/99853
* gfortran.dg/pr99853.f90: New test.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index af71b132dec..8da396b32ec 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -8770,11 +8770,11 @@ resolve_select (gfc_code *code, bool select_type)

 	  if (cp->low != NULL
 		  && case_expr->ts.kind != gfc_kind_max(case_expr, cp->low))
-		gfc_convert_type_warn (case_expr, &cp->low->ts, 2, 0);
+		gfc_convert_type_warn (case_expr, &cp->low->ts, 1, 0);

 	  if (cp->high != NULL
 		  && case_expr->ts.kind != gfc_kind_max(case_expr, cp->high))
-		gfc_convert_type_warn (case_expr, &cp->high->ts, 2, 0);
+		gfc_convert_type_warn (case_expr, &cp->high->ts, 1, 0);
 	}
 	 }
 }
diff --git a/gcc/testsuite/gfortran.dg/pr99853.f90 b/gcc/testsuite/gfortran.dg/pr99853.f90
new file mode 100644
index 000..421a656bec2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr99853.f90
@@ -0,0 +1,29 @@
+! { dg-do compile }
+! { dg-options "-std=f2018" }
+! PR fortran/99853
+
+subroutine s1 ()
+  select case (.true.) ! { dg-error "Cannot convert" }
+  case (1_8)   ! { dg-error "must be of type LOGICAL" }
+  end select
+end
+
+subroutine s2 ()
+  select case (.false._1) ! { dg-error "Cannot convert" }
+  case (2:3)  ! { dg-error "must be of type LOGICAL" }
+  end select
+end
+
+subroutine s3 ()
+  select case (3_2) ! { dg-error "Cannot convert" }
+  case (.false.)! { dg-error "must be of type INTEGER" }
+  end select
+end
+
+subroutine s4 (i)
+  select case (i) ! { dg-error "Cannot convert" }
+  case (.true._8) ! { dg-error "must be of type INTEGER" }
+  end select
+end
+
+! { dg-prune-output "Cannot convert" }

Re: [PATCH,FORTRAN] Fix memory leak of gsymbol

2021-10-28 Thread Harald Anlauf via Gcc-patches


Hi Bernhard,

Am 27.10.21 um 23:43 schrieb Bernhard Reutner-Fischer via Gcc-patches:

ping
[I'll rebase and retest this too since it's been a while.
Ok if it passes?]

On Sun, 21 Oct 2018 16:04:34 +0200
Bernhard Reutner-Fischer  wrote:


Hi!

Regtested on x86_64-unknown-linux, installing on
aldot/fortran-fe-stringpool.

We did not free global symbols. For a simplified abstract_type_3.f03
valgrind reports:

96 bytes in 1 blocks are still reachable in loss record 461 of 602
at 0x48377D5: calloc (vg_replace_malloc.c:711)
by 0x21257C3: xcalloc (xmalloc.c:162)
by 0x98611B: gfc_get_gsymbol(char const*) (symbol.c:4341)
by 0x932C58: parse_module() (parse.c:5912)
by 0x9336F8: gfc_parse_file() (parse.c:6236)
by 0x991449: gfc_be_parse_file() (f95-lang.c:204)
by 0x11D8EDE: compile_file() (toplev.c:455)
by 0x11DB9C3: do_compile() (toplev.c:2170)
by 0x11DBCAF: toplev::main(int, char**) (toplev.c:2305)
by 0x2045D37: main (main.c:39)

This patch reduces leaks to

  LEAK SUMMARY:
 definitely lost: 344 bytes in 1 blocks
 indirectly lost: 3,024 bytes in 4 blocks
   possibly lost: 0 bytes in 0 blocks
-   still reachable: 1,576,174 bytes in 2,277 blocks
+   still reachable: 1,576,078 bytes in 2,276 blocks
  suppressed: 0 bytes in 0 blocks

gcc/fortran/ChangeLog:

2018-10-21  Bernhard Reutner-Fischer  

* parse.c (clean_up_modules): Free gsym.
---
  gcc/fortran/parse.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index b7265c42f58..f7c369a17ac 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -6066,7 +6066,7 @@ resolve_all_program_units (gfc_namespace *gfc_global_ns_list)
  
  
  static void

-clean_up_modules (gfc_gsymbol *gsym)
+clean_up_modules (gfc_gsymbol *&gsym)
  {
if (gsym == NULL)
  return;
@@ -6074,14 +6074,18 @@ clean_up_modules (gfc_gsymbol *gsym)
clean_up_modules (gsym->left);
clean_up_modules (gsym->right);
  
-  if (gsym->type != GSYM_MODULE || !gsym->ns)

+  if (gsym->type != GSYM_MODULE)
  return;
  
-  gfc_current_ns = gsym->ns;

-  gfc_derived_types = gfc_current_ns->derived_types;
-  gfc_done_2 ();
-  gsym->ns = NULL;
-  return;
+  if (gsym->ns)
+{
+  gfc_current_ns = gsym->ns;
+  gfc_derived_types = gfc_current_ns->derived_types;
+  gfc_done_2 ();
+  gsym->ns = NULL;
+}
+  free (gsym);
+  gsym = NULL;


this essentially looks fine, but did you inspect the callers?

With the change to the interface (*gsym -> *&gsym), it could have
effects not visible here due to the explicit gsym = NULL.

Assuming you checked that, and if it regtests fine, then it is
OK for mainline.

Thanks for the patch!

Harald


  }

Re: [PATCH,FORTRAN] Fix memory leak of gsymbol

On Thu, 28 Oct 2021 23:37:59 +0200
Harald Anlauf  wrote:

> Hi Bernhard,
> 
> Am 27.10.21 um 23:43 schrieb Bernhard Reutner-Fischer via Gcc-patches:
> > ping
> > [I'll rebase and retest this too since it's been a while.
> > Ok if it passes?]
> >
> > On Sun, 21 Oct 2018 16:04:34 +0200
> > Bernhard Reutner-Fischer  wrote:

> >> gcc/fortran/ChangeLog:
> >>
> >> 2018-10-21  Bernhard Reutner-Fischer  
> >>
> >>* parse.c (clean_up_modules): Free gsym.

> this essentially looks fine, but did you inspect the callers?
> 
> With the change to the interface (*gsym -> *&gsym), it could have
> effects not visible here due to the explicit gsym = NULL.
> 
> Assuming you checked that, and if it regtests fine, then it is
> OK for mainline.

The only caller is translate_all_program_units.
Since we free only module gsyms, even -fdump-fortran-global is
unaffected by this, fwiw.

It regtests cleanly and i will push it when the rest is approved.
Thanks!
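
As a side note on the *gsym -> *&gsym interface change discussed above, here
is a simplified sketch of why the reference matters.  GSym and free_tree are
hypothetical stand-ins; unlike the patch, this version frees every node
rather than only module gsyms.

```cpp
#include <cassert>

struct GSym {  // hypothetical stand-in for gfc_gsymbol
  GSym *left = nullptr;
  GSym *right = nullptr;
};

// Frees the whole tree.  Because the pointer is taken by reference, the
// assignment at the end clears the caller's variable (or the parent's
// left/right member), so no dangling pointer is left behind.
void free_tree(GSym *&g) {
  if (g == nullptr)
    return;
  free_tree(g->left);
  free_tree(g->right);
  delete g;
  g = nullptr;
}
```

With a by-value parameter the final assignment would only change the local
copy, which is why callers need to be inspected when the signature changes.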

Re: [PATCH][RFC] Map -ftrapv to -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error

2021-10-28 Thread Hans-Peter Nilsson

On Wed, 20 Oct 2021, Richard Biener via Gcc-patches wrote:

> This maps -ftrapv to -fsanitize=signed-integer-overflow
> -fsanitize-undefined-trap-on-error,

Isn't that UBSAN target-dependent, i.e. not supported on all
targets, whereas -ftrapv is just about universally supported?

I.e. isn't this patch breaking -ftrapv for some targets?

brgds, H-P

[PATCH,Fortran 2/2] Fix write_omp_udr for user-operator REDUCTIONs

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

From: Bernhard Reutner-Fischer 

Due to a typo a user operator used in a reduction was not found in the
symtree so would have been written multiple times (in theory).

E.g. user operator ".add." was looked up as ".ad" instead of "add".

For gcc-11 branch and earlier one would
- memcpy (name, udr->name, len - 1);
+ memcpy (name, udr->name + 1, len - 1);

but for gcc-12 we have an appropriate helper already.
Jakub, please take care of non-trunk branches if you want it fixed
there.
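
The off-by-one can be reproduced in isolation.  The sketch below copies the
arithmetic of the old code and of the one-line fix into two free-standing
functions; strip_dots_buggy and strip_dots_fixed are names made up for the
example, and both assume input of the form ".name.".

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Mirrors the pre-patch copy: it starts at the leading '.', so the result
// keeps the dot and loses the last letter.
std::string strip_dots_buggy(const char *uop) {
  std::size_t len = std::strlen(uop + 1);  // ".add." -> 4
  std::vector<char> buf(len);
  std::memcpy(buf.data(), uop, len - 1);   // copies ".ad"
  buf[len - 1] = '\0';
  return std::string(buf.data());
}

// Mirrors the one-line fix: skip the leading '.' before copying.
std::string strip_dots_fixed(const char *uop) {
  std::size_t len = std::strlen(uop + 1);
  std::vector<char> buf(len);
  std::memcpy(buf.data(), uop + 1, len - 1);  // copies "add"
  buf[len - 1] = '\0';
  return std::string(buf.data());
}
```

For ".add." the first function returns ".ad" and the second returns "add",
which matches the lookup failure described above.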

Cc: Jakub Jelinek 

gcc/fortran/ChangeLog:

2017-11-16  Bernhard Reutner-Fischer  

* module.c (write_omp_udr): Use gfc_get_name_from_uop.
---
 gcc/fortran/module.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 1328414e4f7..90ab9e275f3 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -6021,12 +6021,8 @@ write_omp_udr (gfc_omp_udr *udr)
return;
   else
{
- gfc_symtree *st;
- size_t len = strlen (udr->name + 1);
- char *name = XALLOCAVEC (char, len);
- memcpy (name, udr->name, len - 1);
- name[len - 1] = '\0';
- st = gfc_find_symtree (gfc_current_ns->uop_root, name);
+ const char *name = gfc_get_name_from_uop (udr->name);
+ gfc_symtree *st = gfc_find_symtree (gfc_current_ns->uop_root, name);
  /* If corresponding user operator is private, don't write
 the UDR.  */
  if (st != NULL)
-- 
2.33.0

[PATCH,Fortran 1/2] Add uop/name helpers

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

From: Bernhard Reutner-Fischer 

Introduce a helper to construct a user operator from a name and the
reverse operation, i.e. a helper to construct a name from a user
operator.
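
A minimal sketch of the two conversions, assuming plain std::string instead
of the interned strings the real helpers return.  The names uop_from_name and
name_from_uop are illustrative only and are not the functions added by this
patch.

```cpp
#include <cassert>
#include <string>

// "add" -> ".add."
std::string uop_from_name(const std::string &name) {
  assert(!name.empty() && name.front() != '.');
  return "." + name + ".";
}

// ".add." -> "add"
std::string name_from_uop(const std::string &uop) {
  assert(uop.size() > 2 && uop.front() == '.' && uop.back() == '.');
  return uop.substr(1, uop.size() - 2);
}
```

The two functions are inverses of each other for any non-empty name that does
not start with a dot.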

Cc: Jakub Jelinek 

gcc/fortran/ChangeLog:

2017-10-29  Bernhard Reutner-Fischer  

* gfortran.h (gfc_get_uop_from_name, gfc_get_name_from_uop): Declare.
* symbol.c (gfc_get_uop_from_name, gfc_get_name_from_uop): Define.
* module.c (load_omp_udrs): Use them.
---
 gcc/fortran/gfortran.h |  2 ++
 gcc/fortran/module.c   | 21 +++--
 gcc/fortran/symbol.c   | 21 +
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 9378b4b8a24..afe9f2354ee 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3399,6 +3399,8 @@ void gfc_delete_symtree (gfc_symtree **, const char *);
 gfc_symtree *gfc_get_unique_symtree (gfc_namespace *);
 gfc_user_op *gfc_get_uop (const char *);
 gfc_user_op *gfc_find_uop (const char *, gfc_namespace *);
+const char *gfc_get_uop_from_name (const char*);
+const char *gfc_get_name_from_uop (const char*);
 void gfc_free_symbol (gfc_symbol *&);
 void gfc_release_symbol (gfc_symbol *&);
 gfc_symbol *gfc_new_symbol (const char *, gfc_namespace *);
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 7b98ba539d6..1328414e4f7 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -5027,7 +5027,7 @@ load_omp_udrs (void)
   while (peek_atom () != ATOM_RPAREN)
 {
   const char *name = NULL, *newname;
-  char *altname;
+  const char *altname = NULL;
   gfc_typespec ts;
   gfc_symtree *st;
   gfc_omp_reduction_op rop = OMP_REDUCTION_USER;
@@ -5054,15 +5054,8 @@ load_omp_udrs (void)
  else if (strcmp (p, ".neqv.") == 0)
rop = OMP_REDUCTION_NEQV;
}
-  altname = NULL;
   if (rop == OMP_REDUCTION_USER && name[0] == '.')
-   {
- size_t len = strlen (name + 1);
- altname = XALLOCAVEC (char, len);
- gcc_assert (name[len] == '.');
- memcpy (altname, name + 1, len - 1);
- altname[len - 1] = '\0';
-   }
+   altname = gfc_get_name_from_uop (name);
   newname = name;
   if (rop == OMP_REDUCTION_USER)
newname = find_use_name (altname ? altname : name, !!altname);
@@ -5074,15 +5067,7 @@ load_omp_udrs (void)
  continue;
}
   if (altname && newname != altname)
-   {
- size_t len = strlen (newname);
- altname = XALLOCAVEC (char, len + 3);
- altname[0] = '.';
- memcpy (altname + 1, newname, len);
- altname[len + 1] = '.';
- altname[len + 2] = '\0';
- name = gfc_get_string ("%s", altname);
-   }
+   name = altname = gfc_get_uop_from_name (newname);
   st = gfc_find_symtree (gfc_current_ns->omp_udr_root, name);
   gfc_omp_udr *udr = gfc_omp_udr_find (st, &ts);
   if (udr)
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 289d85734bd..900ab49c478 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -3044,6 +3044,27 @@ gfc_find_uop (const char *name, gfc_namespace *ns)
   return (st == NULL) ? NULL : st->n.uop;
 }
 
+/* Given a name return a string usable as user operator name.  */
+const char *
+gfc_get_uop_from_name (const char* name) {
+  gcc_assert (name[0] != '.');
+  return gfc_get_string (".%s.", name);
+}
+
+/* Given a user operator name return a string usable as name.  */
+const char *
+gfc_get_name_from_uop (const char* name) {
+  gcc_assert (name[0] == '.');
+  const size_t len = strlen (name) - 1;
+  gcc_assert (len > 1);
+  gcc_assert (name[len] == '.');
+  char *buffer = XNEWVEC (char, len);
+  memcpy (buffer, name + 1, len - 1);
+  buffer[len - 1] = '\0';
+  const char *ret = gfc_get_string ("%s", buffer);
+  XDELETEVEC (buffer);
+  return ret;
+}
 
 /* Update a symbol's common_block field, and take care of the associated
memory management.  */
-- 
2.33.0

Re: [PATCH,FORTRAN] Fix memory leak in finalization wrappers

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

On Wed, 27 Oct 2021 23:39:43 +0200
Bernhard Reutner-Fischer  wrote:

> Ping
> [hmz. it's been a while, I'll rebase and retest this one.
> Ok if it passes?]
Testing passed without any new regressions.
Ok for trunk?
thanks,
> 
> On Mon, 15 Oct 2018 10:23:06 +0200
> Bernhard Reutner-Fischer  wrote:
> 
> > If a finalization is not required we created a namespace containing
> > formal arguments for an internal interface definition but never used
> > any of these. So the whole sub_ns namespace was not wired up to the
> > program and consequently was never freed. The fix is to simply not
> > generate any finalization wrappers if we know that it will be unused.
> > Note that this reverts back to the original r190869
> > (8a96d64282ac534cb597f446f02ac5d0b13249cc) handling for this case
> > by reverting this specific part of r194075
> > (f1ee56b4be7cc3892e6ccc75d73033c129098e87) for PR fortran/37336.
> > 
> > Regtests cleanly, installed to the fortran-fe-stringpool branch, sent
> > here for reference and later inclusion.
> > I might plug a few more leaks in preparation of switching to hash-maps.
> > I fear that the leaks around interfaces are another candidate ;)
> > 
> > Should probably add a tag for the compile-time leak PR68800 shouldn't i.
> > 
> > valgrind summary for e.g.
> > gfortran.dg/abstract_type_3.f03 and gfortran.dg/abstract_type_4.f03
> > where ".orig" is pristine trunk and ".mine" contains this fix:
> > 
> > at3.orig.vg:LEAK SUMMARY:
> > at3.orig.vg-   definitely lost: 8,460 bytes in 11 blocks
> > at3.orig.vg-   indirectly lost: 13,288 bytes in 55 blocks
> > at3.orig.vg- possibly lost: 0 bytes in 0 blocks
> > at3.orig.vg-   still reachable: 572,278 bytes in 2,142 blocks
> > at3.orig.vg-suppressed: 0 bytes in 0 blocks
> > at3.orig.vg-
> > at3.orig.vg-Use --track-origins=yes to see where uninitialised values come from
> > at3.orig.vg-ERROR SUMMARY: 38 errors from 33 contexts (suppressed: 0 from 0)
> > --
> > at3.mine.vg:LEAK SUMMARY:
> > at3.mine.vg-   definitely lost: 344 bytes in 1 blocks
> > at3.mine.vg-   indirectly lost: 7,192 bytes in 18 blocks
> > at3.mine.vg- possibly lost: 0 bytes in 0 blocks
> > at3.mine.vg-   still reachable: 572,278 bytes in 2,142 blocks
> > at3.mine.vg-suppressed: 0 bytes in 0 blocks
> > at3.mine.vg-
> > at3.mine.vg-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> > at3.mine.vg-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> > at4.orig.vg:LEAK SUMMARY:
> > at4.orig.vg-   definitely lost: 13,751 bytes in 12 blocks
> > at4.orig.vg-   indirectly lost: 11,976 bytes in 60 blocks
> > at4.orig.vg- possibly lost: 0 bytes in 0 blocks
> > at4.orig.vg-   still reachable: 572,278 bytes in 2,142 blocks
> > at4.orig.vg-suppressed: 0 bytes in 0 blocks
> > at4.orig.vg-
> > at4.orig.vg-Use --track-origins=yes to see where uninitialised values come from
> > at4.orig.vg-ERROR SUMMARY: 18 errors from 16 contexts (suppressed: 0 from 0)
> > --
> > at4.mine.vg:LEAK SUMMARY:
> > at4.mine.vg-   definitely lost: 3,008 bytes in 3 blocks
> > at4.mine.vg-   indirectly lost: 4,056 bytes in 11 blocks
> > at4.mine.vg- possibly lost: 0 bytes in 0 blocks
> > at4.mine.vg-   still reachable: 572,278 bytes in 2,142 blocks
> > at4.mine.vg-suppressed: 0 bytes in 0 blocks
> > at4.mine.vg-
> > at4.mine.vg-ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
> > at4.mine.vg-ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
> > 
> > gcc/fortran/ChangeLog:
> > 
> > 2018-10-12  Bernhard Reutner-Fischer  
> > 
> > * class.c (generate_finalization_wrapper): Do not leak finalization
> > wrappers if they will not be used.
> > * expr.c (gfc_free_actual_arglist): Formatting fix.
> > * gfortran.h (gfc_free_symbol): Pass argument by reference.
> > (gfc_release_symbol): Likewise.
> > (gfc_free_namespace): Likewise.
> > * symbol.c (gfc_release_symbol): Adjust accordingly.
> > (free_components): Set procedure pointer components
> > of derived types to NULL after freeing.
> > (free_tb_tree): Likewise.
> > (gfc_free_symbol): Set sym to NULL after freeing.
> > (gfc_free_namespace): Set namespace to NULL after freeing.
> > ---
> >  gcc/fortran/class.c| 25 +
> >  gcc/fortran/expr.c |  2 +-
> >  gcc/fortran/gfortran.h |  6 +++---
> >  gcc/fortran/symbol.c   | 19 ++-
> >  4 files changed, 23 insertions(+), 29 deletions(-)
> > 
> > diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> > index 69c95fc5dfa..e0bb381a55f 100644
> > --- a/gcc/fortran/class.c
> > +++ b/gcc/fortran/class.c
> > @@ -1533,7 +1533,6 @@ generate_finalization_wrapper (gfc_symbol *derived, 
> > gfc_namespace *ns,
> >gfc_code *last_code, *block;
> >const char *name;
> >bool finalizable_comp = false;
> > -  bool expr_null_wrapper = false;
> >gfc_expr *ancestor_wrapper = NULL, *rank;
> >gfc_iterator *iter;
> >  
> > @@ -1561,13 +1560,17 @@ genera
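
The ChangeLog entries above about passing the pointer by reference and setting it to NULL after freeing can be sketched in plain C as follows. This is only an illustration, not the gfortran code: `sym_t`, `sym_new` and `sym_release` are hypothetical names, and since C has no references a pointer-to-pointer stands in for the reference parameter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted symbol; not gfc_symbol.  */
typedef struct sym
{
  int refs;
} sym_t;

/* Allocate a symbol holding a single reference; NULL on allocation failure.  */
static sym_t *
sym_new (void)
{
  sym_t *s = calloc (1, sizeof *s);
  if (s)
    s->refs = 1;
  return s;
}

/* Drop one reference through the caller's pointer.  When the last
   reference goes away, free the object and clear the caller's pointer,
   so a stale pointer cannot be dereferenced or freed a second time.  */
static void
sym_release (sym_t **psym)
{
  assert (psym != NULL);
  if (*psym == NULL)
    return;
  if (--(*psym)->refs > 0)
    return;
  free (*psym);
  *psym = NULL;
}
```

With this shape a second `sym_release (&s)` after the object is gone is a harmless no-op, which is the property that clearing the pointer is meant to provide.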

Re: [PATCH,Fortran 0/1] Correct CAF locations in simplify

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

On Wed, 27 Oct 2021 23:29:40 +0200
Bernhard Reutner-Fischer  wrote:

> Hi!
> 
> I found this lying around in an oldish tree.
> Regtest running over night, ok for trunk if it passes?

Regtest turned up no regressions.
Ok for trunk?
thanks,

> 
> Bernhard Reutner-Fischer (1):
>   Tweak locations around CAF simplify
> 
>  gcc/fortran/simplify.c | 28 +++-
>  1 file changed, 15 insertions(+), 13 deletions(-)
>

Re: [PATCH,FORTRAN 28/29] Free type-bound procedure structs

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

ping
[Rebased, re-regtested cleanly. Ok for trunk?]
On Wed,  5 Sep 2018 14:57:31 +
Bernhard Reutner-Fischer  wrote:

> From: Bernhard Reutner-Fischer 
> 
> compiling gfortran.dg/typebound_proc_31.f90 leaked the type-bound
> structs:
> 
> 56 bytes in 1 blocks are definitely lost.
>   at 0x4C2CC05: calloc (vg_replace_malloc.c:711)
>   by 0x151EA90: xcalloc (xmalloc.c:162)
>   by 0x8E3E4F: gfc_get_typebound_proc(gfc_typebound_proc*) (symbol.c:4945)
>   by 0x84C095: match_procedure_in_type (decl.c:10486)
>   by 0x84C095: gfc_match_procedure() (decl.c:6696)
> ...
> 
> gcc/fortran/ChangeLog:
> 
> 2017-12-06  Bernhard Reutner-Fischer  
> 
>   * symbol.c (free_tb_tree): Free type-bound procedure struct.
>   (gfc_get_typebound_proc): Use explicit memcpy for clarity.
> ---
>  gcc/fortran/symbol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
> index 53c760a6c38..cde34c67482 100644
> --- a/gcc/fortran/symbol.c
> +++ b/gcc/fortran/symbol.c
> @@ -3845,7 +3845,7 @@ free_tb_tree (gfc_symtree *t)
>  
>/* TODO: Free type-bound procedure structs themselves; probably needs some
>   sort of ref-counting mechanism.  */
> -
> +  free (t->n.tb);
>free (t);
>  }
>  
> @@ -5052,7 +5052,7 @@ gfc_get_typebound_proc (gfc_typebound_proc *tb0)
>  
>result = XCNEW (gfc_typebound_proc);
>if (tb0)
> -*result = *tb0;
> +memcpy (result, tb0, sizeof (gfc_typebound_proc));;
>result->error = 1;
>  
>latest_undo_chgset->tbps.safe_push (result);
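
Outside the compiler, the two hunks can be illustrated with a small sketch. The names (`tb_proc`, `tree_node`, `node_free`, `tb_clone`) are hypothetical stand-ins, not gfortran's, and the sketch assumes each node is the sole owner of its payload, an assumption the TODO comment about ref-counting suggests may not always hold.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload and tree node; not the gfortran types.  */
typedef struct tb_proc
{
  int access;
  int error;
} tb_proc;

typedef struct tree_node
{
  tb_proc *tb;
} tree_node;

/* Free a node together with its payload.  This is only correct when the
   node is the sole owner of T->tb; shared payloads would need some form
   of reference counting instead.  */
static void
node_free (tree_node *t)
{
  if (t == NULL)
    return;
  free (t->tb);			/* free (NULL) is a no-op.  */
  free (t);
}

/* Return a zero-initialized tb_proc, copied from TB0 when it is given,
   with the error flag set as in the hunk above.  */
static tb_proc *
tb_clone (const tb_proc *tb0)
{
  tb_proc *result = calloc (1, sizeof *result);
  if (result == NULL)
    return NULL;
  if (tb0)
    /* Copies the same member values as "*result = *tb0" would.  */
    memcpy (result, tb0, sizeof *result);
  result->error = 1;
  return result;
}
```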

[PATCH] Bump required minimum DejaGnu version to 1.5.3

2021-10-28 Thread Bernhard Reutner-Fischer via Gcc-patches

From: Bernhard Reutner-Fischer 

Bump required DejaGnu version to 1.5.3 (or later).
Ok for trunk?

gcc/ChangeLog:

* doc/install.texi: Bump required minimum DejaGnu version.
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 36c8280d7da..094469b9a4e 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -452,7 +452,7 @@ Necessary when modifying @command{gperf} input files, e.g.@:
 @file{gcc/cp/cfns.gperf} to regenerate its associated header file, e.g.@:
 @file{gcc/cp/cfns.h}.
 
-@item DejaGnu 1.4.4
+@item DejaGnu version 1.5.3 (or later)
 @itemx Expect
 @itemx Tcl
 @c Once Tcl 8.5 or higher is required, remove any obsolete
-- 
2.33.0

Re: [PATCH] Adjust testcase for O2 vect.

2021-10-28 Thread Hongtao Liu via Gcc-patches

On Fri, Oct 29, 2021 at 12:20 AM Martin Sebor via Gcc-patches
 wrote:
>
> On 10/28/21 1:23 AM, liuhongt via Gcc-patches wrote:
> > Adjust code in check_vect_slp_aligned_store_usage to make it an exact
> > pattern match of the corresponding testcases.
> > These new target/xfail selectors are added as a temporary solution,
> > and should be removed after real issue is fixed for Wstringop-overflow.
>
> Thanks for all the work you're putting into this!  I can't say
> I understand the conditions under which to use which selector
> in what case but hopefully we will be able to remove them all
> from the tests once the warnings are moved to a better pass.
> If that's a safe assumption I'm okay with the changes to
> the tests.  I do have a question/comment on the .exp changes.
>
> >
> > gcc/ChangeLog:
> >
> >   * doc/sourcebuild.texi (vect_slp_v4qi_store_2): Document
> >   efficient target.
> >   (vect_slp_v4qi_store_3): Ditto.
> >   (vect_slp_v2hi_store_2): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR testsuite/102944
> >   * gcc.dg/Warray-bounds-48.c: Adjust target/xfail selector.
> >   * gcc.dg/Warray-parameter-3.c: Ditto.
> >   * gcc.dg/Wstringop-overflow-68.c: Ditto
> >   * gcc.dg/Wstringop-overflow-76.c: Ditto
> >   * lib/target-supports.exp (vect_slp_v4qi_store_2): New
> >   efficient target.
> >   (vect_slp_v4qi_store_3): Ditto.
> >   (vect_slp_v2hi_store_2): Ditto.
> >   (check_vect_slp_aligned_store_usage): Adjust code to make it
> >   an exact pattern match of corresponding testcase.
> > ---
> >   gcc/doc/sourcebuild.texi |  12 ++
> >   gcc/testsuite/gcc.dg/Warray-bounds-48.c  |   4 +-
> >   gcc/testsuite/gcc.dg/Warray-parameter-3.c|   2 +-
> >   gcc/testsuite/gcc.dg/Wstringop-overflow-68.c |   4 +-
> >   gcc/testsuite/gcc.dg/Wstringop-overflow-76.c |  16 +-
> >   gcc/testsuite/lib/target-supports.exp| 201 ++-
> >   6 files changed, 179 insertions(+), 60 deletions(-)
> >
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index 6a165767630..2bb3cb3a9be 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -1854,6 +1854,14 @@ address at plain @option{-O2}.
> >   Target supports vectorization of 4-byte char stores with 4-byte aligned
> >   address at plain @option{-O2}.
> >
> > +@item vect_slp_v4qi_store_2
> > +Target supports vectorization of 4-byte char stores with 4-byte aligned
> > +address at plain @option{-O2}.
> > +
> > +@item vect_slp_v4qi_store_3
> > +Target supports vectorization of 4-byte char stores with 4-byte aligned
> > +address at plain @option{-O2}.
>
> The description is the same for both of these targets as well
> as for vect_slp_v2qi_store.
>
> I think if anyone other than a vectorization expert is to have
> a chance of using these in the future without reverse engineering
> the code the descriptions need to capture the differences between
> them.  I.e., make it clear when vect_slp_v4qi_store is appropriate
> and when either vect_slp_v4qi_store_2 or vect_slp_v4qi_store_3
> should be used instead.
It's hard to describe what distinguishes the vectorization of
vect_slp_v4qi_store
struct A1
{
char n;
char a[3];
};

extern void sink (void*);
void
foo2 ()
{
struct A1 a = { 0, {  } };
a.a[0] = 3;
a.a[1] = 4;
a.a[2] = 5;
sink (&a);
}

from

vect_slp_v4qi_store_2
extern char p[4];
void
foo2_2 ()
{
p[0] = 0;
p[1] = 1;
p[2] = 2;
p[3] = 3;
}

and

vect_slp_v4qi_store_3
typedef struct AC4 { char a[4]; } AC4;
extern char a[4];
void
foo ()
{
*(AC4*)a = Ac4;
}

They're all 4 contiguous byte stores, but with minor differences in
the data reference.
Those effective targets are an exact match of the corresponding
testcases, and may be too specialized to be used elsewhere. (I tried
to write them as general cases, but failed, as PR102944 indicates.)
Reusing those targets does need reverse engineering.
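
For reference, the three snippets can be put into one translation unit roughly as below. This is only a sketch: the definition of `Ac4` is not shown above, so the compound literal used here is an assumption, `sink` gets a trivial body that records what it was passed (the `seen` buffer is hypothetical), and the empty initializer is written as `{ 0 }` for portability.

```c
#include <string.h>

struct A1
{
  char n;
  char a[3];
};

typedef struct AC4 { char a[4]; } AC4;

/* Assumption: Ac4 is not defined in the snippets above; a compound
   literal is used here purely for illustration.  */
#define Ac4 ((AC4) { { 0, 1, 2, 3 } })

char p[4];
char a[4];
static char seen[4];		/* Hypothetical: what sink last received.  */

void
sink (void *ptr)
{
  memcpy (seen, ptr, sizeof seen);
}

/* Pattern behind vect_slp_v4qi_store: member stores into a local struct.  */
void
foo2 (void)
{
  struct A1 a = { 0, { 0 } };
  a.a[0] = 3;
  a.a[1] = 4;
  a.a[2] = 5;
  sink (&a);
}

/* Pattern behind vect_slp_v4qi_store_2: element stores into an extern array.  */
void
foo2_2 (void)
{
  p[0] = 0;
  p[1] = 1;
  p[2] = 2;
  p[3] = 3;
}

/* Pattern behind vect_slp_v4qi_store_3: one aggregate store through a cast.  */
void
foo (void)
{
  *(AC4 *) a = Ac4;
}
```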

>
> Martin
>
> > +
> >   @item vect_slp_v8qi_store
> >   Target supports vectorization of 8-byte char stores with 8-byte aligned
> >   address at plain @option{-O2}.
> > @@ -1874,6 +1882,10 @@ address at plain @option{-O2}.
> >   Target supports vectorization of 8-byte int stores with 8-byte aligned
> >   address at plain @option{-O2}.
> >
> > +@item vect_slp_v2si_store_2
> > +Target supports vectorization of 8-byte int stores with 8-byte aligned
> > +address at plain @option{-O2}.
> > +
> >   @item vect_slp_v4si_store
> >   Target supports vectorization of 16-byte int stores with 16-byte aligned
> >   address at plain @option{-O2}.
> > diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-48.c 
> > b/gcc/testsuite/gcc.dg/Warray-bounds-48.c
> > index 19b7634c063..32c0df843d2 100644
> > --- a/gcc/testsuite/gcc.dg/Warray-bounds-48.c
> > +++ b/gcc/testsuite/gcc.dg/Warray-bounds-48.c
> > @@ -30,7 +30,7 @@ static void nowarn_ax_extern (struct AX *p)
> >
> >   static void warn_ax_local_buf (struct AX *p)
> >   {

Re: [PATCH] Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

2021-10-28 Thread Hongtao Liu via Gcc-patches

On Thu, Oct 28, 2021 at 10:26 AM Hongtao Liu  wrote:
>
> On Mon, Oct 25, 2021 at 4:24 PM liuhongt  wrote:
> >
> >   Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >   Ok for trunk?
> >
> I'm going to check in this patch if there's no objection.
Committed.
> > gcc/ChangeLog:
> >
> > PR target/102464
> > * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New
> > function type.
> > (V16HF_FTYPE_V16HF): Ditto.
> > (V32HF_FTYPE_V32HF): Ditto.
> > (V8HF_FTYPE_V8HF_ROUND): Ditto.
> > (V16HF_FTYPE_V16HF_ROUND): Ditto.
> > (V32HF_FTYPE_V32HF_ROUND): Ditto.
> > * config/i386/i386-builtin.def ( IX86_BUILTIN_FLOORPH,
> > IX86_BUILTIN_CEILPH, IX86_BUILTIN_TRUNCPH,
> > IX86_BUILTIN_FLOORPH256, IX86_BUILTIN_CEILPH256,
> > IX86_BUILTIN_TRUNCPH256, IX86_BUILTIN_FLOORPH512,
> > IX86_BUILTIN_CEILPH512, IX86_BUILTIN_TRUNCPH512): New builtin.
> > * config/i386/i386-builtins.c
> > (ix86_builtin_vectorized_function): Enable vectorization for
> > HFmode FLOOR/CEIL/TRUNC operation.
> > * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle
> > new builtins.
> > * config/i386/sse.md (rint2, nearbyint2): Extend
> > to vector HFmodes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr102464-vrndscaleph.c: New test.
> > ---
> >  gcc/config/i386/i386-builtin-types.def|   7 ++
> >  gcc/config/i386/i386-builtin.def  |  11 ++
> >  gcc/config/i386/i386-builtins.c   |  42 +++
> >  gcc/config/i386/i386-expand.c |   3 +
> >  gcc/config/i386/sse.md|  12 +-
> >  .../gcc.target/i386/pr102464-vrndscaleph.c| 115 ++
> >  6 files changed, 184 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-vrndscaleph.c
> >
> > diff --git a/gcc/config/i386/i386-builtin-types.def 
> > b/gcc/config/i386/i386-builtin-types.def
> > index 4c355c587b5..e33f06ab30b 100644
> > --- a/gcc/config/i386/i386-builtin-types.def
> > +++ b/gcc/config/i386/i386-builtin-types.def
> > @@ -1380,3 +1380,10 @@ DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
> >  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT)
> >  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
> >  DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
> > +
> > +DEF_FUNCTION_TYPE (V8HF, V8HF)
> > +DEF_FUNCTION_TYPE (V16HF, V16HF)
> > +DEF_FUNCTION_TYPE (V32HF, V32HF)
> > +DEF_FUNCTION_TYPE_ALIAS (V8HF_FTYPE_V8HF, ROUND)
> > +DEF_FUNCTION_TYPE_ALIAS (V16HF_FTYPE_V16HF, ROUND)
> > +DEF_FUNCTION_TYPE_ALIAS (V32HF_FTYPE_V32HF, ROUND)
> > diff --git a/gcc/config/i386/i386-builtin.def 
> > b/gcc/config/i386/i386-builtin.def
> > index 99217d08d37..d9eee3f373c 100644
> > --- a/gcc/config/i386/i386-builtin.def
> > +++ b/gcc/config/i386/i386-builtin.def
> > @@ -958,6 +958,10 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, 
> > CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__buil
> >  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2, 
> > "__builtin_ia32_roundpd_az", IX86_BUILTIN_ROUNDPD_AZ, UNKNOWN, (int) 
> > V2DF_FTYPE_V2DF)
> >  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2_vec_pack_sfix, 
> > "__builtin_ia32_roundpd_az_vec_pack_sfix", 
> > IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX, UNKNOWN, (int) V4SI_FTYPE_V2DF_V2DF)
> >
> > +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> > CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_floorph", 
> > IX86_BUILTIN_FLOORPH, (enum rtx_code) ROUND_FLOOR, (int) 
> > V8HF_FTYPE_V8HF_ROUND)
> > +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> > CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_ceilph", 
> > IX86_BUILTIN_CEILPH, (enum rtx_code) ROUND_CEIL, (int) 
> > V8HF_FTYPE_V8HF_ROUND)
> > +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> > CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_truncph", 
> > IX86_BUILTIN_TRUNCPH, (enum rtx_code) ROUND_TRUNC, (int) 
> > V8HF_FTYPE_V8HF_ROUND)
> > +
> >  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> > "__builtin_ia32_floorps", IX86_BUILTIN_FLOORPS, (enum rtx_code) 
> > ROUND_FLOOR, (int) V4SF_FTYPE_V4SF_ROUND)
> >  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> > "__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, 
> > (int) V4SF_FTYPE_V4SF_ROUND)
> >  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> > "__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) 
> > ROUND_TRUNC, (int) V4SF_FTYPE_V4SF_ROUND)
> > @@ -1090,6 +1094,10 @@ BDESC (OPTION_MASK_ISA_AVX, 0, 
> > CODE_FOR_roundv4df2_vec_pack_sfix, "__builtin_ia3
> >  BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
> > "__builtin_ia32_floorpd_vec_pack_sfix256", 
> > IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX256, (enum rtx_code) ROUND_FLOOR, (int) 
> > V8SI_FTYPE_V4DF_V4DF_ROUND)
> >  BDESC (OPTION_MASK_IS

[PATCH] ipa/102714 - IPA SRA eliding volatile

2021-10-28 Thread duan.db via Gcc-patches

Hi.
This patch fixes PR102714 on trunk.
Should we backport it to gcc-10?  If so, an adapted patch is
attached.

Thanks,
Bo Duan

0001-Subject-PATCH-ipa-102714-IPA-SRA-eliding-volatile.patch
Description: Binary data

Re: [PATCH] Adjust testcase for O2 vect.


On 10/28/21 7:47 PM, Hongtao Liu wrote:
> On Fri, Oct 29, 2021 at 12:20 AM Martin Sebor via Gcc-patches
>  wrote:
>>
>> On 10/28/21 1:23 AM, liuhongt via Gcc-patches wrote:
>>> Adjust code in check_vect_slp_aligned_store_usage to make it an exact
>>> pattern match of the corresponding testcases.
>>> These new target/xfail selectors are added as a temporary solution,
>>> and should be removed after real issue is fixed for Wstringop-overflow.
>>
>> Thanks for all the work you're putting into this!  I can't say
>> I understand the conditions under which to use which selector
>> in what case but hopefully we will be able to remove them all
>> from the tests once the warnings are moved to a better pass.
>> If that's a safe assumption I'm okay with the changes to
>> the tests.  I do have a question/comment on the .exp changes.
>>
>>>
>>> gcc/ChangeLog:
>>>
>>>    * doc/sourcebuild.texi (vect_slp_v4qi_store_2): Document
>>>    efficient target.
>>>    (vect_slp_v4qi_store_3): Ditto.
>>>    (vect_slp_v2hi_store_2): Ditto.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>    PR testsuite/102944
>>>    * gcc.dg/Warray-bounds-48.c: Adjust target/xfail selector.
>>>    * gcc.dg/Warray-parameter-3.c: Ditto.
>>>    * gcc.dg/Wstringop-overflow-68.c: Ditto
>>>    * gcc.dg/Wstringop-overflow-76.c: Ditto
>>>    * lib/target-supports.exp (vect_slp_v4qi_store_2): New
>>>    efficient target.
>>>    (vect_slp_v4qi_store_3): Ditto.
>>>    (vect_slp_v2hi_store_2): Ditto.
>>>    (check_vect_slp_aligned_store_usage): Adjust code to make it
>>>    an exact pattern match of corresponding testcase.
>>> ---
>>>    gcc/doc/sourcebuild.texi |  12 ++
>>>    gcc/testsuite/gcc.dg/Warray-bounds-48.c  |   4 +-
>>>    gcc/testsuite/gcc.dg/Warray-parameter-3.c|   2 +-
>>>    gcc/testsuite/gcc.dg/Wstringop-overflow-68.c |   4 +-
>>>    gcc/testsuite/gcc.dg/Wstringop-overflow-76.c |  16 +-
>>>    gcc/testsuite/lib/target-supports.exp| 201 ++-
>>>    6 files changed, 179 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>>> index 6a165767630..2bb3cb3a9be 100644
>>> --- a/gcc/doc/sourcebuild.texi
>>> +++ b/gcc/doc/sourcebuild.texi
>>> @@ -1854,6 +1854,14 @@ address at plain @option{-O2}.
>>>    Target supports vectorization of 4-byte char stores with 4-byte aligned
>>>    address at plain @option{-O2}.
>>>
>>> +@item vect_slp_v4qi_store_2
>>> +Target supports vectorization of 4-byte char stores with 4-byte aligned
>>> +address at plain @option{-O2}.
>>> +
>>> +@item vect_slp_v4qi_store_3
>>> +Target supports vectorization of 4-byte char stores with 4-byte aligned
>>> +address at plain @option{-O2}.
>>
>> The description is the same for both of these targets as well
>> as for vect_slp_v2qi_store.
>>
>> I think if anyone other than a vectorization expert is to have
>> a chance of using these in the future without reverse engineering
>> the code the descriptions need to capture the differences between
>> them.  I.e., make it clear when vect_slp_v4qi_store is appropriate
>> and when either vect_slp_v4qi_store_2 or vect_slp_v4qi_store_3
>> should be used instead.
> It's hard to describe what distinguishes the vectorization of
> vect_slp_v4qi_store
> struct A1
> {
>  char n;
>  char a[3];
> };
>
> extern void sink (void*);
> void
> foo2 ()
> {
>  struct A1 a = { 0, {  } };
>  a.a[0] = 3;
>  a.a[1] = 4;
>  a.a[2] = 5;
>  sink (&a);
> }
>
> from
>
> vect_slp_v4qi_store_2
> extern char p[4];
> void
> foo2_2 ()
> {
>  p[0] = 0;
>  p[1] = 1;
>  p[2] = 2;
>  p[3] = 3;
> }
>
> and
>
> vect_slp_v4qi_store_3
> typedef struct AC4 { char a[4]; } AC4;
> extern char a[4];
> void
> foo ()
> {
>  *(AC4*)a = Ac4;
> }
>
> They're all 4 contiguous byte stores, but with minor differences in
> the data reference.
> Those effective targets are an exact match of the corresponding
> testcases, and may be too specialized to be used elsewhere. (I tried
> to write them as general cases, but failed, as PR102944 indicates.)
> Reusing those targets does need reverse engineering.

Then there must be another variable (or several) besides size
and alignment that determines whether such stores can be
vectorized and that reflects the minor differences.   Can you
explain (at least roughly, in email) what it is?  I'd like to
understand this enough not just so I have an idea what to look
for if I have to tweak the tests, but also when I add new ones
that might have the same issue.

(I'm assuming the difference is due to some architectural
constraints as opposed to arbitrary limitations in the code
for the various targets that keep one or the other from handling
this or that IL.)

In any event, this isn't an objection to your fix (though
I do think we should try harder to capture the differences
between the selectors).  Just questions to help me understand
why and when it's needed.

Thanks
Martin

>
>>
>> Martin
>>
>>> +
>>>    @item vect_slp_v8qi_store
>>>    Target supports vectorization of 8-byte char stores with 8-byte aligned
>>>    address at plain @option{-O2}.
>>> @@ -1874,6 +1882,10 @@ address at plain @option{-O2}.
>>>    Target supports vectorization of 8-byte int stores with 8-byte aligned
>>>    address at plain @option{-O2}.
>>>
>>> +@item vect_slp_

Re: [PATCH] Adjust testcase for O2 vect.

2021-10-28 Thread Hongtao Liu via Gcc-patches

On Fri, Oct 29, 2021 at 10:34 AM Martin Sebor  wrote:
>
> On 10/28/21 7:47 PM, Hongtao Liu wrote:
> > On Fri, Oct 29, 2021 at 12:20 AM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> On 10/28/21 1:23 AM, liuhongt via Gcc-patches wrote:
> >>> Adjust code in check_vect_slp_aligned_store_usage to make it an exact
> >>> pattern match of the corresponding testcases.
> >>> These new target/xfail selectors are added as a temporary solution,
> >>> and should be removed after real issue is fixed for Wstringop-overflow.
> >>
> >> Thanks for all the work you're putting into this!  I can't say
> >> I understand the conditions under which to use which selector
> >> in what case but hopefully we will be able to remove them all
> >> from the tests once the warnings are moved to a better pass.
> >> If that's a safe assumption I'm okay with the changes to
> >> the tests.  I do have a question/comment on the .exp changes.
> >>
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* doc/sourcebuild.texi (vect_slp_v4qi_store_2): Document
> >>>efficient target.
> >>>(vect_slp_v4qi_store_3): Ditto.
> >>>(vect_slp_v2hi_store_2): Ditto.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>PR testsuite/102944
> >>>* gcc.dg/Warray-bounds-48.c: Adjust target/xfail selector.
> >>>* gcc.dg/Warray-parameter-3.c: Ditto.
> >>>* gcc.dg/Wstringop-overflow-68.c: Ditto
> >>>* gcc.dg/Wstringop-overflow-76.c: Ditto
> >>>* lib/target-supports.exp (vect_slp_v4qi_store_2): New
> >>>efficient target.
> >>>(vect_slp_v4qi_store_3): Ditto.
> >>>(vect_slp_v2hi_store_2): Ditto.
> >>>(check_vect_slp_aligned_store_usage): Adjust code to make it
> >>>an exact pattern match of corresponding testcase.
> >>> ---
> >>>gcc/doc/sourcebuild.texi |  12 ++
> >>>gcc/testsuite/gcc.dg/Warray-bounds-48.c  |   4 +-
> >>>gcc/testsuite/gcc.dg/Warray-parameter-3.c|   2 +-
> >>>gcc/testsuite/gcc.dg/Wstringop-overflow-68.c |   4 +-
> >>>gcc/testsuite/gcc.dg/Wstringop-overflow-76.c |  16 +-
> >>>gcc/testsuite/lib/target-supports.exp| 201 ++-
> >>>6 files changed, 179 insertions(+), 60 deletions(-)
> >>>
> >>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> >>> index 6a165767630..2bb3cb3a9be 100644
> >>> --- a/gcc/doc/sourcebuild.texi
> >>> +++ b/gcc/doc/sourcebuild.texi
> >>> @@ -1854,6 +1854,14 @@ address at plain @option{-O2}.
> >>>Target supports vectorization of 4-byte char stores with 4-byte aligned
> >>>address at plain @option{-O2}.
> >>>
> >>> +@item vect_slp_v4qi_store_2
> >>> +Target supports vectorization of 4-byte char stores with 4-byte aligned
> >>> +address at plain @option{-O2}.
> >>> +
> >>> +@item vect_slp_v4qi_store_3
> >>> +Target supports vectorization of 4-byte char stores with 4-byte aligned
> >>> +address at plain @option{-O2}.
> >>
> >> The description is the same for both of these targets as well
> >> as for vect_slp_v2qi_store.
> >>
> >> I think if anyone other than a vectorization expert is to have
> >> a chance of using these in the future without reverse engineering
> >> the code the descriptions need to capture the differences between
> >> them.  I.e., make it clear when vect_slp_v4qi_store is appropriate
> >> and when either vect_slp_v4qi_store_2 or vect_slp_v4qi_store_3
> >> should be used instead.
> > It's hard to describe what distinguishes the vectorization of
> > vect_slp_v4qi_store
> > struct A1
> > {
> >  char n;
> >  char a[3];
> > };
> >
> > extern void sink (void*);
> > void
> > foo2 ()
> > {
> >  struct A1 a = { 0, {  } };
> >  a.a[0] = 3;
> >  a.a[1] = 4;
> >  a.a[2] = 5;
> >  sink (&a);
> > }
> >
> > from
> >
> > vect_slp_v4qi_store_2
> > extern char p[4];
> > void
> > foo2_2 ()
> > {
> >  p[0] = 0;
> >  p[1] = 1;
> >  p[2] = 2;
> >  p[3] = 3;
> > }
> >
> > and
> >
> > vect_slp_v4qi_store_3
> > typedef struct AC4 { char a[4]; } AC4;
> > extern char a[4];
> > void
> > foo ()
> > {
> >  *(AC4*)a = Ac4;
> > }
> >
> > They're all 4 contiguous byte stores, but with minor differences in
> > the data reference.
> > Those effective targets are an exact match of the corresponding
> > testcases, and may be too specialized to be used elsewhere. (I tried
> > to write them as general cases, but failed, as PR102944 indicates.)
> > Reusing those targets does need reverse engineering.
>
> Then there must be another variable (or several) besides size
> and alignment that determines whether such stores can be
> vectorized and that reflects the minor differences.   Can you
> explain (at least roughly, in email) what it is?  I'd like to
> understand this enough not just so I have an idea what to look
> for if I have to tweak the tests, but also when I add new ones
> that might have the same issue.
>
> (I'm assuming the difference is due to some architectural
> constraints as opposed to arbitrary limita

Re: [PATCH] vect: Add bias parameter for partial vectorization

2021-10-28 Thread Kewen.Lin via Gcc-patches

Hi Robin,

on 2021/10/28 下午10:44, Robin Dapp wrote:
> Hi,
> 
> as discussed in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582627.html this
> introduces a bias parameter for the len_load/len_store ifns as well as
> optabs that is meant to distinguish between Power and s390 variants.
> The default is a bias of 0, while in s390's case vll/vstl do not support
> lengths of zero bytes and a bias of -1 should be used.
> 
> Bootstrapped and regtested on Power9 (--with-cpu=power9) and s390
> (--with-arch=z15).
> 
> The tiny changes in the Power backend I will post separately.
> 

Thanks for extending this!

I guess your separate Power (rs6000) patch will be committed together with
this one?  Otherwise I'm worried that the existing rs6000 partial vector
cases could fail, since the existing rs6000 optabs lack the new operand,
which isn't optional.

You might need to update the documentation doc/md.texi for the new operand
in sections len_load_@var{m} and len_store_@var{m}, and might want to add
the costing consideration for this non-zero biasing in hunk
"
  else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
{
"
of function vect_estimate_min_profitable_iters.

I may be overthinking this, but it seems we could add an assertion in function
vect_verify_loop_lens to ensure that (internal_len_load_bias_supported ==
internal_len_load_bias_supported), to avoid mixed biasing cases caused by
odd targets or optab typos.
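
To make the bias discussion concrete, here is a small model in plain C. It is a sketch under stated assumptions rather than a description of the patch: it assumes the instruction's length operand is the requested byte count plus the bias, so a bias of -1 cannot express a zero-byte access, and it reads the suggested assertion as comparing the load bias with the store bias, which is an interpretation. All names (`len_operand`, `model_len_load`, `biases_consistent`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed convention: the operand handed to the instruction is
   NBYTES + BIAS, where BIAS is either 0 or -1.  */
static size_t
len_operand (size_t nbytes, int bias)
{
  assert (bias == 0 || bias == -1);
  if (bias == -1)
    {
      /* With a bias of -1 a zero-byte access cannot be encoded.  */
      assert (nbytes > 0);
      return nbytes - 1;
    }
  return nbytes;
}

/* Model of a length-controlled load: decode OPERAND back into a byte
   count, copy that many bytes, and return the count.  */
static size_t
model_len_load (unsigned char *dst, const unsigned char *src,
		size_t operand, int bias)
{
  size_t nbytes = bias == -1 ? operand + 1 : operand;
  memcpy (dst, src, nbytes);
  return nbytes;
}

/* Sanity check in the spirit of the suggested assertion: loads and
   stores should agree on the bias they use.  */
static int
biases_consistent (int load_bias, int store_bias)
{
  return load_bias == store_bias;
}
```

Under this model a 4-byte load with bias -1 is encoded as operand 3 and still copies 4 bytes, while a mismatch between the two biases is something a single assertion can catch early.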

BR,
Kewen

Re: [PATCH 1/5] Makefile.in: Ensure build CPP/CPPFLAGS is used for build targets

On Wed, Oct 27, 2021 at 10:10 PM Richard Purdie via Gcc-patches
 wrote:
>
> During cross compiling, CPP is being set to the target compiler even for
> build targets. As an example, when building a cross compiler targeting
> mingw, the config.log for libiberty in
> build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
> shows:
>
> configure:3786: checking how to run the C preprocessor
> configure:3856: result: x86_64-pokysdk-mingw32-gcc -E 
> --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
> configure:3876: x86_64-pokysdk-mingw32-gcc -E 
> --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
> configure:3876: $? = 0
>
> This is libiberty being built for the build environment, not the target one
> (i.e. in build-x86_64-linux). As such it should be using the build 
> environment's
> gcc and not the target one. In the mingw case the system headers are quite
> different leading to build failures related to not being able to include a
> process.h file for pem-unix.c.
>
> Further analysis shows the same issue occurring for CPPFLAGS too.
>
> Fix this by adding support for CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD which,
> for example, avoids mixing the mingw headers for host binaries on linux
> systems.

OK.

Thanks,
Richard.

> 2021-10-27 Richard Purdie 
>
> ChangeLog:
>
> * Makefile.tpl: Add CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD support
> * Makefile.in: Regenerate.
> * configure: Regenerate.
> * configure.ac: Add CPP_FOR_BUILD and CPPFLAGS_FOR_BUILD support
>
> gcc/ChangeLog:
>
> * configure: Regenerate.
> * configure.ac: Use CPPFLAGS_FOR_BUILD for GMPINC
>
> Signed-off-by: Richard Purdie 
> ---
>  Makefile.in  | 6 ++
>  Makefile.tpl | 6 ++
>  configure| 4 
>  configure.ac | 4 
>  gcc/configure| 2 +-
>  gcc/configure.ac | 2 +-
>  6 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile.in b/Makefile.in
> index 34b2d89660d..d13f6c353ee 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -154,6 +154,8 @@ BUILD_EXPORTS = \
> CC="$(CC_FOR_BUILD)"; export CC; \
> CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \
> CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> +   CPP="$(CPP_FOR_BUILD)"; export CPP; \
> +   CPPFLAGS="$(CPPFLAGS_FOR_BUILD)"; export CPPFLAGS; \
> CXX="$(CXX_FOR_BUILD)"; export CXX; \
> CXXFLAGS="$(CXXFLAGS_FOR_BUILD)"; export CXXFLAGS; \
> GFORTRAN="$(GFORTRAN_FOR_BUILD)"; export GFORTRAN; \
> @@ -202,6 +204,8 @@ HOST_EXPORTS = \
> AR="$(AR)"; export AR; \
> AS="$(AS)"; export AS; \
> CC_FOR_BUILD="$(CC_FOR_BUILD)"; export CC_FOR_BUILD; \
> +   CPP_FOR_BUILD="$(CPP_FOR_BUILD)"; export CPP_FOR_BUILD; \
> +   CPPFLAGS_FOR_BUILD="$(CPPFLAGS_FOR_BUILD)"; export 
> CPPFLAGS_FOR_BUILD; \
> CXX_FOR_BUILD="$(CXX_FOR_BUILD)"; export CXX_FOR_BUILD; \
> DLLTOOL="$(DLLTOOL)"; export DLLTOOL; \
> DSYMUTIL="$(DSYMUTIL)"; export DSYMUTIL; \
> @@ -360,6 +364,8 @@ AR_FOR_BUILD = @AR_FOR_BUILD@
>  AS_FOR_BUILD = @AS_FOR_BUILD@
>  CC_FOR_BUILD = @CC_FOR_BUILD@
>  CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
> +CPP_FOR_BUILD = @CPP_FOR_BUILD@
> +CPPFLAGS_FOR_BUILD = @CPPFLAGS_FOR_BUILD@
>  CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
>  CXX_FOR_BUILD = @CXX_FOR_BUILD@
>  DLLTOOL_FOR_BUILD = @DLLTOOL_FOR_BUILD@
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 08e68e83ea8..213052f8226 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -157,6 +157,8 @@ BUILD_EXPORTS = \
> CC="$(CC_FOR_BUILD)"; export CC; \
> CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \
> CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> +   CPP="$(CPP_FOR_BUILD)"; export CPP; \
> +   CPPFLAGS="$(CPPFLAGS_FOR_BUILD)"; export CPPFLAGS; \
> CXX="$(CXX_FOR_BUILD)"; export CXX; \
> CXXFLAGS="$(CXXFLAGS_FOR_BUILD)"; export CXXFLAGS; \
> GFORTRAN="$(GFORTRAN_FOR_BUILD)"; export GFORTRAN; \
> @@ -205,6 +207,8 @@ HOST_EXPORTS = \
> AR="$(AR)"; export AR; \
> AS="$(AS)"; export AS; \
> CC_FOR_BUILD="$(CC_FOR_BUILD)"; export CC_FOR_BUILD; \
> +   CPP_FOR_BUILD="$(CPP_FOR_BUILD)"; export CPP_FOR_BUILD; \
> +   CPPFLAGS_FOR_BUILD="$(CPPFLAGS_FOR_BUILD)"; export 
> CPPFLAGS_FOR_BUILD; \
> CXX_FOR_BUILD="$(CXX_FOR_BUILD)"; export CXX_FOR_BUILD; \
> DLLTOOL="$(DLLTOOL)"; export DLLTOOL; \
> DSYMUTIL="$(DSYMUTIL)"; export DSYMUTIL; \
> @@ -363,6 +367,8 @@ AR_FOR_BUILD = @AR_FOR_BUILD@
>  AS_FOR_BUILD = @AS_FOR_BUILD@
>  CC_FOR_BUILD = @CC_FOR_BUILD@
>  CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
> +CPP_FOR_BUILD = @CPP_FOR_BUILD@
> +CPPFLAGS_FOR_BUILD = @CPPFLAGS_FOR_BUILD@
>  CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
>  CXX_FOR_BUILD = @CXX_FOR_BUILD@
>  DLLTOOL_FOR_BUILD = @DLLTOOL_FOR_BUILD@
> diff --git a/configure b/configure
> index 785498efff5..58979d6e3b1 100755
> --- a/configure
> +++ b/configure
> @@ -655,6 +655,8 @@

[PATCH] Adjust testcase for O2 vect.

2021-10-28 Thread liuhongt via Gcc-patches

Adjust code in check_vect_slp_aligned_store_usage to make it an exact
pattern match of the corresponding testcases.
These new target/xfail selectors are added as a temporary solution,
and should be removed after the real issue is fixed for Wstringop-overflow.

gcc/ChangeLog:

* doc/sourcebuild.texi (vect_slp_v4qi_store_2): Document
efficient target.
(vect_slp_v4qi_store_3): Ditto.
(vect_slp_v2hi_store_2): Ditto.

gcc/testsuite/ChangeLog:

PR testsuite/102944
* gcc.dg/Warray-bounds-48.c: Adjust target/xfail selector.
* gcc.dg/Warray-parameter-3.c: Ditto.
* gcc.dg/Wstringop-overflow-68.c: Ditto
* gcc.dg/Wstringop-overflow-76.c: Ditto
* lib/target-supports.exp (vect_slp_v4qi_store_2): New
efficient target.
(vect_slp_v4qi_store_3): Ditto.
(vect_slp_v2hi_store_2): Ditto.
(check_vect_slp_aligned_store_usage): Adjust code to make it
an exact pattern match of corresponding testcase.
---
 gcc/doc/sourcebuild.texi |  12 ++
 gcc/testsuite/gcc.dg/Warray-bounds-48.c  |   4 +-
 gcc/testsuite/gcc.dg/Warray-parameter-3.c|   2 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-68.c |   4 +-
 gcc/testsuite/gcc.dg/Wstringop-overflow-76.c |  16 +-
 gcc/testsuite/lib/target-supports.exp| 201 ++-
 6 files changed, 179 insertions(+), 60 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6a165767630..2bb3cb3a9be 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1854,6 +1854,14 @@ address at plain @option{-O2}.
 Target supports vectorization of 4-byte char stores with 4-byte aligned
 address at plain @option{-O2}.
 
+@item vect_slp_v4qi_store_2
+Target supports vectorization of 4-byte char stores with 4-byte aligned
+address at plain @option{-O2}.
+
+@item vect_slp_v4qi_store_3
+Target supports vectorization of 4-byte char stores with 4-byte aligned
+address at plain @option{-O2}.
+
 @item vect_slp_v8qi_store
 Target supports vectorization of 8-byte char stores with 8-byte aligned
 address at plain @option{-O2}.
@@ -1874,6 +1882,10 @@ address at plain @option{-O2}.
 Target supports vectorization of 8-byte int stores with 8-byte aligned
 address at plain @option{-O2}.
 
+@item vect_slp_v2si_store_2
+Target supports vectorization of 8-byte int stores with 8-byte aligned
+address at plain @option{-O2}.
+
 @item vect_slp_v4si_store
 Target supports vectorization of 16-byte int stores with 16-byte aligned
 address at plain @option{-O2}.
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-48.c b/gcc/testsuite/gcc.dg/Warray-bounds-48.c
index 19b7634c063..32c0df843d2 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-48.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-48.c
@@ -30,7 +30,7 @@ static void nowarn_ax_extern (struct AX *p)
 
 static void warn_ax_local_buf (struct AX *p)
 {
-  p->ax[0] = 4; p->ax[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store &&  { ! vect_slp_v4hi_store } } } }
+  p->ax[0] = 4; p->ax[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store_2 &&  { ! vect_slp_v4hi_store } } } }
 
   p->ax[2] = 6; // { dg-warning "\\\[-Warray-bounds" }
   p->ax[3] = 7; // { dg-warning "\\\[-Warray-bounds" }
@@ -130,7 +130,7 @@ static void warn_a0_extern (struct A0 *p)
 
 static void warn_a0_local_buf (struct A0 *p)
 {
-  p->a0[0] = 4; p->a0[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store && { ! vect_slp_v4hi_store } } } }
+  p->a0[0] = 4; p->a0[1] = 5;  // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v2hi_store_2 && { ! vect_slp_v4hi_store } } } }
 
   p->a0[2] = 6; // { dg-warning "\\\[-Warray-bounds" }
   p->a0[3] = 7; // { dg-warning "\\\[-Warray-bounds" }
diff --git a/gcc/testsuite/gcc.dg/Warray-parameter-3.c b/gcc/testsuite/gcc.dg/Warray-parameter-3.c
index b6ed8daf51c..bbf55a40a3c 100644
--- a/gcc/testsuite/gcc.dg/Warray-parameter-3.c
+++ b/gcc/testsuite/gcc.dg/Warray-parameter-3.c
@@ -77,7 +77,7 @@ gia3 (int a[3])
 __attribute__ ((noipa)) void
 gcas3 (char a[static 3])
 {
-  a[0] = 0; a[1] = 1; a[2] = 2; // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v4qi_store } } }
+  a[0] = 0; a[1] = 1; a[2] = 2; // { dg-warning "\\\[-Wstringop-overflow" "pr102706" { target { vect_slp_v4qi_store_2 } } }
   a[3] = 3;   // { dg-warning "\\\[-Warray-bounds" }
 }
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-68.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-68.c
index 04e91afb8bc..488b4a9b0c7 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-68.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-68.c
@@ -65,8 +65,8 @@ void warn_comp_lit (void)
   // MEM  [(char *)&a7] = { 0, 1, 2, 3, 4, 5, 6, 7 };
   // MEM  [(char *)&a15] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
10, 11, 12, 13, 14, 15 };
   // and warning should be expected,

Re: [PATCH] middle-end/57245 - honor -frounding-math in real truncation

On Wed, 27 Oct 2021, Richard Biener wrote:

> On October 27, 2021 4:44:53 PM GMT+02:00, Jakub Jelinek  
> wrote:
> >On Wed, Oct 27, 2021 at 04:29:38PM +0200, Richard Biener wrote:
> >> So something like the following below?  Note I have to fix 
> >> simplify_const_unary_operation to not perform the invalid constant
> >> folding with (not worrying about the exact conversion case - I doubt
> >> any of the constant folding is really relevant on RTL these days,
> >> maybe we should simply punt for all unary float-float ops when either
> >> mode has sign dependent rounding modes)
> >> 
> >> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> >> index bbbd6b74942..9522a31570e 100644
> >> --- a/gcc/simplify-rtx.c
> >> +++ b/gcc/simplify-rtx.c
> >> @@ -2068,6 +2073,9 @@ simplify_const_unary_operation (enum rtx_code code, 
> >> machine_mode mode,
> >>  and the operand is a signaling NaN.  */
> >>   if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
> >> return NULL_RTX;
> >> + /* Or if flag_rounding_math is on.  */
> >> + if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
> >> +   return NULL_RTX;
> >>   d = real_value_truncate (mode, d);
> >>   break;
> >
> >Won't this stop folding of truncations that are never a problem?
> >I mean at least if the wider float mode constant is exactly representable
> >in the narrower float mode, no matter what rounding mode is used the value
> >will be always the same...
> >And people use
> >  float f = 1.0;
> >or
> >  float f = 1.25;
> >etc. a lot.
> 
> Yes, but I do expect any such opportunities to be realized on GENERIC/GIMPLE? 
> 
> >So perhaps again
> > if (HONOR_SIGN_DEPENDENT_ROUNDING (mode)
> > && !exact_real_truncate (mode, &d))
> >   return NULL_RTX;
> >?
> 
> Sure, for this case it's short and straightforward.
> 
> >
> >> /* PR57245 */
> >> /* { dg-do run } */
> >> /* { dg-require-effective-target fenv } */
> >> /* { dg-additional-options "-frounding-math" } */
> >> 
> >> #include 
> >> #include 
> >> 
> >> int
> >> main ()
> >> {
> >
> >Roughly yes.  Some tests also do #ifdef FE_*, so in your case
> >> #if __DBL_MANT_DIG__ == 53 && __FLT_MANT_DIG__ == 24
> >+#ifdef FE_UPWARD
> 
> Ah, OK. Will fix. 

So like this - bootstrapped and tested on x86_64-unknown-linux-gnu.

OK now?

Thanks,
Richard.
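
For readers following the exactness argument in this thread, here is a
minimal standalone sketch (not part of the patch, and not GCC-internal
code) of why a constant such as 1.25 can always be folded while 0.1
cannot once the rounding mode matters.  The helper names
truncates_exactly and convert_with_mode are made up for illustration;
the volatile temporaries are only there to keep the conversions at run
time, and with GCC the result is only guaranteed when building with
-frounding-math.

```cpp
#include <cassert>
#include <cfenv>

// Illustrative helper (hypothetical name): true if D survives a round trip
// through float unchanged, i.e. the double -> float truncation is exact and
// no rounding mode can change its result.  NaN inputs report false.
bool truncates_exactly(double d)
{
  // volatile keeps the conversions from being folded at compile time.
  volatile double wide = d;
  volatile float narrow = static_cast<float>(wide);
  return static_cast<double>(narrow) == wide;
}

// Illustrative helper (hypothetical name): convert D to float at run time
// under rounding mode MODE (FE_UPWARD, FE_DOWNWARD, ...), then restore the
// previously active mode.
float convert_with_mode(double d, int mode)
{
  const int saved = std::fegetround();
  const int rc = std::fesetround(mode);
  assert(rc == 0 && "rounding mode not supported on this target");
  (void) rc;
  volatile double wide = d;
  volatile float narrow = static_cast<float>(wide);
  std::fesetround(saved);
  return narrow;
}
```

With these, truncates_exactly (1.25) holds, so every rounding mode yields
the same float, whereas for 0.1 the results under FE_UPWARD and
FE_DOWNWARD are expected to differ by one ulp on targets that support
both modes.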

From 22da541c70ec2eff1e9208dd53c6d7309c33b0c9 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 27 Oct 2021 14:27:40 +0200
Subject: [PATCH] middle-end/57245 - honor -frounding-math in real truncation
To: gcc-patches@gcc.gnu.org

The following honors -frounding-math when converting a FP constant
to another FP type.

2021-10-27  Richard Biener  

PR middle-end/57245
* fold-const.c (fold_convert_const_real_from_real): Honor
-frounding-math if the conversion is not exact.
* simplify-rtx.c (simplify_const_unary_operation): Do not
simplify FLOAT_TRUNCATE with sign dependent rounding.

* gcc.dg/torture/fp-double-convert-float-1.c: New testcase.
---
 gcc/fold-const.c  |  6 +++
 gcc/simplify-rtx.c|  5 +++
 .../torture/fp-double-convert-float-1.c   | 41 +++
 3 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ff23f12f33c..18950aeb760 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2139,6 +2139,12 @@ fold_convert_const_real_from_real (tree type, const_tree arg1)
   && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
 return NULL_TREE; 
 
+  /* With flag_rounding_math we should respect the current rounding mode
+ unless the conversion is exact.  */
+  if (HONOR_SIGN_DEPENDENT_ROUNDING (arg1)
+  && !exact_real_truncate (TYPE_MODE (type), &TREE_REAL_CST (arg1)))
+return NULL_TREE;
+
   real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
   t = build_real (type, value);
 
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index bbbd6b74942..f38b6d7d31c 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2068,6 +2068,11 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
 and the operand is a signaling NaN.  */
  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
return NULL_RTX;
+ /* Or if flag_rounding_math is on and the truncation is not
+exact.  */
+ if (HONOR_SIGN_DEPENDENT_ROUNDING (mode)
+ && !exact_real_truncate (mode, &d))
+   return NULL_RTX;
  d = real_value_truncate (mode, d);
  break;
case FLOAT_EXTEND:
diff --git a/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c b/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c
new file mode 100644
index 000..ec23274ea98
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c
@@ -0,0

Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr

2021-10-28 Thread Prathamesh Kulkarni via Gcc-patches

On Fri, 22 Oct 2021 at 14:41, Prathamesh Kulkarni
 wrote:
>
> On Wed, 20 Oct 2021 at 15:05, Richard Sandiford
>  wrote:
> >
> > Prathamesh Kulkarni  writes:
> > > On Tue, 19 Oct 2021 at 19:58, Richard Sandiford
> > >  wrote:
> > >>
> > >> Prathamesh Kulkarni  writes:
> > >> > Hi,
> > >> > The attached patch emits a more verbose diagnostic for target 
> > >> > attribute that
> > >> > is an architecture extension needing a leading '+'.
> > >> >
> > >> > For the following test,
> > >> > void calculate(void) __attribute__ ((__target__ ("sve")));
> > >> >
> > >> > With patch, the compiler now emits:
> > >> > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
> > >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> > >> >   | ^~~~
> > >> >
> > >> > instead of:
> > >> > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
> > >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> > >> >   | ^~~~
> > >>
> > >> Nice :-)
> > >>
> > >> > (This isn't specific to sve though).
> > >> > OK to commit after bootstrap+test ?
> > >> >
> > >> > Thanks,
> > >> > Prathamesh
> > >> >
> > >> > diff --git a/gcc/config/aarch64/aarch64.c 
> > >> > b/gcc/config/aarch64/aarch64.c
> > >> > index a9a1800af53..975f7faf968 100644
> > >> > --- a/gcc/config/aarch64/aarch64.c
> > >> > +++ b/gcc/config/aarch64/aarch64.c
> > >> > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
> > >> >num_attrs++;
> > >> >if (!aarch64_process_one_target_attr (token))
> > >> >   {
> > >> > -   error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
> > >> > +   /* Check if token is possibly an arch extension without
> > >> > +  leading '+'.  */
> > >> > +   char *str = (char *) xmalloc (strlen (token) + 2);
> > >> > +   str[0] = '+';
> > >> > +   strcpy(str + 1, token);
> > >>
> > >> I think std::string would be better here, e.g.:
> > >>
> > >>   auto with_plus = std::string ("+") + token;
> > >>
> > >> > +   if (aarch64_handle_attr_isa_flags (str))
> > >> > + error("arch extension %<%s%> should be prepended with 
> > >> > %<+%>", token);
> > >>
> > >> Nit: should be a space before the “(”.
> > >>
> > >> In principle, a fixit hint would have been nice here, but I don't think
> > >> we have enough information to provide one.  (Just saying for the record.)
> > > Thanks for the suggestions.
> > > Does the attached patch look OK ?
> >
> > Looks good apart from a couple of formatting nits.
> > >
> > > Thanks,
> > > Prathamesh
> > >>
> > >> Thanks,
> > >> Richard
> > >>
> > >> > +   else
> > >> > + error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
> > >> > +   free (str);
> > >> > return false;
> > >> >   }
> > >> >
> > >
> > > [aarch64] PR102376 - Emit better diagnostics for arch extension in target 
> > > attribute.
> > >
> > > gcc/ChangeLog:
> > >   PR target/102376
> > >   * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): Change 
> > > str's
> > >   type to const char *.
> > >   (aarch64_process_target_attr): Check if token is possibly an arch 
> > > extension
> > >   without leading '+' and emit diagnostic accordingly.
> > >
> > > gcc/testsuite/ChangeLog:
> > >   PR target/102376
> > >   * gcc.target/aarch64/pr102376.c: New test.
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index a9a1800af53..b72079bc466 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str)
> > > modified.  */
> > >
> > >  static bool
> > > -aarch64_handle_attr_isa_flags (char *str)
> > > +aarch64_handle_attr_isa_flags (const char *str)
> > >  {
> > >enum aarch64_parse_opt_result parse_res;
> > >uint64_t isa_flags = aarch64_isa_flags;
> > > @@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args)
> > >num_attrs++;
> > >if (!aarch64_process_one_target_attr (token))
> > >   {
> > > > -   error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
> > > +   /* Check if token is possibly an arch extension without
> > > +  leading '+'.  */
> > > +   auto with_plus = std::string("+") + token;
> >
> > Should be a space before “(”.
> >
> > > +   if (aarch64_handle_attr_isa_flags (with_plus.c_str ()))
> > > + error ("arch extension %<%s%> should be prepended with %<+%>", 
> > > token);
> >
> > Long line, should be:
> >
> > error ("arch extension %<%s%> should be prepended with %<+%>",
> >token);
> >
> > OK with those changes, thanks.
> Thanks, the patch regressed some target attr tests because it emitted
> diagnostics twice from
> aarch64_handle_attr_isa_flags.
> So for eg, spellcheck_1.c:
> __attribute__((target ("arch=armv8-a-typo"))) void foo () {}
>
> results in:
> spellcheck_1.c:5:1: error: invalid name ("armv8

[PATCH] tree-optimization/102949 - fix base object alignment

This fixes fallout of g:4703182a06b831a9 where we now silently fail
to force alignment of a base object.  The fix is to look at the
dr_info of the group leader to be consistent with alignment analysis.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
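
As a rough illustration of the idea (a toy model only; DataRef and its
fields are invented stand-ins and not the vectorizer's dr_vec_info):
alignment is analyzed only for the first element of a group, so any
other member has to be redirected to its leader before the
base-alignment decision is read.

```cpp
#include <cassert>

// Toy stand-in for a data reference; all names here are hypothetical.
struct DataRef {
  static constexpr int kUninitialized = -1;
  int misalignment = kUninitialized;      // analyzed only for the group leader
  bool base_misaligned = false;           // base object needs forced alignment
  const DataRef* group_leader = nullptr;  // null when the access is not grouped
};

// Redirect a grouped access to its leader and insist that the record we end
// up with has analyzed alignment, instead of silently bailing out.
const DataRef& alignment_source(const DataRef& dr)
{
  const DataRef& src = dr.group_leader ? *dr.group_leader : dr;
  assert(src.misalignment != DataRef::kUninitialized);
  return src;
}

// True if the base object of DR still has to be realigned.
bool must_realign_base(const DataRef& dr)
{
  return alignment_source(dr).base_misaligned;
}
```

In this model a non-leader member with an unanalyzed misalignment still
reports its leader's base_misaligned flag, which is the behaviour the
early return used to lose.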

2021-10-28  Richard Biener  

PR tree-optimization/102949
* tree-vect-stmts.c (ensure_base_align): Look at the
dr_info of a group leader and assert we are looking at
one with analyzed alignment.
---
 gcc/tree-vect-stmts.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index bf07e7a9495..03cc7267cf8 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6338,8 +6338,12 @@ vectorizable_operation (vec_info *vinfo,
 static void
 ensure_base_align (dr_vec_info *dr_info)
 {
-  if (dr_info->misalignment == DR_MISALIGNMENT_UNINITIALIZED)
-return;
+  /* Alignment is only analyzed for the first element of a DR group,
+ use that to look at base alignment we need to enforce.  */
+  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
+dr_info = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
+
+  gcc_assert (dr_info->misalignment != DR_MISALIGNMENT_UNINITIALIZED);
 
   if (dr_info->base_misaligned)
 {
-- 
2.31.1

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

On Mon, 18 Oct 2021, Jiufu Guo wrote:

> With reference to the discussions in:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html
> 
> Based on the patches in the above discussions, we may draft a patch to fix
> the issue.
> 
> In this patch, to make sure it is OK to change '{b0,s0} op {b1,s1}' to
> '{b0,s0-s1} op {b1,0}', we also compute the condition under which neither
> IV overflows or wraps: the niter of '{b0,s0-s1} op {b1,0}' must be less
> than the niter until iv0 or iv1 wraps.
> 
> Does this patch make sense?

Hum, the patch is mighty complex :/  I'm not sure we can throw
artificial IVs at number_of_iterations_cond and expect a meaningful
result.

ISTR the problem is with number_of_iterations_ne[_max], but I would
have to go and dig in myself again for a full recap of the problem.
I did plan to do that, but not before stage3 starts.
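
To make the concern concrete, here is a small standalone simulation
(not GCC code; the function names are made up) of the rewrite under
discussion, using 8-bit wrapping IVs.  It counts the iterations of the
original two-IV loop and of the combined form '{b0,s0-s1} < {b1,0}',
so the two can be compared for inputs where an IV wraps.

```cpp
#include <cstdint>

// Iterations of
//   for (iv0 = b0, iv1 = b1; iv0 < iv1; iv0 += s0, iv1 += s1)
// with 8-bit wrapping IVs.  CAP guards against loops that never exit.
unsigned count_two_ivs(std::uint8_t b0, std::uint8_t s0,
                       std::uint8_t b1, std::uint8_t s1,
                       unsigned cap = 100000)
{
  std::uint8_t iv0 = b0, iv1 = b1;
  unsigned n = 0;
  while (iv0 < iv1 && n < cap)
    {
      iv0 = static_cast<std::uint8_t>(iv0 + s0);
      iv1 = static_cast<std::uint8_t>(iv1 + s1);
      ++n;
    }
  return n;
}

// Same loop after folding both steps into iv0: {b0, s0 - s1} < {b1, 0}.
unsigned count_combined(std::uint8_t b0, std::uint8_t s0,
                        std::uint8_t b1, std::uint8_t s1,
                        unsigned cap = 100000)
{
  const std::uint8_t step = static_cast<std::uint8_t>(s0 - s1);
  return count_two_ivs(b0, step, b1, 0, cap);
}
```

For {0,2} < {100,1} neither IV wraps and both forms run 100 iterations.
For {0,2} < {200,1} iv1 wraps to 0 after 56 steps, so the original loop
exits there while the combined form keeps going until iv0 reaches 200.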

Thanks,
Richard.


> BR,
> Jiufu Guo
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/100740
>   * tree-ssa-loop-niter.c (number_of_iterations_cond): Add
>   assume condition for combining of two IVs
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.c-torture/execute/pr100740.c: New test.
> ---
>  gcc/tree-ssa-loop-niter.c | 103 +++---
>  .../gcc.c-torture/execute/pr100740.c  |  11 ++
>  2 files changed, 99 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c
> 
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 75109407124..f2987a4448d 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -1863,29 +1863,102 @@ number_of_iterations_cond (class loop *loop,
>  
>   provided that either below condition is satisfied:
>  
> -   a) the test is NE_EXPR;
> -   b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
> +   a) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
> +   b) assumptions in below table also need to be satisfied.
> +
> + | iv0 | iv1 | assum (iv0 + |-+-+-+-|
> + | (b0,2)  | (b1,1)  | before iv1 overflow | before iv1 overflow |
> + | (b0,2)  | (b1,-1) | true| true|
> + | (b0,-1) | (b1,-2) | before iv0 overflow | before iv0 overflow |
> + | | | | |
> + | (b0,1)  | (b1,2)  | false   | before iv0 overflow |
> + | (b0,-1) | (b1,2)  | false   | true|
> + | (b0,-2) | (b1,-1) | false   | before iv1 overflow |
> +   'true' in above table means no need additional condition.
> +   'false' means this case can not satify the transform.
> +   The first three rows: iv0->step > iv1->step;
> +   The second three rows: iv0->step < iv1->step.
>  
>   This rarely occurs in practice, but it is simple enough to manage.  */
>if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
>  {
> +  if (TREE_CODE (iv0->step) != INTEGER_CST
> +   || TREE_CODE (iv1->step) != INTEGER_CST)
> + return false;
> +  if (!iv0->no_overflow || !iv1->no_overflow)
> + return false;
> +
>tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
> -  tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
> -iv0->step, iv1->step);
> -
> -  /* No need to check sign of the new step since below code takes care
> -  of this well.  */
> -  if (code != NE_EXPR
> -   && (TREE_CODE (step) != INTEGER_CST
> -   || !iv0->no_overflow || !iv1->no_overflow))
> +  tree step
> + = fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
> +
> +  if (code != NE_EXPR && tree_int_cst_sign_bit (step))
>   return false;
>  
> -  iv0->step = step;
> -  if (!POINTER_TYPE_P (type))
> - iv0->no_overflow = false;
> +  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
> +  bool positive1 = !tree_int_cst_sign_bit (iv1->step);
>  
> -  iv1->step = build_int_cst (step_type, 0);
> -  iv1->no_overflow = true;
> +  /* Cases in rows 2 and 4 of above table.  */
> +  if ((positive0 && !positive1) || (!positive0 && positive1))
> + {
> +   iv0->step = step;
> +   iv1->step = build_int_cst (step_type, 0);
> +   return number_of_iterations_cond (loop, type, iv0, code, iv1,
> + niter, only_exit, every_iteration);
> + }
> +
> +  affine_iv i_0, i_1;
> +  class tree_niter_desc num;
> +  i_0 = *iv0;
> +  i_1 = *iv1;
> +  i_0.step = step;
> +  i_1.step = build_int_cst (step_type, 0);
> +  if (!number_of_iterations_cond (loop, type, &i_0, code, &i_1, &num,
> +   only_exit,

Re: [PATCH] middle-end/57245 - honor -frounding-math in real truncation

On Thu, Oct 28, 2021 at 10:11:36AM +0200, Richard Biener wrote:
> 2021-10-27  Richard Biener  
> 
>   PR middle-end/57245
>   * fold-const.c (fold_convert_const_real_from_real): Honor
>   -frounding-math if the conversion is not exact.
>   * simplify-rtx.c (simplify_const_unary_operation): Do not
>   simplify FLOAT_TRUNCATE with sign dependent rounding.
> 
>   * gcc.dg/torture/fp-double-convert-float-1.c: New testcase.

LGTM, thanks.

Jakub

[PATCH] match.pd: Optimize MIN_EXPR <addr1, addr2> etc. if addr1 < addr2 would be simplified [PR102951]

Hi!

This patch factors the decision whether an address comparison can be folded
out of the simple comparison simplification in match.pd into a helper and
uses it both there and in a new minmax simplification, such that we fold e.g.
  MAX (&a[2], &a[1])
etc.
Some of the Wstringop-overflow-62.c changes might look weird, but that
seems to be mainly due to gimple_fold_builtin_memset not bothering to
copy over location, will fix that incrementally.
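
For illustration, a minimal sketch of source that produces this kind of
min/max over two addresses with the same base (the array and function
names are placeholders, and whether a particular compiler version folds
it is not claimed here):

```cpp
#include <algorithm>

int a[4];

// Both operands are addresses into the same object, so their order is
// decided by the constant offsets alone: MAX (&a[2], &a[1]) is &a[2].
const int* larger_address()
{
  return std::max(&a[2], &a[1]);
}

// Likewise MIN (&a[2], &a[1]) is &a[1].
const int* smaller_address()
{
  return std::min(&a[2], &a[1]);
}
```

Since the bases are known to be equal, the result depends only on
comparing the offsets, which corresponds to the case where
address_compare returns 1.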

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-10-28  Jakub Jelinek  

PR tree-optimization/102951
* fold-const.h (address_compare): Declare.
* fold-const.c (address_compare): New function.
* match.pd (cmp (convert1?@2 addr@0) (convert2? addr@1)): Use
address_compare helper.
(minmax cmp (convert1?@2 addr@0) (convert2?@3 addr@1)): New
simplification.

* gcc.dg/tree-ssa/pr102951.c: New test.
* gcc.dg/Wstringop-overflow-62.c: Adjust expected diagnostics.

--- gcc/fold-const.h.jj 2021-06-14 12:27:18.572411152 +0200
+++ gcc/fold-const.h 2021-10-27 11:54:50.781412075 +0200
@@ -213,6 +213,8 @@ extern bool negate_mathfn_p (combined_fn
 extern const char *getbyterep (tree, unsigned HOST_WIDE_INT *);
 extern const char *c_getstr (tree);
 extern wide_int tree_nonzero_bits (const_tree);
+extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
+   poly_int64 &, poly_int64 &, bool);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
--- gcc/fold-const.c.jj 2021-08-11 23:43:59.195893727 +0200
+++ gcc/fold-const.c 2021-10-27 12:16:26.504267476 +0200
@@ -16473,6 +16473,132 @@ tree_nonzero_bits (const_tree t)
   return wi::shwi (-1, TYPE_PRECISION (TREE_TYPE (t)));
 }
 
+/* Helper function for address compare simplifications in match.pd.
+   OP0 and OP1 are ADDR_EXPR operands being compared by CODE.
+   BASE0, BASE1, OFF0 and OFF1 are set by the function.
+   GENERIC is true if GENERIC folding and false for GIMPLE folding.
+   Returns 0 if OP0 is known to be unequal to OP1 regardless of OFF{0,1},
+   1 if bases are known to be equal and OP0 cmp OP1 depends on OFF0 cmp OFF1,
+   and 2 if unknown.  */
+
+int
+address_compare (tree_code code, tree type, tree op0, tree op1,
+tree &base0, tree &base1, poly_int64 &off0, poly_int64 &off1,
+bool generic)
+{
+  gcc_checking_assert (TREE_CODE (op0) == ADDR_EXPR);
+  gcc_checking_assert (TREE_CODE (op1) == ADDR_EXPR);
+  base0 = get_addr_base_and_unit_offset (TREE_OPERAND (op0, 0), &off0);
+  base1 = get_addr_base_and_unit_offset (TREE_OPERAND (op1, 0), &off1);
+  if (base0 && TREE_CODE (base0) == MEM_REF)
+{
+  off0 += mem_ref_offset (base0).force_shwi ();
+  base0 = TREE_OPERAND (base0, 0);
+}
+  if (base1 && TREE_CODE (base1) == MEM_REF)
+{
+  off1 += mem_ref_offset (base1).force_shwi ();
+  base1 = TREE_OPERAND (base1, 0);
+}
+  if (base0 == NULL_TREE || base1 == NULL_TREE)
+return 2;
+
+  int equal = 2;
+  /* Punt in GENERIC on variables with value expressions;
+ the value expressions might point to fields/elements
+ of other vars etc.  */
+  if (generic
+  && ((VAR_P (base0) && DECL_HAS_VALUE_EXPR_P (base0))
+ || (VAR_P (base1) && DECL_HAS_VALUE_EXPR_P (base1))))
+return 2;
+  else if (decl_in_symtab_p (base0) && decl_in_symtab_p (base1))
+{
+  symtab_node *node0 = symtab_node::get_create (base0);
+  symtab_node *node1 = symtab_node::get_create (base1);
+  equal = node0->equal_address_to (node1);
+}
+  else if ((DECL_P (base0)
+   || TREE_CODE (base0) == SSA_NAME
+   || TREE_CODE (base0) == STRING_CST)
+  && (DECL_P (base1)
+  || TREE_CODE (base1) == SSA_NAME
+  || TREE_CODE (base1) == STRING_CST))
+equal = (base0 == base1);
+  if (equal == 1)
+{
+  if (code == EQ_EXPR
+ || code == NE_EXPR
+ /* If the offsets are equal we can ignore overflow.  */
+ || known_eq (off0, off1)
+ || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
+ /* Or if we compare using pointers to decls or strings.  */
+ || (POINTER_TYPE_P (type)
+ && (DECL_P (base0) || TREE_CODE (base0) == STRING_CST)))
+   return 1;
+  return 2;
+}
+  if (equal != 0)
+return equal;
+  if (code != EQ_EXPR && code != NE_EXPR)
+return 2;
+
+  HOST_WIDE_INT ioff0 = -1, ioff1 = -1;
+  off0.is_constant (&ioff0);
+  off1.is_constant (&ioff1);
+  if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
+   || (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
+   || (TREE_CODE (base0) == STRING_CST
+  && TREE_CODE (base1) == STRING_CST
+  && ioff0 >= 0 && ioff1 >= 0
+  && ioff0 < TREE_STRING_LENGTH (base0)
+  && ioff1 < TREE_STRING_LENGTH (base1)
+ /* This is a too conservative test tha

[PATCH] c++, v2: Implement DR2351 - void{} [PR102820]

On Wed, Oct 27, 2021 at 04:58:53PM -0400, Jason Merrill wrote:
> On 10/21/21 04:42, Jakub Jelinek wrote:
> > Hi!
> > 
> > Here is an attempt to implement DR2351 - void{} - where void{} after
> > pack expansion is considered valid and the same thing as void().
> > For templates, dunno if we have some better way to check if a CONSTRUCTOR
> > might be empty after pack expansion.  Would that be only if the constructor
> > contains only EXPR_PACK_EXPANSION elements and nothing else, or something
> > else too?
> 
> I think that's the only case.  For template args there's the
> pack_expansion_args_count function, but I don't think there's anything
> similar for constructor elts; please feel free to add it.

Ok.  But counting how many packs the CONSTRUCTOR_ELTS contain and then
comparing that number against CONSTRUCTOR_NELTS seems unnecessarily
expensive if there are many elements.  For what the DR2351 code needs, we
can stop as soon as we see the first non-pack element.
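
The early-exit idea can be sketched outside of GCC roughly as follows.
This is a toy model under stated assumptions: Elt, its
is_pack_expansion flag and the in_template parameter are invented
stand-ins for the constructor elements, PACK_EXPANSION_P and
processing_template_decl.

```cpp
#include <vector>

// Hypothetical stand-in for one element of a braced initializer.
struct Elt {
  bool is_pack_expansion;
};

// True if the initializer could end up with no elements once packs are
// expanded.  Rather than counting packs and comparing against the total,
// the loop stops at the first element that is not a pack expansion.
bool maybe_zero_elts(const std::vector<Elt>& elts, bool in_template)
{
  if (elts.empty())
    return true;
  if (!in_template)
    return false;
  for (const Elt& e : elts)
    if (!e.is_pack_expansion)
      return false;  // a non-pack element always survives expansion
  return true;
}
```

The cost is bounded by the position of the first non-pack element, so a
long initializer that starts with an ordinary expression is rejected
after a single step.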

So what about this if it passes bootstrap/regtest?

2021-10-28  Jakub Jelinek  

PR c++/102820
* semantics.c (maybe_zero_constructor_nelts): New function.
(finish_compound_literal): Implement DR2351 - void{}.
If type is cv void and compound_literal has no elements, return
void_node.  If type is cv void and compound_literal might have no
elements after expansion, handle it like other dependent compound
literals.

* g++.dg/cpp0x/dr2351.C: New test.

--- gcc/cp/semantics.c.jj   2021-10-27 09:16:41.161600606 +0200
+++ gcc/cp/semantics.c  2021-10-28 13:06:59.325791588 +0200
@@ -3079,6 +3079,24 @@ finish_unary_op_expr (location_t op_loc,
   return result;
 }
 
+/* Return true if CONSTRUCTOR EXPR after pack expansion could have no
+   elements.  */
+
+static bool
+maybe_zero_constructor_nelts (tree expr)
+{
+  if (CONSTRUCTOR_NELTS (expr) == 0)
+return true;
+  if (!processing_template_decl)
+return false;
+  unsigned int i;
+  tree val;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (expr), i, val)
+if (!PACK_EXPANSION_P (val))
+  return false;
+  return true;
+}
+
 /* Finish a compound-literal expression or C++11 functional cast with aggregate
initializer.  TYPE is the type to which the CONSTRUCTOR in COMPOUND_LITERAL
is being cast.  */
@@ -3104,9 +3122,20 @@ finish_compound_literal (tree type, tree
 
   if (!TYPE_OBJ_P (type))
 {
-  if (complain & tf_error)
-   error ("compound literal of non-object type %qT", type);
-  return error_mark_node;
+  /* DR2351 */
+  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
+   return void_node;
+  else if (VOID_TYPE_P (type)
+  && processing_template_decl
+  && maybe_zero_constructor_nelts (compound_literal))
+   /* If there are only packs in compound_literal, it could
+  be void{} after pack expansion.  */;
+  else
+   {
+ if (complain & tf_error)
+   error ("compound literal of non-object type %qT", type);
+ return error_mark_node;
+   }
 }
 
   if (template_placeholder_p (type))
--- gcc/testsuite/g++.dg/cpp0x/dr2351.C.jj  2021-10-28 12:59:27.987120315 +0200
+++ gcc/testsuite/g++.dg/cpp0x/dr2351.C 2021-10-28 13:15:20.532760871 +0200
@@ -0,0 +1,51 @@
+// DR2351
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  void{};
+  void();
+}
+
+template <typename ...T>
+void
+bar (T... t)
+{
+  void{t...};
+  void(t...);
+}
+
+void
+baz ()
+{
+  bar ();
+}
+
+template <typename ...T>
+void
+qux (T... t)
+{
+  void{t...};  // { dg-error "compound literal of non-object type" }
+}
+
+void
+corge ()
+{
+  qux (1, 2);
+}
+
+template <typename ...T>
+void
+garply (T... t)
+{
+  void{t..., t..., t...};
+  void(t..., t..., t...);
+}
+
+template <typename ...T>
+void
+grault (T... t)
+{
+  void{t..., 1};   // { dg-error "compound literal of non-object type" }
+}


Jakub

[PATCH v4 0/1] implement TLS register based stack canary for ARM

2021-10-28 Thread Ard Biesheuvel via Gcc-patches

Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352

In the Linux kernel, user processes calling into the kernel are
essentially threads running in the same address space, of a program that
never terminates. This means that using a global variable for the stack
protector canary value is problematic on SMP systems, as we can never
change it unless we reboot the system. (Processes that sleep for any
reason will do so on a call into the kernel, which means that there will
always be live kernel stack frames carrying copies of the canary taken
when the function was entered)

AArch64 implements -mstack-protector-guard=sysreg for this purpose, as
this permits the kernel to use different memory addresses for the stack
canary for each CPU, and context switch the chosen system register with
the rest of the process, allowing each process to use its own unique
value for the stack canary.

This patch implements something similar, but for the 32-bit ARM kernel,
which will start using the user space TLS register TPIDRURO to index
per-process metadata while running in the kernel. This means we can just
add an offset to TPIDRURO to obtain the address from which to load the
canary value.
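
As a conceptual model only (plain user-space C++, not the ARM code the
compiler would emit; ThreadInfo and its layout are hypothetical), the
scheme amounts to reading the canary at a fixed offset from a
per-thread base pointer instead of from one global variable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical per-thread metadata block; the TLS register would hold its
// address while the thread runs in the kernel.
struct ThreadInfo {
  std::uintptr_t flags;
  std::uintptr_t canary;
};

// Load the canary from TLS_BASE + GUARD_OFFSET, mirroring "add an offset to
// the TLS register to obtain the address from which to load the canary".
std::uintptr_t load_canary(const void* tls_base, std::size_t guard_offset)
{
  assert(tls_base != nullptr);
  std::uintptr_t value;
  std::memcpy(&value,
              static_cast<const unsigned char*>(tls_base) + guard_offset,
              sizeof value);
  return value;
}

// Convenience wrapper that uses the offset of the canary field.
std::uintptr_t canary_of(const ThreadInfo& ti)
{
  return load_canary(&ti, offsetof(ThreadInfo, canary));
}
```

Two ThreadInfo blocks with different canary fields yield different
values through the same code path, which is what lets each process use
its own canary.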

Changes since v3:
- force a reload of the TLS register before performing the stack
  protector check, so that we never rely on the stack for the address of
  the canary 
Changes since v2:
- fix the template for stack_protect_test_tls so it correctly conveys
  the fact that it sets the Z flag

Comments/suggestions welcome.

Cc: Keith Packard 
Cc: thomas.preudho...@celest.fr
Cc: adhemerval.zane...@linaro.org
Cc: Qing Zhao 
Cc: Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org

Ard Biesheuvel (1):
  [ARM] Add support for TLS register based stack protector canary access

 gcc/config/arm/arm-opts.h   |  6 ++
 gcc/config/arm/arm-protos.h |  2 +
 gcc/config/arm/arm.c| 55 +++
 gcc/config/arm/arm.md   | 71 +++-
 gcc/config/arm/arm.opt  | 22 ++
 gcc/doc/invoke.texi |  9 +++
 6 files changed, 163 insertions(+), 2 deletions(-)

-- 
2.30.2

[PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-10-28 Thread Ard Biesheuvel via Gcc-patches

Add support for accessing the stack canary value via the TLS register,
so that multiple threads running in the same address space can use
distinct canary values. This is intended for the Linux kernel running in
SMP mode, where processes entering the kernel are essentially threads
running the same program concurrently: using a global variable for the
canary in that context is problematic because it can never be rotated,
and so the OS is forced to use the same value as long as it remains up.

Using the TLS register to index the stack canary helps with this, as it
allows each CPU to context switch the TLS register along with the rest
of the process, permitting each process to use its own value for the
stack canary.

2021-10-28 Ard Biesheuvel 

* config/arm/arm-opts.h (enum stack_protector_guard): New
* config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
New
* config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
(arm_option_override_internal): Handle and put in error checks
for stack protector guard options.
(arm_option_reconfigure_globals): Likewise
(arm_stack_protect_tls_canary_mem): New
(arm_stack_protect_guard): New
* config/arm/arm.md (stack_protect_set): New
(stack_protect_set_tls): Likewise
(stack_protect_test): Likewise
(stack_protect_test_tls): Likewise
(reload_tp_hard): Likewise
* config/arm/arm.opt (-mstack-protector-guard): New
(-mstack-protector-guard-offset): New.
* doc/invoke.texi: Document new options

Signed-off-by: Ard Biesheuvel 
---
 gcc/config/arm/arm-opts.h   |  6 ++
 gcc/config/arm/arm-protos.h |  2 +
 gcc/config/arm/arm.c| 55 +++
 gcc/config/arm/arm.md   | 71 +++-
 gcc/config/arm/arm.opt  | 22 ++
 gcc/doc/invoke.texi |  9 +++
 6 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
index 5c4b62f404f7..581ba3c4fbbb 100644
--- a/gcc/config/arm/arm-opts.h
+++ b/gcc/config/arm/arm-opts.h
@@ -69,4 +69,10 @@ enum arm_tls_type {
   TLS_GNU,
   TLS_GNU2
 };
+
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard {
+  SSP_TLSREG,  /* per-thread canary in TLS register */
+  SSP_GLOBAL   /* global canary */
+};
 #endif
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9b1f61394ad7..d8d605920c97 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
 extern rtx arm_load_tp (rtx);
 extern bool arm_coproc_builtin_available (enum unspecv);
 extern bool arm_coproc_ldc_stc_legitimate_address (rtx);
+extern rtx arm_stack_protect_tls_canary_mem (bool);
+
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c4ff06b087eb..6a659d81a6fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
+#undef TARGET_STACK_PROTECT_GUARD
+#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts,
   if (TARGET_THUMB2_P (opts->x_target_flags))
 opts->x_inline_asm_unified = true;
 
+  if (arm_stack_protector_guard == SSP_GLOBAL
+  && opts->x_arm_stack_protector_guard_offset_str)
+{
+  error ("incompatible options %'-mstack-protector-guard=global%' and "
+"%'-mstack-protector-guard-offset=%qs%'",
+arm_stack_protector_guard_offset_str);
+}
+
+  if (opts->x_arm_stack_protector_guard_offset_str)
+{
+  char *end;
+  const char *str = arm_stack_protector_guard_offset_str;
+  errno = 0;
+  long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0);
+  if (!*str || *end || errno)
+   error ("%qs is not a valid offset in %qs", str,
+  "-mstack-protector-guard-offset=");
+  arm_stack_protector_guard_offset = offs;
+}
+
 #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
   SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
 #endif
@@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void)
   else
target_thread_pointer = TP_SOFT;
 }
+
+  if (arm_stack_protector_guard == SSP_TLSREG
+  && target_thread_pointer != TP_CP15)
+error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
 }
 
 /* Perform some validation between the desired architecture and the rest of the
@@ -8087,6 +8114,22 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
 }
 
 
+rtx
+arm_stack_protect_tls_canary_mem (bool reloa

[PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

This makes us honor -frounding-math for integer to float conversions
and avoid constant folding when such conversion is not exact.
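
For illustration only, the exactness test can be sketched in plain C as a round trip: convert to double, convert back, and compare. This is not the GCC implementation (the patch uses real_to_integer and wide_int); uint64_to_double_is_exact and check_examples are hypothetical names, and the sketch assumes a 53-bit double mantissa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if X survives a uint64 -> double -> uint64 round trip.
   When it does not, the conversion had to round, so its result depends
   on the rounding mode and must not be folded under -frounding-math.  */
bool
uint64_to_double_is_exact (uint64_t x)
{
  /* The volatile store discards any excess precision.  */
  volatile double d = (double) x;
  double r = d;

  /* 2^64 does not fit in uint64_t, so converting it back would be
     undefined; this plays the role of the "fail" flag in the patch.  */
  if (r >= 0x1p64)
    return false;
  return (uint64_t) r == x;
}

/* A few values, assuming __DBL_MANT_DIG__ == 53.  */
void
check_examples (void)
{
  assert (uint64_to_double_is_exact (UINT64_C (1) << 53));
  assert (!uint64_to_double_is_exact ((UINT64_C (1) << 53) + 1));
  assert (!uint64_to_double_is_exact (UINT64_C (0x7fffffffffffffff)));
}
```

The last value is the one the new testcase uses: it is not representable in a double, so the result depends on the rounding mode in effect at run time.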

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-10-28  Richard Biener  

PR middle-end/84407
* fold-const.c (fold_convert_const): Avoid int to float
constant folding with -frounding-math and inexact result.
* simplify-rtx.c (simplify_const_unary_operation): Likewise.

* gcc.dg/torture/fp-uint64-convert-double-1.c: New testcase.
---
 gcc/fold-const.c  | 15 +++-
 gcc/simplify-rtx.c| 13 
 .../torture/fp-uint64-convert-double-1.c  | 74 +++
 3 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 18950aeb760..c7daf871125 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2290,7 +2290,20 @@ fold_convert_const (enum tree_code code, tree type, tree arg1)
   else if (TREE_CODE (type) == REAL_TYPE)
 {
   if (TREE_CODE (arg1) == INTEGER_CST)
-   return build_real_from_int_cst (type, arg1);
+   {
+ tree res = build_real_from_int_cst (type, arg1);
+ /* Avoid the folding if flag_rounding_math is on and the
+conversion is not exact.  */
+ if (HONOR_SIGN_DEPENDENT_ROUNDING (type))
+   {
+ bool fail = false;
+ wide_int w = real_to_integer (&TREE_REAL_CST (res), &fail,
+   TYPE_PRECISION (TREE_TYPE (arg1)));
+ if (fail || wi::ne_p (w, wi::to_wide (arg1)))
+   return NULL_TREE;
+   }
+ return res;
+   }
   else if (TREE_CODE (arg1) == REAL_CST)
return fold_convert_const_real_from_real (type, arg1);
   else if (TREE_CODE (arg1) == FIXED_CST)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index f38b6d7d31c..a16395befcd 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1917,6 +1917,19 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
 return 0;
 
   d = real_value_truncate (mode, d);
+
+  /* Avoid the folding if flag_rounding_math is on and the
+conversion is not exact.  */
+  if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+   {
+ bool fail = false;
+ wide_int w = real_to_integer (&d, &fail,
+   GET_MODE_PRECISION
+ (as_a <scalar_int_mode> (op_mode)));
+ if (fail || wi::ne_p (w, wide_int (rtx_mode_t (op, op_mode))))
+   return 0;
+   }
+
   return const_double_from_real_value (d, mode);
 }
   else if (code == UNSIGNED_FLOAT && CONST_SCALAR_INT_P (op))
diff --git a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
new file mode 100644
index 000..b40a16a2257
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
@@ -0,0 +1,74 @@
+/* PR84407 */
+/* { dg-do run } */
+/* { dg-require-effective-target fenv } */
+/* { dg-additional-options "-frounding-math" } */
+
+#include <fenv.h>
+#include <stdlib.h>
+
+void __attribute__((noipa))
+fooa ()
+{
+#if __DBL_MANT_DIG__ == 53
+#ifdef FE_TONEAREST
+  fesetround(FE_TONEAREST);
+  __UINT64_TYPE__ x = 0x7fffffffffffffff;
+  double f = x;
+  if (f != 0x1p+63)
+abort ();
+#endif
+#endif
+}
+
+void __attribute__((noipa))
+foob ()
+{
+#if __DBL_MANT_DIG__ == 53
+#ifdef FE_DOWNWARD
+  fesetround(FE_DOWNWARD);
+  __UINT64_TYPE__ x = 0x7fffffffffffffff;
+  double f = x;
+  if (f != 0x1.fffffffffffffp+62)
+abort ();
+#endif
+#endif
+}
+
+void __attribute__((noipa))
+fooc ()
+{
+#if __DBL_MANT_DIG__ == 53
+#ifdef FE_UPWARD
+  fesetround(FE_UPWARD);
+  __UINT64_TYPE__ x = 0x7fffffffffffffff;
+  double f = x;
+  if (f != 0x1p+63)
+abort ();
+#endif
+#endif
+}
+
+void __attribute__((noipa))
+food ()
+{
+#if __DBL_MANT_DIG__ == 53
+#ifdef FE_TOWARDZERO
+  fesetround(FE_TOWARDZERO);
+  __UINT64_TYPE__ x = 0x7fffffffffffffff;
+  double f = x;
+  if (f != 0x1.fffffffffffffp+62)
+abort ();
+#endif
+#endif
+}
+
+
+int
+main ()
+{
+  fooa ();
+  foob ();
+  fooc ();
+  food ();
+  return 0;
+}
-- 
2.31.1

[Patch 1/8, Arm, AArch64, GCC] Refactor mbranch-protection option parsing and make it common to AArch32 and AArch64 backends. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]

> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 1:58 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.
> 
> On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > Add -mbranch-protection option and its associated parsing routines.
> > This option enables the code-generation of pointer signing and
> > authentication instructions in function prologues and epilogues.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/ChangeLog:
> >
> > * common/config/arm/arm-common.c
> >  (arm_print_hit_for_pacbti_option): New.
> >  (arm_progress_next_token): New.
> >  (arm_parse_pac_ret_clause): New routine for parsing the
> > pac-ret clause for -mbranch-protection.
> > (arm_parse_pacbti_option): New routine to parse all the options
> > to -mbranch-protection.
> > * config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
> > * config/arm/arm.c (arm_configure_build_target): Handle option
> > to -mbranch-protection.
> > * config/arm/arm.opt (mbranch-protection). New.
> > (arm_enable_pacbti): New.
> >
> 
> You're missing documentation for invoke.texi.
> 
> Also, how does this differ from the existing option in aarch64?  Can the code
> from that be adapted to be made common to both targets rather than doing
> a new implementation?
> 
> Finally, there are far too many manifest constants in this patch, they need
> replacing with enums or #defines as appropriate if we cannot share the
> aarch64 code.
> 

Thanks for the reviews.

This change refactors all the mbranch-protection option parsing code and types
to make them common to both the AArch32 and AArch64 backends.  It also pulls in
some supporting types from AArch64 (aarch_parse_opt_result) so that they can be
shared.  The significant changes in this patch are the movement of all branch
protection parsing routines, along with their supporting data types and static
data structures, from aarch64.c to aarch-common.c.  This patch also pre-declares
in the aarch32 back end the variables and types needed by the moved code (the
function sign scope and key), to prepare for the upcoming patches in this series
that add -mbranch-protection parsing to the aarch32 back end.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c: Include aarch-common.h.
(all_architectures): Fix comment.
(aarch64_parse_extension): Rename return type, enum value names.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Rename
factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
Also rename corresponding enum values.
* config/aarch64/aarch64-opts.h (aarch64_function_type): Factor out
aarch64_function_type and move it to common code as aarch_function_type
in aarch-common.h.
* config/aarch64/aarch64-protos.h: Include common types header, move out
types aarch64_parse_opt_result and aarch64_key_type to aarch-common.h.
* config/aarch64/aarch64.c: Move mbranch-protection parsing types and
functions out into aarch-common.h and aarch-common.c.  Fix up all the
name changes resulting from the move.
* config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name change
and enum value.
* config/aarch64/aarch64.opt: Include aarch-common.h to import type
move.  Fix up name changes from factoring out common code and data.
* config/arm/aarch-common-protos.h: Export factored out routines to both
backends.
* config/arm/aarch-common.c: Include newly factored out types.  Move all
mbranch-protection code and data structures from aarch64.c.
* config/arm/aarch-common.h: New header that declares types shared
between aarch32 and aarch64 backends.
* config/arm/arm-protos.h: Declare types and variables that are made
common to aarch64 and aarch32 backends - aarch_ra_sign_key,
aarch_ra_sign_scope and aarch_enable_bti.

Tested the following configurations. OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.

> R.
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 6d200a186604be2028b19ee9691e7bbf4a7be9c2..92c8f14a17466b9d6c44bdf4ede673a65f1b426f 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -30,6 +30,7 @@
 #include "opts.h"
 #include "flags.h"
 #include "diagnostic.h"
+#include "config/arm/aarch-common.h"

 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -192,11 +193,11 @@ static const struct arch_to_arch_name all_architectures[] =

 /* Parse the architecture extens

[Patch 2/8, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti. [Was RE: [Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature +pacbti.]



> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 1:29 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 1/7, Arm, GCC] Add Armv8.1-M Mainline target feature
> +pacbti.
> 
> On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
> > This feature enables pointer signing and authentication instructions
> > on M-class architectures.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/Changelog:
> >
> > * config/arm/arm-cpus.in: Define new feature pacbti.
> > * config/arm/arm.h (TARGET_HAVE_PACBTI): New.
> >
> 
> "+pacbti" needs to be documented in invoke.texi at the appropriate place.
> 

Thanks for the reviews.

This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

2021-10-25  Tejas Belagod  

gcc/Changelog:

* config/arm/arm-cpus.in: Define new feature pacbti.
* config/arm/arm.h (TARGET_HAVE_PACBTI): New.
* doc/invoke.texi: Document new feature pacbti.



Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap


> R.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d0d0d0f1c7e4176fc4aa30d82394fe938b083a59..8a0e9c79682766ee2bec3fd7ba6ed67dff69dbad 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -223,6 +223,10 @@ define feature cdecp5
 define feature cdecp6
 define feature cdecp7
 
+# M-profile control flow integrity extensions (PAC/AUT/BTI).
+# Optional from Armv8.1-M Mainline.
+define feature pacbti
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -741,6 +745,7 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add MVE
  option mve.fp add MVE_FP
+ option pacbti add pacbti
  option cdecp0 add cdecp0
  option cdecp1 add cdecp1
  option cdecp2 add cdecp2
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c15346f1bea59d70fdcb1d19545473b23b..8e6ef41f6b065217d1af3f4f1cb85b2d8fbd0dc0 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -335,6 +335,12 @@ emission of floating point pcs attributes.  */
isa_bit_mve_float) \
   && !TARGET_GENERAL_REGS_ONLY)
 
+/* Non-zero if this target supports Armv8.1-M Mainline pointer-signing
+   extension.  */
+#define TARGET_HAVE_PACBTI (arm_arch8_1m_main \
+   && bitmap_bit_p (arm_active_target.isa, \
+isa_bit_pacbti))
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71992b8c59749f5508a3c6a1b1792910652eac57..27df8cf5bee79c2abac8b81c1ac54f1c3e50c628 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20469,6 +20469,9 @@ Disable the floating-point extension.
 @item +cdecp0, +cdecp1, ... , +cdecp7
 Enable the Custom Datapath Extension (CDE) on selected coprocessors according
 to the numbers given in the options in the range 0 to 7.
+
+@item +pacbti
+Enable the Pointer Authentication and Branch Target Identification Extension.
 @end table
 
 @item  armv8-m.main

[Patch 3/8, Arm, GCC] Add option -mbranch-protection. [Was RE: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.]



> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 1:58 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 2/7, Arm, GCC] Add option -mbranch-protection.
> 
> On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > Add -mbranch-protection option and its associated parsing routines.
> > This option enables the code-generation of pointer signing and
> > authentication instructions in function prologues and epilogues.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/ChangeLog:
> >
> > * common/config/arm/arm-common.c
> >  (arm_print_hit_for_pacbti_option): New.
> >  (arm_progress_next_token): New.
> >  (arm_parse_pac_ret_clause): New routine for parsing the
> > pac-ret clause for -mbranch-protection.
> > (arm_parse_pacbti_option): New routine to parse all the options
> > to -mbranch-protection.
> > * config/arm/arm-protos.h (arm_parse_pacbti_option): Export.
> > * config/arm/arm.c (arm_configure_build_target): Handle option
> > to -mbranch-protection.
> > * config/arm/arm.opt (mbranch-protection). New.
> > (arm_enable_pacbti): New.
> >
> 
> You're missing documentation for invoke.texi.
> 
> Also, how does this differ from the existing option in aarch64?  Can the code
> from that be adapted to be made common to both targets rather than doing
> a new implementation?
> 
> Finally, there are far too many manifest constants in this patch, they need
> replacing with enums or #defines as appropriate if we cannot share the
> aarch64 code.

Thanks for the reviews.

Add -mbranch-protection option.  This option enables the code-generation of
pointer signing and authentication instructions in function prologues and
epilogues.
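
As a rough illustration of the option syntax (none, standard, pac-ret[+leaf][+bti], bti[+pac-ret[+leaf]]), here is a small stand-alone parser sketch. It is not the shared aarch-common.c code: struct bp_settings, token_is and parse_branch_protection are hypothetical names, and treating "standard" as pac-ret plus bti without leaf is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type; GCC keeps this state in other variables.  */
struct bp_settings
{
  bool pac_ret;
  bool leaf;
  bool bti;
};

/* True if the LEN characters at TOK spell exactly NAME.  */
static bool
token_is (const char *tok, size_t len, const char *name)
{
  return len == strlen (name) && strncmp (tok, name, len) == 0;
}

/* Parse ARG into *OUT.  Return false, leaving *OUT untouched, if ARG
   does not match the option syntax.  */
bool
parse_branch_protection (const char *arg, struct bp_settings *out)
{
  struct bp_settings s = { false, false, false };
  bool prev_was_pac_ret = false;
  const char *p = arg;

  assert (arg != NULL && out != NULL);

  if (strcmp (arg, "none") == 0)
    {
      *out = s;
      return true;
    }
  if (strcmp (arg, "standard") == 0)
    {
      /* Assumption: "standard" means pac-ret and bti, without leaf.  */
      s.pac_ret = true;
      s.bti = true;
      *out = s;
      return true;
    }

  for (;;)
    {
      size_t len = strcspn (p, "+");

      if (token_is (p, len, "pac-ret") && !s.pac_ret)
	{
	  s.pac_ret = true;
	  prev_was_pac_ret = true;
	}
      else if (token_is (p, len, "bti") && !s.bti)
	{
	  s.bti = true;
	  prev_was_pac_ret = false;
	}
      else if (token_is (p, len, "leaf") && prev_was_pac_ret)
	{
	  /* "leaf" is only valid directly after "pac-ret".  */
	  s.leaf = true;
	  prev_was_pac_ret = false;
	}
      else
	return false;	/* Unknown, repeated, misplaced or empty token.  */

      if (p[len] == '\0')
	break;
      p += len + 1;
    }

  *out = s;
  return true;
}
```

With this sketch, "pac-ret+leaf+bti" sets all three fields, while "bti+leaf" and the empty string are rejected.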

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_configure_build_target): Parse and validate
-mbranch-protection option and initialize appropriate data structures.
* config/arm/arm.opt: New option -mbranch-protection.
* doc/invoke.texi: Document -mbranch-protection.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a952655db80663f28f5a5d12005f2adb4702894f..946841526ee127105396097d143e755bdfc756f5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3216,6 +3216,17 @@ arm_configure_build_target (struct arm_build_target *target,
   tune_opts = strchr (opts->x_arm_tune_string, '+');
 }
 
+  if (opts->x_arm_branch_protection_string)
+{
+  aarch_validate_mbranch_protection (opts->x_arm_branch_protection_string);
+
+  if (aarch_ra_sign_key != AARCH_KEY_A)
+   {
+ warning (0, "invalid key type for %<-mbranch-protection=%>");
+ aarch_ra_sign_key = AARCH_KEY_A;
+   }
+}
+
   if (arm_selected_arch)
 {
   arm_initialize_isa (target->isa, arm_selected_arch->common.isa_bits);
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 5c5b4f3ae0699a3a9d78df40a5ab65324dcba7b9..4f2754c3e84c436f7058ea0bd1c9f517b3a63ccd 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -313,6 +313,10 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(arm_branch_cost) Init(-1)
 Cost to assume for a branch insn.
 
+mbranch-protection=
+Target RejectNegative Joined Var(arm_branch_protection_string) Save
+Use branch-protection features.
+
 mgeneral-regs-only
 Target RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses the core registers only (r0-r14).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27df8cf5bee79c2abac8b81c1ac54f1c3e50c628..7f886db008a39c44819616eb2799c01822d0aae9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -810,7 +810,9 @@ Objective-C and Objective-C++ Dialects}.
 -mpure-code @gol
 -mcmse @gol
 -mfix-cmse-cve-2021-35465 @gol
--mfdpic}
+-mfdpic @gol
+-mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+[+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
@@ -20969,6 +20971,18 @@ The opposite @option{-mno-fdpic} option is useful (and required) to
 build the Linux kernel using the same (@code{arm-*-uclinuxfdpiceabi})
 toolchain as the one used to build the userland programs.
 
+@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}][+@var{bti}]|@var{bti}[+@var{pac-ret}[+@var{leaf}]]
+@opindex mbranch-protection
+Select the branch protection features to use.
+@samp{none} is the default and turns off all types of branch protection.
+@samp{standard} turns on all types of branch protection features.  If a feature
+has additional tuning options, then @samp{standard} sets it to its s

[Patch 4/8, Arm. GCC] Add testsuite library support for PACBTI target. [Was RE: [Patch 3/7, Arm, GCC] Add testsuite library support for PACBTI target.]



> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 2:38 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 3/7, Arm, GCC] Add testsuite library support for PACBTI
> target.
> 
> On 11/10/2021 14:36, Richard Earnshaw via Gcc-patches wrote:
> > On 08/10/2021 13:17, Tejas Belagod via Gcc-patches wrote:
> >> Hi,
> >>
> >> Add targeting-checking entities for PACBTI in testsuite framework.
> >>
> >> Tested on arm-none-eabi. OK for trunk?
> >>
> >> 2021-10-04  Tejas Belagod  
> >>
> >> gcc/ChangeLog:
> >>
> >> * testsuite/lib/target-supports.exp
> >> (check_effective_target_arm_pacbti_hw): New.
> >>
> >
> > OK.
> >
> > R.
> 
> Oh, wait!  Not OK.  Needs documentation in sourcebuild.texi.
> 

Thanks for the reviews.

Add target-checking entities for PACBTI to the testsuite
framework.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* testsuite/lib/target-supports.exp
(check_effective_target_arm_pacbti_hw): New.
* doc/sourcebuild.texi: Document arm_pacbti_hw.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6a16576763006a13e946147ab1ea5b16b5bc219b..3dd1dd8d7f031720e55cf389376f1572991d8071 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2141,6 +2141,10 @@ ARM target supports options to generate instructions from ARMv8.1-M with
 the Custom Datapath Extension (CDE) and M-Profile Vector Extension (MVE).
 Some multilibs may be incompatible with these options.
 
+@item arm_pacbti_hw
+Test system supports executing Pointer Authentication and Branch Target
+Identification instructions.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1c8b1ebb86e8769e40fe88af3a4c651990dbb2a1..843397adf437700ca622ce140359b6aaa0172e42 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5064,6 +5064,22 @@ proc check_effective_target_arm_cmse_clear_ok {} {
 } "-mcmse"];
 }
 
+# Return 1 if the target supports executing PACBTI instructions, 0
+# otherwise.
+
+proc check_effective_target_arm_pacbti_hw {} {
+return [check_runtime arm_pacbti_hw_available {
+   __attribute__ ((naked)) int
+   main (void)
+   {
+ asm ("pac r12, lr, sp");
+ asm ("mov r0, #0");
+ asm ("autg r12, lr, sp");
+ asm ("bx lr");
+   }
+} ""]
+}
+
 # Return 1 if this compilation turns on string_ops_prefer_neon on.
 
 proc check_effective_target_arm_tune_string_ops_prefer_neon { } {

[Patch 5/8, Arm, GCC] Implement target feature macros for PACBTI. [Was RE: [Patch 4/7, Arm. GCC] Implement target feature macros for PACBTI.]



> -Original Message-
> From: Richard Earnshaw 
> Sent: Monday, October 11, 2021 2:58 PM
> To: Tejas Belagod ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch 4/7, Arm. GCC] Implement target feature macros for
> PACBTI.
> 
> On 08/10/2021 13:18, Tejas Belagod via Gcc-patches wrote:
> > Hi,
> >
> > This patch implements target feature macros when PACBTI is enabled
> > through the -march option or -mbranch-protection.
> >
> > Tested on arm-none-eabi. OK for trunk?
> >
> > 2021-10-04  Tejas Belagod  
> >
> > gcc/ChangeLog:
> >
> > * config/arm/arm-c.c (arm_cpu_builtins): Define
> > __ARM_FEATURE_BTI_DEFAULT and __ARM_FEATURE_PAC_DEFAULT.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
> > * gcc.target/arm/acle/pacbti-m-predef-4.c: New test.
> > * gcc.target/arm/acle/pacbti-m-predef-5.c: New test.
> >
> 
> I presume the specification for this is ACLE - please say so rather than
> making me guess.
> 

Yes, sorry, very poor description on my part. Now fixed - please see patch
description below for links to specific ACLE sections.

> 
> +  cpp_undef (pfile, "__ARM_FEATURE_BTI_DEFAULT");
> +  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
> +  if (TARGET_HAVE_PACBTI)
> +{
> +  builtin_define_with_int_value ("__ARM_FEATURE_BTI_DEFAULT",
> +  arm_enable_pacbti & 0x1);
> 
> My reading of the ACLE specification would suggest this shouldn't be
> defined if it would have a value of 0, but that's not what this code
> does.  I think it would be better to move this outside the
> TARGET_HAVE_PACBTI and use the def_or_undef approach.
> 
> +  builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT",
> +  arm_enable_pacbti >> 1);
> 
> This one is less clear, could the value ever be zero?  I guess exactly
> one of a-key and b-key must be defined and each has a separate bit.
> 

Now fixed according to what the architecture specifies.  For the M-profile
there is only one key, which means that whenever -mbranch-protection enables
return address signing, bit 0 of __ARM_FEATURE_PAC_DEFAULT is always 1.

> +}
> +
> +
> 
> Not more than one blank line at the end of a block.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
> b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
> 
> 
> Given what I've said above, I think you need to also test that
> __ARM_FEATURE_BTI_DEFAULT is defined before testing the value (and
> emitting #error if it isn't).
> 

Fixed.

This patch implements target feature macros when PACBTI is
enabled through the -march option or -mbranch-protection.
The target feature macros __ARM_FEATURE_PAC_DEFAULT and
__ARM_FEATURE_BTI_DEFAULT are specified in the Arm ACLE
(https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros?lang=en).
__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI are specified in the pull request
(https://github.com/ARM-software/acle/pull/55).
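
As a sketch of how the __ARM_FEATURE_PAC_DEFAULT value is put together and how user code might test it: bit 0 is set when the A key is used and bit 2 when leaf functions are signed as well, which matches the arm-c.c hunk in this patch. The names enum sign_scope, pac_default_value and describe_pac_default are hypothetical and only stand in for the compiler's internal state.

```c
#include <assert.h>

/* Hypothetical mirror of the signing scopes that enable the macro.  */
enum sign_scope
{
  SIGN_NON_LEAF,	/* e.g. -mbranch-protection=pac-ret */
  SIGN_ALL		/* e.g. -mbranch-protection=pac-ret+leaf */
};

/* Compute the macro value the way the arm-c.c hunk does: bit 0 for
   the A key (the only key on M-profile), bit 2 for leaf functions.  */
unsigned int
pac_default_value (enum sign_scope scope)
{
  unsigned int pac = 1;

  assert (scope == SIGN_NON_LEAF || scope == SIGN_ALL);
  if (scope == SIGN_ALL)
    pac |= 0x4;
  return pac;
}

/* How user code might inspect the macro at compile time.  */
const char *
describe_pac_default (void)
{
#if defined (__ARM_FEATURE_PAC_DEFAULT)
  return (__ARM_FEATURE_PAC_DEFAULT & 0x4)
	 ? "all functions are signed"
	 : "non-leaf functions are signed";
#else
  return "return address signing is off";
#endif
}
```

So pac-ret alone gives the value 1 and pac-ret+leaf gives 5; when return address signing is off the macro is left undefined rather than defined to 0.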

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm-c.c (arm_cpu_builtins): Define
__ARM_FEATURE_BTI_DEFAULT, __ARM_FEATURE_PAC_DEFAULT,
__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-4.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-5.c: New test.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index cc7901bca8dc9c5c27ed6afc5bc26afd42689e6d..98d47ad4cc6e88aa7401429a809c555c5aadc15f 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -193,6 +193,24 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_PAUTH", TARGET_HAVE_PACBTI);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BTI", TARGET_HAVE_PACBTI);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BTI_DEFAULT",
+ aarch_enable_bti == 1);
+
+  cpp_undef (pfile, "__ARM_FEATURE_PAC_DEFAULT");
+  if (aarch_ra_sign_scope != AARCH_FUNCTION_NONE)
+  {
+unsigned int pac = 1;
+
+gcc_assert (aarch_ra_sign_key == AARCH_KEY_A);
+
+if (aarch_ra_sign_scope == AARCH_FUNCTION_ALL)
+  pac |= 0x4;
+
+builtin_define_with_int_value ("__ARM_FEATURE_PAC_DEFAULT", pac);
+  }
+
   cpp_undef (pfile, "__ARM_FEATURE_MVE");
   if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
 {
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-2.c
new file mode 100644
index 
..4394fd147d7bf468238bd66a24b79bd1338d33aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target

[Patch 6/8, Arm. GCC] Add pointer authentication for stack-unwinding runtime. [Was RE: [Patch 5/7, Arm. GCC] Add pointer authentication for stack-unwinding runtime.]



> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Friday, October 8, 2021 1:18 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch 5/7, Arm. GCC] Add pointer authentication for stack-
> unwinding runtime.
> 
> Hi,
> 
> This patch adds authentication for when the stack is unwound when an
> exception is taken.  All the changes here are done to the runtime code in
> libgcc's unwinder code for Arm target. All the changes are guarded under
> defined (__ARM_FEATURE_PAC_DEFAULT) and activates only if the +pacbti
> feature is switched on for the architecture. This means that switching on the
> target feature via -march or -mcpu is sufficient and -mbranch-protection
> need not be enabled. This ensures that the unwinder is authenticated only if
> the PACBTI instructions are available in the non-NOP space as it uses AUTG.
> Just generating PAC/AUT instructions using -mbranch-protection will not
> enable authentication on the unwinder.
> 
> Tested on arm-none-eabi. OK for trunk?
> 
> 2021-10-04  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
>   new pseudo register class _UVRSC_PAC.
>   * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
>   exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
>   with AUTG if found.
>   * libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
>   (phase1_vrs): Introduce new field to store pseudo-reg state.
>   (phase2_vrs): Likewise.
>   (_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
>   (_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
>   (_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.

Rebased and respun based on the reviews of the previous patches.

This patch adds authentication for when the stack is unwound after an
exception is taken.  All the changes are made to the runtime code in
libgcc's unwinder for the Arm target.  They are guarded by
defined (__ARM_FEATURE_PAUTH) and are activated only if the +pacbti
feature is switched on for the architecture.  This means that switching
on the target feature via -march or -mcpu is sufficient and
-mbranch-protection need not be enabled.  This ensures that the unwinder
authenticates only if the PACBTI instructions are available in the
non-NOP space, as it uses AUTG.  Just generating PAC/AUT instructions
using -mbranch-protection will not enable authentication in the unwinder.

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
new pseudo register class _UVRSC_PAC.
* libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
with AUTG if found.
* libgcc/config/arm/unwind-arm.c (struct pseudo_regs): New.
(phase1_vrs): Introduce new field to store pseudo-reg state.
(phase2_vrs): Likewise.
(_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
(_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
(_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.

diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h
index 79f107d8abb2dd1e2d4903531db47147da63fee8..b60c07128460f8c5e82bdffac7ec469f3607a271 100644
--- a/gcc/ginclude/unwind-arm-common.h
+++ b/gcc/ginclude/unwind-arm-common.h
@@ -127,7 +127,10 @@ extern "C" {
   _UVRSC_VFP = 1,   /* vfp */
   _UVRSC_FPA = 2,   /* fpa */
   _UVRSC_WMMXD = 3, /* Intel WMMX data register */
-  _UVRSC_WMMXC = 4  /* Intel WMMX control register */
+  _UVRSC_WMMXC = 4, /* Intel WMMX control register */
+#if defined(__ARM_FEATURE_PAUTH)
+  _UVRSC_PAC = 5 /* Armv8.1-M Mainline PAC/AUTH pseudo-register */
+#endif
 }
   _Unwind_VRS_RegClass;
 
diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
index 7525e35b4918d38b4ab3ae73a69b722e31b4b322..da27d742fc7be1cef7704a1ea03204743017a591 100644
--- a/libgcc/config/arm/pr-support.c
+++ b/libgcc/config/arm/pr-support.c
@@ -106,6 +106,9 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
 {
   _uw op;
   int set_pc;
+#if defined(__ARM_FEATURE_PAUTH)
+  int set_pac = 0;
+#endif
   _uw reg;
 
   set_pc = 0;
@@ -114,6 +117,22 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
   op = next_unwind_byte (uws);
   if (op == CODE_FINISH)
{
+#if defined(__ARM_FEATURE_PAUTH)
+ /* When we reach end, we have to authenti

[Patch 7/8, Arm, GCC] Emit build attributes for PACBTI target feature. [ Was RE: [Patch 6/7, Arm, GCC] Emit build attributes for PACBTI target feature.]



> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Friday, October 8, 2021 1:19 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch 6/7, Arm, GCC] Emit build attributes for PACBTI target
> feature.
> 
> Hi,
> 
> This patch emits assembler directives for PACBTI build attributes as defined
> by the ABI.
> (https://github.com/ARM-software/abi-aa/releases/download/2021Q1/addenda32.pdf)
> 
> Tested on arm-none-eabi.
> 
> 2021-10-04  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm.c (arm_file_start): Emit EABI attributes for
>   Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use,
> TAG_PACRET_use.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
>   * gcc.target/arm/acle/pacbti-m-predef-3: New test.
>   * gcc.target/arm/acle/pacbti-m-predef-6.c: New test.


This patch emits assembler directives for PACBTI build attributes
as defined by the ABI.
https://github.com/ARM-software/abi-aa/releases/download/2021Q1/addenda32.pdf
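
The selection logic in the arm_file_start hunk below can be sketched
as a stand-alone function.  This is only an illustrative model, not code
from the patch: the struct, the function name and the boolean parameters
are hypothetical stand-ins for TARGET_HAVE_PACBTI and for the pac/bti
flags computed at the top of arm_file_start.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One ".eabi_attribute" entry: symbolic name, numeric tag, value.
struct EabiAttribute
{
  std::string name;
  int tag;
  int value;
};

// Hypothetical model of the patch's decision: which attributes are
// emitted for a given combination of target feature and requested
// branch protection.
std::vector<EabiAttribute>
pacbti_attributes (bool have_pacbti, bool pac, bool bti)
{
  std::vector<EabiAttribute> attrs;
  if (!have_pacbti && !pac && !bti)
    return attrs;		// Nothing is emitted in this case.

  // The patch uses 2 when TARGET_HAVE_PACBTI holds and 1 otherwise.
  const int extension = have_pacbti ? 2 : 1;
  attrs.push_back ({"Tag_PAC_extension", 50, extension});
  attrs.push_back ({"Tag_BTI_extension", 52, extension});
  attrs.push_back ({"TAG_BTI_use", 74, bti ? 1 : 0});
  attrs.push_back ({"TAG_PACRET_use", 76, pac ? 1 : 0});
  return attrs;
}
```

Written this way, the two branches of the patch differ only in the value
given to the two *_extension tags.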

2021-10-25  Tejas Belagod  

gcc/ChangeLog:

* config/arm/arm.c (arm_file_start): Emit EABI attributes for
Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-3.c: New test.
* gcc.target/arm/acle/pacbti-m-predef-6.c: New test.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 946841526ee127105396097d143e755bdfc756f5..a87bcb298f9e6d7b2f3fd61b4586e291f46b0f81 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28200,6 +28200,8 @@ static void
 arm_file_start (void)
 {
   int val;
+  bool pac = (aarch_ra_sign_scope != AARCH_FUNCTION_NONE);
+  bool bti = (aarch_enable_bti == 1);
 
   arm_print_asm_arch_directives
 (asm_out_file, TREE_TARGET_OPTION (target_option_default_node));
@@ -28270,6 +28272,24 @@ arm_file_start (void)
arm_emit_eabi_attribute ("Tag_ABI_FP_16bit_format", 38,
 (int) arm_fp16_format);
 
+  if (TARGET_HAVE_PACBTI)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 2);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 2);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74, bti);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76, pac);
+   }
+  else
+   {
+ if (pac || bti)
+   {
+ arm_emit_eabi_attribute ("Tag_PAC_extension", 50, 1);
+ arm_emit_eabi_attribute ("Tag_BTI_extension", 52, 1);
+ arm_emit_eabi_attribute ("TAG_BTI_use", 74, bti);
+ arm_emit_eabi_attribute ("TAG_PACRET_use", 76, pac);
+   }
+   }
+
   if (arm_lang_output_object_attributes_hook)
arm_lang_output_object_attributes_hook();
 }
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
new file mode 100644
index 
..cc88380731ae81dd27c0a343518252a172f8f3ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-1.c
@@ -0,0 +1,30 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+bti --save-temps" } */
+
+/* { dg-final { scan-assembler "\.arch_extension pacbti" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 74, 1" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 76, 1" } } */
+
+#if !defined (__ARM_FEATURE_BTI_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_BTI_DEFAULT should be defined."
+#endif
+
+#if !defined (__ARM_FEATURE_PAC_DEFAULT)
+#error "Feature test macro __ARM_FEATURE_PAC_DEFAULT should be defined."
+#endif
+
+int
+main()
+{
+  if (__ARM_FEATURE_BTI_DEFAULT != 1)
+__builtin_abort ();
+
+  if (__ARM_FEATURE_PAC_DEFAULT != 1)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
new file mode 100644
index 
..8bebd995b170df953e13f86d2276576d5ab34e93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/pacbti-m-predef-3.c
@@ -0,0 +1,26 @@
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_pacbti_hw } */
+/* { dg-additional-options " -mbranch-protection=pac-ret+leaf --save-temps" } */
+
+/* { dg-final { scan-assembler "\.eabi_attribute 50, 2" } } */
+/* { dg-final { scan-assembler "\.eabi_attribute 52, 2" } }

[Patch 8/8, Arm, GCC] Introduce multilibs for PACBTI target feature. [Was RE: [Patch 7/7, Arm, GCC] Introduce multilibs for PACBTI target feature.]



> -Original Message-
> From: Gcc-patches  bounces+belagod=gcc.gnu@gcc.gnu.org> On Behalf Of Tejas Belagod via
> Gcc-patches
> Sent: Friday, October 8, 2021 1:19 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch 7/7, Arm, GCC] Introduce multilibs for PACBTI target feature.
> 
> Hi,
> 
> This patch adds a multilib for pacbti target feature.
> 
> Tested on arm-none-eabi. OK for trunk?
> 
> 2021-10-04  Tejas Belagod  
> 
> gcc/ChangeLog:
> 
>   * config/arm/t-rmprofile: Add multilib rules for +pacbti.


This patch adds a multilib for pacbti target feature.

2021-10-04  Tejas Belagod  

gcc/ChangeLog:

* config/arm/t-rmprofile: Add multilib rules for +pacbti.

Tested the following configurations, OK for trunk?

-mthumb/-march=armv8.1-m.main+pacbti/-mfloat-abi=soft
-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
mcmodel=small and tiny
aarch64-none-linux-gnu native test and bootstrap

Thanks,
Tejas.
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index a6036bf0a5191a3cac3bfbe2329783204d5c3ef4..241bf1939e30ae7890ae332556d33759f538ced5 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,8 @@
 
 # Arch and FPU variants to build libraries with
 
-MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve
+MULTI_ARCH_OPTS_RM = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve/march=armv8.1-m.main+pacbti
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve v8.1-m.main+pacbti
 
 # Base M-profile (no fp)
 MULTILIB_REQUIRED  += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -36,6 +36,7 @@ MULTILIB_REQUIRED += mthumb/march=armv7-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.base/mfloat-abi=soft
 MULTILIB_REQUIRED  += mthumb/march=armv8-m.main/mfloat-abi=soft
+MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+pacbti/mfloat-abi=soft
 
 # ARMv7e-M with FP (single and double precision variants)
 MULTILIB_REQUIRED  += mthumb/march=armv7e-m+fp/mfloat-abi=hard
@@ -93,3 +94,4 @@ MULTILIB_MATCHES  += march?armv8-m.main=mlibarch?armv8-m.main
 MULTILIB_MATCHES   += march?armv8-m.main+fp=mlibarch?armv8-m.main+fp
 MULTILIB_MATCHES   += march?armv8-m.main+fp.dp=mlibarch?armv8-m.main+fp.dp
 MULTILIB_MATCHES   += march?armv8.1-m.main+mve=mlibarch?armv8.1-m.main+mve
+MULTILIB_MATCHES   += march?armv8.1-m.main+pacbti=mlibarch?armv8.1-m.main+pacbti

Re: [PATCH] match.pd: Optimize MIN_EXPR etc. addr1 < addr2 would be simplified [PR102951]

On Thu, 28 Oct 2021, Jakub Jelinek wrote:

> Hi!
> 
> This patch outlines the decision whether address comparison can be folded
> or not from the match.pd simple comparison simplification and uses it
> both there and in a new minmax simplification, such that we fold e.g.
>   MAX (&a[2], &a[1])
> etc.
> Some of the Wstringop-overflow-62.c changes might look weird, but that
> seems to be mainly due to gimple_fold_builtin_memset not bothering to
> copy over location, will fix that incrementally.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.
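
For readers skimming the thread, the kind of source expression the new
minmax simplification targets can be sketched as below.  This is only an
illustration with hypothetical names; whether a given front end and
optimization level actually produces a MIN_EXPR/MAX_EXPR for this code
is not something the thread states.

```cpp
#include <cassert>

static int a[4];

// The larger of two addresses into the same array.  Both operands share
// the base object `a`, so the result depends only on the element offsets.
const int *
max_of_addresses ()
{
  const int *p = &a[2];
  const int *q = &a[1];
  return p > q ? p : q;
}

// The smaller of the same two addresses.
const int *
min_of_addresses ()
{
  const int *p = &a[2];
  const int *q = &a[1];
  return p < q ? p : q;
}
```

Since the bases are known to be equal, a compiler that can compare the
offsets may reduce each function to returning a constant address.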

> 2021-10-28  Jakub Jelinek  
> 
>   PR tree-optimization/102951
>   * fold-const.h (address_compare): Declare.
>   * fold-const.c (address_compare): New function.
>   * match.pd (cmp (convert1?@2 addr@0) (convert2? addr@1)): Use
>   address_compare helper.
>   (minmax cmp (convert1?@2 addr@0) (convert2?@3 addr@1)): New
>   simplification.
> 
>   * gcc.dg/tree-ssa/pr102951.c: New test.
>   * gcc.dg/Wstringop-overflow-62.c: Adjust expected diagnostics.
> 
> --- gcc/fold-const.h.jj   2021-06-14 12:27:18.572411152 +0200
> +++ gcc/fold-const.h  2021-10-27 11:54:50.781412075 +0200
> @@ -213,6 +213,8 @@ extern bool negate_mathfn_p (combined_fn
>  extern const char *getbyterep (tree, unsigned HOST_WIDE_INT *);
>  extern const char *c_getstr (tree);
>  extern wide_int tree_nonzero_bits (const_tree);
> +extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
> + poly_int64 &, poly_int64 &, bool);
>  
>  /* Return OFF converted to a pointer offset type suitable as offset for
> POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
> --- gcc/fold-const.c.jj   2021-08-11 23:43:59.195893727 +0200
> +++ gcc/fold-const.c  2021-10-27 12:16:26.504267476 +0200
> @@ -16473,6 +16473,132 @@ tree_nonzero_bits (const_tree t)
>return wi::shwi (-1, TYPE_PRECISION (TREE_TYPE (t)));
>  }
>  
> +/* Helper function for address compare simplifications in match.pd.
> +   OP0 and OP1 are ADDR_EXPR operands being compared by CODE.
> +   BASE0, BASE1, OFF0 and OFF1 are set by the function.
> +   GENERIC is true if GENERIC folding and false for GIMPLE folding.
> +   Returns 0 if OP0 is known to be unequal to OP1 regardless of OFF{0,1},
> +   1 if bases are known to be equal and OP0 cmp OP1 depends on OFF0 cmp OFF1,
> +   and 2 if unknown.  */
> +
> +int
> +address_compare (tree_code code, tree type, tree op0, tree op1,
> +  tree &base0, tree &base1, poly_int64 &off0, poly_int64 &off1,
> +  bool generic)
> +{
> +  gcc_checking_assert (TREE_CODE (op0) == ADDR_EXPR);
> +  gcc_checking_assert (TREE_CODE (op1) == ADDR_EXPR);
> +  base0 = get_addr_base_and_unit_offset (TREE_OPERAND (op0, 0), &off0);
> +  base1 = get_addr_base_and_unit_offset (TREE_OPERAND (op1, 0), &off1);
> +  if (base0 && TREE_CODE (base0) == MEM_REF)
> +{
> +  off0 += mem_ref_offset (base0).force_shwi ();
> +  base0 = TREE_OPERAND (base0, 0);
> +}
> +  if (base1 && TREE_CODE (base1) == MEM_REF)
> +{
> +  off1 += mem_ref_offset (base1).force_shwi ();
> +  base1 = TREE_OPERAND (base1, 0);
> +}
> +  if (base0 == NULL_TREE || base1 == NULL_TREE)
> +return 2;
> +
> +  int equal = 2;
> +  /* Punt in GENERIC on variables with value expressions;
> + the value expressions might point to fields/elements
> + of other vars etc.  */
> +  if (generic
> +  && ((VAR_P (base0) && DECL_HAS_VALUE_EXPR_P (base0))
> +   || (VAR_P (base1) && DECL_HAS_VALUE_EXPR_P (base1))))
> +return 2;
> +  else if (decl_in_symtab_p (base0) && decl_in_symtab_p (base1))
> +{
> +  symtab_node *node0 = symtab_node::get_create (base0);
> +  symtab_node *node1 = symtab_node::get_create (base1);
> +  equal = node0->equal_address_to (node1);
> +}
> +  else if ((DECL_P (base0)
> + || TREE_CODE (base0) == SSA_NAME
> + || TREE_CODE (base0) == STRING_CST)
> +&& (DECL_P (base1)
> +|| TREE_CODE (base1) == SSA_NAME
> +|| TREE_CODE (base1) == STRING_CST))
> +equal = (base0 == base1);
> +  if (equal == 1)
> +{
> +  if (code == EQ_EXPR
> +   || code == NE_EXPR
> +   /* If the offsets are equal we can ignore overflow.  */
> +   || known_eq (off0, off1)
> +   || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (op0))
> +   /* Or if we compare using pointers to decls or strings.  */
> +   || (POINTER_TYPE_P (type)
> +   && (DECL_P (base0) || TREE_CODE (base0) == STRING_CST)))
> + return 1;
> +  return 2;
> +}
> +  if (equal != 0)
> +return equal;
> +  if (code != EQ_EXPR && code != NE_EXPR)
> +return 2;
> +
> +  HOST_WIDE_INT ioff0 = -1, ioff1 = -1;
> +  off0.is_constant (&ioff0);
> +  off1.is_constant (&ioff1);
> +  if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
> +   || (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
> +   || (TREE_CO

Re: [PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

On Thu, Oct 28, 2021 at 01:32:17PM +0200, Richard Biener wrote:
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index f38b6d7d31c..a16395befcd 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -1917,6 +1917,19 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
>  return 0;
>  
>d = real_value_truncate (mode, d);
> +
> +  /* Avoid the folding if flag_rounding_math is on and the
> +  conversion is not exact.  */
> +  if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
> + {
> +   bool fail = false;
> +   wide_int w = real_to_integer (&d, &fail,
> + GET_MODE_PRECISION
> +   (as_a  (op_mode)));
> +   if (fail || wi::ne_p (w, wide_int (rtx_mode_t (op, op_mode))))
> + return 0;
> + }
> +
>return const_double_from_real_value (d, mode);
>  }
>else if (code == UNSIGNED_FLOAT && CONST_SCALAR_INT_P (op))

What about the else if case (i.e. UNSIGNED_FLOAT)?
And I think it would be nice to test the simplify-rtx.c code somewhere,
perhaps gcc/testsuite/gcc.dg/rtl/x86_64 testcase and check that we
simplify with -frounding-math e.g. UNSIGNED_FLOAT from DImode
0x8000 or FLOAT or UNSIGNED_FLOAT from DImode
0x7c00, but will not fold FLOAT or UNSIGNED_FLOAT from
DImode 0x7c01 or 0x7fff.

Jakub
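
The exactness test under discussion (convert to floating point, convert
back, compare with the original integer) can be sketched outside GCC as
follows.  This is an assumption-laden illustration for a signed 64-bit
source and an IEEE double destination; the function name is hypothetical
and this is not the simplify-rtx.c code.

```cpp
#include <cassert>
#include <cstdint>

// True if converting V to double loses no information, i.e. folding the
// conversion at compile time would give the same result in every
// rounding mode.
bool
int64_to_double_is_exact (std::int64_t v)
{
  double d = static_cast<double> (v);
  // 2^63 does not fit in int64_t, so converting it back would be
  // undefined.  It can only arise here from rounding, hence inexact.
  if (d >= 9223372036854775808.0)
    return false;
  return static_cast<std::int64_t> (d) == v;
}
```

Values up to 2^53 in magnitude always round-trip; above that only
integers whose low bits are zero do, which is the distinction the
proposed test constants are meant to probe.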

Re: [PATCH] c++, v2: Implement DR2351 - void{} [PR102820]

2021-10-28 Thread Jason Merrill via Gcc-patches


On 10/28/21 07:26, Jakub Jelinek wrote:

On Wed, Oct 27, 2021 at 04:58:53PM -0400, Jason Merrill wrote:

On 10/21/21 04:42, Jakub Jelinek wrote:

Hi!

Here is an attempt to implement DR2351 - void{} - where void{} after
pack expansion is considered valid and the same thing as void().
For templates, dunno if we have some better way to check if a CONSTRUCTOR
might be empty after pack expansion.  Would that only if the constructor
only contains EXPR_PACK_EXPANSION elements and nothing else, or something
else too?


I think that's the only case.  For template args there's the
pack_expansion_args_count function, but I don't think there's anything
similar for constructor elts; please feel free to add it.


Ok.  But counting how many packs its CONSTRUCTOR_ELTS have and then comparing
that number against CONSTRUCTOR_NELTS seems to be unnecessarily expensive if
there are many elements, for the purpose the DR2351 code needs we can stop
as soon as we see first non-pack element.
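
The early-exit idea can be sketched with a plain container.  The element
type and function name here are hypothetical stand-ins for
CONSTRUCTOR_ELTS and the proposed helper; the in_template flag stands in
for processing_template_decl.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a constructor element.
struct Elt
{
  bool is_pack_expansion;
};

// True if the element list could be empty after pack expansion.
// std::all_of stops at the first non-pack element, so no full count of
// packs is needed.
bool
maybe_empty_after_expansion (const std::vector<Elt> &elts, bool in_template)
{
  if (elts.empty ())
    return true;
  if (!in_template)
    return false;
  return std::all_of (elts.begin (), elts.end (),
		      [] (const Elt &e) { return e.is_pack_expansion; });
}
```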

So what about this if it passes bootstrap/regtest?

2021-10-28  Jakub Jelinek  

PR c++/102820
* semantics.c (maybe_zero_constructor_nelts): New function.
(finish_compound_literal): Implement DR2351 - void{}.
If type is cv void and compound_literal has no elements, return
void_node.  If type is cv void and compound_literal might have no
elements after expansion, handle it like other dependent compound
literals.

* g++.dg/cpp0x/dr2351.C: New test.

--- gcc/cp/semantics.c.jj   2021-10-27 09:16:41.161600606 +0200
+++ gcc/cp/semantics.c  2021-10-28 13:06:59.325791588 +0200
@@ -3079,6 +3079,24 @@ finish_unary_op_expr (location_t op_loc,
return result;
  }
  
+/* Return true if CONSTRUCTOR EXPR after pack expansion could have no

+   elements.  */
+
+static bool
+maybe_zero_constructor_nelts (tree expr)
+{
+  if (CONSTRUCTOR_NELTS (expr) == 0)
+return true;
+  if (!processing_template_decl)
+return false;
+  unsigned int i;
+  tree val;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (expr), i, val)


Let's use

  for (constructor_elt &elt : CONSTRUCTOR_ELTS (t))


+if (!PACK_EXPANSION_P (val))
+  return false;
+  return true;
+}
+
  /* Finish a compound-literal expression or C++11 functional cast with aggregate
 initializer.  TYPE is the type to which the CONSTRUCTOR in COMPOUND_LITERAL
 is being cast.  */
@@ -3104,9 +3122,20 @@ finish_compound_literal (tree type, tree
  
if (!TYPE_OBJ_P (type))

  {
-  if (complain & tf_error)
-   error ("compound literal of non-object type %qT", type);
-  return error_mark_node;
+  /* DR2351 */
+  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
+   return void_node;


This test now seems redundant with the one below (if you remove the && 
processing_template_decl).


OK with those tweaks.


+  else if (VOID_TYPE_P (type)
+  && processing_template_decl
+  && maybe_zero_constructor_nelts (compound_literal))
+   /* If there are only packs in compound_literal, it could
+  be void{} after pack expansion.  */;
+  else
+   {
+ if (complain & tf_error)
+   error ("compound literal of non-object type %qT", type);
+ return error_mark_node;
+   }
  }
  
if (template_placeholder_p (type))

--- gcc/testsuite/g++.dg/cpp0x/dr2351.C.jj  2021-10-28 12:59:27.987120315 +0200
+++ gcc/testsuite/g++.dg/cpp0x/dr2351.C 2021-10-28 13:15:20.532760871 +0200
@@ -0,0 +1,51 @@
+// DR2351
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  void{};
+  void();
+}
+
+template <typename... T>
+void
+bar (T... t)
+{
+  void{t...};
+  void(t...);
+}
+
+void
+baz ()
+{
+  bar ();
+}
+
+template <typename... T>
+void
+qux (T... t)
+{
+  void{t...};  // { dg-error "compound literal of non-object type" }
+}
+
+void
+corge ()
+{
+  qux (1, 2);
+}
+
+template <typename... T>
+void
+garply (T... t)
+{
+  void{t..., t..., t...};
+  void(t..., t..., t...);
+}
+
+template <typename... T>
+void
+grault (T... t)
+{
+  void{t..., 1};   // { dg-error "compound literal of non-object type" }
+}


Jakub

Re: [PATCH] c++, v2: Implement DR2351 - void{} [PR102820]

On Thu, Oct 28, 2021 at 08:01:27AM -0400, Jason Merrill wrote:
> > --- gcc/cp/semantics.c.jj   2021-10-27 09:16:41.161600606 +0200
> > +++ gcc/cp/semantics.c  2021-10-28 13:06:59.325791588 +0200
> > @@ -3079,6 +3079,24 @@ finish_unary_op_expr (location_t op_loc,
> > return result;
> >   }
> > +/* Return true if CONSTRUCTOR EXPR after pack expansion could have no
> > +   elements.  */
> > +
> > +static bool
> > +maybe_zero_constructor_nelts (tree expr)
> > +{
> > +  if (CONSTRUCTOR_NELTS (expr) == 0)
> > +return true;
> > +  if (!processing_template_decl)
> > +return false;
> > +  unsigned int i;
> > +  tree val;
> > +  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (expr), i, val)
> 
> Let's use
> 
>   for (constructor_elt &elt : CONSTRUCTOR_ELTS (t))

Ok, will do.

> > @@ -3104,9 +3122,20 @@ finish_compound_literal (tree type, tree
> > if (!TYPE_OBJ_P (type))
> >   {
> > -  if (complain & tf_error)
> > -   error ("compound literal of non-object type %qT", type);
> > -  return error_mark_node;
> > +  /* DR2351 */
> > +  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
> > +   return void_node;
> 
> This test now seems redundant with the one below (if you remove the &&
> processing_template_decl).

It is not redundant, for the maybe case it doesn't return void_node, but
falls through into if (processing_template_decl), which, because
compound_literal is necessarily instantiation_dependent_expression_p
(it contains packs) will just create CONSTRUCTOR_IS_DEPENDENT CONSTRUCTOR
and we'll get here back during instantiation.
For the CONSTRUCTOR_NELTS == 0 case even in templates we know
compound_literal isn't dependent (it doesn't contain anything) and type
isn't either, so we can return void_node right away (and when
!processing_template_decl we have to do that).

Jakub

Re: [PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

On Thu, 28 Oct 2021, Jakub Jelinek wrote:

> On Thu, Oct 28, 2021 at 01:32:17PM +0200, Richard Biener wrote:
> > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > index f38b6d7d31c..a16395befcd 100644
> > --- a/gcc/simplify-rtx.c
> > +++ b/gcc/simplify-rtx.c
> > @@ -1917,6 +1917,19 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
> >  return 0;
> >  
> >d = real_value_truncate (mode, d);
> > +
> > +  /* Avoid the folding if flag_rounding_math is on and the
> > +conversion is not exact.  */
> > +  if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
> > +   {
> > + bool fail = false;
> > + wide_int w = real_to_integer (&d, &fail,
> > +   GET_MODE_PRECISION
> > + (as_a  (op_mode)));
> > + if (fail || wi::ne_p (w, wide_int (rtx_mode_t (op, op_mode))))
> > +   return 0;
> > +   }
> > +
> >return const_double_from_real_value (d, mode);
> >  }
> >else if (code == UNSIGNED_FLOAT && CONST_SCALAR_INT_P (op))
> 
> What about the else if case (i.e. UNSIGNED_FLOAT)?

I'm not able to trigger unsigned_float to be used, even when
converting 0x8001 I get (float:DF (reg:DI...))
on x86_64 because we emit conditional code that will end up
using some compensation to emulate unsigned_float with
float with some tricks that do not necessarily look safe
from a rounding perspective (so maybe x86 would need to
resort to soft-fp here?):

movabsq $4611686018427387905, %rax
cvtsi2sdq   %rax, %xmm0
addsd   %xmm0, %xmm0
ucomisd .LC0(%rip), %xmm0

the constant is (0x8001u >> 1) | 1
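
One common form of this emulation (halve with a sticky low bit, convert
as signed, then double) can be sketched as below.  The function names are
hypothetical, and the test value assumes the constant in the message is
the full 64-bit value 0x8000000000000001, which is consistent with the
movabsq immediate 4611686018427387905 shown above.  This is a sketch of
the general trick, not a transcription of what the x86 back end emits.

```cpp
#include <cassert>
#include <cstdint>

// Shift right by one, folding the lost low bit back in so that a later
// rounding step still sees that the value was odd.
std::uint64_t
halve_with_sticky_bit (std::uint64_t x)
{
  return (x >> 1) | (x & 1);
}

// Convert an unsigned 64-bit value to double using only a signed
// conversion.
double
u64_to_double_via_signed (std::uint64_t x)
{
  if (x < (std::uint64_t{1} << 63))
    return static_cast<double> (static_cast<std::int64_t> (x));
  double d = static_cast<double>
    (static_cast<std::int64_t> (halve_with_sticky_bit (x)));
  return d + d;
}
```

The final addition is exact, so any rounding happens in the signed
conversion of the halved value; whether that matches a direct unsigned
conversion in every rounding mode is exactly the concern raised above.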

> And I think it would be nice to test the simplify-rtx.c code somewhere,
> perhaps gcc/testsuite/gcc.dg/rtl/x86_64 testcase and check that we
> simplify with -frounding-math e.g. UNSIGNED_FLOAT from DImode
> 0x8000 or FLOAT or UNSIGNED_FLOAT from DImode
> 0x7c00, but will not fold FLOAT or UNSIGNED_FLOAT from
> DImode 0x7c01 or 0x7fff.

That it is not folded is exercised by the testcase already.  I indeed
have no good way to test actual folding besides an RTL testcase
(and I didn't add one for GIMPLE).

But as said elsewhere I don't see the RTL constant folding code
as important with regard to floats, but of course we have to fix it up.

The patch as-is fixes the reported testcase on x86_64; a target
that implements only unsigned_float and not float might still be
broken.  I can put the same code in the unsigned_float path,
but I have no way of exercising it.

Anyway, it feels like I spent too much time on this already for what
was supposed to be low-hanging fruit ;)

Richard.

[COMMITTED] Make back_threader_registry inherit from back_jt_path_registry.

When a class's only purpose is to expose the methods of its only
member, it's really a derived class ;-).
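
The refactoring pattern can be sketched with toy classes.  All names here
are hypothetical and unrelated to the jump-threading registry; the point
is only that a wrapper which forwards every call to its sole member can
be replaced by public inheritance.

```cpp
#include <cassert>

// Toy base class standing in for the low-level registry.
class Registry
{
public:
  int push (int v) { total_ += v; return total_; }
  int total () const { return total_; }
private:
  int total_ = 0;
};

// Before: a wrapper that only forwards to its single member.
class WrappedRegistry
{
public:
  int push (int v) { return inner_.push (v); }
  int total () const { return inner_.total (); }
private:
  Registry inner_;
};

// After: the forwarding boilerplate disappears and inherited members
// are called without a member prefix.
class DerivedRegistry : public Registry
{
public:
  int push_twice (int v) { push (v); return push (v); }
};

int
demo_wrapped ()
{
  WrappedRegistry w;
  w.push (2);
  w.push (2);
  return w.total ();
}

int
demo_derived ()
{
  DerivedRegistry d;
  d.push_twice (2);
  return d.total ();
}
```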

Tested on x86-64 Linux.

Committed as obvious.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader_registry):
Inherit from back_jt_path_registry.
(back_threader_registry::thread_through_all_blocks): Remove.
(back_threader_registry::register_path): Remove
m_lowlevel_registry prefix.
---
 gcc/tree-ssa-threadbackward.c | 21 +
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index d9ce056b06c..6c1b15904ce 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -49,13 +49,10 @@ along with GCC; see the file COPYING3.  If not see
 // registered with register_path(), thread_through_all_blocks() is called
 // to modify the CFG.
 
-class back_threader_registry
+class back_threader_registry : public back_jt_path_registry
 {
 public:
   bool register_path (const vec &, edge taken);
-  bool thread_through_all_blocks (bool may_peel_loop_headers);
-private:
-  back_jt_path_registry m_lowlevel_registry;
 };
 
 // Class to abstract the profitability code for the backwards threader.
@@ -541,12 +538,6 @@ back_threader::debug ()
   dump (stderr);
 }
 
-bool
-back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
-{
-  return m_lowlevel_registry.thread_through_all_blocks (may_peel_loop_headers);
-}
-
 /* Examine jump threading path PATH and return TRUE if it is profitable to
thread it, otherwise return FALSE.
 
@@ -873,8 +864,7 @@ bool
 back_threader_registry::register_path (const vec &m_path,
   edge taken_edge)
 {
-  vec *jump_thread_path
-= m_lowlevel_registry.allocate_thread_path ();
+  vec *jump_thread_path = allocate_thread_path ();
 
   // The generic copier ignores the edge type.  We can build the
   // thread edges with any type.
@@ -885,12 +875,11 @@ back_threader_registry::register_path (const vec &m_path,
 
   edge e = find_edge (bb1, bb2);
   gcc_assert (e);
-  m_lowlevel_registry.push_edge (jump_thread_path, e, EDGE_COPY_SRC_BLOCK);
+  push_edge (jump_thread_path, e, EDGE_COPY_SRC_BLOCK);
 }
 
-  m_lowlevel_registry.push_edge (jump_thread_path,
-taken_edge, EDGE_NO_COPY_SRC_BLOCK);
-  m_lowlevel_registry.register_jump_thread (jump_thread_path);
+  push_edge (jump_thread_path, taken_edge, EDGE_NO_COPY_SRC_BLOCK);
+  register_jump_thread (jump_thread_path);
   return true;
 }
 
-- 
2.31.1

[COMMITTED] Improve backward threading with switches.

We've been essentially using find_taken_edge_switch_expr() in the
backward threader, but this is suboptimal because said function only
works with singletons.  VRP has a much smarter find_case_label_range
that works with ranges.
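
The difference between a singleton lookup and a range lookup can be
sketched with a simplified model.  The struct and function names are
hypothetical, the default label is not modelled, and the real
find_case_label_range operates on a GIMPLE switch and a value range
rather than on plain integers.

```cpp
#include <cassert>
#include <vector>

// A case label covering the inclusive range [low, high].
struct CaseLabel
{
  long low;
  long high;
  int target_block;
};

// Return the target block of the single label that covers all of
// [lo, hi], or -1 if no one label does.  A singleton-only lookup would
// give up whenever lo != hi.
int
target_for_range (const std::vector<CaseLabel> &labels, long lo, long hi)
{
  for (const CaseLabel &l : labels)
    if (l.low <= lo && hi <= l.high)
      return l.target_block;
  return -1;
}

// Small fixed switch: "case 1:" goes to block 10, "case 7 ... 9:" to 20.
int
demo_target (long lo, long hi)
{
  static const std::vector<CaseLabel> labels = {{1, 1, 10}, {7, 9, 20}};
  return target_for_range (labels, lo, hi);
}
```

With a known range of [7, 9] the edge to block 20 is taken even though
the controlling value is not a single constant.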

Tested on x86-64 Linux with:

a) Bootstrap & regtests.

b) Verifying we get more threads than before.

c) Asserting that the new code catches everything the old
code caught (over a set of bootstrap .ii files).

Committed as obvious.

gcc/ChangeLog:

* tree-ssa-threadbackward.c
(back_threader::find_taken_edge_switch): Use find_case_label_range
instead of find_taken_edge.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp106.c: Adjust for threading.
* gcc.dg/tree-ssa/vrp113.c: Same.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp106.c | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/vrp113.c | 2 --
 gcc/tree-ssa-threadbackward.c  | 8 
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c
index f25ea9c3826..dc5021a57b5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp106.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/18046  */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-details" }  */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 1 "vrp-thread1" } }  */
+/* { dg-options "-O2 -fdump-tree-ethread-details" }  */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 1 "ethread" } }  */
 /* During VRP we expect to thread the true arm of the conditional through the switch
and to the BB that corresponds to the 7 ... 9 case label.  */
 extern void foo (void);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp113.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp113.c
index ab8d91e0f10..dfe4989d313 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp113.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp113.c
@@ -13,5 +13,3 @@ int f(int a) {
   case 7: return 19;
 }
 }
-
-/* { dg-final { scan-tree-dump "return 3;" "vrp1" { xfail *-*-* } } } */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 6c1b15904ce..456effca5e1 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -195,11 +195,11 @@ back_threader::find_taken_edge_switch (const vec &path,
   if (r.varying_p ())
 return NULL;
 
-  tree val;
-  if (r.singleton_p (&val))
-return ::find_taken_edge (gimple_bb (sw), val);
+  tree label = find_case_label_range (sw, &r);
+  if (!label)
+return NULL;
 
-  return NULL;
+  return find_edge (gimple_bb (sw), label_to_block (cfun, CASE_LABEL (label)));
 }
 
 // Same as find_taken_edge, but for paths ending in a GIMPLE_COND.
-- 
2.31.1

Re: [PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

On Thu, Oct 28, 2021 at 02:24:23PM +0200, Richard Biener wrote:
> I'm not able to trigger unsigned_float to be used, even when
> converting 0x8001 I get (float:DF (reg:DI...))
> on x86_64 because we emit conditional code that will end up
> using some compensation to emulate unsigned_float with
> float with some tricks that do not necessarily look safe
> from a rounding perspective (so maybe x86 would need to
> resort to soft-fp here?):
> 
> movabsq $4611686018427387905, %rax
> cvtsi2sdq   %rax, %xmm0
> addsd   %xmm0, %xmm0
> ucomisd .LC0(%rip), %xmm0
> 
> the constant is (0x8001u >> 1) | 1

Missing -mavx512f ?
(define_expand "floatunsdidf2"
  [(set (match_operand:DF 0 "register_operand")
(unsigned_float:DF
  (match_operand:DI 1 "nonimmediate_operand")))]
  "((TARGET_64BIT && TARGET_AVX512F)
|| TARGET_KEEPS_VECTOR_ALIGNED_STACK)
   && TARGET_SSE2 && TARGET_SSE_MATH"
{
  if (!TARGET_64BIT)
{
  ix86_expand_convert_uns_didf_sse (operands[0], operands[1]);
  DONE;
}
  if (!TARGET_AVX512F)
{
  x86_emit_floatuns (operands);
  DONE;
}
})
where x86_emit_floatuns emits that emulation?
Anyway, what the testcase probably needs to do is this
  (set (reg:DI temp1) (const_int ...))
  (set (reg:DF temp2) (unsigned_float:DF (reg:DI temp1))) ! And also float 
separately too
  (set (reg:DI temp3) (subreg:DF (reg:DF temp2)))
or something similar so that during combine it is not rejected because
it is not valid to have the DFmode constants as immediates and they'd need
to go into memory instead.  But the subreg might not be valid too.
So perhaps some different target.
Yet another option would be a self-test...

But if you don't have time for the testcase right now, let's just
handle it in UNSIGNED_FLOAT too and I can try to look at the testcase
later?

Jakub

Re: [PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

On Thu, 28 Oct 2021, Richard Biener wrote:

> On Thu, 28 Oct 2021, Jakub Jelinek wrote:
> 
> > On Thu, Oct 28, 2021 at 01:32:17PM +0200, Richard Biener wrote:
> > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> > > index f38b6d7d31c..a16395befcd 100644
> > > --- a/gcc/simplify-rtx.c
> > > +++ b/gcc/simplify-rtx.c
> > > @@ -1917,6 +1917,19 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
> > >  return 0;
> > >  
> > >d = real_value_truncate (mode, d);
> > > +
> > > +  /* Avoid the folding if flag_rounding_math is on and the
> > > +  conversion is not exact.  */
> > > +  if (HONOR_SIGN_DEPENDENT_ROUNDING (mode))
> > > + {
> > > +   bool fail = false;
> > > +   wide_int w = real_to_integer (&d, &fail,
> > > + GET_MODE_PRECISION
> > > +   (as_a  (op_mode)));
> > > +   if (fail || wi::ne_p (w, wide_int (rtx_mode_t (op, op_mode))))
> > > + return 0;
> > > + }
> > > +
> > >return const_double_from_real_value (d, mode);
> > >  }
> > >else if (code == UNSIGNED_FLOAT && CONST_SCALAR_INT_P (op))
> > 
> > What about the else if case (i.e. UNSIGNED_FLOAT)?
> 
> I'm not able to trigger unsigned_float to be used, even when
> converting 0x8000000000000001 I get (float:DF (reg:DI...))
> on x86_64 because we emit conditional code that will end up
> using some compensation to emulate unsigned_float with
> float with some tricks that do not necessarily look safe
> from a rounding perspective (so maybe x86 would need to
> resort to soft-fp here?):
> 
> movabsq $4611686018427387905, %rax
> cvtsi2sdq   %rax, %xmm0
> addsd   %xmm0, %xmm0
> ucomisd .LC0(%rip), %xmm0
> 
> the constant is (0x8000000000000001u >> 1) | 1
> 
> > And I think it would be nice to test the simplify-rtx.c code somewhere,
> > perhaps gcc/testsuite/gcc.dg/rtl/x86_64 testcase and check that we
> > simplify with -frounding-math e.g. UNSIGNED_FLOAT from DImode
> > 0x8000 or FLOAT or UNSIGNED_FLOAT from DImode
> > 0x7c00, but will not fold FLOAT or UNSIGNED_FLOAT from
> > DImode 0x7c01 or 0x7fff.
> 
> That it is not folded is exercised by the testcase already.  I indeed
> have no good way to test actual folding besides an RTL testcase
> (and I didn't add one for GIMPLE).
> 
> But as said elsewhere I don't see the RTL constant folding code
> as important with regard to floats, but of course we have to fix it up.
> 
> The patch as-is fixes the reported testcase on x86_64, a target
> eventually not implementing float but only unsigned_float might be
> still broken.  I can put the same code in the unsigned_float code
> but I have no way of exercising it.
> 
> Anyway, it feels like I spent too much time on this already for what
> was supposed to be low-hanging fruit ;)

The following nevertheless adds a testcase with large enough
constants that might trigger unsigned_float plus a hunk to fix that.

Would that be OK?

Thanks,
Richard.

From bb57eaf45329e1dd0ccb0fe82b30e189d1cd86a4 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 28 Oct 2021 11:38:32 +0200
Subject: [PATCH] middle-end/84407 - honor -frounding-math for int to float
 conversion
To: gcc-patches@gcc.gnu.org

This makes us honor -frounding-math for integer to float conversions
and avoid constant folding when such conversion is not exact.

2021-10-28  Richard Biener  

PR middle-end/84407
* fold-const.c (fold_convert_const): Avoid int to float
constant folding with -frounding-math and inexact result.
* simplify-rtx.c (simplify_const_unary_operation): Likewise
for both float and unsigned_float.

* gcc.dg/torture/fp-uint64-convert-double-1.c: New testcase.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Likewise.
---
 gcc/fold-const.c  | 15 +++-
 gcc/simplify-rtx.c| 26 +++
 .../torture/fp-uint64-convert-double-1.c  | 74 ++
 .../torture/fp-uint64-convert-double-2.c  | 75 +++
 4 files changed, 189 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-2.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 18950aeb760..c7daf871125 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2290,7 +2290,20 @@ fold_convert_const (enum tree_code code, tree type, tree arg1)
   else if (TREE_CODE (type) == REAL_TYPE)
 {
   if (TREE_CODE (arg1) == INTEGER_CST)
-   return build_real_from_int_cst (type, arg1);
+   {
+ tree res = build_real_from_int_cst (type, arg1);
+ /* Avoid the folding if flag_rounding_math is on and the
+conversion is not exact.  */
+ if (HONOR_SIGN_DEPENDENT_ROUNDING (type))
+   {
+ bool fail = false;
+

Re: [PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

On Thu, 28 Oct 2021, Jakub Jelinek wrote:

> On Thu, Oct 28, 2021 at 02:24:23PM +0200, Richard Biener wrote:
> > I'm not able to trigger unsigned_float to be used, even when
> > converting 0x8000000000000001 I get (float:DF (reg:DI...))
> > on x86_64 because we emit conditional code that will end up
> > using some compensation to emulate unsigned_float with
> > float with some tricks that do not necessarily look safe
> > from a rounding perspective (so maybe x86 would need to
> > resort to soft-fp here?):
> > 
> > movabsq $4611686018427387905, %rax
> > cvtsi2sdq   %rax, %xmm0
> > addsd   %xmm0, %xmm0
> > ucomisd .LC0(%rip), %xmm0
> > 
> > the constant is (0x8000000000000001u >> 1) | 1
> 
> Missing -mavx512f ?

Yeah, but I have no way to test AVX512 (well, I might try SDE but not
sure whether that handles rounding modes ;))  OK, so just trying
with -mavx512f, the -2 testcase indeed FAILs without the unsigned_float
hunk but succeeds with it.  Oddly, it also succeeds with or without
the patch when the floatunsdidf emulation is used.

> (define_expand "floatunsdidf2"
>   [(set (match_operand:DF 0 "register_operand")
> (unsigned_float:DF
>   (match_operand:DI 1 "nonimmediate_operand")))]
>   "((TARGET_64BIT && TARGET_AVX512F)
> || TARGET_KEEPS_VECTOR_ALIGNED_STACK)
>&& TARGET_SSE2 && TARGET_SSE_MATH"
> {
>   if (!TARGET_64BIT)
> {
>   ix86_expand_convert_uns_didf_sse (operands[0], operands[1]);
>   DONE;
> }
>   if (!TARGET_AVX512F)
> {
>   x86_emit_floatuns (operands);
>   DONE;
> }
> })
> where x86_emit_floatuns emits that emulation?
> Anyway, what the testcase probably needs to do is this
>   (set (reg:DI temp1) (const_int ...))
>   (set (reg:DF temp2) (unsigned_float:DF (reg:DI temp1))) ! And also float 
> separately too
>   (set (reg:DI temp3) (subreg:DF (reg:DF temp2)))
> or something similar so that during combine it is not rejected because
> it is not valid to have the DFmode constants as immediates and they'd need
> to go into memory instead.  But the subreg might not be valid too.
> So perhaps some different target.
> Yet another option would be a self-test...
> 
> But if you don't have time for the testcase right now, let's just
> handle it in UNSIGNED_FLOAT too and I can try to look at the testcase
> later?

Sure, that works for me.  See the patch I posted which is now in 
re-testing.

Thanks,
Richard.
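
For reference, the signed-conversion emulation whose assembly is quoted
above can be sketched in C roughly as follows.  This is a minimal
illustration, not the code GCC emits and not the body of
x86_emit_floatuns; the helper names halve_with_sticky and
emulated_u64_to_double are made up here, and the default
round-to-nearest mode is assumed.

```c
#include <stdint.h>

/* Hypothetical helper (not GCC code): halve X while keeping the
   shifted-out low bit as a "sticky" bit, so a later rounding step can
   still tell that the discarded part was nonzero.  */
static uint64_t
halve_with_sticky (uint64_t x)
{
  return (x >> 1) | (x & 1);
}

/* Hypothetical helper (not GCC code): convert an unsigned 64-bit value
   to double using only a signed 64-bit conversion.  */
static double
emulated_u64_to_double (uint64_t x)
{
  /* Top bit clear: the value fits in int64_t, convert directly.  */
  if ((x >> 63) == 0)
    return (double) (int64_t) x;

  /* Top bit set: convert the halved value (which now fits in int64_t)
     and double the result.  The doubling itself is exact.  */
  double d = (double) (int64_t) halve_with_sticky (x);
  return d + d;
}
```

For x = 0x8000000000000001 the halved value is 4611686018427387905, the
movabsq immediate shown above.  The sketch does not settle the
rounding-mode question raised in this thread; it only shows the shape
of the computation.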

Re: [PATCH] middle-end/84407 - honor -frounding-math for int to float conversion

On Thu, Oct 28, 2021 at 02:40:24PM +0200, Richard Biener wrote:
> This makes us honor -frounding-math for integer to float conversions
> and avoid constant folding when such conversion is not exact.
> 
> 2021-10-28  Richard Biener  
> 
>   PR middle-end/84407
>   * fold-const.c (fold_convert_const): Avoid int to float
>   constant folding with -frounding-math and inexact result.
>   * simplify-rtx.c (simplify_const_unary_operation): Likewise
>   for both float and unsigned_float.
> 
>   * gcc.dg/torture/fp-uint64-convert-double-1.c: New testcase.
>   * gcc.dg/torture/fp-uint64-convert-double-2.c: Likewise.

Ok, thanks.

Jakub
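
The exactness test in the simplify-rtx.c hunk quoted earlier in this
thread (truncate to the float mode, convert back with real_to_integer,
compare with wi::ne_p) has a simple analogue in plain C.  The sketch
below is illustrative only: i64_to_double_is_exact and
u64_to_double_is_exact are hypothetical helpers, not GCC functions, and
an IEEE binary64 double is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not GCC code): true if X converts to double
   without any rounding.  */
static bool
i64_to_double_is_exact (int64_t x)
{
  double d = (double) x;

  /* 2^63 does not fit in int64_t, so converting it back would be
     undefined; it can only appear here as the result of rounding.  */
  if (d >= 9223372036854775808.0)
    return false;

  /* Exact if and only if the round trip gives back the input.  */
  return (int64_t) d == x;
}

/* Same check for unsigned 64-bit inputs.  */
static bool
u64_to_double_is_exact (uint64_t x)
{
  double d = (double) x;

  /* 2^64 does not fit in uint64_t; see the comment above.  */
  if (d >= 18446744073709551616.0)
    return false;

  return (uint64_t) d == x;
}
```

With values chosen here for illustration, 0x7ffffffffffffc00 has 53
significant bits and converts exactly, while 0x7ffffffffffffc01 and
0x8000000000000001 do not; only conversions of the first kind are
independent of the rounding mode.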

Re: [COMMITTED] Fold all statements in Ranger VRP.





On 10/25/2021 6:03 PM, Andrew MacLeod via Gcc-patches wrote:
> This patch changes the ranger VRP pass to simplify all statements, not
> just the ones with ranges.  I believe Jeff had mentioned we were no
> longer doing this a while back.  Now we need it when running as the
> VRP2 pass to satisfy the testcase: gcc.dg/wrapped-binop-simplify.c

I did :-)   The lack of folding in [E]VRP caused a critical function in
perl to be ever-so-slightly larger and not be inlined leading to a
measurable performance regression in spec when compiled with gcc-11.  I
never bothered upstreaming our internal version of this as other
(non-VRP) changes on the trunk addressed the problem.


Jeff

Re: [COMMITTED] Fold all statements in Ranger VRP.


On 10/28/21 9:38 AM, Jeff Law wrote:
> On 10/25/2021 6:03 PM, Andrew MacLeod via Gcc-patches wrote:
> > This patch changes the ranger VRP pass to simplify all statements,
> > not just the ones with ranges.  I believe Jeff had mentioned we were
> > no longer doing this a while back.  Now we need it when running as
> > the VRP2 pass to satisfy the testcase: gcc.dg/wrapped-binop-simplify.c
>
> I did :-)   The lack of folding in [E]VRP caused a critical function
> in perl to be ever-so-slightly larger and not be inlined leading to a
> measurable performance regression in spec when compiled with gcc-11.
> I never bothered upstreaming our internal version of this as other
> (non-VRP) changes on the trunk addressed the problem.
>
> Jeff

so does this resolve the situation then?  It should, in theory.

Andrew

Re: [COMMITTED] Fold all statements in Ranger VRP.





On 10/28/2021 7:47 AM, Andrew MacLeod wrote:
> On 10/28/21 9:38 AM, Jeff Law wrote:
> > On 10/25/2021 6:03 PM, Andrew MacLeod via Gcc-patches wrote:
> > > This patch changes the ranger VRP pass to simplify all statements,
> > > not just the ones with ranges.  I believe Jeff had mentioned we were
> > > no longer doing this a while back.  Now we need it when running as
> > > the VRP2 pass to satisfy the testcase: gcc.dg/wrapped-binop-simplify.c
> >
> > I did :-)   The lack of folding in [E]VRP caused a critical function
> > in perl to be ever-so-slightly larger and not be inlined leading to a
> > measurable performance regression in spec when compiled with gcc-11.
> > I never bothered upstreaming our internal version of this as other
> > (non-VRP) changes on the trunk addressed the problem.
> >
> > Jeff
>
> so does this resolve the situation then?  It should, in theory.

I would expect so.  It's quite similar to what we're doing internally
with our gcc-11 tree.

jeff

[PATCH 0/3] RISC-V: Zfinx extension support

The Zfinx extension[1] has finished public review.  This patch set
implements it by reusing the existing floating-point patterns and banning
the use of FPRs when zfinx is the target.

The current work can be found at the following links; we will keep updating
zhinx and zhinxmin after the zfh extension goes upstream.
  https://github.com/pz9115/riscv-gcc/tree/zfinx-rebase
  https://github.com/pz9115/riscv-binutils-gdb/tree/zfinx-rebase

For testing you can use a qemu or spike that supports the zfinx extension;
the qemu support will go upstream soon and the spike support is still in
review:
  https://github.com/plctlab/plct-qemu/tree/plct-zfinx-dev
  https://github.com/plctlab/plct-spike/tree/plct-upstream-zfinx  

Thanks to Tariq Kurd, Kito Cheng, Jim Willson and Jeremy Bennett, who helped
us a lot with this work.

[1] https://github.com/riscv/riscv-zfinx/blob/main/zfinx-1.0.0-rc.pdf

jiawei sinan (3):
  RISC-V: Minimal support of zfinx extension
  RISC-V: Target support for zfinx extension
  RISC-V: Imply info and regs limit for zfinx extension

 gcc/common/config/riscv/riscv-common.c |  6 +++
 gcc/config/riscv/arch-canonicalize |  1 +
 gcc/config/riscv/constraints.md|  3 +-
 gcc/config/riscv/riscv-builtins.c  |  4 +-
 gcc/config/riscv/riscv-c.c |  2 +-
 gcc/config/riscv/riscv-opts.h  |  6 +++
 gcc/config/riscv/riscv.c   | 15 +-
 gcc/config/riscv/riscv.md  | 72 +-
 gcc/config/riscv/riscv.opt |  3 ++
 9 files changed, 70 insertions(+), 42 deletions(-)

[PATCH 1/3] RISC-V: Minimal support of zfinx extension

Co-Authored-By: sinan 
---
 gcc/common/config/riscv/riscv-common.c | 6 ++
 gcc/config/riscv/riscv-opts.h  | 6 ++
 gcc/config/riscv/riscv.opt | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 37b6ea80086..ab48909e338 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -106,6 +106,9 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbs", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zfinx", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zdinx", ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -916,6 +919,9 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zbc", &gcc_options::x_riscv_zb_subext, MASK_ZBC},
   {"zbs", &gcc_options::x_riscv_zb_subext, MASK_ZBS},
 
+  {"zfinx", &gcc_options::x_riscv_zf_subext, MASK_ZFINX},
+  {"zdinx", &gcc_options::x_riscv_zf_subext, MASK_ZDINX},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 2efc4b80f1f..5a790a028cf 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -83,4 +83,10 @@ enum stack_protector_guard {
 #define TARGET_ZBC ((riscv_zb_subext & MASK_ZBC) != 0)
 #define TARGET_ZBS ((riscv_zb_subext & MASK_ZBS) != 0)
 
+#define MASK_ZFINX  (1 << 0)
+#define MASK_ZDINX  (1 << 1)
+
+#define TARGET_ZFINX ((riscv_zf_subext & MASK_ZFINX) != 0)
+#define TARGET_ZDINX ((riscv_zf_subext & MASK_ZDINX) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 15bf89e17c2..54d27747eff 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -198,6 +198,9 @@ int riscv_zi_subext
 TargetVariable
 int riscv_zb_subext
 
+TargetVariable
+int riscv_zf_subext
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.25.1
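
The option plumbing in this patch follows a name-to-bitmask table.  A
stripped-down C sketch of that pattern is given below.  It is only an
illustration: it uses a plain int pointer where the patch uses a
gcc_options member pointer, and struct ext_flag, ext_flag_table and
enable_ext are made-up names.

```c
#include <stddef.h>
#include <string.h>

#define MASK_ZFINX (1 << 0)
#define MASK_ZDINX (1 << 1)

/* Stand-in for the TargetVariable the patch adds to riscv.opt.  */
static int riscv_zf_subext;

#define TARGET_ZFINX ((riscv_zf_subext & MASK_ZFINX) != 0)
#define TARGET_ZDINX ((riscv_zf_subext & MASK_ZDINX) != 0)

/* Hypothetical table entry: extension name, flag word, bit to set.  */
struct ext_flag
{
  const char *ext;
  int *var;
  int mask;
};

static const struct ext_flag ext_flag_table[] =
{
  {"zfinx", &riscv_zf_subext, MASK_ZFINX},
  {"zdinx", &riscv_zf_subext, MASK_ZDINX},
  {NULL, NULL, 0}   /* Terminate the list.  */
};

/* Hypothetical helper: set the bit for extension NAME.  Returns 1 if
   the name was found in the table, 0 otherwise.  */
static int
enable_ext (const char *name)
{
  for (const struct ext_flag *e = ext_flag_table; e->ext != NULL; e++)
    if (strcmp (e->ext, name) == 0)
      {
        *e->var |= e->mask;
        return 1;
      }
  return 0;
}
```

After enable_ext ("zfinx"), TARGET_ZFINX evaluates to true while
TARGET_ZDINX stays false, which is the kind of test the later patches
in this series rely on.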

[PATCH 2/3] RISC-V: Target support for zfinx extension

Co-Authored-By: sinan 
---
 gcc/config/riscv/riscv-builtins.c |  4 +-
 gcc/config/riscv/riscv-c.c|  2 +-
 gcc/config/riscv/riscv.md | 72 +++
 3 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/gcc/config/riscv/riscv-builtins.c b/gcc/config/riscv/riscv-builtins.c
index 97b1480a15e..d892e6cdb26 100644
--- a/gcc/config/riscv/riscv-builtins.c
+++ b/gcc/config/riscv/riscv-builtins.c
@@ -85,7 +85,7 @@ struct riscv_builtin_description {
   unsigned int (*avail) (void);
 };
 
-AVAIL (hard_float, TARGET_HARD_FLOAT)
+AVAIL (hard_float, TARGET_HARD_FLOAT || TARGET_ZFINX)
 
 /* Construct a riscv_builtin_description from the given arguments.
 
@@ -279,7 +279,7 @@ riscv_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 void
 riscv_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 {
-  if (!TARGET_HARD_FLOAT)
+  if (!(TARGET_HARD_FLOAT || TARGET_ZFINX))
 return;
 
   tree frflags = GET_BUILTIN_DECL (CODE_FOR_riscv_frflags);
diff --git a/gcc/config/riscv/riscv-c.c b/gcc/config/riscv/riscv-c.c
index efd4a61ea29..d064a7fc2b3 100644
--- a/gcc/config/riscv/riscv-c.c
+++ b/gcc/config/riscv/riscv-c.c
@@ -58,7 +58,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
   if (TARGET_HARD_FLOAT)
 builtin_define_with_int_value ("__riscv_flen", UNITS_PER_FP_REG * 8);
 
-  if (TARGET_HARD_FLOAT && TARGET_FDIV)
+  if ((TARGET_HARD_FLOAT || TARGET_ZFINX) && TARGET_FDIV)
 {
   builtin_define ("__riscv_fdiv");
   builtin_define ("__riscv_fsqrt");
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index dd4c24292f2..0fef80c8742 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -296,8 +296,8 @@
 (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
 ;; Iterator for hardware-supported floating-point modes.
-(define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT")
-   (DF "TARGET_DOUBLE_FLOAT")])
+(define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT || TARGET_ZFINX")
+   (DF "TARGET_DOUBLE_FLOAT || TARGET_ZDINX")])
 
 ;; Iterator for floating-point modes that can be loaded into X registers.
 (define_mode_iterator SOFTF [SF (DF "TARGET_64BIT")])
@@ -444,7 +444,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
         (plus:ANYF (match_operand:ANYF 1 "register_operand" " f")
                    (match_operand:ANYF 2 "register_operand" " f")))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fadd.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fadd")
    (set_attr "mode" "<UNITMODE>")])
@@ -575,7 +575,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
         (minus:ANYF (match_operand:ANYF 1 "register_operand" " f")
                     (match_operand:ANYF 2 "register_operand" " f")))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fsub.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fadd")
    (set_attr "mode" "<UNITMODE>")])
@@ -745,7 +745,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
         (mult:ANYF (match_operand:ANYF 1 "register_operand" " f")
                    (match_operand:ANYF 2 "register_operand" " f")))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fmul.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fmul")
    (set_attr "mode" "<UNITMODE>")])
@@ -1052,7 +1052,7 @@
   [(set (match_operand:ANYF 0 "register_operand" "=f")
         (div:ANYF (match_operand:ANYF 1 "register_operand" " f")
                   (match_operand:ANYF 2 "register_operand" " f")))]
-  "TARGET_HARD_FLOAT && TARGET_FDIV"
+  "(TARGET_HARD_FLOAT || TARGET_ZFINX) && TARGET_FDIV"
   "fdiv.<fmt>\t%0,%1,%2"
   [(set_attr "type" "fdiv")
    (set_attr "mode" "<UNITMODE>")])
@@ -1067,7 +1067,7 @@
 (define_insn "sqrt<mode>2"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
         (sqrt:ANYF (match_operand:ANYF 1 "register_operand" " f")))]
-  "TARGET_HARD_FLOAT && TARGET_FDIV"
+  "(TARGET_HARD_FLOAT || TARGET_ZFINX) && TARGET_FDIV"
 {
     return "fsqrt.<fmt>\t%0,%1";
 }
@@ -1082,7 +1082,7 @@
         (fma:ANYF (match_operand:ANYF 1 "register_operand" " f")
                   (match_operand:ANYF 2 "register_operand" " f")
                   (match_operand:ANYF 3 "register_operand" " f")))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fmadd.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
@@ -1093,7 +1093,7 @@
         (fma:ANYF (match_operand:ANYF 1 "register_operand" " f")
                   (match_operand:ANYF 2 "register_operand" " f")
                   (neg:ANYF (match_operand:ANYF 3 "register_operand" " f"))))]
-  "TARGET_HARD_FLOAT"
+  "TARGET_HARD_FLOAT || TARGET_ZFINX"
   "fmsub.<fmt>\t%0,%1,%2,%3"
   [(set_attr "type" "fmadd")
    (set_attr "mode" "<UNITMODE>")])
@@ -1105,7 +1105,7 @@
(neg:ANYF (match_operand:ANYF 1 "register_operand" " f"))
(match_operand:ANYF   2 "register_operand" " f")
(neg:ANYF (match_operand:A

[PATCH 3/3] RISC-V: Imply info and regs limit for zfinx extension