[V2][to-be-committed][RISC-V] Trivial permutation constant derivation

2025-05-04 Thread Jeff Law
This is a patch from late 2024 (just before stage1 freeze), but I never 
pushed hard for the change, and thus never integrated it.


It's mostly unchanged except for updating insn in the hash table after 
finding an optimizable case.  We were holding the deleted insn in the 
hash table rather than the new insn.  Just something I noticed recently.


Bootstrapped and regression tested on my BPI and regression tested 
riscv32-elf and riscv64-elf configurations.  We've used this since 
November internally, so it's well exercised on spec as well.


Waiting on pre-commit testing's verdict...

jeff

gcc/
* config.gcc (riscv): Add riscv-vect-permconst.o to extra_objs.
* config/riscv/riscv-passes.def (pass_vector_permconst): Add new pass.
* config/riscv/riscv-protos.h (make_pass_vector_permconst): Declare.
* config/riscv/riscv-vect-permconst.cc: New file.
* config/riscv/t-riscv: Add build rule for riscv-vect-permconst.o


diff --git a/gcc/config.gcc b/gcc/config.gcc
index d98df883fce..2d4155b14e4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -550,7 +550,7 @@ pru-*-*)
 riscv*)
cpu_type=riscv
extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
-   extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o 
riscv-avlprop.o"
+   extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o 
riscv-avlprop.o riscv-vect-permconst.o"
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o 
sifive-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o riscv-zicfilp.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-passes.def 
b/gcc/config/riscv/riscv-passes.def
index 7e6a2a0e53d..bc803c4678e 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -21,3 +21,5 @@ INSERT_PASS_AFTER (pass_rtl_store_motion, 1, 
pass_shorten_memrefs);
 INSERT_PASS_AFTER (pass_split_all_insns, 1, pass_avlprop);
 INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_landing_pad);
+INSERT_PASS_AFTER (pass_cse2, 1, pass_vector_permconst);
+
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2bedd878a04..2e889903eb3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -201,6 +201,8 @@ rtl_opt_pass * make_pass_shorten_memrefs (gcc::context 
*ctxt);
 rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 rtl_opt_pass * make_pass_insert_landing_pad (gcc::context *ctxt);
+rtl_opt_pass * make_pass_vector_permconst (gcc::context *ctxt);
+
 
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-vect-permconst.cc 
b/gcc/config/riscv/riscv-vect-permconst.cc
new file mode 100644
index 000..feecc7ed6da
--- /dev/null
+++ b/gcc/config/riscv/riscv-vect-permconst.cc
@@ -0,0 +1,300 @@
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or(at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_MEMORY
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "rtl-ssa.h"
+#include "cfgcleanup.h"
+#include "insn-attr.h"
+#include "tm-constrs.h"
+#include "insn-opinit.h"
+#include "cfgrtl.h"
+
+/* So the basic idea of this pass is to identify loads of permutation
+   constants from the constant pool which could instead be trivially
+   derived from some earlier vector permutation constant.  This will
+   replace a memory load from the constant pool with a vadd.vi
+   instruction.
+
+   Conceptually this is much like the related_values optimization in
+   CSE, reload_cse_move2add or using SLSR to optimize constant synthesis.
+   If we wanted to make this generic I would suggest putting it into CSE
+   and providing target hooks to determine if particular permutation
+   constants could be derived from earlier permutation constants.  */
+
+const pass_data pass_data_vect_permconst = {
+  RTL_PASS,
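
The idea in the pass's header comment above can be illustrated outside of GCC. The following is only a minimal sketch, not code from the patch: `derivable_offset` is a hypothetical helper, and it assumes a permutation constant is "trivially derivable" when every element differs from an earlier constant by the same offset and that offset fits the 5-bit signed immediate of vadd.vi (-16 to 15).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not from the patch): if every element of `candidate`
// equals the matching element of `base` plus one common offset, and that
// offset fits the signed 5-bit immediate of vadd.vi, return the offset.
std::optional<int> derivable_offset(const std::vector<int64_t> &base,
                                    const std::vector<int64_t> &candidate)
{
  if (base.empty() || base.size() != candidate.size())
    return std::nullopt;

  const int64_t delta = candidate[0] - base[0];
  // vadd.vi encodes a signed 5-bit immediate: -16 .. 15.
  if (delta < -16 || delta > 15)
    return std::nullopt;

  // Every remaining element must differ by the same offset.
  for (std::size_t i = 1; i < base.size(); ++i)
    if (candidate[i] - base[i] != delta)
      return std::nullopt;

  return static_cast<int>(delta);
}
```

When such an offset exists, the later constant could in principle be produced by a single vector add of the earlier one instead of a second constant pool load; the real pass works on RTL insns and has to check much more than this.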

Re: [PATCH][gcc13] PR tree-optimization/117287 - Backport new assume implementation

2025-05-04 Thread Andrew MacLeod

And now another PR was opened (120887), so perhaps I should apply it.

If I don't hear a dissenting opinion, I'll apply it Monday PM to GCC 13.

It is completely self-contained: it only affects uses of complex ASSUMEs, as 
Jakub suggests...


Andrew

On 3/28/25 05:25, Jakub Jelinek wrote:

On Fri, Mar 28, 2025 at 08:12:35AM +0100, Richard Biener wrote:

On Thu, Mar 27, 2025 at 8:14 PM Andrew MacLeod  wrote:

This patch backports the ASSUME support that was rewritten in GCC 15.

It's slightly more complicated than the port to GCC 14 was in that a few
classes have been rewritten. I've isolated them all to tree-assume.cc
which contains the pass.

It also has to bring in the ssa_cache and lazy_ssa_cache from GCC 14,
along with some tweaks to those classes to deal with changes in the way
range_allocators worked starting in GCC 14. Those changes are all at the
top of the tree-assume.cc file. The rest of the file is a carbon copy of
the GCC 14 version. (Well, what should be... there is outstanding
debug output support that was never submitted, I discovered.)

I'm not sure if it's worth putting this in GCC 13 or not, but I will
submit it and leave it to the release managers :-)  It should be low
risk, especially since assume was experimental support?

I have no strong opinion here besides questioning whether it's
necessary (as you say, assume is experimental) and the fact that
by splicing out the VRP changes to a special place further maintenance
is made more difficult.

IMO, up to you (expecting you'll fix issues if they come up), but would
like to hear a 2nd opinion from Jakub.

I'd probably apply it, it was a wrong-code issue and I'm not sure
users understand assume as experimental.
While the [[assume (...)]]; form is a C++23 feature which is experimental,
we accept that attribute even since C++11 and in C23 and in the
__attribute__((assume (...))); form everywhere and as a documented
extension.

If the ranger changes are done only when users actually use assume rather
than all the time (and only when using non-trivial assumptions, trivial
ones with no side-effects are turned into if (!x) __builtin_unreachable ()),
I think this decreases the risks.

Jakub





Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-04 Thread Maciej W. Rozycki
On Sun, 4 May 2025, John Paul Adrian Glaubitz wrote:

> > > What exactly is broken with the QEMU emulation in Alpha? I don't know of
> > > any bugs, but it could be that you have run into the nasty stack alignment
> > > issue in the kernel that was fixed in Linux 6.14.
> > 
> >  This was with QEMU in the user emulation mode, causing intermittent 
> > failures across the GCC testsuite, so unrelated to any Linux kernel 
> > issues.  Perhaps the system emulation mode works better, but the GCC 
> > testsuite doesn't rely much on syscall emulation and the nature of the 
> > failures didn't indicate this aspect of the user emulation mode mattered 
> > here.
> 
> From my personal experience, qemu-user has various issues that don't exist
> on qemu-system. So, if you're experiencing a qemu-related bug in qemu-user,
> it's always worth verifying it with qemu-system.

 No time to verify odd configurations here.  However of all the syscalls 
most GCC test cases only rely on exit_group(2) and kill(2) and ones that 
did fail intermittently were purely arithmetic, so honestly I doubt the 
emulation mode matters.

 Yes, I know what the shortcomings of QEMU are, having worked with and 
contributed to the project for 15+ years now.  My interest in the 
project has faded though after a series of arguments with a short-lived 
MIPS backend maintainer.

 NB it was me who diagnosed the stack alignment bug in the Linux kernel 
(Ivan made the fixes), so I've been fairly aware of its existence.

> >  I have reported it at the time and this has led to Magnus being kind 
> > enough, following your request, to let me use his BWX Alpha system for 
> > verification instead, where no intermittent failures were observed, so 
> > again no Linux kernel bugs mattered here (this was last year, well before 
> > the fix) and it was QEMU clearly at fault.
> 
> Could you point me to the bug report in question? I would like to look into
> it and see if it is alpha-specific.

 No actual bug report, just the mention in a discussion, but I'm fairly 
sure you were cc-ed, so you should be able to chase it.  It was around if 
not along with my GCC patch submission for `-msafe-partial' option back in 
Nov last year.  Please feel free to turn it into a proper bug report 
against QEMU.  Somehow I feel there won't be a rush of volunteers to fix 
it though.

> > > >  What I was not aware of is the situation with the Alpha backend and
> > > > the need to put out fires there.  That non-BWX issue with Linux kernel's
> > > > RCU algorithms was a nasty surprise to me, one I could have dealt with
> > > > before with less time pressure if I knew about it.
> > > 
> > > What RCU issue are you talking about? I can only stress that to use Linux
> > > on Alpha, you *must* use kernel 6.14 or later with CONFIG_COMPACTION disabled
> > > otherwise you will run into all kinds of issues.
> > 
> >  The very RCU issue that prompted the removal of non-BWX support from the 
> > kernel last year and then this whole effort of mine.
> 
> Aha, I wasn't aware that the original cause for the removal of non-BWX support
> was due to issue with RCU. I thought the original motivation was that non-BWX
> Alpha doesn't support byte-access which Linus called a design mistake.

 The lack of byte accesses in the architecture isn't itself a problem, 
though indeed an engineering challenge.  The problem has been that the 
original replacement RMW sequences GCC has produced cause data races that 
were triggered by RCU code.  It was discussed at the time of non-BWX Alpha 
support removal from the Linux kernel, which you raised an objection 
against.

 HTH,

  Maciej


[RFC PATCH 2/2] Add target_clones profile option support

2025-05-04 Thread Yangyu Chen
This patch adds support for target_clones profile option. The
target_clones profile option allows users to specify multiple versions
of a function and select the version at runtime based on the specified
profile.

The profile is formatted as a string of ':' separated pairs of
assembler name and target_clones attribute string, one pair per
line.

The format of the attribute string is target specific and should follow the
target_clones attribute syntax. The separator between multiple
target_clones attributes follows the TARGET_CLONES_ATTR_SEPARATOR
defined in each target. Currently, we use
'#' for RISC-V and ',' for i386 and aarch64.

An example of the profile is as follows on RISC-V:
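
The line format described above can be sketched as a small standalone parser. This is only an illustration under stated assumptions, not the code from the patch: `parse_profile_line` and `parse_profile` are hypothetical names, the value is assumed to be everything after the first ':', and malformed lines are silently skipped here, whereas the patch reports an error for them.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Split one profile line at the first ':' into
// (assembler name, target_clones attribute string).
// Returns std::nullopt when the line has no ':' separator.
std::optional<std::pair<std::string, std::string>>
parse_profile_line(const std::string &line)
{
  const std::string::size_type pos = line.find(':');
  if (pos == std::string::npos)
    return std::nullopt;
  return std::make_pair(line.substr(0, pos), line.substr(pos + 1));
}

// Build an assembler-name -> attribute-string map from the profile text,
// one pair per line.  Lines without ':' are skipped in this sketch.
std::map<std::string, std::string> parse_profile(const std::string &text)
{
  std::map<std::string, std::string> result;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line))
    if (auto entry = parse_profile_line(line))
      result.insert(*entry);
  return result;
}
```

Splitting at the first ':' keeps the attribute string intact, so the target-specific separator ('#' or ',') is left for the existing target_clones handling to interpret.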

C source code "ror32.c":

```
void ror32(unsigned int *a, unsigned int b, unsigned long size) {
for (unsigned long i = 0; i < size; i++) {
a[i] = a[i] >> b | (a[i] << (32 - b));
}
}
```

Profile "ror32.target_clones":

```
ror32:default#arch=+zvbb,+zbb#arch=+zbb
```

Then use: gcc -O3 -ftarget-profile=ror32.target_clones -S ror32.c
to compile the source code. This will generate 3 versions of the ror32
function ("arch=+zvbb,+zbb", "arch=+zbb" and the default version) along
with its IFUNC resolver.

Signed-off-by: Yangyu Chen 

gcc/ChangeLog:

* common.opt: Add target_clones profile option.
* multiple_target.cc (expand_target_clones): Add support for
target_clones profile option.
* opts.cc (common_handle_option): Ditto.
---
 gcc/common.opt |  7 +++
 gcc/multiple_target.cc | 24 +++-
 gcc/opts.cc| 23 +++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 0e50305dde8..31d7c879c81 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2688,6 +2688,13 @@ fprofile-reorder-functions
 Common Var(flag_profile_reorder_functions) Optimization
 Enable function reordering that improves code placement.
 
+ftarget-profile=
+Common Joined RejectNegative Var(target_profile)
+Enable target_clones profile options.
+
+Variable
+void *target_profile_map = NULL
+
 fpatchable-function-entry=
 Common Var(flag_patchable_function_entry) Joined Optimization
 Insert NOP instructions at each function entry.
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index d25277c0a93..4b8a7da2308 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -38,6 +38,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-walk.h"
 #include "tree-inline.h"
 #include "intl.h"
+#include 
+#include 
 
 /* Walker callback that replaces all FUNCTION_DECL of a function that's
going to be versioned.  */
@@ -319,7 +321,27 @@ expand_target_clones (struct cgraph_node *node, bool 
definition)
   DECL_ATTRIBUTES (node->decl));
   /* No targets specified.  */
   if (!attr_target)
-return false;
+{
+  /* Skip functions that are declared but not defined.  */
+  if (target_profile != NULL && DECL_INITIAL (node->decl) != NULL_TREE)
+   {
+ auto profile_map
+   = static_cast *>
+   (target_profile_map);
+ auto it = profile_map->find (IDENTIFIER_POINTER (
+ DECL_ASSEMBLER_NAME_RAW (node->decl)));
+ if (it != profile_map->end ())
+   {
+ attr_target = make_attribute ("target_clones",
+   it->second.c_str (),
+   DECL_ATTRIBUTES (node->decl));
+   }
+ else
+   return false;
+   }
+  else
+   return false;
+}
 
   tree arglist = TREE_VALUE (attr_target);
   int attr_len = get_target_clone_attr_len (arglist);
diff --git a/gcc/opts.cc b/gcc/opts.cc
index a9b9b9148a9..6ae004e6653 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -36,6 +36,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "version.h"
 #include "selftest.h"
 #include "file-prefix-map.h"
+#include 
+#include 
+#include 
 
 /* In this file all option sets are explicit.  */
 #undef OPTION_SET_P
@@ -3160,6 +3163,26 @@ common_handle_option (struct gcc_options *opts,
 case OPT_fprofile_info_section:
   opts->x_profile_info_section = ".gcov_info";
   break;
+case OPT_ftarget_profile_:
+  {
+   std::ifstream profile_file;
+   profile_file.open (arg);
+   if (profile_file.fail ())
+ error_at (loc, "cannot open profile file %qs: %m", arg);
+   std::map  *profile_map
+ = new std::map();
+   opts->x_target_profile_map = profile_map;
+   std::string line;
+   while (std::getline (profile_file, line))
+ {
+   std::string::size_type pos = line.find (':');
+   if (pos == std::string::npos)
+ error_at (loc, "invalid profile file format");
+   profile_map->insert (std::make_pair (line.substr (0, pos),
+line

[RFC PATCH 1/2] Fortran: Do not make_decl_rtl in trans_function_start

2025-05-04 Thread Yangyu Chen
This patch is a preparation for the function multi-versioning support
in Fortran. The function multi-versioning support requires changing the
assembler name with change_decl_assembler_name to add the target suffix,
which is not allowed after make_decl_rtl is called, since the assembler
name will be in the constant pool.

This patch removes the make_decl_rtl call in trans_function_start.
make_decl_rtl is still called eventually in all my testcases, since it is
checked for whenever the DECL_RTL macro is used.

Signed-off-by: Yangyu Chen 

gcc/fortran/ChangeLog:

* trans-decl.cc (trans_function_start): Remove call to make_decl_rtl.
---
 gcc/fortran/trans-decl.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 43bd7be54cb..bcd4a23e16e 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -3048,9 +3048,6 @@ trans_function_start (gfc_symbol * sym)
   rest_of_decl_compilation (fndecl, 1, 0);
 }
 
-  /* Create RTL for function definition.  */
-  make_decl_rtl (fndecl);
-
   allocate_struct_function (fndecl, false);
 
   /* function.cc requires a push at the start of the function.  */
-- 
2.49.0



[RFC PATCH 0/2] Add target_clones profile option support

2025-05-04 Thread Yangyu Chen
Hi everyone,

This patch series introduces support for the target_clones profile
option in GCC. This option enables users to specify target_clones
attributes in a separate file, allowing GCC to generate multiple
versions of the function with different ISA extensions based on the
specified profile. This is achieved using the -ftarget-profile
option.

The primary objective of this patch series is to provide a
user-friendly way to specify target_clones attributes without
modifying the source code. This approach enhances the source code's
cleanliness, facilitates easier maintenance, and ensures portability
across different architectures and compiler versions.

The example usage of the target_clones profile option is detailed in
the commit message of the second patch.

I understand that this patch lacks comprehensive documentation and
test cases, as I am still in the process of writing them.
However, I would appreciate feedback on the implementation before
adding them. If the implementation is deemed acceptable, I
will proceed with writing the documentation and test cases.

Yangyu Chen (2):
  Fortran: Do not make_decl_rtl in trans_function_start
  Add target_clones profile option support

 gcc/common.opt|  7 +++
 gcc/fortran/trans-decl.cc |  3 ---
 gcc/multiple_target.cc| 24 +++-
 gcc/opts.cc   | 23 +++
 4 files changed, 53 insertions(+), 4 deletions(-)

-- 
2.49.0



[PATCH 1/4] Loop-IM: Don't unconditional move conditional stmts during first LIM

2025-05-04 Thread Andrew Pinski
While fixing up how rewrite_to_defined_overflow works, gcc.dg/Wrestrict-22.c
started to fail. This is because `d p+ 2` would be moved by LIM and then be
rewritten not using pointer plus. The rewriting part is correct behavior. It
only recently started to be moved out, due to r16-190-g6901d56fea2132,
which has the following comment:
```
When we run before PRE and PRE is active hoist all expressions
since PRE would do so anyway and we can preserve range info
but PRE cannot.
```
But LIM will not preserve range info when moving conditionally executed
statements out of the loop.  So it would be better not to override the cost
check for conditionally executed statements in the first place.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-loop-im.cc (compute_invariantness): Don't ignore the cost for
conditionally executed statements.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-loop-im.cc | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index a3ca5af3e3e..444ee31242f 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -1244,8 +1244,13 @@ compute_invariantness (basic_block bb)
   if (lim_data->cost >= LIM_EXPENSIVE
  /* When we run before PRE and PRE is active hoist all expressions
 since PRE would do so anyway and we can preserve range info
-but PRE cannot.  */
- || (flag_tree_pre && !in_loop_pipeline))
+but PRE cannot.  Except for conditional statements as those will 
never
+preserve range info.  */
+ || (flag_tree_pre && !in_loop_pipeline
+ && ALWAYS_EXECUTED_IN (bb)
+ && (ALWAYS_EXECUTED_IN (bb) == lim_data->max_loop
+ || flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb),
+lim_data->max_loop
set_profitable_level (stmt);
 }
 }
-- 
2.34.1



[PATCH 2/4] gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]

2025-05-04 Thread Andrew Pinski
While looking into the ifcombine, I noticed that rewrite_to_defined_overflow
was rewriting already defined code. In the previous attempt at fixing this,
the review mentioned we should not be calling rewrite_to_defined_overflow
in those cases. The places which called rewrite_to_defined_overflow didn't
always check the lhs of the assignment. This fixes the problem by
introducing a helper function which is to be used before calling
rewrite_to_defined_overflow.
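
As background, the kind of rewrite that rewrite_to_defined_overflow performs can be shown at the source level. This is only a sketch of the general technique, not GCC's GIMPLE code; `add_defined_overflow` is a hypothetical name, and it assumes the conversion back to `int` is modular (guaranteed since C++20, and what GCC does in earlier standards).

```cpp
#include <cassert>
#include <climits>

// Signed addition carried out in the corresponding unsigned type, so that
// wraparound is well defined instead of being undefined behavior.
int add_defined_overflow(int a, int b)
{
  const unsigned ua = static_cast<unsigned>(a);
  const unsigned ub = static_cast<unsigned>(b);
  // Unsigned addition wraps modulo 2^N; the result is converted back to int.
  return static_cast<int>(ua + ub);
}
```

A statement whose result type already has defined overflow gains nothing from this transformation, which is why a check such as the new helper is useful before rewriting.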

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111276
* gimple-fold.cc (arith_code_with_undefined_signed_overflow): Make 
static.
(gimple_with_undefined_signed_overflow): New function.
* gimple-fold.h (arith_code_with_undefined_signed_overflow): Remove.
(gimple_with_undefined_signed_overflow): Add declaration.
* tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Use
gimple_with_undefined_signed_overflow instead of manually
checking lhs and the code of the stmt.
(predicate_statements): Likewise.
* tree-ssa-ifcombine.cc (ifcombine_rewrite_to_defined_overflow): 
Likewise.
* tree-ssa-loop-im.cc (move_computations_worker): Likewise.
* tree-ssa-reassoc.cc (update_range_test): Likewise. Reformat.
* tree-scalar-evolution.cc (final_value_replacement_loop): Use
gimple_with_undefined_signed_overflow instead of
arith_code_with_undefined_signed_overflow.
* tree-ssa-loop-split.cc (split_loop): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc   | 26 ++-
 gcc/gimple-fold.h|  2 +-
 gcc/tree-if-conv.cc  | 16 +++
 gcc/tree-scalar-evolution.cc |  5 +
 gcc/tree-ssa-ifcombine.cc| 10 ++---
 gcc/tree-ssa-loop-im.cc  |  6 +-
 gcc/tree-ssa-loop-split.cc   |  5 +
 gcc/tree-ssa-reassoc.cc  | 40 +++-
 8 files changed, 50 insertions(+), 60 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 94d5a1ebbd7..c060ef81a42 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -10569,7 +10569,7 @@ gimple_fold_indirect_ref (tree t)
integer types involves undefined behavior on overflow and the
operation can be expressed with unsigned arithmetic.  */
 
-bool
+static bool
 arith_code_with_undefined_signed_overflow (tree_code code)
 {
   switch (code)
@@ -10586,6 +10586,30 @@ arith_code_with_undefined_signed_overflow (tree_code 
code)
 }
 }
 
+/* Return true if STMT has an operation that operates on a signed
+   integer types involves undefined behavior on overflow and the
+   operation can be expressed with unsigned arithmetic.  */
+
+bool
+gimple_with_undefined_signed_overflow (gimple *stmt)
+{
+  if (!is_gimple_assign (stmt))
+return false;
+  tree lhs = gimple_assign_lhs (stmt);
+  if (!lhs)
+return false;
+  tree lhs_type = TREE_TYPE (lhs);
+  if (!INTEGRAL_TYPE_P (lhs_type)
+  && !POINTER_TYPE_P (lhs_type))
+return false;
+  if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
+return false;
+  if (!arith_code_with_undefined_signed_overflow
+   (gimple_assign_rhs_code (stmt)))
+return false;
+  return true;
+}
+
 /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic
operation that can be transformed to unsigned arithmetic by converting
its operand, carrying out the operation in the corresponding unsigned
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 2790d0ffc65..5fcfdcda81b 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -59,7 +59,7 @@ extern tree gimple_get_virt_method_for_vtable (HOST_WIDE_INT, 
tree,
 extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
-extern bool arith_code_with_undefined_signed_overflow (tree_code);
+extern bool gimple_with_undefined_signed_overflow (gimple *);
 extern void rewrite_to_defined_overflow (gimple_stmt_iterator *);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 5b63bf67fe0..fe8aee057b3 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1066,11 +1066,7 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
fprintf (dump_file, "tree could trap...\n");
   return false;
 }
-  else if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-   || POINTER_TYPE_P (TREE_TYPE (lhs)))
-  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
-  && arith_code_with_undefined_signed_overflow
-   (gimple_assign_rhs_code (stmt)))
+  else if (gimple_with_undefined_signed_overflow (stmt))
 /* We have to rewrite stmts with undefined overflow.  */
 need_to_rewrite_undefined = true;
 
@@ -2830,7 +2826,6 @@ predicate_statements (loop_p loop)
   for (

[PATCH 4/4] phiopt: Use rewrite_to_defined_overflow in move_stmt [PR116938]

2025-05-04 Thread Andrew Pinski
As mentioned previously the rewrite in move_stmt should be
using rewrite_to_defined_overflow/gimple_with_undefined_signed_overflow
instead of just rewriting the VCE.
This moves move_stmt over to that.

A few testcases needed to be updated due to ABS_EXPR rewrite that happens.
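
For context on why the testcases now see ABSU_EXPR: the absolute value of the most negative `int` is not representable as an `int`, so a signed absolute value can overflow once it is executed unconditionally. A source-level sketch of the unsigned variant follows; `absu` is a hypothetical name used for illustration, not a GCC function.

```cpp
#include <cassert>
#include <climits>

// Absolute value with an unsigned result: defined for every int input,
// including INT_MIN, because the negation happens in unsigned arithmetic.
unsigned absu(int x)
{
  const unsigned ux = static_cast<unsigned>(x);
  return x < 0 ? 0u - ux : ux;
}
```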

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (move_stmt): Use rewrite_to_defined_overflow
instead of manually doing the rewrite of the VCE.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-40.c: Update to expect ABSU_EXPR.
* gcc.dg/tree-ssa/phi-opt-41.c: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c |  7 +++---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c |  4 ++--
 gcc/tree-ssa-phiopt.cc | 26 +++---
 3 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
index a9011ce97fb..70629165bb6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-40.c
@@ -20,6 +20,7 @@ int f1(int x)
 
 /* { dg-final { scan-tree-dump-times "if " 1 "phiopt1" } } */
 /* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
-/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 2 "phiopt1" } } */
-/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 1 "phiopt2" } } */
-/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 1 "phiopt2" } } */
+/* The ABS_EXPR in f gets rewritten to ABSU_EXPR as phiopt can't prove it was 
not undefined when moving it. */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 2 "phiopt2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
index 9774e283a7b..817d4feb027 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
@@ -29,6 +29,6 @@ int fge(int a, unsigned char b)
   return a > 0 ? a : -a;
 }
 
-
+/* The ABS_EXPR gets rewritten to ABSU_EXPR as phiopt can't prove it was not 
undefined when moving it. */
 /* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
-/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABSU_EXPR <" 4 "phiopt1" } } */
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 54ecd93495a..5c7dcee19d1 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -838,33 +838,13 @@ move_stmt (gimple *stmt, gimple_stmt_iterator *gsi, 
auto_bitmap &inserted_exprs)
   // Mark the name to be renamed if there is one.
   bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
   gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt);
-  gsi_move_before (&gsi1, gsi);
+  gsi_move_before (&gsi1, gsi, GSI_NEW_STMT);
   reset_flow_sensitive_info (name);
 
   /* Rewrite some code which might be undefined when
  unconditionalized. */
-  if (gimple_assign_single_p (stmt))
-{
-  tree rhs = gimple_assign_rhs1 (stmt);
-  /* VCE from integral types to another integral types but with
-different precisions need to be changed into casts
-to be well defined when unconditional. */
-  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR
- && INTEGRAL_TYPE_P (TREE_TYPE (name))
- && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (rhs, 0
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- fprintf (dump_file, "rewriting stmt with maybe undefined VCE ");
- print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
-   }
- tree new_rhs = TREE_OPERAND (rhs, 0);
- gcc_assert (is_gimple_val (new_rhs));
- gimple_assign_set_rhs_code (stmt, NOP_EXPR);
- gimple_assign_set_rhs1 (stmt, new_rhs);
- update_stmt (stmt);
-   }
-}
+  if (gimple_with_undefined_signed_overflow (stmt))
+rewrite_to_defined_overflow (gsi);
 }
 
 /* RAII style class to temporarily remove flow sensitive
-- 
2.34.1



[PATCH 3/4] Rewrite VCEs of integral types [PR116939]

2025-05-04 Thread Andrew Pinski
Like the patch to phiopt (r15-4033-g1f619fe25925a5f7), this adds rewriting
of VCE to gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow.
I have not seen a case yet for needing this rewrite but this step is needed
to use gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow from
phiopt.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/116939
* gimple-fold.cc (gimple_with_undefined_signed_overflow): Return true
for VCE with integral types.
(rewrite_to_defined_overflow): Handle VCE rewriting to a cast.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c060ef81a42..a6a7fcbb8c1 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -10602,6 +10602,14 @@ gimple_with_undefined_signed_overflow (gimple *stmt)
   if (!INTEGRAL_TYPE_P (lhs_type)
   && !POINTER_TYPE_P (lhs_type))
 return false;
+  tree rhs = gimple_assign_rhs1 (stmt);
+  /* VCE from integral types to another integral types but with
+ different precisions need to be changed into casts
+ to be well defined. */
+  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR
+  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (rhs, 0)))
+  && is_gimple_val (TREE_OPERAND (rhs, 0)))
+return true;
   if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
 return false;
   if (!arith_code_with_undefined_signed_overflow
@@ -10630,10 +10638,28 @@ rewrite_to_defined_overflow (gimple_stmt_iterator 
*gsi, gimple *stmt,
   "overflow ");
   print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 }
-
+  gimple_seq stmts = NULL;
+  /* VCE from integral types to another integral types but with
+ different precisions need to be changed into casts
+ to be well defined. */
+  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+  tree new_rhs = TREE_OPERAND (rhs, 0);
+  gcc_assert (is_gimple_val (new_rhs));
+  gimple_assign_set_rhs_code (stmt, NOP_EXPR);
+  gimple_assign_set_rhs1 (stmt, new_rhs);
+  if (in_place)
+ update_stmt (stmt);
+  else
+   {
+ gimple_set_modified (stmt, true);
+ gimple_seq_add_stmt (&stmts, stmt);
+   }
+  return stmts;
+}
   tree lhs = gimple_assign_lhs (stmt);
   tree type = unsigned_type_for (TREE_TYPE (lhs));
-  gimple_seq stmts = NULL;
   if (gimple_assign_rhs_code (stmt) == ABS_EXPR)
 gimple_assign_set_rhs_code (stmt, ABSU_EXPR);
   else
-- 
2.34.1



RE: [PATCH] cobol: Rewrite exception handling. Partially refactor subscript/refmod calculations.

2025-05-04 Thread Robert Dubner
I know it's the weekend.  But this has been niggling at me, so I decided
to push it.

> -Original Message-
> From: Robert Dubner 
> Sent: Saturday, May 3, 2025 10:19
> To: gcc-patches@gcc.gnu.org
> Cc: 'Jakub Jelinek' ; 'James K. Lowden'
> 
> Subject: [PATCH] cobol: Rewrite exception handling. Partially refactor
> subscript/refmod calculations.
> 
> I really hope that I am not stomping on the work that other people (Jakub,
> in particular) have been doing.  I obviously don't think that I am, but I
> still haven't had the time that I would like to learn more of the details
> of what he has been doing.
> 
> Normally I would just commit these very extensive changes, because they
> are all confined to the internal workings of gcc/cobol and libgcobol.  But
> I figure that giving others a chance to at least look at them is
> reasonable.
> 
> As posted here, these pass all of my internal tests and "make check-cobol"
> on x86_64-linux Ubuntu.
> 
> Does anybody have any objections to my pushing this to trunk?
> 
> =
> 
> This commit includes changes to exception handling, and changes to the
> calculations for offsets and lengths when processing subscripted table
> entries and variables with (from:length) reference modifications.
> 
> Exception handling in COBOL requires significant amounts of information to
> be built at compile time and sent to libgcobol.so at run time.  The changes
> here reduce some problems caused by creating structures on the host that
> are processed on the target, mainly by creating arrays of simple integers
> rather than by turning a structure into a stream of bytes.
> 
> Significant changes to the logic of exception handling bring the run-time
> performance more in line with the ISO specification.
> 
> The handling of COBOL variables that include tables defined with DEPENDING
> ON clauses is subtly different when they are used as sending variables
> versus when they are receiving variables.  This commit folds the very
> similar refer_offset_source and refer_offset_dest routines into a single
> refer_offset routine.  It also streamlines the refer_length_source and
> refer_length_dest routines by moving common code into a static
> refer_length() routine, and having refer_length_source() and
> refer_length_dest() each call refer_length() with a type flag.
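
The refactoring pattern described in the paragraph above (one static helper plus two thin wrappers that pass a type flag) can be sketched as follows. This is only an illustration: `ToyRefer`, `ToyReferKind` and the `toy_` functions are hypothetical names, and the rule that a source uses the current occurrence count while a destination uses the maximum is an assumption made for the sketch, not something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified description of a table reference.
// The real structures in gcc/cobol carry far more state.
struct ToyRefer {
  std::size_t element_size;    // bytes per table entry
  std::size_t max_occurs;      // upper bound of the OCCURS clause
  std::size_t current_occurs;  // run-time value of the DEPENDING ON item
};

enum ToyReferKind { TOY_REFER_SOURCE, TOY_REFER_DEST };

// Common code lives in one static helper; the flag selects the single
// step that differs.  ASSUMPTION for this sketch only: a source uses the
// current count, a destination uses the maximum count.
static std::size_t toy_refer_length(const ToyRefer &r, ToyReferKind kind) {
  std::size_t occurs =
      kind == TOY_REFER_SOURCE ? r.current_occurs : r.max_occurs;
  return occurs * r.element_size;
}

// The two public entry points become thin wrappers around the helper.
std::size_t toy_refer_length_source(const ToyRefer &r) {
  return toy_refer_length(r, TOY_REFER_SOURCE);
}

std::size_t toy_refer_length_dest(const ToyRefer &r) {
  return toy_refer_length(r, TOY_REFER_DEST);
}
```

Callers keep using the two wrapper names, so the call sites do not need to know about the flag.
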
> 
> Co-Authored by: James K. Lowden mailto:jklow...@cobolworx.com
> Co-Authored by: Robert Dubner mailto:rdub...@symas.com
> 
> gcc/cobol/ChangeLog:
> 
>   * cdf.y: Exceptions.
>   * except.cc (cbl_enabled_exception_t::dump): Likewise.
>   (cbl_enabled_exceptions_t::dump): Likewise.
>   (cbl_enabled_exceptions_t::status): Likewise.
>   (cbl_enabled_exceptions_t::encode): Likewise.
>   (cbl_enabled_exceptions_t::turn_on_off): Likewise.
>   (cbl_enabled_exceptions_t::match): Likewise.
>   (declarative_runtime_match): Likewise.
>   * exceptg.h (struct cbl_exception_files_t): Likewise.
>   (class exception_turn_t): Likewise.
>   (apply_cdf_turn): Likewise.
>   * genapi.cc (treeplet_fill_source): Use refer_offset().
>   (function_handle_from_name): Likewise.
>   (parser_initialize_programs): Likewise.
>   (parser_statement_begin): Likewise.
>   (array_of_long_long): Exceptions.
>   (parser_compile_ecs): Exceptions.
>   (parser_compile_dcls): Exceptions.
>   (store_location_stuff): Exceptions.
>   (initialize_variable_internal): Use refer_offset().
>   (compare_binary_binary): Use refer_offset().
>   (cobol_compare): Use refer_offset().
>   (paragraph_label): Formatting.
>   (parser_goto): Use refer_offset().
>   (parser_perform_times): Likewise.
>   (internal_perform_through_times): Likewise.
>   (parser_enter_file): Exceptions.
>   (psa_FldLiteralN): Add comment.
>   (parser_accept): Use refer_offset().
>   (parser_accept_command_line): Likewise.
>   (parser_accept_command_line_count): Likewise.
>   (parser_accept_envar): Likewise.
>   (parser_set_envar): Likewise.
>   (parser_display_internal): Likewise.
>   (parser_initialize_table): Likewise.
>   (parser_sleep): Likewise.
>   (parser_allocate): Likewise.
>   (parser_free): Likewise.
>   (parser_division): Likewise.
>   (parser_relop_long): Likewise.
>   (parser_see_stop_run): Likewise.
>   (parser_classify): Likewise.
>   (parser_file_add): Include symbol_table_index in
> __gg__file_init().
>   (parser_file_open): Use refer_offset().
>   (parser_file_write): Move forward declaration of
> store_location_stuff().
>   (parser_file_start): Use refer_offset().
>   (parser_inspect_conv): Likewise.
>   (parser_intrinsic_numval_c): Likewise.
>   (parser_intrinsic_subst): Likewise.
>   (parser_intrinsic_call_1): Likewise.
>   (parser_intrinsic_call_2): Likewise.
>   (parser_intrinsic_call_3): Likewise.
>   (parser_intrinsic_call_4): Likewise.
>  

Re: [PATCH] PR tree-optimization/120048 - Allow IPA_CP to handle UNDEFINED as VARYING.

2025-05-04 Thread Richard Biener
On Sat, May 3, 2025 at 4:42 PM Andrew MacLeod  wrote:
>
>
> On 5/3/25 07:41, Richard Biener wrote:
> > On Sat, May 3, 2025 at 12:39 AM Andrew MacLeod  wrote:
> >> On trunk I'll eventually do something different.. but it will be more
> >> invasive than I think is reasonable for a backport.
> >>
> >> The problem in the PR is that there is a variable with a range that has a
> >> bitmask attached to it.  We often defer bitmask processing, and the
> >> change which triggers this problem "improves" the range by applying the
> >> bitmask when we call update_bitmask. (PR 119712)
> >>
> >> The case in point is a range of 0, combined with a bitmask that says the
> >> '1' bit must be on.  This results in an UNDEFINED range since it's
> >> impossible.  This is rarely a problem, but this particular snippet of
> >> code in IPA is tripping over it because it has checked for undefined,
> >> and then created a new range by combining the [0, 0] and the bitmask,
> >> which we turn into an UNDEFINED, which it isn't expecting, and then
> >> it asks for the type of the range.
> >>
> >> As Jakub points out in the PR, this is effectively unreachable code that
> >> is being propagated. A harmless fix would be to check if the result of
> >> applying the bitmask results in an UNDEFINED value and to simply
> >> replace it with a VARYING value.
> >>
> >> We still reduce the testcase to "return 0" and there is no more failure.
> >>
> >> Bootstraps on x86_64-pc-linux-gnu with no regressions.
> >>
> >> If this is acceptable, I will push it to trunk, then also test/verify
> >> for the GCC15 and 14(?) branches and check it in there.
> > LGTM.  IPA CP might want to either avoid looking at the type
> > for UNDEFINED or track it separate from the value-range, not
> > sure where it looks at the type of a range.
> >
> > Richard.
> >
> It appears they don't track undefined at all?  But that's just a
> cursory glance.

It probably isn't very useful for IPA CP to propagate UNDEFINED.

> On trunk, I think I'll adjust it next week and put the type into the
> UNDEFINED range.  I have a functioning patch now.  I think it's too
> pervasive and not really enough of an issue to do that on the release
> branches.
>
> It'll solve a few of these kinds of things when they pop up, and allow
> us to properly do an invert () operation  on VARYING and UNDEFINED,
> which we have discussed before.
>
> Andrew
>
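
The failure mode and the proposed workaround discussed in this thread can be illustrated with a toy model. The sketch below does not use GCC's range classes; `ToyRange`, `ToyBitmask` and the `toy_` functions are hypothetical names chosen for illustration. It shows a range of [0, 0] combined with a bitmask requiring the low bit to be set collapsing to an empty (UNDEFINED) range, and a wrapper that hands back VARYING in that case.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for illustration only; these are NOT GCC's range classes.
enum class ToyKind { Undefined, Range, Varying };

struct ToyRange {
  ToyKind kind;
  uint32_t lo;  // meaningful only when kind == ToyKind::Range
  uint32_t hi;
};

// Bits set in `mask` are known; their values are the matching bits of `value`.
struct ToyBitmask {
  uint32_t mask;
  uint32_t value;
};

inline bool toy_satisfies(uint32_t x, ToyBitmask bm) {
  return (x & bm.mask) == (bm.value & bm.mask);
}

// Tighten [lo, hi] to the smallest and largest members that satisfy the
// bitmask.  A linear scan is good enough for a sketch with tiny ranges.
inline ToyRange toy_apply_bitmask(ToyRange r, ToyBitmask bm) {
  if (r.kind != ToyKind::Range)
    return r;
  assert(r.lo <= r.hi);
  bool found = false;
  uint32_t new_lo = 0, new_hi = 0;
  for (uint64_t x = r.lo; x <= r.hi; ++x) {
    uint32_t v = static_cast<uint32_t>(x);
    if (!toy_satisfies(v, bm))
      continue;
    if (!found)
      new_lo = v;
    new_hi = v;
    found = true;
  }
  if (!found)
    return {ToyKind::Undefined, 0, 0};  // no value fits: the range is empty
  return {ToyKind::Range, new_lo, new_hi};
}

// Workaround in the spirit of the thread: if applying the bitmask empties
// a range that was not empty before, return VARYING instead of UNDEFINED.
inline ToyRange toy_apply_bitmask_safe(ToyRange r, ToyBitmask bm) {
  ToyRange res = toy_apply_bitmask(r, bm);
  if (r.kind == ToyKind::Range && res.kind == ToyKind::Undefined)
    return {ToyKind::Varying, 0, 0};
  return res;
}
```

A consumer that only checks for UNDEFINED before combining the range with the bitmask would be safe with the second function, because the result always has a usable type.
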


Re: [PATCH 1/4] Loop-IM: Don't unconditional move conditional stmts during first LIM

2025-05-04 Thread Richard Biener
On Mon, May 5, 2025 at 3:42 AM Andrew Pinski  wrote:
>
> While fixing up how rewrite_to_defined_overflow works, gcc.dg/Wrestrict-22.c
> started to fail. This is because `d p+ 2` would be moved by LIM and then be
> rewritten not using pointer plus. The rewriting part is correct behavior.
> It only recently started to be moved out, due to r16-190-g6901d56fea2132,
> which has the following comment:
> ```
> When we run before PRE and PRE is active hoist all expressions
> since PRE would do so anyway and we can preserve range info
> but PRE cannot.
> ```
> But LIM will not preserve range info when moving conditionally executed
> statements out of the loop.  So it would be better not to override the cost
> check for conditionally executed statements in the first place.
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-im.cc (compute_invariantness): Don't ignore the cost
> for conditionally executed statements.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-ssa-loop-im.cc | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
> index a3ca5af3e3e..444ee31242f 100644
> --- a/gcc/tree-ssa-loop-im.cc
> +++ b/gcc/tree-ssa-loop-im.cc
> @@ -1244,8 +1244,13 @@ compute_invariantness (basic_block bb)
>if (lim_data->cost >= LIM_EXPENSIVE
>   /* When we run before PRE and PRE is active hoist all expressions
>  since PRE would do so anyway and we can preserve range info
> -but PRE cannot.  */
> - || (flag_tree_pre && !in_loop_pipeline))
> +but PRE cannot.  Except for conditional statements as those will 
> never
> +preserve range info.  */
> + || (flag_tree_pre && !in_loop_pipeline
> + && ALWAYS_EXECUTED_IN (bb)
> + && (ALWAYS_EXECUTED_IN (bb) == lim_data->max_loop

So that's not exactly the same condition as we use in the end - we should
be able to set profitability to the level the stmt is always executed in
instead by using set_level (stmt, gimple_bb (stmt)->loop_father,
ALWAYS_EXECUTED_IN (bb)),
of course only when lim_data->cost < LIM_EXPENSIVE, so I'd split up the
if into two cases, profitable cost-wise and the PRE exception.

> + || flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb),
> +lim_data->max_loop
> set_profitable_level (stmt);
>  }
>  }
> --
> 2.34.1
>


Re: [PATCH 2/4] gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]

2025-05-04 Thread Richard Biener
On Mon, May 5, 2025 at 3:42 AM Andrew Pinski  wrote:
>
> While looking into the ifcombine, I noticed that rewrite_to_defined_overflow
> was rewriting already defined code. In the previous attempt at fixing this,
> the review mentioned we should not be calling rewrite_to_defined_overflow
> in those cases. The places which called rewrite_to_defined_overflow didn't
> always check the lhs of the assignment. This fixes the problem by
> introducing a helper function which is to be used before calling
> rewrite_to_defined_overflow.
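
As a source-level analogue of what the rewrite does (carry out the operation in the corresponding unsigned type and convert back), here is a minimal sketch. It is plain C++ rather than GIMPLE, and `toy_add_defined_overflow` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Signed addition carried out in the corresponding unsigned type, so that
// overflow wraps instead of being undefined.  Converting an out-of-range
// unsigned result back to int32_t is modular since C++20 and
// implementation-defined before that (GCC documents modular behaviour).
inline int32_t toy_add_defined_overflow(int32_t a, int32_t b) {
  uint32_t ua = static_cast<uint32_t>(a);
  uint32_t ub = static_cast<uint32_t>(b);
  return static_cast<int32_t>(ua + ub);
}
```

A statement whose type already wraps on overflow gains nothing from this transformation, which is why a check on the statement before rewriting is useful.
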
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> PR tree-optimization/111276
> * gimple-fold.cc (arith_code_with_undefined_signed_overflow): Make 
> static.
> (gimple_with_undefined_signed_overflow): New function.
> * gimple-fold.h (arith_code_with_undefined_signed_overflow): Remove.
> (gimple_with_undefined_signed_overflow): Add declaration.
> * tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Use
> gimple_with_undefined_signed_overflow instead of manually
> checking lhs and the code of the stmt.
> (predicate_statements): Likewise.
> * tree-ssa-ifcombine.cc (ifcombine_rewrite_to_defined_overflow): 
> Likewise.
> * tree-ssa-loop-im.cc (move_computations_worker): Likewise.
> * tree-ssa-reassoc.cc (update_range_test): Likewise. Reformat.
> * tree-scalar-evolution.cc (final_value_replacement_loop): Use
> gimple_with_undefined_signed_overflow instead of
> arith_code_with_undefined_signed_overflow.
> * tree-ssa-loop-split.cc (split_loop): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-fold.cc   | 26 ++-
>  gcc/gimple-fold.h|  2 +-
>  gcc/tree-if-conv.cc  | 16 +++
>  gcc/tree-scalar-evolution.cc |  5 +
>  gcc/tree-ssa-ifcombine.cc| 10 ++---
>  gcc/tree-ssa-loop-im.cc  |  6 +-
>  gcc/tree-ssa-loop-split.cc   |  5 +
>  gcc/tree-ssa-reassoc.cc  | 40 +++-
>  8 files changed, 50 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 94d5a1ebbd7..c060ef81a42 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -10569,7 +10569,7 @@ gimple_fold_indirect_ref (tree t)
> integer types involves undefined behavior on overflow and the
> operation can be expressed with unsigned arithmetic.  */
>
> -bool
> +static bool
>  arith_code_with_undefined_signed_overflow (tree_code code)
>  {
>switch (code)
> @@ -10586,6 +10586,30 @@ arith_code_with_undefined_signed_overflow (tree_code 
> code)
>  }
>  }
>
> +/* Return true if STMT has an operation that operates on a signed
> +   integer types involves undefined behavior on overflow and the
> +   operation can be expressed with unsigned arithmetic.  */
> +
> +bool
> +gimple_with_undefined_signed_overflow (gimple *stmt)
> +{
> +  if (!is_gimple_assign (stmt))

Using

gassign *ass = dyn_cast <gassign *> (stmt);
if (!ass)
  return false;

and using the gassign below makes the follow-on tests
cheaper (and eases future wiping off 'gimple *' overloads).

OK with that change.
Richard.
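
The shape of that suggestion (one checked downcast up front, then typed accesses) can be sketched outside of GCC as follows. The `ToyStmt` hierarchy and `toy_dyn_cast_assign` are hypothetical stand-ins; GCC's real gimple classes and its `dyn_cast` template differ.

```cpp
#include <cassert>

// Toy statement hierarchy for illustration; not GCC's gimple classes.
enum class ToyStmtCode { Assign, Call };

struct ToyStmt {
  ToyStmtCode code;
};

struct ToyAssign : ToyStmt {
  bool has_lhs;
  bool rhs_may_overflow;
};

// Checked downcast in the spirit of dyn_cast: null when the code mismatches.
inline ToyAssign *toy_dyn_cast_assign(ToyStmt *stmt) {
  return stmt->code == ToyStmtCode::Assign ? static_cast<ToyAssign *>(stmt)
                                           : nullptr;
}

inline bool toy_with_undefined_overflow(ToyStmt *stmt) {
  ToyAssign *ass = toy_dyn_cast_assign(stmt);
  if (!ass)
    return false;
  // From here on the typed pointer is used directly, so the statement
  // kind does not need to be checked again for each access.
  return ass->has_lhs && ass->rhs_may_overflow;
}
```
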

> +return false;
> +  tree lhs = gimple_assign_lhs (stmt);
> +  if (!lhs)
> +return false;
> +  tree lhs_type = TREE_TYPE (lhs);
> +  if (!INTEGRAL_TYPE_P (lhs_type)
> +  && !POINTER_TYPE_P (lhs_type))
> +return false;
> +  if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
> +return false;
> +  if (!arith_code_with_undefined_signed_overflow
> +   (gimple_assign_rhs_code (stmt)))
> +return false;
> +  return true;
> +}
> +
>  /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic
> operation that can be transformed to unsigned arithmetic by converting
> its operand, carrying out the operation in the corresponding unsigned
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 2790d0ffc65..5fcfdcda81b 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -59,7 +59,7 @@ extern tree gimple_get_virt_method_for_vtable 
> (HOST_WIDE_INT, tree,
>  extern tree gimple_fold_indirect_ref (tree);
>  extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
>  extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
> -extern bool arith_code_with_undefined_signed_overflow (tree_code);
> +extern bool gimple_with_undefined_signed_overflow (gimple *);
>  extern void rewrite_to_defined_overflow (gimple_stmt_iterator *);
>  extern gimple_seq rewrite_to_defined_overflow (gimple *);
>  extern void replace_call_with_value (gimple_stmt_iterator *, tree);
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 5b63bf67fe0..fe8aee057b3 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1066,11 +1066,7 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
> fprintf (dump_file, "tree could trap...\n");
>return false;
>  }
> -  else if

[PATCH 2/3] x86: Add a pass to fold tail call

2025-05-04 Thread H.J. Lu
x86 conditional branch (jcc) target can be either a label or a symbol.
Add a pass to fold tail call with jcc by turning:

jcc .L6
...
.L6:
jmp tailcall

into:

jcc tailcall

Immediately before the pass which turns REG_EH_REGION notes back into
NOTE_INSN_EH_REGION notes, conditional branches look like

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 23)
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))
...
(code_label 23 20 8 4 4 (nil) [1 uses])
(note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  ) [0 bar S1 A8])
(const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41]  )
(nil))
(nil))

If the branch edge destination is a basic block with only a direct
sibcall, change the jcc target to the sibcall target, decrement the
destination basic block entry label use count and redirect the edge
to the exit basic block.  Call delete_unreachable_blocks to delete
the unreachable basic blocks at the end if edges are redirected.
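
The steps in the paragraph above can be sketched on a toy control-flow model. This is not the pass itself: `ToyBlock`, `ToyCondJump` and the `toy_` functions are hypothetical names, and the model ignores fall-through edges and everything RTL-specific.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy CFG model for illustration; none of these types exist in GCC.
struct ToyBlock {
  bool only_direct_sibcall = false;  // body is a single "jmp <callee>"
  std::string callee;                // valid when only_direct_sibcall is set
  int label_uses = 0;                // branches that reference the entry label
};

struct ToyCondJump {
  int dest_block = -1;        // index into the block list; -1 means exit block
  std::string target_symbol;  // set once the branch targets a symbol directly
};

// Fold every conditional jump whose destination block holds only a direct
// sibcall.  Returns how many branches were folded.
inline int toy_fold_sibcall(std::vector<ToyBlock> &blocks,
                            std::vector<ToyCondJump> &jumps) {
  int folded = 0;
  for (ToyCondJump &jcc : jumps) {
    if (jcc.dest_block < 0)
      continue;
    ToyBlock &dest = blocks[jcc.dest_block];
    if (!dest.only_direct_sibcall)
      continue;
    jcc.target_symbol = dest.callee;  // jcc .L6  ->  jcc tailcall
    --dest.label_uses;                // one fewer reference to the label
    jcc.dest_block = -1;              // the edge now goes to the exit block
    ++folded;
  }
  return folded;
}

// Stand-in for the cleanup step: a block nobody branches to can be deleted.
// Fall-through entry is ignored here to keep the sketch short.
inline bool toy_block_unreachable(const ToyBlock &bb) {
  return bb.label_uses == 0;
}
```

In this model the cleanup is only worth running when the fold count is nonzero, which mirrors calling the block deletion only if edges were redirected.
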

gcc/

PR target/47253
* i386/i386-features.cc: Include "cfgcleanup.h".
(sibcall_only_bb): New.
(reg_eh_region_note_ok_p): Likewise.
(fold_sibcall): Likewise.
(pass_data_fold_sibcall): Likewise.
(pass_fold_sibcall): Likewise.
(make_pass_fold_sibcall): Likewise.
* config/i386/i386-passes.def: Add pass_fold_sibcall before
pass_convert_to_eh_region_ranges.
* config/i386/i386-protos.h (ix86_output_jcc_insn): New.
(make_pass_fold_sibcall): Likewise.
* config/i386/i386.cc (ix86_output_jcc_insn): Likewise.
* config/i386/i386.md (*jcc): Renamed to ...
(jcc): This.  Replace label_ref with symbol_label_operand.  Use
ix86_output_jcc_insn.  Set length to 6 if the branch target
isn't a label.

gcc/testsuite/

PR target/47253
* gcc.target/i386/pr47253-1a.c: New file.
* gcc.target/i386/pr47253-1b.c: Likewise.
* gcc.target/i386/pr47253-2a.c: Likewise.
* gcc.target/i386/pr47253-2b.c: Likewise.
* gcc.target/i386/pr47253-3a.c: Likewise.
* gcc.target/i386/pr47253-3b.c: Likewise.
* gcc.target/i386/pr47253-3c.c: Likewise.
* gcc.target/i386/pr47253-4a.c: Likewise.
* gcc.target/i386/pr47253-4b.c: Likewise.
* gcc.target/i386/pr47253-5.c: Likewise.
* gcc.target/i386/pr47253-6.c: Likewise.
* gcc.target/i386/pr47253-7a.c: Likewise.
* gcc.target/i386/pr47253-7b.c: Likewise.
* gcc.target/i386/pr47253-8.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc   | 204 +
 gcc/config/i386/i386-passes.def|   1 +
 gcc/config/i386/i386-protos.h  |   3 +
 gcc/config/i386/i386.cc|  12 ++
 gcc/config/i386/i386.md|   9 +-
 gcc/config/i386/predicates.md  |   4 +
 gcc/testsuite/gcc.target/i386/pr47253-1a.c |  24 +++
 gcc/testsuite/gcc.target/i386/pr47253-1b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-2a.c |  29 +++
 gcc/testsuite/gcc.target/i386/pr47253-2b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-3a.c |  32 
 gcc/testsuite/gcc.target/i386/pr47253-3b.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-3c.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-4a.c |  26 +++
 gcc/testsuite/gcc.target/i386/pr47253-4b.c |  18 ++
 gcc/testsuite/gcc.target/i386/pr47253-5.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-6.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-7a.c |  52 ++
 gcc/testsuite/gcc.target/i386/pr47253-7b.c |  36 
 gcc/testsuite/gcc.target/i386/pr47253-8.c  |  74 
 20 files changed, 624 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-8.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 31f3e

[PATCH 0/3] x86: Add a pass to fold tail call

2025-05-04 Thread H.J. Lu
Conditional and unconditional branch targets can be either a label or
a symbol.  For conditional jump:

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 23)
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))
...
(code_label 23 20 8 4 4 (nil) [1 uses])
(note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  
) [0 bar S1 A8])
(const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41]  )
(nil))
(nil))

they can be changed to

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
((symbol_ref:DI ("bar") [flags 0x41] )
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))

if the call is a sibcall.  For jump table:

(jump_table_data 16 15 17 (addr_vec:DI [
(label_ref:DI 18)
(label_ref:DI 22)
(label_ref:DI 26)
(label_ref:DI 30)
(label_ref:DI 34)
]))
...
(code_label 30 17 31 4 5 (nil) [1 uses])
(note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
) [0 bar3 S1 A8])
(const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
)
(nil))
(nil))

They can be changed to

(jump_table_data 16 15 17 (addr_vec:DI [
(symbol_ref:DI ("bar0") [flags 0x41]  )
(symbol_ref:DI ("bar1") [flags 0x41]  )
(symbol_ref:DI ("bar2") [flags 0x41]  )
(symbol_ref:DI ("bar3") [flags 0x41]  )
(symbol_ref:DI ("bar4") [flags 0x41]  )
]))

if bar0/bar1/bar2/bar3/bar4 calls are sibcalls.

Instead of supporting symbol reference in jump label and jump table in
the full RTL optimization pipeline, which requires very invasive changes
to GCC RTL infrastructure, support symbol reference in jump label and
jump table for the pass which turns REG_EH_REGION notes back into
NOTE_INSN_EH_REGION notes and after.

If the branch edge destination is a basic block with only a direct
sibcall, change the jcc target to the sibcall target, decrement the
destination basic block entry label use count and redirect the edge
to the exit basic block.

If the jump table entry points to a target basic block with only a direct
sibcall, change the entry to point to the sibcall target, decrement the
target basic block entry label use count and redirect the edge to the
exit basic block.

Call delete_unreachable_blocks to delete the unreachable basic blocks at
the end if edges are redirected.

H.J. Lu (3):
  Support symbol reference in jump label and jump table
  x86: Add a pass to fold tail call
  x86: Fold sibcall targets into jump table

 gcc/config/i386/i386-expand.cc |   5 +-
 gcc/config/i386/i386-features.cc   | 280 +
 gcc/config/i386/i386-passes.def|   1 +
 gcc/config/i386/i386-protos.h  |   3 +
 gcc/config/i386/i386.cc|  12 +
 gcc/config/i386/i386.md|   9 +-
 gcc/config/i386/predicates.md  |   4 +
 gcc/doc/rtl.texi   |  24 +-
 gcc/dwarf2cfi.cc   |  20 +-
 gcc/final.cc   |  26 +-
 gcc/function-abi.cc|   2 +-
 gcc/jump.cc|  36 +++
 gcc/print-rtl.cc   |   2 +
 gcc/rtl.h  |   8 +
 gcc/rtlanal.cc |   5 +-
 gcc/testsuite/gcc.target/i386/pr14721-1a.c |  54 
 gcc/testsuite/gcc.target/i386/pr14721-1b.c |  37 +++
 gcc/testsuite/gcc.target/i386/pr14721-1c.c |  37 +++
 gcc/testsuite/gcc.target/i386/pr14721-2a.c |  58 +
 gcc/testsuite/gcc.target/i386/pr14721-2b.c |  41 +++
 gcc/testsuite/gcc.target/i386/pr14721-2c.c |  43 
 gcc/testsuite/gcc.target/i386/pr14721-3a.c |  56 +
 gcc/testsuite/gcc.target/i386/pr14721-3b.c |  40 +++
 gcc/testsuite/gcc.target/i386/pr14721-3c.c |  39 +++
 gcc/testsuite/gcc.target/i386/pr47253-1a.c |  24 ++
 gcc/testsuite/gcc.target/i386/pr47253-1b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-2a.c |  29 +++
 gcc/testsuite/gcc.target/i386/pr47253-2b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-3a.c |  32 +++
 gcc/testsuite/gcc.target/i386/pr47253-3b.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-3c.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-4a.c |  26 ++
 gcc/testsuite/gcc.target/i386/pr47253-4b.c |  18 ++
 gcc/testsuite/gcc.target/i386/pr47253-5.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-6.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-7a.c |  52 
 gcc/

[PATCH 1/3] Support symbol reference in jump label and jump table

2025-05-04 Thread H.J. Lu
Conditional and unconditional branch targets can be either a label or
a symbol.  For conditional jump:

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 23)
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))
...
(code_label 23 20 8 4 4 (nil) [1 uses])
(note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  
) [0 bar S1 A8])
(const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41]  )
(nil))
(nil))

they can be changed to

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
((symbol_ref:DI ("bar") [flags 0x41] )
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))

if the call is a sibcall.  For jump table:

(jump_table_data 16 15 17 (addr_vec:DI [
(label_ref:DI 18)
(label_ref:DI 22)
(label_ref:DI 26)
(label_ref:DI 30)
(label_ref:DI 34)
]))
...
(code_label 30 17 31 4 5 (nil) [1 uses])
(note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
) [0 bar3 S1 A8])
(const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
)
(nil))
(nil))

They can be changed to

(jump_table_data 16 15 17 (addr_vec:DI [
(symbol_ref:DI ("bar0") [flags 0x41]  )
(symbol_ref:DI ("bar1") [flags 0x41]  )
(symbol_ref:DI ("bar2") [flags 0x41]  )
(symbol_ref:DI ("bar3") [flags 0x41]  )
(symbol_ref:DI ("bar4") [flags 0x41]  )
]))

if bar0/bar1/bar2/bar3/bar4 calls are sibcalls.

Instead of supporting symbol reference in jump label and jump table in
the full RTL optimization pipeline, which requires very invasive changes
to GCC RTL infrastructure, support symbol reference in jump label and
jump table for the pass which turns REG_EH_REGION notes back into
NOTE_INSN_EH_REGION notes and after:

1. Add a set_jump_target method to assign symbol reference to jump label.
2. Add condsibcall_p for conditional sibling call.
3. Return false for symbol reference in jump table check.
4. Update create_trace_edges and rtx_writer::print_rtx_operand_code_0 to
handle symbol reference in jump label.
5. Update final_scan_insn_1 to handle symbol reference in jump table.
6. Update fndecl_abi and collect_fn_hard_reg_usage to support conditional
sibling call for callee ABI.
7. Document limitation of symbol reference support in jump label.

* dwarf2cfi.cc (create_trace_edges): Skip symbol reference in
jump table and in JUMP_LABEL.  Short-circuit JUMP for the pure
sibcall.
* final.cc (final_scan_insn_1): Support symbol reference in jump
table.
(collect_fn_hard_reg_usage): Also check conditional sibcall.
* function-abi.cc (insn_callee_abi): Likewise.
* jump.cc (condsibcall_p): New.
* print-rtl.cc (rtx_writer::print_rtx_operand_code_0): Support
symbol reference in JUMP_LABEL.
* rtl.h (rtx_jump_insn::set_jump_target): New, with the rtx
argument.
* rtl.h (condsibcall_p): New.
* rtlanal.cc (tablejump_p): Return false if JUMP_LABEL is a
symbol reference.
* config/i386/i386-expand.cc (ix86_notrack_prefixed_insn_p):
Likewise.
* doc/rtl.texi (addr_vec): Also allow symbol reference.
(JUMP_LABEL): Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-expand.cc |  5 -
 gcc/doc/rtl.texi   | 24 +--
 gcc/dwarf2cfi.cc   | 20 ++-
 gcc/final.cc   | 26 +---
 gcc/function-abi.cc|  2 +-
 gcc/jump.cc| 36 ++
 gcc/print-rtl.cc   |  2 ++
 gcc/rtl.h  |  8 
 gcc/rtlanal.cc |  5 -
 9 files changed, 111 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7f0fdb6fa9e..0d0802692d1 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -25501,7 +25501,10 @@ ix86_notrack_prefixed_insn_p (rtx_insn *insn)
   if (JUMP_P (insn) && !flag_cet_switch)
 {
   rtx target = JUMP_LABEL (insn);
-  if (target == NULL_RTX || ANY_RETURN_P (target))
+  if (target == NULL_RTX
+ || ANY_RETURN_P (target)
+ /* Also check for conditional sibcall.  */
+ || SYMBOL_REF_P (target))
return false;
 
   /* Check the jump is a switch table.  */
diff --git a/gcc/doc/rtl.texi b

[PATCH 3/3] x86: Fold sibcall targets into jump table

2025-05-04 Thread H.J. Lu
Enhance fold sibcall pass to fold sibcall targets into jump table by
turning:

foo:
        .cfi_startproc
        cmpl    $4, %edi
        ja      .L1
        movl    %edi, %edi
        jmp     *.L4(,%rdi,8)
        .section        .rodata
.L4:
        .quad   .L8
        .quad   .L7
        .quad   .L6
        .quad   .L5
        .quad   .L3
        .text
.L5:
        jmp     bar3
.L3:
        jmp     bar4
.L8:
        jmp     bar0
.L7:
        jmp     bar1
.L6:
        jmp     bar2
.L1:
        ret
        .cfi_endproc

into:

foo:
        .cfi_startproc
        cmpl    $4, %edi
        ja      .L1
        movl    %edi, %edi
        jmp     *.L4(,%rdi,8)
        .section        .rodata
.L4:
        .quad   bar0
        .quad   bar1
        .quad   bar2
        .quad   bar3
        .quad   bar4
        .text
.L1:
        ret
        .cfi_endproc

Before the DWARF frame generation pass, jump tables look like:

(jump_table_data 16 15 17 (addr_vec:DI [
(label_ref:DI 18)
(label_ref:DI 22)
(label_ref:DI 26)
(label_ref:DI 30)
(label_ref:DI 34)
]))
...
(code_label 30 17 31 4 5 (nil) [1 uses])
(note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
) [0 bar3 S1 A8])
(const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
)
(nil))
(nil))

If the jump table entry points to a target basic block with only a direct
sibcall, change the entry to point to the sibcall target, decrement the
target basic block entry label use count and redirect the edge to the
exit basic block.
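
The jump-table variant of the fold can be sketched the same way. The model below is hypothetical (`ToyTarget` and `toy_fold_jump_table` are illustration-only names): table entries are plain strings, and each label maps to a description of its target block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy description of the block behind a label; not a GCC type.
struct ToyTarget {
  bool only_direct_sibcall = false;  // block is just "jmp <callee>"
  std::string callee;                // valid when only_direct_sibcall is set
  int label_uses = 0;                // references to the block's entry label
};

// Replace each table entry whose target block holds only a direct sibcall
// with the callee symbol.  Returns the number of entries rewritten.
inline int toy_fold_jump_table(std::vector<std::string> &table,
                               std::map<std::string, ToyTarget> &targets) {
  int folded = 0;
  for (std::string &entry : table) {
    auto it = targets.find(entry);
    if (it == targets.end() || !it->second.only_direct_sibcall)
      continue;  // not a known label, or the block does more than a sibcall
    --it->second.label_uses;    // the table no longer references the label
    entry = it->second.callee;  // .quad .L5  ->  .quad bar3
    ++folded;
  }
  return folded;
}
```

Fed the label-to-callee mapping from the assembly example above, the table `.L8 .L7 .L6 .L5 .L3` becomes `bar0 bar1 bar2 bar3 bar4`.
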

gcc/

PR target/14721
* config/i386/i386-features.cc (fold_sibcall): Fold the sibcall
targets into jump table.

gcc/testsuite/

PR target/14721
* gcc.target/i386/pr14721-1a.c: New.
* gcc.target/i386/pr14721-1b.c: Likewise.
* gcc.target/i386/pr14721-1c.c: Likewise.
* gcc.target/i386/pr14721-2a.c: Likewise.
* gcc.target/i386/pr14721-2b.c: Likewise.
* gcc.target/i386/pr14721-2c.c: Likewise.
* gcc.target/i386/pr14721-3a.c: Likewise.
* gcc.target/i386/pr14721-3b.c: Likewise.
* gcc.target/i386/pr14721-3c.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc   | 76 ++
 gcc/testsuite/gcc.target/i386/pr14721-1a.c | 54 +++
 gcc/testsuite/gcc.target/i386/pr14721-1b.c | 37 +++
 gcc/testsuite/gcc.target/i386/pr14721-1c.c | 37 +++
 gcc/testsuite/gcc.target/i386/pr14721-2a.c | 58 +
 gcc/testsuite/gcc.target/i386/pr14721-2b.c | 41 
 gcc/testsuite/gcc.target/i386/pr14721-2c.c | 43 
 gcc/testsuite/gcc.target/i386/pr14721-3a.c | 56 
 gcc/testsuite/gcc.target/i386/pr14721-3b.c | 40 
 gcc/testsuite/gcc.target/i386/pr14721-3c.c | 39 +++
 10 files changed, 481 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-3c.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index a55f8e38283..108191955e2 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3660,6 +3660,82 @@ fold_sibcall (void)
  EXIT_BLOCK_PTR_FOR_FN (cfun));
  branch_edge->flags |= EDGE_SIBCALL | EDGE_ABNORMAL;
}
+ else if (label && !ANY_RETURN_P (label))
+   {
+ /* Check if it is a jump table with addresses.  */
+ rtx_insn *target = as_a <rtx_insn *> (label);
+ rtx_insn *table = next_insn (target);
+ if (!table
+ || !JUMP_TABLE_DATA_P (table)
+ || GET_CODE (PATTERN (table)) != ADDR_VEC)
+   continue;
+
+ basic_block bb_dest;
+ rtx body = PATTERN (table);
+ unsigned int i, len = XVECLEN (body, 0);
+ rtx *sibcall_targets = new rtx [len]();
+ rtx *sibcall_notes = new rtx [len]();
+
+ for (i = 0; i < len; i++)
+   {
+ label = XVECEXP (body, 0, i);
+ label = XEXP (label, 0);
+ /* Check if the basic block referenced by LABEL only
+has a direct sibcall.  */
+ bb_dest = BLOCK_FOR_INSN (label);
+ sibcall_targets[i]
+   = sibcall_on

Re: [PATCH 3/4] Rewrite VCEs of integral types [PR116939]

2025-05-04 Thread Richard Biener
On Mon, May 5, 2025 at 3:45 AM Andrew Pinski  wrote:
>
> Like the patch to phiopt (r15-4033-g1f619fe25925a5f7), this adds rewriting
> of VCE to gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow.
> I have not seen a case yet for needing this rewrite but this step is needed
> to use gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow from
> phiopt.

So what again was the "undefinedness" here?  And why is this relevant
for _signed overflow_?  Wouldn't it be at least no problem to V_C_E
a lower-precision value to a higher precision?

Maybe we should simply reject V_C_Es of non-mode-precision entities
on GIMPLE, much like we reject BIT_FIELD_REFs of those.

> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> PR tree-optimization/116939
> * gimple-fold.cc (gimple_with_undefined_signed_overflow): Return true
> for VCE with integral types.
> (rewrite_to_defined_overflow): Handle VCE rewriting to a cast.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-fold.cc | 30 --
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index c060ef81a42..a6a7fcbb8c1 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -10602,6 +10602,14 @@ gimple_with_undefined_signed_overflow (gimple *stmt)
>if (!INTEGRAL_TYPE_P (lhs_type)
>&& !POINTER_TYPE_P (lhs_type))
>  return false;
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  /* VCE from integral types to another integral types but with
> + different precisions need to be changed into casts
> + to be well defined. */
> +  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR
> +  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (rhs, 0)))
> +  && is_gimple_val (TREE_OPERAND (rhs, 0)))
> +return true;
>if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
>  return false;
>if (!arith_code_with_undefined_signed_overflow
> @@ -10630,10 +10638,28 @@ rewrite_to_defined_overflow (gimple_stmt_iterator 
> *gsi, gimple *stmt,
>"overflow ");
>print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>  }
> -
> +  gimple_seq stmts = NULL;
> +  /* VCE from integral types to another integral types but with
> + different precisions need to be changed into casts
> + to be well defined. */
> +  if (gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR)
> +{
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  tree new_rhs = TREE_OPERAND (rhs, 0);
> +  gcc_assert (is_gimple_val (new_rhs));
> +  gimple_assign_set_rhs_code (stmt, NOP_EXPR);
> +  gimple_assign_set_rhs1 (stmt, new_rhs);
> +  if (in_place)
> + update_stmt (stmt);
> +  else
> +   {
> + gimple_set_modified (stmt, true);
> + gimple_seq_add_stmt (&stmts, stmt);
> +   }
> +  return stmts;
> +}
>tree lhs = gimple_assign_lhs (stmt);
>tree type = unsigned_type_for (TREE_TYPE (lhs));
> -  gimple_seq stmts = NULL;
>if (gimple_assign_rhs_code (stmt) == ABS_EXPR)
>  gimple_assign_set_rhs_code (stmt, ABSU_EXPR);
>else
> --
> 2.34.1
>


[to-be-committed][RISC-V] Adjust rvv tests after recent jump threading change

2025-05-04 Thread Jeff Law

[ Resending with RISC-V tag. ]

Richi's jump threading patch is resulting in new jump threading 
opportunities triggering in various vsetvl related tests.  When those 
new threading opportunities are realized on vector code we usually end 
up with a different number of vsetvls due to the inherent block copying.


At first I was adjusting cases to work with the new jump threads, then 
realized we could easily end up back here if we change the threading 
heuristics and such.  So I just made these tests disable jump threading. 
 I didn't do it pervasively, just for those that have been affected.


Waiting on pre-commit CI to render its verdict.

Jeff

gcc/testsuite

* gcc.target/riscv/rvv/vsetvl/avl_prop-2.c: Disable jump threading
and adjust number of expected vsetvls as needed.
* gcc.target/riscv/rvv/vsetvl/avl_single-56.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-67.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-71.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c
index 0379429a754..edb12a12664 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gc_zve32f -mabi=lp64d 
-O3 -mrvv-vector-bits=zvl" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gc_zve32f -mabi=lp64d 
-O3 -mrvv-vector-bits=zvl -fno-thread-jumps" } */
 
 int d0, sj, v0, rp, zi;
 
@@ -38,4 +38,4 @@ ka:
   goto ka;
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*1} 2 } } */
+/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*1} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c
index 5db1a402be6..3d3c5d6e9fb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" } */
 
 #include "riscv_vector.h"
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
index 3f22fc870d9..013d32c55a8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" 
} */
 
 #include "riscv_vector.h"
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
index 64666d31f1a..aef832546c7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" 
} */
 
 #include "riscv_vector.h"
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
index 07a64b43a53..fa4328f97f3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" 
} */
 
 #include "riscv_vector.h"
 
@@ -50,5 +50,5 @@ void f (int8_t * restrict in, int8_t * restrict out, int l, 
int n, int m, size_t
   }
 }
 
-/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts 
"-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts 
"-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" 
no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" 
} } } } */
+/* { dg-final { scan-assembler-

Re: [PATCH] RISC-V: Implment H modifier for printing the next register name

2025-05-04 Thread Jeff Law




On 4/27/25 1:28 AM, Jin Ma wrote:

> For RV32 inline assembly, when handling 64-bit integer data, it is
> often necessary to process the lower and upper 32 bits separately.
> Unfortunately, we can only output the current register name
> (lower 32 bits) but not the next register name (upper 32 bits).
>
> To address this, the modifier 'H' has been added to allow users
> to handle the upper 32 bits of the data. While I believe the
> modifier 'N' (representing the next register name) might be more
> suitable for this functionality, 'N' is already in use.
> Therefore, 'H' (representing the high register) was chosen instead.
>
> Co-Authored-By: Dimitar Dimitrov
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_print_operand): Add H.
> * doc/extend.texi: Document for H.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/modifier-H-error-1.c: New test.
> * gcc.target/riscv/modifier-H-error-2.c: New test.
> * gcc.target/riscv/modifier-H.c: New test.

I went ahead and pushed this to the trunk.  One less patch on the
dashboard for the Tuesday meeting :-)


jeff
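
For readers following along, the split that the quoted description refers to can be modelled in portable C. This is only a sketch of the data layout described above (the current register holding the lower 32 bits, the next register holding the upper 32 bits); the helper names low_half and high_half are hypothetical and are not part of the patch, and no inline assembly is used here.

```c
#include <stdint.h>

/* Lower 32 bits of a 64-bit value: per the description above, this is
   the half held in the register that is printed without a modifier.  */
static uint32_t
low_half (uint64_t v)
{
  return (uint32_t) v;
}

/* Upper 32 bits of a 64-bit value: per the description above, this is
   the half held in the next register, the one 'H' is meant to name.  */
static uint32_t
high_half (uint64_t v)
{
  return (uint32_t) (v >> 32);
}
```

Recombining the halves as ((uint64_t) high_half (v) << 32) | low_half (v) gives back v, which is the property code working on the two registers separately has to preserve.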



[to-be-committed] Adjust rvv tests after recent jump threading change

2025-05-04 Thread Jeff Law
Richi's jump threading patch is resulting in new jump threading 
opportunities triggering in various vsetvl related tests.  When those 
new threading opportunities are realized on vector code we usually end 
up with a different number of vsetvls due to the inherent block copying.


At first I was adjusting cases to work with the new jump threads, then 
realized we could easily end up back here if we change the threading 
heuristics and such.  So I just made these tests disable jump threading. 
 I didn't do it pervasively, just for those that have been affected.


Waiting on pre-commit CI to render its verdict.

Jeff

gcc/testsuite

* gcc.target/riscv/rvv/vsetvl/avl_prop-2.c: Disable jump threading
and adjust number of expected vsetvls as needed.
* gcc.target/riscv/rvv/vsetvl/avl_single-56.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-67.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-68.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-71.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c
index 0379429a754..edb12a12664 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_prop-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gc_zve32f -mabi=lp64d 
-O3 -mrvv-vector-bits=zvl" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gc_zve32f -mabi=lp64d 
-O3 -mrvv-vector-bits=zvl -fno-thread-jumps" } */
 
 int d0, sj, v0, rp, zi;
 
@@ -38,4 +38,4 @@ ka:
   goto ka;
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*1} 2 } } */
+/* { dg-final { scan-assembler-times {vsetivli\tzero,\s*1} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c
index 5db1a402be6..3d3c5d6e9fb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-56.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" } */
 
 #include "riscv_vector.h"
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
index 3f22fc870d9..013d32c55a8 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-67.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" 
} */
 
 #include "riscv_vector.h"
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
index 64666d31f1a..aef832546c7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-68.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" 
} */
 
 #include "riscv_vector.h"
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
index 07a64b43a53..fa4328f97f3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-71.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv32gcv -mabi=ilp32 
-fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2 -fno-thread-jumps" 
} */
 
 #include "riscv_vector.h"
 
@@ -50,5 +50,5 @@ void f (int8_t * restrict in, int8_t * restrict out, int l, 
int n, int m, size_t
   }
 }
 
-/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts 
"-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts 
"-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" 
no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" 
} } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9

Re: [PATCH] Fortran: array subreferences and components of derived types [PR119986]

2025-05-04 Thread Harald Anlauf

Hi Paul,

Am 04.05.25 um 11:33 schrieb Paul Richard Thomas:

Hi Harald,

This looks good to me both for mainline and backporting as far back as you
wish.


thanks for the review!

Committed as r16-376-gfceb6022798b58 so far.
Will wait a week or so before starting a backport.

Cheers,
Harald


Thanks

Paul


On Sat, 3 May 2025 at 19:51, Harald Anlauf  wrote:


Dear all,

the attached, semi-obvious patch fixes bugging issues with passing of
array subreferences when either an inquiry reference to a complex array
or a substring reference to a character array was involved, and the
array was a component of a derived type.  The obvious cause was always
an early termination of the scan of the reference.

The original PR was about complex issues, but since I was aware of
a similar issue for substrings, I fixed that at the same time.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this is a hideous wrong-code bug, I'd like to backport
to at least 15-branch, if this is ok.

Thanks,
Harald









Re: [patch, Fortran] Fix PR 119928, rejects-valid 15/16 regression

2025-05-04 Thread Harald Anlauf

Hi Thomas,

Am 04.05.25 um 12:10 schrieb Thomas Koenig:

Hi Harald,


It appears that something is not right and generates wrong code with
the check enabled.  Can you have another look?


The problem was indeed that generating a formal from an actual
arglist is a bad idea when classes are involved.  Fixed in the
attached patch.  I think it still makes sense to remove the checks
when the other attributes are present (or PR96073 may come back
in different guise, even if I have no test case at present).


this is probably the best solution.  So let's go with it.


I have also converted the test to a run-time check.

Ok for trunk and backport to gcc-15?


OK for both.  Thanks for the patch!

Harald


Best regards

 Thomas

gcc/fortran/ChangeLog:

 PR fortran/119928
 * interface.cc (gfc_check_dummy_characteristics): Do not issue
 error if one dummy symbol has been generated from an actual
 argument and the other one has OPTIONAL, INTENT, ALLOCATABLE,
 POINTER, TARGET, VALUE, ASYNCHRONOUS or CONTIGUOUS.
 (gfc_get_formal_from_actual_arglist): Do nothing if symbol
 is a class.

gcc/testsuite/ChangeLog:

 PR fortran/119928
 * gfortran.dg/interface_60.f90: New test.




[PATCH] RISC-V: Fix gcc.target/riscv/predef-19.c [PR120054]

2025-05-04 Thread Kito Cheng
gcc/testsuite/ChangeLog:

PR target/120054
* gcc.target/riscv/predef-19.c: Adjust testcase.
---
 gcc/testsuite/gcc.target/riscv/predef-19.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/predef-19.c 
b/gcc/testsuite/gcc.target/riscv/predef-19.c
index ca3d57abca9..f2b4d9b3048 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-19.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-19.c
@@ -27,10 +27,6 @@ int main () {
 #error "__riscv_zicsr"
 #endif
 
-#if !defined(_riscv_zmmul)
-#error "__riscv_zmmul"
-#endif
-
 #if !defined(__riscv_zve32x)
 #error "__riscv_zve32x"
 #endif
-- 
2.34.1



Re: [PATCH v6 1/2] RISC-V: Add intrinsics support for SiFive Xsfvcp extensions.

2025-05-04 Thread Kito Cheng
Hi Mark:

Thanks for notifying me, fixed on the trunk :)

[1]
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d3651f07bbf56837f019e366b75d01f197dab2f1

On Fri, May 2, 2025 at 4:31 AM Mark Wielaard  wrote:

> Hi Kito,
>
> Unfortunately this breaks bootstrap for riscv:
>
> ../../gcc/gcc/config/riscv/genrvv-type-indexer.cc: In function ‘int
> main(int, const char**)’:
> ../../gcc/gcc/config/riscv/genrvv-type-indexer.cc:302:7: error: this ‘for’
> clause does not guard... [-Werror=misleading-indentation]
>   302 |   for (unsigned eew : EEW_SIZE_LIST)
>   |   ^~~
> ../../gcc/gcc/config/riscv/genrvv-type-indexer.cc:306:9: note: ...this
> statement, but the latter is misleadingly indented as if it were guarded by
> the ‘for’
>   306 | fprintf (fp, "  /*X2*/ INVALID,\n");
>   | ^~~
> cc1plus: all warnings being treated as errors
>
> https://builder.sourceware.org/buildbot/#/builders/338/builds/150
>
> On Wed, Apr 30, 2025 at 05:25:34PM +0800, Kito Cheng wrote:
> [...]
> > > diff --git a/gcc/config/riscv/genrvv-type-indexer.cc
> b/gcc/config/riscv/genrvv-type-indexer.cc
> > > index 6de23cb6e1c..2fd429ad734 100644
> > > --- a/gcc/config/riscv/genrvv-type-indexer.cc
> > > +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> > > @@ -303,6 +303,8 @@ main (int argc, const char **argv)
> > > fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
> > >  inttype (eew, LMUL1_LOG2, /* unsigned_p */true).c_str
> ());
> > >
> > > +   fprintf (fp, "  /*X2*/ INVALID,\n");
> > > +
> > >for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> > > {
> > >   unsigned multiple_of_lmul = 1 << lmul_log2_offset;
>
> That fprintf line is indented too much.
>
> Cheers,
>
> Mark
>
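
The diagnostic quoted above concerns a brace-less 'for' whose following statement is indented as if it belonged to the loop. A small stand-alone sketch of the intended shape, with the trailing statement at the loop's own indentation, might look like the following; count_entries and its parameters are hypothetical and are not taken from genrvv-type-indexer.cc.

```c
#include <stddef.h>

/* Hypothetical stand-in for an emitter loop: count one entry per
   non-zero element, then one trailing entry after the loop.  */
static int
count_entries (const unsigned *eews, size_t n)
{
  int emitted = 0;

  /* Without braces, only the next statement is the loop body.  */
  for (size_t i = 0; i < n; i++)
    emitted += (eews[i] != 0);

  /* Runs exactly once.  It sits at the same indentation as the 'for',
     so the layout matches the control flow.  */
  emitted++;

  return emitted;
}
```

Indenting the final increment one level deeper would not change what the function returns, but it is the kind of layout the warning in the quoted build log reports, and a -Werror build then fails on it.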


[PATCH v2] Use incoming small integer argument value as if promoted

2025-05-04 Thread H.J. Lu
On Wed, Apr 30, 2025 at 2:43 PM Richard Biener
 wrote:
>
> On Tue, Apr 29, 2025 at 3:53 PM H.J. Lu  wrote:
> >
> > On Tue, Apr 29, 2025 at 9:34 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Apr 29, 2025 at 2:33 PM H.J. Lu  wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 6:46 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Apr 29, 2025 at 12:32 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Tue, Apr 29, 2025 at 5:56 PM Richard Biener
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Apr 29, 2025 at 10:48 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, Apr 29, 2025 at 4:25 PM Richard Biener
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Apr 29, 2025 at 9:39 AM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > For targets, like x86, which define 
> > > > > > > > > > TARGET_PROMOTE_PROTOTYPES to return
> > > > > > > > > > true, all integer arguments smaller than int are passed as 
> > > > > > > > > > int:
> > > > > > > > > >
> > > > > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.c
> > > > > > > > > > extern int baz (char c1);
> > > > > > > > > >
> > > > > > > > > > int
> > > > > > > > > > foo (char c1)
> > > > > > > > > > {
> > > > > > > > > >   return baz (c1);
> > > > > > > > > > }
> > > > > > > > > > [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c
> > > > > > > > > > [hjl@gnu-tgl-3 pr14907]$ cat x.s
> > > > > > > > > > .file "x.c"
> > > > > > > > > > .text
> > > > > > > > > > .p2align 4
> > > > > > > > > > .globl foo
> > > > > > > > > > .type foo, @function
> > > > > > > > > > foo:
> > > > > > > > > > .LFB0:
> > > > > > > > > > .cfi_startproc
> > > > > > > > > > movsbl 4(%esp), %eax
> > > > > > > > > > movl %eax, 4(%esp)
> > > > > > > > > > jmp baz
> > > > > > > > > > .cfi_endproc
> > > > > > > > > > .LFE0:
> > > > > > > > > > .size foo, .-foo
> > > > > > > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)"
> > > > > > > > > > .section .note.GNU-stack,"",@progbits
> > > > > > > > > > [hjl@gnu-tgl-3 pr14907]$
> > > > > > > > > >
> > > > > > > > > > But integer promotion:
> > > > > > > > > >
> > > > > > > > > > movsbl 4(%esp), %eax
> > > > > > > > > > movl %eax, 4(%esp)
> > > > > > > > > >
> > > > > > > > > > isn't necessary if incoming arguments are copied to 
> > > > > > > > > > outgoing arguments
> > > > > > > > > > directly.
> > > > > > > > > >
> > > > > > > > > > Add a new target hook, 
> > > > > > > > > > TARGET_GET_SMALL_INTEGER_ARGUMENT_VALUE, defaulting
> > > > > > > > > > to return nullptr.  If the new target hook returns 
> > > > > > > > > > non-nullptr, use it to
> > > > > > > > > > get the outgoing small integer argument.  The x86 target 
> > > > > > > > > > hook returns the
> > > > > > > > > > value of the corresponding incoming argument as int if it 
> > > > > > > > > > can be used as
> > > > > > > > > > the outgoing argument.  If callee is a global function, we 
> > > > > > > > > > always properly
> > > > > > > > > > extend the incoming small integer arguments in callee.  If 
> > > > > > > > > > callee is a
> > > > > > > > > > local function, since DECL_ARG_TYPE has the original small 
> > > > > > > > > > integer type,
> > > > > > > > > > we will extend the incoming small integer arguments in 
> > > > > > > > > > callee if needed.
> > > > > > > > > > It is safe only if
> > > > > > > > > >
> > > > > > > > > > 1. Caller and callee are not nested functions.
> > > > > > > > > > 2. Caller and callee use the same ABI.
> > > > > > > > >
> > > > > > > > > How do these influence the value?  TARGET_PROMOTE_PROTOTYPES
> > > > > > > > > should apply to all of them, no?
> > > > > > > >
> > > > > > > > When the arguments are passed in different registers in 
> > > > > > > > different ABIs,
> > > > > > > > we have to copy them anyway.
> > > > > > >
> > > > > > > But optimization can elide copies easily, but not easily elide
> > > > > > > sign-/zero-extensions.
> > > > > >
> > > > > > What I meant was that caller and callee have different ABIs.
> > > > > > Optimizer can't elide copies since incoming arguments and outgoing
> > > > > > arguments are in different registers.  They have to be moved.
> > > > > >
> > > > > > > > >
> > > > > > > > > > 3. The incoming argument and the outgoing argument are in 
> > > > > > > > > > the same
> > > > > > > > > > location.
> > > > > > > > >
> > > > > > > > > Why's that?  Can't we move them but still elide the 
> > > > > > > > > sign-/zero-extension?
> > > > > > > >
> > > > > > > > If they aren't in the same locations, we have to move them 
> > > > > > > > anyway.
> > > > > > > > This patch tries to avoid necessary moves of incoming arguments 
> > > > > > > > to
> > > > > > > > outgoing arguments.
> > > > > > >
> > > > > > > > That's not exactly how you presented it, but you conveniently 
> > > > > > > used
> > > > > > > x86 stack argument passing.  That might be difficult to elide, 
> > > > > > > but is
> > > > > > > also uncommon for "small integer types" - does the same issue 

[PATCH] RISC-V: Use vclmul for CRC expansion if available

2025-05-04 Thread Anton Blanchard
If the vector version of clmul (vclmul) is available and the scalar
one is not, use it for CRC expansion.
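
As background for readers, a carry-less multiply is a multiplication in which partial products are combined with XOR instead of addition. The following is a minimal software model of that operation, not code from this patch; the name clmul64 is hypothetical, and it returns only the low 64 bits of the product.

```c
#include <stdint.h>

/* Software model of a 64-bit carry-less multiply (low half only).
   For every set bit i of B, XOR in A shifted left by i; no carries
   propagate between bit positions.  */
static uint64_t
clmul64 (uint64_t a, uint64_t b)
{
  uint64_t result = 0;

  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      result ^= a << i;

  return result;
}
```

For example, treating 3 as the polynomial x + 1, clmul64 (3, 3) is 5 (x^2 + 1), whereas the ordinary product would be 9.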

gcc/ChangeLog:

* config/riscv/bitmanip.md (crc_rev4): Check
TARGET_ZVBC.
(crc4): Likewise.
* config/riscv/riscv.cc (expand_crc_using_clmul): Emit code using
vclmul if TARGET_ZVBC.
(expand_reversed_crc_using_clmul): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/crc-builtin-zvbc.c: New test.

Signed-off-by: Anton Blanchard 
---
 gcc/config/riscv/bitmanip.md  |   5 +-
 gcc/config/riscv/riscv.cc | 110 +++---
 .../riscv/rvv/base/crc-builtin-zvbc.c |  66 +++
 3 files changed, 160 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/crc-builtin-zvbc.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d0919ece31f..86a8c5d5ed9 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -1221,7 +1221,7 @@
  we can't keep it in 64 bit variable.)
  then use clmul instruction to implement the CRC,
  otherwise (TARGET_ZBKB) generate table based using brev.  */
-  if ((TARGET_ZBKC || TARGET_ZBC) && mode < word_mode)
+  if ((TARGET_ZBKC || TARGET_ZBC || TARGET_ZVBC) && mode < 
word_mode)
 expand_reversed_crc_using_clmul (mode, mode,
 operands);
   else if (TARGET_ZBKB)
@@ -1253,7 +1253,8 @@
  (match_operand:SUBX 3)]
  UNSPEC_CRC))]
   /* We don't support the case when data's size is bigger than CRC's size.  */
-  "(TARGET_ZBKC || TARGET_ZBC) && mode >= mode"
+  "(TARGET_ZBKC || TARGET_ZBC || TARGET_ZVBC)
+   && mode >= mode"
 {
   /* If we have the ZBC or ZBKC extension (ie, clmul) and
  it is possible to store the quotient within a single variable
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a0657323f65..13d6e157448 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13987,17 +13987,53 @@ expand_crc_using_clmul (scalar_mode crc_mode, 
scalar_mode data_mode,
   rtx data = gen_rtx_ZERO_EXTEND (word_mode, operands[2]);
   riscv_expand_op (XOR, word_mode, a0, crc, data);
 
-  if (TARGET_64BIT)
-emit_insn (gen_riscv_clmul_di (a0, a0, t0));
-  else
-emit_insn (gen_riscv_clmul_si (a0, a0, t0));
+  if (TARGET_ZBKC || TARGET_ZBC)
+{
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_clmul_di (a0, a0, t0));
+  else
+   emit_insn (gen_riscv_clmul_si (a0, a0, t0));
 
-  riscv_expand_op (LSHIFTRT, word_mode, a0, a0,
-  gen_int_mode (crc_size, word_mode));
-  if (TARGET_64BIT)
-emit_insn (gen_riscv_clmul_di (a0, a0, t1));
+  riscv_expand_op (LSHIFTRT, word_mode, a0, a0,
+  gen_int_mode (crc_size, word_mode));
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_clmul_di (a0, a0, t1));
+  else
+   emit_insn (gen_riscv_clmul_si (a0, a0, t1));
+}
   else
-emit_insn (gen_riscv_clmul_si (a0, a0, t1));
+{
+  machine_mode vmode;
+  if (!riscv_vector::get_vector_mode (DImode, 1).exists (&vmode))
+   gcc_unreachable ();
+
+  rtx vec = gen_reg_rtx (vmode);
+
+  insn_code icode1 = code_for_pred_broadcast (vmode);
+  rtx ops1[] = {vec, a0};
+  emit_nonvlmax_insn (icode1, UNARY_OP, ops1, CONST1_RTX (Pmode));
+
+  rtx rvv1di_reg = gen_rtx_SUBREG (RVVM1DImode, vec, 0);
+  insn_code icode2 = code_for_pred_vclmul_scalar (UNSPEC_VCLMUL,
+ E_RVVM1DImode);
+  rtx ops2[] = {rvv1di_reg, rvv1di_reg, t0};
+  emit_nonvlmax_insn (icode2, riscv_vector::BINARY_OP, ops2, CONST1_RTX
+ (Pmode));
+
+  rtx shift_amount = gen_int_mode (data_size, Pmode);
+  insn_code icode3 = code_for_pred_scalar (LSHIFTRT, vmode);
+  rtx ops3[] = {vec, vec, shift_amount};
+  emit_nonvlmax_insn (icode3, BINARY_OP, ops3, CONST1_RTX (Pmode));
+
+  insn_code icode4 = code_for_pred_vclmul_scalar (UNSPEC_VCLMULH,
+ E_RVVM1DImode);
+  rtx ops4[] = {rvv1di_reg, rvv1di_reg, t1};
+  emit_nonvlmax_insn (icode4, riscv_vector::BINARY_OP, ops4, CONST1_RTX
+ (Pmode));
+
+  rtx vec_low_lane = gen_lowpart (DImode, vec);
+  riscv_emit_move (a0, vec_low_lane);
+}
 
   if (crc_size > data_size)
 {
@@ -14046,19 +14082,55 @@ expand_reversed_crc_using_clmul (scalar_mode 
crc_mode, scalar_mode data_mode,
   rtx a0 = gen_reg_rtx (word_mode);
   riscv_expand_op (XOR, word_mode, a0, crc, data);
 
-  if (TARGET_64BIT)
-emit_insn (gen_riscv_clmul_di (a0, a0, t0));
-  else
-emit_insn (gen_riscv_clmul_si (a0, a0, t0));
+  if (TARGET_ZBKC || TARGET_ZBC)
+{
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_clmul_di (a0, a0, t0));
+  else
+   emit_insn (gen_riscv_clmul_si (a0, a0, t0));
 
-  rtx num

Re: [PATCH] i386: Implement Thread Local Storage on Windows

2025-05-04 Thread Julian Waters
gcc bootstrap works on my end pretty well, but you know what they say,
no one likes an "It works on my device" developer :)

In all seriousness, no observable problems were seen on my end, apart
from all the existing ucrt64 programs failing to run because of the
libstdc++ issue. I believe Liu Hao's patch should fix that, those
installed packages will need a recompile afterwards to run properly
again.

best regards,
Julian

On Sat, May 3, 2025 at 1:20 PM Jonathan Yong <10wa...@gmail.com> wrote:
>
> On 5/2/25 9:59 AM, Julian Waters wrote:
> > Hi all,
> >
> > After a long hiatus, I've returned to address review comments on the 
> > Windows TLS patch. Attached here is the final patch from this effort. Ok 
> > for merge? Will need help from Windows maintainers to commit once this is 
> > approved.
> >
> > best regards,
> > Julian
> >
> Patch looks OK to me, have you tried bootstrapping gcc as a test?
>
>


Re: [PATCH] RISC-V: Implment H modifier for printing the next register name

2025-05-04 Thread Jin Ma
On Sun, 4 May 2025 08:45:25 -0600, Jeff Law wrote:
> 
> 
> On 4/27/25 1:28 AM, Jin Ma wrote:
> > For RV32 inline assembly, when handling 64-bit integer data, it is
> > often necessary to process the lower and upper 32 bits separately.
> > Unfortunately, we can only output the current register name
> > (lower 32 bits) but not the next register name (upper 32 bits).
> > 
> > To address this, the modifier 'H' has been added to allow users
> > to handle the upper 32 bits of the data. While I believe the
> > modifier 'N' (representing the next register name) might be more
> > suitable for this functionality, 'N' is already in use.
> > Therefore, 'H' (representing the high register) was chosen instead.
> > 
> > Co-Authored-By: Dimitar Dimitrov
> > 
> > gcc/ChangeLog:
> > 
> > * config/riscv/riscv.cc (riscv_print_operand): Add H.
> > * doc/extend.texi: Document for H.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/modifier-H-error-1.c: New test.
> > * gcc.target/riscv/modifier-H-error-2.c: New test.
> > * gcc.target/riscv/modifier-H.c: New test.
> I went ahead and pushed this to the trunk.  One less patch on the 
> dashboard for the Tuesday meeting :-)

Apologies for the delay in addressing this patch. I’ve been on leave over
the past few days. Thanks.

Best regards,
Jin Ma

Re: [PATCH 2/3] Flip default to LRA for targets with -mlra

2025-05-04 Thread John Paul Adrian Glaubitz
Hello Richard,

On Fri, 2025-05-02 at 12:12 +0200, Richard Biener wrote:
> This flips the default to LRA for targets with an -mlra option not
> using Mask(..).
> 
>   * config/avr/avr.opt (mlra): Flip to default on.
>   * config/m68k/m68k.opt (mlra): Likewise.
>   * config/pa/pa.opt (mlra): Likewise.
>   * config/sh/sh.opt (mlra): Likewise.
> ---
>  gcc/config/avr/avr.opt   | 2 +-
>  gcc/config/m68k/m68k.opt | 2 +-
>  gcc/config/pa/pa.opt | 2 +-
>  gcc/config/sh/sh.opt | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
> index fcd2bf68f2a..3aa4955361b 100644
> --- a/gcc/config/avr/avr.opt
> +++ b/gcc/config/avr/avr.opt
> @@ -19,7 +19,7 @@
>  ; .
>  
>  mlra
> -Target Var(avropt_lra_p) UInteger Init(0) Optimization Undocumented
> +Target Var(avropt_lra_p) UInteger Init(1) Optimization Undocumented
>  Usa LRA for reload instead of the old reload framework.  This option is 
> experimental, and it may be removed in future versions of the compiler.
>  
>  mcall-prologues
> diff --git a/gcc/config/m68k/m68k.opt b/gcc/config/m68k/m68k.opt
> index 35f86ba11ff..b664425c5a1 100644
> --- a/gcc/config/m68k/m68k.opt
> +++ b/gcc/config/m68k/m68k.opt
> @@ -147,7 +147,7 @@ Target RejectNegative Mask(LONG_JUMP_TABLE_OFFSETS)
>  Use 32-bit offsets in jump tables rather than 16-bit offsets.
>  
>  mlra
> -Target Var(m68k_lra_p) Undocumented
> +Target Var(m68k_lra_p) Init(1) Undocumented
>  Usa LRA for reload instead of the old reload framework.  This option is
>  experimental, and it may be removed in future versions of the compiler.

I just applied this patch against master and did a full native bootstrap with
all languages except Ada, Go, Rust and Cobol on Debian unstable m68k, which
worked without any problems. Full build log in [1].

I therefore suggest flipping the switch for m68k now with reference to
PR113939 [2], so that we can close this bug and tick off m68k on the list of
targets to be switched to LRA by default.

Thanks,
Adrian

> [1] http://people.debian.org/~glaubitz/gcc16-m68k-lra-all-but-go-and-ada.log.gz
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113939

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH 2/3] Flip default to LRA for targets with -mlra

2025-05-04 Thread John Paul Adrian Glaubitz
Hi,

On Sun, 2025-05-04 at 11:18 +0200, John Paul Adrian Glaubitz wrote:
> On Fri, 2025-05-02 at 12:12 +0200, Richard Biener wrote:
> > This flips the default to LRA for targets with an -mlra option not
> > using Mask(..).
> > 
> > * config/avr/avr.opt (mlra): Flip to default on.
> > * config/m68k/m68k.opt (mlra): Likewise.
> > * config/pa/pa.opt (mlra): Likewise.
> > * config/sh/sh.opt (mlra): Likewise.
> > ---
> >  gcc/config/avr/avr.opt   | 2 +-
> >  gcc/config/m68k/m68k.opt | 2 +-
> >  gcc/config/pa/pa.opt | 2 +-
> >  gcc/config/sh/sh.opt | 2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
> > index fcd2bf68f2a..3aa4955361b 100644
> > --- a/gcc/config/avr/avr.opt
> > +++ b/gcc/config/avr/avr.opt
> > @@ -19,7 +19,7 @@
> >  ; .
> >  
> >  mlra
> > -Target Var(avropt_lra_p) UInteger Init(0) Optimization Undocumented
> > +Target Var(avropt_lra_p) UInteger Init(1) Optimization Undocumented
> >  Usa LRA for reload instead of the old reload framework.  This option is 
> > experimental, and it may be removed in future versions of the compiler.
> >  
> >  mcall-prologues
> > diff --git a/gcc/config/m68k/m68k.opt b/gcc/config/m68k/m68k.opt
> > index 35f86ba11ff..b664425c5a1 100644
> > --- a/gcc/config/m68k/m68k.opt
> > +++ b/gcc/config/m68k/m68k.opt
> > @@ -147,7 +147,7 @@ Target RejectNegative Mask(LONG_JUMP_TABLE_OFFSETS)
> >  Use 32-bit offsets in jump tables rather than 16-bit offsets.
> >  
> >  mlra
> > -Target Var(m68k_lra_p) Undocumented
> > +Target Var(m68k_lra_p) Init(1) Undocumented
> >  Usa LRA for reload instead of the old reload framework.  This option is
> >  experimental, and it may be removed in future versions of the compiler.
> 
> I just applied this patch against master and did a full native bootstrap with
> all languages except Ada, Go, Rust and Cobol on Debian unstable m68k which 
> worked
> without any problems. Full build log in [1].
> 
> I therefore suggest to flip the switch for m68k now with reference to 
> PR113939 [2]
> so that we can close this bug and tick off m68k on the list of targets to be 
> switched
> to LRA by default.

I just noticed that my builds were with --disable-bootstrap. *sigh*

Let me try that again.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] Fortran: array subreferences and components of derived types [PR119986]

2025-05-04 Thread Paul Richard Thomas
Hi Harald,

This looks good to me both for mainline and backporting as far back as you
wish.

Thanks

Paul


On Sat, 3 May 2025 at 19:51, Harald Anlauf  wrote:

> Dear all,
>
> the attached, semi-obvious patch fixes bugging issues with passing of
> array subreferences when either an inquiry reference to a complex array
> or a substring reference to a character array was involved, and the
> array was a component of a derived type.  The obvious cause was always
> an early termination of the scan of the reference.
>
> The original PR was about complex issues, but since I was aware of
> a similar issue for substrings, I fixed that at the same time.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> As this is a hideous wrong-code bug, I'd like to backport
> to at least 15-branch, if this is ok.
>
> Thanks,
> Harald
>
>


Re: [patch, Fortran] Fix PR 119928, rejects-valid 15/16 regression

2025-05-04 Thread Thomas Koenig

Hi Harald,


> It appears that something is not right and generates wrong code with
> the check enabled.  Can you have another look?


The problem was indeed that generating a formal from an actual
arglist is a bad idea when classes are involved.  Fixed in the
attached patch.  I think it still makes sense to remove the checks
when the other attributes are present (or PR96073 may come back
in a different guise, even if I have no test case at present).
I have also converted the test to a run-time check.
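
For readers who do not want to wade through the diff, the reshuffled
checks can be modelled roughly as below.  This is only a sketch under
stated assumptions: DummyAttr, Dummy and check_attrs are hypothetical
stand-ins, not the real gfc_symbol layout or the real
gfc_check_dummy_characteristics, and only a few of the attributes are
shown.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a few of the symbol attributes discussed
// above; this is NOT the real gfortran symbol_attribute structure.
struct DummyAttr {
  bool artificial = false;   // formal was generated from an actual arglist
  int intent = 0;
  bool optional = false;
  bool allocatable = false;
  bool pointer = false;
};

// Hypothetical stand-in for a dummy argument symbol.
struct Dummy {
  std::string name;
  DummyAttr attr;
};

// Returns an error message on the first mismatch, or std::nullopt if
// the two dummies are considered compatible.
std::optional<std::string> check_attrs(const Dummy &s1, const Dummy &s2) {
  // A lot of information is missing for artificially generated formal
  // arguments, so none of the attribute checks are applied to them.
  if (s1.attr.artificial || s2.attr.artificial)
    return std::nullopt;

  if (s1.attr.intent != s2.attr.intent)
    return "INTENT mismatch in argument '" + s1.name + "'";
  if (s1.attr.optional != s2.attr.optional)
    return "OPTIONAL mismatch in argument '" + s1.name + "'";
  if (s1.attr.allocatable != s2.attr.allocatable)
    return "ALLOCATABLE mismatch in argument '" + s1.name + "'";
  if (s1.attr.pointer != s2.attr.pointer)
    return "POINTER mismatch in argument '" + s1.name + "'";

  return std::nullopt;
}
```

The point of the single outer guard is that one test on the artificial
flag covers every attribute check, where previously only the INTENT
check looked at it.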

Ok for trunk and backport to gcc-15?

Best regards

Thomas

gcc/fortran/ChangeLog:

PR fortran/119928
* interface.cc (gfc_check_dummy_characteristics): Do not issue
error if one dummy symbol has been generated from an actual
argument and the other one has OPTIONAL, INTENT, ALLOCATABLE,
POINTER, TARGET, VALUE, ASYNCHRONOUS or CONTIGUOUS.
(gfc_get_formal_from_actual_arglist): Do nothing if symbol
is a class.

gcc/testsuite/ChangeLog:

PR fortran/119928
* gfortran.dg/interface_60.f90: New test.

diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 1e552a3df86..753f589ff67 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -1403,77 +1403,82 @@ gfc_check_dummy_characteristics (gfc_symbol *s1, gfc_symbol *s2,
 	}
 }
 
-  /* Check INTENT.  */
-  if (s1->attr.intent != s2->attr.intent && !s1->attr.artificial
-  && !s2->attr.artificial)
-{
-  snprintf (errmsg, err_len, "INTENT mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* A lot of information is missing for artificially generated
+ formal arguments, let's not look into that.  */
 
-  /* Check OPTIONAL attribute.  */
-  if (s1->attr.optional != s2->attr.optional)
+  if (!s1->attr.artificial && !s2->attr.artificial)
 {
-  snprintf (errmsg, err_len, "OPTIONAL mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check INTENT.  */
+  if (s1->attr.intent != s2->attr.intent)
+	{
+	  snprintf (errmsg, err_len, "INTENT mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check ALLOCATABLE attribute.  */
-  if (s1->attr.allocatable != s2->attr.allocatable)
-{
-  snprintf (errmsg, err_len, "ALLOCATABLE mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check OPTIONAL attribute.  */
+  if (s1->attr.optional != s2->attr.optional)
+	{
+	  snprintf (errmsg, err_len, "OPTIONAL mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check POINTER attribute.  */
-  if (s1->attr.pointer != s2->attr.pointer)
-{
-  snprintf (errmsg, err_len, "POINTER mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check ALLOCATABLE attribute.  */
+  if (s1->attr.allocatable != s2->attr.allocatable)
+	{
+	  snprintf (errmsg, err_len, "ALLOCATABLE mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check TARGET attribute.  */
-  if (s1->attr.target != s2->attr.target)
-{
-  snprintf (errmsg, err_len, "TARGET mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check POINTER attribute.  */
+  if (s1->attr.pointer != s2->attr.pointer)
+	{
+	  snprintf (errmsg, err_len, "POINTER mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check ASYNCHRONOUS attribute.  */
-  if (s1->attr.asynchronous != s2->attr.asynchronous)
-{
-  snprintf (errmsg, err_len, "ASYNCHRONOUS mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check TARGET attribute.  */
+  if (s1->attr.target != s2->attr.target)
+	{
+	  snprintf (errmsg, err_len, "TARGET mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check CONTIGUOUS attribute.  */
-  if (s1->attr.contiguous != s2->attr.contiguous)
-{
-  snprintf (errmsg, err_len, "CONTIGUOUS mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check ASYNCHRONOUS attribute.  */
+  if (s1->attr.asynchronous != s2->attr.asynchronous)
+	{
+	  snprintf (errmsg, err_len, "ASYNCHRONOUS mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check VALUE attribute.  */
-  if (s1->attr.value != s2->attr.value)
-{
-  snprintf (errmsg, err_len, "VALUE mismatch in argument '%s'",
-		s1->name);
-  return false;
-}
+  /* Check CONTIGUOUS attribute.  */
+  if (s1->attr.contiguous != s2->attr.contiguous)
+	{
+	  snprintf (errmsg, err_len, "CONTIGUOUS mismatch in argument '%s'",
+		s1->name);
+	  return false;
+	}
 
-  /* Check VOLATILE attribute.  */
-  if (s1->attr.volatile_ != s2->attr.volatile_)
-{
-  snprintf (errmsg, err_len, "VOLATILE mismatch in argument '%s'",
-		s1->name);
-  return false;
+  /* Check VALUE attribute.  */
+  if (s1->attr.value != s2->attr.value)
+	{
+	  snprintf (errmsg, err_len, "VALUE mismatch in argument '%s'",

Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-04 Thread Maciej W. Rozycki
On Sun, 4 May 2025, John Paul Adrian Glaubitz wrote:

> >  I only have non-BWX hardware and I'm not interested in decommissioning it 
> > or upgrading.  There appear to be a few users around, but I seem to be the 
> > last GCC developer remaining who is willing to do anything about the port.  
> > It doesn't help that Alpha/QEMU appears broken and produces unreliable 
> > results, so it'd have to be someone with actual hardware (or willing to 
> > fix QEMU first).
> 
> What exactly is broken with the QEMU emulation in Alpha? I don't know of any
> bugs, but it could be that you have run into the nasty stack alignment issue
> in the kernel that was fixed in Linux 6.14.

 This was with QEMU in the user emulation mode, causing intermittent 
failures across the GCC testsuite, so unrelated to any Linux kernel 
issues.  Perhaps the system emulation mode works better, but the GCC 
testsuite doesn't rely much on syscall emulation and the nature of the 
failures didn't indicate this aspect of the user emulation mode mattered 
here.

 I reported it at the time and this led to Magnus being kind 
enough, following your request, to let me use his BWX Alpha system for 
verification instead, where no intermittent failures were observed, so 
again no Linux kernel bugs mattered here (this was last year, well before 
the fix) and it was QEMU clearly at fault.

> >  What I was not aware of is the situation with the Alpha backend and the 
> > need to put out fires there.  That non-BWX issue with Linux kernel's RCU 
> > algorithms was a nasty surprise to me, one I could have dealt with before 
> > with less time pressure if I knew about it.
> 
> What RCU issue are you talking about? I can only stress that to use Linux on
> Alpha, you *must* use kernel 6.14 or later with CONFIG_COMPACTION disabled
> otherwise you will run into all kinds of issues.

 The very RCU issue that prompted the removal of non-BWX support from the 
kernel last year and then this whole effort of mine.

  Maciej


Re: [PATCH 0/3][RFC] Remove TARGET_LRA_P hook

2025-05-04 Thread John Paul Adrian Glaubitz
Hi Maciej,

On Sun, 2025-05-04 at 12:11 +0100, Maciej W. Rozycki wrote:
> > What exactly is broken with the QEMU emulation in Alpha? I don't know of any
> > bugs, but it could be that you have run into the nasty stack alignment issue
> > in the kernel that was fixed in Linux 6.14.
> 
>  This was with QEMU in the user emulation mode, causing intermittent 
> failures across the GCC testsuite, so unrelated to any Linux kernel 
> issues.  Perhaps the system emulation mode works better, but the GCC 
> testsuite doesn't rely much on syscall emulation and the nature of the 
> failures didn't indicate this aspect of the user emulation mode mattered 
> here.

From my personal experience, qemu-user has various issues that don't exist
on qemu-system. So, if you're experiencing a qemu-related bug in qemu-user,
it's always worth verifying it with qemu-system.

>  I have reported it at the time and this has led to Magnus being kind 
> enough, following your request, to let me use his BWX Alpha system for 
> verification instead, where no intermittent failures were observed, so 
> again no Linux kernel bugs mattered here (this was last year, well before 
> the fix) and it was QEMU clearly at fault.

Could you point me to the bug report in question? I would like to look into
it and see if it is alpha-specific.

> > >  What I was not aware of is the situation with the Alpha backend and the 
> > > need to put out fires there.  That non-BWX issue with Linux kernel's RCU 
> > > algorithms was a nasty surprise to me, one I could have dealt with before 
> > > with less time pressure if I knew about it.
> > 
> > What RCU issue are you talking about? I can only stress that to use Linux on
> > Alpha, you *must* use kernel 6.14 or later with CONFIG_COMPACTION disabled
> > otherwise you will run into all kinds of issues.
> 
>  The very RCU issue that prompted the removal of non-BWX support from the 
> kernel last year and then this whole effort of mine.

Aha, I wasn't aware that the original cause for the removal of non-BWX support
was an issue with RCU. I thought the original motivation was that non-BWX
Alpha doesn't support byte access, which Linus called a design mistake.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913