Re: [PATCH] ifcvt: Don't speculation move inline-asm [PR102150]

2025-02-12 Thread Richard Biener
On Wed, Feb 12, 2025 at 5:39 AM Andrew Pinski  wrote:
>
> So unlike loop-invariant motion, moving an inline-asm out of an
> if is not always profitable, and the cost estimate for the instructions
> inside an inline-asm is unknown.
>
> This is a regression from GCC 4.6, which as far as I can tell didn't
> speculatively move inline-asm.
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> PR rtl-optimization/102150
> gcc/ChangeLog:
>
> * ifcvt.cc (cheap_bb_rtx_cost_p): Return false if the insn
> has an inline-asm in it.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/ifcvt.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index cb5597bc171..707937ba2f0 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -166,6 +166,12 @@ cheap_bb_rtx_cost_p (const_basic_block bb,
>  {
>if (NONJUMP_INSN_P (insn))
> {
> + /* The cost of an inline-asm cannot be estimated
> +reliably.  It could be a costly instruction, but
> +the estimate would be the same as for a
> +non-costly instruction.  */
> + if (asm_noperands (PATTERN (insn)) >= 0)
> +   return false;
>   int cost = insn_cost (insn, speed) * REG_BR_PROB_BASE;
>   if (cost == 0)
> return false;
> --
> 2.43.0
>
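For illustration, here is a minimal C++ sketch (my own, not from the patch; the empty asm is a portable stand-in) of the shape this change protects. Before the fix, `cheap_bb_rtx_cost_p` could judge the guarded block "cheap" and if-conversion could execute the asm speculatively; after the fix, blocks containing inline-asm are never considered cheap enough to speculate:

```cpp
#include <cassert>

// Hypothetical illustration: the empty asm stands in for an inline asm
// whose real cost insn_cost cannot estimate and whose effects may only
// be valid under the guard.  If-conversion must not hoist it out of
// the 'if' and execute it unconditionally.
static int guarded_asm (int x, bool cond)
{
  int y = x;
  if (cond)
    asm volatile ("" : "+r" (y));   // opaque to the optimizer, no-op here
  else
    y = x + 1;
  return y;
}
```

Either way the branch is compiled, the asm's side effects must only occur when `cond` is true.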


Re: [PATCH] loop-invariant: Treat inline-asm conditional trapping [PR102150]

2025-02-12 Thread Richard Biener
On Wed, Feb 12, 2025 at 9:41 AM Andrew Pinski  wrote:
>
> So inline-asm is known not to trap, BUT it can have undefined behavior
> if executed speculatively.  This fixes the loop-invariant pass to
> treat it similarly to the trapping cases.  If the inline-asm is always
> executed, it can still be pulled out of the loop; otherwise it is kept
> inside the loop.
>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * loop-invariant.cc (find_invariant_insn): Treat inline-asm similar to
> trapping instruction and only move them if always executed.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/loop-invariant.cc | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
> index bcb52bb9c76..79a4c39dfb0 100644
> --- a/gcc/loop-invariant.cc
> +++ b/gcc/loop-invariant.cc
> @@ -1123,6 +1123,11 @@ find_invariant_insn (rtx_insn *insn, bool always_reached, bool always_executed)
>if (may_trap_or_fault_p (PATTERN (insn)) && !always_reached)
>  return;
>
> +  /* inline-asm that is not always executed cannot be moved
> + as it might trap. */

as it might conditionally trap?

OK.

Thanks,
Richard.

> +  if (!always_reached && asm_noperands (PATTERN (insn)) >= 0)
> +return;
> +
>depends_on = BITMAP_ALLOC (NULL);
>if (!check_dependencies (insn, depends_on))
>  {
> --
> 2.43.0
>
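A sketch of the loop-invariant case (again my own illustration, with an empty asm as a portable stand-in): the asm's operands are loop-invariant, so the pass would previously consider hoisting it, but it only executes when the guard holds, so hoisting would run it speculatively:

```cpp
#include <cassert>

// Hypothetical shape for the PR102150 loop-invariant concern: the asm's
// input 't = inv' is invariant, but the asm runs only under 'guard'.
// After the patch, find_invariant_insn refuses to move an inline-asm
// unless it is always reached in the loop body.
static long sum_guarded (const int *a, int n, int inv, bool guard)
{
  long s = 0;
  for (int i = 0; i < n; ++i)
    {
      int t = inv;
      if (guard)
        asm volatile ("" : "+r" (t));  // invariant operands, guarded execution
      s += a[i] + t;
    }
  return s;
}
```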


Re: [PATCH] testsuite: Enable reduced parallel batch sizes

2025-02-12 Thread Thomas Schwinge
Hi!

On 2025-02-05T16:12:38+, Andrew Carlotti  wrote:
> Various aarch64 tests attempt to reduce the batch size for parallel test
> execution to a single test per batch, but it looks like the necessary
> changes to gcc_parallel_test_run_p were accidentally omitted when the
> aarch64-*-acle-asm.exp files were merged.  This patch corrects that
> omission.
>
> This does have a measurable performance impact when running a limited
> number of tests.  For example, in aarch64-sve-acle-asm.exp the use of
> torture options results in 16 compiler executions for each test; when
> running two such tests I observed a total test duration of 3m39 without
> this patch, and 1m55 with the patch.  A full batch of 10 tests would
> have taken over 15 minutes to run on this machine.
>
>
> Ok for master?

Shouldn't the 'libstdc++-v3/testsuite/lib/libstdc++.exp' copy of this
code likewise be changed?


Regards,
 Thomas


> gcc/testsuite/ChangeLog:
>
>   * lib/gcc-defs.exp
>   (gcc_runtest_parallelize_limit_minor): New global variable.
>   (gcc_parallel_test_run_p): Use new variable for batch size.
>
>
> diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
> index 29403d7317c762cba9485dc0562c4c292b777634..2f8b7d488691cc69ccca256864dfaebf3c9b40ac 100644
> --- a/gcc/testsuite/lib/gcc-defs.exp
> +++ b/gcc/testsuite/lib/gcc-defs.exp
> @@ -172,6 +172,7 @@ if { [info exists env(GCC_RUNTEST_PARALLELIZE_DIR)] \
>   && [info procs gcc_parallelize_saved_runtest_file_p] == [list] } then {
>  global gcc_runtest_parallelize_counter
>  global gcc_runtest_parallelize_counter_minor
> +global gcc_runtest_parallelize_limit_minor
>  global gcc_runtest_parallelize_enable
>  global gcc_runtest_parallelize_dir
>  global gcc_runtest_parallelize_last
> @@ -212,6 +213,7 @@ if { [info exists env(GCC_RUNTEST_PARALLELIZE_DIR)] \
>  #and investigate if they don't.
>  set gcc_runtest_parallelize_counter 0
>  set gcc_runtest_parallelize_counter_minor 0
> +set gcc_runtest_parallelize_limit_minor 10
>  set gcc_runtest_parallelize_enable 1
>  set gcc_runtest_parallelize_dir [getenv GCC_RUNTEST_PARALLELIZE_DIR]
>  set gcc_runtest_parallelize_last 0
> @@ -219,6 +221,7 @@ if { [info exists env(GCC_RUNTEST_PARALLELIZE_DIR)] \
>  proc gcc_parallel_test_run_p { testcase } {
>   global gcc_runtest_parallelize_counter
>   global gcc_runtest_parallelize_counter_minor
> + global gcc_runtest_parallelize_limit_minor
>   global gcc_runtest_parallelize_enable
>   global gcc_runtest_parallelize_dir
>   global gcc_runtest_parallelize_last
> @@ -228,10 +231,10 @@ if { [info exists env(GCC_RUNTEST_PARALLELIZE_DIR)] \
>   }
>  
>   # Only test the filesystem every 10th iteration
> - incr gcc_runtest_parallelize_counter_minor
> - if { $gcc_runtest_parallelize_counter_minor == 10 } {
> + if { $gcc_runtest_parallelize_counter_minor >= $gcc_runtest_parallelize_limit_minor } {
>   set gcc_runtest_parallelize_counter_minor 0
>   }
> + incr gcc_runtest_parallelize_counter_minor
>   if { $gcc_runtest_parallelize_counter_minor != 1 } {
>   #verbose -log "gcc_parallel_test_run_p $testcase $gcc_runtest_parallelize_counter $gcc_runtest_parallelize_last"
>   return $gcc_runtest_parallelize_last
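The minor-counter logic after the patch can be modeled outside Tcl (a C++ sketch with names of my own; the real code is in gcc-defs.exp): only the call on which the counter wraps to 1 performs the filesystem check, so a limit of 1 makes every test its own batch, which is what the aarch64 `*-acle-asm.exp` files want:

```cpp
#include <cassert>

// Model of gcc_parallel_test_run_p's minor counter after the patch:
// reset once the counter reaches the limit, then increment; only calls
// where the counter becomes 1 do the (expensive) filesystem check.
struct Parallelizer
{
  int minor = 0;
  int limit;          // models gcc_runtest_parallelize_limit_minor
  int fs_checks = 0;  // how often we would touch the filesystem

  explicit Parallelizer (int l) : limit (l) {}

  bool run_p ()
  {
    if (minor >= limit)
      minor = 0;
    ++minor;
    if (minor != 1)
      return true;   // reuse the cached last decision (simplified)
    ++fs_checks;     // stands in for the per-batch file probe
    return true;
  }
};
```

With limit 10, twenty calls probe the filesystem twice; with limit 1, every call probes, i.e. one test per batch.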


[PATCH htdocs 2/2] gcc-15/porting_to: link to "Standards conformance" section for C++

2025-02-12 Thread Sam James
Suggested by Andrew Pinski.
---
 htdocs/gcc-15/porting_to.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html
index b9b2efc7..829ae92f 100644
--- a/htdocs/gcc-15/porting_to.html
+++ b/htdocs/gcc-15/porting_to.html
@@ -137,6 +137,12 @@ In file included from :1:
 
 C++ language issues
 
+
+<p>
+Note that all GCC releases make
+<a href="https://gcc.gnu.org/bugs/#upgrading">improvements to conformance</a>
+which may reject non-conforming, legacy codebases.
+</p>
 Header dependency changes
 Some C++ Standard Library headers have been changed to no longer include
 other headers that were being used internally by the library.
-- 
2.48.1



[PATCH htdocs 1/2] bugs: improve "ABI changes" subsection

2025-02-12 Thread Sam James
C++ ABI for C++ standards with full support by GCC (rather than those
marked as experimental per https://gcc.gnu.org/projects/cxx-status.html)
should be stable.  It is certainly not the case in 2025 that one needs a
full world rebuild of C++ libraries when using e.g. the default standard
or any other standard fully supported by GCC, unless the standard used
is marked experimental, for which we provide no guarantees.
---
 htdocs/bugs/index.html | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index d6556b26..99d19095 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -633,14 +633,14 @@ changed the parser rules so that <:: works as expected.
 components: the first defines how the elements of classes are laid
 out, how functions are called, how function names are mangled, etc;
 the second part deals with the internals of the objects in libstdc++.
-Although we strive for a non-changing ABI, so far we have had to
-modify it with each major release.  If you change your compiler to a
-different major release you must recompile all libraries that
-contain C++ code.  If you fail to do so you risk getting linker
-errors or malfunctioning programs.
-It should not be necessary to recompile if you have changed
-to a bug-fix release of the same version of the compiler; bug-fix
-releases are careful to avoid ABI changes. See also the
+For C++ standards marked as
+<a href="https://gcc.gnu.org/projects/cxx-status.html">experimental</a>,
+stable ABI is not guaranteed: for these, if you change your compiler to a
+different major release you must recompile any such libraries built
+with experimental standard support that contain C++ code.  If you fail
+to do so, you risk getting linker errors or malfunctioning programs.
+It should not be necessary to recompile for C++ standards supported fully
+by GCC, such as the default standard.  See also the
 <a href="https://gcc.gnu.org/onlinedocs/gcc/Compatibility.html">compatibility
 section</a> of the GCC manual.
 
-- 
2.48.1



[PATCH] arm: gimple fold aes[ed] [PR114522]

2025-02-12 Thread Christophe Lyon
Almost a copy/paste from the recent aarch64 version of this patch,
this one is a bit more intrusive because it also introduces
arm_general_gimple_fold_builtin.

With this patch,
gcc.target/arm/aes_xor_combine.c scan-assembler-not veor
passes again.

gcc/ChangeLog:

PR target/114522
* config/arm/arm-builtins.cc (arm_fold_aes_op): New function.
(arm_general_gimple_fold_builtin): New function.
* config/arm/arm-builtins.h (arm_general_gimple_fold_builtin): New
prototype.
* config/arm/arm.cc (arm_gimple_fold_builtin): Call
arm_general_gimple_fold_builtin as needed.
---
 gcc/config/arm/arm-builtins.cc | 55 ++
 gcc/config/arm/arm-builtins.h  |  1 +
 gcc/config/arm/arm.cc  |  3 ++
 3 files changed, 59 insertions(+)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index e860607686c..c56ab5db985 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -45,6 +45,9 @@
 #include "arm-builtins.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "ssa.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 7
 
@@ -4053,4 +4056,56 @@ arm_cde_end_args (tree fndecl)
 }
 }
 
+/* Fold a call to vaeseq_u8 and vaesdq_u8.
+   That is, `vaeseq_u8 (x ^ y, 0)` gets folded
+   into `vaeseq_u8 (x, y)`.  */
+static gimple *
+arm_fold_aes_op (gcall *stmt)
+{
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+  if (integer_zerop (arg0))
+arg0 = arg1;
+  else if (!integer_zerop (arg1))
+return nullptr;
+  if (TREE_CODE (arg0) != SSA_NAME)
+return nullptr;
+  if (!has_single_use (arg0))
+return nullptr;
+  auto *s = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (arg0));
+  if (!s || gimple_assign_rhs_code (s) != BIT_XOR_EXPR)
+return nullptr;
+  gimple_call_set_arg (stmt, 0, gimple_assign_rhs1 (s));
+  gimple_call_set_arg (stmt, 1, gimple_assign_rhs2 (s));
+  return stmt;
+}
+
+/* Try to fold STMT, given that it's a call to the built-in function with
+   subcode FCODE.  Return the new statement on success and null on
+   failure.  */
+gimple *
+arm_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
+{
+  gimple *new_stmt = NULL;
+
+  switch (fcode)
+{
+case ARM_BUILTIN_CRYPTO_AESE:
+case ARM_BUILTIN_CRYPTO_AESD:
+  new_stmt = arm_fold_aes_op (stmt);
+  break;
+}
+
+  /* GIMPLE assign statements (unlike calls) require a non-null lhs.  If we
+ created an assign statement with a null lhs, then fix this by assigning
+ to a new (and subsequently unused) variable.  */
+  if (new_stmt && is_gimple_assign (new_stmt) && !gimple_assign_lhs (new_stmt))
+{
+  tree new_lhs = make_ssa_name (gimple_call_return_type (stmt));
+  gimple_assign_set_lhs (new_stmt, new_lhs);
+}
+
+  return new_stmt;
+}
+
 #include "gt-arm-builtins.h"
diff --git a/gcc/config/arm/arm-builtins.h b/gcc/config/arm/arm-builtins.h
index 1fa85b602d9..3a646619f44 100644
--- a/gcc/config/arm/arm-builtins.h
+++ b/gcc/config/arm/arm-builtins.h
@@ -32,6 +32,7 @@ enum resolver_ident {
 };
 enum resolver_ident arm_describe_resolver (tree);
 unsigned arm_cde_end_args (tree);
+gimple *arm_general_gimple_fold_builtin (unsigned int, gcall *);
 
 #define ENTRY(E, M, Q, S, T, G) E,
 enum arm_simd_type
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index a95ddf8201f..00499a26bae 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -76,6 +76,7 @@
 #include "aarch-common.h"
 #include "aarch-common-protos.h"
 #include "machmode.h"
+#include "arm-builtins.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2859,7 +2860,9 @@ arm_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   switch (code & ARM_BUILTIN_CLASS)
 {
 case ARM_BUILTIN_GENERAL:
+  new_stmt = arm_general_gimple_fold_builtin (subcode, stmt);
   break;
+
 case ARM_BUILTIN_MVE:
   new_stmt = arm_mve::gimple_fold_builtin (subcode, stmt);
 }
-- 
2.34.1
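The identity the fold relies on: AESE/AESD begin with AddRoundKey, an XOR of state and key, so `f (x ^ y, 0)` equals `f (x, y)`. A scalar C++ model (my own stand-in; the real `vaeseq_u8` operates on 16-byte vectors and applies SubBytes/ShiftRows after the XOR):

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the post-XOR part of an AES encrypt round; any function
// applied after the initial AddRoundKey preserves the identity below.
static uint8_t round_fn (uint8_t v) { return (uint8_t) (v * 7 + 3); }

// Model of vaeseq_u8's structure: XOR state with key, then transform.
// Because XOR with 0 is the identity, aese_model (x ^ y, 0) and
// aese_model (x, y) compute the same value, which is exactly what
// arm_fold_aes_op exploits to remove the separate veor.
static uint8_t aese_model (uint8_t state, uint8_t key)
{
  return round_fn (state ^ key);
}
```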



Re: [PATCH] SFINAE check for floating point fetch_add builtins in libstdc++

2025-02-12 Thread Matthew Malcomson
Hi there -- since you've not pushed quite yet can I ask that you use the 
following commit message instead of the existing one?
(Updated a few comments to match what has changed since I wrote that 
message).


Apologies for not noticing it earlier.



libstdc++: Conditionally use floating-point fetch_add builtins

- Some hardware has support for floating point atomic fetch_add (and
  similar).
- There are existing compilers targeting this hardware that use
  libstdc++ -- e.g. NVC++.
- Since the libstdc++ atomic::fetch_add and similar are written
  directly as a CAS loop, these compilers cannot emit optimal code when
  seeing such constructs.
- I hope to use __atomic_fetch_add builtins on floating point types
  directly in libstdc++ so these compilers can emit better code.
- Clang already handles some floating point types in the
  __atomic_fetch_add family of builtins.
- In order to only use this when available, I originally thought I could
  check against the resolved versions of the builtin in a manner
  something like `__has_builtin(__atomic_fetch_add_)`.
  I then realised that clang does not expose resolved versions of these
  atomic builtins to the user.
  From the clang discourse it was suggested we instead use SFINAE (which
  clang already supports).
- I have recently pushed a patch for allowing the use of SFINAE on
  builtins.
  https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
  Now that patch is upstream, this patch does not change what happens
  for GCC, while it uses the builtin for codegen with clang.
- I have previously sent a patchset upstream adding the ability to use
  __atomic_fetch_add and similar on floating point types.
  https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668754.html
  Once that patchset is upstream (plus the automatic linking of
  libatomic as Joseph pointed out in the email below
  https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
  then current GCC should start to use the builtin branch added in this
  patch.

So *currently*, this patch allows external compilers (NVC++ in
particular) to generate better code, and similarly lets clang understand
the operation better since it maps to a known builtin.

I hope that by GCC 16 this patch would also allow GCC to understand the
operation better via mapping to a known builtin.

Testing done:
1) No change seen in bootstrap & regression test on AArch64
2) No change seen in bootstrap & regression test on x86_64 (Jonathan did
   this test).
3) Manually checked that when compiling with clang we follow the branch
   that uses the builtin for `float` (because clang has already
   implemented these builtins for `float`).
   - Done by adding `+1` to the result of the builtin branch and
 checking that we abort when running the result for each of
 fetch_add/fetch_sub/add_fetch/sub_fetch in turn (i.e. 4 separate
 manual checks).
4) Manually checked that when compiling with GCC we follow the branch
   that does not use the builtin for `float`.
   - Done this by adding the same temporary bug to the header in the
 builtin branch, and re-running tests to see that we still pass with
 GCC.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (__atomic_fetch_addable): Define
new concept.
(__atomic_impl::__fetch_add_flt): Use new concept to make use of
__atomic_fetch_add when available.
(__atomic_fetch_subtractable, __fetch_sub_flt): Likewise.
(__atomic_add_fetchable, __add_fetch_flt): Likewise.
(__atomic_sub_fetchable, __sub_fetch_flt): Likewise.

Signed-off-by: Matthew Malcomson 
Co-authored-by: Jonathan Wakely 
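The CAS loop the commit message refers to can be sketched as follows (an illustration of the fallback's shape, not the libstdc++ source):

```cpp
#include <atomic>
#include <cassert>

// Fallback shape for floating-point fetch_add: load, then CAS until the
// stored value is old + delta; returns the old value, like fetch_add.
// When __atomic_fetch_add accepts floating-point types (e.g. on clang),
// the patched header calls the builtin instead, letting hardware with
// FP atomics emit a single atomic add rather than this retry loop.
static float fetch_add_cas (std::atomic<float> &a, float delta)
{
  float old = a.load (std::memory_order_relaxed);
  while (!a.compare_exchange_weak (old, old + delta))
    ;  // 'old' is refreshed with the current value on failure
  return old;
}
```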

---


On 2/8/25 14:05, Jonathan Wakely wrote:

External email: Use caution opening links or attachments


On Sat, 8 Feb 2025 at 10:55, Matthew Malcomson  wrote:


Hi Jonathan!

Many thanks!  Will learn the libstdc++ style eventually.

I've run bootstrap & regression test on this, and did the manual checks
I mentioned before of compiling atomic_float/1.cc with clang and then
adding `+ 1` on the builtin codepath to check that the clang binary
aborts while the GCC built binary passes.


Great, thanks for checking it. I'll get this pushed early next week.




Thanks again!
Matthew

On 2/7/25 15:33, Jonathan Wakely wrote:

External email: Use caution opening links or attachments


On 05/02/25 13:43 +, Jonathan Wakely wrote:

On 28/10/24 17:15 +, mmalcom...@nvidia.com wrote:

From: Matthew Malcomson 

I noticed that the libstdc++ patch is essentially separate and figured I
could send it upstream earlier to give reviewers more time to look at
it.
I am still working on adding the ability to use floating point types in
the __atomic_fetch_add builtins

Review of current state and motivation (for anyone reading this that has
not already seen the previous patches):
- Some hardware has support for floating point atomic fetch_add (and
similar).
- There are existing compilers targeting this hardware that use
libstdc++ --

[PATCH] c++/modules: Don't treat template parameters as TU-local [PR118846]

2025-02-12 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

There are two separate issues making various template parameters behave
as if they were TU-local.

Firstly, the TU-local detection code uses WILDCARD_TYPE_P to check for
types that are not yet concrete; for some reason UNBOUND_CLASS_TEMPLATE
is not on that list.  I don't see any particular reason why it shouldn't
be, so this patch adds it; this may solve other latent issues as well.

Secondly, the TEMPLATE_DECL for a type with expressions involving
template template parameters is currently always constrained to internal
linkage, because the result does not have TREE_PUBLIC set.  Rather than
messing with TREE_PUBLIC here, I think we should rather just ensure that
we only attempt to constrain the visibility of templates of type,
variable, or function decls.

PR c++/118846

gcc/cp/ChangeLog:

* cp-tree.h (WILDCARD_TYPE_P): Include UNBOUND_CLASS_TEMPLATE.
* decl2.cc (min_vis_expr_r): Don't rely on TREE_PUBLIC always
being accurate for the DECL_TEMPLATE_RESULT of a template.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr118846_a.C: New test.
* g++.dg/modules/pr118846_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/decl2.cc   | 22 ++
 gcc/testsuite/g++.dg/modules/pr118846_a.C | 18 ++
 gcc/testsuite/g++.dg/modules/pr118846_b.C | 10 ++
 4 files changed, 43 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr118846_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr118846_b.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ec976928f5f..3d0349f66a3 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2325,6 +2325,7 @@ enum languages { lang_c, lang_cplusplus };
|| TREE_CODE (T) == TYPENAME_TYPE   \
|| TREE_CODE (T) == TYPEOF_TYPE \
|| TREE_CODE (T) == BOUND_TEMPLATE_TEMPLATE_PARM\
+   || TREE_CODE (T) == UNBOUND_CLASS_TEMPLATE  \
|| TREE_CODE (T) == DECLTYPE_TYPE   \
|| TREE_CODE (T) == TRAIT_TYPE  \
|| TREE_CODE (T) == DEPENDENT_OPERATOR_TYPE \
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 593dcaa4e2d..4415cea93e0 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2786,6 +2786,19 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
   tree t = *tp;
   if (TREE_CODE (t) == PTRMEM_CST)
 t = PTRMEM_CST_MEMBER (t);
+
+  if (TREE_CODE (t) == TEMPLATE_DECL)
+{
+  if (DECL_ALIAS_TEMPLATE_P (t) || concept_definition_p (t))
+   /* FIXME: We don't maintain TREE_PUBLIC / DECL_VISIBILITY for
+  alias templates so we can't trust it here (PR107906).  Ditto
+  for concepts.  */
+   return NULL_TREE;
+  t = DECL_TEMPLATE_RESULT (t);
+  if (!t)
+   return NULL_TREE;
+}
+
   switch (TREE_CODE (t))
 {
 case CAST_EXPR:
@@ -2797,17 +2810,10 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
 case NEW_EXPR:
 case CONSTRUCTOR:
 case LAMBDA_EXPR:
+case TYPE_DECL:
   tpvis = type_visibility (TREE_TYPE (t));
   break;
 
-case TEMPLATE_DECL:
-  if (DECL_ALIAS_TEMPLATE_P (t) || concept_definition_p (t))
-   /* FIXME: We don't maintain TREE_PUBLIC / DECL_VISIBILITY for
-  alias templates so we can't trust it here (PR107906).  Ditto
-  for concepts.  */
-   break;
-  t = DECL_TEMPLATE_RESULT (t);
-  /* Fall through.  */
 case VAR_DECL:
 case FUNCTION_DECL:
   if (decl_constant_var_p (t))
diff --git a/gcc/testsuite/g++.dg/modules/pr118846_a.C b/gcc/testsuite/g++.dg/modules/pr118846_a.C
new file mode 100644
index 000..bbbdde78457
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr118846_a.C
@@ -0,0 +1,18 @@
+// PR c++/118846
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi M }
+
+export module M;
+
+template <int N> struct integral_constant { static constexpr int value = N; };
+template <template <typename> class> constexpr int cx_count_if() { return 0; }
+template <template <typename> class P> struct mp_count_if_impl {
+  using type = integral_constant<cx_count_if<P>()>;
+};
+
+template <template <typename> class> struct consume {
+  static constexpr bool value = true;
+};
+template <typename T> struct use {
+  using type = consume<T::template fn>;
+};
diff --git a/gcc/testsuite/g++.dg/modules/pr118846_b.C b/gcc/testsuite/g++.dg/modules/pr118846_b.C
new file mode 100644
index 000..a2f28894630
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr118846_b.C
@@ -0,0 +1,10 @@
+// PR c++/118846
+// { dg-additional-options "-fmodules" }
+
+module M;
+
+template <typename T> struct S {
+  template <typename U> struct fn {};
+};
+static_assert(mp_count_if_impl<S<int>::fn>::type::value == 0);
+static_assert(use<S<int>>::type::value);
-- 
2.47.0



Re: [PATCH] SFINAE check for floating point fetch_add builtins in libstdc++

2025-02-12 Thread Jonathan Wakely
On Wed, 12 Feb 2025 at 12:05, Matthew Malcomson  wrote:
>
> Hi there -- since you've not pushed quite yet can I ask that you use the
> following commit message instead of the existing one?
> (Updated a few comments to match what has changed since I wrote that
> message).
>
> Apologies for not noticing it earlier.

Sure, I'll update it before I push.


>
>
> 
> libstdc++: Conditionally use floating-point fetch_add builtins
>
> - Some hardware has support for floating point atomic fetch_add (and
>similar).
> - There are existing compilers targeting this hardware that use
>libstdc++ -- e.g. NVC++.
> - Since the libstdc++ atomic::fetch_add and similar is written
>directly as a CAS loop these compilers can not emit optimal code when
>seeing such constructs.
> - I hope to use __atomic_fetch_add builtins on floating point types
>directly in libstdc++ so these compilers can emit better code.
> - Clang already handles some floating point types in the
>__atomic_fetch_add family of builtins.
> - In order to only use this when available, I originally thought I could
>check against the resolved versions of the builtin in a manner
>something like `__has_builtin(__atomic_fetch_add_)`.
>I then realised that clang does not expose resolved versions of these
>atomic builtins to the user.
>From the clang discourse it was suggested we instead use SFINAE (which
>clang already supports).
> - I have recently pushed a patch for allowing the use of SFINAE on
>builtins.
>https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
>Now that patch is upstream, this patch does not change what happens
>for GCC, while it uses the builtin for codegen with clang.
> - I have previously sent a patchset upstream adding the ability to use
>__atomic_fetch_add and similar on floating point types.
>https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668754.html
>Once that patchset is upstream (plus the automatic linking of
>libatomic as Joseph pointed out in the email below
>https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
>then current GCC should start to use the builtin branch added in this
>patch.
>
> So *currently*, this patch allows external compilers (NVC++ in
> particular) to generate better code, and similarly lets clang understand
> the operation better since it maps to a known builtin.
>
> I hope that by GCC 16 this patch would also allow GCC to understand the
> operation better via mapping to a known builtin.
>
> Testing done:
> 1) No change seen in bootstrap & regression test on AArch64
> 2) No change seen in bootstrap & regression test on x86_64 (Jonathan did
> this test).
> 3) Manually checked that when compiling with clang we follow the branch
> that uses the builtin for `float` (because clang has already
> implemented these builtins for `float`).
> - Done by adding `+1` to the result of the builtin branch and
>   checking that we abort when running the result for each of
>   fetch_add/fetch_sub/add_fetch/sub_fetch in turn (i.e. 4 separate
>   manual checks).
> 4) Manually checked that when compiling with GCC we follow the branch
> that does not use the builtin for `float`.
> - Done this by adding the same temporary bug to the header in the
>   builtin branch, and re-running tests to see that we still pass with
>   GCC.
>
> libstdc++-v3/ChangeLog:
>
>  * include/bits/atomic_base.h (__atomic_fetch_addable): Define
>  new concept.
>  (__atomic_impl::__fetch_add_flt): Use new concept to make use of
>  __atomic_fetch_add when available.
>  (__atomic_fetch_subtractable, __fetch_sub_flt): Likewise.
>  (__atomic_add_fetchable, __add_fetch_flt): Likewise.
>  (__atomic_sub_fetchable, __sub_fetch_flt): Likewise.
>
> Signed-off-by: Matthew Malcomson 
> Co-authored-by: Jonathan Wakely 
>
> ---
>
>
> On 2/8/25 14:05, Jonathan Wakely wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Sat, 8 Feb 2025 at 10:55, Matthew Malcomson  
> > wrote:
> >>
> >> Hi Jonathan!
> >>
> >> Many thanks!  Will learn the libstdc++ style eventually.
> >>
> >> I've run bootstrap & regression test on this, and did the manual checks
> >> I mentioned before of compiling atomic_float/1.cc with clang and then
> >> adding `+ 1` on the builtin codepath to check that the clang binary
> >> aborts while the GCC built binary passes.
> >
> > Great, thanks for checking it. I'll get this pushed early next week.
> >
> >
> >>
> >> Thanks again!
> >> Matthew
> >>
> >> On 2/7/25 15:33, Jonathan Wakely wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On 05/02/25 13:43 +, Jonathan Wakely wrote:
>  On 28/10/24 17:15 +, mmalcom...@nvidia.com wrote:
> > From: Matthew Malcomson 
> >
> > I noticed that the libstdc++ patch is essentially separate and figured I
> > could se

[PATCH] tree-optimization/86270 - improve SSA coalescing for loop exit test

2025-02-12 Thread Richard Biener
The PR indicates a very specific issue with regard to SSA coalescing
failures because there's a pre-IV-increment loop exit test.  While
IVOPTs created the desired IL, we later simplify the exit test into
the undesirable form again.  The following fixes this up during RTL
expansion, where we try to improve coalescing of IVs.  That seems
easier than trying to avoid the simplification with some weird
heuristics (it could also have been written this way).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

PR tree-optimization/86270
* tree-outof-ssa.cc (insert_backedge_copies): Pattern
match a single conflict in a loop condition and adjust
that avoiding the conflict if possible.

* gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
copies as well.
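In scalar form, the rewrite replaces a pre-increment compare `iv != bound` (where `iv2 = iv + step` follows) with the post-increment compare `iv2 != bound + step`, so the exit decision is unchanged while the conflicting live range disappears. A model of that equivalence (my own sketch, assuming wrap-around unsigned arithmetic as in the NE_EXPR case the patch handles):

```cpp
#include <cassert>
#include <cstdint>

// Pre-increment exit test: uses iv itself, which conflicts with the
// backedge copy and defeats SSA coalescing.
static bool exit_pre (uint32_t iv, uint32_t step, uint32_t bound)
{
  return iv != bound;
}

// Post-increment form produced by the adjusted insert_backedge_copies:
// uses iv + step, comparing against a bound adjusted by the same step
// (computed once, before the loop).  For NE with wrapping arithmetic,
// iv + step != bound + step holds exactly when iv != bound.
static bool exit_post (uint32_t iv, uint32_t step, uint32_t bound)
{
  return iv + step != bound + step;
}
```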
---
 gcc/testsuite/gcc.target/i386/pr86270.c |  3 ++
 gcc/tree-outof-ssa.cc   | 49 ++---
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c b/gcc/testsuite/gcc.target/i386/pr86270.c
index 68562446fa4..89b9aeb317a 100644
--- a/gcc/testsuite/gcc.target/i386/pr86270.c
+++ b/gcc/testsuite/gcc.target/i386/pr86270.c
@@ -13,3 +13,6 @@ test ()
 
 /* Check we do not split the backedge but keep nice loop form.  */
 /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */
+/* Check we do not end up with reg-reg moves from a pre-increment IV
+   exit test.  */
+/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, %\?\[er\].x" } } */
diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
index d340d4ba529..f285c81599e 100644
--- a/gcc/tree-outof-ssa.cc
+++ b/gcc/tree-outof-ssa.cc
@@ -1259,10 +1259,9 @@ insert_backedge_copies (void)
  if (gimple_nop_p (def)
  || gimple_code (def) == GIMPLE_PHI)
continue;
- tree name = copy_ssa_name (result);
- gimple *stmt = gimple_build_assign (name, result);
  imm_use_iterator imm_iter;
  gimple *use_stmt;
+ auto_vec<use_operand_p> uses;
  /* The following matches trivially_conflicts_p.  */
  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result)
{
@@ -1273,11 +1272,51 @@ insert_backedge_copies (void)
{
  use_operand_p use;
  FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
-   SET_USE (use, name);
+   uses.safe_push (use);
}
}
- gimple_stmt_iterator gsi = gsi_for_stmt (def);
- gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ /* When there is just a conflicting statement try to
+adjust that to refer to the new definition.
+In particular for now handle a conflict with the
+use in a (exit) condition with a NE compare,
+replacing a pre-IV-increment compare with a
+post-IV-increment one.  */
+ if (uses.length () == 1
+ && is_a <gcond *> (USE_STMT (uses[0]))
+ && gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR
+ && is_gimple_assign (def)
+ && gimple_assign_rhs1 (def) == result
+ && (gimple_assign_rhs_code (def) == PLUS_EXPR
+ || gimple_assign_rhs_code (def) == MINUS_EXPR
+ || gimple_assign_rhs_code (def) == POINTER_PLUS_EXPR)
+ && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
+   {
+ gcond *cond = as_a <gcond *> (USE_STMT (uses[0]));
+ tree *adj;
+ if (gimple_cond_lhs (cond) == result)
+   adj = gimple_cond_rhs_ptr (cond);
+ else
+   adj = gimple_cond_lhs_ptr (cond);
+ tree name = copy_ssa_name (result);
+ gimple *stmt
+   = gimple_build_assign (name,
+  gimple_assign_rhs_code (def),
+  *adj, gimple_assign_rhs2 (def));
+ gimple_stmt_iterator gsi = gsi_for_stmt (cond);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ *adj = name;
+ SET_USE (uses[0], arg);
+ update_stmt (cond);
+   }
+ else
+   {
+ tree name = copy_ssa_name (result);
+ gimple *stmt = gimple_build_assign (name, result);
+ gimple_stmt_iterator gsi = gsi_for_stmt (def);
+ gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+ for (auto use : uses)
+   SET_USE (use, name);

[PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Uros Bizjak
The combine pass is trying to combine:

Trying 16, 22, 21 -> 23:
   16: r104:QI=flags:CCNO>0
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
  REG_UNUSED flags:CC
   21: r119:QI=flags:CCNO<=0
  REG_DEAD flags:CCNO
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
  REG_DEAD r120:QI
  REG_DEAD r119:QI
  REG_UNUSED flags:CC

and creates the following two insn sequence:

modifying insn i2    22: r104:QI=flags:CCNO>0
  REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i3    23: r110:QI=flags:CCNO<=0
  REG_DEAD flags:CC
deferring rescan insn with uid = 23.

where the REG_DEAD note in i2 is not correct, because the flags
register is still referenced in i3.  In try_combine() megafunction, we
have this part:

--cut here--
/* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
if (i3notes)
  distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
if (i2notes)
  distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
if (i1notes)
  distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
elim_i2, local_elim_i1, local_elim_i0);
if (i0notes)
  distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, local_elim_i0);
if (midnotes)
  distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
elim_i2, elim_i1, elim_i0);
--cut here--

where the compiler distributes REG_UNUSED note from i2:

   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
  REG_UNUSED flags:CC

via distribute_notes() using the following:

--cut here--
  /* Otherwise, if this register is now referenced in i2
 then the register used to be modified in one of the
 original insns.  If it was i3 (say, in an unused
 parallel), it's now completely gone, so the note can
 be discarded.  But if it was modified in i2, i1 or i0
 and we still reference it in i2, then we're
 referencing the previous value, and since the
 register was modified and REG_UNUSED, we know that
 the previous value is now dead.  So, if we only
 reference the register in i2, we change the note to
 REG_DEAD, to reflect the previous value.  However, if
 we're also setting or clobbering the register as
 scratch, we know (because the register was not
 referenced in i3) that it's unused, just as it was
 unused before, and we place the note in i2.  */
  if (from_insn != i3 && i2 && INSN_P (i2)
  && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
{
  if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
PUT_REG_NOTE_KIND (note, REG_DEAD);
  if (! (REG_P (XEXP (note, 0))
 ? find_regno_note (i2, REG_NOTE_KIND (note),
REGNO (XEXP (note, 0)))
 : find_reg_note (i2, REG_NOTE_KIND (note),
  XEXP (note, 0))))
place = i2;
}
--cut here--

However, the flags register is not UNUSED (or DEAD), because it is
used in i3.  The proposed solution is to remove the REG_UNUSED note
from i2 when the register is also mentioned in i3.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master and eventual backports?

Uros.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 3beeb514b81..0589ddbaca7 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -14557,9 +14557,12 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
 we're also setting or clobbering the register as
 scratch, we know (because the register was not
 referenced in i3) that it's unused, just as it was
-unused before, and we place the note in i2.  */
+unused before, and we place the note in i2.  If this
+register is still referenced in i3, the note can be
+discarded.  */
  if (from_insn != i3 && i2 && INSN_P (i2)
- && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
+ && reg_referenced_p (XEXP (note, 0), PATTERN (i2))
+ && !reg_referenced_p (XEXP (note, 0), PATTERN (i3)))
{
  if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
PUT_REG_NOTE_KIND (note, REG_DEAD);
diff --git a/gcc/testsuite/gcc.target/i386/pr118739.c b/gcc/testsuite/gcc.target/i386/pr118739.c
new file mode 100644
index 000..89bed546363
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr118739.c
@@ -0,0 +1,50 @@
+/* PR rtl-optimization/118739 */
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-tree-forwprop -fno-tree-vrp" } */
+
+volatile int a;
+int b, c, d = 1, e, f, g;
+
+int h (void)
+{
+  int i = 1;
+
+ j:
+  for (b = 1; b; b--)
+{
+  asm ("#");
+
+  g = 0;
+
+  for (; g <= 1; g++)
+   {
+ int k = f = 0

[PATCH] c++: Constrain visibility for CNTTPs with internal types [PR118849]

2025-02-12 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14?

-- >8 --

While looking into PR118846 I noticed that we don't currently constrain
the linkage of functions involving CNTTPs of internal-linkage types.  It
seems to me that this would be sensible to do.

PR c++/118849

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r): Constrain visibility according to
the type of decl_constant_var_p decls.

gcc/testsuite/ChangeLog:

* g++.dg/template/linkage6.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl2.cc  |  4 +++-
 gcc/testsuite/g++.dg/template/linkage6.C | 12 
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/linkage6.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 4415cea93e0..9a76e00dcde 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2820,7 +2820,9 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
/* The ODR allows definitions in different TUs to refer to distinct
   constant variables with internal or no linkage, so such a reference
   shouldn't affect visibility (PR110323).  FIXME but only if the
-  lvalue-rvalue conversion is applied.  */;
+  lvalue-rvalue conversion is applied.  We still want to restrict
+  visibility according to the type of the declaration however.  */
+   tpvis = type_visibility (TREE_TYPE (t));
   else if (! TREE_PUBLIC (t))
tpvis = VISIBILITY_ANON;
   else
diff --git a/gcc/testsuite/g++.dg/template/linkage6.C b/gcc/testsuite/g++.dg/template/linkage6.C
new file mode 100644
index 000..fb589f67874
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/linkage6.C
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++20 } }
+// { dg-final { scan-assembler-not "(weak|glob)\[^\n\]*_Z" { xfail powerpc-*-aix* } } }
+
+namespace {
+  struct A {};
+}
+
+template <A a> void f() { }
+
+int main() {
+  f<A{}>();
+}
-- 
2.47.0



Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 1:14 PM Uros Bizjak  wrote:
>
> The combine pass is trying to combine:
>
> Trying 16, 22, 21 -> 23:
>16: r104:QI=flags:CCNO>0
>22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
>   REG_UNUSED flags:CC
>21: r119:QI=flags:CCNO<=0
>   REG_DEAD flags:CCNO
>23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
>   REG_DEAD r120:QI
>   REG_DEAD r119:QI
>   REG_UNUSED flags:CC
>
> and creates the following two insn sequence:
>
> modifying insn i2    22: r104:QI=flags:CCNO>0
>   REG_DEAD flags:CC
> deferring rescan insn with uid = 22.
> modifying insn i3    23: r110:QI=flags:CCNO<=0
>   REG_DEAD flags:CC
> deferring rescan insn with uid = 23.
>
> where the REG_DEAD note in i2 is not correct, because the flags
> register is still referenced in i3.  In try_combine() megafunction, we
> have this part:
>
> --cut here--
> /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
> if (i3notes)
>   distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, elim_i0);
> if (i2notes)
>   distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, elim_i0);
> if (i1notes)
>   distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
> elim_i2, local_elim_i1, local_elim_i0);
> if (i0notes)
>   distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, local_elim_i0);
> if (midnotes)
>   distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, elim_i0);
> --cut here--
>
> where the compiler distributes REG_UNUSED note from i2:
>
>22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
>   REG_UNUSED flags:CC
>
> via distribute_notes() using the following:
>
> --cut here--
>   /* Otherwise, if this register is now referenced in i2
>  then the register used to be modified in one of the
>  original insns.  If it was i3 (say, in an unused
>  parallel), it's now completely gone, so the note can
>  be discarded.  But if it was modified in i2, i1 or i0
>  and we still reference it in i2, then we're
>  referencing the previous value, and since the
>  register was modified and REG_UNUSED, we know that
>  the previous value is now dead.  So, if we only
>  reference the register in i2, we change the note to
>  REG_DEAD, to reflect the previous value.  However, if
>  we're also setting or clobbering the register as
>  scratch, we know (because the register was not
>  referenced in i3) that it's unused, just as it was
>  unused before, and we place the note in i2.  */
>   if (from_insn != i3 && i2 && INSN_P (i2)
>   && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
> {
>   if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
> PUT_REG_NOTE_KIND (note, REG_DEAD);
>   if (! (REG_P (XEXP (note, 0))
>  ? find_regno_note (i2, REG_NOTE_KIND (note),
> REGNO (XEXP (note, 0)))
>  : find_reg_note (i2, REG_NOTE_KIND (note),
>   XEXP (note, 0))))
> place = i2;
> }
> --cut here--
>
> However, the flags register is not UNUSED (or DEAD), because it is
> used in i3.  The proposed solution is to remove the REG_UNUSED note
> from i2 when the register is also mentioned in i3.

Oops, forgot to include ChangeLog entry:

--cut here--
PR rtl-optimization/118739

gcc/ChangeLog:

* combine.cc (distribute_notes) : Remove
REG_UNUSED note from i2 when the register is also mentioned in i3.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr118739.c: New test.
--cut here--

>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for master and eventual backports?
>
> Uros.


Re: [PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Xi Ruoyao
On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote:

/* snip */

> diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
> new file mode 100644
> index 000..a682ae4a356
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-options "-march=loongarch64" } */
> +
> +#include 
> +#include 
> +#include 
> +
> +#ifndef __loongarch_arch
> +#error __loongarch_arch should not be available here
> +#endif

Hmm this seems not correct?  __loongarch_arch should be just
"loongarch64" here (at least it is "loongarch64" with GCC <= 14).

And a dg-do test with explicit -march= in dg-options is problematic
because it'll fail on less-capable CPUs (in this case, after we add
LA32).  We can change this to something like:

/* { dg-do compile } */
/* { dg-options "-O2 -march=loongarch64" } */
/* { dg-final { scan-assembler "t1: loongarch64" } } */
/* { dg-final { scan-assembler "t2: la64v1.1" } } */
/* { dg-final { scan-assembler "t3: loongarch64" } } */

void
t1 (void)
{
  asm volatile ("# t1: " __loongarch_arch);
}

#pragma GCC push_options
#pragma GCC target("arch=la64v1.1")

void
t2 (void)
{
  asm volatile ("# t2: " __loongarch_arch);
}

#pragma GCC pop_options

void
t3 (void)
{
  asm volatile ("# t3: " __loongarch_arch);
}

... ...

/* snip */

> diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-4.c b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
> new file mode 100644
> index 000..3b3a7c6078c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
> @@ -0,0 +1,55 @@
> +/* { dg-do run } */
> +/* { dg-options "-mtune=la464" } */
> +
> +#include 
> +#include 
> +#include 
> +
> +#ifndef __loongarch_tune
> +#error __loongarch_tune should not be available here

Likewise.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 11:06 AM H.J. Lu  wrote:
>
> On Wed, Feb 12, 2025 at 5:28 PM Uros Bizjak  wrote:
> >
> > On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu  wrote:
> > >
> > > Don't assume that stack slots can only be accessed by stack or frame
> > > registers.  We first find all registers defined by stack or frame
> > > registers.  Then check memory accesses by such registers, including
> > > stack and frame registers.
> > >
> > > gcc/
> > >
> > > PR target/109780
> > > PR target/109093
> > > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > > (ix86_find_all_reg_use): Likewise.
> > > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > > from registers defined by stack or frame registers.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/109780
> > > PR target/109093
> > > * g++.target/i386/pr109780-1.C: New test.
> > > * gcc.target/i386/pr109093-1.c: Likewise.
> > > * gcc.target/i386/pr109780-1.c: Likewise.
> > > * gcc.target/i386/pr109780-2.c: Likewise.
> >
> > > +/* Find all registers defined with REG.  */
> > > +
> > > +static void
> > > +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> > > +   unsigned int reg, auto_bitmap &worklist)
> > > +{
> > > +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> > > +   ref != NULL;
> > > +   ref = DF_REF_NEXT_REG (ref))
> > > +{
> > > +  if (DF_REF_IS_ARTIFICIAL (ref))
> > > +continue;
> > > +
> > > +  rtx_insn *insn = DF_REF_INSN (ref);
> > > +  if (!NONDEBUG_INSN_P (insn))
> > > +continue;
> > > +
> > > +  rtx set = single_set (insn);
> > > +  if (!set)
> > > +continue;
> > > +
> >
> > Isn't the above condition a bit too limiting? We can have insn with
> > multiple sets in the chain.
> >
> > The issue at hand is the correctness issue (the program will segfault
> > if registers are not tracked correctly), not some missing
> > optimization. I'd suggest to stay on the safe side and also process
> > PARALLELs. Something similar to e.g. store_data_bypass_p from
> > recog.cc:
> >
> > --cut here--
> >   rtx set = single_set (insn);
> >   if (set)
> > ix86_find_all_reg_use_1(...);
> >
> >   rtx pat = PATTERN (insn);
> >   if (GET_CODE (pat) != PARALLEL)
> > return false;
> >
> >   for (int i = 0; i < XVECLEN (pat, 0); i++)
> > {
> >   rtx exp = XVECEXP (pat, 0, i);
> >
> >   if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
> > continue;
> >
> >   gcc_assert (GET_CODE (exp) == SET);
> >
> >   ix86_find_all_reg_use_1(...);
> > }
> > --cut here--
> >
> > The above will make ix86_find_all_reg_use significantly more robust.
> >
> > Uros.
>
> Like this?

Yes.

Thanks,
Uros.

>
> /* Helper function for ix86_find_all_reg_use.  */
>
> static void
> ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
>  auto_bitmap &worklist)
> {
>   rtx src = SET_SRC (set);
>   if (MEM_P (src))
> return;
>
>   rtx dest = SET_DEST (set);
>   if (!REG_P (dest))
> return;
>
>   if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
> return;
>
>   /* Add this register to stack_slot_access.  */
>   add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
>   bitmap_set_bit (worklist, REGNO (dest));
> }
>
> /* Find all registers defined with REG.  */
>
> static void
> ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
>unsigned int reg, auto_bitmap &worklist)
> {
>   for (df_ref ref = DF_REG_USE_CHAIN (reg);
>ref != NULL;
>ref = DF_REF_NEXT_REG (ref))
> {
>   if (DF_REF_IS_ARTIFICIAL (ref))
> continue;
>
>   rtx_insn *insn = DF_REF_INSN (ref);
>   if (!NONDEBUG_INSN_P (insn))
> continue;
>
>   rtx set = single_set (insn);
>   if (set)
> ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);
>
>   rtx pat = PATTERN (insn);
>   if (GET_CODE (pat) != PARALLEL)
> continue;
>
>   for (int i = 0; i < XVECLEN (pat, 0); i++)
> {
>   rtx exp = XVECEXP (pat, 0, i);
>
>   if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
> continue;
>
>   gcc_assert (GET_CODE (exp) == SET);
>
>   ix86_find_all_reg_use_1 (exp, stack_slot_access, worklist);
> }
> }
> }
>
>
> --
> H.J.


Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103]

2025-02-12 Thread Richard Sandiford
Jeff Law  writes:
> On 2/11/25 3:17 PM, Richard Sandiford wrote:
>> Jeff Law  writes:
>>> On 2/11/25 9:08 AM, Richard Sandiford wrote:
 Jeff Law  writes:
> On 2/7/25 5:59 AM, Andrew Waterman wrote:
>> This patch runs counter to the ABI spec, which states that vxrm is not
>> preserved across calls and is volatile upon function entry [1].  vxrm
>> does not play the same role as frm plays in the calling convention.
>> (I won't get into the rationale in this email, but the rationale isn't
>> especially important: we should follow the ABI.)
>>
>> [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/3a79e936eec5491078b1133ac943f91ef5fd75fd/riscv-cc.adoc?plain=1#L119-L120
> Pan's patch doesn't change the basic property that VXRM has no known
> state at function entry or upon return from a function call.

 I think it will.  global_regs[X] means that X is defined on entry,
 defined on exit, and can be changed by calls.  If the register is
 call-clobbered/volatile/caller-saved, then I agree with Andrew that
 this doesn't look like the right fix.
>>> But the LCM code we use to manage vxrm assignments makes no assumption
>>> about incoming state and assumes no state is preserved across calls.
>> 
>> In that case, I wonder what the patch is fixing.  Like you say,
>> the initial mode seems to be VXRM_MODE_NONE, and it looks like
>> riscv_vxrm_mode_after correctly models calls as clobbering the mode.
> Just realized I didn't answer this part of your message.  It's not 
> really fixing any known issue.  Just felt like the right thing to do as 
> VXRM is roughly similar to (but clearly not 100% the same as) FRM.

But it sounds from the discussion like one of the differences between
FRM and VXRM is also the key difference between marking something as a
global register and marking it as a call-clobbered register.

Rounding modes and exception modes are usually global, because that's
necessary for things like fesetround and fesetexceptflag to work properly.
Like you said in your other reply, there are restrictions about what
the rounding mode can be on entry to certain functions, but that's more
of an API precondition.

It sounds like the ABI defines FRM to be such a global register but that
it defines VXRM (which isn't bound to C library restrictions) to be a
call-clobbered register.

If we want to set a call-clobbered fixed register to a specific local
value, between calls to foo and bar, the sequence would be:

call foo
FIXED_REG := ...
...use FIXED_REG...
call bar

It sounds like this is the correct sequence for VXRM and that it's what
the port generates.  If, after introducing the FIXED_REG assignment,
we later delete the uses as dead, the FIXED_REG assignment will also
become dead, since its value is clobbered by the call to bar.

If instead we want to set a global register to a specific local value,
the sequence would be:

call foo
TMP := FIXED_REG
FIXED_REG := ...
...use FIXED_REG...
FIXED_REG := TMP
call bar

It sounds like this is the correct sequence for FRM and it seemed to be
what the port was generating in the PR.  If, after introducing the FIXED_REG
assignment, we later delete the uses as dead, the first FIXED_REG assignment
will become dead due to the later FIXED_REG := TMP.  Then the TMP := FIXED_REG
and FIXED_REG := TMP collapse into a no-op.

But if we pretend that a call-clobbered register is a global register,
we'd still generate the first sequence above:

call foo
FIXED_REG := ...
...use FIXED_REG...
call bar

but the dataflow would not be as accurate.  If we later delete the use
of the fixed register as dead, the assignment would still be kept live
by its assumed use in bar.  (Or, if there is no later call, by its
assumed use in the caller.)

Obviously it's not the port I work on, or my call, but if the patch
isn't fixing a known issue then I wonder if it should be reverted.
The justification in the commit message -- that VXRM is a cooperatively-
managed global register -- seems from what Andrew said to be inaccurate.
So it seems like the only effect of the patch is to make the dataflow
less correct than it was before.

But like I say, I realise I'm sticking my oar in here.

Thanks,
Richard


[PATCH] RISC-V: Avoid more unsplit insns in const expander [PR118832].

2025-02-12 Thread Robin Dapp
Hi,

in PR118832 we have another instance of the problem already noticed in
PR117878.  We sometimes use e.g. expand_simple_binop for vector
operations like shift or and.  While this is usually OK, it causes
problems when doing it late, e.g. during LRA.

In particular, we might rematerialize a const_vector during LRA, which
then leaves an insn laying around that cannot be split any more if it
requires a pseudo.  Therefore we should only use the split variants
in expand_const_vector.

This patch fixed the issue in the PR and also pre-emptively rewrites two
other spots that might be prone to the same issue.

Regtested on rv64gcv_zvl512b.  As the two other cases don't have a test
(so might not even trigger) I unconditionally enabled them for my testsuite
run.

Regards
 Robin

PR target/118832

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector):  Expand as
vlmax insn during lra.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118832.c: New test.
---
 gcc/config/riscv/riscv-v.cc   | 46 +++
 .../gcc.target/riscv/rvv/autovec/pr118832.c   | 13 ++
 2 files changed, 51 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118832.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9847439ca77..3e86b12bb40 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1265,7 +1265,16 @@ expand_const_vector (rtx target, rtx src)
   element. Use element width = 64 and broadcast a vector with
   all element equal to 0x0706050403020100.  */
  rtx ele = builder.get_merged_repeating_sequence ();
- rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
+ rtx dup;
+ if (lra_in_progress)
+   {
+ dup = gen_reg_rtx (builder.new_mode ());
+ rtx ops[] = {dup, ele};
+ emit_vlmax_insn (code_for_pred_broadcast
+  (builder.new_mode ()), UNARY_OP, ops);
+   }
+ else
+   dup = expand_vector_broadcast (builder.new_mode (), ele);
  emit_move_insn (result, gen_lowpart (mode, dup));
}
   else
@@ -1523,10 +1532,20 @@ expand_const_vector (rtx target, rtx src)
  base2 = gen_int_mode (rtx_to_poly_int64 (base2), new_smode);
  expand_vec_series (tmp2, base2,
 gen_int_mode (step2, new_smode));
- rtx shifted_tmp2 = expand_simple_binop (
-   new_mode, ASHIFT, tmp2,
-   gen_int_mode (builder.inner_bits_size (), Pmode), NULL_RTX,
-   false, OPTAB_DIRECT);
+ rtx shifted_tmp2;
+ rtx shift = gen_int_mode (builder.inner_bits_size (), Xmode);
+ if (lra_in_progress)
+   {
+ shifted_tmp2 = gen_reg_rtx (new_mode);
+ rtx shift_ops[] = {shifted_tmp2, tmp2, shift};
+ emit_vlmax_insn (code_for_pred_scalar
+  (ASHIFT, new_mode), BINARY_OP,
+  shift_ops);
+   }
+ else
+   shifted_tmp2 = expand_simple_binop (new_mode, ASHIFT, tmp2,
+   shift, NULL_RTX, false,
+   OPTAB_DIRECT);
  rtx tmp3 = gen_reg_rtx (new_mode);
  rtx ior_ops[] = {tmp3, tmp1, shifted_tmp2};
  emit_vlmax_insn (code_for_pred (IOR, new_mode), BINARY_OP,
@@ -1539,9 +1558,20 @@ expand_const_vector (rtx target, rtx src)
  rtx vid = gen_reg_rtx (mode);
  expand_vec_series (vid, const0_rtx, const1_rtx);
  /* Transform into { 0, 0, 1, 1, 2, 2, ... }.  */
- rtx shifted_vid
-   = expand_simple_binop (mode, LSHIFTRT, vid, const1_rtx,
-  NULL_RTX, false, OPTAB_DIRECT);
+ rtx shifted_vid;
+ if (lra_in_progress)
+   {
+ shifted_vid = gen_reg_rtx (mode);
+ rtx shift = gen_int_mode (1, Xmode);
+ rtx shift_ops[] = {shifted_vid, vid, shift};
+ emit_vlmax_insn (code_for_pred_scalar
+  (ASHIFT, mode), BINARY_OP,
+  shift_ops);
+   }
+ else
+   shifted_vid = expand_simple_binop (mode, LSHIFTRT, vid,
+  const1_rtx, NULL_RTX,
+  false, OPTAB_DIRECT);
  rtx tmp1 = gen_reg_rtx (mode);
  rtx tmp2 = gen_reg_rtx (mode);
  expand_vec_series (tmp1, base1,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118832.c 
b/gcc/testsuite/gcc

[PATCH] tree-optimization/118817 - fix ICE with VN CTOR simplification

2025-02-12 Thread Richard Biener
The representation of CONSTRUCTOR nodes in VN NARY and gimple_match_op
do not agree so do not attempt to marshal between them.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/118817
* tree-ssa-sccvn.cc (vn_nary_simplify): Do not process
CONSTRUCTOR NARY or update from CONSTRUCTOR simplified
gimple_match_op.

* gcc.dg/pr118817.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr118817.c | 14 ++
 gcc/tree-ssa-sccvn.cc   |  9 +++--
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr118817.c

diff --git a/gcc/testsuite/gcc.dg/pr118817.c b/gcc/testsuite/gcc.dg/pr118817.c
new file mode 100644
index 000..6cfb424dbf4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr118817.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4si __attribute__((vector_size (sizeof(int) * 4)));
+
+v4si x;
+
+void foo (int flag)
+{
+  v4si tem = (v4si) { 0, 0, 0, 0 };
+  if (flag)
+tem = (v4si) { flag };
+  x = __builtin_shufflevector (tem, tem, 0, 0, 0, 0);
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 06f6b0ccd72..8bb45780a98 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -2604,13 +2604,18 @@ vn_nary_build_or_lookup (gimple_match_op *res_op)
 tree
 vn_nary_simplify (vn_nary_op_t nary)
 {
-  if (nary->length > gimple_match_op::MAX_NUM_OPS)
+  if (nary->length > gimple_match_op::MAX_NUM_OPS
+  /* For CONSTRUCTOR the vn_nary_op_t and gimple_match_op representation
+does not match.  */
+  || nary->opcode == CONSTRUCTOR)
 return NULL_TREE;
   gimple_match_op op (gimple_match_cond::UNCOND, nary->opcode,
  nary->type, nary->length);
   memcpy (op.ops, nary->op, sizeof (tree) * nary->length);
   tree res = vn_nary_build_or_lookup_1 (&op, false, true);
-  if (op.code.is_tree_code () && op.num_ops <= nary->length)
+  if (op.code.is_tree_code ()
+  && op.num_ops <= nary->length
+  && (tree_code) op.code != CONSTRUCTOR)
 {
   nary->opcode = (tree_code) op.code;
   nary->length = op.num_ops;
-- 
2.43.0


Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Jeff Law




On 2/12/25 7:17 AM, Andrew MacLeod wrote:
> The patch is mostly fine, although you probably want to change the
> condition to check for a non-null stmt as well... ie
Thanks for the reminder.  I saw that defaulting and made a mental note 
to test that we actually had a statement, then promptly forgot about it. 
 Spinning it now...




> I defer to the release managers about whether it goes in trunk now or
> stage 1 :-)
I think this easily meets the current stage's criteria.  So if you're 
comfortable with the technical bits (with the adjustment noted above), 
then I'll take the responsibility for "in or out of this release" decision.


Jeff


Re: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-12 Thread Richard Biener
On Tue, 11 Feb 2025, Tamar Christina wrote:

> Hi All,
> 
> This fixes two PRs on Early break vectorization by delaying the safety checks to
> vectorizable_load when the VF, VMAT and vectype are all known.
> 
> This patch does add two new restrictions:
> 
> 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
>group sizes, as they are unaligned every n % 2 iterations and so may cross
>a page unwittingly.
> 
> 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
>we cannot peel for alignment, as the alignment requirement is quite large at
>GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
>don't support it for now.
> 
> There are other steps documented inside the code itself so that the reasoning
> is next to the code.
> 
> Note that for VLA I have still left this fully disabled when not working on a
> fixed buffer.
> 
> For VLA targets like SVE return element alignment as the desired vector
> alignment.  This means that the loads are never misaligned and so, annoyingly,
> it won't ever need to peel.
> 
> So what I think needs to happen in GCC 16 is that.
> 
> 1. during vect_compute_data_ref_alignment we need to take the max of
>POLY_VALUE_MIN and vector_alignment.
> 
> 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
>check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a
>proxy for pagesize.
> 
> 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
>vect_determine_partial_vectors_and_peeling since the first iteration has to
>be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
>vectorize.
> 
> 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
>becomes true and we generate the peeled check through loop control for
>partial loops.  From what I can tell this won't work for
>LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
>all in the compiler.  That would need to be done independently from the
>above.

We basically need to implement peeling/versioning for alignment based
on the actual POLY value with the fallback being first-fault loads.

> In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
>   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
>   checks.
>   (vect_compute_data_ref_alignment): Remove alignment checks and move to
>   get_load_store_type, increase group access alignment.
>   (vect_enhance_data_refs_alignment): Add note to comment needing
>   investigating.
>   (vect_analyze_data_refs_alignment): Likewise.
>   (vect_supportable_dr_alignment): For group loads look at first DR.
>   * tree-vect-stmts.cc (get_load_store_type):
>   Perform safety checks for early break pfa.
>   * tree-vectorizer.h (dr_peeling_alignment,
>   dr_set_peeling_alignment): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/118464
>   PR tree-optimization/116855
>   * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
>   load type is relaxed later.
>   * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
>   * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
>   * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
>   * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
>   * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
>   * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
>   * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
>   * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
>   * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
>   * gcc.dg/tree-ssa/ivopt_mult_2g.c: Likewise.
>   * gcc.dg/tree-ssa/ivopts-5.c: Likewise.
>   * gcc.dg/tree-ssa/ivopts-6.c: Likewise.
>   * gcc.dg/tree-ssa/ivopts-7.c: Likewise.
>   * gcc.dg/tree-ssa/ivopts-8.c: Likewise.
>   * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
>   * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
>

[PATCH] tree-optimization/90579 - avoid STLF fail by better optimizing

2025-02-12 Thread Richard Biener
For the testcase in question which uses a fold-left vectorized
reduction of a reverse iterating loop we'd need two forwprop
invocations to first bypass the permute emitted for the reverse
iterating loop and then to decompose the vector load that only
feeds element extracts.  The following moves the first transform
to a match.pd pattern and makes sure we fold the element extracts
when the vectorizer emits them so the single forwprop pass can
then pick up the vector load decomposition, avoiding the store-to-load
forwarding failure that it otherwise causes.

Moving simplify_bitfield_ref also makes forwprop remove the dead
VEC_PERM_EXPR via the simple-dce it uses - this was also
previously missing.
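As a rough source-level illustration (a hypothetical sketch, not part of the patch): the new pattern lets an element extract from a VEC_PERM_EXPR be folded into an extract from the corresponding input vector, which is what the reverse permute here reduces to.

```c
#include <assert.h>

typedef double v4df __attribute__ ((vector_size (32)));
typedef long long v4di __attribute__ ((vector_size (32)));

/* The reversing shuffle becomes a VEC_PERM_EXPR in GIMPLE; extracting
   lane 1 of the reversed vector is the same as extracting lane 2 of
   the input, so the BIT_FIELD_REF can bypass the permute.  */
static double
extract_after_reverse (v4df a)
{
  v4di mask = { 3, 2, 1, 0 };
  v4df rev = __builtin_shuffle (a, mask);
  return rev[1];   /* equivalent to a[2] */
}
```

With optimization enabled, folding the extract through the permute lets the dead VEC_PERM_EXPR be removed as described above.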

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/90579
* tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to
match.pd.
(pass_forwprop::execute): Adjust.
* match.pd (bit_field_ref (vec_perm ...)): New pattern
modeled after simplify_bitfield_ref.
* tree-vect-loop.cc (vect_expand_fold_left): Fold the
element extract stmt, combining it with the vector def.

* gcc.target/i386/pr90579.c: New testcase.
---
 gcc/match.pd|  56 +
 gcc/testsuite/gcc.target/i386/pr90579.c |  23 ++
 gcc/tree-ssa-forwprop.cc| 103 +---
 gcc/tree-vect-loop.cc   |   5 ++
 4 files changed, 85 insertions(+), 102 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr90579.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 20b2aec6f37..ea44201f2eb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9538,6 +9538,62 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / const_k)->value; }
   @1 { bitsize_int ((idx % const_k) * width); })
 
+(simplify
+ (BIT_FIELD_REF (vec_perm@0 @1 @2 VECTOR_CST@3) @rsize @rpos)
+ (with
+  {
+tree elem_type = TREE_TYPE (TREE_TYPE (@0));
+poly_uint64 elem_size = tree_to_poly_uint64 (TYPE_SIZE (elem_type));
+poly_uint64 size = tree_to_poly_uint64 (TYPE_SIZE (type));
+unsigned HOST_WIDE_INT nelts, idx;
+unsigned HOST_WIDE_INT nelts_op = 0;
+  }
+  (if (constant_multiple_p (tree_to_poly_uint64 (@rpos), elem_size, &idx)
+   && VECTOR_CST_NELTS (@3).is_constant (&nelts)
+   && (known_eq (size, elem_size)
+  || (constant_multiple_p (size, elem_size, &nelts_op)
+	   && pow2p_hwi (nelts_op))))
+   (with
+{
+  bool ok = true;
+  /* One element.  */
+  if (known_eq (size, elem_size))
+idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (@3, idx)) % (2 * nelts);
+  else
+{
+ /* Clamp vec_perm_expr index.  */
+ unsigned start
+   = TREE_INT_CST_LOW (vector_cst_elt (@3, idx)) % (2 * nelts);
+ unsigned end
+   = (TREE_INT_CST_LOW (vector_cst_elt (@3, idx + nelts_op - 1))
+  % (2 * nelts));
+ /* Be in the same vector.  */
+ if ((start < nelts) != (end < nelts))
+   ok = false;
+ else
+   for (unsigned HOST_WIDE_INT i = 1; i != nelts_op; i++)
+ {
+   /* Continuous area.  */
+   if ((TREE_INT_CST_LOW (vector_cst_elt (@3, idx + i))
+% (2 * nelts) - 1)
+   != (TREE_INT_CST_LOW (vector_cst_elt (@3, idx + i - 1))
+   % (2 * nelts)))
+ {
+   ok = false;
+   break;
+ }
+ }
+ /* Alignment not worse than before.  */
+ if (start % nelts_op)
+   ok = false;
+ idx = start;
+   }
+}
+(if (ok)
+ (if (idx < nelts)
+  (BIT_FIELD_REF @1 @rsize { bitsize_int (idx * elem_size); })
+  (BIT_FIELD_REF @2 @rsize { bitsize_int ((idx - nelts) * elem_size); })))))))
+
 /* Simplify a bit extraction from a bit insertion for the cases with
the inserted element fully covering the extraction or the insertion
not touching the extraction.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr90579.c b/gcc/testsuite/gcc.target/i386/pr90579.c
new file mode 100644
index 000..ab48a44063c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr90579.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mfpmath=sse" } */
+
+extern double r[6];
+extern double a[];
+
+double
+loop (int k, double x)
+{
+  int i;
+  double t=0;
+  for (i=0;i<6;i++)
+    r[i] = x * a[i + k];
+  for (i=0;i<6;i++)
+    t+=r[5-i];
+  return t;
+}
+
+/* Verify we end up with scalar loads from r for the final sum.  */
+/* { dg-final { scan-assembler "vaddsd\tr\\\+40" } } */
+/* { dg-final { scan-assembler "vaddsd\tr\\\+32" } } */
+/* { dg-final { scan-assembler "vaddsd\tr\\\+24" } } */
+/* { dg-final { scan-assembler "vaddsd\tr\\\+16" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 9474682152a..fafc4d6b77a 100

Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Andrew MacLeod
The patch is mostly fine, although you probably want to change the
condition to check for a non-null stmt as well, i.e.


-   if (subcode == MINUS_EXPR)
+  if (s && subcode == MINUS_EXPR)

because it looks like the stmt defaults to NULL and I suspect the 
relation query will trap if S is null.


I defer to the release managers about whether it goes in trunk now or 
stage 1 :-)


Andrew

On 2/11/25 19:20, Jeff Law wrote:
So this is a fairly old regression, but with all the ranger work 
that's been done, it's become easy to resolve.


The basic idea here is to use known relationships between two operands 
of a SUB_OVERFLOW IFN to statically compute the overflow state and 
ultimately allow turning the IFN into simple arithmetic (or for the 
tests in this BZ elide the arithmetic entirely).


The regression example is when the two inputs are known equal.  In
that case the subtraction will never overflow.  But there's a few
other cases we can handle as well.


a == b -> never overflows
a > b  -> never overflows when A and B are unsigned
a >= b -> never overflows when A and B are unsigned
a < b  -> always overflows when A and B are unsigned
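These relations can be checked directly for the simple case where both operands and the result share the same unsigned type (a hypothetical standalone illustration, not part of the patch; mixed operand/result types need more care):

```c
#include <assert.h>

/* Returns 1 iff a - b overflows when computed in 'unsigned'.  With
   same-type unsigned operands: a == b and a >= b never overflow,
   while a < b always does.  */
static int
sub_overflows (unsigned a, unsigned b)
{
  unsigned res;
  return __builtin_sub_overflow (a, b, &res);
}
```

When ranger proves one of these relations at a call site, the IFN's overflow flag becomes a compile-time constant and the arithmetic can be simplified or elided.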

Bootstrapped and regression tested on x86, and regression tested on 
the usual cross platforms.


OK for the trunk?

Jeff




Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Richard Sandiford
Uros Bizjak  writes:
> The combine pass is trying to combine:
>
> Trying 16, 22, 21 -> 23:
>16: r104:QI=flags:CCNO>0
>22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
>   REG_UNUSED flags:CC
>21: r119:QI=flags:CCNO<=0
>   REG_DEAD flags:CCNO

It looks like something has already gone wrong in this sequence,
in that insn 21 is using the flags after the register has been clobbered.
If the flags result of insn 22 is useful, the insn should be setting the
flags using a parallel of two sets.
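For illustration, the flag-preserving form would look roughly like this (a hypothetical RTL sketch in the usual i386 style, not taken from the dump):

```
(parallel [(set (reg:CCNO flags)
                (compare:CCNO (xor:QI (reg:QI r104) (const_int 1))
                              (const_int 0)))
           (set (reg:QI r120)
                (xor:QI (reg:QI r104) (const_int 1)))])
```

i.e. the flags output is an explicit set rather than a clobber, so a later consumer of the flags is visible to combine.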

Thanks,
Richard

>23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
>   REG_DEAD r120:QI
>   REG_DEAD r119:QI
>   REG_UNUSED flags:CC
>
> and creates the following two insn sequence:
>
> modifying insn i222: r104:QI=flags:CCNO>0
>   REG_DEAD flags:CC
> deferring rescan insn with uid = 22.
> modifying insn i323: r110:QI=flags:CCNO<=0
>   REG_DEAD flags:CC
> deferring rescan insn with uid = 23.
>
> where the REG_DEAD note in i2 is not correct, because the flags
> register is still referenced in i3.  In try_combine() megafunction, we
> have this part:
>
> --cut here--
> /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
> if (i3notes)
>   distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, elim_i0);
> if (i2notes)
>   distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, elim_i0);
> if (i1notes)
>   distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
> elim_i2, local_elim_i1, local_elim_i0);
> if (i0notes)
>   distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, local_elim_i0);
> if (midnotes)
>   distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
> elim_i2, elim_i1, elim_i0);
> --cut here--
>
> where the compiler distributes REG_UNUSED note from i2:
>
>22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
>   REG_UNUSED flags:CC
>
> via distribute_notes() using the following:
>
> --cut here--
>   /* Otherwise, if this register is now referenced in i2
>  then the register used to be modified in one of the
>  original insns.  If it was i3 (say, in an unused
>  parallel), it's now completely gone, so the note can
>  be discarded.  But if it was modified in i2, i1 or i0
>  and we still reference it in i2, then we're
>  referencing the previous value, and since the
>  register was modified and REG_UNUSED, we know that
>  the previous value is now dead.  So, if we only
>  reference the register in i2, we change the note to
>  REG_DEAD, to reflect the previous value.  However, if
>  we're also setting or clobbering the register as
>  scratch, we know (because the register was not
>  referenced in i3) that it's unused, just as it was
>  unused before, and we place the note in i2.  */
>   if (from_insn != i3 && i2 && INSN_P (i2)
>   && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
> {
>   if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
> PUT_REG_NOTE_KIND (note, REG_DEAD);
>   if (! (REG_P (XEXP (note, 0))
>  ? find_regno_note (i2, REG_NOTE_KIND (note),
> REGNO (XEXP (note, 0)))
>  : find_reg_note (i2, REG_NOTE_KIND (note),
>   XEXP (note, 0
> place = i2;
> }
> --cut here--
>
> However, the flags register is not UNUSED (or DEAD), because it is
> used in i3.  The proposed solution is to remove the REG_UNUSED note
> from i2 when the register is also mentioned in i3.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> OK for master and eventual backports?
>
> Uros.


Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Jakub Jelinek
On Tue, Feb 11, 2025 at 05:20:49PM -0700, Jeff Law wrote:
> So this is a fairly old regression, but with all the ranger work that's been
> done, it's become easy to resolve.
> 
> The basic idea here is to use known relationships between two operands of a
> SUB_OVERFLOW IFN to statically compute the overflow state and ultimately
> allow turning the IFN into simple arithmetic (or for the tests in this BZ
> elide the arithmetic entirely).
> 
> The regression example is when the two inputs are known equal.  In that case
> the subtraction will never overflow.  But there's a few other cases we can
> handle as well.
> 
> a == b -> never overflows
> a > b  -> never overflows when A and B are unsigned
> a >= b -> never overflows when A and B are unsigned
> a < b  -> always overflows when A and B are unsigned

Is that really the case?
I mean, .SUB_OVERFLOW etc. can have 3 arbitrary types, the type into which
we are trying to write the result and the 2 types of arguments.
Consider:

int
foo (unsigned x, unsigned y)
{
  return __builtin_sub_overflow_p (x, y, (signed char) 0);
}

int
bar (unsigned int x, unsigned long long y)
{
  return __builtin_sub_overflow_p (x, y, (_BitInt(33)) 0);
}

int
main ()
{
  __builtin_printf ("%d\n", foo (16, 16));
  __builtin_printf ("%d\n", foo (65536, 65536));
  __builtin_printf ("%d\n", foo (65536, 16));
  __builtin_printf ("%d\n", bar (0, ~0U));
  __builtin_printf ("%d\n", bar (0, ~0ULL));
}

The a == b case is probably ok, although unsure if the relation query
won't be upset if op0 and op1 have different types (say signed long long vs.
unsigned int), given that result in infinite precision should be 0 and
that will fit into any type.
But the a > b and a >= b cases clearly can overflow if the result type
(element type of the COMPLEX_TYPE result of the ifn) can't represent all the
values of op0's type, so if say __builtin_add_overflow_p with second
argument 0 would also overflow.
And the a < b case can also overflow or not overflow depending on the types
as shown in the test.

So, I think you want to:
a) see with Andrew or check yourself whether relation query can deal with
   operands with different types; if not, restrict just to the case
   where they are compatible
b) either restrict the a > b, a >= b and a < b new optimizations to
   cases where the result and operand types are the same, or add further
   checks; for the a > b and a >= b case there won't be overflow if
   result type can fit all the values in a's type, or as we are in the
   ranger we can just check if the range maximum of op0
   fits into the result type (or even use ranges of both op0 and op1
   for that, we know that the result will be always non-negative given the
   a > b or a >= b relations, so check if maximum of op0 minus minimum of
   op1 will fit into the result type); if yes, it never overflows, otherwise
   we don't know.  Also, unsure why this is about TYPE_UNSIGNED only;
   even for signed operands, if a > b or a >= b, the infinite precision
   result is still non-negative.
   What the a > b and a >= b relations bring into the picture for MINUS_EXPR
   is simply that we don't need to check 2 arith_overflowed_p cases
   but just one; the vr0max vs. vr1min, as for the vr0min vs. vr1max case
   we know it doesn't overflow unless the first one overflows.
   For the a < b case again similarly.

Jakub



RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-12 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, February 12, 2025 2:58 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
> 
> On Tue, 11 Feb 2025, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This fixes two PRs on Early break vectorization by delaying the safety 
> > checks to
> > vectorizable_load when the VF, VMAT and vectype are all known.
> >
> > This patch does add two new restrictions:
> >
> > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> >group sizes, as they are unaligned every n % 2 iterations and so may 
> > cross
> >a page unwittingly.
> >
> > 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization
> if
> >we cannot peel for alignment, as the alignment requirement is quite 
> > large at
> >GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so we
> >don't support it for now.
> >
> > There are other steps documented inside the code itself so that the 
> > reasoning
> > is next to the code.
> >
> > Note that for VLA I have still left this fully disabled when not working on 
> > a
> > fixed buffer.
> >
> > For VLA targets like SVE return element alignment as the desired vector
> > alignment.  This means that the loads are never misaligned and so annoying 
> > it
> > won't ever need to peel.
> >
> > So what I think needs to happen in GCC 16 is that.
> >
> > 1. during vect_compute_data_ref_alignment we need to take the max of
> >POLY_VALUE_MIN and vector_alignment.
> >
> > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard 
> > add a
> >check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use
> as a
> >proxy for pagesize.
> >
> > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> >vect_determine_partial_vectors_and_peeling since the first iteration has 
> > to
> >be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> >vectorize.
> >
> > 4. Create a default mask to be used, so that
> vect_use_loop_mask_for_alignment_p
> >becomes true and we generate the peeled check through loop control for
> >partial loops.  From what I can tell this won't work for
> >LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling
> support at
> >all in the compiler.  That would need to be done independently from the
> >above.
> 
> We basically need to implement peeling/versioning for alignment based
> on the actual POLY value with the fallback being first-fault loads.
> 
> > In any case, not GCC 15 material so I've kept the WIP patches I have
> downstream.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/118464
> > PR tree-optimization/116855
> > * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > checks.
> > (vect_compute_data_ref_alignment): Remove alignment checks and move
> to
> > get_load_store_type, increase group access alignment.
> > (vect_enhance_data_refs_alignment): Add note to comment needing
> > investigating.
> > (vect_analyze_data_refs_alignment): Likewise.
> > (vect_supportable_dr_alignment): For group loads look at first DR.
> > * tree-vect-stmts.cc (get_load_store_type):
> > Perform safety checks for early break pfa.
> > * tree-vectorizer.h (dr_peeling_alignment,
> > dr_set_peeling_alignment): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/118464
> > PR tree-optimization/116855
> > * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > load type is relaxed later.
> > * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
> > * gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
> > * g++.dg/ext/pragma-unroll-lambda-lto.C: Add pragma novector.
> > * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> > * gcc.dg/tree-ssa/gen-vect-25.c: Likewise.
>

[PATCH] loop-invariant: Treat inline-asm conditional trapping [PR102150]

2025-02-12 Thread Andrew Pinski
So inline-asm is known not to trap BUT it can have undefined behavior
if executed speculatively. This fixes the loop invariant pass to
treat it similarly to the trapping cases. If the inline-asm is always
executed, then it will be pulled out of the loop; otherwise it will
be kept inside the loop.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* loop-invariant.cc (find_invariant_insn): Treat inline-asm similar to
trapping instruction and only move them if always executed.

Signed-off-by: Andrew Pinski 
---
 gcc/loop-invariant.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
index bcb52bb9c76..79a4c39dfb0 100644
--- a/gcc/loop-invariant.cc
+++ b/gcc/loop-invariant.cc
@@ -1123,6 +1123,11 @@ find_invariant_insn (rtx_insn *insn, bool 
always_reached, bool always_executed)
   if (may_trap_or_fault_p (PATTERN (insn)) && !always_reached)
 return;
 
+  /* inline-asm that is not always executed cannot be moved
+ as it might trap. */
+  if (!always_reached && asm_noperands (PATTERN (insn)) >= 0)
+return;
+
   depends_on = BITMAP_ALLOC (NULL);
   if (!check_dependencies (insn, depends_on))
 {
-- 
2.43.0



[PATCH htdocs] bugs: Link to all 'Porting to' docs in 'Common problems when upgrading ...'

2025-02-12 Thread Sam James
Suggested by Andrew Pinski. I think it makes sense to have it in here even
if perhaps a bit verbose, because we really try to tell bug reporters to
read the page properly.

This could also be a table.
---
 htdocs/bugs/index.html | 24 
 1 file changed, 24 insertions(+)

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index 99d19095..910ed868 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -627,6 +627,30 @@ changed the parser rules so that <:: works 
as expected.
 
 Common problems when upgrading the compiler
 
+
+GCC maintains a 'Porting to' resource for new versions of the compiler:
+
+
+  https://gcc.gnu.org/gcc-15/porting_to.html";>GCC 15
+  https://gcc.gnu.org/gcc-14/porting_to.html";>GCC 14
+  https://gcc.gnu.org/gcc-13/porting_to.html";>GCC 13
+  https://gcc.gnu.org/gcc-12/porting_to.html";>GCC 12
+  https://gcc.gnu.org/gcc-11/porting_to.html";>GCC 11
+  https://gcc.gnu.org/gcc-10/porting_to.html";>GCC 10
+  https://gcc.gnu.org/gcc-9/porting_to.html";>GCC 9
+  https://gcc.gnu.org/gcc-8/porting_to.html";>GCC 8
+  https://gcc.gnu.org/gcc-7/porting_to.html";>GCC 7
+  https://gcc.gnu.org/gcc-6/porting_to.html";>GCC 6
+  https://gcc.gnu.org/gcc-5/porting_to.html";>GCC 5
+  https://gcc.gnu.org/gcc-4.9/porting_to.html";>GCC 4.9
+  https://gcc.gnu.org/gcc-4.8/porting_to.html";>GCC 4.8
+  https://gcc.gnu.org/gcc-4.7/porting_to.html";>GCC 4.7
+  https://gcc.gnu.org/gcc-4.6/porting_to.html";>GCC 4.6
+  
+  https://gcc.gnu.org/gcc-4.4/porting_to.html";>GCC 4.4
+  https://gcc.gnu.org/gcc-4.3/porting_to.html";>GCC 4.3
+
+
 ABI changes
 
 The C++ application binary interface (ABI) consists of two
-- 
2.48.1



Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread H.J. Lu
On Wed, Feb 12, 2025 at 5:28 PM Uros Bizjak  wrote:
>
> On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu  wrote:
> >
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
>
> > +/* Find all registers defined with REG.  */
> > +
> > +static void
> > +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> > +   unsigned int reg, auto_bitmap &worklist)
> > +{
> > +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> > +   ref != NULL;
> > +   ref = DF_REF_NEXT_REG (ref))
> > +{
> > +  if (DF_REF_IS_ARTIFICIAL (ref))
> > +continue;
> > +
> > +  rtx_insn *insn = DF_REF_INSN (ref);
> > +  if (!NONDEBUG_INSN_P (insn))
> > +continue;
> > +
> > +  rtx set = single_set (insn);
> > +  if (!set)
> > +continue;
> > +
>
> Isn't the above condition a bit too limiting? We can have insn with
> multiple sets in the chain.
>
> The issue at hand is the correctness issue (the program will segfault
> if registers are not tracked correctly), not some missing
> optimization. I'd suggest to stay on the safe side and also process
> PARALLELs. Something similar to e.g. store_data_bypass_p from
> recog.cc:
>
> --cut here--
>   rtx set = single_set (insn);
>   if (set)
> ix86_find_all_reg_use_1(...);
>
>   rtx pat = PATTERN (insn);
>   if (GET_CODE (pat) != PARALLEL)
> return false;
>
>   for (int i = 0; i < XVECLEN (pat, 0); i++)
> {
>   rtx exp = XVECEXP (pat, 0, i);
>
>   if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
> continue;
>
>   gcc_assert (GET_CODE (exp) == SET);
>
>   ix86_find_all_reg_use_1(...);
> }
> --cut here--
>
> The above will make ix86_find_all_reg_use significantly more robust.
>
> Uros.

Like this?

/* Helper function for ix86_find_all_reg_use.  */

static void
ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access,
 auto_bitmap &worklist)
{
  rtx src = SET_SRC (set);
  if (MEM_P (src))
return;

  rtx dest = SET_DEST (set);
  if (!REG_P (dest))
return;

  if (TEST_HARD_REG_BIT (stack_slot_access, REGNO (dest)))
return;

  /* Add this register to stack_slot_access.  */
  add_to_hard_reg_set (&stack_slot_access, Pmode, REGNO (dest));
  bitmap_set_bit (worklist, REGNO (dest));
}

/* Find all registers defined with REG.  */

static void
ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
   unsigned int reg, auto_bitmap &worklist)
{
  for (df_ref ref = DF_REG_USE_CHAIN (reg);
   ref != NULL;
   ref = DF_REF_NEXT_REG (ref))
{
  if (DF_REF_IS_ARTIFICIAL (ref))
continue;

  rtx_insn *insn = DF_REF_INSN (ref);
  if (!NONDEBUG_INSN_P (insn))
continue;

  rtx set = single_set (insn);
  if (set)
ix86_find_all_reg_use_1 (set, stack_slot_access, worklist);

  rtx pat = PATTERN (insn);
  if (GET_CODE (pat) != PARALLEL)
continue;

  for (int i = 0; i < XVECLEN (pat, 0); i++)
{
  rtx exp = XVECEXP (pat, 0, i);

  if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
continue;

  gcc_assert (GET_CODE (exp) == SET);

  ix86_find_all_reg_use_1 (exp, stack_slot_access, worklist);
}
}
}


-- 
H.J.


Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread H.J. Lu
On Wed, Feb 12, 2025 at 4:03 PM Sam James  wrote:
>
> "H.J. Lu"  writes:
>
> > Don't assume that stack slots can only be accessed by stack or frame
> > registers.  We first find all registers defined by stack or frame
> > registers.  Then check memory accesses by such registers, including
> > stack and frame registers.
> >
> > gcc/
> >
> > PR target/109780
> > PR target/109093
> > * config/i386/i386.cc (ix86_update_stack_alignment): New.
> > (ix86_find_all_reg_use): Likewise.
> > (ix86_find_max_used_stack_alignment): Also check memory accesses
> > from registers defined by stack or frame registers.
> >
> > gcc/testsuite/
> >
> > PR target/109780
> > PR target/109093
> > * g++.target/i386/pr109780-1.C: New test.
> > * gcc.target/i386/pr109093-1.c: Likewise.
> > * gcc.target/i386/pr109780-1.c: Likewise.
> > * gcc.target/i386/pr109780-2.c: Likewise.
>
> Please add the runtime testcase at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780#c29 too.
>
> Also, for pr109093-1.c, please initialise 'f' to 1 to avoid UB (division
> by zero).

Will do.

Thanks.

-- 
H.J.


Re: [PATCH htdocs 1/2] bugs: improve "ABI changes" subsection

2025-02-12 Thread Jonathan Wakely
This looks more accurate than the current wording, yes.

Specifically, only objects/libraries "built with experimental standard
support" need to be recompiled.

LGTM, but I'll let Jason give approval.




On Wed, 12 Feb 2025 at 09:30, Sam James  wrote:
>
> C++ ABI for C++ standards with full support by GCC (rather than those
> marked as experimental per https://gcc.gnu.org/projects/cxx-status.html)
> should be stable. It's certainly not the case in 2025 that one needs a
> full world rebuild for C++ libraries using e.g. the default standard
> or any other standard fully supported by GCC, unless it is marked
> experimental, where we provide no guarantees.
> ---
>  htdocs/bugs/index.html | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
> index d6556b26..99d19095 100644
> --- a/htdocs/bugs/index.html
> +++ b/htdocs/bugs/index.html
> @@ -633,14 +633,14 @@ changed the parser rules so that <:: 
> works as expected.
>  components: the first defines how the elements of classes are laid
>  out, how functions are called, how function names are mangled, etc;
>  the second part deals with the internals of the objects in libstdc++.
> -Although we strive for a non-changing ABI, so far we have had to
> -modify it with each major release.  If you change your compiler to a
> -different major release you must recompile all libraries that
> -contain C++ code.  If you fail to do so you risk getting linker
> -errors or malfunctioning programs.
> -It should not be necessary to recompile if you have changed
> -to a bug-fix release of the same version of the compiler; bug-fix
> -releases are careful to avoid ABI changes. See also the
> +For C++ standards marked as
> +https://gcc.gnu.org/projects/cxx-status.html";>experimental,
> +stable ABI is not guaranteed: for these, if you change your compiler to a
> +different major release you must recompile any such libraries built
> +with experimental standard support that contain C++ code.  If you fail
> +to do so, you risk getting linker errors or malfunctioning programs.
> +It should not be necessary to recompile for C++ standards supported fully
> +by GCC, such as the default standard.  See also the
>  https://gcc.gnu.org/onlinedocs/gcc/Compatibility.html";>compatibility
>  section of the GCC manual.
>
> --
> 2.48.1
>



Re: [PATCH htdocs 2/2] gcc-15/porting_to: link to "Standards conformance" section for C++

2025-02-12 Thread Jonathan Wakely
LGTM, thanks

On Wed, 12 Feb 2025 at 09:25, Sam James  wrote:
>
> Suggested by Andrew Pinski.
> ---
>  htdocs/gcc-15/porting_to.html | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/htdocs/gcc-15/porting_to.html b/htdocs/gcc-15/porting_to.html
> index b9b2efc7..829ae92f 100644
> --- a/htdocs/gcc-15/porting_to.html
> +++ b/htdocs/gcc-15/porting_to.html
> @@ -137,6 +137,12 @@ In file included from :1:
>
>  C++ language issues
>
> +
> +Note that all GCC releases make
> +https://gcc.gnu.org/bugs/#upgrading";>improvements to conformance
> +which may reject non-conforming, legacy codebases.
> +
> +
>  Header dependency changes
>  Some C++ Standard Library headers have been changed to no longer include
>  other headers that were being used internally by the library.
> --
> 2.48.1
>



[PATCH v2] s390: Fix s390_valid_shift_count() for TI mode [PR118835]

2025-02-12 Thread Stefan Schulze Frielinghaus
Giving it a second thought: instead of checking for something we don't
expect, namely a const_wide_int, and bailing out in such a case, rather
check for something we expect, namely a const_int, and bail out if that
is not the case.  This should be more robust.

Bootstrap and regtest are still running.  About to push if they are
successful.

-- >8 --

During combine we may end up with

(set (reg:DI 66 [ _6 ])
     (ashift:DI (reg:DI 72 [ x ])
                (subreg:QI (and:TI (reg:TI 67 [ _1 ])
                                   (const_wide_int 0x0aabf))
                           15)))

where the shift count operand does not trivially fit the scheme of
address operands.  Reject those operands, especially since
strip_address_mutations() expects expressions of the form
(and ... (const_int ...)) and fails for (and ... (const_wide_int ...)).

Thus, be more strict here and accept only CONST_INT operands.  Done by
replacing immediate_operand() with const_int_operand() which is enough
since the former only additionally checks for LEGITIMATE_PIC_OPERAND_P
and targetm.legitimate_constant_p which are always true for CONST_INT
operands.

While on it, fix indentation of the if block.

gcc/ChangeLog:

PR target/118835
* config/s390/s390.cc (s390_valid_shift_count): Reject shift
count operands which do not trivially fit the scheme of
address operands.

gcc/testsuite/ChangeLog:

* gcc.target/s390/pr118835.c: New test.
---
 gcc/config/s390/s390.cc  | 35 ++--
 gcc/testsuite/gcc.target/s390/pr118835.c | 21 ++
 2 files changed, 41 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr118835.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 1d96df49fea..29aef501fdd 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3510,26 +3510,31 @@ s390_valid_shift_count (rtx op, HOST_WIDE_INT implicit_mask)
 
   /* Check for an and with proper constant.  */
   if (GET_CODE (op) == AND)
-  {
-rtx op1 = XEXP (op, 0);
-rtx imm = XEXP (op, 1);
+{
+  rtx op1 = XEXP (op, 0);
+  rtx imm = XEXP (op, 1);
 
-if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1))
-  op1 = XEXP (op1, 0);
+  if (GET_CODE (op1) == SUBREG && subreg_lowpart_p (op1))
+   op1 = XEXP (op1, 0);
 
-if (!(register_operand (op1, GET_MODE (op1)) || GET_CODE (op1) == PLUS))
-  return false;
+  if (!(register_operand (op1, GET_MODE (op1)) || GET_CODE (op1) == PLUS))
+   return false;
 
-if (!immediate_operand (imm, GET_MODE (imm)))
-  return false;
+  /* Accept only CONST_INT as immediates, i.e., reject shift count operands
+     which do not trivially fit the scheme of address operands.  Especially
+     since strip_address_mutations() expects expressions of the form
+     (and ... (const_int ...)) and fails for
+     (and ... (const_wide_int ...)).  */
+  if (!const_int_operand (imm, GET_MODE (imm)))
+   return false;
 
-HOST_WIDE_INT val = INTVAL (imm);
-if (implicit_mask > 0
-   && (val & implicit_mask) != implicit_mask)
-  return false;
+  HOST_WIDE_INT val = INTVAL (imm);
+  if (implicit_mask > 0
+ && (val & implicit_mask) != implicit_mask)
+   return false;
 
-op = op1;
-  }
+  op = op1;
+}
 
   /* Check the rest.  */
   return s390_decompose_addrstyle_without_index (op, NULL, NULL);
diff --git a/gcc/testsuite/gcc.target/s390/pr118835.c b/gcc/testsuite/gcc.target/s390/pr118835.c
new file mode 100644
index 000..1ca6cd95543
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr118835.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+/* During combine we may end up with patterns of the form
+
+   (set (reg:DI 66 [ _6 ])
+(ashift:DI (reg:DI 72 [ x ])
+   (subreg:QI (and:TI (reg:TI 67 [ _1 ])
+  (const_wide_int 0x0aabf))
+  15)))
+
+   which should be rejected since the shift count does not trivially fit the
+   scheme of address operands.  */
+
+long
+test (long x, int y)
+{
+  __int128 z = 0xAABF;
+  z &= y;
+  return x << z;
+}
-- 
2.47.0



Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu  wrote:
>
> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.
>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.

> +/* Find all registers defined with REG.  */
> +
> +static void
> +ix86_find_all_reg_use (HARD_REG_SET &stack_slot_access,
> +   unsigned int reg, auto_bitmap &worklist)
> +{
> +  for (df_ref ref = DF_REG_USE_CHAIN (reg);
> +   ref != NULL;
> +   ref = DF_REF_NEXT_REG (ref))
> +{
> +  if (DF_REF_IS_ARTIFICIAL (ref))
> +continue;
> +
> +  rtx_insn *insn = DF_REF_INSN (ref);
> +  if (!NONDEBUG_INSN_P (insn))
> +continue;
> +
> +  rtx set = single_set (insn);
> +  if (!set)
> +continue;
> +

Isn't the above condition a bit too limiting? We can have an insn with
multiple sets in the chain.

The issue at hand is a correctness issue (the program will segfault
if registers are not tracked correctly), not some missing
optimization. I'd suggest staying on the safe side and also processing
PARALLELs. Something similar to e.g. store_data_bypass_p from
recog.cc:

--cut here--
  rtx set = single_set (insn);
  if (set)
ix86_find_all_reg_use_1(...);

  rtx pat = PATTERN (insn);
  if (GET_CODE (pat) != PARALLEL)
return false;

  for (int i = 0; i < XVECLEN (pat, 0); i++)
{
  rtx exp = XVECEXP (pat, 0, i);

  if (GET_CODE (exp) == CLOBBER || GET_CODE (exp) == USE)
continue;

  gcc_assert (GET_CODE (exp) == SET);

  ix86_find_all_reg_use_1(...);
}
--cut here--

The above will make ix86_find_all_reg_use significantly more robust.

Uros.


[PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Lulu Cheng
PR target/118828

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (loongarch_pragma_target_parse):
Update the predefined macros.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118828.c: New test.
* gcc.target/loongarch/pr118828-2.c: New test.
* gcc.target/loongarch/pr118828-3.c: New test.
* gcc.target/loongarch/pr118828-4.c: New test.

Change-Id: I13f7b44b11bba2080db797157a0389cc1bd65ac6
---
 gcc/config/loongarch/loongarch-c.cc   | 14 +
 .../gcc.target/loongarch/pr118828-2.c | 30 ++
 .../gcc.target/loongarch/pr118828-3.c | 55 +++
 .../gcc.target/loongarch/pr118828-4.c | 55 +++
 gcc/testsuite/gcc.target/loongarch/pr118828.c | 34 
 5 files changed, 188 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c

diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index 9a8de1ec381..66ae77ad665 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "c-family/c-common.h"
 #include "cpplib.h"
+#include "c-family/c-pragma.h"
 #include "tm_p.h"
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
@@ -212,6 +213,19 @@ loongarch_pragma_target_parse (tree args, tree pop_target)
 
   loongarch_reset_previous_fndecl ();
 
+  /* For the definitions, ensure all newly defined macros are considered
+ as used for -Wunused-macros.  There is no point warning about the
+ compiler predefined macros.  */
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts->warn_unused_macros;
+  cpp_opts->warn_unused_macros = 0;
+
+  cpp_force_token_locations (parse_in, BUILTINS_LOCATION);
+  loongarch_update_cpp_builtins (parse_in);
+  cpp_stop_forcing_token_locations (parse_in);
+
+  cpp_opts->warn_unused_macros = saved_warn_unused_macros;
+
   /* If we're popping or reseting make sure to update the globals so that
  the optab availability predicates get recomputed.  */
   if (pop_target)
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-2.c b/gcc/testsuite/gcc.target/loongarch/pr118828-2.c
new file mode 100644
index 000..3d32fcc15c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-2.c
@@ -0,0 +1,30 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mno-lsx" } */
+
+#ifdef __loongarch_sx
+#error LSX should not be available here
+#endif
+
+#ifdef __loongarch_simd_width
+#error simd width should not be available here
+#endif
+
+#pragma GCC push_options
+#pragma GCC target("lsx")
+#ifndef __loongarch_sx
+#error LSX should be available here
+#endif
+#ifndef __loongarch_simd_width
+#error simd width should be available here
+#elif __loongarch_simd_width != 128
+#error simd width should be 128
+#endif
+#pragma GCC pop_options
+
+#ifdef __loongarch_sx
+#error LSX should become unavailable again
+#endif
+
+#ifdef __loongarch_simd_width
+#error simd width should become unavailable again
+#endif
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
new file mode 100644
index 000..a682ae4a356
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options "-march=loongarch64" } */
+
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifndef __loongarch_arch
+#error __loongarch_arch should be available here
+#endif
+
+void
+test1 (void)
+{
+  if (strcmp (__loongarch_arch, "loongarch64") != 0)
+{
+  printf ("__loongarch_arch should be loongarch64 here.\n");
+  abort ();
+}
+}
+
+
+#pragma GCC push_options
+#pragma GCC target("arch=la64v1.1")
+
+void
+test2 (void)
+{
+  if (strcmp (__loongarch_arch, "la64v1.1") != 0)
+{
+  printf ("__loongarch_arch should be la64v1.1 here.\n");
+  abort ();
+}
+}
+#pragma GCC pop_options
+
+void
+test3 (void)
+{
+  if (strcmp (__loongarch_arch, "loongarch64") != 0)
+{
+  printf ("__loongarch_arch should be loongarch64 here.\n");
+  abort ();
+}
+}
+
+int
+main (int argc, char **argv)
+{
+  test1 ();
+  test2 ();
+  test3 ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-4.c b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
new file mode 100644
index 000..3b3a7c6078c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options "-mtune=la464" } */
+
+#include <string.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifndef __loongarch_tune
+#error __loongarch_tune should be available here
+#endif
+
+void
+test1 (void)
+{
+  if (strcmp (__loongarc

[PATCH v2 1/4] LoongArch: Move the function loongarch_register_pragmas to loongarch-c.cc.

2025-02-12 Thread Lulu Cheng
gcc/ChangeLog:

* config/loongarch/loongarch-target-attr.cc
(loongarch_pragma_target_parse): Move to ...
(loongarch_register_pragmas): Move to ...
* config/loongarch/loongarch-c.cc
(loongarch_pragma_target_parse): ... here.
(loongarch_register_pragmas): ... here.
* config/loongarch/loongarch-protos.h
(loongarch_process_target_attr): New declaration.

Change-Id: Iacb09467e4b4551d6bf0ae55cced5c4abb901ddf
---
 gcc/config/loongarch/loongarch-c.cc   | 51 +++
 gcc/config/loongarch/loongarch-protos.h   |  1 +
 gcc/config/loongarch/loongarch-target-attr.cc | 48 -
 3 files changed, 52 insertions(+), 48 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index c95c0f373be..5d8c02e094b 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -23,9 +23,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "target.h"
 #include "tm.h"
 #include "c-family/c-common.h"
 #include "cpplib.h"
+#include "tm_p.h"
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
 #define builtin_define(TXT) cpp_define (pfile, TXT)
@@ -145,3 +147,52 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
 
 }
+
+/* Hook to validate the current #pragma GCC target and set the state, and
+   update the macros based on what was changed.  If ARGS is NULL, then
+   POP_TARGET is used to reset the options.  */
+
+static bool
+loongarch_pragma_target_parse (tree args, tree pop_target)
+{
+  /* If args is not NULL then process it and setup the target-specific
+ information that it specifies.  */
+  if (args)
+{
+  if (!loongarch_process_target_attr (args, NULL))
+   return false;
+
+  loongarch_option_override_internal (&la_target,
+ &global_options,
+ &global_options_set);
+}
+
+  /* args is NULL, restore to the state described in pop_target.  */
+  else
+{
+  pop_target = pop_target ? pop_target : target_option_default_node;
+  cl_target_option_restore (&global_options, &global_options_set,
+   TREE_TARGET_OPTION (pop_target));
+}
+
+  target_option_current_node
+= build_target_option_node (&global_options, &global_options_set);
+
+  loongarch_reset_previous_fndecl ();
+
+  /* If we're popping or reseting make sure to update the globals so that
+ the optab availability predicates get recomputed.  */
+  if (pop_target)
+loongarch_save_restore_target_globals (pop_target);
+
+  return true;
+}
+
+/* Implement REGISTER_TARGET_PRAGMAS.  */
+
+void
+loongarch_register_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = loongarch_pragma_target_parse;
+}
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index b99f949a004..e7b318143bf 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -219,4 +219,5 @@ extern void loongarch_option_override_internal (struct loongarch_target *, struc
 extern void loongarch_reset_previous_fndecl (void);
 extern void loongarch_save_restore_target_globals (tree new_tree);
 extern void loongarch_register_pragmas (void);
+extern bool loongarch_process_target_attr (tree args, tree fndecl);
 #endif /* ! GCC_LOONGARCH_PROTOS_H */
diff --git a/gcc/config/loongarch/loongarch-target-attr.cc b/gcc/config/loongarch/loongarch-target-attr.cc
index cee7031ca1e..cb537446dff 100644
--- a/gcc/config/loongarch/loongarch-target-attr.cc
+++ b/gcc/config/loongarch/loongarch-target-attr.cc
@@ -422,51 +422,3 @@ loongarch_option_valid_attribute_p (tree fndecl, tree, tree args, int)
   return ret;
 }
 
-/* Hook to validate the current #pragma GCC target and set the state, and
-   update the macros based on what was changed.  If ARGS is NULL, then
-   POP_TARGET is used to reset the options.  */
-
-static bool
-loongarch_pragma_target_parse (tree args, tree pop_target)
-{
-  /* If args is not NULL then process it and setup the target-specific
- information that it specifies.  */
-  if (args)
-{
-  if (!loongarch_process_target_attr (args, NULL))
-   return false;
-
-  loongarch_option_override_internal (&la_target,
- &global_options,
- &global_options_set);
-}
-
-  /* args is NULL, restore to the state described in pop_target.  */
-  else
-{
-  pop_target = pop_target ? pop_target : target_option_default_node;
-  cl_target_option_restore (&global_options, &global_options_set,
-   TREE_TARGET_OPTION (pop_target));
-}
-
-  target_option

[PATCH v2 4/4] LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined [PR118843].

2025-02-12 Thread Lulu Cheng
PR target/118843

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc
(loongarch_update_cpp_builtins): Fix macro definition issues.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr118843.c: New test.

Change-Id: I777e46ccbc80bfa8948e7d416ac86853c8f4c16d
---
 gcc/config/loongarch/loongarch-c.cc   | 27 ++-
 gcc/testsuite/gcc.target/loongarch/pr118843.c |  6 +
 2 files changed, 21 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c

diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index 66ae77ad665..effdcf0e255 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -129,9 +129,6 @@ loongarch_update_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
-  loongarch_def_or_undef (TARGET_HARD_FLOAT && ISA_HAS_FRECIPE,
- "__loongarch_frecipe", pfile);
-
   loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_simd", pfile);
   loongarch_def_or_undef (ISA_HAS_LSX, "__loongarch_sx", pfile);
   loongarch_def_or_undef (ISA_HAS_LASX, "__loongarch_asx", pfile);
@@ -149,17 +146,23 @@ loongarch_update_cpp_builtins (cpp_reader *pfile)
   int max_v_major = 1, max_v_minor = 0;
 
   for (int i = 0; i < N_EVO_FEATURES; i++)
-if (la_target.isa.evolution & la_evo_feature_masks[i])
-  {
-   builtin_define (la_evo_macro_name[i]);
+{
+  builtin_undef (la_evo_macro_name[i]);
 
-   int major = la_evo_version_major[i],
-   minor = la_evo_version_minor[i];
+  if (la_target.isa.evolution & la_evo_feature_masks[i]
+ && (la_evo_feature_masks[i] != OPTION_MASK_ISA_FRECIPE
+ || TARGET_HARD_FLOAT))
+   {
+ builtin_define (la_evo_macro_name[i]);
 
-   max_v_major = major > max_v_major ? major : max_v_major;
-   max_v_minor = major == max_v_major
- ? (minor > max_v_minor ? minor : max_v_minor) : max_v_minor;
-  }
+ int major = la_evo_version_major[i],
+ minor = la_evo_version_minor[i];
+
+ max_v_major = major > max_v_major ? major : max_v_major;
+ max_v_minor = major == max_v_major
+   ? (minor > max_v_minor ? minor : max_v_minor) : max_v_minor;
+   }
+}
 
   /* Find the minimum ISA version required to run the target program.  */
   builtin_undef ("__loongarch_version_major");
diff --git a/gcc/testsuite/gcc.target/loongarch/pr118843.c b/gcc/testsuite/gcc.target/loongarch/pr118843.c
new file mode 100644
index 000..30372b8ffe6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118843.c
@@ -0,0 +1,6 @@
+/* { dg-do preprocess } */
+/* { dg-options "-mfrecipe -mfpu=none" } */
+
+#ifdef __loongarch_frecipe
+#error __loongarch_frecipe should not be available here
+#endif
-- 
2.34.1



[PATCH v2 0/4] Organize the code and fix PR118828 and PR118843.

2025-02-12 Thread Lulu Cheng
v1 -> v2:
 1. Move __loongarch_{arch,tune} _LOONGARCH_{ARCH,TUNE}
__loongarch_{div32,am_bh,amcas,ld_seq_sa} and 
__loongarch_version_major/__loongarch_version_minor to update function.
 2. Fixed PR118843.
 3. Add testsuites.

Lulu Cheng (4):
  LoongArch: Move the function loongarch_register_pragmas to
loongarch-c.cc.
  LoongArch: Split the function loongarch_cpu_cpp_builtins into two
functions.
  LoongArch: After setting the compilation options, update the
predefined macros.
  LoongArch: When -mfpu=none, '__loongarch_frecipe' shouldn't be defined
[PR118843].

 gcc/config/loongarch/loongarch-c.cc   | 204 +-
 gcc/config/loongarch/loongarch-protos.h   |   1 +
 gcc/config/loongarch/loongarch-target-attr.cc |  48 -
 .../gcc.target/loongarch/pr118828-2.c |  30 +++
 .../gcc.target/loongarch/pr118828-3.c |  55 +
 .../gcc.target/loongarch/pr118828-4.c |  55 +
 gcc/testsuite/gcc.target/loongarch/pr118828.c |  34 +++
 gcc/testsuite/gcc.target/loongarch/pr118843.c |   6 +
 8 files changed, 333 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-2.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-3.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828-4.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118828.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr118843.c

-- 
2.34.1



[PATCH v2 2/4] LoongArch: Split the function loongarch_cpu_cpp_builtins into two functions.

2025-02-12 Thread Lulu Cheng
Split the implementation of the function loongarch_cpu_cpp_builtins into two 
parts:
  1. Macro definitions that do not change (only considering 64-bit architecture)
  2. Macro definitions that change with different compilation options.

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc (builtin_undef): New macro.
(loongarch_cpu_cpp_builtins): Split to loongarch_update_cpp_builtins
and loongarch_define_unconditional_macros.
(loongarch_def_or_undef): New function.
(loongarch_define_unconditional_macros): Likewise.
(loongarch_update_cpp_builtins): Likewise.

Change-Id: Ifae73ffa2a07a595ed2a7f6ab7b82d8f51328a2a
---
 gcc/config/loongarch/loongarch-c.cc | 122 ++--
 1 file changed, 77 insertions(+), 45 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index 5d8c02e094b..9a8de1ec381 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -31,26 +31,22 @@ along with GCC; see the file COPYING3.  If not see
 
 #define preprocessing_asm_p() (cpp_get_options (pfile)->lang == CLK_ASM)
 #define builtin_define(TXT) cpp_define (pfile, TXT)
+#define builtin_undef(TXT) cpp_undef (pfile, TXT)
 #define builtin_assert(TXT) cpp_assert (pfile, TXT)
 
-void
-loongarch_cpu_cpp_builtins (cpp_reader *pfile)
+static void
+loongarch_def_or_undef (bool def_p, const char *macro, cpp_reader *pfile)
 {
-  builtin_assert ("machine=loongarch");
-  builtin_assert ("cpu=loongarch");
-  builtin_define ("__loongarch__");
-
-  builtin_define_with_value ("__loongarch_arch",
-loongarch_arch_strings[la_target.cpu_arch], 1);
-
-  builtin_define_with_value ("__loongarch_tune",
-loongarch_tune_strings[la_target.cpu_tune], 1);
-
-  builtin_define_with_value ("_LOONGARCH_ARCH",
-loongarch_arch_strings[la_target.cpu_arch], 1);
+  if (def_p)
+cpp_define (pfile, macro);
+  else
+cpp_undef (pfile, macro);
+}
 
-  builtin_define_with_value ("_LOONGARCH_TUNE",
-loongarch_tune_strings[la_target.cpu_tune], 1);
+static void
+loongarch_define_unconditional_macros (cpp_reader *pfile)
+{
+  builtin_define ("__loongarch__");
 
   /* Base architecture / ABI.  */
   if (TARGET_64BIT)
@@ -66,6 +62,48 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   builtin_define ("__loongarch_lp64");
 }
 
+  /* Add support for FLOAT128_TYPE on the LoongArch architecture.  */
+  builtin_define ("__FLOAT128_TYPE__");
+
+  /* Map the old _Float128 'q' builtins into the new 'f128' builtins.  */
+  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+  builtin_define ("__builtin_nanq=__builtin_nanf128");
+  builtin_define ("__builtin_nansq=__builtin_nansf128");
+  builtin_define ("__builtin_infq=__builtin_inff128");
+  builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+
+  /* Native Data Sizes.  */
+  builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_SZPTR", POINTER_SIZE);
+  builtin_define_with_int_value ("_LOONGARCH_FPSET", 32);
+  builtin_define_with_int_value ("_LOONGARCH_SPFPSET", 32);
+}
+
+static void
+loongarch_update_cpp_builtins (cpp_reader *pfile)
+{
+  /* Since the macros in this function might be redefined, it's necessary to
+ undef them first.  */
+  builtin_undef ("__loongarch_arch");
+  builtin_define_with_value ("__loongarch_arch",
+loongarch_arch_strings[la_target.cpu_arch], 1);
+
+  builtin_undef ("__loongarch_tune");
+  builtin_define_with_value ("__loongarch_tune",
+loongarch_tune_strings[la_target.cpu_tune], 1);
+
+  builtin_undef ("_LOONGARCH_ARCH");
+  builtin_define_with_value ("_LOONGARCH_ARCH",
+loongarch_arch_strings[la_target.cpu_arch], 1);
+
+  builtin_undef ("_LOONGARCH_TUNE");
+  builtin_define_with_value ("_LOONGARCH_TUNE",
+loongarch_tune_strings[la_target.cpu_tune], 1);
+
+  builtin_undef ("__loongarch_double_float");
+  builtin_undef ("__loongarch_single_float");
   /* These defines reflect the ABI in use, not whether the
  FPU is directly accessible.  */
   if (TARGET_DOUBLE_FLOAT_ABI)
@@ -73,6 +111,8 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else if (TARGET_SINGLE_FLOAT_ABI)
 builtin_define ("__loongarch_single_float=1");
 
+  builtin_undef ("__loongarch_soft_float");
+  builtin_undef ("__loongarch_hard_float");
   if (TARGET_DOUBLE_FLOAT_ABI || TARGET_SINGLE_FLOAT_ABI)
 builtin_define ("__loongarch_hard_float=1");
   else
@@ -80,6 +120,7 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
 
 
   /* ISA Extensions.  */
+  builtin_undef ("__loongarch_frlen");
   if (TARGET_DOUBLE_FLOAT)
 builtin_define ("__loongarc

Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-02-12 Thread Alex Coplan
Ping

On 03/02/2025 14:46, Tamar Christina wrote:
> Ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Friday, January 24, 2025 9:18 AM
> > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > Cc: Richard Biener ; Jan Hubicka 
> > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of 
> > multi-exit
> > loops [PR117790]
> > 
> > ping
> > 
> > > -Original Message-
> > > From: Tamar Christina
> > > Sent: Wednesday, January 15, 2025 2:08 PM
> > > To: Alex Coplan ; gcc-patches@gcc.gnu.org
> > > Cc: Richard Biener ; Jan Hubicka 
> > > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of 
> > > multi-
> > exit
> > > loops [PR117790]
> > >
> > > Ping
> > >
> > > > -Original Message-
> > > > From: Alex Coplan 
> > > > Sent: Monday, January 6, 2025 11:35 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Richard Biener ; Jan Hubicka ;
> > Tamar
> > > > Christina 
> > > > Subject: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of 
> > > > multi-exit
> > > > loops [PR117790]
> > > >
> > > > As it stands, scale_loop_profile doesn't correctly handle loops with
> > > > multiple exits.  In particular, in the case where the expected niters
> > > > exceeds iteration_bound, scale_loop_profile attempts to reduce the
> > > > number of iterations with a call to scale_loop_frequencies, which
> > > > multiplies the count of each BB by a given probability.  This
> > > > transformation preserves the relationships between the counts of the BBs
> > > > within the loop (and thus the edge probabilities stay the same) but this
> > > > cannot possibly work for loops with multiple exits, since in order for
> > > > the expected niters to reduce (and counts along exit edges to remain the
> > > > same), the exit edge probabilities must increase, thus decreasing the
> > > > probabilities of the internal edges, meaning that the ratios of the
> > > > counts of the BBs inside the loop must change.  So we need a different
> > > > approach (not a straightforward multiplicative scaling) to adjust the
> > > > expected niters of a loop with multiple exits.
> > > >
> > > > This patch introduces a new helper, flow_scale_loop_freqs, which can be
> > > > used to correctly scale the profile of a loop with multiple exits.  It
> > > > is parameterized by a probability (with which to scale the header and
> > > > therefore the expected niters) and a lambda which gives the desired
> > > > counts for the exit edges.  In this patch, to make things simpler,
> > > > flow_scale_loop_freqs only handles loop shapes without internal control
> > > > flow, and we introduce a predicate can_flow_scale_loop_freqs_p to test
> > > > whether a given loop meets these criteria.  This restriction is
> > > > reasonable since this patch is motivated by fixing the profile
> > > > consistency for early break vectorization, and we don't currently
> > > > vectorize loops with internal control flow.  We also fall back to a
> > > > multiplicative scaling (the status quo) for loops that
> > > > flow_scale_loop_freqs can't handle, so the patch should be a net
> > > > improvement.
> > > >
> > > > We wrap the call to flow_scale_loop_freqs in a helper
> > > > scale_loop_freqs_with_exit_counts which handles the above-mentioned
> > > > fallback.  This wrapper is still generic in that it accepts a lambda to
> > > > allow overriding the desired exit edge counts.  We specialize this with
> > > > another wrapper, scale_loop_freqs_hold_exit_counts (keeping the
> > > > counts along exit edges fixed), which is then used to implement the
> > > > niters-scaling case of scale_loop_profile, thus fixing this path through
> > > > the function for loops with multiple exits.
> > > >
> > > > Finally, we expose two new wrapper functions in cfgloopmanip.h for use
> > > > in subsequent vectorizer patches.  scale_loop_profile_hold_exit_counts
> > > > is a variant of scale_loop_profile which assumes we want to keep the
> > > > counts along exit edges of the loop fixed through both parts of the
> > > > transformation (including the initial probability scale).
> > > > scale_loop_freqs_with_new_exit_count is intended to be used in a
> > > > subsequent patch when adding a skip edge around the epilog, where the
> > > > reduction of count entering the loop is mirrored by a reduced count
> > > > along a given exit edge.
> > > >
> > > > Bootstrapped/regtested as a series on aarch64-linux-gnu,
> > > > x86_64-linux-gnu, and arm-linux-gnueabihf.  OK for trunk?
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/117790
> > > > * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > > > (flow_scale_loop_freqs): New.
> > > > (scale_loop_freqs_with_exit_counts): New.
> > > > (scale_loop_freqs_hold_exit_counts): New.
> > > > (scale_loop_profile): Refactor to use the newly-added
> > > > scale_loop_profile_1, and use scale_loop_freqs_hold_ex

Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 4:16 PM Richard Sandiford wrote:
>
> Uros Bizjak  writes:
> > The combine pass is trying to combine:
> >
> > Trying 16, 22, 21 -> 23:
> >16: r104:QI=flags:CCNO>0
> >22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
> >   REG_UNUSED flags:CC
> >21: r119:QI=flags:CCNO<=0
> >   REG_DEAD flags:CCNO
>
> It looks like something has already gone wrong in this sequence,
> in that insn 21 is using the flags after the register has been clobbered.
> If the flags result of insn 22 is useful, the insn should be setting the
> flags using a parallel of two sets.

Please note that the insn sequence before combine pass is correct:

   16: r104:QI=flags:CCNO>0
   21: r119:QI=flags:CCNO<=0
  REG_DEAD flags:CCNO
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
  REG_UNUSED flags:CC
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
  REG_DEAD r120:QI
  REG_DEAD r119:QI
  REG_UNUSED flags:CC

then combine tries to make the combined insn with:

Trying 16, 22, 21 -> 23:

where:

Failed to match this instruction:
(parallel [
(set (reg:QI 110 [ d_lsm_flag.20 ])
(le:QI (reg:CCNO 17 flags)
(const_int 0 [0])))
(clobber (reg:CC 17 flags))
(set (reg:QI 104 [ _36 ])
(gt:QI (reg:CCNO 17 flags)
(const_int 0 [0])))
])
Failed to match this instruction:
(parallel [
(set (reg:QI 110 [ d_lsm_flag.20 ])
(le:QI (reg:CCNO 17 flags)
(const_int 0 [0])))
(set (reg:QI 104 [ _36 ])
(gt:QI (reg:CCNO 17 flags)
(const_int 0 [0])))
])
Successfully matched this instruction:
(set (reg:QI 104 [ _36 ])
(gt:QI (reg:CCNO 17 flags)
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:QI 110 [ d_lsm_flag.20 ])
(le:QI (reg:CCNO 17 flags)
(const_int 0 [0])))
allowing combination of insns 16, 21, 22 and 23
original costs 4 + 4 + 4 + 4 = 16
replacement costs 4 + 4 = 8
deferring deletion of insn with uid = 21.
deferring deletion of insn with uid = 16.
modifying insn i222: r104:QI=flags:CCNO>0
  REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i323: r110:QI=flags:CCNO<=0
  REG_DEAD flags:CC
deferring rescan insn with uid = 23.

The combined instruction is OK, but when the insn is split into two, the
REG_UNUSED note from insn 22 is propagated (being converted to REG_DEAD
on the fly in the referred code) to insn i2. This is clearly wrong when
the flags reg is used in insn i3. I don't see anything wrong besides this.

Uros.


Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2025 at 08:29:37AM -0700, Jeff Law wrote:
> On 2/12/25 8:18 AM, Jakub Jelinek wrote:
> > On Tue, Feb 11, 2025 at 05:20:49PM -0700, Jeff Law wrote:
> > > So this is a fairly old regression, but with all the ranger work
> > > that's been done, it's become easy to resolve.
> > > 
> > > The basic idea here is to use known relationships between two operands
> > > of a SUB_OVERFLOW IFN to statically compute the overflow state and
> > > ultimately allow turning the IFN into simple arithmetic (or for the
> > > tests in this BZ elide the arithmetic entirely).
> > > 
> > > The regression example is when the two inputs are known equal.  In that
> > > case the subtraction will never overflow.  But there are a few other
> > > cases we can handle as well.
> > > 
> > > a == b -> never overflows
> > > a > b  -> never overflows when A and B are unsigned
> > > a >= b -> never overflows when A and B are unsigned
> > > a < b  -> always overflows when A and B are unsigned
> > 
> > Is that really the case?
> > I mean, .SUB_OVERFLOW etc. can have 3 arbitrary types, the type into which
> > we are trying to write the result and the 2 types of arguments.
> So the types are allowed to vary within a statement (I guess as an IFN
> they can, unlike most gimple operations).

The overflow builtins started from clang's
__builtin_{s,u}{add,sub,mul}{,l,ll}_overflow
builtins (those were added for compatibility), all of those have the 3 types
identical; but the __builtin_{add,sub,mul}_overflow{,_p} builtins allow 3
arbitrary types, whether something overflows or not is determined by
performing the operation in virtually infinite precision and then seeing if
it is representable in the target type.

I'm fine if your patch for GCC 15 is limited to the easy cases with all 3
types compatible (that is the common case).  Still, I think it would be nice
not to restrict this to TYPE_UNSIGNED, but instead to check that for a >= b
or a > b, b is not negative (using vr1min).

And let's file a PR for GCC 16 to do it properly.

Jakub



[PATCH 0/2] v2 Add prime path coverage to gcc/gcov

2025-02-12 Thread Jørgen Kvalsvik
I have applied fixes for everything in the last review, plus some GNU
style fixes that I had missed previously. We have tested and used a
build with this applied for 3-4 months now and haven't run into any
issues.

Jørgen Kvalsvik (2):
  gcov: branch, conds, calls in function summaries
  Add prime path coverage to gcc/gcov

 gcc/Makefile.in|6 +-
 gcc/builtins.cc|2 +-
 gcc/collect2.cc|6 +-
 gcc/common.opt |   16 +
 gcc/doc/gcov.texi  |  187 +++
 gcc/doc/invoke.texi|   36 +
 gcc/gcc.cc |4 +-
 gcc/gcov-counter.def   |3 +
 gcc/gcov-io.h  |3 +
 gcc/gcov.cc|  535 +-
 gcc/ipa-inline.cc  |2 +-
 gcc/passes.cc  |4 +-
 gcc/path-coverage.cc   |  776 +
 gcc/prime-paths.cc | 2052 
 gcc/profile.cc |6 +-
 gcc/selftest-run-tests.cc  |1 +
 gcc/selftest.h |1 +
 gcc/testsuite/g++.dg/gcov/gcov-22.C|  170 ++
 gcc/testsuite/g++.dg/gcov/gcov-23-1.h  |9 +
 gcc/testsuite/g++.dg/gcov/gcov-23-2.h  |9 +
 gcc/testsuite/g++.dg/gcov/gcov-23.C|   30 +
 gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
 gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
 gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
 gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
 gcc/testsuite/gcc.misc-tests/gcov-33.c |   27 +
 gcc/testsuite/gcc.misc-tests/gcov-34.c |   29 +
 gcc/testsuite/lib/gcov.exp |  118 +-
 gcc/tree-profile.cc|   11 +-
 29 files changed, 5818 insertions(+), 22 deletions(-)
 create mode 100644 gcc/path-coverage.cc
 create mode 100644 gcc/prime-paths.cc
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-1.h
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-2.h
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23.C
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-33.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-34.c

-- 
2.39.5



Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Jeff Law




On 2/12/25 8:18 AM, Jakub Jelinek wrote:

On Tue, Feb 11, 2025 at 05:20:49PM -0700, Jeff Law wrote:

So this is a fairly old regression, but with all the ranger work that's been
done, it's become easy to resolve.

The basic idea here is to use known relationships between two operands of a
SUB_OVERFLOW IFN to statically compute the overflow state and ultimately
allow turning the IFN into simple arithmetic (or for the tests in this BZ
elide the arithmetic entirely).

The regression example is when the two inputs are known equal.  In that case
the subtraction will never overflow.  But there are a few other cases we can
handle as well.

a == b -> never overflows
a > b  -> never overflows when A and B are unsigned
a >= b -> never overflows when A and B are unsigned
a < b  -> always overflows when A and B are unsigned


Is that really the case?
I mean, .SUB_OVERFLOW etc. can have 3 arbitrary types, the type into which
we are trying to write the result and the 2 types of arguments.
So if the types are allowed to vary within a statement (I guess as an 
IFN they can unlike most gimple operations).


In which case we'd need to tighten the checks.  The simplest would be to 
ensure that both arguments have the same/equivalent types.  We could go 
farther than that, though the risk/benefit of doing more may not be great.


I'll digest the rest a bit later, but clearly pushing pause while I do so...

jeff


[PATCH 1/2] gcov: branch, conds, calls in function summaries

2025-02-12 Thread Jørgen Kvalsvik
The gcov function summaries only output the covered lines, not the
branches and calls. Since the function summaries is an opt-in it
probably makes sense to also include branch coverage, calls, and
condition coverage.

$ gcc --coverage -fpath-coverage hello.c -o hello
$ ./hello

Before:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 4

Function 'fn'
Lines executed:100.00% of 7

File 'hello.c'
Lines executed:100.00% of 11
Creating 'hello.c.gcov'

After:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

With conditions:
$ gcov -fg hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1
No conditions

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
Condition outcomes covered:100.00% of 8
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

gcc/ChangeLog:

* gcov.cc (generate_results): Count branches, conditions.
(function_summary): Output branch, calls, condition count.
---
 gcc/gcov.cc | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 89f3c536bfb..026f130db87 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -1687,11 +1687,19 @@ generate_results (const char *file_name)
   memset (&coverage, 0, sizeof (coverage));
   coverage.name = fn->get_name ();
   add_line_counts (flag_function_summary ? &coverage : NULL, fn);
-  if (flag_function_summary)
-   {
- function_summary (&coverage);
- fnotice (stdout, "\n");
-   }
+
+  if (!flag_function_summary)
+   continue;
+
+  for (const block_info& block : fn->blocks)
+   for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+ add_branch_counts (&coverage, arc);
+
+  for (const block_info& block : fn->blocks)
+   add_condition_counts (&coverage, &block);
+
+  function_summary (&coverage);
+  fnotice (stdout, "\n");
 }
 
   name_map needle;
@@ -2764,6 +2772,36 @@ function_summary (const coverage_info *coverage)
 {
   fnotice (stdout, "%s '%s'\n", "Function", coverage->name);
   executed_summary (coverage->lines, coverage->lines_executed);
+
+  if (coverage->branches)
+{
+  fnotice (stdout, "Branches executed:%s of %d\n",
+  format_gcov (coverage->branches_executed, coverage->branches, 2),
+  coverage->branches);
+  fnotice (stdout, "Taken at least once:%s of %d\n",
+  format_gcov (coverage->branches_taken, coverage->branches, 2),
+   coverage->branches);
+}
+  else
+fnotice (stdout, "No branches\n");
+
+  if (coverage->calls)
+fnotice (stdout, "Calls executed:%s of %d\n",
+format_gcov (coverage->calls_executed, coverage->calls, 2),
+coverage->calls);
+  else
+fnotice (stdout, "No calls\n");
+
+  if (flag_conditions)
+{
+  if (coverage->conditions)
+   fnotice (stdout, "Condition outcomes covered:%s of %d\n",
+format_gcov (coverage->conditions_covered,
+ coverage->conditions, 2),
+coverage->conditions);
+  else
+   fnotice (stdout, "No conditions\n");
+}
 }
 
 /* Output summary info for a file.  */
-- 
2.39.5



Re: rs6000: Add -msplit-patch-nops (PR112980)

2025-02-12 Thread Martin Jambor
Hello,

On Fri, Jan 10 2025, Martin Jambor wrote:
> Hello,
>
> On Wed, Dec 11 2024, Martin Jambor wrote:
>> Hello,
>>
>> even though it is not my work, I would like to ping this patch.  Having
>> it upstream would really help us a lot.
>>
>
> Please, pretty please, consider reviewing this in time for GCC 15,
> having it upstream would really help us a lot and from what I can tell,
> it should no longer be controversial.

Even though we are already in stage4, I believe the risk of accepting
the patch is tiny compared to the benefits it adds.  Can I therefore
still ping it, please?

Thanks,

Martin


>
> Thank you very much in advance,
>
> Martin
>
>>
>> On Wed, Nov 13 2024, Michael Matz wrote:
>>> Hello,
>>>
>>> this is essentially 
>>>
>>>   https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html
>>>
>>> from Kewen in functionality.  When discussing this with Segher at the 
>>> Cauldron he expressed reservations about changing the default 
>>> implementation of -fpatchable-function-entry.  So, to move forward, let's 
>>> move it under a new target option -msplit-patch-nops (expressing the 
>>> important deviation from the default behaviour, namely that all the 
>>> patching nops form a consecutive sequence normally).
>>>
>>> Regstrapping on power9 ppc64le in progress.  Okay if that passes?
>>>
>>>
>>> Ciao,
>>> Michael.
>>>
>>> ---
>>>
>>> as the bug report details some uses of -fpatchable-function-entry
>>> aren't happy with the "before" NOPs being inserted between global and
>>> local entry point on powerpc.  We want the before NOPs be in front
>>> of the global entry point.  That means that the patching NOPs aren't
>>> consecutive for dual entry point functions, but for these usecases
>>> that's not the problem.  But let us support both under the control
>>> of a new target option: -msplit-patch-nops.
>>>
>>> gcc/
>>>
>>> PR target/112980
>>> * config/rs6000/rs6000.opt (msplit-patch-nops): New option.
>>> * doc/invoke.texi (RS/6000 and PowerPC Options): Document it.
>>> * config/rs6000/rs6000.h (machine_function.stop_patch_area_print):
>>> New member.
>>> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>>> Emit split nops under control of that one.
>>> * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>>> Add handling of split patch nops.
>>> ---
>>>  gcc/config/rs6000/rs6000-logue.cc | 15 +--
>>>  gcc/config/rs6000/rs6000.cc   | 27 +++
>>>  gcc/config/rs6000/rs6000.h|  6 ++
>>>  gcc/config/rs6000/rs6000.opt  |  4 
>>>  gcc/doc/invoke.texi   | 17 +++--
>>>  5 files changed, 57 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
>>> b/gcc/config/rs6000/rs6000-logue.cc
>>> index c87058b435e..aa1e0442f2b 100644
>>> --- a/gcc/config/rs6000/rs6000-logue.cc
>>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>>> @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file)
>>>  
>>>unsigned short patch_area_size = crtl->patch_area_size;
>>>unsigned short patch_area_entry = crtl->patch_area_entry;
>>> -  /* Need to emit the patching area.  */
>>> -  if (patch_area_size > 0)
>>> +  /* Emit non-split patching area now.  */
>>> +  if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0)
>>> {
>>>   cfun->machine->global_entry_emitted = true;
>>>   /* As ELFv2 ABI shows, the allowable bytes between the global
>>> @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file)
>>>patch_area_entry);
>>>   rs6000_print_patchable_function_entry (file, patch_area_entry,
>>>  true);
>>> - patch_area_size -= patch_area_entry;
>>> }
>>> }
>>>  
>>> @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file)
>>>assemble_name (file, name);
>>>fputs ("\n", file);
>>>/* Emit the nops after local entry.  */
>>> -  if (patch_area_size > 0)
>>> -   rs6000_print_patchable_function_entry (file, patch_area_size,
>>> -  patch_area_entry == 0);
>>> +  if (patch_area_size > patch_area_entry)
>>> +   {
>>> + patch_area_size -= patch_area_entry;
>>> + cfun->machine->stop_patch_area_print = false;
>>> + rs6000_print_patchable_function_entry (file, patch_area_size,
>>> +patch_area_entry == 0);
>>> +   }
>>>  }
>>>  
>>>else if (rs6000_pcrel_p ())
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 950fd947fda..6427e6913ba 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file,
>>>  {
>>>bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p 
>>> ();
>>>/* For a func

Re: [PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]

2025-02-12 Thread Alex Coplan
On 05/02/2025 08:05, Tamar Christina wrote:
> 
> 
> > -Original Message-
> > From: Jan Hubicka 
> > Sent: Tuesday, February 4, 2025 4:25 PM
> > To: Alex Coplan 
> > Cc: gcc-patches@gcc.gnu.org; Richard Biener ; Tamar
> > Christina 
> > Subject: Re: [PATCH 1/4] vect: Set counts of early break exit blocks 
> > correctly
> > [PR117790]
> >
> > > This adds missing code to correctly set the counts of the exit blocks we
> > > create when building the CFG for a vectorized early break loop.
> > >
> > > Tested as a series on aarch64-linux-gnu, arm-linux-gnueabihf, and
> > > x86_64-linux-gnu.  OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > > PR tree-optimization/117790
> > > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > > Set profile counts for {main,alt}_loop_exit_block.
> > > ---
> > >  gcc/tree-vect-loop-manip.cc | 10 ++
> > >  1 file changed, 10 insertions(+)
> > >
> >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index 5d1b70aea43..53d36eaa25f 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1686,6 +1686,16 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop
> > *loop, edge loop_exit,
> > >
> > >   set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> > >loop->header);
> > > +
> > > + /* Fix up the profile counts of the new exit blocks.
> > > +main_loop_exit_block was created by duplicating the
> > > +preheader, so needs its count scaling according to the main
> > > +exit edge's probability.  The remaining count from the
> > > +preheader goes to the alt_loop_exit_block, since all
> > > +alternative exits have been redirected there.  */
> > > + main_loop_exit_block->count = loop_exit->count ();
> > > + alt_loop_exit_block->count
> > > +   = preheader->count - main_loop_exit_block->count;
> >
> > Reading the code, we orignaly have new_preheader that is split into
> > several pieces and loop exits are redirected to them and exit edges of
> > pieces are redirected to the last part?
> >
> 
> Indeed, All alternate exits are redirected to the same block, as they share
> the same induction values, that block and the main exit are then directed to
> a join block which is created just above the guard block for the epilogue.
> 
> 
> > In that case patch is OK.

Thanks, now pushed to trunk (after rebasing and re-testing) as
g:cfdb961588ba318a78e995d2e2cde43130acd993.

Alex

> 
> Thanks!
> Tamar
> 
> > Honza


Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Richard Sandiford
Uros Bizjak  writes:
> On Wed, Feb 12, 2025 at 4:16 PM Richard Sandiford
>  wrote:
>>
>> Uros Bizjak  writes:
>> > The combine pass is trying to combine:
>> >
>> > Trying 16, 22, 21 -> 23:
>> >16: r104:QI=flags:CCNO>0
>> >22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
>> >   REG_UNUSED flags:CC
>> >21: r119:QI=flags:CCNO<=0
>> >   REG_DEAD flags:CCNO
>>
>> It looks like something has already gone wrong in this sequence,
>> in that insn 21 is using the flags after the register has been clobbered.
>> If the flags result of insn 22 is useful, the insn should be setting the
>> flags using a parallel of two sets.
>
> Please note that the insn sequence before combine pass is correct:
>
>16: r104:QI=flags:CCNO>0
>21: r119:QI=flags:CCNO<=0
>   REG_DEAD flags:CCNO
>22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
>   REG_UNUSED flags:CC
>23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
>   REG_DEAD r120:QI
>   REG_DEAD r119:QI
>   REG_UNUSED flags:CC

Ah, ok!  Sorry for the noise.  I hadn't realised that combine sorted
the instructions into program order only after printing them:

  if (dump_file && (dump_flags & TDF_DETAILS))
{
  if (i0)
fprintf (dump_file, "\nTrying %d, %d, %d -> %d:\n",
 INSN_UID (i0), INSN_UID (i1), INSN_UID (i2), INSN_UID (i3));
  else if (i1)
fprintf (dump_file, "\nTrying %d, %d -> %d:\n",
 INSN_UID (i1), INSN_UID (i2), INSN_UID (i3));
  else
fprintf (dump_file, "\nTrying %d -> %d:\n",
 INSN_UID (i2), INSN_UID (i3));

  if (i0)
dump_insn_slim (dump_file, i0);
  if (i1)
dump_insn_slim (dump_file, i1);
  dump_insn_slim (dump_file, i2);
  dump_insn_slim (dump_file, i3);
}

  /* If multiple insns feed into one of I2 or I3, they can be in any
 order.  To simplify the code below, reorder them in sequence.  */
  if (i0 && DF_INSN_LUID (i0) > DF_INSN_LUID (i2))
std::swap (i0, i2);
  if (i0 && DF_INSN_LUID (i0) > DF_INSN_LUID (i1))
std::swap (i0, i1);
  if (i1 && DF_INSN_LUID (i1) > DF_INSN_LUID (i2))
std::swap (i1, i2);

Feels like it would be more useful to sort them first.

So yeah, please ignore my comment above.

Thanks,
Richard


[PATCH] gcc: testsuite: Fix builtin-speculation-overloads[14].C testism

2025-02-12 Thread mmalcomson
From: Matthew Malcomson 

I've posted the patch on the relevant Bugzilla, but also sending to
mailing list.  If should have only done one please do mention.

- 8< --- >8 
When making warnings trigger a failure in template substitution I
could not find any way to trigger the warning about builtin speculation
not being available on the given target.

Turns out I misread the code -- this warning happens when the
speculation_barrier pattern is not defined.

Here we add an effective target to represent
"__builtin_speculation_safe_value is available on this target" and use
that to adjust our test on SFINAE behaviour accordingly.
N.b. this means that we get extra testing -- not just that things work
on targets which support __builtin_speculation_safe_value, but also that
the behaviour works on targets which don't support it.

Tested with AArch64 native, AArch64 cross compiler, and RISC-V cross
compiler (just running the tests that I've changed).

Points of interest for any reviewer:

In the new `check_known_compiler_messages_nocache` procedure I use some
procedures from `prune.exp`.  This works for the use I need in
the g++ testsuite since g++.exp imports prune.exp and g++-dg.exp
includes gcc-dg.exp which does the initialisation of prune_notes
(needed for this procedure).
- Would it be preferred to include a `load_lib prune.exp` statement at
  the top of `target-supports.exp` in order to use this procedure?
- What about the handling of `initialize_prune_notes` which must have
  been called before calling `prune_gcc_output`?
- I believe it's sensible to not use `gcc-dg-prune` which wraps
  `prune_gcc_output` since I don't believe the wrapping functionality
  useful here -- wanted to highlight that decision for review.

Ok for trunk?

gcc/testsuite/ChangeLog:

PR/117991
* g++.dg/template/builtin-speculation-overloads.def: SUCCESS
argument in SPECULATION_ASSERTS now uses a macro `true_def`
instead of the literal `true` for arguments which should work
with `__builtin_speculation_safe_value`.
* g++.dg/template/builtin-speculation-overloads1.C: Define
`true_def` macro on command line to compiler according to the
effective target representing that
`__builtin_speculation_safe_value` does something on this
target.
* g++.dg/template/builtin-speculation-overloads4.C: Likewise.
* lib/target-supports.exp
(check_known_compiler_messages_nocache): New.
(check_effective_target_speculation_barrier_defined): New.

Signed-off-by: Matthew Malcomson 
---
 .../builtin-speculation-overloads.def |  9 ++-
 .../template/builtin-speculation-overloads1.C |  2 +
 .../template/builtin-speculation-overloads4.C |  2 +
 gcc/testsuite/lib/target-supports.exp | 62 +++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def 
b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def
index 39d9b748d52..ada13e6f77c 100644
--- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def
+++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads.def
@@ -15,14 +15,17 @@ class X{};
 class Large { public: int arr[10]; };
 class Incomplete;
 
+/* Using `true_def` in order to account for the fact that if this target
+ * doesn't support __builtin_speculation_safe_value at all everything fails to
+ * substitute.  */
 #define SPECULATION_ASSERTS
\
-  MAKE_SPECULATION_ASSERT (int, true)  
\
+  MAKE_SPECULATION_ASSERT (int, true_def)  
\
   MAKE_SPECULATION_ASSERT (float, false)   
\
   MAKE_SPECULATION_ASSERT (X, false)   
\
   MAKE_SPECULATION_ASSERT (Large, false)   
\
   MAKE_SPECULATION_ASSERT (Incomplete, false)  
\
-  MAKE_SPECULATION_ASSERT (int *, true)
\
-  MAKE_SPECULATION_ASSERT (long, true)
+  MAKE_SPECULATION_ASSERT (int *, true_def)
\
+  MAKE_SPECULATION_ASSERT (long, true_def)
 
 int main() {
 SPECULATION_ASSERTS
diff --git a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C 
b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C
index bc8f1083a99..4c50d4aa6f5 100644
--- a/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C
+++ b/gcc/testsuite/g++.dg/template/builtin-speculation-overloads1.C
@@ -1,5 +1,7 @@
 /* Check that overloaded builtins can be used in templates with SFINAE.  */
 // { dg-do compile { target c++17 } }
+// { dg-additional-options "-Dtrue_def=true" { target 
speculation_barrier_defined } }
+// { dg-additional-options "-Dtrue_def=false" { target { ! 
speculation_barrier_defined } }

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Jeff Law




On 2/12/25 6:44 PM, 钟居哲 wrote:
VSETVL PASS is supposed to insert "vsetvli" into optimal place so 
"vsetvli" inserted by VSETVL PASS shouldn't be involved in instruction 
scheduling.
vsetvl pass inserts based on needs of vector instructions.  The 
scheduler will move code to try and minimize the critical path length. 
That includes potentially hoisting any insn into different control 
blocks if doing so has the same semantics, which is what's happening 
here.  The hoisting, AFAICT doesn't change semantics.



jeff



Re: [PATCH] loop-invariant: Treat inline-asm conditional trapping [PR102150]

2025-02-12 Thread Andrew Pinski
On Wed, Feb 12, 2025 at 1:00 AM Richard Biener
 wrote:
>
> On Wed, Feb 12, 2025 at 9:41 AM Andrew Pinski  
> wrote:
> >
> > So inline-asm is known not to trap BUT it can have undefined behavior
> > if made executed speculatively. This fixes the loop invariant pass to
> > treat it similarly as trapping cases. If the inline-asm could be executed
> > always, then it will be pulled out of the loop; otherwise it will be kept
> > inside the loop.
> >
> > Bootstrapped and tested on x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * loop-invariant.cc (find_invariant_insn): Treat inline-asm similar 
> > to
> > trapping instruction and only move them if always executed.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/loop-invariant.cc | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/gcc/loop-invariant.cc b/gcc/loop-invariant.cc
> > index bcb52bb9c76..79a4c39dfb0 100644
> > --- a/gcc/loop-invariant.cc
> > +++ b/gcc/loop-invariant.cc
> > @@ -1123,6 +1123,11 @@ find_invariant_insn (rtx_insn *insn, bool 
> > always_reached, bool always_executed)
> >if (may_trap_or_fault_p (PATTERN (insn)) && !always_reached)
> >  return;
> >
> > +  /* inline-asm that is not always executed cannot be moved
> > + as it might trap. */
>
> as it might conditionally trap?

Yes and I changed the comment to be:
  /* inline-asm that is not always executed cannot be moved
 as it might conditionally trap. */

And pushed with that change.

Thanks,
Andrew Pinski

>
> OK.
>
> Thanks,
> Richard.
>
> > +  if (!always_reached && asm_noperands (PATTERN (insn)) >= 0)
> > +return;
> > +
> >depends_on = BITMAP_ALLOC (NULL);
> >if (!check_dependencies (insn, depends_on))
> >  {
> > --
> > 2.43.0
> >


Re: [PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Xi Ruoyao
On Thu, 2025-02-13 at 09:24 +0800, Lulu Cheng wrote:
> 
> 在 2025/2/12 下午6:19, Xi Ruoyao 写道:
> > On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote:
> > 
> > /* snip */
> > 
> > > diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c 
> > > b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
> > > new file mode 100644
> > > index 000..a682ae4a356
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
> > > @@ -0,0 +1,55 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-march=loongarch64" } */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#ifndef __loongarch_arch
> > > +#error __loongarch_arch should not be available here
> > > +#endif
> > Hmm this seems not correct?  __loongarch_arch should be just
> > "loongarch64" here (at least it is "loongarch64" with GCC <= 14).
> 
> Well, because this predefined macro must be defined in any case, I added 
> this check here.
> 
> But it seems a bit redundant. I will delete it in v3.

Oh, no need to delete it.  Just change "should not" to "should" :).

I misunderstand the check due to this typo.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] tree-optimization/86270 - improve SSA coalescing for loop exit test

2025-02-12 Thread Andrew Pinski
On Wed, Feb 12, 2025 at 4:04 AM Richard Biener  wrote:
>
> The PR indicates a very specific issue with regard to SSA coalescing
> failures because there's a pre IV increment loop exit test.  While
> IVOPTs created the desired IL we later simplify the exit test into
> the undesirable form again.  The following fixes this up during RTL
> expansion where we try to improve coalescing of IVs.  That seems
> easier that trying to avoid the simplification with some weird
> heuristics (it could also have been written this way).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?
>
> Thanks,
> Richard.
>
> PR tree-optimization/86270
> * tree-outof-ssa.cc (insert_backedge_copies): Pattern
> match a single conflict in a loop condition and adjust
> that avoiding the conflict if possible.
>
> * gcc.target/i386/pr86270.c: Adjust to check for no reg-reg
> copies as well.
> ---
>  gcc/testsuite/gcc.target/i386/pr86270.c |  3 ++
>  gcc/tree-outof-ssa.cc   | 49 ++---
>  2 files changed, 47 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr86270.c 
> b/gcc/testsuite/gcc.target/i386/pr86270.c
> index 68562446fa4..89b9aeb317a 100644
> --- a/gcc/testsuite/gcc.target/i386/pr86270.c
> +++ b/gcc/testsuite/gcc.target/i386/pr86270.c
> @@ -13,3 +13,6 @@ test ()
>
>  /* Check we do not split the backedge but keep nice loop form.  */
>  /* { dg-final { scan-assembler-times "L\[0-9\]+:" 2 } } */
> +/* Check we do not end up with reg-reg moves from a pre-increment IV
> +   exit test.  */
> +/* { dg-final { scan-assembler-not "mov\[lq\]\?\t%\?\[er\].x, %\?\[er\].x" } 
> } */
> diff --git a/gcc/tree-outof-ssa.cc b/gcc/tree-outof-ssa.cc
> index d340d4ba529..f285c81599e 100644
> --- a/gcc/tree-outof-ssa.cc
> +++ b/gcc/tree-outof-ssa.cc
> @@ -1259,10 +1259,9 @@ insert_backedge_copies (void)
>   if (gimple_nop_p (def)
>   || gimple_code (def) == GIMPLE_PHI)
> continue;
> - tree name = copy_ssa_name (result);
> - gimple *stmt = gimple_build_assign (name, result);
>   imm_use_iterator imm_iter;
>   gimple *use_stmt;
> + auto_vec uses;
>   /* The following matches trivially_conflicts_p.  */
>   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, result)
> {
> @@ -1273,11 +1272,51 @@ insert_backedge_copies (void)
> {
>   use_operand_p use;
>   FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
> -   SET_USE (use, name);
> +   uses.safe_push (use);
> }
> }
> - gimple_stmt_iterator gsi = gsi_for_stmt (def);
> - gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> + /* When there is just a conflicting statement try to
> +adjust that to refer to the new definition.
> +In particular for now handle a conflict with the
> +use in a (exit) condition with a NE compare,
> +replacing a pre-IV-increment compare with a
> +post-IV-increment one.  */
> + if (uses.length () == 1
> + && is_a  (USE_STMT (uses[0]))
> + && gimple_cond_code (USE_STMT (uses[0])) == NE_EXPR
> + && is_gimple_assign (def)
> + && gimple_assign_rhs1 (def) == result
> + && (gimple_assign_rhs_code (def) == PLUS_EXPR
> + || gimple_assign_rhs_code (def) == MINUS_EXPR
> + || gimple_assign_rhs_code (def) == 
> POINTER_PLUS_EXPR)
> + && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
> +   {
> + gcond *cond = as_a  (USE_STMT (uses[0]));
> + tree *adj;
> + if (gimple_cond_lhs (cond) == result)
> +   adj = gimple_cond_rhs_ptr (cond);
> + else
> +   adj = gimple_cond_lhs_ptr (cond);
> + tree name = copy_ssa_name (result);

Should this be `copy_ssa_name (*adj)`? Since the new name is based on
`*adj` rather than based on the result.

Thanks,
Andrew Pinski

> + gimple *stmt
> +   = gimple_build_assign (name,
> +  gimple_assign_rhs_code (def),
> +  *adj, gimple_assign_rhs2 
> (def));
> + gimple_stmt_iterator gsi = gsi_for_stmt (cond);
> + gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> + *adj = name;
> + SET_USE (uses[0], arg);
> + update_stm

Re: [PATCH 1/2] LoongArch: Fix wrong code with _alsl_reversesi_extended

2025-02-12 Thread Lulu Cheng



在 2025/1/24 下午7:44, Richard Sandiford 写道:

Lulu Cheng  writes:

在 2025/1/24 下午3:58, Richard Sandiford 写道:

Lulu Cheng  writes:

在 2025/1/22 上午8:49, Xi Ruoyao 写道:
I have no problem with this patch.
But, I have always been confused about the use of reload_completed.

I can understand that it needs to be true here, but I don't quite
understand the following:

```

(define_insn_and_split "*zero_extendsidi2_internal"
     [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
       (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand"
"r,m,ZC,k")))]
     "TARGET_64BIT"
     "@
      bstrpick.d\t%0,%1,31,0
      ld.wu\t%0,%1
      #
      ldx.wu\t%0,%1"
     "&& reload_completed
      && MEM_P (operands[1])
      && (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0),
SImode)
      && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0),
SImode))
      && !paradoxical_subreg_p (operands[0])"
     [(set (match_dup 3) (match_dup 1))
      (set (match_dup 0)
       (ior:DI (zero_extend:DI
     (subreg:SI (match_dup 0) 0))
       (match_dup 2)))]
     {
       operands[1] = gen_lowpart (SImode, operands[1]);
       operands[3] = gen_lowpart (SImode, operands[0]);
       operands[2] = const0_rtx;
     }
     [(set_attr "move_type" "arith,load,load,load")
      (set_attr "mode" "DI")])
```

What is the role of reload_complete here?

Yeah, I agree it looks odd.  In particular, operands[0] should never be
a subreg after RA, so the paradoxical_subreg_p test shouldn't be needed.
And the hard-coded (subreg:SI ... 0) in the expansion pattern doesn't
seem correct for hard registers -- it should be folded down to a single
(reg:SI ...) instead, as for operands[3].

Thanks,
Richard

Now I have a very vague idea of when reload_completed needs to be checked

in the split stage and when it does not. :-(

Could you please give me some guidance?

Two of the main uses of reload_completed in splits that I know of are:

(1) Splitting an instruction that has multiple alternatives, in cases
 where the choice between splitting and not splitting depends on
 the register allocation.  An aarch64 example of this is:

 (define_insn_and_split "aarch64_simd_mov_from_low"
   [(set (match_operand: 0 "register_operand")
 (vec_select:
   (match_operand:VQMOV_NO2E 1 "register_operand")
   (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half")))]
   "TARGET_FLOAT"
   {@ [ cons: =0 , 1 ; attrs: type   , arch  ]
  [ w, w ; mov_reg   , simd  ] #
  [ ?r   , w ; neon_to_gp , base_simd ] umov\t%0, %1.d[0]
  [ ?r   , w ; f_mrc , * ] fmov\t%0, %d1
   }
   "&& reload_completed && aarch64_simd_register (operands[0], mode)"
   [(set (match_dup 0) (match_dup 1))]
   {
 operands[1] = aarch64_replace_reg_mode (operands[1], mode);
   }
   [(set_attr "length" "4")]
 )

 Here, we want to split the first alternative (the one where the
 destination is a SIMD register), but we don't know until after RA
 whether the destination is a SIMD register.

(2) Splitting an instruction that the RA finds easier to allocate when
 unsplit.  A common instance of this is multiword moves.  An aarch64
 example is:

 (define_split
   [(set (match_operand:VSTRUCT_2QD 0 "register_operand")
 (match_operand:VSTRUCT_2QD 1 "register_operand"))]
   "TARGET_FLOAT && reload_completed"
   [(const_int 0)]
 {
   aarch64_simd_emit_reg_reg_move (operands, mode, 2);
   DONE;
 })

 In particular, the unsplit form allows input and output registers to
 overlap.  The RA would not allow overlap if the instructions were
 split before RA (since the RA doesn't track the liveness of individual
 SIMD registers in multi-register tuples).

 This might become less of an issue in future, if the RA does become
 able to track the liveness of individual registers in a multi-register
 value.

There'll be other uses too, though.

Richard


I have modified the `zero_extendsidi_internal` template.

I think this writing conforms to the description of subreg in gccint.pdf.:-)


Thanks!


@@ -1766,18 +1766,13 @@ (define_insn_and_split "*zero_extendsidi2_internal"
   ldx.wu\t%0,%1"
  "&& reload_completed
   && MEM_P (operands[1])
-   && (loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0), SImode)
-   && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0), SImode))
-   && !paradoxical_subreg_p (operands[0])"
+   && loongarch_14bit_shifted_offset_address_p (XEXP (operands[1], 0), SImode)
+   && !loongarch_12bit_offset_address_p (XEXP (operands[1], 0), SImode)"
  [(set (match_dup 3) (match_dup 1))
   (set (match_dup 0)
-   (ior:DI (zero_extend:DI
- (subreg:SI (match_dup 0) 0))
-   (match_dup 2)))]
+   (zero_extend:DI (match_dup 3)))]

Re: Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread 钟居哲
>> vsetvl pass inserts based on needs of vector instructions.
Yes. Vector instructions should go through the scheduler, which can minimize
"vsetvli" insertion in the VSETVL pass.
What I said is that we should not involve the "vsetvli" instruction itself (as
opposed to vector instructions like vadd.vv) in scheduling.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2025-02-13 11:33
To: 钟居哲; ewlu; gcc-patches
CC: gnu-toolchain; vineetg
Subject: Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
 
 
On 2/12/25 6:44 PM, 钟居哲 wrote:
> VSETVL PASS is supposed to insert "vsetvli" into optimal place so 
> "vsetvli" inserted by VSETVL PASS shouldn't involved into instruction 
> scheduling.
vsetvl pass inserts based on needs of vector instructions.  The 
scheduler will move code to try and minimize the critical path length. 
That includes potentially hoisting any insn into different control 
blocks if doing so has the same semantics, which is what's happening 
here.  The hoisting, AFAICT doesn't change semantics.
 
 
jeff
 
 


Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Palmer Dabbelt

On Wed, 12 Feb 2025 16:47:07 PST (-0800), jeffreya...@gmail.com wrote:



On 2/12/25 4:27 PM, Edwin Lu wrote:

The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies. This resulted in a situation where the following occurs

 vsetvli a5,a1,e32,m1,tu,ma
 vle32.v v2,0(a0)
 sub a1,a1,a5 <-- a1 potentially set to 0
 sh2add  a0,a5,a0
 vfmacc.vv   v1,v2,v2
 vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
 beq a1,zero,.L12 <-- check if avl is 0

This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.

PR/117974

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.

Correct me if I'm wrong, there's not anything inherently wrong with the
speculation from a correctness standpoint.  This is "just" a performance
issue, right?


Hopefully, we didn't go into this looking for functional bugs.

We were seeing some odd behavior from the scheduler: it only moves the 
first vsetvli before the branch, the second crosses the branch while 
merging.  That tripped up a "maybe there's a functional bug lurking 
here", but thinking about it again it seems more likely we're just 
missing an opportunity to schedule that's getting made irrelevant by the 
vsetvli elimination pass.



And from a performance standpoint speculation of the vsetvl could vary
pretty wildly based on uarch characteristics.   I can easily see cases
where it it wildly bad, wildly good and don't really care.

Point being it seems like it should be controlled by a uarch setting
rather than always avoiding or always enabling.


Ya, very much seems like a uarch thing.

We kind of just threw this one together as we're still experimenting 
with this.  The goal was to avoid the VL=0 cases, but I think that's 
even just sort of a side effect of avoiding speculative scheduling here.  
So I think we need to go benchmark this before we can really get a feel 
for what's going on, as it might not be direct enough to catch the 
interesting cases.



Other thoughts?


The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff 
we can't/don't model in the pipeline, but I have no idea how to model 
the VL=0 case there.
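
For reference, the hook under discussion would look roughly like the
pseudocode-level sketch below.  This is a hypothetical sketch, not the
submitted patch; the attribute names (TYPE_VSETVL, TYPE_VSETVL_PRE) and the
exact hook wiring are assumptions.

```
/* Hypothetical sketch of a TARGET_SCHED_CAN_SPECULATE_INSN
   implementation for RISC-V.  Returning false tells the scheduler it
   may not speculatively hoist INSN above a branch.  */
static bool
riscv_sched_can_speculate_insn (rtx_insn *insn)
{
  /* vsetvl updates VL/VTYPE from an AVL register that may be redefined
     in the block being branched over, so never speculate it.  */
  switch (get_attr_type (insn))
    {
    case TYPE_VSETVL:
    case TYPE_VSETVL_PRE:
      return false;
    default:
      return true;
    }
}

#undef TARGET_SCHED_CAN_SPECULATE_INSN
#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn
```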


Re: [PATCH 0/1] gdc: define ELFv1, ELFv2 and D_PPCUseIEEE128 versions for powerpc

2025-02-12 Thread liushuyu

Hi,


Excerpts from liushuyu's message of Februar 6, 2025 2:02 am:

From: Zixing Liu 

This set of patches will add proper IEEE 128 quad precision marking to GDC
compiler, so that it works with the new changes in D standard library
where POWER system can use any math functions inside the standard library
that requires the "real" type.

The patch also adds the ELFv1 and ELFv2 version identifiers to bridge
the gap between LLVM D Compiler (LDC) and GNU D Compiler (GDC) so that
the user can reliably use the "version(...)" syntax to check for which
ABI is currently in use.

Thanks. I wonder if something could be done to predefine ELFv1 for other
targets too. Unconditionally calling add_builtin_version in glibc-d.cc,
freebsd-d.cc, ..., doesn't seem like the best thing to do, but I'm open
for suggestions.


As far as I understand, ELFv1 and ELFv2 concepts are limited to PowerPC 
platforms due to the need for compatibility.


No other platform (supported by D) has such a need for defining ELFv1 
and ELFv2 version identifiers.



+
+  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
+d_add_builtin_version ("D_PPCUseIEEE128");

It says in the spec that all version identifiers derived from any
predefined versions by appending any character(s) are reserved.

So there's no need for the `D_` prefix, `PPC_UseIEEE128` will suffice.


In the upstream druntime change
(https://github.com/dlang/dmd/pull/20826), I have changed the identifier
to `PPCUseIEEE128`, and instead of letting compilers set it, I have
changed the druntime so that the compiler does not need to explicitly set
this version identifier.


+
+// { dg-final { scan-assembler "test_version" } }
+extern (C) bool test_version() {
+// { dg-final { scan-assembler "li 3,1" } }
+version (D_PPCUseIEEE128) return true;
+else return false;
+}
+
+// { dg-final { scan-assembler "test_elf_version" } }
+extern (C) bool test_elf_version() {
+// { dg-final { scan-assembler "li 3,1" } }
+version (ELFv2) return true;
+else return false;
+}

These two tests should return a different value, otherwise you could
just end up matching the same function return twice.


I will address this comment in a newer revision of the patch.

A newer revision of the patch will be submitted after the D language 
upstream merges the druntime changes.



Kind regards,
Iain.


Thanks,

Zixing



Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Jeff Law




On 2/12/25 9:23 PM, Palmer Dabbelt wrote:



PR/117974

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.

Correct me if I'm wrong, there's not anything inherently wrong with the
speculation from a correctness standpoint.  This is "just" a performance
issue, right?


Hopefully, we didn't go into this looking for functional bugs.
Right.  The usual looking for one thing and finding something odd along 
the way...





We kind of just threw this one together as we're still experimenting 
with this.  The goal was to avoid the VL=0 cases, but I think that's 
even just sort of a side effect of avoiding speculative scheduling here. 
So I think we need to go benchmark this before we can really get a feel 
for what's going on, as it might not be direct enough to catch the 
interesting cases.
Yea.  And it's that vl=0 case that falls into the wildly bad bucket of 
scenarios.





Other thoughts?


The docs seem to hint TARGET_SCHED_CAN_SPECULATE_INSN is meant for stuff 
we can't/don't model in the pipeline, but I have no idea how to model 
the VL=0 case there.
Maybe so, but what Edwin is doing looks sensible enough.  It wouldn't be 
the first time a hook got (ab)used in ways that weren't part of the 
original intent.


Jeff


[PATCH, V3] PR target/118541 - Do not generate unordered fp cmoves for IEEE compares on PowerPC

2025-02-12 Thread Michael Meissner
This is version 3 of the patch.

In version 3, I made the following changes:

1:  The new argument to rs6000_reverse_condition that says whether we should
allow ordered floating point compares to be reversed is now an
enumeration instead of a boolean.

2:  I tried to make the code in rs6000_reverse_condition clearer.

3:  I added checks in invert_fpmask_comparison_operator to prevent ordered
floating point compares from being reversed unless -ffast-math.

4:  I split the test cases into 4 separate tests (ordered vs. unordered
compare and -O2 vs. -Ofast).

In bug PR target/118541 on power9, power10, and power11 systems, for the
function:

extern double __ieee754_acos (double);

double
__acospi (double x)
{
  double ret = __ieee754_acos (x) / 3.14;
  return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
}

GCC currently generates the following code:

Power9  Power10 and Power11
==  ===
bl __ieee754_acos   bl __ieee754_acos@notoc
nop plfd 0,.LC0@pcrel
addis 9,2,.LC2@toc@ha   xxspltidp 12,1065353216
addi 1,1,32 addi 1,1,32
lfd 0,.LC2@toc@l(9) ld 0,16(1)
addis 9,2,.LC0@toc@ha   fdiv 0,1,0
ld 0,16(1)  mtlr 0
lfd 12,.LC0@toc@l(9)xscmpgtdp 1,0,12
fdiv 0,1,0  xxsel 1,0,12,1
mtlr 0  blr
xscmpgtdp 1,0,12
xxsel 1,0,12,1
blr

This is because ifcvt.c optimizes the conditional floating point move to use the
XSCMPGTDP instruction.

However, the XSCMPGTDP instruction will generate an interrupt if one of the
arguments is a signalling NaN.  The IEEE comparison functions (isgreater,
etc.) require that the comparison not raise an interrupt.

The following patch changes the PowerPC back end so that ifcvt.c will not
convert the if/then test and move into a conditional move when the comparison
is one of the IEEE comparisons that must not raise an error with signalling
NaNs and -Ofast is not used.  If a normal comparison is used or -Ofast is
used, GCC will continue to generate XSCMPGTDP and XXSEL.

For the following code:

double
ordered_compare (double a, double b, double c, double d)
{
  return __builtin_isgreater (a, b) ? c : d;
}

/* Verify normal > does generate xscmpgtdp.  */

double
normal_compare (double a, double b, double c, double d)
{
  return a > b ? c : d;
}

with the following patch, GCC generates the following for power9, power10, and
power11:

ordered_compare:
fcmpu 0,1,2
fmr 1,4
bnglr 0
fmr 1,3
blr

normal_compare:
xscmpgtdp 1,1,2
xxsel 1,4,3,1
blr

I have built bootstrap compilers on big endian power9 systems and little endian
power9/power10 systems and there were no regressions.  Can I check this patch
into the GCC trunk, and after a waiting period, can I check this into the active
older branches?

2025-02-12  Michael Meissner  

gcc/

PR target/118541
* config/rs6000/predicates.md (invert_fpmask_comparison_operator): Do
not allow UNLT and UNLE unless -ffast-math.
* config/rs6000/rs6000-protos.h (enum reverse_cond_t): New enumeration.
(rs6000_reverse_condition): Add argument.
* config/rs6000/rs6000.cc (rs6000_reverse_condition): Do not allow
ordered comparisons to be reversed for floating point conditional moves,
but allow ordered comparisons to be reversed on jumps.
(rs6000_emit_sCOND): Adjust rs6000_reverse_condition call.
* config/rs6000/rs6000.h (REVERSE_CONDITION): Likewise.
* config/rs6000/rs6000.md (reverse_branch_comparison): Name insn.
Adjust rs6000_reverse_condition calls.

gcc/testsuite/

PR target/118541
* gcc.target/powerpc/pr118541-1.c: New test.
* gcc.target/powerpc/pr118541-2.c: Likewise.
* gcc.target/powerpc/pr118541-3.c: Likewise.
* gcc.target/powerpc/pr118541-4.c: Likewise.
---
 gcc/config/rs6000/predicates.md   | 10 +++-
 gcc/config/rs6000/rs6000-protos.h | 17 ++-
 gcc/config/rs6000/rs6000.cc   | 46 ++-
 gcc/config/rs6000/rs6000.h| 10 +++-
 gcc/config/rs6000/rs6000.md   | 24 ++
 gcc/testsuite/gcc.target/powerpc/pr118541-1.c | 28 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-2.c | 26 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-3.c | 26 +++
 gcc/testsuite/gcc.target/powerpc/pr118541-4.c | 26 +++
 9 files changed, 189

Re: [patch,avr] Add -mno-call-main to tweak running main()

2025-02-12 Thread Georg-Johann Lay

...plus, I updated the documentation: -mno-call-main
asserts that main() does not return.

Johann

index 0aef2abf05b..af41d7b9ad3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -24457,6 +24457,24 @@ Do not save registers in @code{main}.  The 
effect is the same like

 attaching attribute @ref{AVR Function Attributes,,@code{OS_task}}
 to @code{main}. It is activated per default if optimization is on.

+@opindex mno-call-main
+@opindex mcall-main
+@item -mno-call-main
+Don't run @code{main} by means of
+@example
+XCALL  main
+XJMP   exit
+@end example
+Instead, put @code{main} in section
+@w{@uref{https://avrdudes.github.io/avr-libc/avr-libc-user-manual/mem_sections.html#sec_dot_init,@code{.init9}}}
+so that no call is required.
+By setting this option the user asserts that @code{main} will not return.
+
+This option can be used for devices with very limited resources in order
+to save a few bytes of code and stack space.  It will work as expected since
+@w{@uref{https://github.com/avrdudes/avr-libc/issues/1012,AVR-LibC v2.3}}.
+With older versions, there will be no performance gain.
+
 @opindex mno-interrupts
 @item -mno-interrupts
 Generated code is not compatible with hardware interrupts.
@@ -24535,7 +24553,19 @@ Allow to use truncation instead of rounding towards zero for fractional fixed-po


 @opindex nodevicelib
 @item -nodevicelib
-Don't link against AVR-LibC's device specific library
Am 09.02.25 um 11:26 schrieb Georg-Johann Lay:

On devices with very limited resources, it may be desirable to run
main in a more efficient way than provided by the startup code

    XCALL main
    XJMP  exit

from section .init9.  In AVR-LibC v2.3, that code has been moved to
libmcu.a, hence symbol __call_main can be satisfied so that the
respective code is no longer pulled in from that library.
Instead, main can be run by putting it in section .init9.

The patch adds attributes noreturn and section(".init9"), and
sets __call_main=0 when it encounters main().

Ok for trunk?

Johann

--

AVR: target/118806 - Add -mno-call-main to tweak running main().

On devices with very limited resources, it may be desirable to run
main in a more efficient way than provided by the startup code

    XCALL main
    XJMP  exit

from section .init9.  In AVR-LibC v2.3, that code has been moved to
libmcu.a, hence symbol __call_main can be satisfied so that the
respective code is no longer pulled in from that library.
Instead, main can be run by putting it in section .init9.

The patch adds attributes noreturn and section(".init9"), and
sets __call_main=0 when it encounters main().

gcc/
 PR target/118806
 * config/avr/avr.opt (-mcall-main): New option and...
 (avropt_call_main): ...variable.
 * config/avr/avr.cc (avr_no_call_main_p): New variable.
 (avr_insert_attributes) [-mno-call-main, main]: Add attributes
 noreturn and section(".init9") to main.  Set avr_no_call_main_p.
 (avr_file_end) [avr_no_call_main_p]: Define symbol __call_main.
 * doc/invoke.texi (AVR Options) <-mno-call-main>: Document.
 <-mnodevicelib>: Extend explanation.


Ping [PATCH] Record, report basic blocks of conditional exprs

2025-02-12 Thread Jørgen Kvalsvik

Ping.

On 1/31/25 10:35, Jørgen Kvalsvik wrote:

Record basic blocks that make up a conditional expression with
-fcondition-coverage and report when using the gcov -w/--verbose flag.
This makes the report more accurate when basic blocks are included as
there may be blocks in-between the actual Boolean expressions, e.g. when
a term is the result of a function call.  This helps in understanding
the report as gcc uses the CFG, and not source code, to figure out
MC/DC, which is somewhat lost in gcov. While it does not make a
tremendous difference for the gcov report directly, it opens up for more
analysis and clearer reporting.

This change includes deleting the GCOV_TAG_COND_* macros as the .gcno
records are now dynamic in length.

Here is an example with, comparing two programs:

int main() {
   int a = 1;
   int b = 0;

   if (a && b)
 printf ("Success!\n");
   else
 printf ("Failure!\n");
}

int f(int);
int g(int);
int main() {
   int a = 1;
   int b = 0;

   if (f (a) && g (b))
 printf ("Success!\n");
   else
 printf ("Failure!\n");
}

And the corresponding reports:
$ gcov -tagw p1 p2
 1:3:int main() {
 1:4:  int a = 1;
 1:5:  int b = 0;
 -:6:
 1:7:  if (a && b)
 1:7-block 2 (BB 2)
condition outcomes covered 1/4
BBs 2 3
condition  0 not covered (true false)
condition  1 not covered (true)
 1:7-block 3 (BB 3)
 #:8:printf ("Success!\n");
 %:8-block 4 (BB 4)
 -:9:  else
 1:   10:printf ("Failure!\n");
 1:   10-block 5 (BB 5)
 -:   11:}

 #:6:int main() {
 #:7:  int a = 1;
 #:8:  int b = 0;
 -:9:
 #:   10:  if (f (a) && g (b))
 %:   10-block 2 (BB 2)
condition outcomes covered 0/4
BBs 3 5
condition  0 not covered (true false)
condition  1 not covered (true false)
 %:   10-block 4 (BB 4)
 #:   11:printf ("Success!\n");
 %:   11-block 6 (BB 6)
 -:   12:  else
 #:   13:printf ("Failure!\n");
 %:   13-block 7 (BB 7)
 -:   14:}

gcc/ChangeLog:

* doc/gcov.texi: Add example.
 * gcov-dump.cc (tag_conditions): Print basic blocks, not length.
* gcov-io.h (GCOV_TAG_CONDS_LENGTH): Delete.
(GCOV_TAG_CONDS_NUM): Likewise.
* gcov.cc (output_intermediate_json_line): Output basic blocks.
(read_graph_file): Read basic blocks.
(output_conditions): Output basic blocks.
* profile.cc (branch_prob): Write basic blocks for conditions.
---
  gcc/doc/gcov.texi | 32 
  gcc/gcov-dump.cc  | 12 +++-
  gcc/gcov-io.h |  2 --
  gcc/gcov.cc   | 20 +++-
  gcc/profile.cc| 11 +--
  5 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index dda279fbff3..268e9e553f3 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -423,6 +423,7 @@ Each @var{condition} has the following form:
"covered": 2,
"not_covered_false": [],
"not_covered_true": [0, 1],
+  "basic_blocks": [2, 3]
  @}
  
  @end smallexample

@@ -989,6 +990,37 @@ condition  1 not covered (true)
  -:   12:@}
  @end smallexample
  
+With @option{-w}, each condition will also print the basic blocks that

+make up the decision.
+
+@smallexample
+$ gcov -t -m -g -a -w tmp
+-:0:Source:tmp.c
+-:0:Graph:tmp.gcno
+-:0:Data:tmp.gcda
+-:0:Runs:1
+-:1:#include 
+-:2:
+1:3:int main()
+-:4:@{
+1:5:  int a = 1;
+1:6:  int b = 0;
+-:7:
+1:7:  if (a && b)
+1:7-block 2 (BB 2)
+condition outcomes covered 1/4
+BBs 2 3
+condition  0 not covered (true false)
+condition  1 not covered (true)
+1:7-block 3 (BB 3)
+#:8:printf ("Success!\n");
+%:8-block 4 (BB 4)
+-:9:  else
+1:   10:printf ("Failure!\n");
+1:   10-block 5 (BB 5)
+-:   12:@}
+@end smallexample
+
  The execution counts are cumulative.  If the example program were
  executed again without removing the @file{.gcda} file, the count for the
  number of times each line in the source was executed would be added to
diff --git a/gcc/gcov-dump.cc b/gcc/gcov-dump.cc
index cc7f8a9ebfb..642e58c22bf 100644
--- a/gcc/gcov-dump.cc
+++ b/gcc/gcov-dump.cc
@@ -396,23 +396,25 @@ tag_arcs (const char *filename ATTRIBUTE_UNUSED,
  
  /* Print number of conditions (not outcomes, i.e. if (x && y) is 2, not 4).  */

  static void
-tag_conditions (const char *filename, unsigned /* tag */, int length,
+tag_conditions (const char *filename, unsigned /* tag */, int /* length */,
unsigned depth)
  {
-  unsigned n_conditions = GCOV_TAG_CONDS_NUM (length);
+  unsigned n_conditions = gcov_read_unsigned ();
  
printf (" %u condit

Re: 1/7 [Fortran, Patch, Coarray, PR107635] Move caf_get-rewrite to rewrite.cc

2025-02-12 Thread Jerry D

On 2/10/25 2:25 AM, Andre Vehreschild wrote:

[PATCH 1/7] Fortran: Move caf_get-rewrite to rewrite.cc [PR107635]

Add a rewriter to keep all expression tree manipulation that is not
optimization together.  At the moment this is just a move from resolve.cc,
but will be extended to handle more cases where rewriting the expression
tree may be easier.  The first use case is to extract accessors for coarray
remote image data access.

gcc/fortran/ChangeLog:

 PR fortran/107635
 * Make-lang.in: Add rewrite.cc.
 * gfortran.h (gfc_rewrite): New procedure.
 * parse.cc (rewrite_expr_tree): Add entrypoint for rewriting
 expression trees.
 * resolve.cc (gfc_resolve_ref): Remove caf_lhs handling.
 (get_arrayspec_from_expr): Moved to rewrite.cc.
 (remove_coarray_from_derived_type): Same.
 (convert_coarray_class_to_derived_type): Same.
 (split_expr_at_caf_ref): Same.
 (check_add_new_component): Same.
 (create_get_parameter_type): Same.
 (create_get_callback): Same.
 (add_caf_get_intrinsic): Same.
 (resolve_variable): Remove caf_lhs handling.
 * rewrite.cc: New file.

libgfortran/ChangeLog:

 * caf/single.c (_gfortran_caf_finalize): Free memory preventing
 leaks.
 (_gfortran_caf_get_by_ct): Fix constness.

--
Andre Vehreschild * Email: vehre ad gmx dot de


I have started to go through these patches for low-hanging fruit.  It
might be good if someone like Tobias or Paul looked deeper, although I am
not too concerned, as Andre is an expert.


I would like to suggest that you change the name of rewrite.cc to 
coarray.cc since this is what it is dealing with.


Jerry


Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Jeff Law




On 2/12/25 8:44 AM, Jakub Jelinek wrote:

On Wed, Feb 12, 2025 at 08:29:37AM -0700, Jeff Law wrote:

On 2/12/25 8:18 AM, Jakub Jelinek wrote:

On Tue, Feb 11, 2025 at 05:20:49PM -0700, Jeff Law wrote:

So this is a fairly old regression, but with all the ranger work that's been
done, it's become easy to resolve.

The basic idea here is to use known relationships between two operands of a
SUB_OVERFLOW IFN to statically compute the overflow state and ultimately
allow turning the IFN into simple arithmetic (or for the tests in this BZ
elide the arithmetic entirely).

The regression example is when the two inputs are known equal.  In that case
the subtraction will never overflow.But there's a few other cases we can
handle as well.

a == b -> never overflows
a > b  -> never overflows when A and B are unsigned
a >= b -> never overflows when A and B are unsigned
a < b  -> always overflows when A and B are unsigned


Is that really the case?
I mean, .SUB_OVERFLOW etc. can have 3 arbitrary types, the type into which
we are trying to write the result and the 2 types of arguments.

So if the types are allowed to vary within a statement (I guess as an IFN
they can unlike most gimple operations).


The overflow builtins started from clang's
__builtin_{s,u}{add,sub,mul}{,l,ll}_overflow
builtins (those were added for compatibility), all of those have the 3 types
identical; but the __builtin_{add,sub,mul}_overflow{,_p} builtins allow 3
arbitrary types, whether something overflows or not is determined by
performing the operation in virtually infinite precision and then seeing if
it is representable in the target type.

I'm fine if your patch is for GCC 15 limited to the easy cases with all 3
types compatible (that is the common case).  Still, I think it would be nice
not to restrict to TYPE_UNSIGNED; just check that for a >= b or a > b,
b is not negative (using vr1min).

And let's file a PR for GCC 16 to do it properly.
The pause button was to give me time to get coffee in and brain turned 
on to walk through the cases.


Note there's code later in that function which actually does 
computations based on known ranges to try and prove/disprove overflow 
state.  There may be cases there we can improve range based analysis as 
well and there may be cases where the combination of range information 
and relationships couple be helpful.   Those all felt out of scope to me 
in terms of addressing the regression.  Happy to open a distinct issue 
on possibilities there.


The regression can be resolved entirely by looking at the relationship 
between and the types of the inputs.  Hence it's a distinct block of 
code in that routine and avoids most of the complexity in that routine.


I agree that the most common cases should be all the arguments the same 
type.  I was working under the assumption that the args would be 
compatible types already, forgetting that IFNs are different in that 
regard than other gimple ops.  I wouldn't want to go any further than 
all three operands the same with the easy to reason about relation checks.



For gcc-16 I think we can extend that block fairly easily to handle 
certain mismatched size cases and we look to see if there's cases where 
the combination of a relationship between the arguments and some range 
information would allow us to capture further cases.


It may even make a good relatively early task for one of the interns 
I've got that's starting soon.  Narrow in scope, easily understood, 
doesn't require a ton of internals knowledge to reason about the cases,

easy to evaluate if the transformations are triggering, etc etc.


Jeff







[patch, fortran] PR117430 gfortran allows type(C_ptr) in I/O list

2025-02-12 Thread Jerry D
The attached patch is fairly obvious. The use of notify_std is changed 
to a gfc_error. Several test cases had to be adjusted.


Regression tested on x86_64.

OK for trunk?

Regards,

Jerry


Author: Jerry DeLisle 
Date:   Tue Feb 11 20:57:50 2025 -0800

Fortran:  gfortran allows type(C_ptr) in I/O list

Before this patch, gfortran was accepting invalid use of
type(c_ptr) in I/O statements. The fix affects several
existing test cases so no new test case needed.

Existing tests were modified to pass by either using the
transfer function to convert to an acceptable value or
using an assignment to a like type (non-I/O).

PR fortran/117430

gcc/fortran/ChangeLog:

* resolve.cc (resolve_transfer): Issue the error
with no exceptions allowed.

gcc/testsuite/ChangeLog:

* gfortran.dg/c_loc_test_17.f90: Modify to pass.
* gfortran.dg/c_ptr_tests_10.f03: Likewise.
* gfortran.dg/c_ptr_tests_16.f90: Likewise.
* gfortran.dg/c_ptr_tests_9.f03: Likewise.
* gfortran.dg/init_flag_17.f90: Likewise.
* gfortran.dg/pr32601_1.f03: Likewise.

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 1a4799dac78..3d3f117216c 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11824,8 +11824,8 @@ resolve_transfer (gfc_code *code)
  the component to be printed to help debugging.  */
   if (ts->u.derived->ts.f90_type == BT_VOID)
 	{
-	  if (!gfc_notify_std (GFC_STD_GNU, "Data transfer element at %L "
-			   "cannot have PRIVATE components", &code->loc))
+	  gfc_error ("Data transfer element at %L "
+		 "cannot have PRIVATE components", &code->loc);
 	return;
 	}
   else if (derived_inaccessible (ts->u.derived) && dtio_sub == NULL)
diff --git a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90 b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
index 4c2a7d657ee..92bfca4363d 100644
--- a/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
+++ b/gcc/testsuite/gfortran.dg/c_loc_test_17.f90
@@ -1,5 +1,4 @@
 ! { dg-do compile }
-! { dg-options "" }
 !
 ! PR fortran/56378
 ! PR fortran/52426
@@ -24,5 +23,5 @@ contains
 end module
 
 use iso_c_binding
-print *, c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have either the POINTER or the TARGET attribute" }
+i = c_loc([1]) ! { dg-error "Argument X at .1. to C_LOC shall have either the POINTER or the TARGET attribute" }
 end
diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03 b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
index 4ce1c6809e4..834570cb74d 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_10.f03
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-options "-std=gnu" }
 ! This test case exists because gfortran had an error in converting the 
 ! expressions for the derived types from iso_c_binding in some cases.
 module c_ptr_tests_10
@@ -7,7 +6,7 @@ module c_ptr_tests_10
 
 contains
   subroutine sub0() bind(c)
-print *, 'c_null_ptr is: ', c_null_ptr
+print *, 'c_null_ptr is: ', transfer (c_null_ptr, C_LONG_LONG)
   end subroutine sub0
 end module c_ptr_tests_10
 
diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_16.f90 b/gcc/testsuite/gfortran.dg/c_ptr_tests_16.f90
index 68c1da161a0..d1f74857c78 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_16.f90
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_16.f90
@@ -22,13 +22,13 @@ end program test
 subroutine bug1
use ISO_C_BINDING
implicit none
-   type(c_ptr) :: m
+   type(c_ptr) :: m, i
type mytype
  integer a, b, c
end type mytype
type(mytype) x
print *, transfer(32512, x)  ! Works.
-   print *, transfer(32512, m)  ! Caused ICE.
+   i = transfer(32512, m)  ! Caused ICE.
 end subroutine bug1 
 
 subroutine bug6
diff --git a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03 b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
index 5a32553b8c5..711b9c157d4 100644
--- a/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
+++ b/gcc/testsuite/gfortran.dg/c_ptr_tests_9.f03
@@ -16,9 +16,9 @@ contains
 type(myF90Derived), pointer :: my_f90_type_ptr
 
 my_f90_type%my_c_ptr = c_null_ptr
-print *, 'my_f90_type is: ', my_f90_type%my_c_ptr
+print *, 'my_f90_type is: ', transfer(my_f90_type%my_c_ptr,  C_LONG_LONG)
 my_f90_type_ptr => my_f90_type
-print *, 'my_f90_type_ptr is: ', my_f90_type_ptr%my_c_ptr
+print *, 'my_f90_type_ptr is: ', transfer(my_f90_type_ptr%my_c_ptr,  C_LONG_LONG)
   end subroutine sub0
 end module c_ptr_tests_9
 
diff --git a/gcc/testsuite/gfortran.dg/init_flag_17.f90 b/gcc/testsuite/gfortran.dg/init_flag_17.f90
index 401830fccbc..8bb9f7b1ef7 100644
--- a/gcc/testsuite/gfortran.dg/init_flag_17.f90
+++ b/gcc/testsuite/gfortran.dg/init_flag_17.f90
@@ -19,8 +19,8 @@ program init_flag_17
 
   type(ty) :: t
 
-  print *, t%ptr
-  print *, t%fptr
+  print *, transfer(t%ptr, c_long_long)
+  print *, transfer(t%fptr, c_long_long)
 
 end program
 
diff --git a/gcc/te

[pushed] c++: add fixed test [PR101740]

2025-02-12 Thread Marek Polacek
Tested x86_64-pc-linux-gnu, applying to trunk.

-- >8 --
Fixed by r12-3643.

PR c++/101740

gcc/testsuite/ChangeLog:

* g++.dg/template/dtor12.C: New test.
---
 gcc/testsuite/g++.dg/template/dtor12.C | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/dtor12.C

diff --git a/gcc/testsuite/g++.dg/template/dtor12.C b/gcc/testsuite/g++.dg/template/dtor12.C
new file mode 100644
index 000..2c75ee03d8e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/dtor12.C
@@ -0,0 +1,19 @@
+// PR c++/101740
+
+template<template<class> class T, class U>
+struct Test{
+void fun(){
+T<U> d;
+d.~GG();  // #1
+}
+};
+
+template<class T>
+struct GG {};
+
+int
+main ()
+{
+  Test<GG, int> b;
+  b.fun();
+}

base-commit: cfdb961588ba318a78e995d2e2cde43130acd993
-- 
2.48.1



Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Andrew MacLeod



On 2/12/25 10:18, Jakub Jelinek wrote:

On Tue, Feb 11, 2025 at 05:20:49PM -0700, Jeff Law wrote:

So this is a fairly old regression, but with all the ranger work that's been
done, it's become easy to resolve.

The basic idea here is to use known relationships between two operands of a
SUB_OVERFLOW IFN to statically compute the overflow state and ultimately
allow turning the IFN into simple arithmetic (or for the tests in this BZ
elide the arithmetic entirely).

The regression example is when the two inputs are known equal.  In that case
the subtraction will never overflow.But there's a few other cases we can
handle as well.

a == b -> never overflows
a > b  -> never overflows when A and B are unsigned
a >= b -> never overflows when A and B are unsigned
a < b  -> always overflows when A and B are unsigned

Is that really the case?
I mean, .SUB_OVERFLOW etc. can have 3 arbitrary types, the type into which
we are trying to write the result and the 2 types of arguments.
Consider:

int
foo (unsigned x, unsigned y)
{
   return __builtin_sub_overflow_p (x, y, (signed char) 0);
}

int
bar (unsigned int x, unsigned long long y)
{
   return __builtin_sub_overflow_p (x, y, (_BitInt(33)) 0);
}

int
main ()
{
   __builtin_printf ("%d\n", foo (16, 16));
   __builtin_printf ("%d\n", foo (65536, 65536));
   __builtin_printf ("%d\n", foo (65536, 16));
   __builtin_printf ("%d\n", bar (0, ~0U));
   __builtin_printf ("%d\n", bar (0, ~0ULL));
}

The a == b case is probably ok, although unsure if the relation query
won't be upset if op0 and op1 have different types (say signed long long vs.


Relation queries are purely ssa-name based, and never look at the 
type.   The only way a relation can exist between 2 different typed 
SSA_NAMES with different size/signs is if a call is made to record such 
a relation. It's not disallowed, but it's unlikely to happen from within 
ranger currently (other than partial equivalences), but presumably even 
if it did, it should only come from a circumstance where the operation 
generating the relation knows it to be true.  Typically relations are 
generated as known side effects of a stmt executing.. like if (a < b), 
and all the instances I am aware of involve range-ops between operands 
of the same size..


It's easy enough to be safe and check if it matters, though, I suppose.

FWIW, ==  should only come up when both the sign and # bits are the 
same.  Otherwise it is represented with a partial equivalence which 
indicates only that a certain number of bits are equal.


Andrew



[PATCH] LoongArch: Fix the issue of function jump out of range caused by crtbeginS.o [PR118844].

2025-02-12 Thread Lulu Cheng
Due to the presence of R_LARCH_B26 in
/usr/lib/gcc/loongarch64-linux-gnu/14/crtbeginS.o, its addressing
range is [PC-128MiB, PC+128MiB-4]. This means that when the code
segment size exceeds 128MB, linking with lld will definitely fail
(ld will not fail because the order of the two is different).

The linking order:
  lld: crtbeginS.o + .text + .plt
  ld : .plt + crtbeginS.o + .text

To solve this issue, add '-mcmodel=extreme' when compiling crtbeginS.o.

libgcc/ChangeLog:

* config/loongarch/t-crtstuff: Add '-mcmodel=extreme'
to CRTSTUFF_T_CFLAGS_S.

---
 libgcc/config/loongarch/t-crtstuff | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgcc/config/loongarch/t-crtstuff 
b/libgcc/config/loongarch/t-crtstuff
index b8c36eb66b7..2a2489b7ef4 100644
--- a/libgcc/config/loongarch/t-crtstuff
+++ b/libgcc/config/loongarch/t-crtstuff
@@ -3,3 +3,9 @@
 # to .eh_frame data from crtbeginT.o instead of the user-defined object
 # during static linking.
 CRTSTUFF_T_CFLAGS += -fno-omit-frame-pointer -fno-asynchronous-unwind-tables
+
+# As shown in the test case PR118844, when using lld for linking,
+# it fails due to B26 in crtbeginS.o causing the link to exceed the range.
+# Therefore, the issue was resolved by adding the compilation option
+# "-mcmodel=extreme" when compiling crtbeginS.o.
+CRTSTUFF_T_CFLAGS_S += -mcmodel=extreme
-- 
2.34.1



Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread Sam James
"H.J. Lu"  writes:

> Don't assume that stack slots can only be accessed by stack or frame
> registers.  We first find all registers defined by stack or frame
> registers.  Then check memory accesses by such registers, including
> stack and frame registers.
>
> gcc/
>
> PR target/109780
> PR target/109093
> * config/i386/i386.cc (ix86_update_stack_alignment): New.
> (ix86_find_all_reg_use): Likewise.
> (ix86_find_max_used_stack_alignment): Also check memory accesses
> from registers defined by stack or frame registers.
>
> gcc/testsuite/
>
> PR target/109780
> PR target/109093
> * g++.target/i386/pr109780-1.C: New test.
> * gcc.target/i386/pr109093-1.c: Likewise.
> * gcc.target/i386/pr109780-1.c: Likewise.
> * gcc.target/i386/pr109780-2.c: Likewise.

Please add the runtime testcase at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109780#c29 too.

Also, for pr109093-1.c, please initialise 'f' to 1 to avoid UB (division
by zero).


Re: [PATCH] c++: Fix up regressions caused by for/while loops with declarations [PR118822]

2025-02-12 Thread Jason Merrill

On 2/11/25 6:59 PM, Jakub Jelinek wrote:

Hi!

The recent PR86769 r15-7426 changes regressed the following two testcases,
the first one is more important as it is derived from real-world code.

The first problem is that the chosen
prep = do_pushlevel (sk_block);
// emit something
body = push_stmt_list ();
// emit further stuff
body = pop_stmt_list (body);
prep = do_poplevel (prep);
way of constructing the {FOR,WHILE}_COND_PREP and {FOR,WHILE}_BODY
isn't reliable.  If during parsing a label is seen in the body and then
some decl with destructors, sk_cleanup transparent scope is added, but
the corresponding result from push_stmt_list is saved in
*current_binding_level and pop_stmt_list then pops even that statement list
but only do_poplevel actually attempts to pop the sk_cleanup scope and so we
ICE.
The reason for not doing do_pushlevel (sk_block); do_pushlevel (sk_block);
is that variables should be in the same scope (otherwise various e.g.
redeclaration*.C tests FAIL) and doing do_pushlevel (sk_block); do_pushlevel
(sk_cleanup); wouldn't work either as do_poplevel would silently unwind even
the cleanup one.

The second problem is that my assumption that the declaration in the
condition will have zero or one cleanup is just wrong, at least for
structured bindings used as condition, there can be as many cleanups as
there are names in the binding + 1.

So, the following patch changes the earlier approach.  Nothing is removed
from the {FOR,WHILE}_COND_PREP subtrees while doing adjust_loop_decl_cond,
push_stmt_list isn't called either; all it does is remember as an integer
the number of cleanups (CLEANUP_STMT at the end of the STATEMENT_LISTs)
from querying stmt_list_stack and finding the initial *body_p in there
(that integer is stored into {FOR,WHILE}_COND_CLEANUP), and temporarily
{FOR,WHILE}_BODY is set to the last statement (if any) in the innermost
STATEMENT_LIST at the adjust_loop_decl_cond time; then at
finish_{for,while}_stmt a new finish_loop_cond_prep routine takes care of
do_poplevel for the scope (which is in {FOR,WHILE}_COND_PREP) and finds
given {FOR,WHILE}_COND_CLEANUP number and {FOR,WHILE}_BODY tree the right
spot where body statements start and moves that into {FOR,WHILE}_BODY.
Finally genericize_c_loop then inserts the cond, body, continue label, expr
into the right subtree of {FOR,WHILE}_COND_PREP.
The constexpr evaluation unfortunately had to be changed as well, because
we don't want to evaluate everything in BIND_EXPR_BODY (*_COND_PREP ())
right away, we want to evaluate it with the exception of the CLEANUP_STMT
cleanups at the end (given {FOR,WHILE}_COND_CLEANUP levels), and defer
the evaluation of the cleanups until after cond, body, expr are evaluated.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-11  Jakub Jelinek  

PR c++/118822
PR c++/118833
gcc/c-family/
* c-common.h (WHILE_COND_CLEANUP): Change description in comment.
(FOR_COND_CLEANUP): Likewise.
* c-gimplify.cc (genericize_c_loop): Adjust for COND_CLEANUP
being CLEANUP_STMT/TRY_FINALLY_EXPR trailing nesting depth
instead of actual cleanup.
gcc/cp/
* semantics.cc (adjust_loop_decl_cond): Allow multiple trailing
CLEANUP_STMT levels in *BODY_P.  Set *CLEANUP_P to the number
of levels rather than one particular cleanup, keep the cleanups
in *PREP_P.  Set *BODY_P to the last stmt in the cur_stmt_list
or NULL if *CLEANUP_P and the innermost cur_stmt_list is empty.
(finish_loop_cond_prep): New function.
(finish_while_stmt, finish_for_stmt): Use it.  Don't call
set_one_cleanup_loc.
* constexpr.cc (cxx_eval_loop_expr): Adjust handling of
{FOR,WHILE}_COND_{PREP,CLEANUP}.
gcc/testsuite/
* g++.dg/expr/for9.C: New test.
* g++.dg/cpp26/decomp12.C: New test.

--- gcc/c-family/c-common.h.jj  2025-02-07 17:06:50.777235245 +0100
+++ gcc/c-family/c-common.h 2025-02-11 12:12:13.034861256 +0100
@@ -1518,7 +1518,8 @@ extern tree build_userdef_literal (tree
  
  /* WHILE_STMT accessors.  These give access to the condition of the

 while statement, the body, and name of the while statement, and
-   condition preparation statements and its cleanup, respectively.  */
+   condition preparation statements and number of its nested cleanups,
+   respectively.  */
  #define WHILE_COND(NODE)  TREE_OPERAND (WHILE_STMT_CHECK (NODE), 0)
  #define WHILE_BODY(NODE)  TREE_OPERAND (WHILE_STMT_CHECK (NODE), 1)
  #define WHILE_NAME(NODE)  TREE_OPERAND (WHILE_STMT_CHECK (NODE), 2)
@@ -1533,7 +1534,8 @@ extern tree build_userdef_literal (tree
  
  /* FOR_STMT accessors.  These give access to the init statement,

 condition, update expression, body and name of the for statement,
-   and condition preparation statements and its cleanup, respectively.  */
+   and condition preparation statements and number of its nested cleanups,
+   respectively.  */
  #define FOR

Re: [PATCH] c++: P2308, Template parameter initialization (tests) [PR113800]

2025-02-12 Thread Jason Merrill

On 2/12/25 7:54 PM, Marek Polacek wrote:

Tested on x86_64-pc-linux-gnu, ok for trunk?  I'll also update cxx-status.html.


OK.


-- >8 --
This proposal was implemented a long time ago by my r9-5271,
but it took me this long to verify that it still works as per P2308.

This patch adds assorted tests, both from clang and from [temp.arg.nontype].
Fortunately I did not discover any issues in the compiler.

PR c++/113800
DR 2450

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing15.C: New test.
* g++.dg/cpp2a/nontype-class68.C: New test.
* g++.dg/cpp2a/nontype-class69.C: New test.
* g++.dg/cpp2a/nontype-class70.C: New test.
* g++.dg/cpp2a/nontype-class71.C: New test.
* g++.dg/cpp2a/nontype-class72.C: New test.
---
  gcc/testsuite/g++.dg/cpp26/pack-indexing15.C | 20 +
  gcc/testsuite/g++.dg/cpp2a/nontype-class68.C | 24 ++
  gcc/testsuite/g++.dg/cpp2a/nontype-class69.C | 27 +++
  gcc/testsuite/g++.dg/cpp2a/nontype-class70.C | 47 
  gcc/testsuite/g++.dg/cpp2a/nontype-class71.C | 19 
  gcc/testsuite/g++.dg/cpp2a/nontype-class72.C | 41 +
  6 files changed, 178 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp26/pack-indexing15.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class68.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class69.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class70.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class71.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class72.C

diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing15.C 
b/gcc/testsuite/g++.dg/cpp26/pack-indexing15.C
new file mode 100644
index 000..3f8382b12cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing15.C
@@ -0,0 +1,20 @@
+// PR c++/113800
+// { dg-do compile { target c++26 } }
+// From LLVM's temp_arg_nontype_cxx2c.cpp.
+
+template<class... T>
+concept C = sizeof(T...[1]) == 1;
+
+struct A {};
+
+template auto = A{}> struct Set {};
+
+template
+void
+foo ()
+{
+  Set u;
+}
+
+Set sb;
+Set sf; // { dg-error "placeholder constraints not satisfied" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class68.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class68.C
new file mode 100644
index 000..ade646e391b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class68.C
@@ -0,0 +1,24 @@
+// PR c++/113800
+// { dg-do compile { target c++20 } }
+// From [temp.arg.nontype].
+
+template<auto n> struct B { /* ... */ };
+B<5> b1;// OK, template parameter type is int
+B<'a'> b2;  // OK, template parameter type is char
+B<2.5> b3;  // OK, template parameter type is double
+B<void(0)> b4;  // { dg-error ".void. is not a valid type for a template non-type parameter" }
+
+template<int i> struct C { /* ... */ };
+C<{ 42 }> c1;   // OK
+
+struct J1 {
+  J1 *self = this;
+};
+B<J1{}> j1; // { dg-error "not a constant expression" }
+
+struct J2 {
+  J2 *self = this;
+  constexpr J2() {}
+  constexpr J2(const J2&) {}
+};
+B<J2{}> j2; // { dg-error "not a constant expression" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class69.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class69.C
new file mode 100644
index 000..08b0a5ef73c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class69.C
@@ -0,0 +1,27 @@
+// PR c++/113800
+// { dg-do compile { target c++20 } }
+
+// DR 2450
+struct S { int a; };
+
+template<S s>
+void
+f ()
+{
+}
+
+void
+test ()
+{
+  f<{0}>();
+  f<{.a= 0}>();
+}
+
+// DR 2459
+struct A {
+  constexpr A (float) {}
+};
+
+template<A a>
+struct X {};
+X<1> x;
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class70.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class70.C
new file mode 100644
index 000..0e50847e440
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class70.C
@@ -0,0 +1,47 @@
+// PR c++/113800
+// P2308R1 - Template parameter initialization
+// { dg-do compile { target c++20 } }
+
+struct S {
+  int a = 0;
+  int b = 42;
+};
+
+template<S t>
+struct A {
+  static constexpr auto a = t.a;
+  static constexpr auto b = t.b;
+};
+
+static_assert(A<{}>::a == 0);
+static_assert(A<{}>::b == 42);
+static_assert(A<{.a = 3}>::a == 3);
+static_assert(A<{.b = 4}>::b == 4);
+
+template
+struct D1 {};
+
+template
+struct D2 {};
+
+template 
+struct D3 {};
+
+struct E {};
+
+struct I {
+  constexpr I(E) {};
+};
+
+template
+struct W {};
+
+void
+g ()
+{
+  D1<> d1;
+  D2<> d2;
+  D3<> d3;
+
+  W w;
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class71.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class71.C
new file mode 100644
index 000..36ce5b16dee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class71.C
@@ -0,0 +1,19 @@
+// PR c++/113800
+// { dg-do compile { target c++20 } }
+// From LLVM's temp_arg_nontype_cxx2c.cpp.
+
+template<typename T, unsigned I>
+struct A {
+  T x[I];
+};
+
+template<typename T, typename... U>
+A(T, U...) -> A<T, 1 + sizeof...(U)>;
+
+template<A a> void foo() { }
+
+void
+bar ()
+{
+  foo<{1}>();
+}
di

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread 钟居哲
Could you add a PR117974 testcase?



juzhe.zh...@rivai.ai
 
From: Edwin Lu
Date: 2025-02-13 07:27
To: gcc-patches
CC: gnu-toolchain; vineetg; juzhe.zhong; Edwin Lu
Subject: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies. This resulted in a situation where the following occurs
 
vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
sub a1,a1,a5 <-- a1 potentially set to 0
sh2add  a0,a5,a0
vfmacc.vv   v1,v2,v2
vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
beq a1,zero,.L12 <-- check if avl is 0
 
This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.
 
PR target/117974
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_sched_can_speculate_insn): New function.
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.
 
Signed-off-by: Edwin Lu 
---
gcc/config/riscv/riscv.cc | 20 
1 file changed, 20 insertions(+)
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6e14126e3a4..24450bae517 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10209,6 +10209,23 @@ riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn 
*insn, int cost,
   return new_cost;
}
+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN hook.  Return true if insn
+   can be scheduled for speculative execution.  Reject vsetvl instructions to
+   prevent the scheduler from hoisting them out of basic blocks without
+   checking for data dependencies (PR117974).  */
+static bool
+riscv_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+{
+  case TYPE_VSETVL:
+  case TYPE_VSETVL_PRE:
+ return false;
+  default:
+ return true;
+}
+}
+
/* Auxiliary function to emit RISC-V ELF attribute. */
static void
riscv_emit_attribute ()
@@ -14055,6 +14072,9 @@ bool need_shadow_stack_push_pop_p ()
#undef  TARGET_SCHED_ADJUST_COST
#define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost
+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn
+
#undef TARGET_FUNCTION_OK_FOR_SIBCALL
#define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
-- 
2.43.0
 
 


Re: [PATCH] RISC-V: Avoid more unsplit insns in const expander [PR118832].

2025-02-12 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2025-02-12 22:03
To: gcc-patches
CC: pal...@dabbelt.com; kito.ch...@gmail.com; juzhe.zh...@rivai.ai; 
jeffreya...@gmail.com; pan2...@intel.com; rdapp@gmail.com
Subject: [PATCH] RISC-V: Avoid more unsplit insns in const expander [PR118832].
Hi,
 
in PR118832 we have another instance of the problem already noticed in
PR117878.  We sometimes use e.g. expand_simple_binop for vector
operations like shift or and.  While this is usually OK, it causes
problems when doing it late, e.g. during LRA.
 
In particular, we might rematerialize a const_vector during LRA, which
then leaves an insn laying around that cannot be split any more if it
requires a pseudo.  Therefore we should only use the split variants
in expand_const_vector.
 
This patch fixes the issue in the PR and also pre-emptively rewrites two
other spots that might be prone to the same issue.
 
Regtested on rv64gcv_zvl512b.  As the two other cases don't have a test
(so might not even trigger) I unconditionally enabled them for my testsuite
run.
 
Regards
Robin
 
PR target/118832
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector):  Expand as
vlmax insn during lra.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr118832.c: New test.
---
gcc/config/riscv/riscv-v.cc   | 46 +++
.../gcc.target/riscv/rvv/autovec/pr118832.c   | 13 ++
2 files changed, 51 insertions(+), 8 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118832.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9847439ca77..3e86b12bb40 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1265,7 +1265,16 @@ expand_const_vector (rtx target, rtx src)
   element. Use element width = 64 and broadcast a vector with
   all element equal to 0x0706050403020100.  */
  rtx ele = builder.get_merged_repeating_sequence ();
-   rtx dup = expand_vector_broadcast (builder.new_mode (), ele);
+   rtx dup;
+   if (lra_in_progress)
+ {
+   dup = gen_reg_rtx (builder.new_mode ());
+   rtx ops[] = {dup, ele};
+   emit_vlmax_insn (code_for_pred_broadcast
+(builder.new_mode ()), UNARY_OP, ops);
+ }
+   else
+ dup = expand_vector_broadcast (builder.new_mode (), ele);
  emit_move_insn (result, gen_lowpart (mode, dup));
}
   else
@@ -1523,10 +1532,20 @@ expand_const_vector (rtx target, rtx src)
  base2 = gen_int_mode (rtx_to_poly_int64 (base2), new_smode);
  expand_vec_series (tmp2, base2,
 gen_int_mode (step2, new_smode));
-   rtx shifted_tmp2 = expand_simple_binop (
- new_mode, ASHIFT, tmp2,
- gen_int_mode (builder.inner_bits_size (), Pmode), NULL_RTX,
- false, OPTAB_DIRECT);
+   rtx shifted_tmp2;
+   rtx shift = gen_int_mode (builder.inner_bits_size (), Xmode);
+   if (lra_in_progress)
+ {
+   shifted_tmp2 = gen_reg_rtx (new_mode);
+   rtx shift_ops[] = {shifted_tmp2, tmp2, shift};
+   emit_vlmax_insn (code_for_pred_scalar
+(ASHIFT, new_mode), BINARY_OP,
+shift_ops);
+ }
+   else
+ shifted_tmp2 = expand_simple_binop (new_mode, ASHIFT, tmp2,
+ shift, NULL_RTX, false,
+ OPTAB_DIRECT);
  rtx tmp3 = gen_reg_rtx (new_mode);
  rtx ior_ops[] = {tmp3, tmp1, shifted_tmp2};
  emit_vlmax_insn (code_for_pred (IOR, new_mode), BINARY_OP,
@@ -1539,9 +1558,20 @@ expand_const_vector (rtx target, rtx src)
  rtx vid = gen_reg_rtx (mode);
  expand_vec_series (vid, const0_rtx, const1_rtx);
  /* Transform into { 0, 0, 1, 1, 2, 2, ... }.  */
-   rtx shifted_vid
- = expand_simple_binop (mode, LSHIFTRT, vid, const1_rtx,
-NULL_RTX, false, OPTAB_DIRECT);
+   rtx shifted_vid;
+   if (lra_in_progress)
+ {
+   shifted_vid = gen_reg_rtx (mode);
+   rtx shift = gen_int_mode (1, Xmode);
+   rtx shift_ops[] = {shifted_vid, vid, shift};
+   emit_vlmax_insn (code_for_pred_scalar
+(ASHIFT, mode), BINARY_OP,
+shift_ops);
+ }
+   else
+ shifted_vid = expand_simple_binop (mode, LSHIFTRT, vid,
+const1_rtx, NULL_RTX,
+false, OPTAB_DIRECT);
  rtx tmp1 = gen_reg_rtx (mode);
  rtx tmp2 = gen_reg_rtx (mode);
  expand_vec_series (tmp1, base1,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118832.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118832.c
new file mode 100644
index 000..db0b12bee5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118832.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zbb -mabi=lp64d -O3" } */
+
+int *a;
+void b(int *);
+void c(int *g, short h) {
+int d[8], e[8];
+for (int f = 0; f < h; f++)
+  d[f] = g[f] << 24 | (g[f] & 4278190080) >> 24;
+b(d);
+for (int f = 0; f < h; f++)
+  a[f] = e[f] << 24 | (e[f] & 4278190080) >> 24;
+}
-- 
2.48.1
 
 


[WWWDOCS, COMMITTED] gcc-15: Fix HTML validation error

2025-02-12 Thread Sandra Loosemore
It appears that bin/preprocess-html.py introduces HTML errors when an
... element contains a nested hyperlink.  My recent gcc-15 release
note changes passed validation when checked via file upload, but not after
commit via file link.  It seems the easiest fix is just to remove the
offending  element.
---
 htdocs/gcc-15/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 9359e562..7638d3d5 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -65,7 +65,7 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
-OpenMP
+OpenMP
 
   
 Support for unified-shared memory has been added for some AMD and Nvidia
-- 
2.34.1



[PATCH] c++, v2: Fix up regressions caused by for/while loops with declarations [PR118822]

2025-02-12 Thread Jakub Jelinek
Hi!

On Thu, Feb 13, 2025 at 12:08:30AM +0100, Jason Merrill wrote:

Thanks for the review.

> > +  if (tsi_end_p (iter))
> > +*body_p = NULL_TREE;
> > +  else
> > +*body_p = tsi_stmt (iter);
> 
> This could use a comment that we're setting _BODY temporarily for the next
> function.

Done.

> > +  gcc_assert (TREE_CODE (BIND_EXPR_BODY (*prep_p)) == STATEMENT_LIST);
> > +  tree stmt_list = BIND_EXPR_BODY (*prep_p);
> 
> It would be shorter to assert TREE_CODE (stmt_list)?

Changed.

> > +  if (tsi_one_before_end_p (iter))
> > +*body_p = build_empty_stmt (input_location);
> > +  else
> > +{
> > +  tsi_next (&iter);
> > +  *body_p = NULL_TREE;
> > +  while (!tsi_end_p (iter))
> > +   {
> > + tree t = tsi_stmt (iter);
> > + tsi_delink (&iter);
> > + append_to_statement_list_force (t, body_p);
> > +   }
> 
> I wonder about factoring this if/else out into a split_statement_list (or
> tsi_split_list?) function.

Done (used tsi_split_stmt_list).

> > -  auto cleanup_cond = [=] {
> > +  auto cleanup_cond = [=, &cond_cleanup_depth] {
> 
> Let's just change = to &.

Done.

> > + if (cond_cleanup)
> > +   {
> > + /* If COND_CLEANUP is non-NULL, we need to evaluate DEPTH
> > +nested STATEMENT_EXPRs from inside of BIND_EXPR_BODY,
> > +but defer the evaluation of CLEANUP_EXPRs at the end
> > +of those STATEMENT_EXPRs.  */
> 
> Instead of (both) STATEMENT_EXPR do you mean CLEANUP_STMT?

I meant STATEMENT_LIST instead of STATEMENT_EXPR, reworded slightly.

> > + cond_cleanup_depth = 0;
> > + tree s = BIND_EXPR_BODY (cond_prep);
> > + for (unsigned depth = tree_to_uhwi (cond_cleanup);
> > +  depth; --depth)
> > +   {
> > + for (tree_stmt_iterator i = tsi_start (s);
> > +  !tsi_end_p (i); ++i)
> > +   {
> > + tree stmt = *i;
> > + if (TREE_CODE (stmt) == DEBUG_BEGIN_STMT)
> > +   continue;
> > + if (tsi_one_before_end_p (i))
> > +   {
> > + gcc_assert (TREE_CODE (stmt) == CLEANUP_STMT);
> > + if (*jump_target)
> > +   {
> > + depth = 1;
> > + break;
> > +   }
> 
> Why is this here, given the jump_target handling a bit below?

This is IMHO needed the way it is written.  This whole block is essentially
trying to do what cxx_eval_constant_expression would do, except it skips
evaluation of the CLEANUP_EXPRs of the last depth CLEANUP_STMTs (and a small
optimization, we know there is no continue label in any of these
STATEMENT_LISTs, so no need to special case those).
cxx_eval_statement_list on the tsi_one_before_end_p statement would just
call cxx_eval_constant_expression (ok, perhaps with vc_prvalue rather than
vc_discard, but we know we aren't in a statement expression too).
And that on CLEANUP_STMT will just do:
  if (jump_target && *jump_target)
...
default:
  return NULL_TREE;
- CLEANUP_STMT isn't one of those into which it recurses, one can't just
switch etc. across a variable declaration.  And in that case the
CLEANUP_EXPR isn't evaluated either.  So, the purpose of this is
if the CLEANUP_STMT wouldn't be evaluated, stop iterating, so don't
evaluate CLEANUP_BODY of it, and don't increase cond_cleanup_depth,
so that its CLEANUP_EXPR won't be evaluated either.
> 
> > + ++cond_cleanup_depth;
> > + if (depth > 1)
> > +   {
> > + s = CLEANUP_BODY (stmt);
> > + break;
> > +   }
> > + cxx_eval_constant_expression (ctx,
> > +   CLEANUP_BODY (stmt),
> > +   vc_discard,
> > +   non_constant_p,
> > +   overflow_p,
> > +   jump_target);
> > + break;
> > +   }
> 
> Are you assuming that stmt will not be a CLEANUP_STMT here?  Maybe swap the

This assumes stmt is CLEANUP_STMT as verified by the gcc_assert.

> one_before_end and CLEANUP_STMT tests between the if and assert above?

I guess CLEANUP_STMTs wouldn't appear in the middle of STATEMENT_LIST, but
if they did (say some cleanup no longer applying after a certain point), the
check for the last stmt is IMHO the right one; that is what we've verified is a
CLEANUP_STMT during construction and it is what the cleanup also searches
for to find out what CLEANUP_EXPR should be evaluated.

Here is an updated patch, passed
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ 
RUNTESTFLAGS="debug.exp=cleanup1.C dg.exp='name-independent-decl*.C 
redeclaration-*.C pr18770.C for*.C stmtexpr

Re: [PATCH v2 3/4] LoongArch: After setting the compilation options, update the predefined macros.

2025-02-12 Thread Lulu Cheng



在 2025/2/12 下午6:19, Xi Ruoyao 写道:

On Wed, 2025-02-12 at 18:03 +0800, Lulu Cheng wrote:

/* snip */


diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-3.c 
b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
new file mode 100644
index 000..a682ae4a356
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-3.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options "-march=loongarch64" } */
+
+#include 
+#include 
+#include 
+
+#ifndef __loongarch_arch
+#error __loongarch_arch should not be available here
+#endif

Hmm this seems not correct?  __loongarch_arch should be just
"loongarch64" here (at least it is "loongarch64" with GCC <= 14).


Well, because this predefined macro must be defined in any case, I added 
this check here.


But it seems a bit redundant. I will delete it in v3.




And a dg-do test with explicit -march= in dg-options is problematic
because it'll fail on less-capable CPUs (in this case, after we add
LA32).  We can change this to something like:


What I didn't take into account was...



/* { dg-do compile } */
/* { dg-options "-O2 -march=loongarch64" } */
/* { dg-final { scan-assembler "t1: loongarch64" } } */
/* { dg-final { scan-assembler "t2: la64v1.1" } } */
/* { dg-final { scan-assembler "t3: loongarch64" } } */

void
t1 (void)
{
   asm volatile ("# t1: " __loongarch_arch);
}

#pragma GCC push_options
#pragma GCC target("arch=la64v1.1")

void
t2 (void)
{
   asm volatile ("# t2: " __loongarch_arch);
}

#pragma GCC pop_options

void
t3 (void)
{
   asm volatile ("# t3: " __loongarch_arch);
}

... ...

/* snip */


diff --git a/gcc/testsuite/gcc.target/loongarch/pr118828-4.c 
b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
new file mode 100644
index 000..3b3a7c6078c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr118828-4.c
@@ -0,0 +1,55 @@
+/* { dg-do run } */
+/* { dg-options "-mtune=la464" } */
+
+#include 
+#include 
+#include 
+
+#ifndef __loongarch_tune
+#error __loongarch_tune should not be available here

Likewise.





[PATCH] c-family: Improve location for -Wunknown-pragmas in a _Pragma [PR118838]

2025-02-12 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118838

This patch addresses the issue mentioned in the PR (another instance of
_Pragma string location issues). bootstrap + regtest all languages on
aarch64 looks good. Is it OK please for now or for stage 1?  Note, it is not
a regression, since this never worked in C or C++ frontends; but on the
other hand, r15-4505 for GCC 15 fixed some related issues, so it could be
nice if this one gets in along with it. Thanks!

-Lewis

-- >8 --

The warning for -Wunknown-pragmas is issued at the location provided by
libcpp to the def_pragma() callback. This location is
cpp_reader::directive_line, which is a location for the start of the line
only; it is also not a valid location in case the unknown pragma was lexed
from a _Pragma string. These factors make it impossible to suppress
-Wunknown-pragmas via _Pragma("GCC diagnostic...") directives on the same
source line, as in the PR and the test case. Address that by issuing the
warning at a better location returned by cpp_get_diagnostic_override_loc().
libcpp already maintains this location to handle _Pragma-related diagnostics
internally; it was needed also to make a publicly accessible version of it.

gcc/c-family/ChangeLog:

PR c/118838
* c-lex.cc (cb_def_pragma): Call cpp_get_diagnostic_override_loc()
to get a valid location at which to issue -Wunknown-pragmas, in case
it was triggered from a _Pragma.

libcpp/ChangeLog:

PR c/118838
* errors.cc (cpp_get_diagnostic_override_loc): New function.
* include/cpplib.h (cpp_get_diagnostic_override_loc): Declare.

gcc/testsuite/ChangeLog:

PR c/118838
* c-c++-common/cpp/pragma-diagnostic-loc-2.c: New test.
* g++.dg/gomp/macro-4.C: Adjust expected output.
* gcc.dg/gomp/macro-4.c: Likewise.
* gcc.dg/cpp/Wunknown-pragmas-1.c: Likewise.
---
 libcpp/errors.cc  | 10 +
 libcpp/include/cpplib.h   |  5 +
 gcc/c-family/c-lex.cc |  7 +-
 .../cpp/pragma-diagnostic-loc-2.c | 15 +
 gcc/testsuite/g++.dg/gomp/macro-4.C   |  8 +++
 gcc/testsuite/gcc.dg/cpp/Wunknown-pragmas-1.c | 22 +++
 gcc/testsuite/gcc.dg/gomp/macro-4.c   |  8 +++
 7 files changed, 57 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc-2.c

diff --git a/libcpp/errors.cc b/libcpp/errors.cc
index 9621c4b66ea..d9efb6acd30 100644
--- a/libcpp/errors.cc
+++ b/libcpp/errors.cc
@@ -52,6 +52,16 @@ cpp_diagnostic_get_current_location (cpp_reader *pfile)
 }
 }
 
+/* Sometimes a diagnostic needs to be generated before libcpp has been able
+   to generate a valid location for the current token; in that case, the
+   non-zero location returned by this function is the preferred one to use.  */
+
+location_t
+cpp_get_diagnostic_override_loc (const cpp_reader *pfile)
+{
+  return pfile->diagnostic_override_loc;
+}
+
 /* Print a diagnostic at the given location.  */
 
 ATTRIBUTE_CPP_PPDIAG (5, 0)
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 90aa3160ebf..04d4621da3c 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -1168,6 +1168,11 @@ extern const char *cpp_probe_header_unit (cpp_reader *, 
const char *file,
 extern const char *cpp_get_narrow_charset_name (cpp_reader *) ATTRIBUTE_PURE;
 extern const char *cpp_get_wide_charset_name (cpp_reader *) ATTRIBUTE_PURE;
 
+/* Sometimes a diagnostic needs to be generated before libcpp has been able
+   to generate a valid location for the current token; in that case, the
+   non-zero location returned by this function is the preferred one to use.  */
+extern location_t cpp_get_diagnostic_override_loc (const cpp_reader *);
+
 /* This function reads the file, but does not start preprocessing.  It
returns the name of the original file; this is the same as the
input file, except for preprocessed input.  This will generate at
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index e450c9a57f0..df84020de62 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -248,7 +248,12 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
 {
   const unsigned char *space, *name;
   const cpp_token *s;
-  location_t fe_loc = loc;
+
+  /* If we are processing a _Pragma, LOC is not a valid location, but libcpp
+     will provide a good location via this function instead.  */
+  location_t fe_loc = cpp_get_diagnostic_override_loc (pfile);
+  if (!fe_loc)
+   fe_loc = loc;
 
   space = name = (const unsigned char *) "";
 
diff --git a/gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc-2.c b/gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc-2.c
new file mode 100644
index 000..e7e8cf23759
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pragma-diagnostic-loc-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compil

[PATCH V2] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Edwin Lu
The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies.  This resulted in a situation where the following occurs:

vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
sub a1,a1,a5 <-- a1 potentially set to 0
sh2add  a0,a5,a0
vfmacc.vv   v1,v2,v2
vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
beq a1,zero,.L12 <-- check if avl is 0

This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.

PR 117974

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_can_speculate_insn): New function.
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr117974.c: New test.

Signed-off-by: Edwin Lu 
---
Changes in V2:
  - Add testcase
---
 gcc/config/riscv/riscv.cc | 20 +++
 .../gcc.target/riscv/rvv/vsetvl/pr117974.c| 16 +++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6e14126e3a4..24450bae517 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10209,6 +10209,23 @@ riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn *insn, int cost,
   return new_cost;
 }

+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN hook.  Return true if insn
+   can be scheduled for speculative execution.  Reject vsetvl instructions to
+   prevent the scheduler from hoisting them out of basic blocks without
+   checking for data dependencies (PR 117974).  */
+static bool
+riscv_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_VSETVL:
+    case TYPE_VSETVL_PRE:
+      return false;
+    default:
+      return true;
+    }
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -14055,6 +14072,9 @@ bool need_shadow_stack_push_pop_p ()
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost

+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c
new file mode 100644
index 000..22e2ec7337c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr117974.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -mrvv-vector-bits=zvl -Ofast" } */
+
+float g(float q[], int N){
+float dqnorm = 0.0;
+
+#pragma GCC unroll 4
+
+for (int i=0; i < N; i++) {
+dqnorm = dqnorm + q[i] * q[i];
+}
+return dqnorm;
+}
+
+/* { dg-final { scan-assembler-times {beq\s+[a-x0-9]+,zero,.L12\s+vsetvli} 3 } } */
+
--
2.43.0



Re: Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread 钟居哲
The VSETVL pass is supposed to insert "vsetvli" at the optimal place, so a
"vsetvli" inserted by the VSETVL pass shouldn't be involved in instruction
scheduling.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2025-02-13 08:47
To: Edwin Lu; gcc-patches
CC: gnu-toolchain; vineetg; juzhe.zhong
Subject: Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
 
 
On 2/12/25 4:27 PM, Edwin Lu wrote:
> The instruction scheduler appears to be speculatively hoisting vsetvl
> insns outside of their basic block without checking for data
> dependencies. This resulted in a situation where the following occurs
> 
>  vsetvli a5,a1,e32,m1,tu,ma
>  vle32.v v2,0(a0)
>  sub a1,a1,a5 <-- a1 potentially set to 0
>  sh2add  a0,a5,a0
>  vfmacc.vv   v1,v2,v2
>  vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
>  beq a1,zero,.L12 <-- check if avl is 0
> 
> This patch would essentially delay the vsetvl update to after the branch
> to prevent unnecessarily updating the vinfo at the end of a basic block.
> 
> PR/117974
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
> (TARGET_SCHED_CAN_SPECULATE_INSN): Implement.
Correct me if I'm wrong, there's not anything inherently wrong with the 
speculation from a correctness standpoint.  This is "just" a performance 
issue, right?
 
And from a performance standpoint speculation of the vsetvl could vary 
pretty wildly based on uarch characteristics.   I can easily see cases 
where it it wildly bad, wildly good and don't really care.
 
 
Point being it seems like it should be controlled by a uarch setting 
rather than always avoiding or always enabling.
 
Other thoughts?
 
Jeff
 


Re: [RFA][PR tree-optimization/98028] Use relationship between operands to simplify SUB_OVERFLOW

2025-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2025 at 01:42:12PM -0700, Jeff Law wrote:
> > I'm fine if your patch is for GCC 15 limited to the easy cases with all 3
> > types compatible (that is the common case).  Still, I think it would be nice
> > not to restrict to TYPE_UNSIGNED, just check that for a >= b or
> > a > b, b is not negative (using vr1min).
> > 
> > And let's file a PR for GCC 16 to do it properly.
> The pause button was to give me time to get coffee in and brain turned on to
> walk through the cases.
> 
> Note there's code later in that function which actually does computations
> based on known ranges to try and prove/disprove overflow state.  There may
> be cases there we can improve range based analysis as well and there may be
> cases where the combination of range information and relationships could be
> helpful.   Those all felt out of scope to me in terms of addressing the
> regression.  Happy to open a distinct issue on possibilities there.
> 
> The regression can be resolved entirely by looking at the relationship
> between and the types of the inputs.  Hence it's a distinct block of code in
> that routine and avoids most of the complexity in that routine.

Ok.

> I agree that the most common cases should be all the arguments the same
> type.  I was working under the assumption that the args would be compatible
> types already, forgetting that IFNs are different in that regard than other
> gimple ops.  I wouldn't want to go any further than all three operands the
> same with the easy to reason about relation checks.
> 
> 
> For gcc-16 I think we can extend that block fairly easily to handle certain
> mismatched size cases and we look to see if there's cases where the
> combination of a relationship between the arguments and some range
> information would allow us to capture further cases.

For the GCC 16 version, I think best would be (given Andrew's mail that
the relations aren't likely very useful for incompatible types) to
  relation_kind rel = VREL_VARYING;
  if (subcode == MINUS_EXPR
      && types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)))
    {
      rel = query->relation ().query (s, op0, op1);
      /* The result of the infinite precision subtraction of
         the same values will be always 0.  That will fit into any result
         type.  */
      if (rel == VREL_EQ)
        return true;
    }

then do the current
  int_range_max vr0, vr1;
  if (!query->range_of_expr (vr0, op0, s) || vr0.undefined_p ())
    vr0.set_varying (TREE_TYPE (op0));
  if (!query->range_of_expr (vr1, op1, s) || vr1.undefined_p ())
    vr1.set_varying (TREE_TYPE (op1));

  tree vr0min = wide_int_to_tree (TREE_TYPE (op0), vr0.lower_bound ());
  tree vr0max = wide_int_to_tree (TREE_TYPE (op0), vr0.upper_bound ());
  tree vr1min = wide_int_to_tree (TREE_TYPE (op1), vr1.lower_bound ());
  tree vr1max = wide_int_to_tree (TREE_TYPE (op1), vr1.upper_bound ());

and then we can e.g. special case > and >=:
  /* If op1 is not negative, op0 - op1 for op0 >= op1 will be always
 in [0, op0] and so if vr0max - vr1min fits into type, there won't
 be any overflow.  */
  if ((rel == VREL_GT || rel == VREL_GE)
      && tree_int_cst_sgn (vr1min) >= 0
      && !arith_overflowed_p (MINUS_EXPR, type, vr0max, vr1min))
    return true;

Would need to think about if anything could be simplified for
VREL_G{T,E} if tree_int_cst_sgn (vr1min) < 0.

As for VREL_LT, one would need to think it through as well for both
tree_int_cst_sgn (vr1min) >= 0 and tree_int_cst_sgn (vr1min) < 0.
For the former, the infinite precision of subtraction is known given
the relation to be < 0.  Now obviously if TYPE_UNSIGNED (type) that
would imply always overflow.  But for !TYPE_UNSIGNED (type) that isn't
necessarily the case and the question is if the relation helps with the
reasoning.  Generally the code otherwise tries to check 2 boundaries
(for MULT_EXPR 4 but we don't care about that), if they both don't overflow,
it is ok, if only one overflows, we don't know, if both boundaries don't
overflow, we need to look further and check some corner cases in between.

Or just go with that even for GCC 15 (completely untested and dunno if
something needs to be done about s = NULL passed to query or not) for
now, with the advantage that it can do something even for the cases where
type is not compatible with types of arguments, and perhaps add additional
cases later?

--- gcc/vr-values.cc.jj 2025-01-13 09:12:09.461954569 +0100
+++ gcc/vr-values.cc2025-02-12 22:18:51.696314406 +0100
@@ -85,6 +85,19 @@ check_for_binary_op_overflow (range_quer
  enum tree_code subcode, tree type,
  tree op0, tree op1, bool *ovf, gimple *s = NULL)
 {
+  relation_kind rel = VREL_VARYING;
+  /* For subtraction see if relations could simplify it.  */
+  if (subcode == MINUS_EXPR
+      && types_compatible_p (TREE_TYPE (op0), TREE_TYPE (op1)))
+    {
+  rel = query->relation().query (s, op0, op1);
+  /* The result of 

RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-12 Thread Tamar Christina
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, February 12, 2025 3:20 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: RE: [PATCH v2]middle-end: delay checking for alignment to load
> [PR118464]
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, February 12, 2025 2:58 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load
> > [PR118464]
> >
> > On Tue, 11 Feb 2025, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This fixes two PRs on Early break vectorization by delaying the safety 
> > > checks to
> > > vectorizable_load when the VF, VMAT and vectype are all known.
> > >
> > > This patch does add two new restrictions:
> > >
> > > 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven
> > >group sizes, as they are unaligned every n % 2 iterations and so may 
> > > cross
> > >a page unwittingly.
> > >
> > > 2. On LOAD_LANES targets when the buffer is unknown, we reject 
> > > vectorization
> > if
> > >we cannot peel for alignment, as the alignment requirement is quite 
> > > large at
> > >GROUP_SIZE * vectype_size.  This is unlikely to ever be beneficial so 
> > > we
> > >don't support it for now.
> > >
> > > There are other steps documented inside the code itself so that the 
> > > reasoning
> > > is next to the code.
> > >
> > > Note that for VLA I have still left this fully disabled when not working 
> > > on a
> > > fixed buffer.
> > >
> > > For VLA targets like SVE return element alignment as the desired vector
> > > alignment.  This means that the loads are never misaligned and so,
> > > annoyingly, it won't ever need to peel.
> > >
> > > So what I think needs to happen in GCC 16 is that.
> > >
> > > 1. during vect_compute_data_ref_alignment we need to take the max of
> > >POLY_VALUE_MIN and vector_alignment.
> > >
> > > 2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard 
> > > add
> a
> > >check that ncopies * vectype does not exceed POLY_VALUE_MAX which we
> use
> > as a
> > >proxy for pagesize.
> > >
> > > 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
> > >vect_determine_partial_vectors_and_peeling since the first iteration 
> > > has to
> > >be partial. If LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P we have to fail to
> > >vectorize.
> > >
> > > 4. Create a default mask to be used, so that
> > vect_use_loop_mask_for_alignment_p
> > >becomes true and we generate the peeled check through loop control for
> > >partial loops.  From what I can tell this won't work for
> > >LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling
> > support at
> > >all in the compiler.  That would need to be done independently from the
> > >above.
> >
> > We basically need to implement peeling/versioning for alignment based
> > on the actual POLY value with the fallback being first-fault loads.
> >
> > > In any case, not GCC 15 material so I've kept the WIP patches I have
> > downstream.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/118464
> > >   PR tree-optimization/116855
> > >   * doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
> > >   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
> > >   checks.
> > >   (vect_compute_data_ref_alignment): Remove alignment checks and move
> > to
> > >   get_load_store_type, increase group access alignment.
> > >   (vect_enhance_data_refs_alignment): Add note to comment needing
> > >   investigating.
> > >   (vect_analyze_data_refs_alignment): Likewise.
> > >   (vect_supportable_dr_alignment): For group loads look at first DR.
> > >   * tree-vect-stmts.cc (get_load_store_type):
> > >   Perform safety checks for early break pfa.
> > >   * tree-vectorizer.h (dr_peeling_alignment,
> > >   dr_set_peeling_alignment): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR tree-optimization/118464
> > >   PR tree-optimization/116855
> > >   * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
> > >   load type is relaxed later.
> > >   * gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
> > >   * gcc.dg/vect/vect-early-break_22.c: Reject for load_lanes targets
> > >   * g++.dg/vect/vect-early-break_7-pr118464.cc: New test.
> > >   * gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
> > >   * gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
> > >   * gcc.dg/vect/vect-early-brea

[WWWDOCS, COMMITTED] gcc-15: Update OpenMP release notes

2025-02-12 Thread Sandra Loosemore
---
 htdocs/gcc-15/changes.html | 80 --
 1 file changed, 50 insertions(+), 30 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index e273693a..d7919379 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -65,37 +65,57 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+OpenMP
 
-  OpenMP
-  
-
-  Support for unified-shared memory has been added for some AMD and Nvidia
-  GPU devices, enabled when using the unified_shared_memory
-  clause to the requires directive. For details,
-  see the offload-target specifics section in the
-  https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html";
-  >GNU Offloading and Multi Processing Runtime Library Manual.
-  GCC added ompx_gnu_pinned_mem_alloc as https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html";>predefined
-  allocator. On https://gcc.gnu.org/onlinedocs/libgomp/nvptx.html";>Nvidia
-  GPUs, writing to the terminal from OpenMP target regions (but not 
from
-  OpenACC compute regions) is now also supported in Fortran; in C/C++ and
-  on AMD GPUs this was already supported before with both OpenMP and 
OpenACC.
-  Constructors and destructors on the device side for declare 
target
-  static aggregates are now handled.
-
-
-  OpenMP 5.1: The unroll and tile
-  loop-transformation constructs are now supported.
-
-
-  OpenMP 6.0: The https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fdevice_005ffrom_005fuid.html";
-  >get_device_from_uid and https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fuid_005ffrom_005fdevice.html";>
-  omp_get_uid_from_device API routines have been added.
-
-  
+  
+Support for unified-shared memory has been added for some AMD and Nvidia
+GPU devices, enabled when using the unified_shared_memory
+clause to the requires directive.
+The OpenMP 6.0 self_maps clause is also now supported.
+For details,
+see the offload-target specifics section in the
+<a href="https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html"
+   >GNU Offloading and Multi Processing Runtime Library Manual</a>.
+  
+  
+GCC added ompx_gnu_pinned_mem_alloc as a <a href="https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fALLOCATOR.html">predefined
+  allocator</a>.
+  
+  
+In C and Fortran, the allocate directive now supports
+static variables; stack variables were previously supported in
+those languages.  C++ support is not available yet.
+  
+  
+Offloading improvements:
+On <a href="https://gcc.gnu.org/onlinedocs/libgomp/nvptx.html">Nvidia
+GPUs</a>, writing to the terminal from OpenMP target regions (but not from
+OpenACC compute regions) is now also supported in Fortran; in C/C++ and
+on AMD GPUs this was already supported before with both OpenMP and OpenACC.
+Constructors and destructors on the device side for
+declare target static aggregates are now handled.
+  
+  
+The OpenMP 5.1 unroll and tile
+loop-transforming constructs are now supported.
+  
+  OpenMP 5.0 metadirectives are now supported, as are OpenMP 5.1 dynamic
+selectors in both metadirective and
+declare variant (the latter with some restrictions).
+  
+  
+The OpenMP 5.1 dispatch construct has been implemented
+with support for the adjust_args clause to the
+declare variant directive.
+  
+  
+OpenMP 6.0: The <a href="https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fdevice_005ffrom_005fuid.html"
+>omp_get_device_from_uid</a> and <a href="https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fuid_005ffrom_005fdevice.html">
+omp_get_uid_from_device</a> API routines have been added.
+  
 
 
 
-- 
2.34.1



[PATCH] c++: P2308, Template parameter initialization (tests) [PR113800]

2025-02-12 Thread Marek Polacek
Tested on x86_64-pc-linux-gnu, ok for trunk?  I'll also update cxx-status.html.

-- >8 --
This proposal was implemented a long time ago by my r9-5271,
but it took me this long to verify that it still works as per P2308.

This patch adds assorted tests, both from clang and from [temp.arg.nontype].
Fortunately I did not discover any issues in the compiler.

PR c++/113800
DR 2450

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing15.C: New test.
* g++.dg/cpp2a/nontype-class68.C: New test.
* g++.dg/cpp2a/nontype-class69.C: New test.
* g++.dg/cpp2a/nontype-class70.C: New test.
* g++.dg/cpp2a/nontype-class71.C: New test.
* g++.dg/cpp2a/nontype-class72.C: New test.
---
 gcc/testsuite/g++.dg/cpp26/pack-indexing15.C | 20 +
 gcc/testsuite/g++.dg/cpp2a/nontype-class68.C | 24 ++
 gcc/testsuite/g++.dg/cpp2a/nontype-class69.C | 27 +++
 gcc/testsuite/g++.dg/cpp2a/nontype-class70.C | 47 
 gcc/testsuite/g++.dg/cpp2a/nontype-class71.C | 19 
 gcc/testsuite/g++.dg/cpp2a/nontype-class72.C | 41 +
 6 files changed, 178 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp26/pack-indexing15.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class68.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class69.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class70.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class71.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class72.C

diff --git a/gcc/testsuite/g++.dg/cpp26/pack-indexing15.C b/gcc/testsuite/g++.dg/cpp26/pack-indexing15.C
new file mode 100644
index 000..3f8382b12cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/pack-indexing15.C
@@ -0,0 +1,20 @@
+// PR c++/113800
+// { dg-do compile { target c++26 } }
+// From LLVM's temp_arg_nontype_cxx2c.cpp.
+
+template
+concept C = sizeof(T...[1]) == 1;
+
+struct A {};
+
+template auto = A{}> struct Set {};
+
+template
+void
+foo ()
+{
+  Set u;
+}
+
+Set sb;
+Set sf; // { dg-error "placeholder constraints not satisfied" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class68.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class68.C
new file mode 100644
index 000..ade646e391b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class68.C
@@ -0,0 +1,24 @@
+// PR c++/113800
+// { dg-do compile { target c++20 } }
+// From [temp.arg.nontype].
+
+template struct B { /* ... */ };
+B<5> b1;// OK, template parameter type is int
+B<'a'> b2;  // OK, template parameter type is char
+B<2.5> b3;  // OK, template parameter type is double
+B b4;  // { dg-error ".void. is not a valid type for 
a template non-type parameter" }
+
+template struct C { /* ... */ };
+C<{ 42 }> c1;   // OK
+
+struct J1 {
+  J1 *self = this;
+};
+B j1; // { dg-error "not a constant expression" }
+
+struct J2 {
+  J2 *self = this;
+  constexpr J2() {}
+  constexpr J2(const J2&) {}
+};
+B j2; // { dg-error "not a constant expression" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class69.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class69.C
new file mode 100644
index 000..08b0a5ef73c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class69.C
@@ -0,0 +1,27 @@
+// PR c++/113800
+// { dg-do compile { target c++20 } }
+
+// DR 2450
+struct S { int a; };
+
+template
+void
+f ()
+{
+}
+
+void
+test ()
+{
+  f<{0}>();
+  f<{.a= 0}>();
+}
+
+// DR 2459
+struct A {
+  constexpr A (float) {}
+};
+
+template
+struct X {};
+X<1> x;
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class70.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class70.C
new file mode 100644
index 000..0e50847e440
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class70.C
@@ -0,0 +1,47 @@
+// PR c++/113800
+// P2308R1 - Template parameter initialization
+// { dg-do compile { target c++20 } }
+
+struct S {
+  int a = 0;
+  int b = 42;
+};
+
+template 
+struct A {
+  static constexpr auto a = t.a;
+  static constexpr auto b = t.b;
+};
+
+static_assert(A<{}>::a == 0);
+static_assert(A<{}>::b == 42);
+static_assert(A<{.a = 3}>::a == 3);
+static_assert(A<{.b = 4}>::b == 4);
+
+template
+struct D1 {};
+
+template
+struct D2 {};
+
+template 
+struct D3 {};
+
+struct E {};
+
+struct I {
+  constexpr I(E) {};
+};
+
+template
+struct W {};
+
+void
+g ()
+{
+  D1<> d1;
+  D2<> d2;
+  D3<> d3;
+
+  W w;
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class71.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class71.C
new file mode 100644
index 000..36ce5b16dee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class71.C
@@ -0,0 +1,19 @@
+// PR c++/113800
+// { dg-do compile { target c++20 } }
+// From LLVM's temp_arg_nontype_cxx2c.cpp.
+
+template
+struct A {
+  T x[I];
+};
+
+template
+A(T, U...) -> A;
+
+template void foo() { }
+
+void
+bar ()
+{
+  foo<{1}>();
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class72.C 
b/gcc/

Ping #4: [PATCH V4 0/5] Add more user friendly TARGET_ names for PowerPC

2025-02-12 Thread Michael Meissner
Ping for patches 1-5, adding more user-friendly TARGET_ names for PowerPC:

Note, I will be away on vacation from Tuesday, February 25th through Friday,
March 7th.  At this point in time, I do not anticipate bringing a laptop on
which I can respond to emails on this account.

Message-ID 

Information for patch set:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669067.html

Patch #1, Change TARGET_POPCNTB to TARGET_POWER5:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669068.html

Patch #2: Change TARGET_FPRND to TARGET_POWER5X:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669069.html

Patch #3: Change TARGET_CMPB to TARGET_POWER6:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669070.html

Patch #4: Change TARGET_POPCNTD to TARGET_POWER7:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669071.html

Patch #5: Change TARGET_MODULO to TARGET_POWER9:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669072.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #4: [PATCH] PR target/117487 Add power9/power10 float to logical operations

2025-02-12 Thread Michael Meissner
Ping patch to fix PR target/117487, Add power9/power10 float to logical
operations

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669137.html



Ping #4: [PATCH] PR target/108958: Use mtvsrdd to zero extend GPR DImode to VSX TImode

2025-02-12 Thread Michael Meissner
Ping patch for PR target/108958, Use mtvsrdd to zero extend GPR DImode to VSX
TImode

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669242.html



Ping #4: [PATCH V4 0/2] Separate PowerPC ISA bits from architecture bits set by -mcpu=

2025-02-12 Thread Michael Meissner
Ping patches 1-2 to separate PowerPC ISA bits from architecture bits set by
-mcpu=.

Message-ID 

Explanation of the patch set:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669108.html

Patch #1, add rs6000 architecture masks:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669109.html

Patch #2, use architecture flags for defining _ARCH_PWR macros:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669110.html



Ping #4: [PATCH report] PR target/99293 Optimize splat of a V2DF/V2DI extract with constant element

2025-02-12 Thread Michael Meissner
Ping patch to fix PR target/99293, Optimize splat of a V2DF/V2DI extract with
constant element:

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669136.html



[PATCH 1/2] x86: Add a pass to fold tail call

2025-02-12 Thread H.J. Lu
An x86 conditional branch (jcc) target can be either a label or a symbol.
Add a pass to fold tail calls with jcc by turning:

jcc .L6
...
.L6:
jmp tailcall

into:

jcc tailcall

After basic block reordering pass, conditional branches look like

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 23)
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))
...
(code_label 23 20 8 4 4 (nil) [1 uses])
(note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  ) [0 bar S1 A8])
(const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41]  )
(nil))
(nil))

If the branch edge destination is a basic block with only a direct
sibcall, change the jcc target to the sibcall target and decrement
the destination basic block's entry label use count.  Even though the
destination basic block is then unused, it must be kept, since it is
required by the RTL control-flow checks, and JUMP_LABEL of a conditional
jump can only point to a code label, not a code symbol.  Dummy sibcall
patterns are added so that sibcalls in basic blocks whose entry label
use count is 0 won't be generated.

Update final_scan_insn_1 to skip a label if its use count is 0.

gcc/

PR target/47253
* final.cc (final_scan_insn_1): Skip the unused label.
* config/i386/i386-features.cc (sibcall_only_bb): New.
(fold_sibcall): Likewise.
(pass_data_fold_sibcall): Likewise.
(pass_fold_sibcall): Likewise.
(make_pass_fold_sibcall): Likewise.
* config/i386/i386-passes.def: Add pass_fold_sibcall after
pass_reorder_blocks.
* config/i386/i386-protos.h (ix86_output_jcc_insn): New.
(make_pass_fold_sibcall): Likewise.
* config/i386/i386.cc (ix86_output_jcc_insn): Likewise.
* config/i386/i386.md (*jcc): Renamed to ...
(jcc): This.  Replace label_ref with symbol_label_operand.  Use
ix86_output_jcc_insn.  Set length to 6 if the branch target
isn't a label.
(*sibcall): Renamed to ...
(sibcall_): This.
(sibcall_dummy_): New.
(*sibcall_pop): Renamed to ...
(sibcall_pop): This.
(sibcall_pop_dummy): New.
(*sibcall_value): Renamed to ...
(sibcall_value_): This.
(sibcall_value_dummy_): New.
(*sibcall_value_pop): Renamed to ...
(sibcall_value_pop): This.
(sibcall_value_pop_dummy): New.
* config/i386/predicates.md (symbol_label_operand): Likewise.

gcc/testsuite/

PR target/47253
* gcc.target/i386/pr47253-1a.c: New file.
* gcc.target/i386/pr47253-1b.c: Likewise.
* gcc.target/i386/pr47253-2a.c: Likewise.
* gcc.target/i386/pr47253-2b.c: Likewise.
* gcc.target/i386/pr47253-3a.c: Likewise.
* gcc.target/i386/pr47253-3b.c: Likewise.
* gcc.target/i386/pr47253-3c.c: Likewise.
* gcc.target/i386/pr47253-4a.c: Likewise.
* gcc.target/i386/pr47253-4b.c: Likewise.
* gcc.target/i386/pr47253-5.c: Likewise.
* gcc.target/i386/pr47253-6.c: Likewise.
* gcc.target/i386/pr47253-7a.c: Likewise.
* gcc.target/i386/pr47253-7b.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc   | 208 +
 gcc/config/i386/i386-passes.def|   1 +
 gcc/config/i386/i386-protos.h  |   3 +
 gcc/config/i386/i386.cc|  12 ++
 gcc/config/i386/i386.md|  57 +-
 gcc/config/i386/predicates.md  |   4 +
 gcc/final.cc   |   4 +
 gcc/testsuite/gcc.target/i386/pr47253-1a.c |  24 +++
 gcc/testsuite/gcc.target/i386/pr47253-1b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-2a.c |  27 +++
 gcc/testsuite/gcc.target/i386/pr47253-2b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-3a.c |  32 
 gcc/testsuite/gcc.target/i386/pr47253-3b.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-3c.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-4a.c |  26 +++
 gcc/testsuite/gcc.target/i386/pr47253-4b.c |  18 ++
 gcc/testsuite/gcc.target/i386/pr47253-5.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-6.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-7a.c |  52 ++
 gcc/testsuite/gcc.target/i386/pr47253-7b.c |  36 
 20 files changed, 600 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47253-3

[PATCH 2/2] x86: Fold sibcall targets into jump table

2025-02-12 Thread H.J. Lu
Enhance fold sibcall pass to fold sibcall targets into jump table by
turning:

foo:
.cfi_startproc
cmpl$4, %edi
ja  .L1
movl%edi, %edi
jmp *.L4(,%rdi,8)
.section.rodata
.L4:
.quad   .L8
.quad   .L7
.quad   .L6
.quad   .L5
.quad   .L3
.text
.L5:
jmp bar3
.L3:
jmp bar4
.L8:
jmp bar0
.L7:
jmp bar1
.L6:
jmp bar2
.L1:
ret
.cfi_endproc

into:

foo:
.cfi_startproc
cmpl$4, %edi
ja  .L1
movl%edi, %edi
jmp *.L4(,%rdi,8)
.section.rodata
.L4:
.quad   bar0
.quad   bar1
.quad   bar2
.quad   bar3
.quad   bar4
.text
.L1:
ret
.cfi_endproc

After basic block reordering pass, jump tables look like:

(jump_table_data 16 15 17 (addr_vec:DI [
(label_ref:DI 18)
(label_ref:DI 22)
(label_ref:DI 26)
(label_ref:DI 30)
(label_ref:DI 34)
]))
...
(code_label 30 17 31 4 5 (nil) [1 uses])
(note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
) [0 bar3 S1 A8])
(const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
)
(nil))
(nil))

If the jump table entry points to a target basic block with only a direct
sibcall, change the entry to point to the sibcall target and decrement
the target basic block entry label use count.  If the target basic block
isn't kept for JUMP_LABEL of the conditional tailcall, delete it if its
entry label use count is 0.

Update create_trace_edges to skip symbol reference in jump table and
update final_scan_insn_1 to support symbol reference in jump table.

gcc/

PR target/14721
* dwarf2cfi.cc (create_trace_edges): Skip symbol reference in
jump table.
* final.cc (final_scan_insn_1): Support symbol reference in
jump table.
* config/i386/i386-features.cc (jump_table_label_to_basic_block):
New.
(fold_sibcall): Fold the sibcall targets into jump table.

gcc/testsuite/

PR target/14721
* gcc.target/i386/pr14721-1a.c: New.
* gcc.target/i386/pr14721-1b.c: Likewise.
* gcc.target/i386/pr14721-1c.c: Likewise.
* gcc.target/i386/pr14721-2a.c: Likewise.
* gcc.target/i386/pr14721-2b.c: Likewise.
* gcc.target/i386/pr14721-2c.c: Likewise.
* gcc.target/i386/pr14721-3a.c: Likewise.
* gcc.target/i386/pr14721-3b.c: Likewise.
* gcc.target/i386/pr14721-3c.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc   | 70 +-
 gcc/dwarf2cfi.cc   |  7 ++-
 gcc/final.cc   | 22 ++-
 gcc/testsuite/gcc.target/i386/pr14721-1a.c | 54 +
 gcc/testsuite/gcc.target/i386/pr14721-1b.c | 37 
 gcc/testsuite/gcc.target/i386/pr14721-1c.c | 37 
 gcc/testsuite/gcc.target/i386/pr14721-2a.c | 58 ++
 gcc/testsuite/gcc.target/i386/pr14721-2b.c | 41 +
 gcc/testsuite/gcc.target/i386/pr14721-2c.c | 43 +
 gcc/testsuite/gcc.target/i386/pr14721-3a.c | 56 +
 gcc/testsuite/gcc.target/i386/pr14721-3b.c | 40 +
 gcc/testsuite/gcc.target/i386/pr14721-3c.c | 39 
 12 files changed, 499 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-1c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr14721-3c.c

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index bb1e428bb1b..e89c8324f34 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3328,6 +3328,22 @@ sibcall_only_bb (basic_block bb, bitmap sibcall_bbs)
   return nullptr;
 }
 
+/* Return the sibcall target if the basic block referenced by LABEL only
+   has a direct sibcall.  */
+
+static rtx
+jump_table_label_to_basic_block (rtx label, bitmap sibcall_bbs)
+{
+  label = XEXP (label, 0);
+  basic_block bb = BLOCK_FOR_INSN (label);
+  rtx target = sibcall_only_bb (bb, sibcall_bbs);
+  /* Decrement the label use count if the jump table entry will use the
+     sibcall directly.  */
+  if (target)
+    LABEL_NUSES (label) -= 1;
+  return target;
+}
+
 /* Fold direct sibcall.  */
 
 static unsigned i

Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Jeff Law




On 2/12/25 4:27 PM, Edwin Lu wrote:

The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies. This resulted in a situation where the following occurs

 vsetvli a5,a1,e32,m1,tu,ma
 vle32.v v2,0(a0)
 sub a1,a1,a5 <-- a1 potentially set to 0
 sh2add  a0,a5,a0
 vfmacc.vv   v1,v2,v2
 vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
 beq a1,zero,.L12 <-- check if avl is 0

This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.

PR/117974

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.

Correct me if I'm wrong, but there's nothing inherently wrong with the
speculation from a correctness standpoint.  This is "just" a performance
issue, right?


And from a performance standpoint, speculation of the vsetvl could vary
pretty wildly based on uarch characteristics.  I can easily see cases
where it is wildly bad, wildly good, or where we really don't care.



Point being it seems like it should be controlled by a uarch setting 
rather than always avoiding or always enabling.


Other thoughts?

Jeff


[PATCH 0/2] x86: Add a pass to fold tail call

2025-02-12 Thread H.J. Lu
x86 conditional branch (jcc) target can be either a label or a symbol.
Add a pass to fold tail call with jcc by turning:

jcc .L6
...
.L6:
jmp tailcall

into:

jcc tailcall

After basic block reordering pass, conditional branches look like

(jump_insn 7 6 14 2 (set (pc)
(if_then_else (eq (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 23)
(pc))) "x.c":8:5 1458 {jcc}
 (expr_list:REG_DEAD (reg:CCZ 17 flags)
(int_list:REG_BR_PROB 217325348 (nil)))
...
(code_label 23 20 8 4 4 (nil) [1 uses])
(note 8 23 9 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 9 8 10 4 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  ) [0 bar S1 A8])
(const_int 0 [0])) "x.c":8:14 discrim 1 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar") [flags 0x41]  )
(nil))
(nil))

If the branch edge destination is a basic block with only a direct
sibcall, change the jcc target to the sibcall target and decrement
the destination basic block entry label use count.  Even though the
destination basic block is unused, it must be kept since it is required
by RTL control flow check and JUMP_LABEL of the conditional jump can
only point to a code label, not a code symbol.  Dummy sibcall patterns
are added so that sibcalls in basic blocks, whose entry label use count
is 0, won't be generated.

Jump tables like

foo:
.cfi_startproc
cmpl$4, %edi
ja  .L1
movl%edi, %edi
jmp *.L4(,%rdi,8)
.section.rodata
.L4:
.quad   .L8
.quad   .L7
.quad   .L6
.quad   .L5
.quad   .L3
.text
.L5:
jmp bar3
.L3:
jmp bar4
.L8:
jmp bar0
.L7:
jmp bar1
.L6:
jmp bar2
.L1:
ret
.cfi_endproc

can also be changed to:

foo:
.cfi_startproc
cmpl$4, %edi
ja  .L1
movl%edi, %edi
jmp *.L4(,%rdi,8)
.section.rodata
.L4:
.quad   bar0
.quad   bar1
.quad   bar2
.quad   bar3
.quad   bar4
.text
.L1:
ret
.cfi_endproc

After basic block reordering pass, jump tables look like:

(jump_table_data 16 15 17 (addr_vec:DI [
(label_ref:DI 18)
(label_ref:DI 22)
(label_ref:DI 26)
(label_ref:DI 30)
(label_ref:DI 34)
]))
...
(code_label 30 17 31 4 5 (nil) [1 uses])
(note 31 30 32 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(call_insn/j 32 31 33 4 (call (mem:QI (symbol_ref:DI ("bar3") [flags 0x41]  
) [0 bar3 S1 A8])
(const_int 0 [0])) "j.c":15:13 1469 {sibcall_di}
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("bar3") [flags 0x41]  
)
(nil))
(nil))

If the jump table entry points to a target basic block with only a direct
sibcall, change the entry to point to the sibcall target and decrement
the target basic block entry label use count.  If the target basic block
isn't kept for JUMP_LABEL of the conditional tailcall, delete it if its
entry label use count is 0.

Update final_scan_insn_1 to skip a label if its use count is 0 and
support symbol reference in jump table.  Update create_trace_edges to
skip symbol reference in jump table.

H.J. Lu (2):
  x86: Add a pass to fold tail call
  x86: Fold sibcall targets into jump table

 gcc/config/i386/i386-features.cc   | 274 +
 gcc/config/i386/i386-passes.def|   1 +
 gcc/config/i386/i386-protos.h  |   3 +
 gcc/config/i386/i386.cc|  12 +
 gcc/config/i386/i386.md|  57 -
 gcc/config/i386/predicates.md  |   4 +
 gcc/dwarf2cfi.cc   |   7 +-
 gcc/final.cc   |  26 +-
 gcc/testsuite/gcc.target/i386/pr14721-1a.c |  54 
 gcc/testsuite/gcc.target/i386/pr14721-1b.c |  37 +++
 gcc/testsuite/gcc.target/i386/pr14721-1c.c |  37 +++
 gcc/testsuite/gcc.target/i386/pr14721-2a.c |  58 +
 gcc/testsuite/gcc.target/i386/pr14721-2b.c |  41 +++
 gcc/testsuite/gcc.target/i386/pr14721-2c.c |  43 
 gcc/testsuite/gcc.target/i386/pr14721-3a.c |  56 +
 gcc/testsuite/gcc.target/i386/pr14721-3b.c |  40 +++
 gcc/testsuite/gcc.target/i386/pr14721-3c.c |  39 +++
 gcc/testsuite/gcc.target/i386/pr47253-1a.c |  24 ++
 gcc/testsuite/gcc.target/i386/pr47253-1b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-2a.c |  27 ++
 gcc/testsuite/gcc.target/i386/pr47253-2b.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr47253-3a.c |  32 +++
 gcc/testsuite/gcc.target/i386/pr47253-3b.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-3c.c |  20 ++
 gcc/testsuite/gcc.target/i386/pr47253-4a.c |  26 ++
 gcc/testsuite/gcc.target/i386/pr47253-4b.c |  18 ++
 gcc/testsuite/gcc.target/i386/pr47253-5.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-6.c  |  15 ++
 gcc/testsuite/gcc.target/i386/pr47253-7a.c |  52 
 gcc/testsuit

Ping #2: [PATCH, V2] Add Vector pair support

2025-02-12 Thread Michael Meissner
Ping the following patch:

Note, I will be away on vacation from Tuesday February 25th through Friday
March 7th.  At this point of time, I do not anticipate bringing a laptop that I
can respond to emails on this account.

Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670787.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Edwin Lu
The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies. This resulted in a situation where the following occurs

vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
sub a1,a1,a5 <-- a1 potentially set to 0
sh2add  a0,a5,a0
vfmacc.vv   v1,v2,v2
vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update vl to 0
beq a1,zero,.L12 <-- check if avl is 0

This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.

PR/117974

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/riscv.cc | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6e14126e3a4..24450bae517 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10209,6 +10209,23 @@ riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn *insn, int cost,
   return new_cost;
 }
 
+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN hook.  Return true if INSN
+   can be scheduled for speculative execution.  Reject vsetvl instructions
+   to prevent the scheduler from hoisting them out of basic blocks without
+   checking for data dependencies (PR117974).  */
+static bool
+riscv_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_VSETVL:
+    case TYPE_VSETVL_PRE:
+      return false;
+    default:
+      return true;
+    }
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -14055,6 +14072,9 @@ bool need_shadow_stack_push_pop_p ()
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost
 
+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
 
-- 
2.43.0



Re: [PATCH] gcc: testsuite: Fix builtin-speculation-overloads[14].C testism

2025-02-12 Thread Jason Merrill

On 2/12/25 5:23 PM, mmalcom...@nvidia.com wrote:

From: Matthew Malcomson 

I've posted the patch on the relevant Bugzilla, but am also sending it to
the mailing list.  If I should have only done one, please do mention.


Having it in both places is fine, or send to the mailing list and put a 
link to the list in bugzilla.



- 8< --- >8 
When making warnings trigger a failure in template substitution I
could not find any way to trigger the warning about builtin speculation
not being available on the given target.

Turns out I misread the code -- this warning happens when the
speculation_barrier pattern is not defined.

Here we add an effective target to represent
"__builtin_speculation_safe_value is available on this target" and use
that to adjust our test on SFINAE behaviour accordingly.
N.b. this means that we get extra testing -- not just that things work
on targets which support __builtin_speculation_safe_value, but also that
the behaviour works on targets which don't support it.

Tested with AArch64 native, AArch64 cross compiler, and RISC-V cross
compiler (just running the tests that I've changed).

Points of interest for any reviewer:

In the new `check_known_compiler_messages_nocache` procedure I use some


Why is it not enough to look for the message with "[regexp" like 
check_alias_available does?


Jason



Re: [PATCH] c++: Constrain visibility for CNTTPs with internal types [PR118849]

2025-02-12 Thread Jason Merrill

On 2/12/25 1:22 PM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14?


OK.


-- >8 --

While looking into PR118846 I noticed that we don't currently constrain
the linkage of functions involving CNTTPs of internal-linkage types.  It
seems to me that this would be sensible to do.

 PR c++/118849

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r): Constrain visibility according to
the type of decl_constant_var_p decls.

gcc/testsuite/ChangeLog:

* g++.dg/template/linkage6.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl2.cc  |  4 +++-
  gcc/testsuite/g++.dg/template/linkage6.C | 12 
  2 files changed, 15 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/linkage6.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 4415cea93e0..9a76e00dcde 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2820,7 +2820,9 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
/* The ODR allows definitions in different TUs to refer to distinct
   constant variables with internal or no linkage, so such a reference
   shouldn't affect visibility (PR110323).  FIXME but only if the
-  lvalue-rvalue conversion is applied.  */;
+  lvalue-rvalue conversion is applied.  We still want to restrict
+  visibility according to the type of the declaration however.  */
+   tpvis = type_visibility (TREE_TYPE (t));
else if (! TREE_PUBLIC (t))
tpvis = VISIBILITY_ANON;
else
diff --git a/gcc/testsuite/g++.dg/template/linkage6.C b/gcc/testsuite/g++.dg/template/linkage6.C
new file mode 100644
index 000..fb589f67874
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/linkage6.C
@@ -0,0 +1,12 @@
+// { dg-do compile { target c++20 } }
+// { dg-final { scan-assembler-not "(weak|glob)\[^\n\]*_Z" { xfail powerpc-*-aix* } } }
+
+namespace {
+  struct A {};
+}
+
+template  void f() { }
+
+int main() {
+  f();
+}




Re: [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling

2025-02-12 Thread Edwin Lu
Oops I made a mental note to add one and then completely forgot. I'll 
run the testsuite again with the testcase before sending it up as v2.


Edwin

On 2/12/2025 3:38 PM, 钟居哲 wrote:

Could you add PR117974 testcase ?


juzhe.zh...@rivai.ai

*From:* Edwin Lu 
*Date:* 2025-02-13 07:27
*To:* gcc-patches 
*CC:* gnu-toolchain ; vineetg
; juzhe.zhong
; Edwin Lu 
*Subject:* [PATCH] RISC-V: Prevent speculative vsetvl insn scheduling
The instruction scheduler appears to be speculatively hoisting vsetvl
insns outside of their basic block without checking for data
dependencies. This resulted in a situation where the following occurs
    vsetvli a5,a1,e32,m1,tu,ma
    vle32.v v2,0(a0)
    sub a1,a1,a5 <-- a1 potentially set to 0
    sh2add  a0,a5,a0
    vfmacc.vv   v1,v2,v2
    vsetvli a5,a1,e32,m1,tu,ma <-- incompatible vinfo. update
vl to 0
    beq a1,zero,.L12 <-- check if avl is 0
This patch would essentially delay the vsetvl update to after the branch
to prevent unnecessarily updating the vinfo at the end of a basic block.
PR/117974
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_can_speculate_insn):
(TARGET_SCHED_CAN_SPECULATE_INSN): Implement.
Signed-off-by: Edwin Lu 
---
gcc/config/riscv/riscv.cc | 20 
1 file changed, 20 insertions(+)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6e14126e3a4..24450bae517 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10209,6 +10209,23 @@ riscv_sched_adjust_cost (rtx_insn *, int, rtx_insn *insn, int cost,
   return new_cost;
}
+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN hook.  Return true if INSN
+   can be scheduled for speculative execution.  Reject vsetvl
+   instructions to prevent the scheduler from hoisting them out of
+   basic blocks without checking for data dependencies (PR117974).  */
+static bool
+riscv_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+    {
+    case TYPE_VSETVL:
+    case TYPE_VSETVL_PRE:
+      return false;
+    default:
+      return true;
+    }
+}
+
/* Auxiliary function to emit RISC-V ELF attribute. */
static void
riscv_emit_attribute ()
@@ -14055,6 +14072,9 @@ bool need_shadow_stack_push_pop_p ()
#undef  TARGET_SCHED_ADJUST_COST
#define TARGET_SCHED_ADJUST_COST riscv_sched_adjust_cost
+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN riscv_sched_can_speculate_insn
+
#undef TARGET_FUNCTION_OK_FOR_SIBCALL
#define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
-- 
2.43.0




Ping #2: [PATCH V2], Add PowerPC Dense Math Support for future cpus

2025-02-12 Thread Michael Meissner
Ping patches for adding initial dense math support to a potential future PowerPC
processor:

Note, I will be away on vacation from Tuesday February 25th through Friday
March 7th.  At this point of time, I do not anticipate bringing a laptop that I
can respond to emails on this account.

Explanation of the patches:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670789.html

Patch 1 of 3, add wD constraint:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670790.html

Patch 2 of 3, add support for dense math registers:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670791.html

Patch 3 of 3, add support for 1,024 bit dense math registers:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/670792.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] tree-optimization/90579 - avoid STLF fail by better optimizing

2025-02-12 Thread Andrew Pinski
On Wed, Feb 12, 2025 at 6:58 AM Richard Biener  wrote:
>
> For the testcase in question which uses a fold-left vectorized
> reduction of a reverse iterating loop we'd need two forwprop
> invocations to first bypass the permute emitted for the reverse
> iterating loop and then to decompose the vector load that only
> feeds element extracts.  The following moves the first transform
> to a match.pd pattern and makes sure we fold the element extracts
> when the vectorizer emits them so the single forwprop pass can
> then pick up the vector load decomposition, avoiding the forwarding
> fail that causes.
>
> Moving simplify_bitfield_ref also makes forwprop remove the dead
> VEC_PERM_EXPR via the simple-dce it uses - this was also
> previously missing.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

LGTM

>
> Thanks,
> Richard.
>
> PR tree-optimization/90579
> * tree-ssa-forwprop.cc (simplify_bitfield_ref): Move to
> match.pd.
> (pass_forwprop::execute): Adjust.
> * match.pd (bit_field_ref (vec_perm ...)): New pattern
> modeled after simplify_bitfield_ref.
> * tree-vect-loop.cc (vect_expand_fold_left): Fold the
> element extract stmt, combining it with the vector def.
>
> * gcc.target/i386/pr90579.c: New testcase.
> ---
>  gcc/match.pd|  56 +
>  gcc/testsuite/gcc.target/i386/pr90579.c |  23 ++
>  gcc/tree-ssa-forwprop.cc| 103 +---
>  gcc/tree-vect-loop.cc   |   5 ++
>  4 files changed, 85 insertions(+), 102 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90579.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 20b2aec6f37..ea44201f2eb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -9538,6 +9538,62 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / const_k)->value; }
>@1 { bitsize_int ((idx % const_k) * width); })
>
> +(simplify
> + (BIT_FIELD_REF (vec_perm@0 @1 @2 VECTOR_CST@3) @rsize @rpos)
> + (with
> +  {
> +tree elem_type = TREE_TYPE (TREE_TYPE (@0));
> +poly_uint64 elem_size = tree_to_poly_uint64 (TYPE_SIZE (elem_type));
> +poly_uint64 size = tree_to_poly_uint64 (TYPE_SIZE (type));
> +unsigned HOST_WIDE_INT nelts, idx;
> +unsigned HOST_WIDE_INT nelts_op = 0;
> +  }
> +  (if (constant_multiple_p (tree_to_poly_uint64 (@rpos), elem_size, &idx)
> +   && VECTOR_CST_NELTS (@3).is_constant (&nelts)
> +   && (known_eq (size, elem_size)
> +  || (constant_multiple_p (size, elem_size, &nelts_op)
> +  && pow2p_hwi (nelts_op
> +   (with
> +{
> +  bool ok = true;
> +  /* One element.  */
> +  if (known_eq (size, elem_size))
> +idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (@3, idx)) % (2 * nelts);
> +  else
> +{
> + /* Clamp vec_perm_expr index.  */
> + unsigned start
> +   = TREE_INT_CST_LOW (vector_cst_elt (@3, idx)) % (2 * nelts);
> + unsigned end
> +   = (TREE_INT_CST_LOW (vector_cst_elt (@3, idx + nelts_op - 1))
> +  % (2 * nelts));
> + /* Be in the same vector.  */
> + if ((start < nelts) != (end < nelts))
> +   ok = false;
> + else
> +   for (unsigned HOST_WIDE_INT i = 1; i != nelts_op; i++)
> + {
> +   /* Continuous area.  */
> +   if ((TREE_INT_CST_LOW (vector_cst_elt (@3, idx + i))
> +% (2 * nelts) - 1)
> +   != (TREE_INT_CST_LOW (vector_cst_elt (@3, idx + i - 1))
> +   % (2 * nelts)))
> + {
> +   ok = false;
> +   break;
> + }
> + }
> + /* Alignment not worse than before.  */
> + if (start % nelts_op)
> +   ok = false;
> + idx = start;
> +   }
> +}
> +(if (ok)
> + (if (idx < nelts)
> +  (BIT_FIELD_REF @1 @rsize { bitsize_int (idx * elem_size); })
> +  (BIT_FIELD_REF @2 @rsize { bitsize_int ((idx - nelts) * elem_size); })))
> +
>  /* Simplify a bit extraction from a bit insertion for the cases with
> the inserted element fully covering the extraction or the insertion
> not touching the extraction.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr90579.c b/gcc/testsuite/gcc.target/i386/pr90579.c
> new file mode 100644
> index 000..ab48a44063c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr90579.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mfpmath=sse" } */
> +
> +extern double r[6];
> +extern double a[];
> +
> +double
> +loop (int k, double x)
> +{
> +  int i;
> +  double t=0;
> +  for (i=0;i<6;i++)
> +r[i] = x * a[i + k];
> +  for (i=0;i<6;i++)
> +t+=r[5-i];
> +  return t;
> +}
> +
> +/* Verify we end up with scalar loads from r for the final sum.  */
> +/* { d

Ping #4: [PATCH repost] PR target/117251 Add PowerPC XXEVAL support for fusion optimization in power10

2025-02-12 Thread Michael Meissner
Ping patch to fix PR target/117251, Add PowerPC XXEVAL support for fusion
optimization in power10

Note, I will be away on vacation from Tuesday February 25th through Friday
March 7th.  At this point of time, I do not anticipate bringing a laptop that I
can respond to emails on this account.

Message-ID 

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669138.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com