Re: [PATCH] fortran: Fix debug info for unsigned(kind=1) and unsigned(kind=4) [PR120193]

2025-05-10 Thread Thomas Koenig

Hi Jakub,


The following patch uses a variant of the character(kind=4) type
for unsigned(kind=4) and a distinct type based on character(kind=1)
type for unsigned(kind=1).  The reason for the latter is that
unsigned_char_type_node has TYPE_STRING_FLAG set on it, so it has
DW_AT_encoding DW_ATE_unsigned_char rather than DW_ATE_unsigned and
so the debugger then likes to print it as characters rather than numbers.
That is IMHO in Fortran desirable for character(kind=1) but not for
unsigned(kind=1).  I've made sure TYPE_CANONICAL of the unsigned(kind=1)
type is still character(kind=1), so they are considered compatible by
the middle-end also e.g. for aliasing etc.


Ah, that's how to do it.  I wasn't really comfortable re-using
CHARACTER, but I had found no better way.


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and later on for 15.2?


Yes.  And thanks for the patch!

Best regards

Thomas



[PATCH v4 1/2] RISC-V: Support RISC-V Profiles 20/22.

2025-05-10 Thread Jiawei
This patch introduces support for RISC-V Profiles RV20 and RV22 [1],
enabling developers to utilize these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0

Version log:
Using lowercase letters to present Profiles.
Using '_' as divsor between Profiles and other RISC-V extension.
Add descriptions in invoke.texi.
Checking if there exist '_' between Profiles and additional extensions.
Using std::string to avoid memory problems.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_profiles): New 
struct.
(riscv_subset_list::parse_profiles): New parser.
(riscv_subset_list::parse_base_ext): Ditto.
* config/riscv/riscv-subset.h: New def.
* doc/invoke.texi: New option descriptions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-49.c: New test.
* gcc.target/riscv/arch-50.c: New test.
* gcc.target/riscv/arch-51.c: New test.
* gcc.target/riscv/arch-52.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc   | 85 ++-
 gcc/config/riscv/riscv-subset.h   |  2 +
 gcc/doc/invoke.texi   | 17 ++--
 .../aarch64/sve/acle/general/attributes_7.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-49.c  |  5 ++
 gcc/testsuite/gcc.target/riscv/arch-50.c  | 12 +++
 gcc/testsuite/gcc.target/riscv/arch-51.c  | 12 +++
 gcc/testsuite/gcc.target/riscv/arch-52.c  |  6 ++
 8 files changed, 131 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-49.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-50.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-51.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-52.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index ca14eb96b2..0fa2e21f27 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -274,6 +274,12 @@ struct riscv_ext_version
   int minor_version;
 };
 
+struct riscv_profiles
+{
+  const char *profile_name;
+  const char *profile_string;
+};
+
 /* All standard extensions defined in all supported ISA spec.  */
 static const struct riscv_ext_version riscv_ext_version_table[] =
 {
@@ -502,6 +508,31 @@ static const struct riscv_ext_version riscv_combine_info[] 
=
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
 
+/* This table records the mapping form RISC-V Profiles into march string.  */
+static const riscv_profiles riscv_profiles_table[] =
+{
+  /* RVI20U only contains the base extension 'i' as mandatory extension.  */
+  {"rvi20u64", "rv64i"},
+  {"rvi20u32", "rv32i"},
+
+  /* RVA20U contains the 'i,m,a,f,d,c,zicsr,zicntr,ziccif,ziccrse,ziccamoa,
+ zicclsm,za128rs' as mandatory extensions.  */
+  {"rva20u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_za128rs"},
+
+  /* RVA22U contains the 'i,m,a,f,d,c,zicsr,zihintpause,zba,zbb,zbs,zicntr,
+ zihpm,ziccif,ziccrse,ziccamoa, zicclsm,zic64b,za64rs,zicbom,zicbop,zicboz,
+ zfhmin,zkt' as mandatory extensions.  */
+  {"rva22u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt"},
+
+  /* Currently we do not define S/M mode Profiles in gcc part.  */
+
+  /* Terminate the list.  */
+  {NULL, NULL}
+};
+
 static const riscv_cpu_info riscv_cpu_tables[] =
 {
 #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \
@@ -1109,6 +1140,52 @@ riscv_subset_list::parsing_subset_version (const char 
*ext,
   return p;
 }
 
+/* Parsing RISC-V Profiles in -march string.
+   Return string with mandatory extensions of Profiles.  */
+std::string
+riscv_subset_list::parse_profiles (const char *arch)
+{
+  /* Checking if input string contains a Profiles.
+There are two cases use Profiles in -march option:
+
+1. Only use Profiles in '-march' as input
+2. Mixed Profiles with other extensions
+
+Use '_' to split Profiles and other extension.  */
+std::string p(arch);
+const size_t p_len = p.size();
+
+for (int i = 0; riscv_profiles_table[i].profile_name != nullptr; ++i)
+{
+  const std::string& p_name = riscv_profiles_table[i].profile_name;
+  const std::string& p_str = riscv_profiles_table[i].profile_string;
+  size_t pos = p.find(p_name);
+  /* Find profile at the begin.  */
+  if (pos == 0 && pos + p_name.size() <= p_len)
+   {
+ size_t after_pos = pos + p_name.size();
+ std::string after_part = p.substr(after_pos);
+
+ /* If there're only profile, return the profile_string directly.  */
+ if (after_part[0] == '\0')
+   return p_str;
+
+ /* If isn't '_' after profile, need to add it and mention the user.  
*/
+ if (after_part[0] != '_')
+ {
+   warning_at (m_loc, 0, "Should use \"%c\" to contact Profiles with 
other "
+ "extensions", '_');
+  

[PATCH v4 2/2] RISC-V: Support RISC-V Profiles 23.

2025-05-10 Thread Jiawei
This patch introduces support for RISC-V Profiles RV23A and RV23B [1],
enabling developers to utilize these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-ratified

Version log:
Update the testcases, using lowercase letter.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New profile.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-53.c: New test.
* gcc.target/riscv/arch-54.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc  | 16 
 gcc/testsuite/gcc.target/riscv/arch-53.c | 11 +++
 gcc/testsuite/gcc.target/riscv/arch-54.c | 10 ++
 3 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-53.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-54.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 0fa2e21f27..e06cd5fa57 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -527,6 +527,22 @@ static const riscv_profiles riscv_profiles_table[] =
"_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
"_zicboz_zfhmin_zkt"},
 
+  /* RVA23 contains all mandatory base ISA for RVA22U64 and the new extension
+ 'v,zihintntl,zvfhmin,zvbb,zvkt,zicond,zimop,zcmop,zfa,zawrs' as mandatory
+ extensions.  */
+  {"rva23u64", "rv64imafdcv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb"
+   "_zfa_zawrs"},
+
+  /* RVB23 contains all mandatory base ISA for RVA22U64 and the new extension
+ 'zihintntl,zicond,zimop,zcmop,zfa,zawrs' as mandatory
+ extensions.  */
+  {"rvb23u64", "rv64imafdc_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
+   "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
+   "_zicboz_zfhmin_zkt_zihintntl_zicond_zimop_zcmop_zcb"
+   "_zfa_zawrs"},
+
   /* Currently we do not define S/M mode Profiles in gcc part.  */
 
   /* Terminate the list.  */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-53.c 
b/gcc/testsuite/gcc.target/riscv/arch-53.c
new file mode 100644
index 00..8210978ee8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-53.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rva23u64 -mabi=lp64d" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0"
+"_b1p0_v1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0"
+_ziccrse1p0_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0"
+_za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0"
+_zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0"
+_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0\"" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-54.c 
b/gcc/testsuite/gcc.target/riscv/arch-54.c
new file mode 100644
index 00..6d242dfba5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-54.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rvb23u64 -mabi=lp64d" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0"
+"_b1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0_ziccrse1p0"
+"_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0_za64rs1p0"
+"_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0_zba1p0"
+"_zbb1p0_zbs1p0_zkt1p0\"" } } */
-- 
2.43.0



New Swedish PO file for 'gcc' (version 15.1.0)

2025-05-10 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-15.1.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-10 Thread Jonathan Yong

On 5/9/25 4:26 PM, LIU Hao wrote:

在 2025-5-3 20:52, LIU Hao 写道:

在 2025-5-2 01:25, LIU Hao 写道:
Remove `STACK_REALIGN_DEFAULT` for this target, because now the 
default value of `incoming_stack_boundary` equals 
`MIN_STACK_BOUNDARY` and it doesn't have an effect any more.





I suddenly realized the previous patch was for GCC 15 branch. Here's a 
new one, rebased on master.





Ping.

Also, I think I need some comments on the `force_align_arg_pointer` 
hunk. There's really no good reason for the change, except that 'we had 
better let the attribute do something.'




Can you please rebase your patch? Thanks.



Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-10 Thread LIU Hao

在 2025-5-10 20:48, Jonathan Yong 写道:

On 5/9/25 4:26 PM, LIU Hao wrote:

在 2025-5-3 20:52, LIU Hao 写道:

在 2025-5-2 01:25, LIU Hao 写道:
Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of 
`incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more.





I suddenly realized the previous patch was for GCC 15 branch. Here's a new one, 
rebased on master.




Ping.

Also, I think I need some comments on the `force_align_arg_pointer` hunk. There's really no good reason 
for the change, except that 'we had better let the attribute do something.'




Can you please rebase your patch? Thanks.


Here is a patch, rebased on current master.


--
Best regards,
LIU Hao

---
 gcc/config/i386/cygming.h | 9 -
 gcc/config/i386/i386.cc   | 9 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index d587d25a58a8..743cc38f5852 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -28,16 +28,15 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_SEH
 #define TARGET_SEH  (TARGET_64BIT_MS_ABI && flag_unwind_tables)

+#undef PREFERRED_STACK_BOUNDARY_DEFAULT
+#define PREFERRED_STACK_BOUNDARY_DEFAULT \
+  (TARGET_64BIT ? 128 : MIN_STACK_BOUNDARY)
+
 /* Win64 with SEH cannot represent DRAP stack frames.  Disable its use.
Force the use of different mechanisms to allocate aligned local data.  */
 #undef MAX_STACK_ALIGNMENT
 #define MAX_STACK_ALIGNMENT  (TARGET_SEH ? 128 : MAX_OFILE_ALIGNMENT)

-/* 32-bit Windows aligns the stack on a 4-byte boundary but SSE instructions
-   may require 16-byte alignment.  */
-#undef STACK_REALIGN_DEFAULT
-#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)
-
 /* Support hooks for SEH.  */
 #undef  TARGET_ASM_UNWIND_EMIT
 #define TARGET_ASM_UNWIND_EMIT  i386_pe_seh_unwind_emit
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fd36ea802c00..9c24a926a890 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -7942,6 +7942,15 @@ ix86_update_stack_boundary (void)
   if (ix86_tls_descriptor_calls_expanded_in_cfun
   && crtl->preferred_stack_boundary < 128)
 crtl->preferred_stack_boundary = 128;
+
+  /* For 32-bit MS ABI, both the incoming and preferred stack boundaries
+ are 32 bits, but if force_align_arg_pointer is specified, it should
+ prefer 128 bits for a backward-compatibility reason, which is also
+ what the doc suggests.  */
+  if (lookup_attribute ("force_align_arg_pointer",
+   TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl)))
+  && crtl->preferred_stack_boundary < 128)
+crtl->preferred_stack_boundary = 128;
 }

 /* Handle the TARGET_GET_DRAP_RTX hook.  Return NULL if no DRAP is
--
2.49.0




From 5a49e9a4aa1da8ade48c55df666d03e03f36e797 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Tue, 29 Apr 2025 10:43:06 +0800
Subject: [PATCH] i386/cygming: Decrease default preferred stack boundary for
 32-bit targets

This commit decreases the default preferred stack boundary to 4.

In i386-options.cc, there's

   ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;

which sets the default incoming stack boundary to this value, if it's not
overridden by other options or attributes.

Previously, GCC preferred 16-byte alignment like other platforms, unless
`-miamcu` was specified. However, the Microsoft x86 ABI only requires the
stack be aligned to 4-byte boundaries. Callback functions from MSVC code may
break this assumption by GCC (see reference below), causing local variables
to be misaligned.

For compatibility reasons, when the attribute `force_align_arg_pointer` is
attached to a function, it continues to ensure the stack is at least aligned
to a 16-byte boundary, as the documentation seems to suggest.

After this change, `STACK_REALIGN_DEFAULT` no longer has an effect on this
target, so it is removed.

Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c9
Signed-off-by: LIU Hao 

gcc/ChangeLog:

PR 07
* config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override
definition from i386.h.
(STACK_REALIGN_DEFAULT): Undefine, as it no longer has an effect.
* config/i386/i386.cc (ix86_update_stack_boundary): Force minimum
128-bit alignment if `force_align_arg_pointer`.

Signed-off-by: LIU Hao 
---
 gcc/config/i386/cygming.h | 9 -
 gcc/config/i386/i386.cc   | 9 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index d587d25a58a8..743cc38f5852 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -28,16 +28,15 @@ along with GCC; see the file COPYING3.  If not see
 #undef TARGET_SEH
 #define TARGET_SEH  (TARGET_64BIT_MS_ABI && flag_unwind_tables)
 
+#undef PREFERRED_STACK_BOUNDARY_DEFAULT
+#define PREFERRED_STACK_BOUNDARY_DEFAULT \
+  (TARGET_64BIT ? 128 : MI

Re: [PATCH 1/2] libstdc++: Support UNC paths on MinGW in fs::path

2025-05-10 Thread Jonathan Wakely
On Sat, 10 May 2025 at 12:49, Johannes Grunenberg
 wrote:
>
> UNC paths on Windows start with a root name "\\server" instead of a
> drive letter ("c:"). Support for parsing this was already implemented
> for cygwin, which (compared to MinGW) doesn't use backslash as the
> preferred separator.
> This mainly enables SLASHSLASH_IS_ROOTNAME on Windows and adjusts the
> tests accordingly.
> I aligned path::_M_append with path::operator/= to use has_root_directory
> and is_absolute from `this` instead of the appended path. Note that
> before this patch, `is_absolute && has_root_directory` was always
> `false`, as is_absolute implied has_root_directory. This isn't the case
> anymore (e.g. `\\foo` doesn't have a root directory).
>
> PR libstdc++-v3/99333
> PR libstdc++-v3/99430
> PR libstdc++-v3/106127
>
> libstdc++-v3/ChangeLog:
>
> * config/abi/pre/gnu.ver: Add fs::path::_M_is_unc_path() to ABI
> * include/bits/fs_path.h: Add and use fs::path::_M_is_unc_path()
> on Windows.
> * src/c++17/fs_path.cc (defined): Treat "//" as a root name on
> Windows.
> (path::_M_append): Align with operator/=.
> (path::_M_is_unc_path): Added helper to check for
> "(slash)(slash)(not-slash)".
> * testsuite/27_io/filesystem/path/append/path.cc: Added tests for UNC
> paths.
> * testsuite/27_io/filesystem/path/append/source.cc: Added tests for
> UNC paths.
> * testsuite/27_io/filesystem/path/decompose/filename.cc: Switched
> Windows FS to test for UNC paths.
> * testsuite/27_io/filesystem/path/decompose/root_directory.cc:
> Switched Windows FS to test for UNC paths.
> * testsuite/27_io/filesystem/path/generation/normal.cc: Switched
> Windows FS to test for UNC paths.
> * testsuite/27_io/filesystem/path/generic/94242.cc: Added case for
> double-slashes as root directories.
> * testsuite/27_io/filesystem/path/generic/generic_string.cc: Added
> case for double-slashes as root directories.
> * testsuite/27_io/filesystem/path/generic/utf.cc: Added case for
> double-slashes as root directories.
> * testsuite/27_io/filesystem/path/generic/wchar_t.cc: Added case for
> double-slashes as root directories.
> * testsuite/27_io/filesystem/path/itr/traversal.cc: Switched Windows
> FS to test for UNC paths.
> * testsuite/27_io/filesystem/path/query/is_absolute.cc: Test that UNC
> paths are absolute.
>
> Signed-off-by: Johannes Grunenberg 

Just to check, the Signed-off-by: trailer has a specific meaning which
is described at https://gcc.gnu.org/dco.html
Please confirm that you're using it that way (and not just adding it
because others do so).

I don't know much about UNC paths so this might take me a while to
review, but thanks for working on this and contributing the patches!


Re: [PATCH v2] RISC-V: Use vclmul for CRC expansion if available

2025-05-10 Thread Jeff Law




On 5/9/25 4:09 AM, Anton Blanchard wrote:

If the vector version of clmul (vclmul) is available and the scalar
one is not, use it for CRC expansion.

gcc/Changelog:

* config/riscv/bitmanip.md (crc_rev4): Check
TARGET_ZVBC.
(crc4): Likewise.
* config/riscv/riscv.cc (expand_crc_using_clmul): Emit code using
vclmul if TARGET_ZVBC.
(expand_reversed_crc_using_clmul): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/crc-builtin-zvbc.c: New test.

THanks.  I pushed this after fixing one formatting nit.



+
+  rtx shift_amount = gen_int_mode (GET_MODE_BITSIZE (word_mode) -
+  data_size, Pmode);
When we wrap we bring the operator down to the next line.  We usually 
also put expressions like this in an additional level of parenthesis.


But there's an easier way :-)  GET_MODE_BITSIZE (word_mode) is always 
going to be BITS_PER_WORD.  So I used that instead which avoids 
wrapping.  There was another instance a elsewhere I adjusted to be 
consistent.


Thanks!

jeff




Re: [PATCH, fortran] Fix simple typo in libgfortran

2025-05-10 Thread Tobias Burnus

Hi Yuao,


Yuao Ma wrote:
I'm writing to express my sincere gratitude for the opportunity to 
participate
in Google Summer of Code with GCC this year. I am very enthusiastic 
about this
program and fully committed to making a valuable contribution and 
fulfilling my

responsibilities.


Welcome and thanks for participating! And sorry for not coming back to 
you earlier!



As I am relatively new to submitting patches via mailing lists, I 
would like to

send a very simple patch primarily to familiarize myself with the correct
procedure. I have reviewed the contribution guidelines on the GCC 
website and

have tried to follow them closely.


Looks quite good, just a few remarks:


* This is a trivial patch but for larger patches, you need
  grant the right to use your patch. There are two ways of doing so:
  (a) The simple one, called "Developer Certificate of Origin" (DCO).
  This requires a "Signed-off-by:" line (as added by 'git commit -s'.
  (b) A copyright assignment with the Free Software Foundation (FSF).

See https://gcc.gnu.org/contribute.html and for the DCO variant
in particular: https://gcc.gnu.org/dco.html for details and for
what you actually certify/grant.


* Regarding the ChangeLog in the email:

(a) FYI - if you run for the last commit
   ./contrib/gcc-changelog/git_check_commit.py -v -p or for some commit 
./contrib/gcc-changelog/git_check_commit.py -v -p  Then you 
see how the resulting ChangeLog will look like - it also shows errors 
and warnings. (That's also used on the server.) (b) Small nit picking:

Subject: [PATCH] Fix simple typo in libgfortran

libgfortran/ChangeLog:

* io/read.c: typo fix, explict -> explicit


If applicable, the function name is put in parentheses after
the file name, i.e.:
	* io/read.c (read_f): ... Additionally, per style convention, the entry starts with a 
captial letter '... (read_f): Typo ...' - and it ends with a full stop 
'.' Otherwise, I would use 'comment typo' to make clear that it is just 
in a comment (and not, e.g., in some diagnostic output). But that's 
really only a small personal preference. * * * Thanks again for the 
patch and welcome again :-) Tobias




[PATCH v2] libstdc++: More efficient weekday from year_month_day.

2025-05-10 Thread Cassio Neri
Use a better algorithm to calculate n % 7, taking into account that the divisor
is a Mersenne number. This is explained in this two-part lightning talk at
CppCon 2024 [1, 2].

Roughly speaking, the algorithm is used for n = sys_days but it has a
precondition that doesn't hold for all values of sys_days. However, it does hold
for every value coming out of year_month_day::operator sys_days().

Nevertheless, there's an issue because the conversion is done in two steps:
year_month_day -> sys_days -> weekday and in the second step the information
that the input sys_days was obtained from the first step (so the precondition
holds) is lost. In the talk a single function performs the two steps and uses
the optimization. Unless the standard adds a similar function (e.g. a weekday
constructor) or this is added as an extension (not done here), this is not an
option for libstdc++.

The issue is adressed in the following way. At the end of the fist step, the
code informs the compiler, through an [[assume]] attribute that the precondition
holds. Then, at the beginning of the second step, the code checks, through
__builtin_constant_p, if the precondition holds, in which case, it uses the
better algorithm or, otherwise, it falls back to n % 7.

The idea is illustrated in [3] which compares code generation (with and without
the optimization) and provides an exhaustive test of correctness. A benchmark is
shown in [4].

References:

[1] https://youtu.be/64mTEXnSnZs
[2] https://youtu.be/bnVkWEjRNeI
[3] https://godbolt.org/z/TMMcKj8Gv
[4] https://quick-bench.com/q/VPtEBjGDmmB4QEG2IDjIvfU6WzA

libstdc++-v3/ChangeLog:

* include/std/chrono: Add postcondition to year_month_date::operator
sys_days and check this condition in weekday::_S_from_days.
* testsuite/std/time/weekday/1.cc: Add tests for special values.
---
The approach might be somewhat controversial and might need to be discussed. The
problem is that passing information from [[assume]] to __builtin_constant_p is
quite fragile in 3 different ways.

1) The distance in source code between [[assume]] and __builtin_constant_p makes
hard to understand why they are there in the first place. A reader who doesn't
know the full picture might feel tempted to remove either of these lines.

2) Even cosmetic changes in the conditions of [[assume]] or 
__builtin_constant_p might break the compiler's chain of thought and thus, the 
optimization.

3) Similarly to 2, changes in the compiler code that handles [[assume]] or 
__builtin_constant_p might break the optimization.

If any of the issues above happens, the result will still be right but a
performance regression will appear. Comments can mitigate the issues but a
performance test or a test for code generation would be more effective to
prevent performance regressions. Unfortunately, I don't know how either of these
can be done.

Version history:
 v2. Fix bug in v1, add tests and improve comments.
 v1. Initial patch.
 
 libstdc++-v3/include/std/chrono  | 29 ++--
 libstdc++-v3/testsuite/std/time/weekday/1.cc | 19 +
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 8eb9fd9baac..b747f46c13f 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -984,7 +984,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static constexpr weekday
   _S_from_days(const days& __d)
   {
-   return weekday{__detail::__add_modulo<7>(4, __d.count())};
+   const auto __days = __d.count();
+   using _Up = make_unsigned_t;
+   const auto __n = static_cast<_Up>(__days) + 12'687'434u;
+
+   // This condition holds when __d = sys_days(__ymd).time_since_epoch(),
+   // where __ymd's type is year_month_day.
+   const auto __use_fast_mod7 = __n < 178'956'973u;
+
+   if (__builtin_constant_p(__use_fast_mod7) && __use_fast_mod7)
+   {
+ // https://youtu.be/64mTEXnSnZs and https://youtu.be/bnVkWEjRNeI
+ const auto __wd = 613'566'757u * static_cast(__n) >> 29;
+ [[assume(__wd != 7)]];
+ return weekday{__wd};
+   }
+   return weekday{__detail::__add_modulo<7>(4, __days)};
   }
 
 public:
@@ -1652,7 +1667,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr
   operator sys_days() const noexcept
-  { return sys_days{_M_days_since_epoch()}; }
+  {
+   const auto __days = _M_days_since_epoch();
+
+   // The assume attribute below allows weekday::_S_from_days to use a
+   // faster algorithm for modulus 7.
+   const auto __n = static_cast(__days.count()) + 12'687'434u;
+   const auto __use_fast_mod7 = __n < 178'956'973u;
+   [[assume(__use_fast_mod7)]];
+
+   return sys_days{__days};
+  }
 
   explicit constexpr
   operator local_days() const noexcept
diff --git a/libstdc++-v3/testsuite/std/time/weekday/1.cc 
b/libstdc++-v3/testsuite/std/time/weekday/1.cc
i

Re: [PATCH v1] libstdc++: More efficient weekday from year_month_day.

2025-05-10 Thread Andrew Pinski
On Mon, May 5, 2025, 11:48 AM Cassio Neri  wrote:

> Use a better algorithm to calculate n % 7, taking into account that the
> divisor
> is a Mersenne number. This is explained in this two-part lightning talk at
> CppCon 2024 [1, 2].
>
> Roughly speaking, the algorithm is used for n = sys_days but it has a
> precondition that doesn't hold for all values of sys_days. However, it
> does hold
> for every value coming out of year_month_day::operator sys_days().
>
> Nevertheless, there's an issue because the conversion is done in two steps:
> year_month_day -> sys_days -> weekday and in the second step the
> information
> that the input sys_days was obtained from the first step (so the
> precondition
> holds) is lost. In the talk a single function performs the two steps and
> uses
> the optimization. Unless the standard adds a similar function (e.g. a
> weekday
> constructor) or this is added as an extension (not done here), this is not
> an
> option for libstdc++.
>
> The issue is adressed in the following way. At the end of the fist step,
> the
> code informs the compiler, through an [[assume]] attribute that the
> precondition
> holds. Then, at the beginning of the second step, the code checks, through
> __builtin_constant_p, if the precondition holds, in which case, it uses the
> better algorithm or, otherwise, it falls back to n % 7.
>

I suspect we want to disable this for -Os . And plus i am not 100%
convinced it is best for all micro-architures. Especially on say aarch64.
Can you do more benchmarking and peocide which exaxt core is being used?
And mention the size difference too?

Plus gcc knows how to do %7 via multiplication is that being used or is it
due to generic x86 tuning it is using the div instruction?

Thanks,
Andrew



> The idea is illustrated in [3] which compares code generation (with and
> without
> the optimization) and provides an exhaustive test of correctness. A
> benchmark is
> shown in [4].
>
> References:
>
> [1] https://youtu.be/64mTEXnSnZs
> [2] https://youtu.be/bnVkWEjRNeI
> [3] https://godbolt.org/z/E14asPK8r
> [4] https://quick-bench.com/q/vzxlrLaEccner3h-ahIrbB0BZvE
>
> libstdc++-v3/ChangeLog:
>
> * include/std/chrono: Add postcondition to
> year_month_date::operator
> sys_days and check this condition in weekday::_S_from_days.
> ---
> The approach might be somewhat controversial and might need to be
> discussed. The problem is that passing information from [[assume]] to
> __builtin_constant_p is
> quite fragile in 3 different ways.
>
> 1) The distance in source code between [[assume]] and __builtin_constant_p
> makes
> hard to understand why they are there in the first place. A reader who
> doesn't
> know the full picture might feel tempted to remove either of these lines.
>
> 2) Logically, the post-condition expressed by [[assume]] only needs to
> imply the
> condition tested by __builtin_constant_p. However, even whey they are
> obviously
> equivalent for a human, this might not be the case for the compiler. Even
> slight cosmetic changes might break the compiler's chain of thought and
> thus,
> the optimization.
>
> 3) Similarly to 2, changes in the compiler code that handles [[assume]] or
> __builtin_constant_p might break the optimization.
>
> If any of the issues above happens, the result will still be right but a
> performance regression will appear. Comments can mitigate the issues but a
> performance test or a test for code generation would be more effective to
> prevent performance regressions. Unfortunately, I don't know how either of
> these
> can be done.
>
>  libstdc++-v3/include/std/chrono | 27 +--
>  1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> index 8eb9fd9baac..e762dd46cc4 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -984,7 +984,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>static constexpr weekday
>_S_from_days(const days& __d)
>{
> -   return weekday{__detail::__add_modulo<7>(4, __d.count())};
> +   const auto __days = __d.count();
> +
> +   const auto __n = static_cast(__days) + 12'687'434u;
> +   const auto __use_fast_mod7 = __n < 178'956'973u;
> +   // This condition is true when __d = __dp.time_since_epoch() where
> __dp
> +   // comes from year_month_day::operator sys_days().
> +   if (__builtin_constant_p(__use_fast_mod7))
> +   {
> + const auto __wd = 613'566'757u * __n >> 29;
> + [[assume(__wd != 7)]];
> + return weekday{__wd};
> +   }
> +
> +   return weekday{__detail::__add_modulo<7>(4, __days)};
>}
>
>  public:
> @@ -1652,7 +1665,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>constexpr
>operator sys_days() const noexcept
> -  { return sys_days{_M_days_since_epoch()}; }
> +  {
> +   const auto __days = _M_days_since_epoch();
> +
> +   // The assume at

[PATCH v2] c++: Only reject virtual base data member access in __builtin_offsetof [PR118346]

2025-05-10 Thread Simon Martin
The following test case highlights two issues - see
https://godbolt.org/z/7E1KGYreh:
 1. We error out at both L4 and L5, while (at least) clang, EDG and MSVC
only reject L5
 2. Our error message for L5 incorrectly mentions using a null pointer

=== cut here ===
struct A { int i; };
struct C : virtual public A { };
void foo () {
  auto res = ((C*)0)->i;  // L4
  __builtin_offsetof (C, i);  // L5
}
=== cut here ===

Even though L4 is UB, it's technically not invalid, and this patch
transforms the error into a warning categorized under -Waddress (so that
it can be controlled by the user, and also deactivated for offsetof).

It also fixes the error message for L5 to not be confused by the
artificial null pointer created by cp_parser_builtin_offsetof.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/118346

gcc/cp/ChangeLog:

* parser.cc (cp_parser_builtin_offsetof): Temporarily disable
-Waddress warnings.
* semantics.cc (finish_offsetof): Reject accesses to members in
virtual bases.
* typeck.cc (build_class_member_access_expr): Don't error but
warn about accesses to members in virtual bases.

gcc/testsuite/ChangeLog:

* g++.dg/other/offsetof8.C: Avoid -Wnarrowing warning.
* g++.dg/other/offsetof9.C: Adjust test expectations.
* g++.dg/other/offsetof10.C: New test.

---
 gcc/cp/parser.cc| 13 ++---
 gcc/cp/semantics.cc | 28 ++-
 gcc/cp/typeck.cc| 13 +++--
 gcc/testsuite/g++.dg/other/offsetof10.C | 36 +
 gcc/testsuite/g++.dg/other/offsetof8.C  |  6 ++---
 gcc/testsuite/g++.dg/other/offsetof9.C  |  6 ++---
 6 files changed, 76 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/other/offsetof10.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1fb9e7fd872..43bbc69b196 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11521,10 +11521,15 @@ cp_parser_builtin_offsetof (cp_parser *parser)
   tree object_ptr
 = build_static_cast (input_location, build_pointer_type (type),
 null_pointer_node, tf_warning_or_error);
-
-  /* Parse the offsetof-member-designator.  We begin as if we saw "expr->".  */
-  expr = cp_parser_postfix_dot_deref_expression (parser, CPP_DEREF, object_ptr,
-true, &dummy, token->location);
+  {
+/* PR c++/118346: don't complain about object_ptr being null.  */
+warning_sentinel s(warn_address);
+/* Parse the offsetof-member-designator.  We begin as if we saw
+   "expr->".  */
+expr = cp_parser_postfix_dot_deref_expression (parser, CPP_DEREF,
+  object_ptr, true, &dummy,
+  token->location);
+  }
   while (true)
 {
   token = cp_lexer_peek_token (parser->lexer);
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 43a0eabfa12..9ba00196542 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5324,13 +5324,29 @@ finish_offsetof (tree object_ptr, tree expr, location_t 
loc)
 expr = TREE_OPERAND (expr, 0);
   if (!complete_type_or_else (TREE_TYPE (TREE_TYPE (object_ptr)), object_ptr))
 return error_mark_node;
+
   if (warn_invalid_offsetof
-  && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr)))
-  && CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr)))
-  && cp_unevaluated_operand == 0)
-warning_at (loc, OPT_Winvalid_offsetof, "% within "
-   "non-standard-layout type %qT is conditionally-supported",
-   TREE_TYPE (TREE_TYPE (object_ptr)));
+  && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr
+{
+  bool member_in_base_p = false;
+  if (TREE_CODE (expr) == COMPONENT_REF)
+   {
+ tree member = expr;
+ do {
+ member = TREE_OPERAND (member, 1);
+ } while (TREE_CODE (member) == COMPONENT_REF);
+ member_in_base_p = !same_type_p (TREE_TYPE (TREE_TYPE (object_ptr)),
+  DECL_CONTEXT (member));
+   }
+  if (member_in_base_p
+ && CLASSTYPE_VBASECLASSES (TREE_TYPE (TREE_TYPE (object_ptr
+   error_at (loc, "invalid % to data member in virtual base");
+  else if (CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr)))
+  && cp_unevaluated_operand == 0)
+   warning_at (loc, OPT_Winvalid_offsetof, "% within "
+   "non-standard-layout type %qT is conditionally-supported",
+   TREE_TYPE (TREE_TYPE (object_ptr)));
+}
   return fold_offsetof (expr);
 }
 
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 1b9fdf5b21d..ec35ab6dbb4 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -2958,17 +2958,10 @@ build_class_member_access_expr (cp_expr object, tree 
member,
return error_mark_node;
 
  /* It is inva

Re: [PATCH 1/6] emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into next BB

2025-05-10 Thread Jeff Law




On 5/9/25 2:27 PM, Vineet Gupta wrote:

gcc/ChangeLog:

* emit-rtl.cc (next_nonnote_nondebug_insn): Update comments.

Signed-off-by: Vineet Gupta 
---
  gcc/emit-rtl.cc | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 3e2c4309dee6..b78b29ecf989 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -3677,7 +3677,11 @@ next_nonnote_nondebug_insn (rtx_insn *insn)
  
  /* Return the next insn after INSN that is not a NOTE nor DEBUG_INSN,

 but stop the search before we enter another basic block.  This
-   routine does not look inside SEQUENCEs.  */
+   routine does not look inside SEQUENCEs.
+   NOTE: This can potentially bleed into next BB. If current insn is
+last insn of BB, followed by a code_label before the start of
+the next BB, code_label will be returned. But this is the
+behavior rest of gcc assumes/relies on e.g. get_last_bb_insn.  */
I don't see how get_last_bb_insn itself inherently needs this behavior. 
I could believe something that *calls* get_last_bb_insn might depend on 
this behavior -- but I'd also consider that bogus.  The CODE_LABEL is 
part of the next block, so returning that seems quite wrong given the 
original comment and users like get_last_bb_insn.


I don't mind adding the comment, but I'd much rather chase down the 
offenders and fix them.


Jeff




[PATCH 0/2] libstdc++: Support UNC paths on MinGW in fs::path

2025-05-10 Thread Johannes Grunenberg
These two patches add support for UNC paths on MinGW,
i.e. paths like `\\server\foo\bar\baz.txt`.
This is my first time submitting a patch to the GCC repo, so I'm not
fully sure if I submitted the patch correctly.

This was tested on x86_64-w64-mingw32.

I'll try to look at the fs-operations on UNC paths after this patch,
since some of them (like fs::canonical) break with UNC paths.

Johannes Grunenberg (2):
  libstdc++: Support UNC paths on MinGW in fs::path
  libstdc++: Extend tests for UNC paths on MinGW

 libstdc++-v3/config/abi/pre/gnu.ver   |  7 +++
 libstdc++-v3/include/bits/fs_path.h   | 13 +++-
 libstdc++-v3/src/c++17/fs_path.cc | 14 +++--
 .../27_io/filesystem/path/append/path.cc  | 17 +++-
 .../27_io/filesystem/path/append/source.cc| 14 +
 .../filesystem/path/decompose/extension.cc|  5 +
 .../filesystem/path/decompose/filename.cc |  2 +-
 .../path/decompose/relative_path.cc   |  9 +
 .../path/decompose/root_directory.cc  |  3 ++-
 .../filesystem/path/decompose/root_name.cc| 20 +++
 .../filesystem/path/decompose/root_path.cc| 13 
 .../27_io/filesystem/path/decompose/stem.cc   |  5 +
 .../filesystem/path/generation/normal.cc  |  2 +-
 .../filesystem/path/generation/relative.cc| 14 +
 .../27_io/filesystem/path/generic/94242.cc|  8 
 .../filesystem/path/generic/generic_string.cc |  6 +-
 .../27_io/filesystem/path/generic/utf.cc  |  6 +-
 .../27_io/filesystem/path/generic/wchar_t.cc  |  6 +-
 .../27_io/filesystem/path/itr/traversal.cc|  6 +++---
 .../filesystem/path/query/is_absolute.cc  |  1 +
 20 files changed, 158 insertions(+), 13 deletions(-)

-- 
2.49.0.windows.1



[PATCH, fortran] Fix simple typo in libgfortran

2025-05-10 Thread Yuao Ma
Hi Tobias,

Thanks for your time and guidance on this GSOC project.

> * This is a trivial patch but for larger patches, you need
>grant the right to use your patch. There are two ways of doing so:
>(a) The simple one, called "Developer Certificate of Origin" (DCO).
>This requires a "Signed-off-by:" line (as added by 'git commit -s'.
>(b) A copyright assignment with the Free Software Foundation (FSF).

You're right. I found the information on the website stating that minor patches
can omit the DCO. I have now added the "Signed-off-by:" line in the new patch.

> (a) FYI - if you run for the last commit
> ./contrib/gcc-changelog/git_check_commit.py -v -p or for some commit
> ./contrib/gcc-changelog/git_check_commit.py -v -p  Then you
> see how the resulting ChangeLog will look like - it also shows errors
> and warnings. (That's also used on the server.) (b) Small nit picking:

Thanks for the tip! I ran the git_check_commit.py script, and it reported "OK."
However, when I regenerated the ChangeLog, the function name wasn't included
automatically. I've manually added the function name to the ChangeLog for now
and will investigate the script's behavior later.

The updated patch is attached for your review.

Yuao



0001-fortran-fix-simple-typo-in-libgfortran.patch
Description: 0001-fortran-fix-simple-typo-in-libgfortran.patch


[PATCH 1/2] libstdc++: Support UNC paths on MinGW in fs::path

2025-05-10 Thread Johannes Grunenberg
UNC paths on Windows start with a root name "\\server" instead of a
drive letter ("c:"). Support for parsing this was already implemented
for cygwin, which (compared to MinGW) doesn't use backslash as the
preferred separator.
This mainly enables SLASHSLASH_IS_ROOTNAME on Windows and adjusts the
tests accordingly.
I aligned path::_M_append with path::operator/= to use has_root_directory
and is_absolute from `this` instead of the appended path. Note that
before this patch, `is_absolute && has_root_directory` was always
`false`, as is_absolute implied has_root_directory. This isn't the case
anymore (e.g. `\\foo` doesn't have a root directory).

PR libstdc++-v3/99333
PR libstdc++-v3/99430
PR libstdc++-v3/106127

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Add fs::path::_M_is_unc_path() to ABI
* include/bits/fs_path.h: Add and use fs::path::_M_is_unc_path()
on Windows.
* src/c++17/fs_path.cc (defined): Treat "//" as a root name on
Windows.
(path::_M_append): Align with operator/=.
(path::_M_is_unc_path): Added helper to check for
"(slash)(slash)(not-slash)".
* testsuite/27_io/filesystem/path/append/path.cc: Added tests for UNC
paths.
* testsuite/27_io/filesystem/path/append/source.cc: Added tests for
UNC paths.
* testsuite/27_io/filesystem/path/decompose/filename.cc: Switched
Windows FS to test for UNC paths.
* testsuite/27_io/filesystem/path/decompose/root_directory.cc:
Switched Windows FS to test for UNC paths.
* testsuite/27_io/filesystem/path/generation/normal.cc: Switched
Windows FS to test for UNC paths.
* testsuite/27_io/filesystem/path/generic/94242.cc: Added case for
double-slashes as root directories.
* testsuite/27_io/filesystem/path/generic/generic_string.cc: Added
case for double-slashes as root directories.
* testsuite/27_io/filesystem/path/generic/utf.cc: Added case for
double-slashes as root directories.
* testsuite/27_io/filesystem/path/generic/wchar_t.cc: Added case for
double-slashes as root directories.
* testsuite/27_io/filesystem/path/itr/traversal.cc: Switched Windows
FS to test for UNC paths.
* testsuite/27_io/filesystem/path/query/is_absolute.cc: Test that UNC
paths are absolute.

Signed-off-by: Johannes Grunenberg 
---
 libstdc++-v3/config/abi/pre/gnu.ver |  7 +++
 libstdc++-v3/include/bits/fs_path.h | 13 -
 libstdc++-v3/src/c++17/fs_path.cc   | 14 --
 .../27_io/filesystem/path/append/path.cc| 17 -
 .../27_io/filesystem/path/append/source.cc  | 14 ++
 .../27_io/filesystem/path/decompose/filename.cc |  2 +-
 .../filesystem/path/decompose/root_directory.cc |  2 +-
 .../27_io/filesystem/path/generation/normal.cc  |  2 +-
 .../27_io/filesystem/path/generic/94242.cc  |  8 
 .../filesystem/path/generic/generic_string.cc   |  6 +-
 .../27_io/filesystem/path/generic/utf.cc|  6 +-
 .../27_io/filesystem/path/generic/wchar_t.cc|  6 +-
 .../27_io/filesystem/path/itr/traversal.cc  |  6 +++---
 .../27_io/filesystem/path/query/is_absolute.cc  |  1 +
 14 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 29bc7d86256..ed2275fead0 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2546,6 +2546,13 @@ GLIBCXX_3.4.34 {
 
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_M_constructILb[01]EEEvPK[cw][jmy];
 } GLIBCXX_3.4.33;
 
+# GCC 16.1.0
+GLIBCXX_3.4.35 {
+# std::filesystem::__cxx11::path::_M_is_unc_path() const
+_ZNKSt10filesystem7__cxx114path14_M_is_unc_pathEv;
+_ZNKSt10filesystem4path14_M_is_unc_pathEv;
+} GLIBCXX_3.4.34;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index ccf1d708e09..2694a13d5c3 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -609,6 +609,10 @@ namespace __detail
 
 pair _M_find_extension() const noexcept;
 
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+bool _M_is_unc_path() const noexcept;
+#endif
+
 // path::_S_convert creates a basic_string or
 // basic_string_view from a basic_string or
 // basic_string_view, for an encoded character type C,
@@ -1224,6 +1228,13 @@ namespace __detail
  for (auto& __elem : *this)
{
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+ if (__elem._M_type() == _Type::_Root_name && _M_is_unc_path())
+   {
+ __str += __slash;
+ __str += __slash;
+ __str += 
basic_string_view(__elem._M_pathname).substr(2);
+   

[PATCH 2/2] libstdc++: Extend tests for UNC paths on MinGW

2025-05-10 Thread Johannes Grunenberg
Adds some additional tests to ensure that UNC paths are properly
handled (mainly decomposed) on MinGW.

libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/path/decompose/extension.cc:
Test for UNC paths with dots in the root directory.
* testsuite/27_io/filesystem/path/decompose/relative_path.cc:
Test for UNC paths.
* testsuite/27_io/filesystem/path/decompose/root_directory.cc:
Test for UNC path.
* testsuite/27_io/filesystem/path/decompose/root_name.cc:
Test for Windows paths.
* testsuite/27_io/filesystem/path/decompose/root_path.cc:
Test for Windows paths.
* testsuite/27_io/filesystem/path/decompose/stem.cc:
Test for Windows paths.
* testsuite/27_io/filesystem/path/generation/relative.cc:
Test for Windows paths.

Signed-off-by: Johannes Grunenberg 
---
 .../filesystem/path/decompose/extension.cc|  5 +
 .../path/decompose/relative_path.cc   |  9 +
 .../path/decompose/root_directory.cc  |  1 +
 .../filesystem/path/decompose/root_name.cc| 20 +++
 .../filesystem/path/decompose/root_path.cc| 13 
 .../27_io/filesystem/path/decompose/stem.cc   |  5 +
 .../filesystem/path/generation/relative.cc| 14 +
 7 files changed, 67 insertions(+)

diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc
index 82887beea37..0965209d283 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc
@@ -45,6 +45,11 @@ test01()
   VERIFY( path(".").extension() == path("") );
 
   VERIFY( path().extension() == path() );
+
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  VERIFY( path("wsl.local").extension() == path("") );
+  VERIFY( path("wsl.local\\foo.bar").extension() == path(".bar") );
+#endif
 }
 
 void
diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc
index 0c3eb4b333e..ebc56b99d56 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc
@@ -34,6 +34,15 @@ test01()
   VERIFY( p2.relative_path() == p2 );
   path p3 = "/foo/bar";
   VERIFY( p3.relative_path() == p2 );
+
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  path p4 = "baz/foo/bar";
+  VERIFY( p4.relative_path() == p2 );
+  path p5 = "foo\\bar";
+  VERIFY( p5.relative_path() == p5 );
+  path p6 = "baz\\foo\\bar";
+  VERIFY( p6.relative_path() == p5 );
+#endif
 }
 
 #include  // XXX
diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc
index a7c90f99752..ffdf18929be 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc
@@ -35,6 +35,7 @@ test01()
   path p3 = "//foo";
 #if defined(__CYGWIN__) || defined(_GLIBCXX_FILESYSTEM_IS_WINDOWS)
   VERIFY( p3.root_directory() == path() );
+  VERIFY( path("//").root_directory() == path("//") );
 #else
   VERIFY( p3.root_directory() == path("/") );
 #endif
diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc
index c32242ff6fe..d5d1934ec15 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc
@@ -34,8 +34,28 @@ test01()
   VERIFY( path("..").extension().empty() );
 }
 
+void
+test02()
+{
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  VERIFY( path("/foo/bar").root_name() == "" );
+  VERIFY( path("\\").root_name() == "" );
+  VERIFY( path("").root_name() == "" );
+  VERIFY( path("foo").root_name() == "foo" );
+  VERIFY( path("foo\\").root_name() == "foo" );
+  VERIFY( path("foo\\bar").root_name() == "foo" );
+  VERIFY( path("?\\c\\").root_name() == "?" );
+  VERIFY( path("c:/").root_name() == "c:" );
+  VERIFY( path("c:\\").root_name() == "c:" );
+  VERIFY( path("c:").root_name() == "c:" );
+  VERIFY( path("c:\\foo").root_name() == "c:" );
+#endif
+  VERIFY( path("").root_name() == "" );
+}
+
 int
 main()
 {
   test01();
+  test02();
 }
diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc
index b01c75fa0b9..c393e5e4681 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc
@@ -32,6 +32,19 @@ test01()
   VERIFY( p1.root_path() == path() );
   path p2 = "/foo/bar";
   VERIFY( p2.root_path() == path(

Re: [PING] [PATCH] Fortran: fix passing of inquiry ref of complex array to TRANSFER [PR102891]

2025-05-10 Thread Jerry D

On 5/10/25 11:33 AM, Harald Anlauf wrote:

Early ping.

Am 06.05.25 um 21:06 schrieb Harald Anlauf:

Dear all,

here's another rather obvious case where a temporary is needed for
an inquiry reference of a complex array which is a component of a
derived type.  In contrast to PR119986, the argument is handled
within the code for the intrinsic TRANSFER, so that the other
patch did not catch the present one.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Yes, quite alright.



I'd also like to backport this one to 15-branch if this is ok.


... and please do backport.

Best regards,

Jerry



Thanks,
Harald









[PATCH] testsuite: Fix pr119131-1.c for targets which emit a psabi warning for vectors of DFP [PR119909]

2025-05-10 Thread Andrew Pinski
On PowerPC, there is a psabi warning for argument passing of a DFP vector.
We are not expecting this warning and we get a failure due to it.
Adding -Wno-psabi fixes the testcase.

Committed as obvious after a quick test.

gcc/testsuite/ChangeLog:

PR testsuite/119909
* gcc.dg/torture/pr119131-1.c: Add -Wno-psabi.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr119131-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr119131-1.c 
b/gcc/testsuite/gcc.dg/torture/pr119131-1.c
index c62f702f98c..1780035dade 100644
--- a/gcc/testsuite/gcc.dg/torture/pr119131-1.c
+++ b/gcc/testsuite/gcc.dg/torture/pr119131-1.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target dfp } */
+/* { dg-additional-options "-Wno-psabi" } */
 /* PR target/119131 */
 
 typedef __attribute__((__vector_size__ (64))) char C;
-- 
2.43.0



[PATCH] x86: Change dest to src in replace_vector_const

2025-05-10 Thread H.J. Lu
Replace

rtx dest = SET_SRC (set);

with

rtx src = SET_SRC (set);

in replace_vector_const to avoid confusion.

* config/i386/i386-features.cc (replace_vector_const): Change
dest to src.

I am checking it in shortly.

-- 
H.J.
From 7b102b3d29efc7b005b2fa820e3ce7a7b1664916 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 11 May 2025 06:17:45 +0800
Subject: [PATCH] x86: Change dest to src in replace_vector_const

Replace

rtx dest = SET_SRC (set);

with

rtx src = SET_SRC (set);

in replace_vector_const to avoid confusion.

	* config/i386/i386-features.cc (replace_vector_const): Change
	dest to src.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386-features.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 54b3f6d33b2..13e6c2a8abd 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -3372,12 +3372,12 @@ replace_vector_const (machine_mode vector_mode, rtx vector_const,
 
   /* Get the single SET instruction.  */
   rtx set = single_set (insn);
-  rtx dest = SET_SRC (set);
-  machine_mode mode = GET_MODE (dest);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (src);
 
   rtx replace;
   /* Replace the source operand with VECTOR_CONST.  */
-  if (SUBREG_P (dest) || mode == vector_mode)
+  if (SUBREG_P (src) || mode == vector_mode)
 	replace = vector_const;
   else
 	{
-- 
2.49.0



i386: Fix some problems in stv cost model

2025-05-10 Thread Jan Hubicka
Hi,
this patch fixes some of problems with cosint in scalar to vector pass.
In particular
 1) the pass uses optimize_insn_for_size which is intended to be used by
expanders and splitters and requires the optimization pass to use
set_rtl_profile (bb) for currently processed bb.
This is not done, so we get random stale info about hotness of insn.
 2) register allocator move costs are all realtive to integer reg-reg move
which has cost of 2, so it is (except for size tables and i386)
a latency of instruction multiplied by 2.
These costs has been duplicated and are now used in combination with
rtx costs which are all based to COSTS_N_INSNS that multiplies latency
by 4.
Some of vectorizer costing contains COSTS_N_INSNS (move_cost) / 2
to compensate, but some new code does not.  This patch adds compensatoin.

Perhaps we should update the cost tables to use COSTS_N_INSNS everywher
but I think we want to first fix inconsistencies.  Also the tables will
get optically much longer, since we have many move costs and COSTS_N_INSNS
is a lot of characters.
 3) variable m which decides how much to multiply integer variant (to account
that with -m32 all 64bit computations needs 2 instructions) is declared
unsigned which makes the signed computation of instruction gain to be
done in unsigned type and breaks i.e. for division.
 4) I added integer_to_sse costs which are currently all duplicationof
sse_to_integer. AMD chips are asymetric and moving one direction is faster
than another.  I will chance costs incremnetally once vectorizer part
is fixed up, too.

There are two failures gcc.target/i386/minmax-6.c and 
gcc.target/i386/minmax-7.c.
Both test stv on hasswell which no longer happens since SSE->INT and INT->SSE 
moves
are now more expensive.

There is only one instruction to convert:

Computing gain for chain #1...
  Instruction gain 8 for11: {r110:SI=smax(r116:SI,0);clobber flags:CC;}
  Instruction conversion gain: 8
  Registers conversion cost: 8<- this is integer_to_sse and sse_to_integer
  Total gain: 0

total gain used to be 4 since the patch doubles the conversion costs.
According to agner fog's tables the costs should be 1 cycle which is correct
here.

Final code gnerated is:

vmovd   %esi, %xmm0 * latency 1
cmpl%edx, %esi
je  .L2
vpxor   %xmm1, %xmm1, %xmm1 * latency 1
vpmaxsd %xmm1, %xmm0, %xmm0 * latency 1
vmovd   %xmm0, %eax * latency 1
imull   %edx, %eax
cltq
movzwl  (%rdi,%rax,2), %eax
ret

cmpl%edx, %esi
je  .L2
xorl%eax, %eax  * latency 1
testl   %esi, %esi  * latency 1
cmovs   %eax, %esi  * latency 2
imull   %edx, %esi
movslq  %esi, %rsi
movzwl  (%rdi,%rsi,2), %eax
ret

Instructions with latency info are those really different.
So the uncoverted code has sum of latencies 4 and real latency 3.
Converted code has sum of latencies 4 and real latency 3 (vmod+vpmaxsd+vmov).
So I do not quite see it should be a win.

There is also a bug in costing MIN/MAX

case ABS:
case SMAX:
case SMIN:
case UMAX:
case UMIN:
  /* We do not have any conditional move cost, estimate it as a
 reg-reg move.  Comparisons are costed as adds.  */
  igain += m * (COSTS_N_INSNS (2) + ix86_cost->add);
  /* Integer SSE ops are all costed the same.  */
  igain -= ix86_cost->sse_op;
  break;

Now COSTS_N_INSNS (2) is not quite right since reg-reg move should be 1 or 
perhaps 0.
For Haswell cmov really is 2 cycles, but I guess we want to have that in cost 
vectors
like all other instructions.

I am not sure if this is really a win in this case (other minmax testcases 
seems to make
sense).  I have xfailed it for now and will check if that affects specs on LNT 
testers.

Bootstrapped/regtested x86_64-linux, comitted.

I will proceed with similar fixes on vectorizer cost side. Sadly those 
introduces
quite some differences in the testuiste (partly triggered by other costing 
problems,
such as one of scatter/gather)

gcc/ChangeLog:

* config/i386/i386-features.cc
(general_scalar_chain::vector_const_cost): Add BB parameter; handle
size costs; use COSTS_N_INSNS to compute move costs.
(general_scalar_chain::compute_convert_gain): Use optimize_bb_for_size
instead of optimize_insn_for size; use COSTS_N_INSNS to compute move 
costs;
update calls of general_scalar_chain::vector_const_cost; use
ix86_cost->integer_to_sse.
(timode_immed_const_gain): Add bb parameter; use
optimize_bb_for_size_p.
(timode_scalar_chain::compute_convert_gain): Use optimize_bb_for_size_p.
* config/i386/i386-features.h (class general_scalar_chain): Update

Re: [parch, fortran] Fix PR 120163, 15/16 regression

2025-05-10 Thread Thomas Koenig

Am 10.05.25 um 20:32 schrieb Harald Anlauf:

Hi Thomas!

Am 10.05.25 um 11:42 schrieb Thomas Koenig:



This bug was another case of generating a formal arglist from
an actual one where we should not have done so.  The fix is
straightforward:  If we have resolved the formal arglist, we should
not generare a new one.

OK for trunk and backport to gcc 15?


Assuming this regtests OK (you didn't say so) this is OK for both.


Ah yes, I did.


Thanks for the patch!


Committed as r16-525-g004bf889e0b1b96ae50f93339104d3602a88deb5 .

Thanks for the review!

Best regards

Thomas



[PATCH, fortran] Fix simple typo in libgfortran

2025-05-10 Thread Yuao Ma
Hi GCC Maintainers,

I'm writing to express my sincere gratitude for the opportunity to participate
in Google Summer of Code with GCC this year. I am very enthusiastic about this
program and fully committed to making a valuable contribution and fulfilling my
responsibilities.

As I am relatively new to submitting patches via mailing lists, I would like to
send a very simple patch primarily to familiarize myself with the correct
procedure. I have reviewed the contribution guidelines on the GCC website and
have tried to follow them closely.

Please let me know if this approach is acceptable. I'm eager to learn and
ensure my future contributions align with the project's standards.

Thank you for your time and guidance.

Best regards,
Yuao



0001-Fix-simple-typo-in-libgfortran.patch
Description: 0001-Fix-simple-typo-in-libgfortran.patch


Re: [PATCH, fortran] Fix simple typo in libgfortran

2025-05-10 Thread Thomas Koenig

Hi Yuao,

first, good to have you on board!

As I am relatively new to submitting patches via mailing lists, I would 
like to

send a very simple patch primarily to familiarize myself with the correct
procedure. I have reviewed the contribution guidelines on the GCC 
website and

have tried to follow them closely.


The patch itself is OK.

There should be a ChangeLog entry. You can generate a template by
running your patch through contrib/mklog.py, which you can then fill
out with the details.

Best regards

Thomas



Re: [PATCH] fortran: Fix up minloc/maxloc lowering [PR120191]

2025-05-10 Thread Tobias Burnus

Jakub Jelinek wrote:

This unfortunately only fixes some of the cases in the new testcase.

We indeed should drop the kind argument from what is passed to the
library, but need to do it not only when one uses the argument name
for it (so kind=4 etc.) but also when one passes all the arguments
to the intrinsics.

The following patch uses what gfc_conv_intrinsic_findloc uses,
which looks more efficient and cleaner, we already set automatic
vars to point to the kind and back actual arguments, so we can just
free/clear expr on the former and set name to "%VAL" on the latter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


LGTM – but it seems that we should do a minor cleanup while being there:

...


--- gcc/fortran/trans-intrinsic.cc.jj   2025-04-22 21:26:15.772920190 +0200
+++ gcc/fortran/trans-intrinsic.cc  2025-05-09 17:41:27.323962631 +0200

...

+  /* Remove kind.  */
+  if (kind_arg->expr)
  {

...

+  gfc_free_expr (kind_arg->expr);
+  kind_arg->expr = NULL;
  }
  
+  /* Pass BACK argument by value.  */

+  back_arg->name = "%VAL";
+

...

gfc_actual_arglist *a = actual;
-  strip_kind_from_actual (a);
while (a)
{
  if (a->name && strcmp (a->name, "dim") == 0)


Namely: Similar to above, we should be able to just do:

   if (dim_arg->expr)

I think the comment should be also updated and we
can also get rid of the 'actual' variable for cleanup.

Namely, something like the following on top of your patch (untested):
-
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -4912 +4912 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, 
enum tree_code op)
-  gfc_actual_arglist *actual, *array_arg, *dim_arg, *mask_arg, *kind_arg;
+  gfc_actual_arglist *array_arg, *dim_arg, *mask_arg, *kind_arg;
@@ -4931,2 +4931 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, 
enum tree_code op)
-  actual = expr->value.function.actual;
-  array_arg = actual;
+  array_arg = expr->value.function.actual;
@@ -4972 +4971 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, 
enum tree_code op)
-  arrayexpr = actual->expr;
+  arrayexpr = array_arg->expr;
@@ -4974,2 +4973,2 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * 
expr, enum tree_code op)
-  /* Special case for character maxloc.  Remove unneeded actual
- arguments, then call a library function.  */
+  /* Special case for character maxloc.  Remove unneeded "dim"
+ actual argument, then call the library function.  */
@@ -4980,3 +4979 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, 
enum tree_code op)
-
-  gfc_actual_arglist *a = actual;
-  while (a)
+  if (dim_arg->expr)
@@ -4984,6 +4981,2 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * 
expr, enum tree_code op)
- if (a->name && strcmp (a->name, "dim") == 0)
-   {
- gfc_free_expr (a->expr);
- a->expr = NULL;
-   }
- a = a->next;
+ gfc_free_expr (dim_arg->expr);
+ dim_arg->expr = NULL;
-

Thanks to both of your for the patch and to Daniil additionally
for reporting and debugging this issue!

Tobias



[parch, fortran] Fix PR 120163, 15/16 regression

2025-05-10 Thread Thomas Koenig



This bug was another case of generating a formal arglist from
an actual one where we should not have done so.  The fix is
straightforward:  If we have resolved the formal arglist, we should
not generare a new one.

OK for trunk and backport to gcc 15?

Best regards

Thomas

Do not generate formal arglist from actual if we have already resolved it.

gcc/fortran/ChangeLog:

PR fortran/120163
* gfortran.h: Add formal_resolved to gfc_symbol.
* resolve.cc (gfc_resolve_formal_arglist): Set it.
(resolve_function): Do not call gfc_get_formal_from_actual_arglist
if we already resolved a formal arglist.
(resolve_call): Likewise.
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 46310a088f2..4740c3676d9 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2028,6 +2028,9 @@ typedef struct gfc_symbol
  This is legal in Fortran, but can cause problems with autogenerated
  C prototypes for C23.  */
   unsigned ext_dummy_arglist_mismatch:1;
+  /* Set if the formal arglist has already been resolved, to avoid
+ trying to generate it again from actual arguments.  */
+  unsigned formal_resolved:1;
 
   /* Reference counter, used for memory management.
 
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 1e62e94788b..bf1aa704888 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -533,7 +533,8 @@ gfc_resolve_formal_arglist (gfc_symbol *proc)
 	}
 	}
 }
-
+  if (sym)
+sym->formal_resolved = 1;
   gfc_current_ns = orig_current_ns;
 }
 
@@ -3472,7 +3473,7 @@ resolve_function (gfc_expr *expr)
 			   &expr->where, &sym->formal_at);
 	}
 	}
-  else
+  else if (!sym->formal_resolved)
 	{
 	  gfc_get_formal_from_actual_arglist (sym, expr->value.function.actual);
 	  sym->formal_at = expr->where;
@@ -4033,7 +4034,7 @@ resolve_call (gfc_code *c)
 			   &c->loc, &csym->formal_at);
 	}
 	}
-  else
+  else if (!csym->formal_resolved)
 	{
 	  gfc_get_formal_from_actual_arglist (csym, c->ext.actual);
 	  csym->formal_at = c->loc;
diff --git a/gcc/testsuite/gfortran.dg/interface_61.f90 b/gcc/testsuite/gfortran.dg/interface_61.f90
new file mode 100644
index 000..15db3b8a7a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/interface_61.f90
@@ -0,0 +1,27 @@
+! { dg-do compile }
+! { dg-options -Wexternal-argument-mismatch }
+! PR fortran/120163 - this used to cause an error.
+! Original test case by Bálint Aradi
+module mod1
+  implicit none
+
+  abstract interface
+pure subroutine callback_interface(a)
+  real, intent(in) :: a
+end subroutine callback_interface
+  end interface
+
+contains
+
+  subroutine caller(callback)
+procedure(callback_interface) :: callback
+real :: a
+call callback(a)
+  end subroutine caller
+
+end module mod1
+
+
+module mod2
+  use mod1
+end module mod2


Re: [PATCH v4 1/2] RISC-V: Support RISC-V Profiles 20/22.

2025-05-10 Thread Jeff Law




On 5/10/25 6:30 AM, Jiawei wrote:

This patch introduces support for RISC-V Profiles RV20 and RV22 [1],
enabling developers to utilize these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0

Version log:
Using lowercase letters to present Profiles.
Using '_' as divsor between Profiles and other RISC-V extension.
Add descriptions in invoke.texi.
Checking if there exist '_' between Profiles and additional extensions.
Using std::string to avoid memory problems.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_profiles): New 
struct.
(riscv_subset_list::parse_profiles): New parser.
(riscv_subset_list::parse_base_ext): Ditto.
* config/riscv/riscv-subset.h: New def.
* doc/invoke.texi: New option descriptions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-49.c: New test.
* gcc.target/riscv/arch-50.c: New test.
* gcc.target/riscv/arch-51.c: New test.
* gcc.target/riscv/arch-52.c: New test.

Both patches in this series are OK.  Please install.

Thanks again,
Jeff





[COMMITED] testsuite: Disable bit tests in aarch64/pr99988.c

2025-05-10 Thread Filip Kastl
My recent changes to bit-test switch lowering broke pr99988.c testcase.
The testcase assumes a switch will be lowered using jump tables.  Make
the testcase run with -fno-bit-tests.

Pushed as obvious.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr99988.c: Add -fno-bit-tests.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.target/aarch64/pr99988.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c 
b/gcc/testsuite/gcc.target/aarch64/pr99988.c
index 7cca4962944..c09ce67c0fa 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr99988.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-O2 -mbranch-protection=standard" } */
+/* { dg-options "-O2 -mbranch-protection=standard -fno-bit-tests" } */
 /* { dg-final { scan-assembler-times {bti j} 13 } } */
 int a;
 int c();
-- 
2.49.0



[PATCH] libgcc SH: fix alignment for relaxation

2025-05-10 Thread Quentin Boswank

From 6462f1e6a2565c5d4756036d9bc2f39dce9bd768 Mon Sep 17 00:00:00 2001
From: QBos07 
Date: Sat, 10 May 2025 16:56:28 +
Subject: [PATCH] libgcc SH: fix alignment for relaxation

when relaxation is enabled we can not infer the alignment
from the position as that may change. This should not change
non-relaxed builds as its allready aligned there. This was
the missing piece to building an entire toolchain with -mrelax

Credit goes to Oleg Endo: 
https://sourceware.org/bugzilla/show_bug.cgi?id=3298#c4

---
 libgcc/config/sh/lib1funcs.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/sh/lib1funcs.S b/libgcc/config/sh/lib1funcs.S
index eda48066c..11acfd5e3 100644
--- a/libgcc/config/sh/lib1funcs.S
+++ b/libgcc/config/sh/lib1funcs.S
@@ -115,7 +115,7 @@ see the files COPYING3 and COPYING.RUNTIME 
respectively.  If not, see

    HIDDEN_FUNC(GLOBAL(ashiftrt_r4_31))
    HIDDEN_FUNC(GLOBAL(ashiftrt_r4_32))

-   .align  1
+   .align  4
 GLOBAL(ashiftrt_r4_32):
 GLOBAL(ashiftrt_r4_31):
    rotcl   r4
@@ -764,6 +764,7 @@ LOCAL(movmem_loop): /* Reached with rts */
    bt  GLOBAL(movmemSI52)
 ! done all the large groups, do the remainder
 ! jump to movmem+
+   .balign 4
    mova    GLOBAL(movmemSI4)+4,r0
    add r6,r0
    jmp @r0
--
2.43.0


[PATCH, fortran] Fix simple typo in libgfortran

2025-05-10 Thread Yuao Ma
Hi Thomas,

Thanks for your quick response.

> There should be a ChangeLog entry. You can generate a template by
> running your patch through contrib/mklog.py, which you can then fill
> out with the details.

I think the ChangeLog entry is already in the patch. I used
git gcc-commit-mklog, and it produced the following:

```
libgfortran/ChangeLog:

   * io/read.c: typo fix, explict -> explicit
```

Is this sufficient, or does it need more detail?

Yuao


0001-Fix-simple-typo-in-libgfortran.patch
Description: 0001-Fix-simple-typo-in-libgfortran.patch


Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-10 Thread Sam James
Kugan Vivekanandarajah  writes:

> Add support for autoprofiledbootstrap in aarch64.
> This is similar to what is done for i386. Added
> gcc/config/aarch64/gcc-auto-profile for aarch64 profile
> creation.
>
> How to run:
> configure --with-build-config=bootstrap-lto
> make autoprofiledbootstrap
>
>
> Regression tested on aarch64-linux-gnu with no new regression.
> Also successfully  done autoprofiledbootstrap with the relevant patch.
>
> Is this OK for trunk?

I'd CC Eugene, the autofdo maintainer, on these patches so he can review
them. Andi Kleen is also interested in autofdo.

> Thanks,
> Kugan
>
> [2. 0004-AUTOFDO-AARCH64-Add-support-for-profilebootstrap.patch --- 
> text/x-patch; 0004-AUTOFDO-AARCH64-Add-support-for-profilebootstrap.patch]...

FWIW, gcc/config/aarch64/gcc-auto-profile has a #!/bin/sh shebang yet
uses a bashism ('=='). Just change that to '='.


Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-10 Thread Jeff Law




On 5/9/25 2:27 PM, Vineet Gupta wrote:

This showed up when debugging the testcase for PR119164.

RISC-V FRM mode-switching state machine has special handling for transitions
to and from a call_insn as FRM needs to saved/restored around calls (any
call is considered potentially FRM clobbering). Consider the following
canonical example where insns 2, 4, 6 come are due to user code, while
the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the
ABI semantics.

 test_float_point_frm_static:
 1:  frrma5 <--
  2: fsrmi   2
 3:  fsrma5 <--
  4: callnormalize_vl
 5:  frrma5 <--
  6: fsrmi   3
 7:  fsrma5 <--
Not the focus of this series, but isn't the frrm@5 unnecessary?  If 
there is no use of FRM after a call, but before the next set of FRM, 
then the restore isn't needed.  This is a pretty standard optimization 
in caller-saves style code generation.  Though maybe it's really an 
artifact of not showing the uses of FRM in your little example?





Current implementation of RISC-V TARGET_MODE_NEEDED has special handling
if the call_insn is last insn of BB, to ensure FRM save/reads are emitted
on all the edges. However it doesn't work as intended and is borderline
bogus for following reasons:

  - It fails to detect call_insn as last of BB (PR119164 test) if the
next BB starts with a code label (say due to call being conditional).
Granted this is a deficiency of API next_nonnote_nondebug_insn_bb ()
which incorrectly returns next BB code_label as opposed to returning
NULL (and this behavior is kind of relied upon by much of gcc).
This causes missed/delayed state transition to DYN.

  - If code is tightened to actually detect above such as:

  -  rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
  -  if (!insn)
  +  if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn)
I'm guessing the concern here was CUR_INSN could be a call and the last 
real insn in the block, but there could be notes and debug statements 
between the call and the actual end of the block?


Would it be safer as:
rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn)

But it's all academic since this code really just needs to go away.




edge insertion happens but ends up splitting the BB which generic
mode-sw doesn't expect and ends up hittng an ICE.

  - TARGET_MODE_NEEDED hook typically don't modify the CFG.

Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG.



  - For abnormal edges, insert_insn_end_basic_block () is called, which
by design on encountering call_insn as last in BB, inserts new insn
BEFORE the call, not after.
Right.  In theory we're not supposed to be asking for insertion on 
abnormals anyway.  It would be interesting to know if that's still 
happening or if it's legacy from the late 90s when we first started 
lighting up LCM algorithms and made all the common mistakes WRT abnormals.





So this is just all wrong and ripe for removal. Moreover there seems to
be no testsuite coverage for this code path at all. Results don't change
at all if this is removed.

The total number of FRM read/writes emitted (static count) across all
benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly
so its a net win even if minimal but the real gain is reduced complexity
and maintenance.

Before Patch
  ---
 frrm fsrmi fsrm   frrm fsrmi frrm
 perlbench_r   4204  4204
cpugcc_r  1670   17 1670   17
bwaves_r   1601  1601
   mcf_r   1100  1100
cactusBSSN_r   790   27  760   27
  namd_r  1190   63 1190   63
parest_r  2180  114 1680  114 <--
povray_r  1231   17 1231   17
   lbm_r600   600
   omnetpp_r   1701  1701
   wrf_r 2287   13 19562287   13 1956
  cpuxalan_r   1701  1701
ldecod_r   1100  1100
  x264_r   1401  1401
   blender_r  724   12  182 724   12  182
  cam4_r  324   13  169 324   13  169
 deepsjeng_r   1100  1100
   imagick_r  265   16   34 265   16   34
 leela_r   1200  1200
   nab_r   1301  1301
 exchange2_r   1601  1601
 fotonik3d_r   200   11  200   11
  roms_r   330   23  330   23
xz_r600   600
      ---
 4551   55 26234498   55 2623

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_frm_emit_after_b

Re: [PATCH 2/6] RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE

2025-05-10 Thread Jeff Law




On 5/9/25 2:27 PM, Vineet Gupta wrote:

This is effectively reverting e5d1f538bb7d
"(RISC-V: Allow different dynamic floating point mode to be merged)"
while retaining the testcase.

The change itself is valid, however it obfuscates the deficiencies in
current frm mode switching code.

Also for a SPEC2017 -Ofast -march=rv64gcv build, it ends up generating
net more FRM restores (writes) vs. the rest of this changeset.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): Remove.
(riscv_mode_confluence): Ditto.
(TARGET_MODE_CONFLUENCE): Ditto.

Unsure on this one.




-   /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible.
-  Although we already try to set the mode needed to FRM_DYN after a
-  function call, there are still some corner cases where both FRM_DYN
-  and FRM_DYN_CALL may appear on incoming edges.  */
Do we have an understanding of these corner cases?  That's my biggest 
worry with simply removing this code.


jeff



Re: [PATCH 4/6] RISC-V: frm/mode-switch: TARGET_MODE_AFTER not needed for frm switching

2025-05-10 Thread Jeff Law




On 5/9/25 2:27 PM, Vineet Gupta wrote:

Stumbled upon this when trying to wholesale rewrite frm switching code
and seeing what pieces needed to be retained from current implementation.

My interpretation of how this hook worked, for the following case:

 fsrmi 3
   fsrm a4
 call
   frrm a4
 fsrmi 1

TARGET_MODE_NEEDED(call_insn) returns DYN_EXIT (to generate fsrm) and
TARGET_MODE_AFTER(call_insn) returns DYN (to generate frrm). However
for a given insn, if the 2 hooks return different values, the final
state machine doesn't switch as expected above (and instead both NEEDED
and AFTER need to return the same mode, for most cases).

Anyhow it turns out that no-oping this (return the last_mode back) doesn't
change any testcase outcomes. There's no change to total number of FRM
read/writes emitted (static count) for SPEC2017 -Ofast -march=rv64gcv build
But we win again on reduced complexity and maintenance.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_frm_mode_needed): Move static
state update here.
(frm_unknown_dynamic_p): Delete.
(riscv_frm_mode_after): Delete.
(riscv_mode_after): Remove call to riscv_frm_mode_after ().

This doesn't seem right to me.

Don't we need to know when an insn sets FRM to an unknown value, either 
due to a call or due to an explicit assignment to FRM from a GPR?


I suspect that you don't see a difference when you nop this out because 
spec doesn't inherently do FRM mode changes and we probably have lousy 
coverage in the testsuite.


Jeff


[COMMITED] [PATCH v2] gimple: Don't assert that switch has nondefault cases during lowering [PR120080]

2025-05-10 Thread Filip Kastl
I have mistakenly assumed that switch lowering cannot encounter a switch
with zero clusters.  This patch removes the relevant assert and instead
gives up bit-test lowering when this happens.

PR tree-optimization/120080

gcc/ChangeLog:

* tree-switch-conversion.cc (bit_test_cluster::find_bit_tests):
Replace assert with return.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr120080.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.dg/tree-ssa/pr120080.c | 26 
 gcc/tree-switch-conversion.cc|  8 +---
 2 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr120080.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c
new file mode 100644
index 000..d71ef5e9dd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O2" } */
+
+void __GIMPLE (ssa,startwith("switchlower1"))
+foo (int b)
+{
+  __BB(2):
+  switch (b) {default: L9; case 0: L5; case 5: L5; case 101: L5; }
+
+  __BB(3):
+L9:
+  switch (b) {default: L7; case 5: L6; case 101: L6; }
+
+  __BB(4):
+L6:
+  __builtin_unreachable ();
+
+  __BB(5):
+L7:
+  __builtin_trap ();
+
+  __BB(6):
+L5:
+  return;
+
+}
diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index dea217a01ef..bd4de966892 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)
  end up with as few clusters as possible.  */
 
   unsigned l = clusters.length ();
-  auto_vec min;
-  min.reserve (l + 1);
 
-  gcc_checking_assert (l > 0);
+  if (l == 0)
+return clusters.copy ();
   gcc_checking_assert (l <= INT_MAX);
 
+  auto_vec min;
+  min.reserve (l + 1);
+
   int bits_in_word = GET_MODE_BITSIZE (word_mode);
 
   /* First phase: Compute the minimum number of clusters for each prefix of the
-- 
2.49.0



Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-10 Thread Vineet Gupta
On 5/10/25 07:17, Jeff Law wrote:
> On 5/9/25 2:27 PM, Vineet Gupta wrote:
>> This showed up when debugging the testcase for PR119164.
>>
>> RISC-V FRM mode-switching state machine has special handling for transitions
>> to and from a call_insn as FRM needs to saved/restored around calls (any
>> call is considered potentially FRM clobbering). Consider the following
>> canonical example where insns 2, 4, 6 come are due to user code, while
>> the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the
>> ABI semantics.
>>
>>  test_float_point_frm_static:
>>  1:  frrma5 <--
>>   2: fsrmi   2
>>  3:  fsrma5 <--
>>   4: callnormalize_vl
>>  5:  frrma5 <--
>>   6: fsrmi   3
>>  7:  fsrma5 <--
> Not the focus of this series, but isn't the frrm@5 unnecessary?  

It is necessary. Any function call can potentially change frm. Assuming
normalize_vl does, we need to read the latest "global" so that the restore in
insn 7 does the right thing.


> If there is no use of FRM after a call, but before the next set of FRM, 
> then the restore isn't needed.  This is a pretty standard optimization 
> in caller-saves style code generation.  Though maybe it's really an 
> artifact of not showing the uses of FRM in your little example?

Please see above.

>> Current implementation of RISC-V TARGET_MODE_NEEDED has special handling
>> if the call_insn is last insn of BB, to ensure FRM save/reads are emitted
>> on all the edges. However it doesn't work as intended and is borderline
>> bogus for following reasons:
>>
>>   - It fails to detect call_insn as last of BB (PR119164 test) if the
>> next BB starts with a code label (say due to call being conditional).
>> Granted this is a deficiency of API next_nonnote_nondebug_insn_bb ()
>> which incorrectly returns next BB code_label as opposed to returning
>> NULL (and this behavior is kind of relied upon by much of gcc).
>> This causes missed/delayed state transition to DYN.
>>
>>   - If code is tightened to actually detect above such as:
>>
>>   -  rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
>>   -  if (!insn)
>>   +  if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn)
> I'm guessing the concern here was CUR_INSN could be a call and the last 
> real insn in the block, but there could be notes and debug statements 
> between the call and the actual end of the block?

Yeah in the test case, there was a label as actual last insn of BB, but generic
mode_sw after call skips over labels (NONDEBUG_INSN_P used in the iterator).
Whereas the {prev,next}_nonnote_nondebug_insn_bb () doesn't hence the impedance
mismatch.

> Would it be safer as:
> rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
> if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn)
>
> But it's all academic since this code really just needs to go away.

Exactly.

>> edge insertion happens but ends up splitting the BB which generic
>> mode-sw doesn't expect and ends up hittng an ICE.
>>
>>   - TARGET_MODE_NEEDED hook typically don't modify the CFG.
> Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG.
>
>>   - For abnormal edges, insert_insn_end_basic_block () is called, which
>> by design on encountering call_insn as last in BB, inserts new insn
>> BEFORE the call, not after.
> Right.  In theory we're not supposed to be asking for insertion on 
> abnormals anyway.  It would be interesting to know if that's still 
> happening or if it's legacy from the late 90s when we first started 
> lighting up LCM algorithms and made all the common mistakes WRT abnormals.

Well the RISC-V code was looping thru all the edges and handling regular and
abnormal edges.
When I added the debug prints there, it was sometimes hitting the abnormal edge
case and putting before call, not after.

>> So this is just all wrong and ripe for removal. Moreover there seems to
>> be no testsuite coverage for this code path at all. Results don't change
>> at all if this is removed.
>>
>> The total number of FRM read/writes emitted (static count) across all
>> benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly
>> so its a net win even if minimal but the real gain is reduced complexity
>> and maintenance.
>>
>> Before Patch
>>   ---
>>  frrm fsrmi fsrm   frrm fsrmi frrm
>>  perlbench_r   4204  4204
>> cpugcc_r  1670   17 1670   17
>> bwaves_r   1601  1601
>>mcf_r   1100  1100
>> cactusBSSN_r   790   27  760   27
>>   namd_r  1190   63 1190   63
>> parest_r  2180  114 1680  114 <--
>> povray_r  1231   17 1231   17
>>lbm_r600   600
>>omnetpp

Re: [PATCH v4 1/2] RISC-V: Support RISC-V Profiles 20/22.

2025-05-10 Thread Jiawei

Committed on trunk, thanks!
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=43b450e3f72a53c744e77f55393962f1d349373a

 BR,

Jiawei


在 2025/5/11 0:42, Jeff Law 写道:



On 5/10/25 6:30 AM, Jiawei wrote:

This patch introduces support for RISC-V Profiles RV20 and RV22 [1],
enabling developers to utilize these profiles through the -march option.

[1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0

Version log:
Using lowercase letters to present Profiles.
Using '_' as divsor between Profiles and other RISC-V extension.
Add descriptions in invoke.texi.
Checking if there exist '_' between Profiles and additional extensions.
Using std::string to avoid memory problems.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_profiles): 
New struct.

(riscv_subset_list::parse_profiles): New parser.
(riscv_subset_list::parse_base_ext): Ditto.
* config/riscv/riscv-subset.h: New def.
* doc/invoke.texi: New option descriptions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-49.c: New test.
* gcc.target/riscv/arch-50.c: New test.
* gcc.target/riscv/arch-51.c: New test.
* gcc.target/riscv/arch-52.c: New test.

Both patches in this series are OK.  Please install.

Thanks again,
Jeff






Re: [PING] [PATCH] Fortran: fix passing of inquiry ref of complex array to TRANSFER [PR102891]

2025-05-10 Thread Harald Anlauf

Hi Jerry!

Am 10.05.25 um 21:53 schrieb Jerry D:

On 5/10/25 11:33 AM, Harald Anlauf wrote:

Early ping.

Am 06.05.25 um 21:06 schrieb Harald Anlauf:

Dear all,

here's another rather obvious case where a temporary is needed for
an inquiry reference of a complex array which is a component of a
derived type.  In contrast to PR119986, the argument is handled
within the code for the intrinsic TRANSFER, so that the other
patch did not catch the present one.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Yes, quite alright.


Thanks for the review!

Pushed as r16-518-g94fa992b60e53d so far.

Cheers,
Harald



I'd also like to backport this one to 15-branch if this is ok.


... and please do backport.

Best regards,

Jerry



Thanks,
Harald













Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets

2025-05-10 Thread Jonathan Yong

On 5/10/25 12:54 PM, LIU Hao wrote:

在 2025-5-10 20:48, Jonathan Yong 写道:

On 5/9/25 4:26 PM, LIU Hao wrote:

在 2025-5-3 20:52, LIU Hao 写道:

在 2025-5-2 01:25, LIU Hao 写道:
Remove `STACK_REALIGN_DEFAULT` for this target, because now the 
default value of `incoming_stack_boundary` equals 
`MIN_STACK_BOUNDARY` and it doesn't have an effect any more.





I suddenly realized the previous patch was for GCC 15 branch. Here's 
a new one, rebased on master.





Ping.

Also, I think I need some comments on the `force_align_arg_pointer` 
hunk. There's really no good reason for the change, except that 'we 
had better let the attribute do something.'




Can you please rebase your patch? Thanks.


Here is a patch, rebased on current master.




Thanks, pushed to master branch.



Re: [extern] Re: [PATCH 1/2] libstdc++: Support UNC paths on MinGW in fs::path

2025-05-10 Thread Johannes Grunenberg
On 10/05/2025 15:09, Jonathan Wakely wrote:
> Please confirm that you're using it that way (and not just adding it
> because others do so).

Yes, that's how I'm using it.


Re: [PATCH 0/6] RISC-V: frm state-machine improvements

2025-05-10 Thread Jeff Law




On 5/9/25 2:27 PM, Vineet Gupta wrote:

Hi,

This came out of Rivos perf team reporting (shoutout to Siavash) that
some of the SPEC2017 workloads had unnecessary FRM wiggles, when
none were needed. The writes in particular could be expensive.

I started with reduced test for PR/119164 from blender:node_testure_util.c.

However in trying to understand (and a botched rewrite of whole thing)
it turned out that lot of code was just unnecessary leading to more
complexity than warranted. As a result there are more deletions here and
the actual improvements come from just a few lines of actual changes.

I've verified each patch incrementally with
  - Testsuite run (unchanged, 1 unexpected pass 
gcc.target/riscv/rvv/autovec/pr119114.c)
  - SPEC build
  - Static analysis of FRM read/write insns emitted in all of SPEC binaries.
  - There's BPI date for some of this too, but the delta there is not
significant as this could really be uarch specific.

Here's the result for static analysis.


 1. revert-confluence  2. remove-edge-insert  4-fewer-frm-restore  
5-call-backtrack
   3. remove-mode-after
   ---    ---  
---
 frrm fsrmi fsrm   frrm fsrmi fsrm   frrm fsrmi fsrm
 frrm fsrmi fsrm
 perlbench_r   4204  4204  1701 
   1701
cpugcc_r  1670   17 1670   17  1100 
   1100
bwaves_r   1601  1601  1601 
   1601
   mcf_r   1100  1100  1100 
   1100
cactusBSSN_r   790   27  760   27  1901 
   1901
  namd_r  1190   63 1190   63  1401 
   1401
parest_r  2180  114 1680  114  2401 
   2401
povray_r  1231   17 1231   17  2616 
   2616
   lbm_r600   600   600 
600
   omnetpp_r   1701  1701  1701 
   1701
   wrf_r 2287   13 19562287   13 19561268   13 1603 
  613   13   82
  cpuxalan_r   1701  1701  1701 
   1701
ldecod_r   1100  1100  1100 
   1100
  x264_r   1401  1401  1100 
   1100
   blender_r  724   12  182 724   12  182  61   12   42 
   39   12   16
  cam4_r  324   13  169 324   13  169  45   13   20 
   40   13   17
 deepsjeng_r   1100  1100  1100 
   1100
   imagick_r  265   16   34 265   16   34 132   16   25 
   33   16   18
 leela_r   1200  1200  1200 
   1200
   nab_r   1301  1301  1301 
   1301
 exchange2_r   1601  1601  1601 
   1601
 fotonik3d_r   200   11  200   11  1901 
   1901
  roms_r   330   23  330   23  2101 
   2101
xz_r600   600   600 
600
     ---  ---  

 4551   55 26234498   55 26231804   55 1707 
 1023   55  150
     ---  ---  

   7729  7176  3566 
   1228
     ---  ---  


It seems wrf still has half of all read/writes
  613   13   82

with one function having the bulk of them
   solve_em_  5551   50

This is 1 static RM so ideally needs 1 save and 1 restore.

I have a feeling this has to do with following:
 https://godbolt.org/z/Px9es7j1r

The function call code path need not bother with frm save/restore at
all. This is currently being investigated but could take more time.
Frankly I'm surprised we need FRM adjustments as much as we do, though 
presumably there's some builtin or somesuch that we need to twiddle FRM 
to implement and as a result if the builtin ever gets used it leads to 
FRM games.  But it still seems high.  For example, what does xz do that 
triggers any FRM adjustments, even statically?!?





Please review.

Will do.

Thanks!

jeff



Re: [PATCH, fortran] Fix simple typo in libgfortran

2025-05-10 Thread Tobias Burnus

Hi Yuao,


Yuao Ma wrote:
Thanks for the tip! I ran the git_check_commit.py script, and it 
reported "OK."
However, when I regenerated the ChangeLog, the function name wasn't 
included

automatically.


Usually 'git diff' shows the function name in the '@@' line, but that does
not always work – either it shows no function name (as in this case)
or the wrong one.

Showing the wrong happens relatively often when, e.g., adding a member
to a 'struct' or when updating the function description comment as that
comes before the associated function. And not showing the name happens
most often for labels (see below).

For instance, picking the most recent commit with code changes,
a diff usually has something like the following - and then mklog.py works:

--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec 
&clusters, int max_c)

→ 'bit_test_cluster::find_bit_tests' is the function name.

However, in your case, the label ('exponent:') caused that 'git diff'
showed it instead of the function name from a few lines up. Namely,
your patch diff has:

--- a/libgfortran/io/read.c
+++ b/libgfortran/io/read.c
@@ -1375,7 +1375,7 @@ exponent:


* * *



I've manually added the function name to the ChangeLog for now


Thanks.

* * *

One very minor nit:

The email subject reads
  "[PATCH, fortran] Fix simple typo in libgfortran"
while the patch subject is
  "[PATCH] fortran: fix simple typo in libgfortran"

It's usually nicer if they are the same to make finding old patch
reviews easier - usage case: someone wonders 5 years on why a
certain change was done in a particular way and looks at the old
email thread.

In your case, those are similar enough that it doesn't matter - and
sometimes it is unavoidable that they diverge (e.g. when fixing the
bug number before the commit - while the email thread shows still
the wrong bug), but, if possible/sensible, they should be the same.

* * *

I have now applied it on your behalf as
r16-515-g5d9e66493afaa9 alias 5d9e66493afaa9

[The first version one denotes the 515th commit to GCC 16 (= mainline
alias trunk); that way, it is easier to see in which GCC version a
commit was done - and (relatively to other commits) when.]

Short links - which is a readable way to link to commits
(I hope Outlook.com doesn't mangle them in an unreadable way):
https://gcc.gnu.org/r16-515-g5d9e66493afaa9
orhttps://gcc.gnu.org/g:5d9e66493afaa9


If you ever want to link to bug reports (PR, problem report),
something likehttps://gcc.gnu.org/PR1234 can be used.

Tobias

PS: The ./contrib/git-descr.sh script creates the rXX-YYY number; or
'git gcc-descr' if you added the alias (e.g. via 
contrib/gcc-git-customization.sh).


Re: [parch, fortran] Fix PR 120163, 15/16 regression

2025-05-10 Thread Harald Anlauf

Hi Thomas!

Am 10.05.25 um 11:42 schrieb Thomas Koenig:



This bug was another case of generating a formal arglist from
an actual one where we should not have done so.  The fix is
straightforward:  If we have resolved the formal arglist, we should
not generare a new one.

OK for trunk and backport to gcc 15?


Assuming this regtests OK (you didn't say so) this is OK for both.

Thanks for the patch!

Harald


Best regards

 Thomas

Do not generate formal arglist from actual if we have already resolved it.

gcc/fortran/ChangeLog:

 PR fortran/120163
 * gfortran.h: Add formal_resolved to gfc_symbol.
 * resolve.cc (gfc_resolve_formal_arglist): Set it.
 (resolve_function): Do not call gfc_get_formal_from_actual_arglist
 if we already resolved a formal arglist.
 (resolve_call): Likewise.





[PING] [PATCH] Fortran: fix passing of inquiry ref of complex array to TRANSFER [PR102891]

2025-05-10 Thread Harald Anlauf

Early ping.

Am 06.05.25 um 21:06 schrieb Harald Anlauf:

Dear all,

here's another rather obvious case where a temporary is needed for
an inquiry reference of a complex array which is a component of a
derived type.  In contrast to PR119986, the argument is handled
within the code for the intrinsic TRANSFER, so that the other
patch did not catch the present one.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

I'd also like to backport this one to 15-branch if this is ok.

Thanks,
Harald





Re: [PATCH v1] libstdc++: More efficient weekday from year_month_day.

2025-05-10 Thread Cassio Neri
Thanks Andrew for your prompt reply.

The results below regard my PoC which is as close to the proposed patch as
I could make. This is because I can't have chrono with my patch on godbold
for a comparison between current chrono and patched chrono.

I tried on all platforms that I could make it to compile. Please double
check everything because I might be misreading some results, especially on
the platforms that I'm not familiar with. Sometimes godbold seems to have
issues and cut pieces of the generated assembly from the output. I've
marked these cases with (X).

I suspect we want to disable this for -Os
>

Below are the sizes with -Os. Most of the time the new code is shorter than
the old one with a few exceptions where they are the same size (because the
platform doesn't seem to support [[assume]]). The new code is never longer.
On each link, the middle panel shows the result for the old code and the
right panel for the new code. These panels have tabs for different
platforms.

Old   New

https://godbolt.org/z/hfz9szEWf

x86-64 0x81  0x69
ARM32  0x78  0x68
ARM64  0x81  0x71
ARM64 Morello  0x48  0x48
HPPA   0xf8  0xc8
KVX ACB0xec  0xcc
loongarch640x94  0x8c  (x)

https://godbolt.org/z/eMfzoPhT5

M68K   0xb6  0xa6
MinGW  0xa0  0x80  (X)
mips   0xdc  0xac
mips64 0xcc  0xb8
mipls64 (el)   0xbc  0xa8
mipsel 0xe0  0xb0
MRISC320xa4  0x74
power  0xb8  0x80

https://godbolt.org/z/PjqbTqK6b

power640xa8  0x8c
power64le  0xa4  0x88
RISC-V (32)0x90  0x7e  (X)
RISC-V (64)0x86  0x86  (X)
s390x  0xf0  0x90
sh 0xc2  0xb2
SPARC  0xc0  0x98
SPARC LEON 0xbc  0x94

https://godbolt.org/z/7oebGMYTM

SPARC640xac  0x94
TI C6x 0xc4  0x98
Tricore0xb0  0xb0
VAX0xc8  0xc5


> And plus i am not 100% convinced it is best for all micro-architures.
> Especially on say aarch64.
> Can you do more benchmarking and peocide which exaxt core is being used?
>

I don't have access to any platform other than x86-64 to do benchmarks :-(

And mention the size difference too?
>

Same exercise explained above but with -O2:

 OldNew
https://godbolt.org/z/eqGo9xnz3

x86-64 0x a4  0x 72
ARM32  0x a8  0x 74
ARM64  0x 98  0x 80
ARM64 Morello  0x14c  0x14c
HPPA   0x134  0x c8
KVX ACB0x f4  0x 98
loongarch640x ac  0x 9c  (X)

https://godbolt.org/z/7qh94zGMK

M68K   0x13a  0x a2
MinGW  0x d0  0x 80  (X)
mips   0x11c  0x e4
mips64 0x130  0x f0
mipls64 (el)   0x120  0x e0
mipsel 0x120  0x e8
MRISC320x a0  0x 74
power  0x dc  0x 88

https://godbolt.org/z/Y11Trnqc1

power640x d0  0x 94
power64le  0x d0  0x 90
RISC-V (32)0x bc  0x 84  (X)
RISC-V (64)0x be  0x 94  (X)
s390x  0x f0  0x a8
sh 0x c6  0x cc  (*)
SPARC  0x108  0x 9c
SPARC LEON 0x f4  0x 94

https://godbolt.org/z/h456PTEWh

SPARC640x c0  0x a0
TI C6x 0x108  0x b0
Tricore0x e8  0x ea  (*)
VAX0x dc  0x dc

(*) These are the only cases where the new code is larger than the old one.

Plus gcc knows how to do %7 via multiplication is that being used or is it
> due to generic x86 tuning it is using the div instruction?
>

Yes and no. In x86-64 (and probably many other platforms) the current
optimisation for n % 7 is a byproduct of the optimisation for /, that is,
to calculate n % 7, the generated code evaluates n - (n / 7) * 7. The
quotient q = n / 7 is optimised to avoid div and uses a multiplication and
other cheaper operations. In total it evaluates 2 multiplications + shifts
+ add + subs and movs. (One multiplication is q*7 which is performed with
LEA + sub.) The algorithm that I'm suggesting, performs only one
multiplication and one. Below are the comparisons of n % 7 and the proposed
algorithm.

https://godbolt.org/z/o7dazs4Gc
https://godbolt.org/z/zP79736WK
https://godbolt.org/z/65x7naMfq
https://godbolt.org/z/z9ofaMzex

I hope this helps.
Cassio.


Re: i386: Fix some problems in stv cost model

2025-05-10 Thread H.J. Lu
On Sun, May 11, 2025 at 4:28 AM Jan Hubicka  wrote:
>
> Hi,
> this patch fixes some of problems with cosint in scalar to vector pass.
> In particular

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120215

>  1) the pass uses optimize_insn_for_size which is intended to be used by
> expanders and splitters and requires the optimization pass to use
> set_rtl_profile (bb) for currently processed bb.
> This is not done, so we get random stale info about hotness of insn.
>  2) register allocator move costs are all realtive to integer reg-reg move
> which has cost of 2, so it is (except for size tables and i386)
> a latency of instruction multiplied by 2.
> These costs has been duplicated and are now used in combination with
> rtx costs which are all based to COSTS_N_INSNS that multiplies latency
> by 4.
> Some of vectorizer costing contains COSTS_N_INSNS (move_cost) / 2
> to compensate, but some new code does not.  This patch adds compensatoin.
>
> Perhaps we should update the cost tables to use COSTS_N_INSNS everywher
> but I think we want to first fix inconsistencies.  Also the tables will
> get optically much longer, since we have many move costs and COSTS_N_INSNS
> is a lot of characters.
>  3) variable m which decides how much to multiply integer variant (to account
> that with -m32 all 64bit computations needs 2 instructions) is declared
> unsigned which makes the signed computation of instruction gain to be
> done in unsigned type and breaks i.e. for division.
>  4) I added integer_to_sse costs which are currently all duplicationof
> sse_to_integer. AMD chips are asymetric and moving one direction is faster
> than another.  I will chance costs incremnetally once vectorizer part
> is fixed up, too.
>
> There are two failures gcc.target/i386/minmax-6.c and 
> gcc.target/i386/minmax-7.c.
> Both test stv on hasswell which no longer happens since SSE->INT and INT->SSE 
> moves
> are now more expensive.
>
> There is only one instruction to convert:
>
> Computing gain for chain #1...
>   Instruction gain 8 for11: {r110:SI=smax(r116:SI,0);clobber flags:CC;}
>   Instruction conversion gain: 8
>   Registers conversion cost: 8<- this is integer_to_sse and sse_to_integer
>   Total gain: 0
>
> total gain used to be 4 since the patch doubles the conversion costs.
> According to agner fog's tables the costs should be 1 cycle which is correct
> here.
>
> Final code gnerated is:
>
> vmovd   %esi, %xmm0 * latency 1
> cmpl%edx, %esi
> je  .L2
> vpxor   %xmm1, %xmm1, %xmm1 * latency 1
> vpmaxsd %xmm1, %xmm0, %xmm0 * latency 1
> vmovd   %xmm0, %eax * latency 1
> imull   %edx, %eax
> cltq
> movzwl  (%rdi,%rax,2), %eax
> ret
>
> cmpl%edx, %esi
> je  .L2
> xorl%eax, %eax  * latency 1
> testl   %esi, %esi  * latency 1
> cmovs   %eax, %esi  * latency 2
> imull   %edx, %esi
> movslq  %esi, %rsi
> movzwl  (%rdi,%rsi,2), %eax
> ret
>
> Instructions with latency info are those really different.
> So the uncoverted code has sum of latencies 4 and real latency 3.
> Converted code has sum of latencies 4 and real latency 3 (vmod+vpmaxsd+vmov).
> So I do not quite see it should be a win.
>
> There is also a bug in costing MIN/MAX
>
> case ABS:
> case SMAX:
> case SMIN:
> case UMAX:
> case UMIN:
>   /* We do not have any conditional move cost, estimate it as a
>  reg-reg move.  Comparisons are costed as adds.  */
>   igain += m * (COSTS_N_INSNS (2) + ix86_cost->add);
>   /* Integer SSE ops are all costed the same.  */
>   igain -= ix86_cost->sse_op;
>   break;
>
> Now COSTS_N_INSNS (2) is not quite right since reg-reg move should be 1 or 
> perhaps 0.
> For Haswell cmov really is 2 cycles, but I guess we want to have that in cost 
> vectors
> like all other instructions.
>
> I am not sure if this is really a win in this case (other minmax testcases 
> seems to make
> sense).  I have xfailed it for now and will check if that affects specs on 
> LNT testers.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> I will proceed with similar fixes on vectorizer cost side. Sadly those 
> introduces
> quite some differences in the testuiste (partly triggered by other costing 
> problems,
> such as one of scatter/gather)
>
> gcc/ChangeLog:
>
> * config/i386/i386-features.cc
> (general_scalar_chain::vector_const_cost): Add BB parameter; handle
> size costs; use COSTS_N_INSNS to compute move costs.
> (general_scalar_chain::compute_convert_gain): Use optimize_bb_for_size
> instead of optimize_insn_for size; use COSTS_N_INSNS to compute move 
> costs;
> update cal

Re: [PATCH 0/6] RISC-V: frm state-machine improvements

2025-05-10 Thread Vineet Gupta
On 5/10/25 06:49, Jeff Law wrote:
> On 5/9/25 2:27 PM, Vineet Gupta wrote:
>> Hi,
>>
>> This came out of Rivos perf team reporting (shoutout to Siavash) that
>> some of the SPEC2017 workloads had unnecessary FRM wiggles, when
>> none were needed. The writes in particular could be expensive.
>>
>> I started with reduced test for PR/119164 from blender:node_testure_util.c.
>>
>> However in trying to understand (and a botched rewrite of whole thing)
>> it turned out that lot of code was just unnecessary leading to more
>> complexity than warranted. As a result there are more deletions here and
>> the actual improvements come from just a few lines of actual changes.
>>
>> I've verified each patch incrementally with
>>   - Testsuite run (unchanged, 1 unexpected pass 
>> gcc.target/riscv/rvv/autovec/pr119114.c)
>>   - SPEC build
>>   - Static analysis of FRM read/write insns emitted in all of SPEC binaries.
>>   - There's BPI date for some of this too, but the delta there is not
>> significant as this could really be uarch specific.
>>
>> Here's the result for static analysis.
>>
>>
>>  1. revert-confluence  2. remove-edge-insert  
>> 4-fewer-frm-restore  5-call-backtrack
>>3. remove-mode-after
>>---    
>> ---  ---
>>  frrm fsrmi fsrm   frrm fsrmi fsrm   frrm fsrmi fsrm 
>> frrm fsrmi fsrm
>>  perlbench_r   4204  4204  1701  
>>   1701
>> cpugcc_r  1670   17 1670   17  1100  
>>   1100
>> bwaves_r   1601  1601  1601  
>>   1601
>>mcf_r   1100  1100  1100  
>>   1100
>> cactusBSSN_r   790   27  760   27  1901  
>>   1901
>>   namd_r  1190   63 1190   63  1401  
>>   1401
>> parest_r  2180  114 1680  114  2401  
>>   2401
>> povray_r  1231   17 1231   17  2616  
>>   2616
>>lbm_r600   600   600  
>>600
>>omnetpp_r   1701  1701  1701  
>>   1701
>>wrf_r 2287   13 19562287   13 19561268   13 1603  
>>  613   13   82
>>   cpuxalan_r   1701  1701  1701  
>>   1701
>> ldecod_r   1100  1100  1100  
>>   1100
>>   x264_r   1401  1401  1100  
>>   1100
>>blender_r  724   12  182 724   12  182  61   12   42  
>>   39   12   16
>>   cam4_r  324   13  169 324   13  169  45   13   20  
>>   40   13   17
>>  deepsjeng_r   1100  1100  1100  
>>   1100
>>imagick_r  265   16   34 265   16   34 132   16   25  
>>   33   16   18
>>  leela_r   1200  1200  1200  
>>   1200
>>nab_r   1301  1301  1301  
>>   1301
>>  exchange2_r   1601  1601  1601  
>>   1601
>>  fotonik3d_r   200   11  200   11  1901  
>>   1901
>>   roms_r   330   23  330   23  2101  
>>   2101
>> xz_r600   600   600  
>>600
>>  ---  
>> ---  
>>  4551   55 26234498   55 26231804   55 1707  
>> 1023   55  150
>>  ---  
>> ---  
>>7729  7176  3566  
>>   1228
>>  ---  
>> ---  
>>
>> It seems wrf still has half of all read/writes
>>   613   13   82
>>
>> with one function having the bulk of them
>>solve_em_  5551   50
>>
>> This is 1 static RM so ideally needs 1 save and 1 restore.
>>
>> I have a feeling this has to do with following:
>>  https://godbolt.org/z/Px9es7j1r
>>
>> The function call code path need not bother with frm save/restore at
>> all. This is currently being investigated but could take more time.

I'll get back to this in detail in a follow up. It is kind of add-on