Re: [PATCH] fortran: Fix debug info for unsigned(kind=1) and unsigned(kind=4) [PR120193]
Hi Jakub, The following patch uses a variant of the character(kind=4) type for unsigned(kind=4) and a distinct type based on character(kind=1) type for unsigned(kind=1). The reason for the latter is that unsigned_char_type_node has TYPE_STRING_FLAG set on it, so it has DW_AT_encoding DW_ATE_unsigned_char rather than DW_ATE_unsigned and so the debugger then likes to print it as characters rather than numbers. That is IMHO in Fortran desirable for character(kind=1) but not for unsigned(kind=1). I've made sure TYPE_CANONICAL of the unsigned(kind=1) type is still character(kind=1), so they are considered compatible by the middle-end also e.g. for aliasing etc. Ah, that's how to do it. I wasn't really comfortable re-using CHARACTER, but I had found no better way. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and later on for 15.2? Yes. And thanks for the patch! Best regards Thomas
[PATCH v4 1/2] RISC-V: Support RISC-V Profiles 20/22.
This patch introduces support for RISC-V Profiles RV20 and RV22 [1], enabling developers to utilize these profiles through the -march option. [1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0 Version log: Using lowercase letters to present Profiles. Using '_' as divsor between Profiles and other RISC-V extension. Add descriptions in invoke.texi. Checking if there exist '_' between Profiles and additional extensions. Using std::string to avoid memory problems. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (struct riscv_profiles): New struct. (riscv_subset_list::parse_profiles): New parser. (riscv_subset_list::parse_base_ext): Ditto. * config/riscv/riscv-subset.h: New def. * doc/invoke.texi: New option descriptions. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-49.c: New test. * gcc.target/riscv/arch-50.c: New test. * gcc.target/riscv/arch-51.c: New test. * gcc.target/riscv/arch-52.c: New test. --- gcc/common/config/riscv/riscv-common.cc | 85 ++- gcc/config/riscv/riscv-subset.h | 2 + gcc/doc/invoke.texi | 17 ++-- .../aarch64/sve/acle/general/attributes_7.c | 2 +- gcc/testsuite/gcc.target/riscv/arch-49.c | 5 ++ gcc/testsuite/gcc.target/riscv/arch-50.c | 12 +++ gcc/testsuite/gcc.target/riscv/arch-51.c | 12 +++ gcc/testsuite/gcc.target/riscv/arch-52.c | 6 ++ 8 files changed, 131 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/arch-49.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-50.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-51.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-52.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index ca14eb96b2..0fa2e21f27 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -274,6 +274,12 @@ struct riscv_ext_version int minor_version; }; +struct riscv_profiles +{ + const char *profile_name; + const char *profile_string; +}; + /* All standard extensions defined in all supported ISA spec. */ static const struct riscv_ext_version riscv_ext_version_table[] = { @@ -502,6 +508,31 @@ static const struct riscv_ext_version riscv_combine_info[] = {NULL, ISA_SPEC_CLASS_NONE, 0, 0} }; +/* This table records the mapping form RISC-V Profiles into march string. */ +static const riscv_profiles riscv_profiles_table[] = +{ + /* RVI20U only contains the base extension 'i' as mandatory extension. */ + {"rvi20u64", "rv64i"}, + {"rvi20u32", "rv32i"}, + + /* RVA20U contains the 'i,m,a,f,d,c,zicsr,zicntr,ziccif,ziccrse,ziccamoa, + zicclsm,za128rs' as mandatory extensions. */ + {"rva20u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa" + "_zicclsm_za128rs"}, + + /* RVA22U contains the 'i,m,a,f,d,c,zicsr,zihintpause,zba,zbb,zbs,zicntr, + zihpm,ziccif,ziccrse,ziccamoa, zicclsm,zic64b,za64rs,zicbom,zicbop,zicboz, + zfhmin,zkt' as mandatory extensions. */ + {"rva22u64", "rv64imafdc_zicsr_zicntr_ziccif_ziccrse_ziccamoa" + "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop" + "_zicboz_zfhmin_zkt"}, + + /* Currently we do not define S/M mode Profiles in gcc part. */ + + /* Terminate the list. */ + {NULL, NULL} +}; + static const riscv_cpu_info riscv_cpu_tables[] = { #define RISCV_CORE(CORE_NAME, ARCH, TUNE) \ @@ -1109,6 +1140,52 @@ riscv_subset_list::parsing_subset_version (const char *ext, return p; } +/* Parsing RISC-V Profiles in -march string. + Return string with mandatory extensions of Profiles. */ +std::string +riscv_subset_list::parse_profiles (const char *arch) +{ + /* Checking if input string contains a Profiles. +There are two cases use Profiles in -march option: + +1. Only use Profiles in '-march' as input +2. Mixed Profiles with other extensions + +Use '_' to split Profiles and other extension. */ +std::string p(arch); +const size_t p_len = p.size(); + +for (int i = 0; riscv_profiles_table[i].profile_name != nullptr; ++i) +{ + const std::string& p_name = riscv_profiles_table[i].profile_name; + const std::string& p_str = riscv_profiles_table[i].profile_string; + size_t pos = p.find(p_name); + /* Find profile at the begin. */ + if (pos == 0 && pos + p_name.size() <= p_len) + { + size_t after_pos = pos + p_name.size(); + std::string after_part = p.substr(after_pos); + + /* If there're only profile, return the profile_string directly. */ + if (after_part[0] == '\0') + return p_str; + + /* If isn't '_' after profile, need to add it and mention the user. */ + if (after_part[0] != '_') + { + warning_at (m_loc, 0, "Should use \"%c\" to contact Profiles with other " + "extensions", '_'); +
[PATCH v4 2/2] RISC-V: Support RISC-V Profiles 23.
This patch introduces support for RISC-V Profiles RV23A and RV23B [1], enabling developers to utilize these profiles through the -march option. [1] https://github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-ratified Version log: Update the testcases, using lowercase letter. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: New profile. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-53.c: New test. * gcc.target/riscv/arch-54.c: New test. --- gcc/common/config/riscv/riscv-common.cc | 16 gcc/testsuite/gcc.target/riscv/arch-53.c | 11 +++ gcc/testsuite/gcc.target/riscv/arch-54.c | 10 ++ 3 files changed, 37 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/arch-53.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-54.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 0fa2e21f27..e06cd5fa57 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -527,6 +527,22 @@ static const riscv_profiles riscv_profiles_table[] = "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop" "_zicboz_zfhmin_zkt"}, + /* RVA23 contains all mandatory base ISA for RVA22U64 and the new extension + 'v,zihintntl,zvfhmin,zvbb,zvkt,zicond,zimop,zcmop,zfa,zawrs' as mandatory + extensions. */ + {"rva23u64", "rv64imafdcv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa" + "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop" + "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb" + "_zfa_zawrs"}, + + /* RVB23 contains all mandatory base ISA for RVA22U64 and the new extension + 'zihintntl,zicond,zimop,zcmop,zfa,zawrs' as mandatory + extensions. */ + {"rvb23u64", "rv64imafdc_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa" + "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop" + "_zicboz_zfhmin_zkt_zihintntl_zicond_zimop_zcmop_zcb" + "_zfa_zawrs"}, + /* Currently we do not define S/M mode Profiles in gcc part. */ /* Terminate the list. */ diff --git a/gcc/testsuite/gcc.target/riscv/arch-53.c b/gcc/testsuite/gcc.target/riscv/arch-53.c new file mode 100644 index 00..8210978ee8 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/arch-53.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rva23u64 -mabi=lp64d" } */ + +void foo(){} + +/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0" +"_b1p0_v1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0" +_ziccrse1p0_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0" +_za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0" +_zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0" +_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0\"" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/arch-54.c b/gcc/testsuite/gcc.target/riscv/arch-54.c new file mode 100644 index 00..6d242dfba5 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/arch-54.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rvb23u64 -mabi=lp64d" } */ + +void foo(){} + +/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0" +"_b1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0_ziccrse1p0" +"_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0_za64rs1p0" +"_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0_zba1p0" +"_zbb1p0_zbs1p0_zkt1p0\"" } } */ -- 2.43.0
New Swedish PO file for 'gcc' (version 15.1.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: https://translationproject.org/latest/gcc/sv.po (This file, 'gcc-15.1.0.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets
On 5/9/25 4:26 PM, LIU Hao wrote: 在 2025-5-3 20:52, LIU Hao 写道: 在 2025-5-2 01:25, LIU Hao 写道: Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more. I suddenly realized the previous patch was for GCC 15 branch. Here's a new one, rebased on master. Ping. Also, I think I need some comments on the `force_align_arg_pointer` hunk. There's really no good reason for the change, except that 'we had better let the attribute do something.' Can you please rebase your patch? Thanks.
Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets
在 2025-5-10 20:48, Jonathan Yong 写道: On 5/9/25 4:26 PM, LIU Hao wrote: 在 2025-5-3 20:52, LIU Hao 写道: 在 2025-5-2 01:25, LIU Hao 写道: Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more. I suddenly realized the previous patch was for GCC 15 branch. Here's a new one, rebased on master. Ping. Also, I think I need some comments on the `force_align_arg_pointer` hunk. There's really no good reason for the change, except that 'we had better let the attribute do something.' Can you please rebase your patch? Thanks. Here is a patch, rebased on current master. -- Best regards, LIU Hao --- gcc/config/i386/cygming.h | 9 - gcc/config/i386/i386.cc | 9 + 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h index d587d25a58a8..743cc38f5852 100644 --- a/gcc/config/i386/cygming.h +++ b/gcc/config/i386/cygming.h @@ -28,16 +28,15 @@ along with GCC; see the file COPYING3. If not see #undef TARGET_SEH #define TARGET_SEH (TARGET_64BIT_MS_ABI && flag_unwind_tables) +#undef PREFERRED_STACK_BOUNDARY_DEFAULT +#define PREFERRED_STACK_BOUNDARY_DEFAULT \ + (TARGET_64BIT ? 128 : MIN_STACK_BOUNDARY) + /* Win64 with SEH cannot represent DRAP stack frames. Disable its use. Force the use of different mechanisms to allocate aligned local data. */ #undef MAX_STACK_ALIGNMENT #define MAX_STACK_ALIGNMENT (TARGET_SEH ? 128 : MAX_OFILE_ALIGNMENT) -/* 32-bit Windows aligns the stack on a 4-byte boundary but SSE instructions - may require 16-byte alignment. */ -#undef STACK_REALIGN_DEFAULT -#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1) - /* Support hooks for SEH. */ #undef TARGET_ASM_UNWIND_EMIT #define TARGET_ASM_UNWIND_EMIT i386_pe_seh_unwind_emit diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index fd36ea802c00..9c24a926a890 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -7942,6 +7942,15 @@ ix86_update_stack_boundary (void) if (ix86_tls_descriptor_calls_expanded_in_cfun && crtl->preferred_stack_boundary < 128) crtl->preferred_stack_boundary = 128; + + /* For 32-bit MS ABI, both the incoming and preferred stack boundaries + are 32 bits, but if force_align_arg_pointer is specified, it should + prefer 128 bits for a backward-compatibility reason, which is also + what the doc suggests. */ + if (lookup_attribute ("force_align_arg_pointer", + TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl))) + && crtl->preferred_stack_boundary < 128) +crtl->preferred_stack_boundary = 128; } /* Handle the TARGET_GET_DRAP_RTX hook. Return NULL if no DRAP is -- 2.49.0 From 5a49e9a4aa1da8ade48c55df666d03e03f36e797 Mon Sep 17 00:00:00 2001 From: LIU Hao Date: Tue, 29 Apr 2025 10:43:06 +0800 Subject: [PATCH] i386/cygming: Decrease default preferred stack boundary for 32-bit targets This commit decreases the default preferred stack boundary to 4. In i386-options.cc, there's ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY; which sets the default incoming stack boundary to this value, if it's not overridden by other options or attributes. Previously, GCC preferred 16-byte alignment like other platforms, unless `-miamcu` was specified. However, the Microsoft x86 ABI only requires the stack be aligned to 4-byte boundaries. Callback functions from MSVC code may break this assumption by GCC (see reference below), causing local variables to be misaligned. For compatibility reasons, when the attribute `force_align_arg_pointer` is attached to a function, it continues to ensure the stack is at least aligned to a 16-byte boundary, as the documentation seems to suggest. After this change, `STACK_REALIGN_DEFAULT` no longer has an effect on this target, so it is removed. Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=07#c9 Signed-off-by: LIU Hao gcc/ChangeLog: PR 07 * config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override definition from i386.h. (STACK_REALIGN_DEFAULT): Undefine, as it no longer has an effect. * config/i386/i386.cc (ix86_update_stack_boundary): Force minimum 128-bit alignment if `force_align_arg_pointer`. Signed-off-by: LIU Hao --- gcc/config/i386/cygming.h | 9 - gcc/config/i386/i386.cc | 9 + 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h index d587d25a58a8..743cc38f5852 100644 --- a/gcc/config/i386/cygming.h +++ b/gcc/config/i386/cygming.h @@ -28,16 +28,15 @@ along with GCC; see the file COPYING3. If not see #undef TARGET_SEH #define TARGET_SEH (TARGET_64BIT_MS_ABI && flag_unwind_tables) +#undef PREFERRED_STACK_BOUNDARY_DEFAULT +#define PREFERRED_STACK_BOUNDARY_DEFAULT \ + (TARGET_64BIT ? 128 : MI
Re: [PATCH 1/2] libstdc++: Support UNC paths on MinGW in fs::path
On Sat, 10 May 2025 at 12:49, Johannes Grunenberg wrote: > > UNC paths on Windows start with a root name "\\server" instead of a > drive letter ("c:"). Support for parsing this was already implemented > for cygwin, which (compared to MinGW) doesn't use backslash as the > preferred separator. > This mainly enables SLASHSLASH_IS_ROOTNAME on Windows and adjusts the > tests accordingly. > I aligned path::_M_append with path::operator/= to use has_root_directory > and is_absolute from `this` instead of the appended path. Note that > before this patch, `is_absolute && has_root_directory` was always > `false`, as is_absolute implied has_root_directory. This isn't the case > anymore (e.g. `\\foo` doesn't have a root directory). > > PR libstdc++-v3/99333 > PR libstdc++-v3/99430 > PR libstdc++-v3/106127 > > libstdc++-v3/ChangeLog: > > * config/abi/pre/gnu.ver: Add fs::path::_M_is_unc_path() to ABI > * include/bits/fs_path.h: Add and use fs::path::_M_is_unc_path() > on Windows. > * src/c++17/fs_path.cc (defined): Treat "//" as a root name on > Windows. > (path::_M_append): Align with operator/=. > (path::_M_is_unc_path): Added helper to check for > "(slash)(slash)(not-slash)". > * testsuite/27_io/filesystem/path/append/path.cc: Added tests for UNC > paths. > * testsuite/27_io/filesystem/path/append/source.cc: Added tests for > UNC paths. > * testsuite/27_io/filesystem/path/decompose/filename.cc: Switched > Windows FS to test for UNC paths. > * testsuite/27_io/filesystem/path/decompose/root_directory.cc: > Switched Windows FS to test for UNC paths. > * testsuite/27_io/filesystem/path/generation/normal.cc: Switched > Windows FS to test for UNC paths. > * testsuite/27_io/filesystem/path/generic/94242.cc: Added case for > double-slashes as root directories. > * testsuite/27_io/filesystem/path/generic/generic_string.cc: Added > case for double-slashes as root directories. > * testsuite/27_io/filesystem/path/generic/utf.cc: Added case for > double-slashes as root directories. > * testsuite/27_io/filesystem/path/generic/wchar_t.cc: Added case for > double-slashes as root directories. > * testsuite/27_io/filesystem/path/itr/traversal.cc: Switched Windows > FS to test for UNC paths. > * testsuite/27_io/filesystem/path/query/is_absolute.cc: Test that UNC > paths are absolute. > > Signed-off-by: Johannes Grunenberg Just to check, the Signed-off-by: trailer has a specific meaning which is described at https://gcc.gnu.org/dco.html Please confirm that you're using it that way (and not just adding it because others do so). I don't know much about UNC paths so this might take me a while to review, but thanks for working on this and contributing the patches!
Re: [PATCH v2] RISC-V: Use vclmul for CRC expansion if available
On 5/9/25 4:09 AM, Anton Blanchard wrote: If the vector version of clmul (vclmul) is available and the scalar one is not, use it for CRC expansion. gcc/Changelog: * config/riscv/bitmanip.md (crc_rev4): Check TARGET_ZVBC. (crc4): Likewise. * config/riscv/riscv.cc (expand_crc_using_clmul): Emit code using vclmul if TARGET_ZVBC. (expand_reversed_crc_using_clmul): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/crc-builtin-zvbc.c: New test. THanks. I pushed this after fixing one formatting nit. + + rtx shift_amount = gen_int_mode (GET_MODE_BITSIZE (word_mode) - + data_size, Pmode); When we wrap we bring the operator down to the next line. We usually also put expressions like this in an additional level of parenthesis. But there's an easier way :-) GET_MODE_BITSIZE (word_mode) is always going to be BITS_PER_WORD. So I used that instead which avoids wrapping. There was another instance a elsewhere I adjusted to be consistent. Thanks! jeff
Re: [PATCH, fortran] Fix simple typo in libgfortran
Hi Yuao, Yuao Ma wrote: I'm writing to express my sincere gratitude for the opportunity to participate in Google Summer of Code with GCC this year. I am very enthusiastic about this program and fully committed to making a valuable contribution and fulfilling my responsibilities. Welcome and thanks for participating! And sorry for not coming back to you earlier! As I am relatively new to submitting patches via mailing lists, I would like to send a very simple patch primarily to familiarize myself with the correct procedure. I have reviewed the contribution guidelines on the GCC website and have tried to follow them closely. Looks quite good, just a few remarks: * This is a trivial patch but for larger patches, you need grant the right to use your patch. There are two ways of doing so: (a) The simple one, called "Developer Certificate of Origin" (DCO). This requires a "Signed-off-by:" line (as added by 'git commit -s'. (b) A copyright assignment with the Free Software Foundation (FSF). See https://gcc.gnu.org/contribute.html and for the DCO variant in particular: https://gcc.gnu.org/dco.html for details and for what you actually certify/grant. * Regarding the ChangeLog in the email: (a) FYI - if you run for the last commit ./contrib/gcc-changelog/git_check_commit.py -v -p or for some commit ./contrib/gcc-changelog/git_check_commit.py -v -p Then you see how the resulting ChangeLog will look like - it also shows errors and warnings. (That's also used on the server.) (b) Small nit picking: Subject: [PATCH] Fix simple typo in libgfortran libgfortran/ChangeLog: * io/read.c: typo fix, explict -> explicit If applicable, the function name is put in parentheses after the file name, i.e.: * io/read.c (read_f): ... Additionally, per style convention, the entry starts with a captial letter '... (read_f): Typo ...' - and it ends with a full stop '.' Otherwise, I would use 'comment typo' to make clear that it is just in a comment (and not, e.g., in some diagnostic output). But that's really only a small personal preference. * * * Thanks again for the patch and welcome again :-) Tobias
[PATCH v2] libstdc++: More efficient weekday from year_month_day.
Use a better algorithm to calculate n % 7, taking into account that the divisor is a Mersenne number. This is explained in this two-part lightning talk at CppCon 2024 [1, 2]. Roughly speaking, the algorithm is used for n = sys_days but it has a precondition that doesn't hold for all values of sys_days. However, it does hold for every value coming out of year_month_day::operator sys_days(). Nevertheless, there's an issue because the conversion is done in two steps: year_month_day -> sys_days -> weekday and in the second step the information that the input sys_days was obtained from the first step (so the precondition holds) is lost. In the talk a single function performs the two steps and uses the optimization. Unless the standard adds a similar function (e.g. a weekday constructor) or this is added as an extension (not done here), this is not an option for libstdc++. The issue is adressed in the following way. At the end of the fist step, the code informs the compiler, through an [[assume]] attribute that the precondition holds. Then, at the beginning of the second step, the code checks, through __builtin_constant_p, if the precondition holds, in which case, it uses the better algorithm or, otherwise, it falls back to n % 7. The idea is illustrated in [3] which compares code generation (with and without the optimization) and provides an exhaustive test of correctness. A benchmark is shown in [4]. References: [1] https://youtu.be/64mTEXnSnZs [2] https://youtu.be/bnVkWEjRNeI [3] https://godbolt.org/z/TMMcKj8Gv [4] https://quick-bench.com/q/VPtEBjGDmmB4QEG2IDjIvfU6WzA libstdc++-v3/ChangeLog: * include/std/chrono: Add postcondition to year_month_date::operator sys_days and check this condition in weekday::_S_from_days. * testsuite/std/time/weekday/1.cc: Add tests for special values. --- The approach might be somewhat controversial and might need to be discussed. The problem is that passing information from [[assume]] to __builtin_constant_p is quite fragile in 3 different ways. 1) The distance in source code between [[assume]] and __builtin_constant_p makes hard to understand why they are there in the first place. A reader who doesn't know the full picture might feel tempted to remove either of these lines. 2) Even cosmetic changes in the conditions of [[assume]] or __builtin_constant_p might break the compiler's chain of thought and thus, the optimization. 3) Similarly to 2, changes in the compiler code that handles [[assume]] or __builtin_constant_p might break the optimization. If any of the issues above happens, the result will still be right but a performance regression will appear. Comments can mitigate the issues but a performance test or a test for code generation would be more effective to prevent performance regressions. Unfortunately, I don't know how either of these can be done. Version history: v2. Fix bug in v1, add tests and improve comments. v1. Initial patch. libstdc++-v3/include/std/chrono | 29 ++-- libstdc++-v3/testsuite/std/time/weekday/1.cc | 19 + 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 8eb9fd9baac..b747f46c13f 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -984,7 +984,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static constexpr weekday _S_from_days(const days& __d) { - return weekday{__detail::__add_modulo<7>(4, __d.count())}; + const auto __days = __d.count(); + using _Up = make_unsigned_t; + const auto __n = static_cast<_Up>(__days) + 12'687'434u; + + // This condition holds when __d = sys_days(__ymd).time_since_epoch(), + // where __ymd's type is year_month_day. + const auto __use_fast_mod7 = __n < 178'956'973u; + + if (__builtin_constant_p(__use_fast_mod7) && __use_fast_mod7) + { + // https://youtu.be/64mTEXnSnZs and https://youtu.be/bnVkWEjRNeI + const auto __wd = 613'566'757u * static_cast(__n) >> 29; + [[assume(__wd != 7)]]; + return weekday{__wd}; + } + return weekday{__detail::__add_modulo<7>(4, __days)}; } public: @@ -1652,7 +1667,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr operator sys_days() const noexcept - { return sys_days{_M_days_since_epoch()}; } + { + const auto __days = _M_days_since_epoch(); + + // The assume attribute below allows weekday::_S_from_days to use a + // faster algorithm for modulus 7. + const auto __n = static_cast(__days.count()) + 12'687'434u; + const auto __use_fast_mod7 = __n < 178'956'973u; + [[assume(__use_fast_mod7)]]; + + return sys_days{__days}; + } explicit constexpr operator local_days() const noexcept diff --git a/libstdc++-v3/testsuite/std/time/weekday/1.cc b/libstdc++-v3/testsuite/std/time/weekday/1.cc i
Re: [PATCH v1] libstdc++: More efficient weekday from year_month_day.
On Mon, May 5, 2025, 11:48 AM Cassio Neri wrote: > Use a better algorithm to calculate n % 7, taking into account that the > divisor > is a Mersenne number. This is explained in this two-part lightning talk at > CppCon 2024 [1, 2]. > > Roughly speaking, the algorithm is used for n = sys_days but it has a > precondition that doesn't hold for all values of sys_days. However, it > does hold > for every value coming out of year_month_day::operator sys_days(). > > Nevertheless, there's an issue because the conversion is done in two steps: > year_month_day -> sys_days -> weekday and in the second step the > information > that the input sys_days was obtained from the first step (so the > precondition > holds) is lost. In the talk a single function performs the two steps and > uses > the optimization. Unless the standard adds a similar function (e.g. a > weekday > constructor) or this is added as an extension (not done here), this is not > an > option for libstdc++. > > The issue is adressed in the following way. At the end of the fist step, > the > code informs the compiler, through an [[assume]] attribute that the > precondition > holds. Then, at the beginning of the second step, the code checks, through > __builtin_constant_p, if the precondition holds, in which case, it uses the > better algorithm or, otherwise, it falls back to n % 7. > I suspect we want to disable this for -Os . And plus i am not 100% convinced it is best for all micro-architures. Especially on say aarch64. Can you do more benchmarking and peocide which exaxt core is being used? And mention the size difference too? Plus gcc knows how to do %7 via multiplication is that being used or is it due to generic x86 tuning it is using the div instruction? Thanks, Andrew > The idea is illustrated in [3] which compares code generation (with and > without > the optimization) and provides an exhaustive test of correctness. A > benchmark is > shown in [4]. > > References: > > [1] https://youtu.be/64mTEXnSnZs > [2] https://youtu.be/bnVkWEjRNeI > [3] https://godbolt.org/z/E14asPK8r > [4] https://quick-bench.com/q/vzxlrLaEccner3h-ahIrbB0BZvE > > libstdc++-v3/ChangeLog: > > * include/std/chrono: Add postcondition to > year_month_date::operator > sys_days and check this condition in weekday::_S_from_days. > --- > The approach might be somewhat controversial and might need to be > discussed. The problem is that passing information from [[assume]] to > __builtin_constant_p is > quite fragile in 3 different ways. > > 1) The distance in source code between [[assume]] and __builtin_constant_p > makes > hard to understand why they are there in the first place. A reader who > doesn't > know the full picture might feel tempted to remove either of these lines. > > 2) Logically, the post-condition expressed by [[assume]] only needs to > imply the > condition tested by __builtin_constant_p. However, even whey they are > obviously > equivalent for a human, this might not be the case for the compiler. Even > slight cosmetic changes might break the compiler's chain of thought and > thus, > the optimization. > > 3) Similarly to 2, changes in the compiler code that handles [[assume]] or > __builtin_constant_p might break the optimization. > > If any of the issues above happens, the result will still be right but a > performance regression will appear. Comments can mitigate the issues but a > performance test or a test for code generation would be more effective to > prevent performance regressions. Unfortunately, I don't know how either of > these > can be done. > > libstdc++-v3/include/std/chrono | 27 +-- > 1 file changed, 25 insertions(+), 2 deletions(-) > > diff --git a/libstdc++-v3/include/std/chrono > b/libstdc++-v3/include/std/chrono > index 8eb9fd9baac..e762dd46cc4 100644 > --- a/libstdc++-v3/include/std/chrono > +++ b/libstdc++-v3/include/std/chrono > @@ -984,7 +984,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >static constexpr weekday >_S_from_days(const days& __d) >{ > - return weekday{__detail::__add_modulo<7>(4, __d.count())}; > + const auto __days = __d.count(); > + > + const auto __n = static_cast(__days) + 12'687'434u; > + const auto __use_fast_mod7 = __n < 178'956'973u; > + // This condition is true when __d = __dp.time_since_epoch() where > __dp > + // comes from year_month_day::operator sys_days(). > + if (__builtin_constant_p(__use_fast_mod7)) > + { > + const auto __wd = 613'566'757u * __n >> 29; > + [[assume(__wd != 7)]]; > + return weekday{__wd}; > + } > + > + return weekday{__detail::__add_modulo<7>(4, __days)}; >} > > public: > @@ -1652,7 +1665,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >constexpr >operator sys_days() const noexcept > - { return sys_days{_M_days_since_epoch()}; } > + { > + const auto __days = _M_days_since_epoch(); > + > + // The assume at
[PATCH v2] c++: Only reject virtual base data member access in __builtin_offsetof [PR118346]
The following test case highlights two issues - see https://godbolt.org/z/7E1KGYreh: 1. We error out at both L4 and L5, while (at least) clang, EDG and MSVC only reject L5 2. Our error message for L5 incorrectly mentions using a null pointer === cut here === struct A { int i; }; struct C : virtual public A { }; void foo () { auto res = ((C*)0)->i; // L4 __builtin_offsetof (C, i); // L5 } === cut here === Even though L4 is UB, it's technically not invalid, and this patch transforms the error into a warning categorized under -Waddress (so that it can be controlled by the user, and also deactivated for offsetof). It also fixes the error message for L5 to not be confused by the artificial null pointer created by cp_parser_builtin_offsetof. Successfully tested on x86_64-pc-linux-gnu. PR c++/118346 gcc/cp/ChangeLog: * parser.cc (cp_parser_builtin_offsetof): Temporarily disable -Waddress warnings. * semantics.cc (finish_offsetof): Reject accesses to members in virtual bases. * typeck.cc (build_class_member_access_expr): Don't error but warn about accesses to members in virtual bases. gcc/testsuite/ChangeLog: * g++.dg/other/offsetof8.C: Avoid -Wnarrowing warning. * g++.dg/other/offsetof9.C: Adjust test expectations. * g++.dg/other/offsetof10.C: New test. --- gcc/cp/parser.cc| 13 ++--- gcc/cp/semantics.cc | 28 ++- gcc/cp/typeck.cc| 13 +++-- gcc/testsuite/g++.dg/other/offsetof10.C | 36 + gcc/testsuite/g++.dg/other/offsetof8.C | 6 ++--- gcc/testsuite/g++.dg/other/offsetof9.C | 6 ++--- 6 files changed, 76 insertions(+), 26 deletions(-) create mode 100644 gcc/testsuite/g++.dg/other/offsetof10.C diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 1fb9e7fd872..43bbc69b196 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -11521,10 +11521,15 @@ cp_parser_builtin_offsetof (cp_parser *parser) tree object_ptr = build_static_cast (input_location, build_pointer_type (type), null_pointer_node, tf_warning_or_error); - - /* Parse the offsetof-member-designator. We begin as if we saw "expr->". */ - expr = cp_parser_postfix_dot_deref_expression (parser, CPP_DEREF, object_ptr, -true, &dummy, token->location); + { +/* PR c++/118346: don't complain about object_ptr being null. */ +warning_sentinel s(warn_address); +/* Parse the offsetof-member-designator. We begin as if we saw + "expr->". */ +expr = cp_parser_postfix_dot_deref_expression (parser, CPP_DEREF, + object_ptr, true, &dummy, + token->location); + } while (true) { token = cp_lexer_peek_token (parser->lexer); diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 43a0eabfa12..9ba00196542 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -5324,13 +5324,29 @@ finish_offsetof (tree object_ptr, tree expr, location_t loc) expr = TREE_OPERAND (expr, 0); if (!complete_type_or_else (TREE_TYPE (TREE_TYPE (object_ptr)), object_ptr)) return error_mark_node; + if (warn_invalid_offsetof - && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr))) - && CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr))) - && cp_unevaluated_operand == 0) -warning_at (loc, OPT_Winvalid_offsetof, "% within " - "non-standard-layout type %qT is conditionally-supported", - TREE_TYPE (TREE_TYPE (object_ptr))); + && CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (object_ptr +{ + bool member_in_base_p = false; + if (TREE_CODE (expr) == COMPONENT_REF) + { + tree member = expr; + do { + member = TREE_OPERAND (member, 1); + } while (TREE_CODE (member) == COMPONENT_REF); + member_in_base_p = !same_type_p (TREE_TYPE (TREE_TYPE (object_ptr)), + DECL_CONTEXT (member)); + } + if (member_in_base_p + && CLASSTYPE_VBASECLASSES (TREE_TYPE (TREE_TYPE (object_ptr + error_at (loc, "invalid % to data member in virtual base"); + else if (CLASSTYPE_NON_STD_LAYOUT (TREE_TYPE (TREE_TYPE (object_ptr))) + && cp_unevaluated_operand == 0) + warning_at (loc, OPT_Winvalid_offsetof, "% within " + "non-standard-layout type %qT is conditionally-supported", + TREE_TYPE (TREE_TYPE (object_ptr))); +} return fold_offsetof (expr); } diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc index 1b9fdf5b21d..ec35ab6dbb4 100644 --- a/gcc/cp/typeck.cc +++ b/gcc/cp/typeck.cc @@ -2958,17 +2958,10 @@ build_class_member_access_expr (cp_expr object, tree member, return error_mark_node; /* It is inva
Re: [PATCH 1/6] emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into next BB
On 5/9/25 2:27 PM, Vineet Gupta wrote: gcc/ChangeLog: * emit-rtl.cc (next_nonnote_nondebug_insn): Update comments. Signed-off-by: Vineet Gupta --- gcc/emit-rtl.cc | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc index 3e2c4309dee6..b78b29ecf989 100644 --- a/gcc/emit-rtl.cc +++ b/gcc/emit-rtl.cc @@ -3677,7 +3677,11 @@ next_nonnote_nondebug_insn (rtx_insn *insn) /* Return the next insn after INSN that is not a NOTE nor DEBUG_INSN, but stop the search before we enter another basic block. This - routine does not look inside SEQUENCEs. */ + routine does not look inside SEQUENCEs. + NOTE: This can potentially bleed into next BB. If current insn is +last insn of BB, followed by a code_label before the start of +the next BB, code_label will be returned. But this is the +behavior rest of gcc assumes/relies on e.g. get_last_bb_insn. */ I don't see how get_last_bb_insn itself inherently needs this behavior. I could believe something that *calls* get_last_bb_insn might depend on this behavior -- but I'd also consider that bogus. The CODE_LABEL is part of the next block, so returning that seems quite wrong given the original comment and users like get_last_bb_insn. I don't mind adding the comment, but I'd much rather chase down the offenders and fix them. Jeff
[PATCH 0/2] libstdc++: Support UNC paths on MinGW in fs::path
These two patches add support for UNC paths on MinGW, i.e. paths like `\\server\foo\bar\baz.txt`. This is my first time submitting a patch to the GCC repo, so I'm not fully sure if I submitted the patch correctly. This was tested on x86_64-w64-mingw32. I'll try to look at the fs-operations on UNC paths after this patch, since some of them (like fs::canonical) break with UNC paths. Johannes Grunenberg (2): libstdc++: Support UNC paths on MinGW in fs::path libstdc++: Extend tests for UNC paths on MinGW libstdc++-v3/config/abi/pre/gnu.ver | 7 +++ libstdc++-v3/include/bits/fs_path.h | 13 +++- libstdc++-v3/src/c++17/fs_path.cc | 14 +++-- .../27_io/filesystem/path/append/path.cc | 17 +++- .../27_io/filesystem/path/append/source.cc| 14 + .../filesystem/path/decompose/extension.cc| 5 + .../filesystem/path/decompose/filename.cc | 2 +- .../path/decompose/relative_path.cc | 9 + .../path/decompose/root_directory.cc | 3 ++- .../filesystem/path/decompose/root_name.cc| 20 +++ .../filesystem/path/decompose/root_path.cc| 13 .../27_io/filesystem/path/decompose/stem.cc | 5 + .../filesystem/path/generation/normal.cc | 2 +- .../filesystem/path/generation/relative.cc| 14 + .../27_io/filesystem/path/generic/94242.cc| 8 .../filesystem/path/generic/generic_string.cc | 6 +- .../27_io/filesystem/path/generic/utf.cc | 6 +- .../27_io/filesystem/path/generic/wchar_t.cc | 6 +- .../27_io/filesystem/path/itr/traversal.cc| 6 +++--- .../filesystem/path/query/is_absolute.cc | 1 + 20 files changed, 158 insertions(+), 13 deletions(-) -- 2.49.0.windows.1
[PATCH, fortran] Fix simple typo in libgfortran
Hi Tobias, Thanks for your time and guidance on this GSOC project. > * This is a trivial patch but for larger patches, you need >grant the right to use your patch. There are two ways of doing so: >(a) The simple one, called "Developer Certificate of Origin" (DCO). >This requires a "Signed-off-by:" line (as added by 'git commit -s'. >(b) A copyright assignment with the Free Software Foundation (FSF). You're right. I found the information on the website stating that minor patches can omit the DCO. I have now added the "Signed-off-by:" line in the new patch. > (a) FYI - if you run for the last commit > ./contrib/gcc-changelog/git_check_commit.py -v -p or for some commit > ./contrib/gcc-changelog/git_check_commit.py -v -p Then you > see how the resulting ChangeLog will look like - it also shows errors > and warnings. (That's also used on the server.) (b) Small nit picking: Thanks for the tip! I ran the git_check_commit.py script, and it reported "OK." However, when I regenerated the ChangeLog, the function name wasn't included automatically. I've manually added the function name to the ChangeLog for now and will investigate the script's behavior later. The updated patch is attached for your review. Yuao 0001-fortran-fix-simple-typo-in-libgfortran.patch Description: 0001-fortran-fix-simple-typo-in-libgfortran.patch
[PATCH 1/2] libstdc++: Support UNC paths on MinGW in fs::path
UNC paths on Windows start with a root name "\\server" instead of a drive letter ("c:"). Support for parsing this was already implemented for cygwin, which (compared to MinGW) doesn't use backslash as the preferred separator. This mainly enables SLASHSLASH_IS_ROOTNAME on Windows and adjusts the tests accordingly. I aligned path::_M_append with path::operator/= to use has_root_directory and is_absolute from `this` instead of the appended path. Note that before this patch, `is_absolute && has_root_directory` was always `false`, as is_absolute implied has_root_directory. This isn't the case anymore (e.g. `\\foo` doesn't have a root directory). PR libstdc++-v3/99333 PR libstdc++-v3/99430 PR libstdc++-v3/106127 libstdc++-v3/ChangeLog: * config/abi/pre/gnu.ver: Add fs::path::_M_is_unc_path() to ABI * include/bits/fs_path.h: Add and use fs::path::_M_is_unc_path() on Windows. * src/c++17/fs_path.cc (defined): Treat "//" as a root name on Windows. (path::_M_append): Align with operator/=. (path::_M_is_unc_path): Added helper to check for "(slash)(slash)(not-slash)". * testsuite/27_io/filesystem/path/append/path.cc: Added tests for UNC paths. * testsuite/27_io/filesystem/path/append/source.cc: Added tests for UNC paths. * testsuite/27_io/filesystem/path/decompose/filename.cc: Switched Windows FS to test for UNC paths. * testsuite/27_io/filesystem/path/decompose/root_directory.cc: Switched Windows FS to test for UNC paths. * testsuite/27_io/filesystem/path/generation/normal.cc: Switched Windows FS to test for UNC paths. * testsuite/27_io/filesystem/path/generic/94242.cc: Added case for double-slashes as root directories. * testsuite/27_io/filesystem/path/generic/generic_string.cc: Added case for double-slashes as root directories. * testsuite/27_io/filesystem/path/generic/utf.cc: Added case for double-slashes as root directories. * testsuite/27_io/filesystem/path/generic/wchar_t.cc: Added case for double-slashes as root directories. * testsuite/27_io/filesystem/path/itr/traversal.cc: Switched Windows FS to test for UNC paths. * testsuite/27_io/filesystem/path/query/is_absolute.cc: Test that UNC paths are absolute. Signed-off-by: Johannes Grunenberg --- libstdc++-v3/config/abi/pre/gnu.ver | 7 +++ libstdc++-v3/include/bits/fs_path.h | 13 - libstdc++-v3/src/c++17/fs_path.cc | 14 -- .../27_io/filesystem/path/append/path.cc| 17 - .../27_io/filesystem/path/append/source.cc | 14 ++ .../27_io/filesystem/path/decompose/filename.cc | 2 +- .../filesystem/path/decompose/root_directory.cc | 2 +- .../27_io/filesystem/path/generation/normal.cc | 2 +- .../27_io/filesystem/path/generic/94242.cc | 8 .../filesystem/path/generic/generic_string.cc | 6 +- .../27_io/filesystem/path/generic/utf.cc| 6 +- .../27_io/filesystem/path/generic/wchar_t.cc| 6 +- .../27_io/filesystem/path/itr/traversal.cc | 6 +++--- .../27_io/filesystem/path/query/is_absolute.cc | 1 + 14 files changed, 91 insertions(+), 13 deletions(-) diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 29bc7d86256..ed2275fead0 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -2546,6 +2546,13 @@ GLIBCXX_3.4.34 { _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_M_constructILb[01]EEEvPK[cw][jmy]; } GLIBCXX_3.4.33; +# GCC 16.1.0 +GLIBCXX_3.4.35 { +# std::filesystem::__cxx11::path::_M_is_unc_path() const +_ZNKSt10filesystem7__cxx114path14_M_is_unc_pathEv; +_ZNKSt10filesystem4path14_M_is_unc_pathEv; +} GLIBCXX_3.4.34; + # Symbols in the support library (libsupc++) have their own tag. CXXABI_1.3 { diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h index ccf1d708e09..2694a13d5c3 100644 --- a/libstdc++-v3/include/bits/fs_path.h +++ b/libstdc++-v3/include/bits/fs_path.h @@ -609,6 +609,10 @@ namespace __detail pair _M_find_extension() const noexcept; +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS +bool _M_is_unc_path() const noexcept; +#endif + // path::_S_convert creates a basic_string or // basic_string_view from a basic_string or // basic_string_view, for an encoded character type C, @@ -1224,6 +1228,13 @@ namespace __detail for (auto& __elem : *this) { #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS + if (__elem._M_type() == _Type::_Root_name && _M_is_unc_path()) + { + __str += __slash; + __str += __slash; + __str += basic_string_view(__elem._M_pathname).substr(2); +
[PATCH 2/2] libstdc++: Extend tests for UNC paths on MinGW
Adds some additional tests to ensure that UNC paths are properly handled (mainly decomposed) on MinGW. libstdc++-v3/ChangeLog: * testsuite/27_io/filesystem/path/decompose/extension.cc: Test for UNC paths with dots in the root directory. * testsuite/27_io/filesystem/path/decompose/relative_path.cc: Test for UNC paths. * testsuite/27_io/filesystem/path/decompose/root_directory.cc: Test for UNC path. * testsuite/27_io/filesystem/path/decompose/root_name.cc: Test for Windows paths. * testsuite/27_io/filesystem/path/decompose/root_path.cc: Test for Windows paths. * testsuite/27_io/filesystem/path/decompose/stem.cc: Test for Windows paths. * testsuite/27_io/filesystem/path/generation/relative.cc: Test for Windows paths. Signed-off-by: Johannes Grunenberg --- .../filesystem/path/decompose/extension.cc| 5 + .../path/decompose/relative_path.cc | 9 + .../path/decompose/root_directory.cc | 1 + .../filesystem/path/decompose/root_name.cc| 20 +++ .../filesystem/path/decompose/root_path.cc| 13 .../27_io/filesystem/path/decompose/stem.cc | 5 + .../filesystem/path/generation/relative.cc| 14 + 7 files changed, 67 insertions(+) diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc index 82887beea37..0965209d283 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/extension.cc @@ -45,6 +45,11 @@ test01() VERIFY( path(".").extension() == path("") ); VERIFY( path().extension() == path() ); + +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS + VERIFY( path("wsl.local").extension() == path("") ); + VERIFY( path("wsl.local\\foo.bar").extension() == path(".bar") ); +#endif } void diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc index 0c3eb4b333e..ebc56b99d56 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/relative_path.cc @@ -34,6 +34,15 @@ test01() VERIFY( p2.relative_path() == p2 ); path p3 = "/foo/bar"; VERIFY( p3.relative_path() == p2 ); + +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS + path p4 = "baz/foo/bar"; + VERIFY( p4.relative_path() == p2 ); + path p5 = "foo\\bar"; + VERIFY( p5.relative_path() == p5 ); + path p6 = "baz\\foo\\bar"; + VERIFY( p6.relative_path() == p5 ); +#endif } #include // XXX diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc index a7c90f99752..ffdf18929be 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_directory.cc @@ -35,6 +35,7 @@ test01() path p3 = "//foo"; #if defined(__CYGWIN__) || defined(_GLIBCXX_FILESYSTEM_IS_WINDOWS) VERIFY( p3.root_directory() == path() ); + VERIFY( path("//").root_directory() == path("//") ); #else VERIFY( p3.root_directory() == path("/") ); #endif diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc index c32242ff6fe..d5d1934ec15 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_name.cc @@ -34,8 +34,28 @@ test01() VERIFY( path("..").extension().empty() ); } +void +test02() +{ +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS + VERIFY( path("/foo/bar").root_name() == "" ); + VERIFY( path("\\").root_name() == "" ); + VERIFY( path("").root_name() == "" ); + VERIFY( path("foo").root_name() == "foo" ); + VERIFY( path("foo\\").root_name() == "foo" ); + VERIFY( path("foo\\bar").root_name() == "foo" ); + VERIFY( path("?\\c\\").root_name() == "?" ); + VERIFY( path("c:/").root_name() == "c:" ); + VERIFY( path("c:\\").root_name() == "c:" ); + VERIFY( path("c:").root_name() == "c:" ); + VERIFY( path("c:\\foo").root_name() == "c:" ); +#endif + VERIFY( path("").root_name() == "" ); +} + int main() { test01(); + test02(); } diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc index b01c75fa0b9..c393e5e4681 100644 --- a/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc +++ b/libstdc++-v3/testsuite/27_io/filesystem/path/decompose/root_path.cc @@ -32,6 +32,19 @@ test01() VERIFY( p1.root_path() == path() ); path p2 = "/foo/bar"; VERIFY( p2.root_path() == path(
Re: [PING] [PATCH] Fortran: fix passing of inquiry ref of complex array to TRANSFER [PR102891]
On 5/10/25 11:33 AM, Harald Anlauf wrote: Early ping. Am 06.05.25 um 21:06 schrieb Harald Anlauf: Dear all, here's another rather obvious case where a temporary is needed for an inquiry reference of a complex array which is a component of a derived type. In contrast to PR119986, the argument is handled within the code for the intrinsic TRANSFER, so that the other patch did not catch the present one. Regtested on x86_64-pc-linux-gnu. OK for mainline? Yes, quite alright. I'd also like to backport this one to 15-branch if this is ok. ... and please do backport. Best regards, Jerry Thanks, Harald
[PATCH] testsuite: Fix pr119131-1.c for targets which emit a psabi warning for vectors of DFP [PR119909]
On PowerPC, there is a psabi warning for argument passing of a DFP vector. We are not expecting this warning and we get a failure due to it. Adding -Wno-psabi fixes the testcase. Committed as obvious after a quick test. gcc/testsuite/ChangeLog: PR testsuite/119909 * gcc.dg/torture/pr119131-1.c: Add -Wno-psabi. Signed-off-by: Andrew Pinski --- gcc/testsuite/gcc.dg/torture/pr119131-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.dg/torture/pr119131-1.c b/gcc/testsuite/gcc.dg/torture/pr119131-1.c index c62f702f98c..1780035dade 100644 --- a/gcc/testsuite/gcc.dg/torture/pr119131-1.c +++ b/gcc/testsuite/gcc.dg/torture/pr119131-1.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target dfp } */ +/* { dg-additional-options "-Wno-psabi" } */ /* PR target/119131 */ typedef __attribute__((__vector_size__ (64))) char C; -- 2.43.0
[PATCH] x86: Change dest to src in replace_vector_const
Replace rtx dest = SET_SRC (set); with rtx src = SET_SRC (set); in replace_vector_const to avoid confusion. * config/i386/i386-features.cc (replace_vector_const): Change dest to src. I am checking it in shortly. -- H.J. From 7b102b3d29efc7b005b2fa820e3ce7a7b1664916 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" Date: Sun, 11 May 2025 06:17:45 +0800 Subject: [PATCH] x86: Change dest to src in replace_vector_const Replace rtx dest = SET_SRC (set); with rtx src = SET_SRC (set); in replace_vector_const to avoid confusion. * config/i386/i386-features.cc (replace_vector_const): Change dest to src. Signed-off-by: H.J. Lu --- gcc/config/i386/i386-features.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 54b3f6d33b2..13e6c2a8abd 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -3372,12 +3372,12 @@ replace_vector_const (machine_mode vector_mode, rtx vector_const, /* Get the single SET instruction. */ rtx set = single_set (insn); - rtx dest = SET_SRC (set); - machine_mode mode = GET_MODE (dest); + rtx src = SET_SRC (set); + machine_mode mode = GET_MODE (src); rtx replace; /* Replace the source operand with VECTOR_CONST. */ - if (SUBREG_P (dest) || mode == vector_mode) + if (SUBREG_P (src) || mode == vector_mode) replace = vector_const; else { -- 2.49.0
i386: Fix some problems in stv cost model
Hi, this patch fixes some of problems with cosint in scalar to vector pass. In particular 1) the pass uses optimize_insn_for_size which is intended to be used by expanders and splitters and requires the optimization pass to use set_rtl_profile (bb) for currently processed bb. This is not done, so we get random stale info about hotness of insn. 2) register allocator move costs are all realtive to integer reg-reg move which has cost of 2, so it is (except for size tables and i386) a latency of instruction multiplied by 2. These costs has been duplicated and are now used in combination with rtx costs which are all based to COSTS_N_INSNS that multiplies latency by 4. Some of vectorizer costing contains COSTS_N_INSNS (move_cost) / 2 to compensate, but some new code does not. This patch adds compensatoin. Perhaps we should update the cost tables to use COSTS_N_INSNS everywher but I think we want to first fix inconsistencies. Also the tables will get optically much longer, since we have many move costs and COSTS_N_INSNS is a lot of characters. 3) variable m which decides how much to multiply integer variant (to account that with -m32 all 64bit computations needs 2 instructions) is declared unsigned which makes the signed computation of instruction gain to be done in unsigned type and breaks i.e. for division. 4) I added integer_to_sse costs which are currently all duplicationof sse_to_integer. AMD chips are asymetric and moving one direction is faster than another. I will chance costs incremnetally once vectorizer part is fixed up, too. There are two failures gcc.target/i386/minmax-6.c and gcc.target/i386/minmax-7.c. Both test stv on hasswell which no longer happens since SSE->INT and INT->SSE moves are now more expensive. There is only one instruction to convert: Computing gain for chain #1... Instruction gain 8 for11: {r110:SI=smax(r116:SI,0);clobber flags:CC;} Instruction conversion gain: 8 Registers conversion cost: 8<- this is integer_to_sse and sse_to_integer Total gain: 0 total gain used to be 4 since the patch doubles the conversion costs. According to agner fog's tables the costs should be 1 cycle which is correct here. Final code gnerated is: vmovd %esi, %xmm0 * latency 1 cmpl%edx, %esi je .L2 vpxor %xmm1, %xmm1, %xmm1 * latency 1 vpmaxsd %xmm1, %xmm0, %xmm0 * latency 1 vmovd %xmm0, %eax * latency 1 imull %edx, %eax cltq movzwl (%rdi,%rax,2), %eax ret cmpl%edx, %esi je .L2 xorl%eax, %eax * latency 1 testl %esi, %esi * latency 1 cmovs %eax, %esi * latency 2 imull %edx, %esi movslq %esi, %rsi movzwl (%rdi,%rsi,2), %eax ret Instructions with latency info are those really different. So the uncoverted code has sum of latencies 4 and real latency 3. Converted code has sum of latencies 4 and real latency 3 (vmod+vpmaxsd+vmov). So I do not quite see it should be a win. There is also a bug in costing MIN/MAX case ABS: case SMAX: case SMIN: case UMAX: case UMIN: /* We do not have any conditional move cost, estimate it as a reg-reg move. Comparisons are costed as adds. */ igain += m * (COSTS_N_INSNS (2) + ix86_cost->add); /* Integer SSE ops are all costed the same. */ igain -= ix86_cost->sse_op; break; Now COSTS_N_INSNS (2) is not quite right since reg-reg move should be 1 or perhaps 0. For Haswell cmov really is 2 cycles, but I guess we want to have that in cost vectors like all other instructions. I am not sure if this is really a win in this case (other minmax testcases seems to make sense). I have xfailed it for now and will check if that affects specs on LNT testers. Bootstrapped/regtested x86_64-linux, comitted. I will proceed with similar fixes on vectorizer cost side. Sadly those introduces quite some differences in the testuiste (partly triggered by other costing problems, such as one of scatter/gather) gcc/ChangeLog: * config/i386/i386-features.cc (general_scalar_chain::vector_const_cost): Add BB parameter; handle size costs; use COSTS_N_INSNS to compute move costs. (general_scalar_chain::compute_convert_gain): Use optimize_bb_for_size instead of optimize_insn_for size; use COSTS_N_INSNS to compute move costs; update calls of general_scalar_chain::vector_const_cost; use ix86_cost->integer_to_sse. (timode_immed_const_gain): Add bb parameter; use optimize_bb_for_size_p. (timode_scalar_chain::compute_convert_gain): Use optimize_bb_for_size_p. * config/i386/i386-features.h (class general_scalar_chain): Update
Re: [parch, fortran] Fix PR 120163, 15/16 regression
Am 10.05.25 um 20:32 schrieb Harald Anlauf: Hi Thomas! Am 10.05.25 um 11:42 schrieb Thomas Koenig: This bug was another case of generating a formal arglist from an actual one where we should not have done so. The fix is straightforward: If we have resolved the formal arglist, we should not generare a new one. OK for trunk and backport to gcc 15? Assuming this regtests OK (you didn't say so) this is OK for both. Ah yes, I did. Thanks for the patch! Committed as r16-525-g004bf889e0b1b96ae50f93339104d3602a88deb5 . Thanks for the review! Best regards Thomas
[PATCH, fortran] Fix simple typo in libgfortran
Hi GCC Maintainers, I'm writing to express my sincere gratitude for the opportunity to participate in Google Summer of Code with GCC this year. I am very enthusiastic about this program and fully committed to making a valuable contribution and fulfilling my responsibilities. As I am relatively new to submitting patches via mailing lists, I would like to send a very simple patch primarily to familiarize myself with the correct procedure. I have reviewed the contribution guidelines on the GCC website and have tried to follow them closely. Please let me know if this approach is acceptable. I'm eager to learn and ensure my future contributions align with the project's standards. Thank you for your time and guidance. Best regards, Yuao 0001-Fix-simple-typo-in-libgfortran.patch Description: 0001-Fix-simple-typo-in-libgfortran.patch
Re: [PATCH, fortran] Fix simple typo in libgfortran
Hi Yuao, first, good to have you on board! As I am relatively new to submitting patches via mailing lists, I would like to send a very simple patch primarily to familiarize myself with the correct procedure. I have reviewed the contribution guidelines on the GCC website and have tried to follow them closely. The patch itself is OK. There should be a ChangeLog entry. You can generate a template by running your patch through contrib/mklog.py, which you can then fill out with the details. Best regards Thomas
Re: [PATCH] fortran: Fix up minloc/maxloc lowering [PR120191]
Jakub Jelinek wrote: This unfortunately only fixes some of the cases in the new testcase. We indeed should drop the kind argument from what is passed to the library, but need to do it not only when one uses the argument name for it (so kind=4 etc.) but also when one passes all the arguments to the intrinsics. The following patch uses what gfc_conv_intrinsic_findloc uses, which looks more efficient and cleaner, we already set automatic vars to point to the kind and back actual arguments, so we can just free/clear expr on the former and set name to "%VAL" on the latter. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? LGTM – but it seems that we should do a minor cleanup while being there: ... --- gcc/fortran/trans-intrinsic.cc.jj 2025-04-22 21:26:15.772920190 +0200 +++ gcc/fortran/trans-intrinsic.cc 2025-05-09 17:41:27.323962631 +0200 ... + /* Remove kind. */ + if (kind_arg->expr) { ... + gfc_free_expr (kind_arg->expr); + kind_arg->expr = NULL; } + /* Pass BACK argument by value. */ + back_arg->name = "%VAL"; + ... gfc_actual_arglist *a = actual; - strip_kind_from_actual (a); while (a) { if (a->name && strcmp (a->name, "dim") == 0) Namely: Similar to above, we should be able to just do: if (dim_arg->expr) I think the comment should be also updated and we can also get rid of the 'actual' variable for cleanup. Namely, something like the following on top of your patch (untested): - --- a/gcc/fortran/trans-intrinsic.cc +++ b/gcc/fortran/trans-intrinsic.cc @@ -4912 +4912 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op) - gfc_actual_arglist *actual, *array_arg, *dim_arg, *mask_arg, *kind_arg; + gfc_actual_arglist *array_arg, *dim_arg, *mask_arg, *kind_arg; @@ -4931,2 +4931 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op) - actual = expr->value.function.actual; - array_arg = actual; + array_arg = expr->value.function.actual; @@ -4972 +4971 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op) - arrayexpr = actual->expr; + arrayexpr = array_arg->expr; @@ -4974,2 +4973,2 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op) - /* Special case for character maxloc. Remove unneeded actual - arguments, then call a library function. */ + /* Special case for character maxloc. Remove unneeded "dim" + actual argument, then call the library function. */ @@ -4980,3 +4979 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op) - - gfc_actual_arglist *a = actual; - while (a) + if (dim_arg->expr) @@ -4984,6 +4981,2 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op) - if (a->name && strcmp (a->name, "dim") == 0) - { - gfc_free_expr (a->expr); - a->expr = NULL; - } - a = a->next; + gfc_free_expr (dim_arg->expr); + dim_arg->expr = NULL; - Thanks to both of your for the patch and to Daniil additionally for reporting and debugging this issue! Tobias
[parch, fortran] Fix PR 120163, 15/16 regression
This bug was another case of generating a formal arglist from an actual one where we should not have done so. The fix is straightforward: If we have resolved the formal arglist, we should not generare a new one. OK for trunk and backport to gcc 15? Best regards Thomas Do not generate formal arglist from actual if we have already resolved it. gcc/fortran/ChangeLog: PR fortran/120163 * gfortran.h: Add formal_resolved to gfc_symbol. * resolve.cc (gfc_resolve_formal_arglist): Set it. (resolve_function): Do not call gfc_get_formal_from_actual_arglist if we already resolved a formal arglist. (resolve_call): Likewise. diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 46310a088f2..4740c3676d9 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -2028,6 +2028,9 @@ typedef struct gfc_symbol This is legal in Fortran, but can cause problems with autogenerated C prototypes for C23. */ unsigned ext_dummy_arglist_mismatch:1; + /* Set if the formal arglist has already been resolved, to avoid + trying to generate it again from actual arguments. */ + unsigned formal_resolved:1; /* Reference counter, used for memory management. diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc index 1e62e94788b..bf1aa704888 100644 --- a/gcc/fortran/resolve.cc +++ b/gcc/fortran/resolve.cc @@ -533,7 +533,8 @@ gfc_resolve_formal_arglist (gfc_symbol *proc) } } } - + if (sym) +sym->formal_resolved = 1; gfc_current_ns = orig_current_ns; } @@ -3472,7 +3473,7 @@ resolve_function (gfc_expr *expr) &expr->where, &sym->formal_at); } } - else + else if (!sym->formal_resolved) { gfc_get_formal_from_actual_arglist (sym, expr->value.function.actual); sym->formal_at = expr->where; @@ -4033,7 +4034,7 @@ resolve_call (gfc_code *c) &c->loc, &csym->formal_at); } } - else + else if (!csym->formal_resolved) { gfc_get_formal_from_actual_arglist (csym, c->ext.actual); csym->formal_at = c->loc; diff --git a/gcc/testsuite/gfortran.dg/interface_61.f90 b/gcc/testsuite/gfortran.dg/interface_61.f90 new file mode 100644 index 000..15db3b8a7a7 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/interface_61.f90 @@ -0,0 +1,27 @@ +! { dg-do compile } +! { dg-options -Wexternal-argument-mismatch } +! PR fortran/120163 - this used to cause an error. +! Original test case by Bálint Aradi +module mod1 + implicit none + + abstract interface +pure subroutine callback_interface(a) + real, intent(in) :: a +end subroutine callback_interface + end interface + +contains + + subroutine caller(callback) +procedure(callback_interface) :: callback +real :: a +call callback(a) + end subroutine caller + +end module mod1 + + +module mod2 + use mod1 +end module mod2
Re: [PATCH v4 1/2] RISC-V: Support RISC-V Profiles 20/22.
On 5/10/25 6:30 AM, Jiawei wrote: This patch introduces support for RISC-V Profiles RV20 and RV22 [1], enabling developers to utilize these profiles through the -march option. [1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0 Version log: Using lowercase letters to present Profiles. Using '_' as divsor between Profiles and other RISC-V extension. Add descriptions in invoke.texi. Checking if there exist '_' between Profiles and additional extensions. Using std::string to avoid memory problems. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (struct riscv_profiles): New struct. (riscv_subset_list::parse_profiles): New parser. (riscv_subset_list::parse_base_ext): Ditto. * config/riscv/riscv-subset.h: New def. * doc/invoke.texi: New option descriptions. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-49.c: New test. * gcc.target/riscv/arch-50.c: New test. * gcc.target/riscv/arch-51.c: New test. * gcc.target/riscv/arch-52.c: New test. Both patches in this series are OK. Please install. Thanks again, Jeff
[COMMITED] testsuite: Disable bit tests in aarch64/pr99988.c
My recent changes to bit-test switch lowering broke pr99988.c testcase. The testcase assumes a switch will be lowered using jump tables. Make the testcase run with -fno-bit-tests. Pushed as obvious. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr99988.c: Add -fno-bit-tests. Signed-off-by: Filip Kastl --- gcc/testsuite/gcc.target/aarch64/pr99988.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c b/gcc/testsuite/gcc.target/aarch64/pr99988.c index 7cca4962944..c09ce67c0fa 100644 --- a/gcc/testsuite/gcc.target/aarch64/pr99988.c +++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c @@ -1,5 +1,5 @@ /* { dg-do compile { target lp64 } } */ -/* { dg-options "-O2 -mbranch-protection=standard" } */ +/* { dg-options "-O2 -mbranch-protection=standard -fno-bit-tests" } */ /* { dg-final { scan-assembler-times {bti j} 13 } } */ int a; int c(); -- 2.49.0
[PATCH] libgcc SH: fix alignment for relaxation
From 6462f1e6a2565c5d4756036d9bc2f39dce9bd768 Mon Sep 17 00:00:00 2001 From: QBos07 Date: Sat, 10 May 2025 16:56:28 + Subject: [PATCH] libgcc SH: fix alignment for relaxation when relaxation is enabled we can not infer the alignment from the position as that may change. This should not change non-relaxed builds as its allready aligned there. This was the missing piece to building an entire toolchain with -mrelax Credit goes to Oleg Endo: https://sourceware.org/bugzilla/show_bug.cgi?id=3298#c4 --- libgcc/config/sh/lib1funcs.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libgcc/config/sh/lib1funcs.S b/libgcc/config/sh/lib1funcs.S index eda48066c..11acfd5e3 100644 --- a/libgcc/config/sh/lib1funcs.S +++ b/libgcc/config/sh/lib1funcs.S @@ -115,7 +115,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see HIDDEN_FUNC(GLOBAL(ashiftrt_r4_31)) HIDDEN_FUNC(GLOBAL(ashiftrt_r4_32)) - .align 1 + .align 4 GLOBAL(ashiftrt_r4_32): GLOBAL(ashiftrt_r4_31): rotcl r4 @@ -764,6 +764,7 @@ LOCAL(movmem_loop): /* Reached with rts */ bt GLOBAL(movmemSI52) ! done all the large groups, do the remainder ! jump to movmem+ + .balign 4 mova GLOBAL(movmemSI4)+4,r0 add r6,r0 jmp @r0 -- 2.43.0
[PATCH, fortran] Fix simple typo in libgfortran
Hi Thomas, Thanks for your quick response. > There should be a ChangeLog entry. You can generate a template by > running your patch through contrib/mklog.py, which you can then fill > out with the details. I think the ChangeLog entry is already in the patch. I used git gcc-commit-mklog, and it produced the following: ``` libgfortran/ChangeLog: * io/read.c: typo fix, explict -> explicit ``` Is this sufficient, or does it need more detail? Yuao 0001-Fix-simple-typo-in-libgfortran.patch Description: 0001-Fix-simple-typo-in-libgfortran.patch
Re: [AUTOFDO][AARCH64] Add support for profilebootstrap
Kugan Vivekanandarajah writes: > Add support for autoprofiledbootstrap in aarch64. > This is similar to what is done for i386. Added > gcc/config/aarch64/gcc-auto-profile for aarch64 profile > creation. > > How to run: > configure --with-build-config=bootstrap-lto > make autoprofiledbootstrap > > > Regression tested on aarch64-linux-gnu with no new regression. > Also successfully done autoprofiledbootstrap with the relevant patch. > > Is this OK for trunk? I'd CC Eugene, the autofdo maintainer, on these patches so he can review them. Andi Kleen is also interested in autofdo. > Thanks, > Kugan > > [2. 0004-AUTOFDO-AARCH64-Add-support-for-profilebootstrap.patch --- > text/x-patch; 0004-AUTOFDO-AARCH64-Add-support-for-profilebootstrap.patch]... FWIW, gcc/config/aarch64/gcc-auto-profile has a #!/bin/sh shebang yet uses a bashism ('=='). Just change that to '='.
Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn
On 5/9/25 2:27 PM, Vineet Gupta wrote: This showed up when debugging the testcase for PR119164. RISC-V FRM mode-switching state machine has special handling for transitions to and from a call_insn as FRM needs to saved/restored around calls (any call is considered potentially FRM clobbering). Consider the following canonical example where insns 2, 4, 6 come are due to user code, while the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the ABI semantics. test_float_point_frm_static: 1: frrma5 <-- 2: fsrmi 2 3: fsrma5 <-- 4: callnormalize_vl 5: frrma5 <-- 6: fsrmi 3 7: fsrma5 <-- Not the focus of this series, but isn't the frrm@5 unnecessary? If there is no use of FRM after a call, but before the next set of FRM, then the restore isn't needed. This is a pretty standard optimization in caller-saves style code generation. Though maybe it's really an artifact of not showing the uses of FRM in your little example? Current implementation of RISC-V TARGET_MODE_NEEDED has special handling if the call_insn is last insn of BB, to ensure FRM save/reads are emitted on all the edges. However it doesn't work as intended and is borderline bogus for following reasons: - It fails to detect call_insn as last of BB (PR119164 test) if the next BB starts with a code label (say due to call being conditional). Granted this is a deficiency of API next_nonnote_nondebug_insn_bb () which incorrectly returns next BB code_label as opposed to returning NULL (and this behavior is kind of relied upon by much of gcc). This causes missed/delayed state transition to DYN. - If code is tightened to actually detect above such as: - rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn); - if (!insn) + if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn) I'm guessing the concern here was CUR_INSN could be a call and the last real insn in the block, but there could be notes and debug statements between the call and the actual end of the block? Would it be safer as: rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn); if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn) But it's all academic since this code really just needs to go away. edge insertion happens but ends up splitting the BB which generic mode-sw doesn't expect and ends up hittng an ICE. - TARGET_MODE_NEEDED hook typically don't modify the CFG. Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG. - For abnormal edges, insert_insn_end_basic_block () is called, which by design on encountering call_insn as last in BB, inserts new insn BEFORE the call, not after. Right. In theory we're not supposed to be asking for insertion on abnormals anyway. It would be interesting to know if that's still happening or if it's legacy from the late 90s when we first started lighting up LCM algorithms and made all the common mistakes WRT abnormals. So this is just all wrong and ripe for removal. Moreover there seems to be no testsuite coverage for this code path at all. Results don't change at all if this is removed. The total number of FRM read/writes emitted (static count) across all benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly so its a net win even if minimal but the real gain is reduced complexity and maintenance. Before Patch --- frrm fsrmi fsrm frrm fsrmi frrm perlbench_r 4204 4204 cpugcc_r 1670 17 1670 17 bwaves_r 1601 1601 mcf_r 1100 1100 cactusBSSN_r 790 27 760 27 namd_r 1190 63 1190 63 parest_r 2180 114 1680 114 <-- povray_r 1231 17 1231 17 lbm_r600 600 omnetpp_r 1701 1701 wrf_r 2287 13 19562287 13 1956 cpuxalan_r 1701 1701 ldecod_r 1100 1100 x264_r 1401 1401 blender_r 724 12 182 724 12 182 cam4_r 324 13 169 324 13 169 deepsjeng_r 1100 1100 imagick_r 265 16 34 265 16 34 leela_r 1200 1200 nab_r 1301 1301 exchange2_r 1601 1601 fotonik3d_r 200 11 200 11 roms_r 330 23 330 23 xz_r600 600 --- 4551 55 26234498 55 2623 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_frm_emit_after_b
Re: [PATCH 2/6] RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
On 5/9/25 2:27 PM, Vineet Gupta wrote: This is effectively reverting e5d1f538bb7d "(RISC-V: Allow different dynamic floating point mode to be merged)" while retaining the testcase. The change itself is valid, however it obfuscates the deficiencies in current frm mode switching code. Also for a SPEC2017 -Ofast -march=rv64gcv build, it ends up generating net more FRM restores (writes) vs. the rest of this changeset. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): Remove. (riscv_mode_confluence): Ditto. (TARGET_MODE_CONFLUENCE): Ditto. Unsure on this one. - /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible. - Although we already try to set the mode needed to FRM_DYN after a - function call, there are still some corner cases where both FRM_DYN - and FRM_DYN_CALL may appear on incoming edges. */ Do we have an understanding of these corner cases? That's my biggest worry with simply removing this code. jeff
Re: [PATCH 4/6] RISC-V: frm/mode-switch: TARGET_MODE_AFTER not needed for frm switching
On 5/9/25 2:27 PM, Vineet Gupta wrote: Stumbled upon this when trying to wholesale rewrite frm switching code and seeing what pieces needed to be retained from current implementation. My interpretation of how this hook worked, for the following case: fsrmi 3 fsrm a4 call frrm a4 fsrmi 1 TARGET_MODE_NEEDED(call_insn) returns DYN_EXIT (to generate fsrm) and TARGET_MODE_AFTER(call_insn) returns DYN (to generate frrm). However for a given insn, if the 2 hooks return different values, the final state machine doesn't switch as expected above (and instead both NEEDED and AFTER need to return the same mode, for most cases). Anyhow it turns out that no-oping this (return the last_mode back) doesn't change any testcase outcomes. There's no change to total number of FRM read/writes emitted (static count) for SPEC2017 -Ofast -march=rv64gcv build But we win again on reduced complexity and maintenance. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_frm_mode_needed): Move static state update here. (frm_unknown_dynamic_p): Delete. (riscv_frm_mode_after): Delete. (riscv_mode_after): Remove call to riscv_frm_mode_after (). This doesn't seem right to me. Don't we need to know when an insn sets FRM to an unknown value, either due to a call or due to an explicit assignment to FRM from a GPR? I suspect that you don't see a difference when you nop this out because spec doesn't inherently do FRM mode changes and we probably have lousy coverage in the testsuite. Jeff
[COMMITED] [PATCH v2] gimple: Don't assert that switch has nondefault cases during lowering [PR120080]
I have mistakenly assumed that switch lowering cannot encounter a switch with zero clusters. This patch removes the relevant assert and instead gives up bit-test lowering when this happens. PR tree-optimization/120080 gcc/ChangeLog: * tree-switch-conversion.cc (bit_test_cluster::find_bit_tests): Replace assert with return. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr120080.c: New test. Signed-off-by: Filip Kastl --- gcc/testsuite/gcc.dg/tree-ssa/pr120080.c | 26 gcc/tree-switch-conversion.cc| 8 +--- 2 files changed, 31 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr120080.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c b/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c new file mode 100644 index 000..d71ef5e9dd0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr120080.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-fgimple -O2" } */ + +void __GIMPLE (ssa,startwith("switchlower1")) +foo (int b) +{ + __BB(2): + switch (b) {default: L9; case 0: L5; case 5: L5; case 101: L5; } + + __BB(3): +L9: + switch (b) {default: L7; case 5: L6; case 101: L6; } + + __BB(4): +L6: + __builtin_unreachable (); + + __BB(5): +L7: + __builtin_trap (); + + __BB(6): +L5: + return; + +} diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc index dea217a01ef..bd4de966892 100644 --- a/gcc/tree-switch-conversion.cc +++ b/gcc/tree-switch-conversion.cc @@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec &clusters, int max_c) end up with as few clusters as possible. */ unsigned l = clusters.length (); - auto_vec min; - min.reserve (l + 1); - gcc_checking_assert (l > 0); + if (l == 0) +return clusters.copy (); gcc_checking_assert (l <= INT_MAX); + auto_vec min; + min.reserve (l + 1); + int bits_in_word = GET_MODE_BITSIZE (word_mode); /* First phase: Compute the minimum number of clusters for each prefix of the -- 2.49.0
Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn
On 5/10/25 07:17, Jeff Law wrote: > On 5/9/25 2:27 PM, Vineet Gupta wrote: >> This showed up when debugging the testcase for PR119164. >> >> RISC-V FRM mode-switching state machine has special handling for transitions >> to and from a call_insn as FRM needs to saved/restored around calls (any >> call is considered potentially FRM clobbering). Consider the following >> canonical example where insns 2, 4, 6 come are due to user code, while >> the rest of frm save/restore insns 1, 3, 5, 7 need to be generated for the >> ABI semantics. >> >> test_float_point_frm_static: >> 1: frrma5 <-- >> 2: fsrmi 2 >> 3: fsrma5 <-- >> 4: callnormalize_vl >> 5: frrma5 <-- >> 6: fsrmi 3 >> 7: fsrma5 <-- > Not the focus of this series, but isn't the frrm@5 unnecessary? It is necessary. Any function call can potentially change frm. Assuming normalize_vl does, we need to read the latest "global" so that the restore in insn 7 does the right thing. > If there is no use of FRM after a call, but before the next set of FRM, > then the restore isn't needed. This is a pretty standard optimization > in caller-saves style code generation. Though maybe it's really an > artifact of not showing the uses of FRM in your little example? Please see above. >> Current implementation of RISC-V TARGET_MODE_NEEDED has special handling >> if the call_insn is last insn of BB, to ensure FRM save/reads are emitted >> on all the edges. However it doesn't work as intended and is borderline >> bogus for following reasons: >> >> - It fails to detect call_insn as last of BB (PR119164 test) if the >> next BB starts with a code label (say due to call being conditional). >> Granted this is a deficiency of API next_nonnote_nondebug_insn_bb () >> which incorrectly returns next BB code_label as opposed to returning >> NULL (and this behavior is kind of relied upon by much of gcc). >> This causes missed/delayed state transition to DYN. >> >> - If code is tightened to actually detect above such as: >> >> - rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn); >> - if (!insn) >> + if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn) > I'm guessing the concern here was CUR_INSN could be a call and the last > real insn in the block, but there could be notes and debug statements > between the call and the actual end of the block? Yeah in the test case, there was a label as actual last insn of BB, but generic mode_sw after call skips over labels (NONDEBUG_INSN_P used in the iterator). Whereas the {prev,next}_nonnote_nondebug_insn_bb () doesn't hence the impedance mismatch. > Would it be safer as: > rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn); > if (!insn || BLOCK_FOR_INSN (cur_insn) != BLOCK_FOR_INSN (insn) > > But it's all academic since this code really just needs to go away. Exactly. >> edge insertion happens but ends up splitting the BB which generic >> mode-sw doesn't expect and ends up hittng an ICE. >> >> - TARGET_MODE_NEEDED hook typically don't modify the CFG. > Yea, we probably don't want TARGET_MODE_NEEDED to be modifying the CFG. > >> - For abnormal edges, insert_insn_end_basic_block () is called, which >> by design on encountering call_insn as last in BB, inserts new insn >> BEFORE the call, not after. > Right. In theory we're not supposed to be asking for insertion on > abnormals anyway. It would be interesting to know if that's still > happening or if it's legacy from the late 90s when we first started > lighting up LCM algorithms and made all the common mistakes WRT abnormals. Well the RISC-V code was looping thru all the edges and handling regular and abnormal edges. When I added the debug prints there, it was sometimes hitting the abnormal edge case and putting before call, not after. >> So this is just all wrong and ripe for removal. Moreover there seems to >> be no testsuite coverage for this code path at all. Results don't change >> at all if this is removed. >> >> The total number of FRM read/writes emitted (static count) across all >> benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decrease slightly >> so its a net win even if minimal but the real gain is reduced complexity >> and maintenance. >> >> Before Patch >> --- >> frrm fsrmi fsrm frrm fsrmi frrm >> perlbench_r 4204 4204 >> cpugcc_r 1670 17 1670 17 >> bwaves_r 1601 1601 >>mcf_r 1100 1100 >> cactusBSSN_r 790 27 760 27 >> namd_r 1190 63 1190 63 >> parest_r 2180 114 1680 114 <-- >> povray_r 1231 17 1231 17 >>lbm_r600 600 >>omnetpp
Re: [PATCH v4 1/2] RISC-V: Support RISC-V Profiles 20/22.
Committed on trunk, thanks! https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=43b450e3f72a53c744e77f55393962f1d349373a BR, Jiawei 在 2025/5/11 0:42, Jeff Law 写道: On 5/10/25 6:30 AM, Jiawei wrote: This patch introduces support for RISC-V Profiles RV20 and RV22 [1], enabling developers to utilize these profiles through the -march option. [1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0 Version log: Using lowercase letters to present Profiles. Using '_' as divsor between Profiles and other RISC-V extension. Add descriptions in invoke.texi. Checking if there exist '_' between Profiles and additional extensions. Using std::string to avoid memory problems. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (struct riscv_profiles): New struct. (riscv_subset_list::parse_profiles): New parser. (riscv_subset_list::parse_base_ext): Ditto. * config/riscv/riscv-subset.h: New def. * doc/invoke.texi: New option descriptions. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-49.c: New test. * gcc.target/riscv/arch-50.c: New test. * gcc.target/riscv/arch-51.c: New test. * gcc.target/riscv/arch-52.c: New test. Both patches in this series are OK. Please install. Thanks again, Jeff
Re: [PING] [PATCH] Fortran: fix passing of inquiry ref of complex array to TRANSFER [PR102891]
Hi Jerry! Am 10.05.25 um 21:53 schrieb Jerry D: On 5/10/25 11:33 AM, Harald Anlauf wrote: Early ping. Am 06.05.25 um 21:06 schrieb Harald Anlauf: Dear all, here's another rather obvious case where a temporary is needed for an inquiry reference of a complex array which is a component of a derived type. In contrast to PR119986, the argument is handled within the code for the intrinsic TRANSFER, so that the other patch did not catch the present one. Regtested on x86_64-pc-linux-gnu. OK for mainline? Yes, quite alright. Thanks for the review! Pushed as r16-518-g94fa992b60e53d so far. Cheers, Harald I'd also like to backport this one to 15-branch if this is ok. ... and please do backport. Best regards, Jerry Thanks, Harald
Re: [PATCH v3] i386/cygming: Decrease default preferred stack boundary for 32-bit targets
On 5/10/25 12:54 PM, LIU Hao wrote: 在 2025-5-10 20:48, Jonathan Yong 写道: On 5/9/25 4:26 PM, LIU Hao wrote: 在 2025-5-3 20:52, LIU Hao 写道: 在 2025-5-2 01:25, LIU Hao 写道: Remove `STACK_REALIGN_DEFAULT` for this target, because now the default value of `incoming_stack_boundary` equals `MIN_STACK_BOUNDARY` and it doesn't have an effect any more. I suddenly realized the previous patch was for GCC 15 branch. Here's a new one, rebased on master. Ping. Also, I think I need some comments on the `force_align_arg_pointer` hunk. There's really no good reason for the change, except that 'we had better let the attribute do something.' Can you please rebase your patch? Thanks. Here is a patch, rebased on current master. Thanks, pushed to master branch.
Re: [extern] Re: [PATCH 1/2] libstdc++: Support UNC paths on MinGW in fs::path
On 10/05/2025 15:09, Jonathan Wakely wrote: > Please confirm that you're using it that way (and not just adding it > because others do so). Yes, that's how I'm using it.
Re: [PATCH 0/6] RISC-V: frm state-machine improvements
On 5/9/25 2:27 PM, Vineet Gupta wrote: Hi, This came out of Rivos perf team reporting (shoutout to Siavash) that some of the SPEC2017 workloads had unnecessary FRM wiggles, when none were needed. The writes in particular could be expensive. I started with reduced test for PR/119164 from blender:node_testure_util.c. However in trying to understand (and a botched rewrite of whole thing) it turned out that lot of code was just unnecessary leading to more complexity than warranted. As a result there are more deletions here and the actual improvements come from just a few lines of actual changes. I've verified each patch incrementally with - Testsuite run (unchanged, 1 unexpected pass gcc.target/riscv/rvv/autovec/pr119114.c) - SPEC build - Static analysis of FRM read/write insns emitted in all of SPEC binaries. - There's BPI date for some of this too, but the delta there is not significant as this could really be uarch specific. Here's the result for static analysis. 1. revert-confluence 2. remove-edge-insert 4-fewer-frm-restore 5-call-backtrack 3. remove-mode-after --- --- --- frrm fsrmi fsrm frrm fsrmi fsrm frrm fsrmi fsrm frrm fsrmi fsrm perlbench_r 4204 4204 1701 1701 cpugcc_r 1670 17 1670 17 1100 1100 bwaves_r 1601 1601 1601 1601 mcf_r 1100 1100 1100 1100 cactusBSSN_r 790 27 760 27 1901 1901 namd_r 1190 63 1190 63 1401 1401 parest_r 2180 114 1680 114 2401 2401 povray_r 1231 17 1231 17 2616 2616 lbm_r600 600 600 600 omnetpp_r 1701 1701 1701 1701 wrf_r 2287 13 19562287 13 19561268 13 1603 613 13 82 cpuxalan_r 1701 1701 1701 1701 ldecod_r 1100 1100 1100 1100 x264_r 1401 1401 1100 1100 blender_r 724 12 182 724 12 182 61 12 42 39 12 16 cam4_r 324 13 169 324 13 169 45 13 20 40 13 17 deepsjeng_r 1100 1100 1100 1100 imagick_r 265 16 34 265 16 34 132 16 25 33 16 18 leela_r 1200 1200 1200 1200 nab_r 1301 1301 1301 1301 exchange2_r 1601 1601 1601 1601 fotonik3d_r 200 11 200 11 1901 1901 roms_r 330 23 330 23 2101 2101 xz_r600 600 600 600 --- --- 4551 55 26234498 55 26231804 55 1707 1023 55 150 --- --- 7729 7176 3566 1228 --- --- It seems wrf still has half of all read/writes 613 13 82 with one function having the bulk of them solve_em_ 5551 50 This is 1 static RM so ideally needs 1 save and 1 restore. I have a feeling this has to do with following: https://godbolt.org/z/Px9es7j1r The function call code path need not bother with frm save/restore at all. This is currently being investigated but could take more time. Frankly I'm surprised we need FRM adjustments as much as we do, though presumably there's some builtin or somesuch that we need to twiddle FRM to implement and as a result if the builtin ever gets used it leads to FRM games. But it still seems high. For example, what does xz do that triggers any FRM adjustments, even statically?!? Please review. Will do. Thanks! jeff
Re: [PATCH, fortran] Fix simple typo in libgfortran
Hi Yuao, Yuao Ma wrote: Thanks for the tip! I ran the git_check_commit.py script, and it reported "OK." However, when I regenerated the ChangeLog, the function name wasn't included automatically. Usually 'git diff' shows the function name in the '@@' line, but that does not always work – either it shows no function name (as in this case) or the wrong one. Showing the wrong happens relatively often when, e.g., adding a member to a 'struct' or when updating the function description comment as that comes before the associated function. And not showing the name happens most often for labels (see below). For instance, picking the most recent commit with code changes, a diff usually has something like the following - and then mklog.py works: --- a/gcc/tree-switch-conversion.cc +++ b/gcc/tree-switch-conversion.cc @@ -1793,12 +1793,14 @@ bit_test_cluster::find_bit_tests (vec &clusters, int max_c) → 'bit_test_cluster::find_bit_tests' is the function name. However, in your case, the label ('exponent:') caused that 'git diff' showed it instead of the function name from a few lines up. Namely, your patch diff has: --- a/libgfortran/io/read.c +++ b/libgfortran/io/read.c @@ -1375,7 +1375,7 @@ exponent: * * * I've manually added the function name to the ChangeLog for now Thanks. * * * One very minor nit: The email subject reads "[PATCH, fortran] Fix simple typo in libgfortran" while the patch subject is "[PATCH] fortran: fix simple typo in libgfortran" It's usually nicer if they are the same to make finding old patch reviews easier - usage case: someone wonders 5 years on why a certain change was done in a particular way and looks at the old email thread. In your case, those are similar enough that it doesn't matter - and sometimes it is unavoidable that they diverge (e.g. when fixing the bug number before the commit - while the email thread shows still the wrong bug), but, if possible/sensible, they should be the same. * * * I have now applied it on your behalf as r16-515-g5d9e66493afaa9 alias 5d9e66493afaa9 [The first version one denotes the 515th commit to GCC 16 (= mainline alias trunk); that way, it is easier to see in which GCC version a commit was done - and (relatively to other commits) when.] Short links - which is a readable way to link to commits (I hope Outlook.com doesn't mangle them in an unreadable way): https://gcc.gnu.org/r16-515-g5d9e66493afaa9 orhttps://gcc.gnu.org/g:5d9e66493afaa9 If you ever want to link to bug reports (PR, problem report), something likehttps://gcc.gnu.org/PR1234 can be used. Tobias PS: The ./contrib/git-descr.sh script creates the rXX-YYY number; or 'git gcc-descr' if you added the alias (e.g. via contrib/gcc-git-customization.sh).
Re: [parch, fortran] Fix PR 120163, 15/16 regression
Hi Thomas! Am 10.05.25 um 11:42 schrieb Thomas Koenig: This bug was another case of generating a formal arglist from an actual one where we should not have done so. The fix is straightforward: If we have resolved the formal arglist, we should not generare a new one. OK for trunk and backport to gcc 15? Assuming this regtests OK (you didn't say so) this is OK for both. Thanks for the patch! Harald Best regards Thomas Do not generate formal arglist from actual if we have already resolved it. gcc/fortran/ChangeLog: PR fortran/120163 * gfortran.h: Add formal_resolved to gfc_symbol. * resolve.cc (gfc_resolve_formal_arglist): Set it. (resolve_function): Do not call gfc_get_formal_from_actual_arglist if we already resolved a formal arglist. (resolve_call): Likewise.
[PING] [PATCH] Fortran: fix passing of inquiry ref of complex array to TRANSFER [PR102891]
Early ping. Am 06.05.25 um 21:06 schrieb Harald Anlauf: Dear all, here's another rather obvious case where a temporary is needed for an inquiry reference of a complex array which is a component of a derived type. In contrast to PR119986, the argument is handled within the code for the intrinsic TRANSFER, so that the other patch did not catch the present one. Regtested on x86_64-pc-linux-gnu. OK for mainline? I'd also like to backport this one to 15-branch if this is ok. Thanks, Harald
Re: [PATCH v1] libstdc++: More efficient weekday from year_month_day.
Thanks Andrew for your prompt reply. The results below regard my PoC which is as close to the proposed patch as I could make. This is because I can't have chrono with my patch on godbold for a comparison between current chrono and patched chrono. I tried on all platforms that I could make it to compile. Please double check everything because I might be misreading some results, especially on the platforms that I'm not familiar with. Sometimes godbold seems to have issues and cut pieces of the generated assembly from the output. I've marked these cases with (X). I suspect we want to disable this for -Os > Below are the sizes with -Os. Most of the time the new code is shorter than the old one with a few exceptions where they are the same size (because the platform doesn't seem to support [[assume]]). The new code is never longer. On each link, the middle panel shows the result for the old code and the right panel for the new code. These panels have tabs for different platforms. Old New https://godbolt.org/z/hfz9szEWf x86-64 0x81 0x69 ARM32 0x78 0x68 ARM64 0x81 0x71 ARM64 Morello 0x48 0x48 HPPA 0xf8 0xc8 KVX ACB0xec 0xcc loongarch640x94 0x8c (x) https://godbolt.org/z/eMfzoPhT5 M68K 0xb6 0xa6 MinGW 0xa0 0x80 (X) mips 0xdc 0xac mips64 0xcc 0xb8 mipls64 (el) 0xbc 0xa8 mipsel 0xe0 0xb0 MRISC320xa4 0x74 power 0xb8 0x80 https://godbolt.org/z/PjqbTqK6b power640xa8 0x8c power64le 0xa4 0x88 RISC-V (32)0x90 0x7e (X) RISC-V (64)0x86 0x86 (X) s390x 0xf0 0x90 sh 0xc2 0xb2 SPARC 0xc0 0x98 SPARC LEON 0xbc 0x94 https://godbolt.org/z/7oebGMYTM SPARC640xac 0x94 TI C6x 0xc4 0x98 Tricore0xb0 0xb0 VAX0xc8 0xc5 > And plus i am not 100% convinced it is best for all micro-architures. > Especially on say aarch64. > Can you do more benchmarking and peocide which exaxt core is being used? > I don't have access to any platform other than x86-64 to do benchmarks :-( And mention the size difference too? > Same exercise explained above but with -O2: OldNew https://godbolt.org/z/eqGo9xnz3 x86-64 0x a4 0x 72 ARM32 0x a8 0x 74 ARM64 0x 98 0x 80 ARM64 Morello 0x14c 0x14c HPPA 0x134 0x c8 KVX ACB0x f4 0x 98 loongarch640x ac 0x 9c (X) https://godbolt.org/z/7qh94zGMK M68K 0x13a 0x a2 MinGW 0x d0 0x 80 (X) mips 0x11c 0x e4 mips64 0x130 0x f0 mipls64 (el) 0x120 0x e0 mipsel 0x120 0x e8 MRISC320x a0 0x 74 power 0x dc 0x 88 https://godbolt.org/z/Y11Trnqc1 power640x d0 0x 94 power64le 0x d0 0x 90 RISC-V (32)0x bc 0x 84 (X) RISC-V (64)0x be 0x 94 (X) s390x 0x f0 0x a8 sh 0x c6 0x cc (*) SPARC 0x108 0x 9c SPARC LEON 0x f4 0x 94 https://godbolt.org/z/h456PTEWh SPARC640x c0 0x a0 TI C6x 0x108 0x b0 Tricore0x e8 0x ea (*) VAX0x dc 0x dc (*) These are the only cases where the new code is larger than the old one. Plus gcc knows how to do %7 via multiplication is that being used or is it > due to generic x86 tuning it is using the div instruction? > Yes and no. In x86-64 (and probably many other platforms) the current optimisation for n % 7 is a byproduct of the optimisation for /, that is, to calculate n % 7, the generated code evaluates n - (n / 7) * 7. The quotient q = n / 7 is optimised to avoid div and uses a multiplication and other cheaper operations. In total it evaluates 2 multiplications + shifts + add + subs and movs. (One multiplication is q*7 which is performed with LEA + sub.) The algorithm that I'm suggesting, performs only one multiplication and one. Below are the comparisons of n % 7 and the proposed algorithm. https://godbolt.org/z/o7dazs4Gc https://godbolt.org/z/zP79736WK https://godbolt.org/z/65x7naMfq https://godbolt.org/z/z9ofaMzex I hope this helps. Cassio.
Re: i386: Fix some problems in stv cost model
On Sun, May 11, 2025 at 4:28 AM Jan Hubicka wrote: > > Hi, > this patch fixes some of problems with cosint in scalar to vector pass. > In particular This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120215 > 1) the pass uses optimize_insn_for_size which is intended to be used by > expanders and splitters and requires the optimization pass to use > set_rtl_profile (bb) for currently processed bb. > This is not done, so we get random stale info about hotness of insn. > 2) register allocator move costs are all realtive to integer reg-reg move > which has cost of 2, so it is (except for size tables and i386) > a latency of instruction multiplied by 2. > These costs has been duplicated and are now used in combination with > rtx costs which are all based to COSTS_N_INSNS that multiplies latency > by 4. > Some of vectorizer costing contains COSTS_N_INSNS (move_cost) / 2 > to compensate, but some new code does not. This patch adds compensatoin. > > Perhaps we should update the cost tables to use COSTS_N_INSNS everywher > but I think we want to first fix inconsistencies. Also the tables will > get optically much longer, since we have many move costs and COSTS_N_INSNS > is a lot of characters. > 3) variable m which decides how much to multiply integer variant (to account > that with -m32 all 64bit computations needs 2 instructions) is declared > unsigned which makes the signed computation of instruction gain to be > done in unsigned type and breaks i.e. for division. > 4) I added integer_to_sse costs which are currently all duplicationof > sse_to_integer. AMD chips are asymetric and moving one direction is faster > than another. I will chance costs incremnetally once vectorizer part > is fixed up, too. > > There are two failures gcc.target/i386/minmax-6.c and > gcc.target/i386/minmax-7.c. > Both test stv on hasswell which no longer happens since SSE->INT and INT->SSE > moves > are now more expensive. > > There is only one instruction to convert: > > Computing gain for chain #1... > Instruction gain 8 for11: {r110:SI=smax(r116:SI,0);clobber flags:CC;} > Instruction conversion gain: 8 > Registers conversion cost: 8<- this is integer_to_sse and sse_to_integer > Total gain: 0 > > total gain used to be 4 since the patch doubles the conversion costs. > According to agner fog's tables the costs should be 1 cycle which is correct > here. > > Final code gnerated is: > > vmovd %esi, %xmm0 * latency 1 > cmpl%edx, %esi > je .L2 > vpxor %xmm1, %xmm1, %xmm1 * latency 1 > vpmaxsd %xmm1, %xmm0, %xmm0 * latency 1 > vmovd %xmm0, %eax * latency 1 > imull %edx, %eax > cltq > movzwl (%rdi,%rax,2), %eax > ret > > cmpl%edx, %esi > je .L2 > xorl%eax, %eax * latency 1 > testl %esi, %esi * latency 1 > cmovs %eax, %esi * latency 2 > imull %edx, %esi > movslq %esi, %rsi > movzwl (%rdi,%rsi,2), %eax > ret > > Instructions with latency info are those really different. > So the uncoverted code has sum of latencies 4 and real latency 3. > Converted code has sum of latencies 4 and real latency 3 (vmod+vpmaxsd+vmov). > So I do not quite see it should be a win. > > There is also a bug in costing MIN/MAX > > case ABS: > case SMAX: > case SMIN: > case UMAX: > case UMIN: > /* We do not have any conditional move cost, estimate it as a > reg-reg move. Comparisons are costed as adds. */ > igain += m * (COSTS_N_INSNS (2) + ix86_cost->add); > /* Integer SSE ops are all costed the same. */ > igain -= ix86_cost->sse_op; > break; > > Now COSTS_N_INSNS (2) is not quite right since reg-reg move should be 1 or > perhaps 0. > For Haswell cmov really is 2 cycles, but I guess we want to have that in cost > vectors > like all other instructions. > > I am not sure if this is really a win in this case (other minmax testcases > seems to make > sense). I have xfailed it for now and will check if that affects specs on > LNT testers. > > Bootstrapped/regtested x86_64-linux, comitted. > > I will proceed with similar fixes on vectorizer cost side. Sadly those > introduces > quite some differences in the testuiste (partly triggered by other costing > problems, > such as one of scatter/gather) > > gcc/ChangeLog: > > * config/i386/i386-features.cc > (general_scalar_chain::vector_const_cost): Add BB parameter; handle > size costs; use COSTS_N_INSNS to compute move costs. > (general_scalar_chain::compute_convert_gain): Use optimize_bb_for_size > instead of optimize_insn_for size; use COSTS_N_INSNS to compute move > costs; > update cal
Re: [PATCH 0/6] RISC-V: frm state-machine improvements
On 5/10/25 06:49, Jeff Law wrote: > On 5/9/25 2:27 PM, Vineet Gupta wrote: >> Hi, >> >> This came out of Rivos perf team reporting (shoutout to Siavash) that >> some of the SPEC2017 workloads had unnecessary FRM wiggles, when >> none were needed. The writes in particular could be expensive. >> >> I started with reduced test for PR/119164 from blender:node_testure_util.c. >> >> However in trying to understand (and a botched rewrite of whole thing) >> it turned out that lot of code was just unnecessary leading to more >> complexity than warranted. As a result there are more deletions here and >> the actual improvements come from just a few lines of actual changes. >> >> I've verified each patch incrementally with >> - Testsuite run (unchanged, 1 unexpected pass >> gcc.target/riscv/rvv/autovec/pr119114.c) >> - SPEC build >> - Static analysis of FRM read/write insns emitted in all of SPEC binaries. >> - There's BPI date for some of this too, but the delta there is not >> significant as this could really be uarch specific. >> >> Here's the result for static analysis. >> >> >> 1. revert-confluence 2. remove-edge-insert >> 4-fewer-frm-restore 5-call-backtrack >>3. remove-mode-after >>--- >> --- --- >> frrm fsrmi fsrm frrm fsrmi fsrm frrm fsrmi fsrm >> frrm fsrmi fsrm >> perlbench_r 4204 4204 1701 >> 1701 >> cpugcc_r 1670 17 1670 17 1100 >> 1100 >> bwaves_r 1601 1601 1601 >> 1601 >>mcf_r 1100 1100 1100 >> 1100 >> cactusBSSN_r 790 27 760 27 1901 >> 1901 >> namd_r 1190 63 1190 63 1401 >> 1401 >> parest_r 2180 114 1680 114 2401 >> 2401 >> povray_r 1231 17 1231 17 2616 >> 2616 >>lbm_r600 600 600 >>600 >>omnetpp_r 1701 1701 1701 >> 1701 >>wrf_r 2287 13 19562287 13 19561268 13 1603 >> 613 13 82 >> cpuxalan_r 1701 1701 1701 >> 1701 >> ldecod_r 1100 1100 1100 >> 1100 >> x264_r 1401 1401 1100 >> 1100 >>blender_r 724 12 182 724 12 182 61 12 42 >> 39 12 16 >> cam4_r 324 13 169 324 13 169 45 13 20 >> 40 13 17 >> deepsjeng_r 1100 1100 1100 >> 1100 >>imagick_r 265 16 34 265 16 34 132 16 25 >> 33 16 18 >> leela_r 1200 1200 1200 >> 1200 >>nab_r 1301 1301 1301 >> 1301 >> exchange2_r 1601 1601 1601 >> 1601 >> fotonik3d_r 200 11 200 11 1901 >> 1901 >> roms_r 330 23 330 23 2101 >> 2101 >> xz_r600 600 600 >>600 >> --- >> --- >> 4551 55 26234498 55 26231804 55 1707 >> 1023 55 150 >> --- >> --- >>7729 7176 3566 >> 1228 >> --- >> --- >> >> It seems wrf still has half of all read/writes >> 613 13 82 >> >> with one function having the bulk of them >>solve_em_ 5551 50 >> >> This is 1 static RM so ideally needs 1 save and 1 restore. >> >> I have a feeling this has to do with following: >> https://godbolt.org/z/Px9es7j1r >> >> The function call code path need not bother with frm save/restore at >> all. This is currently being investigated but could take more time. I'll get back to this in detail in a follow up. It is kind of add-on