Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic
> > Perhaps someone is interested in the following thread from LKML:
> >
> > "[PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops"
> >
> > https://lore.kernel.org/lkml/20250605164733.737543-1-mjgu...@gmail.com/
> >
> > There are several PRs regarding memcpy/memset linked from the above message.
> >
> > Please also note a message from Linus from the above thread:
> >
> > https://lore.kernel.org/lkml/CAHk-=wg1qqlwkpyvxxznxwbot48--lkjucjjf8phdhrxv0u...@mail.gmail.com/

This is my understanding of the situation. Please correct me where I am
wrong.

According to Linus, calls in the kernel are more expensive than elsewhere
due to mitigations. I wonder if -minline-all-stringops would make sense
here.

Linus writes about an alternate entry point for memcpy with a non-standard
calling convention, which we also discussed a few times in the past. I think
having a calling convention for memset/memcpy that only clobbers SI/DI/CX
and nothing else (especially no SSE regs) makes sense.

This should make offlined memcpy noticeably cheaper, especially when called
from loops that need SSE, and the implementation can be done w/o clobbering
extra registers for small blocks while it will have enough time to spill
for large ones.

The other patch does

+KBUILD_CFLAGS += -mmemcpy-strategy=unrolled_loop:256:noalign,libcall:-1:noalign
+KBUILD_CFLAGS += -mmemset-strategy=unrolled_loop:256:noalign,libcall:-1:noalign

for non-native CPUs (so something we should fix for generic tuning).

This is about our current default of rep stosq, which does not work well
on Intel hardware. We use a loop for blocks up to 32 bytes and rep stosq up
to 8k.

We now have X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB for Intel cores, but no
changes for generic yet (it is on my TODO to do some more testing on Zen).

So I think we can do the following:

 1) Decide whether to go with X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB or
    replace rep_prefix_8_byte by unrolled_loop.
 2) Fix the issue with repeated constants. I.e. replace
      movq $0,
      movq $0,
      movq $0,
    which we currently generate for memset fitting in CLEAR_RATIO, by
      mov $0, tmpreg
      movq tmpreg,
      movq tmpreg,
      movq tmpreg,
    which will make memset sequences smaller. I agree with Richi that
    HJ's patch that adds a new clear block expander is probably not the
    right place for solving the problem. Ideally we should catch repeated
    constants more generally since this appears elsewhere too.
    I am not quite sure where to fit it best. We already have a machine
    specific task that loads 0 into an SSE register, which is kind of
    similar to this as well.
 3) Figure out what reasonable MOVE_RATIO/CLEAR_RATIO defaults are.
 4) Possibly go with the entry point idea?

Honza
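As a minimal sketch of the kind of code item 2 is about (not taken from the thread): clearing a small fixed-size block. The names small_block and clear_block are made up for illustration. Whether GCC expands the memset inline, and with which instruction sequence, depends on the tuning and on options such as the -mmemset-strategy= setting quoted above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte block, small enough that the compiler may
   expand the memset inline instead of calling the library routine.  */
struct small_block
{
  uint64_t w[4];
};

/* Clear the block.  Returns the sum of all words, which is 0 once
   the block has been cleared.  */
uint64_t
clear_block (struct small_block *b)
{
  memset (b, 0, sizeof *b);
  return b->w[0] + b->w[1] + b->w[2] + b->w[3];
}
```

Compiling this with -O2 -S, with and without the -mmemset-strategy= option from the kernel patch, is one way to compare the expansions being discussed.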
Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic
On Fri, Jun 13, 2025 at 3:15 PM Cui, Lili wrote:
>
> > On Mon, Apr 21, 2025 at 7:24 AM H.J. Lu wrote:
> > > >
> > > > On Sun, Apr 20, 2025 at 6:31 PM Jan Hubicka wrote:
> > > > >
> > > > > > PR target/102294
> > > > > > PR target/119596
> > > > > > * config/i386/x86-tune-costs.h (generic_memcpy): Updated.
> > > > > > (generic_memset): Likewise.
> > > > > > (generic_cost): Change CLEAR_RATIO to 17.
> > > > > > * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
> > > > > > Add m_GENERIC.
> > > > >
> > > > > Looking through the PRs, they are primarily about
> > > > > CLEAR_RATIO being lower than on clang, which makes us produce
> > > > > a slower (but smaller) initialization sequence for blocks of
> > > > > certain size.
> > > > > It seems the kernel is discussed there too (-mno-sse).
> > > > >
> > > > > Bumping it up for SSE makes sense provided that SSE codegen does
> > > > > not suffer from the long $0 immediates. I would say it is OK also
> > > > > for -mno-sse provided speedups are quite noticeable, but it would
> > > > > be really nice to solve this incrementally.
> > > > >
> > > > > Concerning X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB, my understanding
> > > > > is that Intel chips like stosb for small blocks, since they are
> > > > > not optimized for stosw/q. Zen seems to prefer stosq over stosb
> > > > > for blocks up to 128 bytes.
> > > > >
> > > > > How does the loop version compare to stosb for blocks in the range
> > > > > 1...128 bytes on Intel hardware?
> > > > >
> > > > > For the case where we prove the block size to be small but do not
> > > > > know the size, I think using loop or unrolled for blocks up to say
> > > > > 128 bytes may work well for both.

Perhaps someone is interested in the following thread from LKML:

"[PATCH v2] x86: prevent gcc from emitting rep movsq/stosq for inlined ops"

https://lore.kernel.org/lkml/20250605164733.737543-1-mjgu...@gmail.com/

There are several PRs regarding memcpy/memset linked from the above message.

Please also note a message from Linus from the above thread:

https://lore.kernel.org/lkml/CAHk-=wg1qqlwkpyvxxznxwbot48--lkjucjjf8phdhrxv0u...@mail.gmail.com/

Uros.
[PATCH v1] RISC-V: Refine VX combine test case 0 to avoid code duplication
From: Pan Li

The case 0 definitions for the vx combine functions are mostly the same
across the different test files. Thus, re-arrange them in one place to
avoid code duplication.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Leverage helper
	macros to avoid code duplication.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add signed and
	unsigned vx combine test macros.

Signed-off-by: Pan Li
---
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c     | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c     | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-2-i16.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i32.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i64.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i8.c     | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c     | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-3-i16.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i32.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i64.c    | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i8.c     | 12 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c    | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c     | 11 +---
 .../riscv/rvv/autovec/vx_vf/vx_binary.h   | 25 +++
 25 files changed, 49 insertions(+), 252 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c
index b070efdcbb2..e18a672704a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c
@@ -5,17 +5,7 @@
 
 #define T int16_t
 
-DEF_VX_BINARY_CASE_0_WRAP(T, +, add)
-DEF_VX_BINARY_CASE_0_WRAP(T, -, sub)
-DEF_VX_BINARY_REVERSE_CASE_0_WRAP(T, -, rsub);
-DEF_VX_BINARY_CASE_0_WRAP(T, &, and)
-DEF_VX_BINARY_CASE_0_WRAP(T, |, or)
-DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor)
-DEF_VX_BINARY_CASE_0_WRAP(T, *, mul)
-DEF_VX_BINARY_CASE_0_WRAP(T, /, div)
-DEF_VX_BINARY_CASE_0_WRAP(T, %, rem)
-DEF_VX_BINARY_CASE_2_WRAP(T, MAX_FUNC_0_WARP(T), max)
-DEF_VX_BINARY_CASE_2_WRAP(T, MAX_FUNC_1_WARP(T), max)
+TEST_BINARY_VX_SIGNED_0(T)
 
 /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */
 /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c
index 3b51ca7ab1b..5feec251a4c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c
@@ -5,17 +5,7 @@
 
 #define T int32_t
 
-DEF_VX_BINARY_CASE_0_WRAP(T, +, add)
-DEF_VX_BINARY_CASE_0_WRAP(T, -, sub)
-DEF_VX_BINARY_REVERSE_CASE_0_WRAP(T, -, rsub);
-DEF_VX_BINARY_CASE_0_WRAP(T, &, and)
-DEF_VX_BINARY_CAS
[PATCH] c: Revise -Wjump-misses-init to better support idiomatic C code [PR87038]
Instead of adding it to -Wextra, here is my attempt to improve this
warning as discussed in the PR and make it suitable for -Wall.

There were only two tests to which I had to add -Wno-jump-misses-init.

Bootstrapped and regression tested for x86_64.


    c: Revise -Wjump-misses-init to better support idiomatic C code [PR87038]

    This change revises -Wjump-misses-init to emit a diagnostic only when
    the variable is read somewhere after the label until the end of the
    scope, e.g. no warning is emitted anymore in the following example.
    With this change the warning is suitable for -Wall.

    void f(void)
    {
      goto err;
      int x = 1;   // missed initialization
      f(&x);
    err:
      return;      // no use of 'x'
    }

    This is implemented by deferring all warnings until the end of the
    scope by recording potential warnings in a data structure, resetting
    DECL_READ (while recording the current value), and emitting warnings
    at the end of the scope only for declarations that were read. We still
    emit diagnostics directly for variably modified types, and for omp
    allocations, and for all cases with -Wc++-compat.

    There is overlap with -Wmaybe-uninitialized, but -Wjump-misses-init
    captures some new situations, e.g. when the address of the variable
    escapes as in the example above. The overlap could be reduced, but the
    two warnings provide complementary information, so it seems more useful
    to simply emit both.

    Finally, we now do emit labels for switch cases even when there is an
    error to prevent incorrect warnings about unreachable switch statements.

            PR c/87038

    gcc/c-family/ChangeLog:
            * c.opt: Add -Wjump-misses-init to -Wall.

    gcc/c/ChangeLog:
            * c-tree.h (c_check_switch_jump_warnings): Change return type
            to void.
            * c-decl.cc (decl_jump_unsafe): Fix comment.
            (emit_deferred_jump_warnings): New function.
            (warn_about_goto): Forward to warn_about_jump.
            (warn_about_jump): Emit or record warnings for goto or switch
            when missing an initialization.
            (pop_scope): Call emit_deferred_jump_warnings.
            (c_check_switch_jump_warnings): Refactor to use warn_about_jump.
            * c-typeck.cc (do_case): Adapt call to
            c_check_switch_jump_warnings.

    gcc/testsuite/ChangeLog:
            * c-c++-common/gomp/allocate-11.c: Remove incorrect warning.
            * gcc.dg/Wjump-misses-init-1.c: Use -Wc++-compat.
            * gcc.dg/Wjump-misses-init-4.c: New test.
            * gcc.dg/c23-labels-3.c: Use -Wno-jump-misses-init.
            * gcc.dg/uninit-pr102403-c2.c: Use -Wno-jump-misses-init.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 50ba856fedb..023fc186b03 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -938,7 +938,7 @@ C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(w
 Warn about invalid UTF-8 characters.
 
 Wjump-misses-init
-C ObjC Var(warn_jump_misses_init) Warning LangEnabledBy(C ObjC,Wc++-compat)
+C ObjC Var(warn_jump_misses_init) Warning LangEnabledBy(C ObjC, Wc++-compat || Wall)
 Warn when a jump misses a variable initialization.
 
 Enum
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 8bbd6ebc66a..5e6c510802f 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -749,7 +749,7 @@ decl_jump_unsafe (tree decl)
       && c_type_variably_modified_p (TREE_TYPE (decl)))
     return true;
 
-  /* Otherwise, only warn if -Wgoto-misses-init and this is an
+  /* Otherwise, only warn if -Wjump-misses-init and this is an
      initialized automatic decl.  */
   if (warn_jump_misses_init
       && VAR_P (decl)
@@ -1227,6 +1227,87 @@ update_label_decls (struct c_scope *scope)
     }
 }
 
+
+enum jump_type { JUMP_GOTO, JUMP_SWITCH };
+
+
+/* Issue a warning about a missed initialization for a goto or switch
+   statement.  */
+
+static void
+emit_warn_about_jump (enum jump_type jtype, location_t jump_loc, tree label,
+                      tree decl, location_t label_loc)
+{
+  auto_diagnostic_group d;
+  switch (jtype)
+    {
+    case JUMP_GOTO:
+      if (c_type_variably_modified_p (TREE_TYPE (decl)))
+        error_at (jump_loc, "jump into scope of identifier with "
+                            "variably modified type");
+      else if (flag_openmp
+               && lookup_attribute ("omp allocate", DECL_ATTRIBUTES (decl)))
+        error_at (jump_loc, "jump skips OpenMP %<allocate%> allocation");
+      else if (!warning_at (jump_loc, OPT_Wjump_misses_init,
+                            "jump skips variable initialization"))
+        return;
+      inform (DECL_SOURCE_LOCATION (label), "label %qD defined here", label);
+      inform (DECL_SOURCE_LOCATION (decl), "%qD declared here", decl);
+      break;
+    case JUMP_SWITCH:
+      if (c_type_variably_modified_p (TREE_TYPE (decl)))
+        error_at (label_loc, "switch jumps into scope of identifier wi
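For illustration only, a self-contained variant of the idiom the commit message describes; it is not part of the patch, and the function name first_char is made up. The goto skips an initialization, but the variable is never read after the label, which is the situation in which the revised warning is described as staying quiet.

```c
#include <stddef.h>

/* Copy the first character of S to *OUT.  Returns 0 on success and
   -1 if S is NULL.  */
int
first_char (const char *s, char *out)
{
  if (s == NULL)
    goto err;            /* Jumps past the initialization of 'c'.  */

  char c = s[0];         /* Initialization skipped by the goto above.  */
  *out = c;
  return 0;

err:
  /* 'c' is still in scope here but is not read after the label.  */
  return -1;
}
```

If the code after the label read c, the jump would leave it uninitialized on that path; according to the description above, that is the case the revised warning is meant to keep reporting.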
Re: [PATCH] Fortran: fix checking of MOLD= in ALLOCATE statements [PR51961]
On 6/15/25 1:22 PM, Harald Anlauf wrote:
> On 15.06.25 at 21:25, Harald Anlauf wrote:
> > Dear all,
> >
> > the attached patch fixes a rejects-valid: in an ALLOCATE statement
> > with MOLD= present, if the allocate-object has an explicit-shape-spec,
> > the compatibility of ranks is not required by the standard.
> > (It is explicitly required only for SOURCE=).
>
> Oops, I attached the wrong patch. Fixed now...
>
> > Since this could surprise users, we emit a warning if -Wsurprising
> > is specified (contained in -Wall). This agrees with NAG's behavior.
> >
> > Testcase cross-checked with ifx and NAG.
> >
> > Regtested on x86_64-pc-linux-gnu. OK for mainline / 15-branch?
> >
> > Thanks,
> > Harald

Yes, looks OK Harald. Thanks.

Jerry
[PATCH] RISC-V: Update profiles string in RV23.
Add b-ext in RVA/B23 as independent extension flags and add supm in RVA23.

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc: Add b-ext and supm.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/arch-53.c: Update testcase.
---
 gcc/common/config/riscv/riscv-common.cc  | 6 +++---
 gcc/testsuite/gcc.target/riscv/arch-53.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 6b5440365e3..3c25848ccd3 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -290,15 +290,15 @@ static const riscv_profiles riscv_profiles_table[] =
   /* RVA23 contains all mandatory base ISA for RVA22U64 and the new extension
      'v,zihintntl,zvfhmin,zvbb,zvkt,zicond,zimop,zcmop,zfa,zawrs' as mandatory
      extensions.  */
-  {"rva23u64", "rv64imafdcv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
+  {"rva23u64", "rv64imafdcbv_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
    "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
    "_zicboz_zfhmin_zkt_zvfhmin_zvbb_zvkt_zihintntl_zicond_zimop_zcmop_zcb"
-   "_zfa_zawrs"},
+   "_zfa_zawrs_supm"},
 
   /* RVB23 contains all mandatory base ISA for RVA22U64 and the new extension
      'zihintntl,zicond,zimop,zcmop,zfa,zawrs' as mandatory
      extensions.  */
-  {"rvb23u64", "rv64imafdc_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
+  {"rvb23u64", "rv64imafdcb_zicsr_zicntr_zihpm_ziccif_ziccrse_ziccamoa"
    "_zicclsm_zic64b_za64rs_zihintpause_zba_zbb_zbs_zicbom_zicbop"
    "_zicboz_zfhmin_zkt_zihintntl_zicond_zimop_zcmop_zcb"
    "_zfa_zawrs"},
diff --git a/gcc/testsuite/gcc.target/riscv/arch-53.c b/gcc/testsuite/gcc.target/riscv/arch-53.c
index 8210978ee8b..43ab23aee4d 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-53.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-53.c
@@ -8,4 +8,4 @@ void foo(){}
 _ziccrse1p0_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0"
 _za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcd1p0_zcmop1p0"
 _zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0"
-_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0\"" } } */
+_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0_supm1p0\"" } } */
-- 
2.43.0
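As a rough illustration of what the changed profile strings encode, the sketch below checks whether an underscore-separated ISA string names a given multi-letter extension. It is not how riscv-common.cc parses these strings; the helper name has_multi_letter_ext is hypothetical, and the strings in the usage note are shortened versions of the ones in the patch. Single-letter extensions such as b live in the first token and are not handled here.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if ARCH (e.g. "rv64imafdcbv_zicsr_supm") contains the
   multi-letter extension EXT as a complete '_'-separated token after
   the base ISA token, otherwise 0.  */
int
has_multi_letter_ext (const char *arch, const char *ext)
{
  size_t len = strlen (ext);
  const char *p = strchr (arch, '_');

  while (p != NULL)
    {
      p++;  /* Skip the '_' separator.  */
      const char *end = strchr (p, '_');
      size_t tok = end ? (size_t) (end - p) : strlen (p);
      if (tok == len && strncmp (p, ext, len) == 0)
        return 1;
      p = end;
    }
  return 0;
}
```

With the shortened strings "rv64imafdcbv_zicsr_zfa_zawrs_supm" and "rv64imafdcb_zicsr_zfa_zawrs", a query for "supm" succeeds only for the first, mirroring the fact that the patch adds supm to RVA23 but not to RVB23.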
Re: [PATCH, part1, v2] Fortran: various fixes for STAT/LSTAT/FSTAT intrinsics [PR82480]
Harald,

I did a quick glance at the patch and did not see anything that
jumped out as needing a change. OK to commit.

Earlier today I came to the same conclusion that -1 on overflow is
probably the right thing to do. Gfortran would need a way to supply
the value of ERANGE (on all supported targets) so a user can write a
test. Yes, POSIX seems to define ERANGE as 34, but is that guaranteed
on non-POSIX targets?

-- 
steve

On Sun, Jun 15, 2025 at 09:01:37PM +0200, Harald Anlauf wrote:
>
> here's a modification that returns -1 for those components of stat
> that would overflow assignment to integer(kind=4), and does not
> return ERANGE as in v1 of this patch. There is no need to modify
> the existing testcases stat_{1,2}.f90.
>
> Cheers,
> Harald
>
> On 12.06.25 at 22:12, Harald Anlauf wrote:
> > Hi Steve,
> >
> > On 6/11/25 23:06, Steve Kargl wrote:
> > > On Wed, Jun 11, 2025 at 10:18:37PM +0200, Harald Anlauf wrote:
> > > > - for the INTEGER(KIND=4) versions the STATUS returns ERANGE if
> > > >   an overflow is encountered.
> > > >
> > > > The latter is certainly debatable, as one of the existing testcases
> > > > stat_{1,2}.f90 may fail on systems where e.g. an inode number larger
> > > > than INT32_MAX may occur. Options are to drop the overflow check, or
> > > > to run those testcases with additional option -fdefault-integer-8.
> > > >
> > > > Opinions?
> > > >
> >
> > another option is:
> >
> > - return -1 for components which overflow, and not return ERANGE,
> >   thus to leave it up to the user to handle this
> >
> > It is arguably not an error generated by stat(3), but by the
> > interface to Fortran in the runtime.
> >
> > >
> > > Thanks for doing these types of cleanups.
> > >
> > > You may want to take a peek at
> > >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30372
> > >
> > > where I posted a few cleanups for SLEEP, UMASK, UNLINK,
> > > etc. In those cleanups, I would cast arguments to
> > > integer(4) if I could (i.e., if I know the arg was in range)
> > > to prevent an explosion in the size of libgfortran.
> >
> > I do not plan to implement any new library versions. The
> > *_i4 and *_i8 versions are already available. All integer
> > arguments should be kind=4 or 8, and needed conversions
> > can be done using scalar temporaries.
> >
> > > I'll need to think about your -fdefault-integer-8 question
> > > for a bit. Because that option exists and can change
> > > default integer kind, we'll need *_i4 and *_i8 versions of
> > > the functions in libgfortran. I suspect we'll need to
> > > run the testcases with -fdefault-integer-8.
> >
> > This depends on the way we handle overflow. The variant
> > above would not need this option.
> >
> > > If no one approves your patch by Saturday, I'll review.
> >
> > Any helpful feedback is greatly appreciated.
> >
> > Thanks
> > Harald
> >
>
> From aa79324885ba44b64911ec7a75375b28a2223cf7 Mon Sep 17 00:00:00 2001
> From: Harald Anlauf
> Date: Sun, 15 Jun 2025 20:47:13 +0200
> Subject: [PATCH] Fortran: various fixes for STAT/LSTAT/FSTAT intrinsics
>  [PR82480]
>
> The GNU intrinsics STAT/LSTAT/FSTAT were inherited from g77, but changed
> the names of some keywords: FILE became NAME, and SARRAY became VALUES,
> which are the keywords documented in the gfortran manual.
> Adjust code and libgfortran error messages to reflect this change.
> Furthermore, add compile-time checking that INTENT(OUT) arguments are
> definable, and that array VALUES has at least size 13.
>
> 	PR fortran/82480
>
> gcc/fortran/ChangeLog:
>
> 	* check.cc (gfc_check_fstat): Extend checks to INTENT(OUT) arguments.
> 	(gfc_check_fstat_sub): Likewise.
> 	(gfc_check_stat): Likewise.
> 	(gfc_check_stat_sub): Likewise.
>
> libgfortran/ChangeLog:
>
> 	* intrinsics/stat.c (stat_i4_sub_0): Fix argument names. Rename
> 	SARRAY to VALUES also in error message. When array VALUES is
> 	KIND=4, get only stat components that do not overflow INT32_MAX,
> 	otherwise set the corresponding VALUES elements to -1.
> 	(stat_i4_sub): Fix argument names.
> 	(lstat_i4_sub): Likewise.
> 	(stat_i8_sub_0): Likewise.
> 	(stat_i8_sub): Likewise.
> 	(lstat_i8_sub): Likewise.
> 	(stat_i4): Likewise.
> 	(stat_i8): Likewise.
> 	(lstat_i4): Likewise.
> 	(lstat_i8): Likewise.
> 	(fstat_i4_sub): Likewise.
> 	(fstat_i8_sub): Likewise.
> 	(fstat_i4): Likewise.
> 	(fstat_i8): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gfortran.dg/stat_3.f90: New test.
> ---
>  gcc/fortran/check.cc                 |  61 +++---
>  gcc/testsuite/gfortran.dg/stat_3.f90 |  46 +
>  libgfortran/intrinsics/stat.c        | 274 +++
>  3 files changed, 226 insertions(+), 155 deletions(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/stat_3.f90
>
> diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
> index c8904df3b21..838d523f7c4 10
Re: [PATCH] xtensa: Revert "xtensa: Eliminate unnecessary general-purpose reg-reg moves"
On Sat, Jun 14, 2025 at 4:31 AM Takayuki 'January June' Suwa wrote:
>
> Due to improved register allocation for GP registers whose modes have been
> changed by paradoxical SUBREGs, the previously committed patch
> "xtensa: eliminate unnecessary general-purpose reg-reg moves"
> (commit f83e76c3f998c8708fe2ddca16ae3f317c39c37a) is no longer necessary
> and is therefore reverted.
>
> gcc/ChangeLog:
>
> 	* config/xtensa/xtensa.md:
> 	Remove the peephole2 pattern that was previously added.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/xtensa/elim_GP_regmove_0.c: Remove.
> 	* gcc.target/xtensa/elim_GP_regmove_1.c: Remove.
> ---
>  gcc/config/xtensa/xtensa.md                   | 46 ---
>  .../gcc.target/xtensa/elim_GP_regmove_0.c     | 23 --
>  .../gcc.target/xtensa/elim_GP_regmove_1.c     | 10 
>  3 files changed, 79 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/xtensa/elim_GP_regmove_0.c
>  delete mode 100644 gcc/testsuite/gcc.target/xtensa/elim_GP_regmove_1.c

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max
Re: [PATCH] xtensa: Revert "xtensa: Eliminate unwanted reg-reg moves during DFmode input reloads"
On Sun, Jun 15, 2025 at 2:00 AM Takayuki 'January June' Suwa wrote:
>
> Since there are no unwanted reg-reg moves during DFmode input reloads in
> recent GCCs, the previously committed patch
> "xtensa: eliminate unwanted reg-reg moves during DFmode input reloads"
> (commit cfad4856fa46abc878934a9433d0bfc2482ccf00) is no longer necessary
> and is therefore being reverted.
>
> gcc/ChangeLog:
>
> 	* config/xtensa/predicates.md (reload_operand):
> 	Remove.
> 	* config/xtensa/xtensa.md:
> 	Remove the peephole2 pattern that was previously added.
> ---
>  gcc/config/xtensa/predicates.md | 13 -
>  gcc/config/xtensa/xtensa.md     | 30 --
>  2 files changed, 43 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max
Re: [PATCH v3] simplify-rtx.cc:Simplify XOR(AND(ROTATE(~1) A) ASHIFT(1 A)) to IOR.
Committed to trunk, thanks.

Jiawei

On 2025/6/13 21:02, Richard Sandiford wrote:
> Jiawei writes:
> > This patch adds a new simplification rule to `simplify-rtx.cc` that
> > handles a common bit manipulation pattern involving a single-bit set
> > and clear followed by XOR.
> >
> > The transformation targets RTL of the form:
> >
> >   (xor (and (rotate (~1) A) B) (ashift 1 A))
> >
> > which is semantically equivalent to:
> >
> >   B | (1 << A)
> >
> > - v3 log:
> >   Update RTL format, remove commas.
> >   Only apply on SHIFT_COUNT_TRUNCATED target.
> >   check '!side_effects_p' on XEXP (op1, 1).
> >
> > gcc/ChangeLog:
> >
> > 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
> > 	Handle more logical simplifications.
> >
> > ---
> >  gcc/simplify-rtx.cc | 14 ++
> >  1 file changed, 14 insertions(+)
>
> OK, thanks.
>
> Richard
>
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index b34fd2f4b9e..cbe61b49bf6 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -4063,6 +4063,20 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >            && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
> >          return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
> >
> > +      /* Convert (xor (and (rotate (~1) A) B) (ashift 1 A))
> > +         into B | (1 << A).  */
> > +      if (SHIFT_COUNT_TRUNCATED
> > +          && GET_CODE (op0) == AND
> > +          && GET_CODE (XEXP (op0, 0)) == ROTATE
> > +          && CONST_INT_P (XEXP (XEXP (op0, 0), 0))
> > +          && INTVAL (XEXP (XEXP (op0, 0), 0)) == -2
> > +          && GET_CODE (op1) == ASHIFT
> > +          && CONST_INT_P (XEXP (op1, 0))
> > +          && INTVAL (XEXP (op1, 0)) == 1
> > +          && rtx_equal_p (XEXP (XEXP (op0, 0), 1), XEXP (op1, 1))
> > +          && !side_effects_p (XEXP (op1, 1)))
> > +        return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
> > +
> >        tem = simplify_with_subreg_not (code, mode, op0, op1);
> >        if (tem)
> >          return tem;
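The equivalence claimed in the patch description can be checked at the C level with a small sketch. This is not GCC code: the function names are made up, the width is fixed at 32 bits, and the shift count is masked to 0..31 to model a target where SHIFT_COUNT_TRUNCATED holds. Rotating ~1 left by A gives a mask with only bit A clear, so the AND clears bit A of B and the XOR with 1 << A then sets it, which is the same as B | (1 << A).

```c
#include <stdint.h>

/* Rotate X left by N bits, with N taken modulo 32.  */
static uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* The source form: (xor (and (rotate (~1) A) B) (ashift 1 A)).  */
uint32_t
xor_and_rotate_form (uint32_t b, unsigned int a)
{
  a &= 31;  /* Model a truncated shift count.  */
  return (rotl32 (~UINT32_C (1), a) & b) ^ (UINT32_C (1) << a);
}

/* The simplified form: B | (1 << A).  */
uint32_t
ior_form (uint32_t b, unsigned int a)
{
  a &= 31;
  return b | (UINT32_C (1) << a);
}

/* Return 1 if both forms agree for B and every shift count 0..31.  */
int
forms_agree (uint32_t b)
{
  for (unsigned int a = 0; a < 32; a++)
    if (xor_and_rotate_form (b, a) != ior_form (b, a))
      return 0;
  return 1;
}
```

Calling forms_agree on a handful of bit patterns (all zeros, all ones, a mixed value) is a quick sanity check of the identity; it does not replace the RTL-level conditions in the patch, such as the !side_effects_p test.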
RE: [r14-11845 Regression] FAIL: c-c++-common/tsan/tls_race.c -O2 output pattern test on Linux/x86_64
> From: haochen.jiang
> Sent: Monday, June 16, 2025 11:42 AM
> To: a...@gjlay.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org;
> Jiang, Haochen
>
> On Linux/x86_64,
>
> ddf8b0e06f27667b689dbd970d6c4ab0f088d671 is the first bad commit
> commit ddf8b0e06f27667b689dbd970d6c4ab0f088d671
> Author: Georg-Johann Lay
> Date:   Thu Jun 12 10:07:37 2025 +0200
>
>     Fix test case for PR117811 which failed for int < 32 bit.
>
> caused
>
> FAIL: c-c++-common/tsan/tls_race.c   -O0  output pattern test
> FAIL: c-c++-common/tsan/tls_race.c   -O2  output pattern test
>

Please disregard this. I will shut down the GCC 14 bisect for a while to
see why it is still happening.

Thx,
Haochen
[PATCH, part1, v2] Fortran: various fixes for STAT/LSTAT/FSTAT intrinsics [PR82480]
Dear all,

here's a modification that returns -1 for those components of stat
that would overflow assignment to integer(kind=4), and does not
return ERANGE as in v1 of this patch. There is no need to modify
the existing testcases stat_{1,2}.f90.

Cheers,
Harald

On 12.06.25 at 22:12, Harald Anlauf wrote:
> Hi Steve,
>
> On 6/11/25 23:06, Steve Kargl wrote:
> > On Wed, Jun 11, 2025 at 10:18:37PM +0200, Harald Anlauf wrote:
> > > - for the INTEGER(KIND=4) versions the STATUS returns ERANGE if
> > >   an overflow is encountered.
> > >
> > > The latter is certainly debatable, as one of the existing testcases
> > > stat_{1,2}.f90 may fail on systems where e.g. an inode number larger
> > > than INT32_MAX may occur. Options are to drop the overflow check, or
> > > to run those testcases with additional option -fdefault-integer-8.
> > >
> > > Opinions?
>
> another option is:
>
> - return -1 for components which overflow, and not return ERANGE,
>   thus to leave it up to the user to handle this
>
> It is arguably not an error generated by stat(3), but by the
> interface to Fortran in the runtime.
>
> > Thanks for doing these types of cleanups.
> >
> > You may want to take a peek at
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30372
> >
> > where I posted a few cleanups for SLEEP, UMASK, UNLINK,
> > etc. In those cleanups, I would cast arguments to
> > integer(4) if I could (i.e., if I know the arg was in range)
> > to prevent an explosion in the size of libgfortran.
>
> I do not plan to implement any new library versions. The
> *_i4 and *_i8 versions are already available. All integer
> arguments should be kind=4 or 8, and needed conversions
> can be done using scalar temporaries.
>
> > I'll need to think about your -fdefault-integer-8 question
> > for a bit. Because that option exists and can change
> > default integer kind, we'll need *_i4 and *_i8 versions of
> > the functions in libgfortran. I suspect we'll need to
> > run the testcases with -fdefault-integer-8.
>
> This depends on the way we handle overflow. The variant
> above would not need this option.
>
> > If no one approves your patch by Saturday, I'll review.
>
> Any helpful feedback is greatly appreciated.
>
> Thanks
> Harald

From aa79324885ba44b64911ec7a75375b28a2223cf7 Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Sun, 15 Jun 2025 20:47:13 +0200
Subject: [PATCH] Fortran: various fixes for STAT/LSTAT/FSTAT intrinsics
 [PR82480]

The GNU intrinsics STAT/LSTAT/FSTAT were inherited from g77, but changed
the names of some keywords: FILE became NAME, and SARRAY became VALUES,
which are the keywords documented in the gfortran manual.
Adjust code and libgfortran error messages to reflect this change.
Furthermore, add compile-time checking that INTENT(OUT) arguments are
definable, and that array VALUES has at least size 13.

	PR fortran/82480

gcc/fortran/ChangeLog:

	* check.cc (gfc_check_fstat): Extend checks to INTENT(OUT) arguments.
	(gfc_check_fstat_sub): Likewise.
	(gfc_check_stat): Likewise.
	(gfc_check_stat_sub): Likewise.

libgfortran/ChangeLog:

	* intrinsics/stat.c (stat_i4_sub_0): Fix argument names. Rename
	SARRAY to VALUES also in error message. When array VALUES is
	KIND=4, get only stat components that do not overflow INT32_MAX,
	otherwise set the corresponding VALUES elements to -1.
	(stat_i4_sub): Fix argument names.
	(lstat_i4_sub): Likewise.
	(stat_i8_sub_0): Likewise.
	(stat_i8_sub): Likewise.
	(lstat_i8_sub): Likewise.
	(stat_i4): Likewise.
	(stat_i8): Likewise.
	(lstat_i4): Likewise.
	(lstat_i8): Likewise.
	(fstat_i4_sub): Likewise.
	(fstat_i8_sub): Likewise.
	(fstat_i4): Likewise.
	(fstat_i8): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/stat_3.f90: New test.
---
 gcc/fortran/check.cc                 |  61 +++---
 gcc/testsuite/gfortran.dg/stat_3.f90 |  46 +
 libgfortran/intrinsics/stat.c        | 274 +++
 3 files changed, 226 insertions(+), 155 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/stat_3.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index c8904df3b21..838d523f7c4 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -6507,7 +6507,7 @@ gfc_check_fseek_sub (gfc_expr *unit, gfc_expr *offset, gfc_expr *whence, gfc_exp
 
 
 bool
-gfc_check_fstat (gfc_expr *unit, gfc_expr *array)
+gfc_check_fstat (gfc_expr *unit, gfc_expr *values)
 {
   if (!type_check (unit, 0, BT_INTEGER))
     return false;
@@ -6515,11 +6515,17 @@ gfc_check_fstat (gfc_expr *unit, gfc_expr *array)
   if (!scalar_check (unit, 0))
     return false;
 
-  if (!type_check (array, 1, BT_INTEGER)
+  if (!type_check (values, 1, BT_INTEGER)
       || !kind_value_check (unit, 0, gfc_default_integer_kind))
     return false;
 
-  if (!array_check (array, 1))
+  if (!array_check (values, 1))
+    return false;
+
+  if (!variable_check (values, 1, false))
+    return false;
+
+  if (!array_size_check (values, 1, 13))
     return false;
 
   return true;
@@ -6527,19 +6533,9 @@ gfc_check_fstat (gfc_expr *unit, gfc_expr *array)
 
 
 bool
-gfc_check_fstat_sub (gfc_expr *unit, gfc_expr *array, gfc_expr *status)
+gfc_ch
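The overflow handling described in the message (store -1 in a KIND=4 VALUES element when the stat component does not fit) could look roughly like the following C sketch. This is an assumption-laden illustration and not the code from libgfortran/intrinsics/stat.c: the helper name stat_component_to_i4 is hypothetical, and treating values below INT32_MIN as overflow as well is an extra assumption, since the ChangeLog only mentions INT32_MAX.

```c
#include <stdint.h>

/* Convert a 64-bit stat component to a 32-bit element.  Returns the
   value unchanged if it fits, and -1 if it would overflow, leaving it
   to the caller to notice the -1.  */
int32_t
stat_component_to_i4 (int64_t value)
{
  if (value > INT32_MAX || value < INT32_MIN)
    return -1;
  return (int32_t) value;
}
```

With this scheme an inode number just above INT32_MAX yields -1 rather than a truncated value, and no error status such as ERANGE is involved.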
[PATCH] Fortran: fix checking of MOLD= in ALLOCATE statements [PR51961]
Dear all,

the attached patch fixes a rejects-valid: in an ALLOCATE statement
with MOLD= present, if the allocate-object has an explicit-shape-spec,
the compatibility of ranks is not required by the standard.
(It is explicitly required only for SOURCE=).

Since this could surprise users, we emit a warning if -Wsurprising
is specified (contained in -Wall). This agrees with NAG's behavior.

Testcase cross-checked with ifx and NAG.

Regtested on x86_64-pc-linux-gnu. OK for mainline / 15-branch?

Thanks,
Harald

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index d09aef0a899..25d12768e97 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -8740,8 +8767,25 @@ static bool
 conformable_arrays (gfc_expr *e1, gfc_expr *e2)
 {
   gfc_ref *tail;
+  bool scalar;
+
   for (tail = e2->ref; tail && tail->next; tail = tail->next);
 
+  /* If MOLD= is present and is not scalar, and the allocate-object has an
+     explicit-shape-spec, the ranks need not agree.  Let's emit a warning if
+     -pedantic is given.  */
+  scalar = !tail || tail->type == REF_COMPONENT;
+  if (e1->mold && e1->rank > 0
+      && (scalar || (tail->type == REF_ARRAY && tail->u.ar.type != AR_FULL)))
+    {
+      if (scalar || (tail->u.ar.as && e1->rank != tail->u.ar.as->rank))
+        gfc_warning (OPT_Wpedantic, "Allocate-object at %L has rank %d "
+                     "but MOLD= expression at %L has rank %d",
+                     &e2->where, scalar ? 0 : tail->u.ar.as->rank,
+                     &e1->where, e1->rank);
+      return true;
+    }
+
   /* First compare rank.  */
   if ((tail && (!tail->u.ar.as || e1->rank != tail->u.ar.as->rank))
       || (!tail && e1->rank != e2->rank))
Re: [PATCH] Fortran: fix checking of MOLD= in ALLOCATE statements [PR51961]
On 15.06.25 at 21:25, Harald Anlauf wrote:
> Dear all,
>
> the attached patch fixes a rejects-valid: in an ALLOCATE statement with
> MOLD= present, if the allocate-object has an explicit-shape-spec, the
> compatibility of ranks is not required by the standard.  (It is explicitly
> required only for SOURCE=).

Oops, I attached the wrong patch.  Fixed now...

> Since this could surprise users, we emit a warning if -Wsurprising is
> specified (contained in -Wall).  This agrees with NAG's behavior.
>
> Testcase cross-checked with ifx and NAG.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline / 15-branch?
>
> Thanks,
> Harald

Harald

From 7194cdde73ed2b2c6ad6bc1a200a9f508c9659fa Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Sun, 15 Jun 2025 21:09:28 +0200
Subject: [PATCH] Fortran: fix checking of MOLD= in ALLOCATE statements [PR51961]

In ALLOCATE statements where the MOLD= argument is present and is not
scalar, and the allocate-object has an explicit-shape-spec, the standard
does not require the ranks to agree.  In that case we skip the rank check,
but emit a warning if -Wsurprising is given.

        PR fortran/51961

gcc/fortran/ChangeLog:

        * resolve.cc (conformable_arrays): Use modified rank check when
        MOLD= expression is given.

gcc/testsuite/ChangeLog:

        * gfortran.dg/allocate_with_mold_5.f90: New test.
---
 gcc/fortran/resolve.cc                        | 17 +++
 .../gfortran.dg/allocate_with_mold_5.f90      | 51 +++
 2 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/allocate_with_mold_5.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index d09aef0a899..5413d8f9c54 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -8740,8 +8740,25 @@ static bool
 conformable_arrays (gfc_expr *e1, gfc_expr *e2)
 {
   gfc_ref *tail;
+  bool scalar;
+
   for (tail = e2->ref; tail && tail->next; tail = tail->next);
 
+  /* If MOLD= is present and is not scalar, and the allocate-object has an
+     explicit-shape-spec, the ranks need not agree.  This may be unintended,
+     so let's emit a warning if -Wsurprising is given.  */
+  scalar = !tail || tail->type == REF_COMPONENT;
+  if (e1->mold && e1->rank > 0
+      && (scalar || (tail->type == REF_ARRAY && tail->u.ar.type != AR_FULL)))
+    {
+      if (scalar || (tail->u.ar.as && e1->rank != tail->u.ar.as->rank))
+        gfc_warning (OPT_Wsurprising, "Allocate-object at %L has rank %d "
+                     "but MOLD= expression at %L has rank %d",
+                     &e2->where, scalar ? 0 : tail->u.ar.as->rank,
+                     &e1->where, e1->rank);
+      return true;
+    }
+
   /* First compare rank.  */
   if ((tail && (!tail->u.ar.as || e1->rank != tail->u.ar.as->rank))
       || (!tail && e1->rank != e2->rank))
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_mold_5.f90 b/gcc/testsuite/gfortran.dg/allocate_with_mold_5.f90
new file mode 100644
index 000..f5e2fc93d0a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocate_with_mold_5.f90
@@ -0,0 +1,51 @@
+! { dg-do compile }
+! { dg-additional-options "-Wsurprising" }
+!
+! PR fortran/51961 - fix checking of MOLD= in ALLOCATE statements
+!
+! Contributed by Tobias Burnus
+
+program p
+  implicit none
+  type t
+  end type t
+  type u
+     class(t), allocatable :: a(:), b(:,:), c
+  end type u
+  class(T), allocatable :: a(:), b(:,:), c
+  type(u) :: z
+
+  allocate (b(2,2))
+  allocate (z% b(2,2))
+
+  allocate (a(2), mold=b(:,1))
+  allocate (a(1:2), mold=b(1,:))
+  allocate (a(2), mold=b) ! { dg-warning "but MOLD= expression at" }
+  allocate (a(1:2), mold=b) ! { dg-warning "but MOLD= expression at" }
+  allocate (z% a(2), mold=b(:,1))
+  allocate (z% a(1:2), mold=b(1,:))
+  allocate (z% a(2), mold=b) ! { dg-warning "but MOLD= expression at" }
+  allocate (z% a(1:2), mold=b) ! { dg-warning "but MOLD= expression at" }
+  allocate (z% a(2), mold=z% b(:,1))
+  allocate (z% a(1:2), mold=z% b(1,:))
+  allocate (z% a(2), mold=z% b) ! { dg-warning "but MOLD= expression at" }
+  allocate (z% a(1:2), mold=z% b) ! { dg-warning "but MOLD= expression at" }
+
+  allocate (c, mold=b(1,1))
+  allocate (c, mold=b) ! { dg-warning "but MOLD= expression at" }
+  allocate (z% c, mold=b(1,1))
+  allocate (z% c, mold=b) ! { dg-warning "but MOLD= expression at" }
+  allocate (z% c, mold=z% b(1,1))
+  allocate (z% c, mold=z% b) ! { dg-warning "but MOLD= expression at" }
+
+  allocate (a, mold=b(:,1))
+  allocate (a, mold=b(1,:))
+  allocate (z% a, mold=b(:,1))
+  allocate (z% a, mold=b(1,:))
+  allocate (z% a, mold=z% b(:,1))
+  allocate (z% a, mold=z% b(1,:))
+
+  allocate (a, mold=b) ! { dg-error "or have the same rank" }
+  allocate (z% a, mold=b) ! { dg-error "or have the same rank" }
+  allocate (z% a, mold=z% b) ! { dg-error "or have the same rank" }
+end
-- 
2.43.0
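
For readers skimming the thread, the three outcomes exercised by the testcase above (accepted silently, accepted with a -Wsurprising warning, rejected with the "same rank" error) can be summarized in a small standalone C sketch. This is not the code in resolve.cc: the names check_mold_rank, object_kind and check_result are hypothetical and introduced only for illustration, and the sketch assumes a non-scalar MOLD= expression, since that is the only case the patch's new branch handles.

```c
/* Hypothetical, simplified model of the rank check; none of these
   names exist in GCC.  */
enum object_kind
{
  OBJ_SCALAR,          /* e.g. "c" or "z% c" */
  OBJ_EXPLICIT_SHAPE,  /* e.g. "a(2)" or "z% a(1:2)" */
  OBJ_WHOLE_ARRAY      /* e.g. "a" or "z% a", shape taken from MOLD= */
};

enum check_result
{
  CHECK_OK,          /* accepted without a diagnostic */
  CHECK_WARN,        /* accepted, warning under -Wsurprising */
  CHECK_ERROR,       /* rejected: ranks must agree */
  CHECK_NOT_MODELED  /* scalar MOLD=: outside the scope of this sketch */
};

enum check_result
check_mold_rank (int mold_rank, enum object_kind kind, int object_rank)
{
  /* The new branch in the patch only applies to a non-scalar MOLD=.  */
  if (mold_rank <= 0)
    return CHECK_NOT_MODELED;

  /* Scalar allocate-object with an array MOLD=: accepted, but warned.  */
  if (kind == OBJ_SCALAR)
    return CHECK_WARN;

  /* Explicit-shape-spec: ranks need not agree; warn when they differ.  */
  if (kind == OBJ_EXPLICIT_SHAPE)
    return object_rank == mold_rank ? CHECK_OK : CHECK_WARN;

  /* No shape-spec: the shape comes from MOLD=, so ranks must agree.  */
  return object_rank == mold_rank ? CHECK_OK : CHECK_ERROR;
}
```

Under these assumptions, check_mold_rank (2, OBJ_EXPLICIT_SHAPE, 1) corresponds to "allocate (a(2), mold=b)" and yields CHECK_WARN, while check_mold_rank (2, OBJ_WHOLE_ARRAY, 1) corresponds to "allocate (a, mold=b)" and yields CHECK_ERROR.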
[r14-11845 Regression] FAIL: c-c++-common/tsan/tls_race.c -O2 output pattern test on Linux/x86_64
On Linux/x86_64,

ddf8b0e06f27667b689dbd970d6c4ab0f088d671 is the first bad commit
commit ddf8b0e06f27667b689dbd970d6c4ab0f088d671
Author: Georg-Johann Lay
Date:   Thu Jun 12 10:07:37 2025 +0200

    Fix test case for PR117811 which failed for int < 32 bit.

caused

FAIL: c-c++-common/tsan/tls_race.c -O0 output pattern test
FAIL: c-c++-common/tsan/tls_race.c -O2 output pattern test

with GCC configured with

../../gcc/configure --prefix=/export/users3/haochenj/src/gcc-bisect/gcc-14/releases/gcc-14/r14-11845/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tsan.exp=c-c++-common/tsan/tls_race.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command line might help.)
(However, please make sure that there are no potential problems with AVX512.)