Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
Hi Christophe, On 2025-01-27 13:07, Christophe Lyon wrote: Hi Torbjorn, On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON wrote: Another ping... :) Kind regards, Torbjörn On 2024-12-18 18:35, Torbjorn SVENSSON wrote: Gentle ping :) Kind regards, Torbjörn On 2024-11-14 17:16, Torbjorn SVENSSON wrote: On 2024-11-14 16:32, Christophe Lyon wrote: On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- A long time ago, this test forced -march=armv6. With -marm, the generated assembler is: foo: sub r0, r0, #48 cmp r0, #9 movhi r0, #0 movls r0, #1 bx lr With -mthumb, the generated assembler is: foo: subsr0, r0, #48 movsr2, #9 uxtbr3, r0 movsr0, #0 cmp r2, r3 adcsr0, r0, r0 uxtbr0, r0 bx lr Require effective-target arm_arm_ok to skip the test for thumb-only targets (Cortex-M). gcc/testsuite/ChangeLog: * gcc.target/arm/unsigned-extend-1.c: Use effective-target arm_arm_ok. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ testsuite/gcc.target/arm/unsigned-extend-1.c index 3b4ab048fb0..73f2e1a556d 100644 --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ /* { dg-options "-O2" } */ So I'm guessing arm_arm_ok fails when you are testing for M-profile targets? What happens when you test for A-profile, providing -mthumb as part of your runtest flags? Looking at our CI results, without your patch the test passes on: cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode, and it fails only on cortex-m0 and m23. Does this match your observations? Without my patch, I see failure on M0, M0+ and M23. With my patch, it goes to unsupported on all Cortex-M targets that I test. So your patch will make the test unsupported on m33, m3, m55 and m7 while it currently passes? Am I missing something? I think you might have missed my comment down Thanks, Christophe For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft): Testing arm/unsigned-extend-1.c check_compile tool: gcc for arm_arm_ok doing compile Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s (timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output - marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s pid is 21254 -21254 cc1: error: target CPU does not support ARM mode pid is -1 close result is 21254 exp6 0 1 output is cc1: error: target CPU does not support ARM mode status 1 compiler exited with status 1 UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft): Testing arm/unsigned-extend-1.c check_compile tool: gcc for arm_arm_ok doing compile Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s (timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output - marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s pid is 21434 -21434 pid is -1 output is status 0 doing compile Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/ gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno- ident -S -o unsigned-extend-1.s(timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc .../gcc/testsuite/ gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno-ident - S -o unsigned-extend-1.s pid is 21438 -21438 pid is -1 output is status 0 check_compile tool: gcc for exceptions_enabled doing compile Executing on host: .../bin/arm-none-eabi-gcc exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat- abi=soft -fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o exceptions_enabled21296.s(timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat-abi=soft - fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o exceptions_enabled21296.s pid is 21442 -21442 pid is -1 output is status 0 PASS: gcc.target/arm/unsigned-extend-1.c (test for excess errors) PASS: gcc.target/arm/unsigned-extend-1.c scan-assembler-not uxtb For Cortex-A7 (cpu=cortex-a7/float
[PATCH][v2] rtl-optimization/118662 - wrong combination of vector sign-extends
The following fixes an issue in the RTL combiner where we correctly combine two vector sign-extends with a vector load Trying 7, 9 -> 10: 7: r106:V4QI=[r119:DI] REG_DEAD r119:DI 9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel)) 10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel)) REG_DEAD r108:V4HI to modifying insn i2 9: r109:V4SI=sign_extend([r119:DI]) but since r106 is used we wrongly materialize it using a subreg: modifying insn i310: r106:V4QI=r109:V4SI#0 which of course does not work for modes with more than one component, specifically vector and complex modes. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR rtl-optimization/118662 * combine.cc (try_combine): When re-materializing a load from an extended reg by a lowpart subreg make sure we're not dealing with vector or complex modes. * gcc.dg/torture/pr118662.c: New testcase. --- gcc/combine.cc | 5 + gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++ 2 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c diff --git a/gcc/combine.cc b/gcc/combine.cc index a2d4387cebe..b0159b23d86 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, copy. This saves at least one insn, more if register allocation can eliminate the copy. + We cannot do this if the involved modes have more than one elements, + like for vector or complex modes. + We cannot do this if the destination of the first assignment is a condition code register. We eliminate this case by making sure the SET_DEST and SET_SRC have the same mode. @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0))) == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0 + && ! VECTOR_MODE_P (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0 + && ! COMPLEX_MODE_P (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0 && GET_CODE (XVECEXP (newpat, 0, 1)) == SET && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)), XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0)) diff --git a/gcc/testsuite/gcc.dg/torture/pr118662.c b/gcc/testsuite/gcc.dg/torture/pr118662.c new file mode 100644 index 000..b9e8cca0aeb --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr118662.c @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */ +/* { dg-additional-options "-msse4" { target sse4_runtime} } */ + +int __attribute__((noipa)) addup(signed char *num) { + int val = num[0] + num[1] + num[2] + num[3]; + if (num[3] >= 0) +val++; + return val; +} + +int main(int, char *[]) +{ + signed char num[4] = {1, 1, 1, -1}; + if (addup(num) != 2) +__builtin_abort(); + return 0; +} -- 2.43.0
[PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]
Hi Jason, On 24 Jan 2025, at 16:51, Jason Merrill wrote: > On 1/24/25 6:19 AM, Simon Martin wrote: >> Hi Jason, >> >> On 23 Jan 2025, at 22:56, Jason Merrill wrote: >> >>> On 1/23/25 12:06 PM, Simon Martin wrote: Hi Marek, On 23 Jan 2025, at 16:45, Marek Polacek wrote: > On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote: >> Hi Jason, >> >> On 20 Jan 2025, at 22:49, Jason Merrill wrote: >> >>> On 1/20/25 2:11 PM, Simon Martin wrote: Hi Jason, On 15 Jan 2025, at 22:42, Jason Merrill wrote: > On 12/12/24 2:07 PM, Simon Martin wrote: >> We currently ICE upon the following valid (under -Wno-vla) >> code >> >> >> === cut here === >> void f(int c) { >> constexpr int r = 4; >> [&](auto) { int t[r * c]; }(0); >> } >> === cut here === >> >> The problem is that when parsing the lambda body, and more >> specifically >> the multiplication, we mark the lambda as >> LAMBDA_EXPR_CAPTURE_OPTIMIZED >> even though the replacement of r by 4 is "undone" by the call >> to >> build_min_non_dep in build_x_binary_op. This makes >> prune_lambda_captures >> remove the proxy declaration while it should not, and we trip >> on >> an >> >> assert at instantiation time. > > Why does prune_lambda_captures remove the declaration if it's > still > used in the function body? Setting > LAMBDA_EXPR_CAPTURE_OPTIMIZED > just > means "we might have optimized away some captures", the tree > walk > should have found the use put back by build_x_binary_op. I think that this is due to a bug in mark_const_cap_r, that’s been here since the beginning, i.e. r8-7213-g1577f10a637352: it decides >> NOT to walk sub-trees when encountering a DECL_EXPR for a constant capture proxy (at lambda.cc:1769). I don’t understand why we wouldn’t want to continue. >>> >>> Because that's the declaration of the capture proxy, not a use >>> of >>> it. >> Indeed, thanks for clarifying. >> >>> Why aren't we finding the use in the declaration of t? >> After further investigation, the problem is rather that neither >> walk_tree_1 nor cp_walk_subtrees walk the dimensions of array >> types, >> so >> we miss the uses. >> Removing that line fixes the PR, but breaks 3 existing tests (lambda-capture1.C, lambda-const11.C and lambda-const11a.C, that all assert that we optimise out the capture); that’s why I did not pursue it in the first place. >>> >>> Right. >>> But taking another look, it might not be that big a deal that we don’t optimise those out: as soon as we use -O1 or above, the assignment to >> the closure field actually disappears. >>> >>> Completely breaking this optimization is a big deal, >>> particularly >>> since it affects the layout of closure objects. We can't always >>> optimize everything away. >> ACK. >> >> The attached updated patch makes cp_walk_subtrees walk array type >> dimensions, which fixes the initial PR without those 3 >> regressions. >> >> >> Successfully tested on x86_64-pc-linux-gnu. Is it OK? >> >> Simon >> >> PS: In theory I think it’d make sense to do do this change in >> >> walk_tree_1 since C also supports VLAs, but doing so breaks some >> >> OMP >> tests. OMP can do interesting things with array bounds (see >> r9-5354-gda972c05f48637), and fixing those would require teaching >> walk_tree_1 about the “omp dummy var” array bounds, which I >> think >> would be a bit ugly. And I’m not aware of any C case that would >> be >> improved/fixed by doing this change, so we’re probably fine not >> doing >> it. > >> From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17 >> 00:00:00 >> 2001 >> From: Simon Martin >> Date: Wed, 22 Jan 2025 16:19:47 +0100 >> Subject: [PATCH] c++: Don't prune constant capture proxies only >> used >> in array >>dimensions [PR114292] >> >> We currently ICE upon the following valid (under -Wno-vla) code >> >> === cut here === >> void f(int c) { >> constexpr int r = 4; >> [&](auto) { int t[r * c]; }(0); >> } >> === cut here === >> >> When parsing the lambda body, and more specifically the >> multiplication, >> we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which >> indicates >> to >> prune_lambda_captures that it might be po
Re: [PATCH] asan: Fix missing FakeStack flag cleanup
On Fri, 2025-01-24 at 17:25 +0100, Jakub Jelinek wrote: > On Thu, Jan 09, 2025 at 01:15:30AM +0100, Ilya Leoshkevich wrote: > > Bootstrapped and regtested on x86_64-redhat-linux. Ok for master? > > > > > > > > The FakeStack flag is not zeroed out when can_store_by_pieces() > > returns false. Over time, this causes FakeStack::Allocate() to > > perform > > the maximum number of loop iterations, significantly slowing down > > the > > instrumented program. > > Took me a while to construct a testcase where it makes a difference, > but e.g. > void foo (int *, int *, int *, int *, int *, int *); > > int > bar (void) > { > int a[3], b[26]; > foo (a, b, 0, 0, 0, 0); > return 0; > } > > int > baz (void) > { > int a[3], b[26], c[371], d[12], e[257], f[5]; > foo (a, b, c, d, e, f); > return 0; > } > shows it on s390x with -O2 -fsanitize=address on bar but not baz (on > x86_64 > not on either). > > > gcc/ChangeLog: > > > > * asan.cc (asan_emit_stack_protection): Always zero the flag > > unless it is cleared by the __asan_stack_free_N() libcall. > > > > Signed-off-by: Ilya Leoshkevich > > --- > > gcc/asan.cc | 30 ++ > > 1 file changed, 18 insertions(+), 12 deletions(-) [...] > Ok for trunk with that nit fixed. > > Jakub Thank you for the review! I have fixed the style and committed this. Would it be okay to backport this to gcc-13 and gcc-14? Bootstrap and regtest pass on x86_64-redhat-linux.
Re: [PATCH] libstdc++: correct symbol version of typeinfo for bfloat16_t on RISC-V
On Mon, 27 Jan 2025 at 10:55, Andreas Schwab wrote: > > RISC-V only gained support for bfloat16_t after gcc 14. Passes > libstdc++/check_abi on {x86_64,aarch64,ppc64le,riscv64,s390x}-suse-linux. OK, thanks. > > PR libstdc++/118563 > * testsuite/util/testsuite_abi.cc (check_version): Add > CXXABI_1.3.16. > * config/abi/pre/gnu.ver (CXXABI_1.3.14) [__riscv]: Exclude > typeinfo for bfloat16_t. > (CXXABI_1.3.16) [__riscv]: Add it here. > --- > libstdc++-v3/config/abi/pre/gnu.ver | 23 > libstdc++-v3/testsuite/util/testsuite_abi.cc | 1 + > 2 files changed, 20 insertions(+), 4 deletions(-) > > diff --git a/libstdc++-v3/config/abi/pre/gnu.ver > b/libstdc++-v3/config/abi/pre/gnu.ver > index f7641974ec4..84ce874fe03 100644 > --- a/libstdc++-v3/config/abi/pre/gnu.ver > +++ b/libstdc++-v3/config/abi/pre/gnu.ver > @@ -2852,10 +2852,15 @@ CXXABI_1.3.13 { > CXXABI_1.3.14 { > > # typeinfo for _Float{16,32,64,128,32x,64x,128x} and > -# __bf16 > -_ZTIDF[0-9]*[_bx]; > -_ZTIPDF[0-9]*[_bx]; > -_ZTIPKDF[0-9]*[_bx]; > +# __bf16/bfloat16_t > +_ZTIDF[0-9]*[_x]; > +_ZTIPDF[0-9]*[_x]; > +_ZTIPKDF[0-9]*[_x]; > +#ifndef __riscv > +_ZTIDF16b; > +_ZTIPDF16b; > +_ZTIPKDF16b; > +#endif > _ZTIu6__bf16; > _ZTIPu6__bf16; > _ZTIPKu6__bf16; > @@ -2869,6 +2874,16 @@ CXXABI_1.3.15 { > > } CXXABI_1.3.14; > > +CXXABI_1.3.16 { > + > +#ifdef __riscv > +_ZTIDF16b; > +_ZTIPDF16b; > +_ZTIPKDF16b; > +#endif > + > +} CXXABI_1.3.15; > + > # Symbols in the support library (libsupc++) supporting transactional memory. > CXXABI_TM_1 { > > diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc > b/libstdc++-v3/testsuite/util/testsuite_abi.cc > index dcf5b1a4e0b..0d6080fb92c 100644 > --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc > +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc > @@ -237,6 +237,7 @@ check_version(symbol& test, bool added) >known_versions.push_back("CXXABI_1.3.13"); >known_versions.push_back("CXXABI_1.3.14"); >known_versions.push_back("CXXABI_1.3.15"); > + known_versions.push_back("CXXABI_1.3.16"); >known_versions.push_back("CXXABI_IEEE128_1.3.13"); >known_versions.push_back("CXXABI_TM_1"); >known_versions.push_back("CXXABI_FLOAT128"); > -- > 2.48.1 > > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different." >
Re: [PATCH] asan: Fix missing FakeStack flag cleanup
On Mon, Jan 27, 2025 at 02:02:14PM +0100, Ilya Leoshkevich wrote: > Thank you for the review! > I have fixed the style and committed this. > > Would it be okay to backport this to gcc-13 and gcc-14? > Bootstrap and regtest pass on x86_64-redhat-linux. Ok. Jakub
Re: [PATCH] c++, v3: Implement for namespace statics CWG 2867 - Order of initialization for structured bindings [PR115769]
On 1/26/25 2:25 PM, Jakub Jelinek wrote: On Sat, Jan 25, 2025 at 10:53:50AM -0500, Jason Merrill wrote: On 1/25/25 4:12 AM, Jakub Jelinek wrote: On Fri, Jan 24, 2025 at 07:07:15PM -0500, Jason Merrill wrote: Hypothetically, but those cases are just either error or DECL_EXTERNAL. In the error case we're failing anyway; in the external case all the base/nonbase for a particular structured binding declaration should be consistent. So shall I just remove all the prune_vars_needing_no_initialization hunks then or add gcc_checking_assert (!STATIC_INIT_DECOMP_BASE_P (t) && !STATIC_INIT_DECOMP_NONBASE_P (t)); for the DECL_EXTERNAL punt case? Just the assert sounds good. Note, unfortunately it is hard to come up with a testcase that actually prunes something on purpose... Indeed, it shouldn't be possible. So like this? OK. Passed x86_64-linux and i686-linux bootstrap/regtest. 2025-01-26 Jakub Jelinek PR c++/115769 gcc/cp/ * cp-tree.h (STATIC_INIT_DECOMP_BASE_P): Define. (STATIC_INIT_DECOMP_NONBASE_P): Define. * decl.cc (cp_finish_decl): Mark nodes in {static,tls}_aggregates with * decl2.cc (decomp_handle_one_var, decomp_finalize_var_list): New functions. (emit_partial_init_fini_fn): Use them. (prune_vars_needing_no_initialization): Assert STATIC_INIT_DECOMP_*BASE_P is not set on DECL_EXTERNAL vars to be pruned out. (partition_vars_for_init_fini): Use same priority for consecutive STATIC_INIT_DECOMP_*BASE_P vars and propagate those flags to new TREE_LISTs when possible. Formatting fix. (handle_tls_init): Use decomp_handle_one_var and decomp_finalize_var_list functions. gcc/testsuite/ * g++.dg/DRs/dr2867-5.C: New test. * g++.dg/DRs/dr2867-6.C: New test. * g++.dg/DRs/dr2867-7.C: New test. * g++.dg/DRs/dr2867-8.C: New test. --- gcc/cp/cp-tree.h.jj 2024-09-07 09:31:20.601484156 +0200 +++ gcc/cp/cp-tree.h2024-09-09 15:53:44.924112247 +0200 @@ -470,6 +470,7 @@ extern GTY(()) tree cp_global_trees[CPTI BASELINK_FUNCTIONS_MAYBE_INCOMPLETE_P (in BASELINK) BIND_EXPR_VEC_DTOR (in BIND_EXPR) ATOMIC_CONSTR_EXPR_FROM_CONCEPT_P (in ATOMIC_CONSTR) + STATIC_INIT_DECOMP_BASE_P (in the TREE_LIST for {static,tls}_aggregates) 2: IDENTIFIER_KIND_BIT_2 (in IDENTIFIER_NODE) ICS_THIS_FLAG (in _CONV) DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (in VAR_DECL) @@ -489,6 +490,8 @@ extern GTY(()) tree cp_global_trees[CPTI IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR) PACK_EXPANSION_AUTO_P (in *_PACK_EXPANSION) contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT) + STATIC_INIT_DECOMP_NONBASE_P (in the TREE_LIST + for {static,tls}_aggregates) 3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR) ICS_BAD_FLAG (in _CONV) FN_TRY_BLOCK_P (in TRY_BLOCK) @@ -5947,6 +5950,21 @@ extern bool defer_mangling_aliases; extern bool flag_noexcept_type; +/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic + initialization of namespace scope structured binding base or related + extended ref init temps. Temporaries from the initialization of + STATIC_INIT_DECOMP_BASE_P dynamic initializers should be destroyed only + after the last STATIC_INIT_DECOMP_NONBASE_P dynamic initializer following + it. */ +#define STATIC_INIT_DECOMP_BASE_P(NODE) \ + TREE_LANG_FLAG_1 (TREE_LIST_CHECK (NODE)) + +/* True if this TREE_LIST in {static,tls}_aggregates is a for dynamic + initialization of namespace scope structured binding non-base + variable using get. */ +#define STATIC_INIT_DECOMP_NONBASE_P(NODE) \ + TREE_LANG_FLAG_2 (TREE_LIST_CHECK (NODE)) + /* A list of namespace-scope objects which have constructors or destructors which reside in the global scope. The decl is stored in the TREE_VALUE slot and the initializer is stored in the --- gcc/cp/decl.cc.jj 2024-09-09 11:50:07.146394047 +0200 +++ gcc/cp/decl.cc 2024-09-09 17:16:26.459094150 +0200 @@ -8485,6 +8485,7 @@ cp_finish_decl (tree decl, tree init, bo bool var_definition_p = false; tree auto_node; auto_vec extra_cleanups; + tree aggregates1 = NULL_TREE; struct decomp_cleanup { tree decl; cp_decomp *&decomp; @@ -8872,7 +8873,16 @@ cp_finish_decl (tree decl, tree init, bo } if (decomp) - cp_maybe_mangle_decomp (decl, decomp); + { + cp_maybe_mangle_decomp (decl, decomp); + if (TREE_STATIC (decl) && !DECL_FUNCTION_SCOPE_P (decl)) + { + if (CP_DECL_THREAD_LOCAL_P (decl)) + aggregates1 = tls_aggregates; + else + aggregates1 = static_aggregates; + } + } /* If this is a local variable that will need a mangled name, register it now. We must do this be
RE: [RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x
> From: Richard Biener > Sent: Monday, January 27, 2025 5:09 PM > > On Mon, Jan 27, 2025 at 8:30 AM Haochen Jiang > wrote: > > > > Hi all, > > > > AVX10 has been published for one and half year and we have got many > > feedbacks on that, one of the feedback is on whether the alias option > > -mavx10.x should point to 256 or 512. > > > > If you also pay attention to LLVM community, you might see this thread > > related to AVX10 options just sent out several hours ago: > > > > [X86][AVX10] Disable m[no-]avx10.1 and switch m[no-]avx10.2 to alias > > of 512 bit options > > https://github.com/llvm/llvm-project/pull/124511 > > > > In GCC, we will also do so. This RFC patch is slightly different with > > LLVM, just > > including: > > > > - Switch -m[no-]avx10.2 to alias of 512 bit options. > > - Change -mno-avx10.[1,2]-512 to disable both 256 and 512 instructions. > This > > will also result in -mno-avx10.2 would still disable both 256 and 512 > > insts > > according to new alias point to 512. > > > > But not including disabling -m[no-]avx10.1, since I still want more > > input on how to handle that. We actually have three choices on that: > > > > a. Directly re-alias -m[no-]avx10.1 to -m[no-]avx10.1-512 GCC 15 and > > backport to GCC 14. > > b. Disable -m[no]-avx10.1 in GCC 15, and add it back with > > -m[no-]avx10.1-512 in the future. This is for in case if someone > > cross compile with different versions of GCC with -mavx10.1, it might get > unexpected result sliently. > > c. Disable -m[no]-avx10.1 in GCC 15, and never add it back. Since the > > option has been 256 bit, changing them back and forth is messy. > > > > It might be the final chance we could change the alias option since > > real > > AVX10.1 hardware is coming soon. And it is only x86 specific, so it > > might still squeeze into GCC 15 at this time. > > > > I call this patch RFC patch since we also need to change the doc and > > testcases accordingly, which makes this patch incomplete. Discussion > > and input is welcomed on this topic. > > Can you re-hash on how users need to select 256bit vs 512bit support? > I understand > the above change basically makes -m[no-]avx10.[12] gate ISA features but not > size? > So that will now enable 512bit unless -mno-evex512 is given? > -mno-evex512 will do nothing with AVX10 related options. It will only apply on -mavx512xxx options. In GCC currently, take AVX10.2 as example, we have the following options for one AVX10 version: -mavx10.2-256: Enable AVX10.2 ISA set with 256 bit vector size only -mavx10.2-512: Enable AVX10.2 ISA set with both 256 and 512 bit vector size -mavx10.2: An alias to -mavx10.2-256 Based on that, the current -mno- option would be: -mno-avx10.2-256: Disable AVX10.2-256 ISA set, which actually disables the whole AVX10.2. -mno-avx10.2-512: Disable AVX10.2 ISA set 512 bit vector usage, but keep 256 bit there. -mno-avx10.2: An alias to -mno-avx10.2-256. In this RFC, basically we have two main goal, both of them are introduced in GCC 15: - Change -mavx10.2 alias to 512 bit instead of 256 bit. - -mno-avx10.2-512 would disable the whole AVX10.2, instead of 512 bit only, to eliminate different understanding on that due to in AVX10 we turned the imply relationship from line-like to square-like. Also, it will lead to the new -mno-avx10.2 still disable the whole AVX10.2 ISA set, which meets its first impression on option name. Due to that, -mno-avx10.1-512 should also align with that, which is introduced in GCC 14. > Given for the -mavx10.[12]-{256,512} the behavior changes compared to GCC > 14 I'd rather drop the options that behave differently from GCC 14 on GCC 15 > than changing their meaning. That unfortunately will make them a hard error > (but I don't expect much use). I'm not sure it's worth retaining -m[no- > ]avx10.[12]-512. -mavx10.[12]-256/512 are clear, -mno-avx10.[12]-256 is also clear. No need to concern on that. As said, the changes are on -mavx10.x alias and -mno-avx10.x-512, where AVX10.1 introduced in GCC 14 and AVX10.2 in GCC 15. In my opinion, I would go either option a) or option c) for -mavx10.1. AVX10.1 options are not widely used for now due to its first appearance in GNR. So that is why option a) could be doable although changing the meaning, but it is the last chance for option a. Option c) is a safe choice and I pretty like it. I don't like option b) but I list that out since it is also a choice. I listed them all for discussion. For AVX10.2, since it is first introduced in GCC 15, we could change that w/o considering compatibility issue. > > But maybe I'm misunderstanding the change (too many avx10.x related > options are there). > Yes, that is quite confusing and takes me some time to try to illustrate them as clear as I can. Thx, Haochen
Re: [PATCH] options: Adjust cl_optimization_compare to avoid checking ICE [PR115913]
On Mon, Jan 27, 2025 at 1:28 PM Lewis Hyatt wrote: > > Hello- > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115913#c10 > > This tweak to optc-save-gen.awk weakens the check performed by > cl_optimization_compare() to avoid checking asserts that have been there > since this function was first added in r11-1141. Is it OK for 15 please? I > think it would be appropriate to backport back to 12,13,14 as well. > Bootstrap + regtest all languages on x86-64 Linux with no issues. Thanks! The check is supposed to verify that cl_target_option_restore does not alter the set of global options restored by cl_optimization_restore (or others). Since there are no warning flags restored in cl_optimization_restore the patch is OK. I suppose to follow the intend we'd have to save the current set of diagnostic options after cl_optimization_restore and compare those as well after the cl_target_option_restore. Thanks, Richard. > -Lewis > > -- >8 -- > > At the end of a sequence like: > #pragma GCC push_options > ... > #pragma GCC pop_options > > the handler for pop_options calls cl_optimization_compare() (as generated by > optc-save-gen.awk) to make sure that all global state has been restored to > the value it had prior to the push_options call. The verification is > performed for almost all entries in the global_options struct. This leads to > unexpected checking asserts, as discussed in the PR, in case the state of > warnings-related options has been intentionally modified in between > push_options and pop_options via a call to #pragma GCC diagnostic. Address > that by skipping the verification for CL_WARNING-flagged options. > > gcc/ChangeLog: > > PR middle-end/115913 > * optc-save-gen.awk (cl_optimization_compare): Skip options with > CL_WARNING flag. > > gcc/testsuite/ChangeLog: > > PR middle-end/115913 > * c-c++-common/cpp/pr115913.c: New test. > --- > gcc/optc-save-gen.awk | 5 + > gcc/testsuite/c-c++-common/cpp/pr115913.c | 7 +++ > 2 files changed, 12 insertions(+) > create mode 100644 gcc/testsuite/c-c++-common/cpp/pr115913.c > > diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk > index fa9218472ed..a3d7e5a478e 100644 > --- a/gcc/optc-save-gen.awk > +++ b/gcc/optc-save-gen.awk > @@ -1484,6 +1484,11 @@ for (i = 0; i < n_opts; i++) { > if (name == "") > continue; > > + # We do not want to compare warning-related options, since they > + # might have been modified by a #pragma GCC diagnostic. > + if (flag_set_p("Warning", flags[i])) > + continue; > + > if (name in checked_options) > continue; > checked_options[name]++ > diff --git a/gcc/testsuite/c-c++-common/cpp/pr115913.c > b/gcc/testsuite/c-c++-common/cpp/pr115913.c > new file mode 100644 > index 000..b9d10cda8d2 > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/cpp/pr115913.c > @@ -0,0 +1,7 @@ > +/* { dg-do preprocess } */ > +/* PR middle-end/115913 */ > +#pragma GCC push_options > +#pragma GCC diagnostic warning "-Wundef" > +/* The call to cl_optimization_compare performed by pop_options should not > + lead to a checking failure. */ > +#pragma GCC pop_options
Re: [PATCH v3] [ifcombine] avoid creating out-of-bounds BIT_FIELD_REFs [PR118514]
On Sat, Jan 25, 2025 at 4:37 AM Alexandre Oliva wrote: > > On Jan 24, 2025, Richard Biener wrote: > > > Hmm. I think when an original ref could trap that should be the > > insertion point (or the original ref should post-dominate the insertion > > point). > > I suppose we could do that, but... intuitively, it doesn't feel safe to > move a nontrapping load down to combine it with a trapping load either. > > > It's fine when an originally may-trap reference becomes not trapping > > (and it really cannot trap). Introducing new (may-)traps is never OK. > > I don't really see how a may-trap reference to a certain range of bits > could possibly become non-trapping by merely reissuing it, so I can't > respond intelligently to your assessment. What I was saying that the conservative tree_could_trap_p could say 'yes' to a certain encoding of a ref but 'no' to another if in reality the ref can never trap. We of course cannot (apart from bugs in tree_could_trap_p) turn a for-sure trap into a not-trap by simply rewriting the ref. > > So, can we split the patch to the case with the testcase at hand and > > consider the assert and the extra tree_could_trap_p checks separately? > > The approved patch, with the testcase and the simplest fix, is already > in. I've rebased the additional defensive checks and interface changes > into the following, but I don't have any pressing need of getting them > in: I don't know how to trigger any of the issues tested for. > > > If decode_field_reference finds a load that accesses past the inner > object's size, bail out. > > If loads we're attempting to combine don't have the same trapping > status, bail out, to avoid replacing the wrong load and enabling loads > to be moved that shouldn't. > > Regstrapped on x86_64-linux-gnu. Should I put this in? > > > for gcc/ChangeLog > > PR tree-optimization/118514 > * gimple-fold.cc (decode_field_reference): Refuse to consider > merging out-of-bounds BIT_FIELD_REFs. > (fold_truth_andor_for_ifcombine): Check that combinable loads > have the same trapping status. > * tree-eh.cc (bit_field_ref_in_bounds_p): Rename to... > (access_in_bounds_of_type_p): ... this. Change interface, > export. > (tree_could_trap_p): Adjust. > * tree-eh.h (access_in_bounds_of_type_p): Declare. > --- > gcc/gimple-fold.cc | 16 ++-- > gcc/tree-eh.cc | 25 + > gcc/tree-eh.h |1 + > 3 files changed, 24 insertions(+), 18 deletions(-) > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index 45485782cdf91..24aada4bc8fee 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -7686,10 +7686,8 @@ decode_field_reference (tree *pexp, HOST_WIDE_INT > *pbitsize, >|| bs <= shiftrt >|| offset != 0 >|| TREE_CODE (inner) == PLACEHOLDER_EXPR > - /* Reject out-of-bound accesses (PR79731). */ > - || (! AGGREGATE_TYPE_P (TREE_TYPE (inner)) > - && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)), > - bp + bs) < 0) > + /* Reject out-of-bound accesses (PR79731, PR118514). */ > + || !access_in_bounds_of_type_p (TREE_TYPE (inner), bs, bp) So I think we want this bit in (and it's dependences), but >|| (INTEGRAL_TYPE_P (TREE_TYPE (inner)) > && !type_has_mode_precision_p (TREE_TYPE (inner > return NULL_TREE; > @@ -8228,8 +8226,12 @@ fold_truth_andor_for_ifcombine (enum tree_code code, > tree truth_type, >std::swap (rl_loc, rr_loc); > } > > + /* Check that the loads that we're trying to combine have the same vuse and > + same trapping status. */ >if ((ll_load && rl_load) > - ? gimple_vuse (ll_load) != gimple_vuse (rl_load) > + ? (gimple_vuse (ll_load) != gimple_vuse (rl_load) > +|| (tree_could_trap_p (gimple_assign_rhs1 (ll_load)) > +!= tree_could_trap_p (gimple_assign_rhs1 (rl_load >: (!ll_load != !rl_load)) this and > return 0; > > @@ -8282,7 +8284,9 @@ fold_truth_andor_for_ifcombine (enum tree_code code, > tree truth_type, >else if (lr_reversep != rr_reversep >|| ! operand_equal_p (lr_inner, rr_inner, 0) >|| ((lr_load && rr_load) > - ? gimple_vuse (lr_load) != gimple_vuse (rr_load) > + ? (gimple_vuse (lr_load) != gimple_vuse (rr_load) > + || (tree_could_trap_p (gimple_assign_rhs1 (lr_load)) > + != tree_could_trap_p (gimple_assign_rhs1 (rr_load >: (!lr_load != !rr_load))) this not at this point (I see the assert is no longer in the patch). So, apart from these two hunks this is OK as well. Thanks, Richard. > return 0; > > diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc > index 27a4b2b5b1665..68008cea588a7 100644 > --- a/gcc/tree-eh.cc > +++ b/gcc/tree-eh.cc > @@ -2646,24 +2646,22 @@ range_in_array_bounds_p (tree ref) >return true; > } >
Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]
On 1/27/25 8:14 AM, Simon Martin wrote: Hi Jason, On 24 Jan 2025, at 16:51, Jason Merrill wrote: On 1/24/25 6:19 AM, Simon Martin wrote: Hi Jason, On 23 Jan 2025, at 22:56, Jason Merrill wrote: On 1/23/25 12:06 PM, Simon Martin wrote: Hi Marek, On 23 Jan 2025, at 16:45, Marek Polacek wrote: On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote: Hi Jason, On 20 Jan 2025, at 22:49, Jason Merrill wrote: On 1/20/25 2:11 PM, Simon Martin wrote: Hi Jason, On 15 Jan 2025, at 22:42, Jason Merrill wrote: On 12/12/24 2:07 PM, Simon Martin wrote: We currently ICE upon the following valid (under -Wno-vla) code === cut here === void f(int c) { constexpr int r = 4; [&](auto) { int t[r * c]; }(0); } === cut here === The problem is that when parsing the lambda body, and more specifically the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED even though the replacement of r by 4 is "undone" by the call to build_min_non_dep in build_x_binary_op. This makes prune_lambda_captures remove the proxy declaration while it should not, and we trip on an assert at instantiation time. Why does prune_lambda_captures remove the declaration if it's still used in the function body? Setting LAMBDA_EXPR_CAPTURE_OPTIMIZED just means "we might have optimized away some captures", the tree walk should have found the use put back by build_x_binary_op. I think that this is due to a bug in mark_const_cap_r, that’s been here since the beginning, i.e. r8-7213-g1577f10a637352: it decides NOT to walk sub-trees when encountering a DECL_EXPR for a constant capture proxy (at lambda.cc:1769). I don’t understand why we wouldn’t want to continue. Because that's the declaration of the capture proxy, not a use of it. Indeed, thanks for clarifying. Why aren't we finding the use in the declaration of t? After further investigation, the problem is rather that neither walk_tree_1 nor cp_walk_subtrees walk the dimensions of array types, so we miss the uses. Removing that line fixes the PR, but breaks 3 existing tests (lambda-capture1.C, lambda-const11.C and lambda-const11a.C, that all assert that we optimise out the capture); that’s why I did not pursue it in the first place. Right. But taking another look, it might not be that big a deal that we don’t optimise those out: as soon as we use -O1 or above, the assignment to the closure field actually disappears. Completely breaking this optimization is a big deal, particularly since it affects the layout of closure objects. We can't always optimize everything away. ACK. The attached updated patch makes cp_walk_subtrees walk array type dimensions, which fixes the initial PR without those 3 regressions. Successfully tested on x86_64-pc-linux-gnu. Is it OK? Simon PS: In theory I think it’d make sense to do do this change in walk_tree_1 since C also supports VLAs, but doing so breaks some OMP tests. OMP can do interesting things with array bounds (see r9-5354-gda972c05f48637), and fixing those would require teaching walk_tree_1 about the “omp dummy var” array bounds, which I think would be a bit ugly. And I’m not aware of any C case that would be improved/fixed by doing this change, so we’re probably fine not doing it. From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17 00:00:00 2001 From: Simon Martin Date: Wed, 22 Jan 2025 16:19:47 +0100 Subject: [PATCH] c++: Don't prune constant capture proxies only used in array dimensions [PR114292] We currently ICE upon the following valid (under -Wno-vla) code === cut here === void f(int c) { constexpr int r = 4; [&](auto) { int t[r * c]; }(0); } === cut here === When parsing the lambda body, and more specifically the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which indicates to prune_lambda_captures that it might be possible to optimize out some captures. The problem is that prune_lambda_captures then misses the use of the r capture (because neither walk_tree_1 nor cp_walk_subtrees walks the dimensions of array types - here "r * c"), hence believes the capture can be pruned... and we trip on an assert when instantiating the lambda. This patch changes cp_walk_subtrees so that when walking a declaration with an array type, it walks that array type's dimensions. Since C also supports VLAs, I thought it'd make sense to do this in walk_tree_1, but this breaks some omp-low related test cases (because OMP can do funky things with array bounds when adjust_array_error_bounds is set), and I don't really want to add code to walk_tree_1 to skips arrays whose dimension is a temporary variable with the "omp dummy var" attribute. Successfully tested on x86_64-pc-linux-gnu. PR c++/114292 gcc/cp/ChangeLog: * tree.cc (cp_walk_subtrees): Walk array type dimensions. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/lambda-ice4.C: New test. --- gcc/cp/tree.cc
Re: [PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE
Spencer Abson writes: > Enable a target with FEAT_FP16 to emit the half-precision variants > of FCMP/FCMPE. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp > and fcmpe to use the GPF_F16 iterator for floating-point > modes. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/_Float16_cmp_1.c: New test. > * gcc.target/aarch64/_Float16_cmp_2.c: New (negative) test. > --- > gcc/config/aarch64/aarch64.md | 29 +- > .../gcc.target/aarch64/_Float16_cmp_1.c | 54 +++ > .../gcc.target/aarch64/_Float16_cmp_2.c | 7 +++ > 3 files changed, 77 insertions(+), 13 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 071058dbeb3..8721bf5d4f3 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -707,11 +707,12 @@ > ) > > (define_expand "cbranch4" > - [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" > - [(match_operand:GPF 1 "register_operand") > - (match_operand:GPF 2 > "aarch64_fp_compare_operand")]) > -(label_ref (match_operand 3 "" "")) > -(pc)))] > + [(set (pc) (if_then_else > + (match_operator 0 "aarch64_comparison_operator" > + [(match_operand:GPF_F16 1 "register_operand") > + (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")]) > + (label_ref (match_operand 3 "" "")) > + (pc)))] >"" >" >operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], > @@ -4338,26 +4339,28 @@ > > (define_insn "fcmp" >[(set (reg:CCFP CC_REGNUM) > -(compare:CCFP (match_operand:GPF 0 "register_operand") > - (match_operand:GPF 1 "aarch64_fp_compare_operand")))] > + (compare:CCFP > + (match_operand:GPF_F16 0 "register_operand") > + (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))] Minor formatting nit, but it'd be more usual to indent by two extra columns relative to the (compare ...), i.e.: (compare:CCFP (match_operand:GPF_F16 0 "register_operand") (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))] Similarly for the others. OK with that change for GCC 16, thanks. BTW: > diff --git a/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c > b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c > new file mode 100644 > index 000..e714304970b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c > @@ -0,0 +1,7 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -march=armv8.2-a+nofp16" } */ > + > +#include "_Float16_cmp_1.c" > + > +/* { dg-final { scan-assembler-not "\tfcmp\th\[0-9\]\+" } } */ > +/* { dg-final { scan-assembler-not "\tfcmpe\th\[0-9\]\+" } } */ > \ No newline at end of file It can be easier to write these tests using {...} quoting for the regexp as well, so that backslashes are regexp backslashes rather than Tcl string backslashes: /* { dg-final { scan-assembler-not {\tfcmp\th[0-9]+} } } */ /* { dg-final { scan-assembler-not {\tfcmpe\th[0-9]+} } } */ The "..." version is ok (and often used) too though. Richard
Re: [PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends
Richard Biener writes: > On Mon, 27 Jan 2025, Jakub Jelinek wrote: > >> On Mon, Jan 27, 2025 at 11:09:38AM +0100, Richard Biener wrote: >> >PR rtl-optimization/118662 >> >* combine.cc (try_combine): When re-materializing a load >> >from an extended reg by a lowpart subreg make sure we're >> >dealing with single-component modes. >> > >> >* gcc.dg/torture/pr118662.c: New testcase. >> > --- >> > gcc/combine.cc | 5 + >> > gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++ >> > 2 files changed, 23 insertions(+) >> > create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c >> > >> > diff --git a/gcc/combine.cc b/gcc/combine.cc >> > index a2d4387cebe..4849603ba5e 100644 >> > --- a/gcc/combine.cc >> > +++ b/gcc/combine.cc >> > @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn >> > *i1, rtx_insn *i0, >> > copy. This saves at least one insn, more if register allocation can >> > eliminate the copy. >> > >> > + We cannot do this if the involved modes have more than one elements, >> > + like for vector or complex modes. >> > + >> > We cannot do this if the destination of the first assignment is a >> > condition code register. We eliminate this case by making sure >> > the SET_DEST and SET_SRC have the same mode. >> > @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn >> > *i1, rtx_insn *i0, >> > && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND >> > && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0))) >> > == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0 >> > + && known_eq (GET_MODE_NUNITS >> > +(GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1) >> >> If you want to rule this out for vector/complex modes, then the above >> doesn't do that, all the V1??mode will still be optimized, and I think >> that is undesirable. > > If you think so (I think it should work fine for V1??mode). Yeah, FWIW, I liked the original version, since it seemed to capture the property that matters. (No objection to the new variant though.) Thanks, Richard
[PATCH v3 3/4] RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]
From: Pan Li This patch would like to fix the wroing code generation for the scalar signed SAT_SUB. The input can be QI/HI/SI/DI while the alu like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before ALU like sub. The gen_lowpart will generate something like lbu which has all zero for highest bits. For example, when 0xff(-1 for QImode) sub 0x1(1 for QImode), we actually want to -1 - 1 = -2, but if there is no sign extend like lbu, we will get 0xff - 1 = 0xfe which is incorrect. Thus, we have to sign extend 0xff(Qmode) to 0x(assume XImode is DImode) before sub in Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688.h: Add test helper macro. * gcc.target/riscv/pr117688-sub-run-1-s16.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s32.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s64.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s8.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 4 ++-- .../gcc.target/riscv/pr117688-sub-run-1-s16.c | 6 ++ .../gcc.target/riscv/pr117688-sub-run-1-s32.c | 6 ++ .../gcc.target/riscv/pr117688-sub-run-1-s64.c | 6 ++ .../gcc.target/riscv/pr117688-sub-run-1-s8.c | 6 ++ gcc/testsuite/gcc.target/riscv/pr117688.h | 21 +++ 6 files changed, 47 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index b9fb3733b86..1e5de85e871 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12919,8 +12919,8 @@ riscv_expand_sssub (rtx dest, rtx x, rtx y) machine_mode mode = GET_MODE (dest); unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant (); rtx shift_bits = GEN_INT (bitsize - 1); - rtx xmode_x = gen_lowpart (Xmode, x); - rtx xmode_y = gen_lowpart (Xmode, y); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND); rtx xmode_minus = gen_reg_rtx (Xmode); rtx xmode_xor_0 = gen_reg_rtx (Xmode); rtx xmode_xor_1 = gen_reg_rtx (Xmode); diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c new file mode 100644 index 000..7b375bb6c85 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int16_t, INT16_MIN, INT16_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c new file mode 100644 index 000..ba0e8fc8ea5 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s32.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int32_t, INT32_MIN, INT32_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c new file mode 100644 index 000..c24c549af30 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s64.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int64_t, INT64_MIN, INT64_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c new file mode 100644 index 000..67f9df179a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-sub-run-1-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_SUB_RUN(int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h b/gcc/testsuite/gcc.target/riscv/pr117688.h index 1013a8a8012..3b734ce62ed 100644 --- a/gcc/testsuite/gcc.target/riscv/pr117688.h +++ b/gcc/testsuite/gcc.target/riscv/pr117688.h @@ -24,4 +24,25 @@ return 0; \ } +#define DEFINE_SIGNED_SAT_SUB_RUN(T, MIN, MAX) \ + T x, y, result;
[PATCH v3 2/4] RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]
From: Pan Li This patch would like to fix the wroing code generation for the scalar signed SAT_ADD. The input can be QI/HI/SI/DI while the alu like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before ALU like add. The gen_lowpart will generate something like lbu which has all zero for highest bits. For example, when 0xff(-1 for QImode) plus 0x2(1 for QImode), we actually want to -1 + 2 = 1, but if there is no sign extend like lbu, we will get 0xff + 2 = 0x101 which is incorrect. Thus, we have to sign extend 0xff(Qmode) to 0x(assume XImode is DImode) before plus in Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688-add-run-1-s16.c: New test. * gcc.target/riscv/pr117688-add-run-1-s32.c: New test. * gcc.target/riscv/pr117688-add-run-1-s64.c: New test. * gcc.target/riscv/pr117688-add-run-1-s8.c: New test. * gcc.target/riscv/pr117688.h: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 4 +-- .../gcc.target/riscv/pr117688-add-run-1-s16.c | 6 + .../gcc.target/riscv/pr117688-add-run-1-s32.c | 6 + .../gcc.target/riscv/pr117688-add-run-1-s64.c | 6 + .../gcc.target/riscv/pr117688-add-run-1-s8.c | 6 + gcc/testsuite/gcc.target/riscv/pr117688.h | 27 +++ 6 files changed, 53 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688.h diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 34f0a888c5c..b9fb3733b86 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12815,8 +12815,8 @@ riscv_expand_ssadd (rtx dest, rtx x, rtx y) machine_mode mode = GET_MODE (dest); unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant (); rtx shift_bits = GEN_INT (bitsize - 1); - rtx xmode_x = gen_lowpart (Xmode, x); - rtx xmode_y = gen_lowpart (Xmode, y); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, SIGN_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, SIGN_EXTEND); rtx xmode_sum = gen_reg_rtx (Xmode); rtx xmode_dest = gen_reg_rtx (Xmode); rtx xmode_xor_0 = gen_reg_rtx (Xmode); diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c new file mode 100644 index 000..21ec107cbf1 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int16_t, INT16_MIN, INT16_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c new file mode 100644 index 000..1f197d1280b --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s32.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int32_t, INT32_MIN, INT32_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c new file mode 100644 index 000..4903bc854d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s64.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int64_t, INT64_MIN, INT64_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c new file mode 100644 index 000..a9f2fe7f192 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-add-run-1-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_ADD_RUN(int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688.h b/gcc/testsuite/gcc.target/riscv/pr117688.h new file mode 100644 index 000..1013a8a8012 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688.h @@ -0,0 +1,27 @@ +#ifndef HAVE_DEFINED_PR117688_H +#define HAVE_DEFINED_PR117688_H + +#include + +#define DEFINE_SIGNED_SAT_ADD_RUN(T, MIN, MAX)\ + T x, y, re
[PATCH v3 4/4] RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]
From: Pan Li This patch would like to fix the wroing code generation for the scalar signed SAT_TRUNC. The input can be QI/HI/SI/DI while the alu like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before ALU like add. The gen_lowpart will generate something like lbu which has all zero for highest bits. For example, when 0xff7f(-129 for HImode) trunc to QImode, we actually want compare -129 to -128, but if there is no sign extend like lbu, we will compare 0xff7f to 0xff80(assum Xmode is DImode). Thus, we have to sign extend 0xff(Qmode) to 0xff7f(assume Xmode is DImode) before compare in Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688.h: Add test helper macros. * gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test. Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 2 +- .../riscv/pr117688-trunc-run-1-s16-to-s8.c| 6 + .../riscv/pr117688-trunc-run-1-s32-to-s16.c | 6 + .../riscv/pr117688-trunc-run-1-s32-to-s8.c| 6 + .../riscv/pr117688-trunc-run-1-s64-to-s16.c | 6 + .../riscv/pr117688-trunc-run-1-s64-to-s32.c | 6 + .../riscv/pr117688-trunc-run-1-s64-to-s8.c| 6 + gcc/testsuite/gcc.target/riscv/pr117688.h | 22 +++ 8 files changed, 59 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 1e5de85e871..7f8952735d8 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -13030,7 +13030,7 @@ riscv_expand_sstrunc (rtx dest, rtx src) rtx xmode_narrow_min = gen_reg_rtx (Xmode); rtx xmode_lt = gen_reg_rtx (Xmode); rtx xmode_gt = gen_reg_rtx (Xmode); - rtx xmode_src = gen_lowpart (Xmode, src); + rtx xmode_src = riscv_extend_to_xmode_reg (src, GET_MODE (src), SIGN_EXTEND); rtx xmode_dest = gen_reg_rtx (Xmode); rtx xmode_mask = gen_reg_rtx (Xmode); rtx xmode_sat = gen_reg_rtx (Xmode); diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c new file mode 100644 index 000..df84615a25f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_TRUNC_RUN(int16_t, int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c new file mode 100644 index 000..1b0f860eb55 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int16_t, INT16_MIN, INT16_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c new file mode 100644 index 000..e412a29df36 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-additional-options "-std=c99" } */ + +#include "pr117688.h" + +DEFINE_SIGNED_SAT_TRUNC_RUN(int32_t, int8_t, INT8_MIN, INT8_MAX) diff --git a/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c new file mode 100644 index 000..234d33b1895 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c @@ -0,0 +1,6 @@ +/* { dg-do run { target { riscv_v } } } */ +/* { dg-a
Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
Hi Torbjorn, On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON wrote: > > Another ping... :) > > Kind regards, > Torbjörn > > On 2024-12-18 18:35, Torbjorn SVENSSON wrote: > > Gentle ping :) > > > > Kind regards, > > Torbjörn > > > > On 2024-11-14 17:16, Torbjorn SVENSSON wrote: > >> > >> > >> On 2024-11-14 16:32, Christophe Lyon wrote: > >>> On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON > >>> wrote: > > Ok for trunk and releases/gcc-14? > > -- > > A long time ago, this test forced -march=armv6. > > With -marm, the generated assembler is: > foo: > sub r0, r0, #48 > cmp r0, #9 > movhi r0, #0 > movls r0, #1 > bx lr > > With -mthumb, the generated assembler is: > foo: > subsr0, r0, #48 > movsr2, #9 > uxtbr3, r0 > movsr0, #0 > cmp r2, r3 > adcsr0, r0, r0 > uxtbr0, r0 > bx lr > > Require effective-target arm_arm_ok to skip the test for thumb-only > targets (Cortex-M). > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/unsigned-extend-1.c: Use effective-target > arm_arm_ok. > > Signed-off-by: Torbjörn SVENSSON > --- > gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ > testsuite/gcc.target/arm/unsigned-extend-1.c > index 3b4ab048fb0..73f2e1a556d 100644 > --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c > +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c > @@ -1,4 +1,5 @@ > /* { dg-do compile } */ > +/* { dg-require-effective-target arm_arm_ok } */ > /* { dg-options "-O2" } */ > > >>> > >>> So I'm guessing arm_arm_ok fails when you are testing for M-profile > >>> targets? > >>> What happens when you test for A-profile, providing -mthumb as part of > >>> your runtest flags? > >> > >> Looking at our CI results, without your patch the test passes on: cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode, and it fails only on cortex-m0 and m23. Does this match your observations? So your patch will make the test unsupported on m33, m3, m55 and m7 while it currently passes? Am I missing something? Thanks, Christophe > >> > >> For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft): > >> > >> Testing arm/unsigned-extend-1.c > >> check_compile tool: gcc for arm_arm_ok > >> doing compile > >> Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - > >> mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output > >> -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s > >> (timeout = 800) > >> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - > >> mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output - > >> marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s > >> pid is 21254 -21254 > >> cc1: error: target CPU does not support ARM mode > >> pid is -1 > >> close result is 21254 exp6 0 1 > >> output is cc1: error: target CPU does not support ARM mode > >> status 1 > >> compiler exited with status 1 > >> UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c > >> > >> > >> For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft): > >> > >> Testing arm/unsigned-extend-1.c > >> check_compile tool: gcc for arm_arm_ok > >> doing compile > >> Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - > >> mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output > >> -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s > >> (timeout = 800) > >> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - > >> mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output - > >> marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s > >> pid is 21434 -21434 > >> pid is -1 > >> output is status 0 > >> doing compile > >> Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/ > >> gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- > >> abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno- > >> ident -S -o unsigned-extend-1.s(timeout = 800) > >> spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc .../gcc/testsuite/ > >> gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- > >> abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno-ident - > >> S -o unsigned-extend-1.s > >> pid is 21438 -21438 > >> pid is -1 > >> output is status 0 > >> check_compile tool: gcc for exceptions_enabled > >> doing compile > >> Executing on host: .../bin/arm-none-eabi-gcc > >> exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat- > >> abi=soft -fdiagnostics-plain-output -Wno-complain-wrong-lang -S > >> -o exceptions_enabled21296.s(time
[PATCH] libstdc++: correct symbol version of typeinfo for bfloat16_t on RISC-V
RISC-V only gained support for bfloat16_t after gcc 14. Passes libstdc++/check_abi on {x86_64,aarch64,ppc64le,riscv64,s390x}-suse-linux. PR libstdc++/118563 * testsuite/util/testsuite_abi.cc (check_version): Add CXXABI_1.3.16. * config/abi/pre/gnu.ver (CXXABI_1.3.14) [__riscv]: Exclude typeinfo for bfloat16_t. (CXXABI_1.3.16) [__riscv]: Add it here. --- libstdc++-v3/config/abi/pre/gnu.ver | 23 libstdc++-v3/testsuite/util/testsuite_abi.cc | 1 + 2 files changed, 20 insertions(+), 4 deletions(-) diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index f7641974ec4..84ce874fe03 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -2852,10 +2852,15 @@ CXXABI_1.3.13 { CXXABI_1.3.14 { # typeinfo for _Float{16,32,64,128,32x,64x,128x} and -# __bf16 -_ZTIDF[0-9]*[_bx]; -_ZTIPDF[0-9]*[_bx]; -_ZTIPKDF[0-9]*[_bx]; +# __bf16/bfloat16_t +_ZTIDF[0-9]*[_x]; +_ZTIPDF[0-9]*[_x]; +_ZTIPKDF[0-9]*[_x]; +#ifndef __riscv +_ZTIDF16b; +_ZTIPDF16b; +_ZTIPKDF16b; +#endif _ZTIu6__bf16; _ZTIPu6__bf16; _ZTIPKu6__bf16; @@ -2869,6 +2874,16 @@ CXXABI_1.3.15 { } CXXABI_1.3.14; +CXXABI_1.3.16 { + +#ifdef __riscv +_ZTIDF16b; +_ZTIPDF16b; +_ZTIPKDF16b; +#endif + +} CXXABI_1.3.15; + # Symbols in the support library (libsupc++) supporting transactional memory. CXXABI_TM_1 { diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc b/libstdc++-v3/testsuite/util/testsuite_abi.cc index dcf5b1a4e0b..0d6080fb92c 100644 --- a/libstdc++-v3/testsuite/util/testsuite_abi.cc +++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc @@ -237,6 +237,7 @@ check_version(symbol& test, bool added) known_versions.push_back("CXXABI_1.3.13"); known_versions.push_back("CXXABI_1.3.14"); known_versions.push_back("CXXABI_1.3.15"); + known_versions.push_back("CXXABI_1.3.16"); known_versions.push_back("CXXABI_IEEE128_1.3.13"); known_versions.push_back("CXXABI_TM_1"); known_versions.push_back("CXXABI_FLOAT128"); -- 2.48.1 -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
[PATCH v3 1/4] RISC-V: Refactor SAT_* operand rtx extend to reg help func [NFC]
From: Pan Li This patch would like to refactor the helper function of the SAT_* scalar. The helper function will convert the define_pattern ops to the xmode reg for the underlying code-gen. This patch add new parameter for ZERO_EXTEND or SIGN_EXTEND if the input is const_int or the mode is non-Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ... (riscv_extend_to_xmode_reg): Rename to and add rtx_code for zero/sign extend if non-Xmode. (riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND. (riscv_expand_ussub): Ditto. Signed-off-by: Pan Li --- gcc/config/riscv/riscv.cc | 77 --- 1 file changed, 48 insertions(+), 29 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index dd50fe4eddf..34f0a888c5c 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12644,14 +12644,26 @@ riscv_get_raw_result_mode (int regno) /* Generate a REG rtx of Xmode from the given rtx and mode. The rtx x can be REG (QI/HI/SI/DI) or const_int. The machine_mode mode is the original mode from define pattern. + The rtx_code can be ZERO_EXTEND or SIGN_EXTEND. - If rtx is REG and Xmode, the RTX x will be returned directly. + If rtx is REG: - If rtx is REG and non-Xmode, the zero extended to new REG of Xmode will be - returned. + 1. If rtx Xmode, the RTX x will be returned directly. + 2. If rtx non-Xmode, the extended to new REG of Xmode will be returned. - If rtx is const_int, a new REG rtx will be created to hold the value of - const_int and then returned. + The scalar ALU like add don't support non-Xmode like QI/HI. Then the + gen_lowpart will have problem here. For example, when we would like + to add -1 (0xff if QImode) and 2 (0x2 if QImode). The 0xff and 0x2 will + be loaded to register for adding. Aka: + + 0xff + 0x2 = 0x101 instead of -1 + 2 = 1. + + Thus we need to sign extend 0xff to 0x if Xmode is DImode + for correctness. Similar the unsigned also need zero extend. + + If rtx is const_int: + + 1. A new REG rtx will be created to hold the value of const_int. According to the gccint doc, the constants generated for modes with fewer bits than in HOST_WIDE_INT must be sign extended to full width. Thus there @@ -12669,34 +12681,41 @@ riscv_get_raw_result_mode (int regno) the REG rtx of Xmode, instead of taking care of these in expand func. */ static rtx -riscv_gen_zero_extend_rtx (rtx x, machine_mode mode) +riscv_extend_to_xmode_reg (rtx x, machine_mode mode, enum rtx_code rcode) { + gcc_assert (rcode == ZERO_EXTEND || rcode == SIGN_EXTEND); + rtx xmode_reg = gen_reg_rtx (Xmode); - if (!CONST_INT_P (x)) + if (CONST_INT_P (x)) { if (mode == Xmode) - return x; - - riscv_emit_unary (ZERO_EXTEND, xmode_reg, x); - return xmode_reg; -} - - if (mode == Xmode) -emit_move_insn (xmode_reg, x); - else -{ - /* Combine deliberately does not simplify extensions of constants -(long story). So try to generate the zero extended constant -efficiently. + emit_move_insn (xmode_reg, x); + else if (rcode == ZERO_EXTEND) + { + /* Combine deliberately does not simplify extensions of constants +(long story). So try to generate the zero extended constant +efficiently. -First extract the constant and mask off all the bits not in MODE. */ - HOST_WIDE_INT val = INTVAL (x); - val &= GET_MODE_MASK (mode); +First extract the constant and mask off all the bits not in +MODE. */ + HOST_WIDE_INT val = INTVAL (x); + val &= GET_MODE_MASK (mode); - /* X may need synthesis, so do not blindly copy it. */ - xmode_reg = force_reg (Xmode, gen_int_mode (val, Xmode)); + /* X may need synthesis, so do not blindly copy it. */ + xmode_reg = force_reg (Xmode, gen_int_mode (val, Xmode)); + } + else /* SIGN_EXTEND. */ + { + rtx x_reg = gen_reg_rtx (mode); + emit_move_insn (x_reg, x); + riscv_emit_unary (rcode, xmode_reg, x_reg); + } } + else if (mode == Xmode) +return x; + else +riscv_emit_unary (rcode, xmode_reg, x); return xmode_reg; } @@ -12717,8 +12736,8 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y) machine_mode mode = GET_MODE (dest); rtx xmode_sum = gen_reg_rtx (Xmode); rtx xmode_lt = gen_reg_rtx (Xmode); - rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode); - rtx xmode_y = riscv_gen_zero_extend_rtx (y, mode); + rtx xmode_x = riscv_extend_to_xmode_reg (x, mode, ZERO_EXTEND); + rtx xmode_y = riscv_extend_to_xmode_reg (y, mode, ZERO_EXTEND); rtx xmode_dest = gen_reg_rtx (Xmode); /* Step-1: sum = x + y */ @@ -12852,8 +12
[PATCH] arm: libbacktrace: Check if the compiler supports __sync atomics
Older versions of the Arm architecture lack support for __sync operations directly in hardware and require calls into appropriate operating-system hooks. But such hooks obviously don't exist in a freestanding environment. Consquently, it is incorrect to assume during configure that such functions will exist and we need a configure-time check to determine whether or not these routines will work. libbacktrace: * configure.ac: Always check if the compiler supports __sync operations. * configure: Regenerated. --- A consequence of this assumption that these functions 'just work' (TM) is that we build libbacktrace with threading support, but that then fails at link time when we find out that they do not. I'm not sure I really understand why we can assume that having with_target_subdir set is enough to assume that the operations exist: it isn't for Arm and it looks like it wasn't for hpux either. Perhaps the whole logic here needs reconsideration... libbacktrace/configure| 23 +++ libbacktrace/configure.ac | 10 ++ 2 files changed, 33 insertions(+) diff --git a/libbacktrace/configure b/libbacktrace/configure index db491a78234..0ecdd3ec0a3 100755 --- a/libbacktrace/configure +++ b/libbacktrace/configure @@ -12760,6 +12760,29 @@ else if test -n "${with_target_subdir}"; then case "${host}" in hppa*-*-hpux*) libbacktrace_cv_sys_sync=no ;; + arm*-*-eabi*) + # Older versions of the Arm architecture lack the necessary instructions + # for these constructs, so check whether we can use them. + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +int i; +int +main () +{ +__sync_bool_compare_and_swap (&i, i, i); + __sync_lock_test_and_set (&i, 1); + __sync_lock_release (&i); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + libbacktrace_cv_sys_sync=yes +else + libbacktrace_cv_sys_sync=no +fi +rm -f core conftest.err conftest.$ac_objext \ +conftest$ac_exeext conftest.$ac_ext;; *) libbacktrace_cv_sys_sync=yes ;; esac else diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac index b700bf9d4f9..75b3a7536f1 100644 --- a/libbacktrace/configure.ac +++ b/libbacktrace/configure.ac @@ -199,6 +199,16 @@ AC_CACHE_CHECK([__sync extensions], [if test -n "${with_target_subdir}"; then case "${host}" in hppa*-*-hpux*) libbacktrace_cv_sys_sync=no ;; + arm*-*-eabi*) + # Older versions of the Arm architecture lack the necessary instructions + # for these constructs, so check whether we can use them. + AC_LINK_IFELSE( + [AC_LANG_PROGRAM([int i;], +[__sync_bool_compare_and_swap (&i, i, i); + __sync_lock_test_and_set (&i, 1); + __sync_lock_release (&i);])], + [libbacktrace_cv_sys_sync=yes], + [libbacktrace_cv_sys_sync=no]);; *) libbacktrace_cv_sys_sync=yes ;; esac else -- 2.34.1
[PATCH] options: Adjust cl_optimization_compare to avoid checking ICE [PR115913]
Hello- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115913#c10 This tweak to optc-save-gen.awk weakens the check performed by cl_optimization_compare() to avoid checking asserts that have been there since this function was first added in r11-1141. Is it OK for 15 please? I think it would be appropriate to backport back to 12,13,14 as well. Bootstrap + regtest all languages on x86-64 Linux with no issues. Thanks! -Lewis -- >8 -- At the end of a sequence like: #pragma GCC push_options ... #pragma GCC pop_options the handler for pop_options calls cl_optimization_compare() (as generated by optc-save-gen.awk) to make sure that all global state has been restored to the value it had prior to the push_options call. The verification is performed for almost all entries in the global_options struct. This leads to unexpected checking asserts, as discussed in the PR, in case the state of warnings-related options has been intentionally modified in between push_options and pop_options via a call to #pragma GCC diagnostic. Address that by skipping the verification for CL_WARNING-flagged options. gcc/ChangeLog: PR middle-end/115913 * optc-save-gen.awk (cl_optimization_compare): Skip options with CL_WARNING flag. gcc/testsuite/ChangeLog: PR middle-end/115913 * c-c++-common/cpp/pr115913.c: New test. --- gcc/optc-save-gen.awk | 5 + gcc/testsuite/c-c++-common/cpp/pr115913.c | 7 +++ 2 files changed, 12 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/cpp/pr115913.c diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk index fa9218472ed..a3d7e5a478e 100644 --- a/gcc/optc-save-gen.awk +++ b/gcc/optc-save-gen.awk @@ -1484,6 +1484,11 @@ for (i = 0; i < n_opts; i++) { if (name == "") continue; + # We do not want to compare warning-related options, since they + # might have been modified by a #pragma GCC diagnostic. + if (flag_set_p("Warning", flags[i])) + continue; + if (name in checked_options) continue; checked_options[name]++ diff --git a/gcc/testsuite/c-c++-common/cpp/pr115913.c b/gcc/testsuite/c-c++-common/cpp/pr115913.c new file mode 100644 index 000..b9d10cda8d2 --- /dev/null +++ b/gcc/testsuite/c-c++-common/cpp/pr115913.c @@ -0,0 +1,7 @@ +/* { dg-do preprocess } */ +/* PR middle-end/115913 */ +#pragma GCC push_options +#pragma GCC diagnostic warning "-Wundef" +/* The call to cl_optimization_compare performed by pop_options should not + lead to a checking failure. */ +#pragma GCC pop_options
Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
On Mon, 27 Jan 2025 at 13:30, Torbjorn SVENSSON wrote: > > Hi Christophe, > > On 2025-01-27 13:07, Christophe Lyon wrote: > > Hi Torbjorn, > > > > On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON > > wrote: > >> > >> Another ping... :) > >> > >> Kind regards, > >> Torbjörn > >> > >> On 2024-12-18 18:35, Torbjorn SVENSSON wrote: > >>> Gentle ping :) > >>> > >>> Kind regards, > >>> Torbjörn > >>> > >>> On 2024-11-14 17:16, Torbjorn SVENSSON wrote: > > > On 2024-11-14 16:32, Christophe Lyon wrote: > > On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON > > wrote: > >> > >> Ok for trunk and releases/gcc-14? > >> > >> -- > >> > >> A long time ago, this test forced -march=armv6. > >> > >> With -marm, the generated assembler is: > >> foo: > >> sub r0, r0, #48 > >> cmp r0, #9 > >> movhi r0, #0 > >> movls r0, #1 > >> bx lr > >> > >> With -mthumb, the generated assembler is: > >> foo: > >> subsr0, r0, #48 > >> movsr2, #9 > >> uxtbr3, r0 > >> movsr0, #0 > >> cmp r2, r3 > >> adcsr0, r0, r0 > >> uxtbr0, r0 > >> bx lr > >> > >> Require effective-target arm_arm_ok to skip the test for thumb-only > >> targets (Cortex-M). > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/arm/unsigned-extend-1.c: Use effective-target > >> arm_arm_ok. > >> > >> Signed-off-by: Torbjörn SVENSSON > >> --- > >>gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + > >>1 file changed, 1 insertion(+) > >> > >> diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ > >> testsuite/gcc.target/arm/unsigned-extend-1.c > >> index 3b4ab048fb0..73f2e1a556d 100644 > >> --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c > >> +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c > >> @@ -1,4 +1,5 @@ > >>/* { dg-do compile } */ > >> +/* { dg-require-effective-target arm_arm_ok } */ > >>/* { dg-options "-O2" } */ > >> > > > > So I'm guessing arm_arm_ok fails when you are testing for M-profile > > targets? > > What happens when you test for A-profile, providing -mthumb as part of > > your runtest flags? > > > > > > Looking at our CI results, without your patch the test passes on: > > cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode, > > and it fails only on cortex-m0 and m23. > > Does this match your observations? > > Without my patch, I see failure on M0, M0+ and M23. > With my patch, it goes to unsupported on all Cortex-M targets that I test. > > > So your patch will make the test unsupported on m33, m3, m55 and m7 > > while it currently passes? > > > > Am I missing something? > > I think you might have missed my comment down > > > > > Thanks, > > > > Christophe > > > > For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft): > > Testing arm/unsigned-extend-1.c > check_compile tool: gcc for arm_arm_ok > doing compile > Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - > mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output > -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s > (timeout = 800) > spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - > mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output - > marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s > pid is 21254 -21254 > cc1: error: target CPU does not support ARM mode > pid is -1 > close result is 21254 exp6 0 1 > output is cc1: error: target CPU does not support ARM mode > status 1 > compiler exited with status 1 > UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c > > > For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft): > > Testing arm/unsigned-extend-1.c > check_compile tool: gcc for arm_arm_ok > doing compile > Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - > mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output > -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s > (timeout = 800) > spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - > mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output - > marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s > pid is 21434 -21434 > pid is -1 > output is status 0 > doing compile > Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/ > gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- > abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno- > ident -S -o unsigned-extend
Re: [PATCH v3 0/4] Hard Register Constraints
On 1/26/25 12:24 PM, Jakub Jelinek wrote: On Sun, Jan 26, 2025 at 08:35:29AM -0700, Jeff Law wrote: On 1/23/25 8:49 AM, Stefan Schulze Frielinghaus wrote: On Sat, Jan 18, 2025 at 09:36:14AM -0700, Jeff Law wrote: [...] Do we detect conflicts between a hard register constraint and another constraint which requires a singleton class? That's going to be an error I suspect, but curious if it's handled. That is a good point. Currently I suspect no. I will have a look. Thanks. It's not the most important thing on our plate, but given the way x86 is structured we probably need to do something sensible here. I also worry a bit about non-singleton classes that the target may have added to CLASS_LIKELY_SPILLED_P, though unlike the singleton case, there's at least a chance these will work, albeit potentially generating poor code when an object needs spilling. I also don't think it's terribly common to add non-singleton classes to that set. I was first worried that the single register class construct is somewhat special. To me, it turns out that they behave very similar to my current draft. Basically during LRA in process_alt_operands() I'm installing Yea, I would think they'd largely behave like your proposal. Given the presence of a singleton class one could use that to write an ASM just as effectively as your hard register proposal. And satisfying the constraints should boil down to the same basic process. What your proposal does is give users fine grained control as-if the port had a singleton class for every register -- without us having to add all those pesky register classes. Though, don't we have various hacks for small register classes? I mean e.g. targetm.class_likely_spilled_p calls in combine/cse/loop-invariant etc. If all the registers could be made to behave similarly, don't we need to create those similarly? Unclear at this point, though leaning towards not strictly requiring a change to that code, though we may be able to clean it up as a result of Stefan's work. Essentially he's code code to split ranges, which in theory we could use to split ranges of pseudos in likely spilled classes and perhaps simplify or improve that code. Jeff
[PATCH] tree-optimization/114052 - wrong code with niter analysis
The following fixes part of the issue that niter analysis uses UB of stmts that might not execute due to inner infinite loops (the testcase in the PR) or due to calls terminating the program or the function. This patch addresses inner infinite loops by copying parts of the analysis loop invariant motion does. This requires enumerating the loop body in DOM order rather than DFS order. On trunk the testcase no longer triggers the bug, but it is present on the branches where this patch fixes the issue. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Any objections? The missing related bits are /* If LOOP exits from this BB stop processing. */ FOR_EACH_EDGE (e, ei, bb->succs) if (!flow_bb_inside_loop_p (loop, e->dest)) break; if (e) break; /* A loop might be infinite (TODO use simple loop analysis to disprove this if possible). */ if (bb->flags & BB_IRREDUCIBLE_LOOP) break; and of course call handling. But somehow I'd like to see relevant testcases since this all likely will cause some (too optimistic) missed optimizations. Richard. PR tree-optimization/114052 * tree-ssa-loop-niter.cc (estimate_numbers_of_iterations): Enumerate the loop body in DOM order. (infer_loop_bounds_from_undefined): Track which subloop we walk, when exiting a subloop that may be not finite terminate the walk. * gcc.dg/pr114052-1.c: New testcase. --- gcc/testsuite/gcc.dg/pr114052-1.c | 41 +++ gcc/tree-ssa-loop-niter.cc| 19 +- 2 files changed, 59 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/pr114052-1.c diff --git a/gcc/testsuite/gcc.dg/pr114052-1.c b/gcc/testsuite/gcc.dg/pr114052-1.c new file mode 100644 index 000..920d32b1a65 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr114052-1.c @@ -0,0 +1,41 @@ +/* { dg-do run } */ +/* { dg-require-effective-target alarm } */ +/* { dg-require-effective-target signal } */ +/* { dg-options "-O2" } */ + +#include +#include +#include +#include + +volatile int y; +void __attribute__((noipa)) put(int x) +{ + if (y) +__builtin_printf ("%i\n", x); +} + +void __attribute__((noipa)) f(void) +{ + int counter = 0; + while (1) { + if (counter >= 2) continue; + put (counter++); + } +} + +void do_exit (int i) +{ + exit (0); +} + +int main() +{ + struct sigaction s; + sigemptyset (&s.sa_mask); + s.sa_handler = do_exit; + s.sa_flags = 0; + sigaction (SIGALRM, &s, NULL); + alarm (1); + f(); +} diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index de8d5ae6233..2bd349e58bd 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -4451,11 +4451,28 @@ infer_loop_bounds_from_undefined (class loop *loop, basic_block *bbs) gimple_stmt_iterator bsi; basic_block bb; bool reliable; + class loop *inn_loop = loop; for (i = 0; i < loop->num_nodes; i++) { bb = bbs[i]; + if (!flow_bb_inside_loop_p (inn_loop, bb)) + { + /* When we are leaving a possibly infinite inner loop +we have to stop processing. */ + if (!finite_loop_p (inn_loop)) + break; + /* If the loop was finite we can continue with processing +the loop we exited to. */ + inn_loop = bb->loop_father; + } + + if (bb->loop_father->header == bb) + /* Record that we enter into a subloop since it might not + be finite. */ + inn_loop = bb->loop_father; + /* If BB is not executed in each iteration of the loop, we cannot use the operations in it to infer reliable upper bound on the # of iterations of the loop. However, we can use it as a guess. @@ -4885,7 +4902,7 @@ estimate_numbers_of_iterations (class loop *loop) diagnose those loops with -Waggressive-loop-optimizations. */ number_of_latch_executions (loop); - basic_block *body = get_loop_body (loop); + basic_block *body = get_loop_body_in_dom_order (loop); auto_vec exits = get_loop_exit_edges (loop, body); likely_exit = single_likely_exit (loop, exits); FOR_EACH_VEC_ELT (exits, i, ex) -- 2.43.0
[PATCH] tree-optimization/118653 - ICE in vectorizable_live_operation
The checking code didn't take into account debug uses. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/118653 * tree-vect-loop.cc (vectorizable_live_operation): Also allow out-of-loop debug uses. * gcc.dg/vect/pr118653.c: New testcase. --- gcc/testsuite/gcc.dg/vect/pr118653.c | 15 +++ gcc/tree-vect-loop.cc| 3 ++- 2 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/vect/pr118653.c diff --git a/gcc/testsuite/gcc.dg/vect/pr118653.c b/gcc/testsuite/gcc.dg/vect/pr118653.c new file mode 100644 index 000..9322b23a17d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr118653.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -g" } */ + +static short func1 (short si1, short si2) { + return (si1 - si2); +} +unsigned short g_72; +extern int g_100[]; +short g_173; +void func_42(void) +{ + for (g_173 = 10; g_173 > 0; g_173 = func1 (g_173, 1)) +for (g_72 = 1; g_72 < 5; g_72++) + g_100[g_72] &= 1; +} diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index edd7d4d8763..ce674a71e8a 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -11574,7 +11574,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, /* There a no further out-of-loop uses of lhs by LC-SSA construction. */ FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs) - gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))); + gcc_assert (is_gimple_debug (use_stmt) + || flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))); } else { -- 2.43.0
Re: [PATCH] tree-optimization/112859 - bogus loop distribution
Hello, On Thu, 23 Jan 2025, Richard Biener wrote: > When we get a zero distance vector we still have to check for the > situation of a common inner loop with zero distance. But we can > still allow a zero distance for the loop we distribute > (gcc.dg/tree-ssa/ldist-33.c is such a case). This is because > zero distances in non-outermost loops are a misrepresentation > of dependence by dependence analysis. I think as long as that is the case your proposed changes makes sense. But perhaps it's worth a comment to that effect, i.e. that because dependence analysis is wonky (sometime using zero as unknown) we do these "strange" tests. At least for the occasional reader it will be unintuitive why the obvious tests aren't working. Ciao, Michael.
[PATCH] RISC-V: testsuite: Fix gather_load_64-12-zvbb.c
Hi, the test fails with _zvfh because we vectorize more. Just adjust the test expectations. I regtested this a while ago on rv64gcv_zvl512b. Will commit as obvious if the CI is happy. Regards Robin gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c: Distinguish between zvfh and !zvfh. --- .../riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c index de5a5ed7d56..698f0091390 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c @@ -106,7 +106,8 @@ TEST_LOOP (_Float16, uint64_t) TEST_LOOP (float, uint64_t) TEST_LOOP (double, uint64_t) -/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 80 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 80 "vect" { target { ! riscv_zvfh } } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 88 "vect" { target riscv_zvfh } } } */ /* { dg-final { scan-tree-dump " \.MASK_LEN_GATHER_LOAD" "vect" } } */ /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "vect" } } */ /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "vect" } } */ -- 2.47.1
[PATCH] RISC-V: testsuite: Fix reduc-8.c and reduc-9.c
Hi, in both tests we expect a VEC_SHL_INSERT expression but we now add the initial value at the end. Just remove that scan check. Will commit as obvious after the CI run. Regards Robin --- gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c | 1 - gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c | 1 - 2 files changed, 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c index fe47aa3648d..518f0c33cc4 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-8.c @@ -12,5 +12,4 @@ add_loop (int *x, int n, int res) return res; } -/* { dg-final { scan-tree-dump-times "VEC_SHL_INSERT" 1 "optimized" } } */ /* { dg-final { scan-assembler-times {vslide1up\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[a-x0-9]+} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c index 6630d302721..a5bb8dcccb8 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-9.c @@ -12,5 +12,4 @@ add_loop (float *x, int n, float res) return res; } -/* { dg-final { scan-tree-dump-times "VEC_SHL_INSERT" 1 "optimized" } } */ /* { dg-final { scan-assembler-times {vfslide1up\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[a-x0-9]+} 1 } } */ -- 2.47.1
Re: [PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends
On Mon, Jan 27, 2025 at 11:09:38AM +0100, Richard Biener wrote: > PR rtl-optimization/118662 > * combine.cc (try_combine): When re-materializing a load > from an extended reg by a lowpart subreg make sure we're > dealing with single-component modes. > > * gcc.dg/torture/pr118662.c: New testcase. > --- > gcc/combine.cc | 5 + > gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++ > 2 files changed, 23 insertions(+) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c > > diff --git a/gcc/combine.cc b/gcc/combine.cc > index a2d4387cebe..4849603ba5e 100644 > --- a/gcc/combine.cc > +++ b/gcc/combine.cc > @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, > rtx_insn *i0, > copy. This saves at least one insn, more if register allocation can > eliminate the copy. > > + We cannot do this if the involved modes have more than one elements, > + like for vector or complex modes. > + > We cannot do this if the destination of the first assignment is a > condition code register. We eliminate this case by making sure > the SET_DEST and SET_SRC have the same mode. > @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, > rtx_insn *i0, > && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND > && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0))) > == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0 > +&& known_eq (GET_MODE_NUNITS > + (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1) If you want to rule this out for vector/complex modes, then the above doesn't do that, all the V1??mode will still be optimized, and I think that is undesirable. So, either && !VECTOR_MODE_P (GET_MODE (...)) && !COMPLEX_MODE_P (GET_MODE (...)) or perhaps as less expensive but less readable check && GET_MODE_INNER (GET_MODE (...)) == GET_MODE (...) with a comment. Jakub
[PATCH] middle-end/118643 - ICE with out-of-bound decl access
When RTL expansion of an out-of-bound access of a register falls back to a BIT_FIELD_REF we have to ensure that's valid. The following avoids negative offsets by expanding through a stack temporary. Bootstrap and regtest running on x86_64-unknown-linux-gnu. OK if that succeeds? Thanks, Richard. PR middle-end/118643 * expr.cc (expand_expr_real_1): Avoid falling back to BIT_FIELD_REF expansion for negative offset. * gcc.dg/pr118643.c: New testcase. --- gcc/expr.cc | 8 +--- gcc/testsuite/gcc.dg/pr118643.c | 11 +++ 2 files changed, 16 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr118643.c diff --git a/gcc/expr.cc b/gcc/expr.cc index a310b2d9131..4bfcde54523 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -11796,13 +11796,15 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, && known_eq (GET_MODE_BITSIZE (DECL_MODE (base)), type_size)) return expand_expr (build1 (VIEW_CONVERT_EXPR, type, base), target, tmode, modifier); - if (TYPE_MODE (type) == BLKmode) + if (TYPE_MODE (type) == BLKmode + || maybe_lt (offset, 0)) { temp = assign_stack_temp (DECL_MODE (base), GET_MODE_SIZE (DECL_MODE (base))); store_expr (base, temp, 0, false, false); - temp = adjust_address (temp, BLKmode, offset); - set_mem_size (temp, int_size_in_bytes (type)); + temp = adjust_address (temp, TYPE_MODE (type), offset); + if (TYPE_MODE (type) == BLKmode) + set_mem_size (temp, int_size_in_bytes (type)); return temp; } exp = build3 (BIT_FIELD_REF, type, base, TYPE_SIZE (type), diff --git a/gcc/testsuite/gcc.dg/pr118643.c b/gcc/testsuite/gcc.dg/pr118643.c new file mode 100644 index 000..ff2b081ae0b --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr118643.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O" } */ + +typedef __attribute__((__vector_size__(1))) unsigned char V; + +V x; +void foo() +{ + V v = x; + x = *(V *)(&v - 1); +} -- 2.43.0
Re: [PATCH] middle-end/118643 - ICE with out-of-bound decl access
On Mon, Jan 27, 2025 at 11:32:17AM +0100, Richard Biener wrote: > When RTL expansion of an out-of-bound access of a register falls > back to a BIT_FIELD_REF we have to ensure that's valid. The > following avoids negative offsets by expanding through a stack > temporary. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > OK if that succeeds? > > Thanks, > Richard. > > PR middle-end/118643 > * expr.cc (expand_expr_real_1): Avoid falling back to BIT_FIELD_REF > expansion for negative offset. > > * gcc.dg/pr118643.c: New testcase. > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -11796,13 +11796,15 @@ expand_expr_real_1 (tree exp, rtx target, > machine_mode tmode, > && known_eq (GET_MODE_BITSIZE (DECL_MODE (base)), type_size)) > return expand_expr (build1 (VIEW_CONVERT_EXPR, type, base), > target, tmode, modifier); > - if (TYPE_MODE (type) == BLKmode) > + if (TYPE_MODE (type) == BLKmode > + || maybe_lt (offset, 0)) The whole condition would fit on one line. > { > temp = assign_stack_temp (DECL_MODE (base), > GET_MODE_SIZE (DECL_MODE (base))); > store_expr (base, temp, 0, false, false); > - temp = adjust_address (temp, BLKmode, offset); > - set_mem_size (temp, int_size_in_bytes (type)); > + temp = adjust_address (temp, TYPE_MODE (type), offset); > + if (TYPE_MODE (type) == BLKmode) > + set_mem_size (temp, int_size_in_bytes (type)); > return temp; > } > exp = build3 (BIT_FIELD_REF, type, base, TYPE_SIZE (type), Otherwise LGTM. Jakub
[Patch, fortran] PR118640 - [15 Regression] cp2k ICE in gfc_conv_expr_present since r15-5347
Hi All, Pushed as an 'obvious' one-liner(less additional comments) in r15-7224 Fortran: ICE in gfc_conv_expr_present w. defined assignment [PR118640] 2025-01-27 Paul Thomas gcc/fortran PR fortran/118640 * resolve.cc (generate_component_assignments): Make sure that the rhs temporary does not pick up the optional attribute from the lhs. gcc/testsuite/ PR fortran/118640 * gfortran.dg/pr118640.f90: New test. Paul
[PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends
The following fixes an issue in the RTL combiner where we correctly combine two vector sign-exxtends with a vector load Trying 7, 9 -> 10: 7: r106:V4QI=[r119:DI] REG_DEAD r119:DI 9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel)) 10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel)) REG_DEAD r108:V4HI to modifying insn i2 9: r109:V4SI=sign_extend([r119:DI]) but since r106 is used we wrongly materialize it using a subreg: modifying insn i310: r106:V4QI=r109:V4SI#0 which of course does not work for modes with more than one component. Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu. Note the check allows subreg:V1QI reg:V1SI (which I think is OK). There's no SCALAR_MODE_P, maybe the other checks guarantee it's an integer mode so eventually SCALAR_INT_MODE_P covers everything important (it wouldn't cover V1QI, not that that's important). OK? Or do you prefer a different check - which? Thanks, Richard. PR rtl-optimization/118662 * combine.cc (try_combine): When re-materializing a load from an extended reg by a lowpart subreg make sure we're dealing with single-component modes. * gcc.dg/torture/pr118662.c: New testcase. --- gcc/combine.cc | 5 + gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++ 2 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c diff --git a/gcc/combine.cc b/gcc/combine.cc index a2d4387cebe..4849603ba5e 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, copy. This saves at least one insn, more if register allocation can eliminate the copy. + We cannot do this if the involved modes have more than one elements, + like for vector or complex modes. + We cannot do this if the destination of the first assignment is a condition code register. We eliminate this case by making sure the SET_DEST and SET_SRC have the same mode. @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, && GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND && (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0))) == GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0 + && known_eq (GET_MODE_NUNITS + (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1) && GET_CODE (XVECEXP (newpat, 0, 1)) == SET && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)), XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0)) diff --git a/gcc/testsuite/gcc.dg/torture/pr118662.c b/gcc/testsuite/gcc.dg/torture/pr118662.c new file mode 100644 index 000..b9e8cca0aeb --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr118662.c @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */ +/* { dg-additional-options "-msse4" { target sse4_runtime} } */ + +int __attribute__((noipa)) addup(signed char *num) { + int val = num[0] + num[1] + num[2] + num[3]; + if (num[3] >= 0) +val++; + return val; +} + +int main(int, char *[]) +{ + signed char num[4] = {1, 1, 1, -1}; + if (addup(num) != 2) +__builtin_abort(); + return 0; +} -- 2.43.0
[PATCH/GCC16 0/1] AArch64: Emit half-precision FCMP/FCMPE
This patch allows the AArch64 back end to emit the half-precision variants of FCMP and FCMPE, given the target supports FEAT_FP16. Previously, such comparisons would be unnecessarily promoted to single-precision. The latest documentation of these instructions can be found here: https://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29 +- .../gcc.target/aarch64/_Float16_cmp_1.c | 54 +++ .../gcc.target/aarch64/_Float16_cmp_2.c | 7 +++ 3 files changed, 77 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c -- 2.34.1
[PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch64/_Float16_cmp_1.c: New test. * gcc.target/aarch64/_Float16_cmp_2.c: New (negative) test. --- gcc/config/aarch64/aarch64.md | 29 +- .../gcc.target/aarch64/_Float16_cmp_1.c | 54 +++ .../gcc.target/aarch64/_Float16_cmp_2.c | 7 +++ 3 files changed, 77 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 071058dbeb3..8721bf5d4f3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -707,11 +707,12 @@ ) (define_expand "cbranch4" - [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" - [(match_operand:GPF 1 "register_operand") -(match_operand:GPF 2 "aarch64_fp_compare_operand")]) - (label_ref (match_operand 3 "" "")) - (pc)))] + [(set (pc) (if_then_else + (match_operator 0 "aarch64_comparison_operator" +[(match_operand:GPF_F16 1 "register_operand") + (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")]) + (label_ref (match_operand 3 "" "")) + (pc)))] "" " operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), operands[1], @@ -4338,26 +4339,28 @@ (define_insn "fcmp" [(set (reg:CCFP CC_REGNUM) -(compare:CCFP (match_operand:GPF 0 "register_operand") - (match_operand:GPF 1 "aarch64_fp_compare_operand")))] + (compare:CCFP + (match_operand:GPF_F16 0 "register_operand") + (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))] "TARGET_FLOAT" {@ [ cons: 0 , 1 ] [ w , Y ] fcmp\t%0, #0.0 [ w , w ] fcmp\t%0, %1 } - [(set_attr "type" "fcmp")] + [(set_attr "type" "fcmp")] ) (define_insn "fcmpe" [(set (reg:CCFPE CC_REGNUM) -(compare:CCFPE (match_operand:GPF 0 "register_operand") - (match_operand:GPF 1 "aarch64_fp_compare_operand")))] + (compare:CCFPE + (match_operand:GPF_F16 0 "register_operand") + (match_operand:GPF_F16 1 "aarch64_fp_compare_operand")))] "TARGET_FLOAT" {@ [ cons: 0 , 1 ] [ w , Y ] fcmpe\t%0, #0.0 [ w , w ] fcmpe\t%0, %1 } - [(set_attr "type" "fcmp")] + [(set_attr "type" "fcmp")] ) (define_insn "*cmp_swp__reg" @@ -4425,8 +4428,8 @@ (define_expand "cstore4" [(set (match_operand:SI 0 "register_operand") (match_operator:SI 1 "aarch64_comparison_operator_mode" -[(match_operand:GPF 2 "register_operand") - (match_operand:GPF 3 "aarch64_fp_compare_operand")]))] +[(match_operand:GPF_F16 2 "register_operand") + (match_operand:GPF_F16 3 "aarch64_fp_compare_operand")]))] "" " operands[2] = aarch64_gen_compare_reg (GET_CODE (operands[1]), operands[2], diff --git a/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c new file mode 100644 index 000..e49ace1d7dc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_1.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8.2-a+fp16" } */ + +/* +** test_fcmp_store: +** fcmph0, h1 +** csetw0, eq +** ret +*/ +int +test_fcmp_store(_Float16 a, _Float16 b) +{ +return a == b; +} + +/* +** test_fcmpe_store: +** fcmpe h0, h1 +** csetw0, mi +** ret +*/ +int +test_fcmpe_store(_Float16 a, _Float16 b) +{ +return a < b; +} + +/* +** test_fcmp_branch: +** fcmph0, h1 +** ... +*/ +_Float16 +test_fcmp_branch(_Float16 a, _Float16 b) +{ +if (a == b) +return a * b; +return a; +} + +/* +** test_fcmpe_branch: +** fcmpe h0, h1 +** ... +*/ +_Float16 +test_fcmpe_branch(_Float16 a, _Float16 b) +{ +if (a < b) +return a * b; +return a; +} + +/* { dg-final { check-function-bodies "**" "" "" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c new file mode 100644 index 000..e714304970b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/_Float16_cmp_2.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8.2-a+nofp16" } */ + +#include "_Float16_cmp_1.c" + +/* { dg-final { scan-assembler-not "\tfcmp\th\[0-9\]\+" } } */ +/* { dg-final { scan-assembler-not "\tfcmpe\th\[0-9\]\+" } } */ \ No newline at end of file --
Re: [RFC PATCH] i386: Re-alias -mavx10.2 to 512 bit and make -mno-avx10.x-512 disable the whole AVX10.x
On Mon, Jan 27, 2025 at 8:30 AM Haochen Jiang wrote: > > Hi all, > > AVX10 has been published for one and half year and we have got many feedbacks > on that, one of the feedback is on whether the alias option -mavx10.x should > point to 256 or 512. > > If you also pay attention to LLVM community, you might see this thread related > to AVX10 options just sent out several hours ago: > > [X86][AVX10] Disable m[no-]avx10.1 and switch m[no-]avx10.2 to alias of 512 > bit options > https://github.com/llvm/llvm-project/pull/124511 > > In GCC, we will also do so. This RFC patch is slightly different with LLVM, > just > including: > > - Switch -m[no-]avx10.2 to alias of 512 bit options. > - Change -mno-avx10.[1,2]-512 to disable both 256 and 512 instructions. This > will also result in -mno-avx10.2 would still disable both 256 and 512 insts > according to new alias point to 512. > > But not including disabling -m[no-]avx10.1, since I still want more input on > how to handle that. We actually have three choices on that: > > a. Directly re-alias -m[no-]avx10.1 to -m[no-]avx10.1-512 GCC 15 and backport > to GCC 14. > b. Disable -m[no]-avx10.1 in GCC 15, and add it back with -m[no-]avx10.1-512 > in the future. This is for in case if someone cross compile with different > versions > of GCC with -mavx10.1, it might get unexpected result sliently. > c. Disable -m[no]-avx10.1 in GCC 15, and never add it back. Since the option > has > been 256 bit, changing them back and forth is messy. > > It might be the final chance we could change the alias option since real > AVX10.1 hardware is coming soon. And it is only x86 specific, so it might > still > squeeze into GCC 15 at this time. > > I call this patch RFC patch since we also need to change the doc and testcases > accordingly, which makes this patch incomplete. Discussion and input is > welcomed > on this topic. Can you re-hash on how users need to select 256bit vs 512bit support? I understand the above change basically makes -m[no-]avx10.[12] gate ISA features but not size? So that will now enable 512bit unless -mno-evex512 is given? Given for the -mavx10.[12]-{256,512} the behavior changes compared to GCC 14 I'd rather drop the options that behave differently from GCC 14 on GCC 15 than changing their meaning. That unfortunately will make them a hard error (but I don't expect much use). I'm not sure it's worth retaining -m[no-]avx10.[12]-512. But maybe I'm misunderstanding the change (too many avx10.x related options are there). Richard. > > Thx, > Haochen > > --- > gcc/common/config/i386/i386-common.cc | 30 +-- > gcc/common/config/i386/i386-isas.h| 2 +- > gcc/config/i386/i386-options.cc | 2 +- > gcc/config/i386/i386.opt | 4 ++-- > gcc/doc/extend.texi | 8 --- > gcc/doc/sourcebuild.texi | 4 ++-- > 6 files changed, 25 insertions(+), 25 deletions(-) > > diff --git a/gcc/common/config/i386/i386-common.cc > b/gcc/common/config/i386/i386-common.cc > index 52ad1c5acd1..3891fca8ecb 100644 > --- a/gcc/common/config/i386/i386-common.cc > +++ b/gcc/common/config/i386/i386-common.cc > @@ -325,14 +325,12 @@ along with GCC; see the file COPYING3. If not see > #define OPTION_MASK_ISA2_APX_F_UNSET OPTION_MASK_ISA2_APX_F > #define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512 > #define OPTION_MASK_ISA2_USER_MSR_UNSET OPTION_MASK_ISA2_USER_MSR > -#define OPTION_MASK_ISA2_AVX10_1_256_UNSET \ > - (OPTION_MASK_ISA2_AVX10_1_256 | OPTION_MASK_ISA2_AVX10_1_512_UNSET \ > - | OPTION_MASK_ISA2_AVX10_2_256_UNSET) > -#define OPTION_MASK_ISA2_AVX10_1_512_UNSET \ > - (OPTION_MASK_ISA2_AVX10_1_512 | OPTION_MASK_ISA2_AVX10_2_512_UNSET) > -#define OPTION_MASK_ISA2_AVX10_2_256_UNSET OPTION_MASK_ISA2_AVX10_2_256 > -#define OPTION_MASK_ISA2_AVX10_2_512_UNSET \ > - (OPTION_MASK_ISA2_AVX10_2_512 | OPTION_MASK_ISA2_AMX_AVX512_UNSET) > +#define OPTION_MASK_ISA2_AVX10_1_UNSET \ > + (OPTION_MASK_ISA2_AVX10_1_256 | OPTION_MASK_ISA2_AVX10_1_512 \ > + | OPTION_MASK_ISA2_AVX10_2_UNSET) > +#define OPTION_MASK_ISA2_AVX10_2_UNSET \ > + (OPTION_MASK_ISA2_AVX10_2_256 | OPTION_MASK_ISA2_AVX10_2_512 \ > + OPTION_MASK_ISA2_AMX_AVX512_UNSET) > #define OPTION_MASK_ISA2_AMX_AVX512_UNSET OPTION_MASK_ISA2_AMX_AVX512 > #define OPTION_MASK_ISA2_AMX_TF32_UNSET OPTION_MASK_ISA2_AMX_TF32 > #define OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET OPTION_MASK_ISA2_AMX_TRANSPOSE > @@ -1378,8 +1376,8 @@ ix86_handle_option (struct gcc_options *opts, > } >else > { > - opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_256_UNSET; > - opts->x_ix86_isa_flags2_explicit |= > OPTION_MASK_ISA2_AVX10_1_256_UNSET; > + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_1_UNSET; > + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_UNSET; > opts->x_ix86_no_avx10_1_explicit = 1; > } >return true; > @@ -1394,8 +1392,8 @@ ix86
Re: [PATCH] rtl-optimization/118662 - wrong combination of vector sign-extends
On Mon, 27 Jan 2025, Jakub Jelinek wrote: > On Mon, Jan 27, 2025 at 11:09:38AM +0100, Richard Biener wrote: > > PR rtl-optimization/118662 > > * combine.cc (try_combine): When re-materializing a load > > from an extended reg by a lowpart subreg make sure we're > > dealing with single-component modes. > > > > * gcc.dg/torture/pr118662.c: New testcase. > > --- > > gcc/combine.cc | 5 + > > gcc/testsuite/gcc.dg/torture/pr118662.c | 18 ++ > > 2 files changed, 23 insertions(+) > > create mode 100644 gcc/testsuite/gcc.dg/torture/pr118662.c > > > > diff --git a/gcc/combine.cc b/gcc/combine.cc > > index a2d4387cebe..4849603ba5e 100644 > > --- a/gcc/combine.cc > > +++ b/gcc/combine.cc > > @@ -3904,6 +3904,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn > > *i1, rtx_insn *i0, > > copy. This saves at least one insn, more if register allocation can > > eliminate the copy. > > > > + We cannot do this if the involved modes have more than one elements, > > + like for vector or complex modes. > > + > > We cannot do this if the destination of the first assignment is a > > condition code register. We eliminate this case by making sure > > the SET_DEST and SET_SRC have the same mode. > > @@ -3919,6 +3922,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn > > *i1, rtx_insn *i0, > >&& GET_CODE (SET_SRC (XVECEXP (newpat, 0, 0))) == SIGN_EXTEND > >&& (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0))) > >== GET_MODE (SET_SRC (XVECEXP (newpat, 0, 0 > > + && known_eq (GET_MODE_NUNITS > > + (GET_MODE (SET_DEST (XVECEXP (newpat, 0, 0, 1) > > If you want to rule this out for vector/complex modes, then the above > doesn't do that, all the V1??mode will still be optimized, and I think > that is undesirable. If you think so (I think it should work fine for V1??mode). > So, either && !VECTOR_MODE_P (GET_MODE (...)) > && !COMPLEX_MODE_P (GET_MODE (...)) > or perhaps as less expensive but less readable check > && GET_MODE_INNER (GET_MODE (...)) == GET_MODE (...) Had the latter before figuring the GET_MODE_NUNITS variant. I do prefer a "positive" check, rtl.texi says sign_extend applies to all fixed-point modes but ALL_SCALAR_FIXED_POINT_MODE_P doesn't include regular MODE_INT. I guess I'll go with !VECTOR_MODE_P && !COMPLEX_MODE_P, while SCALAR_INT_MODE_P looks safe it might cause regressions. Thanks, Richard. > with a comment. > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH] c++: Handle CWG2867 even in namespace scope structured bindings in header modules [PR115769]
On 1/21/25 10:52 AM, Jakub Jelinek wrote: Hi! On top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662507.html https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662750.html patches (where the first one implements CWG2867 for block scope static or thread_local structured bindings and the latter for namespace scope structured bindings; CWG2867 for automatic structured bindings is already committed in r15-3513) the following patch implements the module streaming of the new STATIC_INIT_DECOMP_BASE_P and STATIC_INIT_DECOMP_NONBASE_P flags. As I think namespace scope structured bindings in the header modules will be pretty rare, I've tried to stream something extra only when they actually appear, in that case it streams extra INTEGER_CSTs which mark end of STATIC_INIT_DECOMP_*BASE_P (0), start of STATIC_INIT_DECOMP_BASE_P for static_aggregates (1), start of STATIC_INIT_DECOMP_NONBASE_P for static_aggregates (2) and ditto for tls_aggregates (3 and 4). The patch also copies with just small tweaks the testcases from the second patch above as header modules. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-01-21 Jakub Jelinek PR c++/115769 gcc/cp/ * module.cc (module_state::write_inits): Verify STATIC_INIT_DECOMP_{,NON}BASE_P flags and stream changes in those out. (module_state::read_inits): Stream those flags in. gcc/testsuite/ * g++.dg/modules/dr2867-1_a.H: New test. * g++.dg/modules/dr2867-1_b.C: New test. * g++.dg/modules/dr2867-2_a.H: New test. * g++.dg/modules/dr2867-2_b.C: New test. * g++.dg/modules/dr2867-3_a.H: New test. * g++.dg/modules/dr2867-3_b.C: New test. * g++.dg/modules/dr2867-4_a.H: New test. * g++.dg/modules/dr2867-4_b.C: New test. --- gcc/cp/module.cc.jj 2025-01-21 09:04:18.085457077 +0100 +++ gcc/cp/module.cc2025-01-21 13:31:40.670938455 +0100 @@ -18723,6 +18723,65 @@ module_state::write_inits (elf_out *to, for (tree init = list; init; init = TREE_CHAIN (init)) if (TREE_LANG_FLAG_0 (init)) { + if (STATIC_INIT_DECOMP_BASE_P (init)) + { + /* Ensure that in the returned result chain if the + STATIC_INIT_DECOMP_*BASE_P flags are set, there is + always one or more STATIC_INIT_DECOMP_BASE_P TREE_LIST + followed by one or more STATIC_INIT_DECOMP_NONBASE_P. */ + int phase = 0; + tree last = NULL_TREE; + for (tree init2 = TREE_CHAIN (init); +init2; init2 = TREE_CHAIN (init2)) + { + if (phase == 0 && STATIC_INIT_DECOMP_BASE_P (init2)) + ; + else if (phase == 0 +&& STATIC_INIT_DECOMP_NONBASE_P (init2)) + { + phase = TREE_LANG_FLAG_0 (init2) ? 2 : 1; + last = init2; + } + else if (IN_RANGE (phase, 1, 2) +&& STATIC_INIT_DECOMP_NONBASE_P (init2)) + { + if (TREE_LANG_FLAG_0 (init2)) + phase = 2; + last = init2; + } + else + break; + } + if (phase == 2) + { + /* In that case, add markers about it so that the + STATIC_INIT_DECOMP_BASE_P and + STATIC_INIT_DECOMP_NONBASE_P flags can be restored. */ + sec.tree_node (build_int_cst (integer_type_node, + 2 * passes + 1)); + phase = 1; + for (tree init2 = init; init2 != TREE_CHAIN (last); +init2 = TREE_CHAIN (init2)) + if (TREE_LANG_FLAG_0 (init2)) + { + tree decl = TREE_VALUE (init2); + if (phase == 1 + && STATIC_INIT_DECOMP_NONBASE_P (init2)) + { + sec.tree_node (build_int_cst (integer_type_node, + 2 * passes + 2)); + phase = 2; + } + dump ("Initializer:%u for %N", count, decl); + sec.tree_node (decl); + ++count; + } + sec.tree_node (integer_zero_node); + init = last; + continue; + } + } + tree decl = TREE_VALUE (init); dump ("Initializer:%u for %N", count, decl); @@ -18793,16 +18852,43 @@ module_state::read_inits (unsigned count dump.indent (); l
[PATCH v10] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]
Hi Jason, On 17 Jan 2025, at 23:33, Jason Merrill wrote: > On 1/17/25 9:52 AM, Simon Martin wrote: >> Hi Jason, >> >> On 16 Jan 2025, at 22:49, Jason Merrill wrote: >> >>> On 10/16/24 11:43 AM, Simon Martin wrote: As you know the patch had to be reverted due to PR117114, that highlighted a bunch of issues with comparing DECL_VINDEXes: it might give false positives in case of multiple inheritance (the case in PR117114), but also if there’s single inheritance by the hierarchy has more than two levels (another issue I found while bootstrapping with rust enabled). >>> >>> Yes, relying on DECL_VINDEX equality was wrong, sorry to mislead >>> you. >>> The attached updated patch introduces an overrides_p function, based on the existing check_final_overrider, and uses it when the signatures match. >>> >>> That seems unnecessary. It seems like removing that only breaks >>> Woverloaded-virt11.C, and making that work again only requires >>> bringing back the check that DECL_VINDEX (fndecl) is set (to any >>> value). Or remembering that fndecl was a template, so it can't >>> really >>> have the same signature as a non-template, whatever same_signature_p >>> says. >> That’s right, only Woverloaded-virt11.C fails without the >> check_final_overrider call. >> >> Thanks for the suggestion to check whether fndecl is a template. This >> is >> what the updated attached patch does, successfully tested on >> x86_64-pc-linux-gnu. >> >> OK for GCC 15? And if so, thoughts on backporting to release branches >> (technically it’s a regression but it’s “just” an incorrect >> warning fix, so probably not worth the risk)? > > Right, I wouldn't backport. > >> +if (warn_overloaded_virtual == 1 >> +&& overrider_fndecls.elements () == num_fns) >> + /* All the fns override a base virtual. */ >> + continue; > > This looks like the only use of the overrider_fndecls hash_set. A > hash_set seems a bit overkill for checking whether everything in fns > is an overrider; keeping track of how many times the old any_override > was set should work just as well? Yeah you’re right :-/ I’ve changed my latest patch to simply count overriders. >> +/* fndecls hides base_fndecls[k]. */ >> +auto_vec &hiders = >> + hidden_base_fndecls.get_or_insert (base_fndecls[k]); >> +if (!hiders.contains (fndecl)) >> + hiders.safe_push (fndecl); > > Hmm, do you think users want a full list of the overloads that don't > override? I'd think the problem is more the overload that doesn't > exist rather than the ones that do. The current code ends up in the > OVERLOAD handling of dump_decl that just prints scope::name. Indeed, the full list is probably not super useful... One problem with the current code is that for conversion operators, it will give a note such as “note: by 'operator’”, so I propose to keep track of at least one of the hiders, and use it to show the note (and get a proper “by 'virtual B::operator char()'” note for conversion operators). Hence the updated patch, successfully tested on x86_64-pc-linux-gnu. Ok for trunk? Simon From 29ee5ada1c2fe132891d09528b6e26aece16e6d1 Mon Sep 17 00:00:00 2001 From: Simon Martin Date: Mon, 27 Jan 2025 14:17:19 +0100 Subject: [PATCH] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918] We currently emit an incorrect -Woverloaded-virtual warning upon the following test case === cut here === struct A { virtual operator int() { return 42; } virtual operator char() = 0; }; struct B : public A { operator char() { return 'A'; } }; === cut here === The problem is that when iterating over ovl_range (fns), warn_hidden gets confused by the conversion operator marker, concludes that seen_non_override is true and therefore emits a warning for all conversion operators in A that do not convert to char, even if -Woverloaded-virtual is 1 (e.g. with -Wall, the case reported). A second set of problems is highlighted when -Woverloaded-virtual is 2. First, with the same test case, since base_fndecls contains all conversion operators in A (except the one to char, that's been removed when iterating over ovl_range (fns)), we emit a spurious warning for the conversion operator to int, even though it's unrelated. Second, in case there are several conversion operators with different cv-qualifiers to the same type in A, we rightfully emit a warning, however the note uses the location of the conversion operator marker instead of the right one; location_of should go over conv_op_marker. This patch fixes all these by explicitly keeping track of (1) base methods that are overriden, as well as (2) base methods that are hidden but not overriden (and by what), and warning about methods that are in (2) but not (1). It also ignores non virtual base methods, per "definition" of -Woverloaded-virtual. Successfully te
Re: [PATCH] c++: Use mapped reads and writes when munmap and msync are available
On 1/16/25 3:15 PM, John David Anglin wrote: Tested on hppa64-hp-hpux11.11 and hppa-unknown-linux-gnu. Okay for trunk? OK. Dave --- c++: Use mapped reads and writes when munmap and msync are available Module support is broken when MAPPED_READING and MAPPED_WRITING are defined to 0. This causes internal compiler errors in the permissive-error-1.C and permissive-error-2.C tests. HP-UX 11.11 doesn't define _POSIX_MAPPED_FILES but it does have munmap and msync. Testing indicates support is sufficient for c++ modules, so use checks for these functions instead of _POSIX_MAPPED_FILES check. 2025-01-16 John David Anglin gcc/ChangeLog: PR c++/116524 * configure.ac: Check for munmap and msync. * configure: Regenerate. * config.in: Regenerate. gcc/cp/ChangeLog: * module.cc: Test HAVE_MUNMAP and HAVE_MSYNC instead of _POSIX_MAPPED_FILES > 0. diff --git a/gcc/configure.ac b/gcc/configure.ac index 6c38c4925fb..8fab93c9365 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -1574,7 +1574,7 @@ AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoq \ popen sysconf strsignal getrusage nl_langinfo \ gettimeofday mbstowcs wcswidth mmap posix_fallocate setlocale \ gcc_UNLOCKED_FUNCS madvise mallinfo mallinfo2 fstatat getauxval \ - clock_gettime) + clock_gettime munmap msync) # At least for glibc, clock_gettime is in librt. But don't pull that # in if it still doesn't give us the function we want. diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 61116fe7669..9b2bbdb2988 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -241,11 +241,11 @@ Classes used: #define MAPPED_READING 0 #define MAPPED_WRITING 0 #else -#if HAVE_MMAP_FILE && _POSIX_MAPPED_FILES > 0 -/* mmap, munmap. */ +#if HAVE_MMAP_FILE && HAVE_MUNMAP && HAVE_MSYNC +/* mmap, munmap, msync. */ #define MAPPED_READING 1 #if HAVE_SYSCONF && defined (_SC_PAGE_SIZE) -/* msync, sysconf (_SC_PAGE_SIZE), ftruncate */ +/* sysconf (_SC_PAGE_SIZE), ftruncate */ /* posix_fallocate used if available. */ #define MAPPED_WRITING 1 #else
Re: [PATCH] gcc: Add tree walk case to reach A pack from B in ...B>. [PR118265]
On Wed, 1 Jan 2025, A J Ryan Solutions Ltd wrote: > Hi and happy new year, this patch is to fix a compiler seg-fault as > encountered in the following example: Hi, thanks for the patch! Your fix makes sense to me, and I believe it also fixes the testcases from PR102626 and at least some of its related PRs. > > template struct Class1{}; > > template class Class2; > template...Un> class Class2 > { > public: void apply(){} > }; > > Class1 class1_bool; > Class1 class1_char; > > int main() > { > Class2 class2; > class2.apply(); > } > where it ends up populating the V argument in the instantiated template at > the class2 declaration with the unknown placeholder and the segfault is when > it later tries to get the name for an incompletely > resolved type. Here's a more reduced example that's fixed by your patch: template struct A { }; template void f(A...); int main() { f(A<0>{}, A<1u>{}, A<2l>{}); // OK, Ts deduced to {int,unsigned,long} } A related PR that isn't and shouldn't be fixed by this patch is PR84796 where the type parameter pack is declared at a different level than the non-type parameter pack, e.g.: template struct A { }; template struct B { template static void f(A); }; int main() { A<1, true, 'c'> a; B::f(a); } Before your patch we would ICE on this valid testcase, after we now reject it with a bogus error: : In function ‘int main()’: :12:24: error: no matching function for call to ‘B::f(A<1, true, 'c'>&)’ :12:24: note: there is 1 candidate :7:15: note: candidate 1: ‘template static void B::f(A) [with Ts ...Vs = {Vs ...}; Ts = {int, bool, char}]’ :7:15: note: template argument deduction/substitution failed: :12:24: note: ‘Vs’ is not equivalent to ‘1’ Which is no worse than ICEing, I suppose :) > > The incorrect processing happens in unify_pack_expansion where it is calling > unify_one_argument > here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l24718) > for each argument of the argument pack and recursively unifying the inner > arguments. In > this example the inner argument against V (bool/char) is set in targs however > it is not seen as a pack argument in unify_pack_expansion because in the > unfixed code it does not reach and link this associated > pack and later on when processing the collected arguments to substitute it in > in instantiate_class_template the argument has not been set as an argument > pack in targs and it doesn't match the the template > specialisation parameter which is a parameter pack and it falls back to the > unknown placeholder. > The parameter packs are originally linked in make_pack_expansion > here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l4246) > and the change is in > this tree walk so that in the example above V is chained to Un and after > calling unify_one_argument it loops over both packs and extracts their > associated value set in targs and puts it into it's pack > argument collection. > > Regards > Adam Ryan > > > Subject: [PATCH] Add tree walk case to obtain A pack in > ...B>. > > For non-type parameter packs when unifying the arguments in > unify_pack_expansion > it iterates over the associated packs of a param so that when it recursivly > unifies the param with the arguments it knows which targs have been populated > with parameter pack arguments that it can then collect up. This change adds a > tree walk so that in the example above it reaches ...A and adds it to the > associated packs for ...B and therefore knows it will have been set in targs > in unify_pack_expansion and processes it as per other pack arguments. > > PR gcc/118265 PR c++/118265 rather A ChangeLog entry is missing from the commit message as per https://gcc.gnu.org/contribute.html#patches Here's an example ChangeLog we can use for this patch: gcc/cp/ChangeLog: * pt.cc (find_parameter_packs_r) : Walk into the type of a parameter pack. > > Signed-off-by: Adam J Ryan > --- > gcc/cp/pt.cc| 5 + > gcc/testsuite/g++.dg/pr118265.C | 17 + > 2 files changed, 22 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/pr118265.C > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index 0ffa0b53e26..22f5d4b1875 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -4012,6 +4012,11 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, > void* data) > &find_parameter_packs_r, ppd, ppd->visited); >return NULL_TREE; > > +case TEMPLATE_PARM_INDEX: > + if (parameter_pack_p) > +WALK_SUBTREE ( TREE_TYPE(t) ); There should be a space before the ( starting a function call not after (as per the GNU coding style), so this should be "WALK_SUBTREE (TREE_TYPE (t));" > + return NULL_TREE; > + > case DECL_EXPR: >{ > tree decl = DECL_EXPR_DECL (t); > dif
Re: [PATCH] RISC-V: ensure needed FRM restore is not eliminable [PR118646]
On 1/27/25 12:48 PM, Vineet Gupta wrote: On 1/26/25 05:29, Jeff Law wrote: On 1/24/25 3:12 PM, Vineet Gupta wrote: RV-Vector FP-INT insns use the rounding mode in FRM register which if explicitly set for V insn needs, is saved/restored (although from the psABI CC Spec, it is not clear if it actually a caller-saved or callee-saved). Anyhow in the failure case the save/restore were generated by the Mode Switch pass, but then eliminated by sched1:DCE and Late-Combine. Fix this by using unspec_volatile variant which won't be eliminated. This showed up as SPEC2017 527.cam4 runtime aborts in glibc:round_away() which checks for standard rounding modes and the "leaking" rounding mode due to the bug happened to be a non-standard RISC-V specific RMM "Round to Nearest, ties to Max". This is testsuite clean: Not sure how it could be clean as I think the test itself is busted ;-) Sorry for the snafu; I would blame "send the patch out on friday" that bit me. I used the raw test for debugging and coming up with the fix, but before integrating the test properly I ran the existing testsuite w/ my patch and made sure it didn't regress anything, so those test results are for real minus the new test. It happens. I had a goof of similar nature trying to get something out the door last week as well :-) What I wanted to do was use your testcase with Pan's patch to see if Pan's patch resolved both issues. Yes it does. Your compiler patch may still be desirable as well, I really haven't really evaluated that yet. FWIW with Pan's rework the need for volatile goes away anyways. The fixed up test (actually tested for A/B pass/fail) which can be integrated in Pan's commit can be found below If you could go ahead and install the test into the testsuite, it'd be appreciated. Thanks! jeff
Re: [PATCH v2 3/7] Remove m_total_lines support from input cache
On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote: > From: Andi Kleen > > With the new cache maintenance algorithm we don't need the > maximum number of lines anymore. Remove all the code for that. > > gcc/ChangeLog: > > PR preprocessor/118168 > * input.cc (total_lines_num): Remove. > (file_cache_slot::evict): Dito. > (file_cache_slot::create): Dito. > (file_cache_slot::set_content): Dito. > (file_cache_slot::file_cache_slot): Dito. > (file_cache_slot::dump): Dito. Thanks, this is OK for trunk once patch 2 goes in - though it's "Ditto", not "Dito" :) Dave
Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]
Hi Jason, On 27 Jan 2025, at 20:53, Jason Merrill wrote: > On 1/27/25 2:34 PM, Simon Martin wrote: >> Hi Jason, >> >> On 27 Jan 2025, at 15:23, Jason Merrill wrote: >> >>> On 1/27/25 8:14 AM, Simon Martin wrote: Hi Jason, On 24 Jan 2025, at 16:51, Jason Merrill wrote: > On 1/24/25 6:19 AM, Simon Martin wrote: >> Hi Jason, >> >> On 23 Jan 2025, at 22:56, Jason Merrill wrote: >> >>> On 1/23/25 12:06 PM, Simon Martin wrote: Hi Marek, On 23 Jan 2025, at 16:45, Marek Polacek wrote: > On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote: >> Hi Jason, >> >> On 20 Jan 2025, at 22:49, Jason Merrill wrote: >> >>> On 1/20/25 2:11 PM, Simon Martin wrote: Hi Jason, On 15 Jan 2025, at 22:42, Jason Merrill wrote: > On 12/12/24 2:07 PM, Simon Martin wrote: >> We currently ICE upon the following valid (under >> -Wno-vla) >> code >> >> >> === cut here === >> void f(int c) { >> constexpr int r = 4; >> [&](auto) { int t[r * c]; }(0); >> } >> === cut here === >> >> The problem is that when parsing the lambda body, and >> more >> specifically >> the multiplication, we mark the lambda as >> LAMBDA_EXPR_CAPTURE_OPTIMIZED >> even though the replacement of r by 4 is "undone" by the >> call >> to >> build_min_non_dep in build_x_binary_op. This makes >> prune_lambda_captures >> remove the proxy declaration while it should not, and we >> trip >> on >> an >> >> assert at instantiation time. > > Why does prune_lambda_captures remove the declaration if > it's > still > used in the function body? Setting > LAMBDA_EXPR_CAPTURE_OPTIMIZED > just > means "we might have optimized away some captures", the > tree > walk > should have found the use put back by build_x_binary_op. I think that this is due to a bug in mark_const_cap_r, that’s been here since the beginning, i.e. r8-7213-g1577f10a637352: it decides >> NOT to walk sub-trees when encountering a DECL_EXPR for a constant capture proxy (at lambda.cc:1769). I don’t understand why we wouldn’t want to continue. >>> >>> Because that's the declaration of the capture proxy, not a >>> use >>> of >>> it. >> Indeed, thanks for clarifying. >> >>> Why aren't we finding the use in the declaration of t? >> After further investigation, the problem is rather that >> neither >> walk_tree_1 nor cp_walk_subtrees walk the dimensions of array >> types, >> so >> we miss the uses. >> Removing that line fixes the PR, but breaks 3 existing tests (lambda-capture1.C, lambda-const11.C and lambda-const11a.C, that all assert that we optimise out the capture); that’s why I did not pursue it in the first place. >>> >>> Right. >>> But taking another look, it might not be that big a deal that we don’t optimise those out: as soon as we use -O1 or above, the assignment to >> the closure field actually disappears. >>> >>> Completely breaking this optimization is a big deal, >>> particularly >>> since it affects the layout of closure objects. We can't >>> always >>> optimize everything away. >> ACK. >> >> The attached updated patch makes cp_walk_subtrees walk array >> type >> dimensions, which fixes the initial PR without those 3 >> regressions. >> >> >> Successfully tested on x86_64-pc-linux-gnu. Is it OK? >> >> Simon >> >> PS: In theory I think it’d make sense to do do this change >> in >> >> walk_tree_1 since C also supports VLAs, but doing so breaks >> some >> >> OMP >> tests. OMP can do interesting things with array bounds (see >> r9-5354-gda972c05f48637), and fixing those would require >> teaching >> walk_tree_1 about the “omp dummy var” arra
Re: [PATCH v2 2/7] Rebalance file_cache input line cache dynamically
On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote: > From: Andi Kleen > > The input context file_cache maintains an array of anchors > to speed up accessing lines before the previous line. > The array has a fixed upper size and the algorithm relies > on the linemap reporting the maximum number of lines in the file > in advance to compute the position of each anchor in the cache. > > This doesn't work for C which doesn't know the maximum number > of lines before the files has finished parsing. The code > has a fallback for this, but it is quite inefficient and > effectively defeats the cache, so many accesses have to > go through most of the input buffer to compute line > boundaries. For large files this can be very costly > as demonstrated in PR118168. > > Use a different algorithm to maintain the cache without > needing the maximum number of lines in advance. When the cache > runs out of entries and the gap to the last line anchor gets > too large, prune every second entry in the cache. This maintains > even spacing of the line anchors without requiring the maximum > index. > > For the original PR this moves the overhead of enabling > -Wmisleading-indentation to 32% with the default cache size. > With a 10k entry cache it becomes noise. > > cc1 -O0 -fsyntax-only mypy.c -quiet ran > 1.03 ± 0.05 times faster than cc1 -O0 -fsyntax-only mypy.c - > quiet -Wmisleading-indentation --param=file-cache-lines=1 > 1.09 ± 0.08 times faster than cc1 -O0 -fsyntax-only mypy.c - > quiet -Wmisleading-indentation --param=file-cache-lines=1000 > 1.32 ± 0.07 times faster than cc1 -O0 -fsyntax-only mypy.c - > quiet -Wmisleading-indentation > > The code could be further optimized, e.g. use the vectorized > line search functions the preprocessor uses. > > Also it seems the input cache always reads the whole file into > memory, so perhaps it should just be using file mmap if possible. > > gcc/ChangeLog: > > PR preprocessor/118168 > * input.cc (file_cache_slot::get_next_line): Use new > algorithm > to maintain > (file_cache_slot::read_line_num): Use binary search for > lookup. Thanks; this is OK for trunk. I spent some time stepping through this to get clear in my mind how the new algorithm works. FWIW I found the patch below helpful, to clarify in dumps about the slot index versus the line number of the slot; I'll commit this as a followup at some point: Dave diff --git a/gcc/input.cc b/gcc/input.cc index 11dd39412ea1..2abd2cd8eb0f 100644 --- a/gcc/input.cc +++ b/gcc/input.cc @@ -663,10 +663,11 @@ file_cache_slot::dump (FILE *out, int indent) const indent, "", (int)m_missing_trailing_newline); fprintf (out, "%*sline records (%i):\n", indent, "", m_line_record.length ()); + int idx = 0; for (auto &line : m_line_record) -fprintf (out, "%*sline %zi: byte offsets: %zi-%zi\n", +fprintf (out, "%*s[%i]: line %zi: byte offsets: %zi-%zi\n", indent + 2, "", -line.line_num, line.start_pos, line.end_pos); +idx++, line.line_num, line.start_pos, line.end_pos); } /* Returns TRUE iff the cache would need to be filled with data coming
Re: [PATCH v2 4/7] Add a cache of recent lines
On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote: > From: Andi Kleen > > For larger files the file_cache line index will be spread out to make > the index fit into the fixed buffer, so any access to the non latest > line > will need some skipping of lines. > > Most accesses for line are near the latest line because > a diagnostic is likely near where the scanner is currently lexing. > > Add a second cache for recent lines. It is organized as a ring buffer > and maintains the last 256 lines relative to the last input line. > > With that, enabling -Wmisleading-indentation for the test case in > PR preprocessor/118168, is within the run-to-run variation. > > gcc/ChangeLog: > > PR preprocessor/118168 > * input.cc (file_cache::m_line_recent, > m_line_recent_first, m_line_recent_last): Add. > (file_cache_slot::evict): Clear new fields. > (file_cache_slot::create): Clear new fields. > (file_cache_slot::file_cache_slot): Initialize new fields. > (file_cache_slot::~file_cache_slot): Release m_line_recent. > (file_cache_slot::get_next_line): Maintain ring buffer of > lines > in m_line_recent. > (file_cache_slot::read_line_num): Use m_line_recent to look > up > recent lines quickly. > --- > gcc/input.cc | 49 - > 1 file changed, 48 insertions(+), 1 deletion(-) > > diff --git a/gcc/input.cc b/gcc/input.cc > index 82a58fdef27..b3fe38eb77c 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -126,6 +126,7 @@ public: > > static const size_t buffer_size = 4 * 1024; > static size_t line_record_size; > + static size_t recent_cached_lines_shift; > > /* The number of time this file has been accessed. This is used > to designate which file cache to evict from the cache > @@ -174,6 +175,13 @@ public: > this is scaled down dynamically, with the line_info becoming > anchors. */ > vec m_line_record; > > + /* A cache of the recently seen lines. This is maintained as a > ring > + buffer. */ > + vec m_line_recent; > + > + /* First and last valid entry in m_line_recent. */ > + size_t m_line_recent_last, m_line_recent_first; > + > void offset_buffer (int offset) > { > gcc_assert (offset < 0 ? m_alloc_offset + offset >= 0 > @@ -187,6 +195,7 @@ public: > }; > > size_t file_cache_slot::line_record_size = 100; > +size_t file_cache_slot::recent_cached_lines_shift = 8; > > /* Tune file_cache. */ > void > @@ -391,6 +400,8 @@ file_cache_slot::evict () > m_line_start_idx = 0; > m_line_num = 0; > m_line_record.truncate (0); > + m_line_recent_first = 0; > + m_line_recent_last = 0; > m_use_count = 0; > m_missing_trailing_newline = true; > } > @@ -486,6 +497,8 @@ file_cache_slot::create (const > file_cache::input_context &in_context, > m_nb_read = 0; > m_line_start_idx = 0; > m_line_num = 0; > + m_line_recent_first = 0; > + m_line_recent_last = 0; > m_line_record.truncate (0); > /* Ensure that this cache entry doesn't get evicted next time > add_file_to_cache_tab is called. */ > @@ -592,9 +605,13 @@ file_cache::lookup_or_add_file (const char > *file_path) > file_cache_slot::file_cache_slot () > : m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0), > m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx > (0), > - m_line_num (0), m_missing_trailing_newline (true) > + m_line_num (0), m_missing_trailing_newline (true), > + m_line_recent_last (0), m_line_recent_first (0) > { > m_line_record.create (0); > + m_line_recent.create (1U << recent_cached_lines_shift); > + for (int i = 0; i < 1 << recent_cached_lines_shift; i++) > + m_line_recent.quick_push (file_cache_slot::line_info (0, 0, 0)); > } > > /* Destructor for a cache of file used by caret diagnostic. */ > @@ -613,6 +630,7 @@ file_cache_slot::~file_cache_slot () > m_data = 0; > } > m_line_record.release (); > + m_line_recent.release (); > } > > void > @@ -865,6 +883,20 @@ file_cache_slot::get_next_line (char **line, > ssize_t *line_len) > line_end - m_data)); > } > > + /* Cache recent tail lines separately for fast access. This > assumes > + most accesses do not skip backwards. */ > + if (m_line_recent_last == m_line_recent_first > + || m_line_recent[m_line_recent_last].line_num == m_line_num > - 1) > + { > + size_t mask = ((size_t)1 << recent_cached_lines_shift) - 1; > + m_line_recent_last = (m_line_recent_last + 1) & mask; > + if (m_line_recent_last == m_line_recent_first) > + m_line_recent_first = (m_line_recent_first + 1) & mask; > + m_line_recent[m_line_recent_last] = > + file_cache_slot::line_info (m_line_num, m_line_start_idx, > + line_end - m_data); > + } > + If I reading this right, calls to get_next_line lead to insertions into the ring buffer whilst the buffer is empty or the last
Re: [PATCH] Fortran: fix bogus diagnostics on renamed interface import [PR110993]
Am 27.01.25 um 01:41 schrieb Jerry D: On 1/26/25 2:07 PM, Harald Anlauf wrote: Dear all, in the checking of imported interfaces we need to use the local names of procedures that are renamed-on-use, as the original name becomes inaccessible. Similarly, we should not compare interfaces of non-bind(C) procedures against bind(C) interfaces that are not explicitly made accessible via a use statement, see testcase. Regtested on x86_64-pc-linux-gnu. OK for mainline? Could this one be backportable, e.g. to 14-branch? Thanks, Harald This is OK. Backport up to you. Pushed to mainline as r15-7234-g9104472b645f76 . Will backport to 14-branch in a couple of days. Jerry Thanks, Harald
Re: [PATCH v2 5/7] Size input line cache based on file size
On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote: > From: Andi Kleen > > While the input line cache size now tunable it's better if the > compiler > auto tunes it. Otherwise large files needing random file access will > still have to search many lines to find the right lines. > > Add support for allocating one line anchor per hundred input lines. > This means an overhead of ~235k per 1M input lines on 64bit, which > seems reasonable. > > gcc/ChangeLog: > > PR preprocessor/118168 > * input.cc (file_cache_slot::get_next_line): Implement > dynamic sizing of m_line_record based on input length. > * params.opt: (param_file_cache_lines): Set to 0 to size > dynamically. > --- > gcc/input.cc | 11 --- > gcc/params.opt | 2 +- > 2 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/gcc/input.cc b/gcc/input.cc > index b3fe38eb77c..d5d7dbb043e 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -194,7 +194,7 @@ public: > > }; > > -size_t file_cache_slot::line_record_size = 100; > +size_t file_cache_slot::line_record_size = 0; > size_t file_cache_slot::recent_cached_lines_shift = 8; > > /* Tune file_cache. */ > @@ -865,8 +865,13 @@ file_cache_slot::get_next_line (char **line, > ssize_t *line_len) > size_t delta = rlen >= 1 ? > m_line_num - m_line_record[rlen - 1].line_num : 1; > > + size_t max_size = line_record_size; > + /* One anchor per hundred input lines. */ > + if (max_size == 0) > + max_size = m_line_num / 100; > + > /* If we're too far beyond drop half of the lines to > rebalance. */ > - if (rlen == line_record_size && delta >= spacing*2) > + if (rlen == max_size && delta >= spacing*2) > { > size_t j = 0; > for (size_t i = 1; i < rlen; i += 2) > @@ -876,7 +881,7 @@ file_cache_slot::get_next_line (char **line, > ssize_t *line_len) > spacing *= 2; > } > > - if (rlen < line_record_size && delta >= spacing) > + if (rlen < max_size && delta >= spacing) > m_line_record.safe_push > (file_cache_slot::line_info (m_line_num, > m_line_start_idx, > diff --git a/gcc/params.opt b/gcc/params.opt > index 5d234a607c0..a56c20af2f8 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -139,7 +139,7 @@ Common Joined UInteger > Var(param_file_cache_files) Init(16) Param > Max number of files in the file cache. > > -param=file-cache-lines= > -Common Joined UInteger Var(param_file_cache_lines) Init(100) Param > +Common Joined UInteger Var(param_file_cache_lines) Init(0) Param > Max number of lines to index into file cache. > > -param=fsm-scale-path-stmts= Please update the description line to indicate that 0 is automatic. OK for trunk with that change, once the prerequisites are in. Thanks Dave
Re: [PATCH v2 6/7] Enable vectorization for input.cc find_end_of_line function
On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote: > From: Andi Kleen > > This is the hot function in input.cc > > The vectorizer can vectorize it now, but in a generic cpu O2 x86 > build it isn't. > Add a automatic target clone to handle it for x86 and build > that function with O3. > > The ifdef here is ugly, perhaps gcc should have a more convenient > "clone for vectorization if possible" attribute to handle this > portably. This patch is very cool (no pun intended); how much does it help? The patch is OK by me, but given that we're in stage 4, does a release manager approve? [CCed] Thanks Dave > > gcc/ChangeLog: > > * input.cc: (VECTORIZE): Add definition for x86. > (find_end_of_line): Mark for vectorizer. > --- > gcc/input.cc | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/gcc/input.cc b/gcc/input.cc > index d5d7dbb043e..f1a15de66f1 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -740,11 +740,18 @@ file_cache_slot::maybe_read_data () > return read_data (); > } > > +#if defined(__x86_64__) && __GNUC__ >= 15 > +#define VECTORIZE > __attribute__((target_clones("default,avx2,avx10.2,avx512f"), > optimize("O3"))) > +#else > +#define VECTORIZE > +#endif > + > /* Helper function for file_cache_slot::get_next_line (), to find > the end of > the next line. Returns with the memchr convention, i.e. nullptr > if a line > terminator was not found. We need to determine line endings in > the same > manner that libcpp does: any of \n, \r\n, or \r is a line > ending. */ > > +VECTORIZE > static const char * > find_end_of_line (const char *s, size_t len) > {
[PATCH] c++: friend vs inherited guide confusion [PR117855]
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/14? -- >8 -- We recently started using the lang_decl_fn::context field to track inheritedness of a deduction guide (for C++23 inherited CTAD). This new overloading of the field accidentally made DECL_FRIEND_CONTEXT return non-NULL for inherited guides, which breaks the below testcase overload resolution containing an inherited guide. This patch fixes this by refining DECL_FRIEND_CONTEXT to return NULL for deduction guides. PR c++/117855 gcc/cp/ChangeLog: * cp-tree.h (DECL_FRIEND_CONTEXT): Refine to exclude deduction guides. gcc/testsuite/ChangeLog: * g++.dg/cpp23/class-deduction-inherited7.C: New test. --- gcc/cp/cp-tree.h | 3 ++- .../g++.dg/cpp23/class-deduction-inherited7.C| 12 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index c9128bd3e7b..dc527380921 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -3631,7 +3631,8 @@ struct GTY(()) lang_decl { the DECL_FRIEND_CONTEXT for `f' will be `S'. */ #define DECL_FRIEND_CONTEXT(NODE) \ ((DECL_DECLARES_FUNCTION_P (NODE) && !DECL_VIRTUAL_P (NODE) \ -&& !DECL_CONSTRUCTOR_P (NODE)) \ +&& !DECL_CONSTRUCTOR_P (NODE) \ +&& (cxx_dialect < cxx23 || !deduction_guide_p (NODE))) \ ? LANG_DECL_FN_CHECK (NODE)->context\ : NULL_TREE) diff --git a/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C new file mode 100644 index 000..b1d5e89ad02 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C @@ -0,0 +1,12 @@ +// PR c++/117855 +// { dg-do compile { target c++20 } } + +template struct span { span(T&&);}; +template span(T &&) -> span; +template +struct this_span : span { + using span::span; +}; +template this_span(T &&) -> this_span; +int vec; +this_span a = vec; -- 2.48.1.91.g5f8f7081f7
Re: [PATCH] c++: friend vs inherited guide confusion [PR117855]
On Mon, 27 Jan 2025, Patrick Palka wrote: > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look > OK for trunk/14? > > -- >8 -- > > We recently started using the lang_decl_fn::context field to track > inheritedness of a deduction guide (for C++23 inherited CTAD). This > new overloading of the field accidentally made DECL_FRIEND_CONTEXT > return non-NULL for inherited guides, which breaks the below testcase > overload resolution containing an inherited guide. > > This patch fixes this by refining DECL_FRIEND_CONTEXT to return NULL > for deduction guides. > > PR c++/117855 > > gcc/cp/ChangeLog: > > * cp-tree.h (DECL_FRIEND_CONTEXT): Refine to exclude deduction > guides. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp23/class-deduction-inherited7.C: New test. > --- > gcc/cp/cp-tree.h | 3 ++- > .../g++.dg/cpp23/class-deduction-inherited7.C| 12 > 2 files changed, 14 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C > > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h > index c9128bd3e7b..dc527380921 100644 > --- a/gcc/cp/cp-tree.h > +++ b/gcc/cp/cp-tree.h > @@ -3631,7 +3631,8 @@ struct GTY(()) lang_decl { > the DECL_FRIEND_CONTEXT for `f' will be `S'. */ > #define DECL_FRIEND_CONTEXT(NODE)\ >((DECL_DECLARES_FUNCTION_P (NODE) && !DECL_VIRTUAL_P (NODE)\ > -&& !DECL_CONSTRUCTOR_P (NODE)) \ > +&& !DECL_CONSTRUCTOR_P (NODE)\ > +&& (cxx_dialect < cxx23 || !deduction_guide_p (NODE))) \ N.B. I decided to check cxx_dialect here since deduction_guide_p seems relatively expensive to call? In another version of this patch I made us track dguide-ness via a dedicated bit flag instead of the DECL_NAME comparison to make the predicate essentially free (and potentially also more robust) but I eventually shelved that as more stage 1 material. > ? LANG_DECL_FN_CHECK (NODE)->context \ > : NULL_TREE) > > diff --git a/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C > b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C > new file mode 100644 > index 000..b1d5e89ad02 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp23/class-deduction-inherited7.C > @@ -0,0 +1,12 @@ > +// PR c++/117855 > +// { dg-do compile { target c++20 } } > + > +template struct span { span(T&&);}; > +template span(T &&) -> span; > +template > +struct this_span : span { > + using span::span; > +}; > +template this_span(T &&) -> this_span; > +int vec; > +this_span a = vec; > -- > 2.48.1.91.g5f8f7081f7 > >
Re: [PATCH v2 7/7] Add a unit test for random access in the file cache
On Mon, 2025-01-27 at 11:25 -0500, David Malcolm wrote: > On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote: > > From: Andi Kleen > > Thanks for writing the test case! > > Some nitpicks... > > > > > > gcc/ChangeLog: > > > > * input.cc (check_line): New. > > (test_replacement): New function to test line caching. > > (input_cc_tests): Call test_replacement > > --- > > gcc/input.cc | 46 ++ > > 1 file changed, 46 insertions(+) > > > > diff --git a/gcc/input.cc b/gcc/input.cc > > index f1a15de66f1..524d5d2b6d1 100644 > > --- a/gcc/input.cc > > +++ b/gcc/input.cc > > @@ -2346,6 +2346,51 @@ test_make_location_nonpure_range_endpoints > > (const line_table_case &case_) > > ASSERT_FALSE (IS_ADHOC_LOC (get_finish (not_aaa_eq_bbb))); > > } > > > > +/* Verify reading of a specific line LINENUM in TMP, FC. */ > > + > > +static void check_line (temp_source_file &tmp, file_cache &fc, int > > linenum) > > +{ > > + char_span line = fc.get_source_line (tmp.get_filename (), > > linenum); > > + int n; > > + ASSERT_TRUE (sscanf (line.get_buffer (), "%d", &n) == 1); > > Please add a comment around here something like: > > /* Although get_buffer's result isn't guaranteed to be > null-terminated, it will be for every line in test_replacement. > */ > > or somesuch. ...or rather, sscanf will always hit a "\n" character before running off the end of the buffer for the uses within test_replacement. Dave
Re: [PING^2,PATCH] Add target hook to assemble a VAR_DECL
Georg-Johann Lay writes: > Am 24.01.25 um 08:18 schrieb Richard Biener: >> On Thu, Jan 23, 2025 at 4:53 PM Georg-Johann Lay wrote: >>> >>> Am 23.01.25 um 14:58 schrieb Richard Biener: On Thu, Jan 23, 2025 at 2:23 PM Georg-Johann Lay wrote: > > Hi, this is Ping #2 for a patch from 2024. > > It adds a new target hook that allows to output > assembly code for a VAR_DECL in a custom way. > > The default action is an obvious no-op, > i.e. assemble_variable() behaves like before. I tried to understand the AVR part of the series - there I fail to see what exactly special handling "io" and friends requires and how that was made work with the TLS noswitch section. I do not think the sentence the middle-end doesn't allow custom NOSWITCH sections is correct (it would be a matter of exporting get_noswitch_section), but I also don't exactly see how "io" and friends require a NOSWITCH. >>> >>> A noswitch section is one way to go. >>> >>> However, there is no way to attach a noswitch section to a VAR_DECL. >> >> Hmm, I don't know varasm.cc by heart either but there's the select_section >> hook invoked by get_variable_section () which looks like it could do the >> trick. > > No. > > That hook only returns a section *name*, not a section object. > So you cannot return an unnamed section, which by definition, don't > have a section name. Perhaps I misunderstand what you mean, but select_section does return a "section *". On the point about -fdata-sections, I think avr's unique_section should be a no-op for these variables. That should ensure that the compiler doesn't set DECL_SECTION_NAME itself. (I suppose that source-code attributes that set DECL_SECTION_NAME should be diagnosed as incompatible with the io attribute.) FWIW, I agree with Richard that a hook to bypass most of assemble_variable doesn't feel like the right abstraction. Thanks, Richard
Re: [PATCH v10] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]
On 1/27/25 10:41 AM, Simon Martin wrote: Hi Jason, On 17 Jan 2025, at 23:33, Jason Merrill wrote: On 1/17/25 9:52 AM, Simon Martin wrote: Hi Jason, On 16 Jan 2025, at 22:49, Jason Merrill wrote: On 10/16/24 11:43 AM, Simon Martin wrote: As you know the patch had to be reverted due to PR117114, that highlighted a bunch of issues with comparing DECL_VINDEXes: it might give false positives in case of multiple inheritance (the case in PR117114), but also if there’s single inheritance by the hierarchy has more than two levels (another issue I found while bootstrapping with rust enabled). Yes, relying on DECL_VINDEX equality was wrong, sorry to mislead you. The attached updated patch introduces an overrides_p function, based on the existing check_final_overrider, and uses it when the signatures match. That seems unnecessary. It seems like removing that only breaks Woverloaded-virt11.C, and making that work again only requires bringing back the check that DECL_VINDEX (fndecl) is set (to any value). Or remembering that fndecl was a template, so it can't really have the same signature as a non-template, whatever same_signature_p says. That’s right, only Woverloaded-virt11.C fails without the check_final_overrider call. Thanks for the suggestion to check whether fndecl is a template. This is what the updated attached patch does, successfully tested on x86_64-pc-linux-gnu. OK for GCC 15? And if so, thoughts on backporting to release branches (technically it’s a regression but it’s “just” an incorrect warning fix, so probably not worth the risk)? Right, I wouldn't backport. + if (warn_overloaded_virtual == 1 + && overrider_fndecls.elements () == num_fns) + /* All the fns override a base virtual. */ + continue; This looks like the only use of the overrider_fndecls hash_set. A hash_set seems a bit overkill for checking whether everything in fns is an overrider; keeping track of how many times the old any_override was set should work just as well? Yeah you’re right :-/ I’ve changed my latest patch to simply count overriders. + /* fndecls hides base_fndecls[k]. */ + auto_vec &hiders = + hidden_base_fndecls.get_or_insert (base_fndecls[k]); + if (!hiders.contains (fndecl)) + hiders.safe_push (fndecl); Hmm, do you think users want a full list of the overloads that don't override? I'd think the problem is more the overload that doesn't exist rather than the ones that do. The current code ends up in the OVERLOAD handling of dump_decl that just prints scope::name. Indeed, the full list is probably not super useful... One problem with the current code is that for conversion operators, it will give a note such as “note: by 'operator’”, so I propose to keep track of at least one of the hiders, and use it to show the note (and get a proper “by 'virtual B::operator char()'” note for conversion operators). Hence the updated patch, successfully tested on x86_64-pc-linux-gnu. Ok for trunk? + else if (!template_p /* Template methods don't override. */ +&& same_signature_p (fndecl, base_fndecls[k])) + { + overriden_base_fndecls.add (base_fndecls[k]); + ++num_overriders; + } I'm concerned that this will increment num_overriders multiple times for a single fndecl if it overrides functions in multiple bases. Jason
Re: [PING^2,PATCH] Add target hook to assemble a VAR_DECL
Am 27.01.25 um 16:19 schrieb Richard Sandiford: Georg-Johann Lay writes: Am 24.01.25 um 08:18 schrieb Richard Biener: On Thu, Jan 23, 2025 at 4:53 PM Georg-Johann Lay wrote: Am 23.01.25 um 14:58 schrieb Richard Biener: On Thu, Jan 23, 2025 at 2:23 PM Georg-Johann Lay wrote: Hi, this is Ping #2 for a patch from 2024. It adds a new target hook that allows to output assembly code for a VAR_DECL in a custom way. The default action is an obvious no-op, i.e. assemble_variable() behaves like before. I tried to understand the AVR part of the series - there I fail to see what exactly special handling "io" and friends requires and how that was made work with the TLS noswitch section. I do not think the sentence the middle-end doesn't allow custom NOSWITCH sections is correct (it would be a matter of exporting get_noswitch_section), but I also don't exactly see how "io" and friends require a NOSWITCH. A noswitch section is one way to go. However, there is no way to attach a noswitch section to a VAR_DECL. Hmm, I don't know varasm.cc by heart either but there's the select_section hook invoked by get_variable_section () which looks like it could do the trick. No. That hook only returns a section *name*, not a section object. So you cannot return an unnamed section, which by definition, don't have a section name. Perhaps I misunderstand what you mean, but select_section does return a "section *". Hi Richard, returning a custom noswitch section in targetm.asm_out.select_section does not work. Reason is that varasm.cc::get_variable_section has its own ideas how noswitch / unnamed sections should work and bypasses the result from the target hook. I tried to set DECL_COMMON (decl) = 0; and also DECL_COMMON (decl) = 1; but neither of which does work. On the point about -fdata-sections, I think avr's unique_section should be a no-op for these variables. That should ensure that the compiler doesn't set DECL_SECTION_NAME itself. (I suppose that source-code attributes that set DECL_SECTION_NAME should be diagnosed as incompatible with the io attribute.) The avr backend rejects initializers for respective decls since initializers don't make sense for the attribute. Outcome is that varasm puts the decl into .bss or .comm and ignores anything from targetm.asm_out.select_section. FWIW, I agree with Richard that a hook to bypass most of assemble_variable doesn't feel like the right abstraction. So would you please propose something that would be approved *and* will work. assemble_variable() calls a zoo of hook functions and macros that cannot be customized since they don't provide enough context. I attached a patch of the targetm.asm_out.select_section proposal for reference. Can you please point out what I am doing wrong, or what other hooks are eligible to implement such a feature. The avr testsuite has already some test cases (all of which are currently ICEing due to abusing TLS). $ make -k check-gcc RUNTESTFLAGS="--target_board=atmega128-sim avr.exp=pr112952*.c " Johanndiff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc index 656d3e7389b..f264ac22688 100644 --- a/gcc/config/avr/avr.cc +++ b/gcc/config/avr/avr.cc @@ -222,6 +222,8 @@ avr_arch_id avr_arch_index; avr_asm_select_section, but it must not be local there because of GTY. */ static GTY(()) section *progmem_section[ADDR_SPACE_COUNT]; +static GTY(()) section *my_noswitch_section; + /* Condition for insns/expanders from avr-dimode.md. */ bool avr_have_dimode = true; @@ -11930,7 +11932,12 @@ avr_asm_init_sections (void) readonly_data_section->unnamed.callback = avr_output_data_section_asm_op; data_section->unnamed.callback = avr_output_data_section_asm_op; bss_section->unnamed.callback = avr_output_bss_section_asm_op; - tls_comm_section->noswitch.callback = avr_output_addr_attrib; + + extern section* get_noswitch_section (unsigned int tree, + noswitch_section_callback callback); + unsigned flags = SECTION_NOSWITCH; + my_noswitch_section = get_noswitch_section (flags, + avr_output_addr_attrib); } @@ -12159,9 +12166,8 @@ avr_encode_section_info (tree decl, rtx rtl, int new_decl_p) other way than making use of tls_comm_section. As we are using that section anyway, also use it in the public case. */ - DECL_COMMON (decl) = 1; + DECL_COMMON (decl) = 0; set_decl_section_name (decl, (const char *) nullptr); - set_decl_tls_model (decl, (tls_model) 2); } } } @@ -12209,6 +12215,11 @@ avr_encode_section_info (tree decl, rtx rtl, int new_decl_p) static section * avr_asm_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align) { + if (lookup_attribute ("io", DECL_ATTRIBUTES (decl)) + || lookup_attribute ("io_low", DECL_ATTRIBUTES (decl)) + || lookup_attribute ("address", DECL_ATTRIBUTES (decl))) +return my_noswitch_section; + section *sect = default_elf_select_section (decl, reloc, align); if (decl
[PATCH] tree-ssa-dce: Avoid creating invalid BBs with no outgoing edge (PR117892)
Hi, Zhendong Su and Michal Jireš found out that our gimple DSE pass can, under fairly specific conditions, remove a noreturn call which then leaves behind a "normal" BB with no successor edges which following passes do not expect. This patch simply tells the pass to leave such calls alone even when they otherwise appear to be dead. Interestingly, our CFG verifier does not report this. I'll put on my todo list to add a test for it in the next stage 1. Bootstrapped and tested on x86_64-linux, OK for master? Thanks, Martin gcc/ChangeLog: 2025-01-27 Martin Jambor PR tree-optimization/117892 * tree-ssa-dse.cc (dse_optimize_call): Leave noreturn calls alone. gcc/testsuite/ChangeLog: 2025-01-27 Martin Jambor PR tree-optimization/117892 * gcc.dg/tree-ssa/pr117892.c: New test. * gcc.dg/tree-ssa/pr118517.c: Likewise. co-authored-by: Michal Jireš --- gcc/testsuite/gcc.dg/tree-ssa/pr117892.c | 17 + gcc/testsuite/gcc.dg/tree-ssa/pr118517.c | 11 +++ gcc/tree-ssa-dse.cc | 5 +++-- 3 files changed, 31 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr117892.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr118517.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr117892.c b/gcc/testsuite/gcc.dg/tree-ssa/pr117892.c new file mode 100644 index 000..d9b9c15095f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr117892.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O1" } */ + + +volatile int a; +void b(int *c) { + int *d = 0; + *c = 0; + *d = 0; + __builtin_abort(); +} +int main() { + int f; + if (a) +b(&f); + return 0; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr118517.c b/gcc/testsuite/gcc.dg/tree-ssa/pr118517.c new file mode 100644 index 000..3a34f6788a9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr118517.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fno-ipa-pure-const" } */ + +void __attribute__((noreturn)) bar(void) { + __builtin_unreachable (); +} + +int p; +void foo() { + if (p) bar(); +} diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc index 753d7ef148b..492080a7eb0 100644 --- a/gcc/tree-ssa-dse.cc +++ b/gcc/tree-ssa-dse.cc @@ -1396,8 +1396,9 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes) if (!node) return false; - if (stmt_could_throw_p (cfun, stmt) - && !cfun->can_delete_dead_exceptions) + if ((stmt_could_throw_p (cfun, stmt) + && !cfun->can_delete_dead_exceptions) + || gimple_call_noreturn_p (stmt)) return false; /* If return value is used the call is not dead. */ -- 2.47.1
Re: [PATCH v3 2/5] c++/modules: Ignore TU-local entities where necessary
On Sat, 12 Oct 2024, Nathaniel Shead wrote: > This version rewords all "ignored exposures" language. > > I haven't fixed up the issue with DECL_TEMPLATE_INSTANTIATIONS for this > patch. I'll try to get to that as a separate patch if I find the time, > but it's not 100% needed here I don't think. > > It seems the discussion around PR115126 RE: the libgcc changes that > people are leaning towards reverting the changes I made to the gthread > static variables and instead having some sort of attribute to instruct > the compiler to ignore the ODR issues; that should be relatively > straight-forward to add (maybe something like [[gnu::ignore_tu_local]]?) > but I'll probably handle that as a separate patch once I'm sure that's > what we actually want to do, and would probably be worth corresponding > with the Clang folks to see what thoughts they have. > > Finally, I wasn't able to work out a good way to fold non-ODR usages of > TU-local constants early. I attempted modifying 'mark_use', which > helped for 'constexpr' usages (though I ran into issues with e.g. > > constexpr int n[1] = 0; > constexpr int x = n ? 1 : 2; > > segfaulting as build_address was not expecting to see a constructor > rather than a declaration here). But I didn't look too hard into > solving this as it appears that any modifications made by 'mark_use' are > not actually applied to the primary template at all, and this is > consistent across many places I investigated. > > Given that erroring on these cases is still the status-quo, how easy it > is to workaround most of the time, and that I'm still not sure how to > solve this, I've also left this as a FIXME (with an XFAILed testcase) to > revisit later. > > OK for trunk? > > -- >8 -- > > [basic.link] p14 lists a number of circumstances where a declaration > naming a TU-local entity is not an exposure, notably the bodies of > non-inline templates and friend declarations in classes. This patch > ensures that these references do not error when exporting the module. > > We do need to still error on instantiation from a different module, > however, in case this refers to a TU-local entity. As such this patch > adds a new tree TU_LOCAL_ENTITY which is used purely as a placeholder to > poison any attempted template instantiations that refer to it. I'm a bit late to the party, but this is a nice patch series! I have a couple of comments below. > > This is also streamed for friend decls so that merging (based on the > index of an entity into the friend decl list) doesn't break and to > prevent complicating the logic; I imagine this shouldn't ever come up > though. > > We also add a new warning, '-Wtemplate-names-tu-local', to handle the > case where someone accidentally refers to a TU-local value from within a > non-inline function template. This will compile without errors as-is, > but any attempt to instantiate the decl will fail; this warning can be > used to ensure that this doesn't happen. Unfortunately the warning has > quite some false positives; for instance, a user could deliberately only > call explicit instantiations of the decl, or use 'if constexpr' to avoid > instantiating the TU-local entity from other TUs, neither of which are > currently detected. > > The main piece that this patch doesn't yet attempt to solve is ADL: as > specified, if ADL adds an overload set that includes a translation-unit > local entity when instantiating a template, that overload set is now > poisoned and counts as an exposure. Unfortunately, we don't currently > differentiate between decls that are hidden due to not being exported, > or decls that are hidden due to being hidden friends, so this patch > instead just keeps the current (wrong) behaviour of non-exported > entities not being visible to ADL at all. > > Additionally, this patch doesn't attempt to ignore non-ODR uses of > constants in constexpr functions or templates. The obvious approach of > folding them early in 'mark_use' doesn't seem to work (for a variety of > reasons), so this leaves this to a later patch to implement, as it's at > least no worse than the current behaviour and easy enough to workaround. > > For completeness this patch adds a new xtreme-header testcase to ensure > that we have no regressions with regards to exposures of TU-local > declarations in the standard library header files. A more restrictive > test would be to do 'export extern "C++"' here, but unfortunately the > system headers on some targets declare TU-local entities, so we'll make > do with checking that at least the C++ standard library headers don't > refer to such entities. > > gcc/c-family/ChangeLog: > > * c.opt: New warning '-Wtemplate-names-tu-local'. > > gcc/cp/ChangeLog: > > * cp-objcp-common.cc (cp_tree_size): Add TU_LOCAL_ENTITY. > * cp-tree.def (TU_LOCAL_ENTITY): New tree code. > * cp-tree.h (struct tree_tu_local_entity): New type. > (TU_LOCAL_ENTITY_NAME): New accessor. > (TU_LOCAL
Re: [PING^2] [PATCH] testsuite: arm: Use effective-target for unsigned-extend-1.c
On 2025-01-27 14:37, Christophe Lyon wrote: On Mon, 27 Jan 2025 at 13:30, Torbjorn SVENSSON wrote: Hi Christophe, On 2025-01-27 13:07, Christophe Lyon wrote: Hi Torbjorn, On Fri, 24 Jan 2025 at 18:45, Torbjorn SVENSSON wrote: Another ping... :) Kind regards, Torbjörn On 2024-12-18 18:35, Torbjorn SVENSSON wrote: Gentle ping :) Kind regards, Torbjörn On 2024-11-14 17:16, Torbjorn SVENSSON wrote: On 2024-11-14 16:32, Christophe Lyon wrote: On Fri, 8 Nov 2024 at 19:49, Torbjörn SVENSSON wrote: Ok for trunk and releases/gcc-14? -- A long time ago, this test forced -march=armv6. With -marm, the generated assembler is: foo: sub r0, r0, #48 cmp r0, #9 movhi r0, #0 movls r0, #1 bx lr With -mthumb, the generated assembler is: foo: subsr0, r0, #48 movsr2, #9 uxtbr3, r0 movsr0, #0 cmp r2, r3 adcsr0, r0, r0 uxtbr0, r0 bx lr Require effective-target arm_arm_ok to skip the test for thumb-only targets (Cortex-M). gcc/testsuite/ChangeLog: * gcc.target/arm/unsigned-extend-1.c: Use effective-target arm_arm_ok. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/gcc.target/arm/unsigned-extend-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c b/gcc/ testsuite/gcc.target/arm/unsigned-extend-1.c index 3b4ab048fb0..73f2e1a556d 100644 --- a/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c +++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-1.c @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target arm_arm_ok } */ /* { dg-options "-O2" } */ So I'm guessing arm_arm_ok fails when you are testing for M-profile targets? What happens when you test for A-profile, providing -mthumb as part of your runtest flags? Looking at our CI results, without your patch the test passes on: cortex-m33, m3, m55, m7, v7A in arm mode, v8A in thumb mode, and it fails only on cortex-m0 and m23. Does this match your observations? Without my patch, I see failure on M0, M0+ and M23. With my patch, it goes to unsupported on all Cortex-M targets that I test. So your patch will make the test unsupported on m33, m3, m55 and m7 while it currently passes? Am I missing something? I think you might have missed my comment down Thanks, Christophe For Cortex-M0 (thumb/cpu=cortex-m0/float-abi=soft): Testing arm/unsigned-extend-1.c check_compile tool: gcc for arm_arm_ok doing compile Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21114.s (timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21114.c - mthumb -mcpu=cortex-m0 -mfloat-abi=soft -fdiagnostics-plain-output - marm -Wno- complain-wrong-lang -S -o arm_arm_ok21114.s pid is 21254 -21254 cc1: error: target CPU does not support ARM mode pid is -1 close result is 21254 exp6 0 1 output is cc1: error: target CPU does not support ARM mode status 1 compiler exited with status 1 UNSUPPORTED: gcc.target/arm/unsigned-extend-1.c For Cortex-A7 (thumb/cpu=cortex-a7/float-abi=soft): Testing arm/unsigned-extend-1.c check_compile tool: gcc for arm_arm_ok doing compile Executing on host: .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output -marm - Wno-complain-wrong-lang -S -o arm_arm_ok21296.s (timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc arm_arm_ok21296.c - mthumb -mcpu=cortex-a7 -mfloat-abi=soft -fdiagnostics-plain-output - marm -Wno- complain-wrong-lang -S -o arm_arm_ok21296.s pid is 21434 -21434 pid is -1 output is status 0 doing compile Executing on host: .../bin/arm-none-eabi-gcc .../gcc/testsuite/ gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno- ident -S -o unsigned-extend-1.s(timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc .../gcc/testsuite/ gcc.target/arm/unsigned-extend-1.c -mthumb -mcpu=cortex-a7 -mfloat- abi=soft -fdiagnostics-plain-output -O2 -ffat-lto-objects -fno-ident - S -o unsigned-extend-1.s pid is 21438 -21438 pid is -1 output is status 0 check_compile tool: gcc for exceptions_enabled doing compile Executing on host: .../bin/arm-none-eabi-gcc exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat- abi=soft -fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o exceptions_enabled21296.s(timeout = 800) spawn -ignore SIGHUP .../bin/arm-none-eabi-gcc exceptions_enabled21296.cc -mthumb -mcpu=cortex-a7 -mfloat-abi=soft - fdiagnostics-plain-output -Wno-complain-wrong-lang -S -o exceptions_enabled21296.s pid is 21442 -21442 pid is -1 output is status 0 PASS: gcc.target/arm/unsigned-extend-1.c
Re: [PATCH v2 7/7] Add a unit test for random access in the file cache
On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote: > From: Andi Kleen Thanks for writing the test case! Some nitpicks... > > gcc/ChangeLog: > > * input.cc (check_line): New. > (test_replacement): New function to test line caching. > (input_cc_tests): Call test_replacement > --- > gcc/input.cc | 46 ++ > 1 file changed, 46 insertions(+) > > diff --git a/gcc/input.cc b/gcc/input.cc > index f1a15de66f1..524d5d2b6d1 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -2346,6 +2346,51 @@ test_make_location_nonpure_range_endpoints > (const line_table_case &case_) > ASSERT_FALSE (IS_ADHOC_LOC (get_finish (not_aaa_eq_bbb))); > } > > +/* Verify reading of a specific line LINENUM in TMP, FC. */ > + > +static void check_line (temp_source_file &tmp, file_cache &fc, int > linenum) > +{ > + char_span line = fc.get_source_line (tmp.get_filename (), > linenum); > + int n; > + ASSERT_TRUE (sscanf (line.get_buffer (), "%d", &n) == 1); Please add a comment around here something like: /* Although get_buffer's result isn't guaranteed to be null-terminated, it will be for every line in test_replacement. */ or somesuch. > + ASSERT_EQ (n, linenum); > +} > + > +/* Test file cache replacement. */ > + > +static void test_replacement () > +{ > + const int maxline = 1000; > + char *vec = XNEWVEC (char, maxline * 15); > + char *p = vec; > + int i; > + for (i = 1; i <= maxline; i++) > + { > + p += sprintf (p, "%d\n", i); > + } > + temp_source_file tmp (SELFTEST_LOCATION, ".txt", vec); > + free (vec); > + file_cache fc; > + > + for (i = 2; i <= maxline; i++) > + { > + check_line (tmp, fc, i); > + check_line (tmp, fc, i - 1); > + if (i >= 10) > + check_line (tmp, fc, i - 9); > + if (i >= 350) /* Exceed the look behind cache. */ > + check_line (tmp, fc, i - 300); > + } > + for (i = 5; i <= maxline; i += 100) > + { > + check_line (tmp, fc, i); > + } > + for (i = 1; i <= maxline; i++) > + { > + check_line (tmp, fc, i); > + } Probably should lose the {} around the single statement being guarded by the "if" in these two loops. Patch 7 is OK otherwise, and I'm taking a look at the rest of the patches now; thanks. Dave > +} > + > /* Verify reading of input files (e.g. for caret-based > diagnostics). */ > > static void > @@ -4296,6 +4341,7 @@ input_cc_tests () > > test_reading_source_line (); > test_reading_source_buffer (); > + test_replacement (); > > test_line_offset_overflow (); >
Re: [PATCH] arm: libbacktrace: Check if the compiler supports __sync atomics
Richard Earnshaw writes: > Older versions of the Arm architecture lack support for __sync > operations directly in hardware and require calls into appropriate > operating-system hooks. But such hooks obviously don't exist in a > freestanding environment. > > Consquently, it is incorrect to assume during configure that such > functions will exist and we need a configure-time check to determine > whether or not these routines will work. > > libbacktrace: > > * configure.ac: Always check if the compiler supports __sync > operations. > * configure: Regenerated. > --- > > A consequence of this assumption that these functions 'just work' (TM) > is that we build libbacktrace with threading support, but that then > fails at link time when we find out that they do not. I'm not sure I > really understand why we can assume that having with_target_subdir set > is enough to assume that the operations exist: it isn't for Arm and it > looks like it wasn't for hpux either. Perhaps the whole logic here > needs reconsideration... My understanding was that libatomic would provide implementations of the __sync operations where required. I guess that is not always the case, though I don't understand why that should be. In any case, this patch is fine. Thanks. Ian
Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]
Hi Jason, On 27 Jan 2025, at 15:23, Jason Merrill wrote: > On 1/27/25 8:14 AM, Simon Martin wrote: >> Hi Jason, >> >> On 24 Jan 2025, at 16:51, Jason Merrill wrote: >> >>> On 1/24/25 6:19 AM, Simon Martin wrote: Hi Jason, On 23 Jan 2025, at 22:56, Jason Merrill wrote: > On 1/23/25 12:06 PM, Simon Martin wrote: >> Hi Marek, >> >> On 23 Jan 2025, at 16:45, Marek Polacek wrote: >> >>> On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote: Hi Jason, On 20 Jan 2025, at 22:49, Jason Merrill wrote: > On 1/20/25 2:11 PM, Simon Martin wrote: >> Hi Jason, >> >> On 15 Jan 2025, at 22:42, Jason Merrill wrote: >> >>> On 12/12/24 2:07 PM, Simon Martin wrote: We currently ICE upon the following valid (under -Wno-vla) >> code === cut here === void f(int c) { constexpr int r = 4; [&](auto) { int t[r * c]; }(0); } === cut here === The problem is that when parsing the lambda body, and more specifically the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED even though the replacement of r by 4 is "undone" by the call >> to build_min_non_dep in build_x_binary_op. This makes prune_lambda_captures remove the proxy declaration while it should not, and we trip >> on an assert at instantiation time. >>> >>> Why does prune_lambda_captures remove the declaration if >>> it's >>> still >>> used in the function body? Setting >>> LAMBDA_EXPR_CAPTURE_OPTIMIZED >>> just >>> means "we might have optimized away some captures", the tree >>> walk >>> should have found the use put back by build_x_binary_op. >> I think that this is due to a bug in mark_const_cap_r, >> that’s >> been >> here since the beginning, i.e. r8-7213-g1577f10a637352: it >> decides >> NOT >> to walk sub-trees when encountering a DECL_EXPR for a >> constant >> capture >> proxy (at lambda.cc:1769). I don’t understand why we >> >> wouldn’t >> want >> to continue. > > Because that's the declaration of the capture proxy, not a use > of > it. Indeed, thanks for clarifying. > Why aren't we finding the use in the declaration of t? After further investigation, the problem is rather that neither walk_tree_1 nor cp_walk_subtrees walk the dimensions of array types, so we miss the uses. >> Removing that line fixes the PR, but breaks 3 existing tests >> (lambda-capture1.C, lambda-const11.C and lambda-const11a.C, >> that >> all >> assert that we optimise out the capture); that’s why I did >> not >> pursue >> it in the first place. > > Right. > >> But taking another look, it might not be that big a deal that >> we >> don’t >> optimise those out: as soon as we use -O1 or above, the >> assignment >> to >> the closure field actually disappears. > > Completely breaking this optimization is a big deal, > particularly > since it affects the layout of closure objects. We can't > always >> > optimize everything away. ACK. The attached updated patch makes cp_walk_subtrees walk array type >> dimensions, which fixes the initial PR without those 3 regressions. Successfully tested on x86_64-pc-linux-gnu. Is it OK? Simon PS: In theory I think it’d make sense to do do this change in walk_tree_1 since C also supports VLAs, but doing so breaks some OMP tests. OMP can do interesting things with array bounds (see r9-5354-gda972c05f48637), and fixing those would require teaching >> walk_tree_1 about the “omp dummy var” array bounds, which I think would be a bit ugly. And I’m not aware of any C case that would be improved/fixed by doing this change, so we’re probably fine not doing it. >>> From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17 00:00:00 2001 From: Simon Martin Date: Wed, 22 Jan 2025 16:19:47 +0100 Subject: [PATCH] c++: Don't prune constant capture proxies only used >
Re: [PATCH v2 1/7] Add tunables for input buffer
On Sat, 2025-01-25 at 23:30 -0800, Andi Kleen wrote: > From: Andi Kleen > > The input machinery to read the source code independent of the lexer > has a range of hard coded maximum array sizes that can impact > performance. > Make them tunable. > > input.cc is part of libcommon so it cannot direct access params > without a level of indirection. Thanks; this patch is OK for trunk Dave > > gcc/ChangeLog: > > PR preprocessor/118168 > * input.cc (file_cache::tune): New function. > * input.h (class file_cache): Make tunables non const. > * params.opt: Add new tunables. > * toplev.cc (toplev::main): Initialize input buffer context > tunables. > --- > gcc/input.cc | 18 +- > gcc/input.h | 4 +++- > gcc/params.opt | 8 > gcc/toplev.cc | 2 ++ > 4 files changed, 30 insertions(+), 2 deletions(-) > > diff --git a/gcc/input.cc b/gcc/input.cc > index 7ed80cad13f..f824c34e0cc 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -79,6 +79,10 @@ public: > void evict (); > void set_content (const char *buf, size_t sz); > > + static void tune(size_t line_record_size_) { > + line_record_size = line_record_size_; > + } > + > private: > /* These are information used to store a line boundary. */ > class line_info > @@ -116,7 +120,7 @@ public: > bool goto_next_line (); > > static const size_t buffer_size = 4 * 1024; > - static const size_t line_record_size = 100; > + static size_t line_record_size; > > /* The number of time this file has been accessed. This is used > to designate which file cache to evict from the cache > @@ -189,6 +193,18 @@ public: > > }; > > +size_t file_cache_slot::line_record_size = 100; > + > +/* Tune file_cache. */ > +void > +file_cache::tune (size_t num_file_slots_, size_t lines) > +{ > + num_file_slots = num_file_slots_; > + file_cache_slot::tune (lines); > +} > + > +size_t file_cache::num_file_slots = 16; > + > static const char * > find_end_of_line (const char *s, size_t len); > > diff --git a/gcc/input.h b/gcc/input.h > index 18ccf4429fc..a60afe80681 100644 > --- a/gcc/input.h > +++ b/gcc/input.h > @@ -161,13 +161,15 @@ class file_cache > const char *buffer, > size_t sz); > > + static void tune(size_t num_file_slots_, size_t lines); > + > private: > file_cache_slot *evicted_cache_tab_entry (unsigned > *highest_use_count); > file_cache_slot *add_file (const char *file_path); > file_cache_slot *lookup_file (const char *file_path); > > private: > - static const size_t num_file_slots = 16; > + static size_t num_file_slots; > file_cache_slot *m_file_slots; > input_context m_input_context; > }; > diff --git a/gcc/params.opt b/gcc/params.opt > index b5e7800d7e4..5d234a607c0 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -134,6 +134,14 @@ Maximum size (in bytes) of objects tracked > bytewise by dead store elimination. > Common Joined UInteger Var(param_early_inlining_insns) Init(6) > Optimization Param > Maximal estimated growth of function body caused by early inlining > of single call. > > +-param=file-cache-files= > +Common Joined UInteger Var(param_file_cache_files) Init(16) Param > +Max number of files in the file cache. > + > +-param=file-cache-lines= > +Common Joined UInteger Var(param_file_cache_lines) Init(100) Param > +Max number of lines to index into file cache. > + > -param=fsm-scale-path-stmts= > Common Joined UInteger Var(param_fsm_scale_path_stmts) Init(2) > IntegerRange(1, 10) Param Optimization > Scale factor to apply to the number of statements in a threading > path crossing a loop backedge when comparing to max-jump-thread- > duplication-stmts. > diff --git a/gcc/toplev.cc b/gcc/toplev.cc > index d45a12cab45..e03af8b1805 100644 > --- a/gcc/toplev.cc > +++ b/gcc/toplev.cc > @@ -2333,6 +2333,8 @@ toplev::main (int argc, char **argv) > UNKNOWN_LOCATION, global_dc, > targetm.target_option.override); > > + file_cache::tune (param_file_cache_files, param_file_cache_lines); > + > handle_common_deferred_options (); > > init_local_tick ();
Re: [PATCH] gcc: Add tree walk case to reach A pack from B in ...B>. [PR118265]
On 1/27/25 1:26 PM, Patrick Palka wrote: On Wed, 1 Jan 2025, A J Ryan Solutions Ltd wrote: Hi and happy new year, this patch is to fix a compiler seg-fault as encountered in the following example: Hi, thanks for the patch! Your fix makes sense to me, and I believe it also fixes the testcases from PR102626 and at least some of its related PRs. Yes, thanks! In addition to Patrick's comments, all of which I agree with, the subject line should start with "c++" rather than "gcc". We also leave out a closing period. The subject line, and the rest of the commit message, must have lines no longer than 76 characters. For the subject line specifically, under 70 characters is preferable, for the sake of git log --oneline. Also, the testcase should not go in the main g++.dg/ directory, but a subdirectory, with a filename indicating what it tests. Perhaps cpp1z/variadic-nontype1.C template struct Class1{}; template class Class2; template...Un> class Class2 { public: void apply(){} }; Class1 class1_bool; Class1 class1_char; int main() { Class2 class2; class2.apply(); } where it ends up populating the V argument in the instantiated template at the class2 declaration with the unknown placeholder and the segfault is when it later tries to get the name for an incompletely resolved type. Here's a more reduced example that's fixed by your patch: template struct A { }; template void f(A...); int main() { f(A<0>{}, A<1u>{}, A<2l>{}); // OK, Ts deduced to {int,unsigned,long} } A related PR that isn't and shouldn't be fixed by this patch is PR84796 where the type parameter pack is declared at a different level than the non-type parameter pack, e.g.: template struct A { }; template struct B { template static void f(A); }; int main() { A<1, true, 'c'> a; B::f(a); } Before your patch we would ICE on this valid testcase, after we now reject it with a bogus error: : In function ‘int main()’: :12:24: error: no matching function for call to ‘B::f(A<1, true, 'c'>&)’ :12:24: note: there is 1 candidate :7:15: note: candidate 1: ‘template static void B::f(A) [with Ts ...Vs = {Vs ...}; Ts = {int, bool, char}]’ :7:15: note: template argument deduction/substitution failed: :12:24: note: ‘Vs’ is not equivalent to ‘1’ Which is no worse than ICEing, I suppose :) The incorrect processing happens in unify_pack_expansion where it is calling unify_one_argument here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l24718) for each argument of the argument pack and recursively unifying the inner arguments. In this example the inner argument against V (bool/char) is set in targs however it is not seen as a pack argument in unify_pack_expansion because in the unfixed code it does not reach and link this associated pack and later on when processing the collected arguments to substitute it in in instantiate_class_template the argument has not been set as an argument pack in targs and it doesn't match the the template specialisation parameter which is a parameter pack and it falls back to the unknown placeholder. The parameter packs are originally linked in make_pack_expansion here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l4246) and the change is in this tree walk so that in the example above V is chained to Un and after calling unify_one_argument it loops over both packs and extracts their associated value set in targs and puts it into it's pack argument collection. Regards Adam Ryan Subject: [PATCH] Add tree walk case to obtain A pack in ...B>. For non-type parameter packs when unifying the arguments in unify_pack_expansion it iterates over the associated packs of a param so that when it recursivly unifies the param with the arguments it knows which targs have been populated with parameter pack arguments that it can then collect up. This change adds a tree walk so that in the example above it reaches ...A and adds it to the associated packs for ...B and therefore knows it will have been set in targs in unify_pack_expansion and processes it as per other pack arguments. PR gcc/118265 PR c++/118265 rather A ChangeLog entry is missing from the commit message as per https://gcc.gnu.org/contribute.html#patches Here's an example ChangeLog we can use for this patch: gcc/cp/ChangeLog: * pt.cc (find_parameter_packs_r) : Walk into the type of a parameter pack. Signed-off-by: Adam J Ryan --- gcc/cp/pt.cc| 5 + gcc/testsuite/g++.dg/pr118265.C | 17 + 2 files changed, 22 insertions(+) create mode 100644 gcc/testsuite/g++.dg/pr118265.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 0ffa0b53e26..22f5d4b1875 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -4012,6 +4012,11 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees,
Re: [PATCH] RISC-V: ensure needed FRM restore is not eliminable [PR118646]
On 1/26/25 05:29, Jeff Law wrote: > > On 1/24/25 3:12 PM, Vineet Gupta wrote: >> RV-Vector FP-INT insns use the rounding mode in FRM register which if >> explicitly set for V insn needs, is saved/restored (although from the >> psABI CC Spec, it is not clear if it actually a caller-saved or >> callee-saved). >> >> Anyhow in the failure case the save/restore were generated by the >> Mode Switch pass, but then eliminated by sched1:DCE and Late-Combine. >> Fix this by using unspec_volatile variant which won't be eliminated. >> >> This showed up as SPEC2017 527.cam4 runtime aborts in glibc:round_away() >> which checks for standard rounding modes and the "leaking" rounding mode >> due to the bug happened to be a non-standard RISC-V specific RMM >> "Round to Nearest, ties to Max". >> >> This is testsuite clean: > Not sure how it could be clean as I think the test itself is busted ;-) Sorry for the snafu; I would blame "send the patch out on friday" that bit me. I used the raw test for debugging and coming up with the fix, but before integrating the test properly I ran the existing testsuite w/ my patch and made sure it didn't regress anything, so those test results are for real minus the new test. > As-is it'll trigger compile time failures: >> FAIL: gfortran.target/riscv/rvv/pr118646.f90 -O0 (test for excess errors) >> Excess errors: >> /home/jlaw/test/gcc/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90:18:12: >> Warning: Deleted feature: End expression in DO loop at (1) must be integer >> /home/jlaw/test/gcc/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90:22:15: >> Warning: Deleted feature: End expression in DO loop at (1) must be integer >> /home/jlaw/test/gcc/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90:36:18: >> Warning: Deleted feature: End expression in DO loop at (1) must be integer Added -std-leagcy to address that > A "-w" will work around that, but then there's no Fortran equivalent of > main and we get a link error: >> FAIL: gfortran.target/riscv/rvv/pr118646.f90 -O0 (test for excess errors) >> Excess errors: >> /release/linux/.build/src/glibc-git-4d29ec7c/csu/../sysdeps/riscv/start.S:67:(.text+0x22): >> undefined reference to `main' >> >> UNRESOLVED: gfortran.target/riscv/rvv/pr118646.f90 -O0 compilation failed >> to produce executable Its not a run test, we just have to scan number of f.rm instances, which was also missing in the test :-( > What I wanted to do was use your testcase with Pan's patch to see if > Pan's patch resolved both issues. Yes it does. > Your compiler patch may still be desirable as well, I really haven't > really evaluated that yet. FWIW with Pan's rework the need for volatile goes away anyways. The fixed up test (actually tested for A/B pass/fail) which can be integrated in Pan's commit can be found below -> diff --git a/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 new file mode 100644 index ..e2a034316123 --- /dev/null +++ b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 @@ -0,0 +1,48 @@ +! Reduced from SPEC2017 527.cam4 zm_conv.F90 + +! { dg-do compile } +! { dg-options "-Ofast -std=legacy -march=rv64gcv_zvl256b_zba_zbb_zbs_zicond -ftree-vectorize -mabi=lp64d" } + +module a + contains +subroutine b(f) + + real d(4) + integer e(4) + integer f(4) + real hmax(4) + real g(4) + + integer h(4) + integer l(4,5) + do i = 1,c + h(i) = 0 + end do + do k = j ,1 + do i = 1,c + q = g(i) + hmax(i) + if (k >= nint(d(i)) .and. k <= e(i) .and. q > 1.e4) then + f(i) = k + end if + if (k < o ) then + if (buoy<= 0.) then + l(i,h) = k + end if + end if + end do + end do + do n = 1,5 + do k = 1,m + do i = 1,c + if (k > l(i,n)) then + p = r() + end if + end do + end do + end do +end +end + +! { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 1 } } +! { dg-final { scan-assembler-times {fsrmi\s+[01234]} 1 } } +! { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 1 } }
Re: [PATCH v3] c++: Don't prune constant capture proxies only used in array dimensions [PR114292]
On 1/27/25 2:34 PM, Simon Martin wrote: Hi Jason, On 27 Jan 2025, at 15:23, Jason Merrill wrote: On 1/27/25 8:14 AM, Simon Martin wrote: Hi Jason, On 24 Jan 2025, at 16:51, Jason Merrill wrote: On 1/24/25 6:19 AM, Simon Martin wrote: Hi Jason, On 23 Jan 2025, at 22:56, Jason Merrill wrote: On 1/23/25 12:06 PM, Simon Martin wrote: Hi Marek, On 23 Jan 2025, at 16:45, Marek Polacek wrote: On Thu, Jan 23, 2025 at 02:40:09PM +, Simon Martin wrote: Hi Jason, On 20 Jan 2025, at 22:49, Jason Merrill wrote: On 1/20/25 2:11 PM, Simon Martin wrote: Hi Jason, On 15 Jan 2025, at 22:42, Jason Merrill wrote: On 12/12/24 2:07 PM, Simon Martin wrote: We currently ICE upon the following valid (under -Wno-vla) code === cut here === void f(int c) { constexpr int r = 4; [&](auto) { int t[r * c]; }(0); } === cut here === The problem is that when parsing the lambda body, and more specifically the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED even though the replacement of r by 4 is "undone" by the call to build_min_non_dep in build_x_binary_op. This makes prune_lambda_captures remove the proxy declaration while it should not, and we trip on an assert at instantiation time. Why does prune_lambda_captures remove the declaration if it's still used in the function body? Setting LAMBDA_EXPR_CAPTURE_OPTIMIZED just means "we might have optimized away some captures", the tree walk should have found the use put back by build_x_binary_op. I think that this is due to a bug in mark_const_cap_r, that’s been here since the beginning, i.e. r8-7213-g1577f10a637352: it decides NOT to walk sub-trees when encountering a DECL_EXPR for a constant capture proxy (at lambda.cc:1769). I don’t understand why we wouldn’t want to continue. Because that's the declaration of the capture proxy, not a use of it. Indeed, thanks for clarifying. Why aren't we finding the use in the declaration of t? After further investigation, the problem is rather that neither walk_tree_1 nor cp_walk_subtrees walk the dimensions of array types, so we miss the uses. Removing that line fixes the PR, but breaks 3 existing tests (lambda-capture1.C, lambda-const11.C and lambda-const11a.C, that all assert that we optimise out the capture); that’s why I did not pursue it in the first place. Right. But taking another look, it might not be that big a deal that we don’t optimise those out: as soon as we use -O1 or above, the assignment to the closure field actually disappears. Completely breaking this optimization is a big deal, particularly since it affects the layout of closure objects. We can't always optimize everything away. ACK. The attached updated patch makes cp_walk_subtrees walk array type dimensions, which fixes the initial PR without those 3 regressions. Successfully tested on x86_64-pc-linux-gnu. Is it OK? Simon PS: In theory I think it’d make sense to do do this change in walk_tree_1 since C also supports VLAs, but doing so breaks some OMP tests. OMP can do interesting things with array bounds (see r9-5354-gda972c05f48637), and fixing those would require teaching walk_tree_1 about the “omp dummy var” array bounds, which I think would be a bit ugly. And I’m not aware of any C case that would be improved/fixed by doing this change, so we’re probably fine not doing it. From e19bb6c943a422b1201c5c9a2a1d4f32141baf84 Mon Sep 17 00:00:00 2001 From: Simon Martin Date: Wed, 22 Jan 2025 16:19:47 +0100 Subject: [PATCH] c++: Don't prune constant capture proxies only used in array dimensions [PR114292] We currently ICE upon the following valid (under -Wno-vla) code === cut here === void f(int c) { constexpr int r = 4; [&](auto) { int t[r * c]; }(0); } === cut here === When parsing the lambda body, and more specifically the multiplication, we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which indicates to prune_lambda_captures that it might be possible to optimize out some captures. The problem is that prune_lambda_captures then misses the use of the r capture (because neither walk_tree_1 nor cp_walk_subtrees walks the dimensions of array types - here "r * c"), hence believes the capture can be pruned... and we trip on an assert when instantiating the lambda. This patch changes cp_walk_subtrees so that when walking a declaration with an array type, it walks that array type's dimensions. Since C also supports VLAs, I thought it'd make sense to do this in walk_tree_1, but this breaks some omp-low related test cases (because OMP can do funky things with array bounds when adjust_array_error_bounds is set), and I don't really want to add code to walk_tree_1 to skips arrays whose dimension is a temporary variable with the "omp dummy var" attribute. Successfully tested on x86_64-pc-linux-gnu. PR c++/114292 gcc/cp/ChangeLog: * tree.cc (cp_walk_subtrees): Walk array type dimensions. gcc/testsuite/C
[COMMITTED] RISC-V: Add another test for FRM elimination bug [PR118646]
The issue is same as PR118103 and fixed by commit 55d288d4ff53 ("RISC-V: Make FRM as global register [PR118103]"). Essentially FRM save/restore were getting eliminated because FRM reg semantics were not being modelled correctly. In this case it showed up as SPEC2017 527.cam4 runtime aborts in glibc:round_away() due to non-canonical rounding mode showing up, "leaking" earlier in the call chain because such rounding mode save/restore was getting eliminated. PR target/118646 gcc/testsuite/ChangeLog: * gfortran.target/riscv/rvv/pr118646.f90 (New Test). Signed-off-by: Vineet Gupta --- .../gfortran.target/riscv/rvv/pr118646.f90| 48 +++ 1 file changed, 48 insertions(+) create mode 100644 gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 diff --git a/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 new file mode 100644 index ..e2a034316123 --- /dev/null +++ b/gcc/testsuite/gfortran.target/riscv/rvv/pr118646.f90 @@ -0,0 +1,48 @@ +! Reduced from SPEC2017 527.cam4 zm_conv.F90 + +! { dg-do compile } +! { dg-options "-Ofast -std=legacy -march=rv64gcv_zvl256b_zba_zbb_zbs_zicond -ftree-vectorize -mabi=lp64d" } + +module a + contains +subroutine b(f) + + real d(4) + integer e(4) + integer f(4) + real hmax(4) + real g(4) + + integer h(4) + integer l(4,5) + do i = 1,c + h(i) = 0 + end do + do k = j ,1 + do i = 1,c + q = g(i) + hmax(i) + if (k >= nint(d(i)) .and. k <= e(i) .and. q > 1.e4) then +f(i) = k + end if + if (k < o ) then +if (buoy<= 0.) then + l(i,h) = k +end if + end if + end do + end do + do n = 1,5 + do k = 1,m + do i = 1,c +if (k > l(i,n)) then + p = r() +end if + end do + end do + end do +end +end + +! { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 1 } } +! { dg-final { scan-assembler-times {fsrmi\s+[01234]} 1 } } +! { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 1 } } -- 2.43.0
Re: [PATCH] wwwdocs: Clarify DCO name/identity and (anonymous) pseudonym policy
Hi, On Mon, Jan 20, 2025 at 11:59:16AM +0100, Mark Wielaard wrote: > On Sun, Jan 19, 2025 at 05:14:20PM +0100, Mark Wielaard wrote: > > On Tue, Dec 17, 2024 at 04:40:10PM +0900, Gerald Pfeifer wrote: > > > On Mon, 2 Dec 2024, Mark Wielaard wrote: > > > > Adjust the DCO text to match the broader community usage and > > > > clarifications around the use of real names, known identities and > > > > (anonymous) pseudonyms. > > > > > > > > These changes clarify what was meant by "real name" and that it is not > > > > required to be a "legal name" or any other stronger requirement than a > > > > known identity that could be contacted to discuss the contribution as > > > > adopted by other communities like the linux kernel, elfutils, cncf and > > > > gentoo. > > > > > > > > Also explain that the FSF assignment policy might be more appropriate > > > > when wanting to contribute using an anonymous pseudonym. > > > > > > This looks fine to me personally, though I don't feel I can simply > > > approve > > > this wearing my wwwdocs hat (which is why I haven't responded earlier). > > > > > > Jason, you have originally contributed this if I remember correctly; how > > > do we go about such a change? > > > > Please let me know. Updated patch below. > > > > > (One minor bit: The sentence "This will be done for you automatically if > > > you use `git commit -s`" feels a bit off now there is more details on > > > something else preceeding it. i.e., `git commit -s` does not establish > > > real names as such. Maybe leave it earlier in the paragraph?) > > > > Yes, good point. I swapped the sentences. > > And then I attached the old patch anyway. Sorry. Really updated patch > attached now. Ping. > > P.S. There will be a panel discussion organized by the FSF licensing > > and compliance manager on copyrights and developer certificates of > > origin at Fosdem: > > https://fosdem.org/2025/schedule/event/fosdem-2025-5376-managing-copyrights-in-free-software-projects-discussion-panel-with-gnu-maintainers/ > > I'll try to attend and report back. Also the GDB hackers and the FSF have a discussion about clarifications of copyright assignment and DCO practices: https://inbox.sourceware.org/gdb/605f81a2-000e-42a3-b5fd-1854e92c5...@fsf.org/ > From b385e23076bdf843068f8242995461bef2e79748 Mon Sep 17 00:00:00 2001 > From: Mark Wielaard > Date: Mon, 2 Dec 2024 11:16:00 +0100 > Subject: [PATCH] wwwdocs: Clarify DCO name/identity and (anonymous) pseudonym > policy > > Adjust the DCO text to match the broader community usage and > clarifications around the use of real names, known identities and > (anonymous) pseudonyms. > > These changes clarify what was meant by "real name" and that it is not > required to be a "legal name" or any other stronger requirement than a > known identity and can be contacted to discuss the contribution as > adopted by other communities like the linux kernel, elfutils, cncf and > gentoo. > > Also explain that the FSF assignment policy might be more appropriate > when wanting to contribute using an anonymous pseudonym or when > needing an explicit company disclaimer. > --- > htdocs/dco.html | 17 +++-- > 1 file changed, 15 insertions(+), 2 deletions(-) > > diff --git a/htdocs/dco.html b/htdocs/dco.html > index 68fa183b9fc0..0bf0d3fedf34 100644 > --- a/htdocs/dco.html > +++ b/htdocs/dco.html > @@ -54,8 +54,21 @@ then you just add a line saying: > > Signed-off-by: Random J Developer >> > -using your real name (sorry, no pseudonyms or anonymous contributions.) This > -will be done for you automatically if you use `git commit -s`. > +using a known identity (sorry, no anonymous contributions.) This will > +be done for you automatically if you use `git commit -s`. The name > +you use as your identity should not be an anonymous id or false name > +that misrepresents who you are. > + > +A known identity can be the committer's real, birth or legal name, > +but can also be an established (online) identity. It is the name you > +convey to people in the community for them to use to identify you as > +you. A key concern is that your identification is sufficient enough > +to contact you if an issue were to arise in the future about your > +contribution. You should not deliberately use a name or email address > +that hides your identity. When you wish to only contribute under an > +(anonymous) pseudonym, or when you require an explicit employer > +disclaimer, then following the FSF > +assignment process is more appropriate. > > Some people also put extra optional tags at the end. The GCC project does > not require tags from anyone other than the original author of the patch, but > -- > 2.47.1 >
Re: [PATCH] c: For array element type drop qualifiers but keep other properties of the element type [PR116357]
On Sat, 25 Jan 2025, Jakub Jelinek wrote: > The following patch uses build_qualified_type with TYPE_UNQUALIFIED instead, > which will be in the common case the same as TYPE_MAIN_VARIANT if the > checks are satisfied for it, but if not, will look up different unqualified > type or even create it if there is none. Using build_qualified_type would I think also have the effect of not removing the qualifiers from the element type of an array type. The design followed in grokdeclarator is given by the comment: At each stage we maintain an unqualified version of the type together with any qualifiers that should be applied to it with c_build_qualified_type; this way, array types including multidimensional array types are first built up in unqualified form and then the qualified form is created with TYPE_MAIN_VARIANT pointing to the unqualified form. */ The patch is OK with c_build_qualified_type used instead of build_qualified_type, if that works. -- Joseph S. Myers josmy...@redhat.com
Re: [PATCH] c++: friend vs inherited guide confusion [PR117855]
On 1/27/25 4:17 PM, Patrick Palka wrote: On Mon, 27 Jan 2025, Patrick Palka wrote: Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/14? OK. +&& (cxx_dialect < cxx23 || !deduction_guide_p (NODE))) \ N.B. I decided to check cxx_dialect here since deduction_guide_p seems relatively expensive to call? In another version of this patch I made us track dguide-ness via a dedicated bit flag instead of the DECL_NAME comparison to make the predicate essentially free (and potentially also more robust) but I eventually shelved that as more stage 1 material. deduction_guide_p doesn't look like a significant expense to me; I wouldn't bother trying to optimize it without profile data as evidence. Jason
[pushed] c++: only strip conversions for deduction [PR118632]
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- In r15-2761 I changed unify to always strip IMPLICIT_CONV_EXPR from PARM. In this testcase that leads to comparing nullptr to (int*)0, and failing because they aren't the same. Let's only strip conversions if we're actually going to try to deduce template arguments. While we're at it, let's move this after the early exits. And with this adjustment we can remove the workaround for mangle57.C. PR c++/118632 gcc/cp/ChangeLog: * pt.cc (unify): Only strip conversion if deducible_expression. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/nontype7.C: New test. --- gcc/cp/pt.cc | 24 ++-- gcc/testsuite/g++.dg/cpp0x/nontype7.C | 13 + 2 files changed, 23 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/nontype7.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index d4a6e2e0675..2811afed007 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -25099,15 +25099,6 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict, ? tf_warning_or_error : tf_none); - /* I don't think this will do the right thing with respect to types. - But the only case I've seen it in so far has been array bounds, where - signedness is the only information lost, and I think that will be - okay. VIEW_CONVERT_EXPR can appear with class NTTP, thanks to - finish_id_expression_1, and are also OK. */ - while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR -|| TREE_CODE (parm) == IMPLICIT_CONV_EXPR) -parm = TREE_OPERAND (parm, 0); - if (arg == error_mark_node) return unify_invalid (explain_p); if (arg == unknown_type_node @@ -25119,11 +25110,16 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict, if (parm == any_targ_node || arg == any_targ_node) return unify_success (explain_p); - /* Stripping IMPLICIT_CONV_EXPR above can produce this mismatch - (g++.dg/abi/mangle57.C). */ - if (TREE_CODE (parm) == FUNCTION_DECL - && TREE_CODE (arg) == ADDR_EXPR) -arg = TREE_OPERAND (arg, 0); + /* Strip conversions that will interfere with NTTP deduction. + I don't think this will do the right thing with respect to types. + But the only case I've seen it in so far has been array bounds, where + signedness is the only information lost, and I think that will be + okay. VIEW_CONVERT_EXPR can appear with class NTTP, thanks to + finish_id_expression_1, and are also OK. */ + if (deducible_expression (parm)) +while (CONVERT_EXPR_P (parm) || TREE_CODE (parm) == VIEW_CONVERT_EXPR + || TREE_CODE (parm) == IMPLICIT_CONV_EXPR) + parm = TREE_OPERAND (parm, 0); /* If PARM uses template parameters, then we can't bail out here, even if ARG == PARM, since we won't record unifications for the diff --git a/gcc/testsuite/g++.dg/cpp0x/nontype7.C b/gcc/testsuite/g++.dg/cpp0x/nontype7.C new file mode 100644 index 000..30c1d3933ee --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/nontype7.C @@ -0,0 +1,13 @@ +// PR c++/118632 +// { dg-do compile { target c++11 } } + +template +class Matrix {}; + +template +void operator*(Matrix, int); + +int main() { + Matrix m; + m * 42; +} base-commit: ceabea405ffdc851736e240111be9b297ad86c53 -- 2.48.0
[PATCH] combine: Fix up make_extraction [PR118638]
Hi! The following testcase is miscompiled at -Os on x86_64-linux. The problem is during make_compound_operation of (ashiftrt:SI (ashift:SI (mult:SI (reg:SI 107 [ a_5 ]) (const_int 3 [0x3])) (const_int 31 [0x1f])) (const_int 31 [0x1f])) where it incorrectly returns (mult:SI (sign_extract:SI (reg:SI 107 [ a_5 ]) (const_int 2 [0x2]) (const_int 0 [0])) (const_int 3 [0x3])) which isn't obviously true, the former returns either 0 or -1 depending on the least significant bit of the multiplication, the latter returns either 0 or -3 depending on the second least significant bit of the multiplication argument. The bug has been introduced in PR96998 r11-4563, which added handling of x * (2^N) similar to x << N. In the above case, pos is 0 and len is 1, sign extracting a single least significant bit of the multiplication. As 3 is not a power of 2, shift_amt is -1. But IN_RANGE (-1, 1, 1 - 1) is still true, because the basic requirement of IN_RANGE that LOWER is not greater than UPPER is violated. The intention of using 1 as LOWER is to avoid matching multiplication by 1, that really shouldn't appear in the IL. But to avoid violating IN_RANGE requirement, we need to verify that len is at least 2. I've added this len > 1 check to the inner if rather than outer because I think for GCC 16 we should add a further optimization. In the particular case of 1 least significant bit sign extraction from multiplication by 3, we could actually say it is equivalent to (sign_extract:SI (reg:SI 107 [ a_5 ]) (const_int 1 [0x2]) (const_int 0 [0])) That is because 3 is an odd number and multiplication by 2 will yield the least significant bit 0 (we are sign extracting just one) and so the multiplication doesn't change anything on the outcome. More generally, even for larger len, multiplication by C which is (1 << X) + 1 where X is >= len should be optimizable just to extraction of the multiplicand's least significant len bits. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-01-28 Jakub Jelinek PR rtl-optimization/118638 * combine.cc (make_extraction): Only optimize (mult x 2^n) if len is larger than 1. * gcc.c-torture/execute/pr118638.c: New test. --- gcc/combine.cc.jj 2025-01-27 16:38:47.0 +0100 +++ gcc/combine.cc 2025-01-27 18:55:22.082925556 +0100 @@ -7629,7 +7629,7 @@ make_extraction (machine_mode mode, rtx least significant (LEN - C) bits of X, giving an rtx whose mode is MODE, then multiply it by 2^C. */ const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1))); - if (IN_RANGE (shift_amt, 1, len - 1)) + if (len > 1 && IN_RANGE (shift_amt, 1, len - 1)) { new_rtx = make_extraction (mode, XEXP (inner, 0), 0, 0, len - shift_amt, --- gcc/testsuite/gcc.c-torture/execute/pr118638.c.jj 2025-01-27 18:56:30.246986900 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr118638.c 2025-01-27 18:56:13.660215308 +0100 @@ -0,0 +1,20 @@ +/* PR rtl-optimization/118638 */ + +__attribute__((noipa)) int +foo (int x) +{ + int a = x != -3, b, c; + a *= 3; + b = 2 * x - 9; + a = a + b; + a = ~a; + c = a & 1; + return -c; +} + +int +main () +{ + if (foo (0) != -1) +__builtin_abort (); +} Jakub
[patch, libfortran] PR114618 Format produces incorrect output when contains 1x, ok when uses " "
Hello all, The attached patch is part 1 of my effort to fix these X and T edit descriptor issues. This one cleans up some really ugly output. Before the patch with the test case provided by the reporter: PI.^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ 3.1415926535897931 REAL(PI)...^@^@^@^@^@^@^@^@^@^@^@ 3.14159274 DBLE(PI)...^@^@^@^@^@^@^@^@^@^@^@ 3.1415926535897931 RADIX..^@^@^@^@^@^@^@^@^@^@^@ 3. 2 RANGE..^@^@^@^@^@^@^@^@^@^@^@ 3. 307 PRECISION..^@^@^@^@^@^@^@^@^@^@ 15 Which is complete garbage. After the patch: PI 3.1415926535897931 REAL(PI) .. 3.14159274 DBLE(PI) .. 3.1415926535897931 RADIX . 2 RANGE . 307 PRECISION . 15 I greatly reduced the test case included in the patch. While working on this one I discovered other problems not addressed by this patch and I will address these through PR113897. You will see some changes in the factoring of some of the code in the case FMT_X:, case FMT_TR:, case FMT_TL:, case FMT_T:. I anticipate in part 2 that I will be doing more specific changes on these. Regression tested on x86_64_linux_gnu. OK for trunk? Regards, Jerry Author: Jerry DeLisle Date: Mon Jan 27 19:08:46 2025 -0800 Fortran: Fix handling of the X edit descriptor. This patch is a partial fix of handling of X edit descriptors when combined with certain T edit descriptors. PR libfortran/114618 libgfortran/ChangeLog: * io/transfer.c (formatted_transfer_scalar_write): Change name of vriable 'pos' to 'tab_pos' to improve clarity. Add new variable next_pos when calculating the maximum position. Update the calculation of pending spaces. gcc/testsuite/ChangeLog: * gfortran.dg/pr114618.f90: New test. diff --git a/gcc/testsuite/gfortran.dg/pr114618.f90 b/gcc/testsuite/gfortran.dg/pr114618.f90 new file mode 100644 index 000..c06c4debe31 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr114618.f90 @@ -0,0 +1,15 @@ +! { dg-do run } +! PR114618 Format produces incorrect output when contains 1x, ok when uses " " +! aside: Before patch output1 is garbage. +! Output should be 'RADIX . 2' +program pr114618 + implicit none + integer, parameter :: wp = kind(0d0) + real(kind=wp) :: pi = 3.14159265358979323846264338_wp + character(len=*), parameter:: fmt1 = '(19("."),t1,g0,1x,t21,g0)' + character(len=*), parameter:: fmt2 = '(19("."),t1,g0," ",t21,g0)' + character(30) :: output1, output2 + write (output1, fmt1) 'RADIX', radix(pi) + write (output2, fmt2) 'RADIX', radix(pi) + if (output1 /= output2) stop 1 +end program pr114618 diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c index b3b72f39c5b..8a24a099afe 100644 --- a/libgfortran/io/transfer.c +++ b/libgfortran/io/transfer.c @@ -2068,12 +2068,14 @@ static void formatted_transfer_scalar_write (st_parameter_dt *dtp, bt type, void *p, int kind, size_t size) { - gfc_offset pos, bytes_used; + gfc_offset tab_pos, bytes_used; const fnode *f; format_token t; int n; int consume_data_flag; + tab_pos = 0; bytes_used = 0; + /* Change a complex data item into a pair of reals. */ n = (p == NULL) ? 0 : ((type != BT_COMPLEX) ? 1 : 2); @@ -2398,10 +2400,17 @@ formatted_transfer_scalar_write (st_parameter_dt *dtp, bt type, void *p, int kin case FMT_X: case FMT_TR: consume_data_flag = 0; - dtp->u.p.skips += f->u.n; - pos = bytes_used + dtp->u.p.skips - 1; - dtp->u.p.pending_spaces = pos - dtp->u.p.max_pos + 1; + tab_pos = bytes_used + dtp->u.p.skips - 1; + dtp->u.p.pending_spaces = tab_pos - dtp->u.p.max_pos + 1; + dtp->u.p.pending_spaces = dtp->u.p.pending_spaces < 0 +? f->u.n : dtp->u.p.pending_spaces; + + if (t == FMT_X && tab_pos < dtp->u.p.max_pos) + { + write_x (dtp, dtp->u.p.skips, dtp->u.p.pending_spaces); + dtp->u.p.skips = dtp->u.p.pending_spaces = 0; + } /* Writes occur just before the switch on f->format, above, so that trailing blanks are suppressed, unless we are doing a non-advancing write in which case we want to output the blanks @@ -2414,35 +2423,50 @@ formatted_transfer_scalar_write (st_parameter_dt *dtp, bt type, void *p, int kin break; case FMT_TL: - case FMT_T: consume_data_flag = 0; - - if (f->format == FMT_TL) + /* Handle the special case when no bytes have been used yet. + Cannot go below zero. */ + if (bytes_used == 0) { - - /* Handle the special case when no bytes have been used yet. - Cannot go below zero. */ - if (bytes_used == 0) - { - dtp->u.p.pending_spaces -= f->u.n; - dtp->u.p.skips -= f->u.n; - dtp->u.p.skips = dtp->u.p.skips < 0 ? 0 : dtp->u.p.skips; - } - - pos = bytes_used - f->u.n; + dtp->u.p.pending_spaces -= f->u.n; + dtp->u.p.skips -= f->u.n; + dtp->u.p.skips = dtp->u.p.skips < 0 ? 0
[PATCH] libstdc++: Fix views::transform(move_only_fn{}) forwarding [PR118413]
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14? -- >8 -- This case was incorrectly failing in C++23 mode even after P2492R2 because the perfect forwarding simplification mechanism assumed bound arguments are copy-constructible which is no longer necessarily true after that paper. It'd be easy enough to fix the mechanism, but in C++23 mode the mechanism is no longer useful since we can just rely on deducing 'this' to implement perfect forwarding with a single overload (and done in r14-7150-gd2cb4693a0b383). So this patch just disables the mechanism in C++23 mode so that the generic implementation is always used. PR libstdc++/118413 libstdc++-v3/ChangeLog: * include/std/ranges (_GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING): New. (__closure_has_simple_call_op): Only define in C++20 mode. (__closure_has_simple_extra_args): Likewise. (_Partial, _Pipe): Likewise, for the "simple" partial specializations. (*::_S_has_simple_call_op): Likewise. (*::_S_has_simple_extra_args): Likewise. * testsuite/std/ranges/adaptors/100577.cc: Disable some implementation detail checks in C++23 mode. * testsuite/std/ranges/adaptors/transform.cc (test09): Also test partially applying the move-only function. --- libstdc++-v3/include/std/ranges | 53 +++ .../testsuite/std/ranges/adaptors/100577.cc | 4 +- .../std/ranges/adaptors/transform.cc | 2 + 3 files changed, 58 insertions(+), 1 deletion(-) diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges index ad69a94b21f..a2b8748bea6 100644 --- a/libstdc++-v3/include/std/ranges +++ b/libstdc++-v3/include/std/ranges @@ -1028,6 +1028,16 @@ namespace views::__adaptor } }; +#if __cplusplus <= 202003L + // In C++20 mode we simplify perfect forwarding of a range adaptor closure's + // bound arguments when possible (according to their types), for sake of compile + // times and diagnostic quality. In C++23 mode we instead rely on deducing 'this' + // to idiomatically implement perfect forwarding. Note that this means the + // simplification logic doesn't consider C++23 changes such as P2492R2. +# define _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING 1 +#endif + +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING // True if the range adaptor closure _Adaptor has a simple operator(), i.e. // one that's not overloaded according to constness or value category of the // _Adaptor object. @@ -1039,6 +1049,7 @@ namespace views::__adaptor template concept __adaptor_has_simple_extra_args = _Adaptor::_S_has_simple_extra_args || _Adaptor::template _S_has_simple_extra_args<_Args...>; +#endif // A range adaptor closure that represents partial application of // the range adaptor _Adaptor with arguments _Args. @@ -1139,6 +1150,7 @@ namespace views::__adaptor #endif }; +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING // Partial specialization of the primary template for the case where the extra // arguments of the adaptor can always be safely and efficiently forwarded by // const reference. This lets us get away with a single operator() overload, @@ -1195,6 +1207,7 @@ namespace views::__adaptor static constexpr bool _S_has_simple_call_op = true; }; +#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING template concept __pipe_invocable @@ -1245,6 +1258,7 @@ namespace views::__adaptor #endif }; +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING // A partial specialization of the above primary template for the case where // both adaptor operands have a simple operator(). This in turn lets us // implement composition using a single simple operator(), which makes @@ -1271,6 +1285,7 @@ namespace views::__adaptor static constexpr bool _S_has_simple_call_op = true; }; +#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING } // namespace views::__adaptor #if __cpp_lib_ranges >= 202202L @@ -1454,7 +1469,9 @@ namespace views::__adaptor return owning_view{std::forward<_Range>(__r)}; } +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING static constexpr bool _S_has_simple_call_op = true; +#endif }; inline constexpr _All all; @@ -1872,7 +1889,9 @@ namespace views::__adaptor using _RangeAdaptor<_Filter>::operator(); static constexpr int _S_arity = 2; +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING static constexpr bool _S_has_simple_extra_args = true; +#endif }; inline constexpr _Filter filter; @@ -2260,7 +2279,9 @@ namespace views::__adaptor using _RangeAdaptor<_Transform>::operator(); static constexpr int _S_arity = 2; +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING static constexpr bool _S_has_simple_extra_args = true; +#endif }; inline constexpr _Transform transform; @@ -2494,12
Re: [PATCH] libstdc++: Fix views::transform(move_only_fn{}) forwarding [PR118413]
On Mon, 27 Jan 2025, Patrick Palka wrote: > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14? > > -- >8 -- > > This case was incorrectly failing in C++23 mode even after P2492R2 > because the perfect forwarding simplification mechanism assumed bound > arguments are copy-constructible which is no longer necessarily true > after that paper. It'd be easy enough to fix the mechanism, but in > C++23 mode the mechanism is no longer useful since we can just rely on > deducing 'this' to implement perfect forwarding with a single overload > (and done in r14-7150-gd2cb4693a0b383). So this patch just disables > the mechanism in C++23 mode so that the generic implementation is always > used. I should add that this doesn't constitute an ABI change because the disabled "simple" partial specializations of _Partial and _Pipe have the same layout as the corresponding non-simple versions. > > PR libstdc++/118413 > > libstdc++-v3/ChangeLog: > > * include/std/ranges > (_GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING): New. > (__closure_has_simple_call_op): Only define in C++20 mode. > (__closure_has_simple_extra_args): Likewise. > (_Partial, _Pipe): Likewise, for the "simple" partial > specializations. > (*::_S_has_simple_call_op): Likewise. > (*::_S_has_simple_extra_args): Likewise. > * testsuite/std/ranges/adaptors/100577.cc: Disable some > implementation detail checks in C++23 mode. > * testsuite/std/ranges/adaptors/transform.cc (test09): Also test > partially applying the move-only function. > --- > libstdc++-v3/include/std/ranges | 53 +++ > .../testsuite/std/ranges/adaptors/100577.cc | 4 +- > .../std/ranges/adaptors/transform.cc | 2 + > 3 files changed, 58 insertions(+), 1 deletion(-) > > diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges > index ad69a94b21f..a2b8748bea6 100644 > --- a/libstdc++-v3/include/std/ranges > +++ b/libstdc++-v3/include/std/ranges > @@ -1028,6 +1028,16 @@ namespace views::__adaptor > } > }; > > +#if __cplusplus <= 202003L > + // In C++20 mode we simplify perfect forwarding of a range adaptor > closure's > + // bound arguments when possible (according to their types), for sake of > compile > + // times and diagnostic quality. In C++23 mode we instead rely on > deducing 'this' > + // to idiomatically implement perfect forwarding. Note that this means the > + // simplification logic doesn't consider C++23 changes such as > P2492R2. > +# define _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING 1 > +#endif > + > +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING >// True if the range adaptor closure _Adaptor has a simple operator(), i.e. >// one that's not overloaded according to constness or value category of > the >// _Adaptor object. > @@ -1039,6 +1049,7 @@ namespace views::__adaptor >template > concept __adaptor_has_simple_extra_args = > _Adaptor::_S_has_simple_extra_args >|| _Adaptor::template _S_has_simple_extra_args<_Args...>; > +#endif > >// A range adaptor closure that represents partial application of >// the range adaptor _Adaptor with arguments _Args. > @@ -1139,6 +1150,7 @@ namespace views::__adaptor > #endif > }; > > +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING >// Partial specialization of the primary template for the case where the > extra >// arguments of the adaptor can always be safely and efficiently forwarded > by >// const reference. This lets us get away with a single operator() > overload, > @@ -1195,6 +1207,7 @@ namespace views::__adaptor > >static constexpr bool _S_has_simple_call_op = true; > }; > +#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING > >template > concept __pipe_invocable > @@ -1245,6 +1258,7 @@ namespace views::__adaptor > #endif > }; > > +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING >// A partial specialization of the above primary template for the case > where >// both adaptor operands have a simple operator(). This in turn lets us >// implement composition using a single simple operator(), which makes > @@ -1271,6 +1285,7 @@ namespace views::__adaptor > >static constexpr bool _S_has_simple_call_op = true; > }; > +#endif // _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING > } // namespace views::__adaptor > > #if __cpp_lib_ranges >= 202202L > @@ -1454,7 +1469,9 @@ namespace views::__adaptor > return owning_view{std::forward<_Range>(__r)}; > } > > +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWARDING >static constexpr bool _S_has_simple_call_op = true; > +#endif > }; > > inline constexpr _All all; > @@ -1872,7 +1889,9 @@ namespace views::__adaptor > >using _RangeAdaptor<_Filter>::operator(); >static constexpr int _S_arity = 2; > +#ifdef _GLIBCXX_RANGES_SIMPLIFY_ADAPTOR_FORWA
[committed][PR target/114085] Fix H8 constraint issue which led to ICE
Nowhere near the top of my list, but a quick looksie Sunday led to an easy to fix backend bug. It's not a regression, but given its the H8 backend I think we've safely got a degree of freedom here. The H8 has a constraint "U" which allowed both a subset of MEMs and REGs, so it wasn't marked as a memory constraint. LRA doesn't really handle this well -- a pseudo which didn't get a hard reg was replaced by its MEM. The stack slot doesn't fit the limited addressing forms available and LRA didn't know it just needed to reload the address into a reg. Fixed by removing REG from the "U" constraint, turning "U" into a memory constraint and adjusting a few patterns to allow "rU" instead of "U". We don't really support C++ on the H8 and as a result libstdc++ won't build. Interestingly enough that also keeps the C++ tests from working, even for a compile-only test. So no testcase. Though I did check the reduced and original test manually and ran it through my tester without any regressions. Pushing to the trunk. Jeff commit 235215323c67d8ce021a00df0f42e2c1713c7959 Author: Jeff Law Date: Mon Jan 27 21:25:39 2025 -0700 [PR target/114085] Fix H8 constraint issue which led to ICE Nowhere near the top of my list, but a quick looksie Sunday led to an easy to fix backend bug. It's not a regression, but given its the H8 backend I think we've safely got a degree of freedom here. The H8 has a constraint "U" which allowed both a subset of MEMs and REGs, so it wasn't marked as a memory constraint. LRA doesn't really handle this well -- a pseudo which didn't get a hard reg was replaced by its MEM. The stack slot doesn't fit the limited addressing forms available and LRA didn't know it just needed to reload the address into a reg. Fixed by removing REG from the "U" constraint, turning "U" into a memory constraint and adjusting a few patterns to allow "rU" instead of "U". We don't really support C++ on the H8 and as a result libstdc++ won't build. Interestingly enough that also keeps the C++ tests from working, even for a compile-only test. So no testcase. Though I did check the reduced and original test manually and ran it through my tester without any regressions. PR target/114085 gcc/ * config/h8300/constraints.md (U): No longer accept REGs. * config/h8300/logical.md (andqi3_2): Use "rU" rather than "U". (andqi3_2_clobber_flags, andqi3_1, qi3_1): Likewise. * config/h8300/testcompare.md (tst_extzv_1_n): Likewise. diff --git a/gcc/config/h8300/constraints.md b/gcc/config/h8300/constraints.md index 65579267359..ba76df834a6 100644 --- a/gcc/config/h8300/constraints.md +++ b/gcc/config/h8300/constraints.md @@ -168,13 +168,9 @@ (define_constraint "T" (and (match_code "const_int") (match_test "!h8300_shift_needs_scratch_p (ival, SImode, CLOBBER)"))) -(define_constraint "U" +(define_memory_constraint "U" "An operand valid for a bset destination." - (ior (and (match_code "reg") - (match_test "(reload_in_progress || reload_completed) -? REG_OK_FOR_BASE_STRICT_P (op) -: REG_OK_FOR_BASE_P (op)")) - (and (match_code "mem") + (ior (and (match_code "mem") (match_code "reg" "0") (match_test "(reload_in_progress || reload_completed) ? REG_OK_FOR_BASE_STRICT_P (XEXP (op, 0)) diff --git a/gcc/config/h8300/logical.md b/gcc/config/h8300/logical.md index 5df0922ef4e..f848242ac87 100644 --- a/gcc/config/h8300/logical.md +++ b/gcc/config/h8300/logical.md @@ -32,7 +32,7 @@ (define_expand "3" ;; -- (define_insn_and_split "*andqi3_2" - [(set (match_operand:QI 0 "bit_operand" "=U,rQ,r") + [(set (match_operand:QI 0 "bit_operand" "=rU,rQ,r") (and:QI (match_operand:QI 1 "bit_operand" "%0,0,WU") (match_operand:QI 2 "h8300_src_operand" "Y0,rQi,IP1>X")))] "TARGET_H8300SX" @@ -42,7 +42,7 @@ (define_insn_and_split "*andqi3_2" (clobber (reg:CC CC_REG))])]) (define_insn "*andqi3_2_clobber_flags" - [(set (match_operand:QI 0 "bit_operand" "=U,rQ,r") + [(set (match_operand:QI 0 "bit_operand" "=rU,rQ,r") (and:QI (match_operand:QI 1 "bit_operand" "%0,0,WU") (match_operand:QI 2 "h8300_src_operand" "Y0,rQi,IP1>X"))) (clobber (reg:CC CC_REG))] @@ -55,7 +55,7 @@ (define_insn "*andqi3_2_clobber_flags" (set_attr "length_table" "*,logicb,*")]) (define_insn_and_split "andqi3_1" - [(set (match_operand:QI 0 "bit_operand" "=U,r") + [(set (match_operand:QI 0 "bit_operand" "=rU,r") (and:QI (match_operand:QI 1 "bit_operand" "%0,0") (match_operand:QI 2 "h8300_src_operand" "Y0,rn")))] "register_operand (operands[0], QImode) @@ -156,7 +156,7 @@ (define_insn "*andorsi3_shift_8_clobber_flags
[PATCH] c++: wrong-code with consteval constructor [PR117501]
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14? -- >8 -- We've had a wrong-code problem since r14-4140, due to which we forget to initialize a variable. In consteval39.C, we evaluate struct QQQ q; <>> (const char *) "" ) >; into struct QQQ q; <; and then the useless expr_stmt is dropped on the floor, so q isn't initialized. As pre-r14-4140, we need to handle constructors specially. With this patch, we generate: struct QQQ q; <; initializing q properly. PR c++/117501 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_immediate_r): If DECL is a constructor, build up an INIT_EXPR to initialize its object. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/consteval39.C: New test. * g++.dg/cpp2a/consteval40.C: New test. --- gcc/cp/cp-gimplify.cc| 13 ++-- gcc/testsuite/g++.dg/cpp2a/consteval39.C | 27 gcc/testsuite/g++.dg/cpp2a/consteval40.C | 25 ++ 3 files changed, 63 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval39.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval40.C diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc index 4ec3de13008..65b486ab943 100644 --- a/gcc/cp/cp-gimplify.cc +++ b/gcc/cp/cp-gimplify.cc @@ -1259,7 +1259,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_) not need to produce a constant expression. */ if (DECL_IMMEDIATE_FUNCTION_P (decl)) { - tree e = cxx_constant_value (stmt, tf_none); + tree obj_arg = NULL_TREE; + if (call_p && DECL_CONSTRUCTOR_P (decl)) + if (tree o = get_nth_callarg (stmt, 0)) + if (!is_dummy_object (o)) + { + obj_arg = o; + if (TREE_CODE (obj_arg) == ADDR_EXPR) + obj_arg = TREE_OPERAND (obj_arg, 0); + } + tree e = cxx_constant_value (stmt, obj_arg, tf_none); if (e == error_mark_node) { /* This takes care of, e.g., @@ -1297,7 +1306,7 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_) } /* We've evaluated the consteval function call. */ if (call_p) - *stmt_p = e; + *stmt_p = obj_arg ? cp_build_init_expr (obj_arg, e) : e; } /* We've encountered a function call that may turn out to be consteval later. Store its caller so that we can ensure that the call is diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval39.C b/gcc/testsuite/g++.dg/cpp2a/consteval39.C new file mode 100644 index 000..523e8260eab --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/consteval39.C @@ -0,0 +1,27 @@ +// PR c++/117501 +// { dg-do run { target c++20 } } + +constexpr unsigned +length () +{ + bool __trans_tmp_1 = __builtin_is_constant_evaluated(); + if (__trans_tmp_1) +return 42; + return 1; +} +struct basic_string_view { + constexpr basic_string_view(const char *) : _M_len{length()}, _M_str{} {} + long _M_len; + char _M_str; +}; +struct QQQ { + consteval QQQ(basic_string_view d) : data(d) {} + basic_string_view data; +}; +int +main () +{ + QQQ q(""); + if (q.data._M_len != 42) +__builtin_abort (); +} diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval40.C b/gcc/testsuite/g++.dg/cpp2a/consteval40.C new file mode 100644 index 000..4d3ba20092b --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/consteval40.C @@ -0,0 +1,25 @@ +// PR c++/117501 +// { dg-do run { target c++20 } } + +constexpr int +twiddle (int i) +{ + if (__builtin_is_constant_evaluated ()) +return 3; + return i; +} +struct S { + constexpr S(int i) : i{twiddle (i)} {} + int i; +}; +struct Q { + consteval Q(S s_) : s{s_, s_} {} + S s[2]; +}; +int +main () +{ + Q q(twiddle (42)); + if (q.s[0].i != 3 || q.s[1].i != 3) +__builtin_abort (); +} base-commit: ceabea405ffdc851736e240111be9b297ad86c53 -- 2.48.1
[PATCH] c++: Fix ICE with #embed/RAW_DATA_CST after list conversion [PR118671]
Hi! The following testcases ICE with RAW_DATA_CSTs (so the first one since introduction of #embed C++ optimizations and the latter since optimization of large sequences of comma separated literals). I've missed the fact that implicit_conversion can embed the exact expression passed to it into stuff pointed out by conversion * (e.g. for user conversions in sub->cand->args). So, it isn't enough in convert_like_internal to pass the right INTEGER_CST for each element of the RAW_DATA_CST because the whole RAW_DATA_CST might be in sub->cand->args etc. Either I'd need to chase for wherever the RAW_DATA_CST is found and update those for each element processed, or, as implemented in the following patch, build_list_conv detects the easy optimizable case where convert_like_internal can be kept as the whole RAW_DATA_CST with changed type and possibly narrowing diagnostics, and otherwise instead of having a single subconversion it has RAW_DATA_CST separate subconversions. Instead of trying to reallocate the subconvs array when we detect that case, the patch instead uses an artificial ck_list inside of the u.list array to hold the individual subconversions. Seems the only places where u.list is used are build_list_conv and convert_like_internal. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-01-28 Jakub Jelinek PR c++/118671 * call.cc (build_list_conv): For RAW_DATA_CST, call implicit_conversion with INTEGER_CST representing first byte instead of the whole RAW_DATA_CST. If it is an optimizable trivial conversion, just save that to subconvs, otherwise allocate an artificial ck_list for all the RAW_DATA_CST bytes and create subsubconv for each of them. (convert_like_internal): For ck_list with RAW_DATA_CST, instead of doing all the checks for optimizable conversion just check kind and assert everything else, otherwise use subsubconversions instead of the subconversion for each element. * g++.dg/cpp/embed-25.C: New test. * g++.dg/cpp0x/pr118671.C: New test. --- gcc/cp/call.cc.jj 2025-01-22 09:22:53.0 +0100 +++ gcc/cp/call.cc 2025-01-27 21:36:31.159889633 +0100 @@ -868,6 +868,67 @@ build_list_conv (tree type, tree ctor, i FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (ctor), i, val) { + if (TREE_CODE (val) == RAW_DATA_CST) + { + tree elt + = build_int_cst (TREE_TYPE (val), RAW_DATA_UCHAR_ELT (val, 0)); + conversion *sub + = implicit_conversion (elttype, TREE_TYPE (val), elt, + false, flags, complain); + conversion *next; + if (sub == NULL) + return NULL; + /* For conversion to initializer_list or +initializer_list or initializer_list +we can optimize and keep RAW_DATA_CST with adjusted +type if we report narrowing errors if needed. +Use just one subconversion for that case. */ + if (sub->kind == ck_std + && sub->type + && (TREE_CODE (sub->type) == INTEGER_TYPE + || is_byte_access_type (sub->type)) + && TYPE_PRECISION (sub->type) == CHAR_BIT + && (next = next_conversion (sub)) + && next->kind == ck_identity) + { + subconvs[i] = sub; + continue; + } + /* Otherwise. build separate subconv for each RAW_DATA_CST +byte. Wrap those into an artificial ck_list which convert_like +will then handle. */ + conversion **subsubconvs = alloc_conversions (RAW_DATA_LENGTH (val)); + unsigned int j; + subsubconvs[0] = sub; + for (j = 1; j < (unsigned) RAW_DATA_LENGTH (val); ++j) + { + elt = build_int_cst (TREE_TYPE (val), + RAW_DATA_UCHAR_ELT (val, j)); + sub = implicit_conversion (elttype, TREE_TYPE (val), elt, +false, flags, complain); + if (sub == NULL) + return NULL; + subsubconvs[j] = sub; + } + + t = alloc_conversion (ck_list); + t->type = type; + t->u.list = subsubconvs; + t->rank = cr_exact; + for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j) + { + sub = subsubconvs[i]; + if (sub->rank > t->rank) + t->rank = sub->rank; + if (sub->user_conv_p) + t->user_conv_p = true; + if (sub->bad_p) + t->bad_p = true; + } + subconvs[i] = t; + continue; + } + conversion *sub = implicit_conversion (elttype, TREE_TYPE (val), val, false, flags, complain); @@ -8841,22 +8902,22 @@ convert_like_internal (conversion *convs { if (TREE_CODE (val) =
[PATCH] c++: Return false from __is_bounded_array for zero-sized arrays [PR118655]
Hi! This is basically Marek's PR114479 r14-9759 __is_array fix applied to __is_bounded_array as well. Similarly to that trait, when not using the builtin it returned false for zero sized arrays but when using the builtin it returns true. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Do we want to backport this to 14.3 or not? 2025-01-28 Jakub Jelinek PR c++/118655 * semantics.cc (trait_expr_value) : Return false for zero-sized arrays. * g++.dg/ext/is_bounded_array.C: Extend. --- gcc/cp/semantics.cc.jj 2025-01-17 11:29:33.616689665 +0100 +++ gcc/cp/semantics.cc 2025-01-27 12:21:11.439367867 +0100 @@ -13165,7 +13165,14 @@ trait_expr_value (cp_trait_kind kind, tr || DERIVED_FROM_P (type1, type2))); case CPTK_IS_BOUNDED_ARRAY: - return type_code1 == ARRAY_TYPE && TYPE_DOMAIN (type1); + return (type_code1 == ARRAY_TYPE + && TYPE_DOMAIN (type1) + /* We don't want to report T[0] as being a bounded array type. +This is for compatibility with an implementation of +std::is_bounded_array by template argument deduction, because +compute_array_index_type_loc rejects a zero-size array +in SFINAE context. */ + && !(TYPE_SIZE (type1) && integer_zerop (TYPE_SIZE (type1; case CPTK_IS_CLASS: return NON_UNION_CLASS_TYPE_P (type1); --- gcc/testsuite/g++.dg/ext/is_bounded_array.C.jj 2023-12-22 12:26:14.917893636 +0100 +++ gcc/testsuite/g++.dg/ext/is_bounded_array.C 2025-01-27 12:34:18.400549388 +0100 @@ -1,4 +1,5 @@ // { dg-do compile { target c++11 } } +// { dg-options "" } #define SA(X) static_assert((X),#X) @@ -14,21 +15,34 @@ class ClassType { }; +constexpr int sz0 = 0; +constexpr int sz2 = 2; + SA_TEST_CATEGORY(__is_bounded_array, int[2], true); SA_TEST_CATEGORY(__is_bounded_array, int[], false); +SA_TEST_CATEGORY(__is_bounded_array, int[0], false); SA_TEST_CATEGORY(__is_bounded_array, int[2][3], true); SA_TEST_CATEGORY(__is_bounded_array, int[][3], false); +SA_TEST_CATEGORY(__is_bounded_array, int[0][3], false); +SA_TEST_CATEGORY(__is_bounded_array, int[3][0], false); SA_TEST_CATEGORY(__is_bounded_array, float*[2], true); SA_TEST_CATEGORY(__is_bounded_array, float*[], false); SA_TEST_CATEGORY(__is_bounded_array, float*[2][3], true); SA_TEST_CATEGORY(__is_bounded_array, float*[][3], false); SA_TEST_CATEGORY(__is_bounded_array, ClassType[2], true); SA_TEST_CATEGORY(__is_bounded_array, ClassType[], false); +SA_TEST_CATEGORY(__is_bounded_array, ClassType[0], false); SA_TEST_CATEGORY(__is_bounded_array, ClassType[2][3], true); SA_TEST_CATEGORY(__is_bounded_array, ClassType[][3], false); +SA_TEST_CATEGORY(__is_bounded_array, ClassType[0][3], false); +SA_TEST_CATEGORY(__is_bounded_array, ClassType[2][0], false); +SA_TEST_CATEGORY(__is_bounded_array, int[sz2], true); +SA_TEST_CATEGORY(__is_bounded_array, int[sz0], false); SA_TEST_CATEGORY(__is_bounded_array, int(*)[2], false); SA_TEST_CATEGORY(__is_bounded_array, int(*)[], false); +SA_TEST_CATEGORY(__is_bounded_array, int(*)[0], false); SA_TEST_CATEGORY(__is_bounded_array, int(&)[2], false); +SA_TEST_CATEGORY(__is_bounded_array, int(&)[0], false); SA_TEST_FN(__is_bounded_array, int(&)[], false); // Sanity check. Jakub
Re: [PATCH v1] RISC-V: Remove unnecessary frm restore volatile define_insn
On 1/26/25 05:33, pan2...@intel.com wrote: > From: Pan Li > > After we add the frm register to the global_regs, we may not need to > define_insn that volatile to emit the frm restore insns. The > cooperatively-managed global register will help to handle this, instead > of emit the volatile define_insn explicitly. > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_emit_frm_mode_set): Refactor > the frm mode set by removing fsrmsi_restore_volatile. > * config/riscv/vector-iterators.md (unspecv): Remove as unnecessary. > * config/riscv/vector.md (fsrmsi_restore_volatile): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: Adjust > the asm dump check times. > * gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: Ditto. > * gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: Ditto. > * gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Ditto. > * gcc.target/riscv/rvv/base/float-point-dynamic-frm-75.c: Ditto. > > Signed-off-by: Pan Li > --- > gcc/config/riscv/riscv.cc | 43 ++- > gcc/config/riscv/vector-iterators.md | 4 -- > gcc/config/riscv/vector.md| 13 -- > .../rvv/base/float-point-dynamic-frm-49.c | 2 +- > .../rvv/base/float-point-dynamic-frm-50.c | 2 +- > .../rvv/base/float-point-dynamic-frm-52.c | 2 +- > .../rvv/base/float-point-dynamic-frm-74.c | 2 +- > .../rvv/base/float-point-dynamic-frm-75.c | 2 +- > 8 files changed, 28 insertions(+), 42 deletions(-) > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index dd50fe4eddf..8e3bf0077cd 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -12031,27 +12031,30 @@ riscv_emit_frm_mode_set (int mode, int prev_mode) >if (prev_mode == riscv_vector::FRM_DYN_CALL) > emit_insn (gen_frrmsi (backup_reg)); /* Backup frm when DYN_CALL. */ > > - if (mode != prev_mode) > -{ > - rtx frm = gen_int_mode (mode, SImode); > - > - if (mode == riscv_vector::FRM_DYN_CALL > - && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun)) > - /* No need to emit when prev mode is DYN already. */ > - emit_insn (gen_fsrmsi_restore_volatile (backup_reg)); > - else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun) > - && prev_mode != riscv_vector::FRM_DYN > - && prev_mode != riscv_vector::FRM_DYN_CALL) > - /* No need to emit when prev mode is DYN or DYN_CALL already. */ > - emit_insn (gen_fsrmsi_restore_volatile (backup_reg)); > - else if (mode == riscv_vector::FRM_DYN > - && prev_mode != riscv_vector::FRM_DYN_CALL) > - /* Restore frm value from backup when switch to DYN mode. */ > - emit_insn (gen_fsrmsi_restore (backup_reg)); > - else if (riscv_static_frm_mode_p (mode)) > - /* Set frm value when switch to static mode. */ > - emit_insn (gen_fsrmsi_restore (frm)); > + if (mode == prev_mode) > +return; Nit, can you move this check in the caller riscv_emit_mode_set () which already checks similarly for VXRM (unless there's a corner case where the gen_frrmsi is needed despite both prev and curr being same. riscv_emit_mode_set (int entity, int mode, int prev_mode, HARD_REG_SET regs_live ATTRIBUTE_UNUSED) { switch (entity) { case RISCV_VXRM: if (mode != VXRM_MODE_NONE && mode != prev_mode) emit_insn (gen_vxrmsi (gen_int_mode (mode, SImode))); break; case RISCV_FRM: + if (mode != prev_mode) riscv_emit_frm_mode_set (mode, prev_mode); break; > + > + if (riscv_static_frm_mode_p (mode)) > +{ > + /* Set frm value when switch to static mode. */ > + emit_insn (gen_fsrmsi_restore (gen_int_mode (mode, SImode))); > + return; > } > + > + bool restore_p > += /* No need to emit when prev mode is DYN. */ > + (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_CALL > + && prev_mode != riscv_vector::FRM_DYN) > + /* No need to emit if prev mode is DYN or DYN_CALL. */ > + || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN_EXIT > + && prev_mode != riscv_vector::FRM_DYN > + && prev_mode != riscv_vector::FRM_DYN_CALL) > + /* Restore frm value when switch to DYN mode. */ > + || (mode == riscv_vector::FRM_DYN > + && prev_mode != riscv_vector::FRM_DYN_CALL); > + > + if (restore_p) > +emit_insn (gen_fsrmsi_restore (backup_reg)); > } > > /* Implement Mode switching. */ > diff --git a/gcc/config/riscv/vector-iterators.md > b/gcc/config/riscv/vector-iterators.md > index c1bd7397441..f64e7ad70dd 100644 > --- a/gcc/config/riscv/vector-iterators.md > +++ b/gcc/config/riscv/vector-iterators.md > @@ -122,10 +122,6 @@ (define_c_enum "unspec" [ >UNSPEC_SF_VFNRCLIPU > ]) > > -(define_c_enum "unspecv" [ > - UNSPECV_FRM_RESTORE_EXIT >