RE: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled
Committed, thanks Richard. Pan -Original Message- From: Richard Biener Sent: Sunday, March 10, 2024 2:53 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Wang, Yanzhang ; rdapp@gmail.com; jeffreya...@gmail.com Subject: Re: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled > Am 10.03.2024 um 04:14 schrieb pan2...@intel.com: > > From: Pan Li > > This patch would like to fix one ICE in vectorizable_store when both the > loop_masks and loop_lens are enabled. The ICE looks like below when build > with "-march=rv64gcv -O3". > > during GIMPLE pass: vect > test.c: In function ‘d’: > test.c:6:6: internal compiler error: in vectorizable_store, at > tree-vect-stmts.cc:8691 >6 | void d() { > | ^ > 0x37a6f2f vectorizable_store >.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691 > 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*, > _slp_tree*, _slp_instance*, vec*) >.../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242 > 0x1db5dca vect_analyze_loop_operations >.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208 > 0x1db885b vect_analyze_loop_2 >.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041 > 0x1dba029 vect_analyze_loop_1 >.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481 > 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*) >.../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639 > 0x1e389d1 try_vectorize_loop_1 >.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066 > 0x1e38f3d try_vectorize_loop >.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182 > 0x1e39230 execute >.../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298 > > There are two ways to reach vectorizer LD/ST, one is the analysis and > the other is transform. We cannot have both the lens and the masks > enabled during transform but it is valid during analysis. Given the > transform doesn't required cost_vec, we can only enable the assert > based on cost_vec is NULL or not. > > Below testsuites are passed for this patch: > * The x86 bootstrap tests. > * The x86 fully regression tests. > * The aarch64 fully regression tests. > * The riscv fully regressison tests. Ok Thanks, Richard > gcc/ChangeLog: > >* tree-vect-stmts.cc (vectorizable_store): Enable the assert >during transform process. >(vectorizable_load): Ditto. > > gcc/testsuite/ChangeLog: > >* gcc.target/riscv/rvv/base/pr114195-1.c: New test. > > Signed-off-by: Pan Li > --- > .../gcc.target/riscv/rvv/base/pr114195-1.c | 15 +++ > gcc/tree-vect-stmts.cc | 18 ++ > 2 files changed, 29 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c > new file mode 100644 > index 000..a67b847112b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c > @@ -0,0 +1,15 @@ > +/* Test that we do not have ice when compile */ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */ > + > +long a, b; > +extern short c[]; > + > +void d() { > + for (int e = 0; e < 35; e = 2) { > +a = ({ a < 0 ? a : 0; }); > +b = ({ b < 0 ? b : 0; }); > + > +c[e] = 0; > + } > +} > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index 14a3ffb5f02..e8617439a48 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -8697,8 +8697,13 @@ vectorizable_store (vec_info *vinfo, >? &LOOP_VINFO_LENS (loop_vinfo) >: NULL); > > - /* Shouldn't go with length-based approach if fully masked. */ > - gcc_assert (!loop_lens || !loop_masks); > + /* The vect_transform_stmt and vect_analyze_stmt will go here but there > + are some difference here. We cannot enable both the lens and masks > + during transform but it is allowed during analysis. > + Shouldn't go with length-based approach if fully masked. */ > + if (cost_vec == NULL) > +/* The cost_vec is NULL during transfrom. */ > +gcc_assert ((!loop_lens || !loop_masks)); > > /* Targets with store-lane instructions must not require explicit > realignment. vect_supportable_dr_alignment always returns either > @@ -10577,8 +10582,13 @@ vectorizable_load (vec_info *vinfo, >? &LOOP_VINFO_LENS (loop_vinfo) >: NULL); > > - /* Shouldn't go with length-based approach if fully masked. */ > - gcc_assert (!loop_lens || !loop_masks); > + /* The vect_transform_stmt and vect_analyze_stmt will go here but there > + are some difference here. We cannot enable both the lens and masks > + during transform but it is allowed during analysis. > + Shouldn't go with length-based approach if fully masked. */ > + if (cost_vec == NULL) > +/* The cost_vec is NULL during transfrom. */
Re: Frontend access to target features (was Re: [PATCH] libgccjit: Add ability to get CPU features)
Excerpts from David Malcolm's message of März 5, 2024 4:09 pm: > On Thu, 2023-11-09 at 19:33 -0500, Antoni Boucher wrote: >> Hi. >> See answers below. >> >> On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote: >> > On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote: >> > > Hi. >> > > This patch adds support for getting the CPU features in libgccjit >> > > (bug >> > > 112466) >> > > >> > > There's a TODO in the test: >> > > I'm not sure how to test that gcc_jit_target_info_arch returns >> > > the >> > > correct value since it is dependant on the CPU. >> > > Any idea on how to improve this? >> > > >> > > Also, I created a CStringHash to be able to have a >> > > std::unordered_set. Is there any built-in way of >> > > doing >> > > this? >> > >> > Thanks for the patch. >> > >> > Some high-level questions: >> > >> > Is this specifically about detecting capabilities of the host that >> > libgccjit is currently running on? or how the target was configured >> > when libgccjit was built? >> >> I'm less sure about this part. I'll need to do more tests. >> >> > >> > One of the benefits of libgccjit is that, in theory, we support all >> > of >> > the targets that GCC already supports. Does this patch change >> > that, >> > or >> > is this more about giving client code the ability to determine >> > capabilities of the specific host being compiled for? >> >> This should not change that. If it does, this is a bug. >> >> > >> > I'm nervous about having per-target jit code. Presumably there's a >> > reason that we can't reuse existing target logic here - can you >> > please >> > describe what the problem is. I see that the ChangeLog has: >> > >> > > * config/i386/i386-jit.cc: New file. >> > >> > where i386-jit.cc has almost 200 lines of nontrivial code. Where >> > did >> > this come from? Did you base it on existing code in our source >> > tree, >> > making modifications to fit the new internal API, or did you write >> > it >> > from scratch? In either case, how onerous would this be for other >> > targets? >> >> This was mostly copied from the same code done for the Rust and D >> frontends. >> See this commit and the following: >> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1c06fd9723453dd2b2ec306684cb806dc2b4fbb >> The equivalent to i386-jit.cc is there: >> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=22e3557e2d52f129f2bbfdc98688b945dba28dc9 > > [CCing Iain and Arthur re those patches; for reference, the patch being > discussed is attached to : > https://gcc.gnu.org/pipermail/jit/2024q1/001792.html ] > > One of my concerns about this patch is that we seem to be gaining code > that's per-(frontend x config) which seems to be copied and pasted with > a search and replace, which could lead to an M*N explosion. > That's certainly the case with the configure/make rules. Itself I think is copied originally from the {cpu_type}-protos.h machinery. It might be worth pointing out that the c-family of front-ends don't have separate headers because their per-target macros are defined in {cpu_type}.h directly - for better or worse. > Is there any real difference between the per-config code for the > different frontends, or should there be a general "enumerate all > features of the target" hook that's independent of the frontend? (but > perhaps calls into it). > As far as I understand, the configure parts should all be identical between tm_p, tm_d, tm_rust, ..., so would benefit from being templated to aid any other front-ends adding in their own per target hooks. > Am I right in thinking that (rustc with default LLVM backend) has some > set of feature strings that both (rustc with rustc_codegen_gcc) and > gccrs are trying to emulate? If so, is it presumably a goal that > libgccjit gives identical results to gccrs? If so, would it be crazy > for libgccjit to consume e.g. config/i386/i386-rust.cc ? I don't know whether libgccjit can just pull in directly the implementation of the rust target hooks here. The per-frontend target hooks usually also make use of code specific to that front-end - TARGET_CPU_CPP_BUILTINS and others can't be used by a non-c-family front-end without adding a plethora of stubs, for example. Whether or not libgccjit wants to give identical information as as rust I think is a decision for you as the maintainer of its API. Iain.
[PATCH] c++/modules: Support target-specific nodes with streaming [PR111224]
Bootstrapped and regtested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu, OK for trunk? It's worth noting that the AArch64 machines I had available to test with didn't have a new enough glibc to reproduce the ICEs in the PR, but this patch will be necessary (albeit possibly not sufficient) to fix it. -- >8 -- Some targets make use of POLY_INT_CSTs and other custom builtin types, which currently violate some assumptions when streaming. This patch adds support for them, specifically AArch64 SVE types like __fp16. This patch doesn't provide "full" support of AArch64 SVE, however, since for that we would need to support 'target' nodes (tracked in PR108080). PR c++/111224 gcc/cp/ChangeLog: * module.cc (enum tree_tag): Add new tag for builtin types. (trees_out::start): POLY_INT_CSTs can be emitted. (trees_in::start): Likewise. (trees_out::core_vals): Stream POLY_INT_CSTs. (trees_in::core_vals): Likewise. (trees_out::type_node): Handle target-specific builtin types, and vectors with NUM_POLY_INT_COEFFS > 1. (trees_in::tree_node): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/pr111224_a.C: New test. * g++.dg/modules/pr111224_b.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 70 +++ gcc/testsuite/g++.dg/modules/pr111224_a.C | 17 ++ gcc/testsuite/g++.dg/modules/pr111224_b.C | 13 + 3 files changed, 90 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/g++.dg/modules/pr111224_a.C create mode 100644 gcc/testsuite/g++.dg/modules/pr111224_b.C diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 99055523d91..0b5e2e67053 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -2718,6 +2718,7 @@ enum tree_tag { tt_typedef_type, /* A (possibly implicit) typedefed type. */ tt_derived_type, /* A type derived from another type. */ tt_variant_type, /* A variant of another type. */ + tt_builtin_type, /* A custom builtin type. */ tt_tinfo_var,/* Typeinfo object. */ tt_tinfo_typedef,/* Typeinfo typedef. */ @@ -2732,7 +2733,7 @@ enum tree_tag { tt_binfo,/* A BINFO. */ tt_vtable, /* A vtable. */ tt_thunk,/* A thunk. */ - tt_clone_ref, + tt_clone_ref, /* A cloned function. */ tt_entity, /* A extra-cluster entity. */ @@ -5173,7 +5174,6 @@ trees_out::start (tree t, bool code_streamed) break; case FIXED_CST: -case POLY_INT_CST: gcc_unreachable (); /* Not supported in C++. */ break; @@ -5259,7 +5259,6 @@ trees_in::start (unsigned code) case FIXED_CST: case IDENTIFIER_NODE: -case POLY_INT_CST: case SSA_NAME: case TARGET_MEM_REF: case TRANSLATION_UNIT_DECL: @@ -6106,7 +6105,10 @@ trees_out::core_vals (tree t) break; case POLY_INT_CST: - gcc_unreachable (); /* Not supported in C++. */ + if (streaming_p ()) + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + WT (POLY_INT_CST_COEFF (t, ix)); + break; case REAL_CST: if (streaming_p ()) @@ -6615,8 +6617,9 @@ trees_in::core_vals (tree t) break; case POLY_INT_CST: - /* Not suported in C++. */ - return false; + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + RT (POLY_INT_CST_COEFF (t, ix)); + break; case REAL_CST: if (const void *bytes = buf (sizeof (real_value))) @@ -8930,6 +8933,32 @@ trees_out::type_node (tree type) return; } + if (tree name = TYPE_NAME (type)) +if (TREE_CODE (name) == TYPE_DECL && DECL_ARTIFICIAL (name)) + { + /* Potentially a custom machine- or OS-specific builtin type. */ + bool found = false; + unsigned ix = 0; + for (tree t = registered_builtin_types; t; t = TREE_CHAIN (t), ix++) + if (TREE_VALUE (t) == type) + { + found = true; + break; + } + if (found) + { + int type_tag = insert (type); + if (streaming_p ()) + { + i (tt_builtin_type); + u (ix); + dump (dumper::TREE) + && dump ("Wrote:%d builtin type %N", type_tag, name); + } + return; + } + } + if (streaming_p ()) { u (tt_derived_type); @@ -9068,8 +9097,8 @@ trees_out::type_node (tree type) if (streaming_p ()) { poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (type); - /* to_constant asserts that only coeff[0] is of interest. */ - wu (static_cast (nunits.to_constant ())); + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + wu (nunits.coeffs[ix]); } break; } @@ -9630,9 +9659,11 @@ trees_in::tree_node (bool is_use) case VECTOR_TYPE: { -
New Swedish PO file for 'gcc' (version 14.1-b20240218)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: https://translationproject.org/latest/gcc/sv.po (This file, 'gcc-14.1-b20240218.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled
> Am 10.03.2024 um 11:02 schrieb Li, Pan2 : > > Committed, thanks Richard. You might want to investigate why you get mask and not Len for a particular stmt. mixing will cause variable length vectorization to fail. > Pan > > -Original Message- > From: Richard Biener > Sent: Sunday, March 10, 2024 2:53 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > Wang, Yanzhang ; rdapp@gmail.com; > jeffreya...@gmail.com > Subject: Re: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len > and store are enabled > > > >> Am 10.03.2024 um 04:14 schrieb pan2...@intel.com: >> >> From: Pan Li >> >> This patch would like to fix one ICE in vectorizable_store when both the >> loop_masks and loop_lens are enabled. The ICE looks like below when build >> with "-march=rv64gcv -O3". >> >> during GIMPLE pass: vect >> test.c: In function ‘d’: >> test.c:6:6: internal compiler error: in vectorizable_store, at >> tree-vect-stmts.cc:8691 >> 6 | void d() { >> | ^ >> 0x37a6f2f vectorizable_store >> .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691 >> 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*, >> _slp_tree*, _slp_instance*, vec*) >> .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242 >> 0x1db5dca vect_analyze_loop_operations >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208 >> 0x1db885b vect_analyze_loop_2 >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041 >> 0x1dba029 vect_analyze_loop_1 >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481 >> 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*) >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639 >> 0x1e389d1 try_vectorize_loop_1 >> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066 >> 0x1e38f3d try_vectorize_loop >> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182 >> 0x1e39230 execute >> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298 >> >> There are two ways to reach vectorizer LD/ST, one is the analysis and >> the other is transform. We cannot have both the lens and the masks >> enabled during transform but it is valid during analysis. Given the >> transform doesn't required cost_vec, we can only enable the assert >> based on cost_vec is NULL or not. >> >> Below testsuites are passed for this patch: >> * The x86 bootstrap tests. >> * The x86 fully regression tests. >> * The aarch64 fully regression tests. >> * The riscv fully regressison tests. > > Ok > > Thanks, > Richard > >> gcc/ChangeLog: >> >> * tree-vect-stmts.cc (vectorizable_store): Enable the assert >> during transform process. >> (vectorizable_load): Ditto. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/rvv/base/pr114195-1.c: New test. >> >> Signed-off-by: Pan Li >> --- >> .../gcc.target/riscv/rvv/base/pr114195-1.c | 15 +++ >> gcc/tree-vect-stmts.cc | 18 ++ >> 2 files changed, 29 insertions(+), 4 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> new file mode 100644 >> index 000..a67b847112b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> @@ -0,0 +1,15 @@ >> +/* Test that we do not have ice when compile */ >> +/* { dg-do compile } */ >> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */ >> + >> +long a, b; >> +extern short c[]; >> + >> +void d() { >> + for (int e = 0; e < 35; e = 2) { >> +a = ({ a < 0 ? a : 0; }); >> +b = ({ b < 0 ? b : 0; }); >> + >> +c[e] = 0; >> + } >> +} >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index 14a3ffb5f02..e8617439a48 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -8697,8 +8697,13 @@ vectorizable_store (vec_info *vinfo, >> ? &LOOP_VINFO_LENS (loop_vinfo) >> : NULL); >> >> - /* Shouldn't go with length-based approach if fully masked. */ >> - gcc_assert (!loop_lens || !loop_masks); >> + /* The vect_transform_stmt and vect_analyze_stmt will go here but there >> + are some difference here. We cannot enable both the lens and masks >> + during transform but it is allowed during analysis. >> + Shouldn't go with length-based approach if fully masked. */ >> + if (cost_vec == NULL) >> +/* The cost_vec is NULL during transfrom. */ >> +gcc_assert ((!loop_lens || !loop_masks)); >> >> /* Targets with store-lane instructions must not require explicit >> realignment. vect_supportable_dr_alignment always returns either >> @@ -10577,8 +10582,13 @@ vectorizable_load (vec_info *vinfo, >> ? &LOOP_VINFO_LENS (loop_vinfo) >> : NULL); >> >> - /* Shouldn't go with length-based approach if fully masked. */ >> - gcc_assert (!loop_lens || !loop_masks); >> + /* The vect_transform_stmt and v
[PATCH] testsuite: Define _POSIX_C_SOURCE for test
Ok for trunk? -- As the tests assume that strndup() is visible (only part of POSIX.1-2008) define the guard to ensure that it's visible. Currently, glibc appears to always have this defined in C++, newlib does not. Without this patch, fails like this can be seen: Testing analyzer/strndup-1.c, -std=c++98 .../strndup-1.c: In function 'void test_1(const char*)': .../strndup-1.c:11:13: error: 'strndup' was not declared in this scope; did you mean 'strncmp'? .../strndup-1.c: In function 'void test_2(const char*)': .../strndup-1.c:16:13: error: 'strndup' was not declared in this scope; did you mean 'strncmp'? .../strndup-1.c: In function 'void test_3(const char*)': .../strndup-1.c:21:13: error: 'strndup' was not declared in this scope; did you mean 'strncmp'? Patch has been verified on Linux. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/strndup-1.c: Define _POSIX_C_SOURCE. Signed-off-by: Torbjörn SVENSSON --- gcc/testsuite/c-c++-common/analyzer/strndup-1.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/c-c++-common/analyzer/strndup-1.c b/gcc/testsuite/c-c++-common/analyzer/strndup-1.c index 85ccae85d83..577ece0cfba 100644 --- a/gcc/testsuite/c-c++-common/analyzer/strndup-1.c +++ b/gcc/testsuite/c-c++-common/analyzer/strndup-1.c @@ -1,4 +1,5 @@ /* { dg-skip-if "no strndup in libc" { *-*-darwin[789]* *-*-darwin10* hppa*-*-hpux* *-*-mingw* *-*-vxworks* } } */ +/* { dg-additional-options "-D_POSIX_C_SOURCE=200809L" } */ #include #include -- 2.25.1
Re: [wwwdocs] Add Ada's GCC 14 changelog entry
Hi all, I have a new revision of the patch. Alexandre pointed out a few issues with the hardening options and I agreed with the comments. I took a look at when the boolean hardening and stack scrubbing options became available within Ada. Hardbools were already available in GCC 13.1, stack scrubbing was already present in GCC 12.1. Which means that adding this changes to the changelog would be incorrect. The additional compiler hardening options/flags within GCC are not unique to Ada and they are already documented in the general compiler section and they are available for the C family of languages as well as Ada. Therefore, it made sense not to explicitly have them in the Ada section. Nonetheless, there have been some (smaller) hardening improvements to Ada, so I just wrote a generic note and pointers to the documentation. I know this is not the pretties thing to do, but I did something similar in the GCC 12 changelog so... On 2/26/24 20:36, Fernando Oleo Blanco wrote: > Hi Mark, > > On 2/26/24 10:17, Marc Poulhiès wrote: >> >> Fernando, >> >> Thank you for this work! I have a few comments, see below. >> >> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html >> index 85ccc54d..e6c96c9f 100644 >> --- a/htdocs/gcc-14/changes.html >> +++ b/htdocs/gcc-14/changes.html >> @@ -171,7 +171,49 @@ a work-in-progress. >> >>New Languages and Language specific improvements >> >> - >> +Ada [... omitted for brevity ...] > > I have applied your recommendations. The documentation links are still > not up... Nonetheless, I created the URL in such a way that they should > work once the final documentation is given a release number (which I > guessed to be 14.1.0). If you think this can be improved just say so. > Nonetheless, feel free to modify my patch if you see it fit. In this newly revised patch I have not modified the URLs to point to the future GCC 14 documentation. I saw that the links in the changelog all had the unversioned "master" links, so I just followed the same convention. > > Best regards, > Fer I squashed the different commits I had submitted and created a completely new patch. Hopefully this is acceptable and leads to a cleaner, less noisy commit history/patch. It is attached to the email. I think the patch should be in an acceptable state to be committed, but feel free to give back any feedback! Best regards, FerFrom 9ad2e979e921938c466de3a7868346e8b2426e49 Mon Sep 17 00:00:00 2001 From: Fernando Oleo Blanco Date: Sun, 10 Mar 2024 17:47:46 +0100 Subject: [PATCH] Add Ada changes for v14 --- htdocs/gcc-14/changes.html | 43 +- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index 85ccc54d..0886241a 100644 --- a/htdocs/gcc-14/changes.html +++ b/htdocs/gcc-14/changes.html @@ -171,7 +171,48 @@ a work-in-progress. New Languages and Language specific improvements - +Ada + + + Several new implementation defined aspects and contracts have been + added: + + Exceptional_Cases may be specified for procedures and + functions with side effects; it can be used to list exceptions that might + be propagated by the subprogram with side effects in the context of its + precondition, and associate them with a specific postcondition. For more + information, refer to SPARK 2014 Reference Manual, section 6.1.9. + User_Aspect takes an argument that is the name of an + aspect defined by a User_Aspect_Definition configuration pragma. + Local_Restrictions is used to specify that a particular + subprogram does not violate one or more local restrictions, nor can it + call a subprogram that is not subject to the same requirements. + Side_Effects is equivalent to pragma + Side_Effecs. + Always_Terminates is a boolean equivalent to pragma + Always_Terminates + Ghost_Predicate introduces a subtype predicate that can + reference Ghost entities. + +For more information and usage guidelines, see +the https://gcc.gnu.org/onlinedocs/gnat_rm/Implementation-Defined-Pragmas.html";>GNAT +Reference Manual. + + The new attributes and contracts have been applied to the relevant parts +of the Ada library and more code has been proven to be correct. + Initial support for the + https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/";>CHERI + architecture. + Support for the LoongArch architecture. + Support for vxWorks 7 Cert RTP has been removed. + Additional hardening improvements. For more information reltated to hardening options, refer to +the https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fharden-compares";>GCC +Instrumentation Options and +the https://gcc.gnu.org/onlinedocs/gnat_rm/Security-Hardening-Features.html";>GNAT +Reference Manual, Security and Hardening Features. + + Further clean up and improvements to the GNAT code. +
[committed] [PR tree-optimization/110199] Simplify MIN/MAX more often
So as I mentioned in the BZ, the case of t = MIN_EXPR (A, B) where we know something about the relationship between A and B can be trivially handled by some existing code in DOM. That existing code would simplify when A == B. But by testing GE and LE instead of EQ we can cover more cases with minimal effort. When applicable the MIN/MAX turns into a simple copy. I made one other change. We have other binary operations that we simplify when we know something about the relationship between the operands. That code was not canonicalizing the order of operands when building the expression to lookup in the hash tables to discover that relationship. Since those paths are only testing for equality, we can trivially reverse them and not have to worry about changing codes or anything like that. So extremely safe and avoids having to come back and fix that code to match the MIN_EXPR/MAX_EXPR case later. Bootstrapped on x86 and also tested on the crosses. I briefly thought there was an sh regression, but that was actually the recent fwprop changes twiddling code generation for one test. Pushed to the trunk. Jeffcommit 8fe27ed193d60f6cd8b34761858a720c95eabbdb Author: jlaw Date: Sun Mar 10 11:58:00 2024 -0600 [committed] [PR tree-optimization/110199] Simplify MIN/MAX more often So as I mentioned in the BZ, the case of t = MIN_EXPR (A, B) where we know something about the relationship between A and B can be trivially handled by some existing code in DOM. That existing code would simplify when A == B. But by testing GE and LE instead of EQ we can cover more cases with minimal effort. When applicable the MIN/MAX turns into a simple copy. I made one other change. We have other binary operations that we simplify when we know something about the relationship between the operands. That code was not canonicalizing the order of operands when building the expression to lookup in the hash tables to discover that relationship. Since those paths are only testing for equality, we can trivially reverse them and not have to worry about changing codes or anything like that. So extremely safe and avoids having to come back and fix that code to match the MIN_EXPR/MAX_EXPR case later. Bootstrapped on x86 and also tested on the crosses. I briefly thought there was an sh regression, but that was actually the recent fwprop changes twiddling code generation for one test. PR tree-optimization/110199 gcc/ * tree-ssa-scopedtables.cc (avail_exprs_stack::simplify_binary_operation): Generalize handling of MIN_EXPR/MAX_EXPR to allow additional simplifications. Canonicalize comparison operands for other cases. gcc/testsuite * gcc.dg/tree-ssa/minmax-27.c: New test. * gcc.dg/tree-ssa/minmax-28.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c new file mode 100644 index 000..4b94203b0d0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-27.c @@ -0,0 +1,118 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dom2" } */ + + +int min1(int a, int b) +{ +if (a <= b) +return a < b ? a : b; +return 0; +} + +int min2(int a, int b) +{ +if (a <= b) +return a > b ? b : a; +return 0; +} + +int min3(int a, int b) +{ +if (a < b) +return a < b ? a : b; +return 0; +} + +int min4(int a, int b) +{ +if (a < b) +return a > b ? b : a; +return 0; +} + +int min5(int a, int b) +{ +if (a <= b) +return a <= b ? a : b; +return 0; +} + +int min6(int a, int b) +{ +if (a <= b) +return a >= b ? b : a; +return 0; +} + +int min7(int a, int b) +{ +if (a < b) +return a <= b ? a : b; +return 0; +} + +int min8(int a, int b) +{ +if (b > a) +return a >= b ? b : a; +return 0; +} + +int min9(int a, int b) +{ +if (b >= a) +return a < b ? a : b; +return 0; +} + +int min10(int a, int b) +{ +if (b >= a) +return a > b ? b : a; +return 0; +} + +int min11(int a, int b) +{ +if (b > a) +return a < b ? a : b; +return 0; +} + +int min12(int a, int b) +{ +if (b > a) +return a > b ? b : a; +return 0; +} + +int min13(int a, int b) +{ +if (b >= a) +return a <= b ? a : b; +return 0; +} + +int min14(int a, int b) +{ +if (b >= a) +return a >= b ? b : a; +return 0; +} + +int min15(int a, int b) +{ +if (b > a) +return a <= b ? a : b; +return 0; +} + +int min16(int a, int b) +{ +if (b > a) +return a >= b ? b : a; +return 0; +} + +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "dom2" } } */ + diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-28.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-28.c new file mode 100644 index 000..7
[committed] d: Fix -fpreview=in ICEs with forward referenced parameter [PR112285]
Hi, This patch removes the early lowering of D AST types as GCC trees in Target::preferPassByRef, fixing both PR12285 and PR12290. The way that the target hook preferPassByRef is implemented, it relied on the GCC "back-end" tree type to determine whether or not to use `ref' ABI for D `in' parameters; e.g: prefer by value if it is expected that the target will pass the type around in registers. Building the GCC tree type depends on the AST type being complete - all semantic processing is finished - but as this hook is called from the front-end, this will not be the case for forward referenced or self-referencing types. The consensus in upstream is that `in' parameters should always be implicitly `ref', but as the front-end does not yet support all types being rvalue references, limit this just static arrays and structs. Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed to mainline and backported to releases/gcc-13 and releases/gcc-12. Regards, Iain. --- PR d/112285 PR d/112290 gcc/d/ChangeLog: * d-target.cc (Target::preferPassByRef): Return true for all static array and struct types. gcc/testsuite/ChangeLog: * gdc.dg/pr112285.d: New test. * gdc.dg/pr112290.d: New test. --- gcc/d/d-target.cc | 25 + gcc/testsuite/gdc.dg/pr112285.d | 13 + gcc/testsuite/gdc.dg/pr112290.d | 15 +++ 3 files changed, 33 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gdc.dg/pr112285.d create mode 100644 gcc/testsuite/gdc.dg/pr112290.d diff --git a/gcc/d/d-target.cc b/gcc/d/d-target.cc index b9d124422b7..9b69b0bc873 100644 --- a/gcc/d/d-target.cc +++ b/gcc/d/d-target.cc @@ -575,31 +575,16 @@ Target::supportsLinkerDirective (void) const } /* Decides whether an `in' parameter of the specified POD type PARAM_TYPE is to - be passed by reference or by valie. This is used only when compiling with + be passed by reference or by value. This is used only when compiling with `-fpreview=in' enabled. */ bool Target::preferPassByRef (Type *param_type) { - if (param_type->size () == SIZE_INVALID) + /* See note in Target::isReturnOnStack. */ + Type *tb = param_type->toBasetype (); + if (tb->size () == SIZE_INVALID) return false; - tree type = build_ctype (param_type); - - /* Prefer a `ref' if the type is an aggregate, and its size is greater than - its alignment. */ - if (AGGREGATE_TYPE_P (type) - && (!valid_constant_size_p (TYPE_SIZE_UNIT (type)) - || compare_tree_int (TYPE_SIZE_UNIT (type), TYPE_ALIGN (type)) > 0)) -return true; - - /* If the back-end is always going to pass this by invisible reference. */ - if (pass_by_reference (NULL, function_arg_info (type, true))) -return true; - - /* If returning the parameter means the caller will do RVO. */ - if (targetm.calls.return_in_memory (type, NULL_TREE)) -return true; - - return false; + return (tb->ty == TY::Tstruct || tb->ty == TY::Tsarray); } diff --git a/gcc/testsuite/gdc.dg/pr112285.d b/gcc/testsuite/gdc.dg/pr112285.d new file mode 100644 index 000..5ca100a74a9 --- /dev/null +++ b/gcc/testsuite/gdc.dg/pr112285.d @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-additional-options "-fpreview=in" } +struct S112285 +{ +} + +class C112285 +{ +S112285 s; +void f112285(in C112285) +{ +} +} diff --git a/gcc/testsuite/gdc.dg/pr112290.d b/gcc/testsuite/gdc.dg/pr112290.d new file mode 100644 index 000..7456fc21be1 --- /dev/null +++ b/gcc/testsuite/gdc.dg/pr112290.d @@ -0,0 +1,15 @@ +// { dg-do compile } +// { dg-additional-options "-fpreview=in" } +struct S112290a +{ +S112290b* p; +bool opEquals(in S112290a) +{ +return p == p; +} +} + +struct S112290b +{ +string s; +} -- 2.40.1
[RFC] [PR tree-optimization/92539] Optimize away tests against invalid pointers
Here's a potential approach to fixing PR92539, a P2 -Warray-bounds false positive triggered by loop unrolling. As I speculated a couple years ago, we could eliminate the comparisons against bogus pointers. Consider: [local count: 30530247]: if (last_12 != &MEM [(void *)"aa" + 3B]) goto ; [54.59%] else goto ; [45.41%] That's a valid comparison as ISO allows us to generate, but not dereference, a pointer one element past the end of the object. But +4B is a bogus pointer. So given an EQ comparison against that pointer we could always return false and for NE always return true. VRP and DOM seem to be the most natural choices for this kind of optimization on the surface. However DOM is actually not viable because the out-of-bounds pointer warning pass is run at the end of VRP. So we've got to take care of this prior to the end of VRP. I haven't done a bootstrap or regression test with this. But if it looks reasonable I can certainly push on it further. I have confirmed it does eliminate the tests and shuts up the bogus warning. The downside is this would also shut up valid warnings if user code did this kind of test. Comments/Suggestions? Jeff
Re: [RFC] [PR tree-optimization/92539] Optimize away tests against invalid pointers
On Sun, Mar 10, 2024 at 2:04 PM Jeff Law wrote: > > Here's a potential approach to fixing PR92539, a P2 -Warray-bounds false > positive triggered by loop unrolling. > > As I speculated a couple years ago, we could eliminate the comparisons > against bogus pointers. Consider: > > >[local count: 30530247]: > > if (last_12 != &MEM [(void *)"aa" + 3B]) > > goto ; [54.59%] > > else > > goto ; [45.41%] > > > That's a valid comparison as ISO allows us to generate, but not > dereference, a pointer one element past the end of the object. > > But +4B is a bogus pointer. So given an EQ comparison against that > pointer we could always return false and for NE always return true. > > VRP and DOM seem to be the most natural choices for this kind of > optimization on the surface. However DOM is actually not viable because > the out-of-bounds pointer warning pass is run at the end of VRP. So > we've got to take care of this prior to the end of VRP. > > > > I haven't done a bootstrap or regression test with this. But if it > looks reasonable I can certainly push on it further. I have confirmed it > does eliminate the tests and shuts up the bogus warning. > > The downside is this would also shut up valid warnings if user code did > this kind of test. > > Comments/Suggestions? ENOPATCH > > Jeff
Re: [RFC] [PR tree-optimization/92539] Optimize away tests against invalid pointers
On 3/10/24 3:05 PM, Andrew Pinski wrote: On Sun, Mar 10, 2024 at 2:04 PM Jeff Law wrote: Here's a potential approach to fixing PR92539, a P2 -Warray-bounds false positive triggered by loop unrolling. As I speculated a couple years ago, we could eliminate the comparisons against bogus pointers. Consider: [local count: 30530247]: if (last_12 != &MEM [(void *)"aa" + 3B]) goto ; [54.59%] else goto ; [45.41%] That's a valid comparison as ISO allows us to generate, but not dereference, a pointer one element past the end of the object. But +4B is a bogus pointer. So given an EQ comparison against that pointer we could always return false and for NE always return true. VRP and DOM seem to be the most natural choices for this kind of optimization on the surface. However DOM is actually not viable because the out-of-bounds pointer warning pass is run at the end of VRP. So we've got to take care of this prior to the end of VRP. I haven't done a bootstrap or regression test with this. But if it looks reasonable I can certainly push on it further. I have confirmed it does eliminate the tests and shuts up the bogus warning. The downside is this would also shut up valid warnings if user code did this kind of test. Comments/Suggestions? ENOPATCH Yea, realized it as I pushed the send button. Then t-bird crashed, repeatedly. Attached this time.. jeff diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc index 3ccb77d28be..cc753e79a8a 100644 --- a/gcc/vr-values.cc +++ b/gcc/vr-values.cc @@ -325,6 +325,34 @@ simplify_using_ranges::fold_cond_with_ops (enum tree_code code, if (res == range_false (type)) return boolean_false_node; } + + /* If we're comparing pointers and one of the pointers is + not a valid pointer (say &MEM [(void *)"aa" + 4B) + return true for EQ and false for NE. */ + if ((code == EQ_EXPR || code == NE_EXPR) + && POINTER_TYPE_P (type) + && TREE_CODE (op1) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (op1, 0)) == MEM_REF) +{ + tree mem_ref = TREE_OPERAND (op1, 0); + if (TREE_CODE (TREE_OPERAND (mem_ref, 0)) == ADDR_EXPR) + { + tree addr_expr = TREE_OPERAND (mem_ref, 0); + if (TREE_CODE (TREE_OPERAND (addr_expr, 0)) == STRING_CST) + { + tree string = TREE_OPERAND (addr_expr, 0); + + if (TREE_CODE (TREE_OPERAND (mem_ref, 1)) == INTEGER_CST) + { + HOST_WIDE_INT len = TREE_STRING_LENGTH (string); + HOST_WIDE_INT offset = tree_to_uhwi (TREE_OPERAND (mem_ref, 1)); + if (offset > len) + return code == EQ_EXPR ? boolean_false_node : boolean_true_node; + } + } + } +} + return NULL; }
[PATCH, v2] Fortran: use name of array component in runtime error message [PR30802]
Dear all, after playing for some time with NAG and Intel, and an off-list discussion with Jerry, I am getting more and more convinced that simpler runtime error messages (also simpler to parse by a human) are superior to awkward solutions. This is also what Intel does: use only the name of the array (component) in the message whose indices are out of bounds. (NAG's solution appears also inconsistent for nested derived types.) So no x%z, or x%_data, etc. in runtime error messages any more. Please give it a spin... Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald On 1/30/24 11:46, Mikael Morin wrote: Le 30/01/2024 à 11:38, Mikael Morin a écrit : Another (easier) way to clarify the data reference would be rephrasing the message so that the array part is separate from the scalar part, like so (there are too many 'of', but I lack inspiration): Index '0' of dimension 1 of component 'zz' of element from 'x1%vv' below lower bound of 1 This has the same number of 'of' but sounds better maybe: Out of bounds accessing component 'zz' of element from 'x1%yy': index '0' of dimension 1 below lower bound of 1 From cdf3b197beed0ce1649661b2132643b54cbade8d Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Sun, 10 Mar 2024 22:14:30 +0100 Subject: [PATCH] Fortran: use name of array component in runtime error message [PR30802] gcc/fortran/ChangeLog: PR fortran/30802 * trans-array.cc (trans_array_bound_check): Find name of component to use in runtime error message. (array_bound_check_elemental): Likewise. (gfc_conv_array_ref): Likewise. gcc/testsuite/ChangeLog: PR fortran/30802 * gfortran.dg/bounds_check_17.f90: Adjust dg-pattern. * gfortran.dg/bounds_check_fail_6.f90: Likewise. * gfortran.dg/pr92050.f90: Likewise. * gfortran.dg/bounds_check_fail_8.f90: New test. --- gcc/fortran/trans-array.cc| 60 +-- gcc/testsuite/gfortran.dg/bounds_check_17.f90 | 2 +- .../gfortran.dg/bounds_check_fail_6.f90 | 7 ++- .../gfortran.dg/bounds_check_fail_8.f90 | 48 +++ gcc/testsuite/gfortran.dg/pr92050.f90 | 2 +- 5 files changed, 83 insertions(+), 36 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/bounds_check_fail_8.f90 diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc index 3673fa40720..9c62b070c50 100644 --- a/gcc/fortran/trans-array.cc +++ b/gcc/fortran/trans-array.cc @@ -3497,6 +3497,8 @@ trans_array_bound_check (gfc_se * se, gfc_ss *ss, tree index, int n, tree descriptor; char *msg; const char * name = NULL; + gfc_expr *expr; + gfc_ref *ref; if (!(gfc_option.rtcheck & GFC_RTCHECK_BOUNDS)) return index; @@ -3509,6 +3511,24 @@ trans_array_bound_check (gfc_se * se, gfc_ss *ss, tree index, int n, name = ss->info->expr->symtree->n.sym->name; gcc_assert (name != NULL); + /* When we have a component ref, get name of the array section. + Note that there can only be one part ref. */ + expr = ss->info->expr; + if (expr->ref && !compname) +{ + for (ref = expr->ref; ref; ref = ref->next) + { + /* Remember component name. */ + if (ref->type == REF_COMPONENT) + { + name = ref->u.c.component->name; + continue; + } + if (ref->type == REF_ARRAY && ref->u.ar.type == AR_SECTION) + break; + } +} + if (VAR_P (descriptor)) name = IDENTIFIER_POINTER (DECL_NAME (descriptor)); @@ -3574,29 +3594,20 @@ array_bound_check_elemental (gfc_se * se, gfc_ss * ss, gfc_expr * expr) gfc_array_ref *ar; gfc_ref *ref; gfc_symbol *sym; - char *var_name = NULL; - size_t len; + const char *var_name = NULL; int dim; if (expr->expr_type == EXPR_VARIABLE) { sym = expr->symtree->n.sym; - len = strlen (sym->name) + 1; - - for (ref = expr->ref; ref; ref = ref->next) - if (ref->type == REF_COMPONENT) - len += 2 + strlen (ref->u.c.component->name); - - var_name = XALLOCAVEC (char, len); - strcpy (var_name, sym->name); + var_name = sym->name; for (ref = expr->ref; ref; ref = ref->next) { - /* Append component name. */ + /* Get component name. */ if (ref->type == REF_COMPONENT) { - strcat (var_name, "%%"); - strcat (var_name, ref->u.c.component->name); + var_name = ref->u.c.component->name; continue; } @@ -4001,7 +4012,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, gfc_se indexse; gfc_se tmpse; gfc_symbol * sym = expr->symtree->n.sym; - char *var_name = NULL; + const char *var_name = NULL; if (ar->dimen == 0) { @@ -4035,30 +4046,17 @@ gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr, if (gfc_option.rtcheck & GFC_RTCHECK_BOUNDS) { - size_t len; gfc_ref *ref; - len = strlen (sym->name) + 1; - for (ref = expr->ref; ref; ref = ref->next) - { - if (ref->type == REF_ARRAY && &ref->u.ar == ar) - break; - if (ref->type == REF_C
Re: [PATCH wwwdocs] gcc-14: Some very common historic Autoconf probes that no longer work
On Sat, 17 Feb 2024, Florian Weimer wrote: > + > +Older Autoconf versions (for example, Autoconf 2.13) generate core > +probes that are incompatible with C99. These include the basic > +compiler functionality check: : : Yes, thank you! Gerald PS: Feel free to copy me on wwwdocs patches.
RE: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled
> You might want to investigate why you get mask and not Len for a particular > stmt. mixing will cause variable length vectorization to fail. Yes, the new added gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c cannot vectorize, will try to investigate why. Pan -Original Message- From: Richard Biener Sent: Monday, March 11, 2024 1:05 AM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Wang, Yanzhang ; rdapp@gmail.com; jeffreya...@gmail.com Subject: Re: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len and store are enabled > Am 10.03.2024 um 11:02 schrieb Li, Pan2 : > > Committed, thanks Richard. You might want to investigate why you get mask and not Len for a particular stmt. mixing will cause variable length vectorization to fail. > Pan > > -Original Message- > From: Richard Biener > Sent: Sunday, March 10, 2024 2:53 PM > To: Li, Pan2 > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; > Wang, Yanzhang ; rdapp@gmail.com; > jeffreya...@gmail.com > Subject: Re: [PATCH v2] VECT: Fix ICE for vectorizable LD/ST when both len > and store are enabled > > > >> Am 10.03.2024 um 04:14 schrieb pan2...@intel.com: >> >> From: Pan Li >> >> This patch would like to fix one ICE in vectorizable_store when both the >> loop_masks and loop_lens are enabled. The ICE looks like below when build >> with "-march=rv64gcv -O3". >> >> during GIMPLE pass: vect >> test.c: In function ‘d’: >> test.c:6:6: internal compiler error: in vectorizable_store, at >> tree-vect-stmts.cc:8691 >> 6 | void d() { >> | ^ >> 0x37a6f2f vectorizable_store >> .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:8691 >> 0x37b861c vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*, >> _slp_tree*, _slp_instance*, vec*) >> .../__RISC-V_BUILD__/../gcc/tree-vect-stmts.cc:13242 >> 0x1db5dca vect_analyze_loop_operations >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:2208 >> 0x1db885b vect_analyze_loop_2 >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3041 >> 0x1dba029 vect_analyze_loop_1 >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3481 >> 0x1dbabad vect_analyze_loop(loop*, vec_info_shared*) >> .../__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3639 >> 0x1e389d1 try_vectorize_loop_1 >> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1066 >> 0x1e38f3d try_vectorize_loop >> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1182 >> 0x1e39230 execute >> .../__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1298 >> >> There are two ways to reach vectorizer LD/ST, one is the analysis and >> the other is transform. We cannot have both the lens and the masks >> enabled during transform but it is valid during analysis. Given the >> transform doesn't required cost_vec, we can only enable the assert >> based on cost_vec is NULL or not. >> >> Below testsuites are passed for this patch: >> * The x86 bootstrap tests. >> * The x86 fully regression tests. >> * The aarch64 fully regression tests. >> * The riscv fully regressison tests. > > Ok > > Thanks, > Richard > >> gcc/ChangeLog: >> >> * tree-vect-stmts.cc (vectorizable_store): Enable the assert >> during transform process. >> (vectorizable_load): Ditto. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/rvv/base/pr114195-1.c: New test. >> >> Signed-off-by: Pan Li >> --- >> .../gcc.target/riscv/rvv/base/pr114195-1.c | 15 +++ >> gcc/tree-vect-stmts.cc | 18 ++ >> 2 files changed, 29 insertions(+), 4 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> new file mode 100644 >> index 000..a67b847112b >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114195-1.c >> @@ -0,0 +1,15 @@ >> +/* Test that we do not have ice when compile */ >> +/* { dg-do compile } */ >> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */ >> + >> +long a, b; >> +extern short c[]; >> + >> +void d() { >> + for (int e = 0; e < 35; e = 2) { >> +a = ({ a < 0 ? a : 0; }); >> +b = ({ b < 0 ? b : 0; }); >> + >> +c[e] = 0; >> + } >> +} >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index 14a3ffb5f02..e8617439a48 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -8697,8 +8697,13 @@ vectorizable_store (vec_info *vinfo, >> ? &LOOP_VINFO_LENS (loop_vinfo) >> : NULL); >> >> - /* Shouldn't go with length-based approach if fully masked. */ >> - gcc_assert (!loop_lens || !loop_masks); >> + /* The vect_transform_stmt and vect_analyze_stmt will go here but there >> + are some difference here. We cannot enable both the lens and masks >> + during transform but it is allowed during analysis. >> + Shouldn't go wit
[COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]
The problem here is that merge_truthop_with_opposite_arm would use the type of the result of the comparison rather than the operands of the comparison to figure out if we are honoring NaNs. This fixes that oversight and now we get the correct results in this case. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR middle-end/95351 gcc/ChangeLog: * fold-const.cc (merge_truthop_with_opposite_arm): Use the type of the operands of the comparison and not the type of the comparison. gcc/testsuite/ChangeLog: * gcc.dg/float_opposite_arm-1.c: New test. Signed-off-by: Andrew Pinski --- gcc/fold-const.cc | 3 ++- gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 + 2 files changed, 19 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/float_opposite_arm-1.c diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 43105d20be3..299c22bf391 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -6420,7 +6420,6 @@ static tree merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop, bool rhs_only) { - tree type = TREE_TYPE (cmpop); enum tree_code code = TREE_CODE (cmpop); enum tree_code truthop_code = TREE_CODE (op); tree lhs = TREE_OPERAND (op, 0); @@ -6436,6 +6435,8 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop, if (TREE_CODE_CLASS (code) != tcc_comparison) return NULL_TREE; + tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0)); + if (rhs_code == truthop_code) { tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop, rhs_only); diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c new file mode 100644 index 000..d2dbff35066 --- /dev/null +++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */ +/* { dg-add-options ieee } */ +/* PR middle-end/95351 */ + +int Foo(double possiblyNAN, double b, double c) +{ +return (possiblyNAN <= 2.0) || ((possiblyNAN > 2.0) && (b > c)); +} + +/* Make sure we don't remove either >/<= */ + +/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */ +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0" "optimized" } } */ + +/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } } */ +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0" "optimized" } } */ -- 2.43.0
Re: [PATCH] fix PowerPC < 7 w/ Altivec not to default to power7
Hi, on 2024/3/8 19:33, Rene Rebe wrote: > This might not be the best timing -short before a major release-, > however, Sam just commented on the bug I filled years ago [1], so here > we go: > > Glibc uses .machine to determine assembler optimizations to use. > However, since reworking the rs6000 .machine output selection in > commit e154242724b084380e3221df7c08fcdbd8460674 22 May 2019, G5 as > well as Cell, and even power4 w/ -maltivec currently resulted in > power7. Mask _ALTIVEC away as the .machine selection already did for > GFX and GPOPT. Thanks for fixing, this fix looks reasonable to me, OPTION_MASK_ALTIVEC is a part of POWERPC_7400_MASK so any specified cpu type which has this POWERPC_7400_MASK by default and isn't handled early in function rs6000_machine_from_flags can suffer from this issue. > > powerpc64-t2-linux-gnu-gcc test.c -S -o - -mcpu=G5 > .file "test.c" > .machine power7 > .abiversion 2 > .section".text" > .ident "GCC: (GNU) 10.2.0" > .section.note.GNU-stack,"",@progbits > Nit: Could you also add one test case for this? btw, -mdejagnu-cpu=G5 can force the cpu type in dg-options. > We ship this in T2/Linux [2] since 2020 and it is tested on G5, Cell > and Power8. > > Signed-of-by: René Rebe > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97367 > [2] https://t2sde.org > > --- gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc.vanilla > 2021-04-25 22:57:16.964223106 +0200 > +++ gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc2021-04-25 > 22:57:27.193223841 +0200 > @@ -5765,7 +5765,7 @@ >HOST_WIDE_INT flags = rs6000_isa_flags; > >/* Disable the flags that should never influence the .machine selection. > */ > - flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | > OPTION_MASK_ISEL); > + flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | > OPTION_MASK_ALTIVEC | OPTION_MASK_ISEL); Nit: This line is too long and needs re-format. BR, Kewen > >if ((flags & (ISA_3_1_MASKS_SERVER & ~ISA_3_0_MASKS_SERVER)) != 0) > return "power10"; >
Re: [RFC] [PR tree-optimization/92539] Optimize away tests against invalid pointers
On Sun, Mar 10, 2024 at 2:09 PM Jeff Law wrote: > > > > On 3/10/24 3:05 PM, Andrew Pinski wrote: > > On Sun, Mar 10, 2024 at 2:04 PM Jeff Law wrote: > >> > >> Here's a potential approach to fixing PR92539, a P2 -Warray-bounds false > >> positive triggered by loop unrolling. > >> > >> As I speculated a couple years ago, we could eliminate the comparisons > >> against bogus pointers. Consider: > >> > >>> [local count: 30530247]: > >>>if (last_12 != &MEM [(void *)"aa" + 3B]) > >>> goto ; [54.59%] > >>>else > >>> goto ; [45.41%] > >> > >> > >> That's a valid comparison as ISO allows us to generate, but not > >> dereference, a pointer one element past the end of the object. > >> > >> But +4B is a bogus pointer. So given an EQ comparison against that > >> pointer we could always return false and for NE always return true. > >> > >> VRP and DOM seem to be the most natural choices for this kind of > >> optimization on the surface. However DOM is actually not viable because > >> the out-of-bounds pointer warning pass is run at the end of VRP. So > >> we've got to take care of this prior to the end of VRP. > >> > >> > >> > >> I haven't done a bootstrap or regression test with this. But if it > >> looks reasonable I can certainly push on it further. I have confirmed it > >> does eliminate the tests and shuts up the bogus warning. > >> > >> The downside is this would also shut up valid warnings if user code did > >> this kind of test. > >> > >> Comments/Suggestions? > > > > ENOPATCH > Yea, realized it as I pushed the send button. Then t-bird crashed, > repeatedly. > > Attached this time.. One minor comment on it The comment: > return true for EQ and false for NE. Seems to be the opposite for what the code does: > return code == EQ_EXPR ? boolean_false_node : boolean_true_node; Thanks, Andrew > > jeff >
[PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits
Hi, This patch tries to fix the problem when a canonical form doesn't benefit on a specific target. The const operand of AND is and with the nonzero bits of another operand in combine pass. It's a canonical form, but it's no benefits for the target which has rotate and mask insns. As the mask is truncated, it can't match the insn conditions which it originally matches. For example, the following insn condition checks the sum of two AND masks. When one of the mask is truncated, the condition breaks. (define_insn "*rotlsi3_insert_5" [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") (match_operand:SI 2 "const_int_operand" "n,n")) (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") (match_operand:SI 4 "const_int_operand" "n,n"] "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" ... This patch tries to fix the problem by comparing the rtx cost. If another operand (varop) is not changed and rtx cost with new mask is not less than the original one, the mask is restored to original one. I'm not sure if comparison of rtx cost here is proper. The outer code is unknown and I suppose it as "SET". Also the rtx cost might not be accurate. >From my understanding, the canonical forms should always benefit as it can't be undo in combine pass. Do we have a perfect solution for this kind of issues? Looking forward for your advice. Another similar issues for canonical forms. Whether the widen mode for lshiftrt is always good? https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html Thanks Gui Haochen ChangeLog Combine: Don't truncate const operand of AND if it's no benefits In combine pass, the canonical form is to turn off all bits in the constant that are know to already be zero for AND. /* Turn off all bits in the constant that are known to already be zero. Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS which is tested below. */ constop &= nonzero; But it doesn't benefit when the target has rotate and mask insert insns. The AND mask is truncated and lost its information. Thus it can't match the insn conditions. For example, the following insn condition checks the sum of two AND masks. (define_insn "*rotlsi3_insert_5" [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r") (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r") (match_operand:SI 2 "const_int_operand" "n,n")) (and:SI (match_operand:SI 3 "gpc_reg_operand" "r,0") (match_operand:SI 4 "const_int_operand" "n,n"] "rs6000_is_valid_mask (operands[2], NULL, NULL, SImode) && UINTVAL (operands[2]) != 0 && UINTVAL (operands[4]) != 0 && UINTVAL (operands[2]) + UINTVAL (operands[4]) + 1 == 0" ... This patch restores the const operand of AND if the another operand is not optimized and the truncated const operand doesn't save the rtx cost. gcc/ * combine.cc (simplify_and_const_int_1): Restore the const operand of AND if varop is not optimized and the rtx cost of the new const operand is not reduced. gcc/testsuite/ * gcc.target/powerpc/rlwimi-0.c: Reduced total number of insns and adjust the number of rotate and mask insns. * gcc.target/powerpc/rlwimi-1.c: Likewise. * gcc.target/powerpc/rlwimi-2.c: Likewise. patch.diff diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d836..16ff09ea854 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -10161,8 +10161,23 @@ simplify_and_const_int_1 (scalar_int_mode mode, rtx varop, if (constop == nonzero) return varop; - if (varop == orig_varop && constop == orig_constop) -return NULL_RTX; + if (varop == orig_varop) +{ + if (constop == orig_constop) + return NULL_RTX; + else + { + rtx tmp = simplify_gen_binary (AND, mode, varop, +gen_int_mode (constop, mode)); + rtx orig = simplify_gen_binary (AND, mode, varop, + gen_int_mode (orig_constop, mode)); + if (set_src_cost (tmp, mode, optimize_this_for_speed_p) + < set_src_cost (orig, mode, optimize_this_for_speed_p)) + return tmp; + else + return NULL_RTX; + } +} /* Otherwise, return an AND. */ return simplify_gen_binary (AND, mode, varop, gen_int_mode (constop, mode)); diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c index 961be199901..d9dd4419f1d 100644 --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-0.c @@ -2,15 +2,15 @@ /* { dg-options "-O2" } */ /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]