[PATCH] RISC-V: Add Z*inx incompatible check in gcc.
Z*inx conflicts with the float extensions; add an incompatibility check for
when z*inx and hard_float are both enabled.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_option_override): New check.

---
 gcc/config/riscv/riscv.cc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 76eee4a55e9..162ba14d3c7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6285,6 +6285,10 @@ riscv_option_override (void)
       && riscv_abi != ABI_LP64 && riscv_abi != ABI_ILP32E)
     error ("z*inx requires ABI ilp32, ilp32e or lp64");
 
+  // Zfinx is conflict with float extensions.
+  if (TARGET_ZFINX && TARGET_HARD_FLOAT)
+    error ("z*inx is conflict with float extensions");
+
   /* We do not yet support ILP32 on RV64.  */
   if (BITS_PER_WORD != POINTER_SIZE)
     error ("ABI requires %<-march=rv%d%>", POINTER_SIZE);
-- 
2.25.1
[pushed] doc: Remove anachronistic note related to languages built
Jonathan's patch
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604796.html
last November made me look for further instances, and indeed there was
another one referring to separate tarballs (which we have not been
shipping for a fair while).

Since the item above already refers to `gcc -v` we can simply drop the
entire list item.

Pushed.

Gerald

---

This is another instance of what ce51e8439a49 (and originally
05432288d4e5) addressed in a different part.

We stopped shipping granular tarballs years ago.

gcc/ChangeLog:

	* doc/install.texi: Remove anachronistic note related to languages
	built and separate source tarballs.
---
 gcc/doc/install.texi | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 63fc949b447..15aef1394f4 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3481,13 +3481,6 @@ The output of @samp{gcc -v} for your newly installed @command{gcc}.
 This tells us which version of GCC you built and the options you passed to
 configure.
 
-@item
-Whether you enabled all languages or a subset of them.  If you used a
-full distribution then this information is part of the configure
-options in the output of @samp{gcc -v}, but if you downloaded the
-``core'' compiler plus additional front ends then it isn't apparent
-which ones you built unless you tell us about it.
-
 @item
 If the build was for GNU/Linux, also include:
 @itemize @bullet
-- 
2.39.2
[wwwdocs] Add Ada's GCC13 changelog entry
Hi all,

a bit belated but just like last year, I've made a patch for the Ada
entry in the changelog. You can find the patch attached to this email.

If I have forgotten anything relevant or if I have done something
incorrectly, please say so.

Best regards,
Fernando Oleo Blanco

From d273bb1835c1ef23e15d422bed22ca5d333cbdae Mon Sep 17 00:00:00 2001
From: Fernando Oleo Blanco
Date: Sun, 26 Mar 2023 14:20:36 +0200
Subject: [PATCH 1/1] [PATCH] Add Ada's entry in the v13 changelog

Signed-off-by: Fernando Oleo Blanco
---
 htdocs/gcc-13/changes.html | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index ff70d2ee..2e25bcf5 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -160,7 +160,16 @@ a work-in-progress.
 New Languages and Language specific improvements
 
-
+Ada
+
+  Traceback support added in RTEMS for the PPC ELF and ARM architectures.
+  Support for versions older than VxWorks 7 has been removed.
+  General improvements to the contracts in the standard libraries.
+  Addition of GNAT.Binary_Search.
+  Further additions and fixes for the Ada 2022 specification.
+  The Pragma SPARK_Mode=>Auto is now accepted. Contract analysis has been further improved.
+  Documentation improvements.
+
 C family
 
-- 
2.40.0
[PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].
Tested on x86_64-darwin21, x86-64-linux-gnu
OK for trunk?
Iain

When we need to 'promote' a value (i.e. store it in the coroutine frame) it
is given a frame entry name.  This was based on the DECL_UID for slot vars.
However, when LTO is used, the names from multiple TUs become visible at the
same time, and the DECL_UIDs usually differ between units.  This leads to an
"ODR mismatch" warning for the frame type.

The fix here is to use a counter instead of the DECL_UID, which makes a name
that is stable between TUs for each frame layout (one per coroutine func).

Signed-off-by: Iain Sandoe

	PR c++/101118

gcc/cp/ChangeLog:

	* coroutines.cc: Add counter for promoted slot vars.
	(flatten_await_stmt): Use slot vars counter instead of DECL_UID
	to generate the frame entry name for promoted target expression
	slot variables.
	(morph_fn_to_coro): Reset the slot vars counter at the start of
	each coroutine function.
---
 gcc/cp/coroutines.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index a2189e43db8..359a5bf46ff 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2726,6 +2726,11 @@ struct var_nest_node
   var_nest_node *else_cl;
 };
 
+/* This is used to make a stable, but unique-per-function, sequence number for
+   each TARGET_EXPR slot variable that we 'promote' to a frame entry.  It needs
+   to be stable because the frame type is visible to LTO ODR checking.  */
+static unsigned tmpno = 0;
+
 /* This is called for single statements from the co-await statement walker.
    It checks to see if the statement contains any initializers for awaitables
    and if any of these capture items by reference.  */
@@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
           tree init = t;
           temps_used->add (init);
           tree var_type = TREE_TYPE (init);
-          char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 0)));
+          char *buf = xasprintf ("T%03u", tmpno++);
           tree var = build_lang_decl (VAR_DECL, get_identifier (buf), var_type);
           DECL_ARTIFICIAL (var) = true;
           free (buf);
@@ -4374,6 +4379,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
 {
   gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL);
 
+  tmpno = 0;
   *resumer = error_mark_node;
   *destroyer = error_mark_node;
   if (!coro_function_valid_p (orig))
-- 
2.37.1 (Apple Git-137.1)
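The naming scheme described in this patch can be illustrated outside the compiler. The following is only a minimal standalone sketch, not GCC code: `slot_counter`, `begin_coroutine` and `next_slot_name` are hypothetical names standing in for `tmpno`, the reset in `morph_fn_to_coro`, and the `xasprintf` call in `flatten_await_stmt`. It shows that a counter reset per function yields the same sequence of names for the same frame layout, regardless of what else was processed before.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter playing the role of 'tmpno' in the patch.  */
static unsigned slot_counter = 0;

/* Reset the counter at the start of each coroutine function, as the
   patch does at the top of morph_fn_to_coro.  */
void
begin_coroutine (void)
{
  slot_counter = 0;
}

/* Write the next frame entry name into BUF using the same "T%03u"
   format as the patch.  snprintf truncates safely if BUF is small.  */
void
next_slot_name (char *buf, size_t size)
{
  assert (buf != NULL && size >= 5);  /* Room for "T000" plus NUL.  */
  snprintf (buf, size, "T%03u", slot_counter++);
}
```

Calling `begin_coroutine` and then `next_slot_name` twice gives "T000" and "T001" in every translation unit, whereas a name derived from a per-TU identifier such as DECL_UID would generally differ between units.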
Re: [PATCH] predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685]
On 3/25/23 03:53, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> In the following testcase, we predict baz to have cold
> entry regardless of the user supplied attribute (as it calls
> unconditionally a cold function), but still issue
> a -Wsuggest-attribute=cold warning despite it having that attribute
> already.
>
> The following patch avoids that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-03-25  Jakub Jelinek
>
> 	PR ipa/105685
> 	* predict.cc (compute_function_frequency): Don't call
> 	warn_function_cold if function already has cold attribute.
>
> 	* c-c++-common/cold-2.c: New test.

OK
jeff
Re: [PATCH] tree-optimization/109237 - last_stmt is possibly slow
On 3/22/23 06:29, Richard Biener via Gcc-patches wrote:
> Most uses of last_stmt are interested in control transfer stmts
> and for the testcase gimple_purge_dead_eh_edges shows up in the
> profile.  But last_stmt looks past trailing debug stmts, although those
> would be rejected by GIMPLE's verify_flow_info.
>
> The following adds possible_ctrl_stmt besides last_stmt which
> does not look past trailing debug stmts and adjusts
> gimple_purge_dead_eh_edges.  I've put checking code into
> possible_ctrl_stmt that it will not miss a control statement
> if the real last stmt is a debug stmt.
>
> The alternative would be to change last_stmt, explicitly
> introducing last_nondebug_stmt.  I remember we chickened out
> and made last_stmt conservative here but not anticipating the
> compile-time issues this creates.  I count 227 last_stmt and
> 12 last_and_only_stmt uses.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Any opinions?  I probably lean towards s/last_stmt/last_nondebug_stmt/
> in next stage1 and then adding last_stmt and changing some
> uses back - though for maintenance that's going to be a
> nightmare (or maybe not, a "wrong" last_stmt should be safe
> to backport and a last_nondebug_stmt will fail to build).

Sounds quite sensible to me.  227+12 isn't terrible and I bet the vast
majority should be safe for last_nondebug_stmt.

> Richard.
>
> 	PR tree-optimization/109237
> 	* tree-cfg.h (possible_ctrl_stmt): New function returning
> 	the last stmt not skipping debug stmts.
> 	(gimple_purge_dead_eh_edges): Use it.

OK
jeff
Re: [PATCH] match.pd: Fix up fneg/fadd simplification [PR109230]
On 3/22/23 04:16, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> The following testcase is miscompiled on aarch64-linux.
> match.pd has a simplification for addsub, where it negates one of
> the vectors in twice as large floating point element vector (effectively
> negating every other element) and then doing addition.
> But a requirement for that is that the permutation picks the right
> elements, in particular 0, nelts+1, 2, nelts+3, 4, nelts+5, ...
> The pattern tests this with sel.series_p (0, 2, 0, 2) check, which as
> documented verifies that the even elements of the permutation mask are
> identity, but doesn't say anything about the others.
> The following patch fixes it by also checking that the odd elements
> start at nelts + 1 with the same step of 2.
>
> Bootstrapped/regtested on aarch64-linux, x86_64-linux and i686-linux,
> ok for trunk?
>
> 2023-03-22  Jakub Jelinek
>
> 	PR tree-optimization/109230
> 	* match.pd (fneg/fadd simplify): Verify also odd permutation indexes.
>
> 	* gcc.dg/pr109230.c: New test.

OK
Jeff
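The permutation requirement described above (0, nelts+1, 2, nelts+3, ...) can be spelled out as a plain loop. This is only a sketch under stated assumptions: `addsub_selector_p` is a hypothetical helper over a flat array of indexes, not the `vec_perm_indices::series_p` interface that match.pd actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true when SEL, a selector of NELTS indexes into the
   concatenation of two NELTS-element vectors, has the form
   { 0, NELTS + 1, 2, NELTS + 3, ... }.  */
bool
addsub_selector_p (const unsigned *sel, size_t nelts)
{
  assert (sel != NULL && nelts % 2 == 0);
  for (size_t i = 0; i < nelts; i += 2)
    {
      /* Even positions must pick the same lane of the first vector.  */
      if (sel[i] != i)
        return false;
      /* Odd positions must pick the same lane of the second vector.  */
      if (sel[i + 1] != nelts + i + 1)
        return false;
    }
  return true;
}
```

With `nelts == 4`, the selector `{0, 5, 2, 7}` is accepted, while `{0, 1, 2, 3}` is rejected even though its even positions are the identity; a check that looks at the even positions only cannot tell these two apart.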
Re: [PATCH] rtl-optimization/109237 - speedup bb_is_just_return
On 3/22/23 04:03, Richard Biener via Gcc-patches wrote:
> For the testcase bb_is_just_return is on top of the profile, changing
> it to walk BB insns backwards puts it off the profile.  That's because
> in the forward walk you have to process possibly many debug insns
> but in a backward walk you very likely run into control insns first.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> OK?
>
> For the record, the profile was (after the delete_trivially_dead_insns
> fix)
>
> Samples: 289K of event 'cycles:u', Event count (approx.): 384226334976
> Overhead  Samples  Command  Shared Object  Symbol
>    3.52%     9747  cc1      cc1            [.] bb_is_just_return
>
> and after the fix bb_is_just_return has no recorded samples anymore.
>
> Thanks,
> Richard.
>
> 	PR rtl-optimization/109237
> 	* cfgcleanup.cc (bb_is_just_return): Walk insns backwards.

OK.  Sorry if I introduced this hog.
jeff
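Why the backward walk helps can be seen on a toy model. The sketch below is an assumption-laden simplification, not the cfgcleanup.cc code: `enum insn_kind` and `block_is_just_return` are hypothetical, and the real function also has to deal with notes and a possible use of the return register. It only shows that, walking from the end, a block that is not "just a return" is rejected after a couple of insns no matter how many debug insns precede them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified insn kinds; GCC's rtx_insn is far richer.  */
enum insn_kind { INSN_DEBUG, INSN_RETURN, INSN_OTHER };

/* Return true if the block holds exactly one return and otherwise only
   debug insns.  The walk starts at the last insn; *VISITED receives the
   number of insns inspected before a decision was reached.  */
bool
block_is_just_return (const enum insn_kind *insns, size_t n, size_t *visited)
{
  bool seen_return = false;

  assert (visited != NULL);
  *visited = 0;
  for (size_t i = n; i-- > 0;)
    {
      ++*visited;
      if (insns[i] == INSN_DEBUG)
        continue;
      if (insns[i] == INSN_RETURN && !seen_return)
        {
          seen_return = true;
          continue;
        }
      /* Any other insn, or a second return, settles the answer.  */
      return false;
    }
  return seen_return;
}
```

For a block of many debug insns followed by one ordinary insn and a return, this walk inspects two insns before answering false; a forward walk would have to step over every debug insn first.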
Re: [PATCH, commited] Fortran: remove dead code [PR104321]
Hi Paul,

> If you will excuse the British cultural reference, that's a Norwegian Blue
> alright! Good spot.

ROTFL!  I first had to look up the "Norwegian Blue", and then I
remembered.  :)

You're bringing back the fun to gfortran hacking!

Cheers,
Harald

> On Sat, 25 Mar 2023 at 19:13, Harald Anlauf via Fortran
> <fort...@gcc.gnu.org> wrote:
>> Dear all,
>>
>> I've committed the attached patch from the PR that removes a dead
>> code snippet, see discussion.
>>
>> Regtested originally by Tobias, and reconfirmed on x86_64-pc-linux-gnu.
>>
>> Pushed as r13-6862-gb5fce899dbbd72 .
>>
>> Thanks,
>> Harald
>
> --
> "If you can't explain it simply, you don't understand it well enough"
> - Albert Einstein
Re: [PATCH] predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685]
> Hi!
>
> In the following testcase, we predict baz to have cold
> entry regardless of the user supplied attribute (as it calls
> unconditionally a cold function), but still issue
> a -Wsuggest-attribute=cold warning despite it having that attribute
> already.
>
> The following patch avoids that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-03-25  Jakub Jelinek
>
> 	PR ipa/105685
> 	* predict.cc (compute_function_frequency): Don't call
> 	warn_function_cold if function already has cold attribute.
>
> 	* c-c++-common/cold-2.c: New test.
>
> --- gcc/predict.cc.jj	2023-01-02 09:32:38.273055726 +0100
> +++ gcc/predict.cc	2023-03-24 16:54:13.658606215 +0100
> @@ -4033,7 +4033,9 @@ compute_function_frequency (void)
>      }
>
>    node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
> -  warn_function_cold (current_function_decl);
> +  if (lookup_attribute ("cold", DECL_ATTRIBUTES (current_function_decl))
> +      == NULL)
> +    warn_function_cold (current_function_decl);

OK, thanks!
In general we probably want to walk aliases and suggest warning on
aliases attached to the function, but we get this wrong with other
attributes too, so I will add it to my TODO list for the next stage1.

Honza
>    if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.ipa() == profile_count::zero ())
>      return;
>    FOR_EACH_BB_FN (bb, cfun)
> --- gcc/testsuite/c-c++-common/cold-2.c.jj	2023-03-24 16:56:07.344000973 +0100
> +++ gcc/testsuite/c-c++-common/cold-2.c	2023-03-24 16:55:58.985119001 +0100
> @@ -0,0 +1,19 @@
> +/* PR ipa/105685 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wsuggest-attribute=cold" } */
> +
> +extern void foo (char *, char const *, int);
> +
> +__attribute__((cold)) char *
> +bar (int x)
> +{
> +  static char b[42];
> +  foo (b, "foo", x);
> +  return b;
> +}
> +
> +__attribute__((cold)) char *
> +baz (int x)	/* { dg-bogus "function might be candidate for attribute 'cold'" } */
> +{
> +  return bar (x);
> +}
>
> 	Jakub
>
m68k: handle TLS access with offset
This reinstates FINAL_PRESCAN_INSN, and the calls in handle_move_double, so
that accesses to TLS variables with offset are properly handled.

gcc:
	PR target/106282
	* config/m68k/m68k.h (FINAL_PRESCAN_INSN): Define.
	* config/m68k/m68k.cc (m68k_final_prescan_insn): Define.
	(handle_move_double): Call it before handle_movsi.
	* config/m68k/m68k-protos.h: Declare it.

gcc/testsuite:
	PR target/106282
	* gcc.target/m68k/tls-gd-off.c: New.
	* gcc.target/m68k/tls-ie-off.c: New.
	* gcc.target/m68k/tls-ld-off.c: New.
	* gcc.target/m68k/tls-ld-xtls-off.c: New.
	* gcc.target/m68k/tls-le-off.c: New.
	* gcc.target/m68k/tls-le-xtls-off.c: New.
	* gcc.target/m68k/tls-ld.c: Make pattern less strict.
	* gcc.target/m68k/tls-le.c: Likewise.
---
 gcc/config/m68k/m68k-protos.h                     |  1 +
 gcc/config/m68k/m68k.cc                           | 15 +++
 gcc/config/m68k/m68k.h                            |  3 +++
 .../gcc.target/m68k/{tls-ld.c => tls-gd-off.c}    |  7 +++
 .../gcc.target/m68k/{tls-le.c => tls-ie-off.c}    |  6 +++---
 .../gcc.target/m68k/{tls-ld.c => tls-ld-off.c}    |  8
 .../m68k/{tls-ld.c => tls-ld-xtls-off.c}          |  8
 gcc/testsuite/gcc.target/m68k/tls-ld.c            |  4 ++--
 .../gcc.target/m68k/{tls-le.c => tls-le-off.c}    |  6 +++---
 gcc/testsuite/gcc.target/m68k/tls-le-xtls-off.c   | 13 +
 gcc/testsuite/gcc.target/m68k/tls-le.c            |  2 +-
 11 files changed, 52 insertions(+), 21 deletions(-)
 copy gcc/testsuite/gcc.target/m68k/{tls-ld.c => tls-gd-off.c} (52%)
 copy gcc/testsuite/gcc.target/m68k/{tls-le.c => tls-ie-off.c} (62%)
 copy gcc/testsuite/gcc.target/m68k/{tls-ld.c => tls-ld-off.c} (52%)
 copy gcc/testsuite/gcc.target/m68k/{tls-ld.c => tls-ld-xtls-off.c} (57%)
 copy gcc/testsuite/gcc.target/m68k/{tls-le.c => tls-le-off.c} (62%)
 create mode 100644 gcc/testsuite/gcc.target/m68k/tls-le-xtls-off.c

diff --git a/gcc/config/m68k/m68k-protos.h b/gcc/config/m68k/m68k-protos.h
index 60bff796534..724d446af93 100644
--- a/gcc/config/m68k/m68k-protos.h
+++ b/gcc/config/m68k/m68k-protos.h
@@ -84,6 +84,7 @@ extern int emit_move_sequence (rtx *, machine_mode, rtx);
 extern bool m68k_movem_pattern_p (rtx, rtx, HOST_WIDE_INT, bool);
 extern const char *m68k_output_movem (rtx *, rtx, HOST_WIDE_INT, bool);
 extern bool m68k_epilogue_uses (int);
+extern void m68k_final_prescan_insn (rtx_insn *, rtx *, int);
 
 /* Functions from m68k.cc used in constraints.md.  */
 extern rtx m68k_unwrap_symbol (rtx, bool);
diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index 0bff89bc39d..03db2b6a936 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -2550,6 +2550,18 @@ m68k_adjust_decorated_operand (rtx op)
     }
 }
 
+/* Prescan insn before outputing assembler for it.  */
+
+void
+m68k_final_prescan_insn (rtx_insn *insn ATTRIBUTE_UNUSED,
+                         rtx *operands, int n_operands)
+{
+  int i;
+
+  for (i = 0; i < n_operands; ++i)
+    m68k_adjust_decorated_operand (operands[i]);
+}
+
 /* Move X to a register and add REG_EQUAL note pointing to ORIG.
    If REG is non-null, use it; generate new pseudo otherwise.  */
 
@@ -3658,6 +3670,7 @@ handle_move_double (rtx operands[2],
 
   /* Normal case: do the two words, low-numbered first.  */
 
+  m68k_final_prescan_insn (NULL, operands, 2);
   handle_movsi (operands);
 
   /* Do the middle one of the three words for long double */
@@ -3668,6 +3681,7 @@ handle_move_double (rtx operands[2],
       if (addreg1)
         handle_reg_adjust (addreg1, 4);
 
+      m68k_final_prescan_insn (NULL, middlehalf, 2);
       handle_movsi (middlehalf);
     }
 
@@ -3678,6 +3692,7 @@ handle_move_double (rtx operands[2],
     handle_reg_adjust (addreg1, 4);
 
   /* Do that word.  */
+  m68k_final_prescan_insn (NULL, latehalf, 2);
   handle_movsi (latehalf);
 
   /* Undo the adds we just did.  */
diff --git a/gcc/config/m68k/m68k.h b/gcc/config/m68k/m68k.h
index 6f0bdd8dffa..450c380359c 100644
--- a/gcc/config/m68k/m68k.h
+++ b/gcc/config/m68k/m68k.h
@@ -837,6 +837,9 @@ __transfer_from_trampoline () \
   assemble_name ((FILE), (NAME)), \
   fprintf ((FILE), ",%u\n", (int)(ROUNDED)))
 
+#define FINAL_PRESCAN_INSN(INSN, OPVEC, NOPERANDS) \
+  m68k_final_prescan_insn (INSN, OPVEC, NOPERANDS)
+
 /* On the 68000, we use several CODE characters:
    '.' for dot needed in Motorola-style opcode names.
    '-' for an operand pushing on the stack:
diff --git a/gcc/testsuite/gcc.target/m68k/tls-ld.c b/gcc/testsuite/gcc.target/m68k/tls-gd-off.c
similarity index 52%
copy from gcc/testsuite/gcc.target/m68k/tls-ld.c
copy to gcc/testsuite/gcc.target/m68k/tls-gd-off.c
index af470c9613a..4af6128ae27 100644
--- a/gcc/testsuite/gcc.target/m68k/tls-ld.c
+++ b/gcc/testsuite/gcc.target/m68k/tls-gd-off.c
@@ -1,14 +1,13 @@
 /* { dg-do compile } */
 /* { dg-skip-if
Re: Re: [PATCH] RISC-V: Optimize zbb ins sext.b and sext.h in rv64
On 2023-03-26 02:18 Jeff Law wrote:
>
>
>
>On 3/23/23 20:45, juzhe.zh...@rivai.ai wrote:
>> Sounds like you are looking at redundant extension problem in RISC-V port.
>> This is the issue I want to fix but I don't find the time to do that.
>> My first impression is that we need to fix redundant extension in "ree"
>> PASS.
>> I am not sure.
>It's actually quite a bit more complicated.
>
>Some extension elimination can and probably should be happening in
>gimple.  In gimple you have access to type information as well as range
>information.  So you have the opportunity to do things like rewrite the
>IL to use different types when it's safe to do so, or to use range
>information to identify when an object is already properly extended and
>thus eliminate the extension before we expand gimple into RTL.
>
>Once in RTL, you can use forward propagation to eliminate extensions, or
>at least fold them into existing operations.  combine can eliminate
>extensions and it has the ability to track (for example) if the upper
>bits are copies of the sign bit, if they're known zero, etc.  combine is
>also capable of recognizing that a load implicitly extends and using
>that knowledge to eliminate extensions or to discover that a pair of
>shifts are just zero or sign extending a value, etc etc.  combine also
>interacts with simplify-rtx which is used by other passes, so there's a
>chance that work in simplify-rtx can eliminate extensions not just in
>combine, but in other passes as well.
>
>REE is a post-register allocation pass and kind of the last chance to
>eliminate extensions.
>
>So for any given redundant extension, the way to go (IMHO) is to walk
>through the optimizer pipeline to see where it can potentially be
>eliminated.  In general, the earlier in the optimizer pipeline the
>extension can be eliminated, the better.
>
>Jeff

Hi Jeff,

Do you think my patch modification is suitable?  What else needs to be
improved?

Thanks.
Feng Wang
Re: [PATCH] RTL: Bugfix for wrong code with v16hi compare & mask
On Sun, Mar 26, 2023 at 3:01 AM Jeff Law via Gcc-patches wrote:
>
>
>
> On 3/24/23 08:11, pan2.li--- via Gcc-patches wrote:
> > From: Pan Li
> >
> > Fix the bug of the incorrect code generation for the
> > below code sample.
> >
> > typedef unsigned short __attribute__((__vector_size__ (32))) V;
> > typedef unsigned short u16;
> >
> > void
> > foo (V m, u16 *ret)
> > {
> >    V v = 6 > ((V) { 2049, 8 } & m);
> >    *ret = v[0]; // + a + b + c + d;
> > }
> >
> > Before this patch.
> > addi    sp,sp,-64
> > ld      a5,0(a0)
> > li      a4,528384
> > addi    a4,a4,-2047
> > and     a5,a5,a4
> > // slli a5,a5,48  <- eliminated by mistake
> > // srli a5,a5,48  <- eliminated by mistake
> > sltiu   a5,a5,6
> > negw    a5,a5
> > sh      a5,0(a1)
> >
> > After this patch.
> > addi    sp,sp,-64
> > ld      a5,0(a0)
> > li      a4,528384
> > addi    a4,a4,-2047
> > and     a5,a5,a4
> > slli    a5,a5,48
> > srli    a5,a5,48
> > sltiu   a5,a5,6
> > negw    a5,a5
> > sh      a5,0(a1)
> >
> > The simplify_comparation for the AND operation will
> > try to simplify below RTL code from:
> > (and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801))
> > to:
> > (subreg:DI (and (reg:HI 154) (const_int 0x801)) 0)
> These look equivalent to me -- assuming they're used as rvalues.

They're equivalent only with WORD_REGISTER_OPERATIONS; otherwise the upper
bits of the latter are undefined (UD), while those of the former are 0.
(and (reg:HI 154) (const_int 0x801)) is simplified to (reg:HI 154)
since nonzero_bits (reg:154, HImode) is exactly the same as 0x801.
These two optimizations are fine on their own, but when they are put
together there are problems.  The first optimization relies on
WORD_REGISTER_OPERATIONS, but the second optimizes the operation away,
which makes the upper bits of (subreg:DI (reg:HI 154) 0) UD, whereas
originally they should be 0 after the AND with (const_int 0x801).

> >
> >
> > If reg:HI 154 is 0x801 and reg:DI 154 is 0x80801, the RTL will
> > be simplified further to:
> That statement has no meaning.  Each pseudo has one and only one native
> mode and you can only refer to it in that mode.  ie reg:HI 154.  reg:DI
> 154 has no meaning.  You might say that (subreg:DI (reg:HI 154) 0) has
> the value 0x80801, but that's OK.  The subreg says those bits outside
> HImode simply don't matter -- you can not depend on them having any
> particular value.
>
> > (subreg:DI (reg:HI 154) 0)
> I think that's equivalent to (subreg:DI (and:HI (reg:HI 154) (const_int
> 0x801)) 0) when used as an rvalue.
>
> I suspect your problem is elsewhere.
>
> jeff
>

-- 
BR,
Hongtao
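The role of the slli/srli pair in the assembly above can be checked with ordinary integer arithmetic. This is a standalone sketch with hypothetical helper names (`zero_extend_hi`, `lane_less_than_6`), and the register value 0x80001 is an assumed example (lane 0 holding 1, lane 1 holding 8), not a value taken from the thread. It models only the 64-bit register contents, not the RTL simplification itself.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 16 bits of a 64-bit register value, which is
   what shifting left by 48 and then logically right by 48 does.  */
uint64_t
zero_extend_hi (uint64_t reg)
{
  uint64_t result = (reg << 48) >> 48;
  assert (result <= UINT16_MAX);
  return result;
}

/* Model of the sltiu/negw pair on one lane: all-ones in a 16-bit lane
   when the 64-bit register value is below 6, zero otherwise.  */
uint16_t
lane_less_than_6 (uint64_t reg)
{
  return reg < 6 ? 0xffff : 0;
}
```

With the assumed register value 0x80001, `lane_less_than_6 (zero_extend_hi (0x80001))` is 0xffff because lane 0 is 1, while `lane_less_than_6 (0x80001)` is 0: the unsigned comparison sees the bits of the neighbouring lane unless they are cleared first.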
Re: [PATCH] RISC-V: Optimize zbb ins sext.b and sext.h in rv64
On 3/26/23 19:32, Feng Wang wrote:
> On 2023-03-26 02:18 Jeff Law wrote:
>>
>> On 3/23/23 20:45, juzhe.zh...@rivai.ai wrote:
>>> Sounds like you are looking at redundant extension problem in RISC-V port.
>>> This is the issue I want to fix but I don't find the time to do that.
>>> My first impression is that we need to fix redundant extension in "ree"
>>> PASS.
>>> I am not sure.
>> It's actually quite a bit more complicated.
>>
>> Some extension elimination can and probably should be happening in
>> gimple.  In gimple you have access to type information as well as range
>> information.  So you have the opportunity to do things like rewrite the
>> IL to use different types when it's safe to do so, or to use range
>> information to identify when an object is already properly extended and
>> thus eliminate the extension before we expand gimple into RTL.
>>
>> Once in RTL, you can use forward propagation to eliminate extensions, or
>> at least fold them into existing operations.  combine can eliminate
>> extensions and it has the ability to track (for example) if the upper
>> bits are copies of the sign bit, if they're known zero, etc.  combine is
>> also capable of recognizing that a load implicitly extends and using
>> that knowledge to eliminate extensions or to discover that a pair of
>> shifts are just zero or sign extending a value, etc etc.  combine also
>> interacts with simplify-rtx which is used by other passes, so there's a
>> chance that work in simplify-rtx can eliminate extensions not just in
>> combine, but in other passes as well.
>>
>> REE is a post-register allocation pass and kind of the last chance to
>> eliminate extensions.
>>
>> So for any given redundant extension, the way to go (IMHO) is to walk
>> through the optimizer pipeline to see where it can potentially be
>> eliminated.  In general, the earlier in the optimizer pipeline the
>> extension can be eliminated, the better.
>>
>> Jeff
>
> Hi Jeff,
>
> Do you think my patch modification is suitable?  What else needs to be
> improved?

I haven't looked at it in any detail.  We're in stage4 right now, so it's
regression bugfixes only going into the tree.

Once gcc-13 branches I'll be focused on helping folks move RVV forward,
submitting/refining various RISC-V patches from Ventana and reviewing
other RISC-V related patches.

Jeff
[PATCH, rs6000] rs6000: correct vector sign extend built-ins on Big Endian [PR108812]
Hi,

This patch removes the byte reverse operation before vector integer sign
extension on Big Endian.  These built-ins are required to sign extend the
rightmost element, so both BE and LE should perform the same operation and
the byte reversal is not needed.  This patch fixes it.  Now these built-ins
have the same behavior on all compilers.  The test case is modified also.

The patch passed regression test on Power Linux platforms.

Thanks
Gui Haochen

ChangeLog
rs6000: correct vector sign extend builtins on Big Endian

gcc/
	PR target/108812
	* config/rs6000/vsx.md (vsignextend_qi_<mode>): Remove byte reverse
	for Big Endian.
	(vsignextend_hi_<mode>): Likewise.
	(vsignextend_si_v2di): Remove.
	* config/rs6000/rs6000-builtins.def (__builtin_altivec_vsignextsw2d):
	Set bif-pattern to vsx_sign_extend_si_v2di.

gcc/testsuite/
	PR target/108812
	* gcc.target/powerpc/p9-sign_extend-runnable.c: Set different
	expected vectors for Big Endian.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f76f54793d7..059a455b388 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2699,7 +2699,7 @@
     VSIGNEXTSH2W vsignextend_hi_v4si {}
 
   const vsll __builtin_altivec_vsignextsw2d (vsi);
-    VSIGNEXTSW2D vsignextend_si_v2di {}
+    VSIGNEXTSW2D vsx_sign_extend_si_v2di {}
 
   const vsc __builtin_altivec_vslv (vsc, vsc);
     VSLV vslv {}
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 992fbc983be..9e9b33f56ab 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4941,14 +4941,7 @@ (define_expand "vsignextend_qi_<mode>"
                      UNSPEC_VSX_SIGN_EXTEND))]
   "TARGET_P9_VECTOR"
 {
-  if (BYTES_BIG_ENDIAN)
-    {
-      rtx tmp = gen_reg_rtx (V16QImode);
-      emit_insn (gen_altivec_vrevev16qi2(tmp, operands[1]));
-      emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], tmp));
-    }
-  else
-    emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], operands[1]));
+  emit_insn (gen_vsx_sign_extend_qi_<mode>(operands[0], operands[1]));
   DONE;
 })
 
@@ -4968,14 +4961,7 @@ (define_expand "vsignextend_hi_<mode>"
                      UNSPEC_VSX_SIGN_EXTEND))]
   "TARGET_P9_VECTOR"
 {
-  if (BYTES_BIG_ENDIAN)
-    {
-      rtx tmp = gen_reg_rtx (V8HImode);
-      emit_insn (gen_altivec_vrevev8hi2(tmp, operands[1]));
-      emit_insn (gen_vsx_sign_extend_hi_<mode>(operands[0], tmp));
-    }
-  else
-    emit_insn (gen_vsx_sign_extend_hi_<mode>(operands[0], operands[1]));
+  emit_insn (gen_vsx_sign_extend_hi_<mode>(operands[0], operands[1]));
   DONE;
 })
 
@@ -4987,24 +4973,6 @@ (define_insn "vsx_sign_extend_si_v2di"
   "vextsw2d %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_expand "vsignextend_si_v2di"
-  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
-        (unspec:V2DI [(match_operand:V4SI 1 "vsx_register_operand" "v")]
-                     UNSPEC_VSX_SIGN_EXTEND))]
-  "TARGET_P9_VECTOR"
-{
-  if (BYTES_BIG_ENDIAN)
-    {
-      rtx tmp = gen_reg_rtx (V4SImode);
-
-      emit_insn (gen_altivec_vrevev4si2(tmp, operands[1]));
-      emit_insn (gen_vsx_sign_extend_si_v2di(operands[0], tmp));
-    }
-  else
-    emit_insn (gen_vsx_sign_extend_si_v2di(operands[0], operands[1]));
-  DONE;
-})
-
 ;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
 ;; power10.  On earlier systems, the machine independent code will generate a
 ;; shift left to sign extend the 64-bit value to 128-bit.
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
index fdcad019b96..03c0f1201e4 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -34,7 +34,12 @@ int main ()
   /* test sign extend byte to word */
   vec_arg_qi = (vector signed char) {1, 2, 3, 4, 5, 6, 7, 8,
                                      -1, -2, -3, -4, -5, -6, -7, -8};
+
+#ifdef __BIG_ENDIAN__
+  vec_expected_wi = (vector signed int) {4, 8, -4, -8};
+#else
   vec_expected_wi = (vector signed int) {1, 5, -1, -5};
+#endif
 
   vec_result_wi = vec_signexti (vec_arg_qi);
 
@@ -54,7 +59,12 @@ int main ()
   /* test sign extend byte to double */
   vec_arg_qi = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
                                     -1, -2, -3, -4, -5, -6, -7, -8};
+
+#ifdef __BIG_ENDIAN__
+  vec_expected_di = (vector signed long long int){8, -8};
+#else
   vec_expected_di = (vector signed long long int){1, -1};
+#endif
 
   vec_result_di = vec_signextll(vec_arg_qi);
 
@@ -72,7 +82,12 @@ int main ()
 
   /* test sign extend short to word */
   vec_arg_hi = (vector signed short int){1, 2, 3, 4, -1, -2, -3, -4};
+
+#ifdef __BIG_ENDIAN__
+  vec_expected_wi = (vector signed int){2, 4, -2, -4};
+#else
   vec_expected_wi = (vector signed int){1, 3, -1, -3};
+#endif
 
   vec_result_wi = vec_signexti(vec_arg_hi);
 
@@ -90,7 +105,12 @@ int main ()
   /* test sign
Re: [PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].
On Sun, Mar 26, 2023 at 6:55 PM Iain Sandoe via Gcc-patches wrote:
>
> Tested on x86_64-darwin21, x86-64-linux-gnu
> OK for trunk?
> Iain
>
> When we need to 'promote' a value (i.e. store it in the coroutine frame) it
> is given a frame entry name.  This was based on the DECL_UID for slot vars.
> However, when LTO is used, the names from multiple TUs become visible at the
> same time, and the DECL_UIDs usually differ between units.  This leads to a
> "ODR mismatch" warning for the frame type.
>
> The fix here is to use a counter instead of the DECL_UID which makes a name
> that is stable between TUs for each frame layout (one per coroutine func).

I don't see how this avoids clashes across TUs?  But are those VAR_DECLs not
local anyway?

I suppose -Wodr diagnostics for DECL_ARTIFICIAL vars are a bit on the
edge as well ...

Richard.

> Signed-off-by: Iain Sandoe
>
>         PR c++/101118
>
> gcc/cp/ChangeLog:
>
>         * coroutines.cc: Add counter for promoted slot vars.
>         (flatten_await_stmt): Use slot vars counter instead of DECL_UID
>         to generate the frame entry name for promoted target expression
>         slot variables.
>         (morph_fn_to_coro): Reset the slot vars counter at the start of
>         each coroutine function.
> ---
>  gcc/cp/coroutines.cc | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> index a2189e43db8..359a5bf46ff 100644
> --- a/gcc/cp/coroutines.cc
> +++ b/gcc/cp/coroutines.cc
> @@ -2726,6 +2726,11 @@ struct var_nest_node
>    var_nest_node *else_cl;
>  };
>
> +/* This is used to make a stable, but unique-per-function, sequence number for
> +   each TARGET_EXPR slot variable that we 'promote' to a frame entry.  It needs
> +   to be stable because the frame type is visible to LTO ODR checking.  */
> +static unsigned tmpno = 0;
> +
>  /* This is called for single statements from the co-await statement walker.
>     It checks to see if the statement contains any initializers for awaitables
>     and if any of these capture items by reference.  */
> @@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
>           tree init = t;
>           temps_used->add (init);
>           tree var_type = TREE_TYPE (init);
> -         char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 0)));
> +         char *buf = xasprintf ("T%03u", tmpno++);
>           tree var = build_lang_decl (VAR_DECL, get_identifier (buf), var_type);
>           DECL_ARTIFICIAL (var) = true;
>           free (buf);
> @@ -4374,6 +4379,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
>  {
>    gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL);
>
> +  tmpno = 0;
>    *resumer = error_mark_node;
>    *destroyer = error_mark_node;
>    if (!coro_function_valid_p (orig))
> --
> 2.37.1 (Apple Git-137.1)
>
Re: [PATCH] lto/109263 - lto-wrapper and -g0 -ggdb
On Thu, 23 Mar 2023, Richard Biener wrote:

> The following makes lto-wrapper deal with non-combined debug
> disabling / enabling option combinations properly. Interestingly
> -gno-dwarf also enables debug.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> OK? Or do we want to try harder to zap earlier -g0 when later
> -g* appear?

I pushed this to fix the regression; the patch stays valid even
when the patches rejecting negative variants of -ggdb and friends
are approved.

Richard.

> PR lto/109263
> * lto-wrapper.c (run_gcc): Parse alternate debug options
> as well, they always enable debug.
> ---
> gcc/lto-wrapper.cc | 10 ++
> 1 file changed, 10 insertions(+)
>
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index fe8c5f6e80d..5186d040ce0 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -1564,6 +1564,16 @@ run_gcc (unsigned argc, char *argv[])
>           skip_debug = option->arg && !strcmp (option->arg, "0");
>           break;
>
> +       case OPT_gbtf:
> +       case OPT_gctf:
> +       case OPT_gdwarf:
> +       case OPT_gdwarf_:
> +       case OPT_ggdb:
> +       case OPT_gvms:
> +         /* Negative forms, if allowed, enable debug info as well. */
> +         skip_debug = false;
> +         break;
> +
>         case OPT_dumpdir:
>           incoming_dumppfx = dumppfx = option->arg;
>           break;
>

--
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
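
For readers following the -g0 / -ggdb interaction, here is a minimal sketch of the "last debug option wins" behaviour the patch aims for. It is not lto-wrapper code: the helper names (skip_debug_p, starts_with) are hypothetical, options are matched as plain strings rather than decoded option codes, and negative forms such as -gno-dwarf are deliberately not modelled.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper: true when S begins with PREFIX.
static bool starts_with(const std::string &s, const std::string &prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical, simplified model of the option scan.  Returns true when
// debug info should be skipped after looking at every option in order.
// Negative forms (-gno-*) are not modelled here.
bool skip_debug_p(const std::vector<std::string> &opts) {
  bool skip_debug = false;
  for (const std::string &opt : opts) {
    if (!starts_with(opt, "-g"))
      continue;
    const std::string arg = opt.substr(2);
    if (arg.empty() || arg.find_first_not_of("0123456789") == std::string::npos) {
      // Plain -g or -g<level>: only level 0 disables debug info.
      skip_debug = (arg == "0");
    } else {
      // Alternate debug formats always enable debug info again.
      for (const char *alt : {"gdb", "dwarf", "btf", "ctf", "vms"})
        if (starts_with(arg, alt))
          skip_debug = false;
    }
  }
  return skip_debug;
}
```

In this model `-g0 -ggdb` ends with debug info enabled, while `-ggdb -g0` ends with it skipped, because each recognised option simply overwrites the flag.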
Re: [PATCH] c++, coroutines: Stabilize names of promoted slot vars [PR101118].
Hi Richard,

(I’m away from my usual infrastructure, so responses could be slow and
testing things could take a while).

> On 27 Mar 2023, at 12:10, Richard Biener wrote:
>
> On Sun, Mar 26, 2023 at 6:55 PM Iain Sandoe via Gcc-patches wrote:
>>
>> Tested on x86_64-darwin21, x86-64-linux-gnu
>> OK for trunk?
>> Iain
>>
>> When we need to 'promote' a value (i.e. store it in the coroutine frame) it
>> is given a frame entry name. This was based on the DECL_UID for slot vars.
>> However, when LTO is used, the names from multiple TUs become visible at the
>> same time, and the DECL_UIDs usually differ between units. This leads to a
>> "ODR mismatch" warning for the frame type.
>>
>> The fix here is to use a counter instead of the DECL_UID which makes a name
>> that is stable between TUs for each frame layout (one per coroutine func).
>
> I don't see how this avoids clashes across TUs? But are those VAR_DECLs not
> local anyway?

The reported ODR issue is in the frame type (which is a structure): it sees
two frame layouts with the same types for each field but a different name for
the entries that came from the promotion of the slot var (because I used the
DECL_UID to generate the field name).

> I suppose -Wodr diagnostics for DECL_ARTIFICIAL vars are a bit on the
> edge as well ...

These promoted vars get DECL_VALUE_EXPRs (and, as noted above, a name to assist
in debugging) tying them to the frame entry, although I do agree that
reporting warnings for compiler-internal stuff is definitely on the edge (ISTR
seeing maybe-unused reports against such too).

Not sure if we have an easy way to tell that the frame type is an internal one
tho. Perhaps that needs a DECL_ARTIFICIAL - but would that not make it
unavailable for debug?

Iain

>
> Richard.
>
>> Signed-off-by: Iain Sandoe
>>
>> PR c++/101118
>>
>> gcc/cp/ChangeLog:
>>
>> * coroutines.cc: Add counter for promoted slot vars.
>> (flatten_await_stmt): Use slot vars counter instead of DECL_UID
>> to generate the frame entry name for promoted target expression
>> slot variables.
>> (morph_fn_to_coro): Reset the slot vars counter at the start of
>> each coroutine function.
>> ---
>> gcc/cp/coroutines.cc | 8 +++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index a2189e43db8..359a5bf46ff 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -2726,6 +2726,11 @@ struct var_nest_node
>>    var_nest_node *else_cl;
>>  };
>>
>> +/* This is used to make a stable, but unique-per-function, sequence number for
>> +   each TARGET_EXPR slot variable that we 'promote' to a frame entry. It needs
>> +   to be stable because the frame type is visible to LTO ODR checking. */
>> +static unsigned tmpno = 0;
>> +
>>  /* This is called for single statements from the co-await statement walker.
>>     It checks to see if the statement contains any initializers for awaitables
>>     and if any of these capture items by reference. */
>> @@ -2889,7 +2894,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
>>        tree init = t;
>>        temps_used->add (init);
>>        tree var_type = TREE_TYPE (init);
>> -      char *buf = xasprintf ("D.%d", DECL_UID (TREE_OPERAND (init, 0)));
>> +      char *buf = xasprintf ("T%03u", tmpno++);
>>        tree var = build_lang_decl (VAR_DECL, get_identifier (buf), var_type);
>>        DECL_ARTIFICIAL (var) = true;
>>        free (buf);
>> @@ -4374,6 +4379,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
>>  {
>>    gcc_checking_assert (orig && TREE_CODE (orig) == FUNCTION_DECL);
>>
>> +  tmpno = 0;
>>    *resumer = error_mark_node;
>>    *destroyer = error_mark_node;
>>    if (!coro_function_valid_p (orig))
>> --
>> 2.37.1 (Apple Git-137.1)
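
To illustrate why the field names matter, here is a minimal sketch, assuming a toy model in which each translation unit hands out declaration UIDs sequentially from a different starting point. Nothing here is GCC code: TranslationUnit, names_from_uid and names_from_counter are hypothetical names, and only the "D.%d" and "T%03u" name formats are taken from the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical model of one translation unit: UIDs are handed out in order
// of declaration, so their values depend on everything declared earlier.
struct TranslationUnit {
  unsigned next_uid;
};

// Old scheme: derive each promoted temporary's field name from its UID.
std::vector<std::string> names_from_uid(TranslationUnit &tu, unsigned n_temps) {
  std::vector<std::string> names;
  for (unsigned i = 0; i < n_temps; ++i)
    names.push_back("D." + std::to_string(tu.next_uid++));
  return names;
}

// New scheme: a counter that restarts at zero for each coroutine function,
// so the same function yields the same field names in every unit.
std::vector<std::string> names_from_counter(unsigned n_temps) {
  unsigned tmpno = 0;
  std::vector<std::string> names;
  for (unsigned i = 0; i < n_temps; ++i) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "T%03u", tmpno++);
    names.emplace_back(buf);
  }
  return names;
}
```

With two units starting at UIDs 1000 and 2500, the first scheme names the same two temporaries D.1000/D.1001 in one unit and D.2500/D.2501 in the other, whereas the second names them T000/T001 in both, so the two frame layouts agree field by field.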
[PATCH] RISC-V: Fix PR108270
From: Juzhe-Zhong

PR 108270

Fix bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:

void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}

Compile option: -O3

Before this patch:
    mv       a7,a2
    mv       a6,a0
    mv       t1,a1
    mv       a2,a3
    vsetivli zero,17,e8,mf8,ta,ma
    ...

After this patch:
    mv       a7,a2
    mv       a6,a0
    mv       t1,a1
    mv       a2,a3
    ble      a7,zero,.L1
    ble      a4,zero,.L1
    ble      a3,zero,.L1
    add      a1,a0,a4
    li       a0,0
    vsetivli zero,17,e8,mf8,ta,ma
    ...

Before this patch the vsetivli is emitted ahead of the loop-entry checks,
so it executes even when no loop iterates. This can produce a bug in a
case such as:

int main ()
{
  vsetivli zero, 100,.
  f (in, out, 0,0,0)
  asm volatile ("csrr a0,vl":::"memory");

  // Before this patch the a0 is 17. (Wrong).
  // After this patch the a0 is 100. (Correct).
  ...
}

gcc/ChangeLog:

    * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_empty_predecessor_p): New function.
    (pass_vsetvl::backward_demand_fusion): Fix bug.
    * config/riscv/riscv-vsetvl.h: New function declaration.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt test.
    * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Adapt test.
    * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
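
The new predicate named in the ChangeLog can be pictured on a toy control-flow graph. The following is a minimal sketch, not the pass itself: Block, has_vector_info and all_predecessors are hypothetical stand-ins, and it assumes the pass's predecessor query is transitive (every block from which the given block can be reached).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical basic block: indices of direct predecessors, plus a flag
// standing in for "this block has valid or dirty VL/VTYPE info".
struct Block {
  std::vector<int> preds;
  bool has_vector_info;
};

// Collect every block from which BB is reachable (assumed to mirror a
// transitive predecessor query; this is a plain work-list walk).
std::set<int> all_predecessors(const std::vector<Block> &cfg, int bb) {
  std::set<int> seen;
  std::vector<int> work(cfg[bb].preds);
  while (!work.empty()) {
    int b = work.back();
    work.pop_back();
    if (!seen.insert(b).second)
      continue;  // already visited
    for (int p : cfg[b].preds)
      work.push_back(p);
  }
  return seen;
}

// True when no predecessor of BB carries any vector configuration, in
// which case hoisting BB's demand backwards gains nothing.
bool all_empty_predecessor_p(const std::vector<Block> &cfg, int bb) {
  for (int p : all_predecessors(cfg, bb))
    if (cfg[p].has_vector_info)
      return false;
  return true;
}
```

Note that in this model a block inside a loop counts as its own predecessor, so a block that carries vector info and sits in a loop is never reported as having only empty predecessors.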
---
 gcc/config/riscv/riscv-vsetvl.cc | 24 +++
 gcc/config/riscv/riscv-vsetvl.h | 2 ++
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c | 2 +-
 .../riscv/rvv/vsetvl/imm_conflict-3.c | 4 ++--
 .../gcc.target/riscv/rvv/vsetvl/pr108270.c | 19 +++
 5 files changed, 48 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b5f5301ea43..4948e5d4c5e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2361,6 +2361,21 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
   return true;
 }

+bool
+vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
+{
+  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
+  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
+    {
+      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
+      if (!pred_block_info.local_dem.valid_or_dirty_p ()
+          && !pred_block_info.reaching_out.valid_or_dirty_p ())
+        continue;
+      return false;
+    }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_avl_p (const basic_block cfg_bb,
                                       sbitmap bitdata) const
@@ -3118,6 +3133,14 @@ pass_vsetvl::backward_demand_fusion (void)
       if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info))
         continue;

+      /* Fix PR108270:
+
+                bb 0 -> bb 1
+         We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0
+         if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */
+      if (m_vector_manager->all_empty_predecessor_p (cfg_bb))
+        continue;
+
       edge e;
       edge_iterator ei;
       /* Backward propagate to each predecessor. */
@@ -3131,6 +3154,7 @@ pass_vsetvl::backward_demand_fusion (void)
             continue;
           if (e->src->index == ENTRY_BLOCK_PTR_FOR_FN (cfun)->index)
             continue;
+
           /* If prop is demand of vsetvl instruction and reaching doesn't demand
              AVL. We don't backward propagate since vsetvl instruction has no
              side effects. */
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 237381f7026..eec03d35071 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -450,6 +450,8 @@ public:
   /* Return true if all expression set in bitmap are same ratio. */
   bool all_same_ratio_p (sbitmap) const;

+  bool all_empty_predecessor_p (const basic_block) const;
+
   void release (void);
   void create_bitmap_vectors (void);
   void free_bitmap_vectors (void);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
index cd4ee7dd0d3..ed32a40f5e7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
@@ -29,4 +29,4 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
   }
 }

-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,